TVSum: Summarizing Web Videos Using Titles
Introduced by Song et al. in "TVSum: Summarizing Web Videos Using Titles".
The TVSum dataset comprises 50 videos collected from YouTube, with durations ranging from 1 to 11 minutes. The videos span 10 categories associated with the TRECVid MED task, with 5 videos per category; examples include changing a vehicle tire, making a sandwich, and flash mob gatherings. Each video was annotated by 20 users, who assigned frame-level importance scores on a scale from 1 (not important) to 5 (very important).
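The annotation scheme described above yields, for each video, one row of 1-to-5 scores per annotator. A minimal sketch of turning those rows into a single per-frame importance curve follows. It assumes the ratings are already loaded as plain Python lists; averaging across annotators and rescaling to [0, 1] are illustrative choices rather than a procedure prescribed by the dataset, and the function name `aggregate_importance` and the toy data are hypothetical.

```python
from statistics import mean


def aggregate_importance(ratings):
    """Average per-frame scores across annotators, then rescale [1, 5] to [0, 1].

    `ratings` is a list with one entry per annotator; each entry is a list of
    integer scores (1..5), one per frame. This layout is an assumption made for
    the example, not the dataset's on-disk format.
    """
    if not ratings:
        raise ValueError("ratings must contain at least one annotator")
    n_frames = len(ratings[0])
    if any(len(row) != n_frames for row in ratings):
        raise ValueError("every annotator must rate the same number of frames")
    if any(not 1 <= score <= 5 for row in ratings for score in row):
        raise ValueError("scores must lie between 1 and 5")

    # zip(*ratings) transposes annotator rows into per-frame columns.
    per_frame_mean = [mean(column) for column in zip(*ratings)]
    # Map the 1..5 scale linearly onto 0..1 (illustrative normalization).
    return [(m - 1) / 4 for m in per_frame_mean]


# Hypothetical toy data: 3 annotators x 4 frames (TVSum itself has 20 annotators).
toy_ratings = [
    [1, 3, 5, 2],
    [2, 3, 4, 2],
    [1, 4, 5, 1],
]
scores = aggregate_importance(toy_ratings)
most_important_frame = scores.index(max(scores))
```

With the toy data, the third frame (index 2) receives the highest aggregated score, since all three hypothetical annotators rated it 4 or 5.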
Variants: TvSum
This dataset is used in 2 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Highlight Detection | FlashVTG | FlashVTG: Feature Layering and Adaptive … | 2024-12-18 |
Highlight Detection | SG-DETR | Saliency-Guided DETR for Moment Retrieval … | 2024-10-02 |
Video Summarization | CSTA | CSTA: CNN-based Spatiotemporal Attention for … | 2024-05-20 |
Highlight Detection | UVCOM (train from scratch) | Bridging the Gap: A Unified … | 2023-11-28 |
Highlight Detection | CG-DETR | Correlation-Guided Query-Dependency Calibration for Video … | 2023-11-15 |
Highlight Detection | QD-DETR (only Video) | Query-Dependent Video Representation for Moment … | 2023-03-24 |
Highlight Detection | QD-DETR | Query-Dependent Video Representation for Moment … | 2023-03-24 |
Highlight Detection | UMT | UMT: Unified Multi-modal Transformers for … | 2022-03-23 |
Video Summarization | VASNet | Summarizing Videos with Attention | 2018-12-05 |
Video Summarization | M-AVS | Video Summarization with Attention-Based Encoder-Decoder … | 2017-08-31 |
Recent papers with results on this dataset: