VidChapters-7M is a dataset of 817K user-chaptered videos comprising 7M chapters in total. It is built automatically and at scale by scraping user-annotated chapters from online videos, so no additional manual annotation is required. The dataset is designed for training and evaluating models for video chapter generation (with or without ground-truth boundaries) and video chapter grounding, as well as for video-language pretraining.
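Conceptually, each sample pairs a video with an ordered list of user-written chapter titles and their start times; a chapter ends where the next one begins. The sketch below illustrates this structure in Python. The field names (`video_id`, `start_s`, `title`, and so on) are illustrative assumptions, not the dataset's actual release schema.

```python
# Minimal sketch of a chaptered-video record; field names are assumed,
# not taken from the VidChapters-7M release files.
from dataclasses import dataclass

@dataclass
class Chapter:
    start_s: float  # chapter start time in seconds
    title: str      # user-written chapter title

@dataclass
class ChapteredVideo:
    video_id: str
    duration_s: float
    chapters: list[Chapter]  # sorted by start time

def chapter_boundaries(video: ChapteredVideo) -> list[tuple[float, float]]:
    """Derive (start, end) segments: each chapter ends where the next begins,
    and the last chapter ends at the video's duration."""
    starts = [c.start_s for c in video.chapters]
    ends = starts[1:] + [video.duration_s]
    return list(zip(starts, ends))

# Example: a 10-minute video with three user-annotated chapters.
video = ChapteredVideo(
    video_id="abc123",
    duration_s=600.0,
    chapters=[
        Chapter(0.0, "Intro"),
        Chapter(95.0, "Setup"),
        Chapter(410.0, "Results"),
    ],
)
print(chapter_boundaries(video))  # [(0.0, 95.0), (95.0, 410.0), (410.0, 600.0)]
```

Chapter generation asks a model to produce both the boundaries and the titles; the "with ground-truth boundaries" variant supplies the segments and asks only for titles; chapter grounding inverts the task, localizing a given chapter title in the video.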
Variants: VidChapters-7M
This dataset is used in the following benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Video Chaptering | Chapter-Llama | Chapter-Llama: Efficient Chaptering in Hour-Long … | 2025-03-31 |
| Language-Based Temporal Localization | ReVisionLLM | ReVisionLLM: Recursive Vision-Language Model for … | 2024-11-22 |
| Video Chaptering | Vid2Seq | VidChapters-7M: Video Chapters at Scale | 2023-09-25 |