VidChapters-7M

Dataset Information
Modalities
Videos, Texts
Languages
English
Introduced
2023
License
MIT
Homepage

Overview

VidChapters-7M is a dataset of 817K user-chaptered videos containing 7M chapters in total. It is built automatically and at scale by scraping user-annotated chapters from online videos, so no additional manual annotation is required. The dataset is designed for training and evaluating models on video chapter generation (with or without ground-truth boundaries) and video chapter grounding, as well as for video-language pretraining.
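To make the annotation format and the two main tasks concrete, below is a minimal Python sketch. The schema (a video ID, a duration, and a list of (start time, title) pairs) and all names in it are illustrative assumptions for this page, not the dataset's actual release format.

```python
from dataclasses import dataclass

@dataclass
class Chapter:
    start: float  # chapter start time in seconds
    title: str    # user-written chapter title

@dataclass
class ChapteredVideo:
    video_id: str
    duration: float          # video length in seconds
    chapters: list[Chapter]  # sorted by start time

def chapter_segments(video: ChapteredVideo) -> list[tuple[float, float, str]]:
    """Turn chapter start times into (start, end, title) segments.

    Each chapter ends where the next one begins; the last chapter
    ends at the end of the video.
    """
    segs = []
    for i, ch in enumerate(video.chapters):
        end = (video.chapters[i + 1].start
               if i + 1 < len(video.chapters)
               else video.duration)
        segs.append((ch.start, end, ch.title))
    return segs

# Hypothetical example in the spirit of the dataset's user-annotated chapters.
video = ChapteredVideo(
    video_id="abc123",
    duration=600.0,
    chapters=[
        Chapter(0.0, "Intro"),
        Chapter(95.0, "Setting up the environment"),
        Chapter(410.0, "Results and wrap-up"),
    ],
)

# Video chapter generation: predict (start, end, title) segments from the video.
targets = chapter_segments(video)

# Video chapter grounding: given a chapter title, localize its segment in time.
query = "Setting up the environment"
grounded = next(seg for seg in targets if seg[2] == query)
print(grounded)  # (95.0, 410.0, 'Setting up the environment')
```

Note the design choice this reflects: user annotations only mark chapter start times and titles, so end boundaries are implied by the next chapter's start (or the video's end), which is why "generation with ground-truth boundaries" reduces to captioning the given segments.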

Variants: VidChapters-7M

Associated Benchmarks

This dataset is used in 4 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Chaptering | Chapter-Llama | Chapter-Llama: Efficient Chaptering in Hour-Long … | 2025-03-31
Language-Based Temporal Localization | ReVisionLLM | ReVisionLLM: Recursive Vision-Language Model for … | 2024-11-22
Video Chaptering | Vid2Seq | VidChapters-7M: Video Chapters at Scale | 2023-09-25

Research Papers

Recent papers with results on this dataset: