TVC

TV show Captions

Dataset Information
Modalities: Videos, Texts
Languages: English
License: Unknown

Overview

TV show Caption (TVC) is a large-scale multimodal captioning dataset containing 261,490 caption descriptions paired with 108,965 short video moments. TVC is distinctive in that its captions may also describe dialogues/subtitles, whereas captions in other video captioning datasets describe only the visual content.
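The overview implies that each moment carries several captions on average, and that a moment is paired with subtitle text as well as video. The sketch below is a minimal illustration under stated assumptions: the `Moment` record and its field names (`clip_id`, `start_sec`, `end_sec`, `subtitles`, `captions`) are hypothetical and are not the official TVC annotation schema. Only the two counts come from the description above.

```python
from dataclasses import dataclass, field

# Counts stated in the TVC overview.
NUM_CAPTIONS = 261_490
NUM_MOMENTS = 108_965


@dataclass
class Moment:
    """Hypothetical in-memory record for one TVC video moment.

    Field names are illustrative assumptions, not the official TVC schema.
    A moment is a time span of a TV show clip, with its subtitle lines and
    the captions written for it (captions may refer to dialogue too).
    """
    clip_id: str
    start_sec: float
    end_sec: float
    subtitles: list[str] = field(default_factory=list)
    captions: list[str] = field(default_factory=list)

    def duration(self) -> float:
        """Length of the moment in seconds."""
        return self.end_sec - self.start_sec


def captions_per_moment(num_captions: int, num_moments: int) -> float:
    """Average number of captions attached to each moment."""
    if num_moments <= 0:
        raise ValueError("num_moments must be positive")
    return num_captions / num_moments


if __name__ == "__main__":
    avg = captions_per_moment(NUM_CAPTIONS, NUM_MOMENTS)
    print(f"{avg:.2f} captions per moment on average")
```

With the stated counts, the average works out to roughly 2.40 captions per moment (261,490 / 108,965). Keeping subtitles alongside captions in one record reflects the point that TVC captions can depend on dialogue, not just on what is visible.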

Source: https://tvr.cs.unc.edu/tvc.html
Image Source: https://github.com/jayleicn/TVCaption

Variants: TVC

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Captioning | COSA | COSA: Concatenated Sample Pretrained Vision-Language … | 2023-06-15
Video Captioning | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29

Research Papers

Recent papers with results on this dataset: