TV show Captions
TV show Captions (TVC) is a large-scale multimodal captioning dataset containing 261,490 captions paired with 108,965 short video moments. TVC is unique in that its captions may also describe dialogue/subtitles, whereas captions in other datasets describe only the visual content.
Source: https://tvr.cs.unc.edu/tvc.html
Image Source: https://github.com/jayleicn/TVCaption
Variants: TVC
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Video Captioning | COSA | COSA: Concatenated Sample Pretrained Vision-Language … | 2023-06-15 |
Video Captioning | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29 |