QuerYD is a large-scale dataset for retrieval and event localisation in video. A unique feature of the dataset is that each video comes with two audio tracks: the original audio and a high-quality spoken description of the visual content.
Source: QuerYD: A video dataset with high-quality textual and audio narrations
Variants: QuerYD
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Video Retrieval | TESTA (ViT-B/16) | TESTA: Temporal-Spatial Token Aggregation for … | 2023-10-29 |
Video Retrieval | VINDLU | VindLU: A Recipe for Effective … | 2022-12-09 |
Video Retrieval | LF-VILA | Long-Form Video-Language Pre-Training with Multimodal … | 2022-10-12 |
Video Retrieval | QB-Norm+TT-CE+ | Cross Modal Retrieval with Querybank … | 2021-12-23 |
Video Retrieval | Frozen | Frozen in Time: A Joint … | 2021-04-01 |
Recent papers with results on this dataset: