The MSRVTT-MC (Multiple Choice) dataset is a video question-answering dataset built on the MSR-VTT dataset. It consists of 2,990 questions generated from 10,000 video clips and their associated ground-truth captions. Each question has five candidate captions: the ground-truth caption and four randomly sampled negatives. The task is to select the correct caption from the five candidates.
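Since every question offers exactly five candidates, a model is typically scored by picking its highest-scoring candidate and reporting accuracy, so random guessing gives 20%. The sketch below shows that scoring rule under stated assumptions. It assumes a hypothetical layout in which a model has already produced one score per candidate caption and the ground-truth position is given as an index from 0 to 4. The function name `mc_accuracy` is illustrative and not part of any official evaluation script.

```python
from typing import Sequence


def mc_accuracy(scores: Sequence[Sequence[float]], answers: Sequence[int]) -> float:
    """Fraction of questions whose highest-scoring candidate is the ground truth.

    Hypothetical layout: scores[q][i] is a model's score for candidate caption i
    of question q, and answers[q] is the index (0-4) of the ground-truth caption.
    """
    if len(scores) != len(answers):
        raise ValueError("scores and answers must have the same length")
    if not scores:
        raise ValueError("need at least one question")
    correct = 0
    for cand_scores, answer in zip(scores, answers):
        if len(cand_scores) != 5:
            raise ValueError("each MSRVTT-MC question has exactly five candidates")
        # Argmax over the five candidates; max() keeps the first index on ties.
        predicted = max(range(len(cand_scores)), key=lambda i: cand_scores[i])
        correct += predicted == answer
    return correct / len(answers)


if __name__ == "__main__":
    toy_scores = [
        [0.1, 0.9, 0.2, 0.3, 0.0],  # argmax is 1, which matches the answer
        [0.5, 0.4, 0.3, 0.2, 0.1],  # argmax is 0, but the answer is 2
    ]
    print(mc_accuracy(toy_scores, [1, 2]))  # 0.5
```

How the per-candidate scores are produced (for example, video-text similarity from a pre-trained model) varies by method and is not specified here.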
Variants: MSRVTT-MC, MSR-VTT-MC
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Video Question Answering | Norton | Multi-granularity Correspondence Learning from Long-term … | 2024-01-30 |
Video Question Answering | HiTeA | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | 2022-12-30 |
Video Question Answering | VindLU | VindLU: A Recipe for Effective … | 2022-12-09 |
Video Question Answering | VIOLETv2 | An Empirical Study of End-to-End … | 2022-09-04 |
Video Question Answering | Clover | Clover: Towards A Unified Video-Language … | 2022-07-16 |
Video Question Answering | Singularity-temporal | Revealing Single Frame Bias for … | 2022-06-07 |
Video Question Answering | Singularity | Revealing Single Frame Bias for … | 2022-06-07 |