MSVD-QA

Dataset Information
License: Unknown
Homepage:

Overview

MSVD-QA is a Video Question Answering (VideoQA) dataset built on the existing Microsoft Research Video Description (MSVD) dataset, which consists of about 120K sentences describing more than 2,000 video snippets. In MSVD-QA, question-answer (QA) pairs are automatically generated from these descriptions. While MSVD itself is mainly used for video captioning, its relatively large size also makes it well suited to VideoQA. MSVD-QA contains 1,970 video clips and approximately 50.5K QA pairs.
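
For reference, the sketch below shows one way the QA annotations might be loaded and inspected. It assumes a JSON release containing a list of records with `video_id`, `question`, and `answer` fields; the file name and field names are illustrative rather than the official schema of any particular distribution.

```python
import json
from collections import Counter

# Minimal sketch for inspecting MSVD-QA annotations. Assumes the QA pairs are
# distributed as a JSON list of records with "video_id", "question", and
# "answer" fields; the file name and field names are illustrative and may
# differ in the copy of the dataset you download.
with open("train_qa.json", "r", encoding="utf-8") as f:
    qa_pairs = json.load(f)

num_videos = len({q["video_id"] for q in qa_pairs})
print(f"{len(qa_pairs)} QA pairs over {num_videos} videos")

# MSVD-QA questions fall into five types (what, who, how, when, where);
# the first word of each question is a rough proxy for the type.
question_types = Counter(q["question"].split()[0].lower() for q in qa_pairs)
print(question_types.most_common(5))

# Peek at a single example.
example = qa_pairs[0]
print(example["video_id"], "|", example["question"], "->", example["answer"])
```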

Variants: MSVD-QA

Associated Benchmarks

This dataset is used in 4 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Question Answering | LocVLM-Vid-B | Learning to Localize Objects Improves … | 2024-04-11
Visual Question Answering (VQA) | MA-LMM | MA-LMM: Memory-Augmented Large Multimodal Model … | 2024-04-08
Visual Question Answering (VQA) | vid-TLDR (UMT-L) | vid-TLDR: Training Free Token Merging … | 2024-03-20
Visual Question Answering (VQA) | FrozenBiLM+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Visual Question Answering (VQA) | All-in-one+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Visual Question Answering (VQA) | JustAsk+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Visual Question Answering (VQA) | VIOLET+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Visual Question Answering (VQA) | AIO+MIF | Self-Adaptive Sampling for Efficient Video … | 2023-07-09
Visual Question Answering (VQA) | GIT+MDF | Self-Adaptive Sampling for Efficient Video … | 2023-07-09
Visual Question Answering (VQA) | COSA | COSA: Concatenated Sample Pretrained Vision-Language … | 2023-06-15
Visual Question Answering (VQA) | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29
Visual Question Answering (VQA) | VLAB | VLAB: Enhancing Video Language Pre-training … | 2023-05-22
Visual Question Answering (VQA) | VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model … | 2023-04-17
Visual Question Answering (VQA) | MaMMUT (ours) | MaMMUT: A Simple Architecture for … | 2023-03-29
Visual Question Answering (VQA) | UMT-L (ViT-L/16) | Unmasked Teacher: Towards Training-Efficient Video … | 2023-03-28
Visual Question Answering (VQA) | VIOLET + MELTR | MELTR: Meta Loss Transformer for … | 2023-03-23
Visual Question Answering (VQA) | MuLTI | MuLTI: Efficient Video-and-Language Understanding with … | 2023-03-10
Visual Question Answering (VQA) | mPLUG-2 | mPLUG-2: A Modularized Multi-modal Foundation … | 2023-02-01
Visual Question Answering (VQA) | HiTeA | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | 2022-12-30
Zero-Shot Learning | HiTeA | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | 2022-12-30

Research Papers

Recent papers with results on this dataset: