iVQA

Instructional Video Question Answering

Dataset Information
Modalities
Videos, Texts
Languages
English
Introduced
2020
License
Unknown
Homepage

Overview

An open-ended VideoQA benchmark that aims to: i) provide a well-defined evaluation by including five correct answer annotations per question, and ii) exclude questions that can be answered without watching the video.

iVQA contains 10,000 video clips, each paired with one question and five corresponding answers. To reduce language bias, the authors manually excluded questions that could be answered without watching the video.
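The structure described above, one question and five reference answers per clip, can be sketched as follows. Field names and the agreement-based scoring rule below are illustrative assumptions for multi-reference open-ended QA, not the benchmark's official release format or metric (which are defined in the Just Ask paper and its code release).

```python
from dataclasses import dataclass


@dataclass
class IVQASample:
    # Hypothetical field names; the actual data release may differ.
    video_id: str
    question: str
    answers: list  # five reference answers collected from annotators


def agreement_score(prediction: str, references: list) -> float:
    """Illustrative multi-reference scoring: full credit when the
    prediction matches at least two references, half credit for a
    single match, zero otherwise. This mirrors common open-ended
    VideoQA conventions but is not necessarily the official metric."""
    matches = sum(prediction.strip().lower() == r.strip().lower()
                  for r in references)
    if matches >= 2:
        return 1.0
    if matches == 1:
        return 0.5
    return 0.0


sample = IVQASample(
    video_id="abc123",
    question="What utensil is used to stir the sauce?",
    answers=["spoon", "spoon", "wooden spoon", "spoon", "spatula"],
)
print(agreement_score("spoon", sample.answers))    # 1.0 (three matches)
print(agreement_score("spatula", sample.answers))  # 0.5 (one match)
```

Collecting five annotations per question makes such soft scoring possible, which is what gives the benchmark its "well-defined evaluation" for open-ended answers.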

Source: Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Variants: iVQA

Associated Benchmarks

This dataset is used in 2 benchmarks: Video Question Answering and Zero-Shot Learning.

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Question Answering | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot … | 2022-12-09
Video Question Answering | Co-Tokenization | Video Question Answering with Iterative … | 2022-08-01
Zero-Shot Learning | FrozenBiLM | Zero-Shot Video Question Answering via … | 2022-06-16
Video Question Answering | FrozenBiLM | Zero-Shot Video Question Answering via … | 2022-06-16
Video Question Answering | FrozenBiLM (0-shot) | Zero-Shot Video Question Answering via … | 2022-06-16
Video Question Answering | Text + Text (no Multimodal Pretext Training) | Towards Fast Adaptation of Pretrained … | 2022-06-05
Video Question Answering | Just Ask (fine-tune) | Just Ask: Learning to Answer … | 2020-12-01
Video Question Answering | Just Ask (0-shot) | Just Ask: Learning to Answer … | 2020-12-01

Research Papers

Recent papers with results on this dataset: