iVQA (Instructional Video Question Answering)
iVQA is an open-ended VideoQA benchmark that aims to: i) provide a well-defined evaluation by including five correct answer annotations per question, and ii) reduce language bias by excluding questions that can be answered without watching the video. The dataset contains 10,000 video clips, each with one question and five corresponding answers.
Source: Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Variants: iVQA
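The five answer annotations per question lend themselves to a soft accuracy metric. The sketch below is an illustrative assumption, not the official leaderboard implementation: it grants full credit when at least two of the five annotators gave the predicted answer and half credit when exactly one did. The `ivqa_soft_accuracy` helper and the string normalization are hypothetical; the exact metric is defined in the Just Ask paper and its released code.

```python
from collections import Counter

def ivqa_soft_accuracy(prediction: str, references: list[str]) -> float:
    """Soft accuracy over five reference answers (illustrative convention).

    Assumed scoring: 1.0 if >= 2 annotators gave the predicted answer,
    0.5 if exactly 1 did, 0.0 otherwise. Verify against the official
    Just Ask evaluation code before comparing to published numbers.
    """
    def normalize(s: str) -> str:
        # Lowercase and collapse whitespace before comparing strings.
        return " ".join(s.lower().strip().split())

    counts = Counter(normalize(r) for r in references)
    n = counts.get(normalize(prediction), 0)
    if n >= 2:
        return 1.0
    if n == 1:
        return 0.5
    return 0.0

# Example: three of the five annotators agree with the prediction -> full credit.
refs = ["whisk", "whisk", "whisk", "mixer", "spoon"]
print(ivqa_soft_accuracy("Whisk", refs))  # 1.0
```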
This dataset is used in 2 benchmarks:
| Task | Model | Paper | Date |
|---|---|---|---|
| Video Question Answering | VideoCoCa | VideoCoCa: Video-Text Modeling with Zero-Shot … | 2022-12-09 |
| Video Question Answering | Co-Tokenization | Video Question Answering with Iterative … | 2022-08-01 |
| Zero-Shot Learning | FrozenBiLM | Zero-Shot Video Question Answering via … | 2022-06-16 |
| Video Question Answering | FrozenBiLM | Zero-Shot Video Question Answering via … | 2022-06-16 |
| Video Question Answering | FrozenBiLM (0-shot) | Zero-Shot Video Question Answering via … | 2022-06-16 |
| Video Question Answering | Text + Text (no Multimodal Pretext Training) | Towards Fast Adaptation of Pretrained … | 2022-06-05 |
| Video Question Answering | Just Ask (fine-tune) | Just Ask: Learning to Answer … | 2020-12-01 |
| Video Question Answering | Just Ask (0-shot) | Just Ask: Learning to Answer … | 2020-12-01 |
Recent papers with results on this dataset: