How2QA

Dataset Information
Modalities: Videos, Texts
Introduced: 2020
License: Unknown
Homepage:
Overview

To collect How2QA for the video QA task, the same set of selected video clips is presented to another group of AMT workers for multiple-choice QA annotation. Each worker is assigned one video segment and asked to write one question with four answer candidates (one correct and three distractors). As before, narrations are hidden from the workers to ensure the collected QA pairs are not biased by subtitles. Similar to TVQA, the start and end points of the relevant moment are provided for each question. After filtering low-quality annotations, the final dataset contains 44,007 QA pairs for 22k 60-second clips selected from 9,035 videos.

Source: HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training

Variants: How2QA
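
The fields described in the overview (a 60-second clip, the start and end points of the relevant moment, a question, and four answer candidates with one correct option) can be pictured with a minimal Python sketch. The record layout and field names below are hypothetical illustrations for clarity, not the dataset's official schema or file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class How2QAExample:
    """Hypothetical representation of one How2QA annotation."""
    video_id: str          # identifier of the source video
    clip_start: float      # start of the 60-second clip, in seconds
    clip_end: float        # end of the 60-second clip, in seconds
    moment_start: float    # start of the moment relevant to the question
    moment_end: float      # end of the relevant moment
    question: str
    candidates: List[str]  # four answer candidates: one correct, three distractors
    answer_idx: int        # index of the correct candidate in `candidates`


# Illustrative example; all values are made up.
example = How2QAExample(
    video_id="hypothetical_video_001",
    clip_start=0.0,
    clip_end=60.0,
    moment_start=12.5,
    moment_end=21.0,
    question="What does the person add to the pan?",
    candidates=["Olive oil", "Butter", "Water", "Vinegar"],
    answer_idx=0,
)
assert len(example.candidates) == 4
```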

Associated Benchmarks

This dataset is used in 2 benchmarks.

Recent Benchmark Submissions

| Task | Model | Paper | Date |
| --- | --- | --- | --- |
| Video Question Answering | FrozenBiLM | Zero-Shot Video Question Answering via … | 2022-06-16 |
| Video Question Answering | FrozenBiLM (0-shot) | Zero-Shot Video Question Answering via … | 2022-06-16 |
| Video Question Answering | Text + Text (no Multimodal Pretext Training) | Towards Fast Adaptation of Pretrained … | 2022-06-05 |
| Video Question Answering | ATP | Revisiting the "Video" in Video-Language … | 2022-06-03 |
| Video Question Answering | Just Ask (0-shot) | Just Ask: Learning to Answer … | 2020-12-01 |
| Video Question Answering | Just Ask | Just Ask: Learning to Answer … | 2020-12-01 |
| Video Question Answering | Hero w/ pre-training | HERO: Hierarchical Encoder for Video+Language … | 2020-05-01 |

Research Papers

Recent papers with results on this dataset.