TVQA

Dataset Information
Modalities: Videos, Texts
Languages: English
Introduced: 2018
License: Unknown

Overview

TVQA is a large-scale dataset for video question answering, built from six popular TV shows (Friends, The Big Bang Theory, How I Met Your Mother, House M.D., Grey's Anatomy, and Castle). It contains 152,545 QA pairs drawn from 21,793 TV show clips, split 8:1:1 into training, validation, and test sets. For each clip, the dataset provides the video frames extracted at 3 FPS, the subtitles aligned with the clip, and a query consisting of a question and five answer candidates, exactly one of which is correct.
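As a rough illustration of the structure described above, the following Python sketch models one query and works out the sizes implied by a nominal 8:1:1 split. The class name `TVQASample`, its field names, and the helpers `split_counts` and `num_frames` are hypothetical and chosen for readability. They are not the official TVQA schema or loader, and the computed sizes follow from the ratio alone, so they may differ from the released split files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TVQASample:
    """One multiple-choice query (illustrative fields, not the official schema)."""
    clip_id: str
    frame_paths: List[str]  # frames extracted from the clip at 3 FPS
    subtitles: List[str]    # subtitle lines aligned with the clip
    question: str
    candidates: List[str]   # answer candidates; exactly one is correct
    answer_idx: int         # index of the correct candidate

    def __post_init__(self) -> None:
        # Reject samples whose label does not point at a candidate.
        if not 0 <= self.answer_idx < len(self.candidates):
            raise ValueError("answer_idx must index into candidates")


def split_counts(total: int, ratio: Tuple[int, int, int] = (8, 1, 1)) -> Tuple[int, int, int]:
    """Return (train, val, test) sizes for an integer ratio.

    Validation and test sizes are rounded down; the remainder goes to train.
    """
    unit = sum(ratio)
    val = total * ratio[1] // unit
    test = total * ratio[2] // unit
    return total - val - test, val, test


def num_frames(duration_sec: float, fps: float = 3.0) -> int:
    """Approximate number of frames for a clip sampled at a fixed rate."""
    return int(duration_sec * fps)


if __name__ == "__main__":
    train, val, test = split_counts(152_545)
    print(f"nominal split sizes: train={train}, val={val}, test={test}")
    print(f"frames for a 60 s clip at 3 FPS: {num_frames(60.0)}")
```

Storing the correct answer as an index into the candidate list keeps the sketch independent of how many candidates a query carries.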

Source: Two-stream Spatiotemporal Feature for Video QA Task
Image Source: https://arxiv.org/abs/1809.01696

Variants: TVQA

Associated Benchmarks

This dataset is used in 2 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Zero-Shot Learning | VideoChat2 | MVBench: A Comprehensive Multi-modal Video … | 2023-11-28
Video Question Answering | LLaMA-VQA | Large Language Models are Temporal … | 2023-10-24
Video Question Answering | VindLU | VindLU: A Recipe for Effective … | 2022-12-09
Video Question Answering | FrozenBiLM | Zero-Shot Video Question Answering via … | 2022-06-16
Video Question Answering | iPerceive (Chadha et al., 2020) | iPerceive: Applying Common-Sense Reasoning to … | 2020-11-16
Video Question Answering | Hero w/ pre-training | HERO: Hierarchical Encoder for Video+Language … | 2020-05-01
Video Question Answering | STAGE (Lei et al., 2019) | TVQA+: Spatio-Temporal Grounding for Video … | 2019-04-25

Research Papers

Recent papers with results on this dataset: