IntentQA

Dataset Information
Modalities: Videos, Texts
Introduced: 2023
License: Unknown
Homepage

Overview

We contribute IntentQA, a dataset covering diverse intents in daily social activities.

We use NExT-QA as the source dataset to construct IntentQA. NExT-QA is a comprehensive VideoQA dataset with rich, natural daily social activities and detailed QA annotations. It groups its questions into three types: Causal, Temporal, and Descriptive. We select the inference QA types, Causal and Temporal, rather than the factoid Descriptive type, to build IntentQA. Specifically, we select the Causal Why and Causal How subtypes under Causal, and the Temporal Previous and Temporal Next subtypes under Temporal.

The Causal Why (CW) QA usually takes the form ‘Why [action]? For [intent]’, with the key action appearing in the question and the intent in the answer. Conversely, the Causal How (CH) QA usually takes the form ‘How [intent]? By [action]’, with the intent appearing in the question and the key action in the answer. The Temporal Previous (TP) QA usually takes the form ‘What [action A] before [action B]?’, while the Temporal Next (TN) QA takes the form ‘What [action B] after [action A]?’. In TP and TN QA, the intent is expressed explicitly in neither the question nor the answer; it is the implicit causal factor linking the two sequential actions.
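The subtype selection described above can be illustrated with a minimal sketch. It assumes each QA record is a dictionary with a `type` field holding a subtype code; the field names, the toy records, and the `DC` code standing in for a Descriptive subtype are hypothetical and are not taken from the dataset's actual files.

```python
from collections import Counter

# Subtype codes named in the overview: Causal Why, Causal How,
# Temporal Previous, Temporal Next.
INTENT_SUBTYPES = {"CW", "CH", "TP", "TN"}


def select_intent_qa(records):
    """Keep only records whose 'type' is one of the four inference subtypes."""
    return [r for r in records if r.get("type") in INTENT_SUBTYPES]


# Hypothetical toy records. The field names and the "DC" code for a
# Descriptive question are assumptions made for illustration only.
sample = [
    {"question": "Why did the boy pick up the ball?", "answer": "To throw it.", "type": "CW"},
    {"question": "How did the girl reach the shelf?", "answer": "By standing on a chair.", "type": "CH"},
    {"question": "What did the man do before sitting down?", "answer": "Pulled out the chair.", "type": "TP"},
    {"question": "What did the dog do after barking?", "answer": "Ran to the door.", "type": "TN"},
    {"question": "How many people are in the video?", "answer": "Two.", "type": "DC"},
]

selected = select_intent_qa(sample)
counts = Counter(r["type"] for r in selected)

for subtype in sorted(counts):
    print(subtype, counts[subtype])
```

In this sketch the Descriptive record is dropped and the four inference subtypes are kept, mirroring the selection rule stated in the overview rather than any official preprocessing script.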

Variants: IntentQA

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Question Answering | VideoChat2_HD_mistral | MVBench: A Comprehensive Multi-modal Video … | 2023-11-28
Video Question Answering | VideoChat2_mistral | MVBench: A Comprehensive Multi-modal Video … | 2023-11-28
Video Question Answering | VGT | Video Graph Transformer for Video … | 2022-07-12
Video Question Answering | HQGA | Video as Conditional Graph Hierarchy … | 2021-12-12

Research Papers

Recent papers with results on this dataset: