ActivityNet-QA

Dataset Information
Modalities: Videos, Texts
Languages: English
License: Unknown

Overview

The ActivityNet-QA dataset contains 58,000 human-annotated QA pairs on 5,800 videos derived from the popular ActivityNet dataset. The dataset provides a benchmark for testing the performance of VideoQA models on long-term spatio-temporal reasoning.
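The figures above imply an average of ten QA pairs per video (58,000 / 5,800). As a rough illustration of how such annotations might be held in memory and grouped per video, here is a minimal sketch. The `QAPair` record, its field names, and the toy entries are assumptions made for this example; they are not the dataset's actual file format or contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One question-answer annotation (hypothetical field names)."""
    video_id: str
    question: str
    answer: str


def group_by_video(pairs):
    """Map each video ID to the list of QA pairs annotated on it."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair.video_id].append(pair)
    return dict(grouped)


# Toy records for illustration only; not taken from the dataset.
toy_pairs = [
    QAPair("video_a", "what is the person doing", "cooking"),
    QAPair("video_a", "is the person indoors", "yes"),
    QAPair("video_b", "how many people are in the video", "2"),
]

grouped = group_by_video(toy_pairs)
pairs_per_video = {vid: len(items) for vid, items in grouped.items()}

# Average implied by the figures quoted above: 58,000 pairs over 5,800 videos.
average_pairs_per_video = 58_000 / 5_800
```

Grouping by video ID keeps all questions about one clip together, which is convenient when a model processes each video once and then answers every question about it.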

Source: ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering

Variants: ActivityNet-QA

Associated Benchmarks

This dataset is used in 1 benchmark, for the Video Question Answering task.

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Question Answering | LocVLM-Vid-B+ | Learning to Localize Objects Improves … | 2024-04-11
Video Question Answering | LocVLM-Vid-B | Learning to Localize Objects Improves … | 2024-04-11
Video Question Answering | MA-LMM | MA-LMM: Memory-Augmented Large Multimodal Model … | 2024-04-08
Video Question Answering | LLaMA-VID-7B (2 Token) | LLaMA-VID: An Image is Worth … | 2023-11-28
Video Question Answering | VideoChat2 | MVBench: A Comprehensive Multi-modal Video … | 2023-11-28
Video Question Answering | LLaMA-VID-13B (2 Token) | LLaMA-VID: An Image is Worth … | 2023-11-28
Video Question Answering | Video-LLaVA | Video-LLaVA: Learning United Visual Representation … | 2023-11-16
Video Question Answering | Chat-UniVi-13B | Chat-UniVi: Unified Visual Representation Empowers … | 2023-11-14
Video Question Answering | Mirasol3B | Mirasol3B: A Multimodal Autoregressive model … | 2023-11-09
Video Question Answering | TESTA (ViT-B/16) | TESTA: Temporal-Spatial Token Aggregation for … | 2023-10-29
Video Question Answering | BT-Adapter (zero-shot) | BT-Adapter: Video Conversation is Feasible … | 2023-09-27
Video Question Answering | All-in-one+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Video Question Answering | FrozenBiLM+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Video Question Answering | VIOLET+ | Open-vocabulary Video Question Answering: A … | 2023-08-18
Video Question Answering | MovieChat | MovieChat: From Dense Token to … | 2023-07-31
Video Question Answering | COSA | COSA: Concatenated Sample Pretrained Vision-Language … | 2023-06-15
Video Question Answering | Video-ChatGPT | Video-ChatGPT: Towards Detailed Video Understanding … | 2023-06-08
Video Question Answering | VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation … | 2023-05-29
Video Question Answering | Video Chat | VideoChat: Chat-Centric Video Understanding | 2023-05-10
Video Question Answering | LLaMA Adapter V2 | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction … | 2023-04-28

Research Papers

Recent papers with results on this dataset: