TVBench

Name: TVBench
Published: 2024-10-10
License: cc-by-4.0

Dataset Information

Modalities: Videos, Texts
Introduced: 2024
License: cc-by-4.0



Overview

TVBench is a benchmark created specifically to evaluate temporal understanding in video QA. We identified three main issues in existing datasets: (i) static information from single frames is often sufficient to solve the tasks; (ii) the text of the questions and candidate answers is overly informative, allowing models to answer correctly without relying on any visual input; and (iii) world knowledge alone can answer many of the questions, making the benchmarks a test of knowledge replication rather than visual reasoning. In addition, we found that open-ended question-answering benchmarks for video understanding suffer from similar issues, and that their automatic evaluation with LLMs is unreliable, which makes them an unsuitable alternative.
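Issue (ii) can be probed by comparing a text-only ("blind") model against random guessing. The following is a minimal sketch of that comparison, not the authors' procedure: the `QAItem` schema, the function names, and the toy predictions are all hypothetical and do not reflect the actual TVBench data format.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """One multiple-choice question (hypothetical schema, not the TVBench format)."""
    candidates: list
    answer_index: int


def chance_accuracy(items):
    """Expected accuracy of uniform random guessing over the candidates."""
    return sum(1.0 / len(item.candidates) for item in items) / len(items)


def accuracy(items, predictions):
    """Fraction of items whose predicted index matches the answer index."""
    correct = sum(int(p == item.answer_index) for item, p in zip(items, predictions))
    return correct / len(items)


def bias_gap(items, blind_predictions):
    """How far a model that never sees the video sits above chance."""
    return accuracy(items, blind_predictions) - chance_accuracy(items)


# Toy data, invented for illustration only.
items = [
    QAItem(["Standing up", "Sitting down"], 0),
    QAItem(["left", "right", "up", "down"], 3),
]
blind_predictions = [0, 1]  # output of a hypothetical text-only model
gap = bias_gap(items, blind_predictions)
```

A gap well above zero would suggest that the question and answer text leaks the solution; a gap near zero is what a benchmark that truly requires visual input should show for a blind model.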

We defined 10 temporally challenging tasks that require repetition counting (Action Count), reasoning about properties of moving objects (Object Shuffle, Object Count, Moving Direction), temporal localization (Action Localization, Unexpected Action), temporal sequential ordering (Action Sequence, Scene Transition, Egocentric Sequence), or distinguishing between temporally hard action antonyms such as "Standing up" and "Sitting down" (Action Antonym).
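The grouping of the 10 tasks can be written down as a small lookup table. This is only an illustrative sketch: the task names come from the description above (with "Action Antonym" written in the singular as an assumption), while the group labels and variable names are invented here and are not identifiers from the dataset.

```python
# Group labels are paraphrased from the task description; they are
# illustrative keys, not official TVBench identifiers.
TASK_GROUPS = {
    "repetition counting": ["Action Count"],
    "properties of moving objects": ["Object Shuffle", "Object Count", "Moving Direction"],
    "temporal localization": ["Action Localization", "Unexpected Action"],
    "temporal sequential ordering": ["Action Sequence", "Scene Transition", "Egocentric Sequence"],
    "temporally hard antonyms": ["Action Antonym"],
}

# Reverse lookup: task name -> group label.
TASK_TO_GROUP = {
    task: group for group, tasks in TASK_GROUPS.items() for task in tasks
}

ALL_TASKS = sorted(TASK_TO_GROUP)
```

A reverse lookup like `TASK_TO_GROUP` makes it easy to aggregate per-task results by the kind of temporal skill they test.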

Variants: TVBench

Associated Benchmarks

This dataset is used in 1 benchmark:

Video Question Answering - Metrics: Average Accuracy
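The page does not say whether Average Accuracy is averaged over tasks or over individual questions. The sketch below computes both variants so the difference is visible; the record format of `(task_name, is_correct)` pairs and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: accuracy} from an iterable of (task_name, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for task, is_correct in records:
        totals[task] += 1
        hits[task] += int(is_correct)
    return {task: hits[task] / totals[task] for task in totals}


def macro_average(records):
    """Unweighted mean of the per-task accuracies (each task counts equally)."""
    per_task = per_task_accuracy(records)
    return sum(per_task.values()) / len(per_task)


def micro_average(records):
    """Accuracy over all questions pooled together (each question counts equally)."""
    records = list(records)
    return sum(int(is_correct) for _, is_correct in records) / len(records)


# Toy results, invented for illustration only.
records = [
    ("Action Count", True),
    ("Action Count", False),
    ("Action Count", False),
    ("Action Count", False),
    ("Moving Direction", True),
    ("Moving Direction", True),
]
macro = macro_average(records)
micro = micro_average(records)
```

With unbalanced task sizes the two numbers diverge, so a reported score is only comparable across models when the same averaging scheme is used.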

Recent Benchmark Submissions

Task	Model	Paper	Date
Video Question Answering	V-JEPA 2 ViT-g 8B	V-JEPA 2: Self-Supervised Video Models …	2025-06-11
Video Question Answering	Seed1.5-VL thinking	Seed1.5-VL Technical Report	2025-05-11
Video Question Answering	Seed1.5-VL	Seed1.5-VL Technical Report	2025-05-11
Video Question Answering	PLM-1B	PerceptionLM: Open-Access Data and Models …	2025-04-17
Video Question Answering	PLM-3B	PerceptionLM: Open-Access Data and Models …	2025-04-17
Video Question Answering	PLM-8B	PerceptionLM: Open-Access Data and Models …	2025-04-17
Video Question Answering	RRPO	Self-alignment of Large Video Language …	2025-04-16
Video Question Answering	Tarsier2-7B	Tarsier2: Advancing Large Vision-Language Models …	2025-01-14
Video Question Answering	GPT4o 8 frames	GPT-4o System Card	2024-10-25
Video Question Answering	Aria	Aria: An Open Multimodal Native …	2024-10-08
Video Question Answering	LLaVA-Video 72B	Video Instruction Tuning With Synthetic …	2024-10-03
Video Question Answering	LLaVA-Video 7B	Video Instruction Tuning With Synthetic …	2024-10-03
Video Question Answering	Qwen2-VL-7B	Qwen2-VL: Enhancing Vision-Language Model's Perception …	2024-09-18
Video Question Answering	Qwen2-VL-72B	Qwen2-VL: Enhancing Vision-Language Model's Perception …	2024-09-18
Video Question Answering	mPLUG-Owl3	mPLUG-Owl3: Towards Long Image-Sequence Understanding …	2024-08-09
Video Question Answering	IXC-2.5 7B	InternLM-XComposer-2.5: A Versatile Large Vision …	2024-07-03
Video Question Answering	Tarsier-7B	Tarsier: Recipes for Training and …	2024-06-30
Video Question Answering	Tarsier-34B	Tarsier: Recipes for Training and …	2024-06-30
Video Question Answering	VideoGPT+	VideoGPT+: Integrating Image and Video …	2024-06-13
Video Question Answering	VideoLLaMA2 7B	VideoLLaMA 2: Advancing Spatial-Temporal Modeling …	2024-06-11

Research Papers

Recent papers with results on this dataset:

External Links:

TVBench
