
TVBench

Video Question Answering Benchmark


28 results reported. Metric: Average Accuracy.

Top Performing Models

Rank | Model | Paper | Average Accuracy (%) | Date | Code
1 | Seed1.5-VL thinking | Seed1.5-VL Technical Report | 63.60 | 2025-05-11 | -
2 | PLM-8B | PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | 63.50 | 2025-04-17 | facebookresearch/perception_models
3 | Seed1.5-VL | Seed1.5-VL Technical Report | 61.50 | 2025-05-11 | -
4 | V-JEPA 2 ViT-g 8B | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | 60.60 | 2025-06-11 | facebookresearch/vjepa2
5 | PLM-3B | PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | 58.90 | 2025-04-17 | facebookresearch/perception_models
6 | RRPO | Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization | 56.50 | 2025-04-16 | -
7 | Tarsier-34B | Tarsier: Recipes for Training and Evaluating Large Video Description Models | 55.50 | 2024-06-30 | bytedance/tarsier
8 | Tarsier2-7B | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | 54.70 | 2025-01-14 | bytedance/tarsier
9 | Qwen2-VL-72B | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 52.70 | 2024-09-18 | qwenlm/qwen2-vl, qwenlm/qwen2.5-vl, juruobenruo/DexVLA
10 | IXC-2.5 7B | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | 51.60 | 2024-07-03 | internlm/internlm-xcomposer

All Papers (28)

Seed1.5-VL Technical Report (2025): Seed1.5-VL thinking

GPT-4o System Card (2024): GPT4o 8 frames