MVBench

Name: MVBench
Published: 2023-11-28
License: Unknown

Overview

MVBench is a comprehensive Multi-modal Video understanding Benchmark for evaluating Multi-modal Large Language Models (MLLMs), with a focus on their temporal understanding in dynamic video tasks. It covers 20 challenging video tasks that cannot be solved effectively from a single frame. The tasks are defined with a novel static-to-dynamic method: static tasks are transformed into dynamic ones, which allows video tasks to be generated systematically and to require a broad spectrum of temporal skills, ranging from perception to cognition.

Variants: MVBench

Associated Benchmarks

This dataset is used in 2 benchmarks:

Visual Question Answering (VQA) - Metrics: Acc
Video Question Answering - Metrics: Avg.
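
The page does not define the "Acc" and "Avg." metrics. A minimal sketch of one plausible reading follows, assuming that "Acc" is the fraction of questions answered correctly and that "Avg." is the unweighted mean of per-task accuracies. The record layout and the function names `per_task_accuracy` and `average_accuracy` are hypothetical and are not taken from any official MVBench evaluation code.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: accuracy} for (task, predicted, correct) records.

    The record layout is an assumption made for this illustration only.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for task, predicted, correct in records:
        totals[task] += 1
        if predicted == correct:
            hits[task] += 1
    return {task: hits[task] / totals[task] for task in totals}


def average_accuracy(records):
    """Unweighted mean of per-task accuracies (assumed meaning of "Avg.")."""
    scores = per_task_accuracy(records)
    if not scores:
        raise ValueError("no records to score")
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: two tasks, three multiple-choice answers.
records = [
    ("task_a", "A", "A"),
    ("task_a", "B", "C"),
    ("task_b", "D", "D"),
]
print(per_task_accuracy(records))
print(average_accuracy(records))
```

Averaging per task rather than per question keeps a task with many questions from dominating the score; whether the leaderboard weights tasks this way is an assumption here.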

Recent Benchmark Submissions

Task	Model	Paper	Date
Visual Question Answering (VQA)	Lyra-Pro	Lyra: An Efficient and Speech-Centric …	2024-12-12
Video Question Answering	LinVT-Qwen2-VL (7B)	LinVT: Empower Your Image-level Large …	2024-12-06
Video Question Answering	PPLLaVA (7b)	PPLLaVA: Varied Video Sequence Understanding …	2024-11-04
Video Question Answering	VideoChat-T (7B)	TimeSuite: Improving MLLMs for Long …	2024-10-25
Video Question Answering	LongVU (7B)	LongVU: Spatiotemporal Adaptive Compression for …	2024-10-22
Video Question Answering	Oryx(34B)	Oryx MLLM: On-Demand Spatial-Temporal Understanding …	2024-09-19
Video Question Answering	mPLUG-Owl3(7B)	mPLUG-Owl3: Towards Long Image-Sequence Understanding …	2024-08-09
Video Question Answering	Tarsier (34B)	Tarsier: Recipes for Training and …	2024-06-30
Video Question Answering	VideoGPT+	VideoGPT+: Integrating Image and Video …	2024-06-13
Video Question Answering	VideoLLaMA2 (72B)	VideoLLaMA 2: Advancing Spatial-Temporal Modeling …	2024-06-11
Video Question Answering	PLLaVA	PLLaVA : Parameter-free LLaVA Extension …	2024-04-25
Video Question Answering	ST-LLM	ST-LLM: Large Language Models Are …	2024-03-30
Video Question Answering	InternVideo2	InternVideo2: Scaling Foundation Models for …	2024-03-22
Video Question Answering	HawkEye	HawkEye: Training Video-Text LLMs for …	2024-03-15
Video Question Answering	SPHINX-Plus	SPHINX-X: Scaling Data and Parameters …	2024-02-08
Video Question Answering	TimeChat	TimeChat: A Time-sensitive Multimodal Large …	2023-12-04
Video Question Answering	VideoChat2	MVBench: A Comprehensive Multi-modal Video …	2023-11-28
Video Question Answering	Video-ChatGPT	Video-ChatGPT: Towards Detailed Video Understanding …	2023-06-08
Video Question Answering	VideoLLaMA	Video-LLaMA: An Instruction-tuned Audio-Visual Language …	2023-06-05
Video Question Answering	InstructBLIP	InstructBLIP: Towards General-purpose Vision-Language Models …	2023-05-11

Research Papers

Recent papers with results on this dataset:
