MVBench

Dataset Information
Introduced: 2023
License: Unknown

Overview

MVBench is a comprehensive Multi-modal Video understanding Benchmark. It was introduced to evaluate the comprehension capabilities of Multi-modal Large Language Models (MLLMs), in particular their temporal understanding of dynamic video. MVBench covers 20 challenging video tasks that cannot be solved effectively from a single frame. To define these temporal tasks, it introduces a static-to-dynamic method: various static tasks are transformed into dynamic ones, which allows video tasks to be generated systematically and to cover a broad spectrum of temporal skills, ranging from perception to cognition.
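
Because the benchmark spans many distinct tasks, results are naturally summarized per task and then averaged. The following is a minimal sketch of that aggregation, assuming each evaluated item is a multiple-choice question tagged with its task; the record fields (`task`, `prediction`, `answer`), the task names, and the helper functions are hypothetical and are not taken from any official MVBench tooling.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: accuracy} for records with 'task', 'prediction', 'answer' keys."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["task"]] += 1
        if record["prediction"] == record["answer"]:
            correct[record["task"]] += 1
    return {task: correct[task] / total[task] for task in total}


def mean_accuracy(task_acc):
    """Unweighted mean over tasks, so every task counts equally."""
    return sum(task_acc.values()) / len(task_acc)


# Hypothetical records: two illustrative tasks with two questions each.
records = [
    {"task": "action_sequence", "prediction": "A", "answer": "A"},
    {"task": "action_sequence", "prediction": "B", "answer": "C"},
    {"task": "moving_direction", "prediction": "D", "answer": "D"},
    {"task": "moving_direction", "prediction": "A", "answer": "A"},
]

task_acc = per_task_accuracy(records)
overall = mean_accuracy(task_acc)
print(task_acc, overall)
```

Averaging over tasks rather than over individual questions keeps a task with more items from dominating the overall score; whether a given leaderboard entry was computed this way is an assumption of this sketch.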

Variants: MVBench

Associated Benchmarks

This dataset is used in 2 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering (VQA) | Lyra-Pro | Lyra: An Efficient and Speech-Centric … | 2024-12-12
Video Question Answering | LinVT-Qwen2-VL (7B) | LinVT: Empower Your Image-level Large … | 2024-12-06
Video Question Answering | PPLLaVA (7B) | PPLLaVA: Varied Video Sequence Understanding … | 2024-11-04
Video Question Answering | VideoChat-T (7B) | TimeSuite: Improving MLLMs for Long … | 2024-10-25
Video Question Answering | LongVU (7B) | LongVU: Spatiotemporal Adaptive Compression for … | 2024-10-22
Video Question Answering | Oryx (34B) | Oryx MLLM: On-Demand Spatial-Temporal Understanding … | 2024-09-19
Video Question Answering | mPLUG-Owl3 (7B) | mPLUG-Owl3: Towards Long Image-Sequence Understanding … | 2024-08-09
Video Question Answering | Tarsier (34B) | Tarsier: Recipes for Training and … | 2024-06-30
Video Question Answering | VideoGPT+ | VideoGPT+: Integrating Image and Video … | 2024-06-13
Video Question Answering | VideoLLaMA2 (72B) | VideoLLaMA 2: Advancing Spatial-Temporal Modeling … | 2024-06-11
Video Question Answering | PLLaVA | PLLaVA : Parameter-free LLaVA Extension … | 2024-04-25
Video Question Answering | ST-LLM | ST-LLM: Large Language Models Are … | 2024-03-30
Video Question Answering | InternVideo2 | InternVideo2: Scaling Foundation Models for … | 2024-03-22
Video Question Answering | HawkEye | HawkEye: Training Video-Text LLMs for … | 2024-03-15
Video Question Answering | SPHINX-Plus | SPHINX-X: Scaling Data and Parameters … | 2024-02-08
Video Question Answering | TimeChat | TimeChat: A Time-sensitive Multimodal Large … | 2023-12-04
Video Question Answering | VideoChat2 | MVBench: A Comprehensive Multi-modal Video … | 2023-11-28
Video Question Answering | Video-ChatGPT | Video-ChatGPT: Towards Detailed Video Understanding … | 2023-06-08
Video Question Answering | VideoLLaMA | Video-LLaMA: An Instruction-tuned Audio-Visual Language … | 2023-06-05
Video Question Answering | InstructBLIP | InstructBLIP: Towards General-purpose Vision-Language Models … | 2023-05-11

Research Papers

Recent papers with results on this dataset: