OVBench

Dataset Information
Modalities: Videos, Texts
Languages: English
Introduced: 2024
License: Unknown

Overview

OVBench is a benchmark for real-time (online) video understanding, built around three requirements:

  • Memory, Perception, and Prediction of Temporal Contexts: Questions are framed relative to the present state of entities, so models must memorize past context, perceive present context, and predict future context.
  • Dynamic Spatio-temporal Interaction: The benchmark demands precise real-time interactions with video content, where actions, objects, and events must be understood in the context of their spatial and temporal relationships.
  • Contextual Awareness at Specific Moments: Real-time questions are contextual, changing based on the specific timestamp they are asked, requiring a deep understanding of how temporal context evolves.
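
The timestamp dependence described above can be illustrated with a small sketch. It is not OVBench's data format or evaluation code: the `Event` record, the toy `timeline`, and the helpers `current_action` and `past_actions` are hypothetical names introduced only for this example. The point it tries to show is that the same question ("what is happening now?") has a different correct answer depending on when it is asked, and that an online model may only use what has been observed up to that moment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical annotation: an action spanning [start, end) in seconds."""
    start: float
    end: float
    label: str


def current_action(events: List[Event], t: float) -> Optional[str]:
    """Label of the event in progress at query time t, or None if there is none."""
    for event in events:
        if event.start <= t < event.end:
            return event.label
    return None


def past_actions(events: List[Event], t: float) -> List[str]:
    """Labels of events that have fully finished by query time t (the 'memory' case)."""
    return [event.label for event in events if event.end <= t]


# Toy timeline, invented for illustration only.
timeline = [
    Event(0.0, 5.0, "pick up cup"),
    Event(5.0, 9.0, "pour water"),
    Event(9.0, 12.0, "drink"),
]

# The same question asked at two timestamps has two different answers.
for query_time in (3.0, 6.0):
    print(query_time, current_action(timeline, query_time), past_actions(timeline, query_time))
```

A prediction-style question would be the mirror image of `past_actions`: the ground truth comes from events that start after the query time, which the model has not yet seen.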

Variants: OVBench

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Video Question Answering | Seed1.5-VL | Seed1.5-VL Technical Report | 2025-05-11
Video Question Answering | VideoChat-Online (4B) | Online Video Understanding: OVBench and … | 2024-12-31
Video Question Answering | InternVL2 (4B) | Expanding Performance Boundaries of Open-Source … | 2024-12-06
Video Question Answering | InternVL2 (7B) | Expanding Performance Boundaries of Open-Source … | 2024-12-06
Video Question Answering | Qwen2-VL (7B) | Qwen2-VL: Enhancing Vision-Language Model's Perception … | 2024-09-18
Video Question Answering | LLaVA-OneVision (7B) | LLaVA-OneVision: Easy Visual Task Transfer | 2024-08-06
Video Question Answering | LongVA (7B) | Long Context Transfer from Language … | 2024-06-24
Video Question Answering | VideoLLM-Online (7B) | VideoLLM-online: Online Video Large Language … | 2024-06-17
Video Question Answering | Flash-VStream (7B) | Flash-VStream: Memory-Based Real-Time Understanding for … | 2024-06-12
Video Question Answering | LITA (7B) | LITA: Language Instructed Temporal-Localization Assistant | 2024-03-27
Video Question Answering | Gemini-1.5-Flash | Gemini 1.5: Unlocking multimodal understanding … | 2024-03-08
Video Question Answering | TimeChat (7B) | TimeChat: A Time-sensitive Multimodal Large … | 2023-12-04
Video Question Answering | VTimeLLM (7B) | VTimeLLM: Empower LLM to Grasp … | 2023-11-30
Video Question Answering | LLaMA-VID (7B) | LLaMA-VID: An Image is Worth … | 2023-11-28
Video Question Answering | MovieChat (7B) | MovieChat: From Dense Token to … | 2023-07-31

Research Papers

Recent papers with results on this dataset: