OVBench is a benchmark tailored for real-time video understanding:
- Memory, Perception, and Prediction of Temporal Contexts: Questions are anchored to the present state of entities, requiring models to memorize past context, perceive the current moment, and predict future developments as the video unfolds.
- Dynamic Spatio-temporal Interaction: The benchmark demands precise real-time interactions with video content, where actions, objects, and events must be understood in the context of their spatial and temporal relationships.
- Contextual Awareness at Specific Moments: Real-time questions are contextual; the correct answer changes depending on the timestamp at which the question is asked, requiring a deep understanding of how temporal context evolves (see the sketch after this list).
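
The sketch below illustrates what timestamp-anchored, online evaluation can look like: each question is tied to a moment in the video, and the model may only use frames observed up to that moment. The record fields and the `answer_fn` interface are assumptions for illustration, not the official OVBench schema or API.

```python
# Minimal sketch of timestamp-anchored evaluation for an online video benchmark.
# Field names and the answer_fn interface are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TimedQuestion:
    timestamp: float  # moment (in seconds) at which the question is posed
    question: str     # question about past/present/future temporal context
    answer: str       # ground-truth answer that is valid at `timestamp`


def evaluate_online(
    frames: Sequence,                      # decoded frames, assumed one per second
    questions: List[TimedQuestion],
    answer_fn: Callable[[Sequence, str], str],
) -> float:
    """Answer each question using only frames seen up to its timestamp,
    mimicking a real-time setting where future frames are unavailable."""
    correct = 0
    for q in sorted(questions, key=lambda q: q.timestamp):
        visible = frames[: int(q.timestamp) + 1]  # no access to future frames
        if answer_fn(visible, q.question).strip() == q.answer.strip():
            correct += 1
    return correct / len(questions) if questions else 0.0
```

Because the same question text can have different correct answers at different timestamps, the evaluation loop feeds the model a growing prefix of the stream rather than the full video, which is the key difference from offline video QA.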