VLM2-Bench

Visual Question Answering (VQA) Benchmark

Performance Over Time

Showing 9 results | Metric: GC-mat

Top Performing Models

| Rank | Model | Paper | GC-mat | Date | Code |
|------|-------|-------|--------|------|------|
| 1 | GPT-4o | GPT-4o System Card | 37.45 | 2024-10-25 | - |
| 2 | Qwen2.5-VL-7B | Qwen2.5-VL Technical Report | 35.91 | 2025-02-19 | qwenlm/qwen2-vl, qwenlm/qwen2.5-vl, likaixin2000/screenspot-pro-gui-grounding, princeton-nlp/CharXiv |
| 3 | InternVL2.5-26B | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | 30.50 | 2024-12-06 | opengvlab/internvl |
| 4 | Qwen2-VL-7B | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 27.80 | 2024-09-18 | qwenlm/qwen2-vl, qwenlm/qwen2.5-vl, juruobenruo/DexVLA |
| 5 | InternVL2.5-8B | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | 21.24 | 2024-12-06 | opengvlab/internvl |
| 6 | LLaVA-Video-7B | Video Instruction Tuning With Synthetic Data | 18.53 | 2024-10-03 | - |
| 7 | mPLUG-Owl3-7B | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | 17.37 | 2024-08-09 | x-plug/mplug-owl |
| 8 | LLaVA-OneVision-7B | LLaVA-OneVision: Easy Visual Task Transfer | 16.60 | 2024-08-06 | evolvinglmms-lab/lmms-eval, MindSpore-scientific-2/code-14 |
| 9 | LongVA-7B | Long Context Transfer from Language to Vision | 14.29 | 2024-06-24 | jzhang38/EasyContext, evolvinglmms-lab/longva |

All Papers (9)