MM-Vet

Visual Question Answering Benchmark

Performance Over Time

📊 Showing 222 results | 📏 Metric: GPT-4 score

Top Performing Models

| Rank | Model | Paper | GPT-4 score | Date | Code |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | 74.24 | 2024-05-28 | - |
| 2 | Qwen2-VL-72B | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 74.00 | 2024-09-18 | qwenlm/qwen2-vl, qwenlm/qwen2.5-vl, juruobenruo/DexVLA |
| 3 | InternVL2.5-78B | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | 72.30 | 2024-12-06 | opengvlab/internvl |
| 4 | GPT-4o +text rationale +IoT | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | 72.20 | 2024-05-22 | - |
| 5 | Lyra-Pro | Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | 71.40 | 2024-12-12 | dvlab-research/Lyra |
| 6 | GLM-4V-Plus | CogVLM2: Visual Language Models for Image and Video Understanding | 71.10 | 2024-08-29 | thudm/glm-4, thudm/cogvlm2, yangyucheng000/University |
| 7 | Phantom-7B | Phantom of Latent for Large Language and Vision Models | 70.80 | 2024-09-23 | byungkwanlee/phantom |
| 8 | InternVL2.5-38B | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | 68.80 | 2024-12-06 | opengvlab/internvl |
| 9 | InternVL2-26B (SGP, token ratio 64%) | A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs | 65.60 | 2024-12-04 | NUS-HPC-AI-Lab/SGL |
| 10 | Baichuan-Omni (7B) | Baichuan-Omni Technical Report | 65.40 | 2024-10-11 | westlake-baichuan-mllm/ocean-omni, westlake-baichuan-mllm/bc-omni |

All Papers (222)

| Model | Paper | Year |
|---|---|---|
| Qwen2-VL-7B (finetuned on GAP-VQA train) | Gamified crowd-sourcing of high-quality data for visual fine-tuning | 2024 |
| Qwen2-VL-2B (finetuned on GAP-VQA train) | Gamified crowd-sourcing of high-quality data for visual fine-tuning | 2024 |
| MiniCPM-Llama3-V-2.5-8B (finetuned on GAP-VQA train) | Gamified crowd-sourcing of high-quality data for visual fine-tuning | 2024 |
| OmniFusion (grid split + ruDocVQA) | OmniFusion Technical Report | 2024 |