
MM-Vet v2

Visual Question Answering Benchmark


Showing 17 results | Metric: GPT-4 score

Top Performing Models

Rank | Model | Paper | GPT-4 score | Paper date | Code
1 | GPT-4o (gpt-4o-2024-11-20) | GPT-4 Technical Report | 0.00 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
2 | GPT-4o (gpt-4o-2024-05-13) | GPT-4 Technical Report | 0.00 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
3 | Gemini 1.5 Pro | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | 0.00 | 2024-03-08 | dlvuldet/primevul
4 | Qwen2-VL-72B (qwen-vl-max-0809) | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | 0.00 | 2024-09-18 | qwenlm/qwen2-vl, qwenlm/qwen2.5-vl, juruobenruo/DexVLA
5 | gpt-4o-mini-2024-07-18 | GPT-4 Technical Report | 0.00 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
6 | GPT-4 Turbo (gpt-4-0125-preview) | GPT-4 Technical Report | 0.00 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models
7 | Gemini Pro Vision | Gemini: A Family of Highly Capable Multimodal Models | 0.00 | 2023-12-19 | valdecy/pybibx
8 | Qwen-VL-Max | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | 0.00 | 2023-08-24 | qwenlm/qwen-vl, brandon3964/multimodal-task-vector
9 | InternVL-Chat-V1-5 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | 0.00 | 2024-04-25 | opengvlab/internvl
10 | CogVLM-Chat | CogVLM: Visual Expert for Pretrained Language Models | 0.00 | 2023-11-06 | thudm/cogvlm, THUDM/CogAgent, 2024-MindSpore-1/Code2, MS-P3/code5

All Papers (17)