
BenchLMM

Visual Question Answering Benchmark

Performance Over Time

Showing 10 results | Metric: GPT-3.5 score

Top Performing Models

| Rank | Model | Paper | GPT-3.5 score | Date | Code |
|------|-------|-------|---------------|------|------|
| 1 | GPT-4V | GPT-4 Technical Report | 58.37 | 2023-03-15 | openai/evals, shmsw25/factscore, unispac/visual-adversarial-examples-jailbreak-large-language-models |
| 2 | Sphinx-V2-1K | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | 57.43 | 2023-11-13 | alpha-vllm/llama2-accessory |
| 3 | LLaVA-1.5-13B | Improved Baselines with Visual Instruction Tuning | 55.53 | 2023-10-05 | huggingface/transformers, haotian-liu/LLaVA, LLaVA-VL/LLaVA-NeXT |
| 4 | LLaVA-1.5-7B | Visual Instruction Tuning | 46.83 | 2023-04-17 | huggingface/transformers, haotian-liu/LLaVA, LLaVA-VL/LLaVA-NeXT |
| 5 | InstructBLIP-13B | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | 45.03 | 2023-05-11 | salesforce/lavis, tabtoyou/kollava, pwc-1/Paper-9, MS-P3/code3 |
| 6 | InstructBLIP-7B | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | 44.63 | 2023-05-11 | salesforce/lavis, tabtoyou/kollava, pwc-1/Paper-9, MS-P3/code3 |
| 7 | LLaVA-1-13B | Visual Instruction Tuning | 43.50 | 2023-04-17 | huggingface/transformers, haotian-liu/LLaVA, LLaVA-VL/LLaVA-NeXT |
| 8 | Otter-7B | Otter: A Multi-Modal Model with In-Context Instruction Tuning | 39.13 | 2023-05-05 | luodian/otter |
| 9 | MiniGPT4-13B | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | 34.93 | 2023-04-20 | vision-cair/minigpt-4, zyang1580/binllm, 2024-MindSpore-1/Code6 |
| 10 | MiniGPTv2-7B | MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning | 30.10 | 2023-10-14 | vision-cair/minigpt-4, zebangcheng/emotion-llama |

All Papers (10)