| GPT-4o | GPT-4 Technical Report | 94.60 | 2023-03-15 |
| Gemini Pro 1.5 | Gemini 1.5: Unlocking multimodal understanding ac… | 90.34 | 2024-03-08 |
| GPT-4V | GPT-4 Technical Report | 86.09 | 2023-03-15 |
| LLaVA-Llama-3 | LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and… | 43.80 | 2024-03-18 |
| Gemini Pro 1.0 | Gemini: A Family of Highly Capable Multimodal Mod… | 29.53 | 2023-12-19 |
| IDEFICS2-8B | What matters when building vision-language models? | 18.90 | 2024-05-03 |
| CogVLM2-Llama-3 | CogVLM: Visual Expert for Pretrained Language Mod… | 7.30 | 2023-11-06 |
| InstructBLIP-Flan-T5-XXL | InstructBLIP: Towards General-purpose Vision-Lang… | 3.80 | 2023-05-11 |
| mPLUG-Owl-v2 | mPLUG-Owl2: Revolutionizing Multi-modal Large Lan… | 1.90 | 2023-11-07 |
| CogVLM-17B | CogVLM: Visual Expert for Pretrained Language Mod… | 0.00 | 2023-11-06 |
| InstructBLIP-Vicuna-13B | InstructBLIP: Towards General-purpose Vision-Lang… | 0.00 | 2023-05-11 |