| Model | Paper | Score | Date |
| --- | --- | --- | --- |
| GPT-4o | GPT-4o: Visual perception performance of multimod… | 457.00 | 2024-06-14 |
| GPT-4V | The Dawn of LMMs: Preliminary Explorations with G… | 415.00 | 2023-09-29 |
| LLaVA-NEXT-34B | Visual Instruction Tuning | 412.00 | 2023-04-17 |
| Phi-3-Vision | Phi-3 Technical Report: A Highly Capable Language… | 397.00 | 2024-04-22 |
| InternVL2-8B | InternVL: Scaling up Vision Foundation Models and… | 368.00 | 2023-12-21 |
| Qwen-vl-max | Qwen-VL: A Versatile Vision-Language Model for Un… | 366.00 | 2023-08-24 |
| LLaVA-NEXT-13B | Visual Instruction Tuning | 335.00 | 2023-04-17 |
| Qwen-vl-plus | Qwen-VL: A Versatile Vision-Language Model for Un… | 310.00 | 2023-08-24 |
| Idefics-2-8B | What matters when building vision-language models? | 256.00 | 2024-05-03 |
| LLaVA-1.5-13B | Visual Instruction Tuning | 243.00 | 2023-04-17 |
| InternVL2-1B | InternVL: Scaling up Vision Foundation Models and… | 237.00 | 2023-12-21 |
| Monkey-Chat-7B | Monkey: Image Resolution and Text Label Are Impor… | 214.00 | 2023-11-11 |
| Idefics-80B | OBELICS: An Open Web-Scale Filtered Dataset of In… | 139.00 | 2023-06-21 |