| GPT-4V-turbo-detail:high (Visual Prompt) | GPT-4 Technical Report | 60.70 | 2023-03-15 |
| GPT-4V-turbo-detail:low (Visual Prompt) | GPT-4 Technical Report | 52.80 | 2023-03-15 |
| LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | Inst-IT: Boosting Multimodal Instance Understandi… | 50.50 | 2024-12-04 |
| ViP-LLaVA-13B (Visual Prompt) | Making Large Language Models Better Data Creators | 48.30 | 2023-10-31 |
| LLaVA-1.5-13B (Coordinates) | Improved Baselines with Visual Instruction Tuning | 47.10 | 2023-10-05 |
| Qwen-VL-Chat (Coordinates) | Qwen-VL: A Versatile Vision-Language Model for Un… | 45.30 | 2023-08-24 |
| LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | Inst-IT: Boosting Multimodal Instance Understandi… | 45.10 | 2024-12-04 |
| LLaVA-1.5-13B (Visual Prompt) | Improved Baselines with Visual Instruction Tuning | 41.80 | 2023-10-05 |
| Qwen-VL-Chat (Visual Prompt) | Qwen-VL: A Versatile Vision-Language Model for Un… | 39.20 | 2023-08-24 |
| InstructBLIP-13B (Visual Prompt) | InstructBLIP: Towards General-purpose Vision-Lang… | 35.80 | 2023-05-11 |
| GPT4ROI 7B (ROI) | GPT4RoI: Instruction Tuning Large Language Model … | 35.10 | 2023-07-07 |
| Shikra-7B (Coordinates) | Shikra: Unleashing Multimodal LLM's Referential D… | 33.70 | 2023-06-27 |
| Kosmos-2 (Discrete Token) | Kosmos-2: Grounding Multimodal Large Language Mod… | 26.90 | 2023-06-26 |