| GPT-4V | GPT-4 Technical Report | 58.37 | 2023-03-15 |
| Sphinx-V2-1K | SPHINX: The Joint Mixing of Weights, Tasks, and V… | 57.43 | 2023-11-13 |
| LLaVA-1.5-13B | Improved Baselines with Visual Instruction Tuning | 55.53 | 2023-10-05 |
| LLaVA-1.5-7B | Visual Instruction Tuning | 46.83 | 2023-04-17 |
| InstructBLIP-13B | InstructBLIP: Towards General-purpose Vision-Lang… | 45.03 | 2023-05-11 |
| InstructBLIP-7B | InstructBLIP: Towards General-purpose Vision-Lang… | 44.63 | 2023-05-11 |
| LLaVA-1-13B | Visual Instruction Tuning | 43.50 | 2023-04-17 |
| Otter-7B | Otter: A Multi-Modal Model with In-Context Instru… | 39.13 | 2023-05-05 |
| MiniGPT4-13B | MiniGPT-4: Enhancing Vision-Language Understandin… | 34.93 | 2023-04-20 |
| MiniGPTv2-7B | MiniGPT-v2: large language model as a unified int… | 30.10 | 2023-10-14 |