| Model | Paper | Score | Date |
|---|---|---|---|
| GPT-4V | GPT-4 Technical Report | 77.88 | 2023-03-15 |
| SPHINX v2 | SPHINX: The Joint Mixing of Weights, Tasks, and V… | 49.85 | 2023-11-13 |
| LLaVA-1.5 | Improved Baselines with Visual Instruction Tuning | 47.91 | 2023-10-05 |
| CogVLM-Chat | CogVLM: Visual Expert for Pretrained Language Mod… | 47.88 | 2023-11-06 |
| LLaMA-Adapter V2 | LLaMA-Adapter V2: Parameter-Efficient Visual Inst… | 46.12 | 2023-04-28 |
| Qwen-VL-Chat | Qwen-VL: A Versatile Vision-Language Model for Un… | 44.39 | 2023-08-24 |
| InstructBLIP | InstructBLIP: Towards General-purpose Vision-Lang… | 37.76 | 2023-05-11 |
| Emu | Emu: Generative Pretraining in Multimodality | 36.57 | 2023-07-11 |
| InternLM-XComposer-VL | InternLM-XComposer: A Vision-Language Large Model… | 35.97 | 2023-09-26 |
| Otter | Otter: A Multi-Modal Model with In-Context Instru… | 33.64 | 2023-05-05 |
| mPLUG-Owl2 | mPLUG-Owl2: Revolutionizing Multi-modal Large Lan… | 20.60 | 2023-11-07 |
| BLIP-2-OPT2.7B | BLIP-2: Bootstrapping Language-Image Pre-training… | 18.96 | 2023-01-30 |
| MiniGPT-v2 | MiniGPT-4: Enhancing Vision-Language Understandin… | 13.28 | 2023-04-20 |
| OpenFlamingo-v2 | OpenFlamingo: An Open-Source Framework for Traini… | 5.30 | 2023-08-02 |