| Model | Paper | Date |
|---|---|---|
| GPT-4o (gpt-4o-2024-11-20) | GPT-4 Technical Report | 2023-03-15 |
| GPT-4o (gpt-4o-2024-05-13) | GPT-4 Technical Report | 2023-03-15 |
| Gemini 1.5 Pro | Gemini 1.5: Unlocking multimodal understanding ac… | 2024-03-08 |
| Qwen2-VL-72B (qwen-vl-max-0809) | Qwen2-VL: Enhancing Vision-Language Model's Perce… | 2024-09-18 |
| gpt-4o-mini-2024-07-18 | GPT-4 Technical Report | 2023-03-15 |
| GPT-4 Turbo (gpt-4-0125-preview) | GPT-4 Technical Report | 2023-03-15 |
| Gemini Pro Vision | Gemini: A Family of Highly Capable Multimodal Mod… | 2023-12-19 |
| Qwen-VL-Max | Qwen-VL: A Versatile Vision-Language Model for Un… | 2023-08-24 |
| InternVL-Chat-V1-5 | How Far Are We to GPT-4V? Closing the Gap to Comm… | 2024-04-25 |
| CogVLM-Chat | CogVLM: Visual Expert for Pretrained Language Mod… | 2023-11-06 |
| IXC2-VL-7B | InternLM-XComposer2: Mastering Free-form Text-Ima… | 2024-01-29 |
| Emu2-Chat | Generative Multimodal Models are In-Context Learn… | 2023-12-20 |
| CogAgent-Chat | CogAgent: A Visual Language Model for GUI Agents | 2023-12-14 |
| LLaVA-v1.5-13B | Improved Baselines with Visual Instruction Tuning | 2023-10-05 |
| LLaVA-v1.5-7B | Improved Baselines with Visual Instruction Tuning | 2023-10-05 |
| Otter-9B | MIMIC-IT: Multi-Modal In-Context Instruction Tuni… | 2023-06-08 |
| OpenFlamingo-9B | OpenFlamingo: An Open-Source Framework for Traini… | 2023-08-02 |