| Model | Paper | Score | Date |
|---|---|---|---|
| Seed1.5-VL thinking | Seed1.5-VL Technical Report | 63.60 | 2025-05-11 |
| PLM-8B | PerceptionLM: Open-Access Data and Models for Det… | 63.50 | 2025-04-17 |
| Seed1.5-VL | Seed1.5-VL Technical Report | 61.50 | 2025-05-11 |
| V-JEPA 2 ViT-g 8B | V-JEPA 2: Self-Supervised Video Models Enable Und… | 60.60 | 2025-06-11 |
| PLM-3B | PerceptionLM: Open-Access Data and Models for Det… | 58.90 | 2025-04-17 |
| RRPO | Self-alignment of Large Video Language Models wit… | 56.50 | 2025-04-16 |
| Tarsier-34B | Tarsier: Recipes for Training and Evaluating Larg… | 55.50 | 2024-06-30 |
| Tarsier2-7B | Tarsier2: Advancing Large Vision-Language Models … | 54.70 | 2025-01-14 |
| Qwen2-VL-72B | Qwen2-VL: Enhancing Vision-Language Model's Perce… | 52.70 | 2024-09-18 |
| IXC-2.5 7B | InternLM-XComposer-2.5: A Versatile Large Vision … | 51.60 | 2024-07-03 |
| Aria | Aria: An Open Multimodal Native Mixture-of-Expert… | 51.00 | 2024-10-08 |
| PLM-1B | PerceptionLM: Open-Access Data and Models for Det… | 50.40 | 2025-04-17 |
| LLaVA-Video 72B | Video Instruction Tuning With Synthetic Data | 50.00 | 2024-10-03 |
| VideoLLaMA2 72B | VideoLLaMA 2: Advancing Spatial-Temporal Modeling… | 48.40 | 2024-06-11 |
| Gemini 1.5 Pro | Gemini 1.5: Unlocking multimodal understanding ac… | 47.60 | 2024-03-08 |
| Tarsier-7B | Tarsier: Recipes for Training and Evaluating Larg… | 46.90 | 2024-06-30 |
| LLaVA-Video 7B | Video Instruction Tuning With Synthetic Data | 45.60 | 2024-10-03 |
| Qwen2-VL-7B | Qwen2-VL: Enhancing Vision-Language Model's Perce… | 43.80 | 2024-09-18 |
| VideoLLaMA2 7B | VideoLLaMA 2: Advancing Spatial-Temporal Modeling… | 42.90 | 2024-06-11 |
| PLLaVA-34B | PLLaVA : Parameter-free LLaVA Extension from Imag… | 42.30 | 2024-04-25 |
| mPLUG-Owl3 | mPLUG-Owl3: Towards Long Image-Sequence Understan… | 42.20 | 2024-08-09 |
| VideoLLaMA2.1 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling… | 42.10 | 2024-06-11 |
| VideoGPT+ | VideoGPT+: Integrating Image and Video Encoders f… | 41.70 | 2024-06-13 |
| GPT4o 8 frames | GPT-4o System Card | 39.90 | 2024-10-25 |
| PLLaVA-13B | PLLaVA : Parameter-free LLaVA Extension from Imag… | 36.40 | 2024-04-25 |
| ST-LLM | ST-LLM: Large Language Models Are Effective Tempo… | 35.70 | 2024-03-30 |
| VideoChat2 | MVBench: A Comprehensive Multi-modal Video Unders… | 35.00 | 2023-11-28 |
| PLLaVA-7B | PLLaVA : Parameter-free LLaVA Extension from Imag… | 34.90 | 2024-04-25 |