| Seed1.5-VL | Seed1.5-VL Technical Report | 60.00 | 2025-05-11 |
| VideoChat-Online (4B) | Online Video Understanding: OVBench and VideoChat… | 54.90 | 2024-12-31 |
| Gemini-1.5-Flash | Gemini 1.5: Unlocking multimodal understanding ac… | 50.70 | 2024-03-08 |
| Qwen2-VL (7B) | Qwen2-VL: Enhancing Vision-Language Model's Perce… | 49.70 | 2024-09-18 |
| LLaVA-OneVision (7B) | LLaVA-OneVision: Easy Visual Task Transfer | 49.50 | 2024-08-06 |
| InternVL2 (7B) | Expanding Performance Boundaries of Open-Source M… | 48.70 | 2024-12-06 |
| InternVL2 (4B) | Expanding Performance Boundaries of Open-Source M… | 44.10 | 2024-12-06 |
| LongVA (7B) | Long Context Transfer from Language to Vision | 43.60 | 2024-06-24 |
| LLaMA-VID (7B) | LLaMA-VID: An Image is Worth 2 Tokens in Large La… | 41.90 | 2023-11-28 |
| VTimeLLM (7B) | VTimeLLM: Empower LLM to Grasp Video Moments | 33.10 | 2023-11-30 |
| Flash-VStream (7B) | Flash-VStream: Memory-Based Real-Time Understandi… | 31.20 | 2024-06-12 |
| MovieChat (7B) | MovieChat: From Dense Token to Sparse Memory for … | 30.90 | 2023-07-31 |
| LITA (7B) | LITA: Language Instructed Temporal-Localization A… | 20.40 | 2024-03-27 |
| TimeChat (7B) | TimeChat: A Time-sensitive Multimodal Large Langu… | 12.80 | 2023-12-04 |
| VideoLLM-Online (7B) | VideoLLM-online: Online Video Large Language Mode… | 9.60 | 2024-06-17 |