| UGround-V1-7B | Navigating the Digital World as Humans Do: Univer… | 86.34 | 2024-10-07 |
| Aguvis-7B | Aguvis: Unified Pure Vision Agents for Autonomous… | 83.00 | 2024-12-05 |
| OS-Atlas-Base-7B | OS-ATLAS: A Foundation Action Model for Generalis… | 82.47 | 2024-10-30 |
| Aria-UI | Aria-UI: Visual Grounding for GUI Instructions | 81.10 | 2024-12-20 |
| Aguvis-G-7B | Aguvis: Unified Pure Vision Agents for Autonomous… | 81.00 | 2024-12-05 |
| UGround-V1-2B | Navigating the Digital World as Humans Do: Univer… | 77.67 | 2024-10-07 |
| ShowUI | ShowUI: One Vision-Language-Action Model for GUI … | 75.10 | 2024-11-26 |
| ShowUI-G | ShowUI: One Vision-Language-Action Model for GUI … | 75.00 | 2024-11-26 |
| UGround | Navigating the Digital World as Humans Do: Univer… | 73.30 | 2024-10-07 |
| OmniParser | OmniParser for Pure Vision Based GUI Agent | 73.00 | 2024-08-01 |
| OS-Atlas-Base-4B | OS-ATLAS: A Foundation Action Model for Generalis… | 68.00 | 2024-10-30 |
| SeeClick | SeeClick: Harnessing GUI Grounding for Advanced V… | 53.40 | 2024-01-17 |
| CogAgent | CogAgent: A Visual Language Model for GUI Agents | 47.40 | 2023-12-14 |
| Qwen2-VL-7B | Qwen2-VL: Enhancing Vision-Language Model's Perce… | 42.10 | 2024-09-18 |
| Qwen-GUI | GUICourse: From General Vision Language Models to… | 28.60 | 2024-06-17 |
| MiniGPT-v2 | MiniGPT-v2: large language model as a unified int… | 5.70 | 2023-10-14 |
| Groma | Groma: Localized Visual Tokenization for Groundin… | 5.20 | 2024-04-19 |
| Qwen-VL | Qwen-VL: A Versatile Vision-Language Model for Un… | 5.20 | 2023-08-24 |