| Gemini Ultra (pixel only) | Gemini: A Family of Highly Capable Multimodal Mod… | 80.30 | 2023-12-19 |
| SMoLA-PaLI-X Specialist | Omni-SMoLA: Boosting Generalist Multimodal Models… | 66.20 | 2023-12-01 |
| ScreenAI 5B (4.62 B params, w/ OCR) | ScreenAI: A Vision-Language Model for UI and Info… | 65.90 | 2024-02-07 |
| SMoLA-PaLI-X Generalist | Omni-SMoLA: Boosting Generalist Multimodal Models… | 65.60 | 2023-12-01 |
| UDOP (aux) | Unifying Vision, Text, and Layout for Universal D… | 63.00 | 2022-12-05 |
| PaLI-3 (w/ OCR) | PaLI-3 Vision Language Models: Smaller, Faster, S… | 62.40 | 2023-10-13 |
| TILT-Large | Going Full-TILT Boogie on Document Understanding … | 61.20 | 2021-02-18 |
| PaLI-3 | PaLI-3 Vision Language Models: Smaller, Faster, S… | 57.80 | 2023-10-13 |
| ChatGPT 3.5 with LAPDoc Prompt (SpatialFormat) | LAPDoc: Layout-Aware Prompting for Documents | 54.90 | 2024-02-15 |
| PaLI-X (Single-task FT w/ OCR) | PaLI-X: On Scaling up a Multilingual Vision and L… | 54.80 | 2023-05-29 |
| Claude + LATIN-Prompt | Layout and Task Aware Instruction Prompt for Zero… | 54.51 | 2023-06-01 |
| PaLI-X (Multi-task FT) | PaLI-X: On Scaling up a Multilingual Vision and L… | 50.70 | 2023-05-29 |
| PaLI-X (Single-task FT) | PaLI-X: On Scaling up a Multilingual Vision and L… | 49.20 | 2023-05-29 |
| GPT-3.5 + LATIN-Prompt | Layout and Task Aware Instruction Prompt for Zero… | 48.98 | 2023-06-01 |
| DocFormerv2-large | DocFormerv2: Local Features for Document Understa… | 48.80 | 2023-06-02 |
| UDOP | Unifying Vision, Text, and Layout for Universal D… | 47.40 | 2022-12-05 |
| DUBLIN (variable resolution) | DUBLIN -- Document Understanding By Language-Imag… | 42.60 | 2023-05-23 |
| Pix2Struct-large | Pix2Struct: Screenshot Parsing as Pretraining for… | 40.00 | 2022-10-07 |
| Pix2Struct-base | Pix2Struct: Screenshot Parsing as Pretraining for… | 38.20 | 2022-10-07 |
| MatCha | MatCha: Enhancing Visual Language Pretraining wit… | 37.20 | 2022-12-19 |
| DUBLIN | DUBLIN -- Document Understanding By Language-Imag… | 36.82 | 2023-05-23 |