| Model | Paper | Score | Date |
|---|---|---|---|
| ChartPaLI-5B + PaLM 2-S | Chart-based Reasoning: Transferring Capabilities … | 81.30 | 2024-03-19 |
| Gemini Ultra | Gemini: A Family of Highly Capable Multimodal Mod… | 80.80 | 2023-12-19 |
| DePlot+FlanPaLM+Codex (PoT Self-Consistency) | DePlot: One-shot visual language reasoning by plo… | 79.30 | 2022-12-20 |
| ChartPaLI-5B | Chart-based Reasoning: Transferring Capabilities … | 77.30 | 2024-03-19 |
| DePlot+Codex (PoT Self-Consistency) | DePlot: One-shot visual language reasoning by plo… | 76.70 | 2022-12-20 |
| ScreenAI 5B (4.62 B params, w/ OCR) | ScreenAI: A Vision-Language Model for UI and Info… | 76.70 | 2024-02-07 |
| SMoLA-PaLI-X Specialist Model | Omni-SMoLA: Boosting Generalist Multimodal Models… | 74.60 | 2023-12-01 |
| SMoLA-PaLI-X Generalist Model | Omni-SMoLA: Boosting Generalist Multimodal Models… | 73.80 | 2023-12-01 |
| MatCha4096 + LaMenDa | Synthesize Step-by-Step: Tools, Templates and LLM… | 72.64 | 2024-01-01 |
| PaLI-X (Single-task FT w/ OCR) | PaLI-X: On Scaling up a Multilingual Vision and L… | 72.30 | 2023-05-29 |
| PaLI-X (Single-task FT) | PaLI-X: On Scaling up a Multilingual Vision and L… | 70.90 | 2023-05-29 |
| PaLI-X (Multi-task FT) | PaLI-X: On Scaling up a Multilingual Vision and L… | 70.60 | 2023-05-29 |
| DePlot+FlanPaLM (Self-Consistency) | DePlot: One-shot visual language reasoning by plo… | 70.50 | 2022-12-20 |
| PaLI-3 | PaLI-3 Vision Language Models: Smaller, Faster, S… | 70.00 | 2023-10-13 |
| PaLI-3 (w/ OCR) | PaLI-3 Vision Language Models: Smaller, Faster, S… | 69.50 | 2023-10-13 |
| DePlot+FlanPaLM (CoT) | DePlot: One-shot visual language reasoning by plo… | 67.30 | 2022-12-20 |
| Qwen-VL-Chat | Qwen-VL: A Versatile Vision-Language Model for Un… | 66.30 | 2023-08-24 |
| UniChart | UniChart: A Universal Vision-language Pretrained … | 66.24 | 2023-05-24 |
| Qwen-VL | Qwen-VL: A Versatile Vision-Language Model for Un… | 65.70 | 2023-08-24 |
| StructChart+GPT3.5 (STR ChartQA+SimChart9K) | StructChart: On the Schema, Metric, and Augmentat… | 65.30 | 2023-09-20 |
| MatCha | MatCha: Enhancing Visual Language Pretraining wit… | 64.20 | 2022-12-19 |
| StructChart+GPT3.5 (STR) | StructChart: On the Schema, Metric, and Augmentat… | 60.70 | 2023-09-20 |
| Pix2Struct-large | Pix2Struct: Screenshot Parsing as Pretraining for… | 58.60 | 2022-10-07 |
| Pix2Struct-base | Pix2Struct: Screenshot Parsing as Pretraining for… | 56.00 | 2022-10-07 |
| VisionTapas-OCR | ChartQA: A Benchmark for Question Answering about… | 45.50 | 2022-03-19 |
| DePlot+GPT3 (Self-Consistency) | DePlot: One-shot visual language reasoning by plo… | 42.30 | 2022-12-20 |
| DePlot+GPT3 (CoT) | DePlot: One-shot visual language reasoning by plo… | 36.90 | 2022-12-20 |