ChartQA

Chart Question Answering Benchmark

27 results | Metric: 1:1 Accuracy

Top Performing Models

Rank	Model	Paper	1:1 Accuracy	Date	Code
1	ChartPaLI-5B + PaLM 2-S 📚	Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs	81.30	2024-03-19	-
2	Gemini Ultra	Gemini: A Family of Highly Capable Multimodal Models	80.80	2023-12-19	valdecy/pybibx
3	DePlot+FlanPaLM+Codex (PoT Self-Consistency)	DePlot: One-shot visual language reasoning by plot-to-table translation	79.30	2022-12-20	huggingface/transformers
4	ChartPaLI-5B 📚	Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs	77.30	2024-03-19	-
5	DePlot+Codex (PoT Self-Consistency)	DePlot: One-shot visual language reasoning by plot-to-table translation	76.70	2022-12-20	huggingface/transformers
6	ScreenAI 5B (4.62 B params, w/ OCR) 📚	ScreenAI: A Vision-Language Model for UI and Infographics Understanding	76.70	2024-02-07	google-research-datasets/screen_qa, google-research-datasets/screen_annotation
7	SMoLA-PaLI-X Specialist Model 📚	Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	74.60	2023-12-01	-
8	SMoLA-PaLI-X Generalist Model 📚	Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	73.80	2023-12-01	-
9	MatCha4096 + LaMenDa 📚	Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA	72.64	2024-01-01	-
10	PaLI-X (Single-task FT w/ OCR) 📚	PaLI-X: On Scaling up a Multilingual Vision and Language Model	72.30	2023-05-29	kyegomez/PALI, doc-doc/NExT-OE

All Papers (27)

Paper	Year	Model	Code
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs	2024	ChartPaLI-5B + PaLM 2-S	-
Gemini: A Family of Highly Capable Multimodal Models	2023	Gemini Ultra	valdecy/pybibx
DePlot: One-shot visual language reasoning by plot-to-table translation	2022	DePlot+FlanPaLM+Codex (PoT Self-Consistency)	huggingface/transformers
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs	2024	ChartPaLI-5B	-
DePlot: One-shot visual language reasoning by plot-to-table translation	2022	DePlot+Codex (PoT Self-Consistency)	huggingface/transformers
ScreenAI: A Vision-Language Model for UI and Infographics Understanding	2024	ScreenAI 5B (4.62 B params, w/ OCR)	google-research-datasets/screen_qa, google-research-datasets/screen_annotation
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	2023	SMoLA-PaLI-X Specialist Model	-
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	2023	SMoLA-PaLI-X Generalist Model	-
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA	2024	MatCha4096 + LaMenDa	-
PaLI-X: On Scaling up a Multilingual Vision and Language Model	2023	PaLI-X (Single-task FT w/ OCR)	kyegomez/PALI, doc-doc/NExT-OE
PaLI-X: On Scaling up a Multilingual Vision and Language Model	2023	PaLI-X (Single-task FT)	kyegomez/PALI, doc-doc/NExT-OE
PaLI-X: On Scaling up a Multilingual Vision and Language Model	2023	PaLI-X (Multi-task FT)	kyegomez/PALI, doc-doc/NExT-OE
DePlot: One-shot visual language reasoning by plot-to-table translation	2022	DePlot+FlanPaLM (Self-Consistency)	huggingface/transformers
PaLI-3 Vision Language Models: Smaller, Faster, Stronger	2023	PaLI-3	kyegomez/PALI3
PaLI-3 Vision Language Models: Smaller, Faster, Stronger	2023	PaLI-3 (w/ OCR)	kyegomez/PALI3
DePlot: One-shot visual language reasoning by plot-to-table translation	2022	DePlot+FlanPaLM (CoT)	huggingface/transformers
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	2023	Qwen-VL-Chat	qwenlm/qwen-vl, brandon3964/multimodal-task-vector
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning	2023	UniChart	vis-nlp/unichart
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	2023	Qwen-VL	qwenlm/qwen-vl, brandon3964/multimodal-task-vector
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding	2023	StructChart+GPT3.5 (STR ChartQA+SimChart9K)	alpha-innovator/chartvlm, unimodal4reasoning/chartvlm, unimodal4reasoning/simchart9k
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering	2022	MatCha	huggingface/transformers
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding	2023	StructChart+GPT3.5 (STR)	alpha-innovator/chartvlm, unimodal4reasoning/chartvlm, unimodal4reasoning/simchart9k
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	2022	Pix2Struct-large	huggingface/transformers, google-research/pix2struct
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	2022	Pix2Struct-base	huggingface/transformers, google-research/pix2struct
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning	2022	VisionTapas-OCR	vis-nlp/chartqa
DePlot: One-shot visual language reasoning by plot-to-table translation	2022	DePlot+GPT3 (Self-Consistency)	huggingface/transformers
DePlot: One-shot visual language reasoning by plot-to-table translation	2022	DePlot+GPT3 (CoT)	huggingface/transformers


Model	Paper	1:1 Accuracy	Date
ChartPaLI-5B + PaLM 2-S	Chart-based Reasoning: Transferring Capabilities …	81.30	2024-03-19
Gemini Ultra	Gemini: A Family of Highly Capable Multimodal Mod…	80.80	2023-12-19
DePlot+FlanPaLM+Codex (PoT Self-Consistency)	DePlot: One-shot visual language reasoning by plo…	79.30	2022-12-20
ChartPaLI-5B	Chart-based Reasoning: Transferring Capabilities …	77.30	2024-03-19
DePlot+Codex (PoT Self-Consistency)	DePlot: One-shot visual language reasoning by plo…	76.70	2022-12-20
ScreenAI 5B (4.62 B params, w/ OCR)	ScreenAI: A Vision-Language Model for UI and Info…	76.70	2024-02-07
SMoLA-PaLI-X Specialist Model	Omni-SMoLA: Boosting Generalist Multimodal Models…	74.60	2023-12-01
SMoLA-PaLI-X Generalist Model	Omni-SMoLA: Boosting Generalist Multimodal Models…	73.80	2023-12-01
MatCha4096 + LaMenDa	Synthesize Step-by-Step: Tools, Templates and LLM…	72.64	2024-01-01
PaLI-X (Single-task FT w/ OCR)	PaLI-X: On Scaling up a Multilingual Vision and L…	72.30	2023-05-29
PaLI-X (Single-task FT)	PaLI-X: On Scaling up a Multilingual Vision and L…	70.90	2023-05-29
PaLI-X (Multi-task FT)	PaLI-X: On Scaling up a Multilingual Vision and L…	70.60	2023-05-29
DePlot+FlanPaLM (Self-Consistency)	DePlot: One-shot visual language reasoning by plo…	70.50	2022-12-20
PaLI-3	PaLI-3 Vision Language Models: Smaller, Faster, S…	70.00	2023-10-13
PaLI-3 (w/ OCR)	PaLI-3 Vision Language Models: Smaller, Faster, S…	69.50	2023-10-13
DePlot+FlanPaLM (CoT)	DePlot: One-shot visual language reasoning by plo…	67.30	2022-12-20
Qwen-VL-Chat	Qwen-VL: A Versatile Vision-Language Model for Un…	66.30	2023-08-24
UniChart	UniChart: A Universal Vision-language Pretrained …	66.24	2023-05-24
Qwen-VL	Qwen-VL: A Versatile Vision-Language Model for Un…	65.70	2023-08-24
StructChart+GPT3.5 (STR ChartQA+SimChart9K)	StructChart: On the Schema, Metric, and Augmentat…	65.30	2023-09-20
MatCha	MatCha: Enhancing Visual Language Pretraining wit…	64.20	2022-12-19
StructChart+GPT3.5 (STR)	StructChart: On the Schema, Metric, and Augmentat…	60.70	2023-09-20
Pix2Struct-large	Pix2Struct: Screenshot Parsing as Pretraining for…	58.60	2022-10-07
Pix2Struct-base	Pix2Struct: Screenshot Parsing as Pretraining for…	56.00	2022-10-07
VisionTapas-OCR	ChartQA: A Benchmark for Question Answering about…	45.50	2022-03-19
DePlot+GPT3 (Self-Consistency)	DePlot: One-shot visual language reasoning by plo…	42.30	2022-12-20
DePlot+GPT3 (CoT)	DePlot: One-shot visual language reasoning by plo…	36.90	2022-12-20
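The tab-separated rows above can be summarized with a few lines of code. The following is a minimal Python sketch, not part of any official benchmark tooling, that keeps the highest 1:1 Accuracy per year of the listed date. It embeds a handful of rows copied from the table; the column name `Accuracy` and the function `best_by_year` are hypothetical names chosen for illustration.

```python
import csv
import io

# A few rows copied from the results table above (Model, 1:1 Accuracy, Date).
# "Accuracy" is a hypothetical short column name used only in this sketch.
ROWS = """Model\tAccuracy\tDate
ChartPaLI-5B + PaLM 2-S\t81.30\t2024-03-19
Gemini Ultra\t80.80\t2023-12-19
DePlot+FlanPaLM+Codex (PoT Self-Consistency)\t79.30\t2022-12-20
ChartPaLI-5B\t77.30\t2024-03-19
DePlot+Codex (PoT Self-Consistency)\t76.70\t2022-12-20
ScreenAI 5B (4.62 B params, w/ OCR)\t76.70\t2024-02-07
Pix2Struct-large\t58.60\t2022-10-07
VisionTapas-OCR\t45.50\t2022-03-19
"""


def best_by_year(text):
    """Return {year: (model, score)}, keeping the highest score seen per year."""
    best = {}
    # Tab is the delimiter, so commas inside model names are harmless.
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        year = row["Date"][:4]  # "2024-03-19" -> "2024"
        score = float(row["Accuracy"])
        # Strict ">" keeps the first row on ties (e.g. the two 76.70 entries).
        if year not in best or score > best[year][1]:
            best[year] = (row["Model"], score)
    return best


if __name__ == "__main__":
    for year, (model, score) in sorted(best_by_year(ROWS).items()):
        print(year, model, score)
```

On these rows the sketch returns ChartPaLI-5B + PaLM 2-S (81.30) for 2024, Gemini Ultra (80.80) for 2023, and DePlot+FlanPaLM+Codex (PoT Self-Consistency) (79.30) for 2022. Because the comparison is strict, tied scores keep whichever row appears first.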