A-OKVQA

Visual Question Answering (VQA) Benchmark

Performance Over Time

Showing 15 results. Metric: multiple-choice (MC) accuracy, in percent.

Top Performing Models

Rank	Model	Paper	MC Accuracy	Date	Code
1	SMoLA-PaLI-X Specialist Model	Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	83.75	2023-12-01	-
2	PaLI-X-VPD	Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models	80.40	2023-12-05	-
3	Prophet	Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering	75.10	2023-03-03	milvlg/prophet
4	PromptCap	PromptCap: Prompt-Guided Task-Aware Image Captioning	73.20	2022-11-15	Yushi-Hu/PromptCap
5	MC-CoT	Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training	71.00	2023-11-23	chengtan9907/mc-cot
6	A Simple Baseline for KB-VQA	A Simple Baseline for Knowledge-Based Visual Question Answering	57.50	2023-10-20	-
7	HYDRA	HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning	56.35	2024-03-19	ControlNet/HYDRA
8	GPV-2	Webly Supervised Concept Expansion for General Purpose Vision Models	53.70	2022-02-04	-
9	KRISP	KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA	42.20	2020-12-20	-
10	ViLBERT - VQA	ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	42.10	2019-08-06	facebookresearch/vilbert-multi-task, allenai/allennlp-models, jiasenlu/vilbert_beta

All Papers (15)

Paper	Year	Model	Code
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts	2023	SMoLA-PaLI-X Specialist Model	-
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models	2023	PaLI-X-VPD	-
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering	2023	Prophet	milvlg/prophet
PromptCap: Prompt-Guided Task-Aware Image Captioning	2022	PromptCap	Yushi-Hu/PromptCap
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training	2023	MC-CoT	chengtan9907/mc-cot
A Simple Baseline for Knowledge-Based Visual Question Answering	2023	A Simple Baseline for KB-VQA	-
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning	2024	HYDRA	ControlNet/HYDRA
Webly Supervised Concept Expansion for General Purpose Vision Models	2022	GPV-2	-
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA	2020	KRISP	-
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	2019	ViLBERT - VQA	facebookresearch/vilbert-multi-task, allenai/allennlp-models
LXMERT: Learning Cross-Modality Encoder Representations from Transformers	2019	LXMERT	huggingface/transformers, airsplay/lxmert
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	2019	ViLBERT	facebookresearch/vilbert-multi-task, allenai/allennlp-models
Pythia v0.1: the Winning Entry to the VQA Challenge 2018	2018	Pythia	facebookresearch/mmf, facebookresearch/pythia
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge	2022	VLC-BERT	aditya10/vlc-bert
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks	2019	ViLBERT - OK-VQA	facebookresearch/vilbert-multi-task, allenai/allennlp-models

Model	Paper	MC Accuracy	Date
SMoLA-PaLI-X Specialist Model	Omni-SMoLA: Boosting Generalist Multimodal Models…	83.75	2023-12-01
PaLI-X-VPD	Visual Program Distillation: Distilling Tools and…	80.40	2023-12-05
Prophet	Prophet: Prompting Large Language Models with Com…	75.10	2023-03-03
PromptCap	PromptCap: Prompt-Guided Task-Aware Image Caption…	73.20	2022-11-15
MC-CoT	Boosting the Power of Small Multimodal Reasoning …	71.00	2023-11-23
A Simple Baseline for KB-VQA	A Simple Baseline for Knowledge-Based Visual Ques…	57.50	2023-10-20
HYDRA	HYDRA: A Hyper Agent for Dynamic Compositional Vi…	56.35	2024-03-19
GPV-2	Webly Supervised Concept Expansion for General Pu…	53.70	2022-02-04
KRISP	KRISP: Integrating Implicit and Symbolic Knowledg…	42.20	2020-12-20
ViLBERT - VQA	ViLBERT: Pretraining Task-Agnostic Visiolinguisti…	42.10	2019-08-06
LXMERT	LXMERT: Learning Cross-Modality Encoder Represent…	41.60	2019-08-20
ViLBERT	ViLBERT: Pretraining Task-Agnostic Visiolinguisti…	41.50	2019-08-06
Pythia	Pythia v0.1: the Winning Entry to the VQA Challen…	40.10	2018-07-26
VLC-BERT	VLC-BERT: Visual Question Answering with Contextu…	38.05	2022-10-24
ViLBERT - OK-VQA	ViLBERT: Pretraining Task-Agnostic Visiolinguisti…	34.10	2019-08-06