
A-OKVQA

Visual Question Answering (VQA) Benchmark

Performance Over Time

Showing 15 results | Metric: MC Accuracy (%)

Top Performing Models

| Rank | Model | Paper | MC Accuracy (%) | Date | Code |
|---|---|---|---|---|---|
| 1 | SMoLA-PaLI-X Specialist Model | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | 83.75 | 2023-12-01 | - |
| 2 | PaLI-X-VPD | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | 80.40 | 2023-12-05 | - |
| 3 | Prophet | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering | 75.10 | 2023-03-03 | milvlg/prophet |
| 4 | PromptCap | PromptCap: Prompt-Guided Task-Aware Image Captioning | 73.20 | 2022-11-15 | Yushi-Hu/PromptCap |
| 5 | MC-CoT | Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training | 71.00 | 2023-11-23 | chengtan9907/mc-cot |
| 6 | A Simple Baseline for KB-VQA | A Simple Baseline for Knowledge-Based Visual Question Answering | 57.50 | 2023-10-20 | - |
| 7 | HYDRA | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | 56.35 | 2024-03-19 | ControlNet/HYDRA |
| 8 | GPV-2 | Webly Supervised Concept Expansion for General Purpose Vision Models | 53.70 | 2022-02-04 | - |
| 9 | KRISP | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | 42.20 | 2020-12-20 | - |
| 10 | ViLBERT - VQA | ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | 42.10 | 2019-08-06 | facebookresearch/vilbert-multi-task, allenai/allennlp-models, jiasenlu/vilbert_beta |
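MC Accuracy is multiple-choice accuracy: the percentage of questions for which the model selects the annotated correct option from the four provided choices. The sketch below illustrates how this score is typically computed; the annotation field names (`question_id`, `choices`, `correct_choice_idx`) follow the public A-OKVQA annotation format, while the `predictions` mapping is a hypothetical model output used here only for illustration.

```python
# Minimal sketch of multiple-choice (MC) accuracy on A-OKVQA-style annotations.
# Assumes annotations carry 'question_id', 'choices', and 'correct_choice_idx';
# `predictions` maps question_id -> predicted choice index (0-3).

def mc_accuracy(annotations, predictions):
    correct = 0
    for ann in annotations:
        pred_idx = predictions.get(ann["question_id"])
        if pred_idx is not None and pred_idx == ann["correct_choice_idx"]:
            correct += 1
    # Report as a percentage, matching the leaderboard convention above.
    return 100.0 * correct / len(annotations)


if __name__ == "__main__":
    # Toy example: 2 of 3 questions answered with the correct choice -> 66.67.
    anns = [
        {"question_id": "q1", "choices": ["red", "blue", "green", "white"], "correct_choice_idx": 1},
        {"question_id": "q2", "choices": ["dog", "cat", "bird", "fish"], "correct_choice_idx": 0},
        {"question_id": "q3", "choices": ["one", "two", "three", "four"], "correct_choice_idx": 3},
    ]
    preds = {"q1": 1, "q2": 0, "q3": 2}
    print(f"MC Accuracy: {mc_accuracy(anns, preds):.2f}")
```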
