
InfoSeek

Visual Question Answering (VQA) Benchmark

Performance Over Time

Showing 6 results | Metric: Accuracy

Top Performing Models

| Rank | Model | Paper | Accuracy | Date | Code |
|---|---|---|---|---|---|
| 1 | RA-VQAv2 w/ PreFLMR | PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | 30.65 | 2024-02-13 | linweizhedragon/retrieval-augmented-visual-question-answering |
| 2 | PaLI-X | PaLI-X: On Scaling up a Multilingual Vision and Language Model | 24.00 | 2023-05-29 | kyegomez/PALI, doc-doc/NExT-OE |
| 3 | CLIP + FiD | Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | 20.90 | 2023-02-23 | open-vision-language/infoseek, edchengg/infoseek_eval |
| 4 | CLIP + PaLM (540B) | Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | 20.40 | 2023-02-23 | open-vision-language/infoseek, edchengg/infoseek_eval |
| 5 | PaLI | Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | 19.70 | 2023-02-23 | open-vision-language/infoseek, edchengg/infoseek_eval |
| 6 | BLIP2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | 14.60 | 2023-01-30 | huggingface/transformers, salesforce/lavis, thudm/visualglm-6b |

All Papers (6)