InfographicVQA

Dataset Information
Modalities: Images, Texts
Languages: English
Introduced: 2021
License: Unknown

Overview

InfographicVQA is a dataset comprising a diverse collection of infographics paired with natural-language question-and-answer annotations. Answering the questions requires methods to reason jointly over document layout, textual content, graphical elements, and data visualizations. The dataset is curated with an emphasis on questions that require elementary reasoning and basic arithmetic skills.
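To make the shape of such an annotation concrete, here is a minimal Python sketch of one record. The class name `InfographicQA`, its field names, the file path, and the example question are all hypothetical and are not taken from the dataset's actual release format. The exact-match check is a deliberate simplification for illustration and is not necessarily the metric the benchmark uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InfographicQA:
    """One hypothetical question-answer annotation for a single infographic."""
    image_path: str   # path to the infographic image (assumed layout)
    question: str     # natural-language question about the image
    answers: List[str] = field(default_factory=list)  # accepted answer strings


def is_correct(prediction: str, record: InfographicQA) -> bool:
    """Case-insensitive exact match against any accepted answer (simplified)."""
    normalized = prediction.strip().lower()
    return any(normalized == a.strip().lower() for a in record.answers)


# Invented record: the answer is not printed on the infographic itself and
# would have to be derived by adding two values shown in a chart.
example = InfographicQA(
    image_path="infographics/example.png",
    question="What is the combined share of the two largest categories?",
    answers=["65%", "65 percent"],
)
```

A model's output could then be checked with `is_correct(" 65% ", example)`. Storing several accepted answer strings per question is one simple way to tolerate surface variation in how an answer is written.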

Variants: InfographicVQA

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering (VQA) | ChatGPT 3.5 with LAPDoc Prompt (SpatialFormat) | LAPDoc: Layout-Aware Prompting for Documents | 2024-02-15
Visual Question Answering (VQA) | ScreenAI 5B (4.62 B params, w/ OCR) | ScreenAI: A Vision-Language Model for … | 2024-02-07
Visual Question Answering (VQA) | Gemini Ultra (pixel only) | Gemini: A Family of Highly … | 2023-12-19
Visual Question Answering (VQA) | SMoLA-PaLI-X Specialist | Omni-SMoLA: Boosting Generalist Multimodal Models … | 2023-12-01
Visual Question Answering (VQA) | SMoLA-PaLI-X Generalist | Omni-SMoLA: Boosting Generalist Multimodal Models … | 2023-12-01
Visual Question Answering (VQA) | PaLI-3 (w/ OCR) | PaLI-3 Vision Language Models: Smaller, … | 2023-10-13
Visual Question Answering (VQA) | PaLI-3 | PaLI-3 Vision Language Models: Smaller, … | 2023-10-13
Visual Question Answering (VQA) | DocFormerv2-large | DocFormerv2: Local Features for Document … | 2023-06-02
Visual Question Answering (VQA) | Claude + LATIN-Prompt | Layout and Task Aware Instruction … | 2023-06-01
Visual Question Answering (VQA) | GPT-3.5 + LATIN-Prompt | Layout and Task Aware Instruction … | 2023-06-01
Visual Question Answering (VQA) | PaLI-X (Single-task FT w/ OCR) | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | PaLI-X (Multi-task FT) | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | PaLI-X (Single-task FT) | PaLI-X: On Scaling up a … | 2023-05-29
Visual Question Answering (VQA) | DUBLIN (variable resolution) | DUBLIN -- Document Understanding By … | 2023-05-23
Visual Question Answering (VQA) | DUBLIN | DUBLIN -- Document Understanding By … | 2023-05-23
Visual Question Answering (VQA) | MatCha | MatCha: Enhancing Visual Language Pretraining … | 2022-12-19
Visual Question Answering (VQA) | UDOP (aux) | Unifying Vision, Text, and Layout … | 2022-12-05
Visual Question Answering (VQA) | UDOP | Unifying Vision, Text, and Layout … | 2022-12-05
Visual Question Answering (VQA) | Pix2Struct-large | Pix2Struct: Screenshot Parsing as Pretraining … | 2022-10-07
Visual Question Answering (VQA) | Pix2Struct-base | Pix2Struct: Screenshot Parsing as Pretraining … | 2022-10-07

Research Papers

Recent papers with results on this dataset: