TextVQA is a benchmark dataset for visual reasoning over text in images.
TextVQA requires models to read and reason about text in images in order to answer questions about them. In other words, models must incorporate the text appearing in an image as an additional input modality and reason over it to answer TextVQA questions.
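As a rough illustration of what a single TextVQA example contains (an image, a question about text visible in that image, and a set of human-provided ground-truth answers), the snippet below loads the validation split with the Hugging Face `datasets` library. The repository id `facebook/textvqa` and the field names `question`, `answers`, and `image` are assumptions and may differ from the actual hosting; this is a sketch, not the official loading procedure.

```python
# Minimal sketch: inspect one TextVQA example, assuming a Hugging Face Hub mirror.
# The repo id and field names below are assumptions, not confirmed by this page.
from datasets import load_dataset

ds = load_dataset("facebook/textvqa", split="validation")

sample = ds[0]
print(sample["question"])   # a question that requires reading text in the image
print(sample["answers"])    # human-provided ground-truth answers
sample["image"].show()      # PIL image containing the text to be read
```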
Statistics
* 28,408 images from OpenImages
* 45,336 questions
* 453,360 ground truth answers (ten per question; see the accuracy sketch below)
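The figures above work out to ten ground-truth answers per question (453,360 / 45,336 = 10). Results on TextVQA are typically reported with the soft VQA accuracy metric, under which a prediction counts as fully correct when at least three of the ten annotators gave that answer. Below is a minimal sketch of that scoring rule; the function name is ours, and the answer normalization (casing, articles, punctuation) used by official evaluators is omitted.

```python
def vqa_soft_accuracy(prediction: str, ground_truth: list[str]) -> float:
    """Soft VQA accuracy: a prediction scores min(#matching annotators / 3, 1),
    so it is fully correct when at least 3 of the (typically 10) annotators
    gave the same answer. Answer normalization is omitted in this sketch."""
    matches = sum(answer == prediction for answer in ground_truth)
    return min(matches / 3.0, 1.0)

# Example: 4 of 10 annotators answered "coca cola", so that prediction scores 1.0.
answers = ["coca cola"] * 4 + ["coke"] * 6
print(vqa_soft_accuracy("coca cola", answers))  # 1.0
print(vqa_soft_accuracy("pepsi", answers))      # 0.0
```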
Variants: TextVQA Val, TextVQA Test, TextVQA test-standard, TextVQA
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Visual Question Answering (VQA) | Lyra-Pro | Lyra: An Efficient and Speech-Centric … | 2024-12-12 |