GQA

Dataset Information
Modalities: Images, Texts
Introduced: 2019

Overview

The GQA dataset is a large-scale visual question answering dataset built on real images from the Visual Genome dataset, with balanced question-answer pairs. Each training and validation image is also associated with scene graph annotations describing the classes and attributes of the objects in the scene, as well as their pairwise relations. Along with the images and question-answer pairs, the GQA dataset provides two types of pre-extracted visual features for each image: convolutional grid features of size 7×7×2048 extracted from a ResNet-101 network trained on ImageNet, and object detection features of size Ndet×2048 (where Ndet is the number of objects detected in each image, capped at 100) from a Faster R-CNN detector.
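Because Ndet varies per image while models typically expect fixed-size inputs, a common preprocessing step is to pad the Ndet×2048 object features up to the 100-object cap and keep a mask of valid rows. The sketch below illustrates this, assuming the features are already loaded as NumPy arrays (the function and constant names are illustrative, not part of the official GQA tooling):

```python
import numpy as np

MAX_OBJECTS = 100   # GQA caps Faster R-CNN detections at 100 per image
FEATURE_DIM = 2048  # per-object feature dimension

def pad_object_features(feats):
    """Pad an (N_det, 2048) object-feature array to a fixed (100, 2048)
    array, returning the padded array and a boolean mask of valid rows."""
    n_det = min(feats.shape[0], MAX_OBJECTS)
    padded = np.zeros((MAX_OBJECTS, FEATURE_DIM), dtype=feats.dtype)
    padded[:n_det] = feats[:n_det]
    mask = np.zeros(MAX_OBJECTS, dtype=bool)
    mask[:n_det] = True
    return padded, mask

# The 7x7x2048 grid features are often flattened the same way,
# into 49 "region" tokens of dimension 2048:
def flatten_grid_features(grid):
    """Reshape (7, 7, 2048) grid features to (49, 2048)."""
    return grid.reshape(-1, FEATURE_DIM)
```

For example, an image with 37 detected objects yields a (100, 2048) padded array whose mask has 37 True entries; the mask lets attention layers ignore the zero-padded rows.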

Source: Language-Conditioned Graph Networks for Relational Reasoning
Image Source: https://arxiv.org/pdf/1902.09506.pdf

Variants: GQA test-std, GQA test-dev, GQA Test2019, GQA, GQA-OOD

Associated Benchmarks

This dataset is used in 5 benchmarks.

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering | LocVLM-L | Learning to Localize Objects Improves … | 2024-04-11
Visual Question Answering (VQA) | PEVL+ | PEVL: Position-enhanced Pre-training and Prompt … | 2022-05-23
Visual Question Answering (VQA) | RelViT | RelViT: Concept-guided Vision Transformer for … | 2022-04-24
Graph Question Answering | GraphVQA | GraghVQA: Language-Guided Graph Neural Networks … | 2021-04-20

Research Papers

Recent papers with results on this dataset: