The GQA dataset is a large-scale visual question answering dataset with real images from the Visual Genome dataset and balanced question-answer pairs. Each training and validation image is also associated with scene graph annotations describing the classes and attributes of the objects in the scene, along with their pairwise relations. In addition to the images and question-answer pairs, the GQA dataset provides two types of pre-extracted visual features for each image: convolutional grid features of size 7×7×2048 extracted from a ResNet-101 network trained on ImageNet, and object detection features of size Ndet×2048 (where Ndet is the number of detected objects in each image, with a maximum of 100 per image) from a Faster R-CNN detector.
Source: Language-Conditioned Graph Networks for Relational Reasoning
Image Source: https://arxiv.org/pdf/1902.09506.pdf
Variants: GQA test-std, GQA test-dev, GQA Test2019, GQA, GQA-OOD
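A minimal sketch of reading the pre-extracted features described above is given below. The directory layout, file names (`gqa_objects_*.h5`, `gqa_spatial_*.h5`, `*_info.json`), and HDF5/JSON keys (`features`, `bboxes`, `file`, `idx`, `objectsNum`) reflect one commonly distributed GQA feature dump and are assumptions here; adjust them to match your local download.

```python
import json
import h5py
import numpy as np

# Assumed paths and identifiers -- replace with your own download location and image id.
FEATURE_DIR = "gqa/features"   # hypothetical path to the downloaded GQA features
IMAGE_ID = "2370799"           # hypothetical GQA image id

# The object-feature index maps each image id to the h5 chunk and row that hold it.
with open(f"{FEATURE_DIR}/gqa_objects_info.json") as f:
    obj_info = json.load(f)

entry = obj_info[IMAGE_ID]
with h5py.File(f"{FEATURE_DIR}/gqa_objects_{entry['file']}.h5", "r") as h5:
    num_objects = entry["objectsNum"]                          # Ndet <= 100 detected objects
    obj_feats = h5["features"][entry["idx"], :num_objects]     # (Ndet, 2048) Faster R-CNN features
    obj_boxes = h5["bboxes"][entry["idx"], :num_objects]       # (Ndet, 4) detection boxes

# The grid-feature index has the same structure; grid features are stored channel-first.
with open(f"{FEATURE_DIR}/gqa_spatial_info.json") as f:
    spatial_info = json.load(f)

entry = spatial_info[IMAGE_ID]
with h5py.File(f"{FEATURE_DIR}/gqa_spatial_{entry['file']}.h5", "r") as h5:
    grid_feats = h5["features"][entry["idx"]]                  # (2048, 7, 7) ResNet-101 grid features
    grid_feats = np.transpose(grid_feats, (1, 2, 0))           # -> (7, 7, 2048) as described above

print(obj_feats.shape, obj_boxes.shape, grid_feats.shape)
```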
This dataset is used in 5 benchmarks:
Task | Model | Paper | Date |
---|---|---|---|
Visual Question Answering | LocVLM-L | Learning to Localize Objects Improves … | 2024-04-11 |
Visual Question Answering (VQA) | PEVL+ | PEVL: Position-enhanced Pre-training and Prompt … | 2022-05-23 |
Visual Question Answering (VQA) | RelViT | RelViT: Concept-guided Vision Transformer for … | 2022-04-24 |
Graph Question Answering | GraphVQA | GraghVQA: Language-Guided Graph Neural Networks … | 2021-04-20 |