The RetVQA dataset is a large-scale benchmark for Retrieval-Based Visual Question Answering (RetVQA). RetVQA is more challenging than traditional VQA because a model must first retrieve the relevant images from a pool of candidates before answering a question: the information needed to answer may be spread across multiple images.
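The two-stage nature of the task can be illustrated with a minimal sketch. The functions passed in below (`score_relevance`, `generate_answer`) are hypothetical placeholders standing in for an image-question relevance model and a multi-image answer generator; they are not part of any released RetVQA codebase.

```python
# Minimal sketch of a retrieve-then-answer pipeline for a RetVQA-style task.
# `score_relevance` and `generate_answer` are hypothetical placeholders,
# not functions defined by the dataset or its paper.
from typing import Callable, List, Tuple


def answer_with_retrieval(
    question: str,
    image_pool: List[str],                              # IDs or paths of candidate images
    score_relevance: Callable[[str, str], float],       # (question, image) -> relevance score
    generate_answer: Callable[[str, List[str]], str],   # (question, images) -> free-form answer
    top_k: int = 2,                                     # RetVQA averages ~2 relevant images per question
) -> Tuple[List[str], str]:
    """Stage 1: rank the pool by relevance to the question.
    Stage 2: generate an answer from the top-k retrieved images."""
    ranked = sorted(image_pool, key=lambda img: score_relevance(question, img), reverse=True)
    retrieved = ranked[:top_k]
    return retrieved, generate_answer(question, retrieved)
```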
Here is a detailed summary of the RetVQA dataset:
- It is 20 times larger than the closest dataset in this setting, WebQA.
- It was derived from the Visual Genome dataset, utilising its questions and image annotations.
- It has 418K unique questions and 16,205 unique precise answers.
- The questions are designed to be metadata-independent, meaning they do not rely on information such as captions or tags.
- The questions are divided into five categories:
  - color
  - shape
  - count
  - object-attributes
  - relation-based
- The dataset includes both binary (yes/no) questions and open-ended questions that require a generative answer.
- All answers are free-form and fluent, even for binary questions. For example, a binary question may be "Do the rose and sunflower share the same colour?", and a corresponding answer would be "No, the rose and sunflower do not share the same colour".
- Every question in RetVQA requires reasoning over multiple images to arrive at the answer. This contrasts with datasets like WebQA, where a majority of questions can be answered using a single image.
- The dataset has, on average, two relevant images and 24.5 irrelevant images per question. This makes it more challenging than datasets like ISVQA, where images are homogeneous and no explicit retrieval is needed.
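Concretely, each question can be thought of as pairing a free-form answer with a small set of relevant images and a much larger set of distractors. The record layout below is an illustrative assumption for clarity, not the dataset's actual release schema; the example question and answer are taken from the description above.

```python
# Illustrative record layout for a single RetVQA question.
# Field names are assumptions; they do not reflect the dataset's actual format.
from dataclasses import dataclass
from typing import List


@dataclass
class RetVQASample:
    question: str                 # metadata-independent natural-language question
    answer: str                   # free-form, fluent answer (even for yes/no questions)
    question_type: str            # one of: color, shape, count, object-attributes, relation-based
    relevant_images: List[str]    # ~2 images per question, on average
    irrelevant_images: List[str]  # ~24.5 distractor images per question, on average


sample = RetVQASample(
    question="Do the rose and sunflower share the same colour?",
    answer="No, the rose and sunflower do not share the same colour",
    question_type="color",
    relevant_images=["img_001.jpg", "img_002.jpg"],
    irrelevant_images=[f"distractor_{i:02d}.jpg" for i in range(24)],
)
```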