VizWiz-VQA

Dataset Information
Modalities: Images, Texts
Introduced: 2018
Homepage: https://vizwiz.org/tasks-and-datasets/vqa/

Overview

The VizWiz-VQA dataset originates from a natural visual question answering setting in which blind people each took an image and recorded a spoken question about it; each visual question is paired with 10 crowdsourced answers. The associated challenge addresses two tasks for this dataset: (1) predict the answer to a visual question, and (2) predict whether a visual question cannot be answered.

Source: https://vizwiz.org/tasks-and-datasets/vqa/
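For orientation, here is a minimal sketch of reading the annotations and scoring a prediction. It assumes the official JSON annotation files and the commonly documented field names ("image", "question", "answers", "answerable"); the file path and the vqa_accuracy helper are illustrative, and the metric shown is the widely used VQA-style accuracy simplification (a prediction is fully correct if at least 3 of the 10 annotators gave it), not necessarily the exact official evaluation script.

```python
import json
from collections import Counter

# Minimal sketch, assuming the official VizWiz-VQA JSON annotations.
# Field names follow the commonly documented format; the path is hypothetical.
with open("Annotations/train.json") as f:
    records = json.load(f)

rec = records[0]
print(rec["image"])     # image file name
print(rec["question"])  # transcription of the spoken question

# Each visual question carries 10 crowdsourced answers.
answers = [a["answer"] for a in rec["answers"]]
majority, votes = Counter(answers).most_common(1)[0]
print(f"majority answer: {majority!r} ({votes}/10 annotators)")

# Task 2 (answerability): 1 if the question can be answered, 0 otherwise.
print(rec.get("answerable"))

def vqa_accuracy(prediction: str, answers: list[str]) -> float:
    """VQA-style accuracy (common simplification): a prediction is
    fully correct if at least 3 of the 10 annotators gave it."""
    matches = sum(prediction == a for a in answers)
    return min(matches / 3.0, 1.0)

print(vqa_accuracy(majority, answers))
```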

Variants: VizWiz Answer Differences 2019, VizWiz 2020 VQA, VizWiz 2020 test-dev, VizWiz 2020 test, VizWiz 2020 Answerability, VizWiz 2018 Answerability, VizWiz 2018, VizWiz

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

Task:  Visual Question Answering
Model: Emu-I *
Paper: Emu: Generative Pretraining in Multimodality
Date:  2023-07-11

Research Papers

Recent papers with results on this dataset: