VizWiz-VQA
The VizWiz-VQA dataset originates from a natural visual question answering setting in which blind people each took an image and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. The proposed challenge addresses two tasks for this dataset: (1) predict the answer to a visual question and (2) predict whether a visual question cannot be answered.
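For task (1), predictions are typically scored against the 10 crowdsourced answers with the consensus accuracy metric used in the VQA challenges: an answer receives credit min(#annotators who gave it / 3, 1), averaged over leave-one-out subsets of the human answers. The snippet below is a minimal sketch of that scoring rule, not the official evaluation script; it assumes simple lower-cased exact-match comparison, whereas the official tooling applies additional answer normalization.

```python
from typing import List


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Consensus accuracy of one prediction against the 10 crowdsourced answers.

    Sketch only: real evaluation scripts normalize answers more aggressively
    (punctuation, articles, number words) before matching.
    """
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for i in range(len(answers)):
        # Leave annotator i out and count exact matches among the rest.
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        # Full credit once at least 3 of the remaining annotators agree.
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["coca cola", "coke", "coca cola", "coke", "coke",
            "coca cola", "soda", "coke", "coke", "coca cola"]
    print(f"accuracy: {vqa_accuracy('coke', gold):.3f}")
```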
Source: https://vizwiz.org/tasks-and-datasets/vqa/
Image Source: https://vizwiz.org/tasks-and-datasets/vqa/
Variants: VizWiz Answer Differences 2019, VizWiz 2020 VQA, VizWiz 2020 test-dev, VizWiz 2020 test, VizWiz 2020 Answerability, VizWiz 2018 Answerability, VizWiz 2018, VizWiz
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Visual Question Answering | Emu-I * | Emu: Generative Pretraining in Multimodality | 2023-07-11 |