We collect a new dataset of human-posed free-form natural language questions about CLEVR images. Many of these questions have out-of-vocabulary words and require reasoning skills that are absent from our model’s repertoire.
Variants: CLEVR-Humans
This dataset is used in 1 benchmark:
| Task | Model | Paper | Date |
|---|---|---|---|
| Visual Question Answering (VQA) | MDETR | MDETR -- Modulated Detection for … | 2021-04-26 |
| Visual Question Answering (VQA) | NS-VQA (1K programs) | Neural-Symbolic VQA: Disentangling Reasoning from … | 2018-10-04 |
| Visual Question Answering (VQA) | MAC | Compositional Attention Networks for Machine … | 2018-03-08 |
| Visual Question Answering (VQA) | CNN+GRU+FiLM | FiLM: Visual Reasoning with a … | 2017-09-22 |
| Visual Question Answering (VQA) | IEP-18K | Inferring and Executing Programs for … | 2017-05-10 |
Recent papers with results on this dataset: