CLEVR-X extends the CLEVR dataset with natural language explanations for visual question answering (VQA). It contains 3.6 million natural language explanations for 850k question-image pairs.
For each question-image pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information needed to answer a given question.
The CLEVR-X dataset consists of:
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Explanation Generation | PJ-X | CLEVR-X: A Visual Reasoning Dataset … | 2022-04-05 |
Explanation Generation | FM | CLEVR-X: A Visual Reasoning Dataset … | 2022-04-05 |
Recent papers with results on this dataset: