CLEVR-X

Name: CLEVR-X
Published: 2022-04-05
License: BSD-3-Clause License

Dataset Information

Modalities: Images
Introduced: 2022
License: BSD-3-Clause License

Overview

CLEVR-X extends the CLEVR dataset with natural language explanations for visual question answering (VQA). It contains 3.6 million natural language explanations for 850k question-image pairs.

For each question-image pair in CLEVR, CLEVR-X provides multiple structured textual explanations derived from the original scene graphs. By construction, these explanations are correct and describe the reasoning and visual information needed to answer the question.

The CLEVR-X dataset consists of:

A training set of 2,401,275 natural language explanations for 70,000 images.
A validation set of 599,711 natural language explanations for 14,000 images.
A test set of 644,151 natural language explanations for 15,000 images.
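
As a quick sanity check, the per-split explanation counts listed above can be summed to confirm the headline figure of roughly 3.6 million. The sketch below uses only the numbers quoted in this section; the names `EXPLANATIONS_PER_SPLIT`, `total_explanations`, and `split_shares` are illustrative and not part of any CLEVR-X tooling.

```python
# Explanation counts per split, copied from the list above.
EXPLANATIONS_PER_SPLIT = {
    "train": 2_401_275,
    "validation": 599_711,
    "test": 644_151,
}


def total_explanations(counts):
    """Return the total number of explanations across all splits."""
    return sum(counts.values())


def split_shares(counts):
    """Return each split's fraction of all explanations."""
    total = total_explanations(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = total_explanations(EXPLANATIONS_PER_SPLIT)
    print(f"total explanations: {total:,}")  # 3,645,137, i.e. about 3.6 million
    for name, share in split_shares(EXPLANATIONS_PER_SPLIT).items():
        print(f"{name}: {share:.1%}")
```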

Variants: CLEVR-X

Associated Benchmarks

This dataset is used in one benchmark:

Explanation Generation - Metrics: BLEU-4 (B4), METEOR (M), ROUGE-L (RL), CIDEr (C), accuracy (Acc)
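
To give a feel for how a text-overlap metric scores a generated explanation against a reference, here is a minimal sketch of ROUGE-L (RL in the metric list above), which is based on the longest common subsequence (LCS) of tokens. It assumes whitespace tokenization and a plain F1 combination of LCS precision and recall; standard captioning evaluation toolkits may tokenize and weight differently, so scores from this sketch are not comparable to leaderboard numbers. The function names and the example sentences are hypothetical and not taken from the dataset.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence ending before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping x or skipping y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented sentences, used only to illustrate the computation.
    generated = "there is a red cube"
    reference = "there is a large red cube"
    print(f"ROUGE-L F1: {rouge_l_f1(generated, reference):.3f}")
```

In this example the generated sentence is a full subsequence of the reference, so precision is 1 and recall is 5/6.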

Recent Benchmark Submissions

Task	Model	Paper	Date
Explanation Generation	PJ-X	CLEVR-X: A Visual Reasoning Dataset …	2022-04-05
Explanation Generation	FM	CLEVR-X: A Visual Reasoning Dataset …	2022-04-05

Research Papers

Recent papers with results on this dataset:

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations (2022)
