CLEVR

Compositional Language and Elementary Visual Reasoning

Dataset Information
Modalities: Images, Texts
Introduced: 2016

Overview

CLEVR (Compositional Language and Elementary Visual Reasoning) is a synthetic Visual Question Answering dataset. It contains images of 3D-rendered objects; each image comes with a number of highly compositional questions that fall into different categories. These categories are grouped into 5 classes of tasks: Exist, Count, Compare Integer, Query Attribute and Compare Attribute.

The CLEVR dataset consists of:
- a training set of 70k images and 700k questions,
- a validation set of 15k images and 150k questions,
- a test set of 15k images and 150k questions,
- answers, scene graphs and functional programs for all train and validation images and questions.

Each object present in the scene, aside from its position, is characterized by a set of four attributes:
- 2 sizes: large, small,
- 3 shapes: cube, cylinder, sphere,
- 2 material types: rubber, metal,
- 8 color types: gray, blue, brown, yellow, red, green, purple, cyan,
resulting in 2 × 3 × 2 × 8 = 96 unique combinations.
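The attribute count and split totals quoted in the overview can be checked with a few lines of Python. This is only an illustrative sketch: the constant names, the `attribute_combinations` helper and the `SPLITS` dictionary are hypothetical and not part of any official CLEVR tooling, the box-shaped object is written as "cube", and the split sizes are the rounded figures from the overview rather than exact file counts.

```python
from itertools import product

# Attribute values as listed in the overview (names here are illustrative).
SIZES = ["large", "small"]
SHAPES = ["cube", "cylinder", "sphere"]  # CLEVR's box-shaped object is a cube
MATERIALS = ["rubber", "metal"]
COLORS = ["gray", "blue", "brown", "yellow", "red", "green", "purple", "cyan"]


def attribute_combinations():
    """Return every (size, color, material, shape) tuple, position excluded."""
    return list(product(SIZES, COLORS, MATERIALS, SHAPES))


# Rounded (images, questions) per split, taken from the overview text.
SPLITS = {
    "train": (70_000, 700_000),
    "val": (15_000, 150_000),
    "test": (15_000, 150_000),
}

combos = attribute_combinations()
total_images = sum(images for images, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

print(len(combos))       # 96 = 2 * 3 * 2 * 8
print(total_images)      # 100000
print(total_questions)   # 1000000
```

With these rounded figures, each split averages 10 questions per image.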

Source: On transfer learning using a MAC model variant
Image Source: Johnson et al.

Variants: CLEVR, CLEVR-CoGenT

Associated Benchmarks

This dataset is used in 3 benchmarks:

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Question Answering (VQA) | NeSyCoCo | NeSyCoCo: A Neuro-Symbolic Concept Composer … | 2024-12-20
Visual Question Answering | NeSyCoCo Neuro-Symbolic | NeSyCoCo: A Neuro-Symbolic Concept Composer … | 2024-12-20
Image Generation | Projected GAN | Projected GANs Converge Faster | 2021-11-01
Visual Question Answering (VQA) | MDETR | MDETR -- Modulated Detection for … | 2021-04-26
Image Generation | SAGAN | Generative Adversarial Transformers | 2021-03-01
Image Generation | VQGAN | Generative Adversarial Transformers | 2021-03-01
Image Generation | GAN | Generative Adversarial Transformers | 2021-03-01
Image Generation | StyleGAN2 | Generative Adversarial Transformers | 2021-03-01
Image Generation | GANformer | Generative Adversarial Transformers | 2021-03-01
Visual Question Answering (VQA) | OCCAM (ours) | Interpretable Visual Reasoning via Induced … | 2020-11-23
Visual Question Answering (VQA) | single-hop + LCGN (ours) | Language-Conditioned Graph Networks for Relational … | 2019-05-10
Visual Question Answering (VQA) | NS-CL | The Neuro-Symbolic Concept Learner: Interpreting … | 2019-04-26
Visual Question Answering (VQA) | XNM-Det supervised | Explainable and Explicit Visual Reasoning … | 2018-12-05
Visual Question Answering (VQA) | NS-VQA (1K programs) | Neural-Symbolic VQA: Disentangling Reasoning from … | 2018-10-04
Visual Question Answering (VQA) | QGHC+Att+Concat | Question-Guided Hybrid Convolution for Visual … | 2018-08-08
Visual Question Answering (VQA) | CNN + LSTM + RN + HAN | Learning Visual Question Answering by … | 2018-08-01
Visual Question Answering (VQA) | DDRprog* | DDRprog: A CLEVR Differentiable Dynamic … | 2018-03-30
Visual Question Answering (VQA) | TbD + reg + hres | Transparency by Design: Closing the … | 2018-03-14
Visual Question Answering (VQA) | MAC | Compositional Attention Networks for Machine … | 2018-03-08
Visual Question Answering (VQA) | CNN+GRU+FiLM | FiLM: Visual Reasoning with a … | 2017-09-22

Research Papers

Recent papers with results on this dataset: