VSR

Visual Spatial Reasoning

Dataset Information
Modalities: Images, Texts
Languages: English
Introduced: 2022

Overview

The Visual Spatial Reasoning (VSR) corpus is a collection of caption-image pairs with true/false labels. Each caption describes the spatial relation between two individual objects in the image, and a vision-language model (VLM) must judge whether the caption correctly describes the image (True) or not (False).
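
To make the task format concrete, here is a minimal Python sketch of one caption-image pair and of scoring a model by accuracy on the true/false labels. The field names (`image`, `caption`, `label`), the `accuracy` helper, and the sample captions are assumptions made up for illustration; they are not the dataset's official schema or contents.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class VSRExample:
    """One caption-image pair. Field names are illustrative, not an official schema."""
    image: str    # path or identifier of the image (hypothetical)
    caption: str  # statement about the spatial relation of two objects
    label: bool   # True if the caption correctly describes the image


def accuracy(examples: Sequence[VSRExample],
             predict: Callable[[str, str], bool]) -> float:
    """Fraction of examples where predict(image, caption) matches the label."""
    if not examples:
        raise ValueError("no examples to evaluate")
    correct = sum(predict(ex.image, ex.caption) == ex.label for ex in examples)
    return correct / len(examples)


# Made-up examples for illustration only; not taken from the dataset.
examples = [
    VSRExample("img_001.jpg", "The cat is under the table.", True),
    VSRExample("img_002.jpg", "The cup is left of the laptop.", False),
    VSRExample("img_003.jpg", "The dog is in front of the sofa.", True),
    VSRExample("img_004.jpg", "The bird is on the bench.", False),
]


def always_true(image: str, caption: str) -> bool:
    """Trivial stand-in for a VLM: answers True for every pair."""
    return True


baseline = accuracy(examples, always_true)
print(baseline)
```

In practice `predict` would wrap a VLM that takes the image and the caption and returns a binary decision; the constant predictor here only shows how the labels are consumed.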

Variants: VSR

Associated Benchmarks

This dataset is used in 1 benchmark: Visual Reasoning.

Recent Benchmark Submissions

Task              Model             Paper                     Date
Visual Reasoning  LXMERT            Visual Spatial Reasoning  2022-04-30
Visual Reasoning  ViLT              Visual Spatial Reasoning  2022-04-30
Visual Reasoning  CLIP (finetuned)  Visual Spatial Reasoning  2022-04-30
Visual Reasoning  CLIP (frozen)     Visual Spatial Reasoning  2022-04-30
Visual Reasoning  VisualBERT        Visual Spatial Reasoning  2022-04-30
