R2R (Room-to-Room)

Dataset Information
Modalities: Images, Texts, Interactive
Introduced: 2018
Homepage:

Overview

R2R is a dataset for visually-grounded natural language navigation in real buildings. It requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is paired with a ground-truth trajectory in the Matterport3D Simulator. The dataset contains roughly 22k instructions, with an average length of 29 words. A test evaluation server for this dataset is hosted on EvalAI.

Source: Natural language interaction with robots

Variants: Room2Room, R2R
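
The R2R annotations are commonly distributed as per-split JSON files (e.g. R2R_train.json) in which each entry pairs a Matterport3D trajectory with several instruction strings. The sketch below is a minimal example, not the official loader: the file name and the "instructions" field are assumptions based on that common release format, and it simply reproduces the instruction-count and average-length statistics quoted above.

```python
import json

# Hypothetical local copy of the training split; the public release ships
# annotations as JSON files such as R2R_train.json (the exact path is an assumption).
ANNOTATION_FILE = "R2R_train.json"


def summarize_split(path: str) -> None:
    """Print instruction count and average instruction length for one split."""
    with open(path) as f:
        episodes = json.load(f)  # assumed: a list of dicts, one per trajectory

    instructions = []
    for ep in episodes:
        # Each episode is assumed to carry the Matterport3D scan id, the
        # ground-truth viewpoint path, and (typically) three instruction strings.
        instructions.extend(ep.get("instructions", []))

    n = len(instructions)
    avg_words = sum(len(text.split()) for text in instructions) / max(n, 1)
    print(f"{path}: {n} instructions, {avg_words:.1f} words on average")


if __name__ == "__main__":
    summarize_split(ANNOTATION_FILE)
```

Run against the training split, this should report a word count close to the 29-word average cited above; the unseen validation and test splits follow the same layout.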

Associated Benchmarks

This dataset is used in 1 benchmark: Visual Navigation.

Recent Benchmark Submissions

Task | Model | Paper | Date
Visual Navigation | SUSA | Agent Journey Beyond RGB: Unveiling … | 2024-12-09
Visual Navigation | NaviLLM | Towards Learning a Generalist Model … | 2023-12-04
Visual Navigation | VLN-PETL | VLN-PETL: Parameter-Efficient Transfer Learning for … | 2023-08-20
Visual Navigation | Meta-Explore | Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation … | 2023-03-07
Visual Navigation | BEVBert | BEVBert: Multimodal Map Pre-training for … | 2022-12-08
Visual Navigation | HOP | HOP: History-and-Order Aware Pre-training for … | 2022-03-22
Visual Navigation | DUET | Think Global, Act Local: Dual-scale … | 2022-02-23
Visual Navigation | VLN-BERT | A Recurrent Vision-and-Language BERT for … | 2020-11-26
Visual Navigation | Prevalent | Towards Learning a Generic Agent … | 2020-02-25
Visual Navigation | RCM+SIL (no early exploration) | Reinforced Cross-Modal Matching and Self-Supervised … | 2018-11-25
Visual Navigation | Seq2Seq baseline | Vision-and-Language Navigation: Interpreting visually-grounded navigation … | 2017-11-20

Research Papers

Recent papers with results on this dataset: