Room-to-Room
R2R is a dataset for visually-grounded natural language navigation in real buildings. It requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is paired with a trajectory in the Matterport3D Simulator. The dataset contains roughly 22k instructions with an average length of 29 words. A test evaluation server for this dataset is available on EvalAI.
Variants: Room2Room, R2R
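Each training example pairs one or more instructions with a trajectory through a Matterport3D building. A minimal sketch of reading the public annotation files is shown below; the file name `R2R_train.json` and the field names (`scan`, `path`, `heading`, `instructions`) are assumptions based on the commonly distributed release and may need adjusting to your local copy.

```python
import json

# A minimal sketch, assuming the standard R2R annotation layout
# (R2R_train.json from the public release); file name and field
# names are assumptions -- adjust to your local copy.
with open("R2R_train.json") as f:
    episodes = json.load(f)

for ep in episodes[:3]:
    scan = ep["scan"]                 # Matterport3D building ID
    path = ep["path"]                 # list of panorama viewpoint IDs (the trajectory)
    heading = ep["heading"]           # initial agent heading in radians
    for instr in ep["instructions"]:  # typically several instructions per trajectory
        print(f"{scan}: {len(path)} viewpoints, {len(instr.split())} words")
```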
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Visual Navigation | SUSA | Agent Journey Beyond RGB: Unveiling … | 2024-12-09 |
Visual Navigation | NaviLLM | Towards Learning a Generalist Model … | 2023-12-04 |
Visual Navigation | VLN-PETL | VLN-PETL: Parameter-Efficient Transfer Learning for … | 2023-08-20 |
Visual Navigation | Meta-Explore | Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation … | 2023-03-07 |
Visual Navigation | BEVBert | BEVBert: Multimodal Map Pre-training for … | 2022-12-08 |
Visual Navigation | HOP | HOP: History-and-Order Aware Pre-training for … | 2022-03-22 |
Visual Navigation | DUET | Think Global, Act Local: Dual-scale … | 2022-02-23 |
Visual Navigation | VLN-BERT | A Recurrent Vision-and-Language BERT for … | 2020-11-26 |
Visual Navigation | Prevalent | Towards Learning a Generic Agent … | 2020-02-25 |
Visual Navigation | RCM+SIL (no early exploration) | Reinforced Cross-Modal Matching and Self-Supervised … | 2018-11-25 |
Visual Navigation | Seq2Seq baseline | Vision-and-Language Navigation: Interpreting visually-grounded navigation … | 2017-11-20 |
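Results on this benchmark are typically reported with metrics such as navigation error, success rate (SR), and success weighted by path length (SPL). The sketch below shows how SR and SPL could be computed locally; the 3 m success threshold and the use of geodesic path lengths follow common practice for this dataset, and the exact scoring on the EvalAI server may differ.

```python
# A minimal sketch of two common R2R metrics: success rate (SR) and
# SPL (success weighted by path length). The 3 m threshold and the
# geodesic-distance inputs are assumptions based on common practice.

def success_rate_and_spl(episodes, success_threshold=3.0):
    """episodes: iterable of dicts with
       'final_dist'   - geodesic distance from stop point to goal (m)
       'agent_length' - length of the agent's executed path (m)
       'gt_length'    - length of the ground-truth shortest path (m)
    """
    sr_total, spl_total, n = 0.0, 0.0, 0
    for ep in episodes:
        success = 1.0 if ep["final_dist"] <= success_threshold else 0.0
        denom = max(ep["agent_length"], ep["gt_length"])
        spl = success * ep["gt_length"] / denom if denom > 0 else 0.0
        sr_total += success
        spl_total += spl
        n += 1
    return sr_total / n, spl_total / n
```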