
VOYAGER: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Jim Fan, Anima Anandkumar; NVIDIA, Caltech, UT Austin, UW Madison (2023)

Paper Information

arXiv ID

2305.16291

Venue

Trans. Mach. Learn. Res.

Domain

artificial intelligence, reinforcement learning, natural language processing

SOTA Claim

Yes

Reproducibility

7/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

We introduce VOYAGER, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. VOYAGER consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. VOYAGER interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by VOYAGER are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, VOYAGER shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3× more unique items, travels 2.3× longer distances, and unlocks key tech tree milestones up to 15.3× faster than prior SOTA. VOYAGER is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize.
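The skill library can be illustrated with a short Python sketch. It assumes that each skill is stored as executable code, indexed by a vector computed from a natural-language description of what the skill does, and retrieved by similarity to a query. The `embed` function is a toy bag-of-words stand-in for a real text-embedding model, the default of k = 5 retrieved skills is an assumption, and all names (`SkillLibrary`, `add`, `code`, `retrieve`, the example skills and their placeholder code strings) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Stores executable skill code, indexed by the vector of its description."""

    def __init__(self) -> None:
        self._skills = {}  # name -> (code, description vector)

    def add(self, name: str, description: str, code: str) -> None:
        # Index the skill by what it does (its description), not by its code text.
        self._skills[name] = (code, embed(description))

    def code(self, name: str) -> str:
        """Return the stored code of a skill so it can be reused or composed."""
        return self._skills[name][0]

    def retrieve(self, query: str, k: int = 5) -> list:
        """Return the names of the k skills whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self._skills, key=lambda n: cosine(q, self._skills[n][1]), reverse=True)
        return ranked[:k]


# Hypothetical example skills; the code strings are placeholders, not real VOYAGER skills.
lib = SkillLibrary()
lib.add("mineWoodLog", "mine a wood log from a nearby tree", "/* placeholder */")
lib.add("craftWoodenPickaxe", "craft a wooden pickaxe using planks and sticks", "/* placeholder */")
lib.add("smeltIronIngot", "smelt raw iron into an iron ingot in a furnace", "/* placeholder */")
print(lib.retrieve("craft a stone pickaxe", k=1))  # → ['craftWoodenPickaxe']
```

Indexing by description rather than by code means a new task phrased in plain language can surface related skills even when no code text overlaps.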

Summary

This paper presents VOYAGER, which the authors describe as the first embodied lifelong learning agent powered by large language models (LLMs), built to explore and learn in Minecraft. VOYAGER rests on three components: an automatic curriculum that maximizes exploration, a skill library that stores and retrieves complex behaviors as executable code, and an iterative prompting mechanism that refines programs using environment feedback, execution errors, and self-verification. The agent operates without human intervention, acquiring new skills, unlocking tech tree milestones, and writing its own executable code. Empirically, VOYAGER outperforms prior state-of-the-art methods: it obtains more unique items, travels longer distances, and unlocks tech tree milestones faster, and it can reuse its learned skill library to solve novel tasks in a new Minecraft world. Ablation studies examine the necessity of each module and highlight the importance of the automatic curriculum and the skill library to the agent's learning. The paper also discusses limitations and suggests future improvements to make task execution more accurate.
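The iterative prompting mechanism can be read as a bounded refinement loop: generate a program, run it, feed the environment feedback and any execution error into the next generation, and stop once a self-verification check reports success. The Python sketch below assumes exactly that structure. `generate`, `execute`, and `verify` are hypothetical callables standing in for the GPT-4 code generator, the Minecraft executor, and the GPT-4 self-verification query, and the default cap of four rounds is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    feedback: str          # environment feedback, e.g. in-game chat or state messages
    error: Optional[str]   # execution error trace, or None if the program ran cleanly
    success: bool          # verdict of the self-verification step


def iterative_prompting(
    task: str,
    generate: Callable[[str, list], str],
    execute: Callable[[str], tuple[str, Optional[str]]],
    verify: Callable[[str, str], bool],
    max_rounds: int = 4,
) -> tuple[Optional[str], list]:
    """Refine a program for `task` until self-verification passes or rounds run out.

    Returns (code, history); code is None if every round failed.
    """
    history: list = []
    for _ in range(max_rounds):
        # The generator is conditioned on the task plus every earlier attempt.
        code = generate(task, history)
        feedback, error = execute(code)
        # Only a program that ran without error is handed to the verifier.
        success = error is None and verify(task, feedback)
        history.append(Attempt(code, feedback, error, success))
        if success:
            return code, history
    return None, history


# Hypothetical stubs: the first draft fails with an error, the second one works.
def fake_generate(task, history):
    return "buggy()" if not history else "fixed()"


def fake_execute(code):
    if code == "fixed()":
        return "crafted 1 wooden_pickaxe", None
    return "", "ReferenceError: buggy is not defined"


def fake_verify(task, feedback):
    return "wooden_pickaxe" in feedback


code, history = iterative_prompting("craft a wooden pickaxe", fake_generate, fake_execute, fake_verify)
print(code, len(history))  # → fixed() 2
```

Passing the full `history` to `generate` is what lets each new attempt react to the previous error; a program that succeeds would then be a candidate for the skill library.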

Methods

This paper employs the following methods:

automatic curriculum
iterative prompting mechanism
skill library
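The automatic curriculum decides which task the agent should attempt next, given its current state and its record of completed and failed tasks. VOYAGER delegates that decision to GPT-4; the Python sketch below swaps in a plain rule-based selector only to show the curriculum's inputs and outputs, and it does not reproduce the exploration-maximizing behavior the paper attributes to the LLM. The task table, item names, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    inventory: dict
    completed: list = field(default_factory=list)
    failed: list = field(default_factory=list)


# Hypothetical task pool: task -> items it requires. VOYAGER instead asks GPT-4
# to propose tasks, so this table only illustrates the shape of the problem.
TASKS = {
    "mine 1 wood log": {},
    "craft 1 crafting table": {"oak_planks": 4},
    "craft 1 wooden pickaxe": {"oak_planks": 3, "stick": 2},
    "mine 1 cobblestone": {"wooden_pickaxe": 1},
}


def propose_next_task(state: AgentState) -> Optional[str]:
    """Pick a task not yet completed whose required items are in the inventory.

    Previously failed tasks are deferred rather than discarded, so they are
    retried only when no fresh task is available.
    """
    def ready(task: str) -> bool:
        needs = TASKS[task]
        return all(state.inventory.get(item, 0) >= n for item, n in needs.items())

    candidates = [t for t in TASKS if t not in state.completed and ready(t)]
    fresh = [t for t in candidates if t not in state.failed]
    pool = fresh or candidates
    return pool[0] if pool else None


state = AgentState(inventory={"oak_planks": 4, "stick": 2})
print(propose_next_task(state))  # → mine 1 wood log
state.completed.append("mine 1 wood log")
state.failed.append("craft 1 crafting table")
print(propose_next_task(state))  # → craft 1 wooden pickaxe
```

Deferring failed tasks is a design choice of this sketch: a task the agent cannot solve yet may become solvable after it has acquired more skills.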

Models Used

GPT-4

Datasets

The following datasets were used in this research:

None specified

Evaluation Metrics

Number of unique items obtained, distance traveled, and speed of unlocking tech tree milestones

Results

Compared with the prior SOTA, VOYAGER:

obtains 3.3× more unique items
unlocks key tech tree milestones up to 15.3× faster
travels 2.3× longer distances

Limitations

The authors identified the following limitations:

High cost of GPT-4 API
Occasional inaccuracies in code execution
Hallucinations in task proposals

Technical Requirements

Number of GPUs: None specified
GPU Type: None specified

Keywords

embodied agents, large language models, lifelong learning, Minecraft, program synthesis, automatic curriculum, self-verification

External Resources

Funding: Not specified
References: 89
Influential Citations: 70
