Cheng Chi (Stanford University, Columbia University), Zhenjia Xu (Stanford University, Columbia University), Chuer Pan (Stanford University), Eric Cousineau (Toyota Research Institute), Benjamin Burchfiel (Toyota Research Institute), Siyuan Feng (Toyota Research Institute), Russ Tedrake (Toyota Research Institute), Shuran Song (Stanford University, Columbia University) (2024)
The paper presents the Universal Manipulation Interface (UMI), a framework that enables in-the-wild robot teaching without requiring a physical robot during data collection, allowing users to transfer diverse human demonstrations into effective visuomotor policies. The authors identify critical issues that hindered action transfer in previous work: insufficient visual context, action imprecision, latency discrepancies, and insufficient policy representation. To address them, the UMI system incorporates a handheld gripper with a fisheye lens for enhanced visual context, uses IMU data for action precision, and employs a Diffusion Policy to model multimodal action distributions. The paper showcases UMI's capability to perform a variety of complex manipulation tasks, exhibiting high transferability and generalization to novel environments and objects. Experimental results indicate a 70% success rate in out-of-distribution tests, demonstrating the effectiveness of the UMI framework for robotic manipulation and its potential for widespread data collection across diverse settings.
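To make the Diffusion Policy idea concrete, the following is a minimal sketch of diffusion-style action inference: a candidate action sequence over a short horizon is sampled as Gaussian noise and iteratively denoised, conditioned on the current observation. Everything here is an assumption for illustration, not UMI's actual architecture: the noise predictor is a fixed random linear map standing in for the trained network, and the dimensions, step count, and update rule are placeholders.

```python
import numpy as np

# Illustrative dimensions (assumptions, not UMI's real configuration).
HORIZON, ACT_DIM, OBS_DIM, STEPS = 16, 7, 32, 10

rng = np.random.default_rng(0)

# Stand-in "noise predictor": a fixed linear map from the concatenated
# observation and action sequence to a predicted noise tensor. In a real
# diffusion policy this is a trained neural network.
W = rng.normal(scale=0.01,
               size=(HORIZON * ACT_DIM, OBS_DIM + HORIZON * ACT_DIM))

def predict_noise(obs: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Predict the noise present in `actions`, conditioned on `obs`."""
    x = np.concatenate([obs, actions.ravel()])
    return (W @ x).reshape(HORIZON, ACT_DIM)

def infer_actions(obs: np.ndarray, steps: int = STEPS) -> np.ndarray:
    """Denoise a random action sequence into a (toy) policy output."""
    # Start from pure Gaussian noise over the action horizon.
    actions = rng.normal(size=(HORIZON, ACT_DIM))
    for _ in range(steps):
        eps_hat = predict_noise(obs, actions)
        # Simplified fixed-step denoising update (illustrative only; real
        # samplers use a noise schedule with step-dependent coefficients).
        actions = actions - 0.1 * eps_hat
    return actions

obs = rng.normal(size=OBS_DIM)
acts = infer_actions(obs)
print(acts.shape)  # (16, 7): one action per step of the horizon
```

Because the policy outputs a whole action sequence per inference call rather than a single step, this representation can capture multimodal behavior (e.g. approaching an object from either side) instead of averaging modes together.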