
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

Cheng Chi (Stanford University, Columbia University), Zhenjia Xu (Stanford University, Columbia University), Chuer Pan (Stanford University), Eric Cousineau (Toyota Research Institute), Benjamin Burchfiel (Toyota Research Institute), Siyuan Feng (Toyota Research Institute), Russ Tedrake (Toyota Research Institute), Shuran Song (Stanford University, Columbia University) (2024)

Paper Information
arXiv ID
2402.10329
Venue
Robotics: Science and Systems
Domain
robotics
SOTA Claim
Yes
Reproducibility
8/10

Abstract

Project page: umi-gripper.github.io

Fig. 1: Universal Manipulation Interface (UMI) is a portable, intuitive, low-cost data collection and policy learning framework. This framework allows us to transfer diverse human demonstrations to effective visuomotor policies. We showcase the framework for tasks that would be difficult with traditional teleoperation, such as dynamic, precise, bimanual, and long-horizon tasks. (Figure panel titles: Human Demonstration in Any Environment (visual diversity); for Any Actions (action diversity); for Many Robots (embodiment diversity).)

Summary

The paper presents the Universal Manipulation Interface (UMI), a framework for in-the-wild robot teaching that requires no physical robot during data collection: human demonstrations collected with a handheld device are transferred into effective visuomotor policies. The authors identify the critical issues that hindered action transfer in previous work: insufficient visual context, action imprecision, latency discrepancies, and insufficient policy representation. To address them, the UMI system equips a handheld gripper with a fisheye lens for wide visual context, recovers precise actions with visual-inertial SLAM (camera plus IMU data), and employs a Diffusion Policy to model multimodal action distributions. The paper showcases UMI's capability on a variety of complex manipulation tasks, exhibiting high transferability and generalization to novel environments and objects. Experimental results indicate a 70% success rate in out-of-distribution tests, demonstrating the effectiveness of the UMI framework for robotic manipulation and its potential for widespread data collection across diverse settings.
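
To make the policy component concrete, below is a minimal, hypothetical sketch of Diffusion Policy inference: an action chunk is sampled by starting from Gaussian noise and iteratively denoising it, conditioned on an observation feature. All names (`eps_theta`, `obs_feat`) and schedule constants are illustrative placeholders, and the untrained linear noise predictor merely keeps the example runnable; this is not the paper's implementation.

```python
import numpy as np

# Hypothetical stand-in for a trained noise-prediction network eps_theta.
# A real Diffusion Policy uses a trained CNN/transformer conditioned on
# camera observations; an untrained linear map keeps this sketch runnable.
rng = np.random.default_rng(0)
ACTION_DIM, OBS_DIM = 16, 32
W = rng.normal(0.0, 0.01, size=(ACTION_DIM + OBS_DIM + 1, ACTION_DIM))

def eps_theta(x_t, t, obs_feat):
    inp = np.concatenate([x_t, obs_feat, [t]])  # condition on obs and step
    return inp @ W

# DDPM-style noise schedule over T denoising steps (illustrative values)
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sample_action_chunk(obs_feat):
    """Reverse diffusion: Gaussian noise -> action chunk, given obs."""
    x = rng.standard_normal(ACTION_DIM)          # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_theta(x, t / T, obs_feat)      # predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        z = rng.standard_normal(ACTION_DIM) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * z         # x_{t-1}
    return x  # flattened chunk of future end-effector actions

chunk = sample_action_chunk(rng.standard_normal(OBS_DIM))
print(chunk.shape)  # (16,), e.g. 8 future steps of a 2-DoF toy action
```

Because different noise seeds denoise into different but equally valid trajectories, this sampler can represent the multimodal action distributions the summary refers to.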

Methods

This paper employs the following methods:

  • Diffusion Policy
  • SLAM (visual-inertial camera tracking; see the resampling sketch after this list)
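
Since UMI derives actions from SLAM-tracked gripper poses, the sensor and robot timelines must be reconciled (the latency discrepancies noted in the summary). Below is a minimal sketch under the assumption of a simple constant-latency model; the function and variable names are hypothetical, not the authors' code, and real 6-DoF actions would also require rotation interpolation (e.g., SLERP).

```python
import numpy as np

def resample_positions(pose_times, positions, query_times, latency=0.0):
    """Interpolate SLAM-tracked gripper positions onto the robot's
    control timeline, shifted by a measured sensor-to-robot latency."""
    shifted = query_times + latency  # sample where the sensor clock was
    return np.stack(
        [np.interp(shifted, pose_times, positions[:, d])
         for d in range(positions.shape[1])],
        axis=1,
    )

# Toy usage: a 30 Hz SLAM trajectory resampled to a 10 Hz action stream
t_slam = np.arange(0.0, 2.0, 1 / 30)                    # pose timestamps
xyz = np.stack([np.sin(t_slam), np.cos(t_slam), t_slam], axis=1)
t_policy = np.arange(0.0, 1.5, 1 / 10)                  # action timestamps
actions = resample_positions(t_slam, xyz, t_policy, latency=0.05)
print(actions.shape)  # (15, 3)
```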

Models Used

  • None specified

Datasets

The following datasets were used in this research:

  • None specified

Evaluation Metrics

  • None specified

Results

  • 70% success rate in out-of-distribution tests
  • 20/20 completion of cup arrangement task
  • 87.5% success rate in dynamic tossing task
  • 14/20 success rate in bimanual cloth folding task
  • 70% success rate in dish washing task

Limitations

The authors identified the following limitations:

  • Dependence on data filtering for kinematic feasibility
  • Data collection with the handheld gripper is less efficient than natural (bare-hand) human demonstration
  • Sensitivity of SLAM to low-texture environments

Technical Requirements

  • Number of GPUs: None specified
  • GPU Type: None specified

Keywords

Manipulation, Human demonstration, Visual-inertial SLAM, Diffusion Policy, In-the-wild robot data
