Cheng Chi (Stanford University, Columbia University), Zhenjia Xu (Stanford University, Columbia University), Chuer Pan (Stanford University), Eric Cousineau (Toyota Research Institute), Benjamin Burchfiel (Toyota Research Institute), Siyuan Feng (Toyota Research Institute), Russ Tedrake (Toyota Research Institute), Shuran Song (Stanford University, Columbia University) (2024)
The paper presents the Universal Manipulation Interface (UMI), a framework that enables in-the-wild robot teaching without requiring a physical robot during data collection, allowing users to transfer diverse human demonstrations into effective visuomotor policies. The authors identify critical issues that hindered action transfer in previous work: insufficient visual context, action imprecision, latency discrepancies, and insufficient policy representation. To address them, the UMI system incorporates a handheld gripper with a fisheye lens for enhanced visual context, uses IMU data for action precision, and employs a Diffusion Policy to model multimodal action distributions. The paper showcases UMI's capability to perform a variety of complex manipulation tasks, exhibiting high transferability and generalization to novel environments and objects. Experimental results indicate a 70% success rate in out-of-distribution tests, demonstrating the effectiveness of the UMI framework for robotic manipulation and its potential for widespread data collection across diverse settings.
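To make the Diffusion Policy idea concrete, the following is a minimal sketch of diffusion-style action inference: a candidate action sequence over a short horizon is sampled as Gaussian noise and iteratively denoised, conditioned on the current observation. Everything here is an assumption for illustration, not UMI's actual architecture: the noise predictor is a fixed random linear map standing in for the trained network, and the dimensions, step count, and update rule are placeholders.

```python
import numpy as np

# Illustrative dimensions (assumptions, not UMI's real configuration).
HORIZON, ACT_DIM, OBS_DIM, STEPS = 16, 7, 32, 10

rng = np.random.default_rng(0)

# Stand-in "noise predictor": a fixed linear map from the concatenated
# observation and action sequence to a predicted noise tensor. In a real
# diffusion policy this is a trained neural network.
W = rng.normal(scale=0.01,
               size=(HORIZON * ACT_DIM, OBS_DIM + HORIZON * ACT_DIM))

def predict_noise(obs: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Predict the noise present in `actions`, conditioned on `obs`."""
    x = np.concatenate([obs, actions.ravel()])
    return (W @ x).reshape(HORIZON, ACT_DIM)

def infer_actions(obs: np.ndarray, steps: int = STEPS) -> np.ndarray:
    """Denoise a random action sequence into a (toy) policy output."""
    # Start from pure Gaussian noise over the action horizon.
    actions = rng.normal(size=(HORIZON, ACT_DIM))
    for _ in range(steps):
        eps_hat = predict_noise(obs, actions)
        # Simplified fixed-step denoising update (illustrative only; real
        # samplers use a noise schedule with step-dependent coefficients).
        actions = actions - 0.1 * eps_hat
    return actions

obs = rng.normal(size=OBS_DIM)
acts = infer_actions(obs)
print(acts.shape)  # (16, 7): one action per step of the horizon
```

Because the policy outputs a whole action sequence per inference call rather than a single step, this representation can capture multimodal behavior (e.g. approaching an object from either side) instead of averaging modes together.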