Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation

(2025)

Paper Information
arXiv ID: 2506.17198

Abstract

[...]tional conditions to enhance diversity. We validate the model on both established and newly introduced simulation benchmarks, where it significantly outperforms prior state-of-the-art methods. Furthermore, we demonstrate its effectiveness and robustness through real-world robot experiments.

Summary

This paper introduces Dex1B, a dataset of one billion demonstrations for dexterous hand manipulation, focusing on grasping and articulation. The authors propose a data generation pipeline that combines optimization with a generative model to produce diverse, high-quality demonstrations efficiently. Geometric constraints built into the generative model address feasibility, and additional conditions address diversity. The proposed baseline model, DexSimple, is trained on this dataset and significantly outperforms prior state-of-the-art methods on both established and newly introduced simulation benchmarks. Real-world robot experiments further demonstrate the effectiveness and robustness of the Dex1B dataset and the DexSimple model.
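
For orientation, a conditional VAE (the generative model family named under Methods on this page) is usually trained by minimizing the negative evidence lower bound. The sketch below writes that standard objective with x a demonstration, c the conditioning input (for example object geometry) and z the latent variable. The last term is only a placeholder for the geometric constraints mentioned above: the symbols λ and L_geo, and the idea of adding the constraint as a weighted penalty, are assumptions for illustration and are not taken from the paper.

```latex
\mathcal{L}(\theta,\phi) =
  -\,\mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
  + D_{\mathrm{KL}}\big(q_\phi(z \mid x, c)\,\|\,p(z \mid c)\big)
  + \lambda\,\mathcal{L}_{\mathrm{geo}}(\hat{x}, c)
```

Here \hat{x} denotes the decoded demonstration. The first two terms are the usual reconstruction and KL terms; how the paper actually enforces geometric feasibility may differ.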

Methods

This paper employs the following methods:

  • Generative Model
  • Optimization Techniques
  • Conditional Variational Autoencoder (CVAE)
  • Debiasing Mechanism
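
This page does not describe how the debiasing mechanism works. One common way to counter bias in a generated dataset is to sample conditions or seeds with weights inversely proportional to how often their mode already appears, so rare modes are drawn more often. The sketch below illustrates that general idea only. The function names, the "common"/"rare" labels and the toy data are hypothetical, and this should not be read as the authors' implementation.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per item, proportional to 1 / count(label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def debiased_sample(items, labels, k, seed=0):
    """Draw k items with replacement, favouring under-represented labels."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(items, weights=weights, k=k)


# Hypothetical toy data: nine demonstrations from a common grasp mode
# and one from a rare mode.
labels = ["common"] * 9 + ["rare"]
items = list(range(len(labels)))

picked = debiased_sample(items, labels, k=10_000)
rare_share = sum(1 for i in picked if labels[i] == "rare") / len(picked)
# Each mode carries the same total weight, so rare_share should land
# near 0.5 rather than the raw dataset share of 0.1.
```

With these weights every mode contributes equally in expectation regardless of its raw count, which is the property a debiasing step of this kind is after.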

Models Used

  • DexSimple

Datasets

The following datasets were used in this research:

  • Dex1B

Evaluation Metrics

  • Success Rate
  • Q1-score
  • Penetration
  • H mean
  • H std
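
The page lists these metrics without defining them. As a rough illustration, the sketch below computes a success rate and, assuming that H mean and H std refer to the mean and standard deviation of per-joint angle entropy (a diversity measure used in related grasp-synthesis work, not confirmed by this page), an entropy summary from histograms. The function names, bin count and toy data are hypothetical. Q1-score and penetration depend on contact and mesh geometry and are left out.

```python
import math
from statistics import mean, pstdev


def success_rate(outcomes):
    """Fraction of trials marked successful (True)."""
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def histogram_entropy(values, lo, hi, bins=10):
    """Shannon entropy (in nats) of values binned uniformly on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp out-of-range values (and v == hi) into the edge bins.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def joint_entropy_stats(grasps, lo, hi, bins=10):
    """Mean and std of per-joint entropies; grasps is a list of joint-angle lists."""
    per_joint = [histogram_entropy(list(col), lo, hi, bins) for col in zip(*grasps)]
    return mean(per_joint), pstdev(per_joint)


# Hypothetical toy data: joint 0 varies across grasps, joint 1 never moves.
grasps = [[0.0, 0.5], [0.25, 0.5], [0.5, 0.5], [0.75, 0.5]]
h_mean, h_std = joint_entropy_stats(grasps, lo=0.0, hi=1.0, bins=4)
rate = success_rate([True, True, False, True])
```

Under this reading, a higher mean entropy indicates that joints take a wider range of values across the generated grasps, while the standard deviation indicates how unevenly that variety is spread over joints.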

Results

  • Dex1B comprises 1 billion diverse demonstrations for dexterous manipulation tasks, significantly improving upon previous datasets in scale and diversity.
  • DexSimple achieves state-of-the-art performance across dexterous manipulation tasks, showing a 22-point improvement over previous methods on the DexGraspNet benchmark.

Limitations

The authors identified the following limitations:

  • The method operates in an open-loop manner, making it prone to sim-to-real gaps and control errors.
  • The simulation filtering process for successful data is time-consuming and could be optimized further.
  • The approach mainly focuses on single-object scenarios, potentially requiring a stronger vision backbone for multi-object settings.

Technical Requirements

  • Number of GPUs: 1
  • GPU Type: RTX-3090
  • Compute Requirements: 2000 grasps for 6000 steps on a single GPU
