← ML Research Wiki / 2404.07191

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

Jiale Xu (ARC Lab, Tencent PCG; ShanghaiTech University), Weihao Cheng (ARC Lab, Tencent PCG), Yiming Gao (ARC Lab, Tencent PCG), Xintao Wang (ARC Lab, Tencent PCG), Shenghua Gao (ShanghaiTech University), Ying Shan (ARC Lab, Tencent PCG) (2024)
Code: https://github.com/TencentARC/InstantMesh

Paper Information

  • arXiv ID: 2404.07191
  • Venue: arXiv.org
  • Domain: computer vision / 3D reconstruction
  • SOTA Claim: Yes
  • Code: https://github.com/TencentARC/InstantMesh
  • Reproducibility: 8/10

Abstract

To enhance the training efficiency and exploit more geometric supervisions, e.g., depths and normals, we integrate a differentiable iso-surface extraction module into our framework and directly optimize on the mesh representation. Experimental results on public datasets demonstrate that InstantMesh significantly outperforms other latest image-to-3D baselines, both qualitatively and quantitatively. We release all the code, weights, and demo of InstantMesh, with the intention that it can make substantial contributions to the community of 3D generative AI and empower both researchers and content creators.

Summary

InstantMesh is a framework for efficiently generating 3D meshes from single images, leveraging advances in large reconstruction models and multi-view diffusion. The model combines a multi-view diffusion component, which produces consistent novel views from the input image, with a sparse-view reconstruction model that generates the mesh directly, achieving high quality in a significantly reduced timeframe. The paper highlights the integration of a differentiable iso-surface extraction module, which lets the framework optimize on the mesh representation directly and apply geometric supervision (depths and normals), improving training efficiency and output quality. InstantMesh is evaluated against existing methods on public datasets, showing notable improvements on both qualitative and quantitative metrics, and aims to bolster the 3D generative AI community.
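
The two-stage pipeline described above can be sketched as follows. This is an illustrative stand-in only: the function names, tensor shapes, and stub computations are hypothetical and do not reflect the authors' actual API or models.

```python
import numpy as np

def multiview_diffusion(image, n_views=6):
    """Stage 1 (stand-in): generate n_views consistent novel views.

    A real system would run a multi-view diffusion sampler; here we
    simply tile the input image to illustrate the data flow.
    """
    return np.repeat(image[None], n_views, axis=0)

def sparse_view_lrm(views):
    """Stage 2 (stand-in): regress a triplane from the sparse views.

    A real LRM-style transformer maps view tokens to triplane features;
    here we mean-pool the views into a crude global feature.
    """
    feat = views.mean(axis=(0, 1, 2))                       # (3,)
    return np.broadcast_to(feat, (3, 64, 64, views.shape[-1]))

image = np.random.rand(256, 256, 3)       # single input image
views = multiview_diffusion(image)        # (6, 256, 256, 3)
triplane = sparse_view_lrm(views)         # (3, 64, 64, 3)
print(views.shape, triplane.shape)
```

In the actual framework, the triplane would then be decoded and meshed by the differentiable iso-surface extraction module.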

Methods

This paper employs the following methods:

  • Multi-view diffusion
  • Sparse-view reconstruction
  • Differentiable iso-surface extraction
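
The third method, iso-surface extraction, converts an implicit field into an explicit mesh so that losses can be applied on the mesh directly. As a minimal (non-differentiable) analogue of the core idea, the sketch below finds grid cells that straddle the zero level set of a sphere's signed distance field; grid size and shapes are illustrative choices, not the paper's settings.

```python
import numpy as np

# Sample a signed distance field for a unit sphere on a regular grid.
n = 32
xs = np.linspace(-1.5, 1.5, n)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 1.0   # negative inside the sphere

# A cell straddles the iso-surface if its 8 corners disagree in sign.
corners = np.stack([
    sdf[i:n - 1 + i, j:n - 1 + j, k:n - 1 + k]
    for i in (0, 1) for j in (0, 1) for k in (0, 1)
])
crossing = (corners.min(axis=0) < 0) & (corners.max(axis=0) > 0)
print(int(crossing.sum()))  # number of grid cells straddling the surface
```

Schemes such as FlexiCubes (used by the paper) make this extraction step differentiable, so gradients from rendered depths and normals can flow back into the implicit field.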

Models Used

  • InstantMesh

Datasets

The following datasets were used in this research:

  • Google Scanned Objects
  • OmniObject3D

Evaluation Metrics

  • PSNR
  • SSIM
  • LPIPS
  • Chamfer Distance
  • F-Score
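
The geometry metrics above (Chamfer Distance and F-Score) compare a predicted point set against a ground-truth point set. A minimal brute-force numpy implementation, suitable only for small point clouds, looks like this; the threshold `tau` is an illustrative choice, not the paper's evaluation setting.

```python
import numpy as np

def chamfer_and_fscore(pred, gt, tau=0.05):
    """Symmetric Chamfer distance and F-score between point sets.

    pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
    tau is the distance threshold used for the F-score.
    """
    # Pairwise distances, (N, M); O(N*M) memory, fine for small clouds.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    d_pg = d.min(axis=1)          # each pred point -> nearest gt point
    d_gp = d.min(axis=0)          # each gt point -> nearest pred point
    chamfer = d_pg.mean() + d_gp.mean()
    precision = (d_pg < tau).mean()
    recall = (d_gp < tau).mean()
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, fscore

pts = np.random.rand(100, 3)
c, f = chamfer_and_fscore(pts, pts)
print(c, f)   # identical clouds: chamfer 0.0, F-score 1.0
```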

Results

  • Significantly outperforms other image-to-3D baselines both qualitatively and quantitatively
  • Achieves high-quality mesh generation within 10 seconds

Limitations

The authors identified the following limitations:

  • Resolution bottleneck due to the triplane decoder
  • Multi-view inconsistency from the diffusion model affects generation quality
  • FlexiCubes less effective for thin structures

Technical Requirements

  • Number of GPUs: 8
  • GPU Type: NVIDIA H800

Keywords

single-image 3D reconstruction, mesh generation, diffusion models, transformer models, differentiable iso-surface extraction
