
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang (National Key Lab of General AI, Peking University; work done while visiting S-Lab, Nanyang Technological University), Zhaoxi Chen (S-Lab, Nanyang Technological University), Xiaokang Chen (National Key Lab of General AI, Peking University), Tengfei Wang (Shanghai AI Lab), Gang Zeng (National Key Lab of General AI, Peking University), Ziwei Liu (S-Lab, Nanyang Technological University) (2024)

Paper Information
arXiv ID
2402.05054
Venue
European Conference on Computer Vision (ECCV)
Domain
3D content generation (computer vision)

Abstract

The Large Multi-View Gaussian Model (LGM) generates high-resolution 3D models from text prompts or single-view images. Multi-view images are first produced from the text or single-view image input by leveraging multi-view diffusion models, and are then mapped to multi-view Gaussian features by an asymmetric U-Net backbone and fused for differentiable rendering. Extensive experiments demonstrate the high fidelity and efficiency of the approach. Notably, it maintains the fast speed of generating 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

Summary

The paper, titled "LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation," presents a novel framework for generating high-resolution 3D models from text prompts or single-view images using a Large Multi-View Gaussian Model (LGM). The authors argue that while existing methods rely on intensive computation, their approach leverages multi-view diffusion models to achieve efficient and high-fidelity 3D content creation in approximately 5 seconds. Key innovations include the use of multi-view Gaussian features and an asymmetric U-Net backbone for processing multi-view image data. The paper includes extensive experimental results that demonstrate the superiority of their approach in terms of detail, resolution, and generation speed for 3D assets across various applications such as digital games and virtual reality. Furthermore, it highlights the importance of data augmentation and introduces a method for converting generated 3D Gaussians into smooth polygonal meshes. The paper concludes by discussing the limitations of their method, particularly the dependency on the quality of input multi-view images and the potential for future refinements in multi-view diffusion modeling.
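
The pipeline described above can be sketched roughly as follows: a shared image-to-Gaussian network maps N multi-view images to per-pixel 3D Gaussian parameters, which are concatenated across views into a single Gaussian set for differentiable rendering. The snippet below is a minimal PyTorch illustration under stated assumptions, not the authors' released code; the channel sizes, the 6-dim camera-ray embedding, and the 14-dim Gaussian layout (position, scale, rotation quaternion, opacity, RGB) are assumptions made for the sketch.

```python
# Minimal sketch (not the official LGM implementation) of a feed-forward model that
# maps N multi-view images to per-pixel 3D Gaussian parameters, which are then fused
# into one Gaussian set. Channel sizes, depths, and the 14-dim layout are assumptions.

import torch
import torch.nn as nn

GAUSSIAN_DIM = 14  # assumed layout: 3 pos + 3 scale + 4 rot (quaternion) + 1 opacity + 3 RGB

class MultiViewGaussianNet(nn.Module):
    def __init__(self, in_ch=3 + 6, width=64):
        # in_ch: RGB plus an assumed 6-dim per-pixel camera ray embedding
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.SiLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width, GAUSSIAN_DIM, 3, padding=1),
        )
        # The decoder upsamples only once, so the Gaussian map comes out at half the
        # input resolution -- a stand-in for the asymmetric input/output resolutions.

    def forward(self, views):              # views: (B, N, C, H, W)
        B, N, C, H, W = views.shape
        x = views.reshape(B * N, C, H, W)  # process each view with shared weights
        gauss_maps = self.decoder(self.encoder(x))                  # (B*N, 14, H/2, W/2)
        # Flatten every pixel of every view into one Gaussian, then merge across views.
        gaussians = gauss_maps.flatten(2).transpose(1, 2)           # (B*N, P, 14)
        return gaussians.reshape(B, -1, GAUSSIAN_DIM)               # (B, N*P, 14)

if __name__ == "__main__":
    model = MultiViewGaussianNet()
    views = torch.randn(1, 4, 9, 256, 256)    # 4 input views, RGB + ray embedding
    print(model(views).shape)                  # torch.Size([1, 65536, 14])
```

In the full method, the fused Gaussians would be rendered with differentiable Gaussian splatting and supervised against ground-truth views at 512 resolution; the sketch stops at parameter prediction.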

Methods

This paper employs the following methods:

  • Gaussian Splatting (see the covariance-projection sketch after this list)
  • U-Net
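
As a reference for the Gaussian Splatting entry above, the sketch below shows the standard splatting math in NumPy: building a 3D covariance from a Gaussian's per-axis scale and rotation, then projecting it to a 2D screen-space covariance via the perspective Jacobian (EWA approximation). This reflects generic 3D Gaussian splatting, not code taken from the paper, and assumes the Gaussian mean is already expressed in camera coordinates.

```python
# Standard 3D Gaussian splatting covariance math (generic, not from the LGM paper).

import numpy as np

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_3d(scale, quat):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    M = quat_to_rotmat(quat) @ np.diag(scale)
    return M @ M.T

def project_covariance(cov3d, mean_cam, focal):
    """Project a camera-space 3D covariance to a 2x2 screen-space covariance
    using the Jacobian of the perspective projection at the Gaussian's mean."""
    x, y, z = mean_cam
    J = np.array([
        [focal / z, 0.0,       -focal * x / z**2],
        [0.0,       focal / z, -focal * y / z**2],
    ])
    return J @ cov3d @ J.T

if __name__ == "__main__":
    cov = covariance_3d(scale=np.array([0.02, 0.02, 0.01]),
                        quat=np.array([1.0, 0.0, 0.0, 0.0]))
    print(project_covariance(cov, mean_cam=np.array([0.1, -0.05, 2.0]), focal=500.0))
```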

Models Used

  • LGM

Datasets

The following datasets were used in this research:

  • Objaverse

Evaluation Metrics

  • Mean Squared Error (MSE)
  • LPIPS (see the evaluation sketch after this list)
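
A brief sketch of how these two image-space metrics are commonly computed with PyTorch and the lpips package is shown below; the VGG backbone, the [0, 1] input range, and the evaluation resolution are assumptions rather than details reported in this entry.

```python
# Minimal sketch of computing MSE and LPIPS between a rendered view and a reference
# view. Uses the `lpips` package; the VGG backbone choice is an assumption.

import torch
import lpips

def evaluate_views(rendered, reference):
    """rendered, reference: (B, 3, H, W) tensors with values in [0, 1]."""
    mse = torch.mean((rendered - reference) ** 2).item()
    lpips_fn = lpips.LPIPS(net='vgg')                 # perceptual similarity network
    # LPIPS expects inputs scaled to [-1, 1].
    d = lpips_fn(rendered * 2 - 1, reference * 2 - 1).mean().item()
    return {"mse": mse, "lpips": d}

if __name__ == "__main__":
    a, b = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
    print(evaluate_views(a, b))
```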

Results

  • High-resolution 3D content generation in 5 seconds
  • High fidelity and efficiency of generated models
  • Quality and resolution surpassing existing methods

Limitations

The authors identified the following limitations:

  • Generation quality depends on the quality of the multi-view images produced by the multi-view diffusion models
  • Multi-view diffusion modeling itself leaves room for future refinement

Technical Requirements

  • Number of GPUs: 32
  • GPU Type: NVIDIA A100 80G
