
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang (National Key Lab of General AI, Peking University; work done while visiting S-Lab, Nanyang Technological University), Zhaoxi Chen (S-Lab, Nanyang Technological University), Xiaokang Chen (National Key Lab of General AI, Peking University), Tengfei Wang (Shanghai AI Lab), Gang Zeng (National Key Lab of General AI, Peking University), Ziwei Liu (S-Lab, Nanyang Technological University) (2024)

Paper Information
arXiv ID
2402.05054
Venue
European Conference on Computer Vision (ECCV)
Domain
3D content generation (computer vision)

Abstract

The Large Multi-View Gaussian Model (LGM) generates high-resolution 3D models from text prompts or single-view images. Multi-view images are first produced from the text or single-view image input by leveraging multi-view diffusion models, and are then mapped to multi-view Gaussian features by an asymmetric U-Net backbone and fused for differentiable rendering. Extensive experiments demonstrate the high fidelity and efficiency of the approach. Notably, it maintains the fast speed of generating 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.

Summary

The paper, titled "LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation," presents a novel framework for generating high-resolution 3D models from text prompts or single-view images using a Large Multi-View Gaussian Model (LGM). The authors argue that while existing methods rely on intensive computation, their approach leverages multi-view diffusion models to achieve efficient and high-fidelity 3D content creation in approximately 5 seconds. Key innovations include the use of multi-view Gaussian features and an asymmetric U-Net backbone for processing multi-view image data. The paper includes extensive experimental results that demonstrate the superiority of their approach in terms of detail, resolution, and generation speed for 3D assets across various applications such as digital games and virtual reality. Furthermore, it highlights the importance of data augmentation and introduces a method for converting generated 3D Gaussians into smooth polygonal meshes. The paper concludes by discussing the limitations of their method, particularly the dependency on the quality of input multi-view images and the potential for future refinements in multi-view diffusion modeling.
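
The pipeline described above can be sketched roughly as follows: a shared image-to-Gaussian network maps N multi-view images to per-pixel 3D Gaussian parameters, which are concatenated across views into a single Gaussian set for differentiable rendering. The snippet below is a minimal PyTorch illustration under stated assumptions, not the authors' released code; the channel sizes, the 6-dim camera-ray embedding, and the 14-dim Gaussian layout (position, scale, rotation quaternion, opacity, RGB) are assumptions made for the sketch.

```python
# Minimal sketch (not the official LGM implementation) of a feed-forward model that
# maps N multi-view images to per-pixel 3D Gaussian parameters, which are then fused
# into one Gaussian set. Channel sizes, depths, and the 14-dim layout are assumptions.

import torch
import torch.nn as nn

GAUSSIAN_DIM = 14  # assumed layout: 3 pos + 3 scale + 4 rot (quaternion) + 1 opacity + 3 RGB

class MultiViewGaussianNet(nn.Module):
    def __init__(self, in_ch=3 + 6, width=64):
        # in_ch: RGB plus an assumed 6-dim per-pixel camera ray embedding
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.SiLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width, GAUSSIAN_DIM, 3, padding=1),
        )
        # The decoder upsamples only once, so the Gaussian map comes out at half the
        # input resolution -- a stand-in for the asymmetric input/output resolutions.

    def forward(self, views):              # views: (B, N, C, H, W)
        B, N, C, H, W = views.shape
        x = views.reshape(B * N, C, H, W)  # process each view with shared weights
        gauss_maps = self.decoder(self.encoder(x))                  # (B*N, 14, H/2, W/2)
        # Flatten every pixel of every view into one Gaussian, then merge across views.
        gaussians = gauss_maps.flatten(2).transpose(1, 2)           # (B*N, P, 14)
        return gaussians.reshape(B, -1, GAUSSIAN_DIM)               # (B, N*P, 14)

if __name__ == "__main__":
    model = MultiViewGaussianNet()
    views = torch.randn(1, 4, 9, 256, 256)    # 4 input views, RGB + ray embedding
    print(model(views).shape)                  # torch.Size([1, 65536, 14])
```

In the full method, the fused Gaussians would be rendered with differentiable Gaussian splatting and supervised against ground-truth views at 512 resolution; the sketch stops at parameter prediction.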

Methods

This paper employs the following methods:

  • Gaussian Splatting (see the covariance-projection sketch after this list)
  • U-Net
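
As a reference for the Gaussian Splatting entry above, the sketch below shows the standard splatting math in NumPy: building a 3D covariance from a Gaussian's per-axis scale and rotation, then projecting it to a 2D screen-space covariance via the perspective Jacobian (EWA approximation). This reflects generic 3D Gaussian splatting, not code taken from the paper, and assumes the Gaussian mean is already expressed in camera coordinates.

```python
# Standard 3D Gaussian splatting covariance math (generic, not from the LGM paper).

import numpy as np

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_3d(scale, quat):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    M = quat_to_rotmat(quat) @ np.diag(scale)
    return M @ M.T

def project_covariance(cov3d, mean_cam, focal):
    """Project a camera-space 3D covariance to a 2x2 screen-space covariance
    using the Jacobian of the perspective projection at the Gaussian's mean."""
    x, y, z = mean_cam
    J = np.array([
        [focal / z, 0.0,       -focal * x / z**2],
        [0.0,       focal / z, -focal * y / z**2],
    ])
    return J @ cov3d @ J.T

if __name__ == "__main__":
    cov = covariance_3d(scale=np.array([0.02, 0.02, 0.01]),
                        quat=np.array([1.0, 0.0, 0.0, 0.0]))
    print(project_covariance(cov, mean_cam=np.array([0.1, -0.05, 2.0]), focal=500.0))
```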

Models Used

  • LGM

Datasets

The following datasets were used in this research:

  • Objaverse

Evaluation Metrics

  • Mean Squared Error (MSE)
  • LPIPS (see the evaluation sketch after this list)
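
A brief sketch of how these two image-space metrics are commonly computed with PyTorch and the lpips package is shown below; the VGG backbone, the [0, 1] input range, and the evaluation resolution are assumptions rather than details reported in this entry.

```python
# Minimal sketch of computing MSE and LPIPS between a rendered view and a reference
# view. Uses the `lpips` package; the VGG backbone choice is an assumption.

import torch
import lpips

def evaluate_views(rendered, reference):
    """rendered, reference: (B, 3, H, W) tensors with values in [0, 1]."""
    mse = torch.mean((rendered - reference) ** 2).item()
    lpips_fn = lpips.LPIPS(net='vgg')                 # perceptual similarity network
    # LPIPS expects inputs scaled to [-1, 1].
    d = lpips_fn(rendered * 2 - 1, reference * 2 - 1).mean().item()
    return {"mse": mse, "lpips": d}

if __name__ == "__main__":
    a, b = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
    print(evaluate_views(a, b))
```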

Results

  • High-resolution 3D content generation in 5 seconds
  • High fidelity and efficiency of generated models
  • Quality and resolution surpassing existing methods

Limitations

The authors identified the following limitations:

  • Generation quality depends on the quality of the multi-view images produced by the multi-view diffusion models
  • Multi-view diffusion modeling itself leaves room for future refinement

Technical Requirements

  • Number of GPUs: 32
  • GPU Type: NVIDIA A100 80G
