Jiaxiang Tang (National Key Lab of General AI, Peking University; work done while visiting S-Lab, Nanyang Technological University), Zhaoxi Chen (S-Lab, Nanyang Technological University), Xiaokang Chen (National Key Lab of General AI, Peking University), Tengfei Wang (Shanghai AI Lab), Gang Zeng (National Key Lab of General AI, Peking University), Ziwei Liu (S-Lab, Nanyang Technological University) (2024)
The paper, titled "LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation," presents a framework for generating high-resolution 3D models from text prompts or single-view images using a Large Multi-View Gaussian Model (LGM). The authors argue that existing methods rely on intensive computation, whereas their approach leverages multi-view diffusion models to create high-fidelity 3D content in approximately 5 seconds. The key innovations are the use of multi-view Gaussian features and an asymmetric U-Net backbone for processing multi-view image data. The paper reports extensive experiments in which the approach outperforms existing methods in detail, resolution, and generation speed, and it points to applications such as digital games and virtual reality. It also highlights the importance of data augmentation and introduces a method for converting the generated 3D Gaussians into smooth polygonal meshes. The paper concludes by discussing the method's limitations, particularly its dependence on the quality of the input multi-view images, and the potential for future refinements in multi-view diffusion modeling.
This paper employs the following methods:
The following datasets were used in this research:
The authors identified the following limitations: