ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

Zhengyi Wang (Dept. of Comp. Sci. & Tech., Tsinghua-Bosch Joint ML Center, BNRist Center, Tsinghua University; ShengShu, Beijing, China), Cheng Lu (Dept. of Comp. Sci. & Tech., Tsinghua-Bosch Joint ML Center, BNRist Center, Tsinghua University), Yikai Wang, Fan Bao (Dept. of Comp. Sci. & Tech., Tsinghua-Bosch Joint ML Center, BNRist Center, Tsinghua University; ShengShu, Beijing, China), Chongxuan Li (Dept. of Comp. Sci. & Tech., Tsinghua-Bosch Joint ML Center, BNRist Center, Tsinghua University; Gaoling School of Artificial Intelligence; Key Laboratory of Big Data), Hang Su (Dept. of Comp. Sci. & Tech., Tsinghua-Bosch Joint ML Center, BNRist Center, Tsinghua University; Pazhou Laboratory (Huangpu), Guangzhou, China), Jun Zhu (Dept. of Comp. Sci. & Tech., Tsinghua-Bosch Joint ML Center, BNRist Center, Tsinghua University; ShengShu, Beijing, China; Pazhou Laboratory (Huangpu), Guangzhou, China) (2023)

Paper Information

arXiv ID

2305.16213

Venue

Neural Information Processing Systems

Domain

computer vision, machine learning, 3D modeling

SOTA Claim

Yes

Code

Available

Reproducibility

8/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

Score distillation sampling (SDS) has shown great promise in text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models, but suffers from over-saturation, over-smoothing, and low-diversity problems. In this work, we propose to model the 3D parameter as a random variable instead of a constant as in SDS and present variational score distillation (VSD), a principled particle-based variational framework to explain and address the aforementioned issues in text-to-3D generation. We show that SDS is a special case of VSD and leads to poor samples with both small and large CFG weights. In comparison, VSD works well with various CFG weights as ancestral sampling from diffusion models and simultaneously improves the diversity and sample quality with a common CFG weight (i.e., 7.5). We further present various improvements in the design space for text-to-3D such as distillation time schedule and density initialization, which are orthogonal to the distillation algorithm yet not well explored. Our overall approach, dubbed ProlificDreamer, can generate high rendering resolution (i.e., 512 × 512) and high-fidelity NeRF with rich structure and complex effects (e.g., smoke and drops). Further, initialized from NeRF, meshes fine-tuned by VSD are meticulously detailed and photo-realistic. Project page and codes: https://ml.cs.tsinghua.edu.cn/prolificdreamer/.

Summary

The paper introduces ProlificDreamer, a framework for high-fidelity text-to-3D generation that addresses the over-saturation, over-smoothing, and low diversity of Score Distillation Sampling (SDS). The authors propose Variational Score Distillation (VSD), which treats the 3D parameters as a random variable rather than a constant and optimizes a set of particles under a variational objective; SDS is recovered as a special case. Unlike SDS, VSD works across a range of classifier-free guidance (CFG) weights, including the common value of 7.5, and improves both diversity and sample quality. ProlificDreamer generates NeRFs rendered at 512 × 512 and, initialized from those NeRFs, detailed textured meshes. The paper also explores design choices that are orthogonal to the distillation algorithm, such as the distillation time schedule and density initialization, and reports better results than existing methods.
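
The contrast between the two objectives can be written compactly. The block below is a sketch in the notation commonly used for score distillation, not text from this page: g(θ, c) is assumed to be the differentiable renderer at camera pose c, y the text prompt, ω(t) a time-dependent weight, ε_pretrain the pretrained text-to-image noise predictor, and ε_φ a second noise predictor trained on renders of the current particles.

```latex
\begin{aligned}
x_t &= \alpha_t\, g(\theta, c) + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) &= \mathbb{E}_{t,\epsilon,c}\!\left[\omega(t)\,\big(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon\big)\,\frac{\partial g(\theta, c)}{\partial \theta}\right] \\
\nabla_\theta \mathcal{L}_{\mathrm{VSD}}(\theta) &= \mathbb{E}_{t,\epsilon,c}\!\left[\omega(t)\,\big(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon_\phi(x_t, t, c, y)\big)\,\frac{\partial g(\theta, c)}{\partial \theta}\right]
\end{aligned}
```

Read this way, the only difference is the baseline subtracted from the pretrained prediction: SDS subtracts the sampled noise ε, while VSD subtracts a learned estimate of the score of the noisy renders, which is how treating θ as a random variable enters the update.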

Methods

This paper employs the following methods:

Variational Score Distillation (VSD)
Score Distillation Sampling (SDS)
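
A minimal sketch of how the two listed methods differ at the level of a single update, assuming the usual noise-prediction formulation. Both scale a residual between the guided pretrained prediction and a baseline noise term; SDS uses the sampled Gaussian noise as the baseline, while VSD uses the output of a second, learned noise predictor. The function names and toy vectors are hypothetical, and the CFG convention (unconditional plus weight times the conditional difference) is an assumption borrowed from common Stable Diffusion pipelines.

```python
from typing import List

Vector = List[float]


def cfg_noise(eps_uncond: Vector, eps_cond: Vector, weight: float = 7.5) -> Vector:
    """Classifier-free guidance: move from the unconditional prediction
    toward the text-conditional one, scaled by the guidance weight."""
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def distillation_residual(eps_pretrained: Vector, eps_baseline: Vector,
                          w_t: float = 1.0) -> Vector:
    """Residual w(t) * (eps_pretrained - eps_baseline). In a full pipeline
    this would be backpropagated through the renderer to the 3D parameters."""
    return [w_t * (p - b) for p, b in zip(eps_pretrained, eps_baseline)]


# Toy stand-ins for network outputs on a 3-pixel "image" (illustrative only).
eps_uncond = [0.0, 1.0, -1.0]    # pretrained model, empty prompt
eps_cond = [1.0, 1.0, 0.0]       # pretrained model, text prompt
eps_sampled = [0.5, -0.5, 0.25]  # Gaussian noise added to the render (SDS baseline)
eps_phi = [0.75, 0.5, 0.0]       # learned predictor on current renders (VSD baseline)

eps_guided = cfg_noise(eps_uncond, eps_cond, weight=7.5)
sds_residual = distillation_residual(eps_guided, eps_sampled)
vsd_residual = distillation_residual(eps_guided, eps_phi)

print("guided:", eps_guided)
print("SDS residual:", sds_residual)
print("VSD residual:", vsd_residual)
```

Passing the sampled noise as the baseline reproduces the SDS residual, which mirrors the abstract's statement that SDS is a special case of VSD; the training loop for the second predictor is omitted here.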

Models Used

Neural Radiance Fields (NeRF)
Stable Diffusion

Datasets

The following datasets were used in this research:

None specified

Evaluation Metrics

3D-FID

Results

High-fidelity and photo-realistic textured meshes.
Diverse and semantically correct 3D scenes from text prompts.
Improved sample quality with VSD compared to SDS.

Limitations

The authors identified the following limitations:

Not specified

Technical Requirements

Number of GPUs: 1
GPU Type: NVIDIA A100

Keywords

Text-to-3D
Diffusion Models
NeRF
VSD
Score Distillation
High-Fidelity 3D Generation

External Resources

Funding: NSF of China Projects, Beijing Outstanding Young Scientist Program, and others
References: 57
Influential Citations: 130
