
MVDREAM: MULTI-VIEW DIFFUSION FOR 3D GENERATION

Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang (ByteDance; University of California San Diego), 2023

Paper Information
  • arXiv ID: 2308.16512
  • Venue: International Conference on Learning Representations
  • Domain: Computer Vision / 3D Generative Models
  • SOTA Claim: Yes
  • Reproducibility: 7/10

Abstract

We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation. Our project page is https://MV-Dream.github.io

Summary

This paper introduces MVDream, a multi-view diffusion model designed to generate consistent multi-view images from text prompts. The model is trained on both 2D and 3D data, aiming to combine the generalizability of 2D diffusion models with the consistency of 3D renderings. The authors highlight challenges in existing 3D object generation methods, including template-based approaches and 2D-lifting techniques, which struggle with consistency across views. MVDream addresses these issues by serving as a 3D-aware prior within Score Distillation Sampling (SDS), improving both the consistency and quality of generated 3D assets. The paper outlines the model's architecture modifications and training data construction, and conducts extensive experiments comparing against state-of-the-art methods. Results indicate that MVDream significantly improves multi-view consistency and quality in 3D generation tasks, including personalized 3D generation through a DreamBooth-style adaptation.
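The 2D-lifting step referred to above is Score Distillation Sampling: views rendered from a differentiable 3D representation (e.g. a NeRF) are nudged toward the multi-view diffusion model's denoising prediction. Below is a minimal PyTorch-style sketch of one SDS update under that setup; `render_views`, `mv_unet`, and the timestep range are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of one SDS update with a multi-view diffusion prior (illustrative names).
import torch

def sds_step(scene_params, optimizer, render_views, mv_unet,
             text_emb, cameras, alphas_cumprod, w=1.0):
    x0 = render_views(scene_params, cameras)            # (B*V, C, H, W) rendered views
    t = torch.randint(20, 980, (1,), device=x0.device)  # random diffusion timestep
    eps = torch.randn_like(x0)

    a = alphas_cumprod[t].view(1, 1, 1, 1)               # standard DDPM forward process
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps

    with torch.no_grad():                                 # the diffusion prior stays frozen
        eps_pred = mv_unet(x_t, t, text_emb, cameras)     # joint noise prediction over views

    grad = w * (eps_pred - eps)                           # SDS gradient on the renderings
    loss = (grad * x0).sum()                              # surrogate loss: d(loss)/d(x0) == grad
    optimizer.zero_grad()
    loss.backward()                                       # gradients flow only through the renderer
    optimizer.step()
```

Because the noise prediction is made jointly over all views, the SDS gradient pushes every rendered view toward the same coherent object, which is where the improved consistency over per-view 2D priors comes from.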

Methods

This paper employs the following methods:

  • Score Distillation Sampling (SDS)
  • 3D Self-attention
  • Inflated 2D Self-attention (see the sketch after this list)
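The key architectural change is to keep the 2D UNet but let its self-attention layers attend across all views of an object at once, i.e. "inflating" the per-image self-attention into a cross-view one. A minimal sketch of the reshaping involved, assuming the usual (batch × views, tokens, channels) layout, is shown below; the module is illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class InflatedSelfAttention(nn.Module):
    """Self-attention computed jointly over all V views of an object.

    Input follows the usual 2D-UNet token layout (B*V, H*W, C); tokens of the
    V views are concatenated so attention connects pixels across views.
    Illustrative sketch only.
    """
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, num_views):
        bv, hw, c = x.shape
        b = bv // num_views
        x = x.reshape(b, num_views * hw, c)   # inflate: one token sequence spanning all views
        x, _ = self.attn(x, x, x)             # cross-view self-attention
        return x.reshape(bv, hw, c)           # restore the per-view layout
```

Because the attention operation itself is unchanged, the pre-trained 2D self-attention weights can be reused in this cross-view form, which is what allows fine-tuning directly from a Stable Diffusion checkpoint.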

Models Used

  • Stable Diffusion
  • DreamBooth

Datasets

The following datasets were used in this research:

  • Objaverse (see the loading sketch after this list)
  • LAION
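For the 3D branch, multi-view images are rendered from Objaverse assets, while LAION supplies ordinary 2D text-image pairs. As a rough starting point, the community `objaverse` Python package can fetch assets; the snippet below is a hedged sketch assuming that package's API, not the authors' rendering pipeline.

```python
# Sketch: fetching a few Objaverse assets with the `objaverse` pip package (assumed API).
import objaverse

uids = objaverse.load_uids()                          # all object UIDs in the release
annotations = objaverse.load_annotations(uids=uids[:8])  # metadata (e.g. names) usable for captions
objects = objaverse.load_objects(uids=uids[:8])       # {uid: local path to the downloaded .glb}

for uid, path in objects.items():
    print(uid, annotations[uid].get("name", ""), path)
```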

Evaluation Metrics

  • Frechet Inception Distance (FID)
  • Inception Score (IS)
  • CLIP Score (a computation sketch follows this list)
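CLIP score measures how well each generated view matches the text prompt. A minimal sketch using the Hugging Face `transformers` CLIP model is shown below; the checkpoint choice and the averaging over views are illustrative assumptions, not necessarily the paper's exact evaluation protocol.

```python
# Sketch: text-image CLIP score for a set of generated views (illustrative setup).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_paths, prompt):
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Cosine similarity between each image embedding and the text embedding.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).mean().item()   # averaged over the provided views
```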

Results

  • MVDream surpasses or matches the diversity of existing state-of-the-art methods.
  • The model can generate high-quality 3D assets consistently.
  • In a user study, 78% of participants preferred MVDream's results over those of competing methods.

Limitations

The authors identified the following limitations:

  • The model currently generates images only at 256×256 resolution, lower than the original Stable Diffusion model's 512×512.
  • The generalizability is limited by the base model.
  • Generated styles of the model are influenced by the quality of the rendered dataset.

Technical Requirements

  • Number of GPUs: 32
  • GPU Type: Nvidia Tesla A100

Keywords

multi-view diffusion, 3D generation, diffusion models, NeRF, Score Distillation Sampling, DreamBooth
