InstantID: Zero-shot Identity-Preserving Generation in Seconds

Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, Yao Hu (InstantX Team; Xiaohongshu Inc; Peking University) (2024)

Paper Information

arXiv ID: 2401.07519
Venue: arXiv.org
Domain: Computer Vision, Artificial Intelligence, Deep Learning
SOTA Claim: Yes
Code: Available
Reproducibility: 8/10

Contents

Abstract
Methods
Datasets
Results
Limitations
Related Work
External Resources

Abstract

There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.

Summary

The paper introduces InstantID, a diffusion-based method for zero-shot identity-preserving image generation from a single facial image. Fine-tuning approaches to personalized synthesis (Textual Inversion, DreamBooth, LoRA) carry high storage costs, long fine-tuning times, and a need for multiple reference images, while existing ID embedding-based methods either require extensive fine-tuning, are incompatible with community pre-trained models, or lose face fidelity. InstantID addresses these issues by combining IdentityNet with a lightweight adaptive module. Generation is steered by strong semantic conditions (the facial image) and weak spatial conditions (the landmark image) alongside textual prompts, which preserves identity while leaving style open to the prompt. The method plugs into pre-trained text-to-image diffusion models such as SD1.5 and SDXL. The paper reports higher face fidelity than IP-Adapter and results competitive with training-based methods such as LoRA.
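The summary does not say how the lightweight adaptive module injects identity into the diffusion model. One common design for adapters of this kind, popularized by IP-Adapter, is decoupled cross-attention: the image queries attend to the text tokens and to the identity tokens in two separate branches, and the results are summed. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. All names (`decoupled_cross_attention`, `id_scale`, the toy tensors) are hypothetical, and the learned key/value projections of a real model are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on plain nested lists."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        output.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return output


def decoupled_cross_attention(queries, text_tokens, id_tokens, id_scale=1.0):
    """Sum a text branch and an identity branch that attend separately.

    Assumption: keys and values are the tokens themselves; a real model
    would apply separate learned projections in each branch.
    """
    text_out = attention(queries, text_tokens, text_tokens)
    id_out = attention(queries, id_tokens, id_tokens)
    return [
        [t + id_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, id_out)
    ]


# Toy inputs: 2 image queries, 3 text tokens, 1 identity token, width 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
id_tokens = [[0.5, -0.5]]

with_id = decoupled_cross_attention(queries, text_tokens, id_tokens, id_scale=1.0)
without_id = decoupled_cross_attention(queries, text_tokens, id_tokens, id_scale=0.0)
```

Setting `id_scale` to 0 recovers plain text-conditioned attention, which is why this kind of module can be added to a pre-trained model without changing its text pathway.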

Methods

This paper employs the following methods:

Diffusion model
IdentityNet

Models Used

Stable Diffusion
SD1.5
SDXL

Datasets

The following datasets were used in this research:

LAION-Face

Evaluation Metrics

None specified

Results

High fidelity in image personalization
Efficient use with only one reference image
Competitive performance compared to training-based methods

Limitations

The authors identified the following limitations:

Potential for biased outputs due to the facial model used
Ethical concerns regarding the generation of culturally inappropriate imagery
Challenges in decoupling facial attributes for more flexible editing

Technical Requirements

Number of GPUs: 48
GPU Type: NVIDIA H800 80GB

Keywords

Zero-shot, Identity-preserving, Image synthesis, Diffusion models, Facial identity, Personalized image generation

External Resources

Funding: Not specified
References: 28
Influential Citations: 41
