| Model | Paper | FID ↓ | Date |
|---|---|---|---|
| RAT-Diffusion | Data Extrapolation for Text-to-image Generation o… | 5.00 | 2024-10-02 |
| Re-Imagen (Finetuned) | Re-Imagen: Retrieval-Augmented Text-to-Image Gene… | 5.25 | 2022-09-29 |
| U-ViT-S/2-Deep | All are Worth Words: A ViT Backbone for Diffusion… | 5.48 | 2022-09-25 |
| GLIGEN (fine-tuned, Detection + Caption data) | GLIGEN: Open-Set Grounded Text-to-Image Generation | 5.61 | 2023-01-17 |
| GLIGEN (fine-tuned, Detection data only) | GLIGEN: Open-Set Grounded Text-to-Image Generation | 5.82 | 2023-01-17 |
| U-ViT-S/2 | All are Worth Words: A ViT Backbone for Diffusion… | 5.95 | 2022-09-25 |
| ConPreDiff | Improving Diffusion-Based Image Synthesis with Co… | 6.21 | 2024-01-04 |
| TLDM | Truncated Diffusion Probabilistic Models and Diff… | 6.29 | 2022-02-19 |
| GLIGEN (fine-tuned, Grounding data) | GLIGEN: Open-Set Grounded Text-to-Image Generation | 6.38 | 2023-01-17 |
| RAPHAEL (zero-shot) | RAPHAEL: Text-to-Image Generation via Large Mixtu… | 6.61 | 2023-05-29 |
| ERNIE-ViLG 2.0 (zero-shot) | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion… | 6.75 | 2022-10-27 |
| Re-Imagen | Re-Imagen: Retrieval-Augmented Text-to-Image Gene… | 6.88 | 2022-09-29 |
| eDiff-I (zero-shot) | eDiff-I: Text-to-Image Diffusion Models with an E… | 6.95 | 2022-11-02 |
| Swinv2-Imagen | Swinv2-Imagen: Hierarchical Vision Transformer Di… | 7.21 | 2022-10-18 |
| Imagen (zero-shot) | Photorealistic Text-to-Image Diffusion Models wit… | 7.27 | 2022-05-23 |
| GigaGAN (Zero-shot, 64x64) | Scaling up GANs for Text-to-Image Synthesis | 7.28 | 2023-03-09 |
| StyleGAN-T (Zero-shot, 64x64) | StyleGAN-T: Unlocking the Power of GANs for Fast … | 7.30 | 2023-01-23 |
| Make-a-Scene (unfiltered) | Make-A-Scene: Scene-Based Text-to-Image Generatio… | 7.55 | 2022-03-24 |
| Kandinsky | Kandinsky: an Improved Text-to-Image Synthesis wi… | 8.03 | 2023-10-05 |
| Lafite | LAFITE: Towards Language-Free Training for Text-t… | 8.12 | 2021-11-27 |
| SiD-LSG (Data-free distillation, zero-shot FID) | Long and Short Guidance in Score identity Distill… | 8.15 | 2024-06-03 |
| simple diffusion (U-ViT) | Simple diffusion: End-to-end diffusion for high r… | 8.30 | 2023-01-26 |
| GigaGAN (Zero-shot, 256x256) | Scaling up GANs for Text-to-Image Synthesis | 9.09 | 2023-03-09 |
| XMC-GAN (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 9.30 | 2021-11-24 |
| XMC-GAN | Cross-Modal Contrastive Learning for Text-to-Imag… | 9.33 | 2021-01-12 |
| ChatPainter | ChatPainter: Improving Text to Image Generation u… | 9.74 | 2018-02-22 |
| StackGAN + VICTR | VICTR: Visual Information Captured Text Represent… | 10.38 | 2020-10-07 |
| DALL-E 2 | Hierarchical Text-Conditional Image Generation wi… | 10.39 | 2022-04-13 |
| Corgi-Semi | Shifted Diffusion for Text-to-image Generation | 10.60 | 2022-11-24 |
| Corgi | Shifted Diffusion for Text-to-image Generation | 10.88 | 2022-11-24 |
| TR0N (StyleGAN-XL, LAION2BCLIP, BLIP-2, zero-shot) | TR0N: Translator Networks for 0-Shot Plug-and-Pla… | 10.90 | 2023-04-26 |
| Make-a-Scene (unfiltered) | Make-A-Scene: Scene-Based Text-to-Image Generatio… | 11.84 | 2022-03-24 |
| GLIDE (zero-shot) | GLIDE: Towards Photorealistic Image Generation an… | 12.24 | 2021-12-20 |
| KNN-Diffusion | KNN-Diffusion: Image Generation via Large-Scale R… | 12.50 | 2022-04-06 |
| GALIP (CC12m) | GALIP: Generative Adversarial CLIPs for Text-to-I… | 12.54 | 2023-01-30 |
| Latent Diffusion (LDM-KL-8-G) | High-Resolution Image Synthesis with Latent Diffu… | 12.63 | 2021-12-20 |
| Stable Diffusion | Retrieval-Augmented Multimodal Language Modeling | 12.63 | 2022-11-22 |
| NÜWA (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 12.90 | 2021-11-24 |
| VQ-Diffusion-F | Vector Quantized Diffusion Model for Text-to-Imag… | 13.86 | 2021-11-29 |
| StyleGAN-T (Zero-shot, 256x256) | StyleGAN-T: Unlocking the Power of GANs for Fast … | 13.90 | 2023-01-23 |
| RAT-GAN | Recurrent Affine Transformation for Text-to-image… | 14.60 | 2022-04-22 |
| ERNIE-ViLG | ERNIE-ViLG: Unified Generative Pre-training for B… | 14.70 | 2021-12-31 |
| RA-CM3 (2.7B) | Retrieval-Augmented Multimodal Language Modeling | 15.70 | 2022-11-22 |
| CogView2 (6B, Finetuned) | CogView2: Faster and Better Text-to-Image Generat… | 17.70 | 2022-04-28 |
| DF-GAN (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 18.70 | 2021-11-24 |
| VQ-Diffusion-B | Vector Quantized Diffusion Model for Text-to-Imag… | 19.75 | 2021-11-29 |
| DM-GAN+CL | Improving Text-to-Image Synthesis Using Contrasti… | 20.79 | 2021-07-06 |
| FuseDream (few-shot, k=5) | FuseDream: Training-Free Text-to-Image Generation… | 21.16 | 2021-12-02 |
| FuseDream (k=5, 256) | FuseDream: Training-Free Text-to-Image Generation… | 21.16 | 2021-12-02 |
| FuseDream (k=10, 256) | FuseDream: Training-Free Text-to-Image Generation… | 21.89 | 2021-12-02 |
| AttnGAN+CL | Improving Text-to-Image Synthesis Using Contrasti… | 23.93 | 2021-07-06 |
| CogView2 (6B, Finetuned) | CogView2: Faster and Better Text-to-Image Generat… | 24.00 | 2022-04-28 |
| OP-GAN | Semantic Object Accuracy for Generative Text-to-I… | 24.70 | 2019-10-29 |
| DM-GAN (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 26.00 | 2021-11-24 |
| Lafite (zero-shot) | LAFITE: Towards Language-Free Training for Text-t… | 26.94 | 2021-11-27 |
| CogView | CogView: Mastering Text-to-Image Generation via T… | 27.10 | 2021-05-26 |
| CogView (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 27.10 | 2021-11-24 |
| DALL-E (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 27.50 | 2021-11-24 |
| DALL-E (12B) | Retrieval-Augmented Multimodal Language Modeling | 28.00 | 2022-11-22 |
| AttnGAN + VICTR | VICTR: Visual Information Captured Text Represent… | 29.26 | 2020-10-07 |
| Vanilla CM3 | Retrieval-Augmented Multimodal Language Modeling | 29.50 | 2022-11-22 |
| DM-GAN + VICTR | VICTR: Visual Information Captured Text Represent… | 32.37 | 2020-10-07 |
| DM-GAN | DM-GAN: Dynamic Memory Generative Adversarial Net… | 32.64 | 2019-04-02 |
| AttnGAN + OP | Generating Multiple Objects at Spatially Distinct… | 33.35 | 2019-01-03 |
| AttnGAN (256 x 256) | NÜWA: Visual Synthesis Pre-training for Neural vi… | 35.20 | 2021-11-24 |
| L-Verse-CC | L-Verse: Bidirectional Generation Between Image a… | 37.20 | 2021-11-22 |
| L-Verse | L-Verse: Bidirectional Generation Between Image a… | 45.80 | 2021-11-22 |
| StackGAN + OP | Generating Multiple Objects at Spatially Distinct… | 55.30 | 2019-01-03 |
| StackGAN-v1 | StackGAN++: Realistic Image Synthesis with Stacke… | 74.05 | 2017-10-19 |