WISE

Image Generation Benchmark

📊 Showing 13 results | 📏 Metric: Overall

Top Performing Models

Rank	Model	Paper	Overall	Date	Code
1	MindOmni (w/ cot)	MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	0.70	2025-05-19	📦 easonxiao-888/mindomni
2	Bagel (w/ cot)	Emerging Properties in Unified Multimodal Pretraining	0.69	2025-05-20	📦 ByteDance-Seed/Bagel 📦 neverbiasu/ComfyUI-BAGEL
3	Playground-v2.5-1024px-aesthetic	Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation	0.58	2024-02-27	
4	UniWorld-V1	UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	0.55	2025-06-03	📦 PKU-YuanGroup/UniWorld-V1 📦 pku-yuangroup/imgedit
5	Bagel	Emerging Properties in Unified Multimodal Pretraining	0.55	2025-05-20	📦 ByteDance-Seed/Bagel 📦 neverbiasu/ComfyUI-BAGEL
6	PixArt-XL-2-1024-MS	PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis	0.50	2023-09-30	📦 PixArt-alpha/PixArt-alpha 📦 Karine-Huang/T2I-CompBench 📦 swookey-thinky/image_diffusion
7	stable-diffusion-3.5-large	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis	0.50	2024-03-05	📦 Karine-Huang/T2I-CompBench 📦 hxixixh/adaflow
8	stable-diffusion-xl-base-0.9	SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis	0.48	2023-07-04	📦 stability-ai/generative-models 📦 compvis/fm-boosting 📦 yuchen413/text2image_safety
9	Emu3-gen	Emu3: Next-Token Prediction is All You Need	0.45	2024-09-27	📦 baaivision/emu3 📦 flagopen/flagscale
10	Show-o	Show-o: One Single Transformer to Unify Multimodal Understanding and Generation	0.40	2024-08-22	📦 showlab/show-o

All Papers (13)

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

2025

Janus

deepseek-ai/janus

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

2025

Janus-pro

deepseek-ai/janus

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

2025

MindOmni (w/o cot)

easonxiao-888/mindomni

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

2024

Show-o

showlab/show-o

Emu3: Next-Token Prediction is All You Need

2024

Emu3-gen

baaivision/emu3 flagopen/flagscale

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

2023

stable-diffusion-xl-base-0.9

stability-ai/generative-models compvis/fm-boosting

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

2023

PixArt-XL-2-1024-MS

PixArt-alpha/PixArt-alpha Karine-Huang/T2I-CompBench swookey-thinky/image_diffusion

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

2024

stable-diffusion-3.5-large

Karine-Huang/T2I-CompBench hxixixh/adaflow

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

2025

UniWorld-V1

PKU-YuanGroup/UniWorld-V1 pku-yuangroup/imgedit

Emerging Properties in Unified Multimodal Pretraining

2025

Bagel

ByteDance-Seed/Bagel neverbiasu/ComfyUI-BAGEL

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

2024

Playground-v2.5-1024px-aesthetic

Emerging Properties in Unified Multimodal Pretraining

2025

Bagel (w/ cot)

ByteDance-Seed/Bagel neverbiasu/ComfyUI-BAGEL

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

2025

MindOmni (w/ cot)

easonxiao-888/mindomni

Model	Paper	Overall	Date
Janus	Janus-Pro: Unified Multimodal Understanding and G…	0.26	2025-01-29
Janus-pro	Janus-Pro: Unified Multimodal Understanding and G…	0.37	2025-01-29
MindOmni (w/o cot)	MindOmni: Unleashing Reasoning Generation in Visi…	0.38	2025-05-19
Show-o	Show-o: One Single Transformer to Unify Multimoda…	0.40	2024-08-22
Emu3-gen	Emu3: Next-Token Prediction is All You Need	0.45	2024-09-27
stable-diffusion-xl-base-0.9	SDXL: Improving Latent Diffusion Models for High-…	0.48	2023-07-04
PixArt-XL-2-1024-MS	PixArt-α: Fast Training of Diffusion Transforme…	0.50	2023-09-30
stable-diffusion-3.5-large	Scaling Rectified Flow Transformers for High-Reso…	0.50	2024-03-05
UniWorld-V1	UniWorld-V1: High-Resolution Semantic Encoders fo…	0.55	2025-06-03
Bagel	Emerging Properties in Unified Multimodal Pretrai…	0.55	2025-05-20
Playground-v2.5-1024px-aesthetic	Playground v2.5: Three Insights towards Enhancing…	0.58	2024-02-27
Bagel (w/ cot)	Emerging Properties in Unified Multimodal Pretrai…	0.69	2025-05-20
MindOmni (w/ cot)	MindOmni: Unleashing Reasoning Generation in Visi…	0.70	2025-05-19
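
The results table above is listed from lowest to highest Overall score. As a minimal sketch, assuming a higher Overall score is better, the 13 entries can be ranked best-first as follows. The `RESULTS` dictionary and the `rank_by_overall` helper are illustrative names, not part of any WISE tooling; the scores are copied from the table.

```python
# Overall scores copied from the results table (model name -> Overall).
RESULTS = {
    "Janus": 0.26,
    "Janus-pro": 0.37,
    "MindOmni (w/o cot)": 0.38,
    "Show-o": 0.40,
    "Emu3-gen": 0.45,
    "stable-diffusion-xl-base-0.9": 0.48,
    "PixArt-XL-2-1024-MS": 0.50,
    "stable-diffusion-3.5-large": 0.50,
    "UniWorld-V1": 0.55,
    "Bagel": 0.55,
    "Playground-v2.5-1024px-aesthetic": 0.58,
    "Bagel (w/ cot)": 0.69,
    "MindOmni (w/ cot)": 0.70,
}


def rank_by_overall(results):
    """Return (rank, model, score) tuples, highest score first.

    Assumption: higher Overall is better. Python's sort is stable,
    so tied models keep their relative order from the input.
    """
    ordered = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [(i + 1, model, score) for i, (model, score) in enumerate(ordered)]


# Print the three highest-scoring entries as tab-separated rows.
for rank, model, score in rank_by_overall(RESULTS)[:3]:
    print(f"{rank}\t{model}\t{score:.2f}")
```

Tied scores (0.50 and 0.55) are not broken by any secondary criterion here; a date or name tiebreak could be added to the sort key if a strict ordering is needed.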