GenEval

Name: GenEval
Published: 2023-10-17
License: MIT license

Dataset Information

Modalities

Images, Texts

Introduced

2023

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Recent breakthroughs in diffusion models, multimodal pretraining, and efficient finetuning have led to an explosion of text-to-image generative models. Given that human evaluation is expensive and difficult to scale, automated methods are critical for evaluating the increasingly large number of new models. However, most current automated evaluation metrics, such as FID or CLIPScore, only offer a holistic measure of image quality or image-text alignment, and are unsuited for fine-grained or instance-level analysis. In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color. We show that current object detection models can be leveraged to evaluate text-to-image models on a variety of generation tasks with strong human agreement, and that other discriminative vision models can be linked to this pipeline to further verify properties like object color. We then evaluate several open-source text-to-image models and analyze their relative generative capabilities on our benchmark. We find that recent models demonstrate significant improvement on these tasks, though they are still lacking in complex capabilities such as spatial relations and attribute binding. Finally, we demonstrate how GenEval might be used to help discover existing failure modes, in order to inform development of the next generation of text-to-image models. Our code to run the GenEval framework is publicly available.
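The object-focused checks named in the overview (co-occurrence, count, position, color) can be illustrated with a small sketch. This is not the GenEval implementation: the `Detection` record, the helper names, and the rule of comparing box centers to decide "left of" are all assumptions made for illustration, and the detections are hand-written here rather than produced by a real object detector or color classifier.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detected object in a generated image."""
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    color: Optional[str] = None  # assumed to come from a separate color classifier


def center(det: Detection) -> Tuple[float, float]:
    """Return the center point of a detection's bounding box."""
    x0, y0, x1, y1 = det.box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def co_occur(dets: Sequence[Detection], labels: Sequence[str]) -> bool:
    """True if every requested label was detected at least once."""
    found = {d.label for d in dets}
    return all(label in found for label in labels)


def has_count(dets: Sequence[Detection], label: str, n: int) -> bool:
    """True if exactly n objects with this label were detected."""
    return sum(d.label == label for d in dets) == n


def is_left_of(a: Detection, b: Detection) -> bool:
    """True if a's box center lies to the left of b's (assumed rule)."""
    return center(a)[0] < center(b)[0]


def has_color(dets: Sequence[Detection], label: str, color: str) -> bool:
    """True if some object with this label was classified as this color."""
    return any(d.label == label and d.color == color for d in dets)


# Hand-written detections standing in for detector output on one image.
dog = Detection("dog", (10, 40, 110, 200), "brown")
cat = Detection("cat", (200, 50, 300, 210), "black")
dets = [dog, cat]

# Checks for a hypothetical prompt: "a brown dog to the left of a black cat".
results = {
    "co_occurrence": co_occur(dets, ["dog", "cat"]),
    "count": has_count(dets, "dog", 1) and has_count(dets, "cat", 1),
    "position": is_left_of(dog, cat),
    "color": has_color(dets, "dog", "brown") and has_color(dets, "cat", "black"),
}
```

Under this sketch an image counts as correct for a prompt only if every applicable check passes, which is what makes the analysis instance-level rather than holistic.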

Variants: GenEval

Associated Benchmarks

This dataset is used in 1 benchmark:

Text-to-Image Generation - Metrics: Overall, Single Obj., Two Obj., Color Attri., Colors, Counting, Position
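As a rough illustration of how the metrics listed above might relate, the sketch below computes an overall score as the unweighted mean of the six per-task scores. That averaging rule is an assumption stated here for illustration, not something this page confirms, and `overall_score` is a hypothetical helper. The numbers in the example are placeholders, not results from any model.

```python
from typing import Mapping

# Per-task metric names as they appear on the benchmark listing.
TASKS = ("Single Obj.", "Two Obj.", "Counting", "Colors", "Position", "Color Attri.")


def overall_score(per_task: Mapping[str, float]) -> float:
    """Unweighted mean of the six per-task scores (assumed aggregation rule)."""
    missing = [task for task in TASKS if task not in per_task]
    if missing:
        raise KeyError(f"missing task scores: {missing}")
    return sum(per_task[task] for task in TASKS) / len(TASKS)


# Placeholder scores for illustration only.
example = {
    "Single Obj.": 1.0,
    "Two Obj.": 0.5,
    "Counting": 0.5,
    "Colors": 1.0,
    "Position": 0.0,
    "Color Attri.": 0.0,
}
overall = overall_score(example)
```

An unweighted mean treats each task equally, so weak results on harder tasks such as position or color attribution pull the overall score down as much as strong single-object results raise it.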

Recent Benchmark Submissions

Task	Model	Paper	Date
Text-to-Image Generation	UniWorld-V1 (Rewrite)	UniWorld-V1: High-Resolution Semantic Encoders for …	2025-06-03
Text-to-Image Generation	UniWorld-V1	UniWorld-V1: High-Resolution Semantic Encoders for …	2025-06-03
Text-to-Image Generation	MindOmni	MindOmni: Unleashing Reasoning Generation in …	2025-05-19
Text-to-Image Generation	SD3.5-Medium+Flow-GRPO	Flow-GRPO: Training Flow Matching Models …	2025-05-08
Text-to-Image Generation	MetaQuery-XL (Rewrite)	Transfer between Modalities with MetaQueries	2025-04-08
Text-to-Image Generation	Lumina-Image 2.0	Lumina-Image 2.0: A Unified and …	2025-03-27
Text-to-Image Generation	DiffMoE-E16-T2I-Flow (w SFT)	DiffMoE: Dynamic Token Selection for …	2025-03-18
Text-to-Image Generation	SANA-1.5 4.8B (+ Inference Scaling)	SANA 1.5: Efficient Scaling of …	2025-01-30
Text-to-Image Generation	SANA-1.5 4.8B	SANA 1.5: Efficient Scaling of …	2025-01-30
Text-to-Image Generation	Janus-Pro-1B	Janus-Pro: Unified Multimodal Understanding and …	2025-01-29
Text-to-Image Generation	Janus-Pro-7B	Janus-Pro: Unified Multimodal Understanding and …	2025-01-29
Text-to-Image Generation	Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM	Can We Generate Images with …	2025-01-23
Text-to-Image Generation	Show-o [xie2024show] PARM It. DPO PARM	Can We Generate Images with …	2025-01-23
Text-to-Image Generation	SnapGen	SnapGen: Taming High-Resolution Text-to-Image Models …	2024-12-12
Text-to-Image Generation	JanusFlow	JanusFlow: Harmonizing Autoregression and Rectified …	2024-11-12
Text-to-Image Generation	Fluid (10.5B)	Fluid: Scaling Autoregressive Text-to-image Generative …	2024-10-17
Text-to-Image Generation	Emu3	Emu3: Next-Token Prediction is All …	2024-09-27
Text-to-Image Generation	Und. and Gen. Show-o (Ours)	Show-o: One Single Transformer to …	2024-08-22
Text-to-Image Generation	PixArt-Σ	PixArt-Σ: Weak-to-Strong Training of Diffusion …	2024-03-07
Text-to-Image Generation	PIXART-δ	PIXART-δ: Fast and Controllable Image …	2024-01-10

Research Papers

Recent papers with results on this dataset:
