GenEval

Dataset Information
Modalities: Images, Texts
Introduced: 2023
License: MIT license

Overview

Recent breakthroughs in diffusion models, multimodal pretraining, and efficient finetuning have led to an explosion of text-to-image generative models. Because human evaluation is expensive and difficult to scale, automated methods are critical for evaluating the increasingly large number of new models. However, most current automated evaluation metrics, such as FID or CLIPScore, offer only a holistic measure of image quality or image-text alignment and are unsuited to fine-grained or instance-level analysis. In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color. We show that current object detection models can be leveraged to evaluate text-to-image models on a variety of generation tasks with strong human agreement, and that other discriminative vision models can be linked to this pipeline to further verify properties like object color. We then evaluate several open-source text-to-image models and analyze their relative generative capabilities on our benchmark. We find that recent models demonstrate significant improvement on these tasks, though they still lack complex capabilities such as spatial relations and attribute binding. Finally, we demonstrate how GenEval might be used to help discover existing failure modes, in order to inform development of the next generation of text-to-image models. Our code to run the GenEval framework is publicly available.
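The object-focused idea described above can be illustrated with a small sketch. This is not the GenEval implementation: the `Detection` record, the confidence threshold of 0.5, and the helper names are all hypothetical, chosen only to show how the output of an object detector might be turned into pass/fail checks for co-occurrence, count, and relative position. A color check would need a separate classifier applied to each detected region and is left out here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical stand-in for a detector's output)."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels
    score: float


def confident(detections, threshold=0.5):
    """Keep detections at or above an assumed confidence threshold."""
    return [d for d in detections if d.score >= threshold]


def check_co_occurrence(detections, required_labels):
    """True if every required label is detected at least once."""
    present = {d.label for d in detections}
    return all(label in present for label in required_labels)


def check_count(detections, label, expected):
    """True if exactly `expected` objects with this label are detected."""
    return Counter(d.label for d in detections)[label] == expected


def check_left_of(detections, left_label, right_label):
    """True if some left_label box centre lies left of some right_label box centre."""
    def centres(label):
        return [(d.box[0] + d.box[2]) / 2 for d in detections if d.label == label]

    lefts, rights = centres(left_label), centres(right_label)
    return bool(lefts) and bool(rights) and min(lefts) < max(rights)


# Made-up detections for an image generated from "a dog to the left of a bench".
detections = [
    Detection("dog", (10, 40, 110, 140), 0.92),
    Detection("dog", (300, 60, 340, 100), 0.30),  # low score, filtered out
    Detection("bench", (200, 50, 400, 150), 0.88),
]
kept = confident(detections)

print(check_co_occurrence(kept, ["dog", "bench"]))
print(check_count(kept, "dog", 1))
print(check_left_of(kept, "dog", "bench"))
```

Each check returns a boolean for one image, so a benchmark score in this style would be the fraction of generated images that pass the check implied by their prompt.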

Variants: GenEval

Associated Benchmarks

This dataset is used in 1 benchmark.

Recent Benchmark Submissions

| Task | Model | Paper | Date |
|---|---|---|---|
| Text-to-Image Generation | UniWorld-V1 (Rewrite) | UniWorld-V1: High-Resolution Semantic Encoders for … | 2025-06-03 |
| Text-to-Image Generation | UniWorld-V1 | UniWorld-V1: High-Resolution Semantic Encoders for … | 2025-06-03 |
| Text-to-Image Generation | MindOmni | MindOmni: Unleashing Reasoning Generation in … | 2025-05-19 |
| Text-to-Image Generation | SD3.5-Medium+Flow-GRPO | Flow-GRPO: Training Flow Matching Models … | 2025-05-08 |
| Text-to-Image Generation | MetaQuery-XL (Rewrite) | Transfer between Modalities with MetaQueries | 2025-04-08 |
| Text-to-Image Generation | Lumina-Image 2.0 | Lumina-Image 2.0: A Unified and … | 2025-03-27 |
| Text-to-Image Generation | DiffMoE-E16-T2I-Flow (w SFT) | DiffMoE: Dynamic Token Selection for … | 2025-03-18 |
| Text-to-Image Generation | SANA-1.5 4.8B (+ Inference Scaling) | SANA 1.5: Efficient Scaling of … | 2025-01-30 |
| Text-to-Image Generation | SANA-1.5 4.8B | SANA 1.5: Efficient Scaling of … | 2025-01-30 |
| Text-to-Image Generation | Janus-Pro-1B | Janus-Pro: Unified Multimodal Understanding and … | 2025-01-29 |
| Text-to-Image Generation | Janus-Pro-7B | Janus-Pro: Unified Multimodal Understanding and … | 2025-01-29 |
| Text-to-Image Generation | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM | Can We Generate Images with … | 2025-01-23 |
| Text-to-Image Generation | Show-o [xie2024show] PARM It. DPO PARM | Can We Generate Images with … | 2025-01-23 |
| Text-to-Image Generation | SnapGen | SnapGen: Taming High-Resolution Text-to-Image Models … | 2024-12-12 |
| Text-to-Image Generation | JanusFlow | JanusFlow: Harmonizing Autoregression and Rectified … | 2024-11-12 |
| Text-to-Image Generation | Fluid (10.5B) | Fluid: Scaling Autoregressive Text-to-image Generative … | 2024-10-17 |
| Text-to-Image Generation | Emu3 | Emu3: Next-Token Prediction is All … | 2024-09-27 |
| Text-to-Image Generation | Und. and Gen. Show-o (Ours) | Show-o: One Single Transformer to … | 2024-08-22 |
| Text-to-Image Generation | PixArt-Σ | PixArt-Σ: Weak-to-Strong Training of Diffusion … | 2024-03-07 |
| Text-to-Image Generation | PIXART-δ | PIXART-δ: Fast and Controllable Image … | 2024-01-10 |

Research Papers

Recent papers with results on this dataset: