Image generation

Last updated: February 10, 2025

Our image generation leaderboard evaluates AI models on their ability to generate high-quality images from text prompts. Human evaluators rate each model's images on three criteria: overall preference, prompt alignment, and visual appeal.

Rank	Model	TrueSkill rating	Win rate (%)	Overall preference	Prompt alignment	Visual appeal
1	DALL·E 3	1031.65	61.61	147	179	219
2	Imagen 3	1059.11	59.18	131	182	238
3	Stable Diffusion 3	1005.2	47.63	98	159	204
4	Flux 1.1 Pro	1010.89	49.27	110	165	223
5	Ideogram 2.0	981.77	45.98	105	182	193
6	Recraft v3	959.97	35.91	82	157	212

What is “Elo rating”?

Elo is a dynamic rating system, originally developed to rank chess players and widely used in competitive games. Here it is applied to AI models: a higher Elo rating indicates stronger performance in head-to-head comparisons. After each comparison, both models' ratings are adjusted according to the outcome and how likely that outcome was given their current ratings. The K-factor (32) sets how much a rating can change after a single match.
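A minimal sketch of the standard Elo update with K = 32, as described above. The 400-point logistic scale and the 1000-point starting ratings in the usage line are common conventions assumed here, not details confirmed by the leaderboard, and the function names are hypothetical. This illustrates Elo only; it does not reproduce the TrueSkill ratings reported in the table.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one head-to-head comparison.

    score_a is 1.0 if A is preferred, 0.0 if B is preferred, 0.5 for a tie.
    The change is k * (actual - expected), so it never exceeds k in magnitude,
    and whatever A gains, B loses.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start at an assumed 1000 points; model A wins one comparison.
    a, b = elo_update(1000.0, 1000.0, score_a=1.0)
    print(a, b)  # 1016.0 984.0
```

Because equally rated models each have an expected score of 0.5, a single win moves them by K/2 = 16 points in opposite directions. An upset win against a much higher-rated model moves ratings closer to the full K.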

Human preference evaluation

Diverse pool of US-based Alignerrs, including generalists and creative artists

Consensus of three Alignerrs per task

Standardized instructions and ontology for consistent evaluations

Carefully curated prompt generation process, balancing creativity and clarity

Evaluation criteria: overall preference, prompt alignment, and visual appeal.

Overall preference

Description:

Assess your overall satisfaction with the generated image given the input prompt.

Options:

High

Medium

Low
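The page states that each task receives a consensus of three Alignerrs but does not say how the three ratings are combined. A minimal sketch, assuming (hypothetically) that the High/Medium/Low options are treated as an ordered scale and the median label is taken; the names `consensus`, `LEVELS`, and `RANK` are illustrative only and not part of any Labelbox API.

```python
from statistics import median_low

# Ordinal scale for the Overall preference options (Low < Medium < High).
LEVELS = ["Low", "Medium", "High"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def consensus(ratings: list[str]) -> str:
    """Hypothetical rule: combine three raters' labels by taking the median level.

    With three raters, the median equals the majority label whenever at least
    two agree, and falls back to the middle level (Medium) when all three differ.
    """
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings")
    ranks = [RANK[r] for r in ratings]  # raises KeyError on an unknown label
    return LEVELS[median_low(ranks)]


if __name__ == "__main__":
    print(consensus(["High", "High", "Low"]))    # High
    print(consensus(["Low", "Medium", "High"]))  # Medium
```

Using the median rather than a plain majority vote means every task yields a label, even when the three raters disagree completely.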

Examples

PROMPT

Cinematic: A resolute soldier unlocking a mysterious wooden door in a dimly lit room. Dramatic chiaroscuro lighting highlights tension. Close-up perspective, emphasizing the soldier's anxious but righteous expression.

Generated images for this prompt (not reproduced here): Imagen 3, DALL·E 3, Flux 1.1 Pro, Stable Diffusion 3, Ideogram 2.0, Recraft v3.

Want us to evaluate your model?

If you’d like your model included in the next leaderboard update, contact us at leaderboard@labelbox.com.
