
Image generation

Last updated: February 10, 2025

Our image generation leaderboard evaluates AI models on their ability to generate high-quality images from text prompts. Models are assessed on three criteria: overall preference, prompt alignment, and visual appeal.

Rank  Model               Elo rating  Win rate (%)  Overall preference  Prompt alignment  Visual appeal
1     DALL·E 3            1031.65     61.61         147                 179               219
2     Imagen 3            1059.11     59.18         131                 182               238
3     Stable Diffusion 3  1005.2      47.63         98                  159               204
4     Flux 1.1 Pro        1010.89     49.27         110                 165               223
5     Ideogram 2.0        981.77      45.98         105                 182               193
6     Recraft v3          959.97      35.91         82                  157               212
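The page does not define how win rate is computed. One common definition, assumed here, is the percentage of head-to-head comparisons a model wins, with ties counted as half a win. The function `win_rates` and the `(model_a, model_b, score_a)` match format are hypothetical, shown only to illustrate that definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical match format: (model_a, model_b, score_a), where score_a is
# 1.0 if model_a won, 0.0 if model_b won, and 0.5 for a tie (assumed).
Match = Tuple[str, str, float]


def win_rates(matches: Iterable[Match]) -> Dict[str, float]:
    """Return each model's win rate as a percentage of the comparisons it appeared in."""
    wins: Dict[str, float] = defaultdict(float)
    games: Dict[str, int] = defaultdict(int)
    for model_a, model_b, score_a in matches:
        if score_a not in (0.0, 0.5, 1.0):
            raise ValueError(f"score_a must be 0.0, 0.5, or 1.0, got {score_a}")
        wins[model_a] += score_a
        wins[model_b] += 1.0 - score_a
        games[model_a] += 1
        games[model_b] += 1
    return {model: 100.0 * wins[model] / games[model] for model in games}


# Usage with made-up models "A", "B", "C" (illustrative data, not leaderboard results).
example = [("A", "B", 1.0), ("A", "C", 0.5), ("B", "C", 0.0)]
print(win_rates(example))  # {'A': 75.0, 'B': 0.0, 'C': 75.0}
```

Under this definition a win rate above 50% means a model won more comparisons than it lost.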

What is “Elo rating”?

Elo is a rating system originally developed for ranking chess players and now widely used in competitive games; here it is applied to models instead of players. Higher Elo ratings indicate stronger performance in head-to-head comparisons. After each match, both models' ratings are adjusted according to how the actual result compares with the result their ratings predicted, and the K-factor (32) caps how much a rating can change in a single match.
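A minimal sketch of a standard Elo update using the K-factor of 32 stated above. The 400-point logistic scale and scoring a tie as 0.5 are conventional Elo choices assumed here; the page does not say how a comparison judged by three Alignerrs becomes a single match result, or what rating a model starts from.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie (tie handling assumed).
    k defaults to the K-factor of 32 stated on the leaderboard page.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two evenly rated models: each is expected to score 0.5, so the winner gains k/2 = 16.
print(update_elo(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because the change is proportional to the gap between the actual and expected result, an upset win against a higher-rated model moves both ratings more than an expected win does.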

Human preference evaluation

Diverse pool of US-based Alignerrs, including generalists and creative artists

Consensus of three Alignerrs per task

Standardized instructions and ontology for consistent evaluations

Carefully curated prompt generation process, balancing creativity and clarity

Evaluation criteria: Overall preference, Prompt alignment, Visual appeal

[Chart: Overall preference by model (DALL·E 3, Imagen 3, Flux 1.1 Pro, Ideogram 2.0, Stable Diffusion 3, Recraft v3)]

Overall preference

Description:

Assess your overall satisfaction with the generated image given the input prompt.

Options:

High

Medium

Low
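The page states that each task receives a consensus of three Alignerrs but does not define how that consensus is reached. A simple majority vote over the High/Medium/Low options above is one plausible rule and is assumed here; the helper name `consensus` and the choice to return `None` when all three raters disagree are hypothetical.

```python
from collections import Counter
from typing import List, Optional

# Rating options from the "Overall preference" instructions.
LEVELS = ("High", "Medium", "Low")


def consensus(ratings: List[str]) -> Optional[str]:
    """Majority label among exactly three ratings; None if all three differ (assumed rule)."""
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings, one per Alignerr")
    for rating in ratings:
        if rating not in LEVELS:
            raise ValueError(f"unknown rating: {rating!r}")
    label, count = Counter(ratings).most_common(1)[0]
    # With three raters, a label needs at least two votes to be a majority.
    return label if count >= 2 else None


print(consensus(["High", "High", "Low"]))    # High
print(consensus(["High", "Medium", "Low"]))  # None
```

Returning `None` leaves a three-way split for a follow-up step; taking the median of the three ordinal ratings is an alternative that always yields a label.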

Examples

Prompt:

Cinematic: A resolute soldier unlocking a mysterious wooden door in a dimly lit room. Dramatic chiaroscuro lighting highlights tension. Close-up perspective, emphasizing the soldier's anxious but righteous expression.

[Generated images for this prompt from Imagen 3, DALL·E 3, Flux 1.1 Pro, Stable Diffusion 3, Ideogram 2.0, and Recraft v3]