Image generation
Last updated: February 10, 2025
Our image generation leaderboard evaluates AI models on their ability to generate high-quality images from textual descriptions. We assess each generated image on three criteria: overall preference, prompt alignment, and visual appeal.
Rank | Model | TrueSkill rating | Win rate (%) | Overall preference | Prompt alignment | Visual appeal |
---|---|---|---|---|---|---|
1 | DALL·E 3 | 1031.65 | 61.61 | 147 | 179 | 219 |
2 | Imagen 3 | 1059.11 | 59.18 | 131 | 182 | 238 |
3 | Stable Diffusion 3 | 1005.2 | 47.63 | 98 | 159 | 204 |
4 | Flux 1.1 Pro | 1010.89 | 49.27 | 110 | 165 | 223 |
5 | Ideogram 2.0 | 981.77 | 45.98 | 105 | 182 | 193 |
6 | Recraft v3 | 959.97 | 35.91 | 82 | 157 | 212 |
What is “Elo rating”?
Elo is a dynamic rating system originally used in competitive games to rank players; here it is applied to models. A higher Elo rating indicates better performance in head-to-head comparisons. After each match, the system adjusts both models' ratings according to how the result compares with what their current ratings predicted, and the K-factor (set to 32) scales how much a rating changes per match.
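As a rough illustration of the update described above, here is a minimal Python sketch of a single Elo adjustment with K = 32. The logistic expected-score formula with a 400-point scale is the conventional Elo form and is an assumption here, since the page does not state which variant is used; the function names and the starting rating of 1000 are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the conventional Elo model.

    Assumption: base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head match.

    score_a is 1.0 if A's image was preferred, 0.0 if B's was, 0.5 for a tie.
    k is the K-factor; 32 matches the value quoted in the text.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Two models with equal (hypothetical) ratings; model A wins the comparison.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)
```

With equal ratings the expected score is 0.5, so the winner gains half of K and the loser gives up the same amount. An upset win against a higher-rated model moves both ratings further than an expected win does.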
Human preference evaluation
Diverse pool of US-based Alignerrs, including generalists and creative artists
Consensus of three Alignerrs per task
Standardized instructions and ontology for consistent evaluations
Carefully curated prompt generation process, balancing creativity and clarity
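The page does not say how the three ratings per task are combined into a consensus. One plausible reading is a strict majority vote, sketched below in Python purely as an assumption; the function name `consensus_label` and the handling of three-way splits are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(ratings: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of raters, else None.

    Assumption: consensus means more than half of the raters agree.
    The actual aggregation rule is not described in the source.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    top_label, top_count = Counter(ratings).most_common(1)[0]
    if top_count * 2 > len(ratings):
        return top_label
    return None  # no majority, e.g. a three-way split


print(consensus_label(["High", "High", "Low"]))
print(consensus_label(["High", "Medium", "Low"]))
```

With three raters and three options, a three-way split is possible, so a real pipeline would need a tie-breaking policy such as sending the task to additional raters; this sketch only flags that case by returning `None`.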
Evaluation criteria: Overall preference, Prompt alignment, Visual appeal
Overall preference
Description: Assess your overall satisfaction with the generated image given the input prompt.
Options: High, Medium, Low
Examples
Cinematic: A resolute soldier unlocking a mysterious wooden door in a dimly lit room. Dramatic chiaroscuro lighting highlights tension. Close-up perspective, emphasizing the soldier's anxious but righteous expression.
[Images generated for this prompt by Imagen 3, DALL·E 3, Flux 1.1 Pro, Stable Diffusion 3, Ideogram 2.0, and Recraft v3]