Leaderboards

Image generation

Last updated: September 9, 2024

Our image generation leaderboard evaluates AI models on their ability to generate high-quality images from textual descriptions. Human raters compare model outputs on overall preference, prompt alignment, and visual appeal.

DALL·E 3
Leads most metrics: Elo rating, TrueSkill rating, user preferences, prompt alignment, and visual appeal
Imagen 3
Performs exceptionally well in average rank, suggesting strong performance in direct comparisons
Ideogram 2
Despite lower overall ratings, shows strong performance in prompt alignment
Stable Diffusion 3
Does not lead in any particular category, but shows consistent performance across all metrics
Flux 1.5
Does not lead in any particular category, but shows consistent performance across all metrics
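
For readers unfamiliar with how an Elo rating can be derived from pairwise preference votes, here is a minimal sketch using the standard Elo update rule. The K-factor of 32, the starting rating of 1000, the model names, and the sample votes are all illustrative assumptions; they are not the parameters or data behind this leaderboard.

```python
from collections import defaultdict

K = 32           # assumed K-factor (hypothetical, not taken from the leaderboard)
START = 1000.0   # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, winner, loser, k=K):
    """Apply one pairwise result to the ratings mapping, in place."""
    e_w = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - e_w)   # the winner gains exactly what the loser gives up
    ratings[winner] += delta
    ratings[loser] -= delta


def rate(votes, start=START, k=K):
    """Replay a sequence of (winner, loser) votes and return final ratings."""
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        update_elo(ratings, winner, loser, k)
    return dict(ratings)


# Made-up votes between hypothetical models, for illustration only.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = rate(votes)
```

Because each update moves the same number of points from loser to winner, the total rating mass stays constant. Elo results depend on the order in which votes are replayed, which is one reason leaderboards often report a second system such as TrueSkill alongside it.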

Human preference evaluation

Diverse pool of US-based Alignerrs, including generalists and creative artists

Consensus of three Alignerrs per task

Standardized instructions and ontology for consistent evaluations

Carefully curated prompt generation process, balancing creativity and clarity
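
The three-rater consensus described above can be sketched as a simple majority vote. This is an assumption about how consensus might be computed: the source does not state the rule, how disagreements are handled, or the label names used here.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus(labels: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_votes` raters, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_votes else None


# Hypothetical labels from three raters on one comparison task.
print(consensus(["image_a", "image_a", "image_b"]))  # image_a
print(consensus(["image_a", "image_b", "tie"]))      # None (no majority)
```

Returning `None` when no label reaches a majority leaves the choice of follow-up (discard the task, or send it for another round of rating) to the caller.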

Overall preference

Prompt alignment

Visual appeal

Examples

PROMPT

High-resolution photograph: a small, intricately decorated spool of thread on a rustic wooden table. Warm, natural lighting. Close-up perspective, capturing meticulous details. Cozy, vintage atmosphere.

DALL·E

Google Imagen 3

Stable Diffusion

Flux 1.5

Ideogram 2