Video generation
Last updated: November 1, 2024
Our video generation leaderboard evaluates AI models on their ability to generate high-quality videos from textual descriptions. We assess factors such as visual quality, adherence to the given text, and creativity.
Rank | Model | Elo rating | Win rate | Overall preference | Prompt alignment |
---|---|---|---|---|---|
1 | Luma dream machine 1.5 | 1219 | 70.5 | 49.7 | 79.7 |
2 | Pika 1.5 | 907 | 44.7 | 25.9 | 57.8 |
3 | Runway gen 3 alpha | 874 | 34.8 | 19.8 | 56.7 |
What is “Elo rating”?
Elo is a dynamic rating system originally used in competitive games to rank players; here it is applied to models. A higher Elo rating indicates better performance in head-to-head comparisons. After each match, the system adjusts both models' ratings according to the result and how expected that result was. The K-factor (32) sets the maximum amount a rating can change after a single match.
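To make the update rule concrete, here is a minimal sketch of a standard Elo update using the K-factor of 32 mentioned above. The logistic expected-score formula with a scale of 400 and the starting rating of 1000 in the usage note are conventional Elo defaults assumed here for illustration; the leaderboard does not state which values it uses, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the conventional Elo model.

    The scale of 400 is the usual Elo convention, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head match.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    k is the K-factor; 32 matches the value quoted on this page.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two models with equal (assumed) starting ratings; model A wins the comparison.
new_a, new_b = update_elo(1000, 1000, score_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

With equal ratings the expected score is 0.5, so a win moves each rating by half the K-factor. An upset win against a much higher-rated model moves the ratings by close to the full 32 points, while an expected win moves them very little.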
Human preference evaluation
Diverse pool of US-based Alignerrs, including generalists and creative artists
Consensus of three Alignerrs per task
Standardized instructions and ontology for consistent evaluations
Carefully curated prompt generation process, balancing creativity and clarity
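The page does not say how the three ratings per task are combined. One simple reading of "consensus of three" is a majority vote, sketched below under that assumption. The function name and the choice to return `None` when all three raters disagree are hypothetical, not the leaderboard's documented procedure.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_consensus(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by at least two of three raters.

    Returns None when all three raters disagree (an assumed tie policy).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three ratings per task")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


print(majority_consensus(["High", "High", "Low"]))    # High
print(majority_consensus(["High", "Medium", "Low"]))  # None
```

A real pipeline would also need a policy for three-way disagreements, such as sending the task to an additional rater.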
Evaluation criteria: Overall preference, Prompt alignment, Realism
Overall preference
Description: Assess your overall satisfaction with the generated video given the input prompt.
Options: High, Medium, Low
Examples
In a noir style, a detective unearths a savory dish from a decaying diner, with shadowy angles and a moody greyscale palette emphasizing the mysterious atmosphere
Videos generated for this prompt: Luma dream machine, Pika, Runway gen 3