Video generation

Last updated: March 11, 2025

Our video generation leaderboard evaluates AI models on their ability to generate high-quality videos from textual descriptions. We assess factors such as visual quality, adherence to the given text, and creativity.

Rank	Model	Elo rating	TrueSkill rating
1	Runway Gen 3	1152.5	1089.64
2	Luma Ray 2	1099.75	1115.1
3	Tencent	1035.55	1014.48
4	Pika 1.5	860.54	881.36
5	Luma Dreamachine	851.66	878.37

What is “Elo rating”?

Elo is a dynamic rating system originally used to rank players in competitive games such as chess. Here it is applied to models: a higher Elo rating indicates better performance in head-to-head comparisons. After each comparison, the system adjusts both models' ratings according to the result and how expected that result was. The K-factor (32) sets the maximum amount a rating can change after a single match.
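To make the update rule concrete, here is a minimal sketch of a single Elo update in Python. Only the K-factor of 32 comes from this page; the logistic expected-score formula with a 400-point scale is the conventional Elo definition and is assumed here, and the function names `expected_score` and `update_elo` are illustrative rather than part of the leaderboard's actual pipeline.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the conventional Elo model.

    The 400-point scale is the standard convention, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one head-to-head match.

    score_a is 1.0 if A's video was preferred, 0.0 if B's was,
    and 0.5 for a tie. k is the K-factor (32 on this leaderboard).
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Two equally rated models: the winner gains half of K.
print(update_elo(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

An upset (a lower-rated model winning) moves both ratings more than an expected result does, which is why the change per match ranges between 0 and K.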

Human preference evaluation

Diverse pool of US-based Alignerrs, including generalists and creative artists

Consensus of three Alignerrs per task

Standardized instructions and ontology for consistent evaluations

Carefully curated prompt generation process, balancing creativity and clarity

Evaluation criteria: Consistency with prompt, Artifacts and errors, Realistic

Consistency with prompt

Description: Evaluate how well the input prompt is represented in the output video content.

Options: High, Medium, Low
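The methodology above calls for a consensus of three Alignerrs per task, with each rating chosen from High, Medium, or Low. This page does not say how that consensus is computed, so the following Python sketch is only one plausible approach: take the majority label, and fall back to the middle (median) rating when all three raters disagree. The function name `consensus_rating` and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

# Ordinal scale from the rating options, lowest to highest.
LEVELS = ["Low", "Medium", "High"]


def consensus_rating(labels: list[str]) -> str:
    """Combine three ratings into one (hypothetical consensus rule).

    Majority vote when at least two raters agree; otherwise the
    median of the three ratings on the Low < Medium < High scale.
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three ratings")
    for label in labels:
        if label not in LEVELS:
            raise ValueError(f"unknown rating: {label!r}")

    label, count = Counter(labels).most_common(1)[0]
    if count >= 2:
        return label
    # All three raters disagree: sort by level and take the middle one.
    return sorted(labels, key=LEVELS.index)[1]


print(consensus_rating(["High", "High", "Low"]))    # High
print(consensus_rating(["High", "Low", "Medium"]))  # Medium
```

Using the median for a three-way split keeps the result on the original scale without favoring either extreme. Other choices, such as sending the task to an additional rater, would be equally defensible.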

Want us to evaluate your model?

If you’d like to evaluate your model as part of the next leaderboard update, contact us at leaderboard@labelbox.com.
