Speech generation
Last updated: December 12, 2024

Our speech generation leaderboard evaluates AI models on their ability to generate high-quality speech from textual descriptions. We assess factors such as speech quality, word error rate, and naturalness.
Rank | Model | Elo rating | Win %
---|---|---|---
1 | OpenAI | 2165 | 58
2 | Eleven Labs | 2128 | 52
3 | Google | 2127 | 67
4 | AWS | 2103 | 51
5 | Deepgram | 1478 | 22
What is `Elo rating`? Elo is a dynamic rating system originally used in competitive games to rank players; here it is applied to models. A higher Elo rating indicates better performance in head-to-head comparisons. Ratings are adjusted after each comparison according to how the models perform against each other, and the K-factor (32) determines how much a rating can change after a single match.
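To make the rating mechanics concrete, here is a minimal sketch of a single Elo update with the K-factor of 32 mentioned above. The logistic expected-score formula and the 400-point scale are the conventional chess defaults and are assumed here; the leaderboard does not state which variant it uses. The function names `expected_score` and `update_elo` are illustrative only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve.

    The 400-point scale is the usual convention, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (A, B) ratings after one head-to-head comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred,
    and 0.5 for a tie. k is the K-factor (32, as stated in the text).
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models: the winner gains k / 2 = 16 points.
    print(update_elo(1500.0, 1500.0, score_a=1.0))
```

Under these assumptions, a model rated 2165 facing one rated 2128 has an expected score of roughly 0.55, so a win moves its rating up by a little less than half the K-factor.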
Human preference evaluation
Diverse pool of US-based Alignerrs, including generalists and creative artists
Consensus of three Alignerrs per task
Standardized instructions and ontology for consistent evaluations
Carefully curated prompt generation process, balancing creativity and clarity
Context Awareness
Pronunciation Accuracy
Prosody Accuracy
Description:
Assesses a text-to-speech model’s ability to understand contextual information throughout the text and adapt its output to the linguistic and situational context. Examples include tone adjustment, changes in emphasis and rhythm, and punctuation interpretation.
Options:
High
Medium
Low
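The methodology calls for a consensus of three Alignerrs per task, with each criterion rated High, Medium, or Low. The page does not say how consensus is computed, so the following is only one plausible sketch: a strict majority vote that returns no label when all three raters disagree. The function name `consensus_label` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

# Rating options listed for each criterion.
OPTIONS = ("High", "Medium", "Low")


def consensus_label(ratings: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of raters.

    Returns None when no label has more than half the votes
    (for example, three raters giving three different labels).
    Majority voting is an assumption, not the documented method.
    """
    for rating in ratings:
        if rating not in OPTIONS:
            raise ValueError(f"unknown rating: {rating!r}")
    if not ratings:
        return None
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None


if __name__ == "__main__":
    print(consensus_label(["High", "High", "Medium"]))
    print(consensus_label(["High", "Medium", "Low"]))
```

A real pipeline would also need a rule for the no-majority case, such as sending the task to an additional rater.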