On August 7, MiniMax unveiled its next-generation speech generation model, Speech 2.5. According to the company, the model improves on its predecessor, Speech 02, in multilingual expressiveness, timbre reproduction accuracy, and the number of supported languages.
In the field of Artificial Intelligence Generated Content (AIGC), Text-to-Speech (TTS) technology is a key component of more natural human-computer interaction. A speech model is usually evaluated along several dimensions, including pronunciation accuracy (e.g., a low word error rate), the similarity between the generated speech and the target timbre, and the naturalness of the prosody (e.g., whether pauses and stress match human speaking habits). MiniMax's update centers on these core metrics.
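As a rough illustration of two of these metrics, the sketch below computes a word error rate and a timbre similarity score in Python. It assumes the common practice of transcribing the generated audio with a speech recognizer and comparing the transcript to the input text, and of comparing speaker embeddings by cosine similarity. The article does not say how MiniMax measures either metric, and the function names and inputs here are hypothetical.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two speaker embeddings (assumed to be
    equal-length, non-zero vectors produced by some speaker encoder)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower word error rate indicates more accurate pronunciation, while a cosine similarity closer to 1 suggests the generated voice is closer to the target speaker.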
Core Upgrades: Multilingual Expressiveness, Timbre, and Coverage
According to MiniMax's official announcement, the main improvements in Speech 2.5 fall into the following three areas:
- Increased multilingual expressiveness: The new model has been further optimized for Mandarin Chinese and also performs better in mainstream languages such as English. According to the company, it surpasses its predecessor in similarity and natural prosody, with the aim of reducing the robotic quality that is common in multilingual scenarios.
- Improved accuracy of timbre reproduction: Timbre reproduction, the ability to clone a specific person's voice, is currently one of the main areas of competition in the TTS field. Speech 2.5 captures vocal details more accurately, especially in complex scenarios such as cross-language reproduction and the preservation of specific accents (e.g., regional accents within the same language), with the aim of higher-fidelity reproduction. For example, the model can mimic a particular accent and preserve the original speaker's vocal qualities when switching languages.
- Expanded language coverage: The new model adds support for less commonly supported languages such as Bulgarian, Danish, and Hebrew, bringing the total number of languages to 40. This expansion matters in practice for enterprises that need to deploy content globally.
Market Applications and Industry Impact
The application scenarios for high-quality, multilingual speech synthesis are expanding from traditional uses such as audiobooks and navigation voices into a much broader range of fields.
For business users, especially companies with an overseas presence, a model such as Speech 2.5 can substantially reduce the cost of multilingual content production. Commercials, product videos, and customer service voiceovers that used to require hiring native-speaking voice actors in different countries can now be generated quickly with the model, shortening the production cycle and reducing costs.
For content creators, personalized tone reproduction means they can publish multilingual content in their own voice, breaking down language barriers and thus reaching a wider global audience. This has huge potential for application in areas such as short videos, podcasts and live avatars.
In the education sector, the technology can also be used to quickly generate teaching courseware in niche languages or to create customized teaching materials with specific regional dialects, allowing for more localized knowledge dissemination.
Competitive Landscape
Speech synthesis is not an emerging field, and the market is highly competitive. The main competitors of MiniMax Speech include ElevenLabs, which is known for its powerful voice cloning and emotional expression. Meanwhile, models such as OpenAI's Voice Engine and Microsoft's VALL-E also demonstrate strong technical capabilities, although some are not yet widely available to the public.
In the press release, MiniMax mentioned that its Speech models have been adopted by agent platforms such as Vapi and Pipecat, as well as by domestic companies such as Highway Education and Himalaya. By continuously iterating on the model and expanding language support, MiniMax clearly intends to compete in the crowded global market on price/performance and in-depth support for specific markets.
Speech 2.5 is currently available to users through the MiniMax Open Platform and the company's official website.