
MiniMax Releases Speech 2.5: Speech Synthesis Advances in Multilingual Support and Voice Cloning

On August 7, MiniMax unveiled its next-generation speech generation model, Speech 2.5. According to the company, the model improves on its predecessor, Speech 02, in multilingual expressiveness, voice-cloning accuracy, and the number of supported languages.

In the field of AI-generated content (AIGC), text-to-speech (TTS) technology is a key component of more natural human-computer interaction. Speech models are usually evaluated along several dimensions: pronunciation accuracy (for example, a low word error rate), the similarity between the generated speech and the target timbre, and the naturalness of prosody (for example, whether pauses and stress match human speech habits). MiniMax's update centers on these core metrics.
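Two of these metrics are easy to illustrate. The following is a minimal sketch, not MiniMax's evaluation method, which the announcement does not describe. It assumes that word error rate is the word-level edit distance between a reference transcript and a transcript of the synthesized audio, divided by the number of reference words, and that timbre similarity is the cosine similarity between two speaker-embedding vectors. The function names are illustrative, and the embeddings are assumed to come from some separate speaker-embedding model.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 errors over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical 3-dimensional speaker embeddings, for illustration only.
    print(cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]))
```

For languages written without spaces between words, such as Mandarin Chinese, the same calculation is usually applied to characters rather than words.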

Core Upgrades: Multilingual Expressiveness, Voice Cloning, and Language Coverage

According to MiniMax's official release, the main advances in Speech 2.5 fall into three areas:

  1. Increased multilingual expressiveness: The new model has been further optimized for Mandarin Chinese and also performs better in mainstream languages such as English. According to MiniMax, it surpasses its predecessor in similarity and natural prosody, with the aim of reducing the "mechanical" quality that is common in multilingual scenarios.
  2. Improved voice-cloning accuracy: Timbre reproduction, the ability to clone a specific speaker's voice, is one of the focal points of competition in the TTS field today. Speech 2.5 captures vocal detail more faithfully, especially in complex scenarios such as cross-language cloning and the preservation of specific accents (for example, regional accents within the same language). The model can, for instance, imitate a particular accent and speaking style and preserve the original speaker's vocal qualities when switching languages.
  3. Expanded language coverage: The new model adds support for less widely spoken languages such as Bulgarian, Danish, and Hebrew, bringing the total number of supported languages to 40. This expansion matters in practice for enterprises that need to deploy content globally.

 

Market Applications and Industry Impact

The application scenarios for high-quality multilingual speech synthesis are expanding from traditional audiobooks and navigation voices to a much broader range of fields.

For business users, especially companies with an overseas presence, a model such as Speech 2.5 can substantially reduce the cost of multilingual content production. Commercials, product videos, and customer service voiceovers that used to require hiring native-speaking voice artists in different countries can now be generated quickly with the model, shortening the production cycle and reducing costs.

For content creators, personalized tone reproduction means they can publish multilingual content in their own voice, breaking down language barriers and thus reaching a wider global audience. This has huge potential for application in areas such as short videos, podcasts and live avatars.

In the education sector, the technology can also be used to quickly generate teaching courseware in niche languages or to create customized teaching materials with specific regional dialects, allowing for more localized knowledge dissemination.

Competitive Landscape

Speech synthesis is not a new field, and the market is highly competitive. MiniMax Speech's main competitors include ElevenLabs, which is known for its strong voice cloning and emotional expression. Meanwhile, models such as OpenAI's Voice Engine and Microsoft's VALL-E have also demonstrated strong technical capabilities, although some are not yet widely available to the public.

In its press release, MiniMax noted that its Speech models have been adopted by agent platforms such as Vapi and Pipecat, as well as by domestic companies such as Highway Education and Himalaya. By continuously iterating on the model and expanding language support, MiniMax clearly aims to compete in the crowded global market on price-performance and in-depth support for specific markets.

Speech 2.5 is currently available to users through the MiniMax Open Platform and the company's official website.
