A Solution for Improving Multilingual Speech Naturalness
Cross-language text-to-speech (TTS) often suffers from unnatural pronunciation and stiff intonation. Orate, working with speech providers such as ElevenLabs, offers the following solutions:
- Dedicated multilingual model: for example, the 'multilingual_v2' model, which is optimized for cross-language scenarios and supports 28 languages
- Voice presets: built-in voice configurations such as 'Aria' help keep each language's pronunciation characteristics accurate
- Expressive parameter control: parameters such as speaking rate and pitch can be adjusted through the API
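As a rough illustration of the parameter control mentioned above, the sketch below normalizes a set of voice settings before they are sent to a provider. The field names (stability, similarity_boost, style, speed) and their ranges follow ElevenLabs' voice settings as I understand them and should be checked against the current documentation. The helper normalizeVoiceSettings and its default values are hypothetical, and whether Orate passes these fields through is not confirmed by the source. Pitch is left out because no pitch field is assumed here.

```typescript
// Voice settings modeled on ElevenLabs' `voice_settings` object (assumed names and ranges).
interface VoiceSettings {
  stability: number;        // assumed range 0..1
  similarity_boost: number; // assumed range 0..1
  style: number;            // assumed range 0..1
  speed: number;            // assumed range 0.7..1.2, 1.0 = normal rate
}

// Restrict a value to the inclusive range [lo, hi].
function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// Hypothetical helper: fill in illustrative defaults and clamp every field
// into its assumed valid range.
function normalizeVoiceSettings(partial: Partial<VoiceSettings>): VoiceSettings {
  return {
    stability: clamp(partial.stability ?? 0.5, 0, 1),
    similarity_boost: clamp(partial.similarity_boost ?? 0.75, 0, 1),
    style: clamp(partial.style ?? 0, 0, 1),
    speed: clamp(partial.speed ?? 1.0, 0.7, 1.2),
  };
}
```

Clamping on the client side keeps an out-of-range value from turning into a rejected request, at the cost of silently changing what the caller asked for.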
Implementation Steps:
- Import the ElevenLabs adapter.
- Select the multilingual_v2 model and an appropriate voice.
- Write the prompt with a tag for each language (e.g. [ZH] Chinese text [EN] English text).
- Optionally add a prosody parameter to adjust intonation changes.
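The tagged-prompt step can be sketched as follows. This is a minimal illustration and not Orate's own API: splitTaggedPrompt, buildRequests, and the SpeechRequest shape are hypothetical names introduced here, and splitting the prompt into one request per language is an assumption about how the tags might be used. The model and voice identifiers are the ones named in the steps above.

```typescript
interface Segment {
  lang: string; // lower-cased language tag, e.g. "zh" or "en"
  text: string;
}

// Hypothetical request shape; the real Orate call signature is not shown in the source.
interface SpeechRequest {
  model: string;
  voice: string;
  lang: string;
  prompt: string;
}

// Split a prompt such as "[ZH] ... [EN] ..." into per-language segments.
// Text that appears before the first tag is ignored when tags are present.
function splitTaggedPrompt(prompt: string): Segment[] {
  const pattern = /\[([A-Za-z]{2,3})\]\s*([^\[]*)/g;
  const segments: Segment[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const text = match[2].trim();
    if (text.length > 0) {
      segments.push({ lang: match[1].toLowerCase(), text });
    }
  }
  // An untagged, non-empty prompt becomes one segment with an undetermined language.
  if (segments.length === 0 && prompt.trim().length > 0) {
    segments.push({ lang: "und", text: prompt.trim() });
  }
  return segments;
}

// Build one request per language segment using the model and voice from the steps above.
function buildRequests(
  prompt: string,
  model: string = "multilingual_v2",
  voice: string = "aria",
): SpeechRequest[] {
  return splitTaggedPrompt(prompt).map((segment) => ({
    model,
    voice,
    lang: segment.lang,
    prompt: segment.text,
  }));
}
```

Each resulting request could then be handed to whatever speech call the adapter exposes. Sending one request per language keeps the tags out of the synthesized text, though it may introduce audible seams between segments.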
In practice, multilingual speech generated this way is reported to reach a mean opinion score (MOS) of up to 4.2 on a 5-point scale, which is close to the level of natural human speech.
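For readers unfamiliar with MOS, the sketch below shows how such a score is typically computed: listeners rate each sample from 1 to 5 and the ratings are averaged. The function name meanOpinionScore and the ratings in the usage example are hypothetical and do not come from the source. The 95% interval uses a normal approximation, which is only a rough guide for small listener panels.

```typescript
interface MosResult {
  mos: number;  // arithmetic mean of the ratings
  ci95: number; // half-width of an approximate 95% confidence interval
}

// Compute a mean opinion score from listener ratings on a 1..5 scale.
function meanOpinionScore(ratings: number[]): MosResult {
  if (ratings.length < 2) {
    throw new Error("need at least two ratings");
  }
  for (const rating of ratings) {
    if (rating < 1 || rating > 5) {
      throw new Error(`rating out of range: ${rating}`);
    }
  }
  const n = ratings.length;
  const mean = ratings.reduce((sum, r) => sum + r, 0) / n;
  // Sample variance (n - 1 in the denominator).
  const variance =
    ratings.reduce((sum, r) => sum + (r - mean) ** 2, 0) / (n - 1);
  const ci95 = (1.96 * Math.sqrt(variance)) / Math.sqrt(n);
  return { mos: mean, ci95 };
}

// Usage with made-up ratings from five listeners.
const example = meanOpinionScore([4, 5, 4, 4, 4]);
console.log(example.mos.toFixed(2), "+/-", example.ci95.toFixed(2));
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than the listener-to-listener variation.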
This answer comes from the article "Orate: A Unified API for Integrating Well-Known Speech Generation, Speech Transcription and Voice Change Models".































