A Transitional Scheme for Cross-Language Speech Synthesis
Although Muyan-TTS currently supports mainly English, acceptable Chinese output can be obtained with the following workarounds:
- Indirect generation:
  - Translate the Chinese text into English with machine translation
  - Generate English speech with Muyan-TTS
  - Convert the timbre with a voice conversion tool (e.g. so-vits-svc)
- Model fine-tuning:
  - Collect a parallel Chinese-English corpus (bilingual recordings of the same content)
  - Run cross-language adaptation training on top of the existing model
  - Focus the tuning on the SoVITS decoder's handling of Chinese phonemes
- Hybrid system:
  - Process the English passages with Muyan-TTS
  - Hand the Chinese portions to another Chinese TTS system (e.g. VITS)
  - Adjust the timbre parameters in post-processing
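The hybrid approach depends on splitting mixed text into Chinese and English runs before each run is sent to its engine. The following is a minimal sketch of that routing step only, not part of Muyan-TTS. The names `split_by_language` and `synthesize_mixed` are hypothetical, and the `engines` callables stand in for whatever wrappers you write around Muyan-TTS and a Chinese TTS system. It assumes that characters in the CJK Unified Ideographs block and the adjacent CJK punctuation and full-width blocks mark Chinese text.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: a run of CJK ideographs, CJK punctuation, or full-width forms
# is treated as Chinese; everything else is treated as English.
_ZH_RUN = re.compile(r"[\u4e00-\u9fff\u3000-\u303f\uff00-\uffef]+")


def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Split mixed text into ordered ("zh" | "en", chunk) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in _ZH_RUN.finditer(text):
        # Text between the previous Chinese run and this one is English.
        english = text[pos:match.start()].strip()
        if english:
            segments.append(("en", english))
        segments.append(("zh", match.group()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("en", tail))
    return segments


def synthesize_mixed(
    text: str, engines: Dict[str, Callable[[str], bytes]]
) -> List[bytes]:
    """Send each segment to the engine registered for its language.

    `engines` maps "en" and "zh" to user-supplied callables (hypothetical
    wrappers around Muyan-TTS and a Chinese TTS system) that return audio.
    """
    return [engines[lang](chunk) for lang, chunk in split_by_language(text)]
```

The returned clips still have to be concatenated and timbre-matched afterwards. Note that this simple splitter treats digits and Latin letters embedded in a Chinese sentence as English, so a production router would likely need a more careful rule.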
Note that these approaches may compromise the naturalness of the prosody. For professional-grade Chinese content, it is recommended to wait for official support or to take part in a community multilingual training effort. In the meantime, the generated audio can be touched up in post-production with tools such as Adobe Audition to improve the listening experience.
This answer comes from the article "Muyan-TTS: Personalized Podcast Speech Training and Synthesis".































