Special Challenges of Chinese TTS
Chinese has complex pronunciation rules, such as polyphonic characters (one character with several readings) and erhua (r-colored syllable endings). Chinese support in the current version is still being improved, but accuracy can be raised with the following measures:
Solutions
- Text preprocessing: Integrate the pypinyin library to force pinyin annotation of polyphonic characters (e.g. 银行, "bank" → yin hang)
- Prosody control: Insert SSML tags into the input text to control pauses (<break time="200ms"/>)
- Private training: Use the open-source toolkit chinese-tts-finetune to fine-tune the ONNX model
- Post-processing correction: Use FFmpeg's atempo filter to adjust clips with an abnormal speech rate
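The prosody and post-processing items above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of Kokoro-ONNX: the helper names insert_breaks, atempo_chain and ffmpeg_command are hypothetical, the punctuation set is an arbitrary choice, and whether a given engine actually interprets SSML break tags has to be checked separately. The atempo stages are kept within 0.5 to 2.0 and chained for larger changes, which is the range older FFmpeg builds require.

```python
import re
from typing import List

# Clause-level punctuation after which a short pause is inserted
# (full-width Chinese marks plus ASCII comma and semicolon). Assumed set.
_PAUSE_PUNCT = re.compile(r"([，、；：,;])")


def insert_breaks(text: str, pause_ms: int = 200) -> str:
    """Insert an SSML <break> tag after each clause-level punctuation mark."""
    tag = f'<break time="{pause_ms}ms"/>'
    return _PAUSE_PUNCT.sub(lambda m: m.group(1) + tag, text)


def atempo_chain(factor: float) -> str:
    """Build an FFmpeg audio filter string that changes tempo by `factor`.

    Each atempo stage stays within 0.5-2.0; larger changes are chained.
    """
    if factor <= 0:
        raise ValueError("tempo factor must be positive")
    stages = []
    while factor > 2.0:
        stages.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        stages.append(0.5)
        factor /= 0.5
    stages.append(factor)
    return ",".join(f"atempo={s:g}" for s in stages)


def ffmpeg_command(src: str, dst: str, factor: float) -> List[str]:
    """Return the argument list for an FFmpeg tempo correction (not executed here)."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(factor), dst]
```

The list returned by ffmpeg_command can be passed to subprocess.run once FFmpeg is installed; a factor below 1.0 slows a clip that was synthesized too fast, and a factor above 1.0 speeds up one that drags.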
Interim Alternatives
If you urgently need production-grade Chinese TTS, consider one of the following: 1) wait for the official v1.0 Chinese model; 2) combine the tool with Bert-VITS2 for front-end text analysis; 3) connect to the AliCloud or Xunfei API as a fallback.
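The fallback idea in option 3 can be expressed as a small wrapper that tries the local model first and only calls a cloud service when that fails. This is a sketch under assumptions: the function synthesize_with_fallback and the Synth callable type are hypothetical, and the real AliCloud or Xunfei client calls would need to be wrapped to match that shape.

```python
from typing import Callable, Sequence

# A backend takes text and returns encoded audio bytes (assumed interface).
Synth = Callable[[str], bytes]


def synthesize_with_fallback(text: str, backends: Sequence[Synth]) -> bytes:
    """Try each backend in order and return the first non-empty result."""
    errors = []
    for backend in backends:
        name = getattr(backend, "__name__", repr(backend))
        try:
            audio = backend(text)
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if audio:
            return audio
        errors.append(f"{name}: empty audio")
    raise RuntimeError("all TTS backends failed: " + "; ".join(errors))
```

Ordering the list as local model first, cloud API second keeps cost and latency low in the common case while still returning audio when the local Chinese model fails.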
This answer comes from the article "Kokoro-ONNX: Efficient Text-to-Speech Tool with Multi-Language and Multi-Voice Support".





























