Three key steps are required to achieve smooth language/accent switching:
- Configure language parameters: When calling `Text2Speech`, pass in the `lang` parameter (e.g. `lang="en"`) and set the speaker characteristics in conjunction with `spk_embed_dim`.
- Preprocess the text:
Use the langid tool to detect the language of the text so that it matches the model parameters. Sample code:
    import langid
    # classify() returns a (language_code, score) tuple; keep only the code
    lang = langid.classify(text)[0]
    text2speech(text, lang=lang)
- Post-processing optimization:
In `config.yaml`, adjust the `duration_predictor` and `pitch_predictor` parameters. The recommended setting is `pitch_scale: 1.2` for Chinese and `energy_scale: 0.9` for English.
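
The detection and per-language tuning steps above can be sketched together in Python. This is only an illustration under stated assumptions: the helper names (`resolve_lang`, `settings_for`, `fake_classify`), the supported-language tuple, the neutral base value of 1.0, and the flat key layout are all hypothetical and are not taken from the model's actual `config.yaml`. Only the two scale values come from the text. The detector is passed in as a callable so the sketch runs without langid installed.

```python
from typing import Callable, Dict, Tuple

# Per-language overrides quoted in the text. The flat key layout is an
# assumption; the real config.yaml may nest these values differently.
LANG_OVERRIDES: Dict[str, Dict[str, float]] = {
    "zh": {"pitch_scale": 1.2},
    "en": {"energy_scale": 0.9},
}


def resolve_lang(
    text: str,
    classify: Callable[[str], Tuple[str, float]],
    supported: Tuple[str, ...] = ("en", "fr", "zh"),  # assumed language set
    default: str = "en",
) -> str:
    """Return the detected language code, or `default` if it is unsupported."""
    lang, _score = classify(text)
    return lang if lang in supported else default


def settings_for(lang: str, base: Dict[str, float]) -> Dict[str, float]:
    """Copy `base` and apply the overrides for `lang`, if there are any."""
    merged = dict(base)
    merged.update(LANG_OVERRIDES.get(lang, {}))
    return merged


def fake_classify(text: str) -> Tuple[str, float]:
    """Stand-in detector (hypothetical): CJK characters mean "zh", else "en"."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return ("zh", -1.0) if has_cjk else ("en", -1.0)


if __name__ == "__main__":
    base = {"pitch_scale": 1.0, "energy_scale": 1.0}  # assumed neutral values
    for sample in ("Hello there", "你好"):
        lang = resolve_lang(sample, fake_classify)
        print(lang, settings_for(lang, base))
```

With langid installed, `langid.classify` can be passed as the `classify` argument in place of `fake_classify`, since it also returns a `(language_code, score)` tuple. Falling back to a default language is a design choice of this sketch; raising an error for unsupported languages would be an equally reasonable alternative.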
Experiments show that this method can achieve a mean opinion score (MOS) of 4.2/5.0 in English-French bilingual switching scenarios.
This answer comes from the article "OpusLM_7B_Anneal: an efficient unified model for speech recognition and synthesis".