Key techniques for improving the similarity of speech clones
The following measures help achieve high-quality voice cloning:
- Principles of sample selection:
- Use single-speaker audio with clear pronunciation (avoid multi-speaker conversations)
- Aim for a duration of 5-10 seconds, containing complete words or phrases rather than cut-off fragments
- Prioritize samples with a neutral tone (avoid exaggerated emotions)
- Parameter optimization scheme:
- Raise the --t_w value (similarity weight) moderately; 3.0-4.0 is recommended
- At the same time, lower the --p_w value (intelligibility weight) to the 0.5-1.2 range
- Technical Support:
- Enhance sound quality with built-in WaveVAE vocoder
- Be sure to use the official pre-extracted latents file.
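The checks in the list above can be sketched as a small pre-flight script. This is only an illustration: the numeric ranges are the ones quoted in the list, while the script path tts/infer_cli.py and the flags --input_wav, --input_text and --output_dir are assumptions about the command-line layout and should be verified against the project's own documentation. The duration helper uses Python's standard wave module, so it only handles uncompressed PCM WAV files.

```python
import wave

# Ranges quoted in the list above; treat them as starting points, not hard limits.
DURATION_RANGE_S = (5.0, 10.0)
T_W_RANGE = (3.0, 4.0)
P_W_RANGE = (0.5, 1.2)


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_settings(duration_s, t_w, p_w):
    """Return a list of warnings for values outside the suggested ranges."""
    warnings = []
    checks = [
        ("prompt duration (s)", duration_s, DURATION_RANGE_S),
        ("--t_w", t_w, T_W_RANGE),
        ("--p_w", p_w, P_W_RANGE),
    ]
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            warnings.append(f"{name} = {value} is outside {low}-{high}")
    return warnings


def build_infer_command(prompt_wav, text, output_dir, t_w, p_w):
    """Assemble an argument list for the inference script (assumed CLI layout)."""
    return [
        "python", "tts/infer_cli.py",   # assumed script path
        "--input_wav", prompt_wav,      # assumed flag name
        "--input_text", text,           # assumed flag name
        "--output_dir", output_dir,     # assumed flag name
        "--t_w", str(t_w),
        "--p_w", str(p_w),
    ]
```

An empty list from check_settings means the prompt length and both weights fall inside the suggested ranges. The list returned by build_infer_command can be passed to subprocess.run once the flag names have been confirmed.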
If the result is not satisfactory, generate several times and keep the best output, or split long text into short sentences and synthesize each one separately.
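Splitting long text into short sentences can be done with a few lines of Python before synthesis. The following is a minimal sketch, not part of MegaTTS3 itself: split_sentences is a hypothetical helper that cuts after common Chinese and English sentence-ending punctuation. It is deliberately simple, so it will also split decimals such as "3.5" and abbreviations such as "e.g."; text containing those needs a more careful splitter.

```python
import re

# One sentence = a run of non-terminator characters followed by any
# trailing Chinese or English sentence-ending punctuation.
_SENTENCE = re.compile(r"[^。！？；.!?;]+[。！？；.!?;]*")


def split_sentences(text):
    """Split text into short sentences, keeping the ending punctuation."""
    pieces = (match.strip() for match in _SENTENCE.findall(text))
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    long_text = "Hello world. How are you? 你好。今天天气不错！"
    for sentence in split_sentences(long_text):
        # Each sentence would be synthesized on its own, and the
        # resulting audio clips joined afterwards.
        print(sentence)
```

Each returned sentence can then be synthesized as its own request with the same prompt audio and weights, which keeps the voice consistent across the pieces.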
This answer comes from the article "MegaTTS3: A Lightweight Model for Synthesizing Chinese and English Speech".