How to improve the fidelity of WeClone speech clones?

2025-08-25

Speech Cloning Optimization Solution

To push voice similarity above 95%, optimize three areas:

  • Sample quality: Choose 5-10 seconds of WeChat voice audio with no background noise. We recommend exporting a recording made with the system's built-in recorder. Avoid clips that contain: 1) background music, 2) multi-person conversations, 3) electrical hum.
  • Parameter tuning: In xcodec_config.json, raise hop_length to 256 and set remove_silence=True to improve feature extraction.
  • Data augmentation: Use the sox audio tool to change the tempo without changing the pitch (command: sox input.wav output.wav tempo 0.9), generating multiple versions of each training sample.
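
The last two bullets above can be scripted. The following is a minimal sketch, not WeClone's own tooling: the helper names apply_overrides, tempo_commands and run_all are hypothetical, the config keys are assumed to sit at the top level of xcodec_config.json (the real layout may differ), and every tempo factor other than 0.9 is an illustrative choice.

```python
import json
import subprocess
from pathlib import Path


def apply_overrides(config_path, hop_length=256, remove_silence=True):
    """Set hop_length and remove_silence in a JSON config file.

    Assumption: both keys are top-level entries. If the real
    xcodec_config.json nests them, adjust the lookup accordingly.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["hop_length"] = hop_length
    config["remove_silence"] = remove_silence
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return config


def tempo_commands(src, out_dir, factors=(0.9, 0.95, 1.05, 1.1)):
    """Build one sox command per tempo factor.

    sox's `tempo` effect changes speed while keeping pitch, matching
    the command quoted in the article. Only 0.9 comes from the article;
    the other default factors are illustrative.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    commands = []
    for factor in factors:
        dst = out_dir / f"{src.stem}_tempo{factor:.2f}{src.suffix}"
        commands.append(["sox", str(src), str(dst), "tempo", str(factor)])
    return commands


def run_all(commands):
    """Execute the commands; requires sox to be installed and on PATH."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(tempo_commands("input.wav", "augmented")) would write four tempo variants into an existing augmented directory. Building the commands separately from running them makes it easy to inspect them first.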

Advanced tips: 1) annotate the text with prosody markers, 2) add 10 ms of leading silence, 3) use NSF-HiFiGAN as the back-end vocoder. Results can be compared using the mel-spectral similarity (mel-CDTW) metric.
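
The second tip, padding the start of a clip with 10 ms of silence, can be done with Python's standard wave module. This is a hedged sketch that assumes uncompressed PCM WAV input; the function name prepend_silence is hypothetical and is not part of WeClone.

```python
import wave


def prepend_silence(src_path, dst_path, silence_ms=10):
    """Copy a PCM WAV file, adding `silence_ms` of silence at the start.

    Returns the number of silent frames that were added.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    # Number of frames that correspond to the requested duration.
    n_silent = params.framerate * silence_ms // 1000

    # 8-bit WAV samples are unsigned, so silence is 0x80;
    # wider samples are signed and silence is 0x00.
    pad_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = pad_byte * (n_silent * params.sampwidth * params.nchannels)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The wave module corrects the frame count in the header on close.
        dst.writeframes(silence + frames)
    return n_silent
```

For a 16 kHz recording, prepend_silence("input.wav", "padded.wav") adds 160 frames of silence before the original audio.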
