WeClone's voice cloning feature is built on an acoustic model with 0.5B parameters. Its requirements and results are as follows:
- Hardware requirements: A CUDA-capable GPU is required; 6 GB or more of video memory is recommended.
- Input requirements: At least 5 seconds of clear WeChat voice messages (samples with a typical tone of voice and little background noise are recommended).
- Results: The spectral similarity between the generated voice and the original sample can reach 95%, preserving the rise and fall of intonation and the emotional characteristics of the original voice.
- Usage process: Place the voice files in the WeClone-audio folder → install the xcodec dependency → run the voice cloning script.
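The 5-second input requirement can be checked before running the cloning script. The following is only a minimal sketch, not part of WeClone itself: it assumes the voice messages have already been converted to WAV and placed in the WeClone-audio folder, and the helper names `wav_duration_seconds` and `find_usable_samples` are hypothetical.

```python
import wave
from pathlib import Path

MIN_SECONDS = 5.0  # minimum sample length stated in the requirements


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_usable_samples(folder, min_seconds=MIN_SECONDS):
    """Return (path, duration) pairs for WAV files at least min_seconds long.

    Hypothetical helper: assumes the samples are stored as .wav files.
    """
    usable = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration >= min_seconds:
            usable.append((path, duration))
    return usable


if __name__ == "__main__":
    # "WeClone-audio" is the folder named in the usage process.
    for path, duration in find_usable_samples("WeClone-audio"):
        print(f"{path.name}: {duration:.1f} s")
```

This check covers only duration. Clarity and background noise still need to be judged by listening to the samples.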
Technical description: This feature uses recent vector quantization techniques, which capture timbre details better than traditional TTS. In actual tests, cloning from a 10-second sample comes close to the quality of professional commercial products.
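To illustrate the general idea of vector quantization, here is a minimal sketch under stated assumptions: each audio frame is represented as a short feature vector and replaced by the index of its nearest entry in a codebook. This is a generic textbook version, not WeClone's or xcodec's actual implementation; the codebook values and the function names `quantize` and `dequantize` are made up for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(frame, codebook[i]))
        for frame in frames
    ]


def dequantize(indices, codebook):
    """Reconstruct approximate frames from codebook indices."""
    return [codebook[i] for i in indices]


# Toy, made-up data: three 2-dimensional codebook entries and three frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.8), (0.2, 0.9)]

indices = quantize(frames, codebook)
print(indices)  # [0, 1, 2]
print(dequantize(indices, codebook))
```

In practice a larger, learned codebook lets the discrete indices retain finer detail, which is the property the description above attributes to timbre capture.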
This answer comes from the article "WeClone: training digital doppelgangers with WeChat chats and voices".