How to use CosyVoice for zero-shot speech generation?

2025-08-23

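Zero-shot speech generation (cloning a voice from a short reference clip) is one of CosyVoice's key features. The procedure is as follows:

  1. Prepare the audio sample: a 16 kHz prompt audio file (e.g. zero_shot_prompt.wav) is required, along with the text spoken in it.
  2. Call the generation method: use the inference_zero_shot method and pass the target text, the prompt text, and the prompt audio:
    # Import path as used in the CosyVoice repository
    from cosyvoice.cli.cosyvoice import CosyVoice2
    import torchaudio
    cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B')
    # torchaudio.load returns (waveform, sample_rate); the file must already be 16 kHz
    prompt_speech_16k = torchaudio.load('./asset/zero_shot_prompt.wav')[0]
    # inference_zero_shot is a generator that yields dicts containing 'tts_speech'
    results = cosyvoice.inference_zero_shot('target text', 'prompt text', prompt_speech_16k)
  3. Save the output: iterate over the results and write each generated chunk to a file:
    for i, j in enumerate(results):
        torchaudio.save('output_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)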
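
Because step 1 depends on the prompt file really being 16 kHz, it can help to verify the file before loading it. The following is a minimal sketch using only Python's standard wave module. The helper name check_prompt_wav is hypothetical and not part of CosyVoice, and it assumes the prompt is an uncompressed PCM WAV file.

```python
import wave


def check_prompt_wav(path, expected_rate=16000):
    """Return (ok, sample_rate, channels) for an uncompressed PCM WAV file.

    Hypothetical helper, not part of CosyVoice: it only inspects the WAV
    header and does not load or modify the audio data.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()      # samples per second
        channels = wf.getnchannels()  # 1 = mono, 2 = stereo
    return rate == expected_rate, rate, channels
```

If the first element of the returned tuple is False, resample the file to 16 kHz before passing it to the model.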

Caveats:
- To reproduce the results shown on the official demo page, pass text_frontend=False at inference time.
- The CosyVoice2-0.5B model is recommended for best results.
- This method generates speech in the target voice from a short audio sample, with no additional training or fine-tuning for that speaker.
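
The text_frontend caveat can be sketched as a small wrapper. This assumes inference_zero_shot accepts stream and text_frontend keyword arguments, as it does in the CosyVoice repository at the time of writing. The function name synthesize_zero_shot is hypothetical, and model stands for an already constructed CosyVoice2 instance.

```python
def synthesize_zero_shot(model, text, prompt_text, prompt_speech_16k,
                         text_frontend=False):
    """Run zero-shot inference and collect every generated audio chunk.

    Hypothetical wrapper, not part of CosyVoice. `model` is assumed to
    expose inference_zero_shot as a generator yielding dicts that hold
    the audio under the 'tts_speech' key.
    """
    chunks = []
    for result in model.inference_zero_shot(
        text,
        prompt_text,
        prompt_speech_16k,
        stream=False,                 # generate without streaming output
        text_frontend=text_frontend,  # False skips text normalization
    ):
        chunks.append(result["tts_speech"])
    return chunks
```

Each returned chunk can then be written with torchaudio.save, using model.sample_rate as the sample rate.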
