To get the best song conversion results, keep the following points in mind:
Preparation
- Select clean reference audio (singer samples) with no background noise
- Make sure the song recording is of good quality (16-bit/44 kHz or higher recommended)
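The quality recommendation above can be checked before conversion. The following is a minimal sketch using only Python's standard `wave` module, so it handles uncompressed PCM WAV files only. The function name `check_wav_quality` is hypothetical, and reading "44 kHz" as 44,100 Hz is an assumption.

```python
import wave


def check_wav_quality(path, min_bits=16, min_rate=44100):
    """Return (ok, bits, rate) for a PCM WAV file.

    min_rate=44100 assumes that "44 kHz" in the text means 44,100 Hz.
    """
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        rate = wav.getframerate()      # samples per second
    ok = bits >= min_bits and rate >= min_rate
    return ok, bits, rate
```

Calling `check_wav_quality("song.wav")` on a recording returns a tuple whose first element is `False` when the file falls below the recommended bit depth or sample rate.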
Parameter Settings
- Enable the f0-condition option to preserve the original pitch contour
- Set Diffusion Steps to 30-50 for finer sound quality
- Use the seed-uvit-whisper-base model (200M parameters) to process vocals
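As an illustration of how these settings might be passed on the command line, here is a rough sketch that assembles an argument list. Only the option name f0-condition comes from the text above. The script name `inference.py`, the flags `--source`, `--target`, `--output` and `--diffusion-steps`, the `True` value format, and the helper `build_conversion_command` are all assumptions, so check them against the project's own documentation before use.

```python
import shlex


def build_conversion_command(source, reference, output, diffusion_steps=40):
    """Assemble an argument list for a hypothetical Seed-VC inference script."""
    if diffusion_steps < 1:
        raise ValueError("diffusion_steps must be positive")
    return [
        "python", "inference.py",                   # assumed script name
        "--source", source,                         # assumed flag: song to convert
        "--target", reference,                      # assumed flag: clean singer sample
        "--output", output,                         # assumed flag: output location
        "--diffusion-steps", str(diffusion_steps),  # assumed flag spelling
        "--f0-condition", "True",                   # option from the text; value format assumed
    ]


command = build_conversion_command("song.wav", "singer.wav", "converted", 40)
print(shlex.join(command))  # shell-quoted form, handy for logging
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later handed to `subprocess.run`.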
Advanced Techniques
- For recordings with poor pitch, enable auto-f0-adjust for automatic calibration
- Use semi-tone-shift to fine-tune the pitch to match different singers' ranges
- For choruses, convert each part separately and combine the parts afterwards
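To choose a value for the pitch shift, it helps to relate semitones to frequency. The sketch below assumes standard twelve-tone equal temperament, where each semitone multiplies the frequency by 2^(1/12). The function names are hypothetical, and whether semi-tone-shift expects a whole number of semitones is an assumption.

```python
import math


def semitone_ratio(shift):
    """Frequency ratio for a shift of `shift` semitones (equal temperament)."""
    return 2.0 ** (shift / 12.0)


def semitones_between(source_hz, target_hz):
    """Nearest whole number of semitones that moves source_hz to target_hz."""
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    return round(12 * math.log2(target_hz / source_hz))
```

For example, if the original singer's typical pitch is an octave below the target singer's, `semitones_between` returns 12, and a negative result means the song should be shifted down.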
Note that by default the system downloads the 44 kHz seed-uvit-whisper-base model, which is currently the best choice for song conversion.
This answer comes from the article "Seed-VC: supports real-time conversion of speech and song with fewer samples".