There are three main steps to using the Voice Clone feature:
- Sound preparation: Record about 1 minute of clear audio in a quiet environment (a professional microphone is recommended). The recording should contain natural speech with varied pitch and rhythm.
- Upload and training: Click "Clone Your Voice" in the voice selection menu and upload WAV or MP3 files. The system extracts voiceprint features with a deep neural network; processing takes about 15-30 minutes.
- Synthesis and use: Once training is complete, the cloned voice appears in the user's private voice library and can be selected for use in any video project.
Important Notes:
- For commercial use, make sure you hold full copyright to the recorded content.
- Recording quality directly affects the quality of the clone; a sampling rate of at least 44.1 kHz is recommended.
- The system supports cloning in mainstream languages such as Chinese and English, but dialects or unusual pronunciations may reduce accuracy.
- Users can delete voice models at any time in their account settings
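
The recording guidance above (about 1 minute of audio, at least 44.1 kHz) can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module, so it covers WAV files but not MP3. The function names, the 15-second tolerance around "about 1 minute", and the wording of the messages are assumptions made for illustration; they are not part of VisionStory.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 44_100   # recommended minimum from the notes above
TARGET_DURATION_S = 60.0      # "about 1 minute"
DURATION_TOLERANCE_S = 15.0   # assumption: how far from 60 s still counts as "about"


def check_wav(source):
    """Return a list of problems found; an empty list means the file looks fine.

    `source` may be a file path or a binary file-like object containing WAV data.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()

    problems = []
    if rate <= 0:
        return ["invalid sample rate in WAV header"]

    duration = frames / rate
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if abs(duration - TARGET_DURATION_S) > DURATION_TOLERANCE_S:
        problems.append(f"duration {duration:.1f} s is not close to {TARGET_DURATION_S:.0f} s")
    return problems


def make_silent_wav(rate, seconds):
    """Build an in-memory mono 16-bit WAV of silence (used only for the demo below)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)                       # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    for label, clip in [
        ("good clip", make_silent_wav(44_100, 60)),
        ("low sample rate", make_silent_wav(22_050, 60)),
        ("too short", make_silent_wav(44_100, 5)),
    ]:
        print(label, "->", check_wav(clip) or "OK")
```

With a real recording, `check_wav("my_voice.wav")` (a hypothetical file name) would run the same checks on a file on disk. The check only looks at header metadata, so it says nothing about background noise or how varied the speech is.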
This answer comes from the article "VisionStory: generating AI explainer videos from images and text".