Generating a lip-synchronized video with SVLS involves three key steps:
1. Environment preparation
The following dependencies need to be installed (a quick verification sketch follows this list):
- The GPU build of PyTorch (1.12.1+cu113) together with the related vision and audio libraries
- The FFmpeg multimedia processing tools
- All Python packages listed in the project's requirements.txt
- The PaddlePaddle deep learning framework, which is only needed if you want to use the DAIN frame interpolation feature
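As a sanity check after installation, a short script along these lines can confirm that the main dependencies are in place. The checks and messages are illustrative, not part of the SVLS project itself:

```python
# Hypothetical environment check for SVLS; adjust the expectations to your setup.
import shutil

import torch


def check_environment():
    # PyTorch with CUDA: the article pins 1.12.1+cu113, other GPU builds may also work.
    print(f"PyTorch version : {torch.__version__}")
    print(f"CUDA available  : {torch.cuda.is_available()}")

    # FFmpeg must be reachable on the PATH for audio/video processing.
    ffmpeg = shutil.which("ffmpeg")
    print(f"FFmpeg found    : {ffmpeg or 'NOT FOUND'}")

    # PaddlePaddle is only required when the DAIN frame interpolation option is used.
    try:
        import paddle  # noqa: F401
        print("PaddlePaddle    : installed (DAIN available)")
    except ImportError:
        print("PaddlePaddle    : missing (DAIN disabled)")


if __name__ == "__main__":
    check_environment()
```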
2. Input file preparation
Two core files need to be prepared (a conversion helper is sketched after this list):
- driven_audio: the audio file that drives the lip movement (e.g. .wav format)
- source_video: the original video file containing the portrait (e.g. .mp4 format)
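If the source audio is not already a .wav file, FFmpeg can re-encode it before inference. The helper below is a hedged sketch; the 16 kHz mono setting is an assumption about what SadTalker-style pipelines commonly expect, not a requirement stated in the article:

```python
# Hypothetical preprocessing helper: converts an arbitrary audio file into a .wav
# suitable for --driven_audio. Sample rate and channel count are assumptions.
import subprocess
from pathlib import Path


def to_driven_wav(src_audio: str, out_wav: str = "audio.wav", sample_rate: int = 16000) -> str:
    """Re-encode src_audio to a mono .wav file using FFmpeg."""
    if not Path(src_audio).exists():
        raise FileNotFoundError(src_audio)
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_audio,
         "-ac", "1", "-ar", str(sample_rate), out_wav],
        check=True,
    )
    return out_wav


# Example: to_driven_wav("speech.mp3") produces "audio.wav"
```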
3. Running the generation command
The generation process is launched with a command like the following:
python inference.py --driven_audio audio.wav --source_video video.mp4 --enhancer face --use_DAIN --time_step 0.5
The important parameters are:
- --enhancer: selects the enhancement mode (none = no enhancement, lip = lip-only enhancement, face = full-face enhancement)
- --use_DAIN: enables DAIN frame interpolation up to 50 fps
- --time_step: controls the interpolation density (0.5 inserts one frame between each pair, doubling the frame rate)
The generated results are saved by default in the ./results directory; comparison videos for the different enhancement modes can be viewed in the sync_show subdirectory.
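To compare the enhancement modes systematically, the command above can also be driven from a short Python wrapper. This is a minimal sketch that assumes only the flags shown in the example command; the file names are placeholders:

```python
# Sketch of invoking inference.py from Python for each enhancement mode.
import subprocess

DRIVEN_AUDIO = "audio.wav"   # placeholder path to the driving audio
SOURCE_VIDEO = "video.mp4"   # placeholder path to the portrait video


def run_svls(enhancer: str, use_dain: bool = True, time_step: float = 0.5) -> None:
    cmd = [
        "python", "inference.py",
        "--driven_audio", DRIVEN_AUDIO,
        "--source_video", SOURCE_VIDEO,
        "--enhancer", enhancer,
    ]
    if use_dain:
        cmd += ["--use_DAIN", "--time_step", str(time_step)]
    subprocess.run(cmd, check=True)


# Generate one result per enhancement mode, then compare them
# under ./results/sync_show.
for mode in ["none", "lip", "face"]:
    run_svls(mode)
```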
This answer comes from the article "SVLS: SadTalker Enhanced to Generate Digital People Using Portrait Video".































