Command Line Operation Guide
The following is the standard workflow for LatentSync version 1.5:
1. Prepare the input files
- Video requirements: the clip should contain a clear face (a frontal view is recommended) and can be converted to 25 fps with ffmpeg:
ffmpeg -i input.mp4 -r 25 resized.mp4
- Audio requirements: a WAV file with a 16000 Hz sampling rate; conversion command:
ffmpeg -i audio.mp3 -ar 16000 audio.wav
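Before running inference, it can help to verify that the converted audio really is a 16000 Hz WAV. A minimal sketch using only Python's standard library `wave` module; the helper name `check_audio` is our own, not part of LatentSync:

```python
# Sketch: verify a WAV file matches LatentSync's 16 kHz requirement.
# check_audio() is a hypothetical helper, not part of the LatentSync codebase.
import wave

def check_audio(path, expected_rate=16000):
    """Return True if the WAV file at `path` has the expected sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected_rate
```

For example, `check_audio("audio.wav")` should return `True` after the ffmpeg conversion above.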
2. Run the inference command
python -m scripts.inference \
  --unet_config_path "configs/unet/stage2_efficient.yaml" \
  --inference_ckpt_path "checkpoints/latentsync_unet.pt" \
  --inference_steps 25 \
  --guidance_scale 2.0 \
  --video_path "input.mp4" \
  --audio_path "audio.wav" \
  --video_out_path "output.mp4"
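If you run this command repeatedly with different inputs, it can be convenient to build the argument list programmatically. A minimal sketch: `build_command` and `run_inference` are our own wrapper names, not part of LatentSync; the flags simply mirror the command above:

```python
# Sketch: assemble the LatentSync inference command as an argument list.
# build_command()/run_inference() are hypothetical wrappers, not LatentSync APIs.
import subprocess

def build_command(video, audio, out, steps=25, scale=2.0):
    """Return the argv list for one inference run."""
    return [
        "python", "-m", "scripts.inference",
        "--unet_config_path", "configs/unet/stage2_efficient.yaml",
        "--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
        "--inference_steps", str(steps),
        "--guidance_scale", str(scale),
        "--video_path", video,
        "--audio_path", audio,
        "--video_out_path", out,
    ]

def run_inference(video, audio, out, **kwargs):
    """Run inference, raising if the subprocess exits non-zero."""
    subprocess.run(build_command(video, audio, out, **kwargs), check=True)
```

For example, `run_inference("input.mp4", "audio.wav", "output.mp4", steps=30)` would rerun with 30 denoising steps while keeping every other flag fixed.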
Description of key parameters

| Parameter | Purpose | Recommended value |
|---|---|---|
| inference_steps | Controls generation quality | 20-50 (higher values give finer results) |
| guidance_scale | Lip-sync matching strength | 1.0-3.0 (too high may cause distortion) |
When finished, check output.mp4; if the lips are not well synchronized, adjust the parameters and regenerate.
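When tuning, a small sweep over guidance_scale makes it easy to compare candidates side by side. A self-contained sketch; the output-filename pattern and helper name are our own convention, not part of LatentSync:

```python
# Sketch: prepare one inference command per guidance_scale candidate.
# sweep_guidance() is a hypothetical helper; run each command with
# subprocess.run(cmd, check=True) and compare the resulting videos.
def sweep_guidance(video, audio, scales=(1.5, 2.0, 2.5)):
    """Return (output_path, argv) pairs, one per guidance value."""
    jobs = []
    for s in scales:
        out = f"output_gs{s}.mp4"  # one candidate file per guidance value
        cmd = [
            "python", "-m", "scripts.inference",
            "--unet_config_path", "configs/unet/stage2_efficient.yaml",
            "--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
            "--inference_steps", "25",
            "--guidance_scale", str(s),
            "--video_path", video,
            "--audio_path", audio,
            "--video_out_path", out,
        ]
        jobs.append((out, cmd))
    return jobs
```

Staying within the table's 1.0-3.0 range keeps the sweep inside the values the guide recommends.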
This answer is based on the article "LatentSync: an open-source tool for generating lip-synchronized video directly from audio".