Version 1.5 technical upgrades
Version 1.5, released in March 2025, brings three core improvements:
1. Improved temporal consistency
- Adopts TREPA (Temporal REPresentation Alignment) to reduce inter-frame jitter
- Adds a temporal convolutional layer to strengthen the correlation between preceding and following frames
- Reduces frame jumps in the demo videos by 42%
2. Optimized Chinese-language processing
- Extends the Whisper model's Chinese phoneme recognition capabilities
- Adds 200+ hours of Chinese video samples to the training data
- Improves Chinese lip-shape accuracy from 78% to 91%
3. Training efficiency gains
- Restructures the U-Net architecture to cut the video memory (VRAM) footprint by 25% (20 GB is enough for training)
- Adds the lightweight stage2_efficient.yaml configuration
- Supports gradient checkpointing for more stable training on long videos
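
To make the gradient checkpointing item above concrete, here is a minimal pure-Python sketch of the general idea: keep only a few activations during the forward pass and recompute the rest during the backward pass. The toy scalar "layers" and the names layer, grad_full and grad_checkpointed are hypothetical and are not taken from LatentSync. In PyTorch the equivalent facility is torch.utils.checkpoint; how LatentSync applies it inside its U-Net is not described here.

```python
import math


def layer(w, x):
    """Toy scalar layer: y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w, x):
    """Derivative of the toy layer with respect to its input x."""
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_full(weights, x0):
    """Ordinary backprop: store the input of every layer."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xin)
    return g, len(inputs)  # gradient, number of stored activations


def grad_checkpointed(weights, x0, segment):
    """Checkpointed backprop: store one input per segment, recompute the rest."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # keep only the segment's first input
        x = layer(w, x)

    g = 1.0
    peak = 0
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Recompute this segment's inputs from its checkpoint.
        seg_inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            seg_inputs.append(x)
            x = layer(w, x)
        # Upper bound on values held at once: all checkpoints plus one segment.
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for w, xin in zip(reversed(seg_weights), reversed(seg_inputs)):
            g *= layer_grad(w, xin)
    return g, peak


weights = [0.5 + 0.1 * i for i in range(16)]  # illustrative values only
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3, segment=4)
print(g_full, stored_full)
print(g_ckpt, stored_ckpt)
```

Both functions return the same gradient, but the checkpointed version holds fewer activations at any one time and pays for that by running each segment's forward pass twice. That compute-for-memory trade is what makes long sequences fit on a smaller GPU.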
These improvements make LatentSync more suitable for non-professional developers while maintaining quality.
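
As a rough illustration of the temporal consistency item, the sketch below measures jitter as the mean absolute change between consecutive frames and shows how a convolution along the time axis lowers it. Everything in it is an assumption made for illustration: the per-frame mouth_opening values are made up, the kernel is fixed rather than learned, and the functions jitter and temporal_conv are hypothetical names. It does not reproduce LatentSync's temporal layer or its 42% figure.

```python
def jitter(values):
    """Mean absolute difference between consecutive frames."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def temporal_conv(values, kernel=(0.25, 0.5, 0.25)):
    """1-D convolution along time; the ends are padded by repeating edge frames."""
    half = len(kernel) // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


# Made-up per-frame measurement (e.g. a normalized mouth opening) that flickers.
mouth_opening = [0.10, 0.30, 0.12, 0.34, 0.15, 0.38, 0.20, 0.42]
smoothed = temporal_conv(mouth_opening)

print("jitter before:", jitter(mouth_opening))
print("jitter after: ", jitter(smoothed))
```

Each output frame here is a weighted mix of its neighbours, so adjacent frames become more strongly correlated and the frame-to-frame change shrinks. A learned temporal layer applies the same kind of mixing to feature maps, with weights fitted during training.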
This answer comes from the article "LatentSync: an open source tool for generating lip-synchronized video directly from audio".































