TREPA (Temporal Representation Alignment), introduced by LatentSync in version 1.5, addresses the frame flickering that is common in AI-generated video. The technique rests on three key points:
- Adding a temporal regularization term to U-Net's attention mechanism to constrain feature changes in neighboring frames
- Penalizing unnatural temporal jumps with a specially designed loss function
- Establishing inter-frame correlation in latent space rather than optimizing each frame individually
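
The idea of penalizing jumps between neighboring frames can be sketched in a few lines. This is only an illustration under stated assumptions, not LatentSync's actual loss: the function names `temporal_smoothness_penalty` and `total_loss`, the mean-squared-difference form, and the `weight` value are all hypothetical, and each frame is represented as a flat list of latent features to keep the example dependency-free.

```python
from typing import Sequence


def temporal_smoothness_penalty(frames: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between features of neighboring frames.

    Hypothetical stand-in for a temporal regularization term; it is zero
    when consecutive frames are identical and grows with abrupt changes.
    """
    if len(frames) < 2:
        return 0.0  # a single frame has no neighbor to compare against
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("all frames must have the same feature length")
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
        count += len(curr)
    return total / count if count else 0.0


def total_loss(per_frame_loss: float,
               frames: Sequence[Sequence[float]],
               weight: float = 0.1) -> float:
    """Combine a per-frame quality loss with the temporal penalty.

    `weight` is an assumed hyperparameter balancing the two terms.
    """
    return per_frame_loss + weight * temporal_smoothness_penalty(frames)


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]   # gradual change
    flicker = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]  # abrupt back-and-forth
    print(temporal_smoothness_penalty(smooth) < temporal_smoothness_penalty(flicker))
```

A sequence that drifts gradually receives a much smaller penalty than one that jumps back and forth, so adding the term to the training objective pushes the model toward consistent neighboring frames. A real implementation would operate on latent tensors rather than Python lists.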
Compared to traditional frame-by-frame processing, TREPA keeps the video sequence coherent while preserving the quality of each individual frame. Tests show that the technique improves the subjective fluency score of the generated video by 37% without additional computational overhead.
This answer comes from the article "LatentSync: an open source tool for generating lip-synchronized video directly from audio".