
LatentSync is an open-source tool for audio-driven lip-synchronization using Stable Diffusion technology

2025-08-27

LatentSync is an AI lip-synchronization tool developed by ByteDance and built on Stable Diffusion's latent diffusion model. It combines Whisper audio feature extraction with a U-Net architecture to generate video frames directly from audio. Its technical implementation consists of three core steps:

  • Phoneme-related features are first extracted from the audio by the Whisper model
  • A modified U-Net then maps these audio features into the latent space of the video frames
  • Finally, a Stable Diffusion sampler generates a temporally continuous video sequence
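
The three steps above can be sketched as a simple data flow. This is a toy illustration only, not LatentSync's code or API: every function name here (`extract_audio_features`, `denoise_step`, `sample_frames`) is hypothetical, the energy statistics stand in for Whisper embeddings, and the linear update stands in for a U-Net prediction inside a diffusion sampler.

```python
import random
from typing import List

Vector = List[float]


def extract_audio_features(samples: List[float], window: int) -> List[Vector]:
    """Step 1 stand-in: one feature vector per video frame.

    A real system would use Whisper embeddings; here each window of audio
    samples is summarized by its mean and peak absolute amplitude.
    """
    features = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        mean_abs = sum(abs(s) for s in chunk) / window
        peak = max(abs(s) for s in chunk)
        features.append([mean_abs, peak])
    return features


def denoise_step(latent: Vector, audio_feat: Vector, strength: float) -> Vector:
    """Step 2 stand-in: pull the latent toward an audio-conditioned target.

    A real system would run an audio-conditioned U-Net; this toy version
    tiles the audio feature to the latent size and interpolates toward it.
    """
    target = [audio_feat[i % len(audio_feat)] for i in range(len(latent))]
    return [z + strength * (t - z) for z, t in zip(latent, target)]


def sample_frames(audio_feats: List[Vector], latent_dim: int = 4,
                  steps: int = 10, seed: int = 0) -> List[Vector]:
    """Step 3 stand-in: iterative sampling, one latent per video frame."""
    rng = random.Random(seed)
    latents = []
    for feat in audio_feats:
        # Start from Gaussian noise, as a diffusion sampler would.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(steps):
            z = denoise_step(z, feat, strength=0.5)
        latents.append(z)
    return latents


# Usage: a silent window followed by a loud window yields two frame latents.
audio = [0.0] * 4 + [1.0] * 4
feats = extract_audio_features(audio, window=4)
latents = sample_frames(feats)
```

In this sketch each frame's latent depends only on its own audio window, so nothing enforces continuity between frames. A real pipeline would also decode the latents back into images.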

This approach departs from traditional lip-synchronization methods based on 3D modeling and produces more natural-looking results. In version 1.5, the model also introduces TREPA temporal optimization, which significantly improves the temporal consistency of the generated video.
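
To make "temporal consistency" concrete, here is a minimal sketch of a temporal alignment loss. The text above does not describe how TREPA works internally, so this is an assumption: the sketch compares temporal representations of a generated clip and a reference clip. The names `temporal_representation` and `temporal_alignment_loss` are hypothetical, and plain frame-to-frame differences stand in for whatever video encoder the real method uses.

```python
from typing import List

Frame = List[float]  # a frame flattened to a list of pixel or latent values


def temporal_representation(frames: List[Frame]) -> List[float]:
    """Toy stand-in for a video encoder: flattened frame-to-frame differences."""
    rep: List[float] = []
    for prev, curr in zip(frames, frames[1:]):
        rep.extend(c - p for p, c in zip(prev, curr))
    return rep


def temporal_alignment_loss(generated: List[Frame],
                            reference: List[Frame]) -> float:
    """Mean squared distance between the two temporal representations."""
    gen_rep = temporal_representation(generated)
    ref_rep = temporal_representation(reference)
    if len(gen_rep) != len(ref_rep):
        raise ValueError("clips must have the same number and size of frames")
    if not gen_rep:
        return 0.0  # fewer than two frames: no motion to compare
    return sum((g - r) ** 2 for g, r in zip(gen_rep, ref_rep)) / len(gen_rep)


# Usage: a clip that jumps and then freezes is penalized; a clip with the
# same motion as the reference but a constant brightness offset is not.
reference = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
flicker = [[0.0, 0.0], [2.0, 2.0], [2.0, 2.0]]
shifted = [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]

flicker_loss = temporal_alignment_loss(flicker, reference)
shifted_loss = temporal_alignment_loss(shifted, reference)
```

The design point is that such a loss looks at how frames change over time, not at each frame in isolation, so it targets flicker and jitter that a per-frame loss could miss.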
