Background and pain point analysis
Traditional video face-swapping methods often fail to separate facial appearance from motion information, resulting in stiff expressions and uncoordinated movements. CanonSwap addresses this core problem with a novel technical framework.
Core technical approach
- Canonical space transformation: video frames are first warped into a canonical space with a standardized pose, separating appearance information (facial features) from motion information (expression and head pose)
- Dynamic attribute decoupling: a Motion Extractor records the head-pose trajectory and fine-grained expression cues from the original video
- Partial Identity Modulation (PIM): only face-region features are modified in canonical space, while non-facial regions are protected by spatial masking
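The masking idea behind PIM can be sketched in a few lines. This is a simplified stand-in, not CanonSwap's actual module: the function name, the affine (scale/shift) modulation, and all tensor shapes are assumptions made for illustration.

```python
import numpy as np

def partial_identity_modulation(canonical_feat, face_mask, scale, shift):
    """Toy sketch of masked identity modulation in canonical space.

    canonical_feat: (C, H, W) appearance features in canonical space
    face_mask:      (1, H, W) soft mask, ~1 inside the face region
    scale, shift:   (C, 1, 1) modulation parameters assumed to be
                    predicted from the source identity embedding
    """
    # Apply the identity-conditioned transform everywhere...
    modulated = canonical_feat * scale + shift
    # ...but keep it only inside the face; non-face features pass through.
    return face_mask * modulated + (1.0 - face_mask) * canonical_feat
```

Because the blend is convex in the mask, features outside the face region are left bit-for-bit untouched, which is what protects hair, background, and clothing from identity leakage.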
Key implementation steps
The developer needs to: 1) run the ID Encoder to extract identity features from the source face; 2) analyze the target video's dynamics with the Motion Extractor; 3) apply PIM modulation in canonical space; 4) recover the original motion via the inverse transformation. This process ensures that the new face inherits the micro-expressions and head-motion parameters of the original video.
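The four steps above can be sketched as a single per-frame pipeline. Every function here is a simplified placeholder for the corresponding learned module (the arithmetic inside them is illustrative only, not CanonSwap's real implementation); the point is the data flow: identity in, motion out, modulation in canonical space, then the inverse warp.

```python
import numpy as np

# Hypothetical stand-ins for the learned modules described in the text.
def id_encoder(source_face):
    # 1) Extract an identity embedding from the source face.
    return source_face.mean(axis=(1, 2))

def motion_extractor(target_frame):
    # 2) Extract per-frame motion (head pose + expression) from the target video.
    return {"pose": float(target_frame.std()), "expr": float(target_frame.mean())}

def to_canonical(frame, motion):
    # Warp the frame into the canonical (neutral-pose) space.
    return frame - motion["expr"]          # placeholder for the learned warp

def from_canonical(feat, motion):
    # 4) Inverse transform: re-apply the original motion.
    return feat + motion["expr"]

def pim(canonical_feat, identity, face_mask):
    # 3) Modulate only the face region in canonical space.
    modulated = canonical_feat + identity[:, None, None]   # placeholder modulation
    return face_mask * modulated + (1.0 - face_mask) * canonical_feat

def swap_frame(source_face, target_frame, face_mask):
    identity = id_encoder(source_face)
    motion = motion_extractor(target_frame)
    canon = to_canonical(target_frame, motion)
    canon = pim(canon, identity, face_mask)
    return from_canonical(canon, motion)
```

Since the canonical warp and its inverse cancel outside the masked region, non-face pixels of the output frame match the target frame exactly, while the face region carries the source identity with the target's motion.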
This answer is based on the article "CanonSwap: A tool for realizing high-fidelity face swapping in video".