The Mixture-of-Experts (MoE) architecture used in Wan2.2 is indeed a key innovation. It splits the denoising process into high-noise and low-noise stages and assigns a dedicated expert model to each: the high-noise expert handles coarse denoising in the noisier early steps, while the low-noise expert focuses on fine-grained quality refinement in the later steps. Because only one expert is active at any given step, this division of labor lets Wan2.2 maintain computational efficiency while significantly improving generation quality. Compared with a single-model design, the MoE architecture lets Wan2.2 make use of over 60% more training data, supporting more complex motion and higher-quality aesthetic rendering.
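To make the stage split concrete, here is a minimal sketch of how a two-expert diffusion denoiser can route each step to one expert based on the timestep. This is an illustration of the general technique, not Wan2.2's actual code; names such as TwoStageMoEDenoiser, boundary_timestep, and the dummy experts are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class TwoStageMoEDenoiser(nn.Module):
    """Routes each denoising step to a high-noise or low-noise expert.

    Hypothetical sketch: the boundary value and call signature are
    illustrative, not taken from the Wan2.2 implementation.
    """

    def __init__(self, high_noise_expert: nn.Module, low_noise_expert: nn.Module,
                 boundary_timestep: int = 875):
        super().__init__()
        self.high_noise_expert = high_noise_expert  # coarse structure in early, noisy steps
        self.low_noise_expert = low_noise_expert    # detail refinement in later steps
        self.boundary_timestep = boundary_timestep  # switch point on the noise schedule

    def forward(self, latents: torch.Tensor, timestep: int,
                text_emb: torch.Tensor) -> torch.Tensor:
        # Only one expert runs per step, so per-step inference cost stays
        # close to that of a single model of the same size.
        if timestep >= self.boundary_timestep:
            expert = self.high_noise_expert
        else:
            expert = self.low_noise_expert
        return expert(latents, timestep, text_emb)


if __name__ == "__main__":
    # Dummy identity experts, only to show the routing behavior.
    class DummyExpert(nn.Module):
        def forward(self, latents, timestep, text_emb):
            return latents

    model = TwoStageMoEDenoiser(DummyExpert(), DummyExpert())
    latents = torch.randn(1, 16, 8, 8)
    text_emb = torch.randn(1, 77, 512)
    print(model(latents, 900, text_emb).shape)  # routed to the high-noise expert
    print(model(latents, 100, text_emb).shape)  # routed to the low-noise expert
```

In practice the switch point would be derived from the model's noise schedule (for example, a signal-to-noise-ratio threshold) rather than a hard-coded constant as in this sketch.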
This answer comes from the article "Wan2.2: Open Source Video Generation Model with Efficient Text and Image to Video Support".































