Video Generation Acceleration Options
For video models such as Wan2.1, the following acceleration strategies are available:
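- Multi-GPU parallelism: with the parameters parallelism=4 and use_cfg_parallel=True, four A100s cut generation time from 358 seconds to 114 seconds
- Lower the output specification: reduce the frame count (num_frames) and the resolution (width/height)
- Enable caching: reuse the already-loaded model across repeated generations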
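To put the multi-GPU figures in perspective, the speedup and parallel efficiency can be worked out from the two timings quoted above. This is a minimal sketch that assumes the 358-second figure is a single-GPU baseline, which the text does not state explicitly; the function name is chosen here for illustration.

```python
def speedup_and_efficiency(baseline_s, parallel_s, num_gpus):
    """Return (speedup, efficiency) for a run spread over num_gpus devices."""
    speedup = baseline_s / parallel_s   # how many times faster the parallel run is
    efficiency = speedup / num_gpus     # fraction of ideal linear scaling achieved
    return speedup, efficiency


# Timings quoted above: 358 s before, 114 s on four A100s.
# Assumption: 358 s was measured on a single GPU.
speedup, efficiency = speedup_and_efficiency(358.0, 114.0, 4)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 3.14x, efficiency: 79%"
```

Under that assumption the four cards deliver roughly three quarters of ideal linear scaling, so the remaining time is likely spent on communication and non-parallel work.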
Typical configuration example:
pipe = WanVideoPipeline.from_pretrained(config, parallelism=4, use_cfg_parallel=True)
Caveats:
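- All GPUs must be the same model, and NCCL communication must be working
- Total GPU memory must meet the model's requirements (about 6 GB per card)
- Asymmetric GPU topologies may reduce the speedup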
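The first two caveats can be checked before launching a run. The sketch below is one possible pre-flight check, not part of DiffSynth-Engine: GpuInfo, check_gpus, and query_local_gpus are names invented here, the 6 GB threshold comes from the figure above, and the optional query assumes PyTorch is installed. It does not test NCCL connectivity or GPU topology.

```python
from typing import List, NamedTuple


class GpuInfo(NamedTuple):
    name: str
    total_memory_gb: float


def check_gpus(gpus: List[GpuInfo], required: int = 4,
               min_memory_gb: float = 6.0) -> List[str]:
    """Return a list of human-readable problems; empty means the check passed."""
    problems = []
    if len(gpus) < required:
        problems.append(f"need {required} GPUs, found {len(gpus)}")
    names = {gpu.name for gpu in gpus}
    if len(names) > 1:
        problems.append(f"mixed GPU models: {sorted(names)}")
    for index, gpu in enumerate(gpus):
        if gpu.total_memory_gb < min_memory_gb:
            problems.append(f"GPU {index} has {gpu.total_memory_gb:.1f} GB, "
                            f"below {min_memory_gb:.1f} GB")
    return problems


def query_local_gpus() -> List[GpuInfo]:
    """Read local GPU info via PyTorch if available; otherwise return []."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    infos = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        infos.append(GpuInfo(props.name, props.total_memory / 1024 ** 3))
    return infos


# Illustrative inputs, not measured data.
uniform = [GpuInfo("A100", 80.0)] * 4
mixed = [GpuInfo("A100", 80.0), GpuInfo("T4", 16.0)]
print(check_gpus(uniform))            # prints "[]"
print(check_gpus(mixed, required=2))  # prints "["mixed GPU models: ['A100', 'T4']"]"
```

On a real machine, check_gpus(query_local_gpus()) would run the same checks against the installed cards.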
This answer comes from the article "DiffSynth-Engine: Open Source Engine for Low-Existing Deployments of FLUX, Wan 2.1".