Diffuman4D's Technology Positioning and Innovation Value
Diffuman4D, developed by the ZJU3DV team at Zhejiang University, generates 4D human body views from sparse-view video. Its main innovation is the combination of two core technologies, a spatio-temporal diffusion model and 4D Gaussian Splatting (4DGS): the diffusion model keeps the generated multi-view videos consistent across both time and viewpoints, while 4DGS provides high-precision reconstruction at 1024p resolution. Compared with traditional methods based on monocular or multi-view geometry, the approach reportedly improves reconstruction quality by an order of magnitude, with the clearest gains in fine detail such as dynamic clothing folds and complex movements.
The project has been accepted to ICCV 2025, and its open-source release makes the technology easier to verify and extend. In benchmark tests on videos with only 2-3 input viewpoints, the system generates high-fidelity video streams for 16 viewpoints, with rendering latency kept at the millisecond level, which meets the needs of real-time VR interaction.
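To put "millisecond-level" latency in context, the per-frame time budget of a display is simply 1000 ms divided by its refresh rate. The sketch below is only that arithmetic; the helper names and the refresh rates are assumptions chosen for illustration, not figures from the article or part of the Diffuman4D codebase.

```python
# Illustrative arithmetic only: these helpers are hypothetical and are not
# part of Diffuman4D. The refresh rates below are assumed example values.

def frame_budget_ms(refresh_hz: float) -> float:
    """Return the time available to render one frame, in milliseconds."""
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    return 1000.0 / refresh_hz


def fits_budget(latency_ms: float, refresh_hz: float) -> bool:
    """True if a given render latency fits inside one frame interval."""
    return latency_ms <= frame_budget_ms(refresh_hz)


# Assumed example refresh rates (not taken from the article).
ASSUMED_REFRESH_RATES_HZ = (72, 90, 120)

if __name__ == "__main__":
    for hz in ASSUMED_REFRESH_RATES_HZ:
        print(f"{hz} Hz -> {frame_budget_ms(hz):.2f} ms per frame")
```

Under these assumptions, a renderer has to finish each frame within roughly 8 to 14 ms, so a latency of a few milliseconds leaves headroom while one in the tens of milliseconds does not.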
This answer comes from the article "Diffuman4D: Generating High-Fidelity 4D Human Body Views from Sparse Video".































