Compared with its predecessor Wan2.1, Wan2.2 was trained on substantially more data: 65.6% more images and 83.2% more videos. The additional data improves the model along several dimensions: motion is more natural and fluid, semantic understanding is finer-grained and more accurate, and aesthetic quality approaches a cinematic level. Training at this scale lets Wan2.2 handle more complex scene descriptions and produce more professional-looking visuals, and it is one of the key reasons the model outperforms some commercial models on the Wan-Bench 2.0 benchmark.
This answer comes from the article "Wan2.2: Open Source Video Generation Model with Efficient Text and Image to Video Support".































