Scale effects of data engineering innovations
Qwen3 was pre-trained on 36 trillion tokens, twice as much as its predecessor Qwen2.5, covering high-quality content such as STEM, programming, and academic papers. According to the technical report, data construction proceeds in three key phases: general pre-training at 4K context (30 trillion tokens), knowledge-intensive data optimization (5 trillion tokens), and long-context extension training at 32K-128K. Beyond generic web pages, the data sources include parsed PDF documents (92.3% parsing accuracy) and synthetic data generated by the Qwen2.5 series of models.
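To make the staged schedule easier to see at a glance, here is a minimal Python sketch of the three-phase data plan as summarized above. The stage names, field names, and the 131,072-token maximum context are assumptions for readability, and the long-context token budget is left as `None` because the summary does not give a figure for it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PretrainStage:
    name: str
    tokens_trillions: Optional[float]  # None where the summary gives no figure
    max_context: int                   # maximum sequence length in tokens
    focus: str

# Token budgets and context lengths are the ones quoted above; everything else
# (names, the 128K = 131,072 interpretation) is illustrative, not from the report.
QWEN3_PRETRAIN_SCHEDULE = [
    PretrainStage("general",             30.0, 4_096,   "web pages, parsed PDFs, synthetic data"),
    PretrainStage("knowledge-intensive",  5.0, 4_096,   "STEM, code, and reasoning-heavy data"),
    PretrainStage("long-context",        None, 131_072, "32K-128K context extension"),
]

if __name__ == "__main__":
    for stage in QWEN3_PRETRAIN_SCHEDULE:
        budget = f"{stage.tokens_trillions}T" if stage.tokens_trillions else "n/a"
        print(f"{stage.name:>20}: {budget:>6} tokens @ {stage.max_context:,} ctx - {stage.focus}")
```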
Quality improvement measures include:
- Optimizing multimodal text extraction (e.g., from PDFs) with the Qwen2.5-VL model
- Generating millions of mathematical-reasoning examples with Qwen2.5-Math (a rough sketch of this idea follows the list)
- Improving the diversity of code data with Qwen2.5-Coder
- Applying a five-tier content-safety filtering mechanism
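The sketch below illustrates how synthetic math-reasoning data could be produced with an open Qwen2.5-Math checkpoint and then lightly filtered. The prompt, the toy quality filter, and the model ID `Qwen/Qwen2.5-Math-7B-Instruct` are assumptions for illustration; the actual pipeline in the Qwen3 report is not described at this level of detail.

```python
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-Math-7B-Instruct"  # assumed generator checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

def generate_solution(problem: str, max_new_tokens: int = 1024) -> str:
    """Generate a step-by-step solution for one seed problem."""
    messages = [
        {"role": "system", "content": "Please reason step by step and put the final answer in \\boxed{}."},
        {"role": "user", "content": problem},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def keep_example(solution: str) -> bool:
    """Toy quality filter: keep only solutions that contain a boxed final answer."""
    return re.search(r"\\boxed\{.+?\}", solution) is not None

seed_problems = ["What is the sum of the first 100 positive integers?"]
dataset = []
for problem in seed_problems:
    solution = generate_solution(problem)
    if keep_example(solution):
        dataset.append({"problem": problem, "solution": solution})
```

In practice, a pipeline like this would also deduplicate generations and verify answers against a ground truth; the snippet only shows the generate-then-filter shape of the approach.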
Benchmark results show that the Qwen3-32B base model outperforms Qwen2.5-72B on evaluations such as MATH and HumanEval, underscoring the decisive impact of data quality on model capability. This data advantage lets even small models (e.g., 4B parameters) handle tasks that traditionally required models at the 70B-parameter level.
This answer comes from the article "Qwen3 Released: A New Generation of Large Language Models for Deep Thinking and Fast Responses".