Technical realization of multilingual capabilities
Qwen3 covers 119 languages and dialects, achieving breakthrough performance in:
- Full language coverage: spans major language families such as Indo-European (67 languages), Sino-Tibetan (3), and Austronesian (12), and extends to low-resource languages such as Luxembourgish and Assamese.
- Dialect-level coverage: Arabic is supported in 7 dialectal variants, including Najdi, Egyptian, and Moroccan Arabic.
- Mixed-script input: effectively handles code-switched input that mixes CJK (Chinese/Japanese/Korean) characters with Latin letters.
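To make the mixed-script point concrete, here is a minimal sketch of how code-switched input can be detected by bucketing characters into writing scripts. This is an illustration using Python's standard `unicodedata` module, not anything from the Qwen tokenizer or codebase.

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Return the set of writing scripts present in `text` (illustrative only).

    Buckets alphabetic characters into CJK, Hangul, Kana, and Latin by their
    Unicode character names; digits and punctuation are ignored.
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if "CJK" in name:
            scripts.add("CJK")
        elif "HANGUL" in name:
            scripts.add("Hangul")
        elif "HIRAGANA" in name or "KATAKANA" in name:
            scripts.add("Kana")
        elif "LATIN" in name:
            scripts.add("Latin")
    return scripts

def is_code_switched(text: str) -> bool:
    """True when the text mixes Latin with at least one East Asian script."""
    s = scripts_in(text)
    return "Latin" in s and bool(s - {"Latin"})
```

For example, `is_code_switched("请用Python写一个函数")` returns `True` because the string mixes CJK ideographs with Latin letters, while a pure-Latin or pure-CJK string returns `False`.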
Three innovations in the training data strategy:
- Doubled data volume: the pre-training corpus reaches 36 trillion tokens (2× Qwen2.5), with the non-English share boosted to 45%.
- Multimodal cleaning: Qwen2.5-VL extracts text from PDFs and other documents, which is added to the training corpus after quality filtering.
- Synthetic data augmentation: Qwen2.5-Math and Qwen2.5-Coder generate structured data such as mathematical derivations and code solutions.
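The quality-filtering step in the pipeline above can be sketched as follows. This is a generic illustration of the kind of heuristics used to clean extracted text (minimum length, symbol-noise ratio, exact-duplicate removal); the thresholds and rules are placeholders, not Qwen's actual filters.

```python
import hashlib

def quality_filter(docs, min_len=200, max_symbol_ratio=0.3):
    """Illustrative quality filter for extracted text (NOT Qwen's actual pipeline).

    Yields documents that are long enough, mostly textual, and not exact
    duplicates of a document already seen.
    """
    seen = set()
    for text in docs:
        if len(text) < min_len:
            continue  # too short to be useful training text
        textual = sum(ch.isalnum() or ch.isspace() for ch in text)
        if textual / len(text) < 1 - max_symbol_ratio:
            continue  # too much markup/symbol noise (common in PDF extraction)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact-duplicate removal
        seen.add(digest)
        yield text
```

Real pipelines layer many more signals on top (language ID, perplexity filtering, near-duplicate detection), but the structure is the same: a cascade of cheap rejections before data reaches the training mix.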
Pre-training proceeds in three phases: the S2 phase increases the proportion of knowledge-intensive data, and the S3 phase strengthens contextual understanding in low-resource languages through long-text training. Together, these enabled Qwen3 to reach GPT-3.5-level performance on low-resource-language tasks.
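The staged schedule described above can be sketched as a configuration like the one below. The stage structure (general coverage, then knowledge-intensive up-weighting, then long-context/low-resource reinforcement) follows the paragraph above, but every number and mixture category here is an invented placeholder, not Qwen3's published configuration.

```python
# Illustrative staged pre-training schedule. Stage names S1-S3 come from the
# text; all token mixtures and context lengths are hypothetical placeholders.
stages = [
    {"name": "S1", "focus": "broad general coverage", "context_len": 4096,
     "mixture": {"web": 0.6, "code_math": 0.2, "multilingual": 0.2}},
    {"name": "S2", "focus": "knowledge-intensive data up-weighted", "context_len": 4096,
     "mixture": {"web": 0.4, "code_math": 0.3, "knowledge": 0.3}},
    {"name": "S3", "focus": "long text + low-resource languages", "context_len": 32768,
     "mixture": {"long_text": 0.5, "low_resource": 0.3, "web": 0.2}},
]

def validate(stages):
    """Sanity-check a schedule: mixtures sum to 1, context length never shrinks."""
    prev_ctx = 0
    for s in stages:
        assert abs(sum(s["mixture"].values()) - 1.0) < 1e-9, s["name"]
        assert s["context_len"] >= prev_ctx, s["name"]
        prev_ctx = s["context_len"]
    return True
```

The design intuition is that early stages optimize breadth on short contexts (cheap), while the final stage pays the quadratic attention cost of long contexts only on the data that benefits from it.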
This answer comes from the article "Qwen3 Released: A New Generation of Large Language Models for Thinking Deeply and Responding Fast".
































