Technical realization of multilingual capabilities
Qwen3 covers 119 languages and dialects, achieving breakthrough performance in:
- Full language coverage: spans major language families such as Indo-European (67 languages), Sino-Tibetan (3), and Austronesian (12), and extends to low-resource languages such as Luxembourgish and Assamese.
- Dialect-level coverage: Arabic is supported in 7 dialectal variants, including Najdi, Egyptian, and Moroccan Arabic.
- Mixed-script input: effectively handles code-switched input that mixes CJK (Chinese/Japanese/Korean) characters with Latin letters.
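To make the mixed-script point concrete, here is a minimal sketch of how code-switched input can be detected by bucketing characters into writing scripts. This is an illustration using Python's standard `unicodedata` module, not anything from the Qwen tokenizer or codebase.

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Return the set of writing scripts present in `text` (illustrative only).

    Buckets alphabetic characters into CJK, Hangul, Kana, and Latin by their
    Unicode character names; digits and punctuation are ignored.
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if "CJK" in name:
            scripts.add("CJK")
        elif "HANGUL" in name:
            scripts.add("Hangul")
        elif "HIRAGANA" in name or "KATAKANA" in name:
            scripts.add("Kana")
        elif "LATIN" in name:
            scripts.add("Latin")
    return scripts

def is_code_switched(text: str) -> bool:
    """True when the text mixes Latin with at least one East Asian script."""
    s = scripts_in(text)
    return "Latin" in s and bool(s - {"Latin"})
```

For example, `is_code_switched("请用Python写一个函数")` returns `True` because the string mixes CJK ideographs with Latin letters, while a pure-Latin or pure-CJK string returns `False`.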
Three innovations in the training data strategy:
- Doubled data volume: the pre-training corpus reaches 36 trillion tokens (2× Qwen2.5), with the non-English share boosted to 45%.
- Multimodal cleaning: Qwen2.5-VL extracts text from PDFs and other documents, which is added to the training corpus after quality filtering.
- Synthetic data augmentation: Qwen2.5-Math and Qwen2.5-Coder generate structured data such as mathematical derivations and code solutions.
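The quality-filtering step in the pipeline above can be sketched as follows. This is a generic illustration of the kind of heuristics used to clean extracted text (minimum length, symbol-noise ratio, exact-duplicate removal); the thresholds and rules are placeholders, not Qwen's actual filters.

```python
import hashlib

def quality_filter(docs, min_len=200, max_symbol_ratio=0.3):
    """Illustrative quality filter for extracted text (NOT Qwen's actual pipeline).

    Yields documents that are long enough, mostly textual, and not exact
    duplicates of a document already seen.
    """
    seen = set()
    for text in docs:
        if len(text) < min_len:
            continue  # too short to be useful training text
        textual = sum(ch.isalnum() or ch.isspace() for ch in text)
        if textual / len(text) < 1 - max_symbol_ratio:
            continue  # too much markup/symbol noise (common in PDF extraction)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact-duplicate removal
        seen.add(digest)
        yield text
```

Real pipelines layer many more signals on top (language ID, perplexity filtering, near-duplicate detection), but the structure is the same: a cascade of cheap rejections before data reaches the training mix.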
Pre-training proceeds in three phases: the S2 phase increases the proportion of knowledge-intensive data, and the S3 phase strengthens contextual understanding in low-resource languages through long-text training. Together, these enabled Qwen3 to reach GPT-3.5-level performance on low-resource-language tasks.
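The staged schedule described above can be sketched as a configuration like the one below. The stage structure (general coverage, then knowledge-intensive up-weighting, then long-context/low-resource reinforcement) follows the paragraph above, but every number and mixture category here is an invented placeholder, not Qwen3's published configuration.

```python
# Illustrative staged pre-training schedule. Stage names S1-S3 come from the
# text; all token mixtures and context lengths are hypothetical placeholders.
stages = [
    {"name": "S1", "focus": "broad general coverage", "context_len": 4096,
     "mixture": {"web": 0.6, "code_math": 0.2, "multilingual": 0.2}},
    {"name": "S2", "focus": "knowledge-intensive data up-weighted", "context_len": 4096,
     "mixture": {"web": 0.4, "code_math": 0.3, "knowledge": 0.3}},
    {"name": "S3", "focus": "long text + low-resource languages", "context_len": 32768,
     "mixture": {"long_text": 0.5, "low_resource": 0.3, "web": 0.2}},
]

def validate(stages):
    """Sanity-check a schedule: mixtures sum to 1, context length never shrinks."""
    prev_ctx = 0
    for s in stages:
        assert abs(sum(s["mixture"].values()) - 1.0) < 1e-9, s["name"]
        assert s["context_len"] >= prev_ctx, s["name"]
        prev_ctx = s["context_len"]
    return True
```

The design intuition is that early stages optimize breadth on short contexts (cheap), while the final stage pays the quadratic attention cost of long contexts only on the data that benefits from it.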
This answer comes from the article "Qwen3 Released: A New Generation of Large Language Models for Thinking Deeply and Responding Fast".
































