
What is the Chinese DeepSeek-R1 distillation dataset?

2025-09-05

Introduction to the Chinese DeepSeek-R1 Distillation Dataset

The Chinese DeepSeek-R1 distillation dataset is an open-source Chinese-language dataset built for machine learning and natural language processing research. It was released by Cong Liu's NLP team, and its core features are:

  • Size: 110,000 high-quality samples
  • Data types: math, logical reasoning, and general-domain data (e.g., content from Xiaohongshu and Zhihu)
  • Quality control: distilled strictly following the official DeepSeek-R1 standard
  • Availability: open source, free of charge, and hosted on both Hugging Face and ModelScope

The dataset is mainly intended for language model training, dialogue system development, and text comprehension research. Beyond the raw data, it also ships detailed information on its data distribution, so users can see what share of the dataset each data type makes up.
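Once the records are loaded (for example with the Hugging Face `datasets` library; the repository ID is not given here), the per-type proportions can be recomputed as a sanity check. The sketch below is illustrative only: it assumes each record is a mapping with a hypothetical `category` field, and the toy labels `math`, `logic` and `general` are placeholders rather than the dataset's real schema, so check the dataset card for the actual field names.

```python
from collections import Counter
from typing import Iterable, Mapping


def category_proportions(
    records: Iterable[Mapping[str, str]], key: str = "category"
) -> dict[str, float]:
    """Return the share of records in each category.

    `key` is a hypothetical field name (an assumption, not the dataset's
    documented schema); adjust it to match the real records.
    """
    # Count how many records fall into each category label.
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Convert raw counts into fractions of the whole dataset.
    return {name: count / total for name, count in counts.items()}


# Toy records standing in for real samples (not actual dataset contents).
sample = [
    {"category": "math"},
    {"category": "math"},
    {"category": "logic"},
    {"category": "general"},
]

for name, share in sorted(category_proportions(sample).items()):
    print(f"{name}: {share:.0%}")
```

With the toy records above this prints `general: 25%`, `logic: 25%` and `math: 50%`. Counting with `Counter` keeps the pass over the data single and streaming-friendly, which matters at the 110,000-sample scale.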
