Dataset Features in Detail
The Chinese DeepSeek-R1 distillation dataset differs from similar datasets in four main ways:
1. Diversity of data types
- Mathematical data: math problems that require step-by-step reasoning
- Logical inference: logic problems that require deductive or inductive reasoning
- General data: a variety of text from Xiaohongshu (Little Red Book), Zhihu, and similar platforms
2. Specialized data-processing functions
- Mathematical data processing: automatically appends the reasoning prompt "Please reason step by step and put the final answer in \boxed{}" to math problems
- Logical data optimization: provides a dedicated processing pipeline to ensure logical consistency
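As an illustration of the automatic prompt addition for math data, here is a minimal sketch. It is not the dataset's actual processing code: the field names `category` and `question` and the helper `add_reasoning_prompt` are hypothetical, and the prompt wording follows the text above with the box written as LaTeX `\boxed{}`.

```python
# Prompt wording follows the description above; the exact string used by the
# dataset authors may differ (assumption).
MATH_PROMPT = "Please reason step by step and put the final answer in \\boxed{}."


def add_reasoning_prompt(record: dict) -> dict:
    """Return a copy of `record` with the reasoning prompt appended to math questions.

    `category` and `question` are hypothetical field names used for illustration.
    """
    # Non-math samples are returned unchanged.
    if record.get("category") != "math":
        return dict(record)

    question = record["question"].rstrip()
    # Skip samples that already end with the prompt, so the step is idempotent.
    if question.endswith(MATH_PROMPT):
        return dict(record)

    return {**record, "question": f"{question}\n{MATH_PROMPT}"}
```

Calling the function on a record such as `{"category": "math", "question": "What is 2 + 3?"}` would return a copy whose question ends with the prompt, while records in other categories pass through untouched.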
3. Ready-made training support
The dataset can be used directly in training pipelines built on mainstream frameworks (e.g., PyTorch, TensorFlow), and the sample code already includes training configurations for common models such as BERT.
4. Detailed statistics
Provides complete information about the class distribution of the data, so users can precisely control the class balance of their training set.
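To show how class statistics can be used to control balance, here is a minimal sketch that computes the class distribution of a list of samples and downsamples every class to the size of the smallest one. The `category` field name and both helper functions are hypothetical, and downsampling is only one of several possible balancing strategies.

```python
import random
from collections import Counter, defaultdict


def class_distribution(records, key="category"):
    """Return the fraction of samples per class (`key` is a hypothetical field name)."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def downsample_to_balance(records, key="category", seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    if not groups:
        return []

    target = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], target))
    return balanced
```

With 6 math, 2 logic, and 2 general samples, `class_distribution` reports a math share of 0.6, and `downsample_to_balance` keeps 2 samples per class.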
This answer comes from the article "Chinese based full-blooded DeepSeek-R1 distillation dataset, supports Chinese R1 distillation SFT dataset".