Guidelines for accessing and using the dataset
The process of using the Chinese DeepSeek-R1 distillation dataset can be divided into the following steps:
Acquisition Methods
- Access to Hugging Face or ModelScope platforms
- Search for "Chinese-DeepSeek-R1-Distill-data-110k"
- Select the appropriate format (e.g. JSON, CSV, etc.) to download the dataset
Loading and use
- environmental preparation: Python and datasets libraries need to be installed
- Basic loading::
from datasets import load_dataset dataset = load_dataset("Congliu/Chinese-DeepSeek-R1-Distill-data-110k") - Data Viewing: Basic information can be viewed via print(dataset) and print(dataset['train'][0])
Preprocessing and training
It is recommended to use Transformer related tool libraries (e.g. Hugging Face's transformers) for data preprocessing and model training. The dataset has been normalized, but further processing may still be performed depending on the specific task requirements.
This answer comes from the articleChinese based full-blooded DeepSeek-R1 distillation dataset, supports Chinese R1 distillation SFT datasetThe































