Training a model with the Open R1 project involves the following steps:
- Environment configuration: first create a Python virtual environment and activate it
  conda create -n openr1 python=3.11
  conda activate openr1
- Install dependencies: install vLLM and the project dependencies
  pip install vllm==0.6.6.post1
  pip install -e ".[dev]"
- Account login: log in to your Hugging Face and Weights & Biases accounts
  huggingface-cli login
  wandb login
- Train the model: run the provided training scripts
  - GRPO training:
    python src/open_r1/grpo.py --dataset <dataset_path>
  - SFT training:
    python src/open_r1/sft.py --dataset <dataset_path>
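The setup steps above rely on several command-line tools being available. As a minimal sketch, the check below reports which of them cannot be found on PATH. The tool list is taken from the commands in the steps; the helper name `missing_tools` is hypothetical and is not part of Open R1. It assumes `huggingface-cli` and `wandb` are installed as Python packages, so it is most useful after the dependency step.

```python
import shutil

# Command-line tools invoked in the setup steps above.
REQUIRED_TOOLS = ["conda", "pip", "huggingface-cli", "wandb"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools are available.")
```

Running the script inside the activated `openr1` environment gives a quick signal before a long training job is launched.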
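Notably, the project supports multi-stage training: you can start from a base model and progressively move on to a model tuned with reinforcement learning.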
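One way to picture a multi-stage run is as an ordered list of the two training commands. The sketch below assumes SFT runs first and GRPO second; that ordering is an assumption, not something the source spells out. The script paths and the `--dataset` flag are copied from the commands above, while the function names and dataset paths are hypothetical. How the SFT checkpoint is handed to the GRPO stage is not described in the source, so it is deliberately left out.

```python
import shlex
import subprocess

# Script paths come from the training commands above.
# The stage order (SFT, then GRPO) is an assumption.
STAGES = [
    ("SFT", "src/open_r1/sft.py"),
    ("GRPO", "src/open_r1/grpo.py"),
]


def build_commands(sft_dataset, grpo_dataset):
    """Build one (stage name, argv list) pair per stage, in execution order."""
    datasets = {"SFT": sft_dataset, "GRPO": grpo_dataset}
    return [
        (name, ["python", script, "--dataset", datasets[name]])
        for name, script in STAGES
    ]


def run_pipeline(commands, dry_run=True):
    """Print each stage's command; run it only when dry_run is False."""
    for name, cmd in commands:
        print(f"[{name}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError, so a failed stage
            # stops the pipeline before the next one starts.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical dataset paths, for illustration only.
    run_pipeline(build_commands("data/sft", "data/rl"), dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to review the plan before committing compute to it.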
This answer comes from the article "Open R1: Hugging Face Replicates the Training Process of DeepSeek-R1".