Qwen3-FineTuning-Playground is an open-source project that provides a complete codebase for fine-tuning the Qwen3 family of large language models. Its goal is to offer clear, professional, and easily extensible fine-tuning examples so that developers and researchers can practice a variety of mainstream fine-tuning techniques. The code is clearly structured and modular: supervised fine-tuning, reinforcement learning, knowledge distillation, and inference each live in their own directory. All training and inference scripts can be configured through command-line arguments, so users can run different experiments without modifying the source code. The project also ships several detailed end-to-end tutorials, allowing even beginners to follow the documentation step by step through the entire workflow, from environment setup and data preparation to model training and inference.
Function List
- Multiple supervised fine-tuning (SFT) schemes: supports full-parameter fine-tuning of the model as well as efficient fine-tuning with techniques such as LoRA (low-rank adaptation), to suit different hardware resources and training requirements.
- Reinforcement Learning Alignment (RL): integrates several reinforcement-learning-from-human-feedback algorithms to improve the model's dialogue quality and instruction-following ability.
- PPO algorithm: implements the classic Proximal Policy Optimization algorithm, which guides model learning through a reward model (RM).
- ORPO algorithm: provides an efficient preference-alignment algorithm that simplifies training and requires no additional reward model.
- Post-training optimization techniques:
- Knowledge Distillation: supports transferring knowledge from a larger, more capable teacher model (e.g., Qwen3-4B) to a smaller student model (e.g., Qwen3-1.7B), producing a lighter yet still capable model (a loss sketch follows this list).
- Modular code structure: the code is organized by function into separate directories such as Supervised_FineTuning and RL_FineTuning, making it easy to understand and maintain.
- Parameterized run scripts: all scripts support configuration via command-line arguments, so users can flexibly adjust the model path, dataset, output directory, and so on without modifying the code.
- End-to-end example tutorials: the example/ directory provides several complete hands-on tutorials covering the full pipeline from SFT to PPO, single-step alignment with ORPO, knowledge distillation, and domain-specific applications (e.g., fine-tuning multi-turn conversations in psychology).
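The repository's distillation scripts are not reproduced here, but a common way to implement knowledge distillation, and one the Qwen3-4B to Qwen3-1.7B setup could plausibly use, is to mix a soft KL term against temperature-scaled teacher logits with the ordinary next-token cross-entropy. The snippet below is a minimal PyTorch sketch of that idea; the function name, temperature T, and mixing weight alpha are illustrative assumptions, not values taken from the project.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: KL divergence between temperature-scaled teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: standard next-token cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1.0 - alpha) * hard
In practice the teacher's logits are computed under torch.no_grad() so that only the student receives gradients.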
Usage Guide
This codebase provides a complete workflow to help you get started quickly with fine-tuning Qwen3 models. The following takes SFT-LoRA fine-tuning as an example and describes the procedure in detail.
1. Preparation: cloning the project and configuring the environment
First, you need to clone the project code from GitHub to your local computer and go to the project directory.
git clone https://github.com/Muziqinshan/Qwen3-FineTuning-Playground.git
cd Qwen3-FineTuning-Playground
The next step is to configure the Python environment. To avoid conflicts with libraries already installed on your system, it is highly recommended to use conda to create a brand-new, isolated environment. Python 3.10 is recommended for this project.
# Create a new environment named qwen3_ft
conda create -n qwen3_ft python=3.10
# Activate the newly created environment
conda activate qwen3_ft
After the environment is activated, install all of the dependencies the project needs. The requirements.txt file in the project root already lists the required libraries; run the following command to install them:
pip install -r requirements.txt
2. Downloading the model and preparing the data
Before fine-tuning, you need to prepare the base model and the dataset for training.
Download the models
This project recommends using the modelscope library to download the pre-trained Qwen3 models from the ModelScope community. The modelscope library was already installed by the pip install command in the previous step.
Run the following commands to download the two base models that will be used in this project example:
# Download the Qwen3-1.7B model; it is mainly used for fine-tuning tasks such as SFT, ORPO, and PPO
modelscope download --model Qwen/Qwen3-1.7B --local_dir ./Qwen3/Qwen3-1.7B
# Download the Qwen3-4B model; it is mainly used as the teacher model in the knowledge distillation task
modelscope download --model Qwen/Qwen3-4B --local_dir ./Qwen3/Qwen3-4B
After the commands finish, the model files are automatically downloaded and saved under the project root in the ./Qwen3/Qwen3-1.7B and ./Qwen3/Qwen3-4B directories.
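If you want to confirm that the downloads are intact before training, one option is to load the checkpoint from the local directory with the transformers library. This is an optional sanity check rather than a step from the project's tutorials, and it assumes a transformers version recent enough to support Qwen3.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the locally downloaded Qwen3-1.7B checkpoint to make sure the files are complete.
tokenizer = AutoTokenizer.from_pretrained("./Qwen3/Qwen3-1.7B")
model = AutoModelForCausalLM.from_pretrained("./Qwen3/Qwen3-1.7B", torch_dtype="auto")
print(model.config.model_type, model.num_parameters())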
Prepare the data
The data used in this project must follow a specific JSON format. The data/ directory provides an example file named dirty_chinese_dpo.json; you can refer to its format when preparing your own dataset.
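The exact schema is defined by that example file rather than documented here, but the file name suggests DPO-style preference data, so a record plausibly contains a prompt together with a preferred and a rejected response. The snippet below writes one such hypothetical record; the field names prompt, chosen, and rejected are assumptions, so treat dirty_chinese_dpo.json as the authoritative reference.
import json

# Hypothetical record layout for a DPO-style preference file; the real field names
# in dirty_chinese_dpo.json may differ, so check the bundled example first.
records = [
    {
        "prompt": "Please introduce yourself.",
        "chosen": "Hello! I am a Qwen3-based assistant. How can I help you today?",
        "rejected": "I don't feel like answering that.",
    }
]

with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)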
3. Start SFT-LoRA fine-tuning
Once everything is ready, it is time to start training. The following command launches a supervised fine-tuning (SFT) task and uses the LoRA technique to improve efficiency.
python Supervised_FineTuning/train_sft_dirty.py \
--model_path ./Qwen3/Qwen3-1.7B \
--dataset_path data/dirty_chinese_dpo.json \
--sft_adapter_output_dir ./output/sft_adapter_demo
- --model_path: specifies the path to the base model we just downloaded.
- --dataset_path: specifies the dataset file to use for training.
- --sft_adapter_output_dir: specifies the directory where the LoRA adapter weights are saved after training completes.
Training takes some time depending on your hardware. When it finishes, you will find the generated adapter files in the ./output/sft_adapter_demo directory.
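For readers curious about what a script like train_sft_dirty.py typically does internally, the following is a minimal sketch of setting up LoRA with the peft library; the rank, scaling factor, and target_modules values are illustrative assumptions, not settings read from the project's code.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model and wrap it with a LoRA adapter; only the small adapter
# matrices are trained, which is what keeps memory requirements low.
base = AutoModelForCausalLM.from_pretrained("./Qwen3/Qwen3-1.7B", torch_dtype="auto")
lora_cfg = LoraConfig(
    r=8,                                  # rank of the low-rank update (assumed value)
    lora_alpha=16,                        # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# After training, model.save_pretrained("./output/sft_adapter_demo") stores only the adapter weights.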
4. Running inference with the fine-tuned model
Once the model is trained, the key step is to verify how well it works. You can run the following inference script to chat interactively with the model you just fine-tuned.
python inference/inference_dirty_sft.py \
--model_path ./Qwen3/Qwen3-1.7B \
--adapter_path ./output/sft_adapter_demo \
--mode interactive
- --model_path: again, the path to the base model.
- --adapter_path: points to the directory containing the LoRA adapter trained in the previous step.
- --mode interactive: enables interactive chat mode, in which you can talk to the model by typing questions directly on the command line.
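The core pattern behind such an inference script is usually: load the base model, attach the LoRA adapter, then generate a reply. Below is a hedged sketch of that pattern rather than the project's actual inference_dirty_sft.py; it assumes the transformers and peft libraries with a chat-template-aware tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("./Qwen3/Qwen3-1.7B")
base = AutoModelForCausalLM.from_pretrained("./Qwen3/Qwen3-1.7B", torch_dtype="auto", device_map="auto")
# Attach the LoRA adapter produced by the SFT step.
model = PeftModel.from_pretrained(base, "./output/sft_adapter_demo")

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))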
Now you can test how the model performs on your specific task. If you want to try other fine-tuning methods, such as ORPO or knowledge distillation, refer to the detailed tutorial documentation in the example/ directory; the steps are similar.
Application Scenarios
- Domain-specific intelligent customer service: the generic Qwen3 model can be fine-tuned on an industry-specific knowledge base (e.g., finance, healthcare, law) to become a customer-service bot that accurately answers specialized questions. The SFT tutorial in the project is an ideal starting point for this scenario.
- Personalized content-creation assistant: by fine-tuning the model on text in a particular style (e.g., a specific author's work or a particular style of marketing copy), you can build an assistant that mimics that writing style for content creation, poetry, or ad copy generation.
- Model lightweighting and private deployment: for scenarios with limited computing resources, the knowledge distillation feature can transfer the capabilities of a larger model (e.g., Qwen3-4B) to a smaller one (e.g., Qwen3-1.7B). This preserves most of the performance while significantly reducing inference cost, making private deployment on personal or edge devices easier.
- Safer, better-aligned dialogue: reinforcement learning methods (e.g., PPO or ORPO) can align the model with human preference data so that its responses better match human values, reducing harmful or inappropriate output and improving the reliability and safety of conversations.
QA
- Which Qwen3 models does this project mainly support?
The project is developed and tested primarily on the Qwen3 series, and the example code downloads and uses the Qwen3-1.7B and Qwen3-4B models directly. In principle, the code structure is also compatible with other sizes in the Qwen3 series.
- What hardware is required for fine-tuning?
Hardware requirements depend on the fine-tuning method and model size. For SFT-LoRA fine-tuning of a small model such as Qwen3-1.7B, a consumer-grade GPU (e.g., NVIDIA 3090 or 4090) is usually sufficient. Full-parameter fine-tuning or training larger models requires more GPU memory and compute.
- What is the difference between ORPO and PPO fine-tuning?
PPO is a classic reinforcement learning algorithm that requires a separate, pre-trained reward model to score the model's outputs, and its training process is relatively complex. ORPO is a newer algorithm that needs no additional reward model and optimizes the model directly from preference data, making the process simpler and more efficient (see the sketch at the end of this section).
- Can I use my own dataset?
Absolutely. Just organize your dataset into the JSON format described above, place it in the data/ directory, and pass its path to the training script via the --dataset_path parameter.
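As a rough illustration of why ORPO needs no reward model, the core of its objective adds an odds-ratio term, computed directly from the policy's own log-probabilities of the chosen and rejected responses, to the ordinary SFT loss. The snippet below is a simplified sketch of that loss; the lambda weight and the exact reduction are illustrative and not taken from the project's implementation.
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_loss, lam=0.1):
    # log-odds of each response, computed from its average token log-probability.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Push the chosen response's odds above the rejected one's; no reward model is needed.
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    return sft_loss + lam * ratio_loss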