gpt-oss-recipes
is a GitHub repository maintained by Hugging Face that provides scripts and Jupyter notebook tutorials for using OpenAI's GPT OSS models. The repository contains configuration and usage examples for OpenAI's latest open-source models, gpt-oss-120b and gpt-oss-20b. Known for their strong reasoning capabilities and efficient resource footprint, these models are suitable for developers to run in production environments or on personal devices. The code and documentation in the repository help users get started quickly with model inference, fine-tuning, and deployment, covering everything from environment setup to the implementation of complex tasks. All content is released under the Apache 2.0 license, which allows free use and modification.
Feature List
- Provides configuration scripts for gpt-oss-120b and gpt-oss-20b that make it easy to switch between model sizes.
- Contains environment setup code covering Python virtual environments and dependency installation.
- Provides inference examples that show how to use the models to generate text or perform tool calls.
- Supports model fine-tuning and includes an example using a multilingual reasoning dataset.
- Provides integration tutorials for frameworks such as Transformers, vLLM, and Ollama.
- Supports optimized configurations for running models on different hardware (H100 GPUs, consumer-grade devices).
Usage Guide
Installation process
To use the scripts in gpt-oss-recipes, first clone the repository and set up the Python environment. Here are the detailed steps:
- Clone the repository
Open a terminal and run the following commands to clone the repository locally:
git clone https://github.com/huggingface/gpt-oss-recipes.git
cd gpt-oss-recipes
- Create a virtual environment
Create a virtual environment with Python 3.11 to ensure compatibility; the uv tool is recommended:
uv venv gpt-oss --python 3.11
source gpt-oss/bin/activate
- Install dependencies
Install the necessary Python packages, including PyTorch and Transformers, by running:
uv pip install --upgrade pip
uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
uv pip install -U transformers accelerate
- Install the Triton kernels (optional)
If your hardware supports MXFP4 quantization (e.g., H100 or RTX 50xx), install the Triton kernels to optimize performance:
uv pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
Configuring the model
The repository supports two models: gpt-oss-120b (117B parameters, for high-performance GPUs) and gpt-oss-20b (21B parameters, for consumer-grade hardware). In the scripts, select a model by modifying the model_path variable. Example:
model_path = "openai/gpt-oss-20b"  # select the 20B model
# model_path = "openai/gpt-oss-120b"  # select the 120B model
The script automatically configures device mapping and optimization settings based on model size.
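As an illustration only (not necessarily the repository's exact logic), such a size-based switch could look like the following sketch, which shards the 120B checkpoint across available GPUs while keeping the 20B model on a single device:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "openai/gpt-oss-20b"  # or "openai/gpt-oss-120b"

# Illustrative assumption: shard the 120B model automatically, pin the 20B model to GPU 0.
device_map = "auto" if "120b" in model_path else {"": 0}

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device_map, torch_dtype="auto")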
Running inference
The repository contains simple inference examples for generating text or performing specific tasks. The following example uses the gpt-oss-20b model to generate text:
- Open the inference.py file (or a similar script).
- Ensure that the model and tokenizer are loaded:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
- Enter a prompt and generate a result:
messages = [{"role": "user", "content": "How do I write a sorting algorithm in Python?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
generated = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(generated[0]))
- Run the script and the model returns sample Python code for the sorting algorithm.
Adjusting the reasoning level
The model's reasoning depth can be adjusted through the system prompt. For example, to set a high reasoning level:
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Explain the basic principles of quantum computing"}
]
A higher reasoning level produces a more detailed reasoning process, which suits complex problems.
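For completeness, here is a minimal sketch of running generation with this system prompt, reusing the tokenizer and model loaded in the inference example above (the token budget is an arbitrary choice):

# Apply the chat template with the "Reasoning: high" system message and generate.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
generated = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))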
Fine-tuning the model
The repository provides fine-tuning examples based on Hugging Face's TRL library and LoRA. Here are the steps for fine-tuning gpt-oss-20b:
- Download the multilingual reasoning dataset:
from datasets import load_dataset

dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
- Configure LoRA parameters and load the model:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "openai/gpt-oss-20b"
lora_config = LoraConfig(r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
model = get_peft_model(model, lora_config)  # wrap the base model with LoRA adapters
- Use the TRL library for fine-tuning (refer to finetune.ipynb in the repository); a rough sketch follows after this list.
- Save the fine-tuned model for specific tasks such as multilingual reasoning.
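As a hedged sketch of what a TRL SFT run might look like (the hyperparameters and output directory below are illustrative assumptions, not taken from the repository's notebook):

from trl import SFTConfig, SFTTrainer

# Illustrative hyperparameters; finetune.ipynb in the repository may use different values.
training_args = SFTConfig(
    output_dir="gpt-oss-20b-multilingual-reasoner",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,            # the LoRA-wrapped model from the previous step
    args=training_args,
    train_dataset=dataset,  # the Multilingual-Thinking split loaded above
)
trainer.train()
trainer.save_model(training_args.output_dir)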
Using vLLM or Ollama
For rapid deployment, the repository supports vLLM and Ollama:
- vLLM: Start an OpenAI-compatible server:
uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/
vllm serve openai/gpt-oss-20b
- Ollama: Runs on consumer-grade hardware:
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
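Once the vLLM server is up, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1), so it can be queried with the standard openai Python client. A minimal sketch, assuming the default host and port:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is unused but required by the SDK.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Give a one-sentence summary of MXFP4 quantization."}],
    max_tokens=200,
)
print(response.choices[0].message.content)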
Feature Highlights
- Tool calls: The model supports function calling and web search. For example, declaring a weather function (handling the returned tool call is sketched after this list):
# Assumes `client` is an OpenAI-compatible client (e.g., one configured for a hosted inference provider).
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the weather for a given location",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
}]
messages = [{"role": "user", "content": "What is the weather like in Paris?"}]
response = client.chat.completions.create(model="openai/gpt-oss-120b:cerebras", messages=messages, tools=tools)
- Multilingual reasoning: With fine-tuning, the model can generate reasoning traces in English, Spanish, French, and other languages. The user can specify the reasoning language, for example:
messages = [
    {"role": "system", "content": "Reasoning language: Spanish"},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"}
]
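Following up on the tool-call example above, the response may contain tool_calls that your code executes before sending the result back to the model. A hedged sketch (the local get_current_weather implementation and its return value are hypothetical; `client`, `tools`, `messages`, and `response` refer to the tool-call example):

import json

# Hypothetical local implementation of the weather function declared in `tools`.
def get_current_weather(location: str) -> str:
    return json.dumps({"location": location, "temperature_c": 21, "condition": "sunny"})

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = get_current_weather(**args)

# Append the assistant's tool call and the tool result, then ask the model for the final answer.
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
final = client.chat.completions.create(model="openai/gpt-oss-120b:cerebras", messages=messages, tools=tools)
print(final.choices[0].message.content)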
Application Scenarios
- AI development experiments
Developers can use the scripts in the repository to test the performance of GPT OSS models on different tasks, such as text generation, code generation, or Q&A systems. Ideal for rapid prototyping.
- Local model deployment
Businesses or individuals can deploy gpt-oss-20b on local devices for privacy-sensitive scenarios such as internal document processing or customer support.
- Education and research
Researchers can use the fine-tuning tutorials to optimize the models on specific datasets (e.g., multilingual reasoning) and explore applications of large models in academic fields.
- Production environment integration
The repository supports deploying API servers via vLLM, which makes it easy to integrate the models into production environments such as chatbots or automated workflows.
FAQ
- What models does the repository support?
The repository supports gpt-oss-120b (117B parameters) and gpt-oss-20b (21B parameters), targeting high-performance GPUs and consumer hardware, respectively.
- How do I choose the right model?
gpt-oss-120b is recommended if you have an H100 GPU; if you are using a regular device (16 GB of memory), choose gpt-oss-20b.
- What hardware is required?
gpt-oss-20b requires 16 GB of RAM; gpt-oss-120b requires an 80 GB GPU (e.g., H100). MXFP4 quantization reduces resource requirements.
- How do I deal with errors during model inference?
Make sure the input and output use the harmony format. Check hardware compatibility and update dependencies such as PyTorch and the Triton kernels.