The steps to run model inference in gpt-oss-recipes are as follows:
- Load the model and tokenizer: use `AutoModelForCausalLM` and `AutoTokenizer` to load the specified model (e.g. `openai/gpt-oss-20b`).
- Prepare the input prompt: define the user message (e.g. "How do I write a sorting algorithm in Python?") and process it with the `apply_chat_template` method.
- Generate the result: call the `model.generate` method to produce a response and decode the output with the tokenizer.
- Adjust inference parameters (optional): the level of reasoning detail can be controlled through the system prompt, for example setting `"Reasoning: high"` to generate a more detailed reasoning process.
The inference example script is usually located in the `inference.py` file; after running it, the model returns the generated results.
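A minimal sketch of these steps, assuming the standard Hugging Face Transformers API; the actual `inference.py` in gpt-oss-recipes may differ in details such as generation parameters:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

# Load the tokenizer and model (device_map="auto" assumes accelerate is installed)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    # Optional system prompt to request a more detailed reasoning process
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "How do I write a sorting algorithm in Python?"},
]

# Apply the chat template and move the input ids to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a response and decode only the newly generated tokens
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```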
This answer comes from the article "Collection of scripts and tutorials for fine-tuning OpenAI GPT OSS models".