The steps to run model inference in gpt-oss-recipes are as follows:
- Load the model and tokenizer: use `AutoModelForCausalLM` and `AutoTokenizer` to load the specified model (e.g. `openai/gpt-oss-20b`).
- Prepare the input prompt: define the user message (e.g. "How do I write a sorting algorithm in Python?") and process it with the `apply_chat_template` method.
- Generate the result: call the `model.generate` method to produce a response and decode the output with the tokenizer.
- Adjust inference parameters (optional): the level of reasoning detail can be controlled through the system prompt, for example setting `"Reasoning: high"` to generate a more detailed reasoning process.
The inference example script is usually located in the `inference.py` file; after running it, the model returns the generated results.
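A minimal sketch of these steps, assuming the standard Hugging Face Transformers API; the actual `inference.py` in gpt-oss-recipes may differ in details such as generation parameters:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"

# Load the tokenizer and model (device_map="auto" assumes accelerate is installed)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    # Optional system prompt to request a more detailed reasoning process
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "How do I write a sorting algorithm in Python?"},
]

# Apply the chat template and move the input ids to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a response and decode only the newly generated tokens
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```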
This answer comes from the article "Collection of scripts and tutorials for fine-tuning OpenAI GPT OSS models".