Seed-OSS provides flexible reasoning budget control: the thinking_budget parameter lets users adjust the length of the model's reasoning at inference time, trading speed against depth. Suggested settings:
- Simple tasks (e.g., translation): set thinking_budget=128.
- Medium-complexity tasks (e.g., regular Q&A): set thinking_budget=512.
- Complex tasks (e.g., mathematical reasoning or code generation): set thinking_budget=1024.
The parameter can be set directly in the generation script, for example in Python:
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=1024,  # reasoning budget; 1024 is the complex-task setting
)
By adjusting this parameter, the user can optimize the model's reasoning efficiency and effectiveness according to the actual task requirements.
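The three suggested settings can be kept in one place so that a script picks a budget by task category instead of hard-coding a number. The following is only a minimal sketch: the names TASK_BUDGETS, budget_for, and template_kwargs, as well as the category labels "simple", "medium", and "complex", are hypothetical and not part of Seed-OSS or transformers. Only the budget values and the apply_chat_template arguments come from the answer above.

```python
# Hypothetical lookup table: category labels are illustrative,
# the budget values are the ones suggested in this answer.
TASK_BUDGETS = {
    "simple": 128,    # e.g. translation
    "medium": 512,    # e.g. regular Q&A
    "complex": 1024,  # e.g. mathematical reasoning or code generation
}


def budget_for(task_type: str) -> int:
    """Return the suggested thinking budget for a task category."""
    try:
        return TASK_BUDGETS[task_type]
    except KeyError:
        # Fail loudly rather than silently guessing a budget.
        raise ValueError(f"unknown task type: {task_type!r}") from None


def template_kwargs(task_type: str) -> dict:
    """Build the keyword arguments used in the apply_chat_template call above."""
    return {
        "tokenize": True,
        "add_generation_prompt": True,
        "return_tensors": "pt",
        "thinking_budget": budget_for(task_type),
    }
```

Assuming tokenizer and messages are already defined as in the snippet above, the call would then read tokenizer.apply_chat_template(messages, **template_kwargs("complex")). Raising an error for unknown categories is a design choice of this sketch; a script could equally fall back to a default setting.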
This answer comes from the article "Seed-OSS: Open Source Large Language Model for Long Context Reasoning and Versatile Applications".































