Qwen3-8B-BitNet offers two characteristic inference modes:
- cast(enable_thinking=True): suitable for complex logic tasks, will generate a detailed reasoning process. For example, when dealing with math equations, it will show the steps to solve the problem step-by-step
- modus vivendi(enable_thinking=False): faster response time, suitable for simple Q&A or daily conversation scenarios
Mode switching method:
The switch is made by setting the enable_thinking parameter when calling the apply_chat_template function. Typical code example:
# 启用思考模式
text = tokenizer.apply_chat_template(messages,
tokenize=False,
enable_thinking=True)
# 禁用思考模式
text = tokenizer.apply_chat_template(messages,
tokenize=False,
enable_thinking=False)
In practice, it is recommended that Thinking Mode be enabled for tasks that require step-by-step analysis, and that Non-Thinking Mode be used for simple tasks that are time-sensitive.
This answer comes from the articleQwen3-8B-BitNet: an open source language model for efficient compressionThe