Qwen3-235B-A22B-Thinking-2507 is a large-scale language model developed by the Alibaba Cloud Qwen team, released on July 25, 2025 and hosted on the Hugging Face platform. It focuses on complex reasoning tasks, supports context lengths of up to 256K (262,144) tokens, and is suited to logical reasoning, math, science, programming, and academic work. The model uses a Mixture of Experts (MoE) architecture with 235 billion total parameters, of which 22 billion are activated per inference, balancing performance and efficiency. It performs strongly among open-source reasoning models and is particularly well suited to applications that require deep thinking and long-context processing. It can be deployed with a variety of inference frameworks such as transformers, sglang, and vLLM, and also supports running locally.

Feature List

  • Supports ultra-long context understanding of 256K tokens for processing complex documents or multi-turn dialogue.
  • Provides strong logical reasoning for math, science, and academic problems.
  • Specializes in programming tasks, supporting code generation and debugging.
  • Integrates tool invocation, simplifying interaction with external tools through Qwen-Agent.
  • Supports more than 100 languages, suitable for multilingual instruction following and translation.
  • Offers a quantized FP8 version to reduce hardware requirements and optimize inference performance.
  • Compatible with a variety of inference frameworks such as transformers, sglang, vLLM, and llama.cpp.

Usage Guide

Installation and Deployment

To use Qwen3-235B-A22B-Thinking-2507, you need a high-performance computing environment because the model files are large (about 437.91 GB for the BF16 version and 220.20 GB for the FP8 version). The detailed installation steps are as follows:

  1. Environment preparation:
    • Make sure the hardware meets the requirements: 88GB of GPU memory is recommended for the BF16 version, and about 30GB for the FP8 version.
    • Install Python 3.8+ and PyTorch; a GPU environment with CUDA support is recommended.
    • Install the Hugging Face transformers library (version ≥ 4.51.0) to avoid compatibility issues:
      pip install "transformers>=4.51.0"

    • Optionally install sglang (≥ 0.4.6.post1) or vLLM (≥ 0.8.5) for efficient inference:
      pip install "sglang>=0.4.6.post1" "vllm>=0.8.5"
      
  2. Download the model:
    • Download the model from the Hugging Face repository:
      huggingface-cli download Qwen/Qwen3-235B-A22B-Thinking-2507
      
    • For the FP8 version, download Qwen3-235B-A22B-Thinking-2507-FP8:
      huggingface-cli download Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
      
  3. Run locally:
    • Use transformers to load the model (a complete generation sketch follows this list):
      from transformers import AutoModelForCausalLM, AutoTokenizer
      model_name = "Qwen/Qwen3-235B-A22B-Thinking-2507"
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
      
    • Alternatively, launch an sglang server; to avoid running out of memory, the context length can be reduced (e.g., to 32,768 tokens):
      python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-Thinking-2507 --tp 8 --context-length 32768 --reasoning-parser deepseek-r1
      
  4. Tool call configuration:
    • Simplify tool calls with Qwen-Agent:
      from qwen_agent.agents import Assistant

      # Use the DashScope-hosted model; Qwen-Agent handles the tool-call protocol.
      llm_cfg = {
          'model': 'qwen3-235b-a22b-thinking-2507',
          'model_type': 'qwen_dashscope'
      }
      # Register an MCP time server as an external tool.
      tools = [{'mcpServers': {'time': {'command': 'uvx', 'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']}}}]
      bot = Assistant(llm=llm_cfg, function_list=tools)
      messages = [{'role': 'user', 'content': 'Get the current time'}]
      for responses in bot.run(messages=messages):
          print(responses)
      
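Once the model and tokenizer from step 3 are loaded, the following is a minimal generation sketch; the prompt and max_new_tokens are illustrative choices rather than values from the model card:

    # Build a chat prompt, generate, and separate the reasoning from the final answer.
    messages = [{"role": "user", "content": "Prove Fermat's Little Theorem."}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=4096)
    result = tokenizer.decode(output_ids[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
    # Thinking-mode output includes a </think> marker; everything after it is the final answer.
    thinking, _, answer = result.partition("</think>")
    print(answer.strip())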

Main Functions

  • Complex reasoning: Thinking mode is enabled by default and the output contains <think> tags, which suits solving mathematical or logical problems. For example, enter "Prove Fermat's Little Theorem" and the model generates a step-by-step reasoning process.
  • Long context processing: Supports 256K tokens and is suitable for analyzing long documents. After inputting a long text, the model can extract key information or answer related questions.
  • Programming support: Enter a code snippet or a request such as "Write a Python sorting algorithm", and the model generates complete code and explains the logic (see the sketch after this list for sending such a request to a locally deployed server).
  • Tool calls: With Qwen-Agent, the model can invoke external tools, such as getting the time or executing web requests, simplifying complex tasks.
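
If you launched a local sglang or vLLM server as in step 3, both expose an OpenAI-compatible API. A minimal sketch for sending a programming request to it, assuming the server listens at http://localhost:30000/v1 (sglang's default port; vLLM defaults to 8000) and ignores the API key locally:

    from openai import OpenAI

    # Assumed local endpoint; adjust the host, port, and model name to match your server.
    client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-Thinking-2507",
        messages=[{"role": "user", "content": "Write a Python sorting algorithm"}],
        max_tokens=2048,
    )
    print(response.choices[0].message.content)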

Caveats

  • In reasoning mode, a context length of at least 131,072 tokens is recommended to ensure performance.
  • Avoid greedy decoding, which may result in repetitive output (see the sampling sketch after this list).
  • For local runs, Ollama or LM Studio can be used, but the context length needs to be adjusted to avoid looping problems.
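
Reusing the model and inputs from the generation sketch above, a minimal sketch of non-greedy sampling; temperature=0.6, top_p=0.95, and top_k=20 follow the settings suggested in the model card for thinking mode, but verify them against the current card:

    # Sample instead of decoding greedily to avoid repetitive output.
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
        max_new_tokens=4096,
    )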

Application Scenarios

  1. Academic research
    Researchers can use the model to analyze long papers, extract key arguments, or validate mathematical formulas. Its 256K context length supports processing entire documents and is suitable for literature reviews or cross-chapter analysis.
  2. Programming development
    Developers can use the model to generate code, debug programs, or optimize algorithms. For example, enter a complex algorithm requirement and the model provides the code and explains the implementation steps.
  3. Multilingual translation
    Enterprises can use the model for multilingual document translation or instruction processing; with support for more than 100 languages, it is suitable for cross-border communication or localization tasks.
  4. Educational support
    Students and teachers can use the model to answer math and science questions or to generate teaching materials. Its reasoning ability helps explain complex concepts.

FAQ

  1. What inference frameworks does the model support?
    It supports transformers, sglang, vLLM, Ollama, LM Studio, and llama.cpp. Using the latest versions is recommended to ensure compatibility.
  2. How do I deal with out-of-memory problems?
    Reduce the context length to 32,768 tokens, or use the FP8 version to lower memory requirements. You can also spread the model across multiple GPUs via the tensor-parallel-size parameter (see the example command after this list).
  3. How do I enable the tool call feature?
    Configure tools with Qwen-Agent by defining MCP configuration files or built-in tools; the model can then call external functions automatically.
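
As an illustration of the options mentioned in question 2, a hedged vLLM launch command might look like the following; the exact flags depend on your vLLM version, so treat this as a sketch rather than a verified recipe:

    vllm serve Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 \
      --tensor-parallel-size 8 \
      --max-model-len 32768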