Seed-OSS is a series of open-source large language models developed by ByteDance's Seed team, focused on long-context processing, reasoning capability, and agent task optimization. The models contain 36 billion parameters, were trained on only 12 trillion tokens, perform strongly on many mainstream benchmarks, and support ultra-long context processing up to 512K tokens, making them well suited to internationalized application scenarios. Seed-OSS provides flexible reasoning budget control, letting users adjust reasoning length to their needs and improve efficiency in practical applications. Seed-OSS is released under the Apache-2.0 license and is fully open source, so developers can use and modify it freely. It is widely used in research, reasoning tasks, and multimodal scenarios, and has been deployed in more than 50 real-world applications at ByteDance.
Function List
- Ultra-long context processing: Supports a context window of 512K tokens (roughly 1,600 pages of text), suitable for long documents or complex dialogs.
- Flexible reasoning budget control: The `thinking_budget` parameter dynamically adjusts reasoning length to balance speed and depth.
- Strong reasoning: Optimized for complex tasks such as math and code generation, with excellent results on benchmarks such as AIME and LiveCodeBench.
- Internationalization: Supports multilingual tasks for developers worldwide, covering translation and understanding across many languages.
- Agent task support: Built-in tool-calling functionality; with `enable-auto-tool-choice`, tasks can be processed automatically.
- Efficient deployment: Supports multi-GPU inference and is compatible with the `bfloat16` data type to optimize inference efficiency.
- Open source and community support: Released under the Apache-2.0 license with full model weights and code, easy for developers to customize.
Usage Guide
Installation process
To use a Seed-OSS model, follow the steps below to install and configure it locally or on a server. The following example uses the Seed-OSS-36B-Instruct model and is based on the official guide provided on GitHub.
- Clone the repository:
```bash
git clone https://github.com/ByteDance-Seed/seed-oss.git
cd seed-oss
```
- Install dependencies:
Make sure Python 3.8+ and pip are installed on your system, then run:
```bash
pip3 install -r requirements.txt
pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
```
- Install vLLM (recommended):
Seed-OSS supports the vLLM framework for more efficient inference. Install it with:
```bash
VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss
```
- Download model weights:
Download the Seed-OSS-36B-Instruct weights from Hugging Face:
```bash
huggingface-cli download ByteDance-Seed/Seed-OSS-36B-Instruct --local-dir ./Seed-OSS-36B-Instruct
```
- Configure the runtime environment:
Ensure your system has multi-GPU hardware (e.g., NVIDIA H100). The recommended configuration is `tensor-parallel-size=8` with the `bfloat16` data type to optimize performance.
- Start the inference service:
Use vLLM to start an OpenAI-compatible API service:
```bash
python3 -m vllm.entrypoints.openai.api_server \
  --host localhost \
  --port 4321 \
  --enable-auto-tool-choice \
  --tool-call-parser seed_oss \
  --trust-remote-code \
  --model ./Seed-OSS-36B-Instruct \
  --chat-template ./Seed-OSS-36B-Instruct/chat_template.jinja \
  --tensor-parallel-size 8 \
  --dtype bfloat16 \
  --served-model-name seed_oss
```
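Once the service is running, it can be queried from Python with any OpenAI-compatible client. A minimal smoke-test sketch, assuming the `openai` package is installed and the server above is listening on port 4321 (the placeholder API key is arbitrary):

```python
# Minimal smoke test for the OpenAI-compatible vLLM endpoint started above.
# Assumption: `pip install openai` and the server is running on localhost:4321.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4321/v1",
    api_key="EMPTY",  # vLLM does not validate the key by default
)

response = client.chat.completions.create(
    model="seed_oss",  # must match --served-model-name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```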
Usage
Seed-OSS supports several usage patterns for different scenarios. The main workflows are detailed below.
1. Basic dialog and reasoning
Use a Python script to interact with the model. The example below generates a cooking tutorial:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ByteDance-Seed/Seed-OSS-36B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "How to make pasta?"}]
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=512,  # cap on reasoning tokens before the final answer
)

outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
output_text = tokenizer.decode(outputs[0])
print(output_text)
```
- Key parameters:
  - `thinking_budget=512`: Controls reasoning depth; larger values allow deeper reasoning, suited to complex tasks.
  - `max_new_tokens=2048`: Sets the maximum number of tokens to generate, which bounds output length.
2. Long-context processing
Seed-OSS supports 512K-token contexts, suitable for processing long documents or multi-turn conversations. For example, to analyze a long report:

- Pass the document text as the `messages` input, in the format `[{"role": "user", "content": "<long document content>"}]`.
- Set a high `thinking_budget` (e.g., 1024) to ensure deep reasoning.
- Reuse the script above to generate summaries or answer questions, as in the sketch below.
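A minimal sketch that reuses the setup from section 1 to summarize a long report (the file path and prompt are illustrative):

```python
# Sketch: summarizing a long document with the section-1 setup.
# Assumption: `tokenizer` and `model` are already loaded as in section 1;
# "report.txt" is a hypothetical file path.
with open("report.txt", "r", encoding="utf-8") as f:
    document = f.read()  # the 512K-token window accommodates very long inputs

messages = [{
    "role": "user",
    "content": f"Summarize the key findings of this report:\n\n{document}",
}]
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=1024,  # higher budget for deeper reasoning over long input
)
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
print(tokenizer.decode(outputs[0]))
```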
3. Agent tasks and tool calls
Seed-OSS supports automated tool invocation via `enable-auto-tool-choice`. After configuring the API service as above, you can invoke the model via an HTTP request:
```bash
curl http://localhost:4321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seed_oss",
    "messages": [{"role": "user", "content": "Calculate 2+2"}]
  }'
```
- The model automatically selects an appropriate tool (e.g., a math calculator) and returns the result.
- Make sure `--tool-call-parser seed_oss` is enabled so that tool calls are parsed correctly. A Python variant with an explicit tool definition is sketched below.
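The endpoint also accepts tool definitions in the standard OpenAI format. A hedged sketch, where the `calculate` tool is a hypothetical illustration rather than one shipped with the model:

```python
# Sketch: passing a tool definition to the tool-call-enabled endpoint.
# Assumption: the `calculate` tool below is hypothetical; any OpenAI-format
# function definition works the same way.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4321/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "e.g. '2+2'"},
            },
            "required": ["expression"],
        },
    },
}]

response = client.chat.completions.create(
    model="seed_oss",
    messages=[{"role": "user", "content": "Calculate 2+2"}],
    tools=tools,
)

# With --enable-auto-tool-choice, the model may answer directly or emit a tool call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    print(message.content)
```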
4. Reasoning budget optimization
Users can tune the `thinking_budget` parameter to trade off inference cost against quality:

- Simple tasks (e.g., translation): set `thinking_budget=128`.
- Complex tasks (e.g., mathematical reasoning): set `thinking_budget=1024`.
Example:
```python
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=1024,
)
```
5. Deployment optimization
- Multi-GPU inference: Allocate GPU resources via the `tensor-parallel-size` parameter; for example, `tensor-parallel-size=8` suits 8 GPUs.
- Data type: Use `bfloat16` to reduce GPU memory footprint in large-scale deployments.
- Generation configuration: `temperature=1.1` and `top_p=0.95` are recommended for diverse output; for specific tasks (e.g., TauBench), adjust to `temperature=1` and `top_p=0.7`. These settings are combined with `bfloat16` loading in the sketch below.
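A sketch putting these knobs together for the local transformers path, using `bfloat16` loading plus the recommended sampling settings from the list above (the prompt is illustrative):

```python
# Sketch: bfloat16 loading plus the recommended sampling configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ByteDance-Seed/Seed-OSS-36B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # halves memory versus float32
)

messages = [{"role": "user", "content": "Write a haiku about autumn."}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(
    inputs.to(model.device),
    max_new_tokens=512,
    do_sample=True,
    temperature=1.1,  # recommended default for diverse output
    top_p=0.95,       # switch to temperature=1, top_p=0.7 for tasks like TauBench
)
print(tokenizer.decode(outputs[0]))
```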
Notes
- Hardware requirements: At least one NVIDIA H100-80G GPU is recommended; four GPUs support heavier workloads.
- Model selection: Seed-OSS ships Base and Instruct versions; Instruct is better suited to interactive tasks, Base to research and fine-tuning.
- Community support: Contribute by submitting issues or pull requests on GitHub.
Application Scenarios
- Academic research
  - Scenario: Researchers can use Seed-OSS for long-document analysis, data extraction, or complex reasoning tasks, such as analyzing academic papers or generating summaries of research reports.
- Multilingual applications
  - Scenario: Developers can leverage the model's multilingual support to build internationalized chatbots or translation tools covering many languages.
- Automation agents
  - Scenario: Organizations can deploy Seed-OSS as an intelligent agent for customer service, automated task scheduling, or data analysis.
- Code generation
  - Scenario: Programmers can use the model to generate code snippets or debug complex algorithms, using the 512K context to work with large codebases.
- Educational support
  - Scenario: Educational institutions can use the models to generate instructional materials, answer student questions, or provide personalized study guides.
FAQ
- What languages does Seed-OSS support?
  - The model is optimized for international scenarios and supports multiple languages, including English, Chinese, and Spanish; see the FLORES-200 benchmark for detailed results.
- How do I adjust the reasoning budget?
  - Set the `thinking_budget` parameter in the generation script, from 128 (simple tasks) up to 1024 (complex tasks), according to task requirements.
- How much GPU memory is needed to run the model?
  - A single H100-80G GPU supports basic inference; four GPUs handle higher-load tasks. Using `bfloat16` reduces memory requirements.
- How do I get involved in model development?
  - Submit code or report issues via the GitHub repository (https://github.com/ByteDance-Seed/seed-oss), under the Apache-2.0 license.