The repository supports rapid deployment of models via vLLM and Ollama:
- vLLM deployment:
  - Install vLLM: run `uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/`
  - Start the server: run `vllm serve openai/gpt-oss-20b`, which provides an OpenAI-compatible API.
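Once the vLLM server is running, it can be queried like any OpenAI-compatible endpoint. A minimal sketch, assuming the server's default address of `http://localhost:8000` (adjust if you passed `--host`/`--port`):

```python
import json
from urllib.request import Request, urlopen

# Assumed default address of `vllm serve`; change if you customized it.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt, model="openai/gpt-oss-20b"):
    """Build an OpenAI-style chat-completions payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def query_vllm(prompt):
    """POST the payload to the running vLLM server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = Request(VLLM_URL, data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:  # requires a running `vllm serve` instance
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python client works the same way if you point its `base_url` at the server.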
- Ollama deployment:
  - Pull the model: run `ollama pull gpt-oss:20b` to download it.
  - Start the model: run `ollama run gpt-oss:20b` to run it on consumer-grade hardware.
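Ollama also exposes a local REST API (on port 11434 by default), so the pulled model can be called from code as well. A minimal sketch against the `/api/generate` endpoint:

```python
import json
from urllib.request import Request, urlopen

# Ollama's default local API address.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(prompt, model="gpt-oss:20b"):
    """Build a non-streaming generate payload for the Ollama API."""
    return {"model": model, "prompt": prompt, "stream": False}

def query_ollama(prompt):
    """POST the payload to the local Ollama daemon and return the reply text."""
    payload = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = Request(OLLAMA_URL, data=payload,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:  # requires the Ollama daemon to be running
        return json.load(resp)["response"]
```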
The two approaches suit different scenarios: vLLM is geared toward production API deployment, while Ollama is geared toward local testing and development.
This answer comes from the article "Collection of scripts and tutorials for fine-tuning OpenAI GPT OSS models".