Complete Guide to Local Deployment
Jan-nano provides a standardized local deployment process divided into four key steps:
- Environment preparation: Python 3.8+ and Git are required; an isolated virtual environment (venv) is recommended
- Dependency installation: install the transformers and vLLM libraries via pip for optimal inference performance
- Model download: use the huggingface-cli tool to obtain the official model or a third-party quantized version (e.g., bartowski's GGUF builds); steps 1-3 are sketched in the commands after this list
- Service startup: launch the vLLM engine with the parameters that match your variant:
  - The base version runs with standard parameters
  - The 128k version requires the --enable-auto-tool-choice configuration
  - The --rope-scaling parameters enable the extended context
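The first three steps condense into a handful of shell commands. This is a minimal sketch assuming a Linux/macOS shell and the Menlo/Jan-nano repository ID from the deployment example below; adjust paths, versions, and the repository to your setup.

```bash
# 1. Environment preparation: isolated virtual environment (Python 3.8+)
python3 -m venv jan-nano-env
source jan-nano-env/bin/activate

# 2. Dependency installation: vLLM for serving, transformers for the model code
pip install --upgrade pip
pip install vllm transformers

# 3. Model download: official weights via huggingface-cli
# (for GGUF quantizations, download the corresponding file from a
#  third-party repository such as bartowski's instead)
huggingface-cli download Menlo/Jan-nano --local-dir ./Jan-nano
```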
Typical deployment example: `vllm serve Menlo/Jan-nano --port 1234 --enable-auto-tool-choice`
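For the extended-context variant, the launch command additionally passes a RoPE scaling configuration. The sketch below is illustrative only: the Menlo/Jan-nano-128k repository ID, the JSON keys accepted by --rope-scaling (which vary across vLLM versions), and the scaling factor and original context length are assumptions here, not values confirmed by this article; consult the model card for the real settings.

```bash
# Base version: standard parameters (same as the example above)
vllm serve Menlo/Jan-nano --port 1234 --enable-auto-tool-choice
# Note: some vLLM versions also require --tool-call-parser alongside
# --enable-auto-tool-choice.

# 128k version (illustrative): extend the context via RoPE scaling.
# The factor and original_max_position_embeddings values are placeholders.
vllm serve Menlo/Jan-nano-128k --port 1234 \
  --enable-auto-tool-choice \
  --rope-scaling '{"rope_type": "yarn", "factor": 3.2, "original_max_position_embeddings": 40960}' \
  --max-model-len 131072
```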
After deployment, validation tests can be run against the REST API or with the Python requests library. Note: choose the quantization level according to available GPU memory; the Q4_K_M version is recommended for devices with 8 GB of VRAM.
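As a quick smoke test, a minimal sketch using curl against the OpenAI-compatible endpoints that `vllm serve` exposes; the port matches the example command above and the prompt is arbitrary.

```bash
# List the models served on port 1234
curl http://localhost:1234/v1/models

# Send a minimal chat completion request to verify that inference works
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Menlo/Jan-nano",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```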
This answer comes from the article "Jan-nano: a lightweight and efficient model for text generation".