Deployment Options
dots.llm1 provides a variety of deployment methods for different usage scenarios.
1. Docker deployment (recommended)
- Install Docker and the NVIDIA Container Toolkit.
- Run the following command to pull the image and start an OpenAI-compatible server:
docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host rednotehilab/dots1:vllm-openai-v0.9.0.1 --model rednote-hilab/dots.llm1.base --tensor-parallel-size 8 --trust-remote-code --served-model-name dots1
- Use curl to test whether the service is working.
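A minimal curl check might look like the following sketch, assuming the server is reachable on localhost:8000 and was started with --served-model-name dots1 as above (the prompt text is illustrative):

```shell
# Send a small chat-completion request to the OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "dots1",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```

A JSON response containing a "choices" field indicates the service is up.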
2. Hugging Face Transformers
- Install the dependencies:
pip install transformers torch
- Load the model and tokenizer:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = 'rednote-hilab/dots.llm1.base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
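Once loaded, generation follows the standard Transformers pattern. A hedged sketch (the prompt and generation parameters are illustrative; a model of this size needs multiple GPUs, hence device_map="auto"):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'rednote-hilab/dots.llm1.base'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" shards the large MoE model across available GPUs;
# bfloat16 halves memory relative to float32
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Tokenize a prompt, generate a continuation, and decode it back to text
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```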
3. High-throughput inference with vLLM
Suitable for large-scale inference serving:
vllm serve rednote-hilab/dots.llm1.base --port 8000 --tensor-parallel-size 8
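The vLLM server exposes an OpenAI-compatible API, so it can also be queried with the official openai Python client. A sketch assuming the server runs locally on port 8000 (the api_key value is a placeholder; vLLM does not validate it by default):

```python
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# dots.llm1.base is a base (non-chat) model, so use the completions endpoint
resp = client.completions.create(
    model="rednote-hilab/dots.llm1.base",
    prompt="Explain mixture-of-experts in one sentence:",
    max_tokens=64,
)
print(resp.choices[0].text)
```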
This answer comes from the article "dots.llm1: the first MoE large language model open-sourced by Little Red Book".