Multi-LoRA loading solution
When loading multiple LoRA adapters at the same time with vllm-cli, the following approach is recommended:
- Parameter combination: specify multiple adapter paths with the `--lora-modules` parameter in the format "name1:path1,name2:path2".
- GPU memory planning: each LoRA takes up roughly 200-500 MB of GPU memory; run `vllm-cli info` to check the remaining capacity before loading.
- Version compatibility: ensure that all LoRA adapters match the base model version.
- Weighted fusion: advanced users can pass a weight configuration via the `--lora-extra-config` parameter.
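The parameter-combination and memory-planning points above can be sketched as a small shell helper. The adapter names and paths are illustrative, and the 500 MB figure is just the worst case from the guidance above, not a measured value:

```shell
# Build the --lora-modules value from a list of name:path pairs and
# estimate the worst-case extra GPU memory (500 MB per adapter).
adapters="adapter1:/path/lora1 adapter2:/path/lora2"  # illustrative pairs
count=0
spec=""
for a in $adapters; do
  # Append with a comma separator after the first entry.
  spec="${spec:+$spec,}$a"
  count=$((count + 1))
done
echo "--lora-modules \"$spec\""
echo "worst-case extra memory: $((count * 500)) MB"
```

This only prints the argument string; compare the estimate against the free memory reported by `vllm-cli info` before serving.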
Example:

```shell
vllm-cli serve base_model --lora-modules "adapter1:/path/lora1,adapter2:/path/lora2"
```
Troubleshooting: if loading fails, first check each LoRA's README.md to confirm compatibility, then load each adapter individually to test. It is recommended to test the individual LoRAs in interactive mode before loading them in combination.
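The individual-adapter test described above can be sketched as a loop. The adapter names and paths are illustrative, and the serve commands are only printed here rather than executed, so the sketch stays side-effect free:

```shell
# Isolate a failing adapter by generating one single-adapter serve
# command per entry; run each command manually to find the culprit.
adapters="adapter1:/path/lora1 adapter2:/path/lora2"  # illustrative pairs
for a in $adapters; do
  echo "vllm-cli serve base_model --lora-modules \"$a\""
done
```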
This answer comes from the article "vLLM CLI: Command Line Tool for Deploying Large Language Models with vLLM".