Installation process details
- Clone the repository from GitHub:
git clone https://github.com/NVIDIA/TensorRT-LLM.git
- Enter the directory and run the build:
make build
- Install Python dependencies:
pip install -r requirements.txt
Verification steps
Run the following to confirm that the installation succeeded:
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
If it reports an error, check:
- whether the CUDA toolchain is complete
- whether the GPU driver version matches
- whether the Python environment is isolated
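The checklist above can be partly automated. The following is a minimal sketch, not an official diagnostic: it assumes that finding `nvcc` and `nvidia-smi` on `PATH` is a reasonable proxy for the CUDA toolchain and GPU driver being present, and that the Python environment is a standard `venv` (conda environments are not detected this way). The function names `check_environment` and `installed_version` are hypothetical, and the distribution name passed to `installed_version` is an assumption.

```python
import shutil
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Return coarse True/False checks mirroring the checklist above."""
    return {
        # CUDA toolchain: is the nvcc compiler reachable on PATH?
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # GPU driver: is the nvidia-smi utility reachable on PATH?
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Isolation: inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'not detected'}")
    # "tensorrt_llm" as a distribution name is an assumption.
    print("tensorrt_llm version:", installed_version("tensorrt_llm"))
```

These checks only show that the tools exist; they do not verify that the driver version matches the CUDA version, which still has to be compared by hand.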
Common problems
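The build may run into dependency problems with libraries such as cuBLAS; it is advisable to preinstall the CUDA development toolkit by following NVIDIA's official documentation. Multi-GPU deployments also require configuring the NCCL communication library.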
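Before starting a build, it can help to see whether the shared libraries mentioned above are discoverable at all. This is a rough sketch using the standard-library `ctypes.util.find_library`; the helper name `locate_libraries` is hypothetical, and the short names `cublas` and `nccl` are assumed to correspond to `libcublas` and `libnccl` on Linux. A `None` result only means the system's default search did not find the library, so installations in non-standard paths may be missed.

```python
from ctypes.util import find_library


def locate_libraries(names=("cublas", "nccl")):
    """Map each short library name to what the system linker search finds.

    A value of None means the library was not found in the default
    search locations; it may still exist in a custom path.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_libraries().items():
        print(f"{name}: {found if found else 'not found in default search path'}")
```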
This answer comes from the article "DeepSeek-R1-FP4: FP4-optimized version of DeepSeek-R1 inference 25x faster".