Preparation of the basic environment
- hardware requirement:NVIDIA H800等支持NVLink/RDMA的GPU
- network requirement:InfiniBand 400Gb/s以上
- software dependency:CUDA 11+/NCCL/Python 3.8+
Step-by-step installation guide
- 克隆源码库:
git clone https://github.com/deepseek-ai/DeepEP.git
- 安装修改版NVSHMEM(需应用补丁文件)
- 执行编译:
make
生成内核文件 - Setting environment variables:
export NVSHMEM_IB_SL=0
export NVSHMEM_ENABLE_ADAPTIVE_ROUTING=1
(限低延迟模式)
Verify Installation
Run the test script:python tests/test_low_latency.py
预期输出应包含:
- All-to-all通信成功标志
- 各GPU间的带宽测试结果
- 时延统计信息
This answer comes from the articleDeepEP: An Open Source Tool to Optimize Communication Efficiency Specifically for MoE Models (DeepSeek Open Source Week Day 2)The