Environment constraints
HRM relies on custom CUDA extensions by default, but it can be run on AMD or Intel hardware using one of the following workarounds:
- Option 1: CPU mode
  - Install the CPU-only build of PyTorch: `pip install torch --index-url https://download.pytorch.org/whl/cpu`
  - Replace all `.cuda()` calls in the code with `.cpu()`
  - Set the environment variable: `export CUDA_VISIBLE_DEVICES=-1`
  - Note: inference is roughly 10x slower
- Option 2: ROCm port
  - Install the ROCm build of PyTorch
  - Enable automatic optimization with `torch.compile()`
  - Rewrite the CUDA kernels as HIP code
- Option 3: Cloud services
  - Deploy to Azure ML via ONNX Runtime
  - Convert the model with TensorRT-LLM
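For Options 1 and 2, the scattered `.cuda()` edits can be avoided by selecting the device once at startup. ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API, so a single check covers NVIDIA, AMD, and the CPU fallback. The sketch below is a minimal illustration of that pattern, not HRM's actual code:

```python
import torch

# ROCm builds of PyTorch report AMD GPUs through torch.cuda, so this
# one check covers NVIDIA (CUDA), AMD (ROCm), and the CPU fallback.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model and inputs once instead of sprinkling .cuda() calls.
model = torch.nn.Linear(16, 4).to(device)
x = torch.randn(2, 16, device=device)
out = model(x)
print(tuple(out.shape))  # (2, 4)
```

With this pattern, the same script runs unmodified on all three backends; only the installed PyTorch wheel differs.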
Performance comparison
| Setup | Relative speed | Memory footprint |
|---|---|---|
| RTX 4090 | 100% | 8GB |
| AMD MI250 | 85% | 11GB |
| Intel Xeon | 12% | 32GB |
This answer comes from the article *HRM: Hierarchical Reasoning Model for Complex Reasoning*.