HRM (Hierarchical Reasoning Model) is a 27-million-parameter model designed to solve complex reasoning tasks in artificial intelligence. Its design is inspired by the hierarchical, multi-timescale information processing of the human brain. Using a recurrent architecture with two interdependent modules, a high-level module responsible for slow, abstract planning and a low-level module handling fast, concrete computations, it performs sequential reasoning tasks in a single forward pass without explicit supervision of intermediate steps. With only 1,000 training samples and no pre-training or chain-of-thought (CoT) data, HRM achieves near-perfect performance on complex tasks such as Sudoku and large-maze pathfinding, and it outperforms many much larger models on the Abstraction and Reasoning Corpus (ARC), a key benchmark for general-purpose AI.
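To make the two-timescale recurrence concrete, here is a minimal PyTorch sketch of the idea. The module names, state sizes, and update schedule are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of the two-timescale recurrence described above.
# Module names, sizes, and the update schedule are illustrative assumptions,
# not the actual HRM code.
class TwoTimescaleReasoner(nn.Module):
    def __init__(self, dim: int = 128, low_steps_per_high_step: int = 4):
        super().__init__()
        self.low = nn.GRUCell(dim, dim)    # fast, concrete computation
        self.high = nn.GRUCell(dim, dim)   # slow, abstract planning
        self.readout = nn.Linear(dim, dim)
        self.low_steps = low_steps_per_high_step

    def forward(self, x: torch.Tensor, high_steps: int = 8) -> torch.Tensor:
        z_low = torch.zeros_like(x)
        z_high = torch.zeros_like(x)
        # A single forward pass unrolls many recurrent updates, giving
        # computational depth without supervising intermediate steps.
        for _ in range(high_steps):
            for _ in range(self.low_steps):
                # the low-level module is conditioned on the high-level state
                z_low = self.low(x + z_high, z_low)
            # the high-level module updates slowly, from the low-level result
            z_high = self.high(z_low, z_high)
        return self.readout(z_high)

# usage
model = TwoTimescaleReasoner(dim=128)
tokens = torch.randn(2, 128)   # a batch of 2 puzzle embeddings (illustrative)
out = model(tokens)            # one forward pass, no chain-of-thought supervision
print(out.shape)               # torch.Size([2, 128])
```

The nested loop is the point of the sketch: the low-level state is refined several times for every slow update of the high-level plan, which is how depth accumulates within a single forward pass.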
Feature List
- Efficient Reasoning: A novel recurrent architecture is used to gain tremendous computational depth while maintaining training stability and efficiency.
- No pre-training required: Models can learn directly from a small number of samples without a large-scale pre-training process.
- Low data requirements: High performance on complex reasoning tasks can be achieved with only 1,000 training samples.
- Two-module structure: Contains a high-level module for abstract planning and a low-level module for fast, concrete computations.
- Wide applicability: Performs well on a variety of complex benchmarks, for example:
- Sudoku 9×9 Extreme
- 30×30 Maze Path Finding (Maze 30×30 Hard)
- Abstraction and Reasoning Corpus (ARC-AGI-2)
- Open source: The code is open-sourced on GitHub, and pre-trained model checkpoints are provided.
Usage Guide
Installing and using HRM requires a specific hardware and software environment, chiefly an NVIDIA GPU with CUDA support. The following is a detailed installation and usage procedure.
Environment Preparation
- Installing CUDA:
HRM relies on CUDA extensions, so first make sure the NVIDIA driver and CUDA toolkit are installed on your system. CUDA 12.6 is recommended.
```bash
# Download the CUDA 12.6 installer
wget -q --show-progress --progress=bar:force:noscroll -O cuda_installer.run https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_560.35.05_linux.run
# Install in silent mode
sudo sh cuda_installer.run --silent --toolkit --override
# Set the CUDA environment variable
export CUDA_HOME=/usr/local/cuda-12.6
```
- Installing PyTorch:
Install the PyTorch build that matches your installed CUDA version. (A quick sanity check for the whole environment appears after this list.)
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```
- Installing the build tools:
Some additional packages are needed to compile the CUDA extensions.
```bash
pip3 install packaging ninja wheel setuptools setuptools-scm
```
- Install FlashAttention:
Install the FlashAttention version appropriate for your GPU model.
- Hopper architecture GPUs (e.g. H100):
```bash
git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
```
- Ampere architecture or earlier GPUs (e.g. RTX 30 series, 40 series):
```bash
pip3 install flash-attn
```
- Install project dependencies:
Clone the HRM repository and install the dependencies listed in requirements.txt.
```bash
git clone https://github.com/sapientinc/HRM.git
cd HRM
pip install -r requirements.txt
```
- W&B integration (optional):
The project uses Weights & Biases for experiment tracking. If you want to visualize the training process, log in to W&B first.
```bash
wandb login
```
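Before moving on, it can help to confirm that the components installed above are visible from Python. The following check is optional and illustrative, using only standard PyTorch and stdlib calls; the flash_attn import simply verifies that the extension built, and none of this is part of the HRM repository itself.

```python
# Optional sanity check for the environment set up above (illustrative, not part of HRM).
import os
import shutil

import torch

print("CUDA_HOME:", os.environ.get("CUDA_HOME"))       # e.g. /usr/local/cuda-12.6
print("nvcc:", shutil.which("nvcc"))                    # path to nvcc, or None if not on PATH
print("torch:", torch.__version__, "cuda:", torch.version.cuda)  # should report a cu126 build
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))     # e.g. NVIDIA GeForce RTX 4070

try:
    import flash_attn                                   # confirms the FlashAttention build imports
    print("flash_attn:", flash_attn.__version__)
except ImportError as err:
    print("flash_attn not importable:", err)
```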
Quick Start: Training a Sudoku-Solving AI
This example trains a model to solve hard Sudoku puzzles on a single consumer GPU (e.g. an RTX 4070).
- Constructing the dataset:
First, download and build the Sudoku dataset. This command generates 1,000 hard Sudoku samples with data augmentation.
```bash
python dataset/build_sudoku_dataset.py --output-dir data/sudoku-extreme-1k-aug-1000 --subsample-size 1000 --num-aug 1000
```
- Start training:
Use the following command to start training on a single GPU.
```bash
OMP_NUM_THREADS=8 python pretrain.py data_path=data/sudoku-extreme-1k-aug-1000 epochs=20000 eval_interval=2000 global_batch_size=384 lr=7e-5 puzzle_emb_lr=7e-5 weight_decay=1.0 puzzle_emb_weight_decay=1.0
```
On an RTX 4070, this process takes about 10 hours.
Large-Scale Experiments
For larger experiments, such as ARC or the full Sudoku dataset, a multi-GPU environment (e.g., 8 GPUs) is recommended.
- Initializing Submodules:
```bash
git submodule update --init --recursive
```
- Preparing the dataset:
Build the dataset appropriate for the problem you want to solve.
- ARC-2:
```bash
python dataset/build_arc_dataset.py --dataset-dirs dataset/raw-data/ARC-AGI-2/data --output-dir data/arc-2-aug-1000
```
- Maze:
```bash
python dataset/build_maze_dataset.py
```
- Initiate multi-GPU training:
Use torchrun to launch distributed training. For example, to train on hard Sudoku (1,000 samples):
```bash
OMP_NUM_THREADS=8 torchrun --nproc-per-node 8 pretrain.py data_path=data/sudoku-extreme-1k-aug-1000 epochs=20000 eval_interval=2000 lr=1e-4 puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0
```
In an 8-GPU environment, this training run takes about 10 minutes.
Evaluating the Model
You can evaluate either the pre-trained checkpoints or models you have trained yourself.
- Download Checkpoints:
The official repository provides pre-trained checkpoints for the ARC, Sudoku, and Maze tasks.
- Run the evaluation script:
```bash
OMP_NUM_THREADS=8 torchrun --nproc-per-node 8 evaluate.py checkpoint=<CHECKPOINT_PATH>
```
Check the eval/exact_accuracy metric through the W&B interface. For ARC tasks, an additional evaluation step is required: after running the script above, open arc_eval.ipynb in Jupyter Notebook to finalize and inspect the results.
Caveats
- The accuracy of small-sample learning typically fluctuates by about ±2 percentage points.
- On the 1,000-sample hard-Sudoku dataset, overfitting can occur late in training and destabilize the results. An early-stopping strategy is recommended once training accuracy approaches 100% (a minimal sketch of such a check follows this list).
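The sketch below shows one way such an early-stopping check could look. It is purely illustrative: the function name, threshold, and patience values are hypothetical, and it is not part of the HRM training script.

```python
# Illustrative early-stopping helper (hypothetical, not part of the HRM codebase).
# Stop once training accuracy has stayed near 100% for several consecutive evaluations,
# to avoid the late-training overfitting described above.
def should_stop(train_acc_history: list[float], threshold: float = 0.99, patience: int = 3) -> bool:
    """Return True if the last `patience` evaluations all reached `threshold` accuracy."""
    recent = train_acc_history[-patience:]
    return len(recent) == patience and all(acc >= threshold for acc in recent)

# Example: accuracy recorded at each evaluation interval.
history = [0.82, 0.91, 0.97, 0.995, 0.998, 0.999]
print(should_stop(history))  # True -> stop training and keep the current checkpoint
```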
Application Scenarios
- Cognitive Science Research
The design of HRM simulates the hierarchical information processing of the human brain, providing a computable model for studying human planning, reasoning, and problem-solving, and helping explore paths toward general artificial intelligence.
- Complex Planning and Scheduling
In areas such as logistics, robot path planning, and automated production scheduling, HRM can quickly find optimal or near-optimal solutions without relying on large amounts of data, for example when solving large maze-pathfinding problems.
- Game AI
It can be used to build AIs for complex puzzle games (e.g., Sudoku); its efficient reasoning lets it learn and master the rules of a game without pre-training.
- Local AI Applications
Because the model has very few parameters, HRM can potentially be deployed on resource-constrained local devices, for example to perform specific logical-reasoning tasks in home automation.
FAQ
- How does HRM differ from traditional large language models (LLMs)?
Instead of relying on natural language for reasoning, HRM performs symbolic manipulation and reasoning through two recurrent modules operating at different timescales. It does not require large-scale pre-training the way an LLM does, and it avoids the brittleness, high data requirements, and high latency inherent in chain-of-thought (CoT) approaches.
- What hardware is needed to train HRM?
Training HRM requires an NVIDIA GPU with CUDA support. It is possible to train on a single consumer GPU (such as an RTX 3060 or 4070), but this can take a long time. A multi-GPU environment (e.g., an 8-GPU server) is officially recommended for efficient training; for example, a Sudoku model can be trained in about 10 minutes on 8 GPUs.
- Does HRM need a large amount of training data?
No. One of HRM's major advantages is that it requires very little training data: on complex tasks such as Sudoku, mazes, and ARC, it reaches a very high level of performance with only 1,000 training samples.
- How can the performance reported in the paper be reproduced?
Due to the stochastic nature of small-sample learning, the results of a single training run may fluctuate slightly. The authors note that, for the Sudoku task, the standard deviation of accuracy is about 2%. For optimal performance, it may be necessary to tune the training duration and stop early before the model begins to overfit.