Background
Although HRM requires only 1000 training samples, it is prone to overfitting in the later stages of training on hard tasks such as difficult Sudoku, with test-set performance fluctuating by roughly ±2%.
Prevention measures
- Data level:
  - Apply data augmentation via the -num-aug 1000 parameter
  - Mix samples of different difficulty levels (e.g., 80% hard + 20% medium)
- Training techniques:
  - Set eval_interval=2000 for frequent validation
  - Stop training when validation accuracy drops for 3 consecutive evaluations
  - Strengthen regularization with weight_decay=1.0
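Two of the steps above can be sketched in a few lines of Python: mixing difficulty levels at a fixed ratio, and stopping once validation accuracy drops for three consecutive evaluations. `mix_difficulties` and `EarlyStopper` are illustrative helpers, not part of the HRM codebase.

```python
import random

def mix_difficulties(hard_pool, medium_pool, n_samples, hard_frac=0.8, seed=0):
    """Draw a training set that is hard_frac hard samples, the rest medium."""
    rng = random.Random(seed)
    n_hard = int(n_samples * hard_frac)
    picked = rng.sample(hard_pool, n_hard) + rng.sample(medium_pool, n_samples - n_hard)
    rng.shuffle(picked)
    return picked

class EarlyStopper:
    """Signal a stop after `patience` consecutive drops in validation accuracy."""
    def __init__(self, patience=3):
        self.patience = patience
        self.prev_acc = None
        self.drops = 0

    def update(self, val_acc):
        """Feed one validation accuracy; return True when training should stop."""
        if self.prev_acc is not None and val_acc < self.prev_acc:
            self.drops += 1
        else:
            self.drops = 0  # any non-drop resets the counter
        self.prev_acc = val_acc
        return self.drops >= self.patience

# Example: 1000 samples at 80% hard / 20% medium (toy pools)
hard = [("hard", i) for i in range(5000)]
medium = [("medium", i) for i in range(5000)]
train_set = mix_difficulties(hard, medium, 1000)

# Example: three consecutive drops (0.72 -> 0.71 -> 0.70 -> 0.69) trigger a stop
stopper = EarlyStopper(patience=3)
history = [0.60, 0.72, 0.71, 0.70, 0.69]
stop_step = next(i for i, acc in enumerate(history) if stopper.update(acc))
```

In a real run, `stopper.update` would be called every `eval_interval` steps with the freshly computed validation accuracy.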
Remedial measures
- Load the early-stopping checkpoint and fine-tune from it
- Freeze the high-level module (puzzle_emb_lr=0) and train only the low-level module
- Add Dropout layers (probability 0.1-0.3)
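Setting `puzzle_emb_lr=0` amounts to giving the frozen parameters a zero learning rate while the low-level modules keep the base rate. A framework-agnostic sketch of that idea (the function and parameter names here are hypothetical; in practice the resulting groups would be passed to an optimizer such as `torch.optim.AdamW`):

```python
def build_param_groups(named_params, frozen_prefixes, base_lr=1e-4):
    """Assign lr=0.0 to parameters whose name starts with a frozen prefix.

    Mirrors the effect of puzzle_emb_lr=0: the high-level / puzzle-embedding
    parameters stop updating while everything else trains at base_lr.
    """
    groups = []
    for name, param in named_params:
        lr = 0.0 if any(name.startswith(p) for p in frozen_prefixes) else base_lr
        groups.append({"name": name, "params": [param], "lr": lr})
    return groups

# Toy parameter list standing in for model.named_parameters()
params = [("puzzle_emb.weight", object()), ("low_level.layer0.weight", object())]
groups = build_param_groups(params, frozen_prefixes=("puzzle_emb",))
```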
Monitoring Recommendations
Track the following metrics through W&B:
- train_loss vs. val_loss gap
- exact_accuracy curve over training
- Histogram of weight distribution
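The first of these metrics, the train/validation loss gap, is the most direct overfitting signal. A minimal sketch of computing it per evaluation (the function name and the 0.1 threshold are illustrative; in a real run each tuple would also be sent to W&B via `wandb.log({"train_loss": t, "val_loss": v, "gap": v - t})`):

```python
def overfit_signals(train_losses, val_losses, gap_threshold=0.1):
    """Return (step, gap, flagged) per eval; flag gaps above the threshold."""
    flags = []
    for step, (t, v) in enumerate(zip(train_losses, val_losses)):
        gap = v - t
        flags.append((step, round(gap, 4), gap > gap_threshold))
    return flags

# A widening gap in the last eval is an early warning of overfitting
signals = overfit_signals([0.50, 0.30, 0.10], [0.55, 0.38, 0.35])
```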
This answer is based on the article "HRM: Hierarchical Reasoning Model for Complex Reasoning".































