How can insufficient training-data quality in reinforcement learning be addressed?

2025-09-05


Data Quality Improvement Plan

Open-Reasoner-Zero offers a complete solution to data problems:

57k high-quality dataset: The preprocessed dataset bundled with the project has passed multi-stage screening and contains:
- 20k GPQA Diamond-standard samples
- 15k logical reasoning samples
- 22k multi-step decision samples
Customizable data processing pipeline: The scripts are provided in the src/data_processing directory:
1. clean_raw_data.py - Raw data cleaning
2. generate_synthetic.py - Synthetic data generation
3. quality_filter.py - Quality filtering (perplexity (PPL) threshold set to 2.5 by default)
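
To make the PPL threshold concrete, here is a minimal sketch of perplexity-based filtering. It is not the project's quality_filter.py. The `token_logprobs` field, the function names, and the choice to keep samples at or below the threshold are all assumptions made for illustration. Perplexity is computed as exp of the negative mean token log-probability.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean(logprob))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def filter_by_ppl(records, threshold=2.5):
    """Keep records whose perplexity does not exceed the threshold.

    Assumption: lower perplexity means higher quality, so samples above
    the threshold are dropped. The real script may apply a different rule.
    """
    return [r for r in records if perplexity(r["token_logprobs"]) <= threshold]

# Hypothetical records; in practice the log-probs would come from a scoring model.
records = [
    {"id": "a", "token_logprobs": [-0.1, -0.2, -0.3]},  # PPL = exp(0.2), about 1.22
    {"id": "b", "token_logprobs": [-1.5, -2.0, -1.0]},  # PPL = exp(1.5), about 4.48
]
kept = filter_by_ppl(records)
print([r["id"] for r in kept])  # only "a" is below the 2.5 threshold
```

Under these assumptions, raising the threshold admits noisier samples and lowering it keeps only text the scoring model finds highly predictable.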

To add domain-specific data:

1. Create a custom_data/ directory to store the new data.
2. Modify the data_mix_ratio parameter in config.yaml to control the data mixing ratio.
3. Validate data quality interactively with Jupyter Notebook (recommended).
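
How a mixing ratio might translate into a sampled training set can be sketched as follows. The exact format of data_mix_ratio is not documented here, so this example assumes it maps each data source to a relative weight. The `mix_datasets` function and the source names `base` and `custom` are hypothetical.

```python
import random

def mix_datasets(sources, ratios, total, seed=0):
    """Draw `total` samples across sources in proportion to `ratios`.

    sources: dict mapping source name -> list of samples
    ratios:  dict mapping source name -> relative weight (assumed format)
    """
    rng = random.Random(seed)
    weight_sum = sum(ratios.values())
    mixed = []
    for name, weight in ratios.items():
        n = round(total * weight / weight_sum)
        pool = sources[name]
        if n > len(pool):
            raise ValueError(f"source {name!r} has only {len(pool)} samples, need {n}")
        mixed.extend(rng.sample(pool, n))  # sample without replacement
    rng.shuffle(mixed)  # interleave sources so batches are not grouped by origin
    return mixed

# Hypothetical data: 100 bundled samples and 50 custom domain samples.
sources = {
    "base": [f"base-{i}" for i in range(100)],
    "custom": [f"custom-{i}" for i in range(50)],
}
mixed = mix_datasets(sources, {"base": 0.8, "custom": 0.2}, total=50)
print(len(mixed), sum(s.startswith("custom") for s in mixed))
```

A quick count like the final line is also the kind of check that fits well in a notebook cell when validating the mix interactively.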