A systematic response to the overfitting problem
Address overfitting comprehensively across three dimensions: data, model, and training.
Data-level solutions:
- Ensure the training data volume exceeds roughly 1/10 of the model's parameter count (e.g., a 7B-parameter model calls for at least ~700M tokens of good-quality data)
- Remove duplicate samples with the platform's built-in data cleaning tool
- Add 5-10% noisy data to improve generalization (a sketch of both steps follows this list)
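The platform's cleaning tool is a black box, so here is a minimal plain-Python sketch of the two data-level steps: exact-match deduplication and simple word-dropout noise injection. The function names and the word-dropout scheme are illustrative assumptions, not the platform's actual method.

```python
import random

def deduplicate(samples: list[str]) -> list[str]:
    """Remove exact-duplicate samples (case/whitespace-insensitive), preserving order."""
    seen: set[str] = set()
    unique = []
    for s in samples:
        key = s.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

def add_noise(samples: list[str], ratio: float = 0.07, seed: int = 0) -> list[str]:
    """Append a 5-10% slice of lightly perturbed copies (here: random word dropout)."""
    rng = random.Random(seed)
    noisy = []
    for s in rng.sample(samples, k=max(1, int(len(samples) * ratio))):
        words = s.split()
        kept = [w for w in words if rng.random() > 0.1]  # drop ~10% of words
        noisy.append(" ".join(kept) if kept else s)
    return samples + noisy
```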
Model-level solutions:
- Enable Dropout under "Fine-Tuning Parameters" (0.1-0.3 recommended)
- Use a smaller learning rate (e.g., 1e-5) for the pre-trained layers and a higher one (e.g., 5e-4) for newly added layers
- Apply layer-wise learning rate decay (LLRD) so deeper pre-trained layers update more conservatively (see the sketch after this list)
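As a sketch of discriminative learning rates combined with LLRD, here is one way to build PyTorch optimizer parameter groups. The attribute names `model.encoder.layer` and `model.classifier` are assumptions (BERT-style) and would need adjusting for other architectures.

```python
import torch

def build_param_groups(model: torch.nn.Module,
                       base_lr: float = 1e-5,
                       head_lr: float = 5e-4,
                       decay: float = 0.9) -> list[dict]:
    """Pre-trained encoder layers get a small, layer-wise decayed LR;
    the newly added classification head gets a larger LR.
    Assumes a BERT-style `model.encoder.layer` ModuleList and a
    `model.classifier` head -- adjust names for your architecture."""
    groups = []
    layers = list(model.encoder.layer)
    for i, layer in enumerate(layers):
        # Earlier (deeper-in-pretraining) layers get progressively smaller LRs.
        lr = base_lr * (decay ** (len(layers) - 1 - i))
        groups.append({"params": layer.parameters(), "lr": lr})
    groups.append({"params": model.classifier.parameters(), "lr": head_lr})
    return groups

# Usage: optimizer = torch.optim.AdamW(build_param_groups(model), weight_decay=0.01)
```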
Training-level solutions:
- Set up a validation set in the Evaluation Tool (recommended train:validation split of 8:2)
- Enable L2 regularization (weight-decay coefficient of 0.01)
- Stop training automatically when the validation loss fails to improve for 3 consecutive evaluations (early stopping; see the sketch after this list)
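The following is a minimal PyTorch loop combining the settings above (weight_decay=0.01, early stopping with a patience of 3 evaluations). It is an illustrative sketch, not the platform's internal trainer.

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader,
                              loss_fn, epochs: int = 20, patience: int = 3):
    """AdamW with weight_decay=0.01 (L2 regularization); stop after
    `patience` evaluations without validation-loss improvement."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
    best_loss, best_state, bad_rounds = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        if val_loss < best_loss:
            best_loss, bad_rounds = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_rounds += 1
            if bad_rounds >= patience:  # 3 stagnant evaluations -> stop
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```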
Additional suggestion: after fine-tuning completes, check robustness with the adversarial testing function in "Model Evaluation"; an F1 fluctuation of less than 5% indicates that overfitting is well controlled.
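The adversarial testing function itself is platform-internal; as a rough stand-in, this sketch computes the F1 fluctuation across clean and perturbed test sets. `model_predict` and the test-set format are hypothetical.

```python
from sklearn.metrics import f1_score

def f1_fluctuation_ok(model_predict, clean_sets, perturbed_sets,
                      threshold: float = 0.05):
    """Compare macro-F1 across clean and perturbed (inputs, labels) test sets;
    relative fluctuation under `threshold` suggests overfitting is controlled."""
    scores = [f1_score(y, model_predict(x), average="macro")
              for (x, y) in clean_sets + perturbed_sets]
    fluctuation = (max(scores) - min(scores)) / max(scores)  # relative spread
    return fluctuation < threshold, fluctuation
```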
This answer comes from the article "Volcano Ark: Large Model Training and Cloud Computing Service, Sign Up for $150 Worth of Compute".