Typical Risk Analysis
Fine-tuning open source VLMs often runs into problems such as exploding/vanishing gradients, overfitting, and catastrophic forgetting. Maestro builds a safety net through the following mechanisms:
Preventive Measures
- Gradient clipping: gradient magnitudes are automatically monitored and limited, with the threshold set to the recommended value of 1.0
- Dynamic learning rate: cosine annealing with warm restarts (CAWR) is adopted
- Regularization package: the combination of label_smoothing=0.1 and dropout=0.2 is enabled by default
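The three preventive mechanisms can be illustrated in isolation. The snippet below is a minimal pure-Python sketch, not Maestro's actual implementation: global-norm gradient clipping at 1.0, the cosine-annealing-with-warm-restarts schedule, and label smoothing with eps=0.1. The base_lr, min_lr, and cycle-length values are illustrative assumptions.

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Clip gradients so their global L2 norm does not exceed max_norm
    (the recommended threshold of 1.0 by default)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads

def cawr_lr(step, base_lr=1e-4, min_lr=0.0, t0=100, t_mult=2):
    """Cosine annealing with warm restarts: the LR follows a cosine decay
    within each cycle, then jumps back to base_lr at a restart; each cycle
    is t_mult times longer than the previous one."""
    cycle_len, cycle_start = t0, 0
    while step >= cycle_start + cycle_len:   # locate the current cycle
        cycle_start += cycle_len
        cycle_len *= t_mult
    progress = (step - cycle_start) / cycle_len
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

def smooth_labels(label, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    return [(1 - eps) * (1.0 if i == label else 0.0) + eps / num_classes
            for i in range(num_classes)]
```

Dropout (p=0.2) is omitted here because it is a layer setting rather than a training-loop computation.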
Remediation Measures
- When a loss anomaly is detected, Maestro automatically:
  - Pauses training
  - Rolls back to the most recent healthy checkpoint
  - Resumes with the learning rate reduced by 50%
- The --debug-mode parameter outputs diagnostic information such as gradient histograms
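The recovery steps above can be modeled as a small guard object; the sketch below is a hypothetical illustration (not Maestro's code) that checkpoints parameters after every healthy step, treats NaN/Inf or a sudden loss spike as an anomaly, then restores the checkpoint and halves the learning rate. The TrainingGuard name and the spike_factor threshold are assumptions for illustration.

```python
import copy
import math

class TrainingGuard:
    """Hypothetical sketch of the recovery mechanism described above:
    pause on a loss anomaly, roll back to the most recent healthy
    checkpoint, and resume with the learning rate cut by 50%."""

    def __init__(self, lr, spike_factor=3.0):
        self.lr = lr
        self.spike_factor = spike_factor  # illustrative anomaly threshold
        self.best_loss = math.inf
        self.checkpoint = None            # params after the last healthy step

    def is_anomalous(self, loss):
        # NaN/Inf counts as an anomaly, as does a loss far above the best seen
        return (not math.isfinite(loss)
                or loss > self.spike_factor * self.best_loss)

    def step(self, params, loss):
        """Returns (params, rolled_back). On an anomaly the caller receives
        the restored checkpoint and should resume training at self.lr."""
        if self.checkpoint is not None and self.is_anomalous(loss):
            self.lr *= 0.5                          # halve the learning rate
            return copy.deepcopy(self.checkpoint), True
        self.best_loss = min(self.best_loss, loss)
        self.checkpoint = copy.deepcopy(params)     # save healthy state
        return params, False
```

In a real trainer the checkpoint would be written to disk rather than kept in memory, but the control flow is the same.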
Best Practices
Recommendations for beginners:
1. Prefer the ready-made recipes (maestro recipes list)
2. Start with a trial run on small-scale data (add the --fast-dev-run parameter)
3. Make use of the Cookbook
This answer is based on the article "Maestro: A tool to simplify the process of fine-tuning mainstream open source visual language models".