Model performance can be optimized with the following advanced configurations:
- Prompt Optimization Loop: increasing the `-r` parameter (default: 1 round) can improve the quality of the synthesized data. For example, `-r 3` runs three rounds of optimization, at the cost of a longer generation run (see the combined example after this list).
- Edge Case Generation: the `--generate-edge-cases` option, enabled by default, generates 50 challenging samples per class (e.g., comments containing spelling errors) to strengthen model robustness.
- Data Volume Adjustment: `--target-volume-per-class` raises the number of samples per class (e.g., set it to 100), though this must be balanced against training efficiency.
- Model Selection: a different base LLM (e.g., Grok-3-beta) can be specified in the configuration file, which affects the diversity of the generated data.
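As a minimal sketch of how these options fit together, the command below combines the three flags described above. The `whitelightning` command name is an assumption used for illustration (substitute the tool's actual entry point), and note that model selection happens in the configuration file rather than on the command line.

```bash
# Hypothetical invocation: the command name "whitelightning" is assumed,
# but the three flags are the ones documented in the list above.
whitelightning \
  -r 3 \
  --generate-edge-cases \
  --target-volume-per-class 100

# -r 3: three prompt-optimization rounds instead of the default one.
# --generate-edge-cases: on by default; shown here for explicitness.
# --target-volume-per-class 100: 100 synthetic samples per class.
```

Raising `-r` and the per-class volume at the same time multiplies generation time, so it is usually worth increasing one at a time and checking the reported accuracy and loss before going further.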
The tool also outputs detailed training logs (e.g., accuracy and loss values) to help developers pinpoint issues and adjust parameters. If the results remain unsatisfactory, specific cases can be discussed with the GitHub community.
This answer comes from the article "WhiteLightning: an open-source tool for generating lightweight offline text classification models in one click".