Parameter tuning strategy
Core parameter: --cfg
This parameter controls text-image alignment: the larger the value, the more strictly the model follows the prompt. The officially recommended initial value is 4.0, which can be raised gradually toward 7.0 to compare the results.
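As a rough illustration of raising the value gradually, the sketch below lists candidate settings between the recommended start of 4.0 and the upper value of 7.0. The step size of 0.5 and the helper name cfg_sweep are assumptions made for this example, not part of the project; each printed value would be passed to the --cfg option of whichever generation script you run.

```python
def cfg_sweep(start=4.0, stop=7.0, step=0.5):
    """Return guidance values from start to stop, inclusive.

    The 0.5 step is an illustrative assumption; choose any increment
    that gives enough samples to compare.
    """
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


if __name__ == "__main__":
    # Generate one image per value with the same prompt and seed,
    # then compare how closely each result follows the prompt.
    for cfg in cfg_sweep():
        print(f"--cfg {cfg}")
```

Keeping the prompt and random seed fixed across the sweep makes it easier to attribute differences between images to the guidance value alone.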
Prompt engineering tips
- Use English descriptions: Chinese is supported, but the training data is in English.
- Add detail modifiers: for example, quality descriptors such as "4K Ultra HD" or "Professional Photography".
- Structured expression: organize prompts in the format "Subject + Setting + Style".
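The tips above can be sketched as a small helper that assembles a prompt in "Subject + Setting + Style" order and appends optional quality modifiers. The function name build_prompt and the comma separator are assumptions for illustration; the model does not require any particular delimiter.

```python
def build_prompt(subject, setting, style, modifiers=()):
    """Join prompt parts in Subject + Setting + Style order.

    Empty or whitespace-only parts are skipped. The comma separator
    is an illustrative choice, not a requirement of the model.
    """
    parts = [subject, setting, style, *modifiers]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    prompt = build_prompt(
        subject="a red fox",
        setting="in a snowy forest at dawn",
        style="watercolor illustration",
        modifiers=["4K Ultra HD", "Professional Photography"],
    )
    print(prompt)
```

Building prompts from named parts makes it easy to vary one element, such as the style, while holding the others constant.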
Follow-up optimization options
- Multi-round editing: correct the result step by step using the editing script in generate_examples
- Theme fine-tuning: follow the TRAIN.md guide to load domain-specific data for training
- Hybrid control: combine with MoVQGAN's latent-space control for precise feature tuning
This answer comes from the article "Lumina-mGPT-2.0: an autoregressive image generation model for handling multiple image generation tasks".