Scenario-based quality tuning
Different application scenarios call for different optimization strategies:
- Single-image description scenario:
  - Increase the proportion of image-description samples in sft_vlm_data.jsonl
  - Adjust the temperature parameter to control generation diversity
  - Include "Please describe this image in detail" in the prompt
- Q&A scenario:
  - Collect domain-specific QA data and add it to the fine-tuning set
  - Increase the max_seq_len parameter in LMConfig.py to extend the context
  - Use few-shot prompting examples
- Multi-image reasoning scenario:
  - Increase the volume of data in sft_vlm_data_multi.jsonl
  - Adjust the position embeddings for visual tokens
  - Clearly indicate the image order in the input
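The last two scenarios above both come down to prompt construction: showing the model a few worked examples (few-shot prompting) and numbering the images so their order is unambiguous. A minimal sketch of such a prompt builder follows; the `<image>` placeholder token, the `Image N:` numbering style, and the `build_prompt` helper are all illustrative assumptions, not MiniMind-V's actual input format.

```python
# Hedged sketch: assembling a multi-image prompt with explicit ordering
# markers and optional few-shot examples. The "<image>" placeholder and
# numbering convention are assumptions for illustration only.

def build_prompt(question, num_images, few_shot=()):
    """Build a prompt with numbered image slots and few-shot Q/A pairs."""
    parts = []
    # Few-shot examples: (question, answer) pairs shown before the real query.
    for q, a in few_shot:
        parts.append(f"Q: {q}\nA: {a}")
    # Number each image placeholder so the model can refer to "Image 1", etc.
    image_block = "\n".join(f"Image {i + 1}: <image>" for i in range(num_images))
    parts.append(f"{image_block}\nQ: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Which image shows a cat?",
    num_images=2,
    few_shot=[("Which image shows a dog?", "Image 1")],
)
print(prompt)
```

The numbered `Image N:` lines let answers refer back to specific inputs ("Image 1"), which is exactly the clear ordering indication the multi-image scenario asks for.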
General optimization suggestions: 1) increase the number of training epochs on the same data; 2) try a medium-sized configuration with dim=768; 3) use beam search to improve generation quality. The project's web_demo_vlm.py includes a built-in evaluation tool for testing the effect of these optimizations in real time.
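To see why the temperature parameter mentioned above controls diversity, it helps to look at how it reshapes the next-token distribution. A minimal, self-contained sketch with made-up logit values (not taken from the model):

```python
import math

# Hedged sketch: temperature-scaled softmax. The logit values are
# illustrative only; they do not come from MiniMind-V.

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more deterministic
flat = softmax_with_temperature(logits, temperature=1.5)   # more diverse
# Lower temperature concentrates probability mass on the top token.
print(sharp[0] > flat[0])
```

Lowering the temperature below 1.0 sharpens the distribution (good for factual description), while raising it flattens the distribution and increases output diversity.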
This answer comes from the article "MiniMind-V: Train a 26M-parameter visual language model in 1 hour".
