Solution: Using MM-EUREKA's Data-Efficient Training Features
While traditional multimodal models typically require millions of data samples to achieve good results, MM-EUREKA works around this limitation with the following approaches:
- Rule-based reinforcement learning: The system transfers the rule-based approach used for textual reasoning to the visual domain, reducing its dependence on large amounts of raw data. In practice, setting use_rules=True in the configuration file is enough to activate this feature.
- Small-sample optimization: The 8B/38B models provided by the project are designed to be trained on 8K-54K samples:
  - Download the official MM-Eureka-Dataset
  - Set the few_shot: 8000 parameter in config.yaml
  - Add the --few_shot flag when running train.py
- Data augmentation:
  - Apply transformations such as rotation and cropping to the images referenced in the JSONL data (requires changes to the preprocessing code)
  - Generate diverse problem descriptions through text rewriting
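The two augmentation ideas above can be sketched in plain Python. This is only an illustration under stated assumptions, not MM-EUREKA's actual preprocessing code: the record fields question, width, height and augment, the rewrite templates, and all function names are hypothetical. The sketch rewrites the question text with simple templates (a stand-in for a real paraphrasing step) and records a rotation angle and crop box for the preprocessing code to apply. The crop box uses the (left, upper, right, lower) ordering that Pillow's Image.crop accepts.

```python
import json
import random

# Hypothetical rewrite templates; a real pipeline might use an LLM paraphraser.
TEMPLATES = [
    "{q}",
    "Question: {q}",
    "Look at the image and answer the following question. {q}",
]


def random_crop_box(width, height, scale, rng):
    """Return a [left, upper, right, lower] box spanning `scale` (0 < scale <= 1) of each side."""
    crop_w = max(1, int(width * scale))
    crop_h = max(1, int(height * scale))
    left = rng.randint(0, width - crop_w)
    upper = rng.randint(0, height - crop_h)
    return [left, upper, left + crop_w, upper + crop_h]


def augment_record(record, rng, max_angle=15.0, crop_scale=0.9):
    """Return a copy of `record` with a rewritten question and transform parameters."""
    out = dict(record)  # shallow copy, so the input record is left unchanged
    out["question"] = rng.choice(TEMPLATES).format(q=record["question"])
    out["augment"] = {
        # Parameters only; the preprocessing code is assumed to apply them to the image.
        "rotate_deg": round(rng.uniform(-max_angle, max_angle), 2),
        "crop_box": random_crop_box(record["width"], record["height"], crop_scale, rng),
    }
    return out


def augment_jsonl_lines(lines, copies=1, seed=0):
    """Yield each original record, then `copies` augmented variants, as JSON strings."""
    rng = random.Random(seed)  # fixed seed keeps the augmented set reproducible
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        yield json.dumps(record)
        for _ in range(copies):
            yield json.dumps(augment_record(record, rng))
```

Storing the transform parameters in the record, rather than writing new image files, keeps the dataset small and leaves the pixel operations to the preprocessing step, which the text notes has to be modified anyway. Small rotation angles are used by default because large rotations can change the answer for some visual problems.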
Implementation recommendation: For a first attempt, combine the rule engine with 8K data samples, then expand the dataset once results have stabilized.
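One way to prepare the initial 8K subset is to draw a reproducible random sample from the downloaded JSONL file. The sketch below is a minimal illustration using only the Python standard library; the function name and file paths are hypothetical, and it assumes the dataset is stored as one JSON record per line.

```python
import random


def sample_jsonl(src_path, dst_path, k=8000, seed=0):
    """Write a reproducible random subset of at most k records; return the count written."""
    with open(src_path, encoding="utf-8") as f:
        # Drop blank lines and normalize line endings so records never merge.
        lines = [line.rstrip("\n") + "\n" for line in f if line.strip()]
    rng = random.Random(seed)  # same seed gives the same subset on every run
    subset = lines if len(lines) <= k else rng.sample(lines, k)
    with open(dst_path, "w", encoding="utf-8") as f:
        f.writelines(subset)
    return len(subset)
```

For example, sample_jsonl("full.jsonl", "subset_8k.jsonl") writes the starting subset. Once results stabilize, rerunning with a larger k produces the expanded training set.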
This answer comes from the article "MM-EUREKA: A Multimodal Reinforcement Learning Tool for Exploring Visual Reasoning".