MM-EUREKA sets a new benchmark for training-data efficiency. According to the reported experiments, its 8B-parameter version needs only 8K image-text pairs of training data, and the 38B version, trained on 54K samples, can outperform traditional multimodal models that require millions of training samples.
This efficiency stems from three factors: first, a rule-based reinforcement learning approach that dramatically improves data utilization; second, an innovative model architecture; and third, an optimized training process. The MM-Eureka-Dataset, released publicly by the project team on GitHub, contains rigorously screened, high-quality training samples; each data pair was labeled by experts and verified over multiple rounds.
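To make the first factor concrete, here is a minimal sketch of what a rule-based reward can look like: the model's answer is scored by fixed rules (does the output follow the expected format, and does the final answer match the reference?) rather than by a learned reward model. The response template (`<think>...</think>` followed by `\boxed{...}`), the function names, and the `format_weight` value are illustrative assumptions, not MM-EUREKA's actual implementation.

```python
import re

# Assumed (hypothetical) response template: reasoning inside <think>...</think>,
# followed by a final answer wrapped in \boxed{...}.
FORMAT_RE = re.compile(r"^<think>.+?</think>.*\\boxed\{.+?\}\s*$", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_RE.match(response.strip()) else 0.0


def accuracy_reward(response: str, reference: str) -> float:
    """Return 1.0 if the last boxed answer exactly matches the reference, else 0.0."""
    matches = BOXED_RE.findall(response)
    if not matches:
        return 0.0
    predicted = matches[-1].strip()
    return 1.0 if predicted == reference.strip() else 0.0


def rule_based_reward(response: str, reference: str, format_weight: float = 0.5) -> float:
    """Combine both rules into one scalar reward (the weight is an assumption)."""
    return accuracy_reward(response, reference) + format_weight * format_reward(response)


if __name__ == "__main__":
    good = "<think>3 + 4 = 7</think> The answer is \\boxed{7}"
    print(rule_based_reward(good, "7"))
```

Because the reward is computed from verifiable rules, every labeled sample yields an unambiguous training signal, which is one plausible reason a small, carefully verified dataset can go a long way.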
This data efficiency makes MM-EUREKA particularly suitable for compute-limited research organizations and small development teams, which can reproduce state-of-the-art model performance with limited resources.
This answer comes from the article "MM-EUREKA: A Multimodal Reinforcement Learning Tool for Exploring Visual Reasoning".































