MiroFlow performed strongly on the GAIA validation set:
- Claude Sonnet 3.7 served as the main large language model
- It averaged a pass@1 score of 72.2% across three runs
- This result places it at the forefront of open-source agent frameworks
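As an illustration of how a pass@1 score averaged over several runs is typically computed, here is a minimal sketch. It is not MiroFlow's evaluation script: the function names and the toy data are hypothetical, and it assumes each run yields one correct/incorrect judgment per task.

```python
from statistics import mean


def pass_at_1(results):
    """Fraction of tasks whose single attempt was judged correct."""
    return sum(results) / len(results)


def average_pass_at_1(runs):
    """Mean of the per-run pass@1 scores over independent runs."""
    return mean(pass_at_1(run) for run in runs)


# Hypothetical toy data: three runs over four tasks (not real GAIA results).
runs = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]

score = average_pass_at_1(runs)
print(f"average pass@1 over {len(runs)} runs: {score:.1%}")
```

Averaging over independent runs smooths out run-to-run variance from model sampling, which is why a three-run average is a more stable figure than any single run.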
Notably, MiroFlow places special emphasis on reproducibility: it provides fully open evaluation scripts and configuration files, and publishes traces from multiple independent GAIA runs on HuggingFace so that the results are transparent and can be verified.
This answer comes from the article "MiroFlow: a framework for building, managing and scaling AI agents".