The core technical architecture of ScoreFlow consists of three stages. In the preprocessing stage, adaptive binarization handles music score images captured under different lighting conditions. In the symbol recognition stage, an improved YOLOv5 model locates the music symbols, and a CRNN network analyzes their temporal relationships. In the encoding output stage, standard-format files are generated according to music grammar rules. Throughout the pipeline, knowledge distillation transfers the recognition capability of large-scale models to mobile devices.
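To illustrate why adaptive binarization helps with uneven lighting, here is a minimal sketch of one common variant, a local-mean threshold computed with an integral image. It is not ScoreFlow's actual algorithm, which the article does not specify. The function names `adaptive_binarize` and `synthetic_staff_line`, along with the `window` and `offset` parameters and their default values, are assumptions made for this example.

```python
def adaptive_binarize(image, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel is marked as ink (1) when it is darker than the mean of the
    surrounding window minus `offset`; otherwise it is background (0).
    Hypothetical helper for illustration only.
    """
    height, width = len(image), len(image[0])

    # Integral image with a zero border, so any window sum costs O(1).
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    result = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            local_mean = total / ((y1 - y0) * (x1 - x0))
            out_row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(out_row)
    return result


def synthetic_staff_line(width=40, height=21, line_row=10):
    """Build a test page: one dark horizontal line on a background whose
    brightness falls from left to right, imitating uneven lighting."""
    page = []
    for y in range(height):
        row = []
        for x in range(width):
            background = 220 - 2 * x
            row.append(background - 80 if y == line_row else background)
        page.append(row)
    return page


page = synthetic_staff_line()
binary = adaptive_binarize(page)
# A single global threshold, for comparison.
global_binary = [[1 if v < 128 else 0 for v in row] for row in page]
print("adaptive ink pixels on the staff row:", sum(binary[10]))
print("global ink pixels on the staff row:  ", sum(global_binary[10]))
```

On this synthetic page the fixed threshold of 128 misses the brightly lit left end of the line, while the local threshold follows the lighting gradient and recovers the whole line without marking any background pixels.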
The technical innovations fall into three areas: first, a composite symbol segmentation algorithm accurately separates overlapping notes; second, the timing analysis module corrects distortion and deformation in the scanned image; and third, a context-aware encoder automatically fills in implicit information such as performance notation. Test data show that the system achieves an overall accuracy of 96.7% on the ISMIR standard test set, 10 percentage points higher than similar products.
The team continues to optimize model performance and updates the recognition engine once a month. As for its open source strategy, the core module code of PianoSync has been released on GitHub, attracting more than 200 developers worldwide to contribute.
This answer comes from the article "ScoreFlow: Music Learning Tool for Converting Scores to MIDI and MusicXML".