Optical Music Recognition (OMR) is the process of extracting, from the image of a musical score, a semantic representation of its musical content. It is analogous to Optical Character Recognition (OCR), which reads images of text. While both text and music notation consist of symbols from a predefined set, music notation has a complex, inherently two-dimensional visual syntax, and reconstructing the musical content (mainly pitches and durations) depends on non-contiguous context: the pitch of a notehead, for instance, depends on the clef in effect. OMR is a bottleneck for modern musicology and for the dissemination of both musical archives and contemporary composers’ works.
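The clef dependence can be made concrete with a small sketch (the function and names are illustrative, not part of the project): the same notehead position on the staff yields different pitches under different clefs, so pitch cannot be read locally.

```python
# Diatonic letter names cycle every 7 staff steps.
LETTERS = "CDEFGAB"

# Reference pitch for the bottom staff line of each clef
# (standard notation: treble -> E4, bass -> G2).
CLEF_BOTTOM_LINE = {"treble": ("E", 4), "bass": ("G", 2)}

def pitch_from_staff_step(clef: str, step: int) -> str:
    """Scientific pitch of a notehead `step` diatonic steps above
    the bottom staff line (negative = below the staff)."""
    letter, octave = CLEF_BOTTOM_LINE[clef]
    index = LETTERS.index(letter) + step
    # Floor division handles positions below the reference octave.
    return f"{LETTERS[index % 7]}{octave + index // 7}"
```

For example, the bottom staff line is E4 under a treble clef but G2 under a bass clef, even though the notehead image is identical.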
Multimodal OMR (mOMR) leverages the audio representation of the music. Audio is most relevant for reconstructing pitch and duration, but it can inform the recognition process throughout, provided the system incorporates this information in a principled way.
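As a minimal sketch of what "principled incorporation" could mean (this is an assumption for illustration, not the project's actual method), one option is log-linear fusion: combine per-candidate posteriors from the image and audio models by a weighted geometric mean and renormalize.

```python
def fuse_posteriors(p_image, p_audio, audio_weight=0.5):
    """Fuse image- and audio-based posteriors over the same candidate
    set via a weighted geometric mean, then renormalize to sum to 1."""
    fused = {c: p_image[c] ** (1.0 - audio_weight) * p_audio[c] ** audio_weight
             for c in p_image}
    total = sum(fused.values())
    return {c: v / total for c, v in fused.items()}

# The image alone cannot decide between two pitch hypotheses,
# but the audio evidence strongly favors one of them.
p_image = {"F4": 0.5, "G4": 0.5}
p_audio = {"F4": 0.9, "G4": 0.1}
fused = fuse_posteriors(p_image, p_audio)  # F4 now dominates
```

With equal weights this reduces to a normalized product of square roots, so a confident audio model can resolve ambiguity that the image model cannot.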
The goal of this project is to use the audio modality to improve OMR. We will apply end-to-end multimodal deep learning methods, which have recently shown promise for multimodal live score following; the main challenge is adapting these models from the score-following objective to OMR. The project will build on recently established international contacts with CP JKU Linz.