Principal investigator (ÚFAL): 
Provider: 
Grant id: 1444217
Duration: 2017-2019

Optical Music Recognition (OMR) is the process of extracting the semantic representation of musical content from the image of a musical score. It is analogous to Optical Character Recognition (OCR), which reads images of text. However, while both text and music notation consist of symbols from a predefined set, music notation has a complex, inherently two-dimensional visual syntax, and reconstructing the musical content (mainly pitches and durations) depends on non-contiguous context (e.g., the pitch of a note depends on the preceding clef). OMR is a bottleneck for modern musicology and for the dissemination of both musical archives and contemporary composers’ works.
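
To make the clef dependence concrete, here is a minimal Python sketch (the staff-position convention and the clef table are illustrative assumptions, not project code) in which the same vertical staff position decodes to different pitches depending on the active clef:

# Diatonic staff positions are counted upward from the bottom staff line
# (0 = bottom line, odd numbers = spaces); each clef fixes the pitch of that line.
NOTE_NAMES = ["C", "D", "E", "F", "G", "A", "B"]
CLEF_BOTTOM_LINE = {
    "treble": (2, 4),  # bottom line is E4 under a treble clef
    "bass":   (4, 2),  # bottom line is G2 under a bass clef
}

def pitch_from_staff_position(position, clef):
    """Decode a vertical staff position into a pitch name, given the active clef."""
    base_step, base_octave = CLEF_BOTTOM_LINE[clef]
    step = base_step + position
    return f"{NOTE_NAMES[step % 7]}{base_octave + step // 7}"

# The same notehead two steps above the bottom line reads differently per clef:
print(pitch_from_staff_position(2, "treble"))  # G4
print(pitch_from_staff_position(2, "bass"))    # B2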

Multimodal OMR (mOMR) leverages the audio representation of the music. Audio is most relevant for reconstructing pitches and durations, but it can inform the recognition process throughout, provided the system incorporates this information in a principled way.

The goal of the project is to use the audio modality to improve OMR. We will apply end-to-end multimodal deep learning methods, which have recently shown promise in multimodal live score following; the challenge lies in adapting these models from the score-following objective to OMR. The project will build on recently established international contacts with CP JKU Linz.
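
One common way to build such end-to-end multimodal models is late fusion of a score-image encoder and an audio encoder. The PyTorch sketch below is only an illustration under assumptions of our own (the layer sizes, concatenation-based fusion, and symbol-classification head are invented for the example); it is not the project’s actual architecture.

import torch
import torch.nn as nn

class MultimodalOMRNet(nn.Module):
    def __init__(self, n_symbol_classes=64, embed_dim=128):
        super().__init__()
        # Score-image branch: a small CNN over a score-image patch.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Audio branch: a small CNN over a spectrogram of the aligned performance.
        self.audio_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Fusion: concatenate the two embeddings and classify the notated symbol.
        self.classifier = nn.Linear(2 * embed_dim, n_symbol_classes)

    def forward(self, score_image, spectrogram):
        fused = torch.cat(
            [self.image_encoder(score_image), self.audio_encoder(spectrogram)],
            dim=-1,
        )
        return self.classifier(fused)

# Toy usage: one score-image patch and one aligned spectrogram excerpt.
model = MultimodalOMRNet()
logits = model(torch.randn(1, 1, 96, 96), torch.randn(1, 1, 128, 64))
print(logits.shape)  # torch.Size([1, 64])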