Jan Hajič jr.

office: 424
email: hajicj@ufal.mff.cuni.cz
address: Malostranské náměstí 25
118 00 Praha 1
Czech Republic

Main Research Interests

I run the Prague Music Computing Group, which is where you'll find more frequent updates on my activities.

My research interests are in the field of Music Information Retrieval, perhaps better described as Computational Music Processing, and Computational Musicology. Having a musical and musicological background in addition to computer science, I emphasise interdisciplinarity. Besides my own research and research leadership, I love supervising students on these topics!

The topic I find most interesting is the computational analysis of Gregorian Chant, a key part of European cultural identity from the early Middle Ages until the 1970s. Most significantly, I showed that the development of Gregorian melodies can be traced via methods from bioinformatics, and worked on the question of what an appropriate music theory for chant and its eight modes would be. I also find the intersection between chant scholarship and ecology fascinating. In 2023-2024 I was the principal investigator of the Genome of Melody project, funded by the John Templeton Foundation through the Cultural Evolution Society (it was housed at the Masaryk Institute and Archives of the Czech Academy of Sciences). At UFAL, I pursue this axis of research through my role as Co-Investigator and head of the Chant Analytics team of the Digital Analysis of Chant Transmission project, funded by the Canadian Social Sciences and Humanities Research Council.

More conservatively, I work on Optical Music Recognition -- currently as the Principal Investigator for OmniOMR, a 5-year applied research project of the Czech Ministry of Culture, together with a team from the Moravian Library in Brno. I have previously done some foundational work for the field: the Understanding OMR paper, and the MUSCIMA++ dataset; our more recent work focused on practical recognition of pianoform notation, and we are extending this to manuscripts with domain adaptation techniques such as data synthesis.

Besides these main areas, however, I am quite happy to supervise theses on other music-related topics (and ideally turn them into publications, like this ISMIR 2025 paper on a model of difficulty for the saxophone).

I also play the harpsichord.

Older things:

My dissertation on OMR. Within my PhD, I published the MUSCIMA++ dataset.

Ribosomal RNA secondary structure prediction: PI of the GAUK project that created the rPredictor database, based on the ideas of Josef Pánek from the Institute of Microbiology of the Czech Academy of Sciences. (2014-2015) This experience with bioinformatics came in very handy: nine years later, it led to the Genome of Melody project!

Projects

Ongoing:

OmniOMR - optical music recognition using machine learning for digital libraries.
2023-2027, Principal Investigator
NAKI III programme grant no. DH23P08OVV008, supported by the Ministry of Culture of the Czech Republic

Digital Analysis of Chant Transmission
2023-2029. Co-Investigator, leader of the Chant Analytics team.
SSHRC Partnership Grant (Canada).

Finished:

Genome of Melody (Cultural Evolution Transformation Fund, supported by John Templeton Foundation grant #61593), 2023-2024 (PI).
Using phylogenetics to study the development of Gregorian Chant melodies.

Multimodal Optical Music Recognition (GAUK 1444217), 2017 - 2019 (PI).
A summarizing technical report on the project (which basically re-uses text from my dissertation, whenever relevant, and adds things that were not part of my thesis but were done on the project) is here: GAUK1444217_techreport.pdf

Convolutional Neural Networks for Optical Music Recognition (GAUK 170217), 2017 - 2018 (Co-investigator)

rRNA Secondary Structure Prediction (GAUK 550214), 2015 - 2016 (PI).

Teaching

NPFL144 Computational Music Processing
NPFL145 Practicum in Computational Music Processing
NPRG045 Ročníkový projekt
Thesis supervision on music-related topics.

Selected Bibliography

Google Scholar
ORCID: 0000-0002-9207-567X
Scopus ID: 57193414196
Researcher ID: P-4278-2017

Jan Hajič Jr, Vojtěch Lanz, Gustavo A. Ballen; Genome of melody: applying bioinformatics to study the evolution of Gregorian chant. Philos Trans R Soc Lond B Biol Sci 4 December 2025; 380 (1940): 20240274. https://doi.org/10.1098/rstb.2024.0274

Calvo-Zaragoza, J.; Hajič jr., J.; Pacha, A. Understanding Optical Music Recognition. ACM Computing Surveys 53:4, September 2020.

Hajič jr., J.; Ballén, G.; Mühlová, K.H.; Vlhová-Wörner, H. Towards building a phylogeny of Gregorian Chant melodies. Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 2023, 571-578.

Hajič jr, Jan, and Fabian C. Moss. Knowing when to stop: insights from ecology for building catalogues, collections, and corpora. In Proceedings of the 12th International Conference on Digital Libraries for Musicology, pp. 90-94. 2025.

Lanz, V.; Hajič jr., J. Text boundaries do not provide a better segmentation of Gregorian antiphons. Proceedings of the 10th International Conference on Digital Libraries for Musicology (DLfM '23). Association for Computing Machinery, New York, NY, USA, 72–76.

Jelínek, J.; Hoksza, D.; Hajič jr., J.; Pešek, J.; Drozen, J.; Hladík, T.; Klimpera, M.; Vohradský, J. & Pánek, J. rPredictorDB: a predictive database of individual secondary structures of RNAs and their formatted plots. Database: Vol. 2019, 2019, 1-9.

Pacha, Alexander, Jorge Calvo-Zaragoza, and Jan Hajič Jr. Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition. 20th International Society for Music Information Retrieval Conference, 2019, 75-82.

Hajič jr., J.; Dorfer, M.; Widmer, G. & Pecina, P. Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets. 19th International Society for Music Information Retrieval Conference, 2018, 225-232.

Pacha, A.; Hajič jr., J. & Calvo-Zaragoza, J. A Baseline for General Music Object Detection with Deep Learning. Applied Sciences, 2018, 8, 1488-1508. (IF: 1.61)

Dorfer, M.; Hajič jr., J.; Arzt, A.; Frostel, H. & Widmer, G. Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification. Transactions of the International Society for Music Information Retrieval, 2018, 1, 22-33.

Hajič jr., J.; Kolárová, M., Pacha, A., & Calvo-Zaragoza, J. How current optical music recognition systems are becoming useful for digital libraries. In Proceedings of the 5th International Conference on Digital Libraries for Musicology, 2018, 57-61.

Dorfer, M.; Hajič jr., J. & Widmer, G. Attention as a Perspective for Learning Tempo-invariant Audio Queries. The 2018 Joint Workshop on Machine Learning for Music, Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018.

Jan Hajič jr., Pavel Pecina: The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Osaka Prefecture University, Kyoto, Japan, November 2017. [pdf]

Hajič jr., J.; Novotný, J.; Pecina, P. & Pokorný, J.: Further Steps towards a Standard Testbed for Optical Music Recognition. Proceedings of the 17th International Society for Music Information Retrieval Conference, New York University, 2016, 157-163 [pdf]

(For a full overview, see Google Scholar.)

Students

I am open to topics especially concerning music technology, song lyrics and poetry (and potentially any machine learning and/or NLP topic).

Here is a nice guide about how to write a thesis, by Martin Koutecký from IUUK.

Aside from that, here is a different guide about how to read research papers (and possibly how and why to write them).

Current students:

Adam Štefunko (PhD): Generating harmonic accompaniments (in style!)

Reut Tal (Bc.): A game interface for collecting natural human judgments on emotional content of music.

Šimon Libřický (Bc.): A computational model of difficulty for saxophone music. Previously implemented a plugin factory for MuseScore.

Jan Borecký (Mgr.): Using sound and music for orienting visually impaired players in open-world games.

Filip Ruta (Mgr.): Teaching music notation for piano via a game with generated sheet music.

Defended theses:

(Feel free to ask those people for references!)

Anna Dvořáková (Bc.): Mapping tools and network analysis for transmission of Gregorian Chant.

Emre Rasimgil (Bc.): Low-resource "infinite radio" music generation.

Patrik Backo (Bc.): Generating audio of novel drum samples.

Vojtěch Lanz defended his Mgr. thesis that experimented with nonparametric Bayesian methods to find meaningful segmentations of Gregorian chant melodies. His work was published at the Digital Libraries for Musicology conference in 2023!

Kristina Szabová defended her Bc. thesis in 2021. It created software for the analysis of Gregorian chant, among other things borrowing Multiple Sequence Alignment from bioinformatics. Her software is live at chantlab.mua.cas.cz.

Jiří Balhar defended his Bc. thesis on melody extraction from recordings of polyphonic music. The thesis won the Dean's Award in the Bc. thesis category and achieved state-of-the-art results; the work was published at a satellite event of the ISMIR 2019 conference (Late-Breaking Demo).

Marek Židek defended his Mgr. thesis on generating music a bit more cleverly, using musical form derived in an unsupervised manner. He previously defended a Bc. thesis on generating music with LSTMs. Both theses include significant effort in evaluation. A composition generated by his models was performed at the Microsoft DOTS conference in 2017.

Jan Výkruta defended his Bc. thesis on automatic expressive reading of poetry.

Vladan Glončák defended his Mgr. thesis on exploiting syntax to identify what specifically people have an opinion about in evaluative text such as restaurant reviews.

In 2023, I completed a Master's in harpsichord performance and came fully back to academia with two successful project applications: one for applied research in Optical Music Recognition, and one for basic research on the cultural evolution of Gregorian Chant (specifically, its melodies).

In 2020-2022, I worked as a digital musicology postdoc for the project Old Myths, New Facts at the Masaryk Institute and Archives of the Czech Academy of Sciences. In the meantime, I was studying the harpsichord: I finished a Bachelor's in combined performance and musicology in 2021 and moved on to a Master's in performance.

In 2019, I finished my PhD studies at ÚFAL, writing my thesis under RNDr. Pavel Pecina on the topic of Optical Music Recognition. I am generally interested in music informatics: if you are a student with an interest in music, especially machine learning for musical applications, I will be happy to hear about it! For instance, we did some music generation (interview in Czech).

My Mgr. thesis, also under RNDr. Pecina, was on the topic of automatically selecting images for news articles. This work was also done for the CEMI project. My thesis is available here: Matching Images to Texts

I have previously worked on the SEANCE project on Sentiment Analysis, with my Bc. thesis and in the following years.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
