Jan Hajič jr.

Main Research Interests

I am now mostly supervising students' projects (NPRG045) and bachelor/master theses.


Optical Music Recognition: Here's my dissertation: ODEVZDANE_IPTX_2013_2_11320_0_455244_0_153947.pdf

I've recently published the MUSCIMA++ dataset.

Music Information Retrieval in general (see e.g. the defended Bc. thesis of Marek Židek).

Bayesian models, non-parametric Bayesian models

Neural networks for text modeling

Multimodal (text/image) models


Ribosomal RNA secondary structure prediction

Sentiment analysis

Unsupervised morphology

Generative parsing



Multimodal Optical Music Recognition (GAUK 1444217), 2017 - 2019 (PI).
A summarizing technical report on the project is here: GAUK1444217_techreport.pdf. It re-uses text from my dissertation where relevant and adds work that was done on the project but was not part of my thesis.

Convolutional Neural Networks for Optical Music Recognition (GAUK 170217), 2017 - 2018 (Co-investigator)

rRNA Secondary Structure Prediction (GAUK 550214), 2015 - 2016 (PI).

Curriculum Vitae

My CV is available here: CV_HajicJr.pdf

Selected Bibliography

Jan Hajič jr., Matthias Dorfer, Gerhard Widmer, Pavel Pecina: Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets. In: 19th International Society for Music Information Retrieval Conference, Paris, France, 2018. [pdf]

Alexander Pacha, Jan Hajič jr., Jorge Calvo-Zaragoza: A Baseline for Musical Object Detection with Deep Learning. Applied Sciences 8 (9), 1488-1509. 2018. [pdf]

Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer: Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification. Transactions of the International Society for Music Information Retrieval, 1 (1), 2018. [html]

Jan Hajič jr., Pavel Pecina: The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Osaka Prefecture University, Kyoto, Japan, November 2017. [pdf]

Hajič jr., J. & Pecina, P.: Detecting Noteheads with ConvNets and Bounding Box Regression. Technical report, to appear in ArXiv e-prints, 2017 [pdf]

Hajič jr., J. & Pecina, P.: In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++. ArXiv e-prints, 1703.04824, 2017 [pdf]

Hajič jr., J.; Novotný, J.; Pecina, P. & Pokorný, J.: Further Steps towards a Standard Testbed for Optical Music Recognition. Proceedings of the 17th International Society for Music Information Retrieval Conference, New York University, 2016, 157-163 [pdf]

Straka, M.; Hajič, J.; Straková, J. & Hajič jr., J.: Parsing Universal Dependency Treebanks using Neural Networks and Search-Based Oracle. 14th International Workshop on Treebanks and Linguistic Theories (TLT 2015), IPIPAN, 2015, 208-220

Hajič jr., J. & Pecina, P.: Matching Illustrative Images to “Soft News” Articles. In: UFAL WDS 2015 (Conference of PhD Students in Mathematical Linguistics), Institute of Formal and Applied Linguistics, Charles University in Prague, 2015, 49-56

Veselovská, K.; Hajič jr., J. & Šindlerová, J.: Subjectivity Lexicon for Czech: Implementation and Improvements. Journal for Language Technology and Computational Linguistics, German Society for Computational Linguistics and Language Technology, 2014, 29, 47-61

Veselovská, K. & Hajič jr., J.: Why Words Alone Are Not Enough: Error Analysis of Lexicon-based Polarity Classifier for Czech. Proceedings of the 6th International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, 2013, 1-5 [pdf]

Veselovská, K.; Hajič jr., J. & Šindlerová, J.: Creating Annotated Resources for Polarity Classification in Czech. Proceedings of the 11th Conference on Natural Language Processing, Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI), 2012


I am open to topics especially concerning music technology, song lyrics and poetry (and potentially any machine learning and/or NLP topic).

Current students:

Kristina Szabová - NPRG045 and presumed Bc. thesis: on automatically assessing the quality of song lyrics.

Patricia Brezinová - Mgr. thesis on generating song lyrics.

Defended theses:

(Feel free to ask those people for references!)

Jiří Balhar defended his Bc. thesis on melody extraction from recordings of polyphonic music. He won the Dean's Award in the Bc. thesis category, published the work at a satellite event of the ISMIR 2019 conference (Late-Breaking Demo), and is currently working on a full-scale conference publication.

Marek Židek defended his Mgr. thesis on generating music a bit more cleverly using musical form derived in an unsupervised manner. He previously defended a Bc. thesis on generating music with LSTMs. Both theses include significant effort in evaluation. A composition generated by his models was performed at the Microsoft DOTS conference in 2017.

Jan Výkruta (defended Bc. thesis on automatic expressive reading of poetry)

Vladan Glončák (defended Mgr. thesis on exploiting syntax to identify specifically what people have an opinion about in evaluative text such as restaurant reviews)



I've recently finished PhD studies at ÚFAL, writing my thesis under RNDr. Pavel Pecina on the topic of Optical Music Recognition. I am generally interested in music informatics: if you are a student and have an interest in music, especially machine learning for musical applications, I will be happy to hear about it! For instance, we did some music generation (interview in Czech).

My Mgr. thesis, also under RNDr. Pecina, was on the topic of automatically selecting images for news articles. This work was also done for the CEMI project. My thesis is available here: Matching Images to Texts

I have previously worked on the SEANCE project on Sentiment Analysis, both in my Bc. thesis and in the following years.