Main Research Interests


Optical Music Recognition: I've recently published the MUSCIMA++ dataset.

Music Information Retrieval

Bayesian models, non-parametric Bayesian models

Neural networks for text modeling

Multimodal (text/image) models


Ribosomal RNA secondary structure prediction

Sentiment analysis

Unsupervised morphology

Generative parsing


rRNA Secondary Structure Prediction (GAUK 550214), 2015 - 2016 (PI).

Curriculum Vitae

My CV is available here: CV_Hajic.pdf

Selected Bibliography

Hajič jr., J. & Pecina, P.: The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, Osaka Prefecture University, Kyoto, Japan, November 2017. pp. ?? [accepted manuscript] [pdf]

Hajič jr., J. & Pecina, P.: Detecting Noteheads with ConvNets and Bounding Box Regression. Technical report, to appear in ArXiv e-prints, 2017 [pdf]

Hajič jr., J. & Pecina, P.: In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++. ArXiv e-prints, 1703.04824, 2017 [pdf]

Hajič jr., J.; Novotný, J.; Pecina, P. & Pokorný, J.: Further Steps towards a Standard Testbed for Optical Music Recognition. Proceedings of the 17th International Society for Music Information Retrieval Conference, New York University, 2016, 157-163 [pdf]

Straka, M.; Hajič, J.; Straková, J. & Hajič jr., J.: Parsing Universal Dependency Treebanks using Neural Networks and Search-Based Oracle. 14th International Workshop on Treebanks and Linguistic Theories (TLT 2015), IPIPAN, 2015, 208-220

Hajič jr., J. & Pecina, P.: Matching Illustrative Images to “Soft News” Articles. In: UFAL WDS 2015 (Conference of PhD Students in Mathematical Linguistics), Institute of Formal and Applied Linguistics, Charles University in Prague, 2015, 49-56

Veselovská, K.; Hajič jr., J. & Šindlerová, J.: Subjectivity Lexicon for Czech: Implementation and Improvements. Journal for Language Technology and Computational Linguistics, German Society for Computational Linguistics and Language Technology, 2014, 29, 47-61

Veselovská, K. & Hajič jr., J.: Why Words Alone Are Not Enough: Error Analysis of Lexicon-based Polarity Classifier for Czech. Proceedings of the 6th International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, 2013, 1-5 [pdf]

Veselovská, K.; Hajič jr., J. & Šindlerová, J.: Creating Annotated Resources for Polarity Classification in Czech. Proceedings of the 11th Conference on Natural Language Processing, Schriftenreihe der Österreichischen Gesellschaft für Artificial Intelligence (ÖGAI), 2012


I am a PhD student at ÚFAL, writing my thesis under RNDr. Pavel Pecina on the topic of Neural Network Models for Interpretation of Multimodal Data. This work is done for the CEMI project. In 2016, I started focusing on Optical Music Recognition (with the eventual goal of applying these multimodal models). I am generally interested in music informatics: if you are a student with an interest in music, especially in machine learning for musical applications, I will be happy to hear from you!

My Mgr. thesis, also under RNDr. Pecina, was on the topic of automatically selecting images for news articles. This work was also done for the CEMI project. My thesis is available here: Matching Images to Texts

I previously worked on the SEANCE project on sentiment analysis, both for my Bc. thesis and in the years that followed.