Main Research Interests

machine learning, lexical semantics, lexical disambiguation


  • CEMI -- Center for large-scale multi-modal data interpretation
    The project aims to exploit large collections of unlabeled multi-modal data, mainly video footage, to advance the state of the art in video, audio, and natural language understanding, interpretation, annotation, and retrieval by combining unsupervised and semi-supervised learning.
    • Participants: Czech Technical University, Charles University in Prague, Masaryk University, University of West Bohemia
  • SPR -- Semantic Pattern Recognition -- semantic analysis of words in contexts
    This project draws on Corpus Pattern Analysis, developed by Patrick W. Hanks, and on his Pattern Dictionary of English Verbs (PDEV). We find the idea of semantic analysis of words (verbs) in context very appealing and seek to explore its application in NLP tasks. Among other things, we have been investigating the use of PDEV as gold-standard data for statistical machine learning.



Selected Bibliography


  • Sudarikov, Roman; Bojar, Ondřej; Dušek, Ondřej; Holub, Martin; Kríž, Vincent: Verb Sense Disambiguation in Machine Translation. In HyTra-6 2016: Sixth Workshop on Hybrid Approaches to Translation, pp. 42--50, Stroudsburg, PA, USA, 2016.
  • Kríž, Vincent; Holub, Martin; Pecina, Pavel: Feature Extraction for Native Language Identification Using Language Modeling. In RANLP 2015: Proceedings of Recent Advances in Natural Language Processing, pp. 298--306, Hisarja, Bulgaria, 2015.
  • Hladká, Barbora; Holub, Martin: A Gentle Introduction to Machine Learning for Natural Language Processing (How to start – 16 practical steps). In Language and Linguistics Compass, Vol. 9, No. 2, pp. 55--76. John Wiley & Sons Inc., 2015. 
  • Hladká, Barbora; Holub, Martin; Kríž, Vincent: Feature Engineering in the NLI Shared Task 2013: Charles University Submission Report. In NAACL HLT 2013 - BEA workshop: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, Georgia, USA, 2013.
  • Cinková, Silvie; Holub, Martin; Kríž, Vincent: Managing Uncertainty in Semantic Tagging. In EACL 2012: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, 2012.
  • Holub, Martin; Kríž, Vincent; Cinková, Silvie; Bick, Eckhard: Tailored Feature Extraction for Lexical Disambiguation of English Verbs Based on Corpus Pattern Analysis. In COLING 2012: Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), Mumbai, India, 2012.
  • Cinková, Silvie; Holub, Martin; Rambousek, Adam; Smejkalová, Lenka: A database of semantic clusters of verb usages. In LREC 2012: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), İstanbul, Turkey, 2012.
  • Cinková, Silvie; Holub, Martin; Rychlý, Pavel; Smejkalová, Lenka; Šindlerová, Jana: Can Corpus Pattern Analysis Be Used in NLP? In TSD 2010: Text, Speech and Dialogue. 13th International Conference, TSD 2010. Berlin / Heidelberg, 2010.