Pavel Straňák

office: 425
email: pavel.stranak@mff.cuni.cz
phone: +420 951 554 247
fax: +420 257 223 293
address: Malostranské náměstí 25, 118 00 Praha 1, Czech Republic

Main Research Interests

Lexical semantics, computational lexicography, reliability of annotations, machine translation, application of NLP technology in everyday life

Projects

  • CLARIN Plus: Enhancing CLARIN (H2020-INFRADEV-1-2015-1-676529)
  • I also do other work in the CLARIN project, which aims to make linguistic data and processing tools more available to users, especially (but not only) humanities scholars.
  • PARSEME: PARSing and Multi-word Expressions (ICT COST Action)
  • Korektor – an open-source system for contextual spell-checking and diacritics generation.

Curriculum Vitae

Education

  • 2010 - Ph.D. in Computational Linguistics, Charles University in Prague.
  • 2001 - Mgr. (equiv. of M.A.) in Czech Philology, University of Ostrava.

Selected Bibliography

  1. Eduard Bejček, Jan Hajič, Pavel Straňák, Zdeňka Urešová (2017): Extracting Verbal Multiword Data from Rich Treebank Annotation. In: Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT 15), pp. 13-24, Indiana University, Bloomington, Bloomington, IN, USA
  2. Paweł Kamocki, Pavel Straňák, Michal Sedlák (2016): The Public License Selector: Making Open Licensing Easier. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 1-10, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1
  3. Natalia Klyueva, Pavel Straňák (2016): Improving Corpus Search via Parsing. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 2862-2866, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1
  4. Sarah Berenji Ardestani, Carl Johan Håkansson, Erwin Laure, Ilja Livenson, Pavel Straňák, Emanuel Dima, Dennis Blommesteijn, Mark van de Sanden (2015): B2SHARE: An Open eScience Data Sharing Platform. In: 2015 IEEE 11th International Conference on e-Science (e-Science), pp. 448-453, IEEE computer society, Munich, Germany, ISBN 978-1-4673-9325-6
  5. Loganathan Ramasamy, Alexandr Rosen, Pavel Straňák (2015): Improvements to Korektor: A case study with native and non-native Czech. In: Proceedings of the 15th conference ITAT 2015: Slovenskočeský NLP workshop (SloNLP 2015), pp. 73-80, CreateSpace Independent Publishing Platform, Praha, Czechia, ISBN 978-1515120650
  6. Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman (2014): HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pp. 3550-3555, European Language Resources Association, Reykjavík, Iceland, ISBN 978-2-9517408-8-4
  7. Eduard Bejček, Pavel Straňák, Pavel Pecina (2013): Syntactic Identification of Occurrences of Multiword Expressions in Text using a Lexicon with Dependency Structures. In: The 9th Workshop on Multiword Expressions (MWE 2013), pp. 106-115, Association for Computational Linguistics, Atlanta, Georgia, USA, ISBN 978-1-937284-47-3
  8. Marie Mikulová, Eduard Bejček, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Pavel Straňák, Magda Ševčíková, Zdeněk Žabokrtský (2013): Úpravy a doplňky Pražského závislostního korpusu (Od PDT 2.0 k PDT 3.0) (technical report). ÚFAL MFF UK
  9. Marie Mikulová, Eduard Bejček, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Pavel Straňák, Magda Ševčíková, Zdeněk Žabokrtský (2013): From PDT 2.0 to PDT 3.0 (Modifications and Complements) (technical report). ÚFAL MFF UK
  10. Eduard Bejček, Jarmila Panevová, Jan Popelka, Pavel Straňák, Magda Ševčíková, Jan Štěpánek, Zdeněk Žabokrtský (2012): Prague Dependency Treebank 2.5 – a revisited version of PDT 2.0. In: Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), pp. 231-246, Coling 2012 Organizing Committee, Mumbai, India
  11. Michal Richter, Pavel Straňák, Alexandr Rosen (2012): Korektor – A System for Contextual Spell-checking and Diacritics Completion. In: Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), pp. 1-12, Coling 2012 Organizing Committee, Mumbai, India
  12. Eduard Bejček, Pavel Straňák, Daniel Zeman (2011): Influence of Treebank Design on Representation of Multiword Expressions. In: Lecture Notes in Computer Science, ISSN 0302-9743, 6608, pp. 1-14
  13. Eduard Bejček, Pavel Straňák (2010): Annotation of Multiword Expressions in the Prague Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 44, no. 1-2, pp. 7-21
  14. Ondřej Bojar, Pavel Straňák, Daniel Zeman (2010): Data Issues in English-to-Hindi Machine Translation. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), pp. 1771-1777, European Language Resources Association, Valletta, Malta, ISBN 2-9517408-6-7
  15. Pavel Straňák (2010): Annotation of Multiword Expressions in The Prague Dependency Treebank (PhD thesis). Univerzita Karlova v Praze, Prague, Czech Republic
  16. Pavel Straňák, Jan Štěpánek (2010): Representing Layered and Structured Data in the CoNLL-ST Format. In: Proceedings of the Second International Conference on Global Interoperability for Language Resources, pp. 143-152, City University of Hong Kong, Hong Kong, China, ISBN 978-962-442-323-5
  17. Eduard Bejček, Pavel Straňák, Jan Hajič (2009): Finalising Multiword Annotations in PDT. In: Proceedings of 8th Treebanks and Linguistic Theories Workshop (TLT), pp. 17-25, Università Cattolica del Sacro Cuore, Milano, Italy, ISBN 978-88-8311-712-1
  18. Ondřej Bojar, Pavel Straňák, Daniel Zeman, Gaurav Jain, Michal Hrušecký, Michal Richter, Jan Hajič (2009): English-Hindi Translation – Obtaining Mediocre Results with Bad Data and Fancy Models. In: Proceedings of ICON 2009: 7th International Conference on Natural Language Processing, pp. 316-321, Macmillan Publishers, India, Hyderabad, India, ISBN 978-023-032-845-7
  19. Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, Yi Zhang (2009): The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task, pp. 1-18, Association for Computational Linguistics, Boulder, CO, USA, ISBN 978-1-932432-29-9
  20. Eduard Bejček, Pavel Straňák (2008): Anotace víceslovných výrazů v Pražském závislostním korpusu. In: Grammar & Corpora / Gramatika a korpus 2007, pp. 143-149, Academia, Praha, ISBN 978-80-200-1634-8
  21. Eduard Bejček, Pavel Straňák, Pavel Schlesinger (2008): Annotation of Multiword Expressions in the Prague Dependency Treebank. In: IJCNLP 2008 Proceedings of the Third International Joint Conference on Natural Language Processing, pp. 793-798, International Institute of Information Technology, Hyderabad, India
  22. Ondřej Bojar, Pavel Straňák, Daniel Zeman (2008): English-Hindi Translation in 21 Days. In: Proceedings of the 6th International Conference On Natural Language Processing (ICON-2008) NLP Tools Contest, International Institute of Information Technologies, Hyderabad, Pune, India
  23. Eduard Bejček, Petra Möllerová, Pavel Straňák (2006): The lexico-semantic annotation of PDT: Some results, problems and solutions. In: Lecture Notes in Computer Science, ISSN 0302-9743, 4188, pp. 21-28
  24. Pavel Straňák (2005): Review of Leonard Talmy: Toward a Cognitive Semantics, Volume I, Concept Structuring Systems (review). In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 83, pp. 85-86
  25. Jan Hajič, Martin Holub, Marie Hučínová, Martin Pavlík, Pavel Pecina, Pavel Straňák, Pavel Šidák (2004): Validating and Improving the Czech WordNet via Lexico-Semantic Annotation of the Prague Dependency Treebank. In: Proceedings of LREC 2004
  26. Martin Holub, Pavel Straňák (2003): Approaches to Building Semantic Lexicons. In: WDS'03 Proceedings of Contributed Papers, Part I, pp. 173-178, MATFYZPRESS, Prague, ISBN 80-86732-18-5

Other Activities

  • I am a scientific secretary of LINDAT/CLARIN.
  • I am also on the editorial board of UFAL's Publishing House, which publishes the monograph-oriented series "Studies in Computational and Theoretical Linguistics".

Data

I have participated in the production of several datasets, all of which are freely available in the LINDAT/CLARIN repository.