Pavel Straňák

office: 425
email: pavel.stranak@mff.cuni.cz
phone: +420 951 554 247
fax: +420 257 223 293
address: Malostranské náměstí 25, 118 00 Praha 1, Czech Republic

Main Research Interests

Lexical semantics, computational lexicography, reliability of annotations, machine translation, application of NLP technology in everyday life

Projects

  • CLARIN Plus: Enhancing CLARIN (H2020-INFRADEV-1-2015-1-676529)
  • I also do other work in the CLARIN project, which aims to make linguistic data and processing tools more available to users, especially (but not only) scholars in the humanities.
  • PARSEME: PARSing and Multi-word Expressions (ICT COST Action)
  • Korektor – an open-source contextual spell-checker and diacritics generation system.

Curriculum Vitae

Education

  • 2010 - Ph.D. in Computational Linguistics, Charles University in Prague.
  • 2001 - Mgr. (equiv. of M.A.) in Czech Philology, University of Ostrava.

Teaching

Selected Bibliography

  1. Eduard Bejček, Jan Hajič, Pavel Straňák, Zdeňka Urešová (2017): Extracting Verbal Multiword Data from Rich Treebank Annotation. In: Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT 15), pp. 13-24, Indiana University, Bloomington, Bloomington, IN, USA (pdf, biblio, batt1.pdf, batt2.pdf, bibtex)
  2. Paweł Kamocki, Pavel Straňák, Michal Sedlák (2016): The Public License Selector: Making Open Licensing Easier. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 1-10, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (biblio, batt1.pdf, obd, bibtex)
  3. Natalia Klyueva, Pavel Straňák (2016): Improving Corpus Search via Parsing. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 2862-2866, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (pdf, biblio, batt1.pdf, obd, bibtex)
  4. Sarah Berenji Ardestani, Carl Johan Håkansson, Erwin Laure, Ilja Livenson, Pavel Straňák, Emanuel Dima, Dennis Blommesteijn, Mark van de Sanden (2015): B2SHARE: An Open eScience Data Sharing Platform. In: 2015 IEEE 11th International Conference on e-Science (e-Science), pp. 448-453, IEEE computer society, Munich, Germany, ISBN 978-1-4673-9325-6 (url, biblio, batt1.pdf, obd, bibtex)
  5. Loganathan Ramasamy, Alexandr Rosen, Pavel Straňák (2015): Improvements to Korektor: A case study with native and non-native Czech. In: Proceedings of the 15th conference ITAT 2015: Slovenskočeský NLP workshop (SloNLP 2015), pp. 73-80, CreateSpace Independent Publishing Platform, Praha, Czechia, ISBN 978-1515120650 (biblio, obd, bibtex)
  6. Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman (2014): HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pp. 3550-3555, European Language Resources Association, Reykjavík, Iceland, ISBN 978-2-9517408-8-4 (pdf, biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  7. Eduard Bejček, Pavel Straňák, Pavel Pecina (2013): Syntactic Identification of Occurrences of Multiword Expressions in Text using a Lexicon with Dependency Structures. In: The 9th Workshop on Multiword Expressions (MWE 2013), pp. 106-115, Association for Computational Linguistics, Atlanta, Georgia, USA, ISBN 978-1-937284-47-3 (pdf, biblio, batt1.zip, batt2.pdf, batt3.pdf, obd, bibtex)
  8. Marie Mikulová, Eduard Bejček, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Pavel Straňák, Magda Ševčíková, Zdeněk Žabokrtský (2013): From PDT 2.0 to PDT 3.0 (Modifications and Complements) (technical report). (biblio, batt1.pdf, bibtex)
  9. Marie Mikulová, Eduard Bejček, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Pavel Straňák, Magda Ševčíková, Zdeněk Žabokrtský (2013): Úpravy a doplňky Pražského závislostního korpusu (Od PDT 2.0 k PDT 3.0) (technical report). (biblio, batt1.pdf, bibtex)
  10. Eduard Bejček, Jarmila Panevová, Jan Popelka, Pavel Straňák, Magda Ševčíková, Jan Štěpánek, Zdeněk Žabokrtský (2012): Prague Dependency Treebank 2.5 – a revisited version of PDT 2.0. In: Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), pp. 231-246, Coling 2012 Organizing Committee, Mumbai, India (biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  11. Michal Richter, Pavel Straňák, Alexandr Rosen (2012): Korektor – A System for Contextual Spell-checking and Diacritics Completion. In: Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), pp. 1-12, Coling 2012 Organizing Committee, Mumbai, India (biblio, batt1.pdf, obd, bibtex)
  12. Eduard Bejček, Pavel Straňák, Daniel Zeman (2011): Influence of Treebank Design on Representation of Multiword Expressions. In: Lecture Notes in Computer Science, ISSN 0302-9743, 6608, pp. 1-14 (url, biblio, batt1.pdf, obd, bibtex)
  13. Eduard Bejček, Pavel Straňák (2010): Annotation of Multiword Expressions in the Prague Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 44, no. 1-2, pp. 7-21 (url, biblio, batt1.pdf, obd, bibtex)
  14. Ondřej Bojar, Pavel Straňák, Daniel Zeman (2010): Data Issues in English-to-Hindi Machine Translation. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), pp. 1771-1777, European Language Resources Association, Valletta, Malta, ISBN 2-9517408-6-7 (biblio, batt1.pdf, batt2.odp, batt3.pdf, obd, bibtex)
  15. Pavel Straňák (2010): Annotation of Multiword Expressions in The Prague Dependency Treebank (PhD thesis). Univerzita Karlova v Praze, Prague, Czech Republic (biblio, batt1.pdf, batt2.pdf, bibtex)
  16. Pavel Straňák, Jan Štěpánek (2010): Representing Layered and Structured Data in the CoNLL-ST Format. In: Proceedings of the Second International Conference on Global Interoperability for Language Resources, pp. 143-152, City University of Hong Kong, Hong Kong, China, ISBN 978-962-442-323-5 (biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  17. Eduard Bejček, Pavel Straňák, Jan Hajič (2009): Finalising Multiword Annotations in PDT. In: Proceedings of 8th Treebanks and Linguistic Theories Workshop (TLT), pp. 17-25, Università Cattolica del Sacro Cuore, Milano, Italy, ISBN 978-88-8311-712-1 (biblio, batt1.pdf, batt2.pdf, batt3.pdf, bibtex)
  18. Ondřej Bojar, Pavel Straňák, Daniel Zeman, Gaurav Jain, Michal Hrušecký, Michal Richter, Jan Hajič (2009): English-Hindi Translation – Obtaining Mediocre Results with Bad Data and Fancy Models. In: Proceedings of ICON 2009: 7th International Conference on Natural Language Processing, pp. 316-321, Macmillan Publishers, India, Hyderabad, India, ISBN 978-023-032-845-7 (biblio, batt1.pdf, batt2.pdf, bibtex)
  19. Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, Yi Zhang (2009): The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task, pp. 1-18, Association for Computational Linguistics, Boulder, CO, USA, ISBN 978-1-932432-29-9 (url, biblio, batt1.pdf, obd, bibtex)
  20. Eduard Bejček, Pavel Straňák (2008): Anotace víceslovných výrazů v Pražském závislostním korpusu. In: Grammar & Corpora / Gramatika a korpus 2007, pp. 143-149, Academia, Praha, ISBN 978-80-200-1634-8 (biblio, batt1.pdf, obd)
  21. Eduard Bejček, Pavel Straňák, Pavel Schlesinger (2008): Annotation of Multiword Expressions in the Prague Dependency Treebank. In: IJCNLP 2008 Proceedings of the Third International Joint Conference on Natural Language Processing, pp. 793-798, International Institute of Information Technology, Hyderabad, India (biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  22. Ondřej Bojar, Pavel Straňák, Daniel Zeman (2008): English-Hindi Translation in 21 Days. In: Proceedings of the 6th International Conference On Natural Language Processing (ICON-2008) NLP Tools Contest, International Institute of Information Technologies, Hyderabad, Pune, India (url, biblio, batt1.ppt, batt2.pdf, bibtex)
  23. Eduard Bejček, Petra Möllerová, Pavel Straňák (2006): The lexico-semantic annotation of PDT: Some results, problems and solutions. In: Lecture Notes in Computer Science, ISSN 0302-9743, 4188, pp. 21-28 (url, biblio, batt1.pdf, batt2.pdf, bibtex)
  24. Pavel Straňák (2005): Review of Leonard Talmy: Toward a Cognitive Semantics, Volume I, Concept Structuring Systems (review). In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 83, pp. 85-86 (biblio, batt1.pdf, bibtex)
  25. Jan Hajič, Martin Holub, Marie Hučínová, Martin Pavlík, Pavel Pecina, Pavel Straňák, Pavel Šidák (2004): Validating and Improving the Czech WordNet via Lexico-Semantic Annotation of the Prague Dependency Treebank. In: Proceedings of LREC 2004 (biblio, bibtex)
  26. Martin Holub, Pavel Straňák (2003): Approaches to Building Semantic Lexicons. In: WDS'03 Proceedings of Contributed Papers, Part I, pp. 173-178, MATFYZPRESS, Prague, ISBN 80-86732-18-5 (biblio, bibtex)

Students

Defended

Other Activities

  • I am a scientific secretary of LINDAT/CLARIN.
  • I am also on the editorial board of UFAL's Publishing House, which publishes the monograph-oriented series "Studies in Computational and Theoretical Linguistics".

Past Activities

Data

I have participated in the production of several datasets, all of which are freely available in the LINDAT/CLARIN repository.