Rudolf Rosa

office
409
office hours
13:00 - 19:00 Tuesday
13:00 - 19:00 Wednesday
13:00 - 19:00 Thursday
email
rosa@ufal.mff.cuni.cz
phone
4273
address
Malostranské náměstí 25
118 00 Praha 1
Czech Republic

Main Research Interests

  • Automatic post-editing of machine translation
  • Dependency parsing
  • Unsupervised and semi-supervised methods
  • Machine translation

Visit my Research MicroBlog to see what I am currently up to!

Projects

QTLeap

Currently, my main project is QTLeap, a project aimed at significantly improving the quality of machine translation using deep language processing approaches.

Khresmoi

I am also currently employed on the Khresmoi project, focusing on machine translation of medical texts for information retrieval (translation of search queries and of the resulting snippets).

HamleDT

I am starting to work on HamleDT, a project that harmonizes dependency treebanks for various languages.

Depfix

Depfix is a system for automatic post-editing of machine translation outputs. It was developed as a part of the Faust project and its development will probably continue as a part of the QTLeap project.

MSTperl

MSTperl is a reimplementation of the Maximum Spanning Tree dependency parser (McDonald et al., 2005) in Perl. It is tuned for Czech and has several advanced features that are useful when Depfix parses machine-translated sentences.

Discovering the structure of natural language sentences by unsupervised and semi-supervised methods

(Odhalování struktury vět přirozeného jazyka pomocí neřízených a částečně řízených metod)

This is the topic of my PhD studies (and of my dissertation). I am only just starting to work on it. However, my work on HamleDT will also be part of my studies, as it is related to the topic.

Curriculum Vitae

You can download my CV either in English or in Czech.

Selected Bibliography

For each publication, there is a link to the paper in PDF and to any presentations or posters.

However, the file names are always something like batt1.pdf; they are generated automatically and I cannot change them, so you have to try out the files to see which is which.

Alternatively, you can follow the links named "biblio", which lead to a page with detailed information about the publication and a more user-friendly list of files for download.

  1. Pavel Pecina, Ondřej Dušek, Lorraine Goeuriot, Jan Hajič, Jaroslava Hlaváčová, Gareth J.F. Jones, Liadh Kelly, Johannes Leveling, David Mareček, Michal Novák, Martin Popel, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová (2014): Adaptation of machine translation for multilingual information retrieval in medical domain. In: Artificial Intelligence in Medicine, ISSN 0933-3657, 5 February 2014, pp. 1-25 (url, biblio, bibtex)
  2. Niraj Aswani, Thomas Beckers, Erich Birngruber, Célia Boyer, Andreas Burner, Jakub Bystroň, Khalid Choukri, Sarah Cruchet, Hamish Cunningham, Jan Dědek, Ljiljana Dolamic, René Donner, Ondřej Dušek, Sebastian Dungs, Ivan Eggel, Antonio Foncubierta, Norbert Fuhr, Adam Funk, Alba García Seco de Herrera, Arnaud Gaudinat, Georgi Georgiev, Julien Gobeill, Lorraine Goeuriot, Paz Gomez, Mark A. Greenwood, Manfred Gschwandtner, Allan Hanbury, Jan Hajič, Jaroslava Hlaváčová, Markus Holzer, Gareth J.F. Jones, Blanca Jordán, Matthias Jordan, Klemens Kaderk, Franz Kainberger, Liadh Kelly, Sascha Kriewel, Marlene Kritz, Georg Langs, Nolan Lawson, Johannes Leveling, David Mareček, Dimitrios Markonis, Iván Martínez, Vassil Momtchev, Alexandre Masselot, Hélène Mazo, Henning Müller, Michal Novák, Johann Petrak, João Palotti, Pavel Pecina, Konstantin Pentchev, Deyan Peychev, Natalia Pletneva, Martin Popel, Diana Pottecher, Angus Roberts, Rudolf Rosa, Patrick Ruch, Alexander Sachs, Matthias Samwald, Priscille Schneller, Veronika Stefanov, Aleš Tamchyna, Miguel Angel Tinte, Zdeňka Urešová, Alejandro Vargas, Dina Vishnyakova (2013): Khresmoi Professional: Multilingual Semantic Search for Medical Professionals. In: Proceedings of the ACM SIGIR Workshop on Health Search and Discovery: Helping Users and Advancing Medicine, pp. 31-34, Microsoft Research, Cambridge, UK (url, biblio, batt1.pdf, obd, bibtex)
  3. Ondřej Bojar, Rudolf Rosa, Aleš Tamchyna (2013): Chimera – Three Heads for English-to-Czech Translation. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, pp. 92-98, Association for Computational Linguistics, Sofija, Bulgaria, ISBN 978-1-937284-57-2 (url, biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  4. Rudolf Rosa (2013): Automatic post-editing of phrase-based machine translation outputs (master's thesis). Charles University in Prague, Faculty of Mathematics and Physics, Praha, Czechia (biblio, batt1.pdf, batt2.pdf, batt3.pdf, batt4.pdf, bibtex)
  5. Rudolf Rosa, David Mareček, Aleš Tamchyna (2013): Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis. In: 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop, pp. 172-179, Association for Computational Linguistics, Sofija, Bulgaria, ISBN 978-1-937284-53-4 (url, biblio, batt1.pdf, batt2.pdf, batt3.pdf, obd, bibtex)
  6. Aleš Tamchyna, Ondřej Dušek, Rudolf Rosa, Pavel Pecina (2013): MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 100, pp. 31-40 (pdf, biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  7. Rudolf Rosa, Ondřej Dušek, David Mareček, Martin Popel (2012): Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors. In: Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6), ACL, pp. 39-48, Association for Computational Linguistics, Jeju, Korea, ISBN 978-1-937284-38-1 (pdf, biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  8. Rudolf Rosa, David Mareček (2012): Dependency Relations Labeller for Czech. In: Text, Speech and Dialogue: 15th International Conference, TSD 2012. Proceedings, Lecture Notes in Computer Science, ISSN 0302-9743, 7499, pp. 256-263, Springer Verlag, Berlin / Heidelberg, ISBN 978-3-642-32789-6 (url, biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  9. Rudolf Rosa, David Mareček, Ondřej Dušek (2012): DEPFIX: A System for Automatic Correction of Czech MT Outputs. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 362-368, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6 (pdf, biblio, batt1.pdf, batt2.pdf, obd, bibtex)
  10. Ondřej Hálek, Rudolf Rosa, Aleš Tamchyna, Ondřej Bojar (2011): Named Entities from Wikipedia for Machine Translation. In: Information Technologies – Applications and Theory, pp. 23-30, Univerzita Pavla Jozefa Šafárika v Košiciach, Košice, Slovakia, ISBN 978-80-89557-02-8 (biblio, batt1.pdf, batt2.pdf, batt3.pdf, obd, bibtex)
  11. David Mareček, Rudolf Rosa, Petra Galuščáková, Ondřej Bojar (2011): Two-step translation with grammatical post-processing. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 426-432, Association for Computational Linguistics, Edinburgh, UK, ISBN 978-1-937284-12-1 (url, biblio, batt1.pdf, batt2.pdf, obd, bibtex)

Topic suggestions for course credit projects, bachelor's theses, master's theses, and similar work (in Czech).

My Erdős number is ≤ 7 (me - Jan Hajič - Jason Eisner - Jean-Marc Champarnaud - Gérard Duchamp - Jean-Yves Thibon - Persi Diaconis - Paul Erdős)