Rudolf Rosa
Main Research Interests
- Robopsychologist (looking for linguistic structures in Deep Neural Networks)
- Natural language generation, generative art
- Popularization of science
  - Contact me if you would like me to give a popularization lecture, workshop, or seminar for your audience.
- In the past:
  - Automatic post-editing of machine translation
  - Morphology, derivations
  - Dependency parsing
  - Unsupervised and semi-supervised methods, especially cross-lingual and multilingual
Projects
THEaiTRE: automatic generation of theatre play scripts
In cooperation with Švandovo theatre, DAMU, and Tomáš Studeník, we are working on a system for automatic generation of theatre play scripts.
Linguistic Structure Representation in Neural Networks (LSD)
I am currently working on David Mareček's GAČR grant, called LSD, in which we investigate which linguistic structures can be found hidden inside neural networks.
Unsupervised morphology induction
Together with Zdeněk Žabokrtský, we are trying to handle morphology in an unsupervised way, e.g. to find lemmas for word forms, to separate derivation from inflection, etc.
Past
- Cross-lingual Syntactic Parsing, i.e. training a parser on one language and applying it to another language. This was my dissertation, and I also had a GAUK grant for that.
- Pohádkové dítě / Fairytale Child chatbot. A simple console chatbot that wants to hear a fairy tale from you!
- HimL focused on semantically sane translation of medical texts from English to Czech, German, Romanian and Polish.
- QTLeap was a project aimed at significantly improving the quality of machine translation using deep language processing approaches (also see TectoMT).
- I was a member of the HamleDT group, a project for harmonizing dependency treebanks for various languages, which later evolved and merged into the Universal Dependencies project (of which I am also an official member).
- Depfix is a system for automatic post-editing of machine translation outputs. It was developed as a part of the Faust project. It was later succeeded by MLFix, by Dušan Variš.
- MSTperl is a reimplementation of the Maximum Spanning Tree dependency parser (McDonald et al., 2005) in Perl. It is tuned for Czech and has several advanced features that are useful for parsing machine-translated sentences in Depfix. It also has some features for delexicalized parser transfer. I no longer use it; I switched to Parsito and UDPipe.
Curriculum Vitae
You can download my CV either in English or in Czech.
Teaching
List of classes
- NAIL127
- NPFL092 NLP Technology
- NPFL118 Natural language processing on computational cluster
- NPFL120 Multilingual Natural Language Processing
- NPRG045
Page for the Programming 1 (Programování 1) practicals
I am happy to supervise NLP projects (bachelor theses, master theses, etc.); have a look at Project Ideas.
Warning: Reading scientific literature is my weak point, so reviewing the existing literature relevant to the topic will be mostly your responsibility!
Selected Bibliography
You can use Google Scholar or Semantic Scholar; I also keep an automated static listing of my publications here.
Students
Bachelor students
- Yuliya Yamalutdinova: Detection of contradictions in pairs of texts in Kazakh (Detekce kontradikce mezi dvěma texty v kazaštině) — defended 2019
- Zuzana Svobodová: Generating text descriptions of journeys in a map (Generování textového popisu trasy v mapě) — defended 2020
- Jan Matějka: Generator of computer descriptions (Generátor popisků počítačových sestav a notebooků) — defended 2020
- Lukáš Chaloupský: Automatic generation of images and their usage as training data (Automatické generování obrázků a jejich využití jako trénovacích dat) — defended 2020
- Ondřej Michálek: Biblical paraphrasing (Biblické parafrázování) — defended 2020
- František Trebuňa: Generating text from structured data (Generování textu ze strukturovaných dat) — defended 2021
- Peter Grajcar: Generating a drawing according to a textual description (Generování kresby dle slovního popisu) — defended 2021
- Daniela Jurášová: Automatické generovanie hrebeňoviek (Automatic generation of crosswords) — defended 2021
- Zuzana Urbanová: Quote Attribution and Character Networks in Novels (Přiřazování mluvčích a vztahy mezi postavami v knihách) — defended 2021
- Dominik Prokop: Generování výsledků tenisových dvouher (Generation of tennis singles results) — defended 2022
- Viktor Bujko: Extrakcia informácií z reportov o leteckých incidentoch (Information extraction from aviation incident reports) — defended 2022
- X Y: Biblical Chatbot
- X Y: Generating textual weather forecasts from structured data
- X Y: Generating DnD maps from textual descriptions
- X Y: Czech morphological guesser for OOV words
Master students
- Abhishek Agrawal: Eye-tracking features in syntactic parsing (Rysy z eye-trackeru v syntaktickém parsingu) — defended 2020 (paper on Lantern 2020)
- Lukáš Chaloupský: Automatic generation of medical reports from chest X-rays in Czech — defended 2022
- Goutham Venkatesh: Modelling character personalities within THEaiTRE project (research project)
- Rishu Kumar: Summarization of theatre scripts within THEaiTRE project (research project)
Interns
- Tomasz Limisiewicz: Analyzing syntactic features of BERT self-attentions — completed in 2019 (paper in findings of EMNLP 2020)
Other
I was one of the main organizers of the Slovakoczech NLP workshop for students and early-stage researchers -- see SloNLP 2015, SloNLP 2016, SloNLP 2017, SloNLP 2018, SloNLP 2019.
I am the student ambassador of ÚFAL, so feel free to contact me with anything related to studying here, including studying linguistics at Matfyz!
My ORCID ID is 0000-0003-4908-6127.
My Erdős number is 4 (me - Jaroslava Hlaváčová - Petr Savický - Zsolt Tuza - Paul Erdős).
My game index (herní index) is 73 (current as of Navíc 2021); our puzzle-hunt team is called Divize nulou.