Who we are
The Institute of Formal and Applied Linguistics (ÚFAL) is one of the specialized departments at the Faculty of Mathematics and Physics, Charles University in Prague. We are a group of research scientists, postdocs, programmers, teachers and students working on a broad variety of topics connected to the dynamic field of computational linguistics.
With a tradition going back to the 1960s, we draw on our experience to achieve top results in natural language processing (NLP) worldwide, as evidenced by our many international publications and the books we publish. We are concerned not only with models of language and linguistic theories, but also with many projects and applications put into practice by both state and private companies.
Apart from research activities, we also run a comprehensive teaching program for the Master's degree (Mgr., or MSc.), in both Czech and English, as well as for a doctorate (Ph.D.) in Computational Linguistics. For the Bachelor's degree, students can choose the profile Mathematical Linguistics within the field of General Informatics. The Institute is also a member of the double-degree Master's LCT programme of the EU.
What we do
The history of ÚFAL is closely tied to the development of the Functional Generative Description, an influential linguistic framework invented by Petr Sgall et al. The theory treats the sentence as a system of interlinked layers: phonological, morphematical, morphonological, analytical (surface syntax) and tectogrammatical (deep syntax). Building on this well-elaborated framework, the ÚFAL team has built a whole family of dependency treebanks that are useful not only for the widely studied task of machine translation.
The oldest and largest of these treebanks is the Prague Dependency Treebank (PDT), which has a large number of users all around the world. The latest version, 2.5, has been adapted to the needs of current computational linguistics research. The corpus uses the latest annotation technology, and software tools for corpus search, annotation and language analysis are included as its components. Below you can see the scheme of a typical PDT sentence:
You can also try our smart spell-checker and diacritics restorer/remover.
Machine translation (MT) is a hot topic of research at our department. ÚFAL regularly participates in the MT competitions held at the Workshop on Statistical Machine Translation. According to the latest evaluation, our system Chimera is currently the most advanced MT system for English→Czech in the world, beating even Google Translate.
Chimera is a combination of two fundamentally different approaches: the statistical system Moses, and our own linguistically oriented system TectoMT, which builds on the strong linguistic theory of the Functional Generative Description and combines it with state-of-the-art machine learning methods.
Try an online demo of MT for the project Khresmoi (an MT system specialized for translating search queries in the medical domain).
Statistical dialogue systems
Spoken dialogue systems combine several very complex NLP tasks: they require high-quality speech recognition of the user's input, advanced modelling of semantics and dialogue state, and speech synthesis of the output. We have an active research group within the project Vystadial. The goal of the project is to study and improve statistical methods for training the models used in complex dialogue systems. Thanks to the Vystadial team, you can find your transport connection within Prague! Call our dialogue system ALEX for free (in Czech): 800 899 998.
We were also involved in the project Companions. In this project, we created an avatar for human-computer interaction called Petra. To chat with Petra, add user firstname.lastname@example.org to your Gmail (in Czech).
Malach Centre for visual history
The Malach Centre for visual history provides local access to the extensive digital archives of the USC Shoah Foundation, which contain over 50,000 witness testimonies covering the history of the entire 20th century. ÚFAL has participated in developing tools for linguistic processing of the Czech data.
ÚFAL has organized a number of conferences, such as the Annual Meeting of the Association for Computational Linguistics (ACL) in 2007, Depling in 2013, and the Machine Translation Marathons in 2009 and 2013.
The Fred Jelinek seminar series, a loose series of lectures organized in recognition of the late professor Frederick Jelinek, regularly features the most prominent researchers in the field. Video recordings of these lectures, as well as of our regular Monday seminars, are available online.
... and many more!
Apart from the projects mentioned above, we also work on other NLP tasks, including automatic speech recognition, information retrieval, machine learning, neurolinguistics, opinion mining, language teaching applications and many more. To find out more, join our team in the beautiful historical centre of Prague! Did you know that Prague ranked 2nd in the Top Ten Best Student Cities, based on ratings from students and recent graduates?
Why study at ÚFAL
ÚFAL is one of the top-level, internationally recognized departments concerned with the modern and widely applicable domain of computational linguistics.
Not only do we provide many interesting courses that take students from the very basics of the field to its exciting details, but we also offer the possibility to participate in many grants and in both Czech and international projects.
Our staff and students have many opportunities to travel abroad, whether for conferences, workshops and summer schools, or for educational exchanges or research fellowships, e.g. at Johns Hopkins University, Baltimore (USA), Saarland University, Saarbrücken (Germany) and many other top-ranking institutions all over the world.
We have at our disposal up-to-date computer technology for the most demanding computations.
Our graduates find employment in leading companies in the field, as well as in the broader domain of informatics.
Some of our alumni
Jan Cuřín - now at IBM, Prague
Martin Čmejrek - now at IBM, New York
Jiří Havelka - now at IBM, Prague
Magda Hnátková - now at Arriba, San Francisco
Pavel Krbec - now at CET21, Prague
Pavel Květoň - now at IBM, Prague
Martin Majliš - now at Amazon, Toronto
Petr Pajas - now at Google, Zürich
Petr Podveský - now at RWE, Prague
Jan Rouš - now at Google, Mountain View
Jiří Semecký - now at Google, Zürich
Otakar Smrž - now at Seznam, Prague
Jan Štěpánek - now at Barclays, Prague