Activities

The Center for Computational Linguistics aimed to take research and development in computational linguistics to a new level, based on a unique multilevel grammatical analysis of a very large corpus of Czech. By successfully integrating research institutions focused on text and speech processing, the Center has had a great impact on the broader field of human-computer communication.

The development of the Prague Dependency Treebank (PDT) was the Center's crucial task. PDT is a unique collection of Czech sentences annotated with rich information about their morphology, syntax, and semantic structure.

Such a data collection supports further theoretical research on the Czech language, but first and foremost it is an essential resource for automatic natural language processing tasks such as machine translation, data mining, automatic text understanding, and text generation.

The extensive experience gained in building corpora has been applied to the Prague Arabic Dependency Treebank (published by LDC, 2004), which consists of multilevel linguistic annotations of Modern Standard Arabic.

Another aim of the Center was to produce, and conduct research on, multilingual language resources. These activities resulted in the Prague Czech-English Dependency Treebank (published by LDC, 2004), a corpus of Czech-English parallel resources suitable for machine translation experiments, with special emphasis on dependency-based (structural) translation and with evaluation data provided for Czech-to-English systems.

The second essential task of the Center was statistically based research in speech analysis. The outputs of these activities are available as the Czech Broadcast News Corpus and the Czech Broadcast News Transcripts (both published by LDC, 2004).

Involvement in the exceptional international project MALACH (Multilingual Access to Large Spoken Archives) was a great asset to the Center. The MALACH project, which aims at transcribing the recorded testimonies of Holocaust survivors, focuses on improving access to large multilingual collections of recorded speech.

The Center for Computational Linguistics paid due attention to both the theoretical and the applied aspects of computational linguistics, with special regard to the Czech language in its written and spoken forms, as well as to the mathematical and computational foundations of natural language processing methods, algorithms, and procedures. The methodology was based on in-depth study, comparison, and careful combination of structural and statistical approaches, including machine learning methods, bearing in mind the specific typological properties of Czech as a highly inflected language; in this respect, an original methodology was developed and applied.

The Center was granted the opportunity to organize the prestigious XVII International Congress of Linguists (CIL 17), held July 24-29, 2003.

Petr Homola, CKL 2004