JAZZ: Integration of Language Resources for Information Extraction from Natural Texts
An Information Society project supported by the Grant Agency of the Academy of Sciences of the Czech Republic

Internal code: 207-14 / 242083

Planned project duration: 2005-2009

Aim of the project: To design and implement a general format and tools for unified processing of any type of language data, and a system for automatically finding and annotating "named entities" in Czech texts.

Project justification: DOC, HTML

Project characteristics
The project "Integration of Language Resources for Information Extraction from Natural Texts" addresses the current heterogeneity of language data intended for linguistic research. Its result will be a unified system for storing and using language resources, together with robust tools for effective text processing. All available language resources will be converted into the new system. The project also deals with the detection and classification of "named entities" in Czech texts, a problem not yet solved for the Czech language. Including named-entity annotation in the unified data system will improve the results of automatic language processing, especially in information retrieval from large text databases.

Chief researcher
Jaroslava Hlaváčová
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics of Charles University
Malostranské nám. 25
118 00 Praha 1
tel.: +420-221 914 360, fax: +420-221 914 309
e-mail: hlava at ufal dot mff dot cuni dot cz

Associate researcher

Institute of the Czech Language (Ústav pro jazyk český)
Academy of Sciences of the Czech Republic (Akademie věd ČR)
Letenská 4/123
118 51 Praha 1