Registration number of the Grant Agency of the Czech Academy of Sciences: 1ET101120503
Internal code: 207-14 / 242083
Estimated project duration: 2005 - 2009
Aim of the project:
Design and implementation of a general format and tools for unified
processing of any type of language data, and of a system for automatic
detection and annotation of "named entities" in Czech texts.
Project characteristics
The project "Integration of Language Resources for Information Extraction from
Natural Texts" addresses the current heterogeneity of language
data intended for linguistic research. The result of the project will be a
unified system for storing and using language resources together with robust
tools enabling effective text processing. All the available language
resources will be converted into the new system. The project is also
concerned with the detection and classification of "named entities" in Czech
texts, a problem not yet solved for the Czech language. Including named-entity
annotation in the unified data system will improve the results of automatic
language processing, especially in information retrieval from large text
databases.