News
- The Czech Academic Corpus 2.0 has been released by the Linguistic Data Consortium.
- The language games portal LGame has been launched.
Project Info
- project ID: 1ET101120413
- project duration: July 1, 2004 - December 31, 2008
- funded by the Grant Agency of the Academy of Sciences of the Czech Republic
Description
The project focuses on full-text information systems for Czech, containing both written and spoken materials, where standard methods fail because they were developed for languages of a different type. The project aims to strengthen and improve current methods for morphological analysis of Czech in order to attain higher precision in identifying lexical units and, in some cases, also their meaning. The project will use state-of-the-art statistical technology and machine learning based on linguistically annotated data. Within the project, such data will be prepared at a fraction of the usual cost by converting older resources, and tools will be created, based on the resulting larger corpus, with the parameters needed for successful application in end-user information systems. (Proposal submitted)
Links
Contact
- Barbora Hladká (hladka@ufal.mff.cuni.cz)
Institute of Formal and Applied Linguistics,
Faculty of Mathematics and Physics, Charles University
Malostranské nám. 25
CZ - 118 00 Prague 1
tel.: +420-221 914 233, fax: +420-221 914 304
