TectoMT

Czech Named Entity Corpus 1.0

Named Entity Recognition (NER) aims to identify proper names in text and to classify them into predefined categories such as names of persons, geographical names, and names of organizations. The task is motivated by the needs of Natural Language Processing (NLP) applications such as Information Extraction and Machine Translation. As with most other NLP tasks, annotated data is valuable when developing a named entity recognizer, especially for training and evaluation. The Czech Named Entity Corpus 1.0 presented here is the first publicly available corpus providing a large body of manually annotated named entities in Czech sentences, including a fine-grained classification.
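As a rough illustration of how manually annotated entities can be used for evaluation, the following minimal sketch compares a recognizer's output against gold annotations using exact-match precision, recall and F1. It is not the evaluation script distributed with the corpus. The tuple representation, the function name `evaluate`, and the type labels `person`, `geographical` and `organization` are assumptions made for this example and do not reflect the corpus format or its fine-grained tag set.

```python
from typing import Set, Tuple

# Hypothetical representation of one annotated entity:
# (index of first token, index of last token, entity type).
Entity = Tuple[int, int, str]


def evaluate(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return exact-match precision, recall and F1 for predicted entities."""
    # An entity counts as correct only if both its span and its type match.
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only; the labels are not the corpus's actual categories.
gold = {(0, 1, "person"), (4, 4, "geographical"), (7, 9, "organization")}
predicted = {(0, 1, "person"), (4, 4, "organization"), (7, 9, "organization")}

precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case the second entity has the right span but the wrong type, so it counts as an error under exact matching. A real evaluation may score span detection and type classification separately.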

Download Czech Named Entity Corpus 1.0

Evaluation script

Highly Modular MT System with Tectogrammatics Used as Transfer Layer
