
Czech Language Tagging

Several approaches to the tagging of texts have been proposed. The so-called stochastic strategies use various statistical models, namely Markov Models, the Maximum Entropy Model, and the Exponential Model.
These strategies are classified as corpus-based: they work on annotated corpora to estimate the appropriate probabilities. To tag Czech texts, we concentrate mainly on the Markov Model and the Feature-based tagger.
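
As a rough illustration of what "corpus-based" means for a Markov Model tagger, the sketch below estimates tag-transition and word-emission probabilities by relative frequency from a tiny annotated corpus. The toy sentences, the simplified tags, and the names `estimate`, `transition`, and `emission` are assumptions made for this example only; they do not reproduce the PDT tagset or the taggers described in the papers listed here.

```python
from collections import Counter, defaultdict

# Toy annotated corpus (hypothetical data): sentences of (word, tag) pairs.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def normalize(counts):
    """Turn nested counts into conditional relative frequencies."""
    probs = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        probs[context] = {item: n / total for item, n in counter.items()}
    return probs


def estimate(sentences):
    """Estimate P(tag | previous tag) and P(word | tag) from tagged sentences."""
    transition_counts = defaultdict(Counter)
    emission_counts = defaultdict(Counter)
    for sentence in sentences:
        previous = START
        for word, tag in sentence:
            transition_counts[previous][tag] += 1
            emission_counts[tag][word] += 1
            previous = tag
    return normalize(transition_counts), normalize(emission_counts)


transition, emission = estimate(corpus)
print(transition[START])   # distribution over sentence-initial tags
print(emission["NOUN"])    # distribution over words tagged NOUN
```

These unsmoothed estimates assign zero probability to anything unseen in the corpus, so a practical tagger would add smoothing and a decoding step on top of them.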

All papers related to Czech tagging are listed below.



References

The file pdt.bib contains a collection of books, papers, and technical reports related to the PDT.
  1. Jan Hajic. Disambiguation of Rich Inflection - Computational Morphology of Czech. Charles University Press - Karolinum, in press.
    Available in: BibTex item
  2. Jan Hajic. Morphological Tagging: Data vs. Dictionaries. In Proceedings of ANLP-NAACL Conference, pp. 94-101, Seattle, Washington, USA, 2000.
    Available in: pdffile, psfile, BibTex item
  3. Jan Hajic, Barbora Hladka. Probabilistic and Rule-Based Tagger of an Inflective Language - a Comparison. In Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 111-118, Washington, USA, 1997.
    Available in: pdffile, psfile, BibTex item
  4. Jan Hajic, Barbora Hladka. Czech Language Processing - POS Tagging. In Proceedings of the First International Conference on Language Resources and Evaluation, pp. 931-936, Granada, Spain, 1998.
    Available in: pdffile, psfile, BibTex item
  5. Jan Hajic, Barbora Hladka. Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset. In Proceedings of COLING-ACL Conference, pp. 483-490, Montreal, Canada, 1998.
    Available in: pdffile, psfile, BibTex item
  6. Jan Hajic, Pavel Krbec, Pavel Kveton, Karel Oliva, Vladimir Petkevic. Serial Combination of Rules and Statistics: A Case Study in Czech Tagging. In Proceedings of ACL'01, Toulouse, France, 2001.
    Available in: pdffile, psfile, BibTex item
  7. Barbora Hladka. Software Tools for Large Czech Corpora Annotation. MSc thesis (in Czech), MFF UK, Prague, Czech Republic, 1994.
    Available in: BibTex item
  8. Barbora Hladka. The Context (not only) for Humans. In Proceedings of the Second International Conference on Language Resources and Evaluation, Athens, Greece, 2000.
    Available in: BibTex item
  9. Barbora Hladka. Czech Language Tagging. PhD thesis, IFAL MFF UK, Prague, Czech Republic, 2000.
    Available in: pdffile, psfile, BibTex item
  10. Barbora Hladka, Kiril Ribarov. POS Tags for Automatic Tagging and Syntactic Structures. In Issues of Valency and Meaning. Studies in Honour of Jarmila Panevova, ed. Eva Hajicova, pp. 226-240, Karolinum, Charles University Press, Prague, Czech Republic, 1998.
    Available in: pdffile, psfile, BibTex item
  11. Jiri Mirovsky. Morphological Annotation of Text: Automatic Disambiguation. MSc thesis (in Czech), MFF UK, Prague, Czech Republic, 1998.
    Available in: pdffile, psfile, BibTex item
