Jirka.LastName@gmail.com (start the email's subject with ESSLLI)
Description of the course
The course introduces students to methods for processing the morphology of natural languages. It starts with a linguistic overview of morphology, discussing features of a wide variety of languages, including fusional languages such as German and Czech and agglutinative languages such as Turkish and Esperanto. A discussion of corpora, tagsets and annotation follows.
The core of the course covers supervised, unsupervised and semi-supervised methods of morphological analysis, morpheme segmentation, lexicon creation, etc. The course covers standard and well-established methods (two-level morphology, the Porter stemmer), but also includes discussion of recent important papers, for example, Yarowsky & Wicentowski 2000, Creutz & Lagus 2007, Monson 2009, and Tepper & Xia 2010.
Slides
Basics of Phonetics, Phonology & Morphology: slides, exercise
Unsupervised and Resource-light Approaches to Computational Morphology: slides
Old Czech MA
Old Czech MA V 0.1 (Wed) - the guesser still needs to be implemented, and its results need to be combined with those of the map-based xanalyzer (see readme.txt for basic instructions)
Old Czech MA V 0.2 (Thu) - adds a cascade of an analyzer and a guesser (only the analysis part is included; the part that reads the guesser specification is missing)
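The idea of a cascade of an analyzer and a guesser can be sketched as follows. This is a minimal illustration only, not the code distributed above: the lexicon entries, the suffix rules, the tag names and all function names are hypothetical, and the example words are modern Czech rather than Old Czech. Each stage is tried in order, and the first stage that returns any analyses wins, so the guesser is consulted only for forms the lexicon does not cover.

```python
from typing import Callable, List, Tuple

Analysis = Tuple[str, str]  # (lemma, tag)
Analyzer = Callable[[str], List[Analysis]]

# Hypothetical toy lexicon (word form -> analyses); the tags are made up.
LEXICON = {
    "ženy": [("žena", "N-gen-sg"), ("žena", "N-nom-pl")],
    "město": [("město", "N-nom-sg")],
}

# Hypothetical guesser rules: (form suffix, lemma suffix, tag).
GUESSER_RULES = [
    ("ami", "a", "N-ins-pl"),
    ("ovat", "ovat", "V-inf"),
]


def lexicon_analyzer(form: str) -> List[Analysis]:
    """Map-based lookup: return the stored analyses, or an empty list."""
    return list(LEXICON.get(form.lower(), []))


def suffix_guesser(form: str) -> List[Analysis]:
    """Guess analyses for unknown forms from their endings."""
    form = form.lower()
    results = []
    for suffix, lemma_suffix, tag in GUESSER_RULES:
        # Require a non-empty stem in front of the suffix.
        if form.endswith(suffix) and len(form) > len(suffix):
            stem = form[: -len(suffix)]
            results.append((stem + lemma_suffix, tag))
    return results


def cascade(form: str, stages: List[Analyzer]) -> List[Analysis]:
    """Return the output of the first stage that produces any analysis."""
    for stage in stages:
        analyses = stage(form)
        if analyses:
            return analyses
    return []


STAGES = [lexicon_analyzer, suffix_guesser]
```

With these toy resources, `cascade("ženy", STAGES)` is answered by the lexicon, while `cascade("rybami", STAGES)` falls through to the guesser. Merging the outputs of both stages instead of stopping at the first hit is an equally plausible design; the sketch assumes the lexicon is more reliable than the guesser.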
SFST -- Stuttgart Finite State Transducer Tools: a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology.
OpenFst Library: a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs).
FSA6.2xx: FSA Utilities toolbox: a collection of utilities to manipulate regular expressions, finite-state automata and finite-state transducers. (SICStus Prolog, SWI-Prolog or YAP)
HFST: Helsinki Finite State Transducer Technology
TnT tagger -- a very efficient statistical part-of-speech tagger
Jitar HMM part-of-speech tagger: an open source Java implementation of a simple trigram HMM part-of-speech tagger, based on TnT