Computational Morphology

Instructors:	Jirka Hana (Charles University in Prague)
	Anna Feldman (Montclair State University)
e-mail:	Jirka.LastName@gmail.com (start the email's subject with ESSLLI)

Description of the course

The course introduces students to methods for processing the morphology of natural languages. It starts with a linguistic overview of morphology, discussing features of a wide variety of languages, including fusional languages such as German and Czech and agglutinative languages such as Turkish and Esperanto. A discussion of corpora, tagsets and annotation follows. The core of the course covers supervised, unsupervised and semi-supervised methods of morphological analysis, morpheme segmentation, lexicon creation, etc. The course covers standard, well-established methods (two-level morphology, the Porter stemmer), but also discusses important recent papers, for example, Yarowsky & Wicentowski 2000, Creutz and Lagus 2007, Monson 2009, and Tepper & Xia 2010.

Slides

Basics of Phonetics, Phonology & Morphology: slides, exercise
Tagset Design and Corpora Annotation: slides
Classical approaches to morphological analysis: slides
Classical approaches to tagging: slides
An example: our approach: slides
Unsupervised and Resource-light Approaches to Computational Morphology: slides

Old Czech MA

Old Czech MA V 0.1 (Wed) - the guesser still needs to be implemented and its results combined with the map-based xanalyzer (see readme.txt for basic instructions)
Old Czech MA V 0.2 (Thu) - adds a cascade of an analyzer and a guesser (only the analysis part; the part that reads the guesser specification is missing)
Old Czech MA V 0.3 (Fri) - adds a simple guesser and stem change rules
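
The idea of a cascade of an analyzer and a guesser can be illustrated with a minimal Python sketch. It is not the code of the Old Czech MA: the toy lexicon, the guesser rules, the tag strings and all function names are hypothetical and chosen only for illustration. The lexicon-based analyzer is tried first, and the guesser is consulted only for forms the lexicon does not cover.

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, str]  # (lemma, tag)

# Hypothetical toy lexicon mapping word forms to analyses
# (illustrative only, not taken from the course materials).
LEXICON: Dict[str, List[Analysis]] = {
    "žena": [("žena", "N.fem.sg.nom")],
    "ženy": [("žena", "N.fem.sg.gen"), ("žena", "N.fem.pl.nom")],
}

# Hypothetical guesser rules: (form ending, lemma ending, tag).
GUESSER_RULES: List[Tuple[str, str, str]] = [
    ("y", "a", "N.fem.pl.nom"),
    ("a", "a", "N.fem.sg.nom"),
]


def lexicon_analyze(form: str) -> List[Analysis]:
    """Map-based analysis: a plain lookup of the form in the lexicon."""
    return list(LEXICON.get(form, []))


def guess(form: str) -> List[Analysis]:
    """Ending-based guesser: strip a known ending and rebuild a lemma."""
    results: List[Analysis] = []
    for ending, lemma_ending, tag in GUESSER_RULES:
        # Require a non-empty stem so a bare ending is never analyzed.
        if form.endswith(ending) and len(form) > len(ending):
            stem = form[: len(form) - len(ending)]
            results.append((stem + lemma_ending, tag))
    return results


def analyze(form: str) -> List[Analysis]:
    """Cascade: use the lexicon first, fall back to the guesser."""
    return lexicon_analyze(form) or guess(form)
```

With these assumptions, analyze("ženy") returns both lexicon analyses, while an unknown form such as "ryby" is handled by the guesser alone. Other ways of combining the two result sets (for example, a union with the guesser's analyses marked as less reliable) are equally possible.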

Recommended Reading

(A star marks papers that are optional)

Overview papers written by us:

J. Hana & A. Feldman (2012): Resource-Light Approaches to Computational Morphology Part 1: Monolingual Approaches. In Language and Linguistics Compass Journal (Computational Linguistics Section), Vol. 6, Issue 10, pp. 622-634, Blackwell.
J. Hana & A. Feldman (2013): ESSLLI 2013 reader

M. F. Porter (1980): An algorithm for suffix stripping (what is now called the Porter stemmer)
L. Karttunen & K. Wittenburg (1983): A two-level morphological analysis of English

Unsupervised and/or lightly supervised approaches:

A. Feldman & J. Hana (2010): A resource-light approach to morpho-syntactic tagging (Chapters 6, 7)
J. Goldsmith (2001): Unsupervised Learning of the Morphology of a Natural Language.
D. Yarowsky & R. Wicentowski (2000): Minimally Supervised Morphological Analysis by Multimodal Alignment.
* R. Wicentowski (2004): Multilingual noise-robust supervised morphological analysis using the WordFrame model.
S. Cucerzan & D. Yarowsky (2002): Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day.
* S. Cucerzan & D. Yarowsky (2003): Minimally Supervised Induction of Grammatical Gender.
P. Schone & D. Jurafsky (2001): Knowledge-free induction of inflectional morphologies.
* P. J. Schone (2001): Toward knowledge-free induction of machine-readable dictionaries.
* M. G. Snover & M. R. Brent (2001): A Bayesian model for morpheme and paradigm identification.
C. Monson et al. (2007): ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis.
* C. Monson (2009): ParaMor: From Paradigm Structure to Natural Language Morphology Induction. (thesis)
* C. Monson et al. (2009): ParaMor and Morpho Challenge 2008.
* T. Tchoukalov, C. Monson & B. Roark (2010): Morphological Analysis by Multiple Sequence Alignment.
Morfessor: M. Creutz & K. Lagus (2007): Unsupervised models for morpheme segmentation and morphology learning.
Kohonen et al. 2010
Tepper and Xia 2008
* M. Creutz (2006): Induction of the Morphology of Natural Language (thesis)

Support

This material is based upon work supported by

U.S. NSF (award #: 0916280) and
Grant Agency Czech Republic (project ID: P406/10/P328).
