Note: There are no homework assignments this year (2017/2018); instead, invest some extra energy into the project. You were, however, supposed to send me pseudocode for an inflectional morphological analyzer during the semester.

Use the Linguistica program to generate signatures (roughly, paradigms) from plain text. You can use a text of your choice or process Švejk (taken from the Prague Municipal Library and converted to UTF-8).
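
To make the notion of a signature concrete, here is a toy Python sketch: it splits each word at every candidate suffix, collects the suffixes each stem occurs with, and groups stems that share exactly the same suffix set. The word list, the suffix list, and the threshold of two suffixes per stem are assumptions for illustration only; Linguistica discovers stems and suffixes from the text itself, and its procedure is considerably more elaborate.

```python
from collections import defaultdict

# Hypothetical toy input: a few Czech noun forms.
WORDS = ["žena", "ženy", "ženu", "ženou",
         "škola", "školy", "školu", "školou",
         "hrad", "hradu", "hrady"]

# Hypothetical candidate suffixes; "" is the null (empty) suffix.
SUFFIXES = ["", "a", "y", "u", "ou"]


def signatures(words, suffixes, min_suffixes=2):
    """Group stems by the exact set of suffixes they combine with.

    Returns a dict mapping a signature (sorted tuple of suffixes)
    to the sorted list of stems that share it.
    """
    stem_to_suffixes = defaultdict(set)
    for word in words:
        for suffix in suffixes:
            # Split only if a non-empty stem remains.
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[:len(word) - len(suffix)]
                stem_to_suffixes[stem].add(suffix)

    result = defaultdict(list)
    for stem, found in stem_to_suffixes.items():
        # Keep only stems seen with several suffixes; this discards
        # accidental splits such as "ženo" + "u".
        if len(found) >= min_suffixes:
            result[tuple(sorted(found))].append(stem)
    return {sig: sorted(stems) for sig, stems in result.items()}


if __name__ == "__main__":
    for sig, stems in sorted(signatures(WORDS, SUFFIXES).items()):
        # Print the null suffix as NULL, e.g. "NULL.u.y ['hrad']".
        print(".".join(s or "NULL" for s in sig), stems)
```

Running it prints NULL.u.y ['hrad'] and a.ou.u.y ['škol', 'žen']: the two feminine nouns share one signature, while hrad gets an incomplete one simply because only three of its forms occur in the input. Gaps of this kind are worth noting in the report.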

You need Python 3.4. If you do not have it, I recommend using the Anaconda distribution.
Install Linguistica from GitHub; you do not need the graphical user interface.
Download the corpus:
wget https://ufal.mff.cuni.cz/~hana/2016/docs/svejk_1_a_2.txt
Run Linguistica's command-line interface:
python3 -m linguistica cli
Tell it to analyze your text:
Path to your file: svejk_1_a_2.txt
For all questions, accept the defaults (you can experiment with them later).
The corpus is analyzed and the results are saved to the lxa_outputs directory.
Write a half-page to one-page report discussing the results: what is surprisingly (in)correct, what was missed, what is linguistically wrong but technically fine, and so on. No prose is needed; bullet points are enough, just so that you have something in hand when we discuss it in class.

An alternative homework (requires my approval)

Write a skeleton of a morphological analyzer for an inflectional language.
Use your favorite language (preferably Java, Python, or C++).
Input: a word
Output: lemma candidates with tag candidates
Be sure to specify all the necessary data structures (for storing paradigms, lexical entries, etc.)
Do not parse any grammar specification; simply specify one or two examples directly in code (e.g., instantiate two paradigms)
Ignore efficiency
Use standard data structures such as maps/hash tables, multimaps, sets, lists, etc.
Write basic comments
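
A skeleton meeting these requirements could look roughly like the following Python sketch. It is only one possible design, assuming the simplest split-and-lookup approach: every split of the word into stem + ending is looked up in a stem lexicon, and the ending is checked against the paradigm of each matching entry. The tag format (N.F.sg.gen), the helper names (make_paradigm, Analyzer, build_demo_analyzer), and the toy lexicon are hypothetical; the two instantiated paradigms follow the Czech model nouns žena and hrad.

```python
from collections import defaultdict, namedtuple

# A paradigm maps each ending to the set of tags it can express.
Paradigm = namedtuple("Paradigm", ["name", "endings"])
# A lexical entry ties a lemma to its stem and the name of its paradigm.
LexicalEntry = namedtuple("LexicalEntry", ["lemma", "stem", "paradigm"])

# Czech case order: 1 nom, 2 gen, 3 dat, 4 acc, 5 voc, 6 loc, 7 ins.
CASES = ["nom", "gen", "dat", "acc", "voc", "loc", "ins"]


def make_paradigm(name, gender, sg, pl):
    """Build a noun paradigm from one ending per case and number.

    Alternative endings for one cell are separated by '|'. Tags use a
    hypothetical format POS.gender.number.case, e.g. 'N.F.sg.gen'.
    """
    endings = defaultdict(set)
    for number, row in (("sg", sg), ("pl", pl)):
        for case, cell in zip(CASES, row):
            for ending in cell.split("|"):
                endings[ending].add("N.{}.{}.{}".format(gender, number, case))
    return Paradigm(name, dict(endings))


class Analyzer:
    def __init__(self):
        self.paradigms = {}               # paradigm name -> Paradigm
        self.lexicon = defaultdict(list)  # stem -> [LexicalEntry] (a multimap)

    def add_paradigm(self, paradigm):
        self.paradigms[paradigm.name] = paradigm

    def add_entry(self, lemma, stem, paradigm_name):
        self.lexicon[stem].append(LexicalEntry(lemma, stem, paradigm_name))

    def analyze(self, word):
        """Return {lemma candidate: sorted tag candidates} for a word form."""
        result = defaultdict(set)
        # Try every split into stem + ending, including the empty ending.
        for i in range(len(word) + 1):
            stem, ending = word[:i], word[i:]
            for entry in self.lexicon.get(stem, []):
                tags = self.paradigms[entry.paradigm].endings.get(ending)
                if tags:
                    result[entry.lemma].update(tags)
        return {lemma: sorted(tags) for lemma, tags in result.items()}


def build_demo_analyzer():
    """Instantiate two paradigms and three lexical entries (toy data)."""
    analyzer = Analyzer()
    # Feminine hard a-stems, model word 'žena'.
    analyzer.add_paradigm(make_paradigm(
        "žena", "F",
        sg=["a", "y", "ě", "u", "o", "ě", "ou"],
        pl=["y", "", "ám", "y", "y", "ách", "ami"]))
    # Masculine inanimate hard stems, model word 'hrad'.
    analyzer.add_paradigm(make_paradigm(
        "hrad", "M",
        sg=["", "u", "u", "", "e", "u|ě", "em"],
        pl=["y", "ů", "ům", "y", "y", "ech", "y"]))
    analyzer.add_entry("žena", "žen", "žena")
    analyzer.add_entry("ryba", "ryb", "žena")
    analyzer.add_entry("hrad", "hrad", "hrad")
    return analyzer


if __name__ == "__main__":
    analyzer = build_demo_analyzer()
    for form in ["ženy", "rybou", "hradu", "xyz"]:
        print(form, analyzer.analyze(form))
```

Because every split is tried, ambiguous forms such as ženy come back with all four possible tags (genitive singular and nominative, accusative, vocative plural). The sketch ignores efficiency, as allowed above, and does not handle stem or spelling alternations (for example, škola has škole, not školě, in the dative and locative singular).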

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
