SIS code:
Semester: winter
E-credits: 3 (winter semester)
Examination: 2/0 MC
Instructor:

Morphological and Syntactic Analysis

Website accompanying the course NPFL094 at MFF UK

Note that the Zoom recordings of the lectures from 2020–2021 are still available (see the file attachments on the course page in SIS); they may be useful if you miss a class.

In the winter semester 2023–2024, the lectures take place every Wednesday at 14:00–15:30 in room SW1 (Malostranské náměstí 25, ground floor). The lectures will be given in English because foreign students are expected to attend. Individual consultations in Czech are available on request.

Caution: The materials below may change.

Slides

Part | PDF | Lab | Date
Morphological Analysis: An Introduction | PDF | | 4.10.2023
Part-of-Speech Tags and Features | PDF | | 25.10.–1.11.2023
Semi-Supervised Lexicon Acquisition | PDF | Lab | 11.10.2023
Unsupervised Morphemic Segmentation | PDF | Lab | 1.–8.11.2023
Chinese Word Segmentation | PDF | | 8.11.2023
Finite-State Morphology | PDF | Lab | 22.11.–13.12.2023
Morphology and Context-Free Grammars | PDF | | 13.12.2023
Morphology and Unification Grammars | PDF | Lab | 20.12.2023
Functional Morphology | PDF | Lab |
Syntax: Universal Dependencies (see also NPFL075 and NPFL120 in the summer semester) | PDF | Lab | 3.1.2024 (up to slide no. 59; continued in NPFL075 in the summer semester)
Syntactic Parsing | PDF | | 10.1.2024

Homework

There will be two or more homework assignments; successfully completing them is the basis for awarding credits for this course. Meeting deadlines is part of the homework score, but you can still get points for late submissions. To protect your privacy, I no longer publish even anonymized results of the class here.

The plan was to use SIS as the communication channel, but so far this does not work for this course (it does work for other courses; I am waiting for a fix from the faculty staff). In any case, I will reply to your e-mail and tell you whether I approve the solution and whether there are any reservations or a reduced score. Please remind me if you have not heard from me within a week after submitting your homework: e-mail communication is not always reliable, and your submission may have ended up in a spam folder. (Hopefully it is just waiting in my to-do queue; if I cannot evaluate your solution immediately, I will try to at least confirm that I received it.)

Individual assignments may differ in the number of points awarded; unless I say otherwise, the maximum score for one assignment is 14 points. The minimum needed to get credits for the course is 20 points. Final grading (master's and bachelor's students only): 20–22 points ⇒ 3; 23–25 points ⇒ 2; 26 or more points ⇒ 1.
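For a quick self-check, the grading thresholds above can be written out as a tiny Python function. This is only an illustrative sketch, not an official tool: the name `final_grade` is hypothetical, and it assumes you simply sum the integer points you have already received for each assignment.

```python
from typing import Optional, Sequence

CREDIT_MINIMUM = 20  # minimum total needed to get credits (from the rules above)


def final_grade(scores: Sequence[int]) -> Optional[int]:
    """Return the final grade (1 = best) for a list of per-assignment scores.

    Hypothetical helper for illustration only. Returns None if the total
    is below the credit minimum, i.e. no credits are awarded.
    """
    total = sum(scores)
    if total >= 26:
        return 1
    if total >= 23:
        return 2
    if total >= CREDIT_MINIMUM:
        return 3
    return None  # fewer than 20 points: no credits


print(final_grade([14, 12]))  # 26 points -> 1
print(final_grade([12, 10]))  # 22 points -> 3
print(final_grade([10, 9]))   # 19 points -> None
```

With the default maximum of 14 points per assignment, two fully solved assignments (28 points) already reach the best grade.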

Homework descriptions can usually be found at the end of a lab guide page; see also the links below.

HW1 was assigned on 11.10.2023 (see here) and is due 14.11.2023.

HW2 was assigned on ?.11.2023 (see here) and is due 14.1.2024.

HW3 (see here) will be assigned only if needed.

Projects

Some people prefer one larger project for the whole semester to several smaller assignments. If this applies to you and you have an idea of what you want to do, talk to me. As long as the topic is (or includes) morphological and/or syntactic analysis of a language (preferably a language for which such resources are not yet available), I will probably approve it. It is also possible to extend the work later, e.g., into a master's thesis.

Study Information System

The following links point to the official course information on the university website.

  • Morfologická a syntaktická analýza (NPFL094)
  • The same course is also offered at the Faculty of Arts (FF UK) under a different code, ATKL00343; if you are enrolled in a study program at FF UK, I should be able to award you the credits as well. (It should also be possible to award credits to students of other faculties.)

Prerequisites

List of recommended courses that you should ideally have passed before this one. These courses are mostly obligatory in the Mathematical Linguistics specialization, but their link to NPFL094 is not formal, i.e., it is not enforced by the study department. If necessary, you can therefore take them in parallel with NPFL094 or even later.

Annotation

Basic methods and algorithms used for morphemic segmentation and for morphological and syntactic (constituency-based, dependency-based, tectogrammatical) analysis of natural languages. During the semester, we will try out some of these approaches in practice on an unknown language, in the form of student mini-projects. The graded credit will be awarded for independent work on these mini-projects.

Syllabus

  1. Sets of morphosyntactic tags, definition of the problems, chunking, constituency and dependency trees.
  2. Supervised and unsupervised morphemic segmentation.
  3. Two-level morphology.
  4. Context-free grammars and chart parsing; their use for morphological analysis.
  5. Unification grammars for morphological analysis.
  6. Morphological disambiguation (tagging).
  7. Syntactic analysis (parsing) and regular expressions.
  8. Probabilistic context-free grammars and chart parsing for syntactic analysis. The Collins and Charniak parsers.
  9. Dependency parsing, non-projectivity, MST parser, Malt parser.
  10. Parser combination.
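Item 9 mentions non-projectivity. As an illustration only, here is a minimal Python sketch that tests whether a dependency tree is projective, using a common definition: an arc is projective if every token lying between the head and the dependent is a descendant of the head, and a tree is projective if all its arcs are. The helper name `is_projective` is hypothetical; it assumes a CoNLL-U-like head encoding (tokens numbered from 1, 0 for the artificial root) and a well-formed tree without cycles.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """Check projectivity of a dependency tree (hypothetical helper).

    heads[i - 1] is the head of token i (tokens are numbered 1..n) and 0 is
    the artificial root. The input is assumed to be a well-formed tree.
    """
    head = [0] + list(heads)  # head[i] for token i; index 0 is the root itself

    def dominates(h: int, d: int) -> bool:
        # Walk up from d towards the root; True if we pass through h.
        while d != 0:
            if d == h:
                return True
            d = head[d]
        return h == 0  # the root dominates every token

    for dep in range(1, len(head)):
        gov = head[dep]
        left, right = min(gov, dep), max(gov, dep)
        # Every token strictly between the head and the dependent must be
        # a descendant of the head; otherwise the arc is non-projective.
        for k in range(left + 1, right):
            if not dominates(gov, k):
                return False
    return True


print(is_projective([2, 0, 2]))  # True: 1 <- 2 -> 3
print(is_projective([3, 0, 2]))  # False: arc 3 -> 1 crosses the root arc of token 2
```

The quadratic walk up the tree is fine for sentence-length inputs; it is meant to make the notion concrete rather than to be efficient.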

Literature

  • James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc.; Redwood City, California, 1994. ISBN 0-8053-0334-0.
  • Adolf Erhart: Základy jazykovědy. Státní pedagogické nakladatelství; Praha, 1990.
  • Kimmo Koskenniemi: Two-level Morphology: A General Computational Model for Word-form Recognition and Production. University of Helsinki, Department of General Linguistics, Publications No. 11; Helsinki, 1983.
  • Kenneth R. Beesley, Lauri Karttunen: Finite State Morphology. CSLI Publications; Stanford, California, 2003.
  • Jan Hajič: Unification Morphology Grammar (doctoral thesis). Univerzita Karlova; Praha, 1994.
  • Jan Hajič: Disambiguation of Rich Inflection. Karolinum; Praha, 2004. ISBN 978-80-246-0282-0.
  • Richard Sproat: Morphology and Computation. Massachusetts Institute of Technology; Cambridge, Massachusetts, 1992.
  • Stuart Shieber: An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes No. 4; Stanford, California, 1986.

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.