SIS code: NPFL094
Semester: winter
E-credits: 3 (winter semester)
Examination: 2/0 MC

Morphological and Syntactic Analysis

Accompanying website to the course NPFL094 at MFF UK

Please note that there will be no classes on December 15 (I will be at a conference) and December 22 (according to a poll, the attendance would be close to zero). See you all in January!

In the winter semester 2016–2017, lectures take place every Thursday at 10:40–12:10 in SU1 (Malostranské náměstí 25, ground floor). The lectures will be given in English (because foreign students are expected to attend). Individual consultations in Czech are available on request.

Caution: The materials below may change.

Note that there is a follow-up course in the summer semester, Morphological and Syntactic Analysis II.

Slides

Part | PowerPoint | PDF | Lab | Date
Morphological Analysis: An Introduction | PowerPoint | PDF | | 6.10.2016
Parts of Speech Tags and Features | PowerPoint | PDF | | 6.–13.10.2016
Semi-Supervised Lexicon Acquisition | PowerPoint | PDF | Lab | 13.10.2016
Unsupervised Morphemic Segmentation | PowerPoint | PDF | Lab | 3.11.2016
Chinese Word Segmentation | PowerPoint | PDF | | 3.–10.11.2016
Finite-State Morphology | PowerPoint | PDF | Lab | 10.–24.11.2016
Morphology and Context-Free Grammars | PowerPoint | PDF | | 24.11.2016
Morphology and Unification Grammars | PowerPoint | PDF | | 1.12.2016
Functional Morphology | PowerPoint | PDF | Lab | 1.12.2016
Syntax: Constituent Sentence Parsing | PowerPoint | PDF | | 8.12.2016
Syntax: Dependency Sentence Parsing | PowerPoint | PDF | Lab | 8.12.2016–5.1.2017
Universal Dependencies | | PDF | | 12.1.2017
Cross-Language Model Transfer | | PDF | | 12.1.2017

Homework

There will be two or more homework assignments, and successful completion of the homework is the basis for awarding credits for this course. Meeting deadlines is part of the homework scoring, but it is possible to get points even for late submissions. Please check this page a week after you submit a homework solution and get in touch if you do not see your submission listed here. E-mail communication is not always reliable and your submission may have ended up in a spam folder.

Individual assignments may vary in the points awarded; unless I say otherwise, the maximum score for one assignment is 14 points. The minimum needed to get credits for the course is 20 points. Final grading (master's and bachelor's students only): 20–22 points ⇒ 3; 23–25 points ⇒ 2; 26 or more points ⇒ 1.
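
The grading thresholds above can be written down as a small function. This is only a minimal sketch in Python: the name `final_grade` is hypothetical, and returning `None` for totals below 20 points is an assumption of the sketch, not an official rule.

```python
from typing import Optional


def final_grade(points: int) -> Optional[int]:
    """Map total homework points to a final grade (1 is the best grade).

    Returns None when the total is below the 20-point minimum,
    i.e. when no credits are awarded (an assumption of this sketch).
    """
    if points >= 26:
        return 1
    if points >= 23:
        return 2
    if points >= 20:
        return 3
    return None


if __name__ == "__main__":
    # A few sample totals around the grade boundaries.
    for total in (14, 20, 23, 26, 30):
        print(total, final_grade(total))
```

For example, a total of 28 points maps to grade 1, while 14 points is below the minimum and yields no grade.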

HW1 was assigned on 13.10.2016 (see here) and was due on 3.11.2016.

HW2 was assigned on 11.11.2016 (see here) and was due on 16.12.2016.

HW3 (optional) was made available on 5.1.2017 (see here) and was due on 15.2.2017.

Abbrev. Name | HW1 Submitted | HW1 Score | HW2 Submitted | HW2 Score | Total
LuBe | 3.11.2016 | 14 | 17.12.2016 | 14 | 28
DaBo | 3.11.2016 | 14 | 17.12.2016 | 14 | 28
KiDr | 3.11.2016 | 12 | 29.12.2016 | 14 | 26
VlGl | 3.11.2016 | 14 | 17.12.2016 | 16 | 30
PeLa | 3.11.2016 | 14 | | | 14
NiMe | 3.11.2016 | 14 | 1.12.2016 | 14 | 28
AdNo | 10.11.2016 | 11 | 20.11.2016 | 15 | 26
JoVá | 19.10.2016 | 14 | 15.12.2016 | 14 | 28

Projects

Note: This section is becoming obsolete. In the winter semester 2016/2017, I will be reshaping the course. We will be in a lab, there will be several smaller tasks and it will be possible to pass the course by completing these tasks. The larger projects are no longer a requirement, but they remain an option during the transitional period.

We will experiment with selected methods in mini-projects during the semester. Parts of the projects will be assigned as homework. Completing the homework (including presenting it at the seminar on an agreed day) will be required for the credits to be awarded. A task could be implementing the rules of a grammar, filling a lexicon with words from a corpus, or possibly implementing a whole tool.

A typical example of a project is creating a simple analyzer (or a part of one) for an unknown language. The student may select any language except Czech, English and languages selected by other students (this last condition may be lifted in well-justified cases). The student need not actually know the language, but they should be able to obtain some description of its grammar (e.g. from a conventional textbook or Wikipedia). If the “unknown” language is actually known to the student, it is of course an advantage.
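
To give a rough idea of what "a simple analyzer" could look like, here is a toy sketch in Python that strips noun endings and looks the stem up in a lexicon. Everything in it is an illustrative assumption: Esperanto is used only because its noun endings are very regular (-o, plural -j, accusative -n), the three-stem lexicon and the names `NOUN_STEMS`, `NOUN_ENDINGS` and `analyze` are hypothetical, and the feature labels merely imitate the Universal Dependencies style. A real project would cover far more of the grammar and fill the lexicon from a corpus.

```python
# Toy lexicon of noun stems (hypothetical sample, not taken from any corpus).
NOUN_STEMS = {"hund", "kat", "libr"}

# Esperanto noun endings with the features they express.
NOUN_ENDINGS = [
    ("ojn", {"Number": "Plur", "Case": "Acc"}),
    ("oj", {"Number": "Plur", "Case": "Nom"}),
    ("on", {"Number": "Sing", "Case": "Acc"}),
    ("o", {"Number": "Sing", "Case": "Nom"}),
]


def analyze(word_form):
    """Return a list of (lemma, features) analyses; empty if the form is unknown."""
    form = word_form.lower()
    analyses = []
    for ending, features in NOUN_ENDINGS:
        if form.endswith(ending):
            stem = form[: -len(ending)]
            # Accept the analysis only if the remaining stem is in the lexicon.
            if stem in NOUN_STEMS:
                lemma = stem + "o"  # the citation form is the nominative singular
                analyses.append((lemma, {"POS": "NOUN", **features}))
    return analyses


if __name__ == "__main__":
    for form in ("hundojn", "katon", "libro", "tablo"):
        print(form, analyze(form))
```

With this lexicon, `analyze("hundojn")` yields the lemma `hundo` with plural accusative features, while `analyze("tablo")` returns an empty list because the stem is not in the toy lexicon.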

There is a wiki page for students of this course. Use it to collect links to resources needed for your mini-project.

There is also a Subversion (svn) repository where you will save your actual work, i.e. the data you collect, the tools you collect or build, the grammar rules and lexicons you design, etc. See the wiki for how to access the repository, and for how to register and get read-write (RW) access to the wiki itself. Contact me for RW access to the svn repository.

Study Information System

The following are links to the officially announced course information on the university website.

  • Morfologická a syntaktická analýza (NPFL094)
  • The same course is offered at the Faculty of Arts (FF UK) under a different code, ATKL00343; if you are enrolled in a study program of the FF UK, I should be able to award the credits to you as well. (It should also be possible to award the credits to students of other faculties.)

Prerequisites

List of recommended courses that you should ideally have passed before this one. These courses are mostly obligatory in the specialization Mathematical Linguistics, but their link to NPFL094 is not formal, i.e. it is not enforced by the study department. If necessary, you can thus take them in parallel with NPFL094 or even later.

Annotation

Basic methods and algorithms used for morphemic segmentation, morphological and syntactic (constituency-based, dependency-based, tectogrammatical) analysis of natural languages. We will try out some of the approaches on an unknown language, as student mini-projects during the semester. Credits will be awarded for contribution to these mini-projects.

Syllabus

  1. Sets of morphosyntactic tags, definition of problems, chunking, constituency and dependency trees.
  2. Supervised and unsupervised morphemic segmentation.
  3. Two-level morphology.
  4. Context-free grammars and chart parsing, and their use in morphological analysis.
  5. Unification grammars for morphological analysis.
  6. Morphological disambiguation (tagging).
  7. Syntactic analysis (parsing) and regular expressions.
  8. Probabilistic context-free grammars and chart parsing for syntactic analysis. The Collins and Charniak parsers.
  9. Dependency parsing, non-projectivity, MST parser, Malt parser.
  10. Parser combination.
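
As a small taste of item 9, the sketch below checks whether a dependency tree is projective. It is only an illustration under stated assumptions: the function name `is_projective` and the input encoding (a list of head indices, tokens numbered from 1, with 0 standing for an artificial root) are choices made for this sketch, and the input is assumed to be a well-formed tree without cycles.

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the head of token i + 1; tokens are numbered from 1 and
    0 denotes the artificial root. The input is assumed to be a valid tree
    (every token reaches the root, no cycles).

    A tree is projective if, for every arc head -> dependent, all tokens
    lying between the two are descendants of the head.
    """

    def is_descendant(node, ancestor):
        # Climb from node towards the root and look for the ancestor.
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        return ancestor == 0  # every token descends from the artificial root

    for dependent in range(1, len(heads) + 1):
        head = heads[dependent - 1]
        left, right = sorted((head, dependent))
        for between in range(left + 1, right):
            if not is_descendant(between, head):
                return False
    return True


if __name__ == "__main__":
    print(is_projective([2, 0, 2]))     # token 2 is the root of a 3-token tree
    print(is_projective([3, 4, 0, 3]))  # arcs 3 -> 1 and 4 -> 2 cross
```

In the second call, token 3 lies between tokens 2 and 4 but is not a descendant of token 4, so the arc 4 → 2 makes the tree non-projective.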

Literature

  • James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc.; Redwood City, California, 1994. ISBN 0-8053-0334-0.
  • Adolf Erhart: Základy jazykovědy. Státní pedagogické nakladatelství; Praha, 1990.
  • Kimmo Koskenniemi: Two-level Morphology: A General Computational Model for Word-form Recognition and Production. University of Helsinki, Department of General Linguistics, Publications No. 11; Helsinki, 1983.
  • Kenneth R. Beesley, Lauri Karttunen: Finite State Morphology. CSLI Publications, 2003.
  • Jan Hajič: Unification Morphology Grammar (PhD thesis). Univerzita Karlova, Praha, 1994.
  • Jan Hajič: Disambiguation of Rich Inflection. Karolinum, Praha, 2004. ISBN 978-80-246-0282-0.
  • Richard Sproat: Morphology and Computation. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1992.
  • Stuart Shieber: An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes No. 4, Stanford, California, 1986.