SIS code: 

NPFL128 Language Technologies in Practice
2020 Summer

Instructor: Jirka Hana
E-mail for homework etc.: (start the email's subject with NPFL128)
Time & Place:

Thu 10:40 - 12:10 S11
(an additional 45 minutes are reserved for individual discussions of projects,
time and place set by mutual agreement)


1  Description and objectives of the course

The course surveys solutions to common NLP tasks ranging from entity recognition to text generation. It evaluates various approaches (machine learning, rules, larger resources, ...) and their combinations.

Most of the course consists of students presenting and discussing papers relevant to a given topic. Each student implements a prototype system solving a particular task.

2  Readings and discussion

In each class, we will discuss one or more papers (sometimes books or dissertations). It is expected that everybody will have read the papers. For each paper, one or two people will be responsible for leading the discussion (in some cases it will be me).

3  Project

There is one programming [Project], due on July 31 (talk to me if you cannot meet the deadline). To submit it:

  • Place all your code on a branch of a repository on Bitbucket or GitHub
  • Create a Pull Request
  • Add me as a reviewer (jhana on Bitbucket, jirka-x1 on GitHub)
  • Send me a non-automated email about this with the subject "NPFL128 Project"

4  Active class participation

"Active participation" refers to your comments and questions during class, your answers to my questions, etc. I do not keep track of whether your answers, etc. are correct, but simply whether or not you participate. It is important that you read the assigned papers (especially if you are leading the discusssion).

5  Grading

Project                      0-50
Active class participation   0-50
Total                        0-100

Grade   Points
1       90-100
2       76-89
3       60-75
4       0-59
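
As a worked illustration of the scale above, the following is a minimal sketch of how the two component scores combine into a grade. The function name `final_grade` is hypothetical and not part of any course tooling; the sketch assumes whole-number points and that the total is the plain sum of the two components.

```python
def final_grade(project: int, participation: int) -> int:
    """Map the two component scores to a grade on the 1-4 scale.

    Assumption: the total is the simple sum of both components,
    each of which lies in the range 0-50.
    """
    if not (0 <= project <= 50 and 0 <= participation <= 50):
        raise ValueError("each component must be between 0 and 50 points")
    total = project + participation
    if total >= 90:
        return 1
    if total >= 76:
        return 2
    if total >= 60:
        return 3
    return 4


if __name__ == "__main__":
    # 45 project points plus 40 participation points give a total of 85.
    print(final_grade(45, 40))
```

Under these assumptions, a total of 85 falls in the 76-89 band.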

6  Schedule


Date  Presenter  Topic  Related/Other papers
20 Feb  me Introduction; Intro to Computational Morphology: [slides]
A. Feldman & J. Hana (2010). A resource-light approach to morpho-syntactic tagging (Chapter 6, 7) [slides]
27 Feb   NO CLASS  
5 Mar  Vilém J. Goldsmith (2001). Unsupervised Learning of the Morphology of a Natural Language.  
12 Mar    NO CLASS  
19 Mar  me D. Yarowsky & R. Wicentowski (2000): Minimally Supervised Morphological Analysis by Multimodal Alignment. R. Wicentowski (2004): Multilingual noise-robust supervised morphological analysis using the WordFrame model.
26 Mar  me P. Schone & D. Jurafsky (2001): Knowledge-free induction of inflectional morphologies. P. J. Schone (2001): Toward knowledge-free induction of machine-readable dictionaries.
2 Apr  Vilém B. Jurish & K. Würzner (2013). Word and Sentence Tokenization with Hidden Markov Models. J. Lang. Technol. Comput. Linguistics, 28, 61-83.  
9 Apr  me  J. Shlens: A Tutorial on Principal Component Analysis  
16 Apr   S. Cucerzan & D. Yarowsky (2002): Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day
S. Cucerzan & D. Yarowsky (2003): Minimally Supervised Induction of Grammatical Gender
23 Apr  Ondřej  Albert Gatt, Emiel Krahmer (2018): Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation    
30 Apr      
7 May    D. Nadeau & S. Sekine (2007): A survey of named entity recognition and classification  
14 May Ondřej Named Entity Recognition - Cucerzan (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data  
21 May   Mihai Surdeanu, David McClosky, Mason R. Smith, Andrey Gusev, and Christopher D. Manning. 2011. Customizing an Information Extraction System to a New Domain