SIS code: 

NPFL128 Language Technologies in Practice
2021 Summer

Instructor: Jirka Hana
e-mail for homeworks etc: (start the email's subject with NPFL128)
Time & Place: The first class is online via Zoom, Fri March 5, 9:00 - 10:30
After that, we will meet for 30 min every Friday at 10:00


1.  Description and objectives of the course

The course surveys solutions to common NLP tasks ranging from entity recognition to text generation. It evaluates various approaches (machine learning, rules, larger resources, ...) and their combinations.

Most of the course consists of students presenting and discussing papers relevant to a given topic. The course also includes implementing a prototype system, typically a replication of a system described in one of the papers.

2. Discussion & remote teaching

Typically this course is organized as a discussion of important papers. Everybody reads all the papers to be able to participate in the discussion. For each paper, one student will be responsible for leading the discussion. 

Discussion is really hard over Zoom, so the course will instead run offline during the lockdown. It will resume in its normal format if and when that becomes possible.

This does not change:

  • everybody chooses two papers and tells me which  
  • everybody reads all papers

This changes:

  • Create a Google doc named (NPFL128 ) and share it with me.
  • For a paper that you did not choose, write several bullets summarizing:
    • the paper in general
    • aspects of the paper you find interesting, useful in other areas, etc.
    • aspects of the paper you think could be improved
    • In total, there should be at least 4 bullets. 
    • Do this before the paper is scheduled and send me a notification about it.
  • For a paper that you chose, do the same, but the summary of the paper in general should cover all the important parts of the paper.
  • I will summarize all these notes, add mine and post the result in a separate document.
  • Use the same Google doc for all the papers. If you prefer, you can write the longer summaries of the papers you chose in separate documents, but still link to them from the main Google doc.

3.  Project

There is one programming [Project], due on July 31 (talk to me if you cannot meet the deadline).

  1. Create an empty repository on Bitbucket or GitHub
  2. Create a branch
  3. Commit all your code to that branch (not to master)
  4. Create a Pull Request
  5. Add me as a reviewer (jhana on Bitbucket, jirka-x1 on GitHub)
  6. Send me a non-automated email about this with the subject "NPFL128 Project"

4. Grading

Project                      0-50
Active class participation   0-50
Total:                       0-100

Grade   Points
1       90-100
2       76-89
3       60-75
4       0-59
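
The grading arithmetic above can be sketched as a small function. This is only an illustration, assuming that points are whole numbers and that the two components are simply added; the name final_grade is hypothetical and not part of the course materials.

```python
def final_grade(project: int, participation: int) -> int:
    """Map project and participation points to the 1-4 grade scale.

    Assumption: both components are integers in the range 0-50 and the
    total is their plain sum, as in the table above.
    """
    if not (0 <= project <= 50 and 0 <= participation <= 50):
        raise ValueError("each component must be between 0 and 50 points")

    total = project + participation
    if total >= 90:
        return 1
    if total >= 76:
        return 2
    if total >= 60:
        return 3
    return 4
```

For example, 40 points for the project and 36 for participation give a total of 76, which falls into the 76-89 band (grade 2).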

5.  Schedule


Discussion on | Summary by | Led by | Topic | Related/Other papers
5 Mar | | me | Introduction; Intro to Computational Morphology: [slides]; A. Feldman & J. Hana (2010). A resource-light approach to morpho-syntactic tagging (Chapters 6, 7) [slides] |
12 Mar | | -- | -- |
19 Mar | 16 Mar | me | D. Yarowsky & R. Wicentowski (2000): Minimally Supervised Morphological Analysis by Multimodal Alignment | R. Wicentowski (2004): Multilingual noise-robust supervised morphological analysis using the WordFrame model
26 Mar | 20 Mar | Niyati | J. Goldsmith (2001). Unsupervised Learning of the Morphology of a Natural Language. |
9 Apr | 27 Mar | Niyati | P. Schone & D. Jurafsky (2001): Knowledge-free induction of inflectional morphologies | P. J. Schone (2001): Toward knowledge-free induction of machine-readable dictionaries.
16 Apr | 10 Apr | me | J. Shlens: A Tutorial on Principal Component Analysis |
23 Apr | 17 Apr | Claésia | S. Cucerzan & D. Yarowsky (2002): Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day | S. Cucerzan & D. Yarowsky (2003): Minimally Supervised Induction of Grammatical Gender
30 Apr | 24 Apr | Anna | Albert Gatt, Emiel Krahmer (2018): Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation |
7 May | 1 May | | |
14 May | 8 May | Saad | D. Nadeau & S. Sekine (2007): A survey of named entity recognition and classification |
21 May | 15 May | | Named Entity Recognition - Cucerzan (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data |
28 May | 22 May | Saad | R. Socher et al. (2013): Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank |
4 June | 29 May | | Kohonen, Virpioja and Lagus (2010): Semi-supervised learning of concatenative morphology. |