SIS code: 

NPFL128 Language Technologies in Practice
2022 Summer

Instructor: Jirka Hana
E-mail for homework etc.: (start the email's subject with NPFL128)
Time & Place: Wed 15:40-17:10 S8


1. Description and objectives of the course

The course surveys solutions to common NLP tasks ranging from entity recognition to text generation. It evaluates various approaches (machine learning, rules, larger resources, ...) and their combinations.

Most of the course consists of students presenting and discussing papers relevant to a given topic. Part of the course is also the implementation of a prototype system, typically replicating a system described in one of the papers.

2. Discussion

This course is organized as a discussion of important papers. Everybody reads all the papers to be able to participate in the discussion. For each paper, one student will be responsible for leading the discussion. 

  • Choose two papers and tell me which ones.
  • Read all the papers.
  • Create a Google Doc named (NPFL128 ) and share it with me. Use the same document for all papers.
  • For each paper that you did not choose, write 4-8 bullets summarizing:
    • the basic idea of the paper
    • aspects of the paper you find interesting, useful in other areas, etc.
    • aspects of the paper you think could be improved
  • Write these bullets before the class in which the paper is scheduled, and send me a notification when they are done.
  • For each paper that you chose, create a presentation and send it to me at least one day before the class.

3. Project

There is one programming [Project], due on July 31 (talk to me if you cannot meet the deadline). Note that using git and pull requests is required.

4. Grading

Project                      0-50
Active class participation   0-50
Total                        0-100

Grade   Points
1       90-100
2       76-89
3       60-75
4       0-59

5. Schedule


Discussion on | Summary by | Topic | Related/other papers

16 Feb | me | Introduction; Intro to Computational Morphology [slides]; A. Feldman & J. Hana (2010): A resource-light approach to morpho-syntactic tagging (Chapters 6, 7) [slides] |
23 Feb | | |
2 Mar | Jacob | D. Yarowsky & R. Wicentowski (2000): Minimally Supervised Morphological Analysis by Multimodal Alignment | R. Wicentowski (2004): Multilingual noise-robust supervised morphological analysis using the WordFrame model
9 Mar | Dominika | J. Goldsmith (2001): Unsupervised Learning of the Morphology of a Natural Language | Linguistica website
16 Mar | Kristýna | P. Schone & D. Jurafsky (2001): Knowledge-free induction of inflectional morphologies | P. J. Schone (2001): Toward knowledge-free induction of machine-readable dictionaries
       | Andrew | J. Shlens: A Tutorial on Principal Component Analysis |
23 Mar | Rishu | J. Pennington, R. Socher, C. Manning: GloVe: Global Vectors for Word Representation |
       | Jan | Kohonen, Virpioja and Lagus (2010): Semi-supervised learning of concatenative morphology |
30 Mar | Klára | S. Cucerzan & D. Yarowsky (2002): Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day | _ (2003): Minimally Supervised Induction of Grammatical Gender
6 Apr | Nalin + Goutham | Albert Gatt, Emiel Krahmer (2018): Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation |
13 Apr | Antonije | Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, and Sameer Singh (2020): Beyond Accuracy: Behavioral Testing of NLP Models with CheckList |
20 Apr | Igbal | Bing Liu (2017): Many Facets of Sentiment Analysis |
       | Anna | Saif M. Mohammad (2017): Challenges in Sentiment Analysis |
27 Apr | Daniela | D. Nadeau & S. Sekine (2007): A survey of named entity recognition and classification |
       | Daragh | Named Entity Recognition: Cucerzan (2007): Large-Scale Named Entity Disambiguation Based on Wikipedia Data |
4 May | Amrita | Mihai Surdeanu, David McClosky, Mason R. Smith, Andrey Gusev, and Christopher D. Manning (2011): Customizing an Information Extraction System to a New Domain |
       | Borek | Steven Feng et al. (2021): A Survey on Data Augmentation Approaches for NLP |
25 May | Ondrej | E2E NLG Challenge: Juraska et al. (2018): Slug2Slug: A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation; Nguyen & Tran (2018): Structure-based Generation System for E2E NLG Challenge |
       | Alexander | Named Entity Recognition: Gao & Cucerzan (2017): Entity Linking to One Thousand Knowledge Bases. ECIR. |