SIS code: 

NPFL128 Language Technologies in Practice
2024 Summer

Instructor: Jirka Hana
e-mail:  Jirka.<my last name> (start the email's subject with NPFL128)
Time & Place: Wednesday 9-10:30 S7 (ignore the part scheduled for 8:10-9:00)


1. Description and Objectives of the Course

The course surveys solutions to common NLP tasks ranging from entity recognition to text generation. It evaluates various approaches (machine learning, rules, larger resources, ...) and their combinations.

Most of the course consists of students presenting and discussing papers relevant to a given topic. The course also includes implementing a prototype system, typically one that replicates a system described in one of the papers.

2. Discussion

This course is organized as a discussion of important papers. Everybody reads all the papers to be able to participate in the discussion. For each paper, one student will be responsible for presenting the paper and leading the discussion.

Not every detail of each paper is important to us now. Do not present aspects that are no longer relevant (e.g., word embeddings have developed considerably, so there is typically no reason to discuss how a paper from 2001 handled them).

  • Choose a paper and tell me which one.
  • Read all papers.
  • Create a Google Docs document named (NPFL128 <your_name>) and share it with me. Use the same document for all papers.
  • For each paper that you did not choose, write 4-8 bullets before the paper is scheduled for discussion, and notify me when they are done. The bullets should summarize:
    • the basic idea of the paper
    • aspects of the paper you find interesting, useful in other areas, etc.
    • aspects of the paper you think could be improved
  • For the selected paper, create a presentation and send it to me by noon on the Monday before the class.

3. Project

There is one programming project, due on July 31 (talk to me if you cannot meet the deadline). Note that using git and pull requests is required.

4. Grading

Project 0-50
Active class participation 0-50
Total: 0-100

Grade Points
1 90-100
2 76-89
3 60-75
4 0-59

5. Schedule

Candidate papers
Discussion on | Presented by | Topic | Slides
Feb 21 | me | Introduction | Zipf's law, Processing morphology
Mar 6 | Tomáš | Cucerzan & Yarowsky (2002): Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day |
 | | Cucerzan & Yarowsky (2003): Minimally Supervised Induction of Grammatical Gender |
Mar 13 | | Yarowsky & Wicentowski (2000): Minimally Supervised Morphological Analysis by Multimodal Alignment |
Mar 20 | Michal | Liu (2017): Many Facets of Sentiment Analysis |
 | Anna | Mohammad (2017): Challenges in Sentiment Analysis |
Mar 27 | Kate | Shlens: A Tutorial on Principal Component Analysis |
Apr 3 | Dan | Ribeiro et al. (2020): Beyond Accuracy: Behavioral Testing of NLP Models with CheckList |
Apr 10 | Danil | Gao et al. (2023): Retrieval-Augmented Generation for Large Language Models: A Survey |
Apr 17 | Kristýna | Tonmoy et al. (2024): A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models |
Apr 24 | Adam | Jehangir et al. (2023): A survey on Named Entity Recognition – datasets, tools, and methodologies |
May 1 | | National Holiday - NO CLASS |
May 8 | | National Holiday - NO CLASS |
May 15 | Maksim | Cucerzan (2007): Large-Scale Named Entity Disambiguation Based on Wikipedia Data |
May 22 | | |