Introduction to Natural Language Processing (Úvod do zpracování přirozeného jazyka)

Each week below lists the date, the lecture, the lab, and the homework assigned (if any).
1: 4/10/2017 JH: Motivation for NLP. Basic notions from probability and information theory.
[slides]
ZŽ: Using basic bash command line tools for text processing. Collecting counts for a bigram language model in bash.
Optional reading:
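A minimal Python sketch of the counting step from this week's lab (the lab itself works with basic bash command-line tools; the corpus file name below is only a placeholder):

    from collections import Counter

    # Whitespace tokenization, roughly what a simple bash pipeline would produce.
    with open("corpus.txt", encoding="utf-8") as f:   # placeholder file name
        tokens = f.read().split()

    # Counts of words and of adjacent word pairs (bigrams).
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Relative-frequency estimate of P(w2 | w1) for the most frequent bigrams.
    for (w1, w2), c in bigrams.most_common(10):
        print(w1, w2, c, c / unigrams[w1])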
2: 11/10/2017 JH: Language models. The noisy channel model.
[slides]
ZŽ: Character encoding.
[slides] (provisional!)
Optional reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
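As a quick pointer to the character-encoding lab above, a small Python sketch showing that the same string occupies a different number of bytes under different encodings, and that decoding with the wrong encoding yields garbled text (the example string is arbitrary):

    text = "Žluťoučký kůň"                      # arbitrary Czech example string

    # The same 13 characters take 19 bytes in UTF-8 but 13 in ISO 8859-2 (Latin-2).
    utf8_bytes = text.encode("utf-8")
    latin2_bytes = text.encode("iso-8859-2")
    print(len(text), len(utf8_bytes), len(latin2_bytes))

    # Decoding bytes with the wrong encoding silently produces mojibake.
    print(utf8_bytes.decode("iso-8859-2", errors="replace"))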
3: 18/10/2017 JH/PP(?): Markov Models.
[slides]
ZŽ: Language Model exercises.
Optional reading:
  • Philipp Koehn's slides about Language Models in Statistical Machine Translation
  • the chapter on N-gram models in the book by Jurafsky & Martin
HW01: word coloring by a trigram model,
deadline 16/11/2017
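A brief Python sketch of the quantity HW01 presumably revolves around: the probability of each word given its two predecessors under an unsmoothed maximum-likelihood trigram model (the exact coloring criterion and any smoothing are assumptions left to the assignment itself):

    from collections import Counter

    def trigram_scores(tokens):
        """Score each word by P(w3 | w1, w2) estimated from the text itself."""
        trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
        bigrams = Counter(zip(tokens, tokens[1:]))
        return [(w3, trigrams[w1, w2, w3] / bigrams[w1, w2])
                for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:])]

    # Words seen in frequent contexts receive higher conditional probability.
    print(trigram_scores("a b c a b c a b d".split()))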
4: 25/10/2017 ZŽ: Language data resources.
[slides]
ZŽ: Evaluation measures in NLP.
[slides] (provisional!)
Register as a user of the Czech National Corpus (you will need it in the following week).
5: 1/11/2017 DZ: Morphological analysis.
[slides]
DZ: Czech National Corpus.
[notes/googledoc]
6: 8/11/2017 DZ: Syntactic analysis.
[slides]
DZ: Syntactically annotated corpora.
[slides]
HW02: valency dictionary of verbs,
deadline 6/12/2017
7: 15/11/2017 PP: Introduction to information retrieval, Boolean model, inverted index.
[slides]
PP: Vector space model, TF-IDF weighting, evaluation.
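A small Python sketch of TF-IDF weighting over a toy collection, using one common variant of the weights (raw term frequency times log inverse document frequency; the lecture may define the formula slightly differently):

    import math
    from collections import Counter

    docs = ["information retrieval with the vector space model".split(),
            "the boolean model of information retrieval".split(),
            "language models for retrieval".split()]

    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    N = len(docs)

    def tf_idf(doc):
        # Raw term frequency times inverse document frequency.
        tf = Counter(doc)
        return {term: tf[term] * math.log(N / df[term]) for term in tf}

    # Terms occurring in every document (e.g. "retrieval") get zero weight.
    for i, doc in enumerate(docs):
        print(i, sorted(tf_idf(doc).items(), key=lambda kv: -kv[1])[:3])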
8: 22/11/2017 PP: Probabilistic models for information retrieval.
[slides]
PP: Language models for information retrieval.
HW03: experiments with an open-source IR toolkit,
[slides]
deadline 30/12/2017
9: 29/11/2017 OB: Machine Translation (overview, evaluation) and word alignment.
[slides]
OB: Word alignment.
Finish the IBM1 implementation as described in the lab above (a sketch of the algorithm follows below),
deadline 15/01/2018
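A compact Python sketch of the standard IBM Model 1 EM loop on a toy parallel corpus, meant only for orientation before the homework; it omits the NULL word and is not the lab's reference implementation:

    from collections import defaultdict

    # Toy parallel corpus: (foreign sentence, English sentence) pairs.
    corpus = [("das haus".split(), "the house".split()),
              ("das buch".split(), "the book".split()),
              ("ein buch".split(), "a book".split())]

    # Uniform initialization of translation probabilities t(f | e).
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(10):                      # a few EM iterations
        count = defaultdict(float)           # expected counts c(f, e)
        total = defaultdict(float)           # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[f, e] for e in es)      # E-step: distribute alignment mass
                for e in es:
                    delta = t[f, e] / norm
                    count[f, e] += delta
                    total[e] += delta
        for (f, e), c in count.items():              # M-step: re-estimate t(f | e)
            t[f, e] = c / total[e]

    print(round(t["haus", "house"], 3), round(t["haus", "the"], 3))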
10: 6/12/2017 OB: Statistical Machine Translation: PBMT and NMT.
[main slides, PBMT decoding (P. Koehn)]
OB: Neural MT with Marian at MetaCentrum.
11: 13/12/2017 OB: Linguistic features in SMT and NMT, Advanced NMT.
[main slides, factored PBMT (P. Koehn), TectoMT (M. Popel), Neural MT (R. Sennrich), ACL 2016 tutorial on Neural MT (T. Luong, K. Cho, C. Manning)]
Continue with Neural MT with Marian at MetaCentrum.
HW04: empirical comparison of NMT attention and your IBM1 alignment,
deadline 15/01/2018
12: 20/12/2017 No lecture. No lab.
13: 3/01/2018 JL: Deep Learning in NLP.
[slides]
JL: Recurrent Neural Networks for checking y/i spelling in Czech.
[slides]
14: 10/01/2018 Written final exam.

Instructors

Homework tasks

Homework rules

Requirements for passing the course