Introduction to Natural Language Processing (Úvod do zpracování přirozeného jazyka)

Week | Lecture | Lab | Homework
1: 3/10/2018 JH: Motivation for NLP. Basic notions from probability and information theory.
ZŽ: Using basic bash command-line tools for text processing. Collecting counts for a bigram language model in bash (see the list of exercises).
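The lab collects bigram counts with bash tools (tr, sort, uniq); as a rough illustration of what those counts feed into, here is an equivalent sketch in Python. The function name and toy sentence are illustrative, not the lab's reference solution:

```python
# Sketch: collect counts for a word-bigram language model.
# (The lab does this with bash pipelines; this is an illustrative
# Python equivalent, not the exercise's reference solution.)
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent token pairs, with sentence-boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return Counter(zip(padded, padded[1:]))

tokens = "the cat sat on the mat".split()
counts = bigram_counts(tokens)

# Relative-frequency (MLE) estimate of P(w2 | w1):
unigrams = Counter(["<s>"] + tokens)
p = {bg: c / unigrams[bg[0]] for bg, c in counts.items()}
```

With this toy sentence, "the" occurs twice, so P(cat | the) comes out as 1/2.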
2: 10/10/2018 PP: Language models. The noisy channel model.
ZŽ: Character encoding.
Optional reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
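The reading's central point, that code points and bytes are different things and that decoding with the wrong encoding garbles text, can be seen in a few lines of Python (the Czech example word is my own):

```python
# Code points vs. bytes: the central point of the Unicode reading.
s = "Žába"                 # Czech word with two non-ASCII letters
assert len(s) == 4          # 4 code points...
utf8 = s.encode("utf-8")
assert len(utf8) == 6       # ...but 6 bytes: Ž and á take 2 bytes each in UTF-8
assert utf8.decode("utf-8") == s   # round-trip with the right encoding is lossless

# Decoding the same bytes with the wrong encoding produces mojibake:
mojibake = utf8.decode("latin-1")  # 'Å½Ã¡ba'
```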
3: 17/10/2018 JH/PP(?): Markov Models.
ZŽ: Language Model exercises.
Optional reading:
  • Philipp Koehn's slides about Language Models in Statistical Machine Translation
  • the chapter on N-gram models in the book by Jurafsky & Martin
HW01: diacritics restoration in Czech texts using a letter-trigram model, deadline: see below
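The building block of HW01 is a letter-trigram model; purely as an illustration of what such counts look like (not the assignment's reference solution), a minimal sketch:

```python
# Sketch: character-trigram counts of the kind a letter-trigram model
# is built from (illustrative only; HW01's spec takes precedence).
from collections import Counter

def letter_trigrams(text, pad="#"):
    """Counts of character trigrams, with padding at the edges."""
    padded = pad * 2 + text + pad
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

tri = letter_trigrams("příliš")
```

Normalizing these counts per two-letter history gives the conditional letter probabilities the restoration model scores candidate diacritizations with.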
4: 24/10/2018 ZŽ: Language data resources.
ZŽ: Evaluation measures in NLP.
[slides] (provisional!)
Register as a user of the Czech National Corpus (you will need it in the following week).

5: 31/10/2018 PP: Introduction to information retrieval, Boolean model, Inverted index. [slides] PP: Vector space model, TF-IDF weighting, Evaluation.
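As a pointer to what the TF-IDF lecture covers, here is a small sketch of one common textbook variant of the weighting (idf = log N/df); the toy documents and function names are illustrative, not the course's reference implementation:

```python
# Sketch of TF-IDF weighting in the vector space model
# (one common textbook variant: tf * log(N / df); illustrative only).
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog barked at the cat".split(),
    "the quiet empty room".split(),
]
N = len(docs)

# Document frequency: in how many documents does each term occur?
df = Counter()
for d in docs:
    df.update(set(d))

def tfidf(doc):
    tf = Counter(doc)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

w = tfidf(docs[0])
# Terms occurring in every document (here "the") get idf = log(1) = 0,
# so they carry no weight; rarer terms like "cat" score higher.
```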
6: 7/11/2018 PP: Probabilistic models for information retrieval. [slides] PP: Language models for information retrieval. HW02: Experiments with an open-source IR toolkit. [slides], deadline T.B.A.
7: 14/11/2018 DZ: Morphological analysis.
DZ: Czech National Corpus.
8: 21/11/2018 DZ: Syntactic analysis.
DZ: Syntactically annotated corpora.
HW03: valency dictionary of verbs, deadline: 31.12.2018
9: 28/11/2018 JL: Introduction to Deep Learning in NLP [slides] JL: Sentence classification in PyTorch [slides], [ipython]
10: 5/12/2018 JL: Applications of deep learning in NLP [slides] JL: Recurrent Neural Networks for checking y/i spelling in Czech in TensorFlow [slides], [ipython]
11: 12/12/2018 OB: Machine Translation (overview, evaluation) and alignment. [slides] OB: Word alignment. Finish IBM1, start working on HW04
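The lab's IBM Model 1 can be written as a short EM loop. This is an illustrative minimal version (uniform initialization, no NULL word, for brevity); HW04's specification takes precedence over any detail here:

```python
# Minimal IBM Model 1 EM sketch (illustrative; omits the NULL word).
from collections import defaultdict

def ibm1(corpus, iterations=10):
    """corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns translation probabilities t[(f, e)] ~ P(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)   # normalization over es
                for e in es:
                    c = t[(f, e)] / z            # E-step: expected count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e) in count:                     # M-step: re-estimate t
            t[(f, e)] = count[(f, e)] / total[e]
    return t

pairs = [("das haus".split(), "the house".split()),
         ("das buch".split(), "the book".split())]
t = ibm1(pairs)
# After a few iterations, t[("haus", "house")] dominates t[("haus", "the")]:
# co-occurrence disambiguates, even though "das" appears in both pairs.
```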
12: 19/12/2018 OB: Statistical Machine Translation: PBMT and NMT. [main slides, extra illustrations: PBMT decoding (P. Koehn)] OB: to be updated: Neural MT with Marian at MetaCentrum. HW04: Empirical comparison of NMT attention and your IBM1 alignment. Deadline 09/01/2019
14: 9/1/2019 OB: Linguistic features in SMT and NMT, Advanced NMT. [to be updated main slides, factored PBMT (P. Koehn), TectoMT (M. Popel), Neural MT (R. Sennrich), ACL 2016 tutorial on Neural MT (T. Luong, K. Cho, C. Manning)]. Finalize HW04, resolve any issues.
Most probably 16/01/2019: written final exam.


Homework tasks

Requirements for passing the course