Feb 19, 2020, 15:40 in SU1
Intro to NLP
Lecturer: Jan Hajič
- Motivation for NLP.
- Basic notions from probability and information theory.
Feb 26, 2020, 15:40 in SU1 + 17:20 in S7
Assignment on diacritics
Lecturer: Jan Hajič
- Language models.
- The noisy channel model.
- Markov models.
March 4, 2020, 15:40 in SU1
Lecturer: Daniel Zeman
- Morphological tags, parts of speech, morphological categories.
- Finite-state morphology.
March 11, 2020, 15:40 in SU1 + 17:20 in S7 CANCELED (GOVERNMENT REGULATION)!
Lecturer: Daniel Zeman
- Dependency vs. phrase-based model.
- Dependency parsing.
March 18, 2020, 15:40 in SU1
Lecturer: Pavel Pecina
- Intro to IR.
- Boolean model.
- Inverted index.
Information retrieval, cont.
March 25, 2020, 15:40 in SU1 + 17:20 in S7
Lecturer: Pavel Pecina
- Probabilistic models for Information Retrieval.
Overview of Language Data Resources
April 1, 2020, 15:40 in SU1
Lecturer: Zdeněk Žabokrtský
- Types of language data resources.
- Annotation principles.
Evaluation measures in NLP
April 8, 2020, 15:40 in SU1 + 17:20 in S7
Lecturer: Zdeněk Žabokrtský
- Purposes of evaluation.
- Evaluation best practices, estimating upper and lower bounds.
- Task-specific measures.
Introduction to Deep Learning in NLP
April 15, 2020, 15:40 in SU1
Deep learning intro
Lecturer: Jindřich Helcl
- Representing sequences by neural networks.
- Vector space representations of words.
Deep learning applications
April 22, 2020, 15:40 in SU1 + 17:20 in S7
DL in applications
Lecturer: Jindřich Helcl
- Examples of DL applied in NLP: information search, unsupervised dictionary induction, image captioning
April 29, 2020, 15:40 in SU1
MT intro+Word Alignment+PBMT
Word Alignment by Philipp Koehn
Recording of the Lecture
Assignment on Word Alignment
Lecturer: Ondřej Bojar
- Introduction to MT.
- MT evaluation.
- Phrase-Based MT.
Machine translation, cont.
May 13, 2020, 15:40 in SU1 + 17:20 in S7
Main Slides: Neural MT
Extra Slides: Transformer
Recording of the Lecture
- Fundamental problems of PBMT.
- Neural machine translation (NMT).
- Brief summary of NNs.
- Sequence-to-sequence, with attention.
- Transformer, self-attention.
- Linguistic features in NMT.
Final written test
May 20, 2020, 15:40 in SU1
The first option for taking the final written exam test (the "předtermín", i.e. an early exam date).
Deadline: 5th April 2020, 23:59
- Implement a program that reads a Czech text with removed diacritics from STDIN and prints the same text with restored diacritics to STDOUT.
- Possible solution: build a Czech corpus of your own (e.g. by downloading a few e-books, news articles, Wikipedia pages, ...) that contains at least 100k words. Create a modified copy of the corpus in which all Czech diacritics are removed. Extract a mapping from words without diacritics to words with diacritics. For out-of-vocabulary words, use a letter-trigram language model.
- Evaluate the accuracy of the restoration as a percentage of correct non-whitespace characters in the output.
- Evaluation datasets - 2 randomly chosen articles from vesmir.cz:
- You can use any programming language as long as it can be compiled/executed on Linux without too much tweaking (esp. without purchasing any license). Recommended choice: Python 3.
- You can use the devtest data as many times as you need, but you should use the etest data for evaluation only once.
- Ideally, organize the execution of the whole experiment into a Makefile that (after typing make all) downloads your training data, as well as the development and evaluation test sets from the links above, trains the model, applies it on the development data and evaluates the accuracy.
- Submission: please zip the whole directory and send it by email to email@example.com by the given deadline.
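The "possible solution" above can be sketched roughly as follows. This is only a minimal illustration under several assumptions: the function names are hypothetical, tokens are split on whitespace, case and punctuation are not handled, and unknown words are simply left unchanged instead of being passed to a letter-trigram model.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text):
    """Remove combining marks, e.g. 'kůň' -> 'kun' (assumes NFC input)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(corpus_words):
    """Map each stripped word to its most frequent diacritized form."""
    counts = defaultdict(Counter)
    for word in corpus_words:
        counts[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}


def restore(text, mapping):
    """Restore word by word; out-of-vocabulary words are left unchanged
    (a simplification: the assignment suggests a letter-trigram model here)."""
    return " ".join(mapping.get(word, word) for word in text.split())


def accuracy(gold, predicted):
    """Percentage of matching non-whitespace characters."""
    gold_chars = [c for c in gold if not c.isspace()]
    pred_chars = [c for c in predicted if not c.isspace()]
    correct = sum(g == p for g, p in zip(gold_chars, pred_chars))
    return 100.0 * correct / len(gold_chars)
```

A program built on this sketch would read STDIN, call `restore`, and write the result to STDOUT; `accuracy` would be run on the devtest data during development and on the etest data only once.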
Deadline: 10th May 2020, 23:59
Information retrieval toolkits
- Learn about publicly available information retrieval toolkits and choose one of them.
- Use the selected toolkit to experiment with various retrieval techniques, pre- and post-processing, and other methods.
- Optimize the system on a test collection and a set of training topics.
- Write a report on your experiments.
Detailed instructions and data will be sent by email.
Deadline: 8th June 2020, 23:59
Word alignment -- IBM Model 1
The goal of the homework is to implement IBM Model 1 for word alignment (a model
that considers only lexical values of words, i.e. the words as they are
written, not their position etc.). For simplicity, we are going to evaluate the output only visually, by looking at the resulting alignments and dictionary.
- Implement the IBM model 1 as shown in pseudocode in the slides.
- Download manual word alignments: czenali.gz (2501 lines)
- The data originally come from: Czech-English manual word alignments.
- I concatenated all files
- I stripped SGML and converted to four tab-delimited columns: English, Czech, sure alignments, possible alignments.
- IMPORTANT: The alignments provided are only for reference, your script must not look at them. They are to contrast your outputs with manual outputs.
- For illustration, you may want to print the manual alignments in plaintext using alitextview.pl:
zcat czenali.gz | ./alitextview.pl --indexed-from-one | less
- Run your implementation (use only the top N sentence pairs and only a few iterations so that your non-optimized code finishes quickly).
- Print the automatic translation dictionary:
- For each pair of English-Czech tokens, print the top three pairs according to P(English|Czech).
- Pick a threshold for the conditional probability and print alignments in the format accepted by alitextview.
- Try improving the resulting alignment and dictionary by various token-level pre-processing (lowercasing, stemming, lemmatization).
To submit your homework, send by e-mail to Ondřej Bojar the following:
- Your IBM Model 1 implementation.
- The settings you used: how many sentences, how many iterations, what type of pre-processing, what threshold.
- Your best translation dictionary, i.e. the top three pairs.
- Your best alignments, i.e. the input to alitextview:
- Source tokens (in your pre-processing).
- Target tokens (in your pre-processing).
- Your best alignment, in the x-y notation.
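For orientation, the EM loop of IBM Model 1 can be sketched as below. This is not the pseudocode from the slides, only a minimal version under stated assumptions: the function names are hypothetical, there is no NULL word, the table stores t(e|c) = P(English|Czech), and the `x-y` links are assumed to be 1-based with the English index first (check the order against what alitextview actually expects).

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t[(e, c)] = P(English word e | Czech word c) with EM.

    pairs: list of (english_tokens, czech_tokens) tuples.
    """
    english_vocab = {e for en, _ in pairs for e in en}
    # Uniform initialisation of the translation table.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (e, c) links
        total = defaultdict(float)  # expected counts of c
        # E-step: distribute each English word over the Czech words.
        for en, cs in pairs:
            for e in en:
                norm = sum(t[(e, c)] for c in cs)
                for c in cs:
                    delta = t[(e, c)] / norm
                    count[(e, c)] += delta
                    total[c] += delta
        # M-step: renormalise the expected counts.
        t = defaultdict(float,
                        {(e, c): count[(e, c)] / total[c] for (e, c) in count})
    return t


def align(en, cs, t, threshold=0.5):
    """Return 1-based 'x-y' links whose t(e|c) reaches the threshold."""
    return [f"{i}-{j}"
            for i, e in enumerate(en, 1)
            for j, c in enumerate(cs, 1)
            if t[(e, c)] >= threshold]
```

On a toy corpus of a few sentence pairs the table already separates the correct word pairs after a handful of iterations; on real data, lowercasing or lemmatization changes the vocabulary and therefore the estimates.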
Pool of possible exam questions
The final written exam tests will be assembled exclusively from questions selected from the following list:
(Warning: the question list may change occasionally during the semester; its final version will be announced at least three weeks before the first exam date.)
Basic notions from probability and information theory.
Language models. The noisy channel model.
- What are the three basic properties of a probability function? (1 point)
- When do we say that two events are (statistically) independent? (1 point)
- Show how Bayes' Theorem can be derived. (1 point)
- Explain the Chain Rule. (1 point)
- Explain the notion of Entropy (formula expected too). (1 point)
- Explain Kullback-Leibler distance (formula expected too). (1 point)
- Explain Mutual Information (formula expected too). (1 point)
- Explain the notion of The Noisy Channel. (1 point)
- Explain the notion of the n-gram language model. (1 point)
- Describe how Maximum Likelihood estimate of a trigram language model is computed. (2 points)
- Why do we need smoothing (in language modelling)? (1 point)
- Give at least two examples of smoothing methods. (2 points)
- What is a morphological tag? List at least five features that are often encoded in morphological tag sets. (1 point)
- List the open and closed part-of-speech classes and explain the difference between open and closed classes. (1 point)
- Explain the difference between a finite-state automaton and a finite-state transducer. Describe the algorithm of using a finite-state transducer to transform a surface string to a lexical string (pseudocode or source code in your favorite programming language). (2 points)
- Give an example of a phonological or an orthographical change caused by morphological inflection (any natural language). Describe the rule that would take care of the change during analysis or generation. It is not required that you draw a transducer, although drawing a transducer is one of the possible ways of describing the rule. (1 point)
- How would you handle irregular inflection in a finite-state toolkit? Give an example (any natural language). (1 point)
- Give an example of a long-distance dependency in morphology (any natural language). How would you handle it in a morphological analyzer? (1 point)
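The information-theoretic notions asked about in the list above (entropy, Kullback-Leibler distance, mutual information) can be checked numerically with a small sketch like the following. The function names are hypothetical, distributions are assumed to be plain dictionaries of probabilities, and logarithms are base 2, so the results are in bits.

```python
import math
from collections import defaultdict


def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x)."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2(p(x) / q(x)).

    Assumes q(x) > 0 wherever p(x) > 0.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2(p(x,y) / (p(x) p(y))).

    joint maps (x, y) pairs to their joint probability.
    """
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), pxy in joint.items():
        px[x] += pxy
        py[y] += pxy
    return sum(pxy * math.log2(pxy / (px[x] * py[y]))
               for (x, y), pxy in joint.items() if pxy > 0)
```

For instance, a fair coin has an entropy of one bit, the divergence of a distribution from itself is zero, and two perfectly correlated fair coins share one bit of mutual information.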
Language data resources.
- What is the difference between an information need and a query?
- What is an inverted index and what are the optimal data structures for it?
- What is a stopword and what is it useful for?
- What is the bag-of-words principle?
- What is the main advantage and disadvantage of the Boolean model?
- Explain the role of the two components in the TF-IDF weighting scheme.
- What is length normalization in the vector space model and what is it useful for?
- What is the precision/recall trade-off?
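As a concrete reference point for the inverted index and Boolean model questions above, here is a minimal sketch. The names are hypothetical, tokenization is a plain lowercase whitespace split, and postings are kept as sorted Python lists so that a conjunctive query can be answered by a linear merge.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge two sorted postings lists in O(len(a) + len(b)) steps."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, terms):
    """Answer a conjunctive query, starting from the shortest postings list."""
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0] if lists else []
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result
```

The result is an unranked set of matching documents, which illustrates both the precision of exact Boolean matching and its lack of any ordering by relevance.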
Evaluation measures in NLP.
- Explain what a corpus is. (1 point)
- Explain what annotation is (in the context of language resources). What types of annotation do you know? (2 points)
- What are the reasons for variability of even basic types of annotation, such as the annotation of morphological categories (parts of speech etc.)? (1 point)
- Explain what a treebank is. Why are trees used? (2 points)
- Explain what a parallel corpus is. What kind of alignments can we distinguish? (2 points)
- What is a sentiment-annotated corpus? How can it be used? (1 point)
- What is a coreference-annotated corpus? (1 point)
- Explain how WordNet is structured. (1 point)
- Explain the difference between derivation and inflection. (1 point)
Vector space models and deep learning in NLP.
- Give at least two examples of situations in which measuring a percentage accuracy is not adequate.
- Explain precision and recall.
- What is the F-measure and what is it useful for?
- What is k-fold cross-validation?
- Explain BLEU (the exact formula not needed, just the main principles).
- Explain the purpose of brevity penalty in BLEU.
- What is Labeled Attachment Score (in parsing)?
- What is Word Error Rate (in speech recognition)?
- What is inter-annotator agreement? How can it be measured?
- What is Cohen's kappa?
- What training objectives can be used while pre-training word embeddings? (1 point)
- Name deep learning techniques that can be used for sentence classification. (1 point)
- What is the common intuition behind using convolutional networks for sentence classification? (1 point)
- Describe the Gated Recurrent Unit. What are its main advantages compared to vanilla recurrent units? (2 points)
- Imagine you were asked to implement a spellchecker component that would check the capitalization in English sentences. How would you solve it using deep learning — what data would you use, how would you process them, what model would you use? (2 points)
- Why is MT difficult from a linguistic point of view? Provide examples and explanation for at least three different phenomena. (2 points)
- Why is MT difficult from a computational point of view? We have mentioned only some of the issues. (1 point)
- Briefly describe at least three methods of manual MT evaluation. (1-2 points)
- Describe BLEU. 1 point for the core properties explained, 1 point for the (commented) formula.
- Describe IBM Model 1 for word alignment, highlighting the EM structure of the algorithm. (1 point)
- Suggest limitations of IBM Model 1. Provide examples of sentences and their translations where the model is inadequate, suggest a solution for at least one of them. (1 point)
- What are the benefits of modelling statistical MT using the Noisy Channel model (Bayes' law)? Use equations. (1 point)
- Explain using equations the relation between the Noisy Channel model and the log-linear model. (2 points)
- In the first step of phrase-based translation, all relevant phrase translations are considered for an input sentence. How were the phrase translations obtained? What scores are associated with phrase translations? Roughly suggest how the scores can be estimated. (2 points)
- Describe the loop of weight optimization for the log-linear model as used in phrase-based MT. (1 point)
- Describe the critical limitation of PBMT that NMT solves. Provide example training data and example input where PBMT is very likely to introduce an error. (1 point)
- Use formulas to highlight the similarity of NMT and LMs. (1 point)
- Describe how words are fed to current NMT architectures and explain why this is necessary. (1 point)
- Sketch the structure of an encoder-decoder architecture of neural MT; remember to describe the components in the picture. (2 points)
- What is the difference in RNN decoder application at training time vs. at runtime? (1 point)
- Explain the idea of representation learning and provide examples where some representation is learnt in NMT, or (as a fallback option) in computer vision. (1 point)
- What problem does attention in NMT address? Provide the key idea of the method. (1 point)
- What problem/task do both RNN and self-attention resolve and what is the main benefit of self-attention over RNN? (1 point)
- Briefly explain (single-head) self-attention (you may or may not use formulas). (1 point)
- Suggest phenomena that multi-head attention may potentially want to reflect (but no guarantee that it will actually do anything like that). (1 point)
- What are the three uses of self-attention in the Transformer model? (1 point)
- Describe at least two possible ways of using linguistic information in NMT. Did it convincingly help, or were there some issues? (1 point)
- Summarize and compare the goal of "classical statistical MT" vs. the goal of neural approaches to MT. (1 point)
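Several of the evaluation questions in the pool above (precision, recall, F-measure, Cohen's kappa) come down to short formulas. The sketch below is one way to compute them; the function names are hypothetical, the F-measure shown is the balanced F1, and kappa is undefined (division by zero) when the expected agreement is exactly 1.

```python
from collections import Counter


def precision_recall_f1(gold, predicted):
    """gold, predicted: sets of items, e.g. relevant vs. retrieved document IDs."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohen_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two annotators labelling the same items.

    p_o is the observed agreement, p_e the agreement expected by chance
    given each annotator's own label distribution.
    """
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Kappa corrects raw agreement for chance: two annotators who agree on three items out of four get a kappa of only 0.5 when half of that agreement is expected by chance.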
- There will be 3 homework assignments.
- For each assignment, you will get points, up to a given maximum
(the maximum is specified with each assignment).
- All assignments will have a fixed deadline (usually in two weeks).
- If you submit the assignment after the deadline, you will get:
- up to 50% of the maximum points if it is less than 2 weeks after the deadline;
- 0 points if it is more than 2 weeks after the deadline.
- Once we check the submitted assignments, you will see the points you got and
the comments from us in:
- To be allowed to take the test (which is required to pass the course), you need to get at least 50% of the total points from
Your grade is based on the average of your performance;
the exam test and the homework assignments are weighted 1:1.
- ≥ 90%: grade 1 (excellent)
- ≥ 70%: grade 2 (very good)
- ≥ 50%: grade 3 (good)
- < 50%: grade 4 (fail)
For example, if you get
600 out of 1000 points for homework assignments (60%)
and 36 out of 40 points for the test (90%),
your total performance is 75% and you get a 2.
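The grading arithmetic in the example above can be written out as a short sketch. The function name is hypothetical, and the sketch covers only the 1:1 averaging and the grade thresholds; it does not model late-submission penalties or the requirement for being admitted to the test.

```python
def final_grade(hw_points, hw_max, test_points, test_max):
    """Average the homework and test percentages (weighted 1:1) and map to a grade."""
    hw_pct = 100 * hw_points / hw_max
    test_pct = 100 * test_points / test_max
    total = (hw_pct + test_pct) / 2
    for threshold, grade in ((90, 1), (70, 2), (50, 3)):
        if total >= threshold:
            return total, grade
    return total, 4  # below 50%: fail
```

Calling `final_grade(600, 1000, 36, 40)` reproduces the example: 60% and 90% average to 75%, which maps to grade 2.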
- Cheating is strictly prohibited and any student found cheating will be punished.
The punishment can involve failing the whole course, or, in grave cases,
being expelled from the faculty.
- Discussing homework assignments with your classmates is OK. Sharing code is
not OK (unless explicitly allowed); by default, you must complete the assignments yourself.
- All students involved in cheating will be punished. E.g. if you share
your assignment with a friend, both you and your friend will be punished.