Final exam test questions

    Questions to the topic "Basic notions from probability and information theory"

  1. What are the three basic properties of a probability function? (1 point)
  2. When do we say that two events are (statistically) independent? (1 point)
  3. Show how Bayes' Theorem can be derived. (1 point)
  4. Explain the chain rule. (1 point)
  5. Explain the notion of entropy (formula expected too). (1 point)
  6. Explain the Kullback-Leibler distance (formula expected too). (1 point)
  7. Explain mutual information (formula expected too; the standard formulas for questions 5-7 are sketched after this list). (1 point)
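
  A sketch of the standard formulas expected in questions 5-7, assuming discrete random variables X and Y, a distribution p, a second distribution q over the same values, a joint distribution p(x,y), and logs to base 2:

      H(X)    = - \sum_x p(x) \log_2 p(x)
      D(p||q) =   \sum_x p(x) \log_2 ( p(x) / q(x) )
      I(X;Y)  =   \sum_{x,y} p(x,y) \log_2 ( p(x,y) / (p(x) p(y)) ) = H(X) - H(X|Y)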

    Questions on the topic "Language models. The noisy channel model"

  1. Explain the notion of the noisy channel. (1 point)
  2. Explain the notion of the n-gram language model. (1 point)
  3. Describe how the maximum likelihood estimate of a trigram language model is computed (see the formula sketch after this list). (2 points)
  4. Why do we need smoothing (in language modelling)? (1 point)
  5. Give at least two examples of smoothing methods. (2 points)
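
  A sketch of the estimate asked for in question 3, assuming c(·) denotes counts in the training corpus:

      p_ML(w_3 | w_1, w_2) = c(w_1 w_2 w_3) / c(w_1 w_2)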

    Questions on the topic "Markov Models"

  1. What are the basic properties of a Markov model? (1 point)
  2. Describe the hidden Markov model. (2 points)
  3. What is a trellis? (1 point)
  4. Explain the Viterbi algorithm (a sketch follows this list). (2 points)
  5. Explain the Baum-Welch algorithm. (2 points)
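
  A minimal Viterbi sketch in Python for question 4, assuming an HMM given as dictionaries of initial, transition, and emission probabilities; the toy states and numbers in the example are illustrative only:

      def viterbi(observations, states, init_p, trans_p, emit_p):
          """Return the most probable state sequence for the observations."""
          # trellis[t][s] = probability of the best path ending in state s at time t
          trellis = [{s: init_p[s] * emit_p[s][observations[0]] for s in states}]
          backpointer = [{}]
          for t in range(1, len(observations)):
              trellis.append({})
              backpointer.append({})
              for s in states:
                  # best predecessor state for s at time t
                  prev = max(states, key=lambda r: trellis[t - 1][r] * trans_p[r][s])
                  trellis[t][s] = (trellis[t - 1][prev] * trans_p[prev][s]
                                   * emit_p[s][observations[t]])
                  backpointer[t][s] = prev
          # follow the backpointers from the best final state
          best = max(states, key=lambda s: trellis[-1][s])
          path = [best]
          for t in range(len(observations) - 1, 0, -1):
              path.append(backpointer[t][path[-1]])
          return list(reversed(path))

      states = ["Rainy", "Sunny"]
      init_p = {"Rainy": 0.6, "Sunny": 0.4}
      trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
                 "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
      emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
                "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
      print(viterbi(["walk", "shop", "clean"], states, init_p, trans_p, emit_p))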

    Questions on the topic "Character Encoding"

  1. Explain the notions "character set" and "character encoding". (1 point)
  2. Explain the main properties of ASCII. (1 point)
  3. What 8-bit encodings do you know for Czech (or other European languages)? How do they differ from ASCII? (1 point)
  4. What is Unicode and what Unicode encodings do you know? (1 point)
  5. What is the relation between UTF-8 and ASCII? (1 point)
  6. How can you detect the encoding of a file? (1 point)
  7. How would you proceed if you were supposed to read a file encoded in ISO-8859-1, add a line number to each line, and store it in UTF-8? (a source code snippet in your favourite programming language is expected here; a sketch follows this list) (1 point)
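
  A minimal sketch of one possible answer to question 7 in Python; the file names are placeholders:

      # Read ISO-8859-1, prepend line numbers, write UTF-8.
      with open("input.txt", encoding="iso-8859-1") as src, \
           open("output.txt", "w", encoding="utf-8") as dst:
          for number, line in enumerate(src, start=1):
              dst.write(f"{number}\t{line}")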

    Questions on the topic "Language Data Resources"

  1. Explain what a corpus is. (1 point)
  2. Explain what annotation is (in the context of language resources). What types of annotation do you know? (2 points)
  3. What are the reasons for the variability of even basic types of annotation, such as the annotation of morphological categories (parts of speech etc.)? (1 point)
  4. Explain what a treebank is. Why are trees used? (2 points)
  5. Explain what a parallel corpus is. What kinds of alignment can we distinguish? (2 points)
  6. What is a sentiment-annotated corpus? How can it be used? (1 point)
  7. What is a coreference-annotated corpus? (1 point)
  8. Explain how WordNet is structured. (1 point)
  9. Explain the difference between derivation and inflection. (1 point)

    Questions on the topic "Evaluation measures in NLP"

  1. Explain the difference between intrinsic and extrinsic evaluation. (1 point)
  2. In what kinds of situations does it make no sense (or is it not possible) to evaluate just a simple one-dimensional accuracy percentage? (1 point)
  3. Give the precision and recall formulas. Give the F-measure formula and explain why it is used for combining precision and recall (above all, why the simpler arithmetic mean is not used instead). (2 points)
  4. What is inter-annotator agreement and when should it be measured? (1 point)
  5. Explain Cohen's kappa (the formulas for questions 3 and 5 are sketched after this list). (1 point)
  6. Explain K-fold cross-validation and when it is typically needed. (1 point)
  7. Give examples of at least three distinct measures used in NLP, e.g. for automatic speech recognition, machine translation, information retrieval, or dependency parsing. (2 points)
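
  A sketch of the standard formulas behind questions 3 and 5, assuming TP, FP, FN are true positive, false positive, and false negative counts, P_o the observed inter-annotator agreement, and P_e the agreement expected by chance:

      P = TP / (TP + FP)
      R = TP / (TP + FN)
      F_1 = 2PR / (P + R)    (the harmonic mean of P and R)
      \kappa = (P_o - P_e) / (1 - P_e)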

    Questions on the topic "Morphological Analysis"

  1. What is a morphological tag? List at least five features that are often encoded in morphological tag sets. (1 point)
  2. List the open and closed part-of-speech classes and explain the difference between open and closed classes. (1 point)
  3. Explain the difference between a finite-state automaton and a finite-state transducer. Describe the algorithm of using a finite-state transducer to transform a surface string to a lexical string (pseudocode or source code in your favourite programming language; a sketch follows this list). (2 points)
  4. Give an example of a phonological or an orthographical change caused by morphological inflection (any natural language). Describe the rule that would take care of the change during analysis or generation. It is not required that you draw a transducer, although drawing a transducer is one of the possible ways of describing the rule. (1 point)
  5. How would you handle irregular inflection in a finite-state toolkit? Give an example (any natural language). (1 point)
  6. Give an example of a long-distance dependency in morphology (any natural language). How would you handle it in a morphological analyzer? (1 point)
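
  A minimal sketch for question 3: applying a finite-state transducer to a surface string and collecting every lexical string it accepts. The encoding of the toy transducer (transition table, epsilon moves, final states) is an assumption made for illustration:

      def transduce(fst, state, surface, lexical=""):
          """Yield every lexical string the FST can produce for the surface string."""
          if not surface and state in fst["final"]:
              yield lexical
          # transitions: (state, surface_symbol) -> list of (next_state, lexical_output);
          # an empty surface_symbol is an epsilon move that consumes no input
          for (src, sym), arcs in fst["transitions"].items():
              if src != state:
                  continue
              for nxt, out in arcs:
                  if sym == "":  # epsilon transition
                      yield from transduce(fst, nxt, surface, lexical + out)
                  elif surface and surface[0] == sym:
                      yield from transduce(fst, nxt, surface[1:], lexical + out)

      # Toy analyzer mapping surface "cats" to lexical "cat+N+Pl" (and "cat" to "cat+N+Sg"):
      fst = {
          "final": {3},
          "transitions": {
              (0, "c"): [(1, "c")],
              (1, "a"): [(1, "a")],
              (1, "t"): [(2, "t")],
              (2, "s"): [(3, "+N+Pl")],
              (2, ""):  [(3, "+N+Sg")],
          },
      }
      print(list(transduce(fst, 0, "cats")))  # ['cat+N+Pl']
      print(list(transduce(fst, 0, "cat")))   # ['cat+N+Sg']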

    Questions on the topic "Syntactic Analysis"

  1. Describe dependency trees and constituent trees, the differences between them, and phenomena that must be addressed when converting between them. (2 points)
  2. Give an example of a sentence (in any natural language) that has at least two plausible, semantically different syntactic analyses (readings). Draw the corresponding dependency trees and explain the difference in meaning. Are there other additional readings that are less probable but still grammatically acceptable? (2 points)
  3. Explain the notion of non-projectivity. Why is it important in syntactic parsing? Give an example of a sentence (in any natural language) with at least one non-projective dependency relation; draw the dependency structure. (2 points)
  4. What is coordination? Why is it difficult in dependency parsing? How would you capture coordination in a dependency structure? What are the advantages and disadvantages of your solution? (1 point)
  5. What is ellipsis? Why is it difficult in parsing? Give examples of different kinds of ellipsis (any natural language). (1 point)
  6. Explain the notion of verb valency. If you had a valency dictionary, could it help with syntactic analysis? How? (1 point)
  7. How would you construct a syntactic parser for a language for which you have a reasonably large treebank (constituent-based or dependency-based, the choice is yours)? Outline the backbone of the algorithm you would use (one possible backbone is sketched after this list). (1-2 points)
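
  One possible backbone for question 7, assuming a constituent treebank whose trees are already binarized to Chomsky normal form; the nested-tuple tree format is an assumption for illustration. Estimate a PCFG by maximum likelihood, then parse with CKY:

      from collections import defaultdict
      from math import log

      def estimate_pcfg(trees):
          """MLE rule probabilities from trees given as nested tuples, e.g.
          ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars")))."""
          rule_count, lhs_count = defaultdict(int), defaultdict(int)
          def visit(node):
              label, children = node[0], node[1:]
              rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
              rule_count[(label, rhs)] += 1
              lhs_count[label] += 1
              for c in children:
                  if not isinstance(c, str):
                      visit(c)
          for t in trees:
              visit(t)
          return {rule: cnt / lhs_count[rule[0]] for rule, cnt in rule_count.items()}

      def cky(words, grammar):
          """Return the log probability of the best S-rooted parse, or None."""
          n = len(words)
          chart = defaultdict(dict)  # chart[(i, j)][label] = best log prob of span i..j
          for i, w in enumerate(words):
              for (lhs, rhs), p in grammar.items():
                  if rhs == (w,):
                      chart[(i, i + 1)][lhs] = log(p)
          for width in range(2, n + 1):
              for i in range(n - width + 1):
                  j = i + width
                  for k in range(i + 1, j):
                      for (lhs, rhs), p in grammar.items():
                          if len(rhs) == 2 and rhs[0] in chart[(i, k)] and rhs[1] in chart[(k, j)]:
                              score = log(p) + chart[(i, k)][rhs[0]] + chart[(k, j)][rhs[1]]
                              if score > chart[(i, j)].get(lhs, float("-inf")):
                                  chart[(i, j)][lhs] = score
          return chart[(0, n)].get("S")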

    Questions on the topic "Information retrieval"

  1. What is the difference between an information need and a query?
  2. What is an inverted index and what are the optimal data structures for it?
  3. What is a stopword and what is it useful for?
  4. What is the bag-of-words principle?
  5. What are the main advantage and disadvantage of the Boolean model?
  6. Explain the role of the two components in the TF-IDF weighting scheme.
  7. What is length normalization in the vector space model and what is it useful for?
  8. What is the precision/recall trade-off?
  9. Give the basic formula used to rank documents in the language model approach to IR (formulas for questions 6 and 9 are sketched after this list).
  10. What is smoothing (in language modelling)? Name three methods.
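
  A sketch of the standard formulas behind questions 6 and 9, assuming tf_{t,d} is the frequency of term t in document d, df_t the number of documents containing t, N the number of documents, and p(t | M_d) a smoothed unigram language model of document d:

      w_{t,d} = tf_{t,d} \cdot \log ( N / df_t )        (TF-IDF)
      score(q, d) = \prod_{t \in q} p(t | M_d)          (query likelihood ranking)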

    Questions on the topic "Machine Translation (overview, evaluation) and word alignment"

  1. Why is MT difficult from a linguistic point of view? Provide examples and explanations for at least three different phenomena. (2 points)
  2. Why is MT difficult from a computational point of view? We have mentioned only some of the issues. (1 point)
  3. Briefly describe at least three methods of manual MT evaluation. (1-2 points)
  4. Describe BLEU (a formula sketch follows this list): 1 point for the core properties explained, 1 point for the (commented) formula.
  5. Describe IBM Model 1 for word alignment, highlighting the EM structure of the algorithm. (1 point)
  6. Suggest limitations of IBM Model 1. Provide examples of sentences and their translations where the model is inadequate, and suggest a solution for at least one of them. (1 point)
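
  A sketch of the standard formulas behind questions 4 and 5, assuming p_n is the modified n-gram precision, c and r the candidate and reference lengths, l_e and l_f the lengths of the English and foreign sentences, and t(f|e) the lexical translation probability:

      BLEU = BP \cdot \exp ( \sum_{n=1}^{4} (1/4) \log p_n ),   BP = min(1, \exp(1 - r/c))
      p(f | e) = \epsilon / (l_e + 1)^{l_f} \cdot \prod_{j=1}^{l_f} \sum_{i=0}^{l_e} t(f_j | e_i)    (IBM Model 1)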

    Questions on the topic "Statistical Machine Translation: PBMT vs. NMT"

  1. Explain, using equations, the relation between the noisy channel model and the log-linear model. (2 points)
  2. In the first step of phrase-based translation, all relevant phrase translations are considered for an input sentence. How were the phrase translations obtained? What scores are associated with phrase translations? Roughly suggest how the scores can be estimated. (2 points)
  3. Describe the loop of weight optimization for the log-linear model as used in phrase-based MT. (1 point)
  4. Describe the critical limitation of PBMT that NMT solves. Provide example training data and an example input where PBMT is very likely to introduce an error. (1 point)
  5. Use formulas to highlight the similarity of NMT and LMs (equations for questions 1 and 5 are sketched after this list). (1 point)
  6. Describe how words are fed to current NMT architectures and explain why this is necessary. (1 point)
  7. Sketch the structure of an encoder-decoder architecture for neural MT; remember to describe the components in the picture. (2 points)
  8. What problem does attention in NMT address? Provide the key idea of the method. (1 point)
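
  A sketch of the equations behind questions 1 and 5. The noisy channel model and its log-linear generalization:

      e* = argmax_e p(e | f) = argmax_e p(f | e) p(e)
      e* = argmax_e \sum_m \lambda_m h_m(e, f)
      (the noisy channel is the special case h_1 = \log p(f|e), h_2 = \log p(e), \lambda_1 = \lambda_2 = 1)

  NMT factorizes the target sentence exactly like a language model, only conditioned additionally on the source:

      LM:   p(e)     = \prod_i p(e_i | e_1 ... e_{i-1})
      NMT:  p(e | f) = \prod_i p(e_i | e_1 ... e_{i-1}, f)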

    Questions on the topic "Linguistic features in SMT and NMT"

  1. Provide 3 examples of factored phrase-based MT setups addressing various linguistic phenomena and explain their potential benefits. (2 points)
  2. When factors are used for target-side morphology, what are they meant to solve, and how and why can they fail? (2 points)
  3. Sketch the idea of the reverse self-training approach. What benefits does it bring? (1 point)
  4. Why is non-projectivity important in MT? Provide an example. (1 point)
  5. Describe what a 'transfer-based' MT architecture means and illustrate the design of the deep-syntactic layer used for Czech-English translation. What are the potential benefits of transferring at this deep-syntactic layer? (2 points)
  6. What are the problems of transfer-based MT? (1 point)
  7. Describe one possible approach to combining an external MT system with a phrase-based MT system. What benefits can this approach have? (2 points)
  8. What is multi-task training? Describe and provide at least two examples of multi-task training for a sequence-to-sequence architecture (achieved simply by data mangling, not specific model architectures). (1 point)
  9. What is catastrophic forgetting? Provide a specific example. (1 point)

    Questions on the topic "Vector space models (word embeddings)"

  1. What training objectives can be used while pre-training word embeddings? (1 point)
  2. Name deep learning techniques that can be used for sentence classification. (1 point)
  3. What is the common intuition behind using convolutional networks for sentence classification? (1 point)
  4. Describe the gated recurrent unit (the update equations are sketched after this list). What are its main advantages compared to vanilla recurrent units? (2 points)
  5. Imagine you were asked to implement a spellchecker component that would check the capitalization in English sentences. How would you solve it using deep learning: what data would you use, how would you process them, and what model would you use? (2 points)
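
  A sketch of the standard GRU update equations for question 4, assuming input x_t, previous state h_{t-1}, sigmoid \sigma, and element-wise product \odot (bias terms omitted):

      z_t = \sigma(W_z x_t + U_z h_{t-1})                  (update gate)
      r_t = \sigma(W_r x_t + U_r h_{t-1})                  (reset gate)
      \tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))   (candidate state)
      h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t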