NPFL067 Introduction to Statistical NLP I

Winter semester exam: Jan 14, 2010 12:20pm (60 min.), S4

Name: ___________________________________________ Year: _____________

If asked to compute something for which you have the numbers, that really means to compute the final number, not just to write the formula. If asked for a formula, write down the formula.

1. Probability

Let S = { a, b, c } (the sample space), and p be the joint distribution on a sequence of two events (i.e. on S x S, ordered). If you know that p(a,a) [a followed by a] = 0.25, p(c,c) [c followed by c] = 0.25, p(b,a) [b followed by a] = 0.125, p(b,b) [b followed by b] = 0, p(a,c) [a followed by c] = 0.25, pL(a) [unigram probability of a as a left-hand bigram member] = .5, and pR(b) [unigram probability of b as the right-hand bigram member] = 0.125, is it enough to compute p(b|c) (i.e., the probability of seeing b if we already know that the preceding event generated c)?

2. Estimation and Cross-entropy

Use the bigram distribution p from question 1.
1 The cross-entropy and perplexity computation is the only one here for which you might need a calculator; but it is ok if you use an expression (use the appropriate (integer) numbers, though!).

3. Mutual information

4. Smoothing and the sparse data problem

5. Classes based on Mutual Information

Suppose you have the following data:

It is interesting to watch , at least from the social point of view , how the current financial crisis differs from the big financial meltdown following the 1929 stock market crash .

What is the best pair of candidates for the first merge, if you use the greedy algorithm for classes based on bigram mutual information? Use your judgment, not computation; in case of two or more best candidates, write as many as you can find.

6. Hidden Markov Models

Now check if you have filled in your name and year of study (graduate students please write "PGS"). Also, please carefully check your answers and hand the exam in.