600.465 Introduction to NLP (Fall 2000)
Midterm Exam Answers
Date: Oct 30, 2000 2pm (30 min.)
If asked to compute something for which you have the numbers, that
really means to compute the final number, not just to write the
formula. If asked for a formula, write down the formula.
1. Probability
Let S = { a, b, c } (the sample space), and p be the joint
distribution on a sequence of two events (i.e. on S x S, ordered). If
you know that p(a,a) [a followed by a] = 0.25, p(c,c) [c followed by
c] = 0.25, p(b,a) [b followed by a] = 0.125,
p(b,b) [b followed by b] = 0, p(a,c) [a followed by c]
= 0.25, pL(a) [unigram probability of a as a left-hand bigram member] = .5,
and pR(b) [unigram probability of b as the right-hand bigram
member] = 0.125, is this enough information to compute
p(b|c) (i.e., the probability of seeing b if we already know that the
preceding event was c)?
- Yes / No: __Yes__
- why? __The five given bigram probabilities sum up to 0.875; the unigram
constraint pL(a) = 0.5 forces p(a,b) = 0, and then pR(b) = 0.125 forces
p(c,b) = 0.125 (from p(a,b) + p(b,b) + p(c,b) = pR(b)). That uses up all
the remaining mass, so the last cells p(b,c) and p(c,a) must be 0;
therefore p is fully determined.__
__We can then get pL(c) = sum over i of p(c,i) = 0.375, and
p(b|c) = p(c,b) / pL(c).__
- If yes, compute: p(b|c) = ___1/3___ ( = p(c,b) / pL(c) = 0.125 / 0.375)
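The chain of constraints can be checked mechanically. A minimal Python sketch (an addition, not part of the exam key) that fills in the joint table exactly as argued above:

```python
# Sample space S = {a, b, c}; p is a joint distribution over ordered pairs.
S = ["a", "b", "c"]

# The five bigram probabilities given in the question (left, right).
p = {("a", "a"): 0.25, ("c", "c"): 0.25, ("b", "a"): 0.125,
     ("b", "b"): 0.0, ("a", "c"): 0.25}

# pL(a) = 0.5 forces p(a,b) = 0.5 - p(a,a) - p(a,c) = 0.
p[("a", "b")] = 0.5 - p[("a", "a")] - p[("a", "c")]

# pR(b) = 0.125 forces p(c,b) = 0.125 - p(a,b) - p(b,b) = 0.125.
p[("c", "b")] = 0.125 - p[("a", "b")] - p[("b", "b")]

# Everything must sum to 1; the leftover mass is 0, so the two
# untouched cells p(b,c) and p(c,a) are both 0.
rest = 1.0 - sum(p.values())          # = 0.0
p[("b", "c")] = p[("c", "a")] = 0.0

# p(b|c) = p(c,b) / pL(c), where pL(c) = sum_i p(c,i).
pL_c = sum(p[("c", x)] for x in S)    # 0.375
print(p[("c", "b")] / pL_c)           # p(b|c) = 1/3
```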
2. Estimation and Cross-entropy
Use the bigram distribution p from question 1.
- Write one example of a data sequence which faithfully follows the
distribution (i.e., a training data from which we would get the above
bigram distribution using the MLE method):
E.g.: __a c c b a a a c c__
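As a sanity check (an addition, not part of the exam key), MLE bigram estimates from that nine-event sequence reproduce the distribution of question 1:

```python
from collections import Counter

seq = "a c c b a a a c c".split()
bigrams = list(zip(seq, seq[1:]))   # 8 consecutive pairs
counts = Counter(bigrams)
mle = {bg: c / len(bigrams) for bg, c in counts.items()}
# (a,a), (a,c), (c,c) each occur twice (p = 0.25);
# (b,a) and (c,b) once each (p = 0.125); all others 0.
print(mle)
```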
- What is the cross-entropy Hdata(p) in bits and the
perplexity[1] Gdata(p) of the bigram distribution from
question 1 if computed against the following data (use the data-oriented
formula for the conditional distribution derived from p):
data = b a a a
Hdata(p) = ___5/4___ ( = -(1/4)(log2 pL(b) + log2 p(a|b) + log2 p(a|a)
+ log2 p(a|a)) = -(1/4)(-3 + 0 - 1 - 1), scoring the first token with pL)
Gdata(p) = ___2^(5/4) = 2 x 2^(1/4)___ (twice the fourth root of 2)
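The computation can be reproduced in a few lines; as in the answer above, the first token is scored with the left-unigram probability pL (the conditional probabilities come from the distribution of question 1):

```python
import math

# Left-unigram and joint probabilities from the question-1 distribution.
pL = {"a": 0.5, "b": 0.125, "c": 0.375}
p_joint = {("b", "a"): 0.125, ("a", "a"): 0.25}

def p_cond(right, left):
    """p(right | left) = p(left, right) / pL(left)."""
    return p_joint[(left, right)] / pL[left]

data = "b a a a".split()
logs = [math.log2(pL[data[0]])]                       # first token: pL(b)
logs += [math.log2(p_cond(w2, w1)) for w1, w2 in zip(data, data[1:])]
H = -sum(logs) / len(data)    # cross-entropy: 5/4 bits
G = 2 ** H                    # perplexity: 2^(5/4)
print(H, G)
```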
3. Mutual information
Use the bigram distribution from question 1.
- What is the pointwise mutual information of b and a (in this order)?
Ipointwise(b,a) = ___3 - log2(3) = 1.415...___ (
= log2(p(b,a) / (pL(b) pR(a))) =
log2(0.125 / ((0.125)(0.375))) = log2(8/3) )
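The same arithmetic as a one-off check (the unigram marginals come from the question-1 table):

```python
import math

p_ba = 0.125                 # p(b,a)
pL_b = 0.125                 # pL(b) = p(b,a) + p(b,b) + p(b,c)
pR_a = 0.25 + 0.125 + 0.0    # pR(a) = p(a,a) + p(b,a) + p(c,a) = 0.375

pmi = math.log2(p_ba / (pL_b * pR_a))
print(pmi)   # log2(8/3) = 3 - log2(3), about 1.415
```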
4. Smoothing and the sparse data problem
- Name three methods of smoothing:
- ____"Add 1" (or "Add lambda")_____________________________
- ____Good-Turing___________________________________________
- ____Linear Interpolation__________________________________
If you were to design a trigram language model, how would the final smoothed
distribution be defined if you use the linear interpolation smoothing method?
- ____p3'(wi|wi-2,wi-1) =
lambda3 p3(wi|wi-2,wi-1) +
lambda2 p2(wi|wi-1) +
lambda1 p1(wi) + lambda0 (1/|V|),
where lambda0 + lambda1 + lambda2 + lambda3 = 1___
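The smoothed distribution can be sketched directly from the formula above. The component tables p3/p2/p1 and the lambda weights are assumed given here (in practice the lambdas are typically trained by EM on held-out data):

```python
def p_smoothed(w, h2, h1, p3, p2, p1, lambdas, V):
    """Linearly interpolated trigram probability p'(w | h2, h1).

    p3, p2, p1: dicts holding the MLE trigram/bigram/unigram estimates
    (missing keys mean probability 0);
    lambdas = (l0, l1, l2, l3), non-negative, summing to 1;
    V: vocabulary size, for the uniform floor l0/|V|.
    """
    l0, l1, l2, l3 = lambdas
    return (l3 * p3.get((h2, h1, w), 0.0)
            + l2 * p2.get((h1, w), 0.0)
            + l1 * p1.get(w, 0.0)
            + l0 / V)
```

Because the lambdas sum to 1 and each component is a proper distribution, the result is again a proper distribution, and the l0/|V| term guarantees it is nonzero for every word in the vocabulary.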
5. Classes based on Mutual Information
Suppose you have the following data:
It is interesting to watch , at least from
the foreign policy perspective , how the wannabe president George W .
differs from his father , the former president George Bush .
What is the best pair of candidates for the first merge, if you use
the greedy algorithm for classes based on bigram mutual information
(i.e. the homework #2 algorithm)? Use your judgment, not computation;
in case of two or more best candidates, write as many as you can
find.
Two solutions:
-
Word 1: ___W________________
Word 2: ___Bush_____________
-
Word 1: ___wannabe__________
Word 2: ___former___________
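The greedy algorithm itself merges the pair whose merge loses the least bigram mutual information; the intuition behind both answers is that two words with identical left and right contexts are interchangeable, so merging them loses nothing. A small sketch that finds exactly such pairs in the exam sentence (a heuristic illustration of that intuition, not the MI-loss computation itself):

```python
from collections import defaultdict

text = ("It is interesting to watch , at least from the foreign policy "
        "perspective , how the wannabe president George W . differs from "
        "his father , the former president George Bush .").split()

# Collect the multiset of left and right neighbours of every token.
left, right = defaultdict(list), defaultdict(list)
for prev, word in zip(text, text[1:]):
    left[word].append(prev)
    right[prev].append(word)

# Distinct word pairs whose left AND right context multisets coincide.
words = sorted(set(text))
pairs = [(u, v) for i, u in enumerate(words) for v in words[i+1:]
         if sorted(left[u]) == sorted(left[v])
         and sorted(right[u]) == sorted(right[v])]
print(pairs)   # [('Bush', 'W'), ('former', 'wannabe')]
```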
6. Hidden Markov Models
- What is the Trellis algorithm good for? (Use max. 5 sentences for
the answer.)
____To find the probability of a string based on____________________________
____a parametrized (trained) HMM which presumably_______________
____generated the string._______________________________________________
_________________________________________________________________________
_________________________________________________________________________
- What is the Viterbi algorithm good for? (Use max. 5 sentences for
the answer.)
____To find the best path through a state sequence graph given________________
____a parametrized (trained) HMM and some input data presumably_______________
____generated by that HMM._______________________________________________
_________________________________________________________________________
_________________________________________________________________________
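Both answers can be illustrated side by side on a toy HMM; all states, parameters, and observation symbols below are invented for illustration. The trellis (forward) recursion sums over predecessor states, while Viterbi maximizes and keeps backpointers:

```python
def trellis_and_viterbi(obs, states, start, trans, emit):
    """Return (total probability of obs, best-path probability, best path)."""
    # Initialization: first observation from each possible start state.
    fwd = {s: start[s] * emit[s][obs[0]] for s in states}
    vit = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        # Trellis: SUM over predecessors -> string probability.
        fwd = {s: sum(fwd[r] * trans[r][s] for r in states) * emit[s][o]
               for s in states}
        # Viterbi: MAX over predecessors -> best state sequence.
        vit = {s: max(((vit[r][0] * trans[r][s] * emit[s][o],
                        vit[r][1] + [s]) for r in states),
                      key=lambda t: t[0])
               for s in states}
    total = sum(fwd.values())
    best_prob, best_path = max(vit.values(), key=lambda t: t[0])
    return total, best_prob, best_path

# Hypothetical two-state HMM with observation symbols "x" and "y".
states = ("H", "C")
start = {"H": 0.6, "C": 0.4}
trans = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit = {"H": {"x": 0.8, "y": 0.2}, "C": {"x": 0.3, "y": 0.7}}

total, best_prob, best_path = trellis_and_viterbi("xxy", states,
                                                  start, trans, emit)
print(total, best_prob, best_path)
```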
Now check if you have filled in your name and SSN. Also,
please carefully check your answers and hand the exam in.
[1] The cross-entropy and perplexity computation is the only computation
here for which you might need a calculator; it is OK to leave the answer
as an expression (with the appropriate integer numbers filled in, though!).