600.465 Introduction to NLP (Fall 2000)

Midterm Exam

Date: Oct 30, 2000 2pm (30 min.)

Name: ___________________________________________

SSN: ___________________________________________

If asked to compute something for which you have the numbers, that really means to compute the final number, not just to write the formula. If asked for a formula, write down the formula.

1. Probability

Let S = { a, b, c } (the sample space), and p be the joint distribution on a sequence of two events (i.e. on S x S, ordered). If you know that p(a,a) [a followed by a] = 0.5, p(c,c) [c followed by c] = 0.25, p(b,a) [b followed by a] = 0.25, p(b,b) [b followed by b] = 0, p(a,c) [a followed by c] = 0.125, pL(a) [unigram probability of a as a left-hand bigram member] = .25, and pR(b) [unigram probability of b as the right-hand bigram member] = 0.125, is it enough to compute p(b|c) (i.e., the probability of seeing b if we already know that the preceding event generated c)?

Yes / No: _______
why? _________________________________________________________________

______________________________________________________________________

______________________________________________________________________
If yes, compute: p(b|c) = _________________________________________

2. Estimation and Cross-entropy

Use the bigram distribution p from question 1.

Write one example of a data sequence which faithfully follows the distribution (i.e., a training data from which we would get the above bigram distribution using the MLE method):

_____ _____ _____ _____ _____ _____ _____ _____ _____
What is the cross-entropy H_data(p) in bits and the perplexity¹ G_data(p) of the bigram distribution from question 1 if computed against the following data (use the data-oriented formula for conditional distribution derived from p):

data = b a a b

H_data(p) = ___________________________ G_data(p) = ___________________________

3. Mutual information

Use the bigram distribution from question 1.

What is the pointwise mutual information of a and b (in this order)?

I_pointwise(b,a) = ___________________________________________

4. Smoothing and the sparse data problem

Name three methods of smoothing:
- _________________________________________________
- _________________________________________________
- _________________________________________________
If you were to design a trigram language model, how would the final smoothed distribution be defined if you use the linear interpolation smoothing method?
- _______________________________________________________________________

5. Classes based on Mutual Information

Suppose you have the following data:

It is interesting to watch , at least from the foreign policy perspective , how the wannabe president George W . differs from his father , the former president George Bush .

What is the best pair of candidates for the first merge, if you use the greedy algorithm for classes based on bigram mutual information (i.e. the homework #2 algorithm)? Use your judgment, not computation; in case of two or more best candidates, write as many as you can find.

Word(s) 1: ____________________________________________________________
Word(s) 2: ____________________________________________________________

6. Hidden Markov Models

What is the Trellis algorithm good for? (Use max. 5 sentences for the answer.)

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________
What is the Viterbi algorithm good for? (Use max. 5 sentences for the answer.)

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________

_________________________________________________________________________

Now check if you have filled in your name and SSN. Also, please carefully check your answers and hand the exam in.

¹ The cross-entropy and perplexity computation is the only one computation here for which you might need a calculator; but it is ok if you use an expression (use the appropriate (integer) numbers, though!).