600.465 Introduction to NLP (Fall 1999)

Final Exam

Date: Dec 21 9am (90 min.)

SSN:__________________________________

Name:_________________________________




  If asked to compute something for which you have the numbers, that really means compute the final number, not just write down the formula.[1] If asked for a formula, write down the formula.

1. The Basics: Probability, Smoothing, Entropy, and Cross-Entropy

Suppose your training data contains (only) the following sentence (from http://www.petpainter.com, slightly adapted):

We specialize in petbirds including but not limited to dogs , cats , and horses .

You are building a bigram language model p(w_i | w_{i-1}) using maximum likelihood estimation. You do not add any special symbols/words to the beginning and end of the data (i.e., you use a separate unigram distribution for the first position in any given data).
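(For illustration only; you are not asked to write code on this exam. Below is a minimal sketch of the maximum-likelihood bigram estimates over the training sentence above; the choice of Python and of plain whitespace tokenization is an assumption, not part of the exam setup.)

    from collections import Counter

    # The training sentence above, tokenized on whitespace (punctuation marks are
    # already separate tokens in the exam text).
    train = ("We specialize in petbirds including but not limited to "
             "dogs , cats , and horses .").split()

    # Unigram and bigram counts; no sentence-boundary symbols are added, per the
    # instructions above.
    unigrams = Counter(train)
    bigrams = Counter(zip(train, train[1:]))

    def p_unigram(w):
        # MLE unigram probability (used only for the first position of a sentence).
        return unigrams[w] / len(train)

    def p_bigram(w, prev):
        # MLE bigram probability p(w | prev); 0 here if prev was never seen.
        return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

    print(p_bigram(",", "dogs"))   # c(dogs ,) / c(dogs) = 1/1 = 1.0
    print(p_bigram("and", ","))    # c(, and) / c(,)     = 1/2 = 0.5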

              ________________________________________________________               ___________________________

Now let's turn to the issues of smoothing and cross-entropy. Suppose this is your test data (http://hometown.aol.com/~arkful):

We also have gift baskets for dogs , cats , and horses and their owners .
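(Again for illustration only: a minimal sketch of a per-word cross-entropy computation on this test data. The smoothing used here, linear interpolation of bigram, unigram, and uniform distributions with made-up weights and a made-up vocabulary size, is an assumption; it need not be the smoothing the exam questions have in mind.)

    import math
    from collections import Counter

    train = ("We specialize in petbirds including but not limited to "
             "dogs , cats , and horses .").split()
    test = ("We also have gift baskets for dogs , cats , and horses "
            "and their owners .").split()

    unigrams = Counter(train)
    bigrams = Counter(zip(train, train[1:]))
    N = len(train)

    # Hypothetical smoothing: linear interpolation of bigram, unigram, and uniform
    # distributions. The weights and the vocabulary definition are assumptions.
    V = len(set(train) | set(test))      # vocabulary size for the uniform part
    l3, l2, l1 = 0.5, 0.3, 0.2           # interpolation weights, summing to 1

    def p_smoothed(w, prev=None):
        p_uni = unigrams[w] / N
        if prev is None:                 # first word: no bigram history available
            return (l3 + l2) * p_uni + l1 / V
        p_bi = bigrams[(prev, w)] / unigrams[prev] if prev in unigrams else 0.0
        return l3 * p_bi + l2 * p_uni + l1 / V

    # Per-word cross-entropy of the test data, in bits.
    logprob = math.log2(p_smoothed(test[0]))
    for prev, w in zip(test, test[1:]):
        logprob += math.log2(p_smoothed(w, prev))
    print(-logprob / len(test))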
 

             __________________________________________________              __________________________________________________
 

2. Maximum Entropy

               p(y|x) =  ____________________________________________                       _______________________________________


               f(y,x) = 1 iff y = "," or y = "and", and there is no other comma to the left of y (within the same sentence).
               g(y,x) = 1 iff y = "and", and there is a comma immediately in front of it.

               Write down the constraints of the corresponding ME model, using the training data from part 1:

   We specialize in petbirds including but not limited to dogs , cats , and horses .
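(For illustration only: a minimal sketch that evaluates the two features over this training sentence and computes their empirical expectations, i.e. the data side of the ME constraints. Treating each word position as one training event is an assumption here.)

    train = ("We specialize in petbirds including but not limited to "
             "dogs , cats , and horses .").split()

    def f(i, words):
        # f: the word at position i is ',' or 'and', and no other comma occurs to its left.
        return int(words[i] in {",", "and"} and "," not in words[:i])

    def g(i, words):
        # g: the word at position i is 'and' and a comma stands immediately in front of it.
        return int(words[i] == "and" and i > 0 and words[i - 1] == ",")

    # Empirical feature expectations over the N events (one event per word position);
    # these are the data-side quantities that the ME constraints equate with the
    # model's expectations of f and g.
    N = len(train)
    print(sum(f(i, train) for i in range(N)) / N)
    print(sum(g(i, train) for i in range(N)) / N)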
              __________________________________________________________

               __________________________________________________________


                   lambdaf  = __________________________________________________________

                   lambdag  = __________________________________________________________
 

3. Tagging

1. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
2. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
3. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

4. Shift-Reduce Parsing

Let's assume the following Context-Free Grammar G (here with seven numbered rules):
 
#1 S -> A R
#2 A -> A B
#3 A -> a
#4 B -> b
#5 R -> B R C
#6 R -> r
#7 C -> c
aaaabr      Yes/No: ____________
abbbrcc     Yes/No: ____________
ar          Yes/No: ____________
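(For reference only, not part of the answer: a minimal sketch of a generic recognizer that checks whether a string is generated by G. The memoized span-based strategy is just one possible choice, not the method from class.)

    from functools import lru_cache

    # Grammar G as (left-hand side, right-hand side) pairs, in the numbering above.
    RULES = [("S", ("A", "R")),        # 1
             ("A", ("A", "B")),        # 2
             ("A", ("a",)),            # 3
             ("B", ("b",)),            # 4
             ("R", ("B", "R", "C")),   # 5
             ("R", ("r",)),            # 6
             ("C", ("c",))]            # 7
    NONTERMS = {lhs for lhs, _ in RULES}

    def generates(string, start="S"):
        # True iff G derives `string` from `start`.

        @lru_cache(maxsize=None)
        def derives(symbol, i, j):
            # Can `symbol` derive string[i:j]?
            if symbol not in NONTERMS:                       # terminal symbol
                return j - i == 1 and string[i] == symbol
            return any(seq_derives(rhs, i, j)
                       for lhs, rhs in RULES if lhs == symbol)

        @lru_cache(maxsize=None)
        def seq_derives(rhs, i, j):
            # Can the symbol sequence `rhs` derive string[i:j]?
            if not rhs:
                return i == j
            first, rest = rhs[0], rhs[1:]
            # G has no epsilon rules, so every remaining symbol covers >= 1 character.
            return any(derives(first, i, k) and seq_derives(rest, k, j)
                       for k in range(i + 1, j - len(rest) + 1))

        return derives(start, 0, len(string))

    for w in ("aaaabr", "abbbrcc", "ar"):
        print(w, "Yes" if generates(w) else "No")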
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

 
Input stack                                  State stack           Rule (Reduction)                Backtracking stack
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
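(For reference only, after you have filled in the trace by hand above: a minimal sketch of an exhaustive bottom-up shift-reduce recognizer for G, with backtracking handled by recursion rather than an explicit backtracking stack. On success it prints one accepting sequence of shifts and reductions, loosely mirroring the columns of the trace table.)

    # Grammar G again, with the rule numbering from above.
    RULES = [("S", ("A", "R")), ("A", ("A", "B")), ("A", ("a",)), ("B", ("b",)),
             ("R", ("B", "R", "C")), ("R", ("r",)), ("C", ("c",))]

    def shift_reduce(tokens):
        def step(stack, rest, path):
            if stack == ["S"] and not rest:
                return path                          # accepted
            # Try every applicable reduction; a failed branch simply returns None,
            # which plays the role of popping the backtracking stack.
            for num, (lhs, rhs) in enumerate(RULES, start=1):
                n = len(rhs)
                if n <= len(stack) and tuple(stack[-n:]) == rhs:
                    found = step(stack[:-n] + [lhs], rest,
                                 path + [(stack, rest, f"reduce #{num}")])
                    if found is not None:
                        return found
            if rest:                                 # last resort: shift one token
                return step(stack + [rest[0]], rest[1:],
                            path + [(stack, rest, "shift")])
            return None                              # dead end

        path = step([], list(tokens), [])
        if path is None:
            print(tokens, "-> rejected")
            return
        for stack, rest, action in path:
            print("".join(stack).ljust(8), "".join(rest).ljust(8), action)
        print(tokens, "-> accepted")

    shift_reduce("abbbrcc")
    shift_reduce("aaaabr")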

5. Statistical Parsing

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________


Now suppose you are given the following "treebank" data, where the pair of symbols ( ... )XX denotes a (sub)tree rooted in the nonterminal XX.

Sentence 1:

(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP ((with)PREP (((a)DET (telescope)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP)NP)PP)M2)VP)S

Sentence 2:

(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP)NP)VP)S
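(For illustration only: a minimal sketch that reads the ( ... )XX notation, extracts the CFG rules occurring in the two treebank trees, and estimates their probabilities by maximum likelihood, i.e. count(A -> alpha) / count(A). The bracket-parsing helper is an assumption about the notation, not something the exam asks for.)

    import re
    from collections import Counter

    SENT1 = ("(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP ((with)PREP "
             "(((a)DET (telescope)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP)NP)"
             "PP)M2)VP)S")
    SENT2 = ("(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP ((with)PREP "
             "((a)DET (mirror)N)NP)PP)NP)VP)S")

    def parse_tree(text):
        # Turn the "( ... )XX" notation into nested (label, children) pairs,
        # where a child is either such a pair or a plain word.
        tokens = re.findall(r"\(|\)[A-Za-z0-9]+|[^\s()]+", text)
        pos = 0

        def node():
            nonlocal pos
            pos += 1                              # consume the opening "("
            children = []
            while not tokens[pos].startswith(")"):
                if tokens[pos] == "(":
                    children.append(node())       # a subtree
                else:
                    children.append(tokens[pos])  # a word
                    pos += 1
            label = tokens[pos][1:]               # the ")XX" token carries the label
            pos += 1
            return (label, children)

        return node()

    def count_rules(tree, counts):
        # One rewrite rule per node: label -> labels/words of its children.
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        counts[(label, rhs)] += 1
        for c in children:
            if not isinstance(c, str):
                count_rules(c, counts)

    counts = Counter()
    for sent in (SENT1, SENT2):
        count_rules(parse_tree(sent), counts)

    # Maximum-likelihood rule probabilities: count(A -> alpha) / count(A).
    lhs_totals = Counter()
    for (lhs, rhs), c in counts.items():
        lhs_totals[lhs] += c
    for (lhs, rhs), c in sorted(counts.items()):
        print(f"{lhs} -> {' '.join(rhs)}   p = {c}/{lhs_totals[lhs]}")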
 


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

           the dog saw the cat with a telescope with a mirror
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

                 the dog saw the cat with a mirror
 
 

#01 ______________________________
#02 ______________________________
#03 ______________________________
#04 ______________________________
#05 ______________________________
#06 ______________________________
#07 ______________________________
#08 ______________________________
#09 ______________________________
#10 ______________________________
#11 ______________________________
#12 ______________________________
#13 ______________________________
#14 ______________________________
#15 ______________________________

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

                 the dog saw the cat with a mirror

OK. Now you have everything to answer these "numerical" questions:
 


Suppose we stick to the usual three independence assumptions and define the probability of a parse tree s (of a sentence W) as a simple product of p(<rule>) over all rules used in the course of generating the sentence W.
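(In other words, p(s) is the product of p(r) over all rule occurrences r in the tree s. For illustration only, a minimal sketch using the (label, children) tree representation and the rule counts from the sketch above:)

    def tree_probability(tree, rule_prob):
        # Probability of a parse tree: the product of p(rule) over all rule
        # occurrences in the tree. `rule_prob` maps (lhs, rhs) pairs to p(rule).
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        p = rule_prob[(label, rhs)]
        for c in children:
            if not isinstance(c, str):
                p *= tree_probability(c, rule_prob)
        return p

    # e.g., with counts, lhs_totals, parse_tree, and SENT2 from the earlier sketch:
    # rule_prob = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
    # print(tree_probability(parse_tree(SENT2), rule_prob))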
 
 
 
 

Describe the main idea of the algorithm to compute:

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

 
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

  And that's it for today!
============================================================================

Now check if you have filled in your name and SSN. Also, please carefully check your answers and hand the exam in. 


[1] Use of calculators is allowed.