PFL068 Statistical methods in NLP II

Final Exam

Date: May 16, 2006 @ 2pm (90 min.)

Year / program:_________________________

Name:__________________________




 
 
 
 
  If asked to compute something for which you have the numbers, that really means to compute the final number, not just to write the formula1. If asked for a formula, write down the formula.
 
 

1. The Basics: Probability

Suppose your training data contains (only) the sentence (http://www.petpainter.com,  slightly adapted)

We specialize in petbirds including but not limited to dogs , cats , and horses .

You are building a bigram language model p(wi|wi-1) using maximum likelihood estimation. You do not add any special symbols/words to the beginning and end of the data (i.e., you use separate unigram distribution for the first position in any given data).

              ________________________________________________________

2. Tagging

1. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
2. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
3. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

3. Shift-Reduce Parsing

Let's assume the following Context-Free Grammar  G (here with seven numbered rules):
 
#1 S -> A R
#2 A -> A B
#3 A -> a
#4 B -> b
#5 R -> B R C
#6 R -> r
#7 C -> c
 
aaaabr      Yes/No: ____________
abbbrcc     Yes/No: ____________
ar          Yes/No: ____________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

 
Input stack                                  State tack           Rule (Reduction)                Backtracking stack
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

4. Statistical Parsing

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________


Now suppose the following "treebank" data, where the pair of symbols ( ... )XX denotes a (sub)tree rooted in a nonterminal XX.

Sentence 1:

(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP ((with)PREP (((a)DET (telescope)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP)NP)PP)M2)VP)S

Sentence 2:

(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP )NP)VP)S
 


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

           the dog saw the cat with a telescope with a mirror
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

                 the dog saw the cat with a mirror
 
 

#01 ______________________________
#02 ______________________________
#03 ______________________________
#04 ______________________________
#05 ______________________________
#06 ______________________________
#07 ______________________________
#08 ______________________________
#09 ______________________________
#10 ______________________________
#11 ______________________________
#12 ______________________________
#13 ______________________________
#14 ______________________________
#15 ______________________________

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

                 the dog saw the cat with a mirror

OK. Now you have everything to answer these "numerical" questions:
 


Suppose we stick to the usual three independence asumptions and define a probability of a parse tree s (of a sentence W) as a simple product of p(<rule>) over all rules used in the course of generation of the sentence W.
 
 
 
 

Describe the main idea of the algorithm to compute:

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

 
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

  And that's it for today!
============================================================================

Now check if you have filled in your name. Also, please carefully check your answers and hand the exam in. 


1 Use of calculators is allowed.