PFL068 Statistical methods in NLP II

Final Exam

Date: May 22, 2007 @ 2pm (90 min.)

Year / program:_________________________

Name:__________________________




 
 
 
 
 
 
 
  If asked to compute something for which you have the numbers, that really means to compute the final number, not just to write the formula1. If asked for a formula, write down the formula.
 
 

1. The Basics: Probability

Suppose your training data contains (only) the sentence (http://www.petpainter.com,  slightly adapted)

We specialize in petbirds including but not limited to dogs , cats , and horses .

You are building a bigram language model p(wi|wi-1) using maximum likelihood estimation. You do not add any special symbols/words to the beginning and end of the data (i.e., you use separate unigram distribution for the first position in any given data).

              ____________________________________________________________________________________________  
 
 
 
 

2. Tagging

1. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
2. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
3. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
 
 
 
 

3. Shift-Reduce Parsing

Let's assume the following Context-Free Grammar  G (here with seven numbered rules):
 
#1 S -> K W
#2 K -> K B
#3 B -> b
#4 K -> k
#5 W -> B W C
#6 C -> c
#7 W -> w
 
kkkkbw      Yes/No: ____________
kbbbwcc     Yes/No: ____________
kw          Yes/No: ____________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

4. Statistical Parsing

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________


Now suppose the following "treebank" data, where the pair of symbols ( ... )XX denotes a (sub)tree rooted in a nonterminal XX.

Sentence 1:

(((the)DET
(dog)N)NP
((saw)V
(((the)DET
(cat)N)NP
((with)PREP
(((a)DET
(telescope)N)NP
((with)PREP
((a)DET
(mirror)N)NP)PP)NP)PP)M2)VP)S

Sentence 2:

(((the)DET
(dog)N)NP
((saw)V
(((the)DET
(cat)N)NP
((with)PREP
((a)DET
(mirror)N)NP)PP)NP)VP)S

 


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

           the dog saw the cat with a telescope with a mirror
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

                 the dog saw the cat with a mirror
 
 

#01 ______________________________
#02 ______________________________
#03 ______________________________
#04 ______________________________
#05 ______________________________
#06 ______________________________
#07 ______________________________
#08 ______________________________
#09 ______________________________
#10 ______________________________
#11 ______________________________
#12 ______________________________
#13 ______________________________
#14 ______________________________
#15 ______________________________

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

                 the dog saw the cat with a mirror

OK. Now you have everything to answer these "numerical" questions:
 


Suppose we stick to the usual three independence asumptions and define a probability of a parse tree s (of a sentence W) as a simple product of p(<rule>) over all rules used in the course of generation of the sentence W.
 

Describe the main idea of the algorithm to compute:

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

 
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________

 

5. Statistical Machine Translation

__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________


__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________


And that's it for today!
============================================================================

Now check if you have filled in your name. Also, please carefully check your answers and hand the exam in. 


1 Use of calculators is allowed.