PFL068 Statistical methods in NLP II
Final Exam
Date: May 16, 2006 @ 2pm (90 min.)
Year / program:_________________________
Name:__________________________
If asked to compute something for which you have the numbers, that really means to compute the final number, not just to write the formula.[1] If asked for a formula, write down the formula.
1. The Basics: Probability
Suppose your training data contains (only) the sentence (from http://www.petpainter.com, slightly adapted)

We specialize in petbirds including but not limited to dogs , cats , and horses .
You are building a bigram language model p(w_i|w_{i-1}) using maximum likelihood estimation. You do not add any special symbols/words to the beginning and end of the data (i.e., you use a separate unigram distribution for the first position in the data). An illustrative sketch follows the two questions below.
- What is the (raw) probability p(cats|,)?
________________________________
- What is the definition of the training data probability (using the bigram approximation)?
________________________________________________________
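For intuition only (not a substitute for the formula asked above), here is a minimal Python sketch of the MLE bigram estimates, assuming plain whitespace tokenization of the training sentence:

```python
from collections import Counter

# MLE bigram estimate: p(w | h) = c(h, w) / c(h)
tokens = ("We specialize in petbirds including but not limited "
          "to dogs , cats , and horses .").split()

bigrams = Counter(zip(tokens, tokens[1:]))
histories = Counter(tokens[:-1])   # the last token starts no bigram

def p(w, h):
    return bigrams[(h, w)] / histories[h]

print(p("cats", ","))
```

Under the bigram approximation, the training data probability is then p(w_1) times the product of p(w_i|w_{i-1}) for i = 2..n, with p(w_1) taken from the separate first-position unigram distribution.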
2. Tagging
- Describe three different approaches to tagging. Include formulas if/where appropriate. (An illustrative sketch of one approach follows the answer space.)
1. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
2. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
3. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
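Purely as an illustration of one classical approach (HMM tagging with Viterbi decoding), here is a minimal sketch; the toy tag set and the initial/transition/emission tables are hypothetical numbers, not taken from this exam:

```python
def viterbi(words, tags, init, trans, emit):
    # delta[t] = probability of the best tag sequence for the prefix
    # read so far that ends in tag t
    delta = {t: init[t] * emit[t].get(words[0], 0.0) for t in tags}
    back = []                              # back-pointers, one dict per word
    for w in words[1:]:
        prev, delta = delta, {}
        back.append({})
        for t in tags:
            s = max(tags, key=lambda s: prev[s] * trans[s][t])
            back[-1][t] = s
            delta[t] = prev[s] * trans[s][t] * emit[t].get(w, 0.0)
    best = max(tags, key=lambda t: delta[t])
    seq = [best]
    for bp in reversed(back):              # follow back-pointers right to left
        seq.append(bp[seq[-1]])
    return list(reversed(seq))

tags = ("N", "V")
init = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}
print(viterbi(["dogs", "bark"], tags, init, trans, emit))   # -> ['N', 'V']
```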
3. Shift-Reduce Parsing
Let's assume the following context-free grammar G (here with seven numbered rules):
#1 S -> A R
#2 A -> A B
#3 A -> a
#4 B -> b
#5 R -> B R C
#6 R -> r
#7 C -> c
- Decide which of the following strings belong to the language defined by the grammar G (a brute-force sketch follows the list):
aaaabr  Yes/No: ____________
abbbrcc Yes/No: ____________
ar      Yes/No: ____________
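Not required for the exam, but since every right-hand side of G is non-empty, membership can be brute-forced by a search over sentential forms; a minimal sketch, with G encoded as a Python dict:

```python
from collections import deque

# Grammar G: LHS -> list of right-hand sides
RULES = {"S": [("A", "R")], "A": [("A", "B"), ("a",)], "B": [("b",)],
         "R": [("B", "R", "C"), ("r",)], "C": [("c",)]}

def in_language(s):
    # Breadth-first search over sentential forms, always expanding the
    # leftmost nonterminal.  Safe here because every right-hand side is
    # non-empty, so forms never shrink and can be pruned by length.
    target = tuple(s)
    queue, seen = deque([("S",)]), {("S",)}
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        if len(form) > len(target):
            continue
        i = next((k for k, x in enumerate(form) if x in RULES), None)
        if i is None or form[:i] != target[:i]:
            continue                  # all-terminal mismatch or bad prefix
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return False

for s in ("aaaabr", "abbbrcc", "ar"):
    print(s, in_language(s))
```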
- Create the shift-reduce parse tables for this grammar.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
- Show how the shift-reduce parser parses the string abbbrc (a backtracking sketch follows the answer space):

Input stack | State stack | Rule (Reduction) | Backtracking stack
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
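For illustration, a minimal backtracking shift-reduce sketch for G: reductions are tried before shifts, in rule order, and the recursion takes the place of the backtracking stack from the trace columns above. This is a plain backtracking parser, not the table-driven one asked for.

```python
RULES = [("S", ("A", "R")), ("A", ("A", "B")), ("A", ("a",)), ("B", ("b",)),
         ("R", ("B", "R", "C")), ("R", ("r",)), ("C", ("c",))]

def parse(tokens):
    def step(stack, i, trace):
        if stack == ["S"] and i == len(tokens):
            return trace                      # accepted: stack holds S only
        for num, (lhs, rhs) in enumerate(RULES, 1):
            if tuple(stack[-len(rhs):]) == rhs:
                done = step(stack[:-len(rhs)] + [lhs], i,
                            trace + [f"reduce #{num}"])
                if done:
                    return done
        if i < len(tokens):                   # no reduction worked: shift
            return step(stack + [tokens[i]], i + 1,
                        trace + [f"shift {tokens[i]}"])
        return None                           # dead end: backtrack
    return step([], 0, [])

print(parse(list("abbbrc")))
```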
4. Statistical Parsing
- What is a Probabilistic CFG? Use five sentences at most.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Now suppose the following "treebank" data, where the pair of symbols ( ... )XX denotes a (sub)tree rooted in the nonterminal XX.
Sentence 1:
(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP
((with)PREP (((a)DET (telescope)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP)NP)PP)M2)VP)S
Sentence 2:
(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP
((with)PREP ((a)DET (mirror)N)NP)PP)NP)VP)S
- Draw the usual parse trees, with terminal symbols (words) as leaves and the other nodes labeled by nonterminals (a pretty-printing aid follows the two sentences):

the dog saw the cat with a telescope with a mirror

the dog saw the cat with a mirror
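The trees are meant to be drawn by hand; purely as an aid for checking your drawing, here is a minimal sketch that turns the bracketed notation above into an indented tree (the helper names are my own):

```python
import re

def parse_bracketed(s):
    # Tokens: "(", ")LABEL", or a plain word; builds nested
    # ("Label", child, ...) tuples from the "( ... )XX" notation.
    stack = [[]]
    for tok in re.findall(r"\(|\)[A-Za-z0-9]+|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok.startswith(")"):
            children = stack.pop()
            stack[-1].append((tok[1:], *children))
        else:
            stack[-1].append(tok)
    return stack[0][0]

def show(node, depth=0):
    if isinstance(node, str):
        print("  " * depth + node)           # a leaf word
    else:
        print("  " * depth + node[0])        # a nonterminal label
        for child in node[1:]:
            show(child, depth + 1)

SENT2 = ("(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP "
         "((with)PREP ((a)DET (mirror)N)NP)PP)NP)VP)S")
show(parse_bracketed(SENT2))
```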
- Write down the (general, non-lexicalized) CFG grammar G as extracted from this training data.
#01 ______________________________
#02 ______________________________
#03 ______________________________
#04 ______________________________
#05 ______________________________
#06 ______________________________
#07 ______________________________
#08 ______________________________
#09 ______________________________
#10 ______________________________
#11 ______________________________
#12 ______________________________
#13 ______________________________
#14 ______________________________
#15 ______________________________
- Now draw the lexicalized parse tree for the second sentence, assuming that whenever there is a choice of heads in any given subtree, you choose (in the following precedence order): VP, V, PREP, N, NP.
the dog saw the cat with a mirror
OK. Now you have everything you need to answer these "numerical" questions (a counting sketch follows the list):
- Estimate p(NP -> NP PP) = _______________
- Estimate p(NP -> PREP N) = _______________
- Estimate p(VP -> V NP) = _______________
- Estimate p(VP -> V M2) = _______________
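A minimal sketch of how such raw MLE estimates can be computed, with trees encoded as nested ("Label", child, ...) tuples (the same shape the parse_bracketed sketch above produces). Only sentence 2 is written out here; sentence 1 would be encoded and collected the same way.

```python
from collections import Counter

S2 = ("S",
      ("NP", ("DET", "the"), ("N", "dog")),
      ("VP", ("V", "saw"),
             ("NP", ("NP", ("DET", "the"), ("N", "cat")),
                    ("PP", ("PREP", "with"),
                           ("NP", ("DET", "a"), ("N", "mirror"))))))

rule_count, lhs_count = Counter(), Counter()

def collect(node):
    lhs, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_count[(lhs, rhs)] += 1
    lhs_count[lhs] += 1
    for c in children:
        if not isinstance(c, str):
            collect(c)

collect(S2)        # ...and collect(S1) once sentence 1 is encoded as well

def p(lhs, *rhs):
    # raw MLE: count(LHS -> RHS) / count(LHS)
    return rule_count[(lhs, rhs)] / lhs_count[lhs] if lhs_count[lhs] else 0.0

print(p("NP", "NP", "PP"))
```

Under the independence assumptions introduced next, the probability of a whole parse tree is then just the product of p(rule) over the rules the tree uses.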
Suppose we stick to the usual three independence assumptions and define the probability of a parse tree s (of a sentence W) as a simple product of p(<rule>) over all rules used in the course of generating the sentence W.
Describe the main idea of the algorithm to compute (a single illustrative sketch covering both follows the answer spaces):
- (a) the best parse given a sentence and a model. Use five sentences at most.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
- (b) the probability of a string given the model. Use five sentences at most.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
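A single chart-parsing sketch illustrates both ideas: dynamic programming over spans, where maximizing over analyses of each span scores the best parse (Viterbi/CKY) and summing over them gives the string probability (the inside algorithm). The tiny grammar at the bottom is a hypothetical toy; the rules and probabilities estimated above would plug in the same way.

```python
def cky(words, rules_lex, rules_bin, combine=max):
    # combine=max -> probability of the best parse (Viterbi)
    # combine=sum -> total probability of the string (inside algorithm)
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # preterminal rules
        for (A, word), p in rules_lex.items():
            if word == w:
                chart[i][i + 1][A] = p
    for span in range(2, n + 1):                  # binary rules, bottom-up
        for i in range(n - span + 1):
            j, cell = i + span, chart[i][i + span]
            for k in range(i + 1, j):
                for (A, B, C), p in rules_bin.items():
                    if B in chart[i][k] and C in chart[k][j]:
                        val = p * chart[i][k][B] * chart[k][j][C]
                        cell[A] = combine([cell.get(A, 0.0), val])
    return chart[0][n].get("S", 0.0)

# hypothetical toy model; plug in the rules/probabilities estimated above
LEX = {("DET", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5,
       ("V", "saw"): 1.0}
BIN = {("S", "NP", "VP"): 1.0, ("NP", "DET", "N"): 1.0,
       ("VP", "V", "NP"): 1.0}
print(cky("the dog saw the cat".split(), LEX, BIN, combine=max))  # best parse
print(cky("the dog saw the cat".split(), LEX, BIN, combine=sum))  # string prob.
```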
- What is then the probability of the following parse tree, using the estimated non-lexicalized PCFG?

p( (((the)DET (dog)N)NP ((saw)V ((the)DET (cat)N)NP)VP)S ) = ______________
- What is the probability of the training data parse of sentence 2?

p(s|sentence2) = _____________
- What is the probability of the string "the dog saw the cat with a mirror"?

p(sentence2) = _____________
- Now a lexicalized-grammar question. What is the (raw) probability estimate of the lexicalized rule p(PP(with) -> PREP(with) NP(mirror))? [Imagine you generate the lexicalized rules, too, under the same independence assumptions; to answer this question, however, there is no need to generate them all. Just do not forget to look at both training sentences!]

p(PP(with) -> PREP(with) NP(mirror)) = __________________
And that's it for today!
============================================================================
Now check that you have filled in your name. Also, please carefully check your answers and hand the exam in.
[1] Use of calculators is allowed.