600.465 Introduction to NLP (Fall 1999)
Final Exam
Date: Dec 21 9am (90 min.)
SSN:__________________________________
Name:_________________________________
If asked to compute something for which you have the numbers,
that really means computing the final number, not just writing down the formula.[1]
If asked for a formula, write down the formula.
1. The Basics: Probability, Smoothing, Entropy, and Cross-Entropy
Suppose your training data contains (only) the sentence (http://www.petpainter.com,
slightly adapted)
We specialize in petbirds including but not limited
to dogs , cats , and horses .
You are building a bigram language model p(w_i|w_{i-1})
using maximum likelihood estimation. You do not add any special
symbols/words to the beginning or end of the data (i.e., you use a separate
unigram distribution for the first position of any given data).
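(Note: for self-checking only, here is a minimal Python sketch of the MLE bigram estimates from this training sentence. Whitespace tokenization and the helper names are assumptions, not part of the exam.)

    from collections import Counter

    # Training data, tokenized by whitespace (an assumption).
    train = ("We specialize in petbirds including but not limited "
             "to dogs , cats , and horses .").split()

    bigrams = Counter(zip(train, train[1:]))   # c(w_{i-1}, w_i)
    histories = Counter(train[:-1])            # c(w_{i-1}) as a history

    def p(w, h):
        """Raw MLE bigram probability p(w | h)."""
        return bigrams[(h, w)] / histories[h]

    print(p("cats", ","))                      # the first question below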
-
What is the (raw) probability p(cats|,)?
________________________________
-
What is the definition of the training data probability,
using the bigram approximation?
________________________________________________________
-
What are the entropy and the perplexity of the raw bigram distribution?
___________________________
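(Note: one common reading of this question is the per-word entropy of the model on its own training data, scoring the first position with the unigram distribution as the instructions above direct; that reading is an assumption. A sketch continuing the previous one, reusing train and p:)

    from math import log2

    N = len(train)
    p_uni = Counter(train)                     # unigram counts for position 1
    H = -(log2(p_uni[train[0]] / N)
          + sum(log2(p(w, h)) for h, w in zip(train, train[1:]))) / N
    print("entropy:", H, "perplexity:", 2 ** H)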
Now let's turn to the issues of smoothing and cross-entropy.
Suppose this is your test data (http://hometown.aol.com/~arkful):
We also have gift baskets for dogs , cats , and horses
and their owners .
-
What is the cross-entropy of the raw distribution extracted
from the above training data when applied to this test data?
__________________________________________________
-
What is the cross-entropy on the same test data when you
use the linearly smoothed distribution
p_µ(w_i|w_{i-1}) = µ p(w_i|w_{i-1}) + (1 - µ)/32?
Suppose the parameter µ has been set to 0.5 on some heldout data,
and that whenever p(w_i|w_{i-1}) is undefined on the training data
(either because you have not seen the history w_{i-1}, or because you
are dealing with the first position of the test data), you take
p(w_i|w_{i-1}) = 1/32.
__________________________________________________
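(Note: a sketch of the smoothed cross-entropy computation, under the same tokenization assumption as above; p_raw and p_smooth are hypothetical helper names.)

    from collections import Counter
    from math import log2

    train = ("We specialize in petbirds including but not limited "
             "to dogs , cats , and horses .").split()
    test = ("We also have gift baskets for dogs , cats , and horses "
            "and their owners .").split()

    bigrams = Counter(zip(train, train[1:]))
    histories = Counter(train[:-1])

    def p_raw(w, h):
        """MLE bigram probability; None when undefined (unseen history,
        or the first test position, encoded as h = None)."""
        if h is None or histories[h] == 0:
            return None
        return bigrams[(h, w)] / histories[h]

    def p_smooth(w, h, mu=0.5, V=32):
        raw = p_raw(w, h)
        if raw is None:                        # undefined -> 1/V, per the exam
            raw = 1 / V
        return mu * raw + (1 - mu) / V

    # per-word cross-entropy in bits
    H = -sum(log2(p_smooth(w, h))
             for h, w in zip([None] + test[:-1], test)) / len(test)
    print(H)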
2. Maximum Entropy
-
What is the general form of the so-called Maximum Entropy (ME)
model?
p(y|x) = ____________________________________________
-
Let's assume S = {f_1, f_2, ..., f_n}
are all the binary-valued (0/1) features you are supposed to use (and find
the weights of). What is the general form of the constraints put
on the model?
_______________________________________
-
Suppose you use only two binary-valued (0/1) features, f
and g, defined as follows:
f(y,x) = 1 iff y = "," or y = "and", and there is no other comma to the
left of y (within the same sentence);
g(y,x) = 1 iff y = "and" and there is a comma immediately in front of it.
Write down the constraints of the corresponding ME model, using the training
data from part 1 (see the checking sketch after the answer blanks):
We specialize in petbirds including but
not limited to dogs , cats , and horses .
__________________________________________________________
__________________________________________________________
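(Note: to check the empirical feature expectations that enter the constraints, a small sketch; the tokenization and function names are assumptions.)

    tokens = ("We specialize in petbirds including but not limited "
              "to dogs , cats , and horses .").split()

    def f(i, toks):
        # y = "," or "and", with no other comma to the left of position i
        return toks[i] in {",", "and"} and "," not in toks[:i]

    def g(i, toks):
        # y = "and" with a comma immediately in front of it
        return toks[i] == "and" and i > 0 and toks[i - 1] == ","

    N = len(tokens)
    print("E~[f] =", sum(f(i, tokens) for i in range(N)), "/", N)
    print("E~[g] =", sum(g(i, tokens) for i in range(N)), "/", N)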
-
What can you say about the weights associated with these
two features after performing the maximum entropy training? (If possible,
compute their values and write them down, too.)
lambda_f = __________________________________________________________
lambda_g = __________________________________________________________
3. Tagging
-
Describe three different approaches to tagging. Include formulas
if/where appropriate.
1. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
2. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
3. ____________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
4. Shift-Reduce Parsing
Let's assume the following context-free grammar G,
with seven numbered rules:
#1 S -> A R
#2 A -> A B
#3 A -> a
#4 B -> b
#5 R -> B R C
#6 R -> r
#7 C -> c
-
Decide which of the following strings belong to the language
defined by the grammar G:
aaaabr    Yes/No: ____________
abbbrcc   Yes/No: ____________
ar        Yes/No: ____________
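(Note: you can sanity-check membership mechanically. Below is a brute-force derivability sketch, not a shift-reduce parser, assuming the grammar exactly as numbered above; all names are illustrative.)

    # Grammar G; lowercase symbols are terminals.
    RULES = [
        ("S", ("A", "R")), ("A", ("A", "B")), ("A", ("a",)),
        ("B", ("b",)), ("R", ("B", "R", "C")), ("R", ("r",)), ("C", ("c",)),
    ]

    def derives(sym, s):
        """True iff `sym` derives the string `s`."""
        if sym.islower():                      # terminal: must match exactly
            return s == sym
        return any(seq_derives(rhs, s) for lhs, rhs in RULES if lhs == sym)

    def seq_derives(seq, s):
        """True iff the symbol sequence `seq` derives `s` (G has no epsilon
        rules, so every symbol must cover at least one character)."""
        if len(seq) == 1:
            return derives(seq[0], s)
        head, rest = seq[0], seq[1:]
        return any(derives(head, s[:i]) and seq_derives(rest, s[i:])
                   for i in range(1, len(s) - len(rest) + 1))

    for w in ["aaaabr", "abbbrcc", "ar"]:
        print(w, derives("S", w))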
-
Create the shift-reduce parse tables for this grammar.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
-
Show how the shift-reduce parser parses the string
abbbrc:
Input stack | State stack | Rule (Reduction) | Backtracking stack
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
5. Statistical Parsing
-
What is a Probabilistic CFG? Use five sentences at most.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Now suppose the following "treebank" data, where the
pair of symbols ( ... )XX denotes a (sub)tree rooted in a nonterminal XX.
Sentence 1:
(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP
((with)PREP (((a)DET (telescope)N)NP ((with)PREP ((a)DET (mirror)N)NP)PP)NP)PP)M2)VP)S
Sentence 2:
(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP
((with)PREP ((a)DET (mirror)N)NP)PP )NP)VP)S
-
Draw the usual parse trees, with terminal symbols (words)
as leaves, and other nodes labeled by nonterminals:
the dog saw the cat with a telescope with a mirror
the dog saw the cat with a mirror
-
Write down the (general, non-lexicalized) CFG grammar as
extracted from this training data (see the checking sketch after the rule list):
#01 ______________________________
#02 ______________________________
#03 ______________________________
#04 ______________________________
#05 ______________________________
#06 ______________________________
#07 ______________________________
#08 ______________________________
#09 ______________________________
#10 ______________________________
#11 ______________________________
#12 ______________________________
#13 ______________________________
#14 ______________________________
#15 ______________________________
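(Note: to check your extracted grammar, a sketch that reads the rules off the bracketed trees; the parser for the ( ... )XX notation and all names are assumptions.)

    import re
    from collections import Counter

    SENT1 = ("(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP "
             "((with)PREP (((a)DET (telescope)N)NP ((with)PREP "
             "((a)DET (mirror)N)NP)PP)NP)PP)M2)VP)S")
    SENT2 = ("(((the)DET (dog)N)NP ((saw)V (((the)DET (cat)N)NP "
             "((with)PREP ((a)DET (mirror)N)NP)PP )NP)VP)S")

    def parse_tree(s, i=0):
        """Parse one '( children )LABEL' tree starting at s[i];
        returns ((label, children), next_index)."""
        i += 1                                 # skip '('
        children = []
        while True:
            if s[i].isspace():
                i += 1
            elif s[i] == "(":
                child, i = parse_tree(s, i)
                children.append(child)
            elif s[i] == ")":
                m = re.match(r"[A-Za-z0-9]+", s[i + 1:])
                return (m.group(0), children), i + 1 + m.end()
            else:                              # a word at a leaf
                m = re.match(r"[^\s()]+", s[i:])
                children.append(m.group(0))
                i += m.end()

    def rules(tree, out):
        """Count the CFG rule read off at each node of the tree."""
        label, children = tree
        out[(label, tuple(c if isinstance(c, str) else c[0]
                          for c in children))] += 1
        for c in children:
            if not isinstance(c, str):
                rules(c, out)

    tree1, _ = parse_tree(SENT1)
    tree2, _ = parse_tree(SENT2)
    counts = Counter()
    rules(tree1, counts)
    rules(tree2, counts)
    for (lhs, rhs), n in sorted(counts.items()):
        print(f"{lhs} -> {' '.join(rhs)}   (count {n})")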
-
Now draw the lexicalized parse tree for the
second sentence, assuming that whenever there is a choice
of heads in any given subtree, chose (in the following precedence order):
VP, V, PREP, N, NP.
the dog saw the cat with a mirror
OK. Now you have everything you need to answer these "numerical"
questions:
-
Estimate p(NP -> NP PP) = _______________
-
Estimate p(NP -> PREP N) = _______________
-
Estimate p(VP -> V NP) = _______________
-
Estimate p(VP -> V M2) = _______________
Suppose we stick to the usual three independence assumptions
and define the probability of a parse tree s (of a sentence W) as
the simple product of p(<rule>) over all rules used in the course of generating
the sentence W.
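(Note: continuing the tree-reading sketch above and reusing its counts, tree1, and tree2, the MLE rule probabilities and the product-of-rules tree probability can be sketched as follows; the names are illustrative.)

    from collections import Counter
    from math import prod                      # Python 3.8+

    # MLE: q(lhs -> rhs) = c(lhs -> rhs) / c(lhs)
    lhs_totals = Counter()
    for (lhs, rhs), n in counts.items():
        lhs_totals[lhs] += n

    def q(lhs, rhs):
        return counts[(lhs, rhs)] / lhs_totals[lhs]

    def tree_prob(tree):
        """Product of the probabilities of all rules used in the tree."""
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        return q(label, rhs) * prod(tree_prob(c) for c in children
                                    if not isinstance(c, str))

    print(tree_prob(tree2))                    # the parse of sentence 2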
Describe the main idea of the algorithm to compute:
-
(a) the best parse given a sentence and a model. Use five
sentences at most.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
-
(b) the probability of a string given the model. Use five
sentences at most.
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
-
What, then, is the probability of the following parse tree,
using the estimated non-lexicalized PCFG:
-
p( (((the)DET (dog)N)NP ((saw)V ((the)DET
(cat)N)NP)VP)S ) = ______________
-
What is the probability of the training data parse of sentence
2?
-
p(s|sentence2) = _____________
-
What is the probability of the string "the dog saw the cat
with a mirror"?
-
p(sentence2) = _____________
-
Now a lexicalized-grammar question. What is the (raw) probability
estimate of the lexicalized rule p(PP(with) -> PREP(with) NP(mirror))?
[Imagine you generate the lexicalized rules, too, under the same independence
assumptions; to answer this question, there is of course no need to generate
them all. But do not forget to look at both training
sentences!]
-
p(PP(with) -> PREP(with) NP(mirror)) = __________________
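(Note: a sketch for the lexicalized estimate, continuing the tree-reading sketch above (tree1, tree2) and implementing the stated head precedence; labels not in the precedence list are assumed to rank last, which is an assumption.)

    from collections import Counter

    PRECEDENCE = ["VP", "V", "PREP", "N", "NP"]

    def head(tree):
        """Lexical head of a subtree under the given precedence order."""
        label, children = tree
        if isinstance(children[0], str):       # preterminal: the word itself
            return children[0]
        def rank(c):
            return (PRECEDENCE.index(c[0])
                    if c[0] in PRECEDENCE else len(PRECEDENCE))
        return head(min(children, key=rank))

    def lex_rules(tree, out):
        """Count lexicalized rules LHS(head) -> ... over the tree."""
        label, children = tree
        if isinstance(children[0], str):
            return
        out[(f"{label}({head(tree)})",
             tuple(f"{c[0]}({head(c)})" for c in children))] += 1
        for c in children:
            lex_rules(c, out)

    lex = Counter()
    lex_rules(tree1, lex)
    lex_rules(tree2, lex)
    target = ("PP(with)", ("PREP(with)", "NP(mirror)"))
    total = sum(n for (lhs, _), n in lex.items() if lhs == "PP(with)")
    print(lex[target], "/", total)             # numerator / denominator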
And that's it for today!
============================================================================
Now check that you have filled in your name and SSN. Also,
please carefully check your answers and hand the exam in.
[1] Use of calculators is allowed.