Querying Dependency Treebanks in PML-TQ

Jan Štěpánek

Charles University in Prague

ÚFAL

http://jan.stepanek.matfyz.cz/depling/

jan.stepanek@matfyz.cz

PML Tree Query

The SQL engine requires the data to be converted to SQL and loaded into a database → suited to stable datasets.

The Perl engine is slower, but works directly with the data files → suited to data in progress.

The Query Language

Simplest Queries

Search for a given word.

node $be := [
    lemma = "be"
]

Search for all subjects.

node $subj := [
    deprel = "SBJ"
]

Simple Reports

Count the occurrences.

node $subj := [
    deprel = "SBJ"
]
>> give count()

Structure and Aggregation

Group the results by a value of an attribute.

node $parent := [
    child node $subj := [
        deprel = "SBJ"
    ]
]
>> for $parent.pos
   give $1, count()
   sort by $2
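
What this report computes can be mimicked outside PML-TQ. The Python sketch below is not how either engine evaluates the query; it only reproduces the grouping on a hand-made toy tree. The token list and the field names `form`, `pos`, `deprel` and `head` are assumptions made for the illustration.

```python
from collections import Counter

# Toy dependency tree (hypothetical data).
# "head" is the index of the parent token, None for the root.
tokens = [
    {"form": "She",    "pos": "PRP", "deprel": "SBJ",  "head": 1},
    {"form": "thinks", "pos": "VBZ", "deprel": "ROOT", "head": None},
    {"form": "he",     "pos": "PRP", "deprel": "SBJ",  "head": 3},
    {"form": "left",   "pos": "VBD", "deprel": "OBJ",  "head": 1},
]


def parent_pos_of_subjects(tokens):
    """Count subjects grouped by the pos of their parent, sorted by count."""
    counts = Counter(
        tokens[t["head"]]["pos"]
        for t in tokens
        if t["deprel"] == "SBJ" and t["head"] is not None
    )
    # Mirrors "sort by $2": ascending by the count column.
    return sorted(counts.items(), key=lambda row: row[1])


rows = parent_pos_of_subjects(tokens)
for pos, count in rows:
    print(pos, count)
```

Each output row corresponds to one group of the `for $parent.pos` clause: the first column is the grouping value, the second the number of matches in that group.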

Relations

We can use the inverted relation.

node $subj := [
    deprel = "SBJ",
    parent node $parent := [ ]
]
>> for $parent.pos
   give $1, count()
   sort by $2

Relations (2)

References, Quantifying Relations

Find two instances of the same lemma, separated by at most three words.

node $n1 := [
    order-precedes{1,4} node $n2 := [
        lemma = $n1.lemma
    ]
]
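
The effect of the quantifier can be spelled out in plain Python. This sketch assumes that `{1,4}` bounds the distance in word order between the two nodes (1 to 4 positions, i.e. at most three words in between); the function name and the sample sentence are made up for the illustration.

```python
def repeated_lemmas(lemmas, min_dist=1, max_dist=4):
    """Return index pairs (i, j), i before j, with equal lemmas
    and min_dist <= j - i <= max_dist."""
    pairs = []
    for i, lemma in enumerate(lemmas):
        last = min(i + max_dist, len(lemmas) - 1)
        for j in range(i + min_dist, last + 1):
            if lemmas[j] == lemma:
                pairs.append((i, j))
    return pairs


# Hypothetical lemmatized sentence.
lemmas = ["the", "dog", "see", "the", "other", "dog", "and", "a", "the"]
pairs = repeated_lemmas(lemmas)
print(pairs)
```

The two occurrences of "the" at positions 3 and 8 are five positions apart, so they fall outside the range and are not reported.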

Predicates

For example, to select all nouns in the PDT:

a-node $noun := [
     m/tag ~ "^N"
]

Or, using a function:

a-node $noun := [
     substr(m/tag, 0, 1) = "N"
]

This is not possible for English, where the tags are not positional.

Predicates (2)

node $noun := [
     pos ~ "^N"
]
>> for $noun.pos
   give $1, count()
   sort by $2 desc

NIL is not a noun!

node $noun := [
     pos in { "NN", "NNS", "NNP", "NNPS" }
]
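
The difference between the two predicates is easy to reproduce with an ordinary regular expression. The tag list below is a small hand-picked sample, not taken from any treebank.

```python
import re

# Hand-picked sample of tags (assumption for the illustration).
tags = ["NN", "NNS", "NNP", "NNPS", "NIL", "VB"]
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Prefix match, as in: pos ~ "^N"
by_regex = [t for t in tags if re.search(r"^N", t)]

# Explicit enumeration, as in: pos in { "NN", "NNS", "NNP", "NNPS" }
by_set = [t for t in tags if t in NOUN_TAGS]

print(by_regex)
print(by_set)
```

The prefix match also picks up `NIL`; the enumeration does not.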

Operators and Functions

Operators

+, -, *, div, mod, &

Functions

Multi-Layered Treebanks

Function of Actor

t-node $act := [
    functor = "ACT",
    a/lex.rf a-node $a := [ ]
]
>> for $a.afun
   give $1, count()
   sort by $2

Multi-Layered Treebanks (2)

Reversed Dependency

t-node [
    a/lex.rf $a_child,
    echild t-node [ a/lex.rf $a_parent ]
];
a-node $a_parent := [
    echild a-node $a_child := [  ]
];

Negation

Atomic

cat = 'NP'      is equivalent to   ∃ x ∈ cat ( x = 'NP' )
cat != 'NP'     is equivalent to   ∃ x ∈ cat ( x ≠ 'NP' )
! cat = 'NP'    is equivalent to   ∀ x ∈ cat ( x ≠ 'NP' )
* cat = 'NP'    is equivalent to   ∀ x ∈ cat ( x = 'NP' )

The same holds for in, ~, etc.
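
The four readings differ only when the attribute holds more than one value. A minimal Python sketch of the quantifiers, with a plain list standing in for a multi-valued `cat` attribute (the helper names are made up, and how the engines treat an empty list is not covered here):

```python
def exists_eq(values, v):
    """cat = 'NP': some value equals v."""
    return any(x == v for x in values)


def exists_ne(values, v):
    """cat != 'NP': some value differs from v."""
    return any(x != v for x in values)


def forall_ne(values, v):
    """! cat = 'NP': every value differs from v."""
    return all(x != v for x in values)


def forall_eq(values, v):
    """* cat = 'NP': every value equals v."""
    return all(x == v for x in values)


cat = ["NP", "PP"]  # hypothetical node with two values
results = (
    exists_eq(cat, "NP"),
    exists_ne(cat, "NP"),
    forall_ne(cat, "NP"),
    forall_eq(cat, "NP"),
)
print(results)
```

For this node both `cat = 'NP'` and `cat != 'NP'` hold, while `! cat = 'NP'` does not, so `!=` and `!` are not interchangeable.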

Existential

nonterminal [ 0x descendant terminal [pos = "NN"]]

Projectivity

A tree is projective if and only if, for any two nodes connected by an edge, all the nodes between them are descendants of one of the two.
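
The definition translates directly into a check over all edges. The Python sketch below assumes a tree given as a list of parent indices in word order (a representation chosen for the illustration). Because the child is itself a descendant of the parent, testing descent from the parent is enough.

```python
def is_projective(heads):
    """heads[i] is the index of the parent of node i, or None for the root.
    Nodes are indexed in word order; heads is assumed to form a tree."""

    def is_descendant(node, ancestor):
        # Walk up from node towards the root, looking for ancestor.
        while node is not None:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for child, parent in enumerate(heads):
        if parent is None:
            continue
        lo, hi = sorted((child, parent))
        # Every node strictly between the two ends of the edge
        # must be dominated by the upper end.
        for between in range(lo + 1, hi):
            if not is_descendant(between, parent):
                return False
    return True


print(is_projective([1, None, 1]))     # node 1 is the root with two children
print(is_projective([2, None, 1, 1]))  # edge 2 -> 0 spans the root
```

In the second tree the root lies between node 0 and its parent without being a descendant of either, which makes the tree non-projective.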

Non-Projectivity

node $upper := [ 
    same-tree-as $between,
    child node $lower := [ ] ];
node $between := [
    ! ancestor $upper,
    ( ( order-precedes $upper 
        and order-follows $lower)
      or (order-follows $upper
          and order-precedes $lower) ) ]

$upper is not an ancestor of $between

Counting Non-Projectivities: The Trap

More Complex Example

Verb Clause without a Subject

a-node [
    $$     = $aux and m/tag ~ '^V[^fs]'
    or $$ != $aux and m/tag ~ '^V[sf]',
    0x echild a-node [ afun = 'Sb' ], 
    ?  echild a-node $aux := [
        afun  = 'AuxV',
        m/tag ~ '^V[^f]'
] ]

Word Order Typology

For German

node $p := [
    substr(pos, 0, 1) = "V", 
    ? child node $ch := [
        deprel in { "SB", "OA", "OC", "OA2", "OP" }
] ]
>> give $p.xml:id,
        if($p = $ch,
           if($p.deprel = "ROOT", "V", "v"),
           substr($ch.deprel, 0, 1)),
        $ch.order
>> give distinct $1, concat($2, "" over $1 sort by $3)
>> give substitute($2, "([OS])\\1+", "\\1", "g")
>> filter ($1 ~ "O" and $1 ~ "S")
>> for $1
   give $1, if($1 ~ "V", count(), 0), if($1 ~ "V", 0, count())
>> give $1, if($1 ~ "V", 0, 1),
        percnt(ratio($2 over all) + ratio($3 over all), 3)
   sort by $2, $3 desc
>> give $1, $3 & " %"
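
Two steps of this pipeline, the `substitute` call and the `filter`, can be tried out in isolation. The Python sketch below applies the same regular expression to a few made-up pattern strings; Python's `re.sub` replaces all matches by default, which corresponds to the "g" flag in the query.

```python
import re


def collapse(pattern):
    """Collapse a run of repeated O or repeated S into a single letter."""
    return re.sub(r"([OS])\1+", r"\1", pattern)


def has_subject_and_object(pattern):
    """Keep only patterns containing both an O and an S."""
    return "O" in pattern and "S" in pattern


# Made-up raw patterns: V/v for the verb, S and O for its dependents.
raw = ["SVOO", "vSSO", "OOVS", "SV", "SOvO"]
kept = [collapse(p) for p in raw if has_subject_and_object(collapse(p))]
print(kept)
```

Only adjacent repetitions are merged, so a pattern such as `SOvO` survives unchanged, which is why four-letter patterns can still appear in the final report.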

Word Order Typology

For German (2)

Main clause   Share of occurrences   Dependent clause   Share of occurrences
SVO           52.336 %               SOv                63.942 %
VSO           33.031 %               SvO                19.235 %
OVS           10.261 %               vSO                 9.419 %
VOS            2.903 %               OSv                 5.128 %
SOV            0.511 %               OvS                 0.922 %
OVSO           0.423 %               vOS                 0.542 %
VOSO           0.297 %               SOvO                0.313 %

Thank you.

Questions?