The SQL engine requires the data to be converted to SQL and loaded into a database, so it suits stable datasets.
The Perl engine is slow but works directly with the data files, so it suits data in progress.
Search for a given word.
node $be := [ lemma = "be" ]
Search for all subjects.
node $subj := [ deprel = "SBJ" ]
Count the occurrences.
node $subj := [ deprel = "SBJ" ] >> give count()
Group the results by a value of an attribute.
node $parent := [ child node $subj := [ deprel = "SBJ" ] ] >> for $parent.pos give $1, count() sort by $2
We can use the inverted relation.
node $subj := [ deprel = "SBJ", parent node $parent := [] ] >> for $parent.pos give $1, count() sort by $2
Find two instances of the same lemma, separated by at most three words.
node $n1 := [ order-precedes{1,4} node $n2 := [ lemma = $n1.lemma ] ]
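The distance constraint can be illustrated outside PML-TQ. The Python sketch below assumes a sentence is given as a flat list of lemmas (a simplification; the real data are trees) and reads `order-precedes{1,4}` as "the second node comes 1 to 4 positions later", which matches "separated by at most three words". The function name `repeated_lemmas` and the sample sentence are hypothetical.

```python
def repeated_lemmas(lemmas, min_dist=1, max_dist=4):
    """Return (i, j) index pairs with equal lemmas, where j follows i
    by min_dist..max_dist positions (assumed reading of order-precedes{1,4})."""
    pairs = []
    last = len(lemmas) - 1
    for i, lemma in enumerate(lemmas):
        # Candidate positions for the second occurrence.
        for j in range(i + min_dist, min(i + max_dist, last) + 1):
            if lemmas[j] == lemma:
                pairs.append((i, j))
    return pairs


# Toy input, not taken from any treebank.
sentence = ["the", "cat", "see", "the", "dog", "and", "the", "bird"]
pairs = repeated_lemmas(sentence)
print(pairs)
```

The first and the last "the" are six positions apart, so that pair is not reported.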
For example, to select all nouns in the PDT:
a-node $noun := [ m/tag ~ "^N" ]
Or, using a function:
a-node $noun := [ substr(m/tag, 0, 1) = "N" ]
This is not possible for English, because its tags are not positional.
node $noun := [ pos ~ "^N" ] >> for $noun.pos give $1, count() sort by $2 desc
NIL is not a noun!
node $noun := [ pos in { "NN", "NNS", "NNP", "NNPS" } ]
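Why the prefix test over-matches can be checked with a few lines of Python. This is only a sketch: the tag list is a small illustrative sample (the four noun tags and NIL come from the queries above, VB and IN are added as non-matching examples), and Python's `re` stands in for the `~` operator.

```python
import re

# Illustrative sample of tags, not a complete tagset.
tags = ["NN", "NNS", "NNP", "NNPS", "NIL", "VB", "IN"]

# Prefix test, analogous to pos ~ "^N": also catches NIL.
by_regex = [t for t in tags if re.search(r"^N", t)]

# Explicit set, analogous to pos in { "NN", "NNS", "NNP", "NNPS" }.
noun_tags = {"NN", "NNS", "NNP", "NNPS"}
by_set = [t for t in tags if t in noun_tags]

print(by_regex)
print(by_set)
```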
+, -, *, div, mod, &
t-node $act := [ functor = "ACT", a/lex.rf a-node $a := [ ] ] >> for $a.afun give $1, count() sort by $2
t-node [ a/lex.rf $a_child, echild t-node [ a/lex.rf $a_parent ] ]; a-node $a_parent := [ echild a-node $a_child := [ ] ];
Test | | Meaning |
---|---|---|
cat = 'NP' | is equivalent to | ∃ x ∈ cat ( x = 'NP' ) |
cat != 'NP' | is equivalent to | ∃ x ∈ cat ( x ≠ 'NP' ) |
! cat = 'NP' | is equivalent to | ∀ x ∈ cat ( x ≠ 'NP' ) |
\* cat = 'NP' | is equivalent to | ∀ x ∈ cat ( x = 'NP' ) |
Same for in, ~, etc.
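The four readings map directly onto `any` and `all`. The Python sketch below treats a multi-valued attribute as a plain list; the function names are hypothetical and the behaviour on an empty list simply follows the logical quantifiers, which may or may not match what the engine does.

```python
def some_eq(values, x):
    """cat = x : at least one value equals x."""
    return any(v == x for v in values)


def some_ne(values, x):
    """cat != x : at least one value differs from x."""
    return any(v != x for v in values)


def none_eq(values, x):
    """! cat = x : every value differs from x."""
    return all(v != x for v in values)


def all_eq(values, x):
    """* cat = x : every value equals x."""
    return all(v == x for v in values)


cat = ["NP", "PP"]  # toy multi-valued attribute
print(some_eq(cat, "NP"), some_ne(cat, "NP"),
      none_eq(cat, "NP"), all_eq(cat, "NP"))
```

Note that `cat = 'NP'` and `cat != 'NP'` both hold for this list, which is why negation needs its own operator.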
nonterminal [ 0x descendant terminal [pos = "NN"]]
A tree is projective if and only if, for any two nodes connected by an edge, all the nodes between them are descendants of one of the two.
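The condition can be sketched in Python. Assumptions: the tree is given as a list `parent`, where `parent[i]` is the word-order index of the parent of word `i` and `-1` marks the root; the helper names are hypothetical. Since every descendant of the lower node is also a descendant of the upper one, it is enough to test against the upper node.

```python
def is_descendant(parent, node, ancestor):
    """True if `ancestor` lies on the path from `node` up to the root."""
    node = parent[node]
    while node != -1:
        if node == ancestor:
            return True
        node = parent[node]
    return False


def nonprojective_edges(parent):
    """Return (upper, lower) edges with some node between them
    that is not a descendant of `upper`."""
    edges = []
    for lower, upper in enumerate(parent):
        if upper == -1:
            continue  # the root has no incoming edge
        lo, hi = sorted((upper, lower))
        if any(not is_descendant(parent, b, upper) for b in range(lo + 1, hi)):
            edges.append((upper, lower))
    return edges


projective = [1, -1, 1]         # word 1 is the root with two children
crossing = [2, -1, 1, 1]        # word 0 hangs on word 2, across the root
print(nonprojective_edges(projective))
print(nonprojective_edges(crossing))
```

Each offending edge is reported once here, however many nodes lie in the gap.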
node $upper := [ same-tree-as $between, child node $lower := [ ] ]; node $between := [ ! ancestor $upper, ( ( order-precedes $upper and order-follows $lower) or (order-follows $upper and order-precedes $lower) ) ]
$upper is not an ancestor of $between
count() returns 15741.
distinct:
>> give distinct $lower >> give count()
a-node [ $$ = $aux and m/tag ~ '^V[^fs]' or $$ != $aux and m/tag ~ '^V[sf]', 0x echild a-node [ afun = 'Sb' ], ? echild a-node $aux := [ afun = 'AuxV', m/tag ~ '^V[^f]' ] ]
node $p := [ substr(pos, 0, 1) = "V", ? child node $ch := [ deprel in { "SB", "OA", "OC", "OA2", "OP" } ] ] >> give $p.xml:id, if($p = $ch, if($p.deprel = "ROOT", "V", "v"), substr($ch.deprel, 0, 1)), $ch.order >> give distinct $1, concat($2, "" over $1 sort by $3) >> give substitute($2, "([OS])\\1+", "\\1", "g") >> filter ($1 ~ "O" and $1 ~ "S") >> for $1 give $1, if($1 ~ "V", count(), 0), if($1 ~ "V", 0, count()) >> give $1, if($1 ~ "V", 0, 1), percnt(ratio($2 over all) + ratio($3 over all), 3) sort by $2, $3 desc >> give $1, $3 & " %"
Main clause | Share of occurrences | Dependent clause | Share of occurrences |
---|---|---|---|
SVO | 52.336 % | SOv | 63.942 % |
VSO | 33.031 % | SvO | 19.235 % |
OVS | 10.261 % | vSO | 9.419 % |
VOS | 2.903 % | OSv | 5.128 % |
SOV | 0.511 % | OvS | 0.922 % |
OVSO | 0.423 % | vOS | 0.542 % |
VOSO | 0.297 % | SOvO | 0.313 % |
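The later steps of the word-order query (collapse repeated O or S, keep patterns with both O and S, then compute shares separately for main and dependent clauses) can be mimicked in Python. This is a sketch under assumptions: the input is already one pattern string per verb, `V` marks a main-clause verb and `v` a dependent one as in the table, and the function name and sample data are hypothetical.

```python
import re
from collections import Counter


def word_order_shares(patterns):
    """Map each pattern to its percentage among main-clause ('V')
    or dependent-clause ('v') patterns, rounded to 3 decimals."""
    # Collapse runs such as "SS" or "OOO" into a single letter.
    collapsed = [re.sub(r"([OS])\1+", r"\1", p) for p in patterns]
    # Keep only patterns that contain both an object and a subject.
    kept = [p for p in collapsed if "O" in p and "S" in p]

    def clause(p):
        return "main" if "V" in p else "dep"

    counts = Counter(kept)
    totals = Counter(clause(p) for p in kept)
    return {p: round(100.0 * n / totals[clause(p)], 3)
            for p, n in counts.items()}


# Toy input, not treebank data.
sample = ["SVO", "SVOO", "VSO", "SOv", "SSOv", "SvO", "VS", "SVO"]
shares = word_order_shares(sample)
print(shares)
```

Percentages are taken within each clause type, which is why each pair of columns in the table is meant to be read on its own.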

Depling, Prague 2013