Prague Dependency Treebank 3.0
See https://ufal.mff.cuni.cz/pdt3.0 for full information and documentation. For KonText, we converted only the training data:
| articles | sentences | tokens/positions |
|---|---|---|
| 2533 | 38727 | 652542 |
The information in PDT 3.0 comes in three layers: morphological, analytical (shallow syntactic), and tectogrammatical (deep syntactic). Including attributes from the tectogrammatical layer is debatable: on that layer, auxiliary nodes are collapsed and some additional nodes (such as dropped personal pronouns) appear. This does not fit well into KonText, because this query tool works mainly with the 'surface' representation of a sentence.
Attributes
The following attributes can be queried with KonText (see the basic how-to on querying LINDAT corpora). Full information on the attributes can be found here.
- m-layer:
- a-layer:
- t-layer:
- title name
- issue id, issue year
- article genre, number
Example CQL queries
- Let us explore word order in Czech with the help of the afun attribute:
- Subject-Verb: [afun="Sb" & p_afun="Pred" & parent="+.*"] (finds all subjects (Sb) whose parent is a predicate (Pred) that stands to the right of the subject)
- Verb-Subject: [afun="Sb" & p_afun="Pred" & parent="-.*"] (the opposite order: the predicate stands to the left of the subject)
- Find sentences where the reflexive particle se stands ten or more positions away from its governing verb: [lemma="se" & ep_afun="Pred" & parent="+(1|2).+"]
- Functor:
- Find nouns with the functor "MEANS" that are not in the instrumental (7th) case: [tag="N...[^7].*" & functor="MEANS"]
- Find idioms of length three: [functor="DPHR"]{3}
- Coreference. PML-TQ is needed to track coreference properly; in KonText, the query options are very limited:
- Antes (antecedents) are selected features of a word's antecedents (obtained from get_coref_text_nodes(), see the code), namely t_lemma and functor. For example, the query [antes="tlemma=#PersPron.*"] lists the tokens for which one of the antecedents is a personal pronoun. This attribute will, however, be either reduced or refined.
- [coref_special="exoph"]
- Discourse:
- [discourse_special="heading|caption|metatext"]
- [discourse_type="reason"]
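
The regular expressions used in the word-order and functor queries can be tried out offline before running them in KonText. The following is only a minimal sketch in Python, under several assumptions: that the parent attribute holds a signed relative offset such as +3 or -12 (as the word-order examples suggest), that tags are 15-position positional tags with the case in the fifth position, and that a CQL pattern has to match the whole attribute value. The sample offsets and tags are invented for illustration and are not taken from the corpus. Python's re module needs the leading + escaped, whereas the queries above write it bare.

```python
import re

# Assumption: `parent` values are signed relative offsets such as "+3" or "-12".
# Python's `re` needs the leading "+" escaped; the CQL queries write it bare.
PARENT_RIGHT = re.compile(r"\+.*")            # parent="+.*"
PARENT_LEFT = re.compile(r"-.*")              # parent="-.*"
PARENT_FAR_RIGHT = re.compile(r"\+(1|2).+")   # parent="+(1|2).+"

# Assumption: positional tags, grammatical case in the 5th position.
NOUN_NOT_INSTRUMENTAL = re.compile(r"N...[^7].*")  # tag="N...[^7].*"


def matches(pattern, value):
    """Return True if the pattern matches the whole value (assumed CQL behaviour)."""
    return pattern.fullmatch(value) is not None


# Hypothetical sample offsets, for illustration only.
offsets = ["+1", "+9", "+10", "+25", "-3", "+100"]
far = [o for o in offsets if matches(PARENT_FAR_RIGHT, o)]
print(far)

# Hypothetical sample tags: instrumental noun, nominative noun, verb.
tags = ["NNFS7-----A----", "NNMS1-----A----", "VB-S---3P-AA---"]
non_instr = [t for t in tags if matches(NOUN_NOT_INSTRUMENTAL, t)]
print(non_instr)
```

Under these assumptions, the pattern +(1|2).+ accepts +10 to +29 but skips two-digit offsets from +30 to +99, which may matter for very long sentences.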