Prague Dependency Treebank 3.0

Full information and documentation is available at https://ufal.mff.cuni.cz/pdt3.0. For KonText, we converted only the training data:

articles   sentences   tokens/positions
2533       38727       652542

The information in PDT 3.0 comes in three layers: morphological, analytical (shallow syntactic) and tectogrammatical (deep syntactic). Including attributes from the tectogrammatical layer is debatable. On the tectogrammatical layer, auxiliary nodes are collapsed into the nodes of the words they belong to, and some nodes without a counterpart in the surface text appear (for example, dropped personal pronouns). This does not fit well into KonText, because this query tool works with the 'surface' representation of a sentence.

Attributes

The following attributes can be queried in KonText (see the basic how-to on querying LINDAT corpora). Full information on the attributes can be found here.

Structural attributes (metainformation)

  • title: name
  • issue: id, year
  • article: genre, number

Example CQL queries

  • Let us explore word order in Czech with the help of the afun (analytical function) attribute:
    • Subject-Verb: [afun="Sb" & p_afun="Pred" & parent="+.*"] (finds all subjects (Sb) whose parent is a predicate (Pred) standing to the right of the subject)
    • Verb-Subject: [afun="Sb" & p_afun="Pred" & parent="-.*"] (the opposite: the predicate stands to the left of the subject)
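  • Find occurrences of the reflexive particle se that stand far before their governing verb: [lemma="se" & ep_afun="Pred" & parent="+(1|2).+"] (the pattern +(1|2).+ matches parents 10 to 29 positions to the right; verbs 30 or more positions away are not found)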
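  • Functor:
    • Find nouns with the functor MEANS that are not in the instrumental (7th) case; the fifth position of the positional tag encodes case: [tag="N...[^7].*" & functor="MEANS"]
    • Find sequences of three consecutive tokens with the functor DPHR (dependent part of a phraseme), i.e. parts of multiword idioms: [functor="DPHR"]{3}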
  • Coreference: to explore coreference in depth, PML-TQ is needed; in KonText, the query options are very limited:
    • The antes (antecedents) attribute holds selected features of a word's antecedents, namely t_lemma and functor (obtained with get_coref_text_nodes(), see the code). For example, the query [antes="tlemma=#PersPron.*"] lists tokens for which one of the antecedents is a personal pronoun. Note that this attribute will be either reduced or refined.
    • [coref_special="exoph"] (exophora, i.e. reference to an entity outside the text)
  • Discourse:
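    • [discourse_special="heading|caption|metatext"] (tokens in headings, captions or metatext)
    • [discourse_type="reason"] (tokens taking part in a discourse relation of the type reason)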
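The queries above rely on CQL treating an attribute value as a regular expression that must match the whole value. The following Python sketch mimics that matching with re.fullmatch to show which values the tag and parent patterns accept. It is an illustration, not KonText code: it assumes that the parent attribute stores the signed distance to the parent token (for example "+12"), which is our reading of the examples above, and the sample tags are made up. Python's re module rejects a pattern starting with a bare "+", so the sketch escapes it; the regex engine behind KonText may differ in such details.

```python
import re


def cql_value_matches(pattern: str, value: str) -> bool:
    """Approximate CQL value matching: the regex must cover the WHOLE value."""
    return re.fullmatch(pattern, value) is not None


# PDT positional tags: position 1 = part of speech, position 5 = case.
NOT_INSTRUMENTAL_NOUN = r"N...[^7].*"

# Hypothetical sample tags (15 positions each) for illustration only.
sample_tags = {
    "NNFS4-----A----": "noun, accusative",
    "NNFS7-----A----": "noun, instrumental",
    "VB-S---3P-AA---": "verb",
}
for tag, description in sample_tags.items():
    print(tag, description, cql_value_matches(NOT_INSTRUMENTAL_NOUN, tag))

# Assumption: parent holds the signed offset of the parent token, e.g. "+12".
# The leading "+" is escaped because Python's re would read it as a quantifier.
FAR_RIGHT_PARENT = r"\+(1|2).+"

# Check which offsets from +1 to +40 the pattern accepts.
matching_offsets = [
    offset
    for offset in range(1, 41)
    if cql_value_matches(FAR_RIGHT_PARENT, f"+{offset}")
]
print(matching_offsets)
```

Under these assumptions, the parent pattern accepts offsets 10 to 29 only: "+1" and "+2" are too short for the trailing ".+", and "+30" and above do not start with 1 or 2.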