Monday, May 2, 2016 - 13:30

Grammar-licensed Treebank of Czech

We describe the main features of a treebank of Czech licensed by an HPSG-style grammar. The treebank data are first parsed by a stochastic parser, and the parser output is converted to phrase-structure trees, which are then checked by the formal grammar. During this conversion, the information on terminal nodes is transformed into a three-dimensional structure: every node is described by its morphological, syntactic and lexical properties. For example, the relative pronoun 'který' can receive a different part of speech (POS) on each of the three levels of description: it is morphologically an adjective, syntactically a noun and lexically a pronoun.
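The three-level description of a terminal node can be pictured as a record with one part-of-speech field per level. The sketch below is hypothetical: the class `TerminalNode`, its field names and the method `levels_agree` are illustrative assumptions, not the treebank's actual data format. It only shows how 'který' ends up with three different values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalNode:
    """Hypothetical terminal node with three levels of description."""
    form: str
    morphological_pos: str
    syntactic_pos: str
    lexical_pos: str

    def levels_agree(self) -> bool:
        # True only if all three levels assign the same part of speech.
        return self.morphological_pos == self.syntactic_pos == self.lexical_pos


# The relative pronoun 'který', as described in the abstract.
ktery = TerminalNode(
    form="který",
    morphological_pos="adjective",  # morphological level
    syntactic_pos="noun",           # syntactic level
    lexical_pos="pronoun",          # lexical level
)

print(ktery.levels_agree())  # False
```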

The grammar works together with the VALLEX valency lexicon. Lexical rules derive surface valency frames for the various morphological forms of every verb (infinitive, indicative, l-participle, passive participle, etc.). The actual data are then matched against these surface frames. If the match succeeds, the resulting annotation is enriched with information derived from both the parsed data and the lexicon.
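The derive-then-match step can be sketched in a few lines. This is a toy illustration under loud assumptions: the frame representation (a functor mapped to a surface case and an obligatoriness flag), the functions `passive_rule` and `match`, and the frame for 'napsat' are all hypothetical simplifications, not the grammar's lexical rules or VALLEX's actual entries. Only a passive-participle rule is shown, and the returned word-to-functor labelling stands in for the enrichment step.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, simplified frame: functor -> (surface case, obligatory?)
Frame = Dict[str, Tuple[str, bool]]


def passive_rule(active: Frame) -> Frame:
    """Toy lexical rule deriving a passive-participle frame from an active one."""
    frame = dict(active)
    if "ACT" in frame:
        frame["ACT"] = ("ins", False)  # agent: optional instrumental
    if "PAT" in frame:
        frame["PAT"] = ("nom", True)   # patient: obligatory nominative subject
    return frame


def match(frame: Frame, dependents: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match dependents (word -> case) against a surface frame.

    Returns a word -> functor labelling on success, None on failure.
    Assumes each case occurs at most once in the frame.
    """
    by_case = {case: functor for functor, (case, _) in frame.items()}
    labels: Dict[str, str] = {}
    for word, case in dependents.items():
        functor = by_case.get(case)
        if functor is None or functor in labels.values():
            return None  # dependent not licensed, or slot already filled
        labels[word] = functor
    for functor, (_, obligatory) in frame.items():
        if obligatory and functor not in labels.values():
            return None  # obligatory slot left unfilled
    return labels


# Toy active frame for "napsat" (to write); not copied from VALLEX.
napsat: Frame = {"ACT": ("nom", True), "PAT": ("acc", True)}
passive = passive_rule(napsat)

# "Dopis byl napsán Petrem." (The letter was written by Petr.)
print(match(passive, {"dopis": "nom", "Petrem": "ins"}))
# {'dopis': 'PAT', 'Petrem': 'ACT'}
print(match(passive, {"Petrem": "ins"}))  # None: PAT slot unfilled
```

A realistic version would also have to handle prepositional forms, several slots sharing a case, and the other verb forms listed above; the sketch keeps only enough to show a derived frame accepting one sentence and rejecting an incomplete one.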