Introduction

The Prague Dependency Treebank 3.0 (PDT 3.0) annotates the same texts as PDT 2.0 (Hajič et al. 2006), PDT 2.5 (Bejček et al. 2011), and the Prague Discourse Treebank 1.0 (PDiT 1.0, Poláková et al. 2012). The annotation on all four layers has been further corrected and improved in various respects. In addition, new information was added to the data with each release:

  • from PDT 2.0 to PDT 2.5
    • Multiword expressions
    • Pair/group meaning
    • Clause segmentation
  • from PDT 2.5 to PDiT 1.0
    • Extended textual coreference
    • Bridging anaphora
    • Discourse relations marked by explicit connectives
  • from PDiT 1.0 to PDT 3.0
    • Revision of several grammatemes
    • Revision of sentence modality annotation
    • Replacement of t_lemma #Benef
    • Genres of documents
    • Pronominal textual coreference of 1st and 2nd person
    • Updated discourse relations marked by explicit connectives

All of the additional annotation, with the exception of clause segmentation, was performed on the tectogrammatical trees and is technically part of the underlying syntax layer of the PDT. Clause segmentation was annotated on the analytical layer. Numerous errors were fixed across all layers of annotation.

The Prague Dependency Treebank 3.0 can be downloaded from the LINDAT-Clarin repository (see the Licence).

UPDATE (2016): An updated annotation of discourse relations in the PDT, newly enriched with the annotation of secondary connectives, was published in December 2016 as the Prague Discourse Treebank 2.0.