The Prague English Dependency Treebank 2.0 (PEDT 2.0) is a major update of the Prague English Dependency Treebank 1.0. It is a manually parsed English corpus of over 1.2 million running words in almost 50,000 sentences. The corpus contains the Penn Treebank Wall Street Journal section (LDC99T42). The original Penn Treebank-like file structure (25 sections, each containing up to one hundred files) has been preserved. The corpus is enhanced with comprehensive manual linguistic annotation in the PDT 2.0 style (LDC2006T01, Prague Dependency Treebank 2.0). The main features of this annotation style are:

  • dependency structure of content words, coordinations, and similar structures (function words are attached to content words as attribute values)
  • semantic labeling of content words and types of coordinating structures
  • argument structure, including an argument structure ("valency") lexicon
  • ellipsis and anaphora resolution.
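
To make the first two features more concrete, here is a minimal sketch of how such a tree could be modeled in memory. It is an illustration only and does not reproduce the corpus file format or any tool shipped with it: the `Node` class, the `attrs` keys (`aux`, `det`), and the `content_lemmas` helper are hypothetical names chosen for this example. The labels `PRED`, `ACT`, and `PAT` follow PDT-style semantic labels for predicate, actor, and patient.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical content-word node; not the actual PEDT data model."""
    lemma: str
    functor: str                                          # semantic label of the node
    attrs: Dict[str, str] = field(default_factory=dict)   # function words stored as attribute values
    children: List["Node"] = field(default_factory=list)  # dependent content words
    coref: Optional["Node"] = None                        # antecedent, if anaphora is resolved

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def content_lemmas(node: Node) -> List[str]:
    """Collect lemmas in pre-order (head first, then dependents)."""
    out = [node.lemma]
    for child in node.children:
        out.extend(content_lemmas(child))
    return out


# "The company will buy shares."
# The function words "the" and "will" do not get nodes of their own.
root = Node("buy", "PRED", attrs={"aux": "will"})
root.add(Node("company", "ACT", attrs={"det": "the"}))
root.add(Node("share", "PAT"))

print(content_lemmas(root))
```

Only the three content words appear as nodes; the auxiliary and the determiner survive as attribute values on the nodes they belong to, which is the design choice the first bullet describes.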