The project aims to research and develop methods for cost-effective discourse annotation of the various types of text corpora available in the Prague Dependency Treebank - Consolidated 1.0 (PDT-C). We will use and further develop existing methods for automatic pre-annotation of the data and, within the limits of this small project, carry out the most important manual corrections of the pre-annotated data. The result will be a unique discourse-annotated corpus of Czech that is diversified by text type. The project will deal with explicit discourse relations marked by so-called primary discourse connectives. Research effort will be dedicated to making the results (both theoretical and practical) available to the international scientific community, including transformation and publication of the data in the widely used Penn Discourse Treebank (PDTB) format and taxonomy. The outcomes will contribute both to theoretical knowledge about discourse relations in various types of Czech texts, in particular, and for the first time, in spoken and translated data, and to natural language processing related to discourse relations.

The main goals of the project are:

  • to develop cost-effective semi-automatic methods for discourse annotation in various types of text corpora in the Prague Dependency Treebank - Consolidated (PDT-C),
  • to create a genre-diversified discourse-annotated resource in Czech,
  • to improve the existing discourse annotation of the PDT part of the PDT-C.

Progress in the individual project years is described on separate web pages: