In preparation for the transformation from the Penn to the Prague style of discourse annotation (WP1), the principal differences between the two discourse sense taxonomies were studied with respect to this direction of transformation.
The annotation of discourse relations in the Prague Discourse Treebank (PDiT) was extensively revised (WP2): altogether, approx. 3,600 contexts were manually checked, resulting in the annotation of approx. 400 new relations, the deletion of approx. 50 relations, and the correction of approx. 850 relations.
Extensive effort was dedicated to the correct transformation of Prague discourse types to Penn senses (WP3): in addition to consulting the two annotation manuals, thousands of discourse relations in the PDiT data were examined, resulting in many rules embedded in the transformation procedures. The discourse types of 1.8 percent of all discourse relations in the PDiT data had to be disambiguated manually.
An updated version of the Prague Discourse Treebank was published, reflecting all corrections and updates from WP2. The data were published in their native format (PDT-like), as well as, using the results of WP3, in the PDTB column format and taxonomy.
An article was prepared and, due to its length, submitted to a journal (PBML) instead of a conference.