Year 1 - 2022
Research will follow three directions, organized into three work packages:
WP1: Pre-annotation of the PCEDT-cz data (JM: 50%, PS: 30%, MR: 20%)
update of the projection of the PDTB discourse annotation from the English part of the PCEDT to its Czech part, using existing scripts and the new alignment available in the new version of the PCEDT; preparation of the transformation to the Prague system of discourse annotation: study of the principal differences between the discourse sense taxonomies in the direction from the PDTB to the Prague formalism, and formulation of the best strategy for converting the senses
automatic transformation of the PCEDT-cz data, following the conversion strategy formulated in the previous point
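The sense conversion planned in WP1 could, for instance, be organized as a lookup table with a fallback to manual review. The following Python sketch is only an illustration under stated assumptions: the table `PDTB_TO_PRAGUE`, the class `Conversion`, and the function `convert_sense` are hypothetical names, and the three sense pairs are placeholders rather than results of the planned study of the two taxonomies.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical, incomplete mapping from PDTB-style sense labels to
# Prague-style discourse types. The pairs are placeholders for illustration,
# not the conversion table the project will produce.
PDTB_TO_PRAGUE: Dict[str, str] = {
    "Expansion.Conjunction": "conjunction",
    "Comparison.Concession": "concession",
    "Contingency.Cause": "reason-result",
}


@dataclass
class Conversion:
    pdtb_sense: str
    prague_sense: Optional[str]  # None when no mapping was found
    needs_review: bool           # True when a human should check the result


def convert_sense(pdtb_sense: str,
                  table: Dict[str, str] = PDTB_TO_PRAGUE) -> Conversion:
    """Map one PDTB sense label, backing off to coarser hierarchy levels."""
    parts = pdtb_sense.split(".")
    # Try the full label first, then drop the most specific level each time.
    for depth in range(len(parts), 0, -1):
        key = ".".join(parts[:depth])
        if key in table:
            # A match found only after backing off is flagged for review.
            return Conversion(pdtb_sense, table[key],
                              needs_review=depth < len(parts))
    # No level of the label is covered by the table.
    return Conversion(pdtb_sense, None, needs_review=True)


if __name__ == "__main__":
    for sense in ("Expansion.Conjunction",
                  "Contingency.Cause.Reason",
                  "Temporal.Synchronous"):
        print(convert_sense(sense))
```

Flagging backed-off and unmapped senses, rather than guessing, keeps the automatic output separable from the cases left for error classification and manual correction.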
WP2: Manual corrections of PDT discourse data (PS: 70%, JM: 20%, MR: 10%)
manual corrections and updates of the PDT data, based on logs and notes from previous work on CzeDLex, the lexicon of Czech discourse connectives
WP3: Transformation of PDT data to the PDTB format and taxonomy (JM: 50%, PS: 30%, MR: 20%)
research into the principal differences between the two formalisms, focused on the taxonomies of discourse senses and on annotation preferences, in the direction from the Prague system to the PDTB system
transformation of Prague senses to the PDTB taxonomy in the PDT data; manual fixes of the most important transformation errors
Results: An updated version of the PDT discourse data, reflecting all corrections and updates from WP2, will be published and made immediately available to the scientific community. The data will be published in its native (PDT-like) format and, using the results of WP3, in the PDTB column format and taxonomy. Pre-annotated PCEDT-cz data will be ready for error classification and manual corrections in the subsequent year. The theoretical and practical results of the research will be used to prepare an article to be submitted to an international conference taking place the following year (2023; TSD, TLT).