Year 2 - 2023

The research and work will be divided into four work packages:

WP4: Pre-annotation of the PDTSC data (JM: 80%, PS: 10%, MR: 10%)
  • research and preparation of automatic pre-annotation of inter-sentential relations of the PDTSC data
  • pre-annotation of the PDTSC data based on the previous point; intra-sentential relations will be pre-annotated using existing procedures based on the tectogrammatical layer and, newly, also on CzeDLex
WP5: Manual annotation of a PDTSC data sample (PS: 80%, JM: 10%, MR: 10%)
  • manual annotation of approx. 1000 sentences of the PDTSC data to test the pre-annotation procedure
WP6: Considering annotation of Faust data (MR: 50%, PS: 40%, JM: 10%)
  • assessing whether discourse annotation of individual sentences of the Faust corpus is meaningful
  • if considered useful, automatic pre-annotation of intra-sentential relations based on the existing manual tectogrammatical layer will be performed using the same scripts as in WP4
WP7: Manual corrections in the pre-annotated PCEDT-cz (PS: 70%, JM: 20%, MR: 10%)
  • classification of errors in the pre-annotation of the PCEDT-cz; the most severe or most common types of pre-annotation errors will be fixed manually

Results: Pre-annotated PDTSC (and possibly Faust) data will be ready for classification of errors and manual corrections in the subsequent year. The PCEDT-cz data will be ready for a joint publication in the subsequent year. An article with results from the first year of the project will be presented at an international conference and published in its proceedings. Theoretical and practical results of the second year will be used to prepare an article to be submitted to an international conference held in the following year (2024; LREC, Coling).