Monday, 9 December, 2024 - 14:00
Room: S1

HiČKoK: History of Czech in Corpus Continuum

We will present the ongoing TAČR project focused on the morphological annotation of texts from all historical stages of the Czech language. The goal of the project is to connect text corpora of different periods, which have so far been built independently at different institutes, and to enrich them with lemmatization and uniform morphological annotation according to the Universal Dependencies standard. Manually annotated datasets will subsequently be used to train models capable of annotating other historical texts. After the initial overview, we will focus on some issues in designing a uniform description of the changing language, especially in its oldest period (14th-15th centuries).

The talk will be in English if there are people in the audience who prefer English; otherwise it will be in Czech.

*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***