Monday, 28 March, 2022 - 14:00

Multilingual Coreference Resolution with Harmonized Annotation

Miloslav Konopík
Ondřej Pražák (Faculty of Applied Sciences, University of West Bohemia, Plzeň)

We will describe an end-to-end coreference resolution system and experiments on the recently created multilingual corpus CorefUD. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joint models: one for Slavic languages and one for all the languages together. We rely on an end-to-end deep learning model adapted for the CorefUD corpus. We discuss the difficulties we faced, mainly those arising from differences between the corpora in different languages. Next, we focus on the problem of predicting singleton relations. Finally, we discuss the benefits of harmonized annotations. We will show that joint models help significantly for languages with less training data.


*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ***