Monday, 28 November, 2022 - 14:00

UFAL Ph.D. Student Conference

Federica Gamba: Dealing with Latin variability in dependency parsing

Abstract: Spread over a span of more than two millennia and all across an area that corresponds to today’s Europe, the Latin language has undergone a number of significant changes, affecting not only syntax but each of its linguistic layers. Diachronic, diatopic and diaphasic variability (respectively, variability in time, space and context) thus represents a non-negligible issue when it comes to parsing Latin.
By exploiting the five treebanks already available in Universal Dependencies, I will examine how Latin varies across time, space, and literary genres, in order to detect the most diverse linguistic phenomena, as well as the most challenging ones for parsers, and try to define a strategy to handle such diversity in parsing. Particular attention will also be paid to divergences in annotation, which survive despite UD's standardization effort and represent a first significant setback.
In the talk I will present the current state of research on the topic, and outline some of the main challenges to be faced, with a special focus on the issue of divergent annotation choices in UD.

Dávid Javorský: Constraints for Neural Networks in NLP

Abstract: For a given task, the output quality of neural models is primarily affected by the amount of training data and the network size, which brings the problem of an inherent bias towards facts and formulations exemplified in the data. One option to combat the bias for a particular use case could be to provide constraints during inference. Furthermore, the availability of training data is closely tied to its complexity: corpora with simple or no annotation (e.g. monolingual texts, parallel corpora) are more accessible than high-level structural datasets (e.g. coreference). The quality of task-specific models is then typically improved by using both sources, an approach generally known as transfer learning. In this talk, I will show that transfer learning can be viewed from the perspective of combining datasets of different character, presenting shortening machine translation as an exemplary study of neural modeling with constraints. I will also illustrate that this approach can be further extended to other areas of neural modeling with pre-defined constraints.

Ondřej Plátek: Explainable Evaluation Metrics For TTS and NLG in Task Oriented Dialogue

Abstract: A spoken dialogue system typically produces a textual response to the user's input using a natural language generation (NLG) module, which is subsequently read to the user by a text-to-speech (TTS) module. Both NLG and TTS should adapt to the dialogue context. However, conversational TTS literature that considers dialogue context for synthesizing speech is almost non-existent, and to our knowledge there is no established evaluation method for synthetic conversational speech. While specific NLG models for dialogue response are available, there is no clear choice or straightforward interpretation of the currently used evaluation metrics.
In this talk, I will briefly illustrate the problems of evaluation metrics for TTS and NLG in dialogue, specifically focusing on modern trainable neural metrics. I argue that evaluation is a tool for a researcher and is only valid as long as it is understood. This is an issue because the current neural metrics can be hard to interpret and suffer from poor generalization to unseen data. I plan to improve trainable metrics by using dialogue context and to make them more interpretable using model confidence, local explanation, and outlier detection methods. This should lead to a better correlation with human judgments and a better understanding of the qualitative properties of the improved metrics.