ÚFAL PhD conference

Monday, 11 December, 2023 - 14:00

Andrei-Alexandru Manea
Patrícia Schmidtová
Dima Taji (ÚFAL MFF UK)

14:00-14:25

Andrei-Alexandru Manea

Multilingual Vision-Language Models

Abstract: Vision-Language models are large neural networks that handle both image and text inputs and require a substantial volume of training data to address particular tasks, such as Visual Question Answering. Most current state-of-the-art Vision-Language models are trained on (mostly American) English and do not work with low-resource languages.

In this presentation, I will give more details about my current research plans. At the moment, I am working on extending state-of-the-art English Vision-Language models using parallel data. In the coming years, we aim to create a new Vision-Language evaluation task that is more challenging from the multilingual perspective, using the available annotation funds. Suggestions of any kind are welcome.

14:25-14:50

Patrícia Schmidtová

Semantic Accuracy in Natural Language Generation

Abstract: Large pre-trained language models (LLMs) that can follow instructions have become widely used by the public all over the world. However, their reliability is still an issue, with LLMs manifesting a phenomenon dubbed hallucination: the models generate plausible-sounding texts, which are not supported by the user-provided data or any knowledge base.

In this talk, I will discuss my plans to investigate the factors that contribute to the prevalence of hallucinations, with mitigation as the end goal. An additional challenge is that there is not yet a gold standard for determining whether a generated text contains a hallucination and, if so, to what extent.

14:50-15:15

Dima Taji

Coreference Resolution and Representation in Deep Universal Dependencies

Abstract: Coreference resolution has long been a task of interest to the research community. Varying approaches have been proposed and applied, and, alongside them, several different datasets have been created. Unfortunately, these datasets do not all follow the same annotation scheme, even within the same language. In spite of the various efforts to create multi-lingual datasets or harmonize the annotation schemes of existing ones, the training of multi-lingual coreference resolution parsers is still a challenge.

Our vision is to be able to integrate existing datasets with the widely used Universal Dependencies and CorefUD frameworks, as well as to create new datasets with the least amount of human intervention. In this talk, I will present the current state of knowledge and discuss my current plan to approach the aforementioned challenges.

*** The talks will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics