Identification and Prevention of Unwanted Gender Bias in Neural Language Models

Principal investigator (ÚFAL):

David Mareček

Project Manager (ÚFAL):

Hana Kubištová

Provider:

GAČR

Grant id:

23-06912S

ÚFAL budget:

1.721 mil.

Duration:

2023-2024

People:

Tomáš Musil

Recent years have seen remarkable success of deep neural networks in a wide range of Natural Language Processing tasks (e.g. machine translation or question answering). However, large neural networks behave as black boxes: we can observe only a model's inputs and outputs, while everything in between is opaque. It has been shown that models trained on large raw corpora are prone to learning unfair biases present in the data. This project aims to investigate gender biases learned by the Transformer, a neural network architecture widely used in NLP. We will analyze the Transformer's contextual representations of words and search for a transformation that projects them into a vector space in which gender bias is well separated and can be filtered out. At the same time, we want to preserve factual gender information, such as pronouns or gendered words like 'boy' or 'queen', which makes this task challenging. Our methods will be generalized to machine translation from English into morphologically rich languages, in order to mitigate gender bias and reduce gender errors in the generated output texts.
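The idea of projecting representations so that one component can be filtered out can be illustrated with a minimal sketch. It is not the project's method: it assumes, purely for illustration, that the unwanted component lies along a single known direction (here the hypothetical `bias_axis`) and uses synthetic vectors in place of real Transformer representations. The helper `remove_direction` is likewise a made-up name.

```python
import numpy as np


def remove_direction(X, direction):
    """Project each row of X onto the subspace orthogonal to `direction`."""
    d = direction / np.linalg.norm(direction)
    # Subtract each row's component along d.
    return X - np.outer(X @ d, d)


rng = np.random.default_rng(0)
dim = 8

# Hypothetical bias axis (an assumption); in practice it would have to be
# estimated from data and need not be a single direction.
bias_axis = rng.normal(size=dim)
bias_axis /= np.linalg.norm(bias_axis)

# Synthetic stand-ins for contextual word representations:
# random vectors shifted by varying amounts along the bias axis.
base = rng.normal(size=(6, dim))
offsets = np.array([1.0, -1.0, 0.5, -0.5, 2.0, -2.0])
X = base + offsets[:, None] * bias_axis

X_filtered = remove_direction(X, bias_axis)

# After filtering, no row has any component left along the bias axis.
residual = X_filtered @ bias_axis
```

Such a projection removes everything along the chosen direction, including any factual gender information encoded there, which is exactly why the separation described above is the hard part of the task.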

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
