Principal investigator (ÚFAL): 
Project Manager (ÚFAL): 
Provider: 
Grant id: 272323
Duration: 2023-2025

Recent improvements in neural models have surpassed our expectations in many areas of natural language processing. Although the quality of these models is in many cases comparable to human performance, a notable gap in output quality remains for highly specific tasks subject to particular constraints, for which we usually do not have enough training data. The dependence on large amounts of training data also brings a further problem: an inherent bias towards the facts and formulations exemplified in the data.

In many situations, the user of a trained model can provide not just the input but also additional, specific pieces of information which critically influence, in various respects, which outputs are desired and which are not. For instance, in speech translation, the gender of the speaker is critical information when translating from a language which rarely expresses gender into a language which requires this information for every verb. Another useful constraint is that the output has to meet certain presentation criteria, be it an overall length limit or a segmentation into shorter units than generally available parallel texts exhibit. Some of these constraints are global, in the sense that the entire run of the model in a given session should reflect them; others are local, in the sense that their values depend on the model's previous output.
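One common way to expose such a constraint to a sequence-to-sequence model is to encode it as a control token prepended to the source text. The sketch below is purely illustrative and is not the project's method: the model name is an arbitrary publicly available English-to-Czech system, and the gender tags are hypothetical control tokens that a real system would have to be trained or fine-tuned to interpret.

```python
# Illustrative sketch only: passing a global constraint (speaker gender) to a
# translation model as a prepended control token.  The model is a publicly
# available English-to-Czech system; the "<female>"/"<male>" tags are
# hypothetical and a real system would need to be trained with them.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-en-cs"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate_with_gender(text: str, speaker_gender: str) -> str:
    tagged = f"<{speaker_gender}> {text}"          # e.g. "<female> I was tired."
    inputs = tokenizer(tagged, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(translate_with_gender("I was tired yesterday.", "female"))
```

A global length or segmentation constraint could in principle be passed in the same way, for example as a bucketed length token, whereas a local constraint would have to be recomputed from the model's previous output as the session progresses.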

Until now, training a neural model for a task with predefined constraints has been tied to the particular task at hand. When there are very few training samples for the selected task (often precisely because of constraints to which the model was never exposed during training), the quality of the model can be improved by transferring knowledge from a different source domain in which the constraints are captured. This procedure is known as transfer learning. The simplest way to perform transfer learning is to take large-scale data, train a robust model, select the task-specific and usually underrepresented data, and fine-tune the pretrained model on it. Thanks to this approach, the dependence on large amounts of training data for constructing task-constrained models is reduced.
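As a point of reference, the fine-tuning step could look roughly as follows. This is a minimal sketch assuming a generic Hugging Face sequence-to-sequence model; the dataset file name, its fields, and the hyperparameters are placeholders rather than the project's actual setup.

```python
# Minimal fine-tuning sketch: adapt a pretrained seq2seq model to a small,
# constraint-specific parallel dataset.  All names are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/mt5-small"                      # robust, general-purpose model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# "constrained_domain.json" is a placeholder for the small task-specific data,
# assumed to contain "source" and "target" fields.
data = load_dataset("json", data_files="constrained_domain.json")["train"]

def preprocess(batch):
    enc = tokenizer(batch["source"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(batch["target"], truncation=True,
                              max_length=128)["input_ids"]
    return enc

data = data.map(preprocess, batched=True, remove_columns=data.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-constrained",
    learning_rate=1e-5,            # small learning rate: adapt rather than overwrite
    num_train_epochs=3,
    per_device_train_batch_size=8,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=data,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```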

Transfer learning is a powerful way to share learned knowledge across several source domains which are different but related. However, there are situations when the number of samples from the target domain is still insufficient, or when no suitable related dataset exists because the target task is too specific. In such situations, transfer learning as it is commonly known and used cannot be employed.

In this project, we will develop what is, to the best of our knowledge, an innovative view of transfer learning as a way of combining datasets of a different character. We anticipate that this approach can also be applied to other tasks with predefined constraints. In particular:

  1. We will explore datasets and provide a detailed classification of natural language processing tasks with global or local constraints, narrowing our focus to textual tasks so that the analysis can be more detailed.
  2. We will repurpose attribution techniques originally introduced to explain the predictions of a model trained for a specific task. Attribution methods typically compute a score for each input word of a sentence, identifying the words that contribute most to the model's decision (see the sketch after this list). By explicitly targeting a task which strongly expresses the given constraints, we expect to extract attribution scores that correlate well with the importance of the constraint-related features.
  3. We will analyze how to effectively apply these attribution scores to models trained for unconstrained tasks that are well covered by data.
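The following sketch shows one well-known attribution technique, gradient-times-input saliency over input embeddings, applied to an off-the-shelf sentiment classifier. Both the technique and the model are illustrative assumptions, not the specific attribution method the project will adopt.

```python
# Illustrative attribution sketch: gradient-times-input saliency, producing one
# importance score per input token of a text classifier.  Model choice is arbitrary.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def token_attributions(text: str):
    enc = tokenizer(text, return_tensors="pt")
    # Look up the input embeddings and make them a leaf so gradients are recorded.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds,
                   attention_mask=enc["attention_mask"]).logits
    predicted = logits.argmax(dim=-1).item()
    logits[0, predicted].backward()                           # gradient of the winning class
    scores = (embeds.grad * embeds).sum(dim=-1).squeeze(0)    # gradient x input
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

for token, score in token_attributions("She was very tired yesterday."):
    print(f"{token:>12s}  {score:+.4f}")
```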

By working on this project, we will consolidate knowledge about the datasets available in this research area, thoroughly analyze our re-framing of transfer learning, and establish a stable procedure which facilitates training models under given constraints.