Uniform Meaning Representation (UMR)

Computational natural language processing (NLP) in all its modalities (i.e., both spoken and written) has been developing rapidly, especially in recent years thanks to advances in language modelling with artificial neural networks. Progress in artificial intelligence systems, for which communication in written or spoken form is often essential, also contributes to this development. NLP systems are making their way into practice as well, in particular as machine translation between natural languages, as voice assistants developed by global companies, and as automatic "chatbots", both text and voice, used for example in customer service centers. Such systems are very useful in practice, but their quality varies greatly: some come close to the human level of communication (machine translation of written text in laboratory conditions), while others are better characterized as prototypes that fail in more difficult conditions (e.g., voice control of car navigation). Many of these systems are also available only for a limited number of languages (localizations into Czech in particular reach practice only after a delay of several years). In the future, the quality (in a broad sense) of these systems needs to improve in several directions: from the elimination of semantic and other critical errors, through the elimination of stereotypes in current large language models, to interpretability and self-explanation. The last requirement in particular is absolutely necessary if natural language communication (within artificial intelligence applications) is to penetrate into, e.g., medicine, national critical infrastructure, defense and security, tools for disaster avoidance and relief, systems of (semi)autonomous transport, or even scientific and research activity itself.
The submitted project focuses, at the level of basic research but with a vision of future use in natural language communication between humans and automatic systems, on the relationship between language and its "meaning," i.e., on the relationship of semantics and pragmatics to (structured) knowledge of the world. Current large language models, however successful in technology and applications, neither express this relationship explicitly nor work with it. If the relationship between language and its "meaning" is well understood in theory and practice, such systems will be able to eliminate or minimize the above problems of semantic inadequacy (e.g., in automatic translation) and gain the ability to explain and justify their decisions and communicate them effectively to humans. The multilingual aspect, which the project strongly emphasizes through the planned international cooperation by addressing many languages from different language groups, is an essential part of removing language barriers in communication between people (especially, but not only, in Europe) and between people and machines, including systems that use other components of artificial intelligence (e.g., image processing, planning, robot control, and others).

There are no Czech partners. The project is related to the U.S. CCIR grant on Uniform Meaning Representation led by Nianwen Xue (Brandeis University), with partners from the University of New Mexico and the University of Colorado Boulder.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
