Uniform Meaning Representation for a low-resource language (Persian)

Project Manager (ÚFAL):

Hana Kubištová

Provider:

GAUK

Grant id:

394625

Duration:

2025 - 2027

Tags:

Corpora

Semantics

People:

Daniel Zeman

Multilingual research plays a crucial role in natural language processing (NLP): it facilitates global communication, helps preserve linguistic diversity, supports the understanding of cultural nuances, and enhances business activities in international markets. In other words, such studies provide a foundation for building artificial intelligence (AI) models that can process natural languages regardless of their differences. Uniform Meaning Representation (UMR), which is primarily based on Abstract Meaning Representation (AMR), is one of the frameworks that provide a consistent semantic representation across languages, facilitating better understanding and processing of multilingual data. Furthermore, the framework gives considerable guidance on representing low-resource languages that are typologically quite distinct from languages like English. By abstracting away from language-specific syntax and focusing on the underlying meaning, it supports better semantic processing of languages with limited data.

UMR corpora exist for only a small number of languages, such as English, Czech, Chinese, Arapaho, Kukama, Navajo, and Sanapaná. The current research intends to apply the UMR framework to Persian for the first time. Persian is an Indo-European language with a rich morphology. The proposed research can therefore not only boost NLP capabilities for Persian but also advance the wider field of multilingual semantic representation. It will provide a valuable resource for future research in Persian linguistics and in computational linguistics applications such as translation, summarization, and information extraction, because UMR consists of both a sentence-level representation that focuses on predicate-argument structures and a document-level representation that captures semantic relations that go beyond sentence boundaries.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
