To be held as part of the DMR 2026 workshop, co-located with LREC (Palma de Mallorca, Spain).
We are tentatively working with the following schedule:
Training data is largely based on UMR 2.1, but it is not identical. There are six languages with training data: English, Czech, Latin, Chinese, Arapaho, and Navajo. Test data may contain additional languages, leading to zero-shot scenarios. Note that we distinguish two types of data available for training: “clean” data, which should be reasonably similar to the gold-standard test data but is typically very small (in the case of Latin, non-existent), and “dirty” data, which is much larger, especially for Czech and English, but imperfect or incomplete in various respects (use at your own risk!).
All training and development data is freely available, without the need for registration or signing a contract. A temporary download URL is active during the shared task; after the shared task, the data will be published at a permanent location.
We have published a specification of the UMR file format. Participants will be expected to submit valid system outputs in the same format. Blind data provided as system input will be tokenized and segmented into sentences; to facilitate evaluation, system outputs must preserve this tokenization and segmentation.
At the beginning of the test phase, participants will receive blind test data, tokenized and segmented into sentences. Each document will be in a separate text file, one sentence per line, tokens separated by a space. For each input file, the participating system must generate a corresponding valid UMR file with exactly the same sentences and tokens. Each sentence must have all four annotation blocks (tokens, sentence-level graph, alignment, document-level graph); if the system cannot predict certain types of annotation (e.g. document-level relations), the corresponding block must still exist, even if empty. Such omissions will naturally be penalized by a lower score. We specifically point out that the token-node alignment should not be omitted, as it affects the mapping of system nodes to gold nodes and, consequently, the evaluation of all relations and attributes.
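The blind input itself is plain text. The following minimal sketch (in Python; the script name and paths are hypothetical, not part of the task infrastructure) reads one such document under the assumptions stated above: one sentence per line, tokens separated by a single space. A system must reproduce exactly these sentences and tokens in its output.

# read_blind.py -- illustrative only; reads a blind input document as described above
import sys

def read_blind_document(path):
    """Return a list of sentences, each a list of tokens (one sentence per line, space-separated)."""
    sentences = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if line:
                sentences.append(line.split(' '))
    return sentences

if __name__ == '__main__':
    for i, tokens in enumerate(read_blind_document(sys.argv[1]), start=1):
        print(f'sentence {i}: {len(tokens)} tokens')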
Submitted UMR files will first be checked by the validation script. A file that does not pass validation will not be processed by the scoring script, and its score will be set to 0. Not all tests available in the validation script have to be passed; it is sufficient to pass validation with the following options (replace myfile.umr with the path to the file being validated):
python validate.py --level 2 myfile.umr
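As a convenience, this step can be scripted over a whole submission. The sketch below is only illustrative: the directory layout is an assumption, and it also assumes that validate.py signals a failed validation through a non-zero exit status (only the command line above is taken from the task description).

# validate_all.py -- run the validator on every *.umr file in a (hypothetical) submission/ directory
import glob
import subprocess
import sys

failed = []
for path in sorted(glob.glob('submission/*.umr')):
    # assumption: a non-zero exit status means the file did not pass level-2 validation
    result = subprocess.run(['python', 'validate.py', '--level', '2', path])
    if result.returncode != 0:
        failed.append(path)

if failed:
    print('Files that would be scored 0:')
    for path in failed:
        print('  ' + path)
    sys.exit(1)
print('All files passed level-2 validation.')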
If the file passes validation, it will be compared with the gold-standard file and scored. As is usual in the evaluation of graph-based semantic representations, the main score is the F₁ of triples of the form (node0 :relation node1), (node :attribute value), or (node :concept concept). Node identifiers (variables) in the system-produced file do not have to match the IDs in the gold-standard file. The algorithm that maps system nodes to gold nodes is tailored to the specifics of Uniform Meaning Representation (in particular, the availability of node-token alignment); this contrasts with the Smatch score that is often used to evaluate AMR. The evaluation script can be invoked as follows:
perl compare_umr.pl GOLD goldfile.umr SYS myfile.umr --quiet
By default, the script runs in verbose mode and prints a lot of diagnostic information comparing the two files. With the --quiet option, it prints only the final score.
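To make the triple-based F₁ described above concrete, here is a small illustrative computation (this is not the official compare_umr.pl scorer). It assumes that system node variables have already been mapped to gold node identifiers, e.g. via the token alignment; the example triples are invented for illustration only.

# triple_f1.py -- illustrative F1 over (node, relation/attribute/concept, value) triples
def triple_f1(gold_triples, system_triples):
    """Both arguments are sets of (node, relation, value) triples with shared node identifiers."""
    matched = len(gold_triples & system_triples)
    precision = matched / len(system_triples) if system_triples else 0.0
    recall = matched / len(gold_triples) if gold_triples else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# invented toy example: 2 of 3 gold triples and 2 of 3 system triples match
gold = {('s1t', ':concept', 'taste-01'), ('s1t', ':ARG0', 's1p'), ('s1t', ':aspect', 'performance')}
system = {('s1t', ':concept', 'taste-01'), ('s1t', ':ARG0', 's1p'), ('s1t', ':ARG1', 's1f')}
print(triple_f1(gold, system))  # precision = recall = 2/3, so F1 = 2/3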
Besides the main metric used for ranking the participating systems, we also plan to compute various additional metrics (such as a separate F₁ score for concepts, or a score for sentence-level graphs, disregarding document-level relations).
The shared task is not divided into tracks. System outputs will be submitted to the task as a whole, and every submission will be evaluated with the same set of metrics.
Individuals and teams considering participation should register via a simple Google form (https://forms.gle/pc2c7A27TxeHjRKZ7). There is no deadline for registration, but the sooner the better, as we intend to send important information to registered participants by e-mail.
There are no restrictions on who can participate. (The two main organizers will not participate.)
The link to the submission form will be posted here before the test phase starts. Participants will submit system outputs (parsed data), not the systems themselves. Each submission will be automatically checked for validity, so that participants know whether their submission can be evaluated.
Questions? Contact the organizers: