Event-type Ontology in Multiple Languages

(Czech, English, German, Spanish)

 

 Introduction

The project aims to create the specification and definitions of a hierarchical event-type ontology, populated with words denoting events or states (primarily verbs, verbal nouns, and adjectives, but also any other single- or multiword units denoting events or states). It links its entries, or "classes" (and the words that evoke them), to several existing lexical resources with, to some extent, similar goals; such linking allows for both theoretical and practical comparison and use of the resources.

 

Quick links to the current version of SynSemClass: the browser, the search tool, download (ZIP/XML), source code (GitHub).

 

The project has evolved dynamically from the original idea of creating a bilingual synonym lexicon, which focused on contextually-based synonymy and valency of verbs in a bilingual setting. At its core was the analysis of semantic ‘equivalence’ (synonymy or near-synonymy) of verb senses and of their valency behavior in parallel Czech-English language resources (linked to the Prague Dependency Treebank family and, in turn, to the underlying Functional Generative Description theory). It used translational context, which helped to define the properties of classes of synonymous verb senses. An initial sample bilingual verb lexicon, with classes representing synonym or near-synonym pairs of verbs (verb senses) and based on richly annotated corpora and existing lexical resources, was created. Building on the contribution of that original project, a follow-up project was set up to add more languages, moving towards an ontology view of event types.

In fact, the original bilingual approach itself had a predecessor, which was of great use (both as a resource and as an inspiration): the project A comparison of Czech and English verbal valency based on corpus material (theory and practice). In all of these projects, the richly annotated Prague Czech-English Dependency Treebank (PCEDT) served as the source of corpus material and examples, which are also included in the resulting resources.

For other languages, standard parallel corpora are used with automatic alignment techniques to preselect words that possibly evoke the existing classes; the usual manual annotation (i.e., entry expansion) process then begins, supported by various analytical and annotation tools.
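The alignment-based preselection step can be illustrated with a small sketch: given word-aligned sentence pairs and a mapping from already annotated (here English) lemmas to the classes they belong to, each class collects the target-language lemmas aligned with its members, ranked by frequency, as candidates for manual annotation. The input format, the function name `preselect_candidates`, the class ID `CLASS-COMPLAIN`, and the toy German sentences are all hypothetical; this is not the project's actual alignment tooling or thresholding.

```python
from collections import Counter, defaultdict


def preselect_candidates(aligned_pairs, class_members, min_count=1):
    """Hypothetical sketch: propose target-language lemmas for existing classes.

    aligned_pairs: iterable of (src_lemmas, tgt_lemmas, links), where links is a
        list of (i, j) pairs aligning src_lemmas[i] with tgt_lemmas[j].
    class_members: mapping from a source-language lemma to the set of class IDs
        it is already a member of (assumed format).
    Returns {class_id: [(tgt_lemma, count), ...]} sorted by descending count.
    """
    counts = defaultdict(Counter)
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            # Every class the aligned source lemma belongs to gets one vote
            # for the target lemma it is aligned with.
            for class_id in class_members.get(src[i], ()):
                counts[class_id][tgt[j]] += 1
    # Keep only candidates seen at least min_count times; a human annotator
    # still decides which of them really evoke the class.
    return {
        class_id: [(lemma, n) for lemma, n in counter.most_common() if n >= min_count]
        for class_id, counter in counts.items()
    }


# Toy, lemmatized English-German pairs (made up for illustration).
pairs = [
    (["he", "complain", "about", "noise"], ["er", "beschweren", "über", "Lärm"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
    (["they", "complain", "loudly"], ["sie", "klagen", "laut"],
     [(0, 0), (1, 1), (2, 2)]),
    (["she", "complain"], ["sie", "beschweren"], [(0, 0), (1, 1)]),
]
members = {"complain": {"CLASS-COMPLAIN"}}
print(preselect_candidates(pairs, members))
# {'CLASS-COMPLAIN': [('beschweren', 2), ('klagen', 1)]}
```

Ranking by raw alignment frequency keeps the sketch simple; a real pipeline would likely also filter by part of speech and alignment confidence before annotators expand the entries.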

The resulting resource will always be freely available for academic purposes through the LINDAT/CLARIAH-CZ Research Infrastructure repository and online services, including a web-based browsing and search tool. The central repository of the source code developed under the SynSemClass project is hosted on GitHub. The resource is also linked back to the Unified Verb Index available at the University of Colorado Boulder.

 

The project uses (and links back to) the following lexical resources:

Czech:

English: 

German: 

Spanish:

 

Research Team

Starting from the beginning:  

Starting in 2021:

  • Prof. Georg Rehm, Ph.D., is the Co-PI of the Humane AI Net microproject META-O-NLU, which added the German language.
  • Karolina Zaczynska, M.A., is leading and coordinating the German part.
  • PhDr. Kateřina Rysová, Ph.D. is co-leading and coordinating the German part.

Starting in 2023:  

  • Cristina Fernández Alcaina, Ph.D., is in charge of the Spanish part of the SynSemClass project.
  • RNDr. Jana Straková, Ph.D., has started using large language models in the creation of synonym classes.

Starting in 2025:  

 

Annotators

  • Czech: Darya Klambotskaya (2021-2023), Petr Kujal (2021-2024), Markéta Malcová (2021-2024), Tomáš Razím (2023-present), Anna Staňková (2023-present)
  • English: Kathryn Conger (2021-2022), Darya Klambotskaya (2021-2023), Petr Kujal (2021-2024), Markéta Malcová (2021-2024), Tomáš Razím (2023-present), Anna Staňková (2023-present)
  • German: Charlotte Friedrich (2021-2022), Danny Srp (2021-2022), PhDr. Kateřina Rysová, Ph.D. (2021-2022), Ilyas Zivana (2022-2025), Anna Magdalena Fišerová (2022-2025), Kryštof Navrátil (2023-2025)
  • Spanish: Alba E. Ruz Gómez (2021-2025), Cristina Lara Clares (2021-2022), Laura Martínez Abarca (2021-2025)

Since mid-2025 (Czech and English): Anna Houžvičková, Vladimír Ludvík, Anna Staňková, Tomáš Razím, Klára Zborníková, Daniel Žák

 

SynSemClass versions:

  • SynSemClass 5.5 English-Czech-German-Spanish

         (browsable here and searchable here), issued 2025

SynSemClass 5.5 builds on previous versions (see below for all versions and contributors) but substantially enriches them with new Czech, English, and Spanish synonym classes and their class members. The number of classes has risen from 1546 to 1993. Moreover, version 5.5 introduces two novelties: hierarchical relations and Czech deverbal nouns (a small sample). The hierarchical structure models specialization and generalization relations between classes that are formally and technically unrelated in the original ontology. The goal is to enable the use of the ontology, enriched by the hierarchical concepts, for annotating running texts in symbolic meaning representations such as UMR or PDT. The hierarchy is in principle built bottom-up, based on the existing SSC classes (concepts). This approach differs from other approaches to semantic classes, such as those in WordNet or VerbNet: although the hierarchical relations are similar, the nodes they connect are not.
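To make the bottom-up specialization/generalization idea concrete, the following sketch models the hierarchy as a mapping from each class to its direct generalizations and walks it upwards. The class names are made up, and both the data layout and the possibility of several parents per class are assumptions; this is not the format of the SynSemClass 5.5 release.

```python
from collections import deque


def generalizations(class_id, parents):
    """Return all more general classes of class_id, nearest first (BFS).

    parents: hypothetical mapping from a class to the list of its direct
    generalizations; multiple parents are allowed here as an assumption.
    """
    result, seen = [], {class_id}
    queue = deque(parents.get(class_id, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # guard against revisiting shared ancestors
            continue
        seen.add(node)
        result.append(node)
        queue.extend(parents.get(node, []))
    return result


def is_specialization_of(specific, general, parents):
    """True if `general` is reachable upwards from `specific`."""
    return general in generalizations(specific, parents)


# Hypothetical fragment: two existing classes grouped bottom-up under a new,
# more general node, which is itself placed under an even more general one.
parents = {
    "complain": ["express_dissatisfaction"],
    "protest": ["express_dissatisfaction"],
    "express_dissatisfaction": ["communicate"],
}
print(generalizations("complain", parents))  # ['express_dissatisfaction', 'communicate']
print(is_specialization_of("communicate", "complain", parents))  # False
```

Storing only upward links mirrors the bottom-up construction: existing classes stay unchanged, and new, more general nodes are added above them.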

  • SynSemClass 5.1 English-Czech-German-Spanish

         (browsable here and searchable here), issued 2024

The major change compared to v5.0 is that the links to the English Princeton WordNet and to the German GUP point to their new versions and to the new websites that host them. The English WordNet links now point to the Open English WordNet, a fork of the Princeton WordNet developed under an open-source methodology and released through the Open English WordNet website. The German Universal PropBank (GUP) is now part of the Universal PropBanks.

  • SynSemClass 5.0 English-Czech-German-Spanish

        (browsable here and searchable here), issued 2023

Version 5.0 enriches the previous editions with a new language, Spanish. The existing languages (English, Czech, and German) are further substantially extended with a larger number of classes.

  • SynSemClass 4.0 English-Czech-German

        (browsable here and searchable here), issued 2022

  • SynSemClass 3.5 English-Czech-German

         (browsable here), issued 2021

Version 3.5 adds German verb synonyms (454 classes). The German part of the lexicon was created within the project Multilingual Event-Type-Anchored Ontology for Natural Language Understanding (META-O-NLU), a microproject that lasted from January to June 2021, as part of the Humane AI Net EC-funded Center of Excellence in AI (project No. 952026), by two cooperating teams:

  1. The team of the Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague (ÚFAL), Czech Republic - PhDr. Zdeňka Urešová, Ph.D., Mgr. Eva Fučíková, PhDr. Kateřina Rysová, Ph.D., prof. RNDr. Jan Hajič, Dr., and prof. PhDr. Eva Hajičová, DrSc.
  2. The team of the German Research Center for Artificial Intelligence (DFKI) Speech and Language Technology, Berlin, Germany - prof. Dr. Phil. Georg Rehm, Karolina Zaczynska, and Peter Bourgonje (currently from the University of Potsdam).

  • SynSemClass 3.0 Czech-English

         (searchable here), issued 2020

  • SynSemClass 2.0 Czech-English

         (searchable here), issued 2020

  • SynSemClass 1.0 (formerly CzEngClass) Czech-English

         (searchable here), issued 2019

 

SynSemClass (CzEngClass)

 Annotation process

 

 

SynSemClass (CzEngClass) structure

 

SynSemClass 

The overall scheme of the SynSemClass lexicon and an example of a class ("complain-stěžovat si")

 

Support acknowledgements (main grants only)

2017-2019

  • Grant Agency of the Czech Republic: No. GA17-07313S - "Contextually-based synonymy and valency of verbs in a bilingual setting" (CzEngClass)

2019-2022

2020-2026

  • Grant Agency of the Czech Republic: project No. GX20-16819X - "LuSyD" (Language Understanding: from Syntax to Discourse).
  • Humane AI Net: EC 952026 - within the microproject "Multilingual Event-type-anchored Ontology for Natural Language Understanding (META-O-NLU)"

2023-2027

  • Ministry of Education, Youth and Sports of the Czech Republic (MŠMT), LRI program: project No. LM2023062 - LINDAT/CLARIAH-CZ

 

References

Data/Software

  • Urešová Zdeňka, Fučíková Eva, Hajič Jan, Kolářová Veronika, Fernández Alcaina Cristina, Bourgonje Peter, Hajičová Eva, Rehm Georg, Rysová Kateřina, Zaczynska Karolina: SynSemClass 5.5. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, http://hdl.handle.net/11234/1-5915, Jun 2025

  • Urešová Zdeňka, Fučíková Eva, Fernández Alcaina Cristina, Hajič Jan: SynSemClass 5.0. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, https://lindat.cz/services/SynSemClass50/, Nov 2023

  • Petliak Nataliia, Fučíková Eva, Hajič Jan, Urešová Zdeňka: SynSemClass Search Tool. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, https://lindat.mff.cuni.cz/services/SynSemClassSearch/, Oct 2023

  • Urešová Zdeňka, Bourgonje Peter, Fučíková Eva, Hajič Jan, Hajičová Eva, Rysová Kateřina, Rehm Georg, Zaczynska Karolina: SynSemClass 4.0. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, https://lindat.cz/services/SynSemClass40/, Jun 2022

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan, Rehm Georg, Zaczynska Karolina: SynSemClass 3.5. Data/software, LINDAT/CLARIN Research Infrastructure, https://lindat.cz/services/SynSemClass35/, 2021

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: SynSemClass 2.0. Data/software, LINDAT/CLARIN Research Infrastructure, https://ufal.mff.cuni.cz/synsemclass, 2020

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: SynSemClass 3.0. Data/software, LINDAT/CLARIN Research Infrastructure, https://ufal.mff.cuni.cz/synsemclass, 2020

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: SynSemClass 1.0. Data/software, LINDAT/CLARIN Research Infrastructure, https://ufal.mff.cuni.cz/synsemclass, 2019

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: CzEngClass 0.1. Data/software, LINDAT/CLARIN Research Infrastructure, http://hdl.handle.net/11234/1-2823, 2018

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: CzEngClass 0.2. Data/software, LINDAT/CLARIN Research Infrastructure, http://hdl.handle.net/11234/1-2824, 2018

Publications

 

How to cite

If you use SynSemClass (or CzEngClass), please cite it using the BibTeX entries below, or cite one or more of the relevant papers listed above (for a plain-text citation format).

@article{ biblio:UrFuCzEngClass2017,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {CzEngClass -- Towards a Lexicon of verb Synonyms with Valency linked to Semantic Roles},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}}},
year = {2017},
volume = {68},
number = {2},
pages = {364--371},
issn = {0021-5597}
}

@article{ biblio:UrFuMeaningand2019,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {Meaning and Semantic Roles in CzEngClass Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2019},
address = {Bratislava, Slovakia},
volume = {70},
number = {2},
pages = {403--411},
issn = {0021-5597}
}

@article{ biblio:UrFuACROSSLINGUAL2018,
journal = {Prace Filologiczne},
title = {A {CROSS}-{LINGUAL} {SYNONYM} {CLASSES} {LEXICON}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
volume = {{LXXII}},
pages = {405--418},
issn = {0138-0567}
}

@inproceedings{ biblio:UrFuCreatinga2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Creating a Verb Synonym Lexicon Based on a Parallel Corpus},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {1432--1437},
isbn = {979-10-95546-00-9}
}

@inproceedings{ biblio:UrFuDefiningVerbal2018,
booktitle = {Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories ({TLT} 2018)},
title = {Defining Verbal Synonyms: between Syntax and Semantics},
editor = {Dag Haug and Stephan Oepen and Lilja {\O{}}vrelid and Marie Candito and Jan Haji{\v{c}}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {Link{\"{o}}ping University Electronic Press},
organization = {Universitetet i Oslo},
address = {Link{\"{o}}ping, Sweden},
venue = {Universitetet i Oslo},
number = {155},
pages = {75--90},
isbn = {978-91-7685-137-1},
issn = {1650-3740}
}

@inproceedings{ biblio:UrFuSynonymyin2018,
booktitle = {Proceedings of The 27th International Conference on Computational Linguistics },
title = {Synonymy in Bilingual Context: The CzEngClass Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {{ICCL}},
organization = {{ICCL}},
address = {Sheffield, {GB}},
pages = {2456--2469},
isbn = {978-4-87974-703-7}
}

@inproceedings{ biblio:UrFuToolsfor2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Tools for Building an Interlinked Multilingual Synonym Lexicon Network},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {850--856},
isbn = {979-10-95546-00-9}
}

@unpublished{ biblio:UrZaMakinga2022,
title = {Making a Semantic Event-type Ontology Multilingual},
editor = {Nicoletta Calzolari and Fr{\'{e}}d{\'{e}}ric B{\'{e}}chet and Philippe Blache and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Karolina Zaczynska and Peter Bourgonje and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Georg Rehm and Jan Haji{\v{c}}},
year = {2022},
publisher = {European Language Resources Association},
address = {Marseille, France},
venue = {Le Palais du Pharo},
pages = {1--10},
}

@article{ biblio:FeFuSpanishSynonyms2023,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {Spanish Synonyms as Part of a Multilingual Event-Type Ontology},
author = {Cristina Fern{\'{a}}ndez Alcaina and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
volume = {74},
number = {1},
pages = {153--162},
issn = {0021-5597},
}

@inproceedings{ biblio:FeFuSpanishVerbal2023,
booktitle = {Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories},
title = {Spanish Verbal Synonyms in the SynSemClass Ontology},
editor = {Sandra K{\"{u}}bler},
author = {Cristina Fern{\'{a}}ndez-Alcaina and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {Association for Computational Linguistics},
organization = {Association for Computational Linguistics},
address = {Washington, D.C., {USA}},
venue = {Georgetown University in Washington D.C.},
pages = {11--20},
isbn = {978-1-959429-33-3},
}

@inproceedings{ biblio:FuHaCorpusBasedMultilingual2023,
booktitle = {Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories},
title = {Corpus-Based Multilingual Event-type Ontology: annotation tools and principles},
editor = {Sandra K{\"{u}}bler},
author = {Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {Association for Computational Linguistics},
organization = {Association for Computational Linguistics},
address = {Washington, D.C., {USA}},
venue = {Georgetown University in Washington D.C.},
pages = {1--10},
isbn = {978-1-959429-33-3},
}

@misc{ biblio:PeFuSynSemClassSearch2023,
title = {SynSemClass Search Tool},
author = {Nataliia Petliak and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {{LINDAT}/{CLARIN} Research Infrastructure},
organization = {Charles University},
address = {Prague, Czech Republic},
}

@inproceedings{ biblio:StFuExtendingan2023,
booktitle = {Proceedings of the 17th Linguistic Annotation Workshop},
title = {Extending an Event-type Ontology: Adding Verbs and Classes using Fine-tuned {LLM}s Suggestions},
editor = {Jakob Prange and Annemarie Friedrich},
author = {Jana Strakov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {Association for Computational Linguistics},
organization = {Association for Computational Linguistics},
address = {Stroudsburg, {PA}, {USA}},
pages = {85--95},
}

@inproceedings{uresova-etal-2025-creating,
    title = "Creating Hierarchical Relations in a Multilingual Event-type Ontology",
    author = "Ure{\v{s}}ov{\'a}, Zde{\v{n}}ka  and
      Fu{\v{c}}{\'i}kov{\'a}, Eva  and
      Haji{\v{c}}, Jan",
    editor = "Peng, Siyao  and
      Rehbein, Ines",
    booktitle = "Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.law-1.19/",
    doi = "10.18653/v1/2025.law-1.19",
    pages = "240--249",
    ISBN = "979-8-89176-262-6",
   }