Event-type Ontology in Multiple Languages

(Czech, English, German, Spanish)

 

Introduction

 

The project aims to create the specification and definitions of a hierarchical event-type ontology, populated with words denoting events or states (primarily verbs, verbal nouns, and adjectives, but also any other single- or multiword units denoting events or states). It links its entries, or "classes" (and the words that evoke them), to several existing lexical resources that have, to some extent, similar goals. Such linking allows the resources to be compared and used together, both theoretically and practically.

 

Quick links to the current version of SynSemClass: the browser, the search tool, download (ZIP/XML), source code (GitHub).

 

 

The project has evolved dynamically from the original idea of creating a bilingual synonym lexicon, which focused on contextually-based synonymy and valency of verbs in a bilingual setting. At its core was the analysis of semantic ‘equivalence’ (synonymy or near synonymy) of verb senses and of their valency behavior in parallel Czech-English language resources (linked to the Prague Dependency Treebank family and, in turn, to the underlying Functional Generative Description theory). It used translational context, which helped to define the properties of verb sense classes of synonyms. An initial sample bilingual verb lexicon of classes representing synonym or near-synonym pairs of verbs (verb senses) was created, based on richly annotated corpora and existing lexical resources. Building on the results of that original project, a follow-up project was set up to add more languages, moving towards an ontology view of event types.

 

In fact, the original bilingual approach was itself preceded by a project that proved very useful (both as a resource and as an inspiration), namely the project called A comparison of Czech and English verbal valency based on corpus material (theory and practice). In all of these projects, the richly annotated Prague Czech-English Dependency Treebank (PCEDT) served as the source of corpus material and examples, which are also included in the resulting resources.

 

For other languages, standard parallel corpora are used together with automatic alignment techniques to preselect words that possibly evoke the existing classes. The usual manual annotation (i.e., entry expansion) process then follows, supported by various analytical and annotation tools.
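The preselection step described above can be illustrated with a minimal sketch. It assumes a lemmatized, word-aligned parallel corpus in which alignments are given as Pharaoh-style "i-j" index pairs (English token i aligned to target token j), and a table of English lemmas already assigned to classes. All names, the class key, and the toy sentences are hypothetical illustrations, not the project's actual tooling or data; simple alignment counts serve only as a ranking heuristic for the annotators' shortlist.

```python
from collections import Counter, defaultdict

# Hypothetical table: class key -> English lemmas already in that class.
# The key and lemmas are illustrative placeholders, not real SynSemClass data.
CLASS_MEMBERS = {
    "complain-stěžovat si": {"complain", "grumble"},
}


def parse_alignment(line):
    """Parse Pharaoh-style pairs such as '0-0 1-2' into (src, tgt) index tuples."""
    return [tuple(int(k) for k in pair.split("-")) for pair in line.split()]


def preselect_candidates(pairs, class_members, min_count=1):
    """Count target-language lemmas aligned to English members of each class.

    `pairs` is an iterable of (english_lemmas, target_lemmas, alignment_line).
    Returns {class_key: [(target_lemma, count), ...]} sorted by descending count.
    The result is a shortlist for manual annotation, not a class assignment.
    """
    # Invert the class table: English lemma -> set of class keys it belongs to.
    lemma_to_classes = defaultdict(set)
    for class_key, lemmas in class_members.items():
        for lemma in lemmas:
            lemma_to_classes[lemma].add(class_key)

    counts = defaultdict(Counter)
    for en_tokens, tgt_tokens, alignment in pairs:
        for i, j in parse_alignment(alignment):
            # .get() avoids creating empty entries for lemmas outside any class.
            for class_key in lemma_to_classes.get(en_tokens[i], ()):
                counts[class_key][tgt_tokens[j]] += 1

    return {
        class_key: [(w, c) for w, c in counter.most_common() if c >= min_count]
        for class_key, counter in counts.items()
    }


# Toy English-Spanish corpus (hypothetical, already lemmatized and aligned).
corpus = [
    (["they", "complain", "about", "noise"], ["ellos", "quejarse", "de", "ruido"], "0-0 1-1 2-2 3-3"),
    (["he", "grumble", "constantly"], ["él", "refunfuñar", "constantemente"], "0-0 1-1 2-2"),
    (["she", "complain", "again"], ["ella", "quejarse", "otra_vez"], "0-0 1-1 2-2"),
]

print(preselect_candidates(corpus, CLASS_MEMBERS))
# {'complain-stěžovat si': [('quejarse', 2), ('refunfuñar', 1)]}
```

Raising `min_count` trims rare, possibly noisy alignments from the shortlist; the final decision on class membership remains a manual annotation step.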

 

The resulting resource will always be freely available for academic purposes through the LINDAT/CLARIAH-CZ Research Infrastructure repository and online services, including a web-based browsing and search tool. The central repository of the source code developed within the SynSemClass project is hosted on GitHub.
The resource is also linked back to the Unified Verb Index available at the University of Colorado Boulder.

 The project uses (and links back to) the following lexical resources: 

The latest version online (SynSemClass 5.0, English-Czech-German-Spanish)

The current (latest) version of SynSemClass is SynSemClass 5.0 (see below for more detail; download and browse here: SynSemClass 5.0). For searching the data, you can use the SynSemClass Search Tool.

This version also contains the data from previous versions (SynSemClass 1.0 to SynSemClass 4.0); see below for all contributors.

Research team

PhDr. Zdeňka Urešová, Ph.D. is in charge of the SynSemClass project.

Mgr. Eva Fučíková provides technical support.

prof. RNDr. Jan Hajič, Dr. is the PI of LINDAT/CLARIAH-CZ and of the LuSyD project and coordinates the expansion work on the lexicon.

Cristina Fernández Alcaina, Ph.D. is in charge of the Spanish part of the SynSemClass project.

 

Annotators  

Czech: Darya Klambotskaya, Petr Kujal, Markéta Malcová, Tomáš Razím, Anna Staňková

English: Darya Klambotskaya, Petr Kujal, Markéta Malcová, Tomáš Razím, Anna Staňková

German:  Ilyas Zivana, Anna Magdalena Fišerová, Kryštof Navrátil

Spanish: Alba E. Ruz Gómez, Cristina Lara Clares (2021-2022), Laura Martínez Abarca

 

Previous version(s) and future plans

 

SynSemClass 4.0: Czech-English-German

Research team

PhDr. Zdeňka Urešová, Ph.D. is in charge of the SynSemClass project.

Mgr. Eva Fučíková provides technical support.

prof. RNDr. Jan Hajič, Dr. is the PI of LINDAT/CLARIAH-CZ and of the LuSyD project and coordinates the expansion work on the lexicon.

Prof. Georg Rehm, Ph.D. (2021), of DFKI Berlin, is the Co-PI of the Humane AI Net microproject META-O-NLU, which adds the German language.

Karolina Zaczynska, M.A. (2021), of DFKI Berlin/Univ. of Potsdam, is leading and coordinating the linguistic specification and annotation effort for German.

PhDr. Kateřina Rysová, Ph.D. (2021-2022) is co-leading and coordinating the linguistic specification and annotation effort for German.

 

Annotators 

Czech: Mgr. Darya Klambotskaya (2021-2023), Mgr. Petr Kujal, Bc. Markéta Malcová, Mgr. et Mgr. Tomáš Razím, Bc. Anna Staňková

English: Kathryn Conger (2021-2022), Mgr. Darya Klambotskaya (2021-2023), Mgr. Petr Kujal, Bc. Markéta Malcová, Mgr. et Mgr. Tomáš Razím, Bc. Anna Staňková

German: Charlotte Friedrich (2021-2022), Danny Srp (2021-2022), PhDr. Kateřina Rysová, Ph.D. (2021-2022), Ilyas Zivana (2022-present), Anna Magdalena Fišerová (2022-present), Kryštof Navrátil (2023-present)

SynSemClass 3.5: Czech-English-German

SynSemClass 3.5 (download: SynSemClass3.5; search and browse: SynSemClass version 3.5 web) adds new German verb synonyms. Based on both parallel Czech-English and German-English language resources, it now collects and investigates the semantic ‘equivalence’ of Czech, English, and German verb senses and their valency behavior in parallel Czech-English and German-English corpora. In addition to the already used external links (see above) to Czech and English semantic lexicon entries, new German-language lexical resources are exploited: FrameNet des Deutschen, Woxikon, Elektronisches Valenzwörterbuch Deutscher Verben, and German Universal Proposition Bank. Some more Czech and English verbs are also included (compared to version 3.0).

The German part of the lexicon has been created within the project Multilingual Event-Type-Anchored Ontology for Natural Language Understanding (META-O-NLU), a microproject that lasted from January to June 2021, as part of the Humane AI Net EC-funded Center of Excellence in AI (project No. 952026), by two cooperating teams:

  1. The team of Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague (ÚFAL), Czech Republic: PhDr. Zdeňka Urešová, Ph.D., Mgr. Eva Fučíková, PhDr. Kateřina Rysová, Ph.D., prof. RNDr. Jan Hajič, Dr., and prof. PhDr. Eva Hajičová, DrSc.
  2. The team of the German Research Center for Artificial Intelligence (DFKI) Speech and Language Technology, Berlin, Germany: prof. Dr. phil. Georg Rehm, Karolina Zaczynska, and Peter Bourgonje (currently at the University of Potsdam).

Annotators 

Czech: Mgr. Darya Klambotskaya, Mgr. Petr Kujal, Bc. Markéta Malcová, Mgr. et Mgr. Tomáš Razím, Bc. Anna Staňková

English: Mgr. Darya Klambotskaya, Mgr. Petr Kujal, Bc. Markéta Malcová, Mgr. et Mgr. Tomáš Razím, Bc. Anna Staňková

German: Charlotte Friedrich, Danny Srp, PhDr. Kateřina Rysová, Ph.D.

 

SynSemClass 3.0 and earlier (Czech-English)

 

Planned (and "under construction") versions

Czech-English-German-Korean (UC)

Research team

RNDr. Jana Straková, Ph.D. is in charge of the Korean part of the SynSemClass project.

prof. RNDr. Jan Hajič, Dr. provides expert support and professional consultations.

PhDr. Zdeňka Urešová, Ph.D. provides expert support and professional consultations.

Mgr. Eva Fučíková provides technical support.

Petr Kašpárek  

Annotators

TBD

Czech-English-German-Polish (TBD)

 

SynSemClass (CzEngClass)

 Annotation process

 

 

SynSemClass (CzEngClass) structure

 

SynSemClass 

The overall scheme of the SynSemClass lexicon and an example of a class ("complain-stěžovat si")

Support acknowledgements (main grants only)

2017-2019

  • Grant Agency of the Czech Republic: No. GA17-07313S - "Contextually-based synonymy and valency of verbs in a bilingual setting" (CzEngClass)

2019-2022

2020-2024

  • Grant Agency of the Czech Republic: No GX20-16819X "LuSyD" (Language Understanding: from Syntax to Discourse).
  • Humane AI Net: EC 952026, within the microproject "Multilingual Event-type-anchored Ontology for Natural Language Understanding" (META-O-NLU)

PIs: Zdeňka Urešová (2017-2019), Jan Hajič (2020-2024)

References

Data/Software

  • Urešová Zdeňka, Fučíková Eva, Fernández Alcaina Cristina, Hajič Jan: SynSemClass 5.0. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, https://lindat.cz/services/SynSemClass50/, Nov 2023
  • Petliak Nataliia, Fučíková Eva, Hajič Jan, Urešová Zdeňka: SynSemClass Search Tool. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, https://lindat.mff.cuni.cz/services/SynSemClassSearch/, Oct 2023
  • Urešová Zdeňka, Bourgonje Peter, Fučíková Eva, Hajič Jan, Hajičová Eva, Rysová Kateřina, Rehm Georg, Zaczynska Karolina: SynSemClass 4.0. Data/software, LINDAT/CLARIN Research Infrastructure, Prague, Czech Republic, https://lindat.cz/services/SynSemClass40/, Jun 2022
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan, Rehm Georg, Zaczynska Karolina: SynSemClass 3.5. Data/software, LINDAT/CLARIN Research Infrastructure, https://lindat.cz/services/SynSemClass35/, 2021
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: SynSemClass 2.0. Data/software, LINDAT/CLARIN Research Infrastructure, https://ufal.mff.cuni.cz/synsemclass, 202
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: SynSemClass 3.0. Data/software, LINDAT/CLARIN Research Infrastructure, https://ufal.mff.cuni.cz/synsemclass, 2020
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: SynSemClass 1.0. Data/software, LINDAT/CLARIN Research Infrastructure, https://ufal.mff.cuni.cz/synsemclass, 2019
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: CzEngClass 0.1. Data/software, LINDAT/CLARIN Research Infrastructure, http://hdl.handle.net/11234/1-2823, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: CzEngClass 0.2. Data/software, LINDAT/CLARIN Research Infrastructure, http://hdl.handle.net/11234/1-2824, 2018

Publications

 

How to cite

If you use SynSemClass (or CzEngClass), please cite it using the BibTeX entries below or, for a plaintext citation format, one or more of the relevant papers listed above.

@article{ biblio:UrFuCzEngClass2017,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {CzEngClass – Towards a Lexicon of verb Synonyms with Valency linked to Semantic Roles},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}}},
year = {2017},
volume = {68},
number = {2},
pages = {364--371},
issn = {0021-5597}
}

@article{ biblio:UrFuMeaningand2019,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {Meaning and Semantic Roles in CzEngClass Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2019},
address = {Bratislava, Slovakia},
volume = {70},
number = {2},
pages = {403--411},
issn = {0021-5597}
}

@article{ biblio:UrFuACROSSLINGUAL2018,
journal = {Prace Filologiczne},
title = {A {CROSS}-{LINGUAL} {SYNONYM} {CLASSES} {LEXICON}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
volume = {{LXXII}},
pages = {405--418},
issn = {0138-0567}
}

@inproceedings{ biblio:UrFuCreatinga2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Creating a Verb Synonym Lexicon Based on a Parallel Corpus},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {1432--1437},
isbn = {979-10-95546-00-9}
}

@inproceedings{ biblio:UrFuDefiningVerbal2018,
booktitle = {Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories ({TLT} 2018)},
title = {Defining Verbal Synonyms: between Syntax and Semantics},
editor = {Dag Haug and Stephan Oepen and Lilja {\O{}}vrelid and Marie Candito and Jan Haji{\v{c}}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {Link{\"{o}}ping University Electronic Press},
organization = {Universitetet i Oslo},
address = {Link{\"{o}}ping, Sweden},
venue = {Universitetet i Oslo},
number = {155},
pages = {75--90},
isbn = {978-91-7685-137-1},
issn = {1650-3740}
}

@inproceedings{ biblio:UrFuSynonymyin2018,
booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
title = {Synonymy in Bilingual Context: The CzEngClass Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {{ICCL}},
organization = {{ICCL}},
address = {Sheffield, {GB}},
pages = {2456--2469},
isbn = {978-4-87974-703-7}
}

@inproceedings{ biblio:UrFuToolsfor2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Tools for Building an Interlinked Multilingual Synonym Lexicon Network},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {850--856},
isbn = {979-10-95546-00-9}
}

@unpublished{ biblio:UrZaMakinga2022,
title = {Making a Semantic Event-type Ontology Multilingual},
editor = {Nicoletta Calzolari and Fr{\'{e}}d{\'{e}}ric B{\'{e}}chet and Philippe Blache and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Karolina Zaczynska and Peter Bourgonje and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Georg Rehm and Jan Haji{\v{c}}},
year = {2022},
publisher = {European Language Resources Association},
address = {Marseille, France},
venue = {Le Palais du Pharo},
pages = {1--10},
}

@article{ biblio:FeFuSpanishSynonyms2023,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {Spanish Synonyms as Part of a Multilingual Event-Type Ontology},
author = {Cristina Fern{\'{a}}ndez Alcaina and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
volume = {74},
number = {1},
pages = {153--162},
issn = {0021-5597},
}

@inproceedings{ biblio:FeFuSpanishVerbal2023,
booktitle = {Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories},
title = {Spanish Verbal Synonyms in the SynSemClass Ontology},
editor = {Sandra K{\"{u}}bler},
author = {Cristina Fern{\'{a}}ndez-Alcaina and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {Association for Computational Linguistics},
organization = {Association for Computational Linguistics},
address = {Washington, D.C., {USA}},
venue = {Georgetown University in Washington D.C.},
pages = {11--20},
isbn = {978-1-959429-33-3},
}

@inproceedings{ biblio:FuHaCorpusBasedMultilingual2023,
booktitle = {Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories},
title = {Corpus-Based Multilingual Event-type Ontology: annotation tools and principles},
editor = {Sandra K{\"{u}}bler},
author = {Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {Association for Computational Linguistics},
organization = {Association for Computational Linguistics},
address = {Washington, D.C., {USA}},
venue = {Georgetown University in Washington D.C.},
pages = {1--10},
isbn = {978-1-959429-33-3},
}

@misc{ biblio:PeFuSynSemClassSearch2023,
title = {SynSemClass Search Tool},
author = {Nataliia Petliak and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {{LINDAT}/{CLARIN} Research Infrastructure},
organization = {Charles University},
address = {Prague, Czech Republic},
}

@inproceedings{ biblio:StFuExtendingan2023,
booktitle = {Proceedings of the 17th Linguistic Annotation Workshop},
title = {Extending an Event-type Ontology: Adding Verbs and Classes using Fine-tuned {LLM}s Suggestions},
editor = {Jakob Prange and Annemarie Friedrich},
author = {Jana Strakov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Jan Haji{\v{c}} and Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}}},
year = {2023},
publisher = {Association for Computational Linguistics},
organization = {Association for Computational Linguistics},
address = {Stroudsburg, {PA}, {USA}},
pages = {85--95},
}