Event-type Ontology in Multiple Languages

Introduction

 

The project attempts to create specifications and definitions of a hierarchical event-type ontology, populated with words denoting events or states (primarily verbs, verbal nouns and adjectives, but also any other single- or multiword units denoting events or states). It links its entries, or "classes" (and the words that evoke them), to several existing lexical resources that have, to some extent, similar goals; such linking allows both theoretical and practical comparison and use of the resources.
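As a rough illustration of the structure described above, the following minimal sketch models classes, the words that evoke them, and links to external resources. It is not the project's actual data format: the names `EventClass`, `Member`, `external_links`, the class identifiers and the link values are all hypothetical, and representing the hierarchy as a parent pointer is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Member:
    """A word (or multiword unit) that evokes a class; hypothetical layout."""
    lemma: str
    lang: str  # language code chosen for this sketch, e.g. "eng", "ces"
    # resource name -> entry identifier in that resource (placeholder values)
    external_links: Dict[str, str] = field(default_factory=dict)


@dataclass
class EventClass:
    """One ontology entry ("class"); the parent pointer is an assumption."""
    class_id: str
    name: str
    parent: Optional["EventClass"] = None
    members: List[Member] = field(default_factory=list)

    def ancestors(self) -> List[str]:
        """Return the ids of all classes above this one, nearest first."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.class_id)
            node = node.parent
        return chain

    def members_in(self, lang: str) -> List[str]:
        """Return the lemmas of members belonging to the given language."""
        return [m.lemma for m in self.members if m.lang == lang]


# Toy data: the parent class and all identifiers are invented for illustration.
root = EventClass("ec-communication", "communication")
complain = EventClass("ec-complain", "complain", parent=root)
complain.members.append(
    Member("complain", "eng", {"some-lexicon": "hypothetical-entry-id"})
)
complain.members.append(Member("stěžovat si", "ces"))
```

Keeping the external links on the member rather than on the class reflects the statement that both the classes and the words that evoke them are linked; a real implementation may well organize this differently.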

 

The project has evolved from the original idea of creating a bilingual synonym lexicon, which focused on contextually-based synonymy and valency of verbs in a bilingual setting. At its core was the analysis of semantic ‘equivalence’ (synonymy or near-synonymy) of verb senses and their valency behavior in parallel Czech-English language resources (linked to the Prague Dependency Treebank family and, in turn, to the underlying Functional Generative Description theory). It used translational context, which helped to define the properties of verb sense classes of synonyms. An initial sample bilingual verb lexicon of classes representing synonym or near-synonym pairs of verbs (verb senses), based on richly annotated corpora and existing lexical resources, was created. Building on the contribution of that original project, a follow-up project has been set up to add more languages, gearing towards an ontology view of event types.

 

In fact, the original bilingual approach itself had a predecessor, which was of great use (both as a resource and as an inspiration): the project called A comparison of Czech and English verbal valency based on corpus material (theory and practice). In all projects, the richly annotated Prague Czech-English Dependency Treebank (PCEDT) served as the source of corpus material and examples, which are also included in the resulting resources.

 

For other languages, standard parallel corpora have to be used together with automatic alignment techniques to preselect words that possibly evoke the existing classes; the usual manual annotation (i.e., entry expansion) process then begins, supported by various analytical and annotation tools.
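The preselection step can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the project's actual tooling: it assumes that word alignment has already produced pairs of (English lemma, target-language lemma), that each class is given as a set of English member lemmas, and that a simple frequency threshold decides which target words are offered to annotators. The function name `preselect_candidates`, the parameter `min_count` and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def preselect_candidates(aligned_pairs, class_members, min_count=2):
    """Suggest target-language lemmas that may evoke existing classes.

    aligned_pairs: iterable of (source_lemma, target_lemma) taken from
                   word-aligned parallel text (assumed to be precomputed).
    class_members: dict mapping a class id to a set of source lemmas.
    min_count:     minimum number of alignments needed to keep a candidate.
    """
    # Invert the membership table: source lemma -> ids of classes it belongs to.
    lemma_to_classes = defaultdict(set)
    for class_id, lemmas in class_members.items():
        for lemma in lemmas:
            lemma_to_classes[lemma].add(class_id)

    # Count how often each target lemma is aligned to a member of each class.
    counts = defaultdict(Counter)
    for src_lemma, tgt_lemma in aligned_pairs:
        for class_id in lemma_to_classes.get(src_lemma, ()):
            counts[class_id][tgt_lemma] += 1

    # Keep only candidates seen often enough, most frequent first.
    return {
        class_id: [(t, n) for t, n in counter.most_common() if n >= min_count]
        for class_id, counter in counts.items()
    }


# Toy alignment data, invented for illustration only.
pairs = [
    ("complain", "beschweren"),
    ("complain", "beschweren"),
    ("complain", "klagen"),
    ("walk", "gehen"),
]
classes = {"complain": {"complain"}}
candidates = preselect_candidates(pairs, classes)
```

The output is only a candidate list: in the workflow described above, each suggested word would still go through manual annotation before being added to a class.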

 

The resulting resource will always be freely available for academic purposes through the LINDAT/CLARIAH-CZ Research Infrastructure repository and online services, including a web-based browsing and search tool. It is also linked back from the Unified Verb Index available at the University of Colorado Boulder.

 

The project uses (and links back to) the following lexical resources: 

The latest version of SynSemClass is SynSemClass3.5 (download: SynSemClass3.5, search and browse: SynSemClass version 3.5 web), which adds new German verb synonyms. Based on both parallel Czech-English and German-English language resources, it now collects and investigates the semantic ‘equivalence’ of Czech, English and German verb senses and their valency behavior in parallel Czech-English and German-English corpora. In addition to the already used external links (see above) to Czech and English semantic lexicon entries, new German lexical resources are exploited: FrameNet des Deutschen, Woxikon, Elektronisches Valenzwörterbuch deutscher Verben, and German Universal Proposition Bank. Some more Czech and English verbs are also included (compared to version 3.0).

The German part of the lexicon has been created within the project Multilingual Event-Type-Anchored Ontology for Natural Language Understanding (META-O-NLU), a microproject that lasted from January to June 2021, as part of the Humane AI Net EC-funded Center of Excellence in AI (project No. 952026), by two cooperating teams:

  1. The team of Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (ÚFAL), Prague, Czech Republic - PhDr. Zdeňka Urešová, Ph.D., Mgr. Eva Fučíková, PhDr. Kateřina Rysová, Ph.D., prof. RNDr. Jan Hajič, Dr., and prof. PhDr. Eva Hajičová, DrSc.
  2. The team of the German Research Center for Artificial Intelligence (DFKI), Speech and Language Technology, Berlin, Germany - prof. Dr. Phil. Georg Rehm, Karolina Zaczynska.

Support (main grants only)

2017-2019

  • Grant Agency of the Czech Republic: No. GA17-07313S - "Contextually-based synonymy and valency of verbs in a bilingual setting" (CzEngClass)

2019-2022

2020-2024

  • Grant Agency of the Czech Republic: No. GX20-16819X - "LuSyD" (Language Understanding: from Syntax to Discourse)
  • Humane AI Net: EC 952026 - within the microproject "Multilingual Event-Type-Anchored Ontology for Natural Language Understanding" (META-O-NLU)

PIs: Zdeňka Urešová (2017-2019), Jan Hajič (2020-2024)

Research team

PhDr. Zdeňka Urešová, Ph.D. is in charge of the SynSemClass project.

Mgr. Eva Fučíková provides technical support.

prof. PhDr. Eva Hajičová, DrSc., dr. h.c. provides expert support and professional consultations.

prof. RNDr. Jan Hajič, Dr. is the PI of LINDAT/CLARIAH-CZ and of the LuSyD project and coordinates the expansion work on the lexicon.

Prof. Georg Rehm, Ph.D., of DFKI Berlin, is the Co-PI of the Humane AI Net microproject META-O-NLU, which adds the German language.

Karolina Zaczynska, M.A., of DFKI Berlin/Univ. of Potsdam, leads and coordinates the linguistic, specification and annotation effort for German.

Annotators 

Daniela Bodanská Bc., Mgr. Barbora Bulantová, Kathryn Conger, Charllotte Friedrich, Mgr. Václava Kettnerová, Ph.D., Mgr. Darya Klambotskaya, Mgr. Vladimíra Krajcsovicsová, Mgr. Petr Kujal, Markéta Malcová Bc., Albert Maršík, Světlana Ondroušková Ph.D., Mgr. et Mgr. Tomáš Razím, Mgr. et Mgr. Jakub Sláma, PhDr. Kateřina Rysová, Ph.D., PhDr. Magdaléna Rysová, Ph.D., Galina Shchelokova Mgr., Danny Srp, Anna Staňková Bc., Mgr. Zuzana Vorlíková.

Current version 

Czech-English-German

Previous (and planned) versions

Czech-English 

Czech-English-German-Polish

  • TBD

SynSemClass (CzEngClass)

 Annotation process

 

 

SynSemClass (CzEngClass) structure

 

SynSemClass 

The overall scheme of the SynSemClass lexicon and an example of a class ("complain-stěžovat si")

Publications

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva: CzEngClass – Towards a Lexicon of verb Synonyms with Valency linked to Semantic Roles. In: Jazykovedný časopis / Journal of Linguistics, Vol. 68, No. 2, Copyright © SAP – Slovak Academic Press, ISSN 0021-5597, pp. 364-371, 2017
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: A CROSS-LINGUAL SYNONYM CLASSES LEXICON. In: Prace Filologiczne, Vol. LXXII, Copyright © Uniwersytet Warszawski, ISSN 0138-0567, pp. 405-418, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Creating a Verb Synonym Lexicon Based on a Parallel Corpus. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Copyright © European Language Resources Association, Paris, France, ISBN 979-10-95546-00-9, pp. 1432-1437, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Defining Verbal Synonyms: between Syntax and Semantics. In: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), Copyright © Linköping University Electronic Press, Linköping, Sweden, ISBN 978-91-7685-137-1, ISSN 1650-3740, pp. 75-90, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Synonymy in Bilingual Context: The CzEngClass Lexicon. In: Proceedings of The 27th International Conference on Computational Linguistics, Copyright © ICCL, Sheffield, GB, ISBN 978-4-87974-703-7, pp. 2456-2469, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Tools for Building an Interlinked Multilingual Synonym Lexicon Network. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Copyright © European Language Resources Association, Paris, France, ISBN 979-10-95546-00-9, pp. 850-856, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Meaning and Semantic Roles in CzEngClass Lexicon. In: Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Copyright © SAP – Slovak Academic Press, Bratislava, Slovakia, ISSN 0021-5597, pp. 403-411, Oct 2019
 

How to cite

If you use SynSemClass (or CzEngClass), please cite it using the BibTeX entries below, or cite one or more of the relevant papers listed above (for a plain-text citation format).

@article{ biblio:UrFuCzEngClass2017,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {CzEngClass – Towards a Lexicon of verb Synonyms with Valency linked to Semantic Roles},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}}},
year = {2017},
volume = {68},
number = {2},
pages = {364--371},
issn = {0021-5597}
}

@article{ biblio:UrFuMeaningand2019,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {Meaning and Semantic Roles in CzEngClass Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2019},
address = {Bratislava, Slovakia},
volume = {70},
number = {2},
pages = {403--411},
issn = {0021-5597}
}

@article{ biblio:UrFuACROSSLINGUAL2018,
journal = {Prace Filologiczne},
title = {A {CROSS}-{LINGUAL} {SYNONYM} {CLASSES} {LEXICON}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
volume = {{LXXII}},
pages = {405--418},
issn = {0138-0567}
}

@inproceedings{ biblio:UrFuCreatinga2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Creating a Verb Synonym Lexicon Based on a Parallel Corpus},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {1432--1437},
isbn = {979-10-95546-00-9}
}

@inproceedings{ biblio:UrFuDefiningVerbal2018,
booktitle = {Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories ({TLT} 2018)},
title = {Defining Verbal Synonyms: between Syntax and Semantics},
editor = {Dag Haug and Stephan Oepen and Lilja {\O{}}vrelid and Marie Candito and Jan Haji{\v{c}}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {Link{\"{o}}ping University Electronic Press},
organization = {Universitetet i Oslo},
address = {Link{\"{o}}ping, Sweden},
venue = {Universitetet i Oslo},
number = {155},
pages = {75--90},
isbn = {978-91-7685-137-1},
issn = {1650-3740}
}

@inproceedings{ biblio:UrFuSynonymyin2018,
booktitle = {Proceedings of The 27th International Conference on Computational Linguistics},
title = {Synonymy in Bilingual Context: The CzEngClass Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {{ICCL}},
organization = {{ICCL}},
address = {Sheffield, {GB}},
pages = {2456--2469},
isbn = {978-4-87974-703-7}
}

@inproceedings{ biblio:UrFuToolsfor2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Tools for Building an Interlinked Multilingual Synonym Lexicon Network},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {850--856},
isbn = {979-10-95546-00-9}
}