Multilingual, contextually-based synonymy and valency of verbs

This project was originally supported in 2017-2019 by the Grant Agency of the Czech Republic under grant No. GA17-07313S, "Contextually-based synonymy and valency of verbs in a bilingual setting" (CzEngClass), https://ufal.mff.cuni.cz/grants/czengclass. Since 2020, the extension of the lexicon (in size and to more languages) has been supported by the Ministry of Education, Youth and Sports of the Czech Republic (MEYS) through the LINDAT/CLARIAH-CZ Research Infrastructure project No. LM2018101.

In addition, the project uses resources hosted by the LINDAT/CLARIN and LINDAT/CLARIAH-CZ Research Infrastructures, projects No. LM2015071 and LM2018101, supported by the MEYS.

Theoretical aspects of the use of the lexicon in semantic annotation are now studied as part of the "LuSyD" project, supported by the Czech Science Foundation (project No. GX20-16819X).

The main result (in addition to the publications listed below) is the SynSemClass 1.0 lexicon (formerly CzEngClass), available from the LINDAT repository (http://hdl.handle.net/11234/1-3125) and through a web-based browser at https://lindat.mff.cuni.cz/services/SynSemClass/.

PIs: Zdeňka Urešová (2017-2019), Jan Hajič (2020-2022)

Research team

PhDr. Zdeňka Urešová, Ph.D. (70%) is in charge of the project.

Mgr. Eva Fučíková (30%) provides technical support.

Prof. PhDr. Eva Hajičová, DrSc., dr. h.c. (10%) provides expert support and professional consultations.

Prof. RNDr. Jan Hajič, Dr. (~10%) is the PI of LINDAT/CLARIAH-CZ and of the LuSyD project and coordinates the expansion work on the lexicon.

The project focuses on contextually-based synonymy and valency of verbs in a bilingual setting. The core of the proposed research is the analysis of semantic ‘equivalence’ (synonymy or near-synonymy) of verb senses and of their valency behavior in parallel Czech-English language resources. Using the translational context allows a more language-independent specification of the properties of classes of synonymous verb senses and leads towards generalization across languages. An initial sample bilingual verb lexicon will be created, with classes representing synonymous or near-synonymous pairs of verbs (verb senses), based on richly annotated corpora and existing lexical resources. The main contribution of the project will be a deeper insight, from the bilingual perspective, into verb meaning in context based on the Functional Generative Description theory, thus extending the theory towards an appropriate description of contextually-based verb synonymy.

The project is closely related to the project “A comparison of Czech and English verbal valency based on corpus material (theory and practice)”, https://ufal.mff.cuni.cz/czengvallex, and to the Prague Czech-English Dependency Treebank (PCEDT), http://ufal.mff.cuni.cz/prague-czech-english-dependency-treebank.

  • Topic: verbal synonymy in translation (bilingual context), based on the Functional Generative Description theory, to explore the semantic ‘equivalence’ of senses of different verbal lexemes in a Czech-English setting:
    • 1. focus on valency behavior and semantic roles
    • 2. assumption: the bilingual context (translation) makes it possible to delimit synonymous verbs and thus to specify verb senses more precisely than monolingual text does
  • Goal: to group verbs used as synonyms in Czech and English into (cross-lingual) synonym classes
  • Novelty: use of a richly annotated bilingual corpus to gain more insight into the usage of verbs (together with their arguments in translation) when building synonym classes; a “bottom-up” approach, starting with evidence in the corpus (vs. a “top-down” approach with a predefined set of semantic classes)

The project is based on the valency theory of the Functional Generative Description and on its application to a corpus, namely to the Prague Czech-English Dependency Treebank (PCEDT; http://hdl.handle.net/11858/00-097C-0000-0015-8DAF-4).

The project uses the following lexical resources: 

SynSemClass (CzEngClass)

Annotation process

SynSemClass (CzEngClass) structure

The overall scheme of the SynSemClass lexicon and an example of a class (“complain-stěžovat si”)
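The idea of a cross-lingual synonym class can be illustrated with a small data model. The sketch below is hypothetical and is not the actual SynSemClass format: it assumes that a class defines a set of semantic roles shared by all its members, and that each member (a verb sense in one language) maps the slots of its valency frame (FGD functors such as ACT, PAT, ADDR) to those roles. The names `SynonymClass` and `ClassMember`, the frame identifiers, the language codes and the role names (Complainer, Complaint, Addressee) are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model for illustration only, not the official SynSemClass schema.
@dataclass
class ClassMember:
    lang: str                # assumed language code, e.g. "eng" or "ces"
    lemma: str               # verb lemma
    frame_id: str            # hypothetical identifier of the member's valency frame
    mapping: dict[str, str]  # valency slot (functor) -> class semantic role


@dataclass
class SynonymClass:
    class_id: str
    roles: set[str]          # semantic roles shared by all members of the class
    members: list[ClassMember] = field(default_factory=list)

    def add(self, member: ClassMember) -> None:
        # Reject a member whose valency slots map to roles the class does not define.
        unknown = set(member.mapping.values()) - self.roles
        if unknown:
            raise ValueError(f"{member.lemma}: roles not in class roleset: {sorted(unknown)}")
        self.members.append(member)

    def by_language(self, lang: str) -> list[str]:
        # Lemmas of the class members in one language, in insertion order.
        return [m.lemma for m in self.members if m.lang == lang]


# Illustrative class; role names and slot mappings are assumptions.
complain = SynonymClass("complain-stěžovat si", {"Complainer", "Complaint", "Addressee"})
complain.add(ClassMember("eng", "complain", "frame-1",
                         {"ACT": "Complainer", "PAT": "Complaint", "ADDR": "Addressee"}))
complain.add(ClassMember("ces", "stěžovat si", "frame-2",
                         {"ACT": "Complainer", "PAT": "Complaint", "ADDR": "Addressee"}))
print(complain.by_language("ces"))  # ['stěžovat si']
```

Checking each new member against the class roleset mirrors the idea that synonyms in a class, whatever their language, share the same semantic roles even though their valency frames may differ.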

Publications

  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva: CzEngClass – Towards a Lexicon of verb Synonyms with Valency linked to Semantic Roles. In: Jazykovedný časopis / Journal of Linguistics, Vol. 68, No. 2, Copyright © SAP – Slovak Academic Press, ISSN 0021-5597, pp. 364-371, 2017
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: A CROSS-LINGUAL SYNONYM CLASSES LEXICON. In: Prace Filologiczne, Vol. LXXII, Copyright © Uniwersytet Warszawski, ISSN 0138-0567, pp. 405-418, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Creating a Verb Synonym Lexicon Based on a Parallel Corpus. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Copyright © European Language Resources Association, Paris, France, ISBN 979-10-95546-00-9, pp. 1432-1437, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Defining Verbal Synonyms: between Syntax and Semantics. In: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), Copyright © Linköping University Electronic Press, Linköping, Sweden, ISBN 978-91-7685-137-1, ISSN 1650-3740, pp. 75-90, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Synonymy in Bilingual Context: The CzEngClass Lexicon. In: Proceedings of The 27th International Conference on Computational Linguistics, Copyright © ICCL, Sheffield, GB, ISBN 978-4-87974-703-7, pp. 2456-2469, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Tools for Building an Interlinked Multilingual Synonym Lexicon Network. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Copyright © European Language Resources Association, Paris, France, ISBN 979-10-95546-00-9, pp. 850-856, 2018
  • Urešová Zdeňka, Fučíková Eva, Hajičová Eva, Hajič Jan: Meaning and Semantic Roles in CzEngClass Lexicon. In: Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Copyright © SAP – Slovak Academic Press, Bratislava, Slovakia, ISSN 0021-5597, pp. 403-411, Oct 2019

How to cite

If you use SynSemClass (or CzEngClass), please cite one or more of the relevant papers, using the BibTeX entries below or the plain-text references listed above.

@article{ biblio:UrFuCzEngClass2017,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {{CzEngClass} -- Towards a Lexicon of verb Synonyms with Valency linked to Semantic Roles},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}}},
year = {2017},
volume = {68},
number = {2},
pages = {364--371},
issn = {0021-5597}
}

@article{ biblio:UrFuMeaningand2019,
journal = {Jazykovedn{\'{y}} {\v{c}}asopis / Journal of Linguistics},
title = {Meaning and Semantic Roles in {CzEngClass} Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2019},
address = {Bratislava, Slovakia},
volume = {70},
number = {2},
pages = {403--411},
issn = {0021-5597}
}

@article{ biblio:UrFuACROSSLINGUAL2018,
journal = {Prace Filologiczne},
title = {A {CROSS}-{LINGUAL} {SYNONYM} {CLASSES} {LEXICON}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
volume = {{LXXII}},
pages = {405--418},
issn = {0138-0567}
}

@inproceedings{ biblio:UrFuCreatinga2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Creating a Verb Synonym Lexicon Based on a Parallel Corpus},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {1432--1437},
isbn = {979-10-95546-00-9}
}

@inproceedings{ biblio:UrFuDefiningVerbal2018,
booktitle = {Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories ({TLT} 2018)},
title = {Defining Verbal Synonyms: between Syntax and Semantics},
editor = {Dag Haug and Stephan Oepen and Lilja {\O{}}vrelid and Marie Candito and Jan Haji{\v{c}}},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {Link{\"{o}}ping University Electronic Press},
organization = {Universitetet i Oslo},
address = {Link{\"{o}}ping, Sweden},
venue = {Universitetet i Oslo},
number = {155},
pages = {75--90},
isbn = {978-91-7685-137-1},
issn = {1650-3740}
}

@inproceedings{ biblio:UrFuSynonymyin2018,
booktitle = {Proceedings of The 27th International Conference on Computational Linguistics},
title = {Synonymy in Bilingual Context: The {CzEngClass} Lexicon},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {{ICCL}},
organization = {{ICCL}},
address = {Sheffield, {GB}},
pages = {2456--2469},
isbn = {978-4-87974-703-7}
}

@inproceedings{ biblio:UrFuToolsfor2018,
booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation ({LREC} 2018)},
title = {Tools for Building an Interlinked Multilingual Synonym Lexicon Network},
editor = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
author = {Zde{\v{n}}ka Ure{\v{s}}ov{\'{a}} and Eva Fu{\v{c}}{\'{\i}}kov{\'{a}} and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}}},
year = {2018},
publisher = {European Language Resources Association},
address = {Paris, France},
venue = {Phoenix Seagaia Conference Center},
pages = {850--856},
isbn = {979-10-95546-00-9}
}