Universal Derivations (UDer)

Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivation, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure (as used in the DeriNet 2.0 database), in which nodes correspond to lexemes while edges represent derivational relations or compounding.
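The rooted-tree scheme described above can be illustrated with a minimal Python sketch. The `Lexeme` class, its fields, and the English example words are hypothetical and chosen only for illustration; this is not the DeriNet 2.0 API or the UDer file format. The sketch assumes that each lexeme has at most one parent, so that every word-formation family forms a rooted tree.

```python
class Lexeme:
    """Hypothetical node of a word-formation tree (not the DeriNet 2.0 API)."""

    def __init__(self, lemma, pos):
        self.lemma = lemma
        self.pos = pos
        self.parent = None      # at most one base lexeme per node
        self.children = []      # lexemes formed from this one

    def add_child(self, child):
        """Attach `child` as a derivative of this lexeme, keeping a tree shape."""
        if child.parent is not None:
            raise ValueError(f"{child.lemma!r} already has a parent")
        # Reject edges that would create a cycle (child is self or an ancestor).
        node = self
        while node is not None:
            if node is child:
                raise ValueError("edge would create a cycle")
            node = node.parent
        child.parent = self
        self.children.append(child)

    def root(self):
        """Return the root lexeme of the family this lexeme belongs to."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def family(self):
        """Return all lexemes of the family in depth-first preorder."""
        result = []
        stack = [self.root()]
        while stack:
            node = stack.pop()
            result.append(node)
            stack.extend(reversed(node.children))
        return result


# Illustrative family; the words are not taken from any UDer resource.
teach = Lexeme("teach", "VERB")
teacher = Lexeme("teacher", "NOUN")
teachable = Lexeme("teachable", "ADJ")
unteachable = Lexeme("unteachable", "ADJ")

teach.add_child(teacher)
teach.add_child(teachable)
teachable.add_child(unteachable)

family_lemmas = [lexeme.lemma for lexeme in unteachable.family()]
edge_count = len(family_lemmas) - 1   # a tree with n nodes has n - 1 edges
```

In this sketch a family is identified by its root, and the number of edges in a family is always one less than the number of its lexemes.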

Online tools

Each resource in the UDer collection can be searched online using either of two versions of DeriNet Search (DeriSearch). DeriSearch v2 shows all information stored in the data, while DeriSearch v1 displays only derivational relations (not compounding relations). The data can be processed using the DeriNet 2.0 API. The scripts used to harmonize the original resources and to release the UDer collection are available in the GitHub repository.

The current version

The current version of the collection is UDer 1.0. It contains 27 harmonized resources covering 20 languages (listed in the table below). UDer 1.0 is available in the LINDAT/CLARIAH-CZ digital library (item: http://hdl.handle.net/11234/1-3236). The license for each harmonized resource in the collection is specified in the corresponding language/resource directory.

Resource                      Language        Lexemes    Relations  Families  License
CatVar                        English         82,675     24,873     57,802    OSL-1.1
D-CELEX                       Dutch           125,611    13,435     112,176   GPL-3.0 (for scripts)
Démonette                     French          21,290     13,808     7,482     CC BY-NC-SA 3.0
DeriNet                       Czech           1,027,665  809,282    218,383   CC BY-NC-SA 3.0
DeriNet.ES                    Spanish         151,173    36,935     114,238   CC BY-NC-SA 3.0
DeriNet.FA                    Persian         43,357     35,745     7,612     CC BY-NC-SA 4.0
DerIvaTario                   Italian         8,267      1,787      6,480     CC BY-SA 4.0
DErivBase                     German          280,775    43,368     237,407   CC BY-SA 3.0
DerivBase.Hr                  Croatian        99,606     35,289     64,317    CC BY-SA 3.0
DerivBase.Ru                  Russian         270,473    133,759    136,714   Apache 2.0
E-CELEX                       English         53,103     9,826      43,277    GPL-3.0 (for scripts)
EstWordNet                    Estonian        988        507        481       CC BY-SA 3.0
EtymWordNet-cat               Catalan         7,496      4,568      2,928     CC BY-SA 3.0
EtymWordNet-ces               Czech           7,633      5,237      2,396     CC BY-SA 3.0
EtymWordNet-gla               Gaelic          7,524      5,013      2,511     CC BY-SA 3.0
EtymWordNet-pol               Polish          27,797     24,876     2,921     CC BY-SA 3.0
EtymWordNet-por               Portuguese      2,797      1,610      1,187     CC BY-SA 3.0
EtymWordNet-rus               Russian         4,005      3,227      778       CC BY-SA 3.0
EtymWordNet-hbs               Serbo-Croatian  8,033      6,303      1,730     CC BY-SA 3.0
EtymWordNet-swe               Swedish         7,333      4,423      2,910     CC BY-SA 3.0
EtymWordNet-tur               Turkish         7,774      5,837      1,937     CC BY-SA 3.0
FinnWordNet                   Finnish         20,035     11,922     8,113     CC BY-SA 4.0
G-CELEX                       German          53,282     13,553     39,729    GPL-3.0 (for scripts)
Nomlex-PT                     Portuguese      7,020      4,201      2,819     CC BY-SA 4.0
The Morpho-Semantic Database  English         13,813     7,855      5,958     CC BY-NC-SA 3.0
The Polish WFN                Polish          262,887    189,217    73,670    CC BY-NC-SA 3.0
Word Formation Latin          Latin           36,417     32,414     4,003     CC BY-NC-SA 4.0
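Because every family is a rooted tree, a family with n lexemes has n - 1 edges, so the three numeric columns should satisfy Families = Lexemes - Relations. The short Python sketch below checks this for a few rows copied from the table. It assumes that the Relations column counts tree edges only; the variable and function names are illustrative.

```python
# (resource, lexemes, relations, families), copied from the table above.
rows = [
    ("CatVar", 82_675, 24_873, 57_802),
    ("DeriNet", 1_027_665, 809_282, 218_383),
    ("EstWordNet", 988, 507, 481),
    ("Word Formation Latin", 36_417, 32_414, 4_003),
]


def expected_families(lexemes, relations):
    """Number of trees in a forest with the given node and edge counts."""
    return lexemes - relations


# Collect the names of rows whose Families value disagrees with the formula.
inconsistent = [
    name
    for name, lexemes, relations, families in rows
    if expected_families(lexemes, relations) != families
]

print("inconsistent rows:", inconsistent)
```

The same check can be applied to the remaining rows as a quick sanity test after re-harmonizing a resource.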

Related publications