Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic


Publication


Year 2016
Type in proceedings
Status published
Language English
Author(s) Žabokrtský, Zdeněk; Ševčíková, Magda; Straka, Milan; Vidra, Jonáš; Limburská, Adéla
Title Merging Data Resources for Inflectional and Derivational Morphology in Czech
Czech title Propojení datových zdrojů pro flektivní a derivační morfologii češtiny
Proceedings 2016: Paris, France: LREC 2016: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)
Pages range 1307-1314
How published online
URL http://www.lrec-conf.org/proceedings/lrec2016/pdf/994_Paper.pdf
Supported by 2016-2018 GA16-18177S (DerInfMorph: An Integrated Approach to Derivational and Inflectional Morphology of Czech); 2016-2019 LM2015071 (LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat); 2012-2016 PRVOUK P46 (Informatika)
Czech abstract (translated) The paper deals with linking two existing, mutually complementary data resources for Czech morphology, namely the inflectional dictionary MorfFlex CZ and the derivational network DeriNet. MorfFlex CZ covers several million Czech word forms. DeriNet contains several hundred thousand Czech lemmas, which are interconnected by explicit word-formation relations. The resulting resource is made available under the CC-BY-NC-SA license and can also be accessed through several web user interfaces.
English abstract The paper deals with merging two complementary resources of morphological data previously existing for Czech, namely the inflectional dictionary MorfFlex CZ and the recently developed lexical network DeriNet. The MorfFlex CZ dictionary has been used by a morphological analyzer capable of analyzing/generating several million Czech word forms according to the rules of Czech inflection. The DeriNet network contains several hundred thousand Czech lemmas interconnected with links corresponding to derivational relations (relations between base words and words derived from them). After summarizing basic characteristics of both resources, the process of merging is described, focusing on both rather technical aspects (growth of the data, measuring the quality of newly added derivational relations) and linguistic issues (treating lexical homonymy and vowel/consonant alternations). The resulting resource contains 970 thousand lemmas connected with 715 thousand derivational relations and is publicly available on the web under the CC-BY-NC-SA license. The data were incorporated in the MorphoDiTa library version 2.0 (which provides morphological analysis, generation, tagging and lemmatization for Czech) and can be browsed and searched by two web tools (DeriNet Viewer and DeriNet Search tool).
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
Editor(s)* Nicoletta Calzolari; Khalid Choukri; Thierry Declerck; Marko Grobelnik; Bente Maegaard; Joseph Mariani; Asunción Moreno; Jan Odijk; Stelios Piperidis
ISBN* 978-2-9517408-9-1
Address* Paris, France
Month* May
Venue* Grand Hotel Bernardin Conference Center
Publisher* European Language Resources Association

Paper (public) 2016-lrec_derinet.pdf (application/pdf)