
Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Publication


Year 2016
Type in proceedings without ISBN
Status published
Language English
Author(s) Klímová, Jana; Kolářová, Veronika; Vernerová, Anna
Title Towards a Corpus-based Valency Lexicon of Czech Nouns
Czech title Ke slovníku valence českých substantiv založenému na korpusu
Proceedings LREC 2016 Workshop GLOBALEX 2016: Lexicographic Resources for Human Language Technology
Pages range 1-7
How published online
URL http://ailab.ijs.si/globalex/files/2016/06/LREC2016Workshop-GLOBALEX_Proceedings-v2.pdf
Supported by 2016–2018 GA16-02196S (Corpus-based Valency Lexicon of Czech Nouns); 2016–2019 LM2015071 (LINDAT-CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data); 2010–2015 LM2010013 (LINDAT-CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data)
Czech abstract (English translation) The Corpus-based Valency Lexicon of Czech Nouns is a new project that builds on previous research into noun valency in Czech. This paper deals with capturing noun valency in a modern corpus-based lexicon that is available both to human users and as machine-readable data. We also report on the limitations that the currently most widely used corpus search interfaces impose on research into nominal valency.
English abstract The Corpus-based Valency Lexicon of Czech Nouns is a new project that picks up the threads of our previous work on nominal valency. It builds on the solid foundations of the theory of valency developed within the Functional Generative Description. In this paper, we describe how noun valency is treated in a modern corpus-based lexicon, available as machine-readable data in a format suitable for NLP applications, and we report on the limitations that the most commonly used corpus interfaces impose on research into nominal valency. The linguistic material is extracted from the Prague Dependency Treebank, the synchronic written part of the Czech National Corpus, and Araneum Bohemicum. We will use the lexicographic software, and partly also the data, developed for the valency lexicon PDT-Vallex, but entries will be treated more exhaustively, for example in the coverage of senses and in the semantic classification added to selected lexical units (meanings). The main criteria for including nouns in the lexicon will be semantic class membership and the complexity of their valency patterns. Noun valency will be captured in the form of valency frames, an enumeration of all possible combinations of adnominal participants, and corpus examples.
Specialization linguistics
Confidentiality default – not confidential
Open access no
Publisher GLOBALEX workshop 2016

Attachment: KlimovaKolarovaVernerova16towards.pdf (public, application/pdf)
