Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

Year 2012
Type in proceedings
Status published
Language English
Author(s) Cinková, Silvie; Smejkalová, Lenka; Vernerová, Anna; Thál, Jonáš; Holub, Martin
Title Maintaining consistency of monolingual verb entries with interannotator agreement
Czech title Udržování konzistence jednojazyčných slovesných hesel pomocí mezianotátorské shody
Proceedings 2012: Lund, Sweden: Lexikografi i Norden 11: Nordiska studier i lexikografi - Rapport från Konferensen om lexikografi i Norden
Number 11
Pages range 169-180
Note Publication -5562357083007763499, now recorded in Verso as 170670, was already recorded in Caddis last year as http://verso.is.cuni.cz/fcgi/verso.fpl?fname=obd_m_det&idcko=148523&zaznamcislo=
How published print
Supported by 2005-2009 LC536 (Centrum komputační lingvistiky); 2009-2012 FP7-ICT-2007-3-231720 (EuroMatrix Plus); 2010-2013 GAP406/10/0875 (Komputační lingvistika: Explicitní popis jazyka a anotovaná data se zřetelem na češtinu); 2012-2018 GBP103/12/G084 (Centrum pro multi-modální interpretaci dat velkého rozsahu); 2010-2015 LM2010013 (LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat); 2009-2012 7E09003 (EuroMatrixPlus – Bringing Machine Translation for European Languages to the User); 2012-2016 PRVOUK P46 (Informatika)
Czech abstract Článek se věnuje problematice sémantické analýzy sloves a lexikálnímu popisu sloves se zřetelem k vytvoření trénovacích dat pro automatický sémantický analyzátor. Vychází z metody Corpus Pattern Analysis a zkoumá její uplatnění v lexikálním popisu sloves pro účely NLP pomocí měření mezianotátorské shody.
English abstract There is no objectively correct way to create a monolingual entry of a polysemous verb. By structuring a verb into readings, we impose our conception onto lexicon users, no matter how big a corpus we use in support. How do we make sure that our structuring is intelligible to others?
We are performing an experiment to validate the fully corpus-based Pattern Dictionary of English Verbs (Hanks & Pustejovsky, 2005), created according to the lexical theory Corpus Pattern Analysis (CPA). The lexicon is interlinked with a large corpus, in which several hundred randomly selected concordances of each processed verb are manually annotated with the numbers of their corresponding lexicon readings (“patterns”). It would be interesting to prove (or falsify) the leading assumption of CPA that, given that the patterns are based on a large corpus, individual introspection has been minimized and most people can agree on this particular semantic structuring. We have encoded the guidelines for assigning concordances to patterns and hired annotators to annotate random samples of verbs contained in the lexicon. Apart from measuring the interannotator agreement, we analyze and adjudicate the disagreements. The outcome is offered to the lexicographer as feedback. The lexicographer revises his entries, and the agreement can be measured again on a different random sample to test whether the revision has improved the interannotator agreement score. A high interannotator agreement suggests that lexicon users are likely to find a pattern corresponding to a random verb use for which they seek an explanation. A low agreement score warns that some patterns are missing or vague.
We focus on machine-learning applications, but we believe that this procedure is of interest even for quality management in human lexicography.
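The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes Cohen's kappa for two annotators who each assign a pattern number to the same concordances. The choice of Cohen's kappa, the function name `cohen_kappa`, and the sample labels are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    labels_a and labels_b are equal-length sequences; element i holds the
    pattern number each annotator assigned to concordance i.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty sample")

    n = len(labels_a)

    # Observed agreement: share of concordances given the same pattern number.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labelled at random according to
    # their own pattern frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical pattern throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical pattern numbers for eight concordances of one verb.
annotator_1 = [1, 1, 2, 2, 3, 1, 2, 3]
annotator_2 = [1, 1, 2, 3, 3, 1, 2, 2]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In the procedure the abstract describes, such a score would be computed once before the lexicographer's revision and again on a different random sample afterwards, so that the two values can be compared.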
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
Editor(s)* Birgit Eaker; Lennart Larsson; Anki Mattisson
ISBN* 978-91-85333-42-4
ISSN* 0803-9313
Address* Lund, Sweden
Month* May
Venue* Lund, Sweden
Publisher* Nordiska föreningen för lexikografi
Institution* Svenska Akademiens Ordbok
Organization* Svenska Akademiens Ordbok
Creator: Common Account
Created: 10/10/11 1:41 PM
Modifier: Common Account
Modified: 6/2/16 1:59 PM

