Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic


Publication


Year 2015
Type in proceedings
Status published
Language English
Author(s) Baisa, Vít; Bradbury, Jane; Cinková, Silvie; El Maarouf, Ismail; Kilgarriff, Adam; Popescu, Octavian
Title SemEval-2015 Task 15: A CPA dictionary-entry-building task
Czech title SemEval-2015 Úloha 15: generování hesla ve slovníku založeném na CPA
Proceedings 2015: Stroudsburg, PA, USA: SemEval 2015: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
Pages range 315-324
How published print
URL http://www.aclweb.org/anthology/S15-2053
Supported by 2015-2017 GA15-20031S (Odkaz Zelliga S. Harrise: více lingvistické informace pro distribuční lexikální analýzu angličtiny a češtiny); 2012-2016 PRVOUK P46 (Informatika)
Czech abstract Tato studie popisuje první úkol v rámci SemEvalu, který je zaměřen na užití NLP systémů pro automatické generování slovníkových hesel podle metody Corpus Pattern Analysis.
English abstract This paper describes the first SemEval task to explore the use of Natural Language Processing systems for building dictionary entries, in the framework of Corpus Pattern Analysis. CPA is a corpus-driven technique which provides tools and resources to identify and represent unambiguously the main semantic patterns in which words are used. Task 15 draws on the Pattern Dictionary of English Verbs (www.pdev.org.uk) for the targeted lexical entries and on the British National Corpus for the input text. Dictionary entry building is split into three subtasks, all of which start from the same concordance sample: 1) CPA parsing, where arguments and their syntactic and semantic categories have to be identified; 2) CPA clustering, in which sentences with similar patterns have to be clustered; and 3) CPA automatic lexicography, where the structure of patterns has to be constructed automatically. Subtask 1 attracted 3 teams, though none could beat the baseline (rule-based system). Subtask 2 attracted 2 teams, one of which beat the baseline (majority-class classifier). Subtask 3 did not attract any participants. The task has produced a major semantic multidataset resource which includes data for 121 verbs and about 17,000 annotated sentences, and which is freely accessible.
Specialization computer science ("informatika")
Confidentiality default – not confidential
Open access no
Editor(s)* Preslav Nakov; Torsten Zesch; Daniel Cer; David Jurgens
ISBN* 978-1-941643-40-2
Address* Stroudsburg, PA, USA
Month* June
Venue* Sheraton Denver Downtown
Publisher* Association for Computational Linguistics

SemEval 2015 CPA, public, SemEval053.pdf, application/download