
Institute of Formal and Applied Linguistics

at the Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Publication


Year 2010
Type in proceedings
Status published
Language English
Author(s) Bojar, Ondřej; Šindlerová, Jana
Title Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
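Czech title Budování dvojjazyčného valenčního slovníku zarovnáváním uzlů v závislostním korpusu: První pozorování (English translation: Building a Bilingual Valency Lexicon by Aligning Nodes in a Dependency Treebank: First Observations)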
Proceedings 2010: Valletta, Malta: LREC 2010: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010)
Pages range 304-309
How published online
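Supported by 2009-2012 7E09003 (EuroMatrixPlus – Bringing Machine Translation for European Languages to the User); 2008-2011 GAUK 19008/2008 (Multilingvální zdroj valenčních vlastností sloves, "Multilingual Resource of Verb Valency Properties"); 2008-2011 GAUK 52408/2008 (Diateze a transformace povrchového vyjádření valenčních doplnění, "Diatheses and Transformations of the Surface Expression of Valency Complementations"); 2005-2010 MSM 0021620838 (Moderní metody, struktury a systémy informatiky, "Modern Methods, Structures and Systems of Computer Science"); 2010 SVV 261 314 (Specifický vysokoškolský výzkum, "Specific University Research"); 2009-2012 FP7-ICT-2007-3-231720 (EuroMatrix Plus)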
Czech abstract (English translation) In this paper, we study the potential and limitations of the idea of building a bilingual valency lexicon by aligning nodes in a parallel dependency treebank.
English abstract In this paper, we explore the potential and limitations of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to build an electronic Czech-English valency lexicon by collecting equivalences from bilingual treebank data and storing them in two existing electronic valency lexicons, PDT-VALLEX and Engvallex. For this task, a dedicated annotation interface has been built on top of the TrEd editor, allowing frame equivalences to be collected quickly and easily in either of the source lexicons. The issues affecting annotation practice that were encountered during the first months of annotation include technical limitations, theory-dependent limitations, and limitations on the degree of quality achievable by human annotators. Issues of special interest to both the linguists and the MT specialists involved in the project include linguistically motivated imbalance between frame equivalents, in either the number or the type of valency participants. The first phases of annotation so far support the assumption that there is a unique correspondence between the functors of translation-equivalent frames. Moreover, hardly any linguistically significant imbalance between the frames has been found; this is partly promising for the linguistic theory used and partly a consequence of the low stylistic variety of the annotated corpus texts.
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
WOS Code 000356879505013
Scopus EID Code 2-s2.0-85013646681
ISBN* 2-9517408-6-7
Address* Valletta, Malta
Month* May
Venue* Mediterranean Conference Centre
Publisher* European Language Resources Association
Creator: Common Account
Created: 9/24/10 3:33 PM
Modifier: Common Account
Modified: 4/4/18 10:55 AM
***

LREC2010publicSUBMITTED.pdf (application/pdf)

