Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic


Year 2006
Type in proceedings
Status published
Language English
Author(s) Džeroski, Sašo; Erjavec, Tomaž; Ledinek, Nina; Pajas, Petr; Žabokrtský, Zdeněk; Žele, Andreja
Title Towards a Slovene Dependency Treebank
Czech title Slovinský závislostní korpus
Proceedings 2006: Genova, Italy: LREC 2006: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006)
Pages range 1388-1391
URL http://ufal.mff.cuni.cz/~zabokrtsky/papers/lrec06-sdt.pdf
Supported by 2005-2009 1ET101120503 (Integrace jazykových zdrojů za účelem extrakce informací z přirozených textů)
Czech abstract Článek popisuje první vydání Slovinského závislostního korpusu, který v současnosti obsahuje přibližně 2000 vět. Anotační postup je převzat z Pražského závislostního korpusu.
English abstract The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2000 sentences or 30,000 words. Our approach to annotation is based on the Prague Dependency Treebank, which serves as an excellent model due to the similarity of the languages, the existence of a detailed annotation guide and an annotation editor. The initial treebank contains a portion of the MULTEXT-East parallel word-level annotated corpus, namely the first part of the Slovene translation of Orwell’s “1984”. This corpus was first parsed automatically, to arrive at the initial analytic level dependency trees. These were then hand corrected using the tree editor TrEd; simultaneously, the Czech annotation manual was modified for Slovene. The current version is available in XML/TEI, as well as derived formats, and has been used in a comparative evaluation using the MALT parser, and as one of the languages present in the CoNLL-X shared task on dependency parsing. The paper also discusses further work, in the first instance the composition of the corpus to be annotated next.
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
ISBN* 2-9517408-2-4
Address* Genova, Italy
Month* May
Institution* ELRA
