
Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Publication


Year 2006
Type article
Status published
Language English
Author(s) Cinková, Silvie; Pomikálek, Jan
Title LEMPAS: A Make-Do Lemmatizer for the Swedish PAROLE-Corpus
Czech title LEMPAS: Provizorní lematizátor pro švédský korpus PAROLE
Journal The Prague Bulletin of Mathematical Linguistics
Publisher's city and country Prague, Czech Republic
Number 86
Pages range 47-54
Month December
Supported by 2005-2009 LC536 (Centrum komputační lingvistiky); 2004-2006 GAUK 489/2004; 2004-2008 1ET101120413 (Data a nástroje pro informační systémy); 2005-2009 1ET201120505 (Od jazyka ke znalostem a sémantickému webu)
Czech abstract LEMPAS je pravidlový lematizátor pro švédský morfosyntakticky tagovaný korpus PAROLE. Vznikl jako provizorium při přípravě PAROLE pro kolokační analýzu prováděnou nástrojem Sketch Engine. Přesto dává poměrně uspokojivé výsledky v lematizaci substantiv, sloves a částečně i adjektiv, jak jsme zjistili jeho testováním na ručně lematizovaném korpusu SUC.
English abstract LEMPAS, the lemmatizer for the Swedish corpus PAROLE, came into existence as a by-product of running the Sketch Engine (Kilgarriff et al., 2004) on Swedish, since many of the desirable features of the Sketch Engine, such as building word sketches, are only available for lemmatized corpora. We did not have access to any Swedish lexical sources and the time allowed for the lemmatization was very limited. Consequently, the lemmatizer had no great design ambitions. Initially, we were only attempting to bring related forms together under a pre-lemma, using general rules, and avoiding explicit lists where possible. When the initial rules gave surprisingly good lemmatizations of nouns, verbs and adjectives, we decided to transform the pre-lemmas into real lemmas. The improved lemmatizer made a very good impression. We have tested the program on the manually lemmatized Stockholm-Umeå Corpus (SUC), and have analyzed the results.
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
ISSN* 0032-6585
Institution* Univerzita Karlova v Praze
Creator: Common Account
Created: 12/8/06 7:38 PM
Modifier: Almighty Admin
Modified: 3/14/11 1:51 PM
***

Attachment: LEMPAS: A Make-Do Lemmatizer for the Swedish PAROL... (public, cinkova_final.pdf, application/pdf)
