
Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Year 2016
Type data/software
Status published
Language English
Author(s) Mareček, David; Yu, Zhiwei; Zeman, Daniel; Žabokrtský, Zdeněk
Title Deltacorpus
Czech title Deltacorpus
Publisher LINDAT/CLARIN digital library
Institution Univerzita Karlova v Praze
Publisher's city and country Praha, Czechia
Month March
Note Version 1.1 released 2016-06-20, id http://hdl.handle.net/11234/1-1743
How published online
URL http://hdl.handle.net/11234/1-1662
Supported by 2015-2017 GA15-10472S (Morfologicky a syntakticky anotované korpusy mnoha jazyků); 2012-2016 PRVOUK P46 (Informatika)
Czech abstract Texty ve 107 jazycích z korpusu W2C (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), první 1000000 tokenů pro každý jazyk, označkované delexikalizovaným taggerem popsaným v Yu et al. (2016, LREC, Portorož, Slovenia).
English abstract Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), limited to the first 1,000,000 tokens per language and tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Category data
Economic parameters The resource provides a POS tagging solution for 107 languages. For most of them, no such resource was previously available, and creating a manually tagged corpus for a single language may cost hundreds of thousands of CZK.
Open access no
License approval required never
Fee required never
Identifier http://hdl.handle.net/11234/1-1662
Creator: Common Account
Created: 3/22/16 8:31 PM
Modifier: Almighty Admin
Modified: 2/25/17 10:07 PM

