
Institute of Formal and Applied Linguistics

at the Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Publication


Year 2016
Type in proceedings
Status published
Language English
Author(s) Bojar, Ondřej; Dušek, Ondřej; Kocmi, Tom; Libovický, Jindřich; Novák, Michal; Popel, Martin; Sudarikov, Roman; Variš, Dušan
Date 15.9.2016
Title CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered
Czech title CzEng 1.6: Větší česko-anglický paralelní korpus (English: CzEng 1.6: A Larger Czech-English Parallel Corpus)
Proceedings Text, Speech, and Dialogue: 19th International Conference, TSD 2016 (Cham / Heidelberg / New York / Dordrecht / London, 2016)
Pages range 231-238
How published print
URL http://link.springer.com/chapter/10.1007/978-3-319-45510-5_27
Supported by 2015-2018 H2020-ICT-2014-1-645452 (QT21: Quality Translation 21); 2015-2018 H2020-ICT-2014-1-644402 (HimL (Health in my Language)); 2013-2016 FP7-ICT-2013-10-610516 (QTLeap); 2015-2017 GA15-10472S (Morphologically and Syntactically Annotated Corpora of Many Languages); 2016-2018 GA16-05394S (Structure of coreferential chains in parallel language data); 2015-2017 GAUK 3389/2015 (Cross-lingual Techniques for Investigating Coreference); 2016 SVV 260 333 (Theoretical Foundations of Computer Science and Computational Linguistics); 2012-2016 PRVOUK P46 (Informatics); 2016-2019 LM2015071 (LINDAT-CLARIN: Institute for the Analysis, Processing and Distribution of Linguistic Data)
Czech abstract (English translation) We present a new release of the Czech-English parallel corpus CzEng. CzEng 1.6 contains about half a billion words in each language.
English abstract We present a new release of the Czech-English parallel corpus CzEng. CzEng 1.6 consists of about 0.5 billion words (“gigaword”) in each language. The corpus is equipped with automatic annotation at a deep syntactic level of representation and alternatively in Universal Dependencies. Additionally, we release the complete annotation pipeline as a virtual machine in the Docker virtualization toolkit.
Specialization linguistics
Confidentiality default – not confidential
Open access no
Scopus EID Code 2-s2.0-85008392180
DOI 10.1007/978-3-319-45510-5_27
Editor(s)* Petr Sojka; Aleš Horák; Ivan Kopeček; Karel Pala
ISBN* 978-3-319-45509-9
ISSN* 0302-9743
Address* Cham / Heidelberg / New York / Dordrecht / London
Month* September
Venue* Hotel Continental
Publisher* Springer International Publishing
Institution* Masaryk University
Journal* Lecture Notes in Computer Science
Creator: Common Account
Created: 9/6/16 2:49 PM
Modifier: Common Account
Modified: 5/23/18 3:32 PM
***

Content, Design & Functionality: ÚFAL, 2006–2016. Page generated: Wed Jul 18 04:58:17 CEST 2018

