In terms of computational processing, Czech is a well-developed language with rich data and tool resources. Unfortunately, it still lags behind in machine translation quality. Research to date has focused mainly on translation based on deep linguistic analysis. Phrase-based translation, currently the most successful approach worldwide, has been left aside, partly because the distinct properties of Czech mean that it cannot be applied to the language directly without appropriate adaptation. The goal of this project is to adapt an existing phrase-based translation system to translation from English and two other languages, and to achieve acceptable translation quality. The translation of named entities will be studied separately. A web application for translating web pages will be created. The project will also procure several reference translations of test texts for evaluating the quality of automatic translation.

Central Register of Projects

Project team:

Daniel Zeman (principal investigator)
Ondřej Bojar (research collaborator)

Publications

Ondřej Bojar, Daniel Zeman (2014): Czech Machine Translation in the project CzechMATE. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 101, pp. 71-96
Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič (2014): HamleDT: Harmonized Multi-Language Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 48, no. 4, pp. 601-637
Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, Daniel Zeman (2013): Tools for Machine Translation Quality Inspection (technical report)
Karel Bílek, Daniel Zeman (2013): CUni Multilingual Matrix in the WMT 2013 Shared Task. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, pp. 85-91, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-57-2
Ondřej Bojar, Matouš Macháček, Aleš Tamchyna, Daniel Zeman (2013): Scratching the Surface of Possible Translations. In: Text, Speech and Dialogue: 16th International Conference, TSD 2013. Proceedings, Lecture Notes in Computer Science, ISSN 0302-9743, 8082, pp. 465-474, Springer Verlag, Berlin / Heidelberg, ISBN 978-3-642-40584-6
Ondřej Bojar, Rudolf Rosa, Aleš Tamchyna (2013): Chimera – Three Heads for English-to-Czech Translation. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, pp. 92-98, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-57-2
Ondřej Bojar, Aleš Tamchyna (2013): The Design of Eman, an Experiment Manager. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 99, pp. 39-58
Michal Kalina, Ondřej Bojar (2013): Jak překladač z Matfyzu porazil Google. In: Hospodářské noviny IHNED, ISSN 1213-7693
Matouš Macháček, Ondřej Bojar (2013): Results of the WMT13 Metrics Shared Task. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, pp. 45-51, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-57-2
Martin Popel, David Mareček, Jan Štěpánek, Daniel Zeman, Zdeněk Žabokrtský (2013): Coordination Structures in Dependency Treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 517-527, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-50-3
Aleš Tamchyna, Ondřej Bojar (2013): No Free Lunch in Factored Phrase-Based Machine Translation. In: Lecture Notes in Computer Science, ISSN 0302-9743, 7817, pp. 210-223
Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, Daniel Zeman (2012): Automatic MT Error Analysis: Hjerson Helping Addicter. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 2158-2163, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7
Ondřej Bojar, Bushra Jawaid, Amir Kamran (2012): Probes in a Taxonomy of Factored Phrase-Based Models. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 253-260, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6
Ondřej Bojar, Dekai Wu (2012): Towards a Predicate-Argument Evaluation for MT. In: Proceedings of Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6), ACL, pp. 30-38, Association for Computational Linguistics, Jeju, Korea, ISBN 978-1-937284-38-1
Mark Fishel, Ondřej Bojar, Maja Popović (2012): Terra: a Collection of Translation Error-Annotated Corpora. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 7-14, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7
Aleš Tamchyna, Petra Galuščáková, Amir Kamran, Miloš Stanojević, Ondřej Bojar (2012): Selecting Data for English-to-Czech Machine Translation. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 374-381, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6
Daniel Zeman (2012): CUNI: Feature Selection and Error Analysis of a Transition-Based Parser. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), pp. 143-148, The COLING 2012 Organizing Committee, Mumbai, India
Daniel Zeman (2012): Data Issues of the Multilingual Translation Matrix. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 395-400, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6
Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič (2012): HamleDT: To Parse or Not to Parse? In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 2735-2741, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7
Eduard Bejček, Pavel Straňák, Daniel Zeman (2011): Influence of Treebank Design on Representation of Multiword Expressions. In: Lecture Notes in Computer Science, ISSN 0302-9743, 6608, pp. 1-14
Jan Berka, Martin Černý, Ondřej Bojar (2011): Quiz-Based Evaluation of Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 95, pp. 77-86
Ondřej Bojar (2011): Analyzing Error Types in English-Czech Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 95, pp. 63-76
Mark Fishel, Ondřej Bojar, Daniel Zeman, Jan Berka (2011): Automatic Translation Error Analysis. In: Lecture Notes in Computer Science, ISSN 0302-9743, 6836, pp. 72-79
Ondřej Hálek, Rudolf Rosa, Aleš Tamchyna, Ondřej Bojar (2011): Named Entities from Wikipedia for Machine Translation. In: Information Technologies – Applications and Theory, pp. 23-30, Univerzita Pavla Jozefa Šafárika v Košiciach, Košice, Slovakia, ISBN 978-80-89557-02-8
Bushra Jawaid, Daniel Zeman (2011): Word-Order Issues in English-to-Urdu Statistical Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 95, pp. 87-106
Matouš Macháček, Ondřej Bojar (2011): Approximating a Deep-Syntactic Metric for MT Evaluation and Tuning. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 92-98, Association for Computational Linguistics, Edinburgh, UK, ISBN 978-1-937284-12-1
Daniel Zeman (2011): Hierarchical Phrase-Based MT at the Charles University for the WMT 2011 Shared Task. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 496-500, Association for Computational Linguistics, Edinburgh, UK, ISBN 978-1-937284-12-1
Daniel Zeman, Mark Fishel, Jan Berka, Ondřej Bojar (2011): Addicter: What Is Wrong with My Translations? In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 96, pp. 79-88

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics

CZECHMATE

Czech in the Age of Machine Translation