
Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic


Publication


Year 2016
Type in proceedings without ISBN
Status published
Language English
Author(s) Saleh, Shadi; Pecina, Pavel
Title Adapting SMT Query Translation Reranker to New Languages in Cross-Lingual Information Retrieval
Czech title Adaptace systému pro přeuspořádávání hypotéz strojového překladu dotazů na nové jazyky pro vícejazyčné vyhledávání informací
Proceedings 2016: Pisa, Italy: MEDIR 2016: Medical Information Retrieval (MedIR) Workshop at the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages range 1-4
How published online
URL http://medir2016.imag.fr/data/MEDIR_2016_paper_17.pdf
Supported by 2015-2017 H2020-ICT-2014-1-644753 (KConnect (Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain)); 2012-2016 PRVOUK P46 (Informatika); 2016 SVV 260 333 (Teoretické základy informatiky a výpočetní lingvistiky); 2012-2018 GBP103/12/G084 (Centrum pro multi-modální interpretaci dat velkého rozsahu)
Czech abstract (translated) This paper deals with a method for translating search queries that reranks the translation hypotheses produced by a translation system with respect to the quality of the resulting retrieval. The paper presents a way to adapt this method to new source languages.
English abstract We investigate the adaptation of a supervised machine learning model for reranking query translations to new languages in the context of cross-lingual information retrieval. The model is trained to rerank multiple translations produced by a statistical machine translation system so as to optimize retrieval quality. The model features do not depend on the source language, which allows the model to be trained on query translations from multiple languages. In this paper, we explore how this affects the final retrieval quality. The experiments are conducted on a medical-domain test collection in English with multilingual queries (in Czech, German, and French) from the CLEF eHealth Lab series 2013–2015. We adapt our method to allow reranking of query translations for four new languages (Spanish, Hungarian, Polish, and Swedish). The baseline approach, in which a separate model is trained for each source language on query translations from that language, is compared with a model co-trained on translations from the three original languages.
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
Address* Pisa, Italy
Month* July
Publisher* ACM
Creator: Common Account
Created: 7/4/16 3:51 PM
Modifier: Almighty Admin
Modified: 2/25/17 10:07 PM