Deep Machine Translation Workshop 2015

This is the first workshop on "Deep Machine Translation". Its aim is to bring together researchers and students working on machine translation approaches and technology using "deep understanding" (not necessarily using Deep Neural Networks, as the name might suggest, but certainly not excluding them either). Adding "more linguistics" has long been considered a possible way to boost the quality of current, mainly (PB)SMT-based, systems. However, there are many ways to do so, and we felt that a forum was needed where people working on such systems can share their experience.

Moreover, we welcome submissions on any aspects of deep language analysis, generation and natural language understanding, even if the connection to machine translation might be indirect.

Finally, we welcome submissions on query translation and other aspects of multilingual Question Answering (such as an NLP interface to an "IT helpdesk") and/or Cross-lingual Information Retrieval.

We would like to attract submissions also from running or past EU projects on MT (QT21, HiML, QTLeap, TraMOOC, MMT, Khresmoi, KConnect, ...) to share their experience about pursuing higher quality in MT - even if they do not use linguistic aspects directly.

Papers on original and unpublished research are welcome on any of the topics listed above in general, and specifically on any of the following:

  • General approaches to the use of linguistic knowledge for Machine Translation
  • Semantics for Machine Translation 
  • Combination of statistical and "manual" approaches to Machine Translation, hybrid systems
  • Innovative use of manually built lexical resources in Machine Translation (monolingual, bilingual)
  • Deep linguistic representation of meaning / semantics, including semantic graphs, logical representation, temporal and spatial representation and grounding
  • Deep linguistic analysis and generation
  • Joint linguistic and distributional modeling (analysis, generation, transfer)
  • Analysis, generation and transfer using graph-based meaning representation 
  • Incorporating coreference, named entity recognition, word sense disambiguation, or any other linguistically motivated features into the MT chain
  • Multilingual question-answering and CLIR approaches, including specific methods for query translation and query matching in a multilingual setting
  • Evaluation methods for standard text translation, query translation, and CLIR


  • CFP released: June 26, 2015
  • Registration open: July 17, 2015
  • Submission deadline: July 25, 2015 (updated!)
  • Announcement of acceptance: August 16, 2015 (updated)
  • Camera Ready due: August 27, 2015
  • Workshop dates: September 3-4, 2015

All deadlines are 23:59 AoE.

Collocated events

Several other NLP events will take place in Prague at the beginning of September (in the same building), so you can easily take part in several of them within the same trip:

  • YRRSDS, August 31 - September 1 (Workshop on Spoken Dialogue Systems for PhDs, PostDocs & New Researchers)
  • SIGdial, September 2-4 (Meeting on Discourse and Dialogue)
  • MTM, September 7-12 (Machine Translation Marathon)

Instructions for authors

The maximum submission length is 8 pages (A4), plus two extra pages for references, following a one-column ACL-like format, as specified below.

Papers shall be submitted in English. As the reviewing will be double-blind, papers must be anonymized with regard to the authors and/or their institution (no author-identifying information on the title page or anywhere else in the paper), including referencing style as usual. Authors should also ensure that identifying meta-information is removed from files submitted for review. Papers must conform to official DMTW 2015 style guidelines (see below). Submission and reviewing will be managed online by the EasyChair system. The only accepted format for submitted papers is Adobe PDF.

Submissions must be uploaded to the EasyChair system by the submission deadline; submissions after that time will not be reviewed.

Papers that are being submitted in parallel to other conferences or workshops must indicate this on the title page. Papers that contain significant overlap with previously published work must also signal that.

Papers will be published online by the time of the Workshop, assigned an ISBN as regular proceedings published by the UFAL / Charles University publishing house, and listed in the ACL Anthology.

The mode of presentation (oral presentation or poster) will be decided by the Program Committee based on the submitted papers, according to suitability for the given presentation mode, not quality. All papers will be given the same space in the proceedings, and the proceedings will make no distinction between research papers presented orally and those presented as posters. Papers will be reviewed by at least three members of the Program Committee.

Venue, Logistics, Food, Visa and other information 

The venue's address:

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
Malostranske nam. 25
11800 Prague 1
Czech Republic
(annotated Google Map)

Invited talk by Christian Chiarcos

We are happy to announce that Prof. Dr. Christian Chiarcos will be an invited speaker at the workshop. The talk will be given on Friday, Sep. 4, 9-10am.


Linguistic Linked Open Data: What’s in for Machine Translation?

During the past years, the notion of Linked (Open) Data has gained considerable reception in different communities working with language resources, ranging from academic and applied linguistics over lexicography to natural language processing and information technology. In this context, the Open Linguistics Working Group of the Open Knowledge Foundation (OWLG), founded in 2010 in Berlin, Germany, is playing an important integrative role, by reaching out to a broad band-width of disciplines, by facilitating interdisciplinary information exchange through meetings, workshops, datathons and joint publications, but most notably by introducing and maintaining the Linguistic Linked Open Data (LLOD) cloud diagram. Being deeply involved in this emerging community at the intersection between the different disciplines mentioned above, I will introduce the basic concepts of Linked Open Data for linguistics/NLP and summarize the motivations and history of Linguistic Linked Open Data so far. Since the first instantiation of the LLOD cloud diagram was created in 2012, LLOD has attracted a lot of activity: we have reached an agreement on vocabularies for many aspects of language resources, and the number of resources included is continuously on the rise. This growth is documented, for example, by declaring LLOD "the new hot topic in our (= language resource) community" (Nicoletta Calzolari, LREC-2014 closing session). But with substantial amounts of data being available, the focus of activity in the LLOD community is slowly shifting from resource creation to applications of Linguistic Linked Open Data. The primary promise of providing open, but heterogeneously structured and scattered language resources in a more interoperable way has been fulfilled, and it facilitates using and re-using existing language resources in novel contexts. Beyond this, innovative *LLOD-based* applications for common problems in Natural Language Processing, Digital Humanities and linguistics are on the horizon.
The second part of the talk will give a glimpse on these prospects by discussing use cases and potential applications of LLOD for (Deep) Machine Translation.


The Proceedings of the 1st Deep Machine Translation Workshop are available in the ACL Anthology. They are published by Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Praha, Czech Republic, under the ISBN 978-80-904571-7-1, with Jan Hajič and António Branco as the editors.


Program in PDF for printing: DMTW_program.pdf

Thursday, Sep. 3, 2015
14:15-15:15 Jan Hajic, Antonio Branco: Opening and introduction to Deep MT Workshop (slides)
15:15-16:00 Coffee Break
16:00-17:30 Session 1 - Chair: Jan Hajic
16:00-16:30 Gorka Labaka, Oneka Jauregi, Arantza Díaz de Ilarraza, Michael Ustaszewski, Nora Aranberri and Eneko Agirre: Deep-syntax TectoMT for English-Spanish MT (paper, slides)
16:30-17:00 Rudolf Rosa, Ondrej Dusek, Michal Novák and Martin Popel: Translation Model Interpolation for Domain Adaptation in TectoMT (paper, slides)
17:00-17:30 Kiril Simov, Iliana Simova, Velislava Todorova and Petya Osenova: Factored models for Deep Machine Translation (paper, slides)
18:30-21:00 Social dinner: Konirna, restaurant U Vladare, Maltezske nam. 10, Prague 1 (map)
Friday, Sep. 4, 2015
9:00-10:30 Session 2 - Chair: Antonio Branco
9:00-10:00 Invited talk:
Christian Chiarcos: Linguistic Linked Open Data: What’s in for Machine Translation? (abstract, slides)
10:00-10:30 Steven Neale, Luís Gomes and António Branco: First Steps in Using Word Senses as Contextual Features in Maxent Models for Machine Translation (paper)
10:30-11:00 Coffee Break
11:00-12:30 Session 3 - Chair: Petya Osenova
11:00-11:30 Dieke Oele and Gertjan van Noord: Lexical choice in Abstract Dependency Trees (paper)
11:30-12:00 Joachim Daiber, Lautaro Quiroz, Roger Wechsler and Stella Frank: Splitting Compounds by Semantic Analogy (paper)
12:00-12:30 Eleftherios Avramidis, Aljoscha Burchardt, Maja Popovic and Hans Uszkoreit: Towards Deeper MT - A Hybrid System for German (paper, slides)
12:30-14:00 Lunch 
14:00-15:30 Session 4 - Chair: Dieke Oele
14:00-14:30 Rosa Del Gaudio, Aljoscha Burchardt and Arle Lommel: Evaluating a Machine Translation System in a Technical Support Scenario (paper, slides)
14:30-15:00 Sanja Štajner, João Rodrigues, Luis Gomes and António Branco: Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies (paper, slides)
15:00-15:30 Miguel Angel Rios Gaona and Serge Sharoff: Large Scale Translation Quality Estimation (paper)
15:30-16:00 Coffee Break
16:00-17:00 Session 5 - Chair: Aljoscha Burchardt
16:00-16:30 Parameswari Krishnamurthy: Development of Telugu-Tamil Transfer-Based Machine Translation system: With Special reference to Divergence Index (paper) (CANCELLED due to visa problem, will be presented at MT Marathon)
16:00-16:30 Joachim Daiber, Khalil Sima'an: Delimiting Morphosyntactic Search Space with Source-Side Reordering Models (paper)
16:30-17:00 Sophie Arnoult and Khalil Sima'an: Modelling the Adjunct/Argument Distinction in Hierarchical Phrase-Based SMT (paper, slides)
17:00-17:30 General Discussion, closing remarks

The lunch on Sep. 4 is catered in the cafeteria on the -1 floor (please take the elevators).

The workshop is organized with support of the QTLeap FP7 project.