This is the first workshop on "Deep Machine Translation". Its aim is to bring together researchers and students working on machine translation approaches and technology using "deep understanding" (not necessarily using Deep Neural Networks, as the name might suggest, but certainly not excluding them either). Adding "more linguistics" has long been considered a possible way to boost the quality of current, mainly (PB)SMT-based systems. However, there are many ways to do so, and we felt that a forum was needed where experience can be shared among people working on such systems.
Moreover, we welcome submissions on any aspects of deep language analysis, generation and natural language understanding, even if the connection to machine translation might be indirect.
Finally, we welcome submissions on query translation and other aspects of multilingual Question Answering (such as an NLP interface to an "IT helpdesk") and/or Cross-lingual Information Retrieval.
We would also like to attract submissions from running or past EU projects on MT (QT21, HiML, QTLeap, TraMOOC, MMT, Khresmoi, KConnect, ...) to share their experience in pursuing higher quality in MT, even if they do not use linguistic aspects directly.
Papers on original and unpublished research are welcome on any of the topics listed above in general, and specifically on any of the following:
All deadlines are 23:59 AoE.
Several other NLP events will be taking place in Prague in the beginning of September (in the same building), so you can easily take part in several of them within the same trip:
Templates and Submission page (also for final manuscripts): templates; submission page
The maximum submission length is 8 pages (A4), plus two extra pages for references, following a one-column ACL-like format, as specified below.
Papers shall be submitted in English. As the reviewing will be double-blind, papers must be anonymized with regard to the authors and/or their institution (no author-identifying information on the title page or anywhere else in the paper), including referencing style as usual. Authors should also ensure that identifying meta-information is removed from files submitted for review. Papers must conform to official DMTW 2015 style guidelines (see below). Submission and reviewing will be managed online by the EasyChair system. The only accepted format for submitted papers is PDF.
Submissions must be uploaded on the EasyChair system by the submission deadlines; submissions after that time will not be reviewed.
Papers that are being submitted in parallel to other conferences or workshops must indicate this on the title page. Papers that contain significant overlap with previously published work must also signal that.
Papers will be published online by the time of the Workshop, assigned an ISBN as regular proceedings published by the UFAL / Charles University publishing house, and listed in the ACL Anthology.
The mode of presentation (oral or poster) will be decided by the Program Committee based on each paper's suitability for the given mode, not on its quality. All papers will be given the same space in the proceedings, and the proceedings will make no distinction between research papers presented orally and those presented as posters. Papers will be reviewed by at least three members of the Program Committee.
The venue's address:
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
Malostranske nam. 25
11800 Prague 1
Czech Republic
(annotated Google Map)
Detailed information about the venue, transport, accommodation, restaurants, visas, etc.
Registration is open! To register, please fill in the registration form. There is no registration fee, but registration is required for organizational reasons.
We are happy to announce that Prof. Dr. Christian Chiarcos will be an invited speaker at the workshop. The talk will be given on Friday, Sep. 4, 9-10am.
Abstract:
Linguistic Linked Open Data: What’s in for Machine Translation?
During the past years, the notion of Linked (Open) Data has gained considerable reception in different communities working with language resources, ranging from academic and applied linguistics over lexicography to natural language processing and information technology. In this context, the Open Linguistics Working Group of the Open Knowledge Foundation (OWLG, http://linguistics.okfn.org/), founded in 2010 in Berlin, Germany, is playing an important integrative role by reaching out to a broad bandwidth of disciplines, by facilitating interdisciplinary information exchange through meetings, workshops, datathons and joint publications, but most notably by introducing and maintaining the Linguistic Linked Open Data (LLOD) cloud diagram. Being deeply involved in this emerging community at the intersection between the different disciplines mentioned above, I will introduce the basic concepts of Linked Open Data for linguistics/NLP and summarize the motivations and history of Linguistic Linked Open Data so far. Since the first instantiation of the LLOD cloud diagram was created in 2012, LLOD has attracted a lot of activity: we have reached an agreement on vocabularies for many aspects of language resources, and the number of resources included is continuously on the rise. This growth is documented, for example, by the declaration of LLOD as "the new hot topic in our (= language resource) community" (Nicoletta Calzolari, LREC-2014 closing session). But with substantial amounts of data being available, the focus of activity in the LLOD community is slowly shifting from resource creation to applications of Linguistic Linked Open Data. The primary promise of providing open, but heterogeneously structured and scattered language resources in a more interoperable way has been fulfilled, and it facilitates using and re-using existing language resources in novel contexts.
Beyond this, innovative *LLOD-based* applications for common problems in Natural Language Processing, Digital Humanities and linguistics are on the horizon. The second part of the talk will give a glimpse of these prospects by discussing use cases and potential applications of LLOD for (Deep) Machine Translation.
The Proceedings of the 1st Deep Machine Translation Workshop are available in the ACL Anthology. They are published by Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Praha, Czech Republic, under the ISBN 978-80-904571-7-1, with Jan Hajič and António Branco as the editors.
Program in PDF for printing: DMTW_program.pdf
Thursday, Sep. 3, 2015 | |
14:15-15:15 | Jan Hajic, Antonio Branco: Opening and introduction to Deep MT Workshop (slides) |
15:15-16:00 | Coffee Break |
16:00-17:30 | Session 1 - Chair: Jan Hajic |
16:00-16:30 | Deep-syntax TectoMT for English-Spanish MT (paper, slides) |
16:30-17:00 | Translation Model Interpolation for Domain Adaptation in TectoMT (paper, slides) |
17:00-17:30 | Factored models for Deep Machine Translation (paper, slides) |
18:30-21:00 | Social dinner: Konirna, restaurant U Vladare, Maltezske nam. 10, Prague 1 (map) |
Friday, Sep. 4, 2015 | |
9:00-10:30 | Session 2 - Chair: Antonio Branco |
9:00-10:00 | Invited talk: Christian Chiarcos: Linguistic Linked Open Data: What’s in for Machine Translation? (abstract, slides) |
10:00-10:30 | First Steps in Using Word Senses as Contextual Features in Maxent Models for Machine Translation (paper) |
10:30-11:00 | Coffee Break |
11:00-12:30 | Session 3 - Chair: Petya Osenova |
11:00-11:30 | Lexical choice in Abstract Dependency Trees (paper) |
11:30-12:00 | Splitting Compounds by Semantic Analogy (paper) |
12:00-12:30 | Towards Deeper MT - A Hybrid System for German (paper, slides) |
12:30-14:00 | Lunch |
14:00-15:30 | Session 4 - Chair: Dieke Oele |
14:00-14:30 | Evaluating a Machine Translation System in a Technical Support Scenario (paper, slides) |
14:30-15:00 | Sanja Štajner, João Rodrigues, Luis Gomes and António Branco: Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies (paper, slides) |
15:00-15:30 | Large Scale Translation Quality Estimation (paper) |
15:30-16:00 | Coffee Break |
16:00-17:00 | Session 5 - Chair: Aljoscha Burchardt |
16:00-16:30 | Delimiting Morphosyntactic Search Space with Source-Side Reordering Models (paper) |
16:30-17:00 | Modelling the Adjunct/Argument Distinction in Hierarchical Phrase-Based SMT (paper, slides) |
17:00-17:30 | General Discussion, closing remarks |
The lunch on Sep. 4 is catered in the cafeteria on the -1 floor (please take the elevators).
The workshop is organized with the support of the QTLeap FP7 project.