MT Summit XVI http://www.mtsummit2017.org

Proceedings of MT Summit XVI Vol.1 Research Track

Sadao Kurohashi, Pascale Fung, Editors

MT Summit XVI September 18 – 22, 2017 -- Nagoya, Aichi, Japan

Proceedings of MT Summit XVI,

Vol. 1: MT Research Track

Sadao Kurohashi (Kyoto University) & Pascale Fung (Hong Kong University of Science and Technology), Eds.

Co-hosted by

International Association for Machine Translation (http://www.eamt.org/iamt.php), Asia-Pacific Association for Machine Translation (http://www.aamt.info), and the Graduate School of Informatics, Nagoya University (http://www.is.nagoya-u.ac.jp/index_en.html)

©2017 The Authors. These articles are licensed under a Creative Commons Attribution-NoDerivatives 3.0 license (CC BY-ND).

Research Program Co-chairs: Sadao Kurohashi (Kyoto University) and Pascale Fung (Hong Kong University of Science and Technology)

Research Program Committee: Yuki Arase, Osaka University; Laurent Besacier, LIG; Houda Bouamor, Carnegie Mellon University; Hailong Cao, Harbin Institute of Technology; Michael Carl, Copenhagen Business School; Daniel Cer, Google; Boxing Chen, NRC; Colin Cherry, NRC; Chenhui Chu, Osaka University; Marta R. Costa-jussà, Universitat Politècnica de Catalunya; Steve DeNeefe, SDL Language Weaver; Markus Dreyer, Amazon; Kevin Duh, Johns Hopkins University; Andreas Eisele, DGT, European Commission; Christian Federmann, Microsoft Research; Minwei Feng, IBM Watson Group; Mikel L. Forcada, Universitat d'Alacant; George Foster, National Research Council; Isao Goto, NHK; Spence Green, Lilt Inc; Eva Hasler, SDL; Yifan He, Bosch Research and Technology Center; Philipp Koehn, Johns Hopkins University; Roland Kuhn, National Research Council of Canada; Shankar Kumar, Google; Anoop Kunchukuttan, IIT Bombay; Jong-Hyeok Lee, Pohang University of Science & Technology; Gregor Leusch, eBay;

William Lewis, Microsoft Research; Mu Li, Microsoft Research; Lemao Liu, Tencent AI Lab; Qun Liu, Dublin City University; Klaus Macherey, Google; Wolfgang Macherey, Google; Saab Mansour, Apple; Daniel Marcu, Amazon; Jonathan May, USC Information Sciences Institute; Arul Menezes, Microsoft Research; Haitao Mi, Alipay US; Graham Neubig, Carnegie Mellon University; Matt Post, Johns Hopkins University; Fatiha Sadat, UQAM; Michel Simard, NRC; Katsuhito Sudoh, Nara Institute of Science and Technology; Christoph Tillmann, IBM Research; Masao Utiyama, NICT; Taro Watanabe, Google; Andy Way, ADAPT, Dublin City University; Deyi Xiong, Soochow University; Francois Yvon, LIMSI/CNRS; Rabih Zbib, Raytheon BBN Technologies; Jiajun Zhang, Institute of Automation, Chinese Academy of Sciences; Bing Zhao, SRI International

Contents

1    Empirical Study of Dropout Scheme for Neural Machine Translation. Xiaolin Wang, Masao Utiyama and Eiichiro Sumita
15   A Target Attention Model for Neural Machine Translation. Hideya Mino, Andrew Finch and Eiichiro Sumita
27   Neural Pre-Translation for Hybrid Machine Translation. Jinhua Du and Andy Way
41   Neural and Statistical Methods for Leveraging Meta-information in Machine Translation. Shahram Khadivi, Patrick Wilken, Leonard Dahlmann and Evgeny Matusov
55   Translation Quality and Productivity: A Study on Rich Morphology Languages. Lucia Specia, Kim Harris, Frédéric Blain, Aljoscha Burchardt, Vivien Macketanz, Inguna Skadiņa, Matteo Negri and Marco Turchi
72   The Microsoft Speech Language Translation (MSLT) Corpus for Chinese and Japanese: Conversational Test Data for Machine Translation. Christian Federmann and William Lewis
86   Paying Attention to Multi-Word Expressions in Neural Machine Translation. Matīss Rikters and Ondřej Bojar
96   Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages. Raj Dabre, Fabien Cromieres and Sadao Kurohashi
108  Learning an Interactive Attention Policy for Neural Machine Translation. Samee Ibraheem, Nicholas Altieri and John DeNero
116  A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators. Sheila Castilho, Joss Moorkens, Federico Gaspari, Rico Sennrich, Vilelmini Sosoni, Panayota Georgakopoulou, Pintu Lohar, Andy Way, Antonio Valerio Miceli Barone and Maria Gialama
132  One-parameter models for sentence-level post-editing effort estimation. Mikel L. Forcada, Miquel Esplà-Gomis, Felipe Sánchez-Martínez and Lucia Specia
144  A Minimal Cognitive Model for Translating and Post-editing. Moritz Schaeffer and Michael Carl

156  Fine-Tuning for Neural Machine Translation with Limited Degradation across In- and Out-of-Domain Data. Praveen Dakwale and Christof Monz
170  Exploiting Relative Frequencies for Data Selection. Thierry Etchegoyhen, Andoni Azpeitia and Eva Martinez Garcia
185  Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic. Alexander Erdmann, Nizar Habash, Dima Taji and Houda Bouamor
201  Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels. Gideon Maillette de Buy Wenniger, Khalil Sima'an and Andy Way
216  Development of a classifiers/quantifiers dictionary towards French-Japanese MT. Mutsuko Tomokiyo, Mathieu Mangeot and Christian Boitet
227  Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy. Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi and Mikio Yamamoto
241  Usefulness of MT output for comprehension — an analysis from the point of view of linguistic intercomprehension. Kenneth Jordan-Núñez, Mikel L. Forcada and Esteve Clua
254  Machine Translation as an Academic Writing Aid for Medical Practitioners. Carla Parra Escartín, Sharon O'Brien, Marie-Josée Goulet and Michel Simard
268  A Multilingual Parallel Corpus for Improving Machine Translation on Southeast Asian Languages. Hai Long Trieu and Le Minh Nguyen
282  Exploring Hypotheses Spaces in Neural Machine Translation. Frédéric Blain, Lucia Specia and Pranava Madhyastha
299  Confidence through Attention. Matīss Rikters and Mark Fishel
312  Disentangling ASR and MT Errors in Speech Translation. Ngoc-Tien Le, Benjamin Lecouteux and Laurent Besacier
324  Temporality as Seen through Translation: A Case Study on Texts. Sabyasachi Kamila, Sukanta Sen, Mohammed Hasanuzzaman, Asif Ekbal, Andy Way and Pushpak Bhattacharyya
337  A Neural Network Model in Low Resource Settings. Tan Le and Fatiha Sadat

Empirical Study of Dropout Scheme for Neural Machine Translation

Xiaolin Wang [email protected] Masao Utiyama [email protected] Eiichiro Sumita [email protected] Advanced Translation Research and Development Promotion Center, National Institute of Information and Communications Technology, Japan

Abstract

Dropout has lately been recognized as an effective method to relieve over-fitting when training deep neural networks. However, there has been little work studying the optimal dropout scheme for neural machine translation (NMT). NMT models usually contain attention mechanisms and multiple recurrent layers, so applying dropout becomes a non-trivial task. This paper approaches the problem empirically through experiments in which dropout was applied to different parts of the connections using different dropout rates. The work in this paper not only leads to an improvement over an established baseline, but also provides useful heuristics about using dropout effectively to train NMT models. These heuristics include which parts of the connections in NMT models have higher priority for dropout than others, and how to correctly enhance the effect of dropout for difficult translation tasks.

1 Introduction

Neural machine translation (NMT), a new technology that emerged from the field of deep learning, has improved the quality of automated machine translation to a significantly higher level than statistical machine translation (SMT) (Wu et al., 2016; Sennrich et al., 2017; Klein et al., 2017). State-of-the-art NMT models, like many other deep neural networks, typically contain multiple non-linear hidden layers. This makes them very expressive models, which is critical to successfully learning very complicated relationships between inputs and outputs (Devlin et al., 2014). However, as these large models have millions of parameters, they tend to over-fit during the training phase. Dropout has recently been recognized as a very effective method to relieve over-fitting. Dropout was first proposed for feed-forward neural networks by Hinton et al. (2012), and was then successfully applied to recurrent neural networks by Pham et al. (2014) and Zaremba et al. (2014). Dropout outperforms many traditional approaches to relieving over-fitting, including early stopping of training, weight penalties of various kinds such as L1 and L2 regularization, decaying the learning rate, and so on. Reported state-of-the-art NMT systems all adopt dropout during the training phase (Wu et al., 2016; Klein et al., 2017), but their papers have not provided many details about how dropout was applied. The optimal way to apply dropout to NMT models is non-trivial. Strictly speaking, NMT is an application of deep neural networks, so it can directly benefit from the advances of deep neural networks. However, two facts make NMT models stand out from normal deep neural networks. First, NMT models contain recurrent hidden layers in order to gain

the ability to operate on sequences. In contrast, deep neural networks in face recognition and phoneme recognition are feed-forward as they take separate inputs (Povey et al., 2011; Krizhevsky et al., 2012). Second, NMT models contain a novel attention mechanism to manage a memory of an input sequence, as this sequence can become quite long (Bahdanau et al., 2014). This attention mechanism involves calculations such as a weighted mean, which is rarely used in normal deep neural networks. As far as we know, there has been no focused study on how to apply dropout effectively to NMT models. This motivated the work presented in this paper, which aimed at finding the optimal dropout scheme for training NMT models. Because the architecture of NMT models is complicated, we took an empirical approach. We trained many NMT models, varying the subsets of connections that dropout was applied to and using different dropout rates for different connections. We tried to find the optimal dropout scheme by answering the following two questions:

• which parts of NMT models dropout should be applied to during the training phase;

• how to correctly set the dropout rates for training NMT models.

The remainder of this paper is structured as follows: Section 2 reviews related work on dropout, Section 3 describes the method that we adopted to search for the optimal dropout scheme, Section 4 presents the results of the experiments, and Section 5 concludes this paper with a description of our future work.

2 Related Work

Hinton et al. (2012) and Srivastava et al. (2014) proposed the method of dropout to relieve over-fitting when training feed-forward neural networks. They worked with a variety of feed-forward neural networks, each established for a certain task. Different dropout schemes were applied to these neural networks in order to obtain optimal results.

• MNIST is a standard toy data set of handwritten digits (LeCun et al., 2010). The best network had two layers of 8195 rectified linear units. Dropout was applied to the input units with a rate p = 0.2, and to the hidden units with p = 0.5.

• CIFAR-10 and CIFAR-100 are tiny natural images (Krizhevsky and Hinton, 2009); Street View House Numbers is a data set of images of house numbers collected by Google Street View (Netzer et al., 2011). The best network had three convolutional layers followed by two fully connected hidden layers. The output of the last fully-connected layer was fed to a softmax which produced a distribution over the class labels. All hidden units were rectified linear units. Each convolutional layer was followed by a max-pooling layer. Dropout was applied to all layers including the input layer with p = {0.10, 0.25, 0.25, 0.50, 0.50, 0.50}.

• ImageNet is a large collection of natural images (Deng et al., 2009). The best network had five convolutional layers followed by three fully connected hidden layers. The numbers of units in each layer were 253440, 186624, 64896, 64896, 43264, 4096, 4096, and 1000. The output of the last fully-connected layer was also fed to a softmax layer which produced a distribution over all class labels (Krizhevsky et al., 2012). Dropout was only applied to the first two fully connected hidden layers with p = 0.50. The reason might be that the network was very big. The authors claimed that dropout roughly doubled the number of iterations required to converge. It can be inferred that applying dropout to all layers might not be feasible because of the cost in training time.

• TIMIT is a standard speech benchmark for clean speech recognition (Garofolo et al., 1993). The best network had six layers. Dropout was applied to the input layer with p = 0.2 and the hidden layers with p = 0.5.

• Alternative Splicing is a data set of RNA features for predicting alternative gene splicing (Xiong et al., 2011). The best network had two layers of 1024 hidden units. Dropout was applied to the input layer with p = 0.2 and the hidden layers with p = 0.5.

Their experiments were limited to feed-forward networks, which are much simpler than NMT networks. However, their experimental results suggest two useful heuristics for achieving good performance from dropout. First, dropout needs to be applied to all layers of a network. Second, the optimal dropout rates for different types of layers, such as input layers and hidden layers, are different.

Zaremba et al. (2014) studied how to correctly apply dropout to recurrent neural networks such as long short-term memory (LSTM) units. They proposed that dropout should only be applied vertically, that is, only to non-recurrent connections. They argued that applying dropout to recurrent layers would amplify noise, as discussed by Bayer et al. (2013). They performed experiments on a variety of tasks, one of which was a machine translation task. Their NMT model contained no attention mechanism; it was essentially a recurrent language model trained on concatenations of source sentences and their translations (Sutskever et al., 2014). It had four layers of 1000 LSTM units, three embedding layers (source language input embedding, target language input and output embedding) and a softmax layer. Dropout was applied to the input-embedding-to-LSTM, LSTM-to-LSTM, and LSTM-to-output-embedding connections1 with p = 0.2. Their experiments were performed on a selected subset of the WMT 2014 English-to-French data set containing 340M French and 304M English words (Schwenk et al., 2011). Experimental results showed that dropout improved BLEU scores from 25.90 to 29.03, while still losing to the BLEU score of 33.30 achieved by LIUM, a phrase-based SMT system.

Wu et al. (2016) achieved a great improvement in translation quality through NMT when compared to their previous phrase-based production systems. They adopted a deep LSTM network that had eight encoder layers, eight decoder layers and an attention layer. They claimed that they adopted a dropout scheme similar to the method in Zaremba et al. (2014), but no further details were provided, especially on how dropout was applied to the attention layer.

Klein et al. (2017) released an NMT toolkit named OpenNMT. It implements a network architecture similar to the one proposed by Luong et al. (2015). OpenNMT outperformed the SMT system Moses (Koehn et al., 2007) and a few other NMT systems including GroundHog and Blocks in our pilot experiments. Therefore we took it as an important baseline in this paper. OpenNMT does not follow the dropout scheme of Zaremba et al. (2014) of applying dropout to all non-recurrent layers. Instead, OpenNMT applies dropout only to the non-top encoding and decoding LSTM layers and the output hidden states (see Section 3.2 for details). This dropout scheme seems arbitrary, but it performed quite well in experiments. Solving this puzzle is one of the main motivations of this paper.
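To make the vertical scheme concrete, the sketch below is our own illustration, not code from any of the cited papers or toolkits: a stacked LSTM in which dropout masks sit only on the connections between layers at the same time step, while the recurrent state carried from one time step to the next is never masked. All module and variable names are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class VerticalDropoutLSTM(nn.Module):
    """Stacked LSTM with dropout applied only to non-recurrent (vertical)
    connections, in the spirit of Zaremba et al. (2014)."""
    def __init__(self, input_size, hidden_size, num_layers, p=0.2):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_size if i == 0 else hidden_size, hidden_size)
             for i in range(num_layers)])
        self.drop = nn.Dropout(p)  # used only on vertical connections

    def forward(self, inputs):  # inputs: (seq_len, batch, input_size)
        batch = inputs.size(1)
        states = [(inputs.new_zeros(batch, cell.hidden_size),
                   inputs.new_zeros(batch, cell.hidden_size)) for cell in self.cells]
        outputs = []
        for x_t in inputs:                        # iterate over time steps
            layer_in = self.drop(x_t)             # input-embedding-to-LSTM connection
            for i, cell in enumerate(self.cells):
                h, c = cell(layer_in, states[i])  # recurrent state: no dropout
                states[i] = (h, c)
                layer_in = self.drop(h)           # LSTM-to-LSTM / LSTM-to-output connection
            outputs.append(layer_in)
        return torch.stack(outputs), states
```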

3 Methods

This section first presents the architecture of the NMT model that we adopted, which is a state-of-the-art attention-based encoder-decoder NMT model. It then analyzes that architecture and provides a list of connections in the architecture that are appropriate for applying dropout to. Finally, it describes the method that we adopted to search for the optimal dropout scheme for training NMT models.

1 According to the source code at https://github.com/wojzaremba/lstm.

3.1 Neural Machine Translation

The essence of a machine translation system is modeling the conditional probability of a translation given a source sentence. In encoder-decoder NMT models, this can be formalized using the chain rule as

\log p(y|x) = \sum_{j=1}^{m} \log p(y_j \mid y_1^{j-1}, s)    (1)

where x = (x_1, \ldots, x_n, \langle EOS \rangle) is a source sentence and \langle EOS \rangle is an end-of-sentence token; y = (y_1, \ldots, y_m) is a translation; and s is a representation of x produced by the encoder. Note that x is sometimes reversed in order to help the decoder to translate from the beginning of a source sentence. In this paper, we adopted a compact stacking recurrent architecture as the encoder-decoder (illustrated by Figure 1), which was proposed by Luong et al. (2015). This architecture assumes that Equation 1 is factorized into a calculable form as

\log p(y|x) = \sum_{j=1}^{m} \log p(y_j \mid H_o^{\langle j \rangle})
            = \sum_{j=1}^{m} \log\bigl(\mathrm{softmax}_{y_j}(\tanh(W_o H_o^{\langle j \rangle} + B_o))\bigr)    (2)

H_o^{\langle j \rangle} = F_{att}(H_s, H_t^{\langle j \rangle}),    (3)

where H_s is a source-side hidden state produced by the top recurrent layer of the encoder; H_t is a target-side hidden state produced by the top recurrent layer of the decoder; H_o is an output hidden state produced by the attention model F_{att}; the superscript \langle j \rangle is a target-side timestamp; W_o and B_o are the matrix and bias of the output embedding; and \mathrm{softmax}_{y_j} means selecting the dimension of the softmax output which corresponds to y_j in one-hot encoding.

We adopted a global attention model (Luong et al., 2015) as F_{att} (illustrated by Figure 2). This attention model first calculates an alignment weight as

a_{st}^{\langle ij \rangle} = \mathrm{softmax}\bigl(F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle})\bigr)
               = \frac{e^{F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle})}}{\sum_{i=1}^{n} e^{F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle})}},    (4)

F_a(H_s^{\langle i \rangle}, H_t^{\langle j \rangle}) = H_s^{\langle i \rangle \top} W_a H_t^{\langle j \rangle},    (5)

where F_a is a scoring function for alignment, which is composed of a linear mapping and a dot product, and W_a is a matrix for linearly mapping target-side hidden states into a space comparable to the source side. Then the attention model calculates translation contexts as

C_s^{\langle j \rangle} = \sum_{i=1}^{n} a_{st}^{\langle ij \rangle} H_s^{\langle i \rangle},    (6)

C_{st}^{\langle j \rangle} = [C_s^{\langle j \rangle}; H_t^{\langle j \rangle}],    (7)

where C_s^{\langle j \rangle} is a source-side context, and C_{st}^{\langle j \rangle} is a context derived from both the source and target sides through concatenation. In the end, the attention model calculates an output hidden state as

H_o^{\langle j \rangle} = W_c C_{st}^{\langle j \rangle},    (8)

where Wc is a matrix for linearly mapping a context into an output hidden state.
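To make the computation in Equations (2)-(8) concrete, the sketch below is our own rendering for a single target time step and a single sentence (no batching). It is not the implementation used in the experiments, and all shapes and module names are assumptions.

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """One target time step of the global attention model F_att (Eqs. 4-8)
    followed by the output projection of Eq. 2."""
    def __init__(self, d, vocab_size):
        super().__init__()
        self.W_a = nn.Linear(d, d, bias=False)       # W_a in Eq. (5)
        self.W_c = nn.Linear(2 * d, d, bias=False)   # W_c in Eq. (8)
        self.W_o = nn.Linear(d, vocab_size)          # W_o, B_o in Eq. (2)

    def forward(self, H_s, H_t_j):                   # H_s: (n, d), H_t_j: (d,)
        scores = H_s @ self.W_a(H_t_j)               # F_a(H_s^<i>, H_t^<j>), Eq. (5)
        a = torch.softmax(scores, dim=0)             # alignment weights a_st^<ij>, Eq. (4)
        C_s = a @ H_s                                # source-side context C_s^<j>, Eq. (6)
        C_st = torch.cat([C_s, H_t_j], dim=-1)       # concatenated context C_st^<j>, Eq. (7)
        H_o = self.W_c(C_st)                         # output hidden state H_o^<j>, Eq. (8)
        return torch.softmax(torch.tanh(self.W_o(H_o)), dim=-1)  # p(y_j | .), Eq. (2)
```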

Figure 1: Network Architecture of Neural Machine Translation

Figure 2: Illustration of Attention Model

3.2 Appropriate Connections for Dropout

Dropout should be applied vertically according to the previous study by Zaremba et al. (2014). This means that dropout should only be applied to non-recurrent connections. Therefore, there are nine connections in the NMT model that are appropriate for applying dropout to. Figures 1 and 2 annotate the corresponding variables with red colored symbols. For the sake of convenience, this paper views each connection in the NMT model as a variable; applying dropout to a variable means applying dropout to the non-recurrent connections that transport that variable. The nine variables appropriate for dropout in the NMT model are as follows:

• Es input embedding of source-side;

• Rs hidden states of source-side non-top recurrent layers;

• Hs hidden states of source-side top recurrent layer;

• Et input embedding of target-side;

• Rt hidden states of target-side non-top recurrent layers;

• Ht hidden states of target-side top recurrent layer;

• Ho output hidden states;

• Cs translation context of source-side;

• Cst concatenation of the source- and target-side translation contexts.

The impacts of applying dropout to these nine variables are not independent. In particular, the four variables involved in the attention model, Hs, Cs, Ht and Cst, are closely related to each other. This is because the operators of weighted mean (Equation 6) and concatenation (Equation 7) preserve the effect of dropout; in other words, the dropout that is applied to their input propagates onto their output. Because of the complicated relations among these variables, finding the optimal way to apply dropout becomes a non-trivial task. For example, there are two choices if we want to make the NMT model robust to source representations. One is to apply dropout to Hs, which affects the calculation of the alignment weight ast, the weighted mean Cs, and the concatenated context Cst. The other is to apply dropout to Cs, which leaves the calculation of the alignment weight ast untouched. It is quite difficult to predict the end-to-end performance of these two choices.
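This propagation can be checked with a toy calculation. The snippet below is our own illustration and is deliberately simplified: it shares one inverted-dropout mask across all source positions (real per-position dropout would not zero dimensions this cleanly). It only shows that dimensions zeroed in Hs remain zero after the weighted mean of Equation 6, which is why an extra mask on Cs adds little.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 5, 8, 0.3
H_s = rng.normal(size=(n, d))              # source top-layer hidden states
mask = (rng.random(d) > p) / (1.0 - p)     # one inverted-dropout mask, shared across positions
H_s_dropped = H_s * mask                   # dropout applied to H_s
a = rng.random(n)
a /= a.sum()                               # some alignment weights (Eq. 4)
C_s = a @ H_s_dropped                      # weighted mean (Eq. 6)

print(np.flatnonzero(mask == 0))           # dimensions dropped in H_s ...
print(np.flatnonzero(C_s == 0))            # ... are exactly the zero dimensions of C_s
```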

3.3 Search for Optimal Dropout Scheme

We decomposed the task of searching for the optimal dropout scheme into two steps. The first step was to search for an optimal combination of variables for applying dropout to. The second step was to search for optimal dropout rates for each variable in the optimal combination. Note that training NMT models was very time consuming, so exploring the full search space was impossible. Therefore we sometimes terminated searches early if one result was particularly good. We were aware that pruning the search space might cause results not to be globally optimal. However, it made searching for the optimal dropout scheme a feasible task.

3.3.1 Search for Optimal Combination of Variables

As described above, there are nine variables in the NMT model which are appropriate for dropout. The established toolkit OpenNMT chooses to apply dropout to three of the nine variables, namely Rs, Rt and Ho (Klein et al., 2017). This decision seems quite arbitrary.

Therefore, it was meaningful to find out which combinations of these nine variables lead to good performance. We used a heuristic greedy search to find the optimal combination of variables, sketched below. We gradually increased the size of the combinations: we started by applying dropout to only one variable, then tried applying dropout to two variables, then to three variables, and continued in this way. In each stage we aimed to find the best combination of a given size, and we generally tried adding variables to the promising combinations from the previous stage.
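The sketch below is our paraphrase of this procedure, not the actual experiment driver; train_and_score is an assumed callable that trains one NMT model with dropout on the given set of variables and returns the development cross entropy (lower is better).

```python
def greedy_combination_search(variables, train_and_score, max_size=7, beam=2):
    """Grow dropout-variable combinations one variable at a time, expanding
    only the most promising combinations of the previous size."""
    frontier = [frozenset([v]) for v in variables]   # stage 1: single variables
    best_combo, best_score = None, float("inf")
    for _ in range(max_size):
        if not frontier:
            break
        scored = sorted(((train_and_score(c), c) for c in frontier), key=lambda t: t[0])
        if scored[0][0] < best_score:
            best_score, best_combo = scored[0]
        promising = [c for _, c in scored[:beam]]    # keep the best few combinations
        frontier = list({c | {v} for c in promising for v in variables if v not in c})
    return best_combo, best_score
```

With the nine variables of Section 3.2 and a small number of promising combinations kept per stage, this explores only a fraction of the 2^9 possible subsets, which matches the early-termination strategy described above.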

3.3.2 Search for Optimal Dropout Rates

Hinton et al. (2012) and Srivastava et al. (2014) showed that the optimal dropout rates for different layers of feed-forward networks are different. Because the nine variables appropriate for dropout in the NMT model play different roles, they may have different optimal dropout rates. Therefore, we explored applying different dropout rates to the variables in the optimal combination found in the first step. We used a grid search to find the optimal dropout rates. In each step, we tried increasing or decreasing the dropout rate of one variable by a fixed amount such as 0.05, and then chose the update which maximized the performance.
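This rate search can be written as a coordinate-wise hill climb. The sketch below is our formulation under the same assumption of a train_and_score callable, here taking a dict of per-variable rates and returning development cross entropy:

```python
def search_dropout_rates(variables, train_and_score, init=0.3, step=0.05, max_rounds=10):
    """Coordinate-wise grid search: in each round, try raising or lowering the
    rate of one variable by `step` and keep the single change that helps most."""
    rates = {v: init for v in variables}
    best = train_and_score(rates)
    for _ in range(max_rounds):
        candidates = []
        for v in variables:
            for delta in (+step, -step):
                trial = dict(rates)
                trial[v] = round(trial[v] + delta, 2)
                if 0.0 <= trial[v] <= 0.9:
                    candidates.append((train_and_score(trial), trial))
        score, trial = min(candidates, key=lambda t: t[0])
        if score >= best:      # no single-variable change improves the dev score
            break
        best, rates = score, trial
    return rates, best
```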

4 Experiments

This section first describes our experimental settings, then presents the results of searching for the optimal combination of variables to apply dropout to, followed by the results of searching for optimal dropout rates; finally, it compares our optimal dropout schemes with the baselines.

4.1 Experimental Settings

Two corpora were used in our experiments (see Table 1). The first corpus was from the shared task of the NIST Open Machine Translation 2006 Evaluation (OpenMT Chinese-to-English)2. We first removed the UN and traditional Chinese data sets from the NIST 2006 constrained training resources. Then we performed word segmentation on the Chinese text using the Stanford word segmenter (Tseng et al., 2005), and tokenization on the English text using the scripts provided in (Koehn, 2005). The data sets of NIST Eval 2004 and 2005 were used as a development set. The data set of NIST Eval 2016 was used as a test set.

The second corpus was the Basic Travel Expression Corpus (BTEC) (Takezawa et al., 2002). We used an in-house English-to-Japanese corpus which contains about 463k sentences. We randomly selected 2000 sentences as a development set and another 2000 sentences as a test set; the remaining sentences were used as a training set. The English text was also tokenized using the scripts provided in (Koehn, 2005), and the Japanese text was segmented into words using the MeCab toolkit (Kudo, 2005).

A wordpiece model was adopted to deal with words for NMT (Wu et al., 2016). This approach breaks all words, especially rare ones, into subword units that are likely to occur more often in a training corpus; these rare words therefore become translatable by NMT models. Byte Pair Encoding (BPE) was adopted to train the segmentation models (Gage, 1994). This method allows the representation of an open vocabulary through a fixed-size vocabulary of variable-length character sequences, making it very suitable for closed-vocabulary systems like NMT (Sennrich et al., 2015). In this paper, we adopted a vocabulary size of 16k according to our pilot experiments.

2http://www.itl.nist.gov/iad/mig//tests/mt/2006/
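As a reminder of how BPE builds such a subword vocabulary, the following minimal sketch is ours, a simplified rendering of the procedure of Gage (1994) and Sennrich et al. (2015), not the segmentation code used in these experiments. It repeatedly merges the most frequent adjacent symbol pair:

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """word_freqs: dict mapping words to corpus frequencies.
    Returns the ordered list of learned merges."""
    # Represent every word as a tuple of symbols, initially single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        merged_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):               # replace occurrences of the best pair
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_vocab[tuple(out)] = freq
        vocab = merged_vocab
    return merges

# e.g. learn_bpe({"lower": 5, "low": 7, "newest": 3, "widest": 2}, num_merges=10)
```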

Data set       #Sentences    #Words (source)   #Words (target)       Vocabulary (source)   Vocabulary (target)
Corpus: OpenMT Chinese-to-English
Training       442,967       12,265,072        13,444,927            178,832               130,249
Development    2,679         72,869            87,369† (346,231‡)    NA                    NA
Test           1,664         37,827            46,207† (193,214‡)    NA                    NA
Corpus: BTEC English-to-Japanese
Training       458,894       3,664,481         4,193,101             27,757                36,308
Development    2,000         16,148            18,451                NA                    NA
Test           1,664         15,866            18,048                NA                    NA

Table 1: Experimental Corpora. † the first reference; ‡ all four references in total.

The phrase-based SMT toolkit Moses was adopted as a baseline (Koehn et al., 2007). The Moses models were trained with conventional settings: the SRILM toolkit was used to train 5-gram language models on the target languages (Stolcke, 2002), GIZA++ was used to perform word alignment, the training scripts provided by Moses were employed to build the translation models, and the systems were tuned with MERT on the development sets (Och, 2003).

The NMT toolkit OpenNMT was adopted as another baseline. OpenNMT outperformed Moses and a few other NMT systems including GroundHog and Blocks in our pilot experiments, so we took it as a baseline.

Different dropout schemes were tested using our C++ implementation of NMT, named CytonMT. The implementation uses NVIDIA's native libraries, including CUDA, cuBLAS and cuDNN, to gain efficiency on NVIDIA GPUs. CytonMT adopts the network architecture proposed by Luong et al. (2015), which is similar to the one implemented by OpenNMT.

NMT models were trained with settings similar to Luong et al. (2015). The stacking LSTM models had four layers of 1024 cells and 1024-dimensional embeddings. The parameters of the neural networks were initialized in [-0.1, 0.1] and trained with the stochastic gradient descent algorithm. The gradient was re-scaled when its norm exceeded 5. A simple adaptive learning rate schedule was employed to ensure that models with heavy dropout were fully trained: the training started with a learning rate of 1; if the perplexity on the development set did not decrease after an epoch, the learning rate started to decay by 0.5 per epoch; after that, if the perplexity did not decrease in two consecutive epochs, the training phase was terminated. The maximum number of epochs was unlimited, while training usually finished around 20 epochs.

Translation performance of the different methods was measured by BLEU. BLEU was calculated on the lower-cased English words in the OpenMT Chinese-to-English task, and on the Japanese characters in the BTEC English-to-Japanese task.
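Our reading of that learning-rate schedule can be summarized in a few lines. This is a sketch, not the training code itself; run_epoch and dev_perplexity are assumed callables supplied by the trainer.

```python
def train_with_adaptive_lr(run_epoch, dev_perplexity, max_epochs=100):
    """run_epoch(lr) performs one SGD pass over the training data;
    dev_perplexity() evaluates the current model on the development set."""
    lr, best, bad, decaying = 1.0, float("inf"), 0, False
    for _ in range(max_epochs):
        run_epoch(lr)
        ppl = dev_perplexity()
        if ppl < best:
            best, bad = ppl, 0
        else:
            bad += 1
            decaying = True
            if bad >= 2:          # two consecutive epochs without improvement: stop
                break
        if decaying:
            lr *= 0.5             # decay by 0.5 per epoch once triggered
    return best
```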

4.2 Results of Searching for the Optimal Combination of Variables

A group of experiments was performed following the method described in Section 3.3.1, and Table 2 presents the results. The main experiments were performed on the OpenMT corpus, and the BTEC corpus was used to confirm the findings. The experiments were categorized into seven stages (separated by horizontal lines in the table), as the number of variables in the combination was gradually increased. In each stage, we aimed to find the best combination of a given size (shown in bold in the table). We

generally tried adding variables to the promising combinations from the previous stage. Because training NMT models is time consuming, we terminated a stage early if the best combination was clear.

Two observations can be made from these experimental results. First, the two special variables Cs and Cst are not suitable for dropout. They are called special because they are the outputs of the weighted mean and the concatenation, so they inherit the effect of dropout from their inputs. Experiments 21, 22, 29 and 30 show that applying dropout to Cs or Cst leads to poorer performance than applying dropout to the upstream variables Hs or Ht. Second, among the seven remaining variables, the priority of applying dropout to each variable can be formulated as a chain,

Rt ≻ Rs ≻ Ho ≻ Es ≻ Hs ⪰ Ht ⪰ Et    (9)

where ≻ means contextually superior and ⪰ means contextually superior or equal, with respect to dropout. x1 ≻ ... ≻ xk ≻ xk+1 means that, given the context that (x1, ..., xk−1) have already had dropout applied to them, applying dropout to xk is superior to applying dropout to xk+1; in other words, applying dropout to (x1, ..., xk−1, xk) outperforms applying dropout to (x1, ..., xk−1, xk+1). The priority chain of Equation 9 confirms that OpenNMT's dropout scheme is an effective one, because Rt, Rs and Ho are the three top variables. Besides, the chain also suggests two other effective dropout schemes. The optimal-1 scheme applies dropout to Rt, Rs, Ho and Es; Experiment 17 shows that the cross entropy on the development set decreases when Es is added to OpenNMT's dropout scheme. The optimal-2 scheme applies dropout to all seven remaining variables; Experiments 23-27 show that adding any one or two of Ht, Hs and Et to the optimal-1 brings little improvement, but adding all three variables reduces the cross entropy.

4.3 Results of Searching for Optimal Dropout Rates

In this subsection, we aimed to refine the optimal dropout schemes found in the last subsection by using different dropout rates for each variable, applying the grid search method described in Section 3.3.2. Tables 3 and 4 present the results of refining the optimal-2 on the OpenMT corpus and the optimal-1 on the BTEC corpus, respectively. Unexpectedly, the experimental results on both corpora show that no change to the dropout rates improved the performance. Therefore, p = 0.3 is an optimal dropout rate for training the NMT model that we adopt.

4.4 Comparison with Baselines

In this section, we compared the optimal dropout schemes found in our study with three baselines. The first was Moses, one of the state-of-the-art phrase-based SMT systems. The second was the NMT toolkit OpenNMT. The third was our implementation, CytonMT, using OpenNMT's dropout scheme. Among these three baselines, the third was the most accurate. Table 5 presents the results of the experiments.

Three observations can be made from the experimental results. First, on the OpenMT corpus, both the optimal-1 and the optimal-2 outperform the baselines (Experiments 1-5), and the optimal-2 outperforms the optimal-1. This validates the effectiveness of the optimal-1 and optimal-2 schemes. Second, on the BTEC corpus, the performances of the three dropout schemes are close. The different behaviors on the two corpora may be caused by the fact that the OpenMT corpus is more difficult than the BTEC corpus with respect to translation, as its sentences are longer and its vocabulary is larger. From the baseline to the optimal-1 and the optimal-2, the strength of dropout gradually increases since dropout is applied to more and more variables.

No.   Apply Dropout to     Cross Entropy (Training)   Cross Entropy (Development)
Corpus: OpenMT Chinese-to-English
1     Es                   1.52                       2.65
2     Et                   2.16                       3.46
3     Rs                   2.20                       3.61
4     Rt                   1.59                       2.58
5     Hs                   1.66                       2.85
6     Ht                   1.63                       2.62
7     Ho                   2.33                       3.51
8     Cs                   2.24                       3.56
9     Cst                  1.74                       2.70
10    RtEs                 1.57                       2.53
11    RtRs                 1.53                       2.46
12    RtHt                 1.72                       2.53
13    RtHo                 1.53                       2.55
14    RtRsEs               1.49                       2.50
15    EtRsHt               1.74                       2.53
16    RtRsHo ♮             1.50                       2.41
17    RtRsHoEs †           1.59                       2.36
18    RtRsHoEt             1.70                       2.40
19    RtRsHoHs             1.70                       2.37
20    RtRsHoHt             1.68                       2.40
21    RtRsHoCs             1.74                       2.45
22    RtRsHoCst            1.70                       2.43
23    RtRsHoEsEt           1.59                       2.37
24    RtRsHoEsHs           1.65                       2.36
25    RtRsHoEsHt           1.70                       2.36
26    RtRsHoEsHtEt         1.72                       2.37
27    RtRsHoEsHtHs         1.79                       2.36
28    RtRsHoEsHtHsEt ‡     1.79                       2.33
29    RtRsHoEsEtHtCs       1.845                      2.39
30    RtRsHoEsEtCst        1.834                      2.41
Corpus: BTEC English-to-Japanese
31    RtRsHoEs †           0.81                       1.12
32    RtRsHoEsHtHsEt ‡     0.83                       1.11

Table 2: Results of applying dropout to different combinations of variables. The dropout rate is p = 0.3. ♮ dropout scheme of the OpenNMT toolkit; † the optimal-1; ‡ the optimal-2.

No.   Dropout Rate (Rt / Rs / Ho / Es / Ht / Hs / Et)    Cross Entropy (Training)   Cross Entropy (Development)
Corpus: OpenMT Chinese-to-English
Ref.  0.30  0.30  0.30  0.30  0.30  0.30  0.30           1.79                       2.33
1     0.35  0.30  0.30  0.30  0.30  0.30  0.30           1.79                       2.38
2     0.25  0.30  0.30  0.30  0.30  0.30  0.30           1.96                       2.43
3     0.30  0.35  0.30  0.30  0.30  0.30  0.30           1.85                       2.36
4     0.30  0.25  0.30  0.30  0.30  0.30  0.30           1.77                       2.38
5     0.30  0.30  0.35  0.30  0.30  0.30  0.30           1.73                       2.35
6     0.30  0.30  0.25  0.30  0.30  0.30  0.30           1.77                       2.35
7     0.30  0.30  0.30  0.35  0.30  0.30  0.30           1.80                       2.37
8     0.30  0.30  0.30  0.25  0.30  0.30  0.30           1.74                       2.36
9     0.30  0.30  0.30  0.30  0.35  0.30  0.30           1.81                       2.34
10    0.30  0.30  0.30  0.30  0.25  0.30  0.30           1.79                       2.34
11    0.30  0.30  0.30  0.30  0.30  0.35  0.30           1.81                       2.37
12    0.30  0.30  0.30  0.30  0.30  0.25  0.30           1.78                       2.34
13    0.30  0.30  0.30  0.30  0.30  0.30  0.35           1.89                       2.41
14    0.30  0.30  0.30  0.30  0.30  0.30  0.25           1.81                       2.37

Table 3: Results of Using Different Dropout Rates on the optimal-2.

No.   Dropout Rate (Rt / Rs / Ho / Es)    Cross Entropy (Training)   Cross Entropy (Development)
Corpus: BTEC English-to-Japanese
Ref.  0.30  0.30  0.30  0.30              0.81                       1.12
9     0.35  0.30  0.30  0.30              0.73                       1.13
10    0.25  0.30  0.30  0.30              0.73                       1.13
11    0.30  0.35  0.30  0.30              0.82                       1.12
12    0.30  0.25  0.30  0.30              0.77                       1.12
13    0.30  0.30  0.35  0.30              0.73                       1.13
14    0.30  0.30  0.25  0.30              0.76                       1.12
15    0.30  0.30  0.30  0.35              0.80                       1.13
16    0.30  0.30  0.30  0.25              0.79                       1.12

Table 4: Results of Using Different Dropout Rates on the optimal-1.

No.   System     Dropout Scheme   Cross Entropy (Train.)   Cross Entropy (Dev.)   BLEU (Dev.)   BLEU (Test)
Corpus: OpenMT Chinese-to-English
1     Moses      NA               NA                       NA                     32.12         31.11
2     OpenNMT    baseline ♮       1.62                     2.42                   39.96         39.11
3     CytonMT    baseline ♮       1.50                     2.41                   40.07         39.21
4     CytonMT    optimal-1 †      1.59                     2.36                   40.39         39.38
5     CytonMT    optimal-2 ‡      1.79                     2.33                   40.35         39.89
Corpus: BTEC English-to-Japanese
6     Moses      NA               NA                       NA                     52.09         50.77
7     OpenNMT    baseline ♮       0.63                     1.15                   52.35         52.38
8     CytonMT    baseline ♮       0.81                     1.12                   52.49         52.46
9     CytonMT    optimal-1 †      0.81                     1.12                   52.58         52.44
10    CytonMT    optimal-2 ‡      0.83                     1.11                   52.63         52.33

Table 5: Comparison with baseline methods. ♮ baseline: OpenNMT's scheme, applying dropout to Rt, Rs and Ho. † optimal-1: applying dropout to Rt, Rs, Ho and Es. ‡ optimal-2: applying dropout to all of Rt, Rs, Ho, Es, Ht, Et and Hs, excluding Cs and Cst. The dropout rate is fixed at p = 0.3.

Therefore, for easy translation tasks the baseline or the optimal-1 is sufficient, while for difficult tasks the optimal-2 is recommended. Third, all the NMT systems trained with dropout clearly outperformed the SMT system. This indicates that the baseline dropout scheme is effective, and also confirms the description in Section 1 of this paper.

5 Conclusion

In this paper, we performed an empirical study on dropout schemes for training NMT models. We started the study by analyzing the architecture of NMT models and identifying the variables appropriate for applying dropout to. We then ran two groups of experiments to find the optimal combination of these variables and the optimal dropout rate. The two main questions raised in the introduction can be answered through our study. The first question is which parts of NMT models dropout should be applied to. The priority of the variables in NMT models is

Rt ≻ Rs ≻ Ho ≻ Es ≻ Hs ⪰ Ht ⪰ Et.

This chain suggests several effective dropout schemes, including OpenNMT's scheme, the optimal-1, and the optimal-2 (Section 4.2). Note that a heavy dropout scheme will increase the required number of epochs in the training phase. If a translation task is difficult and training time is sufficient, the optimal-2 is recommended (Section 4.4). We empirically find that training NMT models with OpenNMT's dropout scheme usually converges within 13 epochs3, which is quite efficient, while using the optimal-2 usually requires around 20 epochs. The second question is how to correctly set the dropout rate for training NMT models. We found that the dropout rate p = 0.3 is optimal for the NMT model that we adopt (Section 4.3). In the future, we plan to explore two topics related to dropout for NMT: max-norm regularization (Srivastava et al., 2014) and DropConnect (Wan et al., 2013).

3 The default setting of the OpenNMT toolkit.

References

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations.

Bayer, J., Osendorfer, C., Korhammer, D., Chen, N., Urban, S., and van der Smagt, P. (2013). On fast dropout and its applicability to recurrent networks. arXiv preprint arXiv:1311.0701.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE.

Devlin, J., Zbib, R., Huang, Z., Lamar, T., Schwartz, R. M., and Makhoul, J. (2014). Fast and robust neural network joint models for statistical machine translation. In ACL (1), pages 1370–1380. Citeseer.

Gage, P. (1994). A new algorithm for data compression. The C Users Journal, 12(2):23–38.

Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., and Pallett, D. S. (1993). DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon technical report n, 93.

Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.

Klein, G., Kim, Y., Deng, Y., Senellart, J., and Rush, A. M. (2017). OpenNMT: Open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810.

Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of MT Summit, volume 5, pages 79–86.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics on Interactive Poster and Demonstration Sessions, pages 177–180. Association for Computational Linguistics.

Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105.

Kudo, T. (2005). MeCab: Yet another part-of-speech and morphological analyzer. http://mecab.sourceforge.net/.

LeCun, Y., Cortes, C., and Burges, C. J. (2010). MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2.

Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 5.

Och, F. J. (2003). Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pages 160–167. Association for Computational Linguistics.

Pham, V., Bluche, T., Kermorvant, C., and Louradour, J. (2014). Dropout improves recurrent neural networks for handwriting recognition. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pages 285–290. IEEE.

Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., and Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE Workshop on Automatic Speech Recognition and Understanding.

Schwenk, H., Lambert, P., Barrault, L., Servan, C., Afli, H., Abdul-Rauf, S., and Shah, K. (2011). LIUM’s SMT machine translation systems for WMT 2011. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 464–469. Association for Computational Linguistics.

Sennrich, R., Firat, O., Cho, K., Birch, A., Haddow, B., Hitschler, J., Junczys-Dowmunt, M., Läubli, S., Miceli Barone, A. V., Mokry, J., and Nadejde, M. (2017). Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2015). Neural machine translation of rare words with subword units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.

Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In Proceedings of the 7th Interna- tional Conference on Spoken Language Processing.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112.

Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., and Yamamoto, S. (2002). Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world. In LREC, pages 147– 152.

Tseng, H., Chang, P., Andrew, G., Jurafsky, D., and Manning, C. (2005). A conditional random field word segmenter. In Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, volume 171. Jeju Island, Korea.

Wan, L., Zeiler, M., Zhang, S., Cun, Y. L., and Fergus, R. (2013). Regularization of neural networks using dropconnect. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1058–1066.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

Xiong, H. Y., Barash, Y., and Frey, B. J. (2011). Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Bioinformatics, 27(18):2554–2562.

Zaremba, W., Sutskever, I., and Vinyals, O. (2014). Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.

A Target Attention Model for Neural Machine Translation

Hideya Mino †‡ [email protected] Andrew Finch † [email protected] Eiichiro Sumita † [email protected] † National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, JAPAN ‡ Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, JAPAN

Abstract

Neural Machine Translation (NMT) with an attention mechanism has shown promising results by utilizing word alignments between the source and target sentences. Typically, training of NMT proceeds token-by-token on the target side, where each token is predicted using only a vector representing the current hidden state and the previous token. However, this strategy has serious shortcomings originating from the lack of information about the partial target sequence hypothesis; specifically, it can lead to source tokens being translated multiple times or remaining untranslated. To alleviate this problem, we introduce a target-side attention mechanism to exploit the generated target sequence of tokens more effectively. We calculate a target-side context vector using a recurrent neural network and feed it to an attention mechanism so that the decoder can pay more or less attention to each token in the partially generated target sequence when predicting the next target token. Experiments on three different English-to-Japanese translation tasks show improvements of 0.6-1.5 BLEU points.

1 Introduction

Recently, Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014) has been growing in popularity due to its capacity to model the translation process end-to-end within a single probabilistic model, and its potential for higher performance compared to existing phrase-based statistical machine translation (SMT) (Koehn, 2004). There are some unique features of NMT models which pose significant challenges for machine translation. One is that NMT systems exploit Long Short-Term Memory (LSTM) units (Hochreiter and Schmidhuber, 1997) (or the similar Gated Recurrent Units (GRUs) (Cho et al., 2014)), which allow the systems to capture long-distance dependencies better than vanilla RNNs. Another is the attention mechanism, whereby the decoder can attend directly to localized information from the source sequence of tokens when generating the target sequence (Bahdanau et al., 2015; Luong et al., 2015). NMT systems are generally trained to maximize the likelihood of generating the target sequence of tokens given the source sequence. In practice, each target token is generated conditioned on the vector representing the current hidden state of the model and the previously generated target token. NMT, however, has a serious drawback in that some input tokens are unnecessarily translated or mistakenly left untranslated (Tu et al., 2016). Our hypothesis is that this is mainly

because the hidden state of the LSTM decoder does not sufficiently represent all the information concerning the generated target sequence of tokens. Our work therefore endeavors to alleviate this drawback by explicitly handing a summary of the target sequence generated so far to the decoder at each step in the decoding process. Although an LSTM is able to provide the function of a long-term memory, the prediction of target tokens in a state-of-the-art NMT model (Bahdanau et al., 2015) heavily depends on two factors: the source-side context vectors with focus provided by an attention model, and a target language model implicitly learned by the LSTM decoder. This NMT model fails to exploit the generated target-side information, which is useful for avoiding over- and under-translation problems. If target words translated in the past are accumulated appropriately in the LSTM decoder, they are less likely to be translated again, and new target words which have not yet been translated are more likely to be generated. Because the information in the sequence of previously generated target tokens is ignored, unnecessarily translated words and mistakenly untranslated words are produced. To alleviate the lack of target-side information in the LSTM decoder, we propose to add a target-side context vector directly into the NMT model. The target-side context vector is generated with an attention mechanism, which selects the relevant target tokens for predicting the next target token. We show empirically that the addition of this target-side context vector significantly improves the performance of an NMT system on three different English-to-Japanese translation tasks.

2 Related Work

There is much recent work on augmenting attention-based NMT systems with additional features. One focus is the use of monolingual data (Sennrich et al., 2016; Gülçehre et al., 2015). Gülçehre et al. (2015) incorporated a large language model into an attention-based NMT system to allow the effective use of target-side monolingual data. Another focus is designing better decoding strategies (Luong et al., 2015; Tu et al., 2016; Mi et al., 2016; Liu et al., 2016; Tu et al., 2017). Tu et al. (2017) proposed to augment a direct model's decoding objective with a reverse translation model. Liu et al. (2016) proposed translating in both a left-to-right and a right-to-left direction and seeking a consensus. Tu et al. (2016) introduced a coverage vector to keep track of the attention history, which encourages the attention-based NMT system not to translate source words multiple times (i.e., avoiding over-translation) and to translate more untranslated source words (i.e., avoiding under-translation). Mi et al. (2016) also dealt with the coverage problem. We also tackle the over- and under-translation problems. Our approach differs from those of Tu et al. (2016) and Mi et al. (2016) in that they utilize only source-side attention history, whereas our approach also exploits the sequence of target tokens generated.

3 Neural Machine Translation with a Source Attention Model

Our method is based on NMT with attention (Bahdanau et al., 2015), which generates the target sentence y = (y_1, ..., y_M) from the source sentence x = (x_1, ..., x_N) of length N, as illustrated in Figure 1 (note: we use bold script to denote sequences hereafter). The attention-based model consists of two components, an encoder and a decoder. The encoder reads the source sentence x and encodes it into hidden states h = (h_1, ..., h_N). The hidden states are produced using a bidirectional RNN, which concatenates forward and backward sequences, as

h_j = \begin{bmatrix} \overrightarrow{h}_j \\ \overleftarrow{h}_j \end{bmatrix}    (1)

where

\overrightarrow{h}_j = e_1(x_j, \overrightarrow{h}_{j-1}), \qquad \overleftarrow{h}_j = e_2(x_j, \overleftarrow{h}_{j+1}).    (2)

Figure 1: Encoder-decoder NMT architecture with source attention

e1 and e2 are nonlinear functions. Bahdanau et al. (2015) used a GRU (Cho et al., 2014) for e1 and e2. Each hidden state, represented as a single vector, includes not only the lexical information at its source position, but also information about the unbounded length of the left and right context. Then, the decoder predicts the target sentence y using a conditional probability calculated as

p(yi|y1,i−1, x) = f1(yi−1, si, ci) (3)

where y1,i−1 is a partial translation (y1, ..., yi−1), f1 is implemented as a feedforward neural network with a softmax output layer, si is a hidden state of the RNN, and ci is a context vector derived from the source sentence. The hidden state si of the target RNN is computed by

si = g1(si−1, yi−1, ci) (4)

where g1 is a nonlinear function analogous to e1 or e2. The context vector ci is computed as a convex sum of the hidden states hj of Equation (1):

c_i = \sum_{j=1}^{N} \alpha_{i,j} h_j    (5)

where αi,j is a scalar weight of each hidden state hj computed by

\alpha_{i,j} = \frac{\exp\{a(s_{i-1}, h_j)\}}{\sum_{k=1}^{N} \exp\{a(s_{i-1}, h_k)\}}    (6)

where a is a feedforward neural network with a single hidden layer. The attention mechanism is driven by this α_{i,j}, which shows how well the input context at the j-th word and the output word at the i-th position match.

Figure 2: The proposed encoder-decoder NMT architecture with both source and target attention

The objective is to jointly maximize the conditional probability for each generated target word as

\theta^{*} = \arg\max_{\theta} \sum_{k=1}^{K} \sum_{i=1}^{M_k} \log p(y_i^k \mid y_{1,i-1}^k, x^k, \theta)    (7)

where (x^k, y^k) is the k-th training pair of sentences, and M_k is the length of the k-th target sentence y^k.

4 Adding a Target Attention Model

Attention-based NMT usually uses an LSTM for decoding from an encoded source sentence as a whole and a single previous target token, as in Equation (4). Intuitively, the encoded source sentence and the generated sequence of target tokens are both indispensable for predicting the next target token. Although LSTMs have been shown to be capable of predicting the next token in a sequence given a compressed representation of the preceding sequence, this process becomes considerably more difficult when compressing long sequences (Liu et al., 2016). To strengthen the information provided by the generated target sequence of tokens, our model adds a target-side context vector to the input of the LSTM decoder at each decoding step, as shown in Figure 2. In this model, a representation of the generated target sequence is explicitly made available to the decoder at each step instead of implicitly relying on the LSTM to maintain it.

In addition, the semantics of each token of the generated target sequence depend on its context. The LSTM model produces a vector that contains compressed information representing an unfocused summary of the whole generated target sequence. In order to allow the model to focus on salient contexts, we use a mechanism for focusing on the relevant parts of the already-generated target sequence when generating the current target token, along with a bidirectional layer to provide the model with a good representation of the target.

The proposed method is implemented as a target-side attention model constructed analogously to the source-side attention model, where the attention ranges over the partially generated target token sequence. More formally, the partial translation y_{1,i-1} is encoded into a sequence of hidden states t_{1,i-1}, which are produced using a bidirectional RNN, as

t_k = \begin{bmatrix} \overrightarrow{t}_k \\ \overleftarrow{t}_k \end{bmatrix} \qquad (1 \le k \le i-1)    (8)

where

\overrightarrow{t}_i = e_3(y_i, \overrightarrow{t}_{i-1}), \qquad \overleftarrow{t}_i = e_4(y_i, \overleftarrow{t}_{i+1}).    (9)

e3 and e4 are nonlinear functions as in Equation (2). Then, the decoder predicts the target sentence with a conditional probability as

p(yi|y1,i−1, x) = f2(yi−1, si, ci, di) (10)

where f2 is a probability estimator as in Equation (3) and newly introduced di is a predicted target-side context vector. The computation of the hidden state si is also modified as

si = g2(si−1, yi−1, ci, di) (11)

where g2 is a nonlinear function as in Equation (4). The context vector di is computed as a convex sum of the hidden states t1,i−1:

$$d_i = \sum_{k=1}^{i-1} \beta_{i,k} t_k \qquad (12)$$

where βi,k is also a scalar weight of each hidden state tk as below:

$$\beta_{i,k} = \frac{\exp\{b(s_{i-1}, t_k)\}}{\sum_{k'=1}^{i-1} \exp\{b(s_{i-1}, t_{k'})\}} \qquad (13)$$

where b is a feedforward neural network analogous to a in Equation (6). βi,k gives a normalized score for each previous target token, measuring how relevant the k-th target word is to the prediction of the i-th target token. The objective is again to jointly maximize the likelihood as in Equation (7).

Typically, the previous target token yi−1 used by the LSTM decoder is the true previous token during training, and a predicted previous token during decoding. In our experiments, we follow this practice, although there is evidence that using predictions during training would be beneficial (Bengio et al., 2015). Since our approach is orthogonal to that of Bengio et al. (2015), it would be possible to use both techniques in tandem.
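The target-side attention of Equations (12) and (13) can be sketched in the same way as the source-side sketch above; the only structural difference is that the weights range over the encodings t1, ..., ti−1 of the partially generated translation. Again, the parameter names and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def target_attention_context(s_prev, T_prefix, Wb, Ub, vb):
    """Target-side context vector d_i of Eq. (12)-(13).

    s_prev   : previous decoder state s_{i-1}
    T_prefix : bidirectional encodings t_1 .. t_{i-1} of the partial translation, shape (i-1, d_t)
    Wb, Ub, vb : parameters of the scoring network b (illustrative shapes)
    """
    scores = np.tanh(T_prefix @ Ub.T + s_prev @ Wb.T) @ vb   # b(s_{i-1}, t_k) for k = 1 .. i-1
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()                                       # Eq. (13): normalized weights
    return beta @ T_prefix                                   # Eq. (12): convex sum of the t_k
```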

             Training                                   Development           Test
Corpus       Sents.  Word types       Avg. length       Sents.  Word types    Sents.  Word types
                     en      ja       en      ja                en     ja             en     ja
IWSLT'07     40k     9.4k    10k      9.3     12.7      0.5k    1.2k   1.3k    0.5k   0.8k   0.9k
NTCIR-10     717k    105k    79k      23.3    27.7      2.0k    5.0k   4.4k    0.5k   2.4k   2.1k
ASPEC        843k    288k    143k     22.1    23.9      1.8k    7.1k   6.3k    1.8k   7.0k   6.4k

Table 1: Data sets

5 Experiments

We evaluated the proposed method on three different English-to-Japanese translation tasks. As baselines, we trained an attention-based NMT system and the coverage-vector method (Tu et al., 2016). To confirm the effectiveness of the target-side bidirectional RNN in the proposed method, we also trained the proposed method with a unidirectional RNN, running from left to right.

5.1 Data and model parameters

The corpora we used were IWSLT'07 (Fordyce, 2007), NTCIR-10 (Goto et al., 2013), and ASPEC (Nakazawa et al., 2016). IWSLT'07 consists of spoken travel conversations, NTCIR-10 consists of patents, and ASPEC is in the domain of scientific publications. We constrained training sentences to have a maximum length of 40 to speed up the training.1 As shown in Table 1, the data size of IWSLT'07 is smaller than the other corpora, and ASPEC has a greater lexical variety compared to the others. Each test sentence had a single reference translation. The English data was tokenized using the tokenization script included in the Moses decoder.2 The Japanese data was tokenized with KyTea (Neubig et al., 2011).

5.2 Settings

The input and output of our model are sequences of one-hot vectors with dimensionality corresponding to the sizes of the source and target vocabularies. For NTCIR-10 and ASPEC, we replaced words with frequency less than 3 with the [UNK] symbol and excluded them from the vocabularies. As a result, the number of word types in NTCIR-10 turned out to be 60k for English and 50k for Japanese, and ASPEC contained 124k types for English and 79k for Japanese.

Due to the limited memory of the GPU, each source and target word was projected into a 200-dimensional continuous Euclidean space to reduce the dimensionality, the depth of the stacking LSTMs was 1, and the hidden layer size was set to 300. Each model was optimized using Adam (Kingma and Ba, 2014) with the following parameters: α = 1e−3, β1 = 0.9, β2 = 0.999, and ϵ = 1e−8. To prevent overfitting, we applied dropout (Srivastava et al., 2014) with a drop rate of r = 0.5 to the last layer of each stacking LSTM. All weight matrices of each model were initialized by sampling from a normal distribution with zero mean and 0.05 standard deviation. The gradient at each update was calculated using a minibatch of at most 100 sentence pairs, and we ran a maximum of 30 iterations over the entire training data. Training was early-stopped to maximize the performance on the development set measured by BLEU. We used a single Tesla K80 GPU with 12 GB of memory for training. For decoding, we used beam search with a beam size of 10. The beam search was terminated when an end-of-sentence [EOS] symbol was generated.
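For concreteness, the optimizer, dropout and initialization settings described above correspond roughly to the following PyTorch-style sketch. The paper does not state which toolkit was used, so this is purely illustrative; the vocabulary size and module layout are placeholders.

```python
import torch
import torch.nn as nn

# hypothetical single-layer LSTM stack with 200-dim embeddings and 300-dim hidden states
embedding = nn.Embedding(num_embeddings=50000, embedding_dim=200)
lstm = nn.LSTM(input_size=200, hidden_size=300, num_layers=1)
dropout = nn.Dropout(p=0.5)          # dropout applied to the last LSTM layer's output

# initialize weight matrices from N(0, 0.05), as described above (biases left at defaults)
for module in (embedding, lstm):
    for name, p in module.named_parameters():
        if "weight" in name:
            nn.init.normal_(p, mean=0.0, std=0.05)

params = list(embedding.parameters()) + list(lstm.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8)
```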

1 The proposed method takes approximately five times the training time, and three times the decoding time, relative to the baseline attention-based NMT. The proposed method with a unidirectional RNN, instead of a bidirectional RNN, takes approximately three times the training time, and three times the decoding time.
2 http://statmt.org/moses/

System                                     IWSLT'07   NTCIR-10   ASPEC
source-attn                                47.4       31.0       26.2
coverage-vector                            47.7       31.4       25.8
source-and-target-attn (left-to-right)     48.0       31.5       26.4
source-and-target-attn (bidirectional)     48.3       32.3†‡     27.7†‡

Table 2: BLEU scores for the attention-based NMT (source-attn), the coverage vector method (Tu et al., 2016) (coverage-vector), and the proposed method (source-and-target-attn) with a target-side bidirectional RNN (bidirectional) and a target-side unidirectional RNN from left to right (left-to-right). †: significantly better than source-attn (p < 0.05); ‡: significantly better than coverage-vector (p < 0.05).

System                                     IWSLT'07    NTCIR-10     ASPEC
source-attn                                39 / 0.91   412 / 0.94   1178 / 0.91
coverage-vector                            91 / 0.91   347 / 0.92   884 / 0.89
source-and-target-attn (left-to-right)     58 / 0.91   286 / 0.93   870 / 0.90
source-and-target-attn (bidirectional)     38 / 0.90   335 / 0.94   659 / 0.91

Table 3: Number of overtranslated words (left) and average brevity penalty per sentence (right)

The evaluation metric is case-insensitive BLEU (Papineni et al., 2002), calculated by the multi-bleu script in the Moses toolkit. Statistical significance testing of the BLEU differences was performed using paired bootstrap resampling (Koehn, 2004) with 10,000 iterations. We also assessed the decrease in over- and under-translation with two criteria. For over-translation, we used the number of overtranslated words, i.e., words that are translated again even though they are already translated in the output. We simply counted the number of repeated phrases (of length two words or more) in each sentence, as in Mi et al. (2016). For under-translation, we used the average brevity penalty per sentence. The brevity penalty, which is part of BLEU, penalizes predicted sentences that are shorter than the reference.

5.3 Results

Table 2 summarizes the results for all three tasks. For the IWSLT'07 task, our model achieved 0.9, 0.6, and 0.3 BLEU point improvements compared with source-attn, coverage-vector, and source-and-target-attn (left-to-right), respectively. For the NTCIR-10 task, our model achieved gains of 1.3, 0.9, and 0.8 BLEU points. For the ASPEC task, our model achieved gains of 1.5, 1.9, and 1.3 BLEU points. These results show that our proposed method is more effective than the baseline methods. The results for IWSLT'07 show less improvement than those for NTCIR-10 and ASPEC. The reason for this may be the length of the target side: as shown in Table 1, the average sentence length of IWSLT'07 is much shorter than that of NTCIR-10 and ASPEC. These results suggest that the proposed method is more effective for tasks with long sentences. The explanation is most likely analogous to the motivation for using a source-side attention model: an LSTM model without attention struggles to propagate necessary information over longer distances. Our target-side attention model explicitly facilitates this.

Table 3 shows the numbers of overtranslated words and the averages of the brevity penalty. The brevity penalty is 1.0 when the output length is longer than the reference translation's length. For IWSLT'07, there were no improvements. As mentioned earlier, we believe the cause is related to the fact that the sentences in this corpus are short; our method is most

effective for longer sequences. For the other two tasks, our model reduced the number of overtranslated words while also keeping the target sequence length closer to that of the references. For NTCIR-10, although source-and-target-attn (left-to-right) greatly reduces the number of overtranslated words, its BLEU score is almost the same as that of coverage-vector. This suggests that source-and-target-attn (left-to-right) increases the number of mistranslated words, whereas source-and-target-attn (bidirectional) is effective in decreasing not only the number of overtranslated words but also the number of mistranslated words. Examples of outputs generated by each model are shown in Appendix A. These analyses support the original motivation for this work, i.e., the proposed model is capable of effectively decreasing the number of mistakenly untranslated words and of unnecessary repeated translations of the same word.

6 Conclusion

We introduced a focused summary of the target sequence generated so far into the decoding process in order to alleviate the over- and under-translation problems. Our empirical evaluation shows that the proposed method is effective, achieving substantial improvements in translation quality consistently across three different tasks.

Acknowledgements

We are deeply grateful to Atsushi Fujita and the anonymous reviewers for their suggestions and insightful comments on an early version of this paper.

References

Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR 2015).

Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. M. (2015). Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems (NIPS 2015).

Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).

Fordyce, C. S. (2007). Overview of the 4th international workshop on spoken language translation iwslt 2007 evaluation campaign. In Proceedings of the 4th International Workshop on Spoken Language Translation (IWSLT 2007).

Goto, I., Chow, K., Lu, B., Sumita, E., and Tsou, B. K. (2013). Overview of the patent machine translation task at the NTCIR-10 workshop. In Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-10).

Gülçehre, Ç., Firat, O., Xu, K., Cho, K., Barrault, L., Lin, H., Bougares, F., Schwenk, H., and Bengio, Y. (2015). On using monolingual corpora in neural machine translation. CoRR, abs/1503.03535.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8).

Kalchbrenner, N. and Blunsom, P. (2013). Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013).

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

Koehn, P. (2004). Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004).

Liu, L., Utiyama, M., Finch, A., and Sumita, E. (2016). Agreement on target-bidirectional neural machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2016).

Luong, T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).

Mi, H., Sankaran, B., Wang, Z., and Ittycheriah, A. (2016). Coverage embedding models for neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016).

Nakazawa, T., Yaguchi, M., Uchimoto, K., Utiyama, M., Sumita, E., Kurohashi, S., and Isahara, H. (2016). ASPEC: Asian Scientific Paper Excerpt Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).

Neubig, G., Nakata, Y., and Mori, S. (2011). Pointwise prediction for robust, adaptable Japanese morphological analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL 2011).

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002).

Sennrich, R., Haddow, B., and Birch, A. (2016). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016).

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15.

Sutskever, I., Vinyals, O., and Le, Q. V. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS 2014).

Tu, Z., Liu, Y., Shang, L., Liu, X., and Li, H. (2017). Neural machine translation with reconstruction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).

Tu, Z., Lu, Z., Liu, Y., Liu, X., and Li, H. (2016). Modeling coverage for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016).

Appendix A. Examples of outputs

We show examples of Japanese translation generated with each of the four models in Tables 2 and 3 with a source sentence and a reference. The words shown in bold letters are examples of over- or under-translation problems.

Examples of NTCIR-10

[Example 1] Source sentence:

This fluctuation in the power supply voltage and reference voltage causes power source noise .

Reference:

この よう に し て 電源 電圧 (the power supply voltage) や 基準 電圧 (the reference voltage) が 変動して 電源ノイズを生 じ さ せ る 。

Output with source-attn:

電源 電圧 (the power supply voltage) と 電源 電圧 (the power supply voltage) と の 変動 に よ り 、 電源ノイズが発生する。

Output with coverage-vector:

電源 電圧 (the power supply voltage) の 変動 に よ り 、 電源 電圧 (the power supply voltage) が 変動し、 電源電圧 (the power supply voltage) が 発生する。

Output with source-and-target-attn (left-to-right):

電源 電圧 (the power supply voltage) および 基準 電圧 (the reference voltage) の 変動 は 、 電源ノイズを発生する。

Output with source-and-target-attn (bidirectional):

電源 電圧 (the power supply voltage) と 基準 電圧 (the reference voltage) と の 変動 は 、 電源ノイズを発生する。

[Example 2] Source sentence:

As shown in FIG . 5 , the drain current is also affected by the stress .

Reference:

図 5 に 示 し た ように ドレイン電流 (the drain current) も 応力の影響を受ける。

Output with source-attn:

5 . 5 に 示 す よう に 、 ドレイン 電流 (the drain current) の 影響 を 受け る こと に よ り 、 ドレイン電流 (the drain current) が 影響を受ける。

Output with coverage-vector:

5 v に 示 す よう に 、 ドレイン 電流 (the drain current) に よ り ドレイン 電流 (the drain current) も 影響を受ける。

Output with source-and-target-attn (left-to-right):

図 5 に 示 す よう に 、 ドレイン 電流 (the drain current) は 、 応力 に よ っ て 影響 を 与える (is affecting) 。

Output with source-and-target-attn (bidirectional):

図 5 に 示 す よう に 、 ドレイン 電流 (the drain current) は 、 応力 に よ っ て 影響 を 受ける (is affected by)。

Examples of ASPEC

[Example 1] Source sentence:

compatible solutes include polyols such as glycine betaine ( betaine ) , zwitterionic proline , pinitol , sorbitol , and mannitol .

Reference:

適合 溶質 と して は , グリシンベタイン ( ベタイン ) , プロ リン など の 両性 イオ ン (zwitterionic) , ピニトール (pinitol) , ソルビトール (sorbitol) , マンニトール など の ポリオール 類 など が 知ら れて いる 。

Output with source-attn:

適合 溶質 は グリシンベタイン ( グリシンベタイン ) , ピニトール (pinitol) , ピニ トール (pinitol) , ピニトール (pinitol) , ピニトール (pinitol) , ピニトール (pinitol) で ある 。

Output with coverage-vector:

適合 溶質 と して は , グリシンベタイン ( ベタイン ) , ピニトール (pinitol) , ピニ トール (pinitol) , ピニトール (pinitol) , ピニトール (pinitol) , ピニトール (pinitol) で ある 。

Output with source-and-target-attn (left-to-right):

適合 溶質 と して は グリシンベタイン ( ベタイン ) , 両性 イオン (zwitterionic) , ピ ニトール (pinitol) , ピニトール (pinitol) , マンニトール など の ポリオール 類 が 含ま れる 。

Output with source-and-target-attn (bidirectional):

適合 溶質 と して は , グリシンベタイン ( ベタイン ) , ピニトール (pinitol) , ソル ビトール (sorbitol) , マンニトール など の ポリオール を 含む 。

[Example 2] Source sentence:

the liquid-crystal film in the title was prepared on a glass substrate by uniaxial orientation and the amount of the transmitted light through a polarizing plate was measured .

Reference:

標題 液晶 薄膜 を ガラス 基板 上 に (on a glass substrate) 一軸配向させて形成し,偏 光 板 を 経た 透過 光 量 を 測定 した 。

Output with source-attn:

標記 薄膜 を ガラス 基板 上 に (on a glass substrate) ガラス 基板 上 に (on a glass substrate) 調 製 し , 偏光 板 を 通過 する 透過 光 の 量 を 測定 した 。

Output with coverage-vector:

標記 薄膜 を ガラス 基板 上 に (on a glass substrate) ガラス 基板 上 に (on a glass substrate) 作製 し , 偏光 板 を 通過 する 透過 光 の 量 を 測定 した 。

Output with source-and-target-attn (left-to-right):

標記液晶膜を一軸配向により ガラス 基板 上 に (on a glass substrate) 作製 し , 偏 光 板 を 介した 透過 光 の 量 を 測定 した 。

Output with source-and-target-attn (bidirectional):

標記 液晶 を ガラス 基板 上 に (on a glass substrate) 一軸配向により作製し,偏光 顕微 鏡 に より 透過 光 の 量 を 測定 した 。

Neural Pre-Translation for Hybrid Machine Translation

Jinhua Du [email protected] Andy Way [email protected] ADAPT, School of Computing, Dublin City University, Ireland

Abstract

Hybrid machine translation (HMT) takes advantage of different types of machine translation (MT) systems to improve translation performance. Neural machine translation (NMT) can produce more fluent translations while phrase-based statistical machine translation (PB-SMT) can produce adequate results primarily due to the contribution of the translation model. In this paper, we propose a cascaded hybrid framework to combine NMT and PB-SMT to improve translation quality. Specifically, we first use the trained NMT system to pre-translate the training data, and then employ the pre-translated training data to build an SMT system and tune parameters using the pre-translated development set. Finally, the SMT system is utilised as a post-processing step to re-decode the pre-translated test set and produce the final result. Experiments conducted on Japanese→English and Chinese→English show that the proposed cascaded hybrid framework can significantly improve performance by 2.38 BLEU points and 4.22 BLEU points, respectively, compared to the baseline NMT system.

1 Introduction

In recent years, NMT has made impressive progress (Kalchbrenner and Blunsom, 2013; Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015). The state-of-the-art NMT model employs an encoder–decoder architecture with an attention mechanism, in which the encoder summarizes the source sentence into a vector representation, the decoder produces the target string word by word from vector representations, and the attention mechanism learns the soft alignment of a target word against source words (Bahdanau et al., 2015). NMT systems have outperformed the state-of-the-art SMT model on various language pairs in terms of translation quality (Luong et al., 2015a; Bentivogli et al., 2016; Junczys-Dowmunt et al., 2016; Wu et al., 2016; Toral and Sánchez-Cartagena, 2017). However, due to some deficiencies of NMT systems, such as the limited vocabulary size and meaningless translations, much research work has involved combining NMT and SMT to improve translation performance (Cho et al., 2014; He et al., 2016; Niehues et al., 2016; Wang et al., 2017).

HMT is a strategy that combines different types of translation systems and fully takes advantage of the strengths of each system to improve translation performance. A typical example of HMT combines a rule-based system's predictable and consistent translations with an SMT system used for post-processing to further improve translation quality (Groves and Way, 2005; Paul et al., 2005; Groves and Way, 2006; Chen et al., 2007; Sánchez-Martínez et al., 2007; Enache et al., 2012; Li et al., 2015).1 NMT is a new MT paradigm which can produce highly

1A huge amount of work has been done on this topic in the past decades, so we only list a representative sample here.

fluent translations. However, NMT sometimes generates translations that have a totally different meaning compared to the source sentences, which testifies to its strong language modeling but weak translation modeling capabilities. By contrast, PB-SMT is good at reflecting the adequacy of source sentences by means of its 'hard' word alignment. Intuitively, if we take the fluent but wrong translations as another language and perform word alignment, then we can perhaps restore the original meaning of source sentences to some extent using SMT. Moreover, for any source-side out-of-vocabulary (OOV) words in NMT, if we can keep them in the translations, then we can fully utilise the advantage of SMT to translate them.

Some work has been done on combining the adequacy of SMT and the fluency of NMT. To the best of our knowledge, the most similar work is the pre-translation framework proposed in Niehues et al. (2016). In their framework, the SMT system is first used to pre-translate the input, and then an NMT system generates the final hypothesis using the pre-translation as input. However, in their experiments, the "SMT⇒NMT" framework without integrating source information did not beat the pure NMT system and was not able to combine the strengths of both systems.

In this paper, we propose an "NMT⇒SMT" hybrid strategy to utilise SMT and NMT by considering (i) that NMT systems significantly outperform SMT systems, so using a higher-quality system as a post-processing step may indeed improve the performance of a lower-quality MT system, but might find it difficult to outperform the higher-quality system; and (ii) that NMT is more sensitive to noisy data than SMT, so using pre-translated data to train NMT will cause translation performance to deteriorate. Accordingly, an NMT system trained on the pre-translated data will not be able to correct errors from the SMT system. Experiments conducted on Japanese→English and Chinese→English demonstrate that our proposed "NMT⇒SMT" hybrid strategy can alleviate the above problems and further improve translation quality compared to pure NMT systems. The main contributions of this work include:

• We re-implement the "SMT⇒NMT" strategy on two different language pairs and four directions, namely Japanese↔English and Chinese↔English. Results show that this framework indeed cannot outperform the baseline NMT system.

• We propose an "NMT⇒SMT" hybrid framework that can better combine SMT and NMT by using their different strengths.

• We examine the effectiveness of the proposed framework on different NMT systems, namely the single NMT, factored NMT and ensemble NMT systems.

The rest of the paper is organised as follows. In Section 2, work related to the proposed neural hybrid MT framework is introduced. Section 3 describes the attentional encoder–decoder framework for NMT, and Section 4 introduces factored NMT and our proposed input features for NMT. In Section 5, we detail our proposed neural hybrid MT framework. In Section 6, we report the results of two sets of experiments on Chinese–English and Japanese–English tasks; a qualitative analysis is then carried out, and some examples comparing different systems are also illustrated in this section. Section 7 concludes and gives avenues for future work.

2 Related Work

The combination of NMT and SMT can be roughly categorised into three categories:

• NMT in post-processing: in this scenario, translations from SMT can be post-processed using NMT. For example, using NMT or neural networks to re-rank the outputs from SMT (Zhao et al., 2014; Lee et al., 2015; Neubig et al., 2015; Ding et al., 2016; Farajian et al., 2016), or using pre-translated results from SMT to build an NMT system (Niehues et al., 2016).

• Integrating SMT into NMT: in this scenario, SMT is used to guide translation in NMT, e.g. incorporating the translation model or language model into the decoding process of NMT (He et al., 2016; Wang et al., 2017).

• Integrating NMT into SMT: in this category, the approach is essentially to integrate neural network-based features into SMT, such as a neural reordering model, neural language model, or neural semantic model, e.g. Cho et al. (2014); Li et al. (2014); Passban et al. (2015).

In terms of the second category, He et al. (2016) incorporate SMT features, such as a translation model and an n-gram language model, with the NMT model under the log-linear framework. Their experiments show that the proposed method significantly improves the translation quality of the baseline NMT system on Chinese→English translation tasks. Wang et al. (2017) propose to incorporate an SMT model into the NMT framework in which, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT; an auxiliary classifier is then employed to score the SMT recommendations and a gating function is used to combine the SMT recommendations with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-to-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems.

The proposed hybrid framework in this paper can be defined as a novel fourth category where we use SMT to post-process translations from NMT, which is completely different from the "SMT⇒NMT" framework in Niehues et al. (2016). In their framework, the SMT system is used to pre-translate the input and then an NMT system generates the final hypothesis using the pre-translation. In their experiments, the basic pre-translation system did not beat the NMT system on either natural-order or pre-reordered data. By concatenating the source-side sentences with pre-translations as input to NMT, the final translation performance outperformed the baseline NMT system. From their results, we can see that the framework still needs the source information, and it is difficult to tell whether the improvements are mainly contributed by the pre-translation or the source information.

3 Neural Machine Translation

The basic principle of an NMT system is that it can map a source-side sentence x = (x1, . . . , xm) to a target sentence y = (y1, . . . , yn) in a continuous vector space, where all sentences are assumed to terminate with a special "end-of-sentence" token <eos>. Conceptually, an NMT system employs neural networks to model the conditional distribution in (1):

$$p(y \mid x) = \prod_{i=1}^{n} p(y_i \mid y_{<i}, x) \qquad (1)$$

We utilise the NMT architecture in Bahdanau et al. (2015), which is implemented as an attentional encoder-decoder network with recurrent neural networks (RNN). In this framework, the encoder is a bidirectional neural network (Sutskever et al., 2014) with gated recurrent units (Cho et al., 2014) where a source-side sequence x is converted to a one-hot vector and fed in as the input, and then a forward sequence of hidden states $(\overrightarrow{h}_1, \ldots, \overrightarrow{h}_m)$ and a backward sequence of hidden states $(\overleftarrow{h}_1, \ldots, \overleftarrow{h}_m)$ are calculated and concatenated to form the annotation vector hj. The decoder is also an RNN that predicts a target sequence y word by word where each word yi is generated conditioned on the decoder hidden state si, the previous target word yi−1, and the source-side context vector ci, as in (2):

$$p(y_i \mid y_{<i}, x) = g(y_{i-1}, s_i, c_i) \qquad (2)$$

where g is the activation function that outputs the probability of yi, and ci is calculated as a weighted sum of the annotations hj. The weight αij is computed as in (3):

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{m} \exp(e_{ik})} \qquad (3)$$

where

eij = a(si−1, hj)

is an alignment model which models the probability that the inputs around position j are aligned to the output at position i. The alignment model is a single-layer feedforward neural network that is learned jointly through backpropagation.

4 Factored NMT Using Linguistic Features

Factored NMT, introduced in Sennrich and Haddow (2016), represents the encoder input as a combination of features as in (4):

$$\overrightarrow{h}_j = g\left(\overrightarrow{W}\left( \big\Vert_{k=1}^{|F|} E_k x_{jk} \right) + \overrightarrow{U}\, \overrightarrow{h}_{j-1}\right) \qquad (4)$$

where $\Vert$ denotes vector concatenation, $E_k \in \mathbb{R}^{m_k \times K_k}$ are the feature-embedding matrices with $\sum_{k=1}^{|F|} m_k = m$, $K_k$ is the vocabulary size of the k-th feature, and $|F|$ is the number of features in the feature set F (Sennrich and Haddow, 2016).

In factored NMT, the features can be any form of knowledge which might be useful to NMT systems, such as POS tags, lemmas, morphological features and dependency labels as used in Sennrich and Haddow (2016). In our work, besides POS tags, we also use a new feature – word class (WoC) – in the NMT system. We define "POS+WoC" as pre-reordering features because they are used for pre-reordering source-side sentences in SMT (Neubig et al., 2012; Nakagawa, 2015).
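As an illustration of Equation (4), the following PyTorch-style sketch concatenates word, POS and word-class embeddings into a fixed-size factored input. The embedding sizes (580 + 10 + 10 = 600) mirror the configuration used later in the experiments, while the vocabulary sizes are placeholders; this is a sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

# illustrative vocabulary and embedding sizes (word + POS + WoC features)
word_emb = nn.Embedding(45000, 580)   # word feature
pos_emb  = nn.Embedding(36, 10)       # POS-tag feature
woc_emb  = nn.Embedding(51, 10)       # word-class feature

def factored_input(word_ids, pos_ids, woc_ids):
    """Concatenate per-token feature embeddings as in Eq. (4); total size 580 + 10 + 10 = 600."""
    return torch.cat([word_emb(word_ids), pos_emb(pos_ids), woc_emb(woc_ids)], dim=-1)

# toy usage: a source sentence of 7 tokens
x = factored_input(torch.randint(0, 45000, (7,)),
                   torch.randint(0, 36, (7,)),
                   torch.randint(0, 51, (7,)))
assert x.shape == (7, 600)
```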

Figure 1: The Cascaded framework for neural HMT

5 Cascaded Hybrid Machine Translation

NMT can produce more fluent translations than SMT. However, NMT often produces some meaningless translations, i.e. the translation is totally different from the original meaning of the source sentence (Toral and Sánchez-Cartagena, 2017), or repeatedly translates some source words while mistakenly ignoring other words (Tu et al., 2017). We infer that this problem is due to the lack of an explicit translation model in NMT, so that the whole NMT framework behaves essentially as a language model.

Although the soft-attention mechanism helps to guide the prediction of target words using the source information, and a reconstructor in NMT can manage to reconstruct the input source sentence from the hidden layer of the output target sentence to avoid the duplicate translation of source words (Tu et al., 2017), it still cannot explicitly and fully use the source information as in SMT. Thus, if we regard the translation from NMT as another 'language', and use SMT to perform word alignment and build a translation model, we might alleviate the meaningless and duplicate translations to some extent.2

Therefore, we propose an "NMT⇒SMT" framework to combine NMT and SMT as a multi-engine hybrid MT system, as illustrated in Figure 1. In this pipeline, the first step is to train an NMT system using the initial training data, and then translate the training data, development set (devset) and test set (testset) into pre-translations; the second step is to use the pre-translated training data to build a target–target SMT system and tune the parameters using the pre-translated devset; the last step is to use the tuned SMT system to decode the pre-translated test set and produce the final output.

During pre-translation, i.e. using NMT to translate the training data, devset and testset, we allow NMT to generate the 'UNK' token if an OOV occurs in the source sentence. Then, we propose a very simple but effective method to replace the "UNK" token in the translation by the corresponding source word. The method is shown in Algorithm 1.

Algorithm 1 Replacing UNK by source words

Require: A source sentence f_1^l, the translation e_1^m with UNK tokens, and the limited source-side vocabulary V for NMT.
  source_position = 1
  for i = 1 to m do
    if e_i == UNK then
      for j = source_position to l do
        if f_j not in V then
          e_i = f_j
          source_position = j + 1
          break
        end if
      end for
    end if
  end for
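For readers who prefer running code, a direct Python rendering of Algorithm 1 might look as follows (our reading of the pseudocode, with 1-based indices converted to 0-based; not the authors' implementation):

```python
def replace_unk(source_tokens, translation_tokens, vocab):
    """Replace each UNK in the NMT output with the next source-side OOV word (cf. Algorithm 1)."""
    output = list(translation_tokens)
    source_position = 0
    for i, token in enumerate(output):
        if token == "UNK":
            for j in range(source_position, len(source_tokens)):
                if source_tokens[j] not in vocab:      # source word is OOV for the NMT system
                    output[i] = source_tokens[j]
                    source_position = j + 1
                    break
    return output

# toy usage with a hypothetical vocabulary
src = ["巴拉特", "said", ":"]
hyp = ["UNK", "said", ":"]
print(replace_unk(src, hyp, vocab={"said", ":"}))      # ['巴拉特', 'said', ':']
```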

The mechanism of Algorithm 1 is different from that in Jean et al. (2015), where they use the soft word-alignment information from the NMT system to guide the substitution process. However, generating the word-alignment information from NMT is quite time-consuming, especially for the translation of the training data. Therefore, our algorithm simply traverses the

2This is an open question. Intuitively, this depends on what word alignments are learned and what phrase pairs are extracted.

translation and its corresponding source sentence. Specifically, when we encounter the 'UNK' token, we look up the source words in order in the NMT source-side vocabulary. If a source word does not exist in the vocabulary, then we replace the 'UNK' with this source word. We repeat this process to the end of the translation sentence. Wrong replacements may occur due to different word order between the source sentence and the target sentence.3 However, we believe that the SMT system will use its local reordering capability to reorder the OOVs to some extent and translate them. Finally, different from using a back-off dictionary to post-process these unknown words as in Luong et al. (2015b), we employ an SMT system to translate them by considering more context.
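Putting the pieces together, the whole NMT⇒SMT cascade can be summarised by the following high-level sketch. The helper functions translate_with_nmt, train_smt, tune_smt and decode_with_smt are hypothetical placeholders for the corresponding Nematus and Moses steps, not real APIs; replace_unk is the function sketched after Algorithm 1.

```python
# Hypothetical high-level sketch of the NMT=>SMT cascade described above.
# The helpers below are placeholders for Nematus/Moses invocations, not real APIs.

def nmt_to_smt_pipeline(train_src, train_tgt, dev_src, dev_tgt, test_src, nmt_model, vocab):
    # Step 1: pre-translate training data, devset and testset with the trained NMT system,
    # replacing UNK tokens by source-side OOVs (Algorithm 1).
    pre_train = [replace_unk(s, translate_with_nmt(nmt_model, s), vocab) for s in train_src]
    pre_dev   = [replace_unk(s, translate_with_nmt(nmt_model, s), vocab) for s in dev_src]
    pre_test  = [replace_unk(s, translate_with_nmt(nmt_model, s), vocab) for s in test_src]

    # Step 2: build a target-target SMT system on (pre-translation, reference) pairs
    # and tune it on the pre-translated devset.
    smt = train_smt(source_side=pre_train, target_side=train_tgt)
    smt = tune_smt(smt, source_side=pre_dev, references=dev_tgt)

    # Step 3: re-decode the pre-translated test set to produce the final output.
    return [decode_with_smt(smt, sent) for sent in pre_test]
```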

6 Experiments

As Japanese and Chinese differ drastically from English in terms of word order and grammatical structure, we select Japanese–English and Chinese–English translation4 to verify the proposed framework. We also re-implement the "SMT⇒NMT" pipeline proposed in Niehues et al. (2016) as a comparison with our proposed framework. Therefore, two sets of experiments are set up as follows:

• "SMT⇒NMT": four translation directions (JP↔EN and ZH↔EN) are evaluated on natural-order and pre-reordered data. We employ the top-down BTG-based pre-reordering method to reorder source-side sentences (Nakagawa, 2015).

• "NMT⇒SMT": we test our proposed framework by integrating different types of NMT systems on JP→EN and ZH→EN tasks.

In the following sections, we report our experimental setup and results for these two sets of experiments.

6.1 Experimental Settings

For JP–EN translation tasks, the training data is the first part (train-1) of the JP–EN Scientific Paper Abstract Corpus (ASPEC-JE), which contains 1M sentence pairs; the development/validation set contains 1,790 sentence pairs, and the test set contains 1,812 sentence pairs (Nakazawa et al., 2016). There is only one reference for each source-side sentence in the validation and test sets. For ZH–EN tasks, we use 1.4M sentence pairs extracted from LDC ZH–EN corpora as the training data, the NIST 2004 current set as the development/validation set (1,597 sentences), and the NIST 2005 current set as the test set (1,082 sentences). There are four references for each Chinese sentence and only one reference for each English sentence in the validation and test sets. For the EN→ZH direction, we use the first of the four Chinese references as the reference and the English side as the input.

For factored NMT, we use POS tags and word classes as input features, which are obtained as follows:

• POS tag: the Japanese data are segmented and tagged using KyTea (Neubig et al., 2011), and the Chinese data are segmented and tagged using the ICTCLAS toolkit (Zhang et al., 2003).

• Word Class (WoC): the word classes of the training data are obtained using "mkcls" with the number of classes set to 50. For an OOV word in the validation and test sets, we randomly allocate a class between 1 and 50 to it.

3 Note that the number of OOVs in the source sentence is not always precisely the same as the number of UNKs in the NMT translation. In our method, we take the minimum of these two numbers when replacing the OOVs. 4 In the rest of the paper, we use JP, ZH and EN to denote Japanese, Chinese and English, respectively.

Chinese and Japanese are not suitable for using the Byte Pair Encoding (BPE) method (Sennrich et al., 2016) to encode words as subword units, so we keep words as the translation units. We use Moses (Koehn et al., 2007) with default settings as the standard PB-SMT system, and use KenLM (Heafield et al., 2013) to train a 5-gram language model. We use Nematus (Sennrich et al., 2017) as the NMT system, and set minibatches of size 80, a maximum sentence length of 60, word embeddings of size 600, and hidden layers of size 1024. The vocabulary size for input and output is set to 45K. The models are trained with the Adadelta optimizer (Zeiler, 2012), reshuffling the training corpus between epochs. We validate the model every 5,000 minibatches via BLEU (Papineni et al., 2002) scores on the validation set and save the model every 30,000 iterations.

As in Sennrich and Haddow (2016), for factored NMT systems, in order to ensure that performance improvements are not simply due to an increase in the number of model parameters, we keep the total size of the embedding layer fixed to 600. Tables 1 and 2 show the vocabulary size and embedding size for the pre-reordering features and the word as the input for the JP→EN and ZH→EN systems, respectively.

           Vocab. Size          Embedding Size
Feature    Corpus     Model     All     Single
POS        21         21        10      10
WoC        51         51        10      10
Word       161,390    45K       580     590

Table 1: Vocabulary size, and size of embedding layer of pre-reordering features and words for JP→EN

           Vocab. Size          Embedding Size
Feature    ZH         Model     All     Single
POS        36         36        10      10
WoC        51         51        10      10
Word       185,029    45K       580     590

Table 2: Vocabulary size, and size of embedding layer of pre-reordering features and words for ZH→EN.

In Tables 1 and 2, the columns from left to right under "Vocab. Size" indicate the vocabulary size of each feature. For example, "21" for "Corpus" indicates that there are a total of 21 POS tags in our Japanese corpus, and "21" for "Model" indicates that the vocabulary size of POS tags configured in the NMT model is 21. The column "All" indicates the embedding size of a feature when it is combined with all other features, and "Single" indicates the embedding size of a feature combined only with "Word". In all NMT systems, the total embedding size is fixed to 600. Therefore, "590" indicates that for each single feature, the size for "Word" is obtained by [600 − embedding size(feature) = 600 − 10 = 590].

The NMT system with the best BLEU score is selected as our baseline, and in terms of the ensemble NMT system, we use the last 5 models. The beam size for all NMT systems is set to 12. We only employ the pre-translated training data and devset from the baseline NMT system to train and tune the SMT engine. The tuned SMT system is then employed to re-decode the test sets pre-translated by the baseline NMT, factored NMT and ensemble NMT systems, respectively.

               JP→EN                                        EN→JP
               Non-reordered          Pre-reordered         Non-reordered          Pre-reordered
SYS            Validation  Test       Validation  Test      Validation  Test       Validation  Test
SMT            18.25       17.64      21.79*      21.71*    27.03       26.32      33.67*      33.75*
NMT            24.16*      24.55*     20.42       21.43     35.25*      35.23*     32.75       32.98
SMT⇒NMT        18.01       17.83      20.39       20.91     27.64       27.57      33.23       33.43

Table 3: Results on JP–EN SMT⇒NMT experiments. “*” indicates translation performance is significantly better.

               ZH→EN                                        EN→ZH
               Non-reordered          Pre-reordered         Non-reordered          Pre-reordered
SYS            Validation  Test       Validation  Test      Validation  Test       Validation  Test
SMT            33.13       29.24      34.63*      30.59*    14.50       12.77      16.12*      13.77*
NMT            35.49*      31.76*     33.95       30.23     15.97*      15.62*     14.14       13.53
SMT⇒NMT        32.87       28.86      33.84       29.69     14.51       12.94      15.45       13.36

Table 4: Results on ZH–EN SMT⇒NMT experiments

All results are reported using case-insensitive BLEU scores, and statistical significance is calculated via a bootstrap resampling significance test (Koehn, 2004).
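For reference, paired bootstrap resampling (Koehn, 2004) can be sketched as below. This is a simplified illustration: it compares systems by summed sentence-level scores on each resampled test set, whereas the actual test recomputes corpus-level BLEU on each resample.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10000, seed=0):
    """Simplified paired bootstrap resampling over per-sentence scores of two systems.

    Returns the fraction of resampled test sets on which system A outscores system B,
    which approximates the confidence that A is better than B.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]           # sample sentences with replacement
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```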

6.2 Results and Analysis on SMT⇒NMT

Tables 3 and 4 show the results for JP↔EN and ZH↔EN, respectively, with and without pre-reordered data. The baseline system is a standard PB-SMT system trained on non-reordered and pre-reordered data, respectively. "NMT" indicates the baseline NMT system as described in Section 6.1. From Table 3, we can see that:

• Non-reordered task: all “SMT⇒NMT” systems on JP→EN and EN→JP are significantly worse than the baseline NMT systems. Except for the validation set of JP→EN, all other “SMT⇒NMT” systems on JP→EN and EN→JP outperform the baseline SMT systems.

• Pre-reordered task: the “SMT⇒NMT” system is worse than both the pre-reordered NMT system and pre-reordered SMT system on JP→EN, while it is better than the pre-reordered NMT system on EN→JP.

For ZH↔EN tasks, the “SMT⇒NMT” system performs worse relative to JP↔EN tasks, i.e. almost all “SMT⇒NMT” systems did not beat the NMT and SMT systems. The observations from these experiments show that:

• if the translation quality of NMT is better than that of SMT, then using NMT as a post- processing module without integrating source-side information to re-decode translations from SMT cannot further improve translation performance.

• the pre-reordering in the source-side sentences is indeed helpful to SMT, while it hurts the performance of NMT.

• we need a better pipeline to combine NMT and SMT without using the source-side information.

6.3 Results and Analysis on NMT⇒SMT

Results for the proposed NMT⇒SMT model are shown in Table 5, where "NMT⇒SMT-B" indicates that the "NMT⇒SMT" pipeline re-decodes the translations of the baseline NMT system, "NMT⇒SMT-F" indicates that the "NMT⇒SMT" pipeline re-decodes the translations of the factored NMT system, and "NMT⇒SMT-E" indicates that the "NMT⇒SMT" system re-decodes the translations of the ensemble NMT system.5

               JP→EN                    ZH→EN
SYS            Validation   Test        Validation   Test
SMT            18.25        17.64       33.13        29.24
NMT            24.16        24.55       35.49        31.76
NMT⇒SMT-B      25.33*       25.66*      36.58*       32.38*
factored NMT   25.08        25.17       37.42        33.15
NMT⇒SMT-F      25.94*       26.08*      37.69*       33.39*
ensemble NMT   26.24        26.37       39.10        35.69
NMT⇒SMT-E      26.80*       26.93*      39.53*       35.98*

Table 5: Results of NMT⇒SMT experiments on JP→EN and ZH→EN. "*" indicates translation performance is significantly better.

We observe that:

• JP→EN: the NMT⇒SMT-B improves translation performance by 1.17 BLEU points and 1.11 BLEU points on validation and test sets, respectively, compared to the baseline NMT system. The NMT⇒SMT-F improves by 0.86 BLEU points and 0.91 BLEU points on the validation and test sets, respectively, compared to the factored NMT system. The NMT⇒SMT-E improves by 0.56 BLEU points and 0.56 BLEU points on the validation and test sets, respectively, compared to the ensemble NMT system, and improves by 2.64 BLEU points and 2.38 BLEU points on the validation and test sets, respectively, compared to the baseline NMT system. All improvements are significantly better.

• ZH→EN: the NMT⇒SMT-B improves translation performance by 1.09 BLEU points and 0.62 BLEU points on validation and test sets, respectively, compared to the baseline NMT system. The NMT⇒SMT-F improves by 0.27 BLEU points and 0.24 BLEU points on the validation and test sets, respectively, compared to the factored NMT system. The NMT⇒SMT-E improves by 0.43 BLEU points and 0.29 BLEU points on the validation and test sets, respectively, compared to the ensemble NMT system, and improves by 4.04 BLEU points and 4.22 BLEU points on the validation and test sets, respectively, compared to the baseline NMT system. All improvements are significantly better.

The results show that:

• Our proposed neural hybrid MT pipeline is more effective and feasible than the SMT⇒NMT pipeline. In Niehues et al. (2016), the SMT⇒NMT pipeline only works when integrating the source information into NMT. However, it increases the computational complexity by concatenating the pre-translated and source sentences as input to NMT.

5In current experiments, we only ensemble the baseline NMT systems. In future, we also plan to ensemble the factored NMT systems to verify the HMT performance.

• Our proposed NMT⇒SMT framework only uses source-side information once, i.e. at the stage of NMT training, while at the stage of post-processing we only use the pre-translations without the source information (except OOVs), which keeps the framework simpler than the SMT⇒NMT framework.

• For different types of NMT systems, the proposed pipeline can significantly further improve translation performance, even though the pre-translated SMT system is only trained using translations from the baseline NMT system. We would expect further improvements if we used the translations from the factored NMT or ensemble NMT models to train the SMT engine.

From the analysis of the results, we found that:

• The OOV rate in the test set is significantly decreased by the proposed framework, i.e. the post-processing SMT system can translate some of the OOVs appearing in the test set due to its larger vocabulary. For example, in the Chinese test set, the OOV rate for the NMT system is 4.62%; in the final result of the proposed framework, the OOV rate is reduced to 2.36%.

• The improvement in translation performance is also attributed to the reordering and correction of phrases. We will carry out a human evaluation and look into more details in future work.

6.4 Examples

To further analyse the proposed NMT⇒SMT framework, Table 6 shows two examples produced by the baseline NMT system and the corresponding NMT⇒SMT system from the JP→EN and ZH→EN tasks, respectively. The first example in Table 6 shows that the SMT system has the capability of making local translations more fluent: the NMT⇒SMT-B changes the phrase "the hydrogen bond network" in NMT to "hydrogen bond networks", which exactly matches the reference. "NMT-OOV" in the second example indicates the pre-translation after tracking the "UNK" symbols and replacing them by source-side OOVs. This example shows the capability of SMT to make the translation more adequate by subsequently translating the OOV.

Reference:   next , the change of hydrogen bond networks which was a basis of the motion of the water was explained .
NMT:         next , the change of the hydrogen bond network which was a basis of the movement of the water was explained .
NMT⇒SMT-B:   next , the change of hydrogen bond networks which was a basis of the movement of the water was explained .

Reference:   barratt said : “ we have not achieved further information . ”
NMT:         UNK said : “ we have yet to get any results . ”
NMT-OOV:     巴拉特 said : “ we have yet to get any results . ”
NMT⇒SMT-B:   barratt said : “ we have yet to get any results . ”

Table 6: Examples

7 Conclusion

In this paper we propose a cascaded hybrid framework (NMT⇒SMT) to combine NMT and SMT to improve translation performance. More specifically, we first employ a trained NMT system to pre-translate the training data, and then train an SMT system using the pre-translated

data. Finally, the tuned target–target SMT system is utilised to re-decode the pre-translated test set and produce the final results. We compare the proposed NMT⇒SMT pipeline with the SMT⇒NMT pipeline on JP–EN and ZH–EN tasks, and show that our framework is more effective than SMT⇒NMT, resulting in improvements on the test set of 2.38 BLEU points and 4.22 BLEU points on JP→EN and ZH→EN, respectively, compared to the baseline NMT system.

As for future work, we plan to run more experiments on different language pairs and larger-scale data sets to verify the proposed framework, and we will explore better combinations of NMT and SMT to further improve translation quality. Additionally, we also want to verify the HMT framework without replacing OOVs in the NMT outputs.

Acknowledgements

We would like to thank the reviewers for their valuable and constructive comments. This research is supported by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106), and by the SFI Industry Fellowship Programme 2016 (Grant 16/IFB/4490).

References

Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, pages 1–15, San Diego, USA.

Bentivogli, L., Bisazza, A., Cettolo, M., and Federico, M. (2016). Neural versus phrase-based machine translation quality: A case study. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 257–267, Austin, Texas, USA.

Chen, Y., Eisele, A., Federmann, C., Hasler, E., Jellinghaus, M., and Theison, S. (2007). Multi-engine machine translation with an open-source decoder for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 193–196, Prague, Czech Republic.

Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 1724–1734, Doha, Qatar.

Ding, S., Duh, K., Khayrallah, H., Koehn, P., and Post, M. (2016). The JHU machine translation systems for WMT 2016. In Proceedings of the Conference on Statistical Machine Translation, Berlin, Germany.

Enache, R., España-Bonet, C., Ranta, A., and Màrquez, L. (2012). A hybrid system for patent translation. In Proceedings of the 16th Annual Conference of the European Association for Machine Translation, pages 269–276, Trento, Italy.

Farajian, M. A., Chatterjee, R., Conforti, C., Jalalvand, S., Balaraman, V., Gangi, M. A. D., Ataman, D., Turchi, M., Negri, M., and Federico, M. (2016). FBK’s neural machine translation systems for IWSLT 2016. In Proceedings of the 13th Workshop on Spoken Language Translation, pages 8–15, Seattle, USA.

Groves, D. and Way, A. (2005). Hybrid example-based SMT: the best of both worlds? In Proceedings of the Workshop on Building and Using Parallel Texts – Data-driven machine translation and beyond, pages 183–190, Ann Arbor, USA.

Groves, D. and Way, A. (2006). Hybridity in MT: experiments on the Europarl corpus. In Proceedings of the 11th Annual Conference of the European Association for Machine Translation, pages 115–124, Oslo, Norway.

He, W., He, Z., Wu, H., and Wang, H. (2016). Improved neural machine translation with SMT features. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 151–157, Phoenix, Arizona, USA.

Heafield, K., Pouzyrevsky, I., Clark, J. H., and Koehn, P. (2013). Scalable modified Kneser-Ney language model estimation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 690–696, Sofia, Bulgaria.

Jean, S., Cho, K., Memisevic, R., and Bengio, Y. (2015). On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1–10, Beijing, China.

Junczys-Dowmunt, M., Dwojak, T., and Hoang, H. (2016). Is neural machine translation ready for deployment? A case study on 30 translation directions. In Proceedings of the International Workshop on Spoken Language Translation, Tokyo, Japan.

Kalchbrenner, N. and Blunsom, P. (2013). Recurrent continuous translation models. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 1700–1709, Seattle, Washington, USA.

Koehn, P. (2004). Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 388–395, Barcelona, Spain.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics on Interactive Poster and Demonstration Sessions, pages 177–180, Prague, Czech Republic.

Lee, H.-G., Lee, J., Kim, J.-S., and Lee, C.-K. (2015). NAVER machine translation system for WAT 2015. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015), pages 69–73, Kyoto, Japan.

Li, H., Zhao, K., Hu, R., Zhu, Y., and Jin, Y. (2015). A hybrid system for Chinese-English patent machine translation. In Proceedings of MT Summit XV: Sixth Workshop on Patent and Scientific Literature Translation (PSLT6), pages 52–67, Miami, USA.

Li, P., Liu, Y., Sun, M., Izuha, T., and Zhang, D. (2014). A neural reordering model for phrase-based translation. In Proceedings of the 25th International Conference on Computational Linguistics, pages 1897–1907, Dublin, Ireland.

Luong, T., Pham, H., and Manning, C. D. (2015a). Effective approaches to attention-based neural machine translation. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 1412–1421, Lisbon, Portugal.

Luong, T., Sutskever, I., Le, Q., Vinyals, O., and Zaremba, W. (2015b). Addressing the rare word problem in neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 11–19, Beijing, China.

Nakagawa, T. (2015). Efficient top-down BTG parsing for machine translation preordering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference of the Asian Federation of Natural Language Processing, pages 208–218, Beijing, China.

Nakazawa, T., Yaguchi, M., Uchimoto, K., Utiyama, M., Sumita, E., Kurohashi, S., and Isahara, H. (2016). ASPEC: Asian Scientific Paper Excerpt Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia.

Neubig, G., Morishita, M., and Nakamura, S. (2015). Neural reranking improves subjective quality of machine translation: NAIST at WAT2015. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015), pages 35–41, Kyoto, Japan.

Neubig, G., Nakata, Y., and Mori, S. (2011). Pointwise prediction for robust, adaptable Japanese morphological analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 529–533, Portland, Oregon, USA.

Neubig, G., Watanabe, T., and Mori, S. (2012). Inducing a discriminative parser to optimize machine translation reordering. In Proceedings of the Conference on Empirical Methods on Natural Language Processing and Computational Natural Language Learning, pages 843–853, Jeju Island, Korea.

Niehues, J., Cho, E., Ha, T.-L., and Waibel, A. (2016). Pre-translation for neural machine translation. In Proceedings of COLING, pages 1828–1836, Osaka, Japan.

Papineni, K., Roukos, S., Ward, T., and Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, USA.

Passban, P., Hokamp, C., and Liu, Q. (2015). Bilingual distributed phrase representations for statistical machine translation. In Proceedings of MT Summit XV, pages 310–318, Miami, USA.

Paul, M., Doi, T., Hwang, Y., Imamura, K., Okuma, H., and Sumita, E. (2005). Nobody is perfect: ATR’s hybrid approach to spoken language translation. In Proceedings of the International Workshop on Spoken Language Translation: Evaluation Campaign on Spoken Language Translation, Pittsburgh, USA.

Sennrich, R., Firat, O., Cho, K., Birch, A., Haddow, B., Hitschler, J., Junczys-Dowmunt, M., Läubli, S., Barone, A. V. M., Mokry, J., and Nădejde, M. (2017). Nematus: a toolkit for neural machine translation. arXiv:1703.04357.

Sennrich, R. and Haddow, B. (2016). Linguistic input features improve neural machine translation. In Proceedings of the First Conference on Machine Translation, pages 83–91, Berlin, Germany.

Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 1715–1725, Berlin, Germany.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the 2014 Neural Information Processing Systems, pages 3104–3112, Montreal, Canada.

Sánchez-Martínez, F., Pérez-Ortiz, J. A., and Forcada, M. L. (2007). Integrating corpus-based and rule-based approaches in an open-source machine translation system. In Proceedings of the METIS-II Workshop: New Approaches to Machine Translation, Leuven, Belgium.

Toral, A. and Sánchez-Cartagena, V. M. (2017). A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain.

Tu, Z., Liu, Y., Shang, L., Liu, X., and Li, H. (2017). Neural machine translation with reconstruction. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), San Francisco, USA.

Wang, X., Lu, Z., Tu, Z., Li, H., Xiong, D., and Zhang, M. (2017). Neural machine translation advised by statistical machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, California, USA.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. In arXiv:1609.08144.

Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. In CoRR, abs/1212.5701.

Zhang, H., Yu, H., Xiong, D., and Liu, Q. (2003). HHMM-based Chinese lexical analyzer ICTCLAS. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, pages 184–187, Sapporo, Japan.

Zhao, Y., Huang, S., Chen, H., and Chen, J. (2014). Investigation on statistical machine translation with neural language models. In Proceedings of Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, pages 175–186.


Neural and Statistical Methods for Leveraging Meta-information in Machine Translation

Shahram Khadivi [email protected]
eBay Inc., Aachen, Germany
Patrick Wilken∗ [email protected]
RWTH Aachen University, Aachen, Germany
Leonard Dahlmann [email protected]
Evgeny Matusov [email protected]
eBay Inc., Aachen, Germany

Abstract

In this paper, we discuss different methods which use meta-information and richer context that may accompany source-language input to improve machine translation quality. We focus on category information of the input text as meta-information, but the proposed methods can be extended to all textual and non-textual meta-information that might be available for the input text or automatically predicted from the text content. The main novelty of this work is to use state-of-the-art neural network methods to tackle this problem within a statistical machine translation (SMT) framework. We observe translation quality improvements of up to 3% in terms of BLEU score in some text categories.

1 Introduction

Using larger context in machine translation to improve its quality, including the selection of the correct word meaning, has been a challenging task. Correct translation of polysemous words is vital to transfer important information from the source sentence to the translation. To find the correct sense of polysemous words and phrases, usually only the context of the source sentence is available. Depending on the use case, the context can be extended to the surrounding sentences, or to external signals about the text, like its topic or genre. Therefore, we can consider all methods that try to use a larger context for translation as methods that can help an MT system select the right translation for polysemous words.

In e-commerce, the problem of polysemous words is more severe. For example, an incorrect literal translation of a brand name like Apple, Coach, Diesel, Affliction, Avenue, Cables To Go, Free People, etc. misleads a potential buyer. It may also create legal issues when, e.g., a wrong translation directs buyers to a competing brand name. In e-commerce MT scenarios, one of the main tasks is often the translation of the titles of the items offered for sale. Such titles are short, non-grammatical, and the local context of a given word is highly variable. Since item title translation is of crucial importance in cross-border trade for e-commerce, we try to leverage the meta-data available for each item to deal with these irregularities. Items in e-commerce inventory are usually classified according to a hierarchical taxonomy. The hierarchy itself contains top-level categories (like Clothing, Electronics) with varying degrees of depth in each top category. Although such a hierarchy is created based on

∗Patrick Wilken contributed to this work during his internship at eBay Inc.

business insights, it implicitly groups objects which can be described in semantically similar terms. Therefore, we expect less variation in word senses within a category, with ambiguity decreasing deeper in the tree. For instance, Apple is very likely to be a brand name, not a fruit, in Smartphones. Therefore, item category information can potentially provide a strong signal to identify the meaning of a word. In this work, we focus on using the broader product categories of a particular e-commerce site, at the top level 1 (L1, e.g. “Clothing and Shoes”), but also at levels 2 (L2, e.g. “Women’s Shoes”) and 3 (L3, e.g. “Women’s Boots”).

The main goal of this work is to modify the major state-of-the-art approaches that leverage larger context in MT so that they use (category) meta-information both in training and at runtime, and to check experimentally whether such approaches are able to better translate polysemous words in e-commerce data and improve MT quality both overall and in specific categories. Among others, we look at neural machine translation (NMT) models, which are by now state-of-the-art and by definition use larger context during decoding, as in (Bahdanau et al., 2014). Since our goal is to enhance a real-time production phrase-based SMT system for title translation, and also to get a better understanding of the power of NMT as compared to SMT in using larger context, we use a bidirectional recurrent neural network (RNN) lexical model and also a feed-forward NN model as features in our SMT system. To the best of our knowledge, we were the first to modify these specific NMT models to use the embedding of sentence meta-information as an additional signal. Also, in this work we propose and test a simple generative category-specific word lexicon model. The main challenges we encountered are data sparseness, data bias towards the most frequently observed meaning of polysemous words, and the absence of meta-information for parts of the training data. Nevertheless, significant improvements of MT quality could be achieved with some of the described methods on selected tasks.

In the next section, we describe major research works that employ larger context in MT. In Section 3, we describe the models we investigate in our work and the novel ways of integrating the category information into these models. Section 4 is devoted to experimental results, which include automatic and human evaluation on an English-to-Italian e-commerce title translation task. The paper concludes with a summary and future work in Section 5.

2 Related Work

Phrase-based MT models have an intrinsic problem in using large(r) context and long-range dependencies, since these models are bounded by phrase context. Therefore, many research publications target solving these well-known problems of phrase-based MT. Here, we focus on pioneering works that are most comparable to this work. One research line of using larger context in MT is to use sentence context for word sense disambiguation (Carpuat and Wu, 2007). Mauser et al. (2009) proposed the idea of employing a discriminative lexical model that uses sentence-level information to predict a target word; this idea has been extended and enhanced in a recent work (Tamchyna et al., 2016). Tamchyna et al. (2016) proposed to extend the discriminative model to also use the target prefix to predict the next target word, and they enhance the model to calculate target word probabilities on-line during the search, using a fast and efficient classification method based on the Vowpal Wabbit1 toolkit. Devlin et al. (2014) proposed a neural joint lexical model that also employs a larger context around a given source word, including previously generated target words, to predict the corresponding target word probability using a feed-forward neural network as the classifier. The works of Durrani et al. (2011) and Guta et al. (2015) can also be considered attempts to use larger context and long-range dependencies in phrase-based MT by modeling the dependencies between phrases.

1http://hunch.net/~vw/

The use of topic models in MT is another promising way to use larger context in lexical translation. The main idea is to use the topic models to infer the topic of the whole document/sentence, and then use it as a signal to the MT system to find the correct sense of the source word to translate (Eidelman et al., 2012). Hasler et al. have investigated different ways to use topic models in SMT (Hasler et al., 2012a, 2014b,a). They showed relatively small but consistent improvements when topic models are used inside SMT models.

This work is different from previous works in the following aspects:

• We use a bidirectional LSTM as an alternative to a feed-forward NN (Devlin et al., 2014) or maximum entropy-like classifiers (Tamchyna et al., 2016).

• In comprehensive experiments, we explore different methods to use additional meta-information in the translation process.

• We conduct a case study on e-commerce data, where meta-information and context seem to be more effective.

• We perform human evaluation to confirm and explain improvements of automatic MT quality measures.

In our evaluation on an e-commerce translation test set containing a mix of product categories, we observed moderate improvements using the different approaches introduced in the literature. This is in agreement with previous works. For specific product categories, however, we obtained large and significant improvements with each method. These experiments confirm the benefit of using larger context and meta-information in translation. In addition, we found that the problem is far from being solved by the current approaches.

3 MT Models Leveraging Meta-information

3.1 Sparse Lexical Features

The main idea is to bias an SMT system towards the vocabulary and style of the target domain that can be inferred from the latent topics of the source sentence (Eidelman et al., 2012). We employ sparse lexical features (Hasler et al., 2012b) and sparse topic features on top of the common dense features in an SMT system. Sparse lexical features are tuples composed of a single source word and a single target word. These features can also be extended with another binary feature representing the coexistence of a specific topic or text category in the source sentence. Topic information can be obtained from topic models trained on the source side of the bilingual training corpus along with other available in-domain monolingual data in the same language. The features with topic information are triggered by the topic of the source sentence, that is, for a particular source sentence to be translated, only the features that have been seen with the topic of that sentence will fire.

We can also add information like topics or text categories for each phrase pair in the phrase table. This information can be inferred from each phrase pair independent of the context, or it can be inferred from the sentence pairs from which a given phrase pair is extracted. Therefore, each phrase pair is augmented with its topics, i.e., a vector of membership values of the phrase pair to each topic. The topic model is trained on appropriate monolingual data in the source language. Then, based on the source side of each phrase or sentence pair, the topics and their probabilities, which again form a vector, are inferred from the topic model. We can combine both types of features to create an SMT system that is more sensitive to the context, similar to (Mathur et al., 2015). The idea of a topic vector can be extended to any other context vector, i.e., the vector is simply a phrase-pair-specific representation of the meta-information in a continuous vector space.
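To make the feature templates concrete, the following sketch (our own illustration, not the authors' implementation; the feature names and the simplified handling of word alignment are assumptions) shows how word-pair sparse features and their category-triggered variants could be generated for one phrase pair, given the L2 category observed for the source sentence.

from itertools import product

def sparse_lexical_features(source_words, target_words, category=None):
    # Word-pair features in the spirit of (Hasler et al., 2012b): one feature
    # per (source word, target word) tuple of the phrase pair.
    features = {}
    for f, e in product(source_words, target_words):
        features["wp|%s|%s" % (f, e)] = 1.0
        if category is not None:
            # Category-triggered variant: this feature only fires for source
            # sentences that carry the given text category.
            features["wp|%s|%s|cat|%s" % (f, e, category)] = 1.0
    return features

# Example with a hypothetical category label.
print(sparse_lexical_features(["apple", "case"], ["custodia", "apple"],
                              category="Cell_Phones"))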

3.2 Feed-Forward neural translation model

A neural network model previously used as a feature in a PBMT system is the neural network joint model (NNJM), a feed-forward architecture presented in (Devlin et al., 2014). Assuming the target sentence E : e_1, . . . , e_I and given the source sentence F : f_1, . . . , f_J, the NNJM predicts a target word e_i given a window f_{b_i-w/2}^{b_i+w/2} of size w + 1 around the corresponding source word and a target history e_{i-v}^{i-1} of length v. This context is represented in the model by the concatenation of word embeddings corresponding to the source and target words. Because of the dependence on target context, the model has to be evaluated during search for each translation hypothesis. Using a full softmax as output layer, this would be computationally prohibitive. Instead, we use noise-contrastive estimation (NCE) as described in (Zoph et al., 2016). We rely on the self-normalizing property of NCE and do not perform a manual normalization of the network outputs. To further reduce evaluation time, the hidden layer contributions of all words in the vocabulary at all possible input positions are precomputed before search. This leads to a model which is fast enough to be evaluated during the search.

Due to their flexibility, neural network models can easily be augmented with additional inputs to integrate any kind of context information. In this work, we integrate the product category information into the feed-forward model by creating a one-hot category vector and appending it to the concatenation of the word embeddings, as shown in Figure 1. The additional input can be included in the precomputation of the hidden layer in a straightforward manner, because it only adds one more input that can be looked up just like the ones from the source words. Variants like using an embedding for the category or appending it to the hidden layer have also been investigated. However, we have not seen significant differences in terms of the MT quality.

Figure 1: General architecture of the neural network joint lexical model: the embeddings of the source window f_{b_i-w/2}^{b_i+w/2} and the target history e_{i-v}^{i-1} are concatenated with a one-hot vector of the L2 product category (an optional input) and fed through a hidden layer to a noise-contrastive estimation output layer modeling p(e_i | f_{b_i-w/2}^{b_i+w/2}, e_{i-v}^{i-1}, L2).
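A minimal sketch of such an architecture is given below (our own illustration in Keras, not the implementation used in the paper): the source-window and target-history embeddings are concatenated with the one-hot category vector and passed through a single hidden layer. For brevity it uses a plain softmax output instead of the noise-contrastive estimation objective described above, and all layer sizes are illustrative.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_nnjm(src_vocab, tgt_vocab, n_categories,
               window=5, history=3, emb_dim=150, hidden_dim=750):
    # Source window around the aligned position b_i and target history e_{i-v}..e_{i-1}.
    src_win = layers.Input(shape=(window,), dtype="int32", name="source_window")
    tgt_hist = layers.Input(shape=(history,), dtype="int32", name="target_history")
    # One-hot L2 category vector, appended to the embedding concatenation.
    category = layers.Input(shape=(n_categories,), dtype="float32", name="l2_category")

    src_emb = layers.Flatten()(layers.Embedding(src_vocab, emb_dim)(src_win))
    tgt_emb = layers.Flatten()(layers.Embedding(tgt_vocab, emb_dim)(tgt_hist))
    concat = layers.Concatenate()([src_emb, tgt_emb, category])

    hidden = layers.Dense(hidden_dim, activation="tanh")(concat)
    # The paper trains this output with noise-contrastive estimation;
    # a full softmax is used here only to keep the sketch short.
    output = layers.Dense(tgt_vocab, activation="softmax")(hidden)
    return Model([src_win, tgt_hist, category], output)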

3.3 Bidirectional RNN translation model

Recurrent neural network models can be efficiently integrated into the framework of a phrase-based machine translation system under some circumstances (Alkhouli et al., 2015). In this

work, we have adopted the encoder part of the bidirectional encoder-decoder machine translation architecture described in (Bahdanau et al., 2014). This bidirectional translation model (BTM) takes the whole source sentence as input and estimates the probability p(e_i | f_1^J, b_i) of a translation of the source word at position b_i. This architecture is also similar to (Sundermeyer et al., 2014), but instead of the class-factored output layer we chose importance sampling (Jean et al., 2015) to train the model. Also, instead of introducing ε-tokens for unaligned words, we obtain a unique b_i by applying Devlin’s affiliation heuristic to the statistical word alignments (Devlin et al., 2014). Since the model does not depend on the target history, the outputs for all combinations of e_i and b_i can be precomputed before the search. Thus, it is feasible to calculate the full softmax layer without approximations during decoding.

Similar to the feed-forward model, we augment the BTM by adding category information as an additional input. In Figure 2, an example of the resulting network architecture is shown. The one-hot category vector is concatenated to the word embedding at each source position. Other methods, like treating the category as a special word in front of the actual sentence or replacing the one-hot vector with a category embedding, have also been investigated, but their performance was worse or the same as with the architecture in Figure 2.

Figure 2: General architecture of the bidirectional recurrent neural network lexical model: the embedding of each source word f_1, . . . , f_J is concatenated with a one-hot vector of the L2 category (an optional input), encoded by a bidirectional RNN, combined with the alignment information b_i in a forward RNN layer, and fed to a softmax output layer modeling p(e_i | f_1^J, b_i, L2).
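Again as an illustration only (not the authors' code), the sketch below builds the source-side encoder of such a model in Keras: the one-hot category vector is repeated and concatenated to every source word embedding before a bidirectional GRU encoder, and a position-wise softmax yields the target-word distribution from which the entry at the aligned position b_i is read off at decoding time. The additional forward RNN layer and the alignment handling of Figure 2 are omitted for brevity, and all dimensions and the fixed padded length are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_btm(src_vocab, tgt_vocab, n_categories,
              max_len=50, emb_dim=620, rnn_dim=1000):
    src = layers.Input(shape=(max_len,), dtype="int32", name="source_words")    # f_1 .. f_J (padded)
    category = layers.Input(shape=(n_categories,), dtype="float32", name="l2")  # one-hot L2

    emb = layers.Embedding(src_vocab, emb_dim)(src)
    # Repeat the sentence-level one-hot category and concatenate it to the
    # embedding of every source position.
    cat_seq = layers.RepeatVector(max_len)(category)
    enc_in = layers.Concatenate(axis=-1)([emb, cat_seq])

    # Bidirectional GRU encoder over the whole source sentence.
    states = layers.Bidirectional(layers.GRU(rnn_dim, return_sequences=True))(enc_in)
    # One target-word distribution per source position; during decoding the
    # distribution at the aligned position b_i is looked up.
    probs = layers.TimeDistributed(layers.Dense(tgt_vocab, activation="softmax"))(states)
    return Model([src, category], probs)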

3.4 Category-aware generative model

Here, the main idea is to introduce the category L2 as a hidden variable into the translation probability p(e|f) of the target word e given the source word f originating from a title of category L2:

p(e|f) = Σ_{L2} p(e, L2|f) = Σ_{L2} p(e|f) · p(L2|e, f) = Σ_{L2} p(e|f) · p(L2|e)    (1)

where p(L2|e) is the probability of a category L2 given a target word e. This probabilistic function implicitly penalizes word and phrase translations that are rarely observed in L2, and it favors target words frequently appearing in L2, but not in other categories. We should note that p(L2|e) is different from category-specific language models of the type p(e|L2). The main advantage of this model is that it can be trained on larger amounts of additional in-domain monolingual data in the target language that has category information.
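As a sketch of how such a lexicon could be built and applied (our illustration, under the assumption that p(L2|e) is estimated by simple relative frequencies; the helper names are hypothetical), p(L2|e) can be trained from category-labelled target-language text and then multiplied with p(e|f), i.e. the summand of Equation (1) that is active for the observed title category:

from collections import Counter, defaultdict

def train_category_lexicon(labelled_target_sentences):
    """Estimate p(L2 | e) by relative frequency from target-language
    sentences that carry an L2 category label."""
    counts = defaultdict(Counter)                 # counts[e][L2]
    for category, sentence in labelled_target_sentences:
        for e in sentence.split():
            counts[e][category] += 1
    return {e: {c: n / sum(cats.values()) for c, n in cats.items()}
            for e, cats in counts.items()}

def category_aware_score(p_e_given_f, p_l2_given_e, e, f, category):
    """p(e|f) * p(L2|e) for the category observed with the input title."""
    return p_e_given_f.get((e, f), 0.0) * p_l2_given_e.get(e, {}).get(category, 0.0)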

4 Experiments

We conduct comprehensive experiments with various methods to see the impact of meta-information on the translation of item titles in the e-commerce domain. We use two baseline phrase-based MT systems. The first one is based on the Moses toolkit (Koehn et al., 2007), and the second one is an in-house phrase decoder (Matusov and Köprü, 2010), which is similar to the Moses decoder. In both systems, we use standard SMT features, including word-level and phrase-level translation probabilities, the distortion model, and an n-gram LM. Due to the nature of the item titles, we did not use any lexicalized reordering models in the MT system. The distortion limit was set to 6. On the target side, we built a trigram LM, which is optimal for this task, using KenLM (Heafield, 2011) trained with modified Kneser-Ney smoothing (Chen and Goodman, 1996). The LM is trained on the target side of the bilingual data plus additional in-domain monolingual data composed of 60M words of item title data. In addition, we have also used a 5-gram operation sequence model (OSM) (Durrani et al., 2011) and a 7-gram joint translation and reordering (JTR) model (Guta et al., 2015), which share the same idea and concept, in the Moses-based and in-house systems, respectively. The feature weights are optimized using the k-best batch MIRA implementation provided in the Moses toolkit (Cherry and Foster, 2012). In the in-house decoder, the feature weights are tuned with minimum error rate training (MERT) (Och, 2003) on n-best lists of the development set. The MT quality is judged using the automatic case-insensitive BLEU (Papineni et al., 2002) and TER (Snover et al., 2006) scores. Statistical significance tests were conducted using approximate randomization tests (Clark et al., 2011).

We have implemented the NNJM described in Section 3.2 and the BTM described in Section 3.3 using TensorFlow.2 The trained models are then exported to our in-house decoder using TensorFlow’s C++ API. As the model is independent of the target history, we are able to precalculate scores for all phrase pairs that are selected in the phrase matching step for a given source sentence. This reduces the runtime cost, as the model does not need to be queried during the decoding. For training and evaluating feed-forward neural network models in Moses, we rely upon the Neural Probabilistic Language Model Toolkit (NPLM) (Vaswani et al., 2013). NPLM can be used to train both neural language models and joint models. We also integrate the models into our in-house decoder as language models.

2www.tensorflow.org


                        English       Italian
Train:  Sentences       10,231,392
        Tokens          117 M         115 M
        Vocabulary      493 K         582 K
        Singletons      239 K         257 K
Dev.:   Sentences       910           910
        Tokens          10,818        11,159
        Vocabulary      4,422         4,481
        OOVs            321           310
Test:   Sentences       910           910
        Tokens          10,814        11,241
        Vocabulary      4,487         4,532
        OOVs            337           321

Table 1: Corpus Statistics

We conduct experiments on an English-to-Italian e-commerce item title translation task. The training data is composed of in-house data (item titles, descriptions, etc.) as the in-domain corpus, and also of publicly available data that has been sampled according to its similarity to the in-domain data. The corpus statistics are summarized in Table 1. The size of the in-domain data is about 4.6% of the training data in terms of source-side tokens. Before calculating the corpus statistics, we apply the usual text pre-processing, including tokenization and replacement of numbers with a placeholder token, and also some domain-dependent processing, such as replacing product specifications (e.g., 6S and 1080p) with a general token. The in-domain data also has some meta-information that identifies the category of each sentence/segment. We denote this meta-information as L2. In all experiments we define six categories: five selected L2 categories plus one other category. Meta-information for the out-of-domain training data is inferred using a state-of-the-art text classification algorithm trained on large in-domain monolingual data in the source language, for which the L2 category is available. The Dev and Test sets also contain the category information, and they have two human reference translations.

In Table 2, the translation results are shown in terms of both BLEU score and TER. The Moses baseline has a slightly better quality than the in-house baseline system (seventh row). We report in-house results since the NNJM and BTM models, described in Section 3.2 and Section 3.3 respectively, are implemented in this in-house system. We also report the results of a state-of-the-art baseline NMT system as described in (Chen et al., 2016). Based on Chen et al. (2016), and also on our experiments on another language pair in the same task, there is room to improve this baseline using techniques like back-translation (Sennrich et al., 2015) and guided alignment, but the performance of a single system (and even with an ensembling technique) is lower than a strong phrase-based baseline. The lower performance is due to the nature of the item title data, which is not well suited to the NMT approach. Item titles are not grammatical, they are very irregular, and, like other user-generated data, they are very noisy. Therefore, we confine ourselves to reporting a simple NMT baseline to show the characteristics of the task.

The system in the second row is based on the work of Tamchyna et al. (2016); we have used the default features proposed in the original work: source bag of words, target bi-grams, source indicator and target indicator. We have also conducted two experiments to train the classifier model on in-domain data and on mixed-domain data; we report the translation results when the classifier model is trained only on in-domain data.

                                            Test
 #   System                                 BLEU [%]   TER [%]
 1   Moses Baseline                         37.4       45.7
 2   + Tamchyna et al. (2016)               37.4       45.9
 3   + Word-pair SF                         37.6       45.9
 4   + Category SF                          38.3       44.8
 5   + Category SF + Mathur et al. (2015)   37.5       45.4
 6   + NNJM                                 37.7       45.0
 7   In-house Baseline                      37.0       46.0
 8   + BTM                                  37.7       45.5
 9   + NNJM                                 37.5       45.5
 10  Pure NMT baseline                      28.0       54.9

Table 2: Experimental results: English-to-Italian item title translation task.

In the third and fourth experiments in Table 2, we use the word-pair feature, and the word-pair plus category information as sparse features (SF), respectively. We adopt the method presented in (Hasler et al., 2012b) in our case study, replacing topic models with the predefined category information. We also include the work of Mathur et al. (2015) in our experiments, with the difference that we use category information instead of topic models. As required by this model, we have augmented each phrase in the phrase table with a six-value normalized vector that represents the membership value of the phrase to each category. These membership values are simply normalized frequencies of the categories in the parent sentence pairs of a phrase pair. Parent sentence pairs are those that include a given phrase pair.

The next systems in the table are based on neural network models. In the NNJM, we set a window of four source words. Since the translation of polysemous words is not directly dependent on the previous target words, and also to make this model more comparable to the BTM, we do not use target words in the NNJM. In the NNJM, the input and output word embedding sizes are set to 150 and 750, respectively. We use a single hidden layer in the NNJM. Despite the same settings in the sixth and ninth rows, the implementation of the NNJM in the sixth row is based on the NPLM toolkit in Moses, while the implementation in the ninth row is as described in Section 3.2 using the TensorFlow toolkit. In the BTM, we have used an embedding size of 620, an RNN size of 1,000, and GRU cells. The learning rate was set to 0.0002, decaying by 0.9 each epoch. The vocabulary is set to the 100,000 most frequent words for both the NNJM and the BTM.

As shown in Table 2, the differences between the MT systems are relatively small. Among other reasons, the limited number of samples per targeted category in the test set could explain these small differences. The number of occurrences for each category is shown in the last row of Table 3; a similar distribution of categories exists for the development set. In Table 3, we also show the detailed improvements achieved on the five specific L2 categories. Here, we observe much larger improvements in the selected categories. We observe that the best performing system in Table 2 is not necessarily the best system for our specific goal of embedding the corresponding additional information into the translation process. As shown in Table 3, the in-house system plus NNJM has the most consistent improvements over its corresponding baseline. For a better comparison of the results, see the diagrams in Figure 3 and Figure 4. Please note that the results for all neural models are without category information so far.

Although we expect improvements of MT quality on the particular categories by adding the category information as an additional signal into the neural models, we have not seen any significant improvements.
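The normalized category membership vectors used for the Mathur et al. (2015)-style features described above can be computed as in the following sketch (our illustration; the data structures and category labels are assumptions), where each phrase pair receives the relative frequencies of the categories of its parent sentence pairs:

from collections import Counter

CATEGORIES = ["cat_1", "cat_2", "cat_3", "cat_4", "cat_5", "other"]  # illustrative labels

def phrase_category_vectors(phrase_occurrences):
    """phrase_occurrences: iterable of (phrase_pair, sentence_category) tuples,
    one entry per parent sentence pair from which the phrase pair was extracted."""
    counts = {}
    for phrase_pair, category in phrase_occurrences:
        counts.setdefault(phrase_pair, Counter())[category] += 1
    vectors = {}
    for phrase_pair, cat_counts in counts.items():
        total = sum(cat_counts.values())
        # Normalized frequency of each category among the parent sentence pairs.
        vectors[phrase_pair] = [cat_counts[c] / total for c in CATEGORIES]
    return vectors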

                                      Cat. I       Cat. II      Cat. III     Cat. IV      Cat. V
 System                               BLEU  TER    BLEU  TER    BLEU  TER    BLEU  TER    BLEU  TER
 Moses Baseline                       58.5  27.3   36.2  44.2   35.8  47.2   29.6  51.7   34.1  48.2
 + Tamchyna et al. 2016               58.5  29.6   37.4  45.5   35.0  48.5   30.0  51.9   34.0  49.0
 + Word-pair SF                       58.6  28.7   35.7  45.3   36.0  47.7   29.3  51.7   33.7  49.5
 + Category SF                        60.6  27.1   36.2  43.9   35.8  47.7   30.1  50.2   34.0  48.7
 + Category SF + Mathur et al. 2015   59.3  28.0   36.8  42.8   38.0  46.9   29.6  50.7   34.3  48.2
 + NNJM                               58.5  28.7   36.2  44.3   36.0  46.9   33.6  48.6   33.8  47.1
 In-house Baseline                    57.7  28.5   36.9  44.3   34.9  48.0   29.2  52.4   33.9  47.7
 + BTM                                59.7  27.8   37.7  44.1   36.4  46.7   28.1  51.7   34.4  46.6
 + NNJM                               59.9  27.8   37.8  43.4   36.6  46.4   32.2  50.0   35.1  46.6
 # test sentences                     33           62           27           32           30

Table 3: Translation of English titles from five selected product categories into Italian (BLEU scores and TER in %).

We thought we might need to send a stronger signal in the training process and therefore tried different ways to embed the category information into the model, but the results in all cases were at almost the same level. These results are in contrast with the results reported in (Chen et al., 2016), where some improvements were shown using category information in an NMT architecture. This disagreement might be due to the different architectures and also to the different baselines. To investigate why the category information does not contribute to MT quality improvements in our experiments, we conducted some experiments measuring the perplexity on the training corpus when the BTM model is trained with and without L2 information, and we observed a small increase in perplexity when we used the L2 categories. We may interpret these results in two directions: first, the category information is too sparse in our settings to be useful, and second, the category information carries no additional information over the text itself, especially when the text is processed globally in a neural network model.

We also conducted an in-house human analysis of polysemous words to better understand the situation. We observe some examples where L2 categories help to disambiguate the meaning of the words. At the same time, there are other polysemous words for which category information cannot help. For example, the sense of the word Vans can be identified if we know it is from the Motors category, meaning the plural of Van, or from the Clothing category, meaning a brand name. Another example is a word that has two different meanings in the Kitchen and in the Music Instruments categories. However, there are cases where product categories will not help to disambiguate the meaning of the word; e.g., the word Ship in the category Toys may have at least two different meanings, as a noun or as a verb. Therefore, L1 or L2 category information might not be helpful for some polysemous words.

The system described in Section 3.4 uses an additional probability p(L|e) for the input sentence category given a target word candidate. We have implemented it as an additional lexical model that assigns a score to each phrase pair. Using this model with a tuned weight in the log-linear model combination did not result in a significant improvement of automatic MT measures; the BLEU/TER scores remained basically the same as for the in-house baseline, line 7 of Table 2. However, this system was stable in the sense that its translations did not differ much from the baseline, and the observed changes predominantly affected content words. A manual analysis showed several examples where translations of polysemous words which were inappropriate for a certain product category were changed for the better. For instance, the Italian translation of the term latte used in an English title in the category Bags and Accessories as a bag’s color was corrected from latte macchiato to color crema.

Figure 3: Detailed BLEU results.

Figure 4: Detailed TER results.

Also, in the Kitchen Appliances category, the word tamper was incorrectly translated as manomissione (fabrication) by the baseline system. The system that uses the category-specific word lexicon p(L|e) correctly translated it as pestello.

In general, the systems with category-aware features, implicit or explicit, bring moderate improvements over a baseline system in terms of automatic MT error measures like BLEU. However, it can be argued that these automatic MT measures do not reflect the impact of polysemous words, since such words occur rarely as compared to all other words, even if we consider all polysemous words and not only those which have a wrong translation in the baseline. A human evaluation focusing on the translation of polysemous words can potentially justify the soundness of the employed method. For such an evaluation, we randomly selected 300 item titles that contained at least one of the polysemous words identified as such by domain expert linguists. We asked human translators to answer five questions for each translation; the most important and relevant question was whether the identified polysemous word in the text was correctly translated or not.

The human evaluation results show that, although the proposed systems bring moderate improvements (about the same as we observe in the above tables), one third of the polysemous words were not translated correctly. We attribute the weak performance of the presented models on polysemous words to the bias in the training data towards the most frequent meanings of such words. For example, the translation of Apple as a brand name is by far more frequent in our training data than the translation of this word as a common noun meaning a fruit. Optimization for the BLEU score seems to additionally increase this bias, since the tuning set is in most cases also biased towards the most frequent meaning of such words.

5 Conclusion

We employed several different ways to incorporate explicit meta-information or larger context to better translate polysemous words or improve MT quality in general. We explored existing state-of-the-art methods that can potentially help in this task or can accept another input as meta-information, e.g., (Hasler et al., 2012b; Mathur et al., 2015; Tamchyna et al., 2016; Devlin et al., 2014). To better exploit the source-side topic/category labels, we introduced a bidirectional LSTM to encode the entire source sentence context to translate a word. We investigated different ways of incorporating meta-information in the encoder. In addition, we proposed a novel generative model that can leverage topic-labeled target monolingual data.

We conducted comprehensive experiments on different ways of using additional meta-information in the translation process, including both given human-labeled and automatically predicted meta-information. Our case study was an e-commerce English-to-Italian translation task. We observed improvements of up to 3% in terms of the BLEU score for some input text categories. Finally, we performed a human evaluation to confirm and explain the improvements in automatic MT quality measures. We realized that, although the observed improvements were reconfirmed by human evaluation, there were many polysemous words in the test set that were still not translated correctly. The take-home message of this research is that the problem of how to best use meta-information in MT for correct, in-topic translation of polysemous words and phrases is still far from being solved.

In the future, we aim to use other types of meta-information that may be better suited for disambiguating the meaning of polysemous words. For example, we plan to leverage automatically predicted domain-specific named-entity tags as meta-data for translation. Another area for future work is how to use meta-information more effectively, overcoming the data sparseness and bias problems. We also plan to adopt a hybrid NMT and SMT approach similar to (Dahlmann et al., 2017) to improve the translation of polysemous words in the item title domain. In this way, we can benefit from the long context coverage of the NMT system to find a more appropriate translation for polysemous words based on the context, and also benefit from more control over the generation process of the SMT system and its features like text override.

References

Alkhouli, T., Rietig, F., and Ney, H. (2015). Investigations on phrase-based decoding with recurrent neural network language and translation models. In Proceedings of the Tenth Workshop on Statistical Machine Translation, WMT@EMNLP 2015, 17-18 September 2015, Lisbon, Portugal, pages 294–303.

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate.

Carpuat, M. and Wu, D. (2007). Improving statistical machine translation using word sense disambiguation. In EMNLP-CoNLL 2007, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, June 28-30, 2007, Prague, Czech Republic, pages 61–72.

Chen, S. F. and Goodman, J. (1996). An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, ACL ’96, pages 310–318, Stroudsburg, PA, USA. Association for Computational Linguistics.

Chen, W., Matusov, E., Khadivi, S., and Peter, J.-T. (2016). Guided alignment training for topic-aware neural machine translation. AMTA 2016, page 121.

Cherry, C. and Foster, G. F. (2012). Batch tuning strategies for statistical machine translation. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 3-8, 2012, Montréal, Canada, pages 427–436.

Clark, J. H., Dyer, C., Lavie, A., and Smith, N. A. (2011). Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 176–181. Association for Computational Linguistics.

Dahlmann, L., Matusov, E., Petrushkov, P., and Khadivi, S. (2017). Neural machine translation leveraging phrase-based models in a hybrid search. In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP).

Devlin, J., Zbib, R., Huang, Z., Lamar, T., Schwartz, R. M., and Makhoul, J. (2014). Fast and robust neural network joint models for statistical machine translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pages 1370–1380.

Durrani, N., Schmid, H., and Fraser, A. M. (2011). A joint sequence translation model with integrated reordering. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, pages 1045–1054.

Eidelman, V., Boyd-Graber, J., and Resnik, P. (2012). Topic Models for Dynamic Translation Model Adaptation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 115–119, Jeju Island, Korea. Association for Computational Linguistics.

Guta, A., Wuebker, J., Graca, M., Kim, Y., and Ney, H. (2015). Extended translation models in phrase-based decoding. In Proceedings of the Tenth Workshop on Statistical Machine Translation, WMT@EMNLP 2015, 17-18 September 2015, Lisbon, Portugal, pages 282–293.

Hasler, E., Blunsom, P., Koehn, P., and Haddow, B. (2014a). Dynamic Topic Adaptation for Phrase-based MT. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 328–337, Gothenburg, Sweden. Association for Computational Linguistics.

Hasler, E., Haddow, B., and Koehn, P. (2012a). Sparse Lexicalised Features and Topic Adaptation for SMT. In Proceedings of the Workshop on Spoken Language Translation (IWSLT), pages xxx–xxx.

Hasler, E., Haddow, B., and Koehn, P. (2012b). Sparse lexicalised features and topic adaptation for SMT. In International Workshop on Spoken Language Translation, IWSLT 2012, Hong Kong, December 6-7, 2012, pages 268–275.

Hasler, E., Haddow, B., and Koehn, P. (2014b). Combining domain and topic adaptation for SMT. In Proceedings of the Eleventh Conference of the Association for Machine Translation in the Americas (AMTA), volume 1, pages 139–151.

Heafield, K. (2011). KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197. Association for Computational Linguistics.

Jean, S., Cho, K., Memisevic, R., and Bengio, Y. (2015). On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1–10.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180. Association for Computational Linguistics.

Mathur, P., Federico, M., Köprü, S., Khadivi, S., and Sawaf, H. (2015). Topic adaptation for machine translation of e-commerce content. Proceedings of MT Summit XV, page 270.

Matusov, E. and Köprü, S. (2010). AppTek’s APT Machine Translation System for IWSLT 2010. In Federico, M., Lane, I., Paul, M., and Yvon, F., editors, International Workshop on Spoken Language Translation, IWSLT, pages 29–36.

Mauser, A., Hasan, S., and Ney, H. (2009). Extending statistical machine translation with discriminative and trigger-based lexicon models. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, 6-7 August 2009, Singapore, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 210–218.

Och, F. J. (2003). Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pages 160–167.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2015). Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas, pages 223–231.

Sundermeyer, M., Alkhouli, T., Wuebker, J., and Ney, H. (2014). Translation modeling with bidirectional recurrent neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 14–25.

Tamchyna, A., Fraser, A. M., Bojar, O., and Junczys-Dowmunt, M. (2016). Target-side context for discriminative models in statistical machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.

Vaswani, A., Zhao, Y., Fossum, V., and Chiang, D. (2013). Decoding with large-scale neural language models improves translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1387–1392, Seattle, Washington, USA. Association for Computational Linguistics.

Zoph, B., Vaswani, A., May, J., and Knight, K. (2016). Simple, fast noise-contrastive estimation for large RNN vocabularies. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, pages 1217–1222.

Translation Quality and Productivity: A Study on Rich Morphology Languages

Lucia Specia l.specia@sheffield.ac.uk
Department of Computer Science, University of Sheffield, Sheffield, S1 4DP, UK
Kim Harris kim [email protected]
text & form GmbH, 10179 Berlin, Germany
Frédéric Blain f.blain@sheffield.ac.uk
Department of Computer Science, University of Sheffield, Sheffield, S1 4DP, UK
Aljoscha Burchardt [email protected]
Viviven Macketanz [email protected]
Language Technology Lab, DFKI, Alt-Moabit 91c, 10559 Berlin, Germany
Inguna Skadiņa [email protected]
Tilde, Vienibas gatve 75a, Riga, LV 1004, Latvia
Matteo Negri [email protected]
Marco Turchi [email protected]
Fondazione Bruno Kessler, Trento, 38100, Italy

Abstract

This paper introduces a unique large-scale machine translation dataset with various levels of human annotation combined with automatically recorded productivity features such as time and keystroke logging and manual scoring during the annotation process. The data was collected as part of the EU-funded QT21 project and comprises 20,000–45,000 sentences of industry-generated content with translation into English and three morphologically rich languages: English–German/Latvian/Czech and German–English, in either the information technology or life sciences domain. Altogether, the data consists of 176,476 tuples including a source sentence, the respective machine translation by a statistical system (additionally, by a neural system for two language pairs), a post-edited version of such translation by a native-speaking professional translator, an independently created reference translation, and information on post-editing: time, keystrokes, Likert scores, and annotator identifier. A subset of 2,000 sentences from this data per language pair and system type was also manually annotated with translation errors for deeper linguistic analysis. We describe the data collection process, provide a brief analysis of the resulting annotations and discuss the use of the data in quality estimation and automatic post-editing tasks.

1 Introduction

Data-driven approaches to machine translation (MT) rely largely on datasets of source sentences and their corresponding translations previously created by humans, so-called parallel corpora. MT systems, be they statistical or neural, are built in a static fashion and (if at all) updated from time to time as more translations become available. With the popularisation of post-editing

(PE), a natural question is whether the corrected version of the MT output could be used in feedback loops to improve the current system via model retraining, model tuning or the addition of explicit model components. Additionally, by studying PE data, one can get insights into the errors made by the MT system and try to remedy them in different ways. PE data can also be used to build and benchmark metrics for the automatic evaluation of MT output, as well as quality estimation metrics and automatic PE systems.

To facilitate research in these and related areas, we have created a unique large-scale dataset with various levels of human annotation combined with automatically recorded productivity features. The data comprises 20,000–45,000 sentences of industry-generated content for English from or into three morphologically rich languages and was collected as part of the EU-funded QT21 project. The PE of all four language pairs was performed using a tool that records detailed process and product information at the sentence level during PE, including time, keystrokes, actual edits and Likert scores for the PE effort as given by the translator immediately after completion of the editing. Most of the data was translated by a phrase-based statistical MT (PBMT) system. In addition, subsets of 15,000–20,000 sentences for EN–DE and EN–LV, respectively, were also translated using a neural MT (NMT) engine that was trained on exactly the same data used to train the original PBMT system. The PE of identical input data for both the PBMT and NMT systems facilitates large-scale direct comparisons between the actual output of these systems, as well as between process cues. For example, PE productivity can be calculated and compared using the time and keystroke information recorded during PE. The “preference” of translators can be compared through the scores they gave to the perceived quality of the output. A number of other comparative analyses and benchmarking in both research and industry scenarios become possible with this data. Finally, a subset of 2,000 sentences was selected for each language pair and MT system type and manually annotated with word-level errors for deeper linguistic analysis. Both PE and error annotations were performed by professional translators.

While other datasets with PE data have been created in the past and also released for research purposes, these are limited in their scale (e.g. see those used for the WMT13–14 shared tasks on quality estimation1), have been post-edited by non-professional translators (Wisniewski et al., 2013; Bojar et al., 2015), or make only the actual post-edits available, providing no additional information on the process and no explicit annotations. The most notable example of the latter is the Autodesk dataset (Zhechev, 2012). It contains sentences predominantly belonging to Autodesk software user manuals, covering 13 language pairs with English as the source language. The source sentence, its machine translation and its post-edit are provided. The translated sentences are produced by an MT system or are translation memory suggestions with a fuzzy match score larger than 75%.

In the remainder of this paper we first describe our data sources (Section 2) and the MT systems built (Section 3) to translate this data. We introduce the PE process and its results in Section 4, and the error annotation in Section 5.
In Section 6 we present two uses of the dataset.

2 Data

The post-edited and annotated data described in this paper belongs to two specific domains: information technology (IT) and life sciences. These domains were chosen because of the high demand for this type of content in multiple languages, due to its economic impact on businesses active on global markets where language is key. The use of this data in research can therefore play a significant role in building the necessary bridges between the constituencies most interested in achieving progress in the field of MT: research and industry.

1http://www.statmt.org/wmt14/

Language   #           # source     # target     Domain   Data
pair       sentences   tokens       tokens                provider
EN–DE      80,874      1,322,775    1,312,975    IT       Adobe
EN–CS      81,352      1,332,654    1,175,463    IT       Adobe
EN–LV      231,028     3,713,803    3,168,740    Pharma   EMEA
DE–EN      193,637     3,120,482    3,228,761    Pharma   EMEA

Table 1: Domain-specific datasets: number of sentences and source and target tokens.

Training data     EN–DE    EN–CS    EN–LV     DE–EN
# sentences       21,873   32,352   204,528   135,884
# source words    0.53     0.59     3.19      2.41

Table 2: Statistics on the in-domain training data. The number of words is reported in millions.

Four sets of parallel data in four language combinations (English–German/Latvian/Czech and German–English) were selected from the web. English Adobe software manuals translated into German and Czech were chosen for the IT domain, and a subset of the European Medicines Agency (EMEA) corpus was selected for the life sciences domain (which we also refer to as “pharma”) to cover the English–Latvian and German–English pairs.2 To create datasets that can satisfy different research needs and thus increase their usability, a set of criteria was applied to data selection and pre-processing. For English–German/Czech and German–English, sentences that did not end with a punctuation mark or contained fewer than three or more than 35 words were discarded, and duplicate sentences were removed. These strategies reduced the number of sentence pairs by approx. 45%. For English–Latvian, part of the parallel sentences was obtained by extracting textual sentences from PDF files in the EMEA repository. First, we used Adobe Acrobat v10 Professional to convert the PDF files to HTML format, as this preserved most of the original document structure. Then we ran customised scripts to convert the HTML files to plain text and clean the data. The Microsoft Bilingual Sentence Aligner (Moore, 2002) was used for sentence alignment of the parallel plain text files. Duplicate sentence pairs and sentences with fewer than three or more than 35 words were removed. This sentence size filtering only marginally affected the size of the final corpus. The statistics of the final sets are reported in Table 1.

For each language pair, we selected a subset of data for annotation (see Table 5), and used the remaining sentence pairs as in-domain training data to build the MT systems (Section 3). This remaining data was split into training (see Table 2), development (2,000) and test (2,000) sets.
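A minimal sketch of the filtering criteria described above is given below (our illustration; the exact punctuation set and whether the length constraint is applied to the source side, the target side or both are assumptions):

import re

def filter_sentence_pairs(sentence_pairs, min_words=3, max_words=35):
    """Keep pairs whose source sentence ends with a punctuation mark,
    contains 3-35 words, and has not been seen before."""
    seen = set()
    kept = []
    for src, tgt in sentence_pairs:
        if not re.search(r"[.!?:;]\s*$", src):
            continue
        n_words = len(src.split())
        if n_words < min_words or n_words > max_words:
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept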

3 MT Engine Building

3.1 Training Data

A crucial aspect for creating a set of reliable post-edited sentences and error annotations is the availability of domain-adapted translations. This is necessary because a generic translation system is not able to correctly translate domain-specific terms or expressions, which would, in turn, cause translators to rewrite translations from scratch, rendering accurate error annotation impossible.

2The German–English dataset was created by taking the available English–German data and then inverting the language direction. This is not ideal; however, very little domain-specific data exists for under-resourced language pairs, including those whose source language is German.

              EN–DE              EN–CS              EN–LV              DE–EN
              Parallel  Mono     Parallel  Mono     Parallel  Mono     Parallel  Mono
In-domain     7.2       -        -         -        0.181     -        2.09      2.35
Out-domain    12.7      -        50.34     51.46    -         -        -         -

Table 3: External resources collected to train the MT systems. The reported numbers represent millions of sentences.

When building a domain-adapted MT system, we rely on different external resources, depending on the size of the in-domain data. For the language pairs for which there are fewer than 100,000 in-domain sentence pairs (i.e. EN–DE and EN–CS), a large collection of in- and out-of-domain monolingual and parallel corpora was gathered from the web, while for the remaining languages (EN–LV and DE–EN) only in-domain corpora were used. This process resulted in:

• EN–DE: Over 20 million generic and in-domain sentence pairs obtained by merging the datasets available in the OPUS (Tiedemann, 2012), TAUS, WMT and JRC3 repositories (e.g. Europarl, CDEP, CommonCrawl, etc.);

• EN–CS: Over 51 million generic and in-domain sentence pairs available in the CzEng 1.6 dataset (Bojar et al., 2016b).4 In addition, since we are translating into a language with free word order, we used a large collection (more than 50M sentences) of monolingual generic data obtained from the Translation task at WMT16;

• EN–LV: Over 385,000 parallel medical sentences from the EMEA corpus available in OPUS and the most recent documents from the EMEA website (years 2009-2014);

• DE–EN: Over 2 million in-domain sentence pairs collected from OPUS and the data released for the medical translation task at WMT14 (Bojar et al., 2014). These resources include MuchMore, PatTr, and the Wikipedia parallel titles. In addition to these parallel sentences, monolingual data (approx. 2 million sentences) was obtained from the medical translation task at WMT14.

A summary of the external resources used to train the MT systems is shown in Table 3.

3.1.1 Data Selection

In the MT literature, it has been shown that when large generic datasets and a small in-domain corpus exist, the use of data selection techniques can help improve translation quality (Eetemadi et al., 2015). To optimally leverage a domain-specific corpus, we used cross-entropy-based selection for monolingual data (Moore and Lewis, 2010), its extended version for bilingual texts proposed by Axelrod et al. (2011), and the latent-domain translation method (Cuong and Simaan, 2014).

Entropy-based method: Originally proposed by Gao and Zhang (2002), entropy-based approaches consist in computing the perplexity score of each sentence of a generic corpus against both an in-domain language model (LM) and an LM trained on the generic corpus. The sentences are then ranked according to the difference between their two perplexity scores. Once all of the generic sentences have been ranked, the size of the subset to extract is determined by minimising the perplexity of a development set against an LM trained on an increasing amount of the sorted corpus (e.g. 5%, 10%, ...). According to (Moore and Lewis, 2010), perplexity

3https://ec.europa.eu/jrc/en/language-technologies 4http://ufal.mff.cuni.cz/czeng/czeng16pre

We used the freely available open-source tool XenC (Rousseau, 2013).

Latent-domain translation method: This technique assigns priors to the different domains that make up the generic data set. The goal is to estimate the probability that a sentence pair belongs to the in- or out-of-domain data, using in-domain corpus statistics as a prior. An Expectation-Maximisation training algorithm is derived and used to estimate the out-of-domain models (given only in-domain and mixed-domain data). This technique provides the selected data directly, without the need to choose a cut-off point in the ranked list of sentence pairs.

Both methods were first tested on the EN–DE language pair, and the best performing method was applied to EN–CS. In our experiments, we used the data shown in Table 1 as in-domain data and the concatenation of the data in Table 3 as out-of-domain data. Although an in-domain corpus exists for EN–DE in the additional resources, it represents a mix of datasets resulting from a different distribution compared to the training data in Table 1. For this reason, all corpora in the additional resources are considered out-of-domain data. The perplexity computed on the target side of the development set using all available data is 207. Applying the data selection methods decreased it significantly, indicating that selecting data in this fashion can be advantageous. The entropy-based method achieved a perplexity of 150, and selecting only the top 15% of the ranked sentences resulted in 3.3 million sentence pairs. The latent-domain method obtained a similar perplexity (157) but selected a larger number of sentences. For this reason, the entropy-based technique was also used for EN–CS. In this case, the perplexity is higher than for EN–DE (1900), but using the top 5% of the ranked data (2.5 million sentences) allowed us to significantly reduce it to 1300. These high perplexity values stem from the fact that the external resources for EN–CS do not contain any IT data.
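The cross-entropy-difference criterion described above is straightforward to prototype. The sketch below is a minimal illustration rather than the actual pipeline used here (which relied on XenC): it uses add-one-smoothed unigram language models purely to keep the example short, whereas real setups use higher-order LMs, and all function names are our own.

    import math
    from collections import Counter

    def unigram_lm(sentences):
        """Train an add-one-smoothed unigram LM; returns a per-word log-prob function."""
        counts = Counter(w for s in sentences for w in s.split())
        total, vocab = sum(counts.values()), len(counts) + 1
        return lambda w: math.log((counts.get(w, 0) + 1) / (total + vocab))

    def cross_entropy(logprob, sentence):
        words = sentence.split()
        return -sum(logprob(w) for w in words) / max(len(words), 1)

    def moore_lewis_select(in_domain, generic, fraction=0.15):
        """Rank generic sentences by H_in(s) - H_gen(s); keep the lowest-scoring fraction."""
        lp_in, lp_gen = unigram_lm(in_domain), unigram_lm(generic)
        scored = sorted(generic,
                        key=lambda s: cross_entropy(lp_in, s) - cross_entropy(lp_gen, s))
        return scored[:int(len(scored) * fraction)]

    # Toy usage: keep the half of the generic pool that looks most in-domain.
    selected = moore_lewis_select(["click the zoom factor icon"],
                                  ["the cat sat on the mat", "open the settings dialog"],
                                  fraction=0.5)
    print(selected)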

3.2 MT Systems

Different systems were built for each language pair, using the selected plus in-domain data for EN–DE and EN–CS, and only the in-domain data for the other language pairs.

• EN–DE: Two different MT systems were created: a PBMT and an NMT system. The PBMT system was trained on all of the selected parallel training data. The phrase table was adapted to the in-domain data using the approach proposed in (Niehues and Waibel, 2012). To deal with complex reordering, this system uses a pre-reordering technique (Herrmann et al., 2013) in combination with lexical reordering. In addition, it takes advantage of two word-based n-gram language models and three additional non-word language models, namely, two automatic word class-based (Och, 1999) language models using 100 and 1,000 word classes, and a POS-based language model using fine-grained POS tags (Schmid and Laws, 2008). For the NMT system, we used the Nematus toolkit (Sennrich et al., 2017), which is an implementation of the attentional encoder-decoder architecture (Bahdanau et al., 2014). To handle the large vocabulary, the training data was first segmented using the byte-pair encoding compression algorithm (Sennrich et al., 2016), resulting in a vocabulary of 40,000 sub-word units for both languages (a sketch of this segmentation step is given after this list). We used mini-batches of 100, word embeddings of 500 dimensions, and gated recurrent unit layers of 1,024 units. The maximum sentence length was set to 50. The models were trained using Adam and by reshuffling the training set at each epoch. The NMT system was trained on the selected data and then fine-tuned on the in-domain data.

• EN–CS: The PBMT system was trained using Moses (Koehn et al., 2007) combined with TectoMT (Žabokrtský et al., 2008). This was done by adding the source development and test sentences and their translations obtained by TectoMT as additional (synthetic) parallel data to the Moses system previously trained on the selected data.

           EN–DE            EN–CS     EN–LV            DE–EN
        PBMT    NMT         PBMT      PBMT    NMT      PBMT
        35.9    45.8        38.7      46.5    38.4     53.4

Table 4: BLEU score of the PBMT and NMT systems on different language pairs.

This new corpus and the in-domain data were used to train separate phrase tables. At test time, we ran Moses using all of the phrase tables and corrected its output using Depfix (Rosa et al., 2012). In addition, we trained a 7-gram LM on word forms from all monolingual resources. Similarly to the EN–DE system, two additional LMs over morphological tags were built to help maintain morphological coherence in the translation output. The system is described in (Tamchyna et al., 2016).

• EN–LV: The PBMT system was trained on Tilde’s MT platform (Vasiļjevs et al., 2012). The system is based on the Moses toolkit using the standard components. Nematus with sub-word units was used to train the NMT system with a vocabulary size of 40,000 sub-words. The models were trained with a projection (embedding) layer of 500 dimensions, recurrent units of 1,024 dimensions, a batch size of 20 and dropout enabled. All other parameters were set to their default values.

• DE–EN: The PBMT system was trained using the same components and adaptation techniques as those used for the EN–DE model.
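For readers unfamiliar with byte-pair encoding, the following minimal sketch shows the core merge-learning loop on a toy vocabulary. It illustrates the algorithm of Sennrich et al. (2016) and is not the actual subword preprocessing used for these systems; the 40,000-merge setting reported above is only mirrored symbolically here.

    import re
    from collections import Counter

    def get_pair_stats(vocab):
        """Count adjacent symbol pairs, weighted by word frequency."""
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge_pair(pair, vocab):
        """Replace every whitespace-delimited occurrence of the pair with its concatenation."""
        bigram = re.escape(" ".join(pair))
        pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
        return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

    def learn_bpe(words, num_merges):
        # Words are represented as space-separated characters plus an end-of-word marker.
        vocab = Counter(" ".join(list(w)) + " </w>" for w in words)
        merges = []
        for _ in range(num_merges):
            pairs = get_pair_stats(vocab)
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            vocab = merge_pair(best, vocab)
            merges.append(best)
        return merges

    # Toy example; the systems above used on the order of 40,000 merge operations.
    print(learn_bpe(["lower", "lowest", "newer", "wider"], num_merges=10))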

The results of the different systems for each of the language pairs are reported in Table 4 according to BLEU (Papineni et al., 2002). The parameters of the models were optimised on the development set and the final results computed on the test set. When comparing the PBMT and NMT performance, we noticed that when using a large collection of training data (i.e. EN–DE) the NMT system can significantly outperform the PBMT system, as shown in several evaluation campaigns. However, when the training data is limited (i.e. EN–LV), the PBMT system performs better than the NMT system. The language pairs with the lowest out-of-vocabulary rate (EN–LV: 0.2 and DE–EN: 0.5) achieve the best BLEU scores. The DE–EN system obtains better performance compared to EN–LV because it can leverage more in-domain training data.
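BLEU, the metric behind Table 4, combines clipped n-gram precision with a brevity penalty. The sketch below is a compact, self-contained reimplementation for illustration only; the scores in this paper were computed with standard evaluation scripts, which additionally handle tokenisation and smoothing details not shown here.

    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def corpus_bleu(hypotheses, references, max_n=4):
        """Corpus-level BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
        matches, totals = [0] * max_n, [0] * max_n
        hyp_len = ref_len = 0
        for hyp, ref in zip(hypotheses, references):
            h, r = hyp.split(), ref.split()
            hyp_len, ref_len = hyp_len + len(h), ref_len + len(r)
            for n in range(1, max_n + 1):
                h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
                matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
                totals[n - 1] += max(len(h) - n + 1, 0)
        if min(matches) == 0:            # a zero precision makes the log undefined
            return 0.0
        log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
        bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
        return 100 * bp * math.exp(log_prec)

    # Toy usage with a single hypothesis/reference pair.
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))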

4 Post-Editing Process

Post-editing was performed using the PET tool (Aziz et al., 2012). This is a simple, freely available, open-source tool that tracks PE using a number of indicators. Figure 1 shows a screenshot of the tool with an English–German PE task. The tool tracks the process of PE, records PE time per sentence, and logs all keystrokes pressed by the annotator. This allows us to reproduce the PE activity, which can be useful for research on topics such as the PE process, productivity gains, and automatic PE. The following information was recorded during PE:

• Editing time: time spent translating or editing a unit.
• Keystrokes: number of keys pressed during PE, according to the type of key (deletion, alpha-numeric, etc.).
• HTER: edit distance between the draft translation and its post-edited version (a simplified sketch of this computation follows this list).
• Evaluation: quality assessment based on a pre-defined set of options. We ask a question about the usefulness of the draft translation for PE (top left corner in Figure 1).
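HTER is essentially a word-level edit distance between the MT output and its post-edited version, normalised by the length of the post-edit. The sketch below illustrates that core idea; it implements plain Levenshtein distance over tokens and deliberately omits the block-shift operation of full TER/HTER (Snover et al., 2006), so it is an approximation for illustration, not the official scorer.

    def word_edit_distance(hyp_tokens, ref_tokens):
        """Levenshtein distance over words (insertions, deletions, substitutions)."""
        rows, cols = len(hyp_tokens) + 1, len(ref_tokens) + 1
        dist = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            dist[i][0] = i
        for j in range(cols):
            dist[0][j] = j
        for i in range(1, rows):
            for j in range(1, cols):
                cost = 0 if hyp_tokens[i - 1] == ref_tokens[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                                 dist[i][j - 1] + 1,         # insertion
                                 dist[i - 1][j - 1] + cost)  # substitution / match
                # NOTE: full TER/HTER also allows block shifts, omitted here.
        return dist[-1][-1]

    def simplified_hter(mt_output, post_edit):
        mt, pe = mt_output.split(), post_edit.split()
        return word_edit_distance(mt, pe) / max(len(pe), 1)

    # Toy usage: two edits over a five-word post-edit gives 0.4.
    print(simplified_hter("click on zoom shot factor", "click on the zoom factor"))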

Figure 1: Example of project in the PET tool.

Time, one of the most important indicators collected by the tool, is computed from the moment the target box of the unit is clicked to the moment the task is completed (either the job is closed or the navigation button “next” is pressed). The tool allows for multiple revisions, where the annotator can go back to the same sentence and edit it again. For the statistics reported here, we take the aggregation of PE time and keystrokes, and compute the edit distance between the last version and the draft MT output. The outcome of a job is also stored in an XML file.

A set of PE and annotation guidelines created by the QTLaunchpad project was adapted for the PE of our data. To ensure that the quality of the post-edits was consistent and reflected the requirements of the research to be performed on the resulting data, agreement was reached on the level of editing to be done on the data. Based on the previous experience of the language partners involved, the following general rules were defined:5

• Use as much of the raw MT output as possible.
• Aim for grammatically and syntactically correct translations.
• Ensure that no information has been accidentally added or omitted.
• Edit any offensive, inappropriate or culturally unacceptable content.
• Ensure proper and appropriate spelling.
• Do not restructure or change word order solely to improve the flow of the text unless dictated by grammar or domain standards.

Additionally, the following domain-specific rules for software localisation were used:

• Ensure that domain-specific terminology is correctly translated.
• Ensure that standard domain and language-specific style issues are followed.
• If formatting is used, ensure that it is correct.

5For details: qt21-wiki.dfki.de/index.php?title=Post-editing_guidelines

          # sentences          # words SRC    # words MT        # words PE
Lang.     PBMT      NMT                       PBMT     NMT      PBMT     NMT
DE–EN     45,000    –          14.66          15.54    –        15.58    –
EN–DE     30,000    15,000     14.47          14.61    14.77    14.61    14.56
EN–LV     20,738    20,738     15.91          13.50    13.42    13.52    13.42
EN–CS     45,000    –          15.04          13.16    –        13.23    –

Table 5: General statistics of the post-edited data: Total number of sentences, average number of words in source, translation and post-edited sentences.

          Avg. utility score
Lang.     PBMT     NMT
DE–EN     1.62     –
EN–DE     1.98     1.40
EN–LV     1.64     1.84
EN–CS     2.17     –

          Avg. TER MT-PE       Avg. TER REF-PE
Lang.     PBMT     NMT         PBMT     NMT
DE–EN     0.17     –           0.36     –
EN–DE     0.25     0.08        0.40     0.37
EN–LV     0.15     0.23        0.29     0.34
EN–CS     0.32     –           0.34     –

Table 6: PE utility scores: the lower the score, the more useful the MT output.

Table 7: Average edit distance between PE and original MT (HTER), and between PE and independent reference. The higher the distance, the more edits performed.

The guidelines were made available to all language teams and pre-editing meetings were held to avoid communication issues. Consistency in the application of these rules was critical, which is why professional translators were employed and thorough consultations were performed prior to PE. Professional translators performed PE on every language pair. Six translators were involved in the PE for EN–DE, 4 for DE–EN, 8 for EN–LV, and 5 for EN–CS. For the evaluation score, the following options were given to the translator after the post-editing of each sentence:

1. Perfect or near perfect (typographical errors only).
2. Very good, could be post-edited quickly.
3. Poor, required significant post-editing.
4. Very poor, required retranslation.

Tables 5–8 summarise the outcome of the PE process. Much more detailed information is available in the XML output files. Table 5 provides general statistics on the numbers of sentences and words per language pair and MT system type. The average perceived PE effort scores are given in Table 6. Table 7 measures the edit distance between MT and PE, and between PE and the original reference (REF). Finally, Table 8 shows average PE time and keystrokes. As expected, PE time varies considerably for different sentences, even when outliers are removed; therefore, Table 8 also shows standard deviations.
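As a concrete illustration of how the figures in Table 8 can be derived from the PET logs, the sketch below computes mean and standard deviation of per-word PE time, with and without outliers, using the same outlier definition as the table (sentences that took more than four minutes). The record format is hypothetical; the actual measurements are stored in PET's XML output.

    import statistics

    def pe_time_stats(records, outlier_seconds=240):
        """records: list of (pe_seconds, num_source_words) per sentence (hypothetical format)."""
        per_word = [sec / max(words, 1) for sec, words in records]
        kept = [sec / max(words, 1) for sec, words in records if sec <= outlier_seconds]
        def summarise(values):
            return (statistics.mean(values), statistics.pstdev(values))
        return {"all": summarise(per_word), "without_outliers": summarise(kept)}

    # Hypothetical log entries: (seconds spent, source-sentence length in words).
    logs = [(60, 12), (35, 10), (400, 9), (20, 8)]
    print(pe_time_stats(logs))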

5 Error Annotation Process using MQM

Our error annotation process follows a two-step workflow. After PE, the quality of each sentence is evaluated on a scale from 1–4 as explained in the previous section. A subset of sentences scored as 2 (very good) is then selected for the error annotation phase, during which all issues resolved during the PE phase are classified. The errors are annotated using the Multidimensional Quality Metrics (MQM) error annotation framework (Lommel et al., 2014), which is popular in industry and research, and actively supported by XTM, Trados Studio, and other commercial tools.

          Avg. PE time        Avg. PE time w/o outliers    Avg. keystrokes
Lang.     PBMT      NMT       PBMT      NMT                PBMT     NMT
DE–EN     42±80     –         36±45     –                  24.71    –
EN–DE     51±78     46±602    46±39     32±36              15.55    13.89
EN–LV     27±77     43±406    23±28     36±39              18.91    26.08
EN–CS     44±43     –         42±35     –                  45.78    –

Table 8: Post-editing time and keystrokes: average number of seconds per word, with and without outliers (plus standard deviation), and average number of keys pressed during post-editing of a sentence. Outliers are sentences that took more than four minutes to be edited.

We used the open-source tool translate5 (see Figure 2), a database-driven tool with a GUI.7 Source texts, translations, post-edits, and error annotations are organised in a relational database. The tool, originally implemented as a proofreading and PE environment for the translation industry, has recently been extended to support MQM annotation.

Figure 2: MQM error annotation in translate5 (excerpt of screenshot).

An error represents any issue that has been corrected during the PE step in the translated sentence. In the annotation step, a relevant error classification must be provided for all corrections made during PE, according to a given list of errors. Error annotation is performed by experienced professional translators supported by detailed annotation guidelines.

The list of errors is divided into the main issue categories accuracy, fluency and terminology, which unfold into a selection of more detailed categories from the MQM hierarchy. Figure 3 shows part of a decision tree that annotators used to select the most appropriate issue. The actual error categories used in the annotation are shown in Table 9. Annotators are instructed to use the subcategories whenever possible and to resort to the more general category level only in case of doubt: for example, if the German term Zoomfaktor is incorrectly translated as zoom shot factor, and the annotator is unsure whether this represents a mistranslation or an addition, the error can be classified as an Accuracy error, since it is unclear whether content has been added or a term mistranslated.

The annotation process has been completed for all languages and MT system types, resulting in 1,800 unique sentences per language pair and MT system type, with an additional 200 sentences doubly annotated for agreement analysis. The breakdown of error annotations for all 2,000 sentences per language pair and MT system type is shown in Table 9.

Table 10 shows an initial analysis of the agreement between pairs of annotators. Agreement was computed using Cohen's kappa (Cohen, 1960) at the word level in two ways: first, for each word, we count an agreement whenever both annotators agree that it is incorrect (or correct), with agreement by chance = 1/2; second, for each word, we count an agreement whenever both annotators agree on the exact error type assigned to the word (or agree that the word is correct), considering all the 20 categories shown in Table 9 as equally likely (i.e. no distinction was made among different levels in the hierarchy), with agreement by chance = 1/21.

The interpretation of the kappa coefficient is difficult, but it is generally believed that 0.4–0.6 is moderate, while 0.6–0.8 represents substantial agreement, with anything above 0.8 indicating perfect agreement (Landis and Koch, 1977). Considering the subjectivity of the task and the number of error categories and different levels in the hierarchy, we consider the moderate to high agreement found a very positive result towards validating the annotation of the data. In the near future, further quantitative and qualitative analysis will be performed to understand problematic categories and the reasons behind certain disagreements.

7http://translate5.net

Figure 3: Decision tree guiding error annotation (excerpt). The tree leads annotators through a series of yes/no questions (for any question, if the answer is unclear, the instruction is to select “No”) towards the most specific applicable MQM category.

6 Examples of Uses of the Dataset

Subsets of the datasets collected have been used in the 2016 and 2017 editions of the WMT shared tasks on Quality Estimation and Automatic Post-editing (Bojar et al., 2016a, 2017).8 In what follows we summarise some of the outcomes from these tasks.

6.1 Quality Estimation

Quality Estimation (QE) is the task of predicting the quality of the output of an MT system without the use of reference translations (Blatz et al., 2004; Specia et al., 2009). This is approached as a machine learning task, where training data with quality labels is needed. These labels can target different granularity levels: words, phrases, sentences or entire documents.

Early work in the area relied on proxies to quality labels generated using automatic evaluation metrics such as BLEU (Papineni et al., 2002) based on human translations. The task was thus framed as that of predicting an automatic evaluation metric score. This did not prove very successful because of the limitations of the automatic metrics themselves and the lack of a clear interpretation for the predictions (i.e. what does a BLEU score of 0.5 mean?). Quality labels given by humans were suggested by Quirk (2004) but only started to be used more recently (Specia et al., 2009).

8http://www.statmt.org/wmt16/ and http://www.statmt.org/wmt17/.

                      DE–EN    EN–DE            EN–LV            EN–CS
Error type            PBMT     PBMT     NMT     PBMT     NMT     PBMT
Accuracy                 3        0       0       39       50       0
Addition               539      332     167      277      268     385
Mistranslation         437      967     852      274      677     786
Omission               576      690     355      395      560     588
Untranslated           278      102      24       79       62     301
Fluency                  3        0       0      233      210     234
Grammar                  0        0       0       11        2     103
Function words           1        2       1        0        0       0
Extraneous             302      525     245       49       49     228
Incorrect              139      804     449       56       55     454
Missing                362      779     231       66       32     348
Word form                0       94     267      280      261    1401
Part of speech          20      128     132       38       35     147
Agreement               18      506      97      419      357      48
Tense/aspect/mood       63      184      51       60       46     397
Word order             218      868     309      336      152    1148
Spelling               118      126     132      324      387     638
Typography             282      553     249      823      387    1085
Unintelligible           0       33       0       10       14      30
Terminology             27       82     139       34       31       0
All categories        3386     6775    3700     3803     3635    8321

Table 9: MQM error categories and breakdown of annotations completed to date.

                              DE–EN      EN–DE                 EN–LV                 EN–CS
                              PBMT       PBMT       NMT        PBMT       NMT        PBMT
# annotated words (A1/A2)     516/643    974/920    338/288    669/682    303/310    324/370
Kappa on annotated words      0.61       0.70       0.82       0.69       0.67       0.62
Kappa on error type           0.51       0.48       0.69       0.53       0.51       0.51

Table 10: Number of annotated words per language pair for each annotator (A1 and A2) and the Cohen’s kappa measuring inter-annotator agreement for MQM error annotations.

In particular, the use of objective labels derived from extrinsic uses of MT output, such as PE, has become popular (Specia, 2011). Labels of this type include normalised PE distance (HTER - Human-Targeted Translation Error Rate (Snover et al., 2006)). These can be acquired as a by-product of PE in a translation workflow, and are less subjective and less prone to biases such as the annotators’ perception of MT.

The datasets described in this paper open many new avenues for research in QE. The main benefits with respect to previously collected labels include their scale, domain specificity and the availability of multiple types of (reliable) human annotation. In the WMT16 QE shared task, a subset of the English-German IT domain post-edited data containing 15,000 sentences was used for the sentence, word and phrase-level tasks. The quality labels were automatically derived from the PE of the MT output, e.g. for sentence level, HTER scores were used. Bojar et al. (2016a) claim that, when compared to the previous year – approx. 14,000 crowdsourced post-edited sentences – the results of the 2016 task were more conclusive. They attribute this to the higher quality of the new dataset and observe that:

Task                 Baseline ↑    Best system ↑
2016 Training set
Word-level QE        0.32          0.55
Phrase-level QE      0.40          0.50
Sentence-level QE    0.35          0.53
2017 Training set
Word-level QE        0.36          0.58
Phrase-level QE      0.33          0.60
Sentence-level QE    0.39          0.71

Table 11: QE shared task results on the 2016 test set: baseline and winning systems in 2016 and 2017 (larger training set) for sentence (Pearson), word and phrase (F1-mult = multiplication of F1 for the GOOD and BAD classes) levels.
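To make the metrics reported in Table 11 concrete, the sketch below computes the two headline scores: Pearson correlation for sentence-level HTER prediction and F1-mult (the product of the F1 scores for the GOOD and BAD word classes) for word-level QE. It is a self-contained illustration, not the official WMT evaluation script.

    import math

    def pearson(xs, ys):
        """Pearson correlation between predicted and true sentence-level scores."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    def f1(pred, gold, label):
        tp = sum(p == label and g == label for p, g in zip(pred, gold))
        fp = sum(p == label and g != label for p, g in zip(pred, gold))
        fn = sum(p != label and g == label for p, g in zip(pred, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    def f1_mult(pred, gold):
        """Multiplication of F1 for the GOOD and BAD classes (word-level QE metric)."""
        return f1(pred, gold, "GOOD") * f1(pred, gold, "BAD")

    # Toy sentence-level predictions vs. true HTER values.
    print(pearson([0.10, 0.30, 0.55], [0.12, 0.25, 0.60]))
    # Toy word-level predictions vs. gold labels.
    print(f1_mult(["GOOD", "BAD", "GOOD", "GOOD"], ["GOOD", "BAD", "BAD", "GOOD"]))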

• for sentence level, the best Pearson correlation between the system prediction and true HTER in 2015 was 0.39 (against 0.14 for the baseline system). In 2016, the winning submission reached 0.52 Pearson correlation (against 0.35 for the same baseline system). One can speculate that the task was made somewhat “easier” by using high quality data, but the delta in the Pearson correlation between the baseline and winning submission is still substantial.
• for word level, 2016 systems performed much better: 0.56 against 0.43 F1-BAD. The baseline systems are not comparable.

In order to further push progress in the QE field, the 2017 QE task was provided with an extended version of the 2016 dataset in addition to data from a different domain and a different language pair. For English-German, the 2016 dataset was extended to include a total of 28,000 sentence pairs. For German-English, 28,000 sentence pairs in the life sciences domain were made available for the task. The two datasets are significantly larger than any dataset used before in QE shared tasks. The same data was used for the three subtasks: sentence, word and phrase levels. The results of this year’s task (Bojar et al., 2017) show major improvements for all tasks over the 2016 results. In addition to general advances in the field, these can in part be attributed to the larger dataset provided. For the 2016 test set, also used in 2017 for comparison, Table 11 shows the results using the official metrics for the best system and the baseline system using the 2016 vs the 2017 training sets.

This data has proven useful for subsequent work in the field: for instance, Forcada et al. (2017) focus on the prediction of PE time at sentence level on the 2016 dataset, while Martins et al. (2017) propose a novel word-level QE approach using automatic PE techniques.

6.2 Automatic Post-Editing

Automatic Post-editing (APE) systems are usually trained on (source, MT, human post-edit) triplets from which the appropriate corrections of systematic errors should be learned and possibly generalised. This supervised learning problem is addressed as a “monolingual translation” task in which rough MT output in a given target language has to be translated into a fluent and adequate translation of the original source text. BLEU and TER computed against reference human post-edits are the standard evaluation metrics for the task, and their respective improvements and reductions are usually compared against the baseline scores obtained by the original MT output that has been left untouched (i.e. rough, non post-edited translations).

Early APE systems (Allen and Hogan, 2000; Simard et al., 2007) were developed under the PBMT paradigm, that is, by learning from “parallel” data, either (MT, human post-edit) pairs or triplets including information from the source text (Béchara et al., 2011; Chatterjee et al., 2015).

Recent solutions achieved larger and more significant improvements by exploiting neural methods (Junczys-Dowmunt and Grundkiewicz, 2016; Pal et al., 2016, 2017), which approach the task as a sequence-to-sequence learning problem. Both paradigms suffer from drawbacks that have, to date, represented the main obstacles towards a wider adoption of APE technology. According to Bojar et al. (2015), one of the major problems lies in data sparsity, which limits the ability to exploit training data in order to learn correction patterns that can also be applied to test instances. Several factors contribute to raising this data sparsity issue, namely: i) the size of the data (although human post-edits are a by-product of industrial translation workflows, few corpora are available for research), ii) the domain of the data (general domains – like news – are definitely less repetitive than narrow ones – like information technology), and iii) the origin of the post-edits (professional post-editors are definitely more reliable and coherent than non-expert ones).

The datasets described in this paper aim to mitigate the problems related to data sparsity for reasons that are similar to those discussed in the previous section on QE. Indeed, their size, domain specificity and professional PE quality may explain the renewed interest and the impressive progress of APE research in the past few years. The following figures drawn from the WMT experience support our claims:

• Number of tasks and submitted runs. At WMT 2016, only one English-German translation task in the IT domain was organised, while 2017 saw two tasks: English-German (IT) and German-English (life sciences). The new corpora (more repetitive than the news data edited by non-experts in 2015) motivated more teams to participate: from 7 submissions in 2016 to 20 in 2017.
• Improvements over the baseline. The switch to new data coincided with significant performance gains that prove the viability of APE in domain-specific settings. While in 2015 none of the participants was able to beat the baseline, the best English-German submissions in 2016 and 2017 improved over the baseline by up to 5.5 and 7.6 BLEU points.
• Improvements over the PBMT approach. While in 2015 all systems followed this paradigm, falling in the same range of performance, the combination of advancements in neural research and the provision of more suitable data resulted in impressive performance gains in the next two evaluation rounds. The same PBMT system used for comparison in all the evaluation rounds was significantly outperformed by most of the participants in 2016 (up to 3.2 BLEU points) and in 2017 (up to 7.1 BLEU points).

7 Conclusions

In this paper we introduced a large and unique set of data points derived from industry data that have been post-edited and annotated by professional translators. This allows for specific features and novel combinations of features to be used for a variety of research and user-oriented purposes, including establishing the actual PE effort by translators based on time and keystrokes and comparing these results to the perceived level of quality of the post-edited sentence, or establishing correlations between certain characteristics such as sentence length and post-editing time, or between post-editing time and human or automatic quality evaluation metrics. The datasets also measure post-editing productivity and can be used to detect error patterns in the MT output. In addition, the creation of MQM-annotated subsets of these post-edits for typical industry domains provides information about error patterns and supports feature-oriented quality estimation and evaluation, among many other novel avenues for research. This dataset is freely available and can be downloaded from the project website: http://www.qt21.eu/.

Acknowledgements

This work was supported by the QT21 project (H2020 No. 645452).

References

Allen, J. and Hogan, C. (2000). Toward the Development of a Post Editing Module for Raw Machine Translation Output: A Controlled Language Perspective. In Third International Controlled Language Applications Workshop (CLAW-00), pages 62–71.

Axelrod, A., He, X., and Gao, J. (2011). Domain adaptation via pseudo in-domain data selec- tion. In Proceedings of the conference on empirical methods in natural language processing, pages 355–362. Association for Computational Linguistics.

Aziz, W., de Sousa, S. C. M., and Specia, L. (2012). PET: a tool for post-editing and assessing machine translation. In Eighth International Conference on Language Resources and Evaluation, LREC, pages 3982–3987, Istanbul, Turkey.

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Béchara, H., Ma, Y., and van Genabith, J. (2011). Statistical Post-Editing for a Statistical MT System. In Proceedings of the 13th Machine Translation Summit, pages 308–315, Xiamen, China.

Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2004). Confidence estimation for machine translation. In Proceedings of the 20th International Conference on Computational Linguistics, number 315 in COLING ’04, Geneva, Switzerland.

Bojar, O., Buck, C., Federmann, C., Haddow, B., Koehn, P., Leveling, J., Monz, C., Pecina, P., Post, M., Saint-Amand, H., Soricut, R., Specia, L., and Tamchyna, A. (2014). Findings of the 2014 workshop on statistical machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 12–58, Baltimore, Maryland, USA. Association for Computational Linguistics.

Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Jimeno Yepes, A., Koehn, P., Logacheva, V., Monz, C., Negri, M., Neveol, A., Neves, M., Popel, M., Post, M., Rubino, R., Scarton, C., Specia, L., Turchi, M., Verspoor, K., and Zampieri, M. (2016a). Findings of the 2016 conference on machine translation. In First Conference on Machine Translation, Volume 2: Shared Task Papers, WMT, pages 131–198, Berlin, Germany.

Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Koehn, P., Lo- gacheva, V., Monz, C., Negri, M., Post, M., Rubino, R., Specia, L., and Turchi, M. (2017). Findings of the 2017 conference on machine translation (wmt17). In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Tasks Papers, Copenhagen, Denmark.

Bojar, O., Chatterjee, R., Federmann, C., Haddow, B., Huck, M., Hokamp, C., Koehn, P., Logacheva, V., Monz, C., Negri, M., Post, M., Scarton, C., Specia, L., and Turchi, M. (2015). Findings of the 2015 workshop on statistical machine translation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 1–46, Lisbon, Portugal.

Bojar, O., Dušek, O., Kocmi, T., Libovický, J., Novák, M., Popel, M., Sudarikov, R., and Variš, D. (2016b). CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In Text, Speech and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings. Springer Verlag. In press.

Chatterjee, R., Weller, M., Negri, M., and Turchi, M. (2015). Exploring the Planet of the APEs: a Comparative Study of State-of-the-art Methods for MT Automatic Post-Editing. In Pro- ceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 156–161, Beijing, China.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.

Cuong, H. and Simaan, K. (2014). Latent domain translation models in mix-of-domains haystack. In Proceedings of COLING, pages 1928–1939.

Eetemadi, S., Lewis, W., Toutanova, K., and Radha, H. (2015). Survey of data-selection meth- ods in statistical machine translation. Machine Translation, 29(3-4):189–223.

Forcada, M. L., Sánchez-Martínez, F., Esplà-Gomis, M., and Specia, L. (2017). Towards optimizing MT for post-editing effort: Can BLEU still be useful? Prague Bull. Math. Linguistics, 108:183–195.

Gao, J. and Zhang, M. (2002). Improving language model size reduction using better prun- ing criteria. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 176–182. Association for Computational Linguistics.

Herrmann, T., Niehues, J., and Waibel, A. (2013). Combining Word Reordering Methods on different Linguistic Abstraction Levels for Statistical Machine Translation. In Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation, Altanta, Georgia, USA.

Junczys-Dowmunt, M. and Grundkiewicz, R. (2016). Log-linear combinations of monolingual and bilingual neural machine translation models for automatic post-editing. In Proceedings of the First Conference on Machine Translation, pages 751–758, Berlin, Germany.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 177–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.

Lommel, A. R., Burchardt, A., and Uszkoreit, H. (2014). Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Tradumatica:` tecnologies de la traduccio´, 0(12):455–463.

Martins, A. F. T., Junczys-Dowmunt, M., Kepler, F., Astudillo, R., and Hokamp, C. (2017). Pushing the limits of translation quality estimation. Transactions of the Association for Com- putational Linguistics (to appear).

Moore, R. C. (2002). Fast and accurate sentence alignment of bilingual corpora. Springer-Verlag.

Moore, R. C. and Lewis, W. (2010). Intelligent selection of language model training data. In Proceedings of the ACL 2010 conference short papers, pages 220–224. Association for Computational Linguistics.

Niehues, J. and Waibel, A. (2012). Detailed Analysis of Different Strategies for Phrase Table Adaptation in SMT. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas, San Diego, CA, USA.

Och, F. J. (1999). An Efficient Method for Determining Bilingual Word Classes. In Proceed- ings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway.

Pal, S., Naskar, S. K., Vela, M., Liu, Q., and van Genabith, J. (2017). Neural automatic post- editing using prior alignment and reranking. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 349–355, Valencia, Spain.

Pal, S., Naskar, S. K., Vela, M., and van Genabith, J. (2016). A neural network based approach to automatic post-editing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 281–286, Berlin, Germany.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.

Quirk, C. (2004). Training a sentence-level machine translation confidence measure. In Pro- ceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal.

Rosa, R., Mareček, D., and Dušek, O. (2012). Depfix: A system for automatic correction of Czech MT outputs. In Proceedings of the Seventh Workshop on Statistical Machine Translation, WMT '12, pages 362–368, Stroudsburg, PA, USA. Association for Computational Linguistics.

Rousseau, A. (2013). XenC: An open-source tool for data selection in natural language pro- cessing. The Prague Bulletin of Mathematical Linguistics, 100:73–82.

Schmid, H. and Laws, F. (2008). Estimation of Conditional Probabilities with Decision Trees and an Application to Fine-Grained POS Tagging. In International Conference on Computa- tional Linguistics (COLING 2008), Manchester, Great Britain.

Sennrich, R., Firat, O., Cho, K., Birch, A., Haddow, B., Hitschler, J., Junczys-Dowmunt, M., Läubli, S., Miceli Barone, A. V., Mokry, J., and Nadejde, M. (2017). Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016). Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting on Association for Compu- tational Linguistics. Association for Computational Linguistics.

Simard, M., Goutte, C., and Isabelle, P. (2007). Statistical Phrase-Based Post-Editing. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), pages 508–515, Rochester, New York.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas, pages 223–231, Cambridge, MA.

Specia, L. (2011). Exploiting objective annotations for measuring translation post-editing effort. In 15th Conference of the European Association for Machine Translation, EAMT, pages 73–80, Leuven, Belgium.

Specia, L., Turchi, M., Cancedda, N., Dymetman, M., and Cristianini, N. (2009). Estimating the Sentence-Level Quality of Machine Translation Systems. In 13th Annual Conference of the European Association for Machine Translation, EAMT, pages 28–37, Barcelona, Spain.

Tamchyna, A., Sudarikov, R., Bojar, O., and Fraser, A. (2016). CUNI-LMU submissions in WMT2016: Chimera constrained and beaten. In Proceedings of the First Conference on Machine Translation, pages 385–390, Berlin, Germany. Association for Computational Linguistics.

Tiedemann, J. (2012). Parallel data, tools and interfaces in OPUS. In LREC, volume 2012, pages 2214–2218.

Vasiļjevs, A., Skadiņš, R., and Tiedemann, J. (2012). LetsMT!: a cloud-based platform for do-it-yourself machine translation. In Proceedings of the ACL 2012 System Demonstrations, pages 43–48. Association for Computational Linguistics.

Wisniewski, G., Singh, A. K., Segal, N., and Yvon, F. (2013). Design and analysis of a large corpus of post-edited translations: Quality estimation, failure analysis and the variability of post-edition. In Proceedings of the MT Summit, pages 117–124.

Žabokrtský, Z., Ptáček, J., and Pajas, P. (2008). TectoMT: Highly modular MT system with tectogrammatics used as transfer layer. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 167–170. Association for Computational Linguistics.

Zhechev, V. (2012). Machine translation infrastructure and post-editing performance at Autodesk. In AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP 2012), pages 87–96.

The Microsoft Speech Language Translation (MSLT) Corpus for Chinese and Japanese: Conversational Test data for Machine Translation and Speech Recognition

Christian Federmann [email protected]
William D. Lewis [email protected]
Microsoft AI+Research, Redmond, WA, USA

Abstract

Recent years have seen unprecedented growth in the use of MT across industries and domains. Partly this is due to the ready availability of open source MT tools such as Moses and of online or customizable services. It is also due to fundamental shifts in the technology, specifically the move to deep learning, which has dramatically improved the quality of MT engines, including those used by online services. Likewise, improvements in Speech Recognition (SR) technology, also driven by the move to deep learning, show significant gains in quality. The improvements of both of these technologies, MT and SR, increase the potential viability of speech translation, since the error cascade caused by daisy-chaining these technologies drops as the quality bar rises. MT is a crucial component in speech translation systems, yet developing conversational MT systems essential to speech translation is not a focus for many working in the Machine Translation discipline. Particularly problematic for many languages is the absence of test and dev data, no less so for Chinese and Japanese, where forays into conversational MT in and out of these languages are limited by the lack of publicly available conversational test data. In this paper, we seek to address this problem by providing MT test and dev data that has been built from actual bilingual conversations between English and Japanese, and English and Chinese: test data that can be useful to drive further research in this space for these two languages. Our plan is to make the data described in this paper available to the public by MT Summit.

1 Introduction

The commoditization of MT, as evidenced by increased use of MT across industries and domains, results most significantly from the availability of open source MT tools such as Moses [1] and online tools for training and building customized systems (such as those offered by Microsoft, SDL, IBM, etc.). But it is also due to fundamental

shifts in the technology, specifically the move to deep learning, which has dramatically improved the quality of MT engines [2, 3, 4], including those used by online services (e.g., Google, Microsoft, Baidu, etc.). Likewise, improvements in speech recognition technology, also driven by the move to deep learning, are showing 25-50% improvements in quality driven by deep learning alone; for instance, [5] showed a 32% reduction in Word Error Rate when switching from Gaussian Mixture Models (GMMs) to Deep Neural Networks (DNNs), with no change in training data. The improvements of both of these technologies increase the potential viability for speech translation, since the error cascade caused by daisy-chaining these technologies drops as the quality bar in each component technology increases.

MT is a crucial component in speech translation systems, yet developing conversational MT systems essential to speech translation is not a focus for many working in the Machine Translation discipline. With the increase of conversation-like sources that need to be translated, however, e.g., social media, and an increase in the availability of speech recognition systems across multiple languages, translating less formal content is becoming far more commonplace and in-demand. MT systems that are trained on more “general” content, say, Web page content or parallel PDF documents, do not do well on content of a radically different style [6, 7].

Also problematic for developing conversational MT systems is the lack of an adequate way to evaluate them. Our focus in this paper is on test data for Chinese and Japanese, in and out of English. We describe here test and dev data that has been built from actual bilingual conversations between English and Japanese, and English and Chinese. This test data consists not only of the audio (not relevant to MT per se, but certainly to speech translation), but also “raw” un-edited transcripts of the audio, cleaned-up caption-like transcripts, and translations to/from English and Japanese and Chinese. For each language there are two test sets: one from English, and one to English. This provides data that is native in the source language, eliminating problems with direction-bias in evaluation. Also, because the data is conversational, it is fully appropriate to tune systems to, and evaluate systems on, conversational-style content.

It should be noted that the bilingual English↔Japanese and English↔Chinese conversations were unscripted; participants were given some guidance with respect to topic, but otherwise, were allowed to have unrestricted conversations with one another. In many ways this is similar to the instructions provided to participants in the construction of the monolingual Switchboard corpus [8]. Because the conversations were unscripted, the data is rife with content typical of conversations: filler pause-words (um, uh), discourse markers (you know, I mean), restarts (I’m...I’ve), stutters (I-I-I), colloquial forms (gonna, kinda), and a host of disfluencies and artifacts common to colloquial speech. See Figure 1 for a few examples of Chinese and Japanese disfluencies. This provides a means to test MT systems designed for less formal, more conversation-like content, not only limited to output from speech recognition, but also content common in social media. Although our test data does not translate disfluencies a la [9], which was by design, it does preserve the informal character of the input content in the translation.


Figure 1: Example disfluencies in Chinese and Japanese.

2 Data Collection

The focus of our work here was to create realistic test data for evaluating conversational MT systems. Thus, we wanted test data that reflected actual bilingual conversations between fluent speakers of English, Chinese and Japanese. Monolingual corpora of this type exist, e.g., LDC2004T19 and LDC2005T19 [10, 11] for English, and the CALLHOME corpora for English and Chinese and other languages [12, 13, 14, 15] (but notably, not Japanese). The focus of these corpora, however, is explicitly on Speech Recognition and not Machine Translation, since no translations of the content are publicly available. For those interested in conversational Machine Translation or Speech Translation, these corpora are of little use, unless one wishes to expend significant resources in translation. Further, one has no options for Japanese, since these corpora do not cover Japanese.1

The “realistic” test data requirement also meant, crucially, that we did not want our test data to be constrained by the current state of the art in speech recognition and machine translation technology, nor constrained by domain. This requirement meant that we avoided using existing speech recognition and machine translation systems in data collection.

2.1 Recording Guidelines

As noted, we opted not to use existing speech recognition or machine translation technology in our recordings. This was motivated, in part, by experiments we conducted on English, German and French [17]. In these experiments, we noted that users’ behavior changed dramatically when having conversations mediated by a speech translation system: speech rate dropped dramatically from monolingual conversations, vocabulary was more constrained, conversations were punctuated by a significant number of restarts and rephrases, and users would often ask questions solely for the purpose of clarification (e.g., when ASR or MT failed), effects that would not be present in fluent conversations. Our interest is in constructing realistic bilingual conversations, notably not constrained by the current state of the art, specifically sans the artifacts described above. Constraining systems in such a way would set a ceiling at what exists currently, and thus not provide true gold-standard conversational content to evaluate against.

1The BTEC corpus [16] does contain quasi-conversational Japanese input, but it is focused on the travel domain, and does not consist of free form conversations and transcripts.


Without the aid of machine translation, we could only recruit bilingual speakers of English, Chinese and Japanese. Bilinguality varies significantly across speakers, but generally, speakers will be dominant in one of the languages (usually their native or mother tongue) and less capable in other language(s). We recruited fluent bilingual speakers, who would naturally understand utterances in either language, but for the source language, only those who were dominant in that language. Thus, Japanese bilingual speakers needed to be native (or dominant) in Japanese, but capable of understanding English. Likewise for Chinese speakers.

For the recordings, no machine translation was used. Users held conversations over communication software installed on their computers, specifically Skype, and we captured the audio from these conversations. The audio data was then segmented into smaller chunks (typically, less than 30 seconds long) and transcribed faithfully, capturing all disfluencies present in the audio signal.2 For each pair of speakers, we organized recordings adhering to the following paradigm:

– Speakers recorded two sessions, 30 minutes each;
– Speakers switched roles, speaking their native language in one conversation, English in the other;
– Conversations were lightly constrained to predefined topics. Topics were used more to prime conversations than to act as constraints, and included topics such as sports, pets, family, education, food, etc.

We recorded over 100 speakers for each language, with 50+ pairings. Speakers were balanced for gender and age groups. The English side of the recordings for Japanese and Chinese bilingual conversations was discarded as it represented accented speech, and was thus less desirable, given our unrestricted requirement. In other words, the bilinguals we recruited were dominant in Japanese and Chinese first, English second. For English-only, we collected data from monolingual English conversations between speakers of different English dialects (American, Australian, British and Indian), ensuring speaker and dialect diversity.

2.2 Annotation Guidelines

We asked annotators to transcribe the given audio signal in disfluent, verbatim form. Incomplete utterances and other sounds are transcribed using:

– predefined tags (see the list below), and
– free annotations such as [laughter] or [door slams].

In theory, annotators were free to choose whatever annotations they deemed appropriate for sounds which none of the predefined tags captured. In reality we observed only one such annotation: [laughter].

2For capturing the audio, we used a specially modified Skype client that allowed us to record the audio on the local computer, at the same time that they were holding a conversation.


The following list provides details on the predefined tags and their interpretation.

– SPN: Speech noise: Any sounds generated during speaking which are not actual words should be transcribed as speech noise. Examples are lip smacks or breathing noises from the primary speaker.
– EU: End unspoken: Used when the end of a word was truncated or swallowed by the speaker, possibly due to hesitation. Example: "hell hello".
– NON: Non-speech noise: Any sounds which are not generated by a speaker should be transcribed as non-speech noise. Examples are external sounds such as cars or music from a TV running in the background.
– UNIN: Unintelligible: When the transcriber cannot even make an educated guess at which word has been uttered by the speaker, it should be transcribed as unintelligible. Should be applied to one word at a time. For multiple such words, multiple tags should be used.
– LM: Language mismatch: If the word uttered by the speaker is understandable but not in the expected speech language the annotator should use the language mismatch tag. If the foreign word can be identified, it should be embedded into the tag, otherwise an empty tag is sufficient. Examples are "Hello monsieur" or "I visited ".3
– AS: Audio spill: If the audio signal is polluted by feedback or audio bleeding from the second channel or affected by any other technical issues, this should be transcribed as audio spill. Generally, this indicates bad headsets or recording conditions.
– SU: Start unspoken: Used when the beginning of a word was truncated or otherwise messed up by the speaker. Example: "an hear you".
– UNSURE: Annotator unsure: Indicates a word the transcriber is unsure of. Should be applied to one word at a time. For multiple such words, multiple tags should be used.
– NPS: Non-primary speaker: Indicates a word or phrase which has been uttered by a secondary speaker. This speaker does not have to be identified. Example: "watching the water flow. yeah."
– MP: Mispronounced: A mispronounced but otherwise intelligible word. Example: "like, a filet mignon"

Table 2 gives a detailed overview of the observed frequencies of these tags for each of the released MSLT data sets.
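As a small utility example, the sketch below counts how often each predefined tag occurs in a set of T1 transcripts, which is essentially how a frequency table like Table 2 can be produced. The tag markup shown here (an uppercase tag name in angle brackets) is an assumption made purely for illustration; consult the released files for the actual annotation format.

    import re
    from collections import Counter

    PREDEFINED_TAGS = ["SPN", "EU", "NON", "UNIN", "LM", "AS", "SU", "UNSURE", "NPS", "MP"]

    def count_tags(transcript_lines):
        """Count predefined tag occurrences, assuming a hypothetical <TAG ...> markup."""
        counts = Counter()
        pattern = re.compile(r"<\s*([A-Z]+)")  # assumed markup; adjust to the real format
        for line in transcript_lines:
            for tag in pattern.findall(line):
                if tag in PREDEFINED_TAGS:
                    counts[tag] += 1
        return counts

    # Hypothetical T1 lines for illustration only.
    print(count_tags(["so <SPN> i think <UNSURE maybe> we could go", "<NON> yeah"]))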

3 Corpus Data

The Microsoft Speech Language Translation (MSLT) corpus for Japanese and Chinese will be made available at the following URL:

– https://aka.ms/mslt-corpus

The Microsoft Speech Language Translation (MSLT) corpus for Japanese and Chinese will be made available at the following URL: – https://aka.ms/mslt-corpus

3In the latter example, taken from real data, the tag indicates an utterance in a language that the transcriber did not know, and was left untranscribed, e.g., I visited Ceuta.


Language     Data set    Files     Runtime      Average

English      Test        3,304     4h03m58s     4.4s
             Dev         3,052     3h56m37s     4.7s

Japanese     Test        4,160     5h22m23s     4.6s
             Dev         3,179     4h29m07s     5.1s

Chinese      Test        1,285     2h24m36s     6.8s
             Dev         1,256     2h12m28s     6.3s

Table 1: Audio runtime information for our Test and Dev data by source language.

At this site, we provide the audio files (see format description in Section 3.1 below), disfluent and fluent transcripts (“T1” and “T2”), and English translations (“T3”). In addition to the English, Chinese and Japanese corpora, we provide links at this site for the other corpora in the MSLT family of corpora, including the English, German, and French MSLT corpus released last year [17]. We ask that users of the Japanese and Chinese corpora cite this paper when used in their research (and [17] when using the English, French and German corpora). Also, please refer to the license agreement contained in the download packages for details on citation and limits of use.

3.1 Audio Files

The corpus contains uncompressed WAV audio files with the following properties:

– Encoding: PCM
– Sample rate: 16,000 Hz
– Channels: 1, mono
– Bitrate: 256 kbit/s

Note that the original audio streams had been encoded using the Siren codec, so we had to transcode them to create the uncompressed files for release. Furthermore, the original signal had been subject to transport via Skype’s network with variable bandwidth encoding. Audio quality of the released files may be affected by both factors. The files represent a realistic snapshot of speech quality in real life. Table 1 gives more details for the audio portions of the MSLT release.
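The audio properties listed above are easy to verify programmatically. The sketch below uses Python's standard wave module to check a released file against the stated format; the file name is a placeholder, not part of the corpus naming scheme.

    import wave

    def check_mslt_wav(path):
        """Verify the MSLT audio format: PCM WAV, 16 kHz, mono, 16-bit (256 kbit/s)."""
        with wave.open(path, "rb") as w:
            rate, channels, sampwidth = w.getframerate(), w.getnchannels(), w.getsampwidth()
            bitrate_kbps = rate * channels * sampwidth * 8 / 1000
            duration_s = w.getnframes() / rate
        assert rate == 16000 and channels == 1 and sampwidth == 2, "unexpected audio format"
        return {"bitrate_kbps": bitrate_kbps, "duration_s": duration_s}

    # Placeholder file name; the released packages define the actual naming scheme.
    print(check_mslt_wav("example_utterance.wav"))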

3.2 Transcription and Translation Files

Transcripts (T1, T2) and translations (T3) are formatted as UTF-16 (16-bit, little-endian) text files. We defined these three text annotation layers for our speech-to-speech processing:

– T1: Transcribe: represents a raw, human transcript which includes all disfluencies, hesitations, restarts, and non-speech sounds. The goal of this annotation step is to produce a verbatim transcript that is as close to the original audio signal as possible. Audio was provided to annotators segmented at the utterance level. Segmentation was done using an existing ASR engine with a Voice Activity Detection (VAD) algorithm. We observed bias when speakers transcribed their own recordings (repairing, e.g., disfluencies and restarts, or transcribing words based on original intent), so we assigned the work to a different set of consultants to prevent this issue. The extra transcription effort resulted in higher transcription fidelity, especially regarding disfluencies, noises and incomplete utterances. Both punctuation and case information are optional in T1, but we found that most annotators provided them anyway. We assume they added this information to make the subsequent T2: Transform processing easier.
– T2: Transform: represents a cleaned-up version of the T1 transcript with proper punctuation and case information. Of course, T2 data should not contain any disfluencies or other annotations. T2 output should also be segmented into semantic units. While the audio signal has already been segmented using VAD, the resulting utterances typically contain multiple phrases instead of a single sentence. This is partly due to the human speech production process and partly due to deficiencies in our speech segmentation. As machine translation targets individual input sentences, the T1-to-T2 segmentation process is crucial. The idea is to create conversational text which could be printed as a newspaper quote. Segmentation and disfluency removal may introduce phrasal fragments, which are kept as long as they have at least some semantic value. Annotators work on the T1 text files only and do not have access to the original audio files. We found that giving the annotators access to the audio signal resulted in longer annotation times, output that sometimes contradicted the original T1 data, and less focus on the transformation task.
– T3: Translate: represents the translation of the fluent T2 transcript. The goal is to create conversational target text which feels natural to native speakers. Every translation should be usable as a direct quote in a newspaper article. Translations were created based on unique segments in order to enforce translation consistency. Translators were instructed not to translate any (remaining or perceived) disfluencies but instead asked to flag such T2 instances for repair. The biggest problem for translators was lack of context. Especially for shorter utterances, we observed a lot of ambiguity which made the translation process hard. While we sent out T2 data in order (so that translators could have used contextual cues), any kind of task parallelization will have negatively affected the translation process. Also, our assumption that unique source segments should always have the same target translation might not hold in the case of ambiguous, context-dependent phrases. Our lessons learnt during the original translation process will guide future translation campaigns creating additional references for this data set.

3.3 Corpus Statistics

Table 3 provides an overview of segment, token and type counts for both Test and Dev data for English, Japanese and Chinese. Counts for disfluent T1 transcripts and segmented, fluent T2 transcripts show the expected behavior: segment counts increase and token counts decrease. Note the significantly higher number of tokens for both English sets. A possible explanation lies in the fact that English conversations were easier, as speakers only had to "translate" between different English dialects. Hence, these conversations were much closer to our monolingual recording scenario than conversations for Japanese or Chinese.
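For readers reproducing such statistics from the text files, a trivial sketch of the counting is shown below (whitespace tokenization is a simplification; the released counts may rely on language-specific tokenization, especially for Japanese and Chinese):

```python
def count_tokens_types(lines):
    # lines: iterable of transcript or translation segments (strings).
    tokens = [tok for line in lines for tok in line.split()]
    return len(lines), len(tokens), len(set(tokens))  # segments, tokens, types
```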

3.4 Some Examples

Figures 2 and 3 give examples containing disfluent, verbatim transcripts (T1), cleaned up and transformed text (T2) and the corresponding translations (T3) into English from both Chinese and Japanese. Note in the Chinese example how T2 transformation breaks the T1 transcript into two segments and also removes disfluencies and annotations. Translations are aligned on the segment level, and only with T2.

Figure 2: Examples from the Chinese corpus, with raw transcripts, transformed transcripts, and trans- lations into English. Disfluencies that are removed are highlighted in the source.

Figure 3: Examples from the Japanese corpus, with raw transcripts, transformed transcripts, and translations into English. Disfluencies that are removed are highlighted in the source.

4 Usage Scenarios

We have previously described the three levels of annotation for the MSLT corpus data. In this section, we will describe how one could use the different annotation layers and explain why all three are needed to evaluate end-to-end quality of a speech translation system.

4.1 Using T1 data: “Bilingual” Speech Recognition

First, our data allows one to measure quality for bilingual speech recognition. While the recorded speech data itself is monolingual, our recording setup was bilingual by design. In any given session, both speakers were native speakers of the non-English language, so they could natively understand one another. However, as one of the two had to give answers in English, an additional bilingual element was added to the conversation flow. This affects the conversation: most notably, we observe a decreased number of words uttered compared to purely monolingual conversations, which makes our data special in this regard and naturally representative of bilingual conversations (rather than monolingual conversations).


Testing speech recognition quality with our data will typically be implemented using word error rate (WER) scoring, comparing an ASR hypothesis against our reference transcription. Depending on the output style of the ASR engine under investigation, the reference text is either T1 or T2 data. Many ASR systems will remove disfluencies and partial recognitions to make the resulting transcripts more readable to humans. If testing against such a system, our T2 data should be used as the reference for WER scoring. As the segmentation of the T2 reference transcripts will likely not match that of the ASR output (which might not be segmented at all), ASR output should be compared to the "joint" T2 reference, which is the concatenation of all T2 segments into a single line.

Of course, if the ASR system being evaluated does no disfluency processing, then the T1 transcript should be used as the reference for calculating WER.
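To make the scoring setup concrete, here is a minimal sketch (our own toy implementation, not an official scoring tool; the example strings and the simple whitespace tokenization are assumptions) that computes WER against a "joint" T2 reference built by concatenating the T2 segments of an utterance:

```python
# Minimal WER sketch: Levenshtein distance over whitespace tokens.
# Real scoring pipelines normalize case, punctuation and annotation tags first.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# "Joint" T2 reference: concatenate all T2 segments of one utterance.
t2_segments = ["okay let's start.", "can you hear me?"]   # hypothetical data
joint_t2 = " ".join(t2_segments)
asr_output = "okay let's start can you hear me"           # hypothetical data
print(f"WER = {wer(joint_t2, asr_output):.2%}")
```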

4.2 Using T2 data: Disfluency Removal

In the construction of the MSLT corpus, we have put in extra effort to annotate and transcribe disfluencies and other non-speech sounds, which are common in conversational speech. Such annotations can be used to evaluate the quality of disfluency removal (DR) models. While it is possible to train machine translation models to translate or otherwise deal with such phenomena (this works reasonably well for simple disfluencies, but becomes far more challenging for non-obvious disfluencies or partial utterances and restarts; see [9] for an example of such a system), the data space for these is very sparse. Therefore, we found it more practical to apply a DR component that "cleans up" the ASR output before translation, as a normalization step [18].

For evaluation of such DR systems, one would feed the disfluent T1 transcripts into a disfluency removal system and compare the resulting output to the fluent T2 transcripts, which act as the reference. As we have previously discussed, T2 data is both cleaned up (with respect to disfluencies and non-speech noises) and segmented into units that contain at least some semantic value. This segmentation affects the usefulness of the T2 data as references for disfluency removal. If the DR model also performs segmentation, then its output can be directly compared to the T2 references. It has to be noted, however, that even small differences in segmentation will negatively affect the comparison. Hence, it might make more sense to compare the DR output and T2 segments on a non-segmented level. This is similar to the problem of testing "fluentized" ASR output against T2 references, as mentioned above.
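As a toy illustration of what a DR component does (this is not the system of [18]; the tag format and the filler inventory are assumptions made for this sketch), a naive baseline could strip annotation tags and common fillers from a T1 line before comparing it to the joint T2 reference:

```python
import re

FILLERS = {"uh", "um", "erm", "hmm"}   # assumed filler inventory

def naive_disfluency_removal(t1_line: str) -> str:
    # Drop bracketed/angled annotation tags such as [laughter] (assumed format).
    text = re.sub(r"\[[^\]]*\]|<[^>]*>", " ", t1_line)
    tokens = [t for t in text.split() if t.lower().strip(",.") not in FILLERS]
    return " ".join(tokens)

print(naive_disfluency_removal("um okay [laughter] can you uh hear me?"))
# -> "okay can you hear me?"
```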

4.3 Using T3 data: Conversational Translation

Considering evaluation of machine translation, the main difference from existing test data lies in the conversational nature of the collected data. We are not aware of any data sets which have been produced following the same "bilingual" recording setup. While there are test sets based on conversational speech transcripts, they are typically based on monolingual conversations. Hence, they might not be ideal for testing bilingual (or even multilingual) conversational MT.4 The MSLT data is different here as it puts the focus on such bilingual conversation scenarios, albeit emulating a perfect translation component in the form of speakers understanding the non-English language natively. As this approach represents an upper bound on achievable translation quality (subject to the individual language competency of the speakers), the resulting references are well suited for evaluation of conversational translation.

4There are a host of reasons why this might be true: directionality bias (given that one would be translating content from one language to another in one direction but not the other); unnatural conversational structures, words, and phrases in the target language; no equivalent set of disfluencies one sees in T1 transcripts; etc.

The MSLT corpus data can also be used to evaluate machine translation quality for conversational speech transcripts. To do this, one would use the fluent and segmented T2 transcripts as input segments for an MT system.5 The resulting output data would then be compared to the corresponding T3 references, using automated metrics such as BLEU or human annotation. As directionality matters for MT evaluation, we provide test sets for translation from English as well as for translation into English. It is important to note that the transcripts for these two directions come from different recording sessions conducted by different speakers. As instructions and recording setup were identical, we think that the resulting data represents high-quality test data for the evaluation of conversational MT.
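As one concrete way to run this comparison, the sketch below uses the sacrebleu package (our choice for illustration only; the file names are placeholders, and any standard BLEU implementation such as multi-bleu.perl would serve equally well):

```python
import sacrebleu

# Hypothetical files: one segment per line, MT output aligned to T3 references.
with open("mt_output.en", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("t3_reference.en", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams (here: a single reference).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```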

4.4 End-to-end Speech Translation Systems

Besides testing the performance of components corresponding to the individual annotation layers in the MSLT corpus, its data can also be used for end-to-end testing of speech translation. The setup is straightforward: the system records spontaneous utterances from one or more participants of a conversation. The audio signal is sent to the speech recognition component, which creates disfluent, verbatim transcripts. In a follow-up step, a disfluency removal component removes any disfluencies and separates the input transcript into one or more segments. These segments are fluent and each corresponds to a single "semantic unit", as discussed earlier. In a last step, the fluent segments are translated into the target language. Translation quality is computed based on automated metrics or evaluated using human annotators.
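The pipeline can be summarized with the following stubbed-out sketch (all function names are placeholders for the respective components, not an actual API):

```python
from typing import List

def speech_to_text(audio_path: str) -> str:
    """ASR stub: returns a disfluent, verbatim transcript (T1-like)."""
    raise NotImplementedError

def remove_disfluencies(t1: str) -> List[str]:
    """DR stub: returns fluent segments, one per semantic unit (T2-like)."""
    raise NotImplementedError

def translate(segment: str) -> str:
    """MT stub: translates a fluent segment into the target language (T3-like)."""
    raise NotImplementedError

def speech_translation_pipeline(audio_path: str) -> List[str]:
    t1 = speech_to_text(audio_path)
    t2_segments = remove_disfluencies(t1)
    return [translate(seg) for seg in t2_segments]
```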

4.5 Multimodal Translation

The MSLT corpus data may also be helpful for multimodal translation. This research area has recently seen increasing interest (as demonstrated by shared tasks at WMT 2016 and 2017 [19] and a keynote by Mirella Lapata at ACL 2017) and aims to solve translation problems based on multimodal input. Effectively, our data offers three different input layers (the audio files and the T1/T2 transcripts), all of which are mapped to a single output layer, the T3 translations. It may be possible to build a translation system which uses both the audio signal and the corresponding transcript (likely in a joint, neural network approach) to generate translation output. The quality of such translations can be evaluated using our data set.

5Again, we point to [9] for an example where noisy, disfluent transcripts were used as input to a conversational MT system. In such a setting, the MT system itself would be doing much of the disfluency processing, rather than a separate DR module. The upside of such a technique is that the MT system could produce relevant disfluencies in the target language, given bilingual conversational text data with such disfluencies represented in the data for both languages.


4.6 Evaluation Campaigns using MSLT

MSLT evaluation data for German, French and English was used in the Machine Translation and Speech Recognition tracks at IWSLT 2016.6 Although participants had access to significant amounts of parallel training data, e.g., from the WMT campaigns,7 they had very limited parallel data for training conversational MT systems. Out-of-the-box MT systems trained on WMT data generally did poorly on MSLT and TED lecture test data. However, adapting the base models using held-out TED data showed significant improvements on both the MSLT and TED test sets. A notable example is the Karlsruhe Institute of Technology (KIT) submission to IWSLT 2016 [20], where adaptation led to more than 1.5 BLEU points of improvement on the MSLT corpus, even though, as the authors noted, the MSLT corpus did not exactly match the TED data used for adaptation (which is lecture-focused and less conversational).

                              English           Japanese          Chinese
Annotation                  Test     Dev      Test     Dev      Test     Dev
Speech noise                 200     271     1,445   1,122       722     694
End unspoken                 409     388        43      32        20      15
Non-speech noise             192     235     1,446   1,192       987   1,077
Unintelligible               306     125        92     110       103      76
Language mismatch             12       0         0       0        44      56
Audio spill                    6       0         0       0        11       0
Start unspoken                37      54        10       5         5       1
Annotator unsure              59      81        36      28        24      22
Non-primary speaker           44      68         3       2        36      27
Mispronounced                  3       4        12      20         0       0
Laughter [laughter]          217     192        31      26        55      43

Annotations                1,487   1,418     3,471   2,801     2,057   2,067
Utterances                 3,304   3,052     4,160   3,179     1,285   1,256
Tokens                    42,852  41,450     9,169   4,985     3,751   3,804
Types                     36,318  35,308     8,413   4,964     3,510   3,597

Table 2: Annotation information for our Test and Dev data by source language.

5 Conclusion

We presented a corpus of Chinese and Japanese for end-to-end evaluation of speech translation systems and/or component-level evaluation. In the latter case, the test data consists of component-level data: to test the ASR component (albeit not relevant to MT per se), the corpus has audio data and verbatim transcripts; to test disfluency removal and related processing against the raw transcripts (a necessary component if one wishes to process "raw" transcripts coming from, say, an off-the-shelf ASR engine), the corpus has transcripts that have been cleaned up of disfluencies, pause words, discourse markers, restarts, hesitations, laughter, and any other content not relevant to translation; and to test conversational MT, the corpus has transcripts translated into English. We also provide English source data with the same characteristics, translated into both Chinese and Japanese. This provides data that facilitates research in conversational MT both into and out of these two languages. It should be noted that the conversations recorded for either direction of any given language pair are not semantically contiguous, that is, they do not consist of recordings of the same conversation sessions. This is because the English side of the Chinese and Japanese conversations was discarded due to non-English accents, and all kept English sessions were recorded separately. We believe that the test and dev data we are providing will be of great use to the community interested in developing conversational MT systems in and out of Chinese and Japanese.

6http://workshop2016.iwslt.org
7http://www.statmt.org/

                      Test                          Dev
Language  Type        Segments  Tokens   Types     Segments  Tokens   Types
English   T1 (EN)        3,304  42,852   36,318       3,052  41,450   35,308
          T2 (EN)        5,175  36,388   31,981       5,313  36,184   31,960
          T3 (JA)        5,175  37,324   33,862       5,313  36,409   32,913
          T3 (ZH)        5,175  39,776   35,614       5,313  40,159   35,824
Japanese  T1 (JA)        4,160   9,169    8,413       3,179   7,333    6,689
          T2 (JA)        5,976   6,221    6,205       4,970   4,985    4,964
          T3 (EN)        5,857  36,853   34,461       4,965  28,105   26,399
Chinese   T1 (ZH)        1,285   3,751    3,510       1,256   3,804    3,597
          T2 (ZH)        2,156   2,208    2,205       2,018   2,097    2,097
          T3 (EN)        2,156  15,665   13,920       2,018  14,284   12,628

Table 3: Segments, tokens and types for our Test/Dev data by source language and annotation type.

References

[1] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst, “Moses: Open Source Toolkit for Statistical Machine Translation,” in Proceedings of ACL, Prague, The Czech Republic, 2007.

[2] J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz, and J. Makhoul, “Fast and Robust Neural Network Joint Models for Statistical Machine Translation,” in Proceedings of ACL, Baltimore, Maryland, 2014, pp. 1370–1380.

[3] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Ł. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean, "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation," 2016. [Online]. Available: https://arxiv.org/abs/1609.08144

[4] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proceedings of ICLR, 2015.

[5] F. Seide, G. Li, X. Chen, and D. Yu, "Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription," in Proceedings of ASRU, IEEE, 2011.

[6] W. D. Lewis, C. Federmann, and Y. Xin, "Applying Cross-Entropy Difference for Selecting Parallel Training Data from Publicly Available Sources for Conversational Machine Translation," in Proceedings of IWSLT 2015, Danang, Vietnam, December 2015.

[7] P. Wang and H. T. Ng, "A Beam-Search Decoder for Normalization of Social Media Text with Application to Machine Translation," in Proceedings of NAACL-HLT, Atlanta, Georgia, June 2013, pp. 471–481.

[8] J. Godfrey and E. Holliman, "SWITCHBOARD: Telephone speech corpus for research and development," in Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, San Francisco, March 1992, pp. 517–520.

[9] G. Kumar, M. Post, D. Povey, and S. Khudanpur, "Some insights from translating conversational telephone speech," in Proceedings of ICASSP, Florence, Italy, May 2014. [Online]. Available: http://cs.jhu.edu/~gkumar/papers/kumar2014some.pdf

[10] C. Cieri, D. Graff, O. Kimball, D. Miller, and K. Walker, "Fisher English Training Speech Part 1 Transcripts LDC2004T19," Web Download. Philadelphia: Linguistic Data Consortium, 2004. [Online]. Available: https://catalog.ldc.upenn.edu/LDC2004T19

[11] C. Cieri, D. Graff, O. Kimball, D. Miller, and K. Walker, "Fisher English Training Part 2, Transcripts LDC2005T19," Web Download. Philadelphia: Linguistic Data Consortium, 2005. [Online]. Available: https://catalog.ldc.upenn.edu/LDC2005T19

[12] A. Canavan and G. Zipperlen, "CALLHOME Spanish Speech LDC96S35," Web Download. Philadelphia: Linguistic Data Consortium, 1996. [Online]. Available: https://catalog.ldc.upenn.edu/LDC96S35

[13] B. Wheatley, "CALLHOME Spanish Transcripts LDC96T17," Web Download. Philadelphia: Linguistic Data Consortium, 1996. [Online]. Available: https://catalog.ldc.upenn.edu/LDC96T17

[14] A. Canavan, D. Graff, and G. Zipperlen, "CALLHOME American English Speech LDC97S42," Web Download. Philadelphia: Linguistic Data Consortium, 1997. [Online]. Available: https://catalog.ldc.upenn.edu/LDC97S42

[15] P. Kingsbury, S. Strassel, C. McLemore, and R. McIntyre, "CALLHOME American English Transcripts LDC97T14," Web Download. Philadelphia: Linguistic Data Consortium, 1997. [Online]. Available: https://catalog.ldc.upenn.edu/LDC97T14

[16] T. Takezawa, E. Sumita, F. Sugaya, H. Yamamoto, and S. Yamamoto, "Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversation in the Real World," in Proceedings of LREC 2002, Las Palmas, Spain, 2002.

[17] C. Federmann and W. Lewis, "Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German," in Proceedings of IWSLT 2016, Seattle, Washington, December 2016.

[18] H. Hassan, L. Schwartz, D. Hakkani-Tür, and G. Tur, "Segmentation and disfluency removal for conversational speech translation," in Proceedings of INTERSPEECH 2014, 2014, pp. 318–322.

[19] L. Specia, S. Frank, K. Sima'an, and D. Elliott, "A shared task on multimodal machine translation and crosslingual image description," in Proceedings of the First Conference on Machine Translation, Berlin, Germany: Association for Computational Linguistics, August 2016, pp. 543–553. [Online]. Available: http://www.aclweb.org/anthology/W/W16/W16-2346

[20] E. Cho, J. Niehues, T.-L. Ha, M. Sperber, M. Mediani, and A. Waibel, "Adaptation and Combination of NMT systems: The KIT Translation Systems for IWSLT 2016," in Proceedings of IWSLT 2016, Seattle, Washington, December 2016.


Paying Attention to Multi-Word Expressions in Neural Machine Translation

Matīss Rikters [email protected]
Faculty of Computing, University of Latvia

Ondřej Bojar [email protected]
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

Abstract
Processing of multi-word expressions (MWEs) is a known problem for any natural language processing task, and even neural machine translation (NMT) struggles to overcome it. This paper presents the results of experiments investigating NMT attention allocation to MWEs and improving the automated translation of sentences that contain MWEs in English→Latvian and English→Czech NMT systems. Two improvement strategies were explored: (1) bilingual pairs of automatically extracted MWE candidates were added to the parallel corpus used to train the NMT system, and (2) full sentences containing the automatically extracted MWE candidates were added to the parallel corpus. Both approaches improved automated evaluation results. The best result, a 0.99 BLEU point increase, was reached with the first approach, while the second approach achieved only minimal improvements. We also provide open-source software and tools used for MWE extraction and alignment inspection.

1 Introduction
It is well known that neural machine translation (NMT) has defined the new state of the art in the last few years (Sennrich et al., 2016a; Wu et al., 2016), but many specific aspects of NMT output are not yet well explored. One of these is the translation of multi-word units or multi-word expressions (MWEs). MWEs are defined by Baldwin and Kim (2010) as "lexical items that: (a) can be decomposed into multiple lexemes; and (b) display lexical, syntactic, semantic, pragmatic and/or statistical idiomaticity". MWEs have been a challenge for statistical machine translation (SMT): even if standard phrase-based models can copy MWEs verbatim, they suffer in grammaticality. NMT, on the other hand, may struggle to memorize and reproduce MWEs because it represents the whole sentence in a high-dimensional vector, which can lose the specific meanings of MWEs even in the more fine-grained attention model (Bahdanau et al., 2015), since MWEs may not appear frequently enough in the training data. The goal of this research is to examine how MWEs are treated by NMT systems, compare that with related work in SMT, and find ways to improve MWE translation in NMT. We aimed to compare how NMT pays attention to MWEs during translation, using a test set particularly targeted at the handling of MWEs, and to see whether that can be improved by populating the training data for the NMT systems with parallel corpora of MWEs. The objective was to obtain a comparison of how NMT with regular training data and NMT with synthetic MWE data pays attention to MWEs during the translation process, as well as to improve the final NMT output. To achieve this objective, it was broken down into smaller sub-objectives:
• Train baseline NMT systems,
• Extract parallel MWE corpora from the training data,
• Train the NMT systems with synthetic MWE data, and
• Inspect alignments produced by the NMT systems.
The structure of this paper is as follows: Section 2 summarizes related work in translating MWEs with SMT and NMT. Section 3 describes the architecture of the baseline system and outlines the process of extracting parallel MWE corpora from the training data. Section 4 provides the experiment setup and results. Finally, conclusions and aims for further directions of work are summarized in Section 5.

2 Related Work
There have been several experiments with incorporating separate processing of MWEs in rule-based (Deksne et al., 2008) and statistical machine translation tasks (Bouamor et al., 2012; Skadiņa, 2016). However, there is little literature so far about similar integrations in NMT workflows. Skadiņa (2016) performed a series of experiments on extracting MWE candidates and integrating them in SMT. The author experimented with several different methods for both the extraction of MWEs and the integration of the extracted MWEs into the MT system. In terms of automatic MT evaluation, this achieved an increase of 0.5 BLEU points (Papineni et al., 2002) for an English→Latvian SMT system. Tang et al. (2016) introduce an NMT approach that uses a stored phrase memory in symbolic form. The main difference from traditional NMT is tagging candidate phrases in the representation of the source sentence and forcing the decoder to generate multiple words all at once for the target phrase. Although they do mention MWEs, no identification or extraction of MWEs is performed, and the phrases they mainly focus on are dates, names, numbers, locations, and organizations, which are collected from multiple dictionaries. For Chinese→English they report a 3.45 BLEU point increase over baseline NMT. Cohn et al. (2016) describe an extension of the traditional attentional NMT model with the inclusion of structural biases from word-based alignment models, such as positional bias, Markov conditioning, fertility and agreement over translation directions. They perform experiments translating between English, Romanian, Estonian, Russian and Chinese and analyze the attention matrices of the output translations produced by running experiments using the different biases. Specific experiments targeting MWEs are not performed, but they do point out that using fertility, especially global fertility, can be useful for dealing with multi-word expressions. They report a statistically significant improvement of BLEU scores in almost all involved language pairs. Chen et al. (2016) use a similar approach to ours. Their "bootstrapping" automatically extracts smaller parts of training segment pairs and adds them to the training data for NMT. The main difference is that they rely on automatic word alignment and punctuation in the sentence to identify matching sub-segments.

3 Data Preparation and Systems Used To measure changes introduced by adding synthetic MWE data to the training corpora, first, a baseline NMT system was trained for each language pair. The experiments were conducted on English→Czech and English→Latvian translation directions.

Figure 1: Portions of the final training data set for English→Czech

Figure 2: Portions of the final training data set for English→Latvian

3.1 Baseline NMT System
To be able to compare the results with other MT systems, training and development corpora were used from the WMT shared tasks: data from the News Translation Task1 for English→Latvian and data from the Neural MT Training Task2 (Bojar et al., 2017) for English→Czech. The English→Czech data consists of about 49 million parallel sentence pairs and the English→Latvian of about 4.5 million. The development corpora consist of 2003 sentences for English→Latvian and 6000 for English→Czech. Neural Monkey (Helcl and Libovický, 2017), an open-source tool for sequence learning, was used to train the baseline NMT systems. Using the configuration provided by the WMT Neural MT Training Task organizers, the baseline reached 11.29 BLEU points for English→Latvian after having seen 23 million sentences in about 5 days and 13.71 BLEU points for English→Czech after having seen 18 million sentences in about 7 days.

3.2 Extraction of Parallel MWEs
To extract MWEs, the corpora were first tagged with morphological taggers: UDPipe for English and Czech, and LV Tagger (Paikens et al., 2013) for Latvian. After that, the tagged corpora were processed with the Multi-word Expressions toolkit (Ramisch, 2012) and finally aligned with MPAligner (Pinnis, 2013), with intermediate pre-processing and post-processing done by a set of custom tools. To extract MWEs from the corpora with the MWE Toolkit, patterns were required for each of the involved languages. Patterns from Skadiņa (2016) were used for Latvian (210 patterns) and English (57 patterns), and patterns from Majchráková et al. (2012) and Pecina (2008) for Czech (23 patterns). This workflow allowed us to extract a parallel corpus of about 400 000 multi-word expressions for English→Czech and about 60 000 for English→Latvian. For an extension of this experiment, all sentences containing these MWEs were also extracted from the training corpus, serving as a separate parallel corpus.
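To give a concrete sense of what pattern-based candidate extraction does, here is a toy sketch (this is not the MWE Toolkit, and the two bigram POS patterns are made-up examples; the real pattern sets are far richer, as the counts above indicate):

```python
# Toy POS-pattern matcher for bigram MWE candidates (illustrative only).
PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}   # assumed example patterns

def extract_candidates(tagged_sentence):
    """tagged_sentence: list of (token, pos) pairs."""
    candidates = []
    for (w1, p1), (w2, p2) in zip(tagged_sentence, tagged_sentence[1:]):
        if (p1, p2) in PATTERNS:
            candidates.append(f"{w1} {w2}")
    return candidates

print(extract_candidates([("electronic", "ADJ"), ("wall", "NOUN"), ("map", "NOUN")]))
# -> ['electronic wall', 'wall map']
```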

4 Experiments We experiment with two forms of the presentation of MWEs to the NMT system: (1) we add only the parallel MWEs themselves, each pair forming a new “sentence pair” in the parallel corpus, and (2) we use full sentences containing the MWEs. We denote the approaches “MWE phrases” and “MWE sents.” in the following.

4.1 Training Corpus Layout
In both cases, we use the same training corpus layout: we mix the baseline parallel corpus with synthetic data so that MWEs get more exposure to the neural network during training and hopefully allow NMT to learn to translate them better.

1http://www.statmt.org/wmt17/translation-task.html
2http://www.statmt.org/wmt17/nmt-training-task/

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 88 Languages En→Cs En→Lv Dataset Dev MWE Dev MWE Baseline 13.71 10.25 11.29 9.32 +MWE phrases - - 11.94 10.31 +MWE sents. 13.99 10.44 - -

Table 1: Experiment results.

Figure 3: Automatic evaluation progression of En→Cs experiments on validation data. Orange – baseline; blue – baseline with added MWEs.
Figure 4: Automatic evaluation progression of En→Lv experiments on validation data. Orange – baseline; purple – baseline with added MWE sentences.

Figure 1 and Figure 2 illustrate how the training data was divided into portions. The block 1xMWE corresponds to the full set of extracted MWEs (400K for En→Cs, 60K for En→Lv) and 2xMWE corresponds to two copies of the set (800K for En→Cs, 120K for En→Lv). For En→Lv the full corpus was used. For En→Cs we used only the first 15M sentences to be able to train multiple epochs on the available hardware. The MWEs get repeated five times in both language pairs. By doing this, the En→Cs data set was reduced from 49M to 17M and the En→Lv data set increased to 4.8M parallel sentences for one epoch of training. While the experiments were running, early stopping of the training was applied and snapshots of the models were taken for evaluation at stages where the models were already starting to converge. For En→Lv this was after the networks had been trained on 25M sentences (i.e., 5.2 epochs of the mixed corpus), for En→Cs after 27M sentences (i.e., 1.6 epochs). Neural Monkey does not shuffle the training corpus between epochs. This is not a problem if the corpus is properly shuffled beforehand and the number of epochs is not very large relative to the size of an epoch. We shuffled only the baseline corpus and then interleaved it with (shuffled) sections of MWEs. This worked well when MWEs were provided in full sentences, but not with MWEs presented as bare expressions: in the latter case, the NMT started to produce only very short output, losing much of its performance. We, therefore, shuffle the whole composed corpus for the "MWE phrases" runs, effectively discarding the interleaved composition of the training data.
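A minimal sketch of this corpus composition is given below (function and variable names are ours; the exact block layout of Figures 1 and 2 is simplified here to evenly interleaved MWE blocks):

```python
import random

def mix_corpus(baseline, mwes, copies=5, shuffle_all=False):
    """Interleave a shuffled baseline corpus with repeated (shuffled) MWE blocks.

    baseline, mwes: lists of (source, target) string pairs.
    copies: how many times the MWE block is inserted overall.
    shuffle_all: shuffle the composed corpus, as done for the "MWE phrases" runs.
    """
    baseline = list(baseline)
    random.shuffle(baseline)
    chunk = max(1, len(baseline) // copies)
    mixed = []
    for i in range(copies):
        mixed.extend(baseline[i * chunk:(i + 1) * chunk])
        block = list(mwes)
        random.shuffle(block)
        mixed.extend(block)
    mixed.extend(baseline[copies * chunk:])
    if shuffle_all:
        random.shuffle(mixed)
    return mixed
```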

4.2 Results
Table 1 shows the results for both approaches and both language pairs. Due to hardware constraints, we were not able to try out both approaches on both language pairs. We evaluate all setups with BLEU (Papineni et al., 2002) on the full development set (distinct from the training set), as shown in the column "Dev", and on a subset of 611 (En→Lv) and 112 (En→Cs) sentences containing the identified MWEs (column "MWE").

Figure 5: Differences in translation between baseline and improved NMT system. Improving n-grams are highlighted in green and worsening n-grams in red.

Figure 6: Differences in translation of a Czech sentence using baseline and improved NMT systems. Improving n-grams are highlighted in green and worsening n-grams — in red.

Figures 3 and 4 illustrate the learning curves in terms of millions of sentences, as evaluated on the full development set. We see that the difference on the whole development set is not very big for either of the languages, and that it fluctuates as training progresses. The improvement is more apparent when evaluated on the dedicated devset of sentences containing multi-word expressions. The improvement for Latvian reaches 0.99 BLEU, but, arguably, the baseline performance of our system is not very high. Also, more runs should be carried out for full confidence, but this was unfortunately beyond our computing resources.

4.3 Manual Inspection
To find out whether the changes in the results are due to the added synthetic MWE corpora, a subset of the output sentences containing MWEs was selected for closer examination. For this task, we used the iBLEU (Madnani, 2011) tool. In Figure 5, an improvement in the modified NMT translation is visible due to the treatment of the compound nominal "city bus" as a single expression. It seems that the baseline system translates "city" into "městě" and "bus" into "autobuse" individually, resulting in the wrong form of "city" in Czech (a noun used instead of an adjective). On the other hand, the improved NMT translates "city" into "městském", just like the target human translation. Attention alignments will be examined in the following section. Figure 6 shows an example where the improved NMT scores higher in BLEU points and translates the MWE closer to the human reference, but loses a part of it in the process. While translating the noun phrase "electronic wall map", the improved system generates a closer match to the human translation "elektronické mapě", but it does not translate the word "wall", which was translated into "stěny" by the baseline system. Upon closer inspection, we discovered that this error was caused by the MWE extractor and aligner, because the identified English phrase "electronic wall map" was aligned to an identified Czech phrase "elektronické mapě", and the whole phrase "nástěnné elektronické mapě" was not identified by the MWE extractor at all.

Source: It should be noted that this is not the first time that Facebook has been actively involved in determining what network users see in their news feeds.
Baseline: Jāatzīmē, ka šis nav pirmajā reizē, kad Facebook ir aktīvi iesaistīta, nosakot to, ko tīklā izmanto viņu ziņu pārraides.
Improved NMT: Ir jāatzīmē, ka šis ir pirmā reize, kad Facebook ir aktīvi iesaistījusies, nosakot to, ko tīkla lietotāji dara viņu ziņu formātā.
Reference: Jāteic, ka šī nav pirmā reize, kad Facebook aktīvi iesaistās, nosakot, ko tīkla lietotāji redz savās jaunumu plūsmās.

Figure 7: Differences in translation between baseline and improved NMT system. Improving n-grams are highlighted in green and worsening n-grams — in red.

Figure 9: Fragment of soft alignments of the example sentence from the baseline NMT system.
Figure 10: Fragment of soft alignments of the example sentence from the improved NMT system.

Figure 7 illustrates translations of an example sentence by the En→Lv NMT systems. The MWE in this case is "network users", which is translated as "tīkla lietotāji" by the modified system and completely mistranslated by the baseline.

4.4 Alignment Inspection
For inspecting the NMT attention alignments, we developed a tool (Rikters et al., 2017) that takes as input the data produced by Neural Monkey (a 3D array, or tensor, filled with the alignment probabilities, together with the source and target subword units (Sennrich et al., 2016b), i.e., byte pair encodings or BPEs) and produces a soft alignment matrix (Figure 8) of the subword units that highlights all units that get attention when translating a specific subword unit. The tool includes a web version that was adapted from Nematus (Sennrich et al., 2017) utilities and slightly modified. It allows outputting the soft alignments from a different perspective, as connections between BPEs, as visible in Figure 9 and Figure 10.
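As an illustration of what such an inspection tool renders, a minimal sketch that draws a soft alignment matrix from an attention weight array could look as follows (matplotlib is our choice for illustration; this is not the released tool itself):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_soft_alignment(attention, src_units, tgt_units):
    """attention: 2-D array, rows = target subword units, cols = source subword units."""
    fig, ax = plt.subplots()
    ax.imshow(np.asarray(attention), cmap="Greys", aspect="auto")
    ax.set_xticks(range(len(src_units)))
    ax.set_xticklabels(src_units, rotation=90)
    ax.set_yticks(range(len(tgt_units)))
    ax.set_yticklabels(tgt_units)
    ax.set_xlabel("source BPE units")
    ax.set_ylabel("target BPE units")
    plt.tight_layout()
    plt.show()
```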

Source: Just like in a city bus or a tram.
Baseline: Jako ve městě autobuse nebo tramvaji.
Improved NMT: Jen jako v městském autobuse nebo tramvaji.
Reference: Stejně jako v městském autobuse či tramvaji.

Figure 11: Soft alignment example visualizations from translating an English sentence into Czech from the baseline (top, hypothesis 1) and improved (bottom, hypothesis 2) NMT systems.

Figure 8: Example of a soft alignment matrix.

In these examples, the attention state of the previously mentioned MWE from the En→Lv translations ("network users") is visible. The alignment inspection tool allows one to see that the baseline NMT in Figure 9 has multiple faded alignment lines for both words "network" and "users", which indicates that the neural network is unsure and looking all around for traces of the correct translation. However, in Figure 10, it is visible that both these words have strong alignment lines to the words "tīkla lietotāji", which were also identified by the MWE Toolkit as an MWE candidate. Figure 11 shows one of the previously mentioned En→Cs translation examples. Here it is clear that in the baseline alignment no attention goes to the word "městě" or the subword units "autobu@@" and "se" when translating "city". In the modified version, on the other hand, some attention from "city" goes to all closely related subword units: "měst@@", "ském", "autobu@@", and "se". It is also visible that in this example the translation of "bus" gets attention not only from "autobu@@" and "se" but also from the ending subword unit of "city", i.e., the token "ském".

5 Conclusion
In this paper, we described the first experiments with handling multi-word expressions in neural machine translation systems. Details on identifying and extracting MWEs from parallel corpora, as well as on aligning them and building corpora of parallel MWEs, were provided. We explored two methods of integrating MWEs in training data for NMT and examined the output translations of the trained NMT systems with custom-built tools for alignment inspection. In addition to the methods described in this paper, we also released open-source scripts for a complete workflow of identifying, extracting and integrating MWEs into the NMT training and translation workflow. While the experiments did not show outstanding improvements on the general development data set, an increase of 0.99 BLEU was observed when using an MWE-specific test data set. Manual inspection of the output translations confirmed that translations of specific MWEs improved after populating the training data with synthetic MWE data. As the next steps, we plan (1) to analyze the obtained results of our experiments in more detail with the help of a larger-scale manual human evaluation of the NMT output, and (2) to continue experiments to find the best ways to treat different categories of MWEs, e.g., idioms.

Acknowledgement
This study was supported in part by the grant H2020-ICT-2014-1-645442 (QT21), the ICT COST Action IC1207 ParseME (Parsing and multi-word expressions. Towards linguistic precision and computational efficiency in natural language processing), and Charles University Research Programme "Progres" Q18 – Social Sciences: From Multidisciplinarity to Interdisciplinarity.

References
Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representations (ICLR).

Baldwin, T. and Kim, S. N. (2010). Multiword expressions. In Handbook of Natural Language Processing, Second Edition, pages 267–292. Chapman and Hall/CRC.

Bojar, O., Helcl, J., Kocmi, T., Libovický, J., and Musil, T. (2017). Results of the WMT17 Neural MT Training Task. In Proceedings of the Second Conference on Machine Translation (WMT17), Copenhagen, Denmark.

Bouamor, D., Semmar, N., and Zweigenbaum, P. (2012). Identifying bilingual multi-word expressions for statistical machine translation. In LREC, pages 674–679.

Chen, W., Matusov, E., Khadivi, S., and Peter, J.-T. (2016). Guided alignment training for topic-aware neural machine translation. AMTA 2016, page 121.

Cohn, T., Hoang, C. D. V., Vymolova, E., Yao, K., Dyer, C., and Haffari, G. (2016). Incorporating structural alignment biases into an attentional neural translation model. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 876–885, San Diego, California. Association for Computational Linguistics.

Deksne, D., Skadins, R., and Skadina, I. (2008). Dictionary of multiword expressions for translation into highly inflected languages. In LREC.

Helcl, J. and Libovický, J. (2017). Neural Monkey: An open-source tool for sequence learning. The Prague Bulletin of Mathematical Linguistics, (107):5–17.

Madnani, N. (2011). iBLEU: Interactively debugging and scoring statistical machine translation systems. In Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on, pages 213–214. IEEE.

Majchráková, D., Dušek, O., Hajič, J., Karčová, A., and Garabík, R. (2012). Semi-automatic detection of multiword expressions in the Slovak dependency treebank.

Paikens, P., Rituma, L., and Pretkalnina, L. (2013). Morphological analysis with limited resources: Latvian example. In Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22–24, 2013, Oslo University, Norway. NEALT Proceedings Series 16, number 085, pages 267–277. Linköping University Electronic Press.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Pecina, P. (2008). Reference data for Czech collocation extraction. In Proc. of the LREC Workshop Towards a Shared Task for MWEs (MWE 2008), pages 11–14.

Pinnis, M. (2013). Context independent term mapper for European languages. In RANLP, pages 562–570.

Ramisch, C. (2012). A generic framework for multiword expressions treatment: from acquisition to applications. In Proceedings of ACL 2012 Student Research Workshop, pages 61–66. Association for Computational Linguistics.

Rikters, M., Fishel, M., and Bojar, O. (2017). Visualizing Neural Machine Translation Attention and Confidence. The Prague Bulletin of Mathematical Linguistics, 109.

Sennrich, R., Firat, O., Cho, K., Birch, A., Haddow, B., Hitschler, J., Junczys-Dowmunt, M., Läubli, S., Miceli Barone, A. V., Mokry, J., and Nadejde, M. (2017). Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016a). Edinburgh neural machine translation systems for WMT 16. In Proceedings of the First Conference on Machine Translation, pages 371–376, Berlin, Germany. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016b). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Skadiņa, I. (2016). Multi-word expressions in English-Latvian machine translation. In Human Language Technologies – The Baltic Perspective: Proceedings of the Seventh International Conference Baltic HLT 2016, volume 289, page 97. IOS Press.

Tang, Y., Meng, F., Lu, Z., Li, H., and Yu, P. L. H. (2016). Neural machine translation with external phrase memory. CoRR, abs/1606.01792.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages

Raj Dabre [email protected]
Graduate School of Informatics, Kyoto University, Kyoto, Japan
Fabien Cromieres [email protected]
Japan Science and Technology Agency, Saitama, Japan
Sadao Kurohashi [email protected]
Graduate School of Informatics, Kyoto University, Kyoto, Japan

Abstract
In this paper, we explore a simple solution to "Multi-Source Neural Machine Translation" (MSNMT) which relies only on preprocessing an N-way multilingual corpus, without modifying the Neural Machine Translation (NMT) architecture or training procedure. We simply concatenate the source sentences to form a single long multi-source input sentence, while keeping the target-side sentence as it is, and train an NMT system using this preprocessed corpus. We evaluate our method in resource-poor as well as resource-rich settings and show its effectiveness (up to 4 BLEU using 2 source languages and up to 6 BLEU using 5 source languages). We also compare against existing methods for MSNMT and show that our solution gives competitive results despite its simplicity. We also provide some insights into how the NMT system leverages multilingual information in such a scenario by visualizing attention.

1 Introduction
Multi-Source Machine Translation (MSMT) (Och and Ney, 2001) is an approach that allows one to leverage source sentences in multiple languages to improve translations into a target language. Typically, N-way (or N-lingual) corpora are used for MSMT. N-way corpora are those in which translations of the same sentence exist in N different languages. This setting is realistic and has many applications. For example, the European Parliament maintains its proceedings in 21 languages. In Spain, international news companies write news articles in English as well as Spanish. One can then utilize the same sentence written in two different languages, like Spanish and English, to translate into a third language, like Italian, by utilizing a large English-Spanish-Italian trilingual corpus. Neural machine translation (NMT) (Bahdanau et al., 2015; Cho et al., 2014; Sutskever et al., 2014) enables one to train an end-to-end system without the need to deal with word alignments, translation rules and complicated decoding algorithms, which are characteristic of phrase-based statistical machine translation (PBSMT) systems. However, it is reported that NMT works better than PBSMT only when there is an abundance of parallel corpora. In a low-resource scenario, vanilla NMT is either worse than or comparable to PBSMT (Zoph et al., 2016). Multi-Source Neural Machine Translation (MSNMT) involves using NMT for MSMT.

Two major approaches for Multi-Source NMT (MSNMT) have been explored, namely the multi-encoder approach (ME/me; Zoph and Knight, 2016) and multi-source ensembling (ENS/ens; Garmash and Monz, 2016; Firat et al., 2016). The multi-encoder approach involves extending the vanilla NMT architecture to have an encoder for each source language, leading to larger models. On the other hand, the ensembling approach is simpler since it involves training multiple bilingual NMT models, each with a different source language but the same target language. We have discovered that there is an even simpler way to do MSNMT. We explore a new simplified end-to-end method that avoids the need to modify the NMT architecture as well as the need to learn an ensemble function. We simply concatenate the source sentences, leading to a parallel corpus where the source side is a long multilingual sentence and the target side is a single sentence which is the translation of the aforementioned multilingual sentence. This corpus is then fed to any NMT training pipeline whose output is a multi-source NMT model. The main contributions of this paper are as follows:
• Exploring a simple preprocessing step that allows for Multi-Source NMT (MSNMT) without any change to the NMT architecture.1
• An exhaustive study of how the approach works in a resource-poor as well as a resource-rich setting.
• An analysis of how gains in translation quality are correlated with language similarity in a multi-source scenario.
• An empirical comparison of our approach against two existing methods (Zoph and Knight, 2016; Firat et al., 2016) for MSNMT.
• An analysis of how NMT gives more importance to certain linguistically closer languages while doing multi-source translation, by visualizing attention vectors.

2 Related Work
One of the first studies on multi-source MT (Och and Ney, 2001) explored how word-based SMT systems would benefit from multiple source languages. Although effective, it suffered from a number of limitations that classic word- and phrase-based SMT systems have, including the inability to perform end-to-end training. The work on multi-encoder multi-source NMT (Zoph and Knight, 2016) is the first multi-source NMT approach, which focused on utilizing French and German as source languages to translate into English. However, their method led to models with substantially larger parameter spaces and they did not experiment with many languages. Moreover, since the encoders for each source language are separate, it is difficult to explore how the source languages contribute to the improvement in translation quality. Multi-source ensembling using a multilingual multi-way NMT model (Firat et al., 2016) is an end-to-end approach but requires training a very large and complex NMT model. The work on multi-source ensembling which uses separately trained single-source models (Garmash and Monz, 2016) is comparatively simpler in the sense that one does not need to train additional NMT models, but the approach is not truly end-to-end since it needs an ensemble function to be learned. This method also helps eliminate the need for N-way corpora, which allows one to exploit bilingual corpora that are larger in size. In all cases one ends up with either one large model or many small models for which an ensemble function needs to be learned. Other related works include Transfer Learning (Zoph et al., 2016) and Zero-Shot NMT (Johnson et al., 2016), which help improve NMT performance for low-resource languages. Finally, it is important to note works that involve the creation of N-way corpora: the United Nations (Ziemski et al., 2016), Europarl (Koehn, 2005), TED Talks (Cettolo et al., 2012), ILCI (Jha, 2010) and Bible (Christodouloupoulos and Steedman, 2015) corpora.

1One additional benefit of our approach is that any NMT architecture can be used, be it attention based or hierarchical NMT.

Figure 1: The Multi-Source NMT Approach We Explored.

3 Our Method

Refer to Figure 1 for an overview of our method, which is as follows. For each target sentence, concatenate the corresponding source sentences, leading to a parallel corpus where the source sentence is a very long sentence that conveys the same meaning in multiple languages. An example line in such a corpus would be: source: "Hello Bonjour Namaskar Kamusta Hallo" and target: "konnichiwa". The 5 source languages here are English, French, Marathi, Filipino and Luxembourgish, whereas the target language is Japanese. In this example each source sentence is a single word conveying "Hello" in a different language. Note that there are no delimiters between the individual source sentences, since we expect the NMT system to figure out the sentence boundaries by itself. We romanize the Marathi and Japanese words for readability. Optionally, one can perform additional processing, like Byte Pair Encoding (BPE), to overcome data sparsity and eliminate the unknown word rate. The preprocessed training corpus is then used to learn an NMT model with any off-the-shelf NMT toolkit. The order of the sentences belonging to different languages is kept the same in the training, development and test sets.
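A minimal sketch of this preprocessing step is given below (the file handling and function names are ours for illustration, not part of any toolkit):

```python
def make_multisource_corpus(source_files, target_file, out_src, out_tgt):
    """Concatenate line-aligned source sentences from several languages into one
    long multi-source sentence per line (no delimiters), keeping the target as-is."""
    sources = [open(path, encoding="utf-8") for path in source_files]
    with open(target_file, encoding="utf-8") as tgt, \
         open(out_src, "w", encoding="utf-8") as out_s, \
         open(out_tgt, "w", encoding="utf-8") as out_t:
        for lines in zip(*(sources + [tgt])):
            *src_lines, tgt_line = lines
            out_s.write(" ".join(s.strip() for s in src_lines) + "\n")
            out_t.write(tgt_line.strip() + "\n")
    for f in sources:
        f.close()

# Hypothetical usage for a trilingual (Fr, De -> En) setup:
# make_multisource_corpus(["train.fr", "train.de"], "train.en",
#                         "train.multisrc", "train.tgt")
```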

3.1 Other methods for comparison
3.1.1 Multi-Encoder Multi-Source Method
This method was proposed by Zoph and Knight (2016). The main idea is to have an encoder for each source language and to concatenate the encoding information before feeding it to the decoder. We use the variant where attention is computed for both source languages and this multi-source attention is fed to the decoder to predict a target word.

3.1.2 Multi-Source Ensembling Method
This method was proposed by Firat et al. (2016) and relies on a single multilingual NMT model with separate encoders and decoders for each source and target language. All encoders and decoders share a single attention mechanism. To perform multi-source translation, the model is fed source sentences in different languages and the logits are averaged (ensembling) before computing the softmax to predict a target word. Since a multilingual multi-way model is difficult and time-consuming to train, we rely on separately trained models for each source language and ensemble them without learning an ensemble function.

4 Experimental Settings
All of our experiments were performed using an encoder-decoder NMT system with attention for the various baselines and multi-source experiments. In order to enable an open vocabulary and reduce data sparsity, we use the Byte Pair Encoding (BPE) based word segmentation approach (Sennrich et al., 2016). However, we perform a slight modification to the original code: instead of specifying the number of merge operations manually, we specify a desired vocabulary size, and the BPE learning process automatically stops after it has learned enough rules to obtain the prespecified vocabulary size. We prefer this approach since it allows us to learn a minimal model, and it resembles the way Google's NMT system (Wu et al., 2016) works with the Word Piece Model (WPM) (Schuster and Nakajima, 2012). We evaluate our models using the standard BLEU metric (Papineni et al., 2002)2 on the translations of the test set. Baseline models are single-source models.

4.1 Languages and Corpora Settings

Corpus type   Languages            train    dev2010   tst2010 / tst2013
3-lingual     Fr, De, En           191381   880       1060 / 886
4-lingual     Fr, De, Ar, En       84301    880       1059 / 708
5-lingual     Fr, De, Ar, Cs, En   45684    461       1016 / 643

Table 1: Statistics for the N-lingual corpora extracted from the IWSLT corpus for the languages French (Fr), German (De), Arabic (Ar), Czech (Cs) and English (En).

All of our experiments were performed using the publicly available ILCI3 (Jha, 2010), United Nations6 (Ziemski et al., 2016) and IWSLT7 (Cettolo et al., 2015) corpora. The ILCI corpus is a 6-way multilingual corpus spanning the languages Hindi, English, Tamil, Telugu, Marathi and Bengali, and was provided as part of the task. The target language is Hindi and thus there are 5 source languages. The training, development and test sets contain 45600, 1000 and 2400 6-lingual sentences respectively8. Hindi, Bengali and Marathi are Indo-Aryan languages, Telugu and Tamil are Dravidian languages, and English is a European language. In this group English is the farthest from Hindi, grammatically speaking, whereas Marathi is the closest to it. Morphologically speaking, Bengali is closer to Hindi than Marathi (which has agglutinative suffixes), but Marathi and Hindi share the same script and they also share more cognates compared to the other languages. It is natural to expect that translating from Bengali and Marathi to Hindi should give Hindi sentences of higher quality than those obtained by translating from the other languages, and thus using these two languages as source languages in multi-source approaches should lead to significant improvements in translation quality. We verify this hypothesis by exhaustively trying all language combinations. The IWSLT corpus is a collection of 4 bilingual corpora spanning 5 languages where the target language is English: French-English (234992 lines), German-English (209772 lines), Czech-English (122382 lines) and Arabic-English (239818 lines). Linguistically speaking, French and German are the closest to English, followed by Czech and Arabic. In order to obtain N-lingual

2This is computed by the multi-bleu.pl script, which can be downloaded from the public implementation of Moses (Koehn et al., 2007).
3This was used for the Indian Languages MT task in ICON 20144 and 20155.
6https://conferences.unite.un.org/uncorpus
7https://wit3.fbk.eu/mt.php?release=2016-01
8In the task there are 3 domains: health, tourism and general. However, we focus on the general domain, in which half the corpus comes from the health domain and the other half comes from the tourism domain.

sentences, we only keep the sentence pairs from each corpus such that the English sentence is present in all the corpora. From the given training data we extract trilingual (French, German and English), 4-lingual (French, German, Arabic and English) and 5-lingual corpora. Similarly, we extract 3-, 4- and 5-lingual development and test sets. The IWSLT corpus (downloaded from the link given above) comes with a development set called dev2010 and test sets named tst2010 to tst2013 (one for each year from 2010 to 2013). Unfortunately, only the tst2010 and tst2013 test sets are N-lingual. Refer to Table 1, which contains the number of lines of training, development and test sentences we extracted.

The UN corpus spans 6 languages: French, Spanish, Arabic, Chinese, Russian and English. Although there are 11 million 6-lingual sentences, we use only 2 million for training, since our purpose was not to train the best system but to show that our method works in a resource-rich situation as well. The development and test sets provided contain 4000 lines each and are also available as 6-lingual sentences. We chose English to be the target language and focused on Spanish, French, Arabic and Russian as source languages. Due to a lack of computational facilities we only worked with the following source language combinations: French and Spanish, French and Russian, French and Arabic, and Russian and Arabic.

4.2 NMT Systems and Model Settings

For training the various NMT systems, we used the open source KyotoNMT toolkit9 (Cromieres et al., 2016). KyotoNMT implements an attention-based encoder-decoder (Bahdanau et al., 2015) with slight modifications to the training procedure. We modify the NMT implementation in KyotoNMT to enable multi-encoder multi-source NMT (Zoph and Knight, 2016). Since the NMT model architecture used in Zoph and Knight (2016) is slightly different from the one in KyotoNMT, the multi-encoder implementation is not identical (but is equivalent) to the one in the original work. For the rest of the paper, "baseline" systems indicate single-source NMT models trained on bilingual corpora. We train and evaluate the following NMT models:

• One source to one target.
• N sources to one target using our proposed multi-source approach.
• N sources to one target using the multi-encoder multi-source approach (Zoph and Knight, 2016).
• N sources to one target using the multi-source ensembling approach that late averages10 (Firat et al., 2016) N one-source-to-one-target models11.

The model and training details are as follows:

• BPE vocabulary size: 8k12 (separate models for source and target) for the ILCI and IWSLT corpora settings and 16k for the UN corpus setting. When training the BPE model for the source languages we learn a single shared BPE model. In the case of languages that use the same script, this allows for cognate sharing, thereby reducing the overall vocabulary size.
• Embeddings: 620 nodes.
• RNN (Recurrent Neural Network) for encoders and decoders: LSTM with 1 layer and 1000-node output. Each encoder is a bidirectional RNN.
• In the case of multiple encoders, one for each language, each encoder has its own separate vocabulary.
• Attention: 500-node hidden layer. In the case of the multi-encoder approach there is a separate attention mechanism per encoder.

9https://github.com/fabiencro/knmt
10Late averaging implies averaging the logits of multiple decoders before computing the softmax to predict the target word.
11In the original work a single multilingual multi-way NMT model was trained and ensembled, but we train separate NMT models for each source language.
12We also try vocabularies of size 16k and 32k, but they take longer to train and overfit badly in a low-resource setting.

• Batch size: 64 for single source, 16 for 2 sources, and 8 for 3 sources and above in the IWSLT and ILCI corpora settings; 32 for single source and 16 for 2 sources in the UN corpus setting.
• Training steps: 10k13 for 1 source, 15k for 2 sources and 40k for 5 sources when using the IWSLT and ILCI corpora; 200k for 1 source and 400k for 2 sources in the UN corpus setting, to ensure that in both cases the models get saturated with respect to their learning capacity.
• Optimization algorithm: Adam with an initial learning rate of 0.01.
• Choosing the best model: evaluate the model on the development set and select the one with the best BLEU (Papineni et al., 2002) after reversing the BPE segmentation on the output of the NMT model.
• Beam size for decoding: 1614.

We train and evaluate the following NMT models using the ILCI corpus:

• One source to one target: 5 models (baselines).
• Two sources to one target: 10 models (5 source languages, choosing 2 at a time).
• Five sources to one target: 1 model.

In this setting, we also calculate the lexical similarity15 between the languages involved using the Indic NLP Library16. The objective is to determine whether or not lexical similarity, which is also one of the indicators of linguistic similarity and hence of translation quality (Kunchukuttan and Bhattacharyya, 2016), is also an indicator of how well two source languages work together.

In the IWSLT corpus setting we did not try various combinations of source languages as we did in the ILCI corpus setting. We train and evaluate the following NMT models for each N-lingual corpus:

• One source to one target: N-1 models (baselines; 2 for the trilingual corpus, 3 for the 4-lingual corpus and 4 for the 5-lingual corpus).
• N-1 sources to one target: 3 models (1 for trilingual, 1 for 4-lingual and 1 for 5-lingual).

Similarly, for the UN corpus setting we only tried the following one-source-to-one-target models: French-English, Russian-English, Spanish-English and Arabic-English. The two-source combinations we tried were French+Spanish, French+Arabic, French+Russian and Russian+Arabic. The target language is English. For the ILCI corpus setting, Table 2 contains the BLEU scores for all the settings and lexical similarity scores for all combinations of source languages, two at a time. The caption contains a complete description of the table. The last row of Table 2 contains the BLEU scores for the multi-source setting that uses all 5 source languages. For the results of the IWSLT corpus setting, refer to Table 3. Finally, refer to Table 4 for the UN corpus setting.

4.3 Analysis

4.3.1 Main findings

From Tables 2, 3 and 4 it is clear that our simple source sentence concatenation based approach (under the columns labeled "our") is able to leverage multiple languages, leading to significant improvements compared to the BLEU scores obtained using any of the individual

13We observed that the models start overfitting around 7k-8k iterations.
14We performed evaluation using beam sizes 4, 8, 12 and 16, but found that the differences in BLEU between beam sizes 12 and 16 are small and the gains in BLEU for beam sizes beyond 16 are insignificant.
15https://en.wikipedia.org/wiki/Lexical_similarity
16http://anoopkunchukuttan.github.io/indic_nlp_library

Single-source baselines (BLEU for XX-Hi, with lexical similarity to Hindi in parentheses):
Bn 19.14 (0.52)   En 11.08 (0.20)   Mr 24.60 (0.51)   Ta 10.37 (0.30)   Te 16.55 (0.42)

Two source languages to Hindi:

Source pair   our     ens     me      sim
Bn + En       20.70   19.45   19.10   0.18
Bn + Mr       29.02   30.10   27.33   0.46
Bn + Ta       19.85   20.79   18.26   0.30
Bn + Te       22.73   24.83   22.14   0.39
En + Mr       25.56   23.06   26.01   0.20
En + Ta       14.03   15.05   13.30   0.18
En + Te       18.91   19.68   17.53   0.20
Mr + Ta       25.64   24.70   23.79   0.33
Mr + Te       27.62   28.00   26.63   0.43
Ta + Te       18.14   19.11   17.34   0.38

All five source languages: our 31.56   ens 30.29   me 28.31

Table 2: ILCI corpus results: BLEU scores for the two source to one target setting for all language combinations and for five source to one target using the ILCI corpus. The languages are Bengali (Bn), English (En), Marathi (Mr), Tamil (Ta), Telugu (Te) and Hindi (Hi). Each language is accompanied by the BLEU score for translating to Hindi from that language and its lexical similarity with Hindi. Each two-source row gives the BLEU scores using a. our proposed approach (our), b. the multi-source ensembling approach (ens) and c. the multi-encoder multi-source approach (me), together with d. the lexical similarity between the two source languages (sim). The best overall BLEU score (31.56) is obtained by our approach using all five sources. The train, dev and test split sizes are 45600, 1000 and 2400 lines respectively.

source languages. The ensembling (under the columns labeled "ens") and the multi-encoder (under the columns labeled "me") approaches also lead to improvements in BLEU. Note that in every single case, the gains in BLEU are statistically significant regardless of the method used. It should be noted that in a resource-poor scenario ensembling generally outperforms all other approaches, but in a resource-rich scenario our method as well as the multi-encoder method are much better. However, the comparison with the ensembling method is unfair to our method, since the former uses N times more parameters than the latter. Indeed, one important aspect of our approach is that the model size for the multi-source systems is the same as that of the single-source systems, since the vocabulary sizes are exactly the same. The multi-encoder systems involve more parameters, whereas the ensembling approach does not allow the source languages to truly interact with each other.

4.3.2 Correlation between linguistic similarity and gains using multiple sources

In the case of the ILCI corpus setting (Table 2), it is clear that no matter which source languages are combined, the BLEU scores are higher than those given by the single-source systems. Marathi and Bengali are the closest to Hindi (linguistically speaking) compared to the other languages, and thus when used together they help obtain an improvement of 4.39 BLEU points compared to when Marathi is used as the only source language (24.63). However, it can be seen that combining any of Marathi, Bengali and Telugu with either English or Tamil leads to smaller gains. There is a strong correlation between the gains in BLEU and the lexical similarity. Bengali and English, which have the least lexical similarity (0.18), give only a 1.56 BLEU improvement, whereas Bengali and Marathi, which have the highest lexical similarity (0.46), give a BLEU improvement of 4.42 using our multi-source method. This seems to indicate that although multiple source languages do help, source languages that are linguistically closer to each other are responsible for the maximum gains (as evidenced by the correlation between lexical similarity and gains in BLEU). Finally, the last row of Table 2 shows that using additional languages leads to further gains, reaching a BLEU score of 31.56, which is 6.96 points above the score obtained when Marathi is the only source language and 2.54 points above the score obtained when Marathi and Bengali are used as the source languages. As future work it will be worthwhile to investigate the diminishing returns in BLEU improvement obtained per additional language.
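The claimed relationship can be checked directly from the numbers in Table 2. The following sketch computes the Pearson correlation between the lexical similarity of each source-language pair and the BLEU gain of our two-source model over the stronger of its two single-source baselines; this definition of "gain" is our reading of the analysis above, not a quantity reported by the authors.

```python
# (similarity between the two sources, gain of "our" over the better single-source baseline),
# with baseline BLEU scores Bn 19.14, En 11.08, Mr 24.60, Ta 10.37, Te 16.55 from Table 2.
pairs = {
    ("Bn", "En"): (0.18, 20.70 - 19.14), ("Bn", "Mr"): (0.46, 29.02 - 24.60),
    ("Bn", "Ta"): (0.30, 19.85 - 19.14), ("Bn", "Te"): (0.39, 22.73 - 19.14),
    ("En", "Mr"): (0.20, 25.56 - 24.60), ("En", "Ta"): (0.18, 14.03 - 11.08),
    ("En", "Te"): (0.20, 18.91 - 16.55), ("Mr", "Ta"): (0.33, 25.64 - 24.60),
    ("Mr", "Te"): (0.43, 27.62 - 24.60), ("Ta", "Te"): (0.38, 18.14 - 16.55),
}
sims = [s for s, _ in pairs.values()]
gains = [g for _, g in pairs.values()]
n = len(sims)
mean_s, mean_g = sum(sims) / n, sum(gains) / n
cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sims, gains))
var_s = sum((s - mean_s) ** 2 for s in sims)
var_g = sum((g - mean_g) ** 2 for g in gains)
print("Pearson r between lexical similarity and BLEU gain:", cov / (var_s * var_g) ** 0.5)
```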

Single-source baselines:

Corpus type (train size)    Pair     BLEU tst2010   BLEU tst2013
3-lingual (191381 lines)    Fr-En    19.72          22.05
                            De-En    16.19          16.13
4-lingual (84301 lines)     Fr-En     9.02           7.78
                            De-En     7.58           5.45
                            Ar-En     6.53           5.25
5-lingual (45684 lines)     Fr-En     6.69           6.36
                            De-En     5.76           3.86
                            Ar-En     4.53           2.92
                            Cs-En     4.56           3.40

Multi-source results (our / ens / me):

Corpus type   Number of sources   tst2010                   tst2013
3-lingual     2                   22.56 / 18.64 / 22.03     24.02 / 18.45 / 23.92
4-lingual     3                   11.70 / 12.86 / 10.30      9.16 /  9.48 /  7.30
5-lingual     4                    8.34 /  9.23 /  7.79      6.67 /  6.49 /  5.92

Table 3: IWSLT corpus results: BLEU scores for the single-source and N-source settings using the IWSLT corpus. The languages are French (Fr), German (De), Arabic (Ar), Czech (Cs) and English (En). We give the BLEU scores for the two test sets tst2010 and tst2013, which we translate using a. our proposed approach (our), b. the multi-source ensembling approach (ens) and c. the multi-encoder multi-source approach (me). The training corpus sizes are given next to the corpus type. Refer to Table 1 for details on corpora sizes.

Pair     BLEU    Source combination    our      ens     me
Es-En    49.20   Es+Fr                 49.93*   46.65   47.39
Fr-En    40.52   Fr+Ru                 43.99    40.63   42.12
Ar-En    40.58   Fr+Ar                 43.85    41.13   44.06
Ru-En    38.94   Ar+Ru                 41.66    43.12   43.69

Table 4: UN corpus results: BLEU scores for the single-source and two-source settings using the UN corpus. The languages are Spanish (Es), French (Fr), Russian (Ru), Arabic (Ar) and English (En). We give the BLEU scores for the test set, which we translate using a. our proposed approach (our), b. the multi-source ensembling approach (ens) and c. the multi-encoder multi-source approach (me). Note that we do not try all language pairs. All BLEU score improvements are statistically significant (p < 0.001) compared to those obtained using either of the source languages independently. The train, dev and test split sizes are 2M, 4k and 4k lines respectively.

4.3.3 Performance in resource rich settings

In the UN corpus setting (Table 4), where we used approximately 2 million training sentences, we also obtained improvements in BLEU. In the case of the single-source systems, we observed that the BLEU score for Spanish-English was around 9 BLEU points higher than for French-English, which is consistent with the observations in the original work concerning the construction of the UN corpus (Ziemski et al., 2016). Furthermore, combining French and Spanish leads to a small (0.7) improvement in BLEU (over Spanish-English) that is statistically significant (p < 0.001), which is to be expected since the BLEU for Spanish-English is already much better than the BLEU for French-English. Since the BLEU scores for French, Arabic and Russian to English are closer to each other, we can see that the BLEU scores for French+Arabic, French+Russian and Arabic+Russian to English are around 3 BLEU points higher than those of

their respective single-source counterparts. However, they do not beat the performance17 of the multi-encoder models, which have roughly twice the number of parameters. Similar gains in BLEU are observed in the IWSLT corpus setting. Halving the size of the training corpus (from trilingual to 4-lingual) leads to the baseline BLEU scores being reduced by half (19.72 to 9.62 for the French-English tst2010 test set), but using an additional source leads to a gain of roughly 2 BLEU points. Although the gains are not as high as those seen in the ILCI corpus setting, it must be noted that the test set for the ILCI corpus is easier in the sense that it contains many short sentences compared to the IWSLT test sets. Our method does not show any gains in BLEU for the tst2013 test set in the 4-lingual setting, an anomaly which we plan to investigate in the future.

4.4 Studying multi-source attention

In order to understand whether or not our multi-source NMT approach prefers certain languages over others, we obtained visualizations of the attention vectors for a few sentences from the test set. Refer to Figure 2 for an example. Firstly, it can be seen that the NMT model learns sentence boundaries although we did not specify delimiters between sentences. Note that, in the figure, we use a horizontal line to separate the languages, but the NMT system receives a single, long multi-source sentence. The words of the target sentence in Hindi are arranged from left to right along the columns, whereas the words of the multi-source sentence are arranged from top to bottom across the rows. Note that the source languages (and their lexical similarity scores with Hindi) are in the following order: Bengali (0.52), English (0.20), Marathi (0.51), Tamil (0.30), Telugu (0.42). The most interesting observation is that the attention mechanism focuses on each language, but with varying degrees of focus. Bengali, Marathi and Telugu are the three languages that receive most of the attention (highest lexical similarity scores with Hindi), whereas English and Tamil (lowest lexical similarity scores with Hindi) barely receive any. Building on this observation, we believe that the gains we obtained by using all 5 source languages were mostly due to Bengali, Telugu and Marathi, whereas the NMT system learns to practically ignore Tamil and English. However, there does not seem to be any detrimental effect of using English and Tamil. From Figure 3 it can be seen that this observation also holds in the UN corpus setting for French+Spanish to English, where the attention mechanism gives a higher weight to Spanish words compared to French words, since the Spanish-English translation quality is about 9 BLEU points higher than the French-English translation quality. It is also interesting to note that the attention can potentially be used to extract a multilingual dictionary, simply by learning an N-source NMT system and then generating a dictionary by extracting the words from the source sentence that receive the highest attention for each target word generated.
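As a hypothetical illustration of that dictionary-extraction idea, the sketch below pairs each generated target word with the source word that received the highest attention. The attention matrix is assumed to come from a trained N-source model; the function and its interface are our own illustration.

```python
def extract_dictionary_entries(src_tokens, tgt_tokens, attention):
    """attention[t][i] is the weight on source position i when producing target
    position t. Returns (source word, target word) candidate dictionary entries."""
    entries = []
    for t, tgt_word in enumerate(tgt_tokens):
        i_best = max(range(len(src_tokens)), key=lambda i: attention[t][i])
        entries.append((src_tokens[i_best], tgt_word))
    return entries
```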

5 Conclusion and Future Work

In this paper, we have explored a simple approach for "Multi-Source Neural Machine Translation" that can be used with any NMT system seen as a black box. We have evaluated it in a resource-poor as well as a resource-rich setting using the ILCI, IWSLT and UN corpora. We have compared our approach with two other previously proposed approaches and shown that it gives competitive results with other state-of-the-art methods while using less than half the number of parameters (for 2-source models). It is domain and language independent and the gains are significant. We also observed, by visualizing attention, that NMT is able to identify sentence boundaries without sentence delimiters and focuses on some languages by practically

17The difference in performance between the multi-encoder approach and our approach for French+Arabic is not significant.

Figure 2: Attention visualization for the ILCI corpus setting for Bengali, English, Marathi, Tamil and Telugu to Hindi. A horizontal black line is used to separate the source languages, but the NMT system receives a single, long multi-source sentence.

Figure 3: Attention visualization for the UN corpus setting for French and Spanish to English. A horizontal black line is used to separate the source languages, but the NMT system receives a single, long multi-source sentence.

ignoring others, indicating that language relatedness is one of the aspects that should be considered in a multilingual MT scenario. Although we have not explored other multi-source NLP tasks in this paper, we believe that our method and findings will be applicable to them. In the future we plan to explore the language relatedness phenomenon by considering even more languages. We also plan to investigate the extraction of multilingual dictionaries by analyzing the attention links, and to study how we can obtain a single NMT model that can translate up to N source languages and thereby function in a situation where some source sentences in certain languages are missing.

References

Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, USA.

Cettolo, M., Girardi, C., and Federico, M. (2012). Wit3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), pages 261–268, Trento, Italy.

Cettolo, M., Niehues, J., Stüker, S., Bentivogli, L., Cattoni, R., and Federico, M. (2015). The iwslt 2015 evaluation campaign. In Proceedings of the Twelfth International Workshop on Spoken Language Translation (IWSLT).

Cho, K., van Merriënboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using rnn encoder–decoder for statistical machine transla- tion. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar. Association for Computational Linguistics.

Christodouloupoulos, C. and Steedman, M. (2015). A massively parallel corpus: the bible in 100 lan- guages. Language Resources and Evaluation, 49(2):375–395.

Cromieres, F., Chu, C., Nakazawa, T., and Kurohashi, S. (2016). Kyoto university participation to wat 2016. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 166–174, Osaka, Japan. The COLING 2016 Organizing Committee.

Firat, O., Sankaran, B., Al-Onaizan, Y., Yarman-Vural, F. T., and Cho, K. (2016). Zero-resource translation with multi-lingual neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 268–277.

Garmash, E. and Monz, C. (2016). Ensemble learning for multi-source neural machine translation. In Pro- ceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1409–1418, Osaka, Japan. The COLING 2016 Organizing Committee.

Jha, G. N. (2010). The TDIL program and the Indian Language Corpora Initiative (ILCI). In Calzolari, N. (Conference Chair), Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., and Tapias, D., editors, Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).

Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F. B., Wattenberg, M., Corrado, G., Hughes, M., and Dean, J. (2016). Google’s multilingual neural machine translation system: Enabling zero-shot translation. CoRR, abs/1611.04558.

Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Pro- ceedings: the tenth Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT, AAMT.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In ACL. The Association for Computer Linguistics.

Kunchukuttan, A. and Bhattacharyya, P. (2016). Orthographic syllable as basic unit for SMT between related languages. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1912–1917.

Och, F. J. and Ney, H. (2001). Statistical multi-source translation. In Proceedings of MT Summit, volume 8, pages 253–258.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Lin- guistics, ACL ’02, pages 311–318, Stroudsburg, PA, USA. Association for Computational Linguistics.

Schuster, M. and Nakajima, K. (2012). Japanese and korean voice search. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pages 5149–5152.

Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguis- tics.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, pages 3104–3112, Cambridge, MA, USA. MIT Press.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.

Ziemski, M., Junczys-Dowmunt, M., and Pouliquen, B. (2016). The United Nations parallel corpus v1.0. In Calzolari, N. (Conference Chair), Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., and Piperidis, S., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).

Zoph, B. and Knight, K. (2016). Multi-source neural translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 30–34, San Diego, California. Association for Computational Linguistics.

Zoph, B., Yuret, D., May, J., and Knight, K. (2016). Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Pro- cessing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1568–1575.

Learning an Interactive Attention Policy for Neural Machine Translation

Samee Ibraheem* [email protected]
Nicholas Altieri* [email protected]
John DeNero [email protected]
*Equal contribution.

Abstract

Interactive machine translation research has focused primarily on predictive typing, which requires a human to type parts of the translation. This paper explores an interactive setting in which humans guide the attention of a neural machine translation system in a manner that requires no text entry at all. The system generates a translation from left to right, but waits periodically for a human to select the word in the source sentence to be translated next. A central technical challenge is that the system must learn when and how often to request guidance from the human. These decisions allow the system to trade off translation speed and accuracy. We cast these decisions as a reinforcement learning task and develop a policy gradient approach to train the system. Critically, the system can be trained on parallel data alone by simulating human guidance at training time. Our experiments demonstrate the viability of this interactive setting to improve translation quality and show that an effective policy for periodically requesting human guidance can be learned automatically.

1 Introduction

Despite rapid advances in neural machine translation, human input is still needed to meet the translation quality requirements of many applications. Interactive machine translation seeks to combine the quality of human translation with the speed and lexical coverage of machine translation. This paper explores an interactive setting in which the human translator does not type at all, but instead guides the attention of a neural machine translation system by selecting relevant source words as the system translates. While we should not expect that the resulting translations will be as accurate as those produced by predictive typing, this interactive approach could provide fast and accurate draft translations that could later be improved by post-editing. Moreover, source word selection enables new user interface options because it can be performed using a wide variety of input devices, including a mouse, a touch screen, or an eye tracker, which may be used in tandem with traditional text entry methods.

We first address the question of whether guiding the attention of a neural machine translation system can provide enough useful information to improve translation quality. Rather than experimenting directly with human subjects, we compute an experimental upper bound on the accuracy gains from guided attention. For each word that the system is meant to generate, we find an oracle attention that maximizes the probability of generating that word. We find that guiding attention toward this oracle provides a great deal of information to the translation system, yielding substantial gains in translation quality.

Second, we define an interactive translation process in which the system generates a translation left-to-right, but pauses on occasion to request guidance from a human collaborator.

Ideally, the system would not pause after every word; if the system can generate some portion of the translation accurately without human intervention, then it would be wasteful for it to solicit human input. Therefore, an ideal system must learn to trade off between translating accurately and requiring as little human input as possible.

However, it is difficult to predict the long-term consequences of choosing whether or not to pause at any given position. The value of receiving human guidance is not only that it may improve the prediction of the next word, but that it may improve predictions of all subsequent words. Therefore, pausing early for human input might allow the system to require less guidance in later parts of a sentence. Our primary technical contribution is to cast the sequence of decisions about when to request human guidance as a reinforcement learning problem that properly accounts for the system's uncertainty about all the downstream effects of requesting human intervention. We apply a policy gradient method to this problem and show that the system is able to learn an effective interaction policy. This policy estimates when, during the process of translation, human guidance is likely to provide enough long-term benefit to justify the cost of pausing.

We evaluate our approach using an English-German neural machine translation system trained for the WMT 2016 news translation task. We show that the whole system, including the learned interaction policy, can be trained fully automatically by approximating human input using simulated guidance.

2 Related Work

Interactive machine translation involves human translators working collaboratively with a machine translation system to produce high quality output efficiently (Foster and Lapalme, 2002). Several interactive interfaces to machine translation systems have been designed and evaluated in the research community, such as TransType (Langlais et al., 2000), Thot (Ortiz-Martínez et al., 2010), and Caitra (Koehn, 2009). Green et al. (2014) investigates the trade-off between human effort and translation quality within the paradigms of post-editing and interactive MT.

A growing line of research has explored the use of neural machine translation with attention (Bahdanau et al., 2014) in an interactive setting. Wuebker et al. (2016) compares the performance of neural and statistical machine translation models for interactive prediction, and shows that neural models are substantially more accurate. Knowles and Koehn (2016) also demonstrates that neural models provide more accurate interactive predictions than statistical models and addresses efficiency challenges. Hokamp and Liu (2017) describes a search algorithm for neural models that specifically targets a typical interactive workflow in which the terms in a bilingual lexicon must be prioritized over alternatives.

Werling et al. (2015) investigates the trade-off between the cost of human intervention and accuracy for three other tasks: named-entity recognition, sentiment classification, and image classification. That work also proposes an approach to decision making that considers the uncertain long-term consequences of actions.

Mi et al. (2016) demonstrates the usefulness of providing additional attention information to a fully automated neural machine translation system. In this work, the authors add an additional loss to the translation model which encourages the attention computed by the NMT system to resemble alignments predicted by an IBM word alignment model.

3 Guided Attention

Neural machine translation with attention (Bahdanau et al., 2014) is a variant of the seq2seq model (Sutskever et al., 2014) that incorporates attention over the source encodings into the decoder. The attention is a distribution over source positions that can be interpreted as a soft indicator of what part of the source sentence will be translated next. We propose to replace the

attention predicted by the model with a guided attention distribution that is provided directly by a human selecting a source word. In this paper, we simulate the human selection using the source word that is most helpful in the translation decision, described in detail below.

3.1 Neural Machine Translation with Attention

Given a source sentence $x = x_1, \ldots, x_n$ and a target sentence $y = y_1, \ldots, y_m$, the model first encodes $x$ to form input representations $z_1, \ldots, z_n$. To predict the target labels $y$, the model conditions on a concatenation of two vectors, one being a hidden representation of the output generated so far, and the other being the input representations weighted by the attention, $\sum_i \alpha_i^{(t)} z_i$, where $\alpha_i^{(t)}$ is the attention computed at time $t$ for the $i$th word in the source sentence. The input representations and hidden decoder states can be defined using an LSTM (Bahdanau et al., 2014) or convolution (Gehring et al., 2017) over word embeddings.

The attention vector is a distribution over source positions: $\sum_i \alpha_i^{(t)} = 1$ and $\alpha_i^{(t)} \geq 0$. To compute $\alpha_i^{(t)}$, a feed-forward neural network is used that takes as inputs $(z_i, h_t)$, where $h_t$ is the hidden decoder state at time $t$. Finally, given the attention, hidden decoder state, and input representations, the label $y_t$ is predicted using a learned distribution $p(y_t \mid h_t, \sum_i \alpha_i^{(t)} z_i)$.

3.2 Simulated Attention

Instead of using human input to train the model, we attempt to simulate the behavior of an accurate human, allowing for faster and cheaper training. We do this by, at each time step, calculating the distribution over the target vocabulary $p(y_t \mid h_t, z_i)$ for each $i$, which is equivalent to evaluating a one-hot attention vector for each source sentence word. We then provide the one-hot attention for the source word that has the highest predictive probability for the correct next target word to be translated. That is, if $i^* = \arg\max_i p(y^* \mid h_t, z_i)$, where $y^*$ is the correct target word, then

$$\alpha^{(t)} = e_{i^*} \implies \sum_i \alpha_i^{(t)} z_i = z_{i^*}.$$
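A rough sketch of this simulated guidance is given below. The function `prob_of_word` is a stand-in for the decoder's learned distribution evaluated at the correct word; it is an assumed interface, not part of any particular toolkit.

```python
import numpy as np

def simulated_one_hot_attention(prob_of_word, h_t, z, correct_word):
    """Evaluate a one-hot attention on every source position and pick the one
    whose representation best predicts the correct next target word."""
    scores = [prob_of_word(h_t, z_i, correct_word) for z_i in z]
    i_star = int(np.argmax(scores))      # source position most predictive of y*
    alpha = np.zeros(len(z))
    alpha[i_star] = 1.0                  # one-hot attention e_{i*}
    context = z[i_star]                  # the weighted sum collapses to z_{i*}
    return alpha, context
```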

4 Learning When to Ask for Guidance

Given that we have a method for simulating the guidance that a human would provide, we turn to the problem of deciding when to request guidance at all. Each request for guidance affects the input representation used for predicting a single word. Over the course of a sentence, the system can request guidance multiple times.

4.1 Interaction Policy

To implement our interactive method, we use a greedy decoder. For each predicted word, the model decides whether to translate using guided attention or to translate using the attention predicted by the model. At the end of each iteration, there will be a loss penalty corresponding to the amount of guidance requested as well as the likelihood of the sentence under the model. Guidance improves likelihood by providing more information to each decision, but incurs a penalty for requesting guidance.

4.2 Interactive Machine Translation as Reinforcement Learning

We believe that reinforcement learning is an appropriate framework for our setup, since deciding when to ask for assistance can have long-term ramifications on final accuracy that are hard to anticipate before training. We therefore model our framework as a Markov decision process (MDP). In this MDP, our agent is the machine translation system, whose actions are whether

or not to request guided attention, and our reward function is the cross-entropy between our prediction of the next word and a distribution that predicts the reference with probability 1.

4.3 Reinforcement Learning

An MDP is a tuple $(S, A, T, R)$. $S$ is the set of all possible states that an agent can be in. $A$ is the set of all possible actions the agent can take. $T$ is the transition function $T(s_{t+1} \mid s_t, a_t) = p(s_{t+1} \mid s_t, a_t)$, the distribution over the next state given the current state and the action to be taken. Finally, $R$ is the reward function $R(s_{t+1}, a_t, s_t)$ that determines the reward for transitioning into $s_{t+1}$ from $s_t$ with action $a_t$.

An agent acting in an MDP can be described by a policy function $\pi : S \rightarrow A$ that takes in states and returns actions. The goal of reinforcement learning is to learn a policy that maximizes the expected sum of (discounted) rewards $\mathbb{E}\left[\sum_t \gamma^t R(s_{t+1}, \pi(s_t), s_t)\right]$, where $\gamma \in (0, 1]$ is the discount factor.

In the case of interactive attention in machine translation, a state $s_t$ captures the activation of the translation network just before it would generate the next target word $w_t$. There are only two possible actions: whether to go ahead and generate $w_t$ or to request guidance. If guidance is requested, then a new activation of the translation network is computed by replacing the model's attention weights with the guide's attention weights, and then a new word $w'_t$ is generated using these new activations. If guidance is not requested, then $w_t$ is generated. In either case, the reward function is the cross-entropy sequence loss of the correct translation.

4.3.1 Policy Gradient

Policy gradient is a common reinforcement learning method to learn a policy $\pi_\theta$ parameterized by $\theta$. The policy gradient method aims to perform stochastic gradient ascent on the objective

$$J(\theta) = \mathbb{E}\left[\sum_{t=1}^{T-1} \gamma^t R(s_{t+1}, \pi_\theta(s_t), s_t)\right].$$

Let $\pi_\theta(a_t \mid s_t)$ be the probability of choosing action $a_t$ in state $s_t$ according to the policy $\pi_\theta$. The policy gradient theorem states that if $a_t$ are sampled according to $\pi_\theta(\cdot \mid s_t)$, and $s_{t+1}$ are sampled according to $T(\cdot \mid s_t, a_t)$, then an unbiased estimator of $\nabla_\theta J(\theta)$ is

$$\sum_{t=1}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \sum_{\tau=t}^{T-1} \gamma^{(\tau-t)} R(s_{\tau+1}, a_\tau, s_\tau).$$

Although the above expression is an unbiased estimator, it can have high variance, prompting the use of variance reduction methods. For any function $b(s)$, the following is also an unbiased estimator:

$$\sum_{t=1}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \sum_{\tau=t}^{T-1} \gamma^{(\tau-t)} \bigl(R(s_{\tau+1}, a_\tau, s_\tau) - b(s_\tau)\bigr),$$

and the choice that minimizes variance is

$$b(s_t) = \mathbb{E}_{\pi_\theta}\left[\sum_{\tau=t}^{T-1} \gamma^{(\tau-t)} R(s_{\tau+1}, a_\tau, s_\tau)\right].$$

This optimal $b(s_t)$ can be approximated by a parameterized function $V_\phi$, where we learn $V_\phi$ by approximately minimizing

$$\mathbb{E}_{\pi_\theta}\left[\left(V_\phi(s_t) - \sum_{\tau=t}^{T-1} \gamma^{(\tau-t)} R(s_{\tau+1}, a_\tau, s_\tau)\right)^{2}\right].$$

Finally, a policy gradient algorithm alternates between taking a step of stochastic gradient ascent on $J(\theta)$ and taking multiple gradient steps on $V_\phi$. When using the approximate value function $V_\phi$ to reduce variance, the inner expression of the gradient is typically called the advantage function and is denoted $A(s_t)$:

$$A(s_t) = \left[\sum_{\tau=t}^{T-1} \gamma^{(\tau-t)} R(s_{\tau+1}, a_\tau, s_\tau)\right] - V_\phi(s_t).$$

For our value function, we use a feed-forward neural network with two hidden layers of 32 units each, and for our policy function we use a neural network with one 32-unit hidden layer. The input to the former is the standard decoder inputs, which consist of the previously output token and the weighted sum of the hidden representations $\sum_i \alpha_i^{(t)} z_i$. The input to the latter additionally includes the original softmax layer input.
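To make the quantities above concrete, here is a small numerical sketch of the reward-to-go, the advantage with the value-function baseline, the surrogate policy objective, and the squared-error loss used to fit the value function. It is an illustration under assumed array shapes, not the authors' implementation; in practice an autodiff framework would be used to take the actual gradient steps.

```python
import numpy as np

def policy_gradient_terms(log_probs, rewards, values, gamma=1.0):
    """log_probs[t] = log pi_theta(a_t | s_t), rewards[t] = R(s_{t+1}, a_t, s_t),
    values[t] = V_phi(s_t), all for one decoded sentence of length T."""
    T = len(rewards)
    returns = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):                    # discounted reward-to-go
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = returns - np.asarray(values, dtype=float)      # A(s_t)
    # Surrogate whose gradient w.r.t. the policy parameters matches the
    # variance-reduced estimator above (advantages treated as constants).
    surrogate = float(np.sum(advantages * np.asarray(log_probs, dtype=float)))
    # Squared-error objective used to fit the value function V_phi.
    value_loss = float(np.mean((np.asarray(values, dtype=float) - returns) ** 2))
    return advantages, surrogate, value_loss
```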

4.4 Action Frequency Regularization

Since our goals are to maximize translation accuracy while minimizing the number of times a human would have to intervene, we introduce an action weight parameter $w_a$ in order to manage the trade-off between accuracy and human effort. To promote accuracy during training, we have part of the reward at time step $t$ be the negative cross-entropy of the predictions at time $t$. To incorporate the number of times that the system requests guidance, we include not only the probability of requesting guidance, but also whether or not guidance was requested. In addition to these, we incorporate a threshold parameter $\rho_a$ to ensure that the action probabilities do not exceed the designated value. We thus use the following policy gradient objective:

$$\hat{A}(s_t) \cdot \log p_\theta(a_t) + w_a \cdot \max(0,\, p(a_t) - \rho_a) \cdot a_t,$$

where $a_t$ is a binary scalar that takes the value 1 if guidance was requested and 0 otherwise, and $\hat{A}$ is the standardized advantage function. That is,

$$\hat{A}(s_t) = \frac{A(s_t) - \mu(A(s_t))}{\sigma(A(s_t))}.$$

5 Experiments

We evaluate our model on the task of translating from English to German. Specifically, we first train a sequence-to-sequence model with attention, and then continue training using our reinforcement learning model. The baseline neural machine translation model was trained for 508,387 iterations.

5.1 Datasets

We use the English-German WMT 2016 news task dataset, which contains 4.2 million training sentence pairs. We apply BPE with 32,000 merge operations.

5.2 Architecture Details

For our base NMT system, we used Google's large seq2seq system implementation (Britz et al., 2017). For the encoder, we had 512 hidden units. For the decoder, both the GRU and the

attention have 512 units.1

5.3 Results

We evaluate our approach on all 3000 sentences of the WMT 2016 news-test2013 development set. We first evaluate the baseline fully automatic NMT model, which yields a BLEU (Papineni et al., 2002) score of 19.37. In comparison, our model which asks for guidance with 100% probability has a BLEU score of 32.51. Thus, requesting guidance indeed improves translation quality for this model. However, requesting guidance for every word would require maximal human effort, as the human translator would be required to click at each time step. We also evaluate a variety of learned policies on the same data and using the same baseline model. During policy learning, the parameters of the translation model are frozen, and only the parameters of the policy and value functions are learned. Varying the action weight and threshold values yields various guidance frequencies and corresponding BLEU scores. To determine whether the learned policy is requesting guidance efficiently, for each trained policy we also evaluate a random policy that asks for guidance with the same frequency as the reinforcement learning policy (Figure 1). The learned policy was able to achieve a BLEU score of 27.25 with observed guidance of about 54%, which improved upon the random policy by almost 2 BLEU and upon the baseline model by about 8 BLEU.

Figure 1: Translation accuracy for a random policy (blue) and a learned policy (green), for different guidance frequencies. More guidance provides higher accuracy. Across a range of guidance frequencies, the learned policy outperforms a policy that makes the same number of guidance requests, but at randomly chosen times.

6 Analysis

We compare the simulated clicks to the attention generated by the neural machine translator. In order to compare them, we compare the optimal word attention location computed by our simulator against the word with the largest weight according to the NMT system. This could be a problem if the NMT system was attending primarily to more than a single word, but nevertheless we believe this method of comparison may still provide useful intuition. In the figure below we

1For full specification see: https://github.com/google/seq2seq/blob/master/example_ configs/nmt_large.yml

only include arrows for which the attended words differ. We note that using the simulated attention seems mostly intuitive with respect to where a human translator would click and corrects some of the NMT system errors. In particular, it makes von point to of and zu point to counter. However, there are also a few quirks. For example, it makes kanishe point to Republic and EOS point to to.

Figure 2: Guided attention (solid) vs NMT attention (dashed)

7 Future work

Our experiments demonstrate that reinforcement learning is an effective framework for request- ing human guidance in interactive machine translation. However, we can identify several open questions that merit further investigation. First, we have focused on greedy decoding in this pa- per, because it is not trivial to apply a more sophisticated search procedure on top of our method. Developing an extension that incorporates beam search could improve performance. Second, during baseline training, the attention mechanism sees soft attention over the entire sentence as opposed to one hot attention over a single word, and the discrepancy between training and testing may limit the performance of the system. In addition, this method assumes that the word that gives the best predictive probability of the next target word is the same word that a human would choose. Another related limitation with our system is that it assumes that the previous system output is the same as the correct translation, and so the best next word to be translated by the system is the same as that of the reference translation. As our approach is intended to reduce human effort, we look forward to conducting hu- man subject experiments in future work, to see whether the gains we witnessed in simulation carry over to real-world conditions. One interesting direction that our method could provide is investigating whether the behaviors of humans interacting with such a system may be the same as those when interacting with other humans, and if not, to test in which ways human actions might be similar and how they may diverge from expected behavior. Another extension to this work would be incorporating the attention supervision into the main model. Currently, if asked to translate the same sentence twice, the current framework would ask for the same attention help twice, which seems inherently wasteful. Ideally, after getting the supervision, it would be able to incorporate it into the model to reduce redundant queries.

8 Conclusion

We have demonstrated an approach to interactive machine translation that aims to limit the amount of effort required by human translators while maintaining translation quality. We hope that our method inspires further research into this area.

References

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate.

Britz, D., Goldie, A., Luong, M.-T., and Le, Q. (2017). Massive exploration of neural machine translation architectures. arXiv preprint arXiv:1703.03906.

Foster, G. and Lapalme, G. (2002). Text prediction for translators. Université de Montréal.

Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122.

Green, S., Wang, S. I., Chuang, J., Heer, J., Schuster, S., and Manning, C. D. (2014). Human effort and machine learnability in computer aided translation. In EMNLP, pages 1225–1236.

Hokamp, C. and Liu, Q. (2017). Lexically constrained decoding for sequence generation using grid beam search. arXiv preprint arXiv:1704.07138.

Knowles, R. and Koehn, P. (2016). Neural interactive translation prediction. AMTA 2016, Vol., page 107.

Koehn, P. (2009). A process study of computer-aided translation. Machine Translation, 23(4):241–263.

Langlais, P., Foster, G., and Lapalme, G. (2000). Transtype: a computer-aided translation typing system. In Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems-Volume 5, pages 46–51. Association for Computational Linguistics.

Mi, H., Wang, Z., and Ittycheriah, A. (2016). Supervised Attentions for Neural Machine Translation.

Ortiz-Martínez, D., García-Varea, I., and Casacuberta, F. (2010). Online learning for interactive statistical machine translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 546–554. Association for Computational Linguistics.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, pages 3104–3112, Montréal, Canada.

Werling, K., Chaganty, A. T., Liang, P. S., and Manning, C. D. (2015). On-the-Job Learning with Bayesian Decision Theory. In Advances in Neural Information Processing Systems, pages 3465–3473, Montréal, Canada.

Wuebker, J., Green, S., DeNero, J., Hasan, S., and Luong, M.-T. (2016). Models and inference for prefix-constrained machine translation. 54th ACL, 1:66–75.

A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators

Sheila Castilho1 [email protected]
Joss Moorkens1 [email protected]
Federico Gaspari1 [email protected]
Rico Sennrich2 [email protected]
Vilelmini Sosoni3 [email protected]
Panayota Georgakopoulou4 [email protected]
Pintu Lohar1 [email protected]
Andy Way1 [email protected]
Antonio Valerio Miceli Barone2 [email protected]
Maria Gialama4 [email protected]
1 ADAPT Centre, Dublin City University, Dublin, Ireland
2 The University of Edinburgh, Edinburgh, UK
3 Ionian University, Corfu, Greece
4 Deluxe Media, Athens, Greece

Abstract

This paper reports on a comparative evaluation of phrase-based statistical machine translation (PBSMT) and neural machine translation (NMT) for four language pairs, using the PET interface to compare educational domain output from both systems using a variety of metrics, including automatic evaluation as well as human rankings of adequacy and fluency, error-type markup, and post-editing (technical and temporal) effort, performed by professional translators. Our results show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths. In addition, perceived fluency is improved and annotated errors are fewer in the NMT output. Results are mixed for perceived adequacy and for errors of omission, addition, and mistranslation. Despite far fewer segments requiring post-editing, document-level post-editing performance was not found to have significantly improved in NMT compared to PBSMT. This evaluation was conducted as part of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality machine translation of educational data.

1 Introduction

The industrial use of machine translation (MT) for production has become widespread since statistical machine translation (SMT) established itself as the dominant approach to translating texts automatically. Raw MT is now a viable solution for perishable content (Way, 2013) and post-editing of MT is offered by over 80% of language service providers surveyed by Lommel and DePalma (2016). In the years since the publication of Brown et al. (1993), an ecosystem of tools has grown around PBSMT, including scripts and tools for pre-processing and alignment, enabling incremental improvement in the quality of PBSMT output (Haddow et al., 2015). More recently, the research community has become increasingly interested in the possibilities of neural machine translation (Bahdanau et al., 2014; Cho et al., 2014) (NMT), which

involves building a single neural network that maps aligned bilingual texts and, given input to translate, is trained to "maximize the probability of a correct translation" (Bahdanau et al., 2014) without external linguistic information. This interest is shared by many in the language service industry, where there is a need for improved MT quality and better quality estimation to "help reduce the frustrating aspects of post-editing" (Etchegoyhen et al., 2014). NMT results in the latest shared tasks have quickly matched or surpassed those of PBSMT systems, despite the many years of PBSMT development (Sennrich et al., 2016a; Bojar et al., 2016). Recent studies have reported an increase in quality when comparing NMT with PBSMT using either automatic metrics (Bahdanau et al., 2014; Jean et al., 2015), or small-scale human evaluations (Bentivogli et al., 2016; Wu et al., 2016). While these initial experiments with NMT have shown impressive results and promising potential, so far there have been a limited number of human evaluations of NMT output.

This paper reports the results of a quantitative and qualitative comparative evaluation of PBSMT and NMT carried out using automatic metrics and a small number of professional translators, considering the translation of educational texts in four language pairs, i.e. from English into German, Portuguese, Russian and Greek. It employs a variety of metrics, including side-by-side ranking, rating for accuracy and fluency, error annotation, and measurements of post-editing effort. This evaluation is part of the work for TraMOOC,1 a European-funded project focused on the translation of MOOCs, which aims to create a replicable semi-automated methodology for high-quality MT of educational data. As such, the MT engines tested are built using generic and in-domain data from educational resources, as detailed in Section 3.1.1. The remainder of this paper is organized as follows: In Section 2 we review previous work comparing MT output using the statistical and neural approaches. We describe our MT systems and the experimental methodology in Section 3, and the results of human and automatic evaluations in Section 4. Finally, we draw the main conclusions of the study and outline promising avenues for future work in Section 5.

2 Previous Work Comparing PBSMT and NMT

A number of papers have been published recently which compare specific aspects of PBSMT and NMT. Bentivogli et al. (2016) asked five professional translators to carry out light post-editing on 600 segments of English TED talks data translated into German. These comprised 120 segments each from one NMT and four PBSMT systems. Using HTER (Snover et al., 2006) to estimate the fewest possible edits from pre- to post-edit, they found that technical post-editing effort (in terms of the number of edits) when using NMT was reduced on average by 26% when compared with the best-performing PBSMT system. NMT output showed substantially fewer word order errors, notably with regard to verb placement (which is particularly difficult when translating into German), and fewer lexical and morphological errors. Bentivogli et al. (2016) concluded that NMT has “significantly pushed ahead the state of the art”, especially for morphologically rich languages and language pairs that are likely to require substantial word reordering. Wu et al. (2016) used BLEU (Papineni et al., 2002) scores and human ranking of 500 Wikipedia segments that had been machine-translated from English into Spanish, French, and Simplified Chinese, and vice-versa. Results from this paper again show that the NMT system strongly outperforms other approaches and improves translation quality for morphologically rich languages, with human evaluation ratings that were closer to human translation than PBSMT. The authors noted that some additional ‘tweaks’ would be required before NMT would be ready for real data, and Google NMT engines went live for the language pairs tested shortly after this paper was published (Schuster et al., 2016). Junczys-Dowmunt et al.

1http://tramooc.eu

(2016) also found BLEU score improvements for NMT when compared with PBSMT for as many as 30 language pairs. Results of the 2016 Workshop on Statistical Machine Translation (WMT16) (Bojar et al., 2016) found that NMT systems were ranked above PBSMT and online systems for six of the 12 language pairs in the translation tasks. In addition, for the automatic post-editing task, neural end-to-end systems were found to represent a “significant step forward” over a basic statistical approach. Toral and Sanchez-Cartagena (2017) compared NMT and PBSMT for nine language pairs (English to Czech, German, Romanian, Russian and vice-versa, plus English to Finnish), with engines trained for the news translation task at WMT16. BLEU scores were higher for NMT output than for PBSMT output for all language pairs except Russian-English and Romanian-English. NMT and PBSMT outputs were found to be dissimilar, with higher inter-system variability among NMT systems. NMT systems appear to perform more reordering than PBSMT systems, resulting in more fluent translations (taking the perplexity of MT outputs on neural language models as a proxy for fluency). Toral and Sanchez-Cartagena (2017) found that the tested NMT systems performed better than PBSMT with regard to inflection and reordering errors in all language pairs. However, using the chrF1 automatic evaluation metric (Popović, 2015), which they argue is more suited to NMT, they found that PBSMT performed better than NMT for segments longer than 40 words. Castilho et al. (2017) also reported on three comparative studies of PBSMT and NMT, discussing some of the preliminary results of the current study, highlighting some strengths and weaknesses of NMT, and the hyperbole present in some discussions of the potential of NMT. Against this background, this paper attempts to shed more light on the emerging picture of the comparison between PBSMT and NMT.

3 Experiments

We built and evaluated PBSMT and NMT systems for four translation directions: English to German, Greek, Portuguese, and Russian. Evaluation was performed with automatic metrics, as well as with professional translators, who performed side-by-side ranking, adequacy and fluency rating, post-editing, and error annotation based on a predefined taxonomy.

3.1 MT Systems

3.1.1 Training Data

The MT engines used in the TraMOOC project are trained on large amounts of data from various sources: the training data2 from the WMT shared translation tasks and OPUS (Tiedemann, 2012) as mixed-domain data, and, as in-domain training data, TED from WIT3 (Cettolo et al., 2012), the QCRI Educational Domain Corpus (QED) (Abdelali et al., 2014), a corpus of Coursera MOOCs, and our own collection of educational data. The amount of training data used is shown in Table 1.

Lang.         DE     EL     PT     RU
Mixed domain  23.78  30.73  31.97  21.30
In-domain      0.27   0.14   0.58   2.31

Table 1: Training data size for the EN→* translation directions (number of sentence pairs, in millions).

3.1.2 Phrase-based SMT

The PBSMT system used is Moses (Koehn et al., 2007); MGIZA (Gao and Vogel, 2008) is used to train word alignments, and KenLM (Heafield, 2011) is used for LM training and scoring.

2http://www.statmt.org/wmt16/

The MT model is a linear combination of various features, including standard Moses features such as phrase translation probabilities, phrase and word penalties, and a 5-gram LM with modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998), as well as the following advanced features: a hierarchical lexicalized reordering model (Galley and Manning, 2008); a 5-gram operation sequence model (Durrani et al., 2013); sparse features indicating phrase pair frequency and phrase length, and sparse lexical features; and, for English-Russian, a transliteration model for unknown words (Durrani et al., 2014). Feature weights are optimized to maximize BLEU with batch MIRA (Cherry and Foster, 2012) on an in-domain tuning set that has been extracted (and held out) from the in-domain training data. Adaptation to the MOOC domain is performed via three mechanisms: sparse domain indicator features in the phrase table; linear interpolation of LMs with perplexity optimization on the in-domain tuning set; and learning of feature weights on the in-domain tuning set.

3.1.3 Neural MT

The NMT systems are attentional encoder-decoder networks (Bahdanau et al., 2014), which we trained with Nematus (Sennrich et al., 2017). We generally follow the settings used by Sennrich et al. (2016a): word embeddings of size 500, hidden layers of size 1024, minibatches of size 80, and a maximum sentence length of 50. We train the models with Adadelta (Zeiler, 2012). The model is regularly validated via BLEU on a validation set, and we perform early stopping for single models. Decoding is performed with beam search with a beam size of 12. To enable open-vocabulary translation, words are segmented via byte-pair encoding (BPE) (Sennrich et al., 2016c). For Portuguese, German, and Russian, the source and target sides of the training set used for learning BPE are combined to increase consistency in the segmentation of source and target text. For each language pair, we learn 89,500 merge operations. For domain adaptation, we first train a model on all available training data, and then fine-tune the model by continued training on the in-domain training data (Luong and Manning, 2015; Sennrich et al., 2016b). Training is continued from the model trained on mixed-domain data, with dropout and early stopping. The final models are ensembles of 4 neural networks with the same architecture: we obtain the ensemble components by selecting the last 4 checkpoints of the mixed-domain training run and continuing training each of them on in-domain data.
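For illustration, a minimal sketch of BPE merge learning in the spirit of Sennrich et al. (2016c) is given below; it is not the subword-nmt implementation used with Nematus, and the toy corpus and the reduced number of merges are invented for the example. In the actual pipeline, the roughly 89,500 learned merge operations would then be applied to segment both training and test data.

```python
from collections import Counter

def learn_bpe(corpus_words, num_merges):
    """Learn BPE merge operations from a token stream (toy re-implementation).

    corpus_words: iterable of word tokens; num_merges: number of merges to learn.
    Returns the ordered list of learned merge operations (pairs of symbols).
    """
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

if __name__ == "__main__":
    # Toy example only; the paper learns 89,500 merges on the full training text.
    toy_corpus = "the translator translated the translation".split()
    print(learn_bpe(toy_corpus, num_merges=10))
```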

3.2 The MOOCs Domain

As this evaluation was intended to identify the best-performing MT system for the TraMOOC project, which focuses on high-quality MT for MOOCs, the test sets were extracted from real MOOC data. These data included explanatory texts, subtitles from lectures, and user-generated content (UGC) from student forums or the comment sections of e-learning resources. One of the test sets was UGC from a business development course and the other three were transcribed subtitles from medical, physics, and social science courses. The UGC data was often poorly formulated and contained frequent grammatical errors. The other texts presented more standard grammar and syntax, but contained specialized terminology and, in the case of the physics text, non-contextual variables and formulae.

3.3 Materials, Evaluators, and Methods

For the purposes of this study, four English-language datasets consisting of 250 segments each (1K source sentences in total) were translated into German, Greek, Portuguese, and Russian using our PBSMT and NMT engines. The evaluation methods included two conditions: i) side-by-side ranking and ii) post-editing, assessment of adequacy and fluency, and error annotation. Both conditions were assessed by professional translators. More specifically, the ranking task consisted of only a subset (100 source segments) with their translations from PBSMT and NMT, which were randomized; it was carried out by 3 experienced professional translators (4 of

them in the case of Greek). The ranking was performed using Google Forms. For the second condition (ii), all the datasets (1K source sentences) were translated, the MT output (from both NMT and PBSMT) was mixed in each dataset, and the tasks were assigned in random order to the translators. The segments were presented sequentially, so as to maintain as much context as possible. These tasks were carried out by 3 experienced professional translators (2 in the case of English-German) using PET (Post-Editing Tool) (Aziz et al., 2012) over a two-week period. Participants were sent the PET manual and given PET installation instructions, a short description of the overall TraMOOC project and of the specific tasks, and were requested to (in the following order) i) post-edit the MT output to achieve publishable quality in the final revised text, ii) rate fluency and adequacy (defined as the extent to which a target segment is correct in the target language and reflects the meaning of the source segment) on a four-point Likert scale for each segment, and iii) perform error annotation using a simple taxonomy (more details are provided in Section 3.5). This set-up had the advantage that measurements of two of Krings’ (2001) categories of post-editing effort could be drawn directly from the PET logs, namely temporal effort (time spent post-editing) and technical effort (edit count).
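As a rough illustration of how these two effort measures can be derived from post-editing logs, the sketch below uses a hypothetical, simplified record layout; PET's actual log format differs, and the example values are invented.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class PostEditRecord:
    """One post-edited segment from a hypothetical, simplified post-editing log."""
    segment_id: str
    source_tokens: int        # length of the source segment in words
    editing_seconds: float    # Krings' temporal effort: time spent post-editing
    keystrokes: int           # Krings' technical effort: number of edit keystrokes

def effort_summary(records):
    """Aggregate per-segment effort into the kinds of measures reported in the paper."""
    return {
        "secs_per_segment": mean(r.editing_seconds for r in records),
        "keystrokes_per_segment": mean(r.keystrokes for r in records),
        # Throughput in words per second (source words / editing time), cf. Table 9.
        "words_per_second": sum(r.source_tokens for r in records)
                            / sum(r.editing_seconds for r in records),
    }

if __name__ == "__main__":
    demo = [PostEditRecord("s1", 18, 61.0, 4), PostEditRecord("s2", 25, 90.5, 9)]
    print(effort_summary(demo))
```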

3.4 Automatic Evaluation

The BLEU, chrF3, and METEOR (Banerjee and Lavie, 2005) automatic evaluation metrics are used in this study, with two post-edits used as references for each segment; it should be noted that Popović et al. (2016) suggest that using a single post-edited reference produced from the MT system under evaluation will tend to introduce bias. In addition, the HTER metric (Snover et al., 2006) was used to estimate the fewest possible edits between pre- and post-edited segments.
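For reference, the sketch below computes an HTER-style score as the word-level edit distance between the raw MT output and its post-edited version, normalised by the length of the post-edit; this is a simplification of Snover et al.'s (2006) metric, which additionally allows block shifts, and the example strings are invented.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insertions, deletions, substitutions)."""
    m, n = len(hyp), len(ref)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[m][n]

def hter(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edit, per post-edit word."""
    hyp, ref = mt_output.split(), post_edit.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)

if __name__ == "__main__":
    print(hter("would you just 10 materials that are the most suitable",
               "would you send just 10 materials that are the most suitable"))
```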

3.5 Human Evaluation

Ranking: For the side-by-side ranking task, the professional translators were asked to tick a box containing their preferred translation of an English source sentence. PBSMT and NMT output was mixed and presented to participants using Google Forms. Two to three segments per language pair, where the PBSMT and NMT output happened to be identical, were excised, as the judges did not have the option to indicate a tie. The remaining tasks were carried out within the PET interface.

Adequacy and fluency rating: The judges were asked to rate adequacy in response to the question ‘How much of the meaning expressed in the source fragment appears in the translation fragment?’. To avoid centrality bias, a Likert scale of one to four was used, where one was ‘none of it’ and four was ‘all of it’. Similarly, fluency was rated on a one-to-four scale, where one was ‘no fluency’ and four was ‘native’. Our expectation was that NMT would be rated positively for fluency, with possible degradation for adequacy, especially for longer segments (Cho et al., 2014; Neubig et al., 2015).

Post-editing and error annotation: Participants were asked to post-edit the MT segments to publishable quality, and then to highlight issues found in the MT output based on a simple error taxonomy comprising inflectional morphology, word order, omission, addition, and mistranslation. Again, our expectation was that there would be fewer morphology and word order errors with NMT, especially for short segments.
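A minimal sketch of how one such judgement might be stored is shown below; the field names and example values are invented for illustration and do not correspond to the actual PET or Google Forms exports.

```python
from dataclasses import dataclass, field
from enum import Enum

class ErrorType(Enum):
    """The simple error taxonomy used in the annotation task."""
    INFLECTIONAL_MORPHOLOGY = "inflectional morphology"
    WORD_ORDER = "word order"
    OMISSION = "omission"
    ADDITION = "addition"
    MISTRANSLATION = "mistranslation"

@dataclass
class SegmentJudgement:
    """One translator's judgement of one machine-translated segment."""
    segment_id: str
    system: str                   # "PBSMT" or "NMT"
    adequacy: int                 # 1 = 'none of it' ... 4 = 'all of it'
    fluency: int                  # 1 = 'no fluency' ... 4 = 'native'
    errors: list = field(default_factory=list)   # list of ErrorType values

# Hypothetical example record.
example = SegmentJudgement("bus-042", "NMT", adequacy=3, fluency=4,
                           errors=[ErrorType.MISTRANSLATION])
print(example)
```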

4 Results and Discussion

4.1 Automatic Evaluation

The automatic metric results using BLEU, METEOR, chrF3, and HTER are shown in Table 2. In particular, the decrease in word order errors in NMT output (as may be seen in Section 4.3)

is reflected in improved BLEU and METEOR scores, especially for some language pairs. Table 2 shows that BLEU, METEOR, and chrF3 scores increase considerably for German, Greek, and Russian with NMT when compared to the PBSMT scores. These results were statistically significant in a one-way ANOVA pairwise comparison (p<.05) (marked with †). For Portuguese, moderate improvements can be observed, but no statistically significant differences were found. Regarding the amount of PE that was required, the HTER scores show that more PE was performed when using the output from the PBSMT system for German, Greek, and Russian. However, no statistically significant differences for HTER scores were found. The scores for chrF3 also show a good improvement for NMT over PBSMT for German and Russian, but very similar results for Greek and Portuguese.

Lang.  System  BLEU    METEOR   chrF3  HTER
DE     PBSMT   41.5    33.6     0.66   49.0
       NMT     61.2 †  42.7 †   0.76   32.2
EL     PBSMT   47.0    35.8     0.65   45.1
       NMT     56.6 †  40.1 †   0.69   38.0
PT     PBSMT   57.0    41.6     0.76   33.4
       NMT     59.9    43.4     0.77   31.6
RU     PBSMT   41.9    33.7     0.67   44.6
       NMT     57.3 †  40.65 †  0.73   33.9

Table 2: Automatic Evaluation Results

4.2 Human Evaluation

Fluency and Adequacy: NMT was rated as more fluent than PBSMT for all language pairs. Table 3 shows the mean ratings for fluency and adequacy of the target languages for both the PBSMT and NMT systems. Although no statistically significant differences were found, the percentage of scores assigned a 3-4 fluency value (Near Native or Native) for German is 68% for NMT as opposed to 54% for the PBSMT system; for Greek, 75% vs. 65%; for Portuguese, 80% vs. 74%; and for Russian, 75% vs. 60%. When looking at the percentage of scores assigned a 1-2 fluency value (No or Little Fluency) for each MT system’s output, the NMT systems appear to have fewer problems when compared against the PBSMT systems for all the languages (German: 46% PBSMT vs. 32% NMT; Greek: 35% vs. 25%; Portuguese: 26% vs. 21%; and Russian: 40% vs. 25%).

Lang.  System  Fluency  Adequacy
DE     PBSMT   2.60     2.85
       NMT     2.95     2.79
EL     PBSMT   2.86     3.44
       NMT     3.08     3.46
PT     PBSMT   3.15     3.73
       NMT     3.22     3.79
RU     PBSMT   2.70     2.98
       NMT     3.08     3.12

Table 3: Mean ratings for Fluency and Adequacy

A typical example of improved output for German NMT was the translation of the segment ‘Would you send just 10 materials that are the most suitable.’

PBSMT: Würden Sie nur 10 Materialien, die am besten geeignet sind.
NMT: Schicken Sie einfach 10 Materialien, die am besten geeignet sind.

The German PBSMT output left out an infinitive verb at the end of the segment (literally ‘Would you [polite form] just 10 materials that are the most suitable.’), while NMT produced a correct German translation, using the imperative verb form and retaining the correct register by using the ‘Sie’ politeness marker. For Portuguese, one example of improved fluency is the translation of the segment ‘I am

just making sure that I understand this correctly.’

PBSMT: Estou só para ter a certeza que entendi corretamente.
NMT: Eu estou apenas me certificando de que eu entendo isso corretamente.

The PBSMT system translates ‘just’ as ‘só’ (which in Portuguese can mean ‘just’, but, as it is preceded by the verb ‘estar’, it implies the meaning ‘alone’/‘lonely’), conveying the misleading meaning of ‘I’m alone to be sure if I understood correctly’. The NMT system translates ‘making sure’ as ‘me certificando’, which is accurate, with the word ‘just’ translated as ‘apenas’ or ‘só’. One example for Russian is the translation of ‘I liked your presentation a lot.’

PBSMT: Я любил свою презентацию много.
NMT: Мне очень понравилась ваша презентация.

While the NMT output is absolutely correct, the PBSMT system mistranslates the possessive pronoun ‘your’ as ‘свою презентацию’, which means ‘my presentation’. It also translates ‘liked’ as ‘любил’, which means ‘loved’, and, finally, it also translates ‘a lot’ as ‘много’, which translates back as ‘many’ (quantifying adjective). For Greek, NMT also shows improved fluency for the translation of ‘What is the difference between a financial analyst and technical analyst and business analyst?’

PBSMT: Ποια είναι η διαφορά μεταξύ ένας οικονομικός αναλυτής και τεχνική αναλύτρια και οικονομικός αναλυτής·
NMT: Ποια είναι η διαφορά μεταξύ του οικονομικού αναλυτή και του τεχνικού αναλυτή και του επιχειρηματικού αναλυτή·

The NMT output is both semantically accurate and grammatically correct: the terms ‘financial analyst’, ‘technical analyst’ and ‘business analyst’ were rendered accurately in Greek, and, in addition, the nouns correctly appear in genitive form and the generic masculine is used. The PBSMT output mistranslates the term ‘business analyst’ as ‘οικονομικός αναλυτής’ (i.e. ‘financial analyst’), and lacks fluency since the nouns are used in the nominative form and the gender of the noun ‘technical analyst’ appears in the feminine form rather than in the correct generic masculine form.

Regarding adequacy, however, results were overall less consistent (see Table 3) than those for fluency, with higher mean scores for German PBSMT. While NMT output received the highest mean ratings for all other language pairs, when considering 3-4 rankings (Most of It and All of It) as well as 1-2 rankings (None of It and Little of It), English-German PBSMT was ranked higher (73% against 66% for NMT), and the English-Greek systems performed equally well (89% of the sentences assessed as 3-4 in terms of adequacy). For Portuguese and Russian, the NMT systems were ranked slightly higher when including 3-4 rankings, with PBSMT scoring 95% against 97% for NMT in Portuguese, and for Russian the scores were 73% for PBSMT against 78% for NMT. These results are also replicated when the distinction between short and long sentences is made. One example of adequacy for German where both MT systems committed errors can be seen in:

EN: We begin our exploration today by looking at a particular ad that appeared in

American magazines in recent years.
PBSMT: Heute beginnen wir unsere Erforschung von einem bestimmten Ad anschaue, die auf amerikanischen Zeitschriften erschienen in den letzten Jahren.
NMT: Wir beginnen unsere Forschung heute mit einer bestimmten Werbung, die in den letzten Jahren in amerikanischen Zeitschriften veröffentlicht wurde.

The NMT output uses the noun ‘Forschung’, meaning ‘research’, rather than the correct ‘Erforschung’ as chosen by the PBSMT system. As a result, the participants rated this segment poorly for adequacy, and actually substituted the word ‘Untersuchung’ for ‘exploration’. While the PBSMT system chose the correct noun, there were other word order and lexical errors that rendered the translation inadequate. The following is an example of adequacy not being so consistent in translation into Portuguese, but with the NMT system still performing better:

EN: What we’re going to need to do is, we’re going to find the initial stretch, excuse me, the final stretch of the spring, the initial stretch of the spring, and subtract the squares. PBSMT: O que vamos precisar fazer e,´ vamos encontrar o troc¸o inicial, desculpe-me, o ultimo´ troc¸o da Primavera, o troc¸o inicial da Primavera, e subtrair os quadrados. NMT: O que vamos precisar fazer e,´ vamos encontrar o limite inicial, desculpe-me, o alongamento final da mola, o alongamento inicial da mola, e subtrair os quadrados.

PBSMT mistranslates the two main words of the sentence: ‘stretch’ (translated as ‘stuff’) and ‘spring’ (as the spring season, ‘primavera’), thus making the translation unintelligible. NMT translates the term ‘stretch’ in two different ways (‘limite’ and ‘alongamento’), but the sentence is still adequate and understandable. For Russian, both MT systems also return errors for adequacy:

EN: We’ll be drawing heavily on the field of art history and how interpretation works in that field. PBSMT: Мы будем рисовать на области истории искусства и как интерпретации работает в этой области. NMT: Мы будем активно рисоваться на области художественной истории и то, как интерпретация работает в этом поле.

Both systems translate the word ‘drawing’ as ‘draw a picture’. PBSMT, however, retrieves a better translation for the remainder of the sentence, keeping ‘история искусства’ as a fixed expression, while ‘художественной истории’ (chosen by the NMT system) is not natural and the meaning is not clear. The translation of the word ‘field’ is also better in the PBSMT output: ‘область’ is ‘field’ in the sense of an area (of research/interest), while NMT translates it as ‘поле’, i.e. a farm field or mathematical concept. Finally, for Greek, the NMT system seems to handle adequacy a bit better:

EN: So, what if a resident or student wants to opt out of doing abortions?
PBSMT: Οπότε, τι γίνεται αν ένας κάτοικος ή μαθητής θέλει να εξαιρεθούν από το να κάνει εκτρώσεις·
NMT: Οπότε, τι γίνεται αν ένας κάτοικος ή φοιτητής θέλει να επιλέξει να κάνει εκτρώσεις·

The PBSMT translation has problems both at the level of fluency and at the level of adequacy,

while the NMT translation has problems only at the level of adequacy. In both the PBSMT and the NMT translations the term ‘resident’, which in this context refers to the North American concept of ‘a medical graduate engaged in specialised practice under supervision in a hospital’, is translated as ‘κάτοικος’, that is, a person who lives somewhere permanently or on a long-term basis. The PBSMT translates the word ‘student’ as ‘μαθητής’, which refers to a pupil, when in fact it should be translated as ‘φοιτητής’ (university student). The PBSMT output also suffers at the level of fluency due to the lack of subject-verb agreement. In the NMT output, apart from the mistranslation of the term ‘resident’, there is one major mistranslation involving the phrasal verb ‘opt out’, as the NMT system translates it as ‘opt’, thus completely distorting the meaning of the source sentence.

Polysemous terms appear to pose the main problem to the NMT system for the Greek and Russian languages, as it appears unable to discern semantic differences and choose the equivalent which bears the same meaning as the source-text (ST) term in the translation. This can pose significant problems during the PE process, as translators may be misled by the inaccurate NMT rendering and end up transferring the erroneous term into the final translation. For instance, for the translation of ‘This is a magazine and a campaign called Got Milk where several famous figures appeared and they always asked the question, got milk?’, the term ‘figure’ is translated into Greek by the PBSMT as ‘προσωπικότητα’, while it is translated erroneously by the NMT system as ‘φιγούρα’, which is semantically wrong. Another example of a polysemous term appears in the Russian translation of ‘Is it free?’, where NMT translates it as ‘Свободно ли?’, meaning ‘unoccupied’ (‘is this seat/place free?’), while the PBSMT output includes a more frequent lexical item, ‘Это бесплатно?’, which relates to price (‘free of charge’). For German and Portuguese, however, the polysemous terms are either not handled well by either system, or the NMT system provides a better translation.

This small selection of examples demonstrates the types of errors prevalent in the respective MT systems for each language pair studied, with the NMT output generally found to be more fluent and comprehensible, although not without errors. The type and prevalence of these errors throughout the test sets are detailed in Section 4.3.

4.3 Error Annotation

Category                      DE              EL              PT              RU
                              PBSMT   NMT     PBSMT   NMT     PBSMT   NMT     PBSMT   NMT
Inflectional Morphology       732     608     443     307     404     378     695     506
                              43%     49%     35%     28%     37%     37%     42%     38%
Word Order                    382     180     303     208     216     181     197     122
                              23%     15%     24%     19%     20%     18%     12%     9%
Omission                      126     84      48      57*     53      58*     194     163
                              7%      7%      4%      5%      5%      6%      12%     12%
Addition                      46      39      24      31*     61      44      183     151
                              3%      3%      2%      3%      6%      4%      11%     11%
Mistranslation                401     323     459     483*    348     342     385     404*
                              24%     26%     36%     44%     32%     34%*    23%     30%
Total number of issues        1687    1234    1277    1086    1082    1003    1654    1346
Total number of “No Issues”   61      189     90      168     197     236     101     195
                              6%      18.9%   9%      16.8%   19.7%   23.6%   10%     19.5%

Table 4: Error annotation

Table 4 shows the results of the error annotation task for all target languages, the total count of the errors and the percentage of errors of each category.3 The total number of issues

3The percentage of errors is the number of errors per category divided by the total number of errors found.

is greater for PBSMT than for NMT for all language pairs. Moreover, the number of segments left without error annotations (No Issues) is greater for NMT across all language pairs. NMT output was also found to contain fewer word order errors and fewer inflectional morphology errors in all the target languages. For English-Greek, the PBSMT output contained fewer errors of omission, addition, or mistranslation than the NMT output (marked with an asterisk in Table 4). For English-Portuguese, PBSMT showed fewer omissions and mistranslations, while English-Russian PBSMT contained fewer mistranslations (also marked with an asterisk).

Interestingly, the percentage of errors found in PBSMT and NMT seems to follow a pattern, with inflectional morphology, word order, and mistranslation being the most frequent problems found in both types of MT system, with the exception of Russian, which presents somewhat more mixed results for omission and addition. For German, inflectional morphology errors make up 49% of all the errors found in NMT output, a higher proportion than that found for PBSMT (where this category accounts for 43% of the errors). We therefore observe that the specific types of errors displayed by NMT and PBSMT output are to some extent dependent on the particular language pairs involved, and are clearly influenced by the specific morphosyntactic features of the target language. This, in turn, has implications for the post-editing effort involved in bringing the output to publishable quality, which will inevitably vary from one target language to another, even when keeping the text type and the domain constant.

4.4 Ranking

For the ranking task, 400 English segments translated into Greek, and 300 segments translated into each of the other three target languages, with NMT and PBSMT, were compared side-by-side by the professional translators who participated in the evaluation, using Google Forms. Participants preferred NMT output across all language pairs, with a particularly marked preference for English-German, as seen in Table 5. Inter-annotator agreement was moderate (κ=0.60 for DE, κ=0.48 for EL, κ=0.40 for PT and κ=0.61 for RU).

        Evaluation preference for
        PBSMT     NMT
EN-DE   61        239
(300)   20.3%     79.7%
EN-EL   174       226
(400)   43.5%     56.5%
EN-PT   115       185
(300)   38.3%     61.7%
EN-RU   110       190
(300)   36.7%     63.3%

Table 5: Ranking

This preference was consistent across all text types, with a 65% preference for NMT in the business analysis forum content, a 54% preference for translations of a medical training transcript, 52% for translations of a physics transcript, and 55% for translations of an advertising transcript. Using distinctions from Pouget-Abadie et al. (2014), there was a 53% preference for NMT for short segments (20 tokens or fewer), and a 61% preference for NMT for long segments (over 20 tokens). We believe that the text genres in which fluency is considered to be more important (i.e. business and marketing) scored much better for NMT, as opposed to medicine and physics, where a translator would tend to follow a more ‘literal’ translation, as it would typically be more important to translate all the words in the source so as to ensure that the exact same meaning is preserved, sacrificing fluency if needed. We speculate that, for this reason, NMT may be a good fit for the subtitling domain in general, especially for material that is not particularly specialised.
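The paper does not specify the exact agreement statistic beyond κ; the sketch below shows one plausible reading, pairwise Cohen's kappa over binary system preferences, with invented annotator data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (e.g. PBSMT/NMT preferences)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Invented toy preferences from two annotators over ten ranked segments.
ann1 = ["NMT", "NMT", "PBSMT", "NMT", "NMT", "PBSMT", "NMT", "NMT", "NMT", "PBSMT"]
ann2 = ["NMT", "NMT", "PBSMT", "NMT", "PBSMT", "PBSMT", "NMT", "NMT", "NMT", "NMT"]
print(round(cohens_kappa(ann1, ann2), 2))
```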
4.5 Post-editing

Similarly to the segments left without error annotation, fewer NMT segments were considered by participants to require editing during the MT post-editing task. Table 6 shows the number of

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 125 segments changed and unchanged for all MT systems. For German, the difference between the number of segments unchanged for Lang. System Post-Edited Unchanged NMT when compared with PBSMT output DE PBSMT 940 60 was very statistically significant in a one- NMT 813 187† way ANOVApairwise comparison (p<.05, EL PBSMT 928 72 where M=.06, SE=.04) (marked with †). NMT 863 137 Table 7 shows the mean and standard de- PT PBSMT 874 126 viation for temporal post-editing effort and NMT 844 156 Table 8 shows technical post-editing ef- RU PBSMT 930 70 fort in the form of the average number of NMT 848 152 keystrokes per segment. Average throughput or temporal ef- fort was only marginally improved for Ger- Table 6: Unchanged Segments (out of 1000) man, Greek and Portuguese post-editing with NMT, as may be seen at the segment level in Table 7 and expressed in words per sec- ond in Table 9, while temporal effort for Russian was lower for PBSMT at the segment level. Technical post-editing effort was re- duced for NMT in all language pairs us- Lang. System Mean Std. Deviation ing measures of actual keystrokes (Table DE PBSMT 74.8 21.12 8) or the minimum number of edits re- NMT 72.8 17.16 quired to go from pre- to post-edited text EL PBSMT 77.7 1.85 (cf. the HTER scores in Table 2). Even NMT 70.4 8.86 though these results were not statistically PT PBSMT 57.7 14.23 significant, they suggest that those NMT NMT 55.19 15.58 segments that were edited required more RU PBSMT 104.6 3.62 cognitive effort than PBSMT segments. NMT 105.6 21.29 Feedback from the participants indicated that they found NMT errors more diffi- Table 7: Temporal Post-Editing Effort cult to identify, whereas word order errors (secs/segment) and disfluencies requiring revision were detected faster in PBSMT output. None of the participants reached the average rate of professional throughput, Lang. System Mean Std. Deviation i.e. 0.39 words per second, found in DE PBSMT 5.8 1.84 Moorkens and O’Brien (2015) (Table 9): NMT 3.9 1.63 possibly with the exception of Portuguese, EL PBSMT 13.9 0.16 the translators remained quite far from this NMT 12.5 1.31 level of productivity, although it has to be PT PBSMT 3.8 1.68 stressed that this is heavily influenced by NMT 3.6 1.91 the type of text being translated as well as RU PBSMT 7.5 4.99 by the degree of expertise of the transla- NMT 7.2 5.80 tors, not only with the subject matter at hand, but also, and crucially in the specific Table 8: Technical Post-Editing Effort case reported here, with PE. This particu- (keystrokes/segment) lar result may have also been affected by the unfamiliarity with the interface, the specialised nature of the texts and related research re- quirements, or perhaps the fact that the rating and annotation tasks carried out after post-editing disturbed the translators’ momentum. Productivity is normally achieved with continuous work

segments changed and unchanged for all MT systems. For German, the difference between the number of unchanged segments for NMT when compared with the PBSMT output was statistically significant in a one-way ANOVA pairwise comparison (p<.05, where M=.06, SE=.04) (marked with †). Table 7 shows the mean and standard deviation for temporal post-editing effort, and Table 8 shows technical post-editing effort in the form of the average number of keystrokes per segment.

Lang.  System  Post-Edited  Unchanged
DE     PBSMT   940          60
       NMT     813          187†
EL     PBSMT   928          72
       NMT     863          137
PT     PBSMT   874          126
       NMT     844          156
RU     PBSMT   930          70
       NMT     848          152

Table 6: Unchanged Segments (out of 1000)

Average throughput or temporal effort was only marginally improved for German, Greek and Portuguese post-editing with NMT, as may be seen at the segment level in Table 7 and expressed in words per second in Table 9, while temporal effort for Russian was lower for PBSMT at the segment level.

Lang.  System  Mean   Std. Deviation
DE     PBSMT   74.8   21.12
       NMT     72.8   17.16
EL     PBSMT   77.7   1.85
       NMT     70.4   8.86
PT     PBSMT   57.7   14.23
       NMT     55.19  15.58
RU     PBSMT   104.6  3.62
       NMT     105.6  21.29

Table 7: Temporal Post-Editing Effort (secs/segment)

Technical post-editing effort was reduced for NMT in all language pairs using measures of actual keystrokes (Table 8) or the minimum number of edits required to go from pre- to post-edited text (cf. the HTER scores in Table 2). Even though these results were not statistically significant, they suggest that those NMT segments that were edited required more cognitive effort than PBSMT segments. Feedback from the participants indicated that they found NMT errors more difficult to identify, whereas word order errors and disfluencies requiring revision were detected faster in PBSMT output.

Lang.  System  Mean  Std. Deviation
DE     PBSMT   5.8   1.84
       NMT     3.9   1.63
EL     PBSMT   13.9  0.16
       NMT     12.5  1.31
PT     PBSMT   3.8   1.68
       NMT     3.6   1.91
RU     PBSMT   7.5   4.99
       NMT     7.2   5.80

Table 8: Technical Post-Editing Effort (keystrokes/segment)

None of the participants reached the average rate of professional throughput, i.e. 0.39 words per second, found in Moorkens and O’Brien (2015) (Table 9): possibly with the exception of Portuguese, the translators remained quite far from this level of productivity, although it has to be stressed that this is heavily influenced by the type of text being translated as well as by the degree of expertise of the translators, not only with the subject matter at hand, but also, and crucially in the specific case reported here, with PE. This particular result may also have been affected by unfamiliarity with the interface, the specialised nature of the texts and related research requirements, or perhaps the fact that the rating and annotation tasks carried out after post-editing disturbed the translators’ momentum. Productivity is normally achieved with continuous work

Lang.  PBSMT  NMT
DE     0.21   0.22
EL     0.22   0.24
PT     0.29   0.30
RU     0.14   0.14

Table 9: Words per Second (WPS)

                     Lang.  PBSMT  NMT
Short                DE     0.21   0.26
(up to 20 tokens)    EL     0.24   0.27
                     PT     0.33   0.38
                     RU     0.15   0.13*
Long                 DE     0.21   0.20*
(greater than        EL     0.20   0.22
20 tokens)           PT     0.26   0.25*
                     RU     0.13   0.14

Table 10: WPS: long vs short segments

5 Conclusions

This paper has presented the results of a large-scale comparative evaluation between NMT and PBSMT for four language pairs across several metrics, using complementary methods of human evaluation in addition to state-of-the-art automatic evaluation metrics, thus expanding the understanding of NMT's strengths and weaknesses compared to those of PBSMT. The study, which was conducted as part of the TraMOOC project, used translations of English educational-domain data from real-life MOOCs into German, Greek, Portuguese, and Russian. For these language pairs and in this domain, we can conclude that fluency is improved and word order errors are fewer when using NMT, confirming the findings of other recent studies (see Section 2). Fewer segments require post-editing when using NMT, especially due to the lower number of morphological errors. There was, however, no clear improvement with regard to omission and mistranslation errors when moving from PBSMT to NMT. There was also no great decrease in PE effort, suggesting that NMT for production may not as yet offer more than an incremental improvement in temporal PE effort. While overall NMT produced better results for our domain, expectations are high for NMT, and financial pressures mean that the translation industry is eager for a leap forward in MT quality (Moorkens, 2017). At this juncture, however, the neural paradigm is not a panacea. Following on from this study, we intend to compare cognitive post-editing effort using average pause ratio (Lacruz et al., 2012) and to evaluate the effects of added in-domain data on NMT quality and domain specificity.

Acknowledgement

The TraMOOC project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644333. The ADAPT Centre for Digital Content Technology at Dublin City University is funded under the Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

References

Abdelali, A., Guzman, F., Sajjad, H., and Vogel, S. (2014). The AMARA Corpus: Building Parallel Language Resources for the Educational Domain. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland.

Aziz, W., Castilho, S., and Specia, L. (2012). PET: a Tool for Post-editing and Assessing Machine Translation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey.

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473.

Banerjee, S. and Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, volume 29, pages 65–72.

Bentivogli, L., Bisazza, A., Cettolo, M., and Federico, M. (2016). Neural versus Phrase-Based Machine Translation Quality: a Case Study. CoRR, abs/1608.04631.

Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Jimeno Yepes, A., Koehn, P., Logacheva, V., Monz, C., Negri, M., Neveol, A., Neves, M., Popel, M., Post, M., Rubino, R., Scarton, C., Specia, L., Turchi, M., Verspoor, K., and Zampieri, M. (2016). Findings of the 2016 Conference on Machine Translation. In Proceedings of the First Conference on Machine Translation, pages 131–198, Berlin, Germany. Association for Computational Linguistics.

Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.

Castilho, S., Moorkens, J., Gaspari, F., Calixto, I., Tinsley, J., and Way, A. (2017). Is neural machine translation the new state of the art? The Prague Bulletin of Mathematical Linguistics, 108(1):109–120.

Cettolo, M., Girardi, C., and Federico, M. (2012). WIT3: Web Inventory of Transcribed and Translated Talks. In Conference of the European Association for Machine Translation, pages 261–268, Trento, Italy.

Chen, S. F. and Goodman, J. (1998). An Empirical Study of Smoothing Techniques for Language Modeling. Technical Report TR-10-98, Computer Science Group, Harvard University, Cambridge, MA, USA.

Cherry, C. and Foster, G. (2012). Batch tuning strategies for statistical machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT ’12, pages 427–436, Stroudsburg, PA, USA. Association for Computational Linguistics.

Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y. (2014). On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. CoRR, abs/1409.1259.

Durrani, N., Fraser, A., and Schmid, H. (2013). Model With Minimal Translation Units, But Decode With Phrases. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL, pages 1–11, Atlanta, GA, USA.

Durrani, N., Sajjad, H., Hoang, H., and Koehn, P. (2014). Integrating an Unsupervised Transliteration Model into Statistical Machine Translation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, pages 148–153, Gothenburg, Sweden.

Etchegoyhen, T., Bywood, L., Fishel, M., Georgakopoulou, P., Jiang, J., Loenhout, G. V., Pozo, A. D., Maucec, M. S., Turner, A., and Volk, M. (2014). Machine Translation for Subtitling: A Large-Scale Evaluation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland.

Galley, M. and Manning, C. D. (2008). A simple and effective hierarchical phrase reordering model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 848–856, Stroudsburg, PA, USA. Association for Computational Linguistics.

Gao, Q. and Vogel, S. (2008). Parallel Implementations of Word Alignment Tool. In Software Engineering, Testing, and Quality Assurance for Natural Language Processing, SETQA-NLP ’08, pages 49–57, Columbus, OH, USA.

Haddow, B., Huck, M., Birch, A., Bogoychev, N., and Koehn, P. (2015). The Edinburgh/JHU Phrase-based Machine Translation Systems for WMT 2015. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 126–133, Lisbon, Portugal.

Heafield, K. (2011). KenLM: Faster and Smaller Language Model Queries. In Proceedings of the 6th Workshop on Statistical Machine Translation, pages 187–197, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Jean, S., Firat, O., Cho, K., Memisevic, R., and Bengio, Y. (2015). Montreal Neural Machine Translation Systems for WMT’15. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 134–140, Lisbon, Portugal.

Junczys-Dowmunt, M., Dwojak, T., and Hoang, H. (2016). Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions. arXiv preprint.

Kneser, R. and Ney, H. (1995). Improved Backing-Off for M-gram Language Modeling. In Proceedings of the Int. Conf. on Acoustics, Speech, and Signal Processing, volume 1, pages 181–184, Detroit, MI, USA.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the ACL 2007 Demo and Poster Sessions, pages 177–180, Prague, Czech Republic.

Krings, H. P. (2001). Repairing texts: empirical investigations of machine translation post- editing processes. Kent State University Press, Kent, Ohio.

Lacruz, I., Shreve, G. M., and Angelone, E. (2012). Average Pause Ratio as an Indicator of Cognitive Effort in Post-Editing: A Case Study. In AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP 2012), pages 21–30, San Diego, USA.

Lommel, A. R. and DePalma, D. A. (2016). Europe’s Leading Role in Machine Translation: How Europe Is Driving the Shift to MT. Technical report, Common Sense Advisory, Boston, USA.

Luong, M.-T. and Manning, C. D. (2015). Stanford Neural Machine Translation Systems for Spoken Language Domains. In Proceedings of the International Workshop on Spoken Language Translation 2015, Da Nang, Vietnam.

Moorkens, J. (2017). Under pressure: translation in times of austerity. Perspectives: Studies in Translation Theory and Practice, 25(3).

Moorkens, J. and O’Brien, S. (2015). Post-editing evaluations: Trade-offs between novice and professional participants. In Proceedings of European Association for Machine Translation (EAMT), pages 75–81, Antalya, Turkey.

Neubig, G., Morishita, M., and Nakamura, S. (2015). Neural Reranking Improves Subjective Quality of Machine Translation: NAIST at WAT2015. CoRR, abs/1510.05203.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Stroudsburg, PA, USA.

Popović, M. (2015). chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395, Lisbon, Portugal.

Popović, M., Arcan, M., and Lommel, A. (2016). Potential and Limits of Using Post-edits as Reference Translations for MT Evaluation. Baltic J. Modern Computing, 4(2):218–229.

Pouget-Abadie, J., Bahdanau, D., van Merrienboer, B., Cho, K., and Bengio, Y. (2014). Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation. CoRR, abs/1409.1257.

Schuster, M., Johnson, M., and Thorat, N. (2016). Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System.

Sennrich, R., Firat, O., Cho, K., Birch, A., Haddow, B., Hitschler, J., Junczys-Dowmunt, M., Läubli, S., Miceli Barone, A. V., Mokry, J., and Nadejde, M. (2017). Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016a). Edinburgh Neural Machine Translation Sys- tems for WMT 16. In Proceedings of the First Conference on Machine Translation (WMT16), Berlin, Germany.

Sennrich, R., Haddow, B., and Birch, A. (2016b). Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016).

Sennrich, R., Haddow, B., and Birch, A. (2016c). Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas, volume 200(6).

Tiedemann, J. (2012). Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 2214–2218, Istanbul, Turkey.

Toral, A. and Sanchez-Cartagena, V. M. (2017). A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions. In Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017. To Appear, Valencia, Spain. ACL.

Way, A. (2013). Traditional and Emerging Use-cases for Machine Translation. In Translating and the Computer 35, TC35, London, UK. ASLIB.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. CoRR, abs/1609.08144.

Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. CoRR, abs/1212.5701.

One-parameter models for sentence-level post-editing effort estimation

Mikel L. Forcada [email protected]
Miquel Esplà-Gomis [email protected]
Felipe Sánchez-Martínez [email protected]
Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain.

Lucia Specia l.specia@sheffield.ac.uk
Department of Computer Science, University of Sheffield, Sheffield S1 4DP, United Kingdom

Abstract

Methods to predict the effort needed to post-edit a given machine translation (MT) output are seen as a promising direction to making MT more useful in the translation industry. Despite the wide variety of approaches that have been proposed, with increasing complexity as regards their number of features and parameters, the problem is far from solved. Focusing on post-editing time as effort indicator, this paper takes a step back and analyses the performance of very simple, easy to interpret one-parameter estimators that are based on general properties of the data: (a) a weighted average of measured post-editing times in a training set, where weights are an exponential function of edit distances between the new segment and those in the training data; (b) post-editing time as a linear function of the length of the segment; and (c) source and target statistical language models. These simple estimators outperform strong baselines and are surprisingly competitive compared to more complex estimators, which have many more parameters and combine rich features. These results suggest that before blindly attempting sophisticated machine learning approaches to build post-editing effort predictors, one should first consider simple, intuitive and interpretable models, and only then incrementally improve them by adding new features and gradually increasing their complexity. In a preliminary analysis, simple linear combinations of estimators of types (b) and (c) do not seem to be able to improve the performance of the single best estimator, which suggests that more complex, non-linear models could indeed be beneficial when multiple indicators are used.

1 Introduction

Over the last decade, the interest of the industry in machine translation (MT) has grown, mainly as a consequence of high demand and improvements in translation quality. Modern MT systems have proven to lead to productivity gains (Plitt and Masselot, 2010; Guerberof Arenas, 2009) when used to generate draft translations that are then post-edited (corrected) before publishing (Krings and Koby, 2001; O’Brien and Simard, 2014). However, not all the translations produced by MT systems are worth post-editing: in some cases, it would be faster to translate from scratch. As a result, a strong focus has been put on developing methods for estimating the quality of machine-translated sentences (Blatz et al., 2004; Specia et al., 2009) to identify those translations that may harm productivity if provided to post-editors. Several methods are

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 132 proposed every year and compared in the framework of the WMT series of Workshops on Machine Translation.1 Most approaches to MT quality estimation (QE) work at the sentence level, although there are also approaches that try to estimate the quality at the word or document levels. Sentence-level QE models predict translation quality in terms of post-editing (PE) time, number of edits needed, and other related metrics (Specia, 2011; Bojar et al., 2014). This paper focuses on sentence-level MT QE and measures quality in terms of PE time. This setting has the important advantage that the time predicted for each machine-translated sentence can be directly used to budget a translation job. As will be discussed below, existing PE time estimators use many parameters and combine rich features extracted from source sentences and their raw MT output, often with the help of one or more pseudo-references obtained using additional MT systems. They are, however, still far from producing human-like predictions (with Pearson correlations between predicted and human effort metrics plateauing around 0.65, (Bojar et al., 2013, 2014)). To try to understand the problem better, we explore the use of three types of very simple, one-parameter, black-box PE time estimators: (a) a weighted average of PE times in the training set, where weights are an exponential function of edit distances computed between the current sentence (source or raw MT) and training sentences (source or raw MT), so that the contribution of nearest examples is more important; (b) a simple model that learns a unit PE time, either per character or per word, and multiplies it by the length of the current sentence (source or raw MT); and (c) logarithmic probabilities obtained by applying a statistical language model of the source or the target language respectively to the source or raw MT.2 The results show that some of these very simple models outperform not only rather strong baselines, but also some complex, multi-parameter estimators participating in the WMT13 (Bojar et al., 2013) and WMT14 (Bojar et al., 2014) PE time estimation contests. Results can be taken as an indication that one should take a step back and first analyse simple models with intuitive interpretations, to only then carefully and gradually increase their complexity, before blindly attempting sophisticated machine learning approaches. In a preliminary analysis, simple linear combinations of estimators of types (b) and (c) above does not seem to be able to improve the performance of the single best estimator, which may be taken as an indication that more complex, non-linear models should be considered when multiple indicators are used.

2 Settings and models

2.1 Corpora

We have conducted experiments using the data sets for English-to-Spanish (en→es) translation, which are publicly available as part of the quality estimation shared Task 1.3 of WMT133 (Bojar et al., 2013) and WMT144 (Bojar et al., 2014); Table 1 describes these data sets. For the experiments in this paper, the corpora were pre-processed using the vanilla word tokenizer available in the Python NLTK package (Bird et al., 2009).
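For reference, this pre-processing step can be reproduced with NLTK's standard word tokenizer, as in the following sketch; the example sentence is invented and the 'punkt' tokenizer resource must be available.

```python
# NLTK's default word tokenizer (the 'punkt' resource is downloaded once if missing).
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
print(word_tokenize("The translator post-edited the raw MT output."))
# e.g. ['The', 'translator', 'post-edited', 'the', 'raw', 'MT', 'output', '.']
```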

2.2 Notation and evaluation

The training data consists of a set of $N$ triplets $\{(s_i, \mathrm{MT}(s_i), t_i)\}_{i=1}^{N}$, where $s_i$ is a source sentence, $\mathrm{MT}(s_i)$ its raw MT output, and $t_i$ the time taken to post-edit $\mathrm{MT}(s_i)$ into an adequate

1Last edition: http://www.statmt.org/wmt17/quality-estimation-task.html
2Language model features have already been proven to be the single best predictors in previous work; see e.g. (Felice and Specia, 2012; Shah et al., 2015).
3http://www.statmt.org/wmt13/quality-estimation-task.html
4http://www.statmt.org/wmt14/quality-estimation-task.html

        Translation   No. of segments
        direction     Training   Test
WMT13   en→es         803        284
WMT14   en→es         650        208

Table 1: Translation direction and number of training and test instances for the corpora used in the experiments.

translation of $s_i$. The goal is to predict the PE time for a new set of $M$ source sentences and their translations, $\{(s_j, \mathrm{MT}(s_j))\}_{j=1}^{M}$. As in the WMT13 and WMT14 contests, performance will be measured over the test set as the mean absolute error (MAE) of the prediction $\hat{t}_j$, that is,

\[ \mathrm{MAE} = \frac{1}{M} \sum_{j=1}^{M} |\hat{t}_j - t_j|. \]

In addition to this, Pearson's correlation $r$ between the predicted and measured times will also be reported as a secondary comparison metric. The best parameter for each model will be determined through minimization of the MAE over the training set, as will be explained in the next section.

2.3 Models

In what follows we describe the three one-parameter models we experimented with in order to predict PE time.

2.3.1 Weighted-average model (Avg)

This model estimates the PE time needed to turn $\mathrm{MT}(s_j)$ into an adequate translation of $s_j$ as the weighted average

\[ \mathrm{Avg}_u(\alpha, x_j) = \sum_{i=1}^{N} w(\alpha, x_i, x_j)\, t_i, \]

controlled by a single parameter $\alpha$, whose weights $w(\alpha, x_i, x_j)$ depend on edit distances through

\[ w(\alpha, x_i, x_j) = e^{-\alpha\, \mathrm{ED}_u(x_i, x_j)} \Big/ \sum_{i'=1}^{N} e^{-\alpha\, \mathrm{ED}_u(x_{i'}, x_j)}, \]

where $\mathrm{ED}_u(x_i, x_j)$ is the edit distance between $x_i$ and $x_j$, $u$ is the unit used to compute it, either characters ($u = c$) or words ($u = w$), and $x_i$ (resp. $x_j$) is either the source sentence $s_i$ (resp. $s_j$) or its machine translation $\mathrm{MT}(s_i)$ (resp. $\mathrm{MT}(s_j)$). For positive values of $\alpha$, the contribution $w(\alpha, x_i, x_j)$ of $t_i$ diminishes with the distance between either the source sentences or their raw machine translations. In particular:

• When $\alpha = 0$, $\mathrm{Avg}_u(0, x_j) = \frac{1}{N}\sum_{i=1}^{N} t_i$ for all $j$, that is, the arithmetic average of measured PE times; we will refer to this as the naïve zero-parameter average;

• When $\alpha \to +\infty$, the $t_i$ corresponding to the minimum $\mathrm{ED}_u(x_i, x_j)$, that is, the nearest neighbour, is selected. In what follows, this predictor will be referred to as $\mathrm{NN}_u(x_j)$.

It is expected that a careful choice of $\alpha$ in $[0, +\infty)$ will give a better estimate by assigning a higher weight to closer examples. The weighted average effectively acts as a “soft nearest-neighbour” predictor.
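A compact sketch of this weighted-average ("soft nearest-neighbour") predictor, written directly from the formulas above, is given below; the character-level edit distance implementation and the toy training times are only illustrative.

```python
import math

def edit_distance(a, b):
    """Character-level Levenshtein distance ED_c(a, b)."""
    dist = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dist[0] = dist[0], i
        for j, cb in enumerate(b, 1):
            prev, dist[j] = dist[j], min(dist[j] + 1, dist[j - 1] + 1,
                                         prev + (ca != cb))
    return dist[len(b)]

def avg_predict(alpha, x_new, training):
    """Avg_u(alpha, x_new): exponentially weighted average of training PE times.

    training: list of (x_i, t_i) pairs, where x_i is a source or raw-MT segment
    and t_i its measured post-editing time in seconds.
    """
    weights = [math.exp(-alpha * edit_distance(x_i, x_new)) for x_i, _ in training]
    total = sum(weights)
    return sum(w * t for w, (_, t) in zip(weights, training)) / total

# Toy example: alpha = 0 gives the naive average; larger alpha approaches nearest-neighbour.
train = [("the cat sat", 30.0),
         ("the cat sat down", 35.0),
         ("a completely different sentence", 90.0)]
for alpha in (0.0, 0.2, 5.0):
    print(alpha, round(avg_predict(alpha, "the cat sits", train), 1))
```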

To find the optimum value of $\alpha$, the training corpus is randomly split into two sets: 80% of the samples are used to compute the edit distances and the remaining 20% are used as a development set.

The idea behind the weighted-average model bears some resemblance to the work by Béchara et al. (2016), where a semantic textual similarity (between the source sentences) is used to select a close example: instead of predicting time, Béchara et al. (2016) predict the BLEU score for sentences that do not have a reference translation available, using as reference the one for the close example. Note that the weighted-average model is clearly a black-box model, as it does not have access to the inner workings of the MT system whose quality is being predicted. It is also an example-based model that computes a prediction for the current segment by looking up measured times for existing segments in a training set.

2.3.2 Models based on the PE time per segment length unit (TLen)

These very simple, one-parameter estimators predict the PE time tj as

\[
\mathrm{TLen}_u(a, x_j) = a \cdot \mathrm{len}_u(x_j) ,
\]

where xj is a source sentence sj, or its machine translation MT(sj), and lenu(xj) is the length of xj in characters (u = c) or words (u = w). Note that the coefficient a, which is obtained by directly minimizing the MAE over the whole training corpus, has an easy interpretation in seconds per character or seconds per word, respectively.

Again, this is a black-box model, which, in addition, only looks at one property of the source or machine-translated segment: its length. When xj = sj, it simply predicts that PE time grows linearly with the length of the source sentence. When xj = MT(sj), the estimate is similar if one assumes that target-segment length grows linearly with source-segment length. Note, however, that this predictor pays very little attention to the actual post-editability of the translation:

• Any MT output having the same length would have the same post-editing time, regardless of the actual target words.

• Truncated or abnormally short MT outputs would be consistently, and often incorrectly, estimated to be easier to post-edit.

These models are therefore expected to be very limited predictors of PE time.

2.3.3 Statistical language models

Source-language (SLM) and target-language models (TLM), trained on a subset of the WMT13 translation task data5 (an interpolated combination of Europarl and News Commentary data), were used to compute the logarithm of the probability of sj and MT(sj), respectively. This is then multiplied by a coefficient a which is also optimized to minimize MAE on the whole training set. Language models are common indicators used in QE but also have important limitations as PE time predictors:

• A TLM basically measures the fluency of the translation (Specia et al., 2013, p. 80), and would estimate more fluent translations as easier to post-edit, regardless of their actual semantic relationship to the source sentence.

• An SLM would in contrast measure the complexity of the source sentence (Specia et al., 2013, p. 80), or, if the language model was trained on texts similar to those on which the MT system was trained, its expectedness. Nevertheless, its predictive power may be limited when applied to a system that was not trained on similar data (or to a rule-based system).

5 http://www.statmt.org/wmt13/translation-task.html

It is however worth mentioning that language models are amongst the best performing features for sentence-level MT QE (Felice and Specia, 2012; Shah et al., 2015) and are therefore included in most models submitted to the WMT QE shared tasks.
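As an illustration of how the single coefficient a of these predictors can be obtained, the sketch below minimizes the training-set MAE with a one-dimensional optimizer. The fitting procedure (Brent's method via scipy) and all names are assumptions of ours; the paper only states that a is obtained by directly minimizing the MAE over the training corpus.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_single_coefficient(feature_values, pe_times):
    """Fit the single coefficient a of a predictor of the form a * feature(x)
    (e.g. TLen with segment lengths, or SLM/TLM with log-probabilities)
    by minimizing the mean absolute error on the training set."""
    x = np.asarray(feature_values, dtype=float)
    t = np.asarray(pe_times, dtype=float)
    result = minimize_scalar(lambda a: np.mean(np.abs(a * x - t)))
    return result.x

# Hypothetical usage for a TLen_w model:
# a = fit_single_coefficient(mt_lengths_in_words, measured_pe_times)
# predicted_times = a * np.asarray(test_mt_lengths_in_words)
```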

3 Results and discussion

3.1 Performance of one-parameter predictors

Tables 2 and 3 summarize the results for the one-parameter models, placing them in the context of the results obtained by other WMT13 and WMT14 participants. The performance of the zero-parameter naïve average, that is, the one obtained using for all test segments the average time in the training set as a fixed estimate, and the four nearest-neighbour estimates NNu(xj) (see Section 2.3.1) are also provided for completeness. The main metric used in the discussion is MAE, the official metric in WMT13 and WMT14. Pearson correlations, also provided, roughly follow the same trend, and their comparison would lead to similar conclusions (but see Section 3.1.3 for a more detailed discussion).

3.1.1 WMT13 results

When ordering results by MAE, as in (Bojar et al., 2013), the one-parameter models (Avg, TLen, SLM and TLM) outperform at least 2 of the 14 participants, with the TLen models actually outperforming 8 of them and the TLM outperforming 12 of them. The baseline system (Baseline bb17 SVR), using support vector regression and a well-known set of 17 black-box features (Specia et al., 2013), also outperforms 8 of the 14 participant models. It is worth mentioning that language models are also included as features in this baseline set; that is, the baseline's feature set is a superset of the features used by the single-parameter LM models. Nevertheless, the TLM outperforms the baseline by a rather large margin. This result in particular may reveal problems present not only in the baseline but also in other participating submissions, such as (a) additional features adding noise that the learning algorithm could not adequately handle, (b) the regression architecture used (for instance, support vector regression in the case of the baseline) not being adequate, (c) optimization not being good enough (for instance, due to an incorrect choice of hyperparameters or to incomplete convergence), or (d) over-fitting to a rather small training set. All these reasons are in principle possible and worth a closer examination. One of the participating systems is even outperformed by the naïve-average zero-parameter estimate, and two of them by one of the (also parameterless) nearest-neighbour estimates. In general terms, the computationally simpler (linear) TLen models perform better than the more complex (sum of exponentials containing edit distances) Avg models, while the outstanding performance of the TLM and SLM may be explained by the fact that they were trained on the same data as the system whose quality was estimated; therefore, in this last case, the black-box assumption would not hold entirely.

3.1.2 WMT14 results

When ordering results by MAE, as in (Bojar et al., 2014), the one-parameter models have a more modest performance on this dataset, beating only 3 out of the 10 submissions, among them FBK-UPV-UEDIN/NOWP, which uses hundreds of features obtained from the best 100,000 translations produced by a purposely-trained statistical MT system, and the baseline, a rather strong model (17 features) equivalent to the WMT13 baseline. Contrary to what happened for WMT13, character-level Avg models seem to perform slightly better than the TLen models and the SLM and TLM models; these language models were trained on the same data as for WMT13, whereas the MT systems evaluated in WMT14 were not. All zero-parameter models (naïve average, nearest-neighbour) rank below all participants. We note that the performance of the official baseline system (Baseline bb17 SVR) is

particularly poor on this data set. The reason for this was the ranges used for the grid search to optimize the hyperparameters of the support vector machine model, which were different from those used in the WMT13 model. If the same ranges are used, the baseline reaches a MAE of 17.65, which would place it above all of the one-parameter models and above two of the participating systems. This issue provides further evidence that more complex models need to be carefully crafted, with special attention dedicated to their hyperparameters.

3.1.3 Analysis

How can length be such a reasonable estimator? In both datasets, the length-based TLenu(xj) estimators show a rather competitive performance, in spite of the obvious limitations discussed in Section 2.3.2. This may be due to the fact that the output of a single MT system was post-edited; as a consequence, MT quality, and therefore the post-editing effort across the segments produced by that system, is quite stable, effectively yielding a roughly constant per-word or per-character post-editing time and making length a reasonable estimator in this case.6 It would therefore be reasonable to expect performance to have been clearly worse if output from at least two MT systems with very different levels of quality had been post-edited.

Pearson correlations between predictions. In addition to the Pearson correlation with the test set, we have computed the Pearson correlation coefficient between the predictions of participating submissions, which are available at the WMT13 and WMT14 websites,7,8 and our best one-parameter TLen, Avg, and TLM models. In general terms, systems showing a good correlation with the one-parameter models happen to perform similarly, an indication that their predictions are very similar for the test sentences. There are, however, interesting exceptions. An example of moderate correlation among predictors but similar performance is the SHEF FS submission to WMT13, which has a correlation coefficient with Avgc(α = 0.256, MT(sj)) of 0.67 and an absolute difference in MAE of only 0.70. This may point to a certain complementarity between the two predictors, which seem to predict differently for many test sentences in spite of similar MAE performance. Another remarkable exception is the LIMSI elastic submission to WMT13, which correlates reasonably well with TLenw(a = 3.226, MT(sj)) (r = 0.74), Avgc(α = 0.256, MT(sj)) (r = 0.75), and especially TLM(a = −1.421, MT(sj)) (r = 0.85) but performs considerably worse (the absolute differences in MAE are 18.6, 14.0, and 21.8 respectively). As we discuss in what follows, this could be an issue related to inadequate scaling of predictions.

Correlation and MAE leading to different system rankings: A scaling study. There are cases in which using the Pearson correlation obtained by participants does not lead to the same system ranking as that obtained by the official MAE-based ranking. One such case is the LIMSI elastic submission, which has a Pearson correlation in the range of participants getting much lower MAE. Another interesting case is that of FBK-UPV-UEDIN/WP, FBK-UPV-UEDIN/NOWP and RTM-DCU/RTM-RR in WMT14. Again, their Pearson correlation coefficients are clearly higher than those of other participants having similar MAE. Discrepancies between MAE and correlation coefficients may easily be explained in terms of scaling; in fact, by simply scaling the outputs of all participating predictors one can obtain better MAE results, as shown in Tables 2 and 3.

6 The actual time per unit, both in the training set and the test set, indeed shows a rather peaked distribution around the average values used by the length predictors.
7 http://www.statmt.org/wmt13/quality_estimation_data/QE_WMT13_submissions_task1.3_sentence.zip
8 http://www.statmt.org/wmt14/quality_estimation_data/QE_WMT14_submissions_task1.3_sentence.zip

System ID                        MAE    r     Scaling  Scaled MAE  ∆MAE
FBK-UEDIN Extra                  47.5   0.65  0.909    46.5        1.0
FBK-UEDIN Rand-SVR               47.9   0.66  1.062    47.6        0.3
TLM(a = −1.421, MT(sj))          48.8   0.65
CNGL SVR                         49.2   0.67  1.164    47.6        1.6
CNGL SVRPLS                      49.6   0.68  1.104    48.9        0.7
SLM(a = −1.249, sj)              49.7   0.64
CMU slim                         51.6   0.63  0.902    50.6        1.0
Baseline bb17 SVR                51.9   0.61  1.103    51.4        0.5
TLenw(a = 3.226, MT(sj))         52.0   0.57
TLenw(a = 3.468, sj)             52.3   0.59
DFKI linear6                     52.4   0.64  0.857    50.7        1.7
TLenc(a = 0.664, sj)             52.4   0.57
TLenc(a = 0.601, MT(sj))         52.5   0.57
CMU full                         53.6   0.58  1.006    53.6        0.0
DFKI pls8                        53.6   0.59  0.874    52.1        1.5
TCD-DCU-CNGL SVM2                55.8   0.47  1.082    55.4        0.4
TCD-DCU-CNGL SVM1                55.9   0.48  1.083    55.5        0.4
SHEF FS                          55.9   0.42  0.870    54.7        1.2
Avgc(α = 0.256, MT(sj))          56.6   0.53
Avgc(α = 0.386, sj)              57.2   0.56
Avgw(α = 1.079, MT(sj))          61.1   0.52
Avgw(α = 0.612, sj)              61.7   0.59
NNc(sj)                          62.5   0.41
SHEF FS-AL                       64.6   0.57  1.054    64.4        0.2
NNc(MT(sj))                      67.8   0.35
Naïve zero-parameter average     68.1   —
NNw(sj)                          70.1   0.37
LIMSI elastic                    70.6   0.58  1.804    54.4        26.2
NNw(MT(sj))                      71.3   0.30

Table 2: Mean absolute error (MAE) and Pearson correlation coefficient (r) for one-parameter (Avg(α, xj ), TLen(a, xj ), SLM(a, sj ) and TLM(a, MT(sj ))) and zero-parameter (naïve average, NNu(xj )) quality estimators (all shaded) in the context of WMT13 submissions. For WMT13 participants, the results of oracle scaling (see text) are also given: scaling factor, new MAE and variation of MAE.

System ID                        MAE    r     Scaling  Scaled MAE  ∆MAE
RTM-DCU/RTM-SVR                  16.77  0.63  0.863    16.29       0.48
MULTILIZER/MLZ2                  17.07  0.64  0.851    16.22       0.75
SHEFF-lite                       17.13  0.61  0.949    17.05       0.08
MULTILIZER/MLZ1                  17.31  0.65  0.835    16.43       0.88
SHEFF-lite/sparse                17.42  0.61  0.963    17.38       0.04
FBK-UPV-UEDIN/WP                 17.48  0.66  0.812    15.76       1.72
RTM-DCU/RTM-RR                   17.50  0.64  0.814    16.16       1.34
Avgc(α = 0.217, sj)              17.69  0.58
Avgc(α = 0.202, MT(sj))          17.94  0.57
TLM(a = −0.538, MT(sj))          18.38  0.57
TLenc(a = 0.281, MT(sj))         18.55  0.58
SLM(a = −0.521, sj)              18.59  0.55
TLenw(a = 1.519, MT(sj))         18.66  0.55
FBK-UPV-UEDIN/NOWP               18.69  0.62  0.758    16.72       1.97
Avgw(α = 0.794, MT(sj))          18.75  0.54
TLenc(a = 0.327, sj)             18.80  0.59
TLenw(a = 1.616, sj)             18.84  0.56
Avgw(α = 0.61, sj)               18.86  0.56
USHEFF                           21.48  0.57  0.907    21.25       0.23
Baseline bb17 SVR                21.49  0.54  0.906    21.25       0.24
NNc(sj)                          21.53  0.37
NNw(MT(sj))                      21.80  0.36
Naïve zero-parameter average     21.93  —
NNw(sj)                          22.14  0.31
NNc(MT(sj))                      22.65  0.32

Table 3: Mean absolute error (MAE) and Pearson correlation coefficient (r) for one-parameter (Avg(α, xj ), TLen(a, xj ), SLM(a, sj ) and TLM(a, MT(sj ))) and zero-parameter (naïve average, NNu(xj )) black- box quality estimators (all shaded) in the context of WMT14 submissions. For WMT14 participants, the results of oracle scaling (see text) are also given: scaling factor, new MAE and variation of MAE.

In the case of the LIMSI elastic submission to WMT13, the scaling factor of 1.804 leads to the best possible test-set MAE of 54.4, which is much better and closer to that of other systems having similar Pearson coefficients. Note that this is an oracle scaling, since the gold-standard time measurements for the test set are used to obtain the best scaling factor; however, it is reasonable to expect that linear scaling learned on the training set would also have improved this predictor. For the remaining participants in WMT13, oracle scaling factors in the range [0.857, 1.164] lead to small changes in MAE between 0.3 and 1.7 seconds, that is, around 0.6% to 3.4%. These changes would be expected to be even smaller or even negligible if scaling had been learned on the training set. The scaling picture for WMT14 is also interesting (see Table 3). Oracle scaling factors in the range [0.758, 0.949] lead to improvements in MAE in the range [0.24 s, 1.97 s], which are sometimes as large as 12%. The improvements are particularly substantial for FBK-UPV-UEDIN/NOWP (–1.97 s, scaling 0.758), FBK-UPV-UEDIN/WP (–1.72 s, scaling 0.812) and RTM-DCU/RTM-RR (–1.34 s, scaling 0.814), which would explain the discrepancies between Pearson correlation and MAE mentioned above. It is reasonable to expect that a scaling factor obtained using the training set would have also made a difference in the test-set MAE in these three cases.
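The oracle scaling reported in Tables 2 and 3 amounts to finding the constant factor that minimizes the test-set MAE of the scaled predictions. The following sketch is a minimal illustration under that reading; the exact search procedure used to produce the tables is not described in the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def oracle_scaling(predictions, gold_times):
    """Return the scaling factor c minimizing the MAE of c * predictions
    against the gold PE times, together with the resulting (scaled) MAE.
    'Oracle' because the gold test-set times are used to choose c; a
    deployable variant would fit c on the training set instead."""
    p = np.asarray(predictions, dtype=float)
    g = np.asarray(gold_times, dtype=float)
    res = minimize_scalar(lambda c: np.mean(np.abs(c * p - g)))
    return res.x, float(np.mean(np.abs(res.x * p - g)))
```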

Approximating complex predictors with just one parameter: Finally, it is worth noting that some systems showing a good Pearson correlation with the models presented in this paper use very large numbers of features and parameters. In particular, the Pearson correlation coefficient of the RTM-DCU/RTM-SVR submission to WMT14 with TLenc(a = 0.281, MT(sj)) is 0.90 (the absolute difference in MAE is 1.78) and, while the latter has one feature and a single parameter, the former uses hundreds of features and several other sources of information. Oracle scaling of RTM-DCU/RTM-SVR slightly improves its test-set MAE to 16.23 s.

3.2 Performance of few-parameter predictors

In view of the surprisingly competitive results obtained with some of the single-parameter models presented here, one would immediately ask the following question: would performance improve further by using linear combinations of them? We take the following six linear predicting features: the length-based TLenc(sj), TLenw(sj), TLenc(MT(sj)) and TLenw(MT(sj)), and the two statistical language models SLM and TLM. For this study, we leave aside the weighted-average features as they are computationally more intensive to use and to train, do not have a linear form, and need a separate development set to be trained. All 2^6 − 1 = 63 possible subsets of these 6 features are studied.9 We take linear combinations of each subset and use the multidimensional downhill simplex algorithm of Nelder and Mead (1965), as implemented in the Python library scipy, to search for the coefficients that minimize the training-set MAE. For more than two parameters, the result of the minimization heavily depends on the starting point (this is expected in view of the strong collinearity, for instance, between length features). Therefore, and to ensure the best possible training-set MAE, for each subset 50 searches are performed with starting parameters randomly sampled from the zero-mean, unit-variance normal distribution N(0, 1).

The results are shown in Table 4. As expected, the lowest training-set MAE is found when all six features are used; however, the resulting test-set MAE does not improve on the results obtained with the best single-parameter predictor: 48.8 s for WMT13 (same as TLM alone) and 18.39 s for WMT14 (almost the same as TLM alone). Conversely, some combinations having worse training-set MAE get better test-set MAE results, such as 48.22 s for a mixture of just TLenw(sj) and TLM(MT(sj)) in WMT13,

9 Exhaustive search in feature spaces is sometimes performed in QE, e.g. (Scarton et al., 2015).

Dataset  Features  Best combination                                              Train MAE  Test MAE
WMT13    1         TLM                                                           41.3       48.8
         2         TLenw(si) + TLM                                               41.0       48.2
         3         TLenw(si) + TLenc(MT(si)) + TLM                               40.7       49.1
         4         TLenw(si) + TLenc(MT(si)) + SLM + TLM                         40.6       48.6
         5         TLenw(si) + TLenc(MT(si)) + TLenw(MT(si)) + SLM + TLM         40.5       48.8
         6         All 6                                                         40.5       48.8
WMT14    1         SLM                                                           15.92      18.59
         2         TLenc(MT(si)) + SLM                                           15.60      18.44
         3         TLenw(MT(si)) + TLenc(MT(si)) + SLM                           15.57      18.51
         4         TLenc(si) + TLenw(MT(si)) + TLenc(MT(si)) + SLM               15.53      18.40
         5         TLenc(si) + TLenw(si) + TLenw(MT(si)) + TLenc(MT(si)) + SLM   15.53      18.40
         6         All 6                                                         15.53      18.39

Table 4: Post-editing time prediction using a small number of linear features: number of features, best combination, training-set MAE, and test-set MAE.

or 18.2 s for a mixture of TLenw(sj), TLenc(MT(sj)) and TLM(MT(sj)) for WMT14. These results may be an indication of over-fitting or of the limitations of a simple linear regressor.
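A minimal sketch of the subset search described above is given below. Only the use of the Nelder-Mead downhill simplex algorithm from scipy, with 50 randomly initialized searches per subset, is taken from the text; the data layout, function names and random seed are illustrative assumptions.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def best_linear_combinations(feature_matrix, pe_times, n_restarts=50, seed=0):
    """For every non-empty subset of feature columns, search for the linear
    combination minimizing the training-set MAE with randomly restarted
    Nelder-Mead, and return the best (MAE, subset, coefficients) found."""
    rng = np.random.default_rng(seed)
    X = np.asarray(feature_matrix, dtype=float)   # shape: (segments, 6 features)
    t = np.asarray(pe_times, dtype=float)
    best = None
    for k in range(1, X.shape[1] + 1):
        for subset in itertools.combinations(range(X.shape[1]), k):
            Xs = X[:, subset]
            mae = lambda w, Xs=Xs: np.mean(np.abs(Xs @ w - t))
            for _ in range(n_restarts):
                res = minimize(mae, rng.standard_normal(k), method="Nelder-Mead")
                if best is None or res.fun < best[0]:
                    best = (res.fun, subset, res.x)
    return best
```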

3.3 Budgeting translation jobs

An interesting use of PE time predictors is budgeting a PE job, when post-editors are paid by the hour. Given a new translation job, an estimate of the time to complete that job may easily be obtained by summing up the predicted PE time over all segments. This is a very practical application of QE. Disregarding the actual hourly rate (a constant factor), a good estimate of the usefulness for budgeting may be given by studying the Pearson correlation between the total time predicted for a job by a certain estimator and the actual total time for that job. To simulate this, we repeatedly and randomly extract PE jobs $\{(s_j, \mathrm{MT}(s_j), t_j)\}_{j=1}^{n}$ of n = 100 sentences from each of the test sets without replacement. Over each one of these sets, we compute the Pearson correlation between the predicted total time and the actual total measured time. The correlation coefficients obtained vary with the number of random jobs, but their values for job sizes of 0.4, 0.8, 1.0, and 2.0 times the size of the test set and for a fixed number of 1000 jobs show consistent relative trends. The results for a number of jobs equal to the number of segments in the test set are shown in Table 5. As can be seen, the Pearson correlation reported for the best single-parameter predictors is almost the same as that for the winning system in WMT13, and slightly worse in WMT14. This would suggest that, at least for these datasets, simple predictors could be used instead of very complex predictors having a large number of features and parameters with a very small loss in budgeting accuracy.
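The budgeting simulation can be sketched as follows. The job size of 100 segments and the sampling of as many jobs as there are test segments follow the description above; the random seed and the use of scipy's pearsonr are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def budgeting_correlation(predicted_times, measured_times, job_size=100,
                          n_jobs=None, seed=0):
    """Pearson correlation between the predicted and the measured total PE time
    over randomly drawn jobs of job_size segments (sampled without replacement
    within each job)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(predicted_times, dtype=float)
    m = np.asarray(measured_times, dtype=float)
    n_jobs = len(m) if n_jobs is None else n_jobs
    pred_totals, gold_totals = [], []
    for _ in range(n_jobs):
        idx = rng.choice(len(m), size=job_size, replace=False)
        pred_totals.append(p[idx].sum())
        gold_totals.append(m[idx].sum())
    return pearsonr(pred_totals, gold_totals)[0]
```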

Dataset  Predictor                      r
WMT13    FBK-UEDIN Extra (winner)       0.867
         TLM(MT(sj))                    0.856
         SLM(sj)                        0.854
         TLenw(MT(sj))                  0.856
         Avgc(MT(sj))                   0.806
         Baseline                       0.849
WMT14    RTM-DCU/RTM-SVR (winner)       0.860
         Avgc(sj)                       0.821
         TLM(MT(sj))                    0.828
         TLenc(MT(sj))                  0.827
         SLM(sj)                        0.819
         Baseline                       0.730

Table 5: Budgeting Pearson correlation coefficients for selected PE time predictors, computed for a number of random jobs equal to the number of segments in the test set.

4 Concluding remarks

The results obtained by very simple, one-parameter MT QE models happen to be surprisingly competitive with those obtained by complex QE models using strong learning algorithms, tens, hundreds or thousands of features, and, sometimes, additional resources such as existing, custom-trained, or external MT systems. The findings in this study lead us to make the following recommendations for researchers in MT QE:

• First, look at what can be done with very simple models before using a sledgehammer to crack nuts, in order to get an idea of the performance one could obtain and hopefully improve. As some of the features used in the simple models proposed here are usually part of participants’ complex models, the modest performance they obtain may be due to noise introduced by new features that could not be filtered out by the regressors (probably as a result of a non-optimal training process), to learning problems such as over-fitting to the training set, to non-optimal hyper-parameter choice, to incomplete convergence, or to the shortcomings of the regressors used (as revealed by the oracle scaling described in Section 3.1.3); the actual reasons are probably worth a closer analysis. • Then, incrementally explore more complex models; linear combinations of a few carefully selected features do not seem to help much; therefore, one should probably consider simple non-linear models. The results of this analysis may be expected to shed some light on the problem.

Finally, a better understanding of the contribution of each feature to the QE models using them could open the door to using, in real-life QE scenarios, feasible and computationally simpler predictors. Acknowledgements: Work supported by the Spanish government through the EFFORTUNE (TIN2015-69632-R) project and through grant PRX16/00043 for Mikel L. Forcada, and by the European Commission through the QT21 project (H2020 No. 645452).

References

Béchara, H., Parra-Escartín, C., Orăsan, C., and Specia, L. (2016). Semantic textual similarity in quality estimation. Baltic J. Modern Computing, 4(2):256–268.

Bird, S., Klein, E., and Loper, E. (2009). Natural language processing with Python. O'Reilly Media, Inc.

Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2004). Confidence estimation for machine translation. In Proceedings of the 20th International Conference on Computational Linguistics, COLING ’04, pages 315–321.

Bojar, O., Buck, C., Callison-Burch, C., Federmann, C., Haddow, B., Koehn, P., Monz, C., Post, M., Soricut, R., and Specia, L. (2013). Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the 8th Workshop on Statistical Machine Translation, pages 1–44, Sofia, Bulgaria.

Bojar, O., Buck, C., Federmann, C., Haddow, B., Koehn, P., Leveling, J., Monz, C., Pecina, P., Post, M., Saint-Amand, H., et al. (2014). Findings of the 2014 Workshop on Statistical Machine Translation. In Proceedings of the 9th Workshop on Statistical Machine Translation, pages 12–58, Baltimore, MD, USA.

Felice, M. and Specia, L. (2012). Linguistic features for quality estimation. In Proceedings of the 7th Workshop on Statistical Machine Translation, pages 96–103, Montreal, Quebec, Canada. Association for Computational Linguistics.

Guerberof Arenas, A. (2009). Productivity and quality in the post-editing of outputs from translation memories and machine translation. The International Journal of Localisation, 7(1):11–21.

Krings, H. P. and Koby, G. S. (2001). Repairing texts: empirical investigations of machine translation post-editing processes, volume 5. Kent State University Press.

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.

O’Brien, S. and Simard, M. (2014). Introduction to special issue on post-editing. Machine Translation, 28(3-4):159–164.

Plitt, M. and Masselot, F. (2010). A productivity test of statistical machine translation post-editing in a typical localisation context. The Prague Bulletin of Mathematical Linguistics, (93):7–16.

Scarton, C., Tan, L., and Specia, L. (2015). USHEF and USAAR-USHEF participation in the WMT15 QE shared task. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 336–341.

Shah, K., Cohn, T., and Specia, L. (2015). A Bayesian non-linear method for feature selection in machine translation quality estimation. Machine Translation, 29(2):101–125.

Specia, L. (2011). Exploiting objective annotations for measuring translation post-editing effort. In Proceedings of the 15th Conference of the European Association for Machine Translation, pages 73–80.

Specia, L., Shah, K., De Souza, J. G., and Cohn, T. (2013). QuEst - a translation quality estimation framework. In Proceedings of the Conference of the Association for Computational Linguistics (Conference System Demonstrations), pages 79–84.

Specia, L., Turchi, M., Cancedda, N., Dymetman, M., and Cristianini, N. (2009). Estimating the Sentence-Level Quality of Machine Translation Systems. In 13th Annual Conference of the European Association for Machine Translation, pages 28–37, Barcelona, Spain.

A Minimal Cognitive Model for Translating and Post-editing

Moritz Schaeffer [email protected] Gutenberg University, Mainz, Germany

Michael Carl [email protected] Renmin University of China, and Copenhagen Business School, Denmark

Abstract
This study investigates the coordination of reading (input) and writing (output) activities in from-scratch translation and post-editing. We segment logged eye movements and keylogging data into minimal units of reading and writing activity and model the process of post-editing and from-scratch translation as a Markov model. We show that the time translators and post-editors spend on source or target text reading predicts with a high degree of accuracy how likely it is that they engage in successive typing. We further show that the typing probability is also conditioned by the degree to which source and target text share semantic and syntactic properties. The minimal cognitive Markov model describes very basic factors which play a role in the processes occurring between input (reading) and output (writing) during translation.

1 Introduction

We build a cognitive model of the translation process (from-scratch translation and post-editing) which aims at predicting where translation problems occur. We ground the model in translation activity data that consists of keystrokes and gaze data captured during translation sessions. We decompose the translation process into minimal cycles of iterative reading and writing. We assume that the typing activities represent the solution to a translation problem that emerged during the preceding reading event. We show that the complexity (i.e. non-literality) of the produced translation as well as the duration and distribution of gaze activities on the source and target texts has an effect on the probability of a successive typing event. Schaeffer et al. (2016), Hvelplund (2016), Carl et al. (2016) and Läubli and Germann (2016) describe methods to decompose the stream of eye movements and keystrokes into sequences of minimal activity units. In this paper we relate the duration of activity units to properties of the translation product, the degree of translation literality, in order to predict when post-editors and translators will type after reading either the source (henceforth ST) or the target text (henceforth TT). Carl et al. (2016) show that a measure of translation literality has great predictive power for behavioral observations in the translation process. According to this definition, a translation is literal if:

1. Word order is identical in the ST and TT

2. ST and TT items correspond one-to-one

3. Each ST word has only one possible translated form in a given context

A translation which completely fulfills all three criteria is an absolutely literal translation. A literal translation consists of the same number of ST and TT tokens, where each TT token corresponds to exactly one ST token, and tokens in both texts are ordered in the same way. A change in word order, or a situation in which one ST word is aligned to more than one TT word or vice versa, weakens literality criteria 1 and 2 and makes a translation less literal. Criteria 1 and 2 thus measure the syntactic similarity of an ST and its translation. The third criterion describes the semantic similarity in both languages. If a word (or phrase) is consistently translated in the same way by different translators, we assume that the ST word and its translation also have largely overlapping semantic properties. The more a source word (or phrase) can be rendered into different translations, the weaker is also the semantic overlap between the two languages (with respect to this word or phrase). In this paper we show that the degree of translation literality has an effect on the reading activities prior to translation typing. In section 2 we introduce an operationalization of the literality metric described above. We introduce a metric "HCross" which measures the entropy of the word-order choices observed in alternative translations, and which is strongly predictive of reading time duration during the translation process. Section 3 presents the material of our empirical study. In section 4 we introduce translation units and translation states, as well as the topology of a minimal cognitive model for from-scratch translation and post-editing. We review similar work which used transition networks of activity units to model novice and expert translators. We review a proposal that defines different translation styles and map these onto sequences of translation states of our minimal cognitive model. In section 5 we analyze our data and develop a minimal model of translation and post-editing.

Figure 1: An English-Spanish alignment with Cross values

2 Operationalising Translation Literality

2.1 Word-order Distortion (Cross)

From a given translation and its word alignment relations we compute Cross values (see Figure 1). For any two successive source words sk−1 and sk, we follow the alignment links to their translations (sk−1 → tk−1 and sk → tk) and compute the distance between the positions of the words tk and tk−1 in the translation (i.e. position(tk) − position(tk−1)) as the value of Cross(sk). We thus obtain a vector of relative alignment distortions for word positions in the ST and the TT, indicating the word-order similarity of the two sentences. In the case of an (absolutely) literal translation, each successive word aligns with the next one in the target language, which gives a Cross vector with all values equal to 1. For instance, the word [He] in Figure 1 occurs at position 1 on the English source side while its Spanish translation [le] occurs one word ahead at position 2 in the translation. [He] thus has a Cross(s1) value of 2 in that sentence. In order to generate the translation [aplicaron] for the English [given] we need to jump from the previous alignment [was-Se] two words to the

right, which produces a Cross(s3) value of 2. In this way, Cross values are generated for each word position in the text, for the source and the target sides. If a word is aligned to more than

one word (e.g. sk → {tk1, ..., tkn}), Cross(sk) is the signed value of the maximum absolute difference, i.e. max(abs(tk1 − tk−1), ..., abs(tkn − tk−1)). In this way, t10, which has the alignment [cada → "each of the"], has an alignment distortion value Cross(t10) = 3.

2.2 Word Translation Entropy (HTra)

Carl et al. (2016) introduce word translation entropy as a measure to quantify observed translation choices. Entropy, H, represents the average amount of non-redundant information provided by each new item. It is computed based on the sum of the probabilities of the items and their information. The information of a probability p is defined as I(p) = −log2(p). The entropy H is the expectation of that information, as defined in equation (1):

\[
H = \sum_{i=1}^{n} p_i I(p_i) = -\sum_{i=1}^{n} p_i \log_2(p_i) \qquad (1)
\]

We adopt this notion to assess the entropy of word translation choices for a given ST word sk into its n possible translations t1...tn, as shown in equation (2):

\[
\mathrm{HTra}(s_k) = -\sum_{i=1}^{n} p(t_i \mid s_k) \log_2 p(t_i \mid s_k) \qquad (2)
\]

The word translation entropy HTra(sk) in equation (2) is computed for each source word sk and in every segment. The translations t1...tn are taken only from the aligned alternative translations of this segment. That is, the word translation probabilities p(ti|sk), computed according to equation (3), represent the ratio of the number of observed translations sk → ti, counted separately for each source segment in which sk occurs. Thus, while in language modeling the entropy indicates how many possible continuations for a sentence exist at any time, we deploy the metric to assess how many different translations an ST word has in a given context.

\[
p(t_i \mid s_k) = \frac{\mathrm{count}(s_k \to t_i)}{\mathrm{count}(s_k)} \qquad (3)
\]

We take it that HTra reflects the semantic similarity between a source word and its translation(s): low HTra values indicate a high amount of agreement between translator choices, and thus a high degree of semantic similarity according to literality criterion 3 above.
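As a small illustration, HTra for one source word in one segment can be computed directly from the translations observed for it across the alternative translations of that segment; the helper below is ours and does not reproduce the TPR-DB tooling.

```python
import math
from collections import Counter

def word_translation_entropy(observed_translations):
    """HTra for one source word in one segment, given the list of translations
    observed for it in the aligned alternative translations of that segment."""
    counts = Counter(observed_translations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical example: if three of four translators choose "casa" and one
# chooses "hogar", word_translation_entropy(["casa", "casa", "hogar", "casa"])
# returns about 0.81 bits; full agreement would give 0.0.
```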

2.3 Word-order Entropy (HCross)

The choices that a translator has to re-order translations of a source word sk in the target language are captured by the metric HCross, as given in equation (4).

\[
\mathrm{HCross}(s_k) = -\sum_{i=1}^{n} p(\mathrm{Cross}_i(s_k)) \log_2 p(\mathrm{Cross}_i(s_k)) \qquad (4)
\]

The probability of each relative translation word-order distortion p(Cross(sk)) for a source word sk is computed as the ratio of the number of distortions Cross(sk) observed for the alternative translations sk → t1...n divided by the total number of observed alternative translations count(sk), similarly to equation (3). HTra and HCross values correlate to a high degree (r=.79, p < .001). That is, semantic and syntactic variation seem to correlate highly in translation. More variation in syntactic (i.e.

word-order) rendering of the translation seems to come along with more variation in lexical choices, and vice versa: low cross-lingual semantic similarity (i.e. high HTra values) is correlated with high syntactic variation and complexity (i.e. high HCross values).
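The Cross and HCross computations can be sketched as follows. The alignment representation (a dict from 1-based source positions to lists of 1-based target positions) and the choice of the first aligned position of the previous word as tk−1 for multi-word alignments are simplifying assumptions not specified in the paper.

```python
import math
from collections import Counter

def cross_value(alignment, k):
    """Signed word-order distortion Cross(s_k) for source position k (1-based),
    given an alignment dict {source position -> list of target positions}.
    Simplified sketch: every relevant source position is assumed to be aligned,
    and t_{k-1} is taken as the first target position aligned to s_{k-1}."""
    prev_tgt = alignment[k - 1][0] if k > 1 else 0
    diffs = [t - prev_tgt for t in alignment[k]]
    # Signed value of the maximum absolute difference, as in the definition.
    return max(diffs, key=abs)

def hcross(cross_values):
    """HCross for one source word: the entropy of the Cross values observed
    across the alternative translations of the segment."""
    counts = Counter(cross_values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical example: Cross values [1, 1, 2, 3] observed for the same source
# word across four alternative translations give hcross([1, 1, 2, 3]) = 1.5.
```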

3 Experimental material

As a basis for our investigation in this paper we use the multiLing subset of the TPR-DB (Carl et al., 2016). The multiLing set consists of six short English source texts (together 849 words, 40 ST segments) and a large number of alternative translations into Danish (da), Spanish (es), German (de), Hindi (hi), Chinese (zh) and Japanese (jp), each by several translators. It currently contains more than 1500 text production sessions, for from-scratch translation (T), post-editing (P), monolingual editing (E), translation dictation (D) and text copying (C). However, in this study we only make use of from-scratch translation and post-editing, which amounts to approximately half the data, 124 hours of production time. For each text production session, keystroke and gaze data were collected and stored. A real-time gaze-to-word mapping tool (Carl, 2012) was used to map the gaze samples onto the words, so that it is known which word was gazed at at any time during the translation sessions. The tool also computes which keystroke contributes to the production (or modification) of which word. The STs and TTs were manually aligned using the YAWAT tool (Germann, 2008). Aligners were advised to align each segment as compositionally and completely as possible. The aligned data were further post-processed into a set of summary tables, which integrate and describe the data of the translation process and the translation product by means of currently more than 300 features (Carl et al., 2016).

SText : #Seg       1:6    2:7    3:5    4:5    5:10   6:7    STtok  STseg
ST tokens          160    154    146    110    139    139    848    40

Task  Study   TL   Alt(1) Alt(2) Alt(3) Alt(4) Alt(5) Alt(6)  TTtok   TTseg  Dur
P     BML12   es   10     12     10     12     8      12      10216   431    5.22
P     ENJA15  ja   13     12     14     12     13     12      14447   519    16.81
P     MS12    zh   3      5      3      3      3      2       2561    129    3.18
P     NJ12    hi   7      12     8      10     12     11      9365    409    18.2
P     SG12    de   8      7      7      8      7      8       6470    305    8.78
T     BML12   es   11     10     8      10     12     8       9938    411    10.34
T     ENJA15  ja   12     13     12     13     13     13      14134   525    22.46
T     KTHJ08  da   24     23     22     0      0      0       10667   523    7.7
T     MS12    zh   0      3      3      3      3      3       1916    89     4.12
T     NJ12    hi   7      7      5      7      6      6       5783    266    14.84
T     SG12    de   6      8      8      8      7      8       6777    305    12.46
Total              101    112    100    86     84     83      92274   3912   124

Table 1: Subset of the TPR-DB multiLing corpus with the post-editing (P) and from-scratch translation (T) data. The table shows for each of the six English source texts the number of segments and the number of words, as well as the total number of ST segments (STseg: 40) and words (STtok: 848). It also shows, for each target language and translation mode, the number of alternative translations (Alt) per source text, the total number of target text tokens (TTtok), segments (TTseg) and duration (Dur).

Table 1 shows some figures of the multiLing corpus. The length in segments and in words of each of the six STs is given in the first two rows of Table 1. For each task (P and T), study and target language, the table indicates the number of alternative translations (Alt) per source text, the total number of target tokens (TTtok) and target segments (TTseg), and the total production duration in hours (Dur). The data is freely available. For more information on this dataset, please consult the CRITT website.1

4 Translation states

We extend the work of Schaeffer et al. (2016), who introduce Activity Units as a means to segment the stream of translation (and post-editing) activity into distinct units. Similar to Carl et al. (2016) they make a distinction between 6 different basic types of activities:2

• type 1: ST reading
• type 2: TT reading
• type 4: translation typing (no gaze data recorded)
• type 5: ST reading and typing (touch typing)
• type 6: TT reading and typing (translation monitoring)
• type 8: no gaze or typing activity recorded for more than 2.5 seconds

In this study we simplify the 6 types of activity units into four translation states. We collapse activity units 4, 5 and 6 into writing activities (W), irrespective of whether reading activities are also recorded at the same time. This leaves us with the following four translation states:

                 Post-editing                               From-scratch translation
        #OBS    %Dur   S2    T2    W2    P2        #OBS    %Dur   S2    T2    W2    P2
S1      15695   26     0.00  0.81  0.16  0.02      17756   29     0.03  0.52  0.42  0.03
T1      19275   40     0.56  0.01  0.41  0.03      17417   19     0.42  0.00  0.54  0.03
W1      13092   27     0.35  0.44  0.14  0.07      26187   44     0.36  0.28  0.30  0.05
P1      1723    8      0.19  0.28  0.53  0.00      2303    8      0.18  0.21  0.60  0.00
Total   49785   38.76 hours                        63663   42.44 hours

Table 2: Distribution of translation states in number of total observations (#OBS) and duration (%Dur), as well as a transition matrix for post-editing and from-scratch translation. The data represents translation states during the drafting phase of the data from Table 1

• S: ST reading (with no concurrent writing activity)
• T: TT reading (with no concurrent writing activity)
• W: Writing (with or without concurrent gaze activity on the source or target window)
• P: Pausing (no activity recorded for more than 2.5 seconds)

Each of the translation states (i.e. activity units) can be described by a number of features (excluding P, which has only a duration), including the number of keystrokes (deletions and insertions), the word(s) produced by the keystrokes, the number and duration of fixations, and the fixation scanpath (i.e. sequence of fixations) within a state, including the number of different words fixated, their average distance, etc. (cf. Schaeffer et al. (2016)).
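A minimal sketch of this collapsing step, assuming the activity-unit type codes listed above (the mapping table and function name are illustrative):

```python
# Mapping from the six activity-unit type codes to the four translation states.
UNIT_TO_STATE = {
    1: "S",  # ST reading
    2: "T",  # TT reading
    4: "W",  # typing with no gaze data recorded
    5: "W",  # ST reading while typing
    6: "W",  # TT reading while typing
    8: "P",  # no gaze or typing activity for more than 2.5 seconds
}

def to_states(unit_type_sequence):
    """Collapse a sequence of activity-unit type codes into translation states."""
    return [UNIT_TO_STATE[u] for u in unit_type_sequence]
```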

1 sites.google.com/site/centretranslationinnovation
2 The Activity Unit of type 7, as suggested in Carl et al. (2016), which entails concurrent type 1, 2 and 4 behaviour, is not assumed here. Instead the activities were split into the six types above.

4.1 State transitions in translation and post-editing

The data in Table 2 shows the distribution of translation states for the multiLing data introduced in Table 1. The total dataset was segmented into 49,785 and 63,663 activity units for the post-editing and translation experiments respectively. The data represented in Table 2 only accounts for the activities during the drafting phase. This amounts to 38.76 hours of post-editing and 42.44 hours of translating. The column #OBS shows the number of observations per translation state, while the %Dur column gives their percentage of the total production duration. In the post-editing mode, most activities (19,275 units) were observed in the TT reading (T1) state, both with respect to the number of units and with respect to their duration. In the translation mode, translators spent 44% of the total time in writing activities.

Figure 2: A fully connected translation process transition network with four states.

The columns S2, T2, W2 and P2 provide the likelihood of the next state to which post-editors or translators will switch.3 For instance, if a post-editor is involved in an ST reading state (S1), there is a high chance of 81% that he or she will next switch to TT reading (T2). Once in the T1 state, the highest probability (56%) is to switch back to ST reading (S2). This is different in the translation mode, where the translator will most likely turn to writing (W2) after a T1 activity. Table 2 thus provides a transition matrix which can be represented in the form of a completely connected transition network, as shown in Figure 2. Each state in the network in Figure 2 is connected to each other state in the network, and the transitions from one state to its successor states are weighted by probabilities, such that the weights of all outgoing arcs sum to 1.0. Two possible instantiations of the transition network are shown in Table 2, which produce slightly different behavior for post-editing and for from-scratch translation.
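Transition matrices such as those in Table 2 can be estimated from the observed state sequences with count-based (maximum-likelihood) estimation; the sketch below assumes one state sequence per session, which is our reading of how such a table is derived.

```python
import numpy as np

STATES = ["S", "T", "W", "P"]

def transition_matrix(state_sequences):
    """Estimate first-order Markov transition probabilities over the four
    translation states from observed state sequences (one per session).
    Rows are the current state (S1, T1, W1, P1), columns the next state
    (S2, T2, W2, P2), as in Table 2."""
    idx = {s: i for i, s in enumerate(STATES)}
    counts = np.zeros((len(STATES), len(STATES)))
    for seq in state_sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[idx[current], idx[nxt]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1.0, row_sums)
```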

4.2 Novice and expert translators

Hvelplund (2016) reports that novice and expert translators exhibit different behavior with respect to the length and the sequencing of translation activities. His study is restricted to the English-to-Danish data collection gathered in the KTHJ08 study in Table 2. According to Hvelplund, experienced translators shift more often from ST reading (S1) directly to writing (W2) than student translators, in 65.5% and 52.2% of the cases respectively. Student

3 There are also transitions in the diagonal e.g. W1 → W2 which result from the fact that we have collapsed activities of type 4,5 and 6 into one state. We will ignore them here, since we are not concerned with these transitions in this paper.

translators show more occurrences of TT reading than professionals, which suggests that students aim more often at confirming meaning hypotheses, rather than allocating the cognitive resources directly to writing once a meaning hypothesis has been established. Hvelplund also finds a higher variability in the unit duration of professional translators as compared to student translators. Hvelplund sees this as an indicator of a greater ability of the professional group to adapt to the situation. While Hvelplund investigates the impact of the level of translation expertise on the activity transition probabilities of S1 → W2, we will show below in section 5 that the inner structure of the preceding units (i.e. S1) themselves seems to determine to some extent the transition to the next state.

4.3 Post-editing styles

Based on a taxonomy that overlaps to some extent with our six activity units, Mesa-Lao (2013) suggests six post-editing steps and develops a minimal model of post-editing which spells out four translation styles. His first two post-editing styles are:

• style1: The post-editor first reads the TT segment, detects an MT error, reads the ST segment, and fixes the MT error.

• style2: The post-editor first reads the ST segment, then the TT segment, detects an MT error, and fixes it.

Translation style3 in Mesa-Lao's taxonomy is a variation of style2 (omitting ST reading), and in style4 the post-editor first reviews a previous segment before fixing the MT error. Post-editing style1 seems to be the most preferred among his participants. However, in order to simulate Mesa-Lao's translation styles based on the available data that we have (keystrokes and fixations) and the four translation states, we cannot know when a translator actually detects an MT error. Skipping the step "detect an MT error" thus leaves us with two post-editing patterns that we can map onto sequences of the translation states: style1: T → S → W and style2: S → T → W. In the following section we reduce these two patterns even further and examine the minimal translation cycles T → W and S → W, which represent the question: what happens before typing?

5 Determinants of writing probability

In this section we analyze where and for how long the gaze was observed prior to writing. We also test to what extent the HCross value (i.e. the possibility for syntactic choice) of the typed text has an effect on S1 and T1 reading duration prior to typing W2. The analysis tells us something about the processes which take place in the cognitive system between the input (S1 and T1 reading) and the output (W2 writing activity). For all the analyses in the present study, R (R Development Core Team, 2014) and the lme4 (Bates et al., 2014) and languageR (Baayen, 2013) packages were used to fit generalized linear mixed-effects models. To test for significance, the R package lmerTest (Kuznetsova et al., 2014) was used. Two separate models (one for post-editing and one for from-scratch translation) with reading duration and HCross as predictors and their interaction with reading type were tested. Both models had participant and target language as random factors.

5.1 The effect of reading duration on writing probability

Increased S1 reading duration during post-editing (Figure 3a, left) and from-scratch translation (Figure 3b, left) decreases the probability of successive writing (TypingProb). Increased S1 reading duration thus increases the chances that post-editors and translators engage in successive T2 reading, instead of writing (W2). The reason might be due to ST comprehension

(a) Post-editing    (b) Translation

Figure 3: The effect of S1 and T1 reading duration (Dur 1) on the probability (TypingProb) that participants engage in successive writing activity W2. The gray shadow represents the standard error.

or translation difficulties, which require longer S1 reading times, for both post-editing and from-scratch translation. The more information is processed during ST reading (long S1 reading), the stronger is the need to first cross-check the emerging translation hypothesis with the existing TT before typing in the translation solution, possibly due to working memory limitations. We thus more likely see a transition S1 → T2 for longer S1 reading times. That is, a translation hypothesis gathered during S1 reading needs to be integrated with the existing TT before writing a solution. If the ST information intake is long (long S1 reading), memory of the status of the TT might first need to be refreshed through (renewed) TT reading in order for the new solution to be properly integrated. Accordingly, the 16% and 42% of S1 → W2 transitions in post-editing and from-scratch translation respectively (see Table 2) take place mainly when S1 reading durations are short (< 5000 ms, see section 5.3).

Longer T1 reading activities increase the probability of successive writing (W2) for post-editing (Figure 3a, right) but decrease the probability of successive writing for from-scratch translation (Figure 3b, right). This difference in TT reading patterns might be due to the fundamental difference between post-editing and from-scratch translation. In post-editing a TT already exists and some modifications can be made without consultation of the ST. W2 activities after longer T1 reading times during post-editing might relate to the correction of (relatively minor) fluency errors which can be corrected without consultation of the ST. In from-scratch translation, information from the ST needs to be retrieved and integrated with the existing translation in order to continue producing the emerging TT. The longer from-scratch translators read the TT, the more likely they will need to retrieve new information from the ST in order to continue translation production.

There were highly significant main effects for reading type, for post-editing (β=4.02, SE=0.06, t=62.54, p < .001) and for from-scratch translation (β=2.11, SE=0.04, t=54.65, p < .001), such that writing (W2) was more likely after TT reading (T1). There were also highly significant main effects for reading duration, for post-editing (β=-0.91, SE=0.045, t=-20.38, p < .001) and for from-scratch translation (β=-0.68, SE=0.02, t=-33.25, p < .001), such that longer reading activities (Dur 1) made writing less likely. The interaction between reading type (S1/T1) and reading duration (Dur 1) was highly significant for post-editing (β=1.07, SE=0.05, t=22.65, p < .001) and for from-scratch translation (β=0.56, SE=0.03, t=21.41, p < .001).
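For readers who prefer Python over R, a simplified version of this analysis can be sketched with statsmodels as below. Unlike the generalized linear mixed-effects models used in the paper, this sketch omits the random factors (participant and target language), and all column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_typing_model(df: pd.DataFrame):
    """Logistic regression relating reading duration, reading type and HCross
    to the probability that the next state is a writing event. Assumed columns:
    typing_next (0/1), dur (reading-unit duration), rtype ("S" or "T"),
    hcross (HCross of the produced translation)."""
    model = smf.logit("typing_next ~ dur * rtype + hcross * rtype", data=df)
    return model.fit()
```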

(a) Post-editing    (b) Translation

Figure 4: The effect of HCross on the probability that participants type immediately after the reading activity (TypingProb), depending on whether the source (S) or the target text (T) is read prior to the writing event.

5.2 The effect of HCross on writing probability

As discussed in section 2, HCross represents the possibility for the translation of a word or phrase to occur in different syntactic positions in the target text segment. HCross is highly correlated with our measure of cross-lingual semantic similarity: HTra and HCross correlate to a high degree (r=.79, p < .001). The more likely it is that different word orders are realized (high syntactic complexity), the more likely it is that different lexical items are used (high semantic complexity). For both post-editing (Figure 4a) and from-scratch translation (Figure 4b), HCross had a positive effect on the probability that writing follows ST reading. Thus, higher HCross values increase the probability of a S1 → W2 transition. This effect was more pronounced for from-scratch translation than for post-editing. However, for both post-editing and from-scratch translation, HCross had a negative effect on the probability that writing follows TT reading. Again, this effect was more pronounced for from-scratch translation. An explanation of this observation might be that items with higher HCross values can be seen as particularly challenging to translate and that solutions for difficult translations emerge during ST reading. The more complex the translation is, i.e. the less semantically and syntactically similar (the less literal), the more likely both post-editors and translators are to refer back to the ST and the less likely they are to type a translation solution immediately after reading the TT. That is, the 41% and 54% of T1 → W2 transitions in post-editing and from-scratch translation respectively (see Table 2) take place preferably when HCross values are low (the translation is easy). The solutions to more complex translation problems are preferably typed in after S1 reading. There were highly significant main effects for HCross, for post-editing (β=0.23, SE=0.02, t=9.73, p < .001) and for translation (β=0.41, SE=0.017, t=24.15, p < .001), such that writing (W2) was more likely for higher HCross values. The interaction between reading type (S1/T1) and HCross was highly significant for post-editing (β=-1.34, SE=0.03, t=-44.28, p < .001) and for translation (β=-1.12, SE=0.02, t=-49.84, p < .001).

(a) Interaction between HCross * Dur 1    (b) Distribution of S1 duration (Dur 1)

Figure 5: Interaction between the duration (Dur 1) of an S1 event and the complexity (HCross) of successive translation production W2 on the probability of a S1 → W2 transition. The typing probability increases with short S1 reading times and high HCross values. Typing probability decreases with long S1 reading times and high translation complexity (HCross).

5.3 Interaction of S1 reading duration and HCross value on W2 probability

As discussed in the previous sections, the S1 → W2 typing probability during translation depends (among other factors) on the:

• expertise of the translator (section 4.2)

• S1 reading duration (section 5.1)

• HCross value of the W2 event (section 5.2)

Figure 5a shows the interaction effect between S1 reading duration (Dur 1) and the complexity of the translation (HCross) that follows the reading event. In line with the findings discussed in Figures 3a and 3b (left), it shows that short ST reading activities (< 5000 ms) are followed with high probability by typing events. As shown in Figures 4a and 4b (left), the typing probability is even higher if the produced translation solution is more complex. This suggests that complex translations are preferably produced immediately after a short ST consultation, presumably to relieve working memory by flushing out intermediate and possibly incomplete translation solutions that are later to be revised, and thus to avoid building up and keeping more complex structures in mind. In contrast, less complex translation problems may still be integrated with more information gathered during successive TT reading before a typing event occurs. This trend is reversed for longer ST reading durations, where the typing probability decreases as the translation problem becomes more complex. This suggests that long S1 reading durations in combination with complex translation problems require additional T2 reading, and presumably additional ST-TT integration cycles. In combination, these observations suggest that difficult translation problems are cross-checked and resolved after reading the ST, while simple translation problems may be rectified after TT reading.

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 153 6 General Discussion According to Dillinger (2014, xi), a key ability for post-editors (and translators) is their ability to compare sentences (and texts) across languages, in terms of both literal meaning and the culturally determined patterns of inference and connotation that different phrasings will entail. Patterns of keystrokes and gaze behavior make it possible to trace the origin of problems trans- lators face to establish equivalence across languages. We have shown that bigrams of translation states, i.e., reading the ST or the TT and writing, constitute minimal and coherent problem iden- tification and solution cycles. The degree of complexity (i.e. syntactic choice) clearly predicts subsequent activities, both during translation and post-editing. Remarkable in this regard is the fact that the effect of word-order choices in the target language (HCross) is similar in both tasks, suggesting that post-editors engage in processes which are not unlike those during from-scratch translation, when the raw MT output is faulty. Both post-editors and translators refer back to the source text when the produced TT is semantically and/or syntactically complex or non-literal. We hope that these minimal and coherent problem identification and solution cycles will con- stitute the building blocks for a more fully fledged model of both post-editing and from-scratch translation.

References

Baayen, R. H. (2013). languageR: Data sets and functions with "Analyzing Linguistic Data: A Practical Introduction to Statistics". Technical report.

Bates, D. M., Maechler, M., Bolker, B., and Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4.

Carl, M. (2012). Translog-II: a Program for Recording User Activity Data for Empirical Translation Process Research. Paper presented at The Eighth International Conference on Language Resources and Evaluation.

Carl, M., Schaeffer, M. J., and Bangalore, S. (2016). The CRITT Translation Process Research Database. In Carl, M., Bangalore, S., and Schaeffer, M., editors, New Directions in Empirical Translation Process Research: Exploring the CRITT TPR-DB, New Frontiers in Translation Studies, pages 13–54. Springer International Publishing, Cham, Heidelberg, New York, Dordrecht, London.

Dillinger, M. (2014). Introduction. In O'Brien, S., Winther Balling, L., Carl, M., Simard, M., and Specia, L., editors, Post-editing of Machine Translation: Processes and Applications, pages IX–XV. Cambridge Scholars Publishing, Newcastle upon Tyne.

Germann, U. (2008). Yawat: Yet Another Word Alignment Tool. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session, HLT-Demonstrations '08, pages 20–23, Stroudsburg, PA, USA. Association for Computational Linguistics.

Hvelplund, K. T. (2016). Cognitive efficiency in translation, pages 149–170. John Benjamins.

Kuznetsova, A., Christensen, R. H. B., and Brockhoff, P. B. (2014). lmerTest: Tests for Random and Fixed Effects for Linear Mixed Effect Models (lmer Objects of lme4 Package). R package version 2.0-6.

Läubli, S. and Germann, U. (2016). Statistical Modelling and Automatic Tagging of Human Translation Processes. In Carl, M., Bangalore, S., and Schaeffer, M. J., editors, New Directions in Empirical Translation Process Research: Exploring the CRITT TPR-DB, pages 155–181.

Mesa-Lao, B. (2013). Eye-tracking Post-editing Behaviour in an Interactive Translation Prediction Environment, volume 6, page 541. Lund University.

R Development Core Team (2014). R: A language and environment for statistical computing.

Schaeffer, M. J., Carl, M., and Lacruz, I. (2016). Measuring Cognitive Translation Effort with Activity Units. Baltic Journal of Modern Computing, 4(2):331–345.

Fine-Tuning for Neural Machine Translation with Limited Degradation across In- and Out-of-Domain Data

Praveen Dakwale [email protected] Informatics Institute, University of Amsterdam
Christof Monz [email protected] Informatics Institute, University of Amsterdam

Abstract

Neural machine translation (NMT) is a recently proposed approach which has shown results competitive with traditional MT approaches. Similar to other neural network based methods, NMT also suffers from low performance for domains with little available training data. Domain adaptation deals with improving the performance of a model trained on large general domain data over test instances from a new domain. Fine-tuning is a fast and simple domain adaptation method which has demonstrated substantial improvements for various neural network based tasks, including NMT. However, it suffers from drastic performance degradation on the general or source domain test sentences, which is undesirable in real-time applications. To address this problem of drastic degradation, in this paper, we propose two simple modifications to the fine-tuning approach, namely multi-objective learning and multi-output learning, which are based on the "knowledge distillation" framework. Experiments on English-German translation demonstrate that our approaches achieve results comparable to simple fine-tuning on the target domain task with comparatively little loss on the general domain task.

1 Introduction

Standard neural MT (Bahdanau et al., 2015) is an end-to-end neural network which allows for easy training of the NMT system without the need to separately train large phrase tables and n-gram language models. However, neural methods, including NMT, are known to be data-hungry and do not generalize well for rare events. As a result, NMT systems trained only on a specific domain with little available parallel data quickly overfit and do not perform better than, or even comparably to, standard statistical MT approaches (Zoph et al., 2016). Research in domain adaptation deals with the problem of improving the performance of a model trained on general domain data over test instances from a new domain. A simple solution to achieve better performance on both domains would be to train the network on the combined in-domain and general domain parallel data. However, as already discussed in (Freitag and Al-Onaizan, 2016), there are two problems with this approach. First, re-training the model from scratch on the combined data is time-consuming; second, due to the relatively small size of the in-domain data, the learned model will be biased towards the general domain and may not perform comparably on the in-domain test instances. Moreover, this solution is not efficient in real-life applications, because in situations where a general domain model is already deployed in an application, the original training data may not be available at production time.

A fast and efficient method for domain adaptation of neural methods is "fine-tuning", which has also recently been applied to neural machine translation (Freitag and Al-Onaizan, 2016; Chu et al., 2017). In fine-tuning, a neural network which is already trained on large general domain data is further trained on the small data available for the new target domain (also called in-domain data). Fine-tuning provides significant improvements as compared to both in-domain-only training and out-of-domain-only (general domain) training. This is because pre-initialization with the parameters trained on large data prevents overfitting on the small in-domain data, while at the same time training on the new data transforms the parameter space towards the events observed in the new data. However, as a result of this transformation to the new parameter space, the performance of the resulting model decreases drastically for test instances from the general domain, as there is no guidance for the general domain events while fine-tuning (Li and Hoiem, 2016). In a real-time application such a degradation of translation performance on either of the tasks is undesirable, because even after adaptation one would like to use the model for translating sentences from both domains.

To address this problem of drastic degradation by fine-tuning, in this paper, we propose two simple modifications to the fine-tuning approach. Both approaches are based on the "knowledge distillation" framework of (Hinton et al., 2014), where a smaller "student" network learns to mimic a large "teacher" network by minimizing the loss between the output distributions of the two networks. We are motivated by the idea that new tasks can be learned without degrading performance on old tasks by preserving the parameters shared between the two tasks and fine-tuning the task-specific parameters with respect to the supervision from the corresponding labels or distributions (Li and Hoiem, 2016). Our first modification is a simple multi-objective learning scheme which involves fine-tuning a general domain model on small in-domain data while at the same time minimizing the loss between the output distributions of the "student" network (the model learned by fine-tuning) and the baseline teacher network (the model trained on large general domain data). In our second modification, we propose adding multiple output layers to the "student" network corresponding to the different tasks (domains) and learning task-specific parameters for both domains using only the in-domain data, while simply fine-tuning the parameters shared between the two tasks. Our experiments demonstrate that both proposed approaches, multi-objective fine-tuning and multi-output fine-tuning, achieve translation performance comparable to vanilla fine-tuning on the target domain task and at the same time suffer little degradation on the original general domain task.

In Section 2 we discuss related work on domain adaptation for traditional statistical MT as well as recent approaches in neural MT. We briefly introduce the neural MT architecture of (Bahdanau et al., 2015) and the knowledge distillation framework of (Hinton et al., 2014) in Section 3. We introduce and explain our approaches in Section 4 and finally discuss experiments and results in Sections 5 and 6, respectively.

2 Related work

Domain adaptation for phrase-based MT has been studied extensively in the last few years. The main approaches to domain adaptation involve either data selection (Moore and Lewis, 2010; Axelrod et al., 2011), instance weighting (Matsoukas et al., 2009; Foster et al., 2010) or multiple model interpolation (Nakov, 2008; Bisazza et al., 2011; Sennrich, 2011). Model interpolation is the most effective, but also a complex, approach, as it requires training multiple models, including phrase tables, re-ordering models and language models, for each of the domains and then learning the corresponding interpolation weights. In the case of SMT, Wang et al. (2012) have goals similar to ours, i.e., to adapt an SMT system to a newer domain while at the same time preserving the original generic performance. Their approach pre-classifies

the incoming sentence and accordingly selects the corresponding model weights. For domain adaptation in neural machine translation, most of the recent research has focused mainly on "fine-tuning" over additional in-domain data. This is due to the fact that the end-to-end architecture of NMT enables fast and efficient training without the complexity of re-training individual components for each domain. The first attempt to apply fine-tuning to NMT was Luong and Manning (2015), who simply adapted an NMT system trained on large general domain data by further training it on a small in-domain data set. Recently, (Freitag and Al-Onaizan, 2016) extended the fine-tuning approach by using an ensemble of the baseline general domain model and the fine-tuned model. They proposed that the ensemble provides better translation on the new domain as well as retaining much of the performance on the general domain. Their experiments using TED talks (Cettolo et al., 2012) as in-domain data demonstrated improvements on the in-domain test set and a smaller performance drop on the general domain. However, as we will discuss in Section 6, our experiments demonstrate that even with an ensemble, the performance of the fine-tuned model still degrades significantly on the general domain task, especially for a target domain such as medical documents, for which the vocabulary, style and lexical translation probabilities differ strongly from the source domain of news articles. On the other hand, our approach of using the "baseline supervision" while fine-tuning not only retains general domain performance better than the ensemble for two different target domains, but also provides a faster and more efficient decoding procedure by avoiding the need to compute the two output distributions separately, as required in the ensemble approach. Another recent approach to NMT domain adaptation is the "mixed fine-tuning" of Chu et al. (2017). They proposed to fine-tune the baseline model on a parallel corpus which is a mix of the in-domain and general domain corpora instead of only in-domain data. They also perform multi-domain NMT by augmenting all the corpora with domain tags. However, as they point out, "mixed fine-tuning" takes longer training time than vanilla fine-tuning, depending on the size of the "mixed" data. It also requires robust data selection heuristics to extract only relevant sentences from the general domain and avoid selecting noise. In contrast, our approach completely removes any dependence on the general domain data while fine-tuning. Similarly, Sennrich et al. (2016) adapt to a new domain by back-translating the target in-domain sentences, adding the new data to the training corpus and fine-tuning on the extended bitext. A related line of research to our approach is in computer vision, where supervision from the baseline model based on the knowledge distillation framework has been used extensively in various domain adaptation paradigms such as "deep domain transfer" (Tzeng et al., 2015), "knowledge adaptation" (Ruder et al., 2017) and "learning without forgetting" (Li and Hoiem, 2016).
The "Learning without forgetting" approach of Li and Hoiem (2016) is very similar to ours, since they propose to reduce the general domain performance degradation by adding multiple nodes to the output layer of a convolutional neural network and fine-tuning the parameters on the in-domain data using supervision from the baseline model. However, we differ from (Li and Hoiem, 2016) in that we not only apply this method to neural machine translation, but also experiment with training on multiple objectives as well as with multiple output layers, instead of simply adding new nodes corresponding to the new classes.

3 Background

3.1 Neural Machine Translation

We employ an NMT system based on Luong et al. (2015) which is a simple encoder-decoder network. The encoder is a multi-layer recurrent network (we use LSTMs) which converts an

input sentence x = (x_1, x_2, ..., x_n) into a sequence of hidden states (h_1, h_2, ..., h_n):

$$h_i = f_{\mathrm{enc}}(x_i, h_{i-1}) \qquad (1)$$

Here, f_enc is an LSTM unit. The decoder is another multi-layer recurrent network, which predicts a target sequence y = (y_1, y_2, ..., y_m). Each word in the sequence is predicted based on the last target word y_{j-1}, the current hidden state of the decoder s_j and the context vector c_j. The context vector c_j, in turn, is calculated using the attention mechanism of Luong et al. (2015) as a weighted sum of the encoder hidden states h_i:

$$c_j = \sum_{i=1}^{n} \alpha_{ji} h_i \qquad (2)$$

where α_{ji} are attention weights corresponding to each encoder hidden state h_i, calculated as follows:

$$\alpha_{ji} = \frac{\exp(a(s_{j-1}, h_i))}{\sum_{k=1}^{n} \exp(a(s_{j-1}, h_k))} \qquad (3)$$

Here s_j is the decoder hidden state, generated by LSTM units similar to those of the encoder:

$$s_j = f_{\mathrm{dec}}(s_{j-1}, y_{j-1}, c_j) \qquad (4)$$

Given the target hidden state and the context vector, a simple concatenation combines the information from both vectors into an attentional hidden state s̃_t:

$$\tilde{s}_t = \tanh(W_c [c_t; h_t]) \qquad (5)$$

This attentional vector s̃_t is then projected to the output vocabulary size using a linear transformation and passed through a softmax layer to produce the output probability of each word in the target vocabulary:

$$p(y_j \mid y_1, \ldots, y_{j-1}, x) = \mathrm{softmax}(W_s \tilde{s}_t) \qquad (6)$$

The probability of the sentence is modelled as the product of the probabilities of the target words:

$$p(y) = \prod_{j=1}^{m} p(y_j \mid y_1, \ldots, y_{j-1}, x) \qquad (7)$$

The end-to-end network is trained by maximizing log-likelihood over the training data. The log-likelihood loss is defined as

$$L_{\mathrm{NLL}}(\theta) = -\sum_{j=1}^{n} \sum_{k=1}^{|V|} \mathbb{1}(y_j = k) \cdot \log p(y_j = k \mid x; \theta) \qquad (8)$$

where y_j is the output distribution generated by the network at each timestep and k is the true class label, i.e., the reference target word at each timestep, which is selected from a fixed vocabulary V. The outer summation indicates that the total loss is computed as the sum over the complete target sequence.
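For concreteness, the sketch below condenses Equations 2–8 into a single decoding step plus a sequence-level loss. It is a minimal, simplified illustration in PyTorch-style Python, not the authors' Torch7 implementation: the names (attention_step, enc_states, W_c, W_s) are illustrative, a dot product is assumed for the alignment score a(·,·), and batching, LSTM updates and bias terms are omitted.

```python
import torch
import torch.nn.functional as F

def attention_step(enc_states, dec_state, W_c, W_s):
    """One decoding timestep of Eqs. (2)-(6).
    enc_states: (n, d) encoder hidden states h_1..h_n
    dec_state:  (d,)   current decoder hidden state s_j
    W_c: (d, 2d) concatenation layer, W_s: (|V|, d) output projection."""
    # Eq. (3): attention weights from (dot-product) alignment scores a(s_j, h_i)
    scores = enc_states @ dec_state                  # (n,)
    alpha = F.softmax(scores, dim=0)
    # Eq. (2): context vector c_j as the weighted sum of encoder states
    c_j = alpha @ enc_states                         # (d,)
    # Eq. (5): attentional hidden state s~_t
    s_tilde = torch.tanh(W_c @ torch.cat([c_j, dec_state]))
    # Eq. (6): log-probabilities over the target vocabulary
    return F.log_softmax(W_s @ s_tilde, dim=0)

def sequence_nll(stepwise_log_probs, reference_ids):
    """Eq. (8): negative log-likelihood summed over the target sequence."""
    return -sum(lp[k] for lp, k in zip(stepwise_log_probs, reference_ids))
```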

3.2 Knowledge Distillation

Knowledge distillation is a framework proposed by Hinton et al. (2014) for training compressed "student" networks using supervision from a large teacher network. Assuming we have a teacher network with large dimension sizes, trained on a large amount of data, a smaller student network with much smaller dimension sizes can be trained to perform comparably to, or even better than, the teacher by learning to mimic the output distributions of the teacher network on the same data. This is usually done by minimizing the cross-entropy or KL-divergence loss between the two distributions. Formally, if we have a teacher network trained on the same data with a learned distribution q(y|x; θ_T), the student network (with model parameters θ) can be trained by minimizing the following loss:

$$L_{\mathrm{KD}}(\theta, \theta_T) = -\sum_{k=1}^{|V|} \mathrm{KL}\!\left( q(y \mid x; \theta_T) \,\|\, p(y \mid x; \theta) \right) \qquad (9)$$

where θ_T are the parameters of the teacher network. Commonly, this loss is interpolated with the log-likelihood loss, which is calculated with respect to the target labels of the in-domain data:

$$L(\theta, \theta_T) = (1 - \lambda) L_{\mathrm{NLL}}(\theta) + \lambda L_{\mathrm{KD}}(\theta, \theta_T) \qquad (10)$$

In order to allow the student network to encode the similarities among the output classes, (Hinton et al., 2014) suggest generating a smoother distribution, called "soft targets", by increasing the temperature of the softmax of both the teacher and the student network. Note that knowledge distillation for model compression has been applied to neural machine translation by (Kim and Rush, 2016). In contrast, we use the supervision from the teacher network for domain adaptation while fine-tuning.
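A minimal sketch of the interpolated objective in Equations 9–10, including the softmax temperature used for soft targets, is given below. It assumes PyTorch-style per-timestep logits; the names and the default values of the interpolation weight and temperature are illustrative only.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, reference, lam=0.5, tau=1.0):
    """student_logits, teacher_logits: (N, |V|); reference: (N,) target word ids."""
    # Hard-label negative log-likelihood, L_NLL (Eq. 8)
    nll = F.cross_entropy(student_logits, reference)
    # Soft-target loss, L_KD (Eq. 9): KL divergence between "softened" distributions
    student_log_p = F.log_softmax(student_logits / tau, dim=-1)
    teacher_p = F.softmax(teacher_logits / tau, dim=-1)
    kd = F.kl_div(student_log_p, teacher_p, reduction="batchmean")
    # Eq. (10): interpolation of the two objectives
    return (1 - lam) * nll + lam * kd
```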

4 Domain adaptation with baseline supervision

As described in Section 1, in fine-tuning a model pre-trained on general domain data is further trained on the in-domain data, which largely shifts the parameter space of the model to the target domain, resulting in performance degradation on general domain test instances. We propose that fine-tuning on the target domain with an objective similar to the one defined in Equation 10, i.e., using supervision from both the general and the target domain, can avoid the performance degradation on the general domain while achieving performance comparable to vanilla fine-tuning on the target domain. We explore this idea by proposing two modifications to the fine-tuning approach.

4.1 Multi-objective fine-tuning (MCL)

In the first modification, we train the network on a joint objective which includes the supervision provided by the hard target labels in the in-domain data as well as the output distribution of the general domain model on the in-domain data. We believe that with such a joint objective, the network can learn a parameter space common to both domains. Assuming that we have trained an NMT model on large general domain data, we first record the output distribution of this model on all the in-domain data, then fine-tune this baseline model on the in-domain data by minimizing the log-likelihood loss between the target references and the output distribution of the network. At the same time, for each sentence, we also minimize the KL-divergence (or cross-entropy) loss between the recorded teacher distribution and the distribution produced by the student network, as shown in Figure 1. Let the general domain data on which the baseline model is trained be represented by x_out and the in-domain


Figure 1: Multi-objective fine-tuning. The teacher and student networks have the same architecture. The student network is initialized with the parameters trained for the teacher network; while fine-tuning, the parameters of the teacher network are frozen.
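To illustrate the procedure described above and depicted in Figure 1, the following sketch fine-tunes a copy of the general-domain model on in-domain batches using the interpolated objective, with the teacher kept frozen. It reuses the hypothetical distillation_loss from the previous sketch; the model interface (model(src, tgt) returning per-timestep logits) and all names are assumptions, not the authors' code, and shapes are simplified.

```python
import copy
import torch

def multi_objective_finetune(teacher, in_domain_batches, lam=0.9, tau=2.0, lr=1.0):
    # The student starts as an exact copy of the general-domain teacher.
    student = copy.deepcopy(teacher)
    for p in teacher.parameters():
        p.requires_grad_(False)                     # teacher parameters stay frozen
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    for src, tgt, ref_ids in in_domain_batches:
        with torch.no_grad():
            teacher_logits = teacher(src, tgt)      # recorded distribution q(y | x_in; theta_T)
        student_logits = student(src, tgt)
        loss = distillation_loss(student_logits, teacher_logits, ref_ids, lam=lam, tau=tau)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student
```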

data be represented by x_in; the final learning objective then becomes:

$$L(\theta, \theta_T) = (1 - \lambda) \sum_{k=1}^{|V|} \mathbb{1}(y_j = k) \cdot \log p(y_j = k \mid x_{in}; \theta) \;+\; \lambda \left( -\sum_{k=1}^{|V|} \mathrm{KL}\!\left( q(y \mid x; \theta_T) \,\|\, p(y \mid x_{in}; \theta) \right) \right) \qquad (11)$$

where θ_T is learned over the general domain data using the standard learning objective in Equation 8. Note that, as discussed in Section 3.2, both distributions here are obtained by increasing the temperature of the softmax defined in Equation 6. This is done by dividing the input by a constant hyper-parameter τ (the softmax temperature):

$$p_{\tau}(y \mid x; \theta) = \mathrm{softmax}\!\left( \frac{W_s \tilde{s}_t}{\tau} \right) \qquad (12)$$

Note that Equation 11 clearly indicates that the proposed approach does not require the original data while fine-tuning, but only the parameters of the trained baseline model.

4.2 Multiple-output-layer fine-tuning (MLL)

Though learning on a joint objective as discussed in Section 4.1 can reduce the degradation on the general domain task to some extent, a better solution for retaining performance on the old task could be to preserve the task-specific parameters corresponding to the old task and, at the same time, only slightly transform the parameters shared across the two tasks. Therefore, as a second modification, we propose modifying the baseline model by adding parameters specific to the new task and learning the task-specific parameters with their respective learning objectives.


Figure 2: Multi-output-layer learning. The student network has two output layers; the additional layer is trained with respect to the distribution from the teacher network.

In this work, we consider only the parameters of the final output layer (W_s, as defined in Equation 6) of the NMT network as task-specific, while all the parameters corresponding to the encoder, decoder, attention mechanism and the concatenation layer in Equation 5 are considered to be shared. Let θ_s be the set of all shared parameters, and θ_o and θ_n the task-specific output-layer parameters corresponding to the old (general domain) and new (in-domain) task. Training the general domain baseline model results in the initial learning of the shared parameters θ_s and the output layer for the out-of-domain task θ_o. For training the in-domain student model, we first modify the network by adding another output layer to the standard NMT network, as shown in Figure 2. Similar to multi-objective fine-tuning, we first record the output distribution of the general domain teacher model on the in-domain data. Then the shared parameters of the student network, corresponding to the encoder-decoder and attention mechanism, are initialized with θ_s as learned by the baseline model. Similarly, the parameters of the output layer corresponding to the old-domain task are initialized with the parameters θ_o learned on the general domain task. Parameters specific to the in-domain task, θ_n, are initialized randomly. We then first train the new parameters θ_n on the ground truth with the standard log-likelihood objective defined in Equation 8. This is a simple warm-up procedure which enhances the fine-tuning performance (Li and Hoiem, 2016). During this warm-up, we freeze θ_s and θ_o. Finally, all parameters are fine-tuned by minimizing the joint objective. Let (x_n, y_n) be a sentence pair from the in-domain data, and let y_o be the recorded output of the "teacher" model for the new data, i.e.,

$$y_o = q(x_n; \theta_s, \theta_o) \qquad (13)$$

For the "student" network, let y'_o and y'_n be the output distributions from the old and the new output layer, corresponding to θ'_o and θ'_n respectively:

$$y'_o = p(x_n; \theta'_s, \theta'_o) \qquad (14)$$

$$y'_n = p(x_n; \theta'_s, \theta'_n) \qquad (15)$$

Then, similar to Equation 11, we define two objectives. The first is the standard log-likelihood loss for the shared parameters θ_s and the parameters of the new task θ_n, with respect to the target references in the in-domain data:

$$L_n = -y_n \cdot \log\!\left( p(x_n \mid \theta'_s, \theta'_n) \right) \qquad (16)$$

The second objective is the KL divergence between the output distribution of the old layer of the student network and the recorded distribution of the teacher network on the same in-domain data:

$$L_o = -\sum_{k=1}^{|V|} \mathrm{KL}\!\left( y_o \,\|\, p(y'_o \mid x_n; \theta'_s, \theta'_o) \right) \qquad (17)$$

The student network is finally trained on the joint objective defined as:

$$L_{\mathrm{combined}} = (1 - \lambda) L_n + \lambda L_o \qquad (18)$$

While decoding, the output layer corresponding to the domain label of the test sentences is used to compute the output distribution.
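The multi-output-layer setup can be summarised in code as a shared encoder-decoder with two task-specific projections, trained on the joint objective of Equation 18. The sketch below, in PyTorch-style Python, is an illustration under assumed interfaces (shared(src, tgt) returning the attentional hidden states); it is not the authors' Torch7 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiOutputNMT(nn.Module):
    def __init__(self, shared_encoder_decoder, hidden_size, vocab_size):
        super().__init__()
        self.shared = shared_encoder_decoder                  # theta_s: encoder, decoder, attention
        self.out_old = nn.Linear(hidden_size, vocab_size)     # theta_o, copied from the baseline
        self.out_new = nn.Linear(hidden_size, vocab_size)     # theta_n, randomly initialised

    def forward(self, src, tgt, domain):
        s_tilde = self.shared(src, tgt)                        # attentional hidden states s~_t
        layer = self.out_new if domain == "new" else self.out_old
        return F.log_softmax(layer(s_tilde), dim=-1)

def joint_loss(model, src, tgt, ref_ids, teacher_dist, lam=0.9):
    """Eq. (18): NLL through the new layer plus KL through the old layer
    against the recorded teacher distribution y_o."""
    log_p_new = model(src, tgt, domain="new")
    log_p_old = model(src, tgt, domain="old")
    l_n = F.nll_loss(log_p_new.view(-1, log_p_new.size(-1)), ref_ids.view(-1))   # Eq. (16)
    l_o = F.kl_div(log_p_old, teacher_dist, reduction="batchmean")               # Eq. (17)
    return (1 - lam) * l_n + lam * l_o
```

At decoding time, only the output layer matching the domain label of the test sentence needs to be applied, so no second forward pass is required.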

5 Experimental settings

5.1 NMT parameters

In all our experiments, we use an NMT system based on Luong et al. (2015), implemented using the Torch7 deep learning framework. It is a two-layer unidirectional LSTM encoder-decoder with a global attention (dot product) mechanism. Both the encoder and decoder have input embeddings and hidden layers of size 1000. As is common practice, we limit the vocabulary sizes to 60k for the source and 40k for the target side. Parameters are optimized using stochastic gradient descent. We set the initial learning rate to 1, with a decay rate of 0.5 for each epoch after the 5th epoch. Model weights are initialized uniformly within [-0.02, 0.02]. A dropout value of 0.2 is applied to each layer. The mini-batch size for the out-of-domain training is fixed to 64, while for each of the in-domain fine-tunings we use a batch size of 32. We train for a maximum of 20 epochs and decode with standard beam search with a beam size of 10. All models are trained on NVIDIA Titan-X (Pascal) devices. For the in-domain trainings, we use the same vocabulary, extracted from the baseline general domain bitext. Note that our multiple-output-layer approach would also allow the use of a different vocabulary (extracted from the new data) for the in-domain fine-tuning. We experimented with different values of the interpolation weight λ (as used in Equations 11 and 18) and the temperature τ (defined in Equation 12), and obtained optimal values of 0.9 and 2, respectively, for these hyper-parameters across all experiments.
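For reference, the settings above can be summarised as a plain configuration; the field names below are illustrative and do not correspond to flags of any particular toolkit.

```python
# Hypothetical summary of the training configuration described in Section 5.1.
NMT_CONFIG = {
    "architecture": "2-layer unidirectional LSTM encoder-decoder",
    "attention": "global, dot product",
    "embedding_size": 1000,
    "hidden_size": 1000,
    "src_vocab_size": 60000,
    "tgt_vocab_size": 40000,
    "optimizer": "SGD",
    "initial_learning_rate": 1.0,
    "lr_decay": 0.5,                  # per epoch, after the 5th epoch
    "init_uniform_range": (-0.02, 0.02),
    "dropout": 0.2,
    "batch_size_general": 64,
    "batch_size_in_domain": 32,
    "max_epochs": 20,
    "beam_size": 10,
    "lambda": 0.9,                    # interpolation weight (Eqs. 11 and 18)
    "temperature": 2,                 # softmax temperature (Eq. 12)
}
```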

5.2 Data

We conducted experiments on English-German translation. For all settings we use the WMT 2015 training set (Bojar et al., 2015) as the general domain training data. This training set contains approximately 4.2 million parallel sentences from multiple sources: Europarl, news commentary and common-crawled articles. We constrain the maximum sequence length to 80 and remove sentences longer than this, along with duplicates. This leaves a bitext of approximately 4 million sentence pairs, out of which we reserve 5000 sentences for perplexity validation and use the rest for training the general domain baseline model. We use newstest15 (the official test set for WMT 2015) as the general domain test set.
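The preprocessing described above (length filtering and duplicate removal) amounts to a simple pass over the bitext; the sketch below is a minimal illustration assuming pre-tokenised input, not the exact pipeline used in the experiments.

```python
def filter_bitext(pairs, max_len=80):
    """pairs: iterable of (src_tokens, tgt_tokens) lists.
    Drops sentence pairs longer than max_len on either side, as well as exact duplicates."""
    seen = set()
    for src, tgt in pairs:
        if len(src) > max_len or len(tgt) > max_len:
            continue
        key = (tuple(src), tuple(tgt))
        if key in seen:
            continue
        seen.add(key)
        yield src, tgt
```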

                                 Epo.   iwslt            newstest'15
Standard NMT baselines
  TEDonly                               17.95            –
  WMTonly                               19.46*           18.54*
  Combined training (WMT+TED)           22.86▲ (+3.40)   17.57▼ (-0.97)
Fine-tuning methods
  FT                               1    19.69△ (+0.23)   13.94▼ (-4.60)
                                   3    21.90▲ (+2.44)   12.68▼ (-5.86)
  FTEns                                 22.90▲ (+3.44)   16.70▼ (-1.84)
Proposed approaches
  MCL                              1    21.77▲ (+2.21)   16.95▼ (-1.59)
                                   7    22.19▲ (+2.73)   16.55▼ (-1.99)
  MLL                              1    20.26△ (+0.80)   18.24▽ (-0.30)
                                   9    21.70▲ (+2.24)   16.90▼ (-1.64)

Table 1: BLEU scores for different approaches for TED domain adaptation. iwslt = IWSLT (2011+2012+2013). * marks the baseline setting for these experiments. ▲/▼ and △/▽ indicate a statistically significant gain/drop at p < 0.01 and p < 0.05, respectively, over the baseline. TEDonly = in-domain-only training, WMTonly = general-domain-only training, FT = vanilla fine-tuning, FTEns = ensemble fine-tuning, MCL = multi-objective fine-tuning, MLL = multi-output-layer fine-tuning.

For in-domain training we consider two different domains, namely the TED-talks bitext (IWSLT 2013; Cettolo et al., 2012) (approx. 170k sentences) and the EMEA corpus (Tiedemann, 2009) (approx. 200k sentences), which is a corpus of medical guidelines. From each of these, we reserve 5000 sentences for the respective perplexity validation. For the TED talk domain, we use a combination of the official test sets of IWSLT 2011, 2012 and 2013 as the evaluation set, and for EMEA, we reserve a set of 2000 sentences (excluded from the training corpus) from the EMEA corpus as the evaluation set. Results are reported in terms of case-insensitive BLEU-4 (Papineni et al., 2002). Approximate randomization (Noreen, 1989; Riezler and Maxwell, 2005) is used to detect statistically significant differences.

6 Results

We define multiple baselines to compare the proposed approaches against. First, we obtain the BLEU scores of a model trained only on general domain data and note its performance on the general domain (source domain) test set as well as on the in-domain test sets. Similarly, for baseline comparison, we train models only on the small in-domain data for each of the target domains. We compare our approaches to vanilla fine-tuning and the ensemble method (Freitag and Al-Onaizan, 2016) in terms of BLEU improvements on the in-domain test sets and degradation on the general domain test set. Finally, we also train baseline models on the combined general and in-domain bitext and test them on both test sets. Note that this baseline acts as a ceiling for our approaches, as it can only be trained when data from both the general and the target domain are available beforehand. For domain adaptation over the TED data, Table 1 summarizes the BLEU scores for the different settings, including the highest and lowest BLEU scores for fine-tuning as well as for the proposed approaches. The performance of the general-domain-only (WMTonly) model (19.46) is higher than that of the in-domain-only (TEDonly) model (17.95) on the iwslt test set. Hence, we consider the BLEU scores of the general-domain-only model on the two test sets as our baselines.

[Line plots of BLEU per epoch (1–10) for vanilla fine-tuning, multi-criteria learning and multi-output learning; left panel: IWSLT test set (in-domain), right panel: newstest'15 (general domain).]

Figure 3: Epoch-wise BLEU (i) improvement over in-domain test set (TED) (ii) degradation over general domain (WMT)

To compare the gradual degradation of vanilla fine-tuning with the proposed approaches, we report the BLEU scores for the first epoch as well as the highest BLEU score achieved on the in-domain test set. The last column shows the corresponding BLEU scores on the general domain test set. Vanilla fine-tuning improves over the baseline by up to 2.4 BLEU in the third epoch for the target domain task. However, it drops substantially on the general domain task, by 4.6 BLEU already in the first epoch and by 5.8 BLEU in the third epoch (the best model for the target domain test set). On the other hand, for the target domain, multi-objective learning improves over the baseline by up to 2.7 BLEU, which is slightly better than vanilla fine-tuning, while at the same time the performance degradation on the source domain is only 1.99 BLEU, substantially less than that of fine-tuning. Similarly, for multi-output-layer learning, we observe improvements of up to 2.2 BLEU over the baseline on the target domain, which is almost equal to fine-tuning, and the drop in BLEU on the source domain for the 9th epoch is only 1.64. The ensemble method of (Freitag and Al-Onaizan, 2016) achieves an improvement of 1 BLEU over simple fine-tuning on the in-domain test set (iwslt), which is higher than both proposed methods, and suffers a similar degradation on the general domain (newstest'15). This is due to the fact that, although TED talks are considered a different domain, the vocabulary and style of the TED data are very similar to those of the general domain data. Also, the model trained on the combined data performs comparably to the best setting, i.e., the ensemble method, on both test sets. Figure 3 shows the epoch-wise comparison of BLEU scores on the source and target domain test sets for TED domain adaptation. On the source domain, fine-tuning performance drops rapidly starting from the first epoch, whereas the performance degradation of the two proposed approaches on the general domain is relatively slow.

In conclusion, we observe that for the TED data, our approaches show improvements comparable to fine-tuning while suffering less degradation on the source domain. However, the ensemble method of (Freitag and Al-Onaizan, 2016) also demonstrates similar results. Therefore, we repeat our experiments for domain adaptation on the EMEA corpus, whose vocabulary differs from the source domain of news articles substantially more than the TED data does. Table 2 summarizes our experiments for domain adaptation on the EMEA data. The first important observation is that the BLEU score of the EMEAonly model is substantially higher (by 6.5 BLEU

                                 Epo.   EMEA-test         newstest'15
Standard NMT baselines
  EMEAonly                              20.08*            –
  WMTonly                               13.54             18.54*
  Combined training (WMT+EMEA)          20.84△ (+0.76)    17.60▼ (-0.94)
Fine-tuning methods
  FT                               1    12.34             7.10▼ (-11.44)
                                   6    22.57▲ (+2.49)    4.50▼ (-14.04)
  FTEns                                 19.43▼ (-0.65)    13.63▼ (-4.91)
Proposed approaches
  MCL                              1    18.14▼ (-1.94)    15.50▼ (-3.04)
                                   11   22.50▲ (+2.42)    13.29▼ (-5.25)
  MLL                              1    20.06 (-0.02)     17.17▼ (-1.37)
                                   6    22.33▲ (+2.25)    14.67▼ (-3.87)

Table 2: BLEU scores for different approaches for EMEA domain adaptation. * marks the baseline setting for these experiments. EMEA-test = test set for the medical domain, EMEAonly = training only on in-domain medical data, WMTonly = training only on general domain data. Other abbreviations are the same as in Table 1.

points) than that of the WMTonly model (in contrast to the experiments on the TED data, where the WMTonly model performed better than the TEDonly model). Hence, for this set of experiments we consider the in-domain (EMEAonly) performance (20.08) as our baseline on the EMEA test set. Also, for medical domain adaptation, the model trained on the combined data performs only slightly better than the in-domain baseline on the EMEA test set, while it is comparable to the general domain model on the WMT test set. This implies that this joint model is more biased towards the general domain and hence does not perform better than fine-tuning on the in-domain test set. Vanilla fine-tuning shows an improvement of up to 2.5 BLEU (in the 6th epoch) over the in-domain-only model on the EMEA test set. However, the drop in performance on the general domain test set is drastic: 11.3 BLEU in the first epoch and 14 BLEU in the 6th epoch. Also, though the ensemble method suffers less degradation on newstest'15 (4.9 BLEU), it performs slightly below the baseline on the EMEA test set (-0.6), unlike vanilla fine-tuning. This implies that a simple ensemble is more biased towards the general domain. On the other hand, multi-objective learning performs comparably to fine-tuning on the EMEA test set (+2.4 over the baseline), while the drop in general domain performance (-5.25) is not as drastic as for vanilla fine-tuning. Similarly, for the multi-output-layer approach, while the improvement on the EMEA test set is comparable to both approaches, it shows the smallest drop in performance on newstest'15. Figure 4 shows the comparison of BLEU scores on the EMEA test set and newstest'15 for medical domain adaptation. As in the TED experiments, fine-tuning performance drops rapidly on newstest'15, whereas for the proposed methods the drop is relatively slow.

In conclusion, for the medical domain, the proposed multi-objective and multi-output-layer learning methods show improvements comparable to fine-tuning on the target domain with relatively little loss on the source domain. The ensemble method, on the other hand, is biased towards the general domain and fails to add any improvement on the target domain. Finally, we compare the average decoding time per sentence for ensemble decoding and multi-objective learning in Table 3. The decoding time for the ensemble method is twice that of

                              Average time (ms)
  Fine-tuning                 0.131
  Ensemble decoding           0.277
  Multi-objective learning    0.147

Table 3: Average decoding time (in milliseconds) on GPU devices per sentence for fine-tuning, ensemble decoding and multi-objective learning. The average sentence length of the test set used is 20.9 tokens.

[Line plots of BLEU per epoch (1–10) for vanilla fine-tuning, multi-criteria learning and multi-output learning; left panel: EMEA test set (in-domain), right panel: newstest'15 (general domain).]

Figure 4: Epoch-wise BLEU (i) improvement over in-domain test set (EMEA) (ii) degradation over general domain test-set (WMT)

fine-tuning, due to the computation of two separate models, while for multi-objective learning it is the same as that of fine-tuning.

7 Conclusion and future work

In this paper, we proposed two modifications to the well-known fine-tuning method for domain adaptation in neural machine translation, in order to retain performance on the source domain. We observe that both proposed approaches achieve performance comparable to vanilla fine-tuning on the target domain while still retaining performance on the source (general) domain. Moreover, the decoding speed of the proposed methods is the same as that of fine-tuning, whereas the ensemble method requires almost twice the decoding time of fine-tuning. In this work, we mainly focused on adapting a general domain model to a new domain. As future work, we plan to extend the two approaches to iteratively adapting a single model to multiple domains.

Acknowledgments

This research was funded in part by the Netherlands Organization for Scientific Research (NWO) under project numbers 639.022.213 and 612.001.218.

References

Axelrod, A., He, X., and Gao, J. (2011). Domain adaptation via pseudo in-domain data selection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11,

pages 355–362. Association for Computational Linguistics.

Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In ICLR.

Bisazza, A., Ruiz, N., and Federico, M. (2011). Fill-up versus interpolation methods for phrase-based SMT adaptation. In IWSLT.

Bojar, O., Chatterjee, R., Federmann, C., Haddow, B., Huck, M., Hokamp, C., Koehn, P., Logacheva, V., Monz, C., Negri, M., Post, M., Scarton, C., Specia, L., and Turchi, M. (2015). Findings of the 2015 workshop on statistical machine translation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 1–46, Lisbon, Portugal. Association for Computational Linguistics.

Cettolo, M., Girardi, C., and Federico, M. (2012). WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy.

Chu, C., Dabre, R., and Kurohashi, S. (2017). An empirical comparison of simple domain adaptation methods for neural machine translation. CoRR, abs/1701.03214.

Foster, G., Goutte, C., and Kuhn, R. (2010). Discriminative instance weighting for domain adaptation in statistical machine translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 451–459, Cambridge, MA. Association for Computational Linguistics.

Freitag, M. and Al-Onaizan, Y. (2016). Fast domain adaptation for neural machine translation. CoRR, abs/1612.06897.

Hinton, G., Vinyals, O., and Dean, J. (2014). Distilling the knowledge in a neural network. In NIPS 2014 Deep Learning Workshop.

Kim, Y. and Rush, A. M. (2016). Sequence-level knowledge distillation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas. Association for Computational Linguistics.

Li, Z. and Hoiem, D. (2016). Learning without forgetting. In ECCV.

Luong, M.-T. and Manning, C. D. (2015). Stanford neural machine translation systems for spoken language domain. In International Workshop on Spoken Language Translation.

Luong, T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. Association for Computational Linguistics.

Matsoukas, S., Rosti, A.-V. I., and Zhang, B. (2009). Discriminative corpus weight estimation for machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 708–717, Singapore. Association for Computational Linguistics.

Moore, R. C. and Lewis, W. (2010). Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort ’10. Association for Computational Linguistics.

Nakov, P. (2008). Improving English-Spanish statistical machine translation: Experiments in domain adaptation, sentence paraphrasing, tokenization, and recasing. In Proceedings of the Third Workshop on Statistical Machine Translation, StatMT '08, pages 147–150, Stroudsburg, PA, USA. Association for Computational Linguistics.

Noreen, E. W. (1989). Computer Intensive Methods for Testing Hypotheses. An Introduction. Wiley-Interscience.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

Riezler, S. and Maxwell, J. T. (2005). On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization.

Ruder, S., Ghaffari, P., and Breslin, J. G. (2017). Knowledge adaptation: Teaching to adapt. CoRR, abs/1702.02052.

Sennrich, R. (2011). Combining multi-engine machine translation and online learning through dynamic phrase tables. In 15th Conference of the European Association for Machine Translation.

Sennrich, R., Haddow, B., and Birch, A. (2016). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany. Association for Computational Linguistics.

Tiedemann, J. (2009). News from OPUS - a collection of multilingual parallel corpora with tools and interfaces. In Recent Advances in Natural Language Processing.

Tzeng, E., Hoffman, J., Darrell, T., and Saenko, K. (2015). Simultaneous deep transfer across domains and tasks. In International Conference in Computer Vision (ICCV).

Wang, W., Macherey, K., Macherey, W., Och, F., and Xu, P. (2012). Improved domain adaptation for statistical machine translation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, San Diego, California.

Zoph, B., Yuret, D., May, J., and Knight, K. (2016). Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1568–1575, Austin, Texas. Association for Computational Linguistics.

Exploiting Relative Frequencies for Data Selection

Thierry Etchegoyhen [email protected]
Andoni Azpeitia [email protected]
Eva Martínez García [email protected]
Vicomtech-IK4, Donostia / San Sebastián, Gipuzkoa, Spain

Abstract

We describe a data selection method for domain adaptation in machine translation, based on relative frequency ratios computed between in-domain and out-of-domain corpora. Our method is compared to a state-of-the-art approach based on cross-entropy differences, outperforming it significantly in terms of data sparseness reduction and BLEU scores on the models created from various data slices. This approach is also shown to either perform significantly better or provide competitive results in terms of perplexity when compared to a method designed to minimise cross-entropy. A novel method to mine unknown words in out-of-domain datasets is also presented, resulting in the best models across the board when used to weight sentences whose similarity to the primary domain is determined by relative frequency ratios. The proposed method is simple, requiring neither external resources nor complex setups, which makes it highly portable across domain adaptation scenarios.

1 Introduction

Data-driven approaches to machine translation, such as statistical machine translation (SMT) (Brown et al., 1990) or neural machine translation (NMT) (Bahdanau et al., 2014), need large volumes of quality bilingual data to be trained effectively. In most scenarios, machine translation systems trained only on the available in-domain bilingual corpora face data sparseness issues which hinder their coverage and accuracy. Identifying useful subsets of out-of-domain data through automated bilingual data selection has thus become an important method for domain adaptation. While no fully accurate method has yet been designed to identify subsets of out-of-domain data that are useful and sufficient to improve machine translation models, the main characteristics of the data being sought can safely be assumed to cover two main aspects. First, the selected out-of-domain sentences should cover lexical and syntactic gaps in the domain, in order to improve the in-domain translation models. This aspect can be controlled by measuring the amount of unknown words after incorporating the selected data and by evaluating the impact of the out-of-domain data on the quality of the resulting machine translation models via automated metrics. Secondly, the selected data should not add confusion to the models, an aspect which can be measured in terms of language model perplexity on either side of the bilingual data. A method that partially covers lexical and syntactic gaps while also adding significant subsets of data unrelated to the domain at hand would add statistical noise to the translation models, thus lowering their accuracy. These two aspects in combination render the selection task particularly difficult, as the optimal target data should be similar to the in-domain data, in order to reduce the noise added to the models, while also providing enough new material to improve the primary models.

In this paper, we explore the potential of a simple approach based on relative frequency ratios between in-domain and out-of-domain distributions. We evaluate the benefits of our approach in two domain adaptation scenarios featuring large volumes of out-of-domain data and compare it to a state-of-the-art data selection method based on bilingual cross-entropy differences. We show that our approach outperforms data selection based on cross-entropy, achieving significantly better results in terms of translation metrics, while also significantly reducing the amount of out-of-vocabulary words and matching or improving perplexity results. The remainder of this article is organised as follows: Section 2 summarises related work in bilingual data selection, in particular for domain adaptation; Section 3 describes the proposed approach based on relative frequencies; in Section 4, we present the corpora, models and results of our comparative experiments; finally, in Section 5 we draw conclusions from this work.

2 Related Work

Selecting subsets of bilingual corpora has been a popular approach to create domain-adapted and/or more compact translation systems; see (Eetemadi et al., 2015) for a recent detailed survey. For domain adaptation in particular, the main goal has been to better exploit available parallel corpora, by selecting the minimal subsets of bilingual sentence pairs that maximise the accuracy gains of machine translation systems for a specific domain, where training resources are usually scarce. Several approaches have been explored over the years for bilingual data selection. TF-IDF weighting has been used, for instance, by (Lü et al., 2007) for similar sentence identification and weighted training, and by (Eck et al., 2005), who combine it with unseen n-gram frequency scoring to create competitive SMT systems based on smaller training sets. Foster et al. (2010) first rank the out-of-domain sentence pairs according to the perplexity of the in-domain target-side language model, then retain the number of top-ranked pairs that maximises the BLEU score on a development set. They further refine the selection process by extending the weight-learning approach of Matsoukas et al. (2009), through phrase-pair weighting, feature-based measures of the usefulness of phrases and the incorporation of instance weighting into a linear combination model. Perplexity-based methods have figured prominently in work focusing on bilingual data selection. (Foster et al., 2010), for instance, use in-domain target-side perplexity to rank out-of-domain sentence pairs and select the top-ranked pairs that maximise the BLEU score on held-out sets, whereas (Mansour et al., 2011) combine language model and translation model cross-entropy scores for the task of data selection. In (Aydın and Özgür, 2014), the out-of-domain corpora are ranked according to in-domain perplexity and proper subsets of the data are selected using the vocabulary saturation technique of (Lewis and Eetemadi, 2013). One of the most popular data selection methods is that of (Axelrod et al., 2011), who extend work by (Moore and Lewis, 2010) by ranking out-of-domain sentences according to bilingual cross-entropy differences as determined by source and target in-domain and out-of-domain language models.[1] Cross-entropy differences select sentences that are both similar to the in-domain data and unlike the average out-of-domain data. Generalisations of word-based cross-entropy differences have been proposed by (Axelrod et al., 2015a) and (Axelrod et al., 2015b), improving over the standard approach by means of part-of-speech generalisation and class-based language models. Whereas most approaches attempt to design optimal similarity measures between domains, (Banerjee et al., 2012) use translation quality to guide data selection. In their approach, batches of out-of-domain data are incrementally added to an existing baseline system, evaluated in terms of translation quality on a development set, and a given batch is selected only if its inclusion improves translation quality. Also not focused on sentence similarity is work by (Daumé III and

[1] We will refer to their approach as Modified Moore-Lewis (MML), although it has received a large variety of acronyms in the literature.

Jagarlamudi, 2011), who address the lexical coverage aspect of supplementary data selection by mining unknown words via canonical correlation analysis. (Gascó et al., 2012) use approximations of in-domain probability distributions and n-gram infrequency scores to achieve significant improvements over the baselines and over random selection. In recent work, (Wong et al., 2016) report significant improvements over perplexity-based selection for Chinese-English by training recurrent neural networks to select supplementary data. Selection based on bilingual cross-entropy differences can be considered the de facto state-of-the-art approach and is standardly used as a baseline by competing approaches. In (Kirchhoff and Bilmes, 2014), the use of submodular functions for data selection obtained minor but statistically significant BLEU score gains over MML, whereas (Peris et al., 2017) achieve slight improvements in terms of BLEU scores via neural network-based classification while using less data. (Banerjee et al., 2013) also compare their data selection method, based on quality estimation, to MML and obtain slightly better BLEU scores while also using smaller amounts of data. Overall, albeit statistically significant in most reported cases, improvements over MML have been small in terms of automated translation metrics, and this method can thus still be considered a strong baseline for comparative evaluations. Although we focus our work on bilingual data selection, it is worth noting that monolingual data selection for language model adaptation has also been a fruitful approach, explored in several studies. (Mediani et al., 2014), for instance, improve over cross-entropy selection by drawing better samples of out-of-domain data and using word association as a means to add semantic similarity into the selection process. (Mansour et al., 2011) describe a filtering approach based on combined cross-entropy scores for the language and translation models, and report small but statistically significant improvements over standalone methods. Recently, (Duh et al., 2013) have explored the use of neural language models for data selection, and in particular the advantages of continuous vector spaces over n-gram-based approaches for handling unknown words in out-of-domain corpora. We leave monolingual data selection aside in what follows, although we believe our approach to be worth exploring on these grounds as well, given the results in terms of perplexity and data sparseness reduction described in Section 4.

3 Exploiting Relative Frequencies

By their very nature, perplexity-based approaches tend to favour short out-of-domain sentences that exhibit n-gram distributions close to the primary domain. Although this has been shown to be a fruitful approach in some data selection scenarios, it leaves aside the potential contribution of data that is related to the primary domain while also exhibiting different distributions. In the worst case, perplexity-based methods could select out-of-domain sentences that are already present in the in-domain pool, thus defeating the purpose of increasing model coverage with additional data. Although this is not usually the case with contrasted domains, the main expectation is that machine translation models should benefit more from additional data that cover both lexical gaps and unseen syntactic configurations. In order to test this hypothesis, we design a method that scores out-of-domain sentences according to their similarity to the domain of interest, while not biasing selection towards the in-domain n-gram distributions.

3.1 Relative Frequency Ratios

The approach we propose estimates similarity via relative frequency ratios between the in-domain and out-of-domain data. More specifically, we first compute relative frequencies for each word w in corpus c from token counts C, as in Equation 1:

$$\phi_c(w) = \frac{C(w)}{\sum_{i=1}^{|c|} C(w_i)} \qquad (1)$$

For each out-of-domain pair (s, t), where s is the set of source words and t the set of target words, the relative frequency score is then computed as the sum of the ratios of in-domain and out-of-domain relative frequencies, as in Equation 2, taking the arithmetic mean of the scores for the source and target sentences:

$$\mathrm{rfr}(s, t) = \frac{\sum_{i=1}^{|s|} \frac{\phi_d(w_i)}{\phi_o(w_i)} + \sum_{i=1}^{|t|} \frac{\phi_d(w_i)}{\phi_o(w_i)}}{2} \qquad (2)$$

In the above equation, φ_d and φ_o denote the relative frequencies computed on the in-domain and out-of-domain corpora, respectively. To reduce the impact of large differences in sentence length, scoring is applied to the sets of tokens composing each sentence. Out-of-domain words that are not represented in the in-domain corpus are ignored, as the frequency ratio would not be computable in this case. The metric thus favours sentences with the largest amount of words that are better represented in the in-domain than in the out-of-domain corpus. Additionally, it refrains from ignoring the relative distributions of frequent words, such as function words, under the assumption that all words in a given sentence are important to identify similarity as defined in terms of content, register and style. Finally, the metric remains neutral regarding words outside the in-domain vocabulary, as they do not impact the score of a given sentence, and it does not favour known n-gram distributions, as it is based solely on cumulative word frequency ratios.[2]
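A minimal sketch of the scoring in Equations 1–2 is given below; corpora are assumed to be pre-tokenised, and the function and variable names are illustrative, not part of any released implementation.

```python
from collections import Counter

def relative_freqs(corpus):
    """Eq. (1): relative frequency phi_c(w) for every word in a tokenised corpus."""
    counts = Counter(token for sentence in corpus for token in sentence)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def rfr(src_tokens, tgt_tokens, phi_d_src, phi_o_src, phi_d_tgt, phi_o_tgt):
    """Eq. (2): mean of the source- and target-side sums of frequency ratios,
    computed over the set of tokens; words absent from the in-domain data are ignored."""
    def side_score(tokens, phi_d, phi_o):
        return sum(phi_d[w] / phi_o[w] for w in set(tokens) if w in phi_d)
    return (side_score(src_tokens, phi_d_src, phi_o_src)
            + side_score(tgt_tokens, phi_d_tgt, phi_o_tgt)) / 2
```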

3.2 Mining Unknown Words

As previously mentioned, the metric described in the previous section ignores words that are not part of the in-domain vocabulary. Out-of-domain sentences that contain out-of-vocabulary (OOV) words will thus be scored according only to the words in the sentence that do pertain to the in-domain data. This might not be optimal in two respects.

First, large pools of out-of-domain corpora are typically noisy, containing, for instance, sentences in languages other than the expected ones or sequences of corrupt characters. The similarity score in this case would only be determined by the known words, typically punctuation, leaving aside the fact that sentences containing mostly OOV words are not likely to improve the translation models. Secondly, since adding corpora to the models aims at improving model coverage on both lexical and syntactic grounds, sentences that resemble the in-domain data while also providing new vocabulary, and by extension, phrases, should be favoured over those that are only based on known vocabulary.

Taking these two aspects into account, we aim to promote those sentences that exhibit a reasonable amount of out-of-vocabulary words and minimise the score of those with large amounts of OOV words. In order to do so, we complement the core metric with a weighting scheme related to the percentage of OOV words in each out-of-domain sentence. Let u be the percentage of out-of-vocabulary items in an out-of-domain sentence, given the in-domain vocabulary. The value u is first assigned a weight according to Equation 3:

W(u) = \sin(\alpha \cdot u^k) \qquad (3)

Using this sinusoidal function over the percentage of unknown words gives us the expected behaviour: sentences above a given threshold of OOV words will be scored negatively, while the higher values from this function will be obtained with a small percentage of unknown words; sentences containing only known words are assigned a value of zero.

2In additional experiments not reported here, data selection based on relative frequency ratios performed better than log-likelihood-based termhood as in (Gelbukh et al., 2010). We hypothesize that this result is due to the latter ignoring words that have a higher relative frequency in the out-of-domain corpus, as well as to its relative demotion of weaker terms.

Figure 1: Graph of the weighting function sin(5 · x^0.5) for the selected hyper-parameters (x: fraction of unknown words, from 0 to 1; y: weight, from −1 to 1).

The hyper-parameters α and k need to be set empirically, according to how aggressively one wants to mine out-of-vocabulary items. The graph of the function for hyper-parameters α=5 and k=0.5, which were the ones selected for the experiments reported here, is given in Figure 1. With the selected hyper-parameters, sentences with a percentage of unknown words around 10% will thus be promoted while amounts over 40% will be considered detrimental. The integration of this weighting scheme into the core metric is described in Equation 4.

wrfr(s, t) = \frac{\exp(W(u_s)) \cdot \sum_{i=1}^{|s|} \frac{\phi_d(w_i)}{\phi_o(w_i)} + \exp(W(u_t)) \cdot \sum_{i=1}^{|t|} \frac{\phi_d(w_i)}{\phi_o(w_i)}}{2} \qquad (4)

Since the core metric is based on relative frequency sums while the weighting scheme ranges over positive and negative values, the values of the W function are mapped to the positive space via exponentiation. Thus, sentences with no unknown words are scored only according to their relative frequency ratios, those with amounts above forty percent will receive a weight between 0 and 1, and the remainder will be favoured with weights above 1. In the next sections, we evaluate the impact of this weighting scheme on the data selection process.
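Continuing the earlier sketch, the weighting of Equations 3 and 4 might look as follows; u_src and u_tgt are the fractions of unknown words on each side, the ratio sums are the per-side sums from Equation 2, and the hyper-parameter values follow the ones selected above (α = 5, k = 0.5). Everything else is an illustrative assumption rather than the authors' code.

```python
import math

def oov_weight(u, alpha=5.0, k=0.5):
    """Sinusoidal weight over the fraction u of unknown words (Equation 3).
    With alpha=5 and k=0.5: W(0.0) = 0, W(0.1) is close to 1, and values of
    u around 0.4 and above yield negative weights."""
    return math.sin(alpha * (u ** k))

def wrfr(ratio_sum_src, ratio_sum_tgt, u_src, u_tgt, alpha=5.0, k=0.5):
    """Weighted relative frequency ratio (Equation 4): W values are mapped
    to the positive space via exponentiation, so sentences with no unknown
    words keep their plain ratio sums (exp(0) = 1), moderate OOV rates are
    promoted and high OOV rates are demoted."""
    return (math.exp(oov_weight(u_src, alpha, k)) * ratio_sum_src +
            math.exp(oov_weight(u_tgt, alpha, k)) * ratio_sum_tgt) / 2.0
```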

4 Experiments

The experiments described in this section were designed to compare data selection methods in realistic scenarios, where only a fraction of the large out-of-domain data is typically sought. The out-of-domain data, as ranked by each method, were thus sliced from one percent up to twenty percent of the data to perform the evaluations reported here. We compare the two variants of our approach to Modified Moore-Lewis as representative of the state of the art among methods that do not require sophisticated setups and are thus easily portable across domain adaptation scenarios.3

3MML data selection was performed with the XenC tool (Rousseau, 2013): https://github.com/rousseau-lium/XenC.

LANGS  CORPUS          TRAIN    DEV    TEST
EN-ES  NewsCommentary  207,137  3,003  3,000
EN-FR  EMEA            354,288  500    1,000

Table 1: In-domain corpora

LANGS  CommonCrawl  Europarl   UN         POOL
EN-ES  1,814,883    1,842,496  8,079,790  11,661,326
EN-FR  3,065,194    1,826,770  9,142,161  13,864,506

Table 2: Out-of-domain corpora

The data slices used here are similar to those employed by (Axelrod et al., 2011), who experimented with subsets corresponding to 1 time, 2 times and 4 times the size of the in-domain corpus, and by (Axelrod et al., 2012), who opted to select 10% of the out-of-domain data for all of their experiments.4

4.1 Corpora

As in-domain corpora, for English-Spanish we used the NewsCommentary datasets from the WMT news translation shared task, with newstest2012 as development set and newstest2013 as test set; for English-French we used the data from the WMT medical translation task, with EMEA as training set and the khresmoi-summary development and test sets.5 As out-of-domain corpora, we used three of the available corpora in the aforementioned WMT tasks, namely: CommonCrawl, Europarl and UN. All three corpora were pooled in a single corpus, whose data was then ranked by each method. The statistics for the corpora, after filtering sentences larger than 60 tokens and removing duplicates, are shown in Table 1 and Table 2.

This setup responds to two main goals. First, data selection is applied to a large pool of publicly available out-of-domain data composed of three different sub-domains with varying amounts of noise.6 This allows for an evaluation of the robustness of each method. Secondly, the in-domain datasets were selected to be largely different, one covering news commentary and the other medical data, while the out-of-domain data pool remains constant. This provides results in a data selection scenario where the out-of-domain datasets are not pre-selected according to their closeness to the in-domain data. It also allows for a contrastive evaluation of the benefits of the same out-of-domain data for different in-domain corpora.

4.2 Selected Data

The compared methods select different subsets of data at every slice, as shown in Table 3. This is not unexpected given their respective scoring schemes, with short pseudo in-domain sentences being targeted by MML and longer lexically similar sentences by cumulative frequency ratios. The two variants of our approach have a significant amount of common selected sentences on the EN-FR dataset, but more than half of the sentences they select are different up to the 10% slice in EN-ES. This demonstrates the marked impact of the technique we adopted to mine unknown words on the selection of sentences deemed similar by their relative frequency ratios.

4Note that the dichotomic search for optimal slices, computed by the XenC tool using perplexity scores on held-out sets, identified best points at 1% and 14% for the English-French and English-Spanish datasets, respectively. 5All datasets are available at http://www.statmt.org/wmt13/ and http://www.statmt.org/wmt14/. 6The CommonCrawl corpus contains large sections of noisy data, for instance.

LANGS  METHODS   1PCT   2PCT   5PCT   10PCT  20PCT  30PCT  40PCT  50PCT
EN-ES  MML-RFR   11.72  15.88  23.86  32.59  44.12  52.88  60.59  67.91
EN-ES  MML-WRFR  5.16   7.19   12.17  19.12  30.85  41.39  51.30  60.79
EN-ES  RFR-WRFR  44.81  43.72  45.42  49.65  57.51  64.40  70.81  77.13
EN-FR  MML-RFR   24.55  24.30  24.72  27.31  34.45  42.08  49.90  57.95
EN-FR  MML-WRFR  21.09  21.30  22.08  24.88  32.13  39.94  48.04  56.49
EN-FR  RFR-WRFR  84.20  83.66  82.59  82.59  83.70  85.04  86.39  87.81

Table 3: Percentage of common selected sentence pairs per data slice

LANGS  METHOD  1PCT   2PCT   5PCT   10PCT  20PCT  30PCT  40PCT  50PCT
EN-ES  MML     17.5   19.3   21.6   23.2   24.6   25.4   25.8   26.1
EN-ES  RFR     40.11  40.03  39.18  37.90  36.03  34.63  33.44  32.30
EN-ES  WRFR    39.41  40.04  39.85  38.86  37.07  35.52  34.08  32.70
EN-FR  MML     14.2   15.7   17.9   19.7   21.8   23.0   23.7   24.2
EN-FR  RFR     28.44  29.72  31.86  33.14  33.56  33.10  32.31  31.35
EN-FR  WRFR    29.77  31.04  33.12  34.19  34.29  33.60  32.63  31.53

Table 4: Average source sentence length per data slice

The results in Table 3 also show that the two adaptation scenarios differ markedly with respect to the distribution of similar sentences in the out-of-domain corpus. Whereas there is a significant portion of similar material in the in-domain and out-of-domain data for EN-ES, due to the presence of the Europarl and UN corpora, for EN-FR the amount of out-of-domain data related to the medical domain is sparser. This produces the expected larger selection differences between methods in the first case as compared to the second one.

As previously noted, the methods are expected to differ in terms of length, given their respective scoring schemes. The comparative data shown in Table 4 confirm this expectation, with the two methods based on relative frequency ratios selecting sentences that are on average double the length of those selected by MML in the first three slices. Length differences tend to reduce in larger slices, although the perplexity-based approach still tends to select shorter sentences overall. All three methods select data that seem related to the in-domain at first glance, as illustrated in Table 5 with examples of the type of out-of-domain sentences uniquely selected by each method in their respective 1% slices. In the next sections, we measure more precisely how related the selected data are to the in-domain in terms of capturing out-of-vocabulary words, perplexity and automated translation metrics.

4.3 Unknown Words

One of the main motivations for an approach based on cumulative frequency ratios is its selection of longer sentences similar to the in-domain, which is meant to increase the amount of unknown words that can be captured, indirectly in the case of RFR and directly in the case of WRFR. To evaluate the differences between the three methods in terms of increasing in-domain coverage, we measured the number of out-of-vocabulary items a posteriori on the test sets for each data slice. Figure 2 shows the results on the source side in EN-ES. For this language pair, the amount of OOV items under MML is more than double that of WRFR, and nearly double that of RFR, for the lower slices. At the 1% mark in EN-ES, for instance, the slices contain 2669, 1529 and 1146 unknown words when selected by MML, RFR and WRFR, respectively.

LANGS  METHOD  SENTENCES
EN-ES  MML     where are we heading ?
               trillions of dollars more are waiting in the wings .
               the implications are dire .
EN-ES  RFR     the assumption that only an enlightened minority is in a position to respect human rights and freedoms .
               greenhouse gas emissions can be cut through the use of nuclear energy , clean coal and low carbon-emitting renewable energies .
               coupled with extensive deregulation of financial markets and excess liquidity , these imbalances encouraged investors to engage in leveraged risk-taking in search of profits .
EN-ES  WRFR    during that period , their debt actually increased from $ 618 billion in 1980 to $ 3.25 trillion in 2006 .
               Mr. Snowden ( United States of America ) said that the Commission for Sustainable Development had galvanized action and helped shape the agendas of a wide range of organizations around the world .
               there has been a temptation for the West – Europe and the United States – to stress continuity and so-called stability .
EN-FR  MML     avoid contact with skin , eyes or clothing .
               the unused portion should be discarded .
               peel open the package with dry hands and place the tablet on your tongue .
EN-FR  RFR     in terms of public health , the environmental impact of the new medicinal products should be assessed .
               antiretroviral treatment can be effective only if it is administered and monitored by health professionals working in a well-functioning national health system .
               finally , it recognises the need for studies on vaccines and anti-viral medications that are independent of the pharmaceutical industry , including with regard to the monitoring of vaccination coverage .
EN-FR  WRFR    during the final process , an operator peers through a microscope at the die surfaces , polishing them carefully with a diamond abrasive tool head that is vibrated by supersonic waves .
               concentrations of petroleum contaminants in fish and crab tissue , as well as contamination of shellfish could have potentially significant adverse effects on health .
               the first three , namely glycerine , brake fluid and anti-freeze , are considered to present the most extreme incompatibility with calcium hypochlorite .

Table 5: Uniquely selected English sentences in 1% slices

Figure 2: Source out-of-vocabulary words per English-Spanish data slice (OOV counts for MML, RFR and WRFR over selected data slices from 1% to 50%).

The amounts of OOV words only start to be similar around the 50 percent mark, although WRFR still captures more unknown words in all cases. The results for the source side in EN-FR are shown in Figure 3.

Figure 3: Source out-of-vocabulary words per English-French data slice (OOV counts for MML, RFR and WRFR over selected data slices from 1% to 50%).

For this language pair, the tendencies are similar, with MML being markedly outperformed by both RFR and WRFR, and the latter being the best of all three methods in terms of selecting out-of-domain data that reduce data sparseness.

4.4 Perplexity

As seen in the previous section, the compared approaches differ significantly in the data they select, in terms of both average length and the amount of OOV items they capture. In addition to these measures, it is important to evaluate the increase or reduction in their respective statistical modelling of sequences. Table 6 indicates the perplexities, including OOV words in the computation of entropy, obtained by each method on the respective test sets after training language models on each data slice. Since MML is designed to target those sentences that have low in-domain perplexity and high out-of-domain perplexity, one could expect this method to significantly outperform methods based on relative frequency ratios, which make no attempt at minimizing perplexity.

LANG  METHOD  1PCT    2PCT    5PCT    10PCT   20PCT   30PCT   40PCT   50PCT
ES    MML     335.55  295.46  249.95  217.95  196.59  188.95  186.60  186.60
ES    RFR     281.53  252.06  224.52  210.23  201.26  198.49  197.38  197.17
ES    WRFR    257.67  232.76  211.72  202.32  197.93  197.15  196.85  196.79
FR    MML     151.90  147.55  151.63  161.38  175.67  187.42  196.67  203.95
FR    RFR     153.63  154.66  163.54  173.54  187.74  197.88  205.13  210.47
FR    WRFR    157.64  158.75  166.89  177.64  191.15  200.44  207.20  212.14

Table 6: Target language perplexity with OOV per data slice

As shown above, this is not the case, with both RFR and WRFR significantly outperforming MML up to the 10% slice in EN-ES. For EN-FR, the MML approach provides better results on all slices, but only marginally so when compared to the differences obtained in EN-ES. With both RFR and WRFR outperforming MML in terms of OOV coverage, it could be hypothesised that the competitive results in terms of perplexity are largely due to differences in the amount of captured unknown words. To evaluate this specific point, we computed perplexity using the same language models but ignoring OOV words, with the results shown in Table 7.

LANG  METHOD  1PCT    2PCT    5PCT    10PCT   20PCT   30PCT   40PCT   50PCT
ES    MML     221.38  211.69  193.56  177.30  165.77  162.42  161.72  161.72
ES    RFR     217.07  202.41  186.75  178.65  173.77  172.45  172.28  172.57
ES    WRFR    211.53  196.18  182.75  176.67  173.97  173.61  173.51  173.38
FR    MML     116.81  116.01  121.36  129.67  140.90  149.75  157.47  163.06
FR    RFR     123.81  125.72  132.94  140.82  151.35  159.15  164.47  168.46
FR    WRFR    127.86  128.82  135.70  144.10  154.29  161.20  166.20  169.95

Table 7: Target language perplexity without OOV per data slice
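For reference, perplexity with and without OOV words can be measured over a test set with the KenLM Python bindings roughly as sketched below; the ARPA file and the whitespace-tokenized test sentences are placeholders, not artefacts of the experiments above.

```python
import kenlm

def perplexities(arpa_path, test_sentences):
    """Return (ppl_with_oov, ppl_without_oov) over a list of tokenized sentences.
    full_scores() yields one (log10 prob, ngram length, is_oov) tuple per word
    plus the end-of-sentence token; OOV words receive the <unk> probability."""
    model = kenlm.Model(arpa_path)
    log_all = n_all = log_known = n_known = 0
    for sentence in test_sentences:
        for logprob, _, is_oov in model.full_scores(sentence):
            log_all += logprob
            n_all += 1
            if not is_oov:
                log_known += logprob
                n_known += 1
    return 10 ** (-log_all / n_all), 10 ** (-log_known / n_known)
```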

The tendencies observed for perplexity including all words are maintained, with RFR and WRFR obtaining the best scores on the first slices in EN-ES and MML obtaining lower perplexities across the board for EN-FR. The differences between the methods are less marked in this case, due to ignoring out-of-vocabulary items in the computation of perplexity. Overall, the two variants based on relative frequencies perform well in terms of perplexity, either outperforming the perplexity-minimising MML approach or reaching comparable results.

4.5 Extrinsic Evaluation

Finally, we performed extrinsic evaluations using SMT models trained on the in-domain and out-of-domain corpora, as there exist well-established methods to perform domain adaptation with said models. All translation models are phrase-based (Koehn et al., 2003), trained using the Moses toolkit (Koehn et al., 2007) with default hyper-parameters and phrases of maximum length 5. The phrase tables were pruned according to statistical significance (Johnson et al., 2007) and the parameters of the log-linear models were tuned with MERT (Och, 2003). All language models are of order 5, trained with the KenLM toolkit (Heafield, 2011). The individual in-domain and out-of-domain translation models were then combined by filling up the in-domain phrase table with out-of-domain phrases, with a binary feature denoting the origin of each phrase (Bisazza et al., 2011).

We did not perform additional extrinsic evaluations using neural machine translation models for this work. Although this could provide valuable additional information, domain adaptation with NMT is an ongoing research activity where current approaches have certain limitations.

MODEL    100PCT  1PCT        2PCT        5PCT       10PCT       20PCT
NEWSCOM  23.285
POOL     27.746
RAND             24.065      24.246      25.194     26.273      27.01
MML              23.637      24.121      24.999     26.101      26.878
RFR              24.547 †‡   25.102 †‡   26.0 †‡    26.563 †‡   27.166 ‡
WRFR             24.823 †‡∗  25.458 †‡∗  26.142 †‡  26.914 †‡∗  27.258 †‡

Table 8: BLEU scores per data slice for English-Spanish

MODEL  100PCT  1PCT        2PCT       5PCT      10PCT       20PCT
EMEA   27.099
POOL   37.958
RAND           31.564      31.747     33.234    34.52       37.43
MML            33.695 †    34.805 †   35.907 †  36.539 †    37.43
RFR            34.791 †‡   35.48 †‡   35.979 †  37.438 †‡∗  37.276
WRFR           35.124 †‡∗  35.325 †‡  36.298 †  36.987 †‡   37.268

Table 9: BLEU scores per data slice for English-French

One of the main methods currently employed is that of specialisation, where a network trained on generic data is subsequently extended with additional in-domain data (Crego et al., 2016). As it stands, this method requires the new data to be constrained to the vocabulary of the already trained network, which prevents a direct contribution of in-domain vocabulary. This specific issue is typically mitigated via external dictionaries along with a copy mechanism for words that are not part of the generic vocabulary, a working solution which, however, does not allow for a complete modelling of the in-domain data. Adopting a reversed approach would consist of training an in-domain model and adding the selected out-of-domain data, as is typically done in SMT domain adaptation. However, the networks would specialise towards the selected out-of-domain data in this case, which would not provide the expected domain adaptation results. A third approach would be to train several models from scratch using a combination of all the in-domain data along with each slice of selected out-of-domain data, a highly computationally expensive approach since each addition of selected data slices would require the training of an entire network.

The time-tested SMT-based approach we chose for our experiments has the advantage of not putting internal restrictions on the available vocabulary and allows for a straightforward comparison between the different data selection approaches. We thus leave additional NMT-based contrastive experiments for future work, noting that evaluating the contribution of selected portions of out-of-domain data, as determined by each one of the compared methods, on NMT models would undoubtedly provide interesting additional results.

In addition to the models trained on each slice as selected by the three compared methods, for both scenarios we trained a POOL model by combining the in-domain model with a model trained on all out-of-domain data, and used randomly sampled data to train a random baseline (RAND). The comparative results for the two domain adaptation scenarios in terms of BLEU scores (Papineni et al., 2002) are shown in Tables 8 and 9 for English-Spanish and English-French, respectively.7

7Statistical significance was measured using the paired bootstrap resampling test of (Koehn, 2004) over average BLEU scores. † indicates statistical significance, at p < 0.05, as computed between a given model and the random baseline; ‡ between RFR or WRFR and MML; and ∗ between RFR and WRFR.
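The significance test referred to in the footnote can be sketched generically as a paired bootstrap over per-sentence scores, as below; this is only an illustration of the idea, not the exact procedure used for the BLEU comparisons reported here.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resampled test sets on which system A's average score beats
    system B's; values close to 1.0 indicate a significant advantage for A."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```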

Overall, the RFR and WRFR approaches outperformed MML across the board; the only point at which no statistically significant differences were obtained between the three models is the 20% slice mark in EN-FR. Using only 1% of the data, the WRFR approach improves over MML by 1.2 BLEU points for EN-ES and by 1.4 for EN-FR. Given the usually minor improvements obtained by alternatives against MML, these results are indicative of the ability of approaches based on relative frequencies to select useful data that help reach significant improvements of machine translation models.

Note that, for EN-ES, there is no statistically significant difference between MML and the random baseline, although all methods perform better than random selection throughout in EN-FR. This difference might be attributable to the fact that MML tends to select already known material, which is more likely to be selected in this out-of-domain pool when the in-domain contains news-related data. Thus, the selected data bring less new and useful data than is the case for EN-FR, where there is a wider gap between medical in-domain data and the average data in the out-of-domain pool.8

Also interesting to note is the fact that no model performs better than the ones created with all out-of-domain data. Note that several reports of models trained on a subset of the data having outperformed the reference models trained on all data indicated such results when using larger slices than the ones reported here (see, e.g., Banerjee et al. (2012); Wong et al. (2016)); in other cases, the best results do not outperform the larger models (see, e.g., Peris et al. (2017)). Results on these grounds are also largely dependent on the volumes and nature of out-of-domain data being used; in our case, the pools are on the larger side of the reported experimental scales and contain merged data from different domains, which renders the task more difficult for any data selection method. In any case, the methods evaluated here already reach results that are close to those obtained using all the available data, while using only a fraction of the data, which is one of the main reasons to apply data selection.

5 Conclusion

We described a data selection method for domain adaptation in machine translation, based on relative frequency ratios computed between in-domain and out-of-domain corpora. Our method was compared to a state-of-the-art approach based on cross-entropy differences, outperforming it in terms of data sparseness reduction and BLEU scores on the models created from various data slices. Although not meant to minimise perplexity, our approach was shown to either perform significantly better with fewer data or provide competitive results.

A novel method to mine unknown words in out-of-domain datasets was also presented, which resulted in the best models across the board when used to weight sentences whose similarity to the primary domain was determined by relative frequency ratios. This empirical method can be applied to other scenarios as well, where the goal is to target sentences according to the desired amount of unknown words.

The proposed method is simple, requiring neither external resources nor complex setups, which makes it highly portable across domain adaptation scenarios. In future work, we will pursue improvements and comparative evaluations of the presented methods, in particular with neural machine translation models, where the comparatively larger amounts of useful data retrieved by the method we described might also contribute to increased model accuracy.

8Since MML depends on sampling the out-of-domain data in similar proportion to the size of the in-domain, different samples might give different results, especially on large datasets. Several samples could be drawn from the same out-of-domain datasets, a functionality that is provided by the XenC toolkit. However, results along these lines have not been fully explored, to the best of our knowledge, and we opted to use the MML method in its standard variant with single sampling.

References

Axelrod, A., He, X., and Gao, J. (2011). Domain adaptation via pseudo in-domain data selection. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 355–362. Association for Computational Linguistics.

Axelrod, A., He, X., Resnik, P., and Ostendorf, M. (2015a). Data selection with fewer words. pages 58–65.

Axelrod, A., Li, Q., and Lewis, W. D. (2012). Applications of data selection via cross-entropy difference for real-world statistical machine translation. In Proceedings of the International Workshop on Spoken Language Translation, pages 201–208.

Axelrod, A., Vyas, Y., Martindale, M., Carpuat, M., and Hopkins, J. (2015b). Class-based n-gram language difference models for data selection. In Proceedings of the 12th International Workshop on Spoken Language Translation.

Aydın, B. and Özgür, A. (2014). Expanding machine translation training data with an out-of-domain corpus using language modeling based vocabulary saturation. In Proceedings of the Eleventh Conference of the Association for Machine Translation in the Americas (AMTA).

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv:1409.0473.

Banerjee, P., Naskar, S. K., Roturier, J., Way, A., and van Genabith, J. (2012). Translation quality-based supplementary data selection by incremental update of translation models. In Proceedings of COLING 2012: Technical Papers, pages 149–166.

Banerjee, P., Rubino, R., Roturier, J., and van Genabith, J. (2013). Quality estimation-guided data selection for domain adaptation of smt. MT Summit XIV: proceedings of the fourteenth Machine Translation Summit, pages 101–108.

Bisazza, A., Ruiz, N., Federico, M., and Kessler, F.-F. B. (2011). Fill-up versus interpolation methods for phrase-based smt adaptation. In Proceedings of the International Workshop on Spoken Language Translation, pages 136–143.

Brown, P. F., Cocke, J., Pietra, S. A. D., Pietra, V. J. D., Jelinek, F., Lafferty, J. D., Mercer, R. L., and Roossin, P. S. (1990). A statistical approach to machine translation. Computational linguistics, 16(2):79–85.

Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., Akhanov, E., Brunelle, P., Coquard, A., Deng, Y., et al. (2016). Systran's pure neural machine translation systems. arXiv preprint arXiv:1610.05540.

Daumé III, H. and Jagarlamudi, J. (2011). Domain adaptation for machine translation by mining unseen words. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 407–412. Association for Computational Linguistics.

Duh, K., Neubig, G., Sudoh, K., and Tsukada, H. (2013). Adaptation data selection using neural language models: Experiments in machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 678–683.

Eck, M., Vogel, S., and Waibel, A. (2005). Low cost portability for statistical machine translation based on n-gram coverage. In Proceedings of MTSummit X.

Eetemadi, S., Lewis, W., Toutanova, K., and Radha, H. (2015). Survey of data-selection methods in statistical machine translation. Machine Translation, 29(3-4):189–223.

Foster, G., Goutte, C., and Kuhn, R. (2010). Discriminative instance weighting for domain adaptation in statistical machine translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 451–459, Stroudsburg, PA, USA. Association for Computational Linguistics.

Gascó, G., Rocha, M.-A., Sanchis-Trilles, G., Andrés-Ferrer, J., and Casacuberta, F. (2012). Does more data always yield better translations? In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL '12, pages 152–161, Stroudsburg, PA, USA. Association for Computational Linguistics.

Gelbukh, A., Sidorov, G., Lavin-Villa, E., and Chanona-Hernandez, L. (2010). Automatic term extraction using log-likelihood based comparison with general reference corpus. In International Conference on Application of Natural Language to Information Systems, pages 248–255. Springer.

Heafield, K. (2011). KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, WMT '11, pages 187–197, Stroudsburg, PA, USA. Association for Computational Linguistics.

Johnson, J. H., Martin, J., Foster, G., and Kuhn, R. (2007). Improving translation quality by discarding most of the phrasetable. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 967–975.

Kirchhoff, K. and Bilmes, J. (2014). Submodularity for data selection in statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.

Koehn, P. (2004). Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 388–395.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL, pages 177–180. Association for Computational Linguistics.

Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical Phrase-based Translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics- Volume 1, pages 48–54. Association for Computational Linguistics.

Lewis, W. and Eetemadi, S. (2013). Dramatically reducing training data size through vocabulary saturation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 281–291.

Lü, Y., Huang, J., and Liu, Q. (2007). Improving statistical machine translation performance by training data selection and optimization. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 343–350.

Mansour, S., Wuebker, J., and Ney, H. (2011). Combining translation and language model scoring for domain-specific data filtering. In Proceedings of the International Workshop on Spoken Language Translation, pages 222–229.

Matsoukas, S., Rosti, A.-V. I., and Zhang, B. (2009). Discriminative corpus weight estimation for machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2, EMNLP '09, pages 708–717, Stroudsburg, PA, USA. Association for Computational Linguistics.

Mediani, M., Winebarger, J., and Waibel, A. (2014). Improving in-domain data selection for small in-domain sets. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT2014).

Moore, R. C. and Lewis, W. (2010). Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, ACLShort ’10, pages 220–224, Stroudsburg, PA, USA. Association for Computational Linguistics.

Och, F. J. (2003). Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 160–167, Stroudsburg, PA, USA. Association for Computational Linguistics.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Peris, Á., Chinea-Ríos, M., and Casacuberta, F. (2017). Neural networks classifier for data selection in statistical machine translation. In Proceedings of the 20th Annual Conference of the European Association for Machine Translation.

Rousseau, A. (2013). XenC: An open-source tool for data selection in natural language processing. The Prague Bulletin of Mathematical Linguistics, 100:73–82.

Wong, D. F., Lu, Y., and Chao, L. S. (2016). Bilingual recursive neural network based data selection for statistical machine translation. Knowledge-Based Systems, 108:15–24.

Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic

Alexander Erdmann [email protected] Computational Approaches to Modeling Language (CAMeL) Lab, New York University Abu Dhabi Department of Linguistics, Ohio State University

Nizar Habash [email protected] Computational Approaches to Modeling Language (CAMeL) Lab, New York University Abu Dhabi

Dima Taji [email protected] Computational Approaches to Modeling Language (CAMeL) Lab, New York University Abu Dhabi

Houda Bouamor [email protected] Department of Computer Science, Carnegie Mellon University Qatar

Abstract

We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus. The subject has not previously received serious attention due to lack of naturally occurring parallel data; yet its importance is evidenced by dialectal Arabic's wide usage and breadth of inter-dialect variation, comparable to that of Romance languages. Our results suggest that modeling morphology and syntax significantly improves dialect-to-dialect translation, though optimizing such data-sparse models requires consideration of the linguistic differences between dialects and the nature of available data and resources. On a single-reference blind test set where untranslated input scores 6.5 BLEU and a model trained only on parallel data reaches 14.6, pivot techniques and morpho-syntactic modeling significantly improve performance to 17.5.

1 Introduction

Arabic is widely spoken and highly diglossic, with Modern Standard Arabic (MSA) representing the high register shared across the Arab World in educated circles. In contrast, the many spoken dialectal Arabic varieties (DA) are somewhat if not entirely mutually unintelligible, e.g., Moroccan and Kuwaiti. Chiang et al. (2006) compare the linguistic variation among Arabic dialects to that among Romance languages, indicating the need for machine translation (MT)

between these dialects. However, while much MT research has been devoted to translating between Romance languages (Corbí Bellot et al., 2005; Armentano-Oller et al., 2006; Koehn et al., 2009), we are aware of only one work on Arabic DA-to-DA MT (Meftouh et al., 2015). It deals mainly with Maghrebi dialects and utilizes only a small parallel corpus.1 This work focuses on the Egyptian and Levantine dialects, leveraging various available resources such as a morphological analyzer and additional monolingual and multilingual data.2 Compared to other dialects, Egyptian and Levantine's wider range of available data/resources allows us to evaluate more MT approaches using different combinations of these data/resources. Thus, in future work on DA pairs which may not have the same data/resources, we can tailor MT systems based on this paper's findings.

The main challenge in developing DA-to-DA MT systems is the lack of data. While many Romance languages are official languages with written standards, naturally occurring in parallel corpora like the European Parliament (Koehn, 2005), DA has no official status and was rarely written until the advent of social media.3 The recent release of the first parallel multi-dialectal corpora (Bouamor et al., 2014; Meftouh et al., 2015) has enabled seminal, albeit low-resource MT experiments. We present some shortcomings of these corpora and introduce an in-house, under-development corpus. Then we explore different means of leveraging external resources, e.g., Egyptian-to-English and Levantine-to-English data and an Egyptian tokenizer and morphological analyzer (Habash et al., 2012b; Maamouri et al., 2014; Pasha et al., 2014). We conduct experiments in a range of data-sparse settings and show the effect of morpho-syntactic features on the DA-to-DA MT performance. Our approach can be extended to other DA pairs and other closely related languages and dialects (Tyers et al., 2017).

2 Related Work

An increasing amount of research has been conducted on dialectal Arabic NLP; however, most dialectal MT efforts translate from DA to MSA or English. The only other DA-to-DA work we are aware of focuses on manipulating language model smoothing parameters to optimize data-sparse MT performance (Meftouh et al., 2015).

2.1 Dialectal Arabic Machine Translation

While only Meftouh et al. (2015) have evaluated DA-to-DA MT, many others have addressed MT between DA and other languages. Zbib et al. (2012) attempted to translate from Egyptian and Levantine to English and found that pivoting through better resourced MSA was not useful due to register and domain differences. MSA, the higher register, is rarely used to discuss the day-to-day matters frequently treated with DA, causing a domain mismatch. However, several approaches have since presented alternative results (Sawaf, 2010; Salloum and Habash, 2011, 2012; Sajjad et al., 2013; Durrani et al., 2014). These use rule-based or hybrid methods to identify mappings from DA to MSA before translating to a target language (usually English). Additionally, Tachicart and Bouzoubaa (2014) report results on adapting an approach designed for MSA to Moroccan translation to translate in the inverse direction (Moroccan to MSA).

2.2 Dialectal Arabic Data

Several newly developed corpora have facilitated the recent surge in dialectal NLP work.

1Maghrebi dialects are those spoken in Morocco, Algeria, Tunisia and Libya. 2Levantine covers the dialects spoken in Lebanon, Syria, Palestine and Jordan. 3Recently, two parallel Arabic translations were created for 12,000 sentences from the European Parliamentary proceedings, but both are in MSA (Habash et al., 2017).

The DARPA BOLT (Broad Operational Language Translation) project sponsored the creation of a large number of resources,4 including a sizeable data set of DA sentences paired with their English translations. This data set consists of 2.2 million words of Egyptian and 1.5 million words of Levantine which were harvested from SMS messages and online sources like weblogs before being translated.

As for monolingual corpora, Zaidan and Callison-Burch (2011)'s Arabic Online Commentary (AOC) corpus contains 52 million words of mixed MSA and DA from news articles and readers' comments. Cotterell and Callison-Burch (2014) add modest amounts of Twitter data to this corpus, though we find the domain difference harmful for language modeling and drop it in our experiments. Khalifa et al. (2016)'s GUMAR corpus contains over 100 million words of Gulf Arabic and a smattering of other dialects, all taken from internet novels, a genre of long conversational novels shared anonymously on online forums popular among female teenagers. Other monolingual DA corpora like Tunisiya (McNeil and Faiza, 2011), the Curras corpus of Palestinian-Levantine (Jarrar et al., 2014), and those corpora presented in Al-Shargi et al. (2016), focus on different dialects or are too small to be relevant to Egyptian-to-Levantine MT.

As for DA-to-DA data, Bouamor et al. (2014) present the first corpus with 2,000 7-way parallel sentences of Egyptian, Tunisian, three Levantine dialects (Syrian, Jordanian, Palestinian), MSA, and English, all translated from Egyptian sentences harvested from the web. The authors concede that many Levantine sentences seem to be influenced by the Egyptian, likely because translators were primed with Egyptian expressions they might understand, but would not produce naturally. The same concern applies to the 6,400 sentence, 6-way parallel PADIC corpus used in Meftouh et al. (2015), as all translations were derived from DA or MSA. When developing the 12,000 sentence multi-dialectal corpus used in our experiments, we avoided such priming effects by asking translators to produce translations starting from English sentences taken from the Basic Travel Expressions Corpus (BTEC) (Takezawa et al., 2002).

Other relevant resources include AVIA,5 a small but rich multi-dialect reference grammar with contextual examples, and Tharwa (Diab et al., 2014), a 4-way English, MSA, Egyptian, Levantine lexicon with rich linguistic annotation.

2.3 Pivot Machine Translation

Pivoting is an MT technique used to combat data sparsity when more source-to-pivot and pivot-to-target data is available than source-to-target parallel data (Muraki, 1987; Hajič et al., 2004; Wu and Wang, 2007; Habash and Hu, 2009). In this work, we use a specific form of pivoting: phrase pivoting. This involves aligning source-to-pivot and pivot-to-target data, extracting pairs of phrases into two phrase tables, then combining them into a single source-to-target phrase table based on shared pivot phrases (Utiyama and Isahara, 2007).

Our work is similar to El Kholy et al. (2013), who use English to translate from Persian to Arabic via phrase pivoting. They introduce connectivity strength constraints to weight learned Persian-to-Arabic phrase pairs in the table by considering how well each pair can be aligned through an English pivot phrase (discussed further in Sections 4 and 5). In follow-up work, El Kholy and Habash (2015) add morphological constraints for translating the related, morphologically rich languages Arabic and Hebrew via morphologically-poor English. These constraints help preserve fine-grained morphological distinctions like gender agreement which cannot otherwise be accurately translated via a morphologically poor pivot that does not make such distinctions, i.e., English.
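The phrase-table combination step of phrase pivoting can be sketched schematically as below, in our own simplified representation: the two tables are joined on shared pivot phrases, and the probabilities through each shared pivot are multiplied and accumulated.

```python
from collections import defaultdict

def pivot_phrase_table(src2piv, piv2tgt):
    """Combine a source->pivot and a pivot->target phrase table (each a dict
    mapping a (phrase, phrase) pair to a probability) into a source->target
    table by marginalising over shared pivot phrases."""
    by_pivot = defaultdict(list)
    for (pivot, tgt), p in piv2tgt.items():
        by_pivot[pivot].append((tgt, p))
    src2tgt = defaultdict(float)
    for (src, pivot), p_sp in src2piv.items():
        for tgt, p_pt in by_pivot[pivot]:
            src2tgt[(src, tgt)] += p_sp * p_pt
    return dict(src2tgt)
```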

4Pointers to the Linguistic Data Consortium's BOLT resources can be found here: https://www.ldc.upenn.edu/collaborations/current-projects/bolt. 5http://www.umventures.org/technologies/arabic-variant-identification-aid-avia

3 Data Preparation

All data used in our experiments comes from sources mentioned in Section 2.2. As displayed in Table 1, we split our 12,000 sentence BTEC parallel corpus into training, tuning, dev, and blind test sets, which are constant across all experiments. Also, in some experiments, we use additional monolingual and pivot data from AOC and the BOLT corpus, respectively.

Data Set    Dialect  Description  Size
BTEC-train  Egy–Lev  Parallel     8,000
BTEC-tune   Egy–Lev  Parallel     500
BTEC-dev    Egy–Lev  Parallel     1,500
BTEC-test   Egy–Lev  Parallel     2,000
BOLT-egy    Egy–Eng  Pivot        410,000
BOLT-lev    Eng–Lev  Pivot        180,000
AOC-egy     Egy      Monolingual  9,000
AOC-lev     Lev      Monolingual  5,000

Table 1: Data used in all experiments. Size reported in number of sentences.

Similar to MSA, DA is morphologically and syntactically rich, posing several challenges for MT systems. To be able to leverage morpho-syntactic features, we ran our Egyptian and Levantine data through MADAMIRA (Pasha et al., 2014), an Arabic morphological analyzer and disambiguator trained for MSA (MADAMIRA-MSA) and Egyptian (MADAMIRA-EGY). Unfortunately, the Levantine version of MADAMIRA is still under development (Eskander et al., 2016), so we use MADAMIRA-EGY to process both our Egyptian and Levantine corpora. Jarrar et al. (2014) and Khalifa et al. (2016) show that using MADAMIRA-EGY to process non-Egyptian DA data yields better results than MADAMIRA-MSA. To minimize the analyzer's bias towards Egyptian when processing Levantine data, we do not allow it to make orthographic changes. This limits the effects of misanalyzing many Levantine words, such as hAlHĎ 'this luck', which can be incorrectly Egyptianized as HÂlHĎ – the Egyptian future particle H+ together with an MSA verb ÂlHĎ 'I perceive'.6

A number of tokenization and segmentation schemes are available for Arabic (Habash, 2010). Some separate only punctuation and digits. Others, such as ATB and D3, separate different sets of clitics from the base word. Whereas D3 segments all clitics, ATB leaves attached the definite article Al. The optimal segmentation for our task is D3 (Sadat and Habash, 2006), as the aggressive tokenization mitigates data sparsity. Typically, these tokenization schemes involve orthographic rewrite rules to ensure that the base word matches its non-cliticized form to minimize sparsity (El Kholy and Habash, 2012). Such rules depend on the morphological template of the word and the clitics attached to it. For a word such as HyktbwhA 'and they will write it', the basic D3 tokenization is H+ yktbwA +hA. The extra A is added to the base word to minimize sparsity, as this is how it would appear if no suffix had been appended.7

Since we do not have ideal tools for processing (tokenizing and detokenizing) Levantine, we opt for a stricter surface-word-oriented segmentation that guarantees recovering the form by simple concatenation when detokenizing. Thus, for HyktbwhA 'and they will write it', the desired D3 segmentation is H+ yktbw +hA. This may increase data sparsity slightly, but more importantly, as mentioned previously, this limits the extent to which words can be

6Arabic transliteration is presented in the Habash-Soudi-Buckwalter scheme (Habash et al., 2007). 7In all of the work presented in this paper we apply Alif/Ya normalization (El Kholy and Habash, 2012).

MODEL             BLEU   OOV (%)  PARALLEL  MONOLINGUAL  PIVOT
NO-TRANSLATION    6.48   N/A      -         -            -
DIRECT            15.44  4.6      X         -            -
SYNTHETIC         16.75  0.8      X         X            -
PHRASE PIVOT      6.77   1.4      -         -            X
DIR+PP            17.41  0.9      X         -            X
SYNTHETIC-DIR+PP  16.81  0.8      X         X            X

Table 2: Baseline BLEU scores given different requirements: parallel, monolingual, or pivot data. Out-of-vocabulary rates are presented as percentages for each model.

misanalyzed or overly Egyptianized. To achieve this, we extend a DA morphological database with suffix and prefix segmentations, adding a wrapper on top of MADAMIRA to generate the proper segmentation for each analysis. The database extension is automatic and the segmentation is deterministic, following D3 segmentation rules. This allows us to (i) apply this extension to other dialects that follow the structure of the MADAMIRA database, and (ii) expand our application to dialects that do not have any available analyzers yet.
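A small sketch of the surface-recoverability property described above: with the stricter segmentation, detokenization is plain concatenation once the clitic markers are stripped. The helper below is our own illustration; the segmented forms are the transliterated examples from the text.

```python
def detokenize(segments):
    """Recover the surface word by concatenating the segments after stripping
    the '+' markers on prefixes and suffixes."""
    return ''.join(seg.strip('+') for seg in segments)

# Surface-oriented D3 segmentation of HyktbwhA 'and they will write it'
assert detokenize(['H+', 'yktbw', '+hA']) == 'HyktbwhA'
# The rewrite-based D3 segmentation does not concatenate back to the surface form
assert detokenize(['H+', 'yktbwA', '+hA']) != 'HyktbwhA'
```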

4 Baseline Models

We use the phrase-based statistical MT platform Moses (Koehn et al., 2007) to build multiple Egyptian-to-Levantine MT systems: one that only trains on parallel data, another that fabricates pseudo-parallel training data from additional monolingual data, and a third model utilizing pivot data through English. While neural MT has been successfully applied to MSA (Almahairi et al., 2016), we opt for statistical MT as data sparsity and other factors render neural techniques impossible for DA (Zhang et al., 2016). Luong and Manning (2015)'s English-to-Vietnamese neural MT system, for instance, leverages 10 times more parallel data than we use in our experiments, yet still fails to outperform a statistical baseline. Furthermore, their training and testing data is from a single domain with standardized spelling, i.e., limited token:type ratio, which Farajian et al. (2017) suggest should greatly facilitate neural MT performance. Given our sparsity of DA data and lack of spelling conventions, we can neither rely on homogeneous training/testing domains nor low token:type ratios and must resort to statistical MT.

We evaluate the output of our MT systems via BLEU scores (Papineni et al., 2002), comparing them to a single reference in detokenized space. NO-TRANSLATION, scoring 6.48, compares the original, unchanged Egyptian input to the Levantine reference. The results as well as data requirements are reported in Table 2.

4.1 The Direct Model

The most basic statistical system, the DIRECT model, can be extended to any dialect pair with parallel data. It is trained only on our BTEC parallel corpus, with some additional monolingual data for language modeling. This model leverages a 2.4 million token 5-gram language model trained using KenLM (Heafield, 2011), consisting of Levantine data from the AOC corpus, BOLT, and BTEC.

Following El Kholy and Habash (2015), we perform word alignment using the grow-diag-final algorithm (Och and Ney, 2003) and we restrict the maximum length of extracted phrases to 8 tokens. Our D3 tokenization is slightly more aggressive than El Kholy and Habash (2015), who use ATB, so we experimented with marginal increases in the maximum allowable phrase length but found them to have no significant effects on performance.

As shown in Table 2, this basic model greatly outperforms the NO-TRANSLATION baseline at 15.44 BLEU, but suffers from a high rate of out-of-vocabulary (OOV) words given that it is only trained on a small amount of parallel data. Furthermore, the model seems to learn noisy weights for many of the phrase pairs it extracts due to the infrequency with which they are encountered during training.

4.2 The Synthetic Model

Inspired by Schwenk and Senellart (2009), we use additional monolingual data to build a SYNTHETIC MT system. First, we use the DIRECT model to translate all of the BOLT Egyptian data to Levantine. Then we build an inverse model identical to the DIRECT model, but from Levantine to Egyptian, and use it to translate the BOLT Levantine data into Egyptian. Finally, we learn a new phrase table from our newly generated parallel corpus consisting of the original 8,000 training sentences, 410,000 BOLT Egyptian-to-generated-Levantine sentences, and 180,000 BOLT Levantine-to-generated-Egyptian sentences.

While Schwenk and Senellart (2009) implement this technique in a slightly different manner for the purpose of domain adaptation, we use it to reduce noise in the phrase table. Due to sparsity of parallel data, the DIRECT model is hard pressed to distinguish good low frequency phrase pairs from bad ones. Adding synthetic data to the model enables it to learn better alignments for low frequency phrase pairs by getting exposure to a variety of different contexts in which such phrases can occur. This system significantly improves over the DIRECT model, scoring 16.75 BLEU, representing our best solution for DA-to-DA MT that does not require pivot data.

4.3 The Phrase Pivot Model

Following El Kholy et al. (2013), we use the BOLT data to phrase pivot through English. Phrase pivoting drastically increases vocabulary coverage; however, it also produces a phrase table with many poorly connected phrase pairs as well as phrase pairs which erroneously translate morpho-syntactic features that cannot be conveyed through morphologically-poor English. The PHRASE PIVOT model addresses the poor connectivity issue by adding El Kholy et al. (2013)'s connectivity strength constraints. These identify how many Egyptian and Levantine tokens in a given Egyptian-to-Levantine-via-English phrase pair can be aligned to each other via corresponding alignments to the same English token. For example, the noisy Egyptian-to-Levantine phrase pair in Figure 1 would receive a connectivity score of 0.75 from the Egyptian side because 3 of the 4 alignments—those to 'wants', 'to', and 'go'—connect through the English pivot phrase to a Levantine token on the other side. The connectivity score from the Levantine side would be 0.6 because 3 of the 5 Levantine alignments connect all the way through. hlq does not count towards the 3 connections because while it connects to the English token 'now', no Egyptian token connects to 'now' from the other side. This example also exhibits the issue that will be addressed in Section 5, that morpho-syntactic properties are not accurately conveyed through morphologically deprived English, as ςAyz and bd connect through 'want', though ςAyz implies a masculine subject whereas the suffix of bd, hA, entails that the subject of the Levantine sentence is in fact third-person feminine.

Figure 1: Identifying the connections between an Egyptian phrase and Levantine phrase which were both independently (and noisily) mapped to the same English phrase during pivoting.

This PHRASE PIVOT model can be extended to any DA pair with pivot data, but only marginally outperforms the NO-TRANSLATION baseline at 6.77 BLEU. However, used in conjunction with the DIRECT model, the Direct + Phrase Pivot (DIR+PP) model increases OOV coverage, boosting performance to 17.41 BLEU, almost 2 full BLEU points over the DIRECT baseline. We also re-ran the SYNTHETIC model using DIR+PP to fabricate parallel data instead of DIRECT; however, this did not improve performance. It is possible that the types of DIRECT model errors which are corrected by DIR+PP versus those corrected by SYNTHETIC are similar. Thus, when training on fabricated parallel data, the SYNTHETIC-DIR+PP model may reinforce its own errors more than learn to fix them.
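A sketch of how the connectivity strength scores described above could be computed for a pivoted phrase pair is given below. Alignments are represented as sets of (source index, pivot index) and (pivot index, target index) pairs, which is our own representation choice, not the toolkit's.

```python
def connectivity_scores(src_piv_links, piv_tgt_links):
    """Connectivity strength of a pivoted phrase pair: on each side, the
    fraction of alignment links whose pivot (English) token is also aligned
    to a token on the opposite side. For the example in Figure 1 this gives
    0.75 from the Egyptian side and 0.6 from the Levantine side."""
    pivots_from_src = {p for _, p in src_piv_links}
    pivots_from_tgt = {p for p, _ in piv_tgt_links}
    connected = pivots_from_src & pivots_from_tgt
    src_score = sum(1 for _, p in src_piv_links if p in connected) / len(src_piv_links)
    tgt_score = sum(1 for p, _ in piv_tgt_links if p in connected) / len(piv_tgt_links)
    return src_score, tgt_score
```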

5 Leveraging Morphology and Syntax

The best baseline system, DIR+PP, still fails to adequately handle Arabic's rich morphology and syntax, as illustrated by Figure 2, where part-of-speech (POS) is not preserved in the output. A minimally different correct version of the example in Figure 2 would simply replace the verbal third-person singular byftH 'he opens' with the nominal form fy ftH 'in opening'.

Source: AnA ςndy mšklh̄ [[fy ftH]] AlbAb
        'I to-me problem [[in opening.N]] the-door'
Output: AnA ςndy mšklh̄ [[byftH]] AlbAb
        'I to-me problem [[with-opens.3MS]] the-door'
Reference: I'm having trouble opening the door

Figure 2: Example DIR+PP error where the output does not preserve the POS of the source.

Because Arabic verbs convey person in much finer granularity than do English verbs, which only inflect for third-person singular forms in present tense, many Arabic verb inflections in the source-to-pivot and pivot-to-target phrase tables will be aligned to the same morphologically deprived English verb, e.g., 'opening'. Thus, when the phrase tables are combined via shared English phrases, any given inflected Egyptian verb can be mapped to a large number of Levantine inflections, which, mostly, will not share the same morpho-syntactic properties. In this case, because 'opening', like many 'ing'-suffixed forms in English, can be nominal or verbal, it is not just inflectional morphology that is confused but derivational morphology, as the POS is misinterpreted.

5.1 Addressing Resource Limitations

El Kholy and Habash (2015) use AMEANA (El Kholy and Habash, 2011), an automatic error analysis tool, to determine that definiteness, gender, and number are the features that most frequently contribute to such errors in Hebrew-to-MSA MT. In this work, we were not able to use AMEANA, as it relies on accurate morphological analyses that we cannot produce automatically for Levantine. Furthermore, even if we knew which features were most problematic for translating Egyptian to Levantine, we might not be able to leverage them, as there is non-trivial noise and Egyptian bias in how the analyses were generated (Section 3). Our approach focuses instead on identifying features that: (i) MADAMIRA-EGY can recognize relatively accurately, and (ii) tend to be consistently translated from Egyptian to Levantine. For instance, second-person and third-person verbal forms are frequently orthographically ambiguous in Arabic, making person challenging for our analyzer to correctly identify. Thus,

adding a constraint to the model promoting consistent translation of the person feature value would be useless because we are not likely to know the correct property of person in the first place. Furthermore, if the possible values of a given feature can be translated freely, modeling that feature will be similarly useless. This is often the case with tense, as bšwfk 'see you' in Egyptian could conceivably be translated as rH šwfk in Levantine, changing the value from progressive (realized by the cliticized particle b) to future tense (realized by the particle rH). Without gold, morphologically annotated data in Levantine, we cannot independently measure either our ability to identify morpho-syntactic feature values correctly, or the consistency with which they should be translated. However, we can approximate both jointly. Assuming that Egyptian feature values should correspond to the same feature values on the Levantine side, we measure, for each morpho-syntactic feature, how frequently the realized property on the Egyptian side is aligned to the same property on the Levantine side. As shown in Table 3, definiteness, gender, number, and POS are the only features which map consistently across aligned tokens in more than 50% of their occurrences throughout the BTEC training set. This suggests that they both can be recognized accurately and are consistently preserved in human translation. Even so, the fact that none of these features map consistently over 80% of the time suggests that modeling such features will be noisy.

Definiteness  Number  Gender  POS  Aspect  Person
75            75      62      56   32      29

Table 3: Percentage rates over all training set token alignments at which the feature’s values were preserved from Egyptian to Levantine.
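The consistency measurement behind Table 3 can be approximated by a simple count over aligned, automatically analyzed token pairs. Below is a minimal Python sketch of that count; it is our illustration, and the feature keys and analysis format are assumptions rather than the paper's actual data structures.

```python
from collections import Counter

FEATURES = ("definiteness", "number", "gender", "pos", "aspect", "person")  # assumed keys

def preservation_rates(aligned_pairs, features=FEATURES):
    """aligned_pairs: iterable of (egy_analysis, lev_analysis) dicts, one per
    aligned token pair, each mapping feature name -> realized value.
    Returns, per feature, the percentage of pairs with identical values."""
    preserved, total = Counter(), Counter()
    for egy, lev in aligned_pairs:
        for f in features:
            total[f] += 1
            if egy.get(f) == lev.get(f):
                preserved[f] += 1
    return {f: 100.0 * preserved[f] / total[f] for f in features if total[f]}
```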

5.2 Computing Constraint Scores

Similar to El Kholy and Habash (2015), we design morpho-syntactic constraints by calculating probability distributions from Egyptian to Levantine and vice versa. These reflect how likely each morpho-syntactic property set on one side is to be aligned to each morpho-syntactic property set on the other, based on how often such alignments occurred in the training data. Property sets are defined as the conjunction of values for all morpho-syntactic features under consideration, which, for us, include the four most "consistent" features as identified in Table 3: definiteness, number, gender (which were used by El Kholy and Habash (2015)), and also POS (thus, masculine-singular-verb and definite-feminine-noun are property sets). Unaligned tokens are considered to be aligned to a null token on the opposite side, and thus are mapped to an empty property set. We use these probability distributions to add two constraint scores to every phrase pair in the phrase pivot table, one calculated from Egyptian to Levantine and the other from Levantine to Egyptian, as defined in Equations 1 and 2.

$$W_s = \frac{1}{A} \sum_{\forall (i,j) \in a} P\big(\mathrm{MLE}(i) \mid \mathrm{MLE}(j)\big) \qquad (1)$$

$$W_t = \frac{1}{B} \sum_{\forall (i,j) \in b} P\big(\mathrm{MLE}(j) \mid \mathrm{MLE}(i)\big) \qquad (2)$$

We calculate Ws by summing, over every alignment in a from source token i to target token j (if i is unaligned, j is the null token), the probability of i's property set given j's. While El Kholy and Habash (2015) normalize this sum by the number of source tokens, we normalize by the total number of alignments A. Otherwise, many-to-one and one-to-many alignments would bias the scores and enable some to exceed one, making them impossible to interpret as probabilities. The property sets of i and j are determined from the set of all possible property sets for that type (defined as the list of all unique analyses it received over every occurrence in BOLT, AOC, and the BTEC training data) via maximum likelihood estimation, MLE, so as to maximize the likelihood of the source property set given the target property set. This entails that individual tokens' property sets can be analyzed differently from the source side than from the target side. Also, sequences of MLE property sets over multiple tokens on a single side can be syntactically infeasible, e.g., containing 5 consecutive verbs. Thus, we experimented with additional constraints, requiring (i) aligned MLE property sets to have been aligned at least once in the training set, (ii) syntactic feasibility on source and target sides independently, and (iii) alignment of the sequence of property sets on the source side to that on the target side to have appeared at least once in the training set. However, none of these experiments boosted performance, suggesting that MLE inconsistency is not a problematic issue.
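As a concrete illustration, the per-phrase-pair score of Equation 1 can be computed as below. This is a minimal sketch under our own assumptions: property sets are precomputed per token via MLE, and the conditional distribution is stored as a plain dictionary.

```python
def source_constraint_score(links, prop_src, prop_tgt, p_src_given_tgt):
    """Equation (1): average over all alignment links (i, j) of
    P(property set of source token i | property set of target token j).

    links:           list of (i, j) links; unaligned tokens link to None
    prop_src/tgt:    dict from token index (or None) -> MLE property set
    p_src_given_tgt: dict from (src_prop_set, tgt_prop_set) -> probability
    """
    total = sum(p_src_given_tgt.get((prop_src[i], prop_tgt[j]), 0.0)
                for (i, j) in links)
    return total / len(links)   # normalize by the number of alignments A
```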

Wt is calculated equivalently to Ws from the target side. Adding these constraint weights to each phrase pair in the phrase pivot table, we re-tune the DIR+PP system, re-test, and obtain a statistically significant improvement with a score of 18.03 BLEU on the development set. We then evaluate the DIRECT, DIR+PP, and Direct + Phrase Pivot with Morpho-syntactic Features (DIR+PP+MORPH) systems on the 2,000 sentence blind test set from our BTEC corpus. The results in Table 4 confirm the utility of our added constraints, as each successive model significantly improves over the last, as in the development set.

Model            Dev    Dev OOV  Test   Test OOV
NO-TRANSLATION   6.48   N/A      6.45   N/A
DIRECT           15.44  4.6      14.61  5.4
DIR+PP           17.41  0.9      16.69  1.0
DIR+PP+MORPH     18.03  0.9      17.48  1.0

Table 4: Comparing BLEU scores of systems with and without morpho-syntactic features on development and blind test sets. Out-of-vocabulary rates are reported as percentages.

6 Error Analysis

We analyzed the output of the DIR+PP and DIR+PP+MORPH models on 100 development set sentences to investigate the effects of morpho-syntactic features and to identify issues for future work. Table 5 reveals a stark contrast, a gap of about 30% in both models, between the quantity of output tokens matching a reference token letter-for-letter (row 3) and the quantity of output tokens manually judged to be acceptable (row 2). Approximately 10% of that 30% gap is due to the lack of spelling standards in DA (row 4). Habash et al. (2012a) developed a Conventional Orthography for Dialectal Arabic (CODA) to standardize spelling while preprocessing DA, and a prototype system can CODAfy Egyptian (Eskander et al., 2013), though no such system is yet available for Levantine. Once developed, we expect such a system to improve MT quality not only by imposing consistent output, but also by reducing sparsity in all translation and language models during training. The remaining 90% of the 30% gap between exactly matched tokens and tokens judged

to be correct can be approximately split into thirds. One third cannot be directly linked to any reference token (row 7), e.g., tokens in paraphrasal or idiomatic constructions. Another third consists of tokens which can be linked to a reference token, but take a different root (row 5). The final third are inflectional or derivational variants of the corresponding reference token (row 6). The fact that so many inflectional/derivational variants are judged correct demonstrates that morpho-syntactic modeling is necessarily noisy, as property sets are frequently not preserved, even in acceptable translations. On the other hand, the success of DIR+PP+MORPH suggests that some features' properties tend to be preserved through translation or at least altered predictably, as otherwise the system would not benefit from modeling them.

                                    DIR+PP       DIR+PP+MORPH
 1 Words                            665          670
 2 Words Judged Correct             569 (85.6)   594 (88.7)
 3   Exact Match                    377 (56.7)   382 (57.0)
 4   CODA Variant                   23 (3.5)     25 (3.7)
 5   Different Root                 47 (7.1)     51 (7.6)
 6   Different Properties           61 (9.2)     65 (9.8)
 7   Otherwise Different            61 (9.2)     71 (10.7)
 8 Words Judged Incorrect           96 (14.4)    76 (11.3)
 9   Morpho-syntactic Properties    49 (7.4)     41 (6.1)
10   Other Problems                 47 (7.1)     35 (5.2)
11   Properties: Modeled            33 (5.0)     26 (3.9)
12     Not Modeled                  16 (2.4)     15 (2.2)
13   Other: Wrong Word Sense        18 (2.7)     17 (2.5)
14     Apparent Phrasal Issue       13 (2.0)     7 (1.0)
15     Unclear Reason               13 (2.0)     7 (1.0)
16     OOV                          2 (0.3)      3 (0.4)
17     Copies Egyptian              1 (0.2)      1 (0.1)
18 Word Error Reduction             N/A          (20.2)
19 Sentences                        100          100
20 Correct Sentences                48           55
21 Sentence Error Reduction         N/A          (13.5)

Table 5: Comprehensive manual error analysis of 100 sentences from the development set. Values within parentheses are percentages.

6.1 Direct Effects of Added Features

The second major insight of the error analysis is that the error reduction from adding morpho-syntactic constraints is far more significant (20.2%, row 18) than the improvement registered by the automatic BLEU scores. Example sentences illustrating some of these improvements are contained in Figure 3. For 7.4% of the tokens DIR+PP outputs, its only mistake is misrepresenting one or more morpho-syntactic properties (row 9). Comparing that to 6.1% for DIR+PP+MORPH (row 9), the new model makes a 17% error reduction in the area it is designed to improve. Furthermore, essentially all of this improvement takes place in sentences where DIR+PP+MORPH corrects a mistake involving a feature we model: definiteness, gender, number, or POS. For example, the definite article is correctly added to the word AlHqyqh¯ 'the truth' in Figure 3a.

(a)
ENGLISH:      Actually, inside it [...] Do you mind if I take it out?
REFERENCE:    bAlHqyqh¯ fy bAlbw [...] ςndk mAnς AzA šltw?
              with-the-truth exists in-heart-its [...] to-you problem if take-1SPast-it
DIR+PP:       Hqyqh¯* fy bAlbA [...] ςndk mAnς AðA Tlςw?*
              truth* exists in-heart-its [...] to-you problem if remove.3SPast-it*
DIR+PP+MORPH: AlHqyqh¯ hwy bqlbw [...] ςndk mAnς AðA šltw?
              the-truth it in-heart-its [...] to-you problem if take.1SPast-it?

(b)
ENGLISH:      I'll bring one right away
REFERENCE:    rH jyb wAHd hlA
              will bring.1S one now
DIR+PP:       bAljyb¯h* wAHd hlq
              with-the-pocket* one now
DIR+PP+MORPH: rH jyb wAHd hlq
              will bring.1S one now

(c)
ENGLISH:      What kind of fruit do you have?
REFERENCE:    Ay nwς fwAky ςndk?
              which kind fruit at-you
DIR+PP:       ςndk Ay nwς mn AlfAkh¯h šw?*
              at-you which kind from the-fruit what*
DIR+PP+MORPH: ςndk Ay nwς mn AlfAkh¯h?
              at-you which kind from the-fruit

Figure 3: Example translation errors (marked with *) corrected by adding morpho-syntactic constraints to the model.

6.2 Indirect Effects of Added Features

Surprisingly, most of the overall error reduction actually comes from mistakes other than misrepresentation of morpho-syntactic properties, as such mistakes decrease by 26%, from 7.1% to 5.2% (row 10). The morpho-syntactic constraints seem to teach the model about syntax at the phrase level, as sentences like Figure 3b are corrected, which were originally over-chunked into small phrases by DIR+PP. bAljyb¯h 'with the pocket' is likely a misanalysis of the source word hAjyb 'I'll get' translated as a single-word phrase, as it would have made for an infrequent or non-existent bigram or trigram when combined with the following words in the output. The DIR+PP model likely had access to longer phrase pairs such as hAjyb wAHd 'I'll get one' mapping to rH jyb wAHd (the corresponding reference phrase), but likely did not select it because longer phrases are inherently less frequent,

i.e., noisier to model. Morpho-syntactic constraints enable DIR+PP+MORPH to increasingly select longer, infrequent phrase pairs by distinguishing those that are morpho-syntactically feasible from competing shorter alternatives. Translating larger phrasal chunks leads to more fluent output by reducing opportunities to incorrectly chunk phrase boundaries. This is why much of the error reduction does not appear to be, superficially at least, related to the morpho-syntactic features targeted by DIR+PP+MORPH. Figure 3c exhibits another benefit of morpho-syntactic features, as DIR+PP+MORPH often corrects the insertion of gratuitous words, here šw 'what'. Such features teach the model that certain POSs are less likely to align to the null token, even if the language model favors the sequence with the gratuitous token monolingually.

7 Conclusion and Future Work

In this work, we presented the second ever evaluated Arabic DA-to-DA MT effort. The subject has not previously received serious attention due to the lack of naturally occurring parallel data, though DA is widely spoken and dialects are frequently mutually unintelligible, exhibiting linguistic variation comparable to that among the Romance languages. Our results suggest that modeling morphology and syntax can significantly improve DA-to-DA MT despite data sparsity. However, optimizing models under such circumstances requires careful consideration of the linguistic differences between dialects and careful tailoring and implementation of all available data and resources. Given that many DA pairs may not have pivot data available, the most pressing future work is to develop a dialect-agnostic tokenizer and analyzer which does not suffer from the Egyptian bias that ours does. This will reduce data sparsity regardless of the nature of the low-resourced MT settings for any DA pair, and it will enable better morpho-syntactic modeling. Additionally, improving on the dialect identification work of Diab et al. (2010), Zaidan and Callison-Burch (2011), and Elfardy and Diab (2013) will enable us to collect more monolingual data. This data is not only useful for language modeling but can also be mined for comparable sentences to augment the parallel training set. The process typically involves using metadata (Resnik and Smith, 2003) and seed data (Munteanu and Marcu, 2005) to identify pairs of related sentences or phrases in the source and target languages (Cettolo et al., 2010; Max et al., 2012). These are then iteratively classified via expectation maximization, with phrases identified as parallel being added to the seed data (Dong et al., 2015). Models trained in this way produce noisy phrase pairs, often imperfectly modeling morpho-syntactic property sets. Thus, the same morpho-syntactic constraints that improved DIR+PP+MORPH can be adapted to improve MT via comparable corpora.

8 Acknowledgments

This publication was made possible by grant NPRP 7-290-1-047 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors. The first author was funded by the Boren Fellowship program.

References

Al-Shargi, F., Kaplan, A., Eskander, R., Habash, N., and Rambow, O. (2016). Morphologically annotated corpora and morphological analyzers for Moroccan and Sanaani Yemeni Arabic. In 10th Language Resources and Evaluation Conference (LREC 2016).

Almahairi, A., Cho, K., Habash, N., and Courville, A. (2016). First result on Arabic neural machine translation. arXiv preprint arXiv:1606.02680.

Armentano-Oller, C., Carrasco, R. C., Corbí-Bellot, A. M., Forcada, M. L., Ginestí-Rosell, M., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Ramírez-Sánchez, G., Sánchez-Martínez, F., and Scalco, M. A. (2006). Open-source Portuguese–Spanish machine translation. In International Workshop on Computational Processing of the Portuguese Language, pages 50–59.

Bouamor, H., Habash, N., and Oflazer, K. (2014). A multidialectal parallel corpus of Arabic. In LREC, pages 1240–1245.

Cettolo, M., Federico, M., and Bertoldi, N. (2010). Mining parallel fragments from comparable texts. In IWSLT, pages 227–234.

Chiang, D., Diab, M., Habash, N., Rambow, O., and Shareef, S. (2006). Parsing Arabic Dialects. In Proceedings of EACL, Trento, Italy. EACL.

Corbí Bellot, A. M., Forcada, M. L., Ortiz Rojas, S., Pérez-Ortiz, J. A., Ramírez Sánchez, G., Sánchez-Martínez, F., Alegría Loinaz, I., Mayor Martínez, A., Sarasola Gabiola, K., et al. (2005). An open-source shallow-transfer machine translation engine for the Romance languages of Spain. Proceedings of the 10th Conference of the European Association for Machine Translation, pages 79–86.

Cotterell, R. and Callison-Burch, C. (2014). A multi-dialect, multi-genre corpus of informal written Arabic. In LREC, pages 241–245.

Diab, M., Habash, N., Rambow, O., Altantawy, M., and Benajiba, Y. (2010). Colaba: Arabic dialect annotation and processing. In LREC Workshop on Semitic Language Processing, pages 66–74.

Diab, M. T., Al-Badrashiny, M., Aminian, M., Attia, M., Elfardy, H., Habash, N., Hawwari, A., Salloum, W., Dasigi, P., and Eskander, R. (2014). Tharwa: A Large Scale Dialectal Arabic-Standard Arabic-English Lexicon. In LREC, pages 3782–3789.

Dong, M., Liu, Y., Luan, H.-B., Sun, M., Izuha, T., and Zhang, D. (2015). Iterative learning of parallel lexicons and phrases from non-parallel corpora. In IJCAI, pages 1250–1256.

Durrani, N., Al-Onaizan, Y., and Ittycheriah, A. (2014). Improving Egyptian-to-English SMT by mapping Egyptian into MSA. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 271–282.

El Kholy, A. and Habash, N. (2011). Automatic error analysis for morphologically rich languages. In MT Summit XIII.

El Kholy, A. and Habash, N. (2012). Orthographic and morphological processing for English–Arabic statistical machine translation. Machine Translation, 26(1-2):25–45.

El Kholy, A. and Habash, N. (2015). Morphological Constraints for Phrase Pivot Statistical Machine Translation. In Machine Translation Summit, Miami.

El Kholy, A., Habash, N., Leusch, G., Matusov, E., and Sawaf, H. (2013). Language independent connectivity strength features for phrase pivot statistical machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, volume 2, pages 412–418.

Elfardy, H. and Diab, M. (2013). Sentence Level Dialect Identification in Arabic. In Proceedings of the Association for Computational Linguistics, pages 456–461, Sofia, Bulgaria.

Eskander, R., Habash, N., Rambow, O., and Pasha, A. (2016). Creating resources for Dialectal Arabic from a single annotation: A case study on Egyptian and Levantine. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3455–3465, Osaka, Japan.

Eskander, R., Habash, N., Rambow, O., and Tomeh, N. (2013). Processing Spontaneous Orthography. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Atlanta, GA.

Farajian, M. A., Turchi, M., Negri, M., Bertoldi, N., and Federico, M. (2017). Neural vs. phrase-based machine translation in a multi-domain scenario. EACL 2017, page 280.

Habash, N., Diab, M., and Rambow, O. (2012a). Conventional Orthography for Dialectal Arabic. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 711–718, Istanbul, Turkey.

Habash, N., Eskander, R., and Hawwari, A. (2012b). A Morphological Analyzer for Egyptian Arabic. In Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology, pages 1–9, Montréal, Canada.

Habash, N. and Hu, J. (2009). Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 173–181, Athens, Greece.

Habash, N., Soudi, A., and Buckwalter, T. (2007). On Arabic Transliteration. In van den Bosch, A. and Soudi, A., editors, Arabic Computational Morphology: Knowledge-based and Empirical Methods. Springer.

Habash, N., Zalmout, N., Taji, D., Hoang, H., and Alzate, M. (2017). A parallel corpus for evaluating machine translation between Arabic and European languages. EACL 2017, page 235.

Habash, N. Y. (2010). Introduction to Arabic natural language processing, volume 3. Morgan & Claypool Publishers.

Hajič, J., Smrž, O., Zemánek, P., Šnaidauf, J., and Beška, E. (2004). Prague Arabic Dependency Treebank: Development in data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, pages 110–117, Cairo, Egypt.

Heafield, K. (2011). KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197.

Jarrar, M., Habash, N., Akra, D., and Zalmout, N. (2014). Building a corpus for Palestinian Arabic: a preliminary study. In Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), pages 18–27.

Khalifa, S., Habash, N., Abdulrahim, D., and Hassan, S. (2016). A large scale corpus of Gulf Arabic. arXiv preprint arXiv:1609.02960.

Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, pages 79–86.

Koehn, P., Birch, A., and Steinberger, R. (2009). 462 Machine Translation Systems for Europe. In Proceedings of MT Summit, Ottawa, Canada.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180.

Luong, M.-T. and Manning, C. D. (2015). Stanford neural machine translation systems for spoken language domains. In Proceedings of the International Workshop on Spoken Language Translation.

Maamouri, M., Bies, A., Kulick, S., Ciul, M., Habash, N., and Eskander, R. (2014). Developing an Egyptian Arabic treebank: Impact of dialectal morphology on annotation and tool development. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014).

Max, A., Bouamor, H., and Vilnat, A. (2012). Generalizing sub-sentential paraphrase acquisition across original signal type of text pairs. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 721–731.

McNeil, K. and Faiza, M. (2011). Tunisian Arabic corpus: Creating a written corpus of an unwritten language. In Workshop on Arabic (WACL).

Meftouh, K., Harrat, S., Jamoussi, S., Abbas, M., and Smaili, K. (2015). Machine translation experiments on PADIC: A parallel Arabic dialect corpus. In The 29th Pacific Asia conference on language, information and computation.

Munteanu, D. S. and Marcu, D. (2005). Improving machine translation performance by exploiting non-parallel corpora. Computational Linguistics, 31(4):477–504.

Muraki, K. (1987). PIVOT: Two-phase machine translation system. In MT Summit Manuscripts and Program, pages 81–83.

Och, F. J. and Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational linguistics, 29(1):19–51.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318.

Pasha, A., Al-Badrashiny, M., Kholy, A. E., Eskander, R., Diab, M., Habash, N., Pooleery, M., Rambow, O., and Roth, R. (2014). MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic. In Proceedings of LREC, Reykjavik, Iceland.

Resnik, P. and Smith, N. A. (2003). The web as a parallel corpus. Computational Linguistics, 29(3):349–380.

Sadat, F. and Habash, N. (2006). Combination of Arabic preprocessing schemes for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 1–8.

Sajjad, H., Darwish, K., and Belinkov, Y. (2013). Translating dialectal Arabic to English. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1–6, Sofia, Bulgaria.

Salloum, W. and Habash, N. (2011). Dialectal to standard Arabic paraphrasing to improve Arabic-English statistical machine translation. In Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties, pages 10–21.

Salloum, W. and Habash, N. (2012). Elissa: A Dialectal to Standard Arabic Machine Translation System. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Demonstration Papers, pages 385–392, Mumbai, India.

Sawaf, H. (2010). Arabic dialect handling in hybrid machine translation. In Proceedings of AMTA, Denver, Colorado.

Schwenk, H. and Senellart, J. (2009). Translation model adaptation for an Arabic/French news translation system by lightly-supervised training. In MT Summit.

Tachicart, R. and Bouzoubaa, K. (2014). A hybrid approach to translate Moroccan Arabic dialect. In Intelligent Systems: Theories and Applications (SITA-14), 2014 9th International Conference on, pages 1–5. IEEE.

Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., and Yamamoto, S. (2002). Toward a broad-coverage bilingual corpus for speech translation of travel conversations in the real world. In LREC, pages 147–152.

Tyers, F. M., Font, H. A. i., Fronteddu, G., and Martín-Mor, A. (2017). Rule-based machine translation for the Italian-Sardinian language pair. In The Prague Bulletin of Mathematical Linguistics No. 108, pages 221–232.

Utiyama, M. and Isahara, H. (2007). A comparison of pivot methods for phrase-based statistical machine translation. In HLT-NAACL, pages 484–491.

Wu, H. and Wang, H. (2007). Pivot language approach for phrase-based statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 856–863, Prague, Czech Republic.

Zaidan, O. F. and Callison-Burch, C. (2011). The Arabic online commentary dataset: an annotated dataset of informal Arabic with high dialectal content. In Proceedings of the Association for Computational Linguistics, Portland, Oregon, USA.

Zbib, R., Malchiodi, E., Devlin, J., Stallard, D., Matsoukas, S., Schwartz, R., Makhoul, J., Zaidan, O. F., and Callison-Burch, C. (2012). Machine translation of Arabic dialects. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 49–59.

Zhang, D., Kim, J., Crego, J. M., and Senellart, J. (2016). Boosting neural machine translation. CoRR, abs/1612.06138.

Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels

Gideon Maillette de Buy Wenniger1   gemdbw AT gmail.com
Khalil Sima'an2                     k.simaan AT uva.nl
Andy Way1                           andy.way AT adaptcentre.ie

1 ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland
2 Institute for Logic, Language and Computation (ILLC), Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands

Abstract

Elastic-substitution decoding (ESD), first introduced by Chiang (2010), can be important for obtaining good results when applying labels to enrich hierarchical statistical machine translation (SMT). However, an efficient implementation is essential for scalable application. We describe how to achieve this, contributing essential details that were missing in the original exposition. We compare ESD to strict matching and show its superiority for both reordering and syntactic labels. To overcome the sub-optimal performance due to the late evaluation of features marking label substitution types, we increase the diversity of the rules explored during cube pruning initialization with respect to their labels. This approach gives significant improvements over basic ESD and performs favorably compared to extending the search by increasing the cube pruning pop-limit. Finally, we look at combining multiple labels. The combination of reordering labels and target-side boundary-tags yields a significant improvement in terms of the word-order sensitive metrics Kendall reordering score and METEOR. This confirms our intuition that the combination of reordering labels and syntactic labels can yield improvements over either label by itself, despite increased sparsity.

1 Introduction

Elastic-substitution decoding (ESD), also known as soft label matching or soft-constraint decoding, is an effective method to gain maximal benefit from the use of labels to enrich hierarchical phrase-based statistical machine translation (SMT), and was first introduced by Chiang (2010). This method removes many of the disadvantages of working with labeled grammars when labels are strictly enforced. We discuss the requirements and details of an efficient implementation in the first part of this paper, to benefit other researchers who want to apply ESD. In the second part of the paper we further strengthen the empirical evidence for the success of ESD. This is done by comparing strict and soft-labeled (ESD) systems for Chinese–English translation, using four different types of labels. Next, we describe how the results of ESD can be further improved for small label sets by diversifying the search, exploring all alternatively labeled versions of each rule source-side type during cube pruning initialization instead of only the single best one. This is compared against the more crude approach of just increasing the search space by increasing the cube pruning pop-limit. Finally, we explore the effect of combining multiple labels, either the two types of reordering labels or a reordering

label with a syntactic label. All source code for both ESD and labeled grammar extraction is made publicly available with this publication.1

2 Background and Related Work

Hierarchical phrase-based SMT (or hierarchical SMT for short) (Chiang, 2005) is the hierarchical generalization of phrase-based SMT (Koehn et al., 2003). It generalizes phrase pairs into synchronous context-free grammar (SCFG) rules by adding variables to them. This yields a weighted SCFG (Aho and Ullman, 1969). The particular form of SCFGs used in this paper is called HIERO (Chiang, 2005), and allows only up to two nonterminals (variables) in the right-hand-side of rules. This gives the following four HIERO rule types:

X → ⟨α, δ⟩ (1)

X → ⟨α X1 γ, δ X1 η⟩ (2)
X → ⟨α X1 β X2 γ, δ X1 ζ X2 η⟩ (3)
X → ⟨α X1 β X2 γ, δ X2 ζ X1 η⟩ (4)

Here α, β, γ, δ, ζ, η are terminal sequences that can be empty, except for β, since HIERO prohibits rules with nonterminals that are adjacent on the source side. HIERO additionally requires all rules to have at least one pair of aligned words. These extra constraints are intended to reduce the amount of spurious ambiguity. Equation (1) corresponds to a normal phrase pair, (2) to a rule with one gap, and (3) and (4) to the monotone and inverting rules, respectively. In addition, HIERO has a special glue rule: (g1) GOAL → ⟨GOAL1 X2, GOAL1 X2⟩ as well as two special start/end rules: (g2) GOAL → ⟨GOAL1 </s>, GOAL1 </s>⟩ and (g3) GOAL → ⟨<s>, <s>⟩, with <s> and </s> being the dedicated start/end symbols. HIERO makes very strong independence assumptions, since it uses only one label "X" apart from the glue symbol GOAL, allowing any HIERO rule to substitute into any other rule. A lot of work has been done on relaxing these assumptions by labeling HIERO with labels derived from syntax (Zollmann and Venugopal, 2006; Almaghout et al., 2011), dependency information (Li et al., 2012), word classes such as POS-tags (Zollmann and Vogel, 2011), reordering information (Maillette de Buy Wenniger and Sima'an, 2014) and other types of information. However, labeling with strict matching of labels splits the rules of HIERO into many alternatively labeled variants, increasing spurious ambiguity. Venugopal et al. (2009) introduced preference grammars as a way to avoid this increase and to relax the assumptions of decoding with strict matching. Every rule is equipped with label distributions instead of single labels, for both the left- and right-hand-side rule nonterminals. Using a dynamic programming approach, these label distributions are then multiplied during decoding, to approximate the probability over the full set of alternatively labeled derivations. Unlike preference grammars, ESD does not approximate selection of the most likely unlabeled derivation. It can, however, learn to treat different substitutions such as NP → NPP differently from others such as NP → VP, which the formalism of preference grammars cannot, as it lacks a learning component. This is a clear advantage for heuristically created labels such as the syntax-augmented machine translation (SAMT) labels and others used in this paper.
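As a small illustration of the HIERO well-formedness constraints just described (at most two nonterminals, no adjacent source-side nonterminals, at least one aligned word pair), the following Python sketch, ours and purely illustrative, checks them for a candidate rule's source side:

```python
NT = "X"  # generic nonterminal marker on the source side

def is_valid_hiero_source(source_tokens, has_aligned_word_pair):
    """source_tokens: list of source-side symbols, nonterminals marked as NT.
    has_aligned_word_pair: whether the rule retains at least one aligned word pair."""
    if source_tokens.count(NT) > 2:                 # at most two nonterminals
        return False
    for left, right in zip(source_tokens, source_tokens[1:]):
        if left == NT and right == NT:              # no adjacent source nonterminals
            return False
    return has_aligned_word_pair                    # at least one aligned word pair
```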

1Source code URLs: ESD: https://github.com/gwenniger/joshua/commits/gideon/cubePruningFixForFuzzyMatching Grammar extraction: https://bitbucket.org/gwenniger/labeled-translation https://bitbucket.org/teamwildtreechase/hatparsing

The work by Chiang (2010) on ESD, the foundation of the work in this paper, is discussed next.

3 Elastic-substitution decoding

ESD was introduced by Chiang (2010), who describes it as follows: "... we allow any rule to substitute into any site, but let the model learn which substitutions are better than others." With respect to decoding it is remarked that: "The decoding algorithm then operates as in hierarchical phrase-based translation. The decoder has to store in each hypothesis the source and target root labels of the partial derivation, but these labels are used for calculating feature vectors only and not for checking well-formedness of derivations." In summary, ESD entails:

(A) Adapting the decoder to support soft-matching of labels, which means finding all matching rules while ignoring the labels.

(B) Adding label-substitution features that mark different types of substitutions: (i) matching and mismatching substitutions, and (ii) substitutions of particular types of labels to particular gaps, to enable learning which types of substitutions are preferable.

To enable computing the label-substitution features (B), the labels must be left present in the hypergraph (the packed hypotheses) computed by the decoder.
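One possible realization of the label-substitution features in (B) is sketched below. This is our sketch rather than Chiang's exact feature set, and the feature names are chosen for illustration only.

```python
def label_substitution_features(gap_labels, rule_nt_labels):
    """Sparse feature counts for one rule application.
    gap_labels:     labels of the chart gaps being substituted into
    rule_nt_labels: the rule's right-hand-side nonterminal labels, same order"""
    feats = {}
    for gap_label, rule_label in zip(gap_labels, rule_nt_labels):
        kind = "Match" if gap_label == rule_label else "NoMatch"
        feats["LabelSubst_" + kind] = feats.get("LabelSubst_" + kind, 0) + 1
        # a specific feature per substitution type, e.g. LabelSubst_NP_subst_VP
        name = "LabelSubst_{}_subst_{}".format(gap_label, rule_label)
        feats[name] = feats.get(name, 0) + 1
    return feats
```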

3.1 ESD: a naive implementation

With strict matching, the inner loop of the decoder finds all matching rules r_match for an input word span s = w_i ... w_j. For the rule right-hand-side rhs = RHS(r_match), given the ordered words w_k ∈ rhs and nonterminals nt_l ∈ rhs, w_k must match the corresponding word in s and nt_l must match the label of a corresponding chart span that was previously covered by the decoder (a so-called "rule gap"); both must be matched in accordance with the input order. Adding ESD to this process, a naive implementation explicitly matches all best alternatively labeled rule variants for an (unlabeled) source-rule type to all alternatively labeled gaps. This naive implementation is, however, computationally expensive. For rules with up to two gaps the number of source rule variants can increase quadratically with the size of the label set N, and analogously the same holds for the two substituted-to gaps of these rules. Hence this naive approach gives an increase in computational complexity of O(N^4).

3.2 How is ESD implemented efficiently?

During normal (strict) decoding, matching rules are found through lookup in a dedicated rule indexing data-structure called a trie (De La Briandais, 1959). An efficient ESD implementation requires adaptation of this trie, rather than explicitly generating all types of label matches. Figure 1 shows rule tries used by the decoder to find matching rules during decoding, for three cases: (a) HIERO, (b) labeled system with strict matching, and (c) labeled system with ESD. Note that for (a) there are no labels except the default label "X" and the glue rule label "GOAL". In (b), labels are present both in the internal nodes and also in the leaf nodes containing the complete rules. Note that the rules "S → ⟨NP1 marche lentement, NP1 walks slowly⟩" and "S → ⟨N1 marche lentement, N1 walks calmly⟩", which are identical on the source side except for their right-hand-side nonterminal label (NP versus N), have distinct paths in the trie. For ESD, labels are not used as constraints and therefore need to be removed. This allows the decoder to quickly find all matching rules for a sequence of nonterminals and lexical items, without unnecessarily splitting the trie into many paths for different labelings. However, when during cube-pruning complete rules are added to the chart, the labels should still be obtainable from

Figure 1: Rule tries for three different system types: (a) rule trie for HIERO; (b) rule trie for a labeled system with strict label matching; (c) rule trie for a labeled system with soft label matching (ESD).

them because they are required for computing label-substitution features. This is achieved by keeping the labels outside the trie nodes but retaining them inside the rules that are stored in lists at each leaf node in the trie. This is exactly what is done in the trie for ESD (Figure 1 (c)). As an illustration, note how in (c) the rules "S → ⟨NP1 marche lentement, NP1 walks slowly⟩" and "S → ⟨N1 marche lentement, N1 walks slowly⟩" share the same unlabeled path in the trie as in (a), but their labels are still retained in the complete rules stored at the leaf node. During decoding, ESD extends hypotheses with all rules matching the source input, while ignoring labels. This is done by substituting the actual labels from the hypergraph with surrogate "X" labels and using those labels to retrieve matching rules from the rule trie. There is, however, an important exception to this that requires special treatment, namely the nonterminal label occurring in glue rules. Glue rules of the form

GOAL → ⟨GOAL1 X2, GOAL1 X2⟩

contain two types of labels. The "X" symbol in the rule is the symbol that will be substituted to HIERO (non-glue) rules. The GOAL label, occurring on the left-hand side of the rule and as the first nonterminal on the right-hand side, is also known as the start symbol. It serves to start the gluing extension and allows for the glue rule to be used repeatedly. This GOAL label needs to be strictly matched, to prevent the left-hand side of glue rules from softly matching other nonterminals and hence substituting for HIERO rules. The strict matching of the GOAL label is achieved in the grammar by retaining it as a label in the trie used by ESD (see the "GOAL" labeled internal node in Figure 1 (c), the third child of ROOT), and requiring the GOAL symbols observed in the hypergraph to be strictly matched against the symbols in the trie. Furthermore, labels inside HIERO rules should not be allowed to match the GOAL label but only the surrogate label X that represents the rest of the labels, when retrieving matching rules from the trie. This implementation ensures correct and efficient rule matching given either GOAL labels (strict matching) or other labels (ESD). One other important detail enabling efficient ESD decoding is that the labeled grammars used with ESD are identical in size to HIERO grammars. Let the HIERO-rule-signature of a labeled rule be that rule with the labels removed. Given a rule labeling scheme, grammars used with ESD are formed by labeling every HIERO rule with a single canonical labeling: the most frequent labeled version across extracted rules that share the HIERO-rule-signature of that rule. These grammars also use the same feature set as HIERO, only adding label-substitution features. In contrast, because strict matching systems combine all differently labeled extracted rule versions, they use grammars that are much bigger than HIERO grammars.
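The trie organization of Figure 1 (c) can be summarized in a few lines of code. The sketch below is ours, not Joshua's actual implementation: internal trie keys are lexical items plus a single surrogate "X" key for all soft-matched labels, the GOAL start symbol keeps its own key so it is matched strictly, and the fully labeled rules are stored at the node where their source side ends so that label-substitution features can still be computed.

```python
SURROGATE = "X"            # all soft-matched labels collapse to this trie key
STRICT_LABELS = {"GOAL"}   # the start symbol must still be matched exactly

class TrieNode:
    def __init__(self):
        self.children = {}  # trie key (word, surrogate, or GOAL) -> TrieNode
        self.rules = []     # fully labeled rules whose source side ends here

def trie_key(symbol, is_nonterminal):
    if not is_nonterminal:
        return symbol                                    # words match exactly
    return symbol if symbol in STRICT_LABELS else SURROGATE

def insert_rule(root, source_side, labeled_rule):
    """source_side: list of (symbol, is_nonterminal) pairs for the rule's source."""
    node = root
    for symbol, is_nt in source_side:
        node = node.children.setdefault(trie_key(symbol, is_nt), TrieNode())
    node.rules.append(labeled_rule)     # labels retained for feature computation

def matching_rules(root, span_symbols):
    """span_symbols: (symbol, is_nonterminal) pairs for the input span, where
    nonterminal symbols are the labels of already-covered chart gaps; they are
    mapped to the surrogate key, so differently labeled gaps share one path."""
    node = root
    for symbol, is_nt in span_symbols:
        node = node.children.get(trie_key(symbol, is_nt))
        if node is None:
            return []
    return node.rules
```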

4 Experiments

We evaluate our models on Chinese–English, since it facilitates the best comparison with experiments in earlier work. All data is lowercased as a last pre-processing step. The training data for our experiments is formed by combining the full sentence-aligned MultiUN (Eisele and Chen, 2010; Tiedemann, 2012)2 parallel corpus with the full sentence-aligned Hong Kong Parallel Text parallel corpus from the Linguistic Data Consortium.3 We used a maximum sentence length of 40 for filtering the training data. The combined dataset has 7,340,000 sentence pairs. For the development and test sets we use the Multiple-Translation Chinese datasets from LDC, parts 1–4,4 which contain sentences from the News domain. We combined parts 2 and 3 to form the development set (1,813 sentence pairs) and parts 1 and 4 to form the

2 Freely available from http://opus.lingfil.uu.se/
3 The LDC catalog number of this dataset is LDC2004T08.
4 LDC catalog numbers: LDC2002T01, LDC2003T17, LDC2004T07 and LDC2004T07.

test set (1,912 sentence pairs). For both development and testing we use 4 references. For these experiments both the baseline and our method use a 4-gram language model with Kneser-Ney smoothing (Kneser and Ney, 1995) trained on 5,427,696 sentences of domain-specific5 news data taken from the "Xinhua" subcorpus of LDC's English Gigaword corpus.6

4.1 Training and decoding details

Our experiments use Joshua (Ganitkevitch et al., 2012) with Viterbi best derivation. Baseline experiments use normal decoding, whereas ESD experiments relax the label-matching constraints while adding label-substitution features to facilitate learning of label-substitution preferences. For training we use standard HIERO grammar extraction constraints (Chiang, 2007) (phrase pairs with source spans up to 10 words; abstract rules are forbidden). During decoding a maximum span of 10 words on the source side is maintained. In our experiments, for HIERO we use a standard feature set that is comparable to that of Chiang (2005). We follow Chiang (2010) in using, except for the label-substitution features, exactly the same features for ESD as for HIERO. This includes the usage of phrase weights taken from the HIERO (label-stripped) rules as opposed to the labeled rules. For the labeled systems with strict matching (-Str), we follow Zollmann (2011) in using phrase weights for the labeled versions of the rules, but also adding smoothed versions of these features, including the HIERO (unlabeled) phrase weights. We train our systems using (batch k-best) MIRA (Cherry and Foster, 2012) as borrowed by Joshua from the Moses codebase, allowing up to 30 tuning iterations. Following standard practice, we tune on BLEU (Papineni et al., 2002), and after tuning we use the configuration with the highest scores on the development set with actual (corpus-level) BLEU evaluation. We report lowercase BLEU, METEOR (Denkowski and Lavie, 2011), BEER (Stanojević and Sima'an, 2014) and TER (Snover et al., 2006) scores for the test set. We also report average translation length as a percentage of the reference length for all systems. To counter unreliable conclusions due to optimizer variance, we repeated all experiments three times (tuning plus testing), and compute the scores as averages over these runs, using MultEval (Clark et al., 2011) version 0.5.1.7 We also use MultEval's implementation of statistical significance testing between systems, which is based on multiple optimizer runs and approximate randomization. Differences that are statistically significant with respect to a HIERO baseline and correspond to improvement/worsening are marked with 4H/OH at the p ≤ .05 level and NH/HH at the p ≤ .01 level. For average translation length, where either higher or lower may be better, we use H/H to mark significant change with respect to the baseline at the p ≤ .05 / p ≤ .01 level. We also report the Kendall reordering score (KRS), which is the reordering-only variant of the LR-score (Birch et al., 2010) (without the optional interpolation with BLEU) and which is a sentence-level score. For the computation of statistical significance of this metric we use our own implementation of the sign test (Dixon and Mood, 1946), described also by Koehn (2010). Finally we report the average CPU time per translated sentence in the test set. These times are obtained using special Java system methods, and aggregated over all decoder threads and the main thread. These time statistics are robust to variations in the number of decoder threads and the amount of other jobs running on the server, factors that can easily confound statistics based on regular wall-clock time.

5 The different domain of the training data (mainly parliament) and development/test data (news) requires usage of a domain-specific language model to obtain optimal results.
6 The LDC catalog number of this dataset is LDC2003T05.
7 https://github.com/jhclark/multeval

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 206 System Name BLEU ↑ METEOR ↑ BEER ↑ TER ↓ KRS ↑ Length CPU time HIERO 31.63 30.56 13.15 59.28 58.03 97.15 3.34 th HIERO-0 -Str 31.90 NH 30.79 NH 13.45 60.11 HH 59.68 NH 98.65 H 2.87 th HIERO -0 32.03NH 30.70NHHS 13.42NH 59.58HHNS 58.87NHHS 97.87HS 8.99 st HIERO-1 -Str 31.77 30.62 13.20 60.13 HH 59.89 NH 98.47 H 4.63 st HIERO -1 32.35NHNS 30.98NHNS 13.75NHNS 60.26HH 60.01NH 99.11HNS 8.45 SAMT-Str 31.87 4H 30.61 13.38 59.97 HH 59.94 NH 98.46 H 25.59 SAMT 32.40NHNS 31.20NHNS 14.01NHNS 60.19HHOS 60.38NH4S 99.37HH 8.09 BoundaryTag-Str 32.26NH 30.94NH 13.91NH 60.20HH 58.78NH 98.98NH 29.29 BoundaryTag 32.77NHNS 31.27NHNS 14.17NHNS 60.15HH 60.83NHNS 99.72HS 8.60

Table 1: Results for labeled systems with strict or soft label matching. Statistical significance is given against the HIERO baseline (H) and pair-wise for every soft-matching system against its strict-matching variant (-Str); significance for the latter comparison is marked with (S). For every experiment we use boldface to accentuate the highest score across systems for all metrics; for TER, an error metric, the lowest score is accentuated instead. For length we boldface the value that is closest to 100, in absolute terms.

4.2 Is soft label matching always superior to strict matching?

In Table 1 we compare four labeled systems for decoding with strict matching and decoding with soft label matching. This extends the earlier comparison by Maillette de Buy Wenniger and Sima'an (2014, 2016), attempting to give a more general answer to the question as to whether soft label matching is always superior to strict matching. The first two systems are reordering labeled systems (Maillette de Buy Wenniger and Sima'an, 2013, 2014, 2016), and the last two systems are syntactically labeled systems, namely SAMT (Zollmann and Venugopal, 2006) and a target-side boundary-tag labeled system (Zollmann, 2011; Zollmann and Vogel, 2011).

Label Types: Our reordering labels are heuristic labels, created using hierarchical reordering information induced from word alignments. These labels come in two forms: 1) 0th-order reordering labels (HIERO-0th) describe for each nonterminal the reordering that happens at its child nonterminals; 2) 1st-order reordering labels (HIERO-1st) describe the reordering of the nonterminal itself relative to an embedding parent nonterminal. SAMT is a heuristic syntactic labeling scheme, similar in spirit to combinatory categorial grammar (CCG) (Steedman, 1987, 2000). SAMT uses constituency-parse information and finds the simplest syntactic label describing a (target) span. Similar to SAMT, target-side boundary-tags are heuristic syntactic labels formed by combining the POS-tags of (target) words at the boundaries of phrase pairs. Since limited space allows only a short description of the used labels and systems, we refer the reader to the original papers for more details.

For three of the four label types tested, the soft-labeled system gives significantly better scores for BLEU, METEOR and BEER. Only the HIERO-0th label type does not show significantly better results for those metrics. However, later in this section we discuss how these results too are improved by extending the search. Although it is not statistically significant, HIERO-0th still shows improved BLEU for the soft-labeled version over the strict matching system. For SAMT and the target-side boundary-tag labeled system, apart from the other improvements, there are also significant improvements in KRS. The results show that soft matching is typically, although not always significantly, better than strict matching.

4.3 Challenges of soft label matching

In the previous section we saw that decoding with soft label matching typically outperforms both unlabeled systems and systems that use labels with strict matching. Nevertheless, efficient soft label matching faces two challenges: (i) increased search space, and (ii) label matching blindness. The addition of labels increases the search space dramatically, even with soft matching.

During decoding, a rule is applied to extend an existing hypothesis. Since in practice decoding proceeds bottom up, this means any right-hand-side nonterminal label(s) of the rule are matched with the labels of the corresponding substituted-to nonterminal gaps in the chart. With N labels, there are potentially N alternative versions per language model state in the chart entries of each gap. For a rule with 2 gaps, this means a particular rule substitution only covers one of up to N^2 possible options arising from the splitting of the language model states by labeling. Because soft matching keeps only one labeled version per HIERO-rule-signature, on the rule side the number of options does not increase compared to HIERO.8

The second challenge, which we call label matching blindness, is that the type of applied label substitutions (and whether or not these are matching) is only evaluated late in the search process. During initialization, cube pruning explores the combination of the best rule with the best leaf nodes for the matched-to rule gaps in the chart. However, the quality of the rule and the leaf nodes is only computed based on local (stateless) features, such as lexical probabilities and phrase weights. Features such as the language model cost cannot be computed because they cross rule boundaries. They may still be approximated given the available information, but this is generally inaccurate. In the case of label-substitution features, no meaningful stateless computation is possible. These features are therefore simply ignored until the rule-nonterminal to labeled-gap substitutions have already been decided during cube-pruning initialization.

4.4 Enriching the search

The challenges resulting from soft label matching mentioned in the last section motivate enrichment of the search during soft label matching decoding. A crude approach is to just extend the search space by increasing the value of the decoder parameter pop-limit, which controls the number of hypotheses that are added to the stack by cube pruning during decoding. This may improve the quality of the produced translations at the price of a higher computational cost. However, this approach is computationally expensive and inefficient, since it does not directly target an exploration of rule-to-gap substitutions with diverse labels. The initial label substitution diversity is determined by the diversity of label pairs within the set of rule-RHS-nonterminal to chart-gap substitutions explored during cube-pruning initialization. For small label sets it is feasible to enrich the set of those initially explored substitutions, thereby drastically increasing this diversity. Three ways to implement this are listed below; a minimal sketch of option (a) follows the list:

a) Exploring all alternatively labeled versions of a HIERO source rule type.9

b) Exploring all alternatively labeled gap substitutions, given the single best labeled version of a HIERO source rule type.

c) Combining (a) and (b), i.e. exploring all labeled rule versions and for each of those all alternatively labeled gap substitutions.
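The following rough Python sketch, under our own assumptions about the surrounding decoder data structures, illustrates option (a): during cube-pruning initialization, every labeled version of an applicable HIERO source-rule type is paired with the current best item of each gap, so diverse label substitutions are scored (and can rise to the top) from the very start.

```python
import heapq

def seed_cube_with_all_labelings(labeled_rule_versions, best_gap_items, score_fn):
    """labeled_rule_versions: all alternatively labeled rules sharing one
    unlabeled (HIERO) source side; best_gap_items: the single best chart item
    per substituted-to gap; score_fn: stateless scoring, including the
    label-substitution features for this rule/gap combination."""
    heap = []
    for k, rule in enumerate(labeled_rule_versions):  # all labelings, not just one
        score = score_fn(rule, best_gap_items)
        heapq.heappush(heap, (-score, k, rule, best_gap_items))
    return heap   # popped in best-first order by the regular cube-pruning loop
```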

Why should this help? Assume that we explore all alternatively labeled rule versions for a HIERO rule type (a), while keeping the gap labels fixed to those of the best language model state. Then, a matching substitution will be explored if yielded by substituting the nonterminal labels for any of these alternatively labeled rule versions to those fixed gap labels. Similarly for

8However, with larger label sets, the canonical labeled rule form represents a larger number of rules, and will consequently be a more approximate representation of those. 9Note that whereas there is only one canonical labeled version per HIERO-rule-signature, there are potentially many labeled versions of the source rule type, in which the target side is ignored. Furthermore, the LHS label of the rule is ignored when collecting the best alternatively labeled versions given a HIERO rule source side, since it has no effect on label matching in bottom-up decoding.

(a) Evaluation order and final order without shuffling. (b) Evaluation order and final order with shuffling.

Figure 2: Example of the effect of shuffling on the decoding, translating the source phrase “Elle marche lentement”.

(b), given a single fixed labeled rule version, a matching substitution is explored if achievable in combination with the available gap labelings. With respect to computational complexity, (a) and (b) potentially increase the number of explored combinations by a factor of N^2, with N being the size of the label set, whereas (c) even increases it by N^4. These increases are with respect to the number of applicable HIERO rule types, since the increased exploration only concerns the cube pruning initialization and not the whole cube pruning process. Consequently, if N is small, the empirical increase in computational cost is limited. With larger N, however, the increase in computational cost of the initialization (quadratic for (a) and (b) and N^4 for (c)) starts to dominate the total cost, so that none of these approaches scale to large10 label sets, with (c) scaling worst. In what follows, we only explore (a), which is very similar to (b). We will refer to this setting as diverse rule labeling exploration (DRLE) in the rest of the paper. We do not explore (c) here, because of its very restricted scalability.

4.5 Shuffling

In cube-pruning initialization, applicable rules are substituted to particular gaps, and every complete rule substitution leads to a total score that includes the label-substitution features and language-model cost amongst other things. These complete rule substitutions are then added as initial options to the cube-pruning queue. In our initial implementation of DRLE (see Section 4.4), we consecutively evaluated all alternative labeled versions for a specific HIERO source-rule type before moving on to the next type. Independent of this evaluation order, rules are placed on the cube-pruning queue in the order of their evaluation scores. Nevertheless, we discovered that it helps if we shuffle the order of the rules before we evaluate them. This might seem odd, since this shuffling (-Sh) can only affect the order of rules yielding the same score. However, that is exactly how we think shuffling helps; without shuffling, all labeled versions of the same rule source-side with the same score are lumped together in the cube pruning queue. Shuffling mixes them with other rules that tie for the same score. This increases diversity when neither the labels nor the translation for a rule source-side are discriminative, as is common for certain rules at the start of tuning when

10 The approaches were still feasible for at least N = 25 labels, the highest number we have tested. We acknowledge, however, that once N increases significantly, problems will be encountered due to computational complexity.

label-substitution feature weights are initialized to zero. To better understand shuffling and how it affects the order of hypotheses in the cube-pruning queue, it is helpful to look at an example. In Figure 2 we show the effect of shuffling on the translation of the French source phrase “Elle marche lentement”. Here, we assume that the first source word (“Elle”) and the last source word (“lentement”) have already been translated before. Hence, a full translation can be formed by combining these previous translations with HIERO rules that additionally translate the first two words (“Elle marche”) or the last two words (“marche lentement”) respectively, substituting the earlier translated last/first word as a gap. In Figure 2a we show the evaluation order and final order of hypotheses without shuffling. As mentioned before, without shuffling all rules that share the same source side (ignoring labels) – in this case rules A, B and C – are evaluated consecutively. In the next step the scored rules are sorted by their score, which in this example does not further change the order. In Figure 2b, in contrast, the rules are shuffled, randomly permuting their order before evaluation. In this case the random order of rule evaluation is D, A, C, B. After scoring, rule A again comes to the top, because it has the highest score. The relative order of D, C and B, however, remains changed, because rules B, C and D tie for the same score, so their relative order before scoring determines their relative final order. Notice in particular how rule D (in boldface), which has a different source side from rules A, B and C, now comes directly after rule A in the final order. Note that shuffling only randomizes the relative order in which rules tying for the same score are added to the cube-pruning queue, eliminating implementation-specific bias in the order of such rules. This avoids different labeled versions of the same rule with the same score all clinging together in the queue as an undesirable side-effect of the specifics of the DRLE implementation. Because shuffling randomizes the order of rules with the same score in the final queue, it also removes the opportunity for the tuner to lazily exploit a partially deterministic order in development-set hypotheses, which is of no use for translation of the test set. Possibly, this by itself also has a positive effect in making tuning more robust and reducing the chance of overfitting. We leave it for future work to further investigate this. Crucially, shuffling does not specifically add additional search errors.
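To make the effect of shuffling on tie order concrete, the following minimal sketch (our own illustration, not the authors' decoder; the rule names and scores are invented) mimics filling the cube-pruning queue with a stable sort by score. Shuffling before scoring can only permute rules that tie on score, which is exactly the behaviour described above.

import random

# Hypothetical (rule, score) pairs: B, C and D tie on score, as in Figure 2.
rules = [("A", -1.0), ("B", -2.5), ("C", -2.5), ("D", -2.5)]

def queue_order(rule_list, shuffle=False, seed=0):
    """Return the order in which rules would enter the cube-pruning queue.

    A stable sort by score keeps the pre-sort order among ties, so shuffling
    beforehand only changes the relative order of equal-scored rules.
    """
    rule_list = list(rule_list)
    if shuffle:
        random.Random(seed).shuffle(rule_list)
    # Higher score is better; Python's sort is stable, preserving tie order.
    return [name for name, _ in sorted(rule_list, key=lambda r: -r[1])]

print(queue_order(rules))                # e.g. ['A', 'B', 'C', 'D']
print(queue_order(rules, shuffle=True))  # 'A' still first; B, C, D reordered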

4.6 Effects of search extension strategies
Table 2 shows the effects of the different strategies described in the previous sections to expand the search space. The table first repeats the results for the HIERO baseline and then lists results for the HIERO 1st and HIERO 0th reordering labels. For each we then use either the standard setting for rule exploration during cube-pruning initialization, or DRLE. We do this in combination with shuffling, and also vary the pop-limit. This provides insight into the effect of these factors when applied independently or combined. In addition to the HIERO baseline, our second baseline (B0/B1) for each of the two separate reordering labels is a basic ESD system without changes to the default pop-limit (1000) and without DRLE. For each reordering label we then test the significance of improvements against both HIERO and B0/B1. We see that both increasing the pop-limit and DRLE improve results for both label types. However, when only the pop-limit is increased without DRLE, HIERO 0th improves significantly over the basic (B0) system, while HIERO 1st gives no significant improvements over the basic (B1) system. In addition, for HIERO 1st, comparing the effect of just using DRLE to just changing the pop-limit, the former outperforms the latter for all metrics except KRS over all tested values of the pop-limit. Adding shuffling makes this trend even sharper. For both label types, shuffling has a positive effect on the results. Note too that DRLE has the highest impact with larger label sets; probably the resulting larger search space increases the chance of missing certain desirable label substitutions in the normal cube-pruning initialization.

System Name | BLEU ↑ | METEOR ↑ | BEER ↑ | TER ↓ | KRS ↑ | Length | CPU time
HIERO | 31.63 | 30.56 | 13.15 | 59.28 | 58.03 | 97.15 | 3.34
HIERO 0th | 32.50 NH | 30.88 NH | 13.68 NH | 59.66 HH | 59.30 NH | 98.40 H | 29.50

Table 4: Analysis experiment: tuning with pop-limit 2000 and test-set decoding with pop-limit 4000.

Summarizing the results over both label types, we can conclude that DRLE is typically better than just crudely increasing the pop-limit. Additionally, for small label sets it comes at a lower computational cost.
4.7 Analysis: the negative interaction of DRLE with a higher pop-limit explained
Both DRLE and an increased pop-limit by themselves have a positive effect on the translation quality, but a surprising result is the sometimes relatively negative effect of using DRLE and also maximally increasing the pop-limit. With HIERO 0th the results improve when increasing the pop-limit to 2000, but then drop when further increasing it to 4000. For HIERO 1st with DRLE, only BLEU benefits slightly from a pop-limit higher than 1000, whereas performance decreases for the other metrics. Such negative interactions between DRLE and the increased pop-limit could be caused by overfitting.11 To see whether overfitting indeed occurs, we looked at the evaluation scores for the development set, see Table 3. It can be seen that for HIERO 0th with DRLE, the BLEU scores decrease when the pop-limit increases from 2000 to 4000, but the decrease in the development-set BLEU score is smaller than the decrease in the test-set BLEU score, see Table 2. Furthermore, when looking at HIERO 1st with DRLE on the development set, it can be seen that the BLEU score monotonically increases with increasing pop-limit size. However, other metrics show a dip for a pop-limit of 2000, which was also seen for the test set for all metrics except TER. To summarize, for certain increases in the pop-limit in combination with DRLE, we made two observations that indicate overfitting:
• Loss of performance on the test set, for most metrics including BLEU, the tuning metric.
• Mostly retained or even increased performance for BLEU on the development set, combined with performance loss for most other metrics.
Could it still be that a higher pop-limit is by itself harmful, independent of its role in the assumed overfitting? We hypothesize that it is only harmful insofar as it facilitates overfitting in combination with DRLE during the tuning process. To test this hypothesis we ran another analysis experiment, in which we used the final feature weights obtained from tuning with DRLE and a pop-limit of 2000 and only increased the pop-limit to 4000 during the decoding of the test set. The results are shown in Table 4. As can be seen, in this setting the results are highly similar to those obtained with DRLE and a pop-limit of 2000 used for both tuning and testing. This confirms our hypothesis that a higher pop-limit (more search) is not generally harmful, but can be harmful in the tuning stage because it facilitates more overfitting.
4.8 Combining Labels
In this section we look at the effect of combining multiple labels. The first successful combination we explore is 0th-order and 1st-order reordering labels. Since both labels individually give good results, and encode somewhat different information about word order, their combination could work even better. The other two combinations we test are 1st-order reordering labels combined with SAMT labels or target-side boundary-tags. These combinations are

11 DRLE groups the translations of the source side by their labels and uses the best translation for each distinct labeling, as opposed to only a single best translation. This may also cause more suboptimal source rule-side translations to be added to the initial cube-pruning queue; and with a higher pop-limit there is a higher chance of those being retained and causing problems such as overfitting.

System Name | BLEU ↑ | METEOR ↑ | BEER ↑ | TER ↓ | KRS ↑ | Length | CPU time
HIERO | 31.63 | 30.56 | 13.15 | 59.28 | 58.03 | 97.15 | 3.34
HIERO 0th + HIERO 1st | 32.30 | 30.95 | 13.75 | 60.16 | 60.13 | 99.07 | 7.18
SAMT + HIERO 1st | 32.57 | 31.07 | 13.84 | 59.94 | 60.18 | 99.13 | 11.63
Bnd.Tags + HIERO 1st | 32.65 | 31.36 | 14.16 | 60.21 | 61.46 | 99.86 | 10.32

Table 5: Double-labeled systems with soft matching. Results are for exploring only the best rule labeling and label substitution during cube-pruning initialization. Statistical significance is given against the HIERO baseline (H) and, for every double-labeled system, against the single-labeled systems from which the double label is composed: HIERO 0th (H0), HIERO 1st (H1), SAMT (S) and target-side boundary-tags (B).

intuitively promising, since reordering labels and syntactic labels are expected to provide at least partially different, complementary information. When combined labels are directly applied to form label-substitution features, this yields a quadratic increase in the number of these features, causing extreme sparsity and hence overfitting problems. We thus take another approach: during feature generation, we split every combined label into its two constituent parts and compute individual label-substitution features for each (see the sketch below). Table 5 shows the results of the label-combination experiments. Most of the double-labeled systems come to the level of the best of the two constituent labels, but do not improve beyond it. However, the system that combines target-side boundary-tags and 1st-order reordering labels significantly improves over both these labels individually for both METEOR and KRS. These metrics are particularly concerned with assessing the quality of the word order, which receives less or no attention in the other metrics. Since reordering labels are particularly expected to improve word order, it is positive that they help to further improve it for the best-performing single label in our experiments.
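The label-splitting step mentioned above can be illustrated with a small sketch; the label format ("part1+part2") and feature-name scheme here are assumptions for illustration, not the authors' exact implementation. Each combined label contributes two separate label-substitution features instead of one joint, and therefore much sparser, feature.

def label_substitution_features(substituted_label, gap_label):
    """Sketch: emit one label-substitution feature per constituent label.

    Combined labels are assumed to be written as "part1+part2"; splitting
    them keeps the feature space linear rather than quadratic in the number
    of constituent labels.
    """
    features = []
    for sub_part, gap_part in zip(substituted_label.split("+"),
                                  gap_label.split("+")):
        features.append("subst_{}_into_{}".format(sub_part, gap_part))
    return features

# Example: a double label (reordering + boundary-tag) yields two features.
print(label_substitution_features("MONO+NP", "SWAP+VP"))
# ['subst_MONO_into_SWAP', 'subst_NP_into_VP']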

5 Conclusion
In this work, we examined key aspects of effective and efficient ESD. We first gave a detailed description of how this method can be efficiently implemented, and then examined three empirical questions. First, based on experiments with four different label types, we demonstrated that ESD is empirically at least equal and typically superior to strict matching. Next, we demonstrated that ESD can benefit from richer search. Our experiments show that it is more effective to specifically target the search effort towards the exploration of more diverse label substitutions than to crudely increase search in general by using a higher pop-limit. Finally, we explored the effect of double labels, and showed that while these are not successful in general, the specific combination of target-side boundary-tags and reordering labels does significantly improve word order as measured by METEOR and KRS, without significantly changing the other metrics.

Acknowledgements
This research is supported by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106). This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567. The investigations were supported by The Netherlands Organization for Scientific Research (NWO) under grant nr. 612.066.929 and VICI grant nr. 277-89-002 and Stichting voor de Technische Wetenschappen (STW) grant nr. 12271. We would like to thank the anonymous reviewers for their helpful comments.

Development of a classifiers/quantifiers dictionary towards French-Japanese MT

Mutsuko Tomokiyo [email protected] Mathieu Mangeot [email protected] Christian Boitet [email protected] UGA, GETALP-LIG, bâtiment IMAG, 700 avenue Centrale, Domaine Universitaire de Saint-Martin-d’Hères, CS 40700 38058 Grenoble CEDEX 9, France

Abstract
Although classifiers/quantifiers (CQs) expressions appear frequently in everyday communications or written documents, they are described neither in classical bilingual paper dictionaries, nor in machine-readable dictionaries. The paper describes a CQs dictionary, edited from the corpus we have annotated, and its usage in the framework of French-Japanese machine translation (MT). CQs treatment in MT often causes problems of lexical ambiguity, polylexical phrase recognition difficulties in analysis and doubtful output in transfer-generation, in particular for distant language pairs like French and Japanese. Our basic treatment of CQs is to annotate the corpus with UNL-UWs (Universal Networking Language - Universal Words)1, and then to produce a bilingual or multilingual dictionary of CQs, based on synonymy through identity of UWs.

Keywords: classifiers, quantifiers, corpus annotation, UNL, UWs dictionary, phraseology study, Tori Bank, French-Japanese MT

Introduction
We call CQs (classifiers/quantifiers) words or phrases which are used in some languages to indicate the class of a noun or a nominal phrase, depending upon the type of its referent or upon the speaker's observation of the referent, when they appear in quantitative expressions. They denote:
(a) CQs expressing the quantity of the referent by counting. Eg. pièce (piece) (in French), 枚 (mai, sheet), 点 (ten, piece) (in Japanese), cm, gram.
(b) CQs representing a quantity concept, based on the speaker's observation or general metonymy. Eg. un brin de (a little), bribes de (scraps of), ひとつまみの (hito-tsumami no, a pinch of), 山盛りの (yama-mori no, a pile of).
There are two cases for a CQ: (1) it can belong to only the (a) type or the (b) type, and (2) it can belong at the same time to both the (a) and (b) types. That is because, on the

1The UNL (Universal Networking Language) project was founded at the Institute of Advanced Studies (IAS) of the United Nations University in Tokyo in April 1996 under the aegis of UNU (United Nations University, Tokyo) and with financial support from ASCII corporation (a Japanese publishing company, 1977-2002) and UNL-IAS. http://www.undl.org/unlsys/unl/unl2005/attribute.htm

one hand, there are some CQs that play only the role of classifier or quantifier, and, on the other hand, there are CQs that play both of these roles. Eg. un brin de paille (a wisp of straw), un brin de folie (a touch of madness)2. When we started to deal with CQ expressions in the framework of French-Japanese MT, we met mainly the following difficulties, which are inherent in CQs:
1. Resolution of the lexical ambiguity of polysemic nouns. Eg. pièce (piece): (Japanese translation as CQ) 枚 (mai, sheet or ϕ3), 点 (ten, ϕ), 頭 (tou, ϕ), 樽 (taru, cask), etc.
2. Producing adequate CQs in Japanese when they are absent in French. Eg. deux livres (two books): (Japanese translation) 二冊の本 (ni-satsu no hon), ni = two, satsu = ϕ, no = postposition, hon = book, where 冊 (satsu) is one of the Japanese CQs for books, notebooks, albums, etc.
3. Normalization of the floating quantifier phenomenon in Japanese.
4. Recognition of CQ polylexical expressions over the course of corpus development. Eg. une pincée de sel (a pinch of salt): (Japanese translation) ひとつまみの塩 (hito-tsumami no shio), hito = 1, tsumami = pinch, no = of, shio = salt.
To handle these linguistic behaviours of CQs in a comprehensive manner, we have adopted the UNL-UWs format for our corpus annotations and dictionary descriptions. Another motivation is the desire to be able to extend this work to many other languages, in the framework of MT based on passage through the UNL semantic pivot. In this paper, we first examine the behaviour of CQs and the related problematic issues more concretely, from the point of view of French↔Japanese MT, and then propose a resolution of the above-mentioned problems by extending the UNL-UWs dictionary.

1 Lexical ambiguity for classifiers/quantifiers
According to our studies on ambiguities for MT, 14% of analysis errors are due to polysemous words4 [Boitet and Tomokiyo (1995), Boitet and Tomokiyo (1996), Tomokiyo and Axtmeyer (1996)]. Also, Wisniewski et al. (2013) report that the most frequent necessary post-editing operation in the translation of their French corpus into English is to correct articles like "les", "le", "du", etc., and that the next one concerns lexical transfer errors of polysemous words. We have also confirmed that, when polysemous words are used in their abstract or figurative meaning in CQ expressions, the translation results produced by current MT systems are not at all good, because words contained in CQ phrases are often polysemous and used in their figurative meaning at the same time. The following example shows « pincée (pinch, つまみ, tsumami) » appearing in the quantifier phrase « une pincée de », and used in its figurative meaning. When one looks at the translation outputs produced by free as well as commercial MT systems, it appears that there is a lack of phraseology studies and of a polysemy disambiguation method for the word « pincée »5.

2 "brin" means (1) a small stalk, and (2) "a bit, a little" in "un brin de".
3 The symbol ϕ means the absence of a corresponding translation in French.
4 We have carried out research on ambiguity analysis from the lexical, semantic and contextual points of view since 1996. Ambiguities have been defined, categorized, and formalized as objects in an ambiguity database, and we have used this theoretical background to label ambiguities in Japanese-English interpreted dialogues, collected for the development of a speech translation system at ATR in Japan (1996).
5 The word "pincée" is used as a CQ in the form of "une pincée de" + noun without particle, for pulverized substances.

Table 1: Problem of CQ word ambiguity in French-Japanese MT

French word: pièce
Examples7 | English translation | Japanese translation
une pièce de toile | a piece of cloth | 一枚 (ichi-mai) の布
une pièce de mobilier | a piece of furniture | 一点 (it-ten) の家具
dix pièces de bétail8 | ten pieces of cattle | 10頭 (jyut-tou) の家畜
plusieurs pièces de bois | several pieces of wood | 数枚 (suu-mai) の板
Une pièce de vin est un tonneau de vin contenant environ 220 litres. | A cask of wine is a barrel of wine containing about 220 liters. | 一樽 (hito-taru) のワインとは約220リットルを含むワイン樽である。
J'ai reçu une demi-pièce de ce vin. | I received half a cask of this wine. | わたしは半樽 (han-taru) のワインを受け取った。
Dans une pièce de théâtre, il n'y a pas de narrateur pour raconter les faits. | In a play, there is no narrator to tell the facts. | ある作品 (aru-sakuhin) では事実を語るナレータがいない。
une pièce de viande | a piece of meat | 一切れの肉 (hito-kire)
une pièce de blé | a wheat field | 一枚 (ichi-mai) の麦畑 (no mugi-batake)

Eg. Ajoutez une pincée de sel. (ひとつまみの塩を加えなさい (hitotsumami-no shio-wo kuwaenasai), Add (加えなさい) a pinch of (ひとつまみの) salt (塩).) → (translation outputs) 塩のつねり (tsuneri) を加えなさい / 塩のピンチ (pinchi) を加えなさい / 塩のピンチ (pinchi) を追加します (shio no tsuneri wo kuwaenasai / shio no pinchi wo kuwaenasai / shio no pinchi wo tsuikashimasu)6. Even measure words like cm, km, kg, etc. have acronym ambiguity [Mari (2011)]. Eg. cm ← centimètre, congrégation de la mission, coût marginal, etc. To disambiguate a polysemic CQ, we describe each of its meanings, with the associated conditions of occurrence, as a UW (contained in our Universal Words dictionary). In our fr-UW dictionary, the description for the ambiguous word "pièce" is as follows:
pièce → cask(icl>wine)
pièce → piece(icl>cloth)
pièce → piece(icl>furniture)
pièce → piece(icl>meat)
pièce → room(icl>place)

6 The translations produced by the following MT systems do not make sense. http://www.reverso.net/translationresults.aspx?lang=FR&direction=francais-japonais. http://www.worldlingo.com/fr/products_services/worldlingo_translator.html. https://translate.google.com/#fr/en/a
7 The sources of the examples are the French-Japanese dictionary "Royal", the information on "pièce" in the Wiktionary "Vinothèque" article, see https://fr.wiktionary.org/wiki/pièce_de_vin, and http://www.etudes-litteraires.com/etudier-piece-de-theatre.php
8 Each animal, like ox, cow, etc., that belongs to cattle. One would rather say "head of cattle" today.
9 An actant here means an expression that helps complete the meaning of a predicate.
10 The semantic relation labels are created from the UNL ontology, which stores all relational information in a lattice structure, where UWs are interconnected through relations including hierarchical relations (10 levels) such as "icl" (a-kind-of) and "iof" (an-instance-of), which denote a headword's sub-meaning and equivalent quantity, respectively. http://www.undl.org/unlexp/

Table 2: UWs and UWs dictionary

A UW is a character string of the form "headword(constraint_list)" which represents a concept associated to the headword. For example, "look(agt>thing, equ>search, icl>examine(icl>do, obj>thing))" is a possible UW for the meaning of the verb "look" corresponding to "examine". Other UWs will be used for various meanings of "look" as a noun: appearance (Paul's look(s)), or action (after a quick look, ...). The semantic representation of an utterance in UNL is a hypergraph, where each node bears a UW, possibly augmented by semantic attributes, and arcs bear semantic relations from a small list of about 40, like "agt", "obj", "aoj", "ben". In fact, there are three types of UW: restricted UWs, which are formed as said above (headword plus constraint list), extra UWs, which are a special type of restricted UWs, and basic UWs, which are bare headwords, with no constraint list. The syntax for dictionary description is:
<uw> ::= <headword> ['(' <constraint_list> ')']
The constraint list restricts the interpretation of a UW to a specific concept included within those covered by the basic UW [Uchida et al. (2006)], or to a subset of them.
Eg. look(agt>thing, equ>search, icl>examine(icl>do, obj>thing))
relever (to season): season(agt>person, obj>dish, icl>action)
樽 (taru, cask): cask(icl>wine, equ>220 litres)
The semantic relation "agt" denotes that the first actant9 of "look" is a "thing"; "look" is on an equivalent semantic level in the UNL ontology map10 with "search", and includes the meaning of "examine"; "examine" is an action verb and its grammatical object is a noun denoting things. The UNL-lang dictionaries contained, at the moment of writing, 1,269,421 headwords for Japanese, 520,305 headwords for French and 1,458,686 headwords for English. The semantic attributes consist of 58 labels and semantic relation labels [Uchida et al. (2006)]. For French-Japanese translation, French words are converted into UWs by using a UNL-French dictionary, and a UNL-Japanese dictionary is used for generating the Japanese translations.
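As an illustration of the "headword(constraint_list)" form just described, here is a minimal sketch (our own, not part of the UNL toolchain) that splits a UW string into its headword and top-level constraint list, treating nested parentheses such as those in icl>examine(icl>do, obj>thing) as part of a single constraint.

def parse_uw(uw):
    """Split a UW string into (headword, [constraints]).

    A basic UW has no constraint list; a restricted UW carries one in
    parentheses. Commas inside nested parentheses do not split constraints.
    """
    if "(" not in uw:
        return uw, []                                   # basic UW: bare headword
    head, rest = uw.split("(", 1)
    body = rest[:-1] if rest.endswith(")") else rest    # drop the final ')'
    constraints, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            constraints.append(current.strip())
            current = ""
            continue
        depth += (ch == "(") - (ch == ")")
        current += ch
    if current.strip():
        constraints.append(current.strip())
    return head, constraints

print(parse_uw("look(agt>thing, equ>search, icl>examine(icl>do, obj>thing))"))
# ('look', ['agt>thing', 'equ>search', 'icl>examine(icl>do, obj>thing)'])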

2 Handling dummy classifiers
A frequent but difficult case appears when a CQ does not appear explicitly in one language of a source-target language pair11, but is nevertheless mandatory in type (a) CQ usage, like 冊 (satsu) for counting books, notebooks, albums, etc., 匹 (hiki) for counting small animals, and 台 (dai) for counting cars, bicycles, pianos, computers, etc.
Eg. 2 livres (two books) → 二冊の本 (ni-satsu no hon), ni = 2, satsu = ϕ, no = ϕ, hon = books
un chat (a cat) → 一匹の猫 (i-ppiki no neko), i = 1, ppiki = ϕ, no = ϕ, neko = cat
There is no lexeme in French corresponding to 冊 (satsu), but if 冊 (satsu) is omitted in the translation into Japanese, the sentence does not make sense. In order to represent such Japanese sentences in UNL, which is based on English, when these CQs do not exist in English, we create new UWs beginning with "CQ-", followed by a list of some English referent nouns, for example "CQ-satsu-books-notebooks-albums" and "CQ-dai-cars-bicycles-computers-pianos"12. Absent CQs in French are marked by the attribute "@eld" (elided), which we have added to the original attribute list.
Eg. Description for 冊 (satsu) in the Japanese-UW dictionary: 冊 (satsu) (icl>CQ-books, notebooks, albums)
Accordingly, the graph for 二冊の本 (two books) is as follows:
qua(book(icl>thing).@pl, :01)
mod:01(CQ-satsu-books-notebooks-albums(icl>CQ).@entry.@eld, 2)
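To make the dummy-classifier step concrete, the following sketch (our own illustration with a toy dictionary, not the authors' UNL-based implementation; the class names and noun lists are assumptions) picks a Japanese counter for a counted noun and builds the "numeral + counter + の + noun" pattern used in the examples above.

# Toy mapping from noun classes to Japanese counters; the class assignments
# and the nouns listed here are illustrative assumptions only.
COUNTER_FOR_CLASS = {"bound-volume": "冊", "small-animal": "匹", "machine": "台"}
NOUN_CLASS = {"本": "bound-volume", "猫": "small-animal", "車": "machine"}
JA_NUMERAL = {1: "一", 2: "二", 3: "三"}

def japanese_quantified_np(number, noun):
    """Return e.g. '二冊の本' for (2, '本'): numeral + counter + の + noun."""
    counter = COUNTER_FOR_CLASS[NOUN_CLASS[noun]]
    return JA_NUMERAL[number] + counter + "の" + noun

print(japanese_quantified_np(2, "本"))  # 二冊の本 (two books)
print(japanese_quantified_np(1, "猫"))  # 一匹の猫 (a cat)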

(a) Tentative japanized UNL-graph for "二冊の本" (two books). (b) Tentative frenchized UNL-graph for "deux livres" (two books).

11 This happens not only between Japanese and western languages, but also between French and English: eg. une pièce de blé → a wheat field, une pièce de théâtre → a play.
12 At present, new CQ UWs are made by indicating only some modifiable nouns, but this should be completed by labels coming from Mel'chuk's labels in the "Dictionnaire explicatif et combinatoire du français contemporain (DEC)" (1999, Montréal, UdM Press). In the DEC, a word is analyzed from 5 points of view: general morphosyntax, semantics, syntactic combinatorics, lexical co-occurrence, and phraseology. The analysis of the lexical co-occurrences is made by using 60 labels corresponding to as many lexico-semantic functions (FLs) such as Magn, Anti-Magn, Mult, Sing, etc. Magn(X) is "very X", Mult(X) is "a regular quantity of X" and Sing(X) is "a regular quantum of X". Values of FLs are subsets of lexemes, ordered by degree of intensity of the relation. For example, Magn(fever) = {high, strong; horse}, Mult(fish) = {shoal, school}, and Sing(wine) = {glass, bottle, cask, liter…}. When possible, we will use these labels instead of the above labels such as "CQ-concrete nouns". Note that this is not possible in cases where two or more Japanese counters corresponding to different measures can apply to the same nominal concept, but don't exist in English: using only the FL label would lead to a loss of information and to the impossibility of exact translation. Examples:
CQ-tou = [qua(mod(icl>animal, Magn), number]
CQ-piki = [qua(mod(icl>animal, Anti-Magn), number]

Table 3: Positions of numerical phrases in Japanese

Morphology | Japanese sentence and English translation | Word order and word-to-word correspondence to the English translation
Numerical word and CQs | 本を二冊買いました (I bought two books.) | hon = book, wo = postposition(ϕ), ni = 2, satsu = ϕ, kaimashita = bought
Numerical word + CQs + の (no, of) + Noun | 二冊の本を買いました (I bought two books.) | ni = 2, satsu = ϕ, no = postposition(ϕ), hon = books, wo = postposition(ϕ), kaimashita = bought
Noun + Numerical word + CQs | 本二冊買いました (I bought two books.) | hon = books, ni = 2, satsu = ϕ, kaimashita = bought
Numerical word + CQs | 本を買いました, 二冊 (I bought books, two.) | hon = books, wo = postposition(ϕ), kaimashita = bought, , = comma, ni = 2, satsu = ϕ

3 Association of a numerical phrase with its host phrase
There are two different aspects concerning the floating quantifier behaviour in Japanese [Miyagawa (1989)]. Firstly, the problem we have encountered in the process of Japanese-French MT lies in the fact that Japanese quantifiers can be freely positioned among the phrase units in a sentence. The "Numerical word + CQ + の (no, of) + Noun" type can be split into the CQ phrase and the «Noun» part, in which case the CQ phrase behaves like an adverb before the predicative verb in the sentence. Hence, three types of expressions are possible for the same meaning [Miyagawa (1989)]. Standardization of a floating CQ position consists in determining the CQ phrase and its host phrase, when they are separated in a sentence. In fact, the floating quantifier phenomenon also exists in French, although its linguistic behaviour is different13 from the Japanese case. Hence, we need information on the modifiable nouns of each quantifier in order to find its host noun phrase. Secondly, there is a risk of generating meaningless expressions in the Japanese translation output in some cases, when the association condition between a floating CQ and its host phrase is not given. For instance, "3kgの子豚がいました" (3kg-no kobuta-ga imashita) (There was a 3kg piglet.) is acceptable as a Japanese sentence, but "子豚が3kgいました" (kobuta-ga 3kg imashita)*14 does not make sense, because «子豚 (kobuta, piglet)» means only a live piglet and co-occurs with "いました" (there was), but "3kg" cannot15. Hence, to avoid a machine translation output such as "子豚が3kgいました" (observed), supplementary information on "子豚", on the verb "いる" (iru, there is, or exists), and on how to use that information is necessary. For that reason, we also use a UNL-jp dictionary, which enables us to describe semantic co-occurrence information between words (here, Japanese lemmas). In order to find the host phrase of a floating CQ, that is, to get the same translation results for sentences which are morphologically different but have the same meaning,

13 Floating CQs in French are "tous", "toutes", etc.; number and gender agreement is obligatory between the two phrases [Miyagawa (1989), Bobaljik (2001)], whereas there is neither number nor gender for common nouns in Japanese.
14 子豚が3kgいました*, For the piglet, there were 3 kg*.
15 There are two verbs expressing "existence" or "presence" in Japanese: "いる (iru)" for human beings and animals, and "ある (aru)" for things.

we add some information to "aoj", mentioned above in the square. The descriptions for いる (iru) and ある (aru) are as follows:
いる (iru) : there-be(obj>animal)
いる (iru) : there-be(obj>person)
ある (aru) : there-be(obj>thing)
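As a rough illustration of how such co-occurrence descriptions can be used, the sketch below (our own simplification with invented type labels, not the UNL-jp dictionary format) checks whether a candidate host noun satisfies the selectional restriction of the existence verb before a floating CQ is attached to it.

# Toy selectional restrictions: which semantic types each existence verb accepts.
# The type assignments below are illustrative assumptions.
VERB_RESTRICTIONS = {"いる": {"animal", "person"}, "ある": {"thing"}}
NOUN_TYPE = {"子豚": "animal", "本": "thing", "3kg": "measure"}

def can_host(noun, verb):
    """Return True if `noun` may serve as the argument of existence verb `verb`."""
    return NOUN_TYPE.get(noun) in VERB_RESTRICTIONS.get(verb, set())

# "子豚が ... いました": the piglet is a valid host for いる, while "3kg" is not,
# so the floating quantifier "3kg" must attach to 子豚 rather than stand alone.
print(can_host("子豚", "いる"))  # True
print(can_host("3kg", "いる"))   # False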

4 Recognition of quantifiers/classifiers and phraseology
The Type (a) CQs mentioned above come from Phrase Book II and the Tori Bank16 (see Annex 1), while referring to existing weights and measures dictionaries17. Phrase Book II includes basic CQs which were manually or semi-automatically collected from journals, novels, numerous articles on the Web, etc. in French and Japanese18. To extend this, we are using the "Cesselin" Japanese-French dictionary19 and the "Tangorin" Japanese-English dictionary20, in which we have annotated some headwords as potential CQs, according to the originally given indications21. For Type (b), it is laborious to pin down phrasemes22 in raw data. Eg. une poignée de sable (a handful of sand), une pointe d'ironie (a touch of irony), un pouce de terre (a handful of soil). French and English phrasemes are, however, in many cases composed of "Number + Noun + preposition (de, of) + Noun without article". The Type (b) CQs in Phrase Book II have been collected from a parallel corpus according to the frequency of polylexical expressions, by using software that can produce a list of keywords in context23. We have filtered the collected data as CQs by checking them against the UWs in the dictionary.

5 Specification of the classifiers/quantifiers dictionary
We anticipate that our CQs dictionary will include about 8000 entries for each language, according to a manual count on a 1% (8269 entries) random sample from the Cesselin dictionary (its total number of entries is 826,970). At present, our CQs dictionary contains 3000 entries. The specification (microstructure) of its entries is as follows:

16 Tori Bank is a sentence corpus which was developed at Tottori University in Japan in 2007. http://unicorn.ike.tottori-u.ac.jp/toribank/about_toribank.html
17 Cassell's French-English, English-French dictionary: with appendices of proper names, French coins, weights, and measures with conversion tables.
18 At present, the total number of registered entries is about 2000 for the Type (a) CQs and 1000 for Type (b) CQs, and it is becoming larger day by day.
19 The Cesselin is a printed dictionary published in 1939 and 1957 in Japan. It has been reprocessed into a numeric version equipped with a search engine by Mathieu Mangeot-Nagata in 2015 [Mangeot-Nagata (2016)]: https://jibiki.fr
20 http://tangorin.com/
21 Eg. ken (軒) in the Cesselin (English translations have been added by us): ken (軒) n.m. Avant-toit, f. Maison. spé: s'emploie pour compter les maisons (special: used to count houses). 十二軒 (Jyû ni ken, 12 houses) douze maisons, 二軒目です (C'est la deuxième maison, It's the second house). けん ken 軒 in the Tangorin dictionary: suffix / counter: 1. counter for buildings (esp. houses). 彼女は鳥かごを軒からつるした。 She hung the cage from the eaves. 彼の叔父は家を十軒も持っている。 His uncle owns no fewer than ten houses.
22 By "phraseme" we mean a set phrase, an idiomatic phrase, a polylexical expression, etc.
23 http://en.wikipedia.org/wiki/Sketch_Engine

Table 4: Type (b) CQs in Phrase Book II: "pointe"

French word | Examples | Source | Japanese translation | English translation
Pointe | une pointe d'ironie mal placée | J.L. Carré | 場違いの皮肉をちくりと | the tip of, a hint of, a note of, a trace of
| relever la sauce avec une pointe d'ail | Livre de cuisine | ソースにニンニクをちょっときかせる | pick up the sauce with a hint of garlic
| avec une pointe d'agacement dans la voix | T. Jonquet | 声にすこし苦しみをにじませて | with a hint of irritation in the voice

Table 5: KWIC of “pointe” from Sketch Engine

doc#357 | qui marque le déclin définitif de cette | pointe | de poussée et de sécrétions des hormones
doc#397 | la sierra Pacaraima, qui constituent une | pointe | avancée du Sertao brésilien. En janvier
doc#457 | de nouveauté, un soupçon de douceur, une | pointe | d'exotisme : commence par te mettre dans
doc#517 | Tafer ne sont capables d'évoluer seuls en concédant une | pointe | . Arles - Marseille En

(a) Possible UNL-graph for ”Season the sauce with a hint of garlic.” (b) UNL-graph for ”2冊の本を買いました”

Figure 2: Two UNL-graphs representing sentences containing CQs

Table 6: Description of "pointe"

Items | Description for "pointe"
1. Identification number | XX
2. Keywords and class | pointe (n.)
3. English sentence | Season the sauce with a hint of garlic
4. French sentence | relever la sauce avec une pointe d'ail
5. Japanese sentence | ソースにニンニクをちょっときかせる
6. Source | Royal
7. UNL annotation | agt(season(agt>person, obj>dish, icl>action>thing).@entry.@imperative, you) obj(season(agt>person, obj>dish, icl>action>thing).@entry.@imperative, sauce(icl>cooking).@def) met(season(agt>person, obj>dish, icl>action>thing).@entry.@imperative, garlic(icl>cooking)) qua(garlic(icl>cooking), a hint of(icl>quantity))
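The microstructure above can be mirrored directly in a small record type; the sketch below is a hypothetical rendering of such an entry (the field names are ours), intended only to show how the seven items of Tables 6 and 7 could be stored for programmatic access.

from dataclasses import dataclass, field
from typing import List

@dataclass
class CQEntry:
    """One entry of the CQs dictionary, following the 7-item microstructure."""
    identification_number: str
    keywords_and_class: str
    english_sentence: str
    french_sentence: str
    japanese_sentence: str
    source: str
    unl_annotation: List[str] = field(default_factory=list)

pointe = CQEntry(
    identification_number="XX",
    keywords_and_class="pointe (n.)",
    english_sentence="Season the sauce with a hint of garlic",
    french_sentence="relever la sauce avec une pointe d'ail",
    japanese_sentence="ソースにニンニクをちょっときかせる",
    source="Royal",
    unl_annotation=["qua(garlic(icl>cooking), a hint of(icl>quantity))"],
)
print(pointe.keywords_and_class)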

Perspectives and Conclusion
We have studied the methodology for phraseology treatment in MT systems while developing a French-Japanese-English parallel corpus, and have learned that deeper linguistic analysis [Petit (2004), Gouverneur (2005)] is necessary for the CQs dictionary description. The corpus will be made freely accessible, so that software developers can use it. It should also be helpful for learners of languages, because it covers lexico-semantic information which cannot yet be found in any bilingual dictionary. We intend to produce a bilingual sentence-aligned corpus processing tool that shows corresponding (chunks of) words between the two languages on demand by character blinking, or shows the meaning of nouns or verbs in a sentence without any ambiguity by interpreting the UNL annotations. A prototype has already been presented by a Ph.D student in his thesis [Chenon (2005)].

References

Bobaljik, J. D. (2001). Floating quantifiers: Handle with care. Mouton, France.

Boitet, C. and Tomokiyo, M. (1995). Ambiguity and ambiguity labelling : towards ambiguity data bases. In Mitkov, R., editor, Proc. of RANLP-95 (Recent Advances in Natural Language Processing), Bulgaria.

Boitet, C. and Tomokiyo, M. (1996). On the formal definition of ambiguity and related concepts, leading to an ambiguity-labelling scheme. In Blanc, É. and Boitet, C., editors, Actes de l'atelier post-COLING Multimedia Interactive Disambiguation / Désambiguïsation Interactive Multimedia (MIDDIM-96), Le Col de Porte, France. MIDDIM-96 Post-COLING Seminar, GETA.

Chenon, C. (2005). Vers une meilleure utilisabilité des mémoires de traductions, fondée sur un alignement sous-phrastique. PhD thesis, Université Joseph Fourier, Grenoble.

Table 7: Description of 冊

Items | Description for "冊"
1. Identification number | XX
2. Keywords and class | satsu (CQ-books, notebooks, albums)
3. English sentence | I bought 2 books.
4. French sentence | J'ai acheté 2 livres.
5. Japanese sentence | 2冊の本を買いました。
6. Source | Royal
7. UNL annotation | agt(buy(icl>do).@entry.@past, I) obj(buy(icl>do).@entry.@past, book(icl>thing).@pl) qua(book(icl>thing).@pl, :01) mod:01(CQ-satsu-books-notebooks-albums(icl>CQ).@entry.@eld, 2)

Gouverneur, C. (2005). The phraseological patterns of high-frequency verbs in advanced English for general purposes. In Proc. Sixth International Conference on Teaching and Language Corpora (TaLC6), pages 159–163, Louvain-La-Neuve. Université Catholique de Louvain (UCL).

Mangeot-Nagata, M. (2016). Collaborative construction of a good quality, broad coverage and copyright-free Japanese-French dictionary. International Journal of Lexicography.

Mari, A. (2011). Quantificateurs polysémiques. Mémoire de HDR (habilitation à diriger des recherches), Université IV Paris-Sorbonne.

Miyagawa, S. (1989). Structure and case marking in Japanese. New York.

Petit, G. (2004). La polysémie des séquences polylexicales, syntaxe et sémantique. HAL, 1(5):91–114.

Sérasset, G. and Boitet, C. (1999). UNL-French deconversion as transfer from an interlingua with possible quality enhancement through offline human interaction. In Proc. Machine Translation Summit VII, Singapore.

Tomokiyo, M. and Axtmeyer, M. (1996). Experiments in ambiguity labelling of dialogue transcriptions. In Boitet, C., editor, Proc. MIDDIM-96 Post-COLING Seminar, Le Sappey en Chartreuse. GETA.

Tomokiyo, M. and Boitet, C. (2016). Corpus and dictionary development for classi- fiers/quantifiers towards a French-Japanese Machine Translation. In Proc. COLING-2016 CogALex workshop, Osaka. ACL.

Uchida, H., Zhu, M., and Della Senta, T. G. (2006). Universal Networking Language (UNL). UNDL Foundation, Japan.

Wisniewski, G., Singh, A. K., Segal, N., and Yvon, F. (2013). Un corpus d'erreurs de traduction. In Proc. of TALN-2013, Les Sables d'Olonne, France. ATALA.

Annex 「鳥バンク」 (examples from the Tori Bank)
Eg. 「塁 (rui, base)」, 「寸 (sun, approx. 3.03 cm)」
AC00046100 P11:二塁走者の生還を許し:VP@28:allowing the runner to score from second:VP
AC00046100 P4:一塁へ悪投し、:VP@7:threw wild to first:VP
AC01599600 C6:一寸先も見え:CL@27:we could not see an inch ahead:CL
AC01599600 P6:一寸先も見え:VP@40:see an inch ahead:VP

Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy

Zi Long, Ryuichiro Kimura, Takehito Utsuro
Grad. Sc. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573, Japan
Tomoharu Mitsuhashi
Japan Patent Information Organization, 4-1-7, Tokyo, Koto-ku, Tokyo, 135-0016, Japan
Mikio Yamamoto
Grad. Sc. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573, Japan

Abstract
Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot handle a larger vocabulary because the training complexity and decoding complexity proportionally increase with the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. In this paper, we propose to select phrases that contain out-of-vocabulary words using the statistical approach of branching entropy. This allows the proposed NMT system to be applied to a translation task of any language pair without any language-specific knowledge about technical term identification. The selected phrases are then replaced with tokens during training and post-translated by the phrase translation table of SMT. Evaluation on Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English and English-to-Japanese patent sentence translation proved the effectiveness of phrases selected with branching entropy, where the proposed NMT model achieves a substantial improvement over a baseline NMT model without our proposed technique. Moreover, the number of under-translation errors made by the baseline NMT model without our proposed technique is reduced to around half by the proposed NMT model.

1 Introduction
Neural machine translation (NMT), a new approach to machine translation, has achieved promising results (Bahdanau et al., 2015; Cho et al., 2014; Jean et al., 2014; Kalchbrenner and Blunsom, 2013; Luong et al., 2015a,b; Sutskever et al., 2014). An NMT system builds a simple large neural network that reads the entire input source sentence and generates an output translation. The entire neural network is jointly trained to maximize the conditional probability of the correct translation of a source sentence with a bilingual corpus. Although NMT offers many advantages over traditional phrase-based approaches, such as a small memory footprint and simple decoder implementation, conventional NMT is limited when it comes to larger vocabularies.

Figure 1: Example of translation errors when translating patent sentences with technical terms using NMT

This is because the training complexity and decoding complexity proportionally increase with the number of target words. Words that are out of vocabulary are represented by a single “unk” token in translations, as illustrated in Figure 1. The problem becomes more serious when translating patent documents, which contain several newly introduced technical terms. There have been a number of related studies that address the vocabulary limitation of NMT systems. Jean et al. (2014) provided an efficient approximation to the softmax function to accommodate a very large vocabulary in an NMT system. Luong et al. (2015b) proposed annotating the occurrences of the out-of-vocabulary token in the target sentence with positional information to track its alignments, after which they replace the tokens with their translations using simple word dictionary lookup or identity copy. Li et al. (2016) proposed replacing out-of-vocabulary words with similar in-vocabulary words based on a similarity model learnt from monolingual data. Sennrich et al. (2016) introduced an effective approach based on encoding rare and out-of-vocabulary words as sequences of subword units. Luong and Manning (2016) provided a character-level and word-level hybrid NMT model to achieve an open vocabulary, and Costa-Jussà and Fonollosa (2016) proposed an NMT system that uses character-based embeddings. However, these previous approaches have limitations when translating patent sentences. This is because their methods only focus on addressing the problem of out-of-vocabulary words even though the words are parts of technical terms. It is obvious that a technical term should be considered as one word that comprises components that always have different meanings and translations when they are used alone. An example is shown in Figure 1, where the Japanese word “ ” (bridge) should be translated to the Chinese word “ ” when included in the technical term “bridge interface”; however, it is always translated as “ ”. To address this problem, Long et al. (2016) proposed extracting compound nouns as technical terms and replacing them with tokens. These compound nouns are then post-translated with the phrase translation table of the statistical machine translation (SMT) system. However, in their work on Japanese-to-Chinese patent translation, Japanese compound nouns are identified using several heuristic rules that rely on specific linguistic knowledge based on part-of-speech tags from morphological analysis, and thus, the NMT system has limited application to the translation task of other language pairs. In this paper, based on the approach of training an NMT model on a bilingual corpus wherein technical term pairs are replaced with tokens as in Long et al. (2016), we aim to select phrase pairs using the statistical approach of branching entropy; this allows the proposed technique to be applied to the translation task of any language pair without needing specific language knowledge to formulate rules for technical term identification. Based on the results of our experiments on several language pairs, Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English and English-to-Japanese, the

proposed NMT model achieves a substantial improvement over a baseline NMT model without our proposed technique. Our proposed NMT model achieves an improvement of 1.2 BLEU points over a baseline NMT model when translating Japanese sentences into Chinese, and an improvement of 1.7 BLEU points when translating Chinese sentences into Japanese. Our proposed NMT model achieves an improvement of 1.1 BLEU points over a baseline NMT model when translating Japanese sentences into English, and an improvement of 1.4 BLEU points when translating English sentences into Japanese. Moreover, the number of under-translation errors1 made by the baseline NMT model without our proposed technique is reduced to around half by the proposed NMT model.

2 Neural Machine Translation

NMT uses a single neural network trained jointly to maximize the translation performance (Bahdanau et al., 2015; Cho et al., 2014; Kalchbrenner and Blunsom, 2013; Luong et al., 2015a; Sutskever et al., 2014). Given a source sentence $x = (x_1, \ldots, x_N)$ and a target sentence $y = (y_1, \ldots, y_M)$, an NMT model uses a neural network to parameterize the conditional distributions

$$p(y_z \mid y_{<z}, x)$$

for $1 \le z \le M$. Consequently, it becomes possible to compute and maximize the log probability of the target sentence given the source sentence as

$$\log p(y \mid x) = \sum_{z=1}^{M} \log p(y_z \mid y_{<z}, x)$$

In this paper, we use an NMT model similar to that used by Bahdanau et al. (2015), which consists of an encoder of a bidirectional long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) and another LSTM as decoder. In the model of Bahdanau et al. (2015), the encoder consists of forward and backward LSTMs. The forward LSTM reads the source sentence as it is ordered (from $x_1$ to $x_N$) and calculates a sequence of forward hidden states, while the backward LSTM reads the source sentence in the reverse order (from $x_N$ to $x_1$), resulting in a sequence of backward hidden states. The decoder then predicts target words using not only a recurrent hidden state and the previously predicted word but also a context vector, as follows:

$$p(y_z \mid y_{<z}, x) = g(y_{z-1}, s_{z-1}, c_z)$$

where $s_{z-1}$ is an LSTM hidden state of the decoder, and $c_z$ is a context vector computed from both the forward hidden states and the backward hidden states, for $1 \le z \le M$.
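As a minimal numerical illustration of the training objective above (not the authors' implementation, and with a toy distribution standing in for the network $g$), the sketch below accumulates $\log p(y \mid x)$ as the sum of per-step conditional log-probabilities.

import math

def sentence_log_prob(target_tokens, step_distributions):
    """Sum log p(y_z | y_<z, x) over the target sentence.

    `step_distributions[z]` stands in for the network output at step z:
    a dict mapping candidate target words to probabilities.
    """
    total = 0.0
    for z, token in enumerate(target_tokens):
        total += math.log(step_distributions[z][token])
    return total

# Toy example: a 3-word target sentence and made-up per-step distributions.
y = ["the", "bridge", "interface"]
dists = [
    {"the": 0.6, "a": 0.4},
    {"bridge": 0.5, "unk": 0.5},
    {"interface": 0.7, "unit": 0.3},
]
print(sentence_log_prob(y, dists))  # log 0.6 + log 0.5 + log 0.7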

3 Phrase Pair Selection using Branching Entropy

Branching entropy has been applied to text segmentation (e.g., Jin and Tanaka-Ishii (2006)) and key phrase extraction (e.g., Chen et al. (2010)). In this work, we use the left/right branching entropy to detect the boundaries of phrases, and thus to select phrase pairs automatically.

1 It is known that NMT models tend to suffer from the problem of under-translation. Tu et al. (2016) proposed coverage-based NMT, which addresses the problem of under-translation.


Hr(w)=−  pr(v)log2 pr(v) v∈V w r where w is the phrase of interest (e.g., “ / ” in the Japanese sentence shown w in Figure 1, which means “bridge interface”), V is a set of words that are adjacent to the l w left of w (e.g., “ ” in Figure 1, which is a Japanese particle) and V r is a set of words that are adjacent to the right of w (e.g., “388” in Figure 1). The probabilities p l(v) and pr(v) are respectively computed as

fv,w fw,v pl(v)= pr(v)= fw fw

where fw is the frequency count of phrase w,andf v,w and fw,v are the frequency counts of sequence “v,w” and sequence “w,v” respectively. According to the definition of branching entropy, when a phrase w is a technical term that is always used as a compound word, both its left branching entropy Hl(w) and right branching entropy H r(w) have high values because many different words, such as particles and numbers, can be adjacent to the phrase. However, the left/right branching entropy of substrings of w have low values because words contained in w are always adjacent to each other. 3.2 Selecting Phrase Pairs

3.2 Selecting Phrase Pairs
Given a parallel sentence pair $\langle S_s, S_t\rangle$, all n-gram phrases of the source sentence $S_s$ and the target sentence $S_t$ are extracted and aligned using the phrase translation table and the word alignment of SMT, according to the approaches described in Long et al. (2016). Next, a phrase translation pair $\langle t_s, t_t\rangle$ obtained from $\langle S_s, S_t\rangle$ that satisfies all the following conditions is selected as a phrase pair and is extracted (a sketch of this selection step is given after the conditions below):

(1) Either $t_s$ or $t_t$ contains at least one out-of-vocabulary word.2

(2) Neither $t_s$ nor $t_t$ contains predetermined stop words.

(3) The entropies $H_l(t_s)$, $H_l(t_t)$, $H_r(t_s)$ and $H_r(t_t)$ are larger than a lower bound, while the left/right branching entropies of the substrings of $t_s$ and $t_t$ are lower than or equal to the lower bound.
Here, the maximum length of a phrase as well as the lower bound of the branching entropy are tuned with the validation set.3 All the selected source-target phrase pairs are then used in the next section as phrase pairs.4

2 One of the major focuses of this paper is the comparison between the proposed method and Luong et al. (2015b). Since Luong et al. (2015b) proposed to pre-process and post-translate only out-of-vocabulary words, we focus only on compound terms which include at least one out-of-vocabulary word.
3 Throughout the evaluations on patent translation of both language pairs of Japanese-Chinese and Japanese-English, the maximum length of the extracted phrases is tuned as 7. The lower bounds of the branching entropy are tuned as 5 for patent translation of the language pair of Japanese-Chinese, and 8 for patent translation of the language pair of Japanese-English. We also tune the number of stop words using the validation set, and use the 200 most-frequent Japanese and Chinese words as stop words for the language pair of Japanese-Chinese, and the 100 most-frequent Japanese morphemes and English words as stop words for the language pair of Japanese-English.
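The sketch below (our own illustration; the threshold values, argument layout and helper names are assumptions, and branching_entropies is the helper sketched in Section 3.1 above) strings the three conditions together for one candidate phrase pair.

def select_phrase_pair(t_s, t_t, corpora, vocab, stop_words,
                       entropy_bound=5.0, max_len=7):
    """Return True if the candidate pair <t_s, t_t> passes conditions (1)-(3).

    `t_s`/`t_t` are token lists, `corpora` = (source_sentences, target_sentences),
    `vocab` = (source_vocab, target_vocab). The substring checks of condition
    (3) are omitted here for brevity.
    """
    src_corpus, tgt_corpus = corpora
    src_vocab, tgt_vocab = vocab
    if len(t_s) > max_len or len(t_t) > max_len:
        return False
    # (1) at least one out-of-vocabulary word on either side
    if all(w in src_vocab for w in t_s) and all(w in tgt_vocab for w in t_t):
        return False
    # (2) no predetermined stop words
    if any(w in stop_words for w in t_s + t_t):
        return False
    # (3) both sides have high left and right branching entropy
    for phrase, corpus in ((t_s, src_corpus), (t_t, tgt_corpus)):
        h_l, h_r = branching_entropies(corpus, phrase)
        if h_l <= entropy_bound or h_r <= entropy_bound:
            return False
    return True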

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 230 s t Figure 2: NMT training after replacing phrase pairs with token pairs T i ,Ti (i =1, 2,...)

next section as phrase pairs.4
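A minimal sketch of the selection conditions in Section 3.2 is given below. It assumes the branching-entropy helper defined earlier and treats the candidate aligned n-gram pairs, the vocabulary and the stop-word list as given; all names are illustrative, not the authors' code.

```python
def select_phrase_pairs(candidates, src_corpus, tgt_corpus,
                        vocab, stop_words, max_len=7, lower_bound=5.0):
    """Keep aligned phrase pairs (t_s, t_t) satisfying conditions (1)-(3)."""
    selected = []
    for t_s, t_t in candidates:                       # candidate aligned n-gram pairs
        if len(t_s) > max_len or len(t_t) > max_len:
            continue
        # (1) at least one out-of-vocabulary word on either side
        if all(w in vocab for w in list(t_s) + list(t_t)):
            continue
        # (2) no predetermined stop words
        if any(w in stop_words for w in list(t_s) + list(t_t)):
            continue
        # (3) high branching entropy for the phrase itself ...
        hl_s, hr_s = branching_entropies(src_corpus, tuple(t_s))
        hl_t, hr_t = branching_entropies(tgt_corpus, tuple(t_t))
        if min(hl_s, hr_s, hl_t, hr_t) <= lower_bound:
            continue
        # ... and low branching entropy for every proper substring
        def substrings_low(corpus, phrase):
            subs = [tuple(phrase[i:j]) for i in range(len(phrase))
                    for j in range(i + 1, len(phrase) + 1)
                    if (i, j) != (0, len(phrase))]
            return all(max(branching_entropies(corpus, s)) <= lower_bound for s in subs)
        if substrings_low(src_corpus, t_s) and substrings_low(tgt_corpus, t_t):
            selected.append((t_s, t_t))
    return selected
```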

4 NMT with a Large Phrase Vocabulary

In this work, the NMT model is trained on a bilingual corpus in which phrase pairs are replaced with tokens. The NMT system is then used as a decoder to translate the source sentences, and the tokens in its output are replaced with phrases translated by SMT.

4.1 NMT Training after Replacing Phrase Pairs with Tokens

Figure 2 illustrates the procedure for training the model with parallel patent sentence pairs in which phrase pairs are replaced with phrase token pairs ⟨T_1^s, T_1^t⟩, ⟨T_2^s, T_2^t⟩, and so on.

In step 1 of Figure 2, source-target phrase pairs that contain at least one out-of-vocabulary word are selected from the training set using the branching entropy approach described in Section 3.2. As shown in step 2 of Figure 2, in each of the parallel patent sentence pairs, occurrences of the phrase pairs ⟨t_1^s, t_1^t⟩, ⟨t_2^s, t_2^t⟩, ..., ⟨t_k^s, t_k^t⟩ are then replaced with the token pairs ⟨T_1^s, T_1^t⟩, ⟨T_2^s, T_2^t⟩, ..., ⟨T_k^s, T_k^t⟩. The phrase pairs ⟨t_1^s, t_1^t⟩, ⟨t_2^s, t_2^t⟩, ..., ⟨t_k^s, t_k^t⟩ are numbered in the order of occurrence of the source phrases t_i^s (i = 1, 2, ..., k) in each source sentence S_s. Note that the same token pairs ⟨T_1^s, T_1^t⟩, ⟨T_2^s, T_2^t⟩, ... are used throughout all the parallel sentence pairs ⟨S_s, S_t⟩. Therefore, for example, in every source patent sentence S_s, the phrase t_1^s that appears earlier than the other phrases in S_s is replaced with T_1^s. We then train the NMT model on the bilingual corpus in which the phrase pairs are replaced by the token pairs ⟨T_i^s, T_i^t⟩ (i = 1, 2, ...), and obtain an NMT model in which the phrases are represented as tokens.⁵
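The step-2 replacement can be sketched as follows; this is our own illustration, assuming that phrase occurrences are found by simple sub-sequence matching and that token pairs are numbered by the order of the source phrases in each sentence.

```python
def find_phrase(sent, phrase):
    """Return the first start index of `phrase` in `sent`, or -1 if absent."""
    n = len(phrase)
    for i in range(len(sent) - n + 1):
        if sent[i:i + n] == list(phrase):
            return i
    return -1

def replace_with_tokens(src, tgt, phrase_pairs):
    """Replace occurrences of selected phrase pairs with numbered tokens.

    Returns the tokenized sentence pair and the mapping index -> phrase pair.
    """
    # order phrase pairs by where their source side occurs in the source sentence
    occurring = [(find_phrase(src, ps), list(ps), list(pt)) for ps, pt in phrase_pairs]
    occurring = sorted(o for o in occurring if o[0] >= 0 and find_phrase(tgt, o[2]) >= 0)
    mapping = {}
    for idx, (pos, ps, pt) in enumerate(occurring, start=1):
        s_tok, t_tok = f"<T{idx}_s>", f"<T{idx}_t>"
        i = find_phrase(src, ps)
        src = src[:i] + [s_tok] + src[i + len(ps):]
        j = find_phrase(tgt, pt)
        tgt = tgt[:j] + [t_tok] + tgt[j + len(pt):]
        mapping[idx] = (ps, pt)
    return src, tgt, mapping
```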

⁴We sampled 200 Japanese-Chinese sentence pairs, manually annotated compounds, and evaluated the approach of phrase extraction with the branching entropy. Based on the result, (a) 25% of the extracted phrases are correct, (b) 20% subsume correct compounds as their substrings, (c) 18% are substrings of correct compounds, (d) 22% subsume substrings of correct compounds but are neither (b) nor (c), and (e) the remaining 15% are error strings, such as functional compounds and fragmental strings consisting of numerical expressions.
⁵We treat the NMT system as a black box, and the strategy we present in this paper could be applied to any NMT system (Bahdanau et al., 2015; Cho et al., 2014; Kalchbrenner and Blunsom, 2013; Luong et al., 2015a; Sutskever et al., 2014).

Figure 3: NMT decoding with tokens "T_i^s" (i = 1, 2, ...) and the SMT phrase translation

4.2 NMT Decoding and SMT Phrase Translation

Figure 3 illustrates the procedure for producing target translations by decoding an input source sentence with the method proposed in this paper.

In step 1 of Figure 3, given an input source sentence, we first generate its translation by decoding with the SMT translation model. Next, as shown in step 2 of Figure 3, we automatically extract phrase pairs by branching entropy according to the procedure of Section 3.2, where the input sentence and its SMT translation are treated as a parallel sentence pair. Phrase pairs that contain at least one out-of-vocabulary word are extracted and replaced with phrase token pairs ⟨T_i^s, T_i^t⟩ (i = 1, 2, ...). Consequently, we obtain an input sentence in which the tokens "T_i^s" (i = 1, 2, ...) mark the positions of the phrases, together with a list of SMT phrase translations of the extracted Japanese phrases. Next, as shown in step 3 of Figure 3, the source Japanese sentence with tokens is translated using the NMT model trained according to the procedure described in Section 4.1. Finally, in step 4, we replace the tokens "T_i^t" (i = 1, 2, ...) in the target translation with the SMT phrase translations.
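The four decoding steps can be summarized in a short pipeline sketch. It reuses the replace_with_tokens helper shown earlier; smt_translate, nmt_translate, smt_phrase_table and extract_phrase_pairs stand in for the Moses and NMT components and are assumptions of this illustration, not interfaces from the paper.

```python
import re

def translate_with_phrase_tokens(src, smt_translate, nmt_translate,
                                 smt_phrase_table, extract_phrase_pairs):
    """Steps 1-4 of Figure 3: SMT pre-translation, token replacement,
    NMT decoding, and token substitution."""
    # Step 1: obtain an SMT translation of the input sentence
    smt_out = smt_translate(src)
    # Step 2: extract phrase pairs from (src, smt_out) and replace them with tokens
    phrase_pairs = extract_phrase_pairs(src, smt_out)
    src_tok, _, mapping = replace_with_tokens(src, smt_out, phrase_pairs)
    # Step 3: decode the tokenized source with the NMT model
    nmt_out = nmt_translate(src_tok)
    # Step 4: substitute each target token with the SMT translation of its phrase
    result = []
    for w in nmt_out:
        m = re.fullmatch(r"<T(\d+)_t>", w)
        if m and int(m.group(1)) in mapping:
            src_phrase, _ = mapping[int(m.group(1))]
            result.extend(smt_phrase_table(src_phrase))
        else:
            result.append(w)
    return result
```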

5 Evaluation

5.1 Patent Documents

Japanese-Chinese parallel patent documents were collected from the Japanese patent documents published by the Japanese Patent Office (JPO) during 2004-2012 and the Chinese patent documents published by the State Intellectual Property Office of the People's Republic of China (SIPO) during 2005-2010. From the collected documents, we extracted 312,492 patent families, and the method of Utiyama and Isahara (2007) was applied⁶ to the text of the extracted patent families to align the Japanese and Chinese sentences. The Japanese sentences were segmented into sequences of morphemes using the Japanese morphological analyzer MeCab⁷ with

⁶Herein, we used a Japanese-Chinese translation lexicon comprising around 170,000 Chinese entries.
⁷http://mecab.sourceforge.net/

Table 1: Statistics of datasets

                     training set    validation set    test set
Japanese-Chinese     2,877,178       1,000             1,000
Japanese-English     1,167,198       1,000             1,000

Table 2: Automatic evaluation results (BLEU)

System                                                 ja → ch   ch → ja   ja → en   en → ja
Baseline SMT (Koehn et al., 2007)                        52.5      57.1      32.3      32.1
Baseline NMT                                             56.5      62.5      39.9      41.5
NMT with PosUnk model (Luong et al., 2015b)              56.9      62.9      40.1      41.9
NMT with phrase translation by SMT
  (phrase pairs selected with branching entropy)         57.7      64.2      40.3      42.9

the lexicon IPAdic,⁸ and the Chinese sentences were segmented into sequences of words using the Chinese morphological analyzer Stanford Word Segmenter (Tseng et al., 2005) trained on the Chinese Penn Treebank. In this study, the Japanese-Chinese parallel patent sentence pairs were ordered in descending order of sentence-alignment score, and we used the topmost 2.8M pairs, whose Japanese sentences contain fewer than 40 morphemes and whose Chinese sentences contain fewer than 40 words.⁹

The Japanese-English patent documents were provided at the NTCIR-7 workshop (Fujii et al., 2008); they were collected from 10 years of unexamined Japanese patent applications published by the Japanese Patent Office (JPO) and 10 years of patent grant data published by the U.S. Patent & Trademark Office (USPTO) during 1993-2000. The numbers of documents are approximately 3,500,000 for Japanese and 1,300,000 for English. From these document sets, patent families were automatically extracted and the fields of "Background of the Invention" and "Detailed Description of the Preferred Embodiments" were selected. Then, the method of Utiyama and Isahara (2007) was applied to the text of those fields, and the Japanese and English sentences were aligned. The Japanese sentences were segmented into sequences of morphemes using the Japanese morphological analyzer MeCab with the morpheme lexicon IPAdic. Similar to the case of the Japanese-Chinese patent documents, out of the provided 1.8M Japanese-English parallel sentences, we used the 1.1M parallel sentences whose Japanese sentences contain fewer than 40 morphemes and whose English sentences contain fewer than 40 words.

5.2 Training and Test Sets

We evaluated the effectiveness of the proposed NMT model at translating the parallel patent sentences described in Section 5.1. Among the selected parallel sentence pairs, we randomly extracted 1,000 sentence pairs for the test set and 1,000 sentence pairs for the validation set; the remaining sentence pairs were used as the training set. Table 1 shows statistics of the datasets.

According to the procedure of Section 3.2, from the Japanese-Chinese sentence pairs of the training set, we collected 426,551 occurrences of Japanese-Chinese phrase pairs, which

⁸http://sourceforge.jp/projects/ipadic/
⁹The proposed NMT model is expected to improve over the baseline NMT even more when translating longer sentences that contain more than 40 morphemes / words, because replacing phrases with tokens also shortens the input sentences, which should help with the known weakness of NMT models when translating long sentences.

Table 3: Human evaluation results of pairwise evaluation (the score ranges from −100 to 100)

System                                                 ja → ch   ch → ja   ja → en   en → ja
Baseline NMT                                             -         -         -         -
NMT with PosUnk model (Luong et al., 2015b)              9         10.5      8         6.5
NMT with phrase translation by SMT
  (phrase pairs selected with branching entropy)         14.5      17        11.5      15.5

are 254,794 types of phrase pairs, with 171,757 unique types of Japanese phrases and 129,071 unique types of Chinese phrases. Within the 1,000 Japanese patent sentences of the Japanese-Chinese test set, 121 occurrences of Japanese phrases were extracted, corresponding to 120 types. Within the 1,000 Chinese patent sentences of the Japanese-Chinese test set, 130 occurrences of Chinese phrases were extracted, corresponding to 130 types.

From the Japanese-English sentence pairs of the training set, we collected 70,943 occurrences of Japanese-English phrase pairs, which are 61,017 types of phrase pairs, with 57,675 unique types of Japanese phrases and 58,549 unique types of English phrases. Within the 1,000 Japanese patent sentences of the Japanese-English test set, 59 occurrences of Japanese phrases were extracted, corresponding to 59 types. Within the 1,000 English patent sentences of the Japanese-English test set, 61 occurrences of English phrases were extracted, corresponding to 61 types.

5.3 Training Details

For the training of the SMT model, including the word alignment and the phrase translation table, we used Moses (Koehn et al., 2007), a toolkit for phrase-based SMT models. We trained the SMT model on the training set and tuned it with the validation set.

For the training of the NMT model, our training procedure and hyperparameter choices were similar to those of Bahdanau et al. (2015). The encoder consists of forward and backward deep LSTM networks, each with three layers and 512 cells per layer. The decoder is a three-layer deep LSTM with 512 cells in each layer. Both the source vocabulary and the target vocabulary are limited to the 40K most frequent morphemes / words in the training set. The size of the word embeddings was set to 512. We ensured that all sentences in a minibatch were roughly the same length. Further training details are as follows: (1) we set the minibatch size to 128; (2) all of the LSTM parameters were initialized with a uniform distribution between -0.06 and 0.06; (3) we used stochastic gradient descent, beginning with a fixed learning rate of 1, trained the model for a total of 10 epochs, and began to halve the learning rate every epoch after the first seven epochs; (4) similar to Sutskever et al. (2014), we rescaled the normalized gradient to ensure that its norm does not exceed 5. We trained the NMT model on the training set; the training time was around two days with these parameters on a single-GPU machine. We compute the branching entropy using the frequency statistics from the training set.
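A minimal PyTorch-style sketch of the optimization settings listed above (SGD with learning rate 1, halving after epoch 7, gradient-norm clipping at 5, uniform initialization in [-0.06, 0.06]). The model and loss below are placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Placeholder network; the real model is a 3-layer bidirectional LSTM encoder
# and a 3-layer LSTM decoder with 512 cells and 512-dimensional embeddings.
model = nn.LSTM(input_size=512, hidden_size=512, num_layers=3)

# (2) uniform initialization in [-0.06, 0.06]
for p in model.parameters():
    nn.init.uniform_(p, -0.06, 0.06)

# (3) plain SGD with an initial learning rate of 1
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

def compute_loss(model, batch):
    # stand-in for the sequence-to-sequence cross-entropy loss
    out, _ = model(batch)
    return out.pow(2).mean()

def train_epoch(batches, epoch):
    # halve the learning rate every epoch after the first seven epochs
    if epoch > 7:
        for group in optimizer.param_groups:
            group["lr"] *= 0.5
    for batch in batches:                      # (1) minibatches of 128 sentence pairs
        optimizer.zero_grad()
        loss = compute_loss(model, batch)
        loss.backward()
        # (4) rescale the gradient so that its norm does not exceed 5
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
```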

5.4 Evaluation Results

We calculated automatic evaluation scores for the translation results using the popular BLEU metric (Papineni et al., 2002). As shown in Table 2, we report the scores obtained with the translations by Moses (Koehn et al., 2007) as the baseline SMT, and the scores obtained with the translations produced by the baseline NMT system without our proposed approach as the baseline NMT. As shown in Table 2, the BLEU score obtained by the proposed NMT

Table 4: Human evaluation results of JPO adequacy evaluation (the score ranges from 1 to 5)

System                                                 ja → ch   ch → ja   ja → en   en → ja
Baseline SMT (Koehn et al., 2007)                        3.5       3.7       3.1       3.2
Baseline NMT                                             4.2       4.3       3.9       4.1
NMT with PosUnk model (Luong et al., 2015b)              4.3       4.3       4.0       4.2
NMT with phrase translation by SMT
  (phrase pairs selected with branching entropy)         4.5       4.6       4.1       4.4

Table 5: Numbers of untranslated morphemes / words in the input sentences (test set)

System                                                 ja → ch   ch → ja   ja → en   en → ja
Baseline NMT                                             89        92        415       226
NMT with phrase translation by SMT
  (phrase pairs selected with branching entropy)         43        45        246       134

model is clearly higher than those of the baselines. Here, as described in Section 3, the lower bound of the branching entropy for phrase pair selection is tuned as 5 throughout the evaluation of the Japanese-Chinese pair and as 8 throughout the evaluation of the Japanese-English pair. Compared with the baseline SMT, the performance gains of the proposed system are approximately 5.2 BLEU points when translating Japanese into Chinese and 7.1 BLEU points when translating Chinese into Japanese, and approximately 10.0 BLEU points when translating Japanese into English and 10.8 BLEU points when translating English into Japanese. Compared with the baseline NMT, the proposed NMT model achieves gains of 1.2 BLEU points on the task of translating Japanese into Chinese and 1.7 BLEU points on the task of translating Chinese into Japanese, and gains of 0.4 BLEU points on the task of translating Japanese into English and 1.4 BLEU points on the task of translating English into Japanese.

Furthermore, we quantitatively compared our study with the work of Luong et al. (2015b). Table 2 compares the proposed NMT model with the PosUnk model, which is the best model proposed by Luong et al. (2015b). The proposed NMT model achieves gains of 0.8 BLEU points when translating Japanese into Chinese and 1.3 BLEU points when translating Chinese into Japanese, and gains of 0.2 BLEU points when translating Japanese into English and 1.0 BLEU points when translating English into Japanese.

We also compared our study with the work of Long et al. (2016). As reported in Long et al. (2017), when translating Japanese into Chinese, the BLEU score of the NMT system of Long et al. (2016) in which all of the selected compound nouns are replaced with tokens is 58.6, the BLEU score of the NMT system in which only compound nouns that contain out-of-vocabulary words are selected and replaced with tokens is 57.4, while the BLEU score of the proposed NMT system of this paper is 57.7. Of all the compound nouns selected by Long et al. (2016), around 22% contain out-of-vocabulary words, of which around 36% share substrings with the phrases selected by branching entropy. The remaining 78% of the compound nouns do not contain out-of-vocabulary words and are considered to contribute to the improvement in BLEU over the proposed method. Based on this analysis, as one of our important future tasks, we will revise the

Figure 4: An example of correct translations produced by the proposed NMT model when addressing the problem of out-of-vocabulary words (Japanese-to-Chinese)

Figure 5: An example of correct translations produced by the proposed NMT model when addressing the problem of under-translation (Chinese-to-Japanese)

procedure in Section 3.2 for selecting phrases by branching entropy, and then incorporate those in-vocabulary compound nouns into the set of phrases selected by branching entropy.

In this study, we also conducted two types of human evaluation according to the work of Nakazawa et al. (2015): pairwise evaluation and JPO adequacy evaluation. In the pairwise evaluation, we compared each translation produced by the baseline NMT with that produced by the proposed NMT model as well as with that of the NMT model with the PosUnk model, and judged which translation is better or whether they are of comparable quality. The score of the pairwise evaluation is defined as

$$\mathrm{score} = 100 \times \frac{W - L}{W + L + T}$$

where W, L and T are the numbers of translations that are better than, worse than, and comparable to the baseline NMT, respectively. The score of the pairwise evaluation ranges from −100 to 100.
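For concreteness, the pairwise score can be computed with a trivial helper (ours, not part of the paper's evaluation toolkit); the win/loss/tie counts below are hypothetical, chosen only to reproduce the 14.5 reported for ja → ch in Table 3.

```python
def pairwise_score(wins, losses, ties):
    """Pairwise evaluation score in [-100, 100] against the baseline NMT."""
    return 100.0 * (wins - losses) / (wins + losses + ties)

# Hypothetical counts over 200 judged sentences: 70 better, 41 worse, 89 comparable
print(pairwise_score(70, 41, 89))  # 14.5
```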

Figure 6: An example of correct translations produced by the proposed NMT model when addressing the problem of out-of-vocabulary words (Japanese-to-English)

Figure 7: An example of correct translations produced by the proposed NMT model when addressing the problem of under-translation (English-to-Japanese)

In the JPO adequacy evaluation, the Chinese translations are evaluated according to the quality evaluation criterion for translated patent documents proposed by the Japanese Patent Office (JPO).¹⁰ The JPO adequacy criterion judges whether or not the technical factors and their relationships in the Japanese patent sentences are correctly translated into Chinese. The Chinese translations are then scored according to the percentage of correctly translated information, where a score of 5 means that all of the information is translated correctly, while a score of 1 means that most of the information is not translated correctly. The score of the JPO adequacy evaluation is defined as the average over all the test sentences. In contrast to the study conducted by Nakazawa et al. (2015), we randomly selected 200 sentence pairs from the test set for human evaluation, and both human evaluations were conducted with only one judgement per sentence. Table 3 and Table 4 show the results of the human evaluation for the baseline SMT, the baseline NMT, the NMT model with the PosUnk model, and the proposed NMT model. We observe that the proposed model achieves the best performance in both the pairwise and JPO adequacy evaluations when we replace the tokens with SMT phrase translations after decoding the source sentence with the tokens.

¹⁰https://www.jpo.go.jp/shiryou/toushin/chousa/pdf/tokkyohonyaku_hyouka/01.pdf (in Japanese)

For the test set, we also counted the numbers of untranslated words in the input sentences. As shown in Table 5, the number of words left untranslated by the baseline NMT is reduced by the proposed NMT model to around 50% in the cases of ja → ch and ch → ja, and to around 60% in the cases of ja → en and en → ja.¹¹ ¹² This is mainly because some of the untranslated source words are out-of-vocabulary and are therefore left untranslated by the baseline NMT. The proposed system extracts those out-of-vocabulary words as parts of phrases and replaces those phrases with tokens before NMT decoding. The phrases are then translated by SMT and inserted into the output translation, which ensures that those out-of-vocabulary words are translated.

Figure 4 compares an example of a correct translation produced by the proposed system with one produced by the baseline NMT. In this example, the baseline translation is erroneous because the Japanese word meaning "Bridgman" is an out-of-vocabulary word and is erroneously translated into the "unk" token. The proposed NMT model correctly translated the Japanese sentence into Chinese: the out-of-vocabulary word is correctly selected by the branching entropy approach as part of the Japanese phrase meaning "vertical Bridgman method", and the selected Japanese phrase is then translated by the phrase translation table of SMT. Figure 5 shows another example of a correct translation produced by the proposed system together with one produced by the baseline NMT. As shown in Figure 5, the translation produced by the baseline NMT is erroneous because the out-of-vocabulary Chinese word meaning "band pattern" is left untranslated and its translation is not contained in the output of the baseline NMT. The proposed NMT model correctly translated the Chinese word into Japanese because the Chinese word meaning "band pattern" is selected by branching entropy as part of the Chinese phrase meaning "typical band pattern" and is then translated by SMT. Moreover, Figure 6 and Figure 7 compare examples of correct translations produced by the proposed system with those produced by the baseline NMT when translating patent sentences in the Japanese-to-English and English-to-Japanese directions.

6 Conclusion

This paper proposed selecting phrases that contain out-of-vocabulary words using the branching entropy. The selected phrases are replaced with tokens and post-translated using SMT phrase translation. Compared with the method of Long et al. (2016), the contribution of the proposed NMT model is that it can be applied to any language pair without language-specific knowledge for technical term selection. We observed that the proposed NMT model performs much better than the baseline NMT system in all of the language pairs: Japanese-to-Chinese/Chinese-to-Japanese and Japanese-to-English/English-to-Japanese. One of our important future tasks is to compare the translation performance of the proposed NMT model with that of models based on subword units (e.g., Sennrich et al. (2016)). Another future task is to improve the present approach by incorporating in-vocabulary non-compositional phrases, whose translations cannot be obtained by translating their constituent words. A better translation performance is expected when such phrases are translated by phrase-based SMT instead of NMT.

¹¹Although Table 5 omits the detailed counts of untranslated words for the NMT model with the PosUnk model (Luong et al., 2015b), its number of untranslated words is almost the same as that of the baseline NMT, which is much larger than that of the proposed NMT model.
¹²An additional evaluation in which the training data of the two language pairs were of approximately similar size led us to conclude that the primary reason why the numbers of untranslated morphemes / words are much larger for Japanese-to-English/English-to-Japanese than for Japanese-to-Chinese/Chinese-to-Japanese is simply a language-specific issue.

References

Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proc. 3rd ICLR.

Chen, Y., Huang, Y., Kong, S., and Lee, L. (2010). Automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features. In Proc. 2010 IEEE SLT Workshop, pages 265–270.

Cho, K., van Merriënboer, B., Gulcehre, C., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. EMNLP, pages 1724–1734.

Costa-Jussà, M. R. and Fonollosa, J. A. R. (2016). Character-based neural machine translation. In Proc. 54th ACL, pages 357–361.

Fujii, A., Utiyama, M., Yamamoto, M., and Utsuro, T. (2008). Toward the evaluation of machine translation using patent information. In Proc. 8th AMTA, pages 97–106.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.

Jean, S., Cho, K., Bengio, Y., and Memisevic, R. (2014). On using very large target vocabulary for neural machine translation. In Proc. 28th NIPS, pages 1–10.

Jin, Z. and Tanaka-Ishii, K. (2006). Unsupervised segmentation of Chinese text by use of branching entropy. In Proc. COLING/ACL 2006, pages 428–435.

Kalchbrenner, N. and Blunsom, P. (2013). Recurrent continuous translation models. In Proc. EMNLP, pages 1700–1709.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proc. 45th ACL, Companion Volume, pages 177–180.

Li, X., Zhang, J., and Zong, C. (2016). Towards zero unknown word in neural machine translation. In Proc. 25th IJCAI, pages 2852–2858.

Long, Z., Utsuro, T., Mitsuhashi, T., and Yamamoto, M. (2016). Translation of patent sentences with a large vocabulary of technical terms using neural machine translation. In Proc. 3rd WAT, pages 47–57.

Long, Z., Utsuro, T., Mitsuhashi, T., and Yamamoto, M. (2017). Neural machine translation model with a large vocabulary selected by branching entropy. https://arxiv.org/abs/1704.04520v4. Online; accessed 24-July-2017.

Luong, M. and Manning, C. D. (2016). Achieving open vocabulary neural machine translation with hybrid word-character models. In Proc. 54th ACL, pages 1054–1063.

Luong, M., Pham, H., and Manning, C. D. (2015a). Effective approaches to attention-based neural machine translation. In Proc. EMNLP, pages 1412–1421.

Luong, M., Sutskever, I., Vinyals, O., Le, Q. V., and Zaremba, W. (2015b). Addressing the rare word problem in neural machine translation. In Proc. 53rd ACL, pages 11–19.

Nakazawa, T., Mino, H., Goto, I., Neubig, G., Kurohashi, S., and Sumita, E. (2015). Overview of the 2nd workshop on Asian translation. In Proc. 2nd WAT, pages 1–28.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proc. 40th ACL, pages 311–318.

Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proc. 54th ACL, pages 1715–1725.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proc. 27th NIPS, pages 3104–3112.

Tseng, H., Chang, P., Andrew, G., Jurafsky, D., and Manning, C. (2005). A conditional random field word segmenter for Sighan bakeoff 2005. In Proc. 4th SIGHAN Workshop on Chinese Language Processing, pages 168–171.

Tu, Z., Lu, Z., Liu, Y., Liu, X., and Li, H. (2016). Modeling coverage for neural machine translation. In Proc. 54th ACL, pages 76–85.

Utiyama, M. and Isahara, H. (2007). A Japanese-English patent parallel corpus. In Proc. MT Summit XI, pages 475–482.

Usefulness of MT output for comprehension — an analysis from the point of view of linguistic intercomprehension

Kenneth Jordan Núñez [email protected] Universidad San Jorge, E-50830 Villanueva de Gállego, Spain
Mikel L. Forcada [email protected] Universitat d'Alacant, E-03690 Sant Vicent del Raspeig, Spain
Esteve Clua [email protected] Universitat Pompeu Fabra, E-08018 Barcelona, Spain

Abstract

The present paper describes ongoing research on machine translation (MT) and linguistic intercomprehension. One main goal, although not the only one, is to evaluate three machine translation (MT) systems —Systran, Google Translate and Apertium— through an analysis of the readers' ability to understand the output they generate. We compare the usefulness of

MT output for comprehension to that of non-native writing in the readers' L1 and to that of native writing in languages similar to their L1. The methodology used is based on cloze tests, and the experiments are carried out using English, French and Italian as source languages and Spanish as the target language. The subjects are native Spanish first-year undergraduate students and final-year secondary-school students. All of them have only very elementary knowledge, or in some cases no knowledge at all, of English, French and Italian (that is, a level equal to or lower than CEFRL B1¹). The results suggest that, in terms of usefulness, MT output resulting from translating from English and French into Spanish is similar to natively-written Italian texts or to texts written in Spanish by non-native speakers; however, this depends quite often on the level of specialty, but also on the field and on the MT system used.

1 Introduction

The aim of the study, which is part of a broader research plan relating machine translation (MT) and linguistic intercomprehension, is to assess and compare three MT systems when used for assimilation² or gisting: Systran, a hybrid system³ that combines statistical or corpus-based MT and rule-based MT; Google Translate,⁴ a statistical corpus-based MT system at the time of testing for the language pairs tested;⁵ and the Apertium rule-based system.⁶ The aim is achieved

¹Common European Framework of Reference for Languages: Learning, Teaching, Assessment (http://www.coe.int/t/dg4/linguistic/Source/Framework_EN.pdf)
²MT systems can be divided into two groups: those aimed at assimilation or gisting, which allow the user to understand the content of a text, and those aimed at dissemination, which help to translate a text so that it can be published.
³http://www.systran.es
⁴http://translate.google.com
⁵As of May 15, 2017, the English–Spanish and English–Italian systems are no longer statistical, but rather neural; the French–Spanish system is still statistical.
⁶http://www.apertium.org

by comparing the usefulness of their output to that of non-native writing in the reader's first language L1 and to that of native writing (or MT output) in a closely related language. This will enable us to determine which of the three MT systems is the most useful for gisting, that is, which MT system yields the highest level of usefulness when reading a text originally written in a language unknown to the reader.

Linguistic intercomprehension, the ability to understand one foreign language based on knowledge of another language (Meissner, 2004, 34), is an ability that the readers of a language naturally have and use unconsciously, but which they can also develop in order to understand messages in another language without being able to produce them themselves (Martín-Peris, 2011, 247). Linguistic intercomprehension —in this work, reading or written intercomprehension, in contrast to listening or spoken intercomprehension— leverages the similarity or identity of word forms and structures (Martín-Peris, 2011, 276).

In recent decades, although only in Europe (within the framework of Euro-comprehension or European intercomprehension), a new discipline focused on research into study methods has been developed, aimed at the simultaneous studying or learning of multiple languages. This discipline is centred on reading comprehension —as well as listening comprehension in some cases— and seeks to save time and effort when learning languages from the same linguistic family (Clua, 2003). Some of the methods designed have been adapted to Romance-language learning, such as Eurom4 and Eurom5,⁷ Galatea,⁸ Miriadi,⁹ or EuroComRom¹⁰ (within EuroCom, whose name refers straightforwardly to Euro-comprehension). EuroComRom shows the reader how to obtain information from texts in other languages —even if those languages are completely unknown to the reader— through the so-called seven sieves (Klein and Stegmann, 2000): international vocabulary or internationalisms, pan-Romance vocabulary, sound correspondences, spelling and pronunciation, the basic structure of Romance sentence patterns, common morpho-syntactic structures developed by the pan-Romance community, and the transfer of EuroFixes¹¹ (Martín Peris et al., 2005).

The work in this paper explores the quantitative aspects of a line of research that aims at exploring to what extent the differences between native text and MT output are similar to the differences between languages within a language family, or to the differences between native text and non-natively written text, and then explores whether spontaneous intercomprehension strategies (Klein and Stegmann, 2000; Martín Peris et al., 2005) play a role in the processing of raw machine-translated text by readers.

2 Research questions and hypotheses

We aim to answer four research questions: RQ1) To what extent are MT output and a text written by a native speaker in a language L′ from the same language family as the reader's first language L1 similar for the reader in terms of usefulness?; RQ2) Could MT output in a language L′ in the same language family as the reader's first language L1 be useful for comprehension when translating texts originally written in a language from a language family different from that of their first language L1?; RQ3) Are MT output and a text written in the reader's first language by non-native speakers similar for the reader in terms of usefulness?; and, lastly, RQ4) Which MT system is the most useful for comprehension when translating a text originally written in a language that is completely unknown to the reader?

⁷http://www.eurom5.com
⁸http://galatea.u-grenoble3.fr
⁹https://www.miriadi.net/en
¹⁰http://www.eurocomresearch.net
¹¹EuroFixes are lexical components used in word formation, such as prefixes and suffixes, which are shared across European languages. Many of them are Latin- and Greek-based affixes (pseudo-, -phobia, inter-, etc.).

Regarding RQ1, one could argue that the relationship between the reader's L1 (first language, mother tongue) and MT output into L1 has similarities to the relationship between L1 and other languages in the family of L1 (common international vocabulary, common morpho-syntactic structures), and that, therefore, the usefulness of MT output into L1 is similar to that of the related language L′. Regarding RQ2, by analysing the usefulness to Spanish (ES) native speakers of the MT output obtained when translating from English (EN) into Italian (IT), we can determine whether MT could be used to read texts originally written in a language from a different language family when a certain language combination is not available; that is, if a speaker of Spanish (ES) wants to understand a text in EN but no EN–ES MT is available, they could use an EN–IT system and their linguistic intercomprehension abilities.

3 Methodology

We explore the four research questions above using the closure or cloze test methodology, a method that involves filling in gaps corresponding to single words that have been removed from an extract of a written text.¹² Our study is grounded on three pillars: (a) reading-comprehension questionnaires with questions like Who was the president of the Green Party in 2011? have repeatedly been used to evaluate the usefulness of machine-translated text (Jones et al., 2005, 2009, 2007; Berka et al., 2011; Weiss and Ahrenberg, 2012); (b) cloze testing or gap-filling has been used extensively as an alternative way to measure reading comprehension (Rankin, 1959; Page, 1977); and (c) gap-filling may sometimes be considered roughly equivalent to question answering: In 2011, ___ was the president of the Green Party. Therefore, we work under the assumption that gap-filling success measures the usefulness of machine-translated text for comprehension.

Inspired by previous work (O'Regan and Forcada, 2013; Trosterud and Unhammer, 2012; Ageeva et al., 2015), the method was applied to 65- to 75-word-long target texts previously translated by professionals, with gaps every fifth word.¹³ Subjects (native ES speakers with an EN, FR and IT level equal to or lower than CEFRL B1) were provided with different kinds of hints to help them fill in the gaps in a professionally translated ES text. The aim was to determine whether the hint was useful for comprehension, as measured by the rate of success in filling out the gaps (that is, the fraction of correctly filled gaps) in the incomplete professionally-translated ES text.

Forty-four test texts were taken from 4 different sources with different degrees of specialization, and were tested in four different hinting situations, distributed as follows:

a) Using machine translation into L1 as a hint: EN–ES and FR–ES MT output¹⁴ — one highly specialized text on natural sciences ('NAT'), one highly specialized text on human and social sciences ('SOC'), one journalistic or informative semi-specialized text ('INF'), and one non-specialized or general text ('NO') that had been translated by the three MT systems, both from EN to ES and from FR to ES (24 texts in total);

b) Using text in L′ ≈ L1 as a hint: native or professionally-translated IT texts — four texts written by a native IT professional translator from four EN sources with the same degrees of specialization;

¹²Gaps should be filled with either the exact word removed from the source text, or with a synonym or a functionally equivalent unit, that is, a lexical unit that creates a target text with the same meaning as the source in that specific context.
¹³Preliminary experiments showed that the readers' gap-filling performance dropped markedly when gaps occurred every five words. Note also that O'Regan and Forcada (2013) poke holes with a certain probability rather than periodically as is done here, but this may be considered to be equivalent.
¹⁴These translations were retrieved in November–December 2014, and it is worth mentioning that online MT systems change over time as models get refreshed or even when a technological change occurs (such as the recent switch from statistical MT to neural MT for EN–ES and EN–IT).

EN source text (not shown to subjects): If no formal authorisation has been given by the host state, a third-country national's presence may be considered unlawful by that state. Both EU and ECHR law, however, set out circumstances in which a third-country national's presence must be considered lawful, even if unauthorised by the state concerned.

Hint (machine-translated ES text): Si no se ha dado ninguna autorización formal por el estado de anfitrión, la presencia de un nacional de terceros países se puede considerar ilegal por ese estado. La ley de la UE y del ECHR, sin embargo, estableció las circunstancias en las cuales la presencia de un nacional de terceros países se debe considerar legal, incluso si es desautorizado por el estado trató.

Problem (professionally-translated ES text with gaps): Si el Estado de acogida no le ha concedido una autorización formal, dicho Estado [. . . ] considerar que la presencia [. . . ] un nacional de un [. . . ] país es irregular. Sin [. . . ], tanto el Derecho de [. . . ] UE como el CEDH [. . . ] circunstancias en las que [. . . ] presencia de un nacional [. . . ] un tercer país se [. . . ] considerar legal, aunque el [. . . ] miembro de que se trate no la haya autorizado.

Figure 1 – An example gap-filling problem: the subject has to fill each of the gaps (one every 5 words) introduced in the professionally-translated text with a single word, using the machine-translated text as a hint. The source text is not shown. The solutions in this case are: puede, de, tercer, embargo, la, establecen, la, de, debe, Estado.

c) Using machine translation into L′ ≈ L1 as a hint: EN–IT MT output — eight texts from the same four sources, translated from EN into IT by two of the MT systems (Google and Systran, as Apertium does not provide this combination);

d) Using non-natively produced L1 text as a hint: either an EN text translated into ES by a native EN speaker or a FR text translated into ES by a native FR speaker (both with an ES B2 level according to the CEFRL) — four texts from the same four sources translated from EN into ES by a native speaker of EN, and four texts from the same four sources translated from FR into ES by a native speaker of FR (8 texts in total).

Figure 1 shows an example gap-filling problem in which a machine-translated text from EN is shown as a hint, that is, an example of the first hinting situation.

All the texts were extracted from institutional publications, that is, works published or translated by staff linguists in international institutions or organizations, with the exception of the non-specialized or general texts, which came from different translations of the Bible in the languages involved. "NAT" texts were taken from the different language versions of the WHO document http://www.who.int/pehemf/publications/en/EMF_Risk_ALL.pdf. "SOC" texts were taken from the FRA–ECHR–Council of Europe Handbook on European law relating to asylum, borders and immigration (http://fra.europa.eu/sites/default/files/handbook-law-asylum-migration-borders-2nded_en.pdf). "INF" texts were taken from the European Commission publication http://ec.europa.eu/economy_finance/publications/general/pdf/the_road_to_euro_poster_en.pdf. "NO" texts were taken from four editions of the Bible (books of Job and Esther) in different languages: the Spanish Dios habla hoy (1996), the English Easy-to-Read Version (1987), the French Bible Segond 21 (2007) and the Italian Bibbia Diodati (1991).

The test was taken by 71 native Spanish undergraduate students from the first year of the Degrees in Journalism, Audiovisual Communication, Advertising and Public Relations, Translation and Intercultural Communication, Infant Education and Primary Education, or final-year

secondary-school students. All had only elementary knowledge or, in some cases, no knowledge at all, of EN, FR and IT (that is, a level equal to or lower than CEFRL B1). Students were asked to read the hint and, for the whole test, write a single word in each gap in all the texts. Each job contained 10 gaps; each student completed 22 jobs on average. The fraction of gaps that the students were able to fill successfully (either with the exact word removed or with a synonym) was considered a measure of the usefulness of the text used as a hint (on a scale from 0 to 1) and compared.
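A rough sketch of how such gap-filling items can be built and scored (every fifth word removed, success measured as the fraction of correctly filled gaps). This is our illustration of the procedure, not the materials actually used in the study.

```python
def make_cloze(text, every=5):
    """Remove every `every`-th word; return the gapped text and the solutions."""
    words = text.split()
    gapped, solutions = [], []
    for i, w in enumerate(words, start=1):
        if i % every == 0:
            gapped.append("[...]")
            solutions.append(w)
        else:
            gapped.append(w)
    return " ".join(gapped), solutions

def success_rate(answers, solutions, synonyms=None):
    """Fraction of gaps filled with the removed word or an accepted synonym."""
    synonyms = synonyms or {}
    correct = sum(1 for a, s in zip(answers, solutions)
                  if a.lower() == s.lower() or a.lower() in synonyms.get(s.lower(), set()))
    return correct / len(solutions)

problem, key = make_cloze("Si el Estado de acogida no le ha concedido una autorización formal")
print(problem)                                  # gaps every fifth word
print(success_rate(["acogida", "una"], key))    # 1.0
```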

4 Results

The results of the test were compared in three blocks: 1) Systran and Google EN–IT and Systran and Google FR–ES MT output, with IT texts written by a native speaker as a reference; 2) Systran, Google and Apertium EN–ES MT output, with texts written in ES by a native EN speaker as a reference; 3) Systran, Google and Apertium FR–ES MT output, with texts written in ES by a native FR speaker as a reference. In each block, five comparisons were made: overall or general (that is, without considering the text type or its degree of specialization), NAT, SOC, INF and NO (see Section 3).

In Tables 1 to 5, the language combinations for each MT system, and the other texts used as hints, are ordered according to their level of usefulness based on the results of the test undertaken by the students. Several figures are given: the average fraction of gaps correctly filled by the subjects on a scale of 0 to 1 ('m'), its variance ('var'), the number of observations in each case ('n'), and the estimated probability that the established order for each group is due to chance and not to a real superiority. In other words, the arithmetic mean in each case is used to establish a ranking of hints as regards their usefulness for comprehension; the probability that the hypothesis behind that order or preference is false is also indicated.¹⁵ As usual, when the latter probability, shown on each table between two boxes, is lower than 5%, we interpret that the level of usefulness of the first text with respect to the second is clearly higher and not the result of luck.

4.1 Comparing MT output to text in a closely related language

The results in Table 1 answer the first two research questions, that is: RQ1) whether, in terms of usefulness, MT output and a text written by a native speaker in a language L′ in the same language family as the reader's first language L1 are similar; and RQ2) whether MT output in a language L′ in the same language family as the reader's first language L1 could be useful for comprehension when translating texts originally written in a language from a different language family, much less connected to their first language.

According to these results, in general terms, a human-written IT text is less useful for comprehension than the MT output resulting from translating a text from a related language (that is, FR) into ES, and it is basically as useful for comprehension as the MT output resulting from translating a text from a language belonging to a different language family (that is, EN) into IT. Strangely, however, for NAT texts, a human-written IT text would be marginally more useful for comprehension to a native ES reader than Google FR–ES MT output; and Google EN–IT MT output would also seem marginally more useful for comprehension than Systran FR–ES MT output and clearly more useful than Google FR–ES MT output. Furthermore, for SOC texts, a human-written IT text is as useful for comprehension as Google FR–ES MT output but much more useful than Systran FR–ES MT output. As to INF and NO texts, human-written IT texts are much less useful than any of the FR–ES MT outputs.

¹⁵The probability of the null hypothesis (both averages being equal) has been computed using Welch's two-tailed t-test, https://en.wikipedia.org/wiki/Welch's_t-test.
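The significance test mentioned in footnote 15 can be reproduced with SciPy's unequal-variance t-test; the per-subject gap-filling scores below are made-up numbers, purely for illustration.

```python
from scipy import stats

# Hypothetical per-subject gap-filling success rates for two hinting conditions
hint_a = [0.62, 0.55, 0.70, 0.58, 0.64, 0.61]
hint_b = [0.48, 0.52, 0.47, 0.55, 0.50, 0.44]

# Welch's t-test: two-sample t-test without assuming equal variances
t_stat, p_value = stats.ttest_ind(hint_a, hint_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 -> the ordering is unlikely to be luck
```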

Table 1 – Gap-filling success rates for MT output compared with those for an Italian text written by a native Italian speaker (shaded boxes) and with those for MT output resulting from translating from English into Italian.

As stated by Jordan-Núñez (2015), the fact that the usefulness of native IT texts is very similar to that of MT output for NAT texts, but considerably higher for SOC texts, could perhaps be due to deficiencies in the glossaries of the MT systems, given the appreciable conceptual and lexical variety within these fields (especially in law, due to partial or even false equivalences in legal terms or institution names).

Likewise, in general terms, FR–ES MT output is more useful for comprehension than EN–IT MT output (see RQ2). However, depending on the degree and field of specialization of the text and on the MT system used, EN–IT MT output can be more useful to a native Spanish reader. More specifically, for NAT texts, Google EN–IT MT output appears to be more useful than the FR–ES MT output, whereas Systran EN–IT MT output is less useful. For SOC texts, either of the EN–IT outputs is slightly less useful than or as useful as Google FR–ES MT output, but far more useful than Systran FR–ES output. For INF texts, Systran EN–IT MT output would be slightly less useful for comprehension than Systran FR–ES output,¹⁶ but slightly more useful than Google's output for the same language combination; the output from translating a text from English into Italian with Google is less useful for comprehension than the output of the same MT system when translating from French into Spanish. Lastly, for NO texts, all of the EN–IT MT outputs are less useful for comprehension than any of the FR–ES outputs.

¹⁶Note, however, that just because two MT systems are made by the same company, they do not have to behave similarly. For instance, while Apertium systems are structurally very similar to each other, their performance varies widely with the language pair and the level of development.

4.2 Comparing MT output with non-native writing in the target language

The results shown in Tables 2 and 3 aim at answering the third research question (RQ3), that is, whether MT output and a text written in the reader's first language L1 by non-native advanced 'independent speakers' —with a CEFRL B2 EN or FR level— are similarly intelligible to a native ES reader.

Table 2 – Gap-filling success rates for MT output compared with those for a Spanish text written by a native English speaker (“ES ANGLO”, shaded boxes).

Table 3 – Gap-filling success rates for MT output compared with those for a Spanish text written by a native French speaker (“ES FRANCO”, shaded boxes)

English into Spanish: According to these results, in general terms, for a native ES reader, EN–ES MT output is more useful for comprehension than a text written in ES by a native EN speaker ("ES ANGLO" in Table 2) with a CEFRL B2 level in ES. However, the text written by that native EN speaker would seem to be marginally more useful than the Systran output for NAT or NO texts.

French into Spanish: Overall, texts written in ES by a native FR speaker with a CEFRL B2 level in ES are more useful than any FR–ES MT output. More precisely, the text written by the native French speaker ("ES FRANCO" in Table 3) is more useful for comprehension than most FR–ES MT output for NAT texts, as useful as Google FR–ES output for SOC texts, at least almost as useful as any FR–ES MT output for NO texts, and, for INF texts, less useful than Systran FR–ES output but more useful than Google or Apertium FR–ES output. This may arise from the smaller differences between languages from the same family allowing a non-native speaker to write a text with fewer grammatical and lexical mistakes, since the two languages involved share common vocabulary and structures; in any case, non-native mistakes do not prevent a native Spanish reader from understanding the text.

4.3 Comparing MT systems

When comparing data from the three MT systems, it can be seen, as already stated in Jordan-Núñez (2015), that, in general terms, the usefulness of the EN–ES output of the three MT systems is similar, although the usefulness of the Apertium output is slightly higher. This is not the case for FR–ES, where the usefulness of the Apertium output is considerably lower. More precisely, for highly specialized NAT texts (see Table 4), Apertium seems to be the best EN–ES MT system, on par with Google, while Systran seems to be the best FR–ES system. Likewise, Google EN–ES and FR–ES are apparently the best MT systems for SOC texts. For semi-specialized INF texts (see Table 5), Apertium seems to be the best EN–ES MT system, on par with Systran, while Systran is the best FR–ES MT system. Equally, for NO texts (see Table 5), Apertium is again the best EN–ES MT system, and Google seems to be the best FR–ES MT system, on par with Systran.

5 Discussion

In view of the cloze-test results reported, one can answer the research questions posed and address the hypotheses described:

RQ1) As already indicated in Jordan-Núñez (2015), to a native ES reader, the usefulness of a text written in a language L′ from the same language family as the reader's L1 (in this case, L′ = IT) is higher for highly specialized texts than for general or non-specialized texts, as a consequence of the higher density of international vocabulary in NAT and, to a lesser extent, SOC texts. However, although the usefulness of a human-written IT text is very similar to that of MT output for highly specialized NAT texts, the IT text would appear to be more useful than any of the MT outputs studied for SOC texts. This may be the result, as mentioned above, of deficiencies in the glossaries of the MT systems, given the appreciable conceptual and lexical variety within these fields.

RQ2) Although, generally speaking, any of the FR–ES MT outputs studied is more useful for comprehension to a native ES speaker than EN–IT MT output, usefulness depends on the level and field of specialty and on the MT system. In many cases, especially when translating highly specialized material, EN–IT MT output may be more useful than FR–ES MT output – for example, when using Google to translate NAT texts. This leads to the conclusion that MT output in a language L′ in the same family as the reader's L1 – e.g., IT for a native ES speaker – could be used to facilitate their comprehension of texts originally written in a language from a different family and, evidently, far removed from their first language (that is, EN). Some findings, however, as already stated, do not apply to MT in general, but rather to limitations in the MT systems that lead to differences in performance across text genres.

RQ3) As stated above, in general terms and to a native Spanish reader, EN–ES MT output is

Table 4 – Comparing the gap-filling success rate for the output of different MT systems, for highly specialized texts.

more useful for comprehension than a text written in ES by a native EN speaker. However, texts written in ES by a native FR speaker are more useful than any of the FR–ES MT outputs. In fact, the text written in ES by the native FR speaker is much more useful than the text written by the native EN speaker (considering that both speakers have the same level of ES, namely CEFRL B2). This could be because there is less interference or carry-over from FR when writing in ES (especially in syntax), given that both languages belong to the same language family.

RQ4) Although, in general terms, Apertium seems to be better for EN–ES than the other two MT systems and considerably worse for FR–ES, this also depends, to a greater or lesser extent, on the degree and field of specialty and on the MT system used. In these experiments, Apertium is the best EN–ES MT system for any text type except SOC, which is best translated by Google and Systran. Google is the best FR–ES MT system for SOC and NO texts, while Systran is the best FR–ES MT system when dealing with NAT and INF texts.

5.1 Critical appraisal of the methodology

It is important to formulate some critical comments regarding the methodology used in this pilot study, to be taken into account when pursuing further research.

It has been noticed that the fraction of gaps corresponding to function, structure or "stop" words —that is, articles, pronouns, prepositions or conjunctions— varies from one text type to

Table 5 – Comparing the gap-filling success rate for the output of different MT systems, for informative or journalistic texts and non-specialized texts.

another. This could have a considerable effect on the results. In order to avoid this, in the test for the final project, gaps should be placed only at content words, that is, avoiding gaps at "stop" words (as was done by O'Regan and Forcada (2013)).

The results may also have been affected by the sources of the texts chosen when designing the test. It has been noticed that the Spanish version of the NAT text used a Latin-American variety of Spanish, which may have been less recognizable to a group of students familiar with Castilian Spanish; this should have been avoided when designing the research plan, as language varieties introduce an uncontrolled variable into the experimental design. Likewise, the Bible may not have been a good choice, since the different versions used differ quite noticeably from each other and, actually, they cannot strictly be considered mutual translations.¹⁷ In order to avoid this, in the final project, all the texts should be taken from publications by institutional organizations using Castilian Spanish, where writing/editing and/or translation processes are followed by proofreading or quality-assurance processes. In the case of non-specialized texts, they could also be taken from lesser-known novels that have been translated into all the languages involved in the study, to avoid the risk that a student recognizes the text and effortlessly fills the gaps without using information from the hint (as found by O'Regan and Forcada (2013) when no hint

17In fact, some passages have been translated so differently in each language version that the students could have found it difficult to find the information to help them fill the gaps from the hint given.

was given).

6 Conclusions and future work

As previously indicated, the broader research within which this study is framed seeks to identify whether the differences between a professional translation and MT output are similar to the differences between languages within a language family, and then to explore whether intercomprehension strategies (Klein and Stegmann, 2000; Martín Peris et al., 2005) are useful to get past those non-human traits and understand the main message of the text. However, it is fair to say that the subjects participating in this study could have made use of spontaneous intercomprehension abilities not necessarily corresponding to the strategies described by the above authors, and that some of those strategies may not apply to machine-translated texts.

As shown above, it should be admitted that the usefulness of a text written in a language from the same language family as the reader's L1 is higher for highly specialized than for non-specialized texts. However, regarding MT output, this level of usefulness depends quite often on the level of specialty, but also on the field and on the MT system used. In view of the results, it also seems reasonable to postulate that MT output into a language L′ in the same family as the reader's first language L1 could be used to facilitate their comprehension of texts originally written in a language from a different family. Likewise, EN–ES MT output is more useful for comprehension than a text written in ES by a native EN speaker, while texts written in ES by a native FR speaker are more useful than FR–ES MT output.

Finally, in the future, it will be interesting to assess to what extent the skills used to understand MT output and the linguistic intercomprehension skills targeted by the methods described are similar. If EN–ES MT output is, in general terms, as useful for comprehension as an IT text (depending, however, on the MT system used, the language combination and the level of specialization), and if that MT output shares common vocabulary and some common morpho-syntactic structures (despite containing some mistakes that distance it from a native human-written text), one could argue that the (generally unconscious) linguistic skills used in linguistic intercomprehension may be similar to those used to understand MT output. To shed some light on this, we will classify, label and quantify the types of machine translation errors or disfluencies and study their effect on the comprehension process, distinguishing errors or disfluencies that may be taken to be similar to the divergences observed between languages in the same family from those that are not. For instance, when a source word is out of the vocabulary of the MT system and is left untranslated but is nevertheless similar to the target word, its negative impact should be less severe than in the case in which the word is very different.

Acknowledgements: KJN thanks Prof. Joan Fonollosa for his help in determining the statistical significance, and San Jorge University and Pompeu Fabra University for allowing him to have a research stay in the latter in order to work on this project. MLF thanks the Universitat d’Alacant for support during a sabbatical stay at the Universities of Sheffield and Edinburgh and acknowledges the support of the Spanish Ministry of Economy and Competitiveness through grant TIN2015-69632-R and the Spanish Ministry of Education, Culture and Sport through grant PRX16/00043. EC thanks the Spanish Ministry of Economy and Competitiveness for support through grant FFI2016-76245-C3-3-P. We also thank Dr. César Rodríguez, Dr. Naswha Nashaat, Dr. Manuel Gómez, and lecturers Santiago Lamas, María Pilar Cardos and Rachel Harris from San Jorge University and Dr. Elena Alcalde from the University of Granada for allowing us to test their students.

References

Ageeva, E., Tyers, F. M., Forcada, M. L., and Pérez-Ortiz, J. A. (2015). Evaluating machine translation for assimilation via a gap-filling task. In Proceedings of EAMT, pages 137–144.

Berka, J., Černý, M., and Bojar, O. (2011). Quiz-based evaluation of machine translation. The Prague Bulletin of Mathematical Linguistics, 95:77–86.

Clua, E. (2003). Eurocom: ciutadans plurilingües per a una Europa multilingüe. Enxarxa’t: revista de la Xarxa de Dinamització Lingüística de la UB, 2:4–5.

Jones, D., Gibson, E., Shen, W., Granoien, N., Herzog, M., Reynolds, D., and Weinstein, C. (2005). Measuring human readability of machine generated text: three case studies in speech recognition and machine translation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), volume 5, pages v-1009–v-1012. IEEE.

Jones, D., Herzog, M., Ibrahim, H., Jairam, A., Shen, W., Gibson, E., and Emonts, M. (2007). ILR-based MT comprehension test with multi-level questions. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, pages 77–80. Association for Computational Linguistics.

Jones, D., Shen, W., and Herzog, M. (2009). Machine translation for government applications. Lincoln Laboratory Journal, 18(1).

Jordan-Núñez, K. (2015). ¿Es igual de comprensible la salida de un sistema de TA que un texto escrito en otra lengua de la misma familia de lenguas? In LEXITRAD, editor, New Horizons in Translation and Interpreting. Tradulex.

Klein, H. G. and Stegmann, T. D. (2000). EuroComRom - Die sieben Siebe: Romanische Sprachen sofort lesen können. Shaker.

Martín-Peris, E. (2011). La intercomprensión: concepto y procedimientos para su desarrollo en las lenguas románicas. In de Zarobe, Y. R. and de Zarobe, L. R., editors, La lectura en lengua extranjera, pages 246–271. Portal Education.

Martín Peris, E., Clua, E., Klein, H., and Stegmann, T. (2005). EuroComRom — Los siete tamices: Un fácil aprendizaje de la lectura en todas las lenguas románicas.

Meissner, F. (2004). Esquisse d’une didactique de l’eurocompréhension. In EuroComRom: Les sept tamis: lire les langues romanes dès le départ; avec une esquisse de la didactique de l’eurocompréhension, pages 19–83.

O’Regan, J. and Forcada, M. L. (2013). Peeking through the language barrier: the development of a free/open-source gisting system for Basque to English based on apertium.org. Procesamiento del Lenguaje Natural, pages 15–22.

Page, W. D. (1977). Comprehending and cloze performance. Literacy Research and Instruction, 17(1):17–21.

Rankin, E. F. (1959). The cloze procedure: its validity and utility. In Eighth yearbook of the National Reading Conference, volume 8, pages 131–144. Milwaukee: National Reading Conference.

Trosterud, T. and Unhammer, K. B. (2012). Evaluating North Sámi to Norwegian assimilation RBMT. In Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation (FreeRBMT 2012).

Weiss, S. and Ahrenberg, L. (2012). Error profiling for evaluation of machine-translated text: a Polish–English case study. In LREC, pages 1764–1770.


Machine Translation as an Academic Writing Aid for Medical Practitioners

Carla Parra Escartín [email protected]
Sharon O’Brien [email protected]
ADAPT Centre, SALIS, Dublin City University, Dublin, Ireland
Marie-Josée Goulet [email protected]
University of Quebec in Outaouais, Gatineau, Canada
Michel Simard [email protected]
National Research Council of Canada

Abstract

In this paper we explore the utility of Machine Translation as a writing aid and its impact on the quality of the text produced. We focus on medical practitioners who are native speakers of Spanish and who need to publish their scientific work in English as a foreign language. After carrying out a general survey to determine whether Spanish-speaking medical practitioners already use MT as a writing aid, we engaged five participants in an experiment where we asked them to write a paper in Spanish that was subsequently machine translated. They were then asked to post-edit the MT output. We analyse their post-edits and further attempt to evaluate the overall quality of their texts by engaging a professional proofreader. Our results suggest that the texts produced with the help of MT+post-editing still require many edits in order to be considered of acceptable quality. In the conclusion, we identify several avenues worthy of future investigation that could help achieve better quality.

1. Introduction

In recent times two developments have led to a new type of Machine Translation (MT) deployment, i.e. MT for personal use. Those two developments are: (1) freely available online MT systems and (2) the increasing quality of MT output, for some language pairs at least. The ‘average’ internet user can now take advantage of MT to assist with various tasks such as school homework, translating website content for service and product reviews, and so on. The embedding of MT widgets in all sorts of websites has also contributed to personal MT usage. One user type that might avail of MT for personal, and professional, purposes is the academic whose first language is not English, but who, in order to widely disseminate his or her work, wishes to publish in English. It is our belief that some who write in English as a Foreign Language (henceforth: EFL writers) are using freely available online MT systems as an aid to the writing process, first writing passages of text in their L1 (or first language) and translating those into English as they produce academic articles. Despite the increasing quality of MT engines, it is still accepted that MT output generally requires post-editing before it is of publishable quality. The focus of our research is on the use of MT as an aid by EFL writers in specialised fields. As this topic appears not to have been researched in any detail, as outlined below, we aim to explore the utility of MT as a writing aid and its impact on the quality of the text produced.

English is the undisputed lingua franca of academia (Bennett, 2013, 2014, 2015). This forces those who are not native speakers of English to publish in English in order to disseminate their research and progress their careers. As we have reported elsewhere (O’Brien et al., forthcoming; Goulet et al., forthcoming), this leads to a considerable disadvantage, especially for those who do not master English as a foreign language. The disadvantage touches on the cognitive level (Breuer, 2015), as well as on the career level, if journal acceptance is taken into consideration (Benfield and Feak, 2006), and on the economic level, if the cost of additional translators or proofreaders is factored in (Lillis and Curry, 2010). Using MT as a writing aid might ease some of these disadvantages by (1) allowing people first to write in their L1 and then use MT as an aid to produce text in English, thereby tackling some of the cognitive demands of writing in a foreign language, and (2) reducing costs by eliminating the need for translators or proofreaders, who often do not possess the specialised domain vocabulary in any case (Willey and Tanimoto, 2015). Of course, there are several assumptions here that need to be examined. For example, does writing in L1, Machine Translating, and post-editing by the author (which we term ‘self-post-editing’) reduce the cognitive burden on the EFL writer? Can non-translators (authors in our context) post-edit their own work to an adequate level of quality? Does this method lead to higher quality in the English text, such that journal acceptance is a smoother process? Does it eliminate the need for a proofreader? We cannot tackle all of these questions here, but we have begun to address the questions regarding the quality of the English text (see below), the need for a proofreader, and the feasibility of self-post-editing.

2. Related Research

We report more fully in O’Brien et al. (forthcoming) and Goulet et al. (forthcoming) on related research and so will just summarise here. To put it succinctly, there is little work that focuses on this topic. Some work has been done on MT and second-language writing (for example, Niño, 2008; Garcia and Pena, 2011; and O’Neill, 2012) that demonstrates that MT can be useful as a second-language writing support. This previous work focuses mostly on university students who were learning languages. To the best of our knowledge, no work has been done concerning MT as an aid for professional writing. In O’Brien et al. (forthcoming), we made a first attempt to explore the utility of MT for the EFL academic cohort. This exploration found that the median times for drafting abstracts were not substantially different between L1 and EFL; however, the revision times and number of revisions implemented were greater for the L1 (+MT) sections. The participants were split more or less down the middle in terms of their perceptions of ease of task, while six (out of nine) felt that the quality produced was equal for both and three thought that writing in EFL produced better quality. A professional proofreader was hired to evaluate the quality of the texts, and her assessment supported the authors’ perception of quality. In short, we found that there was encouraging support for the assumption that MT could be used as a writing aid by EFL writers without taking up significantly more time and without impacting on quality. In Goulet et al. (forthcoming), we analysed this data set in more detail, comparing the edits implemented by the proofreader across both halves of the abstract in order to ascertain whether the editing required for text produced in EFL was different from that written in L1 and subsequently machine translated and self-post-edited. In summary, we found the number of edits to be similar (5% and 6% of the total word count in EFL and MT respectively), but that for the authors with Arabic and Chinese as L1, the number of edits to the MT’d parts was higher than for languages such as French or Spanish. Overall, there were no outstanding differences in terms of the proofreader’s edits when one part of the abstract was compared to the other, again indicating that MT as an academic writing aid certainly does not have a negative impact on the quality of the text produced.

3. Motivation

The exploratory study summarised in the previous Section 2 provided impetus for a follow-up study, which is the focus of this paper. Having previously recruited participants from a broad range of disciplines and languages, it was decided that it would be relevant to focus on one language pair and on one domain for a more in-depth analysis. Knowing anecdotally that medical practitioners seek to, and often have to, publish their research findings, we decided to focus on them as a new cohort. Moreover, we had anecdotal evidence that medical practitioners with Spanish as an L1 sometimes struggle to write in English. Given also that MT is known to perform relatively well between Spanish and English, so that users might be encouraged by its output, we decided to recruit and analyse self-post-editing within this cohort. Our focus of attention this time was to understand more fully the nature of the self-post-editing task as well as MT usage among medical practitioners in general. We consequently asked the following questions:

1) Are Spanish-speaking medical practitioners using MT as a personal writing support already? This question sought to explore whether or not our assumption about personal MT usage was correct.

2) Without any training in MT or post-editing, what type of edits do medical practitioners make when they write in Spanish and then machine translate into English and self-post-edit?

a) Are essential edits implemented or ignored? (See the Methodology discussion in Section 4.2.3 below for a definition of ‘Essential Edits’ and ‘Essential Edits not Implemented’)

b) How much non-essential (or preferential) editing is carried out?

c) Are errors introduced via self-post-editing?

Our goal here is to understand the natural competence for self-post-editing without any training whatsoever and to move towards developing potential supports for post-editing for such cohorts. By analysing essential and preferential edits as well as errors introduced, we aim at establishing the degree of quality achieved in our experimental setup.

3) How much editing is required by a professional proofreader on top of the post-edited documents and what type of edits are implemented? With this question we investigate whether L1+MT+self-post-editing actually requires another round of proofreading or whether the proofreader could be eliminated from this cycle. Again, this taps into a measurement of the quality produced during the self-post-editing setup.

4. Methodology and Experimental Setup

In order to address our initial research questions (cf. Section 2), we combined different research tools: questionnaires, active writing and post-editing, proofreading, and annotation of the edits made under each condition (self-post-editing and professional proofreading). In what follows we describe the methodology and experimental setup.1

1 The research reported here was granted ethical approval by the relevant Research Ethics Committees.

4.1. General Survey

To address the question: “Are Spanish-speaking medical practitioners using MT as a personal writing support already?”, we surveyed medical practitioners in Spain. The survey was run between 19 December 2016 and 27 January 2017, and the link to our questionnaire was sent to many organisations, including specialised medical associations, medical unions, universities and research institutes in Spain. We had a total of 50 responses. The questionnaire addressed several questions, including whether or not the respondents already used MT as a writing aid. This general questionnaire also helped us to identify potential participants for the experiment. At the very end of our questionnaire, we asked the respondents whether they would be willing to participate in experiments using MT and collected their e-mail addresses. Although 31 of the respondents provided us with their e-mail addresses, only five were finally available for the first experimental cycle, carried out between February and the beginning of April 2017.

4.2. Experimental Setup

4.2.1. Participant Profiles

As stated earlier, only five of our questionnaire respondents (3 men and 2 women) were available to engage in the experiment reported here. Four of them are in an early stage of their careers, are between 20 and 30 years old, and are engaged in their residencies. The fifth one is a researcher at a university or research centre and is between 30 and 40 years old. One specialises in Neurosurgery, another in Internal Medicine, two of them are gynaecologists and the fifth one works in Immunology. All of them have Spanish as their mother tongue and all of them speak other languages besides English (Catalan, French, German, Italian and/or Portuguese). Table 1 summarises their self-reported level of English using the Common European Framework of Reference for Languages (CEFR), as well as the level of English as established by an online English test on the Cambridge English website.2 P01 rated his level of English as lower than what the placement test revealed, whereas P03 rated his level higher. The remaining participants had a fairly accurate self-assessment of their English level.

Participant   Self-Assessment (CEFR, writing)   English Level Test

P01           B2                                C1-C2
P02           B1                                B1
P03           C1                                B2
P04           B2                                B2-C1
P05           B1                                B1-B2

Table 1: Participants’ Level of English

Except for P01, all of the other participants had published a paper before, their number of publications ranged from 1 (P03) to 15 (P05), and only two of them (P02 and P05) had published papers in English before.

2 In order to cross-check their self-assessment with their actual English level, participants were asked to complete an English level test of 25 questions and let us know their final results. The test can be found here: http://www.cambridgeenglish.org/test-your-english/general-english/

While P02 had only published one paper, P05 had published up to 5 papers in English. In both cases, their reported strategy for publishing was the same: they wrote directly in English and subsequently carried out a self-revision. P02 acknowledged having used Google Translate as a writing aid to confirm the translation of individual words or sentences.

4.2.2. Phase 1: Publication Drafting in Spanish

We asked our participants to send us a publication or a section of a publication of approximately 750 words that they had originally written in Spanish. We additionally asked them to try, whenever possible, to avoid writing sentences longer than 20 words, as this should help to achieve better quality from the MT system. As we could not expect them to count the words in every single sentence, we gave them a visual indication of 20 words in Spanish as being more or less equal to 1.5 lines in MS Word (Times New Roman, font size 12). We aimed at analysing the discussion section, or the section most similar to it, as it is more discursive than other sections (Skelton and Edwards, 2000). We deemed that this section may be one of the most challenging to write, especially for non-native speakers, and that it is therefore a good section to use in testing the use of MT as a writing aid.

4.2.3. Phase 2: MT and Self-Post-Editing

Upon reception of the texts, we used Google Translate to translate them into English and sent them back to their respective authors, asking them to correct the MT output. If they had sent the whole paper, we returned the whole paper translated and asked them to review the specific section we had selected for our study. We asked them to carry out the revision using the “track changes” functionality in MS Word with the aim of being able to study the types of edits they had made. After they had returned their self-post-edited texts, we asked them to fill in a post-task questionnaire about their experience. The results of this questionnaire are summarised, together with our analysis, in Section 5.2.

Upon reception of all files, we sought to answer our second research question: “Without any training in MT or post-editing, what type of edits do medical practitioners make when they write in Spanish and then MT into English and self-post-edit?”. To do so, we annotated all edits made by the medical practitioners. One author annotated the files and highlighted any unclear cases; subsequently another author went through all the annotations and we carried out a negotiation phase to determine the final annotations in each case. Unclear cases were further discussed with a third author. As at this stage we were mainly interested in determining whether or not medical practitioners were implementing essential or preferential edits and whether new errors were introduced in the self-post-editing process, we chose to adopt the typology proposed by de Almeida (2013), who was interested in the nature of post-edits implemented by professional translators in an attempt to describe what a ‘good’ post-editor did. De Almeida reviewed many typologies for the analysis of post-editing activity and concluded that there was no internationally adopted model for classifying this type of task. She customised the LISA (2004) and GALE (NIST, 2007) typologies for her own purposes and then layered a number of ‘master categories’ over this typology. The master categories entail:

• Essential edits: if the edit is not implemented, the sentence (or part of it) is either:

a) Grammatically incorrect (i.e. it obviously breaches a grammatical rule specified in accepted grammar books), or

b) Grammatically correct, but not accurate in comparison to the source text (i.e. it does not contain all the information that is present in the source text, i.e. an omission, or it contains extra information that is not present in the source text, i.e. an unnecessary addition).

• Preferential edits: an edit is considered preferential if the sentence from the raw MT output would still be grammatically correct, intelligible and accurate in relation to the source text, even if the edit in question was not implemented.

• Essential edits not implemented: an essential edit (as defined above) that was not implemented by the author.

• Introduced errors: the error was not present in the raw MT output and was introduced by the post-editor while editing a sentence. Because of this edit, the sentence (or part of it) is grammatically incorrect and/or inaccurate.

For this paper, we decided to slightly modify these master categories: instead of categorising edits as ‘introduced errors’, we deemed it important to distinguish between introduced errors that were attempting to correct something (i.e. the edit was essential, but the medical practitioner failed at fixing the problem) and those in which the edit was preferential and resulted in an error. That is, we treated the master category “introduced error” as a subcategory of either “essential edits” or “preferential edits”. A more detailed annotation of the types of edits is foreseen for the future.
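To make the adapted typology concrete, the sketch below encodes the master categories as a small annotation record. This is an illustrative Python sketch, not the authors’ actual annotation tooling; the class and field names are our own.

```python
# Illustrative sketch (not the authors' tooling) of the adapted de Almeida (2013)
# master categories, with "introduced error" treated as a flag on essential or
# preferential edits rather than as a separate top-level category.
from dataclasses import dataclass
from enum import Enum


class MasterCategory(Enum):
    ESSENTIAL = "essential"
    PREFERENTIAL = "preferential"
    ESSENTIAL_NOT_IMPLEMENTED = "essential not implemented"


@dataclass
class EditAnnotation:
    category: MasterCategory
    introduced_error: bool = False  # only meaningful for essential/preferential edits


# Example: a preferential rewording that accidentally made the sentence ungrammatical
example = EditAnnotation(MasterCategory.PREFERENTIAL, introduced_error=True)
print(example.category.value, example.introduced_error)
```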

4.2.4. Phase 3: Professional Proofreading

As a last stage of our experiment, we recruited a professional translator and proofreader specialised in the medical domain to proofread the texts after the medical practitioners had post-edited them. We confirmed all the changes made using the “track changes” functionality, and subsequently sent the proofreader the post-edited texts for revision (i.e. we did not provide her with the original Spanish text, nor did we explain the origin of those English texts). As we wanted to avoid over-editing, we also provided her with the following instructions: “The texts are written in English and we are looking for a surface revision, that is, pay attention to grammar, orthography, punctuation, syntax, and major stylistic problems. We would like the texts to read well enough to be submitted to a scientific conference, for example. The texts belong to the medical domain, and are all parts of scientific papers written by doctors.” In order to be able to analyse the proofreader’s edits, we requested that the “track changes” functionality in MS Word be used. We then proceeded to annotate these edits using the same typology that we had used to annotate the edits made by the medical practitioners. Although it is true that the typology was meant to be used for the annotation of post-edited texts, we deemed that a classification of essential and preferential edits was also applicable to a proofread text. Thus, we removed the translation dimension from the typology and focused only on the correctness of the text, using the same categories. This strategy allowed us to reply to our last research question: “How much editing is required by a professional proofreader on top of the post-edited documents and what type of edits are implemented?”.

5. Data Analysis

5.1. General Questionnaire Response

As explained in Section 4.1, we conducted a general survey (in Spanish) seeking to gather information as to how Spanish medical practitioners currently write their publications.

The gender spread was 28 female, 21 male and 1 undeclared. 18 of the respondents were between 20 and 30, 8 between 30 and 40, 4 between 40 and 50, 14 between 50 and 60, and 6 were older than 60. Most of the respondents were at the beginning of their careers and work in public hospitals. Although there were replies from most of the medical specialties, 20% of replies were from gynaecologists, 16% from cardiologists and another 16% from neurologists. 94% of the respondents indicated that their mother tongue is Spanish. For those who indicated a different mother tongue, all of them stated “Catalan”. 84% indicated that they speak English and the remaining 16% indicated that they did not. For self-assessment of English writing skills using the CEFR, only one indicated a C2 level, 4 a C1, 14 indicated a B2 and another 14 B1, 8 chose A2 and 1 A1. 74% of the respondents indicated that they have published scientific papers before and 26% indicated that they have never published. 76% (28 respondents) indicated that they had published papers in English and 24% (nine respondents) said they had no publications in English. The 28 respondents that indicated they had published in English were subsequently asked how those publications were drafted.3 Nine respondents (32%) indicated that they write directly in English and subsequently ask a colleague or friend with a better level of English to do the corrections. 29% (eight people) indicated that they write directly in English and self-revise, 25% (seven people) indicated that they write in Spanish and hire a professional translator, and another 25% (seven people) indicated that they write directly in English and subsequently hire a proofreader. Two people (7%) said that they hire a professional proofreader if they could do so, and another two acknowledged asking a colleague or friend who is a native speaker of English to do the proofreading. These figures support the claim that some EFL writers feel that they need to seek support from others in order to publish in English. This support is sought from colleagues and/or paid for through professional services. Of the 28 respondents that had published in English, 19 (68%) indicated they had used Machine Translation for writing academic papers and nine (32%) said they had not. Of those who said that they did not use MT, eight (89%) indicated as the main reason that they did not trust the quality.4 Two (22%) said that they did not know of the existence of MT, another two indicated that they have problems with terminology, and one indicated “other” and explained that for the type of texts they wrote they were confident enough in English and did not feel the need to use MT. Of those who had used MT, 17 people (89%) indicated they use MT services to check how something is expressed in English, while five (26%) said they used it after drafting a document in Spanish to obtain a preliminary English version they could subsequently post-edit. Though our questionnaire had a limited number of responses (50), it allowed us to confirm that some Spanish-speaking medical practitioners feel the need to rely on additional supports to aid them in producing articles in English, that some of them are using MT as a writing aid already, and that they rarely use it to translate full documents, but rather short passages of text or individual words.

5.2. Post-Task Questionnaire Response

All five participants in our experiment were also asked to fill in a short post-task questionnaire aiming at gathering information about their experience.

3 This question allowed respondents to select all options that applied to them.
4 This question allowed respondents to select all options that applied to them.

We first asked them which method for writing scientific publications in English they deemed easiest after participating in our experiment. 60% of them (three participants) chose the option they had just experienced, i.e. writing their publication in their mother tongue and subsequently post-editing an MT version of it. One participant indicated that s/he found it equally easy to write directly in English or to write publications using the proposed workflow, and the fifth participant indicated that s/he preferred to write his/her publications directly in English. When asked to comment on the difficulties experienced when correcting the MT output, one participant said that s/he had encountered problems with synonyms, and another that s/he had found the translations to be too literal. The third participant said that the MT output was good, the fourth stated that the MT output was better than his/her own English level and therefore s/he found it difficult to identify errors, and the last one said that s/he had encountered the expected issues: grammar problems, terminology, and words that change their meaning depending on their context and that had been translated wrongly. Despite their complaints and comments about the MT output, three out of the five participants rated the overall quality of the MT output at 3 on a scale from 1 to 4, and two gave it the maximum score. When asked to rank how likely they were to use the proposed workflow for writing scientific papers in the future on a scale from 1 to 4, four of the five participants replied “3”, while the fifth replied “2”. Three of them further indicated that they thought a second experience like this one would allow them to achieve a better overall quality, whereas two indicated “maybe”. Four of them also stated that, with practice, this type of task would become easier, while one said “maybe”.

5.3. Word Count Statistics

Participant                            P01    P02   P03    P04   P05   TOTAL

Number of words in ES                  1413   759   1058   908   686   4824
Number of words MT (EN)                1340   685   959    865   639   4488
Number of words MT+Self-PE (EN)        1364   677   945    857   646   4489
Number of words MT+Self-PE+REV (EN)    1389   685   934    859   611   4478

Table 2: Word counts per experimental condition

As stated earlier, we engaged five medical practitioners in our experiments and asked them to draft a paper or a section of a paper of around 750 words, or to send us a paper they had already written in Spanish and intended to translate into English. Table 2 offers a general overview of the number of words they originally wrote in Spanish as well as the breakdown of the word counts after each stage in the experiment.

5.4. Types of Edits

We aimed at identifying the type of edits that medical practitioners make when they engage in the self-post-editing process without any prior training in MT or post-editing, using the typology outlined in Section 4.2.3.

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 261 Figure 1 shows the edit rates per participant.5 Edits provoked by other edits were counted as a single edit in our analysis, as they would not have happened, if the first edit had not been made. 10.90% 12% 9.53% 10% 7.82% 8% 6% 2.95% 3.41% 4% 2% 0% P01 P02 P03 P04 P05

Figure 1: Edit rates per participant As indicated in Figure 1, P03 was the author who has the highest edit rate, followed by P01 and P04. A potential explanation for this may be related to their English level. Both P02 and P05 had a B1 level of English according to the test (P05 was actually between B1 and B2), and also reported a B1 in their self-assessment. The other participants, on the other hand, had a B2 or C1 level (according to the test, P04 was between B2 and C1, and P01 between C1 and C2). It could therefore be the case, that a lower level of English hampers the ability to post-edit. This was also hinted at by P02 who declared that the MT output outperformed his/her level of English.

[Figure 2: Types of edit per participant measured in edit rates; edit types: Essential, Essential + Introduced Error, Preferential, Preferential + Introduced Error, Essential not implemented]

Our second research question was: “Without any training in MT or post-editing, what type of edits do medical practitioners make when they write in Spanish and then machine translate into English and self-post-edit?”. If we break down the types of edits made (cf. Figure 2), our analysis shows that medical practitioners are able to identify and implement essential edits during post-editing without any prior training. P01 and P04, the two participants with the highest edit rates as per Figure 1, are precisely the two participants who made the highest rate of essential edits (5.87% and 4.67% respectively). However, P01 was also the participant who had the highest rate of essential edits not implemented (1.32%), followed by P03 and P04 (1.06% and 1.05% respectively). This additionally replies to our related question, “Are essential edits implemented or ignored?”. An interesting observation during the annotation was that in many cases the essential edits in the text were related to spelling and grammar, highlighting the importance of using spelling and grammar checkers as writing aids for non-native speakers of English.

5 By “edit rate” here we mean the number of edits implemented per 100 words of raw MT output, expressed as a percentage.
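As an illustration of the measure defined in footnote 5, the following sketch computes an edit rate from an edit count and a raw MT word count; the function name and the example figures are hypothetical, not taken from the study’s data.

```python
# Edit rate as defined in footnote 5: edits per 100 words of raw MT output,
# expressed as a percentage. The function name and example counts are illustrative.
def edit_rate(num_edits: int, raw_mt_words: int) -> float:
    return 100.0 * num_edits / raw_mt_words


# e.g. 105 hypothetical edits over 1,340 words of raw MT output -> about 7.8%
print(round(edit_rate(105, 1340), 2))  # 7.84
```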

It was also surprising to see how well Google Translate performed even with noisy text, as the original texts in Spanish contained some grammar and spelling errors. For example, típica (typical) was spelt tìpica; nuestro (our), nuesto; and excluida (excluded), excluída, and yet the MT system translated them correctly. In any case, it does seem to hold true that those participants with a higher level of English identified and implemented more essential edits than those with a lower English level. With regard to the subquestion, “How much non-essential (or preferential) editing is carried out?”, we observed that there is again a tendency to implement preferential edits too. In our small cohort, only one participant (P03) implemented more preferential than essential edits. The extent of edits implemented varies per individual, however, which has also been observed among professional translators who post-edit (e.g. de Almeida, 2013; Bundgaard, 2017). We found, for instance, that participants had different preferences regarding the use of technical versus colloquial terms, which was reflected in their edits. For example, P02 changed ‘axillae’ to ‘armpits’, whereas P05 seemed to prefer a more formal final text and changed expressions such as ‘hospital stay’ to ‘hospitalization time’. Our last related question addressed whether errors are introduced via self-post-editing. As with professional translators, medical practitioners also introduced some errors while editing, though the rates are relatively low. P01 was the participant who introduced the highest rate of errors when making essential edits (1.03%), followed by P03 (0.74%). P03 was also the participant who introduced the most errors when implementing preferential edits (1.59%), and as a result the one with the highest rate of introduced errors overall (2.33%). Further investigation is needed to identify the nature of the errors introduced and determine whether they could have been avoided (e.g. by means of spell and grammar checkers in the case of introduced typos). However, this might also have to do with the need for edits: within the medical domain, there exist several sub-domains and genres. This raises a new research question worth investigating in our future work: did Google Translate perform better in some sub-domains than others?

5.5. Professional Proofreading

[Figure 3: Edit rates per participant (proofreader vs. participants)]

We subsequently analysed the edits made by the professional proofreader on the texts already self-post-edited by our participants. Figure 3 shows the overall edit rates per participant. The edit rates of each medical practitioner are provided to allow for an easier comparison. As may be observed, the professional revision of the texts resulted in a higher edit rate in all cases, with P04’s text being the one that recorded the highest edit rate, followed by P02’s. This is an interesting finding, as P04 was precisely one of the participants with a higher level of English, which suggests that the English level is not necessarily correlated with the post-editing ability. In some cases, e.g. P02 and P05, the proofreader introduced a significantly higher number of edits than the participant, leading to edit rates that are more than five times those of the participants.

An obvious question is whether or not the proofreader’s edits were indeed necessary or were rather preferential and not strictly required. This is particularly relevant because, as mentioned earlier, we asked the proofreader to focus on a mere surface revision of the text. Similar to what we did with the texts undergoing self-PE, we also annotated all edits per type of edit. Figure 4 summarises the edit rates of the proofreader classified by type. It is striking how in some cases the rate of preferential edits made was as high as that for essential edits (P04), or even higher (P01, P03 and P05). Only in the case of P01 was the rate of preferential edits lower than that of essential edits. Surprisingly, the proofreader also introduced some errors in the text while implementing edits. It is interesting to note that the number of introduced errors is higher in the case of preferential edits than in the case of essential ones. In some cases, the error introduced may have been caused by the use of “track changes” (e.g. when correcting the spelling of “pacient”, she accidentally deleted the space between the word being corrected and the next: “the pacientpatientwas urgently…”), but the degree to which this influenced the editing process is difficult to gauge. In the case of P03, some of the errors introduced had to do with the bibliographical style, as the medical practitioner had opted for references between parentheses and our proofreader changed them to superscript.

[Figure 4: Types of edits by proofreader for each text; edit types: Essential, Essential + Introduced Error, Preferential, Preferential + Introduced Error, Essential not implemented]

This analysis of the professional proofreader’s edits allows us to answer our third research question: “How much editing is required by a professional proofreader on top of the post-edited documents and what type of edits are implemented?”. Indeed, the proofreader implemented a considerable number of edits. However, according to our typology, the proofreader also implemented a surprising number of preferential edits and even introduced some errors during the proofing process, though these were low in number. This seems to indicate that the proofreader is still required and that the post-editing process by our small cohort of medical practitioners did not render the text to a level of quality such that the proofreader thought that it required little to no editing. As an aside, this question also arises in professional practice, and the general practice is still to have a revision after post-editing, which indicates that our findings would not be out of line with normal machine translation workflows. Although we are not doing a comparison here between the number of required edits after post-editing versus the number of required edits to texts directly written in EFL, in a previous experiment we observed that the proofreader implemented more or less an equal number of edits on text that had been post-edited and text that had been written in EFL (O’Brien et al., forthcoming; Goulet et al., forthcoming). In future work it would be interesting to test if the same findings can be replicated in the medical domain.

6. Conclusion and Future Work

Here, we have reported on a small experiment seeking to explore the usefulness of MT as a writing aid for Spanish medical practitioners who need to publish their work in English. Thanks to the general survey we conducted, we found that Spanish-speaking medical practitioners are already using MT as a writing aid. However, they also showed mixed feelings about its usefulness. Some of the main criticisms had to do with the literalness of the MT output, incorrect use of synonyms, grammar, and the lack of terminology. This raises the question as to whether domain-tuned MT engines might solve some of these issues. Our analysis revealed that medical practitioners perform both essential and preferential edits (3.90% and 2.52% respectively, on average for all participants), and that the professional proofreader hired to proof the self-post-edited texts written by our participants also implemented both types of changes (7.75% and 9.02%). Surprisingly, in the case of the proofreader the rate of preferential edits was higher than that of the essential ones. This seems to agree with what has been observed in professional translation workflows, as demonstrated by Bundgaard (2017). In an investigation of professional translators’ edits during the “checking phase” of translations (translators checking their own work), Bundgaard (2017: 205) found the rate of preferential edits to be 43% on average for one text and 66% on average for a second text in her experiment, i.e. of all edits implemented for one text, 43% of them were deemed to be ‘preferential’. Bundgaard was also using de Almeida’s typology for assessing essential and preferential edits. Bundgaard (2017: 225) also measured the number of essential and preferential edits implemented by a third party during a ‘review phase’ (an independent translator checking another translator’s work), and these ranged from 37% on average for one text to 60% for the second text.

Our analysis of the edits made by medical practitioners and the subsequent engagement of a professional proofreader additionally sought to answer whether medical practitioners would be in a position to carry out self-post-editing without any prior training and whether they were able to achieve an acceptable quality text with MT. Overall, without training, these experts can implement essential edits, but they also implement preferential edits and introduce errors. This raises the question as to whether further training and practice would make medical practitioners better post-editors. At the same time, the proofreader’s intervention demonstrated that a substantial number of essential edits had not been implemented by the medical practitioners. Yet, the proofreader also implemented a high proportion of preferential edits, according to our typology. It is still to be determined whether the texts produced by our participants would have actually been considered acceptable for publication or presentation at medical conferences where non-native speakers of English also present their work. In future work we plan to engage native speakers to assess this. We may consider, for example, asking them to rank the post-edited version against the proofread version to ascertain whether, and to what extent, the proofread version is acceptable, as well as whether, and to what extent, it is superior to the post-edited version. Similarly to what we did in Goulet et al. (forthcoming), we also plan to carry out a second round of annotation in which we will annotate the type of edit made (insert, delete, move, replace), the type of language unit affected in each case (noun, verb, preposition, etc.), and the linguistic dimension involved (morphology, syntax, semantics, etc.). This will allow us to analyse the edits further, make comparisons across the edits made by the experts and the professional proofreader, and determine whether automatic post-editing could be used to enhance the text prior to the self-post-editing process.

To sum up, our results, while demonstrating that the medical practitioners were capable of post-editing their own texts to some degree, do not seem to indicate that they could produce their final drafts of scientific papers under the current experimental setup, i.e. with a generic engine, no automated post-editing rules and no intervention by a proofreader. However, the small cohort engaged in our experiment (five participants) does not allow us to draw a general conclusion. This experiment helped us to identify several avenues to improve our experimental setup, and we will endeavour to address the issues identified and answer these new questions in our future work.

Acknowledgements

This research is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567, and Science Foundation Ireland in the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University.

References

Benfield, J. R. and Feak, C. B. (2006). How authors can cope with the burden of English as an international language. Chest, 129(6), 1728-1730.

Bennett, K. (2013). English as a lingua franca in academia. The Interpreter and Translator Trainer 7/2, 169-193.

Bennett, K. (2014). The political and economic infrastructure of academic practice: the ‘semi-periphery’ as a category for social and linguistic analysis. In Bennett, K. (dir.), The semi-periphery of academic writing: Discourses, communities and practices. London: Palgrave Macmillan, 1-12.

Bennett, K. (2015). Towards an epistemological monoculture: mechanisms of epistemicide in European research publication. In Plo, R. and C. Pérez-Llantada (dir.), English as an academic and research language (English in Europe Vol. 2). De Gruyter Mouton: Berlin, 9-35.

Breuer, E. O. (2015). First language versus foreign language. Fluency, errors and revision processes in foreign language academic writing. Frankfurt am Main: Peter Lang Edition.

Bundgaard, K. (2017). (Post-)Editing - A workplace study of translator-computer interaction at Textminded Danmark A/S. PhD Thesis. Aarhus University, Denmark.

De Almeida, G. (2013). Translating the post-editor: An investigation of post-editing changes and correlations with professional experience across two Romance languages. Unpublished PhD thesis, Dublin City University.

Garcia, I., and Pena, M. I. (2011). Machine translation-assisted language learning: Writing for beginners. Computer Assisted Language Learning, 24(5), 471-487. doi: 10.1080/09588221.2011.582687.

Goulet, M.J., Simard, M., Parra Escartín, C. and O’Brien, S (forthcoming). La traduction automatique comme outil d’aide à la rédaction scientifique en anglais langue seconde : résultats d’une étude exploratoire sur la qualité linguistique. Anglais de Spécialité (ASp).

Lillis, T. and Curry, M. J. (2010). Academic writing in a global context: The politics and practices of publishing in English. London and New York: Routledge.

LISA (Localization Industry Standards Association). (2004). LISA QA Model 3.0. Product Documentation. Féchy, Switzerland: LISA.

Niño, A. (2008). Evaluating the use of machine translation post-editing in the foreign language class. Computer Assisted Language Learning, 21(1), 29-49.

NIST (National Institute of Standards and Technology) Information Access Division/Speech Group and Linguistic Data Consortium (LDC). (2007). Post Editing Guidelines for GALE Machine Translation Evaluation. Version 3.0.2.

O’Brien, S., Simard, M. and Goulet, M. J. (forthcoming, 2018). Machine Translation and Self-Post-Editing for Academic Writing Support: Quality Explorations. In: Moorkens, J., Castilho, S., Doherty, S. and Gaspari, F., Springer Series on Machine Translation. Springer.

O’Neill, E. M. (2012). The effect of online translators on L2 writing in French, PhD thesis, University of Illinois at Urbana-Champaign, USA. http://hdl.handle.net/2142/34317. Accessed May 8 2017.

Skelton, J. R. and Edwards, S. J. L. (2000). The function of the discussion section in academic medical writing. BMJ: British Medical Journal, 320(7244), 1269-1270.

Willey, I., and Tanimoto, K. (2015). “We’re drifting into strange territory here”: What think-aloud protocols reveal about convenience editing. Journal of Second Language Writing, 27, 63-83. doi: 10.1016/j.jslw.2014.09.010.

A Multilingual Parallel Corpus for Improving Machine Translation on Southeast Asian Languages

Hai-Long Trieu, Le-Minh Nguyen {trieulh, nguyenml}@jaist.ac.jp
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan

Abstract

Current machine translation systems require large bilingual corpora as training data. With large bilingual corpora, phrase-based and neural-based methods can achieve state-of-the-art performance. Nevertheless, such large bilingual corpora are unavailable for most language pairs, the so-called low-resource languages, which creates a bottleneck for the development of machine translation for those languages. The Southeast Asian region has a large population of more than five hundred million people and several languages that are widely used in the world, but there is little parallel data for these language pairs. In this work, we built a multilingual parallel corpus for several Southeast Asian languages. Wikipedia articles’ titles and inter-language link records were used to extract parallel titles. Parallel articles were collected based on the parallel titles. For each article pair, parallel sentences were extracted with a sentence alignment method based on sentence lengths and word correspondences. A multilingual parallel corpus was built with more than 1.1 million parallel sentences covering ten language pairs among Indonesian, Malay, Filipino, and Vietnamese, and these languages paired with English. Experiments conducted on the Asian Language Treebank corpus showed promising performance. Additionally, the corpus was utilized for the IWSLT 2015 machine translation shared task on English-Vietnamese and achieved significant improvements of +1.7 BLEU points on phrase-based systems and +4.5 BLEU points on a state-of-the-art neural-based system. The corpus can be used to improve machine translation and support the development of machine translation for the low-resource Southeast Asian languages.

1 Introduction

Current machine translation (MT) systems require large bilingual corpora as training data. With large bilingual corpora of up to millions of parallel sentences, MT systems achieve state-of-the-art performance with both phrase-based (Bojar et al., 2013) and neural-based (Sennrich et al., 2016a) methods. Such large bilingual corpora are available for several language pairs such as English-German, English-French, Czech-English, and Chinese-English. For low-resource language pairs, which make up most of the languages in the world (Irvine, 2013; Wang et al., 2016), only small bilingual corpora are available. This causes a bottleneck for MT on such language pairs. In order to overcome the problem, previous work has made efforts to build bilingual corpora from the web (Utiyama and Isahara, 2003; Li and Liu, 2008; Cettolo et al., 2012). Parallel corpora can also be extracted from comparable data such as Wikipedia (Ștefănescu and Ion, 2013; Chu et al., 2015). This previous work contributed to automatically building bilingual corpora for several low-resource language pairs.

For Southeast Asian languages, there are few bilingual corpora, although the region has a large population of more than five hundred million people and several of its languages are among the most widely used in the world, such as Indonesian (ranked 12) and Vietnamese (ranked 17) (Weber, 2008). This causes an issue for the development of machine translation on these language pairs. In this work, we built a multilingual parallel corpus to improve machine translation for Southeast Asian languages, for which there are no large bilingual corpora. Parallel titles of Wikipedia articles were extracted based on the articles’ titles and inter-language link records from the Wikipedia database. Parallel articles were collected based on the parallel titles. Then, parallel sentences were aligned with a sentence alignment method that combines sentence lengths and word correspondences. A multilingual parallel corpus was built for several low-resource Southeast Asian languages, comprising more than 1.1 million parallel sentences of ten language pairs between Indonesian, Filipino, Malay, Vietnamese and these languages paired with English. Machine translation experiments were conducted on the Asian Language Treebank corpus (Thu et al., 2016). Experimental results showed that using the extracted corpus to build machine translation systems can achieve promising results even though no direct bilingual corpora are available. Furthermore, experiments were conducted on the IWSLT 2015 machine translation shared task (Cettolo et al., 2015) using the extracted corpus for English-Vietnamese with phrase-based and neural-based machine translation systems. Experimental results showed that using the extracted corpus achieved significant improvements in both phrase-based and neural-based systems. The corpus can be used to improve machine translation performance and enhance the development of machine translation for the Southeast Asian languages. We released the extracted corpus and the code to build it, which are available at the repository.1 We briefly discuss related work in Section 2. The procedures to build the corpus are described in detail in Section 3. The statistics of the extracted corpus are presented in Section 4. In order to effectively utilize the corpus, we present several strategies to exploit it for machine translation in Section 5. Experiments are described in Section 6 to evaluate and utilize the corpus. Conclusions are drawn in Section 7.

2 Related Work

Building parallel corpora from the web has been explored for a long time; one of the first works is presented in Resnik (1999). In order to extract parallel documents from the web, Li and Liu (2008) used the similarity of URLs and page content. Utiyama and Isahara (2003) used matching documents to build parallel data. Meanwhile, Koehn (2005) used manual involvement for building a multilingual parallel corpus. In the work of Cettolo et al. (2012), a multilingual corpus was built from subtitles of the TED talks website. Collecting parallel data from Wikipedia has also been investigated in some previous work. In the work of Kim et al. (2012), parallel sentences are extracted from Wikipedia for the task of multilingual named entity recognition. In Ștefănescu and Ion (2013), parallel corpora are extracted from Wikipedia for English, German, and Spanish. A recent work by Chu et al. (2015) extracts parallel sentences and then uses an SVM classifier with several features to filter them. For the Southeast Asian languages, there are few bilingual corpora. A multilingual parallel corpus was built manually in Thu et al. (2016). The corpus is a valuable resource for the languages. Nevertheless, because the corpus is still small, with only 20,000 multilingual sentences, and manually building parallel corpora requires a high cost in human annotation, automatically extracting large bilingual corpora becomes an essential task for the development of natural language processing for these languages, including cross-language tasks like machine translation.

1https://github.com/nguyenlab/Multi-Wiki

processing for these languages, including cross-language tasks such as machine translation. In our work, we built a multilingual parallel corpus for several Southeast Asian languages. The corpus was built from Wikipedia's parallel articles, which were collected using the articles' titles and inter-language link records. Parallel sentences were extracted with the sentence alignment algorithm of Moore (2002). The corpus was then used to improve machine translation for low-resource Southeast Asian languages, a task that, to the best of our knowledge, has not been investigated before.

3 Methods

Wikipedia is a large, freely accessible resource that contains articles in many languages. It is a form of comparable data in which many articles cover the same topics in different languages, and it can be exploited to build bilingual corpora, especially for low-resource language pairs. To build a bilingual corpus from Wikipedia, we first extracted parallel titles of Wikipedia articles. Pairs of articles were then crawled based on the parallel titles. Finally, sentences in the article pairs were aligned to extract parallel sentences. We describe these steps in more detail in this section.

3.1 Extracting Parallel Titles

The content of Wikipedia can be obtained from its database dumps.2 To extract parallel titles of Wikipedia articles, we used two resources for each language from the Wikipedia database dumps: the articles' titles and IDs in a particular language (the file ending with -page.sql.gz) and the inter-language link records (the file ending with -langlinks.sql.gz).
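For illustration, the sketch below shows one way this extraction step could be implemented in Python. The dump file names follow the naming convention above, but the simplified regular expressions over the SQL INSERT statements, the namespace filtering, and all identifiers are assumptions; a production version would need a proper SQL tuple parser.

```python
# Hypothetical sketch: extract en-id parallel titles from Wikipedia SQL dumps.
# The regexes are simplified; a robust SQL tuple parser is needed in practice.
import gzip
import re

PAGE_ROW = re.compile(r"\((\d+),(\d+),'((?:[^'\\]|\\.)*)'")            # (id, namespace, title, ...
LANGLINK_ROW = re.compile(r"\((\d+),'([a-z\-]+)','((?:[^'\\]|\\.)*)'\)")  # (from_id, lang, title)

def load_titles(page_dump):
    """Map page_id -> title for main-namespace (ns=0) articles."""
    titles = {}
    with gzip.open(page_dump, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            for page_id, ns, title in PAGE_ROW.findall(line):
                if ns == "0":
                    titles[int(page_id)] = title.replace("_", " ")
    return titles

def extract_pairs(page_dump, langlinks_dump, target_lang):
    """Yield (source_title, target_title) pairs via inter-language link records."""
    titles = load_titles(page_dump)
    with gzip.open(langlinks_dump, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            for from_id, lang, target_title in LANGLINK_ROW.findall(line):
                if lang == target_lang and int(from_id) in titles:
                    yield titles[int(from_id)], target_title

if __name__ == "__main__":
    for src, trg in extract_pairs("enwiki-page.sql.gz", "enwiki-langlinks.sql.gz", "id"):
        print(f"{src}\t{trg}")
```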

No.  Data   page (KB)   langlinks (KB)
1    en     1,477,861          280,617
2    vi        92,541          111,420
3    id        57,921           72,117
4    ms        21,791           56,173
5    fil        5,907           23,446

Table 1: Wikipedia database dump resources for extracting parallel titles; page (KB): the size of the articles' IDs and titles for the language; langlinks (KB): the size of the inter-language link records; we collected the resources for the languages en (English), id (Indonesian), fil (Filipino), ms (Malay), and vi (Vietnamese); we used the database dumps updated on 2017-01-20.

We aim to build a multilingual parallel corpus for several low-resource Southeast Asian languages, namely Indonesian, Malay, Filipino, and Vietnamese, for which few bilingual corpora exist. Bilingual corpora pairing these languages with English are also important resources for further research, including machine translation. We therefore collected the Wikipedia database dumps of five languages: English, Indonesian, Malay, Filipino, and Vietnamese. Table 1 presents the Wikipedia database dumps used to extract parallel titles. The English database contains much more information, in both the articles' titles and the inter-language link records, whereas the Filipino database is much smaller, which affects the number of extracted parallel titles as well as the number of finally extracted parallel sentences. The extracted parallel titles are presented in Table 2.

2https://dumps.wikimedia.org/backup-index.html

No.  Data    Title pairs   Crawled Src Art.   Crawled Trg Art.   Art. Pairs    Src Sent.    Trg Sent.
1    en-id       198,629            197,220            190,954      150,759    4,646,453      990,661
2    en-fil       52,749             51,698             51,157       50,021    3,428,599      367,276
3    en-ms       204,833            201,688            199,950      160,709    2,158,726      320,624
4    en-vi       452,415            433,124            436,488      420,919   12,130,133    3,831,948
5    id-fil       30,313             29,961             24,946       22,760      502,457      254,216
6    id-ms        98,305             88,028             89,936       68,676      452,604      403,807
7    id-vi       159,247            149,974            128,530      121,673    1,201,848    1,878,855
8    fil-ms       25,231             21,856             25,023       21,135      202,851      243,361
9    fil-vi       36,186             30,540             35,625       28,830      267,453      723,155
10   ms-vi       133,651            118,647            116,620      105,692      560,042    1,256,468

Table 2: Extracted and processed data from parallel titles; Crawled Src Art. (Crawled Trg Art.): the number of crawled source (target) articles using the title pairs for each language pair; Art. Pairs: the number of parallel articles processed after crawling; Src Sent. (Trg Sent.): the number of source (target) sentences in the article pairs after preprocessing (removing noisy characters, empty lines, sentence splitting, word tokenization).

3.2 Collecting and Preprocessing Parallel Articles

After the parallel titles of Wikipedia articles were extracted, we collected the article pairs using these titles with a Java crawler that we implemented. The collected data was then processed in hierarchical steps, from articles to sentences and then to the word level. First, noisy characters were removed from the articles. Then, for each article, the sentences in each paragraph were split so that there is one sentence per line. Finally, each sentence was tokenized so that words are separated from punctuation. The sentence splitting and word tokenization steps were performed with the Moses scripts.3 As shown in Table 2, using the title pairs we obtained a high ratio of successfully crawled articles. For instance, from the 198k English-Indonesian title pairs, we successfully crawled 197k English articles and 190k Indonesian articles, i.e., cases in which an article actually existed for the title. We note this because a title sometimes points to a non-existent article, which produces an error during crawling. For Indonesian-Vietnamese, although 159k parallel titles were extracted, we obtained only 128k Vietnamese articles; more than 30k titles in this set led to errors or non-existent articles.
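A minimal sketch of this preprocessing chain is shown below, assuming a local Moses checkout; the script paths, language codes, and the rough noise filter are placeholders rather than the exact pipeline used to build the corpus.

```python
# Hypothetical sketch of the article-level preprocessing chain (noise removal,
# sentence splitting, tokenization) using the Moses scripts referenced above.
import re
import subprocess

MOSES = "/path/to/mosesdecoder/scripts"  # placeholder path

def clean(text):
    """Drop control characters and collapse whitespace (a very rough noise filter)."""
    text = re.sub(r"[\x00-\x08\x0b-\x1f]", " ", text)
    return re.sub(r"[ \t]+", " ", text).strip()

def run_perl(script, args, text):
    """Pipe text through one of the Moses perl scripts."""
    proc = subprocess.run(["perl", script] + args, input=text,
                          capture_output=True, text=True, check=True)
    return proc.stdout

def preprocess_article(text, lang):
    text = clean(text)
    # one sentence per line
    text = run_perl(f"{MOSES}/ems/support/split-sentences.perl", ["-l", lang], text)
    # separate words from punctuation
    text = run_perl(f"{MOSES}/tokenizer/tokenizer.perl", ["-l", lang], text)
    return [line for line in text.splitlines() if line.strip()]

print(preprocess_article("Jakarta is the capital of Indonesia. It is on Java.", "en"))
```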

3.3 Aligning Parallel Sentences

Sentence alignment is an essential step in building parallel corpora. Among the three main approaches to sentence alignment, namely length-based methods based on the number of words or characters (Brown et al., 1991; Gale and Church, 1993), word-based methods based on word correspondences (Kay and Röscheisen, 1993; Chen, 1993; Wu, 1994; Melamed, 1996; Ma, 2006), and hybrid methods that combine length-based and word-based information (Moore, 2002; Varga et al., 2007), the hybrid method of Moore (2002) achieved the best performance in the evaluation of Singh and Husain (2005). In our work, for each parallel article pair, we aligned sentences using the Microsoft bilingual sentence aligner (Moore, 2002). There are several reasons to adopt the hybrid method for this task. First, the length-based method has been applied successfully to closely related languages such as English-French; however, the Southeast Asian languages considered here, including Indonesian, Malay,

3https://github.com/moses-smt/mosesdecoder/tree/master/scripts/tokenizer

Vietnamese, and Filipino, as well as their pairings with English, are not closely related languages, with the exception of Indonesian-Malay. Second, since bilingual Wikipedia articles are comparable rather than parallel data, they vary greatly in the number of sentences per article and in the number of words per sentence pair. Therefore, we adopted the hybrid method, which combines length-based and word-correspondence information, to extract the parallel corpus. Let l_s and l_t be the lengths of the source and target sentences, respectively. The target length l_t is assumed to follow a Poisson distribution given l_s:

P(l_t | l_s) = ((l_s r)^{l_t} / l_t!) · exp(-l_s r)    (1)

where r is the ratio of the mean length of target sentences to the mean length of source sentences. As in the method of Moore (2002), a length-based phase based on this Poisson distribution produces an initial set of aligned sentence pairs. The sentence pairs extracted in the length-based phase are then used to train IBM Model 1 (Brown et al., 1993) and build a bilingual dictionary. The dictionary is then combined with the length-based model to produce the final alignments, as described in Equation 2:

P(s, t) = (P_{1-1}(l_s, l_t) / (l_s + 1)^{l_t}) · ( Π_{j=1..l_t} Σ_{i=0..l_s} tr(t_j | s_i) ) · ( Π_{i=1..l_s} f_u(s_i) )    (2)

where tr(t_j | s_i) is the word translation probability of the pair (t_j, s_i) trained by IBM Model 1, and f_u is the observed relative unigram frequency of a word in the text of the corresponding language.

Challenges in aligning Wikipedia articles As discussed above, because Wikipedia is comparable rather than parallel data, article pairs vary greatly in the number and length of their sentences. Furthermore, in some article pairs the two language versions differ considerably in content, priorities, interests, and the biases of the authors, groups, or countries involved. Such differences make aligning Wikipedia articles into a parallel corpus challenging. For this first version of the corpus, we applied the hybrid sentence alignment method to extract sentence pairs without any additional strategy for filtering or selecting parallel sentences. We plan to analyse these challenges further and to develop strategies that improve the quality of the corpus in future work. The method of Munteanu and Marcu (2006), which extracts parallel sub-sentential fragments from comparable data, could be utilized for this purpose.
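The sketch below illustrates how the length-based score of Equation (1) can be computed in practice; the ratio r and the example lengths are purely illustrative values, not estimates from our corpus.

```python
# Minimal sketch of the length-based score in Equation (1): the probability of a
# target length given a source length under a Poisson model with mean ls * r,
# where r is the ratio of mean target to mean source sentence length.
import math

def length_prob(ls, lt, r):
    """P(lt | ls) = (ls*r)^lt / lt! * exp(-ls*r), computed in log space for stability."""
    mean = ls * r
    log_p = -mean + lt * math.log(mean) - math.lgamma(lt + 1)
    return math.exp(log_p)

# r would be estimated from the corpus; 1.15 is an illustrative value only
r = 1.15
print(length_prob(20, 23, r))   # plausible length pair -> relatively high probability
print(length_prob(20, 60, r))   # very unbalanced pair  -> near zero
```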

4 Extracted Corpus

We obtained a multilingual parallel corpus of ten language pairs among the Southeast Asian languages and these languages paired with English, as described in Table 3. In total, the corpus contains more than 1.1 million parallel sentence pairs, which is valuable given that bilingual corpora are unavailable for most of these language pairs. Large bilingual corpora were extracted for some pairs, such as English-Vietnamese (408k parallel sentences) and English-Indonesian (234k parallel sentences). However, because of the smaller number of input parallel articles for several language pairs, far fewer parallel sentences were extracted for pairs such as Indonesian-Filipino (9k) and Filipino-English (22k). Furthermore, we extracted monolingual data sets for Indonesian, Malay, Filipino, and Vietnamese, for which such data is largely unavailable publicly. These data sets are described in Table 4. Large monolingual data sets were obtained, such as Indonesian (3.1 million sentences), Malay (1.5 million sentences), and Vietnamese (up to 7.6 million sentences). They are useful for such low-resource languages, for example for training language models.

No.    Data     Sent. Pairs   Src Words    Trg Words    Src Vocab.   Trg Vocab.
1      en-id        234,380    4,648,359    4,359,976      208,920      209,859
2      en-fil        22,758      447,719      399,058       42,670       44,809
3      en-ms        198,087    3,273,943    3,221,738      156,806      148,133
4      en-vi        408,552    7,229,963    8,373,549      274,178      222,068
5      id-fil         9,952      132,097      172,363       18,531       19,737
6      id-ms         83,557    1,464,506    1,447,247       87,240       92,126
7      id-vi         76,863    1,014,351    1,136,710       67,211       57,788
8      fil-ms         4,919       78,729       66,324       10,184       10,671
9      fil-vi        10,418      141,135      151,086       15,641       13,071
10     ms-vi         65,177      928,205      896,784       60,574       52,673
Total             1,114,663            –            –            –            –

Table 3: Extracted Southeast Asian multilingual parallel corpus

Data set   Sentences   Vocab.    Size (KB)
id         3,147,570   917,861         369
fil        1,034,215   252,565         113
ms         1,527,834   599,396         172
vi         7,690,426   936,137       1,033

Table 4: Monolingual data sets

5 Domain Adaptation

The question now is how to utilize the corpus effectively: if bilingual corpora already exist for a language pair, which strategies can be used to combine them with our corpus and take full advantage of it? In this section we discuss domain adaptation strategies for combining bilingual corpora. We assume that, for a given language pair, there exists a bilingual corpus, which we call the direct corpus. The corpus extracted from Wikipedia is used as an additional resource, which we call the alignment corpus. In phrase-based machine translation (Koehn et al., 2003), a bilingual corpus is used to train a phrase table. We used the direct corpus and the alignment corpus to train two phrase tables, referred to as the direct and alignment components. The two components were combined by linear interpolation, as described in Equation 3.

p(t|s) = λ_d · p_d(t|s) + λ_a · p_a(t|s)    (3)

where p_d(t|s) and p_a(t|s) are the translation probabilities of the direct and alignment models, respectively, and λ_d and λ_a are interpolation weights with λ_d + λ_a = 1. We adopted the linear interpolation approach of Sennrich (2012), a robust method for the weighted combination of translation models. Specifically, we used two strategies, referred to as tune and weights.

• tune: a tuning set was used; λ_d and λ_a were set to the weights that minimize cross-entropy on the tuning set, using the setting combine given tuning set (Sennrich, 2012).4

• weights: the two translation models were first used separately to decode the tuning set, yielding two BLEU scores. The interpolation weights were then set to the ratio of the two BLEU scores, using the setting combine given weights (Sennrich, 2012).

4https://github.com/moses-smt/mosesdecoder/tree/master/contrib/tmcombine
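As a rough illustration of Equation (3), the sketch below interpolates the translation probabilities of two Moses-style phrase tables; the score column index and the fixed weights are assumptions for illustration, whereas tmcombine interpolates all phrase-table features and estimates the weights as described above.

```python
# Hedged sketch of the linear interpolation in Equation (3) applied to two phrase
# tables (the direct and the alignment/Wikipedia components). Only one score
# column is interpolated here; adjust the index to the column holding p(t|s)
# in your table, since Moses phrase tables carry four translation scores.
from collections import defaultdict

def read_phrase_table(path, score_index=0):
    """Each line: src ||| trg ||| score1 score2 score3 score4 ..."""
    table = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            src, trg, scores = line.split(" ||| ")[:3]
            table[src][trg] = float(scores.split()[score_index])
    return table

def interpolate(direct, alignment, lambda_d=0.7, lambda_a=0.3):
    """p(t|s) = lambda_d * p_d(t|s) + lambda_a * p_a(t|s), with lambda_d + lambda_a = 1."""
    merged = defaultdict(dict)
    for src in set(direct) | set(alignment):
        for trg in set(direct.get(src, {})) | set(alignment.get(src, {})):
            merged[src][trg] = (lambda_d * direct.get(src, {}).get(trg, 0.0)
                                + lambda_a * alignment.get(src, {}).get(trg, 0.0))
    return merged
```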

6 Experiments on Machine Translation

The parallel corpus extracted from Wikipedia was then used to train SMT models. We aim to exploit the data to improve SMT for low-resource languages.

6.1 SMT on the Asian Language Treebank Parallel Corpus

6.1.1 Training Data

We evaluate the corpus with SMT experiments. For development and test data, we used the ALT corpus (Asian Language Treebank Parallel Corpus; Thu et al., 2016), which contains 20K multilingual sentences in English, Japanese, Indonesian, Filipino, Malay, Vietnamese, and several other Southeast Asian languages. From the ALT corpus we extracted 2k sentence pairs for the development sets and 2k sentence pairs for the test sets.

6.1.2 Training Details

We trained SMT models on the parallel corpus using the Moses toolkit (Koehn et al., 2007). Word alignment was trained with GIZA++ (Och and Ney, 2003) using the grow-diag-final-and heuristic. A 5-gram language model of the target language was trained with KenLM (Heafield, 2011). Tuning was performed with batch MIRA (Cherry and Foster, 2012). For evaluation, we used BLEU (Papineni et al., 2002) computed with the multi-bleu.perl script; the development sets, test sets, and scripts used to compute the BLEU scores are also available in the repository of this paper.
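For concreteness, the sketch below shows two of these steps, language model training with KenLM's lmplz and scoring with multi-bleu.perl, driven from Python; the file names are placeholders, and the word alignment, phrase extraction, and MIRA tuning steps are run with the standard Moses scripts and are not shown.

```python
# Hedged sketch of two pipeline steps: 5-gram LM training and BLEU scoring.
import subprocess

def train_lm(corpus_trg, arpa_out, order=5):
    """lmplz reads the tokenized target-side corpus on stdin and writes an ARPA LM."""
    with open(corpus_trg, "rb") as src, open(arpa_out, "wb") as out:
        subprocess.run(["lmplz", "-o", str(order)], stdin=src, stdout=out, check=True)

def bleu(reference, hypothesis, multi_bleu="multi-bleu.perl"):
    """multi-bleu.perl takes the reference file as an argument and the hypothesis on stdin."""
    with open(hypothesis, "rb") as hyp:
        out = subprocess.run(["perl", multi_bleu, reference],
                             stdin=hyp, capture_output=True, check=True)
    return out.stdout.decode()

train_lm("train.tok.vi", "lm.vi.arpa")          # placeholder file names
print(bleu("test.tok.vi", "test.decoded.vi"))
```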

6.1.3 Results

Table 5 presents the experimental results on the development and test sets. Notably, the SMT models trained on the bilingual data aligned from Wikipedia produce promising results.

No.   Language Pair   Dev (L1-L2)   Test (L1-L2)   Dev (L2-L1)   Test (L2-L1)
1     en-id                 30.56          28.87          30.14          29.01
2     en-fil                18.54          19.08          18.98          19.89
3     en-ms                 29.85          33.23          28.87          23.82
4     en-vi                 30.58          34.42          23.01          22.56
5     id-fil                11.36          11.04           9.58           9.77
6     id-ms                 31.64          30.21          31.56          30.11
7     id-vi                 21.85          22.42          17.41          17.45
8     fil-ms                 7.43           8.02           8.70           9.27
9     fil-vi                 5.97           6.69           6.45           7.15
10    ms-vi                 15.51          18.12          11.96          13.88

Table 5: Experimental results on the development and test sets (BLEU). Dev (L1-L2) and Test (L1-L2) give the scores on the development and test sets for translation from the first language (L1) to the second language (L2) of a pair (e.g., fil to ms for fil-ms); Dev (L2-L1) and Test (L2-L1) give the scores for the inverse direction (e.g., ms to fil).

On the development sets, we achieved promising results with high BLEU scores, for instance for the Indonesian-Malay pair (31.64 BLEU points for Indonesian-Malay and 31.56 for Malay-Indonesian). Several other language pairs also showed high BLEU scores: English-Vietnamese (30.58 and 23.01 BLEU points in the two directions), English-Malay (29.85

and 28.87 BLEU points), English-Indonesian (30.56 and 30.14 BLEU points), and Indonesian-Vietnamese (21.85 and 17.41 BLEU points). The language pairs with high scores are those with large numbers of sentence pairs, for instance English-Vietnamese (408k), English-Indonesian (234k), and English-Malay (198k). Conversely, because the extracted corpora for several pairs involving Filipino are small, such as Indonesian-Filipino (9.9k sentence pairs), Malay-Filipino (21.1k sentence pairs), and Vietnamese-Filipino (10.4k sentence pairs), these pairs show much lower performance: Indonesian-Filipino (11.36 and 9.58 BLEU points), Malay-Filipino (8.70 and 7.43 BLEU points), and Vietnamese-Filipino (6.45 and 5.97 BLEU points). Similarly, on the test sets, the language pairs with large bilingual corpora achieved high performance: English-Indonesian (28.87 and 29.01 BLEU points), English-Malay (33.23 and 23.82 BLEU points), and English-Vietnamese (34.42 and 22.56 BLEU points), while the pairs involving Filipino performed much worse: Indonesian-Filipino (11.04 and 9.77 BLEU points), Malay-Filipino (9.27 and 8.02 BLEU points), and Vietnamese-Filipino (7.15 and 6.69 BLEU points).

Figure 1: Experimental results on the development and test sets; the corpus’s size is presented for each language pair (fil-ms 4919: the Filipino-Malay corpus with 4,919 parallel sentences)

Figure 1 presents the experimental results on the development and test sets, which vary along

several aspects: the translation direction (L1-L2, L2-L1), the corpus size, and the language pair. Several interesting findings emerge from the charts. First, the larger the corpus, the higher the BLEU scores. The language pairs are sorted by increasing corpus size from left to right. For instance, the corpora of Filipino-Malay (4.9k), Indonesian-Filipino (9.9k), and Filipino-Vietnamese (10.4k) are much smaller than those of Indonesian-Malay (83.5k), English-Indonesian (234k), and English-Vietnamese (408k), and the BLEU scores reflect this split between the two groups: the former score below or around 10 BLEU points, the latter around 25-30 BLEU points. Second, regarding the translation direction (L1-L2, L2-L1), the scores of the two directions within a language pair are similar in most cases, for instance English-Indonesian (30.56 and 30.14 BLEU points in the two directions on the development set, 28.87 and 29.01 on the test set) and Indonesian-Malay (31.64 and 31.56 BLEU points on the development set, 30.21 and 30.11 on the test set). For Vietnamese, however, translating into Vietnamese scores considerably higher than translating from Vietnamese in most cases, for instance Malay-Vietnamese (15.51 BLEU points for ms-vi vs. 11.96 for vi-ms on the development set, 18.12 vs. 13.88 on the test set), Indonesian-Vietnamese (21.85 vs. 17.41 BLEU points on the development set, 22.42 vs. 17.45 on the test set), and English-Vietnamese (30.58 vs. 23.01 BLEU points on the development set, 34.42 vs. 22.56 on the test set). This imbalance between the two translation directions for language pairs involving Vietnamese, as well as for some other pairs, should be investigated further.

6.2 Evaluation on the IWSLT 2015 Machine Translation Shared Task

In this section, we evaluate the extracted corpus on the English-Vietnamese task of the IWSLT 2015 machine translation shared task. We aim to determine whether the Wikipedia corpus can improve baseline systems for this task. In addition, we conducted various experiments with the domain adaptation strategies, statistical machine translation, and neural machine translation using the Wikipedia corpus to explore the most effective ways of exploiting it.

6.2.1 Training Data

Data                       Sentences    Src Words    Trg Words    Src Vocab.   Trg Vocab.
constrained                  131,019     2,534,498    2,373,965       50,118       54,565
unconstrained                456,350     8,485,112    8,132,913      114,161      124,846
constrained+Wikipedia        538,981     9,710,389    9,017,601      288,785      345,839
unconstrained+Wikipedia      864,312    15,661,003   14,776,549      338,424      403,581
tst2012                        1,581        28,773       27,101        3,713        3,958
tst2013                        1,304        28,036       27,264        3,918        4,316
tst2015                        1,080        20,844       19,951        3,175        3,528

Table 6: Data sets on the IWSLT 2015 experiments; Src Words (Trg Words): the number of words in the source (target) side of the corpus; Src Vocab. (Trg Vocab.): the vocabulary size in the source (target) side of the corpus

We used the data sets provided by the International Workshop on Spoken Language Translation (IWSLT 2015) machine translation shared task (Cettolo et al., 2015), which include training, development, and test sets extracted from subtitles of TED talks.5 For

5https://www.ted.com/talks

the training data, the shared task provides the so-called constrained data set of 131k parallel sentences. The workshop also provides development and test sets: tst2012, tst2013, and tst2015. In all experiments, we used tst2012 as the development set and tst2013 and tst2015 as the test sets. In addition, we used two other data sets as training data: the corpus of the national VLSP project (Vietnamese Language and Speech Processing)6 and the EVBCorpus (Ngo et al., 2013). These two data sets were merged with the constrained data to obtain a larger training set called the unconstrained data. The training, development, and test sets are described in Table 6.

6.2.2 Training Details

We trained translation systems using two methods: SMT and NMT.

Statistical Machine Translation To train the SMT models, we used the well-known Moses toolkit (Koehn et al., 2007). GIZA++ (Och and Ney, 2003) was used to train the word alignment. For the language model, we used KenLM (Heafield, 2011) to train 5-gram language models on the target (Vietnamese) side of the training data sets. The parameters were tuned with batch MIRA (Cherry and Foster, 2012). BLEU (Papineni et al., 2002) was used as the evaluation metric.

Neural Machine Translation Our NMT systems are based on the model of Sennrich et al. (2016a), an attentional encoder-decoder network (Bahdanau et al., 2015) combined with byte-pair encoding. In our experiments, we set the word embedding size to 500 and the hidden layer size to 1024. Sentences were filtered to a maximum length of 50 words. The minibatch size was set to 60. The models were trained with the Adadelta optimizer (Zeiler, 2012) and validated every 3,000 minibatches based on the BLEU score on the development set; models were saved every 6,000 minibatches. For decoding, we used beam search with a beam size of 12. The NMT models were trained on an Nvidia GRID K520 GPU.
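The byte-pair encoding step can be illustrated with the textbook merge-learning procedure below; this is a self-contained sketch of the algorithm of Sennrich et al. (2016a), not the subword segmentation tool actually used in our pipeline.

```python
# Minimal sketch of byte-pair encoding: the most frequent symbol pair in the
# vocabulary is merged repeatedly to build a subword vocabulary.
import re
from collections import Counter

def get_pair_counts(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    # words are represented as space-separated characters plus an end-of-word marker
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges

print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10))
```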

6.2.3 Results

Model                               tst2012         tst2013         tst2015
Wikipedia                             18.40           22.06           20.34
constrained                           24.72           27.31           25.47
constrained+Wikipedia                 24.78           27.89           26.69
constrained*Wikipedia (tune)          24.65           28.05           27.00
constrained*Wikipedia (weights)       24.95 (+0.23)   28.51 (+1.20)   27.21 (+1.74)
unconstrained                         34.42           27.19           25.41
unconstrained+Wikipedia               33.88           27.28           26.36
unconstrained*Wikipedia (tune)        34.44           27.55           26.68
unconstrained*Wikipedia (weights)     34.73 (+0.31)   28.04 (+0.85)   26.78 (+1.37)

Table 7: Experimental results using phrase-based statistical machine translation; constrained+Wikipedia: the constrained data merged with the Wikipedia corpus; unconstrained*Wikipedia: interpolation of the two models; tune, weights: the two interpolation settings; bold indicates the best result for each setup.

SMT results Table 7 presents the experimental results of the SMT models. Using only the Wikipedia corpus, we achieved promising results: 18.40 BLEU points (tst2012), 22.06 (tst2013), and 20.34 (tst2015). When the Wikipedia corpus was merged with the constrained data for training,

6http://vlsp.vietlp.org:8080/demo/?page=home

a significant improvement was achieved, especially on tst2015 (26.69 BLEU points, an improvement of 1.22 BLEU points over the model trained only on the constrained data). The domain adaptation strategies perform even better than simple merging; the weights setting obtained the best performance, with an improvement of +1.74 BLEU points on tst2015.

NMT results The NMT results are reported in Table 8. The systems obtain higher scores as the size of the training data increases (from Wikipedia only, to constrained, to unconstrained, to the unconstrained data merged with the Wikipedia corpus). Notably, using the Wikipedia corpus to augment NMT systems trained on the existing data sets yielded a large improvement of up to +4.51 BLEU points on tst2015.

Model                     tst2012         tst2013         tst2015
constrained                 20.21           23.59           17.27
Wikipedia                   15.29           18.43           17.58
unconstrained               24.05           26.71           22.30
unconstrained+Wikipedia     25.29 (+1.24)   28.93 (+2.21)   26.81 (+4.51)

Table 8: Experimental results with neural machine translation (NMT); bold indicates the best result for each setup.

A related approach that improves neural machine translation with additional data is the back-translation method of Sennrich et al. (2016b), in which a synthetic parallel corpus is generated by translating large monolingual data in the target language into the source language. A comparison with, and adaptation of, the back-translation method is left for future work on evaluating and utilizing the extracted Wikipedia corpus.

SMT vs. NMT We compared the improvement obtained from the Wikipedia corpus with SMT versus NMT systems. The SMT systems perform better when trained on the unconstrained data alone (456k sentence pairs): 25.41 vs. 22.30 BLEU points on tst2015. However, when the Wikipedia corpus is merged with the unconstrained data to enlarge the training set (864k sentence pairs), the NMT systems outperform the SMT systems, which indicates that NMT benefits more from the Wikipedia corpus than SMT. Table 9 presents this comparison in more detail.

Model                               tst2012         tst2013         tst2015
SMT systems
unconstrained                         34.42           27.19           25.41
unconstrained+Wikipedia               33.88           27.28 (+0.09)   26.36 (+0.95)
unconstrained*Wikipedia (tune)        34.44 (+0.02)   27.55 (+0.36)   26.68 (+1.27)
unconstrained*Wikipedia (weights)     34.73 (+0.31)   28.04 (+0.85)   26.78 (+1.37)
NMT systems
unconstrained                         24.05           26.71           22.30
unconstrained+Wikipedia               25.29 (+1.24)   28.93 (+2.21)   26.81 (+4.51)

Table 9: SMT versus NMT when using the Wikipedia corpus

This comparison informs the choice of strategy for utilizing the Wikipedia corpus to improve machine translation for low-resource languages: the corpus is exploited more effectively by the NMT models.

7 Conclusion

Current machine translation systems, both phrase-based and neural, require large bilingual corpora as training data. Such corpora are unavailable for most language pairs, known as low-resource languages, and this creates a bottleneck. Although Southeast Asia has a population of more than five hundred million people and several of its languages, such as Indonesian, Malay, and Vietnamese, are widely used, few bilingual corpora exist for these language pairs, which hinders machine translation. In this paper, we presented the construction of a multilingual parallel corpus for several Southeast Asian languages (Indonesian, Malay, Filipino, and Vietnamese) and for these languages paired with English, with the goal of improving machine translation. The corpus was built from parallel Wikipedia articles identified through the articles' titles and inter-language link records. The parallel titles were used to collect parallel articles, and for each article pair, parallel sentences were extracted with a sentence alignment method that combines length-based and word-correspondence models. The resulting multilingual parallel corpus contains more than 1.1 million parallel sentences over ten language pairs of Southeast Asian languages. Experiments on the Asian Language Treebank showed promising results. Additionally, the corpus was applied to the IWSLT 2015 machine translation shared task, where it yielded significant improvements of +1.7 BLEU points for the phrase-based system and +4.5 BLEU points for the neural system. The corpus can improve machine translation for the low-resource Southeast Asian languages and contribute to the development of machine translation for low-resource languages in general.

Acknowledgement This work was supported by JSPS KAKENHI Grant Number JP15K16048 and the VNU project "Exploiting Very Large Monolingual Corpora for Statistical Machine Translation" (code QG.12.49).

References Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR).

Bojar, O., Buck, C., Callison-Burch, C., Federmann, C., Haddow, B., Koehn, P., Monz, C., Post, M., Soricut, R., and Specia, L. (2013). Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 1–44. Association for Computational Linguistics.

Brown, P. F., Lai, J. C., and Mercer, R. L. (1991). Aligning sentences in parallel corpora. In Proceedings of ACL, pages 169–176. Association for Computational Linguistics.

Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational linguistics, 19(2):263–311.

Cettolo, M., Girardi, C., and Federico, M. (2012). Wit3: Web inventory of transcribed and translated talks. In Cettolo, M., Federico, M., Specia, L., and Way, A., editors, Proceedings of th 16th International Conference of the European Association for Machine Translation (EAMT), pages 261–268.

Cettolo, M., Niehues, J., Stüker, S., Bentivogli, L., Cattoni, R., and Federico, M. (2015). The IWSLT 2015 evaluation campaign. Proc. of IWSLT, Da Nang, Vietnam.

Chen, S. F. (1993). Aligning sentences in bilingual corpora using lexical information. In Proceedings of ACL, pages 9–16. Association for Computational Linguistics.

Cherry, C. and Foster, G. (2012). Batch tuning strategies for statistical machine translation. In Proc. of HLT/NAACL, pages 427–436. Association for Computational Linguistics.

Chu, C., Nakazawa, T., and Kurohashi, S. (2015). Integrated parallel sentence and fragment extraction from comparable corpora: A case study on chinese–japanese wikipedia. ACM Trans. Asian Low-Resour. Lang. Inf. Process., 15(2):10:1–10:22.

Gale, W. A. and Church, K. W. (1993). A program for aligning sentences in bilingual corpora. Computa- tional Linguistics, 19(1):75–102.

Heafield, K. (2011). Kenlm: Faster and smaller language model queries. In Proc. of the Sixth Workshop on Statistical Machine Translation, pages 187–197. Association for Computational Linguistics.

Irvine, A. (2013). Statistical machine translation in low resource settings. In Proceedings of HLT/NAACL, pages 54–61. Association for Computational Linguistics.

Kay, M. and Röscheisen, M. (1993). Text-translation alignment. Computational Linguistics, 19(1):121–142.

Kim, S., Toutanova, K., and Yu, H. (2012). Multilingual named entity recognition using parallel data and metadata from wikipedia. In Proceedings of the 50th Annual Meeting of the Association for Computa- tional Linguistics: Long Papers-Volume 1, pages 694–702. Association for Computational Linguistics.

Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Tenth Machine Translation Summit (MT Summit X), Phuket, Thailand.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proc. of ACL, pages 177–180. Association for Computational Linguistics.

Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 48–54. Association for Computational Linguistics.

Li, B. and Liu, J. (2008). Mining Chinese-English parallel corpora from the web. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP).

Ma, X. (2006). Champollion: A robust parallel text sentence aligner. In Proceedings of LREC, pages 489–492.

Melamed, I. D. (1996). A geometric approach to mapping bitext correspondence. In Proceedings EMNLP. Association for Computational Linguistics.

Moore, R. C. (2002). Fast and accurate sentence alignment of bilingual corpora. In Conference of the Association for Machine Translation in the Americas, pages 135–144. Springer.

Munteanu, D. S. and Marcu, D. (2006). Extracting parallel sub-sentential fragments from non-parallel corpora. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 81–88. Association for Computational Linguistics.

Ngo, Q. H., Winiwarter, W., and Wloka, B. (2013). Evbcorpus-a multi-layer english-vietnamese bilingual corpus for studying tasks in comparative linguistics. In Proceedings of the 11th Workshop on Asian Language Resources (11th ALR within the IJCNLP2013), pages 1–9.

Och, F. J. and Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proc. of ACL, pages 311–318. Association for Computational Linguistics.

Resnik, P. (1999). Mining the web for bilingual text. In Proceedings of the 37th Annual Meeting of the Association of Computational Linguistics (ACL).

Sennrich, R. (2012). Perplexity minimization for translation model domain adaptation in statistical ma- chine translation. In Proceedings of EAMT, pages 539–549.

Sennrich, R., Haddow, B., and Birch, A. (2016a). Edinburgh neural machine translation systems for wmt 16. In Proceedings of the First Conference on Machine Translation (WMT).

Sennrich, R., Haddow, B., and Birch, A. (2016b). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86–96, Berlin, Germany. Association for Computational Linguistics.

Singh, A. K. and Husain, S. (2005). Comparison, selection and use of sentence alignment algorithms for new language pairs. In Proceedings of the ACL Workshop on Building and using Parallel texts, pages 99–106. Association for Computational Linguistics.

Ştefănescu, D. and Ion, R. (2013). Parallel-wiki: A collection of parallel sentences extracted from Wikipedia. In Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2013), pages 24–30.

Thu, Y. K., Pa, W. P., Utiyama, M., Finch, A., and Sumita, E. (2016). Introducing the asian language treebank (alt). In Proceedings of the Tenth International Conference on Language Resources and Eval- uation (LREC 2016), pages 1574–1578.

Utiyama, M. and Isahara, H. (2003). Reliable measures for aligning Japanese-English news articles and sentences. In Hinrichs, E. and Roth, D., editors, Proceedings of the 41st Annual Meeting of the Associ- ation for Computational Linguistics, pages 72–79.

Varga, D., Halácsy, P., Kornai, A., Nagy, V., Németh, L., and Trón, V. (2007). Parallel corpora for medium density languages. Amsterdam studies in the theory and history of linguistic science series 4, 292:247.

Wang, P., Nakov, P., and Ng, H. T. (2016). Source language adaptation approaches for resource-poor machine translation. Computational Linguistics.

Weber, G. (2008). Top languages. The World’s, 10.

Wu, D. (1994). Aligning a parallel english-chinese corpus statistically with lexical criteria. In Proceedings of the 32nd annual meeting on Association for Computational Linguistics, pages 80–87. Association for Computational Linguistics.

Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method. CoRR.

Exploring Hypotheses Spaces in Neural Machine Translation

Frédéric Blain  f.blain@sheffield.ac.uk
Lucia Specia  l.specia@sheffield.ac.uk
Pranava Madhyastha  p.madhyastha@sheffield.ac.uk
Department of Computer Science, University of Sheffield, Sheffield, S1 4DP, UK

Abstract Both statistical (SMT) and neural (NMT) approaches to machine translation (MT) explore large search spaces to produce and score translations. It is however well known that often the top hypothesis as scored by such approaches may not be the best overall translation among those that can be produced. Previous work on SMT has extensively explored re-ranking strategies in attempts to find the best possible translation. In this paper, we focus on NMT and provide an in-depth investigation to explore the influence of beam sizes on information content and translation quality. We gather new insights using oracle experiments on the efficacy of exploiting larger beams and propose a simple yet novel consensus-based n-best re-ranking approach that makes use of different automatic evaluation metrics to measure consensus in n-best lists. Our results reveal that NMT is able to cover more of the information content of the references compared to SMT and that this leads to better re-ranked translations (according to human evaluation). We further show that the MT evaluation metric used for the consensus-based re-ranking plays a major role, with character-based metrics performing better than BLEU.

1 Introduction

There has been a recent surge of interest and work in the field of end-to-end, encoder-decoder neural machine translation (NMT). In the last two years, such approaches have surpassed the state-of-the-art results of the previously de facto statistical machine translation (SMT) approaches (Bojar et al., 2016a). While NMT systems are trained end-to-end as a single model, SMT systems use a pipeline-based approach that makes use of several components. This means that NMT systems are jointly optimised for both better encoding and better decoding. SMT systems, on the other hand, decompose the problem by first finding plausible sub-sentence translation candidates given some training data, such as phrases in phrase-based SMT (Koehn et al., 2003), and then scoring such candidates utilising components such as the translation and language models. Both types of systems are markedly different in their approaches to transforming the source into the target language and in the information they explore. Given a source sentence, at decoding time both types of approaches explore hypotheses spaces to pick the best possible translation. Most current implementations of both statistical and neural MT approaches use beam search for that. It has been observed that NMT systems, when compared to their statistical counterparts, use smaller beam sizes, and yet are able to obtain better translations for the same source sentences (Bahdanau et al., 2014; Stahlberg et al., 2017). Smaller beam sizes boost the speed of decoders (Luong et al., 2015; Bahdanau et al., 2014). In addition, it has been reported (Stahlberg et al., 2016) that neural approaches do not

significantly benefit from large beam sizes. In fact, beam sizes of 8–12 are the most common in NMT. Statistical approaches, on the other hand, usually search over larger beam sizes (of the order of hundreds) (Lopez, 2008). Multiple approaches have been proposed in the context of SMT to explore the n-best translation hypotheses generated using beam search (Och et al., 2004; Shen et al., 2004; Lambert and Banchs, 2006; Hasan et al., 2007; Duh and Kirchhoff, 2008). Since the models used for scoring translation hypotheses and the metrics used to evaluate final translation quality differ, one strategy is to learn a re-ranking model for n-best hypotheses based on the evaluation metric of interest. We further detail this and other strategies in Section 2. However, to the best of our knowledge, there is little research that systematically looks at the effect of beam sizes or explores n-best hypotheses in the context of NMT. We summarise our contributions in this paper as follows: (a) we investigate the influence of beam size on the search space, as well as on the information content of translations (Section 4); and (b) we present a new re-scoring approach for n-best re-ranking based on information overlap amongst MT candidates within the n-best list according to different automatic MT evaluation metrics. We report results that include human evaluation to assess the quality of alternative translations produced by this approach versus baseline systems (Section 5). We observe that our approach leads to better translation choices. We also observe that in most cases the best translation hypothesis is chosen among those generated using larger beam sizes. These results are based on four language pairs and different datasets and evaluation metrics (Section 3).

2 Background In what follows, we briefly describe background on the decoding process in SMT and NMT approaches, as well as related work on exploring n-best lists for improved translation quality. Beam search decoding in SMT In SMT decoding, the standard procedure is to perform the search for the best translation given the (often pruned) space of possible translations based on a combination of the scores estimated for its model components, each component capturing a different aspect of translation (word order, translation probability, etc.). This is done through a heuristic method using stack-based beam search. In phrase-based SMT (Koehn et al., 2003), given a source sentence, the decoder fetches phrase translations available in the phrase table and builds a graph starting with an initial state where no source words have been translated and no target words have been generated. New states are created in the graph by extending the target output with a phrase translation that covers some of the source words not yet translated. At every expansion, the current cost of the new state is the cost of the original state multiplied with the model components under consideration. Final states in the search graph are hypotheses that cover all source words. Among these, the hypothesis with the lowest cost (highest model score) is selected as the best translation. Often a threshold is used to define a beam of good hypotheses and prune the hypotheses that fall out of this beam. The beam follows the (presumably) best hypothesis path, but with a certain width to allow the retention of comparable hypotheses, i.e. neighbouring hypotheses that are close in score from the best one (Koehn, 2010). If an exhaustive search was to be performed, then all translation options, in different or- ders, could be used to build alternative hypotheses. However, in practice the search space is pruned in different ways and only the most promising hypotheses are kept, with early prun- ing potentially eliminating good hypotheses from the search space. In principle, larger beams would thus allow for more variation in the n-best lists, while potentially introducing lower qual- ity candidates, but also giving seemingly bad candidates a chance to obtain higher scores in later stages of decoding. There is therefore a direct relationship between the size of the beam and the maximum number of candidates that can be generated in the n-best list. However, the actual

candidates in the n-best list are also affected by other design choices, such as the pruning and hypotheses combination strategies used (Lambert and Banchs, 2006; Duh and Kirchhoff, 2008; Hasan et al., 2007). In addition, different approaches have been proposed to specifically promote diverse translations in SMT systems' n-best lists. These include using compact representations like lattices and hypergraphs (Tromble et al., 2008; Kumar and Byrne, 2004) and establishing explicit conditions during decoding. Gimpel et al. (2013), for example, add a dissimilarity function based on n-gram overlaps, choosing translations that have high model scores but are distinct from already-generated ones.

Beam search decoding in NMT NMT decoding also relies on beam search, but the process is much more expensive than in SMT and thus a limited beam size is often used, leading to narrow hypotheses spaces (Li and Jurafsky, 2016; Vijayakumar et al., 2016). Given a pre-specified beam size k, k-best lists are generated in a greedy left-to-right fashion, retaining only the top-k candidates, as follows: at the first time step in decoding, a fixed number k of hypotheses are retained based on the highest log-probability (model score) of each generated word. Each of the k hypotheses is expanded at each time step by selecting the top-k word translations. This continues until the end-of-sequence symbol is obtained. The highest-scoring candidate is retained and stored in the final candidate list, followed by a decrease of the beam by one. The whole process continues until the beam is reduced to zero. Finally, the best translation hypothesis in the list is the one with the highest log-probability. We note that in most NMT approaches the set of hypotheses and the beam size are equivalent. Essentially, the NMT decoder obtains the top translation hypotheses that maximise the conditional probability given by the model. Li and Jurafsky (2016) increase diversity in the n-best list by adding an additional component to the score used by the decoder to rank k hypotheses at each time step. This component rewards top-ranked hypotheses generated from each ancestor, instead of ranking all candidates from all ancestors together. Similarly, Vijayakumar et al. (2016) propose Diverse Beam Search, where they optimise an objective with two terms: the standard cross-entropy loss and a dissimilarity term that encourages beams across groups to differ.

N-best re-ranking in SMT In addition to having access to only a subset of the search space, the model components used in SMT only provide an estimate of translation quality. As a consequence, using only the hypothesis ranked best by the decoder often leads to suboptimal results (Wisniewski et al., 2010; Sokolov et al., 2012a). Therefore, it is common practice in SMT to explore other hypotheses in the search space, the so-called n-best list. Re-ranking an n-best list of candidates produced by an SMT system is a long-standing practice. The general motivation for doing so is the ability to use additional information in the process that is unavailable or too costly to compute at decoding time, e.g.
syntactic features of the entire sentence (Och et al., 2004), estimates on overall sentence translation quality (Blatz et al., 2003), word sense disambiguation scores (Specia et al., 2008), large language model scores (Zhang et al., 2006), and translation probability from a neural MT model (Neubig et al., 2015), among others. This additional information is usually treated as new model components and combined with the existing ones. Various techniques have been proposed to perform n-best list re-ranking. They generally learn weights to combine the new and existing model components using algo- rithms such as MIRA (Crammer and Singer, 2003) with linear1 or non-linear functions (Sokolov et al., 2012b), as well as more advanced methods, such as multi-task learning (Duh et al., 2010). Hasan et al. (2007) provides a study on the potential improvements on final translation quality by exploring n-best lists of different sizes. They show that even though oracle-based re-ranking

1https://github.com/moses-smt/mosesdecoder/tree/master/scripts/nbest-rescore

on very large (100,000 hypotheses) n-best lists yields the best translation quality, automatic re-ranking methods reach a plateau in the improvement after 1,000 hypotheses. Very large n-best lists contain very many noisy translations, so they suggest that such large spaces should only be explored with extremely accurate re-ranking methods. In an attempt to score translation candidates more reliably, Kumar and Byrne (2004) introduced the Minimum Bayes Risk (MBR) decoding approach and used it to re-rank n-best hypotheses such that the best hypothesis is the one that minimises the Bayes risk, defined in terms of the model score (translation probability) and a loss function computed between the translation hypothesis and a gold translation (e.g. a translation quality metric such as BLEU (Papineni et al., 2002)). This method has been shown to be beneficial for many translation tasks (Ehling et al., 2007; Tromble et al., 2008; Blackwood et al., 2010). These studies have, however, only experimented with a fixed n (1,000).
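To illustrate this family of methods, the sketch below performs a consensus-style (MBR-like) re-ranking of an n-best list, choosing the hypothesis that is on average most similar to the others; the character n-gram F-score used as the similarity is a simple stand-in, not the exact metrics or formulation studied later in this paper.

```python
# Illustrative sketch of metric-based consensus re-ranking over an n-best list:
# each hypothesis is scored by its average similarity to the other hypotheses
# and the most "central" one is chosen.
from collections import Counter

def char_ngrams(text, n=4):
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))

def similarity(a, b):
    """Character n-gram F-score between two strings (a crude stand-in metric)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    overlap = sum((ga & gb).values())
    if not overlap:
        return 0.0
    prec, rec = overlap / sum(ga.values()), overlap / sum(gb.values())
    return 2 * prec * rec / (prec + rec)

def consensus_rerank(nbest):
    def consensus(h):
        others = [x for x in nbest if x is not h]
        return sum(similarity(h, o) for o in others) / max(len(others), 1)
    return max(nbest, key=consensus)

nbest = ["the cat sat on the mat", "the cat is sitting on the mat", "a cat sat on a mat"]
print(consensus_rerank(nbest))
```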

N-best re-ranking in NMT While there is a large body of literature investigating different strategies for exploring n-best hypotheses spaces in SMT, there have been very few attempts at exploring such spaces in NMT. Stahlberg et al. (2017) adapt MBR decoding to the context of NMT, applying it to partial hypotheses rather than entire translations. The NMT score is combined with the Bayes risk of the translation according to the SMT lattice. This approach goes beyond re-scoring of n-best lists or lattices, as the neural decoder is not restricted to the SMT search space. The resulting MBR decoder produces new hypotheses that are different from those in the SMT search space. Li and Jurafsky (2016) propose an alternative objective function for NMT that maximises the mutual information between the source and target sentences. They implement the model with a simple re-ranking method. This is equivalent to linearly combining the probability of the target given the source, and vice-versa. An NMT model is trained for each translation direction, and the source→target model is used to generate n-best lists. These are then re-ranked using the score from the target→source model. Shu and Nakayama (2017) study the effect of beam size in NMT MBR decoding. They considered beams of size 5, 20 and 100 and found that while in standard decoding increasing the beam size is not beneficial, MBR re-ranking is more effective with a large beam size.

Comparison between NMT and SMT There has been increasing interest in systematically studying the differences between NMT and SMT approaches. Bentivogli et al. (2016) conducted an analysis of English→German translations produced by both NMT and SMT systems. They conclude that the outputs of the NMT system are better in terms of syntax and semantics, with better word order and less human post-editing effort required to fix the translations. They observe that the average sentence length produced by an SMT system is always longer than that of an NMT system. This could be attributed to the optimisation of the cross-entropy loss and the fact that the outputs are chosen on the basis of log-probability scores in NMT systems. Toral and Sánchez-Cartagena (2017) conducted an in-depth analysis on a set of nine language pairs to contrast the differences between SMT and NMT systems. They observe that the outputs of NMT systems are more fluent and have better word order than those of SMT systems. They note that, despite the smaller beam sizes in NMT, in general the top outputs of the NMT system for a given source sentence are more distinct than the top outputs from SMT systems. However, it is not clear whether or not they explore distinct n-best options from the SMT system or a mixture of distinct and non-distinct options. Both previous studies conclude that NMT systems perform poorly when translating very long sentences.
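Before turning to the experimental settings, the greedy beam expansion discussed in Section 2 can be sketched as follows; step_log_probs stands in for the neural decoder, the toy example is purely illustrative, and this is the standard textbook variant rather than the exact procedure of any particular toolkit.

```python
# Minimal sketch of beam search decoding: at each step every partial hypothesis
# is extended with its top-k next words and only the k highest-scoring
# (log-probability) prefixes survive; finished hypotheses end with the EOS symbol.
import math

def beam_search(step_log_probs, start, eos, k=12, max_len=50):
    beams = [([start], 0.0)]           # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in step_log_probs(seq)[:k]:
                candidates.append((seq + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:k]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    return sorted(finished or beams, key=lambda c: c[1], reverse=True)

# toy decoder: always proposes two continuations, preferring to stop early
def toy_step(seq):
    return [("</s>", math.log(0.6)), ("word", math.log(0.4))]

print(beam_search(toy_step, "<s>", "</s>", k=3, max_len=5)[:3])
```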

3 Experimental Settings

In this section we describe the data, tools, metrics and settings used in our experiments to investigate the influence of beam size on the generated translations.

Language Pairs We report results with NMT systems, the focus of this paper, for four language pairs: English↔German and English↔Czech. For English↔Czech we also report results with SMT systems for comparison.

NMT Systems We use the freely available Nematus toolkit (Sennrich et al., 2016) and its pre-trained models2 for English↔German and English↔Czech. The Nematus systems are based on the attentional encoder-decoder neural machine translation approach (Bahdanau et al., 2014) and were built with Byte-Pair Encoding (Sennrich et al., 2015b).3 The models were trained as described in Sennrich et al. (2016) using both parallel and synthetic (Sennrich et al., 2015a) data under the constrained variant of the WMT16 MT shared task, with mini-batches of size 80, a maximum sentence length of 50, word embeddings of size 500, hidden layers of size 1024, and Adadelta as optimiser (Zeiler, 2012), reshuffling the training corpus between epochs. These models were chosen as they were highly ranked in the evaluation campaign of the WMT16 Conference (Bojar et al., 2016c).

SMT Systems We use pre-trained models from the Tuning shared task of WMT16 for English↔Czech to build SMT systems for comparison. These models were built using the Moses toolkit (Koehn et al., 2007) trained on CzEng1.6pre,4 (Bojar et al., 2016b) a corpus of 51M parallel sentences built from eight different sources. The data was tokenised using the Moses tokeniser (Koehn et al., 2007) and lowercased; sentences longer than 60 words or shorter than 4 words were removed before training. The weights were determined as the average over three optimisation runs using MIRA (Crammer and Singer, 2003) towards BLEU. Word alignment was done using fast-align (Dyer et al., 2013), and for all other steps the standard Moses pipeline was used for model building and decoding. This was reported as the best system for English↔Czech (Jawaid et al., 2016). By using pre-trained and freely available models for our NMT and SMT systems, we have consistent models across the different language pairs, and our results are more easily reproducible.

Beam Settings SMT systems usually employ a large beam; in the training pipeline of the Moses decoder, the beam size is set by default to 200. NMT systems, on the other hand, normally use a much smaller beam size of 8 to 12, which is assumed to offer a good trade-off between quality and computational complexity. We note that the implementations of n-best decoding differ between NMT and SMT. In most NMT systems, there is a 1-to-1 correspondence between the beam size and the n-best list size. Therefore, we will use the term n-best to refer to the output of an NMT system with a beam of size n, and to the n best outputs of an SMT system, where the beam size has been set, by default, to 200. We also note that the translations in the n-best list produced by NMT are always different from each other, even if only marginally in many cases (e.g. by a single token). In SMT, one can choose whether or not only distinct candidates should be considered. We report on distinct options only, to gather insights on the diversity of n-best lists in SMT versus NMT.
Metrics For our experiments we consider three automatic evaluation metrics amongst the most widely used and which have been shown to correlate well with human judgements (Bojar

² http://data.statmt.org/rsennrich/wmt16_systems/
³ The models were obtained from http://statmt.org/rsennrich/wmt16_systems/
⁴ http://ufal.mff.cuni.cz/czeng/czeng16pre

et al., 2016c): BLEU, an n-gram-based precision metric which works similarly to position-independent word error rate, but considers matches of larger n-grams with the reference translation; BEER (Stanojevic and Sima’an, 2014), a trained evaluation metric with a linear model that combines features capturing character n-grams and permutation trees; and ChrF (Popovic, 2015), which computes the F-score of character n-grams. These metrics are used both for evaluating final translation quality and for measuring similarity among translations in our consensus-based re-ranking approach.
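To give a feel for the character-level metric, the following is a minimal, simplified sketch of a character n-gram F-score in the spirit of ChrF with β = 3; the function names, the whitespace handling and the uniform averaging over n-gram orders are our own simplifications, not the reference implementation used in our experiments.

```python
from collections import Counter

def char_ngrams(text, n):
    # Collapse whitespace so n-grams do not depend on spacing artefacts.
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=3.0):
    # Simplified ChrF: F-beta over character n-grams, averaged over orders 1..max_n.
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0.0:
            scores.append(0.0)
        else:
            scores.append((1 + beta ** 2) * precision * recall
                          / (beta ** 2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0

print(chrf("he was a good man with a broad heart .",
           "he was a kind spirit with a big heart ."))
```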

4 Effect of Beam Size

Current work in NMT takes a beam size of around 10 to be the optimal setting (Sennrich et al., 2016). We empirically evaluate the effect of increasing the beam size in NMT, exploring n-best lists of sizes 10, 100 and 500. The goals are to understand (a) the informativeness of the translations produced; (b) the scope for obtaining better translations by simply exploiting the n-best candidates, similarly to previous work in SMT.

4.1 Effect of Beam Size on Information Content of Translations

We define information content as the word overlap rate between the system-generated translation and the reference translation. We further break this into two categories:

1. % covered: This indicates the average proportion of words that are shared between (a) the 1-best output of the MT system and the reference translation, or (b) all the n-best outputs and the reference translation. It is computed by looking at the intersection between the vocabulary of the MT candidate(s) and that of the reference, averaged at corpus level.

2. % exact match: This indicates the proportion of sentences that are exact matches between (a) the 1-best of the MT system and the reference translation, and (b) all the n-best outputs and the reference translation.

This is similar to the approach in (Lala et al., 2017), where the authors measure word overlap with respect to system outputs, but their focus is on multimodal NMT. % covered approximately indicates the word-level precision of the MT system, given the n or 1-best candidates and the reference translation, and % exact match approximately indicates the sentence-level recall given the n or 1-best candidates and the reference translation. Our intuition here is that if the systems are adequately trained, increasing the beam size – and thereby the n-best list length – should result in a larger word overlap with the reference translation, and potentially a larger number of exact matches at the sentence level, although the latter is a much taller order. We note that since only one reference translation is available, mismatches between words in the MT output and the reference translation could reflect acceptable variation in translation.

Observations and Discussion In Table 1 we report the scores of each MT system using BLEU, BEER and ChrF3 on the WMT16 test sets with different sizes of n-best lists: for NMT we report sizes 10, 100 and 500, while for SMT we report a 500-best list with the beam size set to the default of 200. Since there is no 1-to-1 relationship between beam sizes and n-best list sizes in SMT, reporting on different beam sizes would require arbitrarily choosing a specific n for each beam size. We instead chose the largest n also used for the NMT experiments (500), and a large enough beam size (200). The metric scores are computed on the 1-best translation, which may vary if different beam sizes are used. We observe that for NMT, increasing the n-best size from 10 to 100 improves performance for English↔German. For English↔Czech, we do not observe any gain, but rather a significant drop. Also, if the beam size is too large (500 in our case), the performance drops for all language pairs. This indicates

that larger beam sizes do not necessarily lead to better 1-best translations, and that the choice can be a function of the language pair and the dataset. This seems to suggest that with such large beam sizes many translation candidates, including spurious ones, end up being ranked as the 1-best, most likely because of limitations in the functions used to score translation candidates.

NEURAL MT
                     English→German             German→English
  n-best       BLEU    BEER    ChrF3      BLEU    BEER    ChrF3
  n=10         26.73   60.20   59.20      32.58   61.84   60.61
  n=100        26.82   60.25   59.33      32.68   61.91   60.74
  n=500        26.18   60.12   59.12      32.70   61.91   60.75

                     English→Czech              Czech→English
  n-best       BLEU    BEER    ChrF3      BLEU    BEER    ChrF3
  n=10         18.50   53.90   51.45      26.26   58.03   56.00
  n=100        18.31   53.83   51.37      26.17   58.00   56.00
  n=500        17.81   53.67   51.25      24.19   57.57   55.62

STATISTICAL MT
                      English→Czech              Czech→English
  n-best        BLEU    BEER    ChrF3      BLEU    BEER    ChrF3
  n=10/100/500  10.64   48.88   46.51      18.19   52.59   51.32

Table 1: Translation quality results on the WMT16 test sets for both NMT and SMT systems using n-best lists of sizes 10, 100 and 500. The scores are computed on the 1-best translation towards the reference translation.

In Table 2 we report our empirical observations on word coverage. Here, we observe that the larger the n-best list, the higher the proportion of words covered (% covered). Interestingly, we also observe similar trends for % exact match, but only if all n-best candidates are considered. It is also interesting to note the impressive increase in % exact match from 1-best to all-best for NMT, which does not happen for SMT. These results show that for NMT larger beam sizes lead to more information content in the translation candidates. Therefore, clever techniques to explore the space of hypotheses should lead to better translations. Even though the NMT vs SMT figures are not directly comparable, since the NMT and SMT systems are trained on different data, we note that despite the SMT system using a beam size of 200 and producing 500-best translation hypotheses, its translations have much lower word overlap than those from the NMT system with a beam size of 10 for English↔Czech. These results further corroborate the reasons for the insignificant gains obtained in the WMT16 SMT Tuning shared task (Jawaid et al., 2016). In fact, if larger hypothesis spaces do not yield more words that could produce translations matching the reference, the tuning algorithms do not have much to learn from.
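To make the two statistics concrete, the sketch below shows one way the % covered and % exact match figures of Table 2 could be computed from tokenised n-best lists; the function and variable names are illustrative, and the tokenisation is assumed rather than taken from our pipeline.

```python
def coverage_stats(nbest_lists, references):
    # nbest_lists: one list of tokenised candidate translations per source sentence
    # (pass single-element lists for the 1-best variant, full lists for "all");
    # references: one tokenised reference translation per source sentence.
    covered, exact = [], 0
    for nbest, ref in zip(nbest_lists, references):
        ref_vocab = set(ref)
        cand_vocab = set(token for hyp in nbest for token in hyp)
        covered.append(len(cand_vocab & ref_vocab) / len(ref_vocab))
        if any(hyp == ref for hyp in nbest):
            exact += 1
    n = len(references)
    return 100.0 * sum(covered) / n, 100.0 * exact / n
```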

4.2 Oracle Exploration Based on the encouraging observations in the previous experiment with word overlap between candidates in the n-best list and the reference translation, here we attempt to quantify the poten- tial gain from optimally exploring the space of hypotheses. We perform experiments assuming that we have an ‘oracle’ which helps us choose the best possible translation, under an evalua- tion metric against the reference, given an n-best list of translation hypotheses. This provides an upper-bound on the performance of the MT system. Positive results in this experiment will indicate that the MT system is capable of producing better translation candidates, but fails at scoring them as the best ones. In this oracle experiment, the translation of a source sentence is chosen based on com- parisons among the translation hypotheses and the reference translation – the oracle – under a

NEURAL MT
                            10-best           100-best          500-best
                         1-best    all     1-best    all     1-best    all
  English→German
    % covered             53.99   62.75     53.99   71.93     53.83   77.69
    % exact match          2.20    6.47      2.20   12.07      2.20   18.24
  German→English
    % covered             57.32   65.98     57.43   74.42     57.43   79.55
    % exact match          2.70    7.70      2.70   15.40      2.70   22.94
  English→Czech
    % covered             45.97   55.27     45.85   65.61     45.72   72.55
    % exact match          1.63    4.90      1.63    9.40      1.63   14.77
  Czech→English
    % covered             52.30   61.26     52.33   70.24     51.92   75.61
    % exact match          1.67   14.44      1.67   11.47      1.60   16.97

STATISTICAL MT (beam=200, distinct)
                            10-best           100-best          500-best
                         1-best    all     1-best    all     1-best    all
  English→Czech
    % covered             39.20   46.58     39.20   54.05     39.20   57.86
    % exact match          0.07    0.07      0.07    0.37      0.07    0.37
  Czech→English
    % covered             48.35   54.79     48.35   60.30     48.35   62.89
    % exact match          0.16    0.50      0.16    0.83      0.16    0.83

Table 2: Proportion of words overlapping between candidates and reference translations for different values of the n-best, as well as proportion of MT output sentences that exactly match the reference, considering either the 1-best or all the MT candidates in the n-best list.

certain MT evaluation metric. We consider the outputs of NMT systems for beam sizes of 10, 100 and 500 and with the following metrics: BLEU with n-gram max length = 4 and default brevity penalty settings, BEER2.0 with default settings, and ChrF with n-gram max length = 6 and β = 3. By exploring multiple metrics we will gain insights on how well different metrics do at spotting the best candidates: ideally, better metrics should lead to larger improvements from the original top translation.
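A minimal sketch of this oracle selection is shown below; sentence_metric stands for any sentence-level scorer (e.g. sentence BLEU, BEER or ChrF3) and is assumed to be available from an external library.

```python
def oracle_select(nbest, reference, sentence_metric):
    # Pick the candidate that scores best against the reference under the given
    # sentence-level metric -- this is the 'oracle' choice for that sentence.
    return max(nbest, key=lambda hyp: sentence_metric(hyp, reference))

def oracle_rerank_corpus(nbest_lists, references, sentence_metric):
    # Oracle 1-best output for a whole test set.
    return [oracle_select(nbest, ref, sentence_metric)
            for nbest, ref in zip(nbest_lists, references)]
```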

Observations and Discussion We report the results of the oracle experiment in Figure 1. For each system, we report the relative improvement (delta) between the oracle translation chosen by the three metrics – BLEU, BEER and ChrF3 – and the 1-best of the system for a given n-best list size. Using any of the metrics we are able to find an alternative MT candidate that is better than the original 1-best translation, resulting in an overall increase in translation quality on all datasets. Larger improvements are obtained with larger beam sizes. However, while a large gain (almost double) is obtained from beam size 10 to 100, the rate of improvement drops from beam size 100 to 500, indicating that the additional translations are probably mostly spurious. This is consistent with the information content experiment in Section 4.1.

Kumar and Byrne (2004) report that their MBR decoder leads to improvements only according to an evaluation metric that is also used as the basis for their loss function. In our experiments, to better understand the relationship between the re-ranking metric and the final evaluation results, we further explore the oracle experiment by reporting results on the 500-best output for NMT, which brings the best gains in Figure 1, but focus on the proportion of improvement of the oracle translation over the 1-best across metrics. In other words, we oracle re-rank using each given metric and evaluate the final 1-best translation set performance using all

Figure 1: Proportion of improvement in NMT results according to MT evaluation metrics based on the oracle results over the original 1-best when the size of the beam is increased for decoding.

Figure 2: Focusing on the 500-best output for NMT, which brings the best gains in Figure 1, proportion of improvement of the oracle translation over the original 1-best when using different metrics for the oracle computation: ChrF3, BEER and BLEU. Re-ranking is done with one metric at a time, and the final performance is also measured with each of three metrics.

three metrics. This helps us assess the potential of each metric in selecting the best candidate. Figure 2 shows the results. Contrary to what was suggested in Kumar and Byrne (2004) for SMT, in chart (a) we see that the relative improvement in terms of the BLEU metric is bigger when using either BEER or ChrF3 to obtain the 1-best translation than when using BLEU itself. We also observe in charts (b) and (c) that the character-based metrics always outperform BLEU and extract better 1-best translations. BLEU also seems to fail at identifying better MT candidates when translating into Czech, which is a morphologically rich language, while BEER and ChrF3 perform better. We note however that Kumar and Byrne (2004) also tune the log-linear loss function, while in our case we are just selecting the candidates directly based on a metric. Since sentence length is often a problem in NMT, we measure the impact of using different evaluation metrics for oracle re-ranking on the sentence length of the 1-best translations chosen. In Figure 3 we report the variation in average sentence length for all NMT systems after the oracle translation selection with all three metrics, compared to the original 1-best translation for each setting. We notice that the average length of oracle BLEU translations does not seem to vary; however, an opposite trend is seen with BEER and ChrF3, which seem to make sentences

shorter, except for German→English. This is particularly interesting since i) we observe in Table 2 a better coverage with a bigger beam size, and ii) we observe an overall large BLEU improvement in our oracle experiments (Figure 2 (a)). This suggests that we are able to select translation candidates that might be shorter than the original 1-best, but more similar to the reference translation.

Figure 3: Delta in average sentence length for all NMT systems after 1-best oracle translation selection by each metric, compared to the average sentence length of the original 1-best.

5 Consensus-based n-best re-ranking

As was shown in the previous section, increasing the size of the beam generally leads to better word coverage and, more importantly, to higher chances of generating better translations among the resulting n-best lists. In what follows we propose an approach to automatically re-rank n-best lists to obtain better translations (without oracle translations).

Our approach is motivated by the work of DeNero et al. (2009) for SMT, where consensus-based MBR decoding is used to guide the choices of the decoder towards hypotheses that share partial translations. DeNero et al. (2009) experiment with different evaluation metrics (including BLEU) to measure similarity among hypotheses within an n-best list. We propose to empirically evaluate the contribution of consensus information in hypotheses in n-best lists from NMT systems. This is simpler than using consensual information at decoding time, but we believe that positive results at the re-ranking stage will provide insights on whether or not this is a promising path to follow in NMT decoding.

Given an n-best list and a certain similarity metric, we compute the metric scores for each translation hypothesis against the n − 1 other hypotheses in the n-best list. We then average these n − 1 similarity scores to obtain a single score for each translation hypothesis. We repeat this for all translation hypotheses and then sort the n-best list based on these scores, such that the top (best) translation will be one that is similar to more of the alternative candidates. Given that NMT systems produce translations that are “more likely” given the model, this essentially corresponds to selecting as the best translation the one that is most similar to the n − 1 most likely translations. The size of the n-best list here is critical: the more hypotheses in the list, the less confident the NMT system will be on the bottom part of the list (less likely translations). However, longer n-best lists may provide stronger evidence for the consensus analysis. This is a classical exploration-exploitation issue. Another remark is that larger search spaces require much more time to compute the consensus-based re-ranking.

We experiment with BLEU, BEER and ChrF3 as similarity metrics, since these are easily available and are either extremely popular (BLEU) or have proved to correlate well with human judgements on translation quality (in terms of similarity with a reference translation) in recent evaluation campaigns (BEER and ChrF3) (Bojar et al., 2016c). While each pair of translation hypotheses can be scored independently, which allows parallel processing, the running time for each metric to re-rank a complete n-best list is O(n² · k), where k is the size of the corpus and n the size of the n-best list. This may be very time consuming:

from hours up to a day⁵ for easy-to-compute metrics such as BLEU or ChrF, to many days for more complex metrics such as BEER.

Automatic evaluation We start by evaluating our consensus-based re-ranking approach using BLEU as the automatic evaluation metric. The results are shown in Table 3. A similar trend was observed using BEER and ChrF3 as evaluation metrics; however, we omit these results due to space constraints. Comparing the figures in this table against those in Table 1, we see that – under the same beam size – re-ranking seems to degrade the results in all cases with BLEU and ChrF, but not with BEER. An increase in BLEU scores can be observed for BEER-based re-ranking as beam sizes larger than 10 are used, for the two language pairs where re-ranking under this metric was computed. It is not surprising to see that this improvement is only observed for BEER as the similarity metric, even though the final evaluation is in terms of BLEU. This suggests that exploring other similarity metrics for the consensus analysis could be beneficial. Overall, re-ranking using BEER as the similarity metric leads to the best results.

                      English→German                           German→English
                              re-ranked with                           re-ranked with
  n-best   baseline    BLEU     BEER    ChrF3      baseline    BLEU     BEER    ChrF3
  n=10      26.93      26.51    26.77   26.38       32.58      32.10    32.29   31.79
  n=100     26.82      26.02    26.87   26.18       32.68      31.90    32.78   31.67
  n=500     26.18      24.80    -       25.93       32.70      31.41    32.85   32.25

                      English→Czech                            Czech→English
                              re-ranked with                           re-ranked with
  n-best   baseline    BLEU     BEER    ChrF3      baseline    BLEU     BEER    ChrF3
  n=10      18.50      17.98    18.24   17.60       26.26      25.81    26.10   25.52
  n=100     18.31      17.58    18.61   17.57       26.17      25.47    26.42   25.16
  n=500     17.81      16.39    -       17.38       24.19      24.44    26.57   24.80

Table 3: BLEU scores of our consensus-based re-ranking strategy on the WMT16 test sets with NMT using n-best lists of sizes 10, 100 and 500. The scores are computed on the newly ranked 1-best NMT candidate against the reference translation. The baseline scores correspond to the original 1-best assessed towards the reference translation (see Table 1). The current implementation of BEER makes our consensus-based re-ranking extremely time consuming and virtually unfeasible, therefore we only show results for a subset of language pairs.
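For concreteness, a minimal sketch of the consensus scoring described above follows; similarity stands for any of the sentence-level metrics (BLEU, BEER or ChrF3) and is assumed to be provided externally. As noted earlier, this requires O(n²) metric calls per sentence, which is what makes re-ranking with slower metrics such as BEER so time consuming.

```python
def consensus_rerank(nbest, similarity):
    # Score each hypothesis by its average similarity to the other n-1 hypotheses
    # and return the n-best list sorted best-first.
    def consensus_score(index):
        others = [hyp for j, hyp in enumerate(nbest) if j != index]
        if not others:
            return 0.0
        return sum(similarity(nbest[index], other) for other in others) / len(others)
    order = sorted(range(len(nbest)), key=consensus_score, reverse=True)
    return [nbest[i] for i in order]

# Example (sentence_beer is an assumed external scorer):
# new_1best = consensus_rerank(hypotheses, similarity=sentence_beer)[0]
```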

In Table 4 we illustrate some examples from the re-ranking approach. We observed that the consensus-based re-ranking produced interesting sentences that included syntactic re-orderings, new words, morphological variations and other nuances which were not captured by BLEU. This motivated us to perform a human evaluation of the translations to more quantitatively compare the original 1-best versus the re-ranked 1-best.

Human evaluation We conducted a human evaluation using Appraise (Federmann, 2012), an open-source web application for manual evaluation of MT output. Appraise collects human judgements on translation output, implementing annotation tasks such as quality checking, error classification, manual post-editing and, in our case, translation ranking. For a list of up to four systems' outputs for each source sentence, we requested human annotators to rank the set of MT candidates from the best to the worst, allowing for ties, based on both the source sentence and the reference translation. If two system outputs are the same, the MT candidate was displayed once and the same rank was assigned to both systems. For this evaluation, we selected a subset of our systems based on our automatic evaluation results: for each metric used for re-ranking in each language pair, we chose the systems that

⁵ Indicative time it took to re-rank a corpus of 3,000 sentences with n = 500 on a 40-core CPU server.

German→English
SRC: Das rund zehn bis zwölf Millionen Euro teure Vorhaben steht seit Monaten in der Diskussion.
REF: The € 10 - 12 million project has been under discussion for months.
Baseline: the EUR 10 million project has been under discussion for months.
BEER: the approximately EUR 10 to 12 million projects has been under discussion for months
ChrF3: the EUR 10 million euro project has been under discussion for several months.
BLEU: the projects around ten to twelve million euros have been discussed for months.

Czech→English
SRC: Navíc jsem si ze života odnesl zkušenost, že zasahování do ekosystému nevede k úspěchu a jednoho škůdce může nahradit druhý.
REF: Furthermore, in my experience, interfering with the ecosystem does not lead to success and one pest can replace another.
Baseline: moreover, I have learned from life that interfering with an ecosystem does not lead to success, and one pest can replace another.
BEER: moreover, I have learned from my life that it is not possible to succeed in an ecosystem, and one can replace one of the pests.
ChrF3: moreover, I have learned from life that interfering with an ecosystem does not lead to success, and one pest can replace one another.
BLEU: moreover, I have learned from life that interfering with an ecosystem does not lead to success, and one pest can replace one.

Table 4: Examples of alternative MT candidates chosen by consensus from n-best lists (with n = 500), showing the main differences between the reference translation, the baseline (i.e. the original 1-best) and the alternative translations chosen by our consensus re-ranking approach using BLEU, BEER or ChrF.

performed the best according to the three metrics (averaged ranking among the three), along with the original 1-best. Each human translator was asked to complete at least one hit of twenty annotation tasks. Incomplete hits were discarded from the evaluation. We collected 3,016 complete ranking results over the four NMT systems (159 for English→Czech, 1,365 for Czech→English, 911 for English→German, 581 for German→English), from 208 annotators. We borrowed a method from the WMT translation shared task to generate a global ranking of systems from these judgements. Table 5 reports the ranking results according to the Expected Wins⁶ method for the four language pairs. The first column (#m) indicates the ranking of the systems amongst themselves according to the three automatic metrics, while the third column (range) indicates the ranking from the human evaluation. For example, for English→German, the BLEU-100best system was ranked first amongst the four by all three metrics, but it was ranked last by human annotators.

⁶ https://github.com/keisks/wmt-trueskill/blob/master/src/infer_EW.py

English→German
  #m   score   range   system
  4    0.578   1-2     BEER-100best
  2    0.529   1-3     Baseline (10best)
  3    0.505   2-3     ChrF3-10best
  1    0.388   4       BLEU-100best

German→English
  #m   score   range   system
  4    0.559   1-3     BEER-500best
  2    0.546   1-3     Baseline (10best)
  3    0.525   1-3     ChrF3-10best
  1    0.393   4       BLEU-500best

English→Czech
  #m   score   range   system
  2    0.583   1-3     BEER-100best
  4    0.532   1-3     ChrF3-100best
  1    0.493   1-4     BLEU-100best
  3    0.372   3-4     Baseline (100best)

Czech→English
  #m   score   range   system
  4    0.526   1-3     BEER-10best
  3    0.522   1-2     ChrF3-500best
  2    0.508   1-3     Baseline (500best)
  1    0.453   3-4     BLEU-500best

Table 5: Results of the human evaluation for NMT. Systems are sorted according to human assessments while #m indicates the overall ranking of a system according to all three automatic metrics. Scores and ranges are obtained with the Expected Wins method (Sakaguchi et al., 2014). Lines between systems indicate clusters. Systems within a cluster are considered tied. In gray are systems which have not significantly outperformed the baseline.

Our first observation is that the consensus-based re-ranking with BEER outperforms the other two metrics for all the language pairs, confirming the results of the automatic evaluation. Except for Czech→English, systems always benefit from a beam size larger than 10, which suggests that we should consider exploiting larger search spaces in NMT. Another interesting outcome of the human evaluation is the ranking of our systems, which for most of the language pairs contradicts the ranking according to the automatic evaluation. Although those metrics are known to correlate well with human judgements, it seems that humans have different perceptions of the quality of the translations.

6 Conclusions

In this paper we reported our experiments and results on the influence of the beam size in NMT. While traditional approaches in NMT rely on smaller beam sizes or use greedy implementations, our paper strongly motivates using a larger beam size. We investigated the informativeness of larger beam sizes and highlighted the potential to improve translation quality by exploring larger hypothesis spaces using an oracle experiment. Motivated by substantial potential gains in both informativeness and oracle-based hypothesis re-ranking, we proposed a consensus-based NMT n-best re-ranking approach, with insights into the use of different metrics to capture consensus-based information. Our contribution strongly suggests further work in NMT to explore larger beams and n-best lists.

Acknowledgements This work was supported by the QT21 project (H2020 No. 645452).

References Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Bentivogli, L., Bisazza, A., Cettolo, M., and Federico, M. (2016). Neural versus phrase-based machine translation quality: a case study. arXiv preprint arXiv:1608.04631. Blackwood, G., de Gispert, A., and Byrne, W. (2010). Efficient path counting transducers

for minimum bayes-risk decoding of statistical machine translation lattices. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 27–32, Uppsala, Sweden.

Blatz, J., Fitzgerald, E., Foster, G., Gandrabur, S., Goutte, C., Kulesza, A., Sanchis, A., and Ueffing, N. (2003). Confidence estimation for machine translation. Technical report, Johns Hopkins University.

Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Jimeno Yepes, A., Koehn, P., Logacheva, V., Monz, C., Negri, M., Neveol, A., Neves, M., Popel, M., Post, M., Rubino, R., Scarton, C., Specia, L., Turchi, M., Verspoor, K., and Zampieri, M. (2016a). Findings of the 2016 conference on machine translation. In Proceedings of the First Confer- ence on Machine Translation, pages 131–198, Berlin, Germany.

Bojar, O., Dušek, O., Kocmi, T., Libovický, J., Novák, M., Popel, M., Sudarikov, R., and Variš, D. (2016b). CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In 19th International Text, Speech and Dialogue Conference, Brno, Czech Republic. Springer Verlag.

Bojar, O., Graham, Y., Kamran, A., and Stanojevic, M. (2016c). Results of the wmt16 met- rics shared task. In Proceedings of the First Conference on Machine Translation, Berlin, Germany.

Crammer, K. and Singer, Y. (2003). Ultraconservative online algorithms for multiclass prob- lems. J. Mach. Learn. Res., 3:951–991.

DeNero, J., Chiang, D., and Knight, K. (2009). Fast consensus decoding over translation forests. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Conference on Natural Language Processing of the AFNLP, pages 567–575.

Duh, K. and Kirchhoff, K. (2008). Beyond log-linear models: boosted minimum error rate training for n-best re-ranking. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, pages 37–40.

Duh, K., Sudoh, K., Tsukada, H., Isozaki, H., and Nagata, M. (2010). N-best reranking by multitask learning. In Proceedings of the Joint Fifth Workshop on Statistical Machine Trans- lation and MetricsMATR, pages 375–383, Uppsala, Sweden. Association for Computational Linguistics.

Dyer, C., Chahuneau, V., and Smith, N. A. (2013). A simple, fast, and effective reparameteri- zation of IBM model 2.

Ehling, N., Zens, R., and Ney, H. (2007). Minimum Bayes risk decoding for BLEU. In Pro- ceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 101–104, Prague, Czech Republic.

Federmann, C. (2012). Appraise: An open-source toolkit for manual evaluation of machine translation output. The Prague Bulletin of Mathematical Linguistics, 98:25–35.

Gimpel, K., Batra, D., Dyer, C., and Shakhnarovich, G. (2013). A systematic exploration of diversity in machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Hasan, S., Zens, R., and Ney, H. (2007). Are very large n-best lists useful for SMT? In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pages 57–60, Rochester, New York.

Jawaid, B., Kamran, A., Stanojević, M., and Bojar, O. (2016). Results of the wmt16 tuning shared task. In Proceedings of the First Conference on Machine Translation, pages 232–238, Berlin, Germany.

Koehn, P. (2010). Statistical Machine Translation. Cambridge University Press.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics: Demo Session, pages 177–180.

Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical phrase-based translation. In Proceed- ings of the Conference of the North American Chapter of the Association for Computational Linguistics, pages 48–54, Edmonton, Canada.

Kumar, S. and Byrne, W. (2004). Minimum Bayes-risk decoding for statistical machine trans- lation. In Proceedings of the Conference of the North American Chapter of the Association of Computational Linguistics, pages 169–176, Boston, MA.

Lala, C., Madhyastha, P., Wang, J., and Specia, L. (2017). Unraveling the contribution of image captioning and neural machine translation for multimodal machine translation. The Prague Bulletin of Mathematical Linguistics.

Lambert, P. and Banchs, R. (2006). Tuning machine translation parameters with spsa. In Proceedings of the International Workshop on Spoken Language Translation, pages 190– 196.

Li, J. and Jurafsky, D. (2016). Mutual information and diverse decoding improve neural ma- chine translation. CoRR, abs/1601.00372.

Lopez, A. (2008). Statistical machine translation. ACM Computing Surveys, 40(3):8:1–8:49.

Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the Conference on Empirical Methods in Nat- ural Language Processing.

Neubig, G., Morishita, M., and Nakamura, S. (2015). Neural reranking improves subjective quality of machine translation: NAIST at WAT2015. CoRR, abs/1510.05203.

Och, F. J., Gildea, D., Khudanpur, S., Sarkar, A., Yamada, K., Fraser, A. M., Kumar, S., Shen, L., Smith, D., Eng, K., et al. (2004). A smorgasbord of features for statistical machine trans- lation. In Proceedings of the Conference of the North American Chapter of the Association of Computational Linguistics, pages 161–168.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318.

Popovic, M. (2015). chrf: character n-gram f-score for automatic mt evaluation. In Proceedings of the 10th Workshop on Statistical Machine Translation, pages 392–395.

Proceedings of MT Summit XVI, vol.1: Research Track Nagoya, Sep. 18-22, 2017 | p. 296 Sakaguchi, K., Post, M., and Van Durme, B. (2014). Efficient elicitation of annotations for human evaluation of machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 1–11. Sennrich, R., Haddow, B., and Birch, A. (2015a). Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709. Sennrich, R., Haddow, B., and Birch, A. (2015b). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909. Sennrich, R., Haddow, B., and Birch, A. (2016). Edinburgh neural machine translation systems for wmt 16. In Proceedings of the First Conference on Machine Translation, pages 371–376, Berlin, Germany. Shen, L., Sarkar, A., and Och, F. J. (2004). Discriminative reranking for machine translation. In Proceedings of the Conference of the North American Chapter of the Association of Com- putational Linguistics, pages 177–184. Shu, R. and Nakayama, H. (2017). Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation. ArXiv e-prints. Sokolov, A., Wisniewski, G., and Yvon, F. (2012a). Computing lattice bleu oracle scores for machine translation. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 120–129, Avignon, France. Sokolov, A., Wisniewski, G., and Yvon, F. (2012b). Non-linear n-best reranking with few features. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas, pages 1–10, San Diego, CA. Specia, L., Sankaran, B., and Grac¸as Volpe Nunes, M. (2008). n-best reranking for the efficient integration of word sense disambiguation and statistical machine translation. Lecture Notes in Computer Science, 4919:399–410. Stahlberg, F., de Gispert, A., Hasler, E., and Byrne, B. (2017). Neural machine translation by minimising the bayes-risk with respect to syntactic translation lattices. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 362–368, Valencia, Spain. Stahlberg, F., Hasler, E., Waite, A., and Byrne, B. (2016). Syntactically guided neural machine translation. arXiv preprint arXiv:1605.04569. Stanojevic, M. and Sima’an, K. (2014). Beer: Better evaluation as ranking. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 414–419. Toral, A. and Sanchez-Cartagena,´ V. M. (2017). A multifaceted evaluation of neu- ral versus phrase-based machine translation for 9 language directions. arXiv preprint arXiv:1701.02901. Tromble, R. W., Kumar, S., Och, F., and Macherey, W. (2008). Lattice minimum bayes-risk decoding for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 620–629, Honolulu, Hawaii. Vijayakumar, A. K., Cogswell, M., Selvaraju, R. R., Sun, Q., Lee, S., Crandall, D. J., and Batra, D. (2016). Diverse beam search: Decoding diverse solutions from neural sequence models. CoRR, abs/1610.02424.

Wisniewski, G., Allauzen, A., and Yvon, F. (2010). Assessing phrase-based translation models with oracle decoding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 933–943, Cambridge, Massachusetts.

Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.

Zhang, Y., Hildebrand, A. S., and Vogel, S. (2006). Distributed language modeling for n-best list re-ranking. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 216–223, Sydney, Australia.

Confidence through Attention

Matīss Rikters [email protected] Faculty of Computing, University of Latvia

Mark Fishel fi[email protected] Institute of Computer Science, University of Tartu, Estonia

Abstract

Attention distributions of the generated translations are a useful by-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens. In this work, we use attention distributions as a confidence metric for output translations. We present two strategies of using the attention distributions: filtering out bad translations from a large back-translated corpus, and selecting the best translation in a hybrid setup of two different translation systems. While manual evaluation indicated only a weak correlation between our confidence score and human judgments, the use-cases showed improvements of up to 2.22 BLEU points for filtering and 0.99 points for hybrid translation, tested on English↔German and English↔Latvian translation.

1 Introduction Neural machine translation (NMT) has recently redefined the state-of-the-art in machine trans- lation (Sennrich et al., 2016a; Wu et al., 2016a), with one of the ground-breaking innovations that enabled this being the introduction of the attention mechanism (Bahdanau et al., 2014). It enables the model to find parts of a source sentence that are relevant to predicting a target word (pay attention), without the need to form these parts as a hard segment explicitly. Decoding sen- tences with the attention-based model resulted in a useful by-product – soft alignments between tokens of source and target sentences. These can be used for many purposes, such as replacing unknown words with back-off translations from a dictionary (Jean et al., 2015) and visualizing the soft alignments (Rikters et al., 2017). In this paper, we propose using the attention alignments as an indicator of the translation output quality and the confidence of the decoder. We define metrics of confidence that detect and penalize under-translation and over-translation (Tu et al., 2016) as well as input and output tokens with no clear alignment, assuming that all these cases most likely mean that the quality of the translation output is bad. We apply these attention-based metrics to two use-cases: scoring translations of an NMT system and filtering out the seemingly unsuccessful ones, and comparing translations from two different NMT systems, in order to select the best one. The structure of this paper is as follows: Section 2 summarizes related work in back- translating with NMT, machine translation combination approaches and confidence estimation. Section 3 introduces the problem of faulty attention distributions and a way to quantify it as a confidence score. Sections 4 and 5 outline the two use-cases for this score – translation filtering and hybrid selections. Finally, we conclude in Section 6 and mention directions for future work in Section 7.

2 Related Work

Back-translation of Monolingual Data One of the first uses of back-translation of monolingual data as an additional source of training data was reported by Sennrich et al. (2016a) in their submission to the WMT16 news translation shared task. They translated target-language monolingual corpora into the source language of the respective language pair, and then used the resulting synthetic parallel corpus as additional training data. They performed experiments with 2 million to 10 million back-translated sentences and reported an increase of 2.2 - 7.7 BLEU (Papineni et al., 2002) for translating between English and Czech, German, Romanian and Russian. The authors also experimented with different amounts of back-translated data and found that adding more data gradually improves performance.

In a later paper, Sennrich et al. (2016b) explored other methods of using monolingual data. They experimented with adding an enormous amount of monolingual sentences as targets without any sources to the parallel corpus, and compared that to performing back-translation on a part of the monolingual data. While both methods outperform using just parallel data, the back-translated synthetic parallel corpus is a much more powerful addition than the monolingual data alone. Pinnis et al. (2017) experimented with large and even larger amounts of back-translated data and concluded that any amount is an improvement, but that doubling the amount gives lower results, which are nevertheless still better than not using any at all. These results hint that it may be possible to get even better results when using only the part of the data selected with some criterion. One of the aims of our work is to provide one such criterion.

Machine Translation System Combination Zhou et al. (2017) used attention to combine outputs from NMT and SMT systems. The authors first trained intermediate NMT, SMT and hierarchical SMT systems with one half of the training data. Afterwards, they used each system to translate the target side of the other half of the training data. Finally, the three translated parts were used as source sentence variants, alongside the clean target sentence, for training the combination neural network. This approach gave the network more choices of where to pay attention and which parts should be ignored in the training process. They perform experiments on Chinese→English and report a BLEU score improvement of 5.3 points over the best single system and 3.4 points over traditional MT combination methods.

Peter et al. (2016) perform MT system combination in a more traditional manner, using confusion networks. They use 12 different SMT and NMT systems to generate hypothesis translations, and align and reorder each hypothesis to match one skeleton hypothesis, creating a confusion network. The final output is generated by finding the best path through the network. The authors report an improvement of 1.0 BLEU compared to the best single system, translating from English into Romanian.

Translation Confidence Metrics The idea of modeling coverage in NMT has been introduced recently; for example, Tu et al. (2016) integrate it directly into the attention mechanism and report improved translation quality as a result. On the simpler side of things, Wu et al. (2016b) perform tests with a baseline attention that uses an additional coverage penalty at decoding time; they report no improvement compared to the common length normalization. Our metrics are partially motivated by the coverage penalty, though we apply them at the post-translation stage to determine the confidence of the decoder and the quality of the already produced translation, which makes them applicable regardless of which software or approach was used.

Another closely related task is quality estimation. The dominating approach there is collecting post-edits and training a machine learning model to predict the quality score or classify translations into usable/not, near-perfect/not, etc. (Bach et al., 2011; Felice and Specia, 2012). The main similarity between our work and quality estimation is the use of glass-box features (i.e. information about the MT system or the decoder's internal parameters). While our approach does not cover all aspects of quality estimation, it requires no data or training and can be applied to any language and neural machine translation system.

3 Penalizing Attention Disorders Before describing the confidence metrics based on attention weights, here is a brief overview of the NMT architecture where the attention weights come from. 3.1 Source of Attention Our work is built around the encoder-decoder machine translation approach (Sutskever et al., 2014; Cho et al., 2014) with an attention mechanism (Bahdanau et al., 2014). In this approach the source tokens are learned to be represented by an encoder, which consists of an embedding layer and a bi-directional LSTM or GRU layer (or 8, Wu et al., 2016b), the outputs of which serve as the learned representation. There is also a decoder that consists of another layer (or 8, ibid.) of LSTM/GRU cells, with an output layer for predicting the softmax-encoded raw probability distribution of each output word, one at a time. The state of the decoder layer(s) and thus the output distribution depends on the previous recurrent states, the previously produced output word and a weighted sum of the representations of the source sentence tokens. The weights in this sum are generated for every output word by the attention mechanism, which is a feed-forward neural network with the previous state of the decoder and each input word representation as input and the raw weight of that word for the next state as output. Finally, the attention weights are normalized as follows:

\[ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})} \]

where e_ij is the raw predicted weight and α_ij is the final attention weight between input token j and output token i. Once the encoder-decoder network has been trained, it can be used to produce translations by predicting the probability for each next word, which can serve as the basis for sampling, greedy search or beam search (Sennrich et al., 2017). We refer the reader to Bahdanau et al. (2014) for a complete description and turn to the main topic of this paper: using the weights α_ij to estimate the confidence of the translations.

Together with the translation, it is also possible to save the attention values between the input tokens and each produced output token. These values can be interpreted as the influence of the input token on the output token, or the strength of the connection between them. Thus, weak or dispersed connections should intuitively indicate a translation with low confidence, while high values and strong connections between one or two tokens on both sides should indicate higher confidence. Next, we present our take at formalizing this intuition.

3.2 Measuring Attention

Figure 1 shows an example of a translation that has little or nothing to do with the input, a frequent occurrence in NMT. Besides the text of the translation, it is clear already by looking at the attention weights of this pair that the translation is weak:

• some input tokens (like the sentence-final full stop) are most strongly connected to several unrelated output tokens, in other words, their coverage is too high,

Figure 1: Attention alignment visualization of a bad translation. Reference translation: 71 traffic accidents in which 16 persons were injured have happened in Latvia during the last 24 hours., hypothesis translation: the latest , in the last few days , the EU has been in the final day of the EU 's " European Year of Intercultural Dialogue ". CDP = −0.900, AP_out = −2.809, AP_in = −2.137, Total = −5.846.

• most of the input token attentions, as well as some output token attentions, are highly dispersed, without one or two clear associations on the counterpart.

On the other hand, a picture like Figure 2 intuitively corresponds to a good translation, with strongly focused alignments. It is this intuition that our metrics formalize: penalizing translations with tokens whose total coverage is not just below but also much higher than 1.0, as well as tokens with a dispersed attention distribution.

Coverage Deviation Penalty Previous work (Wu et al., 2016b) defines a coverage penalty, which is meant to punish translations for not paying enough attention to input tokens:

\[ CP = \beta \sum_j \log\left(\min\left(\sum_i \alpha_{ji},\ 1.0\right)\right), \]

where i is the output token index, j the input token index, β is used to control the influence of the metric and CP is the coverage penalty.

The first part of our metric draws inspiration from the coverage penalty; however, it penalizes not just lacking attention but also too much attention per input token. The aim is to penalize the sum of attentions per input token for going too far from 1.0,¹ so tokens with a total attention of 1.0 should get a score of 0.0 on the logarithmic scale, while tokens with less attention (like 0.2) or more attention (like 2.5) should get lower values. We thus define the coverage deviation penalty:

\[ CDP = -\frac{1}{J} \sum_j \log\left(1 + \left(1 - \sum_i \alpha_{ji}\right)^2\right), \]

where J is the length of the input sentence. The metric is on a logarithmic scale, and it is normalized by the length of the input sentence in order to avoid assigning higher scores to shorter sentences.² See examples of the CDP metric's values in Figures 1 and 2.
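The CDP can be computed directly from the attention matrix saved during decoding. The sketch below assumes the matrix is available as a NumPy array with one row per output token and one column per input token; the array layout and the function name are our own conventions, not part of any particular toolkit.

```python
import numpy as np

def coverage_deviation_penalty(attention):
    # attention: array of shape (I, J) -- one row per output token, one column per
    # input token, each row a distribution over the input tokens.
    # Column sums give the total attention each input token received; deviations
    # from 1.0 in either direction are penalised.
    J = attention.shape[1]
    per_input_total = attention.sum(axis=0)
    return -np.log(1.0 + (1.0 - per_input_total) ** 2).sum() / J
```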

¹ This could be replaced with the token's expected fertility, which we leave for future work.
² This is not required for choosing translations of the same sentence by the same system, but is required in our experiments described in the next sections.

Figure 2: Attention alignment visualization of a good translation. Reference translation: He was a kind spirit with a big heart., hypothesis translation: he was a good man with a broad heart. CDP = −0.099, AP_out = −1.077, AP_in = −0.847, Total = −2.024.

Absentmindedness Penalty However, it is not enough to simply cover the input, we conjecture that more confident output tokens will allocate most of their attention probability mass to one or a small number of input tokens. Thus the second part of our metric is called the absentmindedness penalty and targets scattered attention per output token, where the dispersion is evaluated via the entropy of the predicted attention distribution. Again, we want the penalty value to be 1.0 for the lowest entropy and head towards 0.0 for higher entropies.

\[ AP_{out} = -\frac{1}{I} \sum_i \sum_j \alpha_{ji} \cdot \log \alpha_{ji} \]

The values are again on the log-scale and normalized by the output sentence length I. The absentmindedness penalty can also be applied to the input tokens after normalizing the distribution of attention per input token, resulting in the counterpart metric AP_in. This is based on the assumption that it is not enough to cover the input token; rather, the input token should be used to produce a small number of outputs. See examples of both metrics' values in Figures 1 and 2. Finally, we combine the coverage deviation penalty with both the input and output absentmindedness penalties into a joint metric via summation:

confidence = CDP + APout + APin
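Continuing the sketch above, the two absentmindedness penalties and the joint confidence score could be computed as follows. The array layout is the same as in the CDP sketch, and the sign convention (all values ≤ 0, with values closer to zero indicating more focused attention) is our reading of the example values quoted in the figure captions.

```python
import numpy as np

def absentmindedness_penalties(attention, eps=1e-12):
    # attention: array of shape (I, J), rows = output tokens, columns = input tokens.
    # Returns 0 for perfectly focused (one-hot) attention and increasingly negative
    # values for dispersed attention, per the figure captions.
    a = np.clip(attention, eps, 1.0)
    I, J = a.shape
    ap_out = (a * np.log(a)).sum() / I           # dispersion per output token
    a_in = a / a.sum(axis=0, keepdims=True)      # renormalise attention per input token
    ap_in = (a_in * np.log(a_in)).sum() / J      # dispersion per input token
    return ap_out, ap_in

def confidence(attention):
    # Joint score: CDP + AP_out + AP_in (CDP recomputed here so the sketch runs standalone).
    J = attention.shape[1]
    cdp = -np.log(1.0 + (1.0 - attention.sum(axis=0)) ** 2).sum() / J
    ap_out, ap_in = absentmindedness_penalties(attention)
    return cdp + ap_out + ap_in
```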

Next, we evaluate the metrics directly against human judgments and indirectly by applying them to filtering translations and plugging them into a sentence-level hybrid translation scheme.

3.3 Human Evaluation It is clear that the defined metrics only paint a partial picture, since they rely on the attention weights only. For instance, they do not evaluate the lexical correspondence between the source and hypothesis, and more generally, being confident does not mean being right. We wanted to find out how much confidence in our case correlates with translation quality.


To do so we asked human volunteers to perform pairwise ranking of translations from two baseline NMT systems: one built with Nematus (Sennrich et al., 2017) and the other with Neural Monkey (Helcl and Libovický, 2017). The translations and measurements were done for English-Latvian and Latvian-English, using corpora from the news translation shared task of WMT'2017; further details can be found in Section 4. We selected 200 random sentences for both translation directions and these were given to native Latvian speakers for evaluation. The MT-EQuAl (Girardi et al., 2014) tool was used for the evaluation task. The evaluators were shown one source sentence at a time along with the two different translations. They were instructed to assign one of five categories to each translation: "worst", "bad", "ok", "good" or "best", noting that both may be categorized as equally "good" or "bad", etc. Differing judgments for the same sentence were averaged. All 200 sentences were annotated by at least one human annotator.

It makes more sense to treat the results as relative comparisons, not absolute scores, as the annotators only see two translations at a time. We use these comparisons to compute the Kendall rank correlation coefficient (Kendall, 1938), looking only at the pairs where the human scores differ. Since we only have comparisons for each pair and not between different sentences, the coefficient is computed as

\[ \tau = \frac{pos - neg}{pos + neg}, \]

where pos is the number of pairs where the metric agrees with the human judgment and neg is the number of pairs where they disagree. The results are presented in Table 1, and as we can see they indicate weak correlation, with the absolute values of τ between 0.012 and 0.200.

  Language pair    CDP      AP_in    AP_out   Overall
  En→Lv            0.099    0.074    0.123    0.086
  Lv→En           -0.012   -0.153   -0.200   -0.153

Table 1: The Kendall’s Tau correlation between human judgments and the confidence scores.
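The coefficient itself is straightforward to compute from the pairwise judgments; a minimal sketch under our own encoding of the preferences (+1/−1/0 indicating which of the two translations is preferred, 0 meaning a tie) is given below.

```python
def pairwise_kendall_tau(metric_preferences, human_preferences):
    # Each element is +1, -1 or 0, indicating which of a sentence's two translations
    # the metric / the human preferred.  Ties in the human judgment are skipped,
    # as described in the text.
    pos = neg = 0
    for metric_pref, human_pref in zip(metric_preferences, human_preferences):
        if human_pref == 0:
            continue
        if metric_pref == human_pref:
            pos += 1
        else:
            neg += 1
    return (pos - neg) / (pos + neg)
```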

Let us look closer at where the metrics disagree with human judgments. Figure 3 shows an example of a translation which was rated highly by human annotators but poorly with our metrics. While the sentence is a good translation, it does not follow the source word-by-word. Some subword units and functional words do not have a clear alignment, even though they are understood/generated correctly. This means that one problem with our metrics is that they might be over-penalizing translations that deviate from a direct literal translation. Next, we continue with the experiments of using our metrics to filter synthetic data and to select translations in a hybrid MT scenario.

Figure 3: Attention alignment visualization of a translation rated highly by human annotators but poorly by our metrics. Reference translation: a 28-year-old chef who had recently moved to San Francisco was found dead in the stairwell of a local mall this week ., hypothesis translation: a 28-year-old old man who has recently moved to San Francisco has died this week ., CDP = −0.250, AP_out = −1.740, AP_in = −1.46, Total = −3.45.

4 Filtering Back-translated Data

4.1 Baseline Systems and Data Our baseline systems were trained with two NMT frameworks: Nematus (NT) (Sennrich et al., 2017) and Neural Monkey (NM) (Helcl and Libovický, 2017). For all NMT models we used a shared subword unit vocabulary (Sennrich et al., 2016c) of 35,000 tokens, clipped the gradient norm to 1.0 (Pascanu et al., 2013), applied dropout of 0.2, trained the models with Adadelta (Zeiler, 2012), and performed early stopping after 7 days of training. For each NMT framework we used the default settings mentioned in the framework's documentation:

• For NT models we used a maximum sentence length of 50, word embeddings of size 512, and hidden layers of size 1000. For decoding with NT we used beam search with a beam size of 12.

• For NM models we used a maximum sentence length of 70, word embeddings and hidden layers of size 600. For decoding with NM a greedy decoder was used.

Training, development and test data for all systems in both language pairs and translation directions was used from the WMT17 news translation task 3. For the baseline systems, we used all available parallel data, which is 5.8 million sentences for En↔De and 4.5 million sentences for En↔Lv.

4.2 Back-translating and Filtering We used our baseline En→Lv and Lv→En NM and NT systems to translate all available Latvian monolingual news domain data - 6.3 million sentences in total from News Crawl: articles from 2014, 2015, 2016, and the first 6 million sentences from the English News Crawl 2016. Much more monolingual data was available from other domains aside from news. Since the development and test data was of the news domain, we only used that, considering it as in- domain data for our systems.

³ EMNLP 2017 Second Conference on Machine Translation – http://www.statmt.org/wmt17/

For each translation, we used the attention provided by the NMT system to calculate our confidence score, sorted all translations according to the score, and selected the top half of the translations along with the corresponding source sentences as the synthetic parallel corpus. We used only the full confidence score (the combination of CDP, AP_out and AP_in) for filtering, instead of each individual score, due to its smoother overall correlation with human judgments. In between, we also removed any translation that contained unknown (<unk>) tokens.

To compare attention-based filtering with a different method, we trained a CharRNN⁴ language model (LM) with 4 million news sentences from each of the target languages. We used these LMs to obtain perplexity scores for all translations, ordered the translations accordingly, and kept the better half. Table 2 summarizes how much the human evaluation overlaps with each of the filtering methods. The final row indicates how much both filtering methods overlap with each other. While the results from either approach do not look overly convincing, the LM-based approach has been shown to correlate with human judgments nearly as well as the BLEU score and is a good evaluation method for MT without reference translations (Gamon et al., 2005). Therefore the attention-based approach, which does not require training an additional model and overlaps with human judgments to approximately the same level, should be more desirable.
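The filtering step itself amounts to ranking the synthetic pairs by confidence and keeping the best-scoring fraction; a minimal sketch, with illustrative names and assuming one confidence score per back-translated pair, follows.

```python
def filter_backtranslations(pairs, confidence_scores, keep_ratio=0.5):
    # pairs: (synthetic_source, target) tuples obtained by back-translation;
    # confidence_scores: one attention-based confidence value per pair
    # (higher = more confident).  Keep the best-scoring fraction.
    ranked = sorted(zip(confidence_scores, range(len(pairs))), reverse=True)
    cutoff = int(len(pairs) * keep_ratio)
    return [pairs[index] for _, index in ranked[:cutoff]]
```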

Filtering Method                         En→Lv    Lv→En
LM-based overlap with human              58%      56%
Attention-based overlap with human       52%      60%
LM-based overlap with Attention-based    34%      22%

Table 2: Human judgment overlap results on 200 random sentences from the newsdev2017 dataset compared to filtering methods.
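As a minimal sketch of the filtering step described above (the function and variable names are ours, and the confidence values are hypothetical), the selection of the better-scoring half can be written as:

def filter_back_translations(pairs):
    # `pairs` is assumed to be a list of (source, translation, confidence) tuples,
    # where `confidence` is the total attention-based score (CDP + APout + APin).
    # Sort by confidence, best first, and keep the top half as synthetic data.
    ranked = sorted(pairs, key=lambda p: p[2], reverse=True)
    return ranked[: len(ranked) // 2]

# Hypothetical usage with two scored back-translations:
synthetic = filter_back_translations([
    ("teikums viens", "sentence one", -1.2),
    ("teikums divi", "sentence two", -4.7),
])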

4.3 NMT with Filtered Synthetic Data

Figure 4: Automatic evaluation progression of Lv→En experiments on validation data. Orange - baseline; dark blue - with full back-translated data; green - with LM-filtered back-translated data; light blue - with attention-filtered back-translated data.

We shuffled each synthetic parallel corpus with the baseline parallel corpora and used them to train NMT systems. In addition to the baseline and two types of filtered BT synthetic data, we also trained a system with the full BT data for each translation direction. Figure 4

4Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch - https://github.com/karpathy/char-rnn

shows a combined training progress chart for Lv→En on the full newsdev2017 dataset, which was used as the development set for training. Here the differences between all four approaches are clearly visible. Further results on a subset of newsdev2017 and the full newstest2017 dataset are summarized in Table 3. While for Lv→En and En↔De the attention-based approach is the clear leader, for En→Lv it falls behind the LM-filtered version. We were not able to identify a clear reason for this and leave it for future work. As expected, adding BT synthetic training data yields higher BLEU scores in all cases. It can be observed that filtering out the worse half of the back-translated data and keeping only the best translations, when using the LM, either does not decrease the final output quality or even further increases it. With filtering by attention, the results are less consistent - an improvement in one direction but a deterioration in the other. A reason for this could be that for Lv→En the attention-based filtering agreed with human judgments more than for En→Lv (Table 2), and it also differed more from the LM-based filtering, while for the other direction it is the other way around.

                             En→Lv           Lv→En           En→De           De→En
System (BLEU)                Dev     Test    Dev     Test    Dev     Test    Dev     Test
Baseline                     8.36    11.90   8.64    12.40   25.84   20.11   30.18   26.26
+ Full Synthetic             9.42    13.50   9.01    13.81   28.97   22.68   34.82   29.35
+ LM-Filtered Synthetic      9.75    13.52   9.45    14.30   29.59   23.48   34.47   29.42
+ Attn.-Filtered Synth.      8.99    12.76   11.23   14.83   30.19   23.16   35.19   29.47

Table 3: Experiment results in BLEU for translating between English↔Latvian and English↔German with different types of back-translated data, using development (200 random sentences from newsdev2017) and test (newstest2017) datasets.

5 Attention-based Hybrid Decisions
We translated the development set with both baseline systems for each language pair in each direction. The hybrid selection of the best translation was performed similarly to filtering, where we discarded the worst-scoring half of the translations; here, we used the same score to compare the two translations of a source sentence and chose the better one. Results of the hybrid selection experiments are summarized in Table 4. For translating between En↔Lv, where the difference between the baseline systems is not that high (0.06 and 1.55 BLEU), the hybrid method achieves meaningful improvements. However, for En↔De, where the differences between the baseline systems are bigger (3.46 and 4.46 BLEU), the hybrid drags both scores down.

System (BLEU)   En→De   De→En   En→Lv   Lv→En
Neural Monkey   18.89   26.07   13.74   11.09
Nematus         22.35   30.53   13.80   12.64
Hybrid          20.19   27.06   14.79   12.65
Human           23.86   34.26   15.12   13.24

Table 4: Hybrid selection experiment results in BLEU on the development dataset (200 random sentences from newsdev2017).

The last row of Table 4 shows BLEU scores for the scenario in which human annotator preferences were used to select each output sentence. An overview of the translations preferred by the human evaluators is given in Table 5. The results show that the human evaluators do not consistently prefer one system over the other: aside from En→Lv, where a slight tendency towards Neural Monkey translations can be observed, all other directions look more or less balanced. This contrasts strongly with the BLEU scores in Table 4 - in both translation directions from English, human evaluators prefer the lower-scoring system more often than the higher-scoring one. The final row of Table 5 shows how often our attention-based score matches the human judgments in selecting the best translation.

System                           En→De   De→En   En→Lv   Lv→En
Neural Monkey                    54%     42%     61.5%   47%
Nematus                          46%     58%     38.5%   53%
Overlaps with hybrid selection   57%     47%     62.5%   51%

Table 5: Human evaluation results on 200 random sentences from the newsdev2017 dataset compared to attention-hybrid selection.
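The hybrid selection described above amounts to a per-sentence comparison of the two systems' confidence scores; a minimal sketch (the names and data layout are ours, not from the released scripts) could look as follows:

def hybrid_select(nm_outputs, nt_outputs):
    # Both inputs are assumed to be lists of (translation, confidence) tuples,
    # aligned by source sentence; `confidence` is the same attention-based
    # score that was used for filtering the back-translated data.
    selected = []
    for (nm_hyp, nm_score), (nt_hyp, nt_score) in zip(nm_outputs, nt_outputs):
        selected.append(nm_hyp if nm_score >= nt_score else nt_hyp)
    return selected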

6 Conclusions
In this paper, we described how attentional data from neural machine translation systems can be useful for more than just visualizations or replacing specific tokens in the output. We introduced an attention-based confidence score that can be used for judging NMT output. Two applications of attentional data were investigated and compared to similar approaches. We used a smaller dataset to perform manual evaluation and compared it to all automatically obtained results. Our experiments showed promising results, with some increases in automatic evaluation scores as well as a good correlation with human judgments. In addition to the methods described in this paper, we release open-source scripts5 for (1) scoring, ordering and filtering NMT translations, (2) performing hybrid selections between two different NMT outputs of the same source, and (3) inspecting the attention alignments that the NMT systems produce during translation (used for Figures 1 and 2). We also provide all development subsets that we used for manual evaluation, with anonymized human annotations.

7 Future Work
This paper introduced first steps in using NMT attention for purposes beyond the obvious ones. The attention score appears to complement the LM perplexity score in distinguishing good from bad translations; an idea for future experiments is to combine these scores to achieve a higher correlation with human judgments. Additional improvements can be made to the hybrid decisions as well. Since the score represents the system's confidence, a badly trained NMT system can be more confident about a bad translation than a good system is about a decent translation. While a hybrid combination of two NMT systems of similar quality did put the attention score to good use, in the case of systems of different quality the confidence of the weaker one was a pitfall. This indicates that the confidence score could be used in an ensemble with a quality estimation score, or as a feature when training an MT quality estimation system. For filtering synthetic back-translated data we dropped the worst-scoring 50% of the data, but this threshold may not be optimal for all scenarios. Several paths worth further exploration

5Confidence Through Attention - https://github.com/M4t1ss/ConfidenceThroughAttention

include exploring the effects of different static thresholds (e.g., 30% or 70%), or clustering the data by confidence score and dropping the lowest-scoring one or two clusters. Another path worth exploring for filtering would be to see how filtering by each individual score (CDP, APin, APout) compares to filtering by the combined confidence. In the near future, we also plan to extend the attention inspection tool so that it displays confidence metrics and additional visualizations based on these scores.

References Bach, N., Huang, F., and Al-Onaizan, Y. (2011). Goodness: A method for measuring machine translation confidence. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 211–219, Portland, Oregon, USA. Association for Computational Linguistics.

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.

Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar. Association for Computational Linguistics.

Felice, M. and Specia, L. (2012). Linguistic features for quality estimation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 96–103, Montréal, Canada.

Gamon, M., Aue, A., and Smets, M. (2005). Sentence-level mt evaluation without reference translations: Beyond language modeling. In Proceedings of EAMT, pages 103–111.

Girardi, C., Bentivogli, L., Farajian, M. A., and Federico, M. (2014). MT-EQuAl: a Toolkit for Human Assessment of Machine Translation Output. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations, pages 120–123.

Helcl, J. and Libovický, J. (2017). Neural Monkey: An open-source tool for sequence learning. The Prague Bulletin of Mathematical Linguistics, 107(1):5–17.

Jean, S., Cho, K., Memisevic, R., and Bengio, Y. (2015). On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10, Beijing, China. Association for Computational Linguistics.

Kendall, M. (1938). A new measure of rank correlation. Biometrika, 30(1-2):81–89.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Pascanu, R., Mikolov, T., and Bengio, Y. (2013). On the difficulty of training recurrent neural networks. ICML (3), 28:1310–1318.

Peter, J.-T., Alkhouli, T., Ney, H., Huck, M., Braune, F., Fraser, A., Tamchyna, A., Bojar, O., Haddow, B., Sennrich, R., Blain, F., Specia, L., Niehues, J., Waibel, A., Allauzen, A., Aufrant, L., Burlot, F., Knyazeva, E., Lavergne, T., Yvon, F., Frank, S., Daiber, J., and Pinnis, M. (2016). The QT21/HimL Combined Machine Translation System. In Proceedings of the First Conference on Machine Translation (WMT 2016), Volume 2: Shared Task Papers, pages 344–355.

Pinnis, M., Krislauks, R., Deksne, D., and Miks, T. (2017). Neural machine translation for morphologically rich languages with improved sub-word units and synthetic data. In International Conference on Text, Speech, and Dialogue, pages 20–27. Springer.

Rikters, M., Fishel, M., and Bojar, O. (2017). Visualizing neural machine translation attention and confidence. The Prague Bulletin of Mathematical Linguistics, 109(3):in print.

Sennrich, R., Firat, O., Cho, K., Birch, A., Haddow, B., Hitschler, J., Junczys-Dowmunt, M., Läubli, S., Miceli Barone, A. V., Mokry, J., and Nadejde, M. (2017). Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016a). Edinburgh neural machine translation systems for WMT 16. In Proceedings of the First Conference on Machine Translation, pages 371–376, Berlin, Germany. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016b). Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86–96, Berlin, Germany. Association for Computational Linguistics.

Sennrich, R., Haddow, B., and Birch, A. (2016c). Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, Montreal, Canada.

Tu, Z., Lu, Z., Liu, Y., Liu, X., and Li, H. (2016). Modeling coverage for neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 76–85, Berlin, Germany.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al. (2016a). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. (2016b). Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.

Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.

Zhou, L., Hu, W., Zhang, J., and Zong, C. (2017). Neural System Combination for Machine Translation.

Disentangling ASR and MT Errors in Speech Translation

Ngoc-Tien Le [email protected]
Benjamin Lecouteux [email protected]
Laurent Besacier [email protected]
University Grenoble Alpes, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France

Abstract

The main aim of this paper is to investigate automatic quality assessment for spoken language translation (SLT). More precisely, we investigate SLT errors that can be due to the transcription (ASR) or to the translation (MT) module. This paper investigates automatic detection of SLT errors using a single classifier based on joint ASR and MT features. We evaluate both 2-class (good/bad) and 3-class (good/bad_ASR/bad_MT) labeling tasks. The 3-class problem requires disentangling ASR and MT errors in the speech translation output, and we propose two label extraction methods for this non-trivial step. As a by-product, this enables a qualitative analysis of SLT errors and their origin (are they due to the transcription or the translation step?) on our large in-house corpus for French-to-English speech translation.

Index Terms: Spoken Language Translation, Automatic Speech Recognition, Confidence Estimation, Quality Estimation, ASR and MT error detection.

1 Introduction

This paper addresses a relatively new quality assessment task: error detection in spoken language translation (SLT) using both automatic speech recognition (ASR) features and machine translation (MT) features. To our knowledge, the first attempts to design error detection for speech translation using both ASR and MT features are our own work (Besacier et al., 2014, 2015), which is further extended in this paper.
Contributions. (1) This paper extends previous work (Besacier et al., 2014, 2015) on 2-class (good/bad) error detection in SLT using a single classifier based on joint ASR and MT features; (2) in order to disentangle ASR and MT errors in SLT, we extend error detection to a 3-class problem (good/bad_ASR/bad_MT) where we try to find the source of the SLT errors; (3) two methods are compared for setting such 3-class labels on our corpus, and a first attempt to automatically detect errors and their origin in an SLT output is presented at the end of this paper.
Outline. Section 2 formalizes error detection in SLT and presents our experimental setup. Section 3 proposes two methods to disentangle ASR and MT errors in SLT output and presents statistics on a large French-English corpus. Section 4 presents our 2-class and 3-class error detection results, while Section 5 concludes this work and gives some perspectives.

2 Automatic Error Detection in Speech Translation

2.1 Formalization
A quality estimation (or error detection) component in speech translation solves the equation:

q̂ = argmax_q { p_SLT(q | x_f, f, ê) }    (1)

where x_f is the given signal in the source language; ê = (e_1, e_2, ..., e_N) is the most probable target language sequence from the spoken language translation (SLT) process1; f = (f_1, f_2, ..., f_M) is the transcription of x_f; and q = (q_1, q_2, ..., q_N) is a sequence of error labels on the target language, with q_i ∈ {good, bad}2. This is a sequence labeling task that can be solved with several machine learning techniques such as Conditional Random Fields (CRF) (Lafferty et al., 2001). However, for that, we need a large amount of training data for which a quadruplet (x_f, f, e, q) is available. As it is much easier to obtain data containing either the triplet (x_f, f, q) (ASR output + manual references and error labels inferred from WER) or the triplet (f, e, q) (MT output + manual post-editions and error labels inferred using tools such as TERp-A (Snover et al., 2008)), we can also recast error detection with the following equation:

q̂ = argmax_q { p_ASR(q | x_f, f)^α * p_MT(q | e, f)^(1−α) }    (2)

where α is a weight giving more or less importance to the error detector on the transcription compared to the error detector on the translation.
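As a minimal sketch of the word-level combination behind Equation (2) (the function is ours; α is set a priori to 0.5 in the experiments of Section 4, and the probabilities used here are hypothetical):

import math

def combine_confidences(p_asr, p_mt, alpha=0.5):
    # p_asr and p_mt are assumed to be per-word probabilities of the 'good'
    # label, with the ASR scores already projected onto the target words.
    # Returns the weighted geometric mean p_asr**alpha * p_mt**(1 - alpha).
    return [math.exp(alpha * math.log(pa) + (1 - alpha) * math.log(pm))
            for pa, pm in zip(p_asr, p_mt)]

# Hypothetical confidences for three target words:
print(combine_confidences([0.9, 0.4, 0.8], [0.7, 0.6, 0.2]))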

2.2 Dataset, ASR and MT Modules

2.2.1 Dataset
In this paper, we use our in-house corpus made available on a GitHub repository3 for reproducibility. The dev set and tst set of this corpus were recorded by French native speakers. Each sentence was uttered by 3 speakers, leading to 2643 and 4050 speech recordings for the dev set and the tst set, respectively. For each speech utterance, a quintuplet is available containing: ASR output (f_hyp), verbatim transcript (f_ref), text translation output (e_hyp_mt), speech translation output (e_hyp_slt) and post-edition of the translation (e_ref). The total length of the union of dev and tst is 16h52 (42 speakers - 5h51 for dev and 11h01 for tst).

2.2.2 ASR Systems

To obtain the speech transcripts (f_hyp), we built a French ASR system based on the KALDI toolkit (Povey et al., 2011). Acoustic models are trained using several corpora (ESTER, REPERE, ETAPE and BREF120) representing more than 600 hours of transcribed French speech. We use two 3-gram language models trained on the French ESTER corpus (Galliano et al., 2006) as well as on the French Gigaword corpus (the vocabulary sizes are respectively 62k and 95k). The LM weight parameters of the ASR systems are tuned through WER on the dev corpus. Table 1 presents the performances obtained by both ASR systems.

2.2.3 SMT System
We used the Moses phrase-based translation toolkit (Koehn et al., 2007) to translate the French ASR output into English (e_hyp). This medium-size system was trained using a subset of the data provided

1. Written simply e for convenience in all other equations.
2. At this point q_i takes two values (G/B), but will evolve to 3 labels later on in Section 3.
3. https://github.com/besacier/WCE-SLT-LIG/

for the IWSLT 2012 evaluation (Federico et al., 2012): the Europarl, TED and News-Commentary corpora. The total amount is about 60M words. We used an adapted target language model trained on specific data (News Crawl corpora) similar to our evaluation corpus (see (Potet et al., 2010)).

2.3 Obtaining Error Labels for SLT
After building an ASR system, we have a new element of our desired quintuplet: the ASR output f_hyp. It is the noisy version of our already available verbatim transcripts, called f_ref. This ASR output (f_hyp) is then translated by the SMT system (Potet et al., 2010) already mentioned in subsection 2.2.3. This new output translation is called e_hyp_slt, and it is a degraded version of e_hyp_mt (the translation of f_ref). To infer the quality (G, B) labels of our speech translation output e_hyp_slt, we use the TERp-A toolkit (Snover et al., 2008) between e_hyp_slt and e_ref (more details can be found in our former paper (Besacier et al., 2015)). Table 1 summarizes the baseline ASR, MT and SLT performances obtained on our corpora, as well as the distribution of good (G) and bad (B) labels inferred for both tasks. Logically, the percentage of (B) labels increases from the MT to the SLT task under the same conditions, and it decreases when the ASR system improves.

Task         ASR (WER)           MT (BLEU)           % G (good)          % B (bad)
             dev set   tst set   dev set   tst set   dev set   tst set   dev set   tst set
MT           -         -         49.13%    57.87%    76.93%    81.58%    23.07%    18.42%
SLT (ASR1)   21.86%    17.37%    26.73%    36.21%    62.03%    70.59%    37.97%    29.41%
SLT (ASR2)   16.90%    12.50%    28.89%    38.97%    63.87%    72.61%    36.13%    27.39%

Table 1. ASR, MT and SLT performances on our dev set and tst set.

3 Disentangling ASR and MT Errors
In the previous section, we only extracted good/bad labels from the SLT output, while it might be interesting to move from a 2-class problem to a 3-class problem in order to label our SLT hypotheses with one of the 3 following labels: good (G), asr-error (B_ASR) and mt-error (B_MT). Before training automatic systems for error detection, we need to set such 3-class labels on our dev and test corpora. For that, we propose, in the next sub-sections, two slightly different methods to extract them. The first one is based on word alignments between SLT and MT, and the second one is based on a simpler SLT-MT error subtraction.

3.1 Method 1 - Word Alignments between MT and SLT
In machine translation, the fertility of a source word denotes how many output words it translates into. If we transpose this definition to our disentangling problem, then the fertility of an MT error denotes how many erroneous words - in the SLT output - it is aligned to. From this simple definition, we derive our first way (Method 1) to generate 3-class annotations.
Let ê_slt = (e_1, e_2, ..., e_n) be the set of SLT hypotheses (e_hyp_slt); e_k,j denotes the j-th word in sentence e_k, where 1 ≤ k ≤ n.
Let ê_mt = (e'_1, e'_2, ..., e'_n) be the set of MT hypotheses (e_hyp_mt); e'_k,i denotes the i-th word in sentence e'_k, where 1 ≤ k ≤ n.

Let L = (l_1, l_2, ..., l_n) be the set of word alignments from sentences in e_hyp_slt to the corresponding sentences in e_hyp_mt, where l_k contains the word alignments from sentence e_k to the corresponding sentence e'_k, 1 ≤ k ≤ n; (e_k,j, e'_k,i) = True if there is a word alignment between e_k,j and e'_k,i, and (e_k,j, e'_k,i) = False otherwise.

Our algorithm for Method 1 is defined as Algorithm 1. This method relies on word alignments and uses MT labels. We also propose a simpler method in the next section.

Algorithm 1 Method 1 - Using word alignments between MT and SLT
list_labels_result ← empty_list
for each sentence e_k ∈ ê_slt do
    list_labels_sent ← empty_list
    for j ← 1 to NumberOfWords(e_k) do
        if label(e_k,j) = 'G' then
            add 'G' to list_labels_sent
        else if there exists a word alignment (e_k,j, e'_k,i) and label(e'_k,i) = 'B' then
            add 'B_MT' to list_labels_sent
        else
            add 'B_ASR' to list_labels_sent
        end if
    end for
    add list_labels_sent to list_labels_result
end for
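A minimal Python re-implementation of Algorithm 1 (the function name and data layout are ours), assuming the word alignment is given as a mapping from SLT word positions to MT word positions:

def method1_labels(slt_labels, mt_labels, alignment):
    # slt_labels and mt_labels are the 2-class ('G'/'B') labels of the SLT and
    # MT hypotheses; alignment maps an SLT word index j to the aligned MT word
    # index i (unaligned words are simply absent from the mapping).
    labels = []
    for j, slt_label in enumerate(slt_labels):
        if slt_label == "G":
            labels.append("G")
        elif j in alignment and mt_labels[alignment[j]] == "B":
            labels.append("B_MT")   # error already present in the MT output
        else:
            labels.append("B_ASR")  # error introduced by the ASR step
    return labels

# Example from Table 3 ("surgeons in los angeles it is said"):
print(method1_labels(
    slt_labels=["G", "B", "G", "G", "B", "B", "G"],
    mt_labels=["G", "B", "G", "G", "B", "G"],
    alignment={0: 0, 1: 1, 2: 2, 3: 3, 5: 4, 6: 5},
))  # -> ['G', 'B_MT', 'G', 'G', 'B_ASR', 'B_MT', 'G']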

3.2 Method 2 - Subtraction between SLT and MT Errors
Our second way to extract 3-class labels (Method 2) focuses on the differences between the SLT hypothesis (e_hyp_slt) and the MT hypothesis (e_hyp_mt). We call it subtraction between SLT and MT errors because we simply consider that errors present in the SLT output and not present in the MT output are due to ASR. This method has one main difference with the previous one: it does not rely on the extracted labels for MT. Our intuition is that the number of mt-errors estimated will be slightly lower than for Method 1, since we first estimate the number of asr-errors and the rest is considered - by default - as mt-errors.
We use the same notation as for Method 1, but here L = (l_1, l_2, ..., l_n) is the set of alignments obtained through edit distance between e_hyp_slt and e_hyp_mt, where l_k,i corresponds to "Insertion", "Substitution", "Deletion" or "Exact". Our algorithm for Method 2 is defined as Algorithm 2.

3.3 Example with 3-label Setting
Table 2 gives the edit distance between an SLT and an MT hypothesis, while Table 3 shows how Method 1 and Method 2 assign 3-class labels to the SLT hypothesis. The transcript (f_hyp) has 1 error. This yields 3 B labels on the SLT output (e_hyp_slt), while e_hyp_mt has only 2 B labels. As can be seen, Method 1 and Method 2 respectively produce (1 B_ASR, 2 B_MT) and (2 B_ASR, 1 B_MT).

e_hyp_slt   surgeons   in      los     angeles   it          is             said
e_hyp_mt    surgeons   in      los     angeles   **          have           said
edit op.    Exact      Exact   Exact   Exact     Insertion   Substitution   Exact

Table 2. Example of edit distance between SLT and MT.

Algorithm 2 Method 2 - Subtraction between SLT and MT errors
list_labels_result ← empty_list
for each sentence e_k ∈ ê_slt do
    list_labels_sent ← empty_list
    for j ← 1 to NumberOfWords(e_k) do
        if label(e_k,j) = 'G' then
            add 'G' to list_labels_sent
        else if NameOfWordAlignment(l_k,i) is 'Insertion' or 'Substitution' then
            add 'B_ASR' to list_labels_sent
        else
            add 'B_MT' to list_labels_sent
        end if
    end for
    add list_labels_sent to list_labels_result
end for
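A minimal Python counterpart of Algorithm 2 (names are ours); for simplicity it assumes one edit operation per SLT word, so 'Deletion' operations, which have no corresponding SLT word, are ignored here:

def method2_labels(slt_labels, edit_ops):
    # slt_labels are the 2-class ('G'/'B') SLT labels; edit_ops are the
    # edit-distance operations between the SLT and MT hypotheses
    # ('Exact', 'Insertion', 'Substitution') for each SLT word.
    labels = []
    for slt_label, op in zip(slt_labels, edit_ops):
        if slt_label == "G":
            labels.append("G")
        elif op in ("Insertion", "Substitution"):
            labels.append("B_ASR")  # the word differs from the MT output
        else:
            labels.append("B_MT")   # error shared with the MT output
    return labels

# Example from Tables 2 and 3 ("surgeons in los angeles it is said"):
print(method2_labels(
    slt_labels=["G", "B", "G", "G", "B", "B", "G"],
    edit_ops=["Exact", "Exact", "Exact", "Exact", "Insertion", "Substitution", "Exact"],
))  # -> ['G', 'B_MT', 'G', 'G', 'B_ASR', 'B_ASR', 'G']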

f_ref                    les   chirurgiens   de   los   angeles   ont     dit
f_hyp                    les   chirurgiens   de   los   angeles   on      dit
labels ASR               G     G             G    G     G         B       G

e_hyp_mt                 surgeons   in     los   angeles   have    said
labels MT                G          B      G     G         B       G

e_hyp_slt                surgeons   in     los   angeles   it      is      said
labels SLT (2-label)     G          B      G     G         B       B       G
labels SLT (Method 1)    G          B_MT   G     G         B_ASR   B_MT    G
labels SLT (Method 2)    G          B_MT   G     G         B_ASR   B_ASR   G

e_ref                    the surgeons of los angeles said

Table 3. Example of quintuplet with 2-label and 3-label.

These differences are due to the slightly different algorithms for label extraction. As Table 3 shows, "is" (SLT hypothesis) is aligned to "have" (MT hypothesis), and "have" (MT hypothesis) is labeled "B". It can therefore be assumed that "is" (SLT hypothesis) should be annotated with the word-level label B_MT according to Method 1. However, using Method 2, "is" (SLT hypothesis) is labeled B_ASR because the type of word alignment between "is" (SLT hypothesis) and "have" (MT hypothesis) is a substitution (S), as shown in Table 2.

3.4 Statistics with 3-label Setting on the Whole Corpus
Table 4 presents the summary statistics for the distribution of good (G), asr-error (B_ASR) and mt-error (B_MT) labels obtained with both label extraction methods. We see that both methods give similar statistics but slightly different rates of B_ASR and B_MT. As can be seen from Table 4, it is interesting to note that while the ASR system improves from ASR1 to ASR2, the rate of B_ASR labels logically decreases by more than 2 points, while the rate of B_MT remains almost stable (less than 1 point of difference), which makes sense since the MT system is the same for both ASR1 and ASR2. These statistics suggest that the intersection of both methods is probably a good estimation of the disentangled ASR and MT errors in SLT.

                        dev set                           tst set
Task - ASR1             %G      %B_ASR   %B_MT            %G      %B_ASR   %B_MT
label/m1: Method 1      62.03   19.09    18.89            70.59   14.50    14.91
label/m2: Method 2      62.03   22.49    15.49            70.59   16.62    12.79
label/same(m1, m2)      62.03   18.09    14.49            70.59   13.58    11.88
label/diff(m1, m2)      0       1.00     4.40             0       0.92     3.03

                        dev set                           tst set
Task - ASR2             %G      %B_ASR   %B_MT            %G      %B_ASR   %B_MT
label/m1: Method 1      63.87   16.89    19.23            72.61   11.92    15.47
label/m2: Method 2      63.87   19.78    16.34            72.61   13.58    13.81
label/same(m1, m2)      63.87   16.05    15.50            72.61   11.12    13.01
label/diff(m1, m2)      0       0.84     3.73             0       0.80     2.46

Table 4. Statistics with 3-label setting for ASR1 and ASR2.

3.5 Qualitative Analysis of SLT Errors
Our new 3-label setting procedure allows us to analyze the behavior of our SLT system.

f_ref       peter frey est né le quatre août mille neuf cent cinquante sept à bingen
f_hyp1      pierre ferait aimé le quatre août mille neuf cent cinquante sept à big m
f_hyp2      pierre frey est né le quatre août mille neuf cent cinquante sept à big m
e_hyp_mt    peter frey was born on 4 august 1957 to bingen .
e_hyp_slt1  pierre would liked the four august thousand nine hundred and fifty seven to big m
e_hyp_slt2  pierre frey is born the four august thousand nine hundred and fifty seven to big m
e_ref       peter frey was born on august 4th 1957 in bingen .

Table 5. Example 1 - SLT hypothesis annotated with two methods - having a few asr-errors, a few mt-errors and many slt-errors such as 5 B_ASR1, 3 B_ASR2, 2 B_MT, 14 B_SLT1, 12 B_SLT2.

Table 5 presents, as an example, a sentence with few ASR and MT errors that lead to many SLT errors. Indeed, this is a good way of detecting flaws in the SLT pipeline, such as bad post-processing of the SLT output (numerical or textual dates, for instance). As shown in Table 6, on the contrary, there are sentences with many ASR errors that lead to few SLT errors (ASR errors with few consequences, such as morphological substitutions - for instance in French: de/des, déficit/déficits, budgétaire/budgétaires). Finally, ASR errors as presented in Table 7 have different consequences on SLT quality (on a sample sentence, the 2 ASR errors of systems 1 and 2 lead to 14 and 9 SLT errors, respectively). Figure 1 shows how our speech utterances are distributed in the two-dimensional (B_ASR, B_MT) error space.

4 Automatic Error Detection for SLT
In this paper, we use Conditional Random Fields (CRFs) (Lafferty et al., 2001) as our machine learning method, with the WAPITI toolkit (Lavergne et al., 2010), to train our error detector based on engineered MT and ASR features. For ASR, we extract 9 features, which come from the ASR graph, from language model scores and from a morphosyntactic analysis.

Figure 1. Rate (%) of ASR errors (x-axis) versus rate (%) of MT errors (y-axis), for the dev corpus (ASR1) and the tst corpus (ASR2).

f_ref       malheureusement le système européen de financement gouvernemental direct est
f_hyp1      malheureusement le système européen financement gouvernementale directe et
f_hyp2      malheureusement le système européen de financement gouvernemental direct est
e_hyp_mt    unfortunately , the european system of direct government funding is
e_hyp_slt1  unfortunately the european system direct government funding
e_hyp_slt2  unfortunately the european system of direct government funding is
e_ref       unfortunately , the european system of direct government funding is

f_ref       victime de la croissance économique européenne lente et des déficits budgétaires
f_hyp1      victimes de la croissance économique européenne venant de déficit budgétaire
f_hyp2      victime de la croissance économique européenne venant des déficits budgétaires
e_hyp_mt    a victim of european economic growth slow and budget deficits .
e_hyp_slt1  and victims of european economic growth from budget deficit
e_hyp_slt2  a victim of european economic growth from the budget deficits
e_ref       a victim of slow european economic growth and budget deficits .

Table 6. Example 2 - SLT hypothesis annotated with two methods - having many asr-errors, a few mt-errors and a few slt-errors such as 8 B_ASR1, 1 B_ASR2, 1 B_MT, 2 B_SLT1, 2 B_SLT2.

f_ref       nous ne comprenons pas ce qui se passe chez les jeunes pour qu' ils trouvent
f_hyp1      nous ne comprenons pas ceux qui se passe chez les jeunes pour qu' ils trouvent
f_hyp2      nous ne comprenons pas ce qui se passe chez les jeunes pour qu' il trouve
e_hyp_mt    we do not understand what is happening among young people for that
e_hyp_slt1  we do not understand those who happens among young people for that
e_hyp_slt2  we do not understand what is happening among young people
e_ref       we do not understand what is happening in young people 's mind for them

f_ref       amusant de maltraiter gratuitement un animal sans défense qui nous donne
f_hyp1      amusant de maltraité gratuitement un animal sans défense qui nous
f_hyp2      amusant de maltraiter gratuitement un animal sans défense qui nous donne
e_hyp_mt    they are fun to mistreat free a defenceless animal
e_hyp_slt1  they find fun free mistreated a defenceless animal
e_hyp_slt2  to find it amusing to mistreat free a defenceless animal
e_ref       to find amusing to mistreat defenceless animals without reason ,

f_ref       de l' affection de l' amitié et nous tient compagnie
f_hyp1      de l' affection de l' amitié nous tient compagnie
f_hyp2      de l' affection de l' amitié nous tient compagnie
e_hyp_mt    which gives us the affection , friendship and keeps us airline .
e_hyp_slt1  which we affection of friendship we takes company
e_hyp_slt2  which gives us the affection of friendship we takes company
e_ref       which gives us , friendship and companionship .

Table 7. Example 3 - SLT hypothesis annotated with two methods - having the same number of asr-errors but a different number of slt-errors extracted from ASR1 and ASR2, such as 2 B_ASR1, 2 B_ASR2, 12 B_MT, 14 B_SLT1, 9 B_SLT2.

These detailed features can be found in (Besacier et al., 2014). For MT, we use a total of 24 major feature types which can be extracted with our word confidence estimation toolkit for MT (more details are given in (Servan et al., 2015)).

4.1 Experiments on 2-class Error Detection

Exp             MT+ASR feat.                               Joint feat.
                p_ASR(q|x_f, f)^α * p_MT(q|e, f)^(1−α)     p(q|x_f, f, e)
F-avg1 (ASR1)   58.07%                                     64.90%
F-avg2 (ASR2)   53.66%                                     64.17%

Table 8. Error Detection Performance (2-label) on SLT output for the tst set (training is made on the dev set).

In this experiment, we evaluate the performance of our classifiers using the average of the F-measure for good labels and the F-measure for bad labels, calculated from the common evaluation metrics Precision, Recall and F-measure for good/bad labels. Since two ASR systems are available, F-avg1 is obtained for SLT based on ASR1, whereas F-avg2 is obtained for SLT based on ASR2. The classifier is evaluated on the tst part of our corpus and trained on the dev part.
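As a small sketch of this evaluation metric (the use of scikit-learn here is our choice, not the authors'; the labels are hypothetical):

from sklearn.metrics import f1_score

def f_avg(y_true, y_pred, labels=("G", "B")):
    # Average of the per-class F-measures (here: good and bad labels).
    per_class = f1_score(y_true, y_pred, labels=list(labels), average=None)
    return per_class.mean()

# Hypothetical word-level labels:
print(f_avg(["G", "B", "G", "B"], ["G", "B", "B", "B"]))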

We report in Table 8 the baseline error detection results obtained using both MT and ASR features for a 2-class problem (error detection). More precisely, we evaluate two different approaches (combination and joint):
• The first system (MT+ASR feat.) combines the output of two separate classifiers based on ASR and MT features. In this approach, the ASR-based confidence score of the source is projected onto the target SLT output and combined with the MT-based confidence score as shown in Equation 2 (we did not tune the α coefficient and set it a priori to 0.5).
• The second system (joint feat.) trains a single error detection system for SLT (evaluating p(q|x_f, f, e) as in Equation 1) using joint ASR and MT features. ASR features are projected onto the target words using automatic word alignments (a sketch of this projection follows below).
Table 8 shows that joint ASR and MT features improve error detection performance over the simple combination (MT+ASR). Based on this result, only the joint approach is used in our 3-class experiments in the next section. We also observe that the F-measure decreases when the ASR WER is lower (F-avg2 < F-avg1).
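A minimal sketch of how source-side ASR features could be projected onto target words to build joint per-token feature dictionaries (names are ours; the actual system uses the WAPITI CRF toolkit with its own feature format, and the resulting rows would still have to be written out in that format):

def joint_features(mt_feats, asr_feats, tgt_to_src):
    # mt_feats: list of MT feature dicts, one per target (SLT) word.
    # asr_feats: list of ASR feature dicts, one per source word.
    # tgt_to_src: maps a target word index to its aligned source word index.
    joint = []
    for j, feats in enumerate(mt_feats):
        row = dict(feats)
        i = tgt_to_src.get(j)
        if i is not None:
            # Copy the ASR features of the aligned source word onto the target
            # word, prefixed so they stay distinguishable from the MT features.
            row.update({f"asr_{k}": v for k, v in asr_feats[i].items()})
        joint.append(row)
    return joint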

4.2 Experiments on 3-class Error Detection
We report in Table 9 our first attempt to build an error detection system for SLT as a 3-class problem (joint approach only). We ran our experiment by training and evaluating the model on Intersection(m1, m2), which corresponds to high confidence in the labels5. We compared two different approaches: One-Step is a single classifier for the 3-class problem, while Two-Step first applies the 2-class (G/B) system and then a second classifier distinguishes B_ASR and B_MT errors. Not much difference in F-measure is observed between the two approaches. Table 10 also presents the confusion matrix between B_ASR and B_MT for the correctly detected (true) errors. Despite the relatively low F-scores of Table 9, we see that our 3-label classifier obtains encouraging confusion matrices for automatically disentangling B_ASR and B_MT on true errors.

5 Conclusions
This paper proposed to disentangle ASR and MT errors in speech translation. The binary error detection problem was recast as a 3-class labeling problem (good, asr-error, mt-error). First, two methods were proposed for the non-trivial label setting, and it was shown that both give consistent results. Then, automatic detection of error types, using joint ASR and MT features, was evaluated, and encouraging results were obtained on a French-English speech translation task. We believe that such a new task (not only detecting errors but also their cause) is interesting for building better-informed speech translation systems, especially in interactive speech translation use cases.

4. Corresponding to optimization of the F-measure on bad labels (errors).
5. However, we observed (results not reported here) that the use of different label sets (Method 1, Method 2, Intersection(Method 1, Method 2)) does not have a strong influence on the results.

Figure 2. Evolution of system performance (y-axis: F-avg1 for ASR1 and F-avg2 for ASR2) for the tst corpus (4050 utterances) along the decision threshold variation (x-axis); training is made on the dev corpus (2643 utterances). Curves show F-avg(all) of WCE-SLT using MT features, ASR features, MT+ASR feature sets, and joint feature sets.

            2-class                          3-class, Intersection Corpus (m1, m2)
            Full Corpus                      One-Step            Two-Step
            ASR1     ASR2                    ASR1     ASR2       ASR1     ASR2
F_G         81.79    83.17      F_G          85.00    85.00      84.00    85.00
F_B         48.00    45.17      F_B_ASR      44.00    42.00      44.00    42.00
                                F_B_MT       14.00    15.00      16.00    17.00
F_avg       64.90    64.17      F_avg        47.67    47.33      48.00    48.00

Table 9. Error Detection Performance (2-label vs 3-label) on SLT output for tst set (training is made on dev set).

(1) One-Step
              ASR1                     ASR2
Ref \ Hyp     B_ASR     B_MT           B_ASR     B_MT
B_ASR         85.75%    14.25%         81.57%    18.43%
B_MT          44.46%    55.54%         34.53%    65.47%

(2) Two-Step
              ASR1                     ASR2
Ref \ Hyp     B_ASR     B_MT           B_ASR     B_MT
B_ASR         83.14%    16.86%         80.02%    19.98%
B_MT          49.41%    50.59%         41.49%    58.51%

Table 10. Confusion Matrix on the Correctly Detected Errors Subset for 3-class: (1) One-Step; (2) Two-Step.

References
Besacier, L., Lecouteux, B., Luong, N. Q., Hour, K., and Hadjsalah, M. (2014). Word confidence estimation for speech translation. In Proceedings of The International Workshop on Spoken Language Translation (IWSLT), Lake Tahoe, USA.

Besacier, L., Lecouteux, B., Luong, N.-Q., and Le, N.-T. (2015). Spoken language translation graphs re-decoding using automatic quality assessment. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, Arizona, United States.

Federico, M., Cettolo, M., Bentivogli, L., Paul, M., and Stüker, S. (2012). Overview of the IWSLT 2012 evaluation campaign. In Proceedings of the 9th International Workshop on Spoken Language Translation (IWSLT).

Galliano, S., Geoffrois, E., Gravier, G., Bonastre, J.-F., Mostefa, D., and Choukri, K. (2006). Corpus description of the ESTER evaluation campaign for the rich transcription of French broadcast news. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), pages 315–320.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W.,

Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 177–180, Prague, Czech Republic.

Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-01, pages 282–289.

Lavergne, T., Cappé, O., and Yvon, F. (2010). Practical very large scale crfs. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 504–513.

Potet, M., Besacier, L., and Blanchon, H. (2010). The LIG machine translation system for WMT 2010. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT 2010), Uppsala, Sweden.

Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., and Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRW-USB.

Servan, C., Le, N.-T., Luong, N. Q., Lecouteux, B., and Besacier, L. (2015). An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation. In The 12th International Workshop on Spoken Language Translation (IWSLT’15), Da Nang, Vietnam.

Snover, M., Madnani, N., Dorr, B., and Schwartz, R. (2008). Terp system description. In MetricsMATR workshop at AMTA.

Temporality as Seen through Translation: A Case Study on Hindi Texts

Sabyasachi Kamila† [email protected] Sukanta Sen† [email protected] Mohammad Hasanuzzaman∗ [email protected] Asif Ekbal† [email protected] Andy Way∗ [email protected] Pushpak Bhattacharyya† [email protected] †Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India ∗ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland

Abstract
Temporality has significantly contributed to various aspects of Natural Language Processing applications. In this paper, we determine the extent to which temporal orientation is preserved when a sentence is translated manually and automatically from Hindi to English. We show that the temporal orientation identified, both manually and automatically, in the English translations (manual and automatic) matches the temporal orientation of the Hindi texts well. We also find that the task of manual temporal annotation becomes more difficult in the translated texts, while the automatic temporal processing system manages to correctly capture temporal information from the translations.

1 Introduction
There is considerable academic and commercial interest in processing time information in text, where that information is expressed either explicitly, implicitly, or connotatively. Recognizing such information and exploiting it for Natural Language Processing (NLP) and Information Retrieval (IR) tasks are important features that can significantly improve the functionality of NLP/IR applications such as event timeline generation, question answering, and automatic summarization (Mani et al., 2005; Campos et al., 2014). Earlier studies on temporal information processing have mainly focused on identifying temporal expressions, fostered by the TempEval challenges (Verhagen et al., 2010; UzZaman et al., 2013). More recently, new trends have emerged in the context of human temporal orientation, which refers to individual differences in the relative emphasis one places on the past, present, or future (Zimbardo and Boyd, 2015). Past studies have established consistent links between temporal orientation and demographic factors such as age, sex, gender, education, and psychological traits (Webley and Nyhus, 2006; Adams and Nettle, 2009; Schwartz et al., 2013; Zimbardo and Boyd, 2015). In order to create a user-level measure of human temporal orientation, a message-level1 temporal

1Only the English message is considered from microblogs.

classifier of past, present, and future is used. For instance, the following microblog post, "can't wait to get a pint tonight", is automatically tagged as future by the temporal classifier. Successful features include timexes and specific temporal (past, present, future) words from a commercial dictionary, but also n-grams. Many tasks in NLP are language-dependent, i.e. the same approach cannot be applied across different languages. In this case, one naive way of temporality detection is to translate the text automatically into the desired language and then apply any temporality detection system. However, Machine Translation (MT) is itself a challenging task, and often the meaning, sentiment (Salameh et al., 2015; Lohar et al., 2017) or temporality of a text may not be preserved in the target language. In this paper, we discuss the degree to which the underlying temporal orientation of a sentence is preserved when it is translated from Hindi to English. We use Hindi and English temporality analysis systems (described in Section 6.2) as well as a state-of-the-art Hindi-to-English translation system (Koehn et al., 2003). From our experiments, we attempt to analyze all the possible cases and answer the following questions:

1. What is the accuracy of temporality prediction by an English temporality analysis system when Hindi texts are translated into English?

2. How good are these predictions when compared to the Hindi temporality system?

3. What is the loss in the temporality predictability when translating the Hindi text into English automatically vs. manually?

4. What is the difficulty level to determine temporality by humans in automatically translated texts from Hindi to English?

5. Which is better in detecting temporality of the Hindi text in the translated En- glish text: (a) human temporal annotation of the translated text or (b) automatic temporality analysis of the translated text?

We know that linguistic divergences between a pair of languages play a significant role when translating from one language to the other, and hence they have a significant impact on the accuracy of an automatic computational model. Our specific goal here is to analyse the temporality predictability of the Hindi text after translation. However, we note that similar experiments can be carried out for other language pairs to determine the impact of translation on temporality. We show the percentage of temporality preservation in the translated English sentences with respect to the temporality of the Hindi sentences. We also show that both manual and automatic translations produce a change of temporality from that of the Hindi texts; past and present sentences tend to be translated into sentences of future time. Our further analysis shows that some characteristics of the automatically translated text mislead humans when detecting the temporality of the source text, and some of those cases were correctly classified by the automatic temporal analysis system. Our contributions can be summarized as follows: i) to the best of our knowledge this is the first systematic attempt to study whether temporality is preserved after translation; ii) we prepare a benchmark setup by creating three annotated datasets - Hindi texts, and manually and automatically translated English texts - labeled with three temporal classes, namely past, present and future; and iii) we detect the change of temporality in both manually and automatically translated sentences.

2 Related Works
Temporality has recently received increased attention in NLP and IR. The introduction of the TempEval task (Verhagen et al., 2009) and subsequent challenges (TempEval-2 and -3) in the Semantic Evaluation workshop series have clearly established the importance of time in dealing with different NLP tasks. According to Metzger (2007), time is one of the five key aspects that determine a document's credibility, besides relevance, accuracy, objectivity and coverage. Given this, the value of information, or its quality, is intrinsically time-dependent. As a consequence, a new research field called Temporal Information Retrieval (T-IR) has emerged, which deals with all classical IR tasks such as crawling (Kulkarni et al., 2011), indexing (Anand et al., 2012) or ranking (Kanhabua et al., 2011) from the viewpoint of time. From an application perspective of T-IR, Campos et al. (2014) proposed a solution for the temporal classification of queries by identifying the top relevant dates in web snippets with respect to a given implicit temporal query, with temporal disambiguation performed through a distributional metric called GTE. Competitions like the NTCIR-11 Temporalia task (Joho et al., 2014) pushed this idea further and proposed to distinguish whether a given query is related to past, recency, future or is atemporal. In order to push forward further research in temporal NLP and IR, Dias et al. (2014) developed TempoWordNet (TWn), an extension of WordNet (Miller, 1995), where each synset is augmented with its temporal connotation (past, present, future, or atemporal). The same kind of approach was followed for Hindi to create a lexical resource, namely TempoHindiWordNet (Pawar et al., 2016). At the same time, there have been quite a few works on MT involving the Hindi-English language pair. Most of these systems aim to translate from English to Hindi or other Indian languages (Dave et al., 2001; Sinha and Jain, 2003; Sinha and Thakur, 2005; Ananthakrishnan et al., 2006; Dungarwal et al., 2014; Sachdeva et al., 2014; Sen et al., 2016). One of the major challenges in MT between Hindi and English is the syntactic divergence: English follows the Subject-Verb-Object (SVO) word order, whereas Hindi follows Subject-Object-Verb (SOV). Ramanathan et al. (2008) have shown that a simple syntactic transformation of the English language to meet the syntax of Hindi can improve translation quality. For our Hindi-English translation system, we follow the standard phrase-based statistical MT (Koehn et al., 2003) approach.

3 Methodology Overview
We present our experimental setup to study the impact of translation on temporality, as follows:

1. Collect a Hindi dataset (Hi) described in Section 4.2.

2. Manually translate Hi into English (En). We refer to these English translations as En(Manl.Trans.).

3. Automatically translate Hi into En. We refer to these English translations as En(Auto.Trans.).

4. Manually annotate Hi for temporality. We call these Hi(Manl.Tempo.).

5. Manually annotate all English datasets (En(Manl.Trans.) and En(Auto.Trans.)) for temporality. We call those En(Manl.Trans., Manl.Tempo.) and En(Auto.Trans., Manl.Tempo), respectively.

Figure 1: Proposed Architecture.

6. Run a Hindi temporality detector on Hi, creating Hi(Auto.Tempo.)

7. Run an English temporality detector on all the English datasets (En(Manl.Trans.) and En(Auto.Trans.)) creating En(Manl.Trans., Auto.Tempo.) and En(Auto.Trans., Auto.Tempo.), respectively.

8. The procedural steps are depicted in Figure 1.

After creating the various temporality-labeled datasets, we can compare pairs of datasets to draw inferences. For example, a comparison of the labels for En(Manl.Trans., Manl.Tempo.) and En(Auto.Trans., Manl.Tempo.) will show how the automatic translation affects the manual temporal labels with respect to the manual translation. The comparison will also show, for example, the extent to which a past sentence tends to be translated as a present sentence. The comparison of the dataset pair (Hi(Manl.Tempo.) vs. En(Auto.Trans., Auto.Tempo.)) will show whether the idea of first translating a Hindi sentence into English and then using automatic temporality detection is feasible or not. Section 5 describes the procedure of Hindi-to-English translation. Section 6 describes the ways of determining temporality for the different datasets, i.e. Hi, En(Manl.Trans.) and En(Auto.Trans.), both manually and automatically. Finally, Section 7 discusses the temporal error rate and the analysis of different test cases.
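A comparison of this kind reduces to an agreement score and a confusion count between two label sequences over the same sentences; a minimal sketch (the function is ours, and the labels shown are hypothetical):

from collections import Counter

def temporal_agreement(labels_a, labels_b):
    # Fraction of sentences with identical labels, plus a confusion counter
    # showing, e.g., how often a 'past' sentence in one dataset becomes
    # 'future' in the other.
    confusion = Counter(zip(labels_a, labels_b))
    agreement = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return agreement, confusion

# Hypothetical comparison of Hi(Manl.Tempo.) vs. En(Auto.Trans., Auto.Tempo.):
acc, conf = temporal_agreement(["past", "future", "present"],
                               ["past", "future", "future"])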

4 Dataset For our experiments, we use a parallel corpus of Hindi-English created in Bojar et al. (2014). This corpus contains 274k Hindi-English parallel sentences. The training and test sets for temporal tagging are described in Section 4.1 and 4.2. For MT, the details of training, test and development sets are mentioned in Section 5.

4.1 Training Set
We select past-, present-, and future-oriented texts using a manually selected high-precision list of 50 seed terms. These are terms that capture temporal dimensions of texts with very few false positives, though the recall of these terms is low. In order to increase the recall, and to capture new terms that are good examples of past, present, and future, we expand our initial seed terms using a query expansion technique. For English, we use the publicly available word2vec2 vectors that are trained on the Google News corpus. For Hindi, we employ a continuous distributed vector representation of words using the continuous Skip-gram model (also known as Word2Vec) proposed by Mikolov et al. (2013), trained on a corpus of around 44 million Hindi sentences developed by Bojar et al. (2014), with the dimension set to 300 and the window size set to 7. Given the vector representations for the terms, we calculate similarity scores between pairs of terms in our vocabulary using cosine similarity. The top-10 most similar terms for each seed term are selected for the expansion of the initial seed list. We again filter the whole collection of texts using the newly added seed terms. Table 1 shows a few examples of expanded terms for some of the initial seed terms. There are some unrelated keywords in the expanded seed list due to the automatic process of keyword selection.

           Temporality   Initial Seeds                       Expanded Seed Terms
Hindi3     Past          गत (gata-past)                      वगत (vigata-last/past), पछले (piChale-last/previous), बीते (bIte-past/bygone), पछे (piChalle-last/previous), वगत (vigata-last/past), गतवष (gatavarSha-last year)
           Present       फ़लहाल (pha़ilahAla-at the moment)    फ़लहाल (pha़ilahAla-at the moment), अभी (abhI-now), अब (aba-now), फलव (philavakta-philanthropy), बहरहाल (baharahAla-nevertheless), ख़ैर (kha़aira-well), हाल-फलहाल (hAla-philahAla-most recently)
           Future        वादा (vAdA-promise)                 वादे (vAde-promises), वायदा (vAyadA-futures), ऐलान (ailAna-announce), एलान (elAna-announce), दावा (dAvA-claim), आमह (Agraha-request)
English    Past          yesterday                           yesterday, Earlier, Last, Shortly_afterwards, Meanwhile
           Present       currently                           presently, Currently, now, currenty, still, already, iscurrently, actively
           Future        promise                             promises, pledge, vow, commitment, hope, expect, vowing

Table 1: Examples of initial seed terms and their expanded terms.
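The expansion step described above can be sketched as follows; the vector file name is the standard public Google News word2vec release and is only an assumed path (for Hindi, the Skip-gram vectors trained on the Bojar corpus would be loaded instead):

from gensim.models import KeyedVectors

# Assumed path to the pre-trained vectors (Google News word2vec for English).
vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                            binary=True)

def expand_seeds(seed_terms, topn=10):
    # Add the top-n cosine-similar terms of every seed term to the seed list.
    expanded = set(seed_terms)
    for term in seed_terms:
        if term in vectors:
            expanded.update(w for w, _ in vectors.most_similar(term, topn=topn))
    return expanded

future_terms = expand_seeds(["promise"])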

Following this procedure, we create datasets for both Hindi and English containing 40K sentences each. Finally, we create our training sets of 15k sentences for Hindi and English separately,4 each consisting of equally distributed past, present and future sentences. For reasons similar to those given in Schwartz et al. (2015), we only considered the past, present and future categories. Some example sentences are:

• नासा ने कपना के नाम से एक सपरु कं यटरू समपत कया है (nAsA ne kalpanA ke nAma se eka supara kaMpyUTara samarpita kiyA hai-NASA has dedicated a super computer in the name of Kalpana), past.

• अब ये अपने दोःत को बलानु े लगा है (aba ye apane dostoM ko bulAne lagA hai-Now he is calling his friends), present.

2 https://code.google.com/p/word2vec/
3 Henceforth, all the Hindi examples are given as Hindi text, ITRANS representation and an equivalent English translation.
4 As our aim is to check whether temporality changes after translation or not, we do not use the translated version of the Hindi data to create the training set for English.

• मरे े फूल को ण-भर म न हो जाने का जोिखम है (mere phUla ko kShaNa-bhara meM naShTa ho jAne kA jokhima hai-My flower is at risk of being destroyed momentarily), future.

4.2 Test Set

At first, we manually annotate Hindi sentences from the same Hindi-English Bojar corpus with the appropriate temporal categories. We made sure that no training instances were included. Finally, we select 996 sentences of the past, present and future temporal classes. We call these 996 Hindi sentences Hi, and the manually tagged Hi is referred to as Hi(Manl. Tempo.). We then take the manually translated English sentences (En(Manl. Trans.)) corresponding to Hi and manually annotate them for temporality. We call these En(Manl. Trans., Manl. Tempo.). We also manually annotate the automatically translated English sentences (En(Auto. Trans.)) obtained from Hi for temporality. We call them En(Auto. Trans., Manl. Tempo.). Finally, we obtain three temporality-tagged test sets, namely Hi(Manl. Tempo.), En(Manl. Trans., Manl. Tempo.) and En(Auto. Trans., Manl. Tempo.). We use Hi as the test set for MT in Section 5.

5 Translation of Hindi to English

Our Hindi-English translation system, a phrase-based statistical MT system (Koehn et al., 2003), was built using the Hindi-English parallel Bojar corpus (Bojar et al., 2014). We first remove the set Hi described in Section 4.2 from the corpus, since it is used as the test set for our MT system. We then randomly select training and development sets from the rest of the corpus.

Set           #Sentences   #Tokens (En)   #Tokens (Hi)
Train         260,711      2,993,765      3,281,273
Test          996          23,806         27,012
Development   1,000        12,480         14,153

Table 2: Statistics of data sets used in Hindi-English MT system

We tokenize, true-case and remove overly long sentences as part of preprocessing. English sentences are tokenized with the tokenizer.perl script [5], and Hindi sentences with the Indic NLP Library [6]. After preprocessing, the training and development sets contain 260,711 and 1,000 parallel sentences, respectively. Details of the data sets are shown in Table 2. For training, we use the Moses (Koehn et al., 2007) SMT system. We use KenLM (Heafield, 2011) to build a 4-gram language model, and GIZA++ (Och and Ney, 2003) with the grow-diag-final-and heuristic for word alignment and phrase extraction from the parallel corpus. The trained system is tuned using Minimum Error Rate Training (Och, 2003). Default values are used for the other Moses parameters. Automatic evaluation of our translation system yields a BLEU (Papineni et al., 2002) score of 16.66.
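Returning to the tokenization step at the start of this section, a rough sketch is given below; the Moses checkout path and corpus file names are assumptions, while tokenizer.perl is the script cited in footnote [5] and indic_tokenize comes from the Indic NLP Library cited in footnote [6]:

    import subprocess
    from indicnlp.tokenize import indic_tokenize

    MOSES_TOKENIZER = "mosesdecoder/scripts/tokenizer/tokenizer.perl"  # assumed local checkout

    def tokenize_english(path_in, path_out):
        # The Moses tokenizer reads raw text on stdin and writes tokenized text to stdout.
        with open(path_in, encoding="utf-8") as fin, open(path_out, "w", encoding="utf-8") as fout:
            subprocess.run(["perl", MOSES_TOKENIZER, "-l", "en"], stdin=fin, stdout=fout, check=True)

    def tokenize_hindi(path_in, path_out):
        with open(path_in, encoding="utf-8") as fin, open(path_out, "w", encoding="utf-8") as fout:
            for line in fin:
                tokens = indic_tokenize.trivial_tokenize(line.strip(), lang="hi")
                fout.write(" ".join(tokens) + "\n")

    tokenize_english("train.en", "train.tok.en")   # illustrative file names
    tokenize_hindi("train.hi", "train.tok.hi")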

6 Temporal Tagging of Sentences

We detect temporality in one Hindi dataset (Hi) and two English datasets, En(Manl. Trans.) and En(Auto. Trans.), which denote manual and automatic translations from Hindi into English, respectively, as described in Section 4.

[5] https://github.com/moses-smt/mosesdecoder/blob/RELEASE-3.0/scripts/tokenizer/tokenizer.perl
[6] https://bitbucket.org/anoopk/indic_nlp_library

We deploy both manual and automatic methods for temporal tagging.

6.1 Manual Temporal Tagging of Sentences

We create the datasets following the manual annotation process described in Section 4.2. Three annotators were asked to annotate sentences with the past, present and future temporal categories, based on the time sense of each sentence. For the Hindi dataset (Hi), we considered only the temporal sentences, namely past, present and future. While annotating the two English datasets (En(Manl. Trans.), En(Auto. Trans.)), we considered an additional category, atemporal, apart from the three temporal categories. The reason for this was to verify our hypothesis as to whether temporality is lost after translation. Finally, we assign labels based on majority voting among the annotators (a small sketch of this step is given after the example below). We did not stick to tense-based tagging, as tense sometimes misled the annotators about the actual temporality of the sentence. For example, consider the following sentence:

• आगामी छु य के लए मरे े पास एक अछ योजना है (AgAmI ChuTTiyoM ke lie mere pAsa eka achChI yojanA hai-I have a nice plan for the upcoming holidays).

Here the tense of the verb "have" is present, while the time sense of the sentence refers to the future. Annotations also vary from person to person, as there is no concrete word-level definition of temporality; rather, it is determined by the context of the sentence. Finally, we obtain three manually annotated datasets, namely Hi(Manl. Tempo.), En(Manl. Trans., Manl. Tempo.) and En(Auto. Trans., Manl. Tempo.). The temporality statistics are shown in Table 3.
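A minimal sketch of the majority-voting step mentioned above, assuming three annotator labels per sentence; the tie-handling policy shown here is an illustrative choice, not necessarily the authors':

    from collections import Counter

    def majority_label(labels):
        """Return the label chosen by at least two of the three annotators, or None on a three-way tie."""
        (label, count), = Counter(labels).most_common(1)
        return label if count >= 2 else None

    print(majority_label(["past", "past", "present"]))    # past
    print(majority_label(["past", "present", "future"]))  # None: no majority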

Dataset                            Past (%)   Present (%)   Future (%)
Hi(Manl. Tempo.)                   32.83      24.80         42.37
En(Manl. Trans., Manl. Tempo.)     38.95      19.58         32.93
En(Auto. Trans., Manl. Tempo.)     41.15      11.75         34.74

Table 3: Class distribution of the manually annotated temporal datasets

From the statistics in Table 3, we can see that even after manual translation, loss of temporality is possible. The amount of loss in temporality in the dataset En(Manl. Trans., Manl. Tempo.) is 8.54%. In the automatically translated dataset En(Auto. Trans., Manl. Tempo.) the amount of loss in temporality is 13.35%, which is more than that of the manually translated set. Examples of these two cases are as follows:

1. Manual Translation: The temporality of the Hindi sentence "मानिसक रोग सबं धतं ल (mAnasika roga saMbaMdhita lakShya)" is future, but in the manually translated sentence "Mental illness targets", the annotators tag it as atemporal. We observe that in the manually translated set, the temporality loss is mainly due to incorrect temporal annotation rather than incorrect manual translation. One possible reason is that the annotators were instructed not to look at the temporal class of the Hindi sentence while labeling the English side. This was done to reduce bias.

2. MT: The Hindi sentence "मामीण चीन म आिथक नवीनीकरण हुये ह (grAmINa chIna meM Arthika navInIkaraNa huye haiM- Economic Renewal happened in Rural China)" has the temporality past. This sentence is automatically translated as "in rural areas are bound to China", which becomes a factual text with no temporal sense. From our observation, we can say that the loss of temporality in this case is mostly due to the wrong automatic translation rather than the wrong manual annotation.

6.2 Automatic Temporal Classifier

We use a supervised machine learning based approach for automatic sentence-level temporal tagging. For this experiment, we use the training set and test set described in Section 4. We automatically classify the three datasets, namely Hi, En(Manl. Trans.) and En(Auto. Trans.), into one of the three temporal categories: past, present or future. We employ a one-vs.-rest approach both for building our models and for evaluation, and our test set construction follows the same approach. For classification, we use a Support Vector Machine classifier (Joachims, 2002) with word unigrams as features (a sketch of this setup is given after Table 4). Classification yields three sets of temporal datasets, named Hi(Auto. Tempo.), En(Manl. Trans., Auto. Tempo.) and En(Auto. Trans., Auto. Tempo.). The class distribution of these temporal datasets is shown in Table 4.

Dataset                            Past (%)   Present (%)   Future (%)
Hi(Auto. Tempo.)                   32.96      30.53         36.51
En(Manl. Trans., Auto. Tempo.)     16.12      20.97         62.91
En(Auto. Trans., Auto. Tempo.)     19.56      13.15         67.28

Table 4: Class distribution of the automatically tagged temporal datasets
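A minimal sketch of the classifier setup described in Section 6.2, using scikit-learn's LinearSVC in a one-vs.-rest arrangement over word-unigram counts; the paper cites the SVM formulation of Joachims (2002), so the toolkit and the toy sentences here are illustrative assumptions:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy training data; the real training set has 15k sentences per language.
    train_sentences = [
        "NASA has dedicated a super computer in the name of Kalpana",
        "Now he is calling his friends",
        "My flower is at risk of being destroyed momentarily",
    ]
    train_labels = ["past", "present", "future"]

    # Word unigrams as features, one binary SVM per temporal class.
    classifier = make_pipeline(
        CountVectorizer(analyzer="word", ngram_range=(1, 1)),
        OneVsRestClassifier(LinearSVC()),
    )
    classifier.fit(train_sentences, train_labels)

    print(classifier.predict(["They will announce the results tomorrow"]))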

7 Temporality after Translation

We generate all the manually and automatically labeled datasets mentioned in the experimental setup in Section 3, using the methods and systems described in Sections 3, 5 and 6. The class distributions in Table 3 can be compared with those in Table 4. The agreement of temporality labels between different data pairs is shown in Table 5.

     Data Pair                                                           Match(%)
a.   Hi(Manl. Tempo.) - Hi(Auto. Tempo.)                                 72.39
b.   Hi(Manl. Tempo.) - En(Manl. Trans., Manl. Tempo.)                   67.47
c.   Hi(Manl. Tempo.) - En(Manl. Trans., Auto. Tempo.)                   66.42
d.   Hi(Manl. Tempo.) - En(Auto. Trans., Manl. Tempo.)                   59.33
e.   Hi(Manl. Tempo.) - En(Auto. Trans., Auto. Tempo.)                   62.49
f.   En(Manl. Trans., Manl. Tempo.) - En(Auto. Trans., Manl. Tempo.)     62.35
g.   En(Manl. Trans., Manl. Tempo.) - En(Manl. Trans., Auto. Tempo.)     69.59
h.   En(Auto. Trans., Manl. Tempo.) - En(Auto. Trans., Auto. Tempo.)     69.17

Table 5: Percentage of matching between pairs of temporality labeled datasets.
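The match percentages in Table 5 are label-agreement rates over the 996 test sentences; a minimal sketch, assuming the temporality labels of two datasets are stored as parallel lists (the toy lists are illustrative):

    def match_percentage(labels_a, labels_b):
        """Percentage of sentences whose temporality labels agree between two datasets."""
        assert len(labels_a) == len(labels_b)
        agreements = sum(a == b for a, b in zip(labels_a, labels_b))
        return 100.0 * agreements / len(labels_a)

    hi_manual = ["past", "future", "present", "past"]
    en_manual = ["past", "future", "past", "atemporal"]
    print(f"{match_percentage(hi_manual, en_manual):.2f}%")  # 50.00%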

Row a. in Table 5 shows that the match percentage between the manual temporality and the automatic temporality of the Hindi texts is 72.39%, which is the accuracy of the automatic temporality analysis system for Hindi. Row b. shows the percentage match between the two manually temporality-tagged datasets (Hi(Manl. Tempo.) and En(Manl. Trans., Manl. Tempo.)). We observe that the two labels match only 67.47% of the time, which shows that the English translation does affect temporality. Row c. shows the temporality match between the automatic temporality on the manually translated texts and Hi(Manl. Tempo.). Observe that the match for this pair is 66.42%, which is not much lower than the 67.47% obtained with manual temporal tagging.

This shows that the English temporal analysis system performs rather well. More importantly, the English automatic temporality analysis on the automatically translated texts shows a match of 62.49% (row e.), which makes this a feasible choice for the temporality analysis of non-English texts. Rows d. and e. show the temporality match of the Hindi manual temporality with the manual and automatic temporal labeling of the automatically translated texts, respectively. As the translation is automatic here, we expect these match percentages to be lower than those in rows b. and c., where the translation is manual, and the results confirm this. However, we unexpectedly find the number for row e. to be higher than that of row d. This suggests that some characteristics of the automatically translated text mislead humans with regard to the true temporality of the source text. However, this claim requires further investigation in future work.

Row f. shows the match between the manual temporal labels of the manually and automatically translated English texts, which is only 62.35%. Row g. shows the accuracy of the English automatic temporal analysis system when the translation is manual. The result of 69.59% shows that the quality of the English temporal analysis system is good, irrespective of human errors. Row h. shows the accuracy of the English automatic temporal analysis system when the translation is automatic. In this case, the system's accuracy of 69.17% again shows that MT greatly impacts temporality.

We manually examined several machine-translated texts to understand the reason for incorrect annotations by humans with respect to the Hindi annotation (row d. of Table 5). Most cases were due to translation errors where the temporal words were either lost or replaced by other temporal words. Table 6 shows some examples of possible error cases. We observe that the linking verb often changes to a linking verb of a different temporality. In some cases, the temporality changes due to a change in the structure of the sentence. Temporality loss happens mainly because of the loss of action words, and it occurs for all types of temporal sentences (past, present and future).

Change of linking verb after translation:
  Hindi text: तब लोग को और अधक बदला लेने क सभावनां थी (taba logoM ko aura adhika badalA lene kI saMbhAvanA thI - Then people were more likely to take revenge.) [past]
  MT output: when people are more likely to take revenge. [future]

Change of linking verb after translation:
  Hindi text: म अपनी नयित का पीछा कर रहा हूं (maiM apanI niyati kA pIChA kara rahA hUM - I'm following my destiny.) [present]
  MT output: I was in pursuit of his destiny. [past]

Structural change after translation:
  Hindi text: हम उनक ूगित दखाईये (hameM unakI pragati dikhAIye - Show us their progress.) [future]
  MT output: their progress shows us. [present]

Loss of action word:
  Hindi text: लेकन सचाई शायद कु छ और नकले (lekina sachAI shAyada kuCha aura nikale - But maybe a different truth can come out) [future]
  MT output: but the truth and perhaps some. [atemporal]

Table 6: Examples of temporality change or loss due to different types of MT errors.

We analyze two cases to understand whether automatic temporality detection can be more effective than manual temporality labeling on translated instances. Our first case is based on the results in rows b. and c. of Table 5, where the translation is manual. There are some instances where the automatic temporality tagger on the manually translated text tags texts correctly while the manual temporality labeling fails. The reason is that the system can learn an appropriate model even from mistranslated text. For example, consider the following case:

• Hindi text: “क अगर कसी ने माना क यह एक खतरा नहं है (ki agara kisI ne mAnA ki yaha eka khatarA nahIM hai)”.
• Correct English translation: “That if somebody believes that this is not a threat”.
• Manual English translation: “That if anybody believes that it wasn’t such a threat”.

In this example, the temporal tag of the Hindi sentence is future, but when it is manually translated, the tag becomes past. In this case, the automatic English temporal tagger correctly predicts it as future. We observe that 6.7% of the instances in the manually translated English texts are manually tagged incorrectly with respect to the Hindi text's temporality but are correctly tagged by the automatic English temporal tagger. Our second case is based on the results in rows d. and e. of Table 5, where the translation is automatic. The automatic temporal analysis system correctly tags several automatically translated instances (where manual labeling fails) for the same reason as in the first case. Consider the following example:

• Hindi text: “तीसर योजना म लगभग सभी अितरईत मता सावजनक ेऽ को देते हुए इःपात पडं का ल 102 लाख टन पर नित कया गया (tIsarI yojanA meM lagabhaga sabhI atirika￿ta kShamatA sArvajanika kShetra ko dete hue ispAta piMDoM kA lakShya 102 lAkha Tana para nishchita kiyA gayA)”.
• Correct English translation: “Giving almost all the additional capacities to the public sector in the third plan, the goal of steel bodies was fixed at 102 lakh tonnes”.
• MT output: “in the Third Plan the public sector almost all the additional capacity to steel ingots target of 102 million tonnes”.

In this case, the original temporal class in Hindi is past. In the machine-translated English text, the human experts annotate it as future, but the automatic temporal tagger correctly tags it as past. This case is quite interesting: despite the ungrammatical and unstructured sentences produced by machine translation, the automatic temporal tagger still correctly predicts temporality for some sentences. Our analysis shows that 8.14% of the temporal instances in the automatically translated English texts are manually tagged incorrectly with respect to the Hindi texts but are correctly tagged by the automatic English temporal tagger.

8 Conclusion

In this paper, we present a case study on how machine translation affects temporality when text is translated from Hindi to English. To the best of our knowledge, this is the first study that systematically analyses various aspects of temporality preservation after translation. We create benchmark setups with manually labeled datasets for various test case scenarios. Our investigation shows that temporality can be both lost and altered when text is translated from a source language into a target language.

We also observe that the automatic temporal tagger on automatically translated texts achieves accuracy competitive with that on manually translated texts. In future work, we will explore these cases further and determine whether preserving temporality can improve translation quality. We also plan to extend our study to more language pairs and to use neural MT systems for translation.

Acknowledgments This research is partially supported by ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co- funded under the European Regional Development Fund.

References Adams, J. and Nettle, D. (2009). Time Perspective, Personality and Smoking, Body Mass, and Physical Activity: An Empirical Study. British journal of health psychology, 14(1):83–105.

Anand, A., Bedathur, S. J., Berberich, K., and Schenkel, R. (2012). Index Maintenance for Time-travel Text Search. In The 35th International ACM SIGIR conference on research and development in Information Retrieval, pages 235–244, Portland, OR, USA.

Ananthakrishnan, R., Kavitha, M., Jayprasad, J. H., Shekhar, R., and Bade, S. (2006). MaTra: A Practical Approach to Fully-automatic Indicative English-Hindi Machine Translation. In Symposium on Modeling and Shallow Parsing of Indian Languages, pages 1–8, Mumbai, India.

Bojar, O., Diatka, V., Rychlý, P., Stranák, P., Suchomel, V., Tamchyna, A., and Zeman, D. (2014). HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, pages 3550–3555, Reykjavik, Iceland.

Campos, R., Dias, G., Jorge, A. M., and Jatowt, A. (2014). Survey of Temporal Information Retrieval and Related Applications. ACM Comput. Surv., 47(2):15:1–15:41.

Dave, S., Parikh, J., and Bhattacharyya, P. (2001). Interlingua-based English–Hindi Machine Translation and Language Divergence. Machine Translation, 16(4):251–304.

Dias, G. H., Hasanuzzaman, M., Ferrari, S., and Mathet, Y. (2014). TempoWordNet for Sentence Time Tagging. In Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pages 833–838, Geneva, Switzerland.

Dungarwal, P., Chatterjee, R., Mishra, A., Kunchukuttan, A., Shah, R., and Bhattacharyya, P. (2014). The IIT Bombay Hindi-English Translation System at WMT 2014. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 90–96, Baltimore, Maryland, USA.

Heafield, K. (2011). KenLM: Faster and Smaller Language Model Queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197, Edinburgh, Scotland.

Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publisher, Dordrecht, Netherlands.

Joho, H., Jatowt, A., Blanco, R., Naka, H., and Yamamoto, S. (2014). Overview of NTCIR-11 Temporal Information Access (Temporalia) Task. In Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, pages 429–437, Tokyo, Japan.

Kanhabua, N., Blanco, R., and Matthews, M. (2011). Ranking Related News Predictions. In Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 755–764, Beijing, China.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180, Prague, Czech Republic.

Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical Phrase-based Translation. In Pro- ceedings of the 2003 Conference of the North American Chapter of the Association for Com- putational Linguistics on Human Language Technology-Volume 1, pages 48–54, Edmonton, Canada.

Kulkarni, A., Teevan, J., Svore, K. M., and Dumais, S. T. (2011). Understanding Temporal Query Dynamics. In Proceedings of the Fourth International Conference on Web Search and Web Data Mining, pages 167–176, Hong Kong, China.

Lohar, P., Afli, H., and Way, A. (2017). Maintaining Sentiment Polarity in Translation of User-Generated Content. The Prague Bulletin of Mathematical Linguistics, 108(1):73–84.

Mani, I., Pustejovsky, J., and Gaizauskas, R. (2005). The Language of Time: a Reader, volume 126. Oxford University Press, Oxford, UK.

Metzger, M. (2007). Making Sense of Credibility on the Web: Models for Evaluating Online Information and Recommendations for Future Research. Journal of the American Society for Information Science and Technology, 58(13):2078–2091.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In Advances in neural information processing systems, pages 3111–3119, Lake Tahoe, Nevada, United States.

Miller, G. A. (1995). WordNet: A Lexical Database for English. Commun. ACM, 38(11):39–41.

Och, F. J. (2003). Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pages 160–167, Sapporo, Japan.

Och, F. J. and Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational linguistics, 29(1):19–51.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318, Philadelphia, Pennsylvania.

Pawar, D., Hasanuzzaman, M., and Ekbal, A. (2016). Building Tempo-HindiWordNet: A Resource for Effective Temporal Information Access in Hindi. In Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, pages 3752–3759, Portorož, Slovenia.

Ramanathan, A., Hegde, J., Shah, R. M., Bhattacharyya, P., and Sasikumar, M. (2008). Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation. In Third International Joint Conference on Natural Language Processing, pages 513–520, Hyderabad, India.

Sachdeva, K., Srivastava, R., Jain, S., and Sharma, D. M. (2014). Hindi to English Machine Translation: Using Effective Selection in Multi-Model SMT. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, pages 1807–1811, Reykjavik, Iceland.

Salameh, M., Mohammad, S., and Kiritchenko, S. (2015). Sentiment after Translation: A Case-Study on Arabic Social Media Posts. In Proceedings of The 2015 Conference of the North American Chapter of the Association for Computational Linguistics-Human Language Technologies (NAACL), pages 767–777, Denver, Colorado, USA.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E., et al. (2013). Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PloS one, 8(9):1–16.

Schwartz, H. A., Park, G., Sap, M., Weingarten, E., Eichstaedt, J., Kern, M., Berger, J., Seligman, M., and Ungar, L. (2015). Extracting Human Temporal Orientation in Facebook Language. In Proceedings of The 2015 Conference of the North American Chapter of the Association for Computational Linguistics-Human Language Technologies (NAACL), pages 409–419, Denver, Colorado, USA.

Sen, S., Banik, D., Ekbal, A., and Bhattacharyya, P. (2016). IITP English-Hindi Machine Translation System at WAT 2016. In Proceedings of the 3rd Workshop on Asian Translation (WAT), pages 216–222, Osaka, Japan.

Sinha, R. and Jain, A. (2003). AnglaHindi: an English to Hindi Machine-aided Translation System. In MT Summit IX, pages 494–497, New Orleans, USA.

Sinha, R. and Thakur, A. (2005). Machine Translation of Bi-lingual Hindi-English (Hinglish) Text. In 10th Machine Translation summit (MT Summit X), pages 149–156, Phuket, Thailand.

UzZaman, N., Llorens, H., Derczynski, L., Allen, J. F., Verhagen, M., and Pustejovsky, J. (2013). SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations. In Proceedings of the 7th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2013, pages 1–9, Atlanta, Georgia, USA.

Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Moszkowicz, J., and Pustejovsky, J. (2009). The TempEval Challenge: Identifying Temporal Relations in Text. Language Resources and Evaluation, 43(2):161–179.

Verhagen, M., Saurí, R., Caselli, T., and Pustejovsky, J. (2010). SemEval-2010 Task 13: TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval@ACL 2010, pages 57–62, Uppsala, Sweden.

Webley, P. and Nyhus, E. K. (2006). Parents’ Influence on Children’s Future Orientation and Saving. Journal of Economic Psychology, 27(1):140–164.

Zimbardo, P. G. and Boyd, J. N. (2015). Putting Time in Perspective: A Valid, Reliable Individual-Differences Metric. In Time Perspective Theory; Review, Research and Application, pages 17–55. Springer, Berlin, Germany.

A Neural Network Transliteration Model in Low Resource Settings

Tan Le    [email protected]    Université du Québec à Montréal, Canada
Fatiha Sadat    [email protected]    Université du Québec à Montréal, Canada

Abstract

Transliteration is the process of converting a text in one script to another, guided by phonetic clues. This conversion requires a substantial set of rules defined by expert linguists to determine how the phonemes are aligned, taking into account the phonology of the target language. For under-resourced language pairs, the problem remains the lack of linguistic resources. In this research, we present a recurrent neural network based approach to the transliteration problem for a low-resource language pair, with an application to the French-Vietnamese language pair. Our system requires only a small bilingual learning dataset. We obtained promising results, with a large gain in BLEU score and reductions in translation error rate (TER) and phoneme error rate (PER) compared to other systems.

1 Introduction

Transliteration is the process of transforming a word in one writing system (the source word) into a phonetically equivalent word in another writing system (the target word) (Knight and Graehl, 1998). Many named entities (e.g., person names, locations, organizations, technical terms) are transliterated from a source language to a target language when translation is difficult or impossible. Transliteration can be considered a sub-task of machine translation (MT). Named entities constitute an open morphological class: person and organization names that were never seen during the learning phase often appear in new documents, and it is critical that MT systems address this issue. Integrating a transliteration module within an MT system is one solution for handling out-of-vocabulary (OOV) words that are named entities. Moreover, with the evolution of new technologies and the globalization of commerce, people keep inventing new words, and it is very difficult to define all the possible rules of phonetic transformation between the source and the target language. In this research, we propose a method for low-resource machine transliteration using a recurrent neural network (RNN) based model. The task is to automatically predict the phonemic representation of a word in the target language given a new source-language word that does not exist in the bilingual phonetic dictionary. We are interested in handling out-of-vocabulary words, such as proper names or technical terms, from a machine translation system for an under-resourced language pair, with an application to French-Vietnamese. Our contribution is to show how, with a small bilingual learning dataset, we can train an RNN-based model for low-resource machine transliteration.

To address the problem of sparse data in this low-resource setting, we apply an algorithm to re-rank the list of k-best results from the baseline transliteration model. The structure of the article is as follows. Section 2 presents the state of the art in transliteration. Section 3 describes our proposed approach. Section 4 presents our experiments, compares the performance of our system with other systems, and analyzes errors. Finally, Section 5 concludes with some perspectives.

2 Related work

Since 2009, various transliteration systems have been proposed in the Named Entities Workshop (NEWS) evaluation campaigns [1] (Duan et al., 2016). These campaigns consist of transliterating from English into languages with a wide variety of writing systems, including Hindi, Tamil, Russian, Chinese, Korean, Thai and Japanese. The romanization of non-Latin writing systems remains a complex computational task that is highly language-dependent. Through this workshop, much progress has been made, with the emergence of different approaches, such as grapheme-to-phoneme methods (Finch and Sumita, 2010; Ngo et al., 2015), statistical methods inspired by machine translation (Laurent et al., 2009; Nicolai et al., 2015), as well as neural networks (Finch et al., 2015, 2016; Shao and Nivre, 2016; Thu et al., 2016). The variety of writing systems adds another important challenge to the extraction of named entities and automatic transliteration. All these difficulties are aggravated by the lack of bilingual dictionaries of proper names, transcription ambiguities, and orthographic variation within a language. Lo et al. (2016) used a semi-supervised transliteration model built on a seed corpus mined from the standard parallel training data in order to improve a Russian-English machine translation system for WMT 2016. Ngo et al. (2015) proposed a statistical model for the English-Vietnamese language pair with a phonological constraint on syllables; their system achieved better performance than the rule-based baseline, with a 70% reduction in error rates. Cao et al. (2010) also applied a statistical machine translation approach to the transliteration task for the low-resource English-Vietnamese language pair, with a BLEU score of 63% (Papineni et al., 2002). Our proposed approach is entirely different, apart from the same preparation of the bilingual phonetic learning dictionary: we propose a step that rescores the k-best results of the baseline transliteration system to mitigate the data sparsity caused by the low-resource setting.

3 Proposed Approach

3.1 Phonology of Vietnamese

The syllable structure of French is very rich, with a variety of patterns such as CV, CVC, CCVCC, etc., where C is a consonant and V is a vowel. The structure of syllables in Vietnamese, on the other hand, is very simple. One of the linguistic peculiarities of Vietnamese is that a word consists of one or several syllables (Phe, 1997). A Vietnamese syllable has the following structure:

Syllable = Onset + Vowel + Coda

The boundary of a syllable depends on consonant groups (onset and coda) and vowels. Vietnamese uses a Latin-based alphabet with 29 letters. There are 12 vowels, 17 single-character consonants, 9 two-character consonants and 1 three-character consonant. The vowels are V = {“a”, “ă”, “â”, “e”, “ê”, “i”, “o”, “ô”, “ơ”, “u”, “ư”, “y”}.

[1] http://workshop.colips.org/news2016/

The onset consonants are Onset = {“b”, “ch”, “c”, “d”, “đ”, “gi”, “gh”, “g”, “h”, “kh”, “k”, “l”, “m”, “ngh”, “ng”, “nh”, “n”, “ph”, “q”, “r”, “s”, “th”, “tr”, “t”, “v”, “x”, “p”}. Among these consonants, 8 can appear in the coda: Coda = {“c”, “ch”, “n”, “nh”, “ng”, “m”, “p”, “t”}. Vietnamese has six lexical tones, for example up (có = own), broken (mỹ = American), flat (ba = father), interrogative (thuỷ = water), down (trà = tea) and low (lại = coming). Phe (1997) counted about 10,000 syllables in Vietnamese. In this research work, we focus only on the graphemes and phonemes of the words in the bilingual dictionary.

3.2 Multi-joint sequence model

The grapheme-to-phoneme approach with a multi-joint sequence model was proposed by Deligne et al. (1995). It is one of the most popular approaches to converting graphemes into phonemes by machine learning. The main idea is to generate the grapheme-level and phoneme-level sequences jointly, by means of a single joined sequence of linguistic units that represents all the grapheme and phoneme symbols. The aim of this approach is to find a sequence of phonemes Y = p_1^m = {p_1, p_2, ..., p_m}, given a sequence of graphemes X = g_1^n = {g_1, g_2, ..., g_n}. The problem then becomes the estimation of the optimal phoneme sequence Y that maximizes the conditional probability, as in Equation 1:

Ŷ = argmax_Y p(Y|X)    (1)

According to Bayes' theorem:

Ŷ = argmax_Y p(X|Y) p(Y) / p(X)    (2)

Because p(X) is independent of the phoneme sequence Y, Equation 2 can be simplified as follows:

Ŷ = argmax_Y p(X|Y) p(Y)    (3)

3.3 Recurrent neural network based sequence model for small data

Figure 1 shows the architecture of the RNN model adapted from (Yao and Zweig, 2015). An RNN takes as input a sequence of vectors (x_1, x_2, ..., x_n) and produces as output a sequence of vectors (h_1, h_2, ..., h_n) representing the information at each input step. LSTMs (Long Short-Term Memory networks), which are a type of RNN, incorporate a memory cell whose state can be protected and controlled. They use several gates to control how much information from the previous state should be forgotten and how much information from the input should be written to the memory cell (Hochreiter and Schmidhuber, 1997). The formulas that govern the computation in such a cell are as follows:

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)    (4)

c_t = (1 − i_t) ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t-1} + b_c)    (5)

o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)    (6)

h_t = o_t ⊙ tanh(c_t)    (7)
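A minimal numpy sketch of one step of the gated recurrence in Equations (4)-(7); the coupling of the forget gate as (1 − i_t) follows the formulation above, while the dimensions, random weights and toy input are illustrative assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    d_in, d_hid = 8, 16  # illustrative sizes
    rng = np.random.default_rng(0)
    shapes = {"xi": (d_hid, d_in), "hi": (d_hid, d_hid), "ci": (d_hid, d_hid),
              "xc": (d_hid, d_in), "hc": (d_hid, d_hid),
              "xo": (d_hid, d_in), "ho": (d_hid, d_hid), "co": (d_hid, d_hid)}
    W = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
    b = {name: np.zeros(d_hid) for name in ("i", "c", "o")}

    def lstm_step(x_t, h_prev, c_prev):
        i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] @ c_prev + b["i"])          # Eq. (4)
        c_t = (1 - i_t) * c_prev + i_t * np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # Eq. (5)
        o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] @ c_t + b["o"])             # Eq. (6)
        h_t = o_t * np.tanh(c_t)                                                             # Eq. (7)
        return h_t, c_t

    h, c = np.zeros(d_hid), np.zeros(d_hid)
    for x in rng.normal(size=(5, d_in)):  # a toy input sequence of length 5
        h, c = lstm_step(x, h, c)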

Figure 1: Our recurrent neural network based model architecture

where σ is the element-wise sigmoid function and ⊙ is the element-wise product. c_t and o_t are the cell state and the output at step t, respectively. There are many variants of LSTM implementations. LSTM sequence-to-sequence models have been applied successfully to various tasks, including machine translation (Sutskever et al., 2014) and grapheme-to-phoneme conversion (Yao and Zweig, 2015). Our approach consists of three main steps: (1) pre-processing, (2) creating an RNN-based model, and (3) re-ranking the k-best list. The whole process is illustrated in Figure 1.

(1) First, we collect bilingual phonetic linguistic resources for a low-resource language pair, here French-Vietnamese. This learning data is then pre-processed with lowercasing as well as syllable segmentation for Vietnamese, as explained in Section 3.1 on the phonology of Vietnamese.
(2) Next, we train an RNN-based model.
(3) Finally, we implement an additional module to re-rank the list of k-best results produced by the transliteration model of bilingual proper names. Inspired by Bhargava et al. (2011), we use a Support Vector Machine (SVM) for this module, with features such as phonemic alignment scores, orthographic and phonetic similarities, and the length difference between each grapheme-phoneme pair.

4 Experimentation

4.1 Data preparation

We use a bilingual phonetic dictionary collected from news websites, as presented in (Cao et al., 2010). The learning data contains 4,259 pairs of bilingual French-Vietnamese proper names, with a vocabulary of 31 graphemes on the French source side and 71 phonemes on the Vietnamese target side. Most of the bilingual proper names are person names, location names and organization names. To overcome the problem of sparse learning data, we pre-process and normalize the entire dataset (Figure 2).


Figure 2: Illustration of the bilingual phonetic dictionary and the pre-processing results

Inspired by the phrase-based statistical MT approach (pbSMT), we build a baseline system that applies this approach at the character level. We implement the pbSMT system with Moses [2] (Koehn et al., 2007). We use mGIZA (Gao and Vogel, 2008) to align the corpus at the character level, and SRILM (Stolcke et al., 2002) to create a 5-gram language model for Vietnamese. While the pbSMT systems implemented by Finch and Sumita (2010) and Nicolai et al. (2015) did not take word reordering into account, we test the various reordering models offered by Moses. We apply the Sequitur-G2P [3] tool to train our transliteration model of bilingual proper names for the French-Vietnamese language pair. For the RNN-based model, we use 2-layer bi-directional Long Short-Term Memory (LSTM) cells (Hochreiter and Schmidhuber, 1997), with a 64-dimensional projection layer to encode the input sequences and 64 nodes in each hidden layer. We use the SGD (Stochastic Gradient Descent) optimizer to learn the network weights, with a learning rate of 0.5. We use the g2p-seq2seq [4] toolkit; this implementation is based on TensorFlow (Python), which allows efficient training on both CPU and GPU. We implement an SVM re-ranking module using the LinearSVC class of scikit-learn [5] to rescore the hypotheses in the list of k-best (k = 100) results obtained from the baseline transliteration system.
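A minimal sketch of the SVM re-ranking step, assuming each k-best candidate has already been turned into a small feature vector (baseline model score, orthographic similarity, phonetic similarity, length difference) and that candidates matching the reference transliteration form the positive class; the feature values and training labels below are illustrative, not the authors' data:

    import numpy as np
    from sklearn.svm import LinearSVC

    # One row per k-best candidate:
    # [baseline score, orthographic similarity, phonetic similarity, length difference]
    X_train = np.array([[0.63, 0.9, 0.8, 1],
                        [0.15, 0.7, 0.6, 2],
                        [0.06, 0.5, 0.5, 3],
                        [0.29, 0.8, 0.9, 0]])
    y_train = np.array([1, 0, 0, 1])  # 1 = candidate equals the reference transliteration

    reranker = LinearSVC()
    reranker.fit(X_train, y_train)

    # At test time, score the k-best list of one source name and keep the best-scored candidate.
    kbest_features = np.array([[0.32, 0.8, 0.7, 1],
                               [0.29, 0.9, 0.9, 0],
                               [0.08, 0.6, 0.5, 2]])
    best_index = int(np.argmax(reranker.decision_function(kbest_features)))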

4.2 Evaluation

The bilingual phonetic learning dictionary is split into a training set, a development set and a test set with a ratio of 80%, 10% and 10%, respectively. We apply several evaluation metrics: BLEU (BiLingual Evaluation Understudy; Papineni et al., 2002) and Translation Error Rate (TER; Snover et al., 2009), computed with multeval version 0.5.1 [6] (Clark et al., 2011). For the phoneme level, we use the Phoneme Error Rate (PER) metric, computed with SCLITE (a tool for scoring and evaluating the output of speech recognition systems) from the NIST SCTK toolkit, version 2.4.10 [7].

[2] http://www.statmt.org/moses/
[3] https://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html
[4] https://github.com/cmusphinx/g2p-seq2seq
[5] http://scikit-learn.org/stable/modules/Generated/sklearn.svm.LinearSVC.html
[6] http://www.cs.cmu.edu/~jhclark/downloads/multeval-0.5.1.tgz

The method of calculating the phoneme error rate with SCLITE is similar to that used for words (Word Error Rate). We use the Levenshtein distance measure in this work. The measure is shown in Equation 8, where N is the total number of phonemes (a small sketch of this computation is given after Table 1):

PER = (1/N) Σ_{i=1}^{n} d_edit(hypothesis_i, reference_i)    (8)

In order to evaluate our proposed approach, we implement three systems: the baseline system (pbSMT), system 1 (Sequitur-G2P) and system 2 (our proposed approach) (Table 1). If we compare the baseline system with system 1, the difference in their performance is minor. System 1 is slightly more effective than the baseline system, with a gain of +4.40% BLEU, as well as reductions of -4.30% and -6.20% in translation error rate (TER) and phoneme error rate (PER), respectively. On the other hand, comparing the baseline system with system 2 (our proposed approach), we note significant gains: +26.95% BLEU, and reductions of -9.30% and -31.30% in TER and PER, respectively. In addition, the performance of system 2 is higher than that of system 1, with a gain of +22.55% BLEU and reductions of -5.00% and -25.10% in TER and PER, respectively (Table 1).

Overall, the proposed approach performs the transliteration task very well, with significant gains and a reduced phoneme error rate. We observe that the output of the proposed approach, which is based on a recurrent neural network, is more fluent and coherent and contains fewer errors than the baseline and system 1, which are both based on the statistical approach. We carry out an error analysis in the next section.

Metric    System                      Average   s_sel   s_Test   p-value
BLEU ↑    Baseline (pbSMT)            61.30     1.70    -        -
          System 1 (Sequitur-g2p)     65.70     1.70    -        0.79
          System 2 (our approach)     88.25     1.50    -        0.01
TER ↓     Baseline (pbSMT)            24.80     1.20    -        -
          System 1 (Sequitur-g2p)     20.50     1.20    -        0.13
          System 2 (our approach)     15.50     1.00    -        0.00
PER ↓     Baseline (pbSMT)            44.20     -       -        -
          System 1 (Sequitur-g2p)     38.00     -       -        -
          System 2 (our approach)     12.90     -       -        -

Table 1: Evaluation scores for all systems: BLEU, TER and PER. p-values are relative to the baseline system and indicate whether a difference of this magnitude between the baseline system and the other systems is significant. s_sel indicates the variance due to test set selection.
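A minimal sketch of the PER computation in Equation (8), using a Levenshtein edit distance over phoneme tokens; the example sequences are taken from Table 4, and normalizing by the total number of reference phonemes is an assumption consistent with how word error rates are usually computed:

    def edit_distance(hyp, ref):
        """Levenshtein distance between two phoneme sequences."""
        prev_row = list(range(len(ref) + 1))
        for i, h in enumerate(hyp, 1):
            row = [i]
            for j, r in enumerate(ref, 1):
                row.append(min(prev_row[j] + 1,              # deletion
                               row[j - 1] + 1,               # insertion
                               prev_row[j - 1] + (h != r)))  # substitution (free if equal)
            prev_row = row
        return prev_row[-1]

    def per(hypotheses, references):
        """Phoneme Error Rate: total edit distance divided by the total number of reference phonemes."""
        total_edits = sum(edit_distance(h, r) for h, r in zip(hypotheses, references))
        total_phonemes = sum(len(r) for r in references)
        return total_edits / total_phonemes

    hyp = ["p a r í t".split(), "t u ố c t x ơ".split()]
    ref = ["p a r i".split(), "t u a".split()]
    print(f"PER = {per(hyp, ref):.2%}")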

4.3 Error analysis

We perform an error analysis of the three evaluated systems to better understand the likely errors in the phonemes predicted from French to Vietnamese. First, we examine the top-5 best results from our transliteration model before re-ranking the list of k-best results (Tables 2 and 3). We find that the first Vietnamese transliteration result, the one with the highest probability given a French grapheme sequence, is not always the correct transliteration.

[7] http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm

PARIS                                  MANHATTAN
No   TOP-5            Probability      No   TOP-5                Probability
1    p a r i          0.633242         1    m a n h á t t â n    0.321082
2    p a r í t        0.153536         2    m a n h á t t a n    0.288677
3    b a r i          0.065151         3    m â n h á t t â n    0.080221
4    b a r í t        0.037314         4    m â n h á t t a n    0.072125
5    p a r í t x ơ    0.028526         5    m a h á t t â n      0.058193

Table 2: Illustration of the transliteration predictions for named entities obtained by our proposed approach before re-ranking the list of k-best results, showing the top-5 (k = 5) best results for the named entities PARIS and MANHATTAN.

GAULOISE
No   TOP-5              Probability
1    g ô l o a x ơ      0.102710
2    g ô n l o a x ơ    0.096937
3    g ô l o a d ờ      0.092091
4    g ô n l o i d ờ    0.086915
5    g ô l o a ờ        0.072750

Table 3: Illustration of the transliteration predictions for named entities obtained by our proposed approach before re-ranking the list of k-best results, showing the top-5 (k = 5) best results for the named entity GAULOISE.

Therefore, it is essential to re-classify the best hypotheses among the list of k-best (k = 100) results of the transliteration system. For example: PARIS -> p a r i (p = 0.633242), MANHATTAN -> m a n h á t t a n (p = 0.288677) and GAULOISE -> g ô l o a d ờ (p = 0.092091).

System      Proper name   IPA format      Hypothesis           Reference
Baseline    paris         p a r i         p a r í t            p a r i
            tigrane       t i g r a n e   t i g ờ r a n nơ     t i g ờ r a n nơ
            toulouse      t u l u s e     t u l u x ơ          t u l u gi ơ
            tours         t u r           t u ố c t x ơ        t u a
            truffaut      t r y f o       tr uy ph ố t         tr uy ph ô
            zurich        z y r i k       gi uy r i            gi uy r í ch
System 1    paris         p a r i         p a r í t            p a r i
            tigrane       t i g r a n e   t i g ờ r a n nơ     t i g ờ r a n nơ
            toulouse      t u l u s e     t u l u x ơ          t u l u gi ơ
            tours         t u r           t u ố c t x ơ        t u a
            truffaut      t r y f o       tr uy ph ố t         tr uy ph ô
            zurich        z y r i k       gi uy r i            gi uy r í ch
System 2    paris         p a r i         p a r i              p a r i
            tigrane       t i g r a n e   t i g ờ r a n nơ     t i g ờ r a n nơ
            toulouse      t u l u s e     t u l u x ơ          t u l u gi ơ
            tours         t u r           t u a                t u a
            truffaut      t r y f o       tr uy ph ô           tr uy ph ô
            zurich        z y r i k       gi uy r í ch         gi uy r í ch

Table 4: Examples of transliteration prediction results by all systems, with IPA (International Phonetic Alphabet) format as ground truth, hypothesis and reference for six proper names such as PARIS, TIGRANE, TOULOUSE, TOURS, TRUFFAUT and ZURICH

We then compare the transliteration predictions of the three evaluated systems on some named entities that were not seen during the learning phase (Table 4). We find that the baseline system (pbSMT) and system 1 (Sequitur-G2P) incorrectly transliterate proper names such as PARIS, TOURS, TRUFFAUT and ZURICH, while system 2 (our proposed approach) provides good results. We note that all three systems have difficulty predicting all the possible transliterations of bilingual proper names, due to the varied origins of the named entities (French, English, Italian, Russian, etc.) as well as the pronunciation of different tail syllables such as "-er" (/e/ = ê or // = e), "-s" (xơ or φ), "-te" (tơ or φ) or "-x" (ích or φ).

5 Conclusion

In this paper, we presented a recurrent neural network based approach to the transliteration problem for a low-resource language pair, with an application to the French-Vietnamese language pair. Results show that the RNN-based model outperforms both the phrasal MT and the Sequitur-G2P baselines, yielding significant improvements in error rates over these systems. We are not aware of any previous research or study that analyzes Vietnamese in this transliteration task, and our work is, to our knowledge, the first on machine transliteration for the low-resource French-Vietnamese language pair. The system requires only a bilingual phonetic dictionary and has the capacity to learn linguistic regularities automatically from this bilingual phonetic dataset. In future work, we intend to extend our approach with a larger bilingual phonetic dataset and to study other techniques, such as attention mechanisms, in order to improve the performance of neural network models when only small amounts of training data are available.

References

Bhargava, A., Hauer, B., and Kondrak, G. (2011). Leveraging from multiple languages. In Proceedings of the 3rd Named Entities Workshop (NEWS 2011), pages 36–40.

Cao, N. X., Pham, N. M., and Vu, Q. H. (2010). Comparative analysis of transliteration techniques based on statistical machine translation and joint-sequence model. In Proceedings of the 2010 Symposium on Information and Communication Technology, pages 59–63. Association for Computing Machinery.

Clark, J. H., Dyer, C., Lavie, A., and Smith, N. A. (2011). Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 176–181. Association for Computational Linguistics.

Deligne, S., Yvon, F., and Bimbot, F. (1995). Variable-length sequence matching for phonetic transcription using joint multigrams. In Fourth European Conference on Speech Communication and Technology.

Duan, X., Banchs, R. E., Zhang, M., Li, H., and Kumaran, A. (2016). Report of news 2016 machine transliteration shared task. Association for Computational Linguistics 2016, pages 58–72.

Finch, A., Liu, L., Wang, X., and Sumita, E. (2015). Neural network transduction models in transliteration generation. In Proceedings of NEWS 2015 The Fifth Named Entities Workshop, page 61.

Finch, A., Liu, L., Wang, X., and Sumita, E. (2016). Target-bidirectional neural models for machine transliteration. Association for Computational Linguistics 2016, pages 78–82.

Finch, A. and Sumita, E. (2010). Transliteration using a phrase-based statistical machine translation system to re-score the output of a joint multigram model. In Proceedings of the 2010 Named Entities Workshop, pages 48–52. Association for Computational Linguistics.

Gao, Q. and Vogel, S. (2008). Parallel implementations of word alignment tool. In Software Engineer- ing, Testing, and Quality Assurance for Natural Language Processing, pages 49–57. Association for Computational Linguistics.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Comput., 9(8):1735–1780.

Knight, K. and Graehl, J. (1998). Machine transliteration. Computational Linguistics, 24(4):599–612.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180. Association for Computational Linguistics.

Laurent, A., Deléglise, P., Meignier, S., and Spécinov-Trélazé, F. (2009). Grapheme to phoneme conversion using an SMT system. In Proceedings of INTERSPEECH, ISCA, pages 708–711.

Lo, C.-k., Cherry, C., Foster, G., Stewart, D., Islam, R., Kazantseva, A., and Kuhn, R. (2016). NRC Russian-English machine translation system for WMT 2016. In Proceedings of the First Conference on Machine Translation, Berlin, Germany. Association for Computational Linguistics.

Ngo, H. G., Chen, N. F., Nguyen, B. M., Ma, B., and Li, H. (2015). Phonology-augmented statistical transliteration for low-resource languages. In Interspeech, pages 3670–3674.

Nicolai, G., Hauer, B., Salameh, M., St Arnaud, A., Xu, Y., Yao, L., and Kondrak, G. (2015). Multiple system combination for transliteration. In Proceedings of NEWS 2015 The Fifth Named Entities Workshop, pages 72–79.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.

Phe, H. (1997). Vietnamese dictionary. Vietnam Lexicography Centre, Da Nang Publishing House.

Shao, Y. and Nivre, J. (2016). Applying neural networks to English-Chinese named entity transliteration. In Sixth Named Entity Workshop, joint with the 54th Annual Meeting of the Association for Computational Linguistics.

Snover, M. G., Madnani, N., Dorr, B., and Schwartz, R. (2009). Ter-plus: paraphrase, semantic, and alignment enhancements to translation edit rate. Machine Translation, 23(2-3):117–127.

Stolcke, A. et al. (2002). Srilm-an extensible language modeling toolkit. In Interspeech, volume 2002, page 2002.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112.

Thu, Y. K., Pa, W. P., Sagisaka, Y., and Iwahashi, N. (2016). Comparison of grapheme–to–phoneme conversion methods on a myanmar pronunciation dictionary. Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing 2016, pages 11–22.

Yao, K. and Zweig, G. (2015). Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. arXiv preprint arXiv:1506.00196.
