Machine Translation and Monolingual Postediting: The AFRL WMT-14 System

Lane O.B. Schwartz, Air Force Research Laboratory
Timothy Anderson, Air Force Research Laboratory
Jeremy Gwinnup, SRA International†
Katherine M. Young, N-Space Analysis LLC†

† This work is sponsored by the Air Force Research Laboratory under Air Force contract FA-8650-09-D-6939-029.

Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 186–194, Baltimore, Maryland USA, June 26–27, 2014. © 2014 Association for Computational Linguistics

Abstract

This paper describes the AFRL statistical MT system and the improvements that were developed during the WMT14 evaluation campaign. As part of these efforts we experimented with a number of extensions to the standard phrase-based model that improve performance on Russian to English and Hindi to English translation tasks. In addition, we describe our efforts to make use of monolingual English speakers to correct the output of machine translation, and present the results of monolingual postediting of the entire 3003 sentences of the WMT14 Russian-English test set.

1 Introduction

As part of the 2014 Workshop on Machine Translation (WMT14) shared translation task, the human language technology team at the Air Force Research Laboratory participated in two language pairs: Russian-English and Hindi-English. Our machine translation system represents enhancements to our system from IWSLT 2013 (Kazi et al., 2013). In this paper, we focus on enhancements to our procedures with regard to data processing and the handling of unknown words.

In addition, we describe our efforts to make use of monolingual English speakers to correct the output of machine translation, and present the results of monolingual postediting of the entire 3003 sentences of the WMT14 Russian-English test set. Using a binary adequacy classification, we evaluate the entire postedited test set for correctness against the reference translations. Using bilingual judges, we further evaluate a substantial subset of the postedited test set using a more fine-grained adequacy metric; using this metric, we show that monolingual posteditors can successfully produce postedited translations that convey all or most of the meaning of the original source sentence in up to 87.8% of sentences.

2 System Description

We submitted systems for the Russian-to-English and Hindi-to-English MT shared tasks. In all submitted systems, we use the phrase-based moses decoder (Koehn et al., 2007). We used only the constrained data supplied by the evaluation for each language pair for training our systems.

2.1 Data Preparation

Before training our systems, a cleaning pass was performed on all data. Unicode characters in the unallocated and private use ranges were all removed, along with C0 and C1 control characters, zero-width and non-breaking spaces and joiners, directionality and paragraph markers.

2.1.1 Hindi Processing

The HindEnCorp corpus (Bojar et al., 2014) is distributed in tokenized form; in order to ensure a uniform tokenization standard across all of our data, we began by detokenizing this data using the Moses detokenization scripts. In addition to normalizing various extended Latin punctuation marks to their Basic Latin equivalents, following Bojar et al. (2010) we normalized Devanagari Danda (U+0964), Double Danda (U+0965), and Abbreviation Sign (U+0970) punctuation marks to Latin Full Stop (U+002E), any Devanagari Digit to the equivalent ASCII Digit, and decomposed all Hindi data into Unicode Normalization Form D (Davis and Whistler, 2013) using charlint.¹ In addition, we performed Hindi diacritic and vowel normalization, following Larkey et al. (2003).

¹ http://www.w3.org/International/charlint

Since no Hindi-English development test set was provided in WMT14, we randomly sampled 1500 sentence pairs from the Hindi-English parallel training data to serve this purpose. Upon discovering duplicate sentences in the corpus, 552 sentences that overlapped with the training portion were removed from the sample, leaving a development test set of 948 sentences.

2.1.2 Russian Processing

The Russian sentences contained many examples of mixed-character spelling, in which both Latin and Cyrillic characters are used in a single word, relying on the visual similarity of the characters. For example, although the first letter and last letter in the word cейчас appear visually indistinguishable, we find that the former is U+0063 Latin Small Letter C and the latter is U+0441 Cyrillic Small Letter Es. We created a spelling normalization program to convert these words to all Cyrillic or all Latin characters, with a preference for all-Cyrillic conversion if possible. Normalization also removes U+0301 Combining Acute Accent (◌́) and converts U+00F2 Latin Small Letter O with Grave (ò) and U+00F3 Latin Small Letter O with Acute (ó) to the unaccented U+043E Cyrillic Small Letter O (о).

The Russian-English Common Crawl parallel corpus (Smith et al., 2013) is relatively noisy. A number of Russian source sentences are incorrectly encoded using characters in the Latin-1 supplement block; we correct these sentences by shifting these characters ahead by 350 (hexadecimal) code points into the correct Cyrillic character range.²

² For example: “Ñïðàâêà ïî ãîðîäàì Ðîññèè è ìèðà.” becomes “Справка по городам России и мира.”

We examine the Common Crawl parallel sentences and mark for removal any non-Russian source sentences and non-English target sentences. Target sentences were marked as non-English if more than half of the characters in the sentence were non-Latin, or if more than half of the words were unknown to the aspell English spelling correction program, not counting short words, which frequently occur as (possibly false) cognates across languages (English die vs. German die, English on vs. French on, for example). Because aspell does not recognize some proper names, brand names, and borrowed words as known English words, this method incorrectly flags for removal some English sentences which have a high proportion of these types of words.

Source sentences were marked as non-Russian if less than one-third of the characters were within the Russian Cyrillic range, or if non-Russian characters equal or outnumber Russian characters and the sentence contains no contiguous sequence of at least three Russian characters. Some portions of the Cyrillic character set are not used in typical Russian text; source sentences were therefore marked for removal if they contained Cyrillic extension characters Ukrainian I (і І), Yi (ї Ї), Ghe With Upturn (ґ Ґ) or Ie (є Є) in either upper- or lowercase, with exceptions for U+0406 Ukrainian I (І) in Roman numerals and for U+0491 Ghe With Upturn (ґ) when it occurred as an encoding error artifact.³

³ Specifically, we allowed lines containing ґ where it appears as an encoding error in place of an apostrophe within English words. For example: “Песня The Kelly Family Iґm So Happy представлена вам Lyrics-Keeper.”

Sentence pairs where the source was identified as non-Russian or the target was identified as non-English were removed from the parallel corpus. Overall, 12% of the parallel sentences were excluded based on a non-Russian source sentence (94k instances) or a non-English target sentence (11.8k instances).

Our Russian-English parallel training data includes a parallel corpus extracted from Wikipedia headlines (Ammar et al., 2013), provided as part of the WMT14 shared translation task. Two files in this parallel corpus (wiki.ru-en and guessed-names.ru-en) contained some overlapping data. We removed 6415 duplicate lines within wiki.ru-en (about 1.4%), and removed 94 lines of guessed-names.ru-en that were already present in wiki.ru-en (about 0.17%).

2.2 Machine Translation

Our baseline system is a variant of the MIT-LL/AFRL IWSLT 2013 system (Kazi et al., 2013) with some modifications to the training and decoding processes.

2.2.1 Phrase Table Training

For our Russian-English system, we trained a phrase table using the Moses Experiment Management System (Koehn, 2010b), with mgiza (Gao and Vogel, 2008) as the word aligner; this phrase table was trained using the Russian-English Common Crawl, News Commentary, Yandex (Bojar et al., 2013), and Wikipedia headlines parallel corpora.

The phrase table for our Hindi-English system was trained using a similar in-house training pipeline, making use of the HindEnCorp and Wikipedia headlines parallel corpora.

2.2.2 Language Model Training

During the training process we built n-gram language models (LMs) for use in decoding and rescoring using the KenLM language modelling toolkit (Heafield et al., 2013). Class-based language models (Brown et al., 1992) were also trained, for later use in n-best list rescoring, using the SRILM language modelling toolkit (Stolcke, 2002). We trained a 6-gram language model from the LDC English Gigaword Fifth Edition, for use in both the Hindi-English and Russian-English systems.

2.2.3 Decoding, n-best List Rescoring, and Optimization

We decode using the phrase-based moses decoder (Koehn et al., 2007), choosing the best translation for each source sentence according to a linear combination of decoding features:

\[ \hat{E} = \arg\max_{E} \sum_{\forall r} \lambda_r h_r(E, F) \tag{1} \]

We make use of a standard set of decoding

Decoding Features
    P(f|e)
    P(e|f)
    Pw(f|e)
    Pw(e|f)
    Phrase Penalty
    Lexical Backoff
    Word Penalty
    Distortion Model
    Unknown Word Penalty
    Lexicalized Reordering Model
    Operation Sequence Model
Rescoring Features
    Pclass(E): 7-gram class-based LM
    Plex(F|E): sentence-level averaged lexical translation score

Table 1: Models used in log-linear combination
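Equation 1 selects, for each source sentence, the hypothesis whose weighted feature sum is highest. As a rough illustration only, and not the Moses or AFRL implementation, the following Python sketch applies that rule to a toy n-best list. The function names, feature names, weights, and scores are all hypothetical and chosen purely for the example.

```python
from typing import Dict, List, Tuple

# One n-best entry: (candidate translation E, feature values h_r(E, F)).
Hypothesis = Tuple[str, Dict[str, float]]


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return sum_r lambda_r * h_r(E, F) for a single hypothesis."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(nbest: List[Hypothesis], weights: Dict[str, float]) -> str:
    """Return the translation with the highest score (the arg max in Eq. 1)."""
    best_text, _ = max(nbest, key=lambda hyp: loglinear_score(hyp[1], weights))
    return best_text


# Hypothetical weights and log-domain feature values, for illustration only.
weights = {"lm": 0.5, "tm": 0.3, "word_penalty": -0.2}
nbest = [
    ("the cat sat", {"lm": -4.0, "tm": -2.0, "word_penalty": 3.0}),
    ("cat the sat", {"lm": -9.0, "tm": -1.5, "word_penalty": 3.0}),
]

print(best_hypothesis(nbest, weights))  # prints: the cat sat
```

In this toy setup the first hypothesis scores about -3.2 and the second about -5.55, so the first is chosen. In a real system the weights would be tuned on held-out data rather than set by hand, and rescoring features would simply appear as additional entries in each feature dictionary.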