Authoring Support for Controlled Language and Machine Translation: a Report from Practice Melanie Siegel Hochschule Darmstadt
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Can You Give Me Another Word for Hyperbaric?: Improving Speech
“CAN YOU GIVE ME ANOTHER WORD FOR HYPERBARIC?”: IMPROVING SPEECH TRANSLATION USING TARGETED CLARIFICATION QUESTIONS Necip Fazil Ayan1, Arindam Mandal1, Michael Frandsen1, Jing Zheng1, Peter Blasco1, Andreas Kathol1 Fred´ eric´ Bechet´ 2, Benoit Favre2, Alex Marin3, Tom Kwiatkowski3, Mari Ostendorf3 Luke Zettlemoyer3, Philipp Salletmayr5∗, Julia Hirschberg4, Svetlana Stoyanchev4 1 SRI International, Menlo Park, USA 2 Aix-Marseille Universite,´ Marseille, France 3 University of Washington, Seattle, USA 4 Columbia University, New York, USA 5 Graz Institute of Technology, Austria ABSTRACT Our previous work on speech-to-speech translation systems has We present a novel approach for improving communication success shown that there are seven primary sources of errors in translation: between users of speech-to-speech translation systems by automat- • ASR named entity OOVs: Hi, my name is Colonel Zigman. ically detecting errors in the output of automatic speech recogni- • ASR non-named entity OOVs: I want some pristine plates. tion (ASR) and statistical machine translation (SMT) systems. Our approach initiates system-driven targeted clarification about errorful • Mispronunciations: I want to collect some +de-MOG-raf-ees regions in user input and repairs them given user responses. Our about your family? (demographics) system has been evaluated by unbiased subjects in live mode, and • Homophones: Do you have any patients to try this medica- results show improved success of communication between users of tion? (patients vs. patience) the system. • MT OOVs: Where is your father-in-law? Index Terms— Speech translation, error detection, error correc- • Word sense ambiguity: How many men are in your com- tion, spoken dialog systems. pany? (organization vs. -
Is Machine Translation Post-Editing Worth the Effort? a Survey of Research Into Post-Editing and Effort Maarit Koponen, University of Helsinki
The Journal of Specialised Translation Issue 25 – January 2016 Is machine translation post-editing worth the effort? A survey of research into post-editing and effort Maarit Koponen, University of Helsinki ABSTRACT Advances in the field of machine translation have recently generated new interest in the use of this technology in various scenarios. This development raises questions over the roles of humans and machines as machine translation appears to be moving from the peripheries of the translation field closer to the centre. The situation which influences the work of professional translators most involves post-editing machine translation, i.e. the use of machine translations as raw versions to be edited by translators. Such practice is increasingly commonplace for many language pairs and domains and is likely to form an even larger part of the work of translators in the future. While recent studies indicate that post-editing high-quality machine translations can indeed increase productivity in terms of translation speed, editing poor machine translations can be an unproductive task. The question of effort involved in post-editing therefore remains a central issue. The objective of this article is to present an overview of the use of post-editing of machine translation as an increasingly central practice in the translation field. Based on a literature review, the article presents a view of current knowledge concerning post-editing productivity and effort. Findings related to specific source text features, as well as machine translation errors that appear to be connected with increased post-editing effort, are also discussed. KEYWORDS Machine translation, MT quality, MT post-editing, post-editing effort, productivity. -
Translating the Post-Editor: an Investigation of Post-Editing Changes and Correlations with Professional Experience Across Two Romance Languages
Translating the post-editor: an investigation of post-editing changes and correlations with professional experience across two Romance languages Giselle de Almeida Thesis submitted for the degree of Doctor of Philosophy School of Applied Language and Intercultural Studies Dublin City University January 2013 Supervisors: Dr Sharon O’Brien and Phil Ritchie I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work. Signed: ______________________ ID No.: ______________________ Date: ______________________ ii Abstract With the growing use of machine translation, more and more companies are also using post-editing services to make the machine-translated output correct, precise and fully understandable. Post-editing, which is distinct from translation and revision, is still a new activity for many translators. The lack of training, clear and consistent guidelines and international standards may cause difficulties in the transition from translation to post- editing. Aiming to gain a better understanding of these difficulties, this study investigates the impact of translation experience on post-editing performance, as well as differences and similarities in post-editing behaviours and trends between two languages of the same family (French and Brazilian Portuguese). The research data were gathered by means of individual sessions in which participants remotely connected to a computer and post-edited machine-translated segments from the IT domain, while all their edits and onscreen activities were recorded via screen-recording and keylogging programs. -
INTRINSIC and EXTRINSIC SOURCES of TRANSLATOR SATISFACTION: an EMPIRICAL STUDY Mónica Rodríguez-Castro University of North Carolina at Charlotte (Estados Unidos)
Entreculturas 7-8 (enero 2016) ISSN: 1989-5097 Mónica Rodríguez Castro INTRINSIC AND EXTRINSIC SOURCES OF TRANSLATOR SATISFACTION: AN EMPIRICAL STUDY Mónica Rodríguez-Castro University of North Carolina at Charlotte (Estados Unidos) ABSTRACT This paper discusses the main results from an online questionnaire on translator satisfaction—a theoretical construct that conceptualizes leading sources of task and job satisfaction in the language industry. The proposed construct distinguishes between intrinsic and extrinsic sources of satisfaction using Herzberg’s two-factor framework and enumerates the constituents of translator satisfaction. Statistical analysis allows this study to quantify these constituents and their correlations. Preliminary results reveal that crucial sources of task satisfaction include task pride, ability to perform a variety of tasks, and successful project completion. Major sources of job satisfaction include professional skills of team members, a continuous relationship with clients, and clients’ understanding of the translation process. Low income and requests for discounts are found to be some of the sources of dissatisfaction. The findings from this study can be used to investigate new approaches for retention and human resource management. KEY WORDS: translation, language industry, task satisfaction, job satisfaction, outsourcing, sociology of translation. RESUMEN Este artículo presenta los principales resultados de una encuesta en línea que se enfoca en la satisfacción laboral del traductor en la actual industria de la lengua. La conceptualización teórica de la satisfacción laboral se divide en dos categorías fundamentales: satisfacción por tareas y satisfacción en el trabajo. El marco teórico establece una distinción entre fuentes intrínsecas y extrínsecas de satisfacción laboral adoptando como base los principios de la Teoría Bifactorial de Herzberg y enumera cada uno de los componentes de la satisfacción del traductor. -
A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder
International Conference On Information Communication & Embedded Systems (ICICES-2016) A SURVEY OF VOICE TRANSLATION METHODOLOGIES - ACOUSTIC DIALECT DECODER Hans Krupakar Keerthika Rajvel Dept. of Computer Science and Engineering Dept. of Computer Science and Engineering SSN College Of Engineering SSN College of Engineering E-mail: [email protected] E-mail: [email protected] Bharathi B Angel Deborah S Vallidevi Krishnamurthy Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science and Engineering and Engineering and Engineering SSN College Of Engineering SSN College Of Engineering SSN College Of Engineering E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] Abstract— Speech Translation has always been about giving in order to make the process of communication amongst source text/audio input and waiting for system to give humans better, easier and efficient. However, the existing translated output in desired form. In this paper, we present the methods, including Google voice translators, typically Acoustic Dialect Decoder (ADD) – a voice to voice ear-piece handle the process of translation in a non-automated manner. translation device. We introduce and survey the recent advances made in the field of Speech Engineering, to employ in This makes the process of translation of word(s) and/or the ADD, particularly focusing on the three major processing sentences from one language to another, slower and more steps of Recognition, Translation and Synthesis. We tackle the tedious. We wish to make that process automatic – have a problem of machine understanding of natural language by device do what a human translator does, inside our ears. -
Machine Translation Post-Editing and Effort, Empirical Studies on the Post-Editing Process
CORE Metadata, citation and similar papers at core.ac.uk Provided by Helsingin yliopiston digitaalinen arkisto Department of Modern Languages Faculty of Arts University of Helsinki Machine Translation Post-editing and Effort Empirical Studies on the Post-editing Process Maarit Koponen ACADEMIC DISSERTATION to be publicly discussed, by due permission of the Faculty of Arts at the University of Helsinki, in Auditorium XII, Main Building, on the 19th of March, 2016 at 10 o'clock. Helsinki 2016 Supervisor Prof. Lauri Carlson, University of Helsinki, Finland Pre-examiners Dr. Sharon O'Brien, Dublin City University, Ireland Dr. Jukka M¨akisalo,University of Eastern Finland, Finland Opponent Dr. Sharon O'Brien, Dublin City University, Ireland Copyright c 2016 Maarit Koponen ISBN 978-951-51-1974-2 (paperback) ISBN 978-951-51-1975-9 (PDF) Unigrafia Helsinki 2016 Abstract This dissertation investigates the practice of machine translation post-editing and the various aspects of effort involved in post-editing work. Through analy- ses of edits made by post-editors, the work described here examines three main questions: 1) what types of machine translation errors or source text features cause particular effort in post-editing, 2) what types of errors can or cannot be corrected without the help of the source text, and 3) how different indicators of effort vary between different post-editors. The dissertation consists of six previously published articles, and an introduc- tory summary. Five of the articles report original research, and involve analyses of post-editing data to examine questions related to post-editing effort as well as differences between post-editors. -
Learning Speech Translation from Interpretation
Learning Speech Translation from Interpretation zur Erlangung des akademischen Grades eines Doktors der Ingenieurwissenschaften von der Fakult¨atf¨urInformatik Karlsruher Institut f¨urTechnologie (KIT) genehmigte Dissertation von Matthias Paulik aus Karlsruhe Tag der m¨undlichen Pr¨ufung: 21. Mai 2010 Erster Gutachter: Prof. Dr. Alexander Waibel Zweiter Gutachter: Prof. Dr. Tanja Schultz Ich erkl¨arehiermit, dass ich die vorliegende Arbeit selbst¨andig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe sowie dass ich die w¨ortlich oder inhaltlich ¨ubernommenen Stellen als solche kenntlich gemacht habe und die Satzung des KIT, ehem. Universit¨atKarlsruhe (TH), zur Sicherung guter wissenschaftlicher Praxis in der jeweils g¨ultigenFassung beachtet habe. Karlsruhe, den 21. Mai 2010 Matthias Paulik Abstract The basic objective of this thesis is to examine the extent to which automatic speech translation can benefit from an often available but ignored resource, namely human interpreter speech. The main con- tribution of this thesis is a novel approach to speech translation development, which makes use of that resource. The performance of the statistical models employed in modern speech translation systems depends heavily on the availability of vast amounts of training data. State-of-the-art systems are typically trained on: (1) hundreds, sometimes thousands of hours of manually transcribed speech audio; (2) bi-lingual, sentence-aligned text corpora of man- ual translations, often comprising tens of millions of words; and (3) monolingual text corpora, often comprising hundreds of millions of words. The acquisition of such enormous data resources is highly time-consuming and expensive, rendering the development of deploy- able speech translation systems prohibitive to all but a handful of eco- nomically or politically viable languages. -
Is 42 the Answer to Everything in Subtitling-Oriented Speech Translation?
Is 42 the Answer to Everything in Subtitling-oriented Speech Translation? Alina Karakanta Matteo Negri Marco Turchi Fondazione Bruno Kessler Fondazione Bruno Kessler Fondazione Bruno Kessler University of Trento Trento - Italy Trento - Italy Trento - Italy [email protected] [email protected] [email protected] Abstract content, still rely heavily on human effort. In a typi- cal multilingual subtitling workflow, a subtitler first Subtitling is becoming increasingly important creates a subtitle template (Georgakopoulou, 2019) for disseminating information, given the enor- by transcribing the source language audio, timing mous amounts of audiovisual content becom- and adapting the text to create proper subtitles in ing available daily. Although Neural Machine Translation (NMT) can speed up the process the source language. These source language subti- of translating audiovisual content, large man- tles (also called captions) are already compressed ual effort is still required for transcribing the and segmented to respect the subtitling constraints source language, and for spotting and seg- of length, reading speed and proper segmentation menting the text into proper subtitles. Cre- (Cintas and Remael, 2007; Karakanta et al., 2019). ating proper subtitles in terms of timing and In this way, the work of an NMT system is already segmentation highly depends on information simplified, since it only needs to translate match- present in the audio (utterance duration, natu- ing the length of the source text (Matusov et al., ral pauses). In this work, we explore two meth- ods for applying Speech Translation (ST) to 2019; Lakew et al., 2019). However, the essence subtitling: a) a direct end-to-end and b) a clas- of a good subtitle goes beyond matching a prede- sical cascade approach. -
Machine Translation and Monolingual Postediting: the AFRL WMT-14 System Lane O.B
Machine Translation and Monolingual Postediting: The AFRL WMT-14 System Lane O.B. Schwartz Timothy Anderson Air Force Research Laboratory Air Force Research Laboratory [email protected] [email protected] Jeremy Gwinnup Katherine M. Young SRA International† N-Space Analysis LLC† [email protected] [email protected] Abstract test set for correctness against the reference translations. Using bilingual judges, we fur- This paper describes the AFRL sta- ther evaluate a substantial subset of the post- tistical MT system and the improve- edited test set using a more fine-grained ade- ments that were developed during the quacy metric; using this metric, we show that WMT14 evaluation campaign. As part monolingual posteditors can successfully pro- of these efforts we experimented with duce postedited translations that convey all or a number of extensions to the stan- most of the meaning of the original source sen- dard phrase-based model that improve tence in up to 87.8% of sentences. performance on Russian to English and Hindi to English translation tasks. 2 System Description In addition, we describe our efforts We submitted systems for the Russian-to- to make use of monolingual English English and Hindi-to-English MT shared speakers to correct the output of ma- tasks. In all submitted systems, we use the chine translation, and present the re- phrase-based moses decoder (Koehn et al., sults of monolingual postediting of the 2007). We used only the constrained data sup- entire 3003 sentences of the WMT14 plied by the evaluation for each language pair Russian-English test set. -
Speech Translation Into Pakistan Sign Language
Master Thesis Computer Science Thesis No: MCS-2010-24 2012 Speech Translation into Pakistan Sign Language Speech Translation into Pakistan Sign AhmedLanguage Abdul Haseeb Asim Illyas Speech Translation into Pakistan Sign LanguageAsim Ilyas School of Computing Blekinge Institute of Technology SE – 371 79 Karlskrona Sweden This thesis is submitted to the School of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full time studies. Contact Information: Authors: Ahmed Abdul Haseeb E-mail: [email protected] Asim Ilyas E-mail: [email protected] University advisor: Prof. Sara Eriksén School of Computing, Blekinge Institute of Technology Internet : www.bth.se/com SE – 371 79 Karlskrona Phone : +46 457 38 50 00 Sweden Fax : + 46 457 271 25 ABSTRACT Context: Communication is a primary human need and language is the medium for this. Most people have the ability to listen and speak and they use different languages like Swedish, Urdu and English etc. to communicate. Hearing impaired people use signs to communicate. Pakistan Sign Language (PSL) is the preferred language of the deaf in Pakistan. Currently, human PSL interpreters are required to facilitate communication between the deaf and hearing; they are not always available, which means that communication among the deaf and other people may be impaired or nonexistent. In this situation, a system with voice recognition as an input and PSL as an output will be highly helpful. Objectives: As part of this thesis, we explore challenges faced by deaf people in everyday life while interacting with unimpaired. -
WPTP 2015: Post-Editing Technology and Practice
Machine Translation Summit XV http://www.amtaweb.org/mt-summit-xv WPTP 2015: Post-editing Technology and Practice Organizers: Sharon O’Brien (Dublin City University) Michel Simard (National Research Council Canada) Joss Moorkens (Dublin City University) WORKSHOP MT Summit XV October 30 – November 3, 2015 -- Miami, FL, USA Proceedings of 4th Workshop on Post-Editing Technology and Practice (WPTP4) Sharon O'Brien & Michel Simard, Eds. Association for Machine Translation in the Americas http://www.amtaweb.org ©2015 The Authors. These articles are licensed under a Creative Commons 3.0 license, no derivative works, attribution, CC-BY-ND. Introduction The fourth Workshop on Post-Editing Technology and Practice (WPTP4) was organised for November 3rd, 2015, in Miami, USA, as a workshop of the XVth MT Summit. This was the fourth in a series of workshops organised since 2012 (WPTP1 – San Diego 2012, WPTP2 – Nice 2013, WPTP3 – Vancouver 2014). The accepted papers for WPTP4 cover a range of topics such as the teaching of post- editing skills, measuring the cognitive effort of post-editing, and quality assessment of post-edited output. The papers were also book-ended by two invited talks by Elliott Macklovitch (Independent Consultant) and John Tinsley (Iconic Translation Machines). Elliott Macklovitch’s talk was entitled “What Translators Need to Become Happy Post- Editors” while John Tinsley’s talk was entitled “What MT Developers Are Doing to Make Post-Editors Happy”. By juxtaposing these two points of view we hoped to provide an interesting frame for attendees where the views of users and developers were represented. We sincerely thank the authors of the papers as well as our two invited speakers for sharing their research. -
Between Flexibility and Consistency: Joint Generation of Captions And
Between Flexibility and Consistency: Joint Generation of Captions and Subtitles Alina Karakanta1,2, Marco Gaido1,2, Matteo Negri1, Marco Turchi1 1 Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento - Italy 2 University of Trento, Italy {akarakanta,mgaido,negri,turchi}@fbk.eu Abstract material but also between each other, for example in the number of blocks (pieces of time-aligned Speech translation (ST) has lately received text) they occupy, their length and segmentation. growing interest for the generation of subtitles Consistency is vital for user experience, for ex- without the need for an intermediate source language transcription and timing (i.e. cap- ample in order to elicit the same reaction among tions). However, the joint generation of source multilingual audiences, or to facilitate the quality captions and target subtitles does not only assurance process in the localisation industry. bring potential output quality advantageswhen Previous work in ST for subtitling has focused the two decoding processes inform each other, on generating interlingual subtitles (Matusov et al., but it is also often required in multilingual sce- 2019; Karakanta et al., 2020a), a) without consid- narios. In this work, we focus on ST models ering the necessity of obtaining captions consis- which generate consistent captions-subtitles in tent with the target subtitles, and b) without ex- terms of structure and lexical content. We further introduce new metrics for evaluating amining whether the joint generation leads to im- subtitling consistency. Our findings show that provements in quality. We hypothesise that knowl- joint decoding leads to increased performance edge sharing between the tasks of transcription and consistency between the generated cap- and translation could lead to such improvements.