Post-Editing Machine Translated Text in a Commercial Setting: Observation and Statistical Analysis

Midori Tatsumi, MSc.

Thesis submitted for the degree of Doctor of Philosophy

School of Applied Language and Intercultural Studies
Dublin City University

Supervisors: Dr. Sharon O’Brien / Dr. Minako O’Hagan / Dr. Fred Hollowood

October 2010

I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.

Signed: ____________________________ ID No.: ______________________ (Candidate) Date: _____________________________

ACKNOWLEDGEMENT

I would like to express my gratitude to the Enterprise Ireland Innovation Partnerships Programme in conjunction with Symantec Ltd. for giving me a rare opportunity to conduct research in both academic and industrial contexts while fulfilling my personal interest.

I would like to thank my academic supervisors, Dr. Sharon O’Brien and Dr. Minako O’Hagan, for guiding and enlightening me on how to perform research of academic value. This was an eye-opening experience in every sense for me, a person who had never been a resident of academia! I would also like to thank my industrial supervisor, Dr. Fred Hollowood, for maintaining the practical value of my research by questioning the meaning of each finding from an industrial viewpoint.

I am grateful to Dr. Johann Roturier for providing constant and thorough technical help and astonishing expertise, and also to everyone I worked with at Symantec for helping me on many occasions and for their friendship.

I would like to thank my family and friends in Japan for their continuous support during my long journey abroad.
To all the friends I met in Ireland, and especially in the post-grad rooms, thank you for your company! Special thanks to Marian, for all the running sessions and half-marathons we did together. Running is one of the best things I have done during my study in Ireland. And finally, thank you Frank, for constantly taking me outdoors to enjoy nature and giving me a ‘Peaceful Easy Feeling’.

ABSTRACT

Machine translation systems, when they are used in a commercial context for publishing purposes, are usually used in combination with human post-editing. Thus, understanding human post-editing behaviour is crucial in order to maximise the benefit of machine translation systems. Though a number of studies have been carried out on human post-editing to date, there is a lack of large-scale studies on post-editing in industrial contexts which focus on the activity in real-life settings.

This study observes professional Japanese post-editors’ work and examines the effect of the amount of editing made during post-editing, source text characteristics, and post-editing behaviour on the amount of post-editing effort. A mixed methods approach was employed to analyse the data both quantitatively and qualitatively and to gain detailed insights into the post-editing activity from various viewpoints.

The results indicate that a number of factors, such as sentence structure, document component types, use of product-specific terms, and post-editing patterns and behaviour, have an effect on the amount of post-editing effort in an intertwined manner. The findings will contribute to a better utilisation of machine translation systems in the industry as well as to the development of the skills and strategies of post-editors.
TABLE OF CONTENTS

PART I RESEARCH BACKGROUND

Chapter 1 Introduction
  1.1 Machine Translation (MT)
  1.2 MT and Post-Editing (PE)
  1.3 IT Document As A Text Genre
  1.4 Research Context
  1.5 Structure of the Thesis

Chapter 2 Review of Post-Editing Studies
  2.1 Definition of Post-Editing (PE)
  2.2 Research on Strategies and Methodologies of PE
    2.2.1 Human PE
    2.2.2 Aided and automated PE
    2.2.3 Statistical machine translation based post-editing (SPE)
  2.3 Methods of Measuring Human PE Effort
    2.3.1 Cognitive PE effort
    2.3.2 Temporal PE effort
    2.3.3 Technical PE effort
  2.4 Findings from PE Effort Studies and Research Gaps
    2.4.1 MT quality and PE effort
    2.4.2 ST characteristics and PE effort
    2.4.3 PE typology
    2.4.4 Post-editor variance
    2.4.5 Research settings
    2.4.6 Analysis method
  2.5 MT and TM
    2.5.1 Combining MT and TM
  2.6 MT and PE research in Japan
  2.7 Formulation of Research Questions and Research Plans
  2.8 Concluding Remarks

PART II CONDUCTING THE STUDY

Chapter 3 Research Methodology
  3.1 Research Framework
  3.2 Mixed Methods Research Design
    3.2.1 ‘What is happening?’
    3.2.2 ‘Is there a systematic effect?’
    3.2.3 ‘How is it happening?’
  3.3 Observational Experiment (Data Collection Phase)
  3.4 Operationalisation
    3.4.1 Amount of PE effort (dependent variable)
    3.4.2 Amount of editing (independent variable)
    3.4.3 Characteristics of source text (independent variable)
    3.4.4 Types of PE operation (independent variable)
  3.5 Statistical Data Analysis
  3.6 MT/PE Quality Assessment
  3.7 Concluding Remarks

Chapter 4 Preliminary Study
  4.1 Objectives
  4.2 Methods
    4.2.1 Preparation of test corpus
    4.2.2 Post-editing session
  4.3 Initial findings
    4.3.1 Correlation between the amount of editing and PE effort
  4.4 Effect of combined variables on PE effort
    4.4.1 ST structure
    4.4.2 ST length
    4.4.3
