An Empirical Study on the Influence of Translation Suggestions’ Provenance Metadata
A Thesis Submitted for the Degree of Doctor of Philosophy
by Lucía Morado Vázquez
Department of Computer Science and Information Systems, University of Limerick
Supervisor: Dr Chris Exton
Co-Supervisor: Reinhard Schäler
Submitted to the University of Limerick, August 2012

Abstract

In the area of localisation there is constant pressure to automate processes in order to reduce the cost and time associated with an ever-growing workload. One of the main approaches to achieving this objective is to reuse previously localised data and metadata by means of standardised translation memory formats, such as the LISA Translation Memory eXchange (TMX) format or the OASIS XML Localisation Interchange File Format (XLIFF). This research aims to study the effectiveness and importance of the localisation metadata associated with the translation suggestions provided by Computer-Assisted Translation (CAT) tools. Firstly, we analysed the way in which localisation data and metadata can be represented in the current specification of XLIFF (1.2). Secondly, we designed a new format called the Localisation Memory Container (LMC) to organise previously localised XLIFF files in a single container. Finally, we developed a prototype (XLIFF Phoenix) that leverages the data and metadata from the LMC into untranslated XLIFF files in order to support the translator’s task, helping CAT tools not only to produce more translation suggestions easily but also to enrich those suggestions with relevant metadata. In order to test whether this “CAT-oriented” enriched metadata has any influence on the behaviour of the translators involved in the localisation process, we designed an experimental translation task in which translators used a modified CAT tool (Swordfish II). A pilot study with translation students was carried out in December 2010 to test the validity of our methodology.
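To make the idea of leveraging data and metadata into untranslated XLIFF files more concrete, the following is a minimal sketch under stated assumptions; it is not the XLIFF Phoenix implementation. It assumes exact matching only, uses a plain list of previously translated XLIFF 1.2 documents as a hypothetical stand-in for the LMC, and records provenance solely through the standard `origin` and `match-quality` attributes of the XLIFF 1.2 `<alt-trans>` element (here `origin` is filled with the `original` attribute of the source `<file>`). The function names `build_memory` and `leverage` are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OASIS XLIFF 1.2 specification.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
Q = "{" + XLIFF_NS + "}"
ET.register_namespace("", XLIFF_NS)  # serialise without an "ns0:" prefix


def build_memory(previous_docs):
    """Map source text -> (target text, origin) from translated XLIFF 1.2 docs.

    Hypothetical stand-in for the LMC: a plain list of previously localised
    XLIFF documents supplied as strings.
    """
    memory = {}
    for doc in previous_docs:
        root = ET.fromstring(doc)
        for file_el in root.iter(Q + "file"):
            origin = file_el.get("original", "unknown")
            for unit in file_el.iter(Q + "trans-unit"):
                source = unit.find(Q + "source")
                target = unit.find(Q + "target")
                if source is None or target is None:
                    continue
                if not source.text or not target.text:
                    continue
                # Keep the first candidate seen; a real tool would rank them.
                memory.setdefault(source.text, (target.text, origin))
    return memory


def leverage(untranslated_doc, memory):
    """Attach exact matches as <alt-trans> candidates with provenance metadata."""
    root = ET.fromstring(untranslated_doc)
    # Materialise the list first so the tree is not modified mid-iteration.
    for unit in list(root.iter(Q + "trans-unit")):
        source = unit.find(Q + "source")
        if source is None or unit.find(Q + "target") is not None:
            continue  # no source text, or the unit already has a target
        match = memory.get(source.text)
        if match is None:
            continue  # no exact match: the translator starts from scratch
        target_text, origin = match
        alt = ET.SubElement(
            unit, Q + "alt-trans", {"match-quality": "100", "origin": origin}
        )
        ET.SubElement(alt, Q + "source").text = source.text
        ET.SubElement(alt, Q + "target").text = target_text
    return ET.tostring(root, encoding="unicode")
```

In this sketch `origin` is the only provenance metadata carried with each suggestion; richer metadata of the kind studied in this research (for example, who produced a translation and when) would have to be encoded through other XLIFF 1.2 mechanisms, which the sketch does not attempt.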
The main study took place between December 2011 and January 2012 with the participation of 33 professional translators divided into three groups. The analysis of the gathered data indicated that the groups that received the translation memory obtained, on average, significantly better results (less time spent and better quality scores) than the group that did not receive any translation memory. Regarding the participants’ attitude towards the metadata they received, most of them did not find it distracting and the majority would prefer a translation memory that contained metadata; finally, half of the participants could mention a case in which it had been helpful to them. In this thesis we present our research objectives, methodology and procedures, and an analysis of the experimental results, and finally we draw reasoned, evidence-based conclusions on the importance of metadata during the localisation process.

Declaration

I hereby declare that this thesis is entirely my own work, and that it has not been submitted as an exercise for a degree at any other university. The tool developed in the first stage of this research, XLIFF Phoenix, was presented in CNGL live demonstrations and at the LRC XV Annual International Localisation Conference in 2010; a demo is also publicly available online.¹ A short summary of its development was also included in a journal paper (Aouad et al. 2011). The pilot study was presented at the International T3L Conference: Tradumàtica, Translation Technologies & Localization in 2011, and a paper based on that presentation has been accepted for publication. A full list of the publications and conference presentations can be consulted in Appendix K.

¹ http://www.youtube.com/watch?v=E6b36IHAMgM

Acknowledgements

This thesis was made possible by the support and help of many people, and I would like to express my gratitude to some of them. I would like to start by thanking my supervisor, Dr. Chris Exton, who has wisely advised me since our very first meeting. I have learnt so much from his wisdom and kind personality. He always had the perfect words to support me and guide me back onto the right path whenever I was lost. I would also like to thank his wife, Geraldine, for helping me with my written English. I would like to thank my other supervisors in the LRC, Dimitra Anastasiou, Reinhard Schäler and David Filip, who have also supported and advised me in my work. I cannot remember the first time I heard the word localisation, but I am sure it was in one of the lessons taught by Dr. Jesús Torres. I cannot thank him enough for all the time he spent teaching and advising me in the field of localisation; it is thanks to his support that I was able to carry out the pilot study during my research stays at the University of Salamanca. I would like to express my sincere gratitude to Rodolfo Raya, secretary of the XLIFF Technical Committee and main developer of MaxPrograms, who kindly modified one of his tools (Swordfish) to accommodate the needs of this research. As well as advising me on specific technical issues, he provided the licences needed to carry out the experiment and donated three full licences to be raffled among the experiment’s participants. I would also like to thank the rest of the XLIFF TC members. I want to thank Karl Kelly, manager of the LRC, for all his help and support over these years, especially in configuring the server on which the translation experiments took place and for his continuous support while they were running. I also want to thank all of my LRC colleagues, who walked with me all along the way and helped me whenever I asked. I also want to thank the CNGL industrial partners who contributed to this research, either with material or by helping me to find volunteers. I want to thank my parents for giving me the best inheritance a person can get: my education.
I also want to thank my siblings and the rest of my family for their constant support during these years. A big thank you also goes to my best friend, Laura, who has been there for me since day one; her moral support helped me to keep working through the worst moments and to find a balance between my life and the PhD. I want to thank all of my other friends, who helped me see that there was life after the PhD. I want to thank Marisé, María and Marta for making my Irish summers much sunnier. Finally, I would like to thank all the participants (students and professionals) who donated their time to this research. This research is supported by Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at the University of Limerick.

Table of Contents
Abstract
Declaration
Acknowledgements
List of Tables
List of Figures
List of Appendices
List of Abbreviations
Chapter 1 – Introduction
1.1. Introduction
1.2. Research Questions – Hypothesis
1.3. Methodological Approach
1.4. Motivation
1.5. Thesis Layout
Chapter 2 – Literature review
2.1. Introduction
2.2. Defining Localisation
2.3. Digital Content
2.4. Standards of localisation
2.4.1. LISA and TMX
2.4.2. OASIS XLIFF
2.5. Previous research on CAT tools and translation memories
Chapter 3 – Methodology I
3.1. Introduction
3.2. Design and Creation