An empirical study on the influence of translation suggestions’ provenance metadata
A Thesis Submitted for the Degree of
Doctor of Philosophy
by
Lucía Morado Vázquez
Department of Computer Science and Information Systems,
University of Limerick
Supervisor: Dr Chris Exton
Co-Supervisor: Reinhard Schäler
Submitted to the University of Limerick, August 2012
Abstract
In the area of localisation there is constant pressure to automate processes in order to reduce the cost and time associated with the ever-growing workload. One of the main approaches to achieving this objective is to reuse previously-localised data and metadata using standardised translation memory formats, such as the LISA Translation Memory eXchange (TMX) format or the OASIS XML Localisation Interchange File Format (XLIFF).
This research aims to study the effectiveness and importance of the localisation metadata associated with the translation suggestions provided by Computer-Assisted Translation (CAT) tools. Firstly, we analysed the way in which localisation data and metadata can be represented in the current specification of XLIFF (1.2). Secondly, we designed a new format called the Localisation Memory Container (LMC) to organise previously-localised XLIFF files in a single container. Finally, we developed a prototype (XLIFF Phoenix) to leverage the data and metadata from the LMC into untranslated XLIFF files in order to improve the task of the translator by helping CAT tools, not only to produce more translation suggestions easily, but also to enrich those suggestions with relevant metadata.
In order to test whether this “CAT-oriented” enriched metadata has any influence on the behaviour of the translator involved in the localisation process, we designed an experimental translation task with translators using a modified CAT tool (Swordfish II). A pilot study with translation students was carried out in December 2010 to test the validity of our methodology. The main study took place between December 2011 and January 2012 with the participation of 33 professional translators divided into three groups.
The analysis of the gathered data indicated that the groups which received the translation memory obtained on average significantly better results (less time and better quality scores) than the group which did not receive any translation memory. In terms of participants’ attitude towards the metadata received, most of the participants did not find it distracting, and the majority of them would prefer a translation memory which contained metadata; finally, half of the participants could mention a case where it was helpful for them.
In this thesis we present our research objectives, the methodology and procedures, an analysis of the results of the experiments and finally we extract reasoned conclusions based on evidence of the importance of metadata during the localisation process.
Declaration
I hereby declare that this thesis is entirely my own work, and that it has not been submitted as an exercise for a degree at any other university.
The tool developed in the first stage of this research, XLIFF Phoenix, was presented in CNGL live demo presentations and in the LRC XV annual International Localisation in 2010; a demo is also publicly available on the internet 1. A short summary of its development was also included in a journal paper (Aouad et al 2011). The pilot study was presented at the International T3L Conference: Tradumàtica, Translation Technologies & Localization in 2011; a paper following that conference has been accepted for publication. A full list of the publications and conference presentations can be consulted in Appendix K.
1 http://www.youtube.com/watch?v=E6b36IHAMgM.
Acknowledgements
This thesis has been made possible thanks to the support and help of many people; I would like to express my gratitude to some of them:
I would like to start by thanking Dr. Chris Exton, my supervisor, who has wisely advised me from the first meeting we had. I have learnt so much from his wisdom and kind personality. He always had the perfect words to support and guide me on the right path when I was lost. I would also like to thank his wife Geraldine for helping me with my written English.
I would like to thank my other supervisors in the LRC: Dimitra Anastasiou, Reinhard Schäler and David Filip, who have also supported me in my work and advised me.
I cannot remember the first time I heard the word localisation, but I am sure it was in one of the lessons taught by Dr. Jesús Torres. I cannot express my gratitude enough for all the time you spent teaching and advising me in the field of localisation. It is thanks to your support that, during my research stays at the University of Salamanca, I was able to carry out the pilot study.
I would like to express my sincere gratitude to Rodolfo Raya, secretary of the XLIFF Technical Committee and main developer of MaxPrograms, who kindly modified one of his tools (Swordfish) to accommodate the needs of this research. As well as advising me on specific technical issues, he also provided me with the necessary licenses to carry out the experiment and gave three full licenses to raffle among the experiment’s participants. I would also like to thank the rest of the XLIFF TC members.
I want to thank Karl Kelly, manager of the LRC, for all his help and support during these years; especially in the configuration of the server where the translation experiments took place and the continuous support during their execution. I also want to thank all of my LRC colleagues who walked with me all this way and helped me whenever I asked them to. I also want to thank the CNGL industrial partners who contributed to this research either with material or by helping me to find volunteers.
I want to thank my parents for giving me the best inheritance a person can get: my education. I also want to thank my siblings and the rest of my family for their constant support during these years.
A big thank you should also go to my best friend, Laura, who has been there for me since day one; her moral support helped me to continue working during the worst moments and helped me to find a balance between my life and the PhD. I want to thank all of my other friends, who helped me see that there was a life after the PhD.
I want to thank Marisé, María and Marta for making my Irish summers much sunnier.
Finally, I would like to thank all the participants (students and professionals) who donated their time to this research.
This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at the University of Limerick.
Table of Contents
Abstract
Declaration
Acknowledgements
List of Tables
List of Figures
List of Appendices
List of Abbreviations
Chapter 1 – Introduction
1.1. Introduction
1.2. Research Questions – Hypothesis
1.3. Methodological Approach
1.4. Motivation
1.5. Thesis Layout
Chapter 2 – Literature review
2.1. Introduction
2.2. Defining Localisation
2.3. Digital Content
2.4. Standards of localisation
2.4.1. LISA and TMX
2.4.2. OASIS XLIFF
2.5. Previous research on CAT tools and translation memories
Chapter 3 – Methodology I
3.1. Introduction
3.2. Design and Creation
3.2.1. Localisation Memory Container
3.2.2. Localisation metadata in XLIFF
3.2.3. XLIFF PHOENIX
Chapter 4 – Methodology II
4.1. Introduction
4.2. Distribution of groups
4.3. Participants
4.4. Experiment environment
4.5. Data used
4.5.1. Source text
4.5.2. Translation Memory
4.5.3. Provenance metadata
4.6. Logistics
4.7. Translation assignment instructions
4.8. Data collection methods
4.8.1. Design of the demographic questionnaire
4.8.2. Design of the task specific questionnaire
4.9. Validity
Chapter 5 – Results
5.1. Introduction
5.2. Participants
5.2.1. Group A – No TM
5.2.2. Group B – TM
5.2.3. Group C – TM + Provenance Metadata
5.3. Demographic Questionnaire
5.3.1. Group A
5.3.2. Group B
5.3.3. Group C
5.3.4. Overall
5.4. Task Specific Questionnaire
5.4.1. Group A
5.4.2. Group B
5.4.3. Group C
5.4.4. Overall
5.5. Task Specific Questionnaire. Group C, additional questions
5.6. Review of the Translation
5.6.1. Group A
5.6.2. Group B
5.6.3. Group C
5.6.4. Overall
5.7. Video observations
5.7.1. Analysis of the Keylog information
5.7.2. Analysis of the videos
5.7.3. Group A
5.7.4. Group B
5.7.5. Group C
5.7.6. Overall
Chapter 6 – Analysis & Interpretation
6.1. Summary and analysis of the demographic data
6.1.1. Personal data
6.1.2. Translation Experience
6.1.3. Experience with CAT tools
6.1.4. Correlation between the different variables measured
6.2. Quality
6.2.1. Correlation with other variables
6.3. Time
6.3.1. Correlation with other variables
6.4. Participants attitudes and opinions towards the metadata
Chapter 7 – Conclusions & Recommendations
7.1. Summary of results
7.2. Conclusion and recommendations
7.3. Impact of the research
7.4. Limitations and Future Research
Bibliography
List of Tables
Table 1. Zachman Framework applied to XLIFF 1.2 attributes
Table 2. Attributes allowed in the alt-trans element
Table 3. Group C. Participants' position
Table 4. Age. Group A
Table 5. Experience Years. Group A
Table 6. Hours per day. Group A
Table 7. Other translation-related activities. Group A
Table 8. Experience Years. Group B
Table 9. Hours per day. Group B
Table 10. Other translation-related activities. Group B
Table 11. Experience years. Group C
Table 12. Hours per day. Group C
Table 13. Other activities. Group C
Table 14. Age. All participants
Table 15. Experience years. All participants
Table 16. Hours per day. All participants
Table 17. CAT tools usage. All participants
Table 18. Doubts. Group A
Table 19. External resources. Group A
Table 20. Doubts. Group B
Table 21. External resources. Group B
Table 22. Linguistic difficulty. Group C
Table 23. Doubts. Group C
Table 24. External resources. Group C
Table 25. Doubts. Overall
Table 26. External resources. Overall
Table 27. Correlation between the demographic variables
List of Figures
Figure 1. XLIFF Phoenix. General Filter
Figure 2. XLIFF Phoenix. Advanced Filter
Figure 3. Enriched XLIFF file in Swordfish II
Figure 4. XLIFF Phoenix Architectural Diagram
Figure 5. XLIFF Phoenix. Main screen
Figure 6. XLIFF Phoenix. Loaded XLIFF source file
Figure 7. XLIFF Phoenix. Loaded LMC file
Figure 8. XLIFF Phoenix. Filtering option in the Wizard
Figure 9. XLIFF Phoenix. General filtering options
Figure 10. XLIFF Phoenix. Advanced filtering options
Figure 11. Enriched file
Figure 12. XLIFF Phoenix. Saving warning message
Figure 13. Enriched XLIFF file in Swordfish
Figure 14. Server Environment
Figure 15. Main language combinations. Group A
Figure 16. Other language combinations. Group A
Figure 17. CAT tools usage. Group A
Figure 18. TM Usage. Group A
Figure 19. Swordfish. Group A
Figure 20. XLIFF. Group A
Figure 21. Main language combinations. Group B
Figure 22. Other language combinations. Group B
Figure 23. CAT tools usage. Group B
Figure 24. TM Usage. Group B
Figure 25. Swordfish. Group B
Figure 26. XLIFF. Group B
Figure 27. Main language combinations. Group C
Figure 28. Other language combinations. Group C
Figure 29. Total number of language combinations. Group C
Figure 30. CAT tools usage. Group C
Figure 31. TM usage. Group C
Figure 32. Swordfish. Group C
Figure 33. XLIFF. Group C
Figure 34. Main language combination. All participants
Figure 35. Other language combinations. All participants
Figure 36. TM Usage. All participants
Figure 37. Swordfish. All participants
Figure 38. Topic of the text. Group A
Figure 39. Experience with Microsoft products. Group A
Figure 40. Experience with Excel. Group A
Figure 41. Experience with Word. Group A
Figure 42. Experience with PowerPoint. Group A
Figure 43. Experience with Access. Group A
Figure 44. Linguistic difficulty. Group A
Figure 45. Topic of the text. Group B
Figure 46. Experience with Microsoft products. Group B
Figure 47. Experience with Excel. Group B
Figure 48. Experience with Word. Group B
Figure 49. Experience with PowerPoint. Group B
Figure 50. Experience with Access. Group B
Figure 51. Experience with Outlook. Group B
Figure 52. Linguistic difficulty. Group B
Figure 53. Topic of the text. Group C
Figure 54. Experience with Microsoft products. Group C
Figure 55. Experience with Excel. Group C
Figure 56. Experience with Word. Group C
Figure 57. Experience with PowerPoint. Group C
Figure 58. Experience with Access
Figure 59. Experience with Outlook. Group C
Figure 60. Experience with Microsoft products. Overall
Figure 61. Experience with Excel. Overall
Figure 62. Experience with Word. Overall
Figure 63. Experience with PowerPoint. Overall
Figure 64. Experience with Outlook. Overall
Figure 65. Linguistic difficulty. Overall
Figure 66. Screenshot of one of participants’ LISA QA Model results
Figure 67. LISA QA Model results. Group A
Figure 68. LISA QA Model results. Group B
Figure 69. LISA QA Model results. Group C
Figure 70. LISA QA Model results. Arithmetic mean all groups
Figure 71. LISA QA Model results. Overall
Figure 72. Time all groups
List of Appendices
Appendix A – Demographic Questionnaire
Appendix B – Task Questionnaire. Groups A and B
Appendix C – Task Questionnaire. Group C
Appendix D – Answers to the Demographic Questionnaire
Appendix E – Answers to the Task Questionnaire. Groups A and B, and first part of Group C
Appendix F – Answers to the Task Questionnaire. Second part of Group C
Appendix G – Translation Evaluation. Results
Appendix H – Keylog Information
Appendix I – Video Observations
Appendix J – Translation Text
Appendix K – Publications, conference presentations and other research related activities
Appendix L – Call for participation
Appendix M – Participant Information Sheet
List of Abbreviations
AR: Arabic
CA: Catalan
CNGL: Centre for Next Generation Localisation
CSIS: Computer Science and Information Systems
DCM: Digital Content Management
DTP: Desktop Publishing
DU: Dutch
EN: English
ES: Spanish
ETSI: European Telecommunications Standards Institute
FR: French
GALA: Globalization and Localization Association
GL: Galician
IT: Italian
LISA: Localization Industry Standards Association
LMC: Localisation Memory Container
LSP: Language Service Provider
LQA: Language Quality Assurance
LRC: Localisation Research Centre
OASIS: Organization for the Advancement of Structured Information Standards
OAXAL: Open Architecture for XML Authoring and Localization Reference Model
OSCAR: Open Standards for Container/Content Allowing Re-use
PMD: Provenance Metadata
QA: Quality Assurance
SF: Systems Framework
TM: Translation Memory
TC: Technical Committee
Vkeys: Virtual Keys
XLIFF: XML Localisation Interchange File Format
XML: Extensible Markup Language
XSL: Extensible Stylesheet Language
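Eu non se esvarío. Vexo o mundo darredor de min e adoezo por entendelo. Vexo sombras e luces, nubeiros que viaxan, lume, árbores. Qué é todo esto?
[I do not know whether I am raving. I see the world around me and I ache to understand it. I see shadows and lights, travelling clouds, fire, trees. What is all this?]
(Neira Vilas 1961 p.28)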
Chapter 1 – Introduction
1.1. Introduction
Localisation is the “linguistic and cultural adaptation of digital content” (Schäler 2009 p.157). The term2 is normally used to define the technical processes that are needed to transform digital content into another language and culture, as well as the powerful industry (involving software companies, Language Service Providers (LSP), translators, localisation engineers, etc.) which is behind those processes.
2 A discussion of the different definitions of localisation over time can be found in the section “Literature Review”.
The localisation industry
If we exclude the open source movements from the bigger picture, localisation is an industry-driven area, mainly led by the big software companies, which have based their expansion into foreign markets on this field: they need to localise their products in order to sell them in other areas of the globe where languages other than English are spoken. Ireland has for years been the leading country in this field; the sector is estimated to be worth 680 million Euros annually in this country, employing around 14,000-16,000 professionals (CNGL 2012a p.6). Many of the largest software companies are based here (ibid), as well as the largest localisation companies and the biggest research centre, the Centre for Next Generation Localisation (CNGL). This country was also the birthplace of the XML Localisation Interchange File Format (XLIFF TC 2008a p.9), one of the key data exchange standards in this area.
The CNGL is an academia-industry synergy whose objective is “to produce substantial advances in the basic and applied research underpinning the design, implementation and evaluation of the blueprints for the Next Generation Localisation Factory” (CNGL 2012b). The centre is composed of more than 100 researchers based in four Irish universities (University of Limerick, Dublin City University, University College Dublin and Trinity College Dublin) and 10 software and localisation industry partners. The research within the CNGL is divided into four main tracks: Integrated Language Technologies (ILT), Digital Content Management (DCM), Localisation (LOC) and Systems Framework (SF). The present research is embedded in the LOC research group based at the Localisation Research Centre (LRC), in the Department of Computer Science and Information Systems (CSIS) at the University of Limerick. In this research the help of the industrial partners was of key importance: they provided us with real data to be used in our experiments and helped us to find volunteers for those experiments.
The localisation process
If we study localisation as a process, we can find several phases or steps that are needed to create a successful new localised product. These phases involve different players, different tools and different levels of technical experience. A good example of the steps that a localisation project can have can be found in Esselink (2000 p.17): “Pre-Sales Phase3, Kick-Off Meeting, Analysis of Source Material, Scheduling and Budgeting, Terminology Setup, Preparation of Source Material, Translation of Software, Translation of Online Help and Documentation, Engineering and Testing of Software, Screen Captures, Help Engineering and DTP of Documentation, Processing Updates, Product QA and Delivery and Project Closure”. As the steps listed above show, this field has a multidisciplinary nature: it combines linguistics (necessary to transfer knowledge from one language and culture to another) and computer science (as it deals with digital content which needs to be manipulated and transformed into other systems). It also has dependencies on other areas such as marketing, desktop publishing or workflow management.
3 Esselink defines the steps needed to localise a software product. Localisation, as we understand it, does not only involve this kind of product, and different steps would be needed for other types of product, e.g. a website.
The translation task and CAT tools
After acknowledging the vast area of study that localisation represents and the background of the researcher, we decided to focus specifically on the translation task. The translation task in the localisation process involves professional translators4, and Translation Memory (TM) tools are the standard tool option (García 2008 p.50). Computer Assisted Translation (CAT) tools are computer-based programs that were designed to assist translators in their daily task by automating some of the processes. They can have a wide range of functionalities: terminology management, translation memory systems, spellcheckers, quality assurance tests, etc. The purpose behind these tools is to help translators to automate their work, improve their productivity, reduce costs and ensure more consistent terminology in their output. In this research we focus only on CAT tools that have the Translation Memory functionality, and we refer to them interchangeably as CAT tools or TM tools throughout this dissertation. Although they may also represent an interesting area of research, other functionalities or aspects of CAT/TM tools are outside the scope of this research.
4 There are, however, crowdsourcing initiatives, such as the localisation of the Facebook platform, that have involved bilingual users.
This is an industry that, whatever name it uses, is based on selling lots of translated words, with quality often taken for granted, time-to-market an important constraint, and price paramount. (García 2006 p.15)
The objective of the CNGL project is to improve the localisation process along the three traditional commercial axes: better quality (which García states is normally taken for granted), less time (the above-mentioned “time-to-market”) and reduced cost (the aforementioned “price paramount”), together with an increase in automation, which is perceived by many to be the only means of dealing with the future demands created by the increasing use of the web by a linguistically varied user base.
Translation Memory tools and formats
The automation and leveraging of previous translations have been identified as one of the key solutions (TAUS Data Association 2010 p.1) to the above-mentioned challenges in this area. Translation Memory tools are CAT tools whose main functionality is the management of translation memories. Translation Memories (TM) are databases composed of previously localised translations, which are segmented and stored for future reuse. Many agree that translation memories have proven to help translators and to increase their productivity and the consistency of their texts (Brkić et al 2009). However, TMs are not a panacea for all localisation processes; it has been suggested that certain text types work better with TM systems than others (Christensen 2003; Bowker 2005; Christensen and Schjoldager 2010), and other studies also seem to establish a relation between the translation of technical texts and a higher use of TM systems by translators (Fulford and Granell-Zafra 2005; Lagoudaki 2006a). This relation could be explained by the nature of technical texts themselves, which use a limited range of terms and normally contain lexical and phraseological repetitions, all of which make this the most suitable text type for TM usage (Bowker 2005; Christensen and Schjoldager 2010). Critics of TM use have also pointed out additional constraints that this kind of tool imposes on the translator: the possible perpetuation of errors and confusion (de Saint Robert 2008 p.114) and the need to spend additional time training translators in the use of the tool in order to obtain improvements in productivity and consistency (de Saint Robert 2008 p.115; García 2008 p.55).
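To make the idea of segment reuse concrete, the following minimal Python sketch looks up a new source segment in a small translation memory and returns the most similar stored segment. It is an illustration under our own assumptions and not the matching algorithm of any particular TM tool: the `TM` list, the `best_match` function, the 0.7 threshold and the character-based similarity taken from Python's standard `difflib` module are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: (source segment, target segment) pairs.
TM = [
    ("Click the Save button.", "Haga clic en el botón Guardar."),
    ("Open the File menu.", "Abra el menú Archivo."),
]


def best_match(segment, memory, threshold=0.7):
    """Return (score, source, target) for the most similar stored segment.

    Returns None when no stored segment reaches the (assumed) threshold.
    """
    best = None
    for source, target in memory:
        # Character-based similarity between 0.0 and 1.0 (illustrative metric only).
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


if __name__ == "__main__":
    print(best_match("Click the Open button.", TM))
```

Real TM tools use their own matching algorithms and segmentation rules; the sketch only shows the general principle of retrieving a previous translation for a similar segment.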
Translation Memory and Data Exchange Standards: TMX and XLIFF
The format in which translation memories are stored clearly determines their future reuse. In order to avoid being locked into proprietary solutions, a standardised TM format, the Translation Memory eXchange (TMX), was developed by the Open Standards for Container/Content Allowing Re-use (OSCAR) group of the now defunct Localization Industry Standards Association (LISA). This format is an XML language which allows the storage of translation units for later leverage in future projects. TMX has proved to be very successful and is supported by the main CAT tools in the industry as well as in the Open Source world (Gough 2010), although since the dissolution of LISA in February 2011 its future has been uncertain. A complete review of the TMX standard and its support in CAT tools can be found in the section Standards of Localisation in Chapter 2.
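As an illustration of how translation units are stored in TMX, the sketch below parses a small hand-written TMX 1.4 fragment with Python's standard `xml.etree.ElementTree` module. The sample content, the tool name `example-tool`, the identifier `translator-a` and the `read_units` function are assumptions made for this example; only the element and attribute names (`tu`, `tuv`, `seg`, `creationid`, `creationdate`) come from the TMX format itself.

```python
import xml.etree.ElementTree as ET

# Clark-notation key under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX fragment with one translation unit (English and Spanish).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu creationid="translator-a" creationdate="20110115T120000Z">
      <tuv xml:lang="en"><seg>Open the File menu.</seg></tuv>
      <tuv xml:lang="es"><seg>Abra el menú Archivo.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(tmx_text):
    """Return one dictionary per <tu>, with its segments keyed by language."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        segments = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        units.append({
            "creationid": tu.get("creationid"),
            "creationdate": tu.get("creationdate"),
            "segments": segments,
        })
    return units


if __name__ == "__main__":
    for unit in read_units(SAMPLE_TMX):
        print(unit)
```

The `creationid` and `creationdate` attributes show that a translation unit can carry some information about its origin alongside the translated text.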
Another standardised data exchange format in the localisation area, which was not originally designed to serve as a TM format but can be used for that purpose, is the XML Localisation Interchange File Format (XLIFF). In a nutshell, XLIFF is a data container that carries localisation content from one localisation process to the next without loss or corruption of data. The first version of this standard, developed by the Organization for the Advancement of Structured Information Standards (OASIS), was published in 2001. The current version is 1.2 and it is moving towards 2.0. XLIFF was created as a standardised intermediate format for localisation in which localisable data extracted from different formats can be stored; this allows CAT tool developers to work with a single format instead of having to develop a new filter/parser for every new format, so that they can concentrate their efforts on improving other functionalities. The main feature of XLIFF is that it enables potential interoperability5 between different tools, thus eliminating vendor lock-in. An XLIFF file is basically composed of two differentiated sections: the <header> and the <body>, where extracted localisable information from another file is segmented and stored into several translation units (<trans-unit>).
5 In the localisation context, a file is “interoperable” if it can be transported and/or modified between different tools without loss or corruption of data; the XLIFF standard aims to provide a format that allows the creation of files with that feature.
Research on Translation Memories
The influence of TMs on the translator’s behaviour has been the subject of some recent research efforts (Christensen and Schjoldager 2010). With the exception of Teixeira (2011), little has been studied about the metadata that surrounds the translation suggestions presented to translators through TM tools; these metadata can represent provenance information (about the author, date, state, or any other identifying information of the previous translation) or other additional information regarding the translation suggestion, such as the match percentage. To narrow the scope of our research, we concentrate only on provenance metadata. Although provenance metadata was not the central object of study of previous research, we did find some references to this kind of information: in Guerberof (2009), an experiment comparing translation suggestions coming from two different sources (translation memories and machine translation) is presented, and the provenance metadata that could identify the origin of each suggestion is hidden on purpose “because translators would ignore the nature of the source text, be it MT or TM, and thus they would not be biased towards either type of text during the post-editing process” (ibid); this statement clearly implies that provenance information could have an impact on the translator’s behaviour. In another study, de Saint Robert (2008 pp.113–114), when discussing CAT tool constraints, points out as a negative factor that commercial tools do not always guarantee “document traceability” of the translation suggestions they provide. We can again interpret this as meaning that the absence of information about “document traceability”, or “provenance metadata” as we call it, has an impact (in this case negative) on the translator’s behaviour. Later in her paper, the author states:
“Additional tools are document alignment tools by language pairs. Indexing of large text corpora for retrieval of precedents are felt preferable to tools that provide text segments, be they paragraphs, sentences or sub-units with their respective translation, but6 without any indication of date, source, context, originator, name of translator and reviser to asses adequacy and reliability in an environment where many translators are involved.” (de Saint Robert 2008 p.118)
6 Emphasis added by the researcher.
Focusing on the last part of that paragraph, we can infer that provenance information (“indication of date, source, context, originator, name of translator and reviser”) helps translators to assess the adequacy and reliability of a given translation suggestion. We thus identified a gap in the literature of our field: to our knowledge, no research effort has focused on the provenance metadata that surrounds translation suggestions. We therefore decided to study the influence of the provenance metadata of translation suggestions on human translators during the translation process.
Provenance metadata
Metadata is generally defined as data about data; in computer science, the term normally indicates data that describes or carries information about other data (for example, a creation date). Provenance metadata is data which contains information about the origin of other data. In our research, the main object of study is the provenance metadata that surrounds translation suggestions, that is, the information about the origin of a translation suggestion: for example, who translated it or when it was translated. Provenance metadata is not normally exposed to translators during their work, and data exchange standards do not yet offer the possibility of adding most of the metadata items (which we identified as containing provenance information) to the elements that contain the translation suggestions in a standardised form that could be understood by different CAT tools. More information about our study of provenance metadata can be found in Chapter 3.
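The notion of provenance metadata attached to a translation suggestion can be pictured as a small data structure. The Python sketch below is only one possible representation: the class names `Provenance` and `Suggestion`, their fields and the `describe` helper are hypothetical and are not taken from any standard or tool discussed in this thesis.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance items describing where a suggestion comes from."""
    translator: Optional[str] = None      # who translated it
    translated_on: Optional[date] = None  # when it was translated
    source_project: Optional[str] = None  # which earlier project it came from


@dataclass(frozen=True)
class Suggestion:
    """A translation suggestion together with its provenance metadata."""
    source: str
    target: str
    provenance: Provenance = field(default_factory=Provenance)


def describe(suggestion):
    """Format a suggestion as it might be shown to a translator."""
    p = suggestion.provenance
    parts = []
    if p.translator:
        parts.append(f"translator: {p.translator}")
    if p.translated_on:
        parts.append(f"date: {p.translated_on.isoformat()}")
    if p.source_project:
        parts.append(f"project: {p.source_project}")
    details = "; ".join(parts) if parts else "no provenance available"
    return f"{suggestion.target} [{details}]"


if __name__ == "__main__":
    s = Suggestion("Open the File menu.", "Abra el menú Archivo.",
                   Provenance("translator-a", date(2011, 1, 15)))
    print(describe(s))
```

A suggestion created without provenance still works in this sketch, which mirrors the common situation in which the translator sees only the suggested text.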
1.2. Research Questions – Hypothesis
In the very early stage of our research, our initial question was “how can we store, organise and reuse localisation knowledge?” In order to answer this question, that is, to establish how localisation knowledge was being stored and transmitted across different processes and over time, we decided to focus on the existing localisation data exchange standards. These standards have been specifically developed to allow the reuse of previously localised content and, by doing so, they help to increase the productivity of translators. After studying the different standards, we decided to focus only on XLIFF as a vehicle to investigate how data and metadata could be stored and transmitted to future assignments. As is often the case, our research question became more refined as we understood more about the problem domain.
We identified a gap in the reuse of provenance metadata: most of the provenance information we identified in our analysis of the standard could not be reinserted in the alt-trans element, the element that can contain translation suggestions. We therefore decided to demonstrate that this metadata could be inserted automatically and presented to translators during their work, and we designed and developed a tool prototype (XLIFF Phoenix) for that purpose. Having demonstrated that provenance metadata could be stored, organised and presented to translators, we redefined and narrowed our research objectives to study the influence that provenance metadata can have on translators’ behaviour during their work. A complete description of all the steps we carried out to answer our initial research question can be found in Chapter 3.
As previously stated, after an initial broader question, the principal aim of this research became to determine the influence that translation suggestions’ provenance metadata has on the behaviour of human translators during their work with Computer-Assisted Translation tools. The main question that we want to answer in this research is the following:
How does the provenance that surrounds translation memory suggestions influence the behaviour of translators during their work in a localisation process?
Three hypotheses were considered for this main question.
H0- Provenance metadata has no effect on the behaviour of translators during their work.
H1- Provenance metadata has a positive impact on the behaviour of translators during their work.
H2- Provenance metadata has a negative impact on the behaviour of translators during their work.
Based on the passage quoted above from de Saint Robert (2008 p.118), we tended to believe that provenance metadata can have a positive impact on the translator’s behaviour (H1). By positive impact we mean a reduction in the time spent on the translation, an improvement in the quality of the translation and, through the combination of these two, a reduction in the cost of the task, in comparison with the same situation without the provenance metadata. By negative impact we mean negative attitudes of the translators towards the metadata provided (which could be obtained through some answers to the questionnaires), an increase in the time spent on the translation, or lower quality in the translation task. In the experimental phase of our research, explained in detail in Chapter 4, the impact on the translator’s behaviour was measured using a triangular data collection method that combined questionnaires, screen recordings, keystroke information and the translated file itself. Each of these indicators was analysed independently, and the results are presented in Chapters 5 and 6.
1.3. Methodological Approach
A method triangulation strategy was used in our study. We first applied a “design and creation strategy” (Oates 2006 p.108): we studied how localisation knowledge was being stored, organised and reused, which led us to the study of the current localisation data exchange formats. We found a gap in the treatment of provenance metadata: in some cases it could not be encapsulated along with the translation suggestion without breaking the validity of the file. Therefore, we decided to design and develop a tool prototype (XLIFF Phoenix) that automated the leveraging of localisation data and metadata into new files to be presented to translators within a CAT tool. Once we had an automated way to retrieve localisation provenance metadata and insert it along with the translation suggestions in new files, we decided to test whether these enriched files (with provenance metadata) had an influence on the translator’s behaviour, and subsequently on his or her output, during the translation task in the localisation process. We used an “experimental strategy” (Oates 2006 p.126) to answer this second and main question.
Object of study: Tools vs. Standards
Is metadata first defined in the standards and later implemented in CAT tools? Or, on the contrary, are CAT tool providers influencing the development of standards in order to introduce the metadata items that they use or want to use in their tools? Taking this dichotomy into account, we identified two possible approaches to studying provenance metadata in translation suggestions:
1. Study how current CAT tools show provenance metadata to translators. That is, studying what type of data is presented and how it is presented in the GUI to their potential users (project managers, translators, reviewers, etc.).
2. Study the current standards for exchanging translation and localisation data and observe which provenance metadata items are defined in their specifications, which items are missing, and if that is the case, how they can be incorporated.
The main issue with the first approach is the rapid and constant development of CAT tools, which would make the object of study obsolete by the time the research was published; this is one of the biggest impediments to research on TM tools (Pym 2011 p.5). Besides, most of these CAT tools are developed by private companies, which means that we have no direct access to their design and development strategies, and we did not want our research to rely on something over which we have no control. Taking into account the estimated duration of our PhD research and the fact that we are not members of any CAT tool development group, we decided to discard this option.
The second approach, concentrating our efforts at the standard level, was the one chosen. First of all, the two main standards considered from the beginning were and are developed in an open way; as they are open standards by definition, we have access to the technical committee documentation and other related official material (such as the development wiki and meeting minutes). In addition, I have been a member of the XLIFF Technical Committee since the beginning of this research (March 2009), which has given me better insight into its development and the possibility of having a direct influence on it. Moreover, CAT tool developers participate actively in the development of translation data exchange standards: 23% of XLIFF TC members in April 2012 were CAT tool developers (Filip 2012 p.33). We can therefore assume that their requirements in terms of metadata inclusion are regularly presented to the Technical Committee for approval. Standards, in turn, influence the development of CAT tools: every time a new version of a standard is released, a CAT tool might need to be modified if its developer wants to support it. Lastly, open standards develop at a much slower pace, which not only keeps this research valid for longer but also means that its results could directly influence the development of the standard.
DESIGN AND CREATION
The XML Localisation Interchange File Format (XLIFF) was developed to allow interoperability between CAT tools and the seamless transmission of data and metadata during the localisation process (XLIFF TC 2008a). It is a bilingual document format that may contain parallel data inside the source and target elements of its translation or binary units. These parallel texts can easily be transformed into TMX using an XSL template (Raya 2004) and reused in future projects. However, much of the existing data and metadata is lost during that process, as the TMX format was not designed to capture as much data as an XLIFF document can contain. In our research we use the whole XLIFF document as a memory element. We have developed a tool prototype, XLIFF Phoenix, which allows existing translations to be leveraged into unlocalised XLIFF files.
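The loss of information that occurs when XLIFF is flattened into TMX can be illustrated with a short sketch. The code below is not the XSL template cited above; it is a Python stand-in written for illustration, and the function name `xliff_to_tmx`, the placeholder header values and the decision to read only the first file element are all assumptions. It copies nothing but the source and target text of each translated unit, so notes, alternative translations and other unit-level metadata simply disappear.

```python
import xml.etree.ElementTree as ET

XLIFF = "{urn:oasis:names:tc:xliff:document:1.2}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xliff_to_tmx(xliff_xml):
    """Flatten the first <file> of an XLIFF 1.2 string into a TMX 1.4 string.

    Hypothetical helper: only source/target text is carried over.
    """
    file_el = ET.fromstring(xliff_xml).find(f"{XLIFF}file")
    src_lang = file_el.get("source-language", "und")
    tgt_lang = file_el.get("target-language", "und")

    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header attributes required by TMX; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch",
        "creationtoolversion": "0",
        "segtype": "sentence",
        "o-tmf": "xliff",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": file_el.get("datatype", "unknown"),
    })
    body = ET.SubElement(tmx, "body")

    for unit in file_el.iter(f"{XLIFF}trans-unit"):
        source = unit.find(f"{XLIFF}source")
        target = unit.find(f"{XLIFF}target")
        if source is None or target is None:
            continue  # untranslated units contribute nothing to the memory
        tu = ET.SubElement(body, "tu")
        for lang, element in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            # itertext() flattens inline markup; notes and alt-trans are ignored
            ET.SubElement(tuv, "seg").text = "".join(element.itertext())
    return ET.tostring(tmx, encoding="unicode")
```

Anything that the function does not explicitly copy, such as a reviser's note attached to a translation unit, is absent from the resulting TMX, which is the limitation that motivates treating the whole XLIFF document as the memory element.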
The Localisation Memory Container (LMC) is an XML vocabulary that was developed as a data descriptor to allow the storage of previously localised XLIFF documents within a single file. XLIFF Phoenix was developed to obtain information from the LMC and enrich unlocalised XLIFF documents with translation suggestions.
XLIFF Phoenix compares the XLIFF documents contained in the LMC with an unlocalised XLIFF file introduced into the system, and leverages the matching translation units along with their corresponding metadata (origin, source and target language, etc.); both the data and the metadata are inserted in an alt-trans element inside the parent trans-unit element. The tool then exports a valid XLIFF file that can be read by most of the CAT tools available on the market⁷ and subsequently used by translators and localisers, who would benefit from the added data.
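The leveraging step described above can be sketched in a few lines of Python. This is an illustration only and not the XLIFF Phoenix implementation: the function name `leverage`, the restriction to exact matches on the plain source text, and the use of the memory file's `original` attribute as the value of `origin` are assumptions made for the example. The `alt-trans` element and its `origin` and `match-quality` attributes are part of XLIFF 1.2.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise XLIFF elements without a prefix


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def leverage(memory_xml, new_xml):
    """Insert exact matches from memory_xml into new_xml as alt-trans elements.

    Hypothetical helper: both arguments are XLIFF 1.2 documents given as strings.
    """
    # Index the previously localised units by their plain source text.
    memory = {}
    for file_el in ET.fromstring(memory_xml).iter(q("file")):
        origin = file_el.get("original", "unknown")
        for unit in file_el.iter(q("trans-unit")):
            source, target = unit.find(q("source")), unit.find(q("target"))
            if source is not None and target is not None and source.text:
                memory.setdefault(source.text, (target.text, origin))

    root = ET.fromstring(new_xml)
    # list() so that the tree is not modified while it is being traversed
    for unit in list(root.iter(q("trans-unit"))):
        source = unit.find(q("source"))
        if source is None or source.text not in memory:
            continue
        target_text, origin = memory[source.text]
        # The suggestion and its provenance travel together inside alt-trans.
        alt = ET.SubElement(unit, q("alt-trans"),
                            {"origin": origin, "match-quality": "100%"})
        ET.SubElement(alt, q("source")).text = source.text
        ET.SubElement(alt, q("target")).text = target_text
    return ET.tostring(root, encoding="unicode")
```

Appending the `alt-trans` element after the existing children keeps it after `source` and `target`, which is the position that XLIFF 1.2 expects; fuzzy matching and richer metadata are deliberately left out of the sketch.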
EXPERIMENTS
Few experimental studies have been carried out in the research field of translation memories in recent years (Bowker 2005; Christensen and Schjoldager 2010). There are various reasons for this shortage: the difficulty of obtaining valid participants, useful research objects and supporting data (valid documents and TMs); the different rates of development in academia and industry (Pym 2011 p.5); and the difficulty of controlling all the variables that can affect the external validity of an experiment. As far as CAT procedures are concerned, experiment variables include the profiles of participants, workflow, time pressure, text types, translation instructions, TM/MT programs and the language combinations chosen (ibid, p.7), as we found when designing our own approach. None of these constraints discouraged us from continuing with the experimental approach, and we tried to overcome them with all the means and resources available to us. To test the validity of our methodology we carried out a pilot study in December 2010 with translation students. One year later, in December 2011 and January 2012, the main experiment took place with the participation of professional translators.
⁷ The resulting file is a valid XLIFF 1.2 file; therefore, tools that support this version of the standard are able to work with it.
Pilot Study
The experiment was carried out in a computer room in the Faculty of Translation and Documentation of the University of Salamanca, Spain. Although all participants worked simultaneously in the same room, they were not allowed to communicate with each other. The participants in this experiment were ten translation students (nine females and one male) in the final year of their Translation and Interpreting undergraduate degree course.
The text given to the translators was donated by Microsoft, one of the CNGL’s industrial partners; it was part of the official Microsoft Excel help documentation. We converted the original HTML file into XLIFF and used our tool (XLIFF Phoenix) to leverage past translations, together with their corresponding metadata, into it. We created three versions of the document, which included different levels of data and metadata at different points. We split the participants into three groups (A, B and C), and each group received a different version of the master enriched file.
Prior to the translation task, the participants were given general instructions for their assignment. They were asked to translate one file from English into Spanish (es-ES) using Swordfish II, plus any other useful resource they could access via a browser and an internet connection. They were told to behave as they would in a real translation assignment, the only restriction being that they could not communicate with each other.
In order to obtain the maximum amount of data from the experiment, and subsequently increase the validity of its results, we decided to use a triangular data collection mechanism: questionnaires, screen recordings and the output translated document. The participants completed two questionnaires: the first aimed to obtain background information about the participant and was completed before the translation task; the second focused on the translator’s impressions of the task and was filled in after its completion. The different data collection methods produced different types of data, each of which needed to be analysed in its own way. The quality of the output translation was analysed using the LISA QA Model, an error-based system that deducts points from an initial 100% rate depending on the type of error found and its severity. The video recordings were watched in a video player program, and information about the translator’s behaviour and the time spent on each segment was extracted and annotated. Finally, the data obtained from the questionnaires was analysed according to the nature of each question, quantitative or qualitative.
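The deduction principle behind an error-based quality score can be sketched as follows. The penalty points per severity level and the per-category weights below are hypothetical values chosen for the example; they are not the official LISA QA Model figures, and the function name `quality_score` is likewise an assumption.

```python
# Hypothetical penalty points per severity level (not the official LISA values).
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}

# Hypothetical multipliers per error category; unlisted categories count as 1.0.
CATEGORY_WEIGHT = {"accuracy": 1.0, "terminology": 1.0, "language": 1.0}


def quality_score(errors):
    """Return a score out of 100 after deducting points for each error.

    `errors` is a list of (category, severity) pairs annotated by a reviewer.
    """
    deducted = sum(
        SEVERITY_POINTS[severity] * CATEGORY_WEIGHT.get(category, 1.0)
        for category, severity in errors
    )
    # The score starts at 100 and never drops below zero.
    return max(0.0, 100.0 - deducted)


if __name__ == "__main__":
    annotated = [("terminology", "minor"), ("accuracy", "major")]
    print(quality_score(annotated))
```

Under these assumed weights, a translation with one minor terminology error and one major accuracy error loses six points, while an error-free translation keeps the full initial rate.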
Although our focus was not on the results of the experiment but on the validity of the methods used, we did extract some findings from this preliminary study. From the participants’ answers to the retrospective questionnaire we deduced that some metadata elements were taken into consideration more than others (contact-name, target-language and date were the most consulted metadata items). We also observed an increase in quality (better LISA QA Model rates) and productivity (less time spent) in the sections that contained metadata information. More detailed information about the design of the pilot study can be found in Chapter 4.
Main Study
The main study was carried out one year after the pilot study. In this new iteration our target participants were professional translators. Translators traditionally work in two modes in the localisation industry: as in-house translators, working in the offices of a translation company or within the translation department of a bigger company (such as a software company); or as freelance translators, working from their own homes and receiving projects from different companies (commonly translation companies). We first explored the possibility of conducting our main experiment in a translation company with in-house translators. This option was soon discarded for two main reasons: first, after consulting our CNGL partners we learned that none⁸ of them had a sufficient number of translators with the required language combination (EN-ES) in a single place; and second, running the experiments in an external translation company would have been economically unfeasible with our resources. We then decided to target freelance translators, who work for translation companies from their own homes, usually with their own CAT tools on their own computers (although there are already some cloud-based translation platforms that are accessible via a web browser). Participants were recruited through the CNGL partners and through an open call for participation that was sent to several translation and localisation mailing lists.
⁸ Or at least the ones that answered our request.
A free webinar on the CAT tool Swordfish was offered to all the participants. The webinar attracted some of the participants and also allowed us to explain the basic functionality of the tool and the process of the experiment itself. Participants did not receive monetary compensation for their time, but three full licences for Swordfish III were raffled among the participants who completed the experiment.
A controlled environment was designed to carry out the translation task. A remotely accessible server was set up in the Localisation Research Centre and 100 user accounts were created. The operating system was Windows 7, with Spanish as its main input language. After our own internal testing, the system was technically tested with seven participants in November 2011, with positive results. The server could only be accessed by one person at a time; we therefore used the Doodle⁹ platform to allow participants to choose the time slot that best suited their needs. On average, four participants completed the experiment per day. A total of 59 valid experiments were successfully recorded.
We used the same data gathering methods as in the pilot study (questionnaires, video recording and the output translation file). On this occasion we used a different screen recording program, which allowed us to introduce a new data source: keystroke logging. The keystroke data, obtained from an XML file produced by the recording program, allowed us to measure automatically the typing effort of each participant.
A different distribution of participants and data was put in place for this experiment. In the pilot study, each of the three documents contained three sections (one without TM, one with TM, and one with TM plus provenance metadata), whose positions varied from group to group. For the main study we decided to simplify the design and its subsequent analysis: Group A received a text without TM, Group B received the same text with TM, and Group C received the same text with TM and its corresponding provenance metadata. A different text was composed from segments of the Microsoft Excel help documentation that were even in terms of difficulty and length. More detailed information about the design of the main experiment and how the text was composed can be found in section 2 of Chapter 4.
⁹ http://www.doodle.com/
Although our target participants were professional translators, we did not state that in the call for participation, and we left the experiment open to translation students as well. Group A (the group that received the text without TM), which was our control group, had ten participants, seven of whom were professional translators. Group B (the group that received the text with TM) also had ten participants, seven of them professional translators. Finally, Group C (the group that received the text with TM and provenance metadata) had 39 participants, 25 of them professional translators. The larger number of participants in Group C was due to our desire to obtain a significant amount of qualitative data on the use of provenance metadata (through the retrospective questionnaire), as Group C was the only group exposed to that information.
One of the main threats to the validity of our experiments was the lack of representativeness of the participants and the possible subject variability between the groups, which could distort the results. We measured different personal variables that could affect our results: personal data (age and gender), translation experience (years of experience, current position, translation working hours per day), and experience with CAT tools. After a descriptive analysis of our dataset (which can be consulted in Chapter 5), we determined that there were no significant differences between the average measurements of the three groups. Therefore, differences observed between the groups in the two factors measured in the experiments (time and quality) could not be directly attributed to differences in the profiles of the participants.
As stated before, we measured the time that each participant spent on the translation task and the quality of the result. In terms of quality, Groups B and C obtained significantly better scores than Group A, which indicates that the use of a translation memory (with or without metadata) improves the quality of the translation. Group B (the one without metadata) obtained slightly better results than Group C; however, the difference was not significant enough to infer any cause-effect relationship. In terms of time we obtained similar results: Group A spent more time than Groups B and C, which again indicates that the use of a translation memory (with or without metadata) reduces the time needed. Group B spent less time than Group C; however, once more the difference was not significant enough to draw any conclusion. A complete analysis and interpretation of the data obtained in the experiments can be found in Chapter 6.
We also studied the attitude that translators from Group C had towards the metadata they received. The majority of them would prefer to receive translation memories with provenance metadata; however, not all the participants declared that the metadata had helped them during their work. Two key concepts arose from their answers: trust and reliability. They stated that the quantity of metadata was not important; what was of key importance was the meaning that those metadata items had for them. A complete analysis and interpretation of the participants’ attitudes can be found in Chapter 6.
To sum up, our experiments did not reveal any significant difference in the behaviour of the translators, in terms of quality and time, attributable to the presence or absence of the metadata. However, we found significant evidence that a translation memory (with or without metadata) can improve translators’ output (better quality in less time). We did not find negative attitudes among the participants that would imply a negative impact on their behaviour, although not all the participants stated that the metadata actually helped them.
1.4. Motivation
There has been some research on TMs and on how they affect the behaviour of translators during their work (Christensen and Schjoldager 2010). These studies analyse the effect of translation memories and take into account only the translation suggestion that is given to the translator. With the exception of the initial work of Teixeira (2011), discussed in the literature review section, the provenance metadata that surrounds translation suggestions has never been the central focus of any of those studies.
The present research aims to fill that gap and set a solid base for future research in this field applying different research strategies and methods, such as eye tracking.
1.5. Thesis Layout
The structure of this dissertation is as follows:
Chapter 2 – Literature Review. This chapter contains a historical literature review of the localisation field, with specific emphasis on localisation standards and previous research on translation memories.
Chapter 3 – Methodology I. In this chapter the general triangular methodology strategy is presented. Then the first part of this methodology, “Design and Creation”, is explained in detail: the development of the LMC language and the XLIFF Phoenix prototype tool.
Chapter 4 – Methodology II. This second methodology chapter explains in detail the design and implementation of the pilot study and the main experiment: recruitment of the participants, data preparation, issues encountered, setup of the server, tools used and data collection methods.
Chapter 5 – Findings. A descriptive analysis of the data obtained in the experiments is presented in this chapter.
Chapter 6 – Analysis & Interpretation. An analysis of the data is presented in this chapter, along with an interpretation of its results.
Chapter 7 – Conclusions and Recommendations. This chapter contains a summary of the research results, as well as recommendations for future developments of this research.
Chapter 2 – Literature review
Writing about software localization is like fighting against time. (Esselink 2000 p.preface)
2.1. Introduction
Localisation¹⁰ and its industry form a relatively new area that goes back to the 1980s and has since grown dramatically, thanks also to the development of the internet and the so-called globalisation of the world’s economy (Esselink 2000 pp.5–6). It is an industry-oriented discipline, and academic research efforts have been few, especially when compared with other related areas; in fact, until very recently there was only one research centre dedicated to localisation (the Localisation Research Centre, LRC, at the University of Limerick). The first academic journal specific to the topic was also edited by the LRC. In 2008 the CNGL was born in Ireland, and four Irish universities (University of Limerick, Trinity College Dublin, University College Dublin and Dublin City University) are leading the research in the field, with more than 300 specialised publications since its establishment (CNGL 2012c).
This chapter presents a review and discussion of the different definitions of localisation through the recent history of the field. Then a review of the main standards of localisation is presented along with an overview of the existing research in CAT tools and translation memories.
2.2. Defining Localisation
Localisation is a new term and concept, as Esselink (2000 p.1) explains “[t]he term ‘localization’ is derived from the word ‘locale’, which traditionally means a small area or vicinity. Today, locale is mostly used in a technical context, where it represents a specific combination of language, region, and character encoding.”
Many authors and institutions have tried to define the concept of “localisation”. In the following paragraphs we present and discuss some of their most representative contributions:
¹⁰ In this dissertation we follow British English spelling conventions. However, in quotations from other authors we respect their spelling preferences.
The Localization Industry Standards Association (LISA)¹¹ has defined localisation as follows:
Localization is the process of modifying products or services to account for differences in distinct markets. (Lommel 2006 p.11)
This definition is too generic for the purposes of this research and does not specify the type of products that are “localised”. A more extended definition can be found on the same organisation’s web page:
Localization refers to the actual adaptation of the product for a specific market. It includes translation, adaptation of graphics, adoption of local currencies, use of proper forms for dates, addresses, and phone numbers, and many other details, including physical structures of products in some cases. If these details were not anticipated in the internationalization phase, they must be fixed during localization, adding time and expense to the project. In extreme cases, products that were not internationalized may not even be localizable. (LISA 2009)
Again, the type of product is not clearly defined in this definition. This might be done on purpose to cover as many products as possible. However, for the purposes of this research we need to narrow our field.
LISA’s definition is by far the most accepted and quoted one in our area. As it does not fit our research objectives, we will review historically the different attempts to define localisation, discuss them, and finally arrive at our own definition.
In the year 2000 Robert C. Sprung edited the book Translating into Success, and despite its title, its main topic is the localisation process. In the introduction of the book Sprung defines the term localisation as follows:
[L]ocalization—taking a product (ideally, one that has been internationalized well) and tailoring it to an individual local market (e.g., Germany, Japan). “Localization” often refers to translating and adapting software products to local markets.
(Sprung and Jaroniec 2000 p.x)
¹¹ See the section “Standards of localisation” for more information.
This definition does not differ much from the previous LISA one; in fact, the two are related, as Michael Anobile, founding member and managing director of LISA at that time, wrote the foreword. However, in the second sentence of Sprung’s definition the product is referred to as “software”, and it is the first time that software is mentioned when talking about localisation.
Also in the year 2000, Bert Esselink wrote one of the most influential and quoted books in localisation, A Practical Guide to Localization, in whose introduction he defines the term “localisation” as follows:
Generally speaking, localization is the translation and adaptation of a software or web product, which includes the software application itself and all related product documentation. (Esselink 2000 p.1)
In this definition the product that LISA mentions has been narrowed to “software or web product”, which is more specific and better suited to our research. It covers not only “software” but also “web products”, something that Esselink added with respect to the previous version of the book, published in 1998 under the title “Practical Guide to Software Localization”. The author acknowledges that, with the rise of the internet, the term localisation should cover other areas, “such as web sites or “traditional” documentation” (Esselink 2000 p.preface).
Four years later, in 2004, Anthony Pym, an academic in the field of translation studies, wrote the book The Moving Text: Localization, translation and distribution. In this book Pym does not talk about products, but about “texts”. In his theory “text” is the key concept, and “translation” is not dismissed as merely one of the steps of the localisation process, but occupies a central position. There is an attempt to define the concept of localisation in the first part of the book:
(…). In this very practical sense, localization is the adaptation and translation of a text (like a software program) to suit a particular reception situation. (Pym 2004 p.1)
We could include Pym in what Morado et al (2009) defined as TOLS (Translation-Oriented Localisation Studies), where the central object of study in the localisation area is the translation process, and it is from that point of view that we would like to approach our research.
In 2006, the book Perspectives on Localization, edited by Keiran Dunne, was published, addressing most of the topics and challenges that the growing discipline was encountering: terminology management, localisation education and localisation standards. Dunne himself defined localisation as:
[T]he process by which digital content and products developed in one locale (defined in terms of geographical area, language and culture) are adapted for sale and use in another locale. (Dunne 2006 p.115)
In this book the concept of “digital content” was introduced for the first time in connection with the localisation process. The general concept of “products” is abandoned and the more specific term “digital products” is introduced, but the author does not specify which type of digital content he is referring to (e.g. software or web products). However, the idea of a product is still present, and the purpose of localisation here is the sale of the product. The localisation of Open Source Software, for example, would be excluded from this definition.
In 2009 the second edition of the Routledge Encyclopedia of Translation Studies edited by Mona Baker and Gabriela Saldanha was published. In the previous edition (from 1998), “localisation” was not present as an entry in the encyclopedia. In the second edition the entry for localisation was written by Reinhard Schäler, director of the Localisation Research Centre at the University of Limerick. Schäler defines localisation as follows:
Localization can be defined as the linguistic and cultural adaptation of digital content to the requirements and locale of a foreign market, and the provision of services and technologies for the management of multilingualism across the digital global information flow. (Schäler 2009 p.157)
Dunne’s “digital content” is also present in this definition. In fact, Schäler explains in the same article that “what makes localization, as we refer to it today, different from previous, similar activities, [is] namely that it deals with digital material. To be adapted or localized, digital material requires tools and technologies, skills, processes and standards that are different from those required for the adaptation of traditional material such as paper-based print or celluloid (…).” [Emphasis in the original].
The commercial purpose is still present in this definition through the word “market”, which gives the impression that localisation cannot exist without an economic objective. This is not always the case, as many open source localisation efforts have been carried out successfully in recent times (Diaz Fouces and García González 2008). Instead, we prefer to use Pym's ambiguous “particular reception situation”, which does not carry commercial connotations but helps us to understand the idea of the change. Taking into account what has been said above, we should admit that it is understandable that most authors use the words “product” or “market”, because localisation has been a very business-oriented discipline since its beginning, and this aspect cannot be avoided.
Schäler’s definition and his idea of the digital content suit the objectives of this current research perfectly. However, what is digital content? Is it software and web products as Esselink said? Or software alone? The answer to these questions will be discussed in the next section, under the title “Digital content”.
In summary, for the purposes of this research we understand localisation as “the linguistic and cultural adaptation of digital content.”
(Adapted from Schäler 2009 p.157)
2.3. Digital Content
A bit has no color, size, or weight, and it can travel at the speed of light. It is the smallest atomic element in the DNA of information. It is a state of being: on or off, true or false, up or down, in or out, black or white. For practical purposes we consider a bit to be a 1 or a 0. The meaning of the 1 or the 0 is a separate matter. In the early days of computing, a string of bits most commonly represented numerical information. (Negroponte 1995 p.14)
It has been stated before that localisation implies the linguistic and cultural adaptation of digital content (adapted from Schäler 2009 p.157). In this section the terms “digital products” and “digital texts” will be used interchangeably to refer to “digital content”. The discussion about the purposes of the content (commercial or otherwise) will not be raised in this section.
In this section we specify what “digital” means and the consequences this has for our field, because the digital nature of content determines the way in which it is created, stored, modified and reused. This relates directly to the three main axes that we set out to study in our research: how can we store, organise and reuse localisation knowledge?
Nicholas Negroponte, co-founder and former director of the MIT Media Laboratory, wrote a visionary book, Being Digital, in 1995, which tried to explain the new digital era and its implications for our lives. In the book, Negroponte distinguishes between atoms and bits: atoms are things that have always been around us and that we can touch (for example, a book or a coffee machine); bits, on the other hand, are everything that can be represented in digital form (in binary code) and cannot be physically touched (for example, a software program or a document created with a word processor).
In the following lines, we present three definitions of “digital” from specialised computer science books and dictionaries:
[D]igital[:] Pertaining to digits or to showing data or physical quantities by digits.
(Marotta 1986 p.123)
Digital[:] A term describing devices such as a computer that process and store data in the form of ones and zeros. In a positive logic representation, 'one' might be +5 volts and zero 0 volts. This lowest of levels at which computers operate is known as machine code. Binary arithmetic and Boolean algebra (named after Irish mathematician George Boole) permit mathematical representation. Boolean algebra and Karnaugh maps are used widely for minimisation of logic algebraic expressions. Though digital signals exist at two levels (one or zero), an indeterminate state is possible.
(Botto 1999 p.105)
[D]igital (adj.) Describes anything that uses discrete numerical values to represent data or signals (as opposed to a continuously fluctuating flow of current or voltage). A computer processes digital data representing text, sound, pictures, animations, or video content.
(Hansen 2004 p.82)
What all these definitions have in common is the numeric nature of describing the information. As the definitions get closer to the present, the concept develops from having the meaning of “pertaining to digits” (Marotta 1986), through becoming more specialised to refer to “binary numbers” (Botto 1999), before taking on the value of “representing data” like “text, sound, pictures, animations, or video content” (Hansen 2004). This new form of representing material and texts has many implications for its manipulation.
Today, it is not only about transforming atoms into bits, i.e. digitalising a printed book. The tendency is to create texts, products, and material directly in digital form that, in most cases, will not be transformed into atoms. A blog, for example, is a personal diary written in digital form, which is rarely transformed into atoms unless some entries are printed or turned into a book (in the case of a very successful blog).
One of the first questions that was addressed when the digital era started was: how do we digitalise in a way that allows other devices to interpret the same data in the same way? The answer came with the adoption of conventions, guidelines, standards, and ad hoc proprietary solutions, the last being the least desirable option. In other words: an agreement between two or more parties to create, interchange, use or store data in a specific way (a specific binary sequence). The different layers (from the machine language to the user interface) are constructed on the basis of standards and conventions that facilitate the interchange and interoperability of data between devices and tools, and that also ensure that the information can be sent securely and without loss or corruption of data.
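The need for such an agreement can be illustrated with a minimal Python sketch. This is our own illustration and is not taken from the cited sources; UTF-8 and Latin-1 are chosen merely as two familiar encoding conventions. The same binary sequence yields different text depending on the convention that the reader assumes.

```python
# Illustrative sketch: one binary sequence read under two conventions.
text = "localización"

# The writer stores the text following the UTF-8 convention.
data = text.encode("utf-8")

# A reader that follows the same convention recovers the original text.
same_convention = data.decode("utf-8")

# A reader that assumes a different convention (Latin-1) obtains corrupted text.
other_convention = data.decode("latin-1")

print(same_convention)
print(other_convention)
```

Only when both parties agree on the convention does the information survive the interchange without corruption.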
The localisation industry has also gone through this process of standardisation; this topic will be covered in the next section of this chapter. However, it should be said that localisation itself was born along with the digital era, as it works with digital material. Standards are therefore an intrinsic part of localisation. It could be stated that localisation would be impossible without digital standards or conventions.
What does “digitalise” mean to texts and other material?
One of the implications that the digitalisation of texts has is its new linked nature. In a digital world the interrelation of elements is a simple and sometimes necessary operation:
digitalisation allows and incites the interrelation and the tagging of each element in a multidimensional way. This means that in a digital environment, as was the case in the oral nature (section 1), all the elements (text, image, sound, buttons…) can do (function) and mean something. [Emphasis in the original].
(Torres del Rey 2003, section 5) [Translated by Lucía Morado Vázquez]
Jesús Torres, lecturer and researcher in the field of localisation and translation studies, also states the consequences of the digitalisation of texts:
The text could lose its strictly linear nature:
The hyperlinks allow you to surf/go associatively through related ideas and enter and go out whenever and wherever you want, even randomly.
Texts are adapted to the user’s needs: for example, a manual (contained in a single text or hypertext) can be consulted linearly, by sections, or by key words.
In order to manipulate that flexibility the meaning and function of the textual units need to be scrupulously tagged and organised.
(Torres del Rey 2003, section 5) [Translated by Lucía Morado Vázquez]
This idea of non-linear text is also addressed in Negroponte's book:
In a printed book, sentences, paragraphs, pages, and chapters follow one another in an order determined not only by the author but also by the physical and sequential construct of the book itself. While a book may be randomly accessible and your eyes may browse quite haphazardly, it is nonetheless forever fixed by the confines of three physical dimensions. (Negroponte 1995 p.69)
Negroponte follows the idea that the new digital components no longer belong to the three-dimensional reality of atoms, but are multidimensional:
In the digital world, this is not the case. Information space is by no means limited to three dimensions. An expression of an idea or train of thought can include a multidimensional network of pointers to further elaborations or arguments, which can be invoked or ignored. The structure of the text should be imagined like a complex molecular model. Chunks of information can be reordered, sentences expanded, and words given definitions on the spot (something I hope you have not needed too often in this book). These linkages can be embedded either by the author at "publishing" time or later by readers over time.
Think of hypermedia as a collection of elastic messages that can stretch and shrink in accordance with the reader’s actions.
(Negroponte 1995 p.70)
We also see this new multidimensional nature as a main feature of digital texts. A digital text, such as the manual that Torres (2003) mentioned earlier, can be seen as a linear structure that is read from the first to the last page, or as a modular structure through which we navigate to the points that interest us, depending on our own preferences at a particular moment (for example, when we want to see how to change the contrast of our new TV screen).
This new form of understanding and presenting information has many implications from both the linguistic and technical points of view, such as the need to maintain consistent terminology throughout the whole product.
Another example that clarifies this multidimensional nature is a software program. Each user will navigate in a different way through the different menus and options of the program depending on their specific needs, and it is almost impossible to predict their exact behaviour. As a result, the terminology, style and register of the language all need to be consistent.
Following the same idea, Jesús Torres in his book “La Interfaz de la Traducción” stated that two of the main characteristics of the localisation process are “dynamism” and “interactivity”. He states that:
(...) [T]he continuous linking of segments and units of different size (from letters to texts or sets of texts) with databases categorised with different criteria (semantic, structural, etc.), causes the changes introduced in one point of the connectively linked chain to affect not only the rest of the elements of the chain, but the whole process.
(Torres del Rey 2005 p.95) [Translated by Lucía Morado Vázquez]
In order to address the new challenges that digital texts and content pose, different techniques and tools are needed from the ones used in the pre-digital era. The tools that have been specifically developed to assist translators and localisers during their task are commonly named Computer Assisted Translation (CAT) tools; we will discuss them later in this chapter. These tools aim to speed up the whole process and ensure consistency. Most of these tools make use of standards: for example, in order to maintain consistency of terminology, the use of the TBX (Term-Base eXchange) standard helps not only to follow the preferred terminology in one specific text or product, but also to reuse that terminology in future versions of the product or in similar texts from the same company. A broader view on the topic of standards of localisation is presented in the following section.
2.4. Standards of localisation
The initial question of this research was “How can we capture, organise and use localisation knowledge?” Several industrial initiatives were created and developed to try to address this problem and others that affect the localisation process. Our first step towards answering our question was therefore to study how current systems store and transmit localisation data from past projects to new ones. This led us to the study of standards of localisation.
By standards of localisation we understand those standards that were created ad hoc to enable, facilitate and improve the various localisation efforts. There are other computer science standards and conventions that have a great influence on the localisation process (e.g. the Unicode character encoding, IANA language codes, etc.), but they are out of the scope of the current research and will not be discussed.
We focused our research only on standards organisations that allow the creation and development of “open” standards, which means that their work should be transparent and royalty-free use should be guaranteed (Filip 2012 p.29). A good example of an open standard initiative is the work of the OASIS XLIFF Technical Committee (TC): all the work carried out by its members is documented (minutes of the regular working meetings, mailing lists, and work on the wiki) and open for the general public and interested parties to consult at any time. The TC also offers non-members the possibility of contributing feedback to the development process through a special mailing list created for that specific purpose.
In the recent history of localisation and its standards, two organisations have been leading the standardisation efforts: the Organization for the Advancement of Structured Information Standards (OASIS)12, which includes under its umbrella two localisation standards (OAXAL, the Open Architecture for XML Authoring and Localization Reference Model, and XLIFF), and the Localization Industry Standards Association (LISA), whose committee of experts (OSCAR, Open Standards for Container/content Allowing Reuse) was devoted exclusively to the creation and development of standards of localisation.
12 OASIS develops many other standards that are not localisation related.
At the time of writing this dissertation, the field of standards of localisation is going through a moment of change: on the one hand, LISA was declared insolvent in March 2011 and subsequently stopped operations; on the other hand, LISA’s disappearance coincided with the flowering of several new standards initiatives (some of them more strictly open than others): the Unicode Localization Interoperability technical committee within the Unicode Consortium, the GALA (Globalization and Localization Association) Standards Initiative and the Interoperability Now! group (Filip 2012 pp.32–35). XLIFF (developed by OASIS) is the main localisation standard currently enjoying a period of steady and vigorous activity: it has attracted new and influential members (ibid. p.33) and its work is reaching a bigger audience through several promotion initiatives coordinated by a new subcommittee, created within the main TC, which is devoted to promotion and liaison activities.
2.4.1. LISA and TMX
As stated above, the main standards organisation in the field of localisation was LISA (Localization Industry Standards Association). As its name implies, LISA had a clear commercial nature: it had more than 100 industrial members in the field (localisation companies, software developers, translation agencies, etc.) (LISA 2011a). Unfortunately, in March 2011, the association had to be closed due to financial problems, and its standards specifications were donated to another standards association, the European Telecommunications Standards Institute (ETSI). Since then, the former LISA standards have been moved forward through this organisation (Trillaud and Guillemin 2012 pp.39–41).
LISA carried out its standardisation activities through the OSCAR (Open Standards for Container/content Allowing Reuse) committee, which was responsible for the development of the following standards:
Translation Memory eXchange (TMX). OSCAR’s oldest standard, TMX supports the exchange of translation memory (TM) data between applications.
Segmentation Rules eXchange (SRX). A companion standard to TMX, SRX allows application developers to define how their tools segment text.
Term-Base eXchange (TBX). Based on ISO standards, TBX is the standard for the exchange of structured terminological data.
XML Text Memory (xml:tm). xml:tm allows text memory (including translations) to be embedded within XML documents.
Global Information Management Metrics eXchange - Volume (GMX-V). GMX-V provides a standard way to count characters and words in documents for information management purposes, including estimating translation costs.
Term Link (proposed). Term Link is an XML namespace that enables XML elements to be linked to termbases.
(LISA 2011b)
For the aims of this research we were particularly interested in TMX, as its purpose “is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process” (OSCAR LISA 2005). This objective partially coincided with our initial research objectives (capture, organise and use); however, it was restricted to “translation data”, whereas our scope was broader, as it involved a more generic process, localisation, of which translation is only a part. As the localisation academic Ignacio García (2006 p.15) states: “[i]t is worth noting that the localisation industry does not translate – it localises.” We needed to look for a broader standard that could encapsulate not only translation data but all the procedural data. That standard is XLIFF (XML Localization Interchange File Format), which will be discussed in the next section. The main difference between the two formats is that a TMX file is formed by a collection of translation unit elements with no specific order and without a mechanism to rebuild the original file (XLIFF TC 2007), whereas in an XLIFF file the translation units are ordered and identified, and the original file can be rebuilt from them. Moreover, more information about the original file is included in specific elements of the XLIFF file, which can eventually be reused in future processes, as we demonstrated in the first stage of our methodology.
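The structure of a TMX file described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the sample content, the header values and the function name `read_tmx_pairs` are invented for the example. It shows that each translation unit is a self-contained pair of segments that does not point back to a position in an original document.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX content, invented for this illustration.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save file</seg></tuv>
      <tuv xml:lang="es"><seg>Guardar archivo</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open file</seg></tuv>
      <tuv xml:lang="es"><seg>Abrir archivo</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs found in a TMX document."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        # Each <tu> groups language variants (<tuv>) of the same segment.
        segments = {tuv.get(XML_LANG): tuv.findtext("seg")
                    for tuv in tu.findall("tuv")}
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


pairs = read_tmx_pairs(TMX_SAMPLE, "en", "es")
print(pairs)
```

Nothing in the resulting pairs indicates which file a segment came from or where it appeared in that file, which is why the original document cannot be rebuilt from this data alone.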
The XLIFF format can be used for different purposes and can be adjusted to ad hoc requirements. The information that is kept in an XLIFF file can also be reused in future projects; “in modern localisation operations, TM is the primary enabler for cost reductions by maintaining multiple representations of information” (Lommel 2006 p.228). However, XLIFF was not originally designed to act as a translation memory in itself. The use of XLIFF as a TM was first associated with its conversion to TMX (Raya 2004; Raya 2005; XLIFF TC 2007), but that conversion, which maps XLIFF items to their equivalents in TMX, discards many pieces of information that have no match in TMX. Among other reasons, this is because the two standards, although they share some similarities, were created with different goals and fulfil different needs during the localisation process (XLIFF TC 2007). Instead of using that approach, we wanted to maintain the whole data set that an XLIFF document contains by storing the document itself inside a bigger container.
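The loss of information in such a conversion can be illustrated with a deliberately simplified Python sketch. It is not the mapping described in the cited documents: we assume, for the sake of the example, a converter that keeps only the source and target text and the language codes, and we report everything else as discarded. The sample file and the function name `to_simplified_tm` are hypothetical, and a real conversion would map more items than this sketch does.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Hypothetical XLIFF 1.2 content, invented for this illustration.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="menu.txt" source-language="en" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="1" resname="menu.save" approved="yes">
        <source>Save file</source>
        <target state="translated">Guardar archivo</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def to_simplified_tm(xliff_text):
    """Keep only text and languages; list the attributes left behind."""
    root = ET.fromstring(xliff_text)
    entries, discarded = [], []
    for file_el in root.findall("x:file", NS):
        src_lang = file_el.get("source-language")
        tgt_lang = file_el.get("target-language")
        for unit in file_el.iter("{%s}trans-unit" % XLIFF_NS):
            source = unit.find("x:source", NS)
            target = unit.find("x:target", NS)
            if source is None or target is None:
                continue  # untranslated units cannot feed a memory
            entries.append({src_lang: source.text, tgt_lang: target.text})
            # Assumption of this sketch: none of these attributes is mapped.
            for name, value in unit.attrib.items():
                discarded.append(("trans-unit", name, value))
            for name, value in target.attrib.items():
                discarded.append(("target", name, value))
    return entries, discarded


entries, discarded = to_simplified_tm(XLIFF_SAMPLE)
print(entries)
print(discarded)
```

The list of discarded items (identifier, resource name, approval flag, translation state) is the kind of information that is preserved when the XLIFF document is stored whole.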
2.4.2. OASIS XLIFF
The XML Localisation Interchange File Format is an open standard developed by OASIS (Organization for the Advancement of Structured Information Standards). It is a data container that carries localisable data from one step of the localisation process to the next. Its main feature is that it enables interoperability between different tools.
The XLIFF Technical Committee (TC) is in charge of the development and maintenance of the standard. The current TC is formed by 21 voting members (including the author of this dissertation) coming from localisation-related industries and institutions: 33.3% from software corporations, 23.8% from tool vendors, 14.3% from associations, 9.5% from service tool providers, 9.5% from academic institutions and 9.5% of the members are registered as individuals (Filip 2012 p.33). The TC is currently developing the new version, XLIFF 2.0 (the current specification is XLIFF 1.2), and has two subcommittees: the XLIFF Inline Markup SC, which deals with the development and description of inline elements within the standard, and the XLIFF Promotion and Liaison SC, which works on promotion and liaison activities. Users and other interested people can also participate by sending their comments, questions or suggestions to an open mailing list devoted to that purpose. All the technical documents produced by the TC are open to the general public and can be downloaded from its web page: https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff.
The basic principle behind XLIFF is the extraction of the localisation-related data from its original format, which can be manipulated by those tools that support the standard, and its conversion to the original format once the whole localisation process is finished.
XLIFF extracting/merging principle schema (XLIFF TC 2008a p.11)
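The extracting/merging principle can be sketched with a toy example in Python. The sketch assumes a hypothetical original format made of `key=value` lines and an invented placeholder syntax (`%%%id%%%`); neither is prescribed by XLIFF, and the functions `extract` and `merge` are our own names. The translatable text is separated from a skeleton, translated independently, and merged back into the original format.

```python
def extract(original_text):
    """Split a key=value file into a skeleton and a list of translatable units."""
    skeleton_lines, units = [], []
    for line in original_text.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            key, value = line.split("=", 1)
            unit_id = str(len(units) + 1)
            units.append({"id": unit_id, "source": value, "target": None})
            # The skeleton keeps the structure and a placeholder for the text.
            skeleton_lines.append("%s=%%%%%%%s%%%%%%" % (key, unit_id))
        else:
            skeleton_lines.append(line)  # comments and blank lines stay as they are
    return "\n".join(skeleton_lines), units


def merge(skeleton, units):
    """Rebuild a file in the original format from the skeleton and the units."""
    result = skeleton
    for unit in units:
        # Fall back to the source text when a unit has not been translated.
        text = unit["target"] if unit["target"] is not None else unit["source"]
        result = result.replace("%%%" + unit["id"] + "%%%", text)
    return result


ORIGINAL = "# Menu labels\nmenu.save=Save file\nmenu.open=Open file"

skeleton, units = extract(ORIGINAL)
units[0]["target"] = "Guardar archivo"   # work done in a translation tool
units[1]["target"] = "Abrir archivo"
localised = merge(skeleton, units)
print(localised)
```

The tools that handle the units in the middle of the process never need to understand the original format; only the extraction and merging steps do.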
XLIFF ARCHITECTURE
The XLIFF 1.2 specification includes 37 elements, 80 attributes and 269 pre-defined values. That makes a total of 386 items, which makes it a complex XML vocabulary.
The main structure of a document is composed of the following elements:
Inside the translation units we find three elements: the
The attributes that are present in the current specification can contain rich metadata that can be reused in future projects. An analysis of the attributes relevant to our research can be found in the methodology chapter.
The core parts of an XLIFF document are the translation unit elements. Seen closely, they are similar to a standardised TMX document (XLIFF TC 2007). However, an XLIFF document is much more than a Translation Memory: it can contain much more information about the process, the tools used, the people involved and the current state of the document.
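As an illustration of the kind of information an XLIFF 1.2 document can carry beyond the source and target text, the following Python sketch reads a small sample file. The sample content, its attribute values and the function name `summarise` are invented for the example; the element and attribute names used (`file`, `header`, `tool`, `trans-unit`, `target` with `state`, `alt-trans` with `origin` and `match-quality`) belong to the XLIFF 1.2 vocabulary.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical XLIFF 1.2 content, invented for this illustration.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.txt" source-language="en" target-language="es"
        datatype="plaintext" date="2012-01-15T10:00:00Z">
    <header>
      <tool tool-id="example-tool" tool-name="Example CAT tool"/>
    </header>
    <body>
      <trans-unit id="1">
        <source>Adjust the contrast.</source>
        <target state="needs-review-translation">Ajuste el contraste.</target>
        <alt-trans match-quality="85%" origin="previous-project">
          <source>Adjust the brightness.</source>
          <target>Ajuste el brillo.</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def summarise(xliff_text):
    """Collect file-level and unit-level metadata from an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    file_el = root.find("x:file", NS)
    tool = file_el.find("x:header/x:tool", NS)
    units = []
    for unit in file_el.findall("x:body/x:trans-unit", NS):
        target = unit.find("x:target", NS)
        suggestions = [
            {"origin": alt.get("origin"),
             "match-quality": alt.get("match-quality"),
             "target": alt.findtext("x:target", namespaces=NS)}
            for alt in unit.findall("x:alt-trans", NS)
        ]
        units.append({
            "id": unit.get("id"),
            "source": unit.findtext("x:source", namespaces=NS),
            "target": target.text if target is not None else None,
            "state": target.get("state") if target is not None else None,
            "suggestions": suggestions,
        })
    return {
        "original": file_el.get("original"),
        "source-language": file_el.get("source-language"),
        "target-language": file_el.get("target-language"),
        "tool": tool.get("tool-name") if tool is not None else None,
        "units": units,
    }


summary = summarise(XLIFF_SAMPLE)
print(summary)
```

The summary combines what a translation memory would hold (source and target text) with information about the original file, the languages, the tool, the state of each translation and the origin of each suggestion.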
The XLIFF standard will be central to our research. We will analyse the metadata that it can contain and how it is actually reused, and we will propose a new way of reusing more metadata and organising it so that it can be helpful to localisation professionals. All these processes are explained in detail in the first methodology chapter.
LANGUAGES IN XLIFF:
An XLIFF document is by default a bilingual document, which contains the original file in one language and the target language that is being localised. The languages of the original and target elements should be stated inside the
INVOLVEMENT IN THE XLIFF TC
The author of this research joined the XLIFF Technical Committee in March 2009 and has contributed to its development since then: participating in the fortnightly meetings, collaborating in the development of the new specifications, being one of the main organisers of the First XLIFF International Symposium13, and serving as a member of the scientific committee in the two following editions of the symposium. In March 2010 she joined the XLIFF Inline Markup Subcommittee14, a group that is devoted to the definition and specification of the inline elements of the XLIFF standard and their relation and possible interoperability with related standards such as TMX and ITS. The author has also been a member of the Promotion and Liaison Subcommittee since its establishment in 2011 and has been the main editor of the XLIFF support in CAT tools Survey, which gathers information about XLIFF support in CAT tools from the main CAT tool developers of the industry and the open source world (Morado Vázquez and Filip 2012).
13 You can visit the official webpage of the event at www.localisation.ie/xliff.
14 You can visit the official subcommittee webpage at http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff-inline
2.5. Previous research on CAT tools and translation memories
Computer Assisted Translation (CAT) tools have been designed to support human translators in their job. Although TM technology started to be developed in the late 1960s and early 1970s, it was not until the 1990s that commercial CAT tools began to spread (Benito 2009). They should not be confused with Machine Translation systems, whose ultimate objective is to substitute human translators. CAT tools were designed to help translators to eliminate repetitive tasks, to automate terminology lookups, and to reuse previously translated material (Esselink 2000 p.359). Some of the principal characteristics that CAT tools share are: text extraction and format filters (which separate the translatable text from the rest of the code), translation memory and terminology management, project management functionalities, quality controls, and preview options (Torres del Rey 2011). In our research, we focused only on the translation memory functionality and how it affects the translator's behaviour (along with the provenance metadata that we introduced).
Translation memories are databases made of previous translations (organised by source and target language) that can be reused in future projects. They can also be of great help within a single file or project: when repeated text appears, its translation can be automatically suggested or inserted by the tool. The same applies when working with other translators, provided that a remote translation memory that can be accessed simultaneously is in place. The format in which these translation memories are stored will determine their future reuse. As seen in the previous section, a specific standard (TMX) was developed by LISA to allow the interoperability of translation memories between different CAT tools. Other data exchange formats, like XLIFF, which was not originally created to be used as a translation memory format, can also be used for that purpose.
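The way in which a translation memory produces suggestions can be sketched in Python as follows. This is a minimal illustration and not the algorithm of any particular CAT tool: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the 0.7 threshold, the sample memory and the function name `lookup` are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical memory of previously translated segments (English to Spanish).
MEMORY = [
    ("Save the file", "Guardar el archivo"),
    ("Open the file", "Abrir el archivo"),
    ("Print the document", "Imprimir el documento"),
]


def lookup(memory, segment, threshold=0.7):
    """Return (score, source, target) suggestions, best match first."""
    suggestions = []
    for source, target in memory:
        # Similarity between the new segment and a stored source segment.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            suggestions.append((round(score, 2), source, target))
    return sorted(suggestions, reverse=True)


exact = lookup(MEMORY, "Save the file")
fuzzy = lookup(MEMORY, "Save the files")
print(exact)
print(fuzzy)
```

A segment that is identical to a stored one yields a full match, whereas a slightly different segment yields a partial ("fuzzy") match that the translator can edit instead of translating from scratch.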
The main advantages of the use of translation memories for translators are an increase in productivity (less time spent on the translation task) and an improvement in the quality of the translation through increased consistency (Bowker 2005 p.15). The penetration rate of translation memory systems among professional translators has been reported as very high: 81% among freelance translators in 2006 (Lagoudaki 2006b p.15).
There have not been many research efforts on CAT tools and translation memories and their impact on translators. Two recent academic papers (Christensen & Schjoldager 2010 and García 2009) compiled and reviewed the recent studies in the field. García divides the research studies into descriptive studies (surveys, reports and case studies) and (quasi) experimental studies. Our research clearly fits into the latter category. García also states that one of the risks for researchers working with TMs is that, due to rapid technological change, their work might become obsolete or irrelevant by the time it is published. Christensen and Schjoldager focused on compiling and describing existing research on TMs carried out with empirical methods after the year 2000; they identified nine studies which fulfilled those criteria. In their conclusion Christensen and Schjoldager stated that “[m]ost practitioners seem to take for granted that TM technology speeds up production time and improve translation quality, but there are not studies that actually document this”. Our experiment also helps to fill that gap in the field of translation research using CAT tools and translation memories: after analysing the results of our experiments, we determined that, on average, the two groups that received a translation memory obtained significantly better results in terms of reduced time and better quality.
Apart from the risk of losing relevance, mentioned by García and also present in Pym (2011), we have found through our research what we believe is one of the main constraints of this type of research: the difficulty of obtaining a large and representative set of participants. Translators can work as in-house professionals or as freelancers, and reaching either group involves difficulties. On the one hand, reaching in-house translators, who work in offices, can be difficult if only one language combination is chosen for the study; in our particular case, the industrial partners we consulted did not have a sufficient number of English-Spanish translators in place. On the other hand, freelance translators are also difficult to reach: they cannot easily be approached in person (as they work from their own homes), and they work in their own customised environments with their own preferred translation tools. All these characteristics impose high subject variability and poor control over working environments, which can negatively influence the validity of the results of an empirical study. A deeper discussion of the limitations of our particular experiment and its validity can be found in Chapter 4. Some researchers (Bowker 2005; Christensen and Schjoldager 2011; Mesa-Lao 2011) used translation students, which helped them to obtain a bigger set of participants. We also used translation students in our pilot study, which helped us to better define the boundaries of the experiment and to redefine and adjust some aspects of its design (such as the distribution of the groups).
The biggest set of participants used in an empirical study involving translation memories was achieved by the TRACE project (conducted by several researchers based at the Autonomous University of Barcelona), which involved the participation of 90 professional translators15 and whose main objective was to measure the impact of CAT tools on translated texts (Torres-Hostench 2010). To the best of our knowledge, our experiment represents the second biggest data set in terms of participants in the field of empirical research using translation memories, with a total of 33 valid participants.
Empirical research on translation memories has focused on the following specific topics: their impact on speed and quality (Bowker 2005; Yamada 2011); their impact on cognitive processes (Christensen and Schjoldager 2011); productivity and quality in post-editing Machine Translation (O’Brien 2007; Guerberof 2009); consistency in translation memories (Moorkens 2011); translation research using eye-tracking (Doherty et al 2010; O’Brien 2010), which was also used to measure translators’ attitudes to UI design (O’Brien et al 2010); and the impact of CAT tools on translated texts, carried out by the above-mentioned TRACE project, in which different researchers worked on three specific aspects: explicitation, linguistic interference and textuality (Torres-Hostench 2010).
Teixeira (2011) is the only research effort that explicitly relates to the study of provenance metadata in translation memories. His object of study is the combination of translation suggestions coming from human-created translation memories and from machine translation systems. In his study, he tried to identify whether the absence or presence of provenance metadata (the independent variable of his study) had any influence on the translators' work in terms of translation speed and quality (the dependent variables). In his initial study, which he intends to extend, he used two professional translators. Due to the limitations of this initial study, definitive conclusions could not be drawn. Although there are similarities with our research, our study did not make use of translation suggestions coming from machine translation systems; we only used translation memory suggestions coming from the official target translation, which was produced by human translators. We also investigated a full range of provenance metadata items surrounding the translation suggestion (name of the translator, date, topic of the text, target language and name of the original file), not only its origin (created by a human translator or by a machine translation system). Our research aims to contribute to the field of empirical research on translation memories by studying the influence that the provenance metadata surrounding translation suggestions has on the translator's behaviour.
15 We are not sure of the final number of participants, as they stated in the paper that some of the initial 90 recruited participants were discarded.
Chapter 3 – Methodology I
3.1. Introduction
In the first stage of our research our main objective was to answer the following question: How can we capture, organise and use localisation knowledge? In order to answer this question we studied the current data exchange standards present in our research field (explained in detail in the literature review section). We then concentrated our efforts on one of those standards (the XLIFF standard) and identified the gaps where potentially useful metadata is not being captured, organised and reused. We therefore decided to follow a “Design and Creation”16 research strategy that allowed us to design and develop a tool prototype that demonstrates how the metadata that surrounds translation suggestions can be captured, organised and reused by translators in future projects. This chapter explains the process we followed to do so. In the next chapter we follow a different research strategy, “Experimental Research”, where we conducted research experiments with translators to study the influence that the metadata that we are introducing, along with the translation suggestions, has on the translators during their task. We will discuss the two sequential strategies separately.
3.2. Design and Creation
The design and creation research strategy focuses on developing new IT products, also called artefacts.
(Oates 2006, p.106)
The first part of our methodology is the design and creation of a data container (later called “Localisation Memory Container” or LMC) that would allow us to encapsulate localisation data and metadata, and the second part is the development of a localisation tool that will automate the process of recovering localisation data and metadata.
The purpose of these two steps is to answer the initial question of our research, that is, to demonstrate how we can organise (in a Localisation Memory Container file), capture (with an extracting and merging tool), and use (with a traditional Computer-Assisted Translation tool, or CAT tool) localisation knowledge. The resulting files from this strategy will be: a) enriched XLIFF files that will be used in CAT tools, and b) LMC files where XLIFF files can be stored for their later reuse. In the second part of our methodological strategy we will test whether the new localisation metadata that we are introducing in the enriched files is actually helpful for translators and subsequently improves their work and productivity.
16 We have followed the terminology of the book “Researching Information Systems and Computing” by Briony J Oates (2006) and its research guidelines in this section.
This chapter is divided into three sections: Localisation Memory Container, which describes the repository that we developed to organise XLIFF files for their later reuse; Localisation Metadata in XLIFF, which describes the process of identifying potentially meaningful metadata within the current XLIFF 1.2 specification; and XLIFF Phoenix, which describes the design and development of our tool prototype.
3.2.1. Localisation Memory Container
The Localisation Memory Container is an XML vocabulary that was developed as a data descriptor that allows the storage of previous XLIFF documents in a single file.
Instead of using extracted translated data, as is done in the TMX standard, we want to use complete XLIFF files as localisation memory components that could be reused in future projects. It is possible to encapsulate several XLIFF files into one XLIFF file (XLIFF TC 2008b); however, this possibility did not give us the self-describing document we wanted, that is to say, a document that itself carries the information about what the container holds. Our approach was to develop a new XML language in which we could store previous XLIFF files. The language should also be self-describing, which means that it carries information about the number of files that it contains, its author, and its creation and last modification dates.
As you can see defined in its XML Schema below, the main structure of an LMC document is composed by the elements
In the body, any XML that is correctly specified with a namespace can be included. For the purposes of our research we will only use XLIFF files. However, we left this option open to other XML languages, such as TMX, that could be incorporated if necessary in the future.
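The container idea described above can be illustrated with a minimal Python sketch. It is not the LMC schema: the namespace URI `urn:example:lmc`, the element names `lmc`, `header` and `body`, and the header attribute names are all hypothetical, chosen only to show a self-describing wrapper (author, dates and file count) around namespaced XLIFF documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and names: the real LMC vocabulary may differ.
LMC_NS = "urn:example:lmc"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # XLIFF 1.2 namespace

ET.register_namespace("lmc", LMC_NS)
ET.register_namespace("xlf", XLIFF_NS)


def build_container(xliff_roots, author, created, modified):
    """Wrap parsed XLIFF root elements in one self-describing container."""
    root = ET.Element(f"{{{LMC_NS}}}lmc")
    ET.SubElement(root, f"{{{LMC_NS}}}header", {
        "author": author,
        "created": created,
        "modified": modified,
        "file-count": str(len(xliff_roots)),
    })
    body = ET.SubElement(root, f"{{{LMC_NS}}}body")
    for xliff in xliff_roots:
        body.append(xliff)  # each stored document keeps its own namespace
    return root


def minimal_xliff(original):
    """Create a tiny XLIFF-like document with a single, empty file element."""
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original, "source-language": "en", "datatype": "html",
    })
    ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    return xliff


container = build_container(
    [minimal_xliff("a.htm"), minimal_xliff("b.htm")],
    author="example", created="2010-05-01", modified="2010-05-02",
)
print(ET.tostring(container, encoding="unicode"))
```

Because the stored documents are distinguished only by their namespace, the same body could in principle hold other XML vocabularies, which is the extensibility the text refers to.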
This is the XML schema of the LMC:
And this is a LMC sample file containing two XLIFF files:
3.2.2. Localisation metadata in XLIFF
Localisation Metadata connects the data present at different stages of the localisation process, from digital content creation, annotation, and maintenance, to content generation, and process management. The usage of open, rich, and flexible metadata is an effective step towards the aggregation and sharing of data between localisation sub-processes.
(Anastasiou & Morado Vázquez 2010, p.258)
As mentioned above, metadata can be very important in various phases of the localisation process, particularly when data is shared among people, in similar or different roles, along the globalisation cycle. However, not all metadata has the same relevance for localisation professionals; its relevance depends very much on the computer-assisted localisation tool and the type of content being localised. XLIFF provides many mechanisms to add metadata related to the product, the process, the supplementary material, or any other localisation factors. However, little use is made of this opportunity to improve the task of localisers by enriching the data that they need to process.
Following the above definition, we carried out a complete analysis of the current XLIFF specification to identify the attributes that could encapsulate the relevant localisation metadata for our research. An explanation of how we decided the relevancy is given below. After that, we investigated which of our selected attributes could be introduced validly in the alt-trans element, and finally we proposed a new form of introducing the new metadata in the “alt-trans” element (an XLIFF 1.2 element explained later) without breaking the validity of the XLIFF file.
The relevant metadata (for the localisation process) that we have selected are those attributes that can self-describe the localisation process by answering the traditional five W questions (why, who, what, where and when) and a sixth one: how.
Choice of metadata
In order to identify potentially relevant localisation metadata for our research, we first carried out a complete analysis of the current XLIFF 1.2 specification and looked for attributes that could describe the localisation process by answering the traditional Zachman Enterprise Framework (Warren 2007) WH questions: why, who, what, where, when and how. See the next table for a schematic view of our analysis:
Question | Sub-question | XLIFF 1.2 attributes |
---|---|---|
What (data) | is it? | original |
What (data) | is being localised? | source-language, target-language, category, version |
What (data) | does it do? | datatype |
Who (people) | is localising/translating? | contact-name |
Who (people) | localised similar files earlier? (matches) | contact-email |
Who (people) | is the product for? | translate |
Where (network) | is it localised [company/country]? | company-name, target-language |
Where (network) | is it localised [tool]? | tool-id, tool-name |
Where (network) | did the translation matches come from? | state-qualifier |
When (date) | were the matches obtained? | date |
When (date) | was the product made? | date |
When (date) | are we in the localisation phase? | state |
Why (motivation) | is it being localised? | job-id |
How (function) | is it localised [tool, process, auxiliary material]? | tool-id, tool-name |
How (function) | is it being localised? | state, state-qualifier |
Table 1. Zachman Framework applied to XLIFF 1.2 attributes.
The next step was to check whether these attributes are already present in the
We summarise in this table the results from the metadata obtained:
Attribute | Allowed in the alt-trans element | Included in our analysis |
---|---|---|
alttranstype | Yes | No |
approved | No | Yes |
category | No | Yes |
company-name | No | Yes |
contact-email | No | Yes |
contact-name | No | Yes |
coord | Yes | No |
crc | Yes | No |
css-style | Yes | No |
datatype | Yes | Yes |
date | No | Yes |
exstyle | Yes | No |
extradata | Yes | No |
extype | Yes | No |
font | Yes | No |
format | No | Yes |
help-id | Yes | No |
job-id | No | Yes |
match-quality | Yes | No |
menu | Yes | No |
menu-name | Yes | No |
menu-option | Yes | No |
mid | Yes | No |
origin | Yes | No |
original | No | Yes |
phase-name | Yes | No |
resname | Yes | No |
restype | Yes | No |
source-language | No | Yes |
state | No | Yes |
state-qualifier | No | Yes |
style | Yes | No |
target-language | No | Yes |
tool | Yes | No (deprecated in XLIFF 1.2) |
tool-company | No | Yes |
tool-id | Yes | Yes |
tool-name | No | Yes |
translate | No | Yes |
ts | Yes | No |
version | No | Yes |
xml:lang | Yes | No |
xml:space | Yes | No |
Table 2. Attributes allowed in the alt-trans element.
The place of metadata
In XLIFF 1.2, only a set of predefined attributes are allowed to be introduced in the
xml:space, ts, restype, resname, extradata, help-id, menu, menu-option, menu-name, coord, font, css-style, style, exstyle, extype, origin, phase-name, and alttranstype). Therefore, the next problem we encountered was to decide where and how we could introduce the metadata that is not currently allowed in the alt-trans element. We considered the following possibilities:
Introduce a new element
This solution implies the introduction of a hypothetical element called metadata in which we could encapsulate all the metadata that surrounds the translation suggestions; however, this cannot be done without breaking the validity of the XLIFF file. There is a similar possibility that would not break the validity: designing a new XML language and embedding it through the XML namespace mechanism. However, this means that we would be proposing a non-standardised language, and the data contained in it would be lost in tools that do not recognise the new language. Extending the XLIFF format through the XML namespace mechanism has been a topic of much debate during the last few years within the XLIFF TC (Fredrik 2012; Savourel 2012); it will also be included in the new version 2.0, which is under development.
Introduce the metadata in the extradata attribute
This attribute which can be included in the
Introduce the information through XML processing instructions
The XML processing instructions “allow documents to contain instructions for applications” (W3C 2008). With this mechanism we could maintain our “attribute-value” structure intact for its later reuse without breaking the validity of the XLIFF file. This solution was proposed by Rodolfo Raya (localisation and XLIFF expert17) during an exchange of private emails, in which he also offered to modify his commercial localisation tool for the purposes of this research (see the section “Swordfish II” for more information). We decided to use this solution to introduce our metadata information because it did not break the validity of our output XLIFF files and our data could be displayed in a meaningful way in at least one CAT tool, which was later used for the experiments with translators.
17 As well as working as a tool developer in MaxPrograms, Rodolfo Raya has been actively involved in the two main localisation data exchange formats (TMX and XLIFF) during the last years. He was the editor of the latest TMX specification (2.0) and the latest XLIFF specification (1.2). He is currently the secretary of the XLIFF TC Committee.
3.2.3. XLIFF PHOENIX
As its end approached, the phoenix fashioned a nest of aromatic boughs and spices, set it on fire, and was consumed in the flames. From the pyre miraculously sprang a new phoenix, which, after embalming its father’s ashes in an egg of myrrh, flew with the ashes to Heliopolis (“City of the Sun”) in Egypt, where it deposited them on the altar in the temple of the Egyptian god of the sun, Re. (Britannica Online Encyclopedia 2010)
The next step in our research methodology was the design and implementation of a tool prototype that could extract localisation data and metadata (see previous section for definition of the metadata) from an LMC file and introduce it into an untranslated XLIFF document (in the
3.2.3.1. Definition
XLIFF PHOENIX is a Computer-Assisted Translation (CAT) tool that allows the reuse of previously localised XLIFF documents. This is achieved by filtering the information included in them and matching it with a new document introduced into the system. The resulting document is an enriched file that contains translation recommendations (in the alt-trans element) and embedded metadata.
3.2.3.2. Development
In the first stages of the development of the tool, we stated the initial technical requirements and drafted a design of the GUI. The design process did not finish in the early stage; it continued during the whole implementation process, as new features were designed and others were modified to improve the whole system.
The technical implementation of the tool was carried out by CSIS 2nd year intern Seán Mooney. He worked as an intern for 8 weeks with a grant from the CNGL. He was supervised mainly by Lucía Morado Vázquez, Chris Exton and Dimitra Anastasiou.
The tool was developed using the NetBeans platform. The programming language chosen for this implementation was Java. While C# was initially considered, it was discounted for the following reasons:
C#, while faster than Java, failed to meet the requirement of universal cross-platform compatibility. While Mono is a promising solution to this issue, it was not mature enough to use at the time of implementation. As the implementation deadline for this project was 7 weeks, C# was also less favourable because the programmer had greater experience with GUI development in the Java environment.
Due to the short time scale involved for the entire project, an Agile approach was followed during the implementation process. This proved to be beneficial to the implementation and design of the tool, as additional functionality could be assessed and implemented as required throughout the entire implementation process.
This process was facilitated by daily updates and weekly assessment of the development of the tool. This frequent interaction also ensured that each phase of development was in line with the final goal for the project.
Phase 1:
Phase 1 comprised the implementation of the core functionality of the tool.
Features implemented:
• Load and read valid XLIFF files.
• Load and read LMC (Localisation Memory Container) files.
• Filter the LMC by 6 general attributes (File Name, Source Language, Target Language, Format, Date and Tool Name).
• Leverage the exact matches into the alt-trans element.
• Export Enriched XLIFF.
Phase 2:
Phase 2 involved extending the filter to accommodate all 18 parameters and resulted in an initial beta of the tool.
Additional features implemented:
• Simple fuzzy matching with adjustable threshold.
• Filter the LMC by:
o 6 general attributes (File Name, Source Language, Target Language, Format, Date and Tool Name).
o 12 specialised/advanced attributes (Locked, Tool Id, Domain, XLIFF Version, TU Status, TU State, Approved Translation, Tool Company, Company Name, Contact Name, Email and Job Id).
Phase 3:
Phase 3 involved the completion of the initial requirement of the project and marked the point at which the tool began to be extended.
Additional features implemented:
• Leverage the sentences into the alt-trans element along with the percentage and the origin information.
• Leverage the metadata surrounding the retrieved sentence.
• Transformation and exporting into valid TMX.
• Export in a “readable” html document.
• Refinement of the fuzzy matching algorithm for greater accuracy and performance.
Phase 4:
Phase 4 involved the development of a companion tool to construct LMC documents named the LMC Builder. This was not part of the initial project requirements but was added to allow for the easy adoption of LMC format and the XLIFF Phoenix tool.
Additional features implemented:
• Leverage Statistics.
• Initial branding of the tool.
• Initial Wizard prototype created.
• LMC Builder external tool, which included:
o TMX to XLIFF converter.
o Build an LMC from previous XLIFF documents.
o Build an LMC from previous TMX documents.
o Append TMX or XLIFF documents to an LMC.
Phase 5:
Phase 5 involved the extension of the tool to facilitate interoperability with both the CNGL Solas Platform18 and Swordfish. In addition to this, final quality assurance checking was undertaken in this phase.
Additional features implemented:
• Ability to load and save to the CNGL Solas Platform.
• Ability to get jobs from the CNGL Solas Platform.
• Metadata stored as XML processing instructions to allow tools such as Swordfish to make use of the metadata that is added.
• Addition of an option to disable the watermark and server features at runtime.
• Enrichment of the user interface to allow for easier use by adding hyperlinks and additional tooltips that explain the features of the filters in greater detail.
• Final branding.
3.2.3.3. Functionality:
Here is a summary of our tool’s functionality:
• Load and read valid XLIFF files.
• Load and read LMC (Localisation Memory Container) files.
• Filter the LMC by:
o 6 general attributes (File Name, Source Language, Target Language, Format, Date and Tool Name).
o 12 specialised/advanced attributes (Locked, Tool Id, Domain, XLIFF Version, TU Status, TU State, Approved Translation, Tool Company, Company Name, Contact Name, Email and Job Id).
• Fuzzy matching (with selection of the percentage).
• Leverage the sentences into the alt-trans element along with the percentage and the origin information.
• Leverage the metadata surrounding the retrieved sentence; this information is introduced via processing instructions.
• Transformation and exporting into valid TMX.
• Export in a “readable” html document.
18 XLIFF Phoenix is one of the components of the SOLAS platform, developed by the CNGL (Aouad et al 2011).
3.2.3.4. Filtering
We use a filter that allows users to define their preferences at a fine level of granularity. The filter also improves the performance of the tool, because it produces a smaller version of the LMC that is processed faster in the later operations of the tool. The filtering options correspond to XLIFF 1.2 attributes (the ones that we have selected as required metadata, see previous section); however, some names do not correspond exactly to the specific attribute name in the specification, either because the latter was not very explicit (for example, we use the field “name”, which corresponds to the XLIFF attribute “origin”) or because it was an abbreviated form.
The filter is divided into two sections: general and advanced. The general options are the following: File Name, Source Language, Target Language, Format, Date and Tool Name. If the XLIFF 1.2 specification indicates predefined values for any attribute specified in this filter, those predefined values are shown in an editable dropdown list.
Figure 1. XLIFF Phoenix. General Filter
The advanced options are the following: Locked, Tool Id, Domain, XLIFF Version, TU Status, TU State, Approved Translation, Tool Company, Company Name, Contact Name, Email and Job Id. As in the general options, predefined values from the XLIFF 1.2 specification are shown in an editable dropdown list.
Figure 2. XLIFF Phoenix. Advanced Filter
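The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each stored file is represented by a plain dictionary of XLIFF attribute values, the function name `filter_files` is hypothetical, and a blank filter field is assumed to mean "do not filter on this attribute". It does not reproduce the Java implementation of the tool.

```python
def filter_files(files, criteria):
    """Return the file descriptions that satisfy every non-empty criterion."""
    # Fields left blank in the filter are ignored.
    active = {name: value for name, value in criteria.items() if value}
    return [
        f for f in files
        if all(f.get(name) == value for name, value in active.items())
    ]


# Each dict stands for the attributes of one XLIFF file stored in the LMC.
lmc_files = [
    {"original": "Symposium.htm", "source-language": "en",
     "target-language": "es", "datatype": "html", "tool-id": "Swordfish"},
    {"original": "Manual.xml", "source-language": "en",
     "target-language": "fr", "datatype": "xml", "tool-id": "Swordfish"},
]

# Only the target language is set; the datatype field is left blank.
selected = filter_files(lmc_files, {"target-language": "es", "datatype": ""})
print([f["original"] for f in selected])
```

Filtering before matching keeps the later comparison cheap, since only the translation units of the selected files need to be compared against the new document.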
XLIFF Phoenix was developed successfully during the summer of 2010 and was presented at several international scientific events (see Appendix K for more information). A video with a demo presentation of the tool is available at: http://www.youtube.com/watch?v=E6b36IHAMgM.
3.2.3.5. Fuzzy Matching
The fuzzy matching of this tool consists of comparing source elements from translation units of the LMC with source elements from translation units of the XLIFF source file. The user can define their fuzzy matching percentage.
The fuzzy matching is done using the following algorithm (adapted from Cohen et al 2003):
Given the strings S1 (search text) and S2 (LMC text), and for a given weight K:

\[ \mathrm{MatchQuality}(S_1, S_2) = \frac{K \cdot \mathrm{JaroWinkler}(S_1, S_2) + \mathrm{Jaccard}(S_1, S_2)}{K + 1} \]

\[ \mathrm{JaroWinkler}(S_1, S_2) = \mathrm{Jaro}(S_1, S_2) + \frac{P'}{10} \left( 1 - \mathrm{Jaro}(S_1, S_2) \right) \]

where P is the length of the longest common prefix of S1 and S2, and P' = max(P; 4).

\[ \mathrm{Jaro}(S_1, S_2) = \frac{1}{3} \left( \frac{M}{|S_1|} + \frac{M}{|S_2|} + \frac{M - T}{M} \right) \]

\[ \mathrm{Jaccard}(s_1, s_2) = \frac{|s_1 \cap s_2|}{|s_1 \cup s_2|} \]

where M is the number of matching characters in S1 and S2, and T is the number of transpositions between S1 and S2.

Hence, for a threshold value Ω, S1 matches S2 if

\[ \mathrm{MatchQuality}(S_1, S_2) \geq \Omega \]

When L1, the length of S1, is less than L (32) characters, S1 and S2 are said to match if
L MatchQuality(S , S 1) 2* 1 2 L 1
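The combined score can be sketched in Python as follows. This is an illustration, not the tool's Java code, and it rests on several assumptions: Jaccard is computed over lower-cased word sets; Jaro follows the standard definition, in which T counts half of the out-of-order matched characters; the common prefix is capped at four characters (min(P, 4)), as in the standard Jaro-Winkler definition, which differs from the max(P; 4) printed above; and MatchQuality is read as a weighted mean with weights K and 1. The special rule for strings shorter than 32 characters is not implemented, and all function names are hypothetical.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if not s1 or not s2:
        return 1.0 if s1 == s2 else 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)
    matched1 = []
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    matched2 = [s2[j] for j in range(len(s2)) if used[j]]
    # Half the number of matched characters that appear in a different order.
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1, s2):
    """Jaro score boosted by the common prefix (capped at 4 characters)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b:
            break
        prefix += 1
    return score + (min(prefix, 4) / 10) * (1 - score)


def jaccard(s1, s2):
    """Jaccard similarity over lower-cased word sets (an assumption)."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def match_quality(s1, s2, k=1.0):
    """Weighted mean of Jaro-Winkler (weight k) and Jaccard (weight 1)."""
    return (k * jaro_winkler(s1, s2) + jaccard(s1, s2)) / (k + 1)


def is_match(s1, s2, threshold, k=1.0):
    """True when the match quality reaches the threshold (a fraction, 0 to 1)."""
    return match_quality(s1, s2, k) >= threshold


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Combining a character-level measure with a word-level one means that small spelling differences and small differences in wording both lower the score gradually instead of rejecting the candidate outright.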
The result from the comparison is included in the alternative translation element (alt-trans), in the source and target elements. There is also other information included: the percentage of the fuzzy matching (in the match-quality attribute) and the attributes (included in the filter) that surrounded the translation unit. These attributes and their values are stored as processing instructions in the alt-trans element.
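A minimal Python sketch of this enrichment step is given below. The processing-instruction target `metadata`, the function name `add_suggestion` and the source text of the stored suggestion are illustrative assumptions; the real target name used by XLIFF Phoenix is not shown here, and the XLIFF namespace is omitted for brevity. The `match-quality` and `origin` attributes are the XLIFF 1.2 attributes named in the text.

```python
import xml.etree.ElementTree as ET

PI_TARGET = "metadata"  # hypothetical processing-instruction target


def add_suggestion(trans_unit, source, target, quality, origin, metadata):
    """Append an alt-trans suggestion whose metadata is stored as PIs."""
    alt = ET.SubElement(trans_unit, "alt-trans",
                        {"match-quality": str(quality), "origin": origin})
    # One processing instruction per attribute-value pair.
    for name, value in metadata.items():
        alt.append(ET.ProcessingInstruction(PI_TARGET, f'{name}="{value}"'))
    ET.SubElement(alt, "source").text = source
    ET.SubElement(alt, "target").text = target
    return alt


unit = ET.Element("trans-unit", {"id": "1"})
ET.SubElement(unit, "source").text = (
    "2010 1st XLIFF International Symposium - Limerick, Test")
add_suggestion(
    unit,
    source="Illustrative source segment stored in the LMC",
    target="2010 Primer Simposio Internacional sobre XLIFF - Limerick, Irlanda",
    quality=94.0,
    origin="Symposium.htm",
    metadata={"contact-name": "Lucia Morado", "date": "2010-05-01T12:00:00Z"},
)
enriched = ET.tostring(unit, encoding="unicode")
print(enriched)
```

Processing instructions are not constrained by the XLIFF schema, so a tool that does not know the target can skip them while one that does can read the attribute-value pairs back.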
3.2.3.6. Enrichment (leveraging)
If the tool finds matches between the LMC and the XLIFF file it will include the translation suggestion and its metadata in an
For example, if we have the following LMC file containing one XLIFF file:
In sections 1 (datatype="html", original="Symposium.htm", source-language="en" target-language="es", tool-id="Swordfish", date="2010-05-01T12:00:00Z", category="standards" and job-id="SymposiumWebpage"), 2 (company-name="LRC" contact-name="Lucia Morado" and contact-email="[email protected]") and 3 (tool-company="Maxprograms" tool-id="Swordfish" tool-name="Swordfish II" and tool-version="2.0-1 7DA-4-C") we encounter different metadata items that could be retrieved using our tool.
This is the untranslated file that we want to enrich which contains 4 translation units (see section 4):
And this is the enriched file after XLIFF Phoenix has found only one match (section 5) and has leveraged the corresponding translation suggestion with its corresponding metadata information (section 6):
You can see in the following extraction of the code more clearly the single match that the tool has found match, it added inside the
Inside the opening
All those metadata items were recovered from the different elements within the XLIFF file:
And this is how the file would be displayed to the translator in Swordfish II. The editing section is located in the right and central part, and on the right side you can consult the translation suggestion (or TM match, as the tool names it); the provenance metadata can be consulted in a box below the translation suggestion (or Match Properties, as the tool names it):
Figure 3. Enriched XLIFF file in Swordfish II (callouts in the figure: Translation Suggestion, Provenance metadata)
3.2.3.7. Exporting
The tool can save the document with the translation recommendations in a new XLIFF file or it can overwrite the original one. Moreover, the tool gives you the option to export the file in HTML format (to have a quick and human-readable version of the file) and in TMX format. The TMX conversion is offered because some CAT tools still do not support the alt-trans element (Morado Vázquez & Filip 2012, p.11), but they do support TMX files. To obtain a summary of the leveraging process, the tool can also export a statistical breakdown of the
Exporting as an XLIFF format
This file is the main purpose of our tool: an enriched XLIFF file that contains translation suggestions with their corresponding information. In the previous section you can see an example of an XLIFF Phoenix output file.
Exporting as html
An html version of the file can also be exported to provide the translator with a human readable version of the file. The html contains a table with five columns: Source Text, where the source text of each of the translation units is included; Alt-Target Text, where the target text of the alt-trans is included19; Match%, where the match percentage is included; and Source Lang and Target Lang, where the information about the source and target language is included.
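The five-column layout described above can be sketched with a short Python function. The function name and the bare table markup are assumptions made for illustration; the actual export also carries header and footer text that is not reproduced here.

```python
from html import escape

COLUMNS = ["Source Text", "Alt-Target Text", "Match %",
           "Source Lang", "Target Lang"]


def to_html_table(rows):
    """Render rows of five values as an HTML table; None gives an empty cell."""
    header = "".join(f"<th>{escape(c)}</th>" for c in COLUMNS)
    lines = ["<table>", f"<tr>{header}</tr>"]
    for row in rows:
        cells = "".join(
            f"<td>{'' if v is None else escape(str(v))}</td>" for v in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


rows = [
    ("2010 1st XLIFF International Symposium - Limerick, Test",
     "2010 Primer Simposio Internacional sobre XLIFF - Limerick, Irlanda",
     94.0, "en", "es"),
    # No match found: the suggestion and percentage cells stay empty.
    ("It will be part of the LRC preconference Test.", None, None, "en", None),
]
print(to_html_table(rows))
```

Escaping every cell keeps markup that appears inside a segment from being interpreted by the browser.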
This data has been generated by Xliff Phoenix
19 If the tool has not found any match, this cell would be empty.
The file processed was:
Source Text | Alt-Target Text | Match % | Source Lang | Target Lang |
---|---|---|---|---|
text/html; charset=UTF-8 | | | en | |
2010 1st XLIFF International Symposium - Limerick, Test | 2010 Primer Simposio Internacional sobre XLIFF - Limerick, Irlanda | 94.0 | en | es |
The first XLIFF Symposium 2010 is taking place in Limerick, Ireland on 22nd September Test. | | | en | |
It will be part of the LRC preconference Test. | | | en | |
© 2010 UNIVERSITY OF LIMERICK.
All rights reserved.
This material may not be reproduced, displayed, modified or distributed without
the express prior written permission of the copyright holder.
This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at University of Limerick .
The same html file displayed in a web browser:
Exporting as TMX
All the matches found in the processing action can be exported in a TMX format. If a tool does not support the
TMX to it and leverage its information during the translation task20. The metadata information is included as a comment in the code, which does not break the validity of the format.
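The idea of carrying the metadata in a comment can be sketched in Python as follows. The function name, the layout of the comment text and the header values (`creationtool`, `o-tmf` and so on) are placeholders chosen for this example; only the general TMX structure of `header`, `body`, `tu`, `tuv` and `seg` is taken from the TMX format itself.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def matches_to_tmx(matches, srclang):
    """Build a TMX tree with one tu per match and its metadata as a comment."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "XLIFF", "adminlang": "en",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for match in matches:
        tu = ET.SubElement(body, "tu")
        note = "; ".join(f"{k}={v}" for k, v in match["metadata"].items())
        # "--" is not allowed inside an XML comment, so it is broken up.
        tu.append(ET.Comment(" " + note.replace("--", "- -") + " "))
        for lang, text in match["segments"].items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = matches_to_tmx([{
    "metadata": {"job-id": "SymposiumWebpage", "category": "standards"},
    "segments": {
        "en": "Illustrative source segment",
        "es": "2010 Primer Simposio Internacional sobre XLIFF - Limerick, Irlanda",
    },
}], srclang="en")
tmx_text = ET.tostring(tmx, encoding="unicode")
print(tmx_text)
```

A comment is ignored by any TMX reader, which is why the file stays valid, but it also means that tools importing the TMX will not display the metadata.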
Exporting statistics information
It is possible to export the information regarding the leveraging process in an html or XML file. The file obtained includes a count of all the segments where a fuzzy match was obtained, classified according to the percentage of the fuzzy match: 100%, 95%-99%, 85%-94%, 75%-84%, 50%-74% and <50%. Here is an example of an html file with count information:
20 If that TMX file is included within the active translation memories to be used in a specific translation task.
This data has been generated by XLIFF Phoenix
Match Range | Segments | Percent of Document |
---|---|---|
100% | 0 | 0 |
95%-99% | 0 | 0 |
85%-94% | 1 | 100 |
75%-84% | 0 | 0 |
50%-74% | 0 | 0 |
<50% | 0 | 0 |
Total | 1 | 100 |
©2010 UNIVERSITY OF LIMERICK.
All rights reserved.
This material may not be reproduced, displayed, modified or distributed without
the express prior written permission of the copyright holder.
This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at University of Limerick .
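The classification into match ranges shown in the table can be sketched in Python. The function name is hypothetical, and the percentage is assumed to be relative to the number of segments for which a match was found, which is consistent with the sample (one match, 100 percent) but is not stated explicitly.

```python
# Range labels with their lower bounds, checked from highest to lowest.
RANGES = [("100%", 100), ("95%-99%", 95), ("85%-94%", 85),
          ("75%-84%", 75), ("50%-74%", 50), ("<50%", 0)]


def breakdown(qualities):
    """Count matches per range and their share of all counted matches."""
    counts = {label: 0 for label, _ in RANGES}
    for q in qualities:
        for label, lower in RANGES:
            if q >= lower:
                counts[label] += 1
                break
    total = len(qualities)
    return {label: (n, 100 * n / total if total else 0.0)
            for label, n in counts.items()}


# The sample document produced a single match with a quality of 94.0.
for label, (segments, percent) in breakdown([94.0]).items():
    print(label, segments, percent)
```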
And this is how the same html file would look in a web browser:
The same statistical information can be exported in an XML file:
3.2.3.8. XLIFF Phoenix Architectural Layout:
The final implementation consists of two packages:
• Phoenix
• net.boplicity.xmleditor
The first package, Phoenix, will be discussed further below. The second package, net.boplicity.xmleditor, is an external project and is hence outside the remit of this document. It should be noted that the xml editor package only provides aesthetic functionality to the main tool and has no further contribution to how it operates.
The main Phoenix package has been functionally subdivided into four packages and the core program. These are as follows:
• Phoenix.I/O
• Phoenix.GUI
• Phoenix.XLIFF
• Phoenix.LMC
Phoenix.I/O:
This package is responsible for all input and output functionality provided by the tool. It contains one interface: ITransformer and eight classes: Output, Input, TransformerBase, HtmlTransformer, StatTransformer, TmxtoXliff, TmxExporter and ClientHttpRequest.
Phoenix.GUI:
This package represents the entirety of the user interface. It contains eight classes: AdvancedFilterView, FilterView, Jobs, Options, PhoenixAboutBox, PhoenixView, WTextPane and Wizard.
Phoenix.XLIFF:
This Package extends JDOM’s Element class to create a default representation of some but not all XLIFF Elements. This Package contains 11 classes: AltTrans, MetaData, Note, Source, Target, Tool, TransUnit, Xliff, XliffElement, XliffFile and XliffHeader.
Phoenix.LMC
This Package extends JDOM’s Element class to create a default representation of LMC Elements. This Package contains one interface ILmcBuilder and four classes: Lmc, LmcBody, LmcBuilder and LmcHeader.
Phoenix Core:
The core application contains two interfaces (IFilter and IProcesser) and six classes: Core, Filter, PhoenixApp, Processer, QuickHeapSort and StatBuilder.
Figure 4. XLIFF Phoenix Architectural Diagram (packages shown: XLIFF, I/O, Gui, LMC and Core, with the interfaces and classes listed above)
3.2.3.9. How XLIFF Phoenix works
Starting the program
When we open XLIFF PHOENIX, we have two main user options: to use the wizard button (that will take the user through the main steps of the tool), or to carry out every step separately using the different options that the tool provides. We recommend the Wizard option for the users who use the tool for the first time or for those who might not have much experience with the localisation process.
Figure 5. XLIFF Phoenix. Main screen.
Loading an XLIFF source file
If you are using the Wizard, this will be the first step that you will have to carry out. If that is not the case, you will have to go to the "Original" tab and click on the button "Load". Then you will have to select the location of the XLIFF file that you want to enrich.
Figure 6. XLIFF Phoenix. Loaded XLIFF source file
Loading an LMC
If you are using the Wizard, this will be the second step that you will have to carry out. If not, you will have to go to the “Container” or “Filtered Container” tab and click on the button "Load". Then you will have to select the location of the LMC file that you want to load. Alternatively, the LMC may be loaded before the source file when not using the Wizard.
Figure 7. XLIFF Phoenix. Loaded LMC file
Filtering Options
The filter allows you to define your comparison with a greater point of granularity. If you are using the Wizard, this will be the third step that you will have to carry out. The wizard will ask you if you want to filter your LMC or if you want to do the fuzzy matching against the whole LMC document.
Figure 8. XLIFF Phoenix. Filtering option in the Wizard
If you are not using the Wizard, you should click on the “Filter” button. A popup window will come up with the filter options available. This window will contain the general filtering options. Note that the “Filtered Container” tab will be updated upon completion of the filtering process.
Figure 9. XLIFF Phoenix. General filtering options
If you click on the advanced option you will get another window with more advanced options. To deactivate the advanced mode you have to click on the “Basic” button.
Figure 10. XLIFF Phoenix. Advanced filtering options
Fuzzy Matching
If you are using the Wizard, this will be the fourth step that you will have to carry out. If you are not using the Wizard, click on the option “Fuzzy Matching”, choose a percentage from the dropdown menu (or enter your own) and click on the “Process” button. The result file will be displayed in the “Enriched” tab.
Figure 11. Enriched file.
Saving options
If you are using the Wizard, this will be the last step that you will have to carry out: you can choose to save the file with a different name or to save the changes in the same file (overwrite it). If not, click on the “Save” button. If the file already exists, a warning will be displayed asking you to confirm that you wish to save.
Figure 12. XLIFF Phoenix. Saving warning message
Exporting Options
Apart from exporting your file in the original XLIFF format, you can also export your file as HTML (to have a more human-readable version of the file) or as a TMX file. Statistical information about the process can also be exported. See the previous section “Exporting” for more information on this feature.
3.2.3.10. Future extensibility
The current tool has been declared as a University of Limerick invention disclosure; its Case Number at UL is 2006165. It is intended to convert it into an open source project and leave it open for other interested people to use, modify, distribute and/or improve according to their own interests. The tool can be extended in the future to allow the introduction of more translation memory formats (at the moment TMX and XLIFF are the file formats that can be introduced in the Memory Container), and to allow more filtering options (at the moment we are using 18 of the 80 attributes defined in XLIFF 1.2). Moreover, when the new version, XLIFF 2.0, is released, the tool will need to be adapted to support its new features.
3.2.3.11. Output files in CAT tools
The enriched output XLIFF file that XLIFF Phoenix produces is a valid XLIFF file; therefore it can be read by CAT tools that support this format. The metadata information that we added can only be displayed if the tool is adapted to do so. Swordfish II, a commercial CAT tool, was adapted to display our metadata information.
3.2.3.12. SWORDFISH II
Swordfish II is a commercial CAT tool developed by MaxPrograms. Its main developer is Rodolfo Raya, the secretary of the XLIFF TC. Due to our mutual collaboration in the Technical Committee of the standard, Rodolfo Raya offered his help in this research and kindly modified his tool to accept our enriched files.
Swordfish works natively with XLIFF files. In the recent study21 of XLIFF support in CAT tools carried out within the XLIFF TC (Morado Vázquez & Filip 2012), Swordfish demonstrated strong support for the standard.
The source and target of the alt-trans elements appear in a special box where the translation suggestions usually appear in the tool. In that box you can also see the origin and the matching percentage information. The rest of the metadata information, introduced with XML processing instructions, appears in a floating box called “match properties” that can be shown or hidden according to the user’s preferences. The translator can also place it anywhere on the screen according to their reading preferences.
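As a rough illustration of the kind of data a CAT tool reads from such a file, the following Python sketch extracts the suggestions from one trans-unit. The sample fragment, its text and its attribute values are invented for illustration; only the element and attribute names (alt-trans, origin, match-quality) come from XLIFF 1.2. The processing instructions carrying the remaining metadata are deliberately left out, since their exact syntax is not described in this section.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical fragment: the sentences, file name and values are illustrative only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.htm" source-language="en" target-language="es-ES" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Select the cell that contains the formula.</source>
        <alt-trans origin="sample-memory" match-quality="85%">
          <source>Select the cell containing the formula.</source>
          <target>Seleccione la celda que contiene la fórmula.</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_suggestions(xliff_text):
    """Return one dictionary per alt-trans suggestion found in the document."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    suggestions = []
    for unit in root.iterfind(".//x:trans-unit", ns):
        for alt in unit.iterfind("x:alt-trans", ns):
            suggestions.append({
                "unit_id": unit.get("id"),
                "origin": alt.get("origin"),
                "match_quality": alt.get("match-quality"),
                "source": alt.findtext("x:source", namespaces=ns),
                "target": alt.findtext("x:target", namespaces=ns),
            })
    return suggestions
```

Calling `read_suggestions(SAMPLE)` yields a single suggestion whose origin and match percentage correspond to what the suggestion box displays.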
We decided to use this tool in our experiments for three main reasons: firstly, because it is one of the current CAT tools that most faithfully follows the standard; secondly, because its main developer had modified the tool according to our needs and those of the experiment (explained in the next section); and thirdly, because the developer provided us with the necessary licences for the tool and even three more to raffle among the participants in our experiment (which also helped us to attract more volunteers).
21 In this study, the information included corresponds to Swordfish III, a later version of the tool, slightly different from the one used in our experiments.
Figure 13. Enriched XLIFF file in Swordfish
Chapter 4 – Methodology II
In academic research, an experiment is a strategy that investigates cause and effect relationships, seeking to prove or disprove a causal link between a factor and an observed outcome. (Oates 2006 p.127)
4.1. Introduction
After positively answering our initial question, which included the development of a prototype that allows us to recover a rich set of provenance metadata items surrounding translation suggestions, we decided to investigate, in this second phase of our research, the influence that this provenance metadata has on the translator’s work. We therefore decided to implement an experimental strategy involving professional translators. Prior to the main study, we carried out a pilot study with translation students from the University of Salamanca, Spain, in December 2010, which allowed us to test our methodology and better define its boundaries and limitations. The main study took place between December 2011 and January 2012; however, its design, testing, small pilot studies and the recruitment of the participants started a year prior to that.
In this chapter we describe the design and implementation of the pilot study and the main study: the data and data collection methods used, the environment, the recruitment and nature of the participants, the experiment’s logistics, and a reflection on its validity (internal, external and ecological). In Chapter 5 we present the results obtained from the main experiment, and in Chapter 6 their interpretation and the conclusions derived from them.
Our research aimed to study the influence that the metadata surrounding translation memories had on the translators’ behaviour during their task. The experiment consisted of the translation of a text by different groups of participants, each exposed to a different state of the independent variable (one group without translation memory, one group with translation memory, and a third group with translation memory and its corresponding provenance metadata). We collected different types of data from the participants, which helped us to analyse whether there was any change in the behaviour of the translators and/or in their work due to our manipulation of the independent variable.
4.2. Distribution of groups
In the pilot study we decided to use three groups (A, B and C) and give them the same text divided into three parts. Each part had a different quantity of data and metadata (one section without metadata, one section with translation memory without metadata and one section with translation memory with metadata), and each of the groups received the same text but with a different distribution of data and metadata. However, this distributional strategy complicated the analysis and interpretation, as it was difficult to establish a direct relationship between the independent variable (the inclusion of metadata) and the effect that it might have had on the translators. In order to avoid the issues detected in the pilot study analysis, in our main study we decided to use a different distribution: we divided participants into three groups (A, B and C), and each group received the same text to translate but with a different level of translation memory and provenance metadata. Group A received the text without translation memory, Group B received the same text with a translation memory but without provenance metadata, and Group C received the same text with a translation memory and its provenance metadata. We deliberately included more participants in Group C, the group that received the provenance metadata, because we wanted to investigate in more depth some aspects that could only be studied in this group (such as the participants’ reactions and opinions towards the provenance metadata that they were exposed to). Strictly speaking, Group A and Group B were considered control groups.
4.3. Participants
In the pilot study that we carried out in December 2010, we used ten final-year students from the University of Salamanca, Spain. These students had studied the localisation module for a semester; therefore they were familiar with CAT tools and how they worked. Moreover, a one-hour class on the use of Swordfish was offered to them a week before the experiment.
Recruitment of the participants
In the main study our target participants were professional translators. We can divide professional translators into two main categories: in-house translators, who work in the office of a specific company and normally have a regulated timetable and wages, and freelance translators, who usually work from their own homes, using their own equipment, managing their own time and getting paid based on the volume of translation that they complete. Initially, we estimated that the first set of professionals would allow us to carry out our experiments in a controlled physical environment where we could better control external variables that might affect the results of the experiment (for example, a participant being distracted by a phone call). However, this option was not possible for two reasons: firstly, because the CNGL industrial partners that answered our formal request did not have a sufficient number22 of translators for the English-Spanish language combination in place; and secondly, because it would have been beyond our means to pay for that number of participants in any other external company. Therefore, we decided to carry out our experiment in a server-based system where participants could log in from their own homes, and to recruit volunteer professional translators (in-house or freelance) who would participate in our project by donating their time and experience. Around 100 people expressed their interest in participating in the experiment, 59 participants carried out all the steps that it involved, and among them, 33 were considered valid participants (experienced professional translators) whose data was analysed and interpreted. A descriptive analysis containing detailed information about the participants’ profile can be found in Chapter 5.
4.4. Experiment environment
In the pilot study, students carried out the experiments in the computer lab of the faculty. In the main study, we prepared a specific controlled environment on a server that was used by all the participants: we set up a server in the Localisation Research Centre office that was accessible remotely through the internet. We created 100 different accounts on this server. The operating system of the server was Windows 7. On the desktop, participants had shortcuts to the recording program (BB Flashback Express), to the CAT tool (Swordfish II) and to three different web browsers (Microsoft Internet Explorer, Mozilla Firefox and Google Chrome). A PDF document with the translation assignment instructions was also on the desktop, as well as the file that they needed to translate. Participants had access to the internet.
22 We were aiming at recruiting a minimum of 30 participants.
Figure 14. Server Environment
Apart from several internal tests, the server was tested with seven participants in November 2010, and this small pilot study confirmed that the environment was functional (in terms of the performance of the tools included in it and of the connection speed). However, we did learn a useful lesson from it: the input language of the server should be Spanish. We had been using English, and that caused problems for participants who were not able to change that option when they wanted to type special Spanish characters. Therefore, to prevent that issue from interfering with the participants’ performance, we made sure that the server had Spanish as its input language for the main study.
The server could only be accessed by one participant at a time. Participants received the password to access the server one hour before the experiment took place; this password was changed after the participant had finished his/her experiment.
4.5. Data used
4.5.1. Source text
The experiment primarily consisted of the translation of a text using a CAT tool. We decided to use a technical text, as this text type has been described as better suited to working with translation memories (Christensen 2003; Bowker 2005; apud Christensen and Schjoldager 2010). We also had the possibility of using technical texts because the CNGL industry partner, Microsoft, had donated Microsoft Office 2003 documentation translation memories and XML source files to the researchers of the CNGL project. After studying the composition of that material we judged it to be adequate for the purposes of our research and decided to use it to compose the source text of our translation task.
The language combination that we decided to use was English into Spanish. We chose this combination because I am a certified translator in that combination and am therefore capable of judging the behaviour and the issues that the translators might have during and after the experiment, and also because Spanish was one of the three languages of the translation memories that Microsoft had donated to us.
“Using a segmented whole document vs using a document made of 30 segments coming from different (but related) documents”: this was one of the crucial decisions we had to make when designing the experiment. In the pilot study we had chosen the first option: we used a whole document that was segmented on a sentence basis. This allowed us to provide the translators with the actual original HTML file so they could consult the content in its real context. However, as not all the sentences were equal in terms of length and word complexity, it became more difficult to extract valid conclusions in our later analysis, as length and word complexity represented two more variables to take into account when analysing the results. Many new questions arose: did the translator spend more time on that sentence because it was longer than the previous one, because there was a difficult word, or because he or she did not use the translation suggestion? Therefore we decided to use a more artificial (but controlled) text in the main experiment. This text was composed of 36 segments that were equal in terms of length and complexity.
In order to obtain these similar segments we followed this process. We selected only the documents inside the Excel documentation, which made a total of 682 documents. From this larger data set we chose 100 in alphabetical order. From this group of 100, we decided to choose only the 50 smallest ones. From the final group of 50, we analysed each sentence of each document using different readability tests23 and obtained 91 sentences that had passed the readability tests and had a length of between 10 and 20 words. The size of the document that we were aiming at was between 30 and 40 segments24; therefore, we decided to reduce our group of sentences to those with a length of between 15 and 20 words, which gave us 55 sentences.
In the final selection process we eliminated duplicated sentences and sentences that had anaphoric or cataphoric references, and we ended up with a final text composed of 36 segments.
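The mechanical part of this selection (the length band and the removal of duplicates) can be sketched in Python as follows. This is only an illustration under stated assumptions: words are counted by splitting on whitespace, duplicates are detected case-insensitively, and the function name and parameters are hypothetical. The readability tests and the manual check for anaphoric or cataphoric references are not reproduced here.

```python
def select_segments(sentences, min_words=15, max_words=20):
    """Keep sentences inside the length band, dropping exact duplicates."""
    seen = set()
    selected = []
    for sentence in sentences:
        # Collapse runs of whitespace so that counting and comparison are stable.
        normalised = " ".join(sentence.split())
        n_words = len(normalised.split())
        if not (min_words <= n_words <= max_words):
            continue  # outside the 15-20 word band
        key = normalised.lower()
        if key in seen:
            continue  # duplicated sentence
        seen.add(key)
        selected.append(normalised)
    return selected
```

Applied to the candidate sentences of the 50 documents, a filter of this kind would leave the set that still has to be checked by hand for cross-sentence references.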
The final 36 sentences can be found in Appendix J. A criticism that might arise from using an “artificial” text is that translators would be working on a sentence basis rather than on a text as a whole. However, it should also be noted that professional translators in localisation processes (especially if more than one translator is involved in the same project) normally work with segments and do not have an overview of the whole document/project (García 2008 p.54); our experiment reproduces that particular real situation.
4.5.2. Translation Memory
In the main experiment, Group A received the text (composed of the 36 sentences mentioned above) without translation memory. Groups B and C received the same text with a translation memory; Group C also received the provenance metadata associated with the translation memory.
In order to create the translation memory we used the official Spanish translation of each of the selected segments (also provided by Microsoft). The procedure we followed to create the translation memory that was added to the text for Groups B and C was the following: first, in an XLIFF template file, we aligned each of the segments from the original English source with its corresponding official translation25; then, we slightly modified the original translation units and their target segments to artificially produce translation matches with a match percentage between 80 and 90%; at this stage we also added to each of the segments the metadata26 information that will be explained in detail in the next section. After that, we merged the XLIFF file into an LMC using a tool called “LMC Builder”, which we had developed along with XLIFF Phoenix. Finally we introduced the original English file in XLIFF format, the one we had created with the 36 segments, into XLIFF Phoenix and processed it against the LMC. This resulted in an enriched file with translation suggestions and their corresponding metadata. We had to repeat the process several times to adjust all matches to be in the 80-90% range. After obtaining our master enriched file, which was used for Group C, we eliminated the provenance metadata information, and that resulted in the Group B text. Both texts can be found in Appendix J.
23 We used the readability tests proposed by EditCentral http://www.editcentral.com/gwt1/EditCentral.html, following the example of a similar study made by Mesa-Lao (2011 p.51). Those readability tests included the following indexes and scores: the Flesch reading ease score, the automated readability index, the Flesch-Kincaid grade level, the Coleman-Liau index, the Gunning fog index and the SMOG index.
24 Based on the results of the pilot study, the time required for translating this quantity of segments would stay below 70 minutes on average.
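The repeated adjustment of matches to the 80-90% range described above can be pictured with the following Python sketch. It is not the matching algorithm of XLIFF Phoenix, which is not described in this section: as a stand-in it uses the word-level similarity ratio of `difflib.SequenceMatcher` from the Python standard library, and the function names and thresholds as parameters are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def match_percentage(new_source, memory_source):
    """Word-level similarity between two segments, as a percentage.

    Stand-in metric: a CAT tool may compute fuzzy matches differently.
    """
    ratio = SequenceMatcher(None, new_source.split(), memory_source.split()).ratio()
    return round(ratio * 100, 1)


def in_target_band(new_source, memory_source, low=80.0, high=90.0):
    """True when the modified memory segment falls inside the desired band."""
    return low <= match_percentage(new_source, memory_source) <= high
```

With a check like this, a modified translation unit that scores outside the band would be edited again and re-tested, which mirrors the several iterations mentioned in the text.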
4.5.3. Provenance metadata
In our research “provenance metadata” is the information that carries some meaning about the origin of the translation suggestion. In the pilot study we decided to include the metadata items that we had identified as provenance information inside the XLIFF specification (origin, company name, contact email, date, job id, source language, target language, tool-company, tool-id, tool-name, tool version and category). However, those 14 items represented too much information for the participants, and they also took up too much space in the match properties floating window that had to be placed in the GUI. Therefore, for the main study we decided to reduce the number of metadata items to only five. We based our decision on: eliminating duplication of items that contain similar information (for example, contact-name and contact-email); eliminating information that could be useful for other parties in the localisation process, such as the project manager or the tool itself, but might not represent valuable information for the translator (job-id, tool id, tool name, company name and tool version); and finally eliminating information about the source language, which was only English in our experiment. We included the following metadata items in the main study: contact name, which contained information about the translator; date, which contained the date of the translation; target language, which contained information about the target language; category, which contained information about the topic of the text; and original, which contained the name of the original source file from which the segment was extracted.
25 In the case of the pilot study, as we were working with a single document, we automatically aligned it with its corresponding Spanish translation file using the tool Stingray Document Aligner by MaxPrograms.
26 We added the metadata information, although it was deleted later in the process in the case of the text that Group B received.
Reliable and unreliable metadata
Half of the segments included reliable metadata, that is, information that was meaningful to the participants, and half of the segments included unreliable metadata, that is, information that was meaningless to the participants. Reliable and unreliable metadata were distributed randomly across the segments; however, each segment contained either reliable or unreliable information in all five metadata items, and the two were never mixed.
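A random half-and-half distribution of this kind could be produced as in the Python sketch below. The text does not say how the randomisation was actually carried out, so the shuffling approach, the function name and the optional seed are assumptions for illustration only.

```python
import random


def assign_reliability(n_segments=36, seed=None):
    """Randomly label half of the segments reliable and half unreliable.

    Each label applies to all five metadata items of a segment, so the two
    kinds of information are never mixed within one segment.
    """
    if n_segments % 2:
        raise ValueError("n_segments must be even to split into halves")
    half = n_segments // 2
    labels = ["reliable"] * half + ["unreliable"] * half
    random.Random(seed).shuffle(labels)  # a fixed seed makes the layout repeatable
    return labels
```

The returned list gives one label per segment, in document order.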
4.5.3.1. Reliable information.
Author
In the segments that contained what we called “reliable information”, the author was always “Antonio García”. That information was included in the translation assignment given to the translators; therefore, they knew that he was the official translator. The name “Antonio García” was artificially created from the most popular27 name (for males) and surname in Spain.
Target language
The target language for the segments that contained “reliable” metadata was always Spanish from Spain (es-ES). In the assignment instructions, we had indicated that they had to translate into that specific locale.
Date
The date was always later than 2008, as we informed the participants in the translation assignment that official translations were done after that year. We created the artificial dates by randomly obtaining numbers: for the day (numbers between 1 and 28), for the month (numbers between six and seven) and for the year, we always picked the year 2009.
27 We consulted the Spanish National Institute of Statistics: http://www.ine.es/daco/daco42/nombyapel/nombyapel.htm
Category
In the category item (the topic of the text) we included one of these three keywords, “spreadsheet”, “database” and “worksheet”, which are directly related to the Excel program (participants were informed that they were translating official Excel documentation). The last two words were obtained from the Microsoft Word thesaurus after inserting the keyword “spreadsheet”.
Original
In the original item, we included the name of the original file, but we added or deleted one of the letters of the original title, so that participants could not have direct access to the official original or translated document (as most of them are publicly available on the official Microsoft webpage). This is an example of a set of “reliable” provenance metadata items included in one of the translation suggestions:
4.5.3.2. Unreliable information
Author and target language
In the segments that contained “unreliable” information, the author and target language were related. For the languages, we assigned half of the segments (9) the Spanish from Spain locale (es-ES), and half of the segments Latin American Spanish locales from the four most populated countries of that world region28: Spanish from Mexico (es-MX), Spanish from Argentina (es-AR), Spanish from Perú (es-PE), and Spanish from Colombia (es-CO). In the case of the segments marked with the Spanish from Spain locale, we included one of the following names: José González, Manuel Rodríguez, María Fernández or Carmen29 López. Those names were artificially created by joining the most popular names and surnames30 for males and females in Spain in 2010 (excluding the above-mentioned “Antonio García”). In the case of the Latin American names, we created them by joining the most popular31 names and surnames of people of those nationalities living in Spain: Miguel Ángel González for Argentina, Alejandro García for Mexico, Luis Alberto Rodríguez for Perú, and Juan Carlos García for Colombia.
28 Based on data from the World Population Prospects: The 2010 Revision report published by the UN: http://esa.un.org/unpd/wpp/Excel-Data/DB02_Stock_Indicators/WPP2010_DB2_F01_TOTAL_POPULATION_BOTH_SEXES.XLS
Date
We artificially created dates that were earlier than 2008 using a random number generation method: for each day we selected a number between 1 and 28, for each month we selected a number between one and twelve, and for each year we selected a number between 2004 and 2007.
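The two date rules (reliable dates in June or July 2009, unreliable dates between 2004 and 2007) can be summarised in a short Python sketch. The function names are hypothetical and the random number generator is an assumption; the text only states that the numbers were obtained randomly within the given ranges.

```python
import random
from datetime import date


def reliable_date(rng):
    """Reliable dates: day 1-28, month June or July, year always 2009."""
    return date(2009, rng.randint(6, 7), rng.randint(1, 28))


def unreliable_date(rng):
    """Unreliable dates: day 1-28, any month, year between 2004 and 2007."""
    return date(rng.randint(2004, 2007), rng.randint(1, 12), rng.randint(1, 28))


# Example use with a seeded generator so that the output can be reproduced.
rng = random.Random(2012)
sample_reliable = reliable_date(rng)
sample_unreliable = unreliable_date(rng)
```

Capping the day at 28 keeps every generated date valid in any month, which is presumably why that upper limit was chosen.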
Original
We randomly selected five file names from each of the following Microsoft Office products included in the documentation donated by Microsoft: Access, PowerPoint, Outlook and Word32. If the original name contained numbers or a special identification code, we eliminated them to create a more human-readable file name.
Category
This item in each suggestion was related to the original information. If the name of the file came from a specific program, we introduced a keyword related to that program (which corresponded to the main product that they manipulated). In the case of file names from the Access documentation we included the keyword “database”, from the PowerPoint documentation the keyword “presentation”, from the Outlook documentation the keyword “email” and from the Word documentation the keyword “text”.
29 The most common names for females were “María Carmen”, “María” and “Carmen”; we decided to use “María” and “Carmen” individually and discard “María Carmen”.
30 We consulted the Spanish National Institute of Statistics: http://www.ine.es/daco/daco42/nombyapel/nombyapel.htm
31 We consulted the Spanish National Institute of Statistics: http://www.ine.es/daco/daco42/nombyapel/nombyapel.htm
32 There were other Microsoft products apart from the four above mentioned in the documentation we received.
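The pairing of the “original” and “category” items described above for unreliable metadata can be expressed as a small lookup, sketched here in Python. The four keywords come from the text; the function name, the structure of the input and any file names passed to it are hypothetical.

```python
import random

# Keyword used in the category item for each source product (from the text).
CATEGORY_BY_PRODUCT = {
    "Access": "database",
    "PowerPoint": "presentation",
    "Outlook": "email",
    "Word": "text",
}


def pick_unreliable_originals(files_by_product, per_product=5, seed=None):
    """Randomly pick file names per product and attach the matching keyword."""
    rng = random.Random(seed)
    pairs = []
    for product, keyword in CATEGORY_BY_PRODUCT.items():
        for name in rng.sample(files_by_product[product], per_product):
            pairs.append({"original": name, "category": keyword})
    return pairs
```

Keeping the keyword tied to the product ensures that the category item of a suggestion is always consistent with its original item.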
4.6. Logistics
For the main study, the call for participation (available in Appendix L), sent through several mailing lists, stated that participants should contact Lucía Morado by email. Around one hundred requests were received. An email with instructions on how to register, along with the official information sheet33 approved by the ethics committee of the University of Limerick, was sent to those who showed interest. Participants were asked to choose two timeslots: one for attending the Swordfish webinar, and one for carrying out the experiment. The allocation of the participants was done using a web-based system (www.doodle.com) where participants could choose the time that best suited them (from a set of timeslots that we had previously entered in the system). More than 50 participants34 attended the webinar, which took place in thirteen different timeslots in order to accommodate all the requests.
The webinar had three different sections: an introduction to the research, an overview of Swordfish’s main functionalities with a live demonstration, and a description of the experiment’s instructions (access to the server, recording program, translation assignment, and the configuration of the match properties floating window, which contains the metadata information in the tool). Participants were encouraged to ask any questions about the use of the tool or about the experiment. The webinar not only helped us to attract more participants (interested in learning about the CAT tool), but also allowed us to make the participants more familiar with Swordfish and with the task that they would have to carry out.
33 See Appendix M.
34 It was difficult to track the attendance of participants who joined the webinar by telephone instead of through the web version, as their data was not registered in the system automatically.
The experiments took place from early December 2011 to the end of January 2012. Four slots were offered per day from Monday to Saturday in order to accommodate all the participants and their different timetable requirements. We planned two hours for each participant with an hour’s gap between participants, which allowed us to access the server, collect the data from the previous participant, change the password and prepare the account for the next participant. We also offered other timetable possibilities to some participants who could not attend during the official experiment hours (on Sundays, or very early in the morning or late in the evening).
Freelance translators do not always have a fixed timetable, and sometimes they receive unexpected and urgent assignments. We had to postpone some of the scheduled experiments due to those circumstances. We were flexible and tried to accommodate the needs of all the participants.
A reminder email was sent to participants the day before their scheduled experiment. An hour before the scheduled experiment, we sent each participant an email with instructions on how to access the server and his/her specific password, and also with the URLs of the two questionnaires that he/she had to complete. We gave them our Skype contact in case they had any technical problem during the experiment, and we were also available through email during the time of the experiment.
In February 2012 we raffled three full licences of Swordfish III among those participants who completed the experiment.
4.7. Translation assignment instructions
In the main study, participants received translation assignment instructions before starting to translate. Participants from all groups were told to translate the document present on the server from English into Spanish from Spain (es-ES). They were also informed that the document was made of segments coming from different official Excel documentation documents that might or might not be related to one another.
In the case of Group A, we informed them that we did not provide them with any translation memory and we asked them not to create any translation or terminology memory database.
In the case of Group B, we informed the participants that we provided them with a translation memory; however, no information about its origin was presented, and we also told them that the suggestions could come from official or unofficial translations. We told them to use the translation memory if they considered it necessary. We reminded them of the shortcut that they could use to automatically copy the translation suggestion to the translation editor section in Swordfish. The last instruction given to them was the recommendation that they should use their common sense when they had to make decisions. Bowker (2005), in a translation experiment carried out with students, highlighted this point, stating that students should always be reminded that blindly following the TM was not the most advisable thing to do. Guerberof (2009) also suggested that, when faced with translation suggestions with a high fuzzy match percentage, translators might trust the “content that flows naturally without necessarily critically checking accuracy against the source text”35.
In the case of Group C, we informed the participants that we provided them with a translation memory with provenance information. We told them that, to visualise the provenance information, they should activate the “Match Properties” window. We told them the type of metadata information that would be shown in that window: name of the translator (contact-name), date of the translation (date), target language of the translation suggestion (target-language), topic of the text (category) and name of the original file from which the suggestion came (original). We also indicated to them that some suggestions came from the official translations of the Excel documentation and others did not. We told them that the official translator of the Office documentation was Antonio García, but that there could be translations from other translators. We also told them that official translations were created after 2008 and that there could be translations from other Spanish locales. As we did with Group B, we reminded them of the shortcut to copy the translation suggestion to the translation editor section in Swordfish, and we told them to use the TM if they found it necessary and to use their common sense when they had to make decisions.
35 This statement was made in the context of her study, in which she compared gains in productivity between translation suggestions (within an 80-90% fuzzy match range) and translation suggestions produced by a machine translation system.
4.8. Data collection methods
The use of more than one data generation method to corroborate findings and enhance their validity is called method triangulation. (Oates 2006 p.37)
In our experiments we decided to use method triangulation, which implies the use of two or more data collection methods. The use of more than one data generation method allows us to obtain more data, to corroborate the results by comparing the data from different methods, and to enhance the validity of our results (Oates 2006 p.37).
We decided to use data collection methods that helped us to measure all the changes that could be caused by the manipulation of the independent variable or by any other factor that might influence the results. The independent variable was the use of a translation memory (TM) and of a translation memory with metadata (TM+MD); the dependent variables in our experiment were time, quality and other changes in behaviour that could be observed in the videos.
Two main data generation methods were put into place: questionnaires and observations. One questionnaire that aimed to obtain demographic information from the participants was fulfilled before the translation task; and a second questionnaire, which aimed to obtain the participant impression on the task, was fulfilled after the translation task. A complete description on the design of the questionnaires can be found in the next sections. During the translation task we used a recording software program (BB flashback Express in the main study) that allowed us to obtain two types of information: a video with the movements of the screen and a XML file that contained keylog information (all the typing efforts of each of the participants). The last type of data that we obtained was the translation file itself that participants had to translate. The data obtained from all these methods is presented in Chapter 5 and analysed and interpret in Chapter 6.
4.8.1. Design of the demographic questionnaire
The main purpose of the demographic questionnaire was to obtain a complete profile of the participants. We aimed at three different types of data: demographic data (age and gender), translation experience (current position, years of experience, daily translation activity, other translation related activities, and main and secondary language combinations), and experience with Computer Assisted Translation (CAT) tools (use of CAT tools and Translation Memories, familiarity with the tool used in the experiment and familiarity with the XLIFF standard). The answers to these questions allowed us to better describe our participants and also to compare results not only between the different groups but also across other variables, such as years of experience. These questions also allowed us to assess the degree of internal validity of our experiment, as we could demonstrate that participants across the three groups had similar characteristics on average (similar years of experience, experience with CAT tools, etc.).
The questionnaire had open and closed questions. Closed questions contained predefined answers that participants could select from; where appropriate, an extra “other” answer was given where participants could introduce their own answer. In open questions, a text box was provided for participants to introduce their answers. To obtain a uniform set of answers and to better explain the questions, an example answer was also provided (see question 14 in the explanatory table for an example).
Different types of quantitative data were obtained from different questions:
Nominal data “which describes categories and has no actual numeric value” (Oates 2006 p.247). Nominal data was obtained from questions 8 and 9.
Ordinal data where “numbers are allocated to a quantitative scale” (Oates 2006 p.247). Ordinal data was obtained from questions 16, 17, 18 and 19.
Interval data “is like ordinal data, but now measurements are made against a quantitative scale where the differences, or intervals, between points of the scale are consistently the same size, that is, the ranking of the categories is proportionate” (Oates 2006 p.247). Interval data was obtained from question 7.
Ratio data “is like interval data, but there is a true zero to the measurement scale being used” (Oates 2006 p.247). Ratio data was obtained from questions 7, 11 and 12.
Qualitative data was obtained from questions 6, 13, 14 and 15. Whenever possible, we analysed that data following quantitative techniques, for example by counting the number of participants who introduced “English into Spanish” as their main language combination. Qualitative data was also obtained from the “Other” text boxes in some questions. A summary of the results obtained from this questionnaire can be found in Chapter 5. A copy of the questionnaire that the participants completed can be found in Appendix A; the complete set of answers to the questionnaire can be found in Appendix D.
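The levels of measurement listed above constrain which summary statistics are meaningful for each question. The following minimal Python sketch illustrates the usual convention (mode for nominal data, mode and median for ordinal data, and additionally the mean for interval and ratio data). The function name `summarise` and the sample answers are hypothetical and are not taken from the study data.

```python
from statistics import mean, median, mode

# Central-tendency measures conventionally considered meaningful
# at each level of measurement.
PERMITTED = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}

_FUNCS = {"mode": mode, "median": median, "mean": mean}


def summarise(values, level):
    """Return only the statistics that make sense for the given level."""
    return {name: _FUNCS[name](values) for name in PERMITTED[level]}


# Hypothetical answers, for illustration only.
gender = ["Female", "Female", "Male"]   # nominal (e.g. question 8)
cat_usage = [1, 2, 2, 5]                # ordinal scale codes (e.g. question 16)
hours_per_day = [2, 3, 3, 4]            # ratio (e.g. question 12)

nominal_summary = summarise(gender, "nominal")
ordinal_summary = summarise(cat_usage, "ordinal")
ratio_summary = summarise(hours_per_day, "ratio")
```

Restricting the statistics per level in one place avoids, for example, averaging the codes of a nominal variable by accident.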
In the following table you can find a detailed analysis of the questionnaire:
Section | Question | Explanation
Introduction | (none) | This section included an introduction that informed the participants about the questionnaire and reassured them of the confidentiality of their data, which would be used only for scientific purposes. We also stated that personal data would not be revealed under any circumstances.
Consent Form | 1, 2, 3, 4 | This section included the consent form with the following statements: 1. I confirm that I have read the Information Sheet regarding the proposed research and understand the nature of the study. 2. I acknowledge that I have had the opportunity to ask the researchers involved in this research any questions that I might have and these have been met to my satisfaction. 3. I understand the terms under which I am participating. 4. I understand that the information that I give may be used by researchers involved in this research and that my participation is voluntary and that I may withdraw myself and my data at any time throughout the study until the research is submitted. If all the points were accepted, participants could continue answering the questionnaire.
Confidential Agreement | 5 | This section included a confidentiality agreement where participants agreed not to reveal any detail of the experiments (procedures, data, content and software used) to any third parties. If they accepted this condition they could continue answering the questionnaire.
Personal Data | 6. Participant Name. (e.g. Participant22) | This question aimed to obtain the participant name in order to identify the participant's data later in the analysis. The participant name was given to the participants in the first interaction that we had; it was not related to his/her name or to the group affiliation. Answering this question was compulsory. A text box was provided where participants could introduce their information.
Personal Data | 7. Date of Birth | This question aimed to obtain information about the age of participants. A text box was provided where participants could introduce their information.
Personal Data | 8. Gender | This question aimed to obtain the gender information of the participants. Two options were provided to the participants: Male and Female.
Translation Experience | 9. Current Position (if applicable, check more than once) | This question aimed to obtain information about the current position of the participants. Three predefined answers were proposed (Translation Student, Freelance Translator and In-house Translator); a fourth “other” option was also offered with a text box where participants could introduce additional information.
Translation Experience | 10. If you are a translation student, specify in which year are you? (e.g. I am in my 3rd year). | This question, to be completed only by translation students, aimed to obtain information about the year of studies. A text box was provided where participants could introduce their information.
Translation Experience | 11. If you work as a translator, how long have you been working professionally? (E.g. three years) | This question, to be completed only by translation professionals, aimed to obtain information about their years of translation experience. This information was a deciding factor in selecting the valid participants for our analysis. We considered only those participants who had answered this question and who had stated two or more years of experience. A text box was provided where participants could introduce their information.
Translation Experience | 12. Translation activity, in hours per day: | This question aimed to obtain the number of hours that translators spent every day on translation activities. Eleven predefined answers were offered to the participants, and a twelfth “other” option was also offered with a text box where participants could introduce additional information.
Translation Experience | 13. Do you carry out other translation related activities on your daily work (E.G. “I do postediting") | This question aimed to obtain information about other translation related activities that participants might do other than translating. A text box was provided where participants could introduce their information.
Translation Experience | 14. Main working language combination(s). (E.g. English into Spanish) | This question aimed to obtain information about the main working language combination of the participants. A text box was provided where participants could introduce their information.
Translation Experience | 15. Other working language combination(s). (E.g. French into Spanish) | This question aimed to obtain information about the other/secondary working language combination of the participants. A text box was provided where participants could introduce their information.
Experience with Computer Assisted Translation (CAT) tools | 16. Use of CAT tools. How often do you use them? | This question aimed to obtain information about the participants' use of CAT tools in their work. Five predefined answers were given to the participants, ranked from more to less use of CAT tools; a sixth “other” option was also offered with a text box where participants could introduce additional information.
Experience with Computer Assisted Translation (CAT) tools | 17. Use of Translation Memories. How often do you use them? | This question, which was related to the previous one, aimed to obtain information about the participants' use of Translation Memories in their work. Five predefined answers were given to the participants, ranked from more to less use of Translation Memories; a sixth “other” option was also offered with a text box where participants could introduce additional information.
Experience with Computer Assisted Translation (CAT) tools | 18. Familiarity with Swordfish. How well do you know this CAT tool? | This question aimed to obtain information about the participants' familiarity with the tool that would be used in the experiment (Swordfish). Five predefined answers were given to the participants, ranked from more to less familiarity with the tool; a sixth “other” option was also offered with a text box where participants could introduce additional information.
Experience with Computer Assisted Translation (CAT) tools | 19. Familiarity with XLIFF (XML Localisation Interchange File Format). | This question aimed to obtain information about the participants' familiarity with the XLIFF standard. A scale of six grades was offered to the participants, where the lowest point indicated “Total unfamiliar” and the highest point “I am very familiar with it.”
4.8.2. Design of the task specific questionnaire
The main purpose of the task specific questionnaire was to obtain information about the participants’ impressions and experience after completing the translation task: familiarity with the topic of the text, experience with Microsoft© Office products, linguistic difficulty, required assistance, doubts and consulted external resources. The three groups received the same questions; however, Group C received additional questions related to the provenance metadata that they could have consulted during the translation task. These questions aimed to obtain information about the use that participants made of that information (items consulted, usefulness, influence on their work, degree of distraction and general comments).
Similarly to the demographic questionnaire, the task specific questionnaire had open and closed questions. Different types of quantitative data were obtained from different questions:
Nominal data was obtained from questions 7 and 8 (and 9, 10, 11, 12, 15 and 17 from the Group C questionnaire).
Ordinal data was obtained from questions 3, 4 and 5 (and 16 from the Group C questionnaire).
Ratio data was obtained from questions 2 and 6.
Qualitative data was obtained from question 9 in the questionnaire for Groups A and B, and from questions 13, 14, 18 and 19 in the Group C questionnaire. A summary of the results obtained from this questionnaire can be found in Chapter 5, and an analysis and interpretation of the data can be found in Chapter 6. A copy of the questionnaire that the participants from Groups A and B completed can be found in Appendix B; the questionnaire that participants from Group C received can be found in Appendix C. The complete set of answers to the questionnaire can be found in Appendix D.
In the following table you can find a detailed analysis of the questionnaire:
Section | Question | Explanation
Identification | 1. Introduce your participant name (e.g. Participant22) | This question aimed to obtain the name of the participant, which allowed us to identify them.
Duration and Difficulty | 2. Duration. Please introduce (in minutes) the exact duration of the task. From the moment you opened Swordfish until the moment you saved your translation. If you do not know the exact time, leave it blank. The researcher can find it out by analysing your video. | This question aimed to obtain the duration of the translation task. However, in our analysis we discarded this answer, and we obtained that information from the video recordings of the participants.
Duration and Difficulty | 3. Topic of the text. How familiar were you with the topic of the text? | This question aimed to obtain information about the participants' familiarity with the topic of the text. A scale of six grades was offered to the participants, where the lowest point indicated “Total unfamiliar” and the highest point “I am very familiar with it.”
Duration and Difficulty | 4. Experience working with Microsoft© Office Products. | This question aimed to obtain information about the participants' experience with Microsoft© Office products. Five products were listed; we were particularly interested in Microsoft Excel, because the translation task consisted of the translation of segments from the online documentation of that tool. A scale of six grades was offered to the participants, where the lowest point indicated “I have never worked with it” and the highest point “I am an advanced user.”
Duration and Difficulty | 5. Linguistic difficulty | This question aimed to obtain information about how linguistically difficult the translation had been for the participants. A scale of six grades was offered to the participants, where the lowest point indicated “Very difficult (I needed to consult many terms and expressions)" and the highest point indicated “very easy (I could do it without consulting external resources).”
Duration and Difficulty | 6. Assistance. How many times did you ask for help? | This question aimed to obtain information about the number of times that participants required assistance. Seven predefined answers were offered to the participants, and an eighth “other” option was also offered with a text box where participants could introduce additional information.
Duration and Difficulty | 7. Doubts. If you had any doubts, what were them about? (please select more than one answer if applicable). | This question aimed to obtain information about the nature of the doubts that participants had during their translation task. Four predefined answers were given to the participants (linguistic related, technical related, CAT tool related and experiment instructions); a fifth “other” option was also offered with a text box where participants could introduce additional information.
Duration and Difficulty | 8. External Resources. Did you consult any external resources? (Please select more than one answer if applicable). | This question aimed to obtain information about the external resources that participants might have used during the translation task. Four predefined answers were given to the participants (No, machine translation, online dictionaries, Microsoft© Excel webpage); a fifth “other” option was also offered with a text box where participants could introduce additional information.
Final Remarks | 9. Additional comments. Please write below any additional comment you might have. | This question offered the participants an opportunity to include any additional comment that they might have wanted to add.
Task Questionnaire Group C. Questions 1 to 8 were the same as in the previous questionnaire (the one that Group A and Group B received). Additional questions were presented in this questionnaire that referred to the provenance metadata that only this group received. In the following table you can find a detailed description of each of the questions:
Section | Question | Explanation
Metadata | 9. Which metadata item(s) did you consult? (Please select more than one answer if applicable) | This question aimed to obtain information about the metadata items that they consulted during the translation task. The five metadata items that they received were offered as predefined answers (contact-name, date, target-language, original, and category).
Metadata | 10. Which metadata item(s) did you find more useful? (Please select more than one answer if applicable) | This question, which was related to the previous one, aimed to obtain information about the metadata items that they found more useful during the translation task. The five metadata items that they received were offered as predefined answers (contact-name, date, target-language, original, and category).
Metadata | 11. Which metadata item(s) did you find less useful? (Please select more than one answer if applicable) | This question, which was related to the previous ones, aimed to obtain information about the metadata items that they found less useful during the translation task. The five metadata items that they received were offered as predefined answers (contact-name, date, target-language, original, and category).
Metadata | 12. Which metadata item(s) did you not consult? (Please select more than one answer if applicable) | This question, which was related to the previous ones, aimed to obtain information about the metadata items that they did not consult during the translation task. The five metadata items that they received were offered as predefined answers (contact-name, date, target-language, original, and category).
Metadata | 13. Could you explain in one sentence how has influenced your work? | This open question aimed to make participants think about the influence that the metadata had on their work and express it through an example.
Metadata | 14. Would you suggest any other metadata item? | This question aimed to obtain metadata item suggestions that participants might have had.
Metadata | 15. If you could choose between having a translation memory with metadata or another one without metadata, which one would you prefer? | This question aimed to know which option participants would choose if they could have the choice: Translation memory with metadata or without it. Those two options were offered as predefined answers.
Metadata | 16. How distracted were you by the metadata during the translation task? | This question aimed to obtain information about the degree of distraction that the metadata meant for the participants. A scale of six grades was offered to the participants, where the lowest point indicated “I was confused by the amount of data” and the highest point “I was not distracted at all, it only helped me.”
Metadata | 17. Did the metadata influence you to do a better job? | This question aimed to obtain a direct yes/no answer about the influence of the metadata on the participants’ work according to their own opinion. Two Boolean predefined answers were given to the participants (yes/no).
Metadata | 18. If yes, could you give us an example? | This question, which was related to the previous one, aimed to make participants think about the influence that the metadata had on their work and express it through an example. This question aimed to obtain data similar to that of question 13.
Final Remarks | 19. Additional comments. Please write below any additional comment you might have. | This question offered the participants an opportunity to include any additional comment that they might have wanted to add.
4.9. Validity
As we stated in Chapter 1, the big challenge of experiments involving translators and translation memories is that their very nature implies a wide range of dependent variables (subject variability, topic of the text, time constraints, environment used). This means that they cannot strictly be considered true experiments, and are better categorised as “quasi-experiments”. We carefully designed the experiment in order to be able to measure as many as possible of the variables that might affect the results, and we used different data collection methods which, combined, increased the validity of the results.
In quasi-experiments, there are often variables that the researchers cannot control and might have caused the measured effect. They can never, therefore, establish cause and effect as conclusively as a true experiment. (Oates 2006 p.134)
External validity
Your experiment has good external validity if your results are not unique to a particular set of circumstances but are generalizable, that is, the same results can be predicted for subsequent occasions and in other situations. (Oates 2006 p.132)
The main threats to external validity are the non-representativeness of the participants and/or the use of a reduced number of participants (ibid). In terms of representativeness, in the pilot study we used translation students, from whom we could not draw conclusions but only some hints or indicators; however, this pilot study helped us to redefine our methodology and improve some aspects of the design. In the main study our target was professional translators. Although some students also participated in the main study, we only analysed the data from those participants who had stated that they had two or more years of professional experience; participants who had not answered that question were automatically discarded from our study. Therefore the data presented in the following chapters comes from professional translators in the language combination English-Spanish. In terms of numbers, we analysed a total of 33 valid answers; to the best of our knowledge, this study represents the second biggest dataset in translation experiments carried out to date. The biggest one, which involved 90 translators, was carried out by a group of translation researchers who participated in a nationally funded project: the TRACE36 project, based at the Autonomous University of Barcelona, Spain.
Internal validity
Your experiment has good internal validity if the measurements you obtain are indeed due to your manipulations of the independent variable and not to any other factors. (Oates 2006 p.131)
To ensure the internal validity of the experiments we put into practice three main control mechanisms. The first was the use of more than one data generation method, which allowed us to reach the same conclusions by analysing different types of data. The second was the use of an external reviewer who blindly and independently assessed all the translation tasks done by the participants; this avoided any possible biased correction of the translations by the researcher. The third was the use of control questions in the questionnaire, that is, similar and opposite questions which aimed not only to obtain the information in the participants' statements, but also to help us determine whether the participants were answering consistently and not randomly.
In terms of participants, the demographic questionnaire that all participants completed showed no significant differences between the average results of the participants of each group: similar results in terms of personal data, translation experience and experience with CAT tools were found amongst the three groups. A summary of these results can be found in the first section of Chapter 6. Therefore we can state that the differences between the results obtained in the translation task might be attributable to the manipulation of the independent variable and not to the profile of the participants.
36 http://grupsderecerca.uab.cat/tradumatica/es/content/trace-traducci%C3%B3n-asistida-calidad-y-evaluaci%C3%B3n
Ecological validity
The question of whether an effect holds up across a wide variety of people or settings is somewhat different than asking whether the effect is representative of what happens in everyday life. This is the essence of ecological validity – whether an effect has been demonstrated to occur under conditions that are typical for the population at large.
(Crano et al 2001 p.110)
In the previous section we stated that we did not control the physical environment of the participants: the conditions of the room, the level of noise, external help, etc., which could distort the results of each experiment. This can be seen as detrimental to the validity of our experiment; however, we believe that it reinforces its ecological validity. Those factors are naturally present in freelance translators’ working environment. If we took those translators out of their natural working environment and studied them in a computer lab facility, where all those external factors can be minimised and controlled, we would obtain controlled results; however, they would be somewhat artificial, as they would not represent the real situations in which freelance translators work.
The only “invasive” method that we imposed on the translators was the recording of their movements, which might have slightly affected their behaviour as they were fully aware of being observed. However, once the program was started it stayed in the background and the translator could act freely. The environment on the server was prepared so that the translators could work in a simulated working environment: they had a CAT tool, the translation assignment and an internet connection with which they could consult any reference material they might have needed (for example, online dictionaries).
Freelance translators carry out their daily work in their own environment, which is not controlled or accessed by their employers. We had the same (null) control over those conditions as their employers do. We assumed that professional freelancers had their own delimited working space arranged in the way that worked best for them.
However, a controlled environment was used in the pilot study with translation students (the computer room in the Faculty of Translation at the University of Salamanca). In this case the computer room did not represent an artificial working environment for the students, as they regularly attended lessons there and had free access to that room when lessons were not taking place.
We should also say that the physical environment of the participants could have been "controlled" or "measured" if other data gathering methods had been put in place, such as video recording of the room with audio. However, these methods were discarded as we assessed them as impossible to implement in our circumstances (in the case of the main study, they would have had to be operated by the participants themselves, who might not have had the right equipment or might not have wanted to be physically recorded) and to measure.
The concept of ecological validity reinforced our initial statement where we considered our experiments as “quasi-experiments”:
Some researchers therefore use quasi-experiments (quasi means 'as if'). They try to remain true to the spirit of classic laboratory experiments, but concentrate on observing events in real-life settings where there is a ‘naturally occurring’ experiment. (Oates 2006 p.134).
Grown-ups love figures. When you tell them about a new friend, they never ask you about what really matters. They never say to you: “What does his voice sound like? What games does he like best? Does he collect butterflies?” They ask you: “How old is he? How many brothers does he have? How much does he weigh? How much does his father earn?” Only then do they think they know him. If you tell grown-ups: “I saw a beautiful house made of pink bricks, with geraniums in the windows and doves on the roof...” they cannot picture that house. You have to tell them: “I saw a house worth a hundred thousand francs.” Then they exclaim: “How pretty!”
(de Saint-Exupéry 1946 pp.19–20)
Chapter 5 – Results
5.1. Introduction
In this chapter we present the data obtained from the main study of the experimental phase. Firstly, we present each of the groups and their participants. Then we present the data obtained from the different data generation methods used in our experiments, in the following order: Demographic Questionnaire, Task Questionnaire, Translation Evaluation (results from reviewing the translation tasks of each of the participants using the LISA QA Model), and Video Observations (which include the keylog information and the video recordings). In each section the data is divided by group. Due to document length restrictions only relevant information is presented; however, the complete set of answers can be found in the appendices (Appendix D contains the answers to the Demographic Questionnaire, Appendices E and F contain the answers to the Task Questionnaire, Appendix G contains the answers to the Review of the translation task, Appendix H contains the keylog information, and Appendix I contains the templates from the video recordings). This chapter presents a descriptive analysis of the data obtained; its interpretation and discussion can be found in Chapter 6.
Much of the data presented in this section has similar attributes and may at times seem slightly repetitive; however, in the interests of thoroughness we have decided to include it all in order to give a full report of each of the variables measured.
5.2. Participants
5.2.1. Group A – No TM
This group did not receive any translation memory. It was composed of ten participants. We had to discard one of them because the second questionnaire was not entirely completed, which left nine valid responses. However, not all the participants had the same profile and we were focusing our research on experienced37 professional translators. There was one translation student, seven freelance translators and one in-house translator. We finally discarded one more participant, the in-house translator, because he/she stated that he/she did not have professional experience. The rest of the participants had at least two years of professional experience.

37 By “experienced” professional translators, we mean translators who have at least two years of professional experience. One of the questions in the demographic questionnaire aimed to obtain this information. Participants who did not answer it were automatically discarded from our analysis.
5.2.2. Group B - TM
This group received the text with a translation memory, but without provenance metadata. It was composed of ten participants. There were two translation students, five freelance translators and three in-house translators. One of the participants that we included in the freelance category worked as "Localization Specialist, Consultant, Sales & Marketing”. Of the participants that we included in the in-house category, one worked as an industry CAT supporter and another was the CEO of a translation company who had 5 years of experience as an in-house translator and 5 years of experience as a freelancer. We finally discarded two participants because they did not have the required two years of experience. Our final Group B was formed by 6 experienced professional translators.
5.2.3. Group C – TM + Provenance Metadata
This group received the text with a translation memory with provenance metadata. It was composed of 41 participants. Incompleteness of the data made us discard five of them. We had at this point 36 valid responses coming from 12 translation students, 18 freelance translators and 6 in-house translators. We discarded the translation students. From the remaining professional participants we had to discard two participants who had not answered the question about years of experience and two other participants who had only one year of professional experience. We had 20 valid participants in this group: 14 freelance translators and six in-house translators. Data from this group will be presented in the next section.

Position | Participants
Translation Student | 12
Freelance Translator | 18
In-house Translator | 6
Total | 36

Table 3. Group C. Participants' position.

5.3. Demographic Questionnaire
5.3.1. Group A
In this section we present the data obtained from Group A38 in the demographic questionnaire. We obtained the following information: age, gender, position, years of experience, hours worked per day, other translation-related activities, main language combinations, other language combinations, CAT tools usage, TM usage, familiarity with Swordfish (the tool used in the experiment) and familiarity with the XLIFF standard. This information allowed us to better define the profile of our participants and to study whether the different values obtained were related39 to the results of the experiments.
5.3.1.1. Age
The arithmetic mean is 30.6 years. The median (the middle value) is 27 and the mode (the most common value) is also 27. The standard deviation is moderately high (6.4). The distribution is positively skewed (1.4), as the frequency is higher in the low scores.
Age
Mean                 30.57143
Standard Error       2.418748
Median               27
Mode                 27
Standard Deviation   6.399405
Sample Variance      40.95238
Kurtosis             1.759253
Skewness             1.42764
Range                18
Minimum              25
Maximum              43
Sum                  214
Count                7
Table 4. Age. Group A
38 Only data from the participants that were considered valid is presented here.
39 In the next chapter we present a study on the correlation of the different values obtained.
5.3.1.2. Gender
Six of the participants were females and there was one male.
5.3.1.3. Position
All of the seven participants were freelance translators.
5.3.1.4. Experience years
The arithmetic mean is five years of professional translation experience; the median and the mode are four. The standard deviation is 4.1. The distribution is positively skewed (2.3), as the frequency is higher in the low scores.
Participant   Experience Years
GA1           4
GA2           14
GA3           5
GA4           3
GA5           2
GA6           3
GA7           4
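The summary figures for this variable can be recomputed from the per-participant values listed above. The following is a minimal sketch, assuming the sample (n - 1) formulas for the variance and the standard deviation; the function name `describe` and the dictionary keys are ours, introduced only for illustration, and are not part of the original analysis.

```python
import math
import statistics

# Years of professional experience reported by participants GA1 to GA7.
experience_years = [4, 14, 5, 3, 2, 3, 4]


def describe(values):
    """Return basic descriptive statistics using sample (n - 1) formulas."""
    n = len(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": statistics.mean(values),
        "standard_error": stdev / math.sqrt(n),
        "median": statistics.median(values),
        # multimode lists every value tied for the highest frequency
        "modes": statistics.multimode(values),
        "standard_deviation": stdev,
        "sample_variance": statistics.variance(values),
        "range": max(values) - min(values),
        "sum": sum(values),
        "count": n,
    }


summary = describe(experience_years)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the values 3 and 4 both occur twice in this group, so `multimode` returns both of them, whereas a summary table that reports a single mode shows only one of the tied values.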
Experience Years
Mean                 5
Standard Error       1.543033
Median               4
Mode                 4
Standard Deviation   4.082483
Sample Variance      16.66667
Kurtosis             5.81568
Skewness             2.345631
Range                12
Minimum              2
Maximum              14
Sum                  35
Count                7
Table 5. Experience Years. Group A
5.3.1.5. Hours per day
One of the participants did not select one of the predefined options for this answer. Two participants stated that they worked eight hours a day, one participant stated that he/she worked five hours a day, one participant worked three hours a day and two participants worked two hours a day. The arithmetic mean is 4.7, the median is 4 and the mode is 8. The standard deviation is 2.8. The distribution has a weak positive skew (0.4), as the frequency is slightly higher in the lower scores.
Hours per day
Mean                 4.666667
Standard Error       1.145038
Median               4
Mode                 8
Standard Deviation   2.804758
Sample Variance      7.866667
Kurtosis             -2.22565
Skewness             0.429053
Range                6
Minimum              2
Maximum              8
Sum                  28
Count                6
Table 6. Hours per day. Group A
Hours per day        Participants
Less than one hour   0
1                    0
2                    2
3                    1
4                    0
5                    1
6                    0
7                    0
8                    2
n/a                  1
5.3.1.6. Other translation-related activities
When asked whether they carried out other translation-related activities, two of the participants stated that they did not do anything apart from translation. Two of the participants stated that they did proofreading. Two participants also worked as teachers. One participant worked as an interpreter, and one participant did editing and terminology management (as well as proofreading).
Participant   Other activities
GA1           I do interpreting
GA2           Proofreading, editing, terminology management.
GA3           I review and I´m a teacher, too.
GA4           No.
GA5           Proofreading.
GA6           No.
GA7           I teach English
Table 7. Other translation-related activities. Group A
5.3.1.7. Main language combinations
All seven participants had EN-ES40 as their main language combination. As well as EN-ES, two participants stated that they worked with the DE-ES combination, and one participant with the ES-EN, FR-EN and FR-ES combinations.
Figure 15. Main language combinations. Group A
5.3.1.8. Other language combinations
Among the other language combinations, two participants stated FR-ES, and each of the following combinations was stated by only one participant: FR-CA, IT-ES, GL-ES, EN-CA, IT-EN and AR-ES.
40 We use two-letter language codes to represent the names of the languages: EN for English, ES for Spanish, DE for German, FR for French, IT for Italian, GL for Galician, CA for Catalan, PT for Portuguese, DU for Dutch and AR for Arabic.
Figure 16. Other language combinations. Group A.
5.3.1.9. CAT tools usage
Six possible answers were offered to the participants, ordered from higher use to lower use: 1. I use them in all my translation activities. 2. I use them only with some of my translation assignments. 3. I use them only when I am required to do so. 4. I have tried them, but I do not use them for my daily work. 5. I have never tried. 6. Other.
Four participants chose option 1 (I use them in all my translation activities), two participants chose option 2 (I use them only with some of my translation assignments), and one participant chose option 3 (I use them only when I am required to do so).
The arithmetic mean is 1.6; the median and the mode are 1. The standard deviation is 0.8. The distribution is positively skewed (1.1), as the frequency is higher in the low scores.
CAT tools usage
Mean                 1.571429
Standard Error       0.297381
Median               1
Mode                 1
Standard Deviation   0.786796
Sample Variance      0.619048
Kurtosis             0.273373
Skewness             1.11455
Range                2
Minimum              1
Maximum              3
Sum                  11
Count                7
Figure 17. CAT tools usage. Group A
5.3.1.10. TM usage
In this question we asked the participants how often they used translation memories. The same answer options as in the previous question were offered. Participants from this group submitted the same scores for this question as for the previous one:
Four participants chose option 1 (I use them in all my translation activities), two participants chose option 2 (I use them only with some of my translation assignments), and one participant chose option 3 (I use them only when I am required to do so).
The arithmetic mean is 1.6; the median and the mode are 1. The standard deviation is 0.8. The distribution is positively skewed (1.1), as the frequency is higher in the low scores.
Figure 18. TM Usage. Group A
TM usage
Mean                 1.571429
Standard Error       0.297381
Median               1
Mode                 1
Standard Deviation   0.786796
Sample Variance      0.619048
Kurtosis             0.273373
Skewness             1.11455
Range                2
Minimum              1
Maximum              3
Sum                  11
Count                7
5.3.1.11. Swordfish
Participants were asked about their familiarity with Swordfish (the CAT tool used in the experiment). We asked how well they knew that CAT tool, and six predefined answers were offered: “I use it on a daily basis”, “I use it on some of my projects”, “I have tried it, but I do not use it”, “I have heard of it, but I have never tried it before”, “I have never heard of it”, and “other”, where they could introduce their own answer.
One of the participants selected option 3 (I have tried it, but I do not use it), five participants selected option 4 (I have heard of it, but I have never tried it before) and one of the participants selected option 5 (I have never heard of it).
The arithmetic mean is 4; the median and the mode are also 4. The standard deviation is 0.6. The distribution is not skewed (0), as the frequency is concentrated in the medium scores.
Figure 19. Swordfish. Group A
Swordfish
Mean                 4
Standard Error       0.218218
Median               4
Mode                 4
Standard Deviation   0.57735
Sample Variance      0.333333
Kurtosis             3
Skewness             0
Range                2
Minimum              3
Maximum              5
Sum                  28
Count                7
5.3.1.12. XLIFF
A six-point scale was offered to answer this question, with value 1 being “total unfamiliar” and value 6 “I am very familiar with it”.
Two participants selected option 1 (total unfamiliar), two participants selected option 2, one participant selected option 3 and two participants selected option 6 (very familiar).
The arithmetic mean is 3; the median and the mode are 2. The standard deviation is 2.2. The distribution is positively skewed (0.8), as the frequency is higher in the low scores.
Figure 20. XLIFF. Group A
XLIFF
Mean                 3
Standard Error       0.816497
Median               2
Mode                 2
Standard Deviation   2.160247
Sample Variance      4.666667
Kurtosis             -1.2
Skewness             0.833238
Range                5
Minimum              1
Maximum              6
Sum                  21
Count                7
5.3.2. Group B
We present in this section the data obtained from Group B.
5.3.2.1. Age
The arithmetic mean is 32.7 years. The median is 33.5 and the mode is 36. The standard deviation is moderately high (4.9). The distribution is negatively skewed (-0.6), as the frequency is higher in the high scores.
Age
Mean                 32.66667
Standard Error       1.994437
Median               33.5
Mode                 36
Standard Deviation   4.885352
Sample Variance      23.86667
Kurtosis             -0.63505
Skewness             -0.63924
Range                13
Minimum              25
Maximum              38
Sum                  196
Count                6
5.3.2.2. Gender
Four of the participants were female and two were male.
5.3.2.3. Position
There were three freelance translators (GB2, GB4 and GB5)41 and three in-house translators (GB1, GB3 and GB6). One of the freelance translators (GB2) worked as “Localization Specialist, Consultant, Sales & Marketing”. One of the participants that we included in the in-house category worked as an industry CAT supporter, and another was the CEO of a translation company who had five years' experience as an in-house translator and five years' experience as a freelancer.
41 We used participant names in our analysis to identify their data. Names for participants from Group A started with GA, from Group B with GB and from Group C with GC.
5.3.2.4. Experience years
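The arithmetic mean is 8.5 years of translation experience, the median is 8.5 and the mode is 4. The standard deviation is 5.8. The six values are symmetric about the mean (deviations of ±4.5, ±4.5 and ±6.5 years), so the distribution is not skewed; the skewness figure reported in Table 8 should be read as a near-zero value, most likely truncated from scientific notation.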
Participant   Experience Years
GB1           13
GB2           13
GB3           15
GB4           4
GB5           2
GB6           4
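The skewness for this variable can be checked directly against the six values listed above. The sketch below assumes the adjusted Fisher-Pearson sample skewness (the formula used by common spreadsheet SKEW functions); the function name `sample_skewness` is ours, introduced only for illustration, and is not part of the original analysis.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    cubed = sum(((x - mean) / stdev) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Years of experience for GB1 to GB6, as listed in the table above.
group_b_years = [13, 13, 15, 4, 2, 4]
skew_b = sample_skewness(group_b_years)
print(f"Group B skewness: {skew_b:.6f}")
```

Because the six values are symmetric about their mean of 8.5, the cubed deviations cancel out and the result is zero up to floating-point rounding.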
Experience Years
Mean                 8.5
Standard Error       2.348759
Median               8.5
Mode                 4
Standard Deviation   5.75326
Sample Variance      33.1
Kurtosis             -2.92102
Skewness             -6.66133
Range                13
Minimum              2
Maximum              15
Sum                  51
Count                6
Table 8. Experience Years. Group B
5.3.2.5. Hours per day
Two participants stated that they worked 8 hours a day, one participant stated that he/she worked 5 hours a day, and one participant stated that he/she worked 3 hours a day. One participant stated that he/she worked one hour a day and another participant selected the option “less than one hour”. The arithmetic mean is 4.2, the median is 4 and the mode is 8. The standard deviation is 3.4. The distribution has a very weak positive skew (0.06).
Hours per day
Mean                 4.166667
Standard Error       1.400397
Median               4
Mode                 8
Standard Deviation   3.430258
Sample Variance      11.76667
Kurtosis             -2.02257
Skewness             0.056157
Range                8
Minimum              0
Maximum              8
Sum                  25
Count                6
Table 9. Hours per day. Group B
Hours per day        Participants
Less than one hour   1
1                    1
2                    0
3                    1
4                    0
5                    1
6                    0
7                    0
8                    2
n/a                  0
5.3.2.6. Other translation-related activities
When asked whether they carried out other translation-related activities, two of the participants stated that they did not do anything apart from translation. One of the participants ran demos on how to translate. One participant stated “Editing, QA, LQA, format conversion, project management”. One participant did proofreading and DTP. One participant also worked as a language teacher.
Participant   Other activities
GB1           Run demos on how to translate.
GB2           No.
GB3           Editing, QA, LQA, format conversion, project management.
GB4           I teach French and English.
GB5           No.
GB6           I do proofreading and some DTP sometimes.
Table 10. Other translation-related activities. Group B
5.3.2.7. Main language combinations
Five participants had EN-ES as their main language combination. One participant had FR-ES as well as EN-ES. The participant who did not have EN-ES as main language combination had EN-FR.
Figure 21. Main language combinations. Group B
5.3.2.8. Other language combinations
Among the other language combinations, three participants stated ES-FR, two participants PT-ES, two participants FR-ES, one participant ES-EN and one participant IT-ES. One of the participants (GB5) also indicated further language combinations (Danish, Swedish, Norwegian, Spanish, Portuguese and Greek to French or Spanish), which were not included in the figure below.
Figure 22. Other language combinations. Group B
5.3.2.9. CAT tools usage
Six possible answers were offered to the participants, ordered from higher use to lower use: 1. I use them in all my translation activities. 2. I use them only with some of my translation assignments. 3. I use them only when I am required to do so. 4. I have tried them, but I do not use them for my daily work. 5. I have never tried. 6. Other.
Five participants chose option 1 (I use them in all my translation activities), and one participant chose option 4 (I have tried them, but I do not use them for my daily work).
The arithmetic mean is 1.5; the median and the mode are 1. The standard deviation is 1.2. The distribution is positively skewed (2.4), as the frequency is higher in the low scores.
CAT tools usage
Mean                 1.5
Standard Error       0.5
Median               1
Mode                 1
Standard Deviation   1.224744871
Sample Variance      1.5
Kurtosis             6
Skewness             2.449489743
Range                3
Minimum              1
Maximum              4
Sum                  9
Count                6
Figure 23. CAT tools usage. Group B
5.3.2.10. TM usage
In this question we asked the participants how often they used translation memories. The same answer options as in the previous question were offered. As in Group A, participants submitted the same scores for this question as for the previous one:
Five participants chose option 1 (I use them in all my translation activities), and one participant chose option 4 (I have tried them, but I do not use them for my daily work).
The arithmetic mean is 1.5; the median and the mode are 1. The standard deviation is 1.2. The distribution is positively skewed (2.4), as the frequency is higher in the low scores.
TM usage
Mean                 1.5
Standard Error       0.5
Median               1
Mode                 1
Standard Deviation   1.224744871
Sample Variance      1.5
Kurtosis             6
Skewness             2.449489743
Range                3
Minimum              1
Maximum              4
Sum                  9
Count                6
Figure 24. TM Usage. Group B
5.3.2.11. Swordfish
Participants were asked about their familiarity with Swordfish (the CAT tool used in the experiment). We asked how well they knew that CAT tool, and six predefined answers were offered: “I use it on a daily basis”, “I use it on some of my projects”, “I have tried it, but I do not use it”, “I have heard of it, but I have never tried it before”, “I have never heard of it”, and “other”, where they could introduce their own answer.
One of the participants selected option 2 (I use it on some of my projects), four participants selected option 4 (I have heard of it, but I have never tried it before) and one of the participants selected option 5 (I have never heard of it).
The arithmetic mean is 3.8; the median and the mode are 4. The standard deviation is 1.0. The distribution is negatively skewed (-1.4), as the frequency is concentrated in the higher scores.
Swordfish
Mean                 3.833333
Standard Error       0.401386
Median               4
Mode                 4
Standard Deviation   0.983192
Sample Variance      0.966667
Kurtosis             3.602854
Skewness             -1.43796
Range                3
Minimum              2
Maximum              5
Sum                  23
Count                6
Figure 25. Swordfish. Group B
5.3.2.12. XLIFF
A six-point scale was used for answering this question, with value 1 being “total unfamiliar” and value 6 “I am very familiar with it”.
Three participants selected option 6 (very familiar), one participant selected option 4, one participant selected option 2, and one participant selected option 1 (total unfamiliar).
The arithmetic mean is 4.2, the median is 5 and the mode is 6. The standard deviation is 2.2. The distribution is negatively skewed (-0.6), as the frequency is higher in the higher scores.
XLIFF
Mean                 4.166667
Standard Error       0.909823
Median               5
Mode                 6
Standard Deviation   2.228602
Sample Variance      4.966667
Kurtosis             -1.80938
Skewness             -0.63542
Range                5
Minimum              1
Maximum              6
Sum                  25
Count                6
Figure 26. XLIFF. Group B
5.3.3. Group C
We present in this section the data obtained from Group C from the demographic questionnaire.
5.3.3.1. Age
Three participants did not answer this question. The arithmetic mean is 32.4 years. The median (the middle value) is 28 and the mode (the most common value) is 26. The standard deviation is high (8.9), partly because of the extreme values (23 and 54). The distribution is positively skewed (1.3), as the frequency is higher in the low scores.
Age
Mean                 32.41176471
Standard Error       2.167708914
Median               28
Mode                 26
Standard Deviation   8.93769282
Sample Variance      79.88235294
Kurtosis             0.723871823
Skewness             1.309486976
Range                31
Minimum              23
Maximum              54
Sum                  551
Count                17
5.3.3.2. Gender
There were four males and fifteen females. One participant did not answer this question.
5.3.3.3. Position
There were 14 freelance translators and 6 in-house translators among the participants. Among the freelance translators, one stated that he/she worked part-time as a freelance translator, another worked as a “medical linguistic specialist” and another stated that he/she worked as a “Translation Senior Project Manager”.
5.3.3.4. Experience years
The arithmetic mean is 7 years of translation experience, the median is 4.5 and the mode is 4. The standard deviation is 5.2. The distribution is positively skewed (1.2), as the frequency is higher in the low scores.
Experience Years
Mean                 7.05
Standard Error       1.170863379
Median               4.5
Mode                 4
Standard Deviation   5.236260216
Sample Variance      27.41842105
Kurtosis             -0.087322733
Skewness             1.178329887
Range                16
Minimum              2
Maximum              18
Sum                  141
Count                20
Table 11. Experience years. Group C
Participant   Experience Years
GC1           16
GC2           17
GC3           2
GC4           4
GC5           12
GC6           4
GC7           5
GC8           4
GC9           7
GC10          14
GC11          4
GC12          7
GC13          18
GC14          4
GC15          2
GC16          3
GC17          3
GC18          5
GC19          6
GC20          4
5.3.3.5. Hours per day
Twelve participants worked 8 hours per day, one participant 7 hours, two participants 6 hours, two participants 4 hours and one participant less than one hour. Two of the participants did not provide an exact number but added information in the “other” box: GC3 said “I do not work every day, depends on the activity”, and GC20 said “I used to translate sporadically as I used to work as a full time software engineer.” Their answers were therefore treated as “no answer” in our calculation phase. The option “less than one hour” was given a value of 0. The arithmetic mean is 6.8, the median is 8 and so is the mode. The standard deviation is low (2.2), as there is not much variance between scores. The distribution is negatively skewed (-2.2), as the frequency is higher in the higher scores.
Hours per day
Mean                 6.833333333
Standard Error       0.512905347
Median               8
Mode                 8
Standard Deviation   2.176073096
Sample Variance      4.735294118
Kurtosis             5.106886308
Skewness             -2.226359265
Range                8
Minimum              0
Maximum              8
Sum                  123
Count                18
Table 12. Hours per day. Group C
Hours per day        Participants
Less than one hour   1
1                    0
2                    0
3                    0
4                    2
5                    0
6                    2
7                    1
8                    12
n/a                  2
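The summary figures for this question can be derived from the frequency table alone. The following sketch expands the table into one observation per participant, coding “less than one hour” as 0 and leaving out the two “n/a” answers, as described in the text; the variable names are ours and purely illustrative.

```python
import statistics

# Frequency table for Group C: hours per day -> number of participants.
# "Less than one hour" is coded as 0; the two "n/a" answers are left out.
hours_frequency = {0: 1, 1: 0, 2: 0, 3: 0, 4: 2, 5: 0, 6: 2, 7: 1, 8: 12}

# Expand the table into one observation per participant.
observations = [
    hours for hours, count in hours_frequency.items() for _ in range(count)
]

count = len(observations)
total = sum(observations)
mean_hours = total / count
median_hours = statistics.median(observations)
stdev_hours = statistics.stdev(observations)  # sample standard deviation

print(count, total, round(mean_hours, 6), median_hours, round(stdev_hours, 6))
```

Working from the frequency table rather than from individual answers makes it easy to see how the coding decision for “less than one hour” (a value of 0) feeds into the mean and the standard deviation.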
5.3.3.6. Other translation-related activities
When asked if they carried out other translation-related activities, 16 of them declared that they did. Three of them (GC1, GC3 and GC9) did not answer the question and another participant said that he/she did not carry out any other translation-related activity. Among the activities mentioned were: proofreading and/or reviewing and/or revision 42 (eleven participants); postediting (six participants); coordination and/or management (two participants); QA and/or LA (two participants); interpreting (one participant); terminology management (one participant); and others: software development (one participant), copy-editing (one participant), DTP (one participant) and TM management (one participant). The high number of translators (six) who carry out postediting, a relatively new discipline, is notable. It was also interesting to see that five of the six in-house translators did proofreading.
42 These terms are often used interchangeably and confused in our field. García (2008) clearly disambiguated them by proposing: “by revising we mean the examination of a draft translation with the source text as a reference by a person other than the translator (editing for some authors); by checking we mean the translator going over his/her own draft and making the necessary amendments (what has been called self-revision) and by reviewing we mean the examination of a draft paying attention only to the target text” (ibid).
Participant   Other activities
GC1           N/a
GC2           Yes, but not always.
GC3           N/a
GC4           Linguistic Reviews and Postediting
GC5           Editing, proofing, QA, postediting, LA
GC6           I do proof-reading on my colleagues' texts
GC7           Copy-editing, proofreading, DTP, TM management, project management
GC8           Yes, I do: proofreading, editing, transcription, interpreting.
GC9           N/a
GC10          Postediting, Author for some short guides bout veterinary items.
GC11          I do proofreading.
GC12          Proofreading and editing.
GC13          Proofreading, postediting.
GC14          No.
GC15          Sometimes I proofread.
GC16          I do proofreading, revision and coordination.
GC17          proofreading and Qas
GC18          I am learning to do postediting, terminology management.
GC19          I do postediting.
GC20          I develop software tools to help me improve my productivity.
Table 13. Other activities. Group C
5.3.3.7. Main language combinations
When asked about their main working language combination, 19 out of the 20 participants said that it was EN-ES. The one who did not select that combination had ES-EN and DE-EN as main combinations, and had EN-ES as a secondary language combination (explained in the next section). Other main language combinations included: ES-EN with three participants, EN-CA with two participants, EN-GL with one participant, FR-ES with two participants, FR-CA with two participants, DE-ES with three participants, and DE-EN with two participants.
Figure 27. Main language combinations. Group C
5.3.3.8. Other language combinations
We also asked them about other working language combinations, and these were their answers: three of the participants had ES-EN, one participant EN-ES (the one mentioned in the previous section), eight participants had FR-ES, one participant had DU-ES, one participant had ES-CA, one participant had CA-ES, one participant had CA-EN, one participant had GL-PT, three participants had GL-PT, three had DE-ES and one had IT-GL.
Figure 28. Other language combinations. Group C
Both language combinations (main and other)
If we combine the main and the secondary working language combinations, we obtain the following figure:
Figure 29. Total number of language combinations. Group C.
5.3.3.9. CAT tools usage
Six possible answers were offered to the participants, ordered from higher use to lower use: 1. I use them in all my translation activities. 2. I use them only with some of my translation assignments. 3. I use them only when I am required to do so. 4. I have tried them, but I do not use them for my daily work. 5. I have never tried. 6. Other.
Eleven of the participants chose option 1 (I use them in all my translation activities), five participants chose option 2 (I use them only with some of my translation assignments), two participants chose option 3 (I use them only when I am required to do so), and two participants chose option 4 (I have tried them, but I do not use them for my daily work). We can therefore infer that all of them were familiar with CAT tools and that just over half of them used them in all their translation activities.
The arithmetic mean is 1.75; the median and the mode are 1. The standard deviation is 1. The distribution is positively skewed (1.2), as the frequency is higher in the low scores.
CAT tools usage
Mean                 1.75
Standard Error       0.227977377
Median               1
Mode                 1
Standard Deviation   1.019545823
Sample Variance      1.039473684
Kurtosis             0.448181067
Skewness             1.220862968
Range                3
Minimum              1
Maximum              4
Sum                  35
Count                20
Figure 30. CAT tools usage. Group C
5.3.3.10. TM usage
In this question we asked the participants how often they used translation memories. The same answer options as in the previous question were offered. As in Groups A and B, participants submitted the same scores for this question as for the previous one, except for one participant who did not provide any answer.
Ten participants chose option 1 (I use them in all my translation activities), five participants chose option 2 (I use them only with some of my translation assignments), two participants chose option 3 (I use them only when I am required to do so), and two participants chose option 4 (I have tried them, but I do not use them for my daily work). We can therefore infer that all the respondents were familiar with translation memories and that about half of them used them in all their translation activities.
The arithmetic mean is 1.8; the median and the mode are 1. The standard deviation is 1. The distribution is positively skewed (1.1), as the frequency is higher in the low scores.
TM Usage
Mean                 1.789473684
Standard Error       0.236679606
Median               1
Mode                 1
Standard Deviation   1.031662486
Sample Variance      1.064327485
Kurtosis             0.259487523
Skewness             1.146629569
Range                3
Minimum              1
Maximum              4
Sum                  34
Count                19
Figure 31. TM usage. Group C.
5.3.3.11. Swordfish
Participants were asked about their familiarity with Swordfish (the CAT tool used in the experiment). We asked how well they knew the CAT tool, and six predefined answers were offered: “I use it on a daily basis”, “I use it on some of my projects”, “I have tried it, but I do not use it”, “I have heard of it, but I have never tried it before”, “I have never heard of it”, and “other”, where they could introduce their own answer.
One of the participants selected option 1 (I use it on a daily basis), six participants selected option 3 (I have tried it, but I do not use it), 11 participants selected option 4 (I have heard of it, but I have never tried it before) and two of the participants selected option 5 (I have never heard of it).
The arithmetic mean is 3.65; the median and the mode are 4. The standard deviation is 0.9. The distribution is negatively skewed (-1.3), as the frequency is concentrated in the higher scores.
Figure 32. Swordfish. Group C
Swordfish
Mean                 3.65
Standard Error       0.195676963
Median               4
Mode                 4
Standard Deviation   0.87509398
Sample Variance      0.765789474
Kurtosis             3.506266077
Skewness             -1.297193353
Range                4
Minimum              1
Maximum              5
Sum                  73
Count                20
5.3.3.12. XLIFF
A six-point scale was offered for answering this question, with value 1 being “total unfamiliar” and value 6 “I am very familiar with it”.
Four participants selected option 1 (total unfamiliar), four participants selected option 2, five participants selected option 3, one participant selected option 4, four participants selected option 5 and one participant selected option 6 (very familiar).
The arithmetic mean is 3; the median and the mode are 3. The standard deviation is 1.6. The distribution is slightly positively skewed (0.4), as the frequency is higher in the medium scores.
XLIFF
Mean                 3
Standard Error       0.366746403
Median               3
Mode                 3
Standard Deviation   1.598610508
Sample Variance      2.555555556
Kurtosis             -1.030301345
Skewness             0.364766903
Range                5
Minimum              1
Maximum              6
Sum                  57
Count                19
Figure 33. XLIFF. Group C
5.3.4. Overall
In order to give an overall view of the whole dataset, we present here the data obtained from the participants of the three groups together. The data is presented following the same structure as in the previous sections (analysed question by question). Following the same criteria to select the participants that are the object of this study, we only present the data obtained from participants that were considered “experienced translators”, which for this study meant participants who had declared that they had worked as professional translators for at least two years. In total we had 33 participants. The complete set of answers of all the participants can be found in Appendix A.
5.3.4.1. Age
Three participants did not answer this question. The arithmetic mean is 32 years. The median (the middle value) is 29 and the mode (the most common value) is 27. The standard deviation is high (7.6). The distribution is positively skewed (1.3), as the frequency is higher in the low scores.
Age
Mean                 32.03333
Standard Error       1.382554
Median               29
Mode                 27
Standard Deviation   7.572561
Sample Variance      57.34368
Kurtosis             1.201071
Skewness             1.287688
Range                31
Minimum              23
Maximum              54
Sum                  961
Count                30
Table 14. Age. All participants
5.3.4.2. Gender
There were 7 males and 25 females. One participant did not answer this question.
5.3.4.3. Position
There were 24 freelance translators and 9 in-house translators among the participants.
5.3.4.4. Experience years
The arithmetic mean is 6.9 years of translation experience; the median and the mode are 4. The standard deviation is 5.1. The distribution is positively skewed (1.0), as the frequency is higher in the low scores.
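The overall figures for this variable can be reproduced by pooling the per-participant values given for Groups A, B and C in sections 5.3.1.4, 5.3.2.4 and 5.3.3.4. The sketch below is only an illustration of that pooling step, assuming the sample (n - 1) standard deviation; the variable names are ours.

```python
import statistics

# Years of experience per participant, copied from the three group tables.
group_a = [4, 14, 5, 3, 2, 3, 4]
group_b = [13, 13, 15, 4, 2, 4]
group_c = [16, 17, 2, 4, 12, 4, 5, 4, 7, 14, 4, 7, 18, 4, 2, 3, 3, 5, 6, 4]

# Pool the three groups into a single sample.
pooled = group_a + group_b + group_c

pooled_count = len(pooled)
pooled_sum = sum(pooled)
pooled_mean = pooled_sum / pooled_count
pooled_median = statistics.median(pooled)
pooled_mode = statistics.mode(pooled)
pooled_stdev = statistics.stdev(pooled)  # sample standard deviation

print(pooled_count, pooled_sum, round(pooled_mean, 6),
      pooled_median, pooled_mode, round(pooled_stdev, 6))
```

Pooling the raw values, rather than averaging the three group means, keeps each participant's weight equal regardless of the size of the group he/she belonged to.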
Experience Years
Mean                 6.878788
Standard Error       0.88523
Median               4
Mode                 4
Standard Deviation   5.085258
Sample Variance      25.85985
Kurtosis             -0.507
Skewness             1.028303
Range                16
Minimum              2
Maximum              18
Sum                  227
Count                33
Table 15. Experience years. All participants
5.3.4.5. Hours per day
Sixteen participants worked 8 hours per day, one participant 7 hours, two participants 6 hours, two participants 5 hours, two participants 4 hours, two participants 3 hours, two participants 2 hours, one participant 1 hour and two participants less than 1 hour. Three of the participants did not provide an exact number but added information in the “other” box: GC3 said “I do not work every day, depends on the activity”, GA6 stated “from time to time” and GC20 said “I used to translate sporadically as I used to work as a full time software engineer.” Their answers were treated as “no answer” in our calculation phase. The option “less than one hour” was given a value of 0. The arithmetic mean is 5.9, the median is 8 and so is the mode. The standard deviation is 2.8. The distribution is negatively skewed (-0.9), as the frequency is higher in the higher scores.
Hours per day
Mean                 5.866667
Standard Error       0.504539
Median               8
Mode                 8
Standard Deviation   2.763473
Sample Variance      7.636782
Kurtosis             -0.52194
Skewness             -0.9442
Range                8
Minimum              0
Maximum              8
Sum                  176
Count                30
Table 16. Hours per day. All participants
Hours per day        Participants
Less than one hour   2
1                    1
2                    2
3                    2
4                    2
5                    2
6                    2
7                    1
8                    16
[Figure: Hours per day. All participants]
5.3.4.6. Other translation-related activities
When asked if they carried out other translation-related activities, 25 of them declared that they did. Among the activities mentioned were: proofreading and/or reviewing and/or revision (15 participants); postediting (six participants); coordination and/or management (three participants); QA and/or LA (three participants); interpreting (two participants); terminology management (two participants); software development (one participant); copy-editing (one participant); teaching (three participants); editing (two participants); DTP (two participants); and TM management (one participant). The high number of translators (six) who carried out postediting, a relatively new discipline, is notable.
5.3.4.7. Main language combinations
When asked about their main working language combination, 31 out of the 33 participants said that it was EN-ES, five participants selected DE-ES, four participants had ES-EN, four participants FR-ES, three participants DE-EN, two participants EN-CA, two participants FR-CA, one participant EN-GL, and one participant EN-FR.
[Bar chart: Main language combinations. y-axis: participants]
Figure 34. Main language combination. All participants
5.3.4.8. Other language combinations
We also asked them about their other working language combinations. These were their answers: twelve participants had FR-ES, four participants had ES-EN, three participants had DE-ES, two participants had ES-FR, two participants had IT-ES, and the other language combinations, with only one participant each, were: EN-ES, DU-ES, ES-CA, CA-ES, CA-ES, CA-EN, GL-PT, DE-ES, IT-GL, FR-CA, GL-ES, ES-GL, EN-CA, IT-EN, AR-ES, and PT-ES.
[Bar chart: Other language combinations. x-axis: language combination; y-axis: participants]
Figure 35. Other language combinations. All participants
Both language combinations (main and other)
If we combine the numbers for the main and the secondary working language combinations, we obtain the following table:
Language Combination   Participants
EN-ES                  32
FR-ES                  15
ES-EN                  8
DE-ES                  8
EN-CA                  3
FR-CA                  3
DE-EN                  3
IT-ES                  2
ES-FR                  2
EN-GL                  1
EN-FR                  1
DU-ES                  1
ES-CA                  1
CA-ES                  1
CA-EN                  1
GL-PT                  1
IT-GL                  1
GL-ES                  1
ES-GL                  1
IT-EN                  1
AR-ES                  1
PT-ES                  1
5.3.4.9. CAT tools usage
Six possible answers were offered to the participants, ordered from higher use to lower use: 1. I use them in all my translation activities, 2. I use them only with some of my translation assignments, 3. I use them only when I am required to do so, 4. I have tried them, but I do not use them for my daily work, 5. I have never tried, 6. Other.
Twenty of the participants chose option 1 (I use them in all my translation activities), seven participants chose option 2 (I use them only with some of my translation assignments), three participants chose option 3 (I use them only when I am required to do so), and three of the participants chose option 4 (I have tried them, but I do not use them for my daily work).
The arithmetic mean is 1.6, the median and the mode are 1. The standard deviation is 0.9. It is positively skewed (1.3) as the frequency is higher in the low scores.
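The figures in this paragraph can be recomputed directly from the answer frequencies given above. The following is a minimal sketch, not the procedure used in the study: the function name `describe` is hypothetical, and the skewness formula is the adjusted sample skewness commonly used by spreadsheet software, which is assumed here because the summary tables look like spreadsheet output.

```python
import math
from collections import Counter

# Frequencies reported in the text: answer option -> number of participants.
cat_usage = Counter({1: 20, 2: 7, 3: 3, 4: 3})


def describe(freq):
    """Descriptive statistics for a frequency table of numeric scores."""
    values = sorted(freq.elements())
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 denominator).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(variance)
    mid = n // 2
    median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2
    mode = max(freq, key=freq.get)
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {"count": n, "mean": mean, "median": median, "mode": mode,
            "sd": sd, "variance": variance, "skewness": skew}


stats = describe(cat_usage)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A positive skewness here reflects the long tail towards options 3 and 4, while most answers sit at option 1.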
[Bar chart: CAT tools usage. x-axis: answer option (1-6); y-axis: participants]
Table 17. CAT tools usage. All participants
CAT tools usage
Mean                 1.666666667
Standard Error       0.172254803
Median               1
Mode                 1
Standard Deviation   0.989528507
Sample Variance      0.979166667
Kurtosis             0.725197506
Skewness             1.358078348
Range                3
Minimum              1
Maximum              4
Sum                  55
Count                33
5.3.4.10. TM usage
In this question we asked the participants how often they used translation memories. The same answer options as in the previous question were offered. Participants submitted the same scores for this question as in the previous one, except for one of the participants who did not provide any answer.
Nineteen of the participants chose option 1 (I use them in all my translation activities), seven participants chose option 2 (I use them only with some of my translation assignments), three participants chose option 3 (I use them only when I am required to do so), and three of the participants chose option 4 (I have tried them, but I do not use them for my daily work).
The arithmetic mean is 1.6, the median and the mode are 1. The standard deviation is 0.9. It is positively skewed (1.3) as the frequency is higher in the low scores.
TM Usage
Mean                 1.6875
Standard Error       0.1764199
Median               1
Mode                 1
Standard Deviation   0.9979818
Sample Variance      0.9959677
Kurtosis             0.5942151
Skewness             1.3127708
Range                3
Minimum              1
Maximum              4
Sum                  54
Count                32
[Bar chart: TM usage. x-axis: answer option (1-6); y-axis: participants]
Figure 36. TM Usage. All participants
5.3.4.11. Swordfish
Participants were asked about their familiarity with Swordfish (the CAT tool used in the experiment). We asked how well they knew the CAT tool, and six predefined answers were offered: “I use it on a daily basis”, “I use it on some of my projects”, “I have tried it, but I do not use it”, “I have heard of it, but I have never tried it before”, “I have never heard of it”, and “other” where they could introduce their own answer.
One of the participants selected option 1 (I use it on a daily basis), one of the participants selected option 2, seven participants selected option 3 (I have tried it, but I do not use it), twenty participants selected option 4 (I have heard of it, but I have never tried it before) and four of the participants selected option 5 (I have never heard of it).
The arithmetic mean is 3.7, the median and the mode are 4. The standard deviation is 0.8. It is negatively skewed (-1.2) as the frequency is concentrated in the higher scores.
Swordfish
Mean                 3.757575758
Standard Error       0.144536243
Median               4
Mode                 4
Standard Deviation   0.830297501
Sample Variance      0.689393939
Kurtosis             2.985272933
Skewness             -1.248682235
Range                4
Minimum              1
Maximum              5
Sum                  124
Count                33
[Bar chart: Swordfish. x-axis: answer option (1-6); y-axis: participants]
Figure 37. Swordfish. All participants.
5.3.4.12. XLIFF
A six-point scale was offered to answer this question, with value 1 being “total unfamiliar” and value 6 “I am very familiar with it”.
Seven participants selected option 1 (total unfamiliar), seven participants selected option 2, six participants selected option 3, two participants selected option 4, four participants selected option 5 and six participants selected option 6 (very familiar).
The arithmetic mean is 3.2, the median is 3 and the mode is 1. The standard deviation is 1.8. It is slightly positively skewed (0.3) as the frequency is higher in the lower scores.
XLIFF
Mean                 3.21875
Standard Error       0.3260664
Median               3
Mode                 1
Standard Deviation   1.8445102
Sample Variance      3.4022177
Kurtosis             -1.3345093
Skewness             0.3479073
Range                5
Minimum              1
Maximum              6
Sum                  103
Count                32
[Bar chart: XLIFF. x-axis: answer option (1-6); y-axis: participants]
5.4. Task Specific Questionnaire
The task specific questionnaire was carried out after the translation task by the participants. As explained in the previous chapter, this questionnaire aimed to obtain information about the participants’ experience. Group A and Group B received the same questionnaire. Group C received the same questionnaire as the other two groups as well as nine additional questions related to the metadata information that only this group received. In this section we only present the data obtained from the questions that were common to all the groups. In the next section we present the data obtained from the questions that were unique to Group C.
5.4.1. Group A
5.4.1.1. Topic of the text
We asked the participants about how familiar they were with the topic of the text. A six point scale was offered to the participants, number 1 represented the lowest point of the scale (total unfamiliar) and number 6 represented the highest point (very familiar).
Two participants selected option 1 (total unfamiliar), two participants selected option 4, one participant selected option 5 and two participants selected option 6 (very familiar).
The arithmetic mean is 3.8, the median is 4 and the mode 1. The standard deviation is 2.1. It is slightly negatively skewed (-0.6) as the frequency is higher in the higher scores.
[Bar chart: Topic of the text. x-axis: answer option (1-6); y-axis: participants]
Figure 38. Topic of the text. Group A.
Topic
Mean                 3.857143
Standard Error       0.79966
Median               4
Mode                 1
Standard Deviation   2.115701
Sample Variance      4.47619
Kurtosis             -1.27388
Skewness             -0.62753
Range                5
Minimum              1
Maximum              6
Sum                  27
Count                7
5.4.1.2. Experience working with Microsoft Office products
As the translation task consisted of a text from the Microsoft Excel documentation, we wanted to know the experience that our participants had with that tool. We also wanted to know their experience with other related tools of the Office package (Microsoft Word, Microsoft PowerPoint, Microsoft Access and Microsoft Outlook).
A six point scale was proposed to the participants: number 1 represented the lowest point of the scale (I have never worked with it) and number 6 represented the highest point (I am an advanced user). We present first an overview of the five results. Participants have more experience in some products than in others: Microsoft Word (with an arithmetic mean of 6), followed by Excel (5), PowerPoint (4.8), Outlook (4.5) and Access (2.4). The results of each of the products are presented in the next section.
[Bar chart: Experience with Microsoft products. Series: Excel, Word, PowerPoint, Access, Outlook; x-axis: answer option (1-6)]
Figure 39. Experience with Microsoft products. Group A.
Statistic            Excel   Word      PowerPoint   Access   Outlook
Mean                 5       6         4.83         2.42     4.57
Standard Error       0.37    0         0.30         0.68     0.29
Median               5       6         5            2        5
Mode                 5       6         5            1        5
Standard Deviation   1       0         0.75         1.81     0.78
Sample Variance      1       0         0.56         3.28     0.61
Kurtosis             3       #DIV/0!   -0.10        -1.07    2.36
Skewness             -1.4    #DIV/0!   0.31         0.98     -1.75
Range                3       0         2            4        2
Minimum              3       6         4            1        3
Maximum              6       6         6            5        5
Sum                  35      42        29           17       32
Count                7       7         6            7        7
We present below the results obtained on each of the questions:
Excel
The mean is 5, the median and the mode are also 5. The standard deviation is 1. It is negatively skewed (-1.4) as the frequency is higher in the higher scores.
[Bar chart: Excel. x-axis: answer option (1-6); y-axis: participants]
Figure 40. Experience with Excel. Group A.
Excel
Mean                 5
Standard Error       0.377964
Median               5
Mode                 5
Standard Deviation   1
Sample Variance      1
Kurtosis             3
Skewness             -1.4
Range                3
Minimum              3
Maximum              6
Sum                  35
Count                7
Word
The arithmetic mean is 6, the median is 6 and the mode 6. The standard deviation is 0.
[Bar chart: Word. x-axis: answer option (1-6); y-axis: participants]
Figure 41. Experience with Word. Group A.
Word
Mean                 6
Standard Error       0
Median               6
Mode                 6
Standard Deviation   0
Sample Variance      0
Kurtosis             #DIV/0!
Skewness             #DIV/0!
Range                0
Minimum              6
Maximum              6
Sum                  42
Count                7
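The #DIV/0! entries for kurtosis and skewness follow from the standard deviation being 0: both statistics standardise each answer by dividing by the standard deviation, so they are undefined when every participant gives the same answer. A minimal sketch of that behaviour is shown below; the function name `sample_skewness` is hypothetical, and the adjusted sample skewness formula is assumed rather than taken from the study.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness; returns None when it is undefined."""
    n = len(values)
    if n < 3:
        return None
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        # All answers are identical: standardising would divide by zero.
        return None
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


# Group A, Word: seven participants, all of whom answered 6 (sum 42).
word_group_a = [6] * 7
print(sample_skewness(word_group_a))
```

Returning `None` instead of raising an error is a design choice for this sketch; it plays the same role as the #DIV/0! marker in the table.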
PowerPoint
The arithmetic mean is 4.8, the median is 5 and the mode 5. The standard deviation is 0.7. It is slightly positively skewed (0.3) as the frequency is higher in the lower values.
[Bar chart: PowerPoint. x-axis: answer option (1-6); y-axis: participants]
Figure 42. Experience with PowerPoint. Group A.
PowerPoint
Mean                 4.833333
Standard Error       0.307318
Median               5
Mode                 5
Standard Deviation   0.752773
Sample Variance      0.566667
Kurtosis             -0.10381
Skewness             0.31257
Range                2
Minimum              4
Maximum              6
Sum                  29
Count                6
Access
The arithmetic mean is 2.4, the median is 2 and the mode 1. The standard deviation is 1.8. It is slightly positively skewed (0.9) as the frequency is higher in the lower scores.
Access
Mean                 2.428571
Standard Error       0.685119
Median               2
Mode                 1
Standard Deviation   1.812654
Sample Variance      3.285714
Kurtosis             -1.07713
Skewness             0.983425
Range                4
Minimum              1
Maximum              5
Sum                  17
Count                7
[Bar chart: Access. x-axis: answer option (1-6); y-axis: participants]
Figure 43. Experience with Access. Group A.
Outlook
The arithmetic mean is 4.5, the median is 5 and the mode 5. The standard deviation is 0.7. It is negatively skewed (-1.7) as the frequency is higher in the higher scores.
[Bar chart: Outlook. x-axis: answer option (1-6); y-axis: participants]
Outlook
Mean                 4.571429
Standard Error       0.297381
Median               5
Mode                 5
Standard Deviation   0.786796
Sample Variance      0.619048
Kurtosis             2.360947
Skewness             -1.75982
Range                2
Minimum              3
Maximum              5
Sum                  32
Count                7
5.4.1.3. Linguistic difficulty
A six point scale was proposed to the participants to measure the linguistic difficulty of the task: number 1 represented the lowest point of the scale (Very difficult (I needed to consult many terms and expressions)) and number 6 represented the highest point (Very easy (I could do it without consulting external resources)). One participant chose option 2, one participant chose option 3, four participants chose option 4 and three participants chose option 5. The mean is 3.8, the median and the mode are 4. The standard deviation is 1.1. It is negatively skewed (-0.6) as the frequency is higher in the higher values.
[Bar chart: Linguistic difficulty. x-axis: answer option (1-6); y-axis: participants]
Figure 44. Linguistic difficulty. Group A.
Difficulty
Mean                 3.833333
Standard Error       0.477261
Median               4
Mode                 4
Standard Deviation   1.169045
Sample Variance      1.366667
Kurtosis             -0.44616
Skewness             -0.66763
Range                3
Minimum              2
Maximum              5
Sum                  23
Count                6
5.4.1.4. Assistance
We asked participants in this question how many times they had asked for help. Eight options were offered to participants: None, 1, 2, 3, 4, 5, 6 and “Other”, where they could introduce their own answer. Five participants selected the option “none” and two participants said that they asked once for help. The arithmetic mean is 0.2, the median and the mode are 0, and the standard deviation is 0.4.
Assistance
Mean                 0.285714
Standard Error       0.184428
Median               0
Mode                 0
Standard Deviation   0.48795
Sample Variance      0.238095
Kurtosis             -0.84
Skewness             1.229634
Range                1
Minimum              0
Maximum              1
Sum                  2
Count                7
5.4.1.5. Doubts
In this question we asked the participants about the nature of the doubts that they had during the translation task. Five predefined answers were proposed to the participants: “linguistic related”, “technical related”, “CAT tool related”, “experiment instructions” and “other” where they could introduce their own answer. Participants could select more than one answer if applicable.
Four participants declared that they had linguistic related doubts; one participant stated that he/she had technical related doubts; three participants stated that they had CAT tool related doubts; and no participant had doubts regarding the experiment instructions. One participant also added his/her doubts to the “other” box: he/she stated “Terminological doubts. I had proposals, but I wanted to use the official MS terminology.”
               Linguistic related   Technical related   CAT tool related   Experiment instructions   Other
Participants   4                    1                   3                  0                         1
Table 18. Doubts. Group A.
5.4.1.6. External resources
In this question we asked the participants if they had consulted any external resources. There were 5 predefined answers offered to the participants: “No” (If they had not consulted any external resources, if they consulted they had to continue answering), “Machine Translation”, “Online Dictionaries”, “Microsoft Excel official Webpage” and “Other” with a box where they could introduce their own answer.
One participant declared that he/she had not used external resources, there was not any participant that declared to have used Machine Translation, four participants declared to have used online dictionaries, five participants declared to have consulted the Microsoft Excel official Webpage. In the “other” box one participant claimed to have used Linguee and Proz.
               No   MT   Online dictionaries   Ms Excel   Other
Participants   1    0    4                     5          1
Table 19. External resources. Group A.
5.4.1.7. Additional comments
Three participants included information in the additional comments text box. GA2 stated that his/her “connection to the remote desktop suffered from a heavy lag that slowed down my task to a great extend.” GA3 stated that he/she preferred “to check the official MS office website to check if [his/her] translations were conformant to the official terminology and translations of the company”, but he/she states that he/she “could have done the translation without any external resources (I'm familiar with the subject and it was not very difficult)”.
5.4.2. Group B
5.4.2.1. Topic of the text
We asked the participants about how familiar they were with the topic of the text. A six point scale was offered to the participants, number 1 represented the lowest point of the scale (total unfamiliar) and number 6 represented the highest point (very familiar).
One participant selected option 4, three participants selected option 5 and two participants selected option 6 (very familiar).
The arithmetic mean is 5.1, the median is 5 and the mode 5. The standard deviation is 0.7. It is slightly negatively skewed (-0.3) as the frequency is higher in the higher scores.
[Bar chart: Topic of the text. x-axis: answer option (1-6); y-axis: participants]
Figure 45. Topic of the text. Group B.
Topic
Mean                 5.166667
Standard Error       0.307318
Median               5
Mode                 5
Standard Deviation   0.752773
Sample Variance      0.566667
Kurtosis             -0.10381
Skewness             -0.31257
Range                2
Minimum              4
Maximum              6
Sum                  31
Count                6
5.4.2.2. Experience working with Microsoft Office products
A six point scale was proposed to the participants: number 1 represented the lowest point of the scale (I have never worked with it) and number 6 represented the highest point (I am an advanced user). We present first an overview of the five results. Participants had more experience in Microsoft Word (with an arithmetic mean of 5.5), followed by PowerPoint (5.3), Outlook (5.1), Excel (4.5), and Access (4.1). The results of each of the products are presented in the next section.
[Bar chart: Experience with Microsoft products. Series: Excel, Word, PowerPoint, Access, Outlook; x-axis: answer option (1-6)]
Figure 46. Experience with Microsoft products. Group B.
Statistic            Excel   Word    PowerPoint   Access   Outlook
Mean                 4.5     5.5     5.3          4.16     5.16
Standard Error       0.42    0.22    0.21         0.6      0.47
Median               4.5     5.5     5            4.5      5.5
Mode                 4       5       5            5        6
Standard Deviation   1.04    0.54    0.51         1.47     1.16
Sample Variance      1.1     0.3     0.26         2.16     1.36
Kurtosis             -0.24   -3.3    -1.87        -0.85    2.55
Skewness             0       -6.66   0.96         -0.41    -1.5
Range                3       1       1            4        3
Minimum              3       5       5            2        3
Maximum              6       6       6            6        6
Sum                  27      33      32           25       31
Count                6       6       6            6        6
We present below the results obtained on each of the questions:
Excel
The mean is 4.5, the median is 4.5 and the mode is 4. The standard deviation is 1. It is not skewed (0) as the frequency is equal on higher and lower scores.
[Bar chart: Excel. x-axis: answer option (1-6); y-axis: participants]
Figure 47. Experience with Excel. Group B.
Excel
Mean                 4.5
Standard Error       0.428174
Median               4.5
Mode                 4
Standard Deviation   1.048809
Sample Variance      1.1
Kurtosis             -0.24793
Skewness             0
Range                3
Minimum              3
Maximum              6
Sum                  27
Count                6
Word
The arithmetic mean is 5.5, the median is 5.5 and the mode 6. The standard deviation is 0.5. It is negatively skewed (-6.6) as the frequency is higher in the higher scores.
[Bar chart: Word. x-axis: answer option (1-6); y-axis: participants]
Figure 48. Experience with Word. Group B.
Word
Mean                 5.5
Standard Error       0.223607
Median               5.5
Mode                 5
Standard Deviation   0.547723
Sample Variance      0.3
Kurtosis             -3.33333
Skewness             -6.66
Range                1
Minimum              5
Maximum              6
Sum                  33
Count                6
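The summary values in this table (count 6, sum 33, minimum 5, maximum 6) are consistent with three answers of 5 and three answers of 6. The raw answers are not listed, so that reconstruction is an assumption. Under it, the sketch below recomputes the variance, the adjusted sample skewness and the excess kurtosis using the formulas commonly used by spreadsheet software (also an assumption). For an evenly split sample like this one the skewness comes out as zero up to floating-point rounding, so one possible reading is that the -6.66 shown in the table is a truncated near-zero value rather than a strong negative skew.

```python
import math

# Assumed reconstruction: three answers of 5 and three of 6
# (matches count 6, sum 33, minimum 5 and maximum 6 in the table).
word_group_b = [5, 5, 5, 6, 6, 6]

n = len(word_group_b)
mean = sum(word_group_b) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in word_group_b) / (n - 1))
z = [(v - mean) / sd for v in word_group_b]  # standardised answers

# Adjusted sample skewness and excess kurtosis (spreadsheet-style formulas).
skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(f"variance = {sd ** 2:.4f}")
print(f"skewness = {skewness:.6f}")
print(f"kurtosis = {kurtosis:.5f}")
```

The recomputed variance and kurtosis agree with the table (0.3 and -3.33333), which supports the assumed split of answers.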
PowerPoint
The arithmetic mean is 5.3, the median is 5 and the mode 5. The standard deviation is 0.5. It is positively skewed (0.9) as the frequency is higher in the lower values.
[Bar chart: PowerPoint. x-axis: answer option (1-6); y-axis: participants]
Figure 49. Experience with PowerPoint. Group B.
PowerPoint
Mean                 5.333333
Standard Error       0.210819
Median               5
Mode                 5
Standard Deviation   0.516398
Sample Variance      0.266667
Kurtosis             -1.875
Skewness             0.968246
Range                1
Minimum              5
Maximum              6
Sum                  32
Count                6
Access
The arithmetic mean is 4.1, the median is 4.5 and the mode 5. The standard deviation is 1.4. It is negatively skewed (-0.4) as the frequency is higher in the higher scores.
[Bar chart: Access. x-axis: answer option (1-6); y-axis: participants]
Figure 50. Experience with Access. Group B.
Access
Mean                 4.166667
Standard Error       0.600925
Median               4.5
Mode                 5
Standard Deviation   1.47196
Sample Variance      2.166667
Kurtosis             -0.85917
Skewness             -0.41807
Range                4
Minimum              2
Maximum              6
Sum                  25
Count                6
Outlook
The arithmetic mean is 5.1, the median is 5.5 and the mode 6. The standard deviation is 1.1. It is negatively skewed (-1.5) as the frequency is higher in the higher scores.
[Bar chart: Outlook. x-axis: answer option (1-6); y-axis: participants]
Figure 51. Experience with Outlook. Group B.
Outlook
Mean                 5.166667
Standard Error       0.477261
Median               5.5
Mode                 6
Standard Deviation   1.169045
Sample Variance      1.366667
Kurtosis             2.552052
Skewness             -1.58562
Range                3
Minimum              3
Maximum              6
Sum                  31
Count                6
5.4.2.3. Linguistic difficulty
A six point scale was proposed to the participants to measure the linguistic difficulty of the task: number 1 represented the lowest point of the scale (Very difficult (I needed to consult many terms and expressions)) and number 6 represented the highest point (Very easy (I could do it without consulting external resources)). One participant chose option 3, four participants chose option 5 and one participant chose option 6. The mean is 4.8, the median and the mode are 5. The standard deviation is 0.9. It is negatively skewed (-1.4) as the frequency is higher in the higher values.
[Bar chart: Linguistic difficulty. x-axis: answer option (1-6); y-axis: participants]
Figure 52. Linguistic difficulty. Group B.
Difficulty
Mean                 4.833333
Standard Error       0.401386
Median               5
Mode                 5
Standard Deviation   0.983192
Sample Variance      0.966667
Kurtosis             3.602854
Skewness             -1.43796
Range                3
Minimum              3
Maximum              6
Sum                  29
Count                6
5.4.2.4. Assistance
We asked participants in this question how many times they had asked for help. Eight options were offered to participants: None, 1,2,3,4,5,6 and “Other”, where they could introduce their own answer. Four participants selected the option “none”, one participant said that he/she asked once for help and one participant said he/she asked four times for help. The arithmetic mean is 0.8, the median and the mode are 0, and the standard deviation is 1.6.
Assistance
Mean                 0.833333
Standard Error       0.654047
Median               0
Mode                 0
Standard Deviation   1.602082
Sample Variance      2.566667
Kurtosis             4.639906
Skewness             2.148179
Range                4
Minimum              0
Maximum              4
Sum                  5
Count                6
5.4.2.5. Doubts
In this question we asked the participants about the nature of the doubts that they had during the translation task. Five predefined answers were proposed to the participants: “linguistic related”, “technical related”, “CAT tool related”, “experiment instructions” and “other” where they could introduce their own answer. Participants could select more than one answer if applicable.
Four participants declared that they had linguistic related doubts, there was not any participant that had technical related doubts, five participants stated that they had CAT tool related doubts and only one participant had doubts regarding the experiment instructions. One participant had also added his/her doubts to the “other” box: he/she stated “When I started the translation I wanted to know how to apply information but I did not need it.”
               Linguistic related   Technical related   CAT tool related   Experiment instructions   Other
Participants   4                    0                   5                  1                         1
Table 20. Doubts. Group B.
5.4.2.6. External resources
In this question we asked the participants if they had consulted any external resources. There were 5 options offered to the participants: “No” (If they had not consulted any external resources, if they consulted they had to continue answering), “Machine Translation”, “Online Dictionaries”, “Microsoft Excel official Webpage” and “Other” with a box where they could introduce their own answer.
Two participants declared that they had not used external resources, one of them clarified in the “other” box that he/she used “Just the internal concordance search of the CAT tool”. One participant declared to have used Machine Translation, two participants declared to have used online dictionaries, one participant declared to have consulted the Microsoft Excel official Webpage. In the other boxes three participants provided additional information: one of the participants stated to have used the Microsoft Linguistic Portal filtering by “Excel”, another participant stated to have used the Google search engine, the third one is the one we mentioned before in the “no” answer.
               No   MT   Online dictionaries   Ms Excel   Other
Participants   2    1    2                     1          3
Table 21. External resources. Group B.
5.4.2.7. Additional comments
Two participants included additional comments. GB2 indicated that he/she had some problems finding the shortcuts to quickly populate the TM suggested translation, but that he/she found them eventually. GB5 said that he/she found it interesting to work with Swordfish, which he/she found more useful than his/her regular CAT tool. He/she said that although he/she preferred to translate with his/her own words rather than reusing translations from other translators, he/she also recognised that TMs can be a great help for translators because they help them in terminology searches.
5.4.3. Group C
5.4.3.1. Topic of the text
We asked the participants about how familiar they were with the topic of the text. A six point scale was offered to the participants, number 1 represented the lowest point of the scale (total unfamiliar) and number 6 represented the highest point (very familiar).
Two participants selected option 2, four participants selected option 3, six participants selected option 4, seven participants selected option 5 and one participant selected option 6 (very familiar).
The arithmetic mean is 4, the median is 4 and the mode 5. The standard deviation is 1. It is slightly negatively skewed (-0.3) as the frequency is higher in the higher scores.
[Bar chart: Topic of the text. x-axis: answer option (1-6); y-axis: participants]
Figure 53. Topic of the text. Group C.
Topic
Mean                 4.05
Standard Error       0.245753
Median               4
Mode                 5
Standard Deviation   1.099043
Sample Variance      1.207895
Kurtosis             -0.55139
Skewness             -0.37201
Range                4
Minimum              2
Maximum              6
Sum                  81
Count                20
5.4.3.2. Experience working with Microsoft Office products
A six point scale was proposed to the participants: number 1 represented the lowest point of the scale (I have never worked with it) and number 6 represented the highest point (I am an advanced user). We present first an overview of the five results. Participants have more experience in Microsoft Word (with an arithmetic mean of 5.6), followed by PowerPoint (5.1), Excel (4.5), Outlook (4.4) and Access (2.6). The results of each of the products are presented in the next section.
[Bar chart: Experience with Microsoft products. Series: Excel, Word, PowerPoint, Access, Outlook; x-axis: answer option (1-6)]
Figure 54. Experience with Microsoft products. Group C.
Statistic            Excel   Word    PowerPoint   Access   Outlook
Mean                 4.5     5.6     5.15         2.65     4.4
Standard Error       0.24    0.13    0.13         0.31     0.36
Median               5       6       5            2        5
Mode                 5       6       5            2        6
Standard Deviation   1.1     0.59    0.58         1.42     1.63
Sample Variance      1.21    0.35    0.34         2.02     2.67
Kurtosis             -0.07   0.78    0.17         -0.3     -0.67
Skewness             -0.65   -1.24   -0.004       0.57     -0.72
Range                4       2       2            5        5
Minimum              2       4       4            1        1
Maximum              6       6       6            6        6
Sum                  90      112     103          53       88
Count                20      20      20           20       20
We present below the results obtained on each of the questions:
Excel
The mean is 4.5, the median and the mode are 5. The standard deviation is 1.1. It is negatively skewed (-0.6) as the frequency is higher in the higher scores.
[Bar chart: Excel. x-axis: answer option (1-6); y-axis: participants]
Figure 55. Experience with Excel. Group C.
Excel
Mean                 4.5
Standard Error       0.246021
Median               5
Mode                 5
Standard Deviation   1.100239
Sample Variance      1.210526
Kurtosis             -0.07606
Skewness             -0.65862
Range                4
Minimum              2
Maximum              6
Sum                  90
Count                20
Word
The arithmetic mean is 5.6, the median is 6 and the mode 6. The standard deviation is 0.5. It is negatively skewed (-1.2) as the frequency is higher in the higher scores.
[Bar chart: Word. x-axis: answer option (1-6); y-axis: participants]
Figure 56. Experience with Word. Group C.
Word
Mean                 5.6
Standard Error       0.133771
Median               6
Mode                 6
Standard Deviation   0.598243
Sample Variance      0.357895
Kurtosis             0.783126
Skewness             -1.24548
Range                2
Minimum              4
Maximum              6
Sum                  112
Count                20
PowerPoint
The arithmetic mean is 5.1, the median is 5 and the mode 5. The standard deviation is 0.5. It is almost not skewed (-0.004) as the frequency is higher in the middle values.
[Bar chart: PowerPoint. x-axis: answer option (1-6); y-axis: participants]
Figure 57. Experience with PowerPoint. Group C.
PowerPoint
Mean                 5.15
Standard Error       0.131289
Median               5
Mode                 5
Standard Deviation   0.587143
Sample Variance      0.344737
Kurtosis             0.17758
Skewness             -0.00433
Range                2
Minimum              4
Maximum              6
Sum                  103
Count                20
Access
The arithmetic mean is 2.6, the median is 2 and the mode 2. The standard deviation is 1.4. It is slightly positively skewed (0.5) as the frequency is higher in the lower scores.
[Bar chart: Access. x-axis: answer option (1-6); y-axis: participants]
Figure 58. Experience with Access.
Access
Mean                 2.65
Standard Error       0.318508
Median               2
Mode                 2
Standard Deviation   1.424411
Sample Variance      2.028947
Kurtosis             -0.30974
Skewness             0.573351
Range                5
Minimum              1
Maximum              6
Sum                  53
Count                20
Outlook
The arithmetic mean is 4.4, the median is 5 and the mode 6. The standard deviation is 1.6. It is negatively skewed (-0.7) as the frequency is higher in the higher scores.
[Bar chart: Outlook. x-axis: answer option (1-6); y-axis: participants]
Figure 59. Experience with Outlook. Group C.
Outlook
Mean                 4.4
Standard Error       0.365629
Median               5
Mode                 6
Standard Deviation   1.63514
Sample Variance      2.673684
Kurtosis             -0.67429
Skewness             -0.72554
Range                5
Minimum              1
Maximum              6
Sum                  88
Count                20
5.4.3.3. Linguistic difficulty
A six point scale was proposed to the participants to measure the linguistic difficulty of the task, number 1 represented the lowest point of the scale (Very difficult (I needed to consult many terms and expressions)) and number 6 represented the highest point (Very easy (I could do it without consulting external resources)). One participant chose option 3, six participants chose option 4, eleven participants chose option 5, and two participants chose option 6. The mean is 4.7, the median and the mode are 5. The standard deviation is 0.7. It is negatively skewed (-0.3) as the frequency is higher in the higher values.
[Bar chart: Linguistic difficulty. x-axis: answer option (1-6); y-axis: participants]
Table 22. Linguistic difficulty. Group C.
Difficulty
Mean                 4.7
Standard Error       0.163836
Median               5
Mode                 5
Standard Deviation   0.732695
Sample Variance      0.536842
Kurtosis             0.369541
Skewness             -0.33898
Range                3
Minimum              3
Maximum              6
Sum                  94
Count                20
5.4.3.4. Assistance
We asked participants in this question how many times they had asked for help. Eight options were offered to participants: None, 1, 2, 3, 4, 5, 6 and “Other”, where they could introduce their own answer. Fifteen participants selected the option “none”, one participant said that he/she asked once for help, two participants said that they had asked twice for help, one participant said that he/she had asked four times for help, and one participant said he/she asked six times for help.
The arithmetic mean is 0.7, the median and the mode are 0, the standard deviation is 1.6.
Assistance
Mean                 0.75
Standard Error       0.36183
Median               0
Mode                 0
Standard Deviation   1.618154
Sample Variance      2.618421
Kurtosis             5.742265
Skewness             2.437793
Range                6
Minimum              0
Maximum              6
Sum                  15
Count                20
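The standard error reported in these tables is the sample standard deviation divided by the square root of the number of answers. As a minimal sketch, the Group C assistance answers described above can be rebuilt from the stated frequencies and the values recomputed; the variable names are illustrative only.

```python
import math

# Group C answers to "how many times did you ask for help?":
# fifteen "none", one "1", two "2", one "4" and one "6".
answers = [0] * 15 + [1] + [2] * 2 + [4] + [6]

n = len(answers)
mean = sum(answers) / n
# Sample variance with an (n - 1) denominator.
variance = sum((a - mean) ** 2 for a in answers) / (n - 1)
sd = math.sqrt(variance)
# Standard error of the mean: sd / sqrt(n).
standard_error = sd / math.sqrt(n)

print(f"n={n} mean={mean:.2f} variance={variance:.6f} "
      f"sd={sd:.6f} se={standard_error:.5f}")
```

Because most answers are 0 and a few are large, the standard deviation is more than twice the mean, which is also why the skewness in the table is strongly positive.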
5.4.3.5. Doubts
In this question we asked the participants about the nature of the doubts that they had during the translation task. Five predefined answers were proposed to the participants: “linguistic related”, “technical related”, “CAT tool related”, “experiment instructions” and “other” where they could introduce their own answer. Participants could select more than one answer if applicable.
Nineteen participants declared that they had linguistic related doubts, three participants declared that they had technical related doubts, nine participants stated that they had CAT tool related doubts and only one participant had doubts regarding the experiment instructions. One participant also added his/her doubts to the “other” box, stating: “I did not know how to change validated segments. I had difficulties in being able to make general searches for consistency purposes, so I opted for manual visual search. Linguistic: I was not familiar with two Excel-specific terms since I use that programme rarely and always in English. I trusted the suggested translation.”
Doubt type                Participants
Linguistic related        19
Technical related         3
CAT tool related          9
Experiment instructions   1
Other                     1
Table 23. Doubts. Group C.
5.4.3.6. External resources
In this question we asked the participants if they had consulted any external resources. There were five options offered to the participants: “No” (for participants who had not consulted any external resources; those who had consulted some had to continue answering), “Machine Translation”, “Online Dictionaries”, “Microsoft Excel official Webpage” and “Other” with a box where they could introduce their own answer.
Three participants declared that they had not used external resources, no participant had used Machine Translation, fourteen participants reported using online dictionaries, and four participants reported consulting the Microsoft Excel official Webpage. In the “other” box five participants provided additional information: one participant claimed to have used the Microsoft Language Portal and another participant stated that he/she had used Microsoft terminology, two participants stated that they had used the Google search engine (one of them using the images search option), and one participant said that he/she had used his/her own personal dictionary.
External resource     Participants
No                    3
MT                    0
Online dictionaries   14
Ms Excel              4
Other                 5
Table 24. External resources. Group C.
5.4.4. Overall
5.4.4.1. Topic of the text
We asked the participants about how familiar they were with the topic of the text. A six point scale was offered to the participants, number 1 represented the lowest point of the scale (total unfamiliar) and number 6 represented the highest point (very familiar).
Two participants selected option 1 (total unfamiliar), two participants selected option 2, four participants selected option 3, nine participants selected option 4, eleven participants selected option 5 and five participants selected option 6 (very familiar).
The arithmetic mean is 4.2, the median is 4 and the mode 5. The standard deviation is 1.3. It is slightly negatively skewed as the frequency is higher in the higher scores.
[Chart: Topic of the text. Participants per scale point (1 to 6).]
Topic
Mean                 4.212121
Standard Error       0.237401
Median               4
Mode                 5
Standard Deviation   1.363763
Sample Variance      1.859848
Kurtosis             0.236732
Skewness             -0.80252
Range                5
Minimum              1
Maximum              6
Sum                  139
Count                33
5.4.4.2. Experience working with Microsoft Office products
A six point scale was proposed to the participants: number 1 represented the lowest point of the scale (I have never worked with it) and number 6 represented the highest point (I am an advanced user). We first present an overview of the five results. Participants had most experience with Microsoft Word (with an arithmetic mean of 5.6), followed by PowerPoint (5.1), Excel (4.6), Outlook (4.5) and Access (2.8). The results for each of the products are presented in the next section.
Figure 60. Experience with Microsoft products. Overall.
Statistic            Excel   Word    PowerPoint   Access   Outlook
Mean                 4.60    5.66    5.12         2.87     4.57
Standard Error       0.18    0.09    0.10         0.27     0.24
Median               5       6       5            2        5
Mode                 5       6       5            2        5
Standard Deviation   1.05    0.54    0.60         1.59     1.41
Sample Variance      1.12    0.29    0.37         2.54     2.00
Kurtosis             -0.20   1.03    -0.15        -1.05    0.04
Skewness             -0.63   -1.36   -0.05        0.40     -0.93
Range                4       2       2            5        5
Minimum              2       4       4            1        1
Maximum              6       6       6            6        6
Sum                  152     187     164          95       151
Count                33      33      32           33       33
We present below the results obtained on each of the questions:
Excel
The mean is 4.6, and the median and the mode are both 5. The standard deviation is 1. It is negatively skewed (-0.6) as the frequency is higher in the higher scores.
Figure 61. Experience with Excel. Overall
Excel
Mean                 4.606061
Standard Error       0.184326
Median               5
Mode                 5
Standard Deviation   1.058873
Sample Variance      1.121212
Kurtosis             -0.20771
Skewness             -0.6327
Range                4
Minimum              2
Maximum              6
Sum                  152
Count                33
Word
The arithmetic mean is 5.6, the median is 6 and the mode 6. The standard deviation is 0.5. It is negatively skewed (-1.3) as the frequency is higher in the higher scores.
Figure 62. Experience with Word. Overall.
Word
Mean                 5.666667
Standard Error       0.094013
Median               6
Mode                 6
Standard Deviation   0.540062
Sample Variance      0.291667
Kurtosis             1.030151
Skewness             -1.361
Range                2
Minimum              4
Maximum              6
Sum                  187
Count                33
PowerPoint
The arithmetic mean is 5.1, the median is 5 and the mode 5. The standard deviation is 0.6. It is almost not skewed (-0.05) as the frequency is higher in the middle values.
Figure 63. Experience with PowerPoint. Overall
PowerPoint
Mean                 5.125
Standard Error       0.10767
Median               5
Mode                 5
Standard Deviation   0.609071
Sample Variance      0.370968
Kurtosis             -0.15519
Skewness             -0.05711
Range                2
Minimum              4
Maximum              6
Sum                  164
Count                32
Access
The arithmetic mean is 2.8, the median is 2 and the mode 2. The standard deviation is 1.5. It is slightly positively skewed (0.4) as the frequency is higher in the lower scores.
[Chart: Access. Participants per scale point (1 to 6).]
Access
Mean                 2.878788
Standard Error       0.277835
Median               2
Mode                 2
Standard Deviation   1.596042
Sample Variance      2.547348
Kurtosis             -1.05528
Skewness             0.40666
Range                5
Minimum              1
Maximum              6
Sum                  95
Count                33
Outlook
The arithmetic mean is 4.5, the median is 5 and the mode 5. The standard deviation is 1.4. It is negatively skewed (-0.9) as the frequency is higher in the higher scores.
Figure 64. Experience with Outlook. Overall.
Outlook
Mean                 4.575758
Standard Error       0.2463
Median               5
Mode                 5
Standard Deviation   1.414883
Sample Variance      2.001894
Kurtosis             0.046646
Skewness             -0.93897
Range                5
Minimum              1
Maximum              6
Sum                  151
Count                33
5.4.4.3. Linguistic difficulty
A six point scale was proposed to the participants to measure the linguistic difficulty of the task: number 1 represented the lowest point of the scale (Very difficult (I needed to consult many terms and expressions)) and number 6 represented the highest point (Very easy (I could do it without consulting external resources)). One participant chose option 2, three participants chose option 3, eight participants chose option 4, eighteen participants chose option 5 and three participants chose option 6. The mean is 4.5, and the median and the mode are both 5. The standard deviation is 0.9. The distribution is negatively skewed (-0.9), as the frequency is higher in the higher values.
Figure 65. Linguistic difficulty. Overall.
Difficulty
Mean                 4.575758
Standard Error       0.157094
Median               5
Mode                 5
Standard Deviation   0.902438
Sample Variance      0.814394
Kurtosis             1.035127
Skewness             -0.91941
Range                4
Minimum              2
Maximum              6
Sum                  151
Count                33
5.4.4.4. Assistance
We asked participants in this question how many times they had asked for help. Eight options were offered to participants: None, 1, 2, 3, 4, 5, 6 and “Other”, where they could introduce their own answer.
Twenty-four participants selected the option “none”, four participants said that they had asked once for help, two participants said that they had asked twice for help, two participants said that they had asked four times for help, and one participant said he/she had asked six times for help.
The arithmetic mean is 0.6, the median and the mode are 0, and the standard deviation is 1.4.
Assistance
Mean                 0.666667
Standard Error       0.248734
Median               0
Mode                 0
Standard Deviation   1.428869
Sample Variance      2.041667
Kurtosis             6.348387
Skewness             2.549239
Range                6
Minimum              0
Maximum              6
Sum                  22
Count                33
5.4.4.5. Doubts
In this question we asked the participants about the nature of the doubts that they had during the translation task. Five predefined answers were proposed to the participants: “linguistic related”, “technical related”, “CAT tool related”, “experiment instructions” and “other” where they could introduce their own answer. Participants could select more than one answer if applicable.
Twenty-seven participants declared that they had linguistic related doubts; four participants declared that they had technical related doubts; seventeen participants stated that they had CAT tool related doubts; and only two participants had doubts regarding the experiment instructions. Three participants also added their doubts to the “other” box: one participant stated “When I started the translation I wanted to know how to apply information but I did not need it.”, another participant stated “Terminological doubts. I had proposals, but I wanted to use the official MS terminology.”, and the third participant stated: “I did not know how to change validated segments. I had difficulties in being able to make general searches for consistency purposes, so I opted for manual visual search. Linguistic: I was not familiar with two Excel-specific terms since I use that programme rarely and always in English. I trusted the suggested translation.”
Doubt type                Participants
Linguistic related        27
Technical related         4
CAT tool related          17
Experiment instructions   2
Other                     3
Table 25. Doubts. Overall
External resources
In this question we asked the participants if they had consulted any external resources. There were five options offered to the participants: “No” (for participants who had not consulted any external resources; those who had consulted some had to continue answering), “Machine Translation”, “Online Dictionaries”, “Microsoft Excel official Webpage” and “Other” with a box where they could introduce their own answer.
Six participants declared that they had not used external resources; one of these six participants clarified in the “other” box that she had used “Just the internal concordance search of the CAT tool”. One participant reported using Machine Translation, twenty participants reported using online dictionaries, and ten participants reported consulting the Microsoft Excel official Webpage. In the “other” box nine participants provided additional information: two participants claimed to have used the Microsoft Language Portal (one of them filtering by “Excel”), three participants stated that they had used the Google search engine (one of them using the images search option), another participant stated that he/she had used Microsoft terminology, one participant said that he/she had used his/her own personal dictionary, and one participant claimed to have used Linguee and Proz.
External resource     Participants
No                    6
MT                    1
Online dictionaries   20
Ms Excel              10
Other                 9
Table 26. External resources. Overall
5.5. Task Specific Questionnaire. Group C, additional questions
In this section we present the data that was obtained from the additional questions that only Group C received in the task specific questionnaire. These questions aimed to obtain information about the participants’ experience with the metadata offered. The complete set of answers from all the participants in Group C can be found in Appendix F.
Consulted metadata items
In this question we asked participants which metadata items they had consulted. The five metadata items were offered to them as possible answers (contact-name, date, target-language, original and category); they could select more than one option if applicable.
Eighteen participants said that they had consulted the name of the translator (contact-name), eleven participants said that they had consulted the date of the translation, eleven participants selected the target language, three participants the name of the original file (original) and four participants the topic of the translation (category).
Metadata          Participants
contact-name      18
date              11
target-language   11
original          3
category          4
Useful metadata items
In this question we asked participants which metadata items they had found more useful. The five metadata items were offered to them as possible answers (contact-name, date, target-language, original and category); they could select more than one option if applicable.
The results are similar to those of the previous question but with lower counts for each option. Thirteen participants selected the name of the translator (contact-name), four participants selected the date of the translation, eight participants selected the target language, two participants selected the name of the original file (original) and two participants selected the topic of the translation (category).
Metadata          Participants
contact-name      13
date              4
target-language   8
original          2
category          2
Less useful metadata items
In this question we asked participants which metadata items they found less useful. This question, which was related to the previous one, aimed to obtain information about the less useful metadata items and also to test the internal validity of the questionnaire: it could help us to determine whether the participants were answering truthfully (if the answers to both questions were consistent) and not randomly (if the answers to both questions were inconsistent; for example, if the same participant had marked the same options in both opposite questions). The five metadata items were offered to them as possible answers (contact-name, date, target-language, original and category); they could select more than one option if applicable.
The results from this question showed us that there was a consistency in the way the participants answered this question and the previous one, as the answers to this question were different from the previous one. Three participants selected the name of the translator (contact-name), five participants selected the date of the translation, four participants selected the target language, six participants the name of the original file (original) and five participants the topic of the translation (category).
Metadata          Participants
contact-name      3
date              5
target-language   4
original          6
category          5
Not consulted metadata items
In this question we asked participants which metadata items they had not consulted. This question, which was related to the first question (which metadata items had you consulted), aimed to obtain information about the items that were not consulted and also to test the internal validity of the questionnaire, as it could help us to determine whether the participants were answering truthfully and not randomly. The five metadata items were offered to them as possible answers (contact-name, date, target-language, original and category); they could select more than one option if applicable.
The results from this question showed us that there was a consistency in the way participants answered this question and the first one, as the answers to this question were different from the first one. Three participants said that they had not consulted the name of the translator (contact-name), six participants said that they had not consulted the date of the translation, four participants selected the target language, thirteen participants the name of the original file (original) and twelve participants the topic of the translation (category).
Metadata          Participants
contact-name      3
date              6
target-language   4
original          13
category          12
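The aggregate counts of the two opposite questions (consulted and not consulted) allow a coarse arithmetic cross-check. The sketch below is illustrative only: it uses the counts reported above and the Group C size of 20, and it computes lower bounds on how many participants must have ticked an item in both questions or in neither. The function name `aggregate_check` is hypothetical, and bounds of this kind cannot replace a per-participant comparison, which would require the individual answers in Appendix F.

```python
N = 20  # number of participants in Group C

# Counts reported above for the "consulted" and "not consulted" questions.
consulted = {"contact-name": 18, "date": 11, "target-language": 11,
             "original": 3, "category": 4}
not_consulted = {"contact-name": 3, "date": 6, "target-language": 4,
                 "original": 13, "category": 12}


def aggregate_check(yes, no, n):
    """Lower bounds derived from aggregate counts of two opposite questions."""
    report = {}
    for item in yes:
        total = yes[item] + no[item]
        report[item] = {
            # If the two counts add up to more than n, at least this many
            # participants ticked the item in both questions.
            "both_at_least": max(0, total - n),
            # If they add up to less than n, at least this many participants
            # ticked the item in neither question.
            "neither_at_least": max(0, n - total),
        }
    return report


report = aggregate_check(consulted, not_consulted, N)
for item, bounds in report.items():
    print(item, bounds)
```

The two questions were multi-select and optional per item, so a non-zero bound is not necessarily an error; it only flags items for which the individual answers are worth inspecting.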
Could you explain in one sentence how has the metadata influenced your work?
Nineteen participants answered this question. Their answers are presented below; an analysis of their content will be presented in the next chapter.
Participant   Explain in one sentence how has the metadata influenced your work
GC1    I checked the validity of the version, reliability and Spanish variant by looking at the metadata.
GC2    Not much, as I had some experience translating Microsoft products. But I "respected" most of the provided TM segments as I assumed they were "approved", so I do not tend to change "approved" segments unless I find major mistakes in them.
GC3    If the contact name is "Antonio Garcia" as the official translator, the suggested translation is reliable.
GC4    Name did not influence that much since there were translation from the official translator that I thought could have been improved. Language variants is important only if translation are very old (2003-2004 or so)
GC5    Knowing that Antonio Garcia’s translation after 2008 was considered official it helped me maintain his style and choice of terminology even if I didn´t agree.
GC6    I didn’t influence my work to a large extent, I only associated the translator with how they had translated the segments.
GC7    When I saw that the contact name was Antonio García, I´d try to stick as much as possible to his translation. Otherwise, I´d try to give my best translation (having in mind the rest of the text). 2. If I had a target language other than ES-ES, I reviewed and though more my translation. Since this text was quite technical, the chances to find a localism were minimal (although the software could have several Spanish versions, so the names of the options, menus functions, etc. could differ). 3. I realised of the original field quite in the end [43]. I think the last 2 fields can be very useful in a real work.
GC8    It usually do not influence my translation, as whoever translated it if I believe I should change the translation I do it. May be, if I do really trust a translation/translator I do not check the terms.
GC9    Only when I had a doubt with a word did I look at the metadata it[e]m "original", to see if it could help.
GC10   Absolutely nothing. Didn´t find any use to them.
GC11   n/a
GC12   Metadata were very useful
GC13   To confirm choices
GC14   To know if I can trust the proposed translation.
GC15   I considered that the translations of the approved translator (Antonio García), were quite official since they correspond to an official translation. However, I did not take his translations for granted, and sometimes I changed some things. I also used other translations by Spanish (es-ES) translators and by other translators (es-MX, es-CO, ....). When I see other varieties of Spanish other than es-ES I do not only look for differences within the context, but also for stylistic differences. Therefore, I read them more carefully, but I also trust them if I find out that they are right.
GC16   If I was not sure on how to write the sentence I consulted the metadata.
GC17   I've found it really useful when deciding whether or not to accept the proposed translation.
GC18   Approved translations conditioned my translation in a higher degree than not approved ones; target information allowed me to understand when I had the option to choose different terminology/syntax.
GC19   It helped me to make decisions
GC20   It helped me decide whether to use or not the already existing segment.
[43] We believe that the participant meant to say that he/she did not “notice” that there was the item “original” until almost the end of his/her task.
Would you suggest any other metadata item?
Eight participants provided information in this question. Their suggestions were related to: information about the approved/final translation state (suggested by GC4 and GC7); information about the project and client (suggested by GC6 and GC8); the translation of the term in other languages known by the translator (suggested by GC9); information about the value for the customer (suggested by GC10); and information about the “degree of trustworthiness” (suggested by GC20). GC17 considered that it was enough to have information about the "language combination", "date" and "name of the translator".
Participant   MD suggestion
GC1    No
GC2    n/a
GC3    No
GC4    "Translation validated by excel" (having the translation of the official translator does not necessarily mean that this translation is the final translation available on the markets now, it could be a previous unreviewed version.
GC5    No
GC6    Yes, further details on the source of the translation, e.g. Project name, client name...
GC7    The translation status: is it pending review or has already been reviewed by somebody else?
GC8    Preference/approval of the client/agency, mainly regarding terminology.
GC9    Suggest the translation of the term in other languages known by the translator.
GC10   Yes: value for the customer.
GC11   n/a
GC12   No
GC13   n/a
GC14   No
GC15   I cannot think of any in this moment.
GC16   n/a
GC17   I do think it is enough with "language combination", "date" and "name of the translator".
GC18   Not at this point
GC19   n/a
GC20   Antonio García is the "official" translator. This piece of information is critical, but is not a metadata item. I really do not care about the translator's name as long as I know he/she is trustworthy. The relevant information is not the name, but the degree of trustworthiness of the person who translated the segment. As a translator, I might have no idea who Antonio García is, which would render this metadata item useless. Of course I am aware of the difficulty of introducing a "trustworthiness degree" item.
If you could choose between having a translation memory with metadata or another one without metadata, which one would you prefer?
For this question two predefined answers were given to the participants: “translation memory with metadata”, that was selected by eighteen participants, and “translation memory without metadata”, that was selected by two participants.
How distracted were you by the metadata during the translation task?
A six point scale was offered to the participants: number 1 represented the lowest point of the scale (I was confused by the amount of data) and number 6 represented the highest point (I was not distracted at all, it only helped me). Most of the participants stayed in the higher levels, which indicated little or no distraction: nine participants selected option 6, eight participants selected option 5, one participant selected option 4 and two participants selected option 3. The mean is 5.2, the median 5, the mode 6 and the standard deviation 0.9. It is negatively skewed (-1.2) as the frequency is higher in the higher scores.
Level of distraction   Participants
1                      0
2                      0
3                      2
4                      1
5                      8
6                      9
Level of distraction
Mean                 5.2
Standard Error       0.212751
Median               5
Mode                 6
Standard Deviation   0.951453
Sample Variance      0.905263
Kurtosis             1.099614
Skewness             -1.25471
Range                3
Minimum              3
Maximum              6
Sum                  104
Count                20
Did the metadata influence you to do a better job?
We offered two predefined answers to this question: “yes”, that was selected by twelve of the participants and “no” that was selected by eight of the participants.
If yes, could you give us an example?
Fourteen [44] participants answered this question. Their answers are presented below; an analysis of their content will be presented in the next chapter.
Participant   Examples
GC1    I preferred options by Antonio García, the official translator and made changes to the translations of other translators accordingly.
GC2    n/a
GC3    I had some doubts about the term "axis", but it has been translated as "eje" on translated segments by Antonio Garcia.
GC4    I said no because even if metadata influenced me, I would still check the actual text and make as many changes as needed. I think only used one segment directly suggested by the programme
GC5    As mentioned earlier, it helped me maintain consistency with official translation.
GC6    I tried to stick to the official translation (the one provided by Antonio García)
GC7    When I saw a Antonio Garcia’s segment, I didn´t have to worry about its accurateness. Also, when I saw that all of the translations were previous to 2011, I knew that I could update the word "sólo" to make it match the new RAE norm.
GC8    n/a
GC9    n/a
GC10   n/a
GC11   I think they haven’t changed during the whole translation.
GC12   Less time spared.
GC13   n/a
GC14   n/a
GC15   I have no concrete example, but when I know when the translation was done, in which variety of Spanish, by which translator, in which context,... It is easier to know if the translation applies to the text I am translating or not. It may happen that the same sentence appears in two text about different topics, if you know which text that translation comes from, you may quickly know if you can use it on your text or not.
GC16   I was giving an incorrect order in Spanish.
GC17   Knowing the name of the official translator, we provide a translation which is consistent with the previous official ones.
GC18   I wouldn’t consider it a better job, but it allowed me to be consistent with the terminology and style that have been previously approved, even though at times I would have chosen other solutions.
GC19   I'm not quite sure it did, but having information about the category helped me contextualize the different sentences.
GC20   I corrected a term base on the term that was used by the "official" translator.
[44] Two of the participants (GC4 and GC11) who had answered “No” in the previous question also answered this question to better justify their choice.
Additional comments
Nine participants provided information in the additional comments box.
GC1 stated that “The server was bit too slow sometimes, maybe it was my connection.” GC2 suggested that he/she might have worked better if he/she had had more experience with the CAT tool used in the experiment, and added that “I think the use of the CAT tool is very important to get a real feel of how both the tool and metadata can influence our work and time spent on our work.” GC4 suggested giving the translators more than one translation suggestion, so that they could choose between them based on the text quality and metadata. GC6 stated that “[t]he tool works very fast and is very useful. I like the number of options it offers.” GC10 said that the metadata could be very helpful if associated with previous instructions from the customer. GC13 explained a problem he/she had with one of the sentences; he/she did not agree that the original sentence was consistent. GC15 explained a technical problem he/she had when changing from one segment to the other in the CAT tool. Finally, GC18 stated that he/she believed “having metadata is very useful. Thinking about larger projects where the amount of information available within a database is very wide, having information that pinpoints to the right decisions may make your job easier and limit the amount of unpreferred terminology used. However, I can also see that if no clear information is provided about metadata at the beginning of a project, if this is a lot, it can confuse translators rather than help them. This would be to a major extent when dealing with large TMs where a lot of information is provided, besides risking having to spend additional time reading the notes for each new entry.”
Participant   Additional comments
GC1    No comments. The server was bit too slow sometimes, maybe it was my connection.
GC2    I think that for the experiment to work out better, one would need to have a "practice" period of at least a couple of texts before working on the experiment. I think the use of the CAT tool is very important to get a real feel of how both the tool and metadata can influence our work and time spent on our work.
GC3    n/a
GC4    I found this experiment interesting. A suggestion: it is sometimes more useful for a translator to have several matches suggested by the programme so that translator can choose depending on text quality and metadata. If only one hit is offered, translator might tend to just copy the suggested match and improve it or change it as appropriate. good work!!
GC5    n/a
GC6    The tool works very fast and is very useful. I like the number of options it offers.
GC7    n/a
GC8    n/a
GC9    I would be interested in knowing the final conclusions of the experiment. Thank you
GC10   Metadata can be very useful associated to previous instructions from the customer. E.g. "all TMUs with code XXX are considered extremely reliable"
GC11   n/a
GC12   n/a
GC13   I had an issue with the last sentence which I think it was wrong. Fewer number of cells would mean less time for Excel to calculate, not more. I made an alteration to the source language. In a real situation I would have consulted the client to ask for more information or to ascertain the accuracy of the text. I always work on my office and the fact that my family were in the house at the time of the experiment has affected my concentration which would not have been the case in normal circumstances.
GC14   n/a
GC15   When translating with Swordfish, I could not go from one segment to the next by using the arrows or the mouse. They would not open when clicking on them or trying to move from one to another. I had to go through the translation using ctrl+down arrow. At the end of the translation, I tried to go back to change the first segment, I could not open it again. I asked Lucía and we decided to stop the experiment there. i do not know what the exact problem was, but it made it difficult to change my translation after doing it for the first time.
GC16   n/a
GC17   n/a
GC18   I believe having metadata is very useful. Thinking about larger projects where the amount of information available within a database is very wide, having information that pinpoints to the right decisions may make your job easier and limit the amount of unpreferred terminology used. However, I can also see that if no clear information is provided about metadata at the beginning of a project, if this is a lot, it can confuse translators rather than help them. This would be to a major extent when dealing with large TMs where a lot of information is provided, besides risking having to spend additional time reading the notes for each new entry.
GC19   n/a
GC20   n/a
5.6. Review of the Translation
Measuring the quality of translation, and in language in general, is always something difficult to do, it is a very subjective area that is normally taken for granted (García 2008 p.51); we agree with García that the perception of quality depends on the subjects or the public who will “consume” the translated text, however we should also qualify his statement by saying that quality is not an underestimated topic in the translation area, on the contrary, it has been subject of much study and discussion (Ellen Wright 2006 pp.252-555). In order to evaluate the quality of the translation task in quantitative terms, we explored the different existing quality metrics on translation, these quantitative systems could be defined as “Metric-related expressions of translation adequacy yield formal quantifiable numerical values by providing lists of critical characteristics, which are weighted numerically in order to objectify any errors that may be present” (Ellen Wright 2006 p.259). This author identified three existing translation quality metrics: SAE J 2450 (whose “scope is to develop a metric for the evaluation of translation quality for service documentation in the automotive industry”); the ATA Framework for Standard Error Marking (which provided “a far more comprehensive evaluative tool than the SAE metric, but this higher level of complexity reflects the fact that it is designed for assessing a full range of text types and subject matters”) and the LISA QA Model which it was intended “for measuring not only translation quality, but also all aspects of the localisation process as well”.
For the evaluation of the quality of our experiments we chose the LISA QA Model. More specifically, we worked with its Printed and Online Documentation (Microsoft Excel) template from version 1.0, adapted to the later requirements of version 3.1 (explained in detail later), which included the following error categories: Accuracy (Omissions, Additions, Cross-references, Headers/Footers), Terminology (Glossary adherence, Context), Language (Grammar, Semantics, Punctuation, Spelling), Style (General style, Register/Tone, Language variants/slang) and Country (Country standards, Local suitability, Company standards) (LISA 1995).
The errors were also graded as follows:
A= Critical error (max. error point allowed +1)
B= Major error (5 error points)
C= Minor error (1 error point)
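The grading above can be read as a weighted tally of errors. The following is only a minimal sketch under stated assumptions: the function name `error_points` and the parameter `max_points_allowed` are hypothetical, the error counts are illustrative and not taken from the study, and the sketch does not attempt to reproduce how the LISA QA Model template turns error points into its final figure.

```python
# Severity weights taken from the grading listed above.
MINOR_POINTS = 1   # C = Minor error
MAJOR_POINTS = 5   # B = Major error


def error_points(minor: int, major: int, critical: int, max_points_allowed: int) -> int:
    """Return the total error points for one translation (hypothetical helper).

    A critical error (A) is weighted as the maximum error points allowed plus one,
    so a single critical error always exceeds the allowed maximum.
    """
    critical_points = max_points_allowed + 1
    return minor * MINOR_POINTS + major * MAJOR_POINTS + critical * critical_points


# Illustrative counts only: 4 minor and 2 major errors, no critical error.
print(error_points(minor=4, major=2, critical=0, max_points_allowed=10))
```

With these weights, one major error costs as much as five minor ones, which is why a few major errors can dominate a participant's result.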
In LISA QA Model 3.1 (2007) the quality analysis was divided into seven tasks: Documentation language, Documentation Formatting, Help Formatting, Help Formatting-Asian, Software Formatting, Software Functionally Testing and Documentation Formatting-Asian. Due to the nature of our translation text, we focused only on the first task of the model (“Documentation Language”). This task had seven possible error categories:
-Mistranslation
-Accuracy (sub-categories: Omissions, additions, cross-references, and headers/footers).
-Terminology (sub-categories: glossary adherence, context).
-Language (sub-categories: grammar, semantics, punctuation and spelling).
-Style (sub-categories: general style, register/tone and language variants/slang)
-Country (sub-categories: country standards, local suitability and company standards).
-Consistency.
If we compare it with version 1.0, we can observe that the new 3.1 system included two more categories (“consistency” – which was part of the “functionality” task of the product documentation- and “mistranslation”). The subcategories of the other five were the same as in version 1.0. Therefore, for our analysis we modified the 1.0 template (Printed [and Online] Documentation Long, Prdclng.xls) to accommodate the changes described in version 3.1.
In order to avoid any bias from the researcher, the analysis of the participants’ translation task was carried out by an external professional reviewer. The reviewer received a zip file which contained all the translated XLIFF files, which had been pre-processed and transformed into Microsoft .docx format to help45 him in his task. Each of the files had the original participant number (e.g. P1 or P71); the reviewer did not receive any information about the origin of the individual translators. We only told him that translations could come from professional translators or translation students, and that there was no relation whatsoever with the order of the numbers. The reviewer also received the template and he used a different worksheet for each of the participants.
45 XLIFF files are XML files that are normally intended to be read and transformed using CAT tools, although as they are XML files they can be opened with tools that process that format. We decided to transform the translated XLIFF files into docx documents and only present the
Figure 66. Screenshot of one of participants’ LISA QA Model results
The template started with a 100% percentage, which was the maximum score that a translation would get if no errors were found by the reviewer. In our case, we were not expecting perfect translations, as we had stated in the translation task assignment instructions that participants should only translate, and not review their work. Therefore, the translation
In the next section we present a summary of the data obtained from the review of the translations with the LISA QA Model. The number calculated for each participant represents the points that the translator obtained in the task. These points could be negative (and, in fact, they were in most of the cases), as each error deducts a certain number of points depending on the type of error and its severity. A compilation of the evaluation sheets of all the participants can be found in Appendix G.
5.6.1. Group A
All participants received a negative score in the analysis. The results varied between the participants (the standard deviation is 47.4). The arithmetic mean is -79, the median is -97 and the mode is -124. The distribution is positively skewed (0.4), as the frequency is higher in the lower values.
[Chart: Quality; vertical axis from 0 to -140]
Figure 67. LISA QA Model results. Group A.
Quality
Mean                  -79
Standard Error        17.9417
Median                -97
Mode                  -124
Standard Deviation    47.46929
Sample Variance       2253.333
Kurtosis              -1.99281
Skewness              0.480806
Range                 106
Minimum               -124
Maximum               -18
Sum                   -553
Count                 7
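Several rows of the descriptive statistics above are linked by standard formulas, which gives a quick way to sanity-check a table. The sketch below uses only the Group A figures reported above; the variable names are illustrative.

```python
import math

# Values reported for Group A in the table above.
total = -553          # Sum
count = 7             # Count
std_dev = 47.46929    # Standard Deviation (sample)

mean = total / count                      # Mean = Sum / Count
std_error = std_dev / math.sqrt(count)    # Standard Error = SD / sqrt(n)
variance = std_dev ** 2                   # Sample Variance = SD squared

print(f"mean={mean:.1f} standard_error={std_error:.4f} variance={variance:.3f}")
```

The three derived values agree with the Mean, Standard Error and Sample Variance rows to the precision reported in the table.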
5.6.2. Group B
All participants received a negative score in the analysis, but in general the scores are higher than those of Group A and slightly higher than those of Group C. The arithmetic mean is -39 and the median is -38. The distribution is negatively skewed (-0.6), as the frequency is higher in the higher values. The standard deviation (27) is not as high as in Group A.
[Chart: Quality; vertical axis from 0 to -90]
Figure 68. LISA QA Model results. Group B.
Quality
Mean                  -39
Standard Error        11.04234
Median                -38
Mode                  #N/A
Standard Deviation    27.04811
Sample Variance       731.6
Kurtosis              2.08478
Skewness              -0.68331
Range                 83
Minimum               -85
Maximum               -2
Sum                   -234
Count                 6
5.6.3. Group C
All participants received a negative score in the analysis; in general the scores are higher than those of Group A, but slightly lower than those of Group B. The arithmetic mean is -42, the median is -24.5 and the mode is -12. The distribution is negatively skewed (-1.8), as the frequency is higher in the higher values. The standard deviation (44) is similar to that of Group A.
[Chart: Quality; vertical axis from 0 to -200]
Figure 69. LISA QA Model results. Group C.
Quality
Mean                  -42
Standard Error        10.00789
Median                -24.5
Mode                  -12
Standard Deviation    44.75665
Sample Variance       2003.158
Kurtosis              3.681946
Skewness              -1.83449
Range                 177
Minimum               -180
Maximum               -3
Sum                   -840
Count                 20
5.6.4. Overall
All participants received a negative score in the analysis. Taking the 33 participants together, the arithmetic mean is -49, the median is -37 and the mode is -124. The distribution is negatively skewed (-1.1), as the frequency is higher in the higher values.
If we take into account the arithmetic mean of the three groups, Group A clearly obtained the worst results (-79), while only a small difference was observed between the scores of Group B (-39) and Group C (-42). Correlation analysis between all the variables computed will be presented in the next chapter, where we determine whether there were significant relationships between the quality and other variables (years of experience, experience using Swordfish, etc.).
Quality
Mean                  -49.303
Standard Error        7.730974
Median                -37
Mode                  -124
Standard Deviation    44.41107
Sample Variance       1972.343
Kurtosis              0.82516
Skewness              -1.16351
Range                 178
Minimum               -180
Maximum               -2
Sum                   -1627
Count                 33
[Chart: Quality; Group A, Group B and Group C; vertical axis from 0 to -90]
Figure 70. LISA QA Model results. Arithmetic mean all groups.
[Chart: Quality all participants; Group A, Group B and Group C; vertical axis from 0 to -200]
Figure 71. LISA QA Model results. Overall
5.7. Video observations
The movements of the screen were recorded using the software application BB FlashBack Express. Participants were told to activate the software (by clicking on a shortcut to the application on the desktop of the server, and then clicking on the red recording button). Participants were asked to stop the recording after they had finished their task and to name the output file with their corresponding participant name.
This application recorded two types of information: the video image which contained the movements of the screen and which was presented in the proprietary .fbr46 format, and an XML file with the keystroke information.
5.7.1. Analysis of the Keylog information
From the XML we obtained the typing information of each of the participants, which gave us a detailed dataset of the data introduced by the participants. In order to analyse the information included in those files we introduced it into a spreadsheet (using Microsoft Excel) and made use of the "IFCount" (COUNTIF) formula to automate the analysis of the data. The tool already divided the data obtained following a proprietary XML vocabulary47. Here is an example of one of these XML files with the four elements that were recorded:
The root element is
of the keys that were pressed, it only includes keys that represent letters48 and numbers, the attribute “timestamp” includes the information of the time when that key was pressed. The
46 This format can also be transformed into standardised video formats like .avi
47 The description of the XML vocabulary presented here is based on our own observation of the vocabulary through the files obtained from the participants. An XML Schema or DTD was not available; therefore we interpreted the vocabulary based on the sample of files we had.
From the analysis of all the files obtained from the participants we obtained the following combinations of two or three keys (that we called "multiple virtual keys" in the analysis, and that are traditionally called “shortcuts” or “accelerators”): [Alt]+[], [Alt]+[+], [Alt]+[a], [Alt]+[á], [Alt]+[ArwDown], [Alt]+[ArwUp], [Alt]+[CpLk], [Alt]+[e], [Alt]+[Esc], [Alt]+[f], [Alt]+[F4], [Alt]+[i], [Alt]+[Ins], [Alt]+[NUM 0], [Alt]+[NUM 1], [Alt]+[NUM 2], [Alt]+[NUM 3], [Alt]+[NUM 4], [Alt]+[NUM 5], [Alt]+[NUM 6], [Alt]+[NUM 7], [Alt]+[PrntScr], [Alt]+[q], [Alt]+[Ret], [Alt]+[s], [Alt]+[Spc], [Alt]+[TAB], [Ctrl]+, [Ctrl]+[=], [Ctrl]+[A], [Ctrl]+[B], [Ctrl]+[Alt]+[Ret], [Ctrl]+[Alt]+[U], [Ctrl]+[ArwDwn], [Ctrl]+[ArwLft], [Ctrl]+[ArwRght], [Ctrl]+[ArwUp], [Ctrl]+[BkSp], [Ctrl]+[C], [Ctrl]+[Del], [Ctrl]+[E], [Ctrl]+[F], [Ctrl]+[F4], [Ctrl]+[HOME], [Ctrl]+[L], [Ctrl]+[O], [Ctrl]+[p], [Ctrl]+[R], [Ctrl]+[Ret], [Ctrl]+[Shift]+[ArwLft], [Ctrl]+[Shift]+[ArwRght], [Ctrl]+[Shift]+[HOME], [Ctrl]+[T], [Ctrl]+[TAB], [Ctrl]+[V], [Ctrl]+[W], [Ctrl]+[X], [Ctrl]+[Y], [Ctrl]+[Z], [Shift]+[ArwDwn], [Shift]+[ArwLft], [Shift]+[ArwRght], [Shift]+[ArwUp], [Shift]+[BkSp], [Shift]+[END], [Shift]+[HOME], [Shift]+[PgUp], [Shift]+[Ret], and [Shift]+[Win]; and the following single virtual keys: ArwDwn, ArwLft, ArwRght, ArwUp, BkSp, CpLk, Ctrl, Del, END, Esc, F1, F8, HOME, Ins, NmLk, NUM 1, NUM 7, PgDwn, PgUp, Ret, Spc, TAB, and Win.
As explained below, we analysed the keylog information focusing on two categories that the XML differentiates: keys and virtual keys (we also divided this last group between combinations of keys –that we called “multiple virtual keys”– and single keys). We present a descriptive analysis of the data obtained divided by each group and finally an overall view of the results. In Appendix H, we present all the data obtained from all the groups.
48 Letters with accents, which are produced using the Spanish keyboard layout as a combination of two keys, were presented in the XML file as one single key.
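The categorisation described above was carried out in a spreadsheet; the following is only a sketch of the same counting logic in Python. It assumes each recorded key is available as a plain label written as in the lists above (for example `[Ctrl]+[C]` or `BkSp`); how the labels are stored in the recording tool's XML is not reproduced here, and the function names and the sample input are hypothetical.

```python
from collections import Counter

# Single virtual keys as listed in the text above.
SINGLE_VIRTUAL_KEYS = {
    "ArwDwn", "ArwLft", "ArwRght", "ArwUp", "BkSp", "CpLk", "Ctrl", "Del",
    "END", "Esc", "F1", "F8", "HOME", "Ins", "NmLk", "NUM 1", "NUM 7",
    "PgDwn", "PgUp", "Ret", "Spc", "TAB", "Win",
}


def classify(label: str) -> str:
    """Assign one key label to a category (assumed label format)."""
    if "]+" in label:                      # e.g. "[Ctrl]+[C]": a key combination
        return "multiple_vkeys"
    if label in SINGLE_VIRTUAL_KEYS:       # e.g. "BkSp": a single non-character key
        return "single_vkeys"
    return "keys"                          # letters and numbers


def count_categories(labels: list[str]) -> Counter:
    """Count labels per category and add the two derived totals."""
    counts = Counter(classify(label) for label in labels)
    counts["total_vkeys"] = counts["multiple_vkeys"] + counts["single_vkeys"]
    counts["total_keys"] = counts["keys"] + counts["total_vkeys"]
    return counts


# Hypothetical input, not taken from any participant's file.
sample = ["h", "o", "l", "a", "Spc", "[Ctrl]+[C]", "[Ctrl]+[V]", "Ret", "BkSp", "á"]
print(count_categories(sample))
```

The two derived totals mirror the TotalVirtualkeys and Total Keys columns used in the tables of this section.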
5.7.1.1. Group A
The arithmetic mean of the total number of keys introduced is 5337.8, the median is 5455 and the standard deviation is 571.7; the distribution is negatively skewed (-0.4), as the frequency is higher in the higher scores. The arithmetic mean of the normal keys (letters and numbers) is 3866.4, the median is 3870 and the standard deviation is 308.9; it is slightly positively skewed (0.1). The arithmetic mean of the multiple virtual keys (a combination of two or three keys pressed together) is 174.4 and the median is 78; it is positively skewed (0.3). The arithmetic mean of the single virtual keys (individual keys that are not numbers or letters) is 1297, the median is 1277 and the standard deviation is 325; it is negatively skewed (-0.5). The arithmetic mean of the total number of virtual keys (the sum of the multiple and single keys) is 1471.4, the median is 1531 and the standard deviation is 346.3; it is positively skewed (0.5).
                      Keys        Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
Mean                  3866.429    174.4286         1297          1471.429           5337.857
Standard Error        116.7657    72.43613         122.8737      130.8996           216.1197
Median                3870        78               1277          1531               5455
Mode                  #N/A        #N/A             #N/A          #N/A               #N/A
Standard Deviation    308.933     191.648          325.0933      346.3278           571.799
Sample Variance       95439.62    36728.95         105685.7      119943             326954.1
Kurtosis              -0.51453    -2.67684         -0.80086      -0.03661           -1.73343
Skewness              0.140521    0.320217         -0.57362      0.582411           -0.49296
Range                 877         397              891           993                1322
Minimum               3464        0                776           1071               4593
Maximum               4341        397              1667          2064               5915
Sum                   27065       1221             9079          10300              37365
Count                 7           7                7             7                  7

Participant   Keys    Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
GA1           3851    397              1667          2064               5915
GA2           3464    357              776           1133               4597
GA3           4123    377              1277          1654               5777
GA4           3894    0                1262          1262               5156
GA5           3522    78               993           1071               4593
GA6           3870    4                1581          1585               5455
GA7           4341    8                1523          1531               5872
5.7.1.2. Group B
There was a technical problem with the videos of participants GB3 and GB5. Participant GB5’s video was divided into two parts and we decided not to include his/her data in the keylog analysis, as it might not be complete and therefore might have altered our results.
The arithmetic mean of the total number of keys introduced is 3166.2, the median is 2734 and the standard deviation is 1231.9; the distribution is positively skewed (1.7), as the frequency is higher in the lower scores. The arithmetic mean of the normal keys (letters and numbers) is 1534.5, the median is 776 and the standard deviation is 1533.7; it is positively skewed (1.9). The arithmetic mean of the multiple virtual keys (a combination of two or three keys pressed together) is 163.7 and the median is 199.5; it is negatively skewed (-1.6). The arithmetic mean of the single virtual keys (individual keys that are not numbers or letters) is 1467, the median is 1464 and the standard deviation is 320.1; it is very slightly positively skewed (0.03). The arithmetic mean of the total number of virtual keys (the sum of the multiple and single keys) is 1631.7, the median is 1675.5 and the standard deviation is 409.3; it is negatively skewed (-0.3).
Participant   Group   Keys    Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
GB1           B       787     242              1653          1896               2683
GB2           B       765     180              1275          1455               2220
GB3           B
GB4           B       3835    14               1128          1142               4977
GB5           B
GB6           B       751     219              1815          2034               2785

                      Keys        Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
Mean                  1534.5      163.75           1467.75       1631.75            3166.25
Standard Error        766.8691    51.53053         160.0736      204.6804           615.9739
Median                776         199.5            1464          1675.5             2734
Mode                  #N/A        #N/A             #N/A          #N/A               #N/A
Standard Deviation    1533.738    103.0611         320.1472      409.3608           1231.948
Sample Variance       2352353     10621.58         102494.3      167576.3           1517696
Kurtosis              3.998129    2.74336          -3.8468       -2.86877           3.327324
Skewness              1.99944     -1.64803         0.037609      -0.38341           1.754837
Range                 3084        228              687           892                2757
Minimum               751         14               1128          1142               2220
Maximum               3835        242              1815          2034               4977
Sum                   6138        655              5871          6527               12665
Count                 4           4                4             4                  4
5.7.1.3. Group C
The arithmetic mean of the total number of keys introduced is 2838.7, the median is 2483 and the standard deviation is 1615.9; the distribution is positively skewed (0.7), as the frequency is higher in the lower scores. The arithmetic mean of the normal keys (letters and numbers) is 1537.1, the median is 1311.5 and the standard deviation is 903.7; it is positively skewed (1). The arithmetic mean of the multiple virtual keys (a combination of two or three keys pressed together) is 289.2, the median is 109.5 and the mode is 58; it is positively skewed (1.2). The arithmetic mean of the single virtual keys (individual keys that are not numbers or letters) is 1012, the median is 629.5 and the standard deviation is 1140.4; it is positively skewed (2.5). The arithmetic mean of the total number of virtual keys (the sum of the multiple and single keys) is 1301.6, the median is 1068.5 and the standard deviation is 1152.9; it is positively skewed (2.3).
                      Keys        Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
Mean                  1537.15     289.25           1012.05       1301.6             2838.75
Standard Error        202.079     75.17005         255.0138      257.7995           361.3458
Median                1311.5      109.5            629.5         1068.5             2483
Mode                  #N/A        58               #N/A          #N/A               #N/A
Standard Deviation    903.7246    336.1707         1140.456      1152.914           1615.988
Sample Variance       816718.1    113010.7         1300641       1329212            2611416
Kurtosis              0.369193    0.303664         6.397656      5.083119           -0.53003
Skewness              1.004187    1.255366         2.582253      2.309825           0.760146
Range                 3174        1086             4529          4468               5304
Minimum               506         9                175           280                841
Maximum               3680        1095             4704          4748               6145
Sum                   30743       5785             20241         26032              56775
Count                 20          20               20            20                 20

Participant   Keys    Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
GC1           1646    553              3606          4159               5805
GC2           561     105              175           280                841
GC3           1049    114              433           547                1596
GC4           2051    139              1191          1330               3381
GC5           750     784              275           1059               1809
GC6           825     270              808           1078               1903
GC7           1350    641              438           1079               2429
GC8           2072    1095             602           1697               3769
GC9           506     58               461           519                1025
GC10          1997    58               657           715                2712
GC11          1397    44               4704          4748               6145
GC12          653     12               473           485                1138
GC13          2748    559              842           1401               4149
GC14          819     115              701           816                1635
GC15          1242    901              394           1295               2537
GC16          3251    71               1501          1578               4829
GC17          1273    69               401           470                1743
GC18          3680    9                1297          1306               4986
GC19          789     98               588           686                1475
GC20          2084    90               694           784                2868
5.7.1.4. Overall
The arithmetic mean of the total number of keys introduced is 3445.3, the median is 2868 and the standard deviation is 1721.1; the distribution is positively skewed (0.1), as the frequency is higher in the lower scores. The arithmetic mean of the normal keys (letters and numbers) is 2062.7, the median is 1646 and the standard deviation is 1323.6; it is positively skewed (0.4). The arithmetic mean of the multiple virtual keys (a combination of two or three keys pressed together) is 247.1, the median is 114 and the mode is 58; it is positively skewed (1.5). The arithmetic mean of the single virtual keys (individual keys that are not numbers or letters) is 1135.1, the median is 842 and the standard deviation is 941.3; it is positively skewed (2.4). The arithmetic mean of the total number of virtual keys (the sum of the multiple and single keys) is 1382.5, the median is 1262 and the standard deviation is 947.1; it is positively skewed (2.3).
                      Keys           Multiple Vkeys   SingleVkeys   TotalVirtualkeys   Total Keys
Mean                  2062.774194    247.1290323      1135.194      1382.548           3445.323
Standard Error        237.7344579    51.84480028      169.0667      170.1112           309.1328
Median                1646           114              842           1262               2868
Mode                  #N/A           58               #N/A          #N/A               #N/A
Standard Deviation    1323.649443    288.6596314      941.3235      947.1389           1721.179
Sample Variance       1752047.847    83324.3828       886090        897072.2           2962455
Kurtosis              -1.46143906    1.689364252      7.210449      6.4295             -1.46298
Skewness              0.438455587    1.532215555      2.436733      2.310101           0.134987
Range                 3835           1095             4529          4468               5304
Minimum               506            0                175           280                841
Maximum               4341           1095             4704          4748               6145
Sum                   63946          7661             35191         42859              106805
Count                 31             31               31            31                 31
If we compare the results of the three groups, the most notable difference is in the number of normal keys (which also influences the total number of keys): Group A has a much higher mean (3866) than Group B (1534) and Group C (1537). Group A did not have the translation memory, and had to type all the text. This figure also indicates that Group B and Group C used the TM.
           Keys        Multiple Vkeys   Single Vkeys   TotalVirtualkeys   Total Keys
Group A    3866.429    174.4286         1297           1471.429           5337.857
Group B    1534.5      163.75           1467.75        1631.75            3166.25
Group C    1537.15     289.25           1012.05        1301.6             2838.75
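Because the three groups differ in size (7, 4 and 20 participants in the keylog analysis), the overall mean reported earlier is a size-weighted combination of the group means rather than their simple average. The sketch below illustrates this for the normal keys, using only the group means and counts reported in this section; the variable names are illustrative.

```python
# (mean of normal keys, number of participants) per group, as reported above.
groups = {
    "A": (3866.429, 7),
    "B": (1534.5, 4),
    "C": (1537.15, 20),
}

total_participants = sum(n for _, n in groups.values())

# Weighted mean: each group mean counts in proportion to its group size.
pooled_mean = sum(mean * n for mean, n in groups.values()) / total_participants

# Unweighted average of the three group means, shown only for contrast.
simple_average = sum(mean for mean, _ in groups.values()) / len(groups)

print(total_participants, round(pooled_mean, 2), round(simple_average, 2))
```

The weighted value matches the overall mean of the Keys column, whereas the unweighted average would overstate it because the small Group A has by far the highest mean.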
[Chart: Keylog of three groups; Group A, Group B and Group C; vertical axis from 0 to 6000]
5.7.2. Analysis of the videos
From the observation of the video files (in .fbr) we obtained the following information: the time spent in every activity (reading the task assignment, opening Swordfish, translation of each of the segments, external resources consulted and difficulties using the CAT tool). All this data was annotated in a template (in Microsoft Excel) that we developed ad hoc, and that could have been automated and analysed.
We present in this section the annotation of the time that each of the participants spent on the translation task. Firstly, we annotated the following time information: the moment the participant opened Swordfish and the moment the participant closed Swordfish, which in theory would give us the total time that participants spent on the translation task. However, by analysing the videos, we realised that there were many different behaviour patterns at the beginning and the end of the task. Some of the participants spent several minutes setting up the task and translating the first segment, while others started translating straight away. In the last part of the translation task similar differences were observed: some participants decided to go back to previous segments and/or spent time checking some segments49, while others, after finishing the last segment, saved the file and closed the application. Therefore, we decided to study also the time spent between the opening of the second segment and the closing of the 35th segment, which ideally contains only the time spent on the translation activity itself, without the time spent on the first and last stages of the translation task.
In this section we present the different times studied; the complete data set of all the participants can be found in Appendix I.
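The durations reported in the following tables can be derived from four annotated moments per recording. The sketch below is a minimal illustration, not the spreadsheet template used in the study: the function names are hypothetical, and the definitions of the time spent on the first and last segments are an assumption that is consistent with the reported group means (to within a second of rounding). The example timestamps are the Group A means reported below.

```python
from datetime import timedelta


def hms(text: str) -> timedelta:
    """Parse an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def derived_times(open_sf: str, segment_2: str, segment_35: str, close_sf: str) -> dict:
    """Derive the four durations from the four annotated moments."""
    t_open, t_2, t_35, t_close = (hms(t) for t in (open_sf, segment_2, segment_35, close_sf))
    return {
        "total_time": t_close - t_open,     # Close Swordfish minus Open Swordfish
        "time_2_35": t_35 - t_2,            # 35 Segment minus 2 Segment
        "first_segment": t_2 - t_open,      # assumed: 2 Segment minus Open Swordfish
        "last_segment": t_close - t_35,     # assumed: Close Swordfish minus 35 Segment
    }


# Group A mean timestamps (Open Swordfish, 2 Segment, 35 Segment, Close Swordfish).
example = derived_times("00:01:10", "00:06:28", "01:06:32", "01:11:53")
for name, value in example.items():
    print(name, value)
```

Applied to the Group A means, the total time and the 2-35 time come out as 01:10:43 and 01:00:04, the values reported for that group.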
5.7.3. Group A
The arithmetic mean of the total time spent on the translation task is 01:10:43; if we only take into account the time spent between the second and 35th segment, the mean time is 01:00:04. In terms of mean elapsed time, participants opened Swordfish at 00:01:10, moved to the second segment at 00:06:28 and closed Swordfish at 01:11:53. The standard deviation is high in the total time (00:28:25) and in the 2-35 time (00:28:47), as two of the participants spent almost two hours translating, while the rest of the translators spent around one hour.
49 Some participants carried out some kind of minimal review, even though in the instructions given we had asked them not to do it.
                      Open Swordfish   2 Segment   35 Segment   Close Swordfish
Mean                  00:01:10         00:06:28    01:06:32     01:11:53
Standard Error        00:00:08         00:01:07    00:11:22     00:10:50
Median                00:01:13         00:06:55    00:55:33     01:00:05
Mode                  #N/A             #N/A        #N/A         #N/A
Standard Deviation    00:00:21         00:02:58    00:30:05     00:28:41
Sample Variance       5.81E-08         4.24E-06    0.000436     0.000397
Kurtosis              -0.51075         -0.07235    -0.78819     -0.7618
Skewness              -0.40854         0.297183    1.086729     1.063506
Range                 00:01:00         00:08:31    01:12:55     01:13:05
Minimum               00:00:36         00:02:50    00:40:48     00:43:24
Maximum               00:01:36         00:11:21    01:53:43     01:56:29
Sum                   00:08:07         00:45:17    07:45:45     08:23:11
Count                 7                7           7            7

                      Total Time   2-35 Time   Time Spent on 1st Segment   Time Spent on Last Segment
Mean                  01:10:43     01:00:04    00:05:19                    00:05:21
Standard Error        00:10:44     00:10:53    00:01:02                    00:02:21
Median                00:58:53     00:48:11    00:06:01                    00:03:01
Mode                  #N/A         #N/A        #N/A                        #N/A
Standard Deviation    00:28:25     00:28:47    00:02:44                    00:06:14
Sample Variance       0.000389     0.000399    3.58157E                    1.88E-05
Kurtosis              -0.75614     -0.43315    -0.24127                    6.36160
Skewness              1.063835     1.160729    0.284104                    2.48077
Range                 01:12:30     01:12:40    00:07:56                    00:18:10
Minimum               00:42:30     00:35:22    00:01:49                    00:01:07
Maximum               01:55:00     01:48:02    00:09:45                    00:19:17
Sum                   08:15:04     07:00:28    00:37:10                    00:37:26
Count                 7            7           7                           7
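The Sample Variance row looks out of scale next to the hh:mm:ss values. One reading that fits the figures, assuming the spreadsheet stored times as fractions of a day (as Microsoft Excel does), is that the variance is expressed in squared days. The sketch below checks this against the Total Time column of Group A; it is an illustration of that assumption, not a description of the original workbook.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Standard Deviation of Total Time for Group A: 00:28:25.
sd_seconds = 28 * 60 + 25

# Express the standard deviation as a fraction of a day, then square it.
sd_days = sd_seconds / SECONDS_PER_DAY
variance_days_squared = sd_days ** 2

print(f"{variance_days_squared:.6f}")
```

Under that assumption, the result agrees with the Sample Variance reported for Total Time (0.000389).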
5.7.4. Group B
The arithmetic mean of the total time spent on the translation task is 00:50:02; if we only take into account the time spent between the second and 35th segment, the mean time is 00:40:22. In terms of mean elapsed time, participants opened Swordfish at 00:01:49, moved to the second segment at 00:06:42 and closed Swordfish at 00:51:52. One of the participants exceeded the hour in total time, but the other three participants spent between 34 and 47 minutes on the task.
                      Open Swordfish   2 Segment   35 Segment   Close Swordfish
Mean                  00:01:49         00:06:42    00:47:03     00:51:52
Standard Error        00:00:16         00:01:01    00:08:41     00:11:04
Median                00:01:39         00:06:15    00:41:15     00:44:10
Mode                  #N/A             #N/A        #N/A         #N/A
Standard Deviation    00:00:32         00:02:02    00:17:21     00:22:08
Sample Variance       1.4E-07          1.99E-06    0.000145     0.000236
Kurtosis              2.389026         -2.46399    0.974693     2.12251
Skewness              1.553363         0.632684    1.304151     1.53637
Range                 00:01:12         00:04:12    00:36:49     00:48:05
Minimum               00:01:24         00:05:02    00:34:27     00:35:31
Maximum               00:02:36         00:09:14    01:11:16     01:23:36
Sum                   00:07:18         00:26:46    03:08:12     03:27:26
Count                 4                4           4            4
                      Total Time   2-35 Time   Time Spent on 1st Segment   Time Spent on Last Segment
Mean                  00:50:02     00:40:22    00:04:52                    00:04:48
Standard Error        00:11:08     00:08:59    00:01:00                    00:02:34
Median                00:42:41     00:34:04    00:04:15                    00:02:56
Mode                  #N/A         #N/A        #N/A                        #N/A
Standard Deviation    00:22:15     00:17:58    00:02:00                    00:05:07
Sample Variance       0.000239     0.000156    1.93366E-06                 1.27E-05
Kurtosis              2.0028       2.275489    1.650386                    3.207985
Skewness              1.49548      1.567002    1.405034                    1.755731
Range                 00:48:55     00:39:10    00:04:26                    00:11:19
Minimum               00:32:55     00:27:04    00:03:16                    00:01:01
Maximum               01:21:50     01:06:14    00:07:42                    00:12:20
Sum                   03:20:08     02:41:26    00:19:28                    00:19:14
Count                 4            4           4                           4
5.7.5. Group C
The arithmetic mean of the total time spent on the translation task is 01:01:16; if we only take into account the time spent between the second and 35th segment, the mean time is 00:49:27. In terms of mean elapsed time, participants opened Swordfish at 00:02:13, moved to the second segment at 00:07:34 and closed Swordfish at 01:03:29. The standard deviation in total time is high (00:29:39) and the distribution is positively skewed (1.7), as the frequency is higher in the lower values.
                      Open Swordfish   2 Segment   35 Segment   Close Swordfish
Mean                  00:02:13         00:07:34    00:57:01     01:03:29
Standard Error        00:00:17         00:01:05    00:06:20     00:06:39
Median                00:02:06         00:07:50    00:50:36     00:56:03
Mode                  #N/A             #N/A        #N/A         #N/A
Standard Deviation    00:01:15         00:04:49    00:28:20     00:29:45
Sample Variance       7.56E-07         1.12E-05    0.000387     0.00042
Kurtosis              0.648161         6.767225    6.543688     5.33423
Skewness              0.457313         1.919733    2.124634     1.84409
Range                 00:04:43         00:23:41    02:08:05     02:14:22
Minimum               00:00:07         00:00:14    00:25:25     00:26:49
Maximum               00:04:50         00:23:55    02:33:30     02:41:11
Sum                   00:44:22         02:31:19    19:00:28     21:09:38
Count                 20               20          20           20
                      Total Time   2-35 Time   Time Spent on 1st Segment   Time Spent on Last Segment
Mean                  01:01:16     00:49:27    00:05:21                    00:06:27
Standard Error        00:06:38     00:05:28    00:00:57                    00:01:47
Median                00:53:30     00:45:45    00:04:28                    00:05:10
Mode                  #N/A         #N/A        #N/A                        #N/A
Standard Deviation    00:29:39     00:24:29    00:04:14                    00:07:59
Sample Variance       0.000424     0.000289    8.6663E                     3.07E-05
Kurtosis              4.958748     5.191796    7.893521409                 14.99453
Skewness              1.786027     1.865258    2.3678104                   3.652556
Range                 02:10:50     01:46:55    00:20:16                    00:37:49
Minimum               00:26:42     00:22:40    00:00:00                    00:00:32
Maximum               02:37:32     02:09:35    00:20:16                    00:38:21
Sum                   20:25:16     16:29:09    01:46:57                    02:09:10
Count                 20           20          20                          20
5.7.6. Overall
The arithmetic mean of the total time spent on the translation task is 01:01:57; if we only take into account the time spent between the second and 35th segment, the mean time is 00:50:41. In terms of mean elapsed time, participants opened Swordfish at 00:01:56, moved to the second segment at 00:07:12 and closed Swordfish at 01:03:53.
                      Open Swordfish   2 Segment   35 Segment   Close Swordfish
Mean                  00:01:56         00:07:12    00:57:53     01:03:53
Standard Error        00:00:12         00:00:45    00:04:56     00:05:06
Median                00:01:46         00:07:26    00:50:49     00:56:52
Mode                  00:01:46         #N/A        #N/A         #N/A
Standard Deviation    00:01:07         00:04:08    00:27:27     00:28:26
Sample Variance       5.99E-07         8.24E-06    0.000363     0.00039
Kurtosis              1.460665         8.269304    4.054246     3.423136
Skewness              0.95069          2.036263    1.784293     1.558018
Range                 00:04:43         00:23:41    02:08:05     02:14:22
Minimum               00:00:07         00:00:14    00:25:25     00:26:49
Maximum               00:04:50         00:23:55    02:33:30     02:41:11
Sum                   00:59:47         03:43:22    05:54:25     09:00:15
Count                 31               31          31           31
                      Total Time   2-35 Time   Time Spent on 1st Segment   Time Spent on Last Segment
Mean                  01:01:57     00:50:41    00:05:17                    00:06:00
Standard Error        00:05:06     00:04:27    00:00:39                    00:01:17
Median                00:53:43     00:48:01    00:04:29                    00:03:55
Mode                  #N/A         #N/A        #N/A                        00:01:01
Standard Deviation    00:28:22     00:24:46    00:03:39                    00:07:09
Sample Variance       0.000388     0.000296    6.41088E                    2.47E-05
Kurtosis              3.116947     2.766247    8.8097920                   14.32945
Skewness              1.502837     1.571352    2.3434261                   3.466937
Range                 02:10:50     01:46:55    00:20:16                    00:37:49
Minimum               00:26:42     00:22:40    00:00:00                    00:00:32
Maximum               02:37:32     02:09:35    00:20:16                    00:38:21
Sum                   08:00:28     02:11:03    02:43:35                    03:05:50
Count                 31           31          31                          31
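Some of the Sum values in the overall tables are smaller than the corresponding sums of a single group (for example, Total Time shows 08:00:28 overall against 20:25:16 for Group C alone). One reading that is consistent with the figures, though not stated in the source, is that the sums are displayed in a time-of-day format that wraps every 24 hours. The sketch below adds the three group sums for Total Time and shows both the elapsed and the wrapped value; the helper names are hypothetical.

```python
from datetime import timedelta


def hms(text: str) -> timedelta:
    """Parse an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fmt(duration: timedelta) -> str:
    """Format a timedelta as hh:mm:ss, allowing more than 24 hours."""
    seconds = int(duration.total_seconds())
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


# Sum of Total Time for Groups A, B and C, as reported in the group tables.
group_sums = ["08:15:04", "03:20:08", "20:25:16"]

elapsed = sum((hms(value) for value in group_sums), timedelta())
wrapped = timedelta(seconds=elapsed.total_seconds() % (24 * 60 * 60))

print(fmt(elapsed), fmt(wrapped))
```

On this reading the true overall sum of Total Time is 32:00:28, and the 08:00:28 in the table is the same value with one full day dropped by the display format; the means are unaffected.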
If we compare the arithmetic means of the three groups we obtain the following table:
Measure | Group A | Group B | Group C
Open Swordfish | 00:01:10 | 00:01:49 | 00:02:13
2 Segment | 00:06:28 | 00:06:42 | 00:07:34
35 Segment | 01:06:32 | 00:47:03 | 00:57:01
Close Swordfish | 01:11:53 | 00:51:52 | 01:03:29
Total Time | 01:10:43 | 00:50:02 | 01:01:16
2-35 Time | 01:00:04 | 00:40:22 | 00:49:27
Time Spent on 1st Segment | 00:05:19 | 00:04:52 | 00:05:21
Time Spent on Last Segment | 00:05:21 | 00:04:48 | 00:06:27
In terms of opening Swordfish, Group A (00:01:10) was significantly faster than Groups B (00:01:49) and C (00:02:13). In the time spent on the first segment there is no significant difference between the three groups (A with 00:05:19, B with 00:04:52 and C with 00:05:21). In terms of total time, Group B (00:50:02) is faster than Group C (01:01:16) and Group A (01:10:43), Group A being clearly the slowest group; similar results were obtained in the 2-35 time (Group A with 01:00:04, Group B with 00:40:22 and Group C with 00:49:27). The time differences obtained do not depend on the time spent on the first and last segments, as these were similar in all groups. The figure below shows that Group B was faster in most of the sections studied and that Group A was considerably slower than the other two groups.
Figure 72. Time all groups
It took me a long time to understand where he came from. The little prince, who asked me so many questions, never seemed to hear mine. It was words spoken by chance that, little by little, revealed everything to me.
(de Saint-Exupéry 1946 p.15)
Chapter 6 – Analysis & Interpretation
In this chapter we present the analysis and interpretation of the data obtained in the experiments, which were presented in the previous chapter. We start by presenting a summary of the demographic data obtained from the first questionnaire. Then we present an analysis of the data along two main axes: quality (where we discuss the data obtained from the LISA QA Model review and correlate it with other variables) and time spent on the translation (where we also correlate the time results with other measured variables). Finally, we analyse the attitude of the participants of group C towards the metadata they received.
The first stage of the analysis was coding the data in spreadsheets (using Microsoft Excel). A descriptive analysis was carried out on all the quantitative data and was presented in the previous chapter. We focused on obtaining average information (the arithmetic mean, the median and the mode) and variability information (range, standard deviation and the skewness of the data distribution). A table with a detailed descriptive analysis of each of the variables can be found in the previous chapter.
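As an illustration of the descriptive measures listed above, the following Python sketch computes them for a small sample. It is not the procedure used in the study, which relied on Microsoft Excel. The sample values are hypothetical, the function names `describe` and `adjusted_skewness` are ours, and the skewness uses the adjusted Fisher-Pearson formula, which we assume corresponds to the spreadsheet output.

```python
import statistics


def adjusted_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed spreadsheet formula)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def describe(values):
    """Return the average and variability measures used in the chapter."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
        "skewness": adjusted_skewness(values),
    }


# Hypothetical task times in minutes; these are NOT data from the study.
sample_minutes = [30, 45, 45, 50, 80]
summary = describe(sample_minutes)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the single long time (80 minutes) pulls the mean above the median and produces a positive skewness, the same pattern that the time tables of the previous chapter show.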
6.1. Summary and analysis of the demographic data
The questionnaire that collected demographic data from the participants was the same for all the groups, and it was completed before they carried out the translation task; therefore all the participants were under the same circumstances and their answers are comparable between groups. Three types of information were obtained from the different questions: personal data (age and gender), translation experience and experience with CAT tools. Participants, who were distributed randomly, shared similar characteristics, and we did not find any significant difference in the averages of any of the measured parameters.
All this information allowed us to define a prototypical profile of our participants: a 32-year-old female translation professional, with an average of 6 years of experience, whose main language combination is EN-ES, who also devotes her time to other translation-related tasks such as reviewing and who spends an average of 5 hours translating per day. She uses CAT tools and TMs in all her translation activities and has heard of Swordfish but has never used it before.
6.1.1. Personal data
The average age of all the participants was 32 years and there were no significant differences between the groups (30.5 in Group A, 32.6 in Group B and 32.4 in Group C). In terms of gender, about a fifth of the participants were male: 7 of the participants were males and 25 were females, and one participant did not provide that information. Group A had one male and six females, Group B had two males and four females, and Group C had four males and fifteen females (one participant did not provide the gender information).
6.1.2. Translation Experience
Five pieces of information were obtained from this set of questions: the current position of the translator, the number of years of professional experience, the number of hours they spent translating every day, other translation-related activities that they might also undertake, and their main and secondary language combinations.
We only included in our final data set those participants who had at least two years of professional translation experience. In terms of position we had 24 freelance translators and 9 in-house translators. The average number of years of professional experience among all the participants is 6.8, with a small variation between the groups: Group A (5), Group B (8.5) and Group C (7.05). Our participants spent an average of 5.8 hours translating per day; on average, participants from Group C spent more time translating per day (6.8 hours) than participants from Groups A (4.6) and B (4.1). Translators did not spend their working time purely translating: 25 of the participants declared that they undertook other translation-related activities such as proofreading, reviewing (with fifteen participants), postediting (with six participants) and coordination and management (with three participants), among others.
The main language combination for 31 of the 33 participants was EN-ES, followed by FR-ES and DE-ES with four participants each. When asked about secondary language combination, FR-ES was clearly the most common one with twelve participants, followed by ES-EN with four participants and DE-ES with three amongst others.
6.1.3. Experience with CAT tools
With this set of questions we obtained four pieces of information: experience with CAT tools, experience in the use of TMs, experience with Swordfish (the tool used in the experiments), and the familiarity of the participants with the XLIFF standard.
Almost the same results were obtained for the questions about experience with CAT tools and with TMs: the majority of the participants use them in all their translation assignments. On a six-point scale (1 being “I use them in all my translation activities” and 5 “I have never tried them” [50]), the average of all the participants was 1.5, with little difference between the three groups: Group A with 1.5, Group B with 1.5 and Group C with 1.7.
As for Swordfish, most of the participants had no experience with the tool: two thirds of the participants (20) had never tried it before and four participants had never heard of it. On a six-point scale (1 being "I use it on a daily basis" and 5 "I have never heard of it" [51]), the average score of all the participants was 3.7, with Group A being slightly higher (4) than Group B (3.8) and Group C (3.6). Answers on familiarity with XLIFF were spread among the participants: on a six-point scale (1 being "total unfamiliar" and 5 "I am very familiar with it"), the average was 3.2. Group B (4.1) had a higher average score than Groups A (3) and C (3).
6.1.4. Correlation between the different variables measured
The scores presented in Table 27 represent the correlation coefficient between the variables presented above. The correlation coefficient can have a score between -1 and 1. There is a very strong relationship between two variables if the score is between 0.8 and 1 (or -0.8 and -1). There is a strong relationship between two variables if the score is between 0.6 and 0.8 (or -0.6 and -0.8). There is a moderate relationship between two variables if the score is between 0.4 and 0.6 (or -0.4 and -0.6). There is a weak relationship between two variables if the score is between 0.2 and 0.4 (or -0.2 and -0.4). There is a very weak or no relationship between two variables if the score is between 0 and 0.2 (or 0 and -0.2). If the score has a positive value the relationship is direct (for example, the older the person, the more experience as a translator he/she has); if the score has a negative value the relationship is indirect (Salkind 2009 pp.113–129).
[50] Number 6 was reserved for “other” with a textbox where participants could introduce their own answer.
[51] Ibid.
We should also state that a correlation does not necessarily imply any cause-effect relation; it simply indicates that there is a relationship between the variables. The cause-effect relation can only be established through the changes in the results of the groups due to the manipulation of the independent variable.
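The strength bands described above can be expressed as a small function. The following Python sketch is only an illustration: the function names are hypothetical, and since the bands share their boundary values (0.2, 0.4, 0.6 and 0.8), we assume here that a boundary value belongs to the stronger band. The sketch also includes a plain Pearson coefficient, which is undefined when one of the two variables has no variance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return None  # a spreadsheet would report a division-by-zero error
    return covariance / math.sqrt(spread_x * spread_y)


def strength(score):
    """Label a coefficient using the bands given in the text."""
    size = abs(score)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "very weak or none"


def direction(score):
    """Positive scores are direct relationships, negative ones indirect."""
    if score > 0:
        return "direct"
    if score < 0:
        return "indirect"
    return "none"


for score in (0.84, -0.57, 0.39, -0.09):
    print(score, strength(score), direction(score))
```

For instance, a coefficient of -0.57 is labelled as a moderate indirect relationship and a coefficient of 0.84 as a very strong direct one.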
In our initial analysis we highlighted in bold those correlation coefficient scores which were over 0.4 (or below -0.4), which means that they indicate a moderate, strong or very strong relationship; in other words, we consider that there is a significant relationship between the two variables. We have significant relationships between the following variables: Gender-Age, Years of experience-Age, Gender-CAT tool/TM [52] usage, Gender-XLIFF, and XLIFF-CAT tool/TM usage. It is also worth noting that we did not find significant relationships between the variables age and years of experience and any of the variables that measured the participants' experience with translation tools (CAT tools/TMs, Swordfish or XLIFF), and that there is no relationship between their current working position (freelance or in-house) and any of the other variables.
Variable | Age | Gender | Position | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Age | 1.00
Gender | -0.57 | 1.00
Position | -0.25 | 0.02 | 1.00
Experience Years | 0.84 | -0.14 | -0.18 | 1.00
Hours per day | 0.29 | -0.24 | -0.06 | 0.22 | 1.00
CAT tools usage | 0.09 | -0.56 | 0.05 | 0.05 | -0.35 | 1.00
TM Usage | 0.09 | -0.56 | 0.03 | 0.03 | -0.33 | 1.00 | 1.00
Swordfish | 0.14 | 0.11 | -0.37 | 0.11 | -0.21 | -0.28 | -0.24 | 1.00
XLIFF | -0.09 | 0.42 | -0.07 | 0.10 | 0.24 | -0.43 | -0.41 | 0.39 | 1.00
Table 27. Correlation between the demographic variables
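The selection of the scores over 0.4 (or below -0.4) can be reproduced from Table 27 with a few lines of code. The following Python sketch is only an illustration of that filtering step: the variable names are ours, and the column order is the one we read from the table (Age, Gender, Position, Experience Years, Hours per day, CAT tools usage, TM Usage, Swordfish, XLIFF).

```python
labels = [
    "Age", "Gender", "Position", "Experience Years", "Hours per day",
    "CAT tools usage", "TM Usage", "Swordfish", "XLIFF",
]

# Lower triangle of Table 27, one row per variable, diagonal included.
lower_triangle = [
    [1.00],
    [-0.57, 1.00],
    [-0.25, 0.02, 1.00],
    [0.84, -0.14, -0.18, 1.00],
    [0.29, -0.24, -0.06, 0.22, 1.00],
    [0.09, -0.56, 0.05, 0.05, -0.35, 1.00],
    [0.09, -0.56, 0.03, 0.03, -0.33, 1.00, 1.00],
    [0.14, 0.11, -0.37, 0.11, -0.21, -0.28, -0.24, 1.00],
    [-0.09, 0.42, -0.07, 0.10, 0.24, -0.43, -0.41, 0.39, 1.00],
]

THRESHOLD = 0.4  # scores beyond this value are treated as significant

# Keep every pair below the diagonal whose absolute score exceeds the threshold.
highlighted = [
    (labels[row], labels[column], lower_triangle[row][column])
    for row in range(len(lower_triangle))
    for column in range(row)
    if abs(lower_triangle[row][column]) > THRESHOLD
]

for first, second, score in highlighted:
    print(f"{first} / {second}: {score:+.2f}")
```

The pairs returned are the ones discussed in this section, with CAT tools usage and TM usage appearing as separate rows, plus the perfect relationship between those two variables.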
[52] As these two variables (CAT tool and TM usage) received almost identical answers, they were presented together in our analysis. There was a perfect relationship between them, with a score of 1.
6.1.4.1. Age-Gender
There was a moderate indirect relationship between age and gender (-0.57), as our male participants were generally older than our female participants. In the following scatterplot, on the vertical axis, number one represents males and number two females.
[Scatterplot "Age-Gender": horizontal axis Age, vertical axis Gender]
6.1.4.2. Gender-CAT tool/TM usage
There was a moderate indirect relationship between gender and CAT tool/TM usage (-0.56), which means that men tended to make less use of CAT tools/TMs than women. In the following scatterplot, on the horizontal axis, number one represents males and number two females.
[Scatterplot "Gender-CAT/TM usage": horizontal axis Gender, vertical axis CAT/TM usage]
6.1.4.3. Gender-XLIFF
There was a moderate direct relationship between gender and the familiarity with XLIFF (0.42 in Table 27), which means that men tended to be more familiar with XLIFF than women. In the following scatterplot, on the horizontal axis, number one represents males and number two females. The familiarity with XLIFF was measured on a six-point scale, number 1 being the lowest point (total unfamiliar) and number 6 being the highest point (very familiar).
[Scatterplot "Gender-XLIFF": horizontal axis Gender, vertical axis XLIFF]
6.1.4.4. Years of experience-Age
There was a very strong direct relationship (0.84) between the age of the participants and the years of professional experience they had, which means that the older the participant, the more professional experience he/she tended to have.
[Scatterplot "Years of Experience-Age": horizontal axis Age, vertical axis Years of Experience]
6.1.4.5. XLIFF-CAT tools/TM Usage
There was a moderate relationship between CAT tool/TM usage and the familiarity with XLIFF (-0.43/-0.41), which means that the lower the use of CAT tools/TMs was, the lower the familiarity with XLIFF tended to be. The familiarity with XLIFF was measured on a six-point scale, number 1 being the lowest point (total unfamiliar) and number 6 being the highest point (very familiar). On the CAT tools/TM axis, on the contrary, lower numbers mean higher use of CAT tools/TMs, which is why the coefficient is negative.
[Scatterplot "CAT tools/TM-XLIFF": horizontal axis CAT/TM usage, vertical axis XLIFF]
6.2. Quality
As seen in the previous chapter, all the translation files from our participants were reviewed using the LISA QA Model, from which we obtained a numeric score.
On average, participants from group A (the group without a translation memory or metadata) clearly obtained worse results (-79) than groups B (-39) and C (-42). These results confirm that the use of a translation memory [53] (with or without metadata) improves the quality of a technical text.
The average quality of group B was slightly better than that of group C, but the difference was not significant enough to conclude that the metadata had a negative impact on the quality of the translation. We can therefore conclude that we did not find any significant difference in quality due to the presence or absence of the metadata between those two groups.
6.2.1. Correlation with other variables
In order to see if there were any significant relationships between the quality scores in each of the groups and all the other variables measured, we calculated the correlation coefficient scores. We present the results of that calculation for each of the groups:
6.2.1.1. Group A
Quality - Demographic data
We first compared quality with the demographic data, and we found a moderate indirect relationship (-0.56) between quality and gender (the only male in this group obtained one of the highest scores in the LISA QA analysis).
[53] Bearing in mind that the translation memory came from an official source and had a range of matches between 80 and 90%, which was our case.
Variable | Age | Gender | Position | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Quality | 0.34 | -0.56 | #DIV/0! [54] | 0.27 | -0.02 | -0.52 | -0.52 | -0.36 | 0.75
Quality-Difficulty and Experience with Microsoft Office products
We compared quality with data obtained from the task-specific questionnaire about the difficulty of the text and the experience with Microsoft Office products. We found a strong relationship (0.65) between quality and the familiarity with the topic of the text: the more familiar a participant said he/she was with the topic of the text, the higher the quality of his/her translation. The experience with some Microsoft Office products was measured on a six-point scale, number 1 being the lowest point (I have never worked with it) and number 6 being the highest point (I am an advanced user). In the case of Access (0.68) and Excel (0.42) there was a direct relationship: the more experience a participant said he/she had with the program, the higher the quality of his/her translation. In the case of Outlook there was a strong indirect relationship (-0.65): the more experience a participant said he/she had with the program, the lower the quality of his/her translation.
Variable | Topic | Excel | Word | PowerPoint | Access | Outlook | Difficulty | Assistance
Quality | 0.65 | 0.42 | #DIV/0! | -0.21 | 0.68 | -0.65 | 0.39 | 0.04
Quality-Keylog
We found a moderate indirect relationship (-0.51) with the number of single keys: Those participants who typed more single keys (numbers and letters) tended to obtain a lower quality score.
[54] The score for position could not be calculated as this value in group A did not have any variance: all participants were freelancers.
Variable | Keys | Multiple Vkeys | SingleVkeys | TotalVirtualkeys | Total Keys
Quality | -0.03 | 0.21 | -0.51 | -0.36 | -0.23
Quality-Time
We found a strong relationship (0.65) with the time that participants spent on the last segment, which measures the time from the moment the participant opens the last segment until the moment he/she closes Swordfish. The relationship is direct: the more time a participant spent on the last part of the translation task, the higher the quality tended to be. This can be explained by the fact that some participants reviewed their work after translating all the segments and therefore improved it.
Variable | Total Time | 2-35 Time | Time 1st S. | Time Last S.
Quality | -0.22 | -0.35 | -0.13 | 0.65
6.2.1.2. Group B
Quality - Demographic data
We found moderate direct relationships with the variables age (0.47) and position (0.43). In the case of age, older participants tended to obtain better quality results, and in the case of position, participants who were working as in-house translators tended to obtain better quality results.
Variable | Age | Gender | Position | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Quality | 0.47 | -0.17 | 0.43 | 0.20 | 0.17 | 0.04 | 0.04 | 0.09 | 0.49
Quality-Difficulty and Experience with Microsoft Office products
We found a very strong direct relationship (0.93) with the familiarity with the topic of the text: the more familiar a participant said he/she was with the topic of the text, the higher the quality of his/her translation. There was a strong direct relationship (0.72) with the experience using Microsoft Excel: the more experience a participant said he/she had with the program, the higher the quality of his/her translation. In the case of Word and PowerPoint there was a moderate indirect relationship (-0.46 and -0.49), which means that the more experience a participant said he/she had with the program, the lower the quality of his/her translation tended to be.
Variable | Topic | Excel | Word | PowerPoint | Access | Outlook | Difficulty | Assistance
Quality | 0.93 | 0.72 | -0.46 | -0.49 | 0.27 | -0.24 | 0.09 | 0.01
Quality-Keylog
We found moderate direct relationships with the number of single keys (0.51) and with the total number of virtual keys (0.46): those participants who typed more single keys (numbers and letters) tended to obtain better quality, and the same applies to the total number of virtual keys.
Variable | Keys | Multiple Vkeys | SingleVkeys | TotalVirtualkeys | Total Keys
Quality | -0.35 | 0.27 | 0.51 | 0.46 | -0.29
Quality-Time
We found moderate indirect relationships with the following time measures: total time (-0.59), 2-35 time (-0.57) and time spent on the last segment (-0.45), which means that participants who spent less time tended to obtain better quality.
Variable | Total Time | 2-35 Time | Time 1st S. | Time Last S.
Quality | -0.59 | -0.57 | -0.30 | -0.45
6.2.1.3. Group C
Quality - Demographic data
We found a moderate direct relationship with the variable hours per day (0.44) and a moderate indirect one with the experience with Swordfish (-0.41). In the case of hours per day, the participants who stated that they worked more hours tended to obtain better quality results, and in the case of Swordfish, participants who had less experience with Swordfish tended to obtain worse quality results.
Variable | Age | Gender | Position | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Quality | 0.08 | -0.10 | -0.03 | -0.07 | 0.44 | 0.05 | 0.07 | -0.41 | 0.13
Quality-Difficulty and Experience with Microsoft Office products
We found a moderate direct relationship (0.42) with the familiarity with the topic of the text: the more familiar a participant said he/she was with the topic of the text, the higher the quality of his/her translation. There was a moderate indirect relationship (-0.40) with the experience using Word, which means that the more experience a participant said he/she had with the program, the lower the quality of his/her translation tended to be. There was a strong indirect relationship (-0.67) with the number of times that participants asked for help: participants who asked for help more often tended to obtain worse quality results.
Variable | Topic | Excel | Word | PowerPoint | Access | Outlook | Difficulty | Assistance
Quality | 0.42 | 0.12 | -0.40 | -0.28 | -0.18 | -0.20 | -0.07 | -0.67
Quality-Keylog
We found a moderate indirect relationship (-0.54) with the number of single keys: Those participants who typed more single keys (numbers and letters) tended to obtain lower quality results.
Variable | Keys | Multiple Vkeys | SingleVkeys | TotalVirtualkeys | Total Keys
Quality | -0.54 | 0.07 | -0.01 | 0.01 | -0.30
Quality-Time
We did not find any relation between time and quality.
Variable | Total Time | 2-35 Time | Time 1st S. | Time Last S.
Quality | -0.21 | -0.22 | -0.26 | 0.07
6.3. Time
Time was the second measurement that allowed us to determine if the manipulation of the variable had any impact on the translators’ behaviour. By observing the videos, we annotated different time stamps: opening of Swordfish, opening of the second segment, opening of the last segment and closing of Swordfish.
Taking into account the total time (with very similar scores to 2-35 time), on average participants from group A (01:10:43) were clearly slower than those from groups B (00:50:02) and C (01:01:16). As with the results obtained in the quality analysis, we can conclude that the use of the translation memory (with or without provenance metadata) meant an improvement in the time that translators spent on the translation task. In parallel with the quality results, we could see that Group C (the one with metadata) obtained worse results than Group B, but the difference was again not significant enough to allow us to draw any further conclusion.
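The size of the differences between the group means can be made explicit with a simple calculation. The following Python sketch is a purely descriptive illustration and not a significance test: it converts the three mean total times quoted above into seconds and expresses the means of Groups B and C as a percentage reduction with respect to Group A. The helper name `to_seconds` and the choice of Group A as baseline are our assumptions.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Mean total time per group, as quoted in the text.
mean_total_time = {"A": "01:10:43", "B": "00:50:02", "C": "01:01:16"}

baseline = to_seconds(mean_total_time["A"])  # Group A is used as the baseline

reduction = {
    group: (baseline - to_seconds(value)) / baseline * 100
    for group, value in mean_total_time.items()
    if group != "A"
}

for group, percent in reduction.items():
    print(f"Group {group}: {percent:.1f}% less time than Group A")
```

With these figures, the mean total time of Group B is roughly 29% lower than that of Group A, and that of Group C roughly 13% lower.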
The complete set of measurements obtained from each of the groups was presented in Chapter 5.
6.3.1. Correlation with other variables
In order to explore if there were any significant relationships between the time scores in each of the groups and all the other variables measured, we calculated the correlation coefficient score. The correlation with the variable quality was presented in the previous section. We present the results of that calculation in each of the groups:
6.3.1.1. Group A
Time - Demographic data
We found strong indirect relationships with the working hours per day: participants who worked more hours tended to be faster (in total and 2-35 time). A strong direct relationship was found with CAT tools/TM usage: participants who used CAT tools/TMs more tended to be faster (in total and 2-35 time). A moderate direct relationship was found with Swordfish: those who had more experience with the tool tended to be faster. A moderate indirect relationship (-0.46) was found between the familiarity with XLIFF and 2-35 time: those who were less familiar with XLIFF tended to be slower. In the time spent on the last segment, by contrast, the relationship with XLIFF was direct (0.65).
Variable | Age | Gender | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Total Time | -0.07 | 0.18 | -0.27 | -0.77 | 0.73 | 0.73 | 0.51 | -0.34
2-35 Time | -0.11 | 0.18 | -0.29 | -0.64 | 0.77 | 0.77 | 0.46 | -0.46
Time 1st S. | -0.08 | -0.14 | 0.19 | -0.26 | 0.23 | 0.23 | 0.32 | -0.12
Time Last S. | 0.21 | 0.06 | 0.01 | -0.32 | -0.37 | -0.37 | 0.04 | 0.65
Time-Difficulty and Experience with Microsoft Office products
We found strong indirect relationships with the familiarity with the topic of the text: the more familiar a participant said he/she was with the topic of the text, the less time he/she spent in total time (-0.84) and 2-35 time (-0.89); in the time spent on the last segment, however, we found a moderate direct relationship (0.44). We also found strong relationships between the linguistic difficulty and total time (-0.81), 2-35 time (-0.81) and time spent on the first segment (-0.67): the more difficult the participants said the task was, the more time they tended to spend. There were indirect relationships between Excel and total and 2-35 time, and between Outlook and time spent on the last segment, which means that the more experience a participant said he/she had with the program, the less time he/she tended to spend. On the other hand, there were direct relationships between Excel and Access and time spent on the last segment, and between Outlook and 2-35 time, which means that the more experience a participant said he/she had with the program, the more time he/she tended to spend on those sections.
Variable | Topic | Excel | PowerPoint | Access | Outlook | Difficulty | Assistance
Total Time | -0.84 | -0.66 | 0.03 | -0.41 | 0.26 | -0.81 | 0.38
2-35 Time | -0.89 | -0.75 | 0.02 | -0.51 | 0.42 | -0.81 | 0.43
Time 1st S. | -0.34 | 0.05 | 0.32 | -0.28 | 0.33 | -0.67 | 0.04
Time Last S. | 0.44 | 0.45 | -0.38 | 0.63 | -0.87 | 0.34 | -0.27
Time-Keylog
We found indirect relationships with the number of multiple virtual keys and the 2-35 time, and time spent on the first segment, and also between the total number of virtual keys and total time, 2-35 time and time spent on the first segment. An indirect relationship in those cases meant that participants who typed more keys tended to spend less time on those sections. Direct relationships were found between the time spent on the last segment and keys and multiple keys.
Variable | Keys | Multiple Vkeys | SingleVkeys | TotalVirtualkeys
Total Time | 0.08 | -0.37 | -0.22 | -0.41
2-35 Time | -0.03 | -0.40 | -0.19 | -0.40
Time 1st S. | 0.25 | -0.63 | -0.09 | -0.44
Time Last S. | 0.41 | 0.44 | -0.06 | 0.19
6.3.1.2. Group B
Time - Demographic data
We found significant relationships with most of the variables measured in this group:
Age. We found moderate indirect relationships between age and the total time, 2-35 time and time spent on the last segment. A direct relationship was found with the time spent on the first segment. The indirect relationships meant that the older the participant, the less time he/she tended to spend on those sections.
Gender. We found direct moderate relationships between the gender and the total time, 2-35 time and time spent on the first segment, which meant that men tended to be faster than women.
Position. We found indirect relationships with total time, 2-35 time and time spent on the last segment, and a direct relationship with the time spent on the first segment. Indirect relationships meant that in-house translators tended to be faster than freelancers; direct relationships meant the opposite.
Experience years. We found indirect relationships with 2-35 time and time spent on the last segment, and a direct relationship with the time spent on the first segment. Indirect relationships meant that the more years of translation experience a participant had, the faster he/she tended to be on those sections.
Hours per day. We found moderate direct relationships with total time, 2-35 time and time spent on the last segment, which means that the more hours they spent translating per day, the more time they tended to spend on those sections.
CAT tool/TM usage. We found direct relationships with total time, 2-35 time and time spent on the last segment. An indirect relationship was found with the time spent on the first segment. Direct relationships meant that those who had more experience using CAT tools/TMs tended to spend less time on those sections.
Swordfish. We found indirect relationships with total time, 2-35 time and time spent on the first segment, which meant that those who had less experience with the tool tended to spend less time in those sections.
XLIFF. We found indirect relationships with total time, 2-35 time and time spent on the last segment. A direct relationship was found with the time spent on the first segment. Indirect relationships meant that those who were more familiar with XLIFF tended to be faster on those sections.
Variable | Age | Gender | Position | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Total Time | -0.49 | 0.40 | -0.48 | -0.38 | 0.40 | 0.95 | 0.95 | -0.40 | -0.73
2-35 Time | -0.51 | 0.41 | -0.48 | -0.40 | 0.42 | 0.96 | 0.96 | -0.41 | -0.74
Time 1st S. | 0.51 | 0.40 | 0.81 | 0.47 | -0.39 | -0.53 | -0.53 | -0.40 | 0.56
Time Last S. | -0.53 | 0.16 | -0.71 | -0.42 | 0.41 | 0.98 | 0.98 | -0.16 | -0.77
Time-Difficulty and Experience with Microsoft Office products
We found indirect relationships between the familiarity with the topic of the text and total time, 2-35 time and time spent on the last segment: the more familiar a participant said he/she was with the topic of the text, the less time he/she tended to spend on those sections. We also found moderate indirect relationships between the linguistic difficulty and total time, 2-35 time and time spent on the first segment: the more difficult the participants said the task was, the more time they tended to spend on those sections.
We found indirect relationships between most of the timeframes studied and the different Microsoft Office products. Indirect relationships, in these cases meant that the more experience a participant said he/she has with the program, the less time they tended to spend on those sections.
Variable | Topic | Excel | Word | PowerPoint | Access | Outlook | Difficulty | Assistance
Total Time | -0.79 | -0.83 | -0.40 | -0.40 | -0.92 | -0.92 | -0.40 | 1.00
2-35 Time | -0.78 | -0.84 | -0.41 | -0.41 | -0.92 | -0.92 | -0.41 | 1.00
Time 1st S. | -0.36 | 0.08 | -0.40 | -0.40 | 0.26 | 0.26 | -0.40 | -0.31
Time Last S. | -0.57 | -0.70 | -0.16 | -0.16 | -0.84 | -0.84 | -0.16 | 0.95
Time-Keylog
We found significant indirect and direct relationships with all the variables measured in this group. An indirect relationship meant that participants who typed more keys tended to spend less time on those sections. On the other hand, a direct relationship meant that participants who typed more keys tended to spend more time on those sections. In the following table, all the correlation scores are indicated:
Variable | Keys | Multiple Vkeys | SingleVkeys | TotalVirtualkeys | Total Keys
Total Time | 0.96 | -0.88 | -0.67 | -0.75 | 0.94
2-35 Time | 0.96 | -0.88 | -0.67 | -0.75 | 0.95
Time 1st S. | -0.53 | 0.71 | 0.67 | 0.70 | -0.42
Time Last S. | 0.98 | -0.98 | -0.83 | -0.90 | 0.92
6.3.1.3. Group C
Time - Demographic data
We found a moderate indirect relationship (-0.41) between age and the time spent on the first segment, which means that the older the participant, the less time he/she tended to spend on the first segment. A moderate relationship (-0.44) was found between TM usage and time spent on the last segment: participants who used TMs more (lower numbers on the usage scale) tended to spend more time on the last section of the task. A moderate direct relationship (0.41) was found with Swordfish: those who had more experience with the tool tended to spend less time on that last section.
Variable | Age | Gender | Position | Experience Years | Hours per day | CAT tools usage | TM Usage | Swordfish | XLIFF
Total Time | -0.29 | 0.37 | 0.05 | 0.30 | 0.10 | -0.18 | -0.15 | 0.07 | 0.36
2-35 Time | -0.23 | 0.34 | 0.11 | 0.34 | 0.07 | -0.10 | -0.10 | -0.04 | 0.32
Time 1st S. | -0.41 | 0.27 | -0.05 | 0.28 | 0.01 | -0.08 | -0.14 | -0.13 | -0.08
Time Last S. | -0.33 | 0.23 | -0.14 | -0.12 | 0.17 | -0.31 | -0.44 | 0.41 | 0.27
Time-Difficulty and Experience with Microsoft Office products
We found moderate indirect relationships between the experience with Excel and total time, 2-35 time and time spent on the first segment, which means that the more experience a participant said he/she had with the program, the less time they tended to spend on those sections.
Variable | Topic | Excel | Word | PowerPoint | Access | Outlook | Difficulty | Assistance
Total Time | -0.15 | -0.10 | -0.04 | -0.18 | -0.45 | -0.04 | -0.26 | 0.11
2-35 Time | -0.20 | -0.14 | 0.04 | -0.15 | -0.55 | 0.04 | -0.34 | 0.11
Time 1st S. | -0.17 | -0.04 | 0.12 | -0.16 | -0.43 | 0.20 | -0.16 | 0.17
Time Last S. | 0.19 | 0.05 | -0.31 | -0.15 | 0.23 | -0.36 | 0.13 | -0.03
Time-Keylog
We found direct relationships between the number of keys and total time, 2-35 time and time spent on the first segment; between the number of single virtual keys and the total number of virtual keys and time spent on the last segment; and between the total number of keys and total time, 2-35 time and time spent on the first segment. A direct relationship in those cases meant that participants who typed more keys tended to spend more time on those sections.
Variable | Keys | Multiple Vkeys | SingleVkeys | TotalVirtualkeys | Total Keys
Total Time | 0.44 | 0.22 | 0.31 | 0.37 | 0.51
2-35 Time | 0.45 | 0.21 | 0.17 | 0.23 | 0.41
Time 1st S. | 0.44 | 0.26 | -0.16 | -0.08 | 0.19
Time Last S. | -0.02 | 0.00 | 0.72 | 0.71 | 0.49
6.4. Participants' attitudes and opinions towards the metadata
In the task-specific questionnaire that participants from group C received, there were several questions that aimed to obtain their impressions and opinions on the use of the provenance metadata that they received. The complete results from those questions can be found in the previous chapter. We now present an analysis and interpretation of the data obtained from them.
In terms of useful and consulted metadata, the name of the translator (contact-name), the date and the target language were indicated as the most consulted and the most useful items. The two concepts, consultation and usefulness, were asked about separately, and the numbers dropped for the second: participants indicated that more items were consulted (18 for contact-name, 11 for date and 11 for target-language) than considered useful (13 for contact-name, 4 for date and 8 for target-language). When asked the opposite questions (which metadata were less useful and which were not consulted), participants answered firmly for the not-consulted metadata (13 for original, 12 for category, 4 for target language, 6 for date and 3 for contact name); fewer responses were received for the less useful metadata (6 for original, 5 for category, 4 for target language, 5 for date and 3 for contact name).
Then, we directly asked them to explain to us in one sentence how the metadata had influenced their work. Nineteen of the twenty participants answered this question. From their answers, thirteen participants expressed that the metadata were helpful, whereas five clearly stated that they did not find them useful. Ten participants stated that they used them to decide whether or not to accept the suggested translation (for example, GC17 stated “I’ve found it really useful when deciding whether or not to accept the proposed translation”). The related concepts of trust and reliability were found in six of the participants’ answers (for example, GC14 stated “To know if I can trust the proposed translation”). These two concepts also appeared in one participant's answer to the question “Would you suggest any other metadata item?”, where he/she proposed adding a “trustworthiness degree” to the metadata presented to the translator. This new item could give a quick, overall indication of the origin of the translation memory, which would help the translator to decide whether to use or discard the translation suggestion presented by the tool. In answer to the same question, GC4 and GC7 went in the same direction, indicating that they would like to have information about the final or official state of the translation suggestion, which could help them to decide whether to trust it or not. In the additional comments section, GC10 also raised this idea of a degree of reliability when he/she stated “Metadata can be very useful associated to previous instructions from the customer. E.g. "all TMUs with code XXX are considered extremely reliable"”.
Later in the questionnaire we asked a similar question (to improve the internal validity of our questionnaire by checking whether both questions produced similar results). In this case we asked them whether the metadata had influenced them to do a better job. Twelve of the participants answered positively, which can be directly compared to the thirteen participants who had given a positive example when asked to express how the metadata had influenced their work. A second question followed for those who had answered “yes”: we asked them to give us an example. Five participants stated that they preferred the translations done by “Antonio García” (the person that we indicated as the official translator in the translation assignment), and three participants indicated that the metadata helped them to maintain terminology consistency with the official translation. Another participant said that observing the metadata helped him/her to decide whether a particular translation applied to the text he/she was translating, and another translator stated that it helped him/her to contextualize the different translation suggestions.
When asked about the degree of distraction, the majority of the participants stayed in the lower levels of distraction (9 of them indicated level 6, which meant no distraction at all, and 8 indicated level 5).
Finally, participants were asked directly whether they would have preferred to have a translation memory with or without metadata; 18 of the 20 participants preferred to have that additional information. This also reinforces the previous idea that metadata does not have a negative impact on the translator's behaviour, and invalidates the third hypothesis of our main research question, which stated that “provenance metadata had a negative impact on the behaviour of translators during their work”.
Chapter 7 – Conclusions & Recommendations –
7.1. Summary of results
In terms of quality, there was a significant improvement in groups B and C, the ones that had the translation memory. Group B obtained slightly better quality than group C; however, the difference was not significant enough to declare any causal relationship due to the independent variable. Although we could not obtain significant differences between Groups B and C, we did show that quality improved when the translation memory (with or without metadata) was applied: those groups obtained better scores in general than Group A, whose participants did not receive any translation memory.
In terms of time, similar results were obtained: Group A (the one without translation memory) spent more time in total on the translation task than groups B and C, which again shows that the translation memory helped to reduce time when applied. Group B was faster on average than group C, but the difference again was not significant enough to declare a cause-effect relationship due to the presence or absence of the metadata.
In terms of the attitude of the participants, more than half of them had a positive attitude towards the metadata items they received when we asked them to describe how those items had influenced their work. In terms of distraction, none of the participants indicated that the metadata were a big distraction; on the contrary, the majority indicated that they were not (or almost not) distracted by them. When asked whether they would choose to receive a translation memory with or without metadata, 19 of the 20 participants preferred to have it, which tells us that even those who might not have found it useful preferred to have it, and implies that the information was not an impediment to them when translating. The name of the translator, the date and the target language were the metadata items indicated as most useful and most consulted. Many participants mentioned the key concepts of trust and reliability, and three of them also pointed out that metadata per se does not carry useful information: it is useful for translators only if the information it contains has some direct meaning for them and if it can be trusted. Therefore, we recommend that the meaning of the metadata be explained or indicated to the target translators at some point in the process (ideally with the translation assignment), especially when translators are working in teams or in big companies; this would help translators to decide on the reliability of the translation suggestions that they are receiving. Project managers, or whoever is in charge of maintaining those databases (TMs), should also be responsible for providing that key information to the translators.
7.2. Conclusion and recommendations
As seen in Chapter 1, our main research question was “How does the provenance that surrounds translation memory suggestion influence the behaviour of translators during their work in a localisation process?”
Three hypotheses were considered for this main question.
H0- Provenance metadata has no effect on the behaviour of translators during their work.
H1- Provenance metadata has a positive impact on the behaviour of translators during their work.
H2- Provenance metadata has a negative impact on the behaviour of translators during their work.
An experimental strategy was put in place to answer our research question. From the analysis of the quantitative data obtained from the experiments (where we focused on measuring the dependent variables “time” and “quality”), we should conclude that we did not find any significant⁵⁵ effect (either positive or negative) on the variables measured due to the absence or presence of the provenance metadata. This result matches our hypothesis H0: provenance metadata has no effect on the behaviour of translators during their work. What we have shown through the quantitative data is that the use of translation memories (with or without metadata) does improve the quality of the translation output and also reduces the time that translators devote to the task. This outcome indicates that using a translation memory under circumstances similar to the ones that we reproduced can improve the work of professional translators.
55 It should be said that a bigger difference in terms of time was observed between Group B and Group C, than in terms of quality.
If we take into account the qualitative data obtained from the participants' opinions, we should say that most of our participants prefer to have provenance metadata along with their translation suggestions, and they also considered that those metadata items helped them either to make decisions or to reinforce their choices. We interpret this to mean that professional translators perceive metadata as a possible help to their work, something that can have a positive impact; this idea is more in line with our H1, which stated that provenance metadata has a positive impact on the translator’s behaviour. Another important point raised by the translators was the importance of knowing the meaning of the provenance metadata they receive, which would help them to assess its reliability. It is not about the quantity of information we provide them, but about the quality of that information and its relevance for each specific situation.
Taking both sides into account, we recommend that those in charge of dealing with translation memories (project managers, those responsible for the translation memories, or translators) include provenance metadata whenever possible, and also inform translators about the meaning of those metadata items. For example, if they include the names of former translators, they should also include information about the reliability of those people, so that translators can better assess whether or not to follow the translation suggestion. Presenting metadata without meaningful information to the translator would not benefit any of the parties involved.
7.3. Impact of the research
We can differentiate two main outputs that this research has produced: the participation and work in the XLIFF Technical Committee and the dissemination of our progress and results in academic journals and conferences.
In the XLIFF TC we have collaborated since March 2009 in the regular meetings and mailing discussions⁵⁶. In the development of the new version of the standard, we have added information obtained from this research in the new
information could be introduced without breaking the validity of the file, and without using any other alternative methods, like the processing instructions that we had to use with XLIFF Phoenix. Our input is now under discussion and might be introduced in the new version, which is expected to be published at the end of this year⁵⁷.
56 More information about our involvement in the TC can be found in Chapter 2.
57 There is no official release date yet. Technical work is still being carried out at the moment.
The progress and results of the research have been presented over the past three years in international conferences and journals; a complete list of our contributions to our academic field can be consulted in Appendix K. It should also be highlighted that our tool, XLIFF Phoenix, has been declared as a University of Limerick invention disclosure, with the Case Number at UL being 2006155. It is envisaged that the tool will be open sourced in the near future so that other researchers, developers or interested people can explore it, modify it or use it for their own purposes. XLIFF Phoenix was demonstrated in several CNGL open demonstration days and it has attracted interest from industrial parties.
Christensen and Schjoldager (2010), in their review of existing empirical research on Translation Memories, found that researchers took for granted that TMs represent an improvement in terms of production time and quality, although there was no empirical evidence to demonstrate it. Through our research, we have shown that applying a translation memory (with or without provenance metadata) can improve translators’ productivity in terms of time (we observed a reduction of time in the groups that received the translation memory) and that it also improves the quality of the translation (groups that received the translation memory obtained higher quality scores).
7.4. Limitations and Future Research
Limitations
The main limitation of this type of research is the lack of control over all the variables that might affect the final results: subject variability⁵⁸ (profile of the participants, previous experience with the tool used, previous experience with the topic of the text, etc.) and some environmental variables (levels of noise, possible distractions, level of tiredness of the participants, etc.). We addressed these two main constraints through the design of the experiment. On the one hand, we prepared a demographic questionnaire with which we could measure different characteristics of the participants, and we showed that those variables were uniform on average across the three groups; therefore, differences in the results between the three groups could not be attributed directly to the profile of the participants. On the other hand, we prepared a quasi-real environment, accessible through a remote desktop connection, which was the same for all the participants and could be controlled (a recording method was put in place); this environment allowed us to compare the behaviour of all the participants under the same set of circumstances. What remained out of our control was the physical environment of the participants: we had no information about it or its conditions. However, this is what actually happens in real working scenarios with freelance translators: they work on their own computers from their own homes (or any other space that might suit them better), and the project managers or other people who send them their translation assignments have no control over that. Only if we studied in-house translators could we actually control the physical environment, provided that a sufficient number of them were working in the same office or department.
Other limitations concern the topic and the language combination we chose. As we have seen in the literature review chapter, not all textual types are equally well suited to translation memory software. We chose the textual type that has been indicated to work best with TMs, but it would be interesting to see whether the results we obtained could be reproduced with other textual types. In terms of language combination, we only worked with the English-Spanish combination, two Indo-European languages that use the same script. It would also be interesting to see whether the same results could be obtained with languages that are more distant from each other. We cannot predict whether the results will be replicable with other language combinations; however, we tend to believe that similar results would be achieved regardless of the language combination used.
58 It should be said that although we included “subject variability” in this section as a limitation, this aspect is not per se a limitation of our type of research, but an intrinsic characteristic of the subjects being studied: human beings.
Lastly, we should say that we carried out our experiment using only one specific CAT tool. It would be necessary to explore how other tools show provenance metadata to translators (if they have that option), and it would also be interesting to investigate whether the position of the metadata on the screen has any influence on the translators' use of that information.
We are aware that our experiment is a quasi-experiment and that its results cannot be considered conclusive or universally generalised; however, they do represent a starting point for further research efforts, which might try to confirm or disprove our results by reproducing the methodology we put in place or by introducing new approaches that might improve the validity of the results.
Future Research
In order to continue doing research on the influence that metadata might have on translators’ behaviour, a different data generation method could be put in place: eye-tracking, which measures the movements and fixations of the participants’ eyes across the computer screen and the dilation of the pupil. This research tool has already been explored in research with translation memories (O’Brien 2010), and it could help us to determine the use that translators make of the metadata floating window: how many times the participants look at that floating window, and for how long. Adding this method would increase the validity of our experiment, as it would help us to verify (by comparing it with the other data generation methods that we put in place) whether the results are consistent across all of them.
A bigger set of participants would be desirable if we want to repeat the experiments and verify that their results are replicable over time. However, in order to deal with a larger number of participants, a research group involving more than two researchers would be desirable. A single researcher could not possibly manage a bigger set of participants and the data analysis that it would imply, at least if the results were to be published within a period of time that would keep them relevant for the scientific community.
Bibliography
Aouad, L., O’Keeffe, I.R., Collins, J.J., Wasala, A., Nishio, N., Morera, A., Morado Vázquez, L., Ryan, L., Gupta, R., Schaler, R. (2011) ‘A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios’, in Lee, G., Howard, D. and Ślęzak, D., eds., Convergence and Hybrid Information Technology, Communications in Computer and Information Science, Springer Berlin Heidelberg, 371–382.
Benito, D. (2009) ‘Future trends in translation memory’, Tradumàtica: traducció i tecnologies de la informació i la comunicació, (7).
Botto, F. (1999) Dictionary of Multimedia and Internet Applications, Chichester: New York: Wiley.
Bowker, L. (2005) ‘Productivity vs Quality? A pilot study on the impact of translation memory systems’, Localisation Focus, 4(1), 13–20.
Britannica Online Encyclopedia (2010) Phoenix (mythological Bird) [online], available: http://www.britannica.com/EBchecked/topic/457189/phoenix [accessed 9 Jul 2010].
Brkić, M., Seljan, S., Mikulić, B.B. (2009) ‘Using Translation Memory to Speed up Translation Process’, in International Conference The Future of Information Sciences (2; 2009).
Christensen, T.P. (2003) Translation Memory-systemer Som Værktøjer Til Juridisk Oversættelse: Kritisk Vurdering Af Anvendeligheden Af Translation Memory-systemer Til Oversættelse Af Selskabsretlig Dokumentation. Ph.D. dissertation. Syddansk Universitet.
Christensen, T.P., Schjoldager, A. (2010) ‘Translation-Memory (TM) Research: What Do We Know and How Do We Know It?’, Hermes–Journal of Language and Communication Studies, 44, 89–101.
Christensen, T.P., Schjoldager, A. (2011) ‘The Impact of Translation-Memory (TM) Technology on Cognitive Processes: Student-Translators’ Retrospective Comments in an Online Questionnaire’, Human-machine Interaction in Translation, 119–130.
CNGL (2012a) ‘Next Generation Localisation. Exciting Careers for tomorrow’s global economy’, available: http://www.cngl.ie/drupal/sites/default/files/CNGL_Next_Gen_Localisation_Careers_Brochure.pdf [accessed 29 May 2012].
CNGL (2012b) About Us | CNGL [online], available: http://www.cngl.ie/about.html [accessed 29 May 2012].
CNGL (2012c) Research Publications | CNGL [online], available: http://www.cngl.ie/researchpub.html#n2469 [accessed 8 Aug 2012].
Cohen, W., Ravikumar, P., Fienberg, S. (2003) ‘A comparison of string distance metrics for name-matching tasks’, in The Proceedings of the IJCAI.
Crano, W.D., Brewer, M.B., Lac, A. (2001) Principles and Methods of Social Research, 2nd ed, Lawrence Erlbaum: New Jersey.
Diaz Fouces, O., García González, M. (2008) Traducir (con) Software Libre, Collecció: Interlingua, 77, Comares: Granada.
Doherty, S., O’Brien, S., Carl, M. (2010) ‘Eye tracking as an MT evaluation technique’, Machine translation, 24(1), 1–13.
Dunne, K.J. (2006) Perspectives on Localization, John Benjamins Publishing Co: Kent State University.
Ellen Wright, S. (2006) ‘The creation and application of language industry standards’, in Perspectives on Localization, John Benjamins Pub. Co.: Kent State University, 242–278.
Esselink, B. (2000) A Practical Guide to Localization, John Benjamins Pub. Co.: Amsterdam.
Filip, D. (2012) ‘The localization standards ecosystem’, Multilingual computing and technology, 23(3), 29–36.
Fredrik, E. (2012) ‘Extensibility methods’, XLIFF TC Mailing List, available: https://lists.oasis-open.org/archives/xliff/201205/msg00001.html [accessed 29 July 2012].
Fulford, H., Granell-Zafra, J. (2005) ‘Translation and technology: A study of UK freelance translators’, The Journal of Specialised Translation, 4(1).
García, I. (2008) ‘Translating and Revising for Localisation: What do We Know? What do We Need to Know?’, Perspectives: Studies in Translatology, 16(1-2), 49–60.
García, I. (2006) ‘Formatting and the Translator: Why XLIFF Does Matter’, Localisation Focus, Jun, 14–20.
García, I. (2009) ‘Research on translation tools’, Translation Research Projects 2, 27.
Gough, J. (2010) ‘A troubled relationship: the compatibility of CAT tools’, TAUS technology article, downloaded from: http://www.translationautomation.com/technology/a-troubledrelationship-the-compatibility-of-cat-tools.html [accessed 29 July 2012].
Guerberof, A. (2009) ‘Productivity and quality in MT post-editing’, in MT Summit XII- Workshop: Beyond Translation Memories: New Tools for Translators MT.
Hansen, B. (2004) The Dictionary of Computing & Digital Media: Terms & Acronyms, ABF Content.
Lagoudaki, E. (2006a) ‘Translation memories survey 2006: Users’ perceptions around tm use’, in Proceedings of the ASLIB International Conference Translating & the Computer, 1–29.
Lagoudaki, E. (2006b) ‘Translation memory systems: Enlightening users’ perspective’, Translation Memories Survey 2006.
LISA (1995) ‘LISA QA MODEL Reference Manual’.
LISA (2007) ‘LISA QA Model 3.1 User’s Manual’.
LISA (2009) What Is Globalization? [online], available: http://www.lisa.org/What-Is-Globalizatio.48.0.html [accessed 25 Oct 2009].
LISA (2011a) LISA: LISA Members [online], available: http://www.lisa.org/LISA-Members.74.0.html [accessed 3 Jan 2011].
LISA (2011b) LISA: OSCAR, LISA’s Standards Committee [online], available: http://www.lisa.org/OSCAR-LISA-s-Standards-Committee.79.0.html?&no_cache=1&sword_list[]=oscar [accessed 3 Jan 2011].
Lommel, A. (2006) ‘Localization standards, knowledge- and information-centric business models, and the commoditization of linguistic information.’, in Perspectives on Localization, John Benjamins Pub. Co.: Kent State University, 223–239.
Marotta, R. (1986) The DEC Dictionary: a Guide to Digital’s Technical Terminology, DECbooks: Burlington, MA.
Mesa-Lao, B. (2011) ‘Explicitation in translation memory-mediated environments. Methodological conclusions from a pilot study’, Translation & Interpreting, 3(1), 44–57.
Moorkens, J. (2011) ‘Translation Memories guarantee consistency: Truth or fiction?’, In Proc: ASLIB Translating and the Computer, (33).
Morado Vázquez, L., Filip, D. (2012) ‘XLIFF Support in CAT Tools’, XLIFF Promotion and Liaison Subcommittee, OASIS.
Morado Vázquez, L., Torres del Rey, J., Collantes Fraile, C. (2009) ‘Translation in the cloud, localisation over the rainbow’, Presented at the LRC XIV Localisation in the Cloud, Limerick, Ireland.
Negroponte, N. (1995) Being Digital, Knopf New York.
Neira Vilas, X. (1961) Memorias Dun Neno Labrego, Edicións do Castro: Osedo-Coruña.
O’Brien, S. (2007) ‘An empirical investigation of temporal and technical post-editing effort’, Translation and Interpreting Studies, 2(1), 83–136.
O’Brien, S. (2010) ‘Eye tracking in translation process research: methodological challenges and solutions’, Copenhagen studies in language, (38), 251–266.
O’Brien, S., O’Hagan, M., Flanagan, M. (2010) ‘Keeping an eye on the UI design of Translation Memory: how do translators use the Concordance feature?’, in Proceedings of the 28th Annual European Conference on Cognitive Ergonomics, 187–190.
Oates, B.J. (2006) Researching Information Systems and Computing, Sage Publications Ltd.
OSCAR LISA (2005) ‘TMX 1.4b Specification’.
Pym, A. (2004) The Moving Text: Localization, Translation, and Distribution, John Benjamins Pub. Co.
Pym, A. (2011) ‘Democratizing translation technologies–the role of humanistic research’, in Luspio Translation Automation Conference, Rome, April, 2011.
Raya, R. (2004) ‘XML in localisation: A practical analysis’, IBM Technical Library, available: http://www.ibm.com/developerworks/library/x-localis/ [accessed 6 Oct 2011].
Raya, R. (2005) ‘XML in Localisation: Reuse Translations with TM and TMX’, IBM Technical Library, available: http://www.ibm.com/developerworks/library/x-localis3/ [accessed 6 Oct 2011].
de Saint Robert, M.J. (2008) ‘CAT tools in international organisations’, Topics in language resources for translation and localisation, 79, 107.
de Saint-Exupéry, A. (1946) Le Petit Prince, Collection Folio Junior. ed, Gallimard Paris.
Salkind, N.J. (2009) Statistics for People Who (Think They) Hate Statistics: Excel 2007 Edition, SAGE.
Savourel, Y. (2012) ‘Thoughts on custom namespaces for extensions’, XLIFF TC Mailing List, available: https://lists.oasis-open.org/archives/xliff/201204/msg00038.html [accessed 29 July 2012].
Schäler, R. (2009) ‘Localization’, Routledge Encyclopedia of Translation Studies.
Sprung, R.C., Jaroniec, S. (2000) Translating into Success: Cutting-edge Strategies for Going Multilingual in a Global Age [online], John Benjamins Publishing Company.
TAUS Data Association (2010) ‘TDA: A Starter’s Guide’, available: http://www.tausdata.org/images/documents/taus_data_association-starters_guide.pdf [accessed 20 Jan 2010].
Teixeira, C. (2011) ‘Knowledge of Provenance and its Effects on Translation Performance in an Integrated TM/MT Environment’, in Proceedings of the 8th International NLPCS Workshop – Special Theme: Human–Machine Interaction in Translation. Copenhagen: Samfundslitteratur.
Torres del Rey, J. (2003) ‘Entretejiendo palabras: tratamiento de textos digitales’, available: http://campus.usal.es/~palabras/palabras03/virtual/hipertexto/0.htm [accessed 20 Jan 2010].
Torres del Rey, J. (2005) La interfaz de la traducción: Formación de traductores y nuevas tecnologías, Interlingua, Comares: Albolote (Granada).
Torres del Rey, J. (2011) ‘Localización de software. Recursos y software para la localización’, Universidad de Salamanca, unpublished.
Torres-Hostench, O., P. C (2010) ‘TRACE: Measuring the Impact of CAT Tools on Translated Texts’, Linguistic and Translation Studies in Scientific Communication, 255.
Trillaud, S., Guillemin, P. (2012) ‘What has become of LISA’s OSCAR standards?’, Multilingual, 23(3), 38.
W3C (2008) Extensible Markup Language (XML) 1.0 (Fifth Edition) [online], available: http://www.w3.org/TR/REC-xml/#sec-pi [accessed 14 Jul 2012].
Warren, S. (2007) ‘The Zachman Enterprise Framework’, Cambridge Technical Communicators, available: http://www.technical-communicators.com/articles/zachman_framework.pdf [accessed 14 Jul 2012].
XLIFF TC (2007) XLIFF Frequently Asked Questions [online], available: http://www.oasis-open.org/committees/xliff/faq.php [accessed 4 Jan 2011].
XLIFF TC (2008a) ‘XLIFF 1.2 White Paper’, available: http://www.oasis-open.org/committees/download.php/26817/xliff-core-whitepaper-1.2-cs.pdf [accessed 29 July 2012].
XLIFF TC (2008b) XLIFF 1.2 Specification [online], available: http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html [accessed 14 Feb 2011].
Yamada, M. (2011) ‘The effect of translation memory databases on productivity’, Translation Research Projects 3, 63.
Appendices
7.5. Appendix A – Demographic Questionnaire
General Questionnaire
Introduction
You are about to participate in the experimental procedures of Lucía Morado Vázquez’s doctoral thesis. These experiments try to specify the importance of localisation data and metadata during the localisation process. All questions contained in this questionnaire are strictly confidential and will only be used for scientific purposes. Personal data will not be revealed under any circumstances.
Consent Form
Please answer to the statements below if you are happy to proceed.
*1. I confirm that I have read the Information Sheet regarding the proposed research and understand the nature of the study. ( ) Yes ( ) No
*2. I acknowledge that I have had the opportunity to ask the researchers involved in this research any questions that I might have and these have been met to my satisfaction. ( ) Yes ( ) No
*3. I understand the terms under which I am participating. ( ) Yes ( ) No
*4. I understand that the information that I give may be used by researcher’s involved in this research and that my participation is voluntary and that I may withdraw myself and my data at any time throughout the study until the research is submitted. ( ) Yes ( ) No
Confidential Agreement
The data to be manipulated today was provided by CNGL Industrial partners and it is protected by IP agreements.
*5. I confirm that I will not reveal any detail of the experiments (procedures, data content and software used) to any third parties. ( ) Yes ( ) No
Personal Data
This section aims to obtain demographic data about the participants. It will not be revealed related to your participant name under no circumstances.
*6. Participant Name (e.g. Participant22)
7. Date of Birth
8. Gender ( ) Male ( ) Female
Translation Experience
This section aims to obtain information about participants' translation experience.
9. Current position (if applicable, check more than once).
[ ] Translation Student
[ ] Freelance Translator
[ ] In-house Translator
Other (please specify)
10. If you are a translation student, specify in which year are you? (e.g. I am in my 3rd year).
11. If you work as a translator, how long have you been working professionally? (E.g. three years)
12. Translation activity, in hours per day:
[ ] Less than one hour
[ ] One hour
[ ] Two hours
[ ] Three hours
[ ] Four hours
[ ] Five hours
[ ] Six hours
[ ] Seven hours
[ ] Eight hours
[ ] Nine hours
[ ] Ten hours
[ ] More than ten hours
Other (please specify)
13. Do you carry out other translation related activities on your daily work? (E.g. "I do postediting")
14. Main working language combination(s). (E.g. English into Spanish)
15. Other working language combination(s). (E.g. French into Spanish)
Experience with Computer Assisted Translation (CAT) tools
This section aims to obtain information about the participant's familiarity with CAT tools.
16. Use of CAT tools. How often do you use them?
[ ] I use them in all my translation activities.
[ ] I use them only with some of my translation assignments.
[ ] I use them only when I am required to do so.
[ ] I have tried them, but I do not use them for my daily work.
[ ] I have never tried them.
Other (please specify)
17. Use of Translation Memories. How often do you use them? (This question is related to the previous one).
[ ] I use them in all my translation activities.
[ ] I use them only with some of my translation assignments.
[ ] I use them only when I am required to do so.
[ ] I have tried them, but I do not use them for my daily work.
[ ] I have never tried them.
Other (please specify)
18. Familiarity with Swordfish. How well do you know this CAT tool?
[ ] I use it on a daily basis.
[ ] I use it on some of my projects.
[ ] I have tried it, but I do not use it.
[ ] I have heard of it, but I have never tried it before.
[ ] I have never heard of it.
Other (please specify)
19. Familiarity with XLIFF (XML Localisation Interchange File Format)
Six-point scale from 1 (Total unfamiliar) to 6 (I am very familiar with it).
End of the survey
Thanks for your time. You can now start with the translation task. Do not forget to complete the second questionnaire after the translation task.
7.6. Appendix B – Task Questionnaire. Groups A and B
Task Questionnaire
This is the second part of the questionnaire. The following questions are related to the task that you have just fulfilled.
Identification
*1. Introduce your participant name (e.g. Participant22)
Duration and difficulty
2. Duration. Please introduce (in minutes) the exact duration of the task. From the moment you opened Swordfish until the moment you saved your translation. If you do not know the exact time, leave it blank. The researcher can find it out by analysing your video.
3. Topic of the text. How familiar were you with the topic of the text?
( ) 1 (Total Unfamiliar)
( ) 2
( ) 3
( ) 4
( ) 5
( ) 6 (Very familiar)
4. Experience working with Microsoft© Office products. Each product is rated on a six-point scale between "I am an Advanced User" and "I have never worked with it".
Microsoft Excel
Microsoft Word
Microsoft PowerPoint
Microsoft Access
Microsoft Outlook
5. Linguistic difficulty
( ) 1. Very difficult (I needed to consult many terms and expressions)
( ) 2
( ) 3
( ) 4
( ) 5
( ) 6. Very easy (I could do it without consulting external resources)
6. Assistance. How many times did you ask for help?
( ) None   ( ) 1   ( ) 2   ( ) 3   ( ) 4   ( ) 5   ( ) 6
Other (please specify)
7. Doubts. If you had any doubts, what were they about? (please select more than one answer if applicable)
[ ] Linguistic related
[ ] Technical related
[ ] CAT tool related
[ ] Experiment Instructions
Other (please specify)
8. External Resources. Did you consult any external resources? (please select more than one answer if applicable)
[ ] No (If Yes, continue answering)
[ ] Machine Translation
[ ] Online Dictionaries
[ ] Microsoft© Excel official Webpage
Other (please specify)
Final Remarks
9. Additional comments. Please write below any additional comment you might have.
Thank you for your time!
7.7. Appendix C – Task Questionnaire. Group C
Task Questionnaire (Main Group)
Task Questionnaire
This is the second part of the questionnaire. The following questions are related to the task that you have just fulfilled.
Identification
*1. Introduce your participant name (e.g. Participant22)
Duration and difficulty
2. Duration. Please introduce (in minutes) the exact duration of the task. From the moment you opened Swordfish until the moment you saved your translation. If you do not know the exact time, leave it blank. The researcher can find it out by analysing your video.
3. Topic of the text. How familiar were you with the topic of the text?
( ) 1 (Total Unfamiliar)
( ) 2
( ) 3
( ) 4
( ) 5
( ) 6 (Very familiar)
4. Experience working with Microsoft© Office products. Each product is rated on a six-point scale between "I am an Advanced User" and "I have never worked with it".
Microsoft Excel
Microsoft Word
Microsoft PowerPoint
Microsoft Access
Microsoft Outlook
5. Linguistic Difficulty
( ) 1. Very difficult (I needed to consult many terms and expressions)
( ) 2
( ) 3
( ) 4
( ) 5
( ) 6. Very easy (I could do it without consulting external resources)
6. Assistance. How many times did you ask for help?
( ) None   ( ) 1   ( ) 2   ( ) 3   ( ) 4   ( ) 5   ( ) 6
Other (please specify)
7. Doubts. If you had any doubts, what were they about? (please select more than one answer if applicable)
[ ] Linguistic related
[ ] Technical related
[ ] CAT tool related
[ ] Experiment Instructions
Other (please specify)
8. External Resources. Did you consult any external resources? (please select more than one answer if applicable)
[ ] No (If Yes, continue answering)
[ ] Machine Translation
[ ] Online Dictionaries
[ ] Microsoft© Excel official Webpage
Other (please specify)
METADATA
9. Which metadata items did you consult? (please select more than one answer if applicable)
[ ] contact-name (e.g. "Antonio García")
[ ] date (e.g. "2006-1-5T12:00:00Z")
[ ] target-language (e.g. "es-ES")
[ ] original (e.g. "CubeFunctions.chm")
[ ] category (e.g. "Spreadsheet")
10. Which metadata item(s) did you find more useful? (please select more than one answer if applicable)
[ ] contact-name (e.g. "Antonio García")
[ ] date (e.g. "2006-1-5T12:00:00Z")
[ ] target-language (e.g. "es-ES")
[ ] original (e.g. "CubeFunctions.chm")
[ ] category (e.g. "Spreadsheet")
11. Which metadata item(s) did you find less useful? (please select more than one answer if applicable)
[ ] contact-name (e.g. "Antonio García")
[ ] date (e.g. "2006-1-5T12:00:00Z")
[ ] target-language (e.g. "es-ES")
[ ] original (e.g. "CubeFunctions.chm")
[ ] category (e.g. "Spreadsheet")
12. Which metadata item(s) did you not consult? (please select more than one answer if applicable)
[ ] contact-name (e.g. "Antonio García")
[ ] date (e.g. "2006-1-5T12:00:00Z")
[ ] target-language (e.g. "es-ES")
[ ] original (e.g. "CubeFunctions.chm")
[ ] category (e.g. "Spreadsheet")
13. Could you explain in one sentence how has the metadata influenced your work?
14. Would you suggest any other metadata item?
15. If you could choose between having a translation memory with metadata or another one without metadata, which one would you prefer?
( ) Translation memory with metadata
( ) Translation memory without metadata
16. How distracted were you by the metadata during the translation task?
( ) 1 (I was confused by the amount of data)
( ) 2
( ) 3
( ) 4
( ) 5
( ) 6 (I was not distracted at all, it only helped me)
17. Did the metadata influence you to do a better job?
( ) Yes   ( ) No
18. If yes, could you give us an example?
Final Remarks
19. Additional comments. Please write below any additional comment you might have.
Thank you for your time!
7.8. Appendix D – Answers to the Demographic Questionnaire
We present in this appendix the data obtained from the demographic questionnaire. All the participants’ data are presented together. Personal data (age and gender) are intentionally not included in this appendix for confidentiality reasons.
7.8.1. Current position
In this table, number 2 represents “Freelance Translator” and number 3 represents “In-house Translator”.
Participant   Position   Other
GA1 2
GA2 2
GA3 2 And associate lecturer.
GA4 2 Also MA student.
GA5 2
GA6 2 And researcher.
GA7 2
GB1 3 Industry CAT support.
GB2 2 Localization Specialist, Consultant, Sales & Marketing.
GB3 3 CEO of a translation company (5 years). I’ve been an in-house translator for 5 years and a freelancer for 5 years.
GB4 2
GB5 2
GB6 3
GC1 2
GC10 2
GC11 2
GC12 2
GC13 2
GC14 2
GC15 2
GC16 3
GC17 2
GC18 2 Translation Senior Project Manager.
GC19 3
GC2 2
GC20 2
GC3 2 Part-time freelance translator.
GC4 2 Medical Linguistic Specialist.
GC5 3
GC6 3
GC7 3
GC8 3
GC9 2
7.8.2. Experience years
Participant   Experience Years
GA1 4
GA2 14
GA3 5
GA4 3
GA5 2
GA6 3
GA7 4
GB1 13
GB2 13
GB3 15
GB4 4
GB5 2
GB6 4
GC1 16
GC10 14
GC11 4
GC12 7
GC13 18
GC14 4
GC15 2
GC16 3
GC17 3
GC18 5
GC19 6
GC2 17
GC20 4
GC3 2
GC4 4
GC5 12
GC6 4
GC7 5
GC8 4
GC9 7
7.8.3. Hours per day
In this table “0” represents the option “less than one hour”.
Participant   Hours per day   Other
GA1 8
GA2 8
GA3 3
GA4 5
GA5 2
GA6 n/a From time to time.
GA7 2
GB1 1
GB2 0 Activities are more related to processing (pre & and post) files for translation and designing translation and localization processes for translation by translation teams.
GB3 3 I edit and do QA's mainly.
GB4 8
GB5 5
GB6 8
GC1 8
GC10 8
GC11 8
GC12 8
GC13 7
GC14 8
GC15 4
GC16 4
GC17 8
GC18 8
GC19 6
GC2 8
GC20 n/a I used to translate sporadically as I used to work as a full time software engineer.
GC3 n/a I do not work every day, depends on the activity.
GC4 8
GC5 8
GC6 8
GC7 8
GC8 6
GC9 0
7.8.4. Other translation-related activities
Participant   Other activities
GA1 I do interpreting
GA2 Proofreading, editing, terminology management.
GA3 I review and I´m a teacher, too.
GA4 No.
GA5 Proofreading.
GA6 No.
GA7 I teach English
GB1 Run demos on how to translate
GB2 No.
GB3 Editing, QA, LQA, format conversions, project management.
GB4 I teach French and English.
GB5 No.
GB6 I do proofreading and some DTP sometimes.
GC1 N/a
GC10 Postediting, Author for some short guides about veterinary items.
GC11 I do proofreading.
GC12 Proofreading and editing
GC13 Proofreading , postediting
GC14 No.
GC15 Sometimes I proofread
GC16 I do proofreading, revision and coordination.
GC17 Proofreading and QAs.
GC18 I am learning to do postediting, terminology management.
GC19 I do postediting.
GC2 Yes, but not always.
GC20 I develop software tools to help me improve my productivity.
GC3 N/a
GC4 Linguistic Reviews and Postediting
GC5 editing, proofing, QA, postediting, LA
GC6 I do proof-reading on my colleagues' texts
GC7 Copy-editing, proofreading, DTP, TM management, project management.
GC8 Yes, I do: proofreading, editing, transcription, interpreting.
GC9 N/a
7.8.5. Main language combinations
Participant   EN-ES   ES-EN   EN-CA   EN-GL   FR-ES   FR-CA   DE-ES   DE-EN   EN-FR
GA1 1 1
GA2 1
GA3 1 1
GA4 1
GA5 1
GA6 1
GA7 1 1 1 1
GB1 1
GB2 1
GB3 1
GB4 1 1
GB5 1
GB6 1
GC1 1
GC10 1 1
GC11 1 1 1
GC12 1 1
GC13 1
GC14 1
GC15 1 1
GC16 1
GC17 1 1
GC18 1 1 1 1
GC19 1 1 1 1
GC2 1
GC20 1
GC3 1
GC4 1
GC5 1
GC6 1
GC7 1
GC8 1 1
GC9 1 1
Total 31 4 2 1 4 2 5 3 1
7.8.6. Other language combinations
In the following table, language names are represented by two-letter codes: ES for Spanish, EN for English, FR for French, DU for Dutch, CA for Catalan, GL for Galician, DE for German, PT for Portuguese, IT for Italian and AR for Arabic.
ES ES EN PT ES ES ES ES CA ES CA ES EN CA GL FR ------GL ES - EN ------Participant ES EN FR DU ES CA CA GL DE IT IT FR GL ES EN IT AR PT ES GA1 GA2 1 GA3 GA4 1 1 1 GA5 1 1 GA6 1 GA7 1 1 GB1 1 1 GB2 1 1 GB3 GB4 1 1 GB5 GB6 1 GC1 GC10 1 1 1 GC11 GC12 1 GC13 GC14 GC15 GC16 1 GC17 GC18 1 1 GC19 1 1 1 GC2 1 GC20 1 1 GC3 1 GC4 1 1 GC5 GC6 1 GC7 1 GC8 1 GC9 1 1
Total 4 1 12 1 1 1 1 1 3 1 2 1 1 1 1 1 1 1 2
7.8.7. CAT tools usage
In this table, number 1 represents the predefined answer “I use them in all my translation activities”, number 2 represents the predefined answer “I use them only with some of my translation assignments”, number 3 represents the predefined answer “I use them only when I am required to do so”, number 4 represents the predefined answer “I have tried them, but I do not use them for my daily work”, and number 5 represents the predefined answer “I have never tried them”.
Participant   CAT tools usage
GA1 1
GA2 1
GA3 1
GA4 1
GA5 3
GA6 2
GA7 2
GB1 1
GB2 1
GB3 1
GB4 4
GB5 1
GB6 1
GC1 1
GC10 2
GC11 1
GC12 1
GC13 2
GC14 4
GC15 1
GC16 1
GC17 1
GC18 1
GC19 3
GC2 1
GC20 2
GC3 2
GC4 1
GC5 3
GC6 2
GC7 1
GC8 1
GC9 4
7.8.8. TM usage
In this table, number 1 represents the predefined answer “I use them in all my translation activities”, number 2 represents the predefined answer “I use them only with some of my translation assignments”, number 3 represents the predefined answer “I use them only when I am required to do so”, number 4 represents the predefined answer “I have tried them, but I do not use them for my daily work”, and number 5 represents the predefined answer “I have never tried them”.
Participant   TM Usage
GA1 1
GA2 1
GA3 1
GA4 1
GA5 3
GA6 2
GA7 2
GB1 1
GB2 1
GB3 1
GB4 4
GB5 1
GB6 1
GC1 1
GC10 2
GC11 n/a
GC12 1
GC13 2
GC14 4
GC15 1
GC16 1
GC17 1
GC18 1
GC19 3
GC2 1
GC20 2
GC3 2
GC4 1
GC5 3
GC6 2
GC7 1
GC8 1
GC9 4
7.8.9. Swordfish
In this table, number 1 represents the predefined answer “I use it on a daily basis”, number 2 represents the predefined answer “I use it on some of my projects”, number 3 represents the predefined answer “I have tried it, but I do not use it”, number 4 represents the predefined answer “I have heard of it, but I have never tried it before”, and number 5 represents the predefined answer “I have never heard of it”.
Participant   Swordfish
GA1 4
GA2 3
GA3 4
GA4 4
GA5 4
GA6 4
GA7 5
GB1 4
GB2 5
GB3 2
GB4 4
GB5 4
GB6 4
GC1 4
GC10 3
GC11 5
GC12 4
GC13 4
GC14 3
GC15 4
GC16 5
GC17 4
GC18 4
GC19 3
GC2 4
GC20 4
GC3 3
GC4 4
GC5 4
GC6 1
GC7 3
GC8 3
GC9 4
7.8.10. XLIFF
In this table, number 1 represents the lowest point of the predefined scale (total unfamiliar) and number six represents the highest point of the predefined scale (I am very familiar with it).
Participant   XLIFF
GA1 2
GA2 6
GA3 6
GA4 3
GA5 1
GA6 1
GA7 2
GB1 6
GB2 6
GB3 6
GB4 2
GB5 1
GB6 4
GC1 5
GC10 1
GC11 5
GC12 5
GC13
GC14 1
GC15 2
GC16 3
GC17 1
GC18 6
GC19 3
GC2 3
GC20 2
GC3 4
GC4 3
GC5 3
GC6 1
GC7 5
GC8 2
GC9 2
7.9. Appendix E – Answers to the Task Questionnaire. Groups A and B, and first part of Group C.
We present in this appendix the data obtained from the task-specific questionnaire. Group A and Group B received the same task-specific questionnaire. Group C received that questionnaire plus some additional questions about the provenance metadata; the answers to those additional questions can be found in Appendix F. All the participants’ data are presented together, and each participant's identifier indicates the group to which he or she belonged.
7.9.1. Topic of the text
In the following table, number 1 represents the lowest point of the predefined scale (total unfamiliar) and number six represents the highest point of the predefined scale (very familiar).
Participant   Topic
GA1 4
GA2 6
GA3 6
GA4 5
GA5 1
GA6 4
GA7 1
GB1 5
GB2 6
GB3 5
GB4 5
GB5 4
GB6 6
GC1 5
GC10 4
GC11 5
GC12 5
GC13 4
GC14 3
GC15 4
GC16 2
GC17 3
GC18 3
GC19 2
GC2 5
GC20 5
GC3 5
GC4 3
GC5 6
GC6 4
GC7 5
GC8 4
GC9 4
7.9.2. Experience working with Microsoft Office products
In the following table, number 1 represents the lowest point of the predefined scale (I have never worked with it) and number six represents the highest point of the predefined scale (I am an advanced user).
Participant   Excel   Word   PowerPoint   Access   Outlook
GA1 5 6 4 2 5
GA2 6 6 4 5 4
GA3 6 6 5 3
GA4 5 6 5 2 5
GA5 3 6 5 1 5
GA6 5 6 6 1 5
GA7 5 6 5 1 5
GB1 5 5 5 5 5
GB2 6 6 6 6 6
GB3 4 6 5 2 6
GB4 4 5 5 3 3
GB5 3 6 6 4 6
GB6 5 5 5 5 5
GC1 5 6 6 1 6
GC10 6 6 6 1 2
GC11 5 5 5 4 2
GC12 6 6 6 6 6
GC13 5 6 5 1 5
GC14 4 6 5 3 1
GC15 4 6 5 2 2
GC16 3 6 5 4 6
GC17 3 5 5 2 4
GC18 5 6 5 1 6
GC19 2 5 5 1 3
GC2 4 4 4 3 4
GC20 6 6 5 2 6
GC3 4 6 5 2 5
GC4 3 6 6 4 6
GC5 5 5 4 4 6
GC6 5 5 5 4 5
GC7 5 6 5 2 4
GC8 5 5 5 2 4
GC9 5 6 6 4 5
7.9.3. Linguistic difficulty
In the following table, number 1 represents the lowest point of the predefined scale (very difficult) and number six represents the highest point of the predefined scale (very easy).
Participant   Difficulty
GA1 5
GA2 5
GA3 5
GA4 4
GA5 3
GA6 4
GA7 2
GB1 5
GB2 6
GB3 3
GB4 5
GB5 5
GB6 5
GC1 5
GC10 5
GC11 5
GC12 6
GC13 5
GC14 4
GC15 5
GC16 4
GC17 5
GC18 4
GC19 3
GC2 4
GC20 6
GC3 4
GC4 5
GC5 5
GC6 4
GC7 5
GC8 5
GC9 5
7.9.4. Assistance
Participant   Assistance   Other
GA1 0
GA2 1
GA3 0 I checked the official translation every time in order to see if my translation contained the Microsoft Terminology.
GA4 0 I checked dictionaries and other resources at least 20 times, but I did not ask for help to anybody.
GA5 1
GA6 0
GA7 0
GB1 1 Connection lost once… not sure if that is part of assistance request. I had to reconnect to the remote server.
GB2 0
GB3 0
GB4 4
GB5 0
GB6 0
GC1 0
GC10 0
GC11 0
GC12 0
GC13 2
GC14 0
GC15 2
GC16 6
GC17 0
GC18 0
GC19 0
GC2 0
GC20 0
GC3 0
GC4 0
GC5 4
GC6 0
GC7 0
GC8 0 I didn't remember where to find some of the icons for the functions of Swordfish, but I just looked it the software’s help.
GC9 1
7.9.5. Doubts
Participant   Linguistic related   Technical related   CAT tool related   Experiment Instructions   Other
GA1 1
GA2 1
GA3 1 1 Terminological doubts. I had proposals, but I wanted to use the official MS terminology.
GA4 1
GA5 1
GA6 1
GA7 1
GB1 1 When I started the translation I wanted to know how to apply information but I did not need it.
GB2 1
GB3 1
GB4 1 1
GB5 1 1
GB6 1 1 1
GC1 1 0
GC10 1 1
GC11 1
GC12 1 1
GC13 1 1
GC14 1 1
GC15 1 1
GC16 1
GC17 1
GC18 1 1
GC19 1
GC2 1 1 0
GC20 1
GC3 1 1 0
GC4 1 1 I did not know how to change validated segments. I had difficulties in being able to make general searches for consistency purposes, so I opted for manual visual search. Linguistic: I was not familiar with two Excel-specific terms since I use that programme rarely and always in English. I trusted the suggested translation.
GC5 1 1 0
GC6 1 1
GC7 1 1
GC8 1
GC9 1
7.9.6. External resources
Participant   No   MT   Online Dictionaries   Ms Excel Webpage   Other
GA1 1 1
GA2 1
GA3 1
GA4 1
GA5 1 1
GA6 1 1
GA7 1 Proz and Linguee
GB1 1 Just the internal concordance search of the CAT tool.
GB2 1
GB3 Microsoft Linguistic Portal filtering by "Excel"
GB4 1 Google
GB5 1 1
GB6 1
GC1 1 0
GC10 Personal dictionary (few times).
GC11 1
GC12 1
GC13 1
GC14 1
GC15 1 Microsoft terminology
GC16 1
GC17 1
GC18 1
GC19 1
GC2 1 Microsoft Language Portal
GC20 1
GC3 1 0
GC4 1 0
GC5 1 0
GC6 1 Google search
GC7 1 1
GC8 1 1 Google image hits for terms
GC9 1
7.9.7. Additional Comments
Participant   Additional comments
GA1 N/a.
GA2 My connection to the remote desktop suffered from a heavy lag that slowed down my task to a great extend.
GA3 I don’t know if it was the aim of it, but although I could have done the translation without any external resources (I'm familiar with the subject and it was not very difficult), I preferred to check the official MS office website to check if my translations were conformant to the official terminology and translations of the company.
GA4 Some of the resources I checked were just to make sure that Microsoft uses the same terminology I was going to use.
GA5 N/a.
GA6 N/a.
GA7 N/a.
GB1 N/a.
GB2 Had some trouble in the beginning to find the necessary shortcuts to quickly populate the TM suggested translation. Found it at the end.
GB3 N/a.
GB4 N/a.
GB5 I found it a very interesting experience to translate a short document with software I had never used before, mainly because it showed me that the software I normally use has some deficiencies or shortcomings. As for the experiment, I think that I personally refuse to copy someone else's translation; I prefer to do it in my own personal style, although I must admit that TMs are an incredible help for translators because they save us a great deal of time in terminology searches. (Translated from Spanish.)
GB6 N/a.
GC1 No comments. The server was bit too slow sometimes, maybe it was my connection.
GC10 Metadata can be very useful associated to previous instructions from the customer. E.g. "all TMUs with code XXX are considered extremely reliable"
GC11 N/a.
GC12 N/a.
GC13 I had an issue with the last sentence which I think it was wrong. Fewer number of cells would mean less time for Excel to calculate, not more. I made an alteration to the source language. In a real situation I would have consulted the client to ask for more information or to ascertain the accuracy of the text. I always work on my office and the fact that my family were in the house at the time of the experiment has affected my concentration which would not have been the case in normal circumstances.
GC14 N/a.
GC15 When translating with Swordfish, I could not go from one segment to the next by using the arrows or the mouse. They would not open when clicking on them or trying to move from one to another. I had to go through the translation using ctrl+down arrow. At the end of the translation, I tried to go back to change the first segment, I could not open it again. I asked Lucía and we decided to stop the experiment there. i do not know what the exact problem was, but it made it difficult to change my translation after doing it for the first time.
GC16 N/a.
GC17 N/a.
GC18 I believe having metadata is very useful. Thinking about larger projects where the amount of information available within a database is very wide, having information that pinpoints to the right decisions may make your job easier and limit the amount of unpreferred terminology used. However, I can also see that if no clear information is provided about metadata at the beginning of a project, if this is a lot, it can confuse translators rather than help them. This would be to a major extent when dealing with large TMs where a lot of information is provided, besides risking having to spend additional time reading the notes for each new entry.
GC19 N/a.
GC2 I think that for the experiment to work out better, one would need to have a "practice" period of at least a couple of texts before working on the experiment. I think the use of the CAT tool is very important to get a real feel of how both the tool and metadata can influence our work and time spent on our work.
GC20 N/a.
GC3 N/a.
GC4 I found this experiment interesting. A suggestion: it is sometimes more useful for a translator to have several matches suggested by the programme so that translator can choose depending on text quality and metadata. If only one hit is offered, translator might tend to just copy the suggested match and improve it or change it as appropriate. good work!!
GC5 N/a.
GC6 The tool works very fast and is very useful. I like the number of options it offers.
GC7 N/a.
GC8 N/a.
GC9 N/a.
7.10. Appendix F – Answers to the Task Questionnaire. Second part of Group C
In this section we present the results from the second part of the task questionnaire, which only participants from Group C received. The questions in this section concern the use of the metadata, which was available only to Group C.
7.10.1. Most consulted metadata items
Participant   contact-name   date   target-language   original   category
GC1 1 1 1
GC2 1
GC3 1
GC4 1 1 1
GC5 1 1 1 1
GC6 1
GC7 1 1 1
GC8 1 1
GC9 1 1 1
GC10
GC11
GC12 1
GC13 1 1
GC14 1 1
GC15 1 1 1
GC16 1
GC17 1 1 1
GC18 1 1 1 1 1
GC19 1 1 1 1
GC20 1 1 1 1
Total 18 11 11 3 4
7.10.2. Most useful metadata item
Participant   contact-name   date   target-language   original   category
GC1 1 1
GC2 1
GC3 1
GC4 1
GC5 1 1
GC6 1
GC7 1 1
GC8
GC9 1
GC10
GC11
GC12 1
GC13 1 1
GC14 1
GC15 1 1
GC16 1
GC17 1 1 1
GC18 1 1 1 1 1
GC19 1
GC20 1 1
Total 13 4 8 2 2
7.10.3. Less useful metadata item
Participant   contact-name   date   target-language   original   category
GC1 1 1
GC2 1 1
GC3 1
GC4 1 1
GC5 1
GC6
GC7 1
GC8 1 1 1 1 1
GC9 1 1
GC10
GC11
GC12 1
GC13
GC14
GC15 1
GC16 1
GC17 1 1
GC18
GC19 1
GC20 1
Total 3 5 4 6 5
7.10.4. Not consulted metadata items
Participant   contact-name   date   target-language   original   category
GC1 1 1
GC2 1 1
GC3 1
GC4 1
GC5 1
GC6 1 1 1 1
GC7 1
GC8 1
GC9 1 1
GC10 1 1 1 1 1
GC11
GC12 1 1 1 1
GC13 1 1 1 1
GC14 1 1 1
GC15 1 1
GC16 1
GC17 1 1
GC18
GC19 1
GC20 1
Total 3 6 4 13 12
7.10.5. Could you explain in one sentence how has the metadata influenced your work?
Participant   Explain in one sentence how has the metadata influenced your work
GC1 I checked the validity of the version, reliability and Spanish variant by looking at the metadata.
GC2 Not much, as I had some experience translating Microsoft products. But I "respected" most of the provided TM segments as I assumed they were "approved", so I do not tend to change "approved" segments unless I find major mistakes in them.
GC3 If the contact name is "Antonio Garcia" as the official translator, the suggested translation is reliable.
GC4 Name did not influence that much since there were translation from the official translator that I thought could have been improved. Language variants is important only if translation are very old (2003-2004 or so)
GC5 Knowing that Antonio Garcia’s translation after 2008 was considered official it helped me maintain his style and choice of terminology even if I didn´t agree.
GC6 I didn’t influence my work to a large extent, I only associated the translator with how they had translated the segments.
GC7 When I saw that the contact name was Antonio García, I´d try to stick as much as possible to his translation. Otherwise, I´d try to give my best translation (having in mind the rest of the text). 2. If I had a target language other than ES-ES, I reviewed and though more my translation. Since this text was quite technical, the chances to find a localism were minimal (although the software could have several Spanish versions, so the names of the options, menus functions, etc. could differ). 3. I realised of the original field quite in the end. I think the last 2 fields can be very useful in a real work.
GC8 It usually do not influence my translation, as whoever translated it if I believe I should change the translation I do it. May be, if I do really trust a translation/translator I do not check the terms.
GC9 Only when I had a doubt with a word did I look at the metadata it[e]m "original", to see if it could help.
GC10 Absolutely nothing. Didn´t find any use to them.
GC11 n/a
GC12 Metadata were very useful
GC13 To confirm choices
GC14 To know if I can trust the proposed translation.
GC15 I considered that the translations of the approved translator (Antonio García), were quite official since they correspond to an official translation. However, I did not take his translations for granted, and sometimes I changed some things. I also used other translations by Spanish (es-ES) translators and by other translators (es-MX, es-CO, ....). When I see other varieties of Spanish other than es-ES I do not only look for differences within the context, but also for stylistic differences. Therefore, I read them more carefully, but I also trust them if I find out that they are right.
GC16 If I was not sure on how to write the sentence I consulted the metadata.
GC17 I've found it really useful when deciding whether or not to accept the proposed translation.
GC18 Approved translations conditioned my translation in a higher degree than not approved ones; target information allowed me to understand when I had the option to choose different terminology/syntax.
GC19 It helped me to make decisions
GC20 It helped me decide whether to use or not the already existing segment.
7.10.6. Would you suggest any other metadata item?
Participant   MD suggestion
GC1 No
GC2 n/a
GC3 No
GC4 "Translation validated by excel" (having the translation of the official translator does not necessarily mean that this translation is the final translation available on the markets now, it could be a previous unreviewed version.
GC5 No
GC6 Yes, further details on the source of the translation, e.g. Project name, client name...
GC7 The translation status: is it pending review or has already been reviewed by somebody else?
GC8 Preference/approval of the client/agency, mainly regarding terminology.
GC9 Suggest the translation of the term in other languages known by the translator.
GC10 Yes: value for the customer.
GC11 n/a
GC12 No
GC13 n/a
GC14 No
GC15 I cannot think of any in this moment.
GC16 n/a
GC17 I do think it is enough with "language combination", "date" and "name of the translator".
GC18 Not at this point
GC19 n/a
GC20 Antonio García is the "official" translator. This piece of information is critical, but is not a metadata item. I really do not care about the translator's name as long as I know he/she is trustworthy. The relevant information is not the name, but the degree of trustworthiness of the person who translated the segment. As a translator, I might have no idea who Antonio García is, which would render this metadata item useless. Of course I am aware of the difficulty of introducing a "trustworthiness degree" item.
7.10.7. If you could choose between having a translation memory with metadata or another one without metadata, which one would you prefer?
Two predefined answers were offered to the participants: Translation memory with metadata (represented by number 1 in the following table) and Translation memory without metadata (represented by number 2 in the following table).
Participant   Which one would you prefer?
GC1 1
GC2 1
GC3 1
GC4 1
GC5 1
GC6 1
GC7 1
GC8 1
GC9 1
GC10 2
GC11 2
GC12 1
GC13 1
GC14 1
GC15 1
GC16 1
GC17 1
GC18 1
GC19 1
GC20 1
7.10.8. How distracted were you by the metadata during the translation task?
A six-point scale was offered to the participants: number 1 represented the lowest point of the scale (I was confused by the amount of data) and number 6 represented the highest point (I was not distracted at all, it only helped me).
Participant   Distraction
GC1 6
GC2 3
GC3 4
GC4 3
GC5 6
GC6 6
GC7 6
GC8 5
GC9 5
GC10 6
GC11 5
GC12 6
GC13 5
GC14 5
GC15 5
GC16 5
GC17 6
GC18 5
GC19 6
GC20 6
7.10.9. Did the metadata influence you to do a better job?
We offered two predefined answers to this question: “yes” which is represented in the following table with the number one, and “no” which is represented in the following table with the number two.
Participant   Did the metadata influence you to do a better job?
GC1 1
GC2 2
GC3 1
GC4 2
GC5 1
GC6 1
GC7 1
GC8 2
GC9 2
GC10 2
GC11 2
GC12 1
GC13 2
GC14 2
GC15 1
GC16 1
GC17 1
GC18 1
GC19 1
GC20 1
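As an illustration of how the coded answers in this table can be read, the following minimal Python sketch tallies them. The values are transcribed from the table above (1 = "yes", 2 = "no"); the names `answers`, `LABELS` and `tally` are introduced here for illustration only and are not part of the study's tooling.

```python
from collections import Counter

# Coded answers transcribed from the table in 7.10.9 (1 = "yes", 2 = "no").
answers = {
    "GC1": 1, "GC2": 2, "GC3": 1, "GC4": 2, "GC5": 1,
    "GC6": 1, "GC7": 1, "GC8": 2, "GC9": 2, "GC10": 2,
    "GC11": 2, "GC12": 1, "GC13": 2, "GC14": 2, "GC15": 1,
    "GC16": 1, "GC17": 1, "GC18": 1, "GC19": 1, "GC20": 1,
}

# Mapping from numeric code to the predefined answer it stands for.
LABELS = {1: "yes", 2: "no"}


def tally(coded, labels):
    """Count how many participants chose each predefined answer."""
    return dict(Counter(labels[code] for code in coded.values()))


counts = tally(answers, LABELS)
print(counts)  # 12 participants answered "yes" and 8 answered "no"
```

The same function can be applied to any of the single-answer tables in this appendix by swapping in the corresponding code-to-label mapping.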
7.10.10. If yes, could you give us an example?
Participant   Examples
GC1 I preferred options by Antonio García, the official translator and made changes to the translations of other translators accordingly.
GC2 n/a
GC3 I had some doubts about the term "axis", but it has been translated as "eje" on translated segments by Antonio Garcia.
GC4 I said no because even if metadata influenced me, I would still check the actual text and make as many changes as needed. I think only used one segment directly suggested by the programme
GC5 As mentioned earlier, it helped me maintain consistency with official translation.
GC6 I tried to stick to the official translation (the one provided by Antonio García)
GC7 When I saw a Antonio Garcia’s segment, I didn´t have to worry about its accurateness. Also, when I saw that all of the translations were previous to 2011, I knew that I could update the word "sólo" to make it match the new RAE norm.
GC8 n/a
GC9 n/a
GC10 n/a
GC11 I think they haven’t changed during the whole translation.
GC12 Less time spared.
GC13 n/a
GC14 n/a
GC15 I have no concrete example, but when I know when the translation was done, in which variety of Spanish, by which translator, in which context,... It is easier to know if the translation applies to the text I am translating or not. I t may happen that the same sentence appears in tow text about different topics, if you know which text that translation comes from, you may quickly know if you can use it on your text or not.
GC16 I was giving an incorrect order in Spanish.
GC17 Knowing the name of the official translator, we provide a translation which is consistent with the previous official ones.
GC18 I wouldn’t consider it a better job, but it allowed me to be consistent with the terminology and style that have been previously approved, even though at times I would have chosen other solutions.
GC19 I'm not quite sure it did, but having information about the category helped me contextualize the different sentences.
GC20 I corrected a term base on the term that was used by the "official" translator.
7.10.11. Additional comments.
Participant Additional comments
GC1 No comments. The server was bit too slow sometimes, maybe it was my connection.
GC2 I think that for the experiment to work out better, one would need to have a "practice" period of at least a couple of texts before working on the experiment. I think the use of the CAT tool is very important to get a real feel of how both the tool and metadata can influence our work and time spent on our work.
GC3 n/a
GC4 I found this experiment interesting. A suggestion: it is sometimes more useful for a translator to have several matches suggested by the programme so that translator can choose depending on text quality and metadata. If only one hit is offered, translator might tend to just copy the suggested match and improve it or change it as appropriate. good work!!
GC5 n/a
GC6 The tool works very fast and is very useful. I like the number of options it offers.
GC7 n/a
GC8 n/a
GC9 I would be interested in knowing the final conclusions of the experiment. Thank you
GC10 Metadata can be very useful associated to previous instructions from the customer. E.g. "all TMUs with code XXX are considered extremely reliable"
GC11 n/a
GC12 n/a
GC13 I had an issue with the last sentence which I think it was wrong. Fewer number of cells would mean less time for Excel to calculate, not more. I made an alteration to the source language. In a real situation I would have consulted the client to ask for more information or to ascertain the accuracy of the text. I always work on my office and the fact that my family were in the house at the time of the experiment has affected my concentration which would not have been the case in normal circumstances.
GC14 n/a
GC15 When translating with Swordfish, I could not go from one segment to the next by using the arrows or the mouse. They would not open when clicking on them or trying to move from one to another. I had to go through the translation using ctrl+down arrow. At the end of the translation, I tried to go back to change the first segment, I could not open it again. I asked Lucía and we decided to stop the experiment there. i do not know what the exact problem was, but it made it difficult to change my translation after doing it for the first time.
GC16 n/a
GC17 n/a
GC18 I believe having metadata is very useful. Thinking about larger projects where the amount of information available within a database is very wide, having information that pinpoints to the right decisions may make your job easier and limit the amount of unpreferred terminology used. However, I can also see that if no clear information is provided about metadata at the beginning of a project, if this is a lot, it can confuse translators rather than help them. This would be to a major extent when dealing with large TMs where a lot of information is provided, besides risking having to spend additional time reading the notes for each new entry.
GC19 n/a
GC20 n/a
7.11. Appendix G – Translation Evaluation. Results
In this appendix we present the results of the translation evaluation carried out by an external reviewer using the LISA QA Model. The following table summarises the score received by each participant:
Participant Quality
GA1 -124
GA2 -56
GA3 -18
GA4 -19
GA5 -97
GA6 -124
GA7 -115
GB1 -44
GB2 -27
GB3 -39
GB4 -37
GB5 -85
GB6 -2
GC1 -12
GC10 -81
GC11 -23
GC12 -42
GC13 -109
GC14 -24
GC15 -40
GC16 -180
GC17 -16
GC18 -26
GC19 -3
GC2 -7
GC20 -12
GC3 -19
GC4 -87
GC5 -25
GC6 -3
GC7 -7
GC8 -47
GC9 -77
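As an illustration of how the summary scores can be aggregated, the following Python sketch averages them per group. It assumes the group is given by the first two characters of the participant identifier (GA, GB, GC); the helper name `group_means` is ours, and the mean is only one possible summary statistic.

```python
from collections import defaultdict

# Scores copied from the summary table above (participant -> quality score).
scores = {
    "GA1": -124, "GA2": -56, "GA3": -18, "GA4": -19, "GA5": -97,
    "GA6": -124, "GA7": -115,
    "GB1": -44, "GB2": -27, "GB3": -39, "GB4": -37, "GB5": -85, "GB6": -2,
    "GC1": -12, "GC10": -81, "GC11": -23, "GC12": -42, "GC13": -109,
    "GC14": -24, "GC15": -40, "GC16": -180, "GC17": -16, "GC18": -26,
    "GC19": -3, "GC2": -7, "GC20": -12, "GC3": -19, "GC4": -87,
    "GC5": -25, "GC6": -3, "GC7": -7, "GC8": -47, "GC9": -77,
}


def group_means(scores):
    """Average the scores per group, reading the group from the ID prefix."""
    by_group = defaultdict(list)
    for participant, score in scores.items():
        by_group[participant[:2]].append(score)
    return {group: sum(vals) / len(vals) for group, vals in sorted(by_group.items())}


means = group_means(scores)
for group, mean in means.items():
    print(f"{group}: mean quality score = {mean:.2f}")
```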
The following sections contain the evaluation sheet of each participant, organised by group:
7.11.1. Group A
7.11.2. GA1
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 4 4 2 66
Category total 4 4 2 66 1.5 -64.50
Accuracy Omissions 0 1 3 8
Additions 0 0 2 2
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 1 5 10 1 -9.00
Terminology Glossary adherence 0 5 5 30
Context 0 1 2 7
Category total 0 6 7 37 1 -36.00
Language Grammar 0 0 1 1
Semantics 0 1 5 10
Punctuation 0 0 1 1
Spelling 0 0 3 3
Category total 0 1 10 15 2 -13.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Section total 4 13 25 134 10 -56.00
Section points: -124.00
Section result: -1,240.00%
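The arithmetic behind a sheet can be reproduced from the figures printed in it. The sketch below recomputes the GA1 sheet in Python. The severity weights (A = 11, B = 5, C = 1) are an assumption inferred from the “Type tot” column of the sheets, not quoted from the LISA QA Model documentation, and the function names are ours. Each category result is the allowed points minus the category total, the section points are the sum of the category results, and the section result is the section points expressed as a percentage of the total allowed points.

```python
# Severity weights inferred from the figures on the sheets (an assumption,
# not quoted from the LISA QA Model documentation).
WEIGHTS = {"A": 11, "B": 5, "C": 1}


def type_total(a, b, c):
    """Weighted error points for one error type."""
    return a * WEIGHTS["A"] + b * WEIGHTS["B"] + c * WEIGHTS["C"]


# GA1 sheet: category -> (allowed points, [(A, B, C) counts per error type]).
GA1 = {
    "Mistranslation": (1.5, [(4, 4, 2)]),
    "Accuracy": (1.0, [(0, 1, 3), (0, 0, 2), (0, 0, 0), (0, 0, 0)]),
    "Terminology": (1.0, [(0, 5, 5), (0, 1, 2)]),
    "Language": (2.0, [(0, 0, 1), (0, 1, 5), (0, 0, 1), (0, 0, 3)]),
    "Style": (2.0, [(0, 0, 0), (0, 0, 0), (0, 0, 0)]),
    "Country": (1.0, [(0, 0, 1), (0, 0, 0), (0, 0, 0)]),
    "Consistency": (1.5, [(0, 1, 0)]),
}


def score_sheet(sheet):
    """Return (category results, section points, section result in percent)."""
    category_results = {}
    for category, (allowed, rows) in sheet.items():
        total = sum(type_total(*row) for row in rows)
        category_results[category] = allowed - total
    section_points = sum(category_results.values())
    allowed_total = sum(allowed for allowed, _ in sheet.values())
    section_result_pct = section_points * 100 / allowed_total
    return category_results, section_points, section_result_pct


category_results, section_points, section_result_pct = score_sheet(GA1)
print(category_results)
print(f"Section points: {section_points:.2f}")
print(f"Section result: {section_result_pct:,.2f}%")
```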
7.11.3. GA2
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 4 0 42
Category total 2 4 0 42 1.5 -40.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 3 8
Context 0 1 0 5
Category total 0 2 3 13 1 -12.00
Language Grammar 0 0 2 2
Semantics 0 0 1 1
Punctuation 0 0 0 0
Spelling 0 0 7 7
Category total 0 0 10 10 2 -8.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 2 6 14 66 10 -17.00
Section points: -56.00
Section result: -560.00%
7.11.4. GA3
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 2 1 11
Category total 0 2 1 11 1.5 -9.50
Accuracy Omissions 0 0 2 2
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 3 3 1 -2.00
Terminology Glossary adherence 0 1 0 5
Context 0 0 0 0
Category total 0 1 0 5 1 -4.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 0 3 3
Spelling 0 0 2 2
Category total 0 0 6 6 2 -4.00
Style General style 0 0 2 2
Register/Tone 0 0 1 1
Language variants/slang 0 0 0 0
Category total 0 0 3 3 2 -1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 0 3 13 28 10 -10.00
Section points: -18.00
Section result: -180.00%
7.11.5. GA4
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 1 6
Context 0 1 3 8
Category total 0 2 4 14 1 -13.00
Language Grammar 0 0 0 0
Semantics 0 0 1 1
Punctuation 0 0 1 1
Spelling 0 0 1 1
Category total 0 0 3 3 2 -1.00
Style General style 0 0 4 4
Register/Tone 0 0 1 1
Language variants/slang 0 0 0 0
Category total 0 0 5 5 2 -3.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 1 1
Category total 0 0 1 1 1.5 0.50
Section total 0 3 14 29 10 -16.00
Section points: -19.00
Section result: -190.00%
7.11.6. GA5
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 1 1 0 16
Category total 1 1 0 16 1.5 -14.50
Accuracy Omissions 0 2 1 11
Additions 1 0 0 11
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 1 2 1 22 1 -21.00
Terminology Glossary adherence 0 1 1 6
Context 0 4 1 21
Category total 0 5 2 27 1 -26.00
Language Grammar 0 1 1 6
Semantics 0 0 2 2
Punctuation 0 0 3 3
Spelling 0 3 9 24
Category total 0 4 15 35 2 -33.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 1 1 6
Category total 0 1 1 6 1.5 -4.50
Section total 2 13 20 107 10 -78.00
Section points: -97.00
Section result: -970.00%
7.11.7. GA6
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 8 1 1 94
Category total 8 1 1 94 1.5 -92.50
Accuracy Omissions 0 0 2 2
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 2 2 1 -1.00
Terminology Glossary adherence 0 3 2 17
Context 0 1 0 5
Category total 0 4 2 22 1 -21.00
Language Grammar 0 0 0 0
Semantics 0 1 2 7
Punctuation 0 0 0 0
Spelling 0 1 2 7
Category total 0 2 4 14 2 -12.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 8 7 11 134 10 -33.00
Section points: -124.00
Section result: -1,240.00%
7.11.8. GA7
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 3 1 1 39
Category total 3 1 1 39 1.5 -37.50
Accuracy Omissions 0 0 2 2
Additions 0 1 2 7
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 1 4 9 1 -8.00
Terminology Glossary adherence 0 2 4 14
Context 0 0 4 4
Category total 0 2 8 18 1 -17.00
Language Grammar 0 2 0 10
Semantics 0 1 5 10
Punctuation 0 0 5 5
Spelling 0 0 1 1
Category total 0 3 11 26 2 -24.00
Style General style 0 0 3 3
Register/Tone 0 0 2 2
Language variants/slang 0 0 0 0
Category total 0 0 5 5 2 -3.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 1 3 1 27
Category total 1 3 1 27 1.5 -25.50
Section total 4 10 31 125 10 -52.00
Section points: -115.00
Section result: -1,150.00%
7.11.9. Group B
7.11.9.1. GB1
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 1 1 0 16
Category total 1 1 0 16 1.5 -14.50
Accuracy Omissions 0 0 3 3
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 4 4 1 -3.00
Terminology Glossary adherence 0 1 7 12
Context 0 0 0 0
Category total 0 1 7 12 1 -11.00
Language Grammar 0 1 0 5
Semantics 0 2 0 10
Punctuation 0 0 0 0
Spelling 0 0 3 3
Category total 0 3 3 18 2 -16.00
Style General style 0 0 2 2
Register/Tone 0 0 1 1
Language variants/slang 0 0 0 0
Category total 0 0 3 3 2 -1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 0 1 1
Category total 0 0 1 1 1.5 0.50
Section total 1 5 18 54 10 -30.00
Section points: -44.00
Section result: -440.00%
7.11.9.2. GB2
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 1 6
Category total 0 1 1 6 1.5 -4.50
Accuracy Omissions 0 0 0 0
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 1 1 1 0.00
Terminology Glossary adherence 0 2 1 11
Context 0 0 0 0
Category total 0 2 1 11 1 -10.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 0 1 1
Spelling 0 0 3 3
Category total 0 0 5 5 2 -3.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 1 0 2 13
Category total 1 0 2 13 1.5 -11.50
Section total 1 3 11 37 10 -11.00
Section points: -27.00
Section result: -270.00%
7.11.9.3. GB3
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 1 1 0 16
Category total 1 1 0 16 1.5 -14.50
Accuracy Omissions 0 0 2 2
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 2 2 1 -1.00
Terminology Glossary adherence 2 0 3 25
Context 0 0 0 0
Category total 2 0 3 25 1 -24.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 0 2 2
Spelling 0 0 1 1
Category total 0 0 4 4 2 -2.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 3 1 11 49 10 -26.00
Section points: -39.00
Section result: -390.00%
7.11.9.4. GB4
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 1 0 27
Category total 2 1 0 27 1.5 -25.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 4 9
Context 0 1 1 6
Category total 0 2 5 15 1 -14.00
Language Grammar 0 0 0 0
Semantics 0 0 0 0
Punctuation 0 0 2 2
Spelling 0 0 2 2
Category total 0 0 4 4 2 -2.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 2 3 10 47 10 -13.00
Section points: -37.00
Section result: -370.00%
7.11.9.5. GB5
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 3 0 37
Category total 2 3 0 37 1.5 -35.50
Accuracy Omissions 0 1 1 6
Additions 0 1 0 5
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 2 1 11 1 -10.00
Terminology Glossary adherence 0 1 2 7
Context 0 1 2 7
Category total 0 2 4 14 1 -13.00
Language Grammar 0 2 3 13
Semantics 0 1 0 5
Punctuation 0 0 1 1
Spelling 1 0 0 11
Category total 1 3 4 30 2 -28.00
Style General style 0 0 2 2
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 2 2 2 0.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 3 10 12 95 10 -51.00
Section points: -85.00
Section result: -850.00%
7.11.9.6. GB6
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 1 6
Context 0 0 0 0
Category total 0 1 1 6 1 -5.00
Language Grammar 0 0 0 0
Semantics 0 0 0 0
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 0 0 0 2 2.00
Style General style 0 0 0 0
Register/Tone 0 0 1 1
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 0 2 2 12 10 0.00
Section points: -2.00
Section result: -20.00%
7.11.10. Group C
7.11.10.1. GC1
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 0 2 2
Category total 0 0 2 2 1.5 -0.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 2 7
Context 0 0 0 0
Category total 0 1 2 7 1 -6.00
Language Grammar 0 1 0 5
Semantics 0 0 0 0
Punctuation 0 0 2 2
Spelling 0 0 2 2
Category total 0 1 4 9 2 -7.00
Style General style 0 0 2 2
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 2 2 2 0.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 1 1
Category total 0 0 1 1 1.5 0.50
Section total 0 2 12 22 10 -12.00
Section points: -12.00
Section result: -120.00%
7.11.10.2. GC2
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 0 0 0
Context 0 1 0 5
Category total 0 1 0 5 1 -4.00
Language Grammar 0 0 0 0
Semantics 0 0 0 0
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 0 0 0 2 2.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Section total 0 3 2 17 10 0.00
Section points: -7.00
Section result: -70.00%
7.11.10.3. GC3
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 0 0
Additions 0 1 2 7
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 1 2 7 1 -6.00
Terminology Glossary adherence 0 2 0 10
Context 0 0 0 0
Category total 0 2 0 10 1 -9.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 0 1 1
Spelling 0 0 0 0
Category total 0 0 2 2 2 0.00
Style General style 0 0 3 3
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 3 3 2 -1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 1 1
Category total 0 0 1 1 1.5 0.50
Section total 0 4 9 29 10 -16.00
Section points: -19.00
Section result: -190.00%
7.11.10.4. GC4
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 4 2 2 56
Category total 4 2 2 56 1.5 -54.50
Accuracy Omissions 0 0 0 0
Additions 0 0 3 3
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 3 3 1 -2.00
Terminology Glossary adherence 0 2 0 10
Context 0 0 1 1
Category total 0 2 1 11 1 -10.00
Language Grammar 0 1 0 5
Semantics 0 0 2 2
Punctuation 0 0 0 0
Spelling 0 0 2 2
Category total 0 1 4 9 2 -7.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 1 1 1 17
Category total 1 1 1 17 1.5 -15.50
Section total 5 6 12 97 10 -17.00
Section points: -87.00
Section result: -870.00%
7.11.10.5. GC5
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 0 0 22
Category total 2 0 0 22 1.5 -20.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 0 5
Context 0 1 1 6
Category total 0 2 1 11 1 -10.00
Language Grammar 0 0 0 0
Semantics 0 0 1 1
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 0 1 1 2 1.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 2 2 3 35 10 -6.00
Section points: -25.00
Section result: -250.00%
7.11.10.6. GC6
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 1 6
Context 0 0 0 0
Category total 0 1 1 6 1 -5.00
Language Grammar 0 0 0 0
Semantics 0 0 1 1
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 0 1 1 2 1.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 0 2 3 13 10 -1.00
Section points: -3.00
Section result: -30.00%
7.11.10.7. GC7
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 1 1
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 1 1 1 0.00
Terminology Glossary adherence 0 1 1 6
Context 0 0 0 0
Category total 0 1 1 6 1 -5.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 0 2 2
Spelling 0 0 1 1
Category total 0 0 4 4 2 -2.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 0 2 7 17 10 -5.00
Section points: -7.00
Section result: -70.00%
7.11.10.8. GC8
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 1 0 27
Category total 2 1 0 27 1.5 -25.50
Accuracy Omissions 0 0 0 0
Additions 0 1 1 6
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 1 1 6 1 -5.00
Terminology Glossary adherence 0 2 1 11
Context 0 0 0 0
Category total 0 2 1 11 1 -10.00
Language Grammar 0 0 1 1
Semantics 0 1 0 5
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 1 1 6 2 -4.00
Style General style 0 0 3 3
Register/Tone 0 0 1 1
Language variants/slang 0 0 0 0
Category total 0 0 4 4 2 -2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 2 2
Category total 0 0 2 2 1.5 -0.50
Section total 2 5 10 57 10 -21.00
Section points: -47.00
Section result: -470.00%
7.11.10.9. GC9
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 6 0 0 66
Category total 6 0 0 66 1.5 -64.50
Accuracy Omissions 0 2 0 10
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 2 1 11 1 -10.00
Terminology Glossary adherence 0 1 1 6
Context 0 0 0 0
Category total 0 1 1 6 1 -5.00
Language Grammar 0 0 0 0
Semantics 0 0 0 0
Punctuation 0 0 1 1
Spelling 0 0 1 1
Category total 0 0 2 2 2 0.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 6 3 6 87 10 -14.00
Section points: -77.00
Section result: -770.00%
7.11.10.10. GC10
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 3 1 0 38
Category total 3 1 0 38 1.5 -36.50
Accuracy Omissions 0 0 0 0
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 1 1 1 0.00
Terminology Glossary adherence 0 2 3 13
Context 0 0 2 2
Category total 0 2 5 15 1 -14.00
Language Grammar 0 1 0 5
Semantics 1 2 2 23
Punctuation 0 0 1 1
Spelling 0 1 2 7
Category total 1 4 5 36 2 -34.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 4 7 12 91 10 -46.00
Section points: -81.00
Section result: -810.00%
7.11.10.11. GC11
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 0 0 22
Category total 2 0 0 22 1.5 -20.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 1 6
Context 0 0 0 0
Category total 0 1 1 6 1 -5.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 0 1 1
Spelling 0 0 2 2
Category total 0 0 4 4 2 -2.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 0 0
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 0 0 1 1.00
Consistency Consistency 0 0 1 1
Category total 0 0 1 1 1.5 0.50
Section total 2 1 6 33 10 -3.00
Section points: -23.00
Section result: -230.00%
7.11.10.12. GC12
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 3 0 1 34
Category total 3 0 1 34 1.5 -32.50
Accuracy Omissions 0 0 0 0
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 0 0 1 1.00
Terminology Glossary adherence 0 1 3 8
Context 0 0 3 3
Category total 0 1 6 11 1 -10.00
Language Grammar 0 0 0 0
Semantics 0 1 0 5
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 1 0 5 2 -3.00
Style General style 0 0 1 1
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 3 2 9 52 10 -11.00
Section points: -42.00
Section result: -420.00%
7.11.10.13. GC13
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 4 2 0 54
Category total 4 2 0 54 1.5 -52.50
Accuracy Omissions 0 0 4 4
Additions 1 0 1 12
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 1 0 5 16 1 -15.00
Terminology Glossary adherence 0 1 3 8
Context 0 2 2 12
Category total 0 3 5 20 1 -19.00
Language Grammar 0 0 1 1
Semantics 0 0 0 0
Punctuation 0 1 1 6
Spelling 0 1 9 14
Category total 0 2 11 21 2 -19.00
Style General style 0 0 2 2
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 2 2 2 0.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Section total 5 8 24 119 10 -53.00
Section points: -109.00
Section result: -1,090.00%
7.11.10.14. GC14
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 2 0 0 22
Category total 2 0 0 22 1.5 -20.50
Accuracy Omissions 0 0 0 0
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 1 1 1 0.00
Terminology Glossary adherence 0 1 2 7
Context 0 0 1 1
Category total 0 1 3 8 1 -7.00
Language Grammar 0 0 0 0
Semantics 0 0 0 0
Punctuation 0 0 1 1
Spelling 0 0 1 1
Category total 0 0 2 2 2 0.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 2 1 7 34 10 -5.00
Section points: -24.00
Section result: -240.00%
7.11.10.15. GC15
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 3 1 0 38
Category total 3 1 0 38 1.5 -36.50
Accuracy Omissions 0 0 0 0
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 1 1 1 0.00
Terminology Glossary adherence 0 1 2 7
Context 0 0 0 0
Category total 0 1 2 7 1 -6.00
Language Grammar 0 0 0 0
Semantics 0 0 1 1
Punctuation 0 0 1 1
Spelling 0 0 0 0
Category total 0 0 2 2 2 0.00
Style General style 0 0 0 0
Register/Tone 0 0 1 1
Language variants/slang 0 0 0 0
Category total 0 0 1 1 2 1.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 3 2 7 50 10 -5.00
Section points: -40.00
Section result: -400.00%
7.11.10.16. GC16
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 4 2 0 54
Category total 4 2 0 54 1.5 -52.50
Accuracy Omissions 1 1 0 16
Additions 0 0 0 0
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 1 1 0 16 1 -15.00
Terminology Glossary adherence 1 4 4 35
Context 0 1 4 9
Category total 1 5 8 44 1 -43.00
Language Grammar 0 2 1 11
Semantics 0 0 2 2
Punctuation 0 0 2 2
Spelling 0 0 2 2
Category total 0 2 7 17 2 -15.00
Style General style 0 0 1 1
Register/Tone 0 0 34 34
Language variants/slang 0 1 0 5
Category total 0 1 35 40 2 -38.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 1 1 2 18
Category total 1 1 2 18 1.5 -16.50
Section total 7 12 53 190 10 -111.00
Section points: -180.00
Section result: -1,800.00%
7.11.10.17. GC17
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 3 0 15
Category total 0 3 0 15 1.5 -13.50
Accuracy Omissions 0 0 0 0
Additions 0 0 2 2
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 2 2 1 -1.00
Terminology Glossary adherence 0 1 3 8
Context 0 0 0 0
Category total 0 1 3 8 1 -7.00
Language Grammar 0 0 0 0
Semantics 0 0 0 0
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 0 0 0 2 2.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 0 4 6 26 10 -4.00
Section points: -16.00
Section result: -160.00%
7.11.10.18. GC18
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 1 2 1 22
Category total 1 2 1 22 1.5 -20.50
Accuracy Omissions 0 0 0 0
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 1 1 1 0.00
Terminology Glossary adherence 0 1 1 6
Context 0 0 0 0
Category total 0 1 1 6 1 -5.00
Language Grammar 0 0 2 2
Semantics 0 0 0 0
Punctuation 0 0 1 1
Spelling 0 0 3 3
Category total 0 0 6 6 2 -4.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 1 3 10 36 10 -7.00
Section points: -26.00
Section result: -260.00%
7.11.10.19. GC19
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Accuracy Omissions 0 0 1 1
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 2 2 1 -1.00
Terminology Glossary adherence 0 1 0 5
Context 0 0 0 0
Category total 0 1 0 5 1 -4.00
Language Grammar 0 1 0 5
Semantics 0 0 0 0
Punctuation 0 0 0 0
Spelling 0 0 0 0
Category total 0 1 0 5 2 -3.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 0 0
Category total 0 0 0 0 1.5 1.50
Section total 0 2 3 13 10 -6.00
Section points: -3.00
Section result: -30.00%
7.11.10.20. GC20
Errors Found Error pts
Error category Error type A B C Type tot allowed Cat. res.
Mistranslation Mistranslation 0 1 0 5
Category total 0 1 0 5 1.5 -3.50
Accuracy Omissions 0 0 1 1
Additions 0 0 1 1
Cross-references 0 0 0 0
Headers/Footers 0 0 0 0
Category total 0 0 2 2 1 -1.00
Terminology Glossary adherence 0 1 3 8
Context 0 0 0 0
Category total 0 1 3 8 1 -7.00
Language Grammar 0 0 1 1
Semantics 0 0 1 1
Punctuation 0 0 0 0
Spelling 0 0 3 3
Category total 0 0 5 5 2 -3.00
Style General style 0 0 0 0
Register/Tone 0 0 0 0
Language variants/slang 0 0 0 0
Category total 0 0 0 0 2 2.00
Country Country standards 0 0 1 1
Local suitability 0 0 0 0
Company standards 0 0 0 0
Category total 0 0 1 1 1 0.00
Consistency Consistency 0 0 1 1
Category total 0 0 1 1 1.5 0.50
Section total 0 2 12 22 10 -9.00
Section points: -12.00
Section result: -120.00%
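The category and section results printed in the tables above are consistent with a simple subtraction of error points from the allowed points. The following Python sketch reproduces the GC20 figures under that reading. The function names are hypothetical, and the percentage formula (section points divided by the allowed points) is inferred from the printed values, not taken from the documentation of the QA model.

```python
def category_result(error_points: float, allowed: float) -> float:
    """Points left in a category after subtracting its error points."""
    return allowed - error_points


def section_result(total_error_points: float, allowed: float) -> tuple[float, float]:
    """Return (section points, section result in percent)."""
    points = allowed - total_error_points
    return points, points * 100 / allowed


# GC20, Terminology category: 8 error points against 1 allowed.
print(category_result(8, 1))    # -7
# GC20, section total: 22 error points against 10 allowed.
print(section_result(22, 10))   # (-12, -120.0)
```

The same two functions give -26.00 points and -260.00% for GC18 (36 error points) and -3.00 points and -30.00% for GC19 (13 error points), matching the printed section lines.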
7.12. Appendix H – Keylog Information
This appendix presents the keylog data obtained from the XML file. In the following table, “Keys” is the number of key presses that are letters or numbers; “Single Vkeys” is the number of key presses other than letters and numbers; “Multiple Vkeys” is the number of combinations of two or more keys pressed together; “Total Vkeys” is the sum of Single and Multiple Vkeys; and “Total Keys” is the sum of “Keys” and “Total Vkeys”.
Participant  Keys  Multiple Vkeys  Single Vkeys  Total Vkeys  Total Keys
GA1          3851  397             1667          2064         5915
GA2          3464  357             776           1133         4597
GA3          4123  377             1277          1654         5777
GA4          3894  0               1262          1262         5156
GA5          3522  78              993           1071         4593
GA6          3870  4               1581          1585         5455
GA7          4341  8               1523          1531         5872
GB1          787   242             1653          1896         2683
GB2          765   180             1275          1455         2220
GB3
GB4          3835  14              1128          1142         4977
GB5
GB6          751   219             1815          2034         2785
GC1          1646  553             3606          4159         5805
GC10         1997  58              657           715          2712
GC11         1397  44              4704          4748         6145
GC12         653   12              473           485          1138
GC13         2748  559             842           1401         4149
GC14         819   115             701           816          1635
GC15         1242  901             394           1295         2537
GC16         3251  71              1501          1578         4829
GC17         1273  69              401           470          1743
GC18         3680  9               1297          1306         4986
GC19         789   98              588           686          1475
GC2          561   105             175           280          841
GC20         2084  90              694           784          2868
GC3          1049  114             433           547          1596
GC4          2051  139             1191          1330         3381
GC5          750   784             275           1059         1809
GC6          825   270             808           1078         1903
GC7          1350  641             438           1079         2429
GC8          2072  1095            602           1697         3769
GC9          506   58              461           519          1025
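The two derived columns follow directly from the definitions given before the table: Total Vkeys is Single plus Multiple Vkeys, and Total Keys is Keys plus Total Vkeys. A minimal Python sketch of that check is given below; the `KeylogRow` type and the `inconsistent_rows` function are hypothetical names introduced only for illustration, and the three rows are copied from the table.

```python
from typing import NamedTuple


class KeylogRow(NamedTuple):
    participant: str
    keys: int
    multiple_vkeys: int
    single_vkeys: int
    total_vkeys: int
    total_keys: int


def inconsistent_rows(rows: list[KeylogRow]) -> list[str]:
    """Return participants whose printed totals do not match the column sums."""
    flagged = []
    for r in rows:
        vkeys_ok = r.single_vkeys + r.multiple_vkeys == r.total_vkeys
        keys_ok = r.keys + r.total_vkeys == r.total_keys
        if not (vkeys_ok and keys_ok):
            flagged.append(r.participant)
    return flagged


# Three rows copied from the table above.
rows = [
    KeylogRow("GA1", 3851, 397, 1667, 2064, 5915),
    KeylogRow("GA2", 3464, 357, 776, 1133, 4597),
    KeylogRow("GB1", 787, 242, 1653, 1896, 2683),
]
print(inconsistent_rows(rows))
```

For these sample rows, GA1 and GA2 pass both checks, whereas the Total Vkeys value printed for GB1 (1896) is one more than 242 + 1653 = 1895, so GB1 is flagged.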
7.13. Appendix I – Video Observations
In this appendix we present the time values observed in the video files. The data for all participants are presented in a single table with the following columns:
“Open Swordfish” indicates the moment when the participant opens the program Swordfish II.
“2 Segment” indicates the moment when the participant opens segment 2.
“35 Segment” indicates the moment when the participant finishes segment 35 and opens segment 36.
“Close Swordfish” indicates the moment when the participant closes Swordfish (see footnote 59).
“Total Time” is the time from opening Swordfish to closing it.
“2-35 Time” is the time from opening segment 2 to opening segment 36.
“Time Spent on 1st Segment” is the time from opening Swordfish to opening segment 2.
“Time Spent on Last Segment” is the time from opening segment 36 to closing the program.
Participant  Open Swordfish  2 Segment  35 Segment  Close Swordfish  Total Time  2-35 Time  Time Spent on 1st Segment  Time Spent on Last Segment
GA1          00:01:01        00:02:50   00:51:57    00:54:44         00:53:43    00:49:07   00:01:49                   00:02:47
GA2          00:01:18        00:08:04   00:56:05    00:59:06         00:57:48    00:48:01   00:06:46                   00:03:01
GA3          00:00:36        00:03:04   00:40:48    01:00:05         00:59:29    00:37:44   00:02:28                   00:19:17
GA4          00:01:13        00:07:22   00:55:33    01:00:06         00:58:53    00:48:11   00:06:09                   00:04:33
GA5          00:02:12        00:10:19   00:59:02    01:02:35         01:00:23    00:48:43   00:08:07                   00:03:33
GA6          00:00:07        00:02:01   00:26:17    00:26:49         00:26:42    00:24:16   00:01:54                   00:00:32
GB1          00:01:32        00:09:14   00:47:59    00:50:18         00:48:46    00:38:45   00:07:42                   00:02:19
GB2          00:01:24        00:05:04   00:34:27    00:38:01         00:36:37    00:29:23   00:03:40                   00:03:34
GB3          00:02:07        00:10:30   01:17:20    01:21:19         01:19:12    01:06:50   00:08:23                   00:03:59
GB4          00:00:14        00:00:14   00:50:22    01:28:43         01:28:29    00:50:08   00:00:00                   00:38:21
GB5          00:00:23        00:02:45   00:25:25    00:32:30         00:32:07    00:22:40   00:02:22                   00:07:05
GB6          00:04:50        00:08:51   00:36:59    00:38:08         00:33:18    00:28:08   00:04:01                   00:01:09
GC1          00:02:22        00:07:53   01:12:14    01:17:53         01:15:31    01:04:21   00:05:31                   00:05:39
GC10         00:01:46        00:05:02   01:11:16    01:23:36         01:21:50    01:06:14   00:03:16                   00:12:20
GC11         00:03:39        00:23:55   02:33:30    02:41:11         02:37:32    02:09:35   00:20:16                   00:07:41
GC12         00:02:24        00:05:53   00:42:49    00:48:14         00:45:50    00:36:56   00:03:29                   00:05:25
GC13         00:02:01        00:08:37   00:44:11    00:52:39         00:50:38    00:35:34   00:06:36                   00:08:28
GC14         00:02:11        00:08:07   01:08:31    01:14:43         01:12:32    01:00:24   00:05:56                   00:06:12
GC15         00:02:05        00:07:36   00:45:18    00:54:23         00:52:18    00:37:42   00:05:31                   00:09:05
GC16         00:01:48        00:10:10   01:12:25    01:15:29         01:13:41    01:02:15   00:08:22                   00:03:04
GC17         00:01:36        00:06:05   01:18:51    01:20:49         01:19:13    01:12:46   00:04:29                   00:01:58
GC18         00:02:36        00:07:26   00:34:30    00:35:31         00:32:55    00:27:04   00:04:50                   00:01:01
GC19         00:01:36        00:11:21   01:45:22    01:49:17         01:47:41    01:34:01   00:09:45                   00:03:55
GC2          00:03:19        00:07:46   00:50:49    00:56:52         00:53:33    00:43:03   00:04:27                   00:06:03
GC3          00:04:47        00:07:59   00:36:17    00:38:48         00:34:01    00:28:18   00:03:12                   00:02:31
GC4          00:02:43        00:05:25   00:32:12    00:37:08         00:34:25    00:26:47   00:02:42                   00:04:56
GC5          00:01:29        00:05:41   01:53:43    01:56:29         01:55:00    01:48:02   00:04:12                   00:02:46
GC6          00:01:46        00:03:42   00:52:08    00:55:13         00:53:27    00:48:26   00:01:56                   00:03:05
GC7          00:02:03        00:09:15   01:12:53    01:22:16         01:20:13    01:03:38   00:07:12                   00:09:23
GC8          00:00:54        00:06:55   00:42:17    00:43:24         00:42:30    00:35:22   00:06:01                   00:01:07
GC9          00:01:45        00:04:16   00:42:55    00:43:56         00:42:11    00:38:39   00:02:31                   00:01:01
59 In some cases participants did not close the program; in those cases we took the moment of minimising the application as the reference point.
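The four derived columns can be recomputed from the four observed timestamps using the column definitions given before the table. The Python sketch below illustrates this for participant GA1; the helper names `parse_hms`, `format_hms` and `derived_times` are hypothetical and are not part of the tooling used in the study.

```python
from datetime import timedelta


def parse_hms(value: str) -> timedelta:
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta: timedelta) -> str:
    """Format a non-negative timedelta as 'HH:MM:SS'."""
    total = int(delta.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def derived_times(open_sf: str, seg2: str, seg35: str, close_sf: str) -> dict[str, str]:
    """Compute the four derived columns from the four observed timestamps."""
    t_open, t_seg2, t_seg35, t_close = map(parse_hms, (open_sf, seg2, seg35, close_sf))
    return {
        "Total Time": format_hms(t_close - t_open),
        "2-35 Time": format_hms(t_seg35 - t_seg2),
        "Time Spent on 1st Segment": format_hms(t_seg2 - t_open),
        "Time Spent on Last Segment": format_hms(t_close - t_seg35),
    }


# Observed timestamps for participant GA1, copied from the table.
print(derived_times("00:01:01", "00:02:50", "00:51:57", "00:54:44"))
```

For GA1 this yields 00:53:43, 00:49:07, 00:01:49 and 00:02:47, which are the values printed in the table.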
7.14. Appendix J – Translation Text
7.14.1. Group A
7.14.2. Group B
7.14.3. Group C
7.15. Appendix K – Publications, conference presentations and other research related activities
7.15.1. Publications:
Morado Vázquez, L., Torres del Rey, J., (2012, forthcoming), The relevance of metadata during the localisation process – an experiment. T3L: Tradumática, Translation Technologies and Localization. The coming of age of Translation Technologies in Translation Studies, Autonomous University of Barcelona.
Morado Vázquez, L., Filip, D. (2012) XLIFF Support in CAT Tools, XLIFF Promotion and Liaison Subcommittee, OASIS.
Morado Vázquez, L. and Wolff, F. (2011) Bringing industry standards to Open Source localisers: a case study of Virtaal. In Tradumàtica, vol. 9, Programari lliure i traducció, pp. 74-83. Available at http://revistes.uab.cat/tradumatica/article/view/4/pdf.
Aouad, L., O’Keeffe I.R., Collins J.J., Wasala A., Nishio N., Morera A., Morado L., Ryan L., Gupta R., Schaler R. (2011) A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios. In Convergence and Hybrid Information Technology, 5th International Conference, ICHIT 2011, Daejeon, Korea, September 22-24, 2011 Proceedings. Communications in Computer and Information Science, vol. 206, pp. 371-382, Springer.
Wolff, F. (2011) La localización al servicio de un cambio sostenible, translated by Morado Vázquez, L. and Rodríguez Vázquez, S., Pretoria: Translate.org.za.
Morado Vázquez, L. and Lieske, C. (2010) First XLIFF Symposium. In MultiLingual, December Issue, p. 8.
Anastasiou, D. and Morado Vázquez, L. (2010) Localisation Standards and Metadata. In Metadata and Semantic Research, 4th International Conference, MTSR 2010 Proceedings. Communications in Computer and Information Science, vol. 108, pp. 255-276, Springer.
Anastasiou, D. and Morado Vázquez, L. (Eds.) (2010) The 1st XLIFF International Symposium Proceedings. Localisation Research Centre, CSIS, University of Limerick.
Morado Vázquez, L. and Mooney, S. (2010) XLIFF Phoenix and LMC Builder: Organising, capturing and using localisation data and metadata. In The Annual Conference Proceedings of the Localisation Research Centre, LRC XV Brave New World, pp. 16-17, CSIS, University of Limerick.
7.15.2. Conference presentations:
Morado Vázquez, L., Torres del Rey, J., Translation suggestions in XLIFF. How much metadata should be included? In 2nd International XLIFF Symposium. 28 September 2011. Warsaw (Poland).
Reynolds, P., Filip, D., Coady, S., Morado Vázquez, L., XLIFF 2.0 - What do we want? What do we need? In 2nd International XLIFF Symposium. 28 September 2011. Warsaw (Poland).
Morado Vázquez, L., Torres del Rey, J., The relevance of metadata during the localisation process – an experiment. In International T3L Conference: Tradumatica, Translation Technologies & Localization. 21-22 June 2011. Universitat Autònoma de Barcelona (Spain). Available at http://bit.ly/pcOS8r (min. 25)
Morado Vázquez, L., Defining localisation and its relations with translation studies. In II International Symposium for Young Researchers. 20 June 2011. Universitat Autònoma de Barcelona (Spain).
Morado Vázquez, L., Anastasiou, D., Exton, C., O'Keeffe, I. Web 2.0 and Localisation. In First International Workshop on Social Media Engagement (SOME 2011), 29 March 2011, Hyderabad (India). Full paper available at http://bit.ly/hnvdxk.
O Conchuir, E., Morado L., Wasala A., Morera A., Ryan L., Gupta R., Solas - the Service Oriented Architecture Solution. In Action Week for Global Information Sharing 2010, 6-7 December 2010, New Delhi (India).
Anastasiou, D. and Morado Vázquez, L., Localisation Standards and Metadata. 4th Metadata and Semantics Research Conference (MTSR 2010), 20-22 October 2010, Alcalá de Henares (Spain).
Morado Vázquez, L. and Mooney, S., XLIFF Phoenix and LMC Builder: Organising, capturing and using localisation data and metadata. In LRC XV Brave New World, The Future of Localisation Services, 22-24 September 2010, Limerick (Ireland). Abstract available at http://bit.ly/aWTVWd.
Morado Vázquez, L., Anastasiou, D. and Exton, C., Localisation into Galician: another form of cultural resistance. In III International Symposium - 'The Time Has Come: The Future Of Interdisciplinary Studies In Galiza And Eire', 22-23 January 2010, Cork (Ireland).
Morado Vázquez, L., Anastasiou, D. and Exton, C., Web 2.0: The great opportunity for Galician Language. In IX Congreso Internacional da Asociación Internacional de Estudos Galegos, 13-17 July 2009, Santiago de Compostela, Vigo and A Coruña (Spain). Available at https://uvigo.tv/gl/serial/583.html
7.15.3. Live Demos and Posters
Morera Mesa, A., Ryan, L., Nishio, N., Morado Vázquez, L., Wasala, A., Solas: A Localisation Platform, in CNGL public Localisation Innovation Showcase, 10 November 2010, Microsoft European Development Centre, Dublin (Ireland).
Morado Vázquez, L., XLIFF Phoenix. Live demo presentation in CNGL public Localisation Innovation Showcase, 10 November 2010, Microsoft European Development Centre, Dublin (Ireland).
Morado Vázquez, L., Mooney, S., XLIFF Phoenix and LMC Builder: Organising, capturing and using localisation data and metadata. Poster in LRC XV Brave New World, The Future of Localisation Services, 22-24 September 2010, Limerick (Ireland). Available on page 43 at http://bit.ly/9w3wob.
Morado Vázquez, L., Anastasiou, D., Exton, C., XLIFF Interoperability Challenges. Poster in CNGL Scientific Committee Meeting, 27-29 April 2010, Limerick (Ireland). Available at http://bit.ly/bFt8Zt.
Morado Vázquez, L., Torres del Rey, J., Collantes Fraile, C., Translation in the cloud, localisation over the rainbow. Poster in LRC XIV Localisation in the Cloud, 24-25 September 2009, Limerick (Ireland).
Collantes Fraile, C., Torres del Rey, J., Morado Vázquez, L., Powered by cloud localisation: disempowering the translator to empower the user? Poster in LRC XIV Localisation in the Cloud, 24-25 September 2009, Limerick (Ireland).
7.16. Appendix L – Call for participation
7.17. Appendix M – Participant Information Sheet
Participant Information Sheet
‘Influence of metadata in the localisation process’
INVITATION TO TAKE PART IN A RESEARCH STUDY
We are conducting an experiment to try to elucidate the influence of metadata in the localisation process. In this study we ask you to complete a translation task and two small questionnaires related to it.
PURPOSE OF THE RESEARCH STUDY
The purpose of this study is to better understand the influence of metadata in the localisation process and how it affects the decision-making process of translators.
TIME COMMITMENT
The study will take approximately 2 hours to complete in total.
TERMINATION OF PARTICIPATION
You may decide to withdraw from this research at any time; there are no penalties for withdrawing from the study.
VOLUNTARY PARTICIPATION
Participation in this study is voluntary.
BENEFITS
This study may make you more aware of how metadata influences your work as a translator and help the researcher better understand whether and how that metadata needs to be presented in future tool development. Participants will also benefit from a free webinar on Swordfish III (a Computer-Assisted Translation tool), and three full licences of that programme will be raffled among the participants.
RISKS
There are no known risks associated with this study. If you should at any time find yourself upset or uncomfortable, you may quit at any time.
CONFIDENTIALITY/ANONYMITY
· We will not collect any identifying information from you except your name and e-mail address, but we will ask you to report demographic information such as your age and gender. Your name and e-mail address will only be used for initial identification in relation to this study and no other; under no circumstances will these data be published or revealed.
· The data we collect may be published in scholarly journals but no individual participant or their data will be identified.
FOR FURTHER INFORMATION ABOUT THIS RESEARCH STUDY: contact Lucía Morado Vázquez, PhD student, on [email protected] or +353 61 2347 97.