Lossless Exchange of Data: Limitations of Localisation Data Exchange Standards and their Effect on Interoperability

A Thesis Submitted for the Degree of

Doctor of Philosophy

By

Ruwan Asanka Wasala

Department of Computer Science and Information Systems,

University of Limerick

Supervisors: Reinhard Schäler, Dr. Chris Exton

Submitted to the University of Limerick, May 2013

Abstract

Localisation is a relatively new field, one that is constantly changing with the latest technology to offer services that are cheaper, faster and of higher quality. As the complexity of localisation projects increases, interoperability between different localisation tools and technologies becomes ever more important. However, lack of interoperability is a significant problem in localisation. One important aspect of interoperability in localisation is the lossless exchange of data between different technologies and different stages of the overall process.

Standards play a key role in enabling interoperability. Several standards exist that address data-exchange interoperability. The XML Localisation Interchange File Format (XLIFF) has been developed by a Technical Committee of the Organization for the Advancement of Structured Information Standards (OASIS) and is an important standard for enabling interoperability in the localisation domain. It aims to enable the lossless exchange of localisation data and metadata between different technologies. With increased adoption, XLIFF is maturing. Solutions to significant issues relating to the current version of the XLIFF standard and its adoption are being proposed in the context of the development of XLIFF version 2. However, matters important for the successful adoption of the standard, such as standard-compliance, conformance and the interoperability of technologies claiming to support the standard, have not yet been adequately addressed. In this research, we aim to fill this gap by focusing on the identification of limitations of the XLIFF standard and of the implementations that lead to interoperability issues.

First, we conducted a pilot study to gain an in-depth understanding of the features of localisation data-exchange standards. The pilot study involved a systematic comparison of XLIFF and Localisation Content Exchange, Microsoft's internal localisation data-exchange format. Having gained a better understanding of the features of localisation data-exchange standards, in our main experiment we focused on identifying the limitations of XLIFF and its implementations. For this purpose we constructed the first large corpus of XLIFF files, which has enabled us to perform various statistical analyses of the real usage of features of the XLIFF specification. From this research, we could identify not only the limitations that are leading to interoperability issues, but also the features of the standard that are important for successful syntactic and semantic interoperability. In parallel to our main experiment, we designed and created a prototype based on a service-oriented architecture to investigate interoperability issues among the distributed localisation tools and technologies used in the localisation process.

As the main contribution of our thesis, we propose a framework called XML Data Interoperability Usage Evaluation (XML-DIUE), a systematic approach, based on our main experiment, to the identification of limitations of data-exchange standards and their implementations. The framework accomplishes this by helping standard developers to empirically identify the most important elements, the less important elements, problematic areas, usage patterns, associated tools, and so on. The proposed framework provides a practical approach to improving the interoperability of data-exchange formats.

Declaration

I hereby declare that this thesis is my own work and that it has not been submitted previously, to this or any other institute, for the award of any degree or other academic award.

Much of this research has been published as journal articles, conference proceedings and presentations. A full list of publications can be consulted in Appendix I.

Some of the appendices (Appendix B, Appendix C, and Appendix E) could not be made available to the public due to the requirements of confidentiality of content provided by Microsoft.

Signature: ______

Ruwan Asanka Wasala

Date: ______

Acknowledgements

The study was carried out under the supervision of Mr. Reinhard Schäler and Dr. Chris Exton of the University of Limerick, Ireland. I would also like to thank my previous supervisors, Dr. Ian O'Keeffe and Dr. Liam Murray, for their support. I owe my deepest gratitude to my supervisors for their invaluable guidance, time, advice and kindness. I consider myself very fortunate to have been able to work with such supportive and eminent scholars.

Thank you to Microsoft for the generous scholarship that has allowed me to undertake my Masters research. In particular, many thanks to Mr. Dag Schmidtke for arranging the scholarship, helping me to gain access to Microsoft localisation resources and guiding me throughout the research. I would also like to take this opportunity to thank the Microsoft staff who took time out of their busy schedules to help me with LCX programming and to answer my questions. I would also like to thank the CNGL industrial partner organisations that contributed to the XLIFF corpus. Thanks to Dr. Jim Buckley and Dr. David Filip for all their guidance, suggestions and feedback, especially on our main experiment. I am further indebted to Geraldine Exton, wife of Dr. Chris Exton, for proofreading the final version of the dissertation.

An enormous thank you to Mr. Karl Kelly and Ms. Geraldine Harrahill of the Localisation Research Centre for their immeasurable assistance during the course of this work. I would also like to thank all the post-doctoral scholars of the LRC for their help and their time. Further thanks should also go to all of my colleagues at the LRC PhD lab.

I would like to extend my thanks to the CNGL staff for allowing me to attend various meetings, training sessions and workshops. The knowledge I gained from those events has immensely assisted my study.

Last but not least, I thank my gorgeous daughter, Siluni, who brought laughter and relieved the stress of my PhD life. I am very thankful to my parents, my wonderful wife, and my brother for their continuous encouragement, love, and cheerful support of all my academic activities.

The PhD research was supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at the University of Limerick, Limerick, Ireland.

Table of Contents

Abstract ...... i

Declaration ...... iii

Acknowledgements ...... iv

Table of Contents ...... vi

List of Tables ...... xii

List of Figures ...... xiii

List of Appendices ...... xv

List of Abbreviations ...... xvi

Chapter 1: Introduction ...... 1

1.1 Research Introduction ...... 1

1.1.1 Interoperability ...... 2

1.1.2 Localisation ...... 3

1.1.3 Interoperability in Localisation ...... 4

1.2 Research Questions, Objectives and Scope ...... 7

1.3 Methodology ...... 8

1.3.1 Selection of a Standard ...... 9

1.3.2 Pilot Study: Data-Exchange Format Feature Analysis and Comparison .... 10

1.3.3 Development of the XML Data Interoperability Usage Evaluation (XML- DIUE) Framework ...... 10

1.3.4 Development of LocConnect: an Interoperability Test-bed ...... 11

1.4 Research Contributions ...... 12

1.5 Outline of the Thesis ...... 14

Chapter 2: Literature Review ...... 16

2.1 Introduction ...... 16

2.2 Approaches to Solving Interoperability Issues ...... 18

2.3 Levels of Interoperability ...... 19

2.3.1 Syntactic Interoperability ...... 20

2.3.2 Semantic Interoperability ...... 22

2.4 Industry Domain Specific Interoperability Models ...... 24

2.4.1 Healthcare ...... 24

2.5 Service Oriented Architecture (SOA)...... 27

2.6 Standards ...... 28

2.6.1 Defining Standards ...... 29

2.6.2 Different Types of Standards ...... 30

2.6.3 Standards and Interoperability ...... 34

2.7 Conformance, Interoperability and Certification ...... 39

2.7.1 Conformance Testing...... 40

2.7.2 Interoperability Testing ...... 42

2.7.3 Certification...... 44

2.8 Localisation Domain ...... 45

2.8.1 Open Standards in Localisation ...... 46

2.8.2 Localisation Standards and Interoperability Issues ...... 55

2.8.3 Evaluation of the XLIFF Standard ...... 57

2.8.4 Existing Solutions ...... 62

2.9 Conclusions ...... 68

Chapter 3: Data Exchange Format Feature Analysis and Comparison...... 70

3.1 Introduction ...... 70

3.1.1 Localisation Content Exchange File Format ...... 71

3.2 Pilot Study Design ...... 71

3.2.1 Data Collection ...... 71

3.2.2 Methodology ...... 72

3.3 Document Structures of XLIFF and LCX ...... 72

3.3.1 Document Structure of XLIFF ...... 72

3.3.2 Document Structure of LCX ...... 77

3.4 XLIFF and LCX Usage ...... 80

3.4.1 XLIFF Usage ...... 80

3.4.2 LCX Usage ...... 82

3.5 XLIFF and LCX Feature Analysis and Comparison ...... 83

3.5.1 Advantages and Unique Features of XLIFF ...... 85

3.5.2 Disadvantages of XLIFF ...... 85

3.5.3 Advantages and Unique Features of LCX ...... 88

3.5.4 Disadvantages of LCX ...... 89

3.6 XLIFF and LCX Schema Mapping ...... 90

3.6.1 Prototype Implementation ...... 93

3.6.2 Comparison with an Alternative Schema ...... 95

3.7 Conclusions and Future Work ...... 96

3.7.1 Proposed improvements to XLIFF and LCX formats ...... 97

3.7.2 Interoperability Issues Associated with XLIFF ...... 98

3.7.3 Limitations and Future Work ...... 100

Chapter 4: Development of the XML Data Interoperability Usage Evaluation (XML-DIUE) Framework ...... 102

4.1 Introduction ...... 102

4.1.1 Related Work...... 103

4.2 The XML-DIUE Framework ...... 105

4.3 Methodology ...... 107

4.3.1 Construction of XLIFF Corpus ...... 107

4.3.2 Data Exaction and Construction of a Database ...... 110

4.3.3 Standard Usage-Analysis ...... 112

4.4 Standard Usage Metrics ...... 112

4.4.1 Validation Errors ...... 115

4.4.2 Most Commonly Used Features Across Organisations ...... 116

4.4.3 Least Frequently Used Features ...... 118

4.4.4 Custom Features and Frequently Added Extensions ...... 119

4.4.5 Feature Usage Patterns...... 120

4.4.6 Most Frequently Used Features of Individual Organisations ...... 120

4.4.7 Features and their Values ...... 121

4.5 Analysis of the XLIFF Corpus ...... 122

4.6 Results...... 123

4.6.1 Core Useful Features of the Standard ...... 124

4.6.2 Attributes and Attribute Values that can cause Interoperability Issues .... 125

4.6.3 Tool Support for XLIFF Elements vs. Frequency of Element Use ...... 126

4.6.4 Simplification of the standard ...... 127

4.6.5 Extensions Used ...... 132

4.6.6 Tools Involved ...... 133

4.6.7 Validation Results ...... 134

4.6.8 Feature Usage Patterns...... 135

4.7 Analysis of the Results ...... 137

4.7.1 Core Useful Features of the Standard ...... 137

4.7.2 Attributes and Attribute Values that can Cause Interoperability Issues ... 138

4.7.3 Tool Support for XLIFF elements vs. Frequency of Element Use ...... 140

4.7.4 Simplification of the Standard...... 141

4.7.5 Extensions Used ...... 144

4.7.6 Tools Involved ...... 144

4.7.7 Validation Results ...... 144

4.7.8 Feature Usage Patterns...... 145

4.8 Discussion ...... 146

4.8.1 Addressing Existing Issues with XML-DIUE...... 146

4.8.2 Limitations of the Current Framework ...... 149

4.9 Conclusions ...... 152

4.10 Using XML-DIUE to Evaluate other XML based Data-Exchange Standards 153

4.10.1 Building a Corpus ...... 153

4.10.2 Cleaning and Pre-processing of the Corpus ...... 153

4.10.3 Data Extraction and Repository Construction ...... 154

4.10.4 Designing of the Standard Usage Analysis Metrics ...... 154

4.10.5 Analysis and Reporting ...... 154

Chapter 5: Development of LocConnect – an Interoperability Test-bed ...... 156

5.1 Introduction ...... 156

5.1.1 Design and Creation ...... 156

5.2 Awareness ...... 157

5.2.1 Open APIs ...... 157

5.2.2 Localisation and Web-Services ...... 158

5.2.3 Beyond Interoperability Standards ...... 159

5.3 Suggestion ...... 159

5.3.1 Objectives and Scope of the Research ...... 159

5.3.2 Distributed Localisation: Ideal Scenario ...... 160

5.3.3 Experimental Setup...... 160

5.4 XLIFF Data Container ...... 162

5.5 Evaluation ...... 166

5.6 Conclusions and Future Work ...... 167

5.6.1 Limitations and Future Work ...... 169

5.6.2 Proposed Improvements to the XLIFF based Data Container and New Architecture ...... 173

Chapter 6: Conclusions and Future Work ...... 176

6.1 Research Objectives ...... 176

6.2 Conclusions and Recommendations ...... 177

6.2.1 Main Conclusion: The Utility of the XML-DIUE Framework ...... 177

6.2.2 Other Conclusions and Recommendations ...... 179

6.3 Research Contributions ...... 185

6.3.1 Resources Produced by this Research ...... 188

6.4 Impact of the Research ...... 189

6.5 Limitations of the Research ...... 191

6.7 Future Work ...... 192

6.7.1 Improvements to the XML-DIUE Framework...... 193

6.7.2 Other Possibilities ...... 196

6.8 Final Remarks ...... 197

Bibliography ...... 198

List of Tables

Table 1. Summary of the XLIFF-LCX File Format Comparison ...... 84
Table 2. File Sizes and trans-unit Counts of Individual Sources ...... 110
Table 3. Structure of the Validate Table ...... 111
Table 4. Structure of the Tag Table ...... 111
Table 5. Structure of the Attribute Table ...... 111
Table 6. Structure of the Children Table ...... 112
Table 7. XML-DIUE Usage Metrics ...... 114
Table 8. Core Features of the XLIFF Standard ...... 124
Table 9. A Few Example Attributes and their Values ...... 125
Table 10. Least Frequently used Elements ...... 129
Table 11. The Least Frequently Used Attributes ...... 132
Table 12. Elements and Attributes Found in the Minimal XLIFF Document ...... 137
Table F1. Attributes and their Typical Values ...... 221

List of Figures

Figure 1. The Research Scope ...... 16
Figure 2. The Research Scope – Interoperability ...... 18
Figure 3. The Research Scope – Standards ...... 28
Figure 4. The Research Scope – Interoperability and Standards ...... 34
Figure 5. The Research Scope – Localisation Domain ...... 45
Figure 6. The Research Scope – Localisation Standards ...... 46
Figure 7. The Research Scope – Localisation, Standards and Interoperability ...... 55
Figure 8. Segmentation Support in XLIFF ...... 76
Figure 9. Virtual Header and Body Sections of LCX ...... 77
Figure 10. Graphical Visualisation of the LCX Format's Element ...... 80
Figure 11. A Simple XLIFF-based Localisation Workflow ...... 81
Figure 12. Basic Localisation Workflow at Microsoft ...... 82
Figure 13. LCX-XLIFF as the Intermediate Data Container ...... 90
Figure 14. Sample UI Resource ...... 92
Figure 15. Typical Localisation Workflow with LCX-XLIFF Converter ...... 94
Figure 16. The Parser-Repository-Viewer Architecture Proposed for XML-DIUE ...... 106
Figure 17. Commonly Used Features ...... 117
Figure 18. Plot of log (Relative Usage Frequencies) of XLIFF Elements ...... 126
Figure 19. XLIFF Feature Distribution in Files and Organisations ...... 127
Figure 20. Validation Results ...... 134
Figure 21. The Graph of Validation Result vs log(trans-unit Count) ...... 135
Figure 22. Pattern Derived for Eclipse ...... 136
Figure 23. Analysis of XLIFF Editors ...... 140
Figure 24. Analysis of XLIFF Generators ...... 140
Figure 25. XLIFF-Based Data Container ...... 166
Figure 26. Improved Data Container ...... 175
Figure D1. Screenshot of the LCX-XLIFF ↔ XLIFF File Converter ...... 212
Figure D2. XSLT Configuration Dialog ...... 213
Figure G1. Pattern Derived for Eclipse ...... 222
Figure G2. Pattern Derived for Company A ...... 223
Figure G3. Pattern Derived for Company B ...... 224
Figure G4. Pattern Derived for Company C ...... 225
Figure H1. Three-Tier Architecture of LocConnect ...... 226
Figure H2. LocConnect Project Creation Interface ...... 228
Figure H3. Project Tracking UI ...... 229
Figure H4. Post-Editing Interface ...... 230
Figure H5. Administrator's Interface ...... 231

List of Appendices

Appendix A: The Localisation Research Centre (LRC) and the Centre for Next Generation Localisation (CNGL) ...... 211
Appendix D: File Format Conversion Tool User Guide ...... 212
Appendix F: Attributes and their Values ...... 214
Appendix G: XLIFF Usage Patterns Derived for Individual Organisations ...... 222
Appendix H: LocConnect: System Architecture, Features, API and Installation Guide ...... 226
Appendix I: Publications, Presentations, Posters and Achievements ...... 265

List of Abbreviations

API Application Programming Interface

BOM Byte Order Marker

BPEL Business Process Execution Language

BPML Business Process Modelling Language

CAT Computer Aided Translation

CDATA Character Data

CLDR Common Locale Data Repository

CNGL Centre for Next Generation Localisation

CSIS Department of Computer Science and Information Systems

EDC European Development Centre

ETSI European Telecommunications Standards Institute

EU European Union

GALA Globalization and Localization Association

GILT Globalisation, Internationalisation, Localisation and Translation

GMX GILT – Metrics eXchange

GUI Graphical User Interface

EHR Electronic Health Records

HTML HyperText Markup Language

IEEE Institute of Electrical and Electronics Engineers

ISO International Organization for Standardization

IT Information Technology

ITS Internationalization Tag Set

L10N Localisation

LCX Localisation Content Exchange

LISA Localization Industry Standards Association

LRC Localisation Research Centre

LSP Localisation Service Provider

MLIF Multilingual Information Framework

MT Machine Translation

NLP Natural Language Processing

OASIS Organization for the Advancement of Structured Information Standards

OAXAL Open Architecture for XML Authoring and Localization

ODF Open Document Format

OLIF Open Lexicon Interchange Format

OM Object Model

OOXML Office Open XML

OSCAR Open Standards for Container/Content Allowing Re-use

OWL Web Ontology Language

PO Portable Object

QoS Quality of Service

RDF Resource Description Framework

SOA Service Oriented Architecture

SOAP Simple Object Access Protocol

SOLAS Service Oriented Localisation Architecture Solution

SQL Structured Query Language

SRX Segmentation Rules eXchange

SSO Standard Setting Organisation

TAUS Translation Automation User Society

TBX TermBase Exchange

TC Technical Committee

TIPP Translation Interoperability Protocol Package

TM Translation Memory

TMS Translation Management Systems

TMX Translation Memory eXchange

Trans-WS Translation Web Services

UDDI Universal Description Discovery and Integration

UI User Interface

UTF-8 Universal Character Set (UCS) Transformation Format-8

W3C World Wide Web Consortium

WSDL Web Service Description Language

WSMO Web Service Modelling Ontology

XHTML Extensible HyperText Markup Language

XLIFF XML Localisation Interchange File Format

XML Extensible Markup Language

xml:tm XML-based Text Memory

XML-DIUE XML Data Interoperability Usage Evaluation

XSL Extensible Stylesheet Language

XSLT Extensible Stylesheet Language Transformations

YAWL Yet Another Workflow Language

Chapter 1: Introduction

1.1 Research Introduction

Our research is mainly concentrated in three areas: localisation, interoperability and standards. The research focuses on the interoperability of localisation data. It aims to address significant aspects of data interoperability, mainly by identifying limitations of existing localisation data-exchange standards and tools and by identifying effective ways of using data-exchange standards to facilitate interoperability across tools. The research also aims to explore successful interoperability frameworks in various domains (including the localisation domain), and to draw on important concepts from those frameworks, as well as from accepted standards, in order to present a practical interoperability framework for localisation-related standard improvement.

At the time of writing this thesis, we have come to a crucial stage where discussions related to localisation standards and interoperability issues are gaining greater attention. For example, hardly a localisation-related conference takes place today that does not address interoperability and localisation standards. Therefore, we believe our research is timely in addressing important questions that are extremely relevant not only in the current context, but also in future contexts such as emerging distributed localisation processes.

The research was carried out at the Localisation Research Centre (LRC) of the Department of Computer Science and Information Systems (CSIS) of the University of Limerick (UL) Ireland. The research was supported by Microsoft Corp. (Ireland) and the Centre for Next Generation Localisation (CNGL) project (see Appendix A for a brief introduction to the LRC and the CNGL project).

The rest of the chapter is organised as follows. Sects. 1.1.1-1.1.3 provide an introduction to the main areas of research and subsequently the research problem is framed. In Sect. 1.2, the research questions, research objectives and scope are presented. The research methodology is then described in detail in Sect. 1.3. In Sect. 1.4, the main contributions of the research are summarised. Finally, the chapter concludes with an outline of the thesis structure (Sect. 1.5), summarising the individual chapters of the thesis.

1.1.1 Interoperability

"Interoperability" is a term dating back to the early 1970s1. The term has become most closely associated with software engineering. While there are many definitions of the term 'interoperability', the IEEE definition is generally accepted (Ibrahim and Hassan 2010; Naudet et al 2010; Waters et al 2009). This definition is as follows:

interoperability. The ability of two or more systems or components to exchange information and to use the information that has been exchanged. (IEEE 1991)

Lewis et al (2008) defined interoperability slightly differently from the above, as "The ability of a collection of communicating entities to (a) share specified information and (b) operate on that information according to an agreed operational semantics." For Naudet et al (2010), interoperability is a problem that can arise when "two or more incompatible systems are put in a relation." Almeida et al (2010) describe interoperability as the ability of software to co-exist with competing products and connect with other software. Following the definition of Powers (2006, cited in Waters et al 2009), we conceive interoperability as:

the ability to reuse data from another information system without any intermediate transformation and human intervention.

Interoperability is becoming increasingly important in heterogeneous environments (Almeida et al 2010). Interoperability facilitates the integration of different entities such as tools, businesses, technologies and processes. Standards play a prominent role in facilitating interoperability by providing agreed notations, rules, guidelines and recommendations for the storage and exchange of data. Tools that conform to the same standard can therefore interoperate with each other. Data-exchange standards that are based on the Extensible Mark-up Language (XML) are becoming ever more pervasive (Li and Li 2004; Savourel 2001, pp.7-17; Folmer and Verhoosel 2011, p.15), and they can be categorised as either open or proprietary. Examples of popular XML-based open data-exchange standards (also known as open file formats) include XHTML, DocBook and Office Open XML (OOXML). However, the definition of such standards is an arduous, time-consuming process due to the constantly evolving nature of technologies, businesses and tools (Ray 2009). That is, standards need to be constantly reviewed and updated to cope with changing requirements, typically across multiple organisations.

1 http://www.etymonline.com/index.php?search=Interoperability [accessed 15 Aug 2011]

In our research, we mainly focus on the interoperability issues related to localisation data-exchange standards. In the next section, we introduce the localisation domain, highlighting the problem and the significance of the research.

1.1.2 Localisation

The term 'localisation' has been defined as the

linguistic and cultural adaptation of digital content to the requirements and locale of a foreign market, and the provision of services and technologies for the management of multilingualism across the digital global information flow. (Schäler 2009)

As the definition suggests, localisation is a complex process. Localisation is a relatively new field in computer science and is closely related to fields such as Machine Translation (MT), Natural Language Processing (NLP) and computational linguistics. Localisation is different from translation due to its focus on the non-linguistic, cultural aspects of products (Lommel 2006). Localisation involves many processes: project management, translation, review and quality assurance (Esselink 2000, pp.3-24; Savourel 2001, p.9). Localisation requires a large effort, as it involves many languages and entails dealing with characteristics and challenges unique to different languages, such as handling right-to-left languages, collation and locale-specific issues.

Time-frame is another parameter that affects the complexity of localisation processes. Localisation processes require dealing with frequent software updates, short software development life cycles and simultaneous shipments (simship). Van-Genabith (2009) describes three global challenges to localisation: Volume, Access and Personalisation. The volume of content to be localised keeps on growing while the access to content is shifting towards mobile and speech-based innovations. Personalisation is gaining increased attention to further enhance the user experience: “the person is the ultimate locale” (Van-Genabith 2009). Different software is used to support the localisation process ranging from project management software to translation software.

A large number of file formats are encountered during the localisation process. These file formats may include both open and proprietary formats. Localisation processes also involve different organisations (e.g. translation providers and localisation providers) and different professions (e.g. translators, reviewers, and linguists). Localisers are confronted by new challenges such as the requirement to localise mobile device content and to integrate with content management systems, especially in the context of rapidly changing web content. In this extremely complex process of localisation, the ultimate goal is to deliver localised content at an adequate level of quality (i.e. quality of translation, functionality, layout of user interfaces) and quantity (i.e. in terms of volume, the number of languages and locales) while minimising the time and the overall cost. In order to achieve this goal, the successful integration of the different entities involved in localisation, such as software, platforms, file formats, organisations and professionals, is essential. Interoperability is the key to successful integration.

1.1.3 Interoperability in Localisation

As already established, localisation processes are highly complex. They can involve thousands of files in dozens of different and ever-evolving formats, cover a hundred languages, and involve translation and engineering teams from around the world. The increasing time and cost pressure put on localisation teams by their internal and external clients can only be met by a whole ecosystem of tools and technologies. Different projects, languages and teams require tools and technologies chosen specifically for their particular requirements. These tools and technologies need to be interoperable and need to work on the same data, sent by localisers (or their corresponding electronic 'agents') along a localisation process represented essentially by connected component technologies.

Localisation Standards and Interoperability Issues

According to a recent survey2 by TAUS (2011a),

Thirty-seven percent of the 111 respondents to our survey think that the lack of interoperability costs their business more than 10% of their total translation budget (or revenue in case of the service providers). Twenty-five percent say it costs them more than 20%. Only nine percent think it costs them less than 5% of their translation budgets. Forty-three percent of the respondents don't know exactly how much it costs them. (TAUS 2011a)

2 http://www.surveymonkey.com/s/TAUS-interoperability-survey-Q1-2011 [accessed 26 Apr 2013]

Respondents to this survey were predominantly language service providers (41.8%), smaller buyers of translation (19.1%), large buyers of translation (10.9%), translators (8.2%), language technology providers (7.3%) and others (12.7%). According to these respondents, the main cause of the lack of interoperability is the "lack of compliance to interchange format standards" (TAUS 2011a). Although the adoption of localisation standards would bring benefits such as reusability, accessibility, interoperability and reduced cost to localisers, software publishers either refrain from the full implementation of standards (Bly 2010) or do not pay attention to conformance testing of their products against standards (Lieske 2011), due to a lack of evidence for improved outcomes and the associated high costs (Kindrick et al 1996). One of the biggest problems existing today with regard to software is pair-wise product drift (Kindrick et al 1996), i.e. the output of one piece of software needs to be transformed in order to compensate for another's non-conforming behaviour. This problem is commonly seen in localisation software too (for example, consider a scenario where one tool implements all nine inline elements defined in the XLIFF3 standard specification, whereas another tool implements only a limited number of them, i.e. fewer than nine inline elements; to enable successful interoperability between these tools, mappings have to be created between the inline elements supported by both systems). This problem occurs especially due to the lack of adoption of localisation standards as well as the incomplete implementation of standards by vendors (Bly 2010; Lieske 2011; Vackier 2010).
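To make the inline-element scenario above concrete, the following is a minimal, purely illustrative sketch of how such a mapping might be expressed. The element names are the nine inline elements defined by XLIFF 1.2; the subset supported by the second tool, and the mapping itself, are assumptions made for this example rather than a description of any real tool.

# Illustrative sketch only: "tool B" and its supported subset are hypothetical.
import xml.etree.ElementTree as ET

TOOL_B_SUPPORTED = {"g", "x", "ph"}          # hypothetical subset supported by tool B
INLINE_MAP = {
    "bpt": "ph", "ept": "ph", "it": "ph",    # paired/isolated codes collapsed to <ph>
    "bx": "x", "ex": "x",                    # empty paired markers collapsed to <x/>
    "sub": "g",                              # sub-flows wrapped as <g>
}

def map_inline_elements(source: ET.Element) -> ET.Element:
    """Rename inline elements that tool B does not support.
    Collapsing paired codes (bpt/ept, bx/ex) loses pairing information,
    so the exchange is no longer lossless."""
    for el in source.iter():
        local = el.tag.rsplit("}", 1)[-1]    # strip a namespace prefix, if any
        if local not in TOOL_B_SUPPORTED and local in INLINE_MAP:
            el.tag = INLINE_MAP[local]
    return source

Such pair-wise mappings illustrate exactly the drift the thesis is concerned with: each pair of tools needs its own, potentially lossy, translation layer.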

Standards and Standard Conformance

Interoperability is not a primary concern when new technologies are launched (Almeida et al 2010). Moreover, conformance testing is an area typically not given much attention by software vendors (Kindrick et al 1996). To exacerbate this situation, most of the key localisation standards have only loosely worded conformance clauses (Filip 2011). In the absence of a clearly defined conformance clause, vendors are unable to develop proper test suites for testing their products for conformance with standards. Some tools have been developed for validating standard-compliance (e.g. for XLIFF, TBX and TMX content); however, no tools have been reported that assess the actual level of conformance with these standards for a given file, except within the EU co-funded IGNITE project (LocFocus 2006; Schäler 2007), which is currently not operational.

Very few academic studies about interoperability issues related to localisation data-exchange formats and standards have been reported in the literature in recent years (Vackier 2010; Bly 2010; Lewis et al 2009). In some cases, partial solutions have also been proposed by both academia and industry (Cruz-Lara et al 2008; Schäler 2007; Schäler 2008; Lewis et al 2009).

The survey by TAUS (2011a) amply confirms the significance of finding solutions to interoperability issues related to localisation data-exchange formats and standards. The primary aim of this PhD research is to fill this gap in the literature, building on the knowledge of accepted interoperability frameworks, standards and solutions in the localisation domain, as well as in other domains, and to present a viable, standards-based interoperability solution for the localisation domain.

3 The XML Localisation Interchange File Format (XLIFF) standard will be discussed in depth in Sect. 1.3.1, Sect. 2.8.1 and Sect. 3.3.1.

1.2 Research Questions, Objectives and Scope

This research addresses two questions. Our substantive research question is:

RQ1: What are the limitations of existing localisation data-exchange standards and implementations?

The related secondary research question is:

RQ2: How can these limitations be overcome to adequately address significant aspects of interoperability in localisation?

The overall objective of this research is to improve the semantic and syntactic interoperability aspects of prominent localisation data-exchange standards and to propose strategies leading to a higher degree of standards-compliance among localisation tools. The first research question addresses the objective of identifying the degree of prescriptiveness and the degree of coverage of a standard by existing implementations. This in turn will allow us to investigate whether a standard is sufficiently detailed and well enough defined to guarantee interoperability at both the semantic and syntactic levels. The second research question mainly focuses on the identification of optimal, standard-based solutions for facilitating data interoperability among tools.

The scope of this research is limited to the identification of recommendations for standards development, implementation and compliance, leading to improved interoperability. The research will mainly focus on aspects such as standard-compliance, conformance clauses, conformance testing and interoperability testing.

Research in areas such as machine level interoperability, ontologies and ontology definition, workflows, and organisational interoperability is outside the scope of this research.

1.3 Methodology

This section formalises the structure and the overall research methodology used to work towards our research objectives. Our thesis employs a mixed research method (Johnson et al 2007), in which different elements of qualitative and quantitative approaches have been used for the purpose of an in-depth investigation of the research questions.

Mixed methods research is the type of research in which a researcher or team of researchers combines elements of qualitative and quantitative research approaches (e.g., use of qualitative and quantitative viewpoints, data collection, analysis, inference techniques) for the broad purposes of breadth and depth of understanding and corroboration. (Johnson et al 2007)

We employed strategy triangulation (Oates 2006, p.37), where we combined several research strategies to find answers to our research questions.

The use of more than one data generation method to corroborate findings and enhance their validity is called method triangulation ... Strategy triangulation: the study uses two or more research strategies. (Oates 2006, p.37)

The strategies include a pilot study comparing two localisation data-exchange standards, a survey on XLIFF files and the design and creation of an interoperability test-bed. These strategies are briefly described in Sect. 1.3.2-4.

1.3.1 Selection of a Standard

While there are many different standards involved in the localisation process (see Sect. 2.8.1), we primarily considered the XML Localisation Interchange File Format (XLIFF) standard in this research for a number of reasons:

The XLIFF standard is specifically aimed at addressing interoperability issues related to localisation data exchange between tools and technologies (XLIFF-TC 2007). One of the primary objectives of XLIFF is to provide a solution to the proliferation of disparate file formats containing localisation resources. The ever-growing number of file formats adds to the cost of the localisation process in different ways. For example, localisation tool developers have to implement parsers whenever a new file format appears; localisers, on the other hand, have to learn and use tools that support each new file format they receive. XLIFF addresses these issues by implementing standard strategies to extract localisation data from virtually any file format and make it available to localisers in a standard format. Once the content has been translated, the translations can be inserted back into the original source file. Therefore, tool developers only have to implement the XLIFF standard, whereas localisers can use tools supporting XLIFF to translate the content. A detailed description of XLIFF, including its benefits, is given in Sect. 2.8.1 and Sect. 3.3.1.
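By way of a minimal sketch only (not the mechanism prescribed by the specification), the following Python fragment reads the source text of each trans-unit from an XLIFF 1.2 file, illustrating the extract step of this extract-translate-merge idea; the file name is a placeholder chosen for this example.

import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}  # XLIFF 1.2 namespace

def extract_sources(path: str) -> dict:
    """Return a mapping of trans-unit id -> source text for one XLIFF file."""
    tree = ET.parse(path)
    sources = {}
    for unit in tree.getroot().iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        if source is not None:
            sources[unit.get("id")] = "".join(source.itertext())
    return sources

# Hypothetical usage: print the translatable strings of one extracted file.
if __name__ == "__main__":
    for unit_id, text in extract_sources("example.xlf").items():
        print(unit_id, text)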

Our selection is also supported by the previous literature. Filip (2012) defines bitext as "text in two natural languages organized as a succession of ordered and aligned source and target pairs", and states that "bitext standardization is the key to localization interoperability." Arguably, bitext is the most important component of localisation data. XLIFF is a bitext format. Other popular bitext formats include the Translation Memory eXchange (TMX) and Portable Object (PO) formats (see Sect. 2.8.1). A comparison of the open standards XLIFF and PO suggested that XLIFF is more suitable for adoption in the localisation process (Frimannsson 2005). TMX is not considered in this research as it is primarily meant for the representation of translation memories, not for the exchange of localisation data (see Sect. 2.8.1). This leaves XLIFF as the only suitable standard for our study. Bly (2010) states that "XLIFF is the best (only) hope for interoperability", while Cruz-Lara et al (2008) introduce XLIFF as the "control of the interoperability between the industrial standards currently used for localisation". Gracia (2006) also states that "XLIFF may be the next big thing for the localisation industry, as significant as Unicode". Filip (2012) argues that "XLIFF is the state-of-the-art bitext standard format, superior in many respects (legal and technical) to proprietary and legacy bitext formats." Having decided on the standard for further analysis and investigation, we used a systematic research methodology to answer the research questions presented in Sect. 1.2.

1.3.2 Pilot Study: Data-Exchange Format Feature Analysis and Comparison

First, we carried out a pilot study (details provided in Chapter 3) using qualitative approaches, mainly to gain a thorough understanding of the features of localisation data-exchange standards and also to confirm our selection of the XLIFF standard for further research. The pilot study involved a systematic comparison of the open standard XLIFF and the proprietary Localisation Content Exchange (LCX) format, Microsoft's internal localisation file format. The comparison, as well as the development of a mapping between these two file formats, helped to identify features of localisation file formats as well as their advantages and disadvantages. Having gained a much deeper understanding of localisation data-exchange standards from the pilot study, our initial intention was to adopt and implement the methodology proposed by Shah and Kesan (2008; 2009) as an experiment to investigate interoperability issues associated with the XLIFF standard (details of Shah and Kesan's research are presented in Sect. 2.7.2).

1.3.3 Development of the XML Data Interoperability Usage Evaluation (XML-DIUE4) Framework

As part of Shah and Kesan's procedure, the most frequently used features of the standard need to be identified. However, their research does not provide a reproducible, systematic procedure for determining the most frequently used features of the standard under consideration. Therefore, we decided to design an experiment mainly to identify the most frequently used features of the XLIFF standard. This led to our second and main experiment. In this experiment, we gathered a large number of XLIFF files with the aid of some of the CNGL industrial partners and also by crawling the web for openly available files. We then constructed a corpus of XLIFF files. The analysis of this corpus revealed many insights into the usage of the XLIFF standard. The initial results of this experiment motivated us to explore the XLIFF corpus further, especially towards finding answers to our first research question, the identification of limitations of standards and implementations. We developed a novel empirical framework, driven by several research sub-questions, that provides valuable information about standard usage for the different stakeholders of the standard. This quantitative experiment yielded significant results in answering our primary research question. We therefore diverged from our initial goal of implementing Shah and Kesan's framework to evaluate XLIFF and concentrated on the further refinement of our own framework, leaving the implementation of Shah and Kesan's framework for future work due to the constraints and boundaries of this research. The details of this main experiment can be found in Chapter 4.

4 Pronounced as "XML-DEW"
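To give a flavour of the kind of usage analysis carried out in this experiment (the actual metrics and their implementation are described in Chapter 4), the sketch below counts how often each element and attribute occurs across a directory of XLIFF files; the directory name and file-extension pattern are placeholders assumed for this example.

import collections
import pathlib
import xml.etree.ElementTree as ET

def count_feature_usage(corpus_dir: str):
    """Count element and attribute occurrences across all XLIFF files in a directory tree."""
    elements = collections.Counter()
    attributes = collections.Counter()
    for path in pathlib.Path(corpus_dir).rglob("*.xl*f"):   # matches .xlf and .xliff
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue                                  # malformed files are simply skipped here
        for el in root.iter():
            name = el.tag.rsplit("}", 1)[-1]          # drop namespace prefix
            elements[name] += 1
            for attr in el.attrib:
                attr_local = attr.rsplit("}", 1)[-1]
                attributes[name + "/@" + attr_local] += 1
    return elements, attributes

# Hypothetical usage over a placeholder corpus directory:
if __name__ == "__main__":
    els, attrs = count_feature_usage("xliff_corpus")
    print(els.most_common(10))
    print(attrs.most_common(10))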

1.3.4 Design and Creation: Development of LocConnect: an Interoperability Test-bed

In parallel to the above experiment, as a part of the strategy triangulation, we investigated how the XLIFF standard can be used to define a data container in a Service Oriented Architecture (SOA) to facilitate data interoperability among distributed localisation components. For this purpose, we used a "design and creation research strategy" (Oates 2006, p.106), under which we developed an orchestration engine that uses XLIFF as the main messaging format to communicate between different localisation components through an open Application Programming Interface (API). The adoption of this research strategy was mainly motivated by existing research: the IGNITE Localisation Factory (see Sect. 2.8.4 for details) (Schäler 2007); the proposal by the LRC for a distributed, component-based localisation process (Schäler 2008), which was later described by Lewis et al (2009) for enabling interoperability among distributed localisation components using the XLIFF standard; and the proposal to use an open API to enable interoperability between tools and technologies (Savourel 2007). The design and creation of this interoperability test-bed enabled us to study the interoperability requirements of tools and technologies in SOA environments. The interoperability test-bed was named "LocConnect" in our research. The development of LocConnect has also helped us to gain a better understanding of the Quality of Service (QoS) attributes that are important for enabling interoperability but are not covered by the standard (Lewis et al 2008). Further details about the development of the LocConnect interoperability test-bed and related findings are available in Chapter 5.
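To illustrate what exchanging XLIFF through an open API in an SOA setting can look like, the following sketch shows a hypothetical localisation component fetching a job from an orchestration engine and posting the processed XLIFF back over HTTP. The endpoint paths, parameters and base URL are invented for illustration only and do not describe the actual LocConnect API, which is documented in Chapter 5 and Appendix H.

import urllib.request

BASE_URL = "http://localhost:8080/engine"      # placeholder orchestration endpoint

def fetch_job(job_id: str) -> bytes:
    """Retrieve the XLIFF payload for a job (hypothetical GET endpoint)."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}") as resp:
        return resp.read()

def return_job(job_id: str, xliff_payload: bytes) -> int:
    """Send the processed XLIFF back to the engine (hypothetical POST endpoint)."""
    req = urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}",
        data=xliff_payload,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status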

1.4 Research Contributions

This research yields important findings related to interoperability and localisation data-exchange standards and also addresses questions that are extremely relevant, especially for new and emerging distributed localisation processes. The literature review revealed that gaps exist in the academic research relating to interoperability issues associated with localisation data-exchange standards. The research aims to address these gaps by focusing mainly on the identification of limitations of standards and tools that lead to interoperability issues. A novel framework is developed as part of the research to identify limitations of standards and of their implementations in tools and technologies, as well as to identify features of standards that are important for data interoperability. This framework has been applied to the XLIFF standard and the results are discussed in detail. In addition, an interoperability test-bed has been developed based on an SOA to determine and demonstrate the interoperability requirements of distributed localisation tools and technologies used in the localisation process.

The research mainly contributes to the knowledge through:

• Developing and presenting a novel empirical framework: the XML-Data Interoperability Usage Evaluation (XML-DIUE) framework, including a set of standard usage metrics that provide valuable feedback into the standard development and maintenance process. The proposed XML-DIUE framework helps to identify and analyse important features of a standard and important usage characteristics of a standard for successful data-exchange interoperability.

• Academic peer-reviewed publications and scientific presentations (see Appendix I):

o (Wasala et al 2012a; Filip et al 2012; Wasala et al 2012b; Wasala et al 2011a; Wasala et al 2011b; Aouad et al 2011; Wasala et al 2010; Wasala and Filip 2013)

• The development of a novel SOA-based interoperability test-bed named "LocConnect". LocConnect facilitates interoperability among distributed localisation components using open standards and will serve as a testing platform for the interoperability of distributed localisation components. LocConnect has been filed as an Invention Disclosure at the University of Limerick (Wasala and Schäler 2012) and is being used as the orchestration engine of the Service Oriented Localisation Architecture Solution (SOLAS)5, connecting over eight localisation components6 distributed across three universities in Ireland. It will be released as open source to the community to foster research and further development activities.

• In addition, an important resource resulting from this research is the first large XLIFF corpus, comprising over 3,000 XLIFF files collected as part of our main experiment. This will be a valuable contribution to academia as well as to industry (e.g. to tool developers and content developers alike).

The research has influenced the development and maintenance of the XLIFF standard7 (see Lieske 2011; Wasala et al 2010). More importantly, we note that the XML-DIUE framework proposed in our research has been presented at the third XLIFF Symposium (Wasala et al 2012a) and, subsequently, the XLIFF Promotion and Liaison (P&L) subcommittee of the XLIFF Technical Committee has decided to implement the framework8 to support the XLIFF standard development and maintenance process. Moreover, our research (especially the pilot study) has also influenced Microsoft's decisions on adopting the XLIFF standard.

5 http://www.localisation.ie/solas/ [accessed 08 Jan 2013]

6 The localisation components are various components developed by fellow PhD researchers in the CNGL project as part of their PhD work. These components range from translation extraction, pre-processing and machine translation, to review and rating of translations.

7 Specific takeaways by the XLIFF TC from our research are available at: https://wiki.oasis-open.org/xliff/Consolidated%20Takesaways%20from%20First%20XLIFF%20Symposium [accessed 08 Jan 2013]

8 See the XLIFF P&L Sub Committee discussion: https://lists.oasis-open.org/archives/xliff-promotion/201211/msg00002.html [accessed 08 Jan 2013]

1.5 Outline of the Thesis

The outline of the dissertation is given in the following.

• Chapter 1: Introduction. The thesis begins with an introduction to the research domain, outlining important terminology. The research questions are presented and the primary objectives are described.

• Chapter 2: Literature Review. This chapter reviews literature related to the three main areas of our research: interoperability, standards and the localisation domain. The chapter starts by presenting an overview of different aspects of interoperability. The use of standards, the relationship between standards and interoperability, conformance testing and interoperability testing are discussed next. Various standards used in the localisation domain are then presented, and their issues and existing solutions are discussed. The chapter concludes by highlighting the need for a systematic framework to identify and address interoperability issues related to localisation data-exchange standards.

• Chapter 3: XLIFF and LCX Comparison. In this chapter, the pilot study carried out to gain a better understanding of the features of localisation data-exchange standards is presented. The chapter commences with an overview of the XLIFF standard and the LCX file format. Then, a systematic comparison of these file formats is presented, followed by a proposed schema mapping between them. The chapter concludes by presenting the results of the comparison, including proposed improvements to the selected standards, highlighting interoperability issues related to XLIFF and recognising the vital importance of further research into data-exchange standards compliance and the interoperability of localisation tools and technologies.

• Chapter 4: Development of the XML-DIUE Framework. This chapter presents the survey of XLIFF files in detail, including the data collection, the construction of the first large XLIFF corpus and the development of the XML-DIUE framework. Various standard usage-analysis metrics proposed under XML-DIUE are presented. The analysis of the XLIFF corpus using the developed metrics is presented next. The results of the analysis are presented and discussed in detail to illustrate the utility of the framework for identifying and addressing interoperability issues related to standards and implementations. Various limitations of the framework are then discussed. Finally, a set of guidelines on adapting XML-DIUE for evaluating the usage of similar standards is presented.

• Chapter 5: Development of LocConnect. The details of the design and creation of the LocConnect interoperability test-bed are presented in this chapter. The chapter describes the system architecture of LocConnect in detail. Then a critical discussion is presented. The chapter concludes by highlighting how XLIFF can be used as a data container in an SOA and by presenting possible future work.

• Chapter 6: Conclusions and Future Work. This chapter summarises the conclusions drawn from the thesis and highlights the important contributions of this research at different levels. Finally, the possibilities for future research are presented.

Chapter 2: Literature Review

2.1 Introduction

Our research spans mainly three areas in the literature: interoperability, standards and the localisation domain. While all of these are significantly broad areas, the main emphasis and scope of our research lie in the overlapping section defined by all three (see Fig. 1). This chapter summarises the main areas of our research as discussed in the relevant literature.

Figure 1. The Research Scope

First, we review literature related to interoperability. In Sect. 1.1.1, we identified several definitions of interoperability and selected the most suitable definition for our research. Within this vast area of interoperability-related research, our primary interest is in the identification of interoperability issues and of approaches for addressing those issues. In Sect. 2.2, we briefly present causes of, and solutions to, interoperability issues at a very high level. We then identify different levels of interoperability and select the most relevant levels for our study. We restricted our scope to the syntactic and semantic interoperability levels; the relevant literature is presented in Sect. 2.3. Interoperability problems are widely discussed not only in the localisation domain, but in other domains too. In Sect. 2.4 we cover industry-specific models of interoperability, with a detailed review of one applied in healthcare. Before moving on to standards, in Sect. 2.5 we briefly present service-oriented architecture as a development style for distributed, loosely coupled and interoperable components.

We then review relevant literature on standards in Sect. 2.6. In Sect. 2.6.1, we review several definitions of standards. We identify several types of standards in Sect. 2.6.2. In our research, we specifically focus on open standards. Various definitions of open standards, and their advantages as well as disadvantages as reported in the literature, are presented. The relationship between open standards and open-source software is also briefly reviewed. We then focus on the literature related to both standards and interoperability (i.e. the overlapping section of interoperability and standards in Fig. 1). We present a detailed review of the literature in this area in Sect. 2.6.3. In the next section (Sect. 2.7) we focus on testing strategies to ensure interoperability, including conformance testing and certification.

Next, we review literature related to the localisation domain, especially focusing on standards and interoperability issues. In Sect. 2.8.1 we give an overview of the prominent open standards in the localisation domain such as XLIFF, PO, TMX, TBX, SRX, GMX and others. Then we look at localisation-specific interoperability matters related to localisation standards and their implementation in Sect. 2.8.2-3. We narrow down our scope to studying interoperability issues specifically related to the XLIFF standard. We present an in-depth evaluation of the XLIFF standard followed by a review of existing solutions to the lossless exchange of data and interoperability in the localisation domain.

Finally, the chapter concludes by emphasising the necessity of a systematic methodology or a framework towards the discovery and resolution of interoperability issues related to localisation data-exchange standards.

2.2 Approaches to Solving Interoperability Issues

In this section, we briefly look at the root causes of interoperability issues and at solutions to address them (see Fig. 2).

Figure 2. The Research Scope – Interoperability

Naudet et al (2010) identify three main causes for interoperability problems among systems: 1) incompatibility; 2) heterogeneity and 3) misalignment.

To solve the various interoperability issues associated with different information systems, a number of alternative solutions have been proposed in the literature. At a very high level, at least two approaches can be identified to address interoperability issues among systems: 1) linking systems using a canonical model; and 2) aligning systems two by two (cited in Berges et al 2010). Naudet et al (2010) describe the same two approaches in slightly different ways, using the terms "bridging" and "homogenisation": whereas bridging involves inserting intermediate systems to solve interoperability issues between inter-related systems, homogenisation involves performing syntactic or semantic transformations to homogenise the systems.

Research on interoperability frameworks and solutions is based on the above approaches. Such frameworks or solutions can be broadly classified into two categories: 1) domain specific; and 2) non-domain specific (Ibrahim and Hassan 2010). While some interoperability frameworks are conceptual and theoretical, others are pragmatic and technical.

Interoperability can be achieved on different levels among different systems. The next section discusses important levels of interoperability.

2.3 Levels of Interoperability

Different levels (or layers) of interoperability have been identified in the literature based on the degree of interoperability and data interchangeability between systems.

Key levels are briefly described below:

1) System-specific data – proprietary data that is not meant to be shared with other systems; such systems are isolated (Wang et al 2009);

2) Machine-level interoperability – enables basic connectivity and data exchange between systems at the machine or hardware level (e.g. the TCP/IP protocol) (Ambrosio and Widergren 2007; Lewis et al 2008);

3) Syntactic interoperability – enables data exchange between heterogeneous systems by providing a common, documented syntactic data structure such as the Extensible Mark-up Language (XML), although at this level there is no guarantee that the shared data makes sense and is understood by other systems (Ambrosio and Widergren 2007; Lewis et al 2008; Wang et al 2009; Folmer and Verhoosel 2011, pp.33-34);

4) Semantic interoperability – enables meaningful data exchange between systems by agreeing on and conforming to common semantics for shared data (e.g. by using ontologies) (Ambrosio and Widergren 2007; Lewis et al 2008; Wang et al 2009; Folmer and Verhoosel 2011, pp.33-34);

5) Organisational interoperability – interoperability beyond semantics, achieved by defining context-aware "actionable" data that allows systems to perform actions on data that is understood by other systems. At this level, business procedures, objectives and policies are well understood by the different systems (Ambrosio and Widergren 2007; Lewis et al 2008; Wang et al 2009; Folmer and Verhoosel 2011, pp.33-34).

The levels described above are ordered so that interoperability increases as the agreements and conditions at each level are satisfied. However, it is not necessary to address all levels of interoperability for all interoperability problems.

In this research, our objective is to improve interoperability between tools and technologies by identifying limitations of both standards and tools. Therefore, proprietary data that cannot be shared and hardware-level interoperability (i.e. levels 1 and 2) are irrelevant to our research. The most important interoperability levels in the context of localisation data interoperability research are the syntactic and semantic interoperability levels. They are briefly discussed in Sect. 2.3.1 and Sect. 2.3.2 below.

Research on organisational interoperability is beyond the scope of our research, as our main focus is on identifying and addressing interoperability issues within the standard, and not on identifying interoperability issues arising from fully automated machine processing and exchange of data between organisations.

2.3.1 Syntactic Interoperability

This section defines syntactic interoperability and presents techniques used to achieve syntactic interoperability.

Defining Syntactic Interoperability

Park and Ram (2004) define syntactic interoperability as “the application-level interoperability that allows multiple software components to cooperate even though their implementation languages, interfaces, and execution platforms are different.”

Syntactic interoperability is concerned with the exchange of information among systems using a compatible data structure or format. Data structure and formatting are enforced by syntactic rules (Ambrosio and Widergren 2007) that are usually defined through protocols or standards.

As indicated by Li and Li (2004), the key challenges of syntactic interoperability are:

 identifying all the elements in various systems
 establishing rules for structuring elements
 mapping, bridging, creating crosswalks between equivalent elements using schemas etc.
 agreeing on equivalent rules to bridge different cataloguing and registry systems.

Achieving Syntactic Interoperability

Syntactic interoperability can be achieved through the adoption of standards such as XML, Simple Object Access Protocol (SOAP), Universal Description Discovery and Integration (UDDI) and Web Service Description Language (WSDL) (Park and Ram 2004). XML is widely used to encode and organise data in a structured manner (Li and Li 2004). Hence, XML and XML-based standards are widely used to achieve syntactic interoperability (Folmer and Verhoosel 2011, p.15).

For localisation tools to exchange data, the first step is to agree on a data exchange format. Data exchange standards have been defined for this purpose. Therefore, it is important to identify syntactic interoperability issues associated with these standards in our research.
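To make the idea of syntactic agreement concrete, the following minimal sketch (Python, standard library only; the element names and the two “tools” are hypothetical and purely illustrative, not part of any localisation standard) shows two tools exchanging a localisable string through an agreed XML structure. Syntactic interoperability is satisfied as soon as both sides emit and accept the same structure; nothing at this level guarantees that the receiving tool interprets the content correctly.

    import xml.etree.ElementTree as ET

    def tool_a_export() -> str:
        # Tool A serialises its internal data into the agreed (hypothetical) structure.
        root = ET.Element("message", attrib={"id": "greeting", "lang": "en"})
        ET.SubElement(root, "text").text = "Hello, world!"
        return ET.tostring(root, encoding="unicode")

    def tool_b_import(payload: str) -> dict:
        # Tool B parses the agreed structure back into its own internal representation.
        root = ET.fromstring(payload)
        return {"id": root.get("id"), "lang": root.get("lang"), "text": root.findtext("text")}

    print(tool_b_import(tool_a_export()))
    # {'id': 'greeting', 'lang': 'en', 'text': 'Hello, world!'}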

2.3.2 Semantic Interoperability

Although XML facilitates the structuring of data, it does not convey the meaning of that data (i.e. its semantics) to users (Waters et al 2009; Berges et al 2010; Li and Li 2004). Therefore, to enable successful interoperability among tools and technologies, it is important for tools to agree on the underlying semantics of shared data, in addition to agreeing on the syntactic structure of the data.

In this section, we first examine the definitions of “semantic interoperability”. Then, we review semantic interoperability issues and their causes. Finally, we identify common approaches for addressing semantic interoperability issues.

Defining Semantic Interoperability

The term “semantic interoperability” was first introduced by Heiler (1995). He defined it as the “ability to exchange services and data with one another” adding that “semantic interoperability ensures that these exchanges make sense – that the requester and the provider have a common understanding of the ‘meanings’ of the requested services and data.” Berges et al (2010) define semantic interoperability as “the ability of one computer system to receive some information and interpret it in the same sense as intended by the sender system, without prior agreement on the nature of the exchanged data.”

Semantic Interoperability Issues

Semantic interoperability issues generally arise from incompatibilities and disagreements in the terms and concepts used by different organisations (Sartipi and Yarmand 2008). Park and Ram (2004) have classified semantic conflicts into two levels: the data level and the schema level. Data level conflicts such as data-value conflicts, data representational conflicts and data precision conflicts can occur due to multiple representations or interpretations of similar data in different domains. Schema level conflicts such as naming conflicts, entity-identifier conflicts and aggregation conflicts can occur due to “differences in logical structures and/or inconsistencies in the metadata (i.e. schemas) of the same application domain.” Ouksel and Sheth (1999) identify four enablers of semantic interoperability: 1) terminology (and language) transparency; 2) context-sensitive information processing; 3) rules of interaction mechanisms; and 4) semantic correlation.

Achieving Semantic Interoperability

Achieving semantic interoperability involves making the implicit semantics of data and processes explicit by means of metadata (Heiler 1995). As such, achieving semantic interoperability is an arduous task. Often, human interpretation of data and processes is needed (Lewis et al 2008; Heiler 1995; Park and Ram 2004). Heiler (1995) presents various factors that make it difficult to make semantic information explicit: the context-dependent nature of semantic information; the changing nature of the meaning of attributes and values; the difficulty of documenting the semantics of legacy systems; and the management of metadata, among others. Common metadata models, such as metadata repositories, are proposed as a solution to address the above issues (Heiler 1995).

At present, semantic interoperability issues are also addressed by defining domain-specific ontologies (Lewis et al 2008; Heiler 1995; Waters et al 2009). Other methods for achieving successful semantic interoperability include: metadata (especially domain-specific metadata) (Ouksel and Sheth 1999; Li and Li 2004); context (Ouksel and Sheth 1999); mapping-based and query-oriented approaches (Li and Li 2004; Park and Ram 2004); as well as hybrid methods combining all or some of the above (Park and Ram 2004). Cross-domain ontologies are proposed as a solution for achieving semantic interoperability among information systems of different domains (Waters et al 2009). Machine-readable high-level languages such as the Resource Description Framework (RDF), Web Ontology Language (OWL), Web Service Modelling Ontology (WSMO), Business Process Execution Language (BPEL) and Business Process Modelling Language (BPML) play an important role in defining ontologies as well as representing semantics (Lewis et al 2008; Berges et al 2010; Waters et al 2009). Lewis et al (2008) as well as Park and Ram (2004) discuss the difficulties of creating and standardising ontologies. One of the main problems of using ontologies is the need to customise and adapt them to the specific requirements of different organisations. However, these customisations will adversely affect the ontology standardisation process, as interoperability problems may arise between such customised ontologies.
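The following minimal sketch illustrates, in much-simplified form, the mapping idea that underlies the metadata and ontology approaches discussed above: two hypothetical tools label the same workflow state differently, and a shared canonical vocabulary with per-tool mappings makes the implicit semantics explicit. All names are invented for illustration; a real solution would rely on an ontology language such as OWL rather than a hand-written table.

    # Two hypothetical tools use different labels for the same workflow state.
    TO_CANONICAL = {
        ("tool_a", "NEW"): "new",
        ("tool_a", "TRANSLATED"): "translated",
        ("tool_a", "SIGNED_OFF"): "final",
        ("tool_b", "untranslated"): "new",
        ("tool_b", "translated"): "translated",
        ("tool_b", "final"): "final",
    }

    def translate_state(source_tool: str, state: str, target_tool: str) -> str:
        # Map a label from one tool to another via the shared canonical vocabulary.
        canonical = TO_CANONICAL[(source_tool, state)]
        for (tool, label), canon in TO_CANONICAL.items():
            if tool == target_tool and canon == canonical:
                return label
        raise KeyError(f"no {target_tool} label for canonical state '{canonical}'")

    print(translate_state("tool_a", "SIGNED_OFF", "tool_b"))  # -> final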

Summary and Conclusions

In this section, we identified that syntactic interoperability enforces agreements on the structure of shared data among tools and technologies, while semantic interoperability enforces agreements on the underlying meaning of the shared data. We also identified that structural agreements are usually enforced using data-exchange standards based on formats such as XML. While these standards may also carry some of the semantics of the underlying data, additional semantic interoperability needs are met using methods such as metadata repositories and ontologies.

In the next section, we will show that data-exchange interoperability issues are also discussed in other domains and present how syntactic and semantic interoperability issues have been addressed in these domains.

2.4 Industry Domain Specific Interoperability Models

Interoperability issues are of course widely discussed not only within the localisation community, but also within other domains (Naudet et al 2010) such as healthcare, e-governance, networking, military, and construction (Lipman et al 2011). Lessons can be learnt by studying how similar issues have been addressed in other domains. As an example, we will briefly review some of the most important interoperability issues in the healthcare domain, mainly to illustrate the approaches taken to address semantic and syntactic interoperability issues. We have chosen the healthcare domain due to the critical nature of the interoperability of health records.

2.4.1 Healthcare

Interoperability is a crucial aspect of successful healthcare information systems. For example, the exchange of health records (such as the medical history of a patient, or test results) among different applications as well as institutions provides invaluable benefits.

At the same time, the correct interpretation of information that is being exchanged is vital to avoid life-threatening mistakes. Interoperability remains a huge problem in healthcare. A study by Ray (2009) reported that imperfect interoperability adds as much as $77.8 billion per year to the cost of healthcare in the USA. The main reasons include the extensive use of “paperwork” in administrative procedures (Ray 2009) as well as the use of heterogeneous proprietary models for representing Electronic Health Records (EHR) by different medical institutions (Berges et al 2010). Therefore, interoperability is an active research area in the healthcare domain. Several healthcare information interoperability frameworks that are relevant to our study are summarised below.

In the healthcare domain, interoperability has been achieved by means of healthcare standards such as HL7 (v3), a standard for the exchange, management and integration of healthcare information (Sartipi and Yarmand 2008); OpenEHR (Berges et al 2010), an open standard supporting the specification of more interoperable EHR systems; and SNOMED, a comprehensive clinical terminology system (Sartipi and Yarmand 2008).

Standard-based Framework for Achieving Data and Service Interoperability in eHealth Systems

Sartipi and Yarmand (2008) propose a standard-based framework for achieving data and service interoperability in eHealth systems. The main objective of this research was to migrate data and services used in legacy healthcare systems into standards-based interoperable systems. The methodology adopted in this research involved the transformation of data, messages and transaction information used within legacy systems into standard-based representations by developing a series of mappings. The semantic interoperability between healthcare applications was achieved by mapping their clinical terms into unique terminology systems. The mapping process was carried out with the help of domain experts. While the meanings of shared data were unified by means of mapping individual terminology systems into unique terminologies, the data transmission aspect of interoperability was achieved by using a proprietary SOA-based system (known as Oracle’s Healthcare Transaction Base), due to its compatibility with a selected set of standards. The above framework has been evaluated by means of a case study. The case study involved the implementation and deployment of the framework in a real-world environment with the help of associated industrial partners and two other research groups. Sartipi and Yarmand (2008) highlight the characteristics of selected healthcare standards towards facilitating interoperability and discuss the applicability of their methodology in other domains.

Semantic Interoperability of EHRs

Kalra et al 2009 (cited in Berges et al 2010) identified three levels of interoperability related to EHRs: level 1 – syntactical interoperability; level 2 – partial semantic interoperability; and level 3 – full semantic interoperability. Berges et al (2010) propose an ontology-driven framework based on semantic web technologies (i.e. OWL2) to achieve full semantic interoperability of EHRs. The core of this methodology is the development of an ontology called EHROnt, which represents definitions of clinical terms at two levels of abstraction: canonical and application. At the canonical level, ontological definitions of EHR statements are represented at a higher, application-independent level. This is carried out by experts in the medical field. At the application level, definitions of EHR statements are represented as they are understood in specific e-health systems. As such, the canonical layer is common, while separate application layers exist for different e-health systems. Each health institution is responsible for creating its own application layer of the ontology. Berges et al (2010) propose a semi-automatic method for translating existing proprietary EHR representations into an OWL2-based application layer. Finally, a mapping is constructed between the terms found in each application layer and the corresponding definitions found in the canonical layer. The proposed methodology is claimed to enable ‘on the fly’ interpretation of exchanged clinical data among different systems regardless of their individual internal clinical data representation schemes (Berges et al 2010). However, an evaluation of the proposed methodology was not presented in this study.

Summary and Conclusions

The definition for interoperability by IEEE (see Sect. 1.1.1) focuses on two aspects: “exchanging data” and “using the exchanged data”. Essentially, these two aspects are related to syntactic interoperability and semantic interoperability. Syntactic interoperability ensures successful data exchange while semantic interoperability ensures proper interpretation of exchanged data (i.e. the usage aspect).

It can be seen that existing research focuses on addressing both of these aspects. Sartipi and Yarmand use mappings and SOA to achieve syntactic interoperability, while centralised terminologies are used to enable semantic interoperability. Berges et al mainly focus on addressing semantic interoperability issues by means of developing ontologies.

The main lessons that can be learned from the above studies are two-fold:

 importance of representing proprietary data in a standard format;
 importance of addressing both syntactic and semantic interoperability issues.

In the next section, we briefly review SOA as a popular mechanism to achieve data-exchange interoperability.

2.5 Service Oriented Architecture (SOA)

Service Oriented Architecture (SOA) is a mechanism to facilitate interoperability among systems (Sartipi and Yarmand 2008). Ibrahim and Hassan (2010) define SOA as “an architecture or a development style that builds on distributed, loosely coupled, and interoperable components of software agents called services.”

Most SOA-based interoperability frameworks are domain specific, and generic interoperability frameworks are rare. The existence of different types of SOA-based interoperability architectures adversely affects seamless interoperability among systems (Ibrahim and Hassan 2010). By analysing several existing interoperability frameworks from different domains, Ibrahim and Hassan (2010) propose 16 attributes that are important for achieving seamless interoperability, especially within the SOA context.

Quality of Service (QoS) characteristics play a major role in SOA-based architectures. Therefore, work on QoS issues is gaining increased attention. However, the standardisation of QoS characteristics has been found to be a difficult task (Lewis et al 2008).

Having looked at the interoperability-related literature, in the next section we review literature related to standards, the second major area we considered in our literature review.

2.6 Standards

The nice thing about standards is that you have so many to choose from. Tanenbaum 1989 (cited in Folmer and Verhoosel 2011, p.16)

Figure 3. The Research Scope – Standards

In this section, we focus on standards (see Fig. 3). We first present definitions of the term “standard” and then present different types of standards. We then review the characteristics of open standards as well as the relationship between open standards and open-source software. Next, we turn our attention to the connection between standards and interoperability. Then, we review issues associated with standards that lead to interoperability issues. Finally, approaches to achieving interoperability using standards are presented.

2.6.1 Defining Standards

In our literature review, we noted that there is no agreed-upon definition for the term “standard”. In order to get different perspectives on standards, in the following section we present several answers to the question: “what is a standard?”

Almeida et al (2010) define a standard as a “level of quality or attainment, or an item or a specification against which all others may be measured”, stating that in the technical domain, “standard ... is a framework of specifications that has been approved by a recognised organisation, or is generally accepted and widely used throughout by the industry”. Krechmer (2005) stated that standards “represent common agreements that enable communication, directly in the case of Information Technology (IT) standards and indirectly in the case of all other standards.” The ISO definition (ISO/IEC 2004) of “standardization” is: “the activity of establishing, with regard to actual or potential problems, provisions for common and repeated use, aimed at the achievement of the optimum degree of order in a given context.” While the ISO definition is the most widely used definition in the literature (Folmer and Verhoosel 2011, p.16), all these definitions convey the same idea. In a book published by the Asia Pacific Sub-Committee on Standard and Conformance, standardisation has been identified as a simplification activity:

Standardization is simplification. The foundation of standardization is simplifying certain things that would otherwise grow complex if left alone, thereby raising the degree of interoperability among things. Simplification brings about cost efficiencies and increased convenience. (APEC-SCSC 2010, p.12)

This layman’s definition of standardisation is more suitable for our research, as our research aims to identify complexities associated with existing localisation data-exchange standards and to simplify the standards to improve the degree of interoperability among tools and technologies.

In the next section, we identify different types of standards.

2.6.2 Different Types of Standards

There are several types of standards. Broadly, they can be classified into two categories: 1) open standards; and 2) proprietary standards. Standards can also be domain-specific or non-domain specific. In addition, standards can be categorised into different types based on many different typologies. Folmer and Verhoosel (2011, pp.16-21) summarise these types in their research.

Regardless of whether a standard is open or proprietary, it may be accepted by the majority of the community. Such standards, which emerge from the community, are known as de facto standards (e.g. PDF) (Lewis et al 2008). Arnold (1994) as well as Lommel (2006) discuss the disadvantages of de facto standards: e.g. the standardisation of a product that is incompatible with other standards, the standardisation of patented techniques, the prevention of migration to competing products, and tight dependence on the product associated with the de facto standard. More formalised and controlled standards created by standardisation organisations are known as de jure (by law) standards. A problem with de jure standards is that they rarely take real-life experience into account. Therefore, de jure standard implementations become problematic in most cases (Lewis et al 2008). The logical alternative to avoid such drawbacks is the creation of independent, vendor-neutral open standards (Lommel 2006).

In the following, we review open standards in detail, covering their advantages and disadvantages and their relationship with open-source software (OSS).

Open Standards

There are several definitions for open standards. By analysing various definitions for open standards in the literature, Almeida et al (2010) identify three major characteristics that need to be satisfied in order to qualify a standard as an open standard:

 Easy accessibility for all readers and users;
 Developed through a process that is open and fairly easy for anyone to participate in;
 Not controlled by or linked to any specific group or vendor.

Krechmer (2005) proposes another view, consisting of ten rights that enable open standards (open meeting, consensus, due process, open IPR, one world, open change, open documents, open interface, open use, on-going support). These rights have been identified by understanding the different types of rights that may be desired by standard creators, implementers and users. Krechmer (2005) argues that until each Standard Setting Organisation (SSO) clearly indicates which rights of open standard they support and the level of support, “open standards” will just be another marketing slogan.

Tiemann (2006) proposes four levels of open standards:

Open standard level 0: "The standard is documented and can be completely implemented, used, and distributed royalty free." The standard setting process should be transparent and allow independent participation. Furthermore, the standard should be technology and platform independent.

Open standard level 1: "There is specified OSS that can interoperate with the standard." In this scenario, the users can access the source code to gain full control over data. The selection of this standard will invite competition from software developers, while mitigating the risk of vendor lock-in.

Open standard level 2: "There is an OSS software reference implementation of the standard.” While open standard level 1 provides a guaranteed exit strategy from a standard implementation, open standard level 2 provides the ability to review the actual implementation of the standard in the reference implementation.

Open standard level 3: “The implementation of the standard is an OSS implementation.” This is the most suitable route for determining faults associated with the standard. Furthermore, this enables better understanding of the standard and its continuous improvement.

Advantages and Disadvantages of Open Standards

Benefits of open standards include: the elimination of vendor lock-in, the promotion of interoperability by enabling data exchange, better protection for content, and easier conversion from and to other file formats (Almeida et al 2010). Anastasiou (2011) highlights three major benefits of open standards in addition to the above: 1) open access to the standard; 2) ‘customisation’, i.e. the ability to adapt standards according to needs and preferences; and 3) the transparency of the process. Open standards foster innovation, improve transparency, expand and stabilise the market and ensure economic growth (Almeida et al 2010).

Shapiro (2001, p.88) argues that open standards could constrain innovation, for example by imposing limits on the design choices of organisations. This may result either in a reduction in variety or in new products that do not conform to standards. Anastasiou (2011) points out that there is often a lack of awareness about open standards and that extreme customisation of a standard hinders interoperability. Choudhury (2012) also mentions several concerns when adopting open standards: an “overhead of supporting unnecessary features, the perception of restricting creative freedom, the dilution of competitive advantage and the divulgence of proprietary information”.

Open Standard vs. Open-Source

Almeida et al (2010) discuss the relationship between open-source software and open standards. Open standards are natural resources for open-source developers. Although open standards are closely related to open-source software, open-source software by itself does not guarantee interoperability between different implementations. According to Tiemann (2006), the availability of an open-source implementation provides a highly robust mechanism for achieving the benefits of the underlying standard. The availability of the source code of open-source software allows users to modify it so that it conforms to standards. Thus, open-source implementations may promote interoperability through transparency (Almeida et al 2010). Moreover, open-source software is useful as a practical reference implementation of standards, thus encouraging the adoption of standards.

Summary and Conclusions

The aforementioned benefits of open standards motivated us to focus our research on open standards in the localisation domain. As highlighted in the previous section, the availability of open-source implementations of open standards brings enormous benefits. Therefore, it is important to identify the open-source implementations of localisation data-exchange formats in this research.

It is important to note that Tiemann’s interoperability definitions are applicable to open file formats too (Tiemann 2006). Hence, these interoperability levels provide a basis to assess the interoperability of the localisation data-exchange standards in relation to the availability of open-source implementations.

In the next section, we review the literature concerning both standards and interoperability (i.e. the overlapping section of standards and interoperability; see Fig. 4).

2.6.3 Standards and Interoperability

Problems of interoperability can lie in meeting the needs of different standards. (Currie et al 2002)

Figure 4. The Research Scope – Interoperability and Standards

Relationship between Standards and Interoperability

Almeida et al (2010) assert that “standardization of technologies, tools or processes guarantees interoperability” and mention that a high degree of interoperability can only be achieved by strict adherence to standards and specifications. Consistent with Almeida et al, Kindrick et al (1996) also state that a key benefit of standards is successful interoperability. Choudhury (2012) points out that numerous industries have benefited in terms of “enabling automation and fostering innovation” as a result of interoperability driven by standardisation. He further indicates that “traditionally, industry-wide interoperability has happened through the adoption of standards created by a consortium or standards organisation, or through the de facto adoption of a market leader’s schema.” Folmer and Verhoosel (2011, p.33) state that “Standards are the means to achieve the goal of interoperability”, and they quote Chari and Seshadri (2004): “Adopting standard-based integration solutions is the most promising way to reduce the long-term cost of integration and facilitate a flexible infrastructure”.

Enabling interoperability involves making agreements at different interoperability levels (Folmer and Verhoosel 2011, p.33); see Sect. 2.3 for the different levels of interoperability. We perceive standards as a mechanism to present these agreements and to make them explicit. Therefore, different standards may enable interoperability at different levels.

In this research, our main focus is on the identification of issues in both standards and their implementations that lead to interoperability problems. In the next few sub-sections, we review various limitations of standards as reported in the literature.

Achieving Interoperability beyond Syntactic Level with Standards

While most standards help to achieve machine-level and syntactic-level interoperability, they are not always capable of successfully addressing the requirements for achieving semantic and organisational-level interoperability. Lewis et al (2008) present two problems that make it difficult to achieve interoperability at the semantic and organisational levels: 1) technical difficulties associated with the development of a language to represent and model a particular domain; and 2) human disagreements on the constituents of the domain and their importance.

Ambiguities in Standards

Ainsworth (1989) states that “Information Technology (IT) standards generally contain options, 'implementor[sic] defined' and undefined features, and sometimes ambiguities.” This leads to the implementation of particular features in different ways in different products. Krechmer (2005) points out that there is a mismatch between the goals of standard-creators and product-implementers. Product-implementers and standard-creators have different expectations of open standards, and this can result in a lack of tool support for standards. Berges et al (2010) mention that although standards are useful in solving interoperability problems, the problems remain until all the standards are merged into a single standard. Ray (2009) mentions that standards are often developed without giving attention to testing, and highlights the importance of testing standard-based solutions to ensure interoperability, as do Garguilo et al (2007).

Adoption and User Involvement in Standardisation

Arnold (1994) discusses factors that affect standard adoption. These include the potential user community of the standard, the timescale before future revisions and integration with other standards. Arnold observes the need for active involvement of companies (i.e. product manufacturers) in the standardisation process. Companies can closely monitor the standards (e.g. new versions) and influence the development of the standard. Zhao et al 2007 (cited in Folmer and Verhoosel 2011, p.24) identify three main reasons for participating in the standardisation process:

 adaptation of standards to cater for a company’s own business practices and requirements;
 the sooner the standard develops, the earlier the benefits are realised;
 companies can benefit from participating in in-depth discussions with peers.

Soderstrom (2003) points out that the users of standards are very important in the standard development process. However, user involvement in standardisation is not widely discussed in the literature (Soderstrom 2003). Soderstrom mentions that users are "the ones who ultimately decide the success of a standard - by using it or not using it", and summarises the benefits of user contributions to the standard development process. Lelieveldt 2000 (cited in Soderstrom 2003) argues that "every standard requires certain decisions to be made about their use and realisation." As such, a lack of user engagement and feedback adversely affects the standardisation process. This lack of engagement may be due to several factors, for example the high cost associated with obtaining membership of standardisation bodies, or a lack of time to be actively involved in the process.

Why Standards may Never be Enough to Guarantee Interoperability?

Lewis et al (2008) argue that standards may never be enough to guarantee interoperability, especially due to the mismatch between expectations and what truly can be achieved by standards. They describe possible reasons as to why standards may never be enough to achieve interoperability:

1) extensions and customisations; Some standards are very flexible and provide extension points. However, this can lead to proprietary implementations of the standard. Some vendors customise standards in order to compete with similar products. For example, unnecessary use of extensions may result in “accidental ties to particular suppliers” (Arnold 1994).

2) lack of backward compatibility; Standard version incompatibilities of different products lead to interoperability issues. Organisations and vendors face various difficulties especially if a new version of a standard is significantly different from the previous versions.

3) existence of bad standards; A bad standard exhibits one or more of the following characteristics: under-specification; over-specification (e.g. the provision of extreme flexibility in terms of extension points); inconsistencies within the standard; instability (i.e. frequent modifications to the standard); irrelevance (the standard does not address legitimate needs).

4) conflicting standards; Different standards by competing organisations may result in conflicts. There are three ways in which standards can conflict: overlap (several features of multiple standards are common); mutually exclusive standards (e.g. some standards only work with a particular set of other standards); and competing standards (these result from groups that are in direct competition, so many features are common, e.g. Microsoft’s Office and OpenOffice formats).

5) inflexible standards;

Standardisation may be destructive especially for emerging domains as it may hinder flexibility.

In the next section, we review proposed solutions to address some of the issues associated with standards.

Achieving Interoperability using Standards

Lewis et al (2008) propose a four-step process to achieve interoperability using standards. The main idea here is to identify the advantages, limitations and risks associated with standards and devise a careful strategy to mitigate the risks. The first step involves assessing the required level of interoperability among systems. The analysis of end-to-end scenarios is recommended for this purpose. The second step involves developing an understanding of existing standards with respect to the identified levels. This includes the assessment of a standard in terms of its adoption by the industry, its extensibility, the supporters of the standard as well as the standard’s governing body. The third step involves analysing the gaps in the standards. The final step involves filling these identified gaps.

In this research, we aim to address the second and third steps above by assessing and identifying the limitations of the existing localisation data-exchange standards. Moreover, Lewis et al’s research gives us indications of possible limitations of the localisation data-exchange standards.

An Evaluation and Selection Framework for Interoperability Standards

Mykkänen and Tuomainen (2008) identify a lack of attention paid to the systematic analysis of those aspects of standards that are important for application interoperability. They highlight the need for a pragmatic methodology for evaluating important aspects of standards, and propose a conceptual framework particularly for the evaluation of those aspects of standards that are important for application interoperability. This framework comprises the qualitative assessment of a chosen standard using feature analysis. In this framework, a standard is examined against the 54 considerations proposed. These considerations cover important aspects of standards such as: 1) scope and domain of the standard; 2) main approaches to interoperability; 3) information and semantics; 4) functionality, behaviour and interactions; 5) architectural considerations; 6) technical considerations; 7) flexibility, accuracy and extensibility; 8) maturity and diffusion; 9) relation to the system’s life cycle; and 10) domain-specific position of the standard. They show that the qualitative assessment of a standard using the above framework helps to identify and clarify important interoperability-related aspects of the standard. Moreover, it also helps to compare similar standards with each other in a systematic manner. In the same study, Mykkänen and Tuomainen point out that the modularity of standards promotes scalability, innovation and specialisation, especially because different parties can develop different modules.

In this research, we aim to further explore aspects 3, 7 and 10 above, in relation to localisation data-exchange standards.

In the next section we review various testing strategies and approaches to ensure interoperability.

2.7 Conformance, Interoperability and Certification

Standard-based solutions have to be tested to ensure interoperability. The most commonly used testing mechanisms to ensure interoperability are: 1) conformance testing; and 2) interoperability testing (Kindrick et al 1996; Ray 2009; Shah and Kesan 2008).

Kindrick et al (1996) summarise the advantages and disadvantages of conformance testing as well as interoperability testing. The ideal scenario is to perform both kinds of tests; however, this is impractical in most cases due to the associated cost.

The following sub-sections give a detailed overview of conformance testing and interoperability testing.

2.7.1 Conformance Testing

Any consumer knows that just because a product passes a test or conforms to a standard does not mean that the product will satisfy his/her needs. Frenkel 1990 (cited in Arnold 1994)

For Arnold (1994), ‘conformance’ means “that an implementation produces effects which the standard permits.” Conformance testing assesses whether a product meets all the requirements of a standard and behaves as expected by the standard (Arnold 1994; APEC-SCSC 2010, p.8). Kindrick et al (1996) give another definition of conformance testing: “the assessment of a product to determine if it behaves as expected while interoperating with a reference system.” We prefer the latter definition, as it involves the comparison of a product with a reference system in addition to testing against the requirements of the standard.

The history of conformance testing goes back to the early 1970s and the testing of computer programming languages (Ainsworth 1989). Ainsworth (1989), Asano (1990) and Kindrick et al (1996) have all stressed the significance of conformance testing for achieving interoperability. The benefits of conformance testing include (Kindrick et al 1996; Arnold 1994):

 early detection of interoperability problems in a product’s life-cycle;
 the use of carefully developed test suites that maximise coverage while minimising redundant tests;
 the strong connection with standards results in faster problem resolution and feedback to standard development;
 promotion of confidence and momentum for product development in the early stages of the standard’s life cycle;
 provision of widely available, stable working references in the middle-to-latter stages of the standard’s life cycle;
 provision of recognisable status for products;
 easy recognition of problems in products, due to the focus on one implementation at a time;
 the need to perform only a limited number of tests;
 consumers get a known set of functionalities, implemented and tested for conformance;
 the functionalities are made independent from particular suppliers or products.

Conformance test suites are used to evaluate implementations, and they usually consist of a set of test cases specifying precise objectives, operating conditions, inputs and expected outputs (Kindrick et al 1996; Shah and Kesan 2009). However, conformance test suite development is one of the most challenging tasks, as it involves a significant budget, technical expertise and administrative overheads (Ainsworth 1989; Arnold 1994; Kindrick et al 1996; Shah and Kesan 2009). Therefore, the development of their own conformance test suites might be uneconomical and impractical for individual organisations (Ainsworth 1989). Recognising these issues, initiatives exist to pool resources in order to develop conformance test suites.
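As an illustration of what a single entry in such a test suite might look like, the following sketch expresses one conformance test case for a hypothetical XLIFF-processing tool as a Python unit test: a precise input document, a defined operation (here simply an identity stand-in for the tool under test) and an expected output. The tool interface and the document content are invented for illustration; only the XLIFF 1.2 namespace is taken from the published specification.

    import unittest
    import xml.etree.ElementTree as ET

    XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
    NS = {"x": XLIFF_NS}

    # Hypothetical test-case input: a minimal XLIFF 1.2 document.
    TEST_INPUT = f"""<xliff version="1.2" xmlns="{XLIFF_NS}">
      <file original="ui.txt" source-language="en" target-language="fr" datatype="plaintext">
        <body>
          <trans-unit id="1">
            <source>Cancel</source>
            <target>Annuler</target>
          </trans-unit>
        </body>
      </file>
    </xliff>"""

    def tool_under_test(xliff_text: str) -> str:
        # Stand-in for the implementation being assessed; a real suite would invoke
        # the actual tool, e.g. an import/export round trip.
        return xliff_text

    class RoundTripConformanceTest(unittest.TestCase):
        def test_translation_preserved(self):
            # Objective: the target text of trans-unit 1 must survive the round trip.
            root = ET.fromstring(tool_under_test(TEST_INPUT))
            target = root.find(".//x:trans-unit[@id='1']/x:target", NS)
            self.assertIsNotNone(target)
            self.assertEqual(target.text, "Annuler")

    if __name__ == "__main__":
        unittest.main()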

Garguilo et al (2007) discuss the importance of accurately defined conformance criteria as well as implementation statements of conformance in standards. A product that shows a successful level of conformance to a published standard implies successful interoperability with other products (Garguilo et al 2007). However, Arnold (1994) presents a contradictory view and mentions that conformance testing does not guarantee interoperability. It is likely that non-conforming products will not interoperate effectively (Asano 1990). Shah and Kesan (2009; 2008) assert that “the best method to enhance interoperability is conformance testing.” Unlike interoperability testing, conformance testing is a “one-off” operation for each product (Ainsworth 1989; Asano 1990). Therefore, conformance testing is comparatively economical.

Ainsworth (1989) as well as Arnold (1994) highlight that the availability of recognised conformance testing suites has a significant positive impact on the conformance level of products in a market. Kindrick et al (1996) mention that the availability of common testing environments and tools results in better products and reduced development costs. Asano (1990) gives examples of successful scenarios that have proven the effectiveness of making conformance testing suites available.

When conformance testing suites are published, vendors can buy access to those test suites and carry out their own tests prior to product certification or launch (Asano 1990; Kindrick et al 1996). Moreover, vendors themselves can apply for the establishment of accredited conformance testing laboratories.

2.7.2 Interoperability Testing

Kindrick et al (1996) define interoperability testing as “The assessment of a product to determine if it behaves as expected while interoperating with another product.” Unlike conformance testing, interoperability testing does not attempt to cover all aspects of a standard. Aside from all the benefits of conformance testing, interoperability testing also has its own advantages. As mentioned above, interoperability testing is not confined to the requirements of standards. It goes beyond the standards to assess the implementation of important interoperability requirements of users. Therefore, interoperability testing helps to review standards as well as to improve them by identifying important interoperability requirements that are not covered by the standards (Kindrick et al 1996; Anastasiou 2011).

In contrast to conformance testing, interoperability testing takes every possible scenario into consideration, and testing has to be repeated for every pair of products (Shah and Kesan 2008). Therefore, the associated testing cost is very high compared to conformance testing (Ainsworth 1989).
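A small worked example makes the cost difference tangible. Assuming a single reference system and undirected product pairs, n implementations require n conformance test campaigns but n(n-1)/2 pairwise interoperability campaigns, so the effort grows quadratically:

    def test_effort(n_tools: int) -> dict:
        # Conformance: each tool is tested once against the reference system.
        # Interoperability: every unordered pair of tools has to be tested.
        return {"conformance": n_tools, "interoperability": n_tools * (n_tools - 1) // 2}

    for n in (5, 10, 20):
        print(n, test_effort(n))
    # 5  -> 5 conformance runs vs 10 pairings
    # 10 -> 10 conformance runs vs 45 pairings
    # 20 -> 20 conformance runs vs 190 pairings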

Interoperability tests help to identify the drawbacks and the effectiveness of standards. Test results can feed back into the further development of standards (Asano 1990). Errors in the testing methods have to be identified and corrected, as otherwise they will hinder interoperability (Asano 1990).

In the following, we review an empirical methodology proposed to evaluate the interoperability of document formats. It is highly relevant in the context of this research due to the similarity of the objectives of both. It provides us with valuable insights into a systematic methodology for evaluating the interoperability of standards in the localisation domain.

A Framework to Evaluate Interoperability of Document Formats

The interoperability issues between document format standards such as Office Open XML (OOXML) and Open Document Format (ODF) have been widely discussed. Shah and Kesan (2009; 2008) propose a systematic methodology for assessing the level of interoperability among different software implementations of open document formats, particularly the ODF and OOXML formats. They assert the necessity for improved interoperability testing in order to prevent vendor lock-in. The methodology involves comparing different implementations against a chosen reference implementation. Test suites have been prepared to assess the level of tool support for the most frequently used features. Then, a scoring mechanism has been proposed to assess tools’ ability to: 1) read the documents; 2) read and write; and 3) handle metadata. The research presents significant findings related to interoperability issues and lack of solutions for the selected document format standards.

According to this research, a valid reason for tool developers not to offer full interoperability is the lack of a need to support all the features in their tools. However, Shah and Kesan state that users expect 100% interoperability among implementations for various reasons, including actual requirements as well as avoiding potential problems. In this research, they only focus on frequently used features of the standard, as their goal is to assess whether implementations are "good enough for most users.” These routinely used features have been identified by “examining various instructional materials for using office productivity software” (Shah and Kesan 2008). Unfortunately, the details of their procedure for identifying these features are not given.

2.7.3 Certification

Conformance testing provides a basis for product certification (Ainsworth 1989). Product certification by an approved testing organisation will increase the reliability of the product (Asano 1990) and confidence that ‘approved’ products will interoperate (Ainsworth 1989). However, Arnold (1994) warns that “a false sense of security arising from a blind expectation that a certificate of conformance guarantees a particular level of performance is equally dangerous.”

Although certification can be carried out by any organisation, in order to make the certification process meaningful the certifying organisation should possess certain characteristics (Asano 1990; Ainsworth 1989):

1) technical capability to carry out tests and reporting capability;
2) maintain quality of the testing procedures;
3) wide recognition (international recognition);
4) in case of multiple laboratories, the use of the same tools and procedures;
5) trustworthiness and unbiasedness.

ISO/IEC GUIDE 2 (ISO/IEC 2004) defines accreditation of a test laboratory as “the formal recognition that a test laboratory is competent to carry out specific tests or specific types of test.” Accreditation is the process that assures credibility of a testing laboratory. It ensures the above characteristics of a testing laboratory are at a satisfactory level. Usually, accreditation is carried out by the main operating body of a system or a third party with adequate authority (Asano 1990). The main concern of accreditation is on administrative and procedural issues associated with certifying organisations or testing laboratories and to ensure that they are consistent and unbiased with regards to the clients, tools and procedures (Ainsworth 1989). Accreditation is awarded after successful assessment and surveillance of a laboratory (Asano 1990).

Ainsworth (1989) discusses two special considerations that need to be taken into account when certifying products: 1) whether an implementation needs to pass all the tests (including other concerns such as who decides the number of tests to be passed); and 2) the time to carry out retesting (e.g. due to product updates or new releases).

In the next section, we review literature relevant to the localisation domain. This is the third major area considered in our literature review and certainly the most important area of our research.

2.8 Localisation Domain

In Sect. 1.1.2-3, we presented an introduction to the localisation domain (see Fig. 5), including definitions, challenges and interoperability issues related to data-exchange standards. In this section, we mainly focus on literature related to localisation standards (i.e. the overlapping section of standards and localisation; see Fig. 6).

Figure 5. The Research Scope – Localisation Domain

In our research, our scope is limited to studying standards used for localisation data exchange. Standards such as Unicode, or standards related to locales such as the Common Locale Data Repository (CLDR), are not considered in this research as their primary purpose is not localisation data exchange.

In the next section (Sect. 2.8.1), we review important open standards of the localisation domain mainly to narrow down our scope of the research by identifying potential standards to study further.

2.8.1 Open Standards in Localisation

Figure 6. The Research Scope – Localisation Standards

Standards play an important role in today's localisation industry. Standards have become a prominent topic in discussions of interoperability issues in the localisation industry (Lommel 2006). Apart from the influence of information abstraction and information commoditisation requirements, the changing nature of information is another motivational factor for the emergence of localisation standards (Lommel 2006). Standards are important for maintaining consistency between linguistic representations of information. Moreover, they provide high-level models for representing technical specifications (especially for translators), as well as abundant, realisable and interoperable information (Cruz-Lara et al 2008). Another motivation behind the creation of localisation standards is to prevent language service customers from being locked in to proprietary solutions (Savourel 2007).

Hogan et al (2004) highlight the importance of XML standards in software localisation and internationalisation processes and mention that “the provision of XML standards and the ready availability and adaptability of open-source XML processing tools is one of the more exciting developments for the global-minded SME in recent years.” Savourel (2001, pp.7-429) presents an extensive discussion of the use of XML and XML-based standards (i.e. document types) in both internationalisation and localisation processes. Some of the prominent XML-based standards used in the localisation industry include: the XML Localization Interchange File Format (XLIFF); the Translation Memory eXchange (TMX) format; the Term Base eXchange (TBX) format; and the Segmentation Rules eXchange (SRX) format. These and other major open standards used in the localisation industry are briefly described in the following sections.

XML Localization Interchange File Format (XLIFF)

The XML Localization Interchange File Format (XLIFF) is an open standard for the exchange of localisation data and metadata.

The XLIFF standard was first developed in 2001 by a technical committee formed by representatives of a group of companies including Oracle, Novell, IBM/Lotus, Sun, Alchemy Software, Berlitz, Moravia-IT, and ENLASO Corporation (formerly the RWS Group). In 2002, the XLIFF specification was formally published by the Organisation for the Advancement of Structured Information Standards (OASIS) (XLIFF-TC 2008a; XLIFF-TC 2008b; Raya 2004).

The purpose of XLIFF as described by OASIS is to “store localizable data and carry it from one step of the localization process to the other, while allowing interoperability between tools.” By using this standard, localisation data can be exchanged between different companies, organisations, individuals or tools. Various file formats such as plain text, MS Word, DocBook, HTML, and XML can be transformed into XLIFF, enabling translators to isolate the text to be translated from the layout and formatting of the original file format.

XLIFF was originally developed to address the problems arising from the huge variety and ever-growing number of source formats, and has increasingly been used to exchange localisation data (XLIFF-TC 2007; Lewis et al 2009). In addition to its capability of interfacing with other file formats and standards, XLIFF can be used as a mechanism to store translations (i.e. as a container) with relevant metadata (Gracia 2006; Anastasiou 2010a; Lommel 2006). Due to this ability to associate metadata, it can be used as a “localisation data container” or “localisation memory” (Schäler 2007; Schäler and Kelly 2007). XLIFF could even replace the functionality of TMX (Filip 2011) by serving as a TM representation format (Morado-Vázquez and Mooney 2010).

One of the core issues faced by translators and localisers is the need to retain and handle formatting information found in source file formats. XLIFF provides a solution to this problem by separating formatting information from localisable data (XLIFF-TC 2007; Gracia 2006). Similar standards such as PO and TMX do not provide this capability.
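The following much-simplified sketch illustrates the extraction idea: the translatable text of a toy HTML document is pulled out into XLIFF trans-units, while the markup stays behind. Real extraction tools also produce a skeleton file for merging the translations back and handle inline markup; the element names follow XLIFF 1.2, but the example is illustrative only.

    import xml.etree.ElementTree as ET

    XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

    # A toy source document: the markup is layout, the element text is translatable.
    html = ET.fromstring("<html><body><h1>Welcome</h1><p>Save your work.</p></body></html>")

    xliff = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {"original": "page.html",
                                            "source-language": "en",
                                            "datatype": "html"})
    body = ET.SubElement(file_el, "body")

    # Extraction: isolate the translatable text from the layout, one trans-unit per string.
    for i, el in enumerate(html.iter()):
        if el.text and el.text.strip():
            tu = ET.SubElement(body, "trans-unit", {"id": str(i)})
            ET.SubElement(tu, "source").text = el.text.strip()

    print(ET.tostring(xliff, encoding="unicode"))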

XLIFF has been adopted by major content publishers, localisation vendors and tool developers. Most renowned localisation tools, such as Trados, Passolo and MemoQ, support the XLIFF standard (Bly 2010). It is also worth mentioning that commercial localisation tools have been developed entirely based on the XLIFF standard [9]. The open-source community is also interested in XLIFF (Frimannsson and Hogan 2005), and some open-source localisation tools such as Virtaal [10] and OmegaT [11] already support XLIFF. Large content publishers such as Symantec and Oracle (Leahy 2011; Murphy 2010) use XLIFF internally in their workflows. Companies like Microsoft also show an increased interest in the XLIFF standard (O’Donnell and Stahlschmidt 2012); their sponsorship of the pilot study (MSc) and publications such as Wasala et al (2010) and Wasala et al (2012b) provide evidence for this. Another indicator of the growing interest in the XLIFF standard is the annually held XLIFF international symposium. The survey by the Translation Automation User Society (TAUS) (2011c) confirms the significance of the XLIFF standard and the growing industrial interest in improving the standard to enable interoperability (TAUS 2011a).

9 For example Swordfish: http://www.maxprograms.com/products/swordfish.html [accessed 13 Mar 2013]
10 http://virtaal.translatehouse.org/ [accessed 02 Apr 2013]
11 http://www.omegat.org/en/howtos/compatibility.html [accessed 02 Apr 2013]

The XLIFF standard is being continuously developed further by the OASIS XLIFF Technical Committee (TC) (XLIFF-TC 2011). A detailed description about the structure of XLIFF is available in Sect. 3.3.1.

Portable Object File Format (PO files)

Localisation of software in open-source projects is usually carried out using a set of tools known as Gettext. Gettext uses a file format known as the Portable Object (PO) format (Frimannsson and Hogan 2005), a simple text-based file format for storing translations (GNU n.d.) in the localisation process. The PO format can be considered a de facto standard, as it has been widely adopted by the open-source community to represent localisation data (Frimannsson 2005). A PO file can contain a header section consisting of metadata such as comments, references and so on, followed by the translations. Frimannsson (2005) presents a comparison between the PO and XLIFF file formats and proposes an XLIFF representation guide for the PO format.
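To illustrate the structure of such entries, the sketch below parses a tiny PO fragment (comment lines beginning with '#', followed by msgid/msgstr pairs) with a deliberately naive reader; real projects would use the Gettext tools or an established library rather than this hand-rolled parser, and the file content is invented.

    # A minimal PO fragment: '#' lines carry metadata, msgid holds the source
    # string and msgstr holds the translation.
    PO_TEXT = '''\
    #: src/ui.c:42
    msgid "Save"
    msgstr "Enregistrer"

    #: src/ui.c:57
    msgid "Cancel"
    msgstr "Annuler"
    '''

    def parse_po(text: str) -> dict:
        # Very small PO reader: {msgid: msgstr} for simple single-line entries.
        entries, msgid = {}, None
        for line in text.splitlines():
            line = line.strip()
            if line.startswith('msgid "'):
                msgid = line[len('msgid "'):-1]
            elif line.startswith('msgstr "') and msgid is not None:
                entries[msgid] = line[len('msgstr "'):-1]
                msgid = None
        return entries

    print(parse_po(PO_TEXT))  # {'Save': 'Enregistrer', 'Cancel': 'Annuler'}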

Translation Memory eXchange (TMX) Format

The Translation Memory eXchange (TMX), an XML based open standard file format, was developed to exchange translation memory (TM) data between Computer Aided Translation (CAT) and other localisation tools.

The TMX format was developed and maintained by Open Standards for Container/Content Allowing Re-use (OSCAR), a special interest group of the Localisation Industry Standards Association (LISA) (2004). In March 2011, LISA declared bankruptcy and submitted its intellectual property to the European Telecommunications Standards Institute (ETSI) for the maintenance of TMX as well as other LISA standards (SRX, TBX, xml:tm and GMX-V).

The objective of the TMX format as described by LISA (2004) is given below:

“The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process.”

With the advent of TMX, individuals and organisations can maintain and share TM databases in a vendor-neutral and tool-independent format and reuse their content as much as possible. Reusing existing TM content will significantly reduce costs in translation and localisation projects, according to LISA.

The TMX format is currently at version 1.4b (LISA 2004). Before it went out of business, LISA was working on TMX 2.0, which was to support inline mark-up. Today, most major CAT and localisation tools support the TMX format and it is considered to be one of the essential standards in the localisation industry (TAUS 2011a).

Term Base eXchange (TBX) Format

Glossaries are used with CAT or localisation tools to support translators, to achieve consistent translations and to speed up the translation process. In order to eliminate the difficulties of having glossaries in multiple file formats, Term Base eXchange, an XML-based standard for exchanging structured terminological data, was developed by LISA. The International Organization for Standardization (ISO) has accepted TBX as ISO 30042:2008 (LISA 2008b; ISO 2008). TBX is suitable for creating, maintaining and sharing terminological databases, dictionaries and glossaries. Glossaries represented in TBX can readily be used by many major localisation and CAT tools.

Segmentation Rules eXchange (SRX) Format

Different languages have unique segmentation strategies. For some languages, segmentation is a non-trivial task (e.g. in Chinese, Japanese and Korean, the “CJK” languages) and it might be extremely difficult to implement algorithms to automate the segmentation process.

Having identified various interoperability issues associated with the segmentation of multilingual text, LISA defined a standard called Segmentation Rules eXchange (SRX) describing how tools segment text for processing (LISA 2008a). This vendor-neutral standard allows localisation and CAT tools to share and re-use language-specific segmentation rules and algorithms.
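For illustration, a much-simplified SRX rule set might look roughly as follows; the two rules shown (suppressing a break after "etc." and breaking after a full stop followed by whitespace) and the regular expressions are invented for this example:

    <srx version="2.0" xmlns="http://www.lisa.org/srx20">
      <header segmentsubflows="yes" cascade="no"/>
      <body>
        <languagerules>
          <languagerule languagerulename="Default">
            <rule break="no">
              <beforebreak>\betc\.</beforebreak>
              <afterbreak>\s</afterbreak>
            </rule>
            <rule break="yes">
              <beforebreak>\.</beforebreak>
              <afterbreak>\s</afterbreak>
            </rule>
          </languagerule>
        </languagerules>
        <maprules>
          <languagemap languagepattern=".*" languagerulename="Default"/>
        </maprules>
      </body>
    </srx>

Rules are evaluated in order, so exceptions (break="no") are listed before the general breaking rules, and the map rules assign rule sets to languages by pattern.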

Globalisation, Internationalisation, Localisation and Translation (GILT) – Metrics eXchange (GMX)

GMX is a set of standards originally developed by LISA to represent various volumetric information about text in a standard manner (Lommel 2006). Since 2011, the standards developed by LISA, including GMX, have been maintained by ETSI. GMX consists of three parts: GMX-V for volumetrics; GMX-C for representing the complexity of localisation tasks; and GMX-Q for representing quality aspects of localisation tasks.

An experiment carried out in 2001 (Lommel 2006) revealed that the differences in word counts produced by different tools can be as much as 30%, depending on the nature of the source text. Word count is a particularly important unit in the localisation industry, as most localisation costs are estimated on the basis of word counts (Savourel 2007). Therefore, universally agreed word-count definitions as well as comparison mechanisms were needed. GMX-V attempts to address such issues by specifying rules for calculating and storing volume-related metrics (Lommel 2006; Savourel 2007).

Translation Web Services (Trans-WS)

Trans-WS is a specification that is being developed by the OASIS Trans-WS TC to facilitate and automate communication between different entities involved in a translation or localisation project (e.g. translation vendors, localisation service providers, clients) via the internet (Trans-WS 2002). According to the Trans-WS TC (2002), the objective of Trans-WS is to provide a web-service framework so that “any publisher of content to be translated should be able to automatically connect to and use the services of any translation vendor, over the internet, without any previous direct communication between the two.”

Trans-WS mainly focuses on the negotiation of translation and localisation jobs among service providers (Lewis et al 2009) and it aims to address issues related to the transfer of localisation data and metadata, file transmission, job specification and job tracking (Lommel 2006). However, it does not consider language-service integration aspects or workflow automation. Although it is a very important standard in the context of SOA and web services, the development of the standard is currently on hold and the uptake of this standard by the localisation industry is not evident from the literature.

Internationalization Tag Set (ITS)

ITS is a World Wide Web Consortium (W3C) recommendation that defines a set of elements and attributes “to be used with schemas to support the internationalization and localization of schemas and documents” (W3C 2007). With the help of XPath selectors, ITS helps to define and identify various metadata in XML documents (W3C 2007; Savourel 2007) as well as non-XML documents (Filip 2013) that are useful in the localisation process. The metadata categories defined in ITS 1.0 include: Translate; Localisation Note; Terminology; Directionality; Ruby; Language Information; and Elements Within Text (Filip 2013). While the current version is 1.0, a new major version (i.e. 2.0) is almost ready to be released (Filip 2013).
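For illustration, ITS metadata can be attached either locally on individual elements or globally through rules; a minimal sketch of both styles is shown below (the host elements and content are invented for this example):

    <!-- Local usage: mark a span as not translatable -->
    <p xmlns:its="http://www.w3.org/2005/11/its">
      The command <code its:translate="no">grep</code> searches text files.
    </p>

    <!-- Global usage: a rule excluding all code elements from translation -->
    <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
      <its:translateRule selector="//code" translate="no"/>
    </its:rules>

The global rule uses an XPath selector to apply the Translate data category to every matching element, which is how ITS decorates existing document formats without changing their schemas.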

Open Lexicon Interchange Format (OLIF)

OLIF is an open standard for representing lexical/terminological data (OLIF-Consortium 2008). It is maintained by the OLIF Consortium. The OLIF standard is primarily intended for the representation and exchange of Natural Language Processing (NLP) data and therefore differs from TBX. OLIF provides a rich metadata structure suitable for the exchange of complex lexicon entries for tasks such as terminology management and MT (Filip 2013). Various tools to work with OLIF resources have been provided by the OLIF Consortium (Savourel 2007).

Multilingual Information Framework (MLIF)

Having noticed the increasing number of standards related to the localisation industry, Cruz-Lara et al (2008) realised the importance of creating normalised terminological and methodological linguistic resources to improve information exchange and cooperation between standards. Cruz-Lara et al (2008) proposed MLIF, a new standard to normalise multilingual data and products. According to Cruz-Lara et al (2008), “MLIF aims at proposing a high-level abstract specification platform for a computer-oriented representation of multilingual data within a large variety of applications such as translation memories, localisation, computer-aided translation, multimedia, or electronic document management.”

MLIF is currently under development within ISO (AWI 24616) and it is a metamodel (i.e. a high-level generic mechanism for representing content within a specific context following elementary notions: structure, information and methodology) for conceptually representing multilingual content. MLIF attempts to cover all standards related to localisation by specifying a high-level model that represents and integrates a whole set of actors of the translation and localisation domain. The MLIF standard helps to achieve interoperability among localisation tools and to evaluate or compare different multilingual resource tools (Cruz-Lara et al 2008). MLIF is especially focused on representing the exchange of multilingual textual information. However, localisation has to deal not only with textual content, but also with non-textual content such as binary files and images.

XML based Text Memory (xml:tm)

XML based Text Memory (xml:tm) is an open XML-based standard developed by LISA that allows the embedding and representation of translation memories in any XML document (Gracia 2006). xml:tm uses an XML namespace mechanism mainly to embed the following information in XML documents (OAXAL-TC 2009; Gracia 2006):

1) Author Memory - different text segments identified by unique identifiers;

2) Translation Memory - specifies the text that is to be translated.

xml:tm supports interoperability among other localisation standards and it has been designed to work closely with SRX, XLIFF, TMX and GMX standards (Gracia 2006).

Language Interoperability Portfolio (Linport)

Language Interoperability Portfolio (Linport) is the latest localisation standard, being developed within the Linport project12. Linport provides an open, vendor-neutral format for “packaging translation materials” (Benitez 2011b). The proposed package is a zip container, mainly consisting of source and target text, various administrative data and references (Benitez 2011a). The Linport project is a collaboration between the Globalization and Localization Association (GALA), the European Commission Directorate-General Translation, and the Brigham Young University Translation Research Group. The first Linport symposium13 was held in 2011 to bring together interested parties in support of the emerging standard. It is expected that Linport will be hosted under OASIS as a co-standard with ISO (Andrä 2012; Vándor et al 2012).

Open Architecture for XML Authoring and Localization Reference Model (OAXAL)

The Open Architecture for XML Authoring and Localization (OAXAL), being developed under OASIS, defines a reference model (an abstract framework) describing the optimum use of key localisation standards. The stack of standards used in the OAXAL reference model includes: XLIFF, ITS, SRX, TMX, GMX, xml:tm and Unicode TR#29. The OAXAL reference model describes an extensible design architecture based on open standards to automate localisation workflows, mainly concerning the authoring and localisation of XML-based documents (OAXAL-TC 2009; Zydroń 2008).

12 See Linport project website: http://www.linport.org/ [accessed 13 Mar 2013]
13 See conference website: http://www.linport.org/linport-s1.php (1st Linport Symposium, 27 September 2011, Luxembourg) [accessed 13 Mar 2013]

2.8.2 Localisation Standards and Interoperability Issues

Figure 7. The Research Scope – Localisation, Standards and Interoperability

In this research, our core interest is in interoperability issues related to localisation data-exchange format standards (see Fig. 7). A small number of academic studies in this area have been reported in the literature in recent years. The following section gives an overview of prominent issues associated with major localisation standards.

Increasing Number of Standards

As presented in Sect. 2.8.1, a number of localisation-related standards already exist, and their number continues to grow. A growing number of standards within a single domain is itself a problem. Lack of attention to the completeness of interchange formats, renewals, and competition are some of the causes of the succession of standards (Cruz-Lara et al 2008); in particular, incompatibilities between standards greatly affect interoperability.

Compatibility among Standards

Although developed to standardise different aspects of localisation, the standards TMX, XLIFF and OLIF have overlapping features (Lommel 2006; Cruz-Lara et al 2008). The lack of synergy between these standards affects their interoperability. Cruz-Lara et al (2008) point out that most of the translation-related standards are incompatible with each other and mention that normalisation of standards has to be carried out in parallel to standard development in order to minimise the incompatibilities among standards. The MLIF standard has been proposed to normalise localisation standards. However, neither uptake of this standard by the localisation community nor further development of the standard has been visible.

De Facto Standards

Apart from open standards, some proprietary formats may become de facto standards owing to their popularity and widespread usage. Lommel (2006) highlights the disadvantages of de facto standards and points out that, to a certain extent, the localisation industry commonly uses a proprietary format that has become a de facto standard within the industry.

Lack of Standard Support in Tools

Lack of standard support in CAT tools is also a known issue (Anastasiou 2010a; TAUS 2011a). Savourel (2007) identified that data exchange between tools using TMX is not seamless due to implementation differences. Choudhury (2012) exemplified the significance of this problem using a response to their survey by a buyer who mentioned “we saw a leveraging loss of more than 20% when we switched from one CAT tool to another using TMX for data migration.” Limited support of the XLIFF standard and related issues are discussed in greater detail in Sect. 2.8.3.

Segmentation Issues

Lommel (2006) and Savourel (2007) discuss how segmentation affects the interoperability of localisation content. Despite the existence of the SRX standard, segmentation issues have not yet been fully resolved. This greatly affects interoperability as well as leveraging.

2.8.3 Evaluation of the XLIFF Standard

The core focus of our research is the XLIFF standard. We are especially interested in XLIFF’s tool support, extensibility features, and complexity. The following section discusses related work in this area in greater detail.

XLIFF Tool Support

A survey conducted by Morado-Vázquez and Filip (2012) reports the status of XLIFF support in computer-aided translation (CAT) tools. This report tracks quarterly changes in XLIFF support in all major CAT tools. The survey was based on a questionnaire designed by the XLIFF Promotion and Liaison Sub-Committee of the XLIFF TC and is aimed at CAT tool producers. The main objective of this survey is to iteratively collect information that is useful for understanding the level of tool support for XLIFF. The survey reports a detailed characterisation of these tools with respect to XLIFF version support, use of custom extensions, and XLIFF element and attribute support. In their survey, they tactfully avoid the use of the word “support” due to its ambiguous and prompting nature. Instead, they used the phrase “actively used elements” during the data collection phase. Only “Yes” and “No” answers have been collected. As such, the level of tool support for a certain element or attribute is questionable (e.g. the fact that a tool “actively uses” an element does not necessarily imply that it conforms to the XLIFF mandatory requirements for that element). Moreover, the credibility of the survey results is somewhat doubtful, since tool producers may provide biased or misleading information when responding to a survey that their competitors will also answer and whose results are made publicly available.

Bly (2010) analysed XLIFF support in commonly used tools and presented a matrix containing tools and their level of support for individual XLIFF elements. The level of support is divided into three categories: no support, partial support and full support. Unfortunately, again the author has not defined the terms “support” and “partial support” precisely. The term support can lead to confusion (for example, given the fact that tool T supports element E, does this imply that T can process all the attributes of E?). Furthermore, the work, as presented, lacks significant details. For example, the methodology used to conduct the study is not revealed. Likewise, the development of the test suites (e.g. source files) has not been described and the test suites have not been published. This makes the study difficult to replicate and refine.

Nevertheless, Bly (2010) contributes valuable insights: he concludes that tool vendors can conform to standards but still lock users in to their tools. Moreover, he discusses various problems associated with the XLIFF standard, such as its inability to support all the features offered by tools, and its lack of tight definitions. Notwithstanding, he is convinced that “XLIFF is the best (only) hope for interoperability.”

Imhof (2010) also performs an analysis similar to Bly’s (2010). Imhof (2010) identifies three levels of XLIFF support in tools: tools belonging to the first level treat XLIFF files as just XML files; tools belonging to the second level support XLIFF partially (i.e. these tools are capable of interpreting source and target content, but only a limited set of features of the XLIFF standard are implemented); and tools with full XLIFF support are grouped into the third level. As in the case of Bly’s analysis, Imhof also fails to describe the methodology followed in his study.

Frimannsson and Lieske (2010) identify three factors relating to XLIFF and its implementations that adversely affect data exchange and interoperability. They are:

• existence of only a few XLIFF implementations;
• lack of coverage by implementations (i.e. implementations cover only a subset of XLIFF);
• incorrect handling of XLIFF by implementations.

In another study, Morado-Vázquez and Wolff (2011) present the open-source computer-aided translation (CAT) tool “Virtaal”, which claims to support XLIFF. They compare its level of XLIFF support with the matrix presented by Bly. Morado-Vázquez and Wolff conclude that Virtaal is better in terms of XLIFF support than the average XLIFF editing tool checked in Bly’s study. Interestingly, Morado-Vázquez and Wolff point out a weakness in Bly’s methodology: Bly’s analysis does not take into account the relative importance of different parts of the XLIFF specification. To address this issue Morado-Vázquez and Wolff propose a “weighted sum model” as a possible improvement. Furthermore, they highlight the importance of an element’s attributes and attribute values for tool interoperability. As in Bly’s case, the exact methodology used to evaluate their tool, the test suites, and the intended meaning of the term “support” are not included in their publication. Although they mention the use of Bly’s analysis methodology to evaluate Virtaal, the paper does not make that methodology explicit, nor does it state whether Bly’s test suites were used to evaluate Virtaal.
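To illustrate the idea behind such a weighted sum model (the notation here is ours and purely illustrative, not taken from Morado-Vázquez and Wolff), a tool's overall level of XLIFF support could be expressed as

    support(tool) = w1·s1 + w2·s2 + ... + wn·sn

where si is 1 if the tool supports element or attribute i and 0 otherwise, and wi is a weight reflecting the relative importance of element or attribute i within the XLIFF specification.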

In Anastasiou and Morado-Vázquez (2010) several interoperability tests were performed with three XLIFF compliant CAT tools. Like Bly (2010), they classified selected tools into two categories: XLIFF converters (i.e. generators) and XLIFF editors. Out of the three CAT tools selected, they found that two had the capability to generate XLIFF content and three had the capability to edit XLIFF content, so they were interested in four combinations: for each converter they wanted to see if the other two editing tools could edit the generated content. The researchers’ methodology involved five steps:

1) conversion of a selected HTML file into XLIFF (using the two converters);

2) validation of the converted XLIFF file;

3) manipulation of the XLIFF file using the editors;

4) manual analysis of the XLIFF file;

5) back conversion of the file into HTML and a manual analysis of the converted file.

The results showed that out of the four combinations (i.e. XLIFF generators and editors) considered in this research, only one pair of tools seemed to interoperate successfully. The authors recommend “simplicity, better communication between localisation stakeholders and clarification of specification” of the standard and suggest future work on expanding the experiment with more CAT tools as well as different file types. It should also be noted that their experiment only considers the back-conversion of the XLIFF files using the tool that was used to generate the XLIFF file. A better analysis could be carried out if all possible scenarios were taken into account during the back-conversion process too.

One of the reasons behind the lack of tool support for the XLIFF standard is the absence of a proper conformance clause in XLIFF (Filip 2011; Anastasiou and Morado-Vázquez 2010). This also reflects Bly’s ‘lack-of-definition’ finding. Anastasiou (2011) stresses that “conformance clauses should include criteria about compliance with both Localisation and Semantic Web standards.”

Complexity of XLIFF

Among the other limitations noted is the complexity of the XLIFF file format. XLIFF is considered a very complex standard (Anastasiou 2010a; Anastasiou and Morado-Vázquez 2010). Anastasiou and Morado-Vázquez (2010) observe that the existence of a large number of elements, attributes and pre-defined attribute values in XLIFF contributes towards its complexity (i.e. XLIFF v.1.2 has 38 elements, 80 attributes and 269 pre-defined values). Frimannsson and Lieske (2010) describe XLIFF as an “overly complex, bloated” file format developed under a standards body. They offer another view of XLIFF’s complexity, arguing that the complexity of XLIFF is mainly due to its large scope. They go on to list several dimensions that are encompassed in this large scope. The dimensions include:

• generic representation of inline mark-up
• support for linguistic processing like segmentation
• support for internationalisation (language identifiers, markers for non-translatable content, notes for process participants)
• rich metadata on the level of strings (example: length restrictions and string type)
• rich metadata to log workflow events
• extensibility features

Finally, they point out that XLIFF covers several domains and layers such as “extraction domain and translation domain”, and this also contributes to XLIFF’s complexity.

In addition to the above, Bly (2010) as well as Lieske (2011) point out tools’ ability to achieve the same effect using different tag configurations and semantics, suggesting an ambiguity in the standard. Bly (2010) gives some example scenarios to illustrate this.

Lack of Awareness

In another study, ‘lack of awareness’ has already been identified as a drawback of open standards (Anastasiou 2011). Although XLIFF has been available since 2002, a recent survey (Anastasiou 2010b) revealed that while 17% of the participants had heard of the XLIFF standard, they were not aware of its functionality. As indicated by Reynolds et al (2011), the popularity ratio of XLIFF is 30%; however, the source for this statistic is not cited. Despite the lack of awareness about XLIFF, interestingly, the survey conducted by TAUS on XLIFF (Choudhury 2011) shows that the majority of the participants (i.e. 51% of 434 respondents) are currently using XLIFF.

Issues with Extensibility

While the extensibility of the standard provides certain benefits (e.g. flexibility to adapt and enhance), it may also lead to interoperability problems (e.g. alternatives to standardised methods, non-documented extensions and different flavours of the standard) (Vackier 2010; Filip 2011; Lieske 2011; Anastasiou 2011; Anastasiou and Morado-Vázquez 2010). Bly (2010) asserts that “Extensions are #1 barrier to interoperability” and provides details as to how improper use of extensions affects interoperability. Consistent with Bly, Imhof (2010) states that extensions “lead to incompatibilities as the XLIFF standard does not enforce usage of its own features.” Vackier (2010) presents a detailed study on both the advantages and disadvantages of XLIFF’s extensibility.

Metadata Support

Lewis et al (2009) point out that XLIFF is a resource-oriented standard and is therefore not suitable for metadata aggregation. For example, XLIFF lacks native support for information such as user information and workflow status (Lewis et al 2009). While a way around this is to use extensions, Frimannsson and Lieske (2010) propose the use of new approaches such as the Resource Description Framework (RDF) to represent additional metadata.
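To sketch the kind of approach they have in mind, the following hypothetical RDF/XML fragment (the workflow vocabulary, namespace and identifiers are invented for this example and are not part of any published proposal) attaches workflow metadata to a translation unit by reference, rather than embedding it in the XLIFF file itself:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:wf="http://example.org/workflow#">
      <rdf:Description rdf:about="project.xlf#tu-42">
        <wf:status>translated</wf:status>
        <wf:assignedTo>reviewer-2</wf:assignedTo>
        <wf:dueDate>2013-05-01</wf:dueDate>
      </rdf:Description>
    </rdf:RDF>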

Issues with Inline Mark-up

Different tools represent formatting information using different tags. The lack of a properly standardised mechanism for format mark-up (i.e. inline mark-up) has been a known issue for both the TMX and XLIFF standards (Lommel 2006; Savourel 2007; Cruz-Lara et al 2008; Filip 2011; Imhof 2010).

Miscellaneous Interoperability Issues

Other factors that cause interoperability issues, as mentioned by Anastasiou and Morado-Vázquez (2010), include the lack of converters and version updates of the standard. Undefined processing expectations and poor open-source tool support (Frimannsson and Lieske 2010; Lieske 2011) will also have to be dealt with in relation to XLIFF.

2.8.4 Existing Solutions

The following section summarises proposed solutions to address some of the limitations described in the previous two sections (Sect. 2.8.2 and Sect. 2.8.3).

The IGNITE Project

Having identified various issues related to localisation data exchange, standards and tool support, an EU funded project named IGNITE was launched in April 2005 with a number of objectives.

The IGNITE consortium was established by bringing together five EU-based organisations representing both industry and academia. The LRC was the main coordinator and contributor to the IGNITE project; the other project partners from industry were Archetypon, Pass GmbH, VeriTest and Vivendi Games. The initial objectives of this two-year project were (Schäler and Kelly 2007):

1) to create and enable convenient access to a central repository of European linguistic and localisation resources and to establish a linguistic resources support network consisting of various stakeholders of the localisation industry;

2) to implement a demonstrator capable of reproducing localisation scenarios in a laboratory environment in order to verify localisation standards as well as to demonstrate the best practices in the localisation processes;

3) to establish the LRC as an independent body that validates localisation standards, verifies implementations for standard-compliance, tests for interoperability and certifies implementations.

However, as the project evolved, more emphasis was placed on the second objective and this resulted in a prototypical implementation called the “IGNITE Localisation Factory”, a configurable and extensible automated localisation workflow execution system that demonstrates the effectiveness of using standards in localisation processes (Schäler 2007; LocFocus 2006). Other important achievements and results of this project include the IGNITE Information Repository, the proposal to use XLIFF as a “localisation memory” and the IGNITE certification programme. These deliverables are briefly described in the following (Schäler 2007; Schäler and Kelly 2007).

The IGNITE Localisation Factory

The IGNITE Localisation Factory is a proof-of-concept of a standard-based end-to-end automated localisation workflow execution system. It uses XLIFF as a data container within its processes. The IGNITE Localisation Factory demonstrates the use of XLIFF to address interoperability between tools and also illustrates how to effectively deal with an increasing number of file formats for localisation.

The IGNITE Information Repository

The IGNITE Information Repository is a web-based infrastructure that enables storage, maintenance and access of a wide variety of resources used in localisation processes. The resources include translation memories, terminology databases, standards and guidelines, and external links. The resources have been made freely available to the public via a web-portal.

XLIFF as a Metadata Container

One of the significant outcomes of the IGNITE project which resulted from the IGNITE Localisation Factory was the development of a “localisation memory” container. A localisation memory container is an XLIFF-based independent extensible container for the representation and exchange of all localisation relevant data and metadata. It is a tool-independent resource container that can be used in end-to-end localisation workflows.

The IGNITE Certification Program

The third objective stated above has been achieved by proposing a proof-of-concept tool compliance testing and certification framework. This framework especially focuses on certifying tools against the XLIFF v. 1.2 standard. The framework involves checking a given XLIFF file against a pre-defined checklist in addition to validating the file against schema. The checklists assess a file for its efficiency, extensible elements in it (i.e. ratio of non-XLIFF elements to total elements) and tool capabilities. The tool capabilities are evaluated against a separate checklist.

Despite the reported success of this project, unfortunately the IGNITE project is no longer actively supported or being continued.

A Framework to integrate Globalisation, Internationalisation, Localisation and Translation (GILT) and Semantic Web

Anastasiou (2011) presents a theoretical framework based on open standards for achieving interoperability between globalisation, internationalisation, localisation and translation (GILT) and the Semantic Web. The methodological approach of the framework consists of the following steps (Anastasiou 2011):

i. Internationalisation should be taken into account when standards are developed in the field of Semantic Web and semantics should be considered when GILT standards are created;
ii. Both Localisation and Semantic Web standards should have requirements which should be compatible with each other;
iii. Conformance clauses should include criteria about compliance with both Localisation and Semantic Web standards.

As a prototype implementation of the above framework, an XLIFF-to-RDF converter has been developed and published. Anastasiou further mentions that any file format that can be converted to XLIFF can be converted to RDF using this converter, and therefore the converter can be used to bridge other file formats to RDF as well.

Interoperability Now! Initiative

An initiative named "interoperability-now14" was established in 2010 (during the first XLIFF symposium) to create an open platform to collaborate on improving the interoperability of tools and technology within language infrastructures. It is mainly an alliance of LSPs and tool providers (BIOLOOM Group, Kilgray, Medtronic, ONTRAM, Spartan Software, XTM and Welocalize). Their objectives are twofold: first to propose a pragmatic solution to address various interoperability issues encountered

14 http://interoperability-now.org [accessed 13 Mar 2013]

while sharing localisation project data. Second, to “provide a mechanism to allow lossless exchange of XLIFF data between tools in the localization workflow” (Bly 2012). In achieving the first objective, they propose the Translation Interoperability Protocol Package (TIPP), a vendor-neutral localisation data container that can be used to share localisation project information. Having realised that there is an overlap of goals with Linport, both groups are now working together to propose a single container for localisation project data (Vándor et al 2012). Their main contribution towards achieving the second objective is the proposal of XLIFF:doc, a reference guide for representing documents such as office documents, FrameMaker, InDesign, XML formats, HTML and XHTML in XLIFF (Vándor et al 2012; Bly 2012). They mention that interoperability is achieved by using a shared namespace (i.e. an XLIFF extension) and by constraining files to a specific subset of the XLIFF 1.2 vocabulary (Bly 2012). However, the former criterion contradicts the argument against using extensions in XLIFF; i.e. it is widely discussed that XLIFF extensions diminish interoperability (see Sect. 2.8.3).

Interoperability Watchdog Initiative

The Translation Automation User Society (TAUS) is also actively engaged in enabling interoperability through their “Interoperability Watchdog” initiative15. One of the objectives of this initiative is to contribute to the XLIFF standard in order to improve data-exchange interoperability. TAUS performed several activities towards achieving this objective. One of the first was a global survey, conducted with the help of the XLIFF TC, on XLIFF’s adoption, usage and future requirements. 434 respondents from 37 countries participated in this survey. The survey results were presented at the second XLIFF symposium (Choudhury 2011). One of the highlights of the survey is that the majority of the respondents believe that XLIFF needs fewer extensions and fewer flavours.

Among the other important activities under the above initiative are the proposal of XLIFF core and XLIFF modules (Choudhury 2011) to the XLIFF TC and the

15 http://www.translationautomation.com/interoperability-timeline/an-interoperability-timeline [accessed 13 Mar 2013]

publishing of the TAUS Translation API (Waldhör and Choudhury 2012) to address the need for a common API.

Next Generation Localisation Factory

In 2008, Schäler (2008) proposed the concept of a component-based distributed localisation process. Schäler described an eco-system of distributed components autonomously interoperating towards enacting localisation workflows depending on available resources and requirements. Later on, Lewis et al (2009) built on these concepts and proposed a conceptual interoperability framework known as a “Next Generation Localisation Factory” based on SOA. The main objective of this study is to integrate various localisation web services using a Business Process Execution Language (BPEL) based orchestration engine to instantiate commercial localisation workflows. XLIFF is proposed as the primary data-exchange format in this framework. The web services described in this study include an MT web service which leverages translations from openly available MT systems, a text analytics web service that enriches an XLIFF file by adding various metadata derived from examining the XLIFF content, and a crowd-sourcing web service. In this architecture, such web services are orchestrated via BPEL to model localisation workflows.

XLIFF Symposium

The first XLIFF Symposium stimulated extensive discussions on interoperability issues related to the XLIFF standard. A summary of the outcomes of the symposium, encompassing discussion of prominent issues along with important proposals on improving the XLIFF standard that are under the consideration of the XLIFF TC, is presented by Lieske (2011). Lieske categorises the feedback received on improving XLIFF into three categories: simplification, clarification and extension. For the simplification of XLIFF, modularisation of the standard is proposed, where a schema will be defined with a minimalistic XLIFF and other optional modules. In terms of clarification, the importance of making the XLIFF processing requirements explicit is highlighted. Lieske presents several possible extensions to XLIFF, such as the provision of a programmable object model and the creation of a container-based representation mechanism (e.g. as in the case of the DOCX file format). It is worth noting that some of those improvements are direct contributions of our PhD research, as part of the pilot study and our main experiment (Wasala et al 2010; Wasala et al 2012b; Wasala et al 2012a). The XLIFF Symposium is now held annually following the success of the first XLIFF symposium. The XLIFF TC is currently working on XLIFF 2.0, aiming to solve prominent interoperability issues (Filip 2011; XLIFF-TC 2011).

2.9 Conclusions

In this chapter, we have summarised the discussion in the literature of the main topics concerning our research.

As discussed in Sect. 2.4, interoperability is of course not only a problem in localisation, but in other domains too. Ray (2009) observed that matters around interoperability are, in principle, similar across different industrial domains; hence cost and time can be saved by looking across different domains for solutions, best practices and cost-saving tools.

In Sect. 2.8.2 we identified that one of the central problems in localisation is a lack of interoperability, caused by gaps and deficits in localisation standards and their implementations. Academic research aiming to address interoperability issues is still in its infancy. While most researchers report issues with the implementations and the XLIFF standard itself (Anastasiou 2010a; Bly 2010; Imhof 2010; Vackier 2010), very few propose solutions to address these issues (Anastasiou 2010a; Lieske 2011; Schäler 2007; Frimannsson and Lieske 2010; Lewis et al 2009). Although standard-related interoperability problems have been solved in other domains using various interoperability frameworks (Folmer and Verhoosel 2011, p.38), and although it can reasonably be expected that a similar approach would also help to address interoperability in localisation, such frameworks have not yet been developed for localisation.

In summary, there is a need for an interoperability framework (or a systematic methodology), similar to those used in other domains, that facilitates the discovery of limitations of standards and implementations, that can be used to develop recommendations and guidelines as well as systematically derived test suites for improving data-exchange standards, and that facilitates standard verification and compliance. Our research will propose such a framework for XLIFF, and investigate its role as the predominant data-exchange standard in localisation today, with a strong potential to contribute significantly to interoperability in localisation and to the problem of lossless exchange of data between localisation technologies.

Chapter 3: Data Exchange Format Feature Analysis and Comparison

3.1 Introduction

In our research, a pilot study was undertaken prior to the main study. This pilot study explored the general characteristics of localisation data-exchange formats, their advantages, limitations, and the interoperability issues between them. The scope of the pilot study was limited to comparing two file formats, investigating the coverage and compatibility of the two file formats and implementing a demonstrator converting data between them. XLIFF, the open localisation standard maintained by OASIS, and Localisation Content Exchange (LCX), the internal proprietary localisation standard used by Microsoft, were chosen for this study.

Microsoft is one of the largest publishers of localised software in the world, its main localised software being Windows and Office. These were being localised into around 100 languages by the end of 2012 (O’Donnell and Stahlschmidt 2012). To give an idea of the complexity involved in this process, Microsoft Office 2003’s help system alone consists of over 700,000 words (Ryan et al 2009). The reason behind Microsoft’s successful localisation strategy is their “proven large scale localisation model” (Ørsted and Schmidtke 2010), developed over 15 years. LCX plays a key role in this process. Therefore, LCX is an ideal candidate for our comparison. Furthermore, Microsoft is an official partner organisation of the CNGL project. Thanks to their interest in our project and their support, we had access to their localisation resources throughout this research.

The following section briefly introduces the LCX format. An overview of the document structures of both the LCX and XLIFF formats and their typical usage is discussed next. The chapter then goes on to describe the design of the pilot study, followed by results and conclusions.

3.1.1 Localisation Content Exchange File Format

In order to address the different needs of the Microsoft localisation community, Microsoft uses a series of XML files to store software localisation data and metadata. These files, together, are referred to as the Localisation Content Exchange (LCX) container or the Localisation Content Exchange file format. LCX evolved as a replacement for a prior, database-driven, software localisation model and toolset, with the main change being a move to store data in XML. The LCX-based architecture provides a programmable Object Model (OM) to access localisation content encoded in LCX files programmatically. The benefits of the LCX container (Microsoft 2009) include:

• common localisation project and transportation format;
• diff-able text files instead of a binary format;
• published XML schema;
• data transparency through a complete object model.

These LCX files are processed by the Microsoft Localization Studio (LocStudio) suite of tools, and other Microsoft software localisation tools. The LocStudio suite consists of several applications and utilities that provide functions ranging from localisation content extraction to localised product building.

3.2 Pilot Study Design

3.2.1 Data Collection

The dataset used for this pilot study consisted of both existing data and data generated specifically for this research. Initially, sample XLIFF files were collected. These comprised files obtained through the CNGL industrial partners, and openly available XLIFF files downloadable via the Internet. Then, with the aid of several localisation tools, XLIFF files and LCX files were generated from Win32 source files. For further analysis, Windows resource (*.res) files were also extracted16 from some of the above source files.

16 Resource files were extracted using Resource Hacker, a freely available tool that can be downloaded from: http://www.angusj.com/resourcehacker/ [accessed 6 Aug 2009]

The study included a brief questionnaire and two main semi-structured interviews. The main objectives of the questionnaire as well as the interviews were to clarify various issues encountered with the LCX format. The questionnaire and the interviews were targeted at subject-matter experts at Microsoft. An interview and observation at Microsoft helped to understand the localisation processes associated with the LCX format and the usage of the LCX’s object model.

3.2.2 Methodology

The first phase of the methodology involved desk research, reviewing official documentation of the LCX and XLIFF formats as well as related literature on both file formats. In the second phase, an element-by-element comparison of the LCX and XLIFF schemas was carried out. The third phase involved a manual analysis of sample LCX, XLIFF and resource files that represented localisation data extracted from selected Win32 applications. In the fourth phase, an experiment was designed and carried out to evaluate the interoperability between the XLIFF and LCX formats as well as to evaluate the ability of XLIFF to support source-file diversity. The experiment involved designing a mapping between the two file formats and the development of a file format converter.

3.3 Document Structures of XLIFF and LCX

The following subsections (Sect. 3.3.1 and Sect. 3.3.2) present the document structures of both the XLIFF and LCX file formats.

3.3.1 Document Structure of XLIFF

General Structure

An XLIFF file begins with an XML declaration followed by an XLIFF document enclosed within the <xliff> element. An XLIFF document may consist of one or more <file> elements. A <file> element can only represent a single source and one target language (Savourel 2001, pp.384-385). A <file> element may correspond to a real or virtual file, such as part of a database (Lionbridge 2006). A <file> element has several attributes to identify source and target languages as well as attributes to relate the element to the original document. Each <file> element contains a <header> section for storing various metadata about the project and a <body> section which contains localisable data. The <header> element may comprise various elements, such as <skl>, <phase-group>, <glossary>, <reference>, <count-group>, <prop-group>, <note> and <tool>, to enclose various metadata such as contact information, project phases, reference material and information about the skeleton file (Corrigan and Foster 2003). The <body> element contains localisation data arranged in a hierarchical manner. A <body> element can contain textual or binary translation units; binary units hold resources such as images and icons. Textual data is represented by <trans-unit> elements whereas binary data is represented by <bin-unit> elements. The translatable content is organised into paired <source> and <target> elements; <source> elements hold the original data for localisation, whereas <target> elements contain the data that has been localised. Furthermore, a <trans-unit> element may contain a number of <alt-trans> elements containing alternative translations. Translation units can be grouped together recursively in a hierarchical manner using <group> elements.
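For illustration, a minimal XLIFF 1.2 document reflecting this general structure might look as follows; the file name, languages and content are invented for this example:

    <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
      <file original="sample.html" datatype="html"
            source-language="en" target-language="de">
        <header>
          <note>Reference material and other project metadata can be placed here.</note>
        </header>
        <body>
          <group id="g1">
            <trans-unit id="1">
              <source>Save the file.</source>
              <target>Speichern Sie die Datei.</target>
              <alt-trans>
                <target xml:lang="de">Datei speichern.</target>
              </alt-trans>
            </trans-unit>
          </group>
        </body>
      </file>
    </xliff>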

Workflow Metadata

Workflow metadata can be stored in an XLIFF document during different phases of the localisation process. This metadata can be associated with different translation elements. Metadata specific to different localisation tools and different phases can be stored in <phase> elements. The information stored in a <phase> element may describe the process or operation carried out on a segment (e.g. translation, editing) and the tool used to perform the process or operation (Lionbridge 2006). Translation elements can then be associated with the phases defined in the <header> section (Savourel 2003; Reynolds and Jewtushenko 2005; Frimannsson 2005). In addition, XLIFF provides a common set of attributes to store metadata specific to certain elements. For example, a few attributes defined for the translation unit include: maxwidth, minwidth, size-unit, datatype and restype.

Abstraction of Inline Mark-Up

Usually, the layout and formatting information, such as markers for bold or italics, image references, external links and so on, that is stored within special mark-up elements of the source document should not be translated. XLIFF provides a mechanism to abstract these mark-up elements and to enclose this information separately within the appropriate inline elements. These inline elements can be represented either by using an encapsulation mechanism or a placeholder mechanism (Savourel 2003). In the encapsulation mechanism, special mark-up elements are enclosed within the XLIFF inline elements. These inline elements include: <bpt> (begin paired-tag), <ept> (end paired-tag), <it> (isolated tag), and <ph> (placeholder tag). Text spanned within the encapsulated code can be delimited by using a <sub> element. In the placeholder mechanism, the extracted mark-up elements are stored in the skeleton file and XLIFF placeholder elements are used to track their location in the source document. Placeholder elements include <g>, <x/>, <bx/> and <ex/>. The <g> element is the generic placeholder element whose start tag is the equivalent of a <bpt> element and whose end tag is the equivalent of an <ept> element. The <g> element is generally used to represent text formatting information (e.g. font-face, bold, italic) (Reynolds and Jewtushenko 2005). The <x/> element is the placeholder equivalent of the encapsulation element <ph>, and it is useful in representing empty mark-up elements such as an HTML image tag (Reynolds and Jewtushenko 2005). Overlapping mark-up codes can be handled using <bx/> and <ex/> elements (Savourel 2003). The isolated tag <it> is used to delimit a beginning or an ending sequence of an original mark-up that does not have a corresponding ending or beginning marker within a translation unit (Raya 2004). In addition to the above inline elements, the <mrk> element is used to delimit a span of text. However, it is a general-purpose element (Savourel 2003) and it can also be used to demarcate various properties of text such as semantic grouping and proper names (Raya 2004).
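For example, a hypothetical HTML sentence containing bold mark-up could be represented in either style roughly as follows:

    <!-- Encapsulation: the original mark-up is carried inline -->
    <source>Click <bpt id="1">&lt;b&gt;</bpt>OK<ept id="1">&lt;/b&gt;</ept> to continue.</source>

    <!-- Placeholder: the original mark-up is kept in the skeleton file -->
    <source>Click <g id="1">OK</g> to continue.</source>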

Statistics

Various statistics, such as word count, character count and repetitions of certain words, that are useful in the localisation processes can be stored within XLIFF. The <count> element is used to store such information. The count-type attribute of the <count> element indicates the type of information the element represents (e.g. total count, number of repetitions) and the optional unit attribute indicates the unit of the count (e.g. words, pages, sentences). These statistics can be optionally associated with a <phase> element defined under the <header> section. Furthermore, the <count> elements can be grouped by the <count-group> element. The statistics can be associated with the <header>, <group>, <trans-unit> and <bin-unit> elements (XLIFF-TC 2008b).

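For instance, word-count statistics for a file could be recorded roughly as follows; the group name and figures are invented for this example:

    <count-group name="word-counts">
      <count count-type="total" unit="word">1250</count>
      <count count-type="repetition" unit="word">117</count>
    </count-group>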
Segmentation

Segmentation plays a major role in the translation and the leveraging process. Leveraging can be improved by specifying proper segmentation rules. Having identified the lack of segmentation and alignment support in XLIFF v1.1 (Reynolds and Jewtushenko 2005), several elements (e.g. <seg-source>) have been introduced in the XLIFF v1.2 specification to support segmentation. The textual content within a <source> element can be segmented into several semantically or linguistically distinctive chunks and can be stored in a <seg-source> element following the <source> element. After segmenting the source text, individual textual segments are stored within <mrk> elements (with the attribute mtype set to the value "seg") inside the <seg-source> element. Segmented text within the <target> element can also be represented in the same manner. Matching segment pairs in the <seg-source> and <target> elements can be identified by the value of the mid attribute of the <mrk> element. An <alt-trans> element can also have segmented textual content. See Fig. 8 for an example illustrating XLIFF’s segmentation support.

    <trans-unit id="1">
      <source>First sentence. Second sentence.</source>
      <seg-source><mrk mtype="seg" mid="1">First sentence.</mrk> <mrk mtype="seg" mid="2">Second sentence.</mrk></seg-source>
      <target><mrk mtype="seg" mid="1">Translated first sentence.</mrk> <mrk mtype="seg" mid="2">Translated second sentence.</mrk></target>
    </trans-unit>

Figure 8. Segmentation Support in XLIFF (XLIFF-TC 2008b)

Contextual Information

Contextual information related to translation units can be specified in <context> elements. Contextual information is useful for selecting the best translation and for ensuring translation consistency. The context-type attribute can be used to specify the context and the type of resource (e.g. database, element, line number, record). Contextual elements can be grouped using the <context-group> element. A <context-group> can be named using the name attribute and can specify its purpose (such as ‘match’ or ‘location’) with the purpose attribute (XLIFF-TC 2008b; Frimannsson 2005).
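For example, the location of a string in a hypothetical resource file could be recorded as follows:

    <context-group name="source-location" purpose="location">
      <context context-type="sourcefile">dialogs.rc</context>
      <context context-type="linenumber">125</context>
    </context-group>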

Extensibility

Extensibility is another key feature of XLIFF. Using XML namespaces, an XLIFF document or a part of an XLIFF document can be embedded in another XML document (XLIFF-TC 2007; XLIFF-TC 2008b). Moreover, another XML document or a part of it can be embedded within an XLIFF document as well. XLIFF provides a number of extension points for this purpose. The XLIFF elements that can be extended to include customised XML elements include structural elements such as <header>, <group>, <trans-unit>, <alt-trans>, <bin-unit> and <tool>. Furthermore, an XLIFF file can be extended by adding user-defined (non-XLIFF) attributes to most XLIFF elements, from the <xliff> and <file> elements down to translation units and inline elements. It is also possible to add custom attribute values in some pre-defined attributes. These attributes include alttranstype, context-type, count-type, ctype, datatype, mtype, purpose, reformat, restype, size-unit, state, state-qualifier and unit. The XLIFF specification, however, places no restriction on the number of non-XLIFF attributes that can be used, or the order in which to insert those attributes in an XLIFF element. According to the XLIFF specification, customised attribute values have to be specified with an 'x-' prefix. However, there is no mechanism to validate customised values (XLIFF-TC 2007). The intentionally extensible nature of XLIFF may result in interoperability issues, as discussed in Sect. 2.8.3. Therefore, it is highly advisable to adhere to standard elements and attributes as much as possible (Frimannsson 2005).
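For illustration, the following fragment uses a hypothetical custom namespace (invented for this example) to add both an extension element and a custom attribute value to a translation unit:

    <trans-unit id="1" restype="x-custom-button"
                xmlns:ex="http://example.com/review">
      <source>OK</source>
      <target>OK</target>
      <ex:review status="pending" reviewer="reviewer-2"/>
    </trans-unit>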

3.3.2 Document Structure of LCX

The LCX format is simpler than XLIFF. Unlike XLIFF, LCX does not explicitly define separate header and body sections. However, it can include references to various settings (files), user-defined key-value pairs (known as property bags) and comments on localisation items. Therefore, everything defined prior to the actual localisation data can be considered to be included in a virtual header section, as depicted in Fig. 9. The virtual header contains references to settings and comments by developers.

Figure 9. Virtual Header and Body Sections of LCX

One of the most distinguishing features of the LCX format over XLIFF is LCX’s Property Bags feature. A Property Bag can store several key-value pairs. The keys can be defined to hold values of different data types including Numbers (Single and Int32 types), Boolean, Strings, and Dates. The Property Bags are useful for representing various metadata (usually required by Microsoft’s partners) related to localisation items. These values are shown in the custom fields of the LocProject Editor17. Property Bags can be associated with most LCX format elements (Microsoft 2007). For all of the above reasons, the Property Bag feature can be described as the extensibility mechanism unique to LCX. A Property Bag can be defined by the element. The element holds different key-value pairs as mentioned above. See the example below:

In LCX, localisation data is always represented by the element. Furthermore, the same element is used to group several localisation items together. It also serves the purpose of XLIFF’s , and elements.

A typical LCX element is given below:

17 Microsoft® Localization Project Editor (LocProject Editor) is the localisation project management application of the LocStudio suite.

A element enclosed within the element holds the source and target strings. The source and the target strings are always represented in the character data (CDATA) notation. Any binary data associated with a localisation item is included within a element. In the case of binary resources like accelerators, and bitmaps, the element will not be included in the element. When representing such resources, the element will be used inside the element to store the target binary resource. Furthermore, the element can also store previous source data through the element enclosed within either or elements. Although only a single resource can be represented using the element, several target resources localised into different cultures can be represented by the element. In addition, each element can also hold a Property Bag (i.e. a element) with different key-value pairs. A special property bag element called is used to store key-value pairs useful for configuring the parser. The LocStudio Editor-specific information related to the rendering of the LCX content is stored in the element. The element is used to keep track of the date and time and the person involved in the modification of a certain element.

An element can hold another element. Therefore, the elements are used recursively to build the localisation data hierarchy within the LCX file. This is achieved by grouping related localisation items together (e.g. grouping items in a dialog box). The structure of an element is depicted in Fig. 10.


Figure 10. Graphical Visualisation of the LCX Format’s Element18

3.4 XLIFF and LCX Usage

3.4.1 XLIFF Usage

XLIFF eliminates the need for multiple file formats in a localisation workflow. In an XLIFF-based workflow, XLIFF acts as an intermediate file format: all non-XLIFF, tool-specific file formats are converted by a pre-processor to XLIFF at a very early stage of the localisation workflow (see Fig. 11). The converted XLIFF files are then handed off to the localisation engineers. It is also worth noting that there are XLIFF-compliant tools capable of producing XLIFF files directly, which eliminates the need for a pre-processor.

18 Each rectangle represents an element by its name. Attributes of elements are not included in this figure.

Figure 11. A Simple XLIFF-based Localisation Workflow (Adapted from (XLIFF-TC 2008a))

Once tool-specific files are converted into XLIFF, XLIFF-compliant editors can be used to facilitate the translation process. Once the translation process is complete, the XLIFF files are converted back to the native file formats. This workflow simplifies the localisation process by eliminating the need to deal with multiple proprietary file formats throughout the workflow (XLIFF-TC 2008a).

Moreover, XLIFF can also be used as a data container that travels through the workflow (see Chapter 5 and (Wasala et al 2011a; Wasala et al 2011b) for details). The XLIFF file can be populated with various data and metadata during different stages of the workflow. In this scenario, the <alt-trans> feature of XLIFF can be utilised to include translations produced by CAT tools in an XLIFF file (XLIFF-TC 2008a).

Viswananda and Scherer (2004) describe two approaches for using XLIFF in a localisation workflow:

1) Permanent XLIFF files

In this approach, XLIFF files will be stored in the source control system. The XLIFF files will be converted to native file formats only when building the target files.

2) Transient XLIFF files

In this approach, localisation content will be stored in the source-control system in native file formats. The native formats will only be converted to XLIFF to exchange with translators.

Of these two approaches, the former is recommended by Viswananda and Scherer (2004). However, the selection of the better approach will depend on criteria such as the richness of the native formats, the performance of supporting tools (including the conversion tools), and the data loss during the round-trip conversion process (Viswananda and Scherer 2004).

3.4.2 LCX Usage

The LCX-based localisation approach consists of seven major steps, as depicted in Fig. 12. A variety of software and file formats is used throughout the process. The major steps are summarised as follows.

Figure 12. Basic Localisation Workflow at Microsoft

1. Generate LCX source files;

This step is carried out at Microsoft and involves the creation of LCX files by processing localisable files (e.g. EXE, DLL, HTML). This step also involves the incorporation of developer and localisation pilot team comments, instructions and validation rules into the generated LCX files.

2-4. Preparation and handoff;

The main activity carried out in this phase is auto-translation. Existing glossary files are used for this purpose. Then an XML-based virtual archive containing all of the files required for translation is created. This archive is then handed over to the vendors for translation.

5. Translation;

The translation is carried out by vendors using software called LocStudio. This process also involves dialog-resizing. Finally, validations are performed using the same software based on the information contained in the LCX file, per resource.

6. Hand-back and validation;

The archive containing the translated resources is then handed back to Microsoft. The translated resources are extracted and further validations are performed on them prior to the acceptance process.

7. Build target files.

The final step involves generating localised files (e.g. EXE, DLL, HTML) from the original files and the translated LCX files.

3.5 XLIFF and LCX Feature Analysis and Comparison

Table 1 summarises the results of our comparison. A detailed element-by-element comparison of XLIFF and LCX features is given in Appendix B19. In the following sub-sections we describe the advantages, disadvantages and unique features of the XLIFF and LCX formats in detail.

19 Appendix B could not be made available to the public due to the requirements of confidentiality of the content provided by Microsoft.

XLIFF

Advantages:
 Open and flexible
 Support for a variety of resource formats, both binary and textual
 Ability to store supplementary information and metadata

Unique features:
 Inline elements
 Alternative translations
 Ability to associate contextual information
 Ability to specify pointers to external resources
 Ability to specify segmentation of translations

Disadvantages:
 Lack of tool support
 Flexibility of extensions that results in the misuse of extensions
 Extensive use of proprietary data through extensions
 Lack of a powerful and accurate mechanism to represent GUI components
 Weaknesses of the note element
 Ability to represent the same information using different tags or tag configurations
 Lack of support for representing workflow information
 Inability to associate metadata related to external resources

LCX

Advantages:
 Proprietary and controlled
 Binary encoded information representation scheme
 Precise and secure representation of localisation items and GUI components

Unique features:
 Property Bags
 Consistent hierarchy
 Ability to embed target content validation rules
 Ability to associate comments with most of the elements
 Availability of an Object Model

Disadvantages:
 Inability to store alternative translations
 Limited support for non-binary file formats
 Heavy use of CDATA sections
 Lack of specific mechanisms to store contextual information
 Lack of mechanisms to represent segmentation information
 Lack of an explicit mechanism to store references to external files

Table 1. Summary of the XLIFF-LCX File Format Comparison

3.5.1 Advantages and Unique Features of XLIFF

XLIFF can represent virtually any file format with localisation resources. XLIFF can store both binary resources and textual resources. Therefore, it can be used to represent localisation data from software and program components (e.g. dll, exe) as well as other documentation file formats (e.g. txt, html, xml, doc, docx, odf). Moreover, XLIFF can be readily used in web services to facilitate the localisation process (Lewis et al 2009; Mateos 2010). In addition to localisation data, XLIFF can store other supplementary information such as references to glossaries, and administrative information such as various metadata related to the localisation workflow. Various metadata and contextual information associated with localisation data can be stored in a self-descriptive hierarchical manner in XLIFF, as mentioned in Sect. 3.3.1. In some elements of XLIFF, there is a choice between storing localisation data or related supplementary information (e.g. glossaries, segmentation rules) within the element itself (i.e. internally) or storing only a reference (externally) to the actual resource. This provides a handy mechanism for storing only pointers to cumbersome files (e.g. bitmap files, TMX files), reducing the complexity of the XLIFF file itself.

In XLIFF, alternative translations corresponding to a particular localisation item can be stored. The ability to store alternative translations brings significant benefits to the translators. Furthermore, there are nine inline elements defined in XLIFF. These elements are useful to accurately retain the formatting (and other) information attached to localisation data especially found in non-binary file formats and content (e.g. to represent bold text found in an html paragraph element). Also, another noteworthy feature of XLIFF is the ability to specify proper segmentation of translation units.

3.5.2 Disadvantages of XLIFF

A common observation with regard to XLIFF is that many features described in the XLIFF specification are either not supported or only partially supported by localisation tools (Bly 2010; Imhof 2010; Lieske 2011). The research revealed that there is no localisation tool capable of correctly exporting Win32 resources into XLIFF v1.2. Though most localisation tools are capable of importing XLIFF files, tools capable of exporting localisation content into XLIFF are rare.

XLIFF provides many extension points through the XML namespace mechanism. Some localisation tools use this mechanism to insert vendor-specific proprietary data into XLIFF files (Vackier 2010). This has inevitably led to the creation of different flavours of XLIFF (Lieske 2011). The extensive use of proprietary data in XLIFF files has adverse effects on interoperability (Vackier 2010). For example, although XLIFF defines an attribute and attribute values for storing the status of a translation unit, different XLIFF flavours (introduced by different tool providers) use different tool-specific attributes and attribute values to denote the translation status of a translation unit. The use of proprietary data and extensions in this manner diminishes the interoperability of XLIFF content among different tools. In the absence of a mechanism in the XLIFF specification to control or manage custom extensions, tools will continue to use such extensions freely.

According to the XLIFF specification (XLIFF-TC 2008b), the generic inline placeholder element “is used to replace any code of the original document.” Since ‘any code’ can be replaced by using this element, it can also be used to store tool-specific information (i.e. proprietary data). Consider the following XLIFF mark-up segment:

In the sample segment, although the element has been used, no translatable content is present. Instead, a generic placeholder is used as a stand-alone element within it to store, or to reference, tool-specific data. Although an XLIFF file with content similar to the above mark-up can be perfectly valid, the information is not decodable by tools other than the one used to generate it. This use of inline elements significantly affects the interoperability of XLIFF data.

XLIFF does not provide20 a powerful mechanism for representing the Graphical User Interface (GUI) components attached to localisation items. However, such information is very useful for visual editors. A typical scenario is visualising the localisation resources of a Win32 executable file in a visual XLIFF editor. According to the current XLIFF specification (version 1.2), the information needed to visualise GUI components is stored in textual format, using attributes defined on a small number of elements. However, this approach does not provide an accurate, secure and complete mechanism for representing modern GUI components such as Microsoft .NET dialogs, Silverlight components, Adobe Flash components and wxWidgets.

Another disadvantage of XLIFF is the inability to group or further categorise notes or comments. Moreover, note elements cannot be extended by including non-XLIFF attributes or elements. In addition, the use of the note element is restricted to certain elements only, so that notes cannot be attached at every level of an XLIFF document. Due to these weaknesses of the note element, it is difficult to map similar elements in other localisation file formats to XLIFF's note element in a lossless manner.

The same information can be represented in different ways using XLIFF (Lieske 2011; Bly 2010). This makes the implementation of XLIFF in tools complex and difficult. In particular, the grouping of localisable items, segmentation methodology and inline element usage can be different in different tools.

An XLIFF file might travel through different phases and be used by different tools in a localisation workflow. Although XLIFF has a mechanism to describe the phases and tools involved, there is no way to track the status of a certain phase in XLIFF, nor is there a mechanism to track the status of a tool (i.e. whether a tool has already performed an action, or is still waiting to perform an action). The only solution to this problem would be to extend the XLIFF schema to include this information. However, that would certainly affect interoperability.

20 It could be argued that it would be difficult, if not impossible, for any open exchange file format such as XLIFF to cater for the wide variety of GUI representations available, which makes it not a limitation of XLIFF, but of the approach taken by the XLIFF community.

In XLIFF, references to external files or internal files can be stored. However, these references cannot be further described using metadata. Moreover, the current XLIFF specification does not allow embedding XML content as an internal file reference.

3.5.3 Advantages and Unique Features of LCX

The LCX format relies heavily upon binary information21, mainly to preserve the formatting-related information of localisation items and to accurately represent GUI components. This binary encoded representation scheme provides an accurate, secure and powerful representation mechanism for resources.

The LCX format provides an Object Model (OM), which defines how to programmatically access and manipulate LCX content; this makes the development of LCX-based tools easy. LocStudio and other tools that make use of the LCX OM can directly consume, and load appropriate external visual editors for, the binary resource streams embedded in LCX. LCX content is always generated by the LocStudio suite of software, or by other tools making use of the LCX programmable object model; LCX files cannot be created manually. Therefore, the LCX content, data representation structure and hierarchy are always consistent in the LCX format.

The LCX format has a unique controlled extensibility mechanism through its Property Bags feature (see Sect. 3.3.2). Property Bags are useful in storing various properties of localisation items as well as metadata. Moreover, the LCX format provides a mechanism to embed target content validation rules along with the localisation content. Therefore, the target content validation can be carried out by translators themselves.

21 In contrast to the open XLIFF approach, the Microsoft proprietary LCX is fine-tuned to represent Microsoft-specific GUI components.

In LCX it is possible to associate comments with most elements. The comments can be grouped and they can be either enabled or disabled.

3.5.4 Disadvantages of LCX

One of the major drawbacks of LCX is its inability to store alternative translations for localisable elements. Alternative translations are very useful for translators, especially in the absence of resources such as translation memories (TMs) and glossaries. Moreover, translators can also get an idea of the proper translation for a localisable item by looking at the translations of the same item in other languages.

There are limitations in the support for non-binary file formats such as HTML and XML. The LCX format is designed for storing localisable content from software resource file formats (e.g. Win32 resources, DLLs). Although it can represent the localisation content of some non-binary file formats, the support for such files is minimal when compared to XLIFF. Moreover, LCX does not have any inline elements; therefore, it is not well suited to handling complex tagged content, such as the localisable items of an HTML web page. At Microsoft, the content localisation of HTML, XML and other content file formats is handled by dedicated content localisation tools, not by LocStudio.

Access to contextual information related to a localisation item would be very useful for translators (Sikes 2011). However, LCX has no specific elements for storing contextual information; contextual information related to localisation items is usually stored as comments.

LCX does not provide a mechanism to represent a proper segmentation methodology for text. Furthermore, the line breaks included within localisable text are encoded in binary. This reflects the legacy and primary use of LCX for software localisation, where text segmentation is not a primary concern. The LCX representation of an example sentence that includes two line breaks is given in Example 1, in which the sequence A; represents a line break in the LCX format.

Example 1

In LCX, there is no explicit mechanism to store references to external files. Moreover, although the use of the character data (CDATA) sections is not recommended in localisation file formats (XLIFF-TC 2006), LCX relies heavily upon CDATA sections to store localisation data. Therefore, special care has to be taken when programmatically manipulating LCX content.

3.6 XLIFF and LCX Schema Mapping

In order to further investigate the unique features of these file formats and to identify incompatibilities as well as interoperability issues between the two file formats, a schema mapping between the XLIFF and LCX formats was designed.

We identified some LCX-specific data that cannot be successfully mapped to XLIFF without extending the XLIFF schema to represent LCX data. Therefore, we propose to extend XLIFF to represent both XLIFF and LCX syntax. We call the new format LCX-XLIFF for the purpose of our research. The LCX-XLIFF format complies with the XLIFF 1.2 transitional schema. We propose to use the LCX-XLIFF format as an intermediate data container between LCX and XLIFF in the conversion process, as it minimises the data loss during the round-trip conversion process (see Fig. 13). This research focuses only on the conversion between the LCX and LCX-XLIFF formats.

Figure 13. LCX-XLIFF as the Intermediate Data Container

In our research, we propose two different ways of representing LCX content in XLIFF, categorised as the "maximalist" and "minimalist" representation approaches.

The main idea behind the maximalist approach is to decode as much as possible of the information contained within LCX and to represent it in XLIFF with few or no extensions. In this approach, the information needed to visualise User Interface (UI) resources in localisation tools will be decoded from LCX and represented in XLIFF. For example, binary information such as the coordinates of UI controls in LCX will be decoded and represented in XLIFF in textual format using appropriate tags. Information on accelerators (keyboard shortcuts) will also be decoded and represented in textual format in XLIFF. Other binary resources included in LCX, such as bitmaps and icons, will be represented in XLIFF's binary translation units. Prior to representing textual information in XLIFF, proper segmentation is carried out on the source LCX textual elements; the information, together with the segmentation guidelines, is then stored in the XLIFF documents. Furthermore, textual information included in LCX's CDATA sections will be carefully analysed and represented in XLIFF as entity references, inline elements and Unicode characters. LocStudio-specific metadata will be retained by extending XLIFF using the namespace mechanism. This approach minimises data loss while maximising the interoperability of the converted XLIFF. The maximalist approach is highly recommended for the LCX to XLIFF conversion process. However, the complexity of the conversion process is high in this approach.

In the minimalist approach, the binary encoded data in LCX will not be decoded at all; only the textual localisation data will be included in XLIFF. However, all binary and other LCX-specific information will be retained within XLIFF. This is achieved through XLIFF's namespace-based extension mechanism. XLIFF files converted using this approach can then be used with different tools as well as with translation memories. Although it will be possible to translate the textual content in the converted file, it will not be possible to visualise UI resources or make changes to the UI using LCX-XLIFF files converted with the minimalist approach.


Figure 14. Sample UI Resource

As an example, consider the LCX content in Example 2 which corresponds to the resource illustrated in Fig. 14.

Example 2

The expected output of the conversion of the above LCX content into XLIFF using the maximalist approach is shown in Example 3:

&Encoding:

Example 3

The expected output of the conversion of the same LCX content into XLIFF using the minimalist approach is shown in Example 4:

&Encoding:

Example 4

The scope of this research was limited to the minimalist approach. An element-by-element mapping is proposed for converting between XLIFF and LCX in the minimalist approach. Details of the proposed schema mapping are provided in Appendix C22.

3.6.1 Prototype Implementation

A prototype was implemented to demonstrate the conversion between the LCX and LCX-XLIFF file formats generated using the minimalist approach. XSL Transformations (XSLT) were developed for mapping between the LCX and LCX-XLIFF formats. The conversion can be carried out using an XSLT processor (such as Saxon) or an XML processing tool. However, to facilitate the conversion, a separate tool was developed. The tool is capable of converting LCX files (known as LCL23 files) generated by the LocProject Editor into LCX-XLIFF, and LCX-XLIFF files back into the LCX format. A brief user guide for the converter is provided in Appendix D. A general software localisation workflow that involves the aforementioned converter is depicted in Fig. 15.
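For illustration only, the following minimal Python sketch shows how an XSLT-driven conversion of this kind could be invoked programmatically using lxml; the stylesheet name lcl2xliff.xslt and the file names are hypothetical and do not refer to the actual converter described above.

# Minimal sketch: applying an LCL-to-LCX-XLIFF stylesheet with lxml.
# "lcl2xliff.xslt", "dialog.lcl" and "dialog.xlf" are hypothetical names.
from lxml import etree

def convert(input_path, xslt_path, output_path):
    transform = etree.XSLT(etree.parse(xslt_path))   # compile the stylesheet
    result = transform(etree.parse(input_path))      # apply it to the input file
    with open(output_path, "wb") as out:
        out.write(bytes(result))                     # serialise, honouring xsl:output

if __name__ == "__main__":
    convert("dialog.lcl", "lcl2xliff.xslt", "dialog.xlf")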

22 Appendix C could not be made available to the public due to the requirements of confidentiality of the content provided by Microsoft.

23 LCL is a version of the LCX schema specifically for localisable text.

Figure 15. Typical Localisation Workflow with LCX-XLIFF Converter

The workflow in Fig. 15 involves five distinct steps:

Step 1: the LocProject Editor is used to generate a LCL file corresponding to a resource to be localised.

Step 2: the LCL file is converted to LCX-XLIFF by using the converter.

Step 3: the converted LCX-XLIFF file can be processed at this stage using third party XLIFF enabled tools. The translation process can be carried out using the LCX-XLIFF file.

Step 4: the processed LCX-XLIFF file is then converted back to LCL using the converter.

Step 5: the processed LCL file can then be opened for further processing, inspecting, and validation using the LocProject Editor or LocStudio application.

3.6.2 Comparison with an Alternative Schema

In order to verify our proposed schema mapping, we compared it with an alternative schema24 provided by Microsoft. The features common to both schema mappings are given below.

Both mapping mechanisms:

 take advantage of the XLIFF format’s extensibility to include LCX specific data and metadata.

 preserve the entire LCX structure in a converted XLIFF file by extending the schema.

 stress the advantages of converting as much LCX data as possible into XLIFF.

 result in an XLIFF file compliant with the XLIFF transitional schema (i.e. a file that is not compliant with the XLIFF strict schema).

 identified that the LCX-specific elements Property Bags, Comments, Modified and Disp do not have equivalent elements in XLIFF; these elements are therefore represented by extending the XLIFF schema to include the LCX namespace.

 identified that, although the LCX Comment and XLIFF note elements look similar, they cannot be mapped onto each other, as XLIFF note elements cannot be further extended.

 convert data enclosed within CDATA sections of the LCX format into HTML/XML entities when representing them in XLIFF (a minimal sketch of this escaping step is given after the list).
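As a minimal illustration of the last point, the snippet below shows how text taken from a CDATA section could be converted into entity-escaped form before being placed in an XLIFF element; the sample string is hypothetical.

# Illustrative only: escaping CDATA payload text as XML character entities.
from xml.sax.saxutils import escape

cdata_text = 'Click <b>"Open"</b> & continue'          # hypothetical CDATA payload
entity_text = escape(cdata_text, {'"': "&quot;"})
print(entity_text)  # Click &lt;b&gt;&quot;Open&quot;&lt;/b&gt; &amp; continue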

The differences between the two mechanisms are given in Appendix E25.

24 This alternative schema was only accessible after we developed our schema mapping.

25 Appendix E could not be made available to the public due to the requirements of confidentiality of the content provided by Microsoft.

3.7 Conclusions and Future Work

The pilot study not only helped us to identify features of localisation data-exchange standards but also revealed some prominent issues associated with them.

The comparison of the XLIFF and LCX file formats revealed that each has its own advantages and disadvantages. The unique features of each file format were found to be closely related and often complementary to each other. However, it is difficult to achieve 100% compatibility, due to factors such as the extensive use of binary data in the LCX format, the inclusion of tool-specific data in LCX, and inline element usage in XLIFF. A file-format converter, similar to the one developed for the purpose of this pilot study, can provide interoperability between the two file formats.

Furthermore, it was evident that both the XLIFF and LCX formats serve far more purposes than just localisation data exchange. These file formats have morphed into, and are being used as, data repositories and storage formats for localisation knowledge. They are now being specifically designed for data maintenance and reuse. Moreover, both file formats have been designed to accommodate efficient workflows.

The evidence for the above claim includes:

 additional features built onto both file formats (e.g. extensibility, metadata);
 files are not discarded at the end of a localisation project, but kept for leveraging and reuse.

The conclusions show that the XLIFF and LCX formats seem to be reasonably close in terms of their coverage. While interoperability between these file formats is limited, the development of the ‘LCX-XLIFF’ intermediate format revealed that XLIFF can successfully accommodate Microsoft’s localisation requirements. This was further confirmed by Microsoft’s recent decision to adopt XLIFF in its localisation process (O’Donnell and Stahlschmidt 2012). Since LCX is a powerful localisation data-exchange format comparable to XLIFF, it can be concluded that the XLIFF standard can adequately address the problem of source file diversity and standards-based data exchange. This also confirms the selection of XLIFF as the better interoperable standard in our research.

Based on the results and analysis, improvements to the XLIFF and LCX formats are proposed.

3.7.1 Proposed improvements to XLIFF and LCX formats

Proposed Improvements to XLIFF

 A binary representation of user interface (UI) elements (as in the LCX format).
 The ability to associate the note element with any element and to further customise or categorise notes (this issue has already been identified and will be addressed in XLIFF version 2.0 (XLIFF-TC 2011)).
 A mechanism similar to LCX's Property Bags to store metadata related to each localisable item. This feature would be useful in defining a localisation memory container using XLIFF.
 A mechanism to embed target validation rules.
 Microsoft uses a special file container named LSPkg (Localization Studio Package) to store several related LCX files (as well as other related files) within a single file. An LSPkg-equivalent data container in XLIFF would be useful for storing several file elements in a single file. Information common to several file elements (e.g. various settings) could be stored within this container, e.g. a zipped data container (similar to the Microsoft DOCX format). This would allow more data to be stored in an efficient manner, the compact data container would be easier to exchange between different systems, and it would further serve the purpose of a localisation memory container.
 The ability to further describe internal and external references (e.g. an attribute to describe the type of reference).

Proposed Improvements to LCX

 Inline elements.  The ability to store references to external files/data sources (e.g. glossaries, translation memories, bitmaps). There should be a mechanism to describe the references with the use of metadata.  The ability to store alternative translations.  The ability to store contextual information of a certain localisation item.  The ability to specify segmentation.  XML canonicalisation26 (W3C 2001) would improve the interoperability of the LCX file format.

3.7.2 Interoperability Issues Associated with XLIFF

The extensive literature review and our experiment in designing the schema mapping, carried out as part of the desk research of our pilot study, revealed several interoperability issues associated with the XLIFF standard:

1) Lack of awareness about XLIFF (Anastasiou 2010b);

2) Limitations of the standard;

For example, vaguely defined criteria in the XLIFF specification (Bly 2010) confuse tool developers. In addition, tool providers face various difficulties in the absence of proper metadata storage and usage conventions. Due to the extremely flexible nature of XLIFF, tool providers have come up with different flavours of XLIFF (Lieske 2011; Imhof 2010; Anastasiou 2010a).

3) Improper or incomplete implementation of the standard.

Our research revealed that many XLIFF features are either not supported or only partially supported by tools (Anastasiou 2010a; Bly 2010; Lieske 2011). This seems to be due to two main reasons:

26 XML canonicalisation is a process used to normalise XML documents by enforcing several formatting rules.

1. Business-case requirements

Depending on the requirements of different business cases, different tools have been implemented to support different parts of the XLIFF specification. There are localisation tools that do not support XLIFF at all, which is mainly due to the fact that there is no strong business case demanding XLIFF support from these tool vendors.

2. Complexity and limitations of the standard

Although XLIFF’s formal tool compliance is easy to achieve, complete implementation of XLIFF features in tools is difficult due to the complexity of the standard (Anastasiou and Morado-Vázquez 2010). Moreover, segmentation and inline element usage are only vaguely defined in the specification. Therefore, tools can represent the same information in different ways in XLIFF, which greatly affects interoperability.

Therefore, although many tools claim to be XLIFF compliant, many XLIFF features are missing from those tools. The actual level of XLIFF compliance in localisation tools is still an open question. The XLIFF-TC (2011) is working on introducing an XLIFF conformance clause to address these issues. Some of the observations related to XLIFF support in tools are summarised as follows:

 cannot export localisation content into XLIFF v1.2 format

o Win 32 content in particular

 inability to open XLIFF with multiple file elements

o e.g. Alchemy Catalyst 8, SDL Passolo 2009

 usage of different source/target language formats

o e.g. en-US, US

Imhof (2010) and Bly (2010) give further examples:

 support for alternative translation units

o e.g. MemoQ, Trados 2009 do not implement the element

 the use of custom defined values to denote the status of the translation

o e.g. Trados 2009, MemoQ

 support for translate attribute and translate state attributes

 support for the element

o e.g. MemoQ, Trados 2009 do not support the element

The lack of tool support for the XLIFF standard, the complex nature of the standard and the internal flaws of the standard greatly affect its ability to deliver on the interoperability promise.

3.7.3 Limitations and Future Work

The prototype converter (see Sect. 3.6.1) developed to demonstrate the conversion between the XLIFF and LCX formats is based on XSLT. However, due to the complex hierarchy of elements in LCX and the limitations of XSLT processing (e.g. issues associated with CDATA processing and white-space preservation), the XSLT style sheets were found to be inefficient and inappropriate for the development of an LCX to XLIFF converter. A programmatic approach that utilises the LCX Object Model is recommended for the implementation of a practical converter.

A formal evaluation of the LCX ↔ XLIFF round-trip conversion process should be undertaken as part of a follow-up study by using a set of experimental data as well as real data of the type used in a typical localisation workflow.

The pilot study confirmed the need for further research into standards compliance, conformance and the interoperability of localisation tools and technologies, particularly to assess the level of implementation of standards in various localisation tools and to evaluate the actual usage of standards in different localisation workflows. These gaps are addressed in our main experiment, presented in Chapter 4.

A detailed description of the above pilot study as well as the results can be found in (Wasala et al 2010; Wasala et al 2012b).

Chapter 4: Development of the XML Data Interoperability Usage Evaluation (XML-DIUE) Framework

4.1 Introduction

The pilot study presented in the previous chapter (Chapter 3) helped us to gain an in-depth understanding of the features of both the XLIFF and LCX file formats. In addition, it also revealed some limitations of the XLIFF standard and its implementations. From the pilot study, it was evident that a thorough, systematic analysis of the XLIFF standard and implementations is needed to discover the limitations leading to interoperability issues. For this purpose, our initial plan was to adapt the methodology used by Shah and Kesan (2009; 2008) to evaluate the interoperability of tools supporting XLIFF (see Sect. 2.7.2 for a brief description of their methodology). In their research, they focused only on the most frequently used features of the standard, as their main interest was in determining whether implementations are good enough for the majority of users. However, when adapting their methodology to our research, we realised that determining the frequently used features of XLIFF is not trivial. Therefore, we decided to carry out a survey of XLIFF files, mainly to identify the most frequently used features of XLIFF.

although the survey research strategy is often assumed to be based on questionnaires, it can also use other data generation methods such as interviews, documents and observations (Oates 2006, p.95)

Our plan was to build an authentic XLIFF corpus for the above survey. Having gathered a large number of XLIFF files (see Sect. 4.3.1 for details of the corpus construction process), we analysed the XLIFF corpus to identify the most frequently used features of XLIFF. This analysis provided interesting insights into how the gathered XLIFF corpus could be used to find answers to our main research question (RQ1). We then revisited the literature with the aim of reviewing previous work on using corpus-based approaches

for improving standards. We noticed that very little work has been reported in this area. In the next section we summarise the related work.

4.1.1 Related Work

Analysis of HTML Documents

In 2005, Google carried out work similar to that proposed here. They analysed over a billion HTML/XHTML documents and presented some important findings regarding frequently used HTML elements, attributes and class names, and the average number of unique elements in a web page (Google 2005). For some of these statistics, possible explanations are also given. However, no significant conclusions or recommendations regarding improvements to HTML were made on the basis of their analysis. They also briefly summarise previous research carried out on HTML that is similar to this research.

Microformats

Microformats.org, co-founded by Ryan King and Dan Cederholm, promotes the use of consistent and “structured data on web” by defining and publishing Microformats: “extensions to HTML for marking up people, organizations, events, locations” (Microformats 2012). These Microformats are based on frequently used HTML code snippets (i.e. HTML usage patterns) and HTML user practices or conventions. Microformats attach semantics to HTML metadata, hence promoting automation and interoperability between tools and technologies.

Visualising Interoperability using the Aggregation, Rationalisation and Harmonisation (ARH-HA!) Process

In another study, Currie et al (2002) propose a methodology known as Aggregate- Rationalise-Harmonise (ARH-HA!) to visualise the interoperability of metadata and to “harmonise commonly-owned, distributed, heterogeneous metadata collections.” This research aims to address interoperability issues related to metadata records of

governmental departments and agencies. The three major steps of the ARH-HA! process are briefly described below:

 Aggregation: this step involves the aggregation and analysis of metadata. The collected metadata elements are added to a spreadsheet. Potential issues related to individual metadata elements, such as different spellings and alternative element names, that can lead to interoperability issues are identified.
 Rationalisation: in this step, the gathered metadata are further analysed in order to identify unnecessary variations among them, such as metadata elements whose value is represented using different names (and namespaces), or elements containing differently formatted values.
 Harmonisation: in this final step, the data managers of agencies and departments agree on common formats and vocabularies to minimise metadata interoperability issues.

Currie et al illustrate the utility of their methodology by applying it to address interoperability issues related to Victorian government resources, with the help of six participating government departments in Australia. They highlight that one advantage of using spreadsheets is the visualisation of interoperability; for example, the spottiness of the data graphically highlights interoperability issues. In the rationalisation stage, they removed unused metadata items and merged sensible metadata variations. Currie et al state that harmonisation is the most complex of the three stages, as it involves decisions such as standardising non-standard metadata elements and implementing mappings between existing metadata variations. In conclusion, they highlight the usefulness of their methodology for addressing interoperability issues associated with metadata and also suggest the applicability of visualising interoperability in the context of other similar standardisation processes.

Bottom-Up Analysis of XLIFF Constructs

Interestingly, and to our surprise, we also noticed that bottom-up analysis of XLIFF constructs has already been identified as an approach for addressing issues associated with XLIFF (Frimannsson and Lieske 2010). Unfortunately, this proposal lacks detail; for example, a systematic methodology for the bottom-up analysis of XLIFF files is not given in this study. Therefore, we aim to address this gap in our research.

4.2 The XML-DIUE27 Framework

An empirical study is really just a test that compares what we believe to what we observe. Nevertheless, such tests, when wisely constructed and executed and when used to support the scientific method, play a fundamental role in modern science. Specifically, they help us understand how and why things work, and allow us to use this understanding to materially alter our world. (Perry et al 2000)

Our initial experiments with the XLIFF corpus, as well as the previous work mentioned above, encouraged us to investigate what further analyses could be performed over the corpus to find limitations of the XLIFF standard and its implementations. We then diverted from our initial plan of implementing Shah and Kesan's (2008; 2009) methodology and focused on our own analysis. This research inevitably led to our main experiment and the main contribution of this thesis.

Building on the previous work, we propose a novel empirical framework: XML Data Interoperability Usage Evaluation (XML-DIUE), which can be used as a tool to evaluate data-exchange standard usage and thus to inform the development, maintenance and evolution of standards. The framework provides empirical evidence and statistics on the actual usage of different features of a standard (i.e. XML elements, attributes and attribute values). The empirical results generated by the framework seem useful for identifying important criteria for standard development, such as the most frequently used features, the least frequently used features, usage patterns and formats. The findings will also be helpful in identifying compliance issues associated with implementations supporting the standard under study. Furthermore, the results will be helpful for the development of an interoperability test suite. Thus, we believe that this framework will ultimately contribute to improved interoperability among tools and technologies in many verticals.

27 Pronounced ‘Dew’

Our framework is based loosely on the ARH-HA! process proposed by Currie et al (2002). An analogy can also be drawn to the Parser-Repository-Analysis architecture proposed in the 1990s for re-engineering toolkits (van-Zuylen 1993, pp.15-371; Cross-Ii et al 1992). The Parser-Repository-Analysis architecture is appropriate given the similarity of the concerns: both involve static analysis (parsing) of structured documents, followed by the viewing of interesting information derived from analysis of the parsed repository. The Parser-Repository-Analysis architecture (see Fig. 16) isolates the document-parsing components from the view-generation components by persisting the parsed analysis in the repository. This allows new viewing tools to be added against the repository without the need to build a new parser. Additionally, if the repository can be made agnostic to a new variant of XML to be analysed, the same suite of viewing tools can be employed without alteration; only a new parser need be developed.

Figure 16. The Parser-Repository-Viewer Architecture Proposed for XML-DIUE (Adapted from (Cross-Ii et al 1992))

The overall methodology of our framework resembles the general structure of empirical studies as presented by Perry et al (2000): formulation of a question or hypothesis; observation of a situation; abstraction of the observations into data; data analysis; and the drawing of conclusions. The methodology is elaborated in the next section.

4.3 Methodology

This section describes the methodology used for the construction and the use of our empirical framework. The methodology involves the following steps:

 the construction of a corpus of the file type to be analysed;
 the parsing and extraction of data from the collected files;
 the construction of a repository;
 data profiling by means of designing usage-analysis criteria;
 data analysis.

In the following sections, we explain the details of each of the above steps in relation to the analysis of the XLIFF corpus. The core step in this methodology is the “designing usage-analysis criteria” and this will be covered in the greatest detail.

4.3.1 Construction of XLIFF Corpus

The first step in our research was building an authentic reference corpus of XLIFF files.

For this purpose, XLIFF files were collected using two main methods: 1) through industrial partners of the CNGL project; and 2) by crawling openly available XLIFF files on the World Wide Web. The CNGL industrial partners were made aware of this experiment and invited to contribute their XLIFF files; three industrial partner organisations contributed to the XLIFF corpus under condition of anonymity. In a second step, openly available XLIFF files on the web were scraped on two occasions, on the 26th and 29th of August 2011. The Google search engine was used for this purpose, mainly because of its popularity and widespread use. Using Google's file-type search facility (i.e. the "filetype:" keyword), files with the extensions .xlf, .xliff and .xml were downloaded. Several additional keywords (e.g. +xliff, +body + "trans-unit") were used to locate more XLIFF content as well as to filter results. During crawling, it became evident that one open-source software project (Eclipse) uses a significant number of XLIFF files in its localisation process; these files were therefore crawled separately from the project's sites.

The next step involves pre-processing the files obtained, as described in the following section.

Cleaning and Pre-Processing

The manual analysis of random files revealed various issues associated with some of them, including the use of different encoding methods, duplicated content and other inconsistencies. Therefore, prior to the analysis stage, the following pre-processing operations were performed on the files:

1) cleaning of the file names (especially for some of the crawled files, which had names containing special characters, such as "dialog.xlf-spec=svn8-r=8"); these were automatically renamed using a Python script;

2) removal of non-XLIFF files by analysing their content (especially among the crawled files); the content was analysed with the aid of a Python script, and filtered non-XLIFF files were manually confirmed as inappropriate before being discarded (see the sketch after this list);

3) removal of duplicated files (by analysing content); a proprietary tool28 was used for this purpose;

4) removal of the Unicode Byte Order Mark (BOM) and conversion of the encoding to UTF-8 without BOM (files were found in various encodings such as UTF-16, UTF-8 with BOM and UTF-8 without BOM);

5) removal of the XML directive and DOCTYPE declarations at the beginning of a file (some files included either or both of these directives, while others did not);

28 Auslogics Duplicate File Finder (freeware) can be obtained from: http://www.auslogics.com/en/software/duplicate-file-finder/ [accessed 29 Aug 2011]

6) manual extraction of embedded XLIFF content in downloaded HTML web pages;

7) discarding XLIFF content embedded in other XML documents.
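The following minimal sketch illustrates two of these operations, encoding normalisation (step 4) and the detection of non-XLIFF files (step 2); it assumes the files sit in a corpus/ directory and is illustrative only, not the original pre-processing scripts.

# Sketch of pre-processing: normalise encodings to UTF-8 without a BOM and
# flag files whose root element is not <xliff>. Paths are hypothetical.
import codecs
import xml.etree.ElementTree as ET
from pathlib import Path

BOMS = [(codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16")]

def normalise_encoding(path):
    raw = path.read_bytes()
    encoding = "utf-8"
    for bom, enc in BOMS:
        if raw.startswith(bom):
            encoding = enc
            break
    path.write_text(raw.decode(encoding), encoding="utf-8")  # UTF-8, no BOM

def is_xliff(path):
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    return root.tag.split("}")[-1] == "xliff"  # ignore any namespace prefix

for xlf in Path("corpus").glob("**/*.xlf"):
    normalise_encoding(xlf)
    if not is_xliff(xlf):
        print("non-XLIFF file:", xlf)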

As a result, we constructed what we believe to be the first significant XLIFF corpus available for research, containing 3,179 XLIFF files distributed in the following manner:

 Company A: 38 files;
 Company B: 29 files;
 Company C: 1,004 files;
 Crawled: 444 files;
 Crawled from the Eclipse project: 1,664 files.

The above files were stored in five separate folders based on their source (i.e. Company A, Company B, Company C, Crawled and Eclipse).

It is worth noting that the individual file sizes and the distribution of XLIFF features vary greatly in this data-set, depending on the source (e.g. the total file size of the 1,004 files received from Company C was 16.8 MB, while the total file size of the 38 files received from Company A was 8.26 MB). In order to give an indication of the variation of XLIFF content in our XLIFF corpus, Table 2 lists, for each source, the total and average file sizes in kilobytes (KB) and the total and average trans-unit counts.

Source       File size (KB)            trans-unit
             total        average      count       average
Company A    8462.40      222.69       19310       508.16
Company B    616.37       21.25        1391        47.97
Company C    17288.10     17.22        20437       20.39
Crawled      40903.56     92.13        124090      249.48
Eclipse      126479.15    76.01        187740      112.82
Overall      193749.58    60.95        353003      111.04

Table 2. File Sizes and trans-unit Counts of Individual Sources

4.3.2 Data Extraction and Construction of a Database

The next step involves extracting data from the corpus for analysis. The XLIFF corpus could not be analysed using ordinary (text) corpus analysis tools, so proprietary analysis tools were developed to populate the repository. In our research we focus more on the quantitative analysis of features and less on their structures; therefore, we have chosen to store the data in a relational database rather than a structure-oriented repository such as a native XML database. Python scripts were written to populate the database of XLIFF information. These scripts processed one file at a time, iterating through all the files in the corpus (a simplified sketch of this population step is given after the table descriptions below). The ultimate objective was then to run various Structured Query Language (SQL) queries on the database to analyse the XLIFF content. In this case the database schema consisted of four tables:

 validate: this table consists of information about each individual file's well-formedness (i.e. XML parsing status), the validation results after validating the file against the available schemas, and a reference to any errors encountered during the parsing or validation process (see Table 3).

Source       File    XLIFF Version   Well-Formedness   XLIFF 1.2 Validation   Error
Company A    2.xlf   1.1             Well-formed       invalid                No matching global declaration available for the validation root.

Table 3. Structure of the Validate Table

 tags: this table is used to store data about the different tags and their values in individual XLIFF files (see Table 4). The table includes the fields tag, uri and value; the uri field is used to store the namespace of a particular tag.

Source File Tag URI Value

Company C 1.xlf Source urn:oasis:names:tc:xliff:document:1.1 &Open

Table 4. Structure of the Tag Table

 attributes: this table is used to store information about attributes and their values (see Table 5).

Source File Tag URI Attribute Value

Crawled A.xlf xliff urn:oasis:names:tc:xliff:document:1.2 version 1.2

Table 5. Structure of the Attribute Table

 children: this table is used to store information about tags, their children and the trailing content (if any) of tags (see Table 6).

Source       File    Tag    URI_tag                                 Child   Child_URI                               Tail
Company B    A.xlf   File   urn:oasis:names:tc:xliff:document:1.1   body    urn:oasis:names:tc:xliff:document:1.1

Table 6. Structure of the Children Table
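As a simplified sketch of the population step referred to above, the following code loads one source of files into the tags and attributes tables; SQLite is assumed here purely for illustration, only two of the four tables are shown, and the folder names are hypothetical.

# Sketch: populate the tags and attributes tables from a folder of XLIFF files.
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path

conn = sqlite3.connect("xliff_corpus.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS tags (source TEXT, file TEXT, tag TEXT, uri TEXT, value TEXT);
CREATE TABLE IF NOT EXISTS attributes (source TEXT, file TEXT, tag TEXT, uri TEXT,
                                       attribute TEXT, value TEXT);
""")

def split_qname(qname):
    # ElementTree reports namespaced tags as "{uri}local".
    if qname.startswith("{"):
        uri, _, local = qname[1:].partition("}")
        return uri, local
    return "", qname

def load_file(source, path):
    for elem in ET.parse(path).iter():
        uri, tag = split_qname(elem.tag)
        conn.execute("INSERT INTO tags VALUES (?,?,?,?,?)",
                     (source, path.name, tag, uri, (elem.text or "").strip()))
        for attribute, value in elem.attrib.items():
            conn.execute("INSERT INTO attributes VALUES (?,?,?,?,?,?)",
                         (source, path.name, tag, uri, attribute, value))

for xlf in Path("corpus/CompanyC").glob("*.xlf"):
    load_file("Company C", xlf)
conn.commit()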

4.3.3 Standard Usage-Analysis

In parallel with the development of the database, we planned our analysis stage, in which we identified several research questions to drive the XLIFF usage-analysis. These research questions are helpful not only in identifying limitations of the standard and tools, but also in identifying the features of a standard that are important for successful data interoperability.

While most of the research questions were designed to gather statistics about XML element or attribute usage, others were designed to recognise various usage patterns, erroneous usages, and extreme values of elements and attributes. We then identified several standard usage metrics to gather these statistics. The research questions and standard usage metrics are further discussed in the next section (Sect. 4.4).

4.4 Standard Usage Metrics

Each research question has an associated XML usage evaluation metric, which provides valuable insights into the question. These metrics provide a wealth of empirical evidence useful for different stakeholders of the standard. Although there can be a number of stakeholders involved in the standardisation process (Soderstrom 2003), we mainly considered standard developers, standard users, standard software organisations (i.e. tool vendors) and standard service providers in this research. The following table (Table 7) presents the research questions, the associated standard usage metrics and their benefits to the stakeholders.

1. Research question: How to identify the syntactic conformance issues of the standard?
   Example metric: Identify validation errors.
   How it might inform the stakeholders: Gives an estimate of the degree of syntactic conformance to the specification; also helpful in identifying common validation errors.

2. Research question: How to identify the core useful features of the standard?
   Example metric: Identify the most commonly used features across organisations.
   How it might inform the stakeholders: Gives an indication of the core useful features of the standard; these are also the most influential features of the standard, i.e. the features whose modification would have the widest ripple effects across verticals.

3. Research question: How to simplify the standard?
   Example metric: Identify the least frequently used and never used features.
   How it might inform the stakeholders: Gives an indication of the features that can be removed from the standard.

4. Research question: How to identify the candidate features that can be introduced to the standard?
   Example metric: Identify custom features and frequently added extensions.
   How it might inform the stakeholders: Identification of the most commonly used custom features and extensions might provide guidance on new features that should be adopted.

5. Research question: How to identify the best usage practices of the standard?
   Example metric: Identify feature usage patterns.
   How it might inform the stakeholders: Identifies frequently used patterns, where subsequent manual evaluation can provide insight into elegant and consistent solutions to recurring problems in the usage of the standard; this can promote best-usage practices of the standard to improve tools interoperability.

6. Research question: How to identify where organisations deviate from the norm with respect to standard usage?
   Example metric: Identify frequently used features of individual organisations (not employed by others).
   How it might inform the stakeholders: Provides a basis for assessment and comparison of organisations' individualistic usage practices.

7. Research question: How to resolve semantic ambiguities of features29 of the standard?
   Example metric: Identify the different values used for the features under consideration.
   How it might inform the stakeholders: Helps to identify features and feature-values leading to semantic conflicts.

Table 7. XML-DIUE Usage Metrics

Details of each of the above example metrics, including their advantages, disadvantages and limitations, are discussed in the following sections (Sects. 4.4.1-4.4.7).

29 This usage metric may only be applicable to a selected subset of the features of the standard. See Sect. 4.4.7 for more details.

4.4.1 Validation Errors

Identification of frequent validation errors is especially helpful for standard developers as well as for standard service providers. Common validation errors may indicate errors associated with parsers (e.g. validation tools) or errors associated with schemas. This analysis may also reveal erroneous or unintended usage of the specification30. For example, this analysis may help to identify unexpected or unforeseen feature usage patterns that are valid according to the specification, but invalid according to the schema. Such scenarios might occur where criteria are loosely defined in the standard specification, yet correctly implemented in the schema. The validation errors may also be of interest to tool vendors, as they can take measures to prevent such common validation errors.

The degree of syntactic conformance to a standard can be calculated by computing the ratio of the number of valid files (of a selected version of the standard) to the total number of files (of the same version) in the entire corpus, expressed as a percentage. For example, the degree of syntactic conformance to the XLIFF version 1.2 specification (regardless of whether the transitional or the strict schema is used) can be calculated using the following formula (F1):
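One plausible formulation of F1, following the description above:

\[
\text{Degree of syntactic conformance}_{1.2} = \frac{\text{number of valid XLIFF 1.2 files in the corpus}}{\text{total number of XLIFF 1.2 files in the corpus}} \times 100\%
\]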

The higher this ratio, the better the conformance to the standard in its general usage. If this figure is low, measures have to be taken to improve conformance to the standard, for example by tightening the standard and by providing conformance test suites, reference implementations and validation tools.

30 Here, we use the term "specification" to refer to the documentation of a standard that describes its semantics, syntax, conformance criteria and processing requirements in writing, whereas by "schemas" we refer to the syntactic implementation of the rules and constraints defined in the standard by means of a schema language such as DTD, Relax-NG or Schematron, so that files of the standard can be validated against the schema.

However, two main limitations can be identified in this metric (i.e. the identification of validation errors). The first is that the metric does not take into account the relative complexity of individual files. For example, consider two files: one with a thousand lines and fifty unique elements that contains a single syntactic error, and another with just two lines that uses a single (unique) element and also contains a single syntactic error. Under the above analysis, both files will be counted as invalid, although the first has far more content and is relatively complex compared to the second.

The other limitation is that, although a file might contain several validation errors, most validation tools are only capable of detecting one error at a time; to find the next error, the previous error has to be fixed. In a reasonably sized corpus, the manual correction of the errors contained within individual files might not be practical. Hence, the results of this metric might not represent the most important syntactic errors that need to be addressed, and will depend on the nature of the tool used to validate the files.

4.4.2 Most Commonly Used Features Across Organisations

The most commonly used features are the features implemented by the majority of tools and used by the widest variety of users (see Fig. 17).

Even a slight modification of these features has a great effect on a variety of users and tools. Identification of these features is especially useful for the standard developers. These are the core features that are important in terms of cross-organisational data interoperability. They are also important for providing backward compatibility for new versions of the standard. From the point of view of tool vendors, implementation of these features in their tools may help to maximise the coverage of the standard for minimum effort while maximising the interoperability between other tools and technologies. Similarly, for the standard users, maximised use of these features instead of feasible alternatives will ensure better cross-organisational interoperability. Therefore, these features can be considered as the most influential and the core useful features of the standard.


Figure 17. Commonly Used Features

We propose calculating the commonality ratio of individual features for the purpose of categorising whether or not a feature is commonly used. The commonality ratio of a feature X can be calculated using the following formula (F2):
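One plausible formulation of F2, under the assumption that commonality is measured over the organisations represented in the corpus:

\[
\text{Commonality ratio of feature } X = \frac{\text{number of organisations whose files use feature } X}{\text{total number of organisations contributing to the corpus}} \times 100\%
\]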

The higher the commonality ratio of a feature, the more useful the feature is to the majority of organisations. While features with a 100% commonality ratio can be considered the most useful features of the standard, this threshold can be adjusted by the researcher depending on the number of organisations which contributed to the corpus.

4.4.3 Least Frequently Used Features

The least frequently used features may give an indication of the features of the standard that are not widely used, that users are unaware of, or that are not implemented by the majority of tools. This information is especially useful for standard developers when making decisions on removing features of the standard that are never used or are used only by a minority of users. By removing redundant features, the standard can be simplified. For the purpose of categorising whether or not a certain feature is a less frequently used feature, we propose to calculate the relative usage of individual features (i.e. attributes or elements). The relative usage of an element or attribute X is given by the following formula (F3):
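One plausible formulation of F3, assuming the normalisation by average occurrences described in the following paragraph:

\[
\text{Relative usage of } X = \frac{\text{number of occurrences of } X \text{ in the corpus}}{\text{average number of occurrences per element (or attribute) in the corpus}} \times 100\%
\]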

Note that if X is an attribute, the denominator is computed as the average occurrence of an attribute, obtained by counting all occurrences of attributes in the entire corpus. The average occurrences of elements/attributes in the corpus are computed mainly in order to normalise the usage distribution of individual elements/attributes with respect to the size of the corpus. This normalisation is especially useful in situations where features must be analysed and compared across a number of corpora or across different versions of a standard. Moreover, the computation of percentages in the above formula provides a basis for a longitudinal study of the relative usage of features.

However, it is worth noting that there can be highly important features with a low frequency of usage (for example, the version attribute of the root xliff element is highly important, but it appears only once in a file). This problem of important features being categorised as less frequently used features can be addressed to a certain degree by categorising individual features into groups and then identifying the least frequently used features within each category. Some example groups, as identified by Song and Bayerl (2003), are presented below:

 content features (features used to represent content);
 structural features (features used to organise or maintain relationships or hierarchy between several related features);
 presentation features (features used to format or present content);
 metadata features (features that carry various metadata).

This metric may also be applied to individual organisations to determine the least frequently used elements within different organisations. However, to obtain an overall list of the least frequently used features of the standard (considering all organisations), a highly representative corpus is needed. Failing this, manual inspection of the infrequently used attributes and elements can be used to assess their importance.

4.4.4 Custom Features and Frequently Added Extensions

Commonly used custom features (i.e. attributes and attribute values, elements and element values) that are not defined in the schema can be manually examined to identify potential features that could be standardised in future. Custom feature identification can be automated, for example by programmatically filtering all features and feature values that are not defined in the schema or specification.
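As a sketch of how this filtering could be automated against the repository described in Sect. 4.3.2, the query below lists elements whose namespace is not one of the XLIFF namespaces; SQLite and the tags table layout from Table 4 are assumed for illustration.

# Sketch: list custom (non-XLIFF-namespace) elements and their frequencies.
import sqlite3

XLIFF_NAMESPACES = (
    "urn:oasis:names:tc:xliff:document:1.1",
    "urn:oasis:names:tc:xliff:document:1.2",
)

conn = sqlite3.connect("xliff_corpus.db")
query = """
    SELECT uri, tag, COUNT(*) AS occurrences
    FROM tags
    WHERE uri NOT IN (?, ?)
    GROUP BY uri, tag
    ORDER BY occurrences DESC
"""
for uri, tag, count in conn.execute(query, XLIFF_NAMESPACES):
    print(f"{count:6d}  {{{uri}}}{tag}")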

Commonly used custom extensions give subtle hints of sources where new features can be adopted for the standard. Various external schemas used and external features implemented (e.g. implemented using the namespace mechanism) in the files can be analysed under this metric. However, this involves further manual investigation of

related schemas and documentation of the identified extensions to explore specific features that might be adopted. Custom extensions may also give hints about other standards used along with the current standard (e.g. the use of ITS extensions within XLIFF). Standard developers can therefore implement features that interoperate better with such standards.

4.4.5 Feature Usage Patterns

Frequently used tag configuration patterns (e.g. XML snippets) are useful in identifying best usage practices of the standard. Manual or automated (Algergawy et al 2011) pattern identification mechanisms can be employed for this purpose. This metric can be used to identify the frequently used features within individual organisations along with their structural configurations. Moreover, this metric is also useful for the standard developers to identify syntactically and semantically related feature configurations. The use of semantically identical yet syntactically different feature configurations may increase the complexity of the standard and hinder tool interoperability (Lieske 2011). This metric may give evidence of multiple ways of representing the same information using different feature configurations. Therefore, such scenarios can be identified and addressed by the standard developers. For example, some XML elements are used to group several related elements together (e.g. the group element in XLIFF). Analysis of usage patterns related to such elements is helpful in identifying the scenarios mentioned above. Moreover, identification of best usage practices will help to promote such usage (e.g. by identification of Microformats – see Sect. 4.1.1). Tool vendors may find this information useful to optimise their tools, particularly by analysing and implementing commonly occurring patterns. This will also help to improve the tools’ interoperability.

4.4.6 Most Frequently Used Features of Individual Organisations

Identification of the most frequently used features of individual organisations (not employed by others) gives a basis for comparing the features used in different organisations. This metric gives a measure of how widespread the feature in consideration is. This information may be useful for standard developers to assess the effect on different organisations of either the removal or the modification of a certain existing feature. On the other hand, individual companies can identify where they deviate from the norm and also identify commonly accepted alternatives. Such analysis might improve their efficiency and at the same time contribute towards improving potential data interoperability with other organisations.

While the metric “Least Frequently Used Features” introduced in Sect. 4.4.3 considers the entire corpus for determining the least frequently used features of a standard, a modified version of the same formula can be used to determine the most frequently used features (and the least frequently used features) of individual organisations. Here the equation is applied to the features derived from the XML files obtained from an individual organisation. The modified equation for determining the relative frequency of usage within an organisation is given below. The relative usage of an element or attribute X in organisation Y is given by (F4):

Relative usage of X in organisation Y (%) = (number of occurrences of X in the files obtained from organisation Y / average number of occurrences of an element (or attribute) in the files obtained from organisation Y) × 100

Features used by individual organisations can be sorted in descending order of their relative usage frequencies and listed in a table whose columns contain the sorted feature lists of these organisations. This table can be used to manually compare individual feature usage across different organisations. The above equation can also be used to determine the least frequently used and the most frequently used elements within individual organisations.
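A sketch of how such per-organisation lists could be produced is given below; the organisation names and counts are invented for illustration, and the denominator again follows the convention of dividing by the number of features defined in the specification.

def per_organisation_usage(counts_by_org, features_in_spec):
    """Relative usage (F4) of features within each organisation, returned as
    lists sorted in descending order of relative usage."""
    tables = {}
    for org, counts in counts_by_org.items():
        average = sum(counts.values()) / features_in_spec
        usage = {f: 100.0 * n / average for f, n in counts.items()}
        tables[org] = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    return tables

columns = per_organisation_usage(
    {"Company A": {"trans-unit": 500, "source": 500, "g": 3},
     "Company B": {"trans-unit": 200, "source": 200, "note": 40}},
    features_in_spec=38)
for org, rows in columns.items():
    print(org, rows[:3])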

4.4.7 Features and their Values

For each attribute and element, manual analysis of attribute values and element values is especially useful for the standard developers to identify semantic issues associated with them (Currie et al 2002). This analysis may provide strong evidence of:


• attributes or elements that are used to represent the same type of information (i.e. ambiguous features);
• the use of different values in attributes or elements to represent the same information (i.e. ambiguous feature values).

The above two criteria can be exemplified using examples adapted from Abdalla (2003). For example, this analysis may reveal an attribute whose values have been represented in different units (e.g. weight = {10 Kg, 22 lbs}), which may lead to semantic interoperability issues. Similarly, features whose values have been represented in different expressions (e.g. city = {Dublin, Dub}) or with different precisions (e.g. grade = {A+, very good}), which can also lead to semantic interoperability issues, may be identified.

In addition, this metric may provide some useful statistics about the usage of pre-defined feature values (defined in the specification). However, manual analysis of values may not be practical for attributes or elements where large or infinite numbers of possibilities exist for their values. Therefore, we recommend the application of this metric for specific and pre-determined sets of features.

4.5 Analysis of the XLIFF Corpus

In our research, due to time constraints, a selected set of sub-research questions was identified and investigated using our XML-DIUE framework. These selected sub-research questions contribute towards finding answers to our main research question (RQ1), “what are the limitations of existing localisation data-exchange standards and implementations?” The sub-research questions are listed below:

1) What are the core useful features of the standard?
2) What attributes and attribute values can cause interoperability issues?
3) Is tool support for features appropriately aligned with frequency of feature use?
4) How can the standard be simplified?
   a. What elements can be removed from the XLIFF specification?
   b. What attributes can be removed from the XLIFF specification?
5) What are the possible extensions from which new features could be adopted?
6) What tools are involved in processing XLIFF files?
7) On average, what is the degree of syntactic conformance to the standard?
8) What are the general patterns of XLIFF usage of each organisation?

It is important to highlight that the first, fifth and last sub-research questions will not provide us with answers leading to the identification of limitations of standards and tools. However, they will help us to identify important criteria related to standards and tools in terms of achieving our overall objective (see Sect. 1.2), which is improving the semantic and syntactic interoperability of tools and technologies using the XLIFF standard.

For each of the above sub-research questions, different standard usage analysis metrics were identified and computed over the database using SQL. It is pertinent to mention that the analysis of a single research question may involve several standard usage metrics and several SQL queries. An example SQL query is given below.

The SQL query for identifying the most frequently used elements of ‘Company A’ (e.g. for the metric “most frequently used features of organisations”) is:

select tag, uri, count(tag) as freq
from tags
where source = 'Company A'
group by tag, uri
order by freq desc;
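To illustrate how such queries can be combined with the metrics above, the following sketch runs a commonality-style query over the same tags table from Python; the database file name is hypothetical and only the columns used in the example query (tag, uri, source) are assumed.

import sqlite3

conn = sqlite3.connect("xliff_corpus.db")   # hypothetical repository file

# Number of distinct sources in which each element occurs (input to F2).
rows = conn.execute(
    "SELECT tag, COUNT(DISTINCT source) AS sources_using "
    "FROM tags GROUP BY tag ORDER BY sources_using DESC"
).fetchall()

total_sources = conn.execute(
    "SELECT COUNT(DISTINCT source) FROM tags"
).fetchone()[0]

for tag, sources_using in rows:
    print(tag, round(100.0 * sources_using / total_sources, 1))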

4.6 Results

Although the current XLIFF corpus might not represent the true distribution of XLIFF content in production environments (see Sect. 4.8.2 for a discussion of the validity of this research), we are able to share some early and important results that provide answers to the sub-research questions presented in the previous section. The results of the various analyses performed in this research on our XLIFF corpus are summarised below.

4.6.1 Core Useful Features of the Standard

We utilised the metric described in Sect. 4.4.2 in order to identify core useful features of the XLIFF standard. We did this by identifying the most commonly used XLIFF features across different organisations and sources. First, we calculated the commonality ratio of XLIFF elements and identified the core useful elements by considering elements with over a 60% commonality ratio (i.e. features used by at least three organisations/sources). Then, attributes with more than a 60% commonality ratio were identified for each element. The core features identified by this analysis are given below in Table 8. Seven of the elements listed in Table 8 have a 100% commonality ratio, meaning they have been used in all five sources.

Element          Attributes
xliff            version
file             original, source-language, target-language, product-name, product-version, build-num, tool
header           -
body             -
trans-unit       id, approved, translate, resname, restype
source           xml:space
target           xml:lang, state
alt-trans        -
external-file    href
note             from
ph               id
group            resname, restype

Table 8. Core Features of the XLIFF Standard

4.6.2 Attributes and Attribute Values that can Cause Interoperability Issues

In XLIFF, attributes play a major role by carrying important metadata in the localisation process. We manually analysed XLIFF attributes and their values to identify attributes and attribute values that can cause both semantic and syntactic interoperability issues.

This was done by obtaining the unique values used for individual attributes from the database. The values were mainly analysed against the following criteria: the formats used to represent values, the typical values found, default value usage and predefined value usage. A few selected attributes and their values found in the corpus are given in Table 9. The full list can be consulted in Appendix F. In Table 9, attribute values predefined in the XLIFF v1.2 specification are marked in bold.

Attribute          Typical Values/Formats found in the XLIFF Corpus
category           String, Enterprise Software, resourceBundle, Survey Project
date               04/02/2009 23:24:18, 11/06/2008, 2001-04-01T05:30:02, 2006-11-24, 2007-01-01, 2010-03-16T21:58:27Z
match-quality      100, 100%, 78.46, fuzzy, String, Guaranteed
source-language    EN, EN-US, ENGLISH, cs, da, de, de-DE, en, en-DE, en-US, en-us, en_US, es, fi, fr, fr-FR, it, ja, ko, pt, ru, sv, sv_SE, unknown, x-dev, zh_CN, zh_TW

Table 9. A Few Example Attributes and their Values
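A hedged sketch of the kind of query used to obtain the unique values of a given attribute is shown below; it assumes an attributes table with name, value and source columns, which is an illustrative schema rather than the exact one used in this research.

import sqlite3

conn = sqlite3.connect("xliff_corpus.db")   # hypothetical repository file

def distinct_values(attribute_name):
    """Unique values recorded for one attribute across the whole corpus."""
    rows = conn.execute(
        "SELECT DISTINCT value FROM attributes WHERE name = ? ORDER BY value",
        (attribute_name,),
    )
    return [value for (value,) in rows]

print(distinct_values("source-language"))
print(distinct_values("match-quality"))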

4.6.3 Tool Support for XLIFF Elements vs. Frequency of Element Use

Using the formula described in Sect. 4.4.3, relative usage frequencies have been computed for each XLIFF element. The plot of XLIFF elements vs. their log (relative usage frequency) is given below in Fig. 18. Fig. 19 shows the distribution of XLIFF features in individual files and organisations.

[Figure 18 plots each XLIFF element against its relative usage frequency on a logarithmic scale. In descending order of relative usage frequency, the elements are: source, target, trans-unit, note, alt-trans, ph, context, context-group, body, file, header, group, x, tool, xliff, ept, bpt, mrk, count, count-group, phase, phase-group, glossary, g, seg-source, sub, prop, it, prop-group, external-file, skl, internal-file, bin-unit, bin-source, ex, bx, bin-target, reference.]

Figure 18. Plot of log (Relative Usage Frequencies) of XLIFF Elements


Figure 19. XLIFF Feature Distribution in Files and Organisations

4.6.4 Simplification of the Standard

The least frequently used XLIFF elements and attributes have been identified by using the metric presented in Sect. 4.4.3. Due to the significant variation of the XLIFF content in individual files, for each element the relative usage frequencies were also computed with respect to the total trans-unit count and the total file size.

Least Frequently Used Elements

The total number of occurrences of XLIFF elements in the corpus is 2,275,559. The number of XLIFF elements defined in the XLIFF specification is 38. Hence the average use of an element in the corpus is calculated to be 59,883.13. Then, using formula (F3) described in Sect. 4.4.3, the relative usage of each element was calculated. In addition, a slightly modified version of the same formula (F3) was used to calculate the relative usage of individual elements with respect to the total trans-unit count (i.e. element count / total trans-unit count) and the total file size (i.e. element count / total file size).
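The three normalisations used in this section can be expressed compactly as follows; the function below is an illustrative sketch (the parameter names, and the unit assumed for the total file size, are not taken from the original analysis).

def normalised_usage(element_counts, total_trans_units, total_file_size,
                     elements_in_spec=38):
    """For every element, return its usage relative to (1) the average element
    usage in the corpus, (2) the total trans-unit count and (3) the total file
    size, each expressed as a percentage."""
    average = sum(element_counts.values()) / elements_in_spec
    return {
        name: (100.0 * count / average,
               100.0 * count / total_trans_units,
               100.0 * count / total_file_size)
        for name, count in element_counts.items()
    }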

Elements with a relative usage of less than 1% are considered the least frequently used elements in this research. The least frequently used elements are listed below in Table 10 with their relative usage frequencies calculated with respect to the average element usage in the corpus, the total trans-unit count and the total file size.

Corpus Average (%)      | trans-unit Count (%)   | File-Size (%)
reference(0.00)         | reference(0.00)        | reference(0)
bin-target(0.01)        | bin-target(0.00)       | bin-target(0)
ex(0.07)                | ex(0.01)               | ex(0.02)
bx(0.07)                | bx(0.01)               | bx(0.02)
bin-source(0.09)        | bin-source(0.02)       | bin-source(0.02)
bin-unit(0.09)          | bin-unit(0.02)         | bin-unit(0.03)
internal-file(0.11)     | internal-file(0.02)    | internal-file(0.03)
skl(0.16)               | skl(0.03)              | skl(0.05)
external-file(0.18)     | external-file(0.03)    | external-file(0.06)
prop-group(0.25)        | prop-group(0.04)       | prop-group(0.08)
it(0.28)                | it(0.05)               | it(0.09)
prop(0.47)              | prop(0.08)             | prop(0.15)
sub(0.48)               | sub(0.08)              | sub(0.15)
seg-source(0.7)         | seg-source(0.12)       | seg-source(0.22)
g(0.93)                 | g(0.16)                | g(0.29)
-                       | glossary(0.3)          | glossary(0.55)
-                       | phase-group(0.3)       | phase-group(0.55)
-                       | phase(0.31)            | phase(0.56)
-                       | count-group(0.34)      | count-group(0.58)
-                       | count(0.34)            | count(0.62)
-                       | mrk(0.44)              | mrk(0.8)
-                       | bpt(0.75)              | -
-                       | tool(0.9)              | -
-                       | xliff(0.9)             | -

Table 10. Least Frequently Used Elements31

Least Frequently Used Attributes

The total number of occurrences of XLIFF attributes in the overall corpus is 1,519,314 and the average attribute usage is calculated to be 18,528.22. Using these figures, the relative usage of individual attributes was computed using formula (F3) described in Sect. 4.4.3. Similarly, the relative usage frequencies of individual attributes have been computed with respect to the total trans-unit count and the total file size.

31 The relative usage frequencies for each element have been rounded to two decimals and are given within brackets.

The following table (Table 11) lists the least frequently used attributes (again, taking 1% as the threshold) and their relative usage with respect to the average attribute usage in the entire corpus, the total trans-unit count and the total file size.

Corpus Average (%)       | trans-unit Count (%)    | File-Size (%)
alttranstype(0.00)       | alttranstype(0.00)      | alttranstype(0.00)
annotates(0.00)          | annotates(0.00)         | annotates(0.00)
assoc(0.00)              | assoc(0.00)             | assoc(0.00)
clone(0.00)              | category(0.00)          | clone(0.00)
comment(0.00)            | clone(0.00)             | comment(0.00)
extype(0.00)             | comment(0.00)           | equiv-trans(0.00)
merged-trans(0.01)       | contact-phone(0.00)     | extype(0.00)
uid(0.01)                | equiv-text(0.00)        | merged-trans(0.00)
equiv-trans(0.02)        | equiv-trans(0.00)       | tool-company(0.00)
tool-company(0.03)       | extype(0.00)            | tool-version(0.00)
tool-version(0.05)       | job-id(0.00)            | uid(0.00)
contact-phone(0.07)      | merged-trans(0.00)      | category(0.01)
category(0.08)           | tool-company(0.00)      | contact-phone(0.01)
equiv-text(0.08)         | tool-version(0.00)      | equiv-text(0.01)
job-id(0.08)             | uid(0.00)               | job-id(0.01)
charclass(0.17)          | charclass(0.01)         | charclass(0.02)
minwidth(0.17)           | form(0.01)              | form(0.02)
maxheight(0.19)          | maxheight(0.01)         | maxheight(0.02)
form(0.21)               | minwidth(0.01)          | minwidth(0.02)
mime-type(0.32)          | help-id(0.02)           | mime-type(0.03)
help-id(0.37)            | menu(0.02)              | help-id(0.04)
menu(0.37)               | menu-name(0.02)         | menu(0.04)
menu-name(0.37)          | menu-option(0.02)       | menu-name(0.04)
menu-option(0.37)        | mime-type(0.02)         | menu-option(0.04)
product-name(0.38)       | product-name(0.02)      | product-name(0.04)
state-qualifier(0.43)    | state-qualifier(0.02)   | state-qualifier(0.04)
product-version(0.52)    | product-version(0.03)   | product-version(0.05)
css-style(0.80)          | css-style(0.04)         | css-style(0.08)
unit(0.80)               | maxbytes(0.04)          | exstyle(0.08)
maxbytes(0.82)           | minbytes(0.04)          | maxbytes(0.08)
minbytes(0.82)           | minheight(0.04)         | minbytes(0.08)
minheight(0.82)          | unit(0.04)              | minheight(0.08)
exstyle(0.86)            | exstyle(0.05)           | unit(0.08)
reformat(0.90)           | reformat(0.05)          | reformat(0.09)
-                        | priority(0.06)          | priority(0.1)
-                        | pos(0.07)               | pos(0.13)
-                        | prop-type(0.08)         | rid(0.14)
-                        | rid(0.08)               | prop-type(0.15)
-                        | size-unit(0.09)         | size-unit(0.16)
-                        | xid(0.09)               | xid(0.16)
-                        | maxwidth(0.11)          | maxwidth(0.2)
-                        | crc(0.15)               | crc(0.27)
-                        | font(0.24)              | font(0.43)
-                        | mid(0.24)               | mid(0.44)
-                        | mtype(0.24)             | mtype(0.45)
-                        | build-num(0.3)          | build-num(0.55)
-                        | company-name(0.3)       | company-name(0.55)
-                        | contact-email(0.3)      | contact-email(0.55)
-                        | contact-name(0.31)      | contact-name(0.56)
-                        | process-name(0.31)      | process-name(0.56)
-                        | href(0.33)              | href(0.6)
-                        | tool(0.33)              | tool(0.61)
-                        | count-type(0.34)        | count-type(0.62)
-                        | style(0.41)             | style(0.74)
-                        | ctype(0.44)             | ctype(0.81)
-                        | phase-name(0.53)        | phase-name(0.97)
-                        | date(0.59)              | -
-                        | tool-name(0.9)          | -

Table 11. The Least Frequently Used Attributes

It is worth noting that the attributes alttranstype, annotates, assoc, clone and comment have a relative usage of 0.00, meaning those attributes have not been used in the entire XLIFF corpus at all.

4.6.5 Extensions Used

We analysed the various extensions as well as namespaces used in the XLIFF files. The different namespaces found in the corpus are listed below:

• http://www.gs4tr.org/schema/xliff-ext
• http://www.idiominc.com/ws/asset
• http://www.w3.org/1999/xhtml
• http://www.sap.com
• urn:xmarker
• http://cmf.zope.org/namespaces/default/
• http://www.gdf.com/xmlns/gdf-xstr.xsd
• urn:ektron:xliff
• http://sdl.com/FileTypes/SdlXliff/1.0
• http://www.tektronix.com
• http://www.ontram.de/XLIFF-Sup-V1
• http://www.w3.org/2001/XMLSchema
• http://www.crossmediasolutions.de/
• rtwsk-extensions

4.6.6 Tools Involved

We analysed the various tools associated with the XLIFF files. Mainly, the tool attribute of the file element and the tool-name attribute of the tool element were manually analysed for this purpose. The list of tools is given below:

• Snap-On Ireland, CATFile_Translation_Utility
• NAIL.LUI 1.6.0.21409
• Idiom WorldServer 9.0.5
• AgdaVS export tool
• Benten
• Ektron
• IGD-2-XLIFF
• ITS Translate Decorator
• Maxprograms JavaPM
• Okapi.Utilities.Set01
• Pleiades
• Swordfish III
• blancoNLpackGenerator
• genrb
• Idiom WorldServer 9.2.0
• LKR
• PASSOLO 3.0
• TM-ABC
• Rainbow v2.00

4.6.7 Validation Results

The XLIFF files have been validated32 against the schemas of the current version of XLIFF (i.e. XLIFF version 1.2) and the previous XLIFF versions (1.1 and 1.0).

Figure 20. Validation Results33

32 After identifying a bug in the Python script we wrote for validating the XLIFF files, the files were manually re-validated using a freely available tool, the XLIFFChecker (version 1.0-2). The XLIFFChecker is a tool specifically designed for validating XLIFF files. The tool is available to download from: http://www.maxprograms.com/products/xliffchecker.html [accessed 22 Mar 2013].

33 The XLIFF files were validated against the XLIFF version they themselves declared.

The results show that 1.2 is the predominantly used XLIFF version. Out of the 2,758 XLIFF version 1.2 files, 2,362 were found to be invalid. Of the valid XLIFF 1.2 files, 22 were valid against the transitional schema and 374 against the strict schema. Out of the 232 XLIFF version 1.1 files, 146 were valid, while out of the 170 XLIFF 1.0 files, only 42 were valid. In addition, 18 files were found without the required XLIFF version information, and 1 XLIFF version 2.0 file was also found. The results are summarised in the above graph (see Fig. 20).
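As an illustration of the validation step, the sketch below checks a single file against the XLIFF 1.2 strict schema using lxml; the schema and sample file names are assumptions, and the original study used its own Python script followed by manual re-validation with XLIFFChecker.

from lxml import etree

# Assumes the XLIFF 1.2 strict schema has been saved locally; the transitional
# schema can be checked in exactly the same way.
schema = etree.XMLSchema(etree.parse("xliff-core-1.2-strict.xsd"))

def validate(path):
    """Return (is_valid, first error message or None) for one XLIFF file."""
    doc = etree.parse(path)
    if schema.validate(doc):
        return True, None
    return False, str(schema.error_log[0])

print(validate("sample.xlf"))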

Having established that 1.2 is the XLIFF version used by the majority, the XLIFF 1.2 files were further analysed for their complexity in terms of trans-unit count and their validation results. The following graph (see Fig. 21) shows the number of invalid, valid and strict-valid files vs. the log (trans-unit count).

Figure 21. The Graph of Validation Result vs log (trans-unit Count)

4.6.8 Feature Usage Patterns

As highlighted in Sect. 4.4.5, the identification of unique XLIFF patterns and their frequency of use by different organisations is extremely useful in many different ways.

However, XML tree mining is a separate, active and complex research area. Therefore, the identification of unique XLIFF patterns and their frequency of use could not be computed in this research due to time and resource constraints. Moreover, the use of a relational database in this research limits the ability to retrieve structural information such as the exact order of tags. Therefore, an alternative approach was used to manually derive general usage patterns of elements for individual organisations. This was done by manually listing the elements and their child elements used by each organisation. Data in the children table was used for this purpose. Hence, a derived pattern for an organisation represents a generic pattern unique to that organisation. A derived pattern can be considered a merger of the unique patterns used in that organisation.
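A sketch of how a derived pattern could be assembled from such a table is given below; the children(parent, child, source) column layout and the database file name are illustrative assumptions about the repository.

import sqlite3
from collections import defaultdict

conn = sqlite3.connect("xliff_corpus.db")   # hypothetical repository file

def derived_pattern(organisation):
    """Merge the parent/child element pairs used by one organisation into a
    generic pattern: parent element -> set of child elements observed."""
    pattern = defaultdict(set)
    for parent, child in conn.execute(
        "SELECT DISTINCT parent, child FROM children WHERE source = ?",
        (organisation,),
    ):
        pattern[parent].add(child)
    return dict(pattern)

for parent, children in derived_pattern("Eclipse").items():
    print(parent, "->", sorted(children))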

The pattern derived for Eclipse is given below in Fig. 22. Patterns derived for other organisations are given in Appendix G.

Figure 22. Pattern Derived for Eclipse

4.7 Analysis of the Results

Although the current XLIFF corpus is not representative of all XLIFF files used in production environments, the conclusions drawn from our analysis are still significant, especially in illustrating the utility of our XML-DIUE framework.

4.7.1 Core Useful Features of the Standard

The analysis of commonly used elements across different sources (see Sect. 4.6.1) leads to the identification of important features in terms of cross-organisational data-exchange interoperability. This result could also support work towards a “modularisation” of the standard, mainly by defining a core module that consists of the core useful features identified by this analysis.

The XLIFF specification (XLIFF-TC 2008b) as well as Frimannsson and Hogan (2005) illustrates a minimal XLIFF document. The elements and attributes presented in the minimal XLIFF document are given in Table 12.

Element        Attribute
xliff          version
file           source-language, datatype, original
body           -
trans-unit     id
source         -

Table 12. Elements and Attributes Found in the Minimal XLIFF Document (Adopted from XLIFF-TC 2008b)

While most of the core useful features identified using our framework (see Table 8 of Sect. 4.6.1) are present in the minimal XLIFF document, our research has revealed several additional attributes and elements that are in common use. The additional elements we discovered are header, target, alt-trans, external-file, note, ph and group.

Moreover, the core features systematically derived by our research are also consistent with the features proposed by TAUS as the XLIFF core (TAUS 2011b; Choudhury 2011), a list of ten elements to be considered as the XLIFF core. Our analysis shows that, in addition to those elements, three further elements are also in common use. It is worth noting that our results are also consistent with the features of the XLIFF Core in XLIFF 2.0 (see footnote 34).

4.7.2 Attributes and Attribute Values that can Cause Interoperability Issues

The manual analysis of the attributes and attribute values revealed no use of certain pre-defined attribute values (e.g. the pre-defined value lisp for the datatype attribute is never used in our XLIFF corpus). Pre-defined attribute values that nobody uses increase the complexity of the standard. Therefore, standard developers may deprecate those pre-defined attribute values to reduce the complexity of the standard.

Moreover, the analysis revealed semantic interoperability issues associated with some attributes. For example:

• use of incorrect syntax for specifying user-defined attribute values (i.e. not using the ‘x-’ prefix), e.g. the use of values such as text for the datatype attribute;

• use of custom values regardless of the existence of predefined values (e.g. the use of the pofile value instead of the predefined po value for the datatype attribute);
• use of different expressions to denote the same value (e.g. the use of the xml and x-text/xml values for the datatype attribute) and the use of different units to represent values (e.g. the use of values such as 100, 100%, 78.46 and fuzzy for the match-quality attribute);
• the use of the same values in different attributes (i.e. ambiguous attributes), for example the use of the user-defined value x-tab in both the context-type attribute and the ctype attribute, the use of the predefined value database in both the datatype and context-type attributes, and the use of the user-defined value x-button in the context-type attribute and button in the restype attribute.

34 See the working specification of XLIFF 2.0: https://tools.oasis-open.org/version-control/browse/wsvn/xliff/trunk/xliff-20/xliff-core.pdf [accessed 13 Mar 2013]

Issues like the ones outlined above can be addressed by:

• standardising the most frequently used custom values;
• agreeing on the semantics of values as well as attributes;
• tightening the standard with strict conformance clauses to cover those semantics.

In addition, publishing conformance test suites may help tool developers to identify and address such issues.

We also noticed the use of extreme values in some features, such as the use of extremely lengthy strings for IDs, which may adversely affect data-exchange interoperability. It is worth mentioning that the lengthiest ID (i.e. value of the id attribute) found in our corpus is 8,583 characters long. Although the specification recommends avoiding spaces within IDs, IDs have been found using not only spaces but also several line breaks. Although such extreme values may affect interoperability, they are allowed by the specification. Therefore, standard developers have to take measures to tighten the conformance clauses to discourage these kinds of problematic values.

Our analysis also revealed attributes that may cause syntactic interoperability issues, mainly through the use of incorrect formats to denote values in some attributes. Examples include representing languages in formats other than those specified in BCP 47 (see footnote 35), e.g. values such as en_US, ENGLISH, x-dev, unknown and uz-UZ-Cyrl, or representing dates in formats other than those specified in ISO 8601 (see footnote 36), e.g. values such as 04/02/2009 23:24:18 and 11/06/2008. These kinds of issues may be addressed by developing and publishing conformance test suites.
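The two format checks mentioned above can be approximated programmatically; the sketch below is deliberately simplified (the language-tag pattern is far from a complete BCP 47 validator, and only three ISO 8601 layouts are tested), so it illustrates the idea rather than the exact checks performed in this research.

import re
from datetime import datetime

# Very rough approximation of a BCP 47 language tag: language, optional
# script, optional region. Not a complete validator.
BCP47_LIKE = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-[A-Za-z]{2}|-\d{3})?$")

def looks_like_bcp47(value):
    return bool(BCP47_LIKE.match(value))

def looks_like_iso8601(value):
    for fmt in ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

print([v for v in ("en-US", "en_US", "ENGLISH", "x-dev") if not looks_like_bcp47(v)])
print([v for v in ("2006-11-24", "11/06/2008") if not looks_like_iso8601(v)])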

4.7.3 Tool Support for XLIFF Elements vs. Frequency of Element Use

Bly’s (2010) top-down analysis of XLIFF implementations shows their level of support for different XLIFF elements. Morado-Vázquez and Wolff (2011) visualise this information as bar graphs (see Fig. 23 and Fig. 24).

Figure 23. Analysis of XLIFF Editors37 (Morado-Vázquez and Wolff 2011)

Figure 24. Analysis of XLIFF Generators (Morado-Vázquez and Wolff 2011)

35 http://tools.ietf.org/html/bcp47 [accessed 13 Mar 2013]

36 http://www.iso.org/iso/catalogue_detail?csnumber=40874 [accessed 13 Mar 2013]

37 Number of tools supporting the element vs. element

We present the overall frequency distribution of XLIFF elements in our corpus in Fig. 18 and Fig. 19 (Sect. 4.6.3). An analysis of the graphs reveals a connection between the lack of tools’ support for certain XLIFF elements and the frequency of use of XLIFF elements.

The least frequently used elements identified by our research are presented in Sect. 4.6.4. Out of those elements, all but two have been implemented by only one or two tools. This is further confirmed by the survey results presented by Morado-Vázquez and Filip (2012), which show a lack of tool implementation and support for the least frequently used elements reported in Sect. 4.6.4.

4.7.4 Simplification of the Standard

In Sect. 4.6.4, the least frequently used features were computed using three different methods: by normalising the individual feature counts with respect to 1) the average feature usage in the entire corpus; 2) the total trans-unit count; and 3) the total file size. The normalisation of feature counts with respect to the average feature usage in the corpus assumes that all the features are equally distributed in the corpus. Although this is not necessarily true, due to the organisational structure of features, it provides a generalisable mechanism, independent of any particular standard, for identifying the least frequently used features. In contrast, normalisation of the feature counts with respect to the total trans-unit count is highly specific to XLIFF, and more suitable in the context of our research. Moreover, the file size is proportional to the trans-unit count. Therefore, the least frequently used features identified by normalising the feature counts with respect to the total file size are a subset of the least frequently used features identified by normalising feature counts with respect to the total trans-unit count. However, the “importance” of a feature depends on both the commonality ratio and the relative usage frequency of that feature. Therefore, to identify candidate features for deprecation (or removal), both the frequency of usage and the commonality ratio of the feature have to be taken into account. Features with a very low relative frequency of usage and an extremely low commonality ratio are the ideal candidates for deprecation from the standard. In our research we only considered the least frequently used features identified by normalising the feature count with respect to the total trans-unit count.
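Combining the two criteria discussed above can be expressed as a simple filter; in the sketch below the thresholds follow the text (1% relative usage, 60% commonality), while the example figures for individual features are invented for illustration.

def deprecation_candidates(relative_usage, commonality_ratio,
                           usage_threshold=1.0, commonality_threshold=60.0):
    """Features whose relative usage (F3) is below the usage threshold and
    whose commonality ratio (F2) is below the commonality threshold."""
    return sorted(
        feature for feature, usage in relative_usage.items()
        if usage < usage_threshold
        and commonality_ratio.get(feature, 0.0) < commonality_threshold
    )

print(deprecation_candidates(
    {"bin-unit": 0.02, "external-file": 0.03, "source": 800.0},
    {"bin-unit": 40.0, "external-file": 60.0, "source": 100.0}))
# -> ['bin-unit']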

The commonality ratios of the least frequently used features identified in Sect. 4.6.4 were analysed. In this research, we propose that, of the least frequently used features, those with less than a 60% commonality ratio (i.e. used by fewer than three organisations) are the ideal candidates for deprecation. The analysis of the commonality ratios of the least frequently used elements revealed that the xliff element has a commonality ratio of 100% and the external-file element has a 60% commonality ratio, while the rest of the elements have less than a 60% commonality ratio. As such, the ideal candidates for deprecation from the standard are given below:

reference, bin-target, ex, bx, bin-source, bin-unit, internal-file, skl, prop-group, it, prop, sub, seg-source, g, glossary, phase-group, phase, count-group, count, mrk, bpt, tool

The attributes alttranstype, annotates, assoc, clone, comment and extype are never used in the corpus. Therefore, these attributes are ideal candidates for deprecation. The analysis of the commonality ratios of the other least frequently used attributes revealed that product-name, product-version, build-num, date, href and tool are the only attributes with a 60% commonality ratio. The rest of the attributes have less than a 60% commonality ratio and can therefore be deprecated from the standard to reduce the implementation complexity of the standard. The list of candidate attributes for deprecation is given below:

alttranstype, annotates, assoc, category, charclass, clone, comment, company-name, contact-email, contact-name, contact-phone, count-type, crc, css-style, ctype, equiv-text, equiv-trans, exstyle, extype, font, form, help-id, job-id, maxbytes, maxheight, maxwidth, menu, menu-name, menu-option, merged-trans, mid, mime-type, minbytes, minheight, minwidth, mtype, phase-name, pos, priority, process-name, prop-type, reformat, rid, size-unit, state-qualifier, style, tool-company, tool-name, tool-version, uid, unit, xid

While our framework helps to identify features that could be deprecated to reduce complexity, the ultimate decision on deprecating a certain feature has to be taken by the standard developers (e.g. the XLIFF TC), with the consensus of all members. Moreover, the standard developers may also decide to isolate such features from the core useful features and construct separate modules instead of deprecating them.

Further analysis of our results shows that the prop and prop-group elements had already been deprecated in XLIFF version 1.1. Moreover, the lack of use of the bin-unit, bin-source and bin-target elements has already been identified by the XLIFF TC, and the creation of an optional Binary Data Module38 consisting of those elements has been proposed for the next version of the standard (i.e. XLIFF 2.0). The lack of use of certain inline elements (such as ex and bx), as well as issues associated with other inline elements, is being discussed39 within XLIFF TC meetings and will be addressed by the inline element sub-committee of XLIFF in XLIFF 2.0. This analysis shows the suitability of our systematic methodology for identifying features for deprecation or modularisation.

38 See XLIFF TC discussion: http://markmail.org/message/imo755yezrjasmja#query:+page:1+mid:kixvjlbigva3xt2b+state:results [accessed 17 Jan 2013]

39 For example, see XLIFF TC discussion: http://markmail.org/search/?q=ex+bx++inline+xliff#query:ex%20bx%20%20inline%20xliff+page:1+mid:jnw2tjfdukhwhbtk+state:results [accessed 17 Jan 2013]

4.7.5 Extensions Used

The identification of extensions used in production environments will lead to a better understanding of these extensions, which could then be introduced into the standard itself where appropriate. It will also help to identify other XML-based file formats or standards regularly used with XLIFF and could lead to modifications in these standards that could improve their interoperability. In 2012, the XLIFF TC started collecting and publishing schemas of the XLIFF extensions40. Our research revealed several extensions in common use, in addition to the ones published by the XLIFF TC.

4.7.6 Tools Involved

The identification of the tools associated with files, along with the identification of invalid, erroneous or problematic XLIFF content, will support the standard-conformance evaluation of commonly used tools and technologies. It will also help to identify the most widely used tools in the market, whether commercial or open-source. The identification of open-source tools is of particular interest as they represent the most accessible sample implementations of the standards. Any problems associated with such tools could then be addressed.

An analysis of the tools landscape demonstrated a lack of open-source tools that can process XLIFF files. The only open-source tool that was used to process files in the current corpus is Okapi (including Rainbow).

4.7.7 Validation Results

The results show that the majority of the XLIFF content is represented in XLIFF version 1.2, although XLIFF 1.1 and XLIFF 1.0 are also still in use. These results suggest a strong requirement for backward compatibility in future versions of XLIFF. In particular, XLIFF 2.0 should be fully backward compatible with XLIFF 1.2.

40 See the XLIFF 1.2 Extensions sub-topic of https://wiki.oasis-open.org/xliff/ [accessed 17 Jan 2013]

According to the formula described in Sect. 4.4.1, the degree of syntactic conformance to the XLIFF 1.2 schema is computed at an alarming 14.36 per cent (i.e. 396 of the 2,758 XLIFF 1.2 files in the corpus are valid against the 1.2 schemas). This means that the majority of the generated XLIFF 1.2 files are invalid. The lack of conformance to the standard is clearly evident from these results. Undoubtedly, part of the lack of interoperability can be attributed to this lack of conformance to the standard.

The plot of validation results of XLIFF 1.2 files vs. trans-unit count (see Fig. 21) shows no clear evidence of a strong correlation between the number of trans-units in a file and the validation result. The most common trans-unit count among the invalid files is two (385 files); likewise, the most common trans-unit count among the strict-valid files is also two (77 files).

Moreover, 18 files were, surprisingly, found without even the required XLIFF version attribute. Without this attribute, tools cannot easily determine how to start processing the content properly.

These results show the strong necessity of thorough conformance testing of tools and technologies supporting XLIFF, and of taking measures to tighten the standard to ensure stronger tool compliance.

4.7.8 Feature Usage Patterns

The derived patterns represent XLIFF footprints unique to each organisation. Each footprint consists of a specific set of elements used in a pattern unique to that organisation. The analysis of XLIFF usage patterns also revealed interesting behaviour in the use of the group element. The group element is mainly intended to be used to maintain the structural hierarchy of several related elements or to represent several elements that have to be processed together. The group element can be used recursively. The analysis of patterns revealed that it has been used by only two organisations and that it has been used to group translation units. Moreover, the group element has not been used with more than two levels of recursion (i.e. the maximum recursion depth of the group element is two). Therefore, it is vital for tools to provide support for the group element at least up to two levels of recursion. It is also interesting to note the frequent use of the alt-trans element (which is used to provide an alternative translation for a certain translation) and the use of the note element (which is used to provide various comments) within the trans-unit elements.

4.8 Discussion

In the following, Sect. 4.8.1 discusses how our XML-DIUE framework fills gaps found in the literature, whereas Sect. 4.8.2 discusses various limitations of our framework in detail.

4.8.1 Addressing Existing Issues with XML-DIUE

Our framework not only helps to discover deficits in standards and their implementations, but is also helpful in identifying important features of standards that are useful for successful interoperability.

Unlike conceptual frameworks, our framework provides useful information for standard development backed by empirical evidence. Mykkänen and Tuomainen (2008) highlight the need for a systematic approach to identify and document the important aspects of a standard in relation to interoperability. While their framework is suitable for a theoretical self-assessment of a standard, our framework is more pragmatic and provides specific details from the standard-usage perspective, for example features that are leading to semantic interoperability issues, the identification of extensions and extension points, and the identification of user-defined features. Thus it addresses Mykkänen and Tuomainen’s considerations 3, 4 and 7 (see Sect. 2.6.3). Lewis et al (2008) point out the analysis of gaps in the standard as an important step towards achieving interoperability using standards. They also highlight the importance of backward compatibility of new versions of standards and the characteristics of bad standards, such as over-specification and under-specification (see Sect. 2.6.3). Our framework helps to identify gaps in standards (especially the limitations of the standards, such as semantic and syntactic interoperability issues associated with features). The XML-DIUE framework is also helpful in identifying features of standards that are important in terms of backward compatibility (see Sect. 4.4.2). Moreover, this work augments the current literature by assessing how users employ these standards, giving direction on how a standard may be under-specified or over-specified.

In terms of Shah and Kesan’s work, our framework provides a systematic approach for the identification of the most frequently used features. Therefore, our framework can be used as a component of Shah and Kesan’s (2008; 2009) approach. While Shah and Kesan’s methodology is helpful in identifying limitations of implementations with respect to standard support as well as cross-implementation interoperability, our framework not only helps to identify the most frequently used features, but also reveals much other important information regarding the usage of a standard, such as the least frequently used features and the features that are important for cross-organisational interoperability. The use of our framework in combination with Shah and Kesan’s methodology also supports the development of interoperability test suites.

Google presents various statistics obtained by examining over a billion HTML pages; however, they did not propose any recommendations to improve HTML based on their findings (see Sect. 4.1.1). Our framework goes two steps beyond Google’s work: by making explicit the architecture used to analyse a standard, and by designing several standard usage analysis metrics that describe the connection between various statistics and their significance for the standard development process. Moreover, our framework can be used to discover Microformats (i.e. frequently used feature usage pattern snippets) in XLIFF as well as in other data-exchange formats.

The lack of user feedback in the standardisation process has been identified as a significant problem (Soderstrom 2003). Our framework provides a partial solution to this problem: it utilises implicit user feedback on the standard, from a different perspective, by analysing user-generated files of the standard. Therefore, our framework provides an alternative mechanism for representing users’ feedback, and so it is suitable for standards where user interaction in the standard development process is minimal.

Although XML-DIUE resembles the ARH-HA! process proposed by Currie et al (2000), our framework is superior to it for several reasons. First, our framework can be applied not only to address interoperability issues related to metadata elements, but also, more generally, to other features of a standard. Second, the use of the Parser-Repository-Viewer architecture (see Sect. 4.2) yields a more scalable data representation mechanism than the use of spreadsheets. This will lead to additional findings, as illustrated in our XLIFF corpus analysis.

A survey (Anastasiou 2010b) revealed that although the current XLIFF document structure is adequate, “simplification of inline mark-up, clarification of metadata, stronger compliance requirements, better workflow control” and multilingual content support are important for future versions. Our framework provides empirical evidence on how inline mark-up elements can be simplified, by identifying less frequently used inline mark-up elements that can be deprecated. For example, our analysis results (see Sect. 4.6.4) indicate that inline elements such as ex, bx and it are less frequently used, while the ph inline element is more frequently and commonly used.

Moreover, the empirical data presented by our framework helps to identify features that may lead to semantic conflicts (see the examples provided in Sect. 4.7.2). Therefore, it supports the disambiguation of metadata. Lieske (2011), Frimannsson and Lieske (2010), Anastasiou and Morado-Vázquez (2010), as well as Mykkänen and Tuomainen (2008) observed the importance of the modularisation of standards. However, deciding how to modularise an existing standard is a challenging problem. This involves the identification of the core elements of standards and of elements that can be treated as individual modules. The Translation Automation User Society (TAUS) is currently in the process of identifying elements that constitute the XLIFF core (Choudhury 2011). Our framework can be used in this process to identify core features of the standard in a systematic manner (see Sect. 4.7.1).

4.8.2 Limitations of the Current Framework

There are several limitations of the current framework. The biggest limitation of our research is its low external validity. This and the other limitations are discussed further in the next two sub-sections.

Validity of the Research

Validity is a characteristic of an empirical study and is the basis of establishing credible conclusions. (Perry et al 2000)

According to Perry et al (2000), external validity means “that the study’s results generalise to settings outside the study.” Therefore, external validity refers to the sample being representative of the community and conditions of study in a real-life context. Generalisation refers to the scale of the data: whether it is large enough for statistical techniques to imply that it is representative of a population.

The external validity of our research is low, mainly due to the lack of representativeness of our XLIFF corpus. In other words, the corpus must be “representative” in order to deduce generalisations concerning the XLIFF standard.

Representativeness refers to the extent to which a sample includes the full range of variability in a population. (Biber 1993)

However, the issue of a lack of representativeness in corpora is not peculiar to our corpus and our research; it is a widely discussed topic in other corpus-based studies in the fields of Computational Linguistics and Natural Language Processing (NLP) (Biber 1993; Tognini-Bonelli 2001, pp.47-62; McEnery and Hardie 2011, pp.6-9).

According to Biber (1993), a) the sample size (i.e. the number of texts to be included in the corpus and the number of words per sample) and b) the range of linguistic distributions in a language are important factors that can be used to determine the representativeness of linguistic corpora. Tognini-Bonelli (2001, pp.54-58) highlights the importance of a) the authenticity of the data and b) the sampling criteria used in the selection, in addition to representativeness, when compiling linguistic corpora. In terms of our research, the criteria identified by Biber translate into a) the number of XLIFF files in the corpus and b) the range of the XLIFF feature distribution. We collected over 3,000 ‘real life’ XLIFF files, mainly using two methods (with the help of CNGL industrial partners and by crawling openly available files on the internet), which enhances their authenticity.

As reported by Biber and by Tognini-Bonelli (2001, p.57), defining the amount of data to be collected to build a representative corpus is an extremely difficult, if not almost impossible, task, which in turn makes it difficult to compile a truly representative corpus. Tognini-Bonelli (2001, p.57) points out that “at present we have no means of ensuring [representativeness], or even evaluating it objectively.” McEnery and Hardie (2011, p.10) also mention that balance, representativeness and comparability are important to attain in a corpus but are almost impossible to achieve. It became evident that this is indeed true when we started collecting XLIFF files to compile our XLIFF corpus. It should be noted that building a representative XLIFF corpus can be an extremely difficult task, especially because XLIFF files can carry confidential information and some companies are therefore not prepared or willing to share their files.

In relation to an XLIFF corpus, we acknowledge that factors such as sampling, representativeness, processing stage and the proportions of the files (including the number of files obtained from different organisations, file sizes, and the feature distribution of individual files) are important for drawing accurate conclusions.

Biber (1993) proposes an approach for constructing representative corpora, suggesting that corpus compilation should “proceed in a cyclical fashion”, starting from a “pilot corpus” representing “a broad range of variation but also representing depth”. He then proposes continuously conducting empirical research on the pilot corpus to “confirm or modify the various design parameters” of the corpus as new sample data are added to it. Accordingly, our corpus can be considered a pilot corpus.

In conclusion, our XLIFF corpus can be considered the first step towards constructing a more representative corpus. More importantly, as highlighted in our conclusions (see Chapter 6), our corpus was sufficient to illustrate the core utility of our XML-DIUE framework.

Other Limitations

In our research, only a few usage metrics have been selected and analysed. More usage analysis metrics can be introduced (e.g. the analysis of the usage of mandatory attributes). In addition, the inclusion of more fields in the database, such as file size and crawl date, would provide valuable information for analysis.

Another limitation associated with the usage metrics is that they may not provide precise results or the exact information required for addressing a certain issue or improving certain interoperability-related aspects of a standard. Rather, they generate results that provide indications or implications towards improving interoperability-related aspects. This makes it difficult to automate most of the analysis. Therefore, some analysis tasks may require human intervention to make intelligent decisions towards improving a standard.

It is also important to note that removal of duplicated content in the pre-processing stage may affect the results of some of the metrics.

The current framework mainly focuses on identifying deficits in a file-format standard through a bottom-up approach. A top-down analysis, while not covered by our framework, might be helpful in identifying future requirements for standards. In such cases, the use of a top-down approach, such as the framework proposed by Mykkänen and Tuomainen (2008), is recommended. Neither the framework proposed by Mykkänen and Tuomainen (2008) nor our own framework currently considers aspects such as the cost of implementation of standards.

4.9 Conclusions

This chapter described the methodology used for the construction of the first large XLIFF corpus for research, and XML-DIUE, our novel empirical framework for the analysis of this corpus, focusing on XLIFF standard usage in production environments and its effect on interoperability. Our research has revealed findings related to XLIFF that are important for improving the interoperability of XLIFF-based data.

Our corpus and the framework we developed support research focusing on the role of a data-exchange format standard in achieving interoperability in production scenarios. The framework does so by helping standard developers to empirically identify the most important elements, the least important elements, problematic areas and usage patterns. To the best of our knowledge, this is the first such framework described in the literature that presents standard usage metrics that can be used in a systematic bottom-up approach for identifying important criteria relating to the standardisation process. One of the significant benefits of the framework is that it can be used repeatedly as the standard evolves, to help in the standard refinement and development process.

The framework, applied here to XLIFF, an open standard file format used in the localisation domain, can also be applied to similar XML-based file formats in other domains to improve important aspects of interoperability. More information on using XML-DIUE to evaluate similar standards is given in Sect. 4.10.

4.10 Using XML-DIUE to Evaluate Other XML-Based Data-Exchange Standards

Based on the experience we gained from this research, we propose some important guidelines that are useful when adapting our framework to evaluate the usage of other, similar XML-based data-exchange standards.

4.10.1 Building a Corpus

The first step in our process involves building an authentic reference corpus of standard-based files. These can come from industry, from open-source projects, or both, but their origin should be noted in the XML-DIUE framework protocol. Industry offers the potential of insight into commercial use, but open-source software is becoming increasingly prevalent in many verticals. Thus the vertical, the availability of files and the declared study focus should ultimately define the choice of corpus.

As illustrated in our XLIFF-based study, when analysing the usage of a standard, the main focus is often on the attributes and elements in the files. Thus the size and representative variety of the files in the corpus are often more important for this concern. Measures taken to address both concerns should be reported for the study, but the corpus size should only be reported after the pre-processing stage (e.g. see Sect. 4.3.1). During the gathering of material, good ethical guidelines should also be adhered to. Whenever possible, the creators of the XML files should explicitly give their informed consent and should be made aware of their ability to withdraw at any time. They should also know whether they (and their organisations) are guaranteed anonymity and the conditions under which the data will be held.

4.10.2 Cleaning and Pre-processing of the Corpus

Depending on the origin of the files obtained for the corpus, there may need to be a pre-processing stage prior to analysis. A number of methods can be used to determine whether this is necessary: often, the expert vertical knowledge of the researcher will suggest issues, such as encoding conventions, that need to be addressed. Alternatively, or additionally, a manual sampling of the files can be used to determine other pre-processing issues. Again, the technique used, the pre-processing adopted and the resultant corpus size should be made explicit by the researcher when reporting their work.

4.10.3 Data Extraction and Repository Construction

The next step involves extracting data from the corpus for analysis. In order to persist the data and facilitate analysis (and indeed future, unforeseen analysis), the framework suggests that the data from the XML files be parsed and transferred into a repository. The schema for such a repository would be dependent on the specific data-exchange file format under study and, to a lesser extent, the type of analysis envisaged. Ideally, however, the schema should be as XML-agnostic as possible (see Fig. 16). For example, if the current analysis focuses more on the quantity of certain elements in the files, then a less structure-oriented schema would be appropriate. However, if the schema accommodates detailed structural information, then additional analysis tools can be added later to probe the structural aspects of the XML files without any additional parsing effort. The parsing method and resultant schema should be made explicit and justified.
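A minimal sketch of such a parsing and loading step is given below; it mirrors the spirit of the tags and children tables used in the XLIFF study, but the table layout, column names and file names here are illustrative assumptions rather than the exact schema of that repository.

import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("corpus.db")   # hypothetical repository file
conn.executescript("""
CREATE TABLE IF NOT EXISTS tags (tag TEXT, uri TEXT, source TEXT);
CREATE TABLE IF NOT EXISTS children (parent TEXT, child TEXT, source TEXT);
""")

def split_tag(tag):
    """Split an ElementTree tag of the form '{uri}local' into (local, uri)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}")
        return local, uri
    return tag, ""

def load_file(path, source):
    """Parse one XML file and record its elements and parent/child pairs."""
    stack = [ET.parse(path).getroot()]
    while stack:
        elem = stack.pop()
        name, uri = split_tag(elem.tag)
        conn.execute("INSERT INTO tags VALUES (?, ?, ?)", (name, uri, source))
        for child in elem:
            conn.execute("INSERT INTO children VALUES (?, ?, ?)",
                         (name, split_tag(child.tag)[0], source))
            stack.append(child)
    conn.commit()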

4.10.4 Designing the Standard Usage Analysis Metrics

Designing the usage analysis is at the core of the XML-DIUE framework. It is here that the researcher identifies the usage characteristics of interest and thus, this analysis activity should be considered in parallel with the development of the repository schema and the parsers. Research questions and the associated standard usage metrics should be identified and reported. The various stakeholders of the standard that the researcher considered during the analysis also need to be mentioned. It is also important to highlight both the advantages and disadvantages (e.g. limitations) of these metrics. It is worth noting that most of the standard usage metrics proposed in our study may be directly applicable to evaluating similar standards.

4.10.5 Analysis and Reporting

In this step, the researcher identifies sub-research questions that will help to investigate important aspects of the standard related to data interoperability among tools and technologies. The standard usage metrics identified in the previous step are used to find answers to the sub-research questions identified in this step. Where appropriate, several standard usage metrics may be utilised to answer a single sub-research question. Once the analysis is performed, the researcher should report the results. The last few steps then involve discussing the results obtained and deriving and presenting the conclusions.

Chapter 5: Development of LocConnect – an Interoperability Test-bed

5.1 Introduction

From our pilot study, we identified that XLIFF is not just being used to exchange localisation data, but that it can also be used as a storage format or as a localisation data container in an end-to-end localisation workflow (see Sect. 3.7). XLIFF has many features that make it adaptable and usable for the recycling of localisation data. In particular, the use of XLIFF as a data container brought new insight into our research. We decided to explore this aspect further, primarily to find answers to our second research question, which investigates effective approaches based on data-exchange standards for addressing interoperability issues.

5.1.1 Design and Creation

For this purpose, we used the “design and creation” research strategy (Oates 2006, p.108) and developed an interoperability test-bed for studying the interoperability requirements of tools and technologies in Service Oriented Architecture (SOA) environments.

The design and creation research strategy focuses on developing new IT products, also called artefacts. (Oates 2006, p.108)

Through this research, our contribution to knowledge is an instantiation, which is defined as “a working system that demonstrates constructs, methods, ideas, genres or theories can be implemented in a computer-based system” (Oates 2006, p.108). According to Vaishnavi and Kuechler (2004), design and creation is a problem-solving approach that involves five steps:

• awareness: review of literature to identify areas for further research;
• suggestion: a tentative proposal on how to address identified research gaps;
• development: implementation of this tentative proposal;
• evaluation: assessment of the developed artefact;
• conclusions: interpretation of the results; identification of their contribution to our knowledge; unexpected results; areas for future research.

In the following, the above steps are described in terms of our research.

5.2 Awareness

The main motivations behind choosing the above research strategy include: the IGNITE Localisation Factory, which uses XLIFF as a localisation memory in the localisation process (see Sect. 2.8.4); the concept of distributed localisation processes by Schäler (2008), as described by Lewis et al (2009), for enabling interoperability among localisation web-services using the XLIFF standard; and the proposal to use an open Application Programming Interface (API) to enable interoperability between tools and technologies (Savourel 2007; Anastasiou 2011).

5.2.1 Open APIs

Currently, software applications are increasingly moving towards distributed models such as cloud-based services or software as a service. Standards are vital for the interoperability of distributed software applications in these contexts. However, one of the major problems preventing successful interoperability between distributed applications and processes is the lack of (standardised) interfaces between them. There are some notable examples of localisation and translation-centric web services, such as those currently offered by Google, Bing and Yahoo. However, even here we run into interoperability issues, as the interfaces provided do not follow any specific standard, and connecting to these services is still very much a manual process requiring the intervention of a skilled computer programmer to set up the call to the service, to validate the data sent in terms of string length, language pairs, and so on, and then to handle the data that is returned. Some localisation Translation Management Systems

(TMS) purport to provide such flexibility, but they tend to be monolithic in their approach, using pre-defined workflows, and requiring dedicated developers to incorporate services from other vendors into these workflows through the development of bespoke APIs. What is needed is a unified approach for integrating components, so that any service can be called in any order in an automated manner. Savourel (2007) highlights the importance of looking into service oriented approaches as well as common interfaces for localisation:

Translation resources access API. As the use of Web Services and web related application grows, it becomes important to develop a way to share translation-related resources across the web, not just through file import and export, but via direct requests and exchange at the segment level or lower. In other words, we need a standard way to query TMs, MT systems and terminology databases. This would allow applications on both sides - the ones needing to query and the ones who have data to provide - to work seamlessly in a heterogeneous environment. Quite a few applications already offer online query system for MT, TM and terminology, but they are using different methods of access. Having a common interface would greatly simplify the development of implementations using such systems and create opportunities for both sides of the transactions. (Savourel 2007)

Consistent with Savourel, Anastasiou (2011) states “open APIs are important, because they provide a consistent development platform and help sharing content” in the context of localisation and web-services. McConnell (2011) also highlights the importance of a common RESTful API to integrate translation services. He presents suggestions for successful API design principles.
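The kind of common, segment-level query interface argued for above might look like the following sketch. The endpoint path, parameters and response shape are hypothetical assumptions for illustration only; no such standardised API is defined by the cited authors or in this thesis.

```python
import requests

def query_translation_resource(base_url: str, source: str,
                               src_lang: str, tgt_lang: str) -> list[dict]:
    """Query a TM/MT/terminology service for candidate translations of one segment."""
    response = requests.get(
        f"{base_url}/segments",
        params={"source": source, "srcLang": src_lang, "tgtLang": tgt_lang},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("matches", [])

# e.g. query_translation_resource("https://example.org/api", "File not found", "en", "de")
```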

5.2.2 Localisation and Web-Services

Lewis et al (2009) observe that “the localization industry has been slow to reap the benefits of modern integration technologies such as web service integration and orchestration.” To address this deficit, building on the idea proposed by Schäler (2008), Lewis et al propose a conceptual framework summarising how web services can be utilised properly in an end-to-end localisation workflow. XLIFF has been recommended for use within this framework, although details have not been presented. Moreover, an API has been presented in this study that helps to integrate different web services. Mateos (2010) presents related research, investigating the requirements of localisation and web-services standards to support SOA-based localisation scenarios.

5.2.3 Beyond Interoperability Standards

Mykkänen and Tuomainen (2008) argue that “a given standard only specifies some aspects of interoperability” and that additional standards or project-specific measures are usually needed to ensure interoperability. Lewis et al (2008) stress the importance of identifying Quality of Service (QoS) requirements, especially when dealing with web-services. This research points out that, in addition to the syntactic and semantic components of interoperability, there exists a Quality of Service (QoS) component, which describes the run-time characteristics of a system. QoS attributes include: 1) performance; 2) availability; and 3) security. These attributes also depend on various other factors, such as the topology of connections between systems and the resource-sharing models of systems. Many current solutions address QoS characteristics in a rather ad hoc manner, and attempts at formalising QoS characteristics are found to be rare (Lewis et al 2008). Moreover, Lewis et al (2008) argue that inconsistencies in workflows may also adversely affect interoperability, even when perfect semantically interoperable models are in place.

5.3 Suggestion

5.3.1 Objectives and Scope of the Research

In terms of Schäler’s (2008) and Lewis et al’s (2009) conceptual framework, and Savourel’s (2007) and Anastasiou’s (2011) proposals for an open API, details are lacking on 1) the exact role of the XLIFF standard, 2) the service orchestration aspect and 3) the functions and the functionality of the open API. In terms of Mykkänen and Tuomainen’s (2008) and Lewis et al’s (2008) arguments over project-specific parameters important for interoperability and the QoS requirements of web-services41, these criteria have not been explicitly researched in the localisation domain. Therefore, we aim to address these gaps through the design and creation of a novel service oriented architecture (SOA) based interoperability test-bed that will enable us to study the role of XLIFF and related QoS requirements in a distributed, service oriented localisation architecture. This will help us to gain a thorough understanding of using XLIFF as a data container and also to identify the limitations of XLIFF as well as other infrastructure requirements when it is used in an SOA. Moreover, it will also help to explore the role of XLIFF in the data-exchange aspect of interoperability in relation to its IEEE definition (see Sect. 1.1.1). Therefore, the sub-research question we investigate using the design and creation strategy is: “how can the XLIFF standard be used in an SOA to support interoperability in the localisation process?” Our scope is limited to the identification of effective uses of the XLIFF standard to address interoperability issues in an SOA. Research on the architectural and implementation aspects of SOA is out of scope.

41 PhD research is underway on QoS requirements of localisation web-services at the Localisation Research Centre of the University of Limerick.

5.3.2 Distributed Localisation: Ideal Scenario

With advancements in technology, the localisation process of the future could be driven by a successful integration of distributed, heterogeneous software components. In this scenario, the components are dynamically integrated and orchestrated depending on the available resources to provide the best possible solution for a given localisation project. However, such an ideal component-based interoperability scenario in localisation is still far from reality. Therefore, in this research, we aim to model this ideal scenario by implementing a series of prototypes. As an initial step, an experimental setup containing the essential components has been designed with the help of fellow CNGL researchers.

5.3.3 Experimental Setup

The experimental setup includes multiple interacting components. Firstly, a user creates a localisation project by submitting a source file and supplying some parameters through a user interface component. Next, the data captured by this component is sent to a Workflow Recommender component. The Workflow Recommender implements the appropriate business process. By analysing the source file content, the resource files and the parameters provided by the user, the Workflow Recommender offers an optimum workflow for this particular localisation project. Then, a Mapper component analyses

this workflow and picks the most suitable components to carry out the tasks specified in the workflow. These components can be web services such as Machine Translation systems, Translation Memory systems and Post-Editing systems. The Mapper will establish links with the selected components. Then a data container will be circulated among the different components according to the workflow established earlier. As this data container moves through the different components, the components modify the data. At the end of the project’s life cycle, a Converter component transforms this data container into a translated or localised file, which is returned to the user.

Service Orchestration

Service Oriented Architecture is a key technology that has been widely adopted for integrating such highly dynamic distributed components (see Sect. 2.5). Our research revealed that the incorporation of an orchestration engine is essential in order to realise a successful SOA-based solution for coordinating localisation components. Furthermore, the necessity of a common messaging format that would enable interoperability between components became evident. Thus, in order to manage the processes as well as data interoperability, we incorporated an orchestration engine into the aforementioned experimental setup. This experimental setup, along with the orchestration engine, provides an ideal framework for the investigation of interoperability issues among localisation components. This framework was later named the Service Oriented Localisation Architecture Solution (SOLAS) (Aouad et al 2011).

LocConnect

At the core of the SOLAS experimental setup are the orchestration engine and the common messaging format, which jointly provide the basis for the exploration of interoperability issues among components. Our contribution to the above experimental setup is the design and creation of the orchestration engine, which was named LocConnect in our research. The design and creation of LocConnect, which uses XLIFF as the main messaging format, enabled us to accomplish the research objectives mentioned in Sect. 5.3.1. The implementation details and the system architecture of LocConnect are presented in Appendix H.

The following section provides a typical use case for LocConnect in the aforementioned experimental setup.

Use Case

A project manager logs into the LocConnect server and creates a LocConnect project (a job) by entering some parameters. Then the project manager uploads a source file. The LocConnect server will then generate an XLIFF file and assign a unique ID to this job. Next, it will store the parameters captured through its interface in the XLIFF file and embed the uploaded file in the same XLIFF file as an internal file reference. The Workflow Recommender will then pick up the job from LocConnect, retrieve the corresponding XLIFF file and analyse it. The Workflow Recommender will generate an optimum workflow to process the XLIFF file. The workflow describes the other components that this XLIFF file has to go through and the sequence of these components. The Workflow Recommender embeds this workflow information in the XLIFF file. Once the workflow information is attached, the file will be returned to the LocConnect server. When LocConnect receives the file from the Workflow Recommender, it decodes the workflow information found in the XLIFF file and initiates the rest of the activities in the workflow. Usually, the next activity will be to send the XLIFF file to a Mapper Component which is responsible for selecting the best web services or components for processing the XLIFF file. LocConnect will establish communication with the other specified components according to the workflow and component descriptions. As such, the workflow will be enacted by the LocConnect workflow engine. Once the XLIFF file is fully processed, XLIFF content can be post-edited online using LocConnect’s built-in post-editing component. During the project’s lifecycle, the project manager can check the status of the components using LocConnect’s live project tracking interface. Finally, the project manager can download the processed XLIFF and the localised files.
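The following sketch illustrates how an individual component might take part in this use case under the PULL-based mode of operation described in Sect. 5.6.1. The set_output and set_status calls correspond to API functions mentioned later in this chapter; the other endpoint names, parameters and base URL are illustrative assumptions rather than the actual LocConnect API, which is documented in Appendix H.

```python
import time
import requests

SERVER = "http://localhost/locconnect/api"   # hypothetical base URL
COMPONENT_ID = "mt-component"                # hypothetical component identifier

def poll_and_process() -> None:
    """Fetch pending jobs, process the XLIFF data container and return it."""
    while True:
        job = requests.get(f"{SERVER}/get_job",
                           params={"component": COMPONENT_ID}, timeout=10).json()
        if job.get("id"):
            xliff = requests.get(f"{SERVER}/get_file",
                                 params={"job_id": job["id"]}, timeout=10).text
            processed = xliff   # a real component would modify the data container here
            requests.post(f"{SERVER}/set_output",
                          data={"job_id": job["id"], "content": processed}, timeout=10)
            requests.post(f"{SERVER}/set_status",
                          data={"job_id": job["id"], "status": "complete"}, timeout=10)
        time.sleep(30)          # PULL architecture: the component keeps checking for new jobs
```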

5.4 XLIFF Data Container

LocConnect uses XLIFF as its messaging format to communicate with components. The core of this architecture is the XLIFF-based data container defined in this research. Every effort has been made to abstain from custom extensions in defining this data container. Different components will access and make changes to this data container as it travels through the different components and phases of the workflow. The typical structure of the data container is given in Fig. 25.

When a new project is created in LocConnect, it will append the parameters captured via the project creation page into the metadata section of the data container (see Section 2 of Fig. 25). The metadata is stored as key-value pairs. During the workflow execution process, various components may use, append or change the metadata. The source file uploaded by the user will be stored within the XLIFF data container as an internal file reference (see Section 1 of Fig. 25). Any resource files uploaded during project creation will be stored as external references, as shown in Section 4 of Fig. 25. The resource files attached to this data container can be identified by their unique IDs and can be retrieved at any stage during the process. Furthermore, the identifier allows retrieval of the metadata associated with those resources.

After project creation, the data container generated (i.e. the XLIFF file) is sent to the Workflow Recommender component. It analyses the project metadata as well as the original file format to recommend the optimum workflow to process the given source file. If the original file is in a format other than XLIFF, the Workflow Recommender will suggest that the data container be sent to a File Format Converter component. The File Format Converter will read the original file from the above internal file reference and convert the source file into XLIFF format. The converted content will be stored in the same data container, in the body and skeleton sections. The data container with the converted file content is then reanalysed by the Workflow Recommender component in order to propose the rest of the workflow. The workflow information will be stored in Section 3 of the data container. When the LocConnect server receives the data container back from the Workflow Recommender component, it will parse the workflow description and execute the rest of the sequence. Once the entire process is completed, the converter can use the data container to build the target file.

In this architecture, a single XLIFF-based data container is used throughout the process. Different workflow phases and associated tools can be identified by standard XLIFF elements such as <phase> and <tool>. Furthermore, tools can include various statistics (e.g. in <count-group> elements) in the same XLIFF file.

The XLIFF data container based architecture resembles the Transmission Control Protocol/Internet Protocol (TCP/IP) architecture in that the data packet is routed based on its content. However, in this scenario, LocConnect plays several roles, including those of a router, a web server and a file server.

Figure 25. XLIFF-Based Data Container (Section 1: internal file reference to the uploaded source file; Section 2: project metadata as key-value pairs; Section 3: workflow information; Section 4: external references to resource files)

5.5 Evaluation

We have designed and developed an orchestration engine for localisation services and distributed localisation components, mainly to facilitate interoperability in localisation processes. LocConnect is, to the best of our knowledge, the first orchestration engine to use an open standard file format, the XML Localization Interchange File Format (XLIFF), as the messaging format in an SOA to address interoperability issues among different localisation tools.

The proposed data container does not use any proprietary extensions to store the additional information required for service orchestration, such as workflow information and metadata. Instead, it uses specific XLIFF elements, chiefly internal file references, to store these data and expects component technologies to interpret the data found in these elements. In the current data container, this additional information is stored in a custom XML format agreed by the components. The main advantage of this approach is that it does not interfere with the representation of the core localisation data and metadata: the body of the XLIFF content is kept completely intact.

Moreover, the analysis of the derived XLIFF patterns for individual organisations from our XLIFF corpus shows that the elements used for this purpose are not in common use (see Sect. 4.6.8). Therefore, these elements are well suited to representing the additional information required for service orchestration, as their use minimises the possibility of interference by components. It can also be seen that the patterns identified above (see Sect. 4.6.8 and Appendix G) are fully compatible with the proposed data container, and can therefore be readily represented in it without requiring any transformation.

However, two main limitations can be identified in this approach: first, the use of XML content within internal file reference elements invalidates the XLIFF content. Having identified this issue, we propose an alternative mechanism for storing these data in Sect. 5.6.2. Second, this approach requires components to agree on additional semantics; for example, the components should be able to identify and interpret information contained within the embedded XML.
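Such invalidation can be detected mechanically by validating a data-container instance against the XLIFF 1.2 schema, for example with lxml as sketched below. The schema file name assumes the strict XSD published by OASIS has been downloaded locally; the snippet is illustrative only.

```python
from lxml import etree

def is_valid_xliff(xliff_path: str,
                   xsd_path: str = "xliff-core-1.2-strict.xsd") -> bool:
    """Validate an XLIFF data container against the XLIFF 1.2 strict schema."""
    schema = etree.XMLSchema(etree.parse(xsd_path))
    document = etree.parse(xliff_path)
    if schema.validate(document):
        return True
    for error in schema.error_log:   # e.g. unexpected XML content inside an internal file reference
        print(f"line {error.line}: {error.message}")
    return False
```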

Instead of XLIFF, alternative messaging formats could be derived from bitext formats such as PO, TMX, TTX, SDLXLIFF and bilingual RTF. However, XLIFF is superior to all of these formats for the reasons given in Sect. 1.3.1; for example, of these formats only PO and TMX are open standards, and they lack the capability to store additional metadata. On the other hand, the use of a more generic messaging format such as SOAP42 may be superfluous and may lead to loss of data, as the data contained within XLIFF would need to be transformed into such a format. Moreover, the container may become cumbersome and complex as it travels between components, and it may impose additional overhead on components, which would have to process the redundant wrapper in order to extract the localisation data. As such, it can reasonably be expected that using XLIFF directly as the messaging format in an SOA simplifies and improves the efficiency of localisation data exchange between component technologies.

5.6 Conclusions and Future Work

The successful implementation of the LocConnect interoperability test-bed suggests the suitability of XLIFF as a full project life-cycle data container that can be used to achieve interoperability in localisation processes. It also demonstrates a novel use case for XLIFF: as a messaging format in an SOA environment.

42 http://www.w3.org/TR/soap/ [accessed 30 Apr 2013]

Moreover, the framework has revealed various additional metadata and related infrastructure services required for linking distributed localisation tools and services (see Sect. 5.6.1), which will be especially useful when developing a commercial-grade orchestration engine.

The design and creation of the LocConnect interoperability test-bed has helped us to achieve the objectives set at the beginning of this study (see Sect. 5.3.1). It has helped us to understand the exact role of XLIFF as the messaging format in an SOA, and thus complements the conceptual localisation interoperability framework proposed by Lewis et al (2009). In addition, the development of LocConnect has also revealed the detailed syntactical requirements of an open API; the details of the API can be consulted in Appendix H.1 and Appendix H.3. It thus addresses the gaps in existing proposals (Anastasiou 2011; Savourel 2007) and provides guidance towards developing a comprehensive and complete API. LocConnect also addresses the data exchange aspect of the IEEE definition (see Sect. 1.1.1), while the data usage aspect of this definition is covered by our XML-DIUE framework presented in the previous chapter (see Chapter 4).

LocConnect can not only provide web service and component integration, but can also act as a localisation data repository and as a revision control system for localisation data, akin to a Concurrent Versions System (CVS). Features of LocConnect are presented in Appendix H.2. To the best of our knowledge, LocConnect is the only orchestration engine reported to use XLIFF as a messaging format to enable data-exchange interoperability among distributed localisation components; no similar commercial applications have been found to date. It has also been immensely helpful in identifying prominent issues that need to be addressed when developing a commercial application (see Sect. 5.6.1).

It is worth noting that LocConnect has been declared a University of Limerick invention disclosure under case number 2006164. Originally developed entirely by the researcher, LocConnect was later taken over by a dedicated development team attached to the LRC for further development. Since its development, LocConnect has been demonstrated at various national and international events (see Appendix I for details). Details on LocConnect and the SOLAS platform can be found in Wasala et al (2011a; 2011b) and Aouad et al (2011).

5.6.1 Limitations and Future Work

Whilst the prototype provides a test bed for the exploration of interoperability issues among localisation tools, it has a number of limitations.

Metadata Representation

In the present architecture, metadata is being stored as attribute-value pairs within an internal file reference of the XLIFF data container (see Section 2 of Fig. 25). However, according to the current XLIFF specification (XLIFF-TC 2008b), XML elements cannot be included within an internal file reference. Doing so would result in an invalid XLIFF file. While this could be interpreted as a limitation of the XLIFF standard itself, the current metadata representation mechanism also presents several problems. The metadata is exposed to all the components. Yet there might be situations where metadata should only be exposed to certain components. Therefore, some security and visibility mechanisms have to be implemented for the metadata. Moreover, there may be situations where components need to be granted specific permissions to access metadata, e.g. read or write. These problems can be overcome by separating the metadata from the XLIFF data container. That is, the metadata has to be stored in a separate datastore (as in the case of resource files). Then, specific API functions can be implemented to manipulate metadata (e.g. add, delete, modify, retrieve) by different components. This provides a secure mechanism to manage metadata.
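A minimal sketch of this proposal follows, assuming a relational metadata store and per-component visibility rules. The table layout, function names and permission model are illustrative assumptions, not part of the current LocConnect API.

```python
import sqlite3

def init_store(db: sqlite3.Connection) -> None:
    """Create the separate metadata datastore, decoupled from the XLIFF data container."""
    db.execute("""CREATE TABLE IF NOT EXISTS metadata(
        job_id TEXT, key TEXT, value TEXT, visible_to TEXT)""")

def add_metadata(db: sqlite3.Connection, job_id: str, key: str,
                 value: str, visible_to: str = "*") -> None:
    # visible_to = '*' exposes the entry to all components;
    # otherwise only the named component may read it.
    db.execute("INSERT INTO metadata VALUES (?, ?, ?, ?)",
               (job_id, key, value, visible_to))
    db.commit()

def get_metadata(db: sqlite3.Connection, job_id: str, component: str) -> dict:
    rows = db.execute(
        "SELECT key, value FROM metadata WHERE job_id = ? AND visible_to IN ('*', ?)",
        (job_id, component)).fetchall()
    return dict(rows)
```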

The Resource Description Framework (RDF) is a framework for describing metadata (Anastasiou 2011). Therefore, it is worthwhile exploring the possibility of representing metadata using RDF. For example, API functions could be implemented to return the metadata required by a component in RDF syntax.
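For instance, an API function could serialise a job's metadata as RDF (here in Turtle syntax) using rdflib; the namespace URI and property names below are invented purely for illustration.

```python
from rdflib import Graph, Literal, Namespace, URIRef

def metadata_as_rdf(job_id: str, metadata: dict) -> str:
    """Return the key-value metadata of a job as an RDF graph in Turtle syntax."""
    LOC = Namespace("http://example.org/locconnect/terms#")   # hypothetical vocabulary
    graph = Graph()
    graph.bind("loc", LOC)
    job = URIRef(f"http://example.org/locconnect/jobs/{job_id}")
    for key, value in metadata.items():
        graph.add((job, LOC[key], Literal(value)))
    return graph.serialize(format="turtle")

# e.g. metadata_as_rdf("42", {"sourceLanguage": "en-US", "targetLanguage": "de-DE"})
```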

Binary vs. Textual Data

The current resource datastore is only capable of storing textual data. It could therefore be enhanced to store binary data too. This would enable the storage of various file formats, including Windows executable files, DLL files, video files and images. Once the resource datastore is improved to store binary data, the original file can be stored in the resource datastore and a reference to this resource can be included in the XLIFF file as an external file reference (see Section 1 of Fig. 26).

API Functions

The current API lacks several important functions. Functions should be implemented for deleting projects (and associated XLIFF files), modifying projects, deleting resource files, modifying metadata associated with resource files, and so on. The current API calls set_output and set_status could be merged (i.e. a component sending its output would automatically set its status to 'complete'). Furthermore, a mechanism could be implemented for granting proper permissions to components for using the above functions.

User Management

User management is a significant aspect that received little attention in the initial test bed. User roles could be designed and implemented so that users with different privileges can assign different permissions to components and to the different activities managed through the LocConnect server. In this way, data security could be achieved to a certain extent. Furthermore, an API key should be introduced for the validation of components as another security measure; components would then have to specify the key whenever they use LocConnect API functions in order to access LocConnect data.

Component Management

In the present architecture, the information about components has to be registered manually with the LocConnect server using its administrator interface. However, the architecture should be improved to discover and register ad-hoc components automatically. Moreover, mechanisms have to be implemented for the automatic detection of component failures. If the LocConnect server could detect communication failures, it could then select substitute components (instead of failed components) to enact a workflow. An architecture similar to Internet Protocol routing could be implemented with the help of a Mapper component: for example, whenever the LocConnect server detects a component failure, the data container could automatically be re-routed to another component that can undertake the same task, so that the failure of one component does not affect the rest of the workflow.

Data Security

The XLIFF data container could contain sensitive data (i.e. source content, translations or metadata) which some components should not be able to access. A mechanism could be implemented to secure the content and to grant permissions to components so that they would only be able to access the relevant data in the XLIFF data container. There are three potential solutions to this problem. The first would be to let the Workflow Recommender (or the Mapper) select only secure and reliable components. The second would be to encrypt content within the XLIFF data container. The third would be to implement API functions that provide access to specific parts of the XLIFF data container. However, the latter mechanism would obviously increase the complexity of the overall communication process due to frequent API calls to the LocConnect server.
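The second option, encrypting sensitive content before it is placed in the data container, could be realised with standard symmetric encryption; a sketch using the Python cryptography library is shown below. Key distribution to trusted components is not addressed here and the snippet is illustrative only.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # would be distributed only to trusted components
cipher = Fernet(key)

# Encrypt a sensitive segment before embedding it in the data container.
ciphertext = cipher.encrypt("Confidential source segment".encode("utf-8"))

# A trusted component holding the key can recover the original content.
plaintext = cipher.decrypt(ciphertext).decode("utf-8")
assert plaintext == "Confidential source segment"
```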

Moreover, LocConnect implements REST-based services for communication with external components; it is therefore essential to implement our own security measures in the REST-based API. Since no security measures are implemented in the current LocConnect API, well-established security measures such as XML Encryption and API keys would need to be implemented in the API as well as in the data transmission channel (for example, the use of Secure Sockets Layer (SSL) tunnels for REST calls).

Workflow Engine

While the current workflow engine provides essential process management operations, it currently lacks more complex features such as parallel processes and branching. Therefore, incorporation of a fully-fledged workflow engine into the LocConnect server is desirable. Ideally, the workflow engine should support standard workflow description languages such as Business Process Execution Language (BPEL) or Yet Another Workflow Language (YAWL). This would allow the LocConnect server to be easily connected to an existing business process, i.e. localisation could be included as part of an existing workflow. In the current system, the workflow data is included as an internal file reference in the XLIFF data container (see Section 3 of Fig. 25) which invalidates the XLIFF file due to the use of XML elements inside the internal file reference. In future versions, this problem can be easily addressed by simply storing the generated workflow as a separate resource file (e.g. using BPEL) and providing the link to the resource file in the XLIFF data container as an external file reference.

PUSH vs. PULL Architecture

Currently, the LocConnect server implements a 'PULL'-based architecture in which components have to initiate the data communication process. For example, components must keep checking the LocConnect server for new jobs and fetch jobs from the server. Implementing both 'PUSH'- and 'PULL'-based architectures would very likely yield more benefits. Such an architecture would help to minimise communication overhead as well as resource consumption (for example, the LocConnect server could push a job to a component whenever one becomes available, rather than the component continuously checking the server for jobs). It would also help to establish the availability of components prior to assigning a job, and help the LocConnect server to detect component failures.
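In a complementary PUSH mode, the LocConnect server would notify a component as soon as a job becomes available, for example by posting to a callback URL registered by that component; the callback contract sketched here is hypothetical.

```python
import requests

def push_job_to_component(callback_url: str, job_id: str, xliff_content: str) -> bool:
    """Notify a registered component that a job is ready, pushing the data container to it."""
    try:
        response = requests.post(callback_url,
                                 json={"job_id": job_id, "xliff": xliff_content},
                                 timeout=10)
        return response.ok
    except requests.RequestException:
        return False   # a failed push also signals that the component may be unavailable
```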

Performance Evaluation

Because the XLIFF standard was originally defined as a localisation data-exchange format, it has so far not been thoroughly assessed with regard to its suitability as a localisation data storage format or as a data container. A systematic evaluation has to be performed on the use of XLIFF as a data container in the context of a full localisation project life cycle, as facilitated by our prototype. For example, during its traversal, an XLIFF-based data container could become cumbersome, causing performance difficulties. Different approaches to addressing likely performance issues could be explored, such as data container compression, support for parallel processing, or the use of multiple XLIFF-based data containers transmitted in a single compressed container. The implications of such strategies would have to be evaluated, such as the need to equip the components with a module to extract and compress the data container.
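One of the mitigation strategies mentioned above, transmitting one or more XLIFF-based data containers inside a single compressed archive, is straightforward to sketch with the Python standard library; the file names are illustrative.

```python
import zipfile

def pack_containers(archive_path: str, container_paths: list[str]) -> None:
    """Compress one or more XLIFF data containers into a single archive for transmission."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in container_paths:
            zf.write(path)

def unpack_containers(archive_path: str, target_dir: str) -> None:
    """Extract the data containers on the receiving component's side."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)

# e.g. pack_containers("project_42.zip", ["job_42.xlf", "resources_42.xlf"])
```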

5.6.2 Proposed Improvements to the XLIFF-based Data Container and New Architecture

By addressing the issues related to the above XLIFF-based data container, a fully XLIFF-compliant data container could be developed in order to evaluate its effect on improving interoperability. A sample of such a data container is shown in Fig. 26.

This data container differs from the current data container (see Fig. 25) in the following aspects. The new container:

• does not represent additional metadata (i.e. metadata other than that defined in the XLIFF specification) within the data container itself. Instead, this metadata will be stored in a separate metadata store that can be accessed via corresponding API functions;
• does not represent workflow metadata as an internal file reference. Instead, the workflow metadata will be stored separately in the resource datastore. A link to this workflow will then be included in the XLIFF data container as an external file reference (see Section 2 of Fig. 26);
• does not store the original file as an internal file reference. It will also be stored separately in the resource datastore. An external file reference will be included in the XLIFF file as shown in Section 1 of Fig. 26.

The new data container does not use any extensions to store additional metadata or data, nor does it use XML syntax within internal-file elements. Thus, the above architecture would provide a fully XLIFF-compliant (i.e. compatible with the XLIFF strict schema) interoperability architecture. Due to the separation of the original file content, workflow information and metadata from the XLIFF data container, the container itself becomes lightweight and easy to manipulate. The development of a file format converter component based on this data container would also be uncomplicated.

Figure 26. Improved Data Container (Section 1: external reference to the original file; Section 2: external reference to the workflow; Section 3: TMX)

Chapter 6: Conclusions and Future Work

This chapter restates our research questions posed at the beginning of the study, summarises the research objectives and explains how they have been realised by presenting a concrete research methodology. The research yielded a number of important contributions to both the localisation domain and the general computer standardisation community. These contributions and their impacts are summarised below. Finally, the limitations of the research are discussed and suggestions for future research are proposed.

6.1 Research Objectives

The purpose of our research was to explore interoperability issues related to localisation data-exchange standards. We identified that the lack of tool compliance with standards and the limitations of the standards lead to interoperability problems, resulting in a significant cost to the localisation industry. There is very little academic literature in this area. The existing limited research approaches are episodic in nature and do not attempt systematic empirical analysis towards addressing interoperability issues related to prominent localisation data-exchange standards. We fill this gap in our research.

Our research sought answers to these two research questions:

RQ1: What are the limitations of existing localisation data-exchange standards and implementations?

RQ2: How can these limitations be overcome to adequately address significant aspects of interoperability in localisation?

These research questions are closely related to each other. Answers to the first research question essentially address some aspects of the second research question. In this research, our main focus was on finding answers to the first research question. To find answers to the second research question, we investigated how data-exchange standards can effectively be used to enable successful interoperability among tools and technologies, in addition to addressing the identified limitations of standards and implementations.

As stated in Chapter 1 (Sect. 1.2), the primary objective of our research was to improve semantic and syntactic interoperability aspects of prominent localisation data-exchange standards and propose strategies leading to a higher degree of standard-compliance among localisation tools. In achieving this primary objective, we also aimed to meet the following objectives:

1) to identify the degree of prescriptiveness and the degree of coverage of a standard by implementations;
2) to study whether a standard is sufficiently detailed and well enough defined to guarantee interoperability at both semantic and syntactic levels;
3) to identify optimal, standard-based solutions for facilitating data interoperability among tools.

6.2 Conclusions and Recommendations

The conclusions of each of the individual research strategies have been presented at the end of the respective chapters (see Sect. 3.7, Sect. 4.9 and Sect. 5.6). In this section, we present conclusions and recommendations drawn from the overall research and also discuss how we have achieved the research objectives presented in the previous section (Sect. 6.1). In Sect. 6.2.1, we present the main conclusion of the thesis, and in Sect. 6.2.2 we present other substantive conclusions.

6.2.1 Main Conclusion: The Utility of the XML-DIUE Framework

Using the XML-DIUE framework proposed in our research, we illustrated in Sect. 4.6.3 how levels of tool support for XLIFF features can be identified. In Sect. 4.7.2 and Sect. 4.7.7, we discussed how to identify features leading to semantic and syntactic interoperability issues. These results provide strong evidence that we have achieved the first two research objectives mentioned in the previous section (Sect. 6.1).

In relation to our main research question (RQ1), using the XML-DIUE framework we could identify not only the limitations of the XLIFF standard, but also the important (and less important) features of the standard for data-exchange interoperability. Therefore, we have not only achieved our objectives, but exceeded them. Results such as the core useful features of the standard (Sect. 4.7.1), how to simplify the standard (Sect. 4.7.4), feature usage patterns (Sect. 4.7.8) and the custom extensions used (Sect. 4.7.5) provide evidence for this.

As discussed in Sect. 4.8.2, the biggest threat to our research is its low external validity. The nature of our XLIFF corpus makes it difficult for us to generalise the results. However, our XLIFF corpus was significant enough to illustrate the practical utility of our XML-DIUE framework. Therefore, as the conclusion of our thesis, we highlight the practical utility of our XML-DIUE framework, to assist in discovering some of the limitations of XML-based data-exchange standards and their implementations, and to identify other important factors related to features of the standard that are useful for improving data-exchange interoperability. The XML-DIUE empirical framework can be used as a tool to evaluate data-exchange standard usage and thus inform on the development and maintenance of standards.

The empirical results generated by the framework seem useful for identifying important criteria for standard development, such as the most commonly used features, the least frequently used features and usage patterns. The findings will also be helpful in identifying compliance issues associated with implementations supporting the standard under study. The framework can also be used for standard development and as a quick reference method to check the status of the actual implementation of elements, attributes, attribute values, and so on. Although valuable resources such as validators and test suites have been reported for other XML document exchange standards (e.g. by the W3C), such resources are limited in the localisation domain. Our framework fills this gap. The framework will be helpful for the development of interoperability test suites, for example by identifying common validation errors (see Sect. 4.7.7). Thus, we believe that this framework will ultimately contribute to improved interoperability among tools and technologies in many verticals.

More importantly, prior to the development of our framework, the XLIFF Technical Committee (TC) had neither access to a large body of data reflecting the actual usage of the standard nor a systematic methodology to evaluate and analyse the usage of the standard. Our research, for the first time, gathered sufficient data and developed the corresponding framework to allow for a systematic analysis of the usage of the standard. Using the current framework, the XLIFF TC can now analyse the actual use of the specification and focus on the aspects of the standard that are obviously more important or less important to the actual stakeholders of the standard.

6.2.2 Other Conclusions and Recommendations

In the previous section, we presented the main conclusion of the thesis. We have achieved our main objective (see Sect. 6.1) by developing and illustrating the utility of the XML-DIUE framework. In this section, we present several other important conclusions derived from our research.

Importance of Systematic Comparisons of Standards

To gain a deeper understanding of the features of a standard, together with their advantages and disadvantages, conducting a comprehensive, systematic comparison with similar standards and developing mappings among them is more appropriate and beneficial than merely reviewing the literature about the standard. This conclusion is supported by both our research (see Chapter 3) and the research by Frimannsson (2005).

Another significant benefit of this methodology is that such comparisons will also help to identify interoperability issues among similar standards (i.e. inter-standard interoperability issues). However, the comparison of standards and the development of such mappings are not trivial and can be tedious.

The methodology used in the pilot study may be applicable to similar format comparisons, not only in the localisation domain, but in other domains too. Where several similar standards exist for the same purpose, the most interoperable standard can be determined by conducting a pilot study similar to the one carried out in this research. In particular, the development of the mappings helps to identify incompatibilities between similar standards and also helps to identify the unique features of the standards. Once the most interoperable standard has been selected using a methodology similar to the one we used in our pilot study, that standard can be further analysed using the XML-DIUE framework to discover any further limitations of the standard and its implementations (i.e. to identify intra-standard interoperability issues).

XLIFF as a Messaging Format

Among the objectives of our research was to find optimal ways of using the standard to enable interoperability. From our pilot study, we understood that XLIFF is being used as a data container in localisation workflows. We investigated this aspect in greater detail in our design and creation experiment (see Chapter 5). From the design and creation of LocConnect, we demonstrated a novel use case for XLIFF: how XLIFF can be used as a messaging format in combination with an open API to enable data- exchange interoperability among distributed localisation components. The research revealed that XLIFF can be used as a messaging format even without using any custom extensions.

The design and creation of LocConnect was also helpful in identifying the detailed requirements of the API and the importance of addressing various QoS requirements (such as security, availability, reliability and scalability) related to components and communication channels. We also highlight three important aspects, apparent from our research, related to distributed, component-based localisation processes:

• the importance of a component simulator in component development and debugging;
• workflow visualisation, real-time service status updates, and component error reporting during the workflow execution phase;
• the requirement for a properly documented API; like McConnell (2011), we also advise keeping the API as simple as possible.

The LocConnect interoperability test-bed has been successful in identifying interoperability requirements in an SOA based localisation process. In addition, it has helped us to gain hands-on experience in implementing the XLIFF standard.

Importance of Addressing Semantic Interoperability Issues

Especially in the future, it will be important for data-exchange formats to cater for the fully automated machine processing of data (rather than only for human consumption). However, due to several factors, it is difficult for machines to automatically process localisation data encoded using the current version of XLIFF (v1.2). As a result, various interoperability issues may occur when attempting automated processing of XLIFF data in workflows. These factors are described below.

1. Custom extensions

Custom extensions added by one party may not be interpreted by other parties.

2. User-defined attribute values

The XLIFF specification allows user-defined values to be used in certain attributes using the 'x-' prefix. These user-defined values may only be processable by the tool that originally introduced them.

Two example attributes with some of their user-defined values obtained from our results are given below (see Appendix E for custom values defined for other attributes).

context-type: x-application, x-attribute-name, x-authorization-scheme
datatype: x-application/json, x-application/x-gettext, x-application/xml+dtd, x-icu-resource-bundle

The lack of definitions and semantic agreements of these user-defined values hinders tools’ interoperability and makes it difficult to automatically process them.

3. Other features that allow user-defined values

These features are different from the previously discussed user-defined attribute values, as they do not use special identifiers such as 'x-' to denote that they are user-defined. However, these features allow custom values. Usually, these are the features used for carrying metadata. These features allow any textual value (i.e. any string) to be used with them. For example, the XLIFF attribute category is intended to be used to denote the subject of the translations in an XLIFF file. Any text value is permitted to be used for this attribute. The values found in our corpus for this attribute were: String, Enterprise Software, resourceBundle, Survey Project.

However, in the absence of a proper agreement among tools on these values, interoperability is diminished and the data cannot be processed automatically and optimally by different tools due to the lack of semantics.
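Both kinds of user-defined values discussed above can be surfaced automatically from the parsed repository described in Chapter 4. A sketch, assuming the illustrative attribute_usage table introduced there:

```python
import sqlite3
from collections import Counter

def user_defined_values(db_path: str = "xliff_repo.db",
                        free_text_attribute: str = "category") -> tuple[Counter, Counter]:
    """Collect 'x-' prefixed attribute values and the distinct values of a free-text attribute."""
    con = sqlite3.connect(db_path)
    # Values using the 'x-' prefix reserved for user-defined values.
    prefixed = Counter(con.execute(
        "SELECT attribute, value FROM attribute_usage WHERE value LIKE 'x-%'").fetchall())
    # Distinct free-text values of an attribute that permits any string.
    free_text = Counter(con.execute(
        "SELECT value FROM attribute_usage WHERE attribute = ?",
        (free_text_attribute,)).fetchall())
    con.close()
    return prefixed, free_text
```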

Identification and addressing of these issues

Identification of these features is the first step towards addressing them. Our framework can be used for this purpose. In Sect. 4.7.5, we discussed how our framework can be used to identify custom extensions. In Sect. 4.7.2, we discussed how to identify the features leading to semantic interoperability issues such as data-level conflicts and schema-level conflicts (Park and Ram 2004) (see Sect. 2.3.2 for more details on semantic interoperability). To solve such semantic interoperability issues, we propose to adapt a methodology like the one proposed by Berges et al (2010) for addressing similar issues in the healthcare domain (see Sect. 2.4.1). For example, tool-specific metadata items are mapped to a central ontology that can be used to define the semantics of proprietary data.

Future Trends

Future trends need to be taken into account when developing standards. We highlighted the importance of representing machine-processable data in the previous section. In addition to this, as we discussed in Sect. 1.1.2 and Sect. 5.2, localisation processes and standards need to adjust to current and future technological trends such as SOA, distributed computing, cloud computing, personalisation and mobile computing. The evolving nature of technologies affects the standardisation process (Ray 2009). Over time, some features of a standard may become outdated. New features need to be created to address new requirements; failing this, interoperability issues may arise. For example, Heiler (1995) illustrates how semantic interoperability issues can arise from using old features of a standard for new purposes. Moreover, features of a standard that are never used only increase the implementation complexity of the standard. Therefore, the standard needs to be constantly reviewed while future requirements are being captured and addressed.

Our research addresses some of the above aspects: first, by providing the XML-DIUE framework (see Chapter 4) to review the current usage of the standard (e.g. to identify never-used XLIFF features); secondly, by developing the first SOA-based interoperability test-bed (see Chapter 5), which can be used to explore future standardisation requirements; and also by proposing a zipped data container for XLIFF for efficient bandwidth and storage utilisation.

Importance of a Reference Implementation of the Standard

Results of our research (see Sect. 4.7.6) show that there is a lack of open-source software that implements XLIFF. According to Tiemann’s (2006) definitions of different levels of open standards, XLIFF can be categorised as “Open Standard Level 1” (see Sect. 2.6.2), as there are open-source software packages such as the Okapi Rainbow tools, Virtaal and OmegaT which can interoperate with the XLIFF standard. However, we highlight the importance of achieving at least “Open Standard Level 2”. While Open Standard Level 1 guarantees the existence of an open-source implementation of the standard, Open Standard Level 2 goes one step beyond this by providing a dedicated open-source reference implementation of the standard. To further support our recommendation to achieve Open Standard Level 2 in XLIFF, we quote Tiemann’s definition below:

Open Standards Principle 2 aims beyond merely giving users a guaranteed exit strategy: it provides the ability to review the actual workings of the standard in an implementation. While there clearly is a distinction between a standard definition and a reference implementation, a reference implementation can be extremely valuable in identifying gaps or hidden assumptions that may underlie a standard definition. The availability of reference implementations of both web servers and web clients proved indispensable in corroborating or impeaching claims of competing proprietary vendors in reference to HTTP and HTML standards, for example. The IETF’s Best Practices paper ... generalizes this principle, arguing that standards without implementations are not trustworthy, and implementations that cannot be examined cannot be verified. (Tiemann 2006)

As pointed out by Tiemann, among the benefits of having a reference implementation of a standard is the ability to identify limitations of the standards including underlying gaps and hidden assumptions. In addition, we also highlight the importance of having a reference implementation to carry out interoperability tests among tools. The existence of a fully-fledged reference implementation significantly simplifies the interoperability testing process by eliminating the need to cross-check all possible combinations of tools (Shah and Kesan 2008; 2009). Moreover, the existence of a reference implementation may also help to understand the processing requirements of the features of the standard more precisely.

In this context, it is important to note that one of the contributions of our research is LocConnect, which is also an open-source reference implementation of the XLIFF standard (see Sect. 6.3).

Importance of an Object Model

From our pilot study, we realised the importance of having an Object Model (OM) (e.g. similar to the OM found in the LCX format) that describes how to implement and manipulate the standard programmatically. A properly documented OM helps to achieve consistency of implementations and thereby helps to address non-orthogonal issues of representation like the ones discussed in our literature review (see Sect. 2.8.2). Moreover, an OM may also help to clarify any ambiguous processing requirements of features (especially for tool developers) and simplify the standard implementation process. Therefore, we highlight the importance of an Object Model for XLIFF, as it will contribute towards improving standard conformance in tools.

Defining “Support”

The term “support” is a widely used term related to XLIFF implementations (tools). A few examples are given below:

• the tool supports XLIFF;
• the tool supports feature X (e.g. the tool supports in-line mark-up elements);
• the tool partially supports feature X.

As discussed in detail in Sect. 2.8.2, the lack of a proper definition for this term leads to much confusion. It can also be seen that the term “support” has been used by tool developers merely as a marketing slogan. Therefore, we recommend that one of the first steps towards addressing interoperability is the proper definition of the term “support” in these contexts.

6.3 Research Contributions

This research offers several contributions. Our most substantial contribution is the proposed XML-DIUE framework, which can be used to evaluate the usage of localisation data-exchange standards. The XML-DIUE framework provides the following key benefits:

XML-DIUE helps to identify:

• limitations of standards and implementations;
• important features of standards for successful data-exchange interoperability.

In addition, our research also provides several contributions at different levels to different stakeholders. These contributions are presented below.

Contribution of knowledge to the XLIFF TC:

• The XLIFF symposium is the most relevant forum in which to present our research results. The findings of our research were presented to the XLIFF community on two main occasions: the first XLIFF symposium (Wasala et al 2010) and the third XLIFF symposium (Wasala et al 2012). Our research was well received by the XLIFF community on both occasions. Results of our research have a direct impact on the XLIFF standard; these are further described in Sect. 6.4.
• Through the design and creation of the LocConnect interoperability test-bed (see Chapter 5), we presented and proved a novel use case for XLIFF: the use of XLIFF as the main messaging format (i.e. the payload) in an SOA to enable interoperability among distributed localisation components. LocConnect itself is a substantial contribution, not only as an open-source reference implementation of XLIFF, but also as a research platform for studying the interoperability requirements of distributed localisation components.
• The researcher joined the XLIFF TC in early 2012 and has been serving as a voting member contributing to the XLIFF standardisation work.
• The knowledge gained from the current PhD research on localisation and related standards has helped the researcher to apply the concepts learned to a number of other interesting localisation research projects. These projects were especially related to the real-time localisation of websites and desktop software using micro-crowdsourcing. The XLIFF standard has been implemented in various prototypes developed as part of these projects.

Contribution of knowledge to Microsoft:

• The pilot study revealed the advantages, disadvantages and unique features of both the LCX and XLIFF file formats. The proposed schema mapping between these formats and the developed prototype converter will be a valuable contribution to Microsoft. At the time of writing this thesis, Microsoft is in the process of adopting XLIFF in its localisation process.

Contribution of knowledge to tool developers and standard consumers:

• Tool developers and standard consumers will also benefit from our XML-DIUE framework. The XLIFF corpus analysis yields a number of important results for both of these groups; for example, it identifies important features for cross-organisation data interoperability and features that are likely to be deprecated in future (see Sect. 4.4 for details).

Contribution of knowledge to the localisation domain and other fields in general through publications:

• During different stages of our research, we published and presented several research papers in various peer-reviewed journals and conferences. First, we presented our pilot study (details provided in Chapter 3) at the first XLIFF symposium (Wasala et al 2010). Later, a journal paper was published with more details on the same study (Wasala et al 2012b). Then, two journal papers were published detailing LocConnect, our design and creation experiment (Wasala et al 2011a; Wasala et al 2011b). We also contributed to a book chapter (Aouad et al 2011) discussing the role of LocConnect in the SOLAS platform. Subsequently, LocConnect has been presented and demonstrated at almost all CNGL events held since it was developed, along with being demonstrated at international conferences such as FEISGILTT 2012 and AGIS 2010 (Filip et al 2012; O’Conchuir 2010). In 2012, the University of Limerick declared LocConnect an Invention Disclosure (Wasala and Schäler 2012). In addition to peer-reviewed papers, our pilot study and LocConnect have been presented at various national and international events through posters (see the full publication list in Appendix I for details). A video on the SOLAS platform, including the LocConnect orchestration engine, has been published by the Localisation Research Centre, and can be accessed at: http://www.youtube.com/watch?v=iU3bp6yPojQ&t=47m15s. Next, the findings of the XLIFF corpus analysis experiment were presented in detail at the third XLIFF symposium (Wasala et al 2012a). Finally, a journal paper on our XML-DIUE framework is currently in its final stages of preparation; we hope to publish it in a high-impact journal related to computer standards.
• As previously mentioned, the knowledge gained from this research could be applied in several other projects. It is worth noting that a number of academic publications have also resulted from these projects, including a book chapter contribution and proceedings of various conferences (see Appendix I for more information on our publications).

6.3.1 Resources Produced by this Research

This section presents important resources which have emerged from our research.

Resources Produced from the Pilot Study

Several sample XLIFF and LCX files were generated as a part of our pilot study. The XLIFF and LCX files were generated from the same source files, resulting in several parallel XLIFF and LCX files that will be useful in future comparative studies. Moreover, the XSLT mappings developed to convert between the XLIFF and LCX formats will serve as an XLIFF filter for the LCX format. The prototype converter developed as a part of this research will serve as a first step towards implementing a fully-fledged converter.

Resources Derived from the XML-DIUE Framework

One of the substantial resources generated by this research is the XLIFF corpus constructed as a part of our main experiment. The XLIFF corpus includes 3179 files. The files were mainly collected through contributions from several CNGL industrial partners and by crawling openly available XLIFF files on the web (see Sect. 4.3.1 for details on the XLIFF corpus construction). This is the first XLIFF corpus ever reported in the literature. While access to the industrial contributions for future research is subject to the conditions imposed by the contributors (i.e. anonymity and non-disclosure), a significant portion of the corpus is made up of open-source files (2108 files); hence that portion can be readily published43.

43 At the time of writing this thesis, a decision has not been taken on publicly releasing the XLIFF corpus produced by this research.

It is also worth noting that the XLIFF corpus will be a significant asset, as it represents a large Translation Memory (TM) on which other research could be carried out (for example, research related to TM leveraging and segmentation issues).

Significant resources in addition to the corpus emerged from this research, including the XLIFF database derived from the corpus (i.e. the parsed XLIFF repository containing about 1 GB of data), the technologies developed to crawl, pre-process, bulk-validate and load the files into the database, and a solid set of data analysis criteria. These resources can be used in the future to evaluate and improve the XLIFF standard further, while the technologies developed can be used for the evaluation of similar standards.
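To illustrate the kind of bulk-validation tooling referred to above, the following sketch shows how a directory of XLIFF files could be validated against an XLIFF schema; it is a minimal illustration under assumed file and schema paths, not the actual scripts produced by this research.

# Minimal sketch: bulk-validate a directory of XLIFF files against a schema.
# The corpus directory ("corpus") and the local schema file name are
# assumptions made for illustration only.
from pathlib import Path
from lxml import etree

schema = etree.XMLSchema(etree.parse("xliff-core-1.2-strict.xsd"))
results = {"valid": 0, "invalid": 0, "not_well_formed": 0}

for path in Path("corpus").rglob("*.xlf"):
    try:
        doc = etree.parse(str(path))
    except etree.XMLSyntaxError as err:
        results["not_well_formed"] += 1
        print(f"{path}: not well-formed ({err})")
        continue
    if schema.validate(doc):
        results["valid"] += 1
    else:
        results["invalid"] += 1
        # schema.error_log records the individual validation errors
        print(f"{path}: {schema.error_log.last_error}")

print(results)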

Resources Developed from the Design and Creation Strategy

The main outcome of the design and creation strategy is the development of the LocConnect interoperability test-bed (see Chapter 5). It will serve as a platform for future research in the area of distributed, component-based localisation. LocConnect will be further developed into a commercial-grade orchestration engine, yet it will be released to the public as open-source44 software, promoting further research and innovation.

44 Development work is still on-going; therefore an official release date has not been decided.

6.4 Impact of the Research

• The pilot study and its results were presented at the first XLIFF Symposium in Limerick. We presented the advantages and disadvantages of both the LCX and XLIFF file formats and the interoperability issues associated with the XLIFF standard, and proposed improvements to both file format standards. The research and its results were well received and discussed by the XLIFF community.

Subsequently, the XLIFF TC recorded and published specific contributions from the individual presentations of the first XLIFF symposium. The specific takeaways from our research recorded by the XLIFF TC are available at: https://wiki.oasis-open.org/xliff/Consolidated%20Takesaways%20from%20First%20XLIFF%20Symposium [accessed 11 Feb 2013].

Lieske (2011) published an article titled “Insights into the future of XLIFF” in MultiLingual magazine, summarising the important points of the first XLIFF symposium. It is worth noting that some of the proposals highlighted in this article for improving the XLIFF standard, such as the zipped data container and the binary data representation mechanism, are direct contributions of our research.

• We presented our main experiment, the implementation of the XML-DIUE framework and its results, at the third XLIFF symposium. This symposium was attended by many XLIFF TC members as well as tool vendors. Subsequently, the XLIFF Promotion and Liaison Sub-Committee (XLIFF P&L SC) decided to implement45 our framework. It has already started to collect XLIFF files with the aid of the XLIFF TC, with the intention of constructing a representative corpus. The comments by the XLIFF TC on our presentation can be found at: https://wiki.oasis-open.org/xliff/Consolidated%20Takeaways%20from20Third%20XLIFF%20Symposium [accessed 11 Feb 2013].

45 See initial discussions about this topic in the minutes of the meeting of the XLIFF P&L SC: https://lists.oasis-open.org/archives/xliff-promotion/201211/msg00002.html [accessed 13 Feb 2013]

• In our pilot study, we compared Microsoft’s internal localisation file format, LCX, with the open standard XLIFF. Through our publications and presentations, our research has provided an opportunity for Microsoft to engage with the XLIFF community. In addition, our research has influenced Microsoft’s decisions in adopting the XLIFF standard in their workflows.

• The development of LocConnect has had several substantial impacts. First, LocConnect can be regarded as the technological breakthrough that led to the Service Oriented Localisation Architecture Solution (SOLAS46). The SOLAS platform includes various localisation components (ranging from content pre-processing to crowdsourced translation voting) developed using various technologies by various researchers in the CNGL project. LocConnect is used as the orchestration engine of the SOLAS platform. LocConnect motivated fellow researchers of the CNGL project to develop components and integrate them into the SOLAS platform. This was helpful not only in identifying interoperability issues among distributed components, but also in developing relationships and improving collaboration among researchers from different universities. Currently, LocConnect connects over eight localisation software components distributed across three universities in Ireland: the University of Limerick, Dublin City University and Trinity College Dublin. As previously mentioned, since its development it has been demonstrated at various national and international events and has been successful in attracting the attention of academia as well as industry.

46 http://www.localisation.ie/solas/ [accessed 13 Feb 2013]

Discussions on SOA-based approaches to localisation are only beginning. LocConnect, along with the SOLAS platform, is the only platform reported in the literature which demonstrates the enactment of an end-to-end localisation workflow using distributed component technologies. LocConnect is also the only reported platform that uses XLIFF as the main messaging format to enable communication and interoperability between distributed component technologies.

LocConnect has been deployed publicly for testing purposes and can be accessed via: http://193.1.97.50/locconnect/index.php [accessed 13 Mar 2013].

6.5 Limitations of the Research

no study is perfect and ... the real challenge is to create, design and conduct high-impact, credible studies (Perry et al 2000)

Perry et al (2000) point out that designing high-impact, credible empirical studies involves managing trade-offs so as to maximise the accuracy of interpretation, research relevance and impact, subject to resource constraints and risks. The limitations of a study are often related to the latter two criteria. In this section we describe the various limitations of our research.

• Limitations of our individual research strategies were discussed at the end of each of the respective chapters: see Sect. 3.7.3, Sect. 4.8.2 and Sect. 5.6.3. The major limitation of our research is its low external validity due to the lack of representativeness of the XLIFF corpus. This was discussed in detail in Sect. 4.8.2.

• Due to time and resource accessibility constraints, in our pilot study we chose to compare only two standards. Comparison of multiple standards could lead to more insights into the features of standards as well as the interoperability issues among them.

• Very few research activities have been reported in relation to the bottom-up analysis of standards, especially using corpus-based approaches. We summarised the related literature in Sect. 4.1.1. The lack of research in this area compromises the ability to generalise and compare the findings of our research.

• The current research mainly focuses on a bottom-up analysis of standard constructs. Our research does not take into account the future requirements of standards. Therefore, the bottom-up approach may not cover some aspects that could be captured by top-down approaches. Moreover, there can be additional requirements for successful tool interoperability beyond what is covered by the standard. While our design and creation experiment partially investigated this issue, not all such requirements could be captured by it.

6.7 Future Work

Our research has opened up several interesting future research opportunities. These are discussed in the following sections.

6.7.1 Improvements to the XML-DIUE Framework

As highlighted in Sect. 6.2.1, our core contribution with this research is the XML-DIUE framework. Therefore, future work should mainly focus on the further development of the framework. In the following, we propose several enhancements and extensions to it.

Expansion of the XLIFF Corpus

The XLIFF corpus should be continually expanded by collecting new files. New sources from which XLIFF files could be obtained have to be identified, and openly available files should be continually crawled. LocConnect could also be used to collect XLIFF files for the XLIFF corpus, subject to permission from the users of LocConnect. We propose the involvement of the XLIFF TC in the corpus-building process. Future research should also focus on improving the representativeness of the corpus. For this purpose, one possibility is to first create or identify several different source files and then disseminate them to the XLIFF standard implementers, who would be requested to convert them into XLIFF. A corpus can be constructed out of the converted files and then analysed using our framework.

It is also important to take into account the organisation of files in the corpus and improvements to the repository. Recording different metadata for individual XLIFF files, such as the generation date, crawl date, modification date, the workflow stage in which the file was used, and the tools used to produce the file, could provide valuable information for further analysis. Moreover, a persistent XML-enabled data repository (instead of the current relational database) would be more suitable for storing XLIFF content, especially when analysing frequently used patterns (for example, XML snippets).
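As an illustration of the kind of per-file metadata recording suggested here, the following sketch stores a few metadata fields for each corpus file in an SQLite table; the table layout, field names and example values are hypothetical and simply mirror the metadata items listed above.

# Hypothetical per-file metadata record for the XLIFF corpus repository.
# Table layout, field names and the example row are illustrative only.
import sqlite3

conn = sqlite3.connect("xliff_corpus_meta.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS file_metadata (
        file_path      TEXT PRIMARY KEY,
        generated_date TEXT,   -- date recorded inside the file, if any
        crawled_date   TEXT,   -- when the file was collected
        modified_date  TEXT,   -- filesystem modification time
        workflow_stage TEXT,   -- e.g. extraction, translation, review
        producing_tool TEXT    -- value of the tool/tool-id attribute
    )
""")
conn.execute(
    "INSERT OR REPLACE INTO file_metadata VALUES (?, ?, ?, ?, ?, ?)",
    ("corpus/sample.xlf", "2010-03-16", "2012-08-01",
     "2012-08-01", "translation", "Rainbow v2.00"),
)
conn.commit()
conn.close()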

Improvements to the Standard Usage Evaluation Metrics

Future work should also focus on fine-tuning the existing standard usage analysis metrics and on identifying and proposing new metrics. As pointed out earlier (Sect. 4.4), one of the major limitations of the existing metrics is the need for manual analysis and human intervention in the decision-making process. Future work should focus on proposing new metrics, or improving the existing ones, so that the results require minimal human intervention for interpretation and decision making. While most of the standard usage analysis metrics proposed in this research are generic and applicable to many standards, domain-specific and standard-specific metrics could also be identified.
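By way of illustration, one of the simplest usage metrics of this kind, the proportion of corpus files in which each element and attribute of the specification actually occurs, could be computed along the following lines; the corpus path and the one-per-cent threshold are assumptions for the sketch, not the metric implementation used in the main experiment.

# Sketch: count how many corpus files use each element and attribute,
# as a basis for "frequently used" versus "rarely used" feature metrics.
# The corpus path and the 1% threshold are illustrative assumptions.
from collections import Counter
from pathlib import Path
from xml.etree import ElementTree

element_files, attribute_files, total = Counter(), Counter(), 0

for path in Path("corpus").rglob("*.xlf"):
    try:
        root = ElementTree.parse(path).getroot()
    except ElementTree.ParseError:
        continue
    total += 1
    elems, attrs = set(), set()
    for node in root.iter():
        if not isinstance(node.tag, str):
            continue                                 # skip comments and PIs
        elems.add(node.tag.split("}")[-1])           # strip the namespace
        attrs.update(a.split("}")[-1] for a in node.attrib)
    element_files.update(elems)                      # one count per file
    attribute_files.update(attrs)

for name, count in element_files.most_common():
    flag = "rarely used" if count / total < 0.01 else ""
    print(f"{name:25s} {count:5d} files {flag}")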

Different Analyses

Different analyses can be performed on the selected standard using the XML-DIUE framework. As the corpus grows, longitudinal studies could be conducted to examine trends in feature usage over time. Moreover, analysis of XLIFF files used in different phases of the localisation workflow may yield interesting findings. Other possibilities include the application of XML-DIUE separately to different versions of the standard, and the use of XML-DIUE to evaluate similar standards and to compare and contrast their features.
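As a sketch of what such a longitudinal analysis might look like, the snippet below groups the usage of one feature (the alt-trans element) by the year recorded in each file element's date attribute; the chosen feature, the corpus layout and the assumption that a usable date is present are all illustrative.

# Sketch: trend of alt-trans usage over time, grouped by the year found in
# the XLIFF file element's date attribute (where one is present).
# The corpus path and the choice of feature are illustrative assumptions.
from collections import defaultdict
from pathlib import Path
from xml.etree import ElementTree

ns = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
files_per_year = defaultdict(int)
alt_trans_per_year = defaultdict(int)

for path in Path("corpus").rglob("*.xlf"):
    try:
        root = ElementTree.parse(path).getroot()
    except ElementTree.ParseError:
        continue
    file_elem = root.find("x:file", ns)
    date = file_elem.get("date", "") if file_elem is not None else ""
    if len(date) < 4 or not date[:4].isdigit():
        continue                        # skip files without a usable date
    year = date[:4]
    files_per_year[year] += 1
    alt_trans_per_year[year] += len(root.findall(".//x:alt-trans", ns))

for year in sorted(files_per_year):
    print(year, files_per_year[year], alt_trans_per_year[year])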

Evaluation of XML-DIUE

The XML-DIUE framework is based solely on the analysis of a corpus constructed from authentic files of a selected standard. As discussed in Sect. 4.8.2, the corpus-building process is a never-ending task. Furthermore, the practical use of the framework is to evaluate the usage characteristics of features of XML-based data-exchange standards, and such standards keep evolving. These factors make it difficult to evaluate the effectiveness and the utility of the framework. However, the framework could be evaluated by conducting a pre-survey followed by a post-survey of the standard developers (e.g. the XLIFF TC), as described below.

• First, a pre-survey would be conducted prior to the deployment of the XML-DIUE framework to determine the current methods in use to evaluate the usage of the standard.

• In the next step, XML-DIUE would be deployed and used in the standard development process for a considerable period of time (for example, six months).

• Finally, a post-survey would be conducted to assess the usefulness of the XML-DIUE framework; opinions could also be sought on the utility of the different metrics and on proposed improvements to them. A comparison of the pre- and post-surveys would validate the utility of the framework.

In addition to the above, the application of the framework to evaluate similar standards in the localisation domain as well as other domains may also increase the validity of the framework.

XML-DIUE and Interdisciplinary Research

Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. (Hand et al 2001)

The proposed XML-DIUE framework is a first step towards adapting knowledge from corpus linguistics to improve standards. From the literature review, we identified a paucity of corpus-based studies in relation to standardisation work. Our research has laid the foundation for further interdisciplinary research towards improving XML-based data-exchange standards. Many opportunities exist to explore how existing knowledge in the fields of Natural Language Processing (NLP), Computational Linguistics and Corpus Linguistics could be utilised in aid of the development and maintenance of modern XML-based standards.

The research could benefit significantly from the utilisation of Data Mining techniques and algorithms to develop more fine-grained, precise standard usage analysis metrics. Future research could also explore the use of XML Information Retrieval techniques to improve the standard usage analysis metrics.

Finally, the XML-DIUE framework has opened up many possibilities for conducting research on similar data-exchange standards used in other domains.

6.7.2 Other Possibilities

Future work should also focus on addressing the limitations of the pilot study and the design and creation experiment. The limitations of these strategies, and possible future work to address them, have been discussed in Sect. 3.7.3 and Sect. 5.6.3 respectively.

The research could be complemented by additional research concerning the following aspects:

• Research could be carried out "top-down", as identified by Frimannsson and Lieske (2010), to address existing issues of the standard as well as to capture future standardisation requirements. To aid top-down analysis, the framework proposed by Mykkänen and Tuomainen (2008) could be used.

• As pointed out by Shah and Kesan (2008), there might be some requirements beyond what is covered by the standards to enable successful interoperability between tools. These requirements can be identified by performing interoperability tests among tools. For this purpose, the methodology proposed by Shah and Kesan (2008; 2009) could be adopted. An alternative would be to perform additional interoperability tests with more data, following the work proposed by Anastasiou and Morado-Vázquez (2010).

Future work should not only consider addressing existing interoperability issues and capturing new standardisation requirements, but should also focus on how to ensure standard-compliance in the long term. As highlighted in Sect. 2.8.4, the only known serious academic work in this area is the IGNITE project (LocFocus 2006; Schäler 2007; Schäler and Kelly 2007). Unfortunately, it is currently not operational. Future research should augment this work and focus on methodologies to ensure standard-compliance, by means of conformance test suite development, conformance testing, interoperability testing and tool certification, as discussed in Sect. 2.8.4.

6.8 Final Remarks

To the best of this researcher’s knowledge,

1) this is the only academic empirical research reported in the localisation domain that addresses interoperability issues related to localisation data-exchange standards; and

2) XML-DIUE is the only reported systematic, practical, empirical framework comprising standard usage analysis metrics for evaluating the usage of XML-based standards, mainly for improving data-exchange interoperability.

This is a first step toward academic studies that use empirical evidence to improve standards. Therefore, we believe our research has made a substantial and significant contribution, not only to the localisation domain, but also to many other verticals.

Our research shows that the existence of a standard like XLIFF does not necessarily guarantee interoperability, and that having a functioning standards committee, as in the case of XLIFF, does not necessarily lead to an efficient and lean standard. Our research has highlighted major areas for improvement.

Finally, as mentioned in Sect. 2.8.4, many other activities are on-going to address interoperability issues related to localisation standards; our research addressed only a fraction of the overall issues. Therefore, consistent with Choudhury (2012), we acknowledge the need for a coherent strategy that connects the various efforts by different parties towards improving the semantic and syntactic interoperability of prominent localisation data-exchange standards, and that proposes strategies leading to a higher degree of standard-compliance among localisation tools.

Bibliography

Abdalla, K. F. (2003) 'A model for semantic interoperability using XML', in Systems and Information Engineering Design Symposium, 2003 IEEE, 24-25 Apr, 107-111.

Ainsworth, K. (1989) 'Conformance testing', Computer Integrated Manufacturing Systems, 2(4), 221-223.

Algergawy, A., Mesiti, M., Nayak, R. and Saake, G. (2011) 'XML data clustering: An overview', ACM Comput. Surv., 43(4), 1-41.

Almeida, F., Oliveira, J. and Cruz, J. (2010) 'Open Standards And Open Source: Enabling Interoperability', International Journal of Software Engineering & Applications, 2(1), 1-11.

Ambrosio, R. and Widergren, S. (2007) 'A Framework for Addressing Interoperability Issues', in Power Engineering Society General Meeting, 2007. IEEE, 24-28 Jun, 1-5.

Anastasiou, D. (2009) 'Centre for Next Generation Localisation and Standards', in Practical Applications in Language and Computers (PALC), Łódź, Poland.

Anastasiou, D. (2010a) 'Open and flexible localization metadata', MultiLingual, 21(4), 50-52.

Anastasiou, D. (2010b) 'Survey on the Use of XLIFF in Localisation Industry and Academia', in 7th International Conference on Language Resources and Evaluation (LREC), Malta.

Anastasiou, D. (2011) 'The Impact of Localisation on Semantic Web Standards', European Journal of ePractice, 12(March/April 2011), 42-52.

Anastasiou, D. and Morado-Vázquez, L. (2010) 'Localisation Standards and Metadata' in Sánchez-Alonso, S. and Athanasiadis, I. N., eds., Metadata and Semantic Research, Springer Berlin Heidelberg, 255-274.

Andrä, S. C. (2012) 'What is New at ONTRAM?', TAUS User Conference Seattle 2012 [online], available: http://www.youtube.com/watch?v=yWiGM4l5v1A [accessed 23 Jan 2013]

Aouad, L., O’Keeffe, I. R., Collins, J. J., Wasala, A., Nishio, N., Morera, A., Morado, L., Ryan, L., Gupta, R. and Schaler, R. (2011) 'A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios' in Lee, G., Howard, D. and Ślęzak, D., eds., Convergence and Hybrid Information Technology, Springer Berlin Heidelberg, 371-382.

APEC-SCSC (2010) Standardization: Fundamentals, Impact, and Business Strategy : Education Guideline 3 : Textbook for Higher Education, APEC Secretariat.

Arnold, D. (1994) 'Conformance testing ICT standards: costly chore or valuable insurance?', StandardView, 2(4), 182-187.

Asano, S. (1990) 'Conformance testing, certification and interoperability assurance in Japan', in Application of Standards for Open Systems,1990., Proceedings of the 6th International Conference on the Application of Standards for Open Systems, 2-4 Oct, 118-124.

Benitez, M. T. C. (2011a) 'Architecture for the Language Interoperability Portfolio', available: http://med.forge.osor.eu/linport-100-d.pdf [accessed 10 Oct 2011].

Benitez, M. T. C. (2011b) 'Introduction to Linport', in 1st Linport Symposium, 27 Sept, Luxembourg.

Berges, I., Bermudez, J., Goñi, A. and Illarramendi, A. (2010) 'Semantic interoperability of clinical data', in Proceedings of the First International Workshop on Model-Driven Interoperability, Oslo, Norway, ACM, 10-14.

Biber, D. (1993) 'Representativeness in corpus design', Literary and linguistic computing, 8(4), 243-257.

Bly, M. (2010) 'XLIFF: Theory and Reality: Lessons Learned by Medtronic in 4 Years of Everyday XLIFF Use', in 1st XLIFF Symposium, Limerick, Ireland, 22 Sept, University of Limerick.

Bly, M. (2012) 'XLIFF:doc - The XLIFF Representation Guide for Documents Version', [online], available: https://code.google.com/p/interoperability-now/downloads/detail?name=XLIFFdoc%20Representation%20Guide_v0.953.pdf [accessed 23 Jan 2013].

Choudhury, R. (2011) 'Building consensus on the future of XLIFF', in 2nd International XLIFF Symposium, Warsaw, Poland, 23 Sept.

Choudhury, R. (2012) 'The localization standards ecosystem', MultiLingual, 23(3), 42-45.

CNGL (2013) 'About Us', CNGL [online], available: http://www.cngl.ie/about-us/ [accessed 04 Mar 2013].

Corrigan, J. and Foster, T. (2003) 'XLIFF: An Aid to Localization', [online], available: http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html. [accessed 10 Jun 2009].

Cross-Ii, J. H., Chikofsky, E. J. and May Jr, C. H. (1992) 'Reverse Engineering' in Marshall, C. Y., ed. Advances in Computers, Elsevier, 199-254.

Cruz-Lara, S., Bellalem, N., Ducret, J. and Kramer, I. (2008) 'Standardising the management and the representation of multilingual data' in Rodrigo, E. Y., ed. Topics in Language Resources for Translation and Localization, John Benjamins Publishing Co., 151-172.

Currie, M., Geileskey, M., Nevile, L. and Woodman, R. (2002) 'Visualising interoperability: ARH, aggregation, rationalisation and harmonisation', in Proceedings of the 2002 international conference on Dublin core and metadata applications: Metadata for e-communities: supporting diversity and convergence, Florence, Italy, Dublin Core Metadata Initiative, 177-183.

Esselink, B. (2000) A practical guide to localization, John Benjamins Pub. Co.

Fielding, R. T. (2000) 'Chapter 5: Representational State Transfer (REST)' in Architectural Styles and the Design of Network-based Software Architectures, unpublished PhD thesis, University of California, Irvine.

Filip, D. (2011) 'XLIFF 2.0', in Multilingual Web Workshop, Pisa, Italy, 4-5 Apr.

Filip, D. (2012) 'The localization standards ecosystem', MultiLingual, 23(3), 29-36.

Filip, D. (2013) 'Localisation Standards Reader', MultiLingual - 2013 Resource Directory & Index 2012, Feb, 39-42.

Filip, D., Lewis, D., Wasala, A., Jones, D. and Finn, L. (2012) 'CMSL10n <-> SOLAS Integration as an ITS 2.0 <-> XLIFF test bed', in W3C MultilingualWeb (ITS 2.0) Track, FEISGILTT 2012 (collocated with Localization World 2012), Seattle, USA, 17-19 Oct.

Filip, D. and Wasala, A. (2013) 'Addressing Interoperability Issues of XLIFF: Towards the Definition and Classification of Agents and Processes', in 4th International XLIFF Symposium, FEISGILTT 2013, London, UK, 11-12 Jun (accepted).

Folmer, E. and Verhoosel, J. (2011) State of the Art on Semantic IS Standardization, Interoperability & Quality, UT, CTIT, TNO en NOiV.

Frimannsson, A. (2005) Adopting Standards Based XML File Formats in Open Source Localisation, unpublished thesis Queensland University of Technology.

Frimannsson, A. and Hogan, J. M. (2005) 'Adopting Standards-based XML File Formats in Open Source Localisation', Localisation Focus, 4(4), 9-23.

Frimannsson, A. and Lieske, C. (2010) 'Next Generation XLIFF: Simplify-Clarify-and Extend', in 1st XLIFF International Symposium Limerick, Ireland, 22 Sept, University of Limerick.

Garguilo, J. J., Martinez, S., Rivello, R. and Cherkaoui, M. (2007) 'Moving Toward Semantic Interoperability of Medical Devices', in Joint Workshop on High Confidence Medical Devices, Software, and Systems and Medical Device Plug- and-Play Interoperability HCMDSS-MDPnP 25-27 Jun, 13-19.

GNU (n.d) 'GNU 'gettext' utilities', [online], available: http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files [accessed 7 Oct 2011].

Google (2005) 'Web Authoring Statistics', [online], available: https://developers.google.com/webmasters/state-of-the-web/2005/ [accessed 15 Aug 2012].

Gracia, I. (2006) 'Formatting and the Translator: Why XLIFF Does Matter', Localisation Focus, 5(2), 14-19.

Hand, D. J., Mannila, H. and Smyth, P. (2001) Principles of data mining, MIT press.

Heiler, S. (1995) 'Semantic interoperability', ACM Comput. Surv., 27(2), 271-273.

Hogan, J. M., Ho-Stuart, C. and Pham, B. (2004) 'Key challenges in software internationalisation', in Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32, Dunedin, New Zealand, Australian Computer Society, Inc., 187-194.

Ibrahim, N. b. M. and Hassan, M. F. b. (2010) 'A survey on different interoperability frameworks of SOA systems towards seamless interoperability', in International Symposium in Information Technology (ITSim), 15-17 Jun, 1119-1123.

IEEE (1991) 'IEEE Standard Computer Dictionary. A Compilation of IEEE Standard Computer Glossaries', IEEE Std 610, 1.

Imhof, T. (2010) 'XLIFF – a bilingual interchange format', in MemoQ Fest, Budapest, Hungary, 5-7 May.

ISO (2008) 'ISO 30042:2008 Systems to manage terminology, knowledge and content - TermBase eXchange (TBX)', available: http://www.iso.org/iso/catalogue_detail.htm?csnumber=45797 [accessed 06 Jul 2009].

ISO/IEC (2004) Standardization and related activities - General vocabulary, Geneva: International Organization for Standardization / International Electrotechnical Commission.

Johnson, R. B., Onwuegbuzie, A. J. and Turner, L. A. (2007) 'Toward a Definition of Mixed Methods Research', Journal of Mixed Methods Research, 1(2), 112-133.

Kindrick, J. D., Sauter, J. A. and Matthews, R. S. (1996) 'Improving conformance and interoperability testing', StandardView, 4(1), 61-68.

Krechmer, K. (2005) 'The Meaning of Open Standards', in Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS '05) - Track 7, 3-6 Jan, 204b.

Leahy, P. (2011) 'Enabling Translation is the role of a Developer/Author', in 16th Annual LRC Internationalisation & Localisation Conference Limerick, Ireland, 22-23 Sept, University of Limerick.

Lewis, D., Curran, S., Feeney, K., Etzioni, Z., Keeney, J., Way, A. and Schäler, R. (2009) 'Web service integration for next generation localisation', in Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing, Boulder, Colorado, Association for Computational Linguistics, 47-55.

Lewis, G. A., Morris, E., Simanta, S. and Wrage, L. (2008) 'Why Standards Are Not Enough to Guarantee End-to-End Interoperability', in Proceedings of the Seventh International Conference on Composition-Based Software Systems (ICCBSS 2008), IEEE Computer Society, 164-173.

Li, W. and Li, S. (2004) 'Improve the semantic interoperability of information', in Proceedings. 2004 2nd International IEEE Conference Intelligent Systems, 22-24 Jun, 591-594.

Lieske, C. (2011) 'Insights into the future of XLIFF', MultiLingual, 22(5), 51-52.

Lionbridge (2006) 'Introduction to XLIFF', available: http://www.lionbridge.com/lionbridge/en-US/kc/globalization/xml-localization-interchange-file-format.htm [accessed 7 Aug 2009].

Lipman, R., Palmer, M. and Palacios, S. (2011) 'Assessment of conformance and interoperability testing methods used for construction industry product models', Automation in Construction, 20(4), 418-428.

LISA (2004) 'Translation Memory eXchange', [online], available: http://www.lisa.org/Translation-Memory-e.34.0.htm [accessed 6 Jul 2009].

LISA (2008a) 'Segmentation Rules eXchange (SRX)', [online], available: http://www.lisa.org/Segmentation-Rules-e.40.0.html [accessed 6 Jul 2009].

LISA (2008b) 'Term Base eXchange (TBX)', [online], available: http://www.lisa.org/Term-Base-eXchange.32.0.html [accessed 6 Jul 2009].

LocFocus (2006) 'IGNITE - The Localisation Factory, Linguistic Infrastructure for Localisation: Language Data, Tools and Standards', Localisation Focus, 5(2), 4-5.

Lommel, A. (2006) 'Localization standards, knowledge- and information-centric business models, and the commoditization of linguistic information' in Dunne, K. J., ed. Perspectives on localization, J. Benjamins Pub. Co., 223-239.

Mateos, J. (2010) A Web Services approach to software localisation. Bringing software localisation tools from the desktop to the cloud, unpublished thesis (MSc), University of Dublin.

McConnell, B. (2011) 'Building a common RESTFul API for translation services', TAUS User Conference 2011 [online], available: http://www.youtube.com/watch?v=9yXFapKW6Zg [accessed 30 Jan 2013].

McEnery, T. and Hardie, A. (2011) Corpus Linguistics: Method, Theory and Practice, Cambridge University Press.

Microformats (n.d.) 'About Microformats', [online], available: http://microformats.org/about [accessed 15 Aug 2012].

Microsoft (2007) Microsoft® Localization Project Editor 6.1 (LocProject Editor) Help Documentation, Microsoft Corp., unpublished.

Microsoft (2009) LocStudio 6.x, LSWiki (Microsoft® Internal Documentation), Microsoft Corp., unpublished.

Morado-Vázquez, L. and Filip, D. (2012) 'XLIFF Support in CAT Tools', available: http://www.localisation.ie/resources/XLIFFSotAReport_20120210.pdf [accessed 5 Mar 2012].

Morado-Vázquez, L. and Mooney, S. (2010) 'XLIFF Phoenix and LMC Builder: Organising, capturing and using localisation data and metadata.', in LRC XV Brave New World, The Future of Localisation Services, Limerick, Ireland, 22-24 Sept.

Morado-Vázquez, L. and Wolff, F. (2011) 'Bringing industry standards to Open Source localisers: a case study of Virtaal', Tradumàtica, 9, 74-83.

Murphy, N. (2010) 'The route to XLIFF adoption', in 1st XLIFF International Symposium Limerick, Ireland, 22 Sept, University of Limerick.

Mykkänen, J. A. and Tuomainen, M. P. (2008) 'An evaluation and selection framework for interoperability standards', Information and Software Technology, 50(3), 176-197.

Naudet, Y., Latour, T., Guedria, W. and Chen, D. (2010) 'Towards a systemic formalisation of interoperability', Computers in Industry, 61(2), 176-185.

O'Conchuir (2010) 'Solas - the Service Oriented Localisation Architecture Solution', in Action Week for Global Information Sharing New Delhi, India, 6-7 Dec.

O'Donnell, K. and Stahlschmidt, U. (2012) 'Driving the XLIFF standard at Microsoft', in 3rd International XLIFF Symposium, FEISGILTT 2012, Seattle, USA, 17-19 Oct.

Oates, B. J. (2006) Researching information systems and computing, SAGE.

OAXAL-TC (2009) 'Open Architecture for XML Authoring and Localization Reference Model', [online], available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=oaxal [accessed 02 Sept 2011].

OLIF-Consortium (2008) 'The open XML language data standard', [online], available: http://www.olif.net/ [accessed 19 Aug 2011].

Ørsted, M. and Schmidtke, D. (2010) 'Office 2010 Localisation: Something old, something new, something borrowed, something blue', in LRC XV- Brave New World- The Future of Localisation Services, Limerick, Ireland, 23 Sept, University of Limerick.

Ouksel, A. M. and Sheth, A. (1999) 'Semantic interoperability in global information systems', SIGMOD Rec., 28(1), 5-12.

Park, J. and Ram, S. (2004) 'Information systems interoperability: What lies beneath?', ACM Trans. Inf. Syst., 22(4), 595-632.

Perry, D. E., Porter, A. A. and Votta, L. G. (2000) 'Empirical studies of software engineering: a roadmap', in Proceedings of the Conference on the Future of Software Engineering, ACM, 345-355.

Ray, S. R. (2009) 'Healthcare interoperability - lessons learned from the manufacturing standards sector', in Automation Science and Engineering, 2009. CASE 2009. IEEE International Conference on, 22-25 Aug, 88-89.

Raya, R. (2004) 'XML in localisation: A practical analysis', [online], available: http://www.ibm.com/developerworks/xml/library/x-localis/. [accessed 10 Jun 2009].

Reynolds, P. and Jewtushenko, T. (2005) 'What Is XLIFF and Why Should I Use It?', XML: Article [online], available: http://xml.sys-con.com/node/121957 [accessed 10 Jun 2009].

Reynolds, P., Ugray, G., Bly, M., Andrä, S., Tingley, C. and Pimlott, A. (2011) 'Translation Interoperability Protocol - How to Exchange L10N Data Without Fear or Loss', in TM-Europe 2011, Warsaw, Poland, 29 Sept.

Ryan, L., Anastasiou, D. and Cleary, Y. (2009) 'Using Content Development Guidelines to Reduce the Cost of Localising Digital Content', Localisation Focus, 8(1), 11-28.

Sartipi, K. and Yarmand, M. H. (2008) 'Standard-based data and service interoperability in eHealth systems', in ICSM 2008. IEEE International Conference on Software Maintenance 28 Sept-4 Oct, 187-196.

Savourel, Y. (2001) XML Internationalization and Localization, Indianapolis, Indiana: Sams.

Savourel, Y. (2003) 'An Introduction to Using XLIFF', MultiLingual [online], available: http://www.multilingual.com/articleDetail.php?id=1178 [accessed 10 Jun 2009].

Savourel, Y. (2007) 'CAT tools and standards: a brief summary', MultiLingual, September 2007, 37.

Schäler, R. (2007) The IGNITE Consortium, Localisation Research Centre (LRC), CSIS Dept, University of Limerick, Limerick, Ireland.

Schäler, R. (2008) 'Next Generation Localisation', [online], available: http://www.slideshare.net/reinhardschaler/081009-cngl-sc-loc-1-r-sch [accessed 27 Mar 2013].

Schäler, R. (2009) 'Localization' in Baker, M. and Saldanha, G., eds., Routledge Encyclopedia of Translation Studies, Abingdon: Routledge, 157-161.

Schäler, R. and Kelly, K. (2007) Final Report – Main Report of the IGNITE Project, Localisation Research Centre (LRC), CSIS Department, University of Limerick, Limerick, Ireland.

Shah, R. and Kesan, J. (2008) 'Evaluating the interoperability of document formats: ODF and OOXML as examples', in Proceedings of the 2nd international conference on Theory and practice of electronic governance, Cairo, Egypt, ACM, 219-225.

Shah, R. and Kesan, J. (2009) 'Interoperability challenges for open standards: ODF and OOXML as examples', in Proceedings of the 10th Annual International Conference on Digital Government Research: Social Networks: Making Connections between Citizens, Data and Government, Digital Government Society of North America, 56-62.

Shapiro, C. (2001) 'Setting compatibility standards: cooperation or collusion', in Expanding the Bounds of Intellectual Property, Oxford University Press, Oxford, 82-101.

Sikes, R. (2011) 'The New MultiCorpora TMS', ClientSide News [online], available: http://www.clientsidenews.com/downloads/CSNV11I1.pdf [accessed 21 Jun 2012].

Soderstrom, E. (2003) 'Casting the standards play - which are the roles?', accepted for The 3rd Conference on Standardization and Innovation in Information Technology, 22-24 Oct, 253-260.

Song, I. and Bayerl, P. S. (2003) 'Semantics of XML documents', available: http://www.uni-giessen.de/germanistik/ascl/dfg-projekt/pdfs/inseok03a.pdf [accessed 08 Feb 2012].

TAUS (2011a) 'Lack of interoperability costs the translation industry a fortune', available: http://www.translationautomation.com/downloads/finish/56-public-reports/328-taus-lack-of-interoperability-costs-the-translation-industry-a-fortune-february-2011 [accessed 25 Feb 2011].

TAUS (2011b) 'XLIFF Consultation', available: http://lists.oasis-open.org/archives/xliff/201109/pdf00000.pdf [accessed 06 Oct 2011].

TAUS (2011c) 'The XLIFF Survey', available: http://www.surveymonkey.com/sr.aspx?sm=GuoKE_2fF5Qu3qQDiWHLHenq2Xw9HVmibUxZ_2fVfmJCLVY_3d [accessed 06 Oct 2011].

Tiemann, M. (2006) 'An objective definition of open standards', Computer Standards & Interfaces, 28(5), 495-507.

Tognini-Bonelli, E. (2001) Corpus linguistics at work, John Benjamins Publishing Company.

Trans-WS (2002) 'Translation Web Services', [online], available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws#resources [accessed 19 Aug 2011].

Vackier, T. (2010) 'The Double-Edged Sword of Extensibility in XLIFF', in 1st XLIFF Symposium, Limerick, Ireland, 22 Sept, University of Limerick.

Vaishnavi, V. and Kuechler, W. (2004) 'Design Science Research in Information Systems', [online], available: http://www.desrist.org/design-research-in-information-systems/ [accessed 30 Jan 2013].

van-Genabith, J. (2009) 'Next Generation Localisation', Localisation Focus, 8(1), 4-10.

van-Zuylen, H. J. (1993) The REDO compendium: reverse engineering for software maintenance, John Wiley & Sons, Inc.

Vándor, G., Bly, M., Andrä, S., Tingley, C. and Zydron, A. (2012) 'The Case for Interoperability: Efficiency and Freedom of Choice in the Translation Supply Chain', in Localization World Paris, Paris, France, 4-6 Jun.

Viswananda, R. and Scherer, M. (2004) 'Localizing with XLIFF and ICU', in 26th Internationalization and Unicode Conference, San Jose, California, USA, 7-10 Sept, IBM Corp.

W3C (2001) 'Canonical XML Version 1.0', available: http://www.w3.org/TR/xml-c14n [accessed 21 Jun 2012].

W3C (2007) 'Internationalization Tag Set (ITS) Version 1.0', available: http://www.w3.org/TR/its/ [accessed 19 Aug 2011].

Waldhör, K. and Choudhury, R. (2012) 'TAUS Translation Services API', in FEISGILTT 2012, Seattle, USA, 17-19 Oct.

Wang, W., Tolk, A. and Wang, W. (2009) 'The levels of conceptual interoperability model: applying systems engineering principles to M & S', in Proceedings of the 2009 Spring Simulation Multiconference, San Diego, California, Society for Computer Simulation International, 1-9.

Wasala, A., Filip, D., Exton, C. and Schäler, R. (2012a) 'Making Data Mining of XLIFF Artefacts Relevant for the Ongoing Development of the XLIFF Standard', in 3rd International XLIFF Symposium, FEISGILTT 2012, Seattle, USA, 17-19 Oct.

Wasala, A., O’Keeffe, I. and Schäler, R. (2011a) 'LocConnect: Orchestrating Interoperability in a Service-oriented Localisation Architecture', Localisation Focus, 10(1), 29-43.

Wasala, A., O’Keeffe, I. and Schäler, R. (2011b) 'Towards an Open Source Localisation Orchestration Framework', Tradumàtica, 9, 84-100.

Wasala, A. and Schäler, R. (2012) 'LocConnect - Localisation Orchestration Framework', Invention Disclosure No. 2006164, University of Limerick, Limerick, Ireland.

Wasala, A., Schmidtke, D. and Schäler, R. (2010) 'LCX and XLIFF: A Comparison', in 1st XLIFF International Symposium Limerick, Ireland, 22 Sept, University of Limerick.

Wasala, A., Schmidtke, D. and Schäler, R. (2012b) 'XLIFF and LCX Format: A Comparison', Localisation Focus, 11(1), 67-79.

Waters, J., Powers, B. J. and Ceruti, M. G. (2009) 'Global Interoperability Using Semantics, Standards, Science and Technology (GIS3T)', Computer Standards & Interfaces, 31(6), 1158-1166.

XLIFF-TC (2006) 'XLIFF 1.2 Representation Guide for HTML', available: http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html [accessed 08 Jun 2009].

XLIFF-TC (2007) 'XLIFF 1.2 - A white paper on version 1.2 of the XML Localisation Interchange File Format (XLIFF)', available: https://www.oasis-open.org/committees/download.php/26817/xliff-core-whitepaper-1.2-cs.pdf [accessed 12 Mar 2013].

XLIFF-TC (2008a) 'OASIS XML Localisation Interchange File Format TC - Frequently Asked Questions', available: https://www.oasis-open.org/committees/xliff/faq.php [accessed 08 Jun 2009].

XLIFF-TC (2008b) 'XLIFF 1.2 Specification', available: http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html [accessed 08 Jun 2009].

XLIFF-TC (2011) 'XLIFF 2.0', available: http://wiki.oasis-open.org/xliff/XLIFF2.0 [accessed 02 Sept 2011].

Zydroń, A. (2008) 'OAXAL—What is it and why should I care', available: http://www.infomanagementcenter.com/enewsletter/200808/second.htm [accessed 02 Sept 2011].

Appendix A: The Localisation Research Centre (LRC) and the Centre for Next Generation Localisation (CNGL)

Localisation Research Centre (LRC)

Established in 1995, the Localisation Research Centre is the only dedicated research and educational centre that focuses exclusively on localisation. The LRC has established partnerships with digital publishers, localisation tool developers, Localisation Service Providers (LSPs), the EU Commission, as well as other academic institutions worldwide (LRC 2010; Anastasiou 2009). Among the main activities of the centre are: publishing the “Localisation Focus” journal, the only peer-reviewed, indexed journal which focuses exclusively on Globalisation, Internationalisation, Localisation and Translation (GILT) related research; hosting the annual LRC Localisation conference; offering a Master’s degree programme in Multilingual Computing and Localisation; and conducting world-class research in the localisation field.

Centre for Next Generation Localisation (CNGL)

The Centre for Next Generation Localisation (CNGL) is an academia-industry partnership that conducts research related to “machine translation, localisation, multilingual content search, and personalisation, as well as driving standards in content and localisation service integration” (CNGL 2013). Comprising over 100 researchers from four universities in Ireland (University of Limerick, Trinity College Dublin, Dublin City University, University College Dublin) and 13 top industry leaders in the field, including Microsoft, Intel and Symantec among others, CNGL is undoubtedly the largest research project in the aforementioned areas. CNGL, which has been operational since 2007, is currently in its second phase. In the first phase, the research within CNGL was divided into four main tracks, with localisation being one of them (Anastasiou 2009). The current research was carried out as a part of the localisation track, which was led by the LRC with the support of the Microsoft European Development Centre (EDC), Ireland.

Appendix D: File Format Conversion Tool User Guide

D.1 Limitations of the Converter

The tool can only process LCL files generated by the LocProject Editor. The tool assumes that the LCX to LCX-XLIFF conversion is carried out soon after generating the LCL file. Currently, the tool does not support the conversion of the following LCX elements and their sub-elements: . Furthermore, the tool is only capable of converting LCL files or LCX-XLIFF files that represent Win32 resources. The tool can only convert LCX-XLIFF files generated by itself. The tool uses Saxon47 as its internal XSLT parser.
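The core of the conversion is the application of the XSLT mappings described in Chapter 3. The converter itself relies on Saxon, but the basic step can be sketched with any XSLT processor; the stylesheet and file names below are placeholders rather than the actual artefacts shipped with the tool.

# Minimal sketch of one conversion step (LCL to LCX-XLIFF) via XSLT.
# The actual converter uses Saxon; lxml is used here only to illustrate
# the idea, and "lcl_to_xliff.xsl" and the file names are placeholders.
from lxml import etree

transform = etree.XSLT(etree.parse("lcl_to_xliff.xsl"))
source = etree.parse("sample.lcl")

result = transform(source)
with open("sample.xlf", "wb") as out:
    out.write(etree.tostring(result, xml_declaration=True,
                             encoding="UTF-8", pretty_print=True))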

D.2 Converter Usage

A screenshot of the developed tool is given in Fig. D1.


Figure D1. A Screenshot of the File Format Converter48

47 http://saxon.sourceforge.net/ [accessed 15 Mar 2013]
48 Note that Microsoft has not endorsed this tool. It was developed by the researcher as a proof-of-concept for the proposed mapping.

1) Clicking on the "Open-Source Document" link will open a standard file-open dialog where the user can select either an LCX (*.lcl) file or an XLIFF (*.xlf) file for conversion.

2) Next, the user should specify the target file path for saving the converted file. This can be done by clicking on the "Select Target Document" link.

3) Then, by simply clicking on the "Convert" link, the tool will convert the file into the target file format. The tool will first examine the extension of the opened file. If it is an LCL file, the tool will convert the file into the LCX-XLIFF format and save it in the specified location. If the opened file is an XLIFF file, the tool will convert the file into the LCX format and save it in the specified location with the LCL extension.

Note: During the conversion process, a pop-up window will appear with a message displaying "Processing". This message will be closed automatically after a file has been processed. Please allow the message to close automatically before starting to convert another file. If the user does not specify the target file path, the converter will create the target file in the same folder as the source document, using the name of the original file but with the extension of the converted file.

The status area will show any errors that occur during the conversion process. Otherwise, a message will appear in the status area saying "Document converted successfully".

The XSLT style sheets used by the tool can also be configured manually. See Fig. D2.

Figure D2. XSLT Configuration Dialog

Appendix F: Attributes and their Values

The XLIFF attributes and their typical values found in the current corpus are given in the table below (Table F1).

Attribute: Typical Values/Formats Found in the XLIFF Corpus

category: String, Enterprise Software, resourceBundle, Survey Project

charclass: String

company-name: String, universities, ids

contact-email: String, valid email addresses

context-type: database, linenumber, sourcefile, task, x-FontDefId, x-RegionDefId, x-application, x-attribute-name, x-authorization-scheme, x-breadcrumb, x-breadcrumb-entry, x-button, x-category, x-column, x-devicestate, x-dialogpath, x-flash-chart, x-interactive-report, x-item, x-linkdescription, x-linksToTextId, x-list, x-list-entry, x-list-of-values, x-lov-entry, x-navigation-bar-entry, x-object-name, x-object-type, x-page, x-page-process, x-page-validation, x-po-autocomment, x-po-trancomment, x-po-transcomment, x-popup-lov-template, x-region, x-region-definition, x-report-column, x-screenshot, x-section, x-series, x-shortcut, x-tab, x-text category

coord: -0;-0;-0;-0, 0;0;243;119

count: new, total, word count

css-style: String

ctype: bold, italic, lb, link, underlined, x-accelerator, x-c-param, x-font, x-fontFormat, x-html-a, x-html-b, x-link, x-newline, x-sup, x-superscript, x-tab, x-winformatmsgparameter

datatype: PLAINTEXT, cpp, cstring, database, html, javalistresourcebundle, javapropertyresourcebundle, plaintext, po, xhtml, xml, xsl, javaproperties, ICUResourceBundle, JSSkeleton, pofile, resource, resource message, resx, s2x, skeleton, text, winres, x-IGD-XML, x-STR#, x-application/json, x-application/x-gettext, x-application/xml+dtd, x-icu-resource-bundle, x-infoXmlReference, x-localization, x-ms-word, x-office, x-sample, x-sdlfilterframework2, x-soy-msg-bundle, x-test, x-text/html, x-text/xml, x-undefined, xliff

date: 04/02/2009 23:24:18, 11/06/2008, 2001-04-01T05:30:02, 2006-11-24, 2007-01-01, 2010-03-16T21:58:27Z

href: #, /opt/Maxprograms/Swordfish/skl/Swordfish_manual/swordfish.odt.41660.skl, C:\\fr_issue\scripts\browser_automation\en_US\Converted\about.html, E:\dotnet.goglobalnow.net\work\prdlg.rc, cv_skeleton.skl, file:///C:/MyFolder/MyProject/article.htm, http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32003L0098:EN:HTML

id: Null, NMTOKEN, TestArray_2, DESTINATION, 850, long sentences and sentences spanning multiple lines, e.g. "A line OFFSET is a required '+' or '-' followed by a positive integer. After any flags comes an optional field width, as a decimal number; then an optional modifier, which is either E to use the locale's alternate representations if available, or O to use the locale's alternate numeric symbols if available."

match-mandatory: no

match-quality: 100, 100%, 78.46, fuzzy, String, Guaranteed

maxbyte/maxheight: 1, 30, NMTOKEN

maxwidth: -1, 100, NMTOKEN

mimetype: application, image, text/internal-tek

mtype: seg, protected, x-test

name: ...(document.getElementById(, SourceLanguageId, _ski_416103, o1615827

origin: Uploaded TMX File: Jade_FR_FR.zi:9:12, /Filesystem Projects/000_English/00957 HR Policies/00957_000_Sep05_partA_Title&TOC.doc, Gaya

original: null, /org/com/test.fil

phase-name: adhoc Export, Creation, NMTOKEN, PM review (4), Post-Editing (6), Project Coordinator Review (31), Quality Assurance, String, _ski_4838, create, p1, p2, prep1, rev1099, rev1290, rev1967, rev337, rev396, rev705, review, trans

pos: open, close, after, before

process-name: !!!TEST_WF_BA, Extraction, String, Terminology Management, WorldServer Asset Export, authoring, commit, development, leverage, preparation, translation

product-name: nxliff, LocWorks Studio, String, View:LoggerControlView[us.esme] of Component:LoggerControl[us.esme], XalanC, delim, iManager, ice, ldaphdlr, ldif, mergetl, messages, mpRealityAdmin, ndscheck, schhdlr

purpose: Location, Match, information, location, location x-test match x-abcdef, location x-test x-aaa match

reformat: yes, coord font

resname: null values, !Sepcials-test=<"&>

restype: Button, button, caption, combobox, comboboxitem, ctext, defpushbutton, dialog, dlginit, edit, file, groupbox, icon, label, list, listbox, listitem, ltext, menu, menuitem, popupmenu, pushbutton, radio, rtext, separator, table, string, Static, accelerator, acceleratortable, array, bin, check, dialoginfo, dialoginit, dialoginitentry, editfieldimport, informationMessage, inientries, int, menuentry, menupopup, radiobutton, rectangle, resourcegroup, static, stringtable, stringtableentry, summary, title, token, tokensgroup, version, versionstring, x-LABEL_KEY, x-Text, x-UI_MESSAGING_KEY, x-comboboxex32, x-dc:description, x-dc:subject, x-dc:title, x-dialogex, x-edittext, x-gettext-domain-header, x-gettext-plurals, x-glb80, x-icu-array, x-icu-binary, x-icu-integer, x-icu-table, x-ingame-string-text, x-keyval, x-meta:keyword, x-meta:user-defined, x-misc, x-mru80, x-msctls_progress32, x-msctls_trackbar32, x-msctls_updown32, x-office:annotation, x-page, x-paragraph, x-pr_arrayclass.8.32, x-pr_bitmapbutton.8.32, x-pr_super_shadowbox.8.32, x-pre, x-prglb, x-scrollbar, x-static, x-sysanimate32, x-sysdatetimepick32, x-syslistview32, x-systabcontrol32, x-systreeview32, x-text:h, x-text:index-title-template, x-text:note, x-text:p, x-toolbarwindow32, x-wined80, x-wpbmp80, x-wpbtn80, x-wpclr80, x-wpcnt80, x-wpcombo80, x-wpcombobox80, x-wpeditfield80, x-wpfne80, x-wplistbox80, x-wpnslb80, x-wppdowncombo80, x-wprbox80, x-wpsbox80, x-wpsqf80, x-wpstatic80, x-xml, x-{a65535ab-129c-4f65-a423-647217c79ccc}, x-{d27cdb6e-ae6d-11cf-96b8-444553540000}

source-language: EN, EN-US, ENGLISH, cs, da, de, de-DE, en, en-DE, en-US, en-us, en_US, es, fi, fr, fr-FR, it, ja, ko, pt, ru, sv, sv_SE, unknown, x-dev, zh_CN, zh_TW

state: final, needs-review-l10n, needs-review-translation, needs-translation, new, signed-off, translated, needs-review, updated, user:translated, x-reviewed

state-qualifier: exact-match

target-language: DE, DE-DE, FR-FR, GERMAN, IT-IT, PL, be, da, da-DK, de, de-de, de_DE, en, en-GB, en-US, en-us, en_US, es, es-ES, et, fr, fr-FR, fr-fr, ga, gl, hu-HU, is-IS, ja, ja-JP, ko, lt-LT, lv-LV, nb-NO, nn-NO, no, oc, pl, pt, pt-PT, ro, sl, tbd, th, tk, uz-UZ-Cyrl, uz-UZ-Latn, x-zz, zh-CHT, zh-CN, zh_CN

tool: Ektron, Idiom WorldServer 9.0.5, Idiom WorldServer 9.2.0, LKR, NAIL.LUI 1.1.0.26065, NAIL.LUI 1.5.0.28574, NAIL.LUI 1.6.0.21409, PASSOLO 3.0, Rainbow v2.00, Snap-On Ireland, CATFile_Translation_Utility, String, TM Search, TM-ABC, genrb, manual

tool-company: Ektron, Maxprograms, Mindark PE AB, World Wide Web Consortium - Internationalization Tag Set Interest Group; OASIS XLIFF TC

tool-id: Ektron, IGD-2-XLIFF, JavaPM, P25CBF2C2, Swordfish, benten, blanconlpackgenerator, genrb-3.3-icu-4.0, http://www.mindark.com/dbexport/1.0, nlpack, pleiades, prev, tbd, xliff

tool-name: AgdaVS export tool, Benten, Ektron, IGD-2-XLIFF, ITS Translate Decorator, Maxprograms JavaPM, Okapi.Utilities.Set01, Pleiades, Swordfish III, blancoNLpackGenerator, genrb

translate: yes, no

unit: word

xml:lang: "", DE, EN, EN-US, ENGLISH, FR-FR, GERMAN, PL, de, de-de, de_DE, en, en-GB, en-US, en-us, en_US, es, es-ES, fr, fr-FR, ja_JP, ko, pt, pt-PT, sl, tbd, th_TH, x-dev, zh-CN

xml:space: default, preserve

Table F1. Attributes and their Typical Values

Appendix G: XLIFF Usage Patterns Derived for Individual Organisations

Figure G1. Pattern Derived for Eclipse


Figure G2. Pattern Derived for Company A


Figure G3. Pattern Derived for Company B


Figure G4. Pattern Derived for Company C

Appendix H: LocConnect: System Architecture, Features, API and Installation Guide

H.1 Systems Architecture

This section describes the systems architecture of LocConnect in detail.

LocConnect is a web-based client-server system in which LocConnect acts as the server and the components attached to it act as clients. The design is based on a three-tier architecture, as depicted in Fig. H1. The implementation of the system is based on PHP and AJAX technologies.

Figure H1. Three-Tier Architecture of LocConnect

User interface tier - a client-based graphical user interface that runs on a standard web browser. The user interface provides facilities for project management, administration and tracking.

Middle tier - contains most of the logic and facilitates communication between the tiers. The middle tier mainly consists of a workflow engine and provides an open API with a common set of rules that define the connectivity of components and their input/output (IO) operations. The components simply deal with this interface in the middle tier.

Data Storage tier - uses a relational database for the storage and searching of XLIFF and other resource data. The same database is used to store information about individual projects.

Important components of the systems architecture are described below.

User Interface

Web-based graphical user interfaces have been developed for:

• capturing project parameters during project creation;
• tracking projects (to display the current status of projects);
• post-editing translations;
• configuring the server and localising the interface of LocConnect.

During project creation49, a web-based form is presented to a user (see Fig. H2). This form contains fields that are required by the Workflow Recommender50 to generate a workflow. Parameters entered through this interface will be stored in an XLIFF file along with an uploaded source file (or source text) and resource files. The project is assigned a unique ID through this interface and this ID is used throughout the project’s lifecycle.

49 Note that LocConnect projects can also be created through an API call (see Sect. H.3.6), by-passing the UI and the web form.
50 The Workflow Recommender is a separate component developed by a fellow CNGL doctoral researcher as a part of his PhD.

Figure H2. LocConnect Project Creation Interface

The project-tracking interface reflects the project’s workflow. It shows the current status of a project: pending, processing, or complete in relation to each component. It displays any feedback messages (such as errors, warnings) from components. The current workflow is shown in a graphical representation. Another important feature is a log of activities for the project. Changes to the XLIFF file (the changes of metadata) during different stages of the workflow can be tracked. The project-tracking interface uses AJAX technologies to dynamically update its content frequently (see Fig. H3).


Figure H3. Project Tracking UI

At the end of a project’s lifecycle, the user is given the option to post-edit its content using the built-in XLIFF post-editor interface. It displays source strings, translations, alternative translations and associated metadata. Translations can be edited through this interface. The Post-editing component also uses AJAX to update XLIFF files in the main datastore. See Fig. H4 for a screenshot of the post-editing interface. A preliminary target file preview mechanism has also been developed and integrated into the same UI.


Figure H4. Post-Editing Interface

A password-protected interface has been provided for the configuration of the LocConnect server. Through this interface, various configuration options, such as the LocConnect database path and component descriptions, can be edited. The same interface can be used to localise the LocConnect server itself (see Fig. H5 for a screenshot of the administrator's interface).


Figure H5. Administrator’s Interface

The user interfaces were implemented in PHP, JavaScript and XHTML, and use the jQuery library for graphical effects and dynamic content updates.

Middle tier: Application Programming Interface (API)

The LocConnect server implements a Representational State Transfer (REST) (Fielding 2000) based Application Programming Interface (API) to send and retrieve resources, localisation data and metadata between components through HTTP-GET and HTTP-POST operations using proper Uniform Resource Identifiers (URIs). These resources include:

• localisation projects;
• XLIFF files;
• resource files (i.e. files such as TBX, TMX, SRX);
• resource metadata (metadata to describe resource file content).

The LocConnect API provides functions for the following tasks:

• creating a LocConnect job (new_project method);
• retrieving a list of jobs pending for a particular component (list_jobs method);
• retrieving an XLIFF file corresponding to a particular job (get_job method);
• setting the status of a job; the status can be one of the following: pending, processing, or complete (set_status method);
• sending a feedback message to the server (send_feedback method);
• sending processed XLIFF files to the server (send_output method);
• sending a resource file (e.g. a non-XLIFF asset file) to the server (send_resource method);
• retrieving a resource file from the server (get_resource method);
• retrieving metadata associated with a resource file (get_metadata method).

A complete description of each REST-based function is provided below.

Creating LocConnect jobs (programmatically): new_project method

An alternative to creating LocConnect jobs through the web form is to use this API method (see Sect. H.3.6 for the full description). It takes four arguments: component ID, project name, project description and data. It will create a LocConnect project and return an XML response containing the LocConnect job ID assigned to the project. Job IDs are alphanumeric and consist of 10 characters. The component ID is a string (usually a short form of a component's name, such as WFR for Workflow Recommender). The data is usually assumed to be in XLIFF format; however, other UTF-8 encoded text-based resources can also be used.

This method uses the HTTP POST method to communicate with the LocConnect server.

Obtaining available jobs: list_jobs method

This method takes a single argument: component ID. It will return an XML containing the IDs of jobs pending for any given component. See example below.

16674f2698
633612fb37

This method uses the HTTP GET method to communicate with the LocConnect server.

Retrieving the XLIFF file corresponding to a particular job: get_job method

This method takes two arguments: component ID and job ID. It will return a file corresponding to the given job ID and component ID. Usually, the file is an XLIFF file, however it can be any text-based file. Therefore, the returned content is always enclosed within a special XML mark-up: ... The XML declaration of the returned file will be omitted in the output (i.e. will be stripped off from the output).

Example: the returned file contains the source string "Hello world" and the target string "Bonjour le monde" (mark-up not shown).

This method uses the HTTP GET method to communicate with the LocConnect server.

Setting current status: set_status method

This method takes three arguments: component ID, job ID, status. The status can be pending, processing or complete. Initially, the status of a job is set to pending by the LocConnect server to mark that a job is available for pick up by a certain component.

Once the job is picked up, the component will change the status of the job to processing. This ensures that the same job will not be re-allocated to the component. Once the status of a job is set to complete, LocConnect will perform the next action specified in the workflow.

This method uses the HTTP GET method to communicate with the LocConnect server.

Sending feedback message: send_feedback method

This method takes three arguments: component ID; job ID; feedback message. Components can send various messages (for example error messages and notifications) to the server through this method. These messages will be instantly displayed in the relevant job tracking page of the LocConnect interface. The last feedback message sent to the LocConnect server before sending the output file will be stored within the LocConnect server and it will appear in the activity log of the job. The messages are restricted to 256 words in length.

This method uses the HTTP GET method to communicate with the LocConnect server.

Sending a processed XLIFF file: send_output method

This method takes three arguments: component ID; job ID; and content. The content is usually a processed XLIFF file. Once the content is received by LocConnect, it will be stored within the LocConnect datastore. LocConnect will wait for the component to set the status of the job to complete and move on to the next step of the workflow.

This method uses the HTTP POST method to communicate with the LocConnect server.

Storing a resource file: send_resource method

This method takes one optional argument (resource ID) and two mandatory arguments (resource file and metadata description). The resource file should be in text format. Metadata has to be specified using the following notation:

Metadata notation: 'key1:value1-key2:value2-key3:value3'

e.g. 'language:en-domain:health'

If the optional argument resource ID is not given, LocConnect will generate an ID and assign that ID to the resource file. If the resource ID is given, it will overwrite the current resource file and metadata with the new resource file and metadata.
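The sketch below shows one way a component could assemble and read such a metadata string; the helper functions are illustrative only and are not part of the LocConnect API.

<?php
// Illustrative helpers for the metadata notation described above
// (these functions are not part of the LocConnect API itself).
function buildMetadata(array $pairs) {
    $parts = array();
    foreach ($pairs as $key => $value) {
        $parts[] = $key . ':' . $value;
    }
    return implode('-', $parts); // e.g. 'language:en-domain:health'
}

function parseMetadata($metadata) {
    $pairs = array();
    foreach (explode('-', $metadata) as $part) {
        list($key, $value) = explode(':', $part, 2);
        $pairs[$key] = $value;
    }
    return $pairs;
}

print_r(parseMetadata(buildMetadata(array('language' => 'en', 'domain' => 'health'))));
?>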

This method uses the HTTP POST method to communicate with the LocConnect server.

Retrieving a stored resource file: get_resource method

This method takes one argument: resource ID. Given the resource ID, the LocConnect server will return the resource associated with the given ID.

This method uses the HTTP GET method to communicate with the LocConnect server.

Retrieving metadata associated with a resource file: get_metadata method

This method takes one argument: resource ID. The LocConnect server will return the metadata associated with the given resource ID.

This method uses the HTTP GET method to communicate with the LocConnect server.

Implementation details of the LocConnect’s API functions are given in Sect. H.3.

Component-Server Communication Process

A typical LocConnect component-server communication process includes the following phases.

Step 1: list_jobs

The component calls the list_jobs method, specifying its component ID, to retrieve a list of the jobs available to it.

Step 2: get_job

This component uses get_job to retrieve the XLIFF file corresponding to the given job ID and the component ID.

A component may either process one job at a time or many jobs at once. However, the get_job method is only capable of returning a single XLIFF file at a time.

Step 3: set_status – Set status to processing

This component sets the status of the selected job to processing.

Step 4: Process file

The component processes the retrieved XLIFF file. It may send feedback messages to the server while processing the XLIFF file; these feedback messages will be displayed in the job-tracking interface of LocConnect.

Step 5: send_output

The component sends the processed XLIFF file back to the LocConnect server using the send_output method.

Step 6: set_status

The component sets the status of the selected job to complete. This will trigger the LocConnect server to move to the next stage of the workflow.
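Assuming a component that polls the server, the six steps above can be combined into a simple client loop. The sketch below is illustrative only: the lcGet and lcPost helpers merely wrap the PEAR HTTP_Request2 calls shown in Sect. H.3, and the server location and component ID are example values.

<?php
// Illustrative component polling loop for the LocConnect API (script names as in Sect. H.3).
require_once 'HTTP/Request2.php'; // uses PEAR

define('LC_BASE', 'http://localisation.ie/locConnect/'); // example server location
define('COM_ID', 'MT');                                  // example component ID

function lcGet($script, array $params) {
    $request = new HTTP_Request2(LC_BASE . $script, HTTP_Request2::METHOD_GET);
    $url = $request->getUrl();
    foreach ($params as $name => $value) {
        $url->setQueryVariable($name, $value);
    }
    return $request->send()->getBody();
}

function lcPost($script, array $params) {
    $request = new HTTP_Request2(LC_BASE . $script);
    $request->setMethod(HTTP_Request2::METHOD_POST);
    foreach ($params as $name => $value) {
        $request->addPostParameter($name, $value);
    }
    return $request->send()->getBody();
}

// Step 1: list the jobs pending for this component.
$pending = lcGet('fetch_job.php', array('com' => COM_ID));

// For the sketch, assume the response lists exactly one pending job ID.
$jobId = trim(strip_tags($pending));

if ($jobId !== '') {
    // Step 2: retrieve the XLIFF file corresponding to the job.
    $xliff = lcGet('get_job.php', array('id' => $jobId, 'com' => COM_ID));

    // Step 3: mark the job as being processed so it is not re-allocated.
    lcGet('set_status.php', array('id' => $jobId, 'com' => COM_ID, 'msg' => 'processing'));

    // Step 4: process the file (the component-specific work goes here) and report progress.
    $processed = $xliff;
    lcGet('send_feedback.php', array('id' => $jobId, 'com' => COM_ID, 'msg' => 'Processing finished'));

    // Step 5: return the processed file to the server.
    lcPost('send_output.php', array('id' => $jobId, 'com' => COM_ID, 'data' => $processed));

    // Step 6: mark the job as complete so that the workflow can advance.
    lcGet('set_status.php', array('id' => $jobId, 'com' => COM_ID, 'msg' => 'complete'));
}
?>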

Middle tier: Workflow Engine

A simple workflow engine has been developed and incorporated into the LocConnect server to allow for the management and monitoring of individual localisation jobs. The current workflow engine does not support parallel processes or branching. However, it allows the same component to be used several times in a workflow. The engine parses the workflow information found in the XLIFF data container (see Sect. 5.4) and stores the workflow information in the project management datastore. The project management datastore is then used to keep track of individual projects. In the current setup, setting the status of a component to complete will trigger the next action of the workflow.
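The trigger behaviour can be illustrated with the following sketch; the table and column names are assumed for illustration and do not reflect the actual LocConnect schema.

<?php
// Illustrative sketch: when a component reports 'complete', offer the job to the
// next component recorded in the stored workflow (schema names are assumed).
function advanceWorkflow(PDO $db, $jobId) {
    // Read the ordered list of workflow steps stored for this job.
    $steps = $db->prepare('SELECT component FROM workflow WHERE job_id = ? ORDER BY position');
    $steps->execute(array($jobId));

    foreach ($steps->fetchAll(PDO::FETCH_ASSOC) as $step) {
        $status = $db->prepare('SELECT status FROM job_status WHERE job_id = ? AND component = ?');
        $status->execute(array($jobId, $step['component']));
        if ($status->fetchColumn() !== 'complete') {
            // First step that is not yet complete: mark the job as pending for that component.
            $update = $db->prepare('UPDATE job_status SET status = ? WHERE job_id = ? AND component = ?');
            $update->execute(array('pending', $jobId, $step['component']));
            return $step['component'];
        }
    }
    return null; // every step is complete: the project has finished
}
?>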

LocConnect Datastore

The database design can be logically stratified into three layers:

• the main datastore holds XLIFF files;
• the project management datastore holds data about individual projects and their status;
• the resource datastore holds data and metadata about other resource files.

The main datastore is used to store XLIFF files corresponding to different jobs. It stores the different versions of the XLIFF file that correspond to a particular job.

Therefore, the LocConnect server also acts as a revision control system for localisation projects.

The project management datastore is used for storing the information necessary to keep track of individual localisation jobs with respect to localisation workflows. Furthermore, it is used to store various timestamps, such as the job pick-up time and job completion time, for the different components.

The resource datastore is used to store various asset files associated with localisation projects. The asset files can be of any text-based file format, such as TMX, XLIFF, SRX, TBX or XML. The components can store any intermediate, temporary or backup files in this datastore. The files can then be accessed at any stage during workflow execution. The resource files (i.e. asset files) can be described further using metadata. The metadata consists of key-value pairs associated with the resource files and can also be stored in the resource datastore.

SQLite was chosen as the default database for implementing the logical data structure in this prototype, for a number of reasons. Firstly, it can be easily deployed. It is lightweight and virtually no administration is required. Furthermore, it does not require any configuration.
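As an indication of how little setup is involved, the snippet below opens (or creates) a SQLite database file through PHP's PDO extension; the file and table names are illustrative and do not represent the actual LocConnect schema.

<?php
// Open (or create) an SQLite database file via PDO; no server process or configuration is required.
$db = new PDO('sqlite:./locconnect.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, xliff TEXT, status TEXT)');
?>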

LocConnect’s installation and configuration guide is given in Sect. H.4.

H.2 Features of LocConnect

H.2.1 Key Features

Key features of the LocConnect interoperability test-bed are summarised below.

Common XLIFF based Data Layer and Application Programming Interface

LocConnect implements a common XLIFF-based data store corresponding to individual localisation projects. The components can access this datastore through a simple API. Furthermore, the common datastore can also hold various supplementary resource files related to a localisation project. Components can manipulate these resource files through the API.

Workflow Engine

The orchestration of components is achieved via an integrated workflow engine that executes a localisation workflow generated by another component.

Live User Interface (UI)

One of the important aspects of a distributed processing scenario is the ability to track progress along the different components. An AJAX-powered UI has been developed to display the status of the components in real-time. Moreover, LocConnect’s UI has been developed in a manner that allows it to be easily localised into other languages.

Built-in post-editing component (XLIFF editor)

In the present architecture, localisation project creation and completion happens within LocConnect. In a typical localisation workflow, the last step is to post-edit content. Therefore, an online XLIFF editor was developed and incorporated into LocConnect in order to facilitate post-editing of content after project completion.

Component Simulator

In the current experimental setup, only a small number of components have been connected, most of them developed as part of the CNGL research at the University of Limerick and other participating research groups. The Workflow Recommender, Mapper, Leveraging Component and a Translation Rating component are among these. A component simulator was, therefore, developed to allow for further testing of interoperability issues in an automated localisation workflow using the LocConnect framework.

H.2.2 Business Case

While the LocConnect environment supports the ad-hoc connection of localisation components, it can also serve as cloud-based storage for localisation projects. These and other key advantages of LocConnect from a business point of view are highlighted below.

Cloud-based XLIFF and Resource File Storage

LocConnect can simply be used as cloud-based XLIFF storage. Moreover, due to its ability to store resource files (e.g. TMX, SRX), it can be used as a repository for localisation project files. As such, LocConnect offers a central localisation data repository which is easy to back up and maintain.

Revision Control System

During a project’s life cycle, the associated XLIFF data container continuously changes as it travels through different localisation components. LocConnect keeps track of these changes and stores different versions of the XLIFF data container. Therefore, LocConnect acts as a revision control system for localisation projects. LocConnect provides the facility to view both data and metadata associated with the data container at different stages of a workflow.

In-built Online XLIFF editor

Using the inbuilt online XLIFF editor, users can post-edit XLIFF content easily. The AJAX-based UI allows easy inline editing of content. Furthermore, the online editor shows alternative translations as well as useful metadata associated with each translation unit.

Access via Internet or Intranet

With its single-click installer, LocConnect can easily be deployed on the internet or an intranet. LocConnect can also act as a gateway application, where LocConnect is connected to the internet while the components safely reside within an intranet.

Enhanced Revenues

The LocConnect-centric architecture increases data-exchange efficiency as well as automation. Due to increased automation, we would expect lower localisation costs and increased productivity. The architecture also promotes data recycling by allowing previous XLIFF data containers to be reused.

H.3 Application Programming Interface (API) Description

H.3.1 Get Job Method

Function: GetJob(id, com)

Description: Fetch the file corresponding to a particular job ID.

Base URL (HTTP Request): locConnect/get_job.php

Query String Parameters:
id = job ID
* id is an ascii-based alphanumeric string (max 15 chars, lowercase)
com = component ID
* com should be an uppercase ascii string (max 6 chars)

Example URL: locConnect/get_job.php?id=880a0796e7&com=LMC

Response Format (XML): ... (file content goes here)

Error Response: either "component: ... not found" or "job: ... is not available for this component"

Client Implementation Example (in PHP):

<?php
header("Content-Type:text/html; charset=utf-8");

require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/get_job.php', HTTP_Request2::METHOD_GET);
$request->setHeader('Accept-Charset', 'utf-8');

$url = $request->getUrl();
$url->setQueryVariable('id', '7985aecae0'); // set the job id here
$url->setQueryVariable('com', 'LCM');       // set your component name here

// This will fetch the given job from the LocConnect server and store the content in the $file variable
$file = $request->send()->getBody();
echo $file;
?>

Notes: getJob(id, com) returns a utf-8 encoded XML response containing the file to be processed. The xml declaration of the content will be omitted (i.e. it will be stripped off in the output).

Uses GET Method

H.3.2 Fetch Job Method

Function: fetchJob(com)

Description: Get a list of available jobs for a particular component.

Base URL (HTTP Request): locConnect/fetch_job.php

Query String Parameters:
com = component ID
* com should be an uppercase ascii string (max 6 chars)

Example URL: locConnect/fetch_job.php?com=WFR

Response Format (XML): a list of pending job IDs (jobid)

Example Response:
16674f2698
633612fb37

Error Response: either "component: ... not found" or "no pending jobs available for this component"

Client Implementation Example (in PHP):

<?php
header("Content-Type:text/html; charset=utf-8");

require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/fetch_job.php', HTTP_Request2::METHOD_GET);
$request->setHeader('Accept-Charset', 'utf-8');

$url = $request->getUrl();
$url->setQueryVariable('com', 'WFR'); // set your component name here

// This will get the list of pending jobs from the LocConnect server and store it in the $jobs variable
$jobs = $request->send()->getBody();
echo $jobs;
?>

Notes: fetchJob(com) returns a utf-8 encoded XML response containing a list of pending jobs for a particular component. The xml declaration of the content is omitted (i.e. it will be stripped off in the output).

Uses GET Method

H.3.3 Set Job Status Method

Function: setJobStatus(id, com, msg)

Description: Sets the job status of a particular component.

Base URL (HTTP Request): locConnect/set_status.php

Query String Parameters:
com = component ID
* com should be an uppercase ascii string (max 6 chars)
id = job ID
* id is an ascii-based alphanumeric string (max 15 chars, lowercase)
msg = status
* msg should be one of: complete, pending or processing

Example URL: locConnect/set_status.php?id=16674f2698&com=WFR&msg=pending

Response Format (XML): Status Updated

Error Response: "Status was not updated. Ensure job id: ... is correct and component: ... has been assigned that job."

Client Implementation Example (in PHP):

<?php
require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/set_status.php', HTTP_Request2::METHOD_GET);
$request->setHeader('Accept-Charset', 'utf-8');

$url = $request->getUrl();
$url->setQueryVariable('com', 'WFR');       // set your component name here
$url->setQueryVariable('id', '16674f2698'); // set the job id here
$url->setQueryVariable('msg', 'pending');   // set the status here

// This will get the server response
$response = $request->send()->getBody();
echo $response;
?>

Notes: The xml declaration of the content is omitted (i.e. it will be stripped off in the output).

Uses GET Method

H.3.4 Send Feedback Method

Function: sendFeedback(id, com, msg)

Description: Send feedback from a particular component corresponding to a specific job.

Base URL (HTTP Request): locConnect/send_feedback.php

Query String Parameters:
com = component ID
* com should be an uppercase ascii string (max 6 chars)
id = job ID
* id is an ascii-based alphanumeric string (max 15 chars, lowercase)
msg = feedback
* feedback should be a url-encoded string containing not more than 250 (utf-8) characters

Example URL: locConnect/send_feedback.php?id=16674f2698&com=WFR&msg=Globalsight workflow generated

Response Format (XML): Feedback Updated

Error Response: "Feedback was not updated. Ensure job id: ... is correct and component: ... has been assigned that job."

Client Implementation Example (in PHP):

<?php
header("Content-Type:text/html; charset=utf-8");

require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/send_feedback.php', HTTP_Request2::METHOD_GET);
$request->setHeader('Accept-Charset', 'utf-8');

$url = $request->getUrl();
$url->setQueryVariable('com', 'WFR');       // set your component name here
$url->setQueryVariable('id', '16674f2698'); // set the job id here
$url->setQueryVariable('msg', 'Globalsight workflow generated'); // set your component's feedback here

// This will get the server response
$response = $request->send()->getBody();
echo $response;
?>

Notes: The xml declaration of the content is omitted (i.e. it will be stripped off in the output).

Uses GET Method

H.3.5 Send Output Method

Function: sendOutput(id, com, data)

Description: Send the output of a particular component corresponding to a specific job.

Base URL (HTTP Request): locConnect/send_output.php

Query String Parameters:
com = component ID
* com should be an uppercase ascii string (max 6 chars)
id = job ID
* id is an ascii-based alphanumeric string (max 15 chars, lowercase)
data = file content
* file content can be either xml or text content (in utf-8 encoding)

Response Format (XML): Output Accepted

Error Response: "Output was not updated. Ensure job id: ... is correct and component: ... has been assigned that job."

Client Implementation Example (in PHP):

<?php
header("Content-Type:text/xml");

require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/send_output.php');

// Sample XLIFF payload (element structure illustrative; any utf-8 XML or text content is accepted)
$data = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" .
        "<xliff version=\"1.2\">\r\n" .
        " <file original=\"foo.bar\" source-language=\"en\" datatype=\"plaintext\">\r\n" .
        "  <body>\r\n" .
        "   <trans-unit id=\"42\">\r\n" .
        "    <source>Hello, world!</source>\r\n" .
        "   </trans-unit>\r\n" .
        "  </body>\r\n" .
        " </file>\r\n" .
        "</xliff>";

$request->setMethod(HTTP_Request2::METHOD_POST)
        ->addPostParameter('id', '880a0796e7')
        ->addPostParameter('com', 'LKR')
        ->addPostParameter('data', $data);

try {
    $response = $request->send();
    if (200 == $response->getStatus()) {
        echo $response->getBody();
    } else {
        echo 'Unexpected HTTP status: ' . $response->getStatus() . ' ' . $response->getReasonPhrase();
    }
} catch (HTTP_Request2_Exception $e) {
    echo 'Error: ' . $e->getMessage();
}
?>

Notes: The xml declaration of the content is omitted (i.e. it will be stripped off).

Uses POST Method

Output can contain utf-8 encoded strings.

H.3.6 New Project

Function: createNewProject(com, pname, pdesc, data)

Description: Create a new LocConnect job via a direct API call.

Base URL (HTTP Request): locConnect/new_project.php

Query String Parameters:
com = component ID (mandatory)
* com should be an uppercase ascii string (max 6 chars)
pname = project name (optional)
pdesc = project description (optional)
data = file content (mandatory)
* XLIFF content is assumed, although other text-based contents are supported (in utf-8 encoding)

Response Format (XML): the new job ID, e.g. 124D1334E

Error Response: (error)

Client Implementation Example (in PHP):

<?php
header("Content-Type:text/xml");

require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/new_project.php');

// Sample XLIFF payload (element structure illustrative; any utf-8 text content is accepted)
$data = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" .
        "<xliff version=\"1.2\">\r\n" .
        " <file original=\"foo.bar\" source-language=\"en\" datatype=\"plaintext\">\r\n" .
        "  <body>\r\n" .
        "   <trans-unit id=\"42\">\r\n" .
        "    <source>Hello, world!</source>\r\n" .
        "   </trans-unit>\r\n" .
        "  </body>\r\n" .
        " </file>\r\n" .
        "</xliff>";

$request->setMethod(HTTP_Request2::METHOD_POST)
        ->addPostParameter('pname', 'New Project')
        ->addPostParameter('com', 'MYCOM')
        ->addPostParameter('data', $data);

try {
    $response = $request->send();
    if (200 == $response->getStatus()) {
        echo $response->getBody();
    } else {
        echo 'Unexpected HTTP status: ' . $response->getStatus() . ' ' . $response->getReasonPhrase();
    }
} catch (HTTP_Request2_Exception $e) {
    echo 'Error: ' . $e->getMessage();
}
?>

Notes: The xml declaration of the content is omitted (i.e. it will be stripped off).

Uses POST Method

H.3.7 Send Resource

Function: sendResource(type, data, metadata, desc)

Description: Store supplementary files (i.e. resources) in LocConnect.

Base URL (HTTP Request): locConnect/send_resource.php

Query String Parameters:
type = identifier for the resource, e.g. the file type/extension (lmc, tmx, xliff)
metadata = metadata to describe the resource, in the format "attribute1:value1-attribute2:value2-attribute3:value3" (i.e. a list of attribute:value pairs in a single string, separated by hyphens)
desc = description of the resource/file
data = file content

Response Format (XML): (resource ID)

Error Response: (error)

Client Implementation Example (in PHP):

<?php
header("Content-Type:text/xml");

require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/send_resource.php');

// Sample resource payload (element structure illustrative; any utf-8 text content is accepted)
$data = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" .
        "<xliff version=\"1.2\">\r\n" .
        " <file original=\"foo.bar\" source-language=\"en\" datatype=\"plaintext\">\r\n" .
        "  <body>\r\n" .
        "   <trans-unit id=\"42\">\r\n" .
        "    <source>Hello, world!</source>\r\n" .
        "   </trans-unit>\r\n" .
        "  </body>\r\n" .
        " </file>\r\n" .
        "</xliff>";

$request->setMethod(HTTP_Request2::METHOD_POST)
        ->addPostParameter('desc', 'resource desc')
        ->addPostParameter('metadata', 'domain:test')
        ->addPostParameter('data', $data);

try {
    $response = $request->send();
    if (200 == $response->getStatus()) {
        echo $response->getBody();
    } else {
        echo 'Unexpected HTTP status: ' . $response->getStatus() . ' ' . $response->getReasonPhrase();
    }
} catch (HTTP_Request2_Exception $e) {
    echo 'Error: ' . $e->getMessage();
}
?>

Notes: If you specify an ID, the resource being sent will overwrite the current resource with the specified ID (this allows components to update an existing resource). Otherwise, if you leave it blank (i.e. ""), LocConnect will generate a resource ID, assign it to the new resource and return the ID in the response. The format of the ID is the same as for LocConnect job IDs.

Uses POST Method

H.3.8 Get Resource

Function: getResource(id)

Description: Retrieve supplementary files (i.e. resources) from LocConnect.

Base URL (HTTP Request): locConnect/get_resource.php

Query String Parameters:
id = resource identifier

Response Format (XML): ... (resource content)

Error Response: (error)

Client Implementation Example (in PHP):

<?php
require_once 'HTTP/Request2.php'; // uses PEAR

$request = new HTTP_Request2('http://localisation.ie/locConnect/get_resource.php', HTTP_Request2::METHOD_GET);
$request->setHeader('Accept-Charset', 'utf-8');

$url = $request->getUrl();
$url->setQueryVariable('id', '16674f2698'); // set the resource id here

// This will get the server response
$response = $request->send()->getBody();
echo $response;
?>

Notes: Uses GET Method

H.4 LocConnect Installation and Configuration Guide

H.4.1 Installation Requirements

• A web server running PHP (5.2.13 or higher) with the PEAR, PDO and PDO SQLite extensions enabled.
  o Apache 2.2 or higher is recommended for the web server.
  o PHP installation instructions are available at www.php.net
  o PEAR installation instructions are available at http://pear.php.net/manual/en/installation.php

H.4.2 Recommended Settings

Recommended PHP Settings max_execution_time = 240

max_input_time = 60

memory_limit = 256M

display_errors = Off

post_max_size = 20M

magic_quotes_gpc = Off

magic_quotes_runtime = Off

magic_quotes_sybase = Off

default_mimetype = "text/html"

default_charset = "utf-8"

file_uploads = On

upload_max_filesize = 20M

extension=php_pdo.dll

extension=php_pdo_sqlite.dll

extension=php_sqlite.dll

extension=php_xmlrpc.dll

extension=php_xsl.dll

extension=php_zip.dll

Recommended Apache Settings

EnableMMAP off

EnableSendfile off

Win32DisableAcceptEx

Folder and File Permissions

The Inet user / Apache user should have full control (i.e. including write permission) over the following files and folders:

Files: conf.php and conf-back.php

Folders:
1) LocConnect installation folder
2) LocConnect upload folder (see Sect. H.4.3 'Installation' below)

H.4.3 Installation

Step 1: Create a folder to store uploaded files on the server. This folder is referred to as the 'upload' folder in this document. Grant 'read/write' permissions to this folder.

Step 2: Copy/move the iLocConnect-<version>-<language>.php script into the desired folder within the web root.

Step 3: Execute iLocConnect-<version>-<language>.php through the browser (e.g. iLocConnect-v27-en.php).

Step 4: Delete the iLocConnect-<version>-<language>.php script from the web server.

Step 5: Configure LocConnect by editing necessary parameters in the conf.php (see Configuration section).

This can also be carried out by logging in to the Admin panel (request the Admin password from the author).

Edit BASE_UPLOAD_PATH to include the correct path to the folder created in Step 1. Usually, BASE_UPLOAD_PATH is defined on the first line of conf.php.
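For example, assuming the upload folder created in Step 1 sits inside the LocConnect installation folder, the entry might read:

define('BASE_UPLOAD_PATH', './upload/'); // example value: path to the folder created in Step 1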

Step 6: Launch LocConnect by visiting index.php.

H.4.4 Configuration

Step 1: Open conf.php in a text editor.

Step 2: Configure LocConnect Database path.

e.g. define('BASE_DB_URL', './');

e.g. define('BASE_DB_URL', 'http://www.yourdomain.com/db/en/');

Step 3: Configure components (See Adding New Components section).

Step 4: Set up language packs and the respective directories to store them. Modify the $languages array to include the installed language packs (see Sect. H.4.6, Adding New Language Packs).

$languages = array(

"en" => ".",

"es" => "./es");

The two-letter language code and the directory of the language pack have to be specified in the manner shown above.

H.4.5 Adding New Components

Step 1: Open conf.php in a text editor.

Step 2: Enter component short and long names in the $arr variable as given below.

Example:

$arr = array(

"LKR" => "Localisation Knowledge Repository",

"WFR" => "Workflow Recommender",

263 "LMC" => "Localisation Memory Container",

"RT" => "Translation Rating",

"MT" => "Machine Translation" );

Step 3: Include three 100x101 (px) PNG images and name them according to the following convention:

<component short name>-red.png

<component short name>-grey.png

<component short name>-green.png

Example:

WFR-red.png, WFR-grey.png, WFR-green.png

H.4.6 Adding New Language Packs

Step 1: Create a folder named with the two-letter language code.

Step 2: Install relevant LocConnect language pack into the created folder (e.g. es) - see LocConnect installation instructions

Step 3: Configure conf.php (especially the $languages parameter).

H.4.7 Security

Restrict access to the SQLite database through .htaccess by adding the following lines:

order allow,deny
deny from all

Do this for all language packs.

Appendix I: Publications, Presentations, Posters and Achievements

I.1 Publications from PhD Research

I.1.1 Invention Disclosures

• Wasala, A. and Schäler, R. (2012) 'LocConnect - Localisation Orchestration Framework', Invention Disclosure No. 2006164, University of Limerick, Limerick, Ireland.

I.1.2 Book Chapters

• Aouad, L., O'Keeffe, I. R., Collins, J. J., Wasala, A., Nishio, N., Morera, A., Morado, L., Ryan, L., Gupta, R. and Schäler, R. (2011) 'A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios' in Lee, G., Howard, D. and Ślęzak, D., eds., Convergence and Hybrid Information Technology, Springer Berlin Heidelberg, 371-382.

I.1.3 Peer-Reviewed Journal Papers

• Wasala, A., Schmidtke, D. and Schäler, R. (2012) 'XLIFF and LCX Format: A Comparison', Localisation Focus, 11(1), 67-79.

• Wasala, A., O'Keeffe, I. and Schäler, R. (2011) 'Towards an Open Source Localisation Orchestration Framework', Tradumàtica, 9, 84-100.

• Wasala, A., O'Keeffe, I. and Schäler, R. (2011) 'LocConnect: Orchestrating Interoperability in a Service-oriented Localisation Architecture', Localisation Focus, 10(1), 29-43.

I.1.4 Conference Proceedings and Presentations

• Filip, D. and Wasala, A. (2013) 'Addressing Interoperability Issues of XLIFF: Towards the Definition and Classification of Agents and Processes', in 4th International XLIFF Symposium, FEISGILTT 2013, London, UK, 11-12 Jun (accepted).

• Wasala, A., Filip, D., Exton, C. and Schäler, R. (2012a) 'Making Data Mining of XLIFF Artefacts Relevant for the Ongoing Development of the XLIFF Standard', in 3rd International XLIFF Symposium, FEISGILTT 2012, Seattle, USA, 17-19 Oct.

• Filip, D., Lewis, D., Wasala, A., Jones, D. and Finn, L. (2012) 'CMSL10n <-> SOLAS Integration as an ITS 2.0 <-> XLIFF test bed', in W3C MultilingualWeb (ITS 2.0) Track, FEISGILTT 2012 (collocated with Localization World 2012), Seattle, USA, 17-19 Oct.

• Wasala, A., Schmidtke, D. and Schäler, R. (2010) 'LCX and XLIFF: A Comparison', in 1st XLIFF International Symposium, Limerick, Ireland, 22 Sept, University of Limerick.

• Wasala, A. (2011) 'Addressing the Interoperability in Localisation Processes', in Science Foundation Ireland Digital Content Workshop, Dublin, Ireland, 25 Jul.

• Wasala, A. (2010) 'LocConnect', in Action for Global Information Sharing (AGIS), New Delhi, India, 6-7 Dec.

I.1.5 Demonstrations and Posters

• Wasala, A. (2012) 'LocConnect: Orchestrating Interoperability in a Service-oriented Localisation Architecture', in CNGL Localisation Innovation Showcase, Limerick, Ireland, 21 Sept (poster).

• Wasala, A. (2012) 'LocConnect: Orchestrating Interoperability in a Service-oriented Localisation Architecture', in SFI Site Visit 2012, Trinity College, Dublin, Ireland, 11 Jul (poster).

• Filip, D., Wasala, A., Lewis, D., Finn, L. and Jones, D. (2012) 'CMS-LION<->SOLAS Integration: Full Content Lifecycle Metadata Interoperability Test-Bed', in SFI Site Visit 2012, Trinity College Dublin, Dublin, Ireland, 11 Jul (live demo).

• Wasala, A. (2011) 'Addressing the Interoperability in Localisation Processes', in CNGL Localisation Innovation Showcase, Croke Park Conference Centre, Dublin, Ireland, 16 Nov (poster).

• O'Conchuir, E., Morera Mesa, A., Ryan, L., Nishio, N., Morado-Vazquez, L. and Wasala, A. (2010) 'Bulk Localisation Workflow Demonstrator ("SOLAS")', in CNGL Localisation Innovation Showcase, Microsoft European Development Centre, Dublin, Ireland, 10 Nov (live demo).

• Wasala, A., Schmidtke, D. and Schäler, R. (2010) 'XLIFF and LCX Format: A Comparison', in CNGL Public Localisation Innovation Showcase, Microsoft European Development Centre, Dublin, Ireland, 10 Nov (poster).

• Wasala, A., Lenker, M. and Gizaw, S. (2010) 'LocConnect - A Localisation Platform', in CNGL Public Localisation Innovation Showcase, Microsoft European Development Centre, Dublin, Ireland, 10 Nov (poster).

• Wasala, A., Schmidtke, D. and Schäler, R. (2010) 'XLIFF and LCX Format: A Comparison', in Action for Global Information Sharing (AGIS), New Delhi, India, 6-7 Dec (poster).

I.1.6 Journal Papers in Preparation

• Wasala, A., Schäler, R., Buckley, J. and Exton, C. (2013) 'An Empirical Framework for Evaluating Interoperability of Data-Exchange Formats: A Case Study on XLIFF'.

I.2 Other Related Publications during PhD Studies

I.2.1 Book Chapters

• Wasala, A., Schäler, R., Buckley, J., Weerasinghe, R. and Exton, C. (2013) 'Building Multilingual Language Resources in Web Localisation: A Crowdsourcing Approach' in Gurevych, I. and Kim, J., eds., The People's Web Meets NLP: Collaboratively Constructed Language Resources, Springer Verlag Berlin Heidelberg (in press).

I.2.2 Peer-Reviewed Journal Papers

• Exton, C., Wasala, A., Buckley, J. and Schäler, R. (2009) 'Micro Crowdsourcing: A new Model for Software Localisation', Localisation Focus, 8(1), 81-89.

I.2.3 Conference Proceedings and Presentations

• Wasala, A., Schäler, R., Weerasinghe, R. and Exton, C. (2012) 'Collaboratively Building Language Resources while Localising the Web', in Proceedings of the 3rd Workshop on the People's Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, ACL 2012, Jeju, Republic of Korea, 15-19 Jul, Association for Computational Linguistics, 15-19.

• Wasala, A., Exton, C., Weerasinghe, R. and Schäler, R. (2011) 'A Micro Crowdsourcing Architecture to Localize Web Content for Less-Resourced Languages', in W3C Workshop: A Local Focus for the Multilingual Web, Limerick, Ireland, 21 Sept.

• Exton, C. and Wasala, A. (2010) 'Micro Crowdsourcing: A new Model for Software Localisation', in CNGL Autumn 2010 Scientific Committee Meeting (under Mature Research Presentations category), The Hub, Dublin City University, Dublin, Ireland, 12 Nov.

• Wasala, A., Exton, C. and Schäler, R. (2009) 'Real Time Localisation', in Action for Global Information Sharing (AGIS), Limerick, Ireland, 22 Sept.

I.2.4 Achievements

• Won 2nd place in the CNGL Thesis in Three Competition, November 13, 2010, Ginger Man, Dublin, Ireland, and represented CNGL in the all-Ireland Thesis in Three competition, November 17, 2010, Sugar Club, Dublin, Ireland (among the final 19 presenters).

• Won the LRC Best Scholar Award in 2009, Localisation Research Centre, University of Limerick, Ireland.

• Voting Member, XLIFF Technical Committee (2012 – 2013).

• Member of the Programming Committee of the edited volume "The People's Web Meets NLP: Collaboratively Constructed Language Resources", Springer book series "Theory and Applications of Natural Language Processing", E. Hovy, M. Johnson and G. Hirst (eds.) (2013).

• Member of the Programming Committee of the 4th XLIFF Symposium, 2013.
