Master of Computer Science

UNRESTRICTED MULTILINGUAL SUPPORT BUILT ON HISTORICAL OPERATING SYSTEM

A thesis submitted in partial fulfilment of the requirements for the award of the degree Master of Computer Science (18 Credit project) from UNIVERSITY OF NEW SOUTH WALES

by Zheng-Yu JU

School of Computer Science & Engineering
February 1996

CERTIFICATE OF ORIGINALITY

I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of a university or other institute of higher learning, except where due acknowledgement is made in the text. I also declare that the intellectual content of this thesis is the product of my own work, even though I may have received assistance from others on style, presentation and language expression.

ABSTRACT

The Unicode standard encoding makes it possible to develop unrestricted multilingual applications. In this thesis we explain the current situation and discuss the encoding standard of the future, Unicode, and its ASCII-compatible variant in detail. We also explain how the current implementation of the X Window System supports internationalization, and describe how this windowing system can be extended with the Unicode standard encoding to provide unrestricted multilingual support on a traditional operating system.

ACKNOWLEDGMENTS

Firstly, I would like to express my gratitude and special thanks to my supervisor Associate Professor John Lions for his guidance, support and valuable suggestions in the research and development of this thesis.

Secondly, I would like to sincerely thank my supervisor Dr John Zic for his guidance, encouragement and in-depth review of my thesis. His comments have significantly improved it. I would also like to thank him for his patience and time on occasions beyond normal hours.

Thirdly, I would like to thank Mr. Raymond and the CSG officers for the help they have generously provided.

I would also like to acknowledge and thank the academic and support staff of the School of Computer Science and Engineering, University of New South Wales, who have created one of the most pleasant study and research environments in computing that I have encountered.

Finally, thanks to my wife for her full support and patience, to my mother-in-law for taking care of my newborn daughter, and to other family members for their support.

TABLE OF CONTENTS

Chapter 1 Introduction
1.1 Encoding methods
1.2 Objective of the thesis
1.3 Overview of the thesis

Chapter 2 Character Sets
2.1 Introduction
2.2 Current Situation
  2.2.1 ASCII, ISO 646, NRCS
  2.2.2 ISO8859
  2.2.3 Han (Chinese/Japanese/Korean) characters
  2.2.4 Character set switching
2.3 Towards a universal coded character set
  2.3.1 How many characters are there?
  2.3.2 Writing Systems
  2.3.3 What is a Character?
  2.3.4 Character and Glyph
  2.3.5 Keysym
2.4 Standard character sets of the future
  2.4.1 ISO/IEC DIS 10646
  2.4.2 Unicode Character Encoding Standard
  2.4.3 Goal of the Unicode
  2.4.4 Conformance
  2.4.5 Coverage
  2.4.6 Unification of Unicode and 10646
  2.4.7 Code structure of the 10646
  2.4.8 Unicode Standard Codepoints Assignment
2.5 UTF-8
2.6 UTF-7

Chapter 3 Internationalization and Multilingual Support in X
3.1 Introduction
3.2 Background
3.3 Internationalisation In X
  3.3.1 Internationalisation with ANSI-C
  3.3.2 Text Representation
  3.3.3 ISO8859-1 and Other Encodings
  3.3.4 Multi-byte and Wide-Character Strings
  3.3.5 Locale Management
  3.3.6 Internationalised Text Output
  3.3.7 String Encoding for Internationalisation
  3.3.8 Internationalised Interclient Communication
  3.3.9 Localisation of Resource Database
3.4 Multilingual support in X
  3.4.2 Extending X
  3.4.3 Font
    3.4.3.1 Homogenised typeface
    3.4.3.2 Harmonised typeface
    3.4.3.3 One big font vs. many little fonts
  3.4.4 Multilingual Text Output
  3.4.5 Multilingual Interclient Communication
  3.4.6 Multilingual Resource Database

Chapter 4 Text Input
4.1 Introduction
4.2 Background
4.3 Input Method
4.4 Architecture
  4.4.1 Client/Server model vs. Library Model
4.5 Connect to Input Method
4.6 Input Context
  4.6.1 Input Context Focus Management
  4.6.2 Preedit and Status Area Geometry Management
  4.6.3 Preedit and Status Callbacks
  4.6.4 Getting Composed Input
  4.6.5 Event Handling
    4.6.5.1 BackEnd Method
    4.6.5.2 FrontEnd Method
4.7 Layering of IM
4.8 Multilingual Text Input
  4.8.1 Achieving the Goal
4.9 Application Programming
  4.9.1 Programming based on Xlib or higher level toolkits
4.10 Miscellaneous

Chapter 5 Validation and Conclusion
5.1 Introduction
5.2 What have been achieved
5.3 Future research directions
5.4 Problem cannot be solved
5.5 Conclusion

References
Character Set Standards
Language and Writing System
Asian Language Input
Internationalisation
X Window System

Appendix A
Important Data Structures

1 INTRODUCTION

1.1 Encoding methods

Modern computer systems have their origins in the United States and Great Britain, so it is natural that early computer systems supported a single language: English. About 45 years after the invention of the first electronic computer, hardware has advanced so far that even a desktop personal computer has enough power to support multilingual systems and applications.
