Master of Computer Science

UNRESTRICTED MULTILINGUAL SUPPORT BUILT ON HISTORICAL OPERATING SYSTEM

A thesis submitted in partial fulfilment of the requirements for the award of the degree Master of Computer Science (18 Credit project) from UNIVERSITY OF NEW SOUTH WALES

by Zheng-Yu JU

School of Computer Science & Engineering
February 1996

CERTIFICATE OF ORIGINALITY

I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of a university or other institute of higher learning, except where due acknowledgement is made in the text. I also declare that the intellectual content of this thesis is the product of my own work, even though I may have received assistance from others on style, presentation and language expression.

ABSTRACT

The Unicode standard encoding makes it possible to develop unrestricted multilingual applications. In this thesis we explain the current situation and discuss in detail the encoding standard of the future, Unicode, and its ASCII-compatible variant. We also explain how the current implementation of the X Window System supports internationalization, and describe how this windowing system can be extended by using the Unicode standard encoding to provide unrestricted multilingual support based on a traditional operating system.

ACKNOWLEDGMENTS

Firstly, I would like to express my gratitude and special thanks to my supervisor Associate Professor John Lions for his guidance, support and valuable suggestions in the research and development of this thesis.

Secondly, I would like to sincerely thank my supervisor Dr John Zic for his guidance, encouragement and in-depth review of my thesis. His comments have significantly improved it. I would also like to thank him for his patience and time on occasions beyond normal hours.

Thirdly, I would like to thank Mr. Raymond and the CSG officers for the help they have generously provided. I would also like to acknowledge and thank the academic and support staff of the School of Computer Science and Engineering, University of New South Wales, who have created one of the most pleasant study and research environments in computing that I have encountered.

Finally, thanks to my wife for her full support and patience, my mother-in-law for taking care of my newborn daughter, and other family members for their support.

TABLE OF CONTENTS

Chapter 1 Introduction
  1.1 Encoding methods
  1.2 Objective of the thesis
  1.3 Overview of the thesis

Chapter 2 Character Sets
  2.1 Introduction
  2.2 Current Situation
    2.2.1 ASCII, ISO 646, NRCS
    2.2.2 ISO8859
    2.2.3 Han (Chinese/Japanese/Korean) characters
    2.2.4 Character set switching
  2.3 Towards a universal coded character set
    2.3.1 How many characters are there?
    2.3.2 Writing Systems
    2.3.3 What is a Character?
    2.3.4 Character and Glyph
    2.3.5 Keysym
  2.4 Standard character sets of the future
    2.4.1 ISO/IEC DIS 10646
    2.4.2 Unicode Character Encoding Standard
    2.4.3 Goal of the Unicode
    2.4.4 Conformance
    2.4.5 Coverage
    2.4.6 Unification of Unicode and 10646
    2.4.7 Code structure of the 10646
    2.4.8 Unicode Standard Codepoints Assignment
  2.5 UTF-8
  2.6 UTF-7

Chapter 3 Internationalization and Multilingual Support in X
  3.1 Introduction
  3.2 Background
  3.3 Internationalisation In X
    3.3.1 Internationalisation with ANSI-C
    3.3.2 Text Representation
    3.3.3 ISO8859-1 and Other Encodings
    3.3.4 Multi-byte and Wide-Character Strings
    3.3.5 Locale Management
    3.3.6 Internationalised Text Output
    3.3.7 String Encoding for Internationalisation
    3.3.8 Internationalised Interclient Communication
    3.3.9 Localisation of Resource Database
  3.4 Multilingual support in X
    3.4.2 Extending X
    3.4.3 Font
      3.4.3.1 Homogenised typeface
      3.4.3.2 Harmonised typeface
      3.4.3.3 One big font vs. many little fonts
    3.4.4 Multilingual Text Output
    3.4.5 Multilingual Interclient Communication
    3.4.6 Multilingual Resource Database

Chapter 4 Text Input
  4.1 Introduction
  4.2 Background
  4.3 Input Method
  4.4 Architecture
    4.4.1 Client/Server model vs. Library Model
  4.5 Connect to Input Method
  4.6 Input Context
    4.6.1 Input Context Focus Management
    4.6.2 Preedit and Status Area Geometry Management
    4.6.3 Preedit and Status Callbacks
    4.6.4 Getting Composed Input
    4.6.5 Event Handling
      4.6.5.1 BackEnd Method
      4.6.5.2 FrontEnd Method
  4.7 Layering of IM
  4.8 Multilingual Text Input
    4.8.1 Achieving the Goal
  4.9 Application Programming
    4.9.1 programming based on Xlib or higher level toolkits
  4.10 Miscellaneous

Chapter 5 Validation and Conclusion
  5.1 Introduction
  5.2 What have been achieved
  5.3 Future research directions
  5.4 Problem cannot be solved
  5.5 Conclusion

References
  Character Set Standards
  Language and Writing System
  Asian Language Input
  Internationalisation
  X Window System

Appendix A Important Data Structures

1 INTRODUCTION

1.1 Encoding methods

Modern computer systems have their origins in the United States and Great Britain; it is natural, then, that the early computer systems supported a single language: English. About 45 years after the invention of the first electronic computer, hardware is now so advanced that even a desktop personal computer has enough power to support multilingual systems or applications.