Character Encoding


Unicode
J. Schneeberger, University of Applied Sciences Deggendorf, [email protected]

Unicode and Character Sets
With the help of Olaf Winterstein (Greifswald 08)

What is Unicode?
• An international standard
• Goal: unify all characters of all languages worldwide
  – Until then, there were various national standards.
  – Standardization and integration of these standards
• Published by the Unicode Consortium (founded 1991)
• Its repertoire and code points match ISO/IEC 10646

ASCII – the base
• American Standard Code for Information Interchange (ASCII)
• 128 characters, representable in 7 bits
• Hexadecimal: XY (X = 0..7, Y = 0..F)

ISO-Latin-1
• Western characters
• 256 characters
• 8 bits
• Hexadecimal: XY (X = 0..F, Y = 0..F)
http://www.unicode.org/charts/PDF/U0000.pdf
http://www.unicode.org/charts/PDF/U0080.pdf

More character sets
• Balinese
• But also Latin script

... (indefinitely) more characters
• 65,536 places for characters
  – 256 tables of 256 characters
  – numbered from 0 to 65,535
  – hexadecimal 0000 to FFFF
• For readability: 256 blocks with 256 entries
• e.g. block 00 for all entries 0000 to 00FF
• Normally: complete blocks for character systems
• Now: max. 1,114,112 code points (17 planes of 65,536 each)
• Unicode Consortium: http://www.unicode.org/

[Figures: Unicode Character Set; Unicode Plane 0 (source: Wikipedia)]

Unicode Planes

Plane  Range          Name                                  Abbr.  Description
0      0000-FFFF      Basic Multilingual Plane              BMP    Integration of old character sets
1      10000-1FFFF    Supplementary Multilingual Plane      SMP    Historic characters, music, mathematics
2      20000-2FFFF    Supplementary Ideographic Plane       SIP    Han unification (40,000 characters)
3-13   30000-DFFFF    unassigned
14     E0000-EFFFF    Supplementary Special-purpose Plane   SSP    Non-graphical characters, e.g. language tags
15     F0000-FFFFF    Private Use Area                      PUA    Private usage
16     100000-10FFFF  Private Use Area                      PUA    Private usage

Unicode Plane 1
• Linear B (15th to 13th century BC)
• Ancient Greek
• Cuneiform script
• Old Italic
• Gothic
• Old Persian
• Osmanya
• Phoenician
• ...
[Wikipedia]
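
The plane and block arithmetic from the slides above can be checked with a few lines of Python. This is only a sketch: the helper names `plane_of` and `block_of` are made up for illustration and are not part of any Unicode library. It assumes that a plane is simply the code point divided by 65,536 and that a block, in the slides' sense, is one of the 256 rows of 256 entries inside a plane.

```python
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def plane_of(code_point):
    """Return the plane number (0..16) of a code point (hypothetical helper)."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # each plane holds 0x10000 = 65,536 code points


def block_of(code_point):
    """Return the 256-entry block (row) inside the plane (hypothetical helper)."""
    return (code_point >> 8) & 0xFF


# One Latin letter, one Han character, one musical symbol from plane 1.
for ch in ("A", "\u6c34", "\U0001D11E"):
    cp = ord(ch)
    print(f"U+{cp:04X}: plane {plane_of(cp)}, block {block_of(cp):02X}")

# 17 planes of 65,536 code points give the maximum quoted on the slide.
print(MAX_CODE_POINT + 1)
```

The last line reproduces the figure of 1,114,112 code points, since 17 × 65,536 = 1,114,112.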

Characters and Glyphs
• Unicode stores characters, not glyphs.
• A glyph is a particular rendering of a character.
• One character can have multiple glyphs.
• Glyphs are stored in fonts.
• If a font is Unicode-conformant, its character table follows the Unicode code points correctly. It need not contain all Unicode characters.

Finding Unicode Characters
• Different methods
  1. Unicode NamesList: http://unicode.org/Public/UNIDATA/NamesList.txt
  2. Unicode Character Name Index: http://unicode.org/charts/charindex.html
  3. Unicode Character Code Charts: http://unicode.org/charts/
  4. Unicode Search: http://www.fileformat.info/info/unicode/char/search.htm

Unicode NamesList
ftp://ftp.unicode.org/
ftp://ftp.unicode.org/Public/5.2.0/ucd/NamesList-5.2.0d2.txt

Unicode Character Name Index
http://unicode.org/charts/charindex.html

Unicode Charts
http://www.unicode.org/charts/

Character Search
http://www.fileformat.info

Complex Characters
• Some characters are a combination of multiple other characters.
• This makes searching difficult.
[http://www.unicode.org/standard/where/]

Character Encoding
The pitfalls of Unicode ...

What is a character encoding?
• Historic examples
  – Morse code
  – Braille
  – ...
• There are multiple ways to code the numbers 0 to 65,535 in binary.

4 Levels / Steps
• ACR: Abstract Character Repertoire
  – The set of characters to be encoded (e.g. an alphabet)
• CCS: Coded Character Set
  – Mapping of the abstract character repertoire (ACR) to a set of non-negative numbers, typically 0..n-1
• CEF: Character Encoding Form
  – Mapping of these numbers to sequences of fixed-width code units (e.g. 32-bit units).
  – If the size of the set exceeds the number of values one unit can hold (e.g. 256 for 8-bit units), an escaping procedure has to be agreed on.
• CES: Character Encoding Scheme
  – An invertible transformation of the CEF code units to 8-bit units (octets), so that characters can be stored on byte-oriented computers.
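
The four levels above can be traced for real characters with Python's built-in codecs. The sketch below is illustrative only: it picks UTF-16 as the encoding form, the helper name `code_units_utf16` is hypothetical, and the three sample characters are an arbitrary choice. The character itself stands for the ACR, `ord()` gives the CCS number, the 16-bit code units are the CEF, and the big-endian or little-endian byte sequence is the CES.

```python
import struct


def code_units_utf16(ch):
    """Return the 16-bit UTF-16 code units of a string (hypothetical helper)."""
    raw = ch.encode("utf-16-be")  # big-endian bytes, no byte order mark
    return list(struct.unpack(f">{len(raw) // 2}H", raw))


for ch in "z\u6c34\U0001D11E":
    code_point = ord(ch)                       # CCS: character -> number
    units = code_units_utf16(ch)               # CEF: number -> 16-bit code units
    ces_be = ch.encode("utf-16-be").hex(" ")   # CES: code units -> octets (BE)
    ces_le = ch.encode("utf-16-le").hex(" ")   # CES: code units -> octets (LE)
    print(
        f"{ch!r}  CCS=U+{code_point:04X}  "
        f"CEF={[f'{u:04X}' for u in units]}  "
        f"CES(BE)={ces_be}  CES(LE)={ces_le}"
    )
```

The third character lies outside the first 65,536 code points, so UTF-16 needs two code units for it. That pair of units plays the role of the "escaping procedure" mentioned under CEF.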

Encoding

ACR  CCS  CEF  CES
A    0    D50  D 5 0
B    1    D51  D 5 1
C    2    D52  D 5 2
D    3    D53  D 5 3
E    4    D54  D 5 4
...  ...  ...  ...

Common Encodings
• ISO 646: ASCII
• EBCDIC: CP930
• ISO 8859: ISO 8859-1 (Western Europe), ..., ISO 8859-16
• MS Windows character sets: Windows-1250, ..., Windows-1258
• Mac OS Roman
• Cork / T1
• JIS X 0208, widely used for Japanese: e.g. EUC-JP, ISO-2022-JP
• Chinese Guobiao: GB 2312, GBK (Microsoft code page 936), GB 18030
• Unicode: UTF-8, UTF-16, ...

Unicode and Encodings
• Unicode in programs
  – UCS-2: two-byte characters
  – UCS-4: four-byte characters (future)
• Unicode in files
  – UTF-8: ASCII stays ASCII; all other characters take 2 to 4 bytes
  – UTF-16: two octets per character (four for characters outside the BMP), with an optional initial byte order mark
• ASCII with (hexa)decimal character references, e.g. &#x00A9; for ©

UTF-8 (Unicode Transformation Format)

Character range (hex)  Range (decimal)  UTF-8 octet sequence
0000 0000-0000 007F    0-127 (ASCII)    0xxxxxxx
0000 0080-0000 07FF    128-2047         110xxxxx 10xxxxxx
0000 0800-0000 FFFF    2048-65535       1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF    65536-1114111    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Example: A≢Α. (A <NOT IDENTICAL TO> <ALPHA> .)
Code points:  U+0041 U+2262 U+0391 U+002E
UTF-8 octets: 41 E2 89 A2 CE 91 2E
http://www.ietf.org/rfc/rfc3629.txt

Examples and Tests
http://www.columbia.edu/kermit/utf8.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html

UTF-16/UCS-2
• UCS = Universal Character Set

Code point          Character        UTF-16 code units  Glyph
122 (hex 7A)        small z (Latin)  007A               z
27700 (hex 6C34)    water (Chinese)  6C34               水
119070 (hex 1D11E)  treble clef      D834 DD1E          𝄞

• UTF-16 encoding of these 3 characters, in the order 27700, 122, 119070:

Encoding  Byte order             Byte sequence
UTF-16LE  little-endian          34 6C, 7A 00, 34 D8 1E DD
UTF-16BE  big-endian             6C 34, 00 7A, D8 34 DD 1E
UTF-16    little-endian, w. BOM  FF FE, 34 6C, 7A 00, 34 D8 1E DD
UTF-16    big-endian, w. BOM     FE FF, 6C 34, 00 7A, D8 34 DD 1E

In the little-endian rows, each pair of bytes is in inverse order.

BOM – Byte Order Mark
• Optional at the beginning of a file
• Used to specify the byte order in UTF-16 or UTF-32 files
• ... or to label UTF-8, UTF-16 or UTF-32 files
• Troublesome when files cross platform boundaries

Encoding              Byte sequence
UTF-8                 EF BB BF
UTF-16 Big Endian     FE FF
UTF-16 Little Endian  FF FE
UTF-32 Big Endian     00 00 FE FF
UTF-32 Little Endian  FF FE 00 00
[http://de.wikipedia.org/wiki/Byte_Order_Mark]

Analyze Encoding
• debug command: analyzes files as a hex dump
• BOM: FF FE

Unicode and XML
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml version="1.0" encoding="UTF-8" ?>
• XML programs usually work with UTF-8 and UTF-16
• ... but ASCII, EBCDIC, JIS, KOI8-R and Big5 are also accepted by most programs.

Literature
• J. Allen, J. Becker (eds.), The Unicode Standard, Version 5.0, Addison-Wesley Longman, Amsterdam, 2006.
• V. Cerf, ASCII format for Network Interchange, RFC 20, 1969. http://tools.ietf.org/html/rfc20

Hands-on section
• Take a look at the Unicode tables at www.unicode.org
• Try to find a particular character using the Unicode character search at www.fileformat.info and integrate it into your documents.
• Try to write a UTF-16 document and analyze it using debug or od.
• Convert a document from ISO Latin to UTF-8 or vice versa using iconv:
  iconv -f ISO-8859-15 -t UTF-8 infile > outfile
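
For the hands-on exercises, the byte sequences from the UTF-8 and UTF-16 slides and the BOM table can be reproduced with a short Python sketch. It is not part of the original slides: the helpers `sniff_bom` and `hexdump` are hypothetical names, and the BOM constants come from Python's standard `codecs` module. The BOM list is ordered with the UTF-32 marks first, because the UTF-32 little-endian mark begins with the same two bytes as the UTF-16 little-endian mark.

```python
import codecs

# Byte order marks from the table above. UTF-32 comes first so that
# FF FE 00 00 is not mistaken for the UTF-16 little-endian mark FF FE.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 Big Endian"),
    (codecs.BOM_UTF32_LE, "UTF-32 Little Endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"),
]


def sniff_bom(data):
    """Return the encoding named by a leading BOM, or None (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def hexdump(data):
    """Format bytes as space-separated upper-case hex pairs (hypothetical helper)."""
    return data.hex(" ").upper()


# The example string from the UTF-8 slide: A, NOT IDENTICAL TO, ALPHA, full stop.
utf8_example = "A\u2262\u0391.".encode("utf-8")

# The three characters from the UTF-16 slide, little-endian with a BOM in front.
utf16_example = codecs.BOM_UTF16_LE + "\u6c34z\U0001D11E".encode("utf-16-le")

print(hexdump(utf8_example))
print(hexdump(utf16_example))
print(sniff_bom(utf16_example))
print(sniff_bom(utf8_example))  # no BOM present
```

Writing `utf16_example` to a file and inspecting it with `od -t x1` should show the same bytes as the "little-endian, w. BOM" row of the UTF-16 table.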