The Unicode Standard, Version 6.3

Kannada
Range: 0C80–0CFF

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 6.3.

This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard.

See http://www.unicode.org/errata/ for an up-to-date list of errata.
See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts.
See http://www.unicode.org/charts/PDF/Unicode-6.3/ for charts showing only the characters added in Unicode 6.3.
See http://www.unicode.org/Public/6.3.0/charts/ for a complete archived file of character code charts for Unicode 6.3.

Disclaimer

These charts are provided as the online reference to the character contents of the Unicode Standard, Version 6.3, but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 6.3, online at http://www.unicode.org/versions/Unicode6.3.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, and #45, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online.

See http://www.unicode.org/ucd/ and http://www.unicode.org/reports/

A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts.

See http://www.unicode.org/charts/fonts.html for a list.
Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts.

The fonts and font data used in production of these code charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s).

The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard, can be found on the Unicode web site.

See http://www.unicode.org/pending/pending.html and http://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2013 Unicode, Inc. All rights reserved.

Kannada code chart (0C80–0CFF)

The code point of a cell is its column heading followed by its row digit (column 0C9, row 5 is 0C95). A "·" marks a position with no character in Version 6.3. The dotted circle ◌ stands for the base character of a combining mark.

      0C8   0C9   0CA   0CB   0CC   0CD   0CE   0CF
0     ·     ಐ     ಠ     ರ     ◌ೀ    ·     ೠ     ·
1     ·     ·     ಡ     ಱ     ◌ು    ·     ೡ     ೱ
2     ◌ಂ    ಒ     ಢ     ಲ     ◌ೂ    ·     ೢ     ೲ
3     ◌ಃ    ಓ     ಣ     ಳ     ◌ೃ    ·     ೣ     ·
4     ·     ಔ     ತ     ·     ◌ೄ    ·     ·     ·
5     ಅ     ಕ     ಥ     ವ     ·     ◌ೕ    ·     ·
6     ಆ     ಖ     ದ     ಶ     ◌ೆ    ◌ೖ    ೦     ·
7     ಇ     ಗ     ಧ     ಷ     ◌ೇ    ·     ೧     ·
8     ಈ     ಘ     ನ     ಸ     ◌ೈ    ·     ೨     ·
9     ಉ     ಙ     ·     ಹ     ·     ·     ೩     ·
A     ಊ     ಚ     ಪ     ·     ◌ೊ    ·     ೪     ·
B     ಋ     ಛ     ಫ     ·     ◌ೋ    ·     ೫     ·
C     ಌ     ಜ     ಬ     ◌಼    ◌ೌ    ·     ೬     ·
D     ·     ಝ     ಭ     ಽ     ◌್    ·     ೭     ·
E     ಎ     ಞ     ಮ     ◌ಾ    ·     ೞ     ೮     ·
F     ಏ     ಟ     ಯ     ◌ಿ    ·     ·     ೯     ·
Various signs
0C82  ◌ಂ  KANNADA SIGN ANUSVARA
0C83  ◌ಃ  KANNADA SIGN VISARGA

Independent vowels
0C85  ಅ  KANNADA LETTER A
0C86  ಆ  KANNADA LETTER AA
0C87  ಇ  KANNADA LETTER I
0C88  ಈ  KANNADA LETTER II
0C89  ಉ  KANNADA LETTER U
0C8A  ಊ  KANNADA LETTER UU
0C8B  ಋ  KANNADA LETTER VOCALIC R
0C8C  ಌ  KANNADA LETTER VOCALIC L
0C8D      <reserved>
0C8E  ಎ  KANNADA LETTER E
0C8F  ಏ  KANNADA LETTER EE
0C90  ಐ  KANNADA LETTER AI
0C91      <reserved>
0C92  ಒ  KANNADA LETTER O
0C93  ಓ  KANNADA LETTER OO
0C94  ಔ  KANNADA LETTER AU

Consonants
0C95  ಕ  KANNADA LETTER KA
0C96  ಖ  KANNADA LETTER KHA
0C97  ಗ  KANNADA LETTER GA
0C98  ಘ  KANNADA LETTER GHA
0C99  ಙ  KANNADA LETTER NGA
0C9A  ಚ  KANNADA LETTER CA
0C9B  ಛ  KANNADA LETTER CHA
0C9C  ಜ  KANNADA LETTER JA
0C9D  ಝ  KANNADA LETTER JHA
0C9E  ಞ  KANNADA LETTER NYA
0C9F  ಟ  KANNADA LETTER TTA
0CA0  ಠ  KANNADA LETTER TTHA
0CA1  ಡ  KANNADA LETTER DDA
0CA2  ಢ  KANNADA LETTER DDHA
0CA3  ಣ  KANNADA LETTER NNA
0CA4  ತ  KANNADA LETTER TA
0CA5  ಥ  KANNADA LETTER THA
0CA6  ದ  KANNADA LETTER DA
0CA7  ಧ  KANNADA LETTER DHA
0CA8  ನ  KANNADA LETTER NA
0CA9      <reserved>
0CAA  ಪ  KANNADA LETTER PA
0CAB  ಫ  KANNADA LETTER PHA
0CAC  ಬ  KANNADA LETTER BA
0CAD  ಭ  KANNADA LETTER BHA
0CAE  ಮ  KANNADA LETTER MA
0CAF  ಯ  KANNADA LETTER YA
0CB0  ರ  KANNADA LETTER RA
0CB1  ಱ  KANNADA LETTER RRA
0CB2  ಲ  KANNADA LETTER LA
0CB3  ಳ  KANNADA LETTER LLA
0CB4      <reserved>
0CB5  ವ  KANNADA LETTER VA
0CB6  ಶ  KANNADA LETTER SHA
0CB7  ಷ  KANNADA LETTER SSA
0CB8  ಸ  KANNADA LETTER SA
0CB9  ಹ  KANNADA LETTER HA

Various signs
0CBC  ◌಼  KANNADA SIGN NUKTA
0CBD  ಽ  KANNADA SIGN AVAGRAHA

Dependent vowel signs
0CBE  ◌ಾ  KANNADA VOWEL SIGN AA
0CBF  ◌ಿ  KANNADA VOWEL SIGN I
0CC0  ◌ೀ  KANNADA VOWEL SIGN II
          ≡ 0CBF ◌ಿ 0CD5 ◌ೕ
0CC1  ◌ು  KANNADA VOWEL SIGN U
0CC2  ◌ೂ  KANNADA VOWEL SIGN UU
0CC3  ◌ೃ  KANNADA VOWEL SIGN VOCALIC R
0CC4  ◌ೄ  KANNADA VOWEL SIGN VOCALIC RR
0CC5      <reserved>
0CC6  ◌ೆ  KANNADA VOWEL SIGN E
0CC7  ◌ೇ  KANNADA VOWEL SIGN EE
          ≡ 0CC6 ◌ೆ 0CD5 ◌ೕ
0CC8  ◌ೈ  KANNADA VOWEL SIGN AI
          ≡ 0CC6 ◌ೆ 0CD6 ◌ೖ
0CC9      <reserved>
0CCA  ◌ೊ  KANNADA VOWEL SIGN O
          ≡ 0CC6 ◌ೆ 0CC2 ◌ೂ
0CCB  ◌ೋ  KANNADA VOWEL SIGN OO
          ≡ 0CCA ◌ೊ 0CD5 ◌ೕ
0CCC  ◌ೌ  KANNADA VOWEL SIGN AU

Virama
0CCD  ◌್  KANNADA SIGN VIRAMA
          • preferred name is halant

Various signs
0CD5  ◌ೕ  KANNADA LENGTH MARK
0CD6  ◌ೖ  KANNADA AI LENGTH MARK

Additional consonants
0CDE  ೞ  KANNADA LETTER FA
          ※ KANNADA LETTER LLLA
          • obsolete historic letter
          • name is a mistake for LLLA

Additional vowels for Sanskrit
0CE0  ೠ  KANNADA LETTER VOCALIC RR
0CE1  ೡ  KANNADA LETTER VOCALIC LL

Dependent vowels
0CE2  ◌ೢ  KANNADA VOWEL SIGN VOCALIC L
0CE3  ◌ೣ  KANNADA VOWEL SIGN VOCALIC LL

Reserved
For viram punctuation, use the generic Indic 0964 and 0965.
0CE4      <reserved>
          → 0964 । devanagari danda
0CE5      <reserved>
          → 0965 ॥ devanagari double danda

Digits
0CE6  ೦  KANNADA DIGIT ZERO
0CE7  ೧  KANNADA DIGIT ONE
0CE8  ೨  KANNADA DIGIT TWO
0CE9  ೩  KANNADA DIGIT THREE
0CEA  ೪  KANNADA DIGIT FOUR
0CEB  ೫  KANNADA DIGIT FIVE
0CEC  ೬  KANNADA DIGIT SIX
0CED  ೭  KANNADA DIGIT SEVEN
0CEE  ೮  KANNADA DIGIT EIGHT
0CEF  ೯  KANNADA DIGIT NINE

Signs used in Sanskrit
0CF1  ೱ  KANNADA SIGN JIHVAMULIYA
          → 1CF5 ᳵ vedic sign jihvamuliya
0CF2  ೲ  KANNADA SIGN UPADHMANIYA
          → 1CF6 ᳶ vedic sign upadhmaniya
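The canonical decompositions (≡) and the digit entries in the names list can be cross-checked against the Unicode Character Database. The following is a minimal sketch, not part of the chart itself, assuming Python's standard unicodedata module (whose data may come from a later Unicode version than 6.3). The names CHART_DECOMPOSITIONS, ucd_decomposition, check_chart, and kannada_digits_to_int are hypothetical helpers introduced only for this illustration.

```python
import unicodedata

# Composite vowel signs and the one-level canonical decompositions (≡)
# transcribed from the Kannada names list. This table is an illustration
# copied from the chart, not an API of any library.
CHART_DECOMPOSITIONS = {
    0x0CC0: (0x0CBF, 0x0CD5),  # VOWEL SIGN II
    0x0CC7: (0x0CC6, 0x0CD5),  # VOWEL SIGN EE
    0x0CC8: (0x0CC6, 0x0CD6),  # VOWEL SIGN AI
    0x0CCA: (0x0CC6, 0x0CC2),  # VOWEL SIGN O
    0x0CCB: (0x0CCA, 0x0CD5),  # VOWEL SIGN OO
}


def ucd_decomposition(code_point):
    """Return the one-level decomposition mapping as a tuple of ints."""
    fields = unicodedata.decomposition(chr(code_point)).split()
    return tuple(int(field, 16) for field in fields)


def check_chart():
    """Map each chart entry to True if it matches the local UCD data."""
    return {
        code_point: ucd_decomposition(code_point) == expected
        for code_point, expected in CHART_DECOMPOSITIONS.items()
    }


def kannada_digits_to_int(text):
    """Convert a string of Kannada digits (0CE6..0CEF) to an int."""
    return int("".join(str(unicodedata.digit(ch)) for ch in text))


if __name__ == "__main__":
    for code_point, matches in check_chart().items():
        print(f"{code_point:04X} {unicodedata.name(chr(code_point))}: {matches}")
    # 0CE7 0CE8 0CE9 are DIGIT ONE, TWO, THREE.
    print(kannada_digits_to_int("\u0ce7\u0ce8\u0ce9"))
```

Note that the chart lists one-level mappings: 0CCB decomposes to 0CCA 0CD5, and 0CCA in turn decomposes to 0CC6 0CC2, so full normalization to NFD yields the three-character sequence 0CC6 0CC2 0CD5.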