The Unicode Standard, Version 4.0--Online Edition

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations.

The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions.

The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided.

Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten.

Cover and CD-ROM label design: Steve Mehallo, http://www.mehallo.com

The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more information, customers in the U.S. please contact U.S.
Corporate and Government Sales, (800) 382-3419, [email protected]. For sales outside of the U.S., please contact International Sales, +1 317 581 3793, [email protected]

Visit Addison-Wesley on the Web: http://www.awprofessional.com

Library of Congress Cataloging-in-Publication Data

The Unicode Standard, Version 4.0 : the Unicode Consortium /Joan Aliprand... [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 0-321-18578-1 (alk. paper)
1. Unicode (Computer character set). I. Aliprand, Joan.
QA268.U545 2004
005.7’2—dc21 2003052158

Copyright © 1991–2003 by Unicode, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher or Unicode, Inc. Printed in the United States of America. Published simultaneously in Canada.

For information on obtaining permission for use of material from this work, please submit a written request to the Unicode Consortium, Post Office Box 39146, Mountain View, CA 94039-1476, USA, Fax +1 650 693 3010 or to Pearson Education, Inc., Rights and Contracts Department, 75 Arlington Street, Suite 300 Boston, MA 02116, USA, Fax: +1 617 848 7047.

ISBN 0-321-18578-1
Text printed on recycled paper
1 2 3 4 5 6 7 8 9 10—CRW—0706050403
First printing, August 2003

Chapter 12
Additional Modern Scripts

This chapter contains a collection of additional scripts in modern use that do not fit well into the script categories featured in other chapters:

• Ethiopic
• Mongolian
• Osmanya
• Cherokee
• Canadian Aboriginal Syllabics
• Deseret
• Shavian

Ethiopic and Mongolian are scripts with long histories. Although their roots can be traced back to the original Semitic writing systems, they would not be classified as Middle Eastern scripts today. The remaining scripts in this chapter have been developed relatively recently. In different ways they show roots in Latin and other letterforms including shorthand, but they are also original creative contributions intended to serve the communities that use them.

12.1 Ethiopic

Ethiopic: U+1200–U+137F

The Ethiopic syllabary originally evolved for writing the Semitic language Ge’ez, and indeed the English noun “Ethiopic” simply means “the Ge’ez language.” Ge’ez itself is now limited to liturgical usage, but its script has been adopted for modern use in writing several languages of central east Africa, including Amharic, Tigre, and Oromo.

Basic and Extended Ethiopic. The Ethiopic characters encoded here are the basic set that has become established in common usage for writing major languages. As with other productive scripts, the basic Ethiopic forms are sometimes modified to produce an extended range of characters for writing additional languages. Research is ongoing to identify these extended Ethiopic forms, though many are rare and have scant typographic tradition.

Encoding Principles. The syllables of the Ethiopic script are traditionally presented as a two-dimensional matrix of consonant-vowel combinations. The encoding follows this structure; in particular, the codespace range U+1200..U+1357 is interpreted as a matrix of 43 consonants crossed with 8 vowels, making 344 conceptual syllables. Most of these consonant-vowel syllables are represented by characters in the script, but some of them happen to be unused, accounting for the blank cells in the matrix.

Variant Glyph Forms. A given Ethiopic syllable may be represented by different glyph forms, analogous to the glyph variants of Latin lowercase “a” or “g”, which do not coexist in the same font. Thus the particular glyph shown in the code chart for each position in the matrix is merely one representation of that conceptual syllable, and the glyph itself is not the object that is encoded.

Labialized Subseries. A few Ethiopic consonants have labialized (“W”) forms that are traditionally allotted their own consonant series in the syllable matrix, although only a subset of the possible voweled forms are realized. Each of these derivative series is encoded immediately after the corresponding main consonant series. Because the standard vowel series includes both “AA” and “WAA”, two different cells might represent the “consonant + W + AA” syllable. For example:

U+1247 = Q + WAA: unused version of QWAA
U+124B = QW + AA: ETHIOPIC SYLLABLE QWAA

In these cases, where the two conceptual syllables are equivalent, the entry in the labialized subseries is encoded and not the “consonant + WAA” entry in the main syllable series. The six specific cases are enumerated in Table 12-1.

Table 12-1. Labialized Forms in -WAA

-WAA Form   Encoded as   Not Used
QWAA        U+124B       U+1247
QHWAA       U+125B       U+1257
XWAA        U+128B       U+1287
KWAA        U+12B3       U+12AF
KXWAA       U+12C3       U+12BF
GWAA        U+1313       U+130F

Also, within the labialized subseries, the sixth vowel (“-E”) forms are sometimes considered to be second vowel (“-U”) forms. For example:

U+1249 = QW + U: unused version of QWE
U+124D = QW + E: ETHIOPIC SYLLABLE QWE

In these cases, where the two syllables are nearly equivalent, the “-E” entry is encoded and not the “-U” entry. The six specific cases are enumerated in Table 12-2.

Table 12-2. Labialized Forms in -WE

-WE Form    Encoded as   Not Used
QWE         U+124D       U+1249
QHWE        U+125D       U+1259
XWE         U+128D       U+1289
KWE         U+12B5       U+12B1
KXWE        U+12C5       U+12C1
GWE         U+1315       U+1311

Keyboard Input. Because the Ethiopic script includes more than 300 characters, the units of keyboard input must constitute some smaller set of entities, typically 43+8 codes interpreted as the coordinates of the syllable matrix. Because these keyboard input codes are expected to be transient entities that are resolved into syllabic characters before they enter stored text, keyboard input codes are not specified in this standard.

Syllable Names. The Ethiopic script often has multiple syllables corresponding to the same Latin letter, making it difficult to assign unique Latin names. Therefore the names list makes use of certain devices (such as doubling a Latin letter in the name) merely to create uniqueness; this device has no relation to the phonetics of these syllables in any particular language.

Encoding Order and Sorting. The order of the consonants in the encoding is based on the traditional alphabetical order. It may differ from the sort order used for one or another language, if only because in many languages various pairs or triplets of syllables are treated as equivalent in the first sorting pass. For example, an Amharic dictionary may start out with a section headed by three H-like syllables:

U+1200 ETHIOPIC SYLLABLE HA
U+1210 ETHIOPIC SYLLABLE HHA
U+1280 ETHIOPIC SYLLABLE XA

Thus the encoding order cannot and does not implement a collation procedure for any particular language using this script.

Word Separators. The traditional word separator is U+1361 ETHIOPIC WORDSPACE ( ፡ ). In modern usage, a plain white wordspace (U+0020 SPACE) is becoming common.

Diacritical Marks. The Ethiopic script generally makes no use of diacritical marks, but they are sometimes employed for scholarly or didactic purposes. In particular, U+0308 COMBINING DIAERESIS and U+030E COMBINING DOUBLE VERTICAL LINE ABOVE are sometimes used to indicate emphasis or gemination (consonant doubling).

Numbers. Ethiopic digit glyphs are derived from the Greek alphabet, possibly borrowed from Coptic letterforms. In modern use, European digits are often used. The Ethiopic number system does not use a zero, nor is it based on digital-positional notation. A number is denoted as a sequence of powers of 100, each preceded by a coefficient (2 through 99). In each term of the series, the power 100^n is indicated by n HUNDRED characters (merged to a digraph when n = 2).
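The arithmetic behind the Numbers description can be sketched in Python. This is only an illustration of the decomposition into powers of 100, not a rendering algorithm: the function name `base100_terms` is hypothetical, and the sketch stops at (coefficient, power) pairs rather than producing Ethiopic digit characters. It keeps a coefficient of 1 in its output, although the passage gives the written coefficient range as 2 through 99.

```python
def base100_terms(n):
    """Split a positive integer into (coefficient, power) pairs so that
    n == sum(coefficient * 100**power), highest power first.

    Terms whose coefficient would be 0 are skipped, since the number
    system described in the text has no zero.
    """
    if n <= 0:
        raise ValueError("only positive integers can be decomposed")
    terms = []
    power = 0
    while n:
        n, coefficient = divmod(n, 100)   # peel off the lowest base-100 digit
        if coefficient:
            terms.append((coefficient, power))
        power += 1
    terms.reverse()                        # highest power first
    return terms


print(base100_terms(1975))    # [(19, 1), (75, 0)]
print(base100_terms(30005))   # [(3, 2), (5, 0)]
```

Each pair corresponds to one term of the series: the coefficient is a value from 1 to 99, and the power says how many factors of 100 follow it.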

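As a supplementary illustration of Encoding Principles and of Tables 12-1 and 12-2, the following Python sketch treats the range U+1200..U+1357 as a 43 by 8 grid and shows how the "not used" and "encoded" cells relate as grid coordinates. The names `cell`, `coords`, `WAA_PAIRS`, and `WE_PAIRS` are hypothetical, and the zero-based row index is a convention of this sketch. The text confirms only that U is the second vowel, E the sixth, and that AA and WAA both belong to the series; the remaining column labels are an assumption based on the character names.

```python
BASE = 0x1200   # first cell of the matrix, per the text
LAST = 0x1357   # last cell of the matrix, per the text
ROWS = 43       # consonant series
# Column labels: positions of U, AA, E, WAA follow the text and tables;
# the other labels are assumed from the character names.
VOWELS = ["A", "U", "I", "AA", "EE", "E", "O", "WAA"]


def cell(row, vowel):
    """Code point of the cell at zero-based consonant row `row`."""
    if not 0 <= row < ROWS:
        raise ValueError("row outside the 43 consonant series")
    return BASE + len(VOWELS) * row + VOWELS.index(vowel)


def coords(code_point):
    """Inverse of cell(): (row, vowel label) for a matrix code point."""
    if not BASE <= code_point <= LAST:
        raise ValueError("code point outside U+1200..U+1357")
    row, column = divmod(code_point - BASE, len(VOWELS))
    return row, VOWELS[column]


# Table 12-1 and Table 12-2 as {not used: encoded as}.
WAA_PAIRS = {0x1247: 0x124B, 0x1257: 0x125B, 0x1287: 0x128B,
             0x12AF: 0x12B3, 0x12BF: 0x12C3, 0x130F: 0x1313}
WE_PAIRS = {0x1249: 0x124D, 0x1259: 0x125D, 0x1289: 0x128D,
            0x12B1: 0x12B5, 0x12C1: 0x12C5, 0x1311: 0x1315}

for unused, encoded in WAA_PAIRS.items():
    row, vowel = coords(unused)
    # "consonant + WAA" in the main row maps to "AA" in the next
    # (labialized) row.
    assert vowel == "WAA" and coords(encoded) == (row + 1, "AA")

for unused, encoded in WE_PAIRS.items():
    row, vowel = coords(unused)
    # Inside the labialized row, the "-U" cell maps to the "-E" cell.
    assert vowel == "U" and coords(encoded) == (row, "E")

print(hex(cell(8, "WAA")))   # 0x1247, the unused Q + WAA cell
print(hex(cell(9, "AA")))    # 0x124b, the encoded QW + AA cell
```

In both tables the encoded cell sits exactly four code points after the unused one, which follows directly from the row and column arithmetic above.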