Adobe Tech Note #5148

Total Page:16

File Type:pdf, Size:1020Kb

Adobe Tech Note #5148 SING Glyphlet Production: Tips, Tricks & Techniques Technical Note #5148 ADOBE SYSTEMS INCORPORATED Corporate Headquarters 345 Park Avenue San Jose, CA 95110-2704 August 2, 2006 © 2005, 2006 Adobe Systems Incorporated. All rights reserved. NOTICE: All information contained herein is the property of Adobe Systems Incorporated. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of the publisher. Any software referred to herein is furnished under license and may only be used or copied in accordance with the terms of such license. PostScript is a registered trademark of Adobe Systems Incorporated. All instances of the name PostScript in the text are references to the PostScript language as defined by Adobe Systems Incorporated unless otherwise stated. The name PostScript also is used as a product trademark for Adobe Systems’ implementation of the PostScript language interpreter. Except as otherwise stated, any reference to a “PostScript printing device,” “PostScript display device,” or similar item refers to a printing device, display device or item (respectively) that contains PostScript technology created or licensed by Adobe Systems Incorporated and not to devices or items that purport to be merely compatible with the PostScript language. Adobe, the Adobe logo, Acrobat, the Acrobat logo, Acrobat Capture, Acrobat Exchange, Distiller, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Mac OS is a trademark of Apple Computer, Inc., registered in the United States and other countries. OpenType and Windows are either registered trademarks or a trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are the property of their respective owners. This publication and the information herein is furnished AS IS, is subject to change without notice, and should not be construed as a commitment by Adobe Systems Incorporated. Adobe Systems Incorporated assumes no responsibility or liability for any errors or inaccuracies, makes no warranty of any kind (express, implied, or statutory) with respect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular purposes, and noninfringement of third party rights. Contents Introduction. 1 Conventions . 1 Glyphlet Types & Categories . 1 Glyphlets Based on Known Glyph Collection . 1 Glyphlets Beyond Known Glyph Collections. 2 Generic Glyphlets . 2 Tools . 2 Glyphlet Production . 2 Glyphlet Verification . 3 Source Fonts . 3 Handling and Specifying GID=0 . 3 Font Size Versus Speed . 4 XML Issues. 4 UTF-8 Encoding . 4 UTF-8 Encoding Versus Unicode Character XML References . 4 Unicode Character XML Reference Issues . 5 Minimal XML File . 5 Learning Through Example . 6 META Table—General Issues . 6 Non-BMP Code Points. 6 PUA Code Points . 6 Setting Permissions . 7 Glyphlet Naming Conventions . 7 Localized Menu Names . 8 OpenType Only . 8 Vertical Variants . 8 References to CIDs . 9 META Table—Specific Fields. 9 MojikumiX4051 (ID=0) . 9 UNIUnifiedBaseChars (ID=1) . 9 BaseFontName (ID=2). 10 Language (ID=3). 10 WritingScript (ID=8) . 11 StrokeCount (ID=10). 11 IndexingRadical (ID=11) . 11 August 2, 2006 i Contents RemStrokeCount (ID=12) . 11 DesignCategory (ID=15). 12 DesignWeight (ID=16) . 12 DesignWidth (ID=17) . 12 DesignAngle (ID=18) . 13 CIDBaseChars (ID=20) . 13 IsUpdatedGlyph (ID=21) . 13 VendorGlyphURI (ID=22) . 14 LocalizedInfo (ID=24) . 14 LocalizedBaseFontName (ID=25) . 15 ReadingString: GlyphName (ID=100) . 16 ReadingString: JapaneseYomi (ID=101). 16 ReadingString: JapaneseTranslit (ID=102) . 16 Additional ReadingString Fields (IDs 103 through 114) . 17 AltIndexingRadical Fields (IDs 200 through 202) . 17 AltLookup Fields (IDs 250 through 257) . 18 UNICODEproperty: GeneralCategory (ID=500) . 18 Additional UNICODEproperty Fields (IDs 501 through 505). 19 UNIDerivedByOpenType (ID=602). 19 UNIOpenTypeYields (ID=603) . 19 CIDDerivedByOpenType (ID=604). 20 CIDOpenTypeYields (ID=605) . 21 BASE Table Overrides . 21 User-defined & Vendor-specific Metadata . 21 ii August 2, 2006 1 SING Glyphlet Production 1.1 Introduction This tutorial is intended for font developers who plan to build SING glyphlets, and is specifically designed to provide tips, tricks, and techniques that go beyond the standard documentation and specifications, and in effect, offer a more easily understood interpretation of them. Note that this document is not a substitute for the standard documentation and specifications, but rather complements them. While this initial version is written in English, a Japanese translation is planned and forthcoming. Until the Japanese translation is available, please note that this English version was specifically written to be clear, concise, and easy to understand. If you find any errors or omissions in this Adobe Tech Note, please contact its author, Dr. Ken Lunde, at the following email address, in English or in Japanese: [email protected] 1.1.1 Conventions This document assumes that the Adobe-supplied cvt2sing tool is used for building glyphlets. It is available for a variety of platforms, such as Windows and Mac OS X. Unicode values are indicated through the use of the standard “U+” prefix notation, and does not imply a particular encoding. For example, U+0020 represents an ASCII “space,” U+4E00 represents the CJK Unified Ideograph that means one, and U+20000 represents the first character in Plane 2. 1.2 Glyphlet Types & Categories Glyphlets, in terms of the glyphs that they carry throughout the workflow, can be categorized in different ways, yet they are ways that are regular and predictable. Each category has different methods for constructing the metadata, in terms of its content. 1.2.1 Glyphlets Based on Known Glyph Collection Under some circumstances, a type foundry may wish to make the glyphs for some of their designs available as individual glyphlets, rather than issuing a complete font. In this case, the glyphlets can be from an established glyph collection, such as Adobe-Japan1-5 or Adobe- Japan1-6. Information stored in the ‘META’ table makes the relationship to a known glyph collection explicit. August 2, 2006 1 1 SING Glyphlet Production Tools 1.2.2 Glyphlets Beyond Known Glyph Collections For glyphs that are not found in known glyph collections—meaning that they are variants of glyphs in known glyph collections—or for glyphs of parent characters not covered by known glyph collections, glyphlets can be built to make these accessible by users through existing (installed) fonts or as separate glyphs. The metadata that is recorded in the ‘META’ table becomes critical in allowing users to enter the glyphs into their documents. The more rich the metadata, the greater the ease in which users can enter the glyphs. 1.2.3 Generic Glyphlets Generic glyphs, whether they are considered variants of glyphs in known glyph collections, or beyond the scope of those glyph collections, benefit from an adequate amount of metadata to allow users to enter them into their documents. Corporate logos are but a single example of a generic glyphlet. 1.3 Tools A variety of tools can be used to create and verify glyphlets. The tools supplied by Adobe are generally cross-platform, and will run on Mac OS X and Windows. Always be sure that you are using the latest version of these tools, to ensure that the best possible glyphlets are being built. Also ensure that you are using the latest version of the specifications. 1.3.1 Glyphlet Production The cvt2sing tool is the general-purpose glyphlet production tool. There are two modes available. One is single-glyphlet mode, meaning that a single execution builds a single glyphlet, specifying parameters on the command line. The other is batch mode, in which a single file serves as input to cvt2sing, each line of which contains the parameters for building a single glyphlet. The following command lines build two glyphlets, using CID 23056 and 23057 of R.cff as the source font, and using the ADBE_KozMin-R_C23056.xml and ADBE_KozMin- R_C23057.xml files as metadata, respectively: % cvt2sing -g /23056 -m XML/ADBE_KozMin-R_C23056.xml -dd ./GAI R.cff % cvt2sing -g /23057 -m XML/ADBE_KozMin-R_C23057.xml -dd ./GAI R.cff The following command line builds the same glyphlets, but uses batch mode: % cvt2sing -s R.txt The following lines represent the contents of the R.txt file: 2 August 2, 2006 SING Glyphlet Production 1 Source Fonts -g /23056 -m XML/ADBE_KozMin-R_C23056.xml -dd ./GAI R.cff -g /23057 -m XML/ADBE_KozMin-R_C23057.xml -dd ./GAI R.cff 1.3.2 Glyphlet Verification Besides using the glyphlets within a workflow, or with applications that are SING-savvy, two Adobe-supplied tools are useful for inspecting the contents of glyphlets. The OTFProof tool allows any ‘sfnt’ table to be viewed, usually in human-readable form, and often in more than one way. The tx tool can be used to inspect the ‘CFF’ tables of glyphlets, and can also be used to subset the source fonts, which is a technique for increasing the speed at which cvt2sing runs. This technique is described elsewhere in this document. 1.4 Source Fonts There are several factors to consider when preparing the source fonts, specifically the fonts from which cvt2sing extracts the glyphs for building glyphlet files. Note that if a CIDFont file or a CID-keyed OpenType font is used as the source font for glyphlets, whatever glyph collection designation (/Registry, /Ordering, and /Supplement, such as “Adobe-Japan1-6”) that the source font advertises will be flattened to become “Adobe- Identity-0” in the glyphlet’s ‘CFF’ table. Adobe-Identity-0 is a language- and script- independent glyph collection designation. If a name-keyed font is used as the source font, no glyph collection designation will be advertised in the resulting glyphlet’s ‘CFF’ table. 1.4.1 Handling and Specifying GID=0 GID=0 in a glyphlet file is extracted from the source font.
Recommended publications
  • 1 Introduction 1
    The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1.
    [Show full text]
  • SUPPORTING the CHINESE, JAPANESE, and KOREAN LANGUAGES in the OPENVMS OPERATING SYSTEM by Michael M. T. Yau ABSTRACT the Asian L
    SUPPORTING THE CHINESE, JAPANESE, AND KOREAN LANGUAGES IN THE OPENVMS OPERATING SYSTEM By Michael M. T. Yau ABSTRACT The Asian language versions of the OpenVMS operating system allow Asian-speaking users to interact with the OpenVMS system in their native languages and provide a platform for developing Asian applications. Since the OpenVMS variants must be able to handle multibyte character sets, the requirements for the internal representation, input, and output differ considerably from those for the standard English version. A review of the Japanese, Chinese, and Korean writing systems and character set standards provides the context for a discussion of the features of the Asian OpenVMS variants. The localization approach adopted in developing these Asian variants was shaped by business and engineering constraints; issues related to this approach are presented. INTRODUCTION The OpenVMS operating system was designed in an era when English was the only language supported in computer systems. The Digital Command Language (DCL) commands and utilities, system help and message texts, run-time libraries and system services, and names of system objects such as file names and user names all assume English text encoded in the 7-bit American Standard Code for Information Interchange (ASCII) character set. As Digital's business began to expand into markets where common end users are non-English speaking, the requirement for the OpenVMS system to support languages other than English became inevitable. In contrast to the migration to support single-byte, 8-bit European characters, OpenVMS localization efforts to support the Asian languages, namely Japanese, Chinese, and Korean, must deal with a more complex issue, i.e., the handling of multibyte character sets.
    [Show full text]
  • Legacy Character Sets & Encodings
    Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text.
    [Show full text]
  • Implementing Cross-Locale CJKV Code Conversion
    Implementing Cross-Locale CJKV Code Conversion Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf Code Conversion Basics dc • Algorithmic code conversion — Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP — A purely mathematical process • Table-driven code conversion — Required across locales: Chinese ↔ Japanese — Required when dealing with Unicode — Mapping tables are required — Can sometimes be faster than algorithmic code conversion— depends on the implementation September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Code Conversion Basics (Cont’d) dc • CJKV character set differences — Different number of characters — Different ordering of characters — Different characters September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Character Sets Versus Encodings dc • Common CJKV character set standards — China: GB 1988-89, GB 2312-80; GB 1988-89, GBK — Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992 — Hong Kong: ASCII, Big Five with Hong Kong extension — Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990 — South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991 — North Korea: ASCII (?), KPS 9566-97 — Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995 • Common CJKV encodings — Locale-independent: EUC-*, ISO-2022-* — Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code — Other: UCS-2, UCS-4, UTF-7, UTF-8,
    [Show full text]
  • JFP Reference Manual 5 : Standards, Environments, and Macros
    JFP Reference Manual 5 : Standards, Environments, and Macros Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817–0648–10 December 2002 Copyright 2002 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
    [Show full text]
  • Implementing Cross-Locale CJKV Code Conversion
    Implementing Cross-locale CJKV Code Conversion Ken Lunde, Adobe Systems Incorporated [email protected] http://www.oreilly.com/~lunde/ 1. Introduction Most operating systems today deal with single locales. Within a single CJKV locale, different operating sys- tems often use different encodings for the same character set. Consider Shift-JIS and EUC-JP encodings for Japanese—Shift-JIS is historically used on MacOS and Windows, but EUC-JP is used on Unix. This makes code conversion a necessity. Code conversion within a single locale is, by and large, a trivial operation that typically involves a mathematical algorithm. In the past, a lot of code conversion was performed by users through dedicated software tools. Many of today’s applications include built-in code conversion routines, but these routines deal with only multiple encodings of a single locale (such as EUC-KR, ISO-2022-KR, Johab, and Unified hangul Code encodings for Korean). Code conversion across CJKV locales, such as between Chinese and Japanese, is more problematic. While Unicode serves as an excellent basis for implementing cross-locale code conversion, there are still problems to be addressed, such as unmappable characters. 2. Code Conversion Basics Converting between different encodings of a single locale, which represents a trivial effort that involves well- established code conversion algorithms (or mapping tables), is a well-understood process these days. How- ever, as soon as code conversion extends beyond a single locale, there are additional complexities that arise, such as the following: • Code conversion algorithms almost always must be replaced by mapping tables because the ordering of characters in different CJKV character sets are different.
    [Show full text]
  • NAME DESCRIPTION Supported Encodings
    Perl version 5.8.6 documentation - Encode::Supported NAME Encode::Supported -- Encodings supported by Encode DESCRIPTION Encoding Names Encoding names are case insensitive. White space in names is ignored. In addition, an encoding may have aliases. Each encoding has one "canonical" name. The "canonical" name is chosen from the names of the encoding by picking the first in the following sequence (with a few exceptions). The name used by the Perl community. That includes 'utf8' and 'ascii'. Unlike aliases, canonical names directly reach the method so such frequently used words like 'utf8' don't need to do alias lookups. The MIME name as defined in IETF RFCs. This includes all "iso-"s. The name in the IANA registry. The name used by the organization that defined it. In case de jure canonical names differ from that of the Encode module, they are always aliased if it ever be implemented. So you can safely tell if a given encoding is implemented or not just by passing the canonical name. Because of all the alias issues, and because in the general case encodings have state, "Encode" uses an encoding object internally once an operation is in progress. Supported Encodings As of Perl 5.8.0, at least the following encodings are recognized. Note that unless otherwise specified, they are all case insensitive (via alias) and all occurrence of spaces are replaced with '-'. In other words, "ISO 8859 1" and "iso-8859-1" are identical. Encodings are categorized and implemented in several different modules but you don't have to use Encode::XX to make them available for most cases.
    [Show full text]
  • Rocket Universe NLS Guide
    Rocket UniVerse NLS User Guide Version 12.1.1 June 2019 UNV-1211-NLS-1 Notices Edition Publication date: June 2019 Book number: UNV-1211-NLS-1 Product version: Version 12.1.1 Copyright © Rocket Software, Inc. or its affiliates 1985–2019. All Rights Reserved. Trademarks Rocket is a registered trademark of Rocket Software, Inc. For a list of Rocket registered trademarks go to: www.rocketsoftware.com/about/legal. All other products or services mentioned in this document may be covered by the trademarks, service marks, or product names of their respective owners. Examples This information might contain examples of data and reports. The examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. License agreement This software and the associated documentation are proprietary and confidential to Rocket Software, Inc. or its affiliates, are furnished under license, and may be used and copied only in accordance with the terms of such license. Note: This product may contain encryption technology. Many countries prohibit or restrict the use, import, or export of encryption technologies, and current use, import, and export regulations should be followed when exporting this product. 2 Corporate information Rocket Software, Inc. develops enterprise infrastructure products in four key areas: storage, networks, and compliance; database servers and tools; business information and analytics; and application development, integration, and modernization. Website: www.rocketsoftware.com Rocket Global Headquarters 77 4th Avenue, Suite 100 Waltham, MA 02451-1468 USA To contact Rocket Software by telephone for any reason, including obtaining pre-sales information and technical support, use one of the following telephone numbers.
    [Show full text]
  • XML in the Development of Component Systems
    Data-centric XML Character Sets © 2009 Martin v. Löwis © 2009 Martin v. Character Sets: Rationale •Computer stores data in sequences of bytes – each byte represents a value in range 0..255 •Text data are intended to denote characters, not numbers •Encoding defines a mechanism to associate bytes and characters •Encoding can only cover finite number of character ↠ character set – Many terminology issues (character set, repertoire, encoding, coded character set, …) © 2009 Martin v. Löwis © 2009 Martin v. Datenorientiertes XML 2 Character Sets: History • ASCII: American Standard Code for Information Interchange – 7-bit character set, 1963 proposed, 1968 finalized • ANSI X3.4-1986 – 32(34) control characters, 96(94) graphical characters – Also known as CCITT International Alphabet #5 (IA5), ISO 646 • national variants, international reference version • DIN 66003: @ vs. §, [ vs. Ä, \ vs. Ö, ] vs. Ü, … © 2009 Martin v. Löwis © 2009 Martin v. Datenorientiertes XML 3 Character Sets: History (2) • 8-bit character sets: 190..224 graphic characters • ISO 8859: European/Middle-East alphabets – ISO-8859-1: Western Europe (Latin-1) – ISO-8859-2: Central/Eastern Europe (Latin-2) – ISO-8859-3: Southern Europe (Latin-3) – ISO-8859-4: Northern Europe (Latin-4) – ISO-8859-5: Cyrillic – ISO-8859-6: Arabic – ISO-8859-7: Greek – ISO-8859-8: Hebrew – ISO-8859-9: Turkish (Latin-5; replace Icelandic chars with Turkish) – ISO-8859-10: Nordic (Latin-6; Latin 4 + Inuit, non-Skolt Sami) – ISO-8859-11 (1999): Thai – ISO-8859-13: Baltic Rim (Latin-7) – ISO-8859-14: Celtic (Latin-8) – ISO-8859-15: Western Europe (Latin-9, Latin-1 w/o fraction characters, plus Euro sign, Š, Ž, Œ, Ÿ) – ISO-8859-16: European (Latin-10, omit many symbols in favor of letters) © 2009 Martin v.
    [Show full text]
  • The Unicode Standard, Version 3.0, Issued by the Unicode Consor- Tium and Published by Addison-Wesley
    The Unicode Standard Version 3.0 The Unicode Consortium ADDISON–WESLEY An Imprint of Addison Wesley Longman, Inc. Reading, Massachusetts · Harlow, England · Menlo Park, California Berkeley, California · Don Mills, Ontario · Sydney Bonn · Amsterdam · Tokyo · Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If these files have been purchased on computer-readable media, the sole remedy for any claim will be exchange of defective media within ninety days of receipt. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. ISBN 0-201-61633-5 Copyright © 1991-2000 by Unicode, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or other- wise, without the prior written permission of the publisher or Unicode, Inc.
    [Show full text]
  • Adobe Technical Note #5078: the Adobe-Japan1-6 Character Collection 2
    Adobe Enterprise & Developer Support bc Adobe Technical Note #5078 The Adobe-Japan1-6 Character Collection Introduction The purpose of this document is to define and describe the Adobe-Japan1-6 character collection, which enumerates 23,058 glyphs, and whose designation is derived from the following three /CIDSystemInfo dictionary entries: ● /Registry (Adobe) ● /Ordering (Japan1) ● /Supplement 6 CIDFont resources that reference this character collection must include a /CIDSystemInfo dictionary that matches the /Registry and /Ordering strings shown above. This document is designed for font developers, for the purpose of developing Japanese fonts for use with PostScript products, or for developing OpenType Japanese fonts. It is also useful for application developers and end users who need to know more about the glyphs in this character collection. This document expects that its readers are familiar with the CID-keyed font file format, which is described in Adobe Technical Note #5014, entitled Adobe CMap and CIDFont Files Specification.* A character collection contains the glyphs that are required to develop font products for a specific language, script, or market. Specific encodings are defined through the use of CMap resources that are instantiated as files, and generally reference a subset of the character collection. The character collection that results from each Supplement includes the glyphs associated with all earlier Supplements. For example, Supplement 6 includes all glyphs defined in Supplements 0 through 5. The Adobe-Japan1-6 character collection enumerates 23,058 glyphs, specifically CIDs 0 through 23057, among seven Supplements, designated 0 through 6. Adobe-Japan1-6 completely supports the current JIS (Japanese Industrial Standard) character set standards, and earlier vintages thereof, specifically JIS X 0208:1997, JIS X 0213:2004, and JIS X 0212-1990.
    [Show full text]
  • XML in the Development of Component Systems
    Data-centric XML Character Sets © 2010 Martin v. Löwis 2010© Martin v. Montag, 3. Mai 2010 Character Sets: Rationale •Computer stores data in sequences of bytes – each byte represents a value in range 0..255 •Text data are intended to denote characters, not numbers •Encoding defines a mechanism to associate bytes and characters •Encoding can only cover finite number of character ↠ character set – Many terminology issues (character set, repertoire, encoding, coded character set, …) © 2010 Martin v. Löwis 2010© Martin v. Datenorientiertes XML 2 Montag, 3. Mai 2010 Character Sets: History • ASCII: American Standard Code for Information Interchange – 7-bit character set, 1963 proposed, 1968 finalized • ANSI X3.4-1968 – 32(34) control characters, 96(94) graphical characters – Also known as CCITT International Alphabet #5 (IA5), ISO 646 • national variants, international reference version • DIN 66003: @ vs. §, [ vs. Ä, \ vs. Ö, ] vs. Ü, … © 2010 Martin v. Löwis 2010© Martin v. Datenorientiertes XML 3 Montag, 3. Mai 2010 Character Sets: History (2) • 8-bit character sets: 190..224 graphic characters • ISO 8859: European/Middle-East alphabets – ISO-8859-1: Western Europe (Latin-1) – ISO-8859-2: Central/Eastern Europe (Latin-2) – ISO-8859-3: Southern Europe (Latin-3) – ISO-8859-4: Northern Europe (Latin-4) – ISO-8859-5: Cyrillic – ISO-8859-6: Arabic – ISO-8859-7: Greek – ISO-8859-8: Hebrew – ISO-8859-9: Turkish (Latin-5; replace Icelandic chars with Turkish) – ISO-8859-10: Nordic (Latin-6; Latin 4 + Inuit, non-Skolt Sami) – ISO-8859-11 (1999): Thai – ISO-8859-13: Baltic Rim (Latin-7) – ISO-8859-14: Celtic (Latin-8) – ISO-8859-15: Western Europe (Latin-9, Latin-1 w/o fraction characters, plus Euro sign, Š, Ž, Œ, Ÿ) – ISO-8859-16: European (Latin-10, omit many symbols in favor of letters) © 2010 Martin v.
    [Show full text]