Fonts & Encodings
Yannis Haralambous

To cite this version: Yannis Haralambous. Fonts & Encodings. O’Reilly, 2007, 978-0-596-10242-5. hal-02112942

HAL Id: hal-02112942
https://hal.archives-ouvertes.fr/hal-02112942
Submitted on 27 Apr 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Fonts & Encodings
Yannis Haralambous
Translated by P. Scott Horne

Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo

Fonts & Encodings
by Yannis Haralambous

Copyright © 2007 O’Reilly Media, Inc. All rights reserved. Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

Printing History: September 2007: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Fonts & Encodings, the image of an axis deer, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.
Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN-10: 0-596-10242-9
ISBN-13: 978-0-596-10242-5
[M]

Ubi sunt qui ante nos in mundo fuere?

To the memory of my beloved father, Athanassios-Diomidis Haralambous

This book would never have seen the light of day without the help of a number of people, to whom the author would like to express his thanks:

• His wife, Tereza, and his elder daughter, Ernestine (“Daddy, when are you going to finish your book?”), who lived through hell for a whole year.
• The management of ENST Bretagne, Annie Gravey (chair of his department), and his colleagues, for encouraging him in this undertaking and tolerating the inconveniences caused by his prolonged absence.
• His editor, Xavier Cazin, for his professionalism, his enthusiasm, and his friendship.
• Jacques André, for supplying tons of books, articles, leads, addresses, ideas, advice, suggestions, memories, crazy thoughts, etc.
• His proofreaders: Jacques André once again, but also Patrick Andries, Oscarine Bosquet, Michel Cacouros, Luc Devroye, Pierre Dumesnil, Tereza Haralambous, John Plaice, Pascal Rubini, and François Yergeau, for reviewing and correcting all or part of the book in record time.
• The indefatigable George Williams, for never failing to add new features to his FontForge software at the author’s request.
• All those who supported him by providing information or resources: Ben Bauermeister, Gábor Bella, Tom Bishop, Thierry Bouche, John Collins, Richard Cook, Simon Daniels, Mark Davis, Lisa Devlin, Bon Hallissy, Ken’ichi Handa, Alan Hoenig, Bogusław Jackowski, Michael Jansson, Ronan Keryell, Alain LaBonté, David Lemon, Ken Lunde, Jim Lyles, Sergey Malkin, Sabine Millecamps (Harrie Potter), Lisa Moore, Tomohiko Morioka, Éric Muller, Paul Nelson, David Opstad, Christian Paput, Thomas Phinney, Just van Rossum, Emmanuël Souchier, Naoto Takahashi, Bob Thomas, Adam Twardoch, Jürgen Willrodt, and Candy Lee Yiu.
• The foundries that supplied fonts or specimens for use in his examples: Justin Howes, P22, Thierry Gouttenègre, Klemens Burkhardt, Hoefler Type Foundry, Typofonderie Porchez, and Fountain Type.
• Emma Colby and Hanna Dyer of O’Reilly, for selecting that magnificent buck as the animal on the cover, doubtless because its coat is reminiscent of encoding tables and its antlers suggest the Bézier curves of fonts.
• Last but not least, Scott Horne, the heroic translator of this book of more than a thousand pages, who mustered all his energy and know-how to translate the technical terms correctly, adapt the book’s style to the culture of the English-speaking countries, and correct countless errors (even in the Chinese passages); in short, he prepared this translation with the utmost care. Just to cite one example, he translated the third stanza of Gaudeamus Igitur from Latin to archaic English (in verse, no less) for use in the dedication. The author will be forever grateful to him for all these contributions.

Contents

Introduction
  Explorations
    The Letter and Its Parts
    Letterpress Typesetting
    Digital Typesetting
    Font Formats
    Between Characters and Glyphs: the Problems of the Electronic Document
  The Structure of the Book and Ways to Use It
    How to Read This Book
  How to Contact Us

1 Before Unicode
  FIELDATA
  ASCII
  EBCDIC
  ISO 2022
  ISO 8859
    ISO 8859-1 (Latin-1) and ISO 8859-15 (Latin-9)
    ISO 8859-2 (Latin-2) and ISO 8859-16 (Latin-10)
    ISO 8859-3 (Latin-3) and ISO 8859-9 (Latin-5)
    ISO 8859-4 (Latin-4), ISO 8859-10 (Latin-6), and ISO 8859-13 (Latin-7)
    ISO 8859-5, 6, 7, 8, 11
    ISO 8859-14 (Latin-8)
  The Far East
  Microsoft’s code pages
  Apple’s encodings
  Electronic mail
  The Web

2 Characters, glyphs, bytes: An introduction to Unicode
  Philosophical issues: characters and glyphs
    First principles
  Technical issues: characters and bytes
    Character encoding forms
  General organization of Unicode: planes and blocks
    The BMP (Basic Multilingual Plane)
    Higher planes
    Scripts proposed for addition

3 Properties of Unicode characters
  Basic properties
    Name
    Block and script
    Age
    General category
  Other general properties
    Spaces
    Alphabetic characters
    Noncharacters
    Ignorable characters
    Deprecated characters
    Logical-order exceptions
    Soft-dotted letters
    Mathematical characters
    Quotation marks
    Dashes
    Hyphens
    Terminal punctuation
    Diacritics
    Extenders
    Join control
    The Unicode 1 name and ISO’s comments
  Properties that pertain to case
    Uppercase letters
    Lowercase letters
    Simple lowercase/uppercase/titlecase mappings
    Special lowercase/uppercase/titlecase mappings
    Case folding
  Rendering properties
    The Arabic and Syriac scripts
    Managing grapheme clusters
  Numeric properties
  Identifiers
  Reading a Unicode block

4 Normalization, bidirectionality, and East Asian characters
  Decompositions and Normalizations
    Combining Characters
    Composition and Decomposition
    Normalization Forms
  The Bidirectional Algorithm
    Typography in both directions
    Unicode and Bidirectionality
    The Algorithm, Step by Step
  East Asian Scripts
    Ideographs of Chinese Origin
    The Syllabic Korean Hangul Script

5 Using Unicode
  Interactive Tools for Entering Unicode Characters