A Guide to Developing Fonts for Indian Languages

A Guide To Indian Language Font Development Elaborate, Illustrated font development guideline document for scripts of India by font designers, developers, language experts. Editor Santhosh Thottingal Contributors Balasankar C Behdad Esfahbod Hrishikesh Kartik Mistry Kavya Manohar Rahimanuddin Shaik Rajeesh Nambiar Ryan Kaldari Vasudev Kamath March 31, 2014 https://github.com/IndicFontbook/Fontbook Dedicated to all font designers Contents 1 Introduction 1 1.1 Objectives of document ................... 2 1.2 Target Audience for this document ............ 2 1.3 Scope ............................. 2 1.4 How to use this document ................. 3 1.5 Notes on Collaboration ................... 3 2 General Concepts 5 2.1 Script ............................. 5 2.2 Complex script ....................... 5 2.3 Glyph ............................ 6 2.4 Ligatures ........................... 7 2.5 Cluster/Syllable ...................... 8 2.6 Akhand ........................... 8 2.7 Ra Forms .......................... 8 2.8 Split Matra ......................... 8 2.9 Reordering ......................... 9 2.10 Zero Width Joiner ..................... 9 2.11 Zero Width Non Joiner .................. 9 2.12 Stacking ........................... 9 3 Open font format 11 3.1 Introduction ......................... 11 3.2 GPOS ............................. 11 3.3 GSUB ............................. 11 3.4 GDEF ............................. 11 3.5 Shaping Engine ....................... 11 3.6 Shape Glyph sequence ................... 12 3.7 Position Glyph sequence .................. 12 iii 3.8 Reference fonts ....................... 12 3.9 Reference Rendering Engine ................ 12 4 Bengali 13 4.1 Introduction ......................... 13 4.2 Reference fonts ....................... 13 4.2.1 Lohit Bengali .................... 13 4.2.2 History ........................ 13 5 Devanagari 15 5.1 Introduction ......................... 15 5.2 Reference fonts ....................... 15 5.2.1 Lohit Devanagari .................. 15 5.2.2 History ........................ 16 6 Gujarati 17 6.1 Introduction ......................... 17 6.2 Reference fonts ....................... 17 6.2.1 Lohit Gujarati .................... 17 6.2.2 History ........................ 18 7 Kannada 19 8 Malayalam 21 8.1 Introduction ......................... 21 8.2 Orthography variation ................... 22 8.3 Reference fonts ....................... 23 8.3.1 Meera Font ..................... 23 8.3.2 Lohit Malayalam Font ............... 24 8.3.3 History ........................ 24 8.4 Technical details ....................... 25 8.4.1 Opentype Script Tags - mlym and mlm2 .... 25 8.4.2 Reordering ..................... 25 8.4.3 Vowel signs and combining marks ........ 25 8.4.4 Samvruthokaram .................. 28 8.4.5 Conjunct Signs for , , .............. 30 8.4.6 Reph ......................... 30 8.4.7 Dot Reph ....................... 31 8.4.8 Chillus ........................ 32 8.4.9 Stacking ....................... 33 8.4.10 Font metrics ..................... 34 8.4.11 Positioning rules .................. 34 8.4.12 ZWNJ and ZWJ Signs ............... 34 8.4.13 Prebase substitutions ................ 34 8.4.14 Akhand forms .................... 34 8.4.15 Below base forms .................. 34 8.4.16 Below base substitutions .............. 34 8.4.17 Half forms ...................... 34 8.4.18 Postbase forms ................... 34 8.4.19 Latin glyphs and punctuations .......... 34 8.4.20 Kerning ....................... 34 8.4.21 Shape references .................. 34 8.4.22 Left and right bearings ............... 34 8.4.23 Italic variant ..................... 34 8.4.24 Bold variant ..................... 34 8.5 Design ............................ 34 8.5.1 Number of glyphs ................. 34 8.5.2 Guidelines ...................... 35 9 Odiya 39 10 Panjabi 41 11 Tamil 43 12 Telugu 45 12.1 Introduction ......................... 45 13 Glossary 47 14 References 51 15 Appendices 53 1 Introduction "I can't go to a restaurant and order food because I keep looking at the fonts on the menu.-Donald Knuth" One of the integral building blocks for providing multilingual sup- port for digital content are fonts. In current times, OpenType fonts are the choice. With the increasing need for supporting languages beyond the Latin script, the TrueType font specification was extended to include elements for the more elaborate writing systems that exist. This effort was jointly undertaken in the 1990s by Microsoft and Adobe. The outcome of this effort was the OpenType Specification - a succes- sor to the TrueType font specification. Fonts for Indic languages had traditionally been created for the print- ing industry. The TrueType specification provided the baseline for the digital fonts that were largely used in desktop publishing. These fonts however suffered from inconsistencies arising from technical shortcomings like non-uniform character codes. These shortcomings made the fonts highly unreliable for digital content and their use across platforms. The problems with character codes were largely alleviated with the gradual standardization through modification and adoption of Unicode character codes. The OpenType Specification additionally extended the styling and behavior for the typography. The availability of the specification eased the process of creating Indic language fonts with consistent typographic behavior as per the script's requirement. However, disconnects between the styling and techni- 1 cal implementation hampered the font creation process. Several well- stylized fonts were upgraded to the new specification through com- plicated adjustments, which at times compromised on their aesthetic quality. On the other hand, the technical adoption of the specification details was a comparatively new know-how for the font designers. To strike a balance, an initiative was undertaken by the a group of font developers and designers to document the knowledge acquired from the hands own experience for the benefit of upcoming developers and designers in this field. The outcome of the project will be an elaborate, illustrated guideline for font designers. A chapter will be dedicated to each of the Indic scripts - Bengali, Devanagari, Gujarati, Kannada, Malayalam, Odia, Punjabi, Tamil and Telugu. The guidelines will outline the technical representation of the canonical aspects of these complex scripts. This is especially important when designing for complex scripts where the shape or positioning of a character depends on its relation to other characters. This project is open for participation and contributors can commit directly on the project repository. 1.1 Objectives of document 1.2 Target Audience for this document 1.3 Scope • This document covers only languages and scripts recognized in India. • This document covers all complex OFF features required for Indian scripts. • This document is based on the current expertise of community members working in this area. • This document is from the typography perspective for each script and many not be linguistically correct. • This document does not cover design or calligraphy style aspects but covers only technical aspects. • This document is based on Unicode 6.2 and ISO/IEC 14496- 22:2009 (Second Edition) "Open Font Format" standard. • This document is not a tutorial on font design or development and does not teach typography. 1.4 How to use this document Elaborate, Illustrated font design guideline document for Indic fonts by font designers, developers, language experts. - designer freedom to adapt 1.5 Notes on Collaboration 2 General Concepts 2.1 Script At least one set of defined base elements or symbols, is a requirement for all writing systems. these symbols are individually termed as characters and collectively called a script. simply the set of the symbols required to represent a writing system is called a script. a script may in turn be used to represent more than one languages. Latin, Devanagari and Arabic are examples of scripts. English, French, Ger- man, and Latin are all languages written using the Latin script. 2.2 Complex script Complex script is a writing system in which the shape or positioning of a character depends on its relation to other characters. What makes it complex? * Bi-directional: text is written right-to-left (For exam- The Devanagari ddhrya-ligature, as displayed in the ple: Arabic, Hebrew) and left-to-right JanaSanskritSans font. (For example: Devanagari) * Context sensitive shaping and ligatures: Character may change shape depending upon position. 5 * Ordering: In Gujarati, Ki is where 'i' is place before 'K'. //FIXME What is a complex script? What makes it complex, some examples, screenshots 2-3 paragraph + images How it differs from simple scripts like Latin Character 2.3 Glyph A glyph is an element of writing. It can be a single character or a group of characters. Visually, if you see one or more characters form a single visual unit, it is called a glyph. In typography, it is "the specific shape, design, or representation of a character". 1 It is a particular graphical representation, in a particular typeface, of an element of written language, which could be a grapheme, or part of a grapheme, or sometimes several graphemes in combination. To illustrate this concept, a set glyphs inside a Latin font (Fig. 2.1) and a Malayalam font (Fig. 2.2) as seen in fontforge is given below. Figure 2.1: Glyphs inside Roboto font 1Ilene Strizver. "Confusing (and Frequently Misused) Type Terminology, Part 1". fonts.com. Monotype

A Guide to Developing Fonts for Indian Languages

75 Characters Maximum

Uniform Rendering of XML Encoded Mathematical Content with Opentype Fonts

The Unicode Cookbook for Linguists: Managing Writing Systems Using Orthography Profiles

Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress

Handwriting Recognition in Indian Regional Scripts: a Survey of Ofﬂine Techniques

Optical Character Recognition - a Combined ANN/HMM Approach

Opentype Postscript Fonts with Unusual Units-Per-Em Values

CM-Super: Automatic Creation of Efficient Type 1 Fonts from Metafont

License Agreement

Updated Proposal to Encode Tulu-Tigalari Script in Unicode

Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)

PDF Copy of My Gsoc Proposal