Indic Typesetting – Challenges and Opportunities
Total Page:16
File Type:pdf, Size:1020Kb
Indic Typesetting – Challenges and Opportunities S. Rajkumar Linuxense Information Systems, “Lalita Mandir”, 16/1623, Jagathy, Trivandrum 695014, India [email protected] Abstract Asia boasts a wide variety of scripts, most of which are complex from the perspec- tive of a computer scientist or engineer. This is true in the case of Indic scripts which are classified in the realm of complex scripts. All of the Indic scripts re- quire a special process to transform from the a Unicode text to the actual glyph metrics for TEX to process. In this paper I will talk about the processing of Unicode text to produce high quality typeset material for Indic scripts using OpenType fonts. I will cover the OpenType standards for Indic scripts and other facilities OpenType provides for advanced typesetting. Introduction a full 16 bits and is based on Unicode, and thus supports 64k glyph in a single font. T X has remained and continues to remain one of E OpenType also supports advanced typograph- the best typesetting systems of the world. But with ical control such as ligatures, kerning, small caps the passage of time, T X has been used for purposes E etc, which were available in T X for a long time, that were not taken into account during its design E plus swash variants, contextual ligatures, old-style phase. One such “flaw” is that it was based on 8- figures, multi-script baselines etc, which are not part bit tables internally. This applies among others to of T X. What sets apart OpenType from others that the number of glyph in font files and the number of E offer these features, including T X, is that the ren- characters in a language. The original 8-bit design E derer need not be aware of all the features available is fine for Roman scripts but inadequate for complex in the fonts. OpenType fonts are capable of giving scripts like Indic or CJK scripts. this information to the rendering engine. This en- When T X was designed there were very few E ables the type designer to let her imagination run programs that could really do any meaningful “type- loose without being hampered by the limits of the setting”. Font standards were almost nonexistent. rendering engine. If it is present in the font it will As time passes the non-T X world is also trying to E be used by the renderer. catch up with T X. One such promising technol- E This is in contrast with the T X, where a clear ogy is OpenType. It is an industry standard font E encoding of a font is required for T X to work. This technology formed by the fusion of TrueType and E problem has been sorted out in Latin scripts where Type 1. OpenType has many advanced features the possible numbers of glyph to render a text is that were the exclusive domain of T X so far and E finite and a robust encoding scheme is in place for adds even more. font designers to follow. But not so in Indic scripts. The rest of the paper dwells more on the two In languages like Malayalam there are very many subjects covered above and takes a look at how they conjuncts or ligatures possible, and not all fonts have can be used to advance the capabilities of T X. E all of them. Unlike in Latin scripts, the ligatures are not an advanced typographic tool but an absolute OpenType and Internationalization requirement for legible reading. One of the most exciting internationalization devel- The number and placement of glyphs for high opment happening outside the TEX world is the de- quality typography in Malayalam is still being de- velopment of OpenType technology [1]. OpenType bated. Furthermore, the Malayalam script itself is is an amalgamation of earlier TrueType and Type 1 divided into an original script and a reformed script, technologies and designed from ground up to sup- each with a slightly different set of glyphs and rules port complex scripts like Indic and Arabic. It uses for vowel consonants formation. 90 TUGboat, Volume 23 (2002), No. 1 — Proceedings of the 2002 Annual Meeting Indic Typesetting – Challenges and Opportunities Another problem with Indic computing in gen- Registered Features for Indic Scripts eral is that there are very few high quality fonts Registered features can be divided into language fea- available and thus reusing fonts is a priority. Since tures which encode the linguistic rules; conjuncts the entire non-T X world is moving towards Open- E and typographic forms, which are used for typo- Type as a single font standard, more and more new graphical substitution; and conjunct creation and fonts to appear will be based on OpenType. All this positioning features which are used to position the points to the fact that a possible Omega implemen- various markings for vowel modifiers along the final tation that uses OpenType will be ideal for Indic glyph. The language features listed in their order of languages implementation of T X. E applications are: OpenType Rendering for Indic scripts Linguistic Rules OpenType rendering is the process of converting the nukt This feature takes nominal (full) forms of con- plain Unicode input into a set of glyph indexes in a sonants and produces nukta forms. All nukta font file so that the text can be rendered on a de- forms must be based on an input context con- vice such as a screen or printer. This is much more sisting of the full form of consonants. All con- complex than it sounds, since the font itself con- sonants in a font must have an associated nukta tains meta-information in the form of various fea- form, and nukta forms must exist in the font for tures. Features are script- and language-dependent all glyphs with akhand forms as well. and standard features are registered in the renderer. akhn This feature creates an akhand ligature glyph This enables font designer to use the registered fea- from two consonants in nominal forms sepa- tures so that consistent high quality output is pro- rated by a halant. The input context for the duced when rendered and renderers can be written akhand feature always consists of the full form without being tied to particular fonts. of the consonant. The standards for Indic scripts are designed by rphf Applying this feature produces the reph glyph. Microsoft Corporation [2] and are publically avail- If the first consonant of the cluster consists of able. The latest Microsoft rendering engines support the full form (Ra + Halant), this feature substi- most of the Indic scripts but not all. Yudit [3] is one tutes the combining-mark form of Reph. In ad- free editor which also supports most Indic languages. dition, the glyph that represents the combining- An Indic OpenType rendering engine processes mark form of Reph is repositioned in the glyph text in different stages. These stages include: string: it is attached to the final base glyph 1. analyzing the syllables of the consonant cluster. The input context for 2. reordering characters the Reph feature always consists of the full form of Ra + Halant. 3. shaping (substituting) glyphs according to the blwf Applying this feature creates below-base instructions in the font forms of consonants. The input context for the 4. positioning glyphs ‘below-base form’ feature must always consist of During the analysis stage the engine takes the the full form of the consonant + Halant. The stream of Unicode characters and finds the syllable feature ‘below-base form’ is applied to conso- boundaries. Once a syllable boundary is found it nants having below-base forms and following is analyzed to find the features that can be applied the base consonant. The exception is vattu, to that syllable, such as possibly combining to pro- which may appear below half forms as well as duce a distinct glyph. Next the base consonant is below the base glyph. The feature ‘below-base identified. All other elements are classified accord- form’ will be applied to all such occurrences of ing to the base consonant like pre-base, post-base, Ra as well. etc. Next the components that appear in more than half Applying this feature produces so-called half one side are split into component parts. At this forms: forms of consonants used in pre-base po- point the syllable is reordered according to the ap- sition. Half forms must exist for all consonants propriate rules of the language. The reordered part in the font, and half forms of nukta consonants is passed to the glyph substitution part of the en- and Akhand consonants also must exist. Use gine to obtain a reordered glyph string. The var- the halant form for consonants that do not have ious contextual shape features are applied to this distinct shapes for half forms. This feature is nominal glyph string to obtain the final output for not applied to the base glyph even if the syllable rendering. ends with a halant. TUGboat, Volume 23 (2002), No. 1 — Proceedings of the 2002 Annual Meeting 91 S. Rajkumar pstf Applying this feature creates post-base forms. psts Post-base substitutions: This feature produces Examples include Bengali and Oriya ‘Ya’ and ligatures of the base glyph with post-base forms Malayalam ‘Ya’ and ‘Va’. of consonants. It also produces the correct ty- vatu Vattu variants are formed by combining con- pographic shape when a post-base matra forms sonants with the vattu mark. Vattu ligatures a ligature with the base glyph and different can be either half or full form, and fonts must forms of post-base vowel modifiers like visarga.