Fonts with complex OpenType tables
Karel Píška Institute of Physics of the ASCR, v. v. i. piska (at) fzu dot cz August 30, 2010
Abstract experiments and experiences for dialogs and future collaboration with involved people. My main di- The paper presents development of complex Open- rection prefers the usage of fonts within TEX based Type fonts. The sample fonts cover Czech and software providing Unicode and OpenType support Georgian handwriting with numerous letter connec- —X TE EX [9] and ConTEXt/LuaTEX[11]. For Open- tions. Type the abbreviation “OT” will also be used in the At the beginning, we show general principles article. of “advanced typography” — complex metric data represented by OpenType tables (GSUB and GPOS) — and compare them with the ligature and kerning 2 Advanced typography tables in METAFONT. Then we describe a history of the OpenType Under “advanced typography” we can assume not font production — approaches, tools and techniques. only so called OpenType font technologies but also We discuss crucial problems, critical barriers, at- our good “old” TEX&METAFONT capability provid- tempts and ways how to reach successful solutions. ing sophisticated word-processing. We demonstrate several tools for font creating, test- ing, debugging and conversions between various text 2.1 TEX& METAFONT – clear and and binary formats. Among these tools are, for clean example, AFDKO, VOLT, FontForge, TTX, Font- TTF. We illustrate their features, advantages, dis- In fact, advanced typography with METAFONT and advantages, and also cases of possible incompatibil- TEX has been available for TEX users for many ities (or maybe bugs). years. METAFONT contains powerful tools like gen- Finally, we present using the OpenType fonts in eralized ligatures together with boundary charac- the TeX world applications: X TE EX and LuaTEX ters [1]: ( MkIV), the programs allowing to read ConTEXt .mf: ligtable % produces % .tfm/.pl and process OpenType fonts directly. a : b |=:| c ;% acb /LIG/ 1 Key words: font, font production, Unicode, Open- a : b |=:|> c ;% acb /LIG/> 2 Type, GSUB, GPOS; AFDKO, VOLT, FontForge, a : b |=:|>> c ;% acb /LIG/>> 3 TTX, Font-TTF; T X, METAFONT, TFM, X T X, E E E a : b =:| c ;% cb LIG/ 4 , LuaT X. ConTEXt E a : b =:|> c ;% cb LIG/> 5 a : b |=: c ;% ac /LIG 6 1 Introduction a : b |=:> c ;% ac /LIG> 7 a : b =: c ;% c LIG 8 The presented fonts are successors of the META- where FONT fonts designed in 1997–98 by Petr Olšák [2] and Karel Píška [3]. Last year (2009) I was not able 1. retains both a and b, inserts c between: acb to create a complete font with OpenType tables working properly. This year I have finally reached 2. retains both a and b, inserts c between; the positive results. The current article can be con- the processing continues after a: acb sidered as a report summarizing my recent studies,
101 3. retains both a and b, inserts c between; draw (-4,0){right}..{sklon2}(3,6); the processing continues after c: acb endchar; % right end of character 4. retains b, inserts c before b: cb beginchar(1, .7u#, 7u#, 0); 5. retains b, inserts c before b; draw (0,6)..(.7,7); the processing continues after c: cb endchar; 6. retains a, inserts c after a: ac and the ligtable instructions 7. retains a, inserts c after a; ligtable ||: "e" |=:|> 3; % ... the processing continues after a: ac ligtable "s": "t" |=:| 6; % ... boundarychar:=1; 8. substitutes both a and b by c. ligtable "a": rightboundaries; def rightboundaries = Boundary characters. The METAFONT and TEX concept of the “word boundary” (the left and right 1 |=:> 1, boundary characters) allows to “implicit” process- % ...... ing of the beginning and the end of the word, i.e. enddef; a substitution or adjustment of the letters in the invoke inserting the requested strokes in the left “initial” and the “final” position of the word. In boundary point (3), between ‘s’ and ‘t’ (6), and in METAFONT sources the left boundary characters the right boundary position (1). is denoted by "||:", the right boundary character must be introduced as the “real” character using the "boundarychar code";" assignment. These facilities allow to apply substitution and positioning rules with some restrictions: only the e s t a pair of two adjacent characters can be processed, it is impossible to look ahead for longer sequence in a simple way; the maximal of glyphs in one font is e s t a 256. Anyway, definitions of ligatures and kernings in METAFONT and then in TFM and also the pro- cessing algorithm in TEX are clear and clean. We es ta always know the actual position in the input stream METAFONT cannot process in a single and natu- and how to find the next rule from TEX metrics ta- ral way any character sequences consisting of three bles having to be applied. or more characters, e.g. triplets like "UN-KAN-AN" Abilities of METAFONT and TEX will be demon- and "UN-KAN-EN" in Georgian handwriting. strated by two short samples. Primarily, by default, Depending on the following character ("AN", "EN" the Latin (Czech) letters are in the “medial” form, or another) the original (“isolated” by default) glyph without connecting strokes. Then the TEX&MF "KAN" is replaced by its modified form or not: “machinery” joins the adjacent letters in words and adjusts the letters in the initial and final positions. ligtable GR_KAN: GR_AN =:| GR_kan_; The letter ‘e’ is preceded by one of the front-end and then it may be joined to the next character by strokes, ‘s’ and ‘t’ are joined by the correspondent the connecting stroke: inter-letter connecting stroke, and, finally, the last letter in the word is closed by the ending stroke. ligtable GR_kan_: GR_AN |=:| gr_en__an; METAFONT defines several initial, medial and final strokes (depending on concerned letters), for exam- ple: % left deflected end of character უ ა უკე beginchar(3, 6u#, 7u#, 0); draw (0,0){(3,2)}..{sklon2}(6,6); But no substitution and no kerning is defined for endchar; the pairs "UN-KAN" and "KAN-EN". After the pro- % shorter convex stroke for the pairs st,... cessing of the pair consisting of the 2nd and 3rd beginchar(6, 3u#, 7u#, 0); character and substituting of the second glyph we
102 უ ა უკე უკა უკე have no chance to return before the first character At first, we have to create an OpenType font and we are not able to adjust the kerning between properly, using some suitable tools. And, at second, the first and the second, modified, glyph (left) — the font must be in agreement with corresponding while "UN-KAN-EN" (right) needs no substitution or software to execute adequate operations according positioning changes. Two triplets above should be the rules (instruction) defined in the font. processed differently. It may be probably possible but the solution with METAFONT would not be triv- ial. 3 Tools to produce OpenType fonts 2.2 OpenType technology 3.1 Creating OpenType fonts 2.3 Advanced typography with Open- Type We assume that we already have Unicode encoded outline fonts but without OpenType features. We “Old TrueType fonts” can be “enriched” by adding have generated them earlier with FontForge. And “Advanced OpenType Typographic Tables” to pro- our aim and task is to produce OpenType, i.e. to duce fonts in OpenType format. We will not dis- enrich the fonts with OT tables. cuss two different format versions: “new” TTF and My fonts cover Czech, Georgian and also Arme- OTF, because the additional OT tables are com- nian handwriting letter repertoire (taught in pri- mon. mary/elementary schools). Opposite to Czech and Each feature is defined as a system of subsys- Georgian writing, the Armenian letters are designed tems called lookups. Any lookup is described as a and written in a simple way without no special join- subsystem consisted of substitution and positioning ers and there is no necessity to build any OpenType rules. Depending on script and language, a feature support/facility for Armenian. may be enabled or disabled. If the feature is enabled Sample of Armenian: and some lookup, contained in this feature, fulfil the given conditions, then the execution of correspond- ս ա տ ր ազատ ւ հավասար ing operations should be invoked. It is a signal and the real application must be executed by an applica- There is another problem – to distinguish the adja- tion program or operating system, e.g. by means of cent letters. a special library. OpenType introduces substitution In the following paragraphs we show the con- (GSUB), positioning (GPOS), and several other ta- struction of OT fonts using various tools, namely bles. VOLT, FontForge and AFDKO. These tables define the set of rules of several The specification of (binary) OT tables, data types specifying [from OpenType specification [4, formats of the VOLT project files, variants of fea- 5]]: ture files accepted by AFDKO, FontLab, FontForge Glyph substitution (GSUB) rules may all be different. Single, Multiple, Alternate, Ligature, Contextual, Chaining contextual, Extension, and Reverse Chain- 3.2 Managing OpenType with VOLT ing Single Substitution; Glyph positioning (GPOS) rules VOLT (Visual OpenType Layout Tool) [6], free prod- Single adjustment, Pair adjustment, Cursive attach- uct developed by Microsoft and running only un- ment, Mark-to-Base attachment, Mark-to-Ligature der MS Windows, offers an interactive approach to attachment, Mark-to-Mark attachment, Contextual, fill input areas with appropriate parameter values Chaining contextual, and Extension positioning. manually in the VOLT project window. Other pos- From METAFONT we inherited the entire let- sibility is writing and modifying source textual files ters with accents. Therefore we have not needed to in the VOLT project language. In the VOLT in- use marks and anchors and operations with them put area we can enter or in a text editor we have to assembly the complete letters from components to define glyphs (their names, types and code num- (accents, signs, marks, . . . ) and we do not use the bers), glyph groups (glyph sets or glyph lists), con- Mark positioning rules. On the other hand, the text conditions, substitution and positioning rules, fonts contain hundreds contextual substitution and and finally, to complete the hierarchy of scripts, lan- positioning rules. guages, features, and lookups; those data can be
103 saved and (re)read. Such method is reasonable and ALL, DIRECTION LTR or DIRECTION RTL, etc. In our purposeful/meaningful for fonts with several hun- case — DIRECTION LTR (“left to right”) — ‘left’ al- dreds contextual substitution and positional rules ways means ‘before’; similarly ‘right’ and ‘after’ is (our font contains about 350 glyphs, more than also the same. 600 substitutions and about 50 positionings). Of Representation of the VOLT Project data allows course, I have used interactive design and espe- insertions and also different rule types may be in one cially proofing tools for testing tasks but the files lookup because the VOLT compiler accepts such defining OpenType data have been completed in rules and can compile them when converting the the VOLT project (VTP) source/exchange format source data into binary OpenType tables. A final from some tables by scripting and editing of texts. binary font then includes more than 100 internal VOLT allows to read the font only in TrueType features, numbered zz01,. . . , zz99,. . . It means, the format, imports the VOLT project file, compiles higher level of the VOLT project language turns OT data and then generates the font with binary into more complexity of the compiled product. OT tables. I.e., VOLT adds OT tables and proofs The following text demonstrates several com- the features and lookups; it accepts only the fonts plete examples describing the lookups in the VOLT with OT tables produced by VOLT, other OT tables project textual representation as they are present deletes. Moreover, before the tests in the proofing in my source files of the VOLT based fonts. window we must always run (re)compilation even for VOLT based fonts, they embed additionally for 3.2.1 Glyphs, scripts, languages and features proofing the special tables ‘TSID’, ‘TSIP’, ‘TSIS’, and ‘TSIV’. But earlier we will mention other elements of the A general structure of a lookup with substitu- VTP. All glyphs must be listed in the glyph defi- tions in the VOLT project language (in some my nition section, each glyph command must have in symbolic notation) is: the DEF_GLYPH its unique name, its ordinal num- ber (index) in the font ID, its TYPE (BASE, MARK, COMPONENT, LIGATURE), the UNICODE number must DEF_LOOKUP lookup_name lookup_parameters be present for the Unicode coded glyphs and are [ IN_CONTEXT | EXCEPT_CONTEXT missing for the glyphs from Private Use Area (PUA). [ [ LEFT | RIGHT ] glyph_list ] Glyphs can grouped/collected in named groups ... to address the groups (glyph lists) in rules simply END_CONTEXT and shortly. DEF_GROUP commands may consist of ] glyph sequences GLYPH glyph_name, glyph ranges, .... and also other glyph groups, defined elsewhere but AS_SUBSTITUTION without ambiguity. SUB glyph_list WITH glyph_list END_SUB [ SUB glyph_list WITH glyph_list END_SUB ] DEF_GROUP "czever" ... ENUM RANGE "a" TO "z" GROUP "accver" END_SUBSTITUTION END_ENUM END_GROUP It is a sequence of one or more substitution rules and has the common contextual condition. The The names of glyphs and groups must quoted, eg. context may be defined as a compound logical ex- GLYPH "hyphen" or GROUP "czever". pression. During the evaluation process the glyphs from the given glyph lists before (LEFT) or after (RIGHT) are compared relative to the current glyph according their presence (IN_CONTEXT) or absence (EXCEPT_CONTEXT). Sequences of more LEFT or/and RIGHT subconditions can constitute left and right chains, their lengths depend of the numbers of the left and right conditions. It also possible to repeat more IN/EXCEPT_CONTEXT subexpressions, then the total result is their union. The lookup_parameters contain the instructions for processing like PROCESS_BASE, PROCESS_MARKS,
104 3.2.2 Single substitution Switching the corresponding feature we can invoke the substitution of the letters by their short variants. DEF_LOOKUP "GeorAlt" PROCESS_BASE ALL DIRECTION LTR AS_SUBSTITUTION დ SUB GLYPH "uni10D3" WITH GLYPH "GR_varD" END_SUB b n (bn) SUB GLYPH "uni10DA" WITH GLYPH "GR_varL" END_SUB ლ o m (om) SUB GLYPH "uni10DD" WITH GLYPH "GR_varO" END_SUB ო SUB GLYPH "uni10E0" WITH GLYPH "GR_varR" END_SUB v y (vy) END_SUBSTITUTION რ 3.2.3 Ligature substitution s m s m Typical ligatures can be define by unconditional substitutions. DEF_LOOKUP "liga" PROCESS_BASE ALL DIRECTION LTR s t s t AS_SUBSTITUTION s v s v SUB GLYPH "comma" GLYPH "comma" WITH GLYPH "quotedblbase" END_SUB s y s y SUB GLYPH "quoteleft" GLYPH "quoteleft" WITH GLYPH "quotedblleft" END_SUB SUB GLYPH "hyphen" GLYPH "hyphen" GLYPH "hyphen" WITH GLYPH "dash" END_SUBs z s z SUB GLYPH "hyphen" GLYPH "hyphen" WITH GLYPH "minus" END_SUB a z az END_SUBSTITUTION 3.2.4 Contextual substitution ზ ზ ზ Before the letters defined by the right context the letter listed later will beა changed ზ აზ to აზ their narrower variants (the unadjusted versions in parentheses). ნ ზ ნზ ნზ DEF_LOOKUP "CZEbmnvwy" PROCESS_BASE ALL DIRECTION LTR IN_CONTEXT შ ზ შზ შზ RIGHT ENUM GLYPH "m" GLYPH "n" GLYPH "ncaron" GLYPH "v" GLYPH "w" მ ზ მზ მზ GLYPH "y" GLYPH "yacute" END_ENUM END_CONTEXT ზ ზ ზ AS_SUBSTITUTION დ a c d g o q