Extending TEXforUnicode

Richard J. Kinch 6994 Pebble Beach Ct Lake Worth FL 33467 USA Telephone (561) 966-8400 FAX (561) 966-0962 [email protected] URL: http://styx.ios.com/~kinch

Abstract TEX began its “childhood” with 7-bit-encoded , and has entered adolescence with 8-bit encodings such as the Cork standard. Adulthood will require TEX to embrace 16-bit encoding standards such as . Omega has debuted as a well-designed extension of the TEX formatter to accommodate Unicode, but much new work remains to extend the fonts and DVI translation that make up the bulk of a complete TEX implementation. Far more than simply doubling the width of some variables, such an extension implies a massive reorganization of many components of TEX. We describe the many areas of development needed to bring TEX fully into multi-byte encoding. To describe and illustrate these areas, we introduce the r TrueTEX Unicode edition, which implements many of the extensions using the Windows Graphics Device Interface and TrueType scalable technology.

Integrating TEXandUnicode especially to a commercial product in an inter- national marketplace. You cannot use T X for long without discovering E • With access to Unicode fonts, the natural that is a big, messy issue in every ability of T X to process the large character sets implementation. The promise of Unicode, a 16-bit E of the Asian continent will be realized. Methods character-encoding standard [15, 14], is to clean up such as the Han unification will be accessible. the mess and simplify the issues. • T X will install with fewer font and driver files. While Omega [5, 13] has upgraded TEX-to-DVI E translation to handle Unicode [3], the fonts and Many 8-bit fonts will fit into one 16-bit font, and in systems like Windows, which treat fonts DVI-to-device translators are far too entrenched in narrow encodings to be easily upgraded. This paper as a system-wide resource, fewer fonts are an will develop the concepts needed to create Unicode advantage. Only one application will be needed to translate from Unicode DVI to output device. TEX fonts and DVI translators, and exhibit our • T X documents will convert to other portable progress in the TrueTEX Unicode edition. E forms (like PDF,OpenDoc,orHTML) and will A fully Unicode-capable TEX brings many substantial benefits: work with Windows OLE, without tricks and without pain. • TEX will work smoothly with non-TEXfonts. • and other TEX fonts will be While TEX already has a degree of access to usable in non-TEX Unicode applications. The 8-bit PostScript and TrueType fonts, there are 8-bit encoding problems have broken Computer many limitations that Unicode can eliminate. Modern on every variety of Windows. • When 16-bit encodings overcome the resistance • TEX will eliminate the last vestiges of its deep- of the past — and we have every reason to hope seated bias for the English language and US ver- that they will — TEX will play a continuing role sions of multilingual platforms like Windows. It in of the future, instead of becoming will adapt freely and instantaneously to other an antique. languages, not just in the documents produced, Claiming these promises involves some trouble along but in its run-time messages and user interface. the way, but without 16 bits to use for encoding, we This flexibility is crucial to quality software, will never have a solid solution.

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 147 Richard J. Kinch

Let us survey in the rest of this paper what is Two, these tables have to be stored in files, needed to achieve these various aspects of integrat- and we need to carefully and deliberately extend ing TEXandUnicode. the file formats to handle the extensions. Not only could we come up with a bad design that limits us Omega and Unicode unnecessarily in the future, but all the old TEXware The goal of our work has been to create a Unicode- has to be upgraded, and then we have to port the capable DVI translator, and to reorganize the TEX upgrades. fonts into a Unicode encoding. TEX itself (that Three, we must rationalize the old ad-hoc is, the formatter) already has a Unicode successor, character sets into a big union set. Just cataloging namely Omega. and managing this data is a large task: many The chief advance of Omega is that it gener- tens of thousands of items, where we used to have alizes the TEX formatter to handle wider encod- only hundreds before. Some degree of database ings. What Omega is less concerned with is the management tools must be applied to get the codes DVI format (which has always provided for wider into a form which we can compile into software; it encodings, up to 32 bits), the encoding of the exist- is not enough to just type in some array initializers ing TEX fonts, and the translation of .dvi files for here and there. output devices. In fact, Omega has side-stepped DVI Encoding standards are necessarily incomplete translation altogether with its extended xdvicopy or imprecise in some aspects, and none fit the translation, whereby Omega operates within the old TEX enterprise. While many of the Unicode math environment of 8-bit TEX fonts and the old DVI symbols were taken from TEX, many of the TEX translators. Since Unicode rendering is supported characters are missing from Unicode. But Unicode on Windows but not on other popular TEX platforms is about the closest encoding to TEXmaththat (, DOS, etc.), a devotion to Omega’s porta- we can expect from an unspecialized encoding, and bility requires that Omega use the old fonts and with Unicode we gain a powerful connection to DVI translators. Lacking any compulsion to extend multilingual character sets. the DVI translators for Unicode, the Omega project Extending Computer Modern to Unicode has justifiably invested most of its effort into earlier stages of the process [4, page 426]. A “rational” encoding establishes a mapping of char- Our Unicode TEX fonts and Unicode DVI acter names to unique integers, and this mapping translator, while having a natural connection to does not vary from font to font. The Computer Omega, are capable of connecting TEX82 to Unicode Modern fonts were not encoded rationally. For as well. Through the mechanism of virtual fonts [11], example, code 0x7b is overloaded about 8 different TEX can access Unicoded fonts while using its old characters, and character dotlessi appears in differ- 8-bit encodings itself. ent codes in different fonts. Given an 8-bit limit on encoding, this was inevitable. But this makes What’s the fuss? for many troubles; moving up to a rational, 16-bit Wishing for 16 bits of Unicode sounds like, hey encoding is a clean solution. presto, we just widen some integer types, double Computer Modern is also “incomplete” in the some constants, and type “make” somewhere very, sense that if you made a table having on one axis very high in a directory tree. The task is far from the list of all the specific styles (Roman, Italic, this simple for several reasons: typewriter, sans serif, etc.) and on the other axis One, TEXandTEXware is full of 256-member all the characters in all the fonts (A–Z, ,

tables which enumerate all code points. These , math symbols, etc.), the table would have

AFONT would have to grow to 65,536 members. While lots of holes when it came to what MET Haralambous and Plaice want us to agree that source exists. Commercial text fonts have all of this is “impossible” for practical reasons [6], they these holes filled, or at least the regions populated assume that we are not going to re-implement the in the table are rectangular. In Computer Modern 8-bit-encoded software for sparse arrays. Applying the regions are randomly shaped. sparse-array techniques to manage per-character Furthermore, the character axis of this (very data will avoid an impossible increase in execution large) imaginary table is missing many characters time and/or memory, although it will require an considered important in non-TEX encodings. For initial extra effort to upgrade the software. example, ANSI characters like florin, perthousand,

148 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Extending TEXforUnicode cent, currency, yen, brokenbar, etc., are not imple- names1 in common use come from an assort- mented in Computer Modern. Certain of these sym- ment of sources, and they exhibit inconsisten- bols can be unapologetically composed from exist- cies, conflicts, and ambiguities which frustrate ing Computer Modern symbols: a Roman multiply computation of set projections and joins. or divide would come from the math symbol font, • A database of TEX encodings, which tells which trademark from the T and M of a smaller optical characters appear in which TEXfonts.TEX

size, and so on. But many other characters will uses a gumbo of no less than 28 (!) distinct

AFONT just have to be autographed anew in MET encodings (Table 1). This number may come (at least one related work is in progress [6]). as a surprise, but has been hidden by the

The job of extending Computer Modern to be a

AFONT of MET source files. The sum of all TEX rational and complete set of fonts first requires that encodings constitutes a database of 3415 name- we reorganize the existing characters into a clean, to-code pairs, each name being taken from the 16-bit encoding. Then we are in a strong position 1108 members of TeXunion.Wedesignate to fill in the missing characters. this set of encodings (that is, a set of mappings We not only want to give TEX access to 16-bit- of names to integers) as TeXencod. encoded fonts, we also want the converse: non-TEX As if the miasmic fog of encoding conventions applications to have access to the Computer Modern were not confused enough, small-caps fonts fonts in TrueType form. This mandates adherence present still more encoding problems. They to the Unicode standard wherever possible, and represent an axis of variation that is hardly an organized method to manage the non-Unicode defined in the usual set of font parameters. characters in Computer Modern. We must consider small-caps characters to be Here is a list of the components we consider different from their corresponding parents. If essential to a Unicode rationalization of Computer this is not done, then there is no way to Modern. In this list we take a different approach compose virtual small-caps fonts from their from Haralambous’ Unicode Computer Modern lowercase counterparts, because we would have

project [7], which is aimed at producing virtual fonts no way of knowing which characters are to

AFONT which resolve to 8-bit .pk fonts from MET . shrink (a jumbled set of letters and accented Our aim is a set of Unicoded TrueType fonts. letters) and which do not (punctuation and

all the rest). Thus, for each encoding used

AFONT • A MET -to-outline converter, a very diffi- cult although not impossible task, as illustrated anywhere by a small-caps font, we must make in MetaFog [9]. a duplicate small-caps version (altering the lowercase character names to small-caps names) • A database system to treat the converted of the encoding in the list of all the encodings. Computer Modern as atoms, for input Thus we have a csc2 for roman2, t1csc for t1, to a TrueType font-builder. and so on. To produce these duplicate encodings, we • A database of character names which covers all need a rule to convert lowercase names (both the characters in the TEX fonts and in Unicode. letters a–z and diacritical letters) to small-caps We call the grand union TEX character set lowercase names and back. We have simply TeXunion, in the same way that we denote been appending “sc” to the name (this works the Unicode characters as the Unicode char- because there are no collisions with names that acter set. (We will use Small Caps to indi- happen already to end in “sc”). For example, a cate a formal set.) TeXunion contains 1108 small-caps “a” is “asc”. characters by the present inventory. Produc- Adobe has been appending “small” to their ing this database involves some work because names (this often causes character names to there are no standards for TEX character names exceed a traditional limit of 15 characters in (that is, single-word alphanumeric names such length), as in the MacExpert encoding [2]. This as are used in PostScript encodings). The stan- is done in an irregular manner by appending dard Unicode character descriptions are lengthy to the uppercase character names (for example, phrases instead of single words, making them 1 unjoinable to the TEX names. For example, the There is an attempt at standardization in PostScript- Unicode standard provides the verbose entry style names from the Association for Font Information left-pointing double Interchange (AFII), but the standard is proprietary and on for the code 0x00ab,“ paper only. The names are serial numbers as opposed to angle ”. The PostScript abbreviated descriptions.

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 149 Richard J. Kinch

indicates which codes are letters or diacritics. Table 1: T X Single-Byte Encodings E Another related limitation of T1 encoding is the (TeXencod). Covers all Computer Modern, lack of small-caps accents. AMS math symbol, and Euler fonts. Each item A wealth of code positions does not exempt maps a set of 128 or 256 character names to Unicode from pecksniffian absurdities. The integers. Unicode committee will not provide encodings Name Description for a small-caps alphabet (small-caps being a csc0 TEX caps and small caps (ligs =0) matter of and not information con- csc1 TEX caps and small caps (ligs =1) tent), although they provide encodings for some euex AMS Euler Big Operators small-caps characters (which appear in older eufb AMS Euler Fraktur Bold† encoding standards subsumed by Unicode). A S eufm M Euler Fraktur • A rationalization of the TEX character sets into eur AMS Euler their largest common subsets (Table 2). This A S eus M Euler represents the relations between the TEXen- lasy LATEXsymbols codings and the character subsets as organized

A

AFONT lcircle LTEX circles in Knuth’s MET sources. The first item

line LATEX lines in each entry of Table 2 gives the TEXencoding

AFONT logo MET logo as given in Table 1, known by the .mf source manfnt TEXbook symbols font file used to generate the font; the remaining

mathex2 TEX math extension items are the common subsets generated via the

AFONT mathit1 TEX math italic (ligs =1) MET source files of the same names. mathit2 TEX math italic (ligs =2) The relation set forth in Table 2 is not re- mathsy1 TEX math symbols (ligs =1) fined for the distinctions regarding the mathsy2 TEX math symbols (ligs =2) setting. Certain of Knuth’s encodings appear A S msam M symbol set A overqualified, namely, mathex2, mathit{12}, A S msbm M symbol set B and mathsy{12} do not vary with the ligs set-

roman0 T X Roman (ligs =0)

AFONT E ting, although it is specified in the MET roman1 TEX Roman (ligs =1) driver file.2 roman2 TEX Roman (ligs =2) These decompositions of the various TEXen- t1 LAT XNFSST1encoding E codings may be considered close to the “great- t1csc T1 with small caps est common” subsets, although we do not re- texset0 T X “texset” encoding (ligs =0) E quire a full decomposition here. To be com- textit0 T X text italic (ligs =0) E pletely decomposed, the {012}-numbered items textit2 T X text italic (ligs =2) E on the right should be further decomposed into title2 T X 1-inch capitals (ligs =2) E the unnumbered common set and the various † A superset of eufm with two extra chars numbered differential sets. The sets on the right column of Table 2 we will use below as the set known as TeXpages. We have not yet made the Adobe small-caps for aacute is Aacutes- the effort to elaborate the members of each mall, while ours is aacutesc); apparently some- TeXpages set, which is needed to compute the one mistook appearance for semantics. Fur- remaining work to complete the style axes of thermore, Adobe has a small-caps version of Computer Modern. bare diacritics in their MacExpert encoding, al- • A database giving the mapping of TEXfonts though the diacritical character name is irregu- to their encodings as known above (Table 3). larly changed to an initial capital (for example, The table below lists TEX font names and their Adobe small-caps for acute is Acutesmall). encoding name; an N indicates a wildcard for The LATEX T1 encoding, which was supposed any optical size integer, excluding sizes of to have been uniform for all DC fonts, also has the same style already matched earlier in the an irrational aspect, in that the T1 encoding is table. If a new optical size for a font name is overloaded when it is applied to both lowercase not in this table, the presumption should be and small-caps fonts. Somewhere in the 2 Also, the comments at the top of romsub.mf are in error A LTEX macros is buried something tantamount about what happens when ligs = 2. Apparently, no one has to another small-caps encoding of T1, which tried any other ligs setting!

150 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Extending TEXforUnicode

Table 2: TeXencod Decomposition into Table 3: Mapping of TeXfonts to TeXencod. TeXpages. Set romanlsc is romanl with N indicates an optical point size; asterisk a suffix small-caps semantics; it does not actually appear wildcard. in Computer Modern. Braced digits indicate TEXFont Encoding TEXFont Encoding factored suffixes. cmb10 roman2 cmbsy5 mathsy1 TeXencod Covering TeXpages cmbsyN mathsy2 cmbxN roman2 Member Members cmbxsl10 roman2 cmbxti10 textit2 csc0 accent0 cscspu greeku punct cmcscN csc1 cmdunh10 roman2 romand romanp romanu rom- cmexN mathex2 cmff10 roman2 spu romsub0 romanlsc cmfi10 textit2 cmfib8 roman2 csc1 accent12 comlig cscspu greeku cminch title2 cmitt10 textit0 punct romand romanp romanu cmmi5 mathit1 cmmiN mathit2 romspu romsub1 romanlsc cmmib5 mathit1 cmmibN mathit2 mathex2 bigdel bigop bigacc cmmr10 mathit2 cmmb10 mathit2 mathit{12} romanu itall greeku greekl cmr5 roman1 cmrN roman2 italms olddig romms cmslN roman2 cmsltt10 roman0 mathsy{12} calu symbol cmssN roman2 cmssbx10 roman2 roman0 accent0 greeku punct romand cmssdc10 roman2 cmssiN roman2 romanl romanp romanu romspl cmssq8 roman2 cmssqi8 roman2 romspu romsub0 cmsy5 mathsy1 cmsyN mathsy2 roman1 accent12 comlig greeku punct cmtcsc10 csc0 cmtexN texset0 romand romanl romanp ro- cmtiN textit2 cmttN roman0 manu romspl romspu romsub1 cmu10 textit2 cmvtt10 roman2 roman2 accent12 comlig greeku punct lasy* lasy lcircle* lcircle romand romanl romanp ro- line* line logo* logo manu romlig romspl romspu manfnt manfnt msam10 msam texset punct romand romanl romanp msbm10 msbm dccscN t1csc romanu tset tsetsl dctcscN t1csc dc* t1 textit0 accent0 greeku itald itall italp euex* euex eufb* eufb italsp punct romanu romspu eufm* eufm eurb* eur romsub0 eurm* eur eusb* eus textit2 accent12 comlig greeku itald eusm* eus italig itall italp italsp punct romanu romspu msam calu asymbols – macron vs. overscore msbm calu bsymbols xbbold – minus vs. vs. endash vs. sfthyphen ... (...andsoonfortherest...) vs. – grave vs. quoteleft in code 0x60 that its encoding ought to be the lowest optical – (0x40) vs. nbspace (0xa0) vs. visi- size in the table of the same name. The wild blespace vs. spaceopenbox vs. spaceliteral card “*” matches any suffix, such as variations – rubout in code 0x7f on style or optical size, for names which do not – ring vs. degree match higher in the table. We designate the set – dotaccent vs. periodcentered vs. middot {cmb10,..., eusm*} as TeXfonts. vs. dotmath; Zdotaccent vs. Zdot, etc. • A fuzzy-matching operator which, when join- – quotesingle vs. quoteright ing, selecting, and projecting the above data- – slash vs. virgule bases, can resolve the redundancy, synonyms, – star vs. asterisk and ambiguities in the character names and their composition. Here is an inventory of issues – oneoldstyle vs. one, etc. known to date: – diamondmath vs. diamond vs. – bar () vs. brokenbar (vertical – openbullet vs. degree broken bar) – nabla vs. gradient

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 151 Richard J. Kinch

– cwm vs. compoundwordmark (a T1 prob- means that the character ff (which is a lig- lem) vs. zeronobreakspace ature not to be found in Unicode) carries a – perzero vs. zeroinferior vs. perthousand- recommended private-zone code assignment of zero para perthousand 0xf001 or 0xfb01. One or more such recom- – slash vs. suppress vs. polishlslash, as in mended codes may appear, in order of pref- Lslash and Lsuppress erence. In resolving a private-zone conflict, a font-building program may take the recom- – Ng vs. Eng (and ng vs. eng) mended codes in order until a non-conflicting – hungarumlaut vs. umlaut, as in Ohun- code is found. Only after the recommended garumlaut vs. Oumlaut codes are exhausted should the program make – dbar vs. thorn a random private-zone assignment. Codes may – tilde vs. asciitilde be given in decimal, octal (leading 0), or hex – tcaron transmogrifies to tcomma, et al. (leading 0x) formats. Programs using these – vs. mu1 vs. micro, code 0xb5 tables take care to distinguish character codes (which contain only digits if start- – Dslash vs. Dmacron, code 0x0110;dslash ing with 0x, otherwise only octal digits if start- vs.dmacron,code0x0111 ing with a leading zero, otherwise only decimal – florin (not in Unicode) vs. fscript, code digits) from names (anything else, including 0x0192 names which start with digits). Sans – fraction vs. fraction1 vs. slashmath, code Unicode contains some names like “2500” (for 0x2215 code position 0x2500); if this presents a prob- – circleR vs. registered, circlecopyrt vs. lem we might have to prefix a letter to these copyright names. – arrowboth vs. arrowlongboth A name may appear in more than one – aleph vs. alef vs. alephmath synonym group, although such groups do not – Ifractur (eus, mathsy1, mathsy2) vs. Ifrak- join within the matching . The first tur (Unicode) vs. Rfractur (eus, mathsy1, name in any group is the “canonical” name. mathsy2, and Unicode); the spelling The canonical name is the name which should should uniformly be “fraktur” be output by programs which compute set operations on the encoding sets. This helps – smile vs. smileface vs. invsmileface vs. to achieve a “filtered” result which does not Unicode 0x263A (unnamed) contain troublesome synonyms. For example, – Omega vs. ohm, Omegainv vs. mho if the synonym file contained the lines: – names not starting with letters: 0script (0x2134), 2bar (0x01bb) joseph jose yosef josephus 0xfb10 joseph joe joey We represent these items in a text file hav- ing the following format: Each line of the the names jose, yosef,andjosephus would file gives character name synonyms, one group have fall-back code 0xfb10, the names joe and of synonyms per line. Any of the names on joey would have no fall-back code, and all the one line are synonyms, and can be freely ex- names above would invariably be transformed changed. For example, the line “visiblespace to joseph on output. spaceliteral” means that the character names The TrueTEX filter accessory program, visiblespace and spaceliteral are com- joincode, performs a relational join on two pletely equivalent names. (The former was used font encoding sets, making a new encoding. in the TEX DC fonts [10], while the latter was It resolves the issues of a given synonym file the PostScript name used in the Lucida Sans according to the rules we have stated. Unicode TrueType font of Windows NT.) • A database of non-TEX encodings, which tells A special case of “synonym” is the Unicode which characters appear in various encoding fall-back. This is a code number which is standards such as ANSI or Unicode. This list a“synonym”forTeXunion members not in presently constitutes a database of 3523 entries Unicode, and is our assignment of the Unicode from a set of 1814 characters. Some of the “private zone” codes for the misfits. For exam- common examples of commercial importance ple, the line “ff 0xf001 0xfb01” (Microsoft today are given in Tables 4 and 5. Having these fonts have an undocumented usage like this) sets allows us to export virtual fonts for any of

152 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Extending TEXforUnicode

Table 4: Some Non-TEXEncodings. Table 5: Some Encodings Used in Windows. Name Description Name Description ase Adobe PostScript “Adobe- wae Adobe “WinAnsiEncoding” StandardEncoding”, the (used in Acrobat in PDF), built-in encoding of many “winansi” with bullets in Type 1 fonts the .notdef positions, some belleek Belleek [8] scheme for LATEXT1 semantic synonyms encoding on 8-bit TrueType winansi Windows ANSI 8-bit belleekc Belleek with small-caps (US/Western Europe code latin1 Latin 1 (ISO 8859-1) page) (Includes certain non- latin1ps Adobe PostScript “ISO- ANSI characters in 0x80–0x9f Latin1Encoding” (which is not range of codes.) ISO Latin-1!) winansiu Windows ANSI Unicode mac (US/Western Europe code macexp Adobe PostScript “Mac- page) (Same characters as Expert” encoding (used by are present in the winansi Acrobat in PDF [2]), containing encoding, except the non-ANSI ligatures, small caps, fractions, characters are in their Unicode typesetting niceties positions.) mre3 Adobe PostScript “Macin- winmultu Windows Multi-Lingual toshRomanEncoding” (used Unicode (Windows 95/NT) by Acrobat in PDF), a subset (655 characters supporting all of “mac”, which omits some Latin alphabets, Greek, math characters and the Apple , OEM screen trademark characters.) pdfdoc Adobe PostScript “PDFDo- winNNNN Windows (for code pages num- cEncoding” (used by Acrobat bered NNNN ) in PDF), an ad-hoc encoding used in PDF outline entries, text annotations, and Info dic- waste of effort to convert the glyphs twice. tionary strings, consisting of a Another example is that many font variants are { ∪ ∪ remapped set = ase mre slanted versions of the upright face, and the } wae geometric slant is easily applied to an already-

unicode 16-bit Unicode (Windows NT A converted rather than slanting in MET -

and 95, AT&T Plan 9) ONT F and repeating the glyph conversion. This technique is also used to compose accents and letters for “purely” accented characters (where the encodings represented, so we can virtualize the accent and letter do not overlap), since non-Unicode, non-TEXfonts. the MetaFog conversion is applied only to • A TrueType-font-builder that takes the con- the accent part of such glyphs, allowing the verted outlines from various TEX fonts, orga- redundant letter conversion to be done only nizes sets of them based on an output encod- once. ing, and builds binary TrueType fonts from the Another sub-tool builds these redundancy reorganized glyph data. tables by comparing the encoding tables for A sub-tool for the font-builder incorporates sets of fonts against a target font. For a redundancy-elimination feature that allows example, a DC font combines punctuation you to specify a table listing which characters and symbol characters spread across several

in a given TEX font may be taken from other Computer Modern fonts. A

TEX fonts without repeating a costly MET - In our system, we actually produce textual ONT F glyph conversion. One example of such a versions of the binary fonts and convert them redundancy is how DC fonts largely replicate to Type 1 and TrueType formats with separate the Computer Modern fonts; it would be a tools. This allows a general conversion to be

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 153 Richard J. Kinch

we can un-do the “scattering” of characters amongst Table 6: DC fonts in LAT X (Rev. 1/95). The E the TEX fonts. For example, the math italic set “funny” fonts (Fibonacci Roman, etc.) are (mathit{12}) contains the regular (not italic) low- omitted. This list is made by examining names in ercase Greek letters. Conversely, we are going to *.fd from the LAT X distribution. E have to scatter a few TEX fonts that happened to Font56789101217 combine dissimilar styles into one 7-bit font, such as dcb ••••• • • • the math symbol fonts (mathsy{12}) which contain dcbx ••••• • • calligraphic capitals. dcbxsl ••••• • • In set-theoretic terms, the rationalization task dcbxti ••• involves the following steps: ••• dccsc • Begin with the union of all TEX characters, the dcitt ••••• set we have called TeXunion. Remember that dcr ••••• • • • this is the set of character names, not the glyphs dcsl ••••• themselves. dcsltt •• • • • Partition this union set into the largest subsets dcss ••••• • which do not cross encodings. This partitioning dcssbx is a set of proper subsets of TeXunion;we dcssdc • call this set TeXpages. For example, all dcssi ••••• ••• the uppercase letters A–Z make such a subset. dctcsc A combination of all upper- and lowercase dcti ••• • • • letters A–Z and a–z do not, because the dctt •• • • ••• • • • small-caps fonts do not contain the lowercase

dcu letters. These subsets are equivalent to Knuth’s

AFONT Computer Modern MET “program” files, because this was the highest level of source file optimized for the ultimate binary format. For nesting in which he did not make conditional example, Type 1 glyphs require knot-pivoting, the generation of characters. following by combing, to insert extrema tangent • Encode the TEX character union set for a new, points. The hinting methods also differ between universal 16-bit encoding. That is, we invent Type 1 and TrueType. a mapping of TeXunion members to unique Once a binary version of a font is prepared, 16-bit integer codes. Most of the members containing all the glyphs, a re-encoding tool of TeXunion appear in Unicode and so (TrueTEX accessory program ttf_edit [17], have a natural encoding already determined. which is a stack-oriented TrueType font encod- For the TeXunion members not in Unicode ing editor) must be applied to finish the font (which includes all the small-caps letters), we for real-world use. The re-encoding stage not shall promulgate (by fiat) assignments to the only re-encodes, but can optionally adjust the Unicode private-zone codes. We designate this metrics and kerning information. By making subset of TeXunion as TeXpzone; this subset these aspects “afterthoughts” we can fine-tune finds its concrete representation in the private- fonts without going back into the detailed con- zone codes expressed in the character synonym version process. The re-encoding stage can also table. We have the relation upgrade any 8-bit-encoded TrueType font to an (Unicode ∪ TeXpzone) ⊃ TeXunion, arbitrary Unicode encoding, which is important since many commercial font editors can only since Unicode contains many characters not output 8-bit TrueType fonts. used in TEX. We can compute a mapping: TeXunion 7→ Unicode ∪ TeXpzone • AnotionofwhatTEX fonts we want to convert. ( ) If we consider the DC fonts a good target, we by matching character names from the left come up with quite a list (Table 6). to the names on the right; in this way we arrive at a Unicode code number for each T X Rationalizing T X fonts in Unicode sets E E character. We designate this mapping (a 16- Let us consider the shuffling and dealing needed to bit number for each member of TeXunion) reorganize Computer Modern into a Unicode encod- as TeXeru (“TEX encodings rationalized to ing. With the luxury of thousands of code positions, Unicode”, rhymes with “kangaroo”). This

154 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Extending TEXforUnicode

mapping is the key result of rationalizing wildcarding to its parent TeXfonts, but fewer Computer Modern into Unicode. members in parallel with the reduction. • We can think again of a large table having, on • To produce each real Unicode font (a member oneaxis,theTEX fonts, on the other, the mem- of TeXinUni), we assemble the glyphs and

bers of TeXpages, and bullets wherever Com- metrics from TeXfonts and install them via

AFONT puter Modern implements MET glyphs. TeXeru into each code of the mapped-to This table will be sparsely and irregularly pop- TeXinUni member. ulated. The sparseness reflects the fact that • We must finally produce Omega virtual fonts the T X fonts cover a wide range of characters, E (that is, .xvp files) which will map 8-bit DVI while the variations in style mostly are typo- codes from the old TEX fonts into TeXeru graphic distinctions on alphabets and punctu- codes in TeXinUni members. For this we use ation; in other words, TEX provides more than the TrueTEX metric exporter to generate an a few symbols sets, in contrast to the simple .xvp file, and XVPtoXVF to convert this to an ANSI/Symbol set distinction in 8-bit Windows .xvf file; the .xfm file also produced contains

fonts. The irregularity in this imaginary table

AFONT the same information as the MET .tfm results from the ad-hoc arrangment of TeX- and may be discarded if only TEX82 is to be pages among TeXfonts. The sparseness is used for formatting. not a deficiency, but we ought to have some goal in mind for the rational extension of Computer Generating Unicode virtual fonts for Modern and the other TEX fonts to populate non-TEXfonts areas of this table for the sake of regularity. Realizing this goal would require drafting of Let us consider a converse task: instead of con-

verting single-byte-encoded Computer Modern fonts

AFONT new MET code and translation to out- into Unicode fonts, let us assume we have a Unicode lines. This is similar to how the DC font project extended Computer Modern to the rational T1 font in TrueType form, and want to make it usable encoding. with TEX or Omega. To use a font TEX (and Omega) require a .tfm (or .xfm) metric file and a .vf (or • At this stage we are ready to determine a list of .xvf) virtual font file. The virtual font is neces- actual Computer Modern Unicode fonts which sary only if a remapped encoding or composition is will cover TeXfonts. While Haralambous needed (usually the case). retained 8-bit PK fonts as the actual fonts for To generate metric and virtual font files for virtual Unicode fonts [7, 6], we will create a Unicode fonts in Windows, TrueT Xprovidesa converse realization, namely Unicode TrueType E File + Export Metrics item which takes the user fonts as the actual fonts for virtual CM, T1, or through several steps which illustrate the elements UT1 encodings. of such a translation: Using the imaginary table just described, we can take the union of row subsets such that • First, the user selects the font from the Win- columns are not overlapped. If we want to dows standard font-selection dialog. For exam- maintain stylistic uniformity within individual ple, in Windows, standard fonts include fonts, we merge rows subject to a personal (a clone), , and Times New decision as to which rows “belong together” Roman (a Times Roman clone), together with in a stylistic sense. On the other hand, if we their bold and/or italic variations. Windows want to minimize the number of actual fonts will also install other TrueType fonts or (with and don’t mind different styles in a single font, Adobe Type Manager) Type 1 fonts. After the we can make a “knapsack” optimization to pack user selects a font, TrueTEX has a “font han- the rows as tightly as possible. (Indeed, if we dle” with which it can access all the geometric discard the Unicode conformity, we could put information needed to calculate global and per- all of Computer Modern into a single Unicode character metric quantities for the font. font!) In any case, this collapsing of rows • Second, the user must select names for the involves imprecise judgments to arrive at an output .xvp file. Font names in Windows are optimized reduced set of fonts. verbose strings containing several words (such Implicit in this reduction is the factoring of as, “ Regular (TrueType)”), wildcarded optical sizes that was introduced while TEX insists on a single-word alphabetic in TeXfonts; we call this reduced set of name; therefore TrueTEX selects a TEX-style fonts TeXinUni, which will have a similar name for the font based on the TrueType file

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 155 Richard J. Kinch

name (such as “times”). The metric output in compose or substitute a glyph for the character, this case would be a file times.xvp. it emits the xvp commands for the metrics and • Third, the user must specify the “input” and other actions. If the composition engine is “output” encodings for the virtual font. The “stumped”, TrueTEX emits an xvp comment input encoding is the encoding of the actual to note that the character is unencoded. font, typically ANSI or Unicode. The output Note that virtual fonts can use more than one encoding is the virtual encoding which the input font to produce a virtual output font. This user desires to construct for TEX’s point of would allow, for example, a text font, an expert font TeXencod view, and is typically a member of . (such as might contain ligatures), and a symbol font True TEX uses encodings in the form of .cod (such as might contain math symbols) to contribute files (each line contains a hex code field followed toasingleTEX Unicode font. Another use for this by the character name field) or .afm (Adobe technique would be the assembling of a Unicode Font Metric [1]) files4 To select an encoding, the font from the old bit-mapped PK fonts. TrueTEX user selects an item from a list which TrueT X E supports all of the TEX and Omega-extended virtual presents, each item giving a description of font mechanisms, but does not yet directly support the encoding (for example, “TEXRoman with multiple input fonts for metric export (or bit- ligs = 2” corresponds to the roman2.cod file). mapped fonts, for that matter), although an expert The user can also browse for encoding files in- user can merge multiple virtual-property-list files to stead of selecting from the canned list. The user produce such a virtual font. can edit custom encoding files (which are just text files in afm format), and thereby gains com- Composing missing characters plete flexibility of input versus output encoding The Unicode and TeXunion sets are disjoint, but in the virtual fonts, including automatic pro- the virtual font mechanism allows users to create duction of composites for missing input charac- virtual TeXunion characters missing from a Uni- ters. The ttf edit [17] program originates afm code font with various composition or substitution encoding tables from existing TrueType fonts, methods. TrueT X uses this technique to create allowing maximal compatibility with randomly- E completely populated virtual fonts when the under- encoded fonts. lying TrueType fonts are missing accented charac- • True User input is now complete. TEX begins ters or ligatures. analysis of the information provided by forming To allow for easy upgrading, the “composition in-memory encoding tables from the encoding engine” in TrueTEX uses a user-modifiable script files (using sparse-array techniques to manage in a PostScript-like language to control the compo- True large, sparse code ranges). TEX sorts and sition and substitution process. By changing the indexes the tables for fast content-addressibility script, the user can add new composition methods or by either code or character name, assembles the substitution rules, or adapt the methods to various global metrics in xvp terms for the font, and typographic conventions. TrueTEX, using its own visits each input code in the font to build a table mini-PostScript , interprets this script at of per-character metrics and ligatures. xvp file metric-export time, which means that users who building may now begin. speak PostScript and know a bit of font design • For each output character name, TrueTEX can customize the composer. A good script yields determines if an exact match exists to an a much better TEX virtual font, since commercial input name, and thus to an input code. fonts are typically missing characters that TEXcon- If there is such an exact match, TrueTEX siders important, and the script can fill in most of emits xvp commands which give the character’s the missing pieces. metrics and which re-map the TEX PUT/SET When the composition script receives control commands to the Unicode positions. from TrueTEX, all the encoding and metric infor- • For output characters which have no exact to mation for the font and input and output encodings input characters, TrueTEX invokes the “com- are defined as PostScript arrays and dictionaries. position engine.” If the composition engine can The standard composition script in TrueTEXim- 4 .afm files need not contain metrics; they can simply plements the following techniques: define an encoding for a dummy font, using only the , CH, • Accent-plus-letter composition: if the name and N fields of the CharMetrics table. We thus maintain compatibility with other afm-reading software and avoid implies that the character is an accented inventing yet another file format. letter, the script decomposes the name into

156 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Extending TEXforUnicode

guities by recognizing ambiguous names and Table 7: Composable Accent Characters. making the appropriate substitutions. Name Position Example • Future extensions: Several more exotic com- umlaut top-center ¨o position methods are possible for an upgraded acute top-center ´o composition script. A novel idea is an “emer- breve top-center ˘o gency fall-back” character generator: as a last caron1 top-center ˆo resort for a missing character, the script could cedilla bottom-center o¸ have a table of low-resolution renderings for circumflex top-center ˆo all of TeXunion, consisting of (for example) comma top-right L’ an 8 × 16 dot-matrix or a plotter-style stick dieresis top-center ¨o font; virtual font commands would render these dotaccent top-center o˙ with rules. In this method we could render any grave top-center `o TEX document in a crude but accurate fashion 6 hungarumlaut top-center ˝o without any fonts or special’s at all! macron top-center ¯o Another idea would use the ability of virtual ogonek1 bottom-right fonts to call upon the TEX \special command. period top-center o˙ AB´ezier curve special could draw and fill ring2 top-center glyphs without the need for 1For D/d/L/l, changes to comma at top-right. support for fonts. This would not be efficient, 2Accent is not present in the Computer Modern and hinting would be missing on low-resolution fonts used in this portable document. devices, but it would place all the scalable font information in an XVF file. Projecting ligature rules into TrueType the letter and accent components, and (when fonts the letter and accent exist in the input font) uses the geometric information and typographic The TrueType fonts in Windows do not supply any conventions to overlay the accent onto the letter ligature rules such as are contained in Computer in a virtual accented character, as shown in Modern. To export metrics containing the usual True Table 7. Since the composer has detailed TEX ligature rules, TEX considers the rules geometric information on the glyph shape, in Table 8 when exporting the global vpl (xvp) which is more elaborate than the bounding-box metrics, when the target ligatures exist in the font, metrics TEX uses, it can do a careful job of or when the ligatures can be produced by the placing accents. composition engine. • Ligature composition from sequence of letters: Supporting metric export formats A ligature character (not to be confused with TrueT X supports both .vpl (T X Virtual Prop- the ligature rules of the exported T X metrics, E E E erty List) and .xvp (Omega Extended Virtual Prop- a different topic) such as the T1 character erty List) file formats when exporting font metrics. “SS” will not usually exist in a TrueType font. This is more than merely a variation in format; Code positions for ligatures are not part of the when exporting to .vpl format, the encodings are Unicode standard,5 so even common ligatures truncated to 8 bits, so that the composition process are often not present. The composition script for missing characters will likely be more intensive. forms these by concatenating the component T Xware programs VPtoVF and XVPtoXVF trans- letters within the bounding box of the T X E E late the property list files to their respective .tfm, character. This is applied to the ligatures: ff, .xfm, .vf,and.xvf binary formats for use with fi, fl, ffi, ffl, IJ, ij, SS, Æ, æ, Œ, and œ. TEX and Omega. TrueTEX launches the appro- • Remapping of certain names: The synonym priate translator after the property-list export is table shows the problems of character names complete. which are not standardized. An ad-hoc section TrueTEX metric export also supports the older of the composition script fixes up any ambi- .pl (TEX Property List) metric file format, and the companion program PLtoTF, should it be needed 5 Unicode does contain ligatures which are phonetic letters in certain languages, but this does not include 6 This could solve the problem of rendering TEXdocu- typographic ligatures such as the f-ligatures ments in HTML browsers.

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 157 Richard J. Kinch

Table 8: Ligature Rules Applied to Exported Fonts.

First Second Result Description hyphen hyphen endash -- to endash endash hyphen emdash endash- to emdash Shortcuts to national symbols comma comma quotedblbase ,, to quotedblbase less less guillemotleft << to left guillemot greater greater guillemotright >> to right guillemot exclam quoteleft exclamdown !‘ to exclamdown question quoteleft questiondown ?‘ to questiondown F-ligatures f f ff ff to ff f i fi fi to fi f l fl fl to fl ff i ffi ffi to ffi ff l ffl ffl to ffl Paired single quotes to double quotes quoteleft quoteleft quotedblleft ‘‘ to quotedblleft quoteright quoteright quotedblright ’’ to quotedblright

for use with older TEX software. In this case the characters each; most codes are in the range 0– user can specify only an input font encoding, and the 0x2ff, with some symbols in the 0x2000 vicinity property list reflects this encoding as applied to the and a few odd characters in the private zone at TrueType font selected, without virtual remapping. 0xf001 or 0xfb01. We would expect such a While exporting .xvp files will connect the segmented locality in a typical font. TrueTEX previewer to Windows’ Unicode fonts, One technique is to use a segmented table with the TEX82 formatter requires .tfm metric files, not binary-search lookup; this is close to the method Omega .xfm files. If the output font has an 8-bit used in TrueType fonts. A hash table for the two- encoding, the resulting virtual font is nevertheless byte keys may be used instead of the binary search. compatible with the original TEX formatter’s 8-bit Segmented tables will require the least storage, character codes, and will not require the Omega at the expense of a possible hashing performance formatter. To create a .tfm file for such a font, problem in the event of degenerate tables. Since a TrueTEX filter program xvptovpl truncates the all Unicode-capable operating systems are advanced virtual codes in the .xvp file and produces a enough to support virtual memory, the performance truncated .vpl file, and via VPtoVF, a .tfm file risk does not justify the memory savings. for use with TEX. The .vf file created in this TrueTEX uses a 2-level pointer technique: process is discarded, since it does not properly map metrics for a 16-bit code table consist of a table of the 8-bit characters to Unicode. The .xfm and 256 pointers to metric tables with 256 entries each. .tfm files produced by this process will contain the In this way a typical font having perhaps 6 or 7 same information; the .xfm format is needed only if contiguous code populations has very little wasted Omega is to be used. TrueTEXusesthe.xvf file to space. The worst-case and best-case performance map the 8-bit TEX characters to Unicode positions. are both acceptable. The lookup time is accelerated by avoiding a null-pointer test by pointing unen- Implementing sparse metric tables coded pages to a dummy table of zero-width metrics. In implementing programs which use metric data, A time-bomb of a problem looms with Omega’s we must take care to apply sparse-matrix techniques .xfm format [12], which recklessly ignores any to avoid enormous memory demands from nearly- sparse-array issues. An .xfm file, like the older empty font-metric tables. Sixteen-bit encoded fonts .tfm format, keeps an unpacked char info array are typically sparsely populated. For example, which spans the smallest to largest codes (that is, the Windows NT text fonts contain about 650

158 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting Extending TEXforUnicode the interval [bc, ec]). Since the Unicode private- • Identifying and assigning non-Unicode TEX zone population typically extends through 0xfb00, character names to the Unicode private zone, practically all fonts will have a char info of about thereby promoting inter-operability of Unicode 126 KB, almost all wasted space. This will result in TEX implementations. These should be con- atypical.xfm file being 130 KB, instead of about cretely represented in a synonym table, which 4 KB that a simple sparse-array technique would also needs to be published. There are many provide. It is imperative that we upgrade the .xfm potential conflicts, and taking counsel from as format and xfm-reading programs to better handle many different users of TEX as possible is the sparse encodings.7 only way to maximize the compatibility of the result. The Omega project has already started Summary to stake out claims on the virgin Unicode real Let us review the areas of development needed to estate [4, Table 4, page 425] for Ω-Babel. There extend all of TEX to Unicode: are no doubt synonyms and ambiguities outside our own experience; one can only hope there is • Extending the .tfm file format and its run- sufficient room for all interested parties. time forms to large, sparsely populated fonts, • Extending DVI translators to accommodate without sacrificing backward compatibility and the extended .tfm, .vf,and.dvi formats, without exploding file lengths. We have including sparse-array techniques for efficient seen that the .xfm format can represent the run-time performance. The Omega project has information, although it needs improvement for issued this call to “DVI-ware developers” [4, sparsely populated fonts. page 426, Conclusion], although with surprising • Extending Computer Modern and other meta- aplomb for the implications. We hereby fonts to fully populate the appropriate Unicode respond with our implementation in TrueTEX, positions. A complete Unicode text font re- and invite others to build on our experience. quires about 500 symbols. While it is unlikely • Promulgating the ongoing TeXeru,theTEX- that all the styles of Computer Modern will re- in-Unicode mapping, based on a seasoned reg- ceive the attention to fill the tables completely, istry. An ongoing authority for additions and we can at least insert legible placeholders. corrections will be vital. This authority will be • Creating a formal database of TEX character responsible for registering new TEX character names, joinable to the Unicode official names. names and avoiding Unicode conflicts. Some standard is necessary for any develop- • Changing the plain TEXandLaTEX macros to ment, and there is no reason to favor anyone’s accommodate the 16-bit encoding extensions, favorite names. What is crucial is that the while maintaining backward compatibility from registry be initiated now, so that TEX software a single source. This is a tall order, and one we authors have an early start on making inter- have not touched. • changeable fonts and documents. While TEX Extending \special handling for 16-bit char- users cannot themselves dictate the standard acter sets. Now we open up the carousing com- Unicode names, we can at least make a stand- mand of TEX to a whole new vista of revelry, in version for our own use (since none seem to with the gift of tongues. This is another item exist at present), and if an acceptable set of that we shall put off for now. Unicode names comes along, we can adopt it • Implementing TEXandDVI translation user in- later. terfaces in selectable languages. While Omega processes in Unicode, it talks to the user in the • Creating a formal database of all 28 T X E old 8-bit fashion. Perhaps it is a bit much to font encodings and their 31 greatest common expect a Web2C T X change file to incorporate subsets, joinable to Unicode and other encoding E wchar t and other Unicode constructs of the C standards. The TrueT X tables are available E programming language. But DVI translators to all for examination and use [16]. Once these for Windows NT and other Unicode-capable are improved by public usage and scrutiny, they platforms should have this designed in from should be adopted as a formal standard for the start. A properly designed application can T X. E be reimplemented for another language by any non- who knows the application 7 The .vf and .xvf virtual font formats have always packed sparse per-character information, so they need no such and can translate the messages; no program- attention. ming or recompilation is required.

TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting 159 Richard J. Kinch

• Establishing provisions for orderly extension [7] Haralambous, Yannis. “The Unicode Computer of the fonts and character sets, so that new Modern Project.” [A document link on the TEX fonts, characters, and encodings may be Omega Web page [5].]

incorporated into later versions. Having made [8] Kinch, Richard. “Belleek: TEX Virtual T1-

AFONT the effort to retrofit MET output for Encoded Fonts for Windows TrueType.” Avail- an altogether different way of encoding, we able on the author’s Web site as a LATEXdocu- would hope that font designers recognize the ment in belleek.zip. [The Belleek software is shortcomings of the 7- and 8-bit encodings and an implementation of T1-encoded fonts using keep the promises of Unicode in mind. TrueType scaling technology under Microsoft

• Rationalizing the TEX fonts into orthogonal Windows. Belleek consists of TrueType fonts

AFONT styles and weights (such as cmmb10 and cmmr10, which render elements of MET glyphs for example). As TEX users we don’t care about in scalable form, plus TEXvirtualfontswhich this, but if Computer Modern is to be accepted remap and compose T1 characters from these in non-TEX applications, the style axes will elements.]

have to be fully varied and populated along the A [9] Kinch, Richard. “MetaFog: Converting MET -

conventional ranges. ONT F Shapes to Contours.” TUGboat 16,3 • Providing a means for creating virtual fonts (1995), pages 233 – 243. for non-TEX Unicode fonts in TrueType or [10] Knappen, J¨org. “The European Computer Type 1 format. Although this capability is Modern Fonts.” CTAN file: -archive/ available now only in the commercial TrueTEX fonts/dc/mf/dcdoc.tex. Unicode edition, a new TEXware stand-alone [11] Knuth, Donald. “Virtual Fonts: More Fun for tool could interpret TrueType font files (or Grand Wizards.” TUGboat 11,1 (1990), pages whatever technology is supporting 13 – 24. Unicode rendering) and join the encoding and [12] Plaice, John, and Yannis Haralambous. “Draft other information into an .xvp file. Documentation for the Ω System.” ftp://nef. References ens.fr/pub/tex/yannis//first.tex, February 1995. [See also the Omega Web page [1] Adobe Systems Incorporated. Portable Doc- [5].] ument Format Reference Manual. Reading, [13] Plaice, John. “Progress in the Ω Project.” Mass.: Addison-Wesley, 1993. TUGboat 15,3 (1994), pages 320 – 324. [2] Adobe Systems Incorporated. Adobe Font Met- [14] The , Inc. rics (AFM) File Format Specification,Version http://www. unicode.org. [This site holds files listing codes 4.1, October 1995. [Published in PDF and and descriptive phrases. There are no sample PostScript form at ftp://ftp.adobe.com.] character images, and there are no PostScript [3] Fairbairns, Robin. “Omega — Why Bother with names. A monograph gives paper images [15].] Unicode?” TUGboat 16,3 (1995), pages 325 – 328. [15] Addison-Wesley Publishing Company. The Unicode Standard: Worldwide Character En- [4] Haralambous, Yannis, John Plaice, and Jo- coding, Version 1.0, Volumes I and II. hannes Braams “Never Again Active Charac- ters! Ω-Babel.” TUGboat 16,4 (1995), pages [16] The encodings (consisting at present of 46 en- 418 – 427. coding sets containing over 9000 pairs, repre- sented in .afm format) herein listed in Tables 1, [5] Haralambous, Yannis. “Ω, a T X Extension E 4, and 5, are available via the author’s Web site. Including Unicode and Featuring Lex-like Fil- tering Processes.” Proceedings of the Eighth [17] The programs ttf_edit and joincode for DOS, Windows, and are available via the European TEX Conference,Gda´nsk, Poland, 1994, pages 153 – 166. [There is an Omega Web author’s Web site. page at http://www.ens.fr/omega.Seealso the Omega FTP site [12].]

[6] Haralambous, Yannis, and John Plaice. “Ω +

AFONT Virtual MET = Unicode + Typogra- phy (First Draft).” ftp://nef.ens.fr/pub/ tex/yannis/omega/cernomeg.ps.gz, January 1996.

160 TUGboat, 17, Number 2 — Proceedings of the 1996 Annual Meeting