The TeX Font Panel

Nelson H. F. Beebe (chair)
Center for Scientific Computing
University of Utah
Department of Mathematics, 322 INSCC
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
USA
Email: [email protected], [email protected], [email protected], [email protected] (Internet)
WWW URL: http://www.math.utah.edu/~beebe
Telephone: +1 801 581 5254
FAX: +1 801 585 1640, +1 801 581 4148
Internet: [email protected]

TUGboat, Volume 0 (2060), No. 0, Proceedings of the 2060 Annual Meeting

Introduction

The TUG’2001 Font Panel convened on Thursday, August 16, 2001, with members William Adams, Nelson H. F. Beebe (chair), Barbara Beeton, Hans Hagen, Alan Hoenig, and Ross Moore, with active participation by several attendees in the audience. The list of topics that was projected on the screen makes up the sectional headings in what follows, and the topics are largely independent.

Any errors or omissions in this article are solely the fault of the panel chair.

Unicode

The work of the Unicode Consortium, begun in 1988, and first reported on for the TeX community in a TUGboat article [9], has reached version 3.0 of the Unicode Standard [29]. Version 3.1 appeared about the time of the TUG’2001 conference, and version 3.1.1 shortly thereafter. Unicode is a proper subset of the ISO/IEC 10646 Universal Character Set Standard [14], but publication of the latter lags.

Unicode defines a character set that is intended ultimately to cover all of the world’s writing systems. Its first 128 entries are identical to the ASCII character set (dating from 1964) used by most of the world’s computers.

There is a very active Unicode technical discussion e-mail list: send subscription requests to [email protected]. The list is archived at http://www.unicode.org/mail-arch/.

Unicode conferences are held twice a year, with the twentieth in late January 2002; see http://www.math.utah.edu/pub/tex/bib/index-table-u.html#unicode for a bibliography of publications about Unicode.

Since most programming languages, operating systems, file systems, and even computer I/O and CPU chips, have character knowledge designed into them, changing the character set has huge ramifications for the computing industry and for worldwide business data processing, data exchange, and record keeping.

Fortunately, a particular encoding scheme called UTF-8 makes it possible for files encoded in pure ASCII to also be Unicode in UTF-8 encoding, easing the transition to the new character set.

Up to version 2.0 in 1996, the Unicode character repertoire could be fit into a table of 2^16 = 65 536 entries. Version 3.0 in 2000 increased the count to over a million, although just under 50 000 are assigned and tabulated in the book. Version 3.2 in 2002 has just over 95 000 assigned. Consortium members hold the view that 20 or 21 bits per character (just over two million) may ultimately be necessary by the time all historical scripts have been covered.

Despite the Consortium’s warning that the collection was expected to grow, several vendors did not pay attention, and prematurely adopted 16-bit entities to hold Unicode characters.

Thus, the C language data type, wchar_t, introduced in 1989 Standard C [7, 13, 28], is implemented as a 16-bit unsigned integer in many C and C++ compilers, with a companion function library that also has this limitation.

Even worse, the popular Java programming language is defined in terms of an underlying virtual machine [23, 24], already implemented in hardware, whose instructions are permanently designed for 16-bit characters.

These 16-bit limitations can be overcome by representation of Unicode values with variable numbers of bytes, as was done with the UTF-8 encoding. Unfortunately, the opportunity to simplify character processing significantly by having fixed-size units is tragically lost.

In the panel chair’s view, these design errors will rank with the infamous ASCII/EBCDIC split in 1964, with IBM System/360 adopting EBCDIC, and everyone else (by about 1980) adopting ASCII, with enormous economic costs, and user confusion, that lasted for decades.

Newer operating systems are already designed to use Unicode as the native character set, and vendors of older ones are migrating in that direction through UTF-8 encoding.

Of course, jumping from a 256-character set to one with potentially millions of characters poses an almost impossible problem for font vendors. It will be a very long time before the Unicode font repertoire is adequate. Current systems with native Unicode support generally provide only a subset of characters, and then sometimes only in low-resolution screen bitmaps. Bitstream for a while offered their Cyberbit Unicode font, but in July 2001, withdrew it without explanation.

Thanks to fine work by fellow TUG members Yannis Haralambous and John Plaice [26], TeX has been extended to fully support Unicode. Their system is known as Ω (Omega), and it has been available on the annual TeX Live CD-ROM distributions since at least version 5 in 2000. Development has not been as rapid as end users might like, but it must be understood that this is a hugely complex problem, and the Ω designers have been proceeding very carefully, cognizant of other TeX developments such as pdfTeX, ε-TeX, and NTS, in addition to the evolution of the Unicode Standards.

Mathematics fonts

Fonts for mathematics are a substantial problem, because, among the more than twenty thousand fonts on the market, only a handful have a remotely adequate repertoire of mathematical glyphs. These fonts are almost the only choices: Computer Concrete, Computer Modern, Informal Math, Lucida, MathTime, PA Math, PX, Palatino Math, Pandora, and TX.

While it is, of course, possible to use an existing mathematics font with any other text font, the results are rarely visually successful. For some careful studies of this, see Hoenig’s book [11, Chapter 10].

Font subsetting

DVI drivers for virtually all devices, other than PostScript, subset the fonts that they include in their output streams: descriptions of unused characters are simply omitted.

Doing this for PostScript Type 1 outline fonts has proved considerably more troublesome. These fonts are generally encrypted, but Adobe has published the encryption algorithm and keys, so software like t1disasm (from Lee Hetherington’s and Eddie Kohler’s t1utils package, available at ftp://ctan.tug.org/tex-archive/fonts/utilities/t1utils) can readily disassemble a font.

Disassembly reveals essentially a table of numbered (not named) subroutines, Subrs, each containing positioning commands, and calls to other subroutines, plus a table of character definitions, CharStrings, indexed by character name. Each entry of CharStrings also consists of positioning and drawing commands, and calls to the numbered subroutines.

Because subroutine numbers could be constructed dynamically, it is in general not possible to identify which of the numbered subroutines can be omitted, but a DVI driver could drop unused entries from the CharStrings table. This is transparent to font rendering software, since the entries are named, rather than numbered.

It was reported by a reviewer that computation of subroutine numbers is in practice not done in existing Type 1 and Type 2 Compact Font Format (CFF) fonts, so perhaps it is safe to drop subroutines that are not explicitly called.

Recent versions of Tom Rokicki’s dvips driver are capable of subsetting PostScript Type 1 outline fonts, as can Adobe Acrobat Distiller and ghostscript’s ps2pdf.

However, this subsetting introduces new problems. What if the DVI file also included PostScript figures which themselves used fonts? Subsetting might remove characters needed by those figures.

It is infeasible, or unreliable, for the DVI driver to attempt to examine an included figure file to determine its font requirements, because far too many PostScript-producing programs fail to conform to Adobe’s Document Structuring Conventions that would otherwise clearly, and simply, record the file’s font needs. Those conventions are clearly described in the first two editions of the PostScript Language Reference Manual [1, Appendix C] [3, Appendix G], but were ominously dropped from the third edition [6]. They are, however, documented at the Adobe Web site among the technical notes collected at http://partners.adobe.com/asn/developer/technotes/postscript.html, in the file http://partners.adobe.com/asn/developer/pdfs/tn/5001.DSC_Spec.pdf.

Each Type 1 font contains a special 24-bit (0 ... 16 777 215) unsigned number, the UniqueID, which is intended to allow printing devices to cache bitmaps of rendered fonts between jobs. A million of these numbers are reserved for private use, and the rest are allocated to font vendors on request. A subsetted font is a different font, because it lacks some characters, and so must be assigned a UniqueID from the private use area. A random choice from this area would mean a one-in-a-million chance of confusion between fonts in a printer.

Regrettably, several versions of Adobe’s own ...

... the original font metrics embedded in the PDF file, and then substitutes the missing font with another.

TeX DVI drivers usually complain about missing fonts, but some will then provide a substitute, and some may even support a user-defined font substitution file.

The PANOSE system [8] is a font classification system that assigns numeric values in 0 ... 15 for ten font attributes (family, serif style, weight, proportion, ...
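
As an illustration of the Unicode section's point that pure ASCII files are already valid UTF-8, while characters beyond the 16-bit range cannot be held in one fixed-size 16-bit unit, here is a minimal Python sketch. The sample strings and the helper name `utf16_units` are illustrative assumptions and are not taken from the panel discussion.

```python
# Minimal sketch: ASCII is a subset of UTF-8, and code points above
# 0xFFFF do not fit in a single 16-bit unit.

def utf16_units(text: str) -> int:
    """Return the number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2

ascii_text = "TeX"           # illustrative sample, pure ASCII
beyond_16_bits = chr(0x1D400)  # an illustrative code point above 0xFFFF

# A pure ASCII file has the same bytes whether read as ASCII or as UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# UTF-8 uses a variable number of bytes per character.
samples = ("A", "\u00e9", "\u20ac", beyond_16_bits)
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

print(same_bytes)                   # True
print(utf8_lengths)                 # [1, 2, 3, 4]
print(utf16_units(beyond_16_bits))  # 2
```

The last line shows the cost mentioned in the text: once a character needs two 16-bit units, software can no longer assume one unit per character.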
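
To make the font subsetting discussion concrete, the following is a minimal sketch of dropping unused CharStrings entries and then keeping only the Subrs that are explicitly called. It relies on the reviewer's reported observation that subroutine numbers are not computed dynamically in practice. The token-list representation, the function names `called_subrs` and `subset_font`, and the sample data are hypothetical; this is not the t1utils or dvips implementation, and a real font would first have to be decrypted and parsed.

```python
# Toy model only: programs are lists of tokens, and a subroutine call is
# modelled as a literal integer immediately followed by "callsubr".
from typing import Dict, List, Set, Tuple, Union

Token = Union[int, str]
Program = List[Token]

def called_subrs(program: Program) -> Set[int]:
    """Subroutine numbers given as a literal operand right before callsubr."""
    return {
        program[i - 1]
        for i, tok in enumerate(program)
        if tok == "callsubr" and i > 0 and isinstance(program[i - 1], int)
    }

def subset_font(
    charstrings: Dict[str, Program],
    subrs: Dict[int, Program],
    used_glyphs: Set[str],
) -> Tuple[Dict[str, Program], Dict[int, Program]]:
    """Keep only used glyphs and the subroutines reachable from them."""
    # Assumption: .notdef is always retained, since Type 1 fonts expect it.
    wanted = set(used_glyphs) | {".notdef"}
    kept_chars = {n: p for n, p in charstrings.items() if n in wanted}

    # Follow explicit calls transitively, since subroutines may call others.
    reachable: Set[int] = set()
    pending = [n for prog in kept_chars.values() for n in called_subrs(prog)]
    while pending:
        n = pending.pop()
        if n in reachable or n not in subrs:
            continue
        reachable.add(n)
        pending.extend(called_subrs(subrs[n]))

    # Numbers are kept as keys, so surviving subroutines are not renumbered.
    kept_subrs = {n: subrs[n] for n in sorted(reachable)}
    return kept_chars, kept_subrs

# Hypothetical sample data.
subrs = {0: ["hint"], 1: [0, "callsubr", "rlineto"], 2: ["rrcurveto"]}
charstrings = {
    ".notdef": ["endchar"],
    "A": [1, "callsubr", "endchar"],
    "B": [2, "callsubr", "endchar"],
}
kept_chars, kept_subrs = subset_font(charstrings, subrs, {"A"})
print(sorted(kept_chars))  # ['.notdef', 'A']
print(sorted(kept_subrs))  # [0, 1]
```

Dropping glyph "B" is safe because CharStrings entries are looked up by name. Dropping subroutine 2 is only safe if no retained program computes its number at run time, which is exactly the caveat raised in the text.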
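
The one-in-a-million figure quoted for UniqueID values can be checked with a few lines of arithmetic. This sketch assumes, as the text states, a private-use area of one million values and uniformly random, independent choices. The function name `collision_probability` and the extension to more than two fonts are illustrative additions, not part of the panel report.

```python
# Sketch of the UniqueID arithmetic under the assumptions stated above.
PRIVATE_IDS = 1_000_000      # size of the private-use area, per the text
MAX_UNIQUE_ID = 2**24 - 1    # largest 24-bit unsigned value

def collision_probability(fonts: int, pool: int = PRIVATE_IDS) -> float:
    """Chance that at least two of `fonts` random picks from `pool` coincide."""
    p_all_distinct = 1.0
    for i in range(fonts):
        p_all_distinct *= (pool - i) / pool
    return 1.0 - p_all_distinct

print(MAX_UNIQUE_ID)                      # 16777215
print(collision_probability(2))           # about 1e-06
print(collision_probability(100) > collision_probability(2))  # True
```

With two subsetted fonts the chance of a clash is about one in a million, matching the text; the probability grows as more randomly numbered subsets share a printer's cache.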
