<<

T1part: Printing TEX Documents with Partial Type 1

Sergey Lesenko Institute for High Energy Physics Scientific Information Department Protvino, Moscow Region, Russia, P.O. Box 35 Email: [email protected]

Abstract A large fraction of the scientific papers available on the Internet have been created by TEX and dvips. Most of these papers use fonts at 300 dpi, and thus neither preview well with a typical screen resolution of less than 100 dpi, nor take advantage of the higher resolution of current laser printers. This paper presents T1part, a set of subroutines that decompose Type 1 fonts, including only those characters needed in a particular document. This package, in conjunction with a high-quality freely available set of Computer Modern fonts, can provide for the distribution of compact PostScript files that preview legibly and print beautifully. In addition, this package allows the printing of much more complex documents using more fonts than have previously been available in dvips. The T1part subroutines are modular and can be easily incorporated into other drivers.

Introduction Interface Using Type 1 PostScript fonts in TEX documents has T1part, as a subroutine, needs to be told what fonts been problematic for a number of reasons. Not least and what characters in each to include. The among these problems is the requirement that ei- fonts are indicated by a file name pointing at a PFB ther the fonts be included in the document itself, or or PFA Type 1 font. The set of characters to include that the fonts be available to all potential recipients can be specified either by a list of names or of the document. Including such fonts in the doc- by a set of character codes. In the latter case, the ument itself raises serious copyright issues, as well character codes must be translated into glyph names as causing the resulting file to be large and slow to through the font’s encoding vector. print. In this paper, we present T1part, which al- The resulting output of the program is a partial lows such fonts to be included in PostScript output font in PFA format. As integrated into dvips, the in a partial form. It is currently integrated with program inserts this font directly into dvips’ output Tomas Rokicki’s dvips program (Rokicki, 1994). stream. While commercial programs with this feature have been available for a number of years, this is Algorithm the first freely available integrated implementation Before we present how T1part works, we must de- of this functionality. scribe the format of a Type 1 font. We will first The idea for this paper was suggested by Basil consider a font in PFB format; the PFA format is a Malyshev’s paper (Malyshev, 1994). The program is simple modification of this. A Type 1 font in PFB based on information available in the Adobe “Black format has the following structure and relevant key- ” (Adobe Systems, 1990). In addition, I found words in each part: work by Rajeev Karunakaran (1994), Chris B. Sears • (1991), and Al Stevens (1994) to be useful. ASCII part This paper discusses the interface, algorithms, – keyword /Encoding and finally the efficiency of using the T1part pro- • BINARY part (eexec encryption) gram. – keyword /lenIV – keyword /Subrs – keyword /CharStrings • ASCII part

TUGboat, Volume 16 (1995), No. 3 — Proceedings of the 1995 Annual Meeting 265 Sergey Lesenko

By “ASCII”, we mean the portion of the file that is • CharStrings size — number of selected char- not eexec encrypted. strings Since the font contains three different parts, We retain the indices of the subroutines for simplic- each part has its own parsing process. The minimum ity (so we do not need to rescan and modify each set of keywords listed above allows us to quickly subroutine) and to easily guarantee that each sub- parse the input at a high speed and with a simple routine only refers to those with a lower (an parser. Adobe requirement that prevents recursive subrou- The parsing process is as follows. First, we tines). read the initial ASCII portion and search for the Finally, the selected portions of the scanned Encoding keyword. After finding it we define its memory are eexec encrypted and directed to the out- type with the help of the next token. If the next put in hexadecimal (PFA) form. token is StandardEncoding, we will assume that To finish up the font, the final ASCII portion is the Adobe Standard Encoding is the default font sent to the output without changes. encoding vector. If the input font is in PFA format, the keywords If a reencoding has been specified for this font that define the beginning of the hexadecimal eexec in the dvips psfonts.map file, then we use that are currentfile eexec; a line of all zeros reencoding to translate the character codes to glyph marks the end of the section. names. Otherwise, we try to parse an encoding from the input stream by searching for index and glyph Results name pairs. If we do not find such an encoding, we search for and parse an AFM file associated with the T1part was tested by integrating it into dvips and font. running it over a number of files using the BaKoMa Next, we parse the BINARY portion of the font. collection of Computer Modern fonts, as well as the We start by performing the eexec decryption and Acrobat Reader selection of Adobe fonts. The re- loading the result directly and entirely into memory. sults of these tests are: We then scan the decrypted section for the keywords • Size of output file. The total size of the resulting /lenIV, /Subrs,and/CharStrings.Eachofthe PostScript file when using T1part and PS fonts sections separated by these keywords has a similar was compared with that when using PK structure. fonts under both 300 dpi and 600 dpi. Usually, When initially parsing the Subrs section, we the bitmapped font output was slightly smaller simply identify what subroutines exist by number at 300 dpi, but slightly larger at 600 dpi. Thus, and keep track of the number of tokens in each sub- using partial Type 1 fonts is practical from the routine definition. We do not initially send out any perspective of final file size. subroutines. When we parse the CharStrings section, we • Reduction in PS fonts. In our tests, we found initially identify what subroutines are associated that usually less than half of the characters with each character. We parse the subroutines from body text fonts were used, and a very for each used character, recursively diving into the small fraction of special fonts were used. On Subrs entries as necessary, marking which subrou- average, the size of the partial PostScript font tines are actually used. If there are multiple Subrs created by T1part was less than 30% of the size sections, as is the case with some hybrid fonts, we of the original font. consider all such sections. If a required character is a composite character, The figures are given in detail in tables 1–3. then the component characters will be marked and The output of the program has been tested with processed as above. (Deutsch, 1993), and the time and mem- For efficiency, all selected subroutines are de- ory required to view documents created with partial crypted only once; a flag associated with each sub- fonts was less than those with the whole fonts. routine is used to indicate whether that subroutine It is clear that the popular paradigm of TEX— has already been decrypted. dvips — GhostScript will be more efficient when us- After finishing the scanning phases, the size of ing the integrated T1part functionality. the Subrs and CharStrings arrays must change to reflect the deleted subroutines. The new values are: Acknowledgements • Subrs size — largest selected subroutine plus I would like to thank Tomas Rokicki for his helpful one discussion and suggestions.

266 TUGboat, Volume 16 (1995), No. 3 — Proceedings of the 1995 Annual Meeting T1part: Printing TEX Documents with Partial Type 1 Fonts

References Adobe Systems. 1 Font Format, version 1.1. Addison Wesley, 1990. L. P. Deutsch. “Aladdin Ghostscript version 2.6.1”. Electronic distribution from ftp.cs.wisc.edu via CTAN, 1993.

R. Karunakaran. “PFB2PFA program”. Electronic 1800 622 405 distribution via comp.sources.postscript, 947 684

volume 03 issue 51, 1994. 14 933 095 B. K. Malyshev. “Problems of the conversion of fonts to PostScript Type 1”. In Pro- ceedings of TUG94, edited by M. Goossens, 1200 480 083 588 696

Santa Barbara, CA. 1994. 4 487 168 T. Rokicki. “Dvips: A TEX Driver”. Electron- ically distributed from labrea.stanford.edu PK via CTAN with dvips, version 5.58, 1994. 1016 Partial 418 895 479 080

C. B. Sears. “CHARS program”. Electronic distri- 3 082 215 bution via comp.sources.misc volume 19 issue 94, 1991. 600 A. Stevens. “Quincy: a C interpreter”. ‘Dr. Dobb’s’ 335 981 journal (09), 1994. Electronic distribution as 251 756 1 012 195 ftp://ftp.mv.com/pub/ddj/1994/1994.09/ Type 1 fonts qnc41.zip. PS 300 424 529 274 205 104 036 – Partial 398 358 277 579 format is used for Type 1 PS Full PFB – * Dvips’ output size without embedding of fonts — 228 918 bytes ** 762 981 277 579 ∗∗ (bytes) (bytes) Table 1: Dvips’ output* with various modes of used fonts (dvips.dvi, version 5.58, used as input). Font type Mode of font embedding Output resolution (dpi) Output size (bytes) Output with compressed bitmap fonts Disk space required for fonts

TUGboat, Volume 16 (1995), No. 3 — Proceedings of the 1995 Annual Meeting 267 Sergey Lesenko ulfn (bytes) font Full (bytes) font Partial used characters Percentage version Font name Font ulfn (bytes) font Full (bytes) font Partial used characters Percentage version Font name Font al :Ecec fprilfn onodn o ahue ot(vp.v sda input) as used (dvips.dvi font used each for downloading font partial of Efficiency 2: Table cmbx10 dcbx10 8757 28 2768 52 5766 25 5311 75 ∗∗∗ 55 34 ∗ 3960 53 675 40 cmr10 7211 77 039 34 *** dcr10 *PrdsaFnsCleto 10peees,1993) (1.0-prerelease, Collection Fonts Paradissa ** * ∗∗∗ 75 44 B B ∗ a a al :Rslso using of Results 3: Table K K 3360 53 o o 6871 76 314 7 cmr8 M M 732 6 dcr8 ∗∗∗ a/ a/ 14 ∗ 9 DC CM cmsl10 dcsl10 0687 80 555 14 4870 54 903 14 ot olcin(.,(ee-) 1994) (Level-B), (1.0, Collection Fonts ot olcin(.,(ee-) aur 95) January (Level-C), (1.3, Collection Fonts ∗∗∗ 18 27 ∗ 5827 15 2105 92 dcti10 cmti10 8382 18 5948 65 ∗∗∗ 17 28 DC ∗ dctt10 nta of instead 3261 83 450 36 cmtt10 2915 52 908 37 ∗∗∗ 44 72 ∗ cminch 3211 23 cminch CM 438 4 3211 23 438 4 19 ∗∗ fonts 19 ∗∗ cmmi10 cmmi10 3752 63 628 3 3752 63 628 3 6 ∗ 6 ∗ cmsy10 cmsy10 2613 52 862 3 2613 52 862 3 7 ∗ 7 ∗ cmsy7 4951 54 cmsy7 929 3 4951 54 929 3 7 ∗ 7 ∗ logo10 logo10 372 5 301 5 372 5 301 5 99 ∗∗ 99 ∗∗ 8 345 685 5 527 154 3 720 533 6 097 169 Total Total 23 32

268 TUGboat, Volume 16 (1995), No. 3 — Proceedings of the 1995 Annual Meeting