A Functional Description of TeX's Formula Layout


REINHOLD HECKMANN and REINHARD WILHELM
Fachbereich Informatik, Universität des Saarlandes, Saarbrücken, Germany
{heckmann,wilhelm}@cs.uni-sb.de

Abstract

While the quality of the results of TeX's mathematical formula layout algorithm is convincing, its original description is hard to understand since it is presented as an imperative program with complex control flow and destructive manipulations of the data structures representing formulae. In this paper, we present a re-implementation of TeX's formula layout algorithm in the functional language SML, thereby providing a more readable description of the algorithm, extracted from the monolithic TeX system.

Introduction

The mathematical formula layout algorithm used by D. E. Knuth's TeX typesetting system generates remarkably good output. However, any attempt to understand the reasons for this success leads to deep frustration. The algorithm is informally described in Appendix G of the TeXbook (Knuth a), using English prose with some formal fragments. While this description provides a useful and welcome overview, the details are not completely correct. A complete and exact description of the whole TeX implementation is presented in TeX: The Program (Knuth b). It contains the full source code of the TeX system, structured into logical units, well commented, and documented by cross references. Nevertheless, anyone who wishes to gain a full understanding of TeX's algorithms from these descriptions must invest great effort. The reason is that both the documented source code and the informal description are typical examples of imperative programs, involving complex control flow including gotos, and complicated manipulations of the various data structures. In particular, the usage of global variables obscures the interdependencies between the subtasks. It is by no means obvious where certain information is produced, where it is changed, and
where it is consumed.

In this paper, we present a new implementation of TeX's formula layout using the functional language SML (Paulson). The purpose of this re-implementation effort is twofold. First, it provides a novel and hopefully more understandable description of TeX's formula layout algorithm. This description was developed as part of a textbook (Wilhelm and Heckmann) on document processing. Second, it extracts this particular subtask of TeX from the monolithically designed TeX system, making it possible to study it independently and potentially to use it in systems other than TeX. Some remarks on the status and availability of the implementation are given later in the paper and in the conclusion.

The main task of the formula layout algorithm consists in translating formula terms (a kind of abstract syntax for formulae) into box terms, which describe the sizes and relative positions of the formula constituents in the final layout. Knuth's original data structure for formula terms was designed to achieve space and time efficiency. In our opinion, it is misconceived from a more logical point of view, as it mixes semantic concepts with details concerning spacing. Hence, we propose a new data structure for formula terms with a clean and simple design. This change does not cause harm since formula terms do not occur in other subtasks of TeX. In contrast, box terms are produced by nearly all subtasks of TeX. Thus, we tried to mimic the structure of Knuth's original box terms as closely as possible in our SML code, in order to keep a well-defined interface with other subtasks of the TeX system, each of which might be re-implemented in the future.

Concerning the algorithm itself, we also tried to capture TeX's behavior exactly, but still we cannot definitively claim that our algorithm is the same as Knuth's original algorithm. The change in the structure of formula terms and the switch to the functional paradigm cause big changes in the structure of
the code and the temporal order of operations. On the other hand, we claim that for equivalent input formulae, the resulting box terms are equivalent, apart from certain round-off errors (Knuth uses a sophisticated self-defined arithmetic, while we use SML's standard arithmetic functions). Our claim is confirmed by practical tests: after programming a translator from box terms to dvi code, the standard output from TeX, we can print the results of our formula layout program and verify that they are visually indistinguishable from the results of TeX.

The overall view of formula layout is presented first, together with some details that influence the typesetting: the styles of formulae and subformulae, style parameters, and character dimensions. Knuth's account of these things is very concrete. In contrast, we present an abstract interface that hides the details of font-table organization and makes clear how the information is used. Subsequent sections present the output of the layout algorithm (box terms) and its input (formula terms). These are followed by a set of specialized functions that translate subformulae of various kinds into box terms, and then by the translation of whole formulae, including the introduction of implicit spaces between adjacent entities of certain kinds. A fair estimation of our achievements is contained in the conclusion.

Of course, our functional solution is not simpler than the problem admits. Formula layout is an inherently difficult problem, not in terms of computational but of algorithmic complexity. There are many different kinds of mathematical formulae whose layout is governed by tradition and aesthetics. Algorithms for formula layout must distinguish many cases and pay attention to many little details.

In the presentation of our implementation, we do not list the SML program in its natural ordering. SML programs must be written in a bottom-up style because everything must be defined prior to its usage. For some of the description in a paper like this,
a top-down approach is more suitable. To save space, we do not present the arrangement of the code into signatures and structures; every type specification is in fact part of a signature, while every function definition is part of some structure.

An Overview of Formula Layout

Complete Processing of Formulae

In TeX, a formula is entered using a formula description language, which is a sublanguage of TeX's input language. The formula $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, for instance, is described by the expression \sum_{i=1}^n i = {n(n+1) \over 2}. Formulae are processed as follows:

1. The string representing the formula is read and parsed into a formula term, which essentially corresponds to the logical structure of the formula, but additionally contains some spacing information.
2. The formula term is recursively translated into a box term, which describes exactly the layout of the formula.
3. The box term corresponding to the formula becomes a subterm of the box term for the page where the formula occurs.
4. The box terms for the pages of the document are translated into the dvi language and written to some file. The dvi file essentially consists of commands specifying where to print which symbols.

Before any formula is processed, a preprocessing step reads information about the size of characters, the thickness of fraction strokes, etc. This information comes from tfm files, which exist for every combination of font and type size.

This paper is only concerned with point 2. The actual SML implementation also contains the preprocessing step, addresses point 3 in a trivial manner (the whole page is a sequence of formulae), and performs step 4 completely in order to obtain a visible output. It does not yet address point 1, i.e., formulae must be entered as formula terms in SML notation.

Formula Styles

Formulae may appear in two different contexts: as inline formulae, e.g., $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, within a line of text, and as displayed formulae in a line of their own, e.g.,

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

As one can see, the layouts of inline and displayed formulae
are quite different. In our example, this concerns the positions of the limits of the sum, and the sizes of the sum symbol and of the parts of the fraction. These properties are influenced by the layout style (Knuth a). Displayed formulae are typeset in display style D, while text style T is used for inline formulae. There are two further styles, which apply to certain subformulae: script style S for scripts (superscripts and subscripts) and other subformulae with small typesetting, and script script style SS for scripts of scripts and other subformulae with tiny typesetting.

In TeX, there are four more styles D', T', S', and SS', which are called cramped styles. They apply to subformulae that are placed under something else, e.g., denominators. In cramped styles, superscripts are less raised than in the corresponding uncramped styles.

Analyzing the usage of the eight TeX styles, it turns out that they may be regarded as pairs of a main style and a Boolean value "cramped", where the two components are independently calculated and used. Thus, we decided to separate them completely and define

    datatype style = D | T | S | SS

Style Parameters

Every style has its own rules where to place scripts, numerators, denominators, etc. These rules are given by style parameters read from the tfm files during the preprocessing phase. In TeX, these style parameters are accessed via the style size, which is identical for D and T. In many cases, two different style parameters are used for the same purpose: one for style D and the other one for the remaining styles. Our preprocessing, however, produces slightly more high-level style
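
The separation of the main style from the Boolean value "cramped", as described under Formula Styles, might be sketched in SML roughly as follows. Only the datatype style is taken from the text. The type name fullStyle and the functions scriptStyle, fractStyle, supStyle, subStyle, numStyle and denStyle are hypothetical names introduced here for illustration, and the transition rules follow the usual description of TeX's style changes in Appendix G of the TeXbook rather than the authors' actual code.

```sml
(* From the text: the four main styles. *)
datatype style = D | T | S | SS

(* Hypothetical: a main style paired with the independent Boolean "cramped". *)
type fullStyle = style * bool

(* Hypothetical helper: main style used for superscripts and subscripts. *)
fun scriptStyle D  = S
  | scriptStyle T  = S
  | scriptStyle S  = SS
  | scriptStyle SS = SS

(* Hypothetical helper: main style used for the parts of a fraction. *)
fun fractStyle D  = T
  | fractStyle T  = S
  | fractStyle S  = SS
  | fractStyle SS = SS

(* Superscripts and numerators inherit the cramped flag;
   subscripts and denominators are always cramped. *)
fun supStyle (st, cramped) : fullStyle = (scriptStyle st, cramped)
fun subStyle (st, _)       : fullStyle = (scriptStyle st, true)
fun numStyle (st, cramped) : fullStyle = (fractStyle st, cramped)
fun denStyle (st, _)       : fullStyle = (fractStyle st, true)
```

Under these assumptions, denStyle (D, false) evaluates to (T, true): the denominator of a fraction in display style is set in cramped text style. Because the two components never influence each other, each can be computed by its own small function, which is the point of separating them.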