A Functional Description of TEX's Formula Layout

AFunctional Description of T Xs Formula Layout E REINHOLD HECKMANN REINHARD WILHELM Fachbereich Informatik Universitat des Saarlandes Saarbrucke n Germany fheckmannwilhelmgcsunisbde Abstract While the quality of the results of T Xs mathematical formula layout algorithm is convincing E its original description is hard to understand since it is presented as an imp erative program with complex control ow and destructive manipulati ons of the data structures representing formulae In this pap er we present a reimplementation of T Xs formula layout algorithm in E the functional language SML therebyproviding a more readable description of the algorithm extracted from the monolithical T X system E Intro duction The mathematical formula layout algorithm used byDEKnuths T Xtyp esetting E system generates remarkably go o d output However any attempt to understand the rea sons for this success leads to deep frustration The algorithm is informally describ ed in App endix G of the T Xb o ok Knuth a using English prose with some formal frag E ments While this description provides a useful and welcome overview the details are not completely correct A complete and exact description of the whole T X implementa E tion is presented in T X The Program Knuth b It contains the full source co de E of the T X system structured into logical units well commented and do cumented by E cross references Nevertheless anyone who wishes to gain a full understanding of T Xs E algorithms from these descriptions must invest great eorts The reason is that b oth the do cumented source co de and the informal description are typical examples of imp erative programs involving complex control ow including gotos and complicated manipula tions of the various data structures In particular the usage of global variables obscures the interdep endencies b etween the subtasks It is by no means obvious where certain information is pro duced where it is changed and where it is consumed In this pap er we present a new implementation of T Xs formula layout using the E functional language SML Paulson The purp ose of this reimplementation eort is twofold First it provides a novel and hop efully more understandable description of T Xs E formula layout algorithm This description was develop ed as part of a textb o ok Wilhelm and Heckmann on do cument pro cessing Second it extracts this particular subtask of T X from the monolithically designed T X system leading to the p ossibilitytostudy E E it indep endently and to p otentially use it in systems other than T X Some remarks on E R Heckmann and R Wilhelm status and availability of the implementation are contained in Section and in the conclusion Section The main task of the formula layout algorithm consists in translating formula terms a kind of abstract syntax for formulae into b ox terms which describ e the sizes and relative p ositions of the formula constituents in the nal layout Knuths original data structure for formula terms was designed to achieve space and time eciency In our opinion it is misconceived from a more logical p oint of view as it mixes semantic concepts with details concerning spacing Hence we prop ose a new data structure for formula terms with a clean and simple design This change do es not cause harm since formula terms do not o ccur in other subtasks of T X In contrast b ox terms are pro duced by nearly E all subtasks of T X Thus we tried to mimic the structure of Knuths original b ox terms E as close as p ossible in our SML co de in order to keep a welldened interface with other subtasks of the T X system eachofwhich might b e reimplemented in the future E Concerning the algorithm itself we also tried to catch exactly T Xs b ehavior but E still we cannot denitively claim that our algorithm is the same as Knuths original algorithm The change in the structure of formula terms and the switch to the functional paradigm causes big changes in the structure of the co de and the temp oral order of op erations On the other hand we claim that for equivalent input formulae the resulting box terms are equivalent apart from certain roundo errors Knuth uses a sophisticated selfdened arithmetic while we use SMLs standard arithmetic functions Our claim is conrmed by practical tests After programming a translator from b ox terms to dvi co de the standard output from T X we can print the results of our formula layout E program and verify that they are visually indistinguishable from the results of T X E In Section the overall view of formula layout is presented together with some details that inuence the typ esetting the stylesofformulae and subformulae Section style parameters and character dimensions Knuths account of these things is very concrete In contrast we present an abstract interface that hides the details of fonttable organization and makes clear how the information is used Section presents the output of the layout algorithm box terms and Section the input formula terms In Section a set of sp ecialized functions is presented that translate subformulae of various kinds into b ox terms Section treats the translation of whole formulae including the intro duction of implicit spaces b etween adjacententities of certain kinds A fair estimation of our achievements is contained in the conclusion Section Of course our functional solution is not simpler than the problem admits Formula layout is an inherently dicult problem not in terms of computational but of algorithmic complexity There are many dierent kinds of mathematical formulae whose layout is governed by tradition and aesthetics Algorithms for formula layout must distinguish many cases and pay attention to many little details In the presentation of our implementation we do not list the SML program in its natural ordering SML programs must b e written in a b ottomup style b ecause everything must b e dened prior to its usage For some of the description in a pap er like this a topdown approach is more suitable Tosave space we do not present the arrangement of the co de into signatures and structures every typ e sp ecication is in fact part of a signature while every function denition is part of some structure AFunctional Description of T Xs Formula Layout E AnOverview of Formula Layout Complete Processing of Formulae In T X a formula is entered using a formula description language which is a sublanguage E P n nn for instance is describ ed bythe of T Xs input language The formula i E i expression sumin i n n over Formulae are pro cessed as follows The string representing the formula is read and parsed into a formula termwhich essentially corresp onds to the logical structure of the formula but additionally contains some spacing information The formula term is recursively translated into a box termwhich describ es exactly the layoutoftheformula The b ox term corresp onding to the formula b ecomes a subterm of the b ox term for the page where the formula o ccurs The b ox terms for the pages of the do cument are translated into the dvi language and written to some le The dvile essentially consists of commands where to print which symb ols Before any formula is pro cessed a prepro cessing step reads information ab out the size of characters the thickness of fraction strokes etc This information comes from tfm leswhich exist for every combination of fontandtyp e size This pap er is only concerned with p oint The actual SML implementation also con tains the prepro cessing step addresses p oint in a trivial manner the whole page is a sequence of formulae and p erforms step completely in order to obtain a visible output It do es not yet address p ointieformulae must b e entered as formula terms in SML notation Formula Styles P n nn Formulae may app ear in two dierentcontexts as inline formulae eg i i within a line of text and as displayed formulae in a line of their own eg n X nn i i As one can see the layouts of inline and displayed formulae are quite dierent In our example this concerns the p ositions of the limits of the sum and the sizes of the sum symb ol and of the parts of the fraction These prop erties are inuenced bythelayout style Knuth a page Displayed formulae are typ eset in display style D while text style T is used for inline formulae There are two further styles which apply to certain subformulae script style S for scripts sup erscripts and subscripts and other subformulae with small typ esetting and script script style SS for scripts of scripts and other subformulae with tinytyp esetting In T X there are four more styles D T S and SS which are called cramped styles E R Heckmann and R Wilhelm They apply to subformulae that are placed under something else eg denominators In cramp ed styles sup erscripts are less raised than in the corresp onding uncramp ed styles Analyzing the usage of the eightTXstyles it turns out that they may b e regarded E as pairs of a main style and a Bo olean value cramp ed where the two comp onents are indep endently calculated and used Thus we decided to separate them completely and dene datatype style DTSSS Style Parameters Every style has its own rules where to place scripts numerators denominators etc These rules are given by style parameters read from the tfm les during the prepro cessing phase In T X these style parameters are accessed via the style size whichisidentical E for D and TInmany cases two dierentstyle parameters are used for the same purp ose one for style D and the other one for the remaining styles Our prepro cessing however pro duces slightly more high level style

Load more