AFunctional Description of

T Xs Formula Layout

E

REINHOLD HECKMANN REINHARD WILHELM

Fachbereich Informatik Universitat des Saarlandes

Saarbrucke n Germany

fheckmannwilhelmgcsunisbde

Abstract

While the quality of the results of T Xs mathematical formula layout is convincing

E

its original description is hard to understand since it is presented as an imp erative program

with complex control ow and destructive manipulati ons of the data structures representing

formulae In this pap er we present a reimplementation of T Xs formula layout algorithm in

E

the functional language SML therebyproviding a more readable description of the algorithm

extracted from the monolithical T X system

E

Intro duction

The mathematical formula layout algorithm used byDEKnuths T Xtyp esetting

E

system generates remarkably go o d output However any attempt to understand the rea

sons for this success leads to deep frustration The algorithm is informally describ ed in

App endix G of the T Xb o ok Knuth a using English prose with some formal frag

E

ments While this description provides a useful and welcome overview the details are

not completely correct A complete and exact description of the whole T X implementa

E

tion is presented in T X The Program Knuth b It contains the full source co de

E

of the T X system structured into logical units well commented and do cumented by

E

cross references Nevertheless anyone who wishes to gain a full understanding of T Xs

E

from these descriptions must invest great eorts The reason is that b oth the

do cumented source co de and the informal description are typical examples of imp erative

programs involving complex control ow including gotos and complicated manipula

tions of the various data structures In particular the usage of global variables obscures

the interdep endencies b etween the subtasks It is by no means obvious where certain

information is pro duced where it is changed and where it is consumed

In this pap er we present a new implementation of T Xs formula layout using the

E

functional language SML Paulson The purp ose of this reimplementation eort is

twofold First it provides a and hop efully more understandable description of T Xs

E

formula layout algorithm This description was develop ed as part of a textb o ok Wilhelm

and Heckmann on do cument pro cessing Second it extracts this particular subtask

of T X from the monolithically designed T X system to the p ossibilitytostudy

E E

it indep endently and to p otentially use it in systems other than T X Some remarks on E

R Heckmann and R Wilhelm

status and availability of the implementation are contained in Section and in the

conclusion Section

The main task of the formula layout algorithm consists in translating formula terms a

kind of abstract syntax for formulae into b ox terms which describ e the sizes and relative

p ositions of the formula constituents in the nal layout Knuths original data structure

for formula terms was designed to achieve and time eciency In our opinion it

is misconceived from a more logical p oint of view as it mixes semantic concepts with

details concerning spacing Hence we prop ose a new data structure for formula terms

with a clean and simple design This change do es not cause harm since formula terms

do not o ccur in other subtasks of T X In contrast b ox terms are pro duced by nearly

E

all subtasks of T X Thus we tried to mimic the structure of Knuths original b ox terms

E

as close as p ossible in our SML co de in order to keep a welldened interface with other

subtasks of the T X system eachofwhich might b e reimplemented in the future

E

Concerning the algorithm itself we also tried to catch exactly T Xs b ehavior but

E

still we cannot denitively claim that our algorithm is the same as Knuths original

algorithm The change in the structure of formula terms and the switch to the functional

paradigm causes big changes in the structure of the co de and the temp oral order of

op erations On the other hand we claim that for equivalent input formulae the resulting

box terms are equivalent apart from certain roundo errors Knuth uses a sophisticated

selfdened arithmetic while we use SMLs standard arithmetic functions Our claim

is conrmed by practical tests After programming a translator from b ox terms to dvi

co de the standard output from T X we can print the results of our formula layout

E

program and verify that they are visually indistinguishable from the results of T X

E

In Section the overall view of formula layout is presented together with some details

that inuence the typ esetting the stylesofformulae and subformulae Section style

parameters and character dimensions Knuths account of these things is very

concrete In contrast we present an abstract interface that hides the details of fonttable

organization and makes clear how the information is used

Section presents the output of the layout algorithm box terms and Section the

input formula terms In Section a set of sp ecialized functions is presented that translate

subformulae of various kinds into b ox terms Section treats the translation of whole

formulae including the intro duction of implicit spaces b etween adjacententities of certain

kinds

A fair estimation of our achievements is contained in the conclusion Section Of

course our functional solution is not simpler than the problem admits Formula layout

is an inherently dicult problem not in terms of computational but of algorithmic

complexity There are many dierent kinds of mathematical formulae whose layout is

governed by tradition and aesthetics Algorithms for formula layout must distinguish

many cases and pay attention to many little details

In the presentation of our implementation we do not list the SML program in its

natural ordering SML programs must b e written in a b ottomup style b ecause everything

must b e dened prior to its usage For some of the description in a pap er like this a

topdown approach is more suitable Tosave space we do not present the arrangement

of the co de into signatures and structures every typ e sp ecication is in fact part of a

signature while every function denition is part of some structure

AFunctional Description of T Xs Formula Layout

E

AnOverview of Formula Layout

Complete Processing of Formulae

In T X a formula is entered using a formula description language which is a sublanguage

E

P

n nn

for instance is describ ed bythe of T Xs input language The formula i

E

i



expression

sumin i n n over

Formulae are pro cessed as follows

The string representing the formula is read and parsed into a formula termwhich

essentially corresp onds to the logical structure of the formula but additionally

contains some spacing information

The formula term is recursively translated into a box termwhich describ es exactly

the layoutoftheformula

The b ox term corresp onding to the formula b ecomes a subterm of the b ox term for

the where the formula o ccurs

The b ox terms for the pages of the do cument are translated into the dvi language

and written to some le The dvile essentially consists of commands where to

print which symb ols

Before any formula is pro cessed a prepro cessing step reads information ab out the size

of characters the thickness of fraction strokes etc This information comes from tfm

leswhich exist for every combination of fontandtyp e size

This pap er is only concerned with p oint The actual SML implementation also con

tains the prepro cessing step addresses p oint in a trivial manner the whole page is a

sequence of formulae and p erforms step completely in order to obtain a visible output

It do es not yet address p ointieformulae must b e entered as formula terms in SML

notation

Formula Styles

P

n nn

Formulae may app ear in two dierentcontexts as inline formulae eg i

i



within a line of text and as displayed formulae in a line of their own eg

n

X

nn

i

i

As one can see the layouts of inline and displayed formulae are quite dierent In our

example this concerns the p ositions of the limits of the sum and the sizes of the sum

symb ol and of the parts of the fraction These prop erties are inuenced bythelayout

style Knuth a page Displayed formulae are typ eset in display style D while

text style T is used for inline formulae There are two further styles which apply to

certain subformulae style S for scripts sup erscripts and subscripts and other

subformulae with small typ esetting and script script style SS for scripts of scripts and

other subformulae with tinytyp esetting

In T X there are four more styles D T S and SS which are called cramped styles E

R Heckmann and R Wilhelm

They apply to subformulae that are placed under something else eg denominators In

cramp ed styles sup erscripts are less raised than in the corresp onding uncramp ed styles

Analyzing the usage of the eightTXstyles it turns out that they may b e regarded

E

as pairs of a main style and a Bo olean value cramp ed where the two comp onents are

indep endently calculated and used Thus we decided to separate them completely and

dene

datatype style DTSSS

Style Parameters

Every style has its own rules where to place scripts numerators denominators etc These

rules are given by style parameters read from the tfm les during the prepro cessing

phase In T X these style parameters are accessed via the style size whichisidentical

E

for D and TInmany cases two dierentstyle parameters are used for the same purp ose

one for style D and the other one for the remaining styles Our prepro cessing however

pro duces slightly more high level style parameters that dep end on the style directlyThe

conversion from styles to style sizes and the distinction b etween D and the remaining

styles is done internally

Hence nearly all style parameters are given by functions of typ e style dist where

dist is the typ e used for horizontal and vertical distances and lengths As in T X

E



dist is an integer typ e where a dist value of corresp onds to pt Here pt is

the abbreviation of p oint a traditional unit of measure for printers and comp ositors in

Englishsp eaking countries In T X one inch equals exactly pt

E

In this overview we present a few imp ortantstyle parameters only The remaining ones

are intro duced when needed For example the parameters p ertaining to the p ositions of

subscripts are only explained when the typ esetting of subscripts is presented

Style parameter xHeight provides the height of the letter x in the currentstyle Param

eter MathUnit yields the size of a mathematical unit whichbyT Xs denition is one

E

eighteenth of the width of the letter M in the current Parameter RuleThickness

y InT Xs contains the thickness of fraction strokes and of over and underlines as in x

E

layout algorithm and in our reimplementation it is also used to control spacing an

abuse that makes it imp ossible to change the rule thickness separately from the spacing

Parameter AxisHeight contains the vertical distance from the axis to the

The latter is the line where most of the letters and all digits sit on while the former



is the line where to put fraction strokes consider for instance the baseline is



at the b ottom of the digit The axis also plays some role outside of fractions In a

welldesigned mathematical font manysymb ols are placed symmetrically to the axis

 

consider plus and equal in

 

Apart from the style parameters the layout is inuenced by some constants that do not

dep end on the style While their role in layout is similar to that of the style parameters

they have a completely dierent origin they are not read from the tfm les but explicitly

sp ecied in the co de A sample constantisscriptSpace of typ e dist which sp ecies an additional white space of pt to b e inserted after sup erscripts and subscripts

AFunctional Description of T Xs Formula Layout

E

Characters and Character Dimensions

The main result of the prepro cessing phase is the information ab out the dimensions

of every single character Before we presenthow this information is provided to the

remainder of the program wehave to consider howcharacters are co ded internally

x

Consider say the formula x whichisentered as xx In the formula description there

is no dierence b etween the two o ccurrences of x This is still the case in the formula term

obtained by parsing the description Both o ccurrences corresp ond to the same subterm

which is a pair consisting of a font family math italic in our example and a character

code in our case After typ esetting however the two o ccurrences of x are dierent

in size the font family has b een replaced by a concrete font that is dierent for the two

o ccurrences The choice of some concrete font from a font family of course dep ends on

the style T for the rst o ccurrence of x and S for the second

The prepro cessing phase of our algorithm provides the typ es that are needed for these

enco dings family for font families fontNr for fontnumb ers that refer to actual

and charCode for character co des It also provides the function

fontNumber style family fontNr

for the styledep endent selection of a concrete font from a font family

The tfm les contain four basic size parameters for eachcharacter in every font These

parameters are read by the prepro cessing phase and made available via the following

functions

charHeight charDepth charWidth charItalic

fontNr charCode dist

The height of a character is the distance from its top end to the baseline eg a and

g have the same height and f is bigger The depth is the distance from the baseline

to the b ottom end eg a has depth whereas g has p ositive depth The width is

the horizontal size as it is used in the comp osition of ordinary text It do es not takeinto

account a p ossible extension of the upp er right part of the character This extra extension

ie the dierence b etween the overall horizontal extension and the given width of the

character is the italic correction provided by function charItalic It b ecomes visible



in cases where b oth sup erscripts and subscripts are attachedtoacharacter in f for



instance the horizontal distance b etween the two scripts is the amount of the italic

correction of the letter f

It is a particularly painful subtask of formula layout to decide whether the italic cor

rection should b e added to the width of a particular o ccurrence of a character In the

original T X program this issue is complicated by the habit of taking advantage of the

E

destructive nature of imp erative programming by subtracting some corrections that were

added in earlier stages of the layout pro cess In our SML program an added correction

will never b e removed again

The tfm les contain another kind of information whichismadeavailable by

larger fontNr charCode charCode

For some characters a larger version is available in the same font In this case the co de

of this larger version is returned while the co de is unchanged otherwise

R Heckmann and R Wilhelm

Internally all character information is stored in vectors to guarantee ecient access

Top Level Layout Functions

Recall that formulae may app ear in two contexts inline and displayed Therefore we

dene two dierent functions for the translation from formula terms to b ox terms

inlineFormula displayFormula mlist hlist

where mlist mathematical list is the typ e of formula terms Section and hlist

horizontal list is the typ e of b ox terms Section

Of course the two functions are closely related we implementthemby a more general

function that will also b e used for the recursive translation of subformulae This function

will of course dep end on the style and the Bo olean cramp edness Apart from the style

there is another dierence b etween inline and displayed formulae inline formulae may

b e broken across lines while display formulae may not Hence inline formulae must b e

equipp ed with penalty information telling the line breaking algorithm where to break

On the other hand p enalties need not b e inserted into displayed formulae Thus the

more general function has a second Bo olean argument which indicates whether p enalties

should b e inserted

val MListToHList style bool bool mlist hlist

val displayFormula MListToHList D false false

val inlineFormula MListToHList T false true

Before we describ e the implementation of MListToHListwemust present the data

structures for formula terms mlist and b ox terms hlist

The Target RepresentationBoxTerms

The main task of formula layout is the translation of formula terms into b ox terms

In Sections through we describ e these b ox terms informally on a semantical

level The actual T X and SML implementations follow in the remaining subsections In

E

particular contains the necessary typ e denitions and the computation of the

dimensions of b ox terms The remaining two subsections describ e a selection of basic

functions to build or manipulate b oxes

Boxes and their Dimensions

A box is a rectangle whose edges are parallel to the page edges Eachbox comes with a

horizontal baseline and a referencepoint that is situated at the p oint where the baseline

meets the left of the b ox The reference p oint is used to p osition the b ox within

its For the outside world a b oxischaracterized by three size parameters of

typ e dist height h depth dand width w The heightisthevertical distance from the

top margin to the baseline the depth is the vertical distance from the baseline to the

b ottom margin and the width is the horizontal distance from the left to the right margin see Figure

AFunctional Description of T Xs Formula Layout

E

h



d



 

w

Fig Box with height hdepth d and width w

Each of these dimensions may b e negative To understand what this means wemust

clarify the meaning of distance from A to B Each page is equipp ed with a Cartesian

co ordinate system whose xaxis p oints to the right as usual while the y axis p oints

downward These directions were chosen to followtheow of text for scripts commonly

used by Europ ean languages Using these co ordinates wemay dene

Horizontal distance from A to B xco ordinate of B xco ordinate of A

Vertical distance from A to B y co ordinate of B y co ordinate of A

Most b oxes have p ositive width which means that their left margin is to the left of their

right margin Boxes with negative width may for instance arise from negative horizontal

spaces They have the prop erty that their left margin which carries the reference p oint

is to the right of their right margin which might app ear surprising Nevertheless this

do es corresp ond to their unusual physical prop erties they really do have negative width

whichwould b e imp ossible for any concrete physical b o dy

Height and depth are p ositive if the baseline lies within the shap e of the b ox ie b elow

the top margin and ab ove the b ottom margin The height is negative if the baseline is

ab ove the top margin and the depth is negativeifitisbelow the b ottom margin

It is imp ortanttonotethatabox on its own do es not have an absolute p osition on

the page The p osition of a b ox is only xed relative to the reference p oint of a sup erb ox

once the sup erb ox has b een formed Accordinglyevery b ox xes the p ositions of its

subb oxes relative to its own reference p oint In fact it is dicult in T Xtoxanything

E

to a particular p oint on the page

Horizontal Combination of Boxes

Boxes can b e combined to form larger b oxes either horizontally or vertically In a standard

horizontal combination of a list of b oxes the constituents are placed next to each other

from left to right so that their baselines app ear on the same horizontal line and the

right margin of eachbox is aligned with the left margin of the next b ox The reference

p oint of the combination is that of the leftmost constituent Figure shows a horizontal

combination of three b oxes The baseline of the combined b ox is the b old line through the

middle of the b ox The small gaps b etween anytwoneighb oring b oxes as well as b etween

the subb oxes and the surrounding b ox were intro duced to enhance the readabilityof

R Heckmann and R Wilhelm

w



h max h h h

 

h



h

h



d

d



d

w



d max d d d

 

w



w w w w

 

Fig A horizontal combination of three b oxes

the gure In reality the three subb oxes toucheach other and sit tightly within the

surrounding b ox Hence the natural width of the combined b ox is the sum of the

widths of the constituents

There are two mo dications of this standard pro cedure First the surrounding b oxmay

b e given a width dierent from its natural width This is done bystretching or shrinking

glue ie white space b oxes whose width admits some variability The main application

is the adjustment of lines of text to a common width where the glue comp onents are the

interword spaces As we shall see there are some applications of this principle in formula

layout as well

The second mo dication consists in the p ossibilitytovertically shift some b oxes in

the horizontal list The amount of shift is added to the y co ordinate and as the y axis

p oints downward a p ositiveamount means a downward shift Note that there is some

redundancy here a b oxinahorizontal list may b e shifted downward by an amount s

using the metho d just describ ed or by simply adding s to the depth and subtracting

s from the height In the original T X algorithm b oth metho ds o ccur We also use

E

b oth metho ds dep ending on when exactly the shift is p erformed when the subb oxis

constructed mo dify height and depth or when the horizontal list is formed

Vertical Combination of Boxes

A list of b oxes mayalsobecombined vertically In a standard vertical combination

the b oxes are placed in a vertical row from top to b ottom so that their left margins

are aligned and the b ottom margin of eachbox is aligned with the top margin of its

successor There is no canonical choice for the reference p oint of the surrounding b ox

Later we shall meet cases where it is the reference p ointofthetopmostbox of the b ox

at the b ottom or of some b oxinbetween In Figure we presenttwo examples As in

the previous gure the small gaps only exist for b etter visibility

In analogy to the case of horizontal combination there are two mo dications of the

standard pro cedure The rst is stretching shrinking to a predened height whichdoes

not o ccur in formula layout The second is the p ossibility to shift some b oxes to the right

by a given amount which might b e negative In contrast to the situation in horizontal

lists the right shift cannot b e replaced by some manipulationofthebox dimensions

AFunctional Description of T Xs Formula Layout

E

h h

h h

h h d h



d d

w w

h h

 

d d h d

 

d d

 

d d



w w

 

w max w w w max w w

 

Fig Two examples for vertical combinations

Implementation of the Box Types

In the actual T X implementation the entities that were called b oxes in the ab ove

E

informal description are known as no des There are several kinds of elementary no des

the most prominent b eing character no des kern no des white space of xed size and glue

no des variably sized white space Kern no des and glue no des have a socalled width

which is their real width in a horizontal list or the heightinavertical list Comp ound

no des are called b oxes b esides a description of their content a list of no des they

consist of an indicator telling whether the list is horizontal or vertical elds containing

the three dimensions a eld for the shift amountwhich applies to them and some further

information Rememb er that the interpretation of the shift amount dep ends on whether

the b ox is placed in a horizontal or a vertical list

In our SML implementation we basically copied this data structure For obvious rea

sons we renamed the width of kern and glue no des into size The only more serious

change is the decision to extract the shift eld from a b ox The logical reason is that

the shift is not an intrinsic prop erty ofabox but something that is applied to it by

its context The pragmatic reason is that the shift is usually unknown when the b oxis

built so that some dummyvalue must b e assigned to the shift eld in the b ox itself to

be changed to the prop er value at some later stage While p ossible in principle sucha

pro cedure is unelegant and inappropriate in a functional setting

Thedatatyp e node has the following cases

Char of fontNr charCode

Acharacter no de where the character is sp ecied by its co de and a fontnumber

Kern of dist

Akern ie white space of a given size

Glue of glueSpec

A glue ie white space of variable size Typ e glueSpec is a record with a eld

size for the natural size of the glue and elds that sp ecify its stretchability and

shrinkability

Rule of dim

A rule ie a black rectangle with given dimensions Typ e dim is a record typ e with

elds height depth and width of typ e dist

The dimensions of a rule in T Xmaybe running ie undened and to b e de E

R Heckmann and R Wilhelm

termined by the context We do not use running dimensions b ecause they add

algorithmic and logical complexity and the dimensions of rules in formula layout

are known at construction time and cannot b e changed later on

Penalty of penalty

A p enalty no de ie p otential p oint of line breaking with a description of the

adequacy of a line break at this p oint

Box of dist box

Abox together with the amount of shift applied to it Typ e box is the following

record typ e

kind boxkind width dist depth dist height dist

content node list glueParam glueParam

where boxkind is HBox or VBoxThe glueParam eld indicates how glue no des in

the content no de list must b e mo died so that the sum of the horizontal HBox

or vertical VBox sizes of the comp onents equals the given size of the b ox The

most imp ortantvalue of typ e glueParam is natural which means that every piece

of glue gets its natural size

In our system the dimensions of a b ox are uniquely determined byitskindits

content list and its glueParam eld Thus the addition of explicit dimensions is

merely a matter of eciency In original T X this is not true since Knuth often

E

omits Kern no des at the very end of the content list While this saves some space

it leads to logical and algorithmic diculties in certain situations

Some op erations apply to horizontal or vertical no de lists without dierence while

others are sp eci for one direction To b e able to express this at least in their typ e

sp ecication wedene

hlist node list type vlist node list

Unfortunately the typ e checker cannot enforce these distinctions but we did not wantto

complicate things further byintro ducing constructors for the two kinds of lists The typ e

checker do es enforce the distinction b etween b oxes no des and no de lists a dierence

that sometimes seems to b e inappropriate

Note that the original T X implementation do es not distinguish b etween no des and

E

no de lists typ e node contains a eld next for chaining and b oth no des and no de lists

showupasentities of typ e p ointer to no de

Most rules have depth zero most b oxes are not shifted and sometimes wemust

construct a horizontal list from a single b ox Hence it proved to b e useful to intro duce

the following abbreviations

fun rule h w Rule height h depth zero width w

fun Box b Box zero b creates node with zero shift

fun HL b Box b creates hor list from box

We admit that this data structure is not optimal and clumsy in parts but it is close to

the original T X implementation and a thorough renement should only b e done after

E

exp eriences have b een collected from considering more subtasks of T X than just formula

E

layout

AFunctional Description of T Xs Formula Layout

E

The Dimensions of Nodes and Node Lists

We need functions to compute the dimensions of no des and no de lists Unfortunatelythe

dimensions of a no de are contextsensitive they dep end on whether the no de o ccurs in a

horizontaloravertical list Thus there are three functions width heightand depth

for no des in a horizontal list and four functions vwidth vheight vdepthand vsize

vheight plus vdepth for no des in a vertical list All functions have the same typ e node

dist Here we only present the denition of the two width functions

fun width Char info charWidth info

width Box width w w

width Rule width w w

width Glue size size

width Kern size size

width zero

fun vwidth Char info charWidth info

vwidth Box shift width w shift w

vwidth Rule width w w

vwidth zero

The dierence b etween the two functions lies in the handling of kern and glue sizes and

of b ox shifts

From the dimension functions for single no des those for no de lists are derived Three

functions deal with horizontal lists and two with vertical lists Functions vlistHeight

and vlistDepth are not needed

hlistWidth hlistHeight hlistDepth hlist dist

vlistWidth vlistVsize vlist dist

They are dened using a common auxiliary function

fun compute f g nl f map g nl

val hlistWidth compute sum width

val hlistHeight compute Max height

val hlistDepth compute Max depth

val vlistWidth compute Max vwidth

val vlistVsize compute sum vsize

The dimensions of a no de list are either the sum or the maximum of the corresp onding

no de dimensions Function Max is dened as fold max hence Max is and Max

never yields a negative result This b ehavior is taken over from T X

E

Basic Functions on Box Terms

In this subsection we presentseveral functions on b ox terms that are needed during

formula layout

R Heckmann and R Wilhelm

Extension to the right Often a white space has to b e added to the rightofagiven

no de eg when the italic correction is added to a character The following function

p erforms the addition in an intelligentway transforming a no de to a no de list

val extend dist node hlist extends to the right

fun extend dist node

let val extension if dist zero then else Kern dist

in node extension end

From node lists to boxes Next we dene an auxiliary function to pro duce a b ox with

given dimensions from a no de list

val makebox boxkind dim node list box

fun makebox boxkind height h depth d width w nl

kind boxkind height h depth d width w

content nl glueParam natural

The only purp ose of this function is to prepare the following two denitions

val hbox dim hlist box hbox with given dimensions

val vbox dim vlist box vbox with given dimensions

val hbox makebox HBox val vbox makebox VBox

The following function packs a horizontal list into a b ox with natural dimensions

val hpackNat hlist box

fun hpackNat nl hbox width hlistWidth nl

height hlistHeight nl

depth hlistDepth nl nl

Formula layout do es not use this function prop er but an optimized version if the given

no de list consists of a single unshifted b ox this b ox is returned

val boxList hlist box

fun boxList Box b b

boxList nl hpackNat nl

Centering around the axis The axis was intro duced in Section it is the line where

fraction strokes sit on Sometimes a b oxmust b e vertically centered around the axis

Assuming that the centered b ox will b e part of a horizontal list the centering can b e

p erformed by adding a suitable shift value

val axisCenter style box node

fun axisCenter st box

let val axh AxisHeight st

val h height box and d depth box

val shift half h d axh

in Box shift box end

AFunctional Description of T Xs Formula Layout

E

In the b eginning the baseline of the b ox coincides with the overall baseline Variable axh

is b ound to the distance from the axis to the baseline in the currentstyle This distance

is given bystyle parameter AxisHeight The total shift p erformed by this function can

b e considered as a shift by s h d followed bya shiftby s axhThe

 

rst shift results in an eective heightofh s h d and an eective depth of



d s h d ie yields a no de that is vertically centered around the baseline The



second shift by axh downward ie by axh upward yields a no de centered around the

axis as required

Horizontal centering Numerator and denominator of a fraction are usually centered



Hence we need a function within the horizontal extension of the fraction eg

n

rebox dist box box that centers a given b ox within a space of given width

This is not done by simply adding white space kerns at b oth sides of the b ox but by

adding glue of great exibility to b oth ends of the horizontal list within the b ox and

then creating a new b ox of the sp ecied width by glue adaptation The details of the

pro cess are complex but it is fully implemented in our SML program We do not include

the co de here but only note that nothing is done to a b ox that already has the desired

width

fun rebox newWidth

b as kind width height depth content

if newWidth width then b else not specified here

The reason for this complexityistoallowT X users to mo dify the p osition of numerator

E



and denominator by means of explicit glue as is done in

n

Making Vertical Boxes

General vertical boxes We start with a fairly general function that creates a b oxout

ofavertical list of no des The reference p oint of the resulting b ox will b e the reference

p ointofoneofthenodeswhichwe call the reference no de This reference no de is not

necessarily the no de at the top or at the b ottom of the vertical list This function is

n

P

where the reference no de is the b ox with needed for big op erators with limits eg

i



the op erator symb ol and for fractions eg where the reference no de is the fraction



stroke a rule no de

The typ e of the considered function is

makeVBox dist node vlist vlist box

where the distance dist is the width of the b ox b eing built node is the reference no de

the rst vlist contains the no des ab ove the reference no de and the second vlist the

no des b elow

The width is added as an parameter since it is known anywaywhen makeVBox is called

thus its recalculation as the maximum of the widths of all involved no des is avoided

There is another design decision which seems to b e o dd at rst glance when makeVBox is

called the rst vlist has to b e enumerated from b ottom to top and the second vlist

R Heckmann and R Wilhelm

from top to b ottom This causes some trouble within makeVBox since the rst vlist

must b e reversed There is no loss in eciencyhowever since the twovertical lists must

b e concatenated anyway On the other hand the decision to enumerate the two lists

symmetrically ie from the reference no de toward the top and b ottom edges allows for

maximum exploitation of the inherent symmetry of the formulae that are typ eset using

makeVBox In fact using higher order functions usually only one piece of co de is needed

to comp ose b oth argument lists of makeVBox

After these preliminaries here is the denition of makeVBox

fun makeVBox w node upList dnList

let val h vlistVsize upList vheight node

val d vlistVsize dnList vdepth node

val nodeList revAppend upList node dnList

in vbox width w height h depth d nodeList end

where the auxiliary function revAppend is given by

fun revAppend yl yl

revAppend x xl yl revAppend xl x yl

Special instances of vertical boxes From the general function makeVBoxtwo sp ecial

instances are derived where the reference no de is at the top or b ottom end of the complete

vertical list

upVBox dnVBox dist box vlist box

As in makeVBox the rst argument is the width of the b ox b eing built In upVBoxthe

reference no de of typ e box is the b ottom no de and the whole vlist go es ab oveit

Function dnVBox works the other way round the reference no de is at the top and all

other no des are placed b elow it The vlist of upVBox is enumerated from b ottom to top

while the vlist of dnVBox is enumerated from top to b ottom

fun upVBox w box upList makeVBox w Box box upList

fun dnVBox w box dnList makeVBox w Box box dnList

There is a subtle dierence b etween makeVBox and the two new functions in makeVBox

the reference no de is of typ e node while it is of typ e box here This decision was made

by observing how the functions are called upVBox and dnVBox are always called with a

box while makeVBox is called with a b ox or a rule no de It is simpler to use the transfer

function Box in the b o dies of upVBox and dnVBox than to rep eat it in all their calls

Putting two things above each other Sometimes twonodesmust b e placed ab ove each



other with some white space in b etween for instance the two scripts in x In this case



neither of the two no des can b e used as reference no de so that none of the functions

dened ab ove is easily usable Thus we dene yet another function

above node dist dist node node

By a call above n s s n node n is placed ab ovenode n with white space of size s

    

in b etween Parameter s is the distance from the b ottom edge of n to the baseline of

 

AFunctional Description of T Xs Formula Layout

E

h

h h d s

d

w

s

s





h



d h d s s

 

d



w



w max w w



Fig Two no des placed ab oveeach other

the combined b ox see Figure Pairing of the two distances simplies the calls of this

function b ecause the two distances are returned as a pair by some other function The

result of above is a no de since all callers exp ect to see a no de

fun above n s s n

let val w max vwidth n vwidth n

val h vsizens

val d vsizens s

val nodeList n Kern s n

in Box vbox width w height h depth d nodeList end

Formula Terms

The main task of formula layout consists of translating formula terms into b ox terms Be

low we rst presenttheT X implementation of formula terms highlightitsweaknesses

E

and intro duce our redesigned implementation

The Original T XRepresentation

E

In T X formula terms are called math lists Math lists are sequences of math items

E

According to the description in the T Xbook Knuth a page a math item is

E

an atomahorizontal spaceastyle command eg textstyle a generalizedfraction

see Section or some other material whichwe do not consider here for simplication

The description in T X The Program uses dierent names Knuth b Par

E

There the atoms plus generalized fractions and several other entities are called noads

according to Knuth this word should b e pronounced as noads

Atoms have at least three parts a nucleusasuperscriptanda subscript Eachof

these elds maybeempty a math symb ol or a math list There are thirteen kinds of

atoms some of whichhave additional parts Eight atom kinds mainly regulate the spacing

between two adjacentatomsarelation atom such as is surrounded by some amount

R Heckmann and R Wilhelm

of space a binary atom such as by less space and an ordinary atom suchasxby

no extra space at all The remaining ve kinds of atoms have a more serious semantics

An overline atom for example is an overlined subformula

This internal representation deserves some criticism The sup erscript and subscript

elds are empty in most cases there should really b e sup erscript and subscript construc

tors The thirteen kinds of atoms combine two completely dierent asp ects a classica

tion needed to control spacing and the adjunction of meaningful constructors These two

asp ects should not b e mixed into a single concept InterestinglyT Xs layout algorithm

E

internally tries hard to distinguish these asp ects as we explain bytwo examples

Overline atoms are handled during a rst pass through the formula After addition

of the overline rule they are transformed into Ord atoms since the spacing of overline

atoms and Ord atoms is identical The actual interatom spaces are added in a second

pass through the formula

Fractions are math items noads but not atoms Their layout is computed during the

rst pass of the algorithm and afterwards they are transformed into Inner atoms The

kind Inner controls the spacing around fractions in the second pass of the algorithm

Thus the mixture of dierent concepts into the same notion entails destructive trans

formations of the formula data structure thereby making T Xs layout algorithm hard

E

to understand

AnAlternative Representation DenedinSML

Toavoid the problems mentioned ab ove we completely redesigned the internal represen

tation of formulae Our formula terms merely reect the logical structure of the formula

They do not contain spacing information except for characters The spacing attributes

of larger subformulae are explicitly computed during typ esetting

A formula term is an ob ject of typ e mlist noad list ie a list of noads we b orrow

this word from Knuth Typ e noad is a datatyp e with as many cases as there are sorts of

subformulae There are for instance cases for atomic symb ols for overlined and underlined

subformulae for sup erscripted and subscripted subformulae for op erator symb ols with

their limits and for generalized fractions We do not describ e all these cases here at once

but instead describ e them together with their translation to b oxtermsWe hop e that this

presentation metho d avoids redundancies in the description and enhances readabilityof

the pap er

Translating Noads into Horizontal Lists

Recall that typ esetting of a complete formula is done by the function

MListToHList style bool bool mlist hlist

where the rst Bo olean indicates cramp edness and the second Bo olean requests the

insertion of p enalties line breaking information

Function MListToHList uses the function

NoadToHList style bool noad hlist

AFunctional Description of T Xs Formula Layout

E

to translate the noads in its mlist argumentinto horizontal lists Insertion of p enalty

no des is handled by MListToHList itself Hence the p enalty Bo olean need not b e passed

to NoadToHList the Bo olean argument of this function is the cramp edness

In the course of its work NoadToHList recursively calls function MListToHList on

subformulae All these recursive calls are done via an auxiliary function

val cleanBox style bool mlist box

fun cleanBox st cr ml

boxList MListToHList st cr false no penalties ml

which accounts for the fact that subformulae do not need p enalty no des and packs the

resulting horizontal list into a single b ox

Below we present some sample cases of typ e noad and dene how NoadToHList acts

on these cases The denition of MListToHList follows in Section The overall format

of the denition of NoadToHList is

fun NoadToHList st cr fn list of cases

so that the function name and the rst two parameters need not b e rep eated with every

case o ccurrences of st and cr in the description b elow should b e understo o d as b eing

the formal parameters of this function denition

Ordinary Symbols and Characters

In principle there is no dierence in the treatment of sp ecic mathematical symb ols such

as or and ordinary characters suchasxNevertheless there are two exceptions

P

Big op erator symb ols suchas are handled in a sp ecic way see Section and after

selection of a text font suchasrm or it a sequence of characters is joined together as

in ordinary text Here and in the SML implementation we did not include the handling

of text characters

All mathematical symb ols are describ ed by a pair consisting of a character co de and a

font family Within a formula term an additional kind eld carrying spacing information

is attached to each symbol Thus the case of data typ e noad that corresp onds to a single

symb ol has format

MathChar of kind family charCode

where kind is dened by

datatype kind Ord Op Bin Rel Open

Close Punct Inner None

Kind Rel for example is assigned to relation symb ols suchasor and kind Open

is used for op ening delimiters such as and f The last kind None do es not o ccur in

the original T X implementation We use it as the kind of those entities that do not have

E

a kind in original T X

E

The translation of MathChar noads to horizontal lists ignores the kind information It

is used in a later stage of formula layout when the implicit spacing b etween adjacent

entities is computed Also the information ab out cramp edness is not needed Thus the

MathChar case of NoadToHList st cr lo oks as follows

R Heckmann and R Wilhelm

MathChar fam ch makeChar st fam ch

where auxiliary function makeChar is given by

val makeChar style family charCode hlist

fun makeChar st fam ch

let val charNode itCorr basicChar st false fam ch

in extend itCorr charNode end

A further auxiliary function basicChar returns a character no de charNode and the italic

correction itCorr of the character By the last line of co de a white space of length itCorr

is added to the right of the character no de Function extend is dened in Section

Function basicChar is also called when big op erators are typ eset The Bo olean pa

rameter indicates whether the given character is to b e enlarged This do es not happ

to ordinary MakeChar symb ols but may o ccur in the case of big op erators The function

selects a concrete font from the given family forms a character no de and returns the

amount of italic correction with it Callers of basicChar may then decide whether the

italic correction is added to the character no de or not

val basicChar style bool family charCode node dist

fun basicChar st enlarge fam ch

let val fontNr fontNumber st fam

val ch if enlarge then larger fontNr ch else ch

val info fontNr ch

in Char info charItalic info end

For fontNumber and larger see Section

Overlined and Underlined Subformulae

x y z The corresp onding cases of Subformulae maybeoverlined or underlined as in

datatyp e noad are Overline of mlist and Underline of mlist These constructors

are handled by function NoadToHList st cr as follows

Overline ml HL makeOver st cleanBox st true ml

Underline ml HL makeUnder st cleanBox st cr ml

First subformula ml is typ eset into a b oxby function cleanBox using the same style st as

the context Underlined subformulae are cramp ed i their context is cramp ed parameter

cr while overlined subformulae are always cramp ed parameter true Being cramp ed

always means b eing cramp ed from ab ove there is no notion of b eing cramp ed from

b elowinT X The subformula b ox resulting from calling cleanBox is passed to function

E

makeOver or makeUnder which adds the line to the b ox Both functions return a b ox

that must b e transformed into a horizontal list by HL

Before we describ e what makeOver do es let us consider an overlined subformula such

x more closelyItisavertical combination of the b oxcontaining x with a rule no de as

of appropriate width The rule do es not sit immediately at the top edge of xthereis

vertical space in b etween Finally there is also white space ab ove the rule that is invisible

AFunctional Description of T Xs Formula Layout

E

in this example but aects the layout of a formula where an overlined subformula o ccurs

is symmetrical b elow something else The structure of underlined subformulae suchasx

there is a space b etween x and the rule and another space b elow the rule The reference

p ointof x and x is that of x in b oth cases These are exactly the situations handled by

upVBox and dnVBox Section Now it pays o that their b ehavior is symmetrical

wemay dene

val makeOver makeLine upVBox val makeUnder makeLine dnVBox

using a common implementation makeLine for b oth cases This was not done in the

original T Xprogram

E

fun makeLine constrVBox st box

let val w width box box

val line rule RuleThickness st w

in constrVBox w box

Kern lineDist st line Kern linePad st end

Function RuleThickness is a style parameter returning the default rule thickness of the

currentstyle The rst Kern is the one b etween the subformula and the rule and the

second Kern is that b eyond the rule Their sizes are derived from the currentstyle by

functions

fun lineDist st RuleThickness st

fun linePad st RuleThickness st

GeneralizedFractions

A generalized fraction in T X is as the name indicates a very general concept that

E

includes ordinary fractions and binomial co ecients as sp ecial cases Because of the

generality of the concept a generalized fraction is describ ed byvecomponents together

organized as a record

GenFraction of genfraction

genfraction num mlist den mlist thickness dist option

left delim right delim

The core of a generalized fraction consists of twosubformulae to b e placed ab ove each

other The upp er one the numeratoriscontained in the num eld and the lower one

the denominator resides in the den eld

The desired thickness of the fraction strokemay b e explicitly sp ecied in the thickness

eld If the thickness is missing dist option the default rule thickness of the current

style is used If the thickness is sp ecied as zero there will b e no fraction stroke at all

and the rules for the placementofthenumerator and the denominator are quite dierent

from the case with a stroke

A generalized fraction may b e surrounded by delimiters such as parentheses that are

sp ecied in elds left and right Delimiters maybenull ie nonexistent this is a

sp ecial value of typ e delimWe shall say something more ab out delimiters later on in this subsection

R Heckmann and R Wilhelm



Examples for generalized fractions are ordinary fractions suchas with a strokeof





without stroke default thickness and null delimiters and binomial co ecients suchas



and with parentheses as delimiters

Typesetting generalizedfractions Generalized fractions are handled by the following

co de

GenFraction genFract HL doGenFraction st cr genFract

val doGenFraction style bool genfraction box

fun doGenFraction st cr left right thickness num den

let val st fract st

val numBox cleanBox st cr num

val denBox cleanBox st true den

in makeGenFraction st thickness left right numBox denBox end

Numerator and denominator are typ eset in a smaller style st than the style st of the

context This smaller style is computed by function fract

val fract style style

fun fract D T fract T S fract SS

The cramp edness of the numerator is inherited from the context cr while the denom

inator is always cramp ed true The b oxes resulting from numerator and denominator

together with the style and the remaining comp onents of the generalized fraction are

passed to function makeGenFraction

val makeGenFraction

style dist option delim delim box box box

fun makeGenFraction st thickness left right numBox denBox

The width of the central part of the generalized fraction without the surrounding de

limiters is the maximum of the widths of the numerator and the denominator

let val width max width numBox width denBox

Next the numerator and the denominator are centered within this width by adding glue

on b oth sides See Section for rebox

val numBox rebox width numBox

val denBox rebox width denBox

Now the thickness of the fraction stroke is calculated Rememb er that parameter thickness

is of typ e dist option If no thickness is sp ecied the default rule thickness of the current

style is used

val th case thickness of NONE RuleThickness st

SOME t t

Next the middle part of the generalized fraction is formed The p ositioning of the com

p onents heavily dep ends on the existence of the fraction stroke there is no strokeifth is zero

AFunctional Description of T Xs Formula Layout

E

val middle if th zero

then makeAtop st numBox denBox

else makeFract st th width numBox denBox

Function makeAtop is not presented in this pap er for makeFract see b elow

Left and right delimiter no des are constructed

val leftNode makeDelimiter st left

val rightNode makeDelimiter st right

and nally the delimiter no des are placed around the middle part and the resulting

horizontal list is packed into a b ox

in boxList leftNode middle rightNode end

Delimiters In formula terms delimiters are included as values of typ e delimWedo

not describ e this typ e here it contains information so that function

varDelimiter style dist delim node

can pro duce delimiter symb ols of given vertical size parameter dist dep ending on

the currentstyle If the delim argumentcontains a null value function varDelimiter

returns a Kern no de of size nullDelimiterSpace

This function is used for two purp oses First to make large delimiters around subformu

lae whose size dep ends on the size of the subformula this sp ecic task is not describ ed

here Second the function is used to make delimiters around generalized fractions In

T X the size of the delimiters do es not dep end on the size of the middle part of the

E

generalized fraction it merely dep ends on the style Hence we use the following function

for generalized fractions

val makeDelimiter style delim node

fun makeDelimiter st del varDelimiter st Delim st del

where Delim isastyle parameter

Proper fractions Now we present the layout of prop er fractions ie those with a

fraction stroke It is p erformed using function

makeFract style dist dist box box node

InacallmakeFract st th w numBox denBox argument st is the currentstyle th is the

thickness of the stroke w is the width of the whole fraction and also the width of the

twoboxes numBox and denBox whichcontain the numerator and the denominator of the

fraction



the stroke do es not sit on the baseline but on the axis of the In a fraction suchas



formula As one can see numerator and denominator do not touch the stroke there is

some white space b etween the three visible comp onents of the fraction The fraction is

formed as a vertical b ox from its vecomponents three visible ones and twokerns so

that the reference p oint is at the left end of the stroke Then the resulting b ox is shifted

upward so that the strokemoves from the baseline to the axis This vertical shift can b e

safely attached to the fraction b ox since we know that it will b e placed into the context

of a horizontal list by function makeGenFraction

R Heckmann and R Wilhelm

numerator

dnum



axisNum

distNum



halfTh

 

axis

halfTh fractNum



distDen



axh

axisDen

hden

 

baseline

Denom

  

denominator

Fig Distances in a prop er fraction

fun makeFract st th w numBox denBox

Toevaluate this call we rst compute half the thickness of the stroke and the distance

axh from the axis to the baseline whichisgiven bystyle parameter AxisHeight

let val halfTh half th

val axh AxisHeight st

Then a no de for the stroke is formed Height and depth are half the thickness so that

later the stroke will b e vertically centered around the axis

val stroke Rule height halfTh

depth halfTh width w

Now the distances distNum between the b ottom edge of the numerator and the top edge

of the stroke and distDen between the b ottom edge of the stroke and the top edge of

the denominator see Figure are computed by calling function distances with all the

relevant parameters

val distNum distDen

distances st axh halfTh depth numBox height denBox

The fraction b ox is formed from the stroke which will b e the reference no de twokerns

and the twoboxes using makeVBox Section Some common work is abstracted out

to the auxiliary function makeList

fun makeList dist box Kern dist Box box

val fractBox makeVBox w stroke makeList distNum numBox

makeList distDen denBox

Finallybox fractBox is shifted from the baseline to the axis The amountofshiftis

negative since the axis is ab ove the baseline

in Box axh fractBox end

AFunctional Description of T Xs Formula Layout

E

Distances within a fraction Unfortunately the distances within a fraction are not

given directly bystyle parameters Instead there is a style parameter fractNumwhich

sp ecies the desired distance from the baseline of the numerator to the overall baseline

By subtracting axh the distance axisNum from the baseline of the numerator to the

axis results Subtracting the height of the stroke whichishalfTh and the depth of the

numerator yields the distance distNum from the b ottom edge of the numerator to the

top edge of the stroke see Figure The calculations for the denominator are similar

starting with style parameter Denom Hence the co de for distances b egins as follows

fun distances st axh halfTh dnum hden

let val axisNum fractNum st axh

and axisDen Denom st axh

val distNum axisNum halfTh dnum

and distDen axisDen halfTh hden

The distances computed so far may b e to o small or even negativeifthenumerator is

very deep or if the denominator is very high In this case the distances are increased to

a minimum value given by fractMinDist

fun correct dist max dist fractMinDist st halfTh

in correct distNum correct distDen end

Function fractMinDist computes the minimum distance dep ending on the style and the

thickness of the stroke

fun fractMinDist D halfTh halfTh

fractMinDist halfTh halfTh

Subscripts Superscripts and Limits

Superscripts and subscripts Formulae with subscripts and sup erscripts are for in

n n

stance x x orx They are represented by the following case of noad

i

i

Script of script

where script is the following record typ e

nucleus mlist supOpt mlist option subOpt mlist option

The nucleus eld contains the main formula and the two remaining elds are the two

scripts where one of them may b e missing

Using two constructors Sup and Sub instead of Script is unsuitable since the case of

b oth scripts together is typ eset dierently from a mere combination of the two scripts

Compare for instance the twoformula sp ecications xin and xin which yield

n n

resp ectively x and x There is also a subtle dierence b etween an empty script sp ec

i

i

ied by eg x and an entirely missing script Hence wemust typ e the script elds

with mlist option a simple mlist with the empty list denoting a missing script do es not suce

R Heckmann and R Wilhelm

Big operators and limits Limits are the scripts attached to big op erators suchasin

sumin While their textual representation in T Xs input language is identical

E

with that of scripts we use a dierent constructor for their internal representation

The reason is twofold rst the big op erator itself is handled dierently from ordinary

mathematical symb ols and second the layout of limits diers from that of scripts in

some cases Scripts are always attached to the right of the nucleus but limits maybe

n

P

placed ab ove and b elow the op erator symb ol as in whichiscalledlimit positionor

i

P

n

to the right of the op erator as in nolimit position The p osition can b e inuenced

i

by writing sumlimits and sumnolimits resp ectively If nothing is sp ecied just

sum the p osition dep ends on the style it is limit p osition in displaystyle and nolimit

p osition otherwise

The constructor for big op erators is the following

BigOp of limits script

where script is the record typ e intro duced ab ove and limits is a data typ e of three

values yes means that limit p osition is chosen while no means nolimit p osition and

default is the case where the p osition dep ends on the style

Typesetting scripts and limits Overview Since the layout of limits in nolimit p osition

coincides with that of scripts we use the same general function doGenScripts for b oth

scripts and limits The b ehavior of this function dep ends on some Bo olean ags

doGenScripts style bool bool bool script hlist

The rst Bo olean is the usual cramp edness ag the second indicates whether the attach

ments are to b e placed in limit p osition and the third keeps the information whether

the function was called while handling BigOp or Script

When Script is handled doGenScripts can b e called directly

Script script doGenScripts st cr false false script

When handling BigOpwemust rst decide ab out the limit p osition

BigOp lim script

let val limits st D andalso lim default

orelse lim yes

in doGenScripts st cr limits true script end

Function doGenScripts p erforms the tasks common to limits and scripts

fun doGenScripts st cr limits isOp nucleus supOpt subOpt

First it calls doNucleus see b elow to typ eset the nucleus The result is a no de nucNode

a ag isChar indicating whether this no de is a character no de and the italic correction

itCorr of this character

let val nucNode itCorr isChar doNucleus st cr isOp nucleus

Then it determines the style of the attachments

AFunctional Description of T Xs Formula Layout

E

val st script st

recursively typ esets the subformulae note that the subscripts are cramp ed

val supOptBox optMap cleanBox st cr supOpt

val subOptBox optMap cleanBox st true subOpt

fun applyToArgs f f itCorr nucNode supOptBox subOptBox

and nally calls makeLimOp in case of limits in limit p osition or makeScripts in case of

limits in nolimit p osition and scripts

in if limits then HL applyToArgs makeLimOp st

else applyToArgs makeScripts st cr isChar

end

The choice of the style for the attachments is done by function script

fun script D S script T S script SS

Function optMap f maps NONE to NONEand SOME x to SOME f x

Typesetting the nucleus The task of typ esetting the nucleus is handled by

doNucleus style bool bool mlist node dist bool

where the rst Bo olean is cramp edness and the second indicates whether the function is

called from a big op erator context There are twoentirely dierentcasesIfthenucleus

is a single character it is typ eset by function makeNucChar while otherwise the usual

formatting function cleanBox is employed

fun doNucleus st isOp MathChar fam ch

makeNucChar st isOp fam ch

doNucleus st cr l Box cleanBox st cr l zero false

The result of doNucleus is a triple The third comp onent indicates whether the rst

comp onentisa character no de this is never the case if cleanBox is used The second

comp onent is the italic correction of the character in the nucleus or zero if the nucleus

is not a character

Anucleus consisting of a single character is typ eset by makeNucChar which slightly

diers from makeChar that is used for characters in all other contexts The result of

makeNucChar is a triple of the same format as the result of doNucleus

val makeNucChar style bool family charCode

node dist bool

fun makeNucChar st isOp fam ch

A big op erator in displaystyle is enlarged

let val enlarge isOp andalso st D

Function basicChar whichwe already met in Section returns a character no de and

the italic correction of this character

R Heckmann and R Wilhelm

val charNode itCorr basicChar st enlarge fam ch

Big op erators are vertically centered around the axis by axisCenter see Section

val nucNode if isOp then axisCenter st boxList charNode

else charNode

in nucNode itCorr not isOp end

Note that the result of axisCenter is not a character no de any more and thus the third

comp onent of the result is false in this case Thusthiscomponentdoes not indicate

whether the nucleus is sp ecied as a single character but whether it is a character no de

after typ esetting cf a in App endix G of the T Xb o ok Knuth a

E

If the translation of the nucleus is a character b ox Nevertheless the second

comp onent holds the italic correction even in the case of big op erators where the third

comp onentis false

Attaching the scripts We do not include co de for function makeLimOp in this pap er

but concentrate on function

makeScripts style bool bool dist

node box option box option hlist

which attaches the scripts the two box option arguments to the nucleus the node

argument The rst Bo olean is the cramp edness and the second tells whether the nucleus

isacharacter no de The dist argument is the italic correction of the nucleus The

function distinguishes b etween four cases dep ending on the existence or nonexistence

of the two scripts

fun makeScripts st cr isChar itCorr nucNode

fn NONE

fn NONE extend itCorr nucNode

SOME subBox makeSub st isChar nucNode subBox

SOME supBox

fn NONE makeSup st cr isChar itCorr nucNode supBox

SOME subBox makeSupSub st cr isChar itCorr

nucNode supBox subBox

If there are no scripts at all nothing happ ens but adding the italic correction to the

nucleus for extend see Section The other three cases are handled by sp ecialized

functions makeSub makeSup and makeSupSub Let us consider these three functions

together

fun makeSup st cr isChar itCorr nucNode supBox

let val hnuc height nucNode and dsup depth supBox

val shift SupPos st cr isChar hnuc dsup

val scriptNode Box shift supBox

in extend itCorr nucNode extendScript scriptNode end

AFunctional Description of T Xs Formula Layout

E

fun makeSub st isChar nucNode subBox

let val dnuc depth nucNode and hsub height subBox

val shift SubAlonePos st isChar dnuc hsub

val scriptNode Box shift subBox

in nucNode extendScript scriptNode end

fun makeSupSub st cr isChar itCorr nucNode supBox subBox

let val dnuc depth nucNode and hnuc height nucNode

val dsup depth supBox and hsub height subBox

val d SupSubDistances st cr isChar hnuc dsup dnuc hsub

val scriptNode above Box itCorr supBox d Box subBox

in nucNode extendScript scriptNode end

In all three cases the result is a horizontal list essentially consisting of the nucleus no de

followed by a no de for the scripts In makeSup and makeSub the script no de is the script

box shifted bysomeamount shift while in makeSupSub it consists of the two script

boxes put ab ove each other with some white space in b etween see Section for the

denition of above In all three cases some white space of constant size scriptSpace

is added to the right of the script no de This is done by

val extendScript extend scriptSpace

Finally note the dierent usage of the italic correction In makeSub it is not used at

all the subscript is added to the uncorrected nucleus to avoid that it app ears to o far

away from it consider f In makeSup the correction is added to the nucleus lest the





sup erscript runs into it consider f Consequently the correction is not added to the



nucleus in makeSupSub but used to shift the sup erscript to the right consider f



Positioning the scripts The p ositions of the scripts are computed by SupPos SubAlonePos

and SupSubDistances Let us concentrate on SupSubDistances This function must re

turn a pair of dist values the distance supDist from the b ottom of the sup erscript to

the baseline and the distance Dist from the b ottom of the sup erscript to the top of the

subscript

fun SupSubDistances st cr isChar hnuc dsup dnuc hsub

let val supDist SupPos st cr isChar hnuc dsup dsup

Function SupPos is used to compute the desired value for the distance from the baseline

of the sup erscript to the overall baseline By subtracting the depth of the sup erscript a

rst estimate for supDist results

val subDist SubWithSupPos st isChar dnuc hsub

val Dist supDist subDist

Analogously the distance subDist from the baseline to the top of the subscript is com

puted The sum of supDist and subDist is a rst estimate for Dist

val supDist max supDist minSupDist st

val Dist max Dist minSupSubDist st

in supDist Dist end

R Heckmann and R Wilhelm

If supDist and Dist are to o small they are increased to some styledep endent minimum

values derived from style parameters

fun minSupDist st xHeight st div

fun minSupSubDist st RuleThickness st

Nowwe present function SupPoswhich is used in makeSup and SupSubDistancesIt

denes the desired distance from the baseline of the sup erscript to the overall baseline

as the maximum of three numb ers

fun SupPos st isChar hnuc

if isChar then zero else hnuc SupDrop script st

fun SupPos st cr isChar hnuc dsup

Max SupPos st isChar hnuc

Sup cr st dsup xHeight st div

The rst numb er dep ends on the fact whether the nucleus is a single character no de If

not the numb er provides an upp er b ound SupDrop for the distance from the top of the

nucleus to the baseline of the sup erscript Function SupDrop is a style parameter that

do es not dep end on the currentstyle as usual but on the style of the sup erscript

The second number is a lower b ound for SupPos given bystyle parameter SupThisis

the only style parameter that not only dep ends on the style but also on the cramp edness

cr

The third number provides a lower b ound for the distance b etween the b ottom of the

sup erscript and the baseline of one fourth of the xHeight of the currentstyle

The denition of the two remaining functions SubAlonePos and SubWithSupPos is of

similar complexity and not presented here

Translating Formula Terms into Horizontal Lists

In the previous section we dened function NoadToHList which translates single noads

into horizontal lists Here wedeneMListToHListwhich handles complete mathemat

ical lists This involves applying NoadToHList to nearly all noads in the mathematical

list but there is much more than this

As in the original T X implementation we split the job of MListToHList in two

E

passes The rst pass translates formula terms typ e mlistinto intermediate terms

typ e ilist and the second pass pro ceeds by translating intermediate terms into b ox

terms The distribution of work among the two passes is slightly dierent from that in

T X The rst pass recursively handles subformulae and builds fractions scripts and

E

the like The second pass handles explicit spacing inserts implicit spacing eg around

binary op erators and inserts line break p oints by adding p enalties when requested Be

cause of this distribution the cramping information is only needed in the rst pass while

the information whether to add p enalties is needed in the second pass onlyThisisa

nice example of lo calizing information that is held in global variables in the original T X

E

implementation

val MListToIList style bool mlist ilist

val IListToHList style bool ilist hlist

AFunctional Description of T Xs Formula Layout

E

fun MListToHList st cr pen ml

let val il MListToIList st cr ml

val hl IListToHList st pen il

in hl end

First Pass Mathematical Lists into Intermediate Form

Function MListToIList applies NoadToHList to nearly all noads of the mathematical list

There are a few exceptions four kinds of noads whichwe did not yet present are handled

by MListToIList directly or by the second pass IListToHList The corresp onding cases

of data typ e noad are

MPen of penalty

MSpace of mathSpace

Style of style

Choice of style mlist

Cases MPen and MSpace provide explicit p enalties and spaces kern or glue resp ectively

A noad Style s results from an explicit style command suchastextstyle It has eect

from the place where it o ccurs until the end of the current subformula Noads Choice

f represent four way choices ie lists of four subformulae from which one is selected

according to the currentstyle

Intermediate lists ilist are lists of entities of typ e itemDatatyp e item contains

representations for explicit p enalties and spaces which are handled in the second pass

for style commands whichhavetobeknown in b oth passes and for the horizontal lists

which result from translating noads by function NoadToHList These lists must not yet

b e concatenated since they form units that will b e separated by implicit spaces in the

second pass With every list information ab out the kind of its origin is needed Hence

typ e item is dened as follows

datatype item

INoad of kind hlist

IPen of penalty

ISpace of mathSpace

IStyle of style

After dening the relevanttyp es we present the co de for function MListToIList

fun MListToIList st cr

fn

MPen p rest IPen p MListToIList st cr rest

MSpace s rest ISpace s MListToIList st cr rest

Style st rest IStyle st MListToIList st cr rest

Choice chfun rest MListToIList st cr chfun st rest

noad rest

INoad noadKind noad NoadToHList st cr noad MListToIList st cr rest

R Heckmann and R Wilhelm

Essentially it is one run through the argument list Penalty and space noads are copied

into the intermediate form Style commands take eect on the rest of the argument list

and are copied into the result to take eect again in the second pass Four waychoices

are replaced by the appropriate case Note that the result of a choice is not group ed

into a subformula but spliced into the argument list without grouping This is whyit

could not b e handled by NoadToHList Ordinary noads are translated into horizontal

lists which are still group ed together The kind of the original noad is kept in mind for

the intro duction of implicit spaces in the second pass Function noadKind is dened as

follows

val noadKind

fn MathChar k k

Overline Ord

Underline Ord

GenFraction Inner

Script nucleus nucKind nucleus

BigOp Op

several more cases

fun nucKind MathChar k k

nucKind ml Ord

The Second Pass Spacing and Penalties

In the second pass through a formula function

IListToHList style bool ilist hlist

translates an intermediate term into a single horizontal list It must insert p enalty no des

if the Bo olean argument is true handle explicit p enalties and spaces and intro duce

implicit spaces b etween adjacententities Implicit spaces are the horizontal spaces in

formulae suchasx y or x y in contrast to xy or y They are not sp ecied in the

formula input the four formulae ab ove are entered as xy xy xyandy resp ectively

The implicit space b etween twoentities basically dep ends on their kinds There is an

additional nonlo cal dep endency for binary op erator symb ols in x y as opp osed to y

the space b etween and y dep ends on the presence of xInany case weneedthe

kind of the previous entitywhenIListToHList pro cesses some entityintheintermediate

term Hence we dene

fun IListToHList st insertPenalty iList

let fun trans st prevKind fn see below

in trans st None iList end

Function trans runs through the iListkeeping in mind the kind prevKind of the

previous entity In the b eginning prevKind is None

Style commands explicit penalties and spaces The rst four cases of trans st prevKind are

AFunctional Description of T Xs Formula Layout

E

fn

IStyle st il trans st prevKind il

IPen pen il Penalty pen trans st prevKind il

ISpace sp il makeSpace st sp trans st prevKind il

Style commands inuence the style used for the rest of the input Explicit p enalties are

put into the result without anychange except for the constructor this is a must in a

functional language like SML Explicit spaces kern or glue are pro cessed by function

makeSpace style mathSpace hlist

and the resulting hlist which is an empty list or a singleton is spliced into the result

Note that explicit p enalties and spaces are transparent for kinds they have no kind and

do not inuence prevKind

An explicit space kern or glue in a formula may b e conditional or unconditional

Conditional spaces are suppressed in the styles S and SS while unconditional spaces

showupinallstyles The size of the spaces maybegiven in absolute units suchaspt

or in mathematical units The size of the latter dep ends on the style Hence function

makeSpace whose co de we do not present here must p erform two tasks the suppression

of conditional spaces in small styles and the conversion from mathematical units into an

absolute size

Hand ling translatednoads The last case of function trans handles already translated

noads and inserts implicit spaces b efore them and implicit p enalties b ehind them Re

memb er that the following co de is situated in a context where prevKind is the kind of the

previous entity st is the currentstyle and insertPenalty is a Bo olean telling whether

implicit p enalties are needed

INoad actKind hList il

let val newKind changeKind prevKind actKind il

val spaceList makeSpaceOpt st

mathSpacing prevKind newKind

val penaltyList mathPenalty insertPenalty newKind il

in spaceList hList penaltyList trans st newKind il end

In the next three we discuss the three let bindings in the co de ab ove The

chain of app end op erations in the last line is reasonably ecient since the rst three

op erands usually are extremely short lists

The kind of binary operators The call of function changeKind deals with the nonlo cal

dep endencies in the computation of implicit space If actKind is Bin and the currentitem

o ccurs in a nonbinary context then actKind is changed into newKind Ord otherwise

newKind equals actKind Nonbinary contexts can b e detected by insp ecting prevKind

and the kind of the rst INoaditem in restIn y for instance o ccurs in a non

binary context and is assigned kind Ord while its kind remains Bin in x y Thus

changeKind is dened as follows

val changeKind kind kind ilist kind

R Heckmann and R Wilhelm

fun changeKind prevKind Bin rest

if checkPrev prevKind orelse checkNext listKind rest

then Ord else Bin

changeKind k k

Functions checkPrev and checkNext check whether their argument is in a certain list of

kinds

val checkPrev contains Bin Op Rel Open Punct None

val checkNext contains Rel Close Punct None

Function listKind returns the kind of the rst INoad item in its argument list

val rec listKind

fn None

INoad k k

t listKind t

Implicit spacing The implicit space b etween two adjacent INoad items of given kind

is computed by function

mathSpacing kind kind mathSpace option

This function implements the table in Knuth a page that sp ecies the amount

of space to b e inserted b etween two adjacent items of given kind Some combinations of

kinds are not separated by additional space To handle this case eciently the result

typ e of function mathSpacing is mathSpace option

Within function trans the result of mathSpacing is further pro cessed in the same

manner as explicit spaces using function makeSpace

fun makeSpaceOpt st fn NONE

SOME sp makeSpace st sp

Implicit penalties Inline formulae contain implicit p enalties allowing for line breaking

They are computed by function

mathPenalty bool kind ilist hlist

the result of which is either an empty list or a singleton containing a p enalty no de If the

Bo olean argument is false there are no p enalties at all Otherwise p enalties are added

b ehind items of kind Bin or Rel but only if the following ilist is not empty and do es

not start with a relation op erator or an explicit p enalty item

Conclusion

Our formula layout system has b een implemented in the language Standard ML of New

Jerseyversion This implementation is far more comprehensive than the algorithms

describ ed in this pap er The core system ie function MListToHList and all its auxiliary

functions completely covers the typ esetting of all kinds of formulae except for math

AFunctional Description of T Xs Formula Layout

E

p p

accents egx ro ots eg the construction of big delimiters from small

pieces and ordinary text o ccurring in formulae The addition of these features is planned

for the future

Besides the core system wehave implemented the prepro cessing phase whichprovides

style parameters and font information suchascharacter dimensions There are also func

tions which translate b ox terms into dvi co de and construct a dvi le of correct global

format Thus the results of our algorithm can b e considered on the screen or printed

The system is still evolving The currentversion is available in directory formulae of

the ftp server ftpcsunisbde of the University of the Saarland

The motivation for this reimplementation eort originated from an attempt to under

stand and teachT Xs formula layout algorithm The primary source the T Xbook Knuth

E E

a did not provide sucient clues ab out the metho d despite long and desp erate at

tempts to understand it Its description reects the structure of the program with very

complex control ow and destructive manipulations of global ill designed data struc

tures This data structure the internal representation of mathematical formulae com

bines several orthogonal prop erties in an unintelligible way using some elds in parallel

and consecutively for dierent purp oses It o ccurred that a formulation in a functional

language claried the metho d and enabled us to eectively teach ab out this sub ject

The resulting SML program will b e extended to cover the full set of formulae Due to

the mo dular structure it can b e included in new contexts in need of a mo dule for formula

layout

Some more sp ecic prop erties of our implementation are as follows

Dening an adequate data typ e of formula terms separates concerns ie spacing

asp ects from structure asp ects This is of great help for a b etter understanding of

the algorithm

Though not optimal the data structure of b oxes and no des represents a reasonable

compromise b etween the original T Xtyp es and the needs of a functional language

E

Using a functional description language forced us to transform the up datable global

variablesofKnuths description into explicit function parameters On the one hand

this adds complexity to the description but on the other hand the owofinfor

mation b ecomes visible it can b e seen where information comes from where it is

up dated and where it is used Thus it b ecomes apparent which subtasks dep end

on others and which are indep endentfromeach other

The functional programming style discourages changing or removing entities that

have already b een constructed This in particular concerns the white space carrying

the italic correction whichisnotbuiltby our system b efore it is denitely known

that it is required

Some constructs eg limits in limit p osition were not treated here for space rea

sons They do not oer fundamentally new problems and are included in the actual

implementation

Some p ostpro cessing parts of the algorithm lo ok somewhat imp erative These

are those where some subformulae are p ositioned indep endently of each other

only to detect afterwards that certain minimal distances b etween them are not

satised see for instance function distances used for fractions and function

R Heckmann and R Wilhelm

SupSubDistances for sup erscripts and subscripts o ccurring together It would

b e nice to have a declarative manner for stating such constraints with auto

matic means for building a formula from its subformulae or more generallya

twodimensional diagram from its comp onents

Acknowledgements

We appreciated the detailed and constructive reviews wereceived These havecontributed

to a greatly improved presentation Our thanks go to the reviewers

References

Knuth D E a The T Xb o ok Addison Wesley

E

Knuth D E b T X The Program Addison Wesley

E

Paulson L C ML for the Working Cambridge University Press

Wilhelm R and Heckmann R Grund lagen der Dokumentenverarbeitung Addison Wesley