A New Approach to Document Formatting

Jeffrey H. Kingston

Basser Department of Computer Science University of Sydney 2006 Australia

ABSTRACT

This paper describes a new approach to document formatting, in which features are written in a small, coherent, high-level language called Lout. The resulting increase in productivity has permitted many advanced features to be developed quickly and accurately, including page layout of unprecedented flexibility, equation formatting, automatically generated tables of contents, running page headers and footers, cross references, sorted indexes, and access to bibliographic databases. A fully operational production implementation of the Lout system including all these features and many others is freely available.

22 December, 1992


1. Introduction

The personal computer and the laser printer have sparked a revolution in the production of documents. Many authors now routinely take their work from conception to camera-ready copy, many publishers are using desktop publishing systems, and it is probable that manual assembly of documents will become uncommon in the near future.

As control moves into the hands of an ever-increasing number of non-technical people, the stresses on document formatting software increase. On the one hand, this software must be so simple that anyone can use it; on the other, it must supply a bewildering array of features. A book, for example, demands fonts, paragraph and page breaking, floating figures and tables, footnotes, chapters and sections, running page headers and footers, an automatically generated table of contents, and a sorted index. Add to this an open-ended list of specialized features, beginning with mathematical typesetting, diagrams, and access to bibliographic databases, and the result is a nightmare for the software developer.

One solution to this feature explosion problem is to implement as a system primitive every feature that will ever be required. Although all of the successful interactive document editors known to the author take this approach (admittedly with some attempt to generalize and unify their features), it has clearly reached its limit. Few such systems provide equation formatting, fewer still will format a Pascal program, and other specialized features will simply never be implemented.

A second solution is to provide a relatively small system equipped with a means of defining new features, as in programming languages. This approach has been taken by the batch formatters (those which do not display a continuously updated image of the printed document while editing) found in academia, notably troff [11], TeX [10], and Scribe [12]. Features such as footnotes and automatic tables of contents have been added to these systems using macro definitions. Unfortunately, such extensions are very difficult and error-prone in practice: TeX's footnote macro alone contains half a page of dense, obscure code, while those who have extended troff have abandoned the language itself and taken refuge in preprocessors. A more productive basis for developing new features is needed.

This article presents a high-level language for document formatting, called Lout, which is intended to form such a basis. Lout is quite accessible to non-expert users, but its unique property is the ease with which expert users can add new features. We begin with a presentation of Lout as it appears to the non-expert user who employs the standard packages without understanding Lout's principles. Later sections switch to the expert's view, showing by examples the principles of Lout and how advanced features are defined.

A Unix-compatible batch formatter for Lout (called Basser Lout) has been written which produces PostScript output suitable for printing on most laser printers and many other devices. A library of standard packages written in Lout provides all of the features listed above and many others. This system is not an experimental prototype, it is a fully operational production implementation. The software and its supporting documentation [3, 9, 4, 5, 6, 7, 8] are available free of charge from the author.

(Unix is a trademark of AT&T Bell Laboratories. PostScript is a trademark of Adobe Systems, Incorporated.)

2. The non-expert's view

The non-expert user perceives Lout as text interspersed with special symbols, in a style reminiscent of many other batch formatters:

    @Doc @Text @Begin
    @Heading { Standard Integrals }
    @PP
    The following list of standard integrals
    should be memorized:
    @NumberList
    @Item @Eq {int e sup x dx = e sup x}
    @Item @Eq {int dx over
    sqrt { 1 - x sup 2 } = arc sin x}
    @EndList
    @End @Text

Braces are used for grouping parameters to the features. The symbols are all taken from two of the standard packages: DocumentLayout, which provides headings, paragraphs, lists, footnotes, sections, and so on, and Eq, which provides mathematical typesetting in a style copied from the eqn language of Kernighan and Cherry [2].

At the time of writing, packages exist for formatting general documents, technical reports, and books, the latter providing an automatic table of contents, running page headers and footers, access to bibliographic databases, and a sorted index, among many other features. Specialized packages exist for mathematical typesetting, drawing figures, and formatting Pascal programs.

The advanced features maintain the simple style established above. To produce a footnote, for example, one simply types

    @FootNote { ... }

at the appropriate point, and it will be numbered and placed at the bottom of the page; to add an item to the index,

    expert @Index { Expert user }

is typed, and the right parameter will appear in the index, with a page number, at a place determined by the alphabetical ranking of the left parameter. No technical knowledge is required to use these features.

3. Objects

To the expert user, Lout is a high-level functional language with a relatively small repertoire of primitive features organized around four key concepts: objects, definitions, cross references, and galleys. An object is a rectangle with at least one horizontal and one vertical mark protruding from it. For example,

    Australia

is an object which is viewed by Lout like this:

    Australia

Horizontal and vertical concatenation operators, denoted by the symbols | and /, are used to assemble larger objects:

    USA |0.2i Australia

is the object

    USA   Australia

The parameters are separated by the length given after the concatenation symbol (0.2 inches in this example), and their horizontal marks are aligned.

Tables are made by combining the two operators, with | having the higher precedence:

    USA              |0.2i Australia
    /0.1i Washington |     Canberra

is the object

    USA          Australia
    Washington   Canberra

The second horizontal concatenation symbol needs no length, since the first one determines the separation between the two columns created by the alignment of the vertical marks. Objects of arbitrary complexity may be assembled using these and other operators, and braces used for grouping, in a manner analogous to the assembly of expressions in programming languages.

The lengths attached to concatenation symbols have features which permit objects to be positioned very precisely. In addition to the usual units of measurement (inches, centimetres, points, and ems), lengths may be measured in units of the current font size, space width, inter-line space, and available width (for centering and right justification).

There are also six gap modes, which determine where the lengths are measured from. Previous examples have used edge-to-edge mode. Lout also provides a mark-to-mark mode, obtained by appending x to the length. The length will be widened if necessary to prevent the parameters from overlapping, thus implementing the baseline-to-baseline spacing used between lines of text. Other modes provide tabulation from the left margin, overstriking, and hyphenation.

The final appearance of an object is affected by a limited amount of information inherited from the context, principally the font and the width available for the object to occupy. There are operators for setting these attributes:

    Slope @Font { Hello, world }

produces

    Hello, world

(set in a slanted face) and in a similar way

    1.5i @Wide {
    (1) |0.1i A small
    indented paragraph
    of text.
    }

produces

    (1)  A small indented
         paragraph of text.

with the paragraph inheriting and being broken to an available width of 1.4 inches minus the width of (1). This size inheritance remains secure through all the complexities of gap modes, mark alignment, the @Wide and other operators, and so on, providing a high-level service comparable in value with strong typing in programming languages.

4. Definitions

Lout permits the user to define operators which take objects as parameters and return objects as results. This feature, unremarkable in itself, has some surprising applications, most notably to a problem which is the litmus test of flexibility in document formatting: the specification of page layout.

The use of operators in document formatting seems to have been pioneered by the eqn equation formatting language of Kernighan and Cherry [2]. In eqn, the mathematical formula

    $$ \frac{x^{2} + 1}{4} $$

for example is expressed as

    { x sup 2 + 1 } over 4

This identical expression is also used in Lout, but in Lout the operators (sup, over and so on) have visible definitions which are easy to modify and extend.

For example, here is the definition of the over operator as it appears in the Eq equation formatting package [6]:

    def over
        precedence 54
        associativity left
        left x
        right y
    {
        @OneRow @OneCol
        {
            |0.5rt x
            ^//0.2f @HLine
            //0.2f |0.5rt y
        }
    }

Invocations of over return the object between the braces, with the formal parameters x and y replaced by actual parameters which are objects found to the left and right of the over symbol.

The body of over makes a good demonstration of the way in which Lout's operators combine together. All are Lout primitives except @HLine, which calls upon Lout's graphics primitives to draw a horizontal line. The // and ^// operators are variants of vertical concatenation which omit mark alignment; the separation is 0.2 times the current font size. The two |0.5rt operators center each parameter in the column. Finally, the @OneRow and ^// operators work together to ensure that only one horizontal mark protrudes, rather than three; the result has the structure

    $$ \frac{x^{2} + 1}{4} $$

(with a single horizontal mark, at the level of the fraction line) and so will be aligned correctly with adjacent parts of the equation.

As is usual in functional languages, sequences are obtained by recursive definitions. For example, the 'leaders' often seen in tables of contents can be generated by the definition

    def @Leaders { .. @Leaders }

White space after { and before } is not significant. The recursion stops when space runs out, so

    1.5i @Wide {
    Chapter 1 @Leaders 5
    }

has result

    Chapter 1 ...... 5

The final invocation of @Leaders is deleted along with any preceding concatenation operator (or white space in this case).

The specification of page layout is a major problem for many document formatters, because the model of a document as a sequence of pages is built-in, and an armada of tedious commands is required to communicate with this model: commands for setting page width and height, margins, columns, page header and footer lines, and so on. Even with all these commands, the formatter will be unable to do such a simple thing as draw a box around each page, if there is no command for it.

Lout has no built-in model and no such commands. Instead, a page is an object like any other:

    def @Page
        right x
    {
        8i @Wide 11i @High
        {
            //1i ||1i x ||1i
            //1i
        }
    }

The result of @Page is an eight by eleven inch object containing the right parameter within one inch margins. A document is a vertical concatenation of numbered pages:

    def @PageList
        right @PageNum
    {
        @Page
        {
            |0.5rt - @PageNum -
            //0.2i @TextPlace
            //1rt @FootSect
        }
        //0i @PageList
            @Next @PageNum
    }

The @Next operator is a Lout primitive that returns its right parameter plus one; all automatic numbering is effected by combining this operator with recursion. The result of @PageList 1 is the object

    - 1 -
    @TextPlace
    @FootSect

    - 2 -
    @TextPlace
    @FootSect

    @PageList 3

which has the potential to expand to infinitely many pages.

We conclude this example by defining @FootSect to be a small horizontal line above a list of @FootPlace symbols:

    def @FootList
    {
        @FootPlace
        //0.1i @FootList
    }

    def @FootSect
    {
        1i @Wide @HLine
        //0.1i @FootList
    }

This method of specifying page layout is infinitely more flexible than the other, since the full resources of the language may be brought to bear. Of course, the @PageList, @FootSect, and @FootList symbols must be expanded only on demand, and we have yet to see how the @TextPlace and @FootPlace symbols can be replaced by actual text and footnotes.
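The on-demand expansion of the recursive @PageList definition can be modelled outside Lout with a lazy generator. The following Python sketch is only an analogy, not part of Lout or Basser Lout: the names page and page_list are hypothetical, pages are represented as plain strings, and margins and sizes are ignored. It shows how a definition that is infinite in principle yields only as many pages as a consumer asks for.

```python
from itertools import islice
from typing import Iterator


def page(body: str) -> str:
    """Hypothetical stand-in for @Page: wrap a body in a page (margins ignored)."""
    return f"[page: {body}]"


def page_list(page_num: int) -> Iterator[str]:
    """Hypothetical stand-in for @PageList: one numbered page, then the rest."""
    yield page(f"- {page_num} - @TextPlace @FootSect")
    # Plays the role of "@PageList @Next @PageNum". The generator suspends
    # before this line until another page is demanded, so the unbounded
    # recursion is never evaluated eagerly.
    yield from page_list(page_num + 1)


# Demand exactly two pages; page 3 and beyond are never constructed.
first_two = list(islice(page_list(1), 2))
print(first_two)
```

Each demanded page adds one level of nested generator, so this model would reach Python's recursion limit on very long documents; it is meant to illustrate the idea of expansion on demand, not to suggest how the formatter is implemented.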

5. Galleys

The fundamental problem with inserting text, footnotes, and floating figures into pages is that the process seems impossible to describe in functional terms. A footnote is entered within a paragraph of text, but it appears somewhere else: at the foot of a page. Some new abstraction is needed to explain this.

The landscape of features that previous document formatting systems have introduced at this point can best be described metaphorically, as an antediluvian swamp populated by dinosaurs and demons, whose air is filled with the piteous cries of document format designers in torment.

Lout's solution to this problem is a feature called the galley, after the metal trays used in manual typesetting. A galley consists of an object plus an indication that the object is to appear somewhere other than its invocation point. For example,

    - 1 -
    Galleys
    The fundamental problem with inserting text, footnotes,
    and floating figures into pages is that the process seems
    impossible to describe in functional terms. A footnote is
    entered within a para-

    - 2 -
    graph of text, but it appears somewhere else: at the foot
    of a page. Some new abstraction is needed to explain this.

is the result of the expression

    @PageList 1
    //
    @Text {
    @Heading { Galleys }
    @PP
    The fundamental ...
    ... to explain this.
    }

The only new definitions required are these:

    def @TextPlace { @Galley }

    def @Text
        into { @TextPlace&&preceding }
        right x
    { x }

They say that @TextPlace (which the reader will recall as lying within the pages of @PageList) is a placeholder for an incoming galley, and that @Text is a galley whose result is to appear, not where @Text is invoked, but rather at some @TextPlace preceding that point of invocation in the final printed document.

Although the abstraction can be understood in a static way, it is helpful to trace what happens when the Basser Lout batch formatter reads the expression above.

Since @PageList 1 indirectly contains the special @Galley symbol, it will be expanded only upon demand. The discovery of the @Text galley initiates a search for a @TextPlace, which is found within @PageList 1 and so forces one such expansion. The available width at this @TextPlace is six inches, so the @Text galley is broken into six-inch components which are promoted one by one until the available height (nine inches) is exhausted. Since the @Text galley is not entirely consumed, a forward search of the document is made, another @TextPlace is found within the as yet unexpanded @PageList 2, and the process is repeated.
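The promotion of galley components into successive @TextPlace targets, as traced above, can be sketched as a simple loop. This Python model is an illustrative assumption, not the Basser Lout algorithm: the function name promote is hypothetical, components are reduced to their heights in inches, and gaps between components, footnotes, and the breaking of the galley to the available width are all ignored.

```python
from typing import Iterable, Iterator, List


def promote(components: Iterable[float],
            available_height: float) -> Iterator[List[float]]:
    """Yield, target by target, the component heights promoted into each one."""
    current: List[float] = []
    used = 0.0
    for height in components:
        if height > available_height:
            raise ValueError("component is taller than the available height")
        if used + height > available_height:
            # Available height exhausted: this target is full, so a new
            # target (the next page's @TextPlace) is sought.
            yield current
            current, used = [], 0.0
        current.append(height)
        used += height
    if current:
        yield current


# Ten components, each two inches high, promoted into nine-inch targets.
pages = list(promote([2.0] * 10, available_height=9.0))
print([len(p) for p in pages])
```

In this model a fifth two-inch component would not fit in the nine inches available, so it becomes the first component on the following page.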

The treatment of footnotes is the same, except that

    def @FootNote
        into { @FootPlace&&following }

is used to make the footnote appear later in the finished document than its invocation point. Basser Lout suspends the promotion of text into pages just after the component containing the footnote's invocation point is promoted, switches to the placement of the footnote galley, then resumes the body text.

A collection of galleys all targeted to the same place may optionally appear sorted on a designated key, thus implementing sorted reference lists and indexes. The author was obliged to make the sorting option a primitive feature, since it otherwise seems to require boolean operators which he preferred to exclude.

The @PageList object which receives the @Text galley can itself be viewed as a galley whose components are pages, and this leads to a dynamic view of Lout document assembly as a tree of galleys, each promoting into its parent, with the root galley promoting into the output file. For example, the BookLayout package [4] has @Section galleys promoting into @Chapter galleys promoting into a single @PageList galley, which promotes into the output; no limit is imposed on the height of the tree.

6. Cross references

The terms @TextPlace&&preceding and @FootPlace&&following used above can be thought of as arrows in the final printed document, pointing from themselves to the place they name. Expressed in this way, free of any reference to the internal action of the document formatter, they are easy to comprehend and work with. These arrows are called cross references in Lout.

A galley is transported forwards along its arrow, but it turns out that a reverse flow of information can also be very useful. For example, large documents often have cross references such as 'see Table 6 on page 57.' If the numbers are replaced by arrows pointing to the table in question, it should be possible to have their values filled in automatically (an idea introduced by Scribe [12]). An arrow pointing outside the document could retrieve an entry from a database of references, Roman numerals, etc. And a running page header like 'Chapter 8: Sorting' might obtain its value from an arrow pointing from the page header line down into the body text of the page, where the current chapter is known.

All these ideas are realized in Lout, but here we will just sketch a simplified version of the running page header definitions found in the BookLayout package [4]. A symbol called @Runner is first defined:

    def @Runner
        right @Val
    {}

@Runner produces nothing at all, which means that we may place the invocation

    @Runner { Chapter 8: Sorting }

at the end of a chapter without harm. This invisible invocation will be carried along with the chapter and will end up on some page of the final printed document.

By modifying the definition of @PageList, we can add to each page a header line containing the expression

    @Runner&&following @Open { @Val }

This means 'find the nearest following invocation of @Runner in the final printed document and retrieve its @Val parameter.' Every page of Chapter 8 will find the correct running header, since @Runner was placed at the end of the chapter. The invocation @Runner {} placed at the beginning of the chapter will suppress the header on the first page of the chapter, as it is conventional to do.

These invocations of @Runner are hidden from the non-expert user within the definition of the @Chapter operator. The result is a reliable implementation of a notoriously difficult feature.

7. Conclusion

The Lout document formatting system permits features as diverse as page layout and equation formatting to be implemented by definitions written in a high-level language. The consequent improvement in productivity has allowed an unprecedented repertoire of advanced features to be presented to the non-expert user.

To future research in document formatting, Lout offers evidence of the utility of the functional paradigm, as well as two new abstractions: galleys and cross references. These provide a secure foundation for features which have proven very difficult to implement in the past.

A number of improvements to the current system can be envisaged. Better paragraph and page breaking algorithms could be added to the formatter without any change to the language; non-rectangular objects would be useful in some places. Perhaps the most useful improvement would be the representation of paragraphs as horizontal galleys, since this would allow the full power of the language to be brought to bear on paragraph layout, in contrast to the present built-in system which offers only the traditional styles (ragged right, justified, and so on).

The author of a recent interactive document editor [1] has recommended that the interface be supported by a functional base language, accessible to the expert user, for such purposes as page layout definition and fine control over formatting. Lout appears to be an excellent candidate for such a language, because of its small size, precision, and functional semantics.

References

1. Brooks, Kenneth P., Lilac: a two-view document editor. IEEE Computer, 7–19 (1991).

2. Kernighan, Brian W. and Cherry, Lorinda L., A system for typesetting mathematics. Communications of the ACM 18, 182–193 (1975).

3. Kingston, Jeffrey H., Document Formatting with Lout (Second Edition). Tech. Rep. 449 (1992), Basser Department of Computer Science F09, University of Sydney 2006, Australia.

4. Kingston, Jeffrey H., A beginners' guide to Lout. Tech. Rep. 450 (1992), Basser Department of Computer Science F09, University of Sydney 2006, Australia.

5. Kingston, Jeffrey H., The design and implementation of the Lout document formatting language. Tech. Rep. 442 (1992), Basser Department of Computer Science F09, University of Sydney 2006, Australia. To appear in Software: Practice and Experience.

6. Kingston, Jeffrey H., Eq – a Lout package for typesetting mathematics. Tech. Rep. 452 (1992), Basser Department of Computer Science F09, University of Sydney 2006, Australia. Contains an appendix describing the Pas Pascal formatter.

7. Kingston, Jeffrey H., Fig – a Lout package for drawing figures. Tech. Rep. 453 (1992), Basser Department of Computer Science F09, University of Sydney 2006, Australia.

8. Kingston, Jeffrey H., Tab – a Lout package for formatting tables. Tech. Rep. 451 (1992), Basser Department of Computer Science F09, University of Sydney 2006, Australia.

9. Kingston, Jeffrey H., The Basser Lout Document Formatter, Version 2.05, 1993. Publicly available in the jeff subdirectory of the home directory of ftp to host ftp.cs.su.oz.au with login name anonymous or ftp and any non-empty password (e.g. none). Lout distributions are also available from the comp.sources.misc newsgroup. All enquiries to [email protected].

10. Knuth, Donald E., The TeXbook. Addison-Wesley, 1984.

11. Ossanna, Joseph F., Nroff/Troff User's Manual. Tech. Rep. 54 (1976), Bell Laboratories, Murray Hill, NJ 07974.

12. Reid, Brian K., A High-Level Approach to Computer Document Production. Proceedings of the 7th Symposium on the Principles of Programming Languages (POPL), Las Vegas NV, 1980, 24–31.