A New Approach to Document Formatting Jeffrey H. Kingston Basser Department of Computer Science University of Sydney 2006 Austra
Total Page:16
File Type:pdf, Size:1020Kb
A New Approach to Document Formatting Jeffrey H. Kingston Basser Department of Computer Science University of Sydney 2006 Australia ABSTRACT This paper describes a new approach to document formatting, in which features are written in a small, coherent, high-level language called Lout. The resulting increase in productivity has permitted many advanced features to be developed quickly and accurately, including page layout of unprecedented flexibility, equation formatting, automatically generated tables of contents, running page headers and footers, cross references, sorted indexes, and access to bibliographic databases. A fully operational production implementation of the Lout system including all these features and many others is freely available. 22 December, 1992 A New Approach to Document Formatting Jeffrey H. Kingston Basser Department of Computer Science University of Sydney 2006 Australia 1. Introduction mented. The personal computer and the laser A second solution is to provide a printer have sparked a revolution in the relatively small system equipped with a production of documents. Many authors now means of defining new features, as in routinely take their work from conception programming languages. This approach to camera-ready copy, many publishers are has been taken by the batch formatters using desktop publishing systems, and it is (those which do not display a continuously probable that manual assembly of documents updated image of the printed document while will become uncommon in the near future. editing) found in academia, notably troff [11], T X [10], and Scribe [12]. Features As control moves into the hands of E such as footnotes and automatic tables of an ever-increasing number of non-technical contents have been added to these systems people, the stresses on document formatting using macro definitions. Unfortunately, such software increase. On the one hand, this soft- extensions are very difficult and error-prone ware must be so simple that anyone can use in practice: T X's footnote macro alone it; on the other, it must supply a bewildering E contains half a page of dense, obscure code, array of features. A book, for example, de- while those who have extended troff have mands fonts, paragraph and page breaking, abandoned the language itself and taken floating figures and tables, footnotes, chap- refuge in preprocessors. A more productive ters and sections, running page headers and basis for developing new features is needed. footers, an automatically generated table of contents, and a sorted index. Add to this an This article presents a high-level open-ended list of specialized features, be- language for document formatting, called ginning with mathematical typesetting, dia- Lout, which is intended to form such a grams, and access to bibliographic databas- basis. Lout is quite accessible to non-expert es, and the result is a nightmare for the soft- users, but its unique property is the ease with ware developer. which expert users can add new features. We begin with a presentation of Lout as it One solution to this feature explosion appears to the non-expert user who employs problem is to implement as a system primi- the standard packages without understanding tive every feature that will ever be required. Lout's principles. Later sections switch to Although all of the successful interactive the expert's view, showing by examples the document editors known to the author take principles of Lout and how advanced features this approach (admittedly with some attempt are defined. to generalize and unify their features), it has 1 clearly reached its limit. Few such systems A Unix-compatible batch formatter for provide equation formatting, fewer still will format a Pascal program, and other special- 1 ized features will simply never be imple- Unix is a trademark of AT&T Bell Laborato- ries. - 2 - Lout (called Basser Lout) has been written headers and footers, access to bibliographic which produces PostScript1 output suitable databases, and a sorted index, among many for printing on most laser printers and many other features. Specialized packages exist other devices. A library of standard packages for mathematical typesetting, drawing written in Lout provides all of the features figures, and formatting Pascal programs. listed above and many others. This system The advanced features maintain the is not an experimental prototype, it is a fully simple style established above. To produce operational production implementation. The a footnote, for example, one simply types software and its supporting documentation [3, 9, 4, 5, 6, 7, 8] are available free of charge @FootNote { ... } from the author. at the appropriate point, and it will be numbered and placed at the bottom of the 2. The non-expert's view page; to add an item to the index, The non-expert user perceives Lout expert @Index { Expert user } as text interspersed with special symbols, in a style reminiscent of many other batch is typed, and the right parameter will appear formatters: in the index, with a page number, at a place determined by the alphabetical ranking of the @Doc @Text @Begin left parameter. No technical knowledge is @Heading { Standard Integrals } required to use these features. @PP The following list of standard integrals should be memorized: 3. Objects @NumberList @Item @Eq {int e sup x dx = e sup x} To the expert user, Lout is a high- @Item @Eq {int dx over level functional language with a relative- sqrt { 1 - x sup 2 } = arc sin x} ly small repertoire of primitive features or- @EndList ganized around four key concepts: objects, @End @Text definitions, cross references, and galleys. An object is a rectangle with at least one horizon- Braces are used for grouping parameters tal and one vertical mark protruding from it. to the features. The symbols are all For example, taken from two of the standard packages: DocumentLayout, which provides headings, Australia paragraphs, lists, footnotes, sections, and so is an object which is viewed by Lout like on, and Eq, which provides mathematical this: typesetting in a style copied from the eqn language of Kernighan and Cherry [2]. Australia At the time of writing, packages exist Horizontal and vertical concatenation for formatting general documents, technical operators, denoted by the symbols | and /, are reports, and books, the latter providing an used to assemble larger objects: automatic table of contents, running page USA |0.2i Australia is the object 1PostScript is a trademark of Adobe Systems, Incorporated. - 3 - USA Australia The length will be widened if necessary to prevent the parameters from overlapping, The parameters are separated by the length thus implementing the baseline-to-baseline given after the concatenation symbol (0.2 spacing used between lines of text. Other inches in this example), and their horizontal modes provide tabulation from the left marks are aligned. margin, overstriking, and hyphenation. Tables are made by combining the The final appearance of an object is two operators, with | having the higher affected by a limited amount of information precedence: inherited from the context, principally the USA |0.2i Australia font and the width available for the object to /0.1i Washington | Canberra occupy. There are operators for setting these attributes: is the object Slope @Font { USA Australia Hello, world Washington Canberra } The second horizontal concatenation sym- produces bol needs no length, since the first one de- Hello, world termines the separation between the two columns created by the alignment of the ver- and in a similar way tical marks. Objects of arbitrary complex- ity may be assembled using these and other 1.5i @Wide { operators, and braces used for grouping, in a (1) |0.1i A small manner analogous to the assembly of expres- indented paragraph sions in programming languages. of text. } The lengths attached to concatenation symbols have features which permit objects produces to be positioned very precisely. In addition to the usual units of measurement (inches, (1) A small indented centimetres, points, and ems), lengths may paragraph of text. be measured in units of the current font size, space width, inter-line space, and available with the paragraph inheriting and being width (for centering and right justification). broken to an available width of 1.4 inches minus the width of (1). This size inheritance There are also six gap modes, which remains secure through all the complexities determine where the lengths are measured of gap modes, mark alignment, the @Wide from. Previous examples have used edge- and other operators, and so on, providing a to-edge mode: high-level service comparable in value with strong typing in programming languages. Lout also provides a mark-to-mark mode, 4. Definitions obtained by appending x to the length: Lout permits the user to define operators which take objects as parameters and return objects as results. This feature, - 4 - unremarkable in itself, has some surprising ators combine together. All are Lout prim- applications, most notably to a problem itives except @HLine, which calls upon which is the litmus test of flexibility in Lout's graphics primitives to draw a horizon- document formatting: the specification of tal line. The // and ^// operators are variants page layout. of vertical concatenation which omit mark The use of operators in document alignment; the separation is 0.2 times the cur- formatting seems to have been pioneered rent font size. The two |0.5rt operators cen- by the eqn equation formatting language ter each parameter in the column. Finally, of Kernighan and Cherry [2]. In eqn, the the @OneRow and ^// operators work togeth- mathematical formula er to ensure that only one horizontal mark protrudes, rather than three; the result has the x2 + 1 structure 4 x2 + 1 for example is expressed as 4 { x sup 2 + 1 } over 4 and so will be aligned correctly with adjacent parts of the equation. This identical expression is also used in Lout, As is usual in functional languages, but in Lout the operators (sup, over and so sequences are obtained by recursive on) have visible definitions which are easy to definitions.