A NT A TY P E S E TT I N G S Y S TE M
“ant is not TEX.”
Achim Blumensath
December 16, 2007
Contents
1. Invoking ant ...... 3
2. e markup language ...... 4
3. AL – the ant language ...... 4 3.1. Lexical conventions ...... 5 3.2. Literals ...... 6 3.3. Expressions ...... 7 3.4. Statements ...... 9 3.5. Patterns ...... 9 3.6. Declarations ...... 10 3.7. Built in AL-commands ...... 10 4. Typesetting commands ...... 15 4.1. Program control ...... 15 4.2. Page layout ...... 16 4.3. Galleys and paragraphs ...... 19 4.4. Boxes ...... 20
— 1 — 4.5. Parameters ...... 21 4.6. Fonts ...... 25 4.7. Tables ...... 28 4.8. Colour and graphics ...... 29 4.9. Mathematics ...... 30 4.10. Macros and environments ...... 32 4.11. Counters and references ...... 33 4.12. Parsing ...... 35 4.13. Nodes ...... 36 4.14. Environment commands ...... 36 4.15. Dimensions ...... 37 5. Overview over the source code ...... 38 5.1. e runtime library ...... 38 5.2. e typesetting library ...... 40 5.3. e layout engine ...... 40 5.4. e parser ...... 42 5.5. e virtual machine ...... 43
— 2 — ant 0.8 1. Invoking ant
is a typesetting system inspired by TEX. Although TEX does a very good job when typesetting mathematical articles and books – the task it has been designed for – it can become very difficult, cumbersome, or even impossible to meet the typographical requirements of texts outside this narrow scope. For instance, the current dra of the new output routine for LATEX consists of more than 100 pages. Unfortunately, it is also very difficult to extend the functionality of TEX since its source code is a total mess. Even aer 20 years there are only versions with minor modifications available. For these reasons I decided to rewrite from scratch aiming for a simple, clean, and modular design. In particular it is easily possible to replace parts of with other implementations, say, adding an parser, output routines for files, or a different page layout algorithm. e current version of implements all the major features of TEX but a lot of minor things are still missing. In addition, provides several improvements over TEX: a saner macro language (no catcodes); a builtin high-level scripting language; support; support for various font formats including Type1, TrueType, and Open- Type; partial support for advanced OpenType features; support for colour and graphics; simple page layout specifications; river detection.
1. Invoking ant
translates its input file into a - or -file. Rudimentary support for Post- Script and output is also implemented. e program is invoked as ant [options] input file
Currently, the following options are supported: --help prints a help message. --format= format selects the output format. Supported are dvi, xdvi, pdf (default), ps, and svg. e latter two are only partially implemented.
— 3 — 3. AL – the ant language ant 0.8
--src-specials enables the generation of source specials. --debug= flags enables debug messages. flags is a combination of the fol- lowing letters: a AL-commands l line breaking b AL-bytecode m macro expansion e the typesetting engine p page layout g galley breaking s various stacks i the current input
2. The markup language
e markup language of is quite similar to the syntax of TEX. In particular, the rules regarding tokens and braces are the same. One notable exception is that stores macros as plain strings without breaking them into tokens. is solves all those issues TEX has with its catcodes. But beware that, if you define the macro \definecommand\foo[m]{\bar#1} then \foo{text} will expand to \bartext. Another difference between and TEX is that one can use arithmetic ex- pressions to specify skips or dimensions: \hskip{2em + (4/7)*5cm}
3. AL – the ant language
e requirements on a markup language for authors are quite different from those on a programming language for implementing these markup commands. For instance, the TEX macro language serves rather well as a markup language but it is quite unsuited for the implementation of packages. Besides the markup language therefore also provides a scripting language called AL. Syntactic- ally AL resembles (a subset of) the Haskell programming language. But there are two notable semantic differences: (i) evaluation in AL is strict, not lazy and (ii) AL includes a solver for linear equations and, therefore, supports variables whose value is not yet determined.
— 4 — ant 0.8 3. AL – the ant language
3.1. Lexical conventions
We distinguish six classes of characters according to their category:
White space (ws): Mn, Mc, Me, Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn Lowercase letters (lc): Ll, Lm, Lo Uppercase letters (uc): Lu, Lt Digits (dd): Nd, Nl, No Symbols (sy): Pc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So Special characters (sp): " ' , ; _ ( ) [ ] { } Comments. A line comment starts with ;; and extends to the end of the line, and block comments are delimited by [; and ;]. Identifiers. ere are three types of identifiers: lowercase and symbolic identi- fiers represent variables while uppercase identifiers are used for symbols. lid lc lc uc tail uid uc lc uc tail tail _ lc uc _ dd _ sy symbol sy Examples: lowercase_Identifier_2 op_+
Uppercase_12_*&/_45
<<|: ** +/- e following symbols and keywords are reserved: begin local declare_infix_left do match declare_infix_non else then declare_infix_right elseif where declare_prefix end with declare_postfix if = := | . :
— 5 — 3. AL – the ant language ant 0.8
3.2. Literals
Numbers. Numerical constants can be written either using decimal notation or as fraction. Supported bases are 2, 8, 10, and 16. A sequence of digits may be interleaved with arbitrary many underscores _. number decimal fraction fraction natural / natural natural 0b bin 0o oct dec 0x hex decimal 0b bin . bin 0o oct . oct dec . dec 0x hex . hex bin 0b bd _ bd bd oct 0o od _ od od dec dd _ dd dd hex 0x hd _ hd hd bd 0 1 od 0 7 dd 0 9 hd dd a f A F Examples: 3.4 3/4 0xfa.4 0o64/0b101 1_000_000 Strings and characters. Character constants are enclosed in apostrophes ', string constants are deliminated by double quotes ". A string constant is just an abbreviation for a list of characters. e following escape sequences are re- cognised:
\' U0027 apostrophe \t U0009 tabulator \" U0022 double quote \ddd — character with 8 bit \\ U005C backslash ddd in decimal \b U0007 bell \xhh — character with 8 bit \e U001B hh in hexadecimal \f U000C form feed \uhhhh — character with 16 bit \n U000A newline hhhh in hexadecimal \r U000D carriage return
Symbols. Symbols are uppercase identifiers. e symbols True and False are used as boolean values.
— 6 — ant 0.8 3. AL – the ant language
3.3. Expressions
Expressions consist of simple expressions combined by operators. We distinguish between binary, prefix, and postfix operators. expr expr post-op expr bin-op expr simple-expr number simple-expr local lid ; expr local decl ; expr local begin decl-list end expr expr where decl-list end . . . Function application is written without parenthesis or commas: foldl (+) 0 [0,1,2,3,4,5] If a list of simple expressions starts with a number then the symbol * is inserted between the number and the following terms, i.e., 4 sind 45 is equivalent to 4 * sind 45. e local and where constructs allow local definitions of functions and vari- ables where local x1 . . . xn is an abbreviation for
local begin x1 := _; . . . xn := _; end e following table lists all predefined operators in order of increasing priority:
priority assoc. operators priority assoc. operators
0 right $ 7 le * / mod 1 le >> 8 le ^ 2 right || 9 right o 3 right && 9 le !! 4 non == <> > < >= <= function application 5 le land lor lxor lsr lsl prefix prefix operators: ~ 5 right ++ postfix postfix operators 6 le + -
Simple expressions. Simple expressions are either literals, variables, or complex expressions enclosed in parenthesis. We discuss the various cases separately.
— 7 — 3. AL – the ant language ant 0.8
simple-expr lid symbol number character string _ . . . e symbol _ indicates an unnamed variable without value. It is an abbreviation for the expression (local x ; x) simple-expr . . . pre-op simple-expr ( expr ) ( bin-op expr) ( expr bin-op ) ( bin-op ) . . . Partial application of binary operators are written in parenthesis. So (+) denotes the function x, y x y and (+ 1) is the function x x 1. simple-expr . . . ( expr , expr ) [ expr , expr ] [ expr , expr : expr] . . . Tuples are written with parenthesis and commas, lists are set in square brackets. e tail of a list is separated by a colon :. Examples: (1, 0, 0) [0,1,2,3] [x:xs]
e following control constructs are available: simple-expr . . . begin stmt-list-expr end do expr-stmt-list end if expr then stmt-list-expr elseif expr then stmt-list-expr else stmt-list-expr end match expr with { match-body } match-body match-clause | match-clause match-clause pattern & expr := stmt-list-expr stmt-list-expr stmt , expr expr-or-stmt expr stmt expr-stmt-list expr-or-stmt ; expr-or-stmt
— 8 — ant 0.8 3. AL – the ant language
Lambda expressions, i.e., unnamed functions, are written as list of patterns and corresponding expressions similarly to match constructs: expr . . . { fun-body } fun-body fun-clause | fun-clause fun-clause pattern & expr := stmt-list-expr For example: { [] := 0 | [x] := x | [x, y : _] := x + y }
{ x & x > 0 := 1 | x & x == 0 := 0 | x & x < 0 := ~1 }
3.4. Statements
A statement is an equation or an if statement of equations. Note that, for state- ments, the else part of an if statement may be omitted. stmt expr = expr if expr then stmt elseif expr then stmt else stmt end
3.5. Patterns pattern _ lid number ( pattern ) ( pattern , pattern ) [ pattern , pattern ] [ pattern , pattern : pattern] lid = pattern pattern = lid Pattern can be used to check the structure of values and to access their compon- ents. For instance, the pattern (0, x) can be matched with a pair whose first component is the number 0 and whose second component will be bound to the variable x.
— 9 — 3. AL – the ant language ant 0.8
3.6. Declarations decl lid pattern & expr := stmt-list-expr pattern bin-op pattern [& expr] := stmt-list-expr lid declare_infix_left num lid declare_infix_non num lid declare_infix_right num lid declare_prefix lid declare_postfix lid e first two cases are used to declare functions. A list of identifiers x_0 . . . x_n is an abbreviation for the declarations x_0 := _; . . . x_n := _ . e declare_. . . statements can be used to declare the priority and associativity of operators.
3.7. Built in AL-commands
Control constructs. e function error msg can be used to abort the computation with a given error message. Types. To test the type of a value the following functions can be used: is_unbound x is_symbol x is_bool x is_function x is_number x is_list x is_char x is_tuple x Logical operations. e operators for disjunction, conjunction, and negation are || && not Comparison operators. e operators == <> > < >= <= compare their arguments without modifying them. Equality == and inequal-
— 10 — ant 0.8 3. AL – the ant language ity <> are defined for all types. e other relations for numbers, characters, lists, and tuples where the latter ones are ordered lexicographically. min x y max x y compute, respectively, the minimum and maximum of x and y. General arithmetic. e usual arithmetic operations + - * / ^ quot mod ~ abs are supported. Addition + is defined for numbers, tuples, and lists:
num0 + num1 num0 num1