The Computer Science of TEX and LATEX; Based on CS 594, Fall 2004, University of Tennessee
Total Page:16
File Type:pdf, Size:1020Kb
The Computer Science of TEX and LATEX; based on CS 594, fall 2004, University of Tennessee Victor Eijkhout Texas Advanced Computing Center The University of Texas at Austin 2012 2 TEX–LATEX – CS 594 3 About this book. These are the lecture notes of a course I taught in the fall of 2004. This was the first time I taught the course, and the first time this course was taught, period. It has also remained the only time the course was taught. These lecture notes, therefore, are prob- ably full of inaccuracies, mild fibs, and gross errors. Ok, make that: are definitely full of at the least the first two categories, because I know of several errors that time has prevented me from addressing. My lack of time being what it is, this unfinished book wil now remain as is. The reader is asked to enjoy it, but not to take it for gospel. Victor Eijkhout [email protected] Knoxville, TN, december 2004; Austin, TX, january 2012. copyright 2012 Victor Eijkhout ISBN 978-1-105-41591-3 Enjoy! Victor Eijkhout 4 TEX–LATEX – CS 594 2.3 Finite state automata and regular languages 45 2.4 Lexical analysis with FSAs 50 Contents Syntax parsing 52 2.5 Context-free languages 52 2.6 Parsing context-free About this book 3 languages 55 1 TEX and LATEX 9 Lex 67 LATEX 10 2.7 Introduction 67 1.1 Document 2.8 Structure of a lex markup 10 file 67 1.2 The absolute basics of 2.9 Definitions LATEX 12 section 68 1.3 The TEX conceptual 2.10 Rules section 69 model of 2.11 Regular typesetting 15 expressions 71 1.4 Text elements 15 2.12 Remarks 72 1.5 Tables and figures 25 2.13 Examples 73 1.6 Math 25 Yacc 77 1.7 References 28 2.14 Introduction 77 1.8 Some TEXnical 2.15 Structure of a yacc issues 30 file 77 1.9 Customizing 2.16 Motivating LATEX 30 example 77 1.10 Extensions to 2.17 Definitions LATEX 34 section 79 TEX 2.18 Lex Yacc programming 38 interaction 79 TEX visuals 39 2.19 Rules section 81 Projects for this 2.20 Operators; precedence chapter 40 and associativity 82 2 Parsing 41 2.21 Further remarks 83 Parsing theory 42 2.22 Examples 86 2.1 Levels of parsing 42 Hashing 93 2.2 Very short 2.23 Introduction 93 introduction 42 2.24 Hash functions 94 Lexical analysis 45 2.25 Collisions 97 5 6 Contents 2.26 Other applications of Curve plotting with hashing 102 gnuplot 170 2.27 Discussion 103 4.4 Introduction 170 Projects for this 4.5 Plotting 170 chapter 104 Raster graphics 172 3 Breaking things into 4.6 Vector graphics and pieces 105 raster graphics 172 Dynamic 4.7 Basic raster Programming 106 graphics 172 3.1 Some examples 106 4.8 Rasterizing type 176 3.2 Discussion 114 4.9 Anti-aliasing 179 TEX paragraph Projects for this breaking 116 chapter 183 3.3 The elements of a 5 TEX’s macro paragraph 116 language – 3.4 TEX’s line breaking unfinished algorithm 120 chapter 185 NP Lambda calculus in completeness 130 TEX 186 3.5 Introduction 130 5.1 Logic with TEX 186 3.6 Basics 132 6 Character 3.7 Complexity encoding 199 classes 133 Input file 3.8 NP- encoding 200 completeness 135 6.1 History and Page breaking 138 context 200 3.9 Introduction 138 6.2 Unicode 203 3.10 TEX’s page breaking 6.3 More about character algorithm 139 sets and 3.11 Theory of page encodings 207 breaking 141 6.4 Character issues in Projects for this TEX/LATEX 210 chapter 150 Font encoding 212 4 Fonts 151 6.5 Basic Bezier curves 152 terminology 212 4.1 Introduction to curve 6.6 Æsthetics 215 approximation 152 6.7 Font 4.2 Parametric technologies 216 curves 157 6.8 Font handling in TEX 4.3 Practical use 167 and LATEX 218 TEX–LATEX – CS 594 Contents 7 Input and output 7.2 Knuth’s philosophy of encoding in program LATEX 221 development 224 6.9 The fontenc Software package 221 engineering 225 Projects for this 7.3 Extremely brief chapter 222 history of TEX 225 7.4 TEX’s Software 7 development 225 engineering 223 Markup 229 Literate 7.5 History 229 programming 224 Projects for this 7.1 The Web system 224 chapter 231 Victor Eijkhout 8 Contents TEX–LATEX – CS 594 Chapter 1 TEX and LATEX In this chapter we will learn • The use of LATEX for document preparation, • LATEX style file programming, • TEX programming. Handouts and further reading for this chapter For LATEX use the ‘Not so short introduction to LATEX’ by Oetiker et al. For further reading and future reference, it is highly recommended that you get ‘Guide to LATEX’ by Kopka and Daly [15]. The original reference is the book by Lamport [16]. While it is a fine book, it has not kept up with developments around LATEX, such as contributed graphics and other packages. A book that does discuss extensions to LATEX in great detail is the ‘LATEX Companion’ by Mittelbach et al. [17]. For the TEX system itself, consult ‘TEX by Topic’. The original reference is the book by Knuth [12], and the ultimate reference is the published source [11]. 9 10 CHAPTER 1. TEX AND LATEX LATEX. 1.1 Document markup If you are used to ‘wysiwyg’ (what you see is what you get) text processors, LATEX may seem like a strange beast, primitive, and probably out-dated. While it is true that there is a long history behind TEX and LATEX, and the ideas are indeed based on much more primitive technology than what we have these days, these ideas have regained surprising validity in recent times. 1.1.1 A little bit of history Document markup dates back to the earliest days of computer typesetting. In those days, terminals were strictly character-based: they could only render mono-spaced built-in fonts. Graphics terminals were very expensive. (Some terminals could switch to a graphical character set, to get at least a semblance of graphics.) As a result, com- positors had to key in text on a terminal – or using punched cards in even earlier days – and only saw the result when it would come out of the printer. Any control of the layout, therefore, also had to be through character sequences. To set text in bold face, you may have had to surround it with <B> .. the text .. </B>. Doesn’t that look like something you still encounter every day? Such ‘control sequences’ had a second use: they could serve a template function, ex- panding to often used bits of text. For instance, you could imagine $ADAM$ expanding to ‘From our correspondent in Amsterdam:’. LATEX works exactly the same. There are command control sequences; for instance, you get bold type by specifying \bf, et cetera. There are also control sequences that expand to bits of text: you have to type \LaTeX to get the characters ‘LATEX’ plus the control codes for all that shifting up and down and changes in font size. \TeX => T\kern -.1667em\lower .5ex\hbox {E}\kern -.125emX \LaTeX => L\kern -.36em {\sbox \z@ T\vbox to\ht \z@ {\hbox {\check@mathfonts \fontsize \sf@size \z@ \math@fontsfalse \selectfont A} \vss }}\kern -.15em\TeX 1.1.2 Macro packages The old typesetting systems were limited in their control sequences: they had a fixed repertoire of commands related to typesetting, and there usually was some mechanism TEX–LATEX – CS 594 1.1. DOCUMENT MARKUP 11 to defining ‘macros’ with replacement text. Formally, a macro is a piece of the input that gets replaced by its definition text, which can be a combination of literal text and more macros or typesetting commands. An important feature of many composition programs is the ability to designate by suitable input instructions the use of specified for- mats. Previously stored sequences of commands or text replace the instructions, and the expanded input is then processed. In more so- phisticated systems, formats may summon other formats, including themselves [”System/360 Text Processor Pagination/360, Applica- tion Description Manual,” Form No. GE20-0328, IBM Corp., White Plains, New York.]. That was the situation with commercial systems by manufacturers of typesetting equip- ment such as Linotype. Systems developed by (and for!) computer scientists, such Scribe or nroff/troff, were much more customizable. In fact, they sometimes would have the equivalent of a complete programming language on board. This makes it pos- sible to take the basic language, and design a new language of commands on top of it. Such a repertoire of commands is called a macro package. In our case, TEX is the basic package with the strange macro programming language, 1 and LATEX is the macro package .LATEX was designed for typesetting scientific articles and books: it offers a number of styles, each with slightly different commands (for instance, there are no chapters in the article style) and slightly different layout (books need a title page, articles merely a title on the first page of the text). Styles can also easily be customized. For different purposes (art books with fancy designs) it is often better to write new macros in TEX, rather than to bend the existing LATEX styles. However, if you use an existing LATEX style, the whole of the underlying TEX program- ming language is still available, so many extensions to LATEX have been written. The best place to find them is through CTAN http://wwww.ctan.org/.