BASIX -An Interpreter Written in TEX
Total Page:16
File Type:pdf, Size:1020Kb
BASIX -An Interpreter Written in TEX Andrew Marc Greene MIT Project Athena, E40-342, 77 Massachusetts Avenue, Cambridge, MA 02139 617-253-7788. Internet: [email protected] Abstract An interpreter for the BASIC language is developed entirely in W. The interpreter presents techniques of scanning and parsing that are useful in many contexts where data not containing formatting directives are to be formatted by m. W'Sexpansion rules are exploited to provide a macro that reads in the rest of the input file as arguments and which never stops expanding. Introduction BASIX makes extensive use of these arrays. Commands are begun with C; functions with F; It is a basic tenet of the T@ faith that rn variables with V; program lines with /; and the is Turing-equivalent and that we can write any linked list of lines with L. This makes it easy for program in W.It is also widely held that the interpreter to look up the value of any of these is "the most idiosyncratic language known to us."' things, given the name as perceived by the user. This project is an attempt to provide a simple One benefit of \csname.. \endcsname is that programming front end to W. if the resultant command sequence is undefined. BASICwas selected because it is a widely used ~'FJ replaces it with \relax. This allows us to interpreted language. It also features an infix syntax check, using \ifx, whether the user has specified a not found in Lisp or POSTSCRIPT.This makes it a non-existent identifier. This trick is used in exercise more difficult but more general problem than either 7.7 in The mbook. We use it in BASIX to check of these others. for syntax errors and uninitialized variables. The speed of the BASIX interpreter is not impressive. It is not meant to be. The purpose Token streams. The BASX interpreter was de- of this interpreter is not to serve as the BASIC signed to be run interactively. It is called by typing implementation of choice. Its purpose is to display tex basix; the file ends waiting for the first line of useful paradigms of input parsing and advanced BASICto be entered at W's * prompt. This also W programming. allows other files to \input basix and immediately follow it with BASICcode. Interaction with We cannot have the scanner read an entire line at once, since if the last line of basix.tex Associative arrays. Using \csnane it is possible were a macro that reads a line as a parameter, to implement associative arrays in 7&X. Associative we'd get a "File ended while scanning use of arrays are arrays whose index is not necessarily a \getlinen error. Instead, we use a method which number. As an example, if \student has the name at first blush seems more convoluted but which is of a student, we might look up the student's grade actually simpler. with We note that TEX does not make any dis- tinction between the tokens that make up our interpreter and the tokens that form the BASIC which would be \grade.Greene in my case. (In code. The BASIX interpreter is carefully constructed the case of \csname, all characters up to the so that each macro ends by calling another macro \endcsname are used in the command sequence (which may read parameters). Thus, expansion is regardless of their category code.) never completed, but the interpreter can continue to absorb individual characters that follow it. These characters affect the direction of the expansion; it is Ward, Stephen A. and Robert H. Halstead, Jr., this behaviour that allows us to implement a BASIC Computation Structures, MIT Press, 1990 interpreter in W. TUGboat, Volume 11 (1990), No. 3-Proceedings of the 1990 Annual Meeting 381 Andrew Marc Greene Category codes. Normally, TEX distinguishes be- leaving this next token in the input stream. The tween sixteen categories of characters. To avoid word is returned in the macro \word. unwanted side effects, the BASIX interpreter reas- The peculiar way \scan operates gives rise to signs category codes of all non-letters to category new problems, however. We can't say 12, "other". One undocumented feature of TJ$ is that it will never let you strand yourself without because \scan looks at the tokens which follow an "active" character (which is normally the back- it, which in this case are \lineno=\word. We slash); if you try to reassign the category of your need some way to define the goto command so only active character (with, e.g., \catcode'\\=12), that the \scan is at the end of the macro; this it will fail silently. To allow us to use every typeable will take the next tokens from the input stream. character in BASIX, we first make \catcode31=0, We therefore have \scan "return" to its caller by which then allows us to reassign the catcode of the breaking the caller into two parts: the first part ends backslash without stranding ourselves. (Of course, with \scan and the second part contains the code the BASIX interpreter provides an escape back to which should follow. The second macro is stored TEX which restores the normal category codes.) in \afterscan, and \scan ends with \afterscan. We also change the category code of --M, the As syntactic sugar, \after has been defined as end-of-line character, to "active". This lets us \let\afterscan. This allows \Cgoto to be coded detect end-of-line errors that may crop up. as Semantics Words. A word is a collection of one or more (Actually, the goto code is slightly more compli- characters that meet requirements based on the cated than this; but the scanner is the important first character. The following table describes these point here.) This trick is used throughout the rules using regular expression n~tation.~These BASIX interpreter to read in the next tokens from rules are: the user's input without interrupting the expansion of the W macros that comprise the interpreter. First Regular Character Expression Meaning Expressions. An expression is a sequence of words [A-Z, a-zl [A-Z, a-z] [A-Z, a-z ,0-9]*$? Identifier that, roughly speaking, alternates between val- LO-91 CO-91+ Integer ues and operators. Values fall into one of three I8 V! [^Ill* [ll , -'MI String categories: literals, identifiers, and parenthesized Other Symbol expressions. An operator is one of less-than, more- An end-of-line at the end of a string literal is than, equality, addition, subtraction, multiplication, converted to a I' . division, or reference. (Reference is an implicit op- Everything in BASIX is one of these four types. erator that is inserted between a function identifier Line numbers are integer literals, and both variables and its parameters.) and commands are identifiers. Expressions are evaluated in an approach sim- The \scan macro is used in BASIX to read ilar to that used in the scanner. A word is the next word. It uses \futurelet to look at the scanned using \scan and its type is determined. next token and determine whether it should be part "Left-hand" values are stored for relatives, addi- of the current word; ie., whether it matches the tives, multiplicatives, and references. Using W's regular expression of the current type. If so, then it grouping operations the evaluator is reentrant, per- is read in and appended; otherwise \scan returns, mitting parenthesized expressions to be recursively evaluated. In order to achieve a functionality similar to that of \futurelet, we exit the evaluator by In this notation, [A-Z, a-zl means "any char- \expandafter\aftereval\word, where the macro acter falling between A and Z or between a and z, \aftereval is analogous to \afterscan. Since inclusive." An asterisk means "repeat the preceding \word will contain neither macros nor tokens whose specification as many times as needed, or never." A category codes need changing, this is as good as plus means "repeat the preceeding specification as \f uturelet. many times as needed, at least once." A question mark means "repeat the preceeding specification zero or one times." A dot means "any character." 382 TUGboat, Volume 11 (1990), No. 3-Proceedings of the 1990 Annual Meeting BASIX - An Interpreter Written in Structure of the Interpreter in a let statement must be =). \parseline has already been described. The file basix.tex, which appears as an appendix to this paper. defines all the macros that are Syntactic scanner. This is the section containing needed to run the interpreter. The last line of this \scan and its support macros, which are described file is \endeval, which is usually called when the above. interpreter has finished evaluating a line. In this Type tests. These routines take an argument and case, it calls \enduserline, which, in turn. calls determine whether it is an identifier, a string \parseline. variable, a string literal, an integer literal, a macro, The \parseline macro is part of the Program or a digit. The normal way of calling these routines Parser section of the interpreter. It starts by calling is \scan to get the first word of the next line. If \word is an integer literal. it is treated as a line number and the rest of the input line is stored in These predicates expand into either tt (true) or tf the appropriate variable without interpretation. If (false). Syntactic sugar is provided in the form of \word is not an integer literal, it is treated as a \itstrueand\itsfalse. \ifstring\worddoesn't command. work because of the way matches \if and \f i Each command is treated with an ad hoe tokens - only \if-style primitives are recognized.