Compiler Tools in APL

Compiler Tools in APL Robert Bernecky Gert Osterburg Snake Island Research Inc Technoma GmbH 18 Fifth Street, Ward’s Island Kurhessenstr. 14 Toronto, Ontario M5J 2B9 Frankfurt/M Canada Germany +1 416 203-0854 email: (IPSHARP) technoma@ipsaint [email protected] Keywords many common programming problems, especially those which are inherently or apparently iterative. Compilers, parsers, finite state automata, FSM, performance. 1 Introduction Abstract APL is a powerful interactive language which of- fers primitives to assist in creation of new pro- We present the design and implementation of APL grams. Among these are execute (â), which executes Intrinsic Functions for a Finite State Machine (also APL sentences, and fix (Ìfx), which creates APL known as a Finite State Automaton) which recog- functions from character tables. Given such primi- nizes regular languages, and a Parser which recog- tives, why would an APL programmer want compiler nizes a subset of context free languages, including tools? SLR(1), LALR(1), and LR(1). These are currently Several reasons for wanting additional tools are: being used on a commercial APL mainframe system Many applications allow users to specify infor- as part of a large real-time financial trading system, mation in the form of statements expressed not where they are part of a compiler which translates in APL syntax, but in a form closer to a nat- dealer specification statements into APL functions. ural language, or in a form closer to unnatural Use of these Intrinsic Functions more than doubled languages such as the more common program- the performance of the compiler. ming languages. Those statements must be an- In addition to making a significant performance alyzed for syntactic correctness and semantic improvement for a large production system, we show content according to their rules, which differ these functions to have value in effectively solving from the rules of APL. Such analysis typically This paper originally appeared in the APL92 Conference requires iteration, an area in which APL inter- Proceedings. [BO92] preters have traditionally performed poorly. 1 Many problems which are inherently iterative, PTgenerator A Parser Table generator, or which are not obviously non-iterative, are which takes definitions of a grammar and gram- amenable to easy and fast solutions using com- mar type, and generates a Parser Table. piler tool assists. FSM A Finite State Machine (or Finite State Au- Manually constructed syntax analyzers and tomaton) which takes an FSM table and source parsers are less likely to be fault-free than those text, and produces a list of machine states and constructed with the aid of already debugged partitions, which can be used to extract Tokens compiler tools. Hence, code reliability is im- from the text, and to perform certain classes of proved by the use of extant tools. validation on the text. Although a complete calculus of compiler tools is PARSER A Parsing function, which takes a beyond the scope of this paper, we have designed and Parser Table and Tokens, and produces a Parse implemented a subset of such tools, which have done Tree. an excellent job in practice on a number of applications. Interpreter An interpreter or compiler function, which takes APL code fragments and a Parse Tree, and produces APL programs, or 2 A Calculus of Compiler Tools directly interprets the desired expressions. Syntax-directed program development is an impor- These tools provide a compiler or interpreter gen- tant area of computer science: Considerable re- erating device which is comparable to Lex and Yacc: search has resulted in the availability of a number of compiler-writing aids such as LEX and YACC FSMgenerator and FSM could be useful in [Joh86, LS86]. These tools have turned compiler regular pattern matching. SHARP APL’s LO- writing from an art into a science, making creation GOS, [AGDH86] for example, uses the termi- of simple compilers an undergraduate term project. nology, but has implemented only a very sim- The problems faced by APL application program- ple subset of regular expressions. With these mers are no different from those faced by program- functions, full regular expressions could be ex- mers working in other languages. Having efficient pressed quite efficiently. compiler tools available in APL expands the power of APL in quite unexpected ways, as will be shown FSMgenerator and PTgenerator would below. make up a parser generator. That is, they gen- What sort of tools are required for a calculus of erate Tables which, together with the FSM and compiler tools? Consider the following set of func- PARSER, constitute a proper parser. tions which deal with syntax-directed program development: Detailed discussion of the entire set of compiler tools is beyond the scope of this paper, so we re- FSMgenerator A Finite State Machine gen- strict discussion to the FSMgenerator, PTgen- erator which takes definitions expressed as reg- erator,andtheFSM. A later section deals with ular expressions, and generates an FSM table. the definitions of these tools. 2 Token Example Regular Expression Regular expressions are thus seen to be valuable nat 123 DIG * both in scanning for tokens in a traditional program- integer 123 -123 (!+ !-) nat ming language, or in the validation of statements in real 1.3 -1.3 (!+ !-) ( ( nat *) !. nat + a custom-designed end-user oriented command lan- nat !. (nat *) ) guage. expreal 1.3e3 real !e integer names abc a123 ALPH (ALPH + DIG) * 4 Compiler Tool Definitions 4.1 The FSM Generator Definition currency USD DM (!U !S !D) + (!D !M) The Finite State Machine (also known as a Finite Figure 1: Regular expressions for data validation State Automaton (FSA) or FSM) generator function is written in ISO standard APL, except for the use Regular Pattern recognized (p,q are of Ìsignal to signal errors during argument val- Expression instances of patterns) idation. Performance of the generator itself is usu- !x the character x ally not critical, because a generator is typically used ! the empty string only once, while its results are used many times. pq pfollowedbyq The duty of FSMgenerator is to take gram- p+q porq mar definitions expressed as regular expressions, and p* zero or more p produce a table which can be used as the left argu- Figure 2: Regular expression definition ment to the FSM Intrinsic Function. In working with typical grammars, the effort involved to create a finite state machine definition by hand can be exces- 3 Regular Expressions sive. The FSMgenerator automates this process. Proving that a finite state machine definition is Regular expressions are commonly used in compiler correct is also difficult. The use of a function to cre- design as a way to compactly describe the legal con- ate these definitions ensures that they are correct if structs of a grammar. If we wish to construct a gram- the grammar is correct. mar which accepts numbers, characters, and curren- cies, we might construct regular expressions to de- 4.2 The PTgenerator Definition fine such a grammar. These allow precise definitions in a general way. See Figure 1 for an example of The PTgenerator function takes as argument a set of regular expressions for defining tokens. grammar rules as shown in the second part of Figure In fact, any pattern which can be derived from the 7. Rules are given by: recursive definition in Figure 2 can be recognized by Nonterminal ! a Finite State Machine. List-of-Terminals-and-Nonterminals As noted earlier, the reliability of grammars based on regular expressions and compiled by a generator Terminals are identified by a leading ’$’ and are are more likely to be correct than those constructed assumed to be defined via regular expressions. After manually in an ad hoc manner. performing some plausibility checks such as: 3 ! does each Nonterminal occuring right of ’ ’ which are not explicitly indicated in this row are as- also occur left of at least one rule? sumed to appear in the last column of the row. The remaining rows of the FSM table are next-state val- does each Nonterminal on the left occur also on ues, selected by the current state of the FSM (initially the right side (except for the first one, which is 1), and the column indicated by the next character in considered to be the starting Nonterminal). the argument. The rows are one-origin indices into the rows of the FSM table excluding the first row (or, The parser starts constructing the parsing tables zero-origin indices into the rows including the first according to the type (slr in Figure 7) of the gram- row). mar. For algorithmic details we refer to Aho, et al Legal values in these rows are ¢1 or legal one- [ASU86]. origin indices into the FSM table after the first row The result of PTgenerator is either the GOTO and has been dropped. Zeros and values greater than Actiontables(asshowninFigure8)oranerrormes- 1ÙÒFSMtable cause domain error. Other positive sage indicating the reason why the construction of values indicate the row of the FSMtable to be used the parsing table failed. as the next FSM state. In the SWAP application discussed later, an Negative one indicates an illegal transition. This SLR(1) grammar is used. A LALR(1) generator also causes the previously generated result element to be exists, but it runs fairly slowly. LALR(1) is also the negated, marking the end of a partition of legal char- technique used by Yacc. acters. The FSM resets to the initial state, and res- cans the offending character from that state. A fail- 4.3 The FSM Definition ure in this state causes negative one to be stored in the current result element, and the machine restarts A Finite State Machine derives its name from the fact on the next character, starting a new partition.

Load more