
360 GUIDE CABAL CABAL DESCRIPTION

TM CCU- 51 " Author Mary Shaw and Janet Fierst Carnegie Institute of Technology Technical Editor

Computation Center Release Approval

Distribution 360 CABAL mailing list TECHNICAL MEMO Date January 27 , 1967 2 Replaces C Notes No. 2 and No. 3 Supplements 360 REFERENCE GUIDE Addends Page


FORMAL COMPILER-DESCRIPTIVE SYSTEMS

This is a preliminary version of a projected comparison of systems with compiler-descriptive abilities. The intent here is to exhibit the formalisms, to examine the black boxes rather than their contents. I have aimed for description rather than evaluation, for it seems desirable to work with the formalisms for a while before attempting evaluation.

Taking last things first, I remark that the Bibliography contains more rather than less than it should -- the aim was for completeness, even at the expense of including material which is not really too useful. Appendix A should be fairly complete up to November, 1965 as a compendium of existing or specified compiler compilers. Comments and additions in this area will be particularly appreciated.

Mary Shaw, Porter Hall 18D, Ext. 44

Appendix B contains design criteria for compiler compilers, which were developed by the CABAL group through study of the compiler compilers noted here and through personal soul searching. Again, we have tried to put in too much rather than too little. A later note is planned to describe how CABAL meets (or does not meet) these criteria.

Janet Fierst, Mary Shaw, Rick Dove, Porter Hall 18N, Ext. 54

Algol, n. a star of the second magnitude in the constellation Perseus. It is remarkable for its variability, which is due to periodic eclipse by a fainter stellar companion.

The American College Dictionary

The process of algorithmic problem-solving with a computer may normally be regarded as independent of the particular machine on which the computation is to be performed. This was acknowledged early in the history of computing by the development of so-called machine-independent or problem-oriented languages. Many different languages and dialects were developed, most of them directed at particular applications or particular machines, and each of them consuming vast amounts of man and machine time. Indeed, a 1962 survey (International Standards Organization, 1963) showed well over 300 compilers. Because of the effort involved in implementing a new language (described by Pratt (1965) as the implementation bottleneck), little experimentation with languages (e.g., special-purpose languages) was undertaken, and most "new" languages were general-purpose translators differing from their predecessors in only a few features. The first efforts toward standardization were made by manufacturers and users' groups; the most notable of these was and all its many dialects. There have been three main lines of attack on the language multiplicity problem:

(1) The universal algorithmic language. There has been, in recent years, a trend toward fewer and more common languages in particular fields. ALGOL and FORTRAN in scientific areas and COBOL (to some extent) in commercial fields have become acceptable common languages. This legislative approach may hold down the proliferation, but it also tends to discourage experimentation and freezes the available language forms. In addition to lacking applicability to all present problems, a standardized language specified at any particular time cannot provide for the language requirements resulting from advances in computer technology or in the problem areas themselves.

(2) The universal machine-oriented language. In 1958 the SHARE organization proposed a universal intermediary language (UNCOL), to be designed such that any problem-oriented language could be transformed to UNCOL, and UNCOL could be transformed to any machine language. JOVIAL was implemented in this spirit. Just as a "universal" algorithmic language leads to inflexibility at the problem level, so a "universal" machine-oriented language lends itself poorly to translation to a large number of computers with wildly varying instruction and data formats and instruction sets.

(3) The compiler compiler. An increasing interest has been shown in programming systems which accept as input not only a description of an algorithm in some language, but also a description of that source language and a specification of the target or object language, which might be the machine code for the machine on which the algorithm is to be executed. The problem which arises in this case is one of description: formalisms must be constructed to describe both the source and target languages.

Metcalfe (1964:3) summarized the possible solutions to the proliferation problem: For M machines and N languages,

(a) No standards require M x N compilers for completeness;
(b) Standard programming language (N = 1) requires M compilers, one for each machine;
(c) Standard machine language (M = 1) requires N compilers, one for each language;
(d) Standard programming language (N = 1) and standard machine language (M = 1) requires 1 compiler;
(e) UNCOL requires M + N partial compilers;
(f) Compiler compiler requires 1 compiler plus M + N or M x N language specifications, depending on whether the source and target languages are specified independently.

Of these possibilities, the compiler compiler is the most promising.
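The arithmetic behind Metcalfe's six cases can be tabulated in a few lines. The following is only an illustrative sketch; the function names and the example figures (5 machines, 20 languages) are assumptions made here and do not come from Metcalfe.

```python
def translators_needed(m, n):
    """Number of translators each scheme needs for m machines and n languages."""
    return {
        "no standards": m * n,
        "standard programming language": m,
        "standard machine language": n,
        "both standards": 1,
        "UNCOL (partial compilers)": m + n,
        "compiler compiler": 1,
    }


def specifications_needed(m, n, independent):
    """Language specifications a compiler compiler needs in addition to itself."""
    # Independent source and target descriptions: one per language, one per machine.
    # Otherwise one combined description per (language, machine) pair.
    return m + n if independent else m * n


counts = translators_needed(5, 20)
for scheme, count in counts.items():
    print(f"{scheme}: {count}")
print("specifications (independent):", specifications_needed(5, 20, True))
print("specifications (paired):", specifications_needed(5, 20, False))
```

With independent descriptions the compiler compiler needs the same M + N pieces of work as UNCOL, but each piece is a specification rather than a partial compiler.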

Consider now only systems where the target language is either machine code or . Iliffe (1960) noted that an actual process of translation from a formula language to sequential code consists of three parts:

(1) An initial equivalence transformation of the formula;
(2) The translation into sequential code;
(3) A final equivalence transformation of the sequential code.

These steps may be associated with, respectively, syntax, semantics, and optimization/assembly/relocation.

A meta-compiler should, above all, contain a convenient facility for describing both source and target languages; the descriptions should, moreover, be independent. It should be possible for any interested and informed person to understand the meaning of the source language being defined; the distinction between the syntax of the source language, the associated semantics, and the actual generation of code should be clean. The meta-language of the compiler compiler should be machine-independent, but in the absence of a good formalism for machine description (and quite possibly in the presence of such description) the meta-language should not prevent either the language designer or the language user from getting at the machine directly. In the interest of generality, the meta-language should permit the language specifications to describe data structures (and should provide a data-descriptive facility). For the sake of flexibility, the compiler should provide control over the form of the output code. For the sake of production users, the compiler compiler should provide for both local and global optimization of the object code. For the convenience of the sophisticated user, the meta-language should permit modification of the source language by an individual program. For the sake of reproduction (as well as aesthetics) the meta-language should be capable of describing itself. Error detection and recovery procedures should be available in the compiler compiler and also expressible in the source language -- it may be desirable for the compiler compiler to check on the consistency of the source language as well as on the syntactic validity of its description.

The most important systems with compiler-descriptive abilities are described below. A number of other systems are briefly described in Appendix A; a set of design criteria for compiler compilers is given in Appendix B; Appendix C is an extensive bibliography.

Brooker and Morris

In the early part of this decade Brooker and Morris described in several papers (1960A, 1960B, 1961, 1962, 1963) a compiler-building system which they have developed for the Ferranti Atlas. (The papers were drawn together in a single discussion by Rosen (1964).) In the discussion above the compilation process was segmented into three phases: Brooker and Morris acknowledge the existence of formal descriptions of source languages and of methods for their syntactic analysis (the first phase), and concentrate on the second, semantic interpretation. Semantic analysis requires generators which take action when a source statement is parsed; the authors propose a system of format routines associated with the particular source statement forms of each language, and provide a language in which the format routines may conveniently be written. This language is described in the same formal terms as the source language and is converted into tables by the same service routines; it thus provides a set of basic formats which may be used to build up a more complex system.

The format routines may be regarded as macro-generators, and a source language statement as a series of calls on appropriate elementary routines. The system proper contains a basic structure consisting of a number of basic instruction formats interpreted directly as format routines to handle housekeeping functions such as system sequencing and table manipulation. A compiler is built up on this structure by adding format routines, each a list of statements in formats already in the system, in a macro-building fashion. These new formats define classes of phrases, source statement formats, and further intermediate formats; the code to be generated is implicit in these routines. With enough source statement formats added, the system will act as a compiler for the language so described.

Syntax: A set of elementary symbols is recognized by the system; the set may be extended to include class identifiers, strings of elementary symbols enclosed in square brackets. A phrase is a string of elementary symbols or class identifiers; a phrase class is defined in a form similar to BNF. Notation is available to indicate indefinite replication and optional occurrence. A format class is similar to a phrase class, with the distinction that each format in a format class has an associated format routine. The format routines and associated source statement formats are the generators which actually control translation and other compiling operations. Certain basic statement formats for manipulation of numbers, temporary storage, indices, and data structure at the translation level are built into the system; other auxiliary statement formats may be written using the basic statements and previously defined auxiliary statements. The format routines themselves are written in the language defined by the basic and auxiliary statements. The system philosophy is consistent with a small set of basic formats, since this allows more flexibility in defining auxiliary and source statement formats.

The format routines typically look much like an ordinary algebraic language, but they are executed interpretively as a series of macro calls. The interpretive execution allows a general parameter substitution scheme, but speed considerations may make it desirable to enlarge the set of basic statement formats.
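The macro-building idea can be made concrete with a small sketch: a few basic formats are implemented directly, and every further format routine is a list of statements in formats already known to the system, executed interpretively with parameter substitution. All names here (`FormatSystem`, `emit`, `load`, `add_to`) are hypothetical and chosen for illustration; this is not Brooker and Morris's notation.

```python
class FormatSystem:
    def __init__(self):
        self.basic = {}     # format name -> Python callable (a built-in format routine)
        self.routines = {}  # format name -> (parameter names, list of statements)

    def add_basic(self, name, fn):
        self.basic[name] = fn

    def add_format(self, name, params, body):
        # A new format routine may only use formats already in the system.
        for stmt_name, _ in body:
            if stmt_name not in self.basic and stmt_name not in self.routines:
                raise KeyError(f"unknown format: {stmt_name}")
        self.routines[name] = (params, body)

    def execute(self, name, args, out):
        if name in self.basic:
            self.basic[name](out, *args)
            return
        params, body = self.routines[name]
        env = dict(zip(params, args))                     # bind formal to actual parameters
        for stmt_name, stmt_args in body:
            actual = [env.get(a, a) for a in stmt_args]   # substitute parameters, keep literals
            self.execute(stmt_name, actual, out)          # interpretive macro call


system = FormatSystem()
system.add_basic("emit", lambda out, op, operand: out.append(f"{op} {operand}"))
system.add_format("load", ["x"], [("emit", ["LOAD", "x"])])
system.add_format(
    "add_to",
    ["dst", "src"],
    [("load", ["dst"]), ("emit", ["ADD", "src"]), ("emit", ["STORE", "dst"])],
)

code = []
system.execute("add_to", ["A", "B"], code)
print(code)
```

The sketch also shows the trade-off noted above: every auxiliary format costs a chain of interpreted calls, which is what would make a larger set of basic formats attractive for speed.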

CGS

Computer Associates had (Spring, 1964) a general-purpose table-driven compiler (Compiler Generator System) in operation on an IBM 7090, a Burroughs D-825, and a CDC 1604. This system was planned for the generation of efficient object code, and provision was therefore made for program optimization at several levels. Warshall and Shapiro (1964) argue that each type of optimization has a preferred domain, and provide explicit facilities for transforming the program structure at each stage of compilation. The compiler is divided into five phases, described by the authors:

1. A syntactic analyzer which converts a piece of input string into a tree-representation of its syntax.

2. A generator, which transforms the tree into a sequence of n-address macro-instructions, investigating syntactic context to decide the emission.

3. An "in-sequence optimizer" which accumulates macros, recognizes and eliminates the redundant computation of common subexpressions, moves invariant computations out of loops, and assigns quantities to special registers.

4. A code selector which transforms macros into syllables of machine code, keeping complete track of what is in special registers at each stage of the computation.

5. An assembler which simply glues together the code syllables in whatever form is required by the system with which the compiler is to live: symbolic, absolute, or relocatable, with or without symbol tables, etc.

The first four phases are encoded as general-purpose programs; the fifth has been handled as a special-purpose job in each version of the compiler.
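One of the jobs given to the in-sequence optimizer (phase 3) is removing redundant computation of common subexpressions from the macro stream. The sketch below shows that one transformation on a list of three-address macros. The macro representation `(op, a, b, result)` and the assumption that every result name is assigned only once are simplifications introduced here, not details taken from Warshall and Shapiro.

```python
def eliminate_common_subexpressions(macros):
    """Drop macros that recompute a value already available.

    macros: list of (op, a, b, result) tuples; each result name is assumed
    to be assigned exactly once and operands are never overwritten.
    """
    seen = {}    # (op, a, b) -> name that already holds this value
    alias = {}   # name of a dropped result -> surviving name
    out = []
    for op, a, b, result in macros:
        a, b = alias.get(a, a), alias.get(b, b)   # rewrite operands that were dropped
        key = (op, a, b)
        if key in seen:
            alias[result] = seen[key]             # redundant: reuse the earlier result
        else:
            seen[key] = result
            out.append((op, a, b, result))
    return out


macros = [
    ("MUL", "a", "b", "T0"),
    ("MUL", "a", "b", "T1"),   # same computation as T0
    ("ADD", "T0", "T1", "T2"),
]
optimized = eliminate_common_subexpressions(macros)
print(optimized)
```

Working on macros rather than on the syntax tree or on machine code is an instance of the claim that each kind of optimization has a preferred domain.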

The analyzer, generator, and code selector need tables to determine their actions; it is these tables which determine the source and target languages with which the system is working at any given moment. A group of three languages was developed, one for each of the tables. All the languages resemble Backus notation and hence can be analyzed by the analyzer (first phase).

1. BNF The syntax of the source language is described in a language which may be identified with Backus-Naur Form to within the limits imposed by the available character set.

2. GSL During the second phase of translation, a generator algorithm walks through the syntax tree; the General Strategy Language GSL provides a set of rules which identify types of nodes and actions to be performed when they are encountered. A statement in GSL begins with a predicate of the form:

IF Q_1(t_1) AND Q_2(t_2) AND ... AND Q_n(t_n),

where the Q_i are Boolean expressions evaluated for truth or falsehood at the nodes on the tree specified by the t_i. The predicate is followed by a short program of imperative commands: proceed to another node or take action of some kind. A curious characteristic of GSL is that the action taken is in part a function of the number of times that the generator has already been executed.

3. MDL Code selection is controlled by the Macro Description Language MDL and particularly by its subset CSL (Code Selection Language). CSL permits the translator designer to choose the code realized from a macro on the basis of the current status of other macros and the available machine registers.
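The rule form used by the generator (a conjunction of predicates Q_i, each tested at a tree position t_i, followed by a short program of commands) can be imitated as follows. This is only a sketch under assumptions made here: positions are written as tuples of child indices relative to the current node, the first matching rule wins, and the node kinds and macro names are invented for the example. It does not reproduce GSL syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    value: str = ""
    children: list = field(default_factory=list)


def at(node, path):
    """Follow a tuple of child indices from node; None if the path leaves the tree."""
    for i in path:
        if node is None or i >= len(node.children):
            return None
        node = node.children[i]
    return node


def is_kind(kind):
    return lambda n: n is not None and n.kind == kind


def is_zero(n):
    return n is not None and n.kind == "const" and n.value == "0"


class Generator:
    def __init__(self, rules):
        self.rules = rules   # list of ([(Q_i, t_i), ...], action)
        self.macros = []     # emitted (op, a, b, result) macro-instructions

    def walk(self, node):
        for predicates, action in self.rules:
            if all(q(at(node, t)) for q, t in predicates):
                return action(self, node)
        raise ValueError(f"no rule applies at node of kind {node.kind!r}")

    def emit(self, op, a, b):
        result = f"T{len(self.macros)}"
        self.macros.append((op, a, b, result))
        return result


def leaf(gen, node):
    return node.value


def binary(op):
    def action(gen, node):
        left = gen.walk(node.children[0])    # "proceed to another node"
        right = gen.walk(node.children[1])
        return gen.emit(op, left, right)     # "take action"
    return action


RULES = [
    ([(is_kind("name"), ())], leaf),
    ([(is_kind("const"), ())], leaf),
    # Context decides the emission: x + 0 needs no macro at all.
    ([(is_kind("add"), ()), (is_zero, (1,))], lambda gen, node: gen.walk(node.children[0])),
    ([(is_kind("add"), ())], binary("ADD")),
    ([(is_kind("mul"), ())], binary("MUL")),
]

# (a + 0) * b
tree = Node("mul", children=[
    Node("add", children=[Node("name", "a"), Node("const", "0")]),
    Node("name", "b"),
])
gen = Generator(RULES)
top = gen.walk(tree)
print(gen.macros)
```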

Warshall and Shapiro's (1964:65) final warning should be taken to heart:

The bootstrap method does not make compiler construction trivial, since code selection for a messy machine can be very difficult to work out and since contact with the data and control environment of the code being compiled may be more expensive than the translation process. What is now trivial is the substantial modification of source syntax, minor changes in optimization rules, and the like.

TMG/TROL

McClure (1965) constructed, at Texas Instruments, a system called TMG which Knuth has used as a basis for his Translator-Oriented Language TROL. TMG was used for a recent implementation of a subset of PL/I at Cal Tech (Datamation, October 1965, p. 19); Knuth (n.d.) uses TROL and its associated interpreter TROLL to illustrate techniques of compiler construction.

A program consists (in addition to declarations) of sets of lines, each containing a syntactic specification and a semantic specification to be executed if the syntactic specification is fulfilled. The syntactic and semantic parts of the line are separated by a semicolon, and the specifications are set forth in different languages.

A line contains three fields: a label, a syntactic part, and a semantic part. The syntactic part consists of a series of actions, each with up to two specified labels. Each action may be the label of some line (implying evaluation of that line) or one of a group of built-in tests or actions, including such things as testing for a letter or digit, repositioning the pointer to the input string, or taking an excursion into the semantics language. Each of the built-in actions (and, indirectly, each called line) returns a truth value of true, false, or fail; each syntactic part of a statement interprets the truth values in accordance with the evaluation scheme:

At the beginning of a line, or the end of an action, if no more actions are present, execute the semantic part of the statement. If an action remains, perform it -- this produces a result from the set {true, false, fail}. Now, if two labels are specified with the action, then go to the first if the result was 'false', to the second if the result was 'fail', or to the next action (or the semantic part) if the result was 'true'. If, however, the result was 'fail' and the second label was omitted, or if the result was 'false' and neither label is provided, reset the input string pointer to its location upon entry to the current line and exit (to the point which called it) without changing the value of the result.

Of the basic actions available to the syntactic part, the tests will in general return one of the values {true, false}, depending on the value of a Boolean expression; an action which simply manipulates pointers or alters tables (unconditional actions) will generally return 'true'; and the only basic action which will return 'fail' is FAIL. Thus the 'fail' exit may be used to escape further tree searches when an error has definitely been detected. Two other characteristics of the syntax to note are the clarity with which recursion shows in the notation and the necessity of explicitly manipulating the input-string pointer to access characters not at the top of the stack.

For the semantic part, the usual assortment of statements for assignment of value, conditional execution, data-manipulation, transfer of control, and input/output are available. Object code is a particular type of output; for this reason the system appears to be directed at a symbolic rather than an absolute output form. Note also that a convenient facility is lacking, but that the transfers in the semantics parts may reference labels in either the syntactic or semantic parts of other lines.
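The three-valued evaluation scheme is easier to follow when written out as an interpreter loop. The sketch below is an illustration only, with several assumptions that the description does not settle: a label attached to an action transfers control to that line; a line whose actions all succeed returns 'true' after its semantic part runs; tests do not move the input pointer (a separate `advance` action does); and a missing applicable label always means "reset and exit". The class and line names are hypothetical and this is not TMG syntax.

```python
TRUE, FALSE, FAIL = "true", "false", "fail"


class Translator:
    def __init__(self, lines, text):
        self.lines = lines   # label -> ([(action, false_label, fail_label), ...], semantic part)
        self.text = text     # input string
        self.pos = 0         # input-string pointer
        self.output = []     # whatever the semantic parts produce

    def evaluate(self, label):
        while True:
            actions, semantic = self.lines[label]
            entry = self.pos                     # pointer on entry to the current line
            jump = None
            for action, false_label, fail_label in actions:
                # An action is either the label of another line or a built-in callable.
                result = self.evaluate(action) if isinstance(action, str) else action(self)
                if result == TRUE:
                    continue                     # on to the next action
                jump = false_label if result == FALSE else fail_label
                if jump is None:                 # no applicable label: back out of the line
                    self.pos = entry
                    return result                # result reaches the caller unchanged
                break
            if jump is None:                     # every action returned 'true'
                if semantic is not None:
                    semantic(self)
                return TRUE                      # assumption: a completed line yields 'true'
            label = jump                         # assumption: a label transfers control


def is_letter(t):
    return TRUE if t.pos < len(t.text) and t.text[t.pos].isalpha() else FALSE


def is_letter_or_digit(t):
    return TRUE if t.pos < len(t.text) and t.text[t.pos].isalnum() else FALSE


def advance(t):
    t.pos += 1
    return TRUE


def fail(t):
    return FAIL


LINES = {
    # identifier = letter followed by letters or digits; the recursion is explicit in "rest"
    "ident": ([(is_letter, None, None), (advance, None, None), ("rest", None, None)],
              lambda t: t.output.append(t.text[:t.pos])),
    "rest": ([(is_letter_or_digit, "done", None), (advance, None, None), ("rest", None, None)],
             None),
    "done": ([], None),
    "bad": ([(advance, None, None), (fail, None, None)], None),
}

t = Translator(LINES, "x1+y")
outcome = t.evaluate("ident")
print(outcome, t.output, t.pos)
```

The `bad` line illustrates the 'fail' exit: the pointer movement made by `advance` is undone when the line is abandoned.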

TRANGEN

The TRANGEN (TRANslator GENerator) system has been developed for the IBM 7094 by Computer Associates. It is designed to produce translator programs executable on a wide variety of machines. The system is the combination of an executive program (TRANGEN) which executes from driving tables (TRANTAB) described in a particular language (TRANDIR).

Processing is organized around a set of tables, stacks, and lists. The most important of these are a table which is used as a general communications area, the symbol table, a general-purpose pushdown stack, a stack for symbols undergoing syntactic analysis, and a list to contain the macros produced as a result of the analysis. Information passing through the system is collected into lexical units (number, symbol, terminal character, etc.) from the input tape and placed in the general communications area and, if appropriate, into the symbol table or another directory. If possible, the symbol is then combined with one or more elements from the semantic analysis stack (which are then "popped") and converted into a new entry on the macro list. If a combination cannot be made, the symbol is "pushed" onto the semantic stack. Elements on the macro list are periodically examined in a similar manner: the stack is examined for patterns, which are then subjected to optimization. When all desired optimization is complete, the macros are output to an assembly system.

Syntax: Programs are written in tabular form; after a declaration and an initialization block, statements are written in three fields: a label field, a pattern test, and an action (semantic) statement list. The pattern test resembles Floyd (1961) productions in that it graphically represents the pattern being sought; the TRANDIR language also allows specification of the area (general communications area, semantic stack, macro list) where the pattern is to be found and the attributes which are to be tested for in each component. Action statements are operational functions which specify the manipulations required to effect translation. They are formulated algebraically, providing for assignment, exchange of values, conditional evaluation, and an assortment of standard subroutine-type operations.

FSL

A Formal Semantics Language (FSL) was developed by Feldman (1964) and implemented on the CDC G-21 at Carnegie Tech. FSL was designed to state in fixed terms the meaning of a statement in any of a large class of programming languages. It was combined with a version of Floyd (1961) productions to form a compiler compiler.

The syntax is described in a production language, which is based on a recognizer with a single push-down stack. As each character is scanned, it is placed on top of the stack, and the first few elements are scanned by the productions which describe the source language. When a production is found whose pattern matches the characters at the top of the stack, certain actions are performed. The production statements are formatted in five fields:

1. A label

2. A "picture" of a stack configuration. This is the pattern which is tested against the stack. The rest of the production is executed if a match occurs; the next production in sequence is tested if it does not.

3. A second "picture" of a stack configuration (optional). If this field is non-empty and a match occurs at 2, the stack is transformed to this pattern at the end of execution of this production.

4. An indication of semantic action to be taken: this may call for execution of a semantic routine, an error indication, or termination of compilation.

5. The label of the next production to be compared to the stack, along with (optionally) a request for the next character of input to go to the stack.
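The five fields map directly onto a small table-driven recognizer. The following sketch handles sums of identifiers terminated by `$`. It is an illustration under assumptions made here: `I` in a picture stands for any identifier, `<E>` is a nonterminal, an empty picture matches any stack, and the production labels and semantic routine names are invented. It is not Feldman's production syntax.

```python
from typing import NamedTuple, Optional


class Production(NamedTuple):
    label: str                      # field 1
    picture: tuple                  # field 2: pattern tested against the top of the stack
    replacement: Optional[tuple]    # field 3: new stack top, or None to leave it alone
    action: Optional[str]           # field 4: semantic routine name, "ERROR", or "HALT"
    next_label: str                 # field 5: production to try next
    scan: bool                      # field 5: push the next input symbol first?


def matches(picture, stack):
    if len(picture) > len(stack):
        return False
    top = stack[len(stack) - len(picture):]
    return all(p == s or (p == "I" and s.isalpha()) for p, s in zip(picture, top))


def run(productions, tokens, routines):
    index = {p.label: i for i, p in enumerate(productions) if p.label}
    stack, pos, i = [tokens[0]], 1, 0
    while True:
        p = productions[i]
        if not matches(p.picture, stack):
            i += 1                  # no match: test the next production in sequence
            continue
        matched = stack[len(stack) - len(p.picture):]
        if p.action == "HALT":
            return
        if p.action == "ERROR":
            raise SyntaxError(f"unexpected stack contents: {stack}")
        if p.action is not None:
            routines[p.action](matched)
        if p.replacement is not None:   # transform the stack after the action has run
            stack[len(stack) - len(p.picture):] = list(p.replacement)
        if p.scan:
            if pos >= len(tokens):
                raise SyntaxError("input exhausted")
            stack.append(tokens[pos])
            pos += 1
        i = index[p.next_label]


PRODUCTIONS = [
    Production("START", ("I",), ("<E>",), "operand", "OP", True),
    Production("", (), None, "ERROR", "", False),
    Production("OP", ("<E>", "+"), None, None, "RHS", True),
    Production("", ("<E>", "$"), None, "HALT", "", False),
    Production("", (), None, "ERROR", "", False),
    Production("RHS", ("<E>", "+", "I"), ("<E>",), "add", "OP", True),
    Production("", (), None, "ERROR", "", False),
]

code = []
ROUTINES = {
    "operand": lambda m: code.append(("LOAD", m[-1])),
    "add": lambda m: code.append(("ADD", m[-1])),
}

run(PRODUCTIONS, ["a", "+", "b", "+", "c", "$"], ROUTINES)
print(code)
```

Keeping the semantic routines in a separate table, reached only through names in field 4, mirrors the physical separation of productions and semantics in the system.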

The semantic routines are written in the formalism FSL and are physically separated from the productions. An FSL program consists of two sections, a declaration part and the main body. The declaration part describes the storage to be used by the translator, which may include cells, constants, tables, and stacks. The body of the program consists of semantic descriptions of individual constructs of the source language. The basic unit is a labeled statement or sentence; the label forms the link to the production language. A clean distinction is made between actions performed at compile time and actions performed at run time: run-time actions are enclosed in the code brackets 'CODE(' on the left and ')' on the right.

The statement types of FSL include most of the common constructs of programming languages. A statement consists of a series of commands, each of which may be

1. An assignment statement.

2. An operation on an element of the data structure. For example, both destructive and non-destructive reads and writes to stacks are permitted, and individual fields of tables may be manipulated.

3. A transfer of control, either as a simple unconditional jump or as a subroutine call.

4. A call on a system macro. For example, convenient notation for setting up floating addresses is provided.

5. A conditional statement, executing any of the above if the value of a Boolean expression is 'true'.

6. Almost any of the above enclosed in CODE brackets, to indicate run-time execution.

APPENDIX A

Who's Who in Compiler-Descriptive Languages

This is a compendium of the systems with some meta-compiling ability which have come to light in the course of this study. Some are flexible systems, as described in the text; others are little more than powerful macro-assembly systems. Some languages have been omitted: it is of little interest to note that assembly languages are commonly used for writing compilers. In addition, ALGOL and FORTRAN have both been used to implement other compilers, but neither is listed here as a compiler-descriptive language. A number of table-driven syntactic analyzers have also been omitted where there was no evidence of a formalism for machine-construction of the tables.

1. AMOS-II Pratt (1965)

AMOS was developed for a Ph.D. thesis at the University of Texas, where it is running on a CDC 1604. Pratt's primary interest is in generalized data structures; he uses a four-address machine-independent formalism, where each meta-instruction gives an identifier, the left and right halves of a pattern to be matched, and an interpretation to be executed if a match occurs.

2. Brooker and Morris (several papers) See also Rosen (1964)

The same formalism is used to describe the source language and to extend the descriptor language. See discussion in text.

3. CGS Warshall and Shapiro (1964)

Warshall and Shapiro constructed a "general-purpose" table-driven compiler which was (Spring, 1964) running JOVIAL, L_0 (formula language for CL-1), and CXA (a dialect of BALGOL) on the IBM 7090, Burroughs D-825, and the CDC 1604. Compilation was divided into five phases requiring three sets of driving tables; three corresponding languages resembling BNF were devised. This system has also been reported under the names "Bootstrap Method" and "COMPASS techniques". See discussion in text.

4. COGENT Reynolds (1964, 1965)

An initial version of COGENT (Compiler and GENeralized Translator) is running on the CDC 3600 at Argonne National Laboratory. The major objective of the system was "to unify the concept of syntax-directed compilation with the more general but primitive concept of recursive list-processing" (Reynolds 1965:422). Productions in the spirit of BNF define the syntax; generator definitions are list-processing subroutines used to manipulate the structures representing the source program.

5. FSL Feldman (1964)

FSL (Formal Semantics Language) is a system operating at Carnegie Tech. Floyd's (1961) productions are used to describe syntax; FSL is designed to handle semantics. See discussion in text.

6. GARGOYLE Garwick (1963a, 1963b, 1964)

The GARGOYLE language is designed to show the structure of syntax-tree analysis; recursion and back-tracking after faulty analysis show clearly in the syntax instructions. The meta-language consists of four-address conditional statements displayed in tabular form; the fields are named else, conditional action, next, and link. They may be read "if conditional is true then perform action and go to next; otherwise go to else." GARGOYLE appears to be similar to FSL, but somewhat more awkward because the semantic language is weaker and the syntax does not permit stack comparisons.

7. JOVIAL Shaw (1963a, 1963b)

JOVIAL was designed and implemented by the System Development Corporation as a corporate standard for military command and control problems. By 1963, JOVIAL was operating on the IBM 7090, IBM AN/FSQ-31V, IBM AN/FSQ-32, IBM AN/FSQ-7, Philco 2000, and CDC 1604. JOVIAL compilers are written and maintained almost entirely in JOVIAL; the systems were developed along UNCOL lines, with all generators translating to the intermediate language IL and a translator for each machine to take IL to machine language.

8. MAD Arden et al. (1963)

The MAD (Michigan Algorithm Decoder) compiler is primarily an algebraic translator based on the ALGOL-58 proposal. It is included here because of its facility for programmer definition of operators and operands at the time of compilation. The definitions are included as declarations in the source program and are written in a pseudo-code closely resembling the machine language of the target machine (IBM 709/90).

9. META-II Schorre (1964)
META-3 Schneider and Johnson (1964)

The META compilers (META-3 is an extension of META-II) are top-down syntax-directed compilers running on an IBM 1401 at UCLA. An input language resembling BNF with code-generating clauses embedded is reduced to a series of tests and references to external routines. The embedding of the code-generating clauses in the syntax specification makes the meta-language difficult to read.

10. Metcalfe (1963)

Using a model based on mechanical linguistics, Metcalfe has described a system for small binary machines. The syntax descriptions are set up in modified BNF, with object code embedded; this is converted to a one-address internal representation which appears to be a stream of macro calls.

11. NELIAC Halstead (1962, 1963), Huskey (1960, 1962a, 1962b, 1963)

NELIAC was developed in 1958 at the Navy Electronics Laboratory. By 1963, at least 18 versions were either running or in process. All NELIAC compilers are written in NELIAC; most are short and simple. The language is intended to be simple and easy to learn (Halstead (1963:92) claims one week is adequate); this apparently contributes materially to the ease of implementation.

12. SLANG Sibley (1961)

Sibley developed a language patterned after Algol 58, but did not complete an implementation. This is of interest chiefly because of the discussion of the type of information which should be included in a machine description for a compiler compiler.

13. TMG McClure (1965)

The TMG system is available on the IBM 7040, IBM 7090, and CDC 1604. A subset of PL/I has been written in TMG at Cal Tech (Datamation, October 1965, p. 19). Knuth's TROL language is a refinement of TMG; see description in text.

14. TRANGEN Cheatham et al (1964)

The TRANGEN (TRANslator GENerator) system was designed by Computer Associates for an IBM 7094. The language is formulated tabularly, with a production-like language for syntax and an algebraic form for semantics. An explicit test on patterns is used to identify syntactic constructions, and subroutine calls on standard routines are used to handle semantics. See discussion in text.

15. TROL/TROLL Knuth

Knuth devised TROL (TRanslator Oriented Language) and its associated interpreter TROLL to illustrate techniques of compiler generation. It was inspired by McClure's TMG; see discussion in text.

16. UNCOL SHARE (1958a, 1958b)

UNCOL (UNiversal Computer Oriented Language) was devised in 1958 by a SHARE committee as an attempt to provide a common intermediate language as a target language for formula-language translators and an object language for a certain set of machine assembly programs. The project apparently bogged down in specification difficulties, and JOVIAL seems to be the only significant language to arise from it.

17. WISP Wilkes (1964a)

WISP is a self-compiling compiler for a list-processing language; in 1964 it was not fully developed, but a preliminary version was written for the EDSAC 2. Work was underway on versions for the Elliott 803 and the IBM 7090. The system is table-driven, but the tables require handmade changes to change machines. The compilation technique involves comparing the input line with standard forms until a match is found, and converting to a pre-specified code sequence in assembly language. The language did contain a facility allowing a programmer to extend the language by defining new forms.

18 XPOP Halpern (1964)

XPOP is an extended assembly system, providing free-form macros over a standard assembly program. In Fall, 1964, there was apparently one running at Lockheed. The main point of the paper is the availability of prose- form macros and a mechanism for saving them. Other features of interest are compilation-control pseudo-operations to (a) hold off on embedding a segment of code in a program (for example, the bottom of a ); (b) put common code (say, subscript evaluation) in a pro- gram only once; and (c) make translation conditional on expressions evaluated at compile time. "


APPENDIX B Design Criteria for Compiler Compilers (Previously issued as CABAL Note No. 3)

In this note the CABAL group proposes a number of criteria to serve as guidelines for the design of compiler compilers. We have concentrated on the external form of the compiler-descriptive language and on general characteristics of the compilers and interpreters which are most likely to be written in it. We have not attempted to set desiderata for the internal structure, except as they relate to external design. Please note that these criteria represent ideals to strive for in designing a compiler compiler and are not necessarily specifications for the CIT CABAL Language.

The terms C^2, C^1, and C^0 are used throughout the paper to refer, respectively, to the compiler compiler itself, to a compiler written in C^2, and to a program written in the source language C^1. Confusion may arise when a language and its processor have the same name. In such cases subscripts P and L may be used to refer to the processor and the language respectively. Thus C^1_P is a processor written in C^2_L; C^0_P is a program written in C^1_L; and C^0_L might be considered the "language" in which data for C^0_P is written.

Mention is often made of sublanguages. These are the languages contained in the C^2 system. For example, FSL has two sublanguages: the Formal Production Language and the Formal Semantics Language.
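The three levels can be pictured with a small sketch in which each level is a function that produces the next. The toy language here (a vocabulary of one-word commands acting on a single value) and the names compiler_compiler, compiler, and program are assumptions chosen only to make the levels visible; they say nothing about how CABAL itself is organized.

```python
def compiler_compiler(description):
    """Stands in for the C^2 processor: accepts a language description
    and returns a C^1 processor for that language."""
    vocabulary = dict(description)

    def compiler(source):
        """Stands in for a C^1 processor: accepts source text written in
        the described language and returns a C^0 program."""
        steps = []
        for word in source.split():
            if word not in vocabulary:
                # Detected at "compile" time, before any data is seen.
                raise SyntaxError(f"unknown word {word!r}")
            steps.append(vocabulary[word])

        def program(value):
            """Stands in for a C^0 program: accepts data and computes."""
            for step in steps:
                value = step(value)
            return value

        return program

    return compiler


# Hypothetical language description: two command words.
counting = compiler_compiler({
    "INC": lambda value: value + 1,
    "DOUBLE": lambda value: value * 2,
})
add_one_double_add_one = counting("INC DOUBLE INC")
```

Calling add_one_double_add_one(3) runs the compiled program on the datum 3. The description is read once at the C^2 level, the source once at the C^1 level, and the data each time the C^0 program runs.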

I. C^2_L Constructs

C^2_L should provide a set of constructs complete enough and powerful enough to describe most of the current languages. This may prove impractical: consider, for instance, Iverson's Programming Language. In any case, C^2_L should be at least powerful enough to describe a processor for itself (C^2_P).

Ideally, C^2_L should also provide a way to describe the target languages for C^2 and C^1. An adequate notation for directly describing machine language has not yet been developed, so the target language "descriptions" will probably be the code generators in C^2. In any case, it should be possible to have a C^2 processor on machine X which produces C^1 processors for machine Y, where X is not necessarily the same as Y. In addition, the language should allow a variety of output forms; a language which produces only binary machine code is severely restricted.

To make the object machine language accessible to anyone who needs or wants it, and to give the compiler writer as much control over the machine as possible, each sublanguage of C^2_L should have an assembler embedded in it. It should not, however, be necessary to use machine language simply to get at special features of a particular machine; these features should be embedded in the C^2 processor for that machine. The systems programmer should have to describe only the compiling algorithm - not its implementation in terms of a particular machine.

C^2_L and C^2_P should be designed so that new data types, new operators, new statements, and the like can be added to C easily. This will probably not require much planning except, perhaps, in combining constructs. As an example of this problem, consider in ALGOL the use of a logic variable as a subscript to an array, as in the following program:

begin real L ; array A[l:000];

L


Many parts of a compiler can be (and should be) provided in a standard form, thus avoiding the need for specifying essentially the same routine for different C^1_P's. If this standard form may be parameterized, then even more C^1_P's may use it. Since there will undoubtedly be some processors for which the standard form would be unacceptable, C^2_L should provide in addition the ability for the systems programmer to replace the standard routines with his own, written either in C^2_L or in any other language (as long as the proper interface is created for the rest of the C^1 description). Some routines which probably should be handled in this way are the following:

a format language for C input and output

the compilation listings of programs compiled in C^2 and C^1

the input routine for C_P (subscan)

debug information for both C^1 and C^2

error messages in C^1 and C^2 (at "compile" and "run" time for each system). Both terse and verbose messages should be provided, with the user choosing between them.

some type of run-error recovery should be provided for both C^0 and C^1
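One way to picture standard routines that are parameterized and also replaceable is a table of defaults that the systems programmer may override by name. The sketch below is an assumption-laden illustration: the routine names, the message formats, and the function build_routines are invented for the example and are not part of any system described here.

```python
def standard_error_message(code, line, verbose=False):
    """Standard routine, parameterized by the terse/verbose choice."""
    if verbose:
        return f"error {code} on line {line}: see the language manual"
    return f"E{code}@{line}"


def standard_listing(lines):
    """Standard routine: number each source line for a compilation listing."""
    return [f"{number:4d}  {text}" for number, text in enumerate(lines, start=1)]


# Hypothetical table of routines supplied in standard form.
STANDARD_ROUTINES = {
    "error_message": standard_error_message,
    "listing": standard_listing,
}


def build_routines(**replacements):
    """Start from the standard routines; substitute any the user supplies.

    A replacement must keep the interface of the routine it replaces,
    since the rest of the processor description calls it the same way.
    """
    unknown = set(replacements) - set(STANDARD_ROUTINES)
    if unknown:
        raise KeyError(f"no standard routine named {sorted(unknown)}")
    routines = dict(STANDARD_ROUTINES)
    routines.update(replacements)
    return routines
```

A processor that is content with the defaults calls build_routines() with no arguments; one that needs its own listing format passes listing=its_own_function and inherits everything else.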

II. Notational Consistency

Notation should be concise but readable. This does not propose designing a language for the layman, but rather a notation which a sophisticated programmer need not use constantly to use fluently. Esoteric notation is pointless if it creates obscurities.

Similar constructs should have similar forms. An example of what is not desirable may be found in FSL where FLADI must be written with no spaces and COMT I, which is similar in shape, must be written with a space after the T.

Verbs which are used in more than one sublanguage of C^2 should have the same or parallel meanings in the various sublanguages.

Parallel statement structure should be used to describe operations occurring at meta-compile, compile, and run times. An example of this for meta-compile and compile times occurs in FSL where the same expressions are meaningful both inside and outside of code-brackets, with the only distinction being the time at which the statement is executed. Extending to run-time gives a way of describing a run-time routine (for example, an interpreter).

Parallel data structures should be available in C^2, C^1, and C^0. This means that any data forms in C^2 are available for use in C^1 and may also be made available to C^0, if the designer of C^1 so desires. Similarly, C^1_L should be allowed to contain an embedded assembly language.

C^2 should be able to produce tables for C^1 to use and C^1 should be able to do the same for C^0. Also, the content of any C^2 data structures of use to C^1 should be available to C^1 and similarly for C^1 data structures of use to C^0_P.

III. C^1 Languages

C^2 should be made useful for describing many different languages. It follows that there should be a minimum of design restrictions placed on C^1.

C^2 should provide as many as feasible of the "standard" constructs currently included in computer language processors; this includes such forms as transfer vectors, address chaining, error detection and recovery, and recursive routines.

Many different data structures should be available to the user: scalars, arrays, trees, graphs, queues, shelves, fields within words.

The systems programmer should be able to add data structures and statements to C^2_L to make it easier to handle C^1_L constructs.

Similarly, a system programmer using C^2 should be able to describe a C^1 which permits a program written in C^1_L to alter C^1_L by adding data, operators, etc.

C^2_L should be designed with the realization that language designers may want any of the following:

equivalence of locations - giving one location two or more names. This is useful in giving a scalar the same address as a frequently used array element so that the element may be accessed more simply

equivalence of symbol strings - assigning a symbol as an abbreviation for a symbol string. For instance A = B + C [4 + SIN(D + E)] or PI = 3.14159

partial word operations

macros

the ability to switch between languages (e.g., ALGOL-WHAT-COMMENT-SYSTEM)

references to the numeric value of constants used in C (e.g., 100 ABCONS)

symbolic conversational mode. (The implications of conversational programs are not well understood, but it seems that C^1 need only supply at run-time an index to the program: statement dictionary, symbol-attribute table, and the like. So C^2 need only supply a way for C^1 to give tables to C^0.)

hash-addressing in tables

ability to add pre-compiled "library routines" to the program (both in Cp and C J)

ability to have users' routines pre-compiled and added to the "library"

ability to precompile entire programs (segments) and link between them

ability to pass data from segment to segment without one segment having to output data onto a scratch area for the next segment to input

asynchronous execution of statements

ability to back up input stream

recursion

assembly language embedded in C^1.
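As an illustration of one item in the list above, equivalence of symbol strings, the following sketch expands abbreviations symbol by symbol, so that a name is replaced only when it appears as a whole symbol. The tokenizing rule and the function name expand are assumptions made for the example; a real C^1 language would fix its own rules for what counts as a symbol.

```python
import re

# Assumed symbol rule for this sketch: identifiers, decimal numbers,
# integers, or any single non-blank character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+\.\d+|\d+|\S")


def expand(text, equivalences):
    """Replace each abbreviating symbol by the symbol string it stands for.

    The expansion is a single pass: symbols inside a replacement string
    are not themselves expanded.
    """
    symbols = TOKEN.findall(text)
    return " ".join(equivalences.get(symbol, symbol) for symbol in symbols)


# The two equivalences used as examples in the list above.
EQUIVALENCES = {
    "A": "B + C [4 + SIN(D + E)]",
    "PI": "3.14159",
}
```

With these equivalences, expand("AREA = PI * R * R", EQUIVALENCES) substitutes the numeric string for PI while leaving AREA and R alone, because matching is done on whole symbols and not on substrings.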

IV. C^1 Processors

C^2_L should not restrict the types of processors which can be written in it:

Incremental compiling should be possible.

C^2 should allow both single pass and multiple pass compilers. There must be some form of communications between passes. The format of this should be defined by C^2, with the user being able to specify his own format by feeding parameters to the standard format or by writing his own. In multiple-pass compilers this allows any of the passes to be written in a language other than C^2_L.

Interpreters should be describable in C^2. Possibly this can be handled by enabling the creation of tables at compile- or metacompile-time for use at run-time and also providing a way to describe run-time routines in C^2.

Perhaps C^2 should be capable of producing either a compiler or an equivalent interpreter from the same language descriptions. Also, it might be worthwhile to provide several levels of such interpretation, the lowest level being one which works with the actual machine code and other levels working on an intermediate language (perhaps output from some pass of the corresponding compiler) or on the source code itself.

C^2_P should perform some optimization on the C^1_P it produces and C^2_L should provide means for the language designer to optimize the code which C^1 produces.
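The suggestion that one language description might yield either a compiler or an equivalent interpreter can be sketched with a toy postfix language. Here a single operator table (the "description") drives both an interpreter that works on the source text and a compiler that emits an intermediate code for a small stack machine. The language, the opcode names, and the function names are all assumptions made for this sketch.

```python
import operator

# Hypothetical language description: the operators of a postfix language
# whose only variable is the input x.
DESCRIPTION = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(description, source, x):
    """Interpreter level working on the source code itself."""
    stack = []
    for token in source.split():
        if token in description:
            right, left = stack.pop(), stack.pop()
            stack.append(description[token](left, right))
        else:
            stack.append(x if token == "x" else int(token))
    return stack.pop()


def compile_to_code(description, source):
    """Compiler: translate once into (opcode, operand) pairs."""
    code = []
    for token in source.split():
        if token in description:
            code.append(("APPLY", token))
        elif token == "x":
            code.append(("LOAD_INPUT", None))
        else:
            code.append(("PUSH", int(token)))
    return code


def run_code(description, code, x):
    """Interpreter level working on the intermediate language."""
    stack = []
    for opcode, operand in code:
        if opcode == "APPLY":
            right, left = stack.pop(), stack.pop()
            stack.append(description[operand](left, right))
        elif opcode == "LOAD_INPUT":
            stack.append(x)
        else:
            stack.append(operand)
    return stack.pop()
```

Both routes consult the same DESCRIPTION, so adding an operator to the table extends the interpreter and the compiler together. A lower level of interpretation would replace run_code with execution of actual machine code, which this sketch does not attempt.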

V. A Concrete Example

The previous discussion assumed nothing about the sublanguages of C^2. Now, assume that C^2 will be made up of the languages SYNTAX in the form of Floyd-Evans Productions, SEMANTICS, much like FSL, and PRAGMATICS, some form of code sequence specification language. Assume also that SYNTAX, SEMANTICS, and PRAGMATICS can either work alone, using output routines to pass information to a later phase of the processor, or they can work together, as SYNTAX and SEMANTICS do in FSL - call this latter object the S-S-P language. An implementation of C^1_P may contain any arrangement of phases written in any of the languages: SYNTAX, SEMANTICS, PRAGMATICS, or S-S-P.

The following are additional design criteria based on this form of the system and are mostly a discussion of the differences from FSL:

To allow multipass compilers, it will be necessary to add other links between phases besides EXEC and code-brackets. A generalized OUTPUT action in productions and a generalization of the code-brackets in FSL (call them next-pass brackets) should prove sufficient. Similarly, expanded input routines are needed.

These communications links should be supplied (with parameters) in C^2_L but should be replaceable by the system programmer's routine. Some of the other things which should be supplied in C^2_L but which should be replaceable are:

code generators

subscan - or parts thereof, such as the definition of an identifier

storage allocation


APPENDIX C Bibliography

Reference List for Compiler Compilers

Three types of information are indexed:

Theory and practice of compiler compilers. This list attempts to be complete.

Description of particular programming languages. This list makes no attempt to be complete.

Scattered papers on assorted compilation techniques. These items are noted in passing, and no attempt at completeness has been made.

Arden, B. et al. 1963 The Michigan Algorithm Decoder.

Bobrow, Daniel G. and Raphael, Bertram 1964 A Comparison of List-Processing Computer Languages. CACM 7,4; pp. 231-240.

Backus, J. W. 1959 The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference. Proceedings, International Conference on Information Processing, UNESCO; pp. 125-132.

Barnett, M. P. and Futrelle, R. P. 1962 Syntactic Analysis by Digital Computer. CACM 5; pp. 515-526.

Basak, Robert et al. 1962 An Information Algebra. Phase I Report--Language Structure Group of the Codasyl Development Committee. CACM 5,4; pp. 190-204.

Bastian, A. L. 1962 A Phrase-Structure Language Translator. Air Force Cambridge Research Labs. Report No. AFCRL-62-549.

Bennett, Richard K. and Neumann, David H. 1964 Extension of Existing Compilers by Sophisticated Use of Macros. CACM 7,9; p. 541.

Berman, R. et al. 1962 Syntactical Charts of COBOL 61. CACM 5; p. 260.

Book, Erwin and Bratman, Harvey 1960 Using Compilers to Build Compilers. SP-176, System Development Corp.

Bottenbruch, H. 1962 Structure and Use of ALGOL 60. JACM 9; pp. 161-221.

Bratman, Harvey 1961 An Alternate Form of the "UNCOL Diagram". CACM 4,3; p. 142.

Brooker, R. A. and Morris, D. 1960A An Assembly Program for a Phrase Structure Language. Comp J 3; p. 168.

Brooker, R. A. et al. 1963 The Compiler Compiler. Annual Review in Automatic Programming. V.3; p. 229

Brooker, R. A. and Morris, D. 1961 A Description of Mercury Autocode in Terms of a Phrase Structure Language. Annual Review in Automatic Programming. V.2; pp. 29-66

Brooker, R. A. and Morris, D. 1962 A General Translation Program for Phrase Structure Languages. JACM 9,1; pp. 1-10.

Brooker, R. and Morris, D. 1960B Some Proposals for the Realization of a Certain Assembly Program. Comp J 3; p. 220.

Brooker, R. et al. 1962 Trees and Routines. Comp J 5; p. 33.

Brown, S. A. et al. 1963 A Description of the APT Language. CACM 6; pp. 649-658.

Cantor, D. G. 1962 On the Ambiguity Problem of Backus Systems. JACM 9; pp. 477-479.

Cheatham, T. E. 1964 The Architecture of Compilers. Computer Associates CAD-64-2-R.

Cheatham, T. E. and Sattley, Kirk 1964 Syntax Directed Compiling. Proceedings, Spring Joint Computer Conference, AFIPS; pp. 31-57.

Cheatham, T. E. et al. 1964 Preliminary Description of the Translator Generator System--II. Computer Associates CA-64-1-SD.

Conway, Melvin E. 1963 Design of a Separable Transition-Diagram Compiler. CACM 6,7; p. 396.

Cunningham, Joseph F. 1963 COBOL. CACM 6,3; p. 79.

DiForino, A. Caracciolo 1962 On a Research Project in the Field of Languages for Processor Construction. Proceedings, IFIP 1962; pp. 514-515.

DiForino, A. Caracciolo 1963 Some Remarks on the Syntax of Symbolic Programming Languages. CACM 6,8; p. 456.

Dijkstra, E. W. 1963 On the Design of Machine Independent Programming Languages. Annual Review in Automatic Programming, V.3; pp. 27-42.

Earley, Jay C. 1965 Generating a Recognizer for a BNF Grammar. Carnegie Institute of Technology.

Eickel, J. et al. 1963 A Syntax Controlled Generator of Formal Language Processors. CACM 6,8; p. 451.

Englund, Donald and Clark, Ellen 1961 The Clip Translator. CACM 4,1; p. 19.

Evans, Arthur 1964 An ALGOL 60 Compiler. Annual Review in Automatic Programming, V.4; pp. 87-124.

Feldman, Jerome A. 1964 A Formal Semantics for Computer Oriented Languages. Carnegie Institute of Technology.

Floyd, R. W. 1964A Bounded Context Syntactic Analysis. CACM 7; pp. 62-67.

Floyd, R. W. 1961 A Descriptive System for Symbol Manipulation. JACM 8; pp. 579-584.

Floyd, R. W. 1962A On Ambiguity in Phrase Structure Languages. CACM 5; pp. 526, 534.

Floyd, R. W. 1962B On the Non-Existence of a Phrase Structure Grammar for ALGOL-60. CACM 5; pp. 483-484.

Floyd, R. W. 1963 Syntactic Analysis and Operator Precedence. JACM 10; pp. 316-333.

Floyd, R. W. 1964B The Syntax of Programming Languages--A Survey. IEEE Transactions on Electronic Computers EC-13,4; pp. 346-353.


Gallie, T. 1962 Techniques for Processor Construction. Proceedings, IFIP 1962; pp. 526-527.

Garwick, Jan V. 1964 Gargoyle, A Language for Compiler Writing. CACM 7,1; p. 16.

Garwick, Jan V. 1963A A Programming Language for Compiler Construction and Symbol Manipulation: Preliminary Report. FFI-MAT Teknisk Notat S-54 (Norwegian Defense Research Establishment).

Garwick, Jan V. 1963B A Programming Language for Logical Computer Programmes: Second Preliminary Report. FFI-MAT Teknisk Notat S-55 (Norwegian Defense Research Establishment).

Glennie, A. E. 1960 On the Syntax Machine and the Construction of a Universal Compiler. Computation Center, Carnegie Institute of Technology; Technical Report No. 2.

Gorn, S. 1963 Detection of Generative Ambiguities in Context-Free Mechanical Languages. JACM 10; pp. 196-208.

Graham, Robert M. 1964 Bounded Context Translation. Proceedings, Spring Joint Computer Conference, AFIPS; pp. 17-29.

Grau, A. A. 1962 A Translator-Oriented Symbolic Programming Language. JACM 9,4; pp. 480-487.

Green, J. 1962 Symposium on Languages for Processor Construction. Proceedings, IFIP 1962; pp. 513-517.

Halpern, Mark I. 1964 XPOP: A Meta-Language Without Metaphysics. Proceedings, Fall Joint Computer Conference, AFIPS; pp. 57-68.

Halstead, M. H. 1962 Machine-Independent . Spartan Books, Washington, D.C.

Halstead, M. H. 1963 NELIAC. CACM 6,3; p. 91.

Heising, W. P. 1963 FORTRAN. CACM 6,3; p. 85.

Huskey, Harry D. 1962A Languages for Aiding Compiler Writing. Symbolic Languages in Data Processing; p. 187.

Huskey, Harry D. 1962B Machine Independence in Compiling. Symbolic Languages in Data Processing; p. 219.

Huskey, Harry D. et al. 1960 NELIAC--A Dialect of ALGOL. CACM 3,8; pp. 463-468.

Huskey, Harry D. et al. 1963 A Syntactic Description of BC NELIAC. CACM 6,7; p. 367

Iliffe, John K. 1960 The Elements of the Genie System. Rice University Computer Project, Programming Memorandum 4.

Ingerman, P. Z. 1963 A Syntax Oriented Compiler. Moore School of Electrical Engineering, University of Pennsylvania.

Ingerman, P. Z. 1962 Techniques for Processor Construction. Proceedings, IFIP 1962; pp. 527-528.

International Standards Organization 1963 Survey of Programming Languages and Processors. CACM 6,3; p. 93.

Irons, Edgar T. 1963 The Structure and Use of the Syntax-Directed Compiler. Annual Review in Automatic Programming, V.3; pp. 207-227.

Irons, E. T. 1961 A Syntax Directed Compiler for ALGOL 60. CACM 4; pp. 51-55.

Iverson, Kenneth E. 1964 A Method of Syntax Specification. CACM 7,10; p. 588.

Knuth, D. E. 1962 History of Writing Compilers. Digest of Technical Papers, 1962 ACM National Conference; p. 43.

Knuth, D. E. n.d. Draft of Chapter 12 of an unpublished book.

Landen, W. H. and Wattenburg, W. H. 1962 On the Efficient Construction of Automatic Programming Systems. Digest of Technical Papers, 1962 ACM National Conference; p. 91.

Ledley, Robert S. and Wilson, James B. 1962 Automatic-Programming-Language Translation Through Syntactical Analysis. CACM 5,3; pp. 145-155.

London, R. L. 1964 A for Discovering and Proving Recognition Rules for Backus Normal Form Grammars. 19th Annual Conference, ACM.

Masterson, Kleber S. 1960 Compilation for Two Computers with NELIAC. CACM 3,11; pp. 607-611.

McClure, R. M. 1965 TMG--A Syntax-Directed Compiler. Proceedings, 20th National Conference, ACM; pp. 262-274.

Metcalfe, Howard H. 1964 A Parametrized Compiler Based on Mechanical Linguistics. Annual Review in Automatic Programming, V.4; pp. 125-165.

Metcalfe, Howard H. 1963 A Parametrized Language Based on Mechanical Linguistics. Planning Research Corporation PRC R-311.

Morris, D. 1964 The Use of Syntactic Analysis in Compilers. Introduction to System Programming; p. 249.

Naur, Peter 1963 Documentation Problems: ALGOL 60. CACM 6,3; p. 77.

Naur, Peter 1963 Revised Report on the Algorithmic Language ALGOL 60. CACM 6; pp. 1-17 / Numerische Mathematik 4; pp. 420-452 / Comp J 5; pp. 349-367.

Newell, Allen 1963 Documentation of IPL-V. CACM 6,3; p. 86.

Opler, A. 1962 "Tool" A Processor Construction Language. Proceedings, IFIP 1962; pp. 513-514.

Paul, M. 1962 ALGOL 60 Processors and a Processor Generator. Proceedings, IFIP 1962; pp. 493-497.

Perlis, Alan J. et al. 1965 A Preliminary Sketch of Formula ALGOL. Carnegie Institute of Technology.

Pratt, Terrence W. 1965 Syntax-Directed Translation for Experimental Programming Languages. University of Texas Computation Center TNN-41.

Pyle, I. C. 1963 Dialects of FORTRAN. CACM 6,8; p. 462.

Rabinowitz, I. N. 1962 Report on the Algorithmic Language FORTRAN II. CACM 5; pp. 327-337.

Reynolds, John C. 1964 Cogent Programming Manual. Applied Mathematics Division, Argonne National Laboratory.

Reynolds, J. C. 1965 An Introduction to the Cogent Programming System. Proceedings, 20th National Conference, ACM; pp. 422-436.

Rosen, Saul 1964 A Compiler-Building System Developed by Brooker and Morris. CACM 7,7.

Rutishauser, H. 1962 Panel on Techniques for Processor Construction. Proceedings, IFIP 1962; pp. 524-531.

Samelson, K. 1962 Programming Languages and Their Processing. Proceedings, IFIP 1962; pp. 487-492.

Sammet, J. E. 1961 Detailed Description of COBOL. Annual Review in Automatic Programming, V.2; pp. 197-230

Schwarz, H. 1962 An Introduction to ALGOL. CACM 5; pp. 82-95.

Scott, D. W. 1962 Wizor, A Compiler Compiler for the GE 225 Computer. Digest of Technical Papers, 1962 ACM National Conference; pp. 46-47.

Schneider, Frederick W. and Johnson, Glen D. 1964 Meta-3; A Syntax-Directed Compiler Writing Compiler to Generate Efficient Code. Proceedings, 19th Annual Conference, ACM; p. D1.5-1.

Schorre, D. V. 1964 Meta II: A Syntax-Oriented Compiler Writing Language. Proceedings, 19th Annual Conference, ACM; p. D1.3.

Share Ad-Hoc Committee on Universal Languages 1958A The Problem of Programming Communication with Changing Machines: A Proposed Solution. CACM 1,8; pp. 12-18.

Share Ad-Hoc Committee on Universal Languages 1958B The Problem of Programming Communication with Changing Machines: A Proposed Solution--Part 2. CACM 1,9; pp. 9-15.

Shaw, Christopher J. 1963A Jovial and its Documentation. CACM 6,3; p. 89.

Shaw, Christopher J. 1963B A Specification of Jovial. CACM 6,12; p. 721.

Sibley, R. A. 1961 The Slang System. CACM 4,1; p. 75.

Standards 1963 Toward Better Documentation of Programming Languages. CACM 6,3; pp. 76-100.

Steel, T. B. 1961 A First Version of UNCOL. Proceedings, Western Joint Computer Conference, AFIPS; pp. 371-378.

U. S. Government Printing Office 1961 COBOL 61, Revised Specifications for a Common Business-Oriented Language. GPO 0-598941.

Warshall, Stephen 1961 A Syntax Directed Generator. Proceedings, Eastern Joint Computer Conference, AFIPS; pp. 295-305.

Warshall, Stephen, and Shapiro, Robert M. 1964 A General-Purpose Table-Driven Compiler. Proceedings, Spring Joint Computer Conference, AFIPS; pp. 59-65.

Watt, J. B. and Wattenburg, W. H. 1961 A NELIAC-Generated 7090-1401 Compiler; Digest of Technical Papers, 1961 ACM National Conference; p.28-5.

Watt, J. B. and Wattenburg, W. H. 1962 A NELIAC-Generated 7090-1401 Compiler; CACM 5,2; pp. 101-102.

Wilkes, M. V. 1964A An Experiment with a Self-Compiling Compiler for a Simple List-Processing Language. Annual Review in Automatic Programming, V.4; pp. 1-48.

Wilkes, M. V. 1964B Constraint-Type Statements in Programming Languages. CACM 7,10; p. 587.

Yngve, Peter H. 1963 COMIT. CACM 6,3; p. 83.
