SYNTAX ANALYSIS BY A PRODUCTION

by

Arthur Evans, Jr.

Submitted to the Carnegie Institute of Technology

in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

Pittsburgh, Pennsylvania

1965

INTRODUCTION

Over the years users of computers have been writing programs whose correctness could only be taken as a matter of faith. If the study of programming is to become more of a science and less of an art, however, it becomes necessary that algorithms be accompanied by proof of correct operation. To quote McCarthy (1965), "The prize to be won ... is the elimination of debugging. Instead a programmer will present a computer-checked proof that a program has the desired properties." The present work is one of the many small steps which will have to be taken before this utopian goal may be achieved.

The history of proving computer algorithms is rather sparse.

Chapter 7 contains a brief discussion of some of the work that has been done, accompanied by a comparison with the present work. It appears safe to say that no previous work, to this author's knowledge, has attacked a programming algorithm of the complexity of that treated here.

The present effort is concerned with proving the correctness of a specific and practical translation algorithm which translates a segment of an ALGOL-like language to postfix, or reverse Polish, form. It should be emphasized that specific algorithms are introduced and their properties analyzed, but that it is not the goal of this research to develop general proof schemes which can be applied to a class of translation algorithms. It would be desirable, of course, if it were possible to prove that an entire ALGOL translator was correct. Indeed, the initial goal of this research was to prove the correctness of the algorithm used in the ALGOL translator running at the Carnegie Tech Computation Center. Unfortunately, this proved

Acknowledgements

I am deeply indebted to Professor Alan J. Perlis for his guidance in this work. He has made available to me much of his valuable time to provide the counsel and advice needed to bring this project to fruition.

Much of the programming of the QWERT system, which played a valuable part in checking algorithms before an attempt was made to prove them, has been done by Mrs. Carol H. Thompson, to whom I express my gratitude.

Further, I am grateful to the "ALGOL crew" at the Computation Center who have implemented in a working translator many of the ideas which developed in connection with this research, particularly to Mrs. Janet W. Fierst, the leader of the group, and Mr. David M. Blocher, who implemented the production interpreter. I am also grateful to the entire staff of the Computation Center for providing smooth use of the computer, without which the work could not have been done.

The final typing has been done quickly and accurately by Mrs. Edythe Simmons, to whom I express appreciation.

Finally, I am particularly grateful to my wife, Betty, who, along with my children, has been most patient during a trying time.

The research reported here was supported by the Advanced Research Projects Agency under the Department of Defense under the Grant SD-146 to Carnegie Institute of Technology.

translation rules) and the A-productions. We show that the algorithm defined by the productions produces precisely the translation given by the translation rules.

In Chapter 5 we carry out the program of Chapter 4 for the B-productions and B-Grammar.

In Chapter 6 we show that the A-productions are equivalent to another set of productions in that both accept the same set of input strings and produce the same output. The new productions are in a form more useful for certain applications.

Chapter 7 contains a summary of the results and discussion of the relation of this work to other work in the field.

The appendix contains a very brief discussion of a programming system for the production language. Sample computer outputs are included.


impractical for several reasons, not the least of which was the fact that the algorithm is not correct. (In the Carnegie Tech system, meanings are assigned to such non-ALGOLic constructions as A+BAC. One of the properties of a "correct" algorithm, as will be discussed later in some detail, is that it must reject any string which is not legal. Thus the Carnegie Tech ALGOL translator is not in this sense correct, since it accepts non-ALGOLic strings. Of course, this deficiency does not keep it from being useful.)

Another reason for not considering the ALGOL productions is the sheer size of the effort involved: There are over 650 productions in the ALGOL translator. At the present stage of developing techniques, it was not felt practical to undertake a proof of this magnitude. Instead, it was felt more appropriate to consider a smaller body which was more easily handled. The hope was that techniques could be developed which eventually might be applicable to larger tasks. An approach which might well be fruitful would be to merge the present techniques with those of London (1964), with the possible result of mechanically proving an entire ALGOL translator. This point will be discussed in the Summary in Chapter 7 in further detail.

For the reasons given, it seemed appropriate to consider a subset of ALGOL assignment statements. Actually, two are considered in detail: the A-language and the B-language. The first is a very simple language used as an example as the techniques are developed in the first four chapters, and the second is then covered in Chapter 5. The B-language includes assignment statements with multiple left parts, both arithmetic and Boolean expressions on the right and expressional parentheses. Not included are procedures with parameters, subscripted variables, or the ALGOL construction "if ... then ... else ...". Further, the only arithmetic operators are plus and times; subtraction, division and exponentiation are not permitted. A few words of comment about these exclusions are in order, after a few preliminary comments.

The proof techniques to be presented involve lengthy and complex case analysis. It seemed thus appropriate to select a grammar which was representative of the general problem but for which the proofs would not be excessively tedious. With the exception of the "if ... then ... else ..." construction, all of the omissions just listed are of aspects of ALGOL which are not felt to be critical. The techniques used to handle expressional parentheses could easily be adapted to subscripted variables and to procedure calls. Further, adding more operators would require no new techniques, although it would lengthen the proofs. Thus these omissions seemed consistent with the purpose of developing techniques. More important, certain complications were deliberately included. These include the use of "+" both as a binary and as a unary operator, and permitting mixed arithmetic and Boolean expressions, as in the statement

a _ b A d _ e * f ;

The "if ... " construction was excluded to reduce the case analysis. Although this construction seems to be essentially different from any of the constructs which are presently accepted, it is felt that the bracketing technique introduced in Chapter 1 could be expanded to handle it. This point will be discussed further in the Summary in Chapter 7.

One other simplification has been made in this work. It is assumed that identifiers and constants, as used in ALGOL and other programming languages, have been "taken care of" in some earlier part of the processing. Thus the only operand treated in this work is the symbol "I" (mnemonic for identifier). Treating ALGOL-like identifiers introduces the problems of scanning, concatenation and internal machine representation, and these did not seem to be the linguistically important problems. Instead, the emphasis in this research is on syntactic analysis of source code and translation into another form. Floyd (1961-b) has shown a production scheme which processes identifiers and (some) constants, and it is presently planned to use such a scheme in the next version of the Carnegie Tech ALGOL Translator. This point will not be pursued further.

It is clear that we cannot go about proving an algorithm unless we are able to say what the algorithm is to do. If, for example, we were setting out to prove a square root algorithm, we could say, "The algorithm delivers a number with the property that its square differs from the input by less than epsilon." For a translation algorithm, however, it is necessary first to define what translation it is that the algorithm is to do. It is not enough to say that the algorithm is to translate assignment statements into postfix, since the term "postfix" may mean different things to different people. Instead, we must define explicitly what translation is to be produced for any given input.

But we must do more than tell what output is to be produced when the translator is supplied legal input. We want the translator to give an appropriate error signal if the input is invalid. Further, we want to be sure that for any finite input the translator does not loop forever or otherwise act in a pathological manner. We will make a claim somewhat like the following: Any legal input sentence will be translated into the proper postfix, and any other input will be rejected as being invalid. Thus we have two tasks: We must specify just which strings, out of all possible strings over an alphabet, are to be considered as "legal input", and we must define for each such string what translation is to be produced.

The first task is easy, since techniques for this purpose are well known. We will base our work on the notation used in defining the language ALGOL-60, the notation referred to as Backus Normal Form, abbreviated BNF. For the second task, we will append to BNF a notation which associates with each construct in the language a translation rule. We will show that this scheme associates with each legal string in the language a unique postfix representation. Thus, having accomplished the two tasks, we will have defined precisely what it is that the algorithm is supposed to do.

Before continuing, we pause to make a few comments about the notation we will introduce to define language. The technique we use to define a BNF grammar causes the constructs of the language to be sets of strings. This approach follows that of Floyd (1961), but differs from that frequently used by mathematical linguists. The latter usually define a grammar to be a set of rules ("productions") for generating legal strings. Since the problem in translator construction is to develop techniques for efficiently deciding which category (if any) a given string belongs to, our approach seems to be more appealing. We concern ourselves with whether a string belongs to a set (the set of legal strings) rather than whether the string can be generated by a given group of rules.

Our plan of attack may be described informally as follows, using notation and terminology that will not be used in the rest of the work.

After defining carefully what we mean by a Backus Naur Form (BNF) Grammar and what it means for a BNF grammar to be unambiguous, we exhibit a specific BNF grammar G which defines a language L. Our first important result is the

Theorem: The grammar G is unambiguous.

We next will give a set of Floyd-Evans productions, PR. (We will not now distinguish between the productions and the algorithm which they define.) Given any string as input, PR must do one of three things:

(1) It may halt, having produced an output string whose last character is other than "ERR*".

(2) It may halt, having produced an output string whose last character is "ERR*".

(3) It may cycle in a loop "forever", scan past the end of the input string, or exhibit other "pathological" behavior.

In the first case, we say, "The productions have run to a successful conclusion." The string produced is called the resulting translation of the source string. In the second case, we say, "The productions have detected an error." The characters in the output string (other than the ERR*) are of no interest. The third category is described by the statement, "The productions fail."

Now, for each legal sentence a in G, let TR(a) be the translation of a as defined by the translation rules. Then, using McCarthy's notation of conditional expressions, we define for any string a

$$E(a) = (a \in L \rightarrow TR(a),\ \text{true} \rightarrow \text{¢})$$

Note that E, as we have defined it, is a total function in that it has a well-defined value for any string a.

We now develop notation to describe the action of the productions. For any string a for which the productions run to a successful conclusion (case 1 above) we say that the string produced is PR(a). We define a function $\Pi(a)$ as follows: Apply the productions PR to a, and let the predicate "case 1" mean that case 1 described above transpires. Define "case 2" similarly. Then

$$\Pi(a) = (\text{case 1} \rightarrow PR(a),\ \text{case 2} \rightarrow \text{¢})$$
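The two conditional-expression definitions above can be mirrored in a few lines of Python. This is only an illustrative sketch: the names `cond`, `specified`, `produced` and the value `ERROR` are assumptions introduced here, and the toy language and translation rule used in any call are hypothetical. A conditional expression yields the value of its first clause whose predicate holds and is undefined (here `None`) when no predicate holds.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# A clause pairs a predicate with the value delivered when the predicate holds.
Clause = Tuple[Callable[[], bool], Callable[[], str]]

ERROR = "error"  # hypothetical stand-in for the error value used in the text


def cond(clauses: Sequence[Clause]) -> Optional[str]:
    """Evaluate (p1 -> e1, p2 -> e2, ...): the value of the first clause whose
    predicate holds, or None (undefined) when no predicate holds."""
    for predicate, value in clauses:
        if predicate():
            return value()
    return None


def specified(a: str, language: Set[str], tr: Callable[[str], str]) -> Optional[str]:
    """The specification: TR(a) for a legal string, the error value otherwise.
    The final clause has predicate `true`, so a value is always delivered."""
    return cond([(lambda: a in language, lambda: tr(a)),
                 (lambda: True, lambda: ERROR)])


def produced(case: int, output: str) -> Optional[str]:
    """The behaviour of the productions, where `case` is 1, 2 or 3 as listed
    earlier. No clause covers case 3, so the result is then undefined."""
    return cond([(lambda: case == 1, lambda: output),
                 (lambda: case == 2, lambda: ERROR)])
```

Because the last clause of `specified` always fires, that function is total by construction. `produced` returns `None` in case 3, which is exactly the gap that a proof of termination has to close.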

It appears that $\Pi$ is a partial function since it is not defined if the productions fail. However, we will be able to prove the

LEMMA: $\Pi$ is a total function.

We prove this by showing that the productions must eventually halt. In programming terms, we show that the program will never loop forever, for any input data. The principal result of this investigation then is the

THEOREM: For any string a, $E(a) = \Pi(a)$.

This theorem implies that the productions will terminate successfully if, and only if, the input string is a sentence in L. Further, after the halt, the output produced will be the translation defined by the translation rules.

CHAPTER 0 - STRINGS

In this section we present the properties of strings which we will need. We assume an alphabet consisting of a finite collection of distinct characters. Characters will be represented by lower case Greek letters. A string is a finite, ordered, non-empty sequence of characters from the alphabet. (On a few rare occasions we will permit an empty string, but we will note carefully such cases.) Strings will be represented by lower case letters. The concatenation of two strings is represented by juxtaposing their names. Thus the string a b is the string a followed by the string b.

When a character of a string is mentioned in English text, it will be enclosed in quote marks ("). When a string is exhibited on a separate line, quotes will not be used. Similarly, the name of a string appearing in text will be underlined (as in the last sentence of the preceding paragraph), but string names on separate lines will not be underlined. Thus we could say that the string a is "I" b ")" or we could say

a = I b )

We will refrain from using the mark "=" in the alphabets under consideration, so we may use that mark in defining strings as above. Similarly, we will be careful about alphabets so as to eliminate possible confusion between string names and characters in strings. (In the above example, b is a string name, since it is underlined in its appearance in text.)

We are frequently interested in considering various parts of a string. A string a is deconcatenated if it can be written as $a_1 a_2 \ldots a_n$, where each of the $a_i$ is a string. The function first(a) is defined for any string a and is the string consisting of the first character of a. b is a substring of a string a if there are strings c and d such that a = c b d; here either c or d or both may be empty. A head of a string a is a substring which includes the first character of a. Similarly, a tail of a is a substring which includes the last character of a.
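The string operations just defined translate directly into Python, with ordinary `str` values standing in for strings over the alphabet. This is a sketch for illustration only; the function names other than `first` are assumptions introduced here, and the empty string is excluded throughout, in keeping with the convention that strings are non-empty.

```python
from itertools import combinations
from typing import Iterator, Tuple


def first(a: str) -> str:
    """The string consisting of the first character of a."""
    if not a:
        raise ValueError("first is defined only for non-empty strings")
    return a[0]


def is_substring(b: str, a: str) -> bool:
    """True when a = c b d for some (possibly empty) c and d."""
    return bool(b) and b in a


def is_head(b: str, a: str) -> bool:
    """A head is a substring that includes the first character of a."""
    return bool(b) and a.startswith(b)


def is_tail(b: str, a: str) -> bool:
    """A tail is a substring that includes the last character of a."""
    return bool(b) and a.endswith(b)


def deconcatenations(a: str, n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every way of writing a as n non-empty pieces a_1 a_2 ... a_n."""
    if not a or n < 1:
        return
    # Choose n - 1 cut positions strictly inside the string.
    for cuts in combinations(range(1, len(a)), n - 1):
        bounds = (0, *cuts, len(a))
        yield tuple(a[i:j] for i, j in zip(bounds, bounds[1:]))
```

For example, `list(deconcatenations("abc", 2))` enumerates the two pieces-of-two splittings of `"abc"`, and a request for more pieces than there are characters yields nothing.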

For any string a, the function length(a) is the number of characters in a. For any character $\alpha$, length($\alpha$) = 1, and for any string a, length(a) $\geq$ 1. If a = b c, length(a) = length(b) + length(c).

CHAPTER 1 - BACKUS NAUR FORM

In this chapter, we will define quite carefully the concept of a Backus Naur Form Grammar, referred to as a BNF grammar. (We follow Knuth (1964) in using this name rather than Backus Normal Form.) From this we get the concept of a BNF language. Next, we consider the concept of ambiguity and define what it means for a BNF grammar to be unambiguous.

To explain further the concepts introduced, we exhibit a specific grammar (the "A-grammar") and show that it is a BNF grammar. Finally, we show that the A-grammar is unambiguous.

This chapter is divided into sections, as follows:

1. Definition of BNF grammar

2. Ambiguity of BNF grammar

3. Example: The A-grammar

4. Brackets and balanced strings

5. Theorems on BNF grammars

6. Unambiguity of the A-grammar

Section 1: Definition of BNF Grammar. In the literature on mathematical linguistics as it relates to the construction of translators, there seems to be a dichotomy in how grammars are defined. The more mathematically inclined use productions, after the manner of Chomsky (1959) and others. Those more interested in programming use Backus Naur Form (Backus 1959, Naur et al, 1963). Knuth (1964) has pointed out that both forms are essentially equivalent, although each has advantages in certain areas. He adds, "It is much easier to do theoretical manipulations using production systems and systematically avoiding [those things which distinguish BNF from production form]." (Italics Knuth's.) Actually, the BNF form can be used for theoretical manipulations, if one chooses. We will now provide a formal definition of Backus Naur Form, defining precisely the same form used in the ALGOL Report (Naur et al, 1963). Further, we will provide our own definition of what it means for a grammar to be unambiguous, since the usual definitions do not carry over directly.

We accept as given some alphabet of characters, the alphabet being assumed not to include the following four marks:

< > ::= |

We will represent members of the alphabet by lower case Greek letters: $\alpha, \beta, \gamma, \ldots, \alpha_1, \alpha_2, \ldots$. A member of the alphabet will sometimes be referred to as a terminal symbol. The entire alphabet (that is, the set including all of the letters of the alphabet) is $V_t$, the terminal vocabulary.

Strings of characters from this alphabet will be represented by lower case Latin letters: $a, b, c, \ldots, a_1, a_2, \ldots$. The conventions of Chapter 0 will be used.

A category is a set of strings over $V_t$. Categories will be named by enclosing English phrases or abbreviations in the meta-linguistic brackets < >. Categories will be represented by upper case Latin letters from the beginning of the alphabet: $A, B, C, \ldots, A_1, A_2, \ldots$.

The non-terminal vocabulary $V_n$ is the set of all names of categories. The entire vocabulary, V, is the union of $V_n$ and $V_t$. General members of V will be represented by lower case Greek letters from the middle of the alphabet: $\pi, \rho, \sigma, \ldots, \pi_1, \pi_2, \ldots$. Thus the character "$\pi$" appearing in text stands for either a character of $V_t$ or the name of a category.

Definition 1.1: An alternate is similar to a category, in that it is a set of strings, but differs in how it is named and how it will be used. An alternate is named by a string over the vocabulary V, and will be represented by an upper case Latin letter from the end of the alphabet: $S, T, U, \ldots, S_1, S_2, \ldots$. The name of an alternate defines the strings which belong to it. Consider an alternate S named $\pi_1 \pi_2 \ldots \pi_n$. Then a particular string a is a member of S if, and only if, a can be written as the concatenation of n substrings $a_1 a_2 \ldots a_n$, where each substring corresponds to the corresponding element of the name of S in the following way: If $\pi_i$ is a terminal symbol, then $a_i$ is the same symbol; while if $\pi_i$ is a category name (the only other possibility) then the string $a_i$ is a member of that category.

Definition 1.2: A category is defined by the following notation:

$$A ::= S_1 \mid S_2 \mid \cdots \mid S_n$$

The marks "::=" and "|" are meta-linguistic connectives which may be read "is defined as" and "or", respectively. In a definition such as the above, we say that A is defined to be the union of the alternates $S_1, \ldots, S_n$. Thus a string is a member of the category A if it is a member of (at least) one of the alternates $S_i$ appearing in the definition of A.

Definition 1.3: Two collections of categories are said to be joined if there is at least one category which is defined in one collection and used in a definition in the other collection. Two collections are unjoined if they are not joined.

Definition 1.3a: A set of categories is said to be connected if it cannot be divided into two unjoined collections.

Definition 1.4: A collection of n categories is said to be circular if it could be reordered in such a way that the first appears as an entire alternate in the definition of the second, the second appears as an entire alternate in the definition of the third, ... , and the $n$th appears as an entire alternate in the definition of the first.

Definition 1.4a: A collection of categories is said to be non-circular if no subcollection is circular.

Definition 1.5: A Backus Naur Form Grammar (BNF grammar) is a connected non-circular collection of definitions over a non-null terminal vocabulary in which (a) no category is defined more than once, (b) each category which appears on the right side of a definition is defined, and (c) there is a distinguished category whose name does not appear in any definition.

Definition 1.5a: A Backus Naur Form Language (BNF language) is the set of strings which belong to the distinguished category of a BNF grammar.

The reader might well pause for a moment and note what has been done. As was promised in the preface, we have provided a definition of BNF grammar in which the grammar is a set of strings rather than a set of rules for generating strings. On the other hand, it should be clear that the distinction is merely one of point of view, since what we have done parallels very closely in many ways the usual definition of a phrase structure grammar. See, for example, Bar-Hillel (1961), Chomsky (1959) or Floyd (1964-b).

Section 2: Ambiguity of a BNF Grammar. We consider now the concept of an ambiguous grammar. In the usual definition of a grammar in terms of productions which generate the strings of the language, an ambiguous grammar may be defined as one in which some string can be generated in more than one way. (See, for example, Bar-Hillel (1961), Floyd (1962-b), Gorn (1963), Lynch (1963), etc.) In connection with parsing, an unambiguous grammar may be one in which no string can be parsed in more than one way. (See, for example, Irons (1964).) The concept of ambiguity is the same in these two outlooks, even though they require different-appearing definitions. Having looked at grammars from still another point of view, we must provide our own definition of ambiguity. It should be clear to the reader that the concept is the same. We proceed in steps:

Definition 1.6: A string is said to be ambiguous with respect to an alternate if there is more than one way in which the string can be deconcatenated to correspond to the alternate. In other words, the string a is ambiguous with respect to the alternate named $\pi_1 \pi_2 \ldots \pi_n$ if a can be written as $b_1 b_2 \ldots b_n$ or as $c_1 c_2 \ldots c_n$, where the two deconcatenations differ and each $b_i$ and each $c_i$ corresponds to the corresponding $\pi_i$ as required in Definition 1.1.

Definition 1.6a: An alternate is said to be unambiguous if there is no string which is ambiguous with respect to it.

Definition 1.7: A string is ambiguous with respect to a category if either the string is ambiguous with respect to one of the alternates of the category or the string belongs to more than one alternate of the category.

Definition 1.7a: A category is said to be unambiguous if there are no strings which are ambiguous with respect to it.

Definition 1.8: A BNF grammar is unambiguous if all of its categories are unambiguous.
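Definitions 1.6 and 1.7 can be made concrete by counting deconcatenations. The sketch below is an illustration under stated assumptions: the grammar `GRAMMAR` is a hypothetical toy (it is not one of the grammars of this work), category names are written in angle brackets, every other symbol is a single terminal character, and the function names are introduced here for the example.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy grammar: <x> ::= <y> <y>      <y> ::= a | a a
GRAMMAR = {
    "<x>": [("<y>", "<y>")],
    "<y>": [("a",), ("a", "a")],
}


def splits(a, n):
    """Every way of writing a as n non-empty pieces."""
    if not a:
        return
    for cuts in combinations(range(1, len(a)), n - 1):
        bounds = (0, *cuts, len(a))
        yield tuple(a[i:j] for i, j in zip(bounds, bounds[1:]))


def corresponds(piece, symbol):
    """Definition 1.1: a terminal matches itself, a category name matches its members."""
    if symbol in GRAMMAR:
        return in_category(piece, symbol)
    return piece == symbol


def ways(a, alternate):
    """Number of deconcatenations of a that correspond to the alternate."""
    return sum(all(corresponds(p, s) for p, s in zip(parts, alternate))
               for parts in splits(a, len(alternate)))


@lru_cache(maxsize=None)
def in_category(a, category):
    return any(ways(a, alt) > 0 for alt in GRAMMAR[category])


def ambiguous_wrt_alternate(a, alternate):
    """Definition 1.6: more than one corresponding deconcatenation."""
    return ways(a, alternate) > 1


def ambiguous_wrt_category(a, category):
    """Definition 1.7: ambiguous in some alternate, or a member of more than one."""
    counts = [ways(a, alt) for alt in GRAMMAR[category]]
    return any(c > 1 for c in counts) or sum(c > 0 for c in counts) > 1
```

In this toy grammar the string `"aaa"` gets into the single alternate of `<x>` either as `a` followed by `aa` or as `aa` followed by `a`, so it is ambiguous with respect to `<x>`, while no string is ambiguous with respect to `<y>`.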

A few comments are in order. Speaking informally, we need a definition of ambiguity that ensures that no string can "get into the grammar" in more than one way. There are only two ways this unfortunate event could happen: (1) In the definition of a category, the string might belong to more than one alternate; or (2) in the definition of an alternate the string might "get in" in more than one way. These two possibilities are clearly covered by our definition.

The definition of ambiguity of a BNF grammar could be made in one step without defining first the intermediate steps we used. However, the concepts of ambiguity of a string with respect to an alternate or to a category are useful in proofs, since it is these attributes of a specific grammar which we are usually able to prove.

This definition of ambiguity has a rather interesting property which other definitions do not have: A category may be unambiguous even though one or more of the categories appearing in its definition is ambiguous. Note the definition of <s> at the beginning of Section 3 below. Even if <e> were ambiguous (which we will see it is not), <s> would not be, since a string x which is a member of <s> can be deconcatenated in only one way to correspond to the single alternate in the definition of <s>, even though the part of x which is a member of <e> may be ambiguous in <e>. This perhaps surprising result does not hurt anything, since we have defined a grammar to be unambiguous only if all of the categories in it are unambiguous. From the point of view of one trying to construct proofs about ambiguity of grammars, our approach has distinct advantages. We can prove an alternate to be unambiguous without being concerned with whether or not the categories appearing in it are ambiguous. Thus it is possible to start with one or two categories in the grammar and gradually prove the whole grammar unambiguous. We will see more of this point later.

We next define some terms which will make it easier to state proofs. We say a string "gets into" an alternate if it is a member of the alternate. Usually, we use this term in connection with showing that a string can be deconcatenated to match the characters of the alternate as required in Definition 1.1. When we say "a string is in A" or "a string is an A", we mean that the string belongs to the category A. A phrase or legal string is any string which belongs to some category of the grammar.

Section 3: Example: the A-grammar. To illustrate many of the concepts introduced in the preceding pages, we exhibit what will be referred to as the A-grammar:

<s> ::= I ← <e> ;

<e> ::= <t> | <e> + <t>

<t> ::= <p> | <t> * <p>

<p> ::= I | ( <e> )

Theorem: The A-grammar is a BNF grammar, with <s> the distinguished category.

Proof: We first note that the collection of definitions is connected, for <p> is connected to <t>, <t> is connected to <e> and <e> is connected to <s>, since in each case the first mentioned category appears in the definition of the second. We see that no category is defined more than once, and we see (by looking) that each category name which appears in a definition is defined. The grammar is not circular. Finally, we note that <s> does not appear in any definition.
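The conditions checked in this proof are mechanical, and a short Python sketch can run them. Everything in the block is an assumption made for illustration: the dictionary encoding, the function names, and the abbreviated category names `<s>`, `<e>`, `<t>`, `<p>` (standing for sentence, expression, term and primary). Using a dictionary keyed by category name makes condition (a) of Definition 1.5 hold by construction, and the sketch reads condition (c) as requiring exactly one unused category.

```python
A_GRAMMAR = {
    "<s>": [("I", "←", "<e>", ";")],
    "<e>": [("<t>",), ("<e>", "+", "<t>")],
    "<t>": [("<p>",), ("<t>", "*", "<p>")],
    "<p>": [("I",), ("(", "<e>", ")")],
}


def is_category(symbol):
    return symbol.startswith("<") and symbol.endswith(">")


def used_categories(g):
    """Category names that appear on the right side of some definition."""
    return {s for alts in g.values() for alt in alts for s in alt if is_category(s)}


def is_connected(g):
    """Definition 1.3a: the categories cannot be divided into two unjoined collections."""
    if not g:
        return False
    neighbours = {c: set() for c in g}
    for c, alts in g.items():
        for alt in alts:
            for s in alt:
                if s in g and s != c:
                    neighbours[c].add(s)
                    neighbours[s].add(c)
    start = next(iter(g))
    seen, stack = {start}, [start]
    while stack:
        for n in neighbours[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return len(seen) == len(g)


def is_non_circular(g):
    """Definition 1.4a: no cycle through 'appears as an entire alternate of'."""
    unit = {c: [alt[0] for alt in alts if len(alt) == 1 and alt[0] in g]
            for c, alts in g.items()}
    state = {}

    def visit(c):
        if state.get(c) == "active":   # came back to a category still being explored
            return False
        if state.get(c) == "done":
            return True
        state[c] = "active"
        ok = all(visit(b) for b in unit[c])
        state[c] = "done"
        return ok

    return all(visit(c) for c in g)


def distinguished_category(g):
    """Return the distinguished category if g passes the checks, else None."""
    used = used_categories(g)
    if not g or not used <= set(g):            # condition (b)
        return None
    if not (is_connected(g) and is_non_circular(g)):
        return None
    unused = [c for c in g if c not in used]   # condition (c)
    return unused[0] if len(unused) == 1 else None
```

Calling `distinguished_category(A_GRAMMAR)` returns the name of the sentence category, while a collection in which two categories each appear as an entire alternate of the other is rejected as circular.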

The reader may find it easier to think about the A-grammar if he mentally changes the names of its categories to sentence (or statement), expression, term and primary. Sentences are then ALGOL-like assignment statements with "I" the only identifier, with the operators "+" and "*", and with expressional parentheses.

The terminal vocabulary of the A-grammar consists of the symbols

( ) + * ← ; I

This is $V_t$. The set $V_n$ of non-terminal characters consists of

<s> <e> <t> <p>

There are four definitions in the grammar, including a total of seven alternates. We show now that the string a

I ← I * (I + I) ;

is a member of the category <s>. For this to be so, we must be able to write a as the concatenation of four substrings (since there are four characters in the definition of <s>) which correspond to the characters in the definition of <s>. Let

$a_1$ = I

$a_2$ = ←

$a_3$ = I * (I + I)

$a_4$ = ;

Then we need only show that $a_3$ is in <e> to have our result. To prove this, we must prove either that $a_3$ is in <t> or that $a_3 = a_{31} a_{32} a_{33}$ where $a_{31}$ is in <e>, $a_{32}$ = "+" and $a_{33}$ is in <t>. Instead of proceeding in this way to complete the proof (the "top down" approach), let us work instead from the bottom up. We note that the string "I" is in <p>. Hence, it also is in <t>, and therefore in <e>. Therefore, the string "I + I" also belongs to <e>. Therefore, the string "(I + I)" is in <p>. Thus the same string is in <t>. It then follows that the string "I * (I + I)" is in <t>, and hence in <e>. But this is all we need for our result.
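The membership argument above can be replayed mechanically. The following Python sketch decides membership by trying every deconcatenation, exactly as Definitions 1.1 and 1.2 prescribe; it is an illustration, not an efficient recognizer and not the production algorithm studied in this work. The category names `<s>`, `<e>`, `<t>`, `<p>`, the dictionary encoding and the function names are assumptions, and blanks are omitted from the strings.

```python
from functools import lru_cache
from itertools import combinations

A_GRAMMAR = {
    "<s>": [("I", "←", "<e>", ";")],
    "<e>": [("<t>",), ("<e>", "+", "<t>")],
    "<t>": [("<p>",), ("<t>", "*", "<p>")],
    "<p>": [("I",), ("(", "<e>", ")")],
}


def splits(a, n):
    """Every way of writing a as n non-empty pieces."""
    if not a:
        return
    for cuts in combinations(range(1, len(a)), n - 1):
        bounds = (0, *cuts, len(a))
        yield tuple(a[i:j] for i, j in zip(bounds, bounds[1:]))


def matches(a, alternate):
    """Definition 1.1: some deconcatenation of a corresponds to the alternate."""
    return any(
        all(belongs(piece, sym) if sym in A_GRAMMAR else piece == sym
            for piece, sym in zip(parts, alternate))
        for parts in splits(a, len(alternate))
    )


@lru_cache(maxsize=None)
def belongs(a, category):
    """Definition 1.2: a is in the category if it is in at least one alternate.

    The recursion terminates because pieces of a multi-symbol alternate are
    strictly shorter than a, and single-category alternates never form a
    cycle in a non-circular grammar."""
    return any(matches(a, alt) for alt in A_GRAMMAR[category])
```

With these definitions `belongs("I←I*(I+I);", "<s>")` succeeds by finding the same four substrings as the argument in the text, and an ill-formed string such as `"I←I+;"` is rejected.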

Section 4: Brackets and balanced strings. We introduce now the concept of brackets. In the terminal vocabulary we select some two characters, one of which we call the left bracket and the other of which we call the right bracket. Sometimes in our proofs we will, for convenience, represent them by the marks "(" and ")" respectively, but our results will hold for any two characters.

Definition 1.9: A string is said to be partly balanced with respect to a pair of brackets if it contains as many left brackets as right brackets.

Frequently, we will omit explicit reference to the set of brackets under consideration where the context makes clear what terminal characters are being used as brackets, as in the following:

Definition 1.10: A set of strings is said to be partly balanced if each string in the set is partly balanced.

Since a BNF grammar, for example, is a set of strings, this definition lets us refer to a partly balanced BNF grammar. Now we will categorize strings which are not balanced. With respect to a set of brackets, we have

Definition 1.11: A string is left-deficient if it contains more right brackets than left brackets. A string is right-deficient if it contains more left brackets than right brackets.

Clearly, any string is either left-deficient, right-deficient or partly balanced.

Definition 1.12: A string is balanced if it is partly balanced and it contains no head which is left-deficient.

Definition 1.13: A set of strings is balanced if each string in the set is balanced.

Definition 1.14: A character $\alpha$ in a balanced string a is said to be enclosed (with respect to a set of brackets) if the head of a whose last character is $\alpha$ is right-deficient.

Let us now consider briefly these concepts as they apply to the A-grammar. Let "(" be the left bracket and ")" be the right bracket. A partly balanced string is then one in which the parentheses match. However, the string ")" "I" "(" is partly balanced, according to the definition, a fact which does not seem appealing. Fortunately, this string is not balanced since it contains a head (its first character) which is left-deficient. It is clear that a balanced string is one in which the parentheses match in the usual algebraic sense.
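Definitions 1.9 through 1.14 reduce to counting brackets, as the following Python sketch shows. The function names and the zero-based character index `k` are assumptions introduced for the example; the brackets default to "(" and ")" but any two characters may be passed.

```python
def excess(a, left="(", right=")"):
    """Number of left brackets minus number of right brackets in a."""
    return a.count(left) - a.count(right)


def partly_balanced(a, left="(", right=")"):
    """Definition 1.9: as many left brackets as right brackets."""
    return excess(a, left, right) == 0


def left_deficient(a, left="(", right=")"):
    """Definition 1.11: more right brackets than left brackets."""
    return excess(a, left, right) < 0


def right_deficient(a, left="(", right=")"):
    """Definition 1.11: more left brackets than right brackets."""
    return excess(a, left, right) > 0


def heads(a):
    """All heads of a, shortest first; the last one is a itself."""
    return [a[:k] for k in range(1, len(a) + 1)]


def balanced(a, left="(", right=")"):
    """Definition 1.12: partly balanced, with no left-deficient head."""
    return partly_balanced(a, left, right) and not any(
        left_deficient(h, left, right) for h in heads(a))


def enclosed(a, k, left="(", right=")"):
    """Definition 1.14: the head of a ending at position k is right-deficient."""
    return right_deficient(a[:k + 1], left, right)
```

The string `")I("` passes `partly_balanced` but fails `balanced`, which is the distinction drawn in the paragraph above. In `"(I+I)"` every character except the last is enclosed.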

It seems obvious that any phrase in the A-grammar is balanced. It certainly is not difficult to prove this assertion. Instead, however, we will prove two general theorems which can be applied to a wider class of grammars, one of which is the A-grammar. The theorems will be useful in proving grammars to be unambiguous.

Section 5: Theorems on BNF grammars. In this section we will prove the theorems needed later in connection with BNF grammars.

Theorem 1.1: If every alternate in every definition in a BNF grammar is partly balanced, then so is the grammar.

Proof: Let the left and right brackets be "(" and ")", respectively. We proceed then by induction, letting $P_n$ be the proposition that every string of the grammar of length n is partly balanced. $P_1$ is true, since a string of length one can get into a grammar only if it appears as an entire alternate. Since each alternate is partly balanced, the strings "(" and ")" cannot appear. For some n > 0, we assume that $P_1, P_2, \ldots, P_n$ are all true, and prove $P_{n+1}$. Consider any string a which belongs to a category because it matches an alternate $\pi_1 \pi_2 \ldots \pi_k$. Thus we may write a as $a_1 a_2 \ldots a_k$ where each $a_i$ corresponds to $\pi_i$ as required in Definition 1.1. For each $\pi_i$ which is a category, $a_i$ must be partly balanced. This follows from the inductive hypothesis, since length($a_i$) < length(a). For each $\pi_i$ which is a bracket, there must be a corresponding $\pi_j$ which is the other kind of bracket, since the alternate is partly balanced. Hence, the string a must be partly balanced.

This proof fails if the alternate consists entirely of a single character, since then $a = a_1$ and length(a) = length($a_1$). In this case, however, $\pi_1$ must be a category and a must belong to that category. We consider then the alternate in that category to which a belongs. Since the grammar is non-circular, this process must terminate.

Theorem 1.2: If every alternate in every definition of a BNF grammar is balanced, then so is the grammar.

Proof: The proof is essentially the same as that for the previous theorem. However, in the induction step we note that each π_i which is a left bracket must precede the π_j which is the corresponding right bracket, so we can conclude that a is balanced from the fact that the alternate is balanced.

Theorem 1.3: Consider a BNF grammar where "(" is the left bracket and ")" is the right bracket. If an alternate is the string

( A )

where A is a balanced category, then any character in any string in the alternate (except the last character of the string) is enclosed.

Proof: Assume the contrary, that a is a string in the alternate which has a head b which is not right-deficient. Then b is either balanced or left-deficient. We can write b = "(" c, so c must be left-deficient, since it contains one less left bracket than b. But c is a head of a string in the category A and cannot be left-deficient, since A is balanced.

It is clear from the preceding theorems that the A-grammar is balanced. Now we will prove a series of theorems about BNF grammars, all of which will be used later in proving various grammars to be unambiguous.

To aid the reader, a few comments will usually be made about each theorem to indicate how it will be used. Frequently, these comments will refer to the A-grammar, but the reader must understand that the theorems are general.

Theorem 1.4: An alternate is unambiguous if it includes only one category name, the other elements of the alternate being characters.

Proof: This result is obvious from Definition 1.6. See also the comment after Definition 1.8.

In the A-grammar, for example, we will use this theorem to show that is unambiguous. Next we will prove some theorems about ambiguity of alternates and categories. Each theorem will show that a certain kind of scheme used in defining a category will not lead to ambiguity.

Theorem 1.5: An alternate of the form A α B, where A and B are both balanced categories in the grammar, α is a character in V_t which is not a bracket, and any occurrence of α in a string in B is enclosed, is unambiguous.

Proof: Assume the contrary, that there is a string h which can be in the alternate in more than one way. Then we can have

h = b α c (1)

h = d α e (2)

where b and d are in A and c and e are in B. Assume first that length(b) > length(d). Then we can write b = d f and e = g c. Thus

h = d f α c = d α g c

and hence f α = α g, so that first(f) = α. Thus we can write f = α k and, since b = d f, we have that b = d α k. From this and from line (1) we have

h = d α k α c (3)

Now, h, d and c are each partly balanced (since they are balanced), so k must also be partly balanced. From lines (2) and (3) we have e = k α c. Since k is partly balanced, the α in e is not enclosed. (The head k α is not right-deficient.) But e is in B (see line (2)), and we hypothesized there to be no string in B containing an unenclosed α.

Now we consider the case where length(b) < length(d). The same proof applies if we permute b and d and also c and e. Finally, if length(b) = length(d), we do not have two distinct members of the alternate.
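The role of the "enclosed" hypothesis can be seen on a small finite example. The following Python sketch counts the ways a string can be split as a α b with a in A and b in B. The sets A, B_OK and B_BAD are hypothetical finite stand-ins for categories, chosen only for illustration; they are not taken from the A-grammar.

```python
def decompositions(h, A, B, alpha):
    """Return every pair (a, b) with a in A, b in B and h == a + alpha + b."""
    found = []
    for i, ch in enumerate(h):
        if ch == alpha and h[:i] in A and h[i + 1:] in B:
            found.append((h[:i], h[i + 1:]))
    return found


A = {"I", "I*I"}
B_OK = {"I", "(I*I)"}    # every "*" in B_OK lies inside parentheses (enclosed)
B_BAD = {"I", "I*I"}     # "I*I" contains an unenclosed "*"

print(decompositions("I*I*I", A, B_BAD, "*"))    # [('I', 'I*I'), ('I*I', 'I')]
print(decompositions("I*I*I", A, B_OK, "*"))     # [('I*I', 'I')]
print(decompositions("I*(I*I)", A, B_OK, "*"))   # [('I', '(I*I)')]
```

With B_BAD the string "I*I*I" is in the alternate in two ways, which is the situation the proof rules out; with B_OK each string has a single decomposition.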

Theorem 1.6: An alternate of the form A α B, where A and B are both balanced categories in the grammar, α is a character in V_t which is not a bracket, and any occurrence of α in a string in A is enclosed, is unambiguous.

Proof: This is the previous theorem, changed only by saying that it is in A, not B, that α cannot occur unless it is enclosed. The same proof applies up to just below line (3). Then use (1) and (3) to get b = d α k, and a similar argument will show that we have an unenclosed α in b and therefore in A.

Theorem 1.7: Consider a category A in a balanced BNF grammar where A is defined to be

A ::= B | α B | A α B

Here B is another category in the grammar and α is a character in V_t which is not a bracket. If any instance of α in a string in B is enclosed, then A is unambiguous.

any β is enclosed. P_1 is clearly true, since the only string of length one for which the proposition does not hold is the string β, but that string is not in B and therefore cannot be in A. Now assume that P_1, P_2, ..., P_n are all true for some n, and we show P_{n+1}. Assume the contrary, that the string h of length n+1 contains a β which is not enclosed and that h is in A. Thus h must be in one of the three alternates of A. It cannot be in the first, since we have hypothesized that no such string exists in B. Since h contains a β which is not enclosed, we can write h as a β b, where a is balanced. Assume that h is in the second alternate. Then first(a) = α and we can write a as α c. Since α is not a bracket, c is balanced and thus c is a balanced head of a string in B. But this contradicts the hypothesis.

Finally, assume that h is in the third alternate. Then h can be written as e α f with e in A and f in B. The unenclosed β in h must be in either e or f, but its appearance in e violates the inductive hypothesis and its appearance in f violates the hypothesis.

Theorem 1.11: Let the category A in a balanced BNF grammar be defined by

A ::= B | α B

where B is another category in the grammar and α is a character in V_t which is not a bracket. Let β be another character in V_t, also not a bracket. Assume that any instance of β in a string in B is enclosed, and that the one-character string β is not in B. Then any β appearing in a string in A is enclosed.

Proof: This is the previous theorem, with one of the alternates removed from the definition. The same proof applies, with each mention of the removed category deleted.

Theorem 1.12: Let the category A in a balanced BNF grammar be defined by

A ::= B | A α B

where B is another category in the grammar and α is a character in V_t which is not a bracket. Let β be another character in V_t, also not a bracket. Assume that any instance of β in a string in B is enclosed, and that the one-character string β is not in B. Then any β appearing in a string in A is enclosed.

Proof: The proof of the previous theorem applies.

Section 6: Unambiguity of the A-grammar. We now proceed through a series of lemmas to prove that the A-grammar is unambiguous. In most of the proofs we will be concerned only with the three categories , and , so we will use the term phrase to refer to a string which is in one of these three categories. We select "(" and ")" as the left and right brackets, respectively.

Lemma 1.1: The A-grammar is partly balanced.

Proof: The restrictions of Theorem 1.1 are clearly met.

Lemma 1.2: The A-grammar is balanced.

Proof: The conditions of Theorem 1.2 are clearly met.

Lemma 1.3: If a is a string in

which is not identically "I", then all of the characters of a, other than the last character, are enclosed.

Proof: We need consider only the second alternate, for which Theorem 1.3 provides the desired result.

Lemma 1.4: If a is any string in , then any "+" appearing in a is enclosed.

Proof: We use Theorem 1.12, with α taken as "*" and β as "+". The string "+" is clearly not in

, and Lemma 1.3 gives us the result that any

"+" in a string of

is enclosed.

Theorem 1.13: The A-grammar is unambiguous.

Proof: The first alternate of

is unambiguous from Theorem 1.4 and the second alternate is clearly unambiguous. Further, the only string in the second alternate cannot be in the first alternate, so

is unambiguous.

Each "*" in

is enclosed, from Lemma 1.3, so is unambiguous from Theorem 1.9. Each "+" in a string of is enclosed, so is unambiguous, again by Theorem 1.9. Finally, is unambiguous from Theorem 1.4.

CHAPTER 2 - THE PRODUCTION LANGUAGE

In this chapter the notion of Floyd-Evans productions is introduced and defined. A specific set of productions, the A-productions, is exhibited, and the effect of executing them with a simple input is simulated.

This chapter is divided into sections as follows:

Section 1: Introduction to Production Language

Section 2: Stack pictures, matches and stack transformations

Section 3: The Production Language

Section 4: The A-productions


Section 1: Introduction to Production Language. In constructing processors for translating languages such as we have been discussing into a different form, the problem is to recognize legal constructs in the language and to produce a suitable translation. In 1960 the present author was involved in planning a translator for ALGOL-60. Under consideration was translating the ALGOL source string into an intermediate language which we have chosen to call "postfix" and which many call "reverse Polish" or "Polish postfix". The problem was to develop a suitable technique for expressing the translation algorithm. Production Language resulted from this effort.

Production Language was created for the purpose of describing translation from infix to postfix. In the nature of things, a compiler is a program in which most of the work done consists of making decisions based on asking suitable questions about (a) the last character scanned, and (b) the state of the translator as determined by the characters which have been scanned recently. Thus, a flow chart for a compiler usually has more decision boxes than action boxes. It is expedient, therefore, to attempt to develop a notation in which it is convenient to describe and program the decision-making processes involved. The language used is derived from a notation developed by R. W. Floyd. Floyd (1961-a) described an algorithm for the translation of arithmetic expressions. The description consists of about five pages of flow charts accompanied by three or four more pages of explanatory text. Later, Floyd (1961-b) published another description of the same algorithm. However, he had developed a formalism for the algorithm which was the reason for the existence of the later article. The formalism permitted a description in a very concise form of a process which had been described otherwise in a very lengthy series of flow charts. It is somewhat interesting to note that Floyd's sole reason for developing the formalism was to permit him to describe more concisely a process which had been formulated in a different manner. The notation developed by Floyd has been referred to as "Floyd Productions".

It has proved expedient to modify Floyd's notation in several non-trivial ways, both to meet the demands of the character set of the computer available and for various aesthetic and philosophical reasons. The result is a programming language rather than a notation. The emphasis in the previous sentence is on the word programming. By definition, a programming language is one which can be run on a computer. The important thing is that we were able to develop a language which we could run on the computer and in which it was convenient and natural to express the translation algorithm. It is necessary that the phrase "a language which can be run on a computer" be explained. Production language runs on the computer just as does ALGOL, in the sense that in each case we have a mechanism to translate the language that the programmer writes into a form acceptable by the computer's hardware. For ALGOL we have an ALGOL translator; for production language we have a loader and an interpreter.

A set of productions may be regarded as a set of rules which define an algorithm. A well-formed set of productions is equivalent to a single-pushdown-store automaton. Productions may be used to describe processes in which a source string is scanned and an output string is produced, a single stack being used as temporary storage for the process. The source string is scanned from left to right, one character at a time. As each character is scanned, it is placed into the top of the stack. On the basis of the top few characters in the stack and the state of the algorithm, the productions will direct that certain stack transformations be done, that new characters be inserted into the output string (that is, appended to the right end of the output string) or that a new character be scanned.

Section 2: Stack pictures, matches and stack transformations. Before trying to define the notation of production language, we will introduce and explain the basic concepts that will be needed. We start by defining a notation for a stack picture. (It is assumed that the reader is familiar with the concept of a stack as used in programming. If not, Evans (1963) gives a description of the concepts needed. This same paper also contains a more informal description of production language than that given here.)

A stack picture is a string over an alphabet V_p (to be defined later). For the moment we will require that the last character in the stack picture be the mark "|". (Context will always make it clear whether this mark is being used in a stack picture or in a BNF grammar.) A stack picture, not surprisingly, represents a picture of a stack. The character just to the left of the vertical bar corresponds to the top element of the stack, the character to its left is the next element, etc. Thus the stack picture

T + P |

represents the character "P" as the top (that is, the first) element of the stack, "+" as the next (second) element of the stack and "T" as the next (third) element.

The problem we are concerned with is asking whether or not, at a given instant, the stack of the translator matches a given stack picture. We answer this question by first comparing the top element of the stack with that element of the stack picture just to the left of the vertical bar. If they are different, the match fails, while if they are the same, we examine the next element of the stack and of the stack picture. This process continues until either a non-match is found or until all elements of the stack picture have been looked at. In the latter case, we say, "The stack matches the stack picture." Note that the number of characters looked at is the number of characters in the stack picture. Note further that any other characters in the stack are irrelevant to the matching process.

The characters that may be in the stack include the terminal characters V_t of the language being scanned and certain internal symbols V_i to be used as place markers during the translation process. The vocabulary for stack pictures must include these characters. In addition, we permit in stack pictures certain special symbols called metacharacters.

Suppose that we want to get a match if the stack matches either the stack picture

A B C |

or the stack picture

A B D |

Then we may define a metacharacter called, say, , as

::= C | D

and use the single stack picture

A B |

The meaning attached to such a stack picture is that we are to assume that we have a match if the top element in the stack is any member of the metacharacter. Of course, we need a table of metacharacters associated with each set of productions, to list just which characters belong to each metacharacter.

Suppose that we want to get a match if the second element of the stack is, say, "A", and we do not care what the top element of the stack is. Since this situation arises frequently in productions, the metacharacter

is considered "built in" to all sets of productions and does not need to be defined. Its meaning is "any character whatsoever". Thus the stack picture needed for the first sentence of this paragraph is

A |

Floyd, in his original work on productions (1961-b), used the Greek _ for this purpose. has been chosen as a computer-printable representation. The reader should note that the stack picture

|

will always match the stack.

We now define the production alphabet, V_p. As has been mentioned, we will use productions to define a translation algorithm from strings to strings. V_t is the alphabet of the input strings, and V_o is the alphabet of the output strings. In general, these two alphabets will be the same, although it is quite possible to write productions in which they differ. The metacharacters we call V_m, and for any set of productions these must be defined (except for the built-in ). Any other symbols needed are internal to the productions (since they appear in neither the input nor the output) and are in V_i. Characters from any of these four alphabets may be in a stack picture, and any except V_m may be in the stack. The following table summarizes this discussion:

V_t    input string (terminal alphabet)

V_o    output string

V_i    internal characters

V_m    metacharacters

V_p    production vocabulary

We now summarize much of the previous discussion with some definitions.

Definition 2.1: A metacharacter, written as a string of letters enclosed in the brackets <..>, stands for, or represents, any one of a set of characters. A character is said to "belong to" a metacharacter if it is one of the characters which the metacharacter represents. A metacharacter is defined by listing the characters which belong to it. The metacharacter

is always defined to represent "all possible characters".

Definition 2.2: A stack picture is a string whose last character is the mark "|" and whose other characters are from the alphabet V_p. A stack picture corresponds to a stack in the following sense: The character in the stack picture just to the left of the "|" corresponds to the top element of the stack, the next character to the left corresponds to the second element of the stack, etc.

Definition 2.3: A stack configuration is said to match a stack picture if each element of the stack matches the corresponding element of the stack picture. If the stack picture element is in V_t, V_i or V_o, the corresponding stack element must be identical; while if the stack picture element is a metacharacter, the stack element must belong to the metacharacter. If the stack picture element is there is always a match.
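The matching rule of Definition 2.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stack is a list with its top element last, a stack picture is the list of its elements without the closing bar, the metacharacter table is a dictionary, and "<ANY>" and "<CD>" are hypothetical names (the first for the built-in "any character" metacharacter, the second for a metacharacter with members C and D).

```python
ANY = "<ANY>"   # hypothetical name for the built-in "any character" metacharacter


def element_matches(picture_elem, stack_elem, metachars):
    """One picture element against one stack element, per Definition 2.3."""
    if picture_elem == ANY:
        return True
    if picture_elem in metachars:
        return stack_elem in metachars[picture_elem]
    return picture_elem == stack_elem


def matches(stack, picture, metachars):
    """True if the top len(picture) stack elements match the picture."""
    if len(picture) > len(stack):
        # The text classes this situation as a programming error.
        raise ValueError("stack picture is longer than the stack")
    top_part = stack[len(stack) - len(picture):]
    return all(element_matches(p, s, metachars) for p, s in zip(picture, top_part))


METACHARS = {"<CD>": {"C", "D"}}                             # hypothetical table
print(matches(["T", "+", "P"], ["+", "P"], METACHARS))       # True
print(matches(["A", "B", "D"], ["A", "B", "<CD>"], METACHARS))  # True
print(matches(["A", "B", "E"], ["A", "B", "<CD>"], METACHARS))  # False
print(matches(["X", "A", "Q"], ["A", ANY], METACHARS))       # True
```

Only as many stack elements as the picture names are examined, so deeper elements such as the "T" in the first call or the "X" in the last are irrelevant to the result.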

We consider now the concept of a stack transformation. Writing

stack picture → stack picture

indicates that the stack is to be transformed from matching the picture on the left to matching the picture on the right. (Such a transformation clearly makes sense only if it is already established that the stack matches the left stack picture. It will become clear that a stack transformation can never be applied unless the match exists.) The idea in a stack transformation is that all of the stack elements which are explicitly mentioned in the left stack picture are to be transformed into the right stack picture. (Henceforth, we will use "left side" and "right side".) The meaning of such a transformation is clear if the right side contains only characters from V_t, V_i and V_o, but we must explain what is meant if it contains metacharacters.

A metacharacter may only appear on the right side of a stack transformation if it appears on the left side exactly once. To illustrate: The stack transformation

A | → B |

indicates the following stack action: Unstack the top element and save it in a temporary, say T1. Unstack the next element and discard it. Unstack the next element and save it in another temporary, say T2. Stack the character "B". Stack the contents of T1. Stack the contents of T2. The effect of a stack transformation which does not satisfy the rule given in the first sentence of this paragraph is undefined, and we will not use such transformations. Note that the same metacharacter may be used more than once on the left side providing that it does not appear on the right side at all.

To summarize what we have said, we give the

Definition 2.4: A stack transformation states that the stack is to be transformed from one matching an initial stack picture to one matching a resultant stack picture, and is written

stack picture → stack picture

to indicate that the stack is to be transformed from matching the stack picture on the left to matching the stack picture on the right. The transformation is done by unstacking the elements explicitly mentioned on the left and then stacking those mentioned on the right. If a metacharacter appears on the right, there must be exactly one appearance of the same metacharacter on the left, and the meaning is that the particular stack element which matched that metacharacter on the left is to be put back into the stack into the position indicated by the appearance of the metacharacter on the right. is here to be treated just like any other metacharacter.
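A stack transformation as described in Definition 2.4 can be sketched as follows. The representation is an assumption made for illustration: the stack is a Python list with its top element last, each stack picture is a list of elements without the closing bar, and the metacharacter names "<X>" and "<Y>" are hypothetical. The example reproduces the unstack-and-restack sequence described for the temporaries T1 and T2.

```python
def transform(stack, left, right, metachars):
    """Apply left -> right to `stack` in place (top of stack at the end).

    Assumes the stack already matches `left`.
    """
    for elem in right:
        if elem in metachars and left.count(elem) != 1:
            raise ValueError(f"{elem} must appear exactly once on the left")
    n = len(left)
    removed = stack[len(stack) - n:]      # the elements named on the left
    del stack[len(stack) - n:]            # unstack them
    for elem in right:                    # stack the right side, deepest first
        if elem in metachars:
            # Put back the stack element that matched this metacharacter.
            stack.append(removed[left.index(elem)])
        else:
            stack.append(elem)
    return stack


# Hypothetical metacharacters; each stands for the characters p and q.
METACHARS = {"<X>": {"p", "q"}, "<Y>": {"p", "q"}}
stack = ["z", "q", "A", "p"]              # "p" is the top element
transform(stack, ["<Y>", "A", "<X>"], ["B", "<X>", "<Y>"], METACHARS)
print(stack)                              # ['z', 'B', 'p', 'q']
```

Here "p" plays the part of T1 and "q" the part of T2: the "A" is discarded, "B" is stacked, and the two saved elements are restacked in the order T1, T2. The element "z", not mentioned on the left side, is untouched.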

We must discuss one more concept before we are able to define production language. (In the discussion which follows, we use the terms "P-label", "A-label", "production", "action" and "next part" in a technical sense. These terms will be defined in the next section.) As mentioned earlier, production language is a programming language, to be used to state an algorithm. We use the term label in the usual programming sense to designate a particular place in the program so that we may "transfer to" it. A label identifier will be a string of letters or digits, the first character of which is a letter (just as in ALGOL). If a given label identifier appears in the P-label part of a production, it is said to label that production; if it appears in the A-label part, it is said to label the action. If a label identifier appears in the "next" part of a production, it indicates that we are to consider next (that is, transfer to) that production or action which the label identifier labels. We will, of course, require that a set of productions be well-formed in the sense that a given label identifier may not appear more than once as a label of a production or action, and that any label identifier used in the "next" field appear somewhere labeling a production or an action.

Section 3: The Production Language. Now that the preliminary concepts are understood, we may proceed to the heart of the matter.

Definition 2.5: A production is a rule written as either

P-label    stack picture → stack picture    action part

or as

P-label    stack picture    action part

The interpretation of a production of the first type is: If on encountering the production the stack of the translator matches the left stack picture, then transform the stack to match the right stack picture and then execute the action part; otherwise, consider the next production in sequence. The interpretation of a production of the second type is the same except that no transformation is to be done. The two stack pictures in a rule of the first type must define a stack transformation as in Definition 2.4.

Definition 2.6: The term scan indicates the performance of the following actions: An instance of the (current) first character of the source string is placed into the stack, and then that character is deleted from the source string. Thus repeated scanning will "use up" the source string, transferring it into the stack. The last character scanned is at the top of the stack.

Definition 2.7: The action part of a production is subdivided as follows:

A-label    action    parameter    star    next

The action and parameter specify an action to be executed on a parameter. The star part will either be empty, indicating no effect, or the character "*", indicating that the scan operation (of Definition 2.6) is to be done. The next part will contain a label identifier that labels either a production or an action part. Executing an action part (as referred to in Definition 2.5) consists of doing the following: Execute the action on the parameter; then, if the star part is "*", scan a new character from the source string and stack it; and then consider next the production or action which is labeled by the next part. If more than one action is to be done on the basis of a given match, the action part may be continued onto succeeding lines. All lines except the last must then have the star part and next part empty, and only the first line may have a P-label or stack pictures. Any of the lines may have an A-label. The next part of the last action for a given production must be non-empty, unless the action is HALT or some other action which, by its nature, does not have a successor.

In connection with some sets of productions, we may associate with some of the elements of V_p an integer called the hierarchy of the element. We are then able to use the action "compile" (described below) whose effect is dependent on the hierarchy values of stack elements. This action is introduced to provide economy in the number of productions needed for complex languages, such as ALGOL. If the compile action is to be used, we must have a table of hierarchy values along with the productions.

We will now discuss the actions which may be used in productions. Some of them have a parameter, the string just to the right of the action.

Definition 2.8: The action part of a production may contain one of the following four actions. Their effect is as indicated:

OUT means to create in the output string an instance of the last character scanned. It does not affect nor is it affected by the stack.

OUTPUT means to create in the output string an instance of the parameter, unless the parameter is an integer, say, n. In that case, the nth character of the stack as it was before doing the transformation is to be outputted. In counting stack characters, n=0 means the top element, n=1 means the next element, etc.

COMPILE is dependent on hierarchy values of stack elements. The operation consists of comparing the hierarchy of the top element of the stack with that of the second element. If the top element has higher hierarchy than the second element, the operation terminates. If the top element has a hierarchy which is less than or equal to the hierarchy of the second element, an instance of the second element is created in the output string and the second element is deleted from the stack. The operation is then repeated on the (new) second element of the stack. In some cases, the operation COMPILE has a parameter. This means that instead of using the hierarchy of the top element of the stack to compare with the hierarchy of the second element, the hierarchy of the character appearing to the right of COMPILE is to be used. In this case, the top element of the stack is ignored. The top element of the stack will never be changed by compile.

The effect is undefined if an attempt is made to access the hierarchy of a character which has not been assigned a hierarchy.

HALT terminates execution of the productions.
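The COMPILE action is the least obvious of the four, so a short Python sketch may help. It assumes the stack is a list with its top element last and the hierarchy table is a dictionary; the hierarchy values shown are hypothetical and are not taken from any table in this work. A character with no assigned hierarchy raises KeyError, standing in for the undefined effect mentioned above.

```python
def compile_action(stack, output, hierarchy, parameter=None):
    """Apply COMPILE to `stack` (top at the end), appending to `output`."""
    # With a parameter, its hierarchy replaces that of the top element.
    reference = hierarchy[parameter if parameter is not None else stack[-1]]
    while reference <= hierarchy[stack[-2]]:
        output.append(stack[-2])   # emit the second element ...
        del stack[-2]              # ... and delete it; the top is never changed
    return stack, output


HIERARCHY = {"#": 0, ";": 1, "+": 2, "*": 3}   # hypothetical values

stack, output = ["#", "+", "*", ";"], []
compile_action(stack, output, HIERARCHY)
print(stack, output)    # ['#', ';'] ['*', '+']

stack, output = ["#", "+", "I"], []
compile_action(stack, output, HIERARCHY, parameter="+")
print(stack, output)    # ['#', 'I'] ['+']
```

In the second call the top element "I" has no hierarchy, but it is never consulted because the parameter "+" supplies the value used in the comparison. The loop stops only when an element of lower hierarchy is reached, so in this sketch the bottom marker "#" is given the lowest value.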

From what has been said, the reader might well have the feeling that OUT is not needed, since the coder will always know what the last scanned character is when OUT is to be executed and could code, instead, OUTPUT. Indeed, for the productions considered in the present work, this is an accurate observation. However, in languages of interest identifiers other than "I" must be handled. In the system in which the production program is imbedded, scanning any ALGOL identifier causes "I" to be stacked. Then executing OUT causes the original identifier (not "I") to be outputted.

We are now in a position to explain what happens in a program written in production language. We assume that the stack is first initialized to contain two instances of the character left terminator (⊢) and that we have then scanned and stacked the first character of the input string. We then consider the first production. We compare the stack of the translator with the stack picture of that production looking for a match as described in Definition 2.3. We continue as described in Definition 2.5: If they match we do any indicated stack transformation and then execute the action part, while if they do not match, we turn our attention to the next production. Certain assumptions are implicit in the preceding discussion, and we now make them more definite.
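The execution cycle just described can be sketched as a small Python interpreter. This is an illustration under several assumptions, not the loader and interpreter used in this work: A-labels and the COMPILE action are omitted, "#" stands in for the left terminator, "<ANY>" is a hypothetical name for the built-in "any character" metacharacter, and PROGRAM is a hypothetical three-label program that turns sums such as I+I; into postfix. It is not the A-productions.

```python
from collections import namedtuple

# One production: P-label, left picture, optional right picture, a list of
# (action, parameter) pairs, the star flag, and the label in the next part.
Production = namedtuple("Production", "label left right actions star next")

ANY = "<ANY>"   # hypothetical name for the built-in "any character" metacharacter
TERM = "#"      # stand-in for the left terminator character


def run(program, source, metachars=None):
    """Run a production program on a source string; return the output list."""
    metachars = metachars or {}
    labels = {p.label: i for i, p in enumerate(program) if p.label}
    source = list(source)
    stack, output, last = [TERM, TERM], [], None

    def scan():
        nonlocal last
        last = source.pop(0)   # IndexError here means reading past the source
        stack.append(last)

    def fits(pic, elem):
        if pic == ANY:
            return True
        if pic in metachars:
            return elem in metachars[pic]
        return pic == elem

    scan()                     # the first character is stacked before starting
    pc = 0
    while True:
        p = program[pc]
        n = len(p.left)
        if n > len(stack):
            raise RuntimeError("stack picture longer than the stack")
        top = stack[len(stack) - n:]
        if not all(fits(a, b) for a, b in zip(p.left, top)):
            pc += 1            # no match: consider the next production in sequence
            continue
        before = list(stack)   # OUTPUT n counts in the stack as it was before
        if p.right is not None:
            del stack[len(stack) - n:]
            for e in p.right:
                is_meta = e == ANY or e in metachars
                stack.append(top[p.left.index(e)] if is_meta else e)
        for action, param in p.actions:
            if action == "OUT":
                output.append(last)
            elif action == "OUTPUT":
                output.append(before[-1 - param] if isinstance(param, int) else param)
            elif action == "HALT":
                return output
        if p.star:
            scan()
        pc = labels[p.next]


ERROR = Production(None, [ANY], None, [("OUTPUT", "ERR*"), ("HALT", None)], False, None)
PROGRAM = [
    Production("A", ["I"], None, [("OUT", None)], True, "B"),
    ERROR,
    Production("B", ["+"], None, [], True, "C"),
    Production(None, [";"], None, [("HALT", None)], False, None),
    ERROR,
    Production("C", ["+", "I"], [], [("OUT", None), ("OUTPUT", "+")], True, "B"),
    ERROR,
]

print(run(PROGRAM, "I+I+I;"))   # ['I', 'I', '+', 'I', '+']
print(run(PROGRAM, "I++;"))     # ['I', 'ERR*']
```

Each group of productions ends with a catch-all production that outputs "ERR*" and halts, so an unmatched stack never runs off the end of the program.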

Definition 2.9: A production program is an ordered set of productions in which the following conditions are met:

(1) Each label identifier which appears in the next part of a production appears also as either a P-label or an A-label.

(2) Exclusive of appearances in the next field, no label identifier may appear more than once in the program.

(3) The action HALT appears at least once.

(4) The left stack picture of the last production is

|

It should be noted that we have defined a program as an ordered set of productions. It is thus meaningful to talk about the first, last or next production.

If a production program is applied to a given input string and eventually the action HALT is executed, we say, "The productions have run to completion." The output string created is the result of executing the productions. If the last character of the output string is "ERR*", we say, "The productions have detected an error," and we ignore the rest of the output. Otherwise, we say, "The output string is the translation of the input string."

Several things may keep the action HALT from being executed. These are

1. The productions may loop forever. Production language is just like any other computer language in that it is possible to write programs which go into infinite loops. Clearly, the loop cannot contain any "scan" commands (or restriction 2 below would be violated), although the loop may contain output commands, in which case the output string would grow indefinitely.

2. An attempt may be made to read beyond the end of the source string. For our purposes, we will treat this as being equivalent to looping forever, since both are pathological cases. Indeed, for the productions with which we are concerned, we will be able to show that neither of these cases will ever take place.

3. Situations which can only be called "programming errors" may occur. These include looking for a match with a stack picture which contains more elements than does the stack, or executing COMPILE in such a way that the hierarchy of an element is needed when the element has not been assigned a hierarchy, etc.

Section 4: The A-productions. We exhibit now a set of productions, which we will refer to as the A-productions. We will see later that these productions form a recognizer and translator for the A-grammar. The numbers in the left-most column are not part of the productions, but are printed to facilitate reference to specific lines. Note that the numbers start over at each P-label. The lines without numbers are comments to help make the productions easier to read.

To clarify the concepts introduced, we will trace the progress through the productions in scanning the string

I ← I + I ;

(We note in passing that this string is in in the A-grammar.) To facilitate our explanation, we augment the notion of a stack picture to show, in addition to the stack, the P-label of the production which we are currently considering and the string yet to be scanned. Thus the notation

A: ⊢ ⊢ I | ← I + I ;

indicates that we are considering production A (the first one), that the stack picture is

⊢ ⊢ I |

and that the string yet to be scanned is

← I + I ;

The label is followed by a colon to set it off from the stack picture and the unscanned string is to the right of the vertical bar. This machine configuration gives the entire state of the productive system, except for the output string. On occasion, we will add to the notation (in an obvious way) to describe the output produced.

[Listing: PRODUCTIONS FOR THE "A-GRAMMAR", with the comment lines "MAIN LOOP.", "NOW CLEAN UP FOR END OF STATEMENT" and "IF SCANNING GETS THIS FAR, THE STRING IS INVALID." The production lines themselves are not legible in this copy.]

The notation just introduced is closer to Floyd's original notation (Floyd, 1961-b) than is our production notation. Floyd shows the as-yet-unscanned string (or some head of it) to the right of the mark indicating the top of the stack, and defines scanning as moving a character from the right to the left of the stack-top marker. (Floyd used a Δ where we use a vertical bar.) Indeed Floyd included scanning in his stack transformation. In the present notation, he would have indicated scanning a character by the stack transformation

| σ → σ |

Floyd's notation was Δ σ → σ Δ. An interesting aspect of this notation is that it is natural to indicate a backward scan in it. This point will not be pursued further.

When we start processing the input string given above, the machine configuration is as given in the example. We are thus considering the production labeled "A", which asks the question "Is the top element of the stack the character 'I'?" The answer is yes, so we proceed across the rule. There is no right-arrow, so there is no stack transformation. The action is OUT, telling us to create in the output string an instance of the last character scanned -- the "I". The star field contains a "*", so we scan the next character of the source string -- the "←" -- and consider next the production labeled "B". Thus we have

B: ⊢ ⊢ I ← | I + I ;

Production "B" asks if the top element of the stack is "←", which it is. We scan a new character and consider next the production labeled "C". We have

C: ⊢ ⊢ I ← I | + I ;

We get a match on the first production we look at, but this time there is a right arrow. The notation indicates that the top element of the stack is to be changed from "I" to "P". After scanning the next character, we have

C: ⊢ ⊢ I ← P + | I ;

We again look at production "C", but this time we do not get a match. Clearly line C+1 does not match, either. Now let us consider C+2. We first ask, "Is the top character in the stack anything whatsoever?" The answer is, of course, yes. We then ask, "Is the second character in the stack 'P'?" Again, the answer is yes. However, we must say no to the next question, since the third character in the stack is not "*". Now consider C+3. Here we do get a match, and we have a right-side. This time we change the second element of the stack from "P" to "T". (A glance at the A-grammar might be appropriate now. We scanned an "I" and assumed that it was in <P>. When we saw that there was not a "*" to its left, we assumed then that it was in <T>.)

We continue. We have

C: ⊢ ⊢ I ← T + | I ;

and we are again at C. This time we get a match at C+6 and change the "T" to an "E". We again start looking for a match at C, this time getting a match at C+7. Now we scan a new character, and we have

C: ⊢ ⊢ I ← E + I | ;

Before proceeding, we note what has happened. The "I" was scanned and initially identified as a member of <P>. On the basis of successive tests, the identification was changed to <T> and then to <E>. Then, when nothing more could be done, a new character was scanned.

Now we get a match at C, changing the "I" to "P". We scan again (since we need one character to the right to decide about "I") so we have

C: ⊢ ⊢ I ← E + P ; |

Now we get a match at C+3 and change the "P" to a "T". We have

C: ⊢ ⊢ I ← E + T ; |

and we finally do something more interesting.

The match this time is at C+5. The right side indicates that

E + T <SG> |

in the stack is to be changed to

E <SG> |

Effectively, we are to delete from the stack the symbols "+" and "T". (What actually happens is that the top of the stack is unstacked and saved somewhere. Then the next two elements are unstacked and discarded. Finally, the element that was saved "somewhere" is put back. Note that the <SG> on the right is to be the same character as the <SG> on the left, even though the match on that line and the action of that line are independent of what the character might actually be.) We output the character "+" and start over at C with

C: ⊢ ⊢ I ← E ; |

Now we get a match at C+9. We remove everything but the left terminators from the stack and output "←". Now we note that there is no "next" field. We also note that the next line does not have a production but does have an action. What has happened here is that we want to do more than one action on the basis of the match, and this is the notation used. On the next line we output ";" and on the next line we halt. (The fact that the HALT command is labelled is of no importance at the moment.) The output string we have produced is

I I I + ← ;

This is the postfix representation of the input string.
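The trace just given can be reproduced mechanically. What follows is a minimal sketch in Python, not part of the original work, of an interpreter for the A-productions. The table PRODUCTIONS is our reading of the listing, and the names run, ANY, BOT and LAST are hypothetical. We assume that a stack picture longer than the stack simply fails to match, that the stack starts with two left terminators, and that the symbols of the source string are separated by blanks.

```python
ANY = "<SG>"     # matches any single stack symbol
BOT = "⊢"        # left terminator
LAST = object()  # stands for the OUT action: emit the last character scanned

# (label, left side, right side or None, outputs, scan?, next label; None = HALT)
PRODUCTIONS = [
    ("A",   ["I"],                     None,       [LAST],     True,  "B"),
    ("A+1", [ANY],                     None,       [],         False, "D"),
    ("B",   ["←"],                     None,       [],         True,  "C"),
    ("B+1", [ANY],                     None,       [],         False, "D"),
    ("C",   ["I"],                     ["P"],      [LAST],     True,  "C"),
    ("C+1", ["("],                     None,       [],         True,  "C"),
    ("C+2", ["T", "*", "P", ANY],      ["T", ANY], ["*"],      False, "C"),
    ("C+3", ["P", ANY],                ["T", ANY], [],         False, "C"),
    ("C+4", ["*"],                     None,       [],         True,  "C"),
    ("C+5", ["E", "+", "T", ANY],      ["E", ANY], ["+"],      False, "C"),
    ("C+6", ["T", ANY],                ["E", ANY], [],         False, "C"),
    ("C+7", ["+"],                     None,       [],         True,  "C"),
    ("C+8", ["(", "E", ")"],           ["P"],      [],         True,  "C"),
    ("C+9", [BOT, "I", "←", "E", ";"], [BOT],      ["←", ";"], False, None),
    ("D",   [";"],                     None,       ["ERR*"],   False, None),
    ("D+1", [ANY],                     None,       [],         True,  "D"),
]


def run(source):
    """Run the productions on a blank-separated string; return the output symbols."""
    symbols = source.split()
    # The productions assume that the first character has already been scanned.
    stack, out = [BOT, BOT, symbols[0]], []
    pos, last = 1, symbols[0]
    index = {label: i for i, (label, *_) in enumerate(PRODUCTIONS)}
    i = index["A"]
    while True:
        label, left, right, outputs, do_scan, nxt = PRODUCTIONS[i]
        top = stack[-len(left):]
        matched = len(stack) >= len(left) and all(
            want == ANY or want == have for want, have in zip(left, top))
        if not matched:
            i += 1                       # try the next production in sequence
            continue
        if right is not None:
            kept = top[-1]               # <SG> on the right repeats the matched symbol
            stack[-len(left):] = [kept if s == ANY else s for s in right]
        out.extend(last if o is LAST else o for o in outputs)
        if do_scan:
            if pos >= len(symbols):
                raise RuntimeError("attempt to scan past the end of the source string")
            last = symbols[pos]
            pos += 1
            stack.append(last)
        if nxt is None:
            return out                   # HALT
        i = index[nxt]


print(" ".join(run("I ← I + I ;")))
```

Run on the example string, the sketch prints I I I + ← ; in agreement with the trace. An ill-formed string such as "I ← + ;" ends with the symbol ERR* in the output.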

A few comments are in order. The two productions at A check that the input string starts with "I", and the two productions at B ensure that the next character is "←". It is not surprising that these two cases are separate from the rest of the productions, since the initial "I" and the "←" have a special place in the grammar. We have assumed that the last character of each string will be ";" and have arranged things so that we cannot scan past the ";". An attempt to do so will get a match on line D. We output the character "ERR*" and then go to H and halt. If in a search starting at C we have not gotten a match by C+9, we will get a match at D+1. We then scan and ignore all characters up to the ";" and give the error signal. We will be able to show that these two cases can occur only if the input string is not well-formed -- that is, is not a member of <A>. Indeed, we will prove each of the assertions made in this paragraph about the A-productions.

We make one final observation. We started with

A: ⊢ ⊢ I | ← I + I ;

and ended with

H: ⊢ ⊢ |

having created the string

I I I + ← ;

as output. We can summarize this by writing

A: ⊢ ⊢ I | ← I + I ; ⇒ H: ⊢ ⊢ | output: I I I + ← ;

We will see much more of this latter notation in Chapter 4.

CHAPTER 3 - TRANSLATION RULES

In this chapter we present a formalism for defining a translation to be done from the ALGOL-like language of Chapter 1 to a different form.

We stated in the preface that we would prove that a translation algorithm would translate a given language correctly into another form. In Chapter 1 we developed a notation for defining the source language, and in Chapter 2 we gave a notation for expressing the algorithm. In this chapter, we will give a notation for stating what it is that the translator is to do.

Speaking informally, we will be translating from the ALGOL-like infix notation we have seen to a Polish postfix or reverse Polish form, which we have chosen to call postfix. Clearly, this statement in English prose of what the translator is to do is not enough. We must state precisely and unambiguously just what the translator is to do with any legal input. In other words, for any legal string in the language, we must be able to tell what string the translator should produce as output. It seems reasonable that any notation to accomplish this end should be built around the same formalism used to define the language. It turns out that a Backus Naur Form grammar as we have defined it is a most convenient starting point for what we want to do.

The notation about to be described was developed independently by the present author in 1963. It is essentially a subset of a notation developed earlier by Irons (1961) to describe translation in a syntax-directed compiler. Although Irons' notation looks much different from ours, it may be seen that our idea is included in his.


We wish to define a translation from infix to postfix form. In infix, an operator appears between its operands, while in postfix, an operator follows its operands. Thus a string in the A-grammar which was in the alternate

<T> * <P>

would be translated into the form

(translation of <T>) (translation of <P>) *

We will call such a form a translation rule, and abbreviate it by the notation

→<T> →<P> *

Here the right arrow is used as an operator meaning "translation of". Thus the line

<T> * <P>        →<T> →<P> *

could be read, "If a string is of the form t "*" p, where t is in <T> and p is in <P>, its translation is the string consisting of the translation of t followed by the translation of p followed by "*"."

Clearly we could assign a translation rule to each alternate in the grammar. Since the grammar is unambiguous, then, a unique translation would be defined for each string in the language. (This will be proved, below.) We will adopt the convention that we do not need to write a translation rule if an alternate consists entirely of a single category name, since in that case the translation of the alternate is clearly the translation of the category.

We now define translation rules more carefully. Consider a BNF grammar over a terminal vocabulary Vt, and assume that the mark "→" is not in Vt. Vn is the non-terminal vocabulary of category names. Let Vo be the vocabulary of the output strings, as in Chapter 2. Now let Vr be the union of Vt, Vn, the mark "→" and Vo. With each alternate S of the grammar we associate a string called the translation rule of S. A translation rule is a string over Vr. A given translation rule may contain a category name only if that category name also appears in the alternate with which the translation rule is associated. Indeed, we go further and make the following requirement: For each instance of a given category name in an alternate, there must be exactly one instance of the same category name in the translation rule, and there may be no other category names in the translation rule. Exactly one instance of the mark "→" must appear immediately to the left of each category name in the translation rule, and there may be no other occurrences of "→" in the translation rule.

The translation rule for an alternate indicates how the alternate is to be translated. Consider a string a which belongs to an alternate in which the category names are A1, A2, ..., Am. The translation rule then will be a string which includes each of these Ai (with a "→" immediately to its left) and which may include other characters. Since a is a member of the alternate, it can be deconcatenated into substrings to match the elements of the alternate name, as explained in Definition 1.1. Let ai be that substring of a which corresponds to Ai, for each i. Then the translation of a as given by the translation rule is a string made up as follows: For each category name in the translation rule, substitute the translation of the corresponding string as determined by the translation rules associated with that category, and for each other character use the character. (Ignore "→".) If a given category name appears more than once in the alternate (and therefore in the translation rule), then the string associated with the k-th appearance in the translation rule is the string that corresponds to the k-th appearance of the category name in the alternate.

Below we give the A-grammar with its associated translation rules. Note that a minor change has been made in the form of the grammar, in that each alternate is typed on a separate line and that there is a vertical bar before each alternate, including the first. It should be clear to the reader that the change is only one of typographic convenience and is of no other significance.

<A> ::= | I ← <E> ;        I →<E> ← ;
<E> ::= | <T>
        | <E> + <T>        →<E> →<T> +
<T> ::= | <P>
        | <T> * <P>        →<T> →<P> *
<P> ::= | I                I
        | ( <E> )          →<E>

We will now consider as an example the same string that was used in the last chapter: "I ← I + I ;". The string "I + I" is in <E>, and its translation is seen to be "I I +". The translation of the entire string, then, is "I I I + ← ;", the same string produced as output by the productions.
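The way in which the translation rules determine the output can be sketched in Python. The function below (the name translate is hypothetical and is not taken from the original work) parses a string of the A-grammar and applies the translation rule of each alternate as it goes. As an implementation assumption, the left-recursive alternates <E> + <T> and <T> * <P> are handled by iteration rather than by recursion, which for this grammar gives the same grouping as the unique parse. Symbols are assumed to be separated by blanks.

```python
def translate(source):
    """Translate a blank-separated string of the A-grammar into postfix symbols."""
    toks = source.split()
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expect(sym):
        nonlocal pos
        if peek() != sym:
            raise ValueError(f"expected {sym!r} at position {pos}")
        pos += 1

    def primary():
        # <P> ::= I | ( <E> )        rules: I  and  →<E>
        if peek() == "(":
            expect("(")
            out = expression()
            expect(")")
            return out
        expect("I")
        return ["I"]

    def term():
        # <T> ::= <P> | <T> * <P>    rule: →<T> →<P> *
        out = primary()
        while peek() == "*":
            expect("*")
            out = out + primary() + ["*"]
        return out

    def expression():
        # <E> ::= <T> | <E> + <T>    rule: →<E> →<T> +
        out = term()
        while peek() == "+":
            expect("+")
            out = out + term() + ["+"]
        return out

    # <A> ::= I ← <E> ;              rule: I →<E> ← ;
    expect("I")
    expect("←")
    out = ["I"] + expression() + ["←", ";"]
    expect(";")
    if pos != len(toks):
        raise ValueError("unexpected symbols after ';'")
    return out


print(" ".join(translate("I ← I + I ;")))
```

For the example string the sketch prints I I I + ← ; which is the translation stated above. A string outside the grammar raises ValueError instead of producing a translation.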

Theorem 3.1: An unambiguous BNF grammar along with associated translation rules assigns to each string a in the language a unique translation.

Proof: The key word in the theorem is "unambiguous". The string thus can belong to the distinguished category in only one way, and the substrings which belong to other categories can belong in only one way, etc.

CHAPTER 4 - EQUIVALENCE OF THE A-GRAMMAR AND THE A-PRODUCTIONS

In this chapter, we show that the A-grammar and the A-productions are equivalent, in the following sense:

(1) Given any string in <A> as input, the productions will run to a successful conclusion having produced the translation of that string as output.

(2) Given any string not in <A> as input (providing that the last character of the string is ";"), the productions will terminate at the ERROR exit.

We do this by proving the

Theorem: Given as input a string x whose last character is ";", the A-productions will run to a successful conclusion (having produced the translation of x as output) if, and only if, x is a sentence in the A-language.

The chapter is divided into the following sections:

1. The "if" proof.

2. Introductory lemmas.

3. The "only if" proof.


Section 1: The "if" proof. We will first show that any string in <A> will be successfully translated by the productions to produce the postfix required by the translation rules. We will proceed by showing first that strings in <P>, <T> and <E> are successfully translated, and then we will have the result for strings in <A>.

Before starting, we augment our notation of stack transformation. What we want to show, as explained in the previous paragraph, is that scanning a string of <A> will produce the right output. We go further and show that we will also leave the stack as we found it. We might be tempted to write this symbolically for some string s in <A> as

A: ⊢ | s ⇒ H: ⊢ | output: →s

The notation is to indicate that we start with a terminator in the stack and s yet to be scanned and end looking at the HALT command with the stack restored and the proper output having been produced. Unfortunately, this does not happen, since the productions as written assume that the first character of the source string has been scanned. We avoid this problem by writing a "*" to the left of the "⇒" to indicate "scan". Thus we would write

A: ⊢ | s *⇒ H: ⊢ | output: →s

This means that if we start with the stack as indicated and then do a scan, we will end, etc.

THE A-GRAMMAR WITH TRANSLATION RULES

<A> ::= | I ← <E> ;        I →<E> ← ;
<E> ::= | <T>
        | <E> + <T>        →<E> →<T> +
<T> ::= | <P>
        | <T> * <P>        →<T> →<P> *
<P> ::= | I                I
        | ( <E> )          →<E>

THE A-PRODUCTIONS

         PRODUCTIONS FOR THE 'A-GRAMMAR'.
A        I |                  |             OUT            *B
 +1      <SG> |               |                             D
B        ← |                  |                            *C
 +1      <SG> |               |                             D

         MAIN LOOP.
C        I |                  → P |         OUT            *C
 +1      ( |                  |                            *C
 +2      T * P <SG> |         → T <SG> |    OUTPUT *        C
 +3      P <SG> |             → T <SG> |                    C
 +4      * |                  |                            *C
 +5      E + T <SG> |         → E <SG> |    OUTPUT +        C
 +6      T <SG> |             → E <SG> |                    C
 +7      + |                  |                            *C
 +8      ( E ) |              → P |                        *C

         NOW CLEAN UP FOR END OF STATEMENT
 +9      ⊢ I ← E ; |          → ⊢ |         OUTPUT ←
                                            OUTPUT ;
H                                           HALT

         IF SCANNING GETS THIS FAR, THE STRING IS INVALID.
D        ; |                  |             OUTPUT ERR*     H
 +1      <SG> |               |                            *D


Theorem 4.1: Let p be any string in <P>, t be any string in <T> and e be any string in <E>. Now,

let αp be either "←" or "(" or "*"; βp any character in Vt
let αt be either "←" or "(" or "*" or "+"; βt any except "I" or "("
let αe be either "←" or "(" or "*" or "+"; βe any except "I", "(" or "*"

Then

C: αp | p βp *⇒ C: αp P βp | output: →p

C: αt | t βt *⇒ C: αt T βt | output: →t

C: αe | e βe *⇒ C: αe E βe | output: →e

Proof: We proceed by induction on string length. Let Pn be the assertion of the theorem, where the string p, t or e is of length n. Clearly P1 is true, since the only possible such string is "I". After the indicated scan is done, we get a match at C, produce the "I" in the output, leave "P" and the next character in the stack, and are looking at C. If the original string was in <P>, we are done. If it was in <T>, we know the top character of the stack cannot be "I" or "(" (by hypothesis, since the character is βt), so the match will be at C+3, yielding

C: αt T βt |

as required. If the string was in <E>, we get a match at C+3 as just described. Then, since the top character in the stack is not "*", either, we get a match at C+6, yielding the desired result.

Now for some n > 1 we assume P1, P2, ..., Pn-1 and prove Pn. We will prove Pn for strings in <P>, <T> and finally <E>.

(a) Let p be a string of length n in <P>. Since n > 1, first(p) = "(" and last(p) = ")", so we may write p = "(" b ")". Performing the indicated scan, we get

C: αp ( | b ) βp

b is a string of length n-2 which is in <E>. The match at C+2 causes the first character of b to be scanned, and the induction hypothesis gives

C: αp ( E ) | βp output: →b

We then get a match at C+9, giving

C: αp P βp |

→b will have been produced. We see from the translation rule for the second alternate of <P> that this is the same as →p, completing the proof for <P>.

(b) Let t be a string of length n in <T>. If t is in <T> from the first alternate, the result of part (a) gives

C: αt P βt | output: →t

As discussed above for n = 1, a match at C+3 will leave the stack as required.

Now let us assume that t is in the second alternate. We can then

We get a match at B, yielding

*C: ⊢ I ← | e ;

But e is in <E>, so by the result of the previous lemma we have

C: ⊢ I ← E ; | output: →e

The next match is at C+9, yielding

H: ⊢ | output: ← ;

The stack and output are as required.

Section 2: Preliminary lemmas. In this section we will prove some of the preliminary results needed for the theorems of the next section.

Many of the lemmas we will state are quite obviously true; however, stating them as lemmas will make the proofs of Section 3 cleaner and more appealing.

The reader will note a tendency in the proofs to assume that characters of the source string somehow retain their identity as they enter the stack and are moved to the output string. Writing this way provides a worthwhile economy in the presentation of discussions and proofs about productions. The justification for so writing is provided by Lemma 4.1 and Lemma 4.5.

Lemma 4.1: For every instance of "+", "*", "(", ")" or ";" in the stack, an instance of that character has been scanned from the source string.

Proof: Characters can get into the stack only by being scanned from the source string or by appearing on the right side of a production. But none of these characters appears on the right side of any of the productions.

Lemma 4.2: If we have

C: E _ I then the last match was at either C+5 or C+6.

Proof: We rule out all other possibilities. A match at C, C+1, C+4 or C+7 will leave "P", "(", "*" or "+" as the second stack element, respectively, since in each case that is the character at the top of the stack before the indicated scan made it the second character. A match at C+2 or C+3 will leave "T" as the second character, and a match at C+8 will leave

Lemma 4.9: If the character α is to the left of the character β in the stack, then α was scanned before the β. If α or β is an operand symbol, the lemma holds for any "I" which became part of it.

Proof: The result is obvious from the productions, since no production "shuffles" the stack. In C+2 and C+5 it makes no difference which "P", "T" or "E" on the left is assumed to become the "T" or "E" on the right -- the result holds equally either way.

Section 3: The "only if" proof. In this section we will show that the productions will run to a successful conclusion only if the input is a legal string of the language. We prove that the productions will eventually HALT, thus guaranteeing that all illegal strings will result in going out the error exit. The plan of attack is to consider the set of strings which have the property that scanning them will result in leaving only "E" in the stack. (We also consider, of course, strings that leave "T" and "P".) We show that these three sets of strings satisfy equations which resemble closely the three equations defining <P>, <T> and <E>. Our desired result then follows.

Suppose that we scan a string x, as a result of which we have an "E" as the second element of the stack. Symbolically, we say

C: α | x β *⇒ C: α E β |    (1)

Of all possible strings x over Vt, some will have the effect of line (I) and some will not. Let us call the set of all strings x for which (i) holds the set . ( is dependent on _ and _. We will explain this dependence later.)

By Lemma 4.2 we know that the last match before arriving at the right side of (1) was at either C+5 or C+6. Let us first assume the former. Then just before the match we had

C+5: E + T β |

How did the stack get this way? By Lemma 4.9 we know that we must have scanned first a string which produced "E" in the stack, then a string which

Definition 4.3: Let α be "←", "(", "*" or "+" and β be any character in Vt except "I", "(" or "*". Then

is the set of all strings x such that

C: α | x β *⇒ C: α P β |

Lemma 4.10: If a string is in

, then it is either the character "I" or it is the concatenation of "(" with a string in with ")".

Proof: This is the result of line (6), above.

Lemma 4.11: If a string is in , then either it is in

or it is the concatenation of a string in with a "*" with a string in

.

Proof: This is the result of line (4), above.

Lemma 4.12: If a string is in , then either it is in or it is the concatenation of a string in with a "+" with a string in .

Proof: This is the result of line (3), above.

Note now Theorem 4.1, which tells us that scanning a string in

will leave a "P" in the stack, scanning a string in will leave a "T" in the stack and scanning a string in will leave an "E" in the stack. It thus follows that

c

c

c

We will now show that the inclusion also goes the other way, so that the

corresponding sets are equal.

Lemma 4.13:

c

, c , and c .

Proof: We proceed as usual by induction, letting Pn be the proposition

that any string of length n in

is also in

, etc. To prove PI we

consider

, , in turn.

Case I.i: Assume x is a string of length one in

. From Lermna 4.10

we see that either x is "I" or it is a string whose first character is "("

and whose last character is ")". The latter cannot be the case, since the

length of such a string is two or more, so x must be "I". But "I" is in

.

Case 1.2: Assume x is a string of length one in . Then from

Lemma 4.11 we see that either x is in

(in which case it is "I", from

Case i.i, and therefore in ) or x is a string of _ followed by a "*"

followed by a string of

. But then length(x) is at least three, and we

have hypothesized that x's length is one.

Case 1.3: Assume x is a string of length one in . Then from

Lemma 4.12 and an analysis similar to that of the previous case we see that

x must be "I" and thus in .

We continue now with the inductive step. For some n > 1, we assume that P1, P2, ..., Pn-1 are true and show that Pn is true. We have the usual three cases.

Case 2.1: Let x be a string of length n in

. From Lemma 4.10 we know that either x is "I" or first(x) = "(" and last(x) = ")". We know x is not "I", since its length is greater than one, so then let x be "(" y ")". From Lemma 4.10 y is in , and we know length(y) = n-2. By the inductive hypothesis then y is in . We have thus shown that any string in

whose length exceeds one is "(" followed by a string in followed by ")". But this is the definition of

, so

c

.

Case 2.2: Let x be a string of length n in . From Lemma 4.11 we see that either x is in

or x is a string in followed by a "*" followed by a string in

° In the first case, x is in

from Case 1 and therefore in , by virtue of the definition of . For the second case, we can write x as y "*" z where y is in and z is in

. But y and z are of length less than n and so, by the inductive hypothesis, are in

and

, respectively. Thus x is a string of followed by "*" followed by a string of

, and therefore is in .

Case 2.3: Let x be a string of length n in . From Lemma 4.12 and an analysis similar to that of the preceding case we see that x must be in

, qed.

Theorem 4.3: If e is any string such that

C: ← | e ; *⇒ C: ← E ; |

then e is in <E>.

Proof: By Definition 4.1 it is clear that e is in . By the previous lemma, c .

Theorem 4.4: Let s be any string such that

A: ⊢ | s ; *⇒ H: ⊢ |

where the last character output is not "ERR*". Then the string s is in <A>.

Proof: Since the last character is not "ERR*", the last match cannot have been at D. It therefore must have been at C+9. Before that match, the stack was

C+9: ⊢ I ← E ; |

There is a semi-colon in the stack, so by Lemma 4.1 a semi-colon must have been scanned from the source string. Further, no character has been scanned after the semi-colon, by Lemma 4.8. From Lemma 4.6 it is clear that the first two characters of s are "I" "←". From Lemma 4.9 it is clear that s is the concatenation of "I" with "←" with a string that produced "E" in the stack. The desired result then follows from Theorem 4.3 and the definition of <A>.

This proof completes the essential points of this chapter. The remaining two theorems sum up the results. It should be noted that the results are as described in the preface and at the beginning of the chapter:

The productions will not scan past the end of the input string or loop forever, given legal input they will produce the correct output, and given illegal input they will give an error message.

Theorem 4.5: Given as input a string x whose last character is ";"

Section 1: The B-grammar syntax and translation rules. We will consider now the B-grammar. On the next page we show the syntax of the B-grammar, using long English phrases for the category names. On the following page we give the same syntax, but with shorter names (as in the A-grammar), and give also the translation rules. In general we will refer to the shorter names (replacing the upper case letters by lower case), although we may use the longer names on occasion.

The B-grammar, like the A-grammar, represents a subset of ALGOL assignment statements, but it is a richer subset. Boolean quantities are permitted as well as arithmetic, and the two types may both appear in the same statement, since the relation of inequality between two arithmetic expressions may be a Boolean primary. Unfortunately, the B-grammar is ambiguous, since the string "I" is a member of the category in more than one way. This is so because "I" is a member of both Boolean expression and arithmetic expression and is, therefore, in both the second and third alternates in the definition of right side. Fortunately, the ambiguity of the grammar will not affect our results, since every ambiguous string has a unique translation. For example, the assignment statement

"I+-I;" has the translation

lie+-;

regardless of whether the second "I" is a Boolean expression or an arithmetic expression. We will pursue this point further later.

We note that the B-grammar permits multiple left parts, as in

ALGOL. Further, the left-most "+J' is to be translated into "_-J' while B-GRAMMAR SYNTAX

::: ] ,--

.:RIGHTSIDE:,:;: I " _IGHT SIDE:, I .,:BOOLEANEXPRESSION> I

;:: ! ,:BOOLEANTERM> I _OOLEAN EXPRESSION>v ,:BOOLEANTERM=, ._JOOLESNTERM:,::: I .,:BOOLEANSECONDARY> I _OOLEAN TERM:,..,.:BOOLEANSECONDARY- ;;: ] ) I

l;: I I + I ,:ARITH_IE'TITERM:,.C ::: I I .,.aRITH_IETICTERM> w: ::: I ( ,:ARITHMETICEXPRESSION> ) I

B-grammar, with translation rules

I

::- I I _E_ v _ -_v _T> ::- I -. ^ ::- I ::- ] ) -' I <1">

.*<_P:, -_ :;- I ( ) -' I


other instances of "+J' are to be translated into "eT'. For example, the string

I <-I _-I + I ; has the translation

I I I I + +-<-+-;

This special treatment of the first "_' will produce extra complications.

Theorem 5.1: The B-grammar is a BNF grammar, with the distinguished category.

Proof: The grammar clearly meets the requirements of Definition 1.5.

We will now do what is possible towards an unambiguity proof. We select "(" and ")" as the left and right brackets, respectively, in all that follows:

Lemma 5.1: The B-grammar is balanced.

Proof: This follows immediately from Theorem 1.2.

Lemma 5.2: The category is unambiguous.

Proof: Obvious.

Lemma 5.3 : The category is unambiguous.

Proof: The first alternate is unambiguous from Theorem 1.4, and the second alternate is clearly unambiguous. Further, the only string in the second alternate, "I", is not in the first alternate.

Lemma 5.4: The category is unambiguous.

Proof: We first note that any "*" in a string in is enclosed, from Theorem 1.3, and the result follows then from Theorem 1.9.

Theorem 5.2: The category is unambiguous.

Proof: We note first that any instance of "+" in a string in is enclosed, by Theorem 1.3. Then any "+" in a string in is enclosed, by

Theorem 1.12. Finally, is unambiguous by Theorem 1.7.

Lemma 5.5: The characters "A", '_", '_" and "_" do not appear in any string in , or .

Proof: Examination of the syntax shows that the only characters that may be in strings in , or are "+", "*", "(", ")" or "I". Thus the four characters listed in the lemma cannot appear.

Lemma 5.6: The category is unambiguous.

Proof: By the previous lemma any "_" in a string in is enclosed

(since there is no such), so by Theorem 1.5 the first alternate is unambigu- ous. The second alternate is unambiguous by Theorem 1.3, and the third alternate is obviously unambiguous. Now we must show that no string can be in more than one alternate. Clearly the only string in the third alternate cannot be in either of the other two. Further, any string in the first alternate must have an unenclosed "_" (since is balanced), and any "_" in a string in the second alternate must be enclosed.

Lemma 5.7: Any "∧", "∨" or "¬" in a string in is enclosed.

Proof: These characters cannot appear in the first alternate, by Lemma 5.5; any appearance in the second is enclosed, by Theorem 1.3; and the result is obvious for the third alternate.

Lemma 5.8: The category is unambiguous.

Proof: Since any '_" in a string in is enclosed, by the previous lemma, the result follows from Theorem 1.8.

Lemma 5.9: The category is unambiguous.

Proof: Any "∧" in a string in is enclosed by Lemma 5.7, so any "∧" in a string in is enclosed by Theorem 1.11. Theorem 1.9 then gives us the desired result.

Theorem 5.3: The category is unambiguous.

Proof: Any "∨" in a string in is enclosed by Lemma 5.7, any "∨" in a string in is enclosed by Theorem 1.11, and any "∨" in a string in is enclosed by Theorem 1.11. The result then follows from Theorem 1.9.

We have now shown that the categories arithmetic expression () and Boolean expression () are unambiguous, and we know we cannot show that right side is unambiguous. The result that we want is that the B-grammar syntax and translation rules determine a unique translation for any string in the grammar, and this we can prove. We will be able to show that the only ambiguous strings in are "I", "(I)", "((I))", "(((I)))", etc. Whether such strings are in or are in , their translation is the same: "I".

Lemma 5.10: Any string which is ambiguous with respect to is ambiguous because it is in both the second and third alternates.

Proof: We prove this by showing that all other requirements of unambiguity are met. The character "←" cannot be in strings in either or , so no string in the first alternate can be in either of the other two. There is no unenclosed "←" in , so the first alternate is unambiguous from Theorem 1.5; and the second and third alternates are unambiguous by Theorem 1.4.

Lemma 5.11: No string in can contain a "+" or a "*" unless it

also contains a "_".

Proof: These characters can get into strings of only through the

first alternate in the definition of .

We now define a new category to be used in the following discussion:

::= I I ( )

We will show first that any string in is also in and ,

and we will then show that these are the only strings in both categories.

We then have our desired result: That the translation is unique.

Lemma 5.12: Any string in is also in both and .

Proof: Let Pn be the proposition that any string of length n which is in is also in and . P1 is true, since the only such string is "I". Assume that P1, P2, ..., Pn-1 are true, and we prove Pn.

Let x be a string of length n in . Then first(x) = "(" and last(x) = ")", so we may write x as "(" z ")" where z is in from the definition of . But z is a string of length n-2 in , so it is also in and in , by the inductive hypothesis. From the syntax it is clear that if z is , then x is in , and therefore also in and . Similarly, if z is in , then x is in and therefore also , and .

Lemma 5.13: Any string which is in both and must contain only the characters "(" ")" and "I"

Proof: We rule out all other characters in Vt. Strings in cannot contain "_", but strings in which contain "*" or "+" must also contain a "_" (by Lemma 5.11), so no string containing a "+" or "*" can be in both categories. Further, no string in can contain a "A", '_/", '_" or "_", by Lemma 5.5. No string in either category can contain "_' or ";", and the only remaining characters in Vt are "(", ")" and "I".

Lemma 5.14: If a string containing only the characters "(", ")" and

"I" is in or , then it is also in .

Proof: Let P n be the proposition of the lemma for strings of length n.

P1 is clearly true, "I" being the only such string. Now we assume P1, P2, ..., Pn-1 and prove Pn. Let x be a string of length n made up of only the three permitted characters, and assume it is in . It must be in the first alternate of , since otherwise it would need a "+", so x is in . Similarly, x must be in the first alternate of and, therefore, in , since there is no "*" in x. Since length(x) > 1, x is in the first alternate of and, therefore, can be written as "(" z ")", where z is in . But z is a string of length n-2 in containing only "(", ")" and "I" and so by the inductive hypothesis is also in . It thus follows that x is in . By a similar analysis we can show that if x is in , then it is also in .

We now get to the crux of the matter. Consider a string x in

which is also in , and let us ask what is its translation. It is clearly in the first alternate of , so we get its translation by looking at . Again, it is in the first alternate and we get its translation by looking at . Now we learn that we get the translation by stripping off the outer set of parentheses and looking again at . This process continues until the parentheses are all gone, at which point we see that the translation is "I". Now assume that the x is in , and a similar analysis shows that its translation is also "I". We sum this up by stating

Theorem 5.4: Any string in has a unique translation.

Proof: All categories of the grammar are unambiguous except , but we have seen that any string which is ambiguous with respect to has a unique translation.
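The parenthesis-stripping argument above can be sketched in a few lines of Python. This is only an illustration: the function name `strip_parens` is hypothetical, and the sketch assumes that the auxiliary category defined before Lemma 5.12 consists of exactly the strings "I", "(I)", "((I))", and so on.

```python
from typing import Optional


def strip_parens(s: str) -> Optional[str]:
    """Return the translation "I" if s is "I" wrapped in matched
    parentheses (zero or more pairs); return None otherwise."""
    s = s.replace(" ", "")
    # Strip one outer pair at a time, as in the argument in the text.
    while len(s) >= 3 and s[0] == "(" and s[-1] == ")":
        s = s[1:-1]
    return "I" if s == "I" else None
```

Whichever of the two categories such a string is taken to be in, the same stripping applies, which is why the translation does not depend on the choice.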

Section 2: The B-productions, the "if" proof. We now introduce the

B-productions. They are listed on the next page and on a fold-out sheet at the end of the chapter. The reader may find it convenient to be able to look at both the syntax and the productions while reading the chapter.

Several aspects of these productions are worthy of comment before we start trying to prove things about them. The scans of the initial "I" and "←" are not separate as they were in the A-productions but are part of the main loop. Once a "←" is scanned, the only way it can get out of the stack is by a match at B or B+1, so a "←" scanned other than at the beginning of a statement will result in an error flag. Further, if no "←" is used, it is impossible to run to successful completion with a final match at B+1.

Eight metacharacters are used in these productions. , for example, is either "AP" or "I", and the other metacharacters are defined similarly. The need for this complication, which was not in the A-productions, arises from the fact that we cannot tell whether "I" is to be in

(and therefore to become "AP") or in (and therefore to become "BP") until we have more information. The "more information" will be an operator:

If, for example, the "I" is the right operand of a "+", we know that the "I" is arithmetic, while if it is the operand of "A", it is Boolean. We must be careful not to try to make the decision too early, however, since in a legal construction like

I+-IA I_ I ; the third "I" is arithmetic. (The right operand of the "A" is the relation

The plan of attack in this section will be as follows. We first

show that strings in will be translated properly. Then we show

that scanning a string in will leave either "AP" or "I*" in the stack,

and a similar result for the categories , , , , and

. Next we show that strings in are translated properly, and it is

easy to conclude with the desired result.

Lemma 5.15: If x is any string in and _ and _ are any char-

acters in Vt, then

A: _ I x _ * = A: _ I* _ I output: I

Proof: We scan "("s and match at A+2 until the "I" is scanned. Then

the match at A produces the desired output. Finally, each ")" gives a match at A+23 until all of the parentheses are gone.

Lemma 5.16: Assume that

--ap is in , --ap_ is +-or * _ap Is any character in V t

--at is in , _--at is + or + _at is not I +- (

--ae is in , --ae_ is +-or ( or _ --me_ is not I +- ( *

--bp is in , --_bpis +-or ( or-7 _bp is not I +- ( * +

__bs is in , _--bsis <-or A _bs is not I +-( * +

bt is in , _--btis +-or V _bt is not I +- ( * + # -7

be is in , _--beis +- or ( _be is not I e- ( * + # -7 A

Then

A: _ ap _ * = A: _ 6 output: -+ ap ap ap ap ap

A: _at at Eat • = A: _at _at output: -+ at

A: _ ae ae _ ae * _ A: _ ae _ae output: -+ ae

A: _bp bp Bbp * = A: _bp _ bp output: -+ bp

A: _bs bs Sbs * = A: C_bs _bs output: -+ bs

A: _bt bt _bt * = A: _bt _bt output: -+ bt

A: abe be _be * = A: abe _be output: -+ be

The metacharacters appearing on the right sides indicate that the stack will

match the stack picture shown.

comment Before starting this proof, a few words may be in order.

It is clear that there will be considerable case analysis, since the lemma

involves seven categories each of which has two or three alternates, and a

case analysis is needed for each alternate. For economy of presentation,

certain abbreviations will be used. "IH" will indicate an appeal to the inductive hypothesis. The phrase "match at Q" (where "Q" is a label) will

be used to indicate that the next match is at the production labeled "Q".

We will frequently use strings with the same name as the category they are

assumed to belong to. For example, a string named ae carries with it the

assumption that it is in . Where two strings are needed in the same

category, one of them will be the reverse of the category name. Thus ea

is also in .

Proof: The proof will be by induction on string length. Let Pk be the proposition of the lemma for strings of length k. We consider first P1. The only string of length one in any of the categories is "I". The initial scan puts the "I" in the stack, and after a match at A we are done, since "I" is in each of the metacharacters on the right side in the statement of the lemma. We now assume P1, P2, ..., Pn-1 and prove Pn, where n > 1. We will consider each of the seven categories of the lemma in turn, treating each alternate as a subcase.

Case 1: We consider . We ignore the second alternate, since it contains no strings longer than one. In the first alternate, assume the string is "(" ae ")". After the scan match at A+2 to scan first(ae). By IH

A: _ap ( ) I _ap output: -+ ae

If is "AE" match at A+11 and done, else match at A+23 and done.

Case 2: The category .

Case 2.1: String is in . If it is also in ,

Lemma 5.15 gives us the result. Otherwise, by IH

A: _at AP _at I output: -+ ap

Match at A+4 and done.

Case 2.2: The string is at "*" ap. length(at) < n, so by IH

A: _at * I ap Bat output: -+ at

Match at A+5 to scan first(ap), and then by IH

A: _at * Bat I output: -+ ap

Match at A+3 and done.

Case 3: The category .

Case 3.1: The string is at. If it is in , we are done by

Lemma 5.15. Otherwise, by IH

A: C_ AT He ae 'I output: -+ at

Match at A+8 and done.

Case 3.2: The string is "+" at. After scanning the "+" match at

A+9, to scan first(at). Then by IH

A: _ae + _ae 1 output: -+ at

Match at A+7 and done.

Case 3.3: The string is ae "+" at. By IH

A: _ ae + I at _ae output : -+ ae

Match at A+9, and again by IH

A: C_ae + _ae I output: -+ at

Match at A+6 and done.

Case 4: The category .

Case 4.1: The string is ae "_" ea. By IH

A: _bp # I ea _bp output: _ ae

Match at A+12, and again by IH

A: _bp _ _bp I output: -+ ea

Match at A+10 and done.

Case 4.2: The string is "(" be ")". After scanning the "(" match at A+2, and then by IH

A: _bp ( ) I _bp output: -+ be

Match at A+22 if is "BE", else match at A+23, and done.

Case 4.3: The third alternate has no strings longer than one.

Case 5: The category .

Case 5.1: The string is bp. If it is in , we are done by

Lemma 5.15. Otherwise, by IH

_bs BP _bs I output: -+ bp

Match at A+14 and done.

Case 5.2: The string is '_" bp. Match at A+15 to scan first(bp), and then by IH

_bp -l _bp I output: -+ bp

Match at A+13 and done.

Case 6: The category .

Case 6.1: The string is bs. If it is in , Lemma 5.15

gives us our result; otherwise by IH

abt BS _bt I output: -+ bs

Match at A+17 and done.

Case 6.2: The string is bt "A" bs. We scan bt by IH and

A: abt A I bs _bt output: -+ bt

Match at A+18, and then again by IH

A: abt A _bt I output: -+ bs

Match at A+16, and done.

Case 7: The category .

Case 7.1: The string is bt. If it is in , Lemma 5.15 gives our result; otherwise, by IH

A: abe BT _be I output: -+ bt

Match at A+20 and done.

Case 7.2: String is be '_/" bt. We scan be by IH to get

A: abe V I bt _be output: -+ be

Match at A+21, and then again by IH

A: C_be V _be I output: -+ bt

Match at A+19 and done.

This completes the proof of the lemma. Each production from A to A+23, except A+1, was mentioned explicitly in the course of the proof. Considering the nature of the statement of the lemma, this is not surprising.

Lemma 5.17: If rs is any string in , then

A: +- 1 rs ; * _ B: +- ; 1 output: -+ rs

Proof: We proceed by induction, letting Pk be the statement of the lemma with rs of length k. For P1, we note that the only such rs is "I". We get a match at A and are done. Now we assume, for some n > 1, that P1, P2, ..., Pn-1 are true and prove Pn. We consider in turn the three alternates.

Case 1: rs is a string of length n in the first alternate. It can then be written as "I" "←" sr, where sr is a string of length n-2 in . We get a match on the "I" at A, outputting it, and scan the "←". The next match is at A+1, scanning the first character of sr. Note that there are now two "←"s in the stack: the one we started with in the statement of the lemma and the one we scanned in rs. Now sr is a string of length less than n in , so by the inductive hypothesis we get

B: +- I* +- ; I output: -+ sr

We get a match at B, leaving the stack and output as required.

Case 2: The string is in . Then by Lemma 5.16 we get to

A: +- ; I output: -+ rs

A match is not possible at any production from A to A+23, inclusive, so we

look for a match at B. This completes the proof, since we are considering

the proper production with the stack and output as required.

Case 3: The string is in . The argument of Case 2 applies.

Theorem 5.5: Any string in the category will be translated correctly by the B-productions, the productions running to a successful completion. That is, for any string as in , we have

A: _ I as * = H: > 1 output: -+ as

Proof: as can be written as "I" "←" rs ";" where rs is in . The initial scan then leaves "I" in the stack. We get a match at A and scan the "←", getting a match at A+1. We then have

A: b I* +-I rs ;

By Lemma 5.17 we will reach

B: b I* <- RS ; I output: -+ rs

We get a match at B+1, completing the proof.

Section 3: The "only if" proof. We must now prove that the B-productions will run to a successful conclusion only if the input string is legal. As we did with the A-productions, we will be complete and show also that the error exit will occur if and only if the input is invalid, thus taking care of all possibilities.

In Chapter 4 we used the notation to represent those input

strings which left "E" in the stack, and then showed that all strings in

are also in . Since the mark "" has already been used (as a

metacharacter), we will have to use a different notation. Thus we will

let be the set of strings, the scanning of which will leave "AE" in

the stack. The other categories will be treated similarly.

The reader should not be surprised to see that the following proof is more complex than that of Chapter 4, since there are more cases to consider. The style will be more terse than in Chapter 4, particularly in those proofs where the case analysis becomes repetitious. Further, for some lemmas the "proof" part merely gives a reference to the corresponding proof for the A-productions.

Definition 5.1: We define to be the set of all strings such that

A: C_ap I ap Bap * _ A: _ap AP Bap I

Here --apc_ and _ ap are as defined in Lemma 5.16 We define similarly the sets

, , , , and , using the c_'s and $'s from

Lemma 5.16.

It should be noted that strings in will not be in any of the sets just defined, since it is "AP" and not "" that was used on the right side, a similar comment applying to the other six categories.

Lemma 5.18: If the stack is

where __ and _ are any two characters in V t and "X" is as in the table just below, then the last match before reaching this state was at one of the productions listed to the right•

X     previous possible match

AP    A+11
AT    A+3, A+4
AE    A+6, A+7, A+8
BP    A+10, A+22
BS    A+13, A+14
BT    A+16, A+17
BE    A+19, A+20

Proof: This lemma does for the B-productions what Lemmas 4.2, 4.3 and 4.4 did for the A-productions. A case analysis similar to that given for Lemma 4.2 rules out any productions other than those specified.

As in the A-productions, we define the term operand symbol to stand for any of "I*", "AP", "AT", "AE", "BP", "BS", "BT" or "BE".

Lemma 5.19: For every instance of "_' , "*" , "+" , "_" _ '_", '_" , '_" , "(" or ")" in the stack, an instance of that same character has been scanned earlier; for every operand symbol in the stack an instance of "I" has been scanned earlier.

Proof: For the first part, see Lemma 4.1; for the second, see

Lemma 4.5.

Lemma 5.20: Let w be any string over Vp and x any string over Vt. Then starting at

A: w I x ; the action HALT will eventually be executed.

Proof: Clearly, if we are ever looking for a match at C, we will eventually scan the ";" and HALT. Further, we cannot scan past the ";" without passing A+23, since no line with a "*" in the "star" part of its action part can get a match if the top of the stack is ";". Further, we cannot cycle indefinitely between A and A+23 without scanning, because the lines without a scan will gradually change the second element until any possible match will result in a scan. Having passed A+23, we can continue to match at B only a finite number of times, since each match uses up two stack elements. Once B is passed, we either match at B+1 and have our result or get to C, which we have also seen to give the desired result.

Lemma 5.21: In the B-productions, the following hold:

(a) If a "←" is scanned, it will remain in the stack as long as scanning never gets beyond A+23.

(b) If an operator is scanned and later a "←" is scanned, the operator will still be in the stack when scanning first passes A+23, or there will be an operand symbol in the stack other than "I*".

Proof: (a) is clearly true, since none of the productions from A to A+23 can delete an instance of "←" from the stack. For (b), we note that once "←" has been scanned, the only possible matches above A+23 will leave the second element of the stack as an operand symbol other than "I*".

Lemma 5.22: If the character α is to the left of the character β in the stack, then α was scanned before β. If α or β is an operand symbol, the lemma holds for any "I" which became part of it.

Proof: See Lemma 4.9.

The following lemma embodies the principal result of this section:

If "AP" is in the stack then a string in was scanned to put it there.

Thus we will be able to conclude that the set is contained in .

Of course, similar results will hold for the other six categories.

Lemma 5.23: Any string in is also in , ... , and any string in is also in .

Proof: Let Pk be the proposition of the lemma for strings of length k. P1 is clearly true, for there are no such strings, since an operator and one or two "I"s must be scanned to get any of "AP", ... , or "BE" into the stack. We now assume P1, P2, ..., Pn-1, for some n > 1, and prove Pn. The case analysis will be abbreviated with notation similar to that used in the proof of Lemma 5.16.

Case 1: Let ap be a string of length n in . By Lemma 5.18 the last match was at A+11. Thus ap must be "(" ae ")", where ae is in . By IH ae is in and so ap is in .

Case 2: Assume at in A-T. By Lemma 5.18, last match was at A+3 or

A+4.

Case 2.1: Assume A+3. Then at is ta "*" ap, with ta in and ap in . By IH they are in and , respectively, and so at is in .

Case 2.2: Assume A+4. Then at is in , by Case 1 in , and thus in .

Case 3: Assume ae in . By Lemma 5.18 the last match was at A+6,

A+7 or A+8.

Case 3.1: A+6. Thus ae is ea "+" at. ea and at are in and

respectively, so by IH are in and . Thus ae is in .

Case 3.2: A+7. ae is "+" at, with at in . By IH at is in

, so ae is in .

Case 3.3: A+8. ae is in and by Case 2 in , and thus

also in .

Case 4: Assume bp in . By Lemma 5.18 the last match was at A+10 or A+22.

Case 4.1: A+10. Thus bp is ae "_" ea, with ae and ea in . By IH they are in , so bp is in .

Case 4.2: A+22. bp is "(" be ")", with be in . By IH be is in , and so bp is in .

Case 5: Assume bs in . By Lemma 5.18 the last match was A+13 or A+14.

Case 5.1: A+13. bs is '_" bp, with bp in . By IH bp is also in , so bs is in .

Case 5.2: A+14. bs is in , and by Case 4 in , so bs is in .

Case 6: Assume bt in . By Lemma 5.18 the last match was at A+16 or at A+17.

Case 6.1: A+16. Then bt is tb "A" bs, with tb in and bs in . By IH they are in and , respectively, so bt is in .

Case 6.2: A+17. bt is in and by Case 5 in <bs>, so bt is in .

Case 7: Assume be in . Then by Lemma 5.18 the last match was A+19 or A+20.

Case 7.1: A+19. be is eb '_/" bt, with eb in and bt in . Thus they are in and , respectively, and be is in .

Case 7.2: A+20. be is in , and thus by Case 6 in , so be is in .

Theorem 5.6: If ae is a string such that

A: +- I ae ; * = A: +- AE ; 1

then ae is in .

Proof: From Definition 5.1 it is clear that ae is in . From

Lemma 5.23 it follows then that it is in .

Theorem 5.7: If be is a string such that

A: _ I be ; * _ A: _ BE ; 1 then be is in .

Proof: Same as Theorem 5.6.

We now find it expedient to introduce three new categories:

-:= I 1 _ I

::= I

::= dis> _ ;

(The names are mnemonic for left side, right side and assignment statement, respectively.) We will show that any string in is also in . (The sets are actually equal, but we will not show that fact since we do not need it.) The purpose in introducing is the following: We must show that any string for which the productions run to a successful conclusion is in .

However, because of the way the productions are written it is easier to show that such strings are in . Our desired result then follows from the fact that strings in are also in .

There is a good reason for not defining the grammar originally in terms of instead of . The way the translation rules are set up, the left-most "←" in an assignment statement is to be translated differently from the other left arrows -- it is to become "+_J' instead of "←" in the output string. Thus we need a syntax that permits distinguishing this "←" from the others. This happens in the definition of but not in .

Lemma 5.24: Any string in is also in .

Proof: This is obvious, since any string in is also in or

, from the definition of , and such strings are also in .

Lemma 5.25: Any string in is also in .

Proof: Let x be a string in . From the definition of , we can write x as

b <- c ; where b is in and c is in . From Lemma 5.24 we know that c is in

. Now let Pk be the proposition: If b is a string of length k in and c is a string in , then the string b "←" c ";" is in . Clearly, if we can prove Pk for all k we will have the desired result.

We consider first P1. If length(b) = 1, b must be "I". Any string of the form "I" "←" ";" is in , so P1 is true. For some n > 1 we now assume that P1, P2, ..., Pn-1 are true and prove Pn. b must be in the second alternate of , so we can write b as d "←" "I" where d is in .

We can then write x as

d ← I ← c ;

The substring "I" "←" c is in , and length(d) < n, so by the inductive hypothesis we conclude that x is in .

Lemma 5.26: If rs is a string not containing the character "←" such that

A: +- I rs ; * = A: +- ; then the string rs is in .

Proof: Let Pk be the statement of the lemma for strings rs of length k. P1 is true, since rs must then be "I" to get the desired stack picture.

For some n > 1 we now assume P1, P2, ..., Pn-1 and prove Pn. The statement of the lemma assumes that when we get to A a member of is the second character of the stack. This member must be "AE", "BE" or "I*".

Case 1: The second element is "AE". Then by Theorem 5.6 rs must be in , and it is therefore in .

Case 2: The second element is "BE". Then by Theorem 5.7 rs must be in , and it is therefore in .

Case 3: The second element is "I". But we know "I" is in , and the fact that it is ambiguous in is of no consequence here.

Lemma 5.27: If x is a string over Vt such that

A: _ 1 x ; * = B+1: k I* +-RS ; I

then x can be written as y "←" z, where y is in and z is in .

Proof: Since there is a "←" in the stack at B+1, x must contain a "←", by Lemma 5.19. Thus we can write x as y "←" z. We choose to divide x at the right-most "←" (if there is more than one), so that we can be sure that there is no "←" in z. The only characters that can be in y are "I" and "←", since if there were others they would still be in the stack at B+1 (by Lemma 5.21-b). From Lemma 5.26 we can conclude that z is in , so we must now show that y is in . The first time we reach B the stack is

where w is a string over V . Clearly w is the result of having scanned y.

Either w is "I" or, to get eventually to the stack configuration desired, there must be matches at B. The first such match will delete "←" "I*" from the stack. We can conclude then from Lemma 5.22 that the last two characters of y are "←" "I". Each match at B thus corresponds to the existence of "←" "I" at the right end of y. After the last match at B, w is just "I*", so the first character of y is "I". Thus we see that y is "I" followed by some number of occurrences of "←" "I". But this is precisely the definition of

.
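The decomposition used in this proof can be illustrated with a short Python sketch. It is not part of the productions: the helper names `split_assignment` and `is_left_side` are hypothetical, the input is assumed to be already split into tokens, and the ASCII token "<-" stands in for the left arrow.

```python
def split_assignment(tokens):
    """Split a token list at the right-most "<-".

    Returns (y, z), the parts before and after that arrow,
    or None if the list contains no arrow at all."""
    if "<-" not in tokens:
        return None
    # Index of the last "<-", found by searching the reversed list.
    k = len(tokens) - 1 - tokens[::-1].index("<-")
    return tokens[:k], tokens[k + 1:]


def is_left_side(y):
    """True if y is "I" followed by zero or more "<-" "I" pairs."""
    if len(y) % 2 == 0:
        return False  # such a list always has odd length
    return all(t == ("I" if i % 2 == 0 else "<-") for i, t in enumerate(y))
```

Splitting at the right-most arrow guarantees that z contains no arrow, which is the property the proof needs before appealing to Lemma 5.26.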

Theorem 5.8: If as is a string such that

A: I" I as ; *_ H: I" I

where the last character of the output is not "ERR*", then the string as ";" is in

Proof: Since the last character output is not "ERR*" the last match was not at C, so it must have been at B+1. At that time the stack was

B+I: _ I* +- ; I

By the previous lemma, as can be written as ls "←" r, where ls is in and r is in . as ";" is therefore in <_a>, by the definition of , and it is therefore in by Lemma 5.25.

This theorem completes the essential parts of the desired proof.

As in Chapter 4, we summarize the final results in two theorems:

Theorem 5.9: Given as input any string x whose last character is ";", the B-productions will run to a successful conclusion, having produced the correct translation, if, and only if, x is a sentence in the B-language.

Proof: Assume x is in . Then by Theorem 5.5 the productions will run to a successful conclusion and produce the correct output, proving the

"if" part. Now assume that the productions run to completion. From

Theorem 5.8 we conclude that as is in , and we know then from Theorem

5.5 that the correct output was produced. This completes the "only if" part of the proof.

Theorem 5.10: Given as input any string x whose last character is ";", the B-productions will terminate with an error if, and only if, x is not a sentence in the B-language.

Proof: We note that the productions will surely terminate, since the last character is ";", by Lemma 5.20. The result then follows from the preceding theorem.

[Table: PRODUCTIONS FOR THE B-GRAMMAR]

CHAPTER 6 - SOME EQUIVALENT SETS OF PRODUCTIONS

In this chapter, a set of productions using the COMPILE action is shown to be equivalent to the A-productions.

In both the A-productions and the B-productions there are three or four productions for each operator in the language. It is clear that for a language like ALGOL, where there are many more operators, the number of productions would become quite large. It is thus worthwhile to investigate techniques to reduce the number of productions needed. As was suggested briefly in Chapter 2, assigning hierarchy values to the operators and taking advantage of the COMPILE operation in the productions is one possible technique.

There is another reason for developing techniques to show the equivalence of a set of productions using COMPILE with a set of productions not using that action. The productions used in the Carnegie Tech ALGOL translator are written using COMPILE. If a proof is ever to be constructed for these productions, such techniques will be needed.

We will again consider the A-productions. In the development that follows we will transform these productions, step by step, to a set of productions which uses the COMPILE action. As each new set of productions is introduced, we will show briefly that it is equivalent to the old set. Here "equivalent" means that both sets of productions accept the same set of strings, both reject the same set, and both produce the same output for legal input.

We will not actually prove these assertions but will, instead, give "convincing arguments" for them. The reader may find himself more easily convinced if he studies the sample runs from these productions as given in the appendix.

For the first transformation, we ask why we need the distinct symbols "P", "T" and "E". Suppose we were to change the productions so that each "P" and each "T" were replaced by "E". We would then delete C+3 and C+6 as being useless and we would have the productions shown on the next page, which we will call the A1-productions. Interestingly enough, these productions can be shown to be equivalent to the A-productions.

Further, the A1-productions are faster, since they do not need to change each "P" to "T" and then to "E".

We now consider the premise that the A-productions and the A1-productions are equivalent. In Chapter 4, in the "only if" proof, equations resembling the original BNF syntax were deduced from the productions. For the A1-productions, it should be clear that the corresponding equations are the following, which we will refer to as the A1-grammar:

::= I _ ;

":= + 1 * I ( ) I I

It is, of course, clear that this grammar is ambiguous, since strings such as "I + I + I" are in in more than one way. On the other hand, the set of strings in this grammar is precisely the same set of strings that is in the A-grammar. Although an inductive proof of this assertion could easily be constructed, we will content ourselves with the following remark:

In both A-language and A1-language the strings in are of the form operand-operator-operand-...-operand, where an operand is either "I" or a member of in parentheses and an operator is "+" or "*". The reader should satisfy himself of the validity of this characterization of strings of since it will play a fundamental part in much of the following discussion. Inherent in the operand-operator alternation is the following: "I" and ")" are always followed by an operator or a ")", and an operator or "(" is always followed by an operand.

[Table: THE A1-PRODUCTIONS. All "P" and "T" replaced by "E".]
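The ambiguity of the A1-grammar noted above for strings such as "I + I + I" can be checked mechanically. The following Python sketch counts parse trees under one reading of the expression rule of that grammar, namely E ::= E + E | E * E | ( E ) | I. Both the category name E and the function name `count_parses` are assumptions made for illustration.

```python
from functools import lru_cache


def count_parses(s: str) -> int:
    """Count the parse trees of s under the assumed rule
    E ::= E + E | E * E | ( E ) | I."""
    s = s.replace(" ", "")

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of ways to derive the substring s[i:j] from E.
        if j - i == 1:
            return 1 if s[i] == "I" else 0
        total = 0
        # Alternate "( E )": the whole substring is one parenthesized E.
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)
        # Alternates "E + E" and "E * E": try each operator as the root.
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s)) if s else 0
```

A count greater than one means the string is ambiguous under this reading; a count of zero means it is not in the category at all. Note that ambiguity of the grammar does not change the set of strings it generates, which is the point of the remark above.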

Let us now look again at the A1-productions. In the light of the foregoing discussion it should be clear that these productions accept the same set of strings accepted by the A-productions, since the characterization just given applies to both sets. What we must also show is that both sets of productions will produce the same output for legal input. From an examination of the productions it appears that any possible difference in translation would arise from the A1-productions outputting an operator earlier than the A-productions would. This would happen because the A-productions would fail to get a match because the wrong operand symbol was in the stack. We show this cannot happen.

When "I" is scanned, it immediately becomes "P". We ignore the possibility that the next character is "(", since then the input string is not legal, and observe that before another character can be scanned, the "P" will become a "T" from a match at C+2 or C+3. Thus, there can never be a "P" deeper than the second stack element. Next we observe that a "T" can get deeper than the second element only if it is followed by "*". Further, an "E" will never be just below "*" in the stack. Thus, if we could get a match at C+2 in the A1-productions, it would also be possible to get a match at C+2 in the A-productions. Similarly, we can argue that "E" deep in the stack will always have "+" immediately above it, so that matches at C+5 in the A-productions will correspond to matches at C+4 in the A1-productions.

We now change the productions again, calling the new version the A2-productions. This time we take advantage of the operand-operator nature of the language. Consider the A1-productions. If there has just been a match at C or at C+6, the next character must be an operator. Thus we can never get a match at C or at C+1 (for a legal input string), so changing the next part of C and C+6 to C+2 would not affect the operation of the productions or the set of strings accepted. Carrying on this line of reasoning leads to the idea of having two different places in the productions to go to: one to go to when an operand is expected and one to go to when an operator is expected. The result is the A2-productions, shown on the next page. We go to C expecting an operand and to CA expecting an operator.

The reader should satisfy himself that the links in the next part correspond to the operand-operator alternation discussed earlier. Thus it is clear that these productions are equivalent to the preceding ones.
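The two-entry-point idea can be sketched as a small two-state checker in Python. This is an illustration of the operand-operator alternation only, not a transcription of the A2-productions: the function name `alternates_properly` is hypothetical, the state names "C" and "CA" merely borrow the labels from the text, and the sketch covers the expression part of a statement without the leading "I", the arrow, or the final ";".

```python
def alternates_properly(s: str) -> bool:
    """Check that operands and operators alternate properly in s.

    State "C": an operand is expected ("I" or an opening parenthesis).
    State "CA": an operator is expected ("+", "*" or a closing parenthesis).
    """
    state = "C"
    depth = 0  # number of currently open parentheses
    for ch in s.replace(" ", ""):
        if state == "C":
            if ch == "I":
                state = "CA"      # operand seen; an operator must follow
            elif ch == "(":
                depth += 1        # still expecting an operand
            else:
                return False
        else:
            if ch in "+*":
                state = "C"       # operator seen; an operand must follow
            elif ch == ")" and depth > 0:
                depth -= 1        # still expecting an operator
            else:
                return False
    # A legal string ends just after an operand with all parentheses closed.
    return state == "CA" and depth == 0
```

The transitions mirror the characterization given earlier: "I" and ")" are followed by an operator or a ")", while an operator or "(" is followed by an operand.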

We now make one more change. In the A1-productions, the "E"s in the stack serve to ensure that operators and operands alternate properly. In the A2-productions, this function is met by scanning at C or at CA. Thus there is no need to keep the "E"s in the stack. Further, to be consistent, we also eliminate the "I" scanned at A. The result is the A3-productions.

Again, it can be shown, though it should already be clear to the reader, that the A2-productions and the A3-productions are equivalent.

THE A2-PRODUCTIONS

[Production listing. Header: ALL 'P' AND 'T' REPLACED BY 'E'; OPERATOR AND OPERAND SCAN SEPARATED. Entry productions at A and B; MAIN LOOP -- OPERAND EXPECTED, at C; MAIN LOOP -- OPERATOR EXPECTED, at CA; NOW CLEAN UP FOR END OF STATEMENT, ending at H with HALT; IF SCANNING GETS THIS FAR, THE STRING IS INVALID, at D with OUTPUT ERR.]

THE A3-PRODUCTIONS

[Production listing. Header: 'P', 'T' AND 'E' NOT USED; OPERATOR AND OPERAND SCAN SEPARATED. Entry productions at A and B; MAIN LOOP -- OPERAND EXPECTED, at C; MAIN LOOP -- OPERATOR EXPECTED, at CA; NOW CLEAN UP FOR END OF STATEMENT, ending at H with HALT; IF SCANNING GETS THIS FAR, THE STRING IS INVALID, at D with OUTPUT ERR.]

Before going on to the last step the reader may find it useful to reread the definition of the COMPILE action in Chapter 2, Definition 2.8.

The effect of the four productions from CA to CA+3 can be had from one production using COMPILE. A little extra effort is needed then to take proper care of parentheses and ";", and the result is the A4-productions.

For both the A3- and A4-productions note what gets done on arrival at CA before going back to C: If the second element of the stack is "*", it is outputted and deleted; while if the second element is "+", it is outputted and deleted only if the first element is not "*". Parentheses, as in all of these productions, are taken care of by being sure that what was between them is legal and then simultaneously deleting the left and right parentheses from the stack. The reader should satisfy himself that these two sets of productions are equivalent.
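The rule just stated for arrival at CA can be paraphrased outside production language. The following Python sketch is not the A3-productions themselves: it assumes single-character tokens, treats any alphanumeric token as an operand, and uses hypothetical names (`translate_a3`, `expect_operand`). Like the A3-productions, it keeps only operators and left parentheses on the stack and alternates between an operand-expected state (C) and an operator-expected state (CA).

```python
def translate_a3(tokens):
    """Translate an infix token sequence ending in ';' to postfix.

    Illustrative paraphrase only; names and token conventions are assumptions.
    """
    out, stack = [], []
    expect_operand = True              # True mirrors C, False mirrors CA
    for tok in tokens:
        if expect_operand:
            if tok == "(":
                stack.append(tok)      # stay in the operand-expected state
            elif tok.isalnum():
                out.append(tok)        # operands are output at once
                expect_operand = False
            else:
                raise ValueError(f"operand expected, found {tok!r}")
            continue
        if tok not in ("+", "*", ")", ";"):
            raise ValueError(f"operator expected, found {tok!r}")
        # A stacked "*" is always output; a stacked "+" is output
        # unless the symbol just scanned is "*".
        while stack and (stack[-1] == "*" or (stack[-1] == "+" and tok != "*")):
            out.append(stack.pop())
        if tok == ")":
            if not stack or stack[-1] != "(":
                raise ValueError("unmatched ')'")
            stack.pop()                # delete "(" and ")" together
        elif tok == ";":
            if stack:
                raise ValueError("unmatched '('")
            return out
        else:
            stack.append(tok)
            expect_operand = True
    raise ValueError("missing ';'")
```

For instance, `translate_a3(list("I+I*I;"))` holds the "+" on the stack until the "*" has been output, which is the behaviour the paragraph above describes.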

The original purpose of undertaking this series of transformations was to reduce the number of productions needed. However, we seem to have made no improvement, having started and ended with sixteen productions. On the other hand, what we really wanted was a technique for economy of productions in more sophisticated languages. Suppose we wish to provide also for the operators "-", both unary and binary, and "/" and "↑". We would have to add ten productions to the A-productions, three for each of "/" and "↑" and four for "-" (the extra taking care of unary minus). In the A4-productions we would only need one production, to take care of the unary minus. The three binary operations are taken care of merely by assigning suitable hierarchy values and including the operators in the definition of the operator metacharacter. Here clearly there is an economy.

THE A4-PRODUCTIONS

[Production listing. Header: 'P', 'T' AND 'E' NOT USED; OPERATOR AND OPERAND SCAN SEPARATED; COMPILE ACTION USED. Entry productions at A; MAIN LOOP -- OPERAND EXPECTED, at C; MAIN LOOP -- OPERATOR EXPECTED, at CA, where the first three productions each invoke COMPILE, the ")" case continuing at CB and the ";" case at CC; NOW CLEAN UP FOR END OF STATEMENT, at CC, ending at H with HALT; IF SCANNING GETS THIS FAR, THE STRING IS INVALID, at D with OUTPUT ERR.]
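The economy claimed for the A4-productions can be illustrated with a table-driven sketch. Definition 2.8 is not reproduced in this chapter, so the behaviour given to `compile_action` below is an assumption: it outputs stacked operators whose hierarchy is at least that of the incoming symbol. The hierarchy values, the function names, and the use of "^" for "↑" are likewise hypothetical, and unary minus is left out.

```python
# Hypothetical hierarchy values; "^" stands in for the up-arrow operator.
HIERARCHY = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}


def compile_action(stack, out, incoming):
    """Assumed COMPILE behaviour: output every stacked operator whose
    hierarchy is at least that of the incoming symbol."""
    level = HIERARCHY.get(incoming, 0)     # ")" and ";" rank below all operators
    while stack and stack[-1] in HIERARCHY and HIERARCHY[stack[-1]] >= level:
        out.append(stack.pop())


def translate_a4(tokens):
    """Translate an infix token sequence ending in ';' to postfix."""
    out, stack = [], []
    expect_operand = True                  # True mirrors C, False mirrors CA
    for tok in tokens:
        if expect_operand:
            if tok == "(":
                stack.append(tok)
            elif tok.isalnum():
                out.append(tok)
                expect_operand = False
            else:
                raise ValueError(f"operand expected, found {tok!r}")
            continue
        if tok not in HIERARCHY and tok not in (")", ";"):
            raise ValueError(f"operator expected, found {tok!r}")
        compile_action(stack, out, tok)
        if tok == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            stack.pop()                    # what remains on top is the matching "("
        elif tok == ";":
            if stack:
                raise ValueError("unmatched '('")
            return out
        else:
            stack.append(tok)
            expect_operand = True
    raise ValueError("missing ';'")
```

Adding a further binary operator to this sketch means adding one entry to `HIERARCHY`; the control flow is untouched, which is the kind of saving described above.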

A few comments are in order about the B-productions. If we were to try naively to go from B- to B4-productions in one step, thinking that the technique has already been proved, we would be in trouble. Because B-language includes both arithmetic and Boolean operators, more than just a hierarchy compare is needed. Otherwise, we would accept as legal such constructions as

I ← I + I ∧ I ;

The comment was made in the introduction that one of the reasons for not treating in this work the productions used in the Carnegie Tech ALGOL Translator was that these productions are incorrect. The most important problem is in precisely this area: statements such as the above are accepted. We will not pursue this point further here other than to make two comments: at the time the translator was written, this deficiency was not noticed; and since then good use has been made of this type of construction. We will return to this discussion in Chapter 7.

CHAPTER 7 - SUMMARY AND CONCLUSION

In this chapter we summarize the work done, relate it to other work published and make suggestions for future work.

Section 1: Other proofs published

Section 2: BNF Grammars

Section 3: Productions

Section 4: Future lines of research -- grammars

Section 5: Future lines of research -- productions


Section 1: Other proofs published. In the introduction, mention was made that there has been very little published material on proofs of computer algorithms. The key word in the preceding sentence is "computer", since there is an extensive literature on the formal properties of algorithms. For example, Markov (1954) has published an entire book, "Theory of Algorithms", in which he develops a notation for algorithms, proves many theorems about algorithms expressed in his notation, and exhibits many algorithms with proofs of their correctness. (It is a somewhat interesting fact that the present production notation bears a noticeable resemblance to the notation of a Markov algorithm.) However, Markov's work is directed towards the study of unsolvability problems rather than towards problems of computer programming.

McCarthy (1962, 1963, 1964) has been engaged in the development of what he calls a "theory of computation", in which it is both convenient and natural to state algorithms in a computer language which can then be executed usefully on a computer. The programming language LISP (McCarthy, et al, 1962) is closely related to the mathematical notation of S-expressions, which are equivalent in computational power to Turing machines.

Cooper (1965), building on McCarthy's theory, has exhibited several proofs. He gives three algorithms, written in ALGOL, for the factorial function, and shows that they are equivalent. Then, using the techniques he has developed, he shows the equivalence of two algorithms for reversing the order of symbols on a list and of two algorithms for approximating a definite integral. Cooper uses McCarthy's recursion induction technique, which is suitable only for algorithms expressed by certain kinds of recursive definitions. For algorithms which are not so expressed, it is necessary first to restate the algorithm in the required manner.

Perlis (1963) has proven three numeric algorithms: computation of the square root, finding a zero of a polynomial by bisection, and matrix inversion. The algorithms are exhibited as flow charts. Certain predicates are stated about the problem variables, and it is shown that these predicates remain true as the flow chart is traversed. It is shown further that the process terminates, providing certain initial conditions of the problem variables are met. These techniques have helped to motivate the present work.

In the area of translation, Oettinger (1961) has proved the correctness of three formal translation schemes. The languages he uses are simpler than the A-language of the present work, but his proofs are quite different. Although the algorithms he uses are similar in operation to those used in this work (they use a single stack for storage), the notation he uses to express his algorithms is much different, being oriented more towards mathematics, which makes it easier to state and prove theorems but harder to express algorithms. It would be quite awkward to express an algorithm of the complexity of the B-productions in Oettinger's notation. Nonetheless, his work is an important contribution and his techniques helped to motivate the present work. Oettinger's work has been continued by some of his students (Oettinger, personal communication), but the results of this research are not now available to this author.

The work of London (1964) relates more closely to the present work. London has a computer program which, given as input a BNF definition of a single category, first constructs a recognizer for the language and then tries to construct a proof that the recognizer is correct. The recognition techniques are essentially different from those given here, being based on permissible character pairs of the input string and on parentheses counts. A highly desirable result would be to incorporate recognition techniques based on productions into London's program.

Even more closely related to the present work is that of Earley (1965). The Earley algorithm is given a BNF description of a language as input, and it outputs a set of productions for recognizing legal strings in the language. Earley has proved that his algorithm will produce productions which correctly recognize their input, thus having a proof which, in some sense, is one level higher than the present proofs. The output of the Earley algorithm is a production program which is less efficient (in that it requires more steps to recognize a string) than the production programs here exhibited. Further, only recognition is done, no output being produced by the productions. Clearly a fruitful line of research will be to modify Earley's algorithm in two ways: to produce more efficient output and to accept as input translation rules (perhaps of the type given here) so that the productions can output a translation.

Section 2: BNF Grammars. Our notation for BNF grammars and our definition of unambiguity represent a distinct point of view. The difference between our approach and that usually used by mathematical linguists arises from the different purpose each has in studying grammars. While the linguist is interested in studying the structure of grammars because of their relation to natural languages or for their own sake, our interest in grammars arises from their relation to the problem of constructing correct translators.

While it may seem natural to a linguist to regard a grammar as a device or scheme for generating the strings of a language, it seemed more appropriate for this work to regard a grammar as the set of legal strings. The problem in translation is to determine if a given string is legal and then to give its translation. It seems more natural to regard that question as asking whether the string is in a set than to regard it as asking whether a scheme can generate the string.

One other aspect of our definition is worthy of comment. The BNF notation, whether it be called Backus Normal Form or Backus Naur Form, has achieved wide acceptance among those interested in the design of computer languages and in the construction of translators for such languages. Thus it seemed to be worth the extra complication involved to define BNF grammar using precisely the notation which has achieved popularity, instead of using notation, as linguists frequently do, in which each alternate is a separate definition and in which no alternate (or definition) has more than two elements in it. While it is true that such notation defines the same class of languages, we felt it more appropriate to retain the form programmers are used to.

Our definition of ambiguity clearly has advantages if one is interested in proving grammars unambiguous. The important point is that, to a large extent, categories can be proven unambiguous merely by their form, without regard to the nature of the categories appearing in the definition. The concept of brackets and of balanced strings plays a fundamental part, both in the unambiguity proof and in the proof of correctness of the productions. As indicated in Section 4 below, this concept can be expanded in obvious ways to handle procedure calls or subscripted variables, and in not-so-obvious ways to handle the ALGOL "if ... then ... else ..." construction.

Section 3: Productions. As has been mentioned earlier, the production language was devised to answer a specific need: the writing of part of an ALGOL translator. Its characteristics arise from programming considerations -- linguistic or logical considerations had little part in its design. When we became interested in the problem of proving algorithms, it immediately became clear that production language has advantages not thought of in designing it. A big problem in proving algorithms -- indeed, possibly the biggest problem -- is to find a suitable notation for expressing the algorithm. Usually, the more convenient a language is for people to use, the less convenient it turns out to be when the problem is to talk about (i.e., prove theorems about) programs in the language. Production language is natural to use for writing translators, and it is convenient, as we have seen, to state and prove theorems about programs written in production language. When this was realized, the present research naturally suggested itself.

An interesting question is to ask what it is about production language that gives it these desirable properties. For one thing, each line of code "does a lot" for the programmer, so that not very many lines of code are needed to express rather complex algorithms. As we have seen, the complexity of the proofs increases quite rapidly as the number of lines goes up. In a less powerful language where more lines of code are needed to express a given algorithm, the proof procedures would be even more tedious.

Section 4: Future lines of research -- grammars. An obvious area of research is to ask in what ways our results on unambiguity can be extended. We seem to be very close to providing some useful sufficient conditions for a grammar to be unambiguous. It is known that the general problem of determining whether or not a grammar (of the type we have been considering) is ambiguous is unsolvable, for if it were solvable, then the word problem for semi-groups would be also. See, for example, Floyd (1962) or Cantor (1962). On the other hand, sufficient conditions that are not too hard to test for and that are useful in practice should not be too hard to get. Lynch (1963) has given sufficient conditions, but he does not claim that they are useful. The test case that comes immediately to mind is ALGOL-60. If the ambiguity arising from the fact that so many constructs may be <empty> is removed, there is good reason to believe that the ALGOL grammar is unambiguous.

It would certainly be interesting to try our techniques on ALGOL. It is clear that the class <empty> will cause problems, and it might well be necessary to rework the syntax so that it was not used. This should be possible, since the class <empty> seems to be used in no essential way but is introduced only to help economize the number of syntax equations. Of course, this conjecture must be tested.

To handle the "if ... then ... else ..." construction, the bracketing technique could be expanded. Consider a grammar with two sets of brackets:

(1 and )1        (2 and )2

Next, replace each "if" by "(1", each "then" by ")1" "(2", and each "else" by ")2". Then a string has its "if"s, "then"s and "else"s matched up properly if it is balanced with respect to both sets of brackets. "begin" and "end" could be looked on as a bracket pair, as could "for" and "do". Indeed, it appears that most of the reserved words in ALGOL (other than those used in declarations) could well be handled with the bracketing technique.
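The bracket substitution can be tried out mechanically. The sketch below is an illustration under stated assumptions: the formal definition of a balanced string is given in an earlier chapter and is approximated here by a simple depth count per bracket pair, and the names `to_brackets`, `balanced` and `well_matched` are hypothetical.

```python
# Substitution from the text: if -> (1, then -> )1 (2, else -> )2.
SUBSTITUTION = {"if": ["(1"], "then": [")1", "(2"], "else": [")2"]}


def to_brackets(words):
    """Replace each reserved word by its bracket symbols; keep other words."""
    symbols = []
    for word in words:
        symbols.extend(SUBSTITUTION.get(word, [word]))
    return symbols


def balanced(symbols, left, right):
    """Depth-count test for one bracket pair (an assumed reading of 'balanced')."""
    depth = 0
    for symbol in symbols:
        if symbol == left:
            depth += 1
        elif symbol == right:
            depth -= 1
            if depth < 0:          # a right bracket with no left bracket before it
                return False
    return depth == 0


def well_matched(words):
    """True if the string is balanced with respect to both sets of brackets."""
    symbols = to_brackets(words)
    return balanced(symbols, "(1", ")1") and balanced(symbols, "(2", ")2")
```

Under this reading, `well_matched("if b then x else y".split())` holds, while a conditional with no "else" leaves "(2" open and is rejected; whether the thesis intends that stricter outcome is not settled by the passage above.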

Section 5: Future lines of research -- productions. An outstanding unanswered question in connection with productions is this: How can we characterize the set of languages which can be recognized by productions? We know that there are BNF grammars which cannot be recognized by our productions, such as

<palindrome> ::= 0 | 1 | 0 <palindrome> 0 | 1 <palindrome> 1

This is the palindrome with unmarked center. The productions would have no way to tell when the center was reached. If the production notation were expanded to permit backward scanning, this problem would become easy.
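To see why access to both ends of the string removes the difficulty, consider the following sketch. It is not an extension of production language; it is a Python illustration in which the index `right` plays the part of a backward scan, and the function name is hypothetical. It accepts the strings over "0" and "1" that read the same in both directions and have odd length, which is the language of the grammar above.

```python
def is_unmarked_center_palindrome(text):
    """Recognize odd-length palindromes over the characters '0' and '1'."""
    if len(text) % 2 == 0:                 # the grammar derives only odd lengths
        return False
    if any(ch not in "01" for ch in text):
        return False
    left, right = 0, len(text) - 1
    while left < right:                    # the two scans meet at the center
        if text[left] != text[right]:
            return False
        left += 1                          # forward scan
        right -= 1                         # backward scan
    return True
```

The center never has to be guessed: it is simply the place where the forward and backward scans meet.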

A definitely important problem is to attempt to prove the productions used in the Carnegie Tech ALGOL Translator. Since, as has been mentioned, these productions do not recognize true ALGOL, the first part of the problem would be to define the BNF language which the productions are supposed to translate. Then translation rules would have to be developed, after which it might be possible to construct a proof. In this connection, it should be noted that the transition in Chapter 6 from the A-productions to the A4-productions could have been done in the reverse order. The easiest thing to do with the ALGOL productions might be to transform them into productions without COMPILE. The interested reader might refer to Evans (1963), where a complete set of productions for ALGOL is given.

One technique which should be considered is to ask the computer to help with the proof. The human being would still do the intellectual work involved in constructing the proof, but the computer might be used for the detailed case analysis. Anyone with very many theorems to prove which were similar to or harder than Lemma 5.16 or Lemma 5.23 might well find it worthwhile to ask the computer to carry out the details of the case analysis. The work of London (1964) is related.

Appendix

One of the important aspects of the research reported here is that the algorithms which have been discussed are written in a language that is acceptable to the computer. As was suggested in Section 2.1, production language was created for the express purpose of writing for the computer that algorithm which is the first part of the existing ALGOL translator at the Carnegie Tech Computation Center. A program called QWERT (mnemonic for nothing whatsoever) has been written to convert programs written in Floyd-Evans Production Language from the form which the user writes to the form suitable to the ALGOL translator. QWERT has the additional ability of simulating the operation of that part of ALGOL which converts the user's code to postfix. This part of QWERT was written to facilitate the process of debugging production programs, and so it includes trace abilities.

The following pages contain some sample outputs from QWERT. The first two pages give a complete listing of a run to load the tables for the B-grammar. First the productions are loaded. Most of the production table will look familiar, but there are four columns of numbers which are new. The numbers just to the right of the next part are sequence numbers punched on the input cards. The next column counts the position in the production table. From each line to the next, the column increases by one more than the number of elements in the left stack picture. The next column contains the location in an auxiliary table, called the interpretation list, where the interpretive code is stored which will cause the required actions to be done.
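The rule for the position column amounts to a running sum. The sketch below restates it in Python; the function name, the starting position of zero, and the sample stack pictures are assumptions made for illustration and are not taken from the listings.

```python
def table_positions(left_stack_pictures, start=0):
    """Return the production-table position of each line.

    Each line advances the position by one more than the number of
    elements in its left stack picture.
    """
    positions = []
    position = start
    for picture in left_stack_pictures:
        positions.append(position)
        position += len(picture) + 1
    return positions


# Hypothetical left stack pictures with 1, 1, 2 and 1 elements.
sample = [["I"], ["<SG>"], ["(", ")"], ["<SG>"]]
print(table_positions(sample))
```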

The interpretation list is shown just below the middle of the page. Note, for example, line A+3 in the productions. The link to the interpretation list is to 4105. Starting at 4105 we have the instructions

NSTK 4     unstack the top four stack elements
STAK AT    stack the character "AT"
STAK 0     stack the element that was at the top of the stack before the transformation
OUTP *     output the character "*"
NEXT A     consider next production A

(In the second STAK command, the zero does not print, because of an anomaly in the print routines.) The reader can check that the interpretation list contains the proper code to effect the actions desired.
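The five instructions at 4105 can be traced with a small interpreter. What follows is a sketch built only from the glosses given above, not a description of QWERT or of the translator: the function name, the encoding of instructions as pairs, and the sample stack contents are all hypothetical, and only the operand 0 of STAK is given the special meaning stated in the text.

```python
def run_interpretation(code, stack, output):
    """Execute NSTK/STAK/OUTP/NEXT instructions; return the NEXT label."""
    top_before = stack[-1] if stack else None   # top element before the transformation
    for operation, operand in code:
        if operation == "NSTK":
            count = int(operand)
            if count > len(stack):
                raise ValueError("stack underflow")
            if count:
                del stack[-count:]               # unstack the top `count` elements
        elif operation == "STAK":
            # Operand "0" restacks the element saved from the old stack top.
            stack.append(top_before if operand == "0" else operand)
        elif operation == "OUTP":
            output.append(operand)
        elif operation == "NEXT":
            return operand                       # label of the production to try next
        else:
            raise ValueError(f"unknown instruction {operation!r}")
    return None


# The code starting at 4105, as listed above.
CODE_AT_4105 = [("NSTK", "4"), ("STAK", "AT"), ("STAK", "0"),
                ("OUTP", "*"), ("NEXT", "A")]
```

Applied to a hypothetical stack whose top four elements are "AT", "*", "AP" and ";", the code leaves "AT" and ";" on top, outputs "*", and returns the label "A".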

The table at the bottom of the page is a symbol table of the label identifiers used. It contains both decimal and octal equivalents. Label identifiers used as A-labels have equivalents greater than 4095.

On the next page we see Table 2 and Table 3. Table 2 contains the characters used in the productions. As we will see in the A4-productions, it can also contain the hierarchies. The cards the user supplied are labeled 1 to 23 on the left -- the rest of the lines are produced by QWERT.

Below the middle of the page we have Table 3, the metacharacter table. (QWERT uses the term "meta-variable" instead of "metacharacter".) The cards as the user prepared them are first listed, and then the table as QWERT sees it is listed.

At the bottom of the page and on the next three pages we have traces of the translation of four statements. The first is the very simple "A ← + B ;". (The line just above the statement tells QWERT to enter a debug mode and to print the trace information.) Each line of the trace gives a stack picture followed by certain information. The two columns to the right of the stack picture indicate on what production the last match was made. The next column, when present, indicates a transfer to the production named. The next column indicates the action just executed, and the next two columns are characters scanned and output, respectively.

A line is printed each time SCAN, OUTP or NEXT is executed. The reader should have no trouble following the action of the productions.

Next we have a complete run of loading the A4-productions. These productions were selected to show how hierarchy values are input to the system in Table 2. Three sample inputs are then shown, one of which is not in the language.

The next two pages give the two legal statements as processed by the A-productions, and the last three pages give these two statements as processed by the A1-, A2- and A3-productions, so the reader may compare the operation of the five sets of productions.

A further description of productions is given in Evans (1963), and the interested reader is referred to that paper.

The B-Productions: Table 1

[QWERT listing dated 29 JUL 65, pages 001 to 003. TABLE 1 - PRODUCTIONS: the productions for the B-grammar, with the comment lines THE RIGHT SIDE HAS BEEN TRANSLATED. NOW TERMINATE. and IF WE SCAN THIS FAR, THE INPUT STRING IS INVALID. INTERPRETATION LIST: interpretive code starting at location 4096, built from the instructions NSTK, STAK, OUTP, SCAN, NEXT and HALT. SYMBOL TABLE FOR PRODUCTIONS AND IL. The listing closes with TABLE 1 LOADED CORRECTLY.]

The B-Productions: Table 2 and Table 3

[QWERT listing dated 29 JUL 65, pages 004 to 006. TABLE 2: CHARACTERS AND HIERARCHIES, with the user's cards numbered 1 to 23, the internal symbols, and the HIERARCHY TABLE, closing with TABLE 2 LOADED CORRECTLY. TABLE 3: META-VARIABLES, including <AP>, <AT>, <BP>, <BS> and <BE>, listed first as the user's cards and then under THE TABLE AS LOADED, closing with TABLE 3 LOADED CORRECTLY. The page ends with the start of the test run, ENTER ALGOL TRANSLATOR, and the trace of the first sample statement.]

The B-Productions: Trial Run

[QWERT trace dated 29 JUL 65, page 009: the translation of the second sample statement, one line per SCAN, OUTP or NEXT. The trace closes with END OF TRANSLATION, NO ERRORS DETECTED.]

The B-Productions: Trial Runs

[QWERT traces dated 29 JUL 65, pages 011 and 013. The first input is announced on its card as an illegal string; its trace ends with OUTP ERR and closes with END OF TRANSLATION, STACK NOT EMPTY, 1 ERROR DETECTED. The second trace begins the translation of the fourth sample statement, a longer statement with nested parentheses.]

The B-Productions: Trial Run

[QWERT trace, continued: the remainder of the translation of the fourth sample statement, closing with END OF TRANSLATION, NO ERRORS DETECTED, followed by the end-of-run accounting lines.]

143. The A4-Productions Table 1 Table 2

THE A4-PRODUCTIONS

[Production listing illegible in this copy. The comment lines that head its sections remain legible:]

MAIN LOOP -- OPERAND EXPECTED

MAIN LOOP -- OPERATOR EXPECTED

NOW CLEAN UP FOR END OF STATEMENT

IF SCANNING GETS THIS FAR, THE STRING IS INVALID.

INTERPRETATION LIST 29 JUL 65 PAGE 032

[Interpretation list illegible in this copy.]

SYMBOL TABLE FOR PRODUCTIONS AND IL - PAGE 1 29 JUL 65 PAGE 033

[Symbol table entries illegible in this copy.]

TABLE 1 LOADED CORRECTLY

TABLE OF CHARACTERS AND HIERARCHIES PAGE 1 29 JUL 65 PAGE 034

[Table entries illegible in this copy.]

144. The A4-Productions

Table 2 (cont'd.) Table 3 Trial Run

TABLE OF CHARACTERS AND HIERARCHIES PAGE 2 29 JUL 65 PAGE 035

[Table entries illegible in this copy.]

TABLE 2 LOADED CORRECTLY

TABLE OF META-VARIABLES 29 JUL 65 PAGE 036

THE TABLE AS LOADED

<OP> + •

TABLE 3 LOADED CORRECTLY

29 JUL 65 PAGE 040

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

145. The A4-Productions Trial Run

29 JUL 65 PAGE 041

AL A ← (((B+C)*D)+E);

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

29 JUL 65 PAGE 044

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

COM THIS IS AN ILLEGAL STRING

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy; it ends with an error output.]

END OF TRANSLATION

1 ERROR DETECTED

146. The A-Productions Trial Run

29 JUL 65 PAGE 010

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

END OF TRANSLATION NO ERRORS DETECTED

147. The A-Productions Trial Run

29 JUL 65 PAGE 011

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

AL A ← (((B+C)*D)+E);

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

END OF TRANSLATION NO ERRORS DETECTED

148. The A1-Productions Trial Run

29 JUL 65 PAGE 010

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

29 JUL 65 PAGE 011

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

AL A ← (((B+C)*D)+E);

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

149. The A2-Productions Trial Run

29 JUL 65 PAGE 010

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

29 JUL 65 PAGE 011

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

AL A ← (((B+C)*D)+E);

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

150. The A3-Productions

Trial Run

29 JUL 65 PAGE 025

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

29 JUL 65 PAGE 026

RUNNING TIME: [illegible] ENTER ALGOL TRANSLATOR

AL A ← (((B+C)*D)+E);

[Trace listing (SCAN, NEXT and OUTP steps) illegible in this copy.]

151. Bibliography

Bar-Hillel, Y.; Perles, M.; and Shamir, E. (1961), "On formal properties of simple phrase structure grammars," reprinted in Language and Information, by Y. Bar-Hillel, Addison-Wesley Publishing Company, Inc. 1964.

Cantor, D. G. (1962), "On the ambiguity problem of Backus systems," Journal ACM 9 No 4 (Oct 62) pp 477-479.

Chomsky, N. (1959), "On certain formal properties of grammars," Inf and Control 2 (1959) 137-167.

Cooper, D. C. (1965), "The equivalence of certain computations," Computation Center, Carnegie Institute of Technology.

Earley, Jay. (1965), "Generating a recognizer for a Backus normal form grammar," paper to be published.

Eickel, J., Paul, M., Bauer, F. L., and Samelson, K. (1963), "A syntax controlled generator of formal language processors," CACM 6 (Aug 63) 451-455.

Evans, A. (1963), "An ALGOL-60 compiler," talk delivered at 18th Annual Meeting of ACM at Denver, Colorado, August 1963, reprinted in Annual Review in Automatic Programming, R. Goodman, ed., Pergamon Press, London 1964 pp 87-124.

Feldman, J. A. (1964), "A formal semantics for computer oriented languages," Ph.D. thesis, Carnegie Institute of Technology.

Floyd, R. W. (1961-a), "An algorithm for coding efficient arithmetic operations," CACM 4, 1 (Jan 61) 42-51.

Floyd, R. W. (1961-b), "A descriptive language for symbol manipulation," JACM 8 No 4 (Oct 61) 579-584.

Floyd, R. W. (1961-c), "A note on mathematical induction on phrase structure grammars," Inf and Control, 4 No 4 (Dec 1961) pp 353-358.

Floyd, R. W. (1962), "On ambiguity in phrase structure languages," Comm ACM 5 No 10 (Oct 62) pp 526.

Floyd, R. W. (1963), "Syntactic analysis and operator precedence," Journal ACM 10 No 3 (July 1963) pp 316-333.

Floyd, R. W. (1964-a), "Bounded context syntactic analysis," Comm ACM 7 No 2 (Feb 64) pp 62-67.

Floyd, R. W. (1964-b), "The syntax of programming languages - a survey," IEEE Trans Elect. Comp. Vol EC-13, No 4 Aug 1964, pp 346-353.

Gorn, S. (1963), "Detection of generative ambiguities in context-free mechanical languages," Journal ACM 10, 2 (April 1963) pp. 196-208.

Irons, E. T. (1961), "A syntax directed compiler for ALGOL-60," CACM 4 No 1 (Jan 61) pp 51-55.

Irons, E. T. (1964), "'Structural connections' in formal languages," Comm. ACM 7 No 2 (Feb 64) pp. 67-72.

Knuth, D. E. (1964), Letter to the editor: "Backus normal form vs. Backus Naur form," Comm. ACM 7, No 12 (Dec. 64) pp 735-736.

Knuth, D. E. (1965), "On the translation of languages from left to right," paper to be published.

London, R. L. (1964), "A computer program for discovering and proving sequential recognition rules for well-formed formulas defined by a Backus normal form grammar," Ph.D. thesis, Carnegie Institute of Technology.

Lynch, W. C. (1963), "Ambiguities in Backus normal form languages," Ph.D. thesis, University of Wisconsin, 1963.

Markov, A. A. (1954), "Theory of Algorithms," translated 1961, available as OTS 60-51085, Office of Technical Services, U. S. Dept. of Commerce, Wash 25, DC.

McCarthy, John (1962), "Towards a mathematical science of computation," Proceedings of IFIP Congress 62, North-Holland Publishing Co., Amsterdam.

McCarthy, John (1963), "A basis for a mathematical theory of computation," Computer Programming and Formal Systems, P. Braffort and D. Hirschberg, eds., North-Holland Publishing Co., Amsterdam 1963.

McCarthy, John (1964), "A formal description of a subset of ALGOL," Stanford Artificial Intelligence Project, Memo No 24. Sept. 1964.

McCarthy, John (1965), "Problems in the theory of computation," Proceedings of IFIP Congress 65, W. Kalenich, ed., Spartan Books, Inc., Washington D.C. 1965.

McCarthy, John, et al. (1962), "LISP 1.5 programmer's manual," The M.I.T. Press, Cambridge, Mass, August, 1962.

Naur, P. (ed.) (1962), "Revised report on the algorithmic language ALGOL 60," Comm ACM 6 No 1 (Jan 1963) pp 1-17.

Oettinger, A. G. (1961), "Automatic syntax analysis and the pushdown store," Proceedings of Symposia in Applied Mathematics, vol XII, American Math Soc. 1961.

Perlis, A. J. (1963), Unpublished notes, University of Michigan summer session, June 1963.

Unger, S. H. (1963), "On syntax directed translators," RCA Laboratory Internal Report, Oct 1963.