SYNTAX ANALYSIS BY A PRODUCTION LANGUAGE

by Arthur Evans, Jr.

Submitted to the Carnegie Institute of Technology in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Pittsburgh, Pennsylvania
1965

Acknowledgements

I am deeply indebted to Professor Alan J. Perlis for his guidance in this work. He has made available to me much of his valuable time to provide the counsel and advice needed to bring this project to fruition.

Much of the programming of the QWERT system, which played a valuable part in checking algorithms before an attempt was made to prove them, has been done by Mrs. Carol H. Thompson, to whom I express my gratitude. Further, I am grateful to the "ALGOL crew" at the Computation Center who have implemented in a working translator many of the ideas which developed in connection with this research, particularly to Mrs. Janet W. Fierst, the leader of the group, and Mr. David M. Blocher, who implemented the production interpreter. I am also grateful to the entire staff of the Computation Center for providing smooth use of the computer, without which the work could not have been done.

The final typing has been done quickly and accurately by Mrs. Edythe Simmons, to whom I express appreciation.

Finally, I am particularly grateful to my wife, Betty, who, along with my children, has been most patient during a trying time.

The research reported here was supported by the Advanced Research Projects Agency of the Department of Defense under Grant SD-146 to Carnegie Institute of Technology.

... translation rules) and the A-productions. We show that the algorithm defined by the productions produces precisely the translation given by the translation rules. In Chapter 5 we carry out the program of Chapter 4 for the B-productions and B-Grammar. In Chapter 6 we show that the A-productions are equivalent to another set of productions in that both accept the same set of input strings and produce the same output. The new productions are in a form more useful for certain applications. Chapter 7 contains a summary of the results and discussion of the relation of this work to other work in the field. The appendix contains a very brief discussion of a programming system for the production language. Sample computer outputs are included.

INTRODUCTION

Over the years users of computers have been writing programs whose correctness could only be taken as a matter of faith. If the study of programming is to become more of a science and less of an art, however, it becomes necessary that algorithms be accompanied by proof of correct operation. To quote McCarthy (1965), "The prize to be won ... is the elimination of debugging. Instead a programmer will present a computer-checked proof that a program has the desired properties." The present work is one of the many small steps which will have to be taken before this utopian goal may be achieved.

The history of proving computer algorithms is rather sparse. Chapter 7 contains a brief discussion of some of the work that has been done, accompanied by a comparison with the present work. It appears safe to say that no previous work, to this author's knowledge, has attacked a programming algorithm of the complexity of that treated here.

The present effort is concerned with proving the correctness of a specific and practical translation algorithm which translates a segment of an ALGOL-like language to postfix, or reverse Polish, form. It should be emphasized that specific algorithms are introduced and their properties analyzed, but that it is not the goal of this research to develop general proof schemes which can be applied to a class of translation algorithms.

It would be desirable, of course, if it were possible to prove that an entire ALGOL translator was correct. Indeed, the initial goal of this research was to prove the correctness of the algorithm used in the ALGOL translator running at the Carnegie Tech Computation Center. Unfortunately, this proved impractical for several reasons, not the least of which was the fact that the algorithm is not correct. (In the Carnegie Tech system, meanings are assigned to such non-ALGOLic constructions as A+BAC. One of the properties of a "correct" algorithm, as will be discussed later in some detail, is that it must reject any string which is not legal. Thus the Carnegie Tech ALGOL translator is not in this sense correct, since it accepts non-ALGOLic strings. Of course, this deficiency does not keep it from being useful.)

Another reason for not considering the ALGOL productions is the sheer size of the effort involved: there are over 650 productions in the ALGOL translator. At the present stage of developing techniques, it was not felt practical to undertake a proof of this magnitude. Instead, it was felt more appropriate to consider a smaller body which was more easily handled. The hope was that techniques could be developed which eventually might be applicable to larger tasks. An approach which might well be fruitful would be to merge the present techniques with those of London (1964), with the possible result of mechanically proving an entire ALGOL translator. This point will be discussed in further detail in the Summary in Chapter 7.

For the reasons given, it seemed appropriate to consider a subset of ALGOL assignment statements. Actually, two languages are considered in detail: the A-language and the B-language. The first is a very simple language used as an example as the techniques are developed in the first four chapters, and the second is then covered in Chapter 5. The B-language includes assignment statements with multiple left parts, both arithmetic and Boolean expressions on the right, and expressional parentheses. Not included are procedures with parameters, subscripted variables, or the ALGOL construction "if ... then ... else ...".
Further, the only arithmetic operators are plus and times; subtraction, division and exponentiation are not permitted. A few words of comment about these exclusions are in order, after a few preliminary comments.

The proof techniques to be presented involve lengthy and complex case analysis. It thus seemed appropriate to select a grammar which was representative of the general problem but for which the proofs would not be excessively tedious. With the exception of the "if ... then ... else ..." construction, all of the omissions just listed are of aspects of ALGOL which are not felt to be critical. The techniques used to handle expressional parentheses could easily be adapted to subscripted variables and to procedure calls. Further, adding more operators would require no new techniques, although it would lengthen the proofs. Thus these omissions seemed consistent with the purpose of developing techniques. More important, certain complications were deliberately included. These include the use of "+" both as a binary and as a unary operator, and permitting mixed arithmetic and Boolean expressions, as in the statement

a _ b A d _ e * f ;

The "if ... " construction was excluded to reduce the case analysis. Although this construction seems to be essentially different from any of the constructs which are presently accepted, it is felt that the bracketing technique introduced in Chapter 1 could be expanded to handle it. This point will be discussed further in the Summary in Chapter 7.

One other simplification has been made in this work. It is assumed that identifiers and constants, as used in ALGOL and other programming languages, have been "taken care of" in some earlier part of the processing. Thus the only operand treated in this work is the symbol "I" (mnemonic for identifier).
Treating ALGOL-like identifiers introduces the problems of scanning, concatenation and internal machine representation, and these did not seem to be the linguistically important problems. Instead, the emphasis in this research is on syntactic analysis of source code and translation into another form. Floyd (1961-b) has shown a production scheme which processes identifiers and (some) constants, and it is presently planned to use such a scheme in the next version of the Carnegie Tech ALGOL Translator. This point will not be pursued further.

It is clear that we cannot go about proving an algorithm unless we are able to say what the algorithm is to do. If, for example, we were setting out to prove a square root algorithm, we could say, "The algorithm delivers a number with the property that its square differs from the input by less than epsilon." For a translation algorithm, however, it is necessary first to define what translation it is that the algorithm is to do. It is not enough to say that the algorithm is to translate assignment statements into postfix, since the term "postfix" may mean different things to different people. Instead, we must define explicitly what translation is to be produced for any given input.

But we must do more than tell what output is to be produced when the translator is supplied legal input. We want the translator to give an appropriate error signal if the input is invalid. Further, we want to be sure that for any finite input the translator does not loop forever or otherwise act in a pathological manner. We will make a claim somewhat like the following: Any legal input sentence will be translated into the proper postfix, and any other input will be rejected as being invalid.

Thus we have two tasks: We must specify just which strings, out of all possible strings over an alphabet, are to be considered as "legal input", and we must define for each such string what translation is to be produced.
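
The two tasks just stated can be illustrated by a minimal sketch in Python. It is not the production-language algorithm studied in this work, and its grammar is neither the A-grammar nor the B-grammar: the three-rule grammar in the docstring, the function name `to_postfix` and the exception `Rejected` are all assumptions introduced only for illustration. The sketch uses ordinary recursive descent over the operand "I", binary "+" and "*", and parentheses. The grammar fixes which strings are legal, the points at which operators are emitted fix the postfix to be produced, and every other string is rejected.

```python
class Rejected(Exception):
    """Raised when the input is not a legal sentence of the toy grammar."""


def to_postfix(source: str) -> str:
    """Translate a sentence of the toy grammar to postfix, or raise Rejected.

    Hypothetical grammar (an assumption, not taken from the thesis):
        expr   ::= term   { "+" term }
        term   ::= factor { "*" factor }
        factor ::= "I" | "(" expr ")"
    """
    # One token per non-blank character; "I" is the only operand.
    tokens = [ch for ch in source if not ch.isspace()]
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        if peek() == "I":
            out.append("I")
            advance()
        elif peek() == "(":
            advance()
            expr()
            if peek() != ")":
                raise Rejected(f"expected ')' at token {pos}")
            advance()
        else:
            raise Rejected(f"expected operand at token {pos}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            out.append("*")  # operator is emitted after both operands

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            out.append("+")

    expr()
    if pos != len(tokens):
        # Leftover symbols mean the whole string is not a legal sentence.
        raise Rejected(f"unexpected symbol at token {pos}")
    return " ".join(out)


print(to_postfix("I + I * I"))    # I I I * +
print(to_postfix("(I + I) * I"))  # I I + I *
```

Every loop iteration and every successful call of `factor` consumes at least one token, so the sketch halts on any finite input, subject to Python's recursion limit for very deeply nested parentheses. Strings such as "I +" or "I I" raise `Rejected` rather than yielding output.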
