SOME TOOLS FOR AUTOMATING THE PRODUCTION

OF COMPILERS AND OTHER TRANSLATORS

by

Christopher Lester, B.Sc., M.Sc.

April 1982

A thesis submitted for the degree of

Doctor of Philosophy of the University of London and for the Diploma of

Membership of the Imperial College.

Department of Computing and Control,

Imperial College, London S.W.7.

ABSTRACT

This thesis explores possible tools to cheapen the creation of compilers of sufficient quality for studies in language design.

A literature study showed only (many) incomplete leads towards the goal: this prompted an analysis of what properties such tools must have and might have, notably the forms of the inputs. The key conclusion is that it is desirable to describe the source language and (separately) the machine instructions, rather than a translation between them.

A design according to this conclusion is proposed, and implementations of the crucial parts of the design are presented.

The description of the language is given in four parts. The first two describe the lexis and syntax: the third gives transformations from the syntactic constructs to intermediate language (IL). An implemented tool considers inter-relationships between the transform cases, and from these inter-relationships fabricates the required case-analysis code to select which transforms to apply for instances of the syntax. A parser consistent with the case analysis is also fabricated.

The fourth part of the language description describes the IL in terms of a machine description language (MDL), which is also used to describe the instructions of the target machine. Another tool is parameterised by these descriptions: it then inputs an IL sequence, analyses it for its data-flow tree, converts that into a tree representing the corresponding MDL sequence, and finally chooses an optimised machine instruction sequence equivalent to the original IL sequence.

Another study suggested that the choosing is more easily optimal if it includes the allocation of temporaries needed for the chosen instructions: a technique for this is suggested. Also, a method is given to reduce searching for optimal choices and avoid building the MDL tree.

Other possible applications of the tools are discussed.

Acknowledgements.

This project would not have been possible without the help of the following people. My most sincere thanks go to them all.

My supervisors at Imperial College: Ian Livingstone and John Darlington, for all their varied assistance, particularly for helping to reduce the problems arising from my part-time status, and also for the many (often unknowing) technical insights they gave me.

At City of London Polytechnic: the Assistant Directors, Ronald Sturt and Michael Brandon-Bravo, for their moral support, and for arranging day-release and financial support; the Head of the Computer Service, Pat Johnson, and all his staff, for their tolerance of what must often have been their most difficult user. Special mention must go to Pam Roberts for her amazingly accurate key-punching, and the operators for their many kindnesses beyond the call of duty.

My present employer, ALSYS S.A., for reproduction facilities.

Finally, and most of all, to my wife, for 4½ years of irregular hours and lost weekends without complaint but with encouragement.

CONTENTS

Section  Title  Page

Abstract 3
Contents 5
List of figures 9
Preface 11

1 ORIGINS AND OBJECTIVES 13

2 FUNDAMENTAL CONCEPTS
2.1 INTRODUCTION 19
2.2 MODELS OF COMPILATION 21
2.2.1 Non-optimising 21
2.2.2 Optimising model 24
2.2.2.1 Examples of optimisations for different phases of compilation 25
2.2.2.2 Optimisation-order problems 27
2.2.2.3 Review and terminology 29
2.2.2.4 Correspondences 29
2.2.3 Limitations of the models 30
2.3 CHEAPER METHODS OF CREATING COMPILERS 30
2.3.1 The UNCOL problem 30
2.3.2 "Portable" compilers 32
2.3.3 High-level machine codes 35
2.3.4 System implementation languages 36
2.3.5 Compilers based on macro-processing and pattern matching 38
2.3.6 Compiler-compilers: main concept and brief history 41
2.4 CLARIFICATION OF TERMINOLOGY 47
2.4.1 Definitions 47
2.4.2 Summary of definitions 51
2.5 COMPILER-COMPILERS AND COMPILER-GENERATORS 52
2.5.1 Required inputs versus inbuilt information 53
2.5.2 Choice of languages for inputs and outputs 55

3 OVERVIEW OF A COMPILER FACTORY
3.1 INTRODUCTION 59
3.2 THE STRATEGY UNDERLYING THIS RESEARCH 60
3.2.1 Design strategy of CF1 63
3.2.2 Design strategy of CF2 68

4 FABRICATING A PRIMARY CODE GENERATOR
4.1 INTRODUCTION 73
4.2 THE LEXIC DIVISION 75
4.3 THE SYNTACTIC DIVISION 83
4.4 THE TRANSFORM DIVISION 91
4.4.1 The notation of transforms 93
4.4.2 Overlapping cases 98
4.4.3 The "more specific than" relation 101
4.4.4 Commonalities in matching patterns 102
4.4.5 The influence of syntax on the allowable patterns 103
4.4.6 Calling the transforms 104
4.4.7 Editing the tree 105
4.4.8 Cases not expressible as simple patterns 108
4.4.9 Syntactic and lexic rules applicable only to patterns 109
4.4.10 Theories of transforms 111

4.5 EXAMPLE SETS OF TRANSFORMS 114
4.6 REVIEW OF THE PROPOSED TECHNIQUES 120

5 A PILOT PRIMARY CODE-GENERATOR FABRICATOR
5.1 INTRODUCTION 123
5.2 OVERVIEW OF THE CF1 SUITE OF PROGRAMS 123
5.2.1 Bootstrap use of the parser generator, and bootchecking 130
5.2.2 The source processor, and the source library 134
5.3 THE LEXICAL ANALYSER FABRICATOR AND SYNTACTIC ANALYSER FABRICATOR 139
5.3.1 The lexical-analyser fabricator 141
5.3.2 The syntactic-analyser fabricator 144
5.4 THE INTERPRETIVE PARSER 146
5.5 THE TRANSFORM FABRICATOR 152
5.6 REVIEW OF THE PRIMARY CODE-GENERATOR FABRICATOR 158

6 FABRICATING A SECONDARY CODE-GENERATOR
6.1 INTRODUCTION 161
6.2 THE EXAMPLE 162
6.2.1 The example instruction and its IL memory architecture 163
6.2.2 The example ILs 165
6.2.2.1 A temporary-stack IL 165
6.2.2.2 An infinitely-registered IL 167
6.2.2.3 The PDP-8 170
6.3 MACHINE DESCRIPTION LANGUAGE (MDL) 172
6.3.1 MDL summary of the addresses in the I:=J+K instruction 174
6.3.2 MDL definition of the temporary-stack IL 174
6.3.3 MDL definition of the register-IL 175
6.4 INSTRUCTIONS AND INSTRUCTION-SEQUENCES AS TREES 178
6.4.1 The gross tree for an IL sequence 181
6.4.2 The trees for temporary-stack IL instructions 185
6.4.3 The trees for register-IL instructions 186
6.4.4 The fine tree for an IL sequence 188
6.4.5 The trees for the PDP-8 instructions 193
6.4.6 The trees of PDP-8 instruction sequences 196
6.5 RELATIONSHIP TO SYMBOLIC EXECUTION 198
6.5.1 Two demonstrations 199
6.6 RELATIVE MERITS OF THE STYLES OF IL 203
6.6.1 Organisation of registers in a register-IL 203
6.6.2 Data-flow analyses 205
6.6.2.1 Data-flow analysis for a stack IL 205
6.6.2.2 Data-flow analysis for register-IL 208
6.6.3 Register-IL vs. stack-IL 213
6.7 AN INSTRUCTION CHOICE METHOD 215
6.7.1 Transforming an MDL fine tree to a machine code gross tree 216
6.7.1.1 A demonstration of the method 217
6.7.1.2 Summary of the method 226
6.7.1.3 Another demonstration of the method 237
6.7.2 Register allocation on the gross tree 242

6.7.3 Choice of alternative machine code gross trees 248

6.7.4 Optimisation of the method of instruction choice 250
6.8 INSTRUCTION ORDERING FROM THE TARGET-MACHINE GROSS TREE 260
6.8.1 Assembly cases of instruction ordering 260
6.8.2 The "Linear section" case of instruction ordering 263
6.8.3 The "Fork point" case of instruction ordering 266
6.9 REMAINING TOPICS IN SECONDARY CODE GENERATION 271
6.10 REVIEW OF THE PROPOSED TECHNIQUES 274

7 A PILOT SECONDARY-CODE-GENERATOR SYSTEM
7.1 INTRODUCTION 275
7.2 THE SYSTEM 275
7.3 A STRUCTURED NOTATION FOR TREES WHICH CONTAIN ALTERNATIVES 283
7.4 A SIMPLE EXAMPLE CODE GENERATION 284
7.5 EXAMPLES INVOLVING TRANSFER SEQUENCES 286
7.6 RETARGETING 287
7.7 TWO REALISTIC EXAMPLES 293

8 OTHER USES OF THE TOOLS
8.1 INTRODUCTION 295
8.2 SOURCE PROGRAM TRANSFORMATION 296
8.2.1 Program transformation for simplification 297
8.2.2 Program transformation for optimisation 298
8.2.3 Program transformation for language extension 298
8.3 MISCELLANEOUS PROGRAMMERS' AIDS 299
8.3.1 Prettyprinting 300
8.3.2 Program profiling 301
8.3.3 Language-specific editing 301
8.3.4 Flowcharting 301
8.3.5 Cross-referencing 301
8.3.6 Portability aids 302
8.4 OTHER APPLICATIONS OF CF2 302
8.4.1 Verifications of programs and translations 302
8.4.2 Assembler-level bridgeware 303
8.4.3 Verification of architecture descriptions 304
8.5 DE-COMPILATION 304
8.6 LANGUAGE DEFINITION 305
8.7 OTHER POSSIBLE CF TOOLS 305
8.8 OTHER APPLICATION DOMAINS 307

9 SUMMARY
9.1 INTRODUCTION AND OBJECTIVES 309
9.2 THE ANALYSIS 310
9.3 THE DRAFT SOLUTION 311
9.4 BEYOND THE OBJECTIVES 312
9.5 FUTURE WORK 313
9.6 CONCLUSION 313

A ALLIED WORK 315

A.1 PQCC AND WASILEW 316
A.1.1 Expert systems 316
A.1.2 Overview of PQCC 317
A.1.3 Instruction Choice and Register Allocation 319
A.1.3.1 Register allocation 320
A.1.3.2 Instruction choice 322

A.2 TUM-CC, MUG1, MUG2, AND MILLER'S DMACS 325
A.2.1 TUM-CC as the base of MUG1 and MUG2 326
A.2.2 Miller's DMACS 328
A.2.3 MUG1 331
A.2.4 MUG2 335

A.3 OTHER ALLIED WORK 338
A.3.1 Aho/Johnson: configurable optimal code generation 338
A.3.2 Frailey: optimisation in an IL framework 339
A.3.3 Fraser: a knowledge-based meta-code generator 339
A.3.4 Glanville: instruction choice by parsing an ambiguous language 340
A.3.5 Goguen: analysis of IL to implement Wilcox's method 342
A.3.6 Kameda and Abdel-Wahib: beyond Sethi-Ullman register allocations 343
A.3.7 Sites: machine-independent storage allocation 343
A.3.8 Terman: a configurable secondary code generator 344

C TWO OPTIMAL CODE GENERATION ALGORITHMS
C.1 INTRODUCTION 347
C.2 THE SETHI-ULLMAN METHOD 347
C.2.1 The method 348
C.2.2 Some remarks on the method 351
C.3 THE HORWITZ-KARP-MILLER-WINOGRAD METHOD 354
C.3.1 The method 354
C.3.2 Some remarks on the method 359

P EXTENDED EXAMPLES OF PRIMARY CODE GENERATION 365

S EXTENDED EXAMPLES OF SECONDARY CODE GENERATION 387

U UNCOL DIAGRAMS 393

BIBLIOGRAPHY 397

LIST OF FIGURES

Figure  Title  Page

5-1 Organisation of the CF1 suite 125
5-2 Bootstrap use of CF1 133

6-1 The memory organisation for the example 164
6-2 A register-IL sequence ... and its gross tree
6-3 Derivation of the fine tree ... from the register-IL gross tree 191
6-4 Derivation of a fine tree ... from a PDP-8 gross tree 197
6-5 ... reverse polish parsing to obtain the gross tree of a stack-IL sequence 207
6-6 ... reverse polish parsing to obtain the multi-rooted gross tree of a register-IL sequence 209-211
6-7 ... fine/gross tree relationship for ... the PDP-8 machine instructions 222
6-8 The PDP-8 gross tree ... after allocation 249
6-9 Relationship of the IL gross tree, the fine tree, and the ... PDP-8 gross tree 257
6-10 Some fabricated optimised IL-to-PDP-8 equivalences 258

PREFACE

This project proceeded from a belief, arising from an extensive literature survey, that the problems of the automation of the creation of compilers have not yet been identified, let alone solved. In particular, a major stumbling block appears to have been the absence of a clear doctrine of how such tools ought to be used. The latter part of chapter 2 develops a doctrine.

Leading up to this, chapter 1 is a statement of why the work was undertaken, and in what frame of mind; the first part of chapter 2 is a distillation of the prior experience that seems relevant.

Chapter 3 develops a proposal according to the doctrine: one tool using a description of transformations from cases of syntax to corresponding intermediate language sequences; another using descriptions of intermediate language and of machine code. Chapters 4 and 6 develop ideas for one tool each; chapters 5 and 7 each describe an implementation of the key parts of the prior chapter.

Different readers may prefer to read only: chapters 1 to 3; or chapters 3, 4 and 5 (but excluding 3.2.2); or chapters 3, 6 and 7 (but excluding 3.2.1). This thesis is therefore almost three theses (and its size reflects that fact).

Some less definite thoughts are presented in appendix C because they influenced chapter 6: they concern the relationship between instruction choice and the allocation of temporaries needed by the chosen instructions.

The project proceeded almost from a rejection of prior work, so an orthodox survey of prior work is not meaningful. However, there have been other projects that groped in the same direction: they are reviewed in appendix A. They had little effect on this project, being either too different, too fragmentary, or too recent.

Chapter 9 summarises as far as possible, because (as the closing quote implies) this thesis is just a beginning.

CHAPTER 1

ORIGINS AND OBJECTIVES

... involve so many circumstances, such a great variety of

objects and of operations, that to give minute instructions

upon the subject absolutely demands a great space ... I

shall do my best to make myself clearly understood.

(William Cobbett, Esq., M.P.

"The English Gardener", para 225, London 1833)

Just as everybody must strive to learn language and writing before

he can use them freely for the expression of his thoughts, here

too there is only one way to escape the weight of formulae. It

is to acquire such power over the tool ... that, unhampered by

formal technique, one can turn to the true problem.

(Hermann Weyl,

"Raum, Zeit, Materie" Berlin, 1923)

The above quotations seem to me two sides of a single truth: Cobbett observes one case of the phenomenon that no language - whether English (as applied to the Training and Pruning of fruit trees), or "the language of mathematics", or any language - can remain adequate, because its existence is a challenge to express more and more complex ideas until it is no longer adequate. Weyl then sees that the objective is to find a better language in which to express these new ideas: then we can resume the search for even newer ideas, which in turn will also exceed the expressive capability of the "better language". Where others claim that the ascent of Man started with the taming of fire or the invention of the wheel, I believe it started with the first vocal expression beyond an animal grunt of exertion or squeal of pain.

This cycle of the development of ideas and of linguistic capability also dominates the history of our attempts to control our computers: a modern microcomputer capable of a million instructions a second would have been of little more use in 1950 than a much slower machine of the period (1), because the programming languages of the time would not have permitted the writing of programs sufficiently complex to take advantage of that speed.

With this observation it can be seen that the history of Computing parallels the development of the "mainstream" of programming language constructs and methods:-

Binary programming (1948) (2)
Octal or teletype-code programming (1949)
Octal programming with labels
Mnemonic and symbolic assemblers (1954)
Macro assemblers, subroutine libraries
Supra-macro languages (e.g. FORTRAN, BASIC) (1956)
High level languages (e.g. ALGOL 60) (1958-1964)
Record structures (e.g. ALGOL 68 and PASCAL) and Parnas modules (e.g. SIMULA 67 classes) - first attempts (1966-72)
(e.g. CLU, ALPHARD, ADA) - second attempts (1975 on).

The milestones of the above history start only a few years apart: they end almost a decade apart. Many reasons can be assigned to this slowing down, but I suspect a major one is that the step from conception of each of these milestones to its implementation is increasingly difficult and therefore increasingly slow. For an extreme example, the ADA project was conceived in 1975, the language was designed by 1979, it merited a further year of revision, and the first compilers should be available about mid-1983. It may not be until 1990 that ADA will have been in use long enough that experience will have shown the inevitable deficiencies, either as a result of attempting currently impracticable applications, or because the deficiencies are also inherent in present languages but hidden by more dominant deficiencies. So another decade will have passed.

(1) E.g. Manchester University MARK I, at 550 additions/second.
(2) The early dates are taken from chapter 15 of /LAV80/.

At a more immediate level, my primary interest is in the (human) efficiency of the process of programming - or rather with its inefficiency. Supervising other programmers I have been appalled at the amount of effort required to extract apparently simple effects from our present-day computers. In my own programming I have tried a variety of ideas for improving my own efficiency: used with great care they have helped, but it still horrifies me that the work described in this thesis has taken me about 2200 hours of programming: I can't help feeling it should have taken a quarter of that, that the extremity of care taken should not have been necessary, and that at the end of the completed parts I should have a much higher quality product, rather than one which in places is as delicate as tissue paper.

How then can we improve that efficiency? There are two approaches:-

• Methodologies of programming. The programmer invests time and self-discipline, and he accepts constraints on his programming freedom, so that later his program is sufficiently more understandable (and therefore more readily debuggable and extendable) that the investment is more than repaid.

• Tools to relieve the programmer of programming tasks that can themselves be programmed: from minor tools such as cross-referencers and pretty-printers to major tools such as high-level languages and the compilers that implement them.

As a supervisor I have learned that the greatest problem with methodologies is that they need enforcing: and this can create major managerial problems. As a worker, I find the self-discipline for the methodologies easy to intend, but in practice it is near-impossible to adhere to the intention without any lapse. Both as manager and well-intentioned worker I would desire that the methodology be enforced by technical (rather than human) means, by appropriate tools: they will be less fallible than we are. The key tools are the programming language, to prohibit particular stupidities (e.g. type rules to prevent incompatible usages), and the environment, to prohibit more general stupidities (e.g. because of the recompilation of one module to have an amended interface, to demand the recompilation of all other modules dependent on that interface prior to subsequent link-loadings).

The key approach therefore is "tools". I believe that since the mid-1960's the development of the tools has fallen behind that of the requirements, thereby leading to poor implementations of the requirements (with massively expensive fiascos), and retarding the development of the next generation of requirements.

THE REASON FOR THE LANGUAGES HAVING FALLEN BEHIND IS MAINLY THE COST OF WRITING COMPILERS: for a moderate sized language, 4 man-years for a "cheap and cheerful" compiler is typical, and 10 to 20 man-years for a "production-quality" compiler. BECAUSE OF THIS COST, EXCESSIVE TIME IS SPENT DITHERING ABOUT LANGUAGE DESIGN - trying to "get it right" by guesswork and by paper examples, rather than by experience of real working programs.

THE INTENTION OF THIS RESEARCH IS THEREFORE TO TRY TO DEVELOP TOOLS TO REDUCE THE COST OF WRITING "PILOT" COMPILERS FOR ORTHODOX LANGUAGES AND ORTHODOX MACHINES, AND SO ACCELERATE THE DESIGN/IMPLEMENT/EXPERIMENT CYCLE. A secondary intention has been that whenever the research has suggested a spin-off I have spent some effort investigating the spin-off. Such spin-offs have included thoughts on upgrading the produced compilers from "pilot" quality (i.e. cheap and cheerful, or even cheap and nasty) to "production" quality by "obvious" optimisations, and the use of the compiler-producing tools to produce other programs that take complex-syntaxed inputs (a major example being that I have used one of my tools to help write itself and others: I estimate this has saved me about 1000 hours: this because, having written the first tool by hand, I could see a purpose-oriented language in which to write major parts of itself - and this was none other than the language it implemented!)

After a considerable literature search it became clear to me that a crucial issue was the conceptual framework within which I would make the design decisions. This framework appears to have been missing from most of the efforts on automation of the production of parts of compilers during the 70's, with the result that those efforts were not usable (except in detail) without heavy amendment. The next chapter attempts to develop such a framework: given the lightness of its dependence on other works, it treats them only superficially: the usual detailed reviews of recent work are postponed to Appendix A.

The only attacks on the problem comparable to mine that I have been able to discover are the Carnegie-Mellon University PQCC (Production Quality Compiler-Compiler) project and the Munich Technischen Universitat MUG1 and MUG2 projects: both have the objective of helping to create complete compilers, but, with much larger teams, the produced compilers are intended to be of much higher quality than mine. This difference in emphasis has considerably reduced the impacts of PQCC and MUG1/2 on the present work, although they remain of prime interest.

CHAPTER 2

FUNDAMENTAL CONCEPTS

2.1 INTRODUCTION

Every computer yet devised is (in its raw state) grossly difficult for a human to program directly. Therefore since about 1960 we have devised synthetic languages intended to be fit for human use, and we have then written special programs (called "compilers") to translate from the synthetic ("high-level") language to the real ("low-level") language of the machine. This strategy increases the human programming efficiency by at least an order of magnitude.

A compiler is necessarily a large complex program: writing it by hand is fabulously expensive (typically one to twenty man-years - 20 to 400 thousand pounds). On the other hand many parts of all compilers are fairly stereotyped: this leads one to hope that programs to aid in the production of at least those parts of the compilers could be devised. Ideally it should even be possible to do this for compilers for any language and for any kind of computer.

In the 1960's many such Compiler-compilers were attempted for limited cases: the most successful confined themselves to a single kind of computer and primitive low-level-style specifications of the part of the target compiler which synthesises the low-level output. They were very successful with the creation of the analytic parts of the compiler, notably the parser (which determines the syntactic construction of the high-level input).

Barring 1960's-style anachronisms, very little attention was addressed to the problem in the 1970's, usually for one of two reasons:-

1) it's been done; or

2) it can't be done.

Whatever significant work was done demonstrated at most the practicality of various methods for PARTS of the synthetic aspects IN ISOLATION: or it discussed the problems of synthesis (i.e. of code-generation) to clarify the issues involved in fabricating code-generators.

The research reported here could be summarised as an attempt, with limited resources, to put together those practicalities, together with some further possibilities I have not seen in the literature. I accept that semi-automated compiler-construction is the best that can be hoped for (at least in the 1980's) and I wish to delineate what can easily be automated, and what can not.

Underlying my ideas are beliefs that:-

(1) at present the theory of the "front end" of compilers (viz parsers) is well understood, as is how to create them both by hand or machine;

(2) the back end is not understood in either of these ways: we have only a mass of previously-used techniques, fragmentary understandings, and folklore; and

(3) consequently too much of the writing about "compilers" is actually about parsers and lexic analysers (i.e. the "front ends").

I regard a parser generator as a minor and routine part of a compiler- compiler, not as the star attraction: the real interest and work lies in doing what comes after parsing and in doing it well (1).

(1) A lexic analyser is usually an elementary subroutine of a parser, and as such merits at most a footnote.

This chapter motivates my research. I contend that most so-called "compiler-compilers" to date have not, for one reason or another, actually been compiler-compilers (but either "compiler-assemblers" or "parser-generators plus a systems programming language" or "compiler-interpreters") and I discuss what would truly be a compiler-compiler.

2.2 MODELS OF COMPILATION

2.2.1 Non-optimising

The simplest compilers consist of code which causes the compilation to be performed in 3 phases (at least conceptually, though in practice the phases may be interleaved):-

1) Parse to construct a syntax tree (or other internal structure);

2) Walk the tree left-to-right, to fabricate an equivalent macro-like intermediate-language (IL) program;

3) Macro expand the IL, allocate and assign registers.

This model is very simple. Ideally phase 1 matches language syntax, phase 2 language semantics, and phase 3 machine code. The design of the IL is clearly crucial to achieve the separation of the language and machine: section 2.3.1 discusses the problems of designing the IL to facilitate such separation.
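By way of illustration only, the three phases can be sketched in a few executable lines (Python; the toy grammar, the IL and the pretend machine code are all invented for this example, and are not part of any tool described in this thesis):

    # A minimal sketch of the 3-phase non-optimising model on the
    # toy statement  a := b + c.

    def parse(tokens):
        # Phase 1: build a syntax tree for  <name> := <name> + <name>
        dest, assign, left, op, right = tokens
        assert assign == ':=' and op == '+'
        return ('assign', dest, ('add', ('var', left), ('var', right)))

    def produce_il(tree):
        # Phase 2: a left-to-right walk fabricating macro-like IL
        _, dest, (_, l, r) = tree
        return [('PUSH', l[1]), ('PUSH', r[1]), ('ADD',), ('POP', dest)]

    def macro_expand(il):
        # Phase 3: each IL instruction expands independently -- the
        # weakness discussed below: no machine instruction is chosen by
        # looking at combinations of adjacent IL instructions.
        macros = {
            'PUSH': lambda a: ['LOAD  R,' + a, 'STORE R,(SP)+'],
            'ADD':  lambda: ['LOAD  R,-(SP)', 'ADD   R,-(SP)', 'STORE R,(SP)+'],
            'POP':  lambda a: ['LOAD  R,-(SP)', 'STORE R,' + a],
        }
        return [line for op, *args in il for line in macros[op](*args)]

    print(macro_expand(produce_il(parse(['a', ':=', 'b', '+', 'c']))))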

The macro expansion of phase 3 prevents the emission of machine code instructions chosen because of combinations of adjacent IL instructions: as a consequence the design of the IL is constrained to what can be said precisely as macros of each of the machines for which it may be an intermediate: this often has unfortunate effects on the design of the IL as a whole. A second unfortunate consequence of the naive macro-expansion is that it eliminates most opportunities for post-walk optimisation.

Many minor variations of the above model are possible: for example the IL might not explicitly exist as text at any instant of the compilation time - it may be an unwritten history of calls of a set of routines in the compiler, each routine corresponding to one particular conceptual kind of IL instruction.

Personal experience of compiler-writing suggests to me something I have not seen in any text book: that some semantic actions are just as viable during parsing as during tree-walk, and sometimes (when phase 2 becomes multi-pass or not entirely left-to-right) actually more viable during parsing. The easiest of these to explain is name-table construction in block-structured languages, given the possibility that names might be declared in nested scopes: separate nested incarnations of the same name must be distinguished as distinct name-table entries at least by the end of the block in which the outer declaration occurs. As a consequence of parse-time name-table construction, several complications are avoided in various languages: for example in ALGOL two mutually recursive procedures may be declared so that at the calls in the first-declared of the second-declared procedure it appears undeclared to a single left-to-right walk, but not to a post-parse walk at the start of which all the declaration information is available. Also in strongly-typed languages some aspects of the symbol-table look-up and type checking are often viable at parse-time: sometimes this also is advantageous, e.g. when types can be declared and then procedures or operators can be over-loaded depending on types of operands (as in ALGOL-68, ALPHARD or ADA), type determination is so related to declaration that one cannot complete one without the other - either you nibble at one, then the other, then repeat: or you complete the two in one operation. On the fly during parsing you do what you can, and when a declaration is met of something to which there has been previous reference one can either go back in the tree to that reference-ahead and resolve it, or make a note to perform the resolution later (e.g. at the next end enclosing the reference-ahead). This parse-time resolution of references-ahead usually turns out to be just as trivial as branch-ahead resolution in an assembler.
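The parse-time name table itself is brief to sketch; the following minimal illustration (Python, all names invented) shows the scope stack that keeps nested incarnations of a name distinct:

    # Sketch of a parse-time name table for a block-structured language:
    # block entry opens a scope, declarations land in the innermost
    # scope, and lookup searches outward.

    class NameTable:
        def __init__(self):
            self.scopes = [{}]                  # the outermost scope

        def enter_block(self):
            self.scopes.append({})

        def exit_block(self):
            self.scopes.pop()                   # inner incarnations vanish here

        def declare(self, name, info):
            self.scopes[-1][name] = info

        def lookup(self, name):
            for scope in reversed(self.scopes):
                if name in scope:
                    return scope[name]
            return None                         # a reference-ahead: note it for
                                                # resolution at the enclosing end

    nt = NameTable()
    nt.declare('x', 'outer x')
    nt.enter_block()
    nt.declare('x', 'inner x')
    assert nt.lookup('x') == 'inner x'
    nt.exit_block()
    assert nt.lookup('x') == 'outer x'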

The policy of pushing semantic code into the parse can be taken to the limit (see paragraph 2.2.2, case A): also the interface between phases 2 and 3 can be made subroutine calls from 2 to 3, so that one has a one-pass compiler in which the tree is never used, and so need never be built.

(One outstanding example is the RSRE ALGOL-68R compiler). Such compilers tend to be fast, big (surprising until you consider they must be monolithic rather than overlaid) and to suffer deficiencies that make unusual requirements of the subsequent link-loader (e.g. ALGOL-68R has its own link-loader, with the result that ALGOL-68R code cannot be link-loaded with conventional ICL 1900 relocatable modules (2)). This phase-mixing is, however, a disaster if you wish to optimise: it compels you into a single simple left-to-right walk, whereas optimisation usually demands multiple walks, some going schizophrenically several ways at once. Only the first walk can easily be made simple left-to-right, and then it is expedient to combine it with parsing.

(2) But who would need - or even want - to do such a silly thing? (says any ALGOL-68 purist). Answer: I for one, because I write real programs rather than textbook examples.

2.2.2 Optimising model

Where in the non-optimising model can we add optimisation? Everywhere:

A) In the PARSE - really a combination with a first walk;

B) At the interface between parsing and IL production, as (possibly multiple) walks which DECORATE the tree locally with information analysed from non-local inspections of the tree, and/or which EDIT the tree;

C) During IL PRODUCTION - which may be a progressive incremental process spread over several walks, or which may be interleaved with some of the decorate/edit walks of (B);

D) REVISION of the produced IL: some optimisations might fit best here because of the linear program-representation (rather than the tree-structured representation of B & C) but before machine-idiomatic details complicate affairs: again this may be several passes;

E) During the INSTRUCTION CHOICE from the IL of a first approximation to the eventual machine instruction sequence: the main optimisation is not simply to macro-expand each IL instruction but to choose (better) collective expansions of several lines at a time.

F) FINALISING the produced machine code, for example:-

• REGISTER ALLOCATION (3) - deciding which operands should be in some register and at what times, given the numbers and kind of registers available on the target machine, and making any amendments this requires to the chosen machine instructions; and

• REGISTER ASSIGNMENT - deciding in which register to hold an operand previously allocated to some register, resolving any conflicts of assignment by again revising the machine code:

and ASSEMBLING the final code into whatever form is externally required - for a production compiler usually to relocatable binary in a file.

(3) Register allocation is sometimes a byproduct of instruction choice! (For example, see chapter 6).

2.2.2.1 Examples of Optimisations for Different Phases of Compilation -

In place of the 3-phase non-optimising model we now have a generalised model with potentially 6 phases. Given a list of optimisations, I found many fit best in only one of the six phases: examples:-

A) Constant subexpression evaluation and consequent dead-code elimination during parsing; this is a common possibility in programs that have been over-maintained, but (more importantly) can arise through use of language features such as named constants, macro expansion, or generic subroutines.

B) Global data and program flow analyses (as a DECORATION); common sub-expression detection, dead variable elimination, constant propagation leading to more constant sub-expression evaluation and consequent dead code elimination (as EDITs);

C) Anchor pointing (also called "short circuit testing"): that is, PRODUCING:-

"if complexboolean then ... else ..."

as a mass of simple tests, rather than evaluating the boolean to a register and performing a single test on that:

e.g. for :- if A or B ...

not:- X:=(A or B); if X ...

but:-

    if A then goto thenbranch;
    if not B then goto elsebranch;
    thenbranch: ...
                goto exit;
    elsebranch: ...
    exit:

D) During the REVISION pass, permuting the order of the linear blocks (4) of the program to reduce the number of jumps compiled (static) or executed (dynamic), e.g. straightening out:-

        ... goto B;     (Horrifyingly
    A:  ... goto C;     (common in
    B:  ... goto A;     (heavily-maintained
    C:  ...             (programs.

(where there are no other references to A, B or C) into:-

        ... goto B;
    B:  ... goto A;
    A:  ... goto C;
    C:  ...

then removing the redundant labels and gotos.

E) During the INSTRUCTION CHOICE phase, employing machine idioms: expanding:-

    LOAD A          ... A very common
    ADD 1               IL sequence.
    STORE A

not as:-

    LOAD R5,A       ... In machine
    ADD R5,1            code.
    STORE R5,A

but as:-

    LOAD R5,1       ... Using A's double
    ADDTOMEM R5,A       reference.

or as:-

    INCREMEM A      ... Using the fact that the increment is 1, on a
                        machine with an "add one to memory" instruction.

Clearly both these latter sequences are dependent on the machine having suitable instructions. For example, on the PDP-8 there is almost a suitable instruction for the INCREMEM of the last sequence:-

ISZ (Increment memory and Skip if result Zero)

which is mainly intended to precede a jump back to the head of a loop controlled by counting a negative loop-index to zero: but ISZ supplies the "increment" action, so to discard the "skip" effect we could write a NOP (no-operation) instruction to be skipped after the ISZ.

(4) A linear block is a section of code that does not contain any labels except at the start of the section, or contain any jumps (conditional or not) except at the end of the section.

F) FINALISING: replacing any jump to an instruction that is itself a jump by a direct jump to the destination of the second of the original jumps (this optimisation is often called "branch chaining"; a sketch follows this list). A more complex case is when the jumps are conditional: if the condition of the first implies that the condition of the second will be satisfied, the first jump may be retargeted at the destination of the second; if it implies the condition of the second cannot be satisfied, the first jump is amended to have the instruction following the second jump as destination. These optimisations might first be desired after re-organisation of linear blocks (at phase D), but since phase E might introduce further local jumps they cannot be carried to completion until after phase E.
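The simple (unconditional) case of the branch chaining of (F) can be sketched as follows (Python; the representation is invented for the example: jumps are ('JUMP', label) pairs and labels are ('LABEL', name) pairs):

    # Sketch of unconditional branch chaining: any jump whose destination
    # is itself an unconditional jump is retargeted at the final
    # destination, with a guard against cycles of jumps.

    def chain_branches(code):
        target = {}                      # label -> where its jump goes
        for i, (op, arg) in enumerate(code):
            if op == 'LABEL' and i + 1 < len(code) and code[i + 1][0] == 'JUMP':
                target[arg] = code[i + 1][1]

        def final(label, seen=()):
            if label in target and label not in seen:
                return final(target[label], seen + (label,))
            return label

        return [(op, final(arg)) if op == 'JUMP' else (op, arg)
                for op, arg in code]

    code = [('JUMP', 'A'), ('LABEL', 'A'), ('JUMP', 'B'),
            ('LABEL', 'B'), ('WORK', None)]
    print(chain_branches(code))          # the first jump now goes straight to B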

2.2.2.2 Optimisation-order Problems -

With some combinations of optimisations it is hard to decide in what order they should occur:-

• Positive optimisation-order problems occur when two or more optimisations should ideally be applied each after the other, e.g. constant sub-expression evaluation and constant propagation each reveal opportunities for the other.

• Negative optimisation-order problems occur when two or more optimisations should ideally neither be applied before the other, because they (may) hide opportunities for the other.

There are several possible ways of resolving these problems:-

other because they (may) hide opportunities for the other. 28 There are several possible ways of resolving these problems:-

* For some combinations of optimisations that cause order-

problems it is possible to find a unifying method: e.g. a

single optimisation that subsumes both the problematic ones.

* Iteration: perform one optimisation, then the other, and

repeat till neither is applicable: i.e. till we have

convergence. We could iterate global walks or local walks.

This method is only applicable to positive optimisation

order problems.

* One could analyse the cost-effectiveness of the choices

(mainly for negative optimisation order problems). The

choices build up combinatially and so exponentially:

if the only analysis method available is brute force we

soon run out of machine power, so at some level we start

needing heuristics, guestimates, or just plain wild guesses.

Guestimate limits may also be placed on iteration for positive optimisation-order problems: this obviates the programming problems due to the possibility of the iteration settling to a cycle: e.g. to a pair of solutions, the first optimisation converting from one to the other, the second converting back, so that convergence never appears (at least when convergence is detected by setting a per-walk flag denoting that some transform was applied during that walk). (One could instead iterate a fixed number of times: often later iterations are not cost-effective, but occasionally they are remarkably effective due to log-jams breaking.) Other techniques of avoiding cycling can be devised: e.g. by not globally iterating indefinitely ABABABAB... but by iterating ABA (only) at each node in the tree, working from leaves to root.

2.2.2.3 Review and Terminology -

The most general model of compilation I assume in the current research is:-

A) PARSE for tree: elementary semantic operations;

B) DECORATE/EDIT the tree;

C) PRODUCE the IL from the tree;

D) REVISE the IL;

E) INSTRUCTION CHOICE of the machine code from the IL; and

F) FINALISE the machine code.

I will use the underlined words to refer to the phases. Each phase may be one or more passes, or it may be part of a pass. Decorate/edit and produce phases may be interleaved. B-F are code generation: B & C are primary code generation: E & F are secondary code generation. D (REVISION) may be included in either primary or secondary code generation, or regarded as an intermediary.

2.2.2.4 Correspondences -

The PARSE fraction of phase A hopefully corresponds to language syntax: primary code generation to language semantics: secondary code generation to machine definition. D (REVISION) ideally corresponds to machine-independent, language-independent abstract knowledge, although much of this tends to be oriented to the domain of objects on which the language operates (5), and so is weakly dependent on how classes of languages interpret the abstractions of the objects (including overflow or other side-effects). However, efficiency tends to demand much of this knowledge elsewhere: earlier for constant sub-expression evaluation, later for deciding which branch-conditions subsume another.

(5) e.g. for the integers we need at least Peano's axioms (and consequent theorems), propositional logic and predicate logic.

2.2.3 Limitations of the models

The main observation to be made about the above models is that they are far too simple. In a hand-crafted compiler there will be many more phases than those described above: each such phase will have a very specific task and will be designed to embody abstract knowledge for that task (as foreshadowed in the preceding sub-section). I do not have the resources to attempt to produce such compilers and so have set more limited objectives. See appendix A section 1, and in particular /LEV80/, for discussions of "expert" phases in more complex models.

2.3 Cheaper methods for creating compilers

Writing any kind of program is a slow and expensive task: compilers - especially optimising compilers - are exceptionally complex and large programs, so it follows that writing them is exceptionally expensive. To reduce that expense a number of strategies have been proposed: this section discusses them. The first (and most obvious) exposed a major problem: the inherent mismatch of the high-level languages we wish to compile with the machine code languages of our computers (see also /McKEE74/).

2.3.1 The UNCOL Problem

This early proposal (1958-61) for reducing the cost of writing compilers for many languages on each of many varieties of machine was to devise a language sufficiently low that it is language-independent, in the sense that for all desired high-level languages it is possible to write a front-end compiler into this language: but also to choose it sufficiently high that it is machine-independent, in the sense that it is possible to write a back-end compiler from it to the language of any of the desired machines. To construct a compiler for a particular language and a particular machine we now take the language's front-end, and input its output into the machine's back-end. For L languages on each of M machines we now need to spend effort of order L + M rather than L * M: i.e. proportionately less as L and M get large.

Obviously the choice of the intermediate-level language is crucial: not too high, not too low. This was therefore the first thing to design: it also gave its name to the project (UNCOL = UNiversal Computer Oriented Language), and, when the project failed, to the problem that caused the failure. To summarise the intent:-

    Languages       L1   L2   L3   L4 ...
                      \   \   /   /
    Intermediate         UNCOL
                      /   /   \   \
    Machines        M1   M2   M3   M4 ...

It emerged however that any "highest common factor" of the languages would be lower than any "lowest common multiple" of the languages of the machines of the period:

    Languages       L1   L2   L3 ...
                (no single intermediate level spans the gap)
    Machines        M1   M2   M3 ...

This phenomenon is apparently due to a need for some phase of the translator to consider certain aspects of language semantics simultaneously with corresponding aspects of machine architecture.

Worse: attempting optimisation increases the number of aspects needing simultaneous consideration.

The project foundered. Since then the variety of languages and of machines - even those that might be labelled "orthodox" - has increased, exacerbating the problem.


See /CAMP!!/ and /SHA58/ for further information on the original UNCOL project.

Variations on the UNCOL idea have been tried several times since the original UNCOL project. The best known example is the JANUS project which liberalises the UNCOL idea from a specific detailed intermediate language to merely providing a framework for intermediate languages: the framework is to be fleshed out for a specific application. There have been several papers published on JANUS including /CBUE.1A/, ADVS^ and ^NEWVj?/7. They appear to be more concerned with the possibilities of JANUS than with actual applications of it: in particular they observe that the differences between high-level-languages in control flow primitives tend to disappear in the process of translation towards machine level, but that differences in data organisation and data manipulation primitives often remain important in the final machine codings.

2.3.2 "Portable" compilers

Compilers for several languages have been written quite successfully using a simplification of the UNCOL concept to achieve a limited machine portability: instead of a language-independent and machine-independent intermediate language they have produced a language-DEPENDENT but machine-independent intermediate language. Thus instead of M front-ends and M back-ends of compilers for the language on M machines, only one front-end is needed - but still M back-ends. If affairs are carefully arranged so that as much work as possible is pushed from back-end to front-end, by "lowering" the level of the intermediate language as much as independence of a reasonably large class of machines allows, then as much of the effort as possible is pushed from the part of the work to be done M times to the part done once.

Over several machines this would reduce the labour for the single language to maybe half or less of that for separate hand-crafted compilers. In the back-ends there would be requirements for identical (or very similar) subroutines, so by re-using such code further savings could be achieved: for the more machine-dependent routines in the back-ends there may be the opposite possibility of using them in back-ends for other languages on the same machine.

There are several well-known examples of this method: two are reviewed below. Most "portable" compilers have been written in their own language (see section 2.3.4).

The BCPL language was designed for writing systems programs, and has been used for programming its own compiler, for operating systems research, and for such run-of-the-mill systems applications as a spooler for output to microfiche. The BCPL implementation "package" is based on an intermediate language called INTCODE /RICH75/: earlier versions of the package were based on a more complex and more inconvenient intermediate language called OCODE /RICH69/ /RICH71/ - the need for change illustrates the difficulty of designing a "best" intermediate language even for a single language.

(like orthodox assembler languages). Essentially they are the machine codes of abstract "BCPL machines" based on stack architectures.

The prototype code-generators for INTCODE and OCODE to machine code simulate the stack, delaying the output of the object codes as long as possible. The longer this "possibility", the more opportunity for local optimisations. Unfortunately these "possibilities" are relatively short due to frequent OCODE/INTCODE directives that require the stack be "normallised" (e.g. because of a label in the BCPL source code): this requires output of machine code to match the currently-delayed previously-read intermediate code (6). This leads to very good very local code at the cost of poor medium-term optimisation. The problem is essentially that in the translation from BCPL to intermediate language, much contextual information is lost, taking with it many optimisation opportunities which it is hard or impossible"for secondary code-generaton to recover. Additionally, since the abstract machine of the intermediate languages is based on the BCPL source language itself, the intermediate language tends to model BCPL and so merely postpones the source/object conflict: in particular on stack architecture machines the hardware-provided stack is especially likely to conflict with intermediate language machine stack. INTCODE was an attempt to ameliorate these problems after the experiences with OCODE, but still was far from an ideal. It was also supported by introduction into the "front-end" of several constants that could be set to specify particular features of the target machine architecture (e.g. the

(6) This need not apply to the well-disciplined labels (and jumps) due to IF-THEN-ELSE constructs or WHILE constructs (even if complex tests are anchor-pointed). It must apply to labels wnich are destinations of explicit GOTOs in the source program. See ^/kC)R807. 35 number of bytes per word). Even with this last, it would still

appear that the intermediate language needs some more flexibility

to adapt to the target machine, both for ease of implementation and

for better optimisation.

A similar regime but with greater adaptability of the front-end and intermediate language is that of the ALGOL 68C compiler and its intermediate language ZCODE. The (front-end) compiler is parameterised with information that includes the numbers and types of the registers of the target machine, and so can try to optimise their use. The parameterisation causes conditional compilation within the compiler, and so also leads to a smaller faster compiler than would be possible with one that had not thus been made target-specific. Even the greater flexibility of the ALGOL 68C approach still seems inadequate for encompassing some current architectures.

In this research, for lack of resources, I have accepted the "language-specific intermediate language" approach and tried to deal with the consequences, at least until such time as experience in using my tools may suggest in which ways they ought to be made more flexible.

2.3.3 High-level machine codes

A converse approach to the "portable" compiler is the language-independent machine-dependent intermediate language. This is inherently lower-level than the "portable" compiler approach and so involves much more work in writing the front-ends of compilers, which then will not run on other machines. This may be what is needed (e.g. for marketing reasons) but technically is less attractive than the "portable" compiler method to anyone who has responsibility for machines that do not fit the chosen regime. For a single machine (e.g. Hewlett Packard 3000) or closely related series of machines (e.g. the ICL 2900 series) it has proven a viable technology. On the 2900s it is the input-language of the link-loader, in which one instruction may generate different numbers of real machine instructions on different models of the 2900 range: a particular input sequence may generate output which depends on the whole input, rather than being an expansion of the first instruction of the input sequence followed by an expansion of the second instruction of the input sequence ... - in short the loader is a back-back-end for all the 2900 compilers.

2.3.4 System implementation languages

One strategy applied to the writing of "systems programs" - including operating systems and support utilities as well as compilers - is based on the observation that when writing such programs one tends to use some programming constructs found in the general-purpose languages, and to neglect others. Furthermore, one also tends to want a few programming language constructs that would be very useful for these programs but are not generally available (7).

The systems implementation languages (SILs) have been designed to incorporate these desirable language constructs (8). One desirable feature in such languages is recursion: this can be forged using an explicit stack in a non-recursive language, but that leads to complex code which is expensive to write, difficult to debug, and near-impossible for someone other than the author to maintain. Another desirable feature is that the language be compilable into machine code of sufficient quality that the systems program runs at an acceptable speed. This last is the requirement that usually rules out the "general purpose" languages such as ALGOL 68 or SIMULA 67, because the compilers for them must be capable of all the non-systems constructs of the languages, and this generality costs the run-speed possible from a language carefully restricted to meet the speed criterion.

(7) In some instances I have suspected that facilities identified as systems-program oriented are instead systems-programmer oriented.

(8) This could be thought of as the least possible form of the stereotyping referred to in section 2.1.

The three well-known languages in this class are BCPL (already mentioned in section 2.3.2), the C language /RICH78/ (9), and BLISS (10).

For such languages there is the problem of in what language to write the compiler. It ought to be the language itself, but cannot be, because the first version of the compiler will be needed in a compiled form to compile that same first version. The usual technique used to escape this impasse is to write the compiler for a minimal subset of the language in itself, hand-translate that to another already-available language and so obtain the minimal-subset compiler; then to amend the source of the minimal version to still be in the minimal language but to compile a larger subset, and compile that using the previously-compiled minimal-version-in-itself. By repeating steps like the last, the language can be expanded as needed. This process is called BOOTSTRAPPING. One variation of bootstrapping was critical to the work reported in this thesis: that version is described in more detail in section 2.5. For a recent general review of self-compiling compilers see /LEC78/.

(9) C is a direct descendant of BCPL. It is the language in which its own compiler and the rest of the system software is written, including UNIX itself (apart from a very few parts which must be in assembler). As a result the UNIX system has been easily transported to several models of computer from the original PDP-11.

(10) The PQCC project is very similar to that reported in this thesis. PQCC is originally a generalisation of the BLISS compiler for the PDP-11.

2.3.5 Compilers based on macro-processing and pattern-matching

In the late fifties it became common for assemblers of machine code programs to have some kind of macro facility: the idea is to define a repetitively-used sequence of instructions as being equivalent to a name, and then to use the name in lieu of the sequence. Providing the name is well-chosen to suggest the effect of the sequence to which it is expanded, the program becomes clearer to read and write, and is easier to maintain because changes within the sequence need only be made once, rather than at each use (which would lead to the practical problems of being totally consistent). These schemes often allowed parameterisation and conditional assembly: for example in the GIN macro-assembler for the ICL 1900 computers the macro definition:-

    //MACRO MOVE
    //ACC %A                if parameter A is an accumulator
    STO %A %B                 assemble this line,
    //ACC %A                if parameter A is an accumulator,
    //SKIP                  this line causes the bracketted
    (                         group of lines to be skipped.
    //ACC %B
    LDX %B %A
    //ACC %B
    //SKIP
    (
    LDX 0 %A
    STO 0 %B
    //NORMAL                ; end of the macro-definition

defines a macro to move the contents of its first parameter (%A) to be the new contents of its second parameter (%B), choosing one of three code sequences dependent on whether or not one (at least) of its parameters is an accumulator (i.e. on the ICL 1900, is of address in the range zero to seven). A use of the macro of form:-

    MOVE 5,NOTACC

where NOTACC represents an address that is not an accumulator, will produce the single instruction STO 5 NOTACC to store from accumulator 5 to NOTACC; whereas:-

    MOVE NOTACC,5

will produce LDX 5 NOTACC to load accumulator 5 from NOTACC; and:-

    MOVE NOTAC1,NOTAC2

loads from NOTAC1 to accumulator zero and then stores that to NOTAC2.

The optimisation facilities shown possible (though tedious to define) by the example make macros much more than "in-line subroutines".
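The case-analysis performed by the MOVE macro can be mimicked outside the assembler; the following sketch (Python, invented for illustration: it is not GIN, and it simply takes the ICL 1900 convention that an accumulator is an address in the range zero to seven) reproduces the three expansions shown above:

    def is_acc(operand):
        # on the ICL 1900, an accumulator is an address in 0..7
        return isinstance(operand, int) and 0 <= operand <= 7

    def expand_move(a, b):
        if is_acc(a):
            return ['STO %s %s' % (a, b)]          # store from accumulator a
        if is_acc(b):
            return ['LDX %s %s' % (b, a)]          # load accumulator b
        return ['LDX 0 %s' % a, 'STO 0 %s' % b]    # via accumulator zero

    print(expand_move(5, 'NOTACC'))        # ['STO 5 NOTACC']
    print(expand_move('NOTACC', 5))        # ['LDX 5 NOTACC']
    print(expand_move('NOTAC1', 'NOTAC2')) # ['LDX 0 NOTAC1', 'STO 0 NOTAC2']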

This prompted the questions:

1) Can the macro-facilities be divorced from the assembler?

2) For what other applications can macros be used?

In answer to question 1, quite a number of "general purpose macro-processors" were written, the best-known early one being Strachey's GPM.

The main loss of the divorce from the assembler is that information available at assembly-time cannot be used, for example the values corresponding to names used in the assembly program. The theoretical answer to question 2 is that GPM is equivalent to the lambda calculus and so is as capable as the most powerful computational system yet devised. The practical answer to question 2 is that whereas simple applications are easy to "program" in GPM-like systems, even mildly complex applications are very difficult to "program". The second generation of macro processors tried to meet this practical difficulty by adding facilities such as variables and elementary pattern recognition facilities to "trigger" a macro into expansion: a good example of this was Brown's ML/1 macro processor /BROW67,69/. An interesting variant was Waite's STAGE2 macro processor, itself implemented in a simpler macro-system written as a 96-instruction FORTRAN program - so either by porting the FORTRAN to another machine or by hand-translating the FORTRAN to assembler for another machine (which took about 4 hours for a machine) it was simple to port STAGE2 from machine to machine /WAI67/ /WAI70/.

At this period it was hoped that macro-generators could be used to implement compilers for languages such as ALGOL 60 and its immediate descendants, but this possibility was soon obsoleted by the development of much more complex languages such as ALGOL 68. Since then there have been a number of related efforts to provide very cheap compilers for simple languages: for example McCaig /McC80/ uses SNOBOL-inspired pattern matching to trigger macro expansions: for one simple language McCaig gives a 355-line compiler, but claims that an equivalent compiler written in PASCAL took 2700 lines. Regrettably McCaig does not give a breakdown of that 2700 lines, but one gains the impression that many of them would be to provide facilities equivalent to some built into his macro processor and likely to be needed in most compilers written in PASCAL.

2.3.6 Compiler-compilers: main concepts and a brief history

Searching for more opportunities for stereotyping in compilers (as opposed to in systems programs generally) led to an alternative to the UNCOL concept: to write a program or suite of programs to produce a compiler from input descriptions of:-

1) The language;

2) The machine; and possibly also

3) Information independent of both machine and language, for example "anything plus zero gives what you started with"; and

4) Information dependent on both machine and language, e.g. decisions by the implementor on "policy" matters such as how best to implement stacks on this machine (would they grow up or down? does the machine have stack instructions usable for the stack of the language being compiled?) and the best procedure call/entry/exit sequences.

The fundamental assumption is that the concatenation of front-end (for parsing and primary-code-generation) and back-end (for secondary-code-generation) of the UNCOL scheme allows no interaction between language- and machine-dependencies, but that it might be possible to write a program capable of resolving the interactions and so produce a compiler either in some machine code, or as source in (say) a SIL.

In the 60's this approach was severely limited, mainly because the problem was not properly understood. Not many compilers had been written, so the desirable stereotypes had to be founded on insufficient experience. The "descriptions" (11) of the various requirements would have to be given in some notation: but what should be the properties of such a notation? If those notations were regarded as a high-level language specific to writing a compiler, then the program that would output the required compiler would be something-like-a-compiler for compilers, and so was called a Compiler-compiler (11).

(11) Vague terms for what (so far) are vague concepts.

As a result of this inexperience the attempts at compiler-compilers in the 1960's met with only minor success. Following the ALGOL report, the BNF (12) notation used in the report for describing language syntax (or variants of the notation) clarified that part of the problem and also provoked considerable research into syntax-analysis methods (also called parsing methods), so that by the end of the 1960's that aspect at least was well understood. High points in this research into parsing were the theses of Earley [EAR68] and DeRemer [DeR69], which presented methods for programs which input a syntax description and so configure themselves into parsers for the syntax (13), and the development of packages like SID (syntax improving device) [FOS68], which for some syntaxes can convert them into an equivalent syntax for which a fast parser can then easily be output.

Unfortunately all these advances had major disadvantages. To analyse an input, the generalised parsers of Earley and DeRemer can (and often do) take time proportional to the cube of the length of the input. SID-like transformations obtain an "equivalent" syntax parsable in time proportional to the length of the input, but only preserve equivalence of recognition of the input, and would produce a differently-organised analysis than that of the original untransformed syntax, so that "actions" expressed in terms of the untransformed syntax would not be applicable to the analysis according to the transformed syntax.

If syntax-related actions are to be performed, there appears to be a choice between either inefficient parsing or structural restrictions on the syntax.

(12) BNF stands for "Backus-Naur form", after the originators of the notation, J. Backus and P. Naur. (13) For over a decade it was thought that the methods of Earley and DeRemer were quite different, until [GRA80] showed they are in fact instances of a single generalisation.

For the semantics of the syntax constructs - i.e., to the compiler writer, into what they are to be translated - little progress was made before 1970 (and what was achieved was ignored in many so-called Compiler-compilers and -generators of the 1970's). These early systems accepted syntax definitions (in variants of BNF) annotated with "actions" to perform during parsing, at times corresponding to the points in the input suggested by the places of annotation in the syntax. The actions usually would be written in a fairly orthodox programming language extended with facilities for accessing the information obtained by the syntax analysis. In the early systems this programming language was usually in (or like) assembler: in later systems (and particularly in the anachronisms of the 1970's) in a language based on ALGOL 60 or PL/1. The programming language would often be extended and supported by libraries of subroutines common in compilers. In short, this first wave of attempts at Compiler-compilers input the syntax of the language and generated the appropriate parser, which was fleshed out with hand-written code. Neither the semantics of the language nor the description of the machine would be inputs to these systems (14). Such systems usually used the parsing as a logical framework in which to fit all or part of the code generation process: as a result they were often referred to as "syntax-directed", but the term appears to have been used very loosely.
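A minimal sketch of that arrangement (in Python; the real systems emitted assembler- or ALGOL-like code, and all names here are invented): the parsing skeleton below is the part such a system generated from the BNF, while the emit calls play the role of the hand-written "actions".

    # BNF: <sum> ::= <term> { "+" <term> }
    def parse_sum(tokens, emit):
        parse_term(tokens, emit)              # generated parsing skeleton
        while tokens and tokens[0] == "+":
            tokens.pop(0)
            parse_term(tokens, emit)
            emit("ADD")                       # hand-written semantic action

    def parse_term(tokens, emit):
        emit("PUSH " + tokens.pop(0))         # assume terms are single operands

    code = []
    parse_sum(["1", "+", "2", "+", "x"], code.append)
    print(code)    # ['PUSH 1', 'PUSH 2', 'ADD', 'PUSH x', 'ADD']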

Some of these systems were successfully used to create compilers that saw considerable practical use (notably the "Compiler-compiler" of Brooker, Rohl and Morris [BRO62, 63] (15)), but their inadequacies prompted research into how to describe semantics both of programming languages and of machine codes, and then how to take such descriptions and create appropriate code-generators (16). Two major problems were encountered: devising a notation for the semantics (17); and how to match language semantics to machine code semantics for individual language constructs and so induce a translation from the language construct to a suitable machine code sequence.

(14) They were also usually specific to a single computer architecture. (15) As far as I can find out, the name of this system is the origin of the generic term.

If only a single computer architecture were to be the target of the produced compilers then these problems become much simpler: so around 1970 there were several systems in which the language semantics could be expressed in absolute terms (and therefore machine-independently), and the system had built into it the "knowledge" of how to translate from those absolute terms to the specific machine. By producing corresponding systems with identical inputs, each outputting compilers for a different machine, a large measure of machine-portability could be achieved: such a series of systems would have much common code, so only the differences would need re-writing to deal with a new machine. Perhaps the best-known of these systems is Feldman's FSL (Formal Semantic Language) [FEL64] (18): it is typical of them in that the writer of an FSL-input was excessively strait-jacketed in the semantics, being unable to easily write (for example) even ALGOL 60-style symbol table processing - and these systems did not provide any "escape hatch" back to assembler or to a high-level general-purpose language to facilitate "filling in" these gaps.

(16) It could be argued that the hand-written code-generators of the first-wave compiler-compilers incorporated such descriptions implicitly, whereas these efforts aimed to make the description explicit. (17) Or even of defining quite what semantics are in general, let alone in a particular instance of a programming language or a program. (18) One of the objectives of FSL was readable semantics (rather than cryptic semantics): this was one of the inspirations of the present thesis.

Research on the more general problem of what semantics should be, and how they should be notated, started in the mid-1960's. It has yet to yield any spectacular results: even now the "new" language that is given a formal definition is a rarity, and in the instances that have a formal definition it has been an "afterthought" intended to

clarify the English text of the original definition (19) rather than to replace the English. Unfortunately this research became confused with that into "semantics for purposes of program validation", which appears to be a quite intractable problem: the requirements of Compiler-compilation are rather different and may be rather easier to meet, but have been largely neglected. There are some examples of validation-directed semantic definitions being used for Compiler-compilation purposes - e.g. [?] - but apparently with rather limited success. The work reported in this thesis has required a few fragments of a semantic theory: some of the ideas have been developed for the purpose, some stolen from the Production Systems branch of Artificial Intelligence.

What little work has been done on the semantics of machine codes has tended to be by computer hardware designers, and so has been oriented to purposes such as the simulation of proposed machines in order to validate the logic design of the machine prior to the expense of machine construction (e.g. [?]).

However, throughout the 1970's there has been an undercurrent of research into the semantics of intermediate languages, of machine codes, and into how they can be related to produce secondary-code-generators.

(19) The ADA language is one such case: and as one of the designers of ADA has said, "The formal definition dries up just as it starts getting interesting".

Most of the individual researches have chosen one facet of the problem in isolation from all others: as a consequence they are mainly useful as a source of ideas rather than of detailed techniques.

A detailed review of all the literature on Compiler-compilers and related work would not add any clarity to this thesis at this point. Reviews of the efforts most relevant to this thesis have therefore been postponed to Appendix A, and detailed issues relating to other work are discussed at appropriate points throughout this thesis.

My personal analysis of the situation at present is that the biggest problem in Compiler-compilation is that we do not properly understand the problem. One manifestation of this is the variety of words that have been used in this subsection as near-synonyms of "make", as applied to the bringing of compilers, and parts of compilers, into existence: "Compile", "Generate", "Produce", "Write", "Output", "Create" - all have different connotations. Similar confusion showed at deeper levels in most of the work I examined in my initial literature search: so in section 2.4 I try to clarify the taxonomy; in section 2.5 I try to clarify the underlying concepts of what exactly is a Compiler-compiler, whether it is possible, and whether it is what we want - and if not, what? Finally, chapter 3 develops the novel approach to Compiler-compilation that is the main subject of this thesis.

2.4 CLARIFICATION OF TERMINOLOGY

What should be called a "compiler-compiler"? I ask this question because it seems to me that most products that have been called "Compiler-Compiler" have instead been one of:-

* a Parser generator with facilities for inserting hand-written semantic generators written in a systems programming language;
* a Parser generator with semantics-assembler (a "compiler-assembler"?), in the sense that the specifications of what code is to be produced (and maybe of how it is to be produced) must be written in a crude low-level notation that is only weakly purpose-oriented; or
* a Compiler-interpreter (20) with ... (various);

but none of them truly either a Compiler-Compiler or a Compiler-Generator.

The problem seems to me to be one of loose terminology leading to loose thinking: so in this section I define the words causing the problem; in the next section I ask what we really want and what modifications are immediately needed to make it practical. Various other restrictions and requirements are developed in subsequent chapters.

2.4.1 Definitions

In the following sub-sections several terms are taken for granted, on the basis that they are "safe" for our present purpose:-

Programming language
High-level programming language
Low-level programming language
Machine code
Syntax
Semantics

Both these terms and those defined below are taken in the computer context, with all the connotations that implies extra to normal English usage.

(20) Sometimes called a "meta-compiler".

A SPECIFICATION is a precise and complete description of a problem to be solved. A METHOD is one way of manipulating data to solve a problem (but seldom the only way to solve that problem). A PROGRAM is a statement of a method written in a form suitable for translation by computer to an equivalent statement (for the same method) in machine code.

To emphasise the differences between the above: a specification is a statement of what is to be done; a method is a statement of how something is to be done; a program is a realisable form of a method. There does not appear to be any need to distinguish between realisable forms of specification and un-realisable forms (e.g. in English) - though as the technology of realising specifications becomes better developed, such a need may arise.

A COMPILER is a program for translating a program in a high-level programming language to an externally available program in a significantly lower-level programming language (usually, but not necessarily, to machine code). A GENERATOR is a program that accepts an input specification and translates it into a program (either in a high- or low-level language): this implies that the generator must supply a method, no matter how "obvious" the method might be (21). The generator may make a choice from a set of methods built into it (22), or it might make use of Artificial Intelligence techniques to deduce a method (23).

(21) This includes (as generators) programs which only have one method available to be supplied: and this may in turn impose restrictions on what might be specified.

An ASSEMBLER is a program for translating a program in a low-level language to machine code. Since there is a spectrum of "levels" of language between high and low, there is a spectrum between assembler and compiler (e.g., a sophisticated macro-assembler is at least comparable with, or even higher than, a compiler for FORTRAN 1 or 2).

An INTERPRETER is a program that reads the source of a program and simulates the effects demanded by that source program: it may first translate the source program into an internal form which may be partly in and/or close to machine code. Between an interpreter and a compiler there is the program that inputs a source program, converts it into machine code and starts the machine code without making the machine code or any intermediate form externally available: this is an IN-CORE COMPILER or LOAD-AND-GO COMPILER. The internally produced translation by an in-core compiler often relies heavily on subroutine calls back into the compiler itself, so the translated code cannot be made externally available for subsequent use (without the required parts of the compiler): because of this, and because the compiled code is in a sense partly interpreted by the subroutines, an in-core compiler is not a true compiler.

(22) e.g. a sort-generator choosing a sorting method considering the bulk of information to be sorted, the nature of the keys (for Quicksort we must be able to estimate medians of keys; for Distributive Partitioning Sort we must have a monotonic map of the keys onto the integers), and the amount and characteristics of main and backing store available. (23) My techniques for secondary code-generation could be argued to be either a parameterisation of one particular method or to be a mild AI deduction: not resolving this distinction does not appear to have led to any confusions.

A PARSER is a program which takes an input notated in a complex syntax and decodes it into a form which outputs the same information as the input but in syntactically simpler forms (e.g. as trees and/or tables). So a PARSER GENERATOR is a program that accepts a specification of a syntax and outputs a program that is a parser for that syntax (24). A PARSER INTERPRETER is a generalised parser that can input a specification of a syntax and uses that to parameterise itself into being a parser for the specified syntax (25).

A CODE GENERATOR is a misnomer: it is not a generator, but a part of a compiler. It begets, originates or causes nothing: it merely duplicates in an amended form. A code generator takes the parsed form of a program and translates it into machine code. See section 2.2.2.3 for the definition of the primary and secondary components of a code generator. A better name might be ENCODER, but "code generator" is so deeply entrenched that it is (reluctantly) used in this thesis.

(24) A) It could at first glance reasonably be argued that a statement of a syntax in (say) BNF is in fact a program in a purpose-oriented programming language, and a parser generator is therefore not a generator but a compiler (because "purpose-oriented" implies a higher level than "general-purpose"). This argument follows from the observation that compiler-translation to a lower level language must make an arbitrary choice between alternative possible translations, at least at the very-detailed level, and therefore is supplying "method" at that low level. This finesses the fact that they are different encodings into programs of one significant method - not into significantly different methods as in the BNF case: the argument must therefore fall. B) By a similar argument BNF is a specification language because it does not imply a method. A parser generator chooses and supplies a method. If an a priori statement is made that the method will be (say) LL(1) then it could be argued we have a parser-compiler instead: and BNF becomes a programming language in which programs that have logical bugs (left recursion, common leading factors) may be written. This seems to me to be a hair we need not split here. (25) Some authors call a Parser Interpreter a META-PARSER.

2.4.2 Summary of Definitions

* An ASSEMBLER performs a minor translation of a program.
* A COMPILER performs a major translation of a program.
* Neither of those significantly changes the method implicit in the program translated.
* A GENERATOR imposes a method on a specification to create a program which solves the problem of the specification.
* Assembler, compiler, generator are ascending zones on a spectrum. (Intermediates on this spectrum are possible: e.g. macro-assembler between assembler and compiler.)
* If the inputs are principally programs (with an implied method) but with minor specificational features then the term "generator" is not admissible. (How many specificational features are required to make "generator" admissible must be a matter of taste.)

2.5 COMPILER-COMPILERS AND COMPILER-GENERATORS

The definition of "COMPILER-COMPILER" is now obvious:- "A program that performs a major translation (but which does not significantly add to the method) to yield a program that performs a major translation (but which does not significantly add to the method)."

Similarly, the definition of "COMPILER-GENERATOR" is:- "A program that performs a major translation (providing a method) to yield a program that performs a major translation (but that does not significantly add to the method)." (26)

(Diagram (27): UNCOL-style comparison of a Compiler-Compiler and a Compiler-generator, each producing a compiler from a high-level to a low-level language.)

Neither of these terms applies strictly to the software being written as the practical part of the research project: my parser-generator could (by the argument of footnote 24B) be more strictly called a parser-compiler: the code to fabricate primary code-generators takes as input what is in many respects a program in a very problem-specific (and technique-specific) language, and so that fabricator is strictly a "compiler": and the code to fabricate secondary code-generators takes as input only descriptions of the machine code and intermediate language plus hints from the implementor and then searches for a method from a spectrum of methods: again a footnote-24B argument applies, but with an opposed weighting.

(26) Similarly, a COMPILER-INTERPRETER would be a generalised compiler that parameterises itself by accepting specifications and/or programs describing how it is to behave as a specific compiler. Many so-called Compiler-compilers are in fact Compiler-interpreters with the semantic phases given as programs to be interpreted. Some authors call compiler-interpreters META-COMPILERS. (27) The UNCOL-diagram notation used here is explained in Appendix U.

Until the issue clarifies in my mind, I am using the term "COMPILER-FACTORY", since that has no connotations which might confuse us. For a verb I use "to FABRICATE".

The following subsections discuss several pragmatic consequences of the above definitions, assuming we are going to construct a monolithic Compiler Factory.

2.5.1 Required Inputs Versus Inbuilt Information

Both a compiler-compiler and a compiler-generator would require for the production of a compiler for a source language and target machine:-

* A specification of the syntax of the source language;
* A program/specification of the semantics of the source language;
* A description of the target machine, and the desired form of the compiler's output.

To date there have been only two classes of method devised for specifying semantics of fragments of programs or of proformae of programs: either to say "this means anything that satisfies those rules" - i.e. an "axiomatic" approach - or to say "this means that" (itself a description of a translation into a standard language) - i.e. a denotational or translational or operational approach.

The axiomatic approach is the subject of many other research projects but, although it is of interest for the present research, has been rejected from this project as not likely to yield sufficiently practical results during the next few years.

Taking the approach "this means that", the "that" may be either:-

(i) in the language the produced compiler is to produce, or
(ii) in some standard language, and the translation from that described either (a) implicitly inside the Compiler Factory; or (b) explicitly as a third required input.

(i) has the disadvantages that every time a compiler is implemented on a new machine only the syntax part of the existing compilers for the same language on other machines can be re-used with little change: the hard part must be largely re-done; also that this in turn will tend to discourage the production of optimising compilers. (iia) prohibits portability except by a family of Compiler-Factories with different "implicit" target machines. (iib) permits portability and opens the doors to the Compiler Factory introducing optimising methods - i.e. to it being a bit more "compiler-generator" than "compiler-compiler" - the catch is that knowledge such as "2+2=4" and "the length of the concatenation of two strings is the sum of the lengths" must come from somewhere (a fourth input? However, this fourth input only needs development once for all languages on all machines and so could be inbuilt into the Compiler Factory software - efficient but inflexible).

(Diagram: the Compiler Factory with its "this means that" semantics input and the fourth, specification-level input.)
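By way of illustration, a fragment of such a fourth input might look like this (sketched in Python; the rule format and names are invented, not the format of any actual system):

    # language- and machine-independent knowledge, written once for all
    # languages and all machines (format invented for illustration)
    UNIVERSAL_RULES = [
        ("x + 0",          "x"),    # anything plus zero gives what you started with
        ("2 + 2",          "4"),
        ("length(a ++ b)", "length(a) + length(b)"),   # ++ = concatenation
    ]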

of two strings is the sum of the CF lengths" must come from somewhere (a fourth input? However, this fourth input V only needs development once for all languages on'all machines and so could be inbuilt into the Compiler

Factory software - efficient but inflexible). The syntax and semantics files need developing once for each language: the machine-orienting file once for each machine. 65 2.5.2 Choice Of Languages For Inputs And Outputs

The language that the produced compiler is in and translates to (possibly the same language) must also be given, either explicitly as further inputs to the Compiler Factory or implicitly in the code of the Compiler Factory itself.

The produced compiler is usually required to produce low-level legible code or absolute or relocatable binary (29). There are viable uses for compilation into another high-level language (28).

There are two obvious candidates for the language the produced compiler is to be in: a fixed high-level language (shown in my diagrams as F) or the low-level target language of the compiler. (The high-level language the compiler is from is not a candidate because it may not be appropriate for compiler-writing and because it would cause bootstrapping problems for each new language, not just for the first implementation of the fixed high-level language.) A fixed high-level language has the disadvantage that a compiler for it to produce the machine code of the target machine may not exist, and so will need to be implemented at least as a cross-compiler on the machine on which the Compiler Factory runs. This is only a mild disadvantage because:-

* The fixed high-level language can be selected as the best-suited of the languages available (diagram on the right); and either
* We have a Compiler-Factory with which to implement the cross-compiler needed for F to M, runnable on the base machine B on which the Compiler-Factory also runs (diagram X below); or
* If we do not have a compiler for the language in which the Compiler-Factory is written on the target machine, then we at worst have to portage the produced compiler from the base machine (diagram Y below).

(Diagram X: a cross-compiler for F-to-M produced and run on the base machine B. Shading denotes programs output by the Compiler-Factory in diagrams X, Y and Z. Key: H = target language; M = target machine's language; F = fixed language of the Compiler Factory; B = base machine's language, i.e. that of the machine on which the Compiler Factory runs.)

(Diagram Y: a compiler for H-to-M produced on machine B and portaged to run on machine M.)

(28) Examples: (i) to take advantage of the other language's previously-written subroutine library, and (ii) compilation into the original language itself, producing either a more legible program or a faster program. These were not objectives of the current work, though they may in some instances be possible as spin-offs. (29) In my diagrams I will draw the fabricated compiler as being from the high-level language H to the machine language M.

Advantages of a fixed high-level language include:-

* the availability of an initial compiler on some machine (to avoid the need either to bootstrap or to write in the language itself and later hand-translate),
* the debugability of the Compiler Factory, and
* the portability of the compilers (e.g. to permit cross-compilation when the memory of the target machine is too small to run the produced compiler) (diagram Z on the left).

(Diagram Z: a cross-compiler for H-to-M run on B.)

If the fixed high-level language is that in which the Compiler Factory itself is written there are the extra advantages that:-

* the portability of the Compiler Factory would follow from the portability of the produced compilers, and
* at least parts of the Compiler Factory can be used to help write the Compiler Factory itself (which in turn simplifies changing the "fixed" language of the Compiler Factory and its compiler - i.e. the fixing of the language is not final and rigid).

Thus we are pushed towards the Compiler Factory and its produced compilers being in the same fixed high-level language, and the target language of the produced compiler being "readable low-level".

(Diagram: the CF as written in the fixed language F, and the CF as run on the base machine B, each shown fabricating a compiler from H to M.)

CHAPTER 3

OVERVIEW OF A COMPILER FACTORY

3.1 INTRODUCTION

In the last chapter the problems of writing compilers were discussed, and an attempt was made to remove some of the confusion surrounding the problem of writing programs that (in some sense) write parts of compilers. As stated at the end of section 2.5, the objective was to deduce some of the properties of a monolithic program for writing compilers: the most obvious properties being the kinds of inputs and outputs of such a system, and the language in which the various parts could be written and their relative merits and demerits.

Sections 2.5.1 and 2.5.2 are therefore concerned with minimal properties for all possible compiler factories: and it appears that these minimal properties are those which are externally visible to the cross-implementor.

However, as soon as one tries to design a specific Compiler Factory (CF), one has to remember the UNCOL problem and that it dictates that the internal organisation of the CF is crucial. Hopefully it is possible to call on all the expertise gained in writing two-part compilers by writing a two-part CF:

* Parser-generator plus Primary-code-generator fabricator;
* Secondary-code-generator fabricator;

producing components which have an intermediate language as interface. Having made this internal decision, the UNCOL

problem dictates that either some language information is used by the secondary fabricator, or some machine information is used by the primary fabricator, or both. Because the proposal is not monolithic, it may be the case that neither component conforms to 2.5, but together they will.

3.2 THE STRATEGY UNDERLYING THIS RESEARCH

In section 2.2.2.4 it was hoped that correspondences could be found from parts of the definition of the language and machine to components of the compiler:-

Definition      Compiler component
Syntax          Parser
Semantics       Primary code-generator
Machine         Secondary code-generator

This presumes the impossible intermediate language (IL): machine-independent and language-independent. Instead this research expects that, for each language to be implemented, an IL will be designed that is language-DEPENDENT but machine-INDEPENDENT. Re-interpreting the above table in terms of such an IL we get:

Definition                                Compiler component
Syntax                                    Parser
Language semantics in terms of the IL    Primary code-generator
IL semantics and machine definition      Secondary code-generator

- that is, the language semantics definition in absolute terms must be split into the definition in terms of the IL and the IL definition in absolute terms. Thus the compilers' structure is a little more defined than in FSL-like systems. It is also expedient in another way: it avoids the problem of how to define the whole language semantics in absolute terms: instead it can be described in terms of the IL sequences to be produced for each of the language constructs. Only the IL and the target machine need definitions that are in any sense "absolute" (how?): and if the IL is regarded as an abstract assembly code for an "abstract machine" underlying the language, then the IL definition is also a machine definition, and the notations for defining the IL and the target machines can hopefully be made very similar. Because of differences in how the two machine definitions are to be used, some differences may be inevitable.

The above approach seems consistent with the experience reported in section 2.3, and hopefully therefore will take us further than the compiler-compilers of the 1960's.

Given section 2.5 and the above, I have attempted to fulfil the objectives of chapter 1 by designing and producing two major tools to assist in the creation of major parts of compilers:-

CF1: a suite of programs that convert syntax-dependent tree-to-IL transformation specifications into code to implement those transformations; and

CF2: a program that takes definitions of an IL and a machine code and interprets these in order to modify itself to be able to translate IL sequences into corresponding part-complete machine-code sequences.

There are major components of primary-code-generation and (to a lesser extent) of syntax-analysis and lexic-analysis for which I have not observed clear stereotypes: so in CF1 I have had to leave "escape hatches" for these requirements to be filled by hand-written code, until sufficiently many compilers have been written using the technology to have suggested some adequate covering stereotype.

Such "escape hatches" were not available in the FSL-like systems: in the pre-FSL systems "escape hatches" were all there was. Both of those approaches underestimated the problem: I have attempted first to clarify the problem, then to identify partial solutions in which I can be reasonably confident, and to extend such solutions to operate together to provide a more complete CF than any prior approach to the problem.

The current CF2 is a secondary-code-generator-interpreter: a later CF2 will output a secondary-code-generator and so be a secondary-code-generator-fabricator.

Another issue which must await experience gained by using the system is the exact nature of the ILs to be used. The machine-code sequences output by a CF2-interpreter (or by a secondary code-generator itself output by a true CF2-fabricator) probably must also be constrained. In particular the implementor may need to be able to prescribe exactly what machine code sequences are to be produced for certain IL instructions: "portable compiler" experience suggests this is so at least for procedure call/entry/exit sequences, and for stack implementation: but for what else? The real machine's input/output will probably need "regularising" for some degree of machine independence by a machine-specific library, and the language may need a language-specific library: both of these will affect some aspects of the design of the IL.

To summarise this introduction, the following diagram attempts to relate the CF1/CF2 approach to the diagrams of section 2.3.1 (page 31) that illustrated the UNCOL problem:

(Diagram: languages L1, L2, L3a, L3b at the top; parsers/primary code-generators produced from descriptions of the syntax and of the semantics-in-terms-of-IL, using CF1; language-dependent ILs, which could be common only to very similar languages; secondary code-generation according to IL and machine code descriptions, using CF2; machine codes at the bottom.)

Given the separation into CF1 and CF2, the discussion of section 2.5 applied primarily to CF1.

The following subsections discuss CF1 and CF2 design strategies in more detail: the following chapters discuss CF1 and CF2 implementation and exemplify initial uses of them.

3.2.1 Design strategy of CF1

The key question here is how to define the translation from language constructs to IL. Before the FSL-like systems it was necessary to program the translation: the FSL-like systems considerably weakened this necessity, but still to a certain extent the FSL user was "programming an FSL machine". Some efforts in the 1970's have attempted to provide languages specific to the kinds of case-analysis code common in hand-written high-quality primary-code-generators [?, ?], but the specifications of the logic are little more readable than those of FSL. To achieve high readability I modified a transformational notation employed in an appendix to [?].

The essential concept of the notation is for each case of each syntactic construct to describe its translation as a sequence of fragments of:-

1) IL; and
2) calls (possibly recursive) on other transforms;

in such a way that ultimately the applications of the other transforms yield IL sub-sequences, so that the whole provides an IL sequence.

For example:-

EVAL(-x) => EVAL(x)

MINUS.

- this transform specifies that monadic minus is equivalent to (and therefore can be translated as) the evaluation of the following subexpression, followed by the application of an IL operator "MINUS".
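The mechanism this implies can be sketched in a few lines of Python (the tree encoding and the PUSHC case are invented for illustration): each transform case emits IL fragments and recurses on sub-constructs until only IL remains.

    def EVAL(node):
        if node[0] == "neg":                  # the case EVAL(-x)
            return EVAL(node[1]) + ["MINUS"]  # => EVAL(x) MINUS.
        if node[0] == "num":                  # an invented base case
            return ["PUSHC " + node[1]]
        raise ValueError("no transform matches this construct")

    print(EVAL(("neg", ("num", "7"))))   # ['PUSHC 7', 'MINUS']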

The 1970 version of the notation was intended for language definition, and reading it calls for some common sense to interpret the fragment of the language being defined, and any fragments in terms of which it is being defined (in the above example, "-x" and "x" respectively).

Two bits of common sense are needed for the above example: to decide that "-" is meant literally, and that "x" is a meta-variable for any sub-expression that may follow monadic minus. Another common sense application is needed when the language fragment includes (possibly mismatched) brackets: to decide that a bracket is the fragment's closing bracket calls for knowledge of the syntax of the language being defined.

The transformation notation therefore needs more detail if it is to be "understood" by CF1:

1) to be able to mechanically distinguish literally-intended lexic constructs (1) of the defined language from the syntactic meta-variables and from the closing bracket of the transformation-language, I place the literally-intended text in inverted commas (2) (3), so the EVAL(-x) example now starts:-

EVAL("-"x)

2) for the language-fragment patterns to be parsable, the syntactic class of which each is a member must be known. I therefore prefix the language pattern with the syntactic-category name and (as a separator) an exclamation mark, so the EVAL(-x) example now starts:-

EVAL(ARITHEXP!"-"x)

if -x is to be parsed as an ARITHEXP. Because this requires that the input to the transform-fabricator must have detailed knowledge of the syntax of the defined language, and because the output of the transform-fabricator depends heavily on the syntax-analysed form of the defined language, the transform-fabricator must match the parser of the defined language very closely. I ensured this by writing my own parser-generator as one of the CF1 suite of programs.

(1) I call a single lexic construct (lexic object, token of the language, etc.) a "lexeme". (2) To avoid many repetitions of the clumsy phrase "literally intended" both in this thesis and in the code of CF1, I have instead adopted the greek prefix "ana-" (meaning "below", as opposed to "meta-", meaning "above"). I extend the use of "meta-" by referring to lexemes of the transformation notation as meta-lexemes (e.g. => and the bracket closing the mixed-level pattern in a transformation). (3) The ana-text may itself include a single or double inverted comma. In this case it must be enclosed in the other kind of inverted comma: for example the ana-text:- He said "Hello" to her. would be input in the form:- 'He said "Hello" to her.' If the ana-text includes both kinds of inverted comma, it must be broken into fragments that do not, and the fragments quoted: He said "Don't you know?". could be input in the form:- 'He said "Don' "'" 't you know?".' This convention seems to avoid clashes with corresponding conventions of defined languages, and horrors such as the last example are rare in programming languages.

3) again for parsability of the language-fragment pattern, each syntactic meta-variable in the pattern must state the syntactic class for which it stands. The EVAL(-x) example now starts:-

EVAL(ARITHEXP!"-"IDENTIFIER)...

if the pattern is to match only arithmetic expressions of the form '"-" followed by an identifier'.

4) if in the pattern preceding the => sign a single syntactic class name must appear more than once, then in patterns following that =>, uses of that name must identify which of the prototypes they match. For example, in the following transform describing how to compile full-evaluation selection in conditional expressions:-

EVAL(CONDEXP! "IF" BOOLEXP "THEN" EXP "ELSE" EXP ) => EVAL(EXP! EXP ) EVAL(EXP! EXP ) SELECTON(BOOLEXP! BOOLEXP ) :

if the fabricated transform-handler were to be applied to IF X=0 THEN 0 ELSE 1/X then it is not clear which EVAL(EXP!EXP) is to be applied to 0, and which to 1/X (but it is quite clear that SELECTON(BOOLEXP!BOOLEXP) is to be applied to X=0). I would have liked to have used subscripts as distinguishing marks (e.g. EXP1 versus EXP2, or EXPthen versus EXPelse, with the distinguishing marks as subscripts), but given the limitations of computer peripherals I chose to separate the syntactic-class name from the "subscript" with a caret sign (also called a circumflex accent) (4).

(4) The following clarification of terms is now possible: a syntactic class-name followed by the caret and subscript (if present) is a meta-lexeme which is a pattern for any instantiation of the syntactic class.

The full IF-example now becomes:

EVAL(CONDEXP! "IF" BOOLEXP "THEN" EXP^1 "ELSE" EXP^2 )
=> EVAL(EXP! EXP^1 ) EVAL(EXP! EXP^2 ) SELECTON(BOOLEXP! BOOLEXP ) :

which states clearly that IF X=0 THEN 0 ELSE 1/X is to be transformed to:-

EVAL(EXP! "0" ) EVAL(EXP! "1/X" ) SELECTON(BOOLEXP! "X=0" )

The BOOLEXP is quite unambiguous without any subscript, so its caret and subscript may be consistently omitted.

The notation was primarily adapted as a powerful tool for writing the controlling code of primary code generators in a very clear notation (5). For example, optimisations can easily be specified as based on "special case" patterns: given the above general transformation for conditional expressions, special cases might include:-

EVAL(CONDEXP! "IF TRUE THEN" EXP^1 "ELSE" EXP^2 ) => EVAL(EXP! EXP^1 ) :

and

EVAL(CONDEXP! "IF" BOOLEXP "THEN" EXP^4 "ELSE" EXP^4 ) => EVAL(EXP! EXP^4 ) :

(In the second of these two examples, the double use of EXP^4 in the pattern to be matched restricts the application of this transform to instantiations of CONDEXP that have identical EXPs after THEN and after ELSE.)
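The matching discipline implied by the doubled EXP^4 - a meta-variable used twice must match identical sub-trees - can be sketched as follows (the tree and pattern encodings are invented for illustration):

    def match(pattern, tree, bindings):
        if isinstance(pattern, str) and pattern.startswith("?"):
            if pattern in bindings:                    # second occurrence:
                return bindings[pattern] == tree       # sub-trees must be identical
            bindings[pattern] = tree                   # first occurrence: bind it
            return True
        if isinstance(pattern, tuple) and isinstance(tree, tuple):
            return (len(pattern) == len(tree) and
                    all(match(p, t, bindings) for p, t in zip(pattern, tree)))
        return pattern == tree                         # literally-intended lexeme

    # IF X=0 THEN 1 ELSE 1 : both arms identical, so the special case applies
    pattern = ("IF", "?bool", "?exp4", "?exp4")
    tree    = ("IF", ("=", "X", "0"), ("num", "1"), ("num", "1"))
    print(match(pattern, tree, {}))    # True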

I have been able to discover no other attempt at such a clear notation for this limited but important task. The price paid for this was 700 hours to write the parser for the patterns, and 600 hours to write the code to analyse the relationships between different transforms of the same name (e.g. EVAL) for the same syntactic class (e.g. CONDEXP), and the relationships between patterns within one single transform (i.e. between the pattern the transform is from, and the pattern(s) the transform is to). These 1300 hours were out of the total 1900 hours for the whole of CF1.

(5) To maximise visual clarity, the user must make judicious use of "white space" when writing the patterns, as in the examples.

3.2.2 Design strategy of CF2

As for CF1, a key question is how to define the required translation. The arguments of section 2.5 are highly relevant to CF1, but not to CF2: this means that the present subsection must develop concepts for CF2 akin to those of section 2.5, given the fundamental policy decision of the CF1/CF2 split as discussed in section 3.1.

If we attempt direct translation from a variety of language-dependent ILs to a variety of machine codes, we encounter only an exaggerated version of the UNCOL problem: the "highest common factor" of the ILs would be lower than some of the higher-level machine codes. To continue the metaphor of the diagrams used in 2.3.1 to illustrate the UNCOL problem:

(Diagram: intermediate languages IL1, IL2, ... above machine codes of various levels, with the highest practical common target of the ILs below some of the machine codes.)

But orthodox ILs are very much like high-level orthodox machine codes: so the definitions of ILs and of machine codes could be very similar. How then would we define either? As for language semantics, we have the choice between the axiomatic approach (this means anything that satisfies that rule) or the approach of describing the meaning in terms of equivalences to sequences in some agreed standard language (this means that). To apply the axiomatic approach to translation from an IL to a machine code would exceed the practical limits of current mechanised theorem-proving methods.

We must then define both machine codes and ILs as translations to standard languages, or ideally to a single MACHINE DESCRIPTION LANGUAGE (MDL). The IL-to-MDL definition and machine-code-to-MDL definitions will be used differently, so some differences in what needs saying in an IL definition and a machine code definition may have to be expected. In the above diagram the "highest practical common target of the ILs" would be a description language for the conceptual IL-machines: to be able to describe orthodox machine codes as well it would need to be even lower:

(Diagram: intermediate languages and machine codes, all defined by translation to the MDL - the Machine Description Language - which lies below both.)

What we now have for a particular translation from a specific IL to a specific machine code is:-

(Diagram: the IL and the machine code M, each with its own translation down to MDL.)

What is then needed is to apply the IL-to-MDL translation and then to apply an inverse of the M-to-MDL translation (6). This could now be represented as:-

(Diagram: the IL translated down to MDL, then back up to M by the inverse translation.)

Initial paper-and-pencil trials also suggested that by analysing commonalities in the definitions of the IL and of the machine code, it should be possible at the time of fabricating a code-generator to produce a list of possible optimal and direct translations from IL sequences to machine code sequences. By transferring the unavoidable heuristic searches from the code-generator (run many times) to the code-generator-fabricator (run once for a single IL/machine combination) it becomes possible to produce optimising code-generators that run fast themselves. This technique for "cutting the corner" is discussed at length in chapter 6, and could be represented as:-

(Diagram: the IL translated directly across to M, cutting the corner through MDL.)

(6) For simple examples in which the MDL is produced as text, the inverse translation is very close to parsing an ambiguous language [GLA77]. Given an MDL sequence, of the possible "parses" into machine code the most interesting is the shortest (i.e. the fastest). Eliminating other parses quickly would appear to be very difficult, especially if any transfers (e.g. ACC->MEM) are needed in the optimal parse. Even worse problems arise if the machine code definition includes opposed pairs of transfer instructions (e.g. ACC->MEM and MEM->ACC) or other longer cycles of transfers: this problem is not mentioned in [GLA77].

A major practical problem with "cutting the corner" is knowing when to stop searching for commonalities: longer and longer sequences give hope of finding more and more remarkable optimisations. The fabricator might never stop: and if it did, it could have produced a vast library of sequences for the secondary code-generator to seek in the input IL, so that the secondary code-generator would take an unacceptable time to translate any IL input as the price for producing remarkably optimised machine code on rare occasions. Chapter 6 proposes one method for avoiding these problems.
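The shape of the technique can be sketched as follows (the library entries, the IL and the machine mnemonics are all invented; chapter 6 gives the real method): the fabricator's expensive search runs once and leaves behind a bounded table, which the fabricated code-generator merely consults.

    MAX_PATTERN_LEN = 3      # the bound that stops the search for commonalities

    # built once per IL/machine pair by the fabricator's heuristic search
    LIBRARY = {
        ("LOAD x", "ADD 1", "STORE x"): ["INCSTORE x"],   # a "remarkable" case
        ("LOAD x",):                    ["LDX 0 x"],      # unit fall-backs
        ("ADD 1",):                     ["ADX 0 1"],
        ("STORE x",):                   ["STO 0 x"],
    }

    def generate(il):                    # the fast secondary code-generator
        out, i = [], 0
        while i < len(il):
            for n in range(MAX_PATTERN_LEN, 0, -1):       # longest match first
                seq = tuple(il[i:i+n])
                if seq in LIBRARY:
                    out += LIBRARY[seq]
                    i += n
                    break
            else:
                raise KeyError("no translation for " + il[i])
        return out

    print(generate(["LOAD x", "ADD 1", "STORE x"]))   # ['INCSTORE x']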

The implementation methods used in CF1 to fabricate a primary-code-generator were complex and many, but were discovered by design and fore-thought. I have not been able to do this for secondary-code-generation, despite a substantial related literature search followed by many hours attempting designs. The initial CF2 is therefore experimental, to determine the methods needed both by the secondary-code-generator-fabricator and by the secondary-code-generator it fabricates. The intention is to let this experience suggest the division of labour between what can be done by a replacement fabricator, and what must be done by the programs it produces. The interim CF2 is therefore not a fabricator, but a secondary-code-generator-interpreter.

Just as I have endeavoured to avoid restricting the choice of ILs until there is experience to guide the choice of restrictions, so I have endeavoured not to restrict the choice of MDLs, except syntactically.

A statement of an effect in MDL is required only to be composed of names (which may be suffixed to indicate identity of multiple occurrences, as in CF1 patterns) and monadic and dyadic operators. One of the inputs required by the current CF2 is a list of the allowed monadic operators and a list of the allowed dyadic operators. The only final choice that I have had to make is that there will be only one transfer (assignment) operator, ->. This is because transfer operators must be handled differently than any other dyadic operators. If more than one transfer operator is required (say for single-length assignment as opposed to double-length assignment) then this must be indicated in the structure of the source and/or destination of the transfer operator: for example, if we choose monadic operators (i) asterisk to indicate a single-length source and (ii) double-asterisk to indicate a double-length source, then:-

*source -> destination     and     **source -> destination

respectively describe single- and double-length transfers. This trick defeats one of the efficiency heuristics of the current experimental CF2 program (but other tricks are described in chapter 6 which preserve efficiency in the current CF2).
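In that spirit, an MDL effect can be pictured as a tree of names and operators with the single transfer operator at its root; a sketch (the constructors and the ADD example are invented for illustration):

    def mon(op, e):          return ("mon", op, e)        # monadic operator
    def dy(op, a, b):        return ("dy", op, a, b)      # dyadic operator
    def transfer(src, dst):  return ("->", src, dst)      # the one transfer op

    # an ADD instruction of some machine:  ACC + store[n] -> ACC
    add_effect = transfer(dy("+", "ACC", "store[n]"), "ACC")

    # single- and double-length assignment, distinguished in the source
    single = transfer(mon("*",  "source"), "destination")
    double = transfer(mon("**", "source"), "destination")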

However, despite the above, in the experimental machine descriptions written so far only a single set of MDL operators has been used, so each description of a machine code can be used with all descriptions of ILs (and vice versa). This has facilitated flexibility in my initial experiments in CF2 use.

CHAPTER 4

FABRICATING A PRIMARY CODE GENERATOR

4.1 INTRODUCTION

The last chapter presented a sketch design for a fabricator of primary code generators: this chapter gives the detailed syntax and interpretation for inputs to the fabricator: the next chapter discusses the main implementation issues.

The input to the fabricator is in the form of transforms from patterns of syntax to a mixture of other transforms and/or IL fragments. To interpret the patterns correctly, the fabricator needs to know the syntax and lexis of the language. Also, the output of the fabricator is heavily dependent on the formats of the syntax trees to be generated. For both these reasons it was expedient to create a parser generator as part of the fabricator. The parser generator was not considered as a key part of the research, but introduces several interesting features. The minimum work was spent on it consistent with the requirements for the transforms and for the produced parsers to be tolerably efficient.

Because of the efficiency requirement, the parser generator was separated into a lexic generator and a syntactic generator. The lexic generator is presented in section 4.2, the syntactic generator in section 4.3. The inputs to these two generators bear marked similarity in their higher-level constructs, but differ in their primitives.

Section 4.4 presents the transform input and transform generator and illustrates its dependence on the syntactic and lexic inputs. Section 4.5 gives a complete but small-scale example: most of the examples in the preceding three sections are extracts from it. Finally, section 4.6 reviews what the present design can and cannot do.

The input to the fabricator is in three divisions: first the lexic division, then the syntactic division, then the transform division. Each division depends heavily on the foregoing division(s): the lexic division also depends lightly on the syntactic division.

The fabricator comprises three generators, one per division. In all three generators of the fabricator, if a single clear stereotype was not available for some kind of compiler-component, then facilities for generating that component have not generally been provided. Instead, in all three generators, facilities have been provided to escape into hand-written SIMULA (the fabricator itself being in SIMULA and generating primary code-generators in SIMULA). When sufficient experience has been gained with such a kind of component that a viable stereotype can be discovered, it can be implemented either as new fabricator features or as extensions to the library of SIMULA source modules that can be incorporated into fabricated code.

In all three divisions a major design requirement was to maximise the legibility of the input. In this respect, experience later suggested minor redesigns which are incorporated in the following description. Legibility is a particularly novel feature for the transform generator: the literature search yielded no prior primary code-generator generator whose input was legible in large quantity.

4.2 The Lexic Division

The syntactic analyser does not concern itself with individual characters, but with lexemes. A lexeme is a combination of characters accepted by a fabricated lexic analyser.

Most of the varieties of lexeme of the language to be implemented by the fabricated code are described to the lexic generator by a set of rules which comprise the first division of the input to the fabricator. The remaining lexemes are extracted from the syntax rules comprising the second division of the input: if a syntax rule includes a sequence of characters enclosed by two double-quote characters then the enclosed sequence is a variety of lexeme in its own right.

The remainder of this section discusses the lexic rules. A lexic rule is of the form:

* an identifier (called the lexic meta-variable of the rule), followed by
* an equals sign, followed by
* a list which defines the effect of the identifier.

An identifier is a sequence of letters and/or digits, of which the first character must be a letter, and may include embedded underscore characters. The letters and digits are significant; underscores (if any) are ignored, but can be used to assist the human eye.

The identifier can be used in the syntax to accept a lexeme described by the list in the lexic rule defining the identifier. Each identifier used must be defined precisely once, in the lexic division or the syntactic division.

The list in a lexic rule is a sequence of atoms: an atom is either primitive or compound. A primitive atom is one of the following:

* one of the identifiers LETTER DIGIT ANY EOL or EOF; or
* a literal character, which is of the form: a single-quote character, any character, and another single-quote character.

The identifier LETTER accepts a letter (upper or lower case). The identifier DIGIT accepts a digit. The identifier ANY accepts any character, but not end-of-line or end-of-file. EOL accepts end-of-line: following accepted characters must be on a different line than preceding accepted characters. EOF accepts end-of-file, but does not consume it (so multiple EOF acceptance is legal). A literal character accepts the character between the two single-quote marks: this character may itself be a single-quote.

As an example of the above, a lexic rule describing the form of literal-characters would be:

LITERAL_CHARACTER = ''' ANY '''

This would accept each of the following:

' '   '('   'g'   'u'   '''

In a list there must be at least one space or end-of-line between two identifier primitive atoms, and there may optionally be spaces or end-of-lines between any other pair of primitive atoms or other signs used in writing the compound atoms.

There are four varieties of compound atom: exclusions, options, choices and repeats.

An exclusion is of the form of a sequence of primitive atoms separated by minus signs. The minus sign is read as "but not". An exclusion accepts any character accepted by the first primitive atom in the list that is not accepted by one of the later primitives in the list. For example:

ANY - LETTER       accepts any character that is not a letter;
DIGIT - '8' - '9'  accepts an octal digit.

An option is of the form of a list enclosed in square brackets. If the first character not accepted by the atom before the option is acceptable to the first item of the list in the square brackets, then the effect is as if the square brackets were not present: otherwise the effect is as if the square brackets and the enclosed list were not present. If the first character is acceptable to the list, then succeeding characters required by the list must be acceptable, or an error is signalled. For example:

EXAMPLE_OPTION = 'A' ['B' 'C'] 'D'

will accept AD or ABCD: but if ABD was presented to the lexic analyser then the D would be signalled as an error ("expecting C"). (Similarly if AE was presented, then the E is not acceptable to B, and would be signalled as an error - "found E when expecting D".)

A choice is of the form of two or more lists separated by exclamation marks, the whole enclosed in round brackets. If a character to be accepted by the choice is acceptable to the first atom of the first list, then the whole has the effect of the first list: otherwise if it is acceptable to the first atom of the second list, then the whole has the effect of the second list: and so on. If the character is not acceptable to the first atom of any of the alternatives, an error is signalled. For example:

EXAMPLE_CHOICE = 'A' ( 'B' 'C' ! LETTER ! DIGIT DIGIT ) 'D'

would accept ABCD (first alternative), AXD (second alternative), A89D (third alternative): but ABD would attempt the first alternative, and give the error message "found D when expecting C". Attempting to accept A* meets none of the alternatives, and gives the error "found * when expecting B or LETTER or DIGIT".

A repeat is of the form:

an open-diamond bracket (<); followed by
a list; followed by
an asterisk (*); followed by
a list; followed by
a close-diamond bracket (>).

It accepts one or more times what would be accepted by the first list, separated by what would be accepted by the second list. For example:

EXAMPLE_REPEAT_1 = 'A' <'B' 'C' * '/'> 'D'

would accept ABCD ABC/BCD ABC/BC/BCD ABC/BC/BC/BCD etc., but not ABCBCD or A/D. Either of the lists may be empty, but not both. For example:

EXAMPLE_REPEAT_2 = 'A' <'B' 'C' *> 'D'

would accept ABCD ABCBCD ABCBCBCD ABCBCBCBCD etc.; and

EXAMPLE_REPEAT_3 = 'A' <* '/'> 'D'

would accept AD A/D A//D A///D etc.
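For readers who think in regular expressions, the three repeats above are equivalent to the following (an equivalence assumed here for illustration, not a mechanism of the generator):

    import re
    repeat_1 = re.compile(r"ABC(/BC)*D")   # 'A' <'B' 'C' * '/'> 'D'
    repeat_2 = re.compile(r"A(BC)+D")      # 'A' <'B' 'C' *> 'D'
    repeat_3 = re.compile(r"A/*D")         # 'A' <* '/'> 'D'

    print(bool(repeat_1.fullmatch("ABC/BC/BCD")))  # True
    print(bool(repeat_1.fullmatch("ABCBCD")))      # False: separator required
    print(bool(repeat_3.fullmatch("A//D")))        # True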

Each form of compound atom may be used freely in lists in options choices and repeats. For example:

NUMBER = ( '8' [ ( < DIGIT * > ! 'R' < DIGIT - '8' - '9' * > ) ]
         ! < DIGIT * > )

accepts either an integer decimal number, or an integer octal number of form 8R followed by one or more octal digits. Also:

IDENTIFIER = LETTER < * (LETTER ! DIGIT ! '_') >

accepts an identifier, except that the underscore is not marked out as being insignificant.

When a character is accepted, it is normally put in a buffer, and so is significant to later processing of the text of the lexeme. If the escape @OFF@ appears before or after an atom that is accepted, from that time on accepted characters are not buffered, and so are not significant. If it appears before the atom, it affects the atom; if it appears after, it does not affect the atom. Buffering is turned on by the escape @ON@: there is an implicit ON-escape after the equals sign in every lexic rule.

The following rule would accept any sequence of characters other than double-quote (including the empty sequence), enclosed in two double-quote marks.

STRING = @OFF@ '"' @ON@ < * ANY - '"' > @OFF@ '"'
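How the buffering escapes behave for this STRING rule can be sketched as follows (again illustrative Python, not the fabricated code; 'inp' with peek() and take() is assumed):

def accept_string(inp):
    """Accept  STRING = @OFF@ '"' @ON@ < * ANY - '"' > @OFF@ '"'  and
    return the buffer: the enclosed text without the quote marks."""
    buf = []
    expect(inp, '"', buf, buffering=False)   # @OFF@ '"': quote not buffered
    while inp.peek() is not None and inp.peek() != '"':
        buf.append(inp.take())               # @ON@ < * ANY - '"' >
    expect(inp, '"', buf, buffering=False)   # @OFF@ '"'
    return ''.join(buf)

def expect(inp, ch, buf, buffering):
    got = inp.take()
    if got != ch:
        raise SyntaxError('found %r when expecting %r' % (got, ch))
    if buffering:                            # only buffered characters are
        buf.append(got)                      # significant to later processing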

There are two other forms of escape allowed in lexic divisions:

@PUT ... @

directly puts the text between @PUT and the following @ (indicated above by an ellipsis) into the fabricated code at the appropriate point.

@GET ... @

expects the text between @GET and the following @ to be the name of a file: the text in the file is inserted at the appropriate point in the fabricated code. These two escapes for inserting handwritten code into the lexic generator have so far proved of most use for debugging complex lexic rules.

Ignoring the varieties of lexeme inferred from the syntactic division, the effect of the lexic rules is as follows. When the syntactic analyser demands a lexeme from the lexic analyser, the next character not so far accepted is considered. If it is a space or end-of-line it is skipped, and the process restarted. The first rule in the lexic division is then tried to see if it will start by accepting the pending character: if it will, the rule is applied, and a lexeme of type identified by the identifier is returned to the syntactic analyser. Failing that, the second rule is tried, and so on, until either a rule is selected by accepting the pending character, or no rule accepts it, in which case an error message is given, the character is skipped, and the lexic analyser re-started.

Thus the order of the rules determines their precedence if the acceptable first characters can overlap; otherwise the order is an efficiency matter - the rules for the most common kinds of lexeme should appear first.
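This demanded-lexeme loop can be sketched as follows (illustrative Python; first_accepts and accept are as in the earlier sketch, the buffer operations on 'inp' and the report_error routine are assumed helpers, and end-of-file is simplified to a built-in EF lexeme):

def next_lexeme(rules, inp):
    """rules: list of (identifier, list_of_atoms) in textual order.
    Returns (lexeme type, buffered text) on each demand."""
    while True:
        ch = inp.peek()
        if ch is None:
            return ('EF', '')                # end-of-file
        if ch in (' ', '\n'):                # inter-lexeme spacing
            inp.take()
            continue
        for name, atoms in rules:            # textual order = precedence
            if first_accepts(atoms[0], ch):
                inp.start_buffer()           # assumed buffer operation
                for atom in atoms:
                    accept(atom, inp)
                return (name, inp.buffer())
        report_error('no rule accepts %r' % (ch,))
        inp.take()                           # skip the character and restart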

For the experiments to date, the inter-lexeme spacing described above has sufficed. Later a facility could be added whereby, if a rule with identifier SPACING is defined, it does not have the normal significance, but instead defines what is to be ignored before significant lexemes. The present convention could then be defined by:

SPACING = < * ( ' ' ! EOL ) >

For most languages, it would also be desirable to regard comments as part of spacing. If the present spacing were extended to allow a comment of form "the character tilde, then any characters up to the end-of-line", the above definition would be changed to:

SPACING = < * ( ' ' ! EOL ! '~' < * ANY - EOL > EOL ) >

However, a number of current languages have comment conventions that cannot be handled by the above facilities. As an example, an Ada comment is recognised by its two leading characters: the first of these alone is a quite distinct type of lexeme. That demands at least two-character testing, not one. A worse example is the ALGOL-60 "fat comma" comment: in actual-parameter-lists, a separating comma can be replaced by a close parenthesis, a sequence of characters not including semicolon or either of the subsequences end or else, and finally a colon and an open parenthesis. The fat comma is recognised by the ":(" at its end - requiring indefinite lookahead. There is currently no way to exclude sequences of characters from a lexeme (only single characters), so the exclusion of end and else cannot be handled (but the exclusion of semicolon can). The whole fat comma is insignificant, but must be replaced by a comma in the buffer - which would require a new escape that appends characters to the buffer which did not originate from the input.

In general, comment conventions are so varied that no stereotype has yet become obvious.

The notation seems to be powerful enough to meet most non-comment

requirements of most current languages, and so is satisfactory

for the present.

Note that it is like a special-purpose programming language in the sense that it is quite easy to program bugs in the lexis notation. Consider the following four attempts at rules for a decimal number that must have both an integer and a fractional part, except that either one of them can be null (i.e. the strings 123 123.456 123. and .456 should all be acceptable):

NUMBER_1 = [ < DIGIT * > '.' ] <*DIGIT>

NUMBER_2 = < DIGIT * > [ '.' <*DIGIT> ]

NUMBER_3 = <*DIGIT> '.' <*DIGIT>

NUMBER_4 = ( < DIGIT * > [ '.' <*DIGIT> ]
           ! '.' < DIGIT * > )

NUMBER_1, accepting 123, would enter the option and so give an error when the '.' is not found. NUMBER_2 will not accept .456. NUMBER_3 will accept a decimal point without any digits. NUMBER_4 is correct, but long: it could be abbreviated as follows.

The first alternative of the choice could be transformed to:

< DIGIT * > [ '.' ] <*DIGIT>

This is of the same effect because the second repeat can only be entered if a point was accepted by the option - if there is no point, the first repeat will accept all digits, leaving none for the second repeat. The second alternative could be transformed to:

'.' DIGIT <*DIGIT>

and then the common trailing loops of the two alternatives can be factorised out of the choice, yielding:

NUMBER_4A = ( < DIGIT * > [ '.' ] ! '.' DIGIT ) <*DIGIT>

This is only marginally shorter than the original, and somewhat harder to understand.

It will be seen that the lexic notation has several of the properties of a programming language. It is reasonably clear to read, until quite complex forms are needed; there is a notion of flow-of-control within rules, and also between them; and bugs are possible. For the present purposes, it suffices.

The following lexic division will be used in syntax and transform examples.

ID = LETTER < * (LETTER ! DIGIT) > :

UNSIGNEDNO = ( < DIGIT * > [ '.' <*DIGIT> ] ! '.' < DIGIT * > ) :

TEXT = @OFF@ '"' @ON@ < * ANY - '"' > @OFF@ '"' :

EF = EOF .

4.3 THE SYNTACTIC DIVISION

In many respects the notation of the syntactic division resembles that of the lexic division: rules, lists, options, choices, and repeats are all present with three slight changes:

* atoms in a list are separated by commas (the syntactic atoms are textually longer than lexic atoms, and experience suggested they need stronger visual separation than lexic atoms);

* in a repeat, the first list may not be null; and

* an option may not be the last atom in a list.

The main differences are that there are no exclusions in the syntax notation, that the primitive atoms are completely different, and that there is no flow of control between rules.

The identifier of a syntactic rule is called the syntactic meta-variable of the rule.

The primitive atoms are meta-variables of lexic or syntactic rules, and reserved lexemes.

An identifier defined by a syntactic rule invokes its rule in place of itself: the resulting sub-parse tree replaces the identifier in building the parse branch-point corresponding to the present rule. An identifier defined by a lexic rule accepts a lexeme defined by that rule, and the resulting buffer is used as a leaf subtree in the parse branch-point of the present rule. A string of characters enclosed in double-quote marks defines a reserved lexeme, and accepts the enclosed characters only. In such a reserved lexeme, two adjacent double-quote marks stand for one enclosed double-quote mark: thus """" stands for a reserved lexeme of one double-quote mark.

Each reserved lexeme must be classified according to the lexic rules. There are two cases:

* If the first character of the reserved lexeme is not acceptable as the first character under any lexic rule, then the generated lexic analyser will examine for the reserved lexeme if on entry it finds its first character pending. If there are two reserved lexemes such that one is a prefix of the other (e.g. if "=" is a prefix of another reserved lexeme) then the longest found is accepted.

* If the first character of the reserved lexeme is acceptable as the first character under a lexic rule, then the lexic rule is interpretively applied by the fabricator to the whole reserved lexeme: there are two subcases.

* SUBCASE 1. The reserved lexeme (or a prefix of it) is acceptable to the rule.

* SUBCASE 2. The reserved lexeme is not acceptable to the rule (including the possibility that the lexeme extended by a non-empty suffix would be acceptable).

In both subcases the fabricated lexic analyser attempts to match the reserved lexeme and the rule with the input. If only one is acceptable, it is accepted. If both the reserved lexeme and the rule would accept the input, in subcase 1 the reserved lexeme is accepted (and the lexeme of the rule rejected), in subcase 2 the rule is accepted (and the reserved lexeme is rejected).

Thus in all cases of ambiguity, the longest possibility is accepted, and in the case of equal lengths the reserved lexeme is accepted. For example, if the only rule starting with LETTER was simply for a sequence of letters, and if the reserved lexemes of the language included 'of', 'oft', and 'office', then the input 'off' would prefer the rule over any of the reserved lexemes (in particular, over 'of'): but input of exactly one of the reserved lexemes followed by a non-letter would prefer the reserved lexeme.
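The summary rule just stated - the longest match wins, with ties going to the reserved lexeme - can be sketched thus (illustrative Python; 'rule_match' is an assumed helper returning how many characters the clashing lexic rule would accept):

def classify(text, reserved, rule_match):
    """text: the remaining input; reserved: the reserved lexemes whose
    first character clashes with the lexic rule."""
    rule_len = rule_match(text)
    word = max((w for w in reserved if text.startswith(w)),
               key=len, default='')
    if word and len(word) >= rule_len:
        return ('reserved', word)        # longest wins; ties favour reserved
    if rule_len:
        return ('rule', text[:rule_len]) # e.g. 'off' prefers the rule to 'of'
    raise SyntaxError('no acceptable lexeme')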

In all the constructs, in "testing" situations of the accepting process, it is the pending lexeme that is tested. If the atom against which it is tested is a reserved lexeme or the identifier of a lexic rule, then it is that against which the lexeme is tested.

If the atom is the identifier of a syntactic rule or is a compound atom, then the set of possible types of lexeme that could start the atom is fabricated: when the fabricated code is run, the type of the pending lexeme is tested for being in that set.

For atoms that specify possible acceptance of an empty string (that is, repeats with a null first list, and options) the testing set must be unified with the testing set of the next atom in the list. If the empty-accepting atom is the last in its list, the testing set cannot easily be formed. In particular, if the atom is last in the rule, or at least in an alternative of a choice ending the rule, the testing set would depend on the rule referring to the current rule. Because the parser-generator was not a major objective, no attempt was made to handle this problem. Instead, options at the end of a list in a testing context were banned, and repeats with a null first list were banned (in all contexts: if the effect is needed, it can be achieved by writing [ < A * > ] in place of < * A >). In practice, few situations have yet arisen in which these prohibitions have been a prima facie problem (e.g. declarative_part in July 80 Ada): but on examination, they have all suggested equivalents that were in any case preferable. (E.g. Ada declarative_part appears in four contexts, in three of which - block, subprogram_body, and task_body - it is followed in the rule by:

begin sequence_of_statements [ exception_handler ] end

and in the fourth - package_body - by:

[ begin sequence_of_statements [ exception_handler ] ] end

It is better replaced by a rule that concatenates its right-hand side with the [ begin ... ] end form above, and by having the transforms check the presence of the begin ... part in the cases of block, subprogram_body, and task_body.)

The role of the repeat loop in the lexis notation was to allow repetition. In the syntax, because the syntactic meta-variables can be used in the right-hand sides of the rules, looping can be achieved by recursive rules:

L_RECURSIVE = [ L_RECURSIVE ], "£"

R_RECURSIVE = "£", [ R_RECURSIVE ]

Each of these examples defines a sequence of '£' signs. However, for the same sequence, they define different parse trees: for three pound signs the trees are respectively a chain descending to the left and a chain descending to the right.

(tree diagrams for L_RECURSIVE and R_RECURSIVE)

The transforms depending on the rule will usually require one of these trees, to the exclusion of the other: thus both tree forms must be specifiable in the syntax. For right-recursive forms such as R_RECURSIVE this is done as above. Unfortunately a left-recursive form such as L_RECURSIVE above introduces a bug into the fabrication process: in forming the test of whether the list in the option is to be used in the accepting process, the fabricator will try to recurse to include the same test. Again, because the parser was not a key part of the research, this was disallowed by giving an error message at the first attempt to recurse forming a test.

Hence the repeat loop is provided for left-recursive specification:

X = < A * B >

is treated as equivalent to the left recursion

X = [ X, B ], A

for the purposes of the transforms, but the fabricator generates quite different code for the parser, in which the test for the repetition depends on B (if it is not null).

Additionally, a repeat is not allowed in a list except as the only atom on a right-hand side. This avoids the nodes of the tree of any parse of it being anonymous: such would be un-namable to the transform generator, and so would be useless.

The trick of using repeats to avoid explicit left-recursions does not help in the case of second- or higher-order left recursions (mutual recursions) such as:

A = ( ... ! B, ... ! ... )

B = [ A, ... ], ...

Such recursions must be removed by manual manipulation of the

syntax: this can become very complex if the same tree structure

is required for purposes of the transforms. So far, this has

not been a serious problem.

A similar inability of the system, which can also easily be avoided by manual manipulation, arises when common left factors occur within a rule: for example, both the following fragments of right-hand sides might have unintended effects for A:

... [ A, B ], A, C, ...

... ( ... ! A, B ! ... ! A, C ! ... ) ...

In both of these, if an A is acceptable, it is the first A that is used for the accepting (in the first case demanding that it be followed by B, A, C; in the second demanding that it be followed by B and not C). In these examples, the syntax should be changed to, respectively:

... A, [ B, A ], C, ...

... ( ... ! A, ( B ! C ) ! ... ) ...

If B and C could start with any shared lexeme types, then in

both of these modified forms B would be preferred to C: if

this is not what is desired then further "left factorisation"

must be carried out.

To summarise all the above, the generated parser is a no-backtrack LR parser, requiring an LL grammar expressed in an extended notation. This is one of the several forms of parser that have been confused together as "LC".

The notation for the syntax rules also allows @PUT and @GET escapes. @PUT escapes are especially useful for invoking a transform of the syntactic class the rule defines, or of one of its components: for clarity, this merits an extension to the notation. Examples that use escapes will carry comments on their effect.

The first rule in a syntax division is the "goal symbol" of the generated parser: that is, the parser expects the input to be acceptable to that rule. The first rule therefore commonly ends with a lexic meta-variable defined as equivalent to EOF.

The following syntax division uses the example lexic division given at the end of the last section. It defines a meta-variable EXPS which is to be matched by an ALGOL-60-like series of arithmetic or boolean expressions, separated by commas.

@PUT ... @  @PUT ... @  @PUT ... @  @PUT ... @  @PUT ... @  @PUT ... @  @PUT ... @

PROGRAM = @PUT INIT_IO; INIT_XC; @ EXPS, ".", @PUT T_OUT; EN_CLOSE; EX_CLOSE; @ EF :
EXPS = < EXP * "," > :
EXP = ( EXI ! "IF", EXP, "THEN", EXI, "ELSE", EXP~1 ) :
EXI = B5, [ "EQV", B5~1 ] :
B5 = B4, [ "IMP", B4~1 ] :
B4 = < B3 * "OR" > :
B3 = < B2 * "AND" > :
B2 = [ "NOT" ], B1 :
B1 = ( "TRUE" ! "FALSE" ! "INPUT", TEXT ! RELATIONorA4 ) :
RELATIONorA4 = A4, [ RELOP, A4~R ] :
RELOP = ( "<" ! ">" ! "=" ! "LE" ! "NE" ! "GE" ) :
A4 = < [ ("+" ! "-") ], A3 * ("+" ! "-") > :
A3 = < A2 * ("*" ! "/") > :
A2 = < A1 * "**" > :
A1 = ( UNSIGNEDNO ! "INPUT", TEXT ! "(", EXP, ")" ) .

Notice the decorations (a tilde sign and a distinguishing character) on meta-variable identifiers repeated in a rule: this is needed by the parser generator, but is not otherwise used. The first seven @PUT escapes incorporate various library routines into the fabricated parser.

The "goal" meta-variable PROGRAM is introduced for three reasons: to ensure that the input is terminated by end-of-file; to include two escapes to initialise and finalise the fabricated parser (rather than to initialise and finalise it once per EXP); and to keep EXPS itself easy to read.

As presented, the syntax does not include any ALGOL-like variables, but does include an expression-evaluation time request for a value to be input in answer to a prompt text.

The above syntax division is the main source of examples for the following section on the transforms.

4.4 THE TRANSFORM DIVISION

"Production systems" are a common technique in Artificial

intelligence for describing manipulations on complex-structured datasets: a set of "productions" is defined, each including a "pattern"; the data is searched for an occurrence of one of the patterns, and then the data around the occurrence is modified in a manner specified by the production including the pattern. After a production has been applied, the search for matches to patterns resumes, the process being repeated until no pattern is applicable (if this ever happens). In production systems there is often a rule for preferring one match over another - usually either there is a priority of the productions, matches of the highest priority production (or productions) being attempted first, then working down through the priority levels - or alternatively the match nearest some point in the data (1) is preferred. Such rules determining the order of application of productions are much weaker than the ordering rules for execution of a conventional programming language: as a consequence, the programmer does not have to consider the control flow to any great degree (if at all). Paradoxically, it seems that such systems have great power, especially if the productions themselves are in the manipulable set of data. A recent such system is OPS5 /FOR81/.
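The control regime of such a production system - productions held in priority order, the search restarted after each application - can be sketched as follows (illustrative Python; the representation of productions as pairs of functions is an assumption of the sketch):

def run_productions(data, productions):
    """productions: list of (find_match, apply_change) pairs held in
    priority order, highest first.  find_match returns a match site or
    None; apply_change rewrites the data around the site."""
    while True:
        for find_match, apply_change in productions:
            site = find_match(data)          # e.g. the match nearest the root
            if site is not None:
                data = apply_change(data, site)
                break                        # restart the search from the top
        else:
            return data                      # no pattern applicable: stop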

For applications such as compilers, a major problem in the phases that operate on the syntax tree is the complexity of the cases being dealt with, and of their inter-relationships. Relatively minor changes in the syntax can imply changes in the tree structure that in turn imply major changes in the case analysis. What is needed is for the compiler-writer to specify the cases that interest him, and to have a program carry out the case analysis for him, and in particular to make the analysis in the light of the syntax. For each case that interests him, the compiler-writer must describe the case, and what to do when it is found - what to generate, and what other classes of case searching are to be applied - very much like a restricted form of production system.

(1) E.g. in linear data, the start: in tree data, the root.

This has been attempted many times before, but usually for the transformation of a source program into a "better" source program ("better" meaning either "more optimal" or "more readable" - objectives which often each oppose the other). /MKH81/ describes a typical such system: to make the transform exemplified by changing:

    if a=b then goto lp;            if not (a=b) then begin
       c := d;               to        c := d;
       e := f;                         e := f;
    lp:                                end;
                                    lp:

they suggest a "pattern" of form:

IF-STATEMENT(BOOL-EX,STATEMENT1) & LAST-STATEMENT(STATEMENT1,JUMP(ID1)) >> UNLABELLED-STATEMENTS(STATEMENT2) >> LABEL(ID2) & IDENT(ID1,ID2)

to be modified to the form:

IF-STATEMENT(NOT-BOOL-EX,STATEMENT2) >> LABEL(ID1) & DEC-LABEL-REF-1(ID1)

The Irvine program transformation system /STA76/, /STA76'/, /KIB77/ requires more information on the contexts in which to search (not having a detailed syntactic knowledge) but represents the patterns and the replacement very legibly so long as knowledge from more global contexts is not required: for example:

Name: DISTIF
Pattern: W(if X then Y else Z) => if X then WY else WZ
Directions: (HERE LOOK-AT(<OP>,W,Y)) (HERE LOOK-AT(<OP>,W,Z)) (UP LOOK-AT(HIOP,HIOPNS))

All prior systems seem either to demand an illegible notation for the pattern and the change to be made, or (in a few cases) to require other unclear notation for other purposes.

The system described here is also transform-based, but is specific to a detailed pre-described syntax, and was designed with maximal legibility as a high priority. To some extent, the legibility requirement has made this part of the project a study of what can be achieved in this context without obscure notation. Notions of preference between cases matching patterns and of control-flow within the non-pattern parts of a case enable code-generators of high efficiency to be fabricated.

Thus, by limiting the application, the advantages of production systems are to some extent obtained without their inefficiency.

4.4.1 The notation of transforms

Each transform has a name, and applies to one syntactic class or lexic class only. For example, a transform named LOAD applying to the syntactic class of meta-variable RELATIONorA4 might be fabricated into a code-generator fragment that generates code that loads the value of an arithmetic expression (of class A4) or of a relational expression.

A syntactic or lexic class may have several transforms with different names applying to it: for example RELATIONorA4 might also have a transform named JUMPIF that fabricates code to jump if the relation is true, or to signal a compile-time error message if an A4 is presented to be code-generated.

A transform-name may be used for several transforms applying to different syntactic or lexic classes: for example LOAD of A4, and LOAD of EXP.

For each transform, there will be a number of cases of instance of its meta-variable. For example, for RELATIONorA4 in the syntax given in the last section, there are the cases:

A4        and        A4 RELOP A4~R

or, if the cases of RELOP are treated as implying cases of RELATIONorA4, there are the cases:

A4
A4 "<" A4~R     A4 ">" A4~R     A4 "=" A4~R
A4 "LE" A4~R    A4 "NE" A4~R    A4 "GE" A4~R

A case of a transform describes the effect of the transform on one particular case of its syntactic or lexic class. It identifies the case of the syntax by a pattern: its effect is a series of actions. The form of a case (of a transform) is: the name of the transform; an opening parenthesis; the meta-variable of the syntactic or lexic class; an exclamation mark; the pattern; a closing parenthesis; the sign =>; and the sequence of actions.

Thus the appearance of a case is as follows:

TNAME(METAVAR ! pattern ) => actions

In addition to the reasons given in 3.2, the meta-variable is needed to avoid ambiguities. (2) Also, as explained in 3.2, in the pattern the text to be taken literally is quoted, whereas names of syntactic classes, standing as identifiers of instances of the class, are not quoted (e.g., in A2"*"A2, "*" stands for the asterisk character itself, whereas A2 stands for any instance of the syntactic class A2: so the pattern is matched by each of the inputs:

1*1
3**2 * 3**2
(1+2) * (1+2)

and so on). The unquoted parts of a pattern can only be meta-variables (either syntactic or lexic): the quoted parts must each represent one or more complete lexemes (not lexic classes) of the defined language. To make the distinction between the two levels of representation, the quoted parts are said to be at ana-level, the unquoted parts at meta-level.

The actions of a transform are a sequence of individual items, each called an action. When a pattern is matched, the corresponding sequence of actions is applied in their order of appearance. There are three kinds of action.

(2) E.g., in the example syntax, A2"*"A2 could be an A3 or an A4.

The first kind of action is quoted text: its effect is that the fabricated code generator will put the text in its intermediate-language output stream.

The second kind of action is an escape, in the form of an "at" sign, a text, and another "at" sign: the enclosed text is fabricated at this point in the code generator. So far, common uses of these escapes have been to output "newline" to the intermediate language stream, to give syntax error messages if the pattern matches an error of context-sensitive syntax, (3) and to invoke handwritten compiler routines for name-identification.

The third kind of action is a transform-call. This has the form:

transform name; open parenthesis; meta-variable; exclamation mark; pattern; close parenthesis.

This is the same form as that appearing on the left hand side of the "=>" of a case of a transform, but has a very different meaning: the specified pattern is to be constructed (rather than detected) by the code generator, and then the specified transform applied to it. Meta-variables in the constructed pattern must appear with the same decoration in the matched pattern: in the constructed tree the meta-variable is replaced by whatever it matched.

Consider the following case of a transform, based on the example syntax, and oriented to an IL that has facilities only for testing relations against zero:

(3) I.e. an error not detectable by the fabricated context-free parser.

TEST(RELATIONorA4 ! A4 RELOP A3 ) =>
     TEST(RELATIONorA4 ! A4 "-" A3 RELOP "0" )

This would cause the fabrication of a code-generator fragment that would generate code to test a relation of which the second operand is not a sum. If it found a match with a source sequence

INPUT "LEVEL" GE 5

the code-generator would create the following parse-like tree:

              (RELATIONorA4)
              /     |      \
           (A4)   RELOP    "0"
          /  |  \    |
        A4  "-"  A3  GE
         |        |
   INPUT "LEVEL"  5

(where the unbracketed A4, A3 and RELOP represent the same subtrees matched by those names in the pattern), and then would apply the transform TEST(RELATIONorA4 ! ... ) to the tree. Thus if all the possible cases of the same transform in which the relation is "to zero" are covered, then they will in turn be invoked. For example, if there is also the case:

TEST(RELATIONorA4 ! A4 "GE" "0" ) =>
     LOAD(A4 ! A4) "IFGEZERO"

then the above example will have the effect LOAD(A4 ! "INPUT""LEVEL""-5") (4)

on the output-stream, then will output the text

IFGEZERO

Note the form LOAD(A4 ! A4) in the last example: the first A4 is part of the identifier of the transform: the second A4 specifies a tree to be created. In this instance the tree-creation is very

(4) Note the doubling of the embedded quote-marks.

simple: the instance matching the A4 in the pattern is to be used. Forms like this are so common (especially on the right hand side of =>) that an abbreviation has been provided: the exclamation mark and the prior meta-variable may be omitted. The above example would then be written as LOAD(A4). Similarly, LOAD(A4!A4~X), using the decorated form in the pattern, would abbreviate to LOAD(A4~X).
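The behaviour of a transform-call on the right of => can be sketched as follows (illustrative Python, not the fabricated SIMULA; the tree and pattern encodings, and the 'dispatch' helper that selects the applicable case of the named transform, are assumptions of the sketch):

def transform_call(name, klass, created, bindings, dispatch):
    """Build the created pattern, substituting the subtrees that the
    decorated meta-variables matched, then apply the named transform."""
    return dispatch(name, klass, build(created, bindings))

def build(node, bindings):
    kind = node[0]
    if kind == 'ana':                  # quoted text: lexemes to construct
        return ('lexeme', node[1])
    if kind == 'meta':                 # a meta-variable of the matched pattern:
        return bindings[node[1]]       # reuse whatever it matched, unchanged
    # otherwise an inner branch-point of the created pattern
    klass, parts = node[1], node[2]
    return (klass, [build(p, bindings) for p in parts])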

4.4.2 Overlapping cases

If the TEST(RELATIONorA4 ! ... ) transform had cases to cover instances of each relation-to-zero, there remain two more cases of the RELATIONorA4 syntax that are not covered: they are the forms

A4   (i.e., not a relation)   and
A4 RELOP A4-which-is-not-an-A3

The second of these could not be covered by changing the A4 RELOP A3 case to have matching pattern A4~X RELOP A4~Y and created pattern

A4~X "-" A4~Y RELOP "0"

because the syntax demands that after a dyadic minus there be an instance of A3, which an A4 is not. The A4~Y would need to be made into an instance of A3 by enclosing it in brackets, thus:

TEST(RELATIONorA4 ! A4~X RELOP A4~Y ) =>
     TEST(RELATIONorA4 ! A4~X "-" "(" A4~Y ")" RELOP "0") (5)

In section 4.4.7 we will see that this latter form will create a very "leggy" tree, needing a relatively large amount of storage (though only transiently), and taking a long time to create and process. If these considerations are important, then

(5) Note the difficulties in laying out the "created pattern" legibly.

it might be desired to have the transforms for both the pattern A4 RELOP A3 and A4~X RELOP A4~Y. In practice it seems often convenient to have overlapping patterns of cases of the same transform: the rule used to select between them that has been implemented is that the most specific pattern matched selects the case to be applied. Thus, in this example, the two patterns respectively apply to syntactic instances of

A4 RELOP A3   (efficiently for most cases)   or
A4 RELOP A4-which-is-not-an-A3

- which is exactly what we wanted.

This notion of selecting the more specific of overlapping patterns is more important in fabricating code generators that produce optimised code, rather than (as above) fabricating a more efficient code generator that produces the same code, or in detecting cases of syntax used in illegal contexts. The following two paragraphs provide respective examples of these.

Consider a transform ARITH(A4 ! ... ) that fabricates generators that match instances of A4 in arithmetic contexts. The A4 syntax is almost covered by the following five non-overlapping cases (the $$ signs are explained in section 4.5):

ARITH(A4!A3)      => ARITH(A3) :
ARITH(A4!"+"A3)   => ARITH(A3) :
ARITH(A4!"-"A3)   => ARITH(A3) ".MINUS $$" :
ARITH(A4!A4"+"A3) => ARITH(A4) ARITH(A3) ".ADD $$" :
ARITH(A4!A4"-"A3) => ARITH(A4) ARITH(A3) ".SUBT $$" :

However, there are a number of optimisations that might be provided that detect particular cases of the A4 syntax, such as the following two cases.

ARITH(A4 ! A4 "+0" )           => ARITH(A4)
ARITH(A4 ! "-" A3~X "+" A3~Y ) => ARITH(A4 ! A3~Y "-" A3~X )

As in the first of these cases, many constructs of many languages can be simplified where constants and other simple forms are present. Cases such as the second are very valuable if the language permits them: if either of the A3's had a side-effect on the value of the other, the transform would yield an expression with a different value. This case would be correct if there were a language rule stating that the order of evaluation of subexpressions is undefined.

There are also cases of the A4 syntax which are not covered by the five patterns of ARITH(A4 ! ... ) given above. In ALGOL-60 (for example) it is forbidden to have two operators adjacent, so that constructs of form A4"+-"A3 are not allowed. With the 'repeat' construct of the syntax notation, it is very difficult to disallow that, but allow "-"A3"+"A3: the easy way out is to allow the illegal syntax at parsing time, but detect it via a checking transform such as the following, which catches all cases of A4 not matched by more specific cases:

ARITH(A4 ! A4 ) => @S_ERR("Illegal form of A4")@

The only difficulties encountered so far with such checking transforms are that the error messages tend to be very bland, cannot be produced at the point of the error, (6) and require escapes to call the error-routine. All these disadvantages could be removed, at some bother, for fabricating production compilers: for example a case such as:

ARITH(A4 ! A4 "+-" A3 ) => ...

could produce a very explicit message. For the present, however, it suffices that context-sensitive checks are possible, albeit untidily.

(6) If the checking transform is invoked at the end of the A4 syntax via a handwritten escape, the error message can be produced at the end of the innermost enclosing A4 - still rather late.

4.4.3 The "more specific than" relation

The notion of "more specific than" is only a partial order relation among patterns of a meta-syntactic class. It is possible to devise patterns for which neither is more specific than the other. As an example, consider the syntax rules

X = A, B, C, D
B = B1, [ B2 ], B3
C = C1, [ C2 ], C3

and a transform of X which includes the patterns

A B C D           --(1)
A B1 B3 C D       --(2)
A B C1 C3 D       --(3)

Certainly, (2) and (3) are both more specific than (1), but if an instance of the pattern A B1 B3 C1 C3 D were to be matched, it matches both (2) and (3). In such a case the present system makes a random choice, determined at fabrication time. The important requirement is that if a fourth case with pattern A B1 B3 C1 C3 D were provided to the fabricator, the consequent code generator must prefer that pattern (when it is matched) to both (2) and (3).
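Selection among overlapping cases can be sketched as follows (illustrative Python; 'matches' and 'more_specific' are assumed predicates, and at least one case is assumed to match). A single scan suffices: by transitivity of "more specific than", no matching case can be strictly more specific than the survivor, so the survivor is maximal - though with incomparable patterns such as (2) and (3) above, which maximal case survives depends only on the order of the cases, the fabrication-time arbitrariness noted above.

def select_case(cases, tree, matches, more_specific):
    """Return a maximal (most specific) matching case of one transform."""
    live = [c for c in cases if matches(c.pattern, tree)]
    best = live[0]
    for c in live[1:]:
        if more_specific(c.pattern, best.pattern):
            best = c                   # strictly more specific: prefer it
    return best                        # incomparable survivors: order decides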

The next section introduces many more opportunities for partial ordering of cases, rather than total ordering. They can all be resolved in the fabricator in ways that are straightforward, though very complex and tedious to program together in the fabricator.

4.4.4 Commonalities in matching patterns.

Many optimisation cases can be detected if there is a means in the matching-pattern of a transform for requiring that two (or more) sub-patterns be identical. In the present system, this is expressed by including a meta-variable twice (or more times) with the same decoration: thus

ARITH(A4 ! A3 "-" A3 ) => ARITH(A1 ! "0" )

for the case of the subtraction of an expression from itself. This applies even if the repeated meta-variable is lexic, although this in general seems less useful.
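Requiring identity of repeated decorations is easily expressed in a matcher sketch (illustrative Python; the pattern and tree encodings follow the earlier sketches, and class-membership checks for meta-variables are omitted for brevity):

def match(pattern, tree, bindings):
    """Match a pattern against a (sub)tree, recording in 'bindings' what
    each decorated meta-variable matches; a repeated decoration matches
    only a subtree identical to its first occurrence."""
    kind = pattern[0]
    if kind == 'ana':                          # quoted lexeme text
        return tree == ('lexeme', pattern[1])
    if kind == 'meta':
        name = pattern[1]                      # e.g. 'A3' (null decoration)
        if name in bindings:                   # second occurrence: demand
            return bindings[name] == tree      # identity of the subtrees
        bindings[name] = tree
        return True
    klass, parts = pattern[1], pattern[2]      # a branch-point of the pattern
    return (tree[0] == klass and len(tree[1]) == len(parts) and
            all(match(p, t, bindings) for p, t in zip(parts, tree[1])))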

There are cases when such optimisations can go wrong. The following optimisation would prevent "division by zero" errors if the instance of the A2 happened to have zero value at some execution time:

ARITH(A3 ! A2 "/" A2 ) => ARITH(A1 ! "1" )

In most languages, the removal of errors (if considered at all) is acceptable. The introduction of errors is not: for example, in the TEST example of section 4.4.1 that converts A4 RELOP A3 into A4 "-" A3 RELOP "0", the subtraction could cause overflow (if the run-time instance of A4 were very positive and the instance of A3 very negative) whereas the original would not. That optimisations can be described nicely is no release from the separate problem that optimisations must always be used with care.

4.4.5 The influence of syntax on the allowable patterns.

Clearly a pattern must conform to the syntax of the syntactic class of its transform, no matter whether the pattern be created or matching. The influence of the syntax on the allowable patterns, and (contrariwise) of the desirable patterns on the syntax, is therefore very important. The first design of the system excluded left-recursion totally: but even before that version of the software was working, experiments in writing transforms showed that limited left recursion was essential. This allows patterns such as:

A4 ! A4 "+" A3 "-" A3

but there is no clear way of directly handling:

A4 ! A3 "+" A4 "-" A3

because the A4 would be a syntax error at the time when the fabricator parses the pattern: nor is there any obvious rewriting of the syntax to allow such a form. Thus it would appear that the best way to handle this pattern would be to extend the system: but there is not yet any experience to guide the design of this extension.

However, some requirements akin to the above can almost be met. For example, for one transform it would have been convenient to have had:

A4 ! A4 "+-" A3

be a more specific form of A4 ! A4 "+" A4, but neither more nor less specific than A4 ! A4 "+" A3. One possibility would have been to change the syntax to:

A4 = < A3X * ("+" ! "-") >
A3X = [ ("+" ! "-") ], A3

in which case A4 ! A4 "+-" A3 would have been more specific than A4 ! A4 "+" A3X, but incomparable to A4 ! A4 "+" A3. In the event, there is a penalty on too many too-small syntax productions (see 4.4.7), so this change was not made.

Recurrent problems such as the above stressed the point that the syntax is not merely a test of legality, of what strings of characters are acceptable: it is intimately related with the desired patterns in the transforms. If we want a syntactic class S of the form of a list of As separated by Bs, there cannot be both a pattern S ! S B A and a pattern S ! A B S. This kind of conflict was expected, but proved to be much more common than anticipated.

4.4.6 Calling the transforms.

At present, the transforms are invoked initially by calls from the syntax rules, given as escapes into SIMULA. It would be desirable to give them as new atoms in the syntax (but which have no direct effect on the parsing): say, in a syntax rule, a form akin to a transform call on the right hand side of a => in the transform division, in which all meta-variables in the created pattern would already have been defined in the rule, or would be the meta-variable defined by the rule. In most cases the created pattern would be a simple meta-variable: for example, in the syntax given at the end of 4.3, it would be better to have:

PROGRAM = EXPS, ".", OUT(PROGRAM), EF

rather than:

PROGRAM = EXPS, ".", @PUT T_OUT; @ EF

The syntax analysis process is invoked for the first-given

syntax rule in the syntax division. This in turn invokes

the analyses for other syntax classes. There is no such clear

stereotype for the calling pattern - or patterns - of the

transforms: sometimes the requirement is to complete the parse,

then call one single master transform that in turn calls all

the other transforms; sometimes it is to have many rules call

transforms of about equal complexity; either way there are

likely to be needs for additional checking transforms.

To summarise the above: the organisation of the calls from the

syntax determines whether the fabricated primary code generator

is in the same pass as the syntax, is one or more separate

passes, or is a hybrid. It is not desirable to restrict this

to a single stereotype. The same is true of calls between

transforms, so long as recursive and mutually-recursive transforms

are guaranteed to terminate: this last is discussed in section

4.4.10.

4.4.7 Editing the tree.

This and the next two sections are concerned with facilities that initial experience suggests are perhaps sufficiently useful to merit extensions to the current software.

The present software fabricates a syntax analyser that parses to build a primary syntax tree for a given compilation: the fabrications of the transforms may create secondary trees which include pointers into the primary tree. This suffices for most purposes that demand trees not present as subtrees of the primary tree,

but the following paragraphs discuss other purposes for which

it is desirable to edit the primary tree.

In a multi-pass compiler, it may be desirable to have early passes canonicalise the tree in ways that reduce the number

of cases to be dealt with by later passes. A major instance

of this is that the early pass may discover an error that

implies that later passes ought not be tried: the erroneous

subtree ought to be edited into a single leaf that the later passes recognise as meaning "error" (that is, they have simple

"error" cases).

When there is a sequence of productions of forms such as:

Xi = ( Xj ! ... )
Xj = < Xk * ... >
Xk = [ ... ], Xl

in which one case of a metasyntactic class is just the next metasyntactic class in the sequence, the trees can become very "leggy": that is, there would be many nodes containing no information other than a pointer to the next node, as illustrated by the chain

Xi
 |
Xj
 |
Xk

This leggyness cannot in general be avoided by fabricating a syntax analyser that does not generate leggy nodes, because it would have the consequence that transform-cases with matching-patterns such as Xj!Xk would never be invoked. The leggy nodes, if they are not needed in the above sense, are a waste of space and of the time needed to process them. At present, the fabrication of the syntax analyser does not depend in any way on the transforms: to inspect the transforms to determine when one of them needs a particular leggyness would therefore be a major complication to the software. It would be much simpler to demand that the transform-writer programs edits that remove inessential leggyness - say, as transforms called at the end of the related syntax rule. He could also program the removal of leggynesses needed in early passes, but inessential in later passes - such cases could only be found by the fabricator after very complex analyses.

Finally, some optimisations might best be specified as tree-edits, so that the edited tree causes other transforms to produce the improved output.

It would be quite easy to introduce an editing facility into the present software. The difficult part would be the fabricating for the creator of replacement trees: but this code already exists to fabricate the creator of the secondary trees.

In using such editing there would be a major methodological danger, akin to the problem of aliasing between parameters of subroutine calls. If an edit causes a subtree to be apparently duplicated (by creating multiple pointers to it) then an edit to one of the apparent duplications will cause the same edit to all other duplications. For example if

A4 ! A1~X "**2" "-" A1~Y "**2"

is edited by the "difference of squares" rule into

A4 ! "(" A1~X "+" A1~Y ") * (" A1~X "-" A1~Y ")"

then a subsequent edit to the sum could affect the difference, and vice versa. It would be a matter of experience to determine if this is a real danger, and if so what would be the best way to avoid it.

4.4.8 Cases not expressible as simple patterns.

In section 4.4.5 it was observed that it is not possible to use a transform with a matching pattern

A4 ! A3 "+" A4 "-" A3

that either edits or creates A4!A4, because the pattern would cause a syntactic error. Thus it is difficult to "cancel out" the two A3s as an optimisation. Also, if such a pattern were allowed, matching it can be very expensive: for example in an A4 comprising 10 instances of A3 (and separating plusses and minuses) it would have to be tried 45 different ways (= 9+8+7+6+5+4+3+2+1). Orthodox compilers often avoid this quadratic explosion of possible matches by revising the expression into some canonical order: say, constants first, then terms containing variables in order of alphabetically-first variable-name, thus "collecting similar terms". A transform which exchanges two adjacent instances of A3 if they are not "in alphabetic order" would need the pattern matched and a second condition, the second condition not being expressible as a pattern to match. The "second condition" could be any one of such a wide variety of tests that in general there would have to be the possibility that it is an escape. Often it would address issues that can only be answered by looking at the table(s) of declared names in an ALGOL-like language, and in the present system all other issues of such tables must be handled by escapes.

The major problems with such escapes are that it is not clear how to pass the meta-variables of a matching pattern to an escape, and that if changes are made to the syntax rules which govern the parse trees that the pattern matches, then it is quite likely that the escape must be re-written (the system cannot just make the appropriate change, as it would in the matching pattern and any created patterns).

This situation seems unsatisfactory, but so far no unavoidable requirements have been encountered for such facilities.

4.4.9 Syntactic and lexic rules applicable only to patterns.

Many minor problems have arisen so far for which the simplest solutions would have been to allow some construct in patterns that would be illegal as input to the fabricated syntax analyser.

A simple case of this is when a checking transform detects a context-sensitive error, and wishes to create (or edit) a tree consisting of just a leaf labelled "ERROR" - as in 4.4.7. The change would be relatively simple: some new mark should be allowed at the start of the list given in a lexic or syntactic option, or of an alternative of a lexic or syntactic choice, that indicates to the lexic or syntactic fabricator not to fabricate code to test for the (start of the) list or to accept the list. In addition, if a reserved word is used only in such lists, it should not be a reserved word of the fabricated lexic analyser (but it should be interpreted as reserved by the transform-fabricator's analyser of the lexic structure of patterns). Finally, if a

lexic or syntactic metavariable is used in only such lists, then the fabricated analyser need contain no code for analysis of that metalexic or metasyntactic class, and for a metasyntactic class all lists in its right hand side are transform-pattern-only and so may indicate further transform-pattern-only reserved words and metavariables.

Detecting the transform-pattern-only reserved words and lexic metavariables at fabrication time is easy: but because of the possibility of circular dependencies between syntax rules, it is not in general easy to detect transform-pattern-only syntactic metavariables. Therefore it would be desirable to require the transform-writer to mark transform-pattern-only syntax rules

(and, if need be, to tell him he shouldn't have done so).

Apart from suppressing the tests and analysis code in the fabricated analysers, there should be no other effect on the fabrication. In particular, parse-tree nodes must be just as they would be if the tests and analysis code were present - otherwise the corresponding transform would not be supported.

So far, studies which considered simple uses of this facility suggest it to be very useful - it can be used as a nice technique for editing out leggyness, or for editing expression-trees to mark every node as either boolean or arithmetic, leading to subsequent improved efficiency. (7)

(7) In Algol-like expressions, if an open bracket is met in a context where a boolean expression is expected, it could be the start of a form such as (arithmetic expression) = ..., or of a form such as (arithmetic expression = arithmetic expression), with the result that the parser cannot know whether the enclosed expression is boolean or arithmetic, and so must parse according to the worst case of syntactic form

WORST_CASE = ARITHEXP, [ RELATIONAL_OPERATOR, ARITHEXP ]

with some subsequent technique of patching up the ambiguity and doing the appropriate checks.

On the other hand, studies of more complex uses break down in obscure ways that suggest conflicts with other parts of the current design of the software, e.g. suggesting requirements for mutual left-recursion of syntactic categories.

4.4.10 Theories of transforms.

The concept of programming by transforms, with very weak concepts of flow-of-control (apart from recursion), is relatively new. As a consequence, the theory of recursive programs needs extending to this new style. Several authors have made first steps in this direction, notably Arsac /ARS79/, but none of them are properly applicable to the present software.

The most important property that a system of transforms ought to have is termination. In the absence of looping constructs in the right-hand sides of the transforms, a transform set can fail to terminate only by infinitely deep recursion: for example, given

XS = < "X" * >

the case

ETERNAL(XS) => ETERNAL(XS ! XS "X")

would never cease recursing if infinite memory were available. Just as the termination problems for ordinary programs and for general recursive functions are undecidable, so is the termination problem in general for transform sets. However, transform sets applied to syntax trees often have a property that guarantees termination. If every created pattern in every case of every

transform of a set is in some sense "smaller" than the matching pattern of the case, then at successively nested levels of recursive calling of transforms, the trees passed will be smaller and smaller. If the measure of size of a tree is an integer - say, the number of nodes in the tree - and cannot be negative, then the sizes at nested calls will be a sequence of decreasing positive integers. Thus, if the size of a tree at a call is n, then in turn it can invoke not more than n recursive levels of nested call: thus it must terminate within a bounded depth of nested calls.

In most of the sets of transforms written so far, most of the cases have this "decreasing monotonic" property (size being measured in tree nodes). Most of the remaining cases create patterns the same size as the matching pattern, and a very few cases create larger patterns than the matching pattern. For the example syntax, the following three cases of transforms respectively illustrate the three possibilities.

ARITH(A3!A3"*"A2) => ARITH(A3) ARITH(A2) ".MULT"
ARITH(A4!A3) => ARITH(A3)
ARITH(A4!A3"+"A3"*"A2) => ARITH(A3 ! A3 "*(" A2 "+1)" )

If it can be shown that "same-size" cases only occur in call cycles that include at least one "decreasing" case, then the termination is still guaranteed. If "increasing" cases are inspected for all cycles in which they can occur, and each cycle shown to be decreasing overall, then termination is again still guaranteed.

Extensions of the normal methods for determining cycles in a graph can thus be used to study the net worst-case increases or decreases in all possible cycles of calling: if all cycles are guaranteed to be decreasing then the set of transforms terminates. For a pre-chosen and programmable measure of "size" of patterns, it would therefore be possible to write a software tool that shows that many sets of transforms terminate. Not so easy would be a tool that uses syntactic information to classify cases within a transform, for transform sets that would be undecidable with the simpler tool.
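Such a tool might be sketched as follows (illustrative Python only; the edge representation is an assumption of the sketch): each case contributes edges from its transform to the transforms it calls, weighted by the worst-case growth in tree size, and a depth-first search looks for any simple call cycle whose net change is not a decrease. Any non-decreasing cycle contains a non-decreasing simple cycle, so simple cycles suffice; the search is exponential in the worst case, which is tolerable for call graphs of this size.

def terminates(calls):
    """calls: transform -> list of (callee, delta), delta being the
    worst-case growth in tree size (created minus matched, in nodes)
    over the cases of that transform which invoke the callee.
    True if every call cycle strictly decreases the size."""
    def non_decreasing_cycle_from(start):
        stack = [(start, 0, frozenset([start]))]
        while stack:
            node, total, seen = stack.pop()
            for callee, delta in calls.get(node, []):
                t = total + delta
                if callee == start and t >= 0:
                    return True              # a cycle that fails to shrink
                if callee not in seen:
                    stack.append((callee, t, seen | {callee}))
        return False
    return not any(non_decreasing_cycle_from(t) for t in calls)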

4.5 EXAMPLE SETS OF TRANSFORMS

The following set of transforms take a PROGRAM tree, analysed

by the syntax rules given at the end of 4.3, and produce code

for a stack machine to evaluate the expressions of the program

and print their values.

OUT(PROGRAM) => "SEARCH TEVAL $$" ".MAKESTAK $$" "GO::.INITSTAK $$"
    OUT(EXPS) ".STOP $$" "END GO $$" :
OUT(EXPS!EXPS","EXP) => OUT(EXPS) OUT(EXP) ".PRINT $$" :
OUT(EXPS!EXP) => OUT(EXP) ".PRINT $$" :

OUT(EXP!"IF"EXP"THEN"EXI"ELSE"EXP~1) =>
    "IF " OUT(EXP) "THEN " OUT(EXI) "ELSE " OUT(EXP~1) :
ARITH(EXP!"IF"EXP"THEN"EXI"ELSE"EXP~1) =>
    "IF " BOOL(EXP) "THEN " ARITH(EXI) "ELSE " ARITH(EXP~1) :
BOOL(EXP!"IF"EXP"THEN"EXI"ELSE"EXP~1) =>
    "IF " BOOL(EXP) "THEN " BOOL(EXI) "ELSE " BOOL(EXP~1) :
OUT(EXP!EXI) => OUT(EXI) :
ARITH(EXP!EXI) => ARITH(EXI) :
BOOL(EXP!EXI) => BOOL(EXI) :

OUT(EXI!A4) => ARITH(A4) :
OUT(EXI) => BOOL(EXI) :
ARITH(EXI) => @OUTTEXTIMAGE(BOLD("EXI which is not an A4 in ARITH context"))@ :
ARITH(EXI!A4) => ARITH(A4) :

BOOL(EXI!B5) => BOOL(B5) :
BOOL(EXI!B5"EQV"B5~1) => BOOL(B5) BOOL(B5~1) ".EQV $$" :
BOOL(B5!B4) => BOOL(B4) :
BOOL(B5!B4"IMP"B4~1) => BOOL(B4) BOOL(B4~1) ".IMP $$" :
BOOL(B4!B3) => BOOL(B3) :
BOOL(B4!B4"OR"B3) => BOOL(B4) BOOL(B3) ".OR $$" :
BOOL(B3!B2) => BOOL(B2) :
BOOL(B3!B3"AND"B2) => BOOL(B3) BOOL(B2) ".AND $$" :
BOOL(B2!B1) => BOOL(B1) :
BOOL(B2!"NOT"B1) => BOOL(B1) ".NOT $$" :
BOOL(B1!"TRUE") => ".TRUE $$" :
BOOL(B1!"FALSE") => ".FALSE $$" :
BOOL(B1!"INPUT"TEXT) => ".INBOOL " ZXB(TEXT) " $$" :
BOOL(B1!RELATIONorA4) => BOOL(RELATIONorA4) :

BOOL(RELATIONorA4!A4~1 RELOP A4~2) =>
    ARITH(A4~1) ARITH(A4~2) "." R(RELOP) " $$" :
R(RELOP!"<") => "LT" :   R(RELOP!">") => "GT" :   R(RELOP!"=") => "EQ" :
R(RELOP!"LE") => "LE" :  R(RELOP!"NE") => "NE" :  R(RELOP!"GE") => "GE" :

ARITH(A4!A3) => ARITH(A3) :
ARITH(A4!"+"A3) => ARITH(A3) :
ARITH(A4!"-"A3) => ARITH(A3) ".MINUS $$" :
ARITH(A4!A4"+"A3) => ARITH(A4) ARITH(A3) ".ADD $$" :
ARITH(A4!A4"-"A3) => ARITH(A4) ARITH(A3) ".SUBT $$" :
ARITH(A3!A2) => ARITH(A2) :
ARITH(A3!A3"*"A2) => ARITH(A3) ARITH(A2) ".MULT $$" :
ARITH(A3!A3"/"A2) => ARITH(A3) ARITH(A2) ".DIV $$" :
ARITH(A2!A1) => ARITH(A1) :
ARITH(A2!A2"**"A1) => ARITH(A2) ARITH(A1) ".POWER $$" :
ARITH(A1!UNSIGNEDNO) => ".EVAL " ZXB(UNSIGNEDNO) " $$" :
ARITH(A1!"INPUT"TEXT) => ".INARITH " ZXB(TEXT) " $$" :
ARITH(A1!"("EXP")") => ARITH(EXP) .

The master call from the syntax is equivalent to OUT(PROGRAM). The transforms OUT( ... ! ... ) are used until it is known that the expression or subexpression is arithmetic or boolean, after which time ARITH or BOOL is used as appropriate. Note the error message output if ARITH of an EXI finds the EXI is not an A4. A similar error message should have been provided for BOOL(RELATIONorA4!A4): but seeing that not all cases of BOOL(RELATIONorA4 ! ... ) have been given, the fabricator generates code that gives error messages for remaining cases. The problem with these fabricator-provided messages is that it is very hard to engineer them to be at once informative, legible, and not excessively long: that seems to require human intuition.

The transforms ZXB(UNSIGNEDNO) and ZXB(TEXT), which output the contents of the buffers associated with input lexemes of the classes UNSIGNEDNO and TEXT, are not shown: there is at present no facility to output or otherwise manipulate these buffers from fabricated transform-cases, and so they have to be handled by escapes or (as here) by hand-written transforms in SIMULA.

At a more trivial level, the pairs of dollar-signs should be noted: these indicate end-of-line output, and need to be edited accordingly in any output of the fabricated compiler. This should be replaced by a more explicit facility of the fabricator.

The five lines output by OUT(PROGRAM) itself will be discussed below.

The following is a listing of a run of the fabricated "compiler".

.EX TEVAL.SIM/CREF(-W)
SIMULA:  TEVAL
ERRORS DETECTED: 4 TYPE U MESSAGES
LINK:  Loading
[LNKXCT TEVAL Execution]
listFile?TTY
INPUT FILE?TEVTST.TEV
OUTPUT FILE?TEVTST.MAC
1+2+3, 27*4-200...
**** Expected: a EF **** found: .
4 garbage collection(s) in 160 ms
End of SIMULA program execution.
CPU time: 0.78   Elapsed time: 4.26

The input file contained the following text.

1+2+3,
27*4-200...

The syntax of PROGRAM demands one full stop then end-of-file: the two extra input full stops cause the listing of an error message relating to the second full stop. After editing the $$ pairs to end-of-lines, the output of the above run is the following.

.TYPE TEVTST.MAC
SEARCH TEVAL
.MAKESTAK
GO::.INITSTAK
.EVAL 1
.EVAL 2
.ADD
.EVAL 3
.ADD
.PRINT
.EVAL 27
.EVAL 4
.MULT
.EVAL 200
.SUBT
.PRINT
.STOP
END GO

Each of the formats of the output lines was programmed as an assembler macro, so the above output could be macro-assembled and run. The first three lines and the last two were needed to initialise and finalise the macro-assembly process and the program run, and to reserve space for a stack. The following is a listing of the program run.

.EX TEVTST.MAC/LIST
MACRO:  .MAIN
LINK:  Loading
[LNKXCT TEVTST Execution]
6
-92

EXIT
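To make the semantics of the generated IL concrete, the following sketch (illustrative Python; the operation names are taken from the listing above, and the assembly housekeeping lines are simply ignored) interprets the TEVTST output and prints 6 and -92, as the run above does:

def run(il_text):
    stack = []
    for line in il_text.splitlines():
        parts = line.split()
        op = parts[0] if parts else ''
        if op == '.EVAL':
            stack.append(int(parts[1]))          # push an operand
        elif op == '.ADD':
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == '.SUBT':
            b, a = stack.pop(), stack.pop(); stack.append(a - b)
        elif op == '.MULT':
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == '.MINUS':
            stack.append(-stack.pop())           # monadic minus
        elif op == '.PRINT':
            print(stack.pop())
        # SEARCH, .MAKESTAK, GO::, .STOP, END are assembly
        # housekeeping with no effect on the evaluation itself

run(""".EVAL 1
.EVAL 2
.ADD
.EVAL 3
.ADD
.PRINT
.EVAL 27
.EVAL 4
.MULT
.EVAL 200
.SUBT
.PRINT""")                                       # prints 6 and then -92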

The preceding TEVTST example illustrates that the method still requires a substantial set of transforms to achieve the translation of a simple and straightforward syntax: for non-optimising translation the transforms in the examples to date are 3 to 4 times the length of the syntax. (This does not include name identification and other parts of translation that cannot be written as transforms.)

As an example that minor changes in the transforms can result in significant improvement of the output, the following two variants of expression-evaluation (both even simpler than TEVTST) are presented. DEM04 is the naive set of transforms for the syntax.

.RUN ...
listFile?TTY
LIST TREE?N
LIST ORDINALS?N
INPUT FILE?DEM04

LEXIS

ID = < LETTER * > :
NUM = < DIGIT * > :
EF = EOF .

SYNTAX

@GET HANDWRITTEN@

PROGRAM = @PUT INIT_IO; INIT_XC; @ SUM, @PUT A_SUM.T_L; @ EF @PUT EN_CLOSE; EX_CLOSE; @ :
SUM = < [ ADDOP~MONO ], PROD * ADDOP~DYAD > :
PROD = < PRIM * ("*" ! "/") > :
PRIM = ( ID ! NUM ! "(", SUM, ")" ) :
ADDOP = ( "+" ! "-" )

ACTIONS

L(PROD! ID ) => @ ol; @ ZXB(ID) :
L(PROD! NUM ) => @ ol; @ ZXB(NUM) :
L(PROD! "(" SUM ")" ) => L(SUM) :
L(SUM!PROD) => L(PROD) :
L(SUM!ADDOP PROD) => L(SUM!PROD) MONO(ADDOP) :
L(SUM!SUM ADDOP PROD) => L(SUM) L(PROD) DYAD(ADDOP) :
L(SUM!SUM ADDOP~1 ADDOP~2 PROD) =>
    L(SUM) L(SUM! ADDOP~2 PROD ) DYAD(ADDOP~1) :

MONO(ADDOP!"+") => :
MONO(ADDOP!"-") => @ ol; @ " MINUS " :
DYAD(ADDOP!"+") => @ ol; @ " ADD " :
DYAD(ADDOP!"-") => @ ol; @ " SUBTRACT " :

The following is a run of the fabricated translator on the input expression that it lists, followed by a listing of the outputs. (The escapes in the transforms fabricate calls to the output-line subroutine of the fabricated translator.)

.EX DEM04
LINK:  Loading
[LNKXCT DEM04 Execution]
listFile?TTY
INPUT FILE?DEM04.IN
OUTPUT FILE?DEM04.EX
1 + 2 + +3 + -4 - 5 - -6 - -7
3 garbage collection(s) in 80 ms
End of SIMULA program execution.
CPU time: 0.52   Elapsed time: 13.32

.TYPE DEH04.EX

1 2 ADO 3 ADO 4 MINUS ADO 5 SUBTRACT S MINUS SUBTRACT 7 MINUS SUBTRACT

If this is a typical input to this translator, it needs to be cleverer in cases of two adjacent ADDOPs. DEM05 illustrates one possible improvement, replacing the one (long) transform case dealing with the two-ADDOPs case by two (short) transform cases, one dealing with the ADDOPs being the same (i.e. ++ or --, both with arithmetic effect +), the other dealing with the remaining combinations (i.e. +- and -+, both with arithmetic effect -).

.RUN SSI
listFile?TTY
LIST TREE?N
LIST ORDINALS?N
INPUT FILE?DEM05

(The syntax and lexis are identical to DEM04.)

ACTIONS
L(PROD! ID )  => $ ol; $ zxb(ID) :
L(PROD! NUM ) => $ ol; $ zxb(NUM) :
L(PROD! "(" SUM ")" ) => L(SUM) :
L(SUM!PROD) => L(PROD) :
L(SUM!ADDOP PROD) => L(SUM!PROD) MONO(ADDOP) :
L(SUM!SUM ADDOP PROD) => L(SUM) L(PROD) DYAD(ADDOP) :
L(SUM!SUM ADDOP^1 ADDOP^1 PROD) => L(SUM! SUM "+" PROD ) :
L(SUM!SUM ADDOP^1 ADDOP^2 PROD) => L(SUM! SUM "-" PROD ) :
MONO(ADDOP! "+" ) => $$ :
MONO(ADDOP! "-" ) => $ ol; $ " MINUS " :
DYAD(ADDOP! "+" ) => $ ol; $ " ADD " :
DYAD(ADDOP! "-" ) => $ ol; $ " SUBTRACT " :

.EXECUTE
LINK: Loading
[LNKXCT DEM05 Execution]
listFile?TTY
INPUT FILE?DEM04.IN
OUTPUT FILE?DEM05.EX

1 + 2 + +3 + -4 - 5 - -6 - -7

3 garbage collection(s) in 80 ms
End of SIMULA program execution. CPU time: 0.68 Elapsed time: 13.24

.TYPE DEM05.EX
1 2 ADD 3 ADD 4 SUBTRACT 5 SUBTRACT 6 ADD 7 ADD

It will be seen that minor changes, in suitable circumstances, can lead to significant improvement in the final outputs.

For larger examples, see appendix P.

4.6 REVIEW OF THE PROPOSED TECHNIQUES.

Personal experience of compiler writing, backed by the reading

of the source of a variety of compilers, suggested a stereotype

for the case analysis of the earlier code-generation phases.

Writing the case-analysis is often the most time-consuming and

error-prone part of hand-crafting an orthodox compiler, and is

highly susceptible to even minor changes in the syntax.

The techniques proposed and implemented provide a language in

which cases of syntax are dealt with independently, unless

other cases create exclusions from them. There is consequently a high separation of program difficulty into the individual cases, which become very simple in isolation.

The software resolves the inter-relationships between cases

of the same transform, and in most cases resolves the effect

of detailed changes to the syntax on the text of the transforms.

All the case-analysis code is generated automatically.

The transform language has thus been conceived to maximise

localisation of compiler-design issues. Inspection of the fabricated code shows that it is often 20% to 50% less efficient than equivalent code written by hand, but can be implemented in about 10% of the effort otherwise needed.

There are many things this first version cannot do, as commented

in prior sections. Lacking a stereotype, no name-table manipulation is provided: for a particular case it must be added as

hand-written code, or be a subsequent pass over the output of

the primary code-generator.

To close this chapter, the following is a final example of what the present system can and cannot do. Consider a lexic division including:

CONST = DIGIT < DIGIT * >

with a syntactic division including:

MULTIPLE = < PRIMITIVE * ('*'!'/') >

with PRIMITIVE defined in such a way that CONST is a case of it. Then floating-point division (expensive on most machines) can be turned into floating-point multiplication (cheaper) if the divisor is a CONST by the following transform-case:

FLOAD(MULTIPLE!MULTIPLE"/"CONST) =>
    FLOAD(MULTIPLE!MULTIPLE"*(1/"CONST")")

Ultimately, the bracketed term must be handled by something outside of the present system. The following is a simple possibility:

FLOAD(MULTIPLE! "1/"CONST ) => "LOADRECIPROCAL " Z(CONST)

- in which Z is a hand-written transform for outputting the text of lexic items. Already a library of hand-written transforms such as this Z (and the ZXB of the prior examples) is beginning to emerge, written in SIMULA. A later revision of the software will include specific facilities for them, but until then they must be included by the techniques described in the next chapter.

CHAPTER 5

A PILOT PRIMARY CODE-GENERATOR FABRICATOR

5.1 INTRODUCTION

This chapter briefly describes an implementation of the system described in the last chapter, the implementation sufficing to demonstrate the viability of the novel essential ideas of the system.

The implementation is large and complex. This short chapter mentions only the key aspects of the implementation, and in a few cases proposes alternative techniques suggested by the experience. Section 5.2 overviews the suite of programs and the methodology of using them and of writing them. Section

5.3 describes the fabricators of lexic and syntactic analysers.

Section 5.4 describes the parser of the transforms. Section

5.5 describes the program that takes the parsed transforms and fabricates the corresponding code, and also describes a program that produces a library to support the fabricated transforms. Section 5.6 closes the chapter with a final discussion.

5.2 OVERVIEW OF THE CF1 SUITE OF PROGRAMS

For brevity, the program suite described here is called CF1:

Compiler Fabricator for PRIMARY code generators.

From the first phases of the programming it was clear that

it was simpler to have multiple programs, each to fabricate

one part of the final parser/code generator. This separated

many of the problems in writing the suite, but some care had

to be taken to preserve identical functionality in some parts

of the programs. The separation also implied a utility for

joining the various outputs into one final source program.

This source processor, its use in writing the programs of the

suite, and its use as a member of the suite, is discussed in

section 5.2.2.

The suite itself comprises the source processor and six other

programs. Figure 5-1 on the next page illustrates the

organisation of the programs, and the flow of information from

the input file, through the suite, to finally obtain a SIMULA

program to be compiled and run as the desired parser/primary

code generator.

The main input file to the suite is in three divisions: lexic

rules, then syntactic rules, then transforms. Hand-written

Simula code for incorporation can be appended to this file

for inclusion in the fabricated lexic analyser, syntactic

analyser, or transform code: this is specified using GET

escapes at the appropriate point in the corresponding division.

Alternatively, hand-written code can be taken from another file.

The Lexic Analyser Fabricator and Syntactic Analyser Fabricator

each read only the first two divisions of the input file: each

of them outputs its fabricated Simula with a few extra lines required by the source processor.

Figure 5-1 (Organisation of the CF1 suite): in this diagram, programs are represented by rectangles, and the principal files used in an application of the suite are represented by drum-shapes.

The lexic analyser fabricator

needs the syntax division only to pick out the reserved lexemes:

the syntax analyser fabricator needs to read the lexic division

only to know the metavariable names of the lexic rules. These

two fabricators are described in detail in Section 5.3.

The fabrication of the transforms is done in three main stages:

first the transform cases are parsed, and output in an easy-to-

reparse form; this is then sorted; then reparsed and the

transforms finally fabricated. The key reason for the split

is that the intermediate form of the transform cases is

represented starting with the two identifiers that comprise

the name of the transform, so that the sorting collects all

the cases of a transform. Without this intermediate stage,

the freedom of order of giving the transforms in the input

file would imply the possibility that one may need all the

transforms in store at the same time - which for large sets

of transforms would demand a very substantial machine-memory

size. With the sorting, only the memory for the largest

transform is needed.
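To make the staging concrete, the following is a minimal sketch in present-day Python (purely illustrative - the suite itself is written in SIMULA, and the function names here are invented):

    # Sketch of the parse/sort/fabricate staging.  Each intermediate
    # record is assumed to be one line that begins with the two
    # identifiers naming its transform, e.g. "OUT AE ...".
    from itertools import groupby

    def name_pair(line):
        first, second, _rest = line.split(None, 2)
        return (first, second)

    def fabricate_all(case_lines, fabricate_one_transform):
        # Sorting brings all the cases of one transform together, so
        # only the cases of the current transform need be in memory.
        for pair, cases in groupby(sorted(case_lines), key=name_pair):
            fabricate_one_transform(pair, list(cases))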

The intermediate form therefore is one line per transform

case. After the two identifiers comes a form of the matching pattern, in which is embedded all the parse structure inferred

from the syntax division; then escapes, texts, and transform

calls for the right hand side of the transform case (the

calls being the pair of identifiers plus structured parse of

the creation pattern). The program that reads the transforms must therefore parse the patterns in a special manner, dependent on the syntactic and lexic rules. Since those rules are input-dependent, they cannot be built in (unlike the rules for parsing the lexic and syntactic divisions and the non-transform parts of the transform division); instead this program must hold a data structure representing the lexic and syntactic rules, and interpret that in order to parse the patterns. This program is therefore called the Interpretive Parser: it is discussed in more detail in Section 5.4.

The Transform Fabricator takes the sorted interpretively-parsed transform cases until it finds the name of another transform: it inserts each input case according to its matching pattern into a tree of cases, the structure of which represents the relationships between the patterns. From this tree, it is able to fabricate case analysis code appropriate to the transforms.

In each branch of the case analysis it fabricates code matching the right hand side of a transform case: apart from the created patterns, this is straightforward. For a created pattern, it calls a function that builds a parse-tree node representing the pattern, from equivalent function-parameters. This needs

more knowledge of the syntactic and lexic rules than the transform fabricator is given by the interpretive parser, so another program is used to read the lexic and syntactic rules, and fabricate these oddments. This last program of the suite is called the Oddments Fabricator, for want of a more concise name. Some of the fabricated oddments are small, and are directly read by the transform fabricator and incorporated into its output: some are large and are output as a source library file. The transform fabricator outputs calls on the source processor to incorporate the large oddments needed by the transform fabricator output. The transform fabricator and oddments fabricator are discussed in more detail in Section 5.5.

A choice which had to be made right at the beginning was whether to have the parser-fabricator produce either (1) the source of a compiler, or (2) a table (or set of tables or other structured datasets) which would "parameterise" a standard parser which interprets the table as a specification of the syntax.

Method (1) has the advantage of producing fast compilers. A slow part of all compilers - often the slow part - is the lexical

analyser, so there is no doubt in my mind that method (1) is best for the lexical analysers. This method has the disadvantage of producing large parsers: size being roughly proportional to

the length of the lexic and syntactic divisions. Most programming

languages have lexical definitions of about the same complexity, so the fabricated sizes of lexical analysers would not differ by a great factor. However the complexities of syntaxes vary enormously, and consequently so would the sizes of syntax analysers of this type: for big languages the fabricated syntax analysers would be vast.

Method (2) requires a fixed size for the interpretive syntax analyser: this could be of about the size of a Method (1) parser for a ten-production language. Method (2) also would typically call for table sizes about one-tenth the size of the corresponding method (1) code: this sets breakeven on size for ,129

the two methods at an 11-production language! Thus even for relatively small languages method (2) is overwhelmingly more compact. However it is at best about one-half the speed.

This would be acceptable for a fast syntax analysis method - e.g. one which takes time linearly proportional to the length of the text to be analysed. (Many common methods take time quadratic in the length, cubic, or to the power of 2.81 = log2 7.)
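The size breakeven quoted above follows directly from the two estimates: measuring size in units of one production's worth of method (1) code, method (1) costs about n units for an n-production language, while method (2) costs a fixed 10 units of interpreter plus about n/10 units of tables. The two are equal when

    n = 10 + n/10,   i.e.   9n/10 = 10,   i.e.   n ≈ 11.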

In the event the decision was made by implementation considerations. Method (2) would be easy to first-draft, but hard to debug or enhance, because the produced tables would be too obscure to verify easily. Even worse is the problem of initialising the tables into object parsers: possible variants of the method are:

(2a) fabricating code to initialise the table;
(2b) fabricating the tables as a file to be read by hand-written code incorporated into all target compilers;
(2c) using structure-constants of the implementation language to produce the table as a file to be incorporated in the source of the target compiler; or
(2d) same as (2c), but additionally with function entries in the tables.

Variants (2a) and (2b) require space for code used only at start-up: but this could be a soon-discarded overlay. However they would both make method (2) even harder to debug or enhance: to my mind this eliminates them.

Variant (2c) would be viable if the implementation language supported structure-constants - e.g. CORAL, ALGOL-68, ADA.

(Also Fortran if it bore consideration as an implementation language.)

No such language was available to me. The implementation language selected was SIMULA: structure-constants are not one of its features.

Variant (2d) is supported by few languages: e.g. BCPL and

POP-2. BCPL was not available; POP-2 became available too

late for this part of the project, but has a number of other

features that make it highly desirable.

In the first instance, I therefore had to use method (1).

An implementation of Simula written using a working Compiler

Factory could easily be upgraded for tables for variants (2c)

and (2d), and thus the compiler factory could be re-written

in this upgraded Simula to use variant (2c) or (2d).

An additional reason to prefer some variant of method (2)

is that it would be more akin to the interpretive parser

for the patterns in the transforms: and the interpretive

parser had to be written anyway! In a sense, the interpretive

parser already is close to variant (2b).

In the event, it was decided to fabricate method (1) parsers,

using no-backtrack LR parsing techniques on (extended) LL grammars. This method (and others) has been called both LC

and "recursive descent".

5.2.1 Bootstrap use of the parser generator, and bootchecking

All the programs of the suite except the sort program and the

source preprocessor have to syntax-analyse their inputs. It was therefore natural and desirable to write them using the

parser-fabricating programs, that is, the lexic-analyser

fabricator and the syntax-analyser fabricator. Clearly, in

the first instance, these two latter programs could not be

used to write their not-yet-existing selves. A bootstrapping

process was therefore applied. A lexic analyser for the input

language was written by hand: it was very much a first draft

of what a fabricated lexic analyser should be. A syntax

analyser, of the form needed for the syntax-analyser fabricator (and itself very much a draft of its own output) was written by hand,

and the two hand-written sets of code joined to give a first draft of the syntax-analyser fabricator. The main input on which this was tested was a syntax division giving the syntax of syntax divisions, so the first working program part-fabricated by the suite was the second version of the syntax-analyser fabricator. Comparison of the first (handwritten) syntax-analyser fabricator and the second (fabricated) fabricator was invaluable for eliminating several subtle bugs from the hand-written version.


When use of the second (fabricated) version yielded output identical to that of the first (handwritten) version, they were declared bug-free, at least for boot-strapping their own successors. This technique of checking a bootstrapping stage I call a bootcheck. The handwritten version was called version 0, the first fabricated version was called version 1. Bootcheck versions were not numbered, being identical to a numbered version.

A syntax for lexic divisions was then written and fabricated: to this the appropriate fragments of the handwritten dedicated lexic-analyser fabricator were joined, generalised, and re-debugged until this fabricated a lexic analyser, which (when re-used) fabricated itself. This was then version 2 of the software; it had called for a slight amendment in the syntax analyser fabricator, and so had implied a second boot-check.

Whenever any of the programs in the suite was modified, these two fabricators - on which the whole integrity of the system depends - were each bootchecked. Of the 196 versions through which the suite went before completion, about 50 had overt changes to one of these two fabricators, but 6 versions had covert changes (due to modifications of the source library) of which two would have been damaging. Bootchecking caught these covert changes, facilitated validation of the acceptable four and - most importantly - made the unacceptable two changes very obvious. (1)

The remaining three programs of the suite that parse their input are written using the lexic-analyser fabricator, the syntax-analyser fabricator, and large bulks of hand-written code incorporated using escapes. Figure 5-2 illustrates the organisation of the fabrication of one of the five programs, using these two fabricators.

(1) The four acceptable covert changes were given version numbers in their own right - despite there being no input source changes - and bootchecked. It was this second bootcheck, using identical input source, that caught one of the two unacceptable covert changes. The bug in that covert change was very subtle, and by orthodox de-bugging methods would have been extremely difficult to locate.

Figure 5-2 (Bootstrap use of CF1): fabrication of version N+1 of a CF1 program, in Simula.

This bootstrapping stratagem has saved a substantial amount of effort, especially for maintaining the compatibility of various aspects of the five programs.

5.2.2 The source processor, and the source library.

The source processor can be configured in two major ways:

as a set of subroutines in another program, or as a program

in its own right. The program version is little more than

the subroutines called by a loop that reads a line, then writes a line, then loops, with appropriate initialisation and finalisation.

The full version of the processor has 16 special facilities

for input, each of which may be present or absent, and one

special facility for output, which may be present or absent.

Only the three major special facilities are described here.

One of the optional special input facilities is source line

selection. When the input subroutines are initialised, if

source line selection is present then the question "WHICH

SELECTIONS?" is printed on the controlling terminal. The response should be zero, one or more letters and/or digits.

When a line is input via the subroutines, the first character is inspected: if it is a question mark, the next question-mark on the line is located (if there is not another, an error is given). If any of the characters between the two question marks was given in answer to "WHICH SELECTIONS?", the line is passed to the calling routines: otherwise, the line is discarded, another is input, and the process of line selection is restarted.

Using line selection, it is easy to configure variants of a source depending on the reply to "WHICH SELECTIONS?", if the options can be made orthogonal.
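The selection test itself is simple; the following sketch (illustrative Python - the real routines are the SIMULA source processor subroutines, and the stripping of the marker from surviving lines is an assumption of the sketch) shows the filter:

    # Sketch of source line selection.  A line "?AB?text" survives
    # only if 'A' or 'B' was given in reply to "WHICH SELECTIONS?".
    def select_lines(lines, selections):
        for line in lines:
            if line.startswith('?'):
                end = line.find('?', 1)
                if end < 0:
                    raise ValueError("second question mark missing")
                if not any(c in selections for c in line[1:end]):
                    continue                 # discarded: no selector matched
                line = line[end + 1:]        # assumed: marker is stripped
            yield line

    # list(select_lines(['?A?only for A', 'always'], 'AB'))
    #     --> ['only for A', 'always']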

The source processor's own source must be processed through a version of itself in which source line selection (and several other special facilities) is present. It is by this means that the 17 special facilities are made optional.

The four programs of the CF1 suite that read lexic and syntactic divisions (and in one case transform divisions) in CF1 input notation all have their hand-written parts at the end of the same file, with the sections for the different programs marked for source line selection. The syntactic-analyser fabricator and lexic-analyser fabricator include the source processor routines with the source line selection facility present. Thus, when fabricating themselves or one of the other two, they can make the correct selections.

Another optional special input facility is in-lining. If this facility is present, it inspects every input line, determining if it starts with two asterisks. For such lines it has three subcases:

(1) the line is of at least 5 characters, and the last two characters are also asterisks;
(2) the line is of at least 3 characters, and the last two characters are not asterisks; and
(3) the line is the two asterisks only.

In case (1), the line is a control line, and the characters between the asterisks determine the effect. If a control line starts **OPEN, then the remaining characters are the name of a file on external storage, which is to be opened as the secondary input file: if there is already an open secondary input file, an error is given. If a control line is **CLOSE**, then the current secondary input file is closed: if there isn't a secondary input file, an error is given.

If a control line starts **GET, then the remaining characters are the name of a subfile on the current secondary input file which is to be located; then the normal input is switched to the secondary input file: if there isn't a current secondary input file, then an error is given.

In case (2), the characters after the two asterisks are the name of a subfile which starts at the next line. This should only appear in secondary files: in a primary file it will cause an error if read. Thus after reading a **GET from a primary input file, the next line delivered to the calling routine is the line following the subfile name in the secondary file; and subsequent calls deliver subsequent lines from the secondary input file.

While inputting from a secondary input file, if a case (3) or another case (2) line is encountered, the reading is switched back to the primary file, and the reading process restarted. Thus, in one of the latter cases, the next line delivered to the calling routine will be that after the

**GET in the primary file, or if that is a control line, whatever the control line determines. In a secondary file, there may be an **OPEN of, **GET from, and **CLOSE** of a tertiary file: in a tertiary file, of a quaternary file: and so on. Between an **OPEN and its matching **CLOSE**, there may be many (or no) **GET lines.
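As a sketch of these control lines (illustrative Python; only one level of secondary file is modelled, and error handling is reduced to exceptions):

    # Sketch of **OPEN / **GET / **CLOSE** in-lining, one level deep.
    def inline(primary_lines, open_file=open):
        secondary = None
        for line in primary_lines:
            line = line.rstrip('\n')
            if line.startswith('**OPEN'):
                if secondary is not None:
                    raise ValueError("secondary file already open")
                secondary = open_file(line[6:].strip())
            elif line == '**CLOSE**':
                if secondary is None:
                    raise ValueError("no secondary file to close")
                secondary.close()
                secondary = None
            elif line.startswith('**GET'):
                if secondary is None:
                    raise ValueError("no secondary file for **GET")
                wanted = '**' + line[5:].strip()
                for s in secondary:                    # locate the subfile
                    if s.rstrip() == wanted:
                        break
                for s in secondary:                    # copy until it ends
                    s = s.rstrip('\n')
                    if s == '**' or (s.startswith('**')
                                     and not s.endswith('**')):
                        break        # a case (3) line, or the next subfile
                    yield s
            else:
                yield line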

The same routine that is called by a **GET line is called by the lexic-analyser fabricator and syntactic-analyser fabricator when they process a GET escape, with no primary file, and the CF1 input file as the secondary input.

This is the means by which large quantities of hand-written code are most easily incorporated into fabricated programs

(notably into the CF1 programs themselves).

The third facility important to the CF1 suite is library in-lining. A control line starting **CALL simply adds the remaining characters on the line as an entry on the call-list, unless they are already present as an entry of either the call-list or the called-list. When the subroutines are initialised, both lists start empty. A control line **SEARCH** causes the transfer to the next-level **OPENed file, and a search to a line marking the start of a subfile. If the subfile name is not in the call list, the search restarts for another subfile: if it is in the call list, input switches to it, and the name is removed from the call list and put in the called list: if a line of exactly two asterisks terminates the subfile, the search restarts: if the subfile is terminated by a line marking the start of another subfile, that subfile name is checked against the call list and processed as if found by a search. When a search reaches end-of-file, the line

after the **SEARCH** is processed. When the finalisation

routine is called, if the call-list is not empty an error

is given and the names of the un-found subfiles listed.
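A sketch of these two control lines (Python again, names invented; the call-list and called-list are held as lists, and emit() receives the in-lined lines):

    # Sketch of **CALL and **SEARCH** library in-lining.
    def call(name, call_list, called_list):
        if name not in call_list and name not in called_list:
            call_list.append(name)

    def search(library_lines, call_list, called_list, emit):
        copying = False
        for line in library_lines:
            line = line.rstrip('\n')
            starts_subfile = (line.startswith('**') and len(line) > 2
                              and not line.endswith('**'))
            if starts_subfile:
                name = line[2:]
                copying = name in call_list
                if copying:
                    call_list.remove(name)   # each subfile in-lined once only
                    called_list.append(name)
            elif line == '**':
                copying = False              # end of the current subfile
            elif copying:
                emit(line)
        # at finalisation, a non-empty call_list is an error: those
        # subfiles were never found in the library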

Library in-lining is important in CF1 for two purposes.

The code output by the transform-fabricator seldom needs

all the oddments (fabricated by the oddments-fabricator)

that it might have needed. Thus, when it outputs code

requiring an oddment, for the next line it outputs a **CALL

of the oddment. The oddments-fabricator puts each oddment

in a subfile of its own. The reason that **GET could not be

used is that the oddment would be in-lined once per **GET, in place of the **GET: but each oddment is needed only once, and

in a place known to be harmless. Therefore, the only remaining

requirement on the transform fabricator is that it should put

a **SEARCH** of the oddments library at a harmless (and very

late) point in its output to obtain each **CALLed oddment once.

The second purpose of library in-lining is to obtain subfiles

from the CF1 library. This is a library of subroutines common

to all possible CF1-output sources (including the CF1 programs themselves). It includes:

* the source processor routines themselves;
* character-input routines needed by all fabricated lexic analysers (and which call the source processor routines to input lines);
* listing routines and error-message routines for all fabricated parsers, error messages appearing at appropriate points in output listings;
* routines to assist in the listing of trees, for building into debugging versions of CF1 programs;
* routines for manipulating names in all CF1 programs (see 5.3.1); and
* other miscellaneous facilities needed in CF1 programs or in other CF1-fabricated programs, such as to print in bold, return text string representations of binary integers, etc.

Having common source for these routines (even if different variants were to be produced) was a great time-saver, and also assisted in guaranteeing consistency between the CF1 programs, and also between CF1-fabricated programs. (2)

Finally, it facilitated parallel updates of the sources of two or more programs at a time.

5.3 THE LEXICAL ANALYSER FABRICATOR AND SYNTACTIC ANALYSER

FABRICATOR

The lexical analyser fabricator fabricates one large Simula procedure that inputs and checks the next lexeme (lexical symbol, token) of the target language, after ignoring any preceding blanks. A call of the fabricated procedure leaves an integer value representing the kind of the lexeme in a result variable, and the accepted text in a secondary result variable.

Each lexic rule and reserved lexeme of the target language is assigned its own integer.

(2) At one stage, the syntax-analyser generator, the interpretive parser, and the oddments generator were all one program. By introducing appropriate source selection, it was possible to mark the three sources. Trial of the three, with the source processor used to merge their outputs, gave final output identical to that of the combined program at the first attempt.

The syntactic analyser fabricator fabricates one parsing procedure for each syntactic rule: a call of such a routine

returns a pointer to a (new) tree-node object of a Simula

class also fabricated for the same syntactic rule.

The syntactic analyser fabricator consistently assigns the

same integers as the lexic analyser fabricator to each kind

of lexeme: thus, it can test if the latest-read lexeme is of

a certain kind or of one of a certain set of kinds, as is needed for LR parsing of an LL language. Great care was needed to ensure the consistency of the two assignments (and

of the same assignment again in the interpretive parser and

oddments fabricator).

To simplify the syntax analyser fabricator a little, to obtain a subtree node the fabricated procedure makes a call that does not distinguish between nodes representing a syntactic rule or a lexic rule. Thus, for each lexic rule (but not for each reserved lexeme) the lexic analyser fabricator produces a procedure that returns a pointer to a node which includes the text of the lexeme, or gives an error if the lexeme currently awaiting acceptance is not of the expected type.

In either case, the lexeme is consumed and the main lexical-analyser routine called to analyse the next lexeme. This has the side-advantage of avoiding duplicated in-line code for accepting a particular kind of lexeme.

In early versions of the software, a boolean procedure was fabricated for each lexic rule and syntactic rule which returned TRUE if (and only if) the current lexeme was an instance of

the lexic rule, or could start an instance of the syntactic

rule. These procedures were found to consume excessive time

for procedure-calling, and so had to be replaced by in-line

tests conjoining and disjoining equalities of constants

(representing lexeme kinds) against the kind of the current

lexeme. This was acceptably fast, and not excessively bulky,

but demanded the consistency mentioned above. In a substrate

language including case-statements (e.g. BCPL or Ada) it would

be faster and more compact.
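The shape of the replacement can be indicated as follows (a Python rendering - the fabricated code is Simula, and the kind numbers here are invented for the example):

    # The abandoned form: one boolean procedure per rule.
    ID_KIND, NUM_KIND, LPAREN_KIND = 7, 8, 9   # invented kind numbers

    def starts_prim(kind):
        # a procedure call per test: found too slow in practice
        return kind == ID_KIND or kind == NUM_KIND or kind == LPAREN_KIND

    # The fabricated replacement: the same disjunction of constant
    # comparisons written in-line at each point of use, avoiding the
    # procedure-call overhead but demanding consistent kind numbering.
    def parse_step(kind):
        if kind == ID_KIND or kind == NUM_KIND or kind == LPAREN_KIND:
            return "parse a PRIM"
        return "error"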

5.3.1 The Lexical-analyser fabricator.

The fabricator attempts analyses of each reserved lexeme

according to the lexic rules, as discussed in section 4.3

(pages 84 and 85). Reserved lexemes which do not have a

(non-null) common prefix with any rule are treated as having

a null prefix common with a fictitious rule.

For each rule, the corresponding reserved lexemes are grouped

into:

(1) those which cause an error against the rule before being wholly accepted by the rule;
(2) those wholly accepted by the rule (without error); and
(3) those of which a prefix is accepted by the rule (without error).

The rule is then fabricated into the code given below, preceded by tests for the longest acceptable group (1) reserved lexeme, and at every end-point of the code for the longest remaining suffix of a group (2) or (3) reserved lexeme, matching the prefixes possibly reaching that end-point. ("End-point" here

applies when the rule ends with a choice: it is an end-point

of the branch of the if-statement fabricated for the choice.)

A sequence in the rule fabricates acceptors for the atoms of

the sequence, the order of the acceptors matching that of the

atoms. The sequence of acceptors is entered only if the first

acceptor is suitable for the current character; each other

acceptor is preceded by a test that the then-current character

is suitable for the acceptor. For example, the rule

LD = LETTER DIGIT

will fabricate an acceptor for one letter, then a check that

the next character is a digit, then an acceptor for one digit.

The pre-checking implies that acceptors for single-character atoms (including choices of which each branch is a single-character atom) need not "know" the variety of character(s) to be accepted: such an acceptor merely puts the current character into the lexeme buffer and inputs the next character.
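For instance, the fabricated acceptor for LD might take the following shape (a Python stand-in for the fabricated Simula; the Stream scaffolding is invented, and the report-then-continue error behaviour is that described in the next paragraph):

    # Sketch of the acceptor for LD = LETTER DIGIT.  The caller has
    # already checked that the current character is a letter.
    class Stream:
        def __init__(self, text):
            self.text, self.pos = text, 0
        def cur(self):
            return self.text[self.pos] if self.pos < len(self.text) else '\0'
        def advance(self):
            self.pos += 1
        def error(self, message):
            print("lexical error at", self.pos, "-", message)

    def accept_LD(s, buffer):
        buffer.append(s.cur()); s.advance()   # accept the letter, unchecked
        if not s.cur().isdigit():
            s.error("digit expected")         # report, then carry on as if OK
        buffer.append(s.cur()); s.advance()   # accept the digit

    # buf = []; accept_LD(Stream("A7"), buf)  -->  buf == ['A', '7']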

If the check before an acceptor fails, an error is given at

that point in the lexeme, and control continues as if the

check succeeded. Thus the present software gives a single

good error message, then may become utterly confused, giving

many spurious error messages. A production version ought

to try to avoid the spurious messages, and ideally would

attempt more intelligent recovery or even correction.

A choice in a lexic rule fabricates an if., then., else if ...

acceptor: the final else-branch simply gives an error message

at that point in the lexeme, and (to avoid possible hang-ups) accepts the bad character.

An option in a lexic rule fabricates an if..then... acceptor with no else-branch.

A repeat of form < sequence > fabricates an acceptor of form:

    while current character starts sequence
    do begin acceptor of sequence end

A repeat of form < sequence1 * sequence2 > fabricates an acceptor of form:

    acceptor of sequence1
    while current character starts sequence2
    do begin
        acceptor of sequence2
        acceptor of sequence1
    end

This duplicates the code of the acceptance of sequence1: most lexic sequences are simple enough that the duplication is reasonable.

A repeat of form < sequence * > is treated as the last case, with a null sequence2.

The only point of interest in the (pseudo-)parsing routine for each lexic rule is the new node to which it returns a pointer. In most cases, this is just the Simula TEXT object containing the string of characters buffered for the lexeme which is the accepted instance of the rule. When fabricating CF1 programs, the "decorated" items need to be multiples of TEXTs, with leading parts factored out so that (e.g.) determining that two decorated names have the same prefix can be by efficient pointer-comparison, rather than inefficient text-value-comparison:

    TEXT : POINTER field : field

This is achieved by inserts in the "parsers" for the rules.

For names of rules, the above is still a simplification, in that instead of pointing to a common text object, the common object pointed to is the parse tree of the rule, or, before the rule has been read, to a place-holder for the rule. Once

the complete parse-tree has been built, reference to the parse node of a decorated item is only one indirection from

the rule referred to. This avoids expensive and repeated

searches through the lists of rules.

5.3.2 The Syntactic-analyser fabricator.

A parsing procedure is fabricated for each syntactic rule:

the body is organised much as for the acceptor fabricated

for a lexic rule for the same structure. The main differences are the possibility of calling other parsers from this

(including the lexic pseudo-parsers), the expansion of the in-lined tests, and the organisation of the nodes pointed to by the results.

The parsing method used is a variation of GLC [DEM77].

For work such as this project, it is highly desirable to use a strongly typed language: in the project this feature of Simula exposed many trivial errors that could otherwise have been time-wasting. For the simple representation of nodes matching rules with sequences of choices, the nodes would need

embedded variant parts. No common current language has

this feature, except (by definition) the untyped languages.

Consequently a rule such as

FOUR_TWO = ( A1 ! A2 ! A3 ! A4 ), ( B1 ! B2 )

implies a node with space for six pointers, not space for

two. This space-wasting is the worst single feature of

the present software. Some languages allow unchecked type conversions (e.g. Ada): given that the pointers should all be the same size, this would be viable, especially given that the un-checking code would be machine-generated, and so more consistently debuggable than in the case of random human omissions. A Simula "trick" could also have been used, but in a trial led to appalling loss of speed of the fabricated code, and also tied the system much more tightly to Simula as a substrate language.

In addition, to record the reserved lexeme represented by a node of a rule such as

ADDING_OPERATOR = ( "+" ! "-" )

there will be no pointers: so null values of pointers cannot be relied on to indicate that the corresponding branch of a choice did not match the construct instance. For each choice there was therefore a need for an integer in the node to indicate which branch matches the instance, and for each option a boolean to indicate whether it was present in the instance: trying to optimise these away would lose the separation of the representation of the structure of a case or option from the representation of the branches or optional list.
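In a modern rendering, the node class fabricated for FOUR_TWO would look something like this (Python for illustration; the real classes are fabricated Simula):

    # Node for FOUR_TWO = ( A1 ! A2 ! A3 ! A4 ), ( B1 ! B2 ).
    # Without variant parts all six branch pointers are reserved, and
    # two integers record which branch of each choice matched.
    class FourTwoNode:
        def __init__(self):
            self.a1 = self.a2 = self.a3 = self.a4 = None   # choice 1 branches
            self.b1 = self.b2 = None                       # choice 2 branches
            self.which1 = 0     # branch matched in choice 1 (1..4)
            self.which2 = 0     # branch matched in choice 2 (1..2)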

The structure of nodes for repeats is the left-recursive

one implied in section 4.3 (page 87).

5.4 THE INTERPRETIVE PARSER

The interpretive parser is written using the syntax-analyser

fabricator and lexic-analyser fabricator as described in 5.2.1, the resulting tree being walked by handwritten code inserted by GET escapes as described in 5.2.2. The syntax tree of the syntactic and lexic divisions is retained for interpretation by the interpretive parser. The syntax of a case of a transform is given as:

TCASE = $PUT integer e; e:=errors; $
        TFORM, "=>", RHS, $GET **CASEMAIN$ :
RHS   = < (TFORM ! LITERAL ! ESCAPE) * > :
TFORM = NAME, "(", NAME^2, $GET **TFORMMAIN$ ")" :

TCASE gives the syntax of a single transform-case, with two escapes. The first escape has the effect of preserving the number of errors signalled before the start of the parse of the TCASE. The second escape checks if the number of errors by the end of the transform is still the same: if so it outputs a line for the transform fabricator representing the current transform; otherwise it outputs a flag for the transform fabricator warning that there was (at least) one bad transform.

RHS just syntax-accepts a sequence of TFORMs, LITERALS, and

ESCAPES.

TFORM accepts a form NAME(NAME!pattern). The second NAME needs a decoration so that hand-written inserts can distinguish it from the first. The exclamation mark does not appear in the syntax, because if it were accepted normally that would cause the input of the next lexeme according to the lexic rules of the CF1 notation - not of the language being defined. The escape GET **TFORMMAIN therefore starts by ensuring that the current lexeme is an exclamation mark, accepted by the CF1-notation lexic analyser; then, in dismissing that, accepting according to the lexic rules of the defined language. The exclamation mark must be recognisable as a reserved lexeme of the CF1 notation, and so must appear elsewhere in the syntax of the CF1 notation (it does - in the syntax for choices; if it didn't, a dummy rule would have to appear in the syntax rules of the syntax notation:

DUMMY = "!"

to introduce the reserved lexeme). The **TFORMMAIN escape must also recognise closing-bracket ")" as an end-of-pattern mark, and leave the variable whose value indicates the lexic class of the current lexeme indicating a closing-bracket reserved lexeme.

Thus, at the meta-level, the lexic analyser of patterns must recognise NAMEs (standing for instances of the syntactic or lexic rule named), closing-bracket, and the start of a quoted string of characters representing lexemes of the language being defined (here called ana-lexemes). At the ana-level (i.e. accepting ana-lexemes of the defined language) the interpretive lexic analyser must look to the lexic rules and

reserved lexemes of the defined language, and also regard

closing-quote as a signal to return to meta-level lexic

analysis. To avoid problems that could arise if the defined

language has lexic constructs that could include quote

marks, both single and double quote marks are allowed

around a passage at ana-level: such a passage must end with

the same kind of quote-mark as it starts, and inside the

passage the other kind of quote-mark is treated as a normal

character. As an extreme example, if the pattern was to include as a lexeme of a defined language the three characters

    ' " '

then this would need to be given as three separate quoted passages, each containing the other kind of quote:

    "'" '"' "'"

Because of cases such as this, it is necessary to allow a

lexeme of the defined language to be split across two or more

passages of text at ana-level. From the point of view of

legibility, it is fortunate that such cases are fairly rare.

The syntax parts of the interpretive parser are relatively

uncomplicated: the only complications are the need to interpret

the syntax, to be able to accept a name at meta-level in lieu

of an instance of a construct matching the rule of that name

(both lexic and syntactic constructs), and the produced parse

tree which must (in Simula) be from records whose structure

is given at the time of compiling the interpretive parser, and so cannot depend on the defined language. In the following

and so pannot depend on the defined language. In the following examples, a feature in the interpretive parser for listing the parse-trees of the patterns has been turned on. It includes some of the essential syntactic information that will be needed by the transform fabricator: specifically lists and loops (repeats). To keep the trees simple, they omit information on choices and options.

Names representing meta-instances are suffixed by a hash-mark.

After each example the line output for the transform fabricator is shown: this includes all syntactic information. In the output notation an up-arrow precedes a name input at meta-level in a pattern, braces enclose instances of a construct (the opening brace being followed by the name of what it is an instance of), diamond brackets enclose instances of loops, round brackets of choices, and square brackets of options. The opening round bracket of a choice is followed first by a number indicating which choice (in the relevant rule) is being instanced, then a number indicating which branch of the choice is instanced, then the representation of the branch. The opening square bracket of an option is followed by a number indicating which option (in the relevant rule) is being instanced, then the representation of the instance of the body of the option if (and only if) the body is instanced.

For the transform:

OUT(AE!AE4) => OUT(AE4!AE4)

the listing with trees of the patterns is:

    OUT(AE!AE4
         AE-list
          AE4-loop
           AE4#
    ) => OUT(AE4!AE4
          AE4-loop
           AE4#
    )

yielding the following output to the transform fabricator:

    OUT AE {:AE <{:AE4 <^AE4#>}>} => OUT {:AE4 <^AE4#>} :

The tree shows that the AE instance is a list of one item, an AE4-loop: the AE4-loop loops only once, the once round being instanced by a name AE4 at meta-level.

For the transform:

0UT(AE!"IF"BE"THEN"AE4"ELSE"AE) => "IF"0UT(BE!BE) "THEN"0l)T( AE4!AE4)"ELSE"0UT(AE! AE) the listing with trees of the patterns is: birfCAEiTlF,B£,THEN"A64"ELSE«AE

AC-list. "IF" BE* "then" A64-1&0" ! ae4* •else* AE*

) »> •if'outcbeipe

be* )"THEN"0UT(AE41AE4

AE4-1OOP AE4* )"ELSE"OUT(AS IAS AE*

)t yeilding the following output to the transform fabircator:

WouTHPftEnrHre'iF' ? 'then", {T:AE4 > "IF"T'OUT**:BEFRJ "THEN" t OUT CAE4 <-JAE4*> > f "ELS; E"; OUTA:AB* :

({The last represents one line too long to fit across the page, and therefore "wrapped round" onto successive lines.)

The tree above shows an AE instanced by a list of six items: an "IF" reserved lexeme, a BE meta-name, a "THEN" reserved lexeme, an AE4-loop, an "ELSE" reserved lexeme, and an AE meta-name. As in the first example, the AE4-loop is instanced by an AE4 meta-name only, listed identically, but with a vertical line emphasising continuity of the enclosing AE-list. The reserved lexemes are carried across to the transform

fabricator which (strictly) does not need them, but uses

them to improve the quality of error messages and other

listings.

The transform

OUT(AE3!AE3"/"AE2) => "(" OUT(AE3!AE3) "/" OUT(AE2!AE2) ")"

includes an instance of a loop on the repeat AE3 = < AE2 * ("*"!"/") >, and so the listing with trees of the patterns is:

    OUT(AE3!AE3"/"AE2
         AE3-loop
          AE3#
          "/"
          AE2-loop
           AE2#
    ) => "(" OUT(AE3!AE3
          AE3-loop
           AE3#
    ) "/" OUT(AE2!AE2
          AE2-loop
           AE2#
    ) ")" :

yielding the following output to the transform fabricator:

    OUT AE3 {:AE3 <^AE3# * (1 2 "/") ! {:AE2 <^AE2#>}>}
        => "(" OUT {:AE3 <^AE3#>} "/" OUT {:AE2 <^AE2#>} ")" :

The listing shows that the AE3-loop is tantamount to a

list of AE3, "/", and AE2-loop: it does not show what came

to left or right of the asterisk in the syntactic rule. The

output shows that the asterisk of the rule is passed by giving an asterisk, and that the loop is re-started by an exclamation mark.

Note that each output line starts with the name-pair of a transform, so that sorting the output lines of the interpretive parser collects the lines of the cases of a single transform, ready for the transform-fabricator, as described in section 5.2.

5.5 THE TRANSFORM FABRICATOR

The transform-fabricator uses routines produced by the syntax-

analyser fabricator and lexic-analyser fabricator to read a

line of the sorted version of the interpretive-parser output. The notation of this intermediate form is designed so that this parse does not depend in any way on the language being defined:

for example, all reserved lexemes are enclosed in double-quote marks, with any embedded double-quote marks given twice, and the final double-quote mark followed by another character (space, comma, semicolon, end-of-line, etc.).

When a transform is first met, first the prior transform (if

this is not the first) is generated, then a tree of cases is

initialised of the form:

    name of the transform's syntactic class
        |
    the first case of the transform

The top-most node is made to look like the parse-tree of a

transform

X(Y!Y) => :

(that is, with an empty right-hand side), where X(Y..) is

the name of the transform (that is, it is a transform of the

syntactic or lexic rule Y). Thus, comparison of the pattern

in the top-most node with any below it is by the same program

code as the comparison of any two cases input. If at any time

a case with left hand side X(Y!Y) is input, it is placed below

the top node. The top node is thus a device to make insertion

of nodes at the top level subject to the same rules as insertion

at lower levels. The top node is ignored during the output

stage at the end of the processing of the transform. Subsequent cases read are inserted into what becomes a sibling

tree: that is a tree in which each node (except the root) has

either a father or an immediately-elder brother, but not both:

and each node may or may not have a son, and(Independently)

may or may not have an immediately-younger brother. A node

iS.thus the father of its eldest-son node and of its eldest

sons siblings.

    X
    |
    Y1 -- Y2 -- Y3 -- Y4
    |           |     |
    Z11-Z12-Z13    Z31   Z41-Z42

In the above example of a sibling tree, X has four sons Yi, and each Yi except Y2 has son(s) Zij.

In the tree for a transform, the sons of a node all have matching patterns more specific than that of the father, and such that the matching patterns of brothers are pairwise incomparable (that is, neither more specific than the other).

A subroutine in the transform fabricator takes the trees of two cases and examines the patterns, and returns one of three results: the first pattern is more specific than the second, the second is more specific than the first, or they are incomparable. Patterns cannot be equivalent without being identical (within redecoration): in this case an error message is printed which lists the two original transforms, states them to have identical matching patterns, and warns that the latter case will be ignored.

When a new case is to be introduced as a descendant of a node, the immediate sons of the node are examined. If there are no sons, this new case is introduced as the first son. Otherwise, the new case is compared against each case in the list of sons until the end of the list is reached or until a case that is less specific than the new case is found. If a less-specific case is found, the algorithm recurses, making the new case a descendant of the less-specific case. Note that if there is a less-specific case than the new case, there cannot be any brother of the less-specific case that is more specific than the new case, because then (by the transitivity of the specificness relation) there would be brothers of which one is more specific than the other: brothers are incomparable.

If no less-specific case is found, then the brothers are classified into two groups: those incomparable with the new case, and those more specific. The new node is then inserted as a new youngest brother of those to which it is incomparable, with as sons those more specific than it. Against the possibility that there may be cases more specific than two (or more) mutually-incomparable brothers, it is made a descendant of the first of these. For that reason, when a list of brothers is separated into two sublists, the order of the original list must be preserved in the sublists.

The case that all the brothers are more specific than the new case is special, in that the father's pointer to its eldest son must be changed to point to the new case.
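The insertion can be summarised in a few lines (a Python sketch under the assumptions stated in the comments; the identical-pattern error case is omitted, and the fabricator itself is SIMULA):

    # Sketch of inserting a new case into the sibling tree.  compare()
    # is assumed to return FIRST if its first argument's pattern is
    # the more specific, SECOND if the second is, and NEITHER if they
    # are incomparable.
    FIRST, SECOND, NEITHER = 1, 2, 0

    class Case:
        def __init__(self, pattern):
            self.pattern, self.sons = pattern, []

    def insert(parent, new, compare):
        for son in parent.sons:
            if compare(new.pattern, son.pattern) == FIRST:
                insert(son, new, compare)   # a less-specific case: go below it
                return
        keep, below = [], []                # split the brothers, keeping order
        for son in parent.sons:
            if compare(new.pattern, son.pattern) == SECOND:
                below.append(son)           # more specific: becomes a son of new
            else:
                keep.append(son)            # incomparable: stays a brother
        new.sons = below
        parent.sons = keep + [new]          # new is the youngest brother
        # (holding the sons as a list makes the eldest-son special
        # case of the text automatic)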

When the name-pair of a new transform is found, the tree of cases (ignoring the root node) is generated. For each

node, an if-then-else is fabricated, the else being followed immediately by the if of the following brother.

The last else is followed by a begin-end block embodying

the fabrication of the right-hand-side of the parent case.

The condition following the if is the test to be satisfied by an instance of the pattern of the case in addition to those already satisfied by the parent case. This achieves considerable optimisation, but demands either re-discovery of the differences of specificness with the father, or the preservation of the original determination of the differences.

The machine on which the software was implemented was generous with memory, so preservation was selected. Examination of the code reveals that optimisation of the tests between brothers is often also desirable: given syntax rules:

    V = W, (X!Y), Z
    W = < "w" * >
    Z = < "z" * >

then the two matching patterns for a transform of V

    V ! 'wwww' X 'zzzz'
    V ! 'wwww' Y 'zzzz'

are incomparable, and do not at present share their bulky and slow tests for either 'wwww' or 'zzzz'. In practice, examples of this phenomenon seem quite common, but very varied.

The topmost level of brothers demands slightly different treatment in that the parent does not have a right hand side: in this case an error message is fabricated, saying that the transform has been called for an instance not covered by any case.

There is also one even more special possibility at the top

level: that there is only one case below the root, with pattern of form X(Y!Y). In this case, the condition to be satisfied is true, so the if true then ... else error message is suppressed, leaving only the begin ... end body

of the then, starting with the if-then-else groups of the

immediate sons.

Around the top-level fabrication of each transform the heading and footing of a Simula procedure is output, with parameter a pointer to the tree to be transformed. The name of the procedure includes the first name of the transform. Simula has a limited overloading facility: this is used to resolve the ambiguity which results if several transforms have the same first name, without the fabricated procedure-names including both names of the transform.

The fabrication of the right-hand sides is straightforward, except for the created patterns. For a call on a transform, the first name of the pair gives the name of the procedure called: the difficulty lies in the parameter representing the created pattern. This parameter must be a tree of the pattern, according to the record-format of the parse tree created by parsing using the parser fabricated by the lexic-analyser fabricator and syntax-analyser fabricator for the defined language. This is achieved by having a function for each type of record possible in the parse tree (that is, for every lexic and syntactic rule) which takes as parameters the components to go in the records of a node which it builds. These node-faking functions each return as result a pointer to the node faked. Whenever the transform generator outputs a call on such a function, it outputs also a source pre-processor

**CALL to the function. The functions themselves depend only on the original syntactic or lexic rule: they can therefore be fabricated by a separate and much simpler program, the oddments fabricator, as a source library. This has the secondary advantage that if a particular faker is not needed by a set of transforms, it is not **CALLed, and so not included.
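A node-faking function is little more than a constructor call; for a rule such as ADDING_OPERATOR = ( "+" ! "-" ) the fabricated oddment might amount to the following (Python rendering, names invented):

    # Faker for ADDING_OPERATOR = ( "+" ! "-" ): builds the same
    # record that a parse of the construct would have produced.
    class AddingOperatorNode:
        def __init__(self, which):
            self.which = which                  # 1 for "+", 2 for "-"

    def fake_ADDING_OPERATOR(which):
        return AddingOperatorNode(which)        # pointer to the faked node

    # e.g. fake_ADDING_OPERATOR(2) fakes the node for a "-" instance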

The remaining difficulty is in fabricating the parameters giving the values of the components for the nodes to be faked.

For the meaningful components in any particular call, there is no problem. However the branches of a choice which are not instanced, or the body of an option not to be instanced, require layout information not passed from the interpretive parser.

This can also be discovered from the syntax without the transforms, so the oddments fabricator produces them in an organised fashion convenient in the transform fabricator. For each syntactic rule it gives the name of the rule, then the number of options and/or choices in the rule; then left-to-right across the rule in order of the starts of the options and choices. For each it gives the identifying number. On the line after the identifying number of an option, it gives the faked fabrication for the components of an omission of that option. On the lines after the identifying number of a choice, one line per branch of the

choice, it gives in left-to-right order the faked fabrication

for the corresponding branch of the choice, if it is not

the chosen branch.

With this, the problem of the transform-fabricator became

routine, if immensely tedious.

5.6 REVIEW OF THE PRIMARY-CODE GENERATOR FABRICATOR

The present chapter makes light of the task of writing the programs: this took about 2200 hours (i.e. over a man-year)

spread over an elapsed period of 2½ years. The total source

is about 3500 lines: more than an order of magnitude slower

than I have ever worked before. Over 10,000 other lines have been written and discarded, either being superseded by bootstrapping (approx. 2,000 lines) or replaced by simpler, faster, or more compact methods. The product is still big and slow, but more than suffices to meet the objective of creating tools that cut the time to write pilot compilers.

In this sense, the tool is highly successful: re-implementation of a compiler I had previously written, and which had taken

3 man-months, took a week part-time. Inevitably, this used the previous experience, but nonetheless such a gain in human efficiency would repay the investment in only half a dozen such minor compilers.

This first system is far from perfect: for example, most of the programs are far too complex, embodying too much functionality. Many more smaller programs would have been simpler to write: the difficulty would have been transferred

to defining the interfaces between them. In a pilot system,

the requirements on these interfaces would have been most

unclear, implying many errors, and much waste of time replacing

dependent programs. Given the present pilot system, study of

the interfaces inherent within it will facilitate the separation.

If the second version is on a UNIX-like system, the pipe facility

ensures that the inter-program transfers do not cause a loss of

efficiency: alternatively, in a language such as Simula,

co-routining could be used to combine programs that logically

are distinct into what the operating system considers a single

program. Finally: for such a re-implementation, there is now

an ideal tool: the system itself. As discussed in chapter 8,

further utilities to improve CFl-output programs are facilitated

by the CF1 itself.

In the stereotype early parts of orthodox compilers, the tool

is proving a success. In the non-stereotype parts, it is hard to see what could be done, except to continue to write

code by hand and wait for stereotypes to emerge. The direction

then to be taken depends very much on the nature of the

observation.

CHAPTER 6

FABRICATING A SECONDARY CODE-GENERATOR

6.1 INTRODUCTION

Section 2.6.2 developed an external overview for a generalised

method of secondary code-generation. This chapter proposes a

detailed method that conforms to the overview, and an implementation of the key part of the method. The proposal is for

a program that reads a description of a machine code, written

in a machine description language (MDL), and also reads the

description of an IL, given in the form of an MDL description

of the corresponding "IL machine". The method could be used

to produce a secondary code-generator. For experimental

purposes the partial implementation, however, gains some

flexibility by reading the descriptions and then configuring

itself into a secondary code generator: that is, it is an

interpretive secondary code-generator.

To facilitate the explanation of the method, one example has

been chosen that appears to demonstrate most of the important

considerations. This example is presented in section 6.2: it

consists of an ALG0L-60 statement, the stack organisation

chosen for the implementation, descriptions of two rival

"IL machines", and a description of the appropriate instruc-

tions of the Digital Equipment Corporation PDP-8 computer

.(slightly tidied up). ,162

Section 6.3 describes the Machine Description Language (MDL) used in the experiments so far, and rephrases the descriptions of the "IL machines" and the PDP-8 in MDL. Section 6.4 re-interprets the MDL as trees, and hence also both instruction-sequences and individual instructions of both the "IL machines" and the PDP-8. Section 6.7 gives a method for tree-matching from root to leaves and back, to discover best possible machine code instructions of the same algebraic effect as the given IL sequences: this includes register allocation, and an optimisation of the method. The method does not place the instructions in order: techniques for this are given in 6.8.

Sections 6.5 and 6.6 provide concepts that prepare for 6.7 and 6.8. Section 6.5 shows the relationship between the production of the trees and symbolic execution: the instruction-choice method could correspond to an inverse of symbolic execution, in the form of a parse using an ambiguous grammar. This produces some useful insights. Section 6.6 uses criteria arising from 6.4 and 6.5 to eliminate one of the rival styles of IL.

An interpretive implementation of the key parts of the method (subsections 6.7.1 to 6.7.3) is described in chapter 7. This pilot implementation will be used to gain practical experience on which to base a later attempt at a fabricator of optimised secondary code-generators.

6.2 THE EXAMPLE

Subsection 6.2.1 describes a hypothetical implementor's choice of how he wants to organise the "memory architecture" of the ALGOL-60-like language he is implementing. From the point of view of secondary code-generation, this architecture is visible only in the IL definition (1). In the translated target-machine instructions it remains only in its influence on the instructions chosen.

Subsection 6.2.2 presents two rival styles of IL in sufficient

detail for the example, and subsection 6.2.3 presents PDP-8

instructions sufficient for a translation of the example.

6.2.1 The example instruction and its IL memory architecture

The example is the addition of the contents of two variables,

and the assignment of the result to a third variable:

I := J + K ;

The variables are assumed to be implemented in a framed stack, as follows. J is a static global variable (i.e. has a fixed address j). I is in the current "frame" of the stack (2). The base of that frame is pointed to by the contents of a pointer in a static location named F, and I is at a constant offset i from the base of the frame. The address of F is f. K is in the currentmost-but-one frame, with offset k from the base of that frame. The base location of each frame (except the outermost frame) contains a pointer to the base of the next-outermost frame. (See figure 6-1 on the next page.)

(1) The fiction of a memory architecture for the language is much more important in the primary code-generation than in the secondary code-generation.
(2) Often an implementor will choose that a frame of locations corresponds to the variables of a block, or of a procedure invocation.

Figure 6-1
The memory organisation for the example I:=J+K;

[Diagram: the static (global) locations, including J (at address j) and F (at address f), and the stack frame locations: the current frame containing I at offset i, the currentmost-but-one frame containing K at offset k, and the currentmost-but-two frame; each frame base holds the address of the next-outermost frame base.]

The address calculations of the locations are therefore as follows.

J and F are at addresses j and f, respectively. i plus the contents of F is the address of I. k plus the contents of the location whose address is the contents of F is the address of K.

6.2.2 The example ILs.

A fundamental choice to be made in designing an IL is how each intermediate result is transferred from its originating instruction to its using instruction(s). This can be either explicitly, in a named register, or implicitly, in an anonymous location on a stack of such temporary intermediate results (3). To study the relative merits of these two styles of IL, the following two subsections propose a "Temporary-stack IL" and a "Register IL". Section 6.6 compares them, from the points of view of primary code generation, secondary code generation, and optimisation (that is, whether the choice of style of IL circumscribes the machine code sequences eventually produced).

6.2.2.1 A temporary-stack IL

The "temporary-stack IL machine" has at least the instructions given below. Each instruction is written as a mnemonic followed by an optional integer address or integer address-offset. The "machine" has a sequentially addressed memory, and knows about the framed stack only in that F (or f) can be an implicit operand of an instruction.

(3) To prevent confusion between the framed stack of 6.2.1 and this stack of temporary intermediate results, they will be called the "framed stack" and the "temporary stack".

It also has a stack of intermediate temporary results. Taking a value from a location at the top of this temporary stack also deletes the location from the stack. Putting a result onto the temporary stack reveals a new location to contain the result.

Instructions

LG m  (Load global)    The contents of the global static location of
                       address m are put onto the stack. (I.e. the
                       value ~m, where ~ denotes "contents of".)

LF m  (Load framed)    The contents of the location of address m+F
                       are put onto the stack. (I.e. the value
                       ~(m+~f), the contents of the m-th location of
                       the current frame.)

LI m  (Load indirect)  The contents of the top location of the stack
                       are added to m, and the result used as the
                       address of a location whose contents are put
                       on the top of the stack. (I.e. the value
                       ~(m+~§), where § denotes "the address of the
                       top location of the temporary stack".)

ADD                    The contents of the top two locations of the
                       stack are added, and the result put back onto
                       the stack. (I.e. the result put back is
                       (~§)+(~§), where the §'s denote two different
                       addresses. The result arrives in the location
                       of the same address as the first §.)

SF m  (Store framed)   The contents of the top location of the
                       temporary stack are stored into the location
                       of address m+F.

The example instruction I:=J+K; of section 6.2.1 can be translated into this version of IL as the following sequences:

    LG j                                  LF 0
    LF 0                                  LI k
    LI k    or, using the                 LG j
    ADD     commutativity of ADD, as:     ADD
    SF i                                  SF i

The major advantage of this kind of IL is that the primary code generator needs no complications to administer it, in the sense that if the primary code generator is correct then each produced result will be used exactly once, and the primary code generator does not need to keep track of the "stack pointer". The apparent disadvantage is that the secondary code generator may have to do some work to discover the implicit administrative information.
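To make these definitions concrete, the following is a minimal sketch of an interpreter for the five temporary-stack IL instructions. It is my illustration, not part of the thesis's implementation, and is in modern Python rather than the SIMULA used for the outlines later in this chapter; the names (run, mem) and the example addresses are assumptions.

    def run(program, mem, f):
        # Interpret a temporary-stack IL sequence.
        # mem maps address -> value; f is the address of the frame pointer F.
        temp = []                                # the temporary stack
        for op, *args in program:
            m = args[0] if args else None
            if op == "LG":                       # push ~m
                temp.append(mem[m])
            elif op == "LF":                     # push ~(m+~f)
                temp.append(mem[m + mem[f]])
            elif op == "LI":                     # push ~(m+~s), deleting the top cell
                temp.append(mem[m + temp.pop()])
            elif op == "ADD":                    # pop two values, push their sum
                temp.append(temp.pop() + temp.pop())
            elif op == "SF":                     # pop a value, store at m+~f
                mem[m + mem[f]] = temp.pop()
        assert not temp, "a correct sequence leaves the temporary stack empty"

    # The layout of 6.2.1, with arbitrarily chosen addresses: J global at j;
    # I at offset i in the current frame; K at offset k in the but-one frame;
    # each frame base holds the address of the next-outermost frame base.
    j, f, i, k = 100, 101, 2, 3
    mem = {j: 7, f: 200, 200: 300, 300 + k: 5}
    run([("LG", j), ("LF", 0), ("LI", k), ("ADD",), ("SF", i)], mem, f)
    assert mem[200 + i] == 7 + 5                 # I = J + K

Running the commuted sequence against the same initial memory gives the same final memory, which is the sense in which the two sequences are rivals.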

6.2.2.2 An infinitely-registered IL

Section 6.6.1 shows that for an IL which is register-based, there are no advantages in limiting the number of registers. This example IL is therefore designed on the presumption that whenever the primary code-generator needs a "new" register, it can allocate one, without restriction.

The "register IL machine" has at least the instructions given below. Each instruction is written as a mnemonic, followed by the name of a register, followed by either another name of a register, or by an integer address, or by an integer address-offset. The "machine" has a sequentially addressed memory, and knows about the framed stack only in that F (or f) can be an implicit operand of an instruction. It also has an unlimited number of registers, whose names are denoted by capital letters late in the alphabet. The letters R and S are used for generic names of registers.

Instructions

LG R,m   (Load global)    The contents of the location of address m
                          are put into the register R.

LF R,m   (Load framed)    The contents of the location of address m+F
                          are put into the register R.

LI R,m   (Load indirect)  The contents of the location of address m+R
                          are put into the register R.

ADD R,S  (Add)            The contents of R and S are added: the
                          result is put into register R.

SF R,m   (Store framed)   The contents of register R are stored into
                          the location of address m+F.

The example instruction I:=J+K; of section 6.2.1 would, at first glance, appear to translate into this version of IL as the following sequences:

    LG R,j                                  LF R,0
    LF S,0    or, using the                 LI R,k
    LI S,k    commutativity of ADD, as:     LG S,j
    ADD R,S                                 ADD R,S
    SF R,i                                  SF R,i

There is, however, a problem with this style of IL. Either the definition of each IL instruction has built into it that the register(s) are to be allocated early in the instruction, or deallocated late in the instruction, or both; or alternatively an extra IL instruction needs to be defined to deallocate a register. (Allocation is simpler: the secondary code generator could automatically allocate a register encountered when it is not currently allocated.) The explicit de-allocation has the advantage that a result can then be used repeatedly (because, for example, it contains the result of a common sub-expression). In either case, both the primary and secondary code generators must keep track of what registers are currently allocated in the IL stream.

On the convention that every register must be allocated before use with an IL instruction of the form:

    A R

(allocate R), and that every register must be deallocated after use with an IL instruction of the form:

    D R

(deallocate R), the example I:=J+K; will be translated into the following IL sequences:

    A R                                     A R
    LG R,j                                  LF R,0
    A S       or, using the                 LI R,k
    LF S,0    commutativity of ADD, as:     A S
    LI S,k                                  LG S,j
    ADD R,S                                 ADD R,S
    D S                                     D S
    SF R,i                                  SF R,i
    D R                                     D R

The advantage of this style of IL is that all the information the secondary code-generator needs is there, explicitly.
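Because the information is explicit, simple checks become possible. The following is a minimal sketch of such a checker, in Python (my illustration, under the assumption that register names are single upper-case letters): it verifies the A/D discipline just described, that every register is allocated before use and deallocated exactly once.

    def check_allocation(sequence):
        live = set()                            # currently allocated registers
        for instr in sequence:
            op = instr[0]
            regs = [a for a in instr[1:] if isinstance(a, str) and a.isupper()]
            if op == "A":
                assert regs[0] not in live, "double allocation"
                live.add(regs[0])
            elif op == "D":
                assert regs[0] in live, "deallocating a dead register"
                live.remove(regs[0])
            else:
                assert all(r in live for r in regs), f"unallocated register in {instr}"
        assert not live, "registers left allocated at the end"

    check_allocation([("A", "R"), ("LG", "R", "j"), ("A", "S"),
                      ("LF", "S", "0"), ("LI", "S", "k"), ("ADD", "R", "S"),
                      ("D", "S"), ("SF", "R", "i"), ("D", "R")])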

6.2.3. The PDP-8.

The following is a slightly modified description of three instructions of the PDP-8 computer (4). The PDP-8 has a sequentially addressed main memory. It also has an accumulator, which is not part of the sequential memory. Each of the three instructions acts on the implicitly-addressed accumulator, and on an explicitly-addressed location of the main memory. Each instruction comes in two forms, direct and indirect. Each instruction has a three-letter mnemonic to which the letter D is appended for the direct form, and to which the letter I is appended for the indirect form.

A direct-form instruction includes the integer address of the main memory location to be accessed. An indirect-form instruction includes the integer address of a main memory location which, in turn, contains the address of the main memory location to be accessed.

The description below is of a PDP-8 as seen using a "clever" assembler. LOAD instructions usually follow STORE instructions: the real PDP-8 STORE instruction also zeros the accumulator, so the assembler can implement LOAD as an ADD instruction if the preceding instruction was a STORE. The real PDP-8 does not have a LOAD instruction (5). It is assumed in this thesis that the output of secondary code-generators may need to be assembled in such ways if it much simplifies the description of the apparent target machine.

(4) The described LOAD instruction would only be used in contexts where it could be implemented as the TAD of the actual PDP-8. Similarly, STORE would only be implemented as the DCA instruction of the actual PDP-8.
(5) Thus, the following description is in fact of two of the real eight instructions of the PDP-8. The "clever assembler" assumption has also been used here to hide complications in the PDP-8 memory architecture, due to its short wordlength (12 bits).

Instructions

LOADD m   Load to the accumulator the contents of the location of
          which the address is m.
LOADI m   Load to the accumulator the contents of the location of
          which the address is in the location of address m.
STORED m  Store from the accumulator to the location of which the
          address is m.
STOREI m  Store from the accumulator to the location of which the
          address is the contents of the location of address m.
ADDD m    Add to the contents of the accumulator the contents of the
          location of which the address is m.
ADDI m    Add to the contents of the accumulator the contents of the
          location of which the address is the contents of the
          location of address m.

The example I:=J+K; can be translated to the above three instructions in two distinctly different ways: each demands locations to contain temporary results (of addresses t and u):

    LOADD  =k        LOADD  f
    ADDI   f         ADDD   =i
    STORED t         STORED u
    LOADD  f         LOADD  =k
    ADDD   =i        ADDI   f
    STORED u         STORED t
    LOADD  j         LOADD  j
    ADDI   t         ADDI   t
    STOREI u         STOREI u

In each sequence, t contains the address of K, and u contains the address of I. The PDP-8 notation "=x" means "the address of a location containing x": that is, "=" turns its operand into a literal. Note that because of the commutativity of addition, any PDP-8 LOADD/ADDD sequence can have its operands reversed. Similar reversals apply even if one of the LOAD or the ADD is indirect - in this case the indirection is moved with the operand:

    LOADD x                         LOADI y
    ADDI  y    is equivalent to     ADDD  x

With these reversals there are 16 sequences of nine PDP-8 instructions for I:=J+K;. These are in fact the shortest possible sequences, even if other PDP-8 instructions than LOAD, STORE, and ADD are permitted.
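As a check on the claimed translations, here is a minimal sketch of a simulator for the six tidied instructions. It is a modern Python illustration of mine, not the thesis's tooling: "=x" literals are mapped to fresh pool addresses (an assumption about how an assembler might place them), and one of the nine-instruction sequences is run against the memory layout of 6.2.1.

    def simulate(program, mem):
        pool = {}                                 # literal value -> pool address
        def addr(a):
            if isinstance(a, str) and a.startswith("="):
                v = int(a[1:])
                p = pool.setdefault(v, 1000 + len(pool))
                mem[p] = v                        # "=x": a location containing x
                return p
            return a
        ac = 0                                    # the accumulator
        for op, a in program:
            m = addr(a)
            if op == "LOADD":    ac = mem[m]
            elif op == "LOADI":  ac = mem[mem[m]]
            elif op == "STORED": mem[m] = ac
            elif op == "STOREI": mem[mem[m]] = ac
            elif op == "ADDD":   ac = ac + mem[m]
            elif op == "ADDI":   ac = ac + mem[mem[m]]

    j, f, i, k, t, u = 100, 101, 2, 3, 110, 111   # arbitrary addresses
    mem = {j: 7, f: 200, 200: 300, 300 + k: 5}
    simulate([("LOADD", "=" + str(k)), ("ADDI", f), ("STORED", t),
              ("LOADD", f), ("ADDD", "=" + str(i)), ("STORED", u),
              ("LOADD", j), ("ADDI", t), ("STOREI", u)], mem)
    assert mem[200 + i] == 7 + 5                  # I = J + K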

6.3 MACHINE DESCRIPTION LANGUAGE (MDL)

The following describes the machine description language used in the research. It is fairly arbitrary, but fairly obvious until one starts imposing sophisticated requirements on it (such as indicating the length of values that operands represent - see the last page of section 3.2).

A single MDL instruction is a sequence of operands, monadic operators, and dyadic operators. An operand is an integer number (or a bit pattern): it may be interpreted as itself or as the address of a location. In machine models which have locations outside the normal addressable main memory, the special locations need special corresponding non-integer operands as fictional addresses: such special operands will be introduced as needed for the stack-IL, register-IL, and PDP-8.

For the I:=J+K; example we need two dyadic operators and one monadic operator.

Operator  Adicity   Left Operand    Right Operand    Result

~         Monadic   (None)          An address       The contents of the
                                                     location of that
                                                     address.
+         Dyadic    A value         A value          The sum of the
                                                     operands.
->        Dyadic    A value         An address to    Void.
                                    which the value
                                    is to be
                                    assigned.

-> is called the assignment operator. ~ is called the indirection operator. + is called the addition operator. For examples in the text of this thesis, to avoid excessive use of parentheses, ~ will be of highest priority, then +, and finally -> will be of lowest priority. (In the implementation, all dyadic expressions must be fully parenthesised.)

An MDL instruction is an MDL expression with a void result. Thus the principal operator of an MDL instruction must be the assignment operator: and the assignment operator may have no other use.

The distinction between a value operand and an address operand is merely how it is used. That is, a primitive operand (such as an integer), the result of an indirection, or the result of an addition may each be used as a value or as an address. The only exception is that the "fictional addresses" may not be used as an operand of an addition or of an indirection, or as the left operand of an assignment.
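As a concrete rendering of these rules, the following minimal sketch (in Python, my own; the grammar shown in the comment is an assumption about the fully-parenthesised implementation form, not a quotation of it) parses an MDL instruction into a tree. The example instruction is the MDL that 6.3.2 gives for the SF instruction.

    import re

    def tokenize(s):
        # "->" first, then the single-character operators, then operands
        return re.findall(r"->|[~+()]|\w+|§", s)

    def parse(tokens):
        # expr ::= operand | "~" expr | "(" expr ("+" | "->") expr ")"
        t = tokens.pop(0)
        if t == "~":                          # monadic indirection
            return ("~", parse(tokens))
        if t == "(":                          # parenthesised dyadic expression
            left = parse(tokens)
            op = tokens.pop(0)                # "+" or "->"
            right = parse(tokens)
            assert tokens.pop(0) == ")"
            return (op, left, right)
        return ("operand", t)                 # an integer or a fictional address

    tree = parse(tokenize("(~§ -> (m + ~f))"))
    assert tree == ("->", ("~", ("operand", "§")),
                          ("+", ("operand", "m"), ("~", ("operand", "f"))))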

6.3.1 MDL summary of the addresses in the I:=J+K; instruction

Using the implementation of the framed stack presented in section

6.2.1, the addresses of F, I, J, K are calculated as follows.

    Location    Address

    F           f
    I           i+~f
    J           j
    K           k+~~f

6.3.2 MDL definition of the Temporary-stack IL

Using the sign "§" as a fictional address for "the top of the temporary stack", the instructions of the temporary-stack IL machine can be defined as below. We need the convention that whenever "§" is assigned to, it is first altered to point to a "new" cell; and on any other use, after the value in the cell to which it points has been used and the cell deleted, "§" is reset to point to the "newest" of the remaining cells. This convention could be built explicitly into the MDL definition of each IL instruction, for example by defining LG m with the pair of MDL instructions:

    §+1 -> (the address of §)
    ~m  -> §

Section 6.6 shows that this explicitness (and the meta-fiction "the address of §") is not needed, so the definition below uses the implicit convention.

The MDL definition of the instructions

    LG m     ~m -> §
    LF m     ~(m+~f) -> §
    LI m     ~(m+~§) -> §
    ADD      (~§)+(~§) -> §
    SF m     ~§ -> (m+~f)

6.3.3. MDL definition of the register-IL

Recall that each register has as name an upper-case letter. Using the corresponding lower-case letter as a fictional address for the register, the instructions of the register-IL machine can be defined as below.

The MDL definition of the instructions

    LG R,m      ~m -> r
    LF R,m      ~(m+~f) -> r
    LI R,m      ~(m+~r) -> r
    ADD R,S     ~r+~s -> r
    SF R,m      ~r -> (m+~f)

If the IL is described to include explicit or implicit A (allocate register) and D (deallocate register) instructions then this requires extensions to the MDL.

6.3.4 MDL definition of the PDP-8

There are two styles of definition for the three instructions and two addressing modes of the PDP-8 as presented in section 6.2.3:

Exhaustive:   to treat an instruction as indivisible, and so to
              define 6 distinct instructions.

Factorised:   to identify "common factors" across the instructions:
              that is, three "functions" (LOAD, STORE, ADD) and two
              "address modes" (D and I) for the PDP-8.

A definition in each of the styles is given below. In both styles

a fake address "ac" is used for the accumulator. In the factorised

style a fake address "ma" is used for the "memory address register"

whose contents are passed from the "address mode" micro-instruction

to the "function" micro-instruction.

Exhaustive MDL definition of the PDP-8 instructions

    LOADD m     ~m -> ac
    LOADI m     ~~m -> ac
    STORED m    ~ac -> m
    STOREI m    ~ac -> ~m
    ADDD m      ~m + ~ac -> ac
    ADDI m      ~~m + ~ac -> ac

Factorised MDL definition of the PDP-8 instructions

    D m         m -> ma            )  Address-mode
    I m         ~m -> ma           )  micro-instructions

    LOAD        ~~ma -> ac         )
    STORE       ~ac -> ~ma         )  Function micro-instructions
    ADD         ~~ma + ~ac -> ac   )

The obvious disadvantage of the factorised definition is that instead

of the instruction (eg):

STOREI m

we will need the micro-instruction sequence:

    I m
    STORE

The advantage becomes more obvious for a larger instruction set: for example the PDP-10 model KI has 396 functions and 4 common addressing modes: so the exhaustive description would need 1584 "full instructions", but the factorised description would need only 400 micro-instructions (6). Clearly the factorised definition is

simpler by virtue of being much shorter, and in addition makes the

regularity of the instruction set much more obvious.

(6) For the KI-10 the situation is in fact both better and worse than this suggests. BETTER: in the same way that full instructions can be factorised into function and memory address calculations, because of other regularities in the set of KI-10 functions, they can be factorised further into sequences of up to 4 distinct sub-function micro-instructions. One sequence 4 long is: move from right half-word ... to left half-word ... zeroing the other half-word ... from register to memory. There are 142 sub-function microcodes (obviously not all sequences of one, two, three, and four sub-functions are allowed). From 400 micro-instructions to 146 reduces the size of the definition by nearly two thirds. WORSE: the common address modes have a natural two-by-two factorisation: index or not? ... then indirect or not? However, when indirection is selected it is possible to iterate on these two micro-instructions to get the full range of possible memory address calculations. This range would be infinite were it not for limitations on machine memory size, and hence on the number of locations via which indirection can take place (the hardware detects any cyclic indirection, and invokes the operating system to abort the program). The memory address calculation stops when an iteration does not indirect. For example, if four registers were set up containing the indices of an element in a four-dimensional array, then a sequence of the following form will locate the element (without bounds-checks): index on the first register ... then indirect ... then index on the second register ... then indirect ... then index on the third register ... then indirect ... then index on the fourth register ... then don't indirect - provided the intermediate locations of the indirection are appropriately set up. This cannot be exhaustively described, but is still only two-by-two in the factorised form. The factorised form, however, now has to be self-referential.

There are also practical advantages: the description is easier to write and input correctly. We shall also see that it will lead to much faster code-generation without needing to resort to complex indexing schemes for the instructions - it is in a sense a natural two-dimensional indexing scheme.
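The "two-dimensional" nature can be seen mechanically: the exhaustive definitions are recoverable from the factorised ones by grafting at the ma leaf, which on MDL text is just a substitution for ~ma. A minimal Python sketch of mine (the factorised MDL strings are those given in 6.3.4):

    address_modes = {"D": "m",                 # D m:  m -> ma
                     "I": "~m"}                # I m: ~m -> ma
    functions = {"LOAD":  "~~ma -> ac",
                 "STORE": "~ac -> ~ma",
                 "ADD":   "~~ma + ~ac -> ac"}

    for fn, mdl in functions.items():
        for mode, address in address_modes.items():
            # grafting the address-mode tree at the ma leaf is, textually,
            # the substitution of the address expression for ~ma
            print(f"{fn}{mode} m    {mdl.replace('~ma', address)}")
    # prints the six exhaustive definitions, e.g.
    #   LOADD m    ~m -> ac
    #   STOREI m   ~ac -> ~m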

6.4 INSTRUCTIONS AND INSTRUCTION-SEQUENCES AS TREES

The IL-to-machine code translation methods described in section 6.7 were ultimately inspired by the observation that the symbolic evaluation of the MDL of two instruction-sequences yielded the identical MDL instruction. Thus, given an IL sequence, it could be translated to machine code by symbolic evaluation of the MDL of the sequence, then by finding a machine code sequence with the same symbolic evaluation of its MDL. As MDL text the evaluation process is tedious and machine-inefficient, especially the "finding" stage. The same methods on the parse-tree of the MDL are easier to describe, and both easier and more efficient to implement. On trying to optimise the method, the tree form of the method is intuitively clear, whereas the textual form becomes much less clear.

In this "parsing" approach, it turns out that the assignment operator has a double role. The first is as a normal operator: in that context it is the root of the tree in which it appears. The second applies only to the MDL of a micro-instruction of the definition: interpreting the assignment operator as the meta-syntactic "is defined as" mark, and the definition of the micro-instruction as a syntactic production of the grammar defining the parse-tree of the symbolic evaluation. This second role cannot apply if the right operand of the assignment operator is other than a simple address. An example of both roles is the PDP-8 instruction STORED, with non-factorised MDL ~ac -> m: the two parse trees representing this are:

    ac — ~ ———\
               ->          and     ac — ~ —— -> ——(m)
    m ————————/

    "Normal role"                  "Meta-syntactic role"

In the "meta-syntactic role" tree the bracketing of the "m" indicates that at this point the tree can be grafted to the "m" leaf of another tree: see the example below. An example of a PDP-8 instruction with a non-simple address is STOREI, with non-factorised MDL ~ac -> ~m: the one parse tree representing this is:

    ac — ~ ———\
               ->
    m — ~ ————/

    "Normal role"

An example of a grafting is the tree equivalent to the sequence:

    Load something to the PDP-8 accumulator
    STORED m
    Load something else to the accumulator
    STOREI m

where m is the address of a "temporary" memory location. (The effect of the sequence is to store the "something else" value to the "something" address.) The grafting is of the "meta-syntactic role" tree of the STORED to the m-leaf of the tree of the STOREI:

    ... ———————————————————\
                            ->
    ... —— -> ——(m)————————/

In this tree the bracketed names embedded in a branch indicate that the branch represents an intermediate result of the sequence in that address. The ellipses denote the "somethings" loaded to the accumulator.

The interpretation of a branch as a temporary result gives the branch a natural orientation: the direction in which the data flows. This "data flow" interpretation is very natural and intuitive (although it does not seem to have been used for parse trees or expression evaluation trees before). All trees in this thesis have their root on the right, so that one reads the data-flow left-to-right (i.e. according to the conditioning imposed by normal English text). The "address" operand of "normal role" assignment operators has arbitrarily been made the lower branch into the node of the operator.

Note that not all trees "make sense": for example, the grafted tree of

    STORED m
    STORED m

(where the two m-addresses are the same) reflects two STORED instructions from the accumulator to a temporary result that is then never used. That one can graft silly trees is a consequence of being able to write silly instruction-sequences: assuming no primary code-generator will output a silly IL sequence, silly tree-grafts are not a problem in practice.

The following four sub-sections introduce the varieties of tree of the example ILs, and in the process distinguish two kinds of tree for instruction sequences. The final two sub-sections in this section introduce the trees for the example PDP-8 instructions and

sequences.

6.4.1 The gross tree for an IL sequence.

The dataflow analysis of an instruction sequence imposes a tree structure. Multiple use of an intermediate value is reflected as an inter-graft (also called a union). To introduce a cycle into the data-flow graph requires a loop in the instructions.

In section 6.4.4 a second tree for an instruction sequence is introduced, more detailed than the data-flow tree, and with at least the same topology. To make the distinction between these related trees, the data-flow tree will be called the GROSS TREE of the sequence, the more detailed one will be called the FINE

TREE of the sequence.

For the example instruction I:=J+K; the stack-IL sequences and their corresponding gross trees are shown below.

    LG j                      LF 0
    LF 0                      LI k
    LI k                      LG j
    ADD                       ADD
    SF i                      SF i

    LG j ————————————\
                      ADD —— SF i
    LF 0 —— LI k ————/

    LF 0 —— LI k ————\
                      ADD —— SF i
    LG j ————————————/

The two different instruction orderings look less related than their trees, which show more clearly that the difference is only a reversal of the operands of the ADD.

Very often the gross tree of an IL instruction sequence will be very similar to the checked, decorated, and edited parse-tree created during primary code-generation. In this sense the IL can be thought of as just a linear tree-notation which facilitates the communication between the primary and secondary code generators. Intermediate IL-to-IL optimisations may change the IL, so that the gross trees of the IL before and after optimisation differ: such an optimisation can be thought of as a tree transformer.

Long instruction sequences may contain several sub-sequences,

each sub-sequence producing and consuming its own results and

not dependent on the intermediates of the other sequences. Such

a sequence generates not a single tree, but a sequence of trees.

Less obvious data dependencies may apply within this sequence

of trees: for example in the IL corresponding to an ALGOL 60

sequence of form:-

    X(I) := ... ;
    ... ;
    ... X(J) ... ;

In such a sequence, whether the X(J) depends on the X(I) in turn

depends on whether J may have the same value as the I (at the

appropriate times). If I and J can be shown un-equal, then there

is no dependency. If they can be shown equal, then there is

dependency. If nothing can be shown (7) then the worst assump-

tion must be made, i.e. that there is dependency. In the

(7) In the general case it is not possible to make such proofs. Even a "clever" compiler may be unable to prove cases that seem trivial to a human.

absence of dependency, the original sequencing can be ignored (8).

The gross tree of a register-IL sequence will also include nodes

for "A" (allocate) and "D" (deallocate) in a manner corresponding

to their use in the IL sequence. Figure 6-2 (on the next page)

repeats one of the register-IL sequences for I:=J+K; and presents

a matching gross tree. Because a "deallocate" instruction has no

result it must be a root: the sequence needs two deallocates and

so has two roots. This is an organisation difficult to manipulate

by computer. A less problematic organisation is alternatively to allow "deallocate" nodes to have extra "in" branches matched by identical "out" branches: the central portion of figure 6-2 would then be replaced by the following.

    ADD R,S ——(r)———\
            ——(s)————D S ——(r)—— ...

This avoids the difficulty of multiple roots, but creates the difficulty that "deallocate" nodes can be of two types (like "assign" nodes), and leaves us with the "ADD" node still having two "out" branches.

(8) This sequencing could be represented in the gross tree/dataflow tree form by introducing a second kind of tree-node, a "then" node, whose operands are the trees of the sub-sequences:

    first subtree ———\
                      then ——\
    second subtree ——/        then ——\
    third subtree ———————————/        then
    last subtree ————————————————————/

In such trees the implications of non-dependencies between specific subtrees is more complex: this is discussed in section 6.8. One cannot simply introduce a commutative "either-order" tree-node.

Figure 6-2

A register-IL sequence for I:=J+K; and its gross tree

    A R
    LG R,j
    A S
    LF S,0
    LI S,k
    ADD R,S
    D S
    SF R,i
    D R

[Diagram: the matching gross tree, with two roots at the D S and D R nodes.]

As remarked in 6.2.2.2, we can avoid the "allocate" instructions, but we cannot easily avoid the "deallocate" instructions. Consequently we can avoid the (harmless) allocate nodes in the tree, but not the problematic deallocate nodes. Thus the register-IL causes not only the problems in the primary code generator of generating the deallocate IL instructions, but also that the "deallocate" instruction must be "built in" to the secondary code generator because of its special properties (9). This is a crucial fact for the argument of section 6.6.

6.4.2 The trees for temporary-stack IL instructions

They are simply the trees for the data-flow within the MDL definitions of the instructions. For each instruction there can be two possible trees, representing the "normal role" and "meta-syntactic role" of the assignment operator. However, the semantics of the "§" fake address guarantee that, if the address assigned-to is §, then the "normal role" is meaningless. Of the five instructions, the only one of which "§" is not the target address is SF, which has a non-simple target address, with the result that it can have a "normal role" only. Thus LG, LF, LI, ADD (producing intermediate results) have "meta-syntactic role" trees only; SF (not producing an intermediate result) has a "normal role" tree only.

(9) The "assignment operator" also needs building in to the secondary code generator because of its special properties: but we shall see that it has useful properties that facilitate valuable heuristics. The "deallocation operator" does not appear to have useful properties in the IL (though it does in chosen machine instruction-sequences, as an aid to register assignment).

Instruction    "Normal role" tree       "Meta-syntactic role" tree

LG m                                    m —— ~ —— -> ——(§)

LF m                                    m ———————\
                                                  + —— ~ —— -> ——(§)
                                        f —— ~ ——/

LI m                                    m ———————\
                                                  + —— ~ —— -> ——(§)
                                        § —— ~ ——/

ADD                                     § —— ~ ———\
                                                   + —— -> ——(§)
                                        § —— ~ ———/

SF m           § —— ~ ——————————\
                                 ->
               m ———————\       /
                         + —————
               f —— ~ ——/

6.4.3 The trees for register-IL instructions

These instructions present a different problem in drawing their trees: there must be a new kind of branch denoting allocation without use. This is represented in the diagrams as a dotted line. In the LG and LF instructions this is an extra (unused) in-branch to the assignment operator: in the ADD and SF instructions it is an extra (unused) out-branch from the assignment operator (10). As in the temporary-stack IL, LG, LF, LI produce results and so have "meta-syntactic" trees only; but SF has a "normal" role tree but also a result that can be optionally re-used, and so also needs a "meta-syntactic" tree. ADD has a definite result and also an optional result, and so has two trees: their classification as meta-syntactic and doubly meta-syntactic stretches the classification somewhat.

(10) This rather different notation of allocation without use than in 6.4.1 has been chosen so that no false similarity influences the discussion of the "multi-root" versus "multi-branch" problem in 6.6.

Instruction    Normal tree                Meta-syntactic tree

LG R,m                                    m —— ~ ————————\
                                          ...(r).........-> ——(r)

LF R,m                                    m ———————\
                                                    + —— ~ ———\
                                          f —— ~ ——/           -> ——(r)
                                          ...(r)............../

LI R,m                                    m ———————\
                                                    + —— ~ ———\
                                          r —— ~ ——/           -> ——(r)

SF R,m         § see below                (as the normal tree, but with
               r —— ~ ——————————\          the out-branch (r) solid and
               m ———————\        ->...(r)  available for grafting)
                         + ——————/
               f —— ~ ——/

Instruction    Meta-syntactic tree        Doubly-meta-syntactic tree

ADD R,S        r —— ~ ———\                r —— ~ ———\
                          + —— -> ——(r)              + —— -> ——(r)
               s —— ~ ———/        ...(s)  s —— ~ ———/        ——(s)

Finally, there are the A (allocate) and D (deallocate) instructions of the IL machine to be described. If they are needed as trees, then the trees are as below. If allocate is in-built in the instructions that use an allocated register but not its value (i.e. LG and LF), then the dotted lines in their trees are just deleted. The classification of the trees of A and D is even more obscure than for ADD, and has not been attempted here.

    Instruction    Tree        Comment

    A R            ...(r)      Register comes from a dark and murky
                               hole.
    D R            (r)...      Register returns to its hole.

6.4.4 The fine tree for an IL sequence

As stated in 6.4.1, the fine tree for an IL sequence is just the replacement in the gross tree of the instructions by the trees of the instructions, with the in-branches and out-branches of the replacements grafted in.

There is a subtlety in this grafting. All the out-branches carry the address in which the result is put (even if it is a "fake" address like "§"), whereas all the in-branches demand a contents, signified by an indirection operator (e.g. "§ — ~ —" representing an in-branch). In the grafting we must prune out the indirection operator. Using a temporary-stack example:

the indirection operator. Using a temporary-stack example:

    Before grafting:    —— -> ——(§)      § ——(~)——

    After grafting:     —— -> ——(§)——————

Essentially the "before grafting" diagram illustrates a result

put in an address §, and taking an address and indirecting to

get a result. The "after grafting" diagram shows a result

transmitted from one instruction to another, and indicates the

address in which it is transmitted.
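The pruning can be made mechanical. The following toy sketch (in Python; the tuple encoding of trees is my assumption, not the implementation's) grafts the tree of LG j into the §-in-branch of the tree of SF i, pruning the indirection operator as described. A real stack-IL grafter would consume the § occurrences in stack order, one producer per in-branch, rather than matching them all at once.

    def graft(consumer, producer, out_addr):
        # replace an in-branch ("~", ("operand", out_addr)) of the
        # consumer by the producer tree, pruning the "~"
        if consumer == ("~", ("operand", out_addr)):
            return producer
        if consumer[0] == "operand":
            return consumer
        return (consumer[0],) + tuple(graft(c, producer, out_addr)
                                      for c in consumer[1:])

    lg_j = ("~", ("operand", "j"))                  # tree of LG j, result at §
    sf_i = ("->", ("~", ("operand", "§")),          # tree of SF i
                  ("+", ("operand", "i"), ("~", ("operand", "f"))))
    fine = graft(sf_i, lg_j, "§")
    # fine is the parse tree of ~j -> (i+~f): the fine tree of the
    # two-instruction sequence  LG j; SF i  (store J into I)
    assert fine == ("->", ("~", ("operand", "j")),
                          ("+", ("operand", "i"), ("~", ("operand", "f"))))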

The subtlety could be avoided by writing the assignment operator always as "->~": this could be justified by remarking that it emphasises that the result of assignment is to put a value (e.g. ~§) down at an address (e.g. §). One would then graft as follows:

    Before grafting:    —— ->~ ——(~§)      (~§)——

    After grafting:     —— ->~ ——(~§)——————

The problems with this variant are the nuisance of writing more signs, and that it seems initially un-intuitive. In the implementation, typographic restrictions forced a similar solution, but in this chapter we will retain the "assignment to an address" convention.

For a first example of the grafting process we will take one of the stack-IL sequences for the I:=J+K; example. The diagram below illustrates the gross tree as boxes; inside each box is the tree of the instruction, and in-branch indirection operators are parenthesised to indicate that they must be pruned out.

[Diagram: the gross tree of LG j; LF 0; LI k; ADD; SF i, drawn as boxes each containing the MDL tree of its instruction, with the in-branch indirection operators parenthesised.]

Discarding the boxes and the pruned-out indirection operators yields the following fine tree of the MDL of the I:=J+K; example.

    j — ~ ——————————————————\
    0 ———\                   + ————\
          + — ~ —\          /       \
    f — ~/        + — ~ ————         ->
    k ———————————/                  /
    i ———\                         /
          + ————————————————————————
    f — ~/

(This tree suggests that the IL is slightly mis-designed, in that instead of being able to use the "obvious" tree for ~~f, a spurious "0+" has been introduced to give the tree of ~(0+~f). Oddly, this is just what is needed on the real PDP-8: this is one of the reasons that the PDP-8 had a cosmetic treatment in section 6.3.4. The instructions we are trying to generate do not just "cancel out" the defect: it must be properly handled.)

Regarding each MDL operator as defining a production of a grammar such that the fragment of tree around the operator is the parse-tree corresponding to the production, the whole tree is the parse of the MDL instruction (~j) + ~(k+~(0+~f)) -> (i+~f). We return to this in section 6.5.

An important question is: how much does the design of the IL affect the fine tree? Ideally we want the fine tree to be the parse of the MDL sequence that is the translation of the original high-level language fragment (11). To illustrate that differences can arise from the IL, the following diagram is the fine tree for one of the register-IL sequences for I:=J+K; - its derivation from the gross tree is given in figure 6-3 (on the next page).

[Diagram: the same fine tree as above, decorated with the dotted allocation branches and the (r) and (s) register addresses.]

As can be seen from this fine tree, the only difference is the

information relating to allocation and deallocation. (Ultimately

this is because we have a sufficiently small MDL that, within

commutativity of +, its parse trees are unique: that is, regarding

each MDL operator as defining a production of a grammar, that

(11) Or an MDL parse tree of the sequence.

Figure 6-3
Derivation of a fine tree for I:=J+K; from the register-IL gross tree. (Note: all the material in parentheses is pruned out from the fine tree.)

grammar is unambiguous.) This is a highly desirable property of

an MDL, but may not always be possible. One impossible example

would be as follows. Consider a high-level language in which

there are "short integers" and "long integers". It is possible

to convert from short to long. There is "short addition" and

"long addition". If one wishes to add two short integers to

get a long result, there are two possible MDL sequences with

the following fine trees:

    Operand 1 —— widen ———\
                           long add ——
    Operand 2 —— widen ———/

and

    Operand 1 ———\
                  short add —— widen ——
    Operand 2 ———/

With knowledge of the properties of the integer numbers, we see

that the first fine tree cannot lead to overflow, but the second

can. Therefore, somewhere, somehow, the first tree should be

picked. It should be picked in one of two ways: either because

that is what the language demands (and so that is what the IL

must reflect); or because the designer of the primary code-

generator and of the IL observed the problem, and designs

accordingly. This should not be a problem for the secondary

code-generator, its fabricator, or the designer of a fabricator.

Other possible MDL-parse ambiguities are easily found, including

some that look more serious from a syntax-analysis viewpoint.

None, however, seem to have significant semantic consequences

that cannot be resolved in the primary code-generator or original

language design. In addition, they usually ought to be resolved

prior to the IL, and so prior to the MDL, as in the above example.

To conclude the preceding four subsections: from a data-flow analysis of an IL sequence we can derive a tree equivalent to an MDL translation of the IL. To secondary-code-generate, we need to reverse this procedure to obtain machine code from the MDL. We also need initially to produce the data-flow tree of the IL. The next two subsections present the trees for the PDP-8, and hence the PDP-8 to MDL parse-tree translation that must be inverted.

6.4.5 The trees for the PDP-8 instructions

The trees for the PDP-8 instructions can be taken directly from the MDL definitions of the instructions in 6.3.4. If the result is left in the accumulator or in the memory address register, then it cannot be an effect demanded by IL sequences that do not "know" about these registers, so they can only be "meta-syntactic" trees. Results left in "~m" can only be "normal". Results left in "m" can be meta-syntactic or normal. As in 6.3.4, "exhaustive" and "factorised" definitions are given below.

Trees of the exhaustive definition

Instruction   Normal tree               Meta-syntactic tree

LOADD m                                 m —— ~ —— -> ——(ac)

LOADI m                                 m —— ~ —— ~ —— -> ——(ac)

STORED m      ac —— ~ ———\              ac —— ~ —— -> ——(m)
                          ->
              m —————————/

STOREI m      ac —— ~ ———\
                          ->
              m —— ~ ————/

ADDD m                                  m —— ~ —————\
                                                     + —— -> ——(ac)
                                        ac —— ~ ————/

ADDI m                                  m —— ~ —— ~ ——\
                                                       + —— -> ——(ac)
                                        ac —— ~ ——————/

Trees of the factorised definition

Instruction   Normal tree               Meta-syntactic tree

D m                                     m —— -> ——(ma)

I m                                     m —— ~ —— -> ——(ma)

LOAD                                    ma —— ~ —— ~ —— -> ——(ac)

STORE         ac —— ~ ———\
                          ->
              ma —— ~ ———/

ADD                                     ma —— ~ —— ~ ——\
                                                        + —— -> ——(ac)
                                        ac —— ~ ———————/

By grafting together appropriate pairs from the factorised definition (i.e. the tree of an address-mode micro-instruction to an operand of a function micro-instruction) one obtains all the trees of the exhaustive definition EXCEPT ONE: we cannot get the meta-syntactic tree of STORED, only the normal tree:

    ac —— ~ ———————————\                    ac —— ~ ———\
                        ->      becomes                 ->
    m —— -> ——(ma)——(~)/                    m ——————————/

        D   STORE               (removing the redundant material)

which needs to be "straightened" into meta-syntactic form. This kind of deficiency is even more noticeable when dealing with instruction sets with complex and varied addressing modes. It turns out, however, that in the implementation a simple heuristic avoids the problem, and that a "normal" tree is not needed when a "meta-syntactic" tree is possible (see 6.7).

The definition of the machine instructions themselves is not sufficient, however. Recall that in the PDP-8 sequences for the I:=J+K; example we had instructions such as:

    LOADD =k    and    ADDD =i

where the "=" signified the address of a memory location containing the following integer: i.e. what is often provided in

an assembler as a "literal". At present we do not have this facility in the trees of the PDP-8. Consider how this must look as a graft onto a normal instruction LOADD m or ADDD m:

    k ——(m)—— ~ —— -> ——(ac)   ?               (grafted onto LOADD)

    i ——(m)—— ~ ————\
                     + —— -> ——(ac)   ?        (grafted onto ADDD)
    ac —— ~ ————————/

It is clear that we need to introduce a "fake" instruction, with the

"metasyntactic" tree:

    c ——(m)

which takes an integer constant as an operand, and returns the

address of a memory location containing the constant. In this

thesis this fake instruction is called LIT, and its constant

operand is written after it. When a literal effect is required

in a PDP-8 instruction sequence, we can now write subsequences

such as:

    LIT k               LIT i
    LOADD      and      ADDD

This is notationally unsatisfactory: the memory-address operand of the LOADD and ADDD in the examples is not explicitly given. Because the whole LIT instruction returns this address, instead we will write the LIT instruction in lieu of the address:

    LOADD LIT k    and    ADDD LIT i

This has the secondary advantage of being short (12).

(12) Another possibility is to attach the LIT to the address-mode micro-instruction. Thus for the PDP-8 we would have two pre-grafted new address-mode micro-instructions, DLIT and ILIT:

    DLIT k              ILIT i
    LOAD       and      ADD

At first glance it would appear that ILIT is of the same effect as D, and so is not needed. In fact, on the PDP-8 ILIT is wider than D, and so can be used to address the whole of the PDP-8 memory, whereas D can only be used to address part of the memory. This is one of the parts of the architecture of the PDP-8 that are not needed for the I:=J+K; example.

6.4.6 The trees of PDP-8 instruction sequences.

Using the argument of 6.4.1 we can produce a gross tree for a sequence of PDP-8 instructions. Using the argument of 6.4.4 we can graft the trees of the instructions in, to obtain the fine tree. (Figure 6-4 on the next page gives the gross-to-fine derivation.) Within commutativity of addition, this is the same tree as for the two ILs, except that the spurious "0+" in those trees is not present.

Figure 6-4
Derivation of a fine tree for I:=J+K; from a PDP-8 gross tree.
[Diagram: the derivation for the sequence LOADD LIT k; ADDI f; STORED t; LOADD f; ADDD LIT i; STORED u; LOADD j; ADDI t; STOREI u, with the tree of each instruction grafted in and the material in parentheses pruned out.]

6.5 RELATIONSHIP TO SYMBOLIC EXECUTION

Symbolic execution of the MDL definitions of the instructions of an IL sequence yields the MDL instruction(s) equivalent to the sequence. This MDL can then be parsed to obtain the tree (or sequence of trees) of the MDL.

Alternatively, data-flow analysis of the IL sequence to obtain the gross tree(s), followed by substitution of the MDL trees of the individual IL instructions, obtains the fine tree(s). These trees also are MDL trees of the IL:

                    symbolic execution
            IL ——————————————————————————> MDL

            | data-flow               tree- | ^
            | analysis                walk  v | parse
            v
       Gross tree ———————————————————> MDL-tree, or Fine tree
                    tree-substitute

All the processes are constructive: the symbolic execution is described below, the data-flow analysis in 6.6.2, and the substitution in 6.4.4; the parse is the normal process for the syntactic analysis of algebraic-style expressions, and the walk is the inverse of the parse. If the IL is well-defined in MDL, then for an IL sequence there will be at least one corresponding MDL sequence. If the MDL is well-defined, it should allow of no ambiguities beyond algebraic transformations. Thus we can talk of "the MDL of an IL sequence", or "the fine tree of an IL sequence", both within algebraic equivalence.

The method for the inverse process from the MDL (or fine tree) to machine code, presented in 6.7.1, can be interpreted as a parse using an ambiguous grammar [GLA77]. For the insights promoted by the symbolic execution approach, to establish the link between this method and [GLA77], and also for the historical reason that investigations of symbolic execution suggested the inverse method of 6.7.1, two demonstrations of symbolic execution are given below for the I:=J+K; example. For the optimisation suggested in 6.7.5, the tree form of the inversion method is much more suitable. The discussion in this thesis and the implementation are therefore in terms of trees, except for this section. This choice of trees discards some advantages relating to common sub-expressions suggested by [GLA77] for the textual form.

6.5.1 Two demonstrations

First, symbolic execution will be demonstrated to obtain the MDL expression from one of the register-IL sequences for I:=J+K;. The second demonstration is from a PDP-8 translation of I:=J+K; to MDL.

FROM REGISTER-IL

We start with nothing except the sequence of IL instructions to come. We have no values, and no places in which to store values.

Instruction 1: A R. We obtain a place in which to store intermediate results: a register called R.

Instruction 2: LG R,j (MDL: ~j -> r). We note that the current contents of R are to be the value "~j". (We do not evaluate ~j: we cannot, because we do not have a location of address j, or its contents.)

Instruction 3: A S. We obtain a register called S for temporary results.

Instruction 4: LF S,0 (MDL: ~(0+~f) -> s). We note that the contents of S are to be "~(0+~f)".

Instruction 5: LI S,k (MDL: ~(k+~s) -> s). We have a noted value for ~s, so we substitute that value for ~s. The instruction thus becomes ~(k+~(0+~f)) -> s. We now note that the contents of S have been changed to ~(k+~(0+~f)).

Instruction 6: ADD R,S (MDL: ~r+~s -> r). Substituting the noted values for ~r and ~s gives the MDL: ~j+~(k+~(0+~f)) -> r. We note that the contents of R have been changed to ~j+~(k+~(0+~f)).

Instruction 7: D S. We discard the place S, and the result in it.

Instruction 8: SF R,i (MDL: ~r -> (i+~f)). We substitute the noted value for ~r to obtain ~j+~(k+~(0+~f)) -> (i+~f). We note this as a side effect.

Instruction 9: D R. We discard R and its contents.

Summary: the effect of the whole sequence was the side-effect of instruction 8. The parse of that is just the example fine tree of 6.4.4, without the complications of the allocation and deallocation.

Note that the process records effects relative to the values of i, j, k, ~f, ~j, ~k as they were at the start of the symbolic evaluation, and assuming no unknown cross-talk (aliases etc.) between them. The process thus produces a concise statement of the effect of the whole sequence.
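The demonstration just given mechanises directly. Here is a minimal sketch of such a symbolic executor for the register-IL, in Python, as an illustration of mine only; the MDL effects are built as strings in the ~ / -> notation of 6.3.

    def symexec(sequence):
        regs = {}                              # register name -> noted symbolic value
        side_effects = []
        for instr in sequence:
            op, args = instr[0], instr[1:]
            if op == "A":                      # allocate: a place, no value yet
                regs[args[0]] = None
            elif op == "D":                    # deallocate: discard place and value
                del regs[args[0]]
            elif op == "LG":                   # ~m -> r
                r, m = args; regs[r] = "~" + m
            elif op == "LF":                   # ~(m+~f) -> r
                r, m = args; regs[r] = "~(" + m + "+~f)"
            elif op == "LI":                   # ~(m+~r) -> r, substituting the noted ~r
                r, m = args; regs[r] = "~(" + m + "+" + regs[r] + ")"
            elif op == "ADD":                  # ~r+~s -> r
                r, s = args; regs[r] = regs[r] + "+" + regs[s]
            elif op == "SF":                   # ~r -> (m+~f): a side effect
                r, m = args
                side_effects.append(regs[r] + " -> (" + m + "+~f)")
        return side_effects

    print(symexec([("A", "R"), ("LG", "R", "j"), ("A", "S"), ("LF", "S", "0"),
                   ("LI", "S", "k"), ("ADD", "R", "S"), ("D", "S"),
                   ("SF", "R", "i"), ("D", "R")]))
    # ['~j+~(k+~(0+~f)) -> (i+~f)']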

Note that the effects of the A and D instructions cancel out: it would appear that they are more suited to symbolic evaluation than to the gross tree/fine tree approach. Repeating the above for the stack-IL has just the opposite effect: the "§" in the MDL definitions of the instructions must be expanded into separate MDL instructions that increase and decrease the value of a stack pointer by one. If the address of this pointer is s, its initial value is ~s, and after a sequence of instructions we may have effects such as ~s+1+1-1+1-1-1 -> s accumulated, and corresponding horrors in situations that should cancel out as a sequence of side-effects. Even if the algebraic knowledge is applied that "+1-1" and "-1+1" cancel out, the overall effect is still not as clear as for register-IL under the symbolic evaluation.

We now turn to the other demonstration.

FROM PDP-8 MACHINE CODE.

This symbolic evaluation is given in a tabular form, both for brevity and to make the principles more obvious. There is no allocation and de-allocation: it is assumed that AC, T and U are available before, and will be available after. (Hopefully the values carried over will not be used meaninglessly: this should not happen for a "correct" IL sequence - i.e. one produced by a correct primary code generator.)

The exhaustive definition is used. Note that a literal introduced by an "=" in the operand cancels the effect of an indirection in the MDL definition of the instruction.

INSTRUCTION   MDL             ~ac after     ~t after   ~u after   side effect

LOADD =k      k -> ac         k             ?          ?          none
ADDI  f       ~~f+~ac -> ac   ~~f+k         ?          ?          none
STORED t      ~ac -> t        ditto         ~~f+k      ?          none
LOADD f       ~f -> ac        ~f            ditto      ?          none
ADDD  =i      i+~ac -> ac     i+~f          ditto      ?          none
STORED u      ~ac -> u        ditto         ditto      i+~f       none
LOADD j       ~j -> ac        ~j            ditto      ditto      none
ADDI  t       ~~t+~ac -> ac   ~(k+~~f)+~j   ditto      ditto      none
STOREI u      ~ac -> ~u       ditto         ditto      ditto      ~(k+~~f)+~j
                                                                   -> (i+~f)

The side-effect has as parse tree precisely the fine tree of 6.4.6. Within commutativity of addition and the redundancy of "+0", it is also precisely the side-effect of the register-IL symbolic evaluation demonstration.

6.6 RELATIVE MERITS OF THE STYLES OF IL

In subsection 6.6.3 the relative merits of register-IL and stack-IL for the purposes of tree-oriented manipulation are discussed, and a somewhat arbitrary choice is made for the rest of chapter 6. Subsection 6.6.2 completes the groundwork needed for 6.6.3 by giving data-flow analysis methods for the two styles of IL. First, however, there is a problem that so far has been avoided: in a register-IL, what is the best organisation of IL registers?

6.6.1 Organisation of registers in a register-IL

As well as being fictional temporary-result locations, the registers can be used as a means for the primary code-generator

(with its superior knowledge of the source program) to state which temporary results are "most important", as advice to the secondary code-generator (with its superior knowledge of the actual registers of the machine).

This advice could be in one of two forms. Either the IL registers could be ranked in "order of importance", so that if the secondary code generator has a certain number of spare actual registers it can allocate them to the "most important" IL registers. Alternatively, the primary code-generator could generate for several "grades" of register. Grading of IL registers could be valuable if the target machine has corresponding grades of actual register, but introduces the problem of choice of target grade into the secondary code-generator for each grade of IL register. Both ranking and grading of IL registers introduce new complexities into the primary code-generator.

Whether or not ranking is employed, there is also the possibility of limiting the number of registers, or, if grading is employed, of limiting the number of registers of each grade. This requires register-allocation in the primary code generator, and allocation of non-registers in the IL (as places for registers to be dumped for medium-term retrieval, while the IL registers are allocated for higher-priority short-term purposes).

All of ranking, grading, and limiting introduce complications in both the primary and secondary code-generators. None of them make translation more likely to be possible.

The primary objective of this research was to produce something which works: none of ranking, grading, or limiting affect this (except, perhaps, negatively). A secondary objective is to discuss easy and obvious optimisations of the method: none of ranking, grading, or limiting are easy.

Therefore, although the problems of register allocation must be faced for the final target code (see 6.7), there is no clear advantage in introducing it into the IL as well. Finally, the unclear advantages are medium-term: but it is compiler-writers' folklore that, for optimisation, good short-term code is the highest priority. Therefore the unclear advantages are probably secondary as well.

In the following two subsections we assume an unranked, ungraded and unlimited set of registers.

6.6.2 Data-flow analyses

For both the register and stack styles of IL there is the problem of the data-flow analysis to determine the gross tree of an IL sequence. Since this is a final factor that decides the choice between register and stack-IL, it is presented here. The analysis for stack-IL is presented first, it being the simpler.

6.6.2.1 Data-flow analysis for stack-IL

Let us consider a node in the gross tree (i.e. the data-flow tree) corresponding to the ADD IL instruction of 6.2.2.1. It needs two in-branches, and one out-branch. In 6.3.2 this is given the MDL (~§)+(~§) -> §. The one out-branch of the node corresponds to the § sign after the assignment operator: they both represent the result of the instruction. The other two § signs correspond to the two in-branches of the node: they represent the operands.

    (§) ———\
            + ——(§)
    (§) ———/

In general, an IL instruction is to be a root of gross trees if it does not have (only) § after the assignment operator (13), otherwise it has a single out-branch; also the adicity of the node of an IL instruction is just the number of other § signs in the MDL of the instruction.

Since we have a stack IL, the discipline of the stack forces operands to be taken off the stack in the reverse order that they were put onto the stack, and therefore the instruction sequence must be decomposable into subsequences, each subsequence being a reverse polish string in which each item (IL instruction) has an adicity as determined from the MDL, and each root (IL instruction at the end of a subsequence) as determined from the MDL. A reverse polish parse, such as that outlined in SIMULA in figure 6-5 (on the next page), therefore suffices as a data-flow analysis program building the gross tree.

(13) Thus SF, with MDL ~§ -> (m+~f), must correspond to a root.
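The adicity and root tests used by the outline's Adicity and Is_Root procedures can be derived straight from the MDL, as just described. A minimal sketch of mine, in Python, under the assumption that each instruction's MDL is available as a single "expression -> destination" string:

    def node_shape(mdl):
        lhs, dest = (part.strip() for part in mdl.split("->"))
        produces_result = dest == "§"        # result left (only) at §?
        # in-branches: every § sign other than the result one
        adicity = lhs.count("§") + (0 if produces_result else dest.count("§"))
        return adicity, not produces_result  # (adicity, is_root)

    assert node_shape("(~§)+(~§) -> §") == (2, False)   # ADD
    assert node_shape("~§ -> (m+~f)") == (1, True)      # SF m: a root
    assert node_shape("~m -> §") == (0, False)          # LG m: a leaf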

There is one clarification that this introduces into an MDL definition of a stack IL: if an IL instruction has more than one operand (i.e. an adicity greater than one) and is not commutative in the operands, then the operands must be loaded onto the stack in the reverse order of the matching § signs in the MDL definition of the instruction. For example, a MINUS stack-IL instruction, which requires the operand from which to subtract first, and the operand to be subtracted second, must be defined in MDL as -~§+~§ -> §, rather than ~§-~§ -> §. (The latter could be a definition of a "reverse minus" IL instruction, if the overloading of the monadic and dyadic "-" operators in the MDL can be resolved.)

There is also a related restriction on the design of the IL. Suppose we wanted an instruction that took two operands, the first being an address to which to store the value of the second operand. Let us call it "store indirect", with mnemonic SI. It could be defined by the MDL ~§ -> ~§. The restriction is that because of the non-commutativity of the assignment operator in MDL, the § sign for the address must be given after the § sign for the value to be stored: therefore the address must have been put on stack before the value. "Reverse store indirect" cannot be defined (14).

(14) More accurately, reverse store indirect could be defined with a reverse MDL assignment operator: <-. MDL restrictions can lead to restrictions on the IL that can be defined.

Figure 6-5

SIMULA outline of reverse polish parsing to obtain the gross tree of a stack-IL sequence by data-flow analysis.

begin
   class Node(IL, Adicity); text IL; integer Adicity;
   begin
      ref(Node) array SubNode[1:Adicity];
   end of Node;
   comment For simplicity, it is assumed that Simula permits
      declarations of null arrays with bounds (1:0);

   ref(Node) This_Node;  text This_IL;

   integer procedure Adicity(IL); text IL; ... ;   ) These return
   boolean procedure Is_Root(IL); text IL; ... ;   ) appropriate
   text procedure Input_an_IL; ... ;               ) values.

   ref(Node) array Stack[1:1000];
   integer Top;  comment Top-of-stack pointer;

   comment ++++ INITIALISE ++++ ;
   Top := 1;
   This_IL :- Input_an_IL;
   Stack[Top] :- new Node(This_IL, Adicity(This_IL));
   comment if Stack[Top].Adicity ne 0, then the input IL sequence
      starts with a root - i.e. is probably bad. Not checked in
      this outline;

   comment ++++ LOOP ON SUBSEQUENT IL INSTRUCTIONS ++++
      building a node for each IL (the node includes pointers to
      the subnodes). Nodes which do not yet have a superior are
      kept in "Stack";
   while not Is_Root(This_IL) do
   begin
      integer I;
      comment ++++ INPUT AN IL: BUILD ITS NODE ++++ ;
      This_IL :- Input_an_IL;
      This_Node :- new Node(This_IL, Adicity(This_IL));
      comment ++++ GET NODE'S SUBNODES FROM STACK ++++ ;
      for I := This_Node.Adicity step -1 until 1 do
      begin
         This_Node.SubNode[I] :- Stack[Top];
         Top := Top - 1;
      end of node building;
      comment ++++ PUT THE NEW NODE ON STACK ++++ ;
      Top := Top + 1;
      Stack[Top] :- This_Node;
   end of loop, end of IL sequence;
end with Top=1 and Stack[Top] pointing to the root
    (if Top ne 1, the input IL sequence was ill-formed);

put on stack before the value. "Reverse store indirect" cannot

be defined. (14)

The most important thing about this stack-IL data-flow analysis

is that the "apparent disadvantage" feared for temporary-stack IL

(when it was introduced in 6.2.2.1) has proved to be a mirage: using

this method there is no surplus work to be done to keep track of

the top-of-stack pointer.
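(As an illustrative aside: the parse of figure 6-5 can be re-expressed compactly in a present-day notation. The following Python sketch is a minimal re-statement of the outline, not of any implemented tool; adicity and is_root stand for the tables derived from the MDL, and all names are illustrative.)

   # Reverse-polish parse of a stack-IL sequence into its gross tree.
   class Node:
       def __init__(self, il, adicity):
           self.il = il
           self.subnodes = [None] * adicity

   def gross_tree(il_sequence, adicity, is_root):
       ils = iter(il_sequence)
       il = next(ils)              # assumed not a root (adicity 0), as in figure 6-5
       stack = [Node(il, adicity(il))]
       while not is_root(il):
           il = next(ils)
           node = Node(il, adicity(il))
           for i in reversed(range(adicity(il))):
               node.subnodes[i] = stack.pop()     # operands taken rightmost first
           stack.append(node)
       assert len(stack) == 1, "ill-formed IL sequence"
       return stack[0]             # the root of the gross tree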

6.6.2.2 Data-flow analysis for register-IL

The next three pages give a program outline for analysing a register-IL sequence into a multi-rooted tree, the "result" of the program being a pointer to a linked-list of the roots. A program outline to produce the variant of the tree in which multiple roots are avoided by having multiple branches is not presented here: the outline is essentially the same, but involves much more detailed administration - a typed draft was seven pages long, without comments.

The multi-root data-flow analysis is broadly similar to the stack-IL method, but with a different analysis of the numbers of in- and out-branches, and instead of a stack of pointers to nodes not yet used, a heap of pointers to nodes not yet completely used.

In the stack-IL case, for each node it sufficed to know the number of in-branches required for the node of an IL instruction. For

(14) More accurately, reverse store indirect could be defined with a reverse MDL assignment operator: ←. MDL restrictions can lead to restrictions on the IL that can be defined.

Figure 6-6

SIMULA outline of reverse polish parsing to obtain the multi-rooted gross tree of a register-IL sequence by data-flow analysis.

begin

   class Node(IL, Registers); text IL; integer Registers;
   begin
      ref(Node) array SubNode[1:Registers];
      ref(Node) Next; comment For lists of Nodes;

      procedure Prepend_on(List); name List; ref(Node) List;
      comment Prepend this Node onto the list pointed to by the parameter;
      begin
         Next :- List; List :- this Node;
      end of Prepend_on;

      integer Unused_results;

      procedure Used_a_result;
      comment Called when a result has been used: decrements the count of unused results by one, and if the count is then zero, all the results of the node have been used, so it is removed from the list of nodes with unused results;
      begin
         Unused_results := Unused_results - 1;
         if Unused_results = 0 then
         begin
            if Unused_Nodes == this Node then
               Unused_Nodes :- this Node.Next
            else
            begin
               ref(Node) X;
               X :- Unused_Nodes;
               while X.Next =/= this Node do X :- X.Next;
               comment Should now have X such that X.Next == this Node;
               X.Next :- X.Next.Next;
            end
         end
      end of Used_a_result;

      comment Initialise the node by setting the number of unused results to start equal to the number of registers in the IL instruction represented by the node;
      Unused_results := Registers;
   end of the class Node;

   ref(Node) Root_Nodes; comment Pointer to the list of root-nodes of the tree, updated as the root-nodes are constructed;

   ref(Node) Unused_Nodes; comment Pointer to the list of nodes which currently have unused results. Is none only before the first IL of a sequence, and after the last. When a new non-deallocate node (i.e. a node with results) is built, it is prepended onto this list. When a node's last result is used, the node is de-linked from the list;

   integer procedure Registers(IL); text IL; ... ;
   text procedure Register(I, IL); integer I; text IL; ... ;
   boolean procedure Is_Allocate(IL); text IL; ... ;
   boolean procedure Is_Deallocate(IL); text IL; ... ;
   text procedure Input_an_IL; ... ;
   comment These return appropriate values;

   ref(Node) procedure Node_of(This_Register); text This_Register;
   comment Return the most recent node, one of whose registers is the parameter. This is the node that gave the register its current value. Because the "unused" list is only ever prepended to, "most recent" means "earliest in the list";
   begin
      ref(Node) X; integer I;
      X :- Unused_Nodes;
      while true do
      begin comment Loop until "goto Return" out;
         comment NB an allocate node has Registers = 0 but still names one register;
         for I := 1 step 1 until (if X.Registers = 0 then 1 else X.Registers) do
         begin
            if Register(I, X.IL) = This_Register then
            begin Node_of :- X; goto Return; end;
         end of for;
         X :- X.Next;
      end of while;
   Return:
   end of Node_of;

   comment The detailed actions to be performed for each of "allocate" ILs, "deallocate" ILs, and other (i.e. normal) ILs are out-of-lined to make for clarity of the main program;

   procedure Allocate_IL(IL); text IL;
   begin
      new Node(IL, 0).Prepend_on(Unused_Nodes);
      Unused_Nodes.Unused_results := 1;
      comment This node cannot take advantage of the default initialisation for the count of unused results, because for allocate nodes there should be 1 unused register for no in-branches;
   end of Allocate_IL;

   procedure Deallocate_IL(IL); text IL;
   begin
      new Node(IL, 1).Prepend_on(Root_Nodes);
      Root_Nodes.SubNode[1] :- Node_of(Register(1, IL));
      Root_Nodes.SubNode[1].Used_a_result;
      comment NB: the "unused results" fields of deallocate nodes are never used;
   end of Deallocate_IL;

   procedure Normal_IL(IL); text IL;
   begin
      integer I; ref(Node) New_Node;
      New_Node :- new Node(IL, Registers(IL));
      comment Find the producers of the in-branches before prepending, so that Node_of does not find this new node itself;
      for I := 1 step 1 until New_Node.Registers do
      begin
         New_Node.SubNode[I] :- Node_of(Register(I, IL));
         New_Node.SubNode[I].Used_a_result;
      end;
      New_Node.Prepend_on(Unused_Nodes);
   end of Normal_IL;

   comment ++++ INITIALISE ++++ ;
   text This_IL;
   Allocate_IL(Input_an_IL);
   comment If the input was not an allocate instruction, then the input IL sequence is bad;

   comment ++++ LOOP ON SUBSEQUENT IL INSTRUCTIONS, BUILDING A NODE FOR EACH (INCLUDING POINTERS TO ITS SUBNODES). KEEP NODES OF WHICH NOT ALL RESULTS HAVE YET BEEN USED IN A HEAP. KEEP A LIST OF ALL ROOT NODES ++++ ;
   while Unused_Nodes =/= none do
   begin
      This_IL :- Input_an_IL;
      if Is_Allocate(This_IL) then Allocate_IL(This_IL)
      else if Is_Deallocate(This_IL) then Deallocate_IL(This_IL)
      else Normal_IL(This_IL);
   end of while;

   comment We now have a linked list of root nodes;
end of program;

the multi-rooted register-IL case, there are three subcases:

* For an allocate instruction, there are no in-branches, and

one named out-branch;

* For a deallocate instruction, there is one named in-branch,

no out-branches, and the node must be a root; and

* For each other instruction, there are one or more named

in-branches, and for each in-branch there is an out-branch

with the same name. (It is assumed that no two in-branches

to a node will have the same name: this could be an irritating

restriction on the IL design, but the alternative is further

complication of the data-flow analysis program.)
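(The three subcases can again be re-expressed as a sketch. The following Python outline mirrors figure 6-6 under the simplifying assumptions that kind(il) classifies an instruction as 'alloc', 'dealloc' or 'normal' and names(il) gives its register names; the representation is an assumption of this sketch only.)

   class RNode:
       def __init__(self, il, names):
           self.il, self.names = il, names
           self.subnodes = []            # in-branches
           self.unused = 0               # count of unused results

   def register_il_trees(il_sequence, kind, names):
       roots, live = [], []              # live: nodes with unused results, newest first
       def take(name, node):
           # graft the most recent producer of 'name' as an in-branch of node
           for p in live:
               if name in p.names:
                   node.subnodes.append(p)
                   p.unused -= 1
                   if p.unused == 0:
                       live.remove(p)
                   return
       for il in il_sequence:
           node = RNode(il, names(il))
           k = kind(il)
           if k == 'dealloc':
               take(node.names[0], node)      # one named in-branch ...
               roots.append(node)             # ... and the node is a root
           else:
               if k == 'normal':
                   for name in node.names:    # producers are found before this node
                       take(name, node)       # is listed, so it cannot match itself
               node.unused = 1 if k == 'alloc' else len(node.names)
               live.insert(0, node)
       return roots                           # the multi-rooted gross tree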

If there is no use of common-subexpressions in the IL sequence, then

the node that would be the root in the corresponding stack-IL tree will

have all of its out-branches being in-branches to deallocate nodes,

and each other non-allocate non-deallocate node will have all except

one of its out-branches being in-branches to deallocate nodes. To

obtain the corresponding stack-IL tree it would be necessary to prune

away all allocate and de-allocate nodes, and all branches to them:

every other node would have to be inspected to see if it has an

out-branch to an un-pruned node, and the one node that has no such out-branches is to be the root. (Alternatively, if the primary code-generator chooses the last de-allocate in a sequence to be one for which the in-branch is from the stack-IL root, that could be used to identify the desired root.)

In the data-flow analysis outline program, it is assumed that there are "allocate" instructions in the IL sequence. Re-drafting the outline to not need allocate instructions required an extra half-page (without comments) to keep track of what registers were currently allocated, so that when a register was referenced it was possible to know whether it was new, and so to be allocated. This was longer, and much more complex, than adding the generation of allocate instructions to each of three drafted primary code-generators. It is also less machine-efficient (implemented in SIMULA, by a factor of just over 2.6).

The outline was also re-drafted to automatically prune out in-branches from allocate nodes, and so prune out the allocate nodes also. The easiest method was to have the in-branch pointer in the node be null, and to accept the waste of space: this has the disadvantage that if the associated name of the (now null) in-branch were later needed, it would not be available.

6.6.3 Register-IL vs. Stack-IL

It is now clear that the register-IL demands a program for data-flow analysis over twice as long as for stack-IL, which is much slower, yields a tree of which the "true" (non-deallocate) root is hard to identify, and which contains spurious information.

The only advantage apparent for register-IL is that it makes common subexpressions representable as internally cross-grafted trees: to simplify later use of this, the secondary code-generator must mark a node as such when it cross-grafts it.

On the other hand, common-subexpressions can be achieved for a stack-IL by having the primary code-generator instead generate a separate preceding sequence that stores the common sub-expression in a named place. When the main sequence needs the value of the common sub-expression result, it takes it from the named place. (On some

target machines, this will lessen the optimisation advantage of

common sub-expressions.) Both for register-IL and stack-IL, common

sub-expression detection needs code in the primary code-generator:

for register-IL the more complex tree and data-flow analysis are

powerful enough that the only further complication is noting and

using the cross-grafts: for stack-IL there is the weakening of the

optimisation and the introduction of the "named places" (which are

much like registers, except that they are used between IL sequences

representing high-level instructions, not internally inside the

sequences).

For this pilot study, considering the above, it seems adequate to

demand a stack-IL, with perhaps a provision for "named places" (15).

Register-IL is therefore ignored in the rest of this thesis. For a

work based on textual representations the criteria would seem to be

reversed.

(15) Initial studies suggest that the allocation strategy for "named places" (i.e. whether to optimise them away; or if not, when to allocate them to what class of target memory) is so dependent on the high-level language, the target machine, and on other implementor decisions (such as the implementation of procedure calls and parameters) that it must be an implementor decision. This would apply to the results of register-IL tree cross-grafts also, in which case the implementor decision has a major effect inside the sequence, causing further complications: this is another reason to avoid register-IL.

6.7 AN INSTRUCTION CHOICE METHOD.

This section presents a technique for instruction choice which incorporates register allocation. Just as a translation from IL to MDL was presented in 6.4.1 to 6.4.4 in terms of a transformation from a gross tree to a fine tree, this instruction choice method will be presented in terms of a converse translation from fine tree to gross tree (6.7.1). In general there can be many machine-code gross trees matching a fine tree (i.e. translating the original IL sequence and its MDL translation). Some of these alternatives need STORE-LOAD decompositions of the tree to be introduced to conform to the numbers of target registers available (6.7.2): then, knowing the cost of the instructions in alternative subtrees, it is possible to make choices between them (6.7.3) and so derive a set of equal-cost optimal gross trees corresponding to the original IL sequence.

A pilot implementation of this method of choosing gross trees, amending them for allocation, and then choosing between them, is the subject of chapter 7.

The main defect of the method of producing the machine code gross tree from the MDL fine tree is that often too many alternatives are produced. Many of the consequent choices arise from MDL tree components which arise directly from individual IL instruction trees, or combinations of IL instruction trees. Rather than make such choices many times while running the fabricated secondary code-generators, it is preferable to make them once only at fabrication of the secondary code-generator. Subsection 6.7.4 presents a method of doing this by transforming directly from the IL gross tree to the machine code gross tree, the MDL trees being considered only at fabrication time, if possible. The consequent secondary code-generator will be made

more efficient in three ways: spurious choices between equivalents will be avoided; of alternatives that show a clear "best" in a region local to one IL, only that best will ever be considered; and only one tree-transformation is needed, not two. After this gross-to-gross transform, the allocate stage and tree-choice stage would again be needed, but only in restricted sections of the target gross tree.

Having obtained a gross tree of the machine code instructions needed,

those instructions must be placed in order. This is dealt with in

section 6.8.

6.7.1 Transforming an MDL fine tree to a machine code gross tree.

There are two distinct aspects of this transformation: handling the

leaves, and handling the internal nodes.

At a leaf of the MDL fine tree, there will be either a constant, or

an address of a memory class of the IL. In this thesis it is assumed

that translating from either of these to a leaf acceptable in the

machine code is the responsibility of the implementor. He can achieve

this by introducing into the machine description pseudo-instructions

of the machine that assign an IL leaf class to a machine code leaf class.

In the example of translating I:=J+K; to PDP-8 machine code, there is

a need for two implementor-introduced pseudo-PDP-8 instructions:

   Instruction   MDL tree

   LIT c         c → m     (Hold a literal in a memory location, then use the address of the literal. LIT has already been suggested in 6.4.5 for other reasons.)

   MEM x         ↑x → m    (An indirection of a constant is a memory address, of which the contents are to be used.)

These instructions and their trees would be defined by giving them the MDL descriptions c → m and ↑x → m. Some syntactic device is needed to ensure that between them they match all integer leaves of the IL fine tree. The →m results then match the ↑m operands of the trees of the PDP-8 machine instructions.

The method is presented by example in sub-section 6.7.1.1, and summarised in sub-section 6.7.1.2. Another example in 6.7.1.3 illustrates the power of the method, and organisational techniques suggested in 6.7.1.2. The method is related to that of Wasilew /WAS72/ and Cattell's derivative of Wasilew's method /CAT78/, but differs in its treatment of the assignment operator to yield greater efficiency. The Wasilew and Cattell methods are reviewed in Appendix A.

6.7.1.1 A demonstration of the method

The method is demonstrated for the fine tree of the stack-IL translation of the I:=J+K; example, as developed in 6.4.4, and the exhaustive form of the MDL for the PDP-8, as developed in 6.3.4.

The method works from the root to the leaves: working from leaves to root is Wasilew's method, and sacrifices a use of the special properties of the assignment operator.

The root zone of the example tree is an assignment with an addition on each side:

   [diagram: the root zone - an assignment whose two subtrees each have + as principal operator]

In the MDL descriptions of the PDP-8 there are none of the form:

   (...+...) → (...+...)

so on at least one side of the assignment operator the addition must be due to another PDP-8 instruction than that to which the assignment is due. If only one side need be due to another instruction, then that side of the instruction including the assignment must have an indirection operator ready for grafting: so we are now looking for PDP-8 MDL of the form:

   (...+...) → ↑(...)    or
   ↑(...) → (...+...)

There are none of either of these forms, so we must search for a PDP-8 MDL with indirections on both sides, ready for grafting on to two subtrees which do additions: that is, for an MDL of form:

   ↑(...) → ↑(...)

There is only one such instruction:

   STOREI m    ↑ac → ↑m

so that must be the translation of the root of the fine tree. The relationship between the gross tree and the fine tree in the root zone must therefore be:

   [diagram: the tree for the value to be assigned grafts onto ↑ac, and the tree for the address to which to assign onto ↑m, of the STOREI node]

The process must then recurse on each of the subtrees, but now with target locations for results.

The root zone of the "value to be assigned" subtree is now: We search for instructions that do additions and leave the result in the PDP-8 accumulator, and find four:

ADDD m "m + "ac ac

ADDD m "ac + "m ac

ADDI m ""m + "ac ac

ADDI m "ac + ""m ac

The commutative reverses of the plus operators are needed as a second instruction definition, leading to the same PDP-8 mnemonic^ in order to avoid having to give commutative operators a special status. Having found the four possiblities, for each the method

o » attempts to build a subtree: in this case all four y iG.ld succesijui* subtrees, so the four alternative PDP-8 gross trees are noted as possibles for the "value to be assigned" subtree.

For this demonstration, we will follow the construction of the last of the four (↑ac + ↑↑m → ac): the relationship between the gross tree and the fine tree will be:

   [diagram: the j subtree grafts onto ↑ac, and the (k + ↑(f+0)) subtree onto ↑↑m, of the ADDI node, whose result (ac) is the value operand of the STOREI]

Taking the top branch first, the search is for an indirection to give a result in the accumulator, that is for an MDL pattern of form:

   ↑... → ac

There is only one:

   LOADD m    ↑m → ac

So the gross tree of the branch must be of form:

   j — ↑ — (m) — (↑) — (ac) ...
                [LOADD]

A similar search to get j indirect to m yields the MEM pseudo-instruction, and the gross/fine relationship of the branch becomes:

   j — ↑ — (m) — (↑) — (ac) ...
     [MEM]      [LOADD]

Returning to the other branch of the ADDI, we need an addition that leaves its results in m. There is no such operation in the PDP-8, i.e. the MDL descriptions do not include any of form:

   ... + ... → m

There are however the four forms:

   ... + ... → ac

so another search is made to see if there is an instruction to transfer a value from ac to m, i.e. a form:

   ↑ac → m

There is exactly one, the STORED instruction: so the branch to the ADDI is to be of the gross/fine relationship:

   + — (ac) — (↑) — (m) ...
            [STORED]

The fine tree of the addition is now to be:

   k + ↑(f+0) — (ac)

Trying to maximise the match leads us to pick ADDI again:

   ↑ac + ↑↑m → ac

in which the ↑ac will graft to the k, losing its indirection, and the ↑↑m to the ↑(...+...), losing an indirection.

The only way to get an integer constant k to any other place is LIT k, with MDL k → m, so what is needed is an intermediate instruction that gets m to ac: again this is LOADD only, so the integer k is got to ac by a gross/fine tree relationship:

   k — (m) — (↑) — (ac) ...
     [LIT]    [LOADD]

The remaining branch of the second ADDI, and the "address" branch of the STOREI, are converted to gross trees by continuing to use the same searching method: the above description contains all the necessary kinds of search step.

Having to introduce the names of memory-types at the junctions of the gross tree nodes is unavoidable: it is how one gross tree node determines the "target memory type" of its descendants (i.e. where it needs its operand). If there are allocation or assignment problems in a memory type with more than one location (e.g. m but not ac) then the memory type names need to be decorated later with identification of the location of the class chosen at that branch between gross tree nodes. The following diagram shows one of the alternative resulting gross trees of the I:=J+K; example via stack-IL to PDP-8 machine code; figure 6-7 (on the next page) gives the fine/gross relationship and a resulting PDP-8 code sequence.

   [diagram: one of the alternative resulting gross trees, built from LIT, LOADD, ADDD, ADDI and STORED nodes and rooted in the STOREI, with the memory-type names (ac) and (m) at the junctions between gross tree nodes]

The final PDP-8 sequence needs to name three locations in main memory: how this is done is one subject of the next subsection (6.7.2). For clarity, the names have been assumed to be t, u, and v in the code sequence given with figure 6-7, and as m_t, m_u and m_v in the diagrams above and in figure 6-7. (There are also anonymous m-branches in the diagrams, representing the "results" of the LIT pseudo-instruction.) The gross tree without these namings is the input to the method given in subsection 6.7.2, which checks the tree to ensure that no memory-class is over-allocated (modifying the tree, where needed, to prevent over-allocation) and introduces information into the tree to name locations for a later assignment pass (which then only needs to do a linear scan of the produced machine instruction sequence).

Having to introduce indirections in the fine tree for the single purpose of grafting them out again is a nuisance, with no advantages at the practical level. (16) In chapter 7 a slight modification to the MDL notation will elide out the need for these indirections.

One major point to note from the example above is that because of the use of the IL instruction "Load-in-frame" with a zero operand, and the resulting "f+0" in the fine tree, three spurious instructions are generated to do an addition of zero. Some means is needed to prevent this.

(16) However, these indirections have the theoretical and pedagogic advantage of saying exactly what is meant.

                 Obtained                 Desired

   MDL           ... + ↑(f+0)             ... + ↑f

   Fine tree     f + 0                    f

   PDP-8         LOADD f                  ADDI f
   instruction   ADDD (LIT 0)
   sequence      STORED v
                 ADDI v

There are several possibilities to avoid this kind of phenomenon, depending on the time at which they are applied. The following paragraphs deal with them in reverse order of time.

After the machine instruction sequence has been formed, a peephole optimiser could "spot" the redundancy of the ADDD (LIT 0), and so eliminate it: assuming it then backs up to discover that v is set to be a copy of f (so that the reference to v can be modified into a reference to f), then a "deadness" check in the optimiser would discover that there is never any reference to v. Consequently the STORED v can be eliminated, in which case the LOADD f loads a value that is not used and so also may be eliminated. This demands an unusually clever peephole optimiser, and then works it very hard. It is also possibly too late to apply the optimisation, in that at the earlier stage of choosing which one of many alternative machine-oriented gross trees to select, a redundancy such as the consequences of this "+0" might cause a tree to be not chosen, although after peephole optimisation it would in fact have been the cheapest of the alternatives.

Thus the latest reasonable possibility would be after the gross tree alternatives have been built, but not yet selected. A tree-walk could be introduced to "spot" tree fragments of form:

   [diagram: an addition node one of whose operands is the constant 0]

and delete them. This would often leave fragments of form:

   f — [LOADD] — (ac) — [STORED] — (m_v) ...

which would need to be amended to:

   f — (m_f) ...

There would possibly be other kinds of subtree to transform. Especially if some of the branches include alternatives, the mechanics of the tree-editing can become very complex.

A corresponding edit at fine-tree time would simply edit out fine tree fragments of form:

   0 + ...

The disadvantage with this is that optimisations of the method given in subsection 6.7.4 depend on avoiding the construction of the fine tree (at secondary code-generate time) as much as possible, so that the fragment to edit out would never exist.

(At the time of fabricating the secondary code-generator, proformae for fine trees will be built, and the proforma for the "load-in-frame" would include the fragment:

   integer + ...

(That is, the proforma would be more general than the specific "+0".)

The best that could be done would be to realise the possibility of a "+0", and fabricate a secondary code generator that checked for "LF 0" as a special case, treating it differently from "LF" of any other integer. This is complex at fabrication time, and introduces inefficient special-case checks at secondary code-generate time.)

All the above assumes that some program is going to have the abstract knowledge that "+0" is redundant, and the ability to know when to apply that knowledge. It is relatively easy (but tedious) to write such programs, but very difficult to have them apply the check only where meaningful, in order to avoid excessive inefficiencies. The best solution to the problem is to avoid it: the designer of the IL should provide "special case" IL instructions that are likely to facilitate the generation of good code. It would have been better for the IL designer to have provided a specific IL instruction, perhaps called "Load-in-frame-zero" (LFZ), with MDL ↑f → §. The primary code-generator would never generate LF 0 for an access to an outer frame, it would generate LFZ. The relatively intelligent knowledge that "+0" is redundant would not be built into a program, but more appropriately would be implicit in the decisions of the designer of the IL and of the implementor of the primary code generator. No program complications arise.

In the "I:=J+K; examples of the rest of this thesis, it will be

assumed that LFZ is generated instead of LF 0: this simplifies

the gross tree into the form given Qi\ . " >;:_-_* =

6.7.1.2 Summary of the method

This sub-section summarises the kinds of steps used in the

demonstration given in the last sub-section, ignoring avoidable

complications such as tree-edits for optimisation purposes.

The most complex kind of step is that when the method has recursed to try to discover a gross tree (or set of alternative gross trees) matching a subtree of which the principal operator is diadic:

   source 1
            \
             operator — ... — desired destination
            /
   source 2

The recursion desires that the result of the found gross tree be left in a particular class of memory, the "desired (memory class of) destination": that is, the class of memory in which is to be left the value resulting from executing the instructions generated in consequence of the subtree.

The following paragraphs describe this kind of step in general terms, but without the implementation details given in chapter 7.

After that the phases of the step are commented upon, and finally the other kinds of step described by their relationship to it.

The next sub-section re-demonstrates the method in terms of the

I:=J+K; stack-IL fine tree for the PDP-10 computer: a better example of the general case of the method, but not showing the problems due to the PDP-8.

PHASE 1. Find all the MDL trees describing target machine instructions, each such that the out-branch of the tree is of the desired class of destination and such that the principal operator of the description tree is that of the fine tree node being matched.

PHASE 2. For each of the instruction-description trees found:

a) Match the left branch of the instruction description

tree with the "source 1" subtree of the fine tree,

checking for equivalent topology; and

b) similarly match the right branch of the instruction description tree with the "source 2" subtree of the fine tree.

The matching of the branches proceeds as follows. If the remnant of the instruction-description tree is an indirection of a leaf, the leaf must be a class of destination in which is to be left the result of obeying instructions due to the remaining fine subtree: in such cases, note the class of the leaf and a pointer to the subtree. If a leaf of the fine tree is reached without simultaneously reaching an indirection of a leaf of the instruction description tree, the match of the instruction with the fine tree has failed. If operators are reached in both trees (i.e. neither a leaf) then if the operators are different, the match of the instruction with the fine tree has failed; if they are the same operator, for a monadic operator recurse the matching process on the in-branches of the trees, for a diadic operator recurse first for the left in-branches of the subtrees, then for the right in-branches. If at any point in this recursive matching a failure is encountered, the recursive matching is aborted, and the instruction is not considered further. If the recursive matching completes without failure, we have a candidate instruction with a list of subtrees themselves to be successfully generated, each with a desired memory class for the destination of the result to be obtained by executing instructions generated for the subtree.

Thus an instruction description MDL tree can be regarded as a proforma to be matched by the "same shaped" part of the fine tree nearest the root (or, after recursion, by the fine subtree). Phase 2 determines for each candidate instruction whether the proforma it sets fails or survives, and if it survives generates a list of required destination classes, each for a corresponding subtree.

PHASE 3. For each surviving candidate instruction, recurse for each of its leaves as a desired class of memory for the corresponding subtree, as discovered by phase 2. If all the leaves return successful results, generate a gross tree node for the instruction, with the results of the recursions grafted on as the in-branches to the gross tree node, and successfully record a pointer to the new node. If any recursion fails, do not generate a node, and fail to record a result for this instruction.

PHASE 4. Take the set of pointers to gross trees representing instructions successful at phase 3, form a list or array of those pointers, and return a pointer to that as the result for the fine tree (i.e. a pointer to all the alternative gross trees matching the fine tree).

PHASE 5. If phases 1 to 4 failed to produce any gross trees, go through every other memory class, to determine if gross trees exist for which the results can be obtained to another intermediate memory class, and then transferred to the desired class, so that effectively the fine tree had been modified to be of the form:

   source 1
            \
             operator — intermediate destination — ... — desired destination
            /
   source 2

This can be achieved in three sub-phases:

PHASE 5A. For each other memory class, determine if there is a transfer from it to the desired class: i.e. an MDL description:

   ↑other → desired

If not, determine if there are two other classes such that there are MDL descriptions:

   ↑other1 → other2
   ↑other2 → desired

so that a two-step gross/fine tree is possible for a two-instruction transfer sequence. If not, try for a three-step transfer sequence, then a four-step sequence, and so on, until either a (shortest) transfer sequence is found, or not even a transfer sequence as long as the number of classes of memory provides a transfer.

PHASE 5B. For each memory class for which a transfer sequence is found, try phases 1 to 3 for a set of gross trees for the fine tree, with that memory class as desired destination.

PHASE 5C. For each memory class for which a transfer sequence and

a gross tree is found, graft gross tree nodes for the instructions

of the transfer sequence onto the root of the gross tree, build a

list or array of the pointers to the resulting "transferred" gross

trees, and return that as the pointer to the gross trees for the fine

tree.

PHASE 6. If neither phase 4 nor 5 produces a gross tree, then the fine tree cannot be code-generated for the target machine: return an indication that the code-generation has failed. This would normally only occur in recursions of the method: were it to occur at an outermost call it would indicate some error by the implementor - either the IL sequence is un-generatable (suggesting an error in the primary code-generator), or an IL or machine instruction has been mis-specified.
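(To fix ideas, phases 1 to 4 can be sketched as follows in Python, with phase 5 reduced to a prefabricated transfer table and the root cases omitted. Fine trees and description trees are tuples, and ('want', cls) stands for an indirection-of-a-leaf in a description - i.e. "leave the matching subtree's result in cls". The representation and all names are assumptions of this sketch, not of the implemented tools of chapter 7.)

   # Fine-tree nodes: ('leaf', cls) | (op, sub) | (op, left, right).
   # Description patterns have the same shape, with ('want', cls) at the
   # points where a subgoal (subtree, destination class) is to be noted.

   def match(pattern, fine, subgoals):
       """Phase 2: match one description branch against a fine subtree."""
       if pattern[0] == 'want':
           subgoals.append((fine, pattern[1]))
           return True
       if fine[0] == 'leaf':
           return False                       # fine leaf reached too soon
       if pattern[0] != fine[0]:
           return False                       # different operators
       return all(match(p, f, subgoals)
                  for p, f in zip(pattern[1:], fine[1:]))

   def generate(fine, dest, table, transfers):
       """Phases 1-4: all gross trees leaving the value of 'fine' in 'dest'."""
       if fine[0] == 'leaf':                  # leaves go by transfer sequences
           seq = [] if fine[1] == dest else transfers.get((fine[1], dest))
           return [] if seq is None else [('transfer', seq, fine)]
       alternatives = []
       for instr, pattern in table.get((fine[0], dest), ()):   # phase 1
           subgoals = []
           if not match(pattern, fine, subgoals):              # phase 2
               continue
           grafts = [generate(sub, cls, table, transfers)      # phase 3
                     for sub, cls in subgoals]
           if all(grafts):                    # every subgoal must succeed
               alternatives.append((instr, grafts))
       return alternatives                    # phase 4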

The following paragraphs deal with the remaining forms of node in the fine tree.

If the node of the fine tree is a monadic MDL operator, the method of phases 1 to 6 is identical, except that only one recursive match is called for in phase 2.

If the node of the fine tree is a leaf, then instead of phases 1 to 4, the leaf should be of a memory class of the target machine, or by phase 5 there should be a transfer from its IL memory-class to a target-machine memory class (e.g. the LIT or MEM pseudo-instructions suggested in 6.7.1 as extensions to the PDP-8 instruction set).

If the node of the fine tree is the root of the fine tree, then it does not have a "desired destination class of memory", the operator at the node being either assignment or some result-less operator which is an extension of the MDL. If it is an assignment, there are two cases: where the second operand of the assignment is an integer or other atomic form of address, or where it is compound.

If the recipient of the assignment is atomic, the tree can be straightened out by taking the memory class of the recipient as the desired destination class of the result:

   source → recipient    straightens to    source — ... — (recipient)

If the recipient is not atomic, then phase 1 should select the instructions of the target machine whose description trees have assignment as root operator, and which have non-atomic recipients.

If the root node of the fine tree is not an assignment operator,

then it must be matched by a machine instruction with the same

operator at the root node of its description tree. This is a

simple way to deal with operators such as procedure call and

return, goto and label: they would be introduced into the MDL

as equivalents (or near-equivalents) of the corresponding IL

instructions, and would have suitable target machine instructions

or instruction-sequences given as having description trees rooted

on them. They should be simple implementor decisions, not formidable Artificial Intelligence problems.

The remaining paragraphs of this sub-section make some observations

about the method, and concerning the support it needs from other

parts of the compiling system.

If a gross tree is not found for a fine tree or subtree, then there

is no machine code sequence matching the MDL of the tree: this is

so because all possible gross trees have been tried from the root

until they failed.

If at least one gross tree is found by phases 1 to 4 without any transfers at the root, then it may be that omitting phase 5 (i.e. not generating further alternatives) causes a more optimal gross tree to be missed. Consider for example a fine subtree with root operator "£" and two operands both of memory class x, to have desired result of class y - that is, the fine tree:

   ↑x £ ↑x — ... — (y)

and for a target machine with instructions of the following MDL descriptions:

   XTOY      ↑x → y
   XPOUND    ↑x £ ↑x → x
   YPOUND    ↑y £ ↑y → y

Phases 1 to 4 will produce the gross/fine tree and code sequence:

   XTOY; XTOY; YPOUND

Phase 5 as well would find:

   XPOUND; XTOY

Clearly, other factors being ignored, the latter sequence is better.

"Other factors" can influence the choice between the above possibi- lities either way: if x were "short integer", XPOUND were "short add", y "long integer", and YPOUND "Ion/! add", then the first sequence would avoid overflow, and so would often be preferable; but if x were "full integer" and y "short integer" (with appropriate XPOUND and YPOUND additions) then the sequences would have the same the overflow- charac- teristics. The decision to use phase 5 only as a fall back (when phases 1 to 4 had failed) was based on a study of several real machi- nes- The study suggested that,if synthetic counter-example machines were ignored,that for fine trees which yeild gross trees both with and without phase 5, the gross trees yeilded without phase 5 include fewer gross tree instruction nodes. Practical experience of more machines is needed to validate this observation.

The various phases of the method need to refer to different subsets of the MDL descriptions of the target machine instructions.

Instructions with MDL of the form:

   ↑memory-class → another memory-class

are needed only at a leaf of the fine tree if the source address is of an IL memory class, or only at phase 5A if the source address is of a target-machine memory class. In either case, the instruction is requested by its from/to combination of memory classes. Fabrication-time generation of all the transfer sequences from one memory class to another, picking the shortest sequence for any from/to combination, and recording that as if it were a single instruction, will make phase 5A much more efficient. Instructions of the form:

   ↑memory-class → same memory-class

do not yet have a use: one will be found in 6.8. Other instructions of form:

   operator ... → memory-class

or of form:

   ... operator ... → memory-class

are selected only by phase 1, and are requested by the operator/memory-class combination. The only remaining instructions are of form:

   ... → non-atomic address

and are only used at the root of a fine tree corresponding to an MDL instruction.

Clearly the fabricator should pre-classify the machine instructions as above, and place them into groups with the same lookup requirements.

The pre-fabrication of transfer sequences has another important consequence: it avoids the need for inefficiencies in the secondary code-generator which would otherwise be needed to prevent cycling through a sequence of transfers in phase 5A. Suppose phase 5A needs a transfer from class a to class z, when there are transfers between each pair of a, b and c, but none between any of those and class z. The search could cycle round classes a, b and c indefinitely: simply trying to stop it by insisting that we never get back to the start-class still allows the step from a to (say) b, and then to cycle indefinitely between b and c. To guarantee that failed non-prefabricated phase 5A searches stop at failures requires either building all combinations (including cycles) of length up to the number of memory classes (that is, if there are m memory classes, m^m combinations of length m, taking time proportional to m·m^m at code-generate time) or checking at every step that we have not arrived somewhere we have been before in this combination (that is, doing many expensive tests). Such inefficiencies are acceptable only at fabrication time, and then once, but are not acceptable at every call of phase 5A during every secondary code-generation.
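(The prefabrication amounts to one shortest-path search per memory class, done once at fabricate time. A Python sketch, assuming transfers maps a (from-class, to-class) pair to the single transfer instruction where one exists; because breadth-first search visits each class at most once, it terminates even when classes a, b and c transfer among themselves.)

   from collections import deque

   def prefabricate_transfers(classes, transfers):
       """Record, for every ordered pair of memory classes, the shortest
       transfer-instruction sequence, as if it were a single instruction."""
       table = {}
       for start in classes:
           shortest = {start: []}
           queue = deque([start])
           while queue:
               cls = queue.popleft()
               for nxt in classes:
                   if nxt not in shortest and (cls, nxt) in transfers:
                       shortest[nxt] = shortest[cls] + [transfers[(cls, nxt)]]
                       queue.append(nxt)
           for dest, seq in shortest.items():
               if dest != start:
                   table[(start, dest)] = seq
       return table
       # A pair absent from the table fails at once at code-generate time.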

Within the machine code sequence corresponding to an MDL tree, the method produces optimal or near-optimal code. This conforms well to the folklore that the most important objective of optimisation is "good local code". Between sequences corresponding to MDL trees, the method makes no attempt at optimisation. This would need to be the responsibility of the primary code generator or of some third agent, such as a window optimiser. Either the primary code-generator or a window optimiser for the IL could help the secondary code generator in three ways: it could produce

IL that is more explicit (i.e. lower level); it could produce IL that yields "bigger" MDL trees; or it could provide more medium-term information, such as assignments from a location with a complex address to one of simple address before a volume of IL that uses

the location heavily, then using the simply addressed location in

the following IL. If a store is done to the address during the

IL, then a final store-back to the complex address is needed.

(Alternatively, instead of shadowing the location of complex address, the address itself could be shadowed, and used by reference. Which is desired might depend on the address structure of the target machine: but referencing via a shadowed address has the advantage that jumps out of the volume of IL need not provoke a store-out, whereas shadowing the location would have to. Both techniques become more complicated if there is the possibility of a jump into the volume of IL.) An IL-window

optimiser could also remove silly redundancies across borders

in the IL corresponding to the separation in the primary code generator between different high-level instructions. A window

optimiser could also be applied to the machine code output of the secondary code generator, as well as (or instead of) the IL input: it would have all the same advantages, but would be operating on different information and so could often produce different effects. It is likely that varied languages on a selection of machines would have different window-optimisation requirements, so whether to use it on the IL, or on the machine code, or neither, or both, should be an implementor decision - probably after benchmarks.

Since the primary object of this research was to get something that worked, and to only consider optimisation either when it was easy and obvious, or when it was imperative because adding it later would be excessively difficult, window optimisation has not been deeply considered in this research. Recently there have been a number of publications suggesting that window optimisers can relatively easily be generated from IL or machine descriptions in MDL-like notations, at least for small window sizes ("peephole optimisers"), mostly using localised symbolic execution methods /DAV80/, /KIN81/. Other methods have been suggested for specific languages, specific ILs, or specific machines, and some seem capable of generalisation /KE77/, /CAT78/, /PER79/, /RUD79/. There is also a vast body of literature on medium- to long-range optimisation deriving from control flow and data flow analyses: although in general very abstract as methods, it is hard to see how to implement them without classes of information not derivable from MDL-like notations: a notable exception is /BOY78/.

6.7.1.3 Another demonstration of the method

This sub-section demonstrates the code generation of the example fine tree (without the "+0" redundancy: i.e. the tree of the MDL instruction ↑j + ↑(k+↑↑f) → ↑(i+↑f)) for the PDP-10 computer as target, described in terms of the phases of 6.7.1.2.

The necessary PDP-10 instructions are described below in exhaustive form. The instructions including addition in their MDL are also given in commuted form: for the instruction with two additions in its MDL there are 2*2 (=4) cross-commutations. The fabrication-time classification is also shown.

AT PHASE 5A:
                     MDL DESCRIPTION     FROM-CLASS   TO-CLASS
   MOVE  r1,m        ↑m → r1             m            r
   MOVEM r1,m        ↑r1 → m             r            m

AT PHASE 1 FOR A ROOT:
                     MDL DESCRIPTION
   MOVEM r1,(r2)     ↑r1 → ↑↑r2
   MOVEM r1,m(r2)    ↑r1 → ↑(m+↑r2)
                     ↑r1 → ↑(↑r2+m)

   (Root-type MDLs could be classified by the two secondary operators: e.g. (↑/↑) for MOVEM r1,(r2) and (↑/+) for MOVEM r1,m(r2).)

AT PHASE 1 BUT NOT AT A ROOT:
                     MDL DESCRIPTION          PRINCIPAL OPERATOR   REQUIRED TO-CLASS
   MOVE r1,(r2)      ↑↑r2 → r1                ↑                    r
   MOVE r1,m(r2)     ↑(m+↑r2) → r1            ↑                    r
                     ↑(↑r2+m) → r1            ↑                    r
   ADD r1,m          ↑r1 + ↑m → r1            +                    r
                     ↑m + ↑r1 → r1            +                    r
   ADD r1,m(r2)      ↑r1 + ↑(m+↑r2) → r1      +                    r
                     ↑r1 + ↑(↑r2+m) → r1      +                    r
                     ↑(m+↑r2) + ↑r1 → r1      +                    r
                     ↑(↑r2+m) + ↑r1 → r1      +                    r
   ADDI r1,m         ↑r1 + m → r1             +                    r
                     m + ↑r1 → r1             +                    r

It is assumed that the fine tree leaves f, i, j, k are of memory class m, so that in all the following they will be written as m_f, m_i, m_j, and m_k. Consequently no LIT-type pseudo-instructions are needed to convert from the IL-oriented fine tree to the target-machine-oriented fine tree: they are the same.

Note that for the PDP-10 description, all the "m+" and "+m" fragments can only be matched at leaves, because of the absence of an indirection between the m and the addition.

The root zone of the fine tree is of the form (...+...) → ↑(...+↑...). All of the root-phase-1 descriptions are of the form ↑r1 → ..., so at phase 2 matching, the left hand side of the assignment must match ↑r1, and so is to be recorded as a subgoal with destination r1 for phase 3.

The three right-hand sides of the root-phase-1 descriptions are ↑r2, (m+↑r2) and (↑r2+m) (each under the outer indirection), and are to be matched at phase 2 with (m_i+↑m_f). The first of the three, ↑r2 (MOVEM r1,(r2)), succeeds in the match, and creates a subgoal in which (m_i+↑m_f) is to have destination r2. The second, (m+↑r2) (MOVEM r1,m(r2)), succeeds in the match with m matching m_i, and creates a subgoal in which ↑m_f is to have destination r2. The third, (↑r2+m) (also MOVEM r1,m(r2)), fails because it would need a subgoal in which ↑m_f goes to the address m itself, rather than the location of address m: that is, there is no indirection before the m in (↑r2+m), and so it cannot match anything other than some m_x, as observed previously.

The subgoal (m_i+↑m_f) with destination r2 selects the last 8 MDL descriptions in the table for "phase 1 not at root" - i.e. those with destination class r and principal operator +. Phase 2 eliminates all of these except m + ↑r2 → r2 (ADDI r2,m, in which r1 has been renamed r2), either because operators mismatch, or because of requirements to produce a result to the address m itself.

The one survivor matches the m of the ADDI to m_i, and creates a sub-subgoal of ↑m_f to r2. For the sub-subgoal, there is only one phase 5A candidate MDL description, ↑m → r2 (MOVE r2,m), with m matching m_f and r2 matching r2 at phase 5B. The sub-subgoal and the subgoal therefore succeed, so if the other subgoal at the root succeeds, these three goals will produce the gross tree/fine tree relationship:

   [diagram: MOVE r2,m_f; ADDI r2,m_i; MOVEM r1,(r2) grafted over the fine tree]

m. is r2) ( mf — ~ —(r2) rf > MOVE r2,m ADDI r2,m. MOVEM rl,(r2) ,240

The subgoal "m^ with destination r). due to the other candidate instruction at the root, ^MOVEM rl.m (r2j£7 » is the sam$- as the sub-subgoal above: so if the other subgoal at ;the root succeeds, this candidate will produce an alternative gross tree to the above, with the following fine tree/gross tree relation- ship .

...(rl>

MOVEM rl,m.(r2) 1

When the subgoal with destination r1 has succeeded, the two rival gross trees will be produced at phase 4, and later the two-instruction subtree will be preferred to the three-instruction subtree. The r1 subgoal will be identically attempted for each of the two gross subtrees above: the optimisation suggested in 6.7.4 will prevent this inefficiency by eliminating the three-instruction alternative at fabricate time, and will also discover that the secondary code generator never needs to build the MDL subtree, but can directly generate the gross subtree fragment:

   [diagram: the prefabricated two-instruction gross fragment, ending at (r1)]

We now deal with the other subgoal from the root of the fine tree: the subgoal ↑m_j + ↑(m_k+↑↑m_f) with destination r1, due to either candidate at the root. It has destination class r and principal operator +, so selects the last 8 MDL descriptions in the "non-root phase-1" table. Phase 2 eliminates all of these save:

   INSTRUCTION     DESCRIPTION              MATCHES      SUB-SUBGOAL(S)
   ADD r1,m        ↑m + ↑r1 → r1            m with m_j   ↑(m_k+↑↑m_f) for r1
   ADD r1,m(r3)    ↑r1 + ↑(m+↑r3) → r1      m with m_k   ↑m_j for r1, and
                                                         ↑↑m_f for r3

Recursing for the sub-subgoals gives the following gross trees:

   SUB-SUBGOAL              ALTERNATIVE GROSS SUB-SUBTREES
   ↑(m_k+↑↑m_f) for r1      MOVE r4,m_f; MOVE r3,(r4); MOVE r1,m_k(r3)
                            MOVE r4,m_f; MOVE r3,(r4); ADDI r3,m_k; MOVE r1,(r3)
   ↑m_j for r1              MOVE r1,m_j
   ↑↑m_f for r3             MOVE r4,m_f; MOVE r3,(r4)

Grafting these into the two candidates gives the gross trees following as alternatives for the other subgoal at the root. (Note: at the starred node, there is again the longer alternative.)

   [diagram: the two alternative gross trees for the value subgoal]

Returning to the root of the gross tree, the second candidate has succeeded with two alternatives, each of four instructions. The whole translation of I:=J+K; therefore takes six PDP-10 instructions.

Two of the PDP-10 instructions are of form MOVE r,m, used to get ↑f into memory class r. The implementor could decide to reserve a location of class r for ↑f, rather than a location of class m.

There are several ways this could be done. The first is to treat

↑f as ↑r_f (rather than ↑m_f). The second is to introduce a pseudo-instruction NULL with description ↑f → r; but later we will see that if locations of class r are scarce (there are only 16) then this complicates register allocation. The third is to repeat every MDL description in which r appears, but with the r replaced by f: this could be done at fabricate time, and for some target machines will lead to a bigger but faster code-generator - for other target machines it will lead to a bigger and slower code generator.

6.7.2 Register allocation on the gross tree.

There is a large body of work on generalised register allocation: some of the more notable recent contributions are /LEV81/, /SIT79/ and /WEB80/. Almost all of this work attempts the global problem in isolation from other parts of the compiler, usually from a linear form of the chosen instructions after data-flow analysis, or from late versions of IL. For the more limited purposes of this thesis, the problem is attempted only locally, and from the first version of the gross tree. All of the alternatives and combinations of alternatives are treated as representing distinct gross trees:

   [diagram: a "subtree" branch feeding a gross node, the gross node's other in-branch carrying "alternative 1 OR alternative 2"]

would be treated as representing both of the following:

   [diagram: the same gross node with alternative 1 as its in-branch, and the same gross node with alternative 2 as its in-branch]

The register allocation method described below may in some cases modify one alternative gross subtree in a manner incompatible with other alternatives, so that the point at which the alternatives are introduced in the common representation may move. For example, if register allocation modified the first of the two trees shown as being represented above by inserting nodes ("inserted nodes 1") on the branch from alternative 1, and the second by inserting different nodes ("inserted nodes 2") on the branch from alternative 2, then in the combined form representing them the "gross node" must be duplicated, and the branch in which the "OR" appears must be superior to the "gross node":

   [diagram: the OR now selects between two copies of the gross node, one reached through inserted nodes 1 from alternative 1, the other through inserted nodes 2 from alternative 2; the common "subtree" branch feeds both copies without being copied]

This requires some programming that is rather complex, but introduces no theoretical problems. Common subtrees do not need copying (as illustrated in the diagram).

The method given in 6.7.1 proceeds from the root of the fine tree to the leaves, hypothesising gross tree nodes, then returns from the leaves to the root building successful gross tree nodes

(i.e. for those hypotheses of which all the sub-hypotheses succeeded).

Register allocation can therefore be merged in with the root-to-leaf

process or with the leaf-to-root process.

Following the observations made on the Sethi-Ullman method in appendix

C, it would appear simplest to do the register allocation in the

leaf-to-root process, inspecting each branch of the new gross trees

to see whether it represents a result to be held in a class of

memory that is plentiful or scarce. Branches that represent

"plentiful" memory are assigned a location with no attempt to re-use

these locations (in one gross tree: there is no clash if they get

re-used in another gross tree). For such a branch, the subtree

dependent on it can be separated from the rest of the tree, so long

as the target machine code generated from it is completed before the

result is needed. (This is the basis of the instruction-ordering algorithm of 6.8.) Results lying for quite long times in plentiful memory cost

nothing, and isolate the allocation in that sub-tree from that in

all other sub-trees.

Branches that represent "scarce" memory classes must be accounted for in the "bandwidth" manner of appendix C.2.2, as analysed from the Sethi-Ullman method. During the leaf-to-root process, a numeric label is kept, for each branch, of the bandwidth of each scarce memory class used in the subtree from which it leads: if a node has two in-branches with equal-highest labels of one scarce class, then the out-branch needs a label one more than that; otherwise the highest in-branch label is chosen as the label for that class of the out-branch at that node. A branch that represents a result in "plentiful" memory "cuts" all allocations in the manner of C.2.2 - that is, as an in-branch it has a number of zero for each scarce memory-class.
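(The labelling is one bottom-up computation per scarce class. In the following Python sketch it is assumed that each gross node carries the memory class of its out-branch, node.cls, and its in-branch nodes, node.subnodes; the sketch also folds in the n+h-1 generalisation discussed at the end of this subsection.)

   def label(node, scarce, plentiful):
       """Bandwidth label of 'node' for one scarce memory class."""
       if not node.subnodes:                  # a leaf
           return 1 if node.cls == scarce else 0
       labels = [0 if sub.cls in plentiful    # a plentiful result "cuts"
                 else label(sub, scarce, plentiful)
                 for sub in node.subnodes]
       best = max(labels)
       if best > 0:
           # n in-branches of equal-highest label h need n+h-1 locations
           best += labels.count(best) - 1
       if node.cls == scarce:
           best = max(best, 1)                # the node's own result
       return best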

At a node that would exceed the allowance of some class, the allocation algorithm must introduce a "cut" in one of the in- branches equalling that allowance. A cut can be achieved in one of two ways.

The first way of making a cut is, in its simplest case, to introduce two new instructions on the branch: a store to a plentiful memory class, and a load back from it. As suggested in 6.7.1.2, this can be simplified by recognising at fabricate time all cyclic sequences of memory-class-to-memory-class transfers: e.g.

   On the PDP-10:  MOVE r,m     ↑m → r
                   MOVEM r,m    ↑r → m

giving the sequences

   ↑m → r; ↑r → m   (with effect ↑m → m)

and

   ↑r → m; ↑m → r   (with effect ↑r → r)

   On the Z80:  LD a,b   ↑b → a
                LD a,m   ↑m → a
                LD b,m   ↑m → b
                LD m,a   ↑a → m

(and others) giving sequences including

   ↑a → m; ↑m → b; ↑b → a   (with effect ↑a → a)

and the shorter sequence

   ↑a → m; ↑m → a   (also with effect ↑a → a)

From such sequences, the fabricator should keep all those from and back to a scarce class of memory via a plentiful one. If there are several sequences cycling from one particular class, only a "cheapest" (in some sense) sequence should be kept: this prevents introducing alternatives which will only be deleted later. To make a cut, the prefabricated sequence is introduced on the branch.

If there is not a prefabricated sequence for the class, some kind of failure action is needed: but in a survey of 22 real machine architectures, ancient and modern, none had such a deficient memory class within reasonable interpretations of "plentiful" for the machine.

The disadvantage of this method of cutting is that the load at the end of the cutting sequence may need optimising into the node: for example a PDP-10 sequence:

   MOVE r1,m    (MDL is ↑m → r1)
   ADD r2,r1    (MDL is ↑r1 + ↑r2 → r2)

will need optimising into:

   ADD r2,m     (MDL is ↑m + ↑r2 → r2).

If the fabricator finds several idempotent sequences from a class via another, it preserves the cheapest in the sense of the next section, 6.7.3.
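(The fabricate-time choice of cut sequences can reuse the prefabricated transfer table sketched earlier. Another Python sketch, with cost standing for the pricing of 6.7.3; all names are illustrative.)

   def cut_sequences(scarce_classes, plentiful_classes, transfer_table, cost):
       """For each scarce class, keep only the cheapest store/load round
       trip through a plentiful class: the prefabricated "cut"."""
       cuts = {}
       for s in scarce_classes:
           round_trips = [transfer_table[(s, p)] + transfer_table[(p, s)]
                          for p in plentiful_classes
                          if (s, p) in transfer_table
                          and (p, s) in transfer_table]
           if round_trips:
               cuts[s] = min(round_trips, key=cost)
       return cuts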

The second method of introducing a cut is to re-try the expensive sub-tree(s) for new gross trees, but enabling phase 5 of the method even though there were phase 4 successes. This may produce better results than the first cutting method, or may fail (so that the first method must be used). The survey of 22 machines suggested that this expensive technique would seldom produce better results.

All the above assumes that no instruction can have more than two equally-labelled in-branches for one class. If there were n in-branches with equal-highest label h, then the label on the out-branch would need to be n+h-1 (that is, h for the last-evaluated, and at the same time (n-1) for the results of the other (n-1) in-branches). If this exceeds the allowance by m, then m of the in-branches must be cut. A survey of 22 actual machines produced none that had an instruction that used three or more distinct operands of the same scarce memory class, so in practice this situation does not appear to arise often. (Indeed, 9 of the 22 machines had a single accumulator, a plentiful main memory, and no other memory classes: like the PDP-8. The PDP-8 example below will show this case to be rather trivial.)

The following example shows the allocator introducing "ALLOC" and "DEALLOC" pseudo-instructions into the gross tree, to make it obvious where allocated memory locations are introduced and dismissed. The pilot implementation of chapter 7 instead marks the node whose out-branch is the nearest to the root representing a result in a particular scarce-class location.

Either method makes subsequent assignment easy. Also, the pilot implementation uses only the first method of cutting, and for simplicity of programming and for more informative trace output of experiments, it has had the allocation programmed as a separate pass over the tree, rather than being merged in with the leaf-to-root pass of building the gross tree.

6.7.2.1 The PDP-8 example

The PDP-8 has only one accumulator: and that is the only scarce memory class. Consequently, any instruction can only have one in-branch representing a result in the accumulator, and so any other in-branch must be in (plentiful) main memory. There therefore cannot be any need to introduce

"cuts" in the gross tree: all the allocate pass need do is

introduce"ALLOC" and "DEALLOC" instructions to assist later

assignment of main-memory locations. Figure 6-8' (en- the next page)

also shows them for the accumulator, as if there were several

accumulators: but (as remarked above) at no point can two of

these "virtual accumulators" co-exist. No Horwitz-Karp-Miller-

W . ~grad method or similar (appendix C.3) for assignment of these

"virtual registers" is needed - they can simply be ignored.

6.7.3 Choice of alternative machine code gross trees.

This can only take place after the register allocation

phase, so that the instructions in the alternatives have been finalised, and thus so that the alternatives can be costed.

Like the register allocation, choosing among the alternative

gross trees is straightforeward. On a pass from leaves to root,

some costing algorithm is used to "price" each branch. At a

branch with several alternative gross subtrees, all but the

cheapest are pruned out. On arrival at the root, one tree,

containing no alternatives, is left. If, at an alternative,

there are two or more equal-cheapest contenders, then some

tie-breaking rule is applied to eliminate all but one.

The only costing algorithm tried yet is to put a price on each kind of instruction, and to sum the prices of all the instructions in a gross subtree. This is easily done at each node by summing the costs of its subordinates and of the instruction of the node itself. At a leaf node of the gross tree, the cost is the price of the instruction at the leaf.


Figure 6-8 The PDP-8 gross tree of the I:=J+K; example, after allocation. (The identifying digit of each virtual accumulator has been appended to the mnemonic of each instruction that uses it.)

To optimise for compactness of code, the price of an instruction is the size of the instruction. To optimise for speed, the price of an instruction is the time it takes to execute, in some reasonable measure. To obtain a hybrid costing, assuming both size and time are small integers (e.g. number of bytes and number of machine cycles respectively), then pricing each instruction at (say) 10*bytes+cycles would optimise primarily for compactness, with speed as a tie-breaker, but would not produce slightly more compact code at the price of excessively reduced speed.

The pilot implementation described in chapter 7 uses addition of prices given as numbers in the instruction descriptions. The tie-breaker used is to keep the first cheapest encountered. For simplicity of programming, and for more informative tracing of the costing, the pilot implementation makes a separate pass for choosing the cheapest alternatives.
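The pricing-and-pruning pass can be sketched as follows; the representation (each operand position holding a list of rival subtrees) is an assumption of the sketch, not the pilot system's.

from dataclasses import dataclass, field
from typing import List

@dataclass
class GNode:
    price: int                                    # price of the instruction at this node
    operands: List[List["GNode"]] = field(default_factory=list)
    # each operand position holds one or more rival subtrees

def cost(node: GNode) -> int:
    # Leaf-to-root: price each rival, keep the first equal-cheapest
    # (the tie-break described above), and prune the rest.
    total = node.price
    for i, rivals in enumerate(node.operands):
        priced = [(cost(r), r) for r in rivals]
        best_cost, best = min(priced, key=lambda p: p[0])   # min keeps the first minimum
        node.operands[i] = [best]
        total += best_cost
    return total

With instruction prices set at 10*bytes+cycles, as suggested above, the same pass optimises primarily for compactness with speed as the tie-breaker.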

6.7.4 Optimisation of the method of instruction choice

It is desirable at fabrication time to do as much as possible of the instruction choice of alternatives, register allocation in the alternatives, and choice between the alternatives. The method below attempts this, and as a side-effect minimises the need to build the MDL tree at secondary code-generation time, instead often facilitating direct translation from an IL gross tree to a finalised target-machine gross tree. The MDL tree, the alternatives, the register allocation, and the costing and choice of alternatives are all only needed when the gross-to-gross translation fails.

Consider a gross-to-gross translation of the example IL instruction

SF i, which has gross/fine relationship:

If the fabricator attempts the methods of 6.7.1, 6.7.2 and 6.7.3 on the fine tree in this, assuming that the value to be stored can be loaded to the accumulator, then for the PDP-8 it produces the gross/fine relationship:

(diagram: the PDP-8 gross/fine relationship for SF i, with nodes LIT i, LOADD f, ADDD, STORED u and STOREI u grafted over the fine tree, linked by (ac) and (m) branches.)

There is no new difficulty in this. Thus the fabricator can realise that whenever the input IL contains an SF instruction, the following PDP-8 gross tree is to be generated:

The fabricator will therefore configure the secondary code generator to make this gross-to-gross transformation on reading the SF instruction, without even needing to build the IL gross tree.
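As an illustration of this configuring (all names here are invented for the sketch, and the general fall-back is only stubbed), the fabricator's output can be viewed as a table from IL opcode to prefabricated target-machine gross tree, consulted as the IL is read:

PDP8_FOR_IL = {
    # IL opcode -> prefabricated PDP-8 gross tree; the value stands in
    # for whatever tree representation the generator uses
    "SF": "...the PDP-8 gross tree shown above...",
}

def translate(il_ops):
    # Direct gross-to-gross translation while reading the IL stream;
    # only unmatched ILs fall back to the general MDL-tree method.
    for op in il_ops:
        if op in PDP8_FOR_IL:
            yield PDP8_FOR_IL[op]
        else:
            yield general_mdl_method(op)

def general_mdl_method(op):
    # placeholder for the method of 6.7.1-6.7.3
    raise NotImplementedError(f"no direct equivalence for {op}")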

Two problems arise in other kinds of case.

The first is that the IL sequence to produce the result to be stored by the SF may be fabricated as PDP-8 instructions that leave the result in a class of memory other than that required - for the PDP-8, in main memory, rather than the (required) accumulator.

There are two solutions to this. The first is to secondary-code-generate the two ILs separately, then introduce a phase-5-like transfer sequence to link them. This has the disadvantage that if another translation of the value-producing IL is possible with its result in another place, then a "cheaper" transfer sequence may be possible, or even no transfer sequence at all if another translation leaves its result in the correct class of memory. As a consequence, the second solution is that the fabricator discovers all possible translations of an IL, for all possible result and operand memory classes, and the secondary code generator must make its choice from the viable combinations. The resulting method in the secondary code generator is just a version of that of 6.7.1, 6.7.2, and 6.7.3, but using a tree containing many fewer nodes. (17) The larger the MDL of each IL instruction - i.e. the more functionality in the IL instructions - the greater the difference in the number of nodes, and the greater the consequent optimisation of the secondary code generator. Many secondary optimisations are possible with this second solution: for example, if the fabricator discovers two target machine sequences for an IL, the "costs" of the two sequences are C1 and C2, and there is a transfer sequence of cost C12 from the result class of the first sequence to that of the second such that C1+C12 < C2, then the sequence of cost C1 followed by the transfer sequence of cost C12 is cheaper than the sequence of cost C2: the fabricator should discard the latter sequence.
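That discarding rule can be sketched as follows, assuming the candidates for one IL are (cost, result-class) pairs and that transfer_cost maps a (source, destination) pair of memory classes to the cost of the prefabricated transfer sequence between them (absent when there is none):

def prune_dominated(candidates, transfer_cost):
    # Keep a candidate only if no rival plus a transfer sequence is cheaper.
    keep = []
    for c2, cls2 in candidates:
        dominated = any(
            c1 + transfer_cost.get((cls1, cls2), float("inf")) < c2
            for c1, cls1 in candidates
            if (c1, cls1) != (c2, cls2))
        if not dominated:
            keep.append((c2, cls2))
    return keep

# e.g. with hypothetical PDP-8-like figures:
print(prune_dominated([(2, "ac"), (7, "m")], {("ac", "m"): 2}))
# -> [(2, 'ac')]  - the cost-7 form is dominated, since 2+2 < 7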

(17) If this second solution fails, it invokes the first solution as its phase 5.

The second problem arising is more severe: it is possible that some IL instructions do not match target machine instructions or instruction sequences, and the translation is not a simple macro-expansion with re-ordering. The remainder of this subsection deals with this problem, using the remainder of the stack-IL/PDP-8 example.

Superimposing both the IL and the PDP-8 gross trees on the fine tree, and the PDP-8 memory classes, but not the grafted-out introduced indirections (which are different for the two trees), gives the following relationship.

(diagram: the superimposed trees; the PDP-8 node LOADD j and the IL node LG j coincide at the (ac) branch, while ADDI t divides the IL nodes.)

(in the diagram, dotted lines apply to the PDP-8 gross tree, dashed lines to the IL gross tree, solid lines to both.)

The LOADD j and LG j match perfectly. The following discusses processes for the fabricator and secondary code generator to almost avoid constructing any MDL at secondary code generation time, by making all other ILs have matches.

First, the fabricator explores the possibilities, working from results to operands in the IL instructions. The SF instruction had a translation to PDP-8 instructions that required the accumulator to contain a value. Therefore, any value-producing IL may need its PDP-8 translation to have a result in the accumulator. Considering the MDL of the ADD value-producing IL instruction for a result, the fabricator must consider PDP-8 instructions that would have MDL of the form ... + ... -> ac, and also PDP-8 instruction sequences of which the tree after grafting would reflect that form. There are four such instruction descriptions, and no sequences:

ADDD "ac + "m ac ADDD "m + "ac ac ADDI "ac + ""m ->• ac ADDI ""m + "ac -V ac

For the first two, ADD IL is a perfect match, but memory class m must be added to the list of destination classes required for all results. For the two latter instructions, (m)-- " --... must be similarly prescribed as a destination, meaning that for any IL with MDL description of form "(...) -> §, a fabrication must be attempted for (...) -> m, where the ellipses denote the same MDL expression.

Only two of the IL instructions have definitions in which indirection is the principal operator: one is:

LI c    "(c+"§) -> §

For this instruction therefore, a fabrication must be attempted for c+"(...) -> m. This produces only two possibilities, both PDP-8 instruction sequences:

PDP-8 gross tree                       Net MDL effect

LIT c - LOADD - ADDD - STORED          c + "m -> m
LIT c - LOADD - ADDI - STORED          c + ""m -> m

The first of these demands an operand PDP-8 gross tree with result in m; the second (like ADDI above) demands that any IL with MDL description "(...) -> § needs a fabrication of (...) -> m.

The other example IL instruction with indirection as principal operator is LFZ, with description:

LFZ    ""f -> §

To use the first form of fabrication of LI, we would need to fabricate LFZ for a result in memory: i.e. a gross-tree-fragment with MDL effect ""f -> m. This yields:

LOADI f - STORED x    (the result in m-class location x)

To use the second form of fabrication of LI, we would need to strip the principal indirection from the MDL of LFZ and take the desired destination of class m: that is, fabricate a gross-tree-fragment with MDL effect "f -> m. This is just a null operation: all that is needed is to note that the particular memory location is f.

The first translation of LFZ/LI c above could be verbally described thus: "the LI demands an operand of class m, which therefore is the required destination class for the LFZ". The second translation does not have such a neat verbal description: "the LI demands an operand whose principal operator is indirection, and requires fabrication for it, without the indirection, to class m".

Translations of the first type have so far been diagrammed as a simple grafting on the memory class:

LFZ --(§)-- LI k    yields    ...--(m)--...

so that the LFZ can have its translation diagrammed as follows,

if the objective is of class m:

LFZ --(§)...    is equivalent to    LOADI f - STORED x --(m)...

The second form of objective would need something like:

LFZ --(§)...    is equivalent to    ...--(mf)-- " --...

As a diagrammatic form this is inconvenient: so we will instead state that a target is for an indirection from memory by rotating the graft point, as ("m). The above diagram would be redrawn as:

LFZ --(§)...    is equivalent to    ...("mf)...

We can revise the verbal description thus: "the LI demands an operand of class "m, which therefore is the required destination class for the LFZ". Using this fiction of a 'memory class "m', the fabrication of the LI c would be that:

(§)-- LI c --(§)

is equivalent to:

("m)-- LIT c - LOADD - ADDI - STORED --("m)

We do not now need complicated explanations that the target of the gross fragment itself, and of its operand, must first ignore an indirection: that is built in.

Figure 6-9 on the next page gives the resulting relationship between the fine tree, the IL gross tree, and the optimised form of the PDP-8 gross tree. The diagram clearly shows the tidying of the relationship between the two gross trees due to the more complex form of grafting points. Figure 6-10 on the subsequent pages shows the IL-gross-to-PDP-8-gross equivalences we have observed above, in the revised form of diagram.

Figure 6-9 Relationship of the IL gross tree, the fine tree, and the optimised form of the PDP-8 gross tree. (The solid lines represent the IL gross tree. The dotted lines represent the PDP-8 gross nodes dividing the IL gross nodes. Note the null PDP-8 operations yielding tree fragments of form (m)-- " --("m), which just represent the fact that (m)-- " --... is the tree form of the expression "m.)

Figure 6-10 Some fabricated optimised IL-to-PDP-8 equivalences.

(Note: for conciseness of presentation, some instruction ordering has already been carried out, as foreshadowed in this subsection. Also, allocation pseudo-operators have been introduced - they have so far been ignored in the present subsection, as they make no theoretical difference, but would have confused the presentation. The forms shown below are therefore very much those that the fabricator would configure into the secondary code-generator.)

IL        PDP-8

(Figure 6-10 body: for each of SF c, ADD, LI c, LFZ and LG c, the alternative PDP-8 gross trees - the STORED/STOREI forms for SF, the four ADDD/ADDI forms for ADD, the LIT c - LOADD - ADDD/ADDI - STORED sequences for LI c, LOADI f for LFZ, and LOADD c for LG c - with grafting points of the (m), (ac) and ("m) forms.)


Clearly, a larger IL or a larger machine code will yield many more equivalences, and may need target 'classes' incorporating operators other than indirection.

The fabricator would analyse all the desirable equivalences and configure them into the secondary code generator, which would then apply them in the manner of the method of 6.7.1, except with the rather more complex forms of "required destinations". Much of the work will have been done at fabrication time, including the elimination of a substantial proportion of non-optimal target machine alternatives.

There are two major differences from the method of 6.7.1.

The first is that the method can (and must) be combined with the parsing of the IL. This yields a set of target-machine gross trees which can be register-allocated, costed, and chosen either in that same parsing/building pass, or in separate passes: this is a trade-off between programming complexity and program speed.

The second major difference is that the unit of building is no longer necessarily a single node, but often a combination of nodes. This doesn't change the theory, but does cause programming complexity, and greater difficulty in representing the equivalences efficiently in the secondary code-generator. To some extent, however, this last can be ameliorated by having the fabricator do as much instruction-ordering as possible, as discussed in the next section.

6.8 INSTRUCTION ORDERING FROM THE TARGET-MACHINE GROSS TREE.

There are three kinds of case in the target-machine gross tree to be considered:

* pseudo-instructions that convert IL leaves into target-machine leaves, and are to be incorporated into "real" instructions;

* sequences of nodes with the out-branch of each being the in-branch of the next, and with no side-branches (call these "linear sections"); and

* nodes with multiple in-branches (call these "fork points").

These are dealt with, in order, in the following three subsections. There is also the need to assemble into the nodes on either side the "decorations" on grafting points on target-machine branches. This is very much akin to the first case above, which is also really an assembly matter, rather than an instruction-ordering matter. Assembling the decorations is therefore also dealt with in the first subsection below.

At the end of each subsection, the case of the subsection is shown for the example PDP-8 gross tree.

6.8.1 ASSEMBLY CASES OF INSTRUCTION ORDERING.

It is simpler and more convenient to program the "assembly" cases before the true instruction-ordering cases of the next two subsections, because they would otherwise confuse which instruction an in-branch leads to (and, for the "decorate" case, the instruction an out-branch leads to).

If instructions like the "LIT" instruction introduced for the PDP-8 are specially marked in their descriptions, they should be "assembled inline" as follows. At a configuration such as:

LIT x --(m)-- LOADD

the "LIT x" is really an operand to the "LOADD", and so should be made part of the LOADD instruction:

LOADD LIT x

It doesn't really affect this transformation if the node superior to the pseudo-instruction has other in-branches: for example

LIT x --(m)-- ADDD, with another in-branch ...(ac)--

can be converted directly to:

...(ac)-- ADDD LIT x

This has the side-advantage of linearising many fork points near the leaves of the tree, so that the simple case in the next subsection applies, rather than the more complex case in the subsection after.
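A sketch of this inline assembly, on an assumed minimal node shape (not the pilot system's representation):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    instr: str
    children: List["Node"] = field(default_factory=list)

PSEUDO = ("LIT",)     # instructions marked "assemble inline" in their descriptions

def fold_pseudo_operands(node: Node) -> Node:
    # Depth-first: absorb pseudo-instruction leaves into their superior
    # node, e.g. ADDD with a LIT x in-branch becomes "ADDD LIT x".
    kept = []
    for child in node.children:
        fold_pseudo_operands(child)
        if not child.children and child.instr.split()[0] in PSEUDO:
            node.instr += " " + child.instr
        else:
            kept.append(child)
    node.children = kept
    return node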

Other branches than those from pseudo-instructions should be decorated, unless the branch represents a memory class with one location only (e.g. the "ac" class of the PDP-8). (18)

(18) Pseudo-instruction out-branches can be regarded as decorated: for example, LIT x --(m)--... could be regarded as being equivalent to ...--(m LIT x)--... This turns the pseudo-instruction case into a "decorated leaf". Thus the two cases dealt with in this section are essentially similar.

For a decorated in-branch or out-branch, the decoration is the operand of the instruction at the node, or one of the operands of the instruction at the node. For example, the PDP-10 instruction "ADD r,m" should be assembled in that format, with the number of the register to be used replacing "r", and the name of the memory location (or its address) replacing the "m". A node such as

...(r5)-- ADD --(r5)...
   (mX)--'

should therefore be converted into:

...(r)-- ADD 5,X --(r)...

There are two points to note here. The decorations are not needed again, and so can be discarded (from here on we will ignore them, except in using the decoration in the PDP-8 example below). The second point is that if the MDL description of the instruction gives as result a decoration identical to that on an operand, then the decoration on the out-branch of the node will be identical to that on the appropriate in-branch. For example, the MDL of the PDP-10 ADD instruction is "r1+"m -> r1: allocation has replaced the "r1" by (r5) on both the in- and out-branch above.

It is possible for an instruction node to have a decorated out-branch that is not also a (decorated) in-branch. An example is the PDP-10 instruction node:

...(mX)-- MOVE --(r5)...

which will be assembled as:

...(m)-- MOVE 5,X --(r)...

If "ALLOC" and "DEALLOC" pseudo-instructions have been introduced by the allocation phase to separate it from the register assignment, then the above is modified in the obvious and trivial ways. If this is not the case, then the tree can contain leaves that are not nodes, but just decorated grafting-points. An example of this occurs in the PDP-8 translation of the I:=J+K; example, at the translation of the LFZ, where the PDP-8 gross tree is of form:

. . .(ac (Note the absence of ellipsis before "m^", ADDI ac) . .. denoting that it is a (rn grafting-point leaf.) which must be linearised, as normal, into:

. . .(ac )— ADDI f ac) . ..

This "assembly" phase of the instruction-ordering is needed only once, whereas the phases for linear sections and fork points can each reveal cases in which the other can be applied, and so must iterate, one then the other, and repeat until only one node is left.

Not using "ALLOC" or "DEALLOC" instructions, this phase would

"assemble" the PDP-8 example gross tree into the form:-

LOADD j (ac ADDI t J mY* LOADD LIT k ac ADDI fl—(ac STORED t Y (ac T^s- STOREI u

LOADD f—(ac ADDD LIT i ac STORED u

6.8.2 THE "LINEAR SECTION" CASE OF INSTRUCTION ORDERING

For most of the next two subsections, the important feature of a branch is not the memory class it represents, but the scarceness of the memory class it represents. From here on, except at one time, we can therefore stop decorating branches. Instead we will draw a thick line for a branch representing a scarce memory class, and we will call it a STRONG BRANCH. We will draw a thin line for a branch representing a plentiful class of memory, and will call it a WEAK BRANCH.

Re-drawn according to this convention, the PDP-8 example gross

tree becomes:

There are two subcases of linear sections: two nodes with a strong branch between them; and two nodes with a weak branch between them. In both subcases, the node of which the branch is an in-branch has no other in-branches (or the section is not linear).

For a strong branch, the instruction-ordering packs the two nodes into one, with the two instructions associated with the one node. Because the strong branch was part of a data-flow tree, the node that had the branch as an out-branch produces a result, and the result is consumed by the node that has the strong branch as an in-branch. Therefore, in the new node, the instruction from what was the "out-branch" node must come before the instruction from what was the "in-branch" node. Diagrammatically, FIRST joined to SECOND by a strong branch must be converted into the single node FIRST; SECOND.

If there are several strong branches in sequence, with no side-branches, then this packing operation can be performed repeatedly to yield a single node. In the PDP-8 example, there are two instances of three nodes and two strong branches in line. Packing them leaves the PDP-8 example tree as follows:

(diagram: the packed tree. The chains LOADD LIT k - ADDI f - STORED t and LOADD f - ADDD LIT i - STORED u have each become single nodes, weakly branched to ADDI t and STOREI u respectively; LOADD j remains strongly branched to ADDI t, and ADDI t strongly branched to STOREI u.)

The important feature of a strong branch is that, since it represents a result in a scarce memory class, if any other instruction is obeyed between the instructions represented by the nodes at either end of the strong branch, it may use a location of that same scarce class, and so will violate the assumptions of the register allocation. For linear sections, there is no need to interrupt them in this way, and so nodes in linear sections linked by strong branches can be packed.

A weak branch, on the other hand, represents a result in a plentiful memory class. In the target-machine sequence representing one MDL instruction, these plentiful locations need not be re-assigned, and so clashes violating allocation cannot occur. Thus the instructions at either end of a weak branch can be separated by intermediate instructions in the final instruction ordering: that is, INSTRUCTION1 joined to INSTRUCTION2 by a weak branch can be finally ordered, without problems, as:

INSTRUCTION1 (other instructions) INSTRUCTION2

Thus, a weak branch can be BROKEN, whereas a strong branch must not be broken.

In linear sections linked by weak branches, there is thus no need to pack adjacent nodes before proceeding to deal with fork points. However, it does no harm, and it reduces the number of times that the more costly "twisting" operation (described below) is needed; so it will usually be more efficient to do all the packing possible - of weak and strong branches in linear sections - before proceeding to twist forks. A sketch of the packing operation follows.

There are no weak-branch linear sections in the PDP-8 example after the strong-branch packing, so the example of packing weak branches must be postponed until examples of twisting introduce weak-branch linear sections into the tree.
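A sketch of packing, on an assumed node shape in which a node carries its instruction sequence and a list of labelled in-branches (an illustration only, not the pilot system):

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ONode:
    instrs: List[str]                                    # instruction sequence at this node
    inbranches: List[Tuple[str, "ONode"]] = field(default_factory=list)
    # each in-branch is ("strong" | "weak", source node)

def pack(consumer: ONode, producer: ONode) -> ONode:
    # Merge the source of a linear in-branch into its consumer:
    # the producer's instructions come first, since it yields the result.
    consumer.instrs = producer.instrs + consumer.instrs
    consumer.inbranches = producer.inbranches + [
        b for b in consumer.inbranches if b[1] is not producer]
    return consumer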

6.8.3 THE "FORK POINT" CASE OF INSTRUCTION ORDERING.

This subsection discusses the instruction-ordering operations around a node with more than one in-branch. Initially, the subcases with two in-branches are dealt with; then these subcases are generalised to deal with more than two in-branches.

With two in-branches, there are three possibilities: both in-branches are weak; both are strong; and one is weak, the other strong.

In the case of one weak in-branch and one strong in-branch, the instructions representing one of the in-branch source nodes must break the other in-branch: thus there are two possibilities.

For the tree section in which SOURCE1 reaches the FORK by a weak branch and SOURCE2 reaches it by a strong branch, one of the two following sequences must be created:

SOURCE1; SOURCE2; FORK    or    SOURCE2; SOURCE1; FORK

with the subordinates of the SOURCE1 and SOURCE2 nodes inserted (somewhere) before the matching instructions. The second case breaks the strong branch of the tree section, and so cannot be permitted. The first sequence is needed, breaking the weak branch: it is the linear packing of the tree in which the SOURCE1 subtree has been re-attached as an in-branch of SOURCE2.

In practice, it is simpler to convert the tree section into this last form: call this tree-manipulation TWISTING the SOURCE1 branch back up the SOURCE2 branch. Repeated packings and twistings then look after the subordinate subtrees of the sources, according to the strengths of the branches from them.

At a fork where the strengths of the branches are the reverse of those illustrated above, the twisting is correspondingly reversed: the weak branch is again the one twisted back up the strong one.

Clearly, both the twisted trees contain a one-branch linear section that can be packed, immediately or (if it simplifies the programming) during a separate subsequent packing phase.

The example PDP-8 gross tree, after packing, contained two nodes each with one strong in-branch and one weak in-branch. Twisting first at the ADDI t gives:

(diagram: the tree after the first twist. The packed node LOADD LIT k - ADDI f - STORED t now feeds LOADD j, the source of ADDI t's strong branch, by a weak branch; the packed node LOADD f - ADDD LIT i - STORED u still feeds STOREI u weakly.)

The three-node linear section could now be packed, but it makes a more interesting example to twist the other fork back twice, so that the gross tree becomes:

We now have a second form of fork, with two weak in-branches. This can be resolved by breaking either weak in-branch with a twisting operation. This gives one long linear sequence, which can be packed into the single node:

LOADD f; ADDD LIT i; STORED u; LOADD LIT k; ADDI f; STORED t; LOADD j; ADDI t; STOREI u

The alternative twisting would have interchanged the first three PDP-8 instructions with the second three. Earlier

packings would have produced the same effects: one of the two same sequences would have been produced.

Forks with two strong in-branches are more difficult. One of the strong branches must be broken. A strong branch may be broken if the memory class that it represents is not fully allocated in the subtree connected to the other strong branch. This does not violate the presumptions of the allocation algorithm; however, it is possible that the scarce memory location assigned to the branch which is to be broken is also assigned in the other subtree, so that one of the assignments needs to be changed. Alternatively (and much simpler), the allocation should assign 'virtual' scarce locations as if they were plentiful, and introduce ALLOC and DEALLOC information: then, after a linear sequence has been created by the instruction-ordering code, a simple linear scan can use the ALLOC and DEALLOC information to direct a translation from 'virtual' scarce locations to assigned 'real' scarce locations.

When the instruction-ordering is started, there cannot be a gross tree node with two unbreakable strong in-branches: if there were, then the two subtrees attached to the branches would both be fully allocated in the same class of memory as both the branches, so that the node itself would be more-than-allocated in that class, which the prior allocation would not have permitted. (The allocator would have introduced a transfer sequence via a plentiful class of memory, thereby lowering the allocation on that branch to one. This suffices on any machine that does not have an instruction with two operands of the same class that may refer to different locations while the machine has only one location of the class - an impossible machine.) It is trivially easy to see that none of the instruction-ordering operations (packing, either form of twisting, or breaking one of a pair of strong in-branches) can introduce an unbreakable pair of strong in-branches. Thus all forks with two strong in-branches must have at least one of them breakable.

The following paragraphs deal with the cases of more than two in-branches to a gross tree node.

If there are two or more strong in-branches, then by extension of the above argument, all but one must be breakable. An algorithm for this is to identify one that is breakable with pairwise respect to all the others, and then to break it; and then to repeat until one unbroken strong in-branch remains. The broken strong branches are then treated as weak branches.

If there is a strong in-branch and two or more weak in-branches, then the weak in-branches are all twisted back up the strong branch. This leaves a packable strong in-branch.

If there are multiple weak in-branches and no strong in-branch, then one of the weak branches is selected arbitrarily, the others twisted back up it, and it is packed. This is equivalent to taking the in-branches and their weakly-connected subtrees in any order. A driver combining these cases is sketched below.
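The ordering loop can be sketched as follows, building on the pack and twist sketches above; breakability checking and the final virtual-to-real location translation are elided, as the ALLOC/DEALLOC alternative in the text permits:

def order(node):
    # Reduce a gross (sub)tree to a single node bearing the final
    # instruction sequence, using pack() and twist() from the sketches
    # above.  Assumes allocation used 'virtual' scarce locations, so
    # any strong branch but one may be broken (re-labelled weak).
    for _, child in node.inbranches:
        order(child)                               # leaf-to-root
    while node.inbranches:
        strongs = [b for b in node.inbranches if b[0] == "strong"]
        if len(strongs) > 1:                       # break all but one strong branch
            keep = strongs[0]
            node.inbranches = [keep] + [
                ("weak", b[1]) for b in node.inbranches if b is not keep]
        elif strongs and len(node.inbranches) > 1:
            twist(node)                            # weak branches go up the strong one
        else:
            pack(node, node.inbranches[0][1])      # linear: absorb the producer
    return node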

6.9 REMAINING TOPICS IN SECONDARY CODE GENERATION.

The techniques proposed in this chapter are not complete. In particular, target machine anomalies may be difficult or impossible to describe in IL, and so need handling by other methods.

The PDP-8 offers several examples of this. There is no LOAD instruction: one has to use the "add-to-accumulator" instruction after ensuring that the accumulator is zero; but LOAD instructions normally only occur in two kinds of place:

* immediately after a "deposit-and-clear accumulator" instruction, which is a STORE instruction with the side effect of clearing the accumulator - which is what is needed; or

* immediately after a jump or a label - in which case the LOAD must be translated as a "clear accumulator" instruction followed by the add.

Another PDP-8 anomaly is that the memory is structured into 128-word sections, and that any instruction may only refer to its own section, and to the section with addresses 0 to 127 (decimal): all other memory locations must be reached by indirection. Both the above anomalies can be dealt with by the better PDP-8 assemblers. An anomaly that is beyond an assembler is that eight of the locations in the first section of the PDP-8 memory are automatically incremented whenever their contents are used indirectly: this could be used in the proposed secondary code-generation method if special IL and MDL instructions were introduced to use these locations, and to

have the primary code generator do the assignment of these locations. This might be useful for "tuning" a particular compiler for a particular machine, but would defeat the portability objective for that primary code-generator.

Some decisions must be taken by the implementor for each combination of language and target machine: especially those concerned with the calling of subroutines, and return from and parameter passing to subroutines. Support libraries will need at least customising for particular combinations of language and machine, but endeavouring to keep them all in one high-level language in a source-preprocessable form could much reduce the customisation cost. To achieve this reduction for input-output and other very "machine-intimate" library routines would require an interface at IL or MDL level (19), plus a minimal machine-specific library that implements the interface; or alternatively would require machine-specific preprocessed source assembly code inserts in the high-level source of the library.

(19) A minimal such interface might be:
* open file, with machine-dependent parameters specifying which file, and its properties;
* read one binary unit (of the "natural" size of the machine) on a specified open file;
* write one binary unit;
* read one character;
* write one character;
* close a file;
* heap-allocate a fragment of space according to a parameter that points to a format specification block; and
* routines for the primary code-generator to create heap format specification blocks in a machine-independent manner.
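Rendered as code purely for illustration (every name and signature below is invented for the sketch, not taken from the thesis), such an interface might look like:

from abc import ABC, abstractmethod

class MachineIntimateLibrary(ABC):
    # Hypothetical rendering of footnote (19); one machine-specific
    # implementation of this interface would support the whole library.
    @abstractmethod
    def open_file(self, which, **properties): ...        # machine-dependent parameters
    @abstractmethod
    def read_unit(self, f): ...                          # one natural-size binary unit
    @abstractmethod
    def write_unit(self, f, unit): ...
    @abstractmethod
    def read_char(self, f): ...
    @abstractmethod
    def write_char(self, f, ch): ...
    @abstractmethod
    def close_file(self, f): ...
    @abstractmethod
    def heap_alloc(self, format_block): ...              # per a format specification block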

It is not clear how side effects of target machine instructions can be handled. In some cases the instructions concerned can be ignored: for example, the PDP-10 has an instruction that loads the value of a memory location to a register, and skips the next instruction if the value was zero: the instruction would simply not be described. Many machines have "block move" instructions, that copy one region of memory to another: not being able to describe these, and so being unable to generate them, loses a powerful machine facility. Most serious is when side effects cannot be ignored: for example, the setting of an "overflow" flag when arithmetic fails. High-level languages which impose requirements as to when such flags should be inspected are unlikely to be able to have a machine-portable IL.

The techniques proposed would need extension to produce binary relocatable output, as opposed to assembler-format output. This can be automated straightforwardly /WIC75/. Another input to the fabricator would be a description of the formatting rules for the relocatable form of the target machine, and in the descriptions of the individual instructions there would be statements of the binary values to be produced, relative to the format. Especially because of the need to have multiple descriptions of some instructions, this might be factored out from the MDL descriptions, to reduce repetition and to improve the legibility of the separate descriptions.

6.10 REVIEW OF THE PROPOSED TECHNIQUES.

Of the remaining topics of the last section, only the "overflow" problem raises new fundamental difficulties. The other remaining topics can all be dealt with fairly easily (if sometimes inelegantly), or ignored. Overall, the techniques proposed in this chapter suggest that it is viable to create cheap secondary code generators for orthodox languages and orthodox machines. THIS MEETS THE PRIMARY REQUIREMENTS OF CHAPTER 1.

In addition, the proposed techniques yield generators that are moderately fast, and produce fairly good local code. THE TECHNIQUES THEREFORE MAY BE SUITABLE FOR CHEAPER PRODUCTION-QUALITY COMPILERS, not merely for pilot compilers: this is an advantage that was not actively looked for, but was observed as a by-product.

Many details remain to be determined, some being quite major. For example, it seems likely that, because of the need for a subgoal, the translation should start from the root. This in turn suggests that the ideal organisation of the IL should be prefix polish, rather than the postfix polish of the example IL of the chapter. It is, for example, not clear when an IL can be treated as a simple macro-expansion, and whether this can be made more straightforward by a different order of the ILs (e.g. postfix as opposed to prefix).

The main requirement now is therefore to complete a pilot implementation of the full proposal. With experience gained from that, the next phase of the project can be planned.

CHAPTER 7

A PILOT SECONDARY CODE-GENERATOR SYSTEM

7.1 INTRODUCTION

This chapter describes a partial implementation of the methods proposed in the last chapter, sufficient to demonstrate the viability of the novel essential ideas.

7.2 The System

The pilot implementation is in the language POP-2, a language particularly suitable for pilot programs that use data structures.

The following description does not depend heavily on the features of POP-2, except in two respects: the POP-2 MACRO and COMPILE facilities. A POP-2 macro is a parameter-less function to be obeyed at the time the POP-2 program is compiled. The macro is called when the compiler reaches the name of the macro in its program input stream, and then discards that instance of the name. The macro function usually calls for input, which comes from the program input stream, and usually creates a stream of lexical tokens which it passes to the compiler. When the macro function returns to the compiler, the compiler takes the stream of tokens and prepends it to the remaining part of the input stream: that is to say, it continues by compiling first the returned tokens before resuming compilation from the input stream. This facility was used to create a macro called MI (machine instruction), which was used to incorporate an individual machine instruction description by doing the necessary fabrication, the description of the instruction following the token MI in the input stream to the

compiler. The COMPILE facility of POP-2 enables parts of a program source to be incorporated in files other than the principal input file. When the instruction COMPILE(filename) is encountered, the compiler continues by taking its input from the named file until that file is exhausted, when the input reverts to just after the COMPILE instruction. The MI instructions were put in a file that is COMPILE'd into the input stream: a listing of that file is given later. The following listing shows the POP-2 system being started and applied to a series of files. The lines starting with a colon are the main input stream of the POP-2 compiler, taken from the controlling terminal; the lines starting with two hyphens are outputs from COMPILE'd files, indicating progress, and naming the principal program components in the file.

.DO CF2V52 PDP8V6 TESTS
.R POP2

CITY POLY POP2 V3
setpop
: COMPILE([DUMP.POP]);
--DUMP COMPILING
--DUMP COMPILED
: COMPILE([CFUTIL]);
--CF UTILITIES COMPILING
: COMPILE([CFMD04]);
--COMPILING MACHINE DESCRIPTION CODE: MDINFIX, MDPR(MD), MDTOP(MD)
: COMPILE([CFMI12]);
--COMPILING .MIPR AND MACROS "MIBEGIN", "MI MNEM ARGS COST MD." AND "MIEND"
: COMPILE([CFCG52]);
--COMPILING .CGSPR .CGPRINT AND CGENERATE
--COMPILED .CGSPR CGPRINT AND CGENERATE
: COMPILE([CFWK52]);
--COMPILING .WALK AND MACRO "CG MDL."
--COMPILED .WALK AND MACRO "CG MDL."
: COMPILE([MDV2.MDS]);
: MIBEGIN
: COMPILE([PDP8V6.MIS]);
--PDP8.MIS COMPILING
--PDP8.MIS COMPILED
: MIEND

CF is an abbreviation of "compiler factory". DUMP is a debugging facility. CFUTIL is assorted utility routines. CFMD is routines to read, manipulate and print MDL trees. CFMI is the MI macro definition and its subroutines: that is, the fabricator. CFCG is the code generator. CFWK is the routines for a series of post-walks that complete the code generation. MD is a file containing calls to macros that specify the MDL operators to be used by CFMD, and whether they are monadic or dyadic. PDP8 is a file containing a description of some of the instructions of the PDP-8. (The subscript number on each file name indicates its version.)

The MIBEGIN and MIEND macros initialise and finalise the fabrication phase. Down to and including CFWK is the code of the system: the following files initialise it for a particular compilation.

MIEND outputs a listing shown later: then a final COMPILE proceeds to take calls on a further macro, CG, from another file. Each call of CG reads a following MDL to be code-generated.

For simplicity, the fabrication stage is very simple, and is incorporated in the same program as the secondary code-generator. The analysed machine description can thus be passed as a data structure: there is no need for the complexity of outputting it, and then inputting it. The production of the MDL tree from the IL is straightforward, and so is not described here, a simple parser of infix MDL expressions being used.

All the practical problems of introducing indirections to be subsequently grafted out have been removed by eliding out the indirection at a grafting. Instead of writing addresses in lower case, preceded by an indirection, the corresponding upper case has been used to signify the contents of the location.

The assignment operator is re-interpreted to ignore the implied indirection of a following capitalised simple name. Thus the PDP-8 ADDI instruction is subsequently described as:

AC + "M -> AC

whereas the last chapter describes it as:

"ac + ""m -> ac

Constants are handled by assuming that they exist in locations apart: so that in the same way that M represents the contents of a certain location, K represents the constant contents of a fictional location.

For simplicity of the MD routines for manipulating the MDL input trees and listings, all names are decorated. The notation of a name is a letter, a backslash character, and a decoration. The decoration is either a letter or a number. For names in which the decoration is unimportant, it is the same as the letter at the start of the name: the printing routine just prints the letter once, and no backslash. Thus A\A prints simply as A. For decorations introduced by the code generator, the decoration is a negative number. Thus M\4 refers to a location name input to the code generator, but M\-4 is a location introduced by the code generator to carry a transient result.

The examples use notations for the PDP-8 mnemonics, tabulated below, that are more faithful to the simpler PDP-8 assemblers.

Chapter 6    Chapter 7
STORED       DCA       (deposit and clear accumulator)
STOREI       DCAI
ADDD         TAD       (twos-complement add)
ADDI         TADI
LOADD        LOAD
LOADI        LOADI

As observed in chapter 6, LOAD and LOADI are really TAD and TADI, used where the accumulator will contain zero. The following listing is of the input file used to describe the PDP-8 for the subsequent examples.

In the examples of this chapter, the fabricator and code generator are configured by the MDL file for the MDL monadic operators ", $, - and +, and for the MDL dyadic operators &&, ++ and --. ($ signifies "not"; + and ++ signify "plus"; - and -- signify "minus"; and && signifies "and".) Note that, with the indirection operator no longer having a special grafting role, it is now an ordinary monadic operator. The dyadic assignment operator is now the only special operator.

The COMPILE(PDP8...) instruction reads the following file.

.TYPE PDP8V6.MIS
COMMENT PDP8V6 -- MI NOT ML, AND COSTS -- GENERIC/SPECIFIC NAMES;

--PDP8.MIS COMPILING

MI LOAD A\A M\M; 2 ( M\M -> A\A ).
MI DCA A\A M\M; 2 ( A\A -> M\M ).
MI TAD A\A M\M; 2 ( (A\A ++ M\M) -> A\A ).
MI TAD A\A M\M; 2 ( (M\M ++ A\A) -> A\A ).
MI AND A\A M\M; 2 ( (A\A && M\M) -> A\A ).
MI AND A\A M\M; 2 ( (M\M && A\A) -> A\A ).

MI LOADI A\A M\M; 3 ( "M\M -> A\A ).
MI DCAI A\A M\M; 3 ( A\A -> "M\M ).
MI TADI A\A M\M; 3 ( (A\A ++ "M\M) -> A\A ).
MI TADI A\A M\M; 3 ( ("M\M ++ A\A) -> A\A ).
MI ANDI A\A M\M; 3 ( (A\A && "M\M) -> A\A ).
MI ANDI A\A M\M; 3 ( ("M\M && A\A) -> A\A ).

MI LITERAL M\M K\K; 0 ( K\K -> M\M ).
MI ZERO Z\0 K\K; 0 ( Z\0 -> K\K ).
MI ONE U\1 K\K; 0 ( U\1 -> K\K ).

MI CLA A\A Z\0; 1 ( Z\0 -> A\A ).
MI CMA A\A; 1 ( $A\A -> A\A ).
MI IAC A\A U\1; 1 ( (U\1 ++ A\A) -> A\A ).
MI IAC A\A U\1; 1 ( (A\A ++ U\1) -> A\A ).

--PDP8.MIS COMPILED

Note the LITERAL instruction; also the special instructions ZERO and ONE for two special values which can appear as special cases of constants, or (by application of the instruction) as an ordinary literal. This was a trick to prevent the code generator introducing "other" constants of value one and zero with different decorations. In a production system, constants would need special handling. Each instance of MI is a macro call to read the assembler format of the instruction (for listing purposes), a semi-colon, a number, the MDL description of the instruction, and a full stop. The number is the cost of the instruction: in the above example it is the number of clock cycles taken to execute the PDP-8 instruction.

The fabricating done in the pilot system is as follows. The MI call identifies whether the instruction is a transfer, or another assignment to a simple address, or an assignment to a compound address.

If it is a transfer, it is first tried as a prefix and as a suffix to all transfers and transfer sequences; if the names match, the appropriate new transfer sequence is added to the list of transfer sequences. Finally the transfer itself is added to the list of transfer sequences. Thus after the last MI, all possible non-cyclic transfer sequences will have been constructed. The transfer sequences are one-level indexed by the combination of source and destination memory classes.

If an MI instruction input is an assignment to a simple address, it is two-level-indexed, first by the destination memory class, then by the principal operator of the source.

If an MI instruction input is an assignment to a compound address, it is two-level-indexed, first as an assignment to VOID, then by the assignment operator. Such instructions can only be used at roots, so efficiency of indexing is less important.

In a production system, all the indexing would be more sophisticated, to achieve lookup speed, especially for the assignments to simple addresses, which would also be separated from the assignments to compound addresses.

The MIEND macro finalises the fabrication, and lists the indexed instructions, as illustrated below. First the two-level-indexed items are shown, prefixed by the keys; then the transfer sequences are shown, prefixed by a key of form S»D, where S is the source class, and D the destination class. Transfer sequences and subsequences are listed in angle brackets.

MACHINE DEFINITION:

VOID
  ->  DCAI[A M]; 3  ( A -> "M )

A
  $   CMA[A]; 1  $A » A
  "   LOADI[A M]; 3  "M » A
  &&  ANDI[A M]; 3  ( "M && A ) » A
      ANDI[A M]; 3  ( A && "M ) » A
      AND[A M]; 2  ( M && A ) » A
      AND[A M]; 2  ( A && M ) » A
  ++  IAC[A U]; 1  ( U ++ A ) » A
      IAC[A U]; 1  ( A ++ U ) » A
      TADI[A M]; 3  ( "M ++ A ) » A
      TADI[A M]; 3  ( A ++ "M ) » A
      TAD[A M]; 2  ( M ++ A ) » A
      TAD[A M]; 2  ( A ++ M ) » A

Z » A  1; CLA[A Z]
U » K  0; ONE[U K]
Z » K  0; ZERO[Z K]
K » M  0; LITERAL[M K]
A\D » M\D  2; DCA[A\D M\D]
M » A  2; LOAD[A M]
Z » M\D  3; <<DCA[A\D M\D]<>CLA[A Z]>>[Z M\D]
U » M  0; <<LITERAL[M K]<>ONE[U K]>>[U M]
Z » M  0; <<LITERAL[M K]<>ZERO[Z K]>>[Z M]
K » A  2; <<LOAD[A M]<>LITERAL[M K]>>[K A]
U » A  2; <<LOAD[A M]<><<LITERAL[M K]<>ONE[U K]>>[U M]>>[U A]
A\D » A  4; <<LOAD[A M]<>DCA[A\D M\D]>>[A\D A]
M » M\D  4; <<DCA[A\D M\D]<>LOAD[A M]>>[M M\D]

7.3 A structured notation for trees which include alternatives.

This section describes the format of the listings in which the produced gross trees are listed in the later sections. Since the root of a subtree must be executed after all its branches (i.e. operands), it is listed after the operands. The root nodes of the operands of a node are indented 3 characters more than the node itself. For example:

   LOAD A\1= M\2= ( 2)
   DCA A\1= M\-1= ( 2)
AND A\1 M\-1 ( 2)

represents the gross tree in which the LOAD and DCA nodes are the operand subtrees of the AND node:

LOAD A\1 M\2  and  DCA A\1 M\-1  both feeding  AND A\1 M\-1

The equals signs mark operands which stand for themselves, that is, do not have subtrees. Since instruction ordering is not illustrated in this pilot system, the significance of the order of the operands of a node is only that they are presented in the order of the operand-names in the node that do not stand for themselves: that is, in the above example, in the AND, the A\1 stands for the LOAD subtree, and the M\-1 stands for the DCA subtree.

Alternatives are listed with the same indentation, but with a line of 3 hyphens (at the same indentation also) between them.

For example:

      LOAD A\-5= M\2= ( 2)
      DCA A\3= M\-8= ( 2)
      TAD A\-5 M\-8* ( 2)
   DCA A\-5 M\-1= ( 2)
   ---
      TAD A\3= M\2= ( 2)
   DCA A\3 M\-1= ( 2)
DCAI A\1= M\-1* ( 3)

represents the gross tree in which the M\-1 operand of DCAI A\1 M\-1 is produced either by the four-instruction subtree (LOAD A\-5 M\2 and DCA A\3 M\-8 feeding TAD A\-5 M\-8, feeding DCA A\-5 M\-1) or by the two-instruction subtree (TAD A\3 M\2 feeding DCA A\3 M\-1).

In the listings, the negative decorations represent introduced memory locations: the asterisks mark the point of introduction. If the output stream is to include ALLOC and DEALLOC pseudo-instructions, then before walking a subtree based on an asterisked name an ALLOC is to be output for that name, and after the walk a DEALLOC is to be output for that name.

The numbers in brackets represent costs: before the cost pass, of the node itself; after the cost pass, of the subtree of which it is the root.
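A sketch of a printer for this notation, assuming a simple dictionary representation in which each node holds its preformatted text line and, for each operand position, a list of alternative subtrees (an illustration only, not the pilot code):

def show(node, indent=0):
    # Operands are listed before their node, indented 3 more characters;
    # rival alternatives are separated by '---' at the same indentation.
    for alternatives in node["subs"]:
        for i, alt in enumerate(alternatives):
            if i:
                print(" " * (indent + 3) + "---")
            show(alt, indent + 3)
    print(" " * indent + node["line"])

# e.g. the first example of this section:
load = {"line": "LOAD A\\1= M\\2= ( 2)", "subs": []}
dca = {"line": "DCA A\\1= M\\-1= ( 2)", "subs": []}
root = {"line": "AND A\\1 M\\-1 ( 2)", "subs": [[load], [dca]]}
show(root)          # reproduces the indented listing shown above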

7.4 A simple example code generation.

This example is to generate from the MDL of the PDP-8 AND instruction, which is listed first. That is followed by a trace of the code generation, in which indentation represents the level of recursion.

( ( A\1 && M\2 ) -> A\1 )

GENERATE **DEVOID TO**
GENERATING: ( A\1 && M\2 ) FOR A\1
\ TRY: AND[A M]; 2 ( M && A ) » A CONFORMING A=>M\2 M=>A\1
\ \ XFERTRY GIVES: LOAD A\1, M\2
\ \ XFERTRY GIVES: DCA A\1, M\-1
\ TRY GIVES: AND A\1[LOAD A\1, M\2], M\-1*[DCA A\1, M\-1]
\ TRY: AND[A M]; 2 ( A && M ) » A CONFORMING M=>M\2 A=>A\1
\ \ GENERATION EQUIV: A\1 AS A\1
\ \ GENERATION EQUIV: M\2 AS M\2
\ TRY GIVES: AND A\1, M\2
GENERATED: AND A\1[LOAD A\1, M\2], M\-1*[DCA A\1, M\-1]; AND A\1, M\2

The initial objective is to translate the whole of the MDL for the target VOID: the first level of recursion detects that assignment is the principal operator and thus gets rid of the VOID target ("devoids") by altering its parameters to generate A\1 && M\2 for the target A\1. The GENERATING line then states what is to be generated and for what target; the matching GENERATED line lists what was generated (in a linear form). Within a "generation" level, a "try" is made for each candidate instruction with the correct target and principal operator: if the tree of the instruction does not conform to the MDL tree (or subtree) to be generated, it is not listed; if it does, then the candidate instruction is listed together with the desired conformances at the leaves. For example, the conformance A=>M\2 means that M\2 is to be generated for chosen target A. At the end of a try, the code for the try is listed, or the try is reported as having failed due to a conformance failing.

The conformances are generated by recursing on the GENERATING/GENERATED procedure. Two special cases of that procedure are: that it is a transfer that is requested - listed above as an XFERTRY - which simply looks up the table of transfer sequences and so either succeeds or fails quickly; and that a generation is requested from a class of memory to the same class of memory - listed as GENERATION EQUIV. If an EQUIV has identical decorations for the source and target, it generates a null subtree, as in the example above.

The listing of the code generation is followed by a repeat of the MDL being generated, and a structured listing of the code generated. For the above example, that was:

( ( A\1 && M\2 ) -> A\1 )

   LOAD A\1= M\2= ( 2)
   DCA A\1= M\-1* ( 2)
AND A\1 M\-1 ( 2)
---
AND A\1= M\2= ( 2)

This is followed by two more walks, discussed in the next section, and a third walk which adds the costs of the nodes in a subtree to give the new cost of the root of the subtree, and at alternatives eliminates all but the cheapest. After each of these three walks, if the walk modified the tree then the revised tree is listed in structured form. For the above example, the three-instruction alternative is eliminated by the costing, so the final listing is the following:

AND A\1= M\2= ( 2)

An important point to be drawn from the preceding example is that, as formulated in the last chapter, the optimisation methods would eliminate the three-instruction gross tree at fabricate time: it would not then be generated at secondary code-generation time.

7.5 Examples involving transfer sequences.

The following two examples are straightforward applications of transfer sequences at top level. The first is mainly of interest as compared to the first example of 7.6.

( M\1 -> A\2 )

GENERATE **DEVOID TO**
XFERTRY GIVES: LOAD A\2, M\1

( M\1 -> A\2 )

LOAD A\2= M\1= ( 2)

The second example illustrates a three-instruction prefabricated sequence.

( U\1 -> A\9 )

GENERATE **DEVOID TO**
XFERTRY GIVES: LOAD A\9, M\-1*[LITERAL M\-1, K\-2*[ONE U\1, K\-2]]

( U\1 -> A\9 )

      ONE U\1= K\-2= ( 0)
   LITERAL M\-1= K\-2* ( 0)
LOAD A\9= M\-1* ( 2)

7.6 Retargeting

In the example of section 7.4, transfers from a memory class and decoration to the same class and decoration were observed to be null. However, it can arise that there is a source and target of the same memory class, but with differing decorations. In the pilot system, this generates a node with operator -> followed by the two operands, 'from' then 'to'. These nodes are eliminated in the two following walks (which were separated out to enable distinct before-and-after listings of the walks, thus facilitating study of the effects of the walks).

The first of the walks deals with a special case; the second walk only applies at leaves, at which the destination of the assignment differs from the "natural" destination of

the assigned. This is illustrated in the following example.

( A\1 -> A\2 )

GENERATE **DEVOID TO**
GENERATION EQUIV: A\2 AS A\1
*** RESPECIFIED *** A\1 TO A\2

( A\1 -> A\2 )

-> A\1= A\2= ( 0)

CGS WALKING ON ->, A=>**UNDEF**
ASSOC: A\D » A\1
ASSOC: A » A\2
CGOUT: LOAD A\2, M\-1*[DCA A\1, M\-1]

   DCA A\1= M\-1= ( 2)
LOAD A\2= M\-1* ( 2)

The zero costs disappear before the costing walk. The RESPECIFIED line draws attention to the transfer-to-same-class. The lines between the two tree listings trace the development of the association of the formal and actual parameters of the transfer sequence:

formal/actual    formal/actual
A\D  UNDEF       A  UNDEF      in the fabricated instruction
A\D  A\1         A  UNDEF      after one association
A\D  A\1         A  A\2        after two associations
A\1  A\1         A\2  A\2      after renaming

The renaming of the formals is to obtain the desired formal names, rather than the formal names appearing in the prefabricated instruction. (A similar association process is applied for the conformances desired for a try.)

A subtlety is that in cases such as the above, the "natural destination" must not be corrupted: this is why the target must be used as a temporary, rather than some conflicting source leaf. The retargeting must therefore have no effect at the root, but be pushed back in the tree, as a renaming process, to the conflicting leaf (or leaves), and the prefabricated sequence for transferring that memory class to itself applied there.

The special case is one in which conflict can be avoided: a source or target given in the original input MDL conflicts with an introduced transient-result location of the same class. This can only happen in the case of the generation of a subtree failing, so that a transfer sequence is tried to find another memory class for which the subtree can be generated. In the following listing this is indicated by the SPLITTRY lines: a TRY having failed (and so not been listed), the source->target objective is split into source->intermediate->target.

There are two new failure messages in this example. CANT SPLIT... GENERATED NOWT means that a generation failed because no TRY succeeded, and there were no possible intermediates with which to achieve a split. GENERATION MISMATCH means a conformance from one memory class to another had no prefabricated transfer sequence.

( A\1 -> "( M\2 ++ A\3 ) )

GENERATING: ( A\1 -> "( M\2 ++ A\3 ) ) FOR VOID
\ TRY: DCAI[A M]; 3 ( A -> "M ) CONFORMING M=>( M\2 ++ A\3 ) A=>A\1
\ \ GENERATION EQUIV: A\1 AS A\1
\ \ GENERATING: ( M\2 ++ A\3 ) FOR M\-1
\ \ \ SPLITTRY: Z » M 0; <<LITERAL[M K]<>ZERO[Z K]>>[Z M]
\ \ \ \ GENERATING: ( M\2 ++ A\3 ) FOR Z\-2 ***CANT SPLIT
\ \ \ \ GENERATED: **NOWT**
\ \ \ SPLITTRY RETARGET FAILED
\ \ \ SPLITTRY: U » M 0; <<LITERAL[M K]<>ONE[U K]>>[U M]
\ \ \ \ GENERATING: ( M\2 ++ A\3 ) FOR U\-3 ***CANT SPLIT
\ \ \ \ GENERATED: **NOWT**
\ \ \ SPLITTRY RETARGET FAILED
\ \ \ SPLITTRY: Z » M\D 3; <<DCA[A\D M\D]<>CLA[A Z]>>[Z M\D]
\ \ \ \ GENERATING: ( M\2 ++ A\3 ) FOR Z\-4 ***CANT SPLIT
\ \ \ \ GENERATED: **NOWT**
\ \ \ SPLITTRY RETARGET FAILED
\ \ \ SPLITTRY: A\D » M\D 2; DCA[A\D M\D]
\ \ \ \ GENERATING: ( M\2 ++ A\3 ) FOR A\-5
\ \ \ \ TRY: IAC[A U]; 1 ( U ++ A ) » A CONFORMING A=>A\3 U=>M\2
\ \ \ \ \ YEILDER OF A\3 AT A=>A\3
\ \ \ \ \ ***** YEILDER ***** A\3
\ \ \ \ \ GENERATION EQUIV: A\3 AS A\3
\ \ \ \ \ GENERATION MISMATCH: M\2 AGAINST U\-6
\ \ \ \ TRY GAVE NOWT
\ \ \ \ TRY: IAC[A U]; 1 ( A ++ U ) » A CONFORMING U=>A\3 A=>M\2
\ \ \ \ \ XFERTRY GIVES: LOAD A\-5, M\2
\ \ \ \ \ GENERATION MISMATCH: A\3 AGAINST U\-7
\ \ \ \ TRY GAVE NOWT
\ \ \ \ TRY: TAD[A M]; 2 ( M ++ A ) » A CONFORMING A=>A\3 M=>M\2
\ \ \ \ \ YEILDER OF A\3 AT A=>A\3
\ \ \ \ \ ***** YEILDER ***** A\3
\ \ \ \ \ GENERATION EQUIV: A\3 AS A\3
\ \ \ \ \ GENERATION EQUIV: M\2 AS M\2
\ \ \ \ \ *** RESPECIFIED *** A\3 TO A\-5
\ \ \ \ TRY GIVES: -> A\3[TAD A\3, M\2], A\-5
\ \ \ \ TRY: TAD[A M]; 2 ( A ++ M ) » A CONFORMING M=>A\3 A=>M\2
\ \ \ \ \ XFERTRY GIVES: LOAD A\-5, M\2
\ \ \ \ \ XFERTRY GIVES: DCA A\3, M\-8
\ \ \ \ TRY GIVES: TAD A\-5[LOAD A\-5, M\2], M\-8*[DCA A\3, M\-8]
\ \ \ \ GENERATED: -> A\3[TAD A\3, M\2], A\-5; TAD A\-5[LOAD A\-5, M\2], M\-8*[DCA A\3, M\-8]
\ \ \ SPLITTRY OK
\ \ \ SPLITTRY: K » M 0; LITERAL[M K]
\ \ \ \ GENERATING: ( M\2 ++ A\3 ) FOR K\-9 ***CANT SPLIT
\ \ \ \ GENERATED: **NOWT**
\ \ \ SPLITTRY RETARGET FAILED
\ \ GENERATED: DCA A\-5*[ -> A\3[TAD A\3, M\2], A\-5; TAD A\-5[LOAD A\-5, M\2], M\-8*[DCA A\3, M\-8] ]
\ TRY GIVES: DCAI A\1, M\-1*[DCA A\-5*[ -> A\3[TAD A\3, M\2], A\-5; TAD A\-5[LOAD A\-5, M\2], M\-8*[DCA A\3, M\-8] ]]
GENERATED: DCAI A\1, M\-1*[DCA A\-5*[ -> A\3[TAD A\3, M\2], A\-5; TAD A\-5[LOAD A\-5, M\2], M\-8*[DCA A\3, M\-8] ]]

( A\1 -> "( M\2 ++ A\3 ) )

         TAD A\3= M\2= ( 2)
      -> A\3 A\-5= ( 0)
      ---
         LOAD A\-5= M\2= ( 2)
         DCA A\3= M\-8= ( 2)
      TAD A\-5 M\-8* ( 2)
   DCA A\-5* M\-1= ( 2)
DCAI A\1= M\-1* ( 3)


The problem causing the retargeting in this example is that, having introduced a temporary location A\-5 (not knowing any better) at the second-level DCA SPLITTRY, then at the TAD where the memory classes are EQUIV it is discovered that in that alternative A\-5 is to be identical to A\3 - but not in the other alternative. The walk that renames the A\-5 to A\3 must therefore (as it returns from recursion) split the DCA into two alternative DCAs, one with operand A\-5, the other with operand A\3:

      LOAD A\-5= M\2= ( 2)
      DCA A\3= M\-8= ( 2)
      TAD A\-5 M\-8* ( 2)
   DCA A\-5 M\-1= ( 2)
   ---
      TAD A\3= M\2= ( 2)
   DCA A\3 M\-1= ( 2)
DCAI A\1= M\-1* ( 3)

(Note: in this example, the renaming walk has reversed the order of the two alternatives.) The final costing walk eliminates the 5-instruction sequence:

      TAD A\3= M\2= ( 2)
   DCA A\3 M\-1= ( 4)
DCAI A\1= M\-1* ( 7)

7.7 Two realistic examples.

The two examples in this section both yield optimal code for the input MDLs: the first does so quickly, and with no need for alternatives or retargeting.

( ~a\? -> "h\9 >

generating: ( ~a\? -> -m\8 > for void \ try : dcaica m3» 3 < a -> "h ) conforming m=>m\8 a—>"*a\9 \ \ generating: ~a\? for a\-i \ \ \ try: loadica mil 3 "h » a conforming m= >a\9 \ \ \ \ generation equiv: a\-l as a\-l \ \ \ \ xfertry gives: dca a\9, m\-2 \ \ \ try gives: loadi a\-l» m\-2»cdca a\9» m\-23 \ \ generated: loadi a\-l» m\-2*c0ca a\9 r mn-23 \ \ generation equivj m\3 as m\8 xb \ try gives: dcai a\-1*cl0adi a\-l» m\-2*cdca a\9» m\-233» M\3 generated: dcai a\-1*cl0adi a\-l» m\-2*cbca a\9» m\-233» m\3 ,292

( ~AN9 -> "M\8 )

DCA A\9= M\-2- ( 2) LCADI A\-1- MN-2* ( 3) DCAI AN-l* M\3 = C 3)

In the above example there were no failures, either in TRY,

XFERTRY, or SPLITTRY. Code generating a larger realistic

MDL instruction:

((~M\l++("M\2++M\3)) "(M\4++M\5))

produces no retargeting, and 9 alternatives of which 7 are

eliminated at the costing stage:

Before costing: After costing:

LOAD A\-1= M\3= ( 2) TADI A\-l MN2= ( 3) LOAD A\-l= M\3= ( TADI AN-l MN2 = ( 5) LOAD A\-1= MN3 = ( 2) LOADI AN-1= MN2= LOAD I A\-19- M\2 = < 3) TAD AN-l MN3= ( 5) DCA AN-l?* M\-15= ( 2) TADI AN-l MN1* ( 8) TAD A\-l MN -15* < 2) LOAD AN-52= MN5= TAD AN-02 MN4= ( 4) LOADI A\-l= MN2= ( 3) TAD A\-l M\3= ( 2) LOAD AN-52= MN4= TADI A\-I MN1 = (3) TAD AN-02 M\D= ( 4) DCA AN-52* MN-2 = < 6) LCAD A\-l= MN3= ( 2) TADI A\-l MN2= ( 3) DCAI AN-i* MN-2* ( 17)

LOAD AN-1 = M\3= ( 2) LOADI A\-28= M\2= ( 3) DCA AN-23* M\-Z4= ( 2) TAD A\-l MN-24* ( 2)

LOADI A\-l= MN2= ( 3) TAD A\-1 MN3= ( 2) LOADI A\-33= MN1 = ( 3) DCA AN-33* MN-21= < 2) TAD A\-l MN-21* < 2)

LOADI AN-1 = MN 1 = ( 3) LOAD AN-39= M\3= < 2) TADI A\-3? M\2* ( 3)

LOAD AN-3?= M\3- ( 2) LOADI AN-46- MN2= ( 3) DCA AN-46* MN-42= < 2) TAD AN-39 MN-42* < 2)

LOADI AN-39- MN2= ( 3> TAD AN-39 H\3 = < 2) DCA AN-39* M\-35= < 2) TAD AN-l M\-35t ( 2) LCAD AN -02= M\3= ( 2) TAD AN-52 MN4* ( 2)

LOAD A\-52~ MN4= ( 2) TAD AN-52 MN3= ( 2) DCA AN-5Z* MN-2- < 2) DCAI AN-l* MN-2* < 3) The listing of the code generation trace is so long that it is relegated to appendix S. It is interesting to note in it that only about one in three trys fail - but in failing often take successful subordinate trys with them - but that about

4 out of 5 of the splittrys fail. This suggests that two- level indexing suffices for trys, but not for splittrys.

On the other hand, when a splittry fails, it fails quickly, whereas a failed try might be very expensive. It is not clear what improvements in the code (e.g. higher levels of indexing) or re-orderings of trys (so that failures tend to come before siblings that would be expensive) would make the secondary-code-generator faster. However, figures such as the above suggest that the method can be made very efficient, provided the optimisation method can avoid sufficient alternatives that the combinatorial explosion of the number of remaining alternatives is contained. CHAPTER 8

OTHER USES OF -VfflE TOOLS

8.1 INTRODUCTION

This chapter briefly lists some interesting potential uses of the CF1 and CF2 software tools. For lack of time, none of these potentials have yet been investigated more than superficially.

In general, CF1 could be easily used for any parse-dependent text transformation that is not too intelligent* For example, the transform notation is far to primitive to describe transformations to programs that may significantly change their algoritms, as does the system of /DAR 8_1/. However, even with elementary transforms, the possiblities are mainly limited by the imagination, as evidenced by the humber of possibilities listed in 8.2 and 8.3.

CF2 is less likely to be so versatile, but 8.4 lists some possiblities. Section 8.5 presents another possible use for CF2 and CF1 in (reversed) combination, and 8.6 comes full circle to the origins of CF1.

All of the uses discussed in section 8.2 to 8.6 are oriented to programming languages. Sections 8.7 and 8.8 mention two other application domains. ,296

8.2 SOURCE PROGRAM TRANSFORMATION

There are two primary reasons to transform a program into

another in the same language: either to y^C^ld a faster

program (8.2.1) or to yt&ld a more understandable program

(8.2.2). Usually these objectives conflict.

There are also some applications and motivations for

translating into a simpler underlying language, such as

'language contraction'.

Sometimes there are hybrids of these motives: for example

/BOY727 describes a system for translating into a contracted

language for re translation back into a more understandable

full-language program, or into an optimised contracted-

language program for eventual execution. Such a combined

system would be a natural application for CF1.

One oddity of the literature on program transformation is

that it tends to be uuv*.. ' as to whether it is oriented

to speed improvement or legibility improvement. One project

close in technique to CF1 was the Irvine Program Transformation

Project /KIB77/, /STA76,76• ,76^7 in these documents they address the technology in detail, but limit the motivation

to isolated sentences using the word 'improvement',

(Perhaps the problem is that they wished to capture all forms of improvement: perhaps it is the difficulty of separating the two forms of improvement.) ,297

Apart from 'improvement', source-to-source transformation could be used for purposes of language extension: this is discussed in section 8.2.3.

8.2.1 Program transformation for simplification

Some optimising transformations also y\2ld a more readable program: for example the re-arrangement of disordered blocks to remove logically redundant goto's described in section

2.2.2.1 (page 26). This is an area on which very little has' been published, and yet would seem attractive. (Perhaps it is the case that the environments in which such work might be considered are precisely those in which it is less likely to be needed.) The work that does appear to have been done is often specific to one technique, and motivated by other considerations: e.g. /BIR77/ takes an approach to the introduction of recursion into a program apparently motivated more by intellectual curiosity than for the resulting program and whether it is more legible.

The one major form of simplification that has an extensive literature is Algebraic Simplification: _/M0S7_l/ is a good entry to the topic, if slightly dated.

To summarise: THIS IS AN AREA MERITING INVESTIGATION: CF1 would be of assistance. The major difficulty would probably be that of locating a source of suitable 'bad' programs. ,298

8.2.2 Program transformation for optimisation

A variety of systems have been proposed: some specific to

a single optimisation (e.g. /AUS787, /BIR77I,797, /D0N797;

some to a small group of optimisations (e.g. /BAG70/);

some more general in their capabilities (e.g. /SCH74/,

/L0V77/, /RUT77/). The two last cited systems are

particularly relevant to the present work, both being

systems to which transformations are described, rather

than being built-in.

8.2.3 Program transformation for language extension

Often it is the case that in a pre-existing language, it

is possible to write the program desired, but there is an

obvious syntactic extension that would render the program

more comprehensible. For example, in both CF1 and CF2

there was a heed for a programming language construct

equivalent to the following pattern in Simula.

begin index x; comment the following loop uses a control variable of some type 'index' - often a pointer type, sometimes an integer; x := initialisation; while not found(x) do begin if last(x) then goto failed; x:=next(x); end; comment if we get here, we have x being the first value such that'found' is true of it; success(x); goto exit;

failure; exit: end of searching loop; ,299

The construct makes a search from an initial value until some test is'found to be true in which case 'success' code is executed- or until the search fails, in which case

'failure' code is executed. A suitable language extension might be:

SE^ftCH FROM x:= initialisation STEP x:= next(x) UNTIL last(x) IF found(x) THEN success(x) ELSE failure;

In CF2, written in POP-2, this was possible, POP-2 supporting source language macros, but in CF1 this could have been achieved by a pre-processor written using CF1.

Various aspects of source language enrichment have been suggested in the literature, for a variety of motives, varying from new constructs specific to a type of problem

(as above), to the concept of simplifying the programming language problem by providing a core language designed to be used only as the target of syntactic extensions as above. Notable contributions are /BEN64/ , /BIT74.75/,

/CAM78/, /SHA747.

8.3. MISCELLANEOUS PROGRAMMERS AIDS

This section presents a variety of applications for syntax- based transformation from program source that are of assistance in the writing and documenting of programs. The first two yeild programs equivalent to the original: third for preparing programs, the final three for extracting useful information about the program. ,300

8.3.1 Prettyprinting

For structured language, organisation of the program text

so that each line is a logical unit, with indentation such

that the left margin reflects the program structure has been

found to improve the legibility. A program that automates

this indentation is called a prettyprinter.

Many "prettyprinters" have been written for various languages,

especially in the last decade: e.g. /CLX78/, /C0N70/.

The best prettyprinters use extensive syntactic knowledge.

CF1 could be a tool for supplying that knowledge cheaply.

8.3.2 Program profiling

For a variety of purposes - notably identifying which parts

of a program most require "improvement" (in some sense)-

it is desirable to have estimates of the fraction of runtime

spent in each fragment (say, a line) of a program: or

alternatively the number of times the line is executed

during a 'typical' run (if 'typical' can be defined).

/CHE78 / , /ELS787, /ING727/R.UT777, /SAL757, /WEG757-

This information is of use also to some optimisers/BAL 79/.

These statistics are often post-processed onto a listing

of the program, as a column of figures in one of the margins.

THE "width" of the column gives a logarithmic profile of the

frequency of use of the adjacent code, so the technique has

become known as 'program profiling'. Few compilers support this facility. It is therefore commonly.acheived by preprocessing the program to insert

"counting" instructions and count-output code, by running

the program to get the raw statistics, and finally post-

processing the raw statistics onto a listing of the source.

Both the pre- and post-processing requires syntactic

knowledge, and so are natural applications of CF1.

8.3.3 Language specific editing

Editors that are specific to a single programming language

are now becoming available. They have several advantages,

including automatic pretty-printing (pretty-editing?) of

the source.. They make it difficult to create program sources

containing syntax errors.

Such a tool based on CF1 or another facility for syntax definition would have the additional advantages of programmer-portability and cheapness of implementation of related editors for a series of languages. A good example is _/MED8_l /.

8.3.4 Flowcharting

The best automatic program-flowcharters and related tools depend heavily on syntactic knowledge: again CF1 could be used to implement related flow charters for a series of languages.

8.3.5 Cross-referencing

The best automatic program-cross-references depend heavily ,302

on syntactic knowledge: again CF1 could be used to implement

related cross-references for a series of languages.

8.3.6 Portability aids

A variety of tools have been devised to assist in various

portability problems.

Validators examine a program to discover non-portable and/or

non-standard features /J0H777, /TRI787-

i

Bridgeware converts a program source from one language to

another, or one machine-dialect of a language to another /BR07l7.

Validators and bridgeware of high level languages depend

on syntactic knowledge, and so could be implemented using CF1.

8.4 OTHER APPLICATIONS OF CF2

CF2 does not have such a range of alternative uses as CF1,

mainly because it is more specific and more generative

J(whereas CF1 is a generalised compiler of a flexible language) .

Consequently, whereas the applications suggested for CF1

tend to require use of an unmodified CF1 with the output

combined with hand-written code, the following suggestions

tend to imply modified versions of CF2 itself, and to be

more nebulous in nature.

8-4.1 Verification of programs and translations

There are several possibilities for taking a translated

program and converting back to a form better capable of being understood as the "effect" of the code. This could be

an editedgross tree (or a combination of gross trees), or a

symbolic execution^or a hybrid form. In any case the

"translation back" would have features akin to the inversion

of a machine-code-to-MDL translation, as discussed in section

3.2.2. (pages 70-71).

/"30Y77J, £RUT77J, all suggest variations on this : concept, either for validating the program against some requirements, or for comparison of its effect against that of the original source , in order to validate the translator

"in the case of a certain execution only.

8.4.2 Assembler-level bridgeware

When an installation converts from one machine to another, a major problem is often the conversion of assembler level programs. Tools to assist in this can be devised, and often convert around 95% of the programs. The remaining 5% - and particularly locating the remaining 5% - requires human skill.

Using a variant of CF2 for this purpose has the advantage that the tool would be configurable to the pair of machines, and so could be created more cheaply. However, CF2 is designed for translation from an IL, therIL in turn being oriented to such use: a machine language does not have such an orientation. An IL design would avoid features that are diffrcult to translate from , such as the condition codes found in many machine languages. Thus, using CF2 from ,304

machine 'code1 will cause new problems. In purpose-written

bridgeware there will be specific code to deal with such

problems. Hence a CF2 variant will either need dedicated

modifications, or will fail on maybe 10% of the conversion.

8.4.3 Verification of architecture descriptions

Variations on the techniques of the above two subsections

could be used to construct a tool to emulate programs in

the machine codes of hypothetical machines.

/BAR78/ gives a good description of such a method and its

usefulness.

8.5, DE-COMPILATION

Using CF2 to translate from machine code to an IL (as in

subsections 8.4.1, 8.4.2) with the IL organised to appear

structured, then using CF1 to transform the IL to nicer

syntax, could de-compile programs. The main applications

of this are for 'rescuing* programs of which the source

has been lost, or of doing the bulk of the work of converting

a program written in assembler to a higher level language.

There are problems in this technique. Again, 5% of the

code will not be properly translated to IL. Also, in

compilation, the compiler discards much information,

such as symbolic identifiers: a de-compiling system cannot

create intelligent names, and so needs them to be manually

supplied. (One could say that compilation is an entropic

process, so that de-compilation requires external work ,305

to be sacrificed in order to apparently reverse that entropy.)

The topic is further discussed by /H0P78/, /SCH74/.

8.6 LANGUAGE DEFINITION

The CF1 transform notation was conceived in 1970 as a means

of defining BCPL and extensions to BCPL by means of transforms

ultimately into a "well understood" pseudo-machine language

_/LES70/. If the target is still not sufficiently basic, the

CF2 notation is a means of describing it in MDL (which is

very basic in nature). Thus, the concept of the CF1 notation

as a definitional tool stands, and has implicitly been

extended by the CF2 notation.

This is similar in principle to the Vienna Definition Language

(VDL) for PL/1 /LUC697, /WAL697, /WEG727, which defines PL/1 by translation to the language of an interpreting machine,

and by a clear statement of the behaviour of the interpreting machine.

8.7. OTHER POSSIBLE CF TOOLS

This section lists some tools to extend the compiler factory, which could be more easily written using CF1 itself.

If a syntax is presented to CF1 in a non-LL form, CF1 will fail. There is a need for a tool to read CF1 lexic and syntactic divisions and to check that they contain no left recursions and require no left factorisation. The tool' could easily remove some recursions and factorisations, but others ,306

require re-writing of the transform case patterns (which

is probably not mechanisable-, involving quite sophisticated

judgements.)

For large CF1 inputs a structured layout editor and a

crossreferencer would be a considerable development and

maintanance aid. Cross-referencing a transform division

would output a much more elaborate document than

orthodox program cross-references.

As discussed in subsection 4.4.10 (pages 111-113), there

is a need for tools which examine a set of transforms

forr termination. Many subsets can be shown to terminate

using methods such as those of subsection 4.4.10, so the

tool would either certify the whole set, or draw attention

to cyles of cases that either cannot terminate or might

not terminate.

Note that new tools operating on the transform division

would be most simply written to read the output of the

interpretive parser, rather than reading the "raw"

transforms, thus avoiding any requirement for an inbuilt

interpretive parser in the new tools.

As discussed in section 5.5 (page 155), the output of

CF1 can contain major inefficiencies, such as duplicated

tests arising from commonalities in sibling matching-patterns,

the commonality not being present in the parent. A Simula-toSimula optimiser specific to these inefficiencies could accellecate some CFl-output programs to around double their speed.

8.8. OTHER APPLICATION DOMAINS

Other applications of the tools are limited mainly by the imagination of the implementor. CF1 in particular could find use in any domain that requires transformations from a parsed input, e.g. the symbolic solution of Ordinary

Differential Equations /WIL71/. CHAPTER 9

SUMMARY

Come, let us go down, and there confound their language, that they may not under- stand one anothers speech. Genesis XI, 7

9.1 INTRODUCTION AND OBJECTIVES

The modern Babel of programming languages arises because we

do not know quite what we want from a programming language.

Each language needs a compiler for each machine on which it

is to be used, and writing a compiler takes man -years.

To even study the problem by language experimentation requires

many pilot compilers, which cannot be afforded for such

experiment. Most languages are therefore designed on hope

and guesswork, which cannot even be amended by early experience.

Study of the previous work on methods for cheapening the

creation of compilers suggested that the major difficulty

is that we have not fully identified the problems, especially

in the area of code generation. On this, we do have a mass

of experience - most of it little above the level of folklore.

We have phrases like 'Compiler compiler' and 'Compiler generator', representing ideals that have not even been clearly described, let alone acheived.

The initial literature survey gave little guidance, except to

suggest pitfalls to avoid. The only positive leads were a

few very isolated pieces of work started in the 1970's, and

addressing the problem in a very fragmented manner. ,310

This work has therefore aimed at being a radical departure:

to attempt an analysis of the whole problem, and to first-

draft a solution to it.

9.2 THE ANALYSIS

The key conclusion was that three input descriptions were

identified as the characterising property of the desired

toolset:

(1) a description of the lexis and syntax; of

the language to be compiled;

(2) a description of the meaning of each syntactic

construct; and

(3) a description of the machine code.

The detailed design attempted implements (3) as a description

of the effect of each instruction in a "machine description

language" (MDL), and of (2) as a two-part description - first

of the effect of each syntactic construct in an intermediate

language (IL), then of each IL instruction in MDL. The double

use of MDL gives the relationship of the language description

to the machine description.

Another key aspect is that none of the descriptions are

programs in an orthodox sense, being composed of many un-related

fragments, and with no concept of "program control". The

relations required between the fragments are deduced by the

tools to which they are input, from any inter-fragment

dependencies. This high degree of separation permits much

simpler detailed implementations. The key analysis is that of section 2.2.2.4 (page 29) which

separates the phases of the desired stereotype of a compiler.

The first-draft tools cover the essential parts of that

stereotype, so far as it is clear. At the points at which

the stereotype becomes unclear, there has been no attempt to

force a stereotype: the old-fashioned hand-written code methods will be used until they suggest a stereotype. Thus, experience has been used where possible, but elsewhere more experience is persued.

In general, whereas parsing is well understood, code generation is not. Therefore, most of the emphasis has been placed on code generation. The approach has been very pragmatic, searching for safe foundations on which theory might later be based.

9.3 THE DRAFT SOLUTION

The final sections of chapters three to seven draw many detailed conclusions: this section addresses issues covering more than one of those chapters.

The final sections of chapter two gives a detailed view of a monolithic Compiler Fabricator. Relating the compiler stereotype of 2.2.2.4 to the various descriptions suggests two. tools:. one...' to create compiler "front-ends" which translate from source to intermediate language, the other to create compiler "back-ends" which translate from intermediate language to machine code.

The key observation was that some language-knowledge and most machine-knowledge is required during one pass: all other language-knowledge can be separated out. Intermediate language ,312

programs should therefore embody all the knowledge of the

original source to be considered together with the machine

code description.

Putting the ideas to the test of implementation and initial

uses of the implementation has not shown any fundamental

problems, save one: the tools themselves being a kind of

compiler, they have been expensive to write - about 1V4,

man years. This merely emphasises the problem, rather than

suggesting anything about the attempted solution. The parts

of the tools analysed but not yet implemented are concerned

with efficiency, rather than feasibility.

Thus, so far, it seems that the tools are potentially useful:

considerable experience will be needed to determine the extent

of that potential.

9./J. BEYOND THE OBJECTIVES

Creating tools to meet the original objectives of simplifying

the creation of pilot compilers has in some respect led to

passing those objectives. In particular, major possibilities

for the optimisation of secondary code-generation were observed,

and are quite unlike any known published work. These possibilities

may prove viable as the basis for compiler back-ends of production

quality, with advantages of consistency and cheapness across

machines and languages. 9.FUTURE WORK

The current tools were intended as first drafts. Use will be

needed to clarify the requirements for their enhancement or

replacement. The only clearly established deficiency so far

is that LL(1) is an impractical restriction on the freedom of

syntax-writing: for most current languages LL(2) would be much

more desirable, for example to disambiguate procedure call

statements from assignment statements without left-factoring

"name" out of the syntax.

There are also theoretic problems to be addressed, notably

termination in sets of CF1 transforms, should that prove to be

a practical problem. The "level" to be desired in an inter-

mediate language needs clarification: set too high, it may

restrict possibilities for writing optimisations in CF1 notation

and may lead to explosion of the number of possible trees to be

examined by the fabricator (or it may lead to faster optimal

compilation of special cases); set too low it would make it very

difficult to use "long" machine instructions such as block-moves,

or block-searches.

9.6 CONCLUSION

Many shall run to and fro, and knowledge shall be increased.

Daniel XIII

Only the two projects described in appendix A sections 1 and 2 are known to have led to tools akin to those of the present

thesis. The three projects overlapped In time, all proceeded

from radical attempts to clarify the problem, and have been limited successes serving more to demonstrate the magnitude ,908

of the problem than its inherent difficulty. That all three

started' in the mid-seventies suggests, perhaps, that the time

was ripe for such efforts. The key concept in all three is

that compilers are too complex to program cheaply; that the

complexity is of the number of inter-actions to be analysed

and that such analysis can be automated; hence, that the

implementor ought to describe the requirements - not the

solution.

The instruction choice method of this thesis and of the PQCC

project are both related on that of Wadilew jfahS72j (see also

appendix A, section A.l). Apart from this, the three solutions

differ greatly in detail.

The common conclusion of the three works is that the automation

of the production of the common parts of compilers is now a

probability, whereas before it was only a possibility.

The thing can be done, " said the Butbher," I think. The- thing must be done,. I am sure. The thing shall be done! Bring me paper and ink, The best there is time to procure." (Lewis Carroll, "The Hunting of the Snark", Fit the fifth, stanza 13, MacMillan & Co, London 1876.) APPENDIX A

ALLIED WORK

This appendix consists of three sections.

The first two sections each review a project akin to that ~ • reported in this thesis: that is with the object of creating a compiler fabricator which ultimately would produce a complete compiler from descriptions of the machine and language.

Both of these projects have a philosophy for compiler compilation, though not as extensive as that of chapter 2

(mainly because they are both more based on prior works which at most only groped towards such a philosophy). The documenta- tion, of each suggests the conclusion that they suffer traditional problems by having missed two major points: that register allocation for temporaries and instruction choice are best handled together (appendix C); and that it is desirable to have a fabricator phase that considers the IL and the machine instructions together (section 2.3.1). They both separate choice and allocation, IL and machine.

The third section reviews other work allied to this' project," but'- of more fragmentary nature. ,316

A. 1 PQCC AND WAS I LEV;

The Production Quality Compiler Compiler (PQCC) project at

Carnegie-Mellon University has the aim of automating the

production of Production Quality Compilers (PQC's). There

are three fundamental concepts:

* To describe the language and machine;

* To fabricate a compiler based on the BLISS-11 compiler

/WUL75/ as stereotype; and

* To use an 'Expert System ' to produce each phase of

the PQC.

Section A.1.1 discusses 'expert systems';. Section A.1.2

overviews the stereotype and the system as a whole. Apart

from overviews, most of the published literature on PQCC is

on register allocation and instruction choice, so section

A.1.3 reviews that in more detail. The instruction choice

method is based on that of Wasilew, and both are related to

that of this thesis. Therefore section A.1.3 also discusses

Wasilew's method, and the relationship between the three methods.

A.1.1 Expert Systems

Wulf _/WlTL80/ observes that techniques originating in Artificial

Intelligence (Al) are now appearing in Systems and applications

programming; (For example; the tree matching of chapter 6

operates by starting with a goal, and generating subgoals to

be met to reach the goal, then recursing. This is a typical

example of 'means-end analysis', first used for theorem-proving

programs, game-playing programs, and the like.) Wulfw observes that a program that uses a generalised AT technique for a specific class of problems is impractically inefficient, but that building highly specific knowledge and ability into it makes it viable - and an 'expert' in its class of problems.

He cites three phases of a PQC with such 'expert' aspects.

This concept is interesting insofaras other work on compiler fabrication, carried out independently shows similar tendencies to be 'expert'. An outstanding example is _/FRA77,77_V, reviewed in section A.3 of this appendix.

A. 1.2 QVERVI5'-' OF PQCC

This section is a precis of /LEV79/. The design is for a series expert programs,each of which generates one phase of a PQC (1).

The project is strongly founded on the BLISS-11 compiler _/WUL75/ as a model of what "Production Quality" compilers should be.

The fundamental idea derived from BLISS-11 is that the produced compiler consists of the following phases:

* Parse the input into a tree (PARSE phase) ;

* Perform Constant fold and Constant propagate

optimisations (this phase is not named);

* Construct a control flow graph (FLOWAN phase);

(1) Here, a.'phase' could be a pass over the whole of a pass over an internal form of the translated program, or it could be one of several phases in a pass, or it could even be spread over several passes. The key point is that one phase carries out one kind of action over the whole translated program, however it is invoked: they do not say if they use one pass per phase. ,318

* Perform source-to-source transforms on the tree

prior to register allocation and code generation

(desirability analysis of (e.g) constant subexpression

removal, unary complement operator propagation,

evaluation order and targetting, and-others which

are machine independent tree transforms and only

involve a small amount of context - really a group

of phases defined in a tree-transform language)

(DELAY phase - an explained misnomer);

* Access mode determination of storage references in the

light of target machine architecture (AMDphase);

*" Register allocation (not assignment) which in PQCC

/LEV8_1/ includes allocation of programmer-defined

memory to particular kinds of machine memory, subject

to size limits on the various kinds of machine memory

(TNBIND phase).

* Code generation (in the sense that this thesis calls

'instruction choice') from the tree /CAT78/ for a

double-linked list of instructions and labels which

might (e.g) use more registers than the machine has,

overlook certain side-effects of some instructions

(CODE phase);

* Peephole optimise and Finalise including register

assignment and resolving the other omissions and

comissions of CODE pass (FINAL phase).

A'/UL80/ mentions an additional phase called UCOMP for linearising arithmetic and boolean expressions and presumably minimises the

consequences of not having feedback from instruction choice to the allocation. (This may be a subphase of DELAY.)

A key notion in PQCC is that of having a generalised tree notation used by all PQC phases. A phase can be fabricated either to take a tree in main memory, or to input it from a

"dense" binary form (using standard routines) or to read it from a human-understandable form (using other standard routines): similarly for output. Given the multi-phase nature of the

PQC, this would have the advantage that a fabricated phase could be tested in isolation, on test data written by hand. A

A second advantage is presumably that for languages demanding compiler phases not provided by PQCC, such a language specific

phase could be hand-written; e.g. the "middle end" of the Charette

ADA Compiler /BR0807.

A curious omission throughout the PQCC literature is that

"systems" aspects are not discussed, apart from the inter-phase data format. There is no mention of whether a fabricated phase consists of code specific to the language/ target combination, or of fabricated tables that parameterise a handwritten common driver program (2).

A.1.3 Instruction Choice and Register Allocation

In a PQC these are two separate phases.

/WUL80/ mentions an additional phase, called UCOMP ("unary complement optimisation") which is a algebraic simplifier for

(2) Or a hybrid: e.g. fabricating parameterising tables, plus SYSGEN lines that cause code redundant in this case to be compiled out of a handwritten driver program. ,320

arithmetic aid boolean expressions, for example converting

(-y) > (x*(a-b)) into (b-a)*x>y which, in the sense of

appendix C.2.2, has the ideal bandwidth of 1 (whereas the

original has bandwidth 2 at two points). This linearisation

is a pre-emption of some of the task of instruction choice

(CODE), and presumably is needed before the register allocation phase because in"a PQC register allocation comes before

instruction choice, but needs information from instruction

choice. This is the opposite problem of that observed in

C.3.2: there, register allocation after instruction choice

cannot influence the instruction choice (but should). These

are both in constrast to C.2.2 in which the instruction choice

and register allocation are simultaneous, and to this thesis

in which alternative instruction choices are presented to the

register allocator for it to pick the cheapest after insertion

of any required store/restore sequences in a manner that can be interleaved with the instruction choice. (3)

The above suggests that in a PQC, register allocation and

instruction choice are not merely separate phases, but separate passes . (This is consistent with the BLISS-11 compiler.)

A.1.3.1 Register Allecation /LEV8l7

Leverett's thesis covers all memory allocation, not merely register allocation. His system takes a machine description and analyses the kinds of class of memory, the number of memory units in each class, and overlaps between classes and their usage. The algorithm is essentially to associate with

(3) This paragraph describes what section 2.2.2.2 (pages 27&28) calls a "negative phase ordering problem". each variable of the source program and with each likely

temporary a cost (4) for each memory class to which it might

plausibly be allocated, to analyse when the variable is alive

as a graph in which the nodes each represent a variable and

the arcs that the two variables represented by the end-nodes

are both alive at some point. The problem is then to find a

"map colouring" of this graph in which a "colour" is a memory

class. There must not exist any subset of the nodes of the

same colour (memory class) with every node in the subset

connected to every other node in the subset such that the size

of the subset exceeds that of the memory class. This colouring

determines which cost applies to each node (variable): the «

objective is to find a colouring that minimises the sum of

costs. (See also /CHA 80?) •

There are complications. If the target machine has two classes

of memory that overlap (i.e. some locations of class A, some of

class B, and some of both class A and B) then the size limit

on each mutually-connected monochromatic subgraph must be replaced with a treble limit on each mutually-connected dichromatic subgraph. Many machines in fact have complex class overlaps: for example, the ICL 1900 in "15.AM" address mode has a memory of 32K locations, all accessible via modified indexing as secondary operands, of which 4K are also accessible directly as secondary, of which 8 are available directly as primary operands (registers), of which 3 are usable as index-modifiers

to access the 32K. Thus there are four main memory classes, (4) Le-verett mentions that the cost may be in time, code space, data space or a hybrid. He does not mention execution profiles with respect to time costs, so presumably he would calculate these statically, on the assumption that every line of a program is executed once. 322 i

each a subset of the last. There are also instructions with double-length operands, all of which require the operands be in adjacent locations, and some of which require the lower location of the pair to be at an odd address. Using graph colouring, and avoiding unlikely allocations, one needs a 12-chromatic subset limit!

The simplest case of graph colouring is that in which no two adjacent nodes are of same colour. This is an NP-complete problem.

Finding a maximal size connected n-chromatic subgraph is

?iP-complete, though cheap so long as the largest allowed subgraph is small (e.g. representing three or eight registers, 4 but not 4K!). Minimisation problems over multivalued domains are a variation of the knapsack problem, and so are NP-complete.

Thus, for at least three aspects of the problem, suboptimal heuristics are needed. Leverett does this mainly by ranking, starting with scarcest memory classes, and preferentially allocating those variables for which that class much reduces the cost, until the chromatic limit is reached (see also section

6.6.1, page 203).

/

Leverett also gives extensive consideration to the problem of'" allocation within stacks, trying where possible to utilise stack instructions provided by the hardware.

A.1.3.2 Instruction Choice /WAS727 /CAT78/

Wasilew's technique for instruction choice is essentially similar to that of chapter 6 except that he starts from a fine tree, and his algorithms do not take advantage of assignment for an efficiency heuristic. Instead he uses two

other efficiency heuristics: he expects the tree to be in a

canonical form (e.g. constants "left" of variables) so that

patterns of other forms need not be tried. Also, given a

choice of instructions at a node, he prefers the "most

complex" (e.g. the one with most interior nodes) to generate

subgoals: it is only if subgoals fail that he tries simpler

instructions at the node. (Chapter 6, by contrast, tries

all possibilities at the node, that is it regards compile- time optimisation as secondary to run-time optimisation.)

Cattell uses essentially the same method, also with the

preference for a maximal choice top-down in the tree, but

with no mention of other heuristics, He places more attention

on the generation of the patterns for which to search. This

is separated into a fabricate-time program which inputs the

machine description and also logic axioms such as "not not X=X".

His fabricator has two main parts: SELECT proposes patterns

likely to arise to be matched, SEARCH tries to find combinations

of instructions to match them. The fabricator outputs patterns

and matching instruction sequences as a table that parameterises

a hand-written instruction choice program.

From examples, it is clear that Cattell !-s system embodies Al-like

abilities: examples show correct use of "skip" instructions,

including one that skips over a no-operation so as to obtain

the side-effect (only) of the skip. ,324

Presumably, one leaves the SEARCH/SELECT fabricator running

so long as it is producing pattern-sequence output pairs at an acceptable rate.

Cattell has considered the possiblity of using the machine description to help choose 'interesting' patterns:

"It is intuitively unsatisfying that this approximation does well for existing machines. We would like a scheme to automatically determine what special cases... to insure optimal code."

Unfortunately he does not go further in this direction: he has nothing like algorithm of 6.7.4 which alternately reacts depending on machine instruction and IL instruction descriptions.

(Indeed, since he does not have an IL description in the sense of thepresent thesis, he would be limited to the machine instructions only.) A.2 TUM-CC, MUGI, MUG2, AND MILLERS DMACS.

In the 1975-1977 period a group at the Technischen Universistat

Munchen produced a family of compiler-writing systems. The

first was called TUM-CC:/RIP75/ reports on it as if it were

partly finished, without mention of whether the produced

compilers were to be one-pass or multi-pass. The essential

philosophy seems to have been to search for flexibility by

having a large number of phases and internal forms, some phases

being "generated" code, some being standard code parameterised

by preprocessed tables. Some phases may have alternatives:

e.g. for the syntax phase it is said that the compiler-implement

has a choice of LL or LR methods. There is a hint that in some

phases, the compiler-implementor has a choice between generation

or parameterisation.

Perhaps this became too large a task, even for a group. By 1976

they were reporting on a descendant of TUM-CC called MUGI,

limited to the production of single-pass compilers /WIL76/,

_/C-AN76/, and a year later again on a descendant for multi-pass

compilers called MUG2/GAN777-

This appendix describes first TUM-CC in section A.2.1: most of

this description applies to MUGI and MUG2 also. All three are

founded heavily on an implementation of Millers DMACS system

_/MIL'7_l/ for describing instruction choice, so that is the

subject of A.2.2. Finally sections A.2.3 and A.2.4 respectively describe the differences and advances in MUGI and MUC-2. ,326

A.2.1 TUM-CC as the base of MUG1 and MUG2

7UM-CC _/RIP7_5/ is described as if it is composed of four phases, without mention of how they are grouped into passes:

(1) A "string to tree transduction grammar"

_/DER73/ specifies the parsing, tree building,

and machine-independent optimisation of the

tree. The parsing is by an attributed grammar

_/KNU68/ with semantic functions associated

with productions, and operating on attributes.

The tree—building description- is checked for

being feedback-free /MCK7^y>, The optimisations

are given in the form of an attributed trans-

formational grammar.

(2) The abstract program tree (APT) from the last

phase is read in an operator-order linear form,

annotated with attributes, by a "tree walking

push-down transducer (TPDT)", i.e. a tree

automaton /AH071/ whose effects depend on local

properties (only) of the tree. It£ behaviour

is specified by '.'code templates" (indexed by

APT operator, for efficiency). It outputs a

"zero address virtual machine code " which, has ...

usually three stacks: of operands, of labels,

and of temporaries. The 'lifetimes' of stack

entries encode longer-term information about

the represented entities. (3) The zero-address code is simulated (see section

6.5, page 198) to give one n-address instruction

per zero-address instruction: that is to make

the information implicit in the stack explicit

as addresses. This phase is "driven by a

de scription of the IL" (that is of the output

form) .

(4) A system based on Millers DMACS . • converts

the IL to final target machine code.

IL design is by the compiler-implementor, so it can be machine independent (for portability, but poor optimisation) or machine dependent. In the portable case, the implementor would need to write a new DMACS description per language/machine combination, not one per language plus one per machine (section 2.3.1, pages

30,31) .

The separation of the zero-address and n-address forms makes the allocation of temporaries relatively simple. Presumably the description of the zero-address form is just that of the n-address form, stripped of explicit addresses.

The separation of translation to a tree and of optimisation in the tree is interesting and unusual. The authors claim this is a major aid to legibility. It is not clear if the two description notations are the same. Note that each phase operates on a highly suitable form of

the program. The price of this is the variety of intermediate

forms.

A.2.2. Millers DMACS /MIL7l7

The feature of these systems that raises them above the common

run of "syntax-directed translators" (i.e. parser generators

with semantic inserts) is the code-generator generator for

producing translators CAPABLE OF OPTIMISATION from intermediate

macro-code to machine-level code and which are derived from

descriptions which INCLUDE MACHINE DETAILS. Unfortunately

_/GAM7_6/ dismisses this in only the following paragraph:

"Generation of a machine-depende>nt code generator

The Descriptive Macro System proposed by Miller ...

has been implemented for use in MUG1. Its inputs are

machine code macros and a target machine description

in terms of storage resources and transport paths

between them."

No examples are given: nor have I been able to yet locate and

translate from German any better description than the above.

This brief dismissal is quite tantalising because Miller's

thesis only deals with two subproblems of code generation - viz expressions and data access - and then does not in either

achfCve full generality - e.g. in data access he allows only

simple address calculation (index, displacement, base, but-not in-

definitely long indirection loops), assumes .all memory Is directly

addressible (despite the DEC PDP-11 and Ferranti Hermes),

and that registers have certain properties such as that they

are the first main memory locations (which the one accumulator of the DEC PDP-8 is not). Miller's scheme is not actuate

to describe a stack-based machine.

The following description is based on Miller's thesis: I

have not speculated on how it was realised in TUM-CC/MUG1.

(Page numbers in brackets refer to Miller's thesis.)

Miller commences with a target that sets him far beyond

the "compiler-compiler"s of the 1960's: "The present research

represents an attempt to isolate some of the problems involved

in code generation and to show how a code generator can be

automatically created from a description of the computer on

which the code is to be run" (p7) ... " There are two steps

in creating a code generator using DMACS. The first step

is to define a set of procedural macros in a machine independent

somewhat skeletal form. The second step is to supply information

describing the computer for which the code is to be generated.

DMACS uses this information to flesh out the macro definitions.

The two steps are quite independent, so that once the first

step is done for a language, the second step can then be done

for a variety of object machines"(pll) Miller differs from

MUG1 in that he does not separate semantic expansion-and-analysi

into the intermediate macro language from final expansion

into machine code, also in that he has his parser create his

symbol table for use by the instruction choice phase; Source program

PARSER

Symbol Macros Table /

^Code Generator

I

• Machine Code

the following example (pl9) illustrates both 'data description"

(array reference) and "computation" (arithmetic expression):- A( I) =B+C(J)*D

105 SS C,J SubScript 106 MUL 105,D MULtiply 107 ADD 106,B 108 SS A, 105 109 ASSG 108,107 ASSiGn

The language implementor might write a machine independent ADD

macro with logic as follows (p36) :-

macro ADD X,Y if the types of X and Y are integer then IADD X,Y else if the types of X and Y are floating then FADD X,Y else error

To describe machine-dependent IADD and FADD it is necessary

to describe the registers and paths between them (p36):-

rclass R: r2,r3,r4,r5,r6,r7,r8,r9,rlO,rll rpath R->S: ST R,S rpath S-^R: L R,S

Then the machine-dependent operations can be defined (p37) )(.$)

IADD al,a2 (commutative) from R(al),R(a2) emit AR al,a2 result R(al) from "R(a2),S(a2) emit A al,a2 result R(al)

"This defines two permitted states, code to be emitted from each

state, and the location of the macro result"... two other states are:

S(al),R(a2)

(6) Millers examples are mostly in IBM 3G0 assembler

V in which case the qualifier 'commutative' after 'IADD al,a2' specifies 'A a2,al' is to be emitted with 'result R(a2)'; and:-

S(al),S(a2) in which case the register path S—>R must be invoked to emit:-

L ri,al A ri,a2 with result R(ri) , or:-

L ri,a2 A ri, al also with result R(ri). An arbitrary easy choice of these two last must be made: much harder is the choice of which register is to be used as ri because this may be affected by prior or subsequent register usage (which in turn may depend on this choice). Such choices build on one another: some means needs to be found to contain the consequent combina- tional explosion. Miller seems to content himself with arbitrary heuristics designed by instinct,

Miller's major contribution was to observe and detail the problem of code-generator generation: it marks the distinction between compiler-compilers of the 1960s • and what must be demanded of compiler-compilers of the 1970s. However the details need ironing out: that is why the unpublished details of MUG1 are potentially so interesting.

A.2.3 MUG1

It is not clear whether all the differences described below were genuinely changes from TUM-CC, or merely apparently so due to better documentation. ,332

The main goals were a usable system that inputs compiler

descriptions in readable and modifiable form to yf2ld

efficient one-pass compilers.

Secondarily, the system is designed to detect errors in

compiler design where possible, to., facilitate exchangeability,

of compiler modules, and to be portable in the sense of

being able to be used to create compilers for many machines

(though MUC-1 itself is tied to theTELEFUNKEN TR440, being

programmed in PS440, a PL-360-like language).

The 'usability' appears to have been a revolutionary concept:

prior compiler writing systems tended to overlook that point!

The portability-of-output requirement prompted a "generalised

operating system interface".

MUG1 separates lexical analysis from syntactic analysis, having

a distinct lexical analyser generator. (It is not clear if

this fabricates code or if it preprocesses its input to tables

that parameterise a standard scanner.) The inputs are BNF-like

rules with right hand sides based on regular expression notation,

extended by an ALLBUT operator analagous to the - exclusion- operator'*

of section 4.2 (page 77). Each rule may be marked as having

special properties: two properties mentioned are 'unique identi- cation' (e.g. for identifiers) and 'being ignored' (e.g. for comments). This lexical analyser was one of the first designed

to be part of a system: prior lexical analysers were seldom well-integrated into a system. Keywords (e.g. reserved words) are not derived from the syntax: they must appear in the lexic description, if need be as a singleton lexic class.

There are two parser-generators: for LL(k) and LR(k) parsers.

(It is implied that they are two separate programs,) It is stated that "error-recovery scheme independent of parsing- strategy is being implemented" which makes it sound as if the scheme does not call for extra input to or in the grammar: an objective which my literature search suggests no-one else has attempted with satisfactory results, especially given that most parsing errors are due to lexic erors (e.g. mis-spelling of keywords). Eroor-recovery handlers in parsers are of generally one of two types:-

* Confinement of consequences of an error in a fragment of

code to that section so that parsing of previous and

subsequent code is unaffected; and

* Correction (by substituting a legal likely alternative)

which allows _a_ parse (though maybe not the intended

parse) of the section of code containing the error.

It is not clear which of these applies to MUG1.

The attribute-handler generator produces semantic analysers that satisfy well-formedness conditions /K0S7JL/ which allow evaluation of the attributes in a single left-to-right top-down pass: this avoids feed-back problems. The well-formedness is checked (the test being only of linear complexity) and then converted into a "head grammar" /WAT74/. The left-to-rightness allows the attribute evaluation to be . inter-leaved ,334

with the syntax analysis in a single pass over the program:

this has the advantage that a parse tree does not need to be

explicitly built to carry information from a syntax pass to

one or more separate semantics passes, because at any time

the path from the current position in the conceptual parse-

• tree to its root will be explicit in the stack of the parser.

The top-downness of the evaluation requires the bottom-up

parser to simulate top-downness where inherited attributes

are transferred: the transform is performed by "the generator"

(sic) (parser-generator? attribute-handler generator?).

The descriptions of the semantic actions for a syntactic

production are embedded in that production and are written

in PS440:

conditional_statement: ifsymbol, call generate_label (labell), call generate_label (label2), condition, call generate_jump_on_false(labell), thensymbol, truepart, call generate_jump(label2), elsesymbol, call define-label(labell), f alsepart, call define_label(label2), fi symbol.

Presumably the "call"ed routines are written in PS440: so down

to an intermediate-language level the produced compiler is a

generated compiler plus hand-written semantic generators, which

will simply be copied and called from appropriate points in the

generated parser. (This part of MUG1 is a good example of the

"parser generator with facilities for inserting hand-written

semantic generators" that I rejected as not being a compiler-

compiler in section 2.3.6 (page 43). X cannot imagine how one could remove handwritten inserts totally from the semantic

generators: at a very deep level it seems one must write

fundamental routines in something resembling either a machine

language or orthodox systems programming language.)

/GAN76/ gives a table listing the kinds of changes possible to

the inputs:

Character sets and symbol definitions attribution and underlying CF grammar attribution without changes in the CF grammar semantic actions machine description

Hopefully this gives us a complete list of the inputs. The

stress on the kind of change is that for each kind of change

it is only necessary to re-generate one or two modules of

the output compiler. Presumably this implies the cost of

generating a module was high. There are other hints that a

very slow machine (even for the time) was being used.

A/IL76/ gives an extensive analysis of the kinds of advantages

a compiler compiler system can have, and who for. He then

analyses how they relate to MUG1, and also the relationships

and tradeoffs between possible advantages.

A.2.4 MUG2

MUG2 /GAN77/ appears to differ from MUG1 in that a new

attribute-handler generator has been written which is

functionally equivalent save that the produced attribute- handler is now multi-pass: this-(presumably) because of the

inherent inabilities of one-pass code generation due to ,336

being unable to modify what is being generated now according

to what comes later (for a discussion of such problems see

/GIE78? ).

The HUG2 description adds a few words on "describing code generation":-

"First code functions are ... associated with the operators in the tree part of the string-^to-tree grammar" (viz. in the syntax augmented with descrip- tions of tree-nodes to be generated, one per production) ... "They specify the case analysis which, depending on the local attribute environment, decides on the sequence of instructions to be emitted. These instruc- tions, being zero-address instructions for a special 3-stack machine, are then mapped into the desired intermediate code of a desired structure. The mapping and the specification of the structure form the second part of the description"... "In the future MUC-2 will include a module that generates an 'optimising' machine code generator from normal descriptions of target machine and source language elements."

Here the features of interest are that:—

(1) having obtained a tree, MUG2 can be given a description of transforms of the attributed parse-tree that achieve optimisations (including look-ahead optimisations) and decorate the tree with information for use in later optimisations. This is a major step beyond the "semantic inserts" technique, permitting clearer, easier-to-write descriptions of code generation. Regrettably the language of these transforms is rather idiosyncratic:-

    cvar of id    := (dn of id);
    cvar of binop := union(cvar of binop.1, cvar of binop.2);
    cvar of while := cvar of end condition;
    cvar of if    := cvar of condition;

though it is hard to see how it could be much improved. (2) They appear to have set another ambitious objective: that MUG2 will supply methods for optimisation on the target machine (presumably the required abstract knowledge - e.g. "anything times zero yields zero" - will be derived from corresponding abstract knowledge in-built into the system relating to their 3-stack machine).

A.3 Other allied work

Most of the remaining known work strongly allied to that of this thesis is already reviewed in /CAT77/: Donegan's CGGL /DON73,79/, /NOO79/; systems of macro-expansion followed by symbolic execution to optimise /CAR77/, /ELS70/, /MAR77/; Newcomer's Template System /NEW75/; Snyder's system of generation driven by tables representing many machine properties /SNY75/, /JOH77,78/; and Weingart's system, which resembles that of Wasilew and Cattell /WEI73/. (The above list updates Cattell's bibliography.) For all of these, there is little point in duplicating Cattell's excellent summary.

Accordingly, the following subsections are limited to major allied works not covered in A.1, A.2 or in Cattell. The bibliography includes other works of weaker alliance to this thesis.

A.3.1 Aho/Johnson: configurable optimal code generation

/AHO76/ generalises the Sethi-Ullman method of section C.2 to a much wider class of machines, using an algorithm that configures itself from a description of the instructions of the target machine. The essential restriction of the Sethi-Ullman method - that there be only one instruction that can change main memory - is retained. The method remains time-linear in the length of the expression, but becomes expensive per operator. See also section A.3.6.

A.3.2 Frailey: Optimisation in an IL framework

/FRA71,79/ describe a system for conveying certain aspects of an intermediate language to an IL-to-IL optimiser. Each

IL instruction is identified by an operator, and the number

and kinds of its operands: the kinds of operands include

numbers, identifiers of data locations, identifiers of IL

instructions, etc. Each operand is described by its

"properties": e.g. that it always causes a jump, that it

is an associative operator, that memory operands must be in

registers rather than main memory (or vice-versa) etc.

First the optimiser converts an IL program into a canonical

form (to avoid the subsequent inefficiency in searching for

optimisable patterns), and then optimises the IL. Both the canonicalisation and optimisation depend only on the properties of the operands. The optimiser also gathers some information which it inserts into fields in IL instructions. Especially when the code generator uses this collected information, Frailey claims local optimisation comparable with that of dedicated optimisers specific to both machine and language.
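A minimal sketch (with an invented property table, not Frailey's notation) of canonicalisation driven purely by declared properties: a commutative operator has its operands put into one fixed order, so that the later pattern-search need look for only one form:

    PROPS = {"+": {"commutative"}, "*": {"commutative"}, "-": set()}

    def canonicalise(instr):
        """Put the operands of a commutative operator in one fixed order."""
        op, a, b = instr
        if "commutative" in PROPS[op] and str(b) < str(a):
            a, b = b, a
        return (op, a, b)

    print(canonicalise(("+", "x", 3)))   # ('+', 3, 'x')
    print(canonicalise(("-", "x", 3)))   # ('-', 'x', 3): not commutative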

A.3.3 Fraser: A knowledge-based meta-code generator

/FRA77,77'/ describe a system called XGEN which accepts a machine description and "code generation knowledge", and configures itself into a (saveable) code generator for a predefined ALGOL-like language.

The system is basically a general AI production system, pre-provided with routines to input a machine description akin to that of this thesis and of Cattell, and with generalised productions relating to code generation. New knowledge needed when moving to a new machine is encoded as new productions. For example, moving to a machine with index registers, XGEN would need a rule encoding:

"If compiling some operation with a subscript that is not in an index register, then allocate one, load it with the subscript, and change the original instruction to reference it."

provided a rule had already been given to identify "an index

register".

Fraser observes that each new machine will generally be able to

take advantage of rules from prior machines. His first 3

implementations took 30 rules, 10 more rules, 6 more rules:

in each implementation the rules took an "average" of around

two man-days to write.

The pilot system took 55K words on a PDP-10 (big) and compiled

one output instruction a second (too slow). Fraser believes

that by sacrificing unused flexibility this can be made practical.

If this is so, it would be a revolutionary advance.

A.3.4 Glanville: instruction choice by parsing an ambiguous language

/GLA77/, /GRA78/ describe a system in which a parse tree is passed to the code generator (in lieu of an IL) as a forward polish expression in an absolute notation (i.e., as mentioned in section 6.5, pages 198 and 199, in a linearised form of MDL). Each instruction of the target machine is regarded as a "production" of a grammar, with the result as the "goal" of the "production". The set of instructions thus defines a grammar, which for most machines is highly ambiguous.

The productions applied to parse a forward polish text, and the order of application, give the corresponding sequence of instructions: this target machine instruction sequence is a translation of the forward polish.

The ambiguity of the language leads to alternative parses for most forward polish texts: these correspond to alternative translations. To produce the optimal translation (or a good one) the parser must pick the "best" parse.

Glanville uses the method of /AHO75/ to pre-process the instructions to give an order in which they are to be tried, i.e. "best" instructions first. (This roughly corresponds to the Wasilew/Cattell method preferring "biggest" instructions.) The tables of productions are separated according to the "main operator" of the production (rather like the two-level indexing of section 7.2, page 281) to increase efficiency.

This gives a deterministic backtrack-free parse which yields a good translation if any translation exists (but not necessarily the best translation).

Glanville's algorithm will give incorrect parses under circumstances which could arise for certain rather strange instruction sets. He therefore gives an algorithm to test for those circumstances.

He finally extends this method in practical ad-hoc ways to permit common subexpressions to be specified by the preceding pass, and also to perform register allocation for temporaries needed by the chosen instructions.

In summary, because of the use of assignment to induce the productions, Glanville's method is more like the method of chapters 6 & 7 than like the Cattell/Wasilew method.
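A minimal sketch of the idea, under strong assumptions: a hypothetical three-instruction machine, productions ordered "best first", and a recursive matcher standing in for Glanville's table-driven parser (which is genuinely backtrack-free):

    # productions, best first: (operator or None, rhs symbols, instruction)
    PRODUCTIONS = [
        ("+", ("r", "m"), "ADD  R,{m}"),   # r := r + m
        ("*", ("r", "m"), "MUL  R,{m}"),   # r := r * m
        (None, ("m",),    "LOAD R,{m}"),   # r := m
    ]

    def parse_r(toks, i, code):
        """Derive nonterminal r from toks[i:]; return next index or None."""
        for op, rhs, instr in PRODUCTIONS:
            j, m, mark = i, None, len(code)
            if op is not None:
                if j >= len(toks) or toks[j] != op:
                    continue
                j += 1
            for sym in rhs:
                if sym == "r":
                    j = parse_r(toks, j, code)
                else:                      # "m": any memory operand
                    if j is not None and j < len(toks) and toks[j].isalpha():
                        m, j = toks[j], j + 1
                    else:
                        j = None
                if j is None:
                    break
            if j is None:
                del code[mark:]            # undo a failed alternative
                continue
            code.append(instr.format(m=m))
            return j
        return None

    code = []
    parse_r("+ * a b c".split(), 0, code)  # (a*b)+c in forward polish
    print(code)   # ['LOAD R,a', 'MUL  R,b', 'ADD  R,c']

For the forward polish "+ * a b c" the first matching productions give LOAD, MUL, ADD; the instructions come out in execution order because each sub-derivation completes before its superior's instruction is appended.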

A.3.5 Goguen: analysis of IL to implement Wilcox's method

/WIL71/ gives a method for creating a code generator based on piece-wise macro expansion from IL. A set of translation proformae are provided: each consists of a source IL pattern and a target machine code pattern: how an instance of the source pattern meets the allowed variation in the pattern determines how the target pattern will be filled in.

Translation then consists of searching for instances of source patterns, and transforming them. (A transformed pattern, or part of it, can be re-transformed if it satisfies or part-satisfies another source pattern.)

There are three problems here: efficiency of the searching; guaranteeing completeness (that is, that every valid IL sequence will ultimately transform to machine code, no matter what order candidate transforms are applied at any time); and creating the transforms from descriptions of the IL and of the target machine instructions.

Goguen attempts the last problem, and to a lesser extent the second. In many respects his system is a symbolic-execution form of Cattell's SELECT/SEARCH, or the method of section 6.7.4 (page 250) /GOG78/.

Much more importantly, Goguen is one of the few authors to

consider what an IL must be and can be for a particular

application. He makes many observations, but comes to no

clear conclusions. This work could be most valuable as a

starting point for other researches into ILs.

A.3.6 Kameda and Abdel-Wahab: beyond Sethi-Ullman register allocation

The Sethi-Ullman algorithm (section C.2) assumes that the only instruction that modifies memory is the STORE instruction.

/KAM77/ generalises the results to allow multiple register-to-memory instructions, and also memory-to-memory instructions.

The results are much less decisive than those of Sethi-Ullman.

A.3.7 Sites: Machine-independent storage allocation

Universal P-code /SIT79/ is the stack-based IL of the UCSD Pascal compiler. /SIT79/ discusses storage allocation (not merely register allocation) for P-code in a manner determined by a description of the target machine's memory hierarchy. The method is not original, but getting it to work in this machine-independent manner is.

The allocator accepts a set of variable names, their live

ranges, expected frequencies of use, and remaps these logical

names into a smaller set of physical names such that most-

used logical names fall into the physical names corresponding

to fastest memory classes. This is done before final instruction

choice: this requires that the fast registers be pre-partitioned,

some being reserved for possible temporaries introduced by

instruction choice.

The method is essentially one of ranking. The paper gives a good illustration of the kinds of policy decisions that

must be made by an implementor with respect to various

trade-offs, for example on machines with addresses of differing

lengths, of size against speed, etc. The special problems

due to prior name-equivalences (e.g. due to FORTRAN EQUIVALENCE

statements, or Pascal variant records) are briefly discussed.
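A minimal sketch of the ranking idea (hypothetical names, frequencies and memory classes; the live-range analysis that lets names with disjoint ranges share one physical name is omitted):

    # (logical name, expected frequency of use)
    logicals = [("i", 900), ("sum", 850), ("tmp", 40), ("buf", 5)]
    # physical name classes, fastest first: a few registers, then main memory
    classes = [("reg", 2), ("mem", 10**6)]

    def allocate(logicals, classes):
        """Map the most-used logical names onto the fastest physical names."""
        ranked = sorted(logicals, key=lambda nf: -nf[1])
        slots = ((cls, k) for cls, n in classes for k in range(n))
        return {name: "%s%d" % next(slots) for name, _ in ranked}

    print(allocate(logicals, classes))
    # {'i': 'reg0', 'sum': 'reg1', 'tmp': 'mem0', 'buf': 'mem1'}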

A.3.8 Terman: a configurable secondary code generator

/TER78/ proposes a secondary code generator that inputs

descriptions of the IL and target machine instructions each

annotated by attributes: each must have concepts such as

flow of control with conditional transfers, typed value

storage, values that may or may not signify addresses, and there must be literals. The translation is essentially a minimisation process - preserving the attributes, using macro-to-macro transformations subject to conditions. Conceptually, the system is similar to that of Goguen & Wilcox, except that the "framework" of the IL and machine code descriptions is much more predefined, so that more detailed abstract knowledge on code generation can be built into the meta-code-generator, thus making it more powerful (cf. Fraser).

This, together with the separation of the descriptions of intermediate language and machine (as opposed to their use together), was the major goal.

This is an important paper, and deserves to be better known.

APPENDIX C

TWO OPTIMAL CODE GENERATION ALGORITHMS

C.l INTRODUCTION

There are two long-known optimal code generation algorithms for restricted parts of the whole problem that had major influence on

the design of the method presented in chapters 6 and 7. They are given here in sufficient detail to support the reading of those chapters. Each is followed by some remarks.

C.2 THE SETHI-ULLMAN METHOD /SET70/

This method is for optimal code generation of arithmetic expressions from the parse tree of the expression, for a simple register-based target machine. The method is fast in the sense that it takes time proportional to the length of the expression. Follow-up work tried to make the method more realistic, by introducing common sub-expressions /AHO76/ and more complex instruction sets: both of these follow-ups demonstrated exponentially-bounded search methods backed by proofs that the problems were at least NP-complete. This led ultimately to a paper entitled "How hard is Compiler Code Generation?" /AHO77/ - the conclusion being that it is, in general, very hard, if you demand optimal code (rather than moderately optimised code) be produced. Apart from detail lessons from the Sethi-Ullman method, the net effect on the present work was to motivate attempting to push as much of the hardness as possible from code-generation time (many times: expensive) to code-generator fabrication time (once: relatively cheap).

C.2.1 The method.

The method derives ultimately from an observation by Floyd /FLO61/

that if a code generator is short of registers, it is better to

compile the right hand side of an expression before the left hand

side.

The method compiles for a target machine with four kinds of

instruction, described here in the chapter 7 variant of MDL:

    M          R     (LOAD)
    R          M     (STORE)
    R ° M      R     (MEMORY OPERATION)
    R1 ° R2    R1    (REGISTER OPERATION)

Main memory (M) is plentiful: registers (R, R1, R2) are scarce. The

degree sign (°) in the MDL represents 'any diadic operator': the

expression to be compiled contains only operators for which the machine has both a memory operation and a register operation. No

algebraic property is assumed of any operator: no common sub-

expressions are optimised: within those restrictions the method is

optimal.

The following kinds of instructions are not assumed:

    M ° R      R     (REVERSE MEMORY OPERATION)
    R1 ° R2    R2    (REVERSE REGISTER OPERATION)

(It is the absence of the reverse operations that leads to the asymmetry observed by Floyd.)

The original paper presents a two-pass algorithm: the first pass labels each node with the number of registers needed to evaluate the sub-tree of which the node is the root without any store

instructions; the second pass generates the evaluating code.

The labelling pass needs the labels of the subordinate nodes to

determine the label of a node. It must therefore proceed from

leaves to root.

Notate the label of the current node as Lnode, and of its subordinates Lleft and Lright (if it is not a leaf). Notate the number of registers as NR. The five following paragraphs derive the labelling rules for the five cases of node, when there are sufficient registers, from the code to be produced in each case.

To evaluate a LEFT LEAF (a leaf that is the left subordinate of its

superior) needs a single load instruction to a register. This needs

one register, so the label of a left leaf is one.

For a BALANCED NODE, Lleft = Lright: the evaluation is first to evaluate the left subtree to a register Rleft (needing Lleft registers), then, leaving the left subresult there, to evaluate the right subtree to a register Rright (needing Lright registers, plus the one holding the left subresult), then to obey the register operation Rleft ° Rright giving Rleft. The peak register usage is that during the evaluation of the right subtree: so the label of a balanced node is Lnode = Lleft + 1 = Lright + 1.

For an UNBALANCED NODE with Lleft > Lright, evaluation is as for a balanced node, except that during the respective subtree evaluations the peak register usages are Lleft and Lright+1. Since Lleft > Lright, Lleft >= Lright+1: so the label of the node is Lleft.

For an UNBALANCED NODE with Lleft < Lright, the evaluation of the subtrees is reversed. During the subtree evaluations the peak register usages are Lright and Lleft+1, and as Lright > Lleft, the peak usage is Lright. The two cases of unbalanced node need a label Lnode = the higher of Lleft and Lright.

A RIGHT LEAF does not need to be evaluated: the evaluation of its superior is to evaluate the left subtree (needing at least one register) into a register Rsuperior, then to give the memory operation Rsuperior ° Mrightleaf giving Rsuperior. A label of zero at the right leaf preserves the above rule for labelling the superior node (which must now be unbalanced, as its Lleft must be at least one) but introduces a third case of evaluation at an unbalanced node.

The above evaluation rules need one operation instruction at each

non-leaf, and one load at each left leaf. The above evaluations

fail only in one case: a node of which both subordinates have label

greater than or equal to the number of registers, NR. Sethi and

Ullman call such a node MAJOR. (If one subordinate has label

greater than NR and the other subordinate has label less than NR, then the normal rules can be applied at the node as for an unbalanced node,

but not at the major node - or nodes - in the subtree of the sub-

ordinate with label exceeding NR.)

At a MAJOR NODE the generated evaluation code evaluates the right subordinate subtree to a register and stores that to a temporary location Mtemp in main memory, evaluates the left subtree to a register Rleft, then obeys the memory operation Rleft ° Mtemp giving Rleft. Thus, for a major node, there must be the operation and also one STORE.

Thus the whole method takes one LOAD per left leaf, one operation

instruction per non-leaf node, and one STORE per major node. Sethi

and Ullman show that in the simple target machine these are the

minimal numbers of each of the instructions.
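A minimal sketch of the labelling pass (in modern Python, with trees as nested tuples rather than any structure from the original paper); the rules are exactly the five derived above:

    def label(node, is_right_leaf=False):
        """Registers needed to evaluate node without any STOREs.

        A node is a leaf name (str) or a triple (op, left, right).
        Left leaves are labelled 1 and right leaves 0; a balanced
        inner node needs one register more than its subordinates,
        an unbalanced one the higher of the two labels.
        """
        if isinstance(node, str):
            return 0 if is_right_leaf else 1
        _, left, right = node
        l, r = label(left), label(right, is_right_leaf=True)
        return l + 1 if l == r else max(l, r)

    t2 = ("+", ("+", "a", "b"), ("*", "c", "d"))
    t3 = ("+", t2, ("*", ("+", "e", "f"), ("-", "g", "h")))
    print(label(t2), label(t3))    # 2 3

A node is major exactly when both subordinate labels reach NR; counting such nodes gives the number of STOREs the method will emit.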

C.2.2 Some remarks on the method.

The important feature of the parse tree of an arithmetic

expression is that it is also the data-flow tree. The operator represented by ° in the last subsection does not need to be arithmetic: it could be "is assigned to" if the destination of the assignment involves calculating an address, and that demands registers.

The method as presented by Sethi and Ullman is oriented to the facility of the optimality proof. They do not discuss practical matters of its implementation. The method does, however, show that instruction choice and register allocation are inextricably intertwined. (On the other hand, register assignment is not: the above description avoids register assignment. Sethi and Ullman attempt a stack-like register allocation and so introduce a bug, in that they need a reverse register operate. One simple register assignment that suffices for their instruction choice/register allocate method is "pick any register that is free".)

One implementation matter is how to avoid two passes. This can be simply achieved by replacing each subtree by its compilation and recording how many registers that requires; at inner nodes the label at the node is evaluated, and the subsequences of the subtrees placed in order with the instructions of the node itself (the operate and, at major nodes, the STORE) only after the kind of node has been determined.

Sethi and Ullman also omit to point out that their method generates an optimal instruction sequence, but not the only one for many trees.

For example, consider the simplest tree with label 3 at its root, for two registers. The STORE can be used to "lop" the branch to the right subordinate node of the major node (the root), as Sethi and Ullman suggest: but a lop of the second-level rightmost branch also introduces only one STORE. The two diagrams below illustrate these two loppings of the tree. In each of the diagrams only the label is given at a node: for this purpose the operation is irrelevant. In the lopped trees, the stub is labelled zero, and the cut branches from the stubs to the nodes lopped off at those points are dotted.

[Diagram: the two loppings of the label-3 tree, with only the labels shown at the nodes.]

Turning these trees on their sides so that they are read as left-to- right data-flow trees, and omitting the right leaves, gives the following two diagrams, in which a solid line represents a register in use.

[Diagram: the two lopped trees turned on their sides as left-to-right data-flow bands; a solid line represents a register in use.]

Within each connected part of the tree: (1) the nodes appear left-to-right in the order they must be evaluated or (for trees representing more general situations than expression evaluation) in which they must be executed; and (2) a vertical section through the connected part

gives the number of registers in use at that point. Thinking of

a connected instruction-ordered part of the tree as a "band", the

observation can be rephrased as "the maximum bandwidth is the

number of registers". An algorithm which gives either of the above

two divisions of the tree just splits it into the fewest possible

bands of maximum width. Under this re-presentation, the Sethi-Ullman method is much more "obvious".(1)

Instruction ordering within a band is prescribed by the shape

of the band only: for example

[Diagram: three differently-shaped bands, all of width 2.]

Instruction ordering between bands is limited only by the required data-flow. For example, cutting the label-3 tree into bands of width one for a one-register machine can give only:

[Diagram: four width-one bands a, b, c and d cut from the label-3 tree.]

in which d must come before c before a, and b before a, so the three possible orders are bdca, dbca, and dcba.

(1) The two different loppings of the last page correspond to starting from the leaves (left hand diagrams) and starting from the root until the bandwidth is exceeded (right hand diagrams). Either variant, or hybrids of the two, will produce optimal target machine code.
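The three orders can be checked mechanically; a minimal sketch (modern Python, with the band names a to d as in the example):

    from itertools import permutations

    constraints = [("d", "c"), ("c", "a"), ("b", "a")]   # (earlier, later)
    orders = ["".join(p) for p in permutations("abcd")
              if all(p.index(x) < p.index(y) for x, y in constraints)]
    print(orders)    # ['bdca', 'dbca', 'dcba']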

C.3 THE HORWITZ-KARP-MILLER-WINOGRAD METHOD /HOR66/

This method is for allocation of registers in a sequence of already-

chosen machine instructions, the order of which is not to be

changed. Where the number of registers exceeds that allowed on the

machine, store/load sequences must be introduced: a rather different

model than that assumed by Sethi and Ullman. Register assignment

is a side-effect: with Sethi and Ullman it was a separate issue.

The method (from here on, called the "HKMW method") starts by

considering all possible allocation/assignment combinations - the

number of them usually being exponential in the length of the

instruction sequence - and then applies rules to eliminate non-optimal combinations, or to eliminate a combination if there is another of the same cost, until only a few equal-cost options remain: one is then arbitrarily chosen. The example used in the original HKMW

paper will be used: with four virtual registers, it is assumed that

they are used in the order 1* 2* 3 1 2 2* 4, and this must be

reduced to two real registers. The asterisks denote modifying use,

so that if a virtual register is moved out of real registers after

a modification, it must be stored: otherwise the real register can

be overwritten, and later uses re-load from a main memory back-up

location.

C.3.1 The method.

The method starts by assuming that every virtual register contains

a required value, and is held in a main memory backup location.

The first action, therefore, is to load the first-needed virtual

register (number 1) to a real register. It is then used and modified,

so if it ceases to be in a register later, it must first be stored,

at a cost of one STORE instruction. The next instruction uses

register 2: there are two possibilities - to store virtual register one out and re-use its real register, or to use the other real register.

The following diagram illustrates the possibilities. At each node there are three digits: the virtual register in the first real register; the virtual in the second real; and the cost, by the cheapest route to that node, in LOAD and STORE instructions. When a real register does not contain a virtual register it is marked by a hyphen. Virtual registers that have been modified are underlined. A branch to a node that is not on an optimal route to the node is dotted; optimal routes are solid.

[Diagram: the net of possible combinations for the example, one column of nodes per step, each node showing the two resident virtual registers and the cheapest cost so far; a final row adds the stores of still-modified virtuals.]

(HKMW neglect the final storing out, which makes routes to the step-7 node 246 the optimal ones).

HKMW provide six rules to be applied as the above is built step-by-step, which either prevent nodes being built, or detect that a node at the prior step cannot be on an optimal route, and so can be back-pruned out, together with its predecessors if they only lead to the back-pruned dead end.

The rules are as follows.

RULE 1. If a node at step i includes the register used in the next step, the only descendant of the node is to be identical to the node. Otherwise, generate one descendant per real register, these being successively the replacements of each register of the node at step i by the register next used. For example, in two real registers, from a node of virtual registers 12 with the next step using virtual register 3, generate 32 and 13.

This rule, of generating only minimal-change branches, has already

been incorporated in the diagram on the last page. (In that diagram

permutations of the virtuals in the reals have also been regarded

as giving the same node.)

Notate the cost of getting from the origin to a node n as W(n), and the cost of getting from node n1 to node n2 by a minimal set of STORE and LOAD instructions as w(n1,n2).

RULE 2. If n1 and n2 are nodes at the same step, and W(n1)+w(n1,n2) <= W(n2), then eliminate n2 and branches to it. The example becomes:

[Diagram: the net of combinations after applying rules 1 and 2.]

RULE 3 eliminates only nodes that would be eliminated by rule 4

one step earlier, and so rule 3 can be omitted.

Rules 3 and 4 are both phrased in terms of two DISTANCE FUNCTIONS

defined as follows:

    d(i,x)  = j-i, where step j is the first after step i in which either x or x* appears in the program.
    d*(i,x) = j-i, where step j is the first after step i in which x* appears.

In both definitions, it is assumed that there is a step later on

in which all registers appear as "to be cleared", so that j is

limited to at most the number of that step.
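A minimal sketch of the two distance functions (with the program represented as a list of (virtual register, modifies?) pairs, an assumed encoding; indices are 0-based where the text counts steps from 1):

    STEPS = [(1, True), (2, True), (3, False), (1, False),
             (2, False), (2, True), (4, False)]     # 1* 2* 3 1 2 2* 4

    def d(i, x, steps=STEPS):
        """Distance from step i to the next use of x (x or x*)."""
        for j in range(i + 1, len(steps)):
            if steps[j][0] == x:
                return j - i
        return len(steps) - i        # the assumed final clear-all step

    def d_star(i, x, steps=STEPS):
        """Distance from step i to the next modifying use x*."""
        for j in range(i + 1, len(steps)):
            if steps[j] == (x, True):
                return j - i
        return len(steps) - i

    print(d(2, 1), d_star(2, 1))     # 1 5: after step 3, virtual 1 is
                                     # used at the very next step, but
                                     # is never modified again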

RULE 4 forbids the formation of a node at step i+1 which uses x or x* if there was a node n at step i which contains neither x nor x*, and if any of the following conditions holds:

1) n contains x1 and x2, x1 is replaced by x or x* at step i+1, and either d(i,x1) < d(i,x2), or d(i,x1) = d(i,x2) and x1 < x2; or

2) n contains x1* and x2, x1 is replaced by x or x* at step i+1, and d*(i,x1) <= d(i,x2); or

3) n contains x1* and x2*, x1 is replaced by x or x* at step i+1, and either d*(i,x1) < d*(i,x2), or d(i,x1) = d(i,x2) and x1 < x2.

The equality subcases for all cases are in fact tests that neither x1 nor x2 occurs again, so one can be arbitrarily stored out, and that should be indicated by not forming one of the two nodes. The x1 < x2 tests could be reversed to prevent formation of the other node, and so to force storing of the other unneeded register.

Observe the presumption that only one register should be stored

out at each step, even when it is known that others could also be stored out now, cutting combinatorial explosions of the numbers

of nodes later. Finally: observe that the HKMW definitions of the

distance functions in terms of "j-i" could equally well have

been in terms of j only, since in each of the five relations the

i on either side of the relation is the same: and so it is in

turn clear that each of the five relations is a test of which

of x1 (or x1*) and x2 occurs sooner in the program, if at all.)

In the example, rule 4 prevents the 1-6 and 136 nodes at step 4, and the 236 node (and thus its descendant) at step 5. This means that the set of combinations formed on a forward pass, with look-ahead on the program, is as follows.

[Diagram: the combinations formed on the forward pass with look-ahead, one column per step, with costs as before.]

Rules 5 and 6 are the two obvious ones: they are delayed until last

because rules 1, 2, (3) and 4 reveal more opportunities for them (and

rule 5 can reveal opportunities for rule 6). To cut costs in

the other rules, an implementation could try rules 5 and 6 after

each application of the other rules.

RULE 5. If there are two branches into a node with the same cost

at the node, then eliminate one of the two branches.

RULE 6. If a node has no descendants, eliminate it.

Rule 6 is the only one that goes backwards in the net of possible combinations. Rule 5 makes the arbitrary choices remaining after the two arbitrary subcases of rule 4.

The HKMW paper proves that if the rules are applied in order at each step, or alternatively if the whole net is built according to rule 1 and the other rules are then applied in order over the whole net, then the one series of combinations that remains is optimal.
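Since the rules only ever discard provably non-cheaper routes, an exhaustive version of the net gives the same optimal cost. A minimal sketch (modern Python, an assumed encoding of the example; a cheapest-cost-per-node table does duty for rules 2 and 5, and the look-ahead rules 3 and 4 are omitted):

    USES = [(1, True), (2, True), (3, False), (1, False),
            (2, False), (2, True), (4, False)]      # 1* 2* 3 1 2 2* 4
    NREGS = 2

    def step(states, use):
        """states: {frozenset of (virtual, modified?): cheapest cost}."""
        v, modifies = use
        nxt = {}
        for held, cost in states.items():
            if v in {r for r, _ in held}:            # already resident
                succs = [(frozenset((r, m or (modifies and r == v))
                                    for r, m in held), cost)]
            elif len(held) < NREGS:                  # a free register: LOAD
                succs = [(held | {(v, modifies)}, cost + 1)]
            else:                                    # rule 1: evict each in turn
                succs = [((held - {(victim, vmod)}) | {(v, modifies)},
                          cost + 1 + (1 if vmod else 0))  # LOAD, + STORE if dirty
                         for victim, vmod in held]
            for h, c in succs:
                if c < nxt.get(h, 10**9):
                    nxt[h] = c
        return nxt

    states = {frozenset(): 0}
    for use in USES:
        states = step(states, use)
    # the final clear-all step stores any still-modified virtuals
    print(min(c + sum(m for _, m in h) for h, c in states.items()))   # 7

For the example this prints 7, agreeing with the optimal routes noted above once the final storing out is included.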

C.3.2 Some remarks on the method.

The observation made after rule 4 above, that the HKMW method loads registers as late as possible, and then STORES THEM OUT ONLY WHEN FORCED - i.e. also as late as possible - contravenes the usual compiler-writers' heuristic that loads should be as late as possible but STORES SHOULD BE AS SOON AS POSSIBLE. This heuristic applied to the HKMW example is summarised in the table below: an "L" between steps signifies a load; a single line represents that presence is desirable but there has been no modification, so no store is needed; a letter "U" signifies a non-modifying use, an "M" a modifying use; after a modification a double line signifies that before the register ceases to be in use a store is needed, and an "S" signifies a store. The diagram depicts the case where there are sufficient real registers.

[Table: the weak form of the heuristic applied to the example; one row per virtual register, one column per step 1-7, showing loads (L), uses (U), modifications (M) and stores (S).]

In the diagram, it is clear that the only problem in reducing the

actual number of registers to two is to decide whether to cut into the use of register 1 with a store and a load, or to do the same with register 2, in order to make real-room for virtual register 3. Looking ahead for register 1, it can be seen that a store and a load around step 3 will avoid the need for the store after step 4: that is, it will cost one extra instruction. With a cut of

the allocation of virtual register 2 around step 3, due to the

later modification of register 2 the second store is still needed,

so the cost of the cut is two extra instructions. Thus register

1 should be cut, being the cheaper. The combinations of real

registers in store should be:

[Table: the step-by-step combinations of virtual registers in the real registers under the heuristic, with the cumulative cost in extra loads and stores.]

If the heuristic is carried to its logical conclusion, and stores are

brought forward to just after the last modification, rather than the

last use, the diagram becomes the following:

[Table: the strong form of the heuristic, with stores brought forward to just after the last modification.]

The difference between the two forms of the heuristic, from the implementation point of view, is that in the weak form, for the section of code being considered, one knows only the last appearance of a register; whereas in the strong form one knows both the last use and the last modifying use. Both forms of the heuristic can be achieved in two passes, so the strong form costs little more than the weak form:

PASS 1. Search ahead for the end of the linear section

of code. At all times, for each virtual register appearing, keep the step number at which it most recently appeared, and the step number at which it was last modified.

PASS 2. Pass over the program using the information

from the first pass to decide which allocations to break when

there is a conflict.
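A minimal sketch of pass 1 (same assumed encoding as before); pass 2 would consult these tables at each conflict:

    def pass1(steps):
        """Last appearance and last modifying appearance per virtual."""
        last_use, last_mod = {}, {}
        for i, (v, modifies) in enumerate(steps):
            last_use[v] = i
            if modifies:
                last_mod[v] = i
        return last_use, last_mod

    steps = [(1, True), (2, True), (3, False), (1, False),
             (2, False), (2, True), (4, False)]
    last_use, last_mod = pass1(steps)
    # breaking virtual 1 at step 3 costs one load only, since its last
    # modification (step 1) is already past; breaking virtual 2 there
    # costs a store and a load, since it is modified again at step 6.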

Unfortunately, the obvious criteria for the allocation-breaking choices - to break U—U or U—M or M—U (at cost one load) rather than M==M (at cost a store and a load), and otherwise to make arbitrary

choices, are not optimal in cases such as the following programs.

[Diagrams: an 11-step program using registers 1-7 and a 9-step program using registers 1-3, with modified spans (M======M) and non-modifying uses (U) laid out per step.]

In either of these examples, breaking the allocation of register

2 at step 3 ultimately costs 3 extra instructions (at least) but

breaking the allocation of register 1 only costs two extra

instructions. Using the diagrams, the HKMW rule 4 can be given

a simple statement forbidding bifurcations of the HKMW net:

for example case 1 of rule 4 relates to the following form of allocation diagram:

    1 ----?
    2 ----?      <- there is a shortage of real registers at this point

and the rule then says that at a node in the HKMW net which contains

both virtual register 1 and virtual register 2 a descendant should

not be formed in which virtual register 1 is replaced as register

1 is used or modified before register 2. Using a question mark

again to denote "U" or "M", and also using "==(U)==M" to denote "modified, and maybe used in the meantime", cases 2 and 3 of HKMW rule 4 respectively say that, at the forms of allocation diagram:

    1 ======(U)====M
    2 ----?                  <- shortage

    1 ======(U)====M
    2 ======?                <- shortage

that a bifurcation from a node containing registers 1 and 2 is

forbidden at the shortage of real registers, in that a node in

which 1 is replaced is not to be generated.

From the above diagrams it can now be seen that there are only two

forms of allocation diagram corresponding to cases at which HKMW

rule 4 allows bifurcations of the net of possible combinations:

    1 ======(U)====M        1 ======(U)====M
    2 ----?...        and   2 ======U==...


This being so, from the allocation diagrams of the example 9-step and 11-step programs (two pages back) the HKMW diagrams using rules

1 and 4 are the following.

[Diagrams: the HKMW nets for the 9-step and 11-step example programs, built using rules 1 and 4 only; dotted branches and arrows are explained below.]

In the above diagrams, the dotted lines are branches that would be eliminated by rule 5 (expensive ways of getting to their destinations), and the arrows (pointing at the 13 combination at step 5 of the 9-step program and the 14 combination at step 7 of the 11-step program) indicate the two nodes that are eliminated by rule 2 (so that their successors are not formed, and rule 6 eliminates their predecessors back to the preceding bifurcation).

The important feature of the foregoing development of the HKMW method is that the method becomes practicable: even if the instruction-order cannot be changed (unlike in the Sethi-Ullman problem), combinatorial explosion of the possibilities is much constrained. The only information used beyond that of the HKMW method is cheaply available compared to the cost of computing the HKMW distance functions. The method can therefore be practicable when applied to "real" programs

containing non-linear structures (loops and procedure calls) for

analysis of the usage of pages in a virtual memory system (i.e.

reading 'page' instead of 'register' in all the above) or for

shadowing variables with real registers (reading 'variable' in

place of 'virtual register').

Such looping programs need a whole new theory. Present heuristics

are to find an allocation that has the same end and start condition.

The simplest such heuristic is to take the program of the loop

and lay it repeatedly end-on to itself, and allocate this "unrolled"

program: if at the end-of-loop points an allocation condition

appears every time, it is used: starting and end conditions are then

handled separately. Such heuristics presume that most loops are

executed enough times that inefficiency initialising for, and

finalising after, a loop is less important than efficiency in the

loop.

APPENDIX D

EXTENDED EXAMPLES OF PRIMARY CODE GENERATION.

This appendix presents two CF1-generated programs, which

are together an early and incomplete version of a translator

from Simula to a high-level intermediate language (HLIL).

For this first major application of CF1 an attempt has

been made to avoid hand-written code in the CF1-generated

programs (apart from routines from the standard library).

The intent is that a program entirely written by hand will

perform all the processing of names, scopes, etc. needed for

translation of Simula to (normal-level) IL. To simplify that

program as much as possible, it is highly desirable that the

input to it be in a linear form with an elementary syntax,

as opposed to the nested complex syntax of Simula. This simple

linear form is the HLIL. To process a sequence of declarations

in Simula requires two passes to avoid references-ahead: this

double passing has already been performed prior to HLIL by the

CF1-generated programs. Thus the hand-written HLIL-to-(normal)-IL program need only perform a single forward pass. Various other devices are incorporated in the CF1-generated programs to make the HLIL easier to process: for

example duplicating names at the end of the HLIL of a Simula

construct whereas the name only appears once at the start of

the original Simula construct.

It is possible to write a valid syntax for Simula in the CF1

extended-LL(1) notation. However, the possibility of having

"empty statements" in Simula much complicates the syntax, and

makes the resulting parses hostile to transform-based trans-

lation. (This would not be the case in LL(2).) The first

CF1 program presented here just removes empty statements; the

second is then the translator for a variant of Simula without

this syntactic problem.

There are two cases of empty statement to be dealt with: between

two semicolons; and after a semicolon and before an end symbol.

The preprocessor (which is called SIMPRE) defines an OB (object)

to be any valid Simula lexeme other than begin, end or semicolon,

or to be a begin...end enclosure of an optional serial. It defines

a UNIT to be a list of OBs: that is, of lexemes other than

semicolon and of begin...end enclosures. Every unit is

re-output identically, followed by a semicolon: if there was an

input semicolon after the UNIT then it is ignored. Other

input semicolons are ignored.

There is another case of an empty statement: a begin...end

enclosure which encloses nothing (other than the empty statement).

This does not cause the main Simula-to-HLIL translator trouble,

but as an easy optimisation at this stage, if such an empty

enclosure is the only statement in another enclosure, then it is removed.
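A minimal sketch of SIMPRE's effect on semicolons (modern Python over a flat token list; the recursion into begin...end enclosures, which SIMPRE treats as single OBs, is omitted):

    def simpre(tokens):
        """Re-output each non-empty unit followed by one semicolon."""
        out, unit = [], []
        for tok in tokens:
            if tok == ";":
                if unit:                  # end of a real unit
                    out += unit + [";"]
                    unit = []
                # else: an empty statement - the semicolon is ignored
            else:
                unit.append(tok)
        if unit:
            out += unit + [";"]
        return out

    print(" ".join(simpre("x := 1 ; ; y := 2 ; ; ;".split())))
    # x := 1 ; y := 2 ;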

Apart from the escapes (which are of little interest) the following listing of the lexic syntax and transform divisions specifying

SIMPRE should be fairly self-explanatory.

CIMT * : CCHAR «GFT **ZCCHAR» « POFFR PON0 ANY pOFFa ••• : CTEXT ®GET »*ZCTEXTa a floFFfl ROnr < <*ANY-•"•-EOF> 90FF0 * BONA >

• LETTER <»( LETTFR'niGIT!• B •!•)> : ENDFILE s EOF.

®GET *•IREENUULa apUT ••CALL»*STR« apiIT ••CALL*»0T1B »PUT **CALL»*OTL» aptJT **CALL*«G_ERRA apiir **CALL**C'lEKa

PROGRAMaAPnr INIT_IO? INIT.XC.'R I "BF.GIN" ) , [PSERIAL) , CEND'l , {")"), ENDFILE aPHT E'l .CLOSE; FX.CLOSE;® t

PSERIAL = < fFNIP apiIT A_UNIP. T„Z; P :-NONE? OTL(NL ) ; f» «> :

SERIALs < U NIP «> : UNIPS ( UNIT,[";") • »?• ) : units j

ORB C ORX I •(•

! •(• 1 •)"

• "a/a* J "sa" I ">• ) :

ORX' ( "BEGIN", [SERiAL), "END" ! TINT ! CCHAR ! CTEXT • ID ) .

Z(SERIAL!SERIAL UNIP) *> Z(SERIAL) Z(UNIP) : Z( SERIAL • i'NIp) ' «> Z(UNIP) : Z(UN IP J UN IT) B> Z(UNIT) ";?" : ZUINlPiUNIT"*") k> Z(UNIT) •»?• | Z(UNIP!";") B> " • ; Z(UNIT!UNIT OB) s> ZCUNIT) Z(OB) t Z(UN IT!OB) a> Z(OB) t 7.(OB'08X) r> 7(ORX) : Z(OB!"C") e> "I " : Z(OB!"J") «>"!": ZC OB!"(") »>•(": 368 •

ZCOB *> ") " : Z(OB a> "NOT " : ZCOB • *» ) *> • $ « . ZCOB "•") s> " I ZCOB » ) *> mm m . n m « ZCDB t> • . ZCOB *> ZCOB e> m. » . ZCOB t> ": — " : ZCOB ": * ") «> ":* " : ZCOB "<") e> "LT " : ZCOB •as" ) => "EO " : ZCOB •«/a») a> •*/« * j ZCOB 3> »3B M . ZCOB S> •GT • : ZCORXICINT) «> ZXB(CINT) J ZCOHX'CCHAR) => ZCHAR CCCHAR) I ZCOBX'CTEXT) => ZTEXTCCTEXT) : Z(OBX!10) => ZXB(ID) S Z(0BX1"BEGIN END") *> "BEGIN END " t Z(OBX!"BEGIN BEGIN END ENO") => ZCOBX!"BEGIN END"): Z(OBXI"BEGIN BEGIN ENDyEND") s> ZCOBX!"BEGIN ENO"): Z(OBX!"BEGIN"SERIAL"END") s> "BEGIN?" Z(SERIAL) "END " ; Z(OBX!"BEGIN BEGIN"SERIAL"END END") => Z(OBX!"BEGIN"SERIAL"END"): Z(ORXi"BEGIM BEGIN"SERIAL"END;END") => Z(OBX!"BEGIN"StRIAL"END"): Z(SERIAL) = > " * * *BAD SERIAL*** ZCMNIP) = > • *»«raD UN IP *** " Z(UNIT) s> « «$»rad UNIT*** " ZCDR) = > • ***RAD OR*** " : Z(OBX) s> • * * *pAD ORX * * * " !

Used after SIMPRE, the main translator now only has to deal

with a syntax in which statements in lists of statements are

each followed by a semicolon, and other statements are not

followed by a semicolon. The following listing shows a few pages of the full translator listing of an input preprocessed by SIMPRE. The original Simula program is a version of the TEVAL CF1-generated program given as an example in chapter 7.

.EXECUTE SIM24.SIM/CREF<-U) SIMULA: SIM24 ' errors detected: LINK. tm„.? typLoadirrse w hbs* s CLNKXCT SIM24- Execution! I iSfcfile?TTY INPUT FILEJTINY.ISO OUTPUT FILE?TINr.ISl REF ( Outfile ) lisbef f

Svsout . Oufctext < *listfil«?» ) ; ,369

Svsout . Breakouiimaae f

Inimade J

IF Sasin . ImeSe . Strip EQ "TTY" THEN lister Sysout ELSE BEGIN li6ter :- NEW Outfile < Sasin . Image > % lister . Open ( Blanks < 200 > > » END ;

PROCEDURE tree ( b ) i

BOOLEAN b S

BEGIN END i

CLASS C-Prostram f

be6in REF ( c.exps ) a.exps . REF ( c-eF > *.«F i PROCEDURE t.out f BEGIN 0t2 ( "SEARCH TEVAL .NA< a.exps . t-.ou.fc f otZ <, '.STOP END GO * END f REF ( c.prosram ) PROCEDURE p_Prosram }

INSPECT NEW e-prcyfranr DO BEGIN P_proSram I- THIS c.program r loit-io i ini b_xc J e.exps .'— p.exp* ; chek < U3 ) ; t.oub ; en . Clo*« > ex • Close < a.ef r-ef > end ;

CLASS c-6xps ( Ihfl ) r

REF < C-exPS > lhs f

BEGIN REF ( c.exp ) a.exp f PROCEDURE t.eut » BEGIN IF lhs NONE THEN BEGIN a.exp . t.out i ok2 C ".PRINT M » ) I END ELSE IF lhs NONE THEN BEGIN lhs . t.out f a-exp . t.out r ot2 t *.PRINT ** " ' ** > * END ELSE BEGIN sj.err ( "EXPS NOT COVEREJ BY ANY OUT* ) » tree ( FALSE > » END r END ( END r

REF ( c.exps ) PROCEDURE p.exes 1 ,370

BEGIN REF ( c_exj»s ) p ; INSPECT NEW c-exps ( JVONE ) DO BEGIN P «- THIS c_e*ps ) a_exp p.bxp ; END * WHILE xno EQ 1«4 DO BEGIN INSPECT NEW e.exp's ( p ) DO BEGIN P J- THIS c-exps f xirt » juexp r END ! END ? p_exp0 J- p | END >

CLASS c_exp ( lha ) J

REF < c_exp ) lbs ,

The main program is called SIM24 (first part of Simula, edit 24).

The following is a listing of the lexic syntax and transform

divisions of SIM24 as input to CF1: for interest it is followed by

the few small pieces of handwritten code that support it.

cm! PGET ••ZCINTP a ( CCHAR PGET »*ZCCHARP a POFFP ••• PON* ANY POFFP t CTEXT PGET »*ZCTEXTP a POFFP PONP < <*ANY-•••-E0F> POFFP * PONP > :

ID ®GET *»ZIDp a LETTER <•( LETTERIDIGITl • P • I • PM'«.• >> »

ENDFILE a EOF,

PPUT ••CALL»*INTP PPUT **CALL**STRP PPUT ••CALL**0T3P PPUT **CALL**0TLP PPUT **CALL**G—ERRP PPUT **CALL**CHEKP

PROGRAK«PPUT INIT.IO; INIT.XCjP SERIALD, ENDF1LEM, •PUT EN.CLOSE; EN.OPEN(BLANKS(200));P PPUT Kta« ') KIN; XNOt*0; XIN; § SERIALC, ENDFILE*2 PPUT EN.CLOSE; EX.CLOSE;P :

SERIALDa< UNIT PPUT A.UNIT.TJ);PX-NONE;OTL(NL);P *> i

SERIALC»< UNIT PPUT A„UNIT.T_C»PS-NONE;OTL(NL);P *> : 371 •

serial* uhit" C ST I DC ), "J" I st* t("LOOP","J•i"TEST",":")), STL : stl* C "IF", EXP, "THEN", ( STU-THEN, ["ELSE", [SfELSEl] 1 ! STZ*THEN ! "ELSE",[ST*ELSE2] ) • STZ* NONIF ) : stz* ( STJ • STD 1 STI ) : stu» ( STX J STB • "GOTO",("LOOP"!"TEST") ) I std* C "FOR", ID, ":*", EXP*l, "STEP", EXP*2, "UNTIL" • "WHILE" ), EXP*3, "DO", ST J

STI* "INSPECT", EXP, "DO", ST, ["OTHERWISE", ST*OJ t

STX* EXPP, [(":-"!*:*"),EXPJ I STB* "BEGIN", [SERIAL); "END" :

DC* ( "PROCEDURE", DCP ! [("XC"I"XC")1, "CLASS", DCC ! TYPE, { "PROCEDURE",DCP-T ! "ARRAY",DCAS ! IDS ) ) I

HCP* ID, ( ";" I "(",IDS,")",";•,(DCVNSJ,0CSPECS ), ST : DCC* IDXC, ( "?" i "(",IDS,")",";",[DCVLSI,DCSPECS ), DCVTSST X DCVNS* < DCVN *> J DCVN* ("VALUE"!"NAME"), IDS, ">• : DCVLS* < DCVL * > J DCVL* "VALUE", [IDS], "*• * DCSPECS*< OCSPEC * > : DCSPEC* TYPE, (("ARRAY"!"PROCEDURE")], IDS, ")" I DCVTSST*("VIRTUAL","J",DCVTSl, ST : DCVTS* < DCVT •> s DCVT* (TYPE), "PROCEDURE", IDS, ";" t

DCAS* < DCA •",">: DCA* IDS, "[",DCADS,"J" X DCABS «GET **NDCARS« • * < DCAB * "," > s DCAB* EXP"L,":",EXP*U t

TYPE* ("INTEGER"!"CMARACTFR"•"BOOLEAN"!"TEXT"!"REF","(",IDXC,")"):

EXP* ( EXPZ ! "IF", EXP*R, "THEN", EXPZ*T, "ELSE", EXP*E ): EXPZ* < [ "NOT"],EXPR * ("AND"!"OR"!"EQV") >: EXPR* EXPA, ( ( "EQ" ! "NE" ! "I,T"! "GT" ! "LE" ! "GE" I"**"!"*/*" 1 "IS" ! "QUA" ) , EXPA*R )l

EXPA* < [("•"•"-")],EXPH * ("••!"-") > : EXPH* < EXPP • •*• > :

EXPP* EXPPR,(EXPSUBS): EXPPR* ( CONST ! ID ! EXPPX ) * EXPPX* ( "NEW",ID*N ! "THIS",TDXC ! "(",EXP,")" )S EXPSUR5*

CONST* ( "TRUE" ! "FALSE" ! "NOTEXT" • "NONE" ! CINT I CCHAR ! CTEXT ) X

EXPS RGET *«NEXPSP « < EXP •",">: IDS «GET »*Ninsi> * < io * ",• >: IDXC* (ID!"XC"!"XC") V

c(st!stl) *> c(stl) : c(stlistz) *> ccstz) » c(stz!stu) z*> C(stu) : c(stz!std) c(std) i c(stzisti) *> c(stl) » c(stu!stb) *> c(stb) : c(stuistx) c(stx) i ,372

C(STi"LOOP:"STL)»> "TLAB LOOP" C(STL) I C(STUi"GOTO LOOP")=> "?JUMP LOOP" J

C(ST1"IEST:"STL)*> "TLAB TEST" C(STL) : C(STU{"GOTO TEST")»> "TJUMP TEST" I

C(3TLi"IF"EXP"THEN"STU"ELSE"ST) a> "TBEGIN IF" L(EXP) "7JFALSE ELSE" C(STU) "'JUMP EXIT" •LAB ELSE" C(ST) •LAB EXIT" •TEND IF" I C(STLi"IF"EXP"THEN"STZ) «> C(STLi"IF"EXP"THEN BEGIN'STZ"?END ELSE BEGIN END") I C(STL!"IF"EXP"THEN"STU) «> C(STLl"IF"EXP"THEN"STU"ELSE BEGIN END") t C(STL'"IF"EXP"THEN ELSE")=> 00 J CCSTLi"IF"EXP"THEN"STU"ELSE")«> C(STLl"IF"EXP"THEN"STU"ELSE BEGIN END") I

ClSTDi"NHILE"EXP"DO"ST)a> "TBEGIN" C(STU!"GOTO TEST") C ( ST I"LOOP I BEGIN" ST "JEND") C(ST!"TEST* IF" EXP "THEN GOTO LOOP") •TEND" ;

C(STDl"F0R"id"j3"exp"l"STEP"exp*2"UNTIL"exp*3"d0"ST) a> "tbegin" C(STXJID"X «"EXP*1) C(DC1"INTEGER STEEP,STOOP") Cf DC I"BOOLEAN STEEB,STOOB") C(STX!"STF.EPX«"EXP*2) t(STX!"STEEBjsSTEEP GT 0") C(ST'I • "GOTO TEST") C(ST!"LOOP : BEGIN" ST "J END") C(STX!ID"I«"ID"fSTEEP") C(ST!"TEST S STOOP!»"EXP*3) C(STX!"STOOBX»"lD"GE STOOP") CCSTXI"STOOBS*STOOB EQV STOOP") CCSTLJ"IF STOOB THEN GOTO LOOP") "TEND" J

C(STII"INSPECT"EXP?D0"ST*I"0THERWISE"ST*2) »> "TBEGIN INSPECT" L(EXP) "TJNONE OTHER" C(ST*1) •TJUMP EXIT" •LAB OTHER" C(ST*2) "LArB EXIT" "TEND INSPECT" t C C STI1"1NSPECT"EXP"D0"ST) »> C(STII"INSPECT"EXP"DO"ST"OTHERWISE BEGIN END") S

C(STX!EXPP":-"EXP)«> L(EXP) SR(EXPP) I C(STXIEXPP":«"EXP)«> L(EXP) SO(FXPP) J C(STXiEXPP) »> L(EXPP) t

CCSTB!"BEGIN"SERIAL"END") «> "TBEGIN " D(SEPIAL) CCSERIAL) "TEND " X C(STB!"BEGIN END") *> e« : C(SERIAL J SERIAL UNIT)»> C(SERIAL) C(UNIT) * C(SERIAL!UNIT)»> C(HNIT) *

C(UNIT!ST";") «> C(ST) I C(UNIT!DC";") »> C(PC) I

D(SERIAL!SFRIAL UNIT)«> DC SERIAL) DCUNIT) X D(SERIAL!UNIT)»> D(UNlT) I

DCUNIT!ST";") *> I D(UNIT!DC";") «> DCDC) 1

CCDC!"PROCEDURE"DCP)®> "7PROCBODY " CCDCP) "?PROCEND " S C(DC!TYPE"PROCEDURE"DCP)•> "7PROCBODY • C(DCP) "7PROCEND • X

CCDCP1 ID";"ST) »> ZCID) CCST) X C(DCP!ID"("IDS")/"DCSPECS ST) »> Z(ID) CCST) X

CCDCP!ID"("IDS")J"DCVNS UCSPECS ST) «> ZCID) CCST) X

D(DCi"PROCEDURE"DCP) »> "7PROCSPEC VOID" DCDCP) "7PROCEND " X D(DC I TYPE"PROCEDURE"DCP)«> "7PROCSPEC " Z(TYPE) D(DCP) "7PROCEND

DCDCP!IO";"ST)s> ZCID) "0" x

DCDCPJID"("IDS");"DCSPECS ST) «> ZCID) N(IDS) ONCDCP!ID"("IDS")|"DCSPECS ST) X

DNCDCPiID"("IOS","ID-0");" DCSPECS ST) »> DN(DCP!ID"C"IDS")|" DCSPECS ST) DN(DCP!ID"("ID-0")/" DCSPECS ST) X

DNCDCP!ID"("ID"0");" DCSPECS ST) *> "7PARAM • ZCID-0) " SKIP • DS(DCP!ID"C"ID*0")|" DCSPECS ST) X

D(DCP!IO"("IDS");"DCVNS DCSPECS ST) »> ZCID) N(IDS) DN(DCP!ID"("IDS");"DCVNS DCSPECS ST) X

DN(DCPJID"("IDS","ID-0");"DCVNS DCSPECS ST) «> DN(DCP!1D"("IDS");"DCVNS DCSPECS ST) DN(DCP!ID"("ID*0*);"DCVNS DCSPECS ST) J

DN(DCP!ID"("10*0")|"DCVNS DCSPECS ST) «> "7PARAM" ZCID-0) DVCDCP!ID"("ID-0");"DCVNS DCSPECS ST) DS(DCP!ID"("ID"0");"DCSPECS ST) X

DVC DCP! ID" ( " ID-.U" ) ; "DCVNS DC VN DCSPECS ST) »> : DV(DCP!ID"C"ID-0");"DCVNS DCSPECS ST) DVCDCP!ID"("ID*0");"DCVN DCSPECS ST) X OVCDCP! ID"("ID*.W") »VALUE" IDS" ,"ID-1";"DCSPECS ST) «> DV(DCP!ID"("1D*0");VALUE"IDS"X"DCSPECS ST) " DVCDCP!ID"("ID-0");VALUE"ID-1";"DCSPECS ST) X dv(dcp!ID"("io-0");value"id-0";"dcspecs st) •> • value • x DVC DCP!ID"("ID"0");VALUE"ID-1";"DCSPECS ST) «> BR X DVCDCP! ID"("ID-R");NAM{i:"IDS","in-l"; "DCSPECS ST) »> '!.<• DVCDCP!ID"C"ID-0");NAME"IDS";"DCSPECS ST) DV(DCP!ID"C"ID-0");NANE"ID-1";"DCSPECS ST) X DV(DCP!ID"(" IO-.0" ) ; N A.'4£" ID-0" ; "DCSPECS ST) »> "NAME" X DV(DCP!ID"C"I0-J");NA»!E"ID-1";"DCSPECS ST) •> RB x ,374 \

DS(DCP!ID"("ID*0")/"DCSPECS DCSPEC ST) => DS(DCP!ID"("ID*0")/"OCSPECS ST) DS(DCPiID"("ID*0")/"DCSPEC ST) : DS(DCPUO"("tD*0")/"TYPE IDS","ID*1"/"ST) => , DS(DCP!ID"("ID*0")/"TYPE IDS"?"ST) DS(DCP'ID"("ID*0")/"TYPE ID*l"?"ST) J OS(DCP!ID"("ID*0")/ "TYPE ID*0"/"ST) «> Z(TYPE) : nS(DCPiIO"("IO"0")/"TYPE ID*1"/"ST) a> pp J

C(DCi"CLASS"DCC) »> "7CLASBODY C(DCC) "7CLASEND • J C(DC!"XC CLASS"DCC) «> "7CLASBOD* C(DCC) "7CLASEND • t C(DC!"XC CLASS'DCC) •> "7CI.ASBODY C(DCC) "7CLASEND * t C(DCClID";"ST) •> Z(ID) C(ST) I C(DCC1 ID"("IDS")/"DCSPECS ST) *> Z(ID) C(ST) t C(DCC!ID"("IDS")/"DCVLS DCSPECS ST) •> Z(ID) C(ST) I

D(DC! "CLASS'DCO «> "7CLASSPEC D(DCC) "7CLASEND • J D(DCI"XC CLASS"DCC) »> "7CI.ASSPEC XC " n(DCC) "7CLASEND " S D(DCl"XC CLASS"DCC) «> "7CLASSPEC XC • D(DCC) "7CLASEND » : 0(DCC!ID";"ST) »> Z(ID) "0" DC(ST)

D(DCC!ID"("IDS")/"DCSPECS ST) «> Z(1D) N(IDS) DN(DCC!ID"("IDS")/"DCSPECS ST) DC(ST) t DC(STI"BEGIN"SERIAL"END") •> "7CLASATT • D(SERIAL) : DC(ST) »> BP :

DM(DCC!ID"("IDS","ID*0")?" DCSPECS ST) *> DN(DCCiID"("IDS")/" DCSPECS ST) DN(DCC!ID"("ID*0")/" DCSPECS ST) : DN(DCClID"("10*0")I" DCSPECS ST) •> "7PARAM " Z(ID*0) " SKIP " DSCDCCiID"("ID*0")/• DCSPECS ST) : D(DCCIIO"("IDS")/"OCVLS DCSPECS ST) » Z(ID) N(IDS) DN(DCC!ID"("IDS")/"DCVLS DCSPECS ST) t

DN(DCC110"("IDS*,"ID*0")/"DCVLS DCSPECS ST) «> DN(DCCUD"("IDS")/"DCVLS OCSPECS ST) DN(DCCiID"("ID*0")f"DCVLS DCSPECS ST) : DN(DCC!ID"("10*0")/"DCVLS DCSPECS ST) *> "7PARAM" Z(1D*0) DV(DCC JID"("ID*0")l"DCVLS DCSPECS ST) DS(DCC!ID"("ID*0")/"DCSPECS ST) S

DV(DCC!ID"("1D*0")/"DCVLS PCVL DCSPECS ST) a> DV(DCCiID"("ID*0")|•DCVLS DCSPECS ST)

DV(nCC!ID"("ID*0*);"DCVL DCSPECS ST) t DV(DCC!ID"("tO*0")I VALUE"tDS","ID*l"|"DCSPECS ST) s> DV(DCC!ID"("ID*0")/VALUE"IDS"/"DCSPECS ST) ' DV(DCCIID"("ID*0")/VALUE"ID*1"/"DCSPECS ST) * DV(DCClID"("TD*0")/VALUE"tD*0"/"DCSPECS ST) •> " VALUE " t DV(DCC!ID"("ID*0")/VALUE"I0*1"/"DCSPECS ST) «> PP : ,375

ns(OCC!I0"("10-3");"DCSPECS DCSPEC ST) s> ns(dcciid"("id*0");"ocspecs st) hs(dcc!id"("id*0");"dcspec st) t ds(dcciid"("10-3");"type ids","id"1";"st) s> DS(OCC:iD"("ID-0");"TYPE IDS";"ST) DS(DCC!ID"t"ID-0")X"TYPE ID-1";"ST) » OS ( OCC ! ID" I " I D-.3" ) # "TYPE I0-3";"ST) x> Z(TYPE) i DS(DCCJI0"("ID-3")J"TYPE ID-|";"ST) x> P? X

D(DCITYPE"ARRAY"DCAS","DCA) *> D(DC!TYPE"ARRAY"DCAS) D(DC!TYPE"ARRAY"DCA) I 0(DC I TYPE"ARRAY"IDS"["DCARS" ] ") •> "7ARRD1MS " Z(TYPE) N(DCABS) OM(DCABS) "7ARRNAMES " N(IDS) DM(IDS) "7ARREND " I

DM(DCABS!DCABS","DCAB) •> DM(DCABS) DH(DCAB) I DH(DCABSIDCAB) X> dm(dcab) x DM(DCABI EXP*1"I"EXP*2) •> "7DIML0 * L(EXP*1) "7D1MHI • LCEXP-2)

DM(IDS!IDS","ID) x> dm(ids) dm(io) t DM(IOSIIO) a> DM(ID) J OM(IDIIO) »> "7" z(id) 1

C(DC!TYPE IDS) •> PP J D(DCITYPE IDS","ID) •> D(DCI TYPE IDS) DIDCiTYPE ID) I DIDCITYPE 10) «> "70BJ " Z(TYPE) Z(ID) I

ZCTYPE "INTEGER") »> " INT " I ZCTYPE "CHARACTER") •> " char • s ZITYPE "BOOLEAN") «> " BOOL " X ZITYPE "TEXT") »> " TEXT " X ZITYPE "REEIXO") •> •REFI XC)" X ZCTYPE •REE(XC)")a> •REFtXC)" X ZITYPE "REEI"ID")")•> "REFt" ZIID) ')" I

"TSTOREOBtl " SOIEXPP) LAIEXPP) "7ST0REND "X

x> "7STOREREF " SRIEXPP) LAIEXPP) "7ST0RF.ND "X

SOIEXPP1ID) x> "7STDRE0BJT0 " ZtID) X SRIEXPP!ID) *> "7ST0REPEFT0 " ZtID) X

LAIEXPP1EXPPR EXPSUBS) «> LIEXPPR) LAIEXPSUBS) X LAIEXPSUBS!EXPSUQS EXPSUB) x> L(EXPSUBS) LAIEXPSUBSIEXPSUB) X LA I F.XPSUBS!","ID) »> "7FIELDA" ZtID) I LAtEXPSUBS!"("EXPS") ^ ATTEMPT TO STORE TO A PROCEDURE CALL ***" X LAIEXPSUBS!"("EXPS"]") x> "7B0UNDSA" NIEXPS) I.SIEXPS) "7BOENOA" X ,376

L(EXP'EXPZ) s> LCEXPZ) t

LCEXP!"IF"EXP"THEN"FXPZ*t "EI.SE"EXP*2) a> "7L0ADIF" L(EXP) "7JFALSE ELSE" HFXPZ-1) "7JUMP EXIT" "7LAP ELSE" L(EXP*2) "7LAH EXIT" "7LENDIF" J

LIEXPZIGXPZ"AND NOT"EXPR) ~> L(FXPZ) LIEXPZ!"NOT"EXPR) "70PAND " LCEXPZ!EX"Z"OR NOT"EXPR) a> L(LXPZ) LCEXPZ!"N0T"EXPR) "70P0R " : L(EXPZ1"N0T"EXPR) => L(F.XPR) "?OPNOT" J L(EXP7.!EXPZ"AND"EXPR) a> L(F.XPZ) LCEXPRV "70PAND " : T,(EXPZ!EXPZ"OR"EXPP) => L(FXPZ) L(EXPR) "70POR " J L(EXPZIEXPR) «> L(EXPR) :

LCEXPR!£XPA"t"EQ"EXPA*2) > LCEXPA*1 ) LCEXPA-2) "70PEQ"I L(EXPR!EXPA"1"NE"EXPA*2) > LCEXPA-1 ) LCEXPA-2) "70PNE": L(EXPRJEXPA"1"LT"EXPA*2) > LCEXPA*1 ) LCEXPA-2) "70PLT": L(EXPR!EXPA*1"LE"EXPA"2) > LCEXPA*1 ) LCEXPA-2) "70PLE"! > L(EXPR!EXPA"1"GT"EXPA"2) > LCFXPA*1 ) LCEXPA-2) "?OPGT"S L(EXPR!EXPA"l"GE"EXPA-2) > L(FXPA-1 ) LCEXPA-2) "70PGE": L(EXPR!EXPA"1"««"EXPA"2) => LCFXPAM ) LCEXPA-2) "70PEQREF"I L(EXPR!EXPA"1"a/*"EXPA"2) f.CFXPAM ) LCEXPA-2) "70PNEKEF"S L(EXPR!EXPA"IS"IU) a L(EXPA) "70PIS" ZCID) | LCEXPR!EXPA"(JUA"ID) * L(EXPA) "70PQUA" ZCID) | LCEXPR1EXPA) a L(EXPA) :

LCEXPA!EXPA"* •"EXPM) a L(EXPA) LCEXPM) "70PPLUS L(EXPA!EXPA"- ••EXPM) a L(FXPA) LCEXPM) "TOPSUbT LCEXPA!"••EXPM) a L(EXPM) I LCEXPAi 6XPA"t -"EXPM) a LCEXPA) LCEXPM) "TOPSUBT LIEXPA1EXPA"- -"EXPM) a I.(FXPA) LCEXPM) "70PPLUS LCEXPA!"-"FXPM) a L(EXPM) "70PMINUS" * L(EXPA!EXPA"*"EXPM) a L(EXPA) LCEXPM) "70PPLUS L(EXPAiEXPA"-"EXPM) a LCEXPA) LCEXPM) "70PSUBT L(FXPAiEXPM) a L(EXPM) I

LCEXPM!EXPM"*"EXPP) a> LCEXPM) LCEXPP) "70PMULT " I L(EXPM'EXPP) a> LCEXPP) I

LCEXPP!EXPPR EXPSUBS) a> "7COMPOUND " LC EXPPR) LCEXPSUBS) "7COMPEND " t LCEXPP!EXPPR) a> LC EXPPR) I L(EXPSUBS!EXPSUBS EXPSUB) a> LCEXPSUBS) LC EXPSUBS!EXPSUB) LCFXPSUBS!"."I0) a> "7FIELD • ZCID) : LCEXPSUBS!" C'EXlPS") • ) a> "7BOUNDS " NCEXPS) LSCEXPS) •7BOEND • I LCEXPSUBS!*C"EXPS")") a> "7PARAMS " N(EXPS) LSCEXPS) "7PAREND • I

LSCEXPS!EXPS","EXP) a> LSCEXPS) LSCEXPS!EXP) : L3CEXP31EXP) a> "7SUB " L(EXP) I

LCEXPPR!CONST) a> LCCONST) I LCEXPPR!10) a> "7I.0A0ID " ZCIO) : LCEXPPR!"NEW"ID) a> "7L0ADNEW " Z(1D) I LCEXPPR!"THIS"id) xi "7L0ADTHTS " ZCIO) : LCEXPPR!"THIS XC") a> "7L0A0THIS XC" J L(EXPPR!"THIS XC") a> • "7L0A0THIS XC" I LCEXPPR!"C"EXP")") => t.CEXP) : ,377

L(C0Nsr: •TRUE") a> "TLOADT • X L( CONST!"FALSE " ) => "7LOAOF a • a L(CONST! "NOTEXT ") a> •7LOADX X L(CONSTi •NONE") a> "7LOADN a X L(CONST! C IN T ) a> "7LOAUI a Z(CINT) X L(CONST! CCUAR) a> "7LOADC • Z(CCHAR) X L(CONST! CTEXT) a> "7LOADT a Z(CTEXT) X

C(STB) •> "«•« BAD stb • »•" t C(SERIAL)«> BAD serial »••" | D(SERTAL)a> •»»• BAD serial *••• t C(UN IT)a> "*»« BAD unit *»*" X D('IHIT) a> •»»« HAD unit i C(ST) »> •«** BAD st »•*• X C(STL) »> "•«• BAD stl • x C(STZ) »> "•»• BAD stz •»»" x- C(STU) •> ' "••• BAD stm «»«" x C(STD) a> "••• BAD std • **" X C(STI) •> "•** bAD sti ••»" X C(STX) •> BAD stx *»•• X C(DC) «> "*«• HAD dc • *«" X D(DC) •> "••• BAD dc •««• X C(DCP) •> BAD dcp »»«" X D(DCP) »> "*«* BAD dcp • »•• DS(DCP) •> "«•» BAD dcp •»*" DN(DCP) «> "•«» BAD dcp *»«" DV(DCP) «> "«*• BAD dcp *•«" C(DCC) «> "«*« BAD dcc • «•" D(DCC) «> •*** BAD dcc DN(DCC) «> BAD dcc »»•• DS(DCC) «> BAD dcc DV(DCC) •> "«*» bAD dcc X(DCVNS)a> -*** BAO dcvns «*•" x X(DCVN)•> •»•• BAD dcvn X X(DCVLS)»> '*** BAD dcvls »»*" X X(DCVL)»> BAD dcvl «•»" X X(DCSPECS)a> BAD dcspecs »»»• X X(DCSPEC)•> "«»• BAD dcspec »••" x DM(DCABS)«> '*** BAD dcabs • •«" x Z(TYPE)a> BAD type **»• X L(EXP) •> BAD exp *••• X L(EXPZ)«> BAD expz X L(EXPR)•> "*«» BAD expr »**• X L(EXPA)a> "*•« BAD expa «••" : L(EXPN)«> ••»• BAD expm : L(EXPP)•> BAD expp X LA(EXPP)«> •»*• BAD expp «*•• : L(EXPPR)•> »•** BAD exppr »*«" X L(EXPSUBS)a> "••« BAD expsubs »*•" X LA(EXPSURS)a> "•»» BAD expsubs »••" X X(EXPSUB)a> "»»» BAD expsub «»»" x L(CONST)a> "*»• BAD const »«*" i LS(EXPS)a> "»•• BAD exps *•*• X DH(IDS) «> BAD ids «*»" X

••NDCABS

INTEGER PROCEDURE T.Nl T.NjsIF LHSsa'IDME THEN 1 EI.SE 1+LHS.T.N;

••MEXPS

INTEGER PROCEDURE T.N; T.N s«IE l.HSasU'jMF THEN 1 ELSE t+LHS.T.N;

•*NTOS

INTEGER PROCEDURE T.N; • T.N:«IF LHSaaNOME THEN 1 El,SE 1+LHS.T.N;

*»ZCINT

PROCEDURE T.Z; OT3("•",IX,• "); ,378

••ZCCHAR

procedure t_z: ot3(",",ix," ")*

••zctext procedure t.z; ot2(strcix)");

**zid procedure t_zj 0t3("s",ix,"

The main first names of transforms are:-

C - compile statements, second pass over declarations

D - first pass over declarations

L - load an expression

S - store to a variable (which may be a class attribute).

Second letters are variants on the first, and are mostly

obvious from the original call: for example S (store) calls

LA (load address) when it needs an address prior to a store-

indirect (page 375): DN and DV respectively declare name and

value parameters (page 374). The Z transforms are simple

copies of reserved lexemes or common groups of reserved lexemes.

The Z transforms called but not defined are predefined for every lexic class, and have the effect of

outputting the buffered characters of the lexeme. The hand-

written transforms N count the length of a syntactic-class

repeat.

The 396 CF1 input lines generate a Simula source of about 3200

lines (not counting blank lines): as will be seen from the

excerpts of generated Simula given later, the CF1 input is

much simpler to read and (because of the case analysis tests)

much, much easier to write.

To avoid having to hold a tree of the input to SIM24 wholly in main store, and yet to make the two passes at outermost level, the input is parsed twice. The CLOSE/OPEN sequence in the escape in the PROGRAM syntax rule (on page 370) effects the resetting of the input to start-of-file. SERIALD is then the outermost declaration-pass, SERIALC is the outermost compilation-pass. (SERIAL is both declaration and compilation passes over inner lists of declarations and statements.) This allows both

SERIALD and SERIALC to parse a top-level UNIT, and then enter an escape that calls for the output of the tree and erases the pointer to the tree (which allows the tree to be garbage-collected).
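
(The arrangement can be sketched as follows; this is an illustrative
Python rendering, and the helper names read_units, parse_unit and the two
pass procedures are invented stand-ins, not part of CF1 or SIM24.)

    def read_units(src):
        # Stand-in for the parser's UNIT recogniser: here, one unit per line.
        for line in src:
            if line.strip():
                yield line.strip()

    def parse_unit(text):
        return ('UNIT', text)              # stand-in parse tree

    def declaration_pass(tree): print('D:', tree[1])
    def compilation_pass(tree): print('C:', tree[1])

    def run_pass(path, transform):
        with open(path) as src:            # OPEN after CLOSE: back to start-of-file
            for unit_text in read_units(src):
                tree = parse_unit(unit_text)
                transform(tree)            # output for this UNIT
                del tree                   # erase the only pointer: tree reclaimable

    def compile_file(path):
        run_pass(path, declaration_pass)   # the SERIALD pass
        run_pass(path, compilation_pass)   # the SERIALC pass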

The run of CF1 of which the start is shown on pages 368-370 gave an output file of which the following is the start (of the SERIALD output).

.TYPE TINY IS1

OBJ REF($outfile)$lister

PROCSPEC VOID$tree PARAM $b SKIP BOOL PROCEND

CLASSPEC $c_program () CLASATT OBJ REF($c_exps)$a_exps OBJ REF($c_ef)$a_ef PROCSPEC VOID$t_out () PROCEND CLASEND

PROCSPEC REF($c_program)$p_program () PROCEND

CLASSPEC $c_exps PARAM $lhs SKIP REF($c_exps) CLASATT OBJ REF($c_exp)$a_exp PROCSPEC VOID$t_out () PROCEND CLASEND

PROCSPEC REF($c_exps)$p_exps () PROCEND

CLASSPEC $c_exp PARAM $lhs SKIP REF($c_exp) CLASATT OBJ INT $a3 OBJ BOOL $a1 OBJ INT $a2 OBJ REF($c_prim)$a_prim PROCSPEC VOID$t_out () PROCEND CLASEND

PROCSPEC REF($c_exp)$p_exp () PROCEND

CLASSPEC $c_prim () CLASATT OBJ INT $a1 OBJ REF($c_unsignedno)$a_unsignedno OBJ REF($c_exp)$a_exp PROCSPEC VOID$t_out () PROCEND CLASEND

PROCSPEC REF($c_prim)$p_prim () PROCEND

OBJ REF(XC)$r_id

CLASSPEC XC $c_id () CLASATT PROCSPEC REF(XC)$getr () PROCEND PROCSPEC VOID$setr PARAM $r SKIP REF(XC) PROCEND CLASEND

PROCSPEC REF($c_id)$p_id () PROCEND

OBJ REF(XC)$r_unsignedno

CLASSPEC XC $c_unsignedno () CLASATT PROCSPEC REF(XC)$getr () PROCEND PROCSPEC VOID$setr PARAM $r SKIP REF(XC) PROCEND CLASEND

PROCSPEC REF($c_unsignedno)$p_unsignedno () PROCEND

OBJ REF(XC)$r_ef

CLASSPEC XC $c_ef () CLASATT PROCSPEC REF(XC)$getr () PROCEND PROCSPEC VOID$setr PARAM $r SKIP REF(XC) PROCEND CLASEND

PROCSPEC REF($c_ef)$p_ef () PROCEND

The remainder of this appendix is excerpts from the fabricated

SIM24 Simula source. The first is the class for PROGRAM: a

simple syntactic list with escapes, and no transforms.

SIMULA %4A(310)  9-JUL-1981 20:09  PAGE    9-JUL-1981 20:02    PROGRAM

CLASS C_PROGRAM; BEGIN
   REF(C_SERIALD)A_SERIALD;
   REF(C_ENDFILE)A1_ENDFILE;
   REF(C_SERIALC)A_SERIALC;
   REF(C_ENDFILE)A2_ENDFILE;
END OF CLASS;

REF(C_PROGRAM)PROCEDURE P_PROGRAM; INSPECT NEW C_PROGRAM DO BEGIN
   P_PROGRAM:-THIS C_PROGRAM;
   INIT_IO; INIT_XC;
   A_SERIALD:-P_SERIALD;
   A1_ENDFILE:-P_ENDFILE;
   EN_CLOSE; FN_OPEN(BLANKS(200));
   K:=' '; KIN; XNO:=0; XIN;
   A_SERIALC:-P_SERIALC;
   A2_ENDFILE:-P_ENDFILE;
   EN_CLOSE; FX_CLOSE;
END OF PARSER;

The second excerpt is the class and parser of SERIAL (page 371):

a repeat with a fairly long test, and a three-case transform of

which the first is the end of a repeat, the second is earlier

stages of the repeat, and the third gives a better error-message

than the default message provided by CF1. Note that the two

tests LHS==NONE and LHS=/=NONE are logically complementary:

this is a simple case where a post-CFl Simula-to-Simula optim-

iser could delete the if LHS=/=NONE then and the matching else

part - that is, the "error message" case would be removed. As

observed in 8.7, this optimiser could itself be written in

CF1 (but would be best preceded by SIMPRE).
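
(The deletion can be sketched as follows; the test representation is
invented for illustration - the real optimiser would work on Simula text.)

    def complement(a, b):
        # ('==', x, y) and ('=/=', x, y) are syntactic complements.
        ops = {'==': '=/=', '=/=': '=='}
        return a[1:] == b[1:] and ops.get(a[0]) == b[0]

    def simplify_chain(chain):
        # chain is a list of (test, body); test None is the final ELSE.
        if (len(chain) >= 3 and chain[0][0] is not None
                and chain[1][0] is not None
                and complement(chain[0][0], chain[1][0])):
            # The second test always holds when reached, so it becomes the
            # ELSE part and the remaining branches are unreachable.
            return [chain[0], (None, chain[1][1])]
        return chain

    chain = [(('==', 'LHS', 'NONE'), 'A_UNIT.T_C'),
             (('=/=', 'LHS', 'NONE'), 'LHS.T_C; A_UNIT.T_C'),
             (None, 'OT2("*** BAD SERIAL ***","")')]
    print(simplify_chain(chain))           # the error-message case disappears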

SIMULA %4A(310)  9-JUL-1981 20:09  PAGE 5   9-JUL-1981 20:02    SERIAL

CLASS C_SERIAL(LHS); REF(C_SERIAL)LHS; BEGIN
   REF(C_UNIT)A_UNIT;

   PROCEDURE T_C; BEGIN
   IF LHS==NONE THEN BEGIN
      A_UNIT.T_C;
   END
   ELSE IF LHS=/=NONE THEN BEGIN
      LHS.T_C;
      A_UNIT.T_C;
   END
   ELSE BEGIN
      OT2("*** BAD SERIAL ***","");
   END
   END;

   PROCEDURE T_D; BEGIN
   IF LHS==NONE THEN BEGIN
      A_UNIT.T_D;
   END
   ELSE IF LHS=/=NONE THEN BEGIN
      LHS.T_D;
      A_UNIT.T_D;
   END
   ELSE BEGIN
      OT2("*** BAD SERIAL ***","");
   END
   END;

END OF CLASS;

REF(C_SERIAL)PROCEDURE P_SERIAL; BEGIN REF(C_SERIAL)P;
   INSPECT NEW C_SERIAL(NONE) DO BEGIN P:-THIS C_SERIAL;
      A_UNIT:-P_UNIT;
   END;
   WHILE XNO=37 OR XNO=39 OR XNO=41 OR XNO=131 OR XNO=132 OR XNO=133 OR
   XNO=134 OR XNO=1 OR XNO=3 OR XNO=8 OR XNO=11 OR XNO=126 OR XNO=127 OR
   XNO=74 OR XNO=62 OR XNO=51 OR XNO=52 OR XNO=56 OR XNO=58 OR XNO=64 OR
   XNO=66 OR XNO=67 OR XNO=68 OR XNO=96 OR XNO=97 OR XNO=98 OR XNO=99 OR XNO=100 DO BEGIN
      INSPECT NEW C_SERIAL(P) DO BEGIN P:-THIS C_SERIAL;
         A_UNIT:-P_UNIT;
      END END;
   P_SERIAL:-P;
END OF PARSER;

The third excerpt is the class and parser of STU (page 371): a

choice within a choice, with a five-case transform with the last

a redundant error message. Here the source optimiser would have

to treat the third and fourth tests together as A1=3 and

(A2=1 or A2=2): it would need to know that the only valid values

of A2 are 1 and 2 so that together the third and fourth tests

are equivalent to A1=3; that the only valid values of A1 are

1, 2 and 3, so the fifth case is redundant, the third test can

be reduced to only A2=1, and the fourth test is redundant.
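
(That range reasoning can be checked mechanically. The sketch below - an
illustration, not the proposed optimiser - enumerates the assumed ranges
of A1 and A2 over the five STU cases.)

    from itertools import product

    RANGES = {'A1': (1, 2, 3), 'A2': (1, 2)}     # the assumed valid values

    GUARDS = [
        lambda e: e['A1'] == 1,
        lambda e: e['A1'] == 2,
        lambda e: e['A1'] == 3 and e['A2'] == 1,
        lambda e: e['A1'] == 3 and e['A2'] == 2,
        lambda e: True,                          # the "*** BAD STU ***" default
    ]

    def reaching(i):
        # Valuations on which control falls through to guard i.
        envs = [dict(zip(RANGES, v)) for v in product(*RANGES.values())]
        return [e for e in envs if all(not g(e) for g in GUARDS[:i])]

    for i, g in enumerate(GUARDS):
        r = reaching(i)
        print(i, len(r), sum(1 for e in r if g(e)))
    # Guard 3 is taken by everything that reaches it (its test is redundant),
    # and nothing reaches guard 4: the error case is dead.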

SIMULA %4A(310)  9-JUL-1981 20:09  PAGE 10   9-JUL-1981 20:02    STU

CLASS C_STU; BEGIN
   INTEGER A1;
   REF(C_STX)A_STX;
   REF(C_STB)A_STB;
   INTEGER A2;

   PROCEDURE T_C; BEGIN
   IF A1=1 THEN BEGIN
      A_STX.T_C;
   END
   ELSE IF A1=2 THEN BEGIN
      A_STB.T_C;
   END
   ELSE IF (IF A1=3 THEN A2=1 ELSE FALSE) THEN BEGIN
      OT2("?JUMP LOOP","");
   END
   ELSE IF (IF A1=3 THEN A2=2 ELSE FALSE) THEN BEGIN
      OT2("?JUMP TEST","");
   END
   ELSE BEGIN
      OT2("*** BAD STU ***","");
   END
   END;

END OF CLASS;

REF(C_STU)PROCEDURE P_STU; INSPECT NEW C_STU DO BEGIN
   P_STU:-THIS C_STU;
   IF XNO=131 OR XNO=132 OR XNO=133 OR XNO=134 OR XNO=1 OR XNO=3 OR XNO=8 OR
   XNO=11 OR XNO=126 OR XNO=127 OR XNO=74 THEN BEGIN
      A1:=1;
      A_STX:-P_STX;
   END
   ELSE IF XNO=62 THEN BEGIN
      A1:=2;
      A_STB:-P_STB;
   END
   ELSE IF XNO=51 THEN BEGIN
      A1:=3;
      XIN;
      IF XNO=37 THEN BEGIN
         A2:=1;
         XIN;
      END
      ELSE IF XNO=39 THEN BEGIN
         A2:=2;
         XIN;
      END
      ELSE EXPECT("'LOOP' OR 'TEST'");
   END
   ELSE EXPECT("STX OR STB OR 'GOTO'");
END OF PARSER;

The fourth excerpt is the class and parser of STL (page 371):

nested options and choices with a multi-case transform.

The source optimiser would need the same strength as for STU,

but with a higher degree of nesting. The right hand sides of

the cases call for transforms of faked trees: the functions

with names of the form F_... are the fakers for the syntactic

class whose name is included in that of the faker.

SIMULA %4A(310)  9-JUL-1981 20:09  PAGE 8   9-JUL-1981 20:02    STL

CLASS C_STL; BEGIN
   INTEGER A1;
   REF(C_EXP)A_EXP;
   INTEGER A2;
   REF(C_STU)ATHEN_STU;
   BOOLEAN A3;
   BOOLEAN A4;
   REF(C_ST)AELSE1_ST;
   REF(C_STZ)ATHEN_STZ;
   BOOLEAN A5;
   REF(C_ST)AELSE2_ST;
   REF(C_STZ)ANONIF_STZ;

   PROCEDURE T_C; BEGIN
   IF (IF A1=1 THEN (IF A2=1 THEN (IF A3 THEN A4 ELSE FALSE) ELSE FALSE) ELSE FALSE) THEN BEGIN
      OT2("?BEGIN IF","");
      A_EXP.T_L;
      OT2("?JFALSE ELSE","");
      ATHEN_STU.T_C;
      OT2("?JUMP EXIT","");
      OT2("LAB ELSE","");
      AELSE1_ST.T_C;
      OT2("LAB EXIT","");
      OT2("?END IF","");
   END
   ELSE IF (IF A1=1 THEN (IF A2=1 THEN (IF A3 THEN NOT A4 ELSE FALSE) ELSE FALSE) ELSE FALSE) THEN BEGIN
      F_STL(1,A_EXP,1,ATHEN_STU,TRUE,TRUE,F_ST(FALSE,0,F_STL(2,NONE,0,NONE,
      FALSE,FALSE,NONE,NONE,FALSE,NONE,F_STZ(1,F_STU(2,NONE,F_STB(FALSE,NONE),0),
      NONE,NONE))),NONE,FALSE,NONE,NONE).T_C;
   END
   ELSE IF (IF A1=1 THEN (IF A2=1 THEN NOT A3 ELSE FALSE) ELSE FALSE) THEN BEGIN
      F_STL(1,A_EXP,1,ATHEN_STU,TRUE,TRUE,F_ST(FALSE,0,F_STL(2,NONE,0,NONE,
      FALSE,FALSE,NONE,NONE,FALSE,NONE,F_STZ(1,F_STU(2,NONE,F_STB(FALSE,NONE),0),
      NONE,NONE))),NONE,FALSE,NONE,NONE).T_C;
   END
   ELSE IF (IF A1=1 THEN A2=2 ELSE FALSE) THEN BEGIN
      F_STL(1,A_EXP,1,F_STU(2,NONE,F_STB(TRUE,F_SERIAL(NONE,F_UNIT(1,F_ST(FALSE,
      0,F_STL(2,NONE,0,NONE,FALSE,FALSE,NONE,NONE,FALSE,NONE,ATHEN_STZ)),NONE))),
      0),TRUE,TRUE,F_ST(FALSE,0,F_STL(2,NONE,0,NONE,FALSE,FALSE,NONE,NONE,
      FALSE,NONE,F_STZ(1,F_STU(2,NONE,F_STB(FALSE,NONE),0),NONE,NONE))),NONE,FALSE,NONE,NONE).T_C;
   END
   ELSE IF (IF A1=1 THEN (IF A2=3 THEN NOT A5 ELSE FALSE) ELSE FALSE) THEN BEGIN
   END
   ELSE IF A1=2 THEN BEGIN
      ANONIF_STZ.T_C;
   END
   ELSE BEGIN
      OT2("*** BAD STL ***","");
   END
   END;

SIMULA %4A(310)  9-JUL-1981 20:09  PAGE 8-1   9-JUL-1981 20:02    STL

END OF CLASS;

REF(C_STL)PROCEDURE P_STL; INSPECT NEW C_STL DO BEGIN
   P_STL:-THIS C_STL;
   IF XNO=41 THEN BEGIN
      A1:=1;
      XIN;
      A_EXP:-P_EXP;
      CHEK(43);
      IF XNO=131 OR XNO=132 OR XNO=133 OR XNO=134 OR XNO=1 OR XNO=3 OR XNO=8 OR
      XNO=11 OR XNO=126 OR XNO=127 OR XNO=74 OR XNO=52 OR XNO=51 THEN BEGIN
         A2:=1;
         ATHEN_STU:-P_STU;
         IF XNO=45 THEN BEGIN
            A3:=TRUE;
            XIN;
            IF XNO=37 OR XNO=39 OR XNO=41 OR XNO=131 OR XNO=132 OR XNO=133 OR XNO=134
            OR XNO=1 OR XNO=3 OR XNO=8 OR XNO=11 OR XNO=126 OR XNO=127 OR XNO=74 OR
            XNO=62 OR XNO=51 OR XNO=52 OR XNO=56 OR XNO=58 THEN BEGIN
               A4:=TRUE;
               AELSE1_ST:-P_ST;
            END;
         END;
      END
      ELSE IF XNO=131 OR XNO=132 OR XNO=133 OR XNO=134 OR XNO=1 OR XNO=3 OR
      XNO=8 OR XNO=11 OR XNO=126 OR XNO=127 OR XNO=74 OR XNO=62 OR XNO=51 OR
      XNO=52 OR XNO=56 OR XNO=58 THEN BEGIN
         A2:=2;
         ATHEN_STZ:-P_STZ;
      END
      ELSE IF XNO=45 THEN BEGIN
         A2:=3;
         XIN;
         IF XNO=37 OR XNO=39 OR XNO=41 OR XNO=131 OR XNO=132 OR XNO=133 OR XNO=134
         OR XNO=1 OR XNO=3 OR XNO=8 OR XNO=11 OR XNO=126 OR XNO=127 OR XNO=74 OR
         XNO=62 OR XNO=51 OR XNO=52 OR XNO=56 OR XNO=58 THEN BEGIN
            A5:=TRUE;
            AELSE2_ST:-P_ST;
         END;
      END
      ELSE EXPECT("STU OR STZ OR 'ELSE'");
   END
   ELSE IF XNO=131 OR XNO=132 OR XNO=133 OR XNO=134 OR XNO=1 OR XNO=3 OR
   XNO=8 OR XNO=11 OR XNO=126 OR XNO=127 OR XNO=74 OR XNO=62 OR XNO=51 OR
   XNO=52 OR XNO=56 OR XNO=58 THEN BEGIN
      A1:=2;
      ANONIF_STZ:-P_STZ;
   END
   ELSE EXPECT("'IF' OR STZ");
END OF PARSER;

As an example of a faker, the following is that of STL itself.

REF(C_STL)PROCEDURE F_STL(P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,P11);
   INTEGER P1; REF(C_EXP)P2; INTEGER P3; REF(C_STU)P4; BOOLEAN P5; BOOLEAN P6;
   REF(C_ST)P7; REF(C_STZ)P8; BOOLEAN P9; REF(C_ST)P10; REF(C_STZ)P11;
   INSPECT NEW C_STL DO BEGIN F_STL:-THIS C_STL;
   A1:=P1; A_EXP:-P2; A2:=P3; ATHEN_STU:-P4; A3:=P5; A4:=P6; AELSE1_ST:-P7;
   ATHEN_STZ:-P8; A5:=P9; AELSE2_ST:-P10; ANONIF_STZ:-P11
   END;
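
(The mechanism can be sketched as follows; the node shapes and the tiny
transform are invented for illustration. A faker builds a parse-tree node
directly, without parsing, so one transform case can rewrite itself into a
tree that another case already handles, and then transform the faked tree.)

    class Node:
        def __init__(self, kind, **fields):
            self.kind, self.fields = kind, fields

    def f_stl(a1, a_exp, a2, athen, anonif):
        # Faker: construct an STL node directly, no parsing involved.
        return Node('STL', a1=a1, a_exp=a_exp, a2=a2, athen=athen, anonif=anonif)

    def t_c(n):
        f = n.fields
        if n.kind == 'EXP':
            print('?LOAD', f['name'])
        elif n.kind == 'STL' and f['a1'] == 1 and f['a2'] == 1:
            print('?BEGIN IF'); t_c(f['a_exp'])
            t_c(f['athen']); print('?END IF')
        elif n.kind == 'STL' and f['a1'] == 1 and f['a2'] == 2:
            # Re-express this case as a faked tree handled by the case above.
            t_c(f_stl(1, f['a_exp'], 1, f['athen'], None))

    t_c(f_stl(1, Node('EXP', name='X'), 2, Node('EXP', name='Y'), None))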

The final excerpt shows the two-line main program (the value 15

for Max_XC_ord is needed by an intelligent error routine,

which distinguishes between lexemes with XNo less than 15 as

due to lexic classes, or greater than 15 as due to reserved

lexemes - the error routine is in the pre-written library,

and so needs this value from the fabricated code). The

excerpt also shows two fakers for lexic classes, and another

kind of "oddment": a test for the equality of the buffers of

two class objects representing IDs - the test is safe in that

either or both of the class-object-pointer parameters may be

NONE (that is, a Simula null pointer).
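
(In Python terms, with None standing for NONE and IX, IY as the two lexeme
buffers, the test amounts to the following sketch.)

    def same_id(w, x):
        # Null-safe comparison: two NONEs are equal; NONE equals no object.
        if w is None:
            return x is None
        if x is None:
            return False
        return w.ix == x.ix and w.iy == x.iy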

SIMULA %4A(310)  9-JUL-1981 20:09  PAGE 48   9-JUL-1981 20:02    --LEXIC INVERSE ROUTINES

REF(C_CINT)PROCEDURE F_CINT(LL,AA); VALUE LL,AA; TEXT LL,AA;
INSPECT NEW C_CINT DO BEGIN F_CINT:-THIS C_CINT; IX:-COPY(LL); IY:-COPY(AA) END;

BOOLEAN PROCEDURE SAME_ID(W,X); REF(C_ID)W,X; INSPECT W
DO SAME_ID:=IF (X==NONE) THEN FALSE ELSE (IX=X.IX AND IY=X.IY)
OTHERWISE SAME_ID:=(X==NONE);

REF(C_ID)PROCEDURE F_ID(LL,AA); VALUE LL,AA; TEXT LL,AA;
INSPECT NEW C_ID DO BEGIN F_ID:-THIS C_ID; IX:-COPY(LL); IY:-COPY(AA) END;

MAX_XC_ORD:=15;

P_PROGRAM; OUT1N;
END OF PROGRAM;

APPENDIX S

EXTENDED EXAMPLE OF SECONDARY CODE GENERATION.

The following is the listing of the secondary code generation

trace which was postponed to here from section 7.7.

( ( "MM +4- < "MN2 44 M\3 >>->"< M\4 44 MN5 > >

GENERATING: ( ( "MM 44 ( "M\2 +4 M\3 )>->"< MN4 44 M\5 ) ) FGR VOID \ TRY : dcAIC A M3 i 3 ( A -> "M ) CONFORMING M=>< M\4 4+ M\5 ) A( "MM 4+ ( "M\2 \ \ generating: ( " i\l 44 ( "M\2 44 M\3 ) ) FOR AN-l \ \ \ try: IACCA J J » 1 ( U 44- A > M A CONFORMING A=>( "M\2 44 M\3 ) U^>~M\1 \ \ \ \ generating: ( ~M\2 44 M\3 ) FOR A\-l \ \ \ \ TRY : iaCC A 1 ( U 44 A ) » A CONFORMING A= M\3 U-V"M\2 \ \ \ \ \ XFZ RTRY GIVES: LOAD AN-1» M\3 \ \ \ \ \ ."EN ERATING: "M\2 FGR U\-4 \ \ \ \ \ generated: **ngut*« \ \ \ \ TRY gave nout \ \ \ TRY : IACCA J3t 1 i A H U ) » A CONFORMING U-=: M\3 A->"MN2 \ \ \ \ \ GEN ERATING: "M\2 for AN-l \ \ \ \ \ \ try: LGADICA MJI 3 ~M » A CONFORMING M-"- M\2 \ \ \ V \ \ \ GENERATION EQUIV: A\-l AS AN-l \ \ \ \ \ \ \ GENERATION EOUIV: M\2 AS M\2 \ \ \ \ \ \ TRY GIVES: LOADI AN-l» M\2 \ \ \ \ \ generated: LOADI A\-l» M\2 \ \ \ \ \ GENERATION MISMATCH: M\3 AGAINST U\-5 \ \ \ \ TRY GAVE NOUT \ \ \ \ TRY : TADICA MJ* 3 < "M 44 A > » A CONFORMING A=>M\3 M=>M\2 \ \ \ \ \ XFERTRY GIVES: LOAD AN-1» M\3 \ \ \ \ \ GENERATION EQUIV: M\2 AS M\2 \ \ \ \ TRY GIVES: TADI A\-1CL0AD AN-1» MX33» M\2 \ \ \ \ TRY : taDC A MIf 2 < M 44 A ) » A CONFORMING A=;M\3 M=>"M\2 \ \ \ \ \ XFERTRY GIVES: LOAD A\-lr M\3 \ \ \ \ \ GENERATING: "M\2 for mn-A \ \ \ \ \ \ 3PLITTRY: 2 » M Oi < » A CONFORMING M-! M\3 A->"MN2 \ \ \ \ \ ge; ERATING: "M\2 FOR AN-l \ \ \ \ \ \ TRY: LQADICA MI* 3 "M » A CONFORMING M-?MN2 \ \ \ \ \ \ \ GENERATION EOUIV: AN-l AS AN-l \ \ \ \ \ \ N GENERATION EQUIV1 MN2 AS MN2 \ \ \ \ \ \ TRY GIVES.' LOADI AN-l» MN2 ,388

\ \ \ \ \ generated: loadi a\-l» m\2 v \ \ \ \ generation eguiv: m\3 as m\3 \ \ \ \ try gives: tad a\-iclgadi ax-i» m\2j» m\3 * \ \ \ generated: thdi a\-1cl0ad a\-l» m\33» m\2» tad ax-1cload a\-l» h\33, mx g*c d a\ 1clgadi a\-•1, m\23, h\3 \ \ \ generating: "mm for u\-3 \ \ \ generated: «*nout*« \ \ try gave ngut \ \ try: i acta jji 1 i a f+ u ) a conforming u=> < ~mn2 f+ m\3 ) a^>~mx1 \ \ \ generating: -mm for a\-i \ \ \ x try: loadica m3» 3 "n ;•> a conforming mamxi \ \ \ \ \ generation eguiv: a\-i as a\-i \ \ \ \ \ generation equiv: mm as mm \ \ \ \ try gives: loadi a\—1, m\1 \ \ \ generated: lqadi a\-i, mm \ \ \ generating: c "mx2 •• mx3 > for ux-12 \ \ \ generated: **nowt«* v v try gave nomt \ v tryt tadica n3i 3 ( ++ A > » A conforming A->C ~h\2 •• h\3 > h->hx1 \ \ v generating: < ~hx2 •• «\3 > for ax-1 \ \ \ \ try iacca u3 f 1 ( u ff a ) >> a conforming a=>m\3 u=>~mx2 \ \ \ \ \ xfertry gives: load ax-l, m\3 \ \ \ \ \ generating: ~m\2 for u\-13 \ \ \ \ \ generated: **nout** \ \ \ \ try gave nowt \ \ \ \ try : iacca u3» 1 < a fp u > » a conforming u^>n\3 a=>~mx2 \ \ \ \ \ generating: ~m\2 for a\-l \ \ \ \ \ \ try: loadica m39 3 ~h » a conforming m»>m\2 \ \ \ \ \ \ \ generation equiv: ax-1 as a\-l \ \ \ \ \ \ \ generation equiv: m\2 as m\2 \ \ \ \ \ \ try gives: loadi a\-l» m\2 \ \ \ \ \ generated: loadi a\-lr m\2 \ \ \ \ \ generation mismatch: m\3 against ux-14 \ \ \ \ try gave nowt \ \ \ \ try : tadica m3* 3 < -m ++ a ) » a conforming a=>m\3 m=>m\2 \ \ \ \ \ xfertry gives! load ax-lr m\3 \ \ \ \ \ generation equiv: m\2 as n\2 \ \ \ \ try gives: tadi ax-icload ax-1, mx33 , n\2 \ \ \ \ try : tadca m3» 2 < m ff a > » a conforming a=>h\3 m=>~mx2 \ \ \ \ \ xfertry gives: load ax-1, m\3 \ \ \ \ \ generating: 'm\2 for m\-15 \ \ \ \ \ \ 3plittry: z » m o: <n\2 \ \ \ \ \ \ \ \ \ generation equiv: a\-19 as ax-19 \ \ \ \ \ \ \ \ \ generation equiv: m\2 as m\2 \ \ \ \ \ x x \ try gives: loadi ax-19, m\2 \ \ v- \ \ \ \ generated: loadi ax-19, m\2 \ \ \ \ \ \ splittry ok \ \ \ \ \ \ splittry: k » m 0» literalcm k3 \ \ \ \ \ \ \ generating: ~mx2 for kx-20 ***cant split \ \ \ \ \ \ \ generated: »*nout»* \ \ \ \ \ \ splittry retarget failed ! \ \ \ \ \ generated: dca ax-1?*cl0adi ax-19f hx23* mv-13 \ \ \ \ try gives: tad ax-icload ax-1» mx33, mx-15*cdca ax-19*cl0adi ax-19 » m\23a \ \ \ \ try : tadca m3i 2 < a fp m > » a conforming m=>m\3 a»>"m\2 \ \ \ \ \ generating: ~m\2 for a\-l \ \ \ \ \ \ try: loadica m3> 3 "h » a conforming m->m\2 \ N \ \ \ \ x generation equivj ax-l as ax-1 \ \ \ \ \ \ \ generation equiv: m\2 as m\2 \ \ \ \ \ \ try gives: loadi a\-l» m\2 \ \ \ \ \ generated: loadi ax-l, mx2 \ \ \ \ \ generation equiv! m\3 as m\3 \ \ \ \ try gives: tad ax-1cloadi a\-l» m\23, mx3 \ \ \ generated: tadi a\-1cl0ad a\-lt mx33, mx2j tad A\-1cl0ad A\-i» mx33» n\-13«l tad \-lcloadi a\- i, m\ 23, m\3 \ \ \ gemeration eguiv: mm as nm \ \ try gi ves: tadi .•a-1ctadi a\-1cl0ad ax-1, mx33, m\2i tad ax-icload ax-lt m\33» 153? tad a\-1cloadi a\ -t» m\23» m\33, mm ,389

N \ N TRY I T ADC A M3 I 2 < H f+ A ) » A CONFORMING A*>C ~MN2 F-f MN3 ) M*>~MN1 N \ \ N generating: < ~m\2 ++ mn3 ) for A\-l N \ N N \ \ N N \ TRY: IACCA U3» 1 ( U A > » A CONFORMING A=>MN3 U->~MN2 \ N N N \ \ XFER TRY GIVES: LOAD AN-l. MN3 N N N N \ \ GENERATING: -M\2 FOR UN-22 \ N N N \ generated: **nout** \ N \ N \ try cave nout \ \ \ N \ try: iacca u3 j 1 ( a u ) » a conforming u->m\3 A=>~M\2 \ N N N \ \ generating: ~mn2 for A\-I N N \ \ \ try: loadica ml) 3 "M » a conforming m=>M\2 N N \ \ \ \ generation eouivj A\-l as a\-l \ N N N N N N \ \ \ \ generation equivi m\2 as m\2 N \ \ \ try gives: loadi an-l. M\2 N N N N \ \ generated: loadi an-l. m\2 N N N N \ \ generation mismatch? M\3 against u\-23 N N N N \ try gave nowt \ N \ \ \ TRY: TADICA MI? 3 < ~M A > » A CONFORMING A=! MN3 M=>M\2 N N \ N \ \ XFERTRY GIVES? LOAD AN-l. M\3 \ N N N \ \ GENERATION EQUIV? M\2 AS M\2 " N N \ N \ TRY GIVES: TADI AN-lCLOAD AN-l. MN33» M\2 \ N \ N \ TRY: TALCA M3J 2 ( M A ) » A CONFORMING A=>MN3 M=>~MN2 N N N N \ \ XFERTRY GIVES: LOAD AN-l. M\3 N N N N \ \ GENERATING: ~MN2 FOR M\-24 N N N N \ \ \ splittry: z » m o; <;-cz m3 N N N N \ \ \ \ generating: "mn2 for zn-25 ***cant split N N N \ \ \ \ \ generated: «*ngut«* N N N N \ \ \ splittry retarget failed \ N N N \ \ \ sflittry: u » m Of < .literalcm n30 0necu k3>: cu h3 N N N N \ \ \ \ generating: ~mn2 for un-26 ***cant split N N N N \ \ \ \ generated: **nout*« N N N N \ \ \ SPLITTRY RETARGET FAILED N N N N \ \ \ splittry: z » mnd 3. < .dcacand mnd3cclaca z3»cz mnd3 N N N N \ \ \ \ generating: ~mn2 for zn-27 ***cant split N N N N \ \ \ \ generated: **NCJT** N N N N \ \ \ sflittry retarget failed N \ N N \ \ \ splittry: and ;•> m\d 2f dcacand mnd3 N N N N \ \ \ n generating: "mn2 for an-23 N N N N \ \ \ \ \ TRY: LGADICA M3» 3 "M » A CONFORMING M=>MN2 N N N N \ \ \ \ \ \ GENERATION EGUIV: AN-2S AS AN-2S N N N N \ \ \ N N \ GENERATION EOUIV: M\2 AS MN2 N N N N \ \ \ \ \ TRY GIVES? LOADI AN-23. MN2 N N N N \ \ \ N GENERATED: LOADI A\-23» MN2 N N N N \ \ \ SFLITTRY OK N N N N \ \ \ splittry: k » m o; literalcm k3 N N N N \ \ \ \ generating: ~mn2 for kn-29 **«cant split N N V N \ \ n n generated: **nout»* N N N N \ \ \ splittry retarget failed N N N N \ \ GENERATED: DCA AN-23*CL0ADI AN-28. MN231 MN-24 N N N N \ TRY GIVES? TAD AN-lCLOAD AN-l. MN33» M\-24«CDCA AN-23*CL0ADI A\-23» MN23* i N N N N \ TRY: TADCA M3; 2 ( A f+ M ) » A CONFORMING M=>MN3 A=::"MN2 N N N N \ \ generating: -:i\2 for an-i N N N N \ \ N TRY: LOADICA Mi: 3 ~M » A CONFORMING M=>M\2 N N N N \ \ \ \ GENERATION EOUIVJ AN-l AS A\-l N N N N \ \ \ N GENERATION EQUIVJ MN2 AS M\2 N N N N \ \ \ TRY GIVES? LOADI A\-l» M\2 \ N N N \ \ GENERATED: LOADI A\-l» M\2 \ \ N N \ \ GENERATION EQUIVJ MN3 AS M\3 \ N N N \ TRY GIVES: TAD AN-1CLOADI AN-l» MN23» M\3 N N N N GENERATED: TADI A\-1CL0AD A\-l» M\33> M\2» TAD AN-lCLOAD A\-l» MN33. 
M\-24«CD' TAD AN- 1CLOADI AN-lt M\23» MN3 N N N N GENERATING: "MN1 FOR MN-21 N N \ N \ splittry: z » m 01 vxiteralcm kiczerocz k3 ncz m3 N N \ \ \ N generating: -MNI for ZN-30 ***cant split N N \ N \ n generated: **nowtm N N \ N n sflittry retarget failed \ N N N n SPLITTRY: J M Of : LITERALCM KIsUONECU K3> CU M3" N N N N n \ GENERATING: ~MN1 FOR un-31 ***CANT SPLIT N N N N n \ GENERATED: **mcutit* \ \ N \ n SPLITTRY RETARGET FAILED N N N N n sflittry: z >;• m\d 3; .dcacand MNDTOCLACA z3>;: cz mnd3 \ N \ N n n generating: ~m\1 for zn-32 ***cant split N N N N \ n generated: **nout*« N \ N \ \ splittry retarget failed N N N N \ splittry: and mnd 2» dcacand mndi N N N N \ n generating: ~m\l for an-33 N N N N \ \ \ TR i': LOAD I LA M3» 3 "*M » A CONFORMING M=>MN1 N N N \ N N \ \ GENERATION EQUIV: AN-33 AS AN-33 N N N N \ N N GENERATION EGUIV: MN1 AS MN1 N N N \ N N N TRY GIVES: LOADI AN-33. MN1 \ N N N \ N GENERATED: LOADI AN-33. MN1 N N N N N SFLITTRY OK N N N N N SPLITTRY: K » M Of LITERALCM K3 N N N N \ N GENERATING: ~MN1 FOR KN-34 ***CANT SPLIT \ \ N N \ n generated: **nout«* ,390

\ \ \ \ \ SPLITTRY RETARGET FAILED \ \ \ \ GENERATED: DCA AX-33*CL0ADI AX-33, MX1I, M\-21 \ \ \ TRY GIVES: TAD AX-i CTADI AX-1CL0AD A\-lt MX3I, MX2* TAD AX-1CL0AD AX-l, HX33, M\ 41* TAD AX-iCLOADI AX-l, MX2I, MX3I, MX-214CDCA A\-33*CL0ADI AX-33, MX1I, MX-213 \ \ \ TRYJ TADCA MI* 2 ( A ++ M ) » A CONFORMING M=>< ~MX2 ++ M\3 > A=>~M\1 \ \ \ \ generating: ~mxi for a\-i \ \ \ \ \ TRY: LOADICA Mi: 3 ~M » A CONFORMING N=>M\1 \ \ \ \ \ \ generation equiv1 A\-l AS A\-l \ \ \ \ \ \ generation equiv: M\1 AS M\1 \ \ \ \ \ try gives: LOADI A\-l» MM \ \ \ \ generated: loadi ax-i, mm \ \ \ \ generating: < "MX2 ++ M\3 ) FOR MX-35 \ \ \ \ \ sflittry: z » m o: < for an-39 N N N N N N N TRY: IACCA Ui: 1 < U 44 A > » A CONFORMING A=>MN3 U=>~M\2 NNNNNNNN XFERTRY GIVES? LOAD AX-39, MN3 NNNNNNNN GENERATING: "MN2 FOR UN-40 N N N N N X N X GENERATED*. **NOUT«* N N \ N N N N TRY GAVE NOUT N N N N N N N TRY.* IACCA UI* 1 ( A 44 U > » A CONFORMING U=>MN3 A=>~M\2 X X \ \ NX X X GENERATING: "MX2 FOR AX-39 X X v \ X X X X x TRY: LOADICA MI* 3 ~M » A CONFORMING M=>M\2 XXX XXXXXXX GENERATION EQUIV: AX-39 AS AX-39 XXXXXXXXXX GENERATION EQUIV: MX2 AS MX2 XXXXXXXXX TRY GIVES: LOADI AX-39, MX2 XXXXXXXX GENERATED: LOADI AX-39, MX2 XXXXXXXX GENERATION MISMATCH: MX3 AGAINST UX-41 X X X X X \ X TRY GAVE NCUT x \ \ x x x x try: tadica mi: 3 < ~M 44 a > » a conforming a=>mx3 m^mx2 XXXXXXXX XFERTRY GIVES* LOAD AX-39, MX3 XXXXXXXX GENERATION EQUIV? MX2 AS MX2 X X X X X X X TRY GIVES? TADI AX-39CL0AD AX-39, MX3I, MX2 X X X X X X X TRY: TADCA MI* 2 < M 44 A > >> A CONFORMING A»>M\3 M=>~MX2 XXXXXXXX XFERTRY GIVES: LOAD AX-39, MX3 XXXXXXXX GENERATING: ~MX2 FOR MX-42 XXX XXXXXX SPLITTRr: Z » M O» < > > > ^ \ \ \ \ \ GENERATING: ~M\2 FOR ZX-43 **«CANT SPLIT ^XXXXXXX GENERATED: **NOUT*t \\\X X\\XX SPLIT TRY RETARGET FAILED XXXXXXXXX 3PLITTRY: A\D » MXD 2* DCACAXD MXD3 \N\\\\\\\\ GENERATING: ~M\2 FOR AX-46 \\v\\\\\xx\ try: loadica mi* 3 ~m » a conforming m-i- mn2 \\\\\\\\\\\\ generation equivj a\-46 as ax-46 \\\x\\x\\\\\ generation eguivj mx2 as mx2 \\\\\x\x\x\try gives: l3adi a\-46»mn2; " \\\x\\\\\\ generated: loadi a\-4o, mx2 xxxx xxxxx splittry ok XXXXXXXXX 3FLITTRYJ K » M 0* LITERALCM K3 XXXXXXXXXX GENERATING: "MX2 FOR KX-47 ***CANT SPLIT xxxxxxxxxx generated: **nout*« xxxxxxxxx splittry retarget failed XXXXXXXX GENERATED.* DCA AX-464CL0ADI AX-46, MX2I, MX-42 X X \ \ X \ X TRY GIVES: TAD AX-39CL0AD AX-39, MX3I, M\-42*CDCA A\-46»CL0ADI AX X X X X X X X TRY: TADCA MI* 2 < A 44 M > » A CONFORMING M=0 HX3 A»>~MX2 XXXXXXXX GENERATING: "MX2 FOR AX-39 X XXXXXXXX TRY: LOADICA MI? 3 "M » A CONFORMING M->MX2 XXXXXXXXXX GENERATION EQUIV? AX-39 AS AX-39 \\\\\\\\\\ GENERATION EOUIV? MX2 AS MX2 \\XX\\XXX TRY GIVES: LOADI AX-39, M\2 XXXXXXXX GENERATED: LOADI AX-39, MX2 \\\XXXXX GENERATION EOUIVJ M\3 AS MX3 \ \ \ x X X X TRY GIVES: TAD AX-39CLCADI AX-39, MX2I, M\3 \ \ x X X \ GENERATED: TADI AX-39CL0AD AX-39, MX3I, MX2* TAD AX-39CLSAD AX-39, M\ I, MX-42I; TAD AX-39CL0ADI AX-39, MX2I, MX3 ,391

\ \ \ \ \ SPLITTRY OK \ \ \ \ \ splittry: k » m of literalcm nj \ \ \ \ \ \ generating: ( ~M\2 H- M\3 ) FOR N\-43 *«»CANT SPLIT \ \ \ \ \ \ generated: **nout** \ \ \ \ \ splittry retarget failed \ \ \ \ generated: sca ax-3?*ctadi ax-37cl0ad a\-39» M\33» MX2» tad ax-37cl0ad ax-39» /I mx23 » mx-423t tad ax-39cl0adi a\-39f m\23, mx33» mx-35 \ \ \ try gives? tad a\-1cloagi a\-l» mx13» M\-35*cdca a\-39*ctadi AX-37cl0ad a\-39» .1X33* \-42tcdca a\-t6»clcadi a\-4af mx23» mx-423? tad ax-39cl0adi a\-39f M\23» m\33» mx-353 \ \ generated: tadi ax-1ctadi ax-icload a\-l» mx33 » m\2» tad ax-icload a\-l» mx33» MX-134cd. tad A\-1 cloadi a\-l» m\23» mx3if mxl? tad ax-1ctadi ax-iclcad a\-lr mx33» MX2» tad AX-icload a\-28» mx231 mx-243j tad ax-1cloadi a\-l, mx23» m\33r mx-21*cdca ax-334clcadi ax-33* M\13» MX- 35»cdca ax-374c tadi ax-39clgad a\-39» m\33f mx2? tad ax-39cl0ad a\-39» m\33r MX-424cdca ax-464ci 9clgadi a\-39» mx23» m\33r mx-353 \ \ generating: < m\4 f+ mx5 ) fgr mx-2 V \ \ SPLITTRY: Z » M Or < XITERALCM K3CZEROCZ K3»CZ M3 \ V \ \ GENERATING: C M\4 ++ M\5 ) FOR Z\-49 •••CANT SPLIT \ \ \ \ GENERATED: **NOUT»* ' \ \ \ SPLITTRY RETARGET FAILED \ \ \ SPLITTRY: U » M Of < XITERALCN K3 OONECU K3>>CU M3 \ \ \ \ GENERATING: ( M\4 I-+ M\5 ) FOR U\-50 X**CANT SPLIT \ \ \ \ GENERATED: **NOUT*t \ \ \ SPLITTRY RETARGET FAILED \ \ \ splittry: z :> mxd 3i <M\4 \ \ \ \ \ \ XFERTRY GIVES: LOAD A\-52» M\5 \ \ \ X X \ GENERATION MISMATCH: N\4 AGAINST U\-53 \ \ \ \ \ TRY GAVE NOUT \ \ \ \ \ TRY: IACCA U3r 1 ( A • + u ) » A CONFORMING U^ MX5 A-OMX4 \ \ \ \ \ \ XFERTRY GIVES.' LOAD AX-52. MX4 \ X X X X X GENERATION MISMATCH? MX3 AGAINST UX-34 \ \ \ \ \ TRY GAVE NOUT \ \ \ \ \ TRY: TADCA M3» 2 < M A ) » A CONFORMING A=>MX3 M=;M\4 \ \ \ \ \ \ XFERTRY GIVES? LOAD A\-52» M\3 \ \ \ \ \ \ GENERATION EQUIV: M\4 AS M\4 \ \ \ \ \ TRY GIVES? TAD AX-52CL0AD A\-52» M\33» M\4 \ \ \ \ \ TRY? TADCA Hit 2 ( A H N ) » A CONFORMING M^>M\3 A»>M\4 \ \ \ \ \ \ XFERTRY GIVES? LOAD A\-32» M\4 \ \ \ \ \ \ GENERATION EQUIVJ M\5 AS M\5 \ \ X X \ TRY GIVES? TAD AX-52CL0AD AX-52* MX43» M\3 \ \ \ \ GENERATED: TAD A\-52CLOAD A\-52» M\53» MX4I TAD AX-52CL0AD A\-52» M\43» M\3 \ \ \ SPLITTRY OK \ \ \ sflittry: k » m o; literalcm k3 \ \ \ \ generating: < M\4 f + M\5 ) FOR K\-33 •••CANT SPLIT \ \ \ \ generated: ««nout»* \ \ \ SPLITTRY RETARGET FAILED \ \ GENERATED: DCA A\-52»CTAD AX-52CL0AD A\-52» MX53» MX4J TAD A\-52CL0AD A\-32t M\43» MX33 \ TRY GIVES? DCAI A\-1»CTADI A\-lCTADI AX-ICLOAD A\-l» M\33» M\2J TAD AX-ICLOAD A\-l» H\33t M* MX-153 ? TAD A\-1 CLOADI A\-l» M\23» M\33» M\l» TAD AX-1CTADI AWCLOAD A\-l» M\33» M\2I TAD A\ • CLOADI A\-2s» M\23» M\-243? TAD AX-1CLOADI A\-l# M\23» M\33r M\-21«CDCA A\-33«CL0ADI A\-33» M\ lip M\-35fCDCA A\-3?>CTADI AX-39CL0AD A\-39» M\33» M\2> TAD AX-39CL0AD A\-39» MX33» MX-42TCDCA AD AX-39CL0ADI AX-39» M\23» MX33» M\-3333F MX-2ICDCA A\-32«CTAD AX-52CL0AD A\-32» MX33» MX4I TA 3 GENERATED: DCAI AX-ltCTADI A\-1CTADI AX-ICLOAD A\-l» MX331 MX2J TAD AX-ICLOAD AX-1» M\33» MN-13 33» TAD AX-1CLOADI AX-1» M\23» M\33» MX1J TAD AX-1CTADI AX-ICLOAD A\-l» M\33» HX2I TAD A\-lCLi ASI AX-231 MX23» MX-2431 TAD AX-1CLOADI AX-1» MX23» MX33» MX-21«CDCA AX-334CL0ADI A\-33» M\13» f MX-354CDCA A\-39«CTADI AX-37CL0AD A\-39» M\33» MX2I TAD AX-37CL0AD A\-39» M\33# HX-42»CDCA A\-4. X-37CLGADI AX-39» M\23» MX33. MX-3533» MX-24CDCA AX-32»CTAD AX-32CL0AD AX-52» M\33» MX4# TAD A\ 392

( ( ~M\1 ++ ( ~M\2 ++ M\3 ) ) -> ~( M\4 ++ M\5 ) )

LOAD A\-1= M\3= ( 2)
TADI A\-1 M\2= ( 3)

LOAD A\-1= M\3= ( 2)
LOADI A\-19= M\2= ( 3)
DCA A\-19* M\-15= ( 2)
TAD A\-1 M\-15* ( 2)
LOADI A\-1= M\2= ( 3)
TAD A\-1 M\3= ( 2)
TADI A\-1 M\1= ( 3)

LOAD A\-1= M\3= ( 2)
TADI A\-1 M\2= ( 3)

LOAD A\-1= M\3= ( 2)
LOADI A\-28= M\2= ( 3)
DCA A\-28* M\-24= ( 2)
TAD A\-1 M\-24* ( 2)
LOADI A\-1= M\2= ( 3)
TAD A\-1 M\3= ( 2)
LOADI A\-33= M\1= ( 3)
DCA A\-33* M\-21= ( 2)
TAD A\-1 M\-21* ( 2)

LOADI A\-1= M\1= ( 3)
LOAD A\-39= M\3= ( 2)
TADI A\-39 M\2= ( 3)

LOAD A\-39= M\3= ( 2)
LOADI A\-46= M\2= ( 3)
DCA A\-46* M\-42= ( 2)
TAD A\-39 M\-42* ( 2)
LOADI A\-39= M\2= ( 3)
TAD A\-39 M\3= ( 2)
DCA A\-39* M\-35= ( 2)
TAD A\-1 M\-35* ( 2)
LOAD A\-52= M\5= ( 2)
TAD A\-52 M\4= ( 2)
APPENDIX U

UNCOL DIAGRAMS

UNCOL diagrams (also called T-diagrams) provide a notation for illustrating the inter-relationships within sets of translations.

Other diagrammatic notations have been devised: all have defects:

I use UNCOL notation because it appears to be the most widely-known.

For the purposes of this appendix a "translator" is a generator,

compiler, or assembler. In examples I name programs "Z", programming languages "P", machines and machine codes "M"; when the nature of a language does not matter I will use a number in place of a name.

A program other than a translator is drawn as a vertical bar inside which is written:-

* at the top: the name of the program; and
* at the bottom: the language it is in.

A machine is represented as an equilateral triangle with one vertex pointing downwards, inside which is written the name of the machine.

A program runnable on a machine is illustrated by placing the bottom of the bar representing the program on top of the triangle representing the machine: the name of the language in the bottom of the bar must match the name of the machine.

A translator is represented by a T-shape:

inside it is written:

* at the left of the cross-piece: the name

of the language translated from;

* at the right of the cross-piece: the name of the language

translated to; and

* at the bottom of the vertical: the name of the language the

translator is in.

The diagram illustrates a translator from 1 to 2 in 3.

To illustrate the translation of a program, the program-from is drawn with the bottom of its vertical touching the left side of the cross-piece of the T of the translator; the program-to is drawn similarly on the right. The languages-from, languages-to and program-names must match. The diagram illustrates a compilation of a program Z from programming language P to machine code M using a compiler which runs on the machine M.

Translators must themselves be programmed, and so will need translation. The diagram illustrates the most general single translation-of-a-translator: starting with a translator from 1 to 2 in 3, to obtain a translator from 1 to 2 in 4, using a directly runnable translator from 3 to 4 in M.


If a compiler from P to M were written in a compiler-writing language C, and we had a compiler from C to M on M, then we could produce a runnable compiler for P to M in M.

Composing more than three T's we can illustrate quite complex sets of translations and cross-translations (U), as shown in section 2.5 of this thesis.
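
(The matching rule these compositions obey can be stated executably; the
types and names in this sketch are illustrative only.)

    from collections import namedtuple

    Translator = namedtuple('Translator', 'src dst lang')   # from, to, written-in

    def translate(t, via, machine):
        # Run translator `via` on `machine` to translate translator `t`.
        assert via.lang == machine, 'the applying translator must run directly'
        assert t.lang == via.src, 'language-from must match the text translated'
        return Translator(t.src, t.dst, via.dst)

    # The bootstrap above: a P->M compiler in C, put through a C->M compiler
    # running on M, yields a runnable P->M compiler in M.
    p_to_m_in_c = Translator('P', 'M', 'C')
    c_to_m_on_m = Translator('C', 'M', 'M')
    print(translate(p_to_m_in_c, c_to_m_on_m, 'M'))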

For other information on T-diagrams and related notations for

translator interactions see [BRA61], [EARL70], [SKLA68] and [ROS76].

(U) A cross-translation is a translation that is run on one machine (M1 in the diagram) to produce output runnable on another machine (M2 in the diagram).

BIBLIOGRAPHY

Many of the entries in this bibliography are annotated with one or more short remarks on the entry. Each entry is also annotated with the numbers of sections to which the entry is relevant: if the entry is cited in the text, then the page number of the citation is also given.

The bibliography is followed by a cross-index from section number to the bibliography entries relevant to it.

The following abbreviations are commonly used in the bibliography entries:
-- CA -- Computer Abstracts.
-- CACM -- Communications of the ACM.
-- CMU -- Carnegie-Mellon University, Department of Computer Science.
-- Comp J -- BCS Computer Journal.
-- CR -- ACM Computer Reviews.
-- DAI -- Dissertation Abstracts International.
-- IEEE SE -- IEEE transactions on Software Engineering.
-- JACM -- Journal of the ACM.
-- SIGPLAN -- ACM Sigplan notices.
-- SWP&E -- Software Practice and Experience.
-- TOPLAS -- ACM transactions on programming languages and systems.

[ABR79] Abrahams. The map coloring problem. SIGPLAN 14:4 10-11. -- programming exercise relevant to register allocation. -- see sections 6.7.2 and A.1.3.1.

[AHO70] Aho, Sethi, & Ullman. A formal approach to code optimisation. SIGPLAN 5:7 86-100. -- see section C.2.

[AHO71] Aho A.V., & Ullman J.D. Translations on a context free grammar. Information & Control 19 439-475. -- see section A.2.1 (page 326).

[AHO75] Aho A.V., Johnson S.C., & Ullman J.D. Deterministic parsing of ambiguous grammars. CACM 18:8 441-452. -- see section A.3.4 (page 341).

[AHO76] Aho A.V., & Johnson S.C. Optimal code generation for expression trees. JACM 23 488-501. -- see section A.3.1 (page 338).

[AHO76'] Aho, Johnson, & Ullman. Code generation for expressions with common sub-expressions. Proc 3rd ACM Symp Princ Prog Langs 19-31. -- see section C.2 (page 347).

[AHO77] Aho, & Sethi. How hard is compiler code generation? Automata Languages and Programming, 4th colloq. (eds Salomaa & Steinby; Springer Lecture Notes in Comp. Sci. number 52). -- see section C.2 (page 347).

[AHO77'] Aho, Johnson, & Ullman. Code generation for machines with multi-register operations. Proc 4th ACM symp princ prog langs 21-28. -- see section C.2 (page 347).

[ARS79] Arsac J.J. Syntactic source to source transforms and program manipulation. CACM 22:1 43-54. -- see section 4.4.10 (page 113).

[AUS78] Auslander, & Strong. Systematic recursion removal. CACM 21:2 127-134. -- see section 8.2.2 (page 298).

[BAG70] Bagwell. Local optimisations. SIGPLAN proc symp compiler optimisation. -- see section 8.2.2 (page 298).

[BAL79] Ball J.E. Predicting the effects of optimisation on a procedure body. SIGPLAN 14:8 214-220. -- deciding when to inline: needs an execution profile. -- see section 8.3.2 (page 300).

[BAR77] Barbacci M., Barnes G., Cattell R.G.G., & Siewiorek D. ISPS reference manual. CMU-CS-TR-79-137. -- see section 8.4.3.

[BAR78] Barbacci M.R., & Parker A. Using emulation to verify formal architecture descriptions. IEEE "Computer" magazine, May 1978, 51-56. -- see section 2.3.6 (page 45), and section 8.4.3 (page 304).

[BEN64] Bennett, & Neuman. Extension of existing compilers by sophisticated use of macros. CACM 7:9 541-542. -- see section 8.2.3 (page 299).

[BIR77] Bird R.S. Notes on recursion elimination. CACM 20:6 434-441. -- see section 8.2.2 (page 298).

[BIR77'] Bird R.S. Improving programs by the introduction of recursion. CACM 20:11 856-863. -- see section 8.2.1 (page 297).

[BIR79] Bird R.S. Recursion elimination with variable parameters. Comp J 22:2 151-154. -- see section 8.2.2 (page 298).

[BIT74] Bitner J.R. Use of macros in backtrack programming. Illinois U UIUCDCS-R-74-637. -- see section 8.2.5 (page 299).

[BIT75] Bitner J.R., & Reingold E.M. Backtrack-programming techniques. CACM 18:11 651-656. -- macros for backtracking: examples of depth-first-search, 8 queens. -- see section 8.2.5 (page 299).

[BOY72] Boyce R.F. Topological reorganisation as an aid to program simplification. DAI 72-50860, and Purdue U. PhD thesis. -- translates NELIAC source to Aikin dynamic algebra (goto free) and transforms that: implementation shows marginal benefits. -- see [NOO71]. -- see sections 6.7.1 (page 237) and 8.2 (page 296).

[BRA61] Bratman. An alternative form of the UNCOL diagram. CACM 4:3 142-. -- see section U (page 395).

[BRE79] Brelaz. New methods to color the vertices of a graph. CACM 22:4 251-. -- relevant to register allocation. -- map colouring as an NP-complete problem: comparison of methods. -- see sections 6.7.2 and A.1.3.1.

[BRO71] Brown, Calderbank, & Poole. Some comments on the portability of a large Algol program. SWP&E 1 367-371. -- SID on KDF9. -- see section 8.3.6 (page 302).

[BRO81] Brosgol B.M. TCOL/ADA and the "middle end" of the PQCC Ada compiler. SIGPLAN 15:11 101-112 (Proc ACM-SIGPLAN symp on the Ada prog lang). -- see section A.1.2 (page 319).

[BROO62] Brooker R.A., & Morris D. A general translation program for phrase structure languages. JACM 9:1 1-10. -- see section 2.3.6 (page 45).

[BROO63] Brooker R.A., Morris D., & Rohl J.S. The Compiler-compiler. Annual Reviews in Automatic Programming, Vol 3 p229-. -- see section 2.3.6 (page 43).

[BROO67] Brooker R.A., Morris D., & Rohl J.S. Experience with the Compiler-Compiler. Comp J 9:4 345-349. -- see section 2.3.6 (page 43).

[BROW67] Brown P.J. The ML/I Macro Processor. CACM 10:10 618-623. -- see section 2.3.5 (page 39).

[BROW69] Brown P.J. Using a macroprocessor to aid software implementation. Comp J 12 327-331. -- see section 2.3.5 (page 39).

[CAM78] Campbell. A compiler definition facility based on the syntactic macro. Comp J 21:1. -- CR33972. -- macros within LL(1). -- see section 8.2.3 (page 299).

[CAMP73] Campbell-Kelly. An introduction to macros. Macdonald Computer Science Monographs. -- CR26828. -- UNCOL. -- see section 2.3.1 (page 32).

[CAR77] Carter J.L. A case study of a new code generation technique for compilers. CACM 20:12 914-920. -- CR32363. -- see also [HAR77]. -- see section A.3 (page 338).

[CAT77] Cattell R.G.G. A survey and critique of some models of code generation. CMU-CS-TR (unnumbered). -- see section A.3 (page 338).

[CAT78] Cattell R.G.G. Formalisation and automatic derivation of code generators. CMU-CS-78-115 & AD-A058-372/3UC. -- review: SIGPLAN 14:4, 2. -- see sections A.1.2 (page 318) and A.1.3.2 (page 322).

[CAT79] Cattell R.G.G., Newcomer J.M., & Leverett B.W. Code-generation in a machine-independent compiler. SIGPLAN 14:8 65-75 (symposium on compiler construction). -- see section A.1.3.2.

[CAT80] Cattell R.G.G. Automatic derivation of code generators from machine descriptions. TOPLAS 2:2 173-190. -- see section A.1.3.2.

[CHA80] Chaitin G.J., Auslander M.A., Chandra A.K., Cocke J., Hopkins M.E., & Markstein P.W. Register allocation via coloring. IBM Research report RC3395. -- see sections 6.7.2 (page 242) and A.1.3.1 (page 321).

[CHE70] Cheatham T.E., & Standish T.A. Optimisation aspects of Compiler-compilers. SIGPLAN 5:10 10-17. -- see section 6.7.1 (page 237).

[CHE78] Chevance, & Heidet. Static profiles and dynamic behaviour of Cobol programs. SIGPLAN 13:4 44-57. -- see section 8.3.2 (page 300).

[CLI78] Clifton. A technique for making structured programs more readable. SIGPLAN 13:4 53-. -- prettyprinting with emboldening. -- flowcharting. -- see section 8.3.1 (page 300).

[COLE74] Coleman S.S., & Waite W.M. The mobile programming system JANUS. SWP&E 4:1 5-23. -- see also [HAD78]. -- see section 2.3.1 (page 32).

[COLE74'] Coleman S.S. JANUS: a universal Intermediate Language. DAI 74-22327 and U. Colorado Dept of Electrical Eng. -- see section 2.3.1 (page 32).

[CON58] Conway M.E. Proposal for an UNCOL. CACM 1:10 5-8. -- see section 2.3.1.

[CON70] Conrow, & Smith. NEATER2: a PL/1 source statement re-formatter. CACM 13:11 669-675. -- prettyprinting with elementary source simplification. -- see section 8.3.1 (page 300).

[DAR81] Darlington J. An experimental program transformation and synthesis system. Artif Intell 16:1 1-46. -- see section 8.1 (page 295).

[DAV80] Davidson J.W., & Fraser C.W. The design and implementation of a retargetable peephole optimiser. TOPLAS 2:2 191-202; corrigendum TOPLAS 3:1 110. -- see also [FRA79]. -- see section 6.7.1 (page 237).

[DEM77] Demers A.J. Generalised left corner parsing. Fourth ACM symp princ prog langs. -- see section 5.3.2 (page 144).

[DER69] DeRemer F.L. Practical translators for LR(k) languages. MIT project MAC TR-65. -- parsing only. -- see section 2.3.6 (page 42).

[DER73] DeRemer F.L. Transformational grammars for languages and compilers. U Newcastle on Tyne, Dept Comp Sci TR-50. -- see section A.2.1 (page 326).

[DON73] Donegan M.K. An approach to the automatic generation of code generators. Rice U., Comp Sci Eng. & DAI 73-21548. -- see [NOO79]. -- see section A.3 (page 338).

[DON79] Donegan M.K., et al. A code generator generator language. SIGPLAN 14:8 53-64 (Proc SIGPLAN symp compiler construction). -- see section A.3 (page 338).

[DON79'] Dongarra J.J., & Hinds A.R. Unrolling loops in Fortran. SWP&E 9 219-226. -- see section 8.2.2 (page 298).

[EAR70] Earley J. An efficient context free parsing algorithm. CACM 13:2 94-102 (also CMU 1968). -- see section 2.3.6 (page 42).

[EARL70] Earley, & Sturgis. A formalism for translator interactions. CACM 13:10 607-617. -- T-diagrams. -- see section U (page 395).

[ELS70] Elson M., & Rake S.T. Code-generation techniques for large-language compilers. IBM Syst J 9:3 166-188. -- see section A.3 (page 338).

[ELS78] Elshoff. Investigation into the effects of the counting method used in Software Science measurements. SIGPLAN 13:2 30-. -- see section 8.3.2 (page 300).

[EVA78] Evans R.V., Lockington G.S., & Reid T.N. Compiler Compiler methodology for problem oriented language compiler implementation. Comp J 21:2 117-121. -- see section 3.2.1 (page 63).

[FEL64] Feldman J.A. A formal semantics for computer oriented languages. Carnegie Inst of Tech, Pittsburgh, Pa. -- see section 2.3.6 (page 44).

[FLA79] Flajolet, Raoult, & Vuillemin. The number of registers required for evaluating arithmetic expressions. Theoretical Comp. Sci. 9:1. -- see section C.2.

[FLO61] Floyd R.W. An algorithm for coding efficient arithmetic operations. CACM 4:1 42-51. -- see section C.2.1 (page 348).

[FOR81] Forgy C.L. OPS5 Users manual. CMU-CS-81-135. -- see section 4.4 (page 91).

[FOS68] Foster J.M. A syntax improving program. Comp J 11:1 31-34. -- see section 2.3.6 (page 42).

[FRA71] Frailey. A study of code optimisation using a general purpose compiler. Purdue U & DAI 71-20451. -- see section A.3.1 (page 339).

[FRA77] Fraser C.W. A knowledge-based code generator generator. SIGPLAN 12:8 126-129. -- see sections A.1.1 (page 317) and A.3.3 (page 339).

[FRA77'] Fraser C.W. Automatic generation of code generators. Yale Univ, Dept Comp Sci. -- see sections A.1.1 (page 317) and A.3.3 (page 339).

[FRA79'] Frailey D.J. An intermediate language for source and target independent optimisation. SIGPLAN 14:8 188-200. -- see section A.3.2 (page 339).

[FRA79] Fraser C.W. A compact machine-independent peephole optimiser. Proc 6th ACM symp on princ prog langs 1-6. -- see also [DAV80]. -- see section 6.7.1.

[GAN76] Ganzinger H., Ripken K., & Wilhelm R. MUG1 - an incremental compiler-compiler. In: Proc ACM 76 415-418. -- see sections A.2 (page 325), A.2.2 (page 328) and A.2.3 (page 335).

[GAN77] Ganzinger H., Ripken K., & Wilhelm R. Automatic generation of optimising multipass compilers. In: IFIP 77 (ed Gilchrist) 535-540. North Holland. -- no mention of feedback problems. -- see sections A.2 (page 325) and A.2.4 (page 335).

[GIE78] Giegerich, & Wilhelm. Counter-one-pass features in one-pass compilation: a formalisation using attribute grammars. Inf proc lett 7:6 279-. -- see section A.2.4 (page 336).

[GLA77] Glanville R.S. A machine independent algorithm for code generation and its use in retargetable compilers. U Calif at Berkeley UCB-CS-78-01. -- see sections 3.2.2 (page 70), 6.5 (pages 198 & 199) and A.3.4 (page 340).

[GLA77'] Glanville R.S. A new method for Compiler Code Generation. U Calif at Berkeley: see also [GRA78]. -- see sections 3.2.2 (page 70), 6.5 (pages 198 & 199) and A.3.4 (page 340).

[GOG78] Goguen. On the automatic translation of intermediate-language programs to machine language. U. Toronto. -- see section A.3.5 (page 343).

[GRA78] Graham S., & Glanville R.S. A new method for Compiler Code Generation. Proc 5th ACM symp on principles of prog langs 231-240. -- is a review of [GLA77]. -- see sections 3.2.2 (page 70), 6.5 (pages 198 & 199) and A.3.4 (page 340).

[GRA80] Graham S.L., Harrison M.A., & Ruzzo W.L. An improved context free recogniser. TOPLAS 2:3 415-462. -- see section 2.3.6 (page 42).

[HAD78] Haddon, & Waite. Experience with the universal intermediate language JANUS. SWP&E 8 601-616. -- CR35231. -- see also [COLE74]. -- see section 2.3.1 (page 32).

[HAL68] Halpern M.I. Towards a general processor for programming languages. CACM 11:1 15-25. -- see section 2.3.5 (page 40).

[HAR75] Harrison W. A class of register allocation algorithms. IBM research report RC-5342. -- optimal colouring methods. -- see section 6.7.2.

[HAR77] Harrison W. A new strategy for code generation - the general purpose optimising compiler. Proc 4th ACM symp on Principles of Prog Langs 29-37. -- see also [CAR77]. -- IL schema. -- see section A.3 (page 338).

[HOP78] Hopwood G.L. Decompilation. University of California at Irvine, and DAI 78-11860. -- discusses possible classes of source and target, costs (human & machine), how completely it can be done, and what the translator would look like. -- implemented in Lisp for Varian/620/i - decompiles 5000-line programs. -- see section 8.5 (page 305).

[HOR66] Horwitz L.P., Karp R.M., Miller R.E., & Winograd S. Index Register Allocation. JACM 13:1 43-61. -- see section C.3 (page 354).

[ING72] Ingalls. The execution time profile as a programming tool. In: Design & optimisation of compilers (Rustin, ed) Prentice Hall, 107-128. -- see section 8.3.2 (page 300).

[JOH73] Johnsson R.K. A survey of Register Allocation. CMU (unnumbered technical report). -- see sections 6.7.2 and C.

[JOH75] Johnsson R.K. An approach to global register allocation. CMU. -- optimal colouring methods. -- see section 6.7.2.

[JOH77] Johnson S.C. A tour through the portable C compiler. Bell Telephone Labs. -- see also [SNY75]. -- see section A.3 (page 338).

[JOH77'] Johnson. LINT: a C program checker. TR #65. -- detects non-portabilities, dead variables, dead functions, dead code, use of variable before assignment, mismatch of function use & definition, type checking. -- see section 8.3.6 (page 302).

[JOH78] Johnson S.C. A portable compiler - theory and practice. Proc 5th ACM symp princ prog langs 97-104. -- see also [SNY75]. -- see section A.3 (page 338).

[JOY77] Joyner, Carter, & Brand. Using machine descriptions in program verification. IBM RC 6922. -- see section 8.4.1 (page 303).

[KAM77] Kameda T., & Abdel-Wahab H.M. Optimal code generation for machines with different types of instructions. In: IFIP 77 (ed Gilchrist) 129-132. -- see section A.3.6 (page 343).

[KAT78] Katz S. Program optimisation using invariants. IEEE Trans SE-4:5 373-389. -- see section 6.7.1 (page 237).

[KIB77] Kibler D.F., Neighbors J.M., & Standish T.A. Program manipulation via an efficient production system. SIGPLAN 12:8 163-173. -- see also [STA76,76']. -- see sections 4.4 (page 92) and 8.2 (page 296).

[KIM79] Kim J., & Tan C.J. Register assignment algorithms for optimising micro-code compilers - Part 1. IBM Research report RC7639. -- see section 6.7.2 (page 242).

[KIN81] King J.C. Program reduction using symbolic execution. IBM Research report RJ3051. -- see section 6.7.1 (page 237).

[KNU68] Knuth D. Semantics of context free languages. Math Syst Theory 2 127-145. -- see section A.2.1 (page 326).

[KOR80] Kornerup P., Kristensen B.B., & Madsen O.L. Interpretation and code generation based on intermediate languages. SWP&E 10 635-658. -- see section 2.3.2 (page 34).

[KOS71] Koster C.H.A. Affix grammars. In: Algol68 implementation (ed Peck) 95-109. North Holland. -- see section A.2.3 (page 333).

[LAM81] Lamb D.A. Construction of a peephole optimiser. SWP&E 11 639-647 & CMU-CS-80-141. -- see section 6.7.1 (page 237).

[LAV80] Lavington S. Early British Computers. Manchester University Press: ISBN 0-7190-0803-4 (hard), 0-7190-0810-7 (paper). -- CR38386. -- see section 1 (page 14).

[LEA66] Leavenworth. Syntax macros and extended translation. CACM 9 790-793. -- macros with alternative call structures, conditional expansion, and positional parameters. -- see section 8.2.3.

[LEC78] Lecarme O., & Peyrolle-Thomas M-C. Self-compiling compilers: an appraisal of their implementation and portability. SWP&E 8:2 149-170. -- see section 2.3.4 (page 37).

[LES70] Lester C. A consideration of the extensibility of the BCPL language-and-compiler system - appendix 2: "A description of BCPL". Essex U. Dept Comp Sci. -- see sections 3.2.1 (page 63) and 8.6 (page 305).

[LEV79] Leverett B., Cattell R.G.G., Hobbs S., Newcomer J., Reiner A., Schatz B., & Wulf W. An overview of the production quality compiler-compiler project. CMU-CS-TR-79-105. -- see section A.1.2 (page 317).

[LEV81] Leverett B.W. Register allocation in optimising compilers. CMU-CS-81-103. -- a ranking method. -- see sections 6.7.2 (page 242), A.1.3.1 (page 320) and A.1.2 (page 318).

[LOV76] Loveman D.B. Program improvement by source to source transformation. Proc 3rd ACM symp princ prog langs 140-152. -- see section 8.2.2.

[LOV77] Loveman. Program improvement by source-to-source transformation. JACM 24:1 121-145. -- optimisation meant by 'improvement'. -- partial model of compilation - code generation specified by transforms. -- see section 8.2.2 (page 298).

[LUC69] Lucas P., & Walk K. On the formal description of PL/1. In: Ann rev of autom prog 6:3. -- see [WAL69], [WEG72]. -- see section 8.6 (page 305).

[MAH81] Maher B., & Sleeman D.H. A data driven system for syntactic transformations. SIGPLAN 16:10 50-52. -- see section 4.4 (page 92).

[MCC80] McCaig J.M. The design of a portable translator of systems programming languages. Imperial College, CCD. -- see section 2.3.5 (page 40).

[MCK65] McKeeman W.M. Peephole optimisation. CACM 8:7 443-444. -- see section 6.7.1.

[MCK74] McKeeman W.M., & DeRemer F.L. Feedback-free modularisation of compilers. Lecture notes in Comp Sci #7 77-88. Springer Verlag. -- see section A.2.1 (page 326).

[MCK74'] McKeeman W.M. Programming language design. Chapter 5c of 'Compiler construction', editors Bauer & Eickel, Springer-Verlag. -- conflict: user-oriented input vs machine oriented output. -- see section 2.3 (page 30).

[MED81] Medina-Mora R., & Notkin D.S. ALOE users and implementors guide. CMU-CS-81-145. -- A Language Oriented Editor - configurable to a specified programming language. -- see section 8.3.3 (page 301).

[MIL71] Miller P.L. Automatic creation of a code generator from a machine description. MIT MAC-TR-85. -- see sections A.2 (page 325) and A.2.2 (page 328).

[MOS71] Moses J. Algebraic simplification: a guide for the perplexed. CACM 14:8 527-537. -- see section 8.2.1 (page 297).

[MOS76] Mosses. Compiler generation using denotational semantics. Springer-Verlag lecture notes in Comp Sci 45. -- see section 2.3.6 (page 45).

[NEW72] Newey M.C., Poole P.C., & Waite W.M. Abstract machine modelling to produce portable software - a review and evaluation. SWP&E 2:2 107-136. -- see section 2.3.1 (page 32).

[NEW75] Newcomer J.M. Machine independent generation of optimal local code. CMU technical report (unnumbered) & DAI 75-21781. -- see section A.3 (page 338).

[NOO71] Noonan R.E. Computer programming in a Dynamic Algebra. Purdue U. -- canonical form for a class of sorting algorithms. -- Aikin dynamic algebra. -- see [BOY72]. -- see section 6.7.1.

[NOO79] Noonan R.E. The design of relatively machine-independent code generators. College of William and Mary, Williamsburg VA & DAI 79-13628. -- CA22367. -- uses the idea of CGGL from Donegan. -- CGGL found convenient for expressing the complex case analysis needed for good local IL code. -- see [DON73]. -- see section A.3 (page 338).

[PEI77] Peirce, & Rowell. Transformation-directed compiling system. Comp J 20:2 109-115. -- see section 3.2.1 (page 63).

[PER79] Perkins D.R., & Sites R.L. Machine independent PASCAL code optimisation. SIGPLAN 14:8 201-207. -- see section 6.7.1 (page 237).

[RICH69] Richards M. BCPL: a tool for compiler writing and systems programming. AFIPS SJCC 34 557-567. -- see section 2.3.2 (page 33).

[RICH71] Richards M. The portability of the BCPL compiler. SWP&E 1:2 135-146. -- see section 2.3.2 (page 33).

[RICH74] Richards M. Bootstrapping the BCPL compiler using INTCODE. In "Machine oriented high level languages", editors van der Poel and Maarssen, Publ. N.Holland. -- see section 2.3.2 (page 33).

[RICH78] Ritchie D.M., Johnson S.C., Lesk M.E., & Kernighan B.W. The C programming language. Bell system tech J 57:6 1991-2019. -- see section 2.3.4 (page 37).

[RICH80] Richards M., & Whitby-Strevens C. BCPL - the language and its compiler. Cambridge University Press: ISBN 0 521 21965 5. -- see section 2.3.2.

[RIP75] Ripken K. Generating an intermediate-code generator in a compiler writing system. In: International Computing Symposium (eds Gelenbe & Potier) 121-127. North Holland. -- see sections A.2 (page 325) and A.2.1 (page 326).

[ROS76] Rosin. A graphical notation for describing system implementation. Iowa State U Dept of C.S. TR 76-8. -- see section U (page 395).

[RUD75] Rudmik. On the generation of optimising compilers. U Toronto, dept of Electrical Engineering & Canadian theses on microfiche 335120. -- a proposal - apparently not implemented. -- program represented as graphs of which the control graph is a sub-graph. -- transforms between graphs describing optimising transforms for code motion. -- intended more as a descriptional tool than as an implementation method. -- see section 8.2.

[RUD79] Rudmik A., & Lee E.S. Compiler design for efficient code generation and program optimisation. SIGPLAN 14:8 127-138. -- see section 6.7.1 (page 237).

[RUT77] Rutter. Improving programs by source-to-source transformation. DAI 77-26774. -- reviewed in SIGPLAN 13:1. -- system to manipulate Lisp programs. -- reads the program and test-data for the program. -- proposes improvements based on the program text and on execution statistics. -- verifies correctness of the improvements (for the test data) and measures the improvement. -- typical improvement of 10% to 50%. -- see sections 8.2.2 (page 298), 8.3.2 (page 300) and 8.4.1 (page 303).

[SAL75] Salvadori, Gordon, & Capstick. Static profile of Cobol programs. SIGPLAN 10:3 20-33. -- see section 8.3.2 (page 300).

[SAM75] Samet. Proving the correctness of translations involving optimised code. Stanford U & DAI 75-25601. -- see section 8.4.1 (page 303).

[SCH74] Schneck, & Angel. A Fortran to Fortran optimising compiler. Comp J 16:4 322-330. -- machine-independent optimisations. -- see section 8.2.2 (page 298).

[SCH74'] Schneider, & Winograd. Translation grammars for compilation and decompilation. BIT 14:1 78-. -- CR27450. -- see section 8.5 (page 305).

TSET7Q1 Sethi R., 8 tJllman J.D. The generation of optimal code for arithmetic °xoressions. J A C M 17:4 715-728. — see section C.2 (Daae 347).

CSHA531 SHARE ad-hoc committee on Universal languaaes. The oroblem of Droqram communication with changing machines: a proposed solution. C ACM 1, 12-. ~ UNCOL. — see section 2.3.1 (page 32).

FSHA741 Shaw M. Reduction of compilation costs through language contract ion. CACM 17:5 245-. — CR27037. — Fitting the lanquaqe to the application. — see section 8.2.3 (oage 299). rsiT79'l Sites R.L, ft Perkins D.R. Universal P-code definition. U calif at San Dieqo CS-79/029 and NTIS P0292082. — see section A.3.7 (page 343). 419. rSIT791 Sites R.L. Machine independent reqister allocation. SIGPLAN 14:8 221-225. — register allocation Pefore instruction choice by rankinq the variables. — see sectipns 6.7.2 (oage 242) and A.3.7 (oaqe 343).

CSKLA681 Sklansky, Finkelstein, 3 Russell. A formalism for translator interactions. J ACM 15:2 165-. — see section U (page 395).

CSNY753 Snyder A. A portable comoiler for the language C. MIT MAC-TR-149. — see also rJ0H77,73"l. — see section A.3 (naqe 333). nSTA76"l Standish. An examnle of program improvement using source-to-source transformations. U Calif at Irvine, Dept of Inf 8 Como Sci. — see sections 4.4 (oaqe 92) and 3.2 (oage 296).

CSTA76H Standish T.A., Harriman D.C., Kibler F., 8 Neighbors J.M. The Irvine program transformation catalogue. U Calif at Irvine. — See also CKIB771, CSTA761. — see sections 4.4 (page 92) and 8.2 (nage 296).

CSTA761 Standish T.A., Kibler D.F., 3 Neighbors J.M. Improving and refining programs by program manipulation. Proc national ACM conf 509-516. — See also TKI8771, CSTA76'1. — see sections 4.4 (page 92) and 3.2 (page 296).

CSTE611 Steele T.B. A first version of UNCOL. Proc WJCC 19, 371-373. — see section 2.3.1 (oage 32). 420.

CSTE63] Steele T.3. UNCOL - the myth and the fact. Ann Rev Autom Proq 2 325-344. — see section 2.3.1 (paqe 52).

CSTR58] Stronq. The problem of program communication with changing machines: a proposed solution. CACM 1:8 12-18 ft 1 :9 9-15. — UNCOL. — see sect ion 2.3 .1 . rSTRA65I Strachey C. Towards a general ourpose macroDrocessor. COMP J 3 225-241. — see section 2.3.5 (Dane 39).

CTAN771 Tan C.J. Reqister assi nnment algorithms for ODtimisinq conpilers. IBM res°arch reoort RC 6481. — see sections 6.7.2 and C. rTAM821 Tanenbaum A.S., van Staver^n H., ft Stevenson J.IJ. Usinq peeohole optimisation on intermediate code. TOPLAS 4:1 21-36. — a common IL for PASCAL, C, FORTRAN. — attempts to be an UNCOL for a limited class of lanquanes. — oeephole optimisation of their IL, table driven: about 15% improvement. — see sections 2.3.1 and 6.7.1.

[TER78] Terman C.J. The specification of code generation algorithms. MIT Project MAC. — see section A.3.8 (page 344).

[TRI78] Triance. A study of Cobol portability. Comp J 21:3 278-. — portability within the ANS74 Cobol standard. — see section 8.3.6 (page 302).

[WAI67] Waite W.M. A language independent macro processor. CACM 10, 433-440. — see section 2.3.5 (page 40).

[WAI68] Waite W.M. The STAGE2 macro processor. U Colorado, Dept of Electrical Eng. — see section 2.3.5 (page 40).

[WAI69] Waite W.M. Building a mobile programming system. Comp J 13:1 28-31, and U Colorado Graduate School Computer Centre. — macro based. — see section 2.3.5 (page 40).

[WAI70] Waite W.M. The mobile programming system STAGE2. CACM 12, 507-510. — macro based. — see section 2.3.5.

[WAL69] Walk K., & al. Abstract syntax and interpretation of PL/1. IBM tech report TR 25.098, IBM Labs Vienna. — see also [WEG72], [LUC69]. — see section 8.6 (page 305).

[WAS72] Wasilew. A compiler writing system with optimisation capabilities for complex order structures. DAI 72-32604 and N.W. Univ. — see section A.1.3.2 (page 322).

[WAT74] Watt D. LR(k)-parsing of affix grammars. U Glasgow, Dept of Comp Sci, TR-7. — see section A.2.3 (page 333).

[WEB80] Weber M., & Bernstein S.L. Global register allocation for non-equivalent register sets. SIGPLAN 15:2 74-81. — see section 6.7.2 (page 242).

[WEG72] Wegner P. The Vienna definition language. Comp Surv 4:1 5-63. — review paper. — see [WAL69], [LUC69]. — see section 8.6 (page 305).

[WEG75] Wegbreit. Mechanical program analysis. CACM 18, 528-. — program profiles. — see section 8.3.2 (page 300).

[WEI73] Weingart S.W. An efficient and systematic method of compiler code generation. Yale U., Dept Comp Sci & DAI 73-29501. — see section A.3 (page 338).

[WHI73] White. JOSSLE: a language for specifying and structuring the semantic phase of compilers. DAI 74-14349, and U. Cal. at Santa Barbara. — procedural description of code generators. — open ended universal IL.

[WIC75] Wick J.D. Automatic generation of assemblers. Yale University, Dept of Comp Sci. — see section 6.9 (page 273).

[WIL71] Wilcox T.R. Generating machine code for high-level programming languages. Cornell U., Dept Comp Sci TR 71-103.4 & DAI 72-9959. — piecewise code generation. — see section A.3.5 (page 342).

[WIL71'] Willers. Syntax-directed graph algorithm for input of equations to a Taylor-series system for solving O.D.E.s. Comp J 19:4 344-347. — see section 8.8 (page 307).

[WIL76] Wilhelm R., Ripken K., Ciesinger J., Ganzinger H., Lahner W., & Nollmann R. Design evaluation of the compiler generating system MUG1. Proc 2nd International Conf on Software Engineering 571-576. — see sections A.2 (page 325) and A.2.3 (page 335).

[WUL71] Wulf W., Russell D.B., & Habermann A.N. BLISS: a language for systems programming. CACM 14:12 780-790. — see section 2.3.4 (page 37).

[WUL75] Wulf W., Johnsson R., Weinstock C., Hobbs S., & Geschke C. The design of an optimising compiler. American Elsevier. — see sections 2.3.4 (page 37), A.1 (page 316) and A.1.2 (page 317).

[WUL80] Wulf W.A. PQCC: a machine-relative compiler technology. CMU-CS-80-144. — expert systems. — see sections 2.2.3 (page 30), A.1.1 (page 316), A.1.2 (page 318) and A.1.3 (page 319).

[YOU74] Young. The Coder: a program module for code generation in high-level language compilers. U Illinois Dept Comp Sci UIUCDCS-R-74-666 & DAI vol 35 p1731B. — procedural description of code generation.

CROSS-REFERENCE TO THE BIBLIOGRAPHY

1 (page 14) — LAV80
2.2.3 (page 30) — WUL80
2.3 (page 30) — MCK74
2.3.1 — CON58, STR58, TAN82
2.3.1 (page 32) — CAMP73, COLE74, MAD73, NEW72, SHA58, STE61, STE63
2.3.2 — RICH80
2.3.2 (page 33) — RICH69, RICH71, RICH74
2.3.2 (page 34) — .OR80
2.3.4 (page 37) — LEC78, RICH78, WUL71, WUL75
2.3.5 — WAI70
2.3.5 (page 39) — BROW67, BROW69, STRA65
2.3.5 (page 40) — HAL63, nccon, WAI67, WAI68, WAI69
2.3.6 (page 42) — DER69, EAR70, FOS68, GRA80
2.3.6 (page 43) — BROO62, BROO63, BROO67
2.3.6 (page 44) — FEL64
2.3.6 (page 45) — BAR78, MOS76
3.2.1 (page 68) — EVA73, LES70, PEI77
3.2.2 (page 70) — GLA77, GLA77', GRA78
4.4 (page 91) — FOR81
4.4 (page 92) — KIB77, MAH81, STA76, STA76', STA76''
5.3.2 (page 144) — DEM77
6.5 (pages 198 & 199) — GLA77, GLA77', GRA78
6.7.1 — FRA79, MCK65, NOO71, TAN82
6.7.1 (page 237) — BOY72, CHE70, DAV80, KAT73, KIN81, LAO81, PER79, RUD79
6.7.2 — ABR79, BRE79, HAR75, JOH75, TAN77
6.7.2 (page 242) — CHA80, KIM79, LEV81, SIT79', WEB80
6.9 (page 273) — WIC75
8.1 (page 295) — DAR81
8.2 — RUD75
8.2 (page 296) — BOY72, KIB77, STA76, STA76', STA76''
8.2.1 (page 297) — BIR77, MOS71
8.2.2 — LOV76
8.2.2 (page 298) — AUS78, BAG70, BIR77', BIR79, DON79, LOV77, RUT77, SCH74
8.2.3 — LEA66
8.2.3 (page 299) — BEN64, BIT74, BIT75, CAM73, SHA74
8.3.1 (page 300) — CLI78, CON70
8.3.2 (page 300) — BAL79, CHE78, ELS78, ING72, RUT77, SAL75, WEG75
8.3.5 (page 301) — IED81
8.3.6 (page 302) — BRO71, JOH77, TRI78
8.4.1 (page 303) — JOY77, RUT77, SAM75
8.4.3 — DAR77
8.4.3 (page 304) — DAR78
8.5 (page 305) — HOP73, SCH74'
8.6 (page 305) — LES70, LUC69, WAL69, WEG72
8.8 (page 307) — WIL71'
A.1 (page 316) — WUL75
A.1.1 (page 316) — WUL80
A.1.1 (page 317) — FRA77
A.1.2 (page 317) — LEV79, WUL75
A.1.2 (page 318) — CAT78, LEV81, WUL80
A.1.2 (page 319) — ORO81
A.1.3 (page 319) — WUL80
A.1.3.1 — ALBR79, BRE79
A.1.3.1 (page 320) — LEV81
A.1.3.1 (page 321) — CHA80
A.1.3.2 — CAT79, CAT80
A.1.3.2 (page 322) — CAT78, WAS72
A.2 (page 325) — GAN76, GAN77, MIL71, RIP75, WIL76
A.2.1 (page 326) — AHO71, DER73, KNU63, MCK74, RIP75
A.2.2 (page 328) — GAN76, MIL71
A.2.3 (page 333) — KOS71, WAT74
A.2.3 (page 335) — GAN76, WIL76
A.2.4 (page 335) — GAN77
A.2.4 (page 336) — GIE78
A.3 (page 338) — CAR77, CAT77, DON78, DON79, ELS70, HAR77, JOH77, JOH78, NEW75, NOO79, SNY75, WEI73
A.3.1 (page 339) — AHO76, FRA71
A.3.2 (page 339) — FRA79'
A.3.3 (page 339) — FRA77
A.3.4 (page 340) — GLA77, GLA77', GRA78
A.3.4 (page 341) — AHO75
A.3.5 (page 342) — WIL71
A.3.5 (page 343) — GOG78
A.3.6 (page 343) — KAM77
A.3.7 (page 343) — SIT79, SIT79'
A.3.8 (page 344) — TER78
C — JOH75, TAN77
C.2 — AHO70, FLA79
C.2 (page 347) — AHO76, AHO77, AHO77', SET70
C.2.1 (page 348) — FLO61
C.3 (page 354) — HOR66
U (page 395) — BRA61, EARL70, ROS76, SKLA68