Automatically Achieving Elasticity in the Implementation of Programming Languages

Michael Lee Horowitz

January 1988 CMU-CS-88-104

Submitted to Carnegie Mellon University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Copyright © 1988 Michael Lee Horowitz. All rights reserved.

This research was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4976 under contract F33615-87--1499 and monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, Ohio 45433-6543.

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the US Government.

Abstract

Several developments in programming language design and software engineering lead to a re-examination of the role of binding times in programming language implementation. We can alleviate problems associated with these developments by using event-driven processing to determine translation binding times. Our ultimate goal is to clarify the effects of different designs on binding times and to reduce binding time concerns in the language design process.

Determining the binding time of a translation action involves two concerns: the ordering of the action's execution relative to other actions and the translation phase in which the execution occurs (e.g. compile-time, link-time, or run-time). Some issues that affect these concerns include: how to handle forward references, language designs that require dynamic interpretation, the role of link-time in languages that allow separate compilation, and how to achieve elasticity.

Elasticity occurs when similar translation actions may be executed at very different binding times, typically in different translation phases. In particular, the binding time of each translation action is determined individually. Elastic translation, therefore, incurs the costs of late bindings only when language flexibility is used by the programmer. Otherwise, the programmer receives the benefits associated with static binding times (e.g. early error notification and efficient program execution).

We can model language translation as an event-driven process. Each semantic action needed to translate a program may be viewed as an event. Each event becomes enabled when its required data becomes available. Event-driven translation progresses by executing those events initially enabled and continuing with those enabled as a result. Since each action is handled individually, event-driven translation naturally determines the proper ordering and phase for executing events and achieves elasticity.

We can also generate event-driven translation systems automatically. Automatic generation removes from the language implementor the need to understand the event-driven translation model. In particular, event-driven systems can be generated from a known specification formalism (i.e. attribute grammars). An automatic generator of event-driven translators can also provide useful information about binding times during the language design process. Finally, event-driven systems may have potential in practical language implementations.

This thesis describes translation problems, the event-driven translation model, a generator of event-driven translators, a language design based on elasticity, and an evaluation of the model for practical language translation. Elastic translation has particular promise in the design and implementation of programming environments and command languages. Event-driven translation may also provide the means to quantify well-known design "principles" such as avoiding pre-emptive decisions.

Acknowledgments

If once a man indulges himself in murder, very soon he comes to think little of robbing; and from robbing he comes next to drinking and Sabbath-breaking, and from that to incivility and procrastination.
Thomas De Quincey, Murder Considered as One of the Fine Arts (1827)

I would like to extend my appreciation to the following people:

- To my parents, who taught me the discipline necessary for being a student.

- To Dr. James Naismith, who created the game of basketball, to the Celtics, the Lakers, WTBS, and ESPN for keeping me sane, and to all the gym rats who provided the competition that kept me fit.

- To everyone at the department, who have made, and continue to make, CMU one of the best places in the world, socially as well as technically.

- To my colleagues at Formative Technologies, who have expressed confidence in my abilities, particularly the one to finish.

- To Mary Shaw, who taught me never to be satisfied with less than my best.

- To all of the friends I have made at CS, who never let on that I was procrastinating badly.

- To Sharon Burks, who smoothed all administration bumps.

- To Richard Cohn, who provided many valuable comments and actually read the thesis in doing so.

- To Peter Hibbard, who taught me how to express my ideas better, and to Cynthia Hibbard, who helped me express my ideas better on paper.

- To Mario Barbacci, Chuck Eastman, and Maurice Herlihy, my committee, for their support, direction, and valuable advice.

- To Roger Dannenberg, my advisor, who never lost hope and helped me polish my concepts.

- To my son, Elijah, who has made procrastination ever so sweet and showed me the true meaning of life.

- To my wife, Laura Lupovitz, who has been my everything, my anchor in life, and my prime motivator to finish.

Table of Contents

1. Introduction
  1.1. Motivation
    1.1.1. Language design
    1.1.2. Separate compilation
    1.1.3. Relative ordering
    1.1.4. Programming environments
    1.1.5. Command languages
  1.2. Goals
  1.3. Previous Work
  1.4. Overview
2. Event-driven Translation
  2.1. Translation as Actions
  2.2. Translation as Events
  2.3. Binding Times
  2.4. Examples
    2.4.1. Relative ordering
    2.4.2. Link-time as a significant binding time
    2.4.3. Achieving elasticity
  2.5. Generation of Event-driven Translators
3. Target Machine and Specification Language
  3.1. Target Machine
    3.1.1. Stack Machine
    3.1.2. Auxiliary Models
  3.2. Specification Language
    3.2.1. Attribute Grammars
    3.2.2. The Specification of Events
    3.2.3. Specification Example
4. ELI Generation System
  4.1. The Generator
  4.2. The Intermediate Form
  4.3. The Translation Engine
5. Language Design and Elasticity
  5.1. Smalltalk
  5.2. Design Options
  5.3. Elastic-Smalltalk
    5.3.1. Syntax of Elastic-Smalltalk
    5.3.2. Semantics of Elastic-Smalltalk
  5.4. Review of Specification
  5.5. Review of Motivations
6. Evaluation and Practical Applications
  6.1. Effectiveness of Model
  6.2. Criteria of Practicality
    6.2.1. Ease of specification
    6.2.2. Ease of generation
    6.2.3. Code generation
    6.2.4. Practicality of generated translators
  6.3. Implementing Modula-2
  6.4. Other Applications
    6.4.1. Determining Re-compilation
    6.4.2. Syntax-directed Editing
7. Conclusions
  7.1. Commonality of Binding Time Determination
  7.2. Event-driven Translation
  7.3. Generation of Event-driven Translators
  7.4. Impact on Language Design
  7.5. Future Directions
    7.5.1. Generation of practical translators
    7.5.2. Language design
    7.5.3. Other applications
  7.6. Concluding Thoughts
Appendix A. Specification Language Syntax
Appendix B. Operation Signatures
Appendix C. Elastic-Smalltalk Syntax
Appendix D. Elastic-Smalltalk Examples
Appendix E. Elastic-Smalltalk Specification
Appendix F. Modula-2 Specification

List of Figures

Figure 2-1: Simple translation
Figure 2-2: Forward pointer type reference
Figure 2-3: Nested Algol68 blocks [Banatre 79]
Figure 2-4: Computing the block number of a closed clause
Figure 2-5: Consequence of hidden type representation
Figure 2-6: Evaluating typed constant expressions
Figure 2-7: Generating iterator loop exit test
Figure 2-8: Type checking an assignment statement
Figure 2-9: Array access bounds check
Figure 2-10: Message lookup when type declarations are allowed
Figure 2-11: Form of Classical Translator Generator
Figure 3-1: Stack Machine Operations Codes, Part I
Figure 3-2: Model Operations, Part 1: Atomic models
Figure 3-3: Example Attribute Grammar Production
Figure 3-4: Syntax of a Production
Figure 3-5: Computational Event Syntax
Figure 3-6: Production with Computational Events
Figure 3-7: Specification for Based Numbers
Figure 4-1: Example of event ordering
Figure 4-2: Contents of an Abstract Syntax Tree Node
Figure 4-3: The Abstract Syntax Tree component of an Intermediate Form
Figure 4-4: An Abstract Syntax Tree component during translation
Figure 4-5: Operation of a Phase
Figure 4-6: Operation of a Compilation Phase
Figure 4-7: Operation of Linking Phase
Figure 4-8: Production for Modula-2 WHILE Statement
Figure 5-1: The need for Abstract Superclasses
Figure 5-2: Class using an Abstract Superclass
Figure 5-3: Type Checking Possibilities
Figure 5-4: Message Lookup Possibilities
Figure 5-5: Interface for SortableList
Figure 5-6: Implementation for SortableList
Figure 5-7: Production for Method Declaration, Part 1
Figure 5-8: Production for RETURN statement
Figure 5-9: Assignment Expression Production
Figure 5-10: Production for Message Invocation
Figure 5-11: Production for Actual Argument
Figure 6-1: A Legal Forward Reference
Figure 6-2: Simple Identifier Reference
Figure 6-3: Forward References to Type Identifiers
Figure 6-4: Declaration Identifier Reference
Figure 6-5: Code Generation Example
Figure 6-6: Determining Re-compilation
Figure D-1: Interface for FinancialHistory
Figure D-2: Implementation for FinancialHistory
Figure D-3: Interface for LinkList
Figure D-4: Implementation for LinkList

Chapter 1 Introduction

Several recent developments in programming language design and software engineering have led to a re-examination of the role of binding times in the implementation of programming languages. In this thesis, we describe these developments and show how associated problems may be alleviated by using event-driven processing to determine and control translation binding times. Our ultimate goal is to demonstrate the effects of different language design decisions on binding times and to reduce concerns over implementation binding times in the language design process.

By binding time, we mean the point during program translation at which a translation action is executed. Examples of translation actions include the type checking of expressions, allocation of storage to program variables, bounds checking for array accesses, declaration of names in identifier scopes, and evaluation of program constants. The binding times of the translation actions required for a particular program possess a partial ordering: for example, storage allocation cannot occur before the action that calculates a variable's size. In addition, binding times fall into different classes, determined by the earliest point during translation when each action may be executed. Pratt identifies several classes, including language definition, language implementation, program compilation, program linking, and program execution [Pratt 75]. In this thesis, we focus on the issues concerning the latter three.

As we will demonstrate, each translation action can be considered an event. The information required by an action represents the preconditions for the corresponding event. An event-driven translation processor automatically determines the binding time of each individual semantic action needed to translate a program. Furthermore, the event-driven model treats all translation uniformly (including execution semantics), so much so that it is possible to generate event-driven translation systems from non-procedural specifications. We describe the event-driven model for translation more formally in Chapter 2. Subsequent chapters describe the ELI (Event-driven Language Implementation) system, our automatic generator of event-driven translators.

1.1. Motivation

Many of the issues we encounter can be characterized by the problem of determining the binding time of each translation action. Tension may exist, for example, between static and dynamic binding times or between different classes of static binding times. 1 In addition, a correct language implementation depends on determining a correct execution order for all translation actions.

A common theme in these issues concerns the level of flexibility a language makes available to programmers. The dictionary defines flexibility as the capability for modification [Webster 70]. We can associate the level of flexibility to the number of choices a programmer has for each language feature. As an example, consider the iterator loop construct (often called the "FOR statement"). In many languages, this construct operates only on integer values. Furthermore, some languages restrict the step value either to the value one (1), to positive values, or to constant values. Shaw and Wulf refer to such restrictions as "pre-emptive decisions" by the language definition or implementation [Shaw 80].

The level of flexibility of a language construct often relies directly on the binding times of the actions needed to translate that construct. A pre-emptive decision such as allowing only integer values or always using a pre-defined step value of one reflects a binding time at language definition. Similarly, requiring a constant step value permits the selection of the loop exit test at compile-time. Early binding times, however, adversely affect flexibility only when they reduce the choices available to the programmer.
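To make the step-value example concrete, the following minimal Python sketch shows how a translator might select the loop exit test early when the step is a known constant and defer the selection to run-time otherwise. The name `select_exit_test` and its calling convention are assumptions made for illustration; they do not come from the thesis or from the ELI system.

```python
from typing import Callable, Optional

# An exit test takes (index, limit, step) and reports whether the loop is done.
ExitTest = Callable[[int, int, int], bool]


def select_exit_test(constant_step: Optional[int]) -> ExitTest:
    """Choose the exit test for an iterator loop.

    constant_step is the step value if it is known during compilation,
    or None if it only becomes known during program execution.
    """
    if constant_step is not None:
        # Early binding: the sign of the step is known, so the comparison
        # is selected once and no sign test remains in the loop.
        if constant_step > 0:
            return lambda index, limit, _step: index > limit
        if constant_step < 0:
            return lambda index, limit, _step: index < limit
        raise ValueError("a constant step of zero never terminates")

    # Late binding: the sign test is deferred and repeated at run-time.
    # A run-time step of zero is assumed not to occur in this sketch.
    return lambda index, limit, step: (
        index > limit if step > 0 else index < limit
    )
```

Both results are called the same way, so the surrounding loop code does not change; only the cost of the exit test depends on when the step value became known.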

1.1.1. Language design

Dynamic binding times have been used frequently to provide flexibility in programming languages, at the cost of slower program execution. Two popular languages, Lisp and Smalltalk, both possess features requiring run-time interpretation. Most actual uses of these features, however, do not need the extra flexibility that necessitates late binding times. Several design modifications have been proposed and implemented to allow compilation (i.e. earlier binding times) of programs to achieve greater execution efficiency.

Lisp has been cited frequently as the canonical example of a language providing great flexibility because of late binding times [Weissman 67]. Dynamic scoping, together with complete knowledge of the language's basic structures, enables one to compose operations that manipulate all possible abstractions (e.g. pretty printing, structure copying). Because of this feature, however, the language processor cannot make any assumptions about the type of an expression involving a program variable. Thus, integer expressions induce overhead in both time and storage to permit run-time type checks. In one Lisp variant, Common Lisp, a programmer may construct "types" (e.g. structures or arrays) and "assert" that variables always take on values of a given type (e.g. (declare (integer x))). Using this information, a compiler can generate code statically that is more efficient in both time and space [Steele 84].

1 The terms static and dynamic refer to translation that occurs before and during program execution, respectively.

The second example involves the language Smalltalk [Goldberg 83]. Unlike Lisp, Smalltalk supports user-defined types. A type, or class, is characterized by the operations, or messages, that may be applied to objects of that type. Flexibility results from the dynamic determination of an operation's implementation (called a method) for each invocation. Researchers have investigated how to determine many methods during a static analysis of a program by deducing or declaring the type of each expression [Suzuki 81, Borning 82]. Suzuki's work analyzes unchanged Smalltalk programs, whereas Borning and Ingalls propose adding type declarations to the language. Implementations of their techniques can realize both better execution efficiency and the benefits of program correctness associated with static type-checking. The flexibility provided by dynamic message lookup, however, cannot be maintained with enforced static type-checking in a language permitting only single inheritance. We prove this assertion and apply the results of this thesis to a similar design change to Smalltalk in Chapter 5.

1.1.2. Separate compilation

The examples above concentrate on the difference between static and dynamic binding times. Languages that provide facilities for separate compilation raise similar issues concerning different classes of static binding times: compilation and linking. In addition to partitioning the name space, separate compilation allows the hiding of information. The simplest facilities hide only the implementations of procedures (e.g. Modula-2 [Wirth 82]). Others conceal the representations of user-defined types, either with syntactic restrictions (e.g. Ada's PRIVATE section 2 [Ada 83]) or by requiring a particular storage model (e.g. CLU's object-oriented abstractions [Liskov 77] 3).

Information that is hidden syntactically from the programmer is also hidden from the language compiler. Thus, translation actions that depend on hidden data, such as storage locations for global variables and linkages for inter-module routine calls, cannot be executed until link-time. Other actions that require the entire program (e.g. the use of flow analysis for optimization [Cooper 86]) must also wait until all modules become available. Section 2.4.2 presents several examples of language features that involve more interesting link-time translation.

1.1.3. Relative ordering

The implementation of a language processor depends not only on determining the binding time class of a translation action but also its binding time relative to the execution of other semantic actions within that class. As noted above, each translation action may require and produce semantic data. The binding time of an action, therefore, must occur after its required data becomes available. Furthermore, the execution of actions that depend on the data it computes must also wait.

2 Ada is a registered trademark of the Department of Defense.

3 CLU implementations need not use reference semantics for all program data. Integers, for instance, may be allocated using stack storage semantics. The storage allocation for arrays of known dimensions may also be optimized to reside on the stack. The language uses reference semantics, however, to ease the translation of user-defined abstractions that hide representations [Liskov 77].

For example, consider a language that provides a GOTO construct. In normal one-pass compiler construction, forward references to labels are handled by building chaining structures [Banatre 79]. In our terminology, a one-pass compiler is designed to execute each translation action as it is instantiated. When instantiated, however, a semantic action that translates a forward GOTO statement requires an unknown branch destination address. Chaining is normally the means by which the binding times of semantic actions are delayed until these addresses become known.
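The chaining technique described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the structure used in any particular compiler: the instruction format (`("goto", label)`, `("label", name)`) and the function name `translate_one_pass` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Instruction = Tuple[str, Optional[object]]


def translate_one_pass(program: List[Instruction]) -> List[Instruction]:
    """Translate gotos to jumps in one pass, back-patching forward references."""
    code: List[Instruction] = []
    addresses: Dict[str, int] = {}     # label name -> code address
    chains: Dict[str, List[int]] = {}  # label name -> jump slots awaiting it

    for op, arg in program:
        if op == "label":
            addresses[arg] = len(code)
            # The address is now known: resolve every delayed jump.
            for slot in chains.pop(arg, []):
                code[slot] = ("jump", addresses[arg])
        elif op == "goto":
            if arg in addresses:
                # Backward reference: the action can execute immediately.
                code.append(("jump", addresses[arg]))
            else:
                # Forward reference: emit a placeholder and chain the slot.
                chains.setdefault(arg, []).append(len(code))
                code.append(("jump", None))
        else:
            code.append((op, arg))

    if chains:
        raise ValueError(f"undefined labels: {sorted(chains)}")
    return code


example = [("goto", "end"), ("push", 1), ("label", "end"), ("halt", None)]
print(translate_one_pass(example))
```

The `chains` table is specific to labels. A second kind of forward reference, such as a recursive type definition, would need its own table and its own patching code, which is the duplication the text goes on to criticize.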

Each language feature that involves forward references (e.g. recursive definitions of types and procedures) could require different chaining structures. A better approach is to organize the language processor so that delayed semantic actions are handled uniformly [Banatre 79]. A large part of this thesis is devoted to demonstrating that a single paradigm for implementing languages, event-driven translation, deals with binding time problems uniformly, correctly, and efficiently.

All of the issues presented above deal with determining the binding times of translation actions. A theme touched upon in Section 1.1.1 becomes more evident in the following examples. The emphasis shifts to allowing the execution of a class of translation actions (e.g. each array bounds check) over a range of binding times.

1.1.4. Programming environments

In his report on the Cedar programming environment, Teitelman discusses the need for flexible control over binding times [Teitelman 84]. The goals of the Cedar environment included support for multiple languages, specifically Mesa (a compiled language) and InterLisp (an interpreted language). Teitelman considers the support provided inadequate because too many bindings must occur early, even though Mesa includes facilities for delaying some type checking until run-time [Swinehart 85].

Goodwin emphasizes this point by arguing for dynamic type construction and type checking in programming environments [Goodwin 81]. He cites several examples of the need for dynamic typing, including the composition of total functions 4 (such as pretty printing in Lisp) and the manipulation of run-time values constructed by client programs (as in the debugger of a programming environment). Goodwin defines the concept of nominal type; nominal types are recognized and supported as types by the language processor. Some languages (e.g. Lisp) do not allow user-defined nominal types. Most compiled languages, on the other hand, support user abstractions as types, mostly to enable static type checking.

Several of the benefits associated with supporting user-defined nominal types (e.g. type safeness) do not depend on whether type construction and type checking are static. Goodwin cites the language EL1 5 as an excellent example of a language that allows types as first-class values in programs, noting that EL1 was designed initially to support an integrated programming environment. The language processor for EL1 allows "compilation" (i.e. static binding), but executes only those translation actions for which required information is known [Wegbreit 74].

4 A total function is legal on all inputs that can be generated in a program.

5 Do not confuse EL1 the language with ELI, the Event-driven Language Implementation system described later in this thesis.

1.1.5. Command languages

Perhaps the most significant development leading to the study of binding times involves the realization that command languages for interactive applications (e.g. text editor, debugger, mail handler, operating system) are also programming languages [Mashey 76]. The characteristics desirable in normal programming languages generally apply to command languages as well. In particular, since the needs of individual users cannot be anticipated fully, the need for extensible features becomes evident [Dolotta 80].

One logical consequence is that the same language should be used for both implementation and control of interactive applications [Mashey 76, Stallman 81]. Benefits would include reduced complexity in a software environment (i.e. only one language to learn), availability of application utilities to users interested in extending an application, and greater ease in combining applications, because the interface between implementation and control disappears [Heering 85].

Difficulties arise, however, in the design and implementation of such a language. Application programs often require efficient implementations that may entail early binding times during translation. Interactive command languages, on the other hand, should provide flexibility (such as dynamic typing; refer back to Section 1.1.4). 6 Heering and Klint describe many desirable characteristics of an integrated language. The language processor, for instance, should perform elastic type checking. That is, type checks are performed when the appropriate information becomes available, whether during static analysis or program execution [Heering 85]. Event-driven translation performs all semantic actions in an elastic fashion, and therefore constitutes a suitable model for implementing such a language.
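Elastic type checking, as characterized above, can be illustrated with a small Python sketch: the check runs during static analysis when both types are known and is otherwise returned as a deferred run-time check. The names `elastic_check` and `StaticTypeError` are assumptions introduced for this illustration and are not taken from Heering and Klint or from the ELI system.

```python
from typing import Callable, Optional


class StaticTypeError(Exception):
    """A type error detected before program execution."""


def elastic_check(
    target_type: Optional[type], value_type: Optional[type]
) -> Optional[Callable[[object], None]]:
    """Check an assignment as early as the available information allows.

    Returns None when nothing remains to be checked at run-time,
    otherwise returns the check to perform on the run-time value.
    """
    if target_type is None:
        # Undeclared target: any value is acceptable, no check needed.
        return None

    if value_type is not None:
        # Both types known statically: report errors early, emit no code.
        if not issubclass(value_type, target_type):
            raise StaticTypeError(
                f"cannot assign {value_type.__name__} "
                f"to {target_type.__name__}"
            )
        return None

    # Value type unknown until execution: defer the same check.
    def runtime_check(value: object) -> None:
        if not isinstance(value, target_type):
            raise TypeError(
                f"expected {target_type.__name__}, "
                f"got {type(value).__name__}"
            )

    return runtime_check
```

Only the assignments whose value type is unknown pay for a run-time check; fully declared code gets early error notification and no run-time overhead.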

1.2. Goals

This thesis addresses the construction of translation systems that extend the concept of elasticity to all aspects of translation. We assert that the most natural approach to achieving elastic control over binding times is to view translation as event-driven. In addition, it is possible to generate event-driven translation systems automatically from language specifications.

The goals of this thesis, therefore, may be enumerated as follows. First, we show how to perform language translation in an event-driven model. Since event-driven translation executes each semantic action when its required data becomes available, we can show that the model naturally determines correct binding times.

Second, we demonstrate how event-driven translation solves problems relating to the binding times of semantic actions. In particular, event-driven translators should order the execution of actions correctly, determine the appropriate binding time class for each action, provide explicit control over binding times, and enable elastic binding times (i.e. the execution of similar translation actions at different binding times).

6 Flexibility may, but need not, require dynamic translation.

Third, we indicate how to build event-driven translation systems automatically. Automatic generation of translators further certifies the uniform applicability of the event-driven model to all aspects of translation. Also, a generator eliminates the need to understand how to implement the model -- an implementor need only understand how to partition a language's semantics into individual translation events. To achieve this goal, we develop a specification language for describing the semantics of programming languages that allows explicit control over binding times. We then construct a generation system that accepts specifications and produces event-driven translation systems.

Finally, we must illustrate the value of our techniques. To this end, we go through a sample language design and implementation. This exercise will demonstrate the validity of the event-driven model for translation and verify its value, especially with respect to the implementation of elastic binding times. As a side-effect, we can show how easy it is to build an event-driven translation system using the translator generator.

We also present the specification and implementation of a standard language (i.e. Modula-2) in the ELI system. In doing so, we hope to substantiate the value of event-driven translation by showing the potential for constructing practical translators that use the model. Specifically, we address the issues of translator speed, code generation, and efficiency of translations.

1.3. Previous Work

Previous work into determining binding times of translation actions falls into two categories: using a limited form of events and performing flow analysis on programs. Partial evaluation, which extends the concept of constant folding, is another research direction related to the desire for elastic binding times. Finally, recent proposals regarding programming language and system design express the need for elastic binding times.

Banatre, Routeau and Trilling describe the use of a limited form of events in an Algol68 compiler [Banatre 79]. They noted that the translation of certain language constructs sometimes requires information supplied later in the program text. (Examples include forward go-tos and the definition of recursive data types.) Often, the processing needed to resolve forward references does not justify a second complete pass over the program. A typical solution is to build temporary chaining structures to hold forward references. Then, as needed information becomes available or upon completing the first pass, these structures are traversed to resolve each reference (sometimes called "back-patching"). Unfortunately, different structures must be built specially for each class of forward references.

Banatre et al. propose that events be employed in those situations that would otherwise require a special chaining structure in a one-pass compiler. Thus, as required information becomes available, the action that resolves the forward reference would be executed. Their method, however, suffers from several shortcomings. First, the technique must be applied by hand for the implementation of each language. Second, although the compiler writer need not determine a correct ordering for certain translation actions, he must still decide which actions to implement as events. A totally event-driven translation system removes this responsibility. Finally, the actual technique for specifying events suffers from non-locality: an event waiting for a value gives no indication where (and therefore when) the required data will become defined. Thus, it is too easy to create an implementation that exhibits starvation (i.e. an event that is never executed) or deadlock (i.e. two or more events having circular dependencies).
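The idea of replacing special-purpose chaining structures with events, and the starvation risk that comes with it, can be sketched as follows. This is a minimal illustration under stated assumptions: the class `EventTranslator` and its methods are hypothetical and are neither the mechanism of [Banatre 79] nor the ELI translation engine.

```python
from typing import Callable, Dict, List, Set


class EventTranslator:
    """Runs each action as soon as all the data it requires is defined."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        # name -> events still waiting for that name to be defined
        self._waiting: Dict[str, List[dict]] = {}

    def add_event(
        self, needs: List[str], action: Callable[["EventTranslator"], None]
    ) -> None:
        missing: Set[str] = {n for n in needs if n not in self.values}
        if not missing:
            action(self)  # already enabled: execute immediately
            return
        event = {"missing": missing, "action": action}
        for name in missing:
            self._waiting.setdefault(name, []).append(event)

    def define(self, name: str, value: object) -> None:
        self.values[name] = value
        for event in self._waiting.pop(name, []):
            event["missing"].discard(name)
            if not event["missing"]:
                event["action"](self)  # last precondition satisfied

    def starved(self) -> List[str]:
        """Names that events still wait on; non-empty at the end of
        translation means some event was never executed."""
        return sorted(self._waiting)


translator = EventTranslator()
emitted: List[tuple] = []
# A forward GOTO: its action needs the (not yet known) address of L1.
translator.add_event(
    ["addr:L1"], lambda t: emitted.append(("JUMP", t.values["addr:L1"]))
)
translator.define("addr:L1", 42)  # the label is reached; the jump is emitted
print(emitted, translator.starved())
```

The same table serves every kind of forward reference, since an event names only the data it needs. The `starved` query illustrates the non-locality concern: nothing in a waiting event says where its data will be defined, so an undefined name is detected only by inspecting what is left over.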

Jones and Muchnick realized the importance of binding times in the trade-off between language flexibility and execution efficiency [Jones 76]:

"The choice of binding time discipline has major consequences for both the run-time efficiency of programs and the convenience of the language in expressing algorithms. Thus, FORTRAN with its extremely early binding times sacrifices user convenience to run-time efficiency, while SNOBOL, at the other end of the binding spectrum, is known for its ease of programming complex tasks, but certainly not for its execution speed."

They propose designing a programming language that requires, in general, late binding times. It then becomes the responsibility of the language processor to determine the proper (i.e. earliest possible) binding times for the translation of each individual program and to generate the most efficient implementation.

Their technique uses flow analysis to determine the correct earliest binding time during translation. Aided by information from the program text, flow analysis can decide what types a variable may assume, the appropriate storage class for a variable (e.g. stack-based, heap, or own), name binding (static vs. dynamic), and parameter access methods (e.g. by-reference or by-value) [Jones 76]. In their attempt to increase flexibility, however, they lose track of some of the benefits (e.g. type-safeness and internal documentation) associated with explicit declarations. Furthermore, flow analysis must be applied individually for each language design, not in an automatic fashion.

Another translation technique, partial evaluation, performs translation and program execution whenever required data becomes available [Haraldsson 78]. 7 A source program is translated directly into an equivalent program in a simple Turing-equivalent language (e.g. the lambda calculus). By finding and evaluating function calls in the translation with known arguments, a "compiler" can reduce the complexity of a translation [Schooler 84]. Unfortunately, the method also permits the specialization of functions having a subset of known arguments (called beta-expansion), possibly leading to code explosion. There is also no interest in computing binding times during the translation into the base language, controlling binding times, or dealing with the problems raised by separate compilation.

Controlling binding times and separate compilation are also failings of research into automatic generation of translation systems. Semantics-directed compiler generators, however, do perform some partial evaluation [Paulson 82, Appel 85]. Translators generated from attribute grammars compute a correct ordering for actions (using data flow analysis [Farrow 83]), but cannot deal with translation during other phases, particularly link-time.

7 Ershov uses the term mixed computation, since each binding time class may contain a "mix" of translation and program execution [Ershov 77].

Attribute grammars have also been used for the automatic construction of programming environments [Reps 84]. Kaiser studies the problems associated with determining the execution order for the run-time semantics of a program [Kaiser 86]. She notes that normal attribute grammars are suitable for characterizing the static properties of a language, since the attribute evaluator handles attribute interdependencies. Dynamic properties, however, depend on execution history. Kaiser therefore concludes that another mechanism, attribute equations, is more suitable for describing these properties. Correct ordering is determined by treating these equations as events. As is the case with Banatre's event mechanism, the language specifier must expressly distinguish between the static and dynamic properties of a language. As a result of this distinction, elastic binding times for translation activities do not seem to be possible. We hope to eliminate all distinctions among semantic actions and furnish elasticity by treating all of them in an event-driven manner.

Finally, several languages, both proposed and implemented, include features that could benefit from elastic control over translation binding times. Most, like Lisp or Smalltalk variants (see the discussion in Section 1.1.1), allow dynamic operations such as type checking or identifier scoping that often could be performed statically. Each language currently in use has its own specific implementation, whereas event-driven translation would handle all languages uniformly.

As mentioned above, the EL1 language was specifically designed with elastic binding times in mind. For instance, EL1 allows data types as first-class values in programs. A specialized compiler has been written for EL1 that performs whatever translation actions it can, leaving the rest for execution. As more information becomes known, the compiler can generate more efficient translations [Wegbreit 74]. Event-driven translation performs similarly; thus it is our intention to support and implement language designs that require both static and dynamic translation of particular language features.

In Smalltalk, dynamic message lookup enables dynamic type checking and polymorphism. Since, in most cases, the programmer knows the expected class of an object value, some research has been invested in binding message lookup statically. Suzuki, for instance, uses flow analysis to determine the set of classes an object value may assume [Suzuki 81]. Borning and Ingalls augment flow analysis with user-provided declarations to check the suitability of message invocations [Borning 82]. Finally, Duff uses declarations to bind the actual message implementation statically [Duff 86]. Since we apply our techniques for elastic binding times to a re-design of Smalltalk in Chapter 5, we defer discussion about these approaches until then.

1.4. Overview

Chapter 2 presents the model of event-driven translation. The model is developed by examining and formalizing what happens during translation. To treat translation as event-driven, we characterize what constitutes an "event". We then discuss the concept of the binding time of an event and define elasticity using the model. We also frame several translation problems from the point of view of binding times and show how event-driven translation can solve those problems. Finally, we examine what is needed to produce a practical tool (a translation system generator) based on event-driven translation.

In Chapter 3, we design two related components of the ELI generation system, the target machine and the specification language. The target machine consists of a simple stack machine augmented by auxiliary models known to be valuable for implementing translation. The specification language is based on attribute grammars, modified to allow explicit control over translation event granularity and binding time.

We continue the design of a generation tool in Chapter 4 by describing the generator itself, the translation engine controlled by generator output, and the intermediate forms on which the translation engine operates. The generator's responsibilities, in particular, include analyzing a language specification for event ordering interdependencies. The translation engine is the part that must perform in an event-driven manner, even after it takes ordering information into account. The intermediate form, of course, must be able to represent the semantic content of client programs.

Chapter 5 demonstrates the value of the thesis by designing a language specifically to take advantage of elasticity. Our language is a variant of Smalltalk-80 that allows the programmer to provide information that may be used to perform most message lookups statically. The specification and implementation of this language in the ELI system demonstrates the ease with which event-driven translation provides elasticity to language designers.

In Chapter 6, we evaluate the usefulness of event-driven translation in solving implementation problems, including elasticity, and as a model for building translation tools. As part of this evaluation, we address whether event-driven translation has potential as a practical method for language implementation. To quantify the model's potential, we describe an implementation of a known language (Modula-2) in the ELI generation system. We compare the performance of the resulting translation system with existing, hand- written translators.

Finally, Chapter 7 presents our conclusions and the possibilities for future research. In particular, we explore what it would take to make event-driven translation practical, as well as other translation areas that might benefit from an event-driven approach, such as syntax-directed editing and determining the need for re-compilation.

Chapter 2 Event-driven Translation

To show how event-driven translators solve translation problems and aid in programming language design, we need a more formal definition of event-driven translation. We start with a definition of translation in general, describing how to separate the translation of a program into individual semantic actions. Then, using this definition as a basis, we develop a model for event-driven translation by showing how to view actions as events.

Once we have a model, we can formally define the concepts of binding time and binding time classes. We also have the terminology to pose and solve language design and implementation problems from the point of view of binding times. Finally, we can characterize the tasks necessary for automatic generation of event-driven translators. A generation system may be used to demonstrate binding time solutions and examine the effects of various language design decisions on the binding times of translation actions.

The next two sections develop the concept of translation as events. The following section then defines the concept of binding time and the influences that affect the timing of event execution. Section 2.4 presents several examples and Section 2.5 discusses the generation of event-driven translation systems.

2.1. Translation as Actions

Language translation systems convert source programs into equivalent programs in some target language. (Presumably, an interpreter actually exists for the target language, allowing the program to be executed on a "machine".) Informally, a translation consists of a series of actions that convert the various parts of a program. That is, each instance of a language construct provokes the execution of some set of actions to accomplish its translation. For example, the declaration of a variable in Pascal (e.g. VAR i : INTEGER; ) may elicit the following semantic actions:

1. checking the legality of declaring the variable's identifier (i) in the current scope,
2. declaring the identifier (i) as a variable in the current scope,
3. associating a type (INTEGER) for later type-checking, and
4. assigning a storage location from the current store environment.

Each action requires some information about the construct being translated (e.g. the string representing the variable's identifier) and its context (e.g. the current identifier scope), and computes new data that represents the translation of the particular instance (e.g. the assigned storage location). Clearly, each action cannot be executed until the data it requires becomes available. Therefore, one responsibility of the translation system, whether constructed by hand or generated automatically, involves determining which actions to perform for each program and in what order.


More formally, language translation in its simplest form is just a function from sentences in the source language to sentences in a target language:

$$\mathit{TargetProgram} = \mathit{Translate}_{S \to T}(\mathit{SourceProgram})$$

To be correct, it is necessary only that the translation produce identical outputs on the target machine as would the SourceProgram (given the same inputs) on a machine that interprets source language programs directly.[8] While generating a correct translation, the $\mathit{Translate}_{S \to T}$ function may compute more data than strictly required for the TargetProgram, such as type information and declaration associations for identifiers.

[Diagram: SourceProgram -> Translate S->T -> TargetProgram]

Figure 2-1: Simple translation

As in Figure 2-1, one might view translation as a "black box". Although the $\mathit{Translate}_{S \to T}$ function may appear monolithic, it normally is composed of several smaller transformations:

$$\mathit{TargetProgram} = f_n(f_{n-1}(\cdots f_1(\mathit{SourceProgram}) \cdots))$$

For example, a translation function for Pascal might consist of one function to translate declarations, one for statements, and another for expressions.

Such a decomposition, however, separates translation into classes of semantic actions, not into the individual actions needed to translate a given source program. One might infer that a language processor should treat all actions within a given class the same, which would inhibit the implementation of elasticity. For our purposes, we must define the $\mathit{Translate}_{S \to T}$ function so that its decomposition is based on the input SourceProgram. That is,

$$\mathit{Translate}_{S \to T}(\mathit{SourceProgram}) = (\mathit{Decompose}(\mathit{Semantics}_{S \to T}, \mathit{SourceProgram}))(\mathit{SourceProgram})$$

Thus, Decompose represents a functor from language semantics expressed for some target language and source programs to a set of translation functions and a partial ordering on that set. The partial ordering constrains the legal execution order of the translation functions.

The Decompose functor may separate a translation function into component functions of arbitrary size. The implementor of a language's $\mathit{Semantics}_{S \to T}$ determines this size. These component functions, then, are considered as "basic". The invocations of these "basic" functions constitute individual translation actions.

[8] Note that this allows the $\mathit{Translate}_{S \to T}$ function to be one-to-many.

By adjusting the level at which functions become basic, a language implementor can regulate the granularity of translation actions. Larger (smaller) functions tend to require and produce more (less) semantic data. For instance, combining actions (2) and (3) in the variable declaration example above results in an action that requires the variable's identifier, the current scope, and the declaration's type, produces a new scope (containing the declaration), and associates the type with the new declaration.

Because of the variation in the level of required and produced data, granularity can affect the ordering of translation action execution. When granularity is large, more data is required for some actions, which may delay the execution of those actions. If actions (2) and (3) were combined into a single action, any action waiting only for the new identifier scope computed in part (2) would have to wait until the declaration's type becomes available.
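The effect of granularity on ordering can be illustrated with a small sketch. It computes, for each action, the earliest time at which all of its required data is known, under the assumption that each datum from the program text carries an arrival time and that an action's results become known as soon as it runs. The function name `binding_times`, the datum names, and the arrival times are hypothetical and chosen only for this illustration.

```python
def binding_times(events, available):
    """Return the earliest time each action can run.

    events:    name -> (required data, produced data)
    available: datum -> time at which it becomes known from the program text
    """
    known = dict(available)
    times = {}
    changed = True
    while changed:
        changed = False
        for name, (requires, produces) in events.items():
            if name not in times and all(r in known for r in requires):
                # An action can run once its last required datum arrives.
                t = max((known[r] for r in requires), default=0)
                times[name] = t
                for p in produces:
                    known[p] = t
                changed = True
    return times

# Assumed arrival times: the declaration's type becomes known late (time 5).
arrival = {"ident": 0, "scope": 0, "type": 5}

# Fine granularity: actions (2) and (3) are separate.
fine = binding_times({
    "declare": (("ident", "scope"), ("new_scope",)),
    "associate_type": (("new_scope", "type"), ()),
    "lookup_in_new_scope": (("new_scope",), ()),
}, arrival)

# Coarse granularity: actions (2) and (3) are combined into one action.
coarse = binding_times({
    "declare_and_associate": (("ident", "scope", "type"), ("new_scope",)),
    "lookup_in_new_scope": (("new_scope",), ()),
}, arrival)
```

With these assumed inputs, the action that only needs the new scope runs at time 0 under the fine decomposition but is held until time 5 under the coarse one.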

2.2. Translation as Events

Event-driven translation treats each action instantiated for the translation of a program as an event. Each event is analyzed to determine the preconditions to its execution. In the simplest case, an event's preconditions consist solely of the data required for its execution. We say that a translation event is enabled when all of its preconditions are satisfied (e.g. when all of its required data becomes available). To continue with the variable declaration example, the event that assigns a storage location (see action (4) above) becomes enabled only when the current store and the size of the associated type are known.

An event-driven translation system translates a program by executing those events, or translation actions, that are initially enabled and then those that become enabled as a result of previously executed events. Events are enabled initially either because they have no preconditions or because the data they require resides in the program text (e.g. identifiers and constants). The execution of an event usually defines data required by other events, which in turn become enabled. A primary influence on the ordering of event execution, therefore, arises implicitly from the data interdependencies between translation actions.
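A minimal sketch of this execution scheme follows, applied to the four actions of the variable declaration example. It is not the translation engine described later in the thesis; the `Event` class, the `run_events` function, the datum names, and the size assumed for INTEGER are all hypothetical. As a simplification, the type association action also supplies the type's size.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical size; no target representation is fixed by the text.
TYPE_SIZES = {"INTEGER": 4}

@dataclass
class Event:
    name: str
    requires: Tuple[str, ...]        # data that must be available first
    action: Callable[[Dict], Dict]   # returns the data the action defines

def run_events(events: List[Event], initial: Dict):
    """Execute events as they become enabled.

    Returns (execution order, unexecuted events, final data).
    """
    data = dict(initial)
    pending = list(events)
    order = []
    progress = True
    while progress:
        progress = False
        for event in list(pending):
            if all(name in data for name in event.requires):
                data.update(event.action(data))
                pending.remove(event)
                order.append(event.name)
                progress = True
    return order, pending, data

# The events are listed in a deliberately unhelpful order.
events = [
    Event("allocate", ("store", "type_size"),
          lambda d: {"offset": d["store"],
                     "store_after": d["store"] + d["type_size"]}),
    Event("associate_type", ("new_scope", "type_name"),
          lambda d: {"type_size": TYPE_SIZES[d["type_name"]]}),
    Event("declare", ("ident", "scope", "legal"),
          lambda d: {"new_scope": d["scope"] | {d["ident"]}}),
    Event("check_legal", ("ident", "scope"),
          lambda d: {"legal": d["ident"] not in d["scope"]}),
]

# Data that resides in the program text or the initial context.
program_text = {"ident": "i", "type_name": "INTEGER",
                "scope": frozenset(), "store": 0}

order, unexecuted, data = run_events(events, program_text)
```

Although the events are supplied in reverse, the data interdependencies alone force the order check, declare, associate, allocate. No ordering was written down anywhere in the event list.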

A general event-driven system would allow more interesting preconditions for events than just data dependencies. Greatest sophistication can be achieved by allowing general boolean expressions as preconditions. A boolean condition expresses constraints that must hold before the associated event becomes enabled. Constraints provide the means to delay the execution of an event explicitly, such as postponing code generation until link-time, or to force the early execution of an event, such as noting an undeclared variable in a Modula-2 program at the end of compilation. The specification language used to generate event-driven translation systems should therefore provide appropriate predicates to articulate such constraints.

2.3. Binding Times

At this point, we have enough machinery to define the concepts associated with binding times of translation actions and elasticity. Viewing translation as event-driven leads to a natural definition for binding time:

Definition: The binding time of a translation event is the earliest point during translation when the associated semantic action can be executed.

The binding time must take into account the partial ordering that delimits the legal execution of actions. In event-driven translation, the binding time corresponds exactly to when the event becomes enabled -- that is, that point in time when the event's constraints (explicit conditions and implicit data dependencies) become satisfied.

Note that this definition avoids any reference to the implementation of translation. Conceptually, all enabled translation events could be executed simultaneously, if a sufficient number of processors were available. (Banatre et al. also observed the potential concurrency that can arise by viewing translation as event-driven [Banatre 79].) The enabling conditions enforce some sequential execution of events.

In general, complete translation of a client program requires that all instantiated translation events be executed. For many programming languages, especially those that allow separate compilation or dynamic typing, complete translation of some programs cannot occur all at once. Given an initial set of events and data derived from program text, an event-driven translator will, in these cases, leave unexecuted events. Each remaining event will not be enabled, either because some required data is not available or because an explicit constraint has not been satisfied. Execution of these events must therefore wait for more data or later binding times.

In a separately compiled language, such as Modula-2, the events left after "compilation" represent standard "linking" information -- index tables for imported modules containing references to imported global variables and routines. Such information becomes available during link-time, enabling completely static translation. Languages providing facilities for dynamic typing, such as EL1 or Lisp, require run-time translation.

By examining when different kinds of information become known, we can classify binding times into phases:

Definition: A binding time phase consists of the execution of as many remaining enabled translation events as possible given an initial amount of semantic data.

For most languages, the phases of interest include: language definition time (e.g. when language primitives are chosen); language implementation time (e.g. when the representations of language primitives are chosen); compile-time, where the initial data for enabling events comes from program text and (perhaps) other interface files; link-time, where the union of the computed data from several compilations constitutes the initial data; and execution (or run-time), where additional data may come from user input. Event-driven translation automatically determines the phase in which to execute a semantic action based on available information and the event's preconditions.
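The notion of a phase can be sketched as repeated runs of the same event loop, each seeded with whatever new data that phase contributes. The sketch below is illustrative only: `run_phase`, the event names, and the datum names (such as `address_of_Lib.Count`) are hypothetical, and data is tracked only by name rather than by value.

```python
def run_phase(pending, known, new_data):
    """Execute as many pending events as the phase's data allows.

    pending:  name -> (required data, produced data); executed events are removed
    known:    set of data names already available; updated in place
    new_data: data names contributed by this phase
    Returns the names of the events executed in this phase, in order.
    """
    known.update(new_data)
    executed = []
    progress = True
    while progress:
        progress = False
        for name in list(pending):
            requires, produces = pending[name]
            if set(requires) <= known:
                known.update(produces)
                executed.append(name)
                del pending[name]
                progress = True
    return executed

# Events instantiated for a module that imports a variable from module Lib.
pending = {
    "check_types": (("source_text",), ("typed_tree",)),
    "generate_code": (("typed_tree",), ("object_code",)),
    "bind_import": (("object_code", "address_of_Lib.Count"), ("linked_code",)),
    "check_dynamic_type": (("linked_code", "user_input"), ()),
}
known = set()

compile_time = run_phase(pending, known, {"source_text"})
left_after_compile = sorted(pending)
link_time = run_phase(pending, known, {"address_of_Lib.Count"})
run_time = run_phase(pending, known, {"user_input"})
```

No event is tagged with a phase. Each one simply runs in the first phase whose data satisfies its requirements, and the events left over after compile-time play the role of linking information.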

Within each phase, several factors constrain the order of event execution. Explicit constraints may dictate the exact binding time for an event. As noted above, however, data interdependencies between actions exert the primary influence. The execution of one event may determine the remaining data required by another event, thereby enabling that event. Program text forms another source of ordering information. For instance, translators normally flag the second occurrence of a multiple declaration for an identifier. Finally, when no other constraints apply, an event-driven translator could use the order in which the language implementor specified the events.

Explicit binding time constraints sometimes may be described more easily by specifying a pass within a phase. For example, the translation of a forward pointer type declaration in Pascal (e.g. TYPE NamePtr = ^Name; ) requires that the identifier lookup of the forward reference (Name) occurs after all other declarations in the same scope. If a language implementor can guarantee that all declarations are translated after some number of "passes" over the translation actions, such identifier lookups can be allowed in the subsequent "pass".

Although a purely event-driven system performs no passes, the concept could be stipulated by restricting the operation of the event handling processor. We can define a pass in event-driven translation in several ways:

Definition 1: A pass consists of the execution of only those translation events enabled simultaneously during translation, that is, events with identical binding times. Given a sufficient number of processors, all events executed within a pass could be executed concurrently.

Using this definition, an event-driven language processor must determine all enabled events before commencing execution; otherwise, it would not be able to distinguish when an event became enabled. Each successive pass within a phase executes those events enabled by semantic data computed in the previous pass.

Definition 2: A pass consists of the execution of all enabled translation events when examined in some well-defined order.

This definition requires that the collection of instantiated events be ordered. For example, one ordering could be based on a top-down, left-to-right analysis of the input program's parse tree. The primary difference between the two definitions is that, in the second, the execution of some events may enable later events in the same pass. With either definition, a specification language for binding time constraints should allow explicit indication of which pass as well as which phase and relative ordering of events (i.e. execute this event after that event).
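The difference between the two definitions can be seen in a short sketch. Both functions below perform one pass over the pending events; the first takes a snapshot of the enabled events before executing anything (Definition 1), while the second examines events in list order and executes each one that is enabled when it is reached (Definition 2). The function names and the three-event chain are hypothetical.

```python
def snapshot_pass(pending, known):
    """Definition 1: run only events already enabled when the pass begins."""
    ready = [e for e in pending if set(e[1]) <= known]
    for _, _, produces in ready:
        known.update(produces)
    return ready

def ordered_pass(pending, known):
    """Definition 2: examine events in order, running each if enabled then."""
    executed = []
    for event in pending:
        _, requires, produces = event
        if set(requires) <= known:
            known.update(produces)
            executed.append(event)
    return executed

def count_passes(events, initial, one_pass):
    """Repeat passes until nothing more can run; return (passes, leftovers)."""
    known = set(initial)
    pending = list(events)
    passes = 0
    while pending:
        executed = one_pass(pending, known)
        if not executed:
            break
        pending = [e for e in pending if e not in executed]
        passes += 1
    return passes, pending

# Each event is (name, required data, produced data); b needs a's result, c needs b's.
chain = [("a", ("x",), ("y",)),
         ("b", ("y",), ("z",)),
         ("c", ("z",), ("w",))]

by_snapshot = count_passes(chain, {"x"}, snapshot_pass)
in_text_order = count_passes(chain, {"x"}, ordered_pass)
in_reverse_order = count_passes(chain[::-1], {"x"}, ordered_pass)
```

Under Definition 1 the chain always takes three passes. Under Definition 2 it takes one pass when the examination order agrees with the dependencies and three when it runs against them, which is why the second definition needs a well-defined ordering of events.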

The last definition we need involves the concept of elasticity. Most language implementations perform all translation actions of the same class at the same binding time (more or less). Thus, all type checking in Modula-2 programs occurs at compile-time. In this case, the language was designed specifically to allow static type checking. For another example, consider the implementation of Smalltalk described by Goldberg and Robson [Goldberg 83]. All message lookups occur at run-time. As Suzuki shows, however, one may determine statically the legality of many message invocations in a Smalltalk program [Suzuki 81].

Thus, we desire implementations that can perform similar translation actions at totally different binding times:

Definition: A language processor provides binding time elasticity if it can execute similar instances of a translation action at different binding times. In particular, the binding times need not all fall in the same phase.

Note that elasticity resembles the concepts of constant folding and partial evaluation (see Section 1.3). (The notions differ, however, in that partial evaluation allows the specialization of routine invocations having a subset of known arguments [Schooler 84].) Because event-driven translation considers each semantic action individually, event-driven processors provide elasticity naturally. In the next section, we use an event-driven translation model to solve various binding time implementation problems, including the need for elasticity.

2.4. Examples

We can now express various language design and implementation problems in terms of the binding time concepts developed above. Some of these problems do not appear at first to relate to translation binding times. Others have been viewed as binding time problems by other researchers, but have not been solved with the same generality. We demonstrate that, as a model for language implementation, event-driven translation solves these problems naturally.

The essential characteristic of event-driven translation is that each translation action is dealt with individually. Two important benefits arise as a result. First, event-driven translation automatically ascertains a correct ordering of translation actions, because no action is performed before its constraints are satisfied. Furthermore, event execution is never delayed beyond its earliest legal binding time (to within processor constraints -- that is, a "binding time" may span an interval of "execution time"). Second, since each event is enabled independently, different instantiations of the same translation action may have different binding times, allowing elasticity.

We partition the problems into three areas:

• determining the relative ordering of event execution within a phase,
• utilizing link-time as a significant binding time phase, and
• achieving elasticity of binding times.

Each set of problems raises questions concerning the design of programming language features. Although event-driven translation eases language implementation issues, design trade-offs between programming flexibility and the efficiency of translations remain. Note, however, that event-driven translation results in "inefficient" translations only for those instances of a feature that actually require dynamic execution. A generator of event-driven translation systems could supply a language designer with valuable information concerning the translation binding times for different instances of proposed features (e.g. some features may require link-time processing, thus delaying code generation, as can happen when an exported type's representation is hidden). After discussing each problem area in the sections below, we investigate how to construct such a generator.

2.4.1. Relative ordering

Most translation of programming languages may occur in one pass through client programs. One reason for this is that programming languages, in addition to acting as a formalism for specifying algorithms, are meant to be written and read by humans. Some translation actions, however, cannot be executed in the "first pass," either because they must wait for a later phase or they depend on information available from later references in the program text. The problem of determining the relative ordering of events within a phase results from the second issue.

As noted by Banatre, a typical solution in language implementations involves building temporary chaining structures to hold all forward references and, at the end of the first translation pass, traversing each structure and resolving the references (i.e. back-patching) [Banatre 79]. Each class of forward references may require its own unique structure. Banatre et al. recognized the usefulness of event-driven execution for implementing the translation of forward references uniformly. In their scheme, however, the language implementor still must decide which semantic actions to implement as events.

Even very innocent-looking syntax can generate difficulties. Consider, for example, a variable declaration clause in Pascal or Modula-2:

    VAR i, j : UserType

All of the type information for the declared variables comes after the specification of their identifiers. Had the type come before, a translator could process each variable's declaration as it encountered each identifier. In this case, the forward reference occurs within a small, known distance, easily handled in a normal implementation. Such implementations have more difficulty handling potentially unlimited forward references.

Two common language features that exhibit non-local forward references include gotos and forward pointer type definitions. Normal language implementations translate forward gotos by using a chaining structure for each destination label, backpatching when the label becomes defined. Similarly, an event-driven translator executes the semantic action for each goto when the target label becomes defined. The difference lies in how "chaining" is accomplished. In event-driven translation, "chaining" occurs implicitly as unexecuted events.
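As a rough illustration, the sketch below treats the translation of each goto as an event that requires the address of its target label. A goto whose label is already defined executes immediately; otherwise the event stays unexecuted until the label is defined, and the collection of such events serves as the "chain". The class `GotoTranslator`, its method names, and the instruction tuples are hypothetical and do not come from the thesis.

```python
class GotoTranslator:
    def __init__(self):
        self.code = []      # emitted instructions
        self.labels = {}    # label -> address, once defined
        self.waiting = {}   # label -> unexecuted goto events

    def emit(self, instruction):
        self.code.append(instruction)

    def emit_goto(self, label):
        slot = len(self.code)
        self.code.append(None)  # reserved; filled in when the event executes

        def event(address):
            self.code[slot] = ("JUMP", address)

        if label in self.labels:
            event(self.labels[label])  # backward reference: already enabled
        else:
            self.waiting.setdefault(label, []).append(event)

    def define_label(self, label):
        address = len(self.code)
        self.labels[label] = address
        # Defining the label supplies the missing datum and enables the events.
        for event in self.waiting.pop(label, []):
            event(address)

translator = GotoTranslator()
translator.emit_goto("done")      # forward reference
translator.emit(("NOP",))
translator.define_label("done")
translator.emit(("HALT",))
translator.emit_goto("done")      # backward reference
```

The forward and backward gotos go through the same `emit_goto` path. The only difference is when the required label address becomes available.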

Normal compilers, on the other hand, implement forward pointer types (such as TYPE qPtr = POINTER TO q; ) using a second pass. Figure 2-2 presents another solution using translation events. By including the requirement of waiting until the second pass (see the Constraint), we always use the definition of the forward reference declared in the same block, if one exists. If we remove that requirement as follows:

    Constraint: - execute (1) when identifier becomes declared in the identifier scope; otherwise, execute (2) at the end of the compilation phase

the forward reference may use a visible definition from an enclosing block, even though another declaration exists within the same block.

Syntax: type-identifier ::= identifier

Requires: - identifier scope
Computes: - associated type definition

Constraint: - execute (1) after the second pass, when identifier becomes declared in the scope; otherwise, execute (2) at the end of the compilation phase

Semantic action: (1) retrieve the type definition from the declaration for identifier in the identifier scope
                 (2) report "undeclared identifier" error at the source position of identifier

Figure 2-2: Forward pointer type reference

Banatre et al. describe a similar problem concerning the implementation of closed clauses in Algol68. A closed clause represents any enclosed construct that contains statements and declarations. In their compiler, a closed clause constitutes a block only if it actually includes declarations. Declared identifiers are disambiguated by associating with them the enclosing block's nesting level. As Figure 2-3 shows, this computation may be complicated since declarations can come after statements in a closed clause. An implementation cannot decide the nesting level for block (B) until it determines whether closed clause (A) is a block, which must wait until either a declaration, such as (C), or the corresponding end (D) is encountered [Banatre 79].

    begin                    (A)
        statement;
        begin                (B)
            statement;
            declaration;
        end
        declaration;         (C)
    end                      (D)

Figure 2-3: Nested Algol68 blocks [Banatre 79]

Figure 2-4 contains a translation event that computes the block nesting level for each closed clause. As it appears, this solution closely resembles that of Banatre et al. [Banatre 79]. Event-driven translation, however, handles all translation uniformly without requiring the language implementor to decide exactly which translation actions should be events.

Both the method used by Banatre and the event examples above suffer from non-locality of reference. That is, where might required information, such as the block nesting level of the enclosing clause, be defined? The specification language we present in Chapter 3 solves the locality of reference problem.

Syntax: closedclause ::= BEGIN body END

Requires: - block nesting level of enclosing clause
          - whether body contains a declaration
Computes: - block nesting level of this clause

Constraint: none

Semantic action: if body contains a declaration, this clause's block nesting level is one more than the nesting level of the enclosing clause, otherwise it is the same.

Figure 2-4: Computing the block number of a closed clause
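The event of Figure 2-4 can be simulated over a token stream to show when each clause's nesting level actually becomes computable. In the sketch below, a clause's event runs once both required data are known: whether its body contains a declaration (known at the first declaration or at the matching end) and the nesting level of the enclosing clause. The function `nesting_levels`, the token names, the numbering of clauses in order of their begin tokens, and the outer level of 0 are assumptions made for this illustration; well-formed input is also assumed.

```python
def nesting_levels(tokens, outer_level=0):
    """Return (levels, resolved_at) for the closed clauses in a token stream.

    tokens: sequence of "begin", "end", "stmt", "decl"
    levels: clause id -> block nesting level
    resolved_at: clause id -> token position at which its event executed
    """
    level = {None: outer_level}  # None stands for the surrounding context
    has_decl = {}                # clause id -> bool, once known
    parent = {}
    children = {}
    resolved_at = {}
    open_clauses = []
    next_id = 0

    def try_run(clause, pos):
        # Enabled only when both required data are available.
        if clause in level or clause not in has_decl or parent[clause] not in level:
            return
        level[clause] = level[parent[clause]] + (1 if has_decl[clause] else 0)
        resolved_at[clause] = pos
        for child in children[clause]:
            try_run(child, pos)  # the new level may enable nested clauses

    for pos, token in enumerate(tokens):
        if token == "begin":
            clause = next_id
            next_id += 1
            parent[clause] = open_clauses[-1] if open_clauses else None
            children[clause] = []
            if parent[clause] is not None:
                children[parent[clause]].append(clause)
            open_clauses.append(clause)
        elif token == "decl":
            clause = open_clauses[-1]
            if clause not in has_decl:
                has_decl[clause] = True
                try_run(clause, pos)
        elif token == "end":
            clause = open_clauses.pop()
            has_decl.setdefault(clause, False)
            try_run(clause, pos)

    del level[None]
    return level, resolved_at

# The clause structure of Figure 2-3: clause 0 is (A), clause 1 is (B).
figure_2_3 = ["begin", "stmt", "begin", "stmt", "decl", "end", "decl", "end"]
levels, resolved_at = nesting_levels(figure_2_3)
```

For the Figure 2-3 input, neither clause can be resolved until the declaration marked (C) at token position 6. At that point (A) becomes a block at level 1, which in turn enables the waiting event for (B) and yields level 2.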

2.4.2. Link-time as a significant binding time

Most translation in "compiled" languages occurs during compile-time, almost by definition. Rarely is link-time considered as a binding time for significant translation activity. Classically, only procedure and global variable linkages are performed during this phase. Several problems in the design and implementation of languages allowing separate compilation could be better handled if link-time were a non-trivial binding time.

Separate compilation enhances the software engineering objectives relating to modularity. Abstraction is aided by the ability to hide information. Another goal involves the potential for reducing the costs associated with program development. When changes are made to a program, only those compilation units that depend on those changes need to be re-compiled. Inter-module dependencies relate directly to the amount of information available in module interfaces. By hiding information, dependencies decrease, resulting in reduced propagation of changes. This potential can be achieved, however, only when the new cost of linking does not overwhelm the gains. Thus, hiding too much information may necessitate prohibitive amounts of post-compilation translation.

Languages also provide modularity to allow the separation of specification from implementation, facilitating system design and construction. In many separately compiled languages, however, only routine definitions actually achieve such separation. Each exported routine's signature appears in the interface while the module's implementation contains its realization. Therefore, changes restricted to a routine's code do not propagate outside of the module.

Other features could also benefit from the separation of specification and implementation. The principles of data abstraction imply that the representation of an abstract data type is irrelevant to the abstraction. Languages often insist that the representations of exported types reside in a module's interface so that the compiler may assign storage to variables of those types in other modules. (Recall that assigning storage to a variable requires a size, which in turn depends on the representation of the variable's type.) Hiding a type's representation delays the accessibility of its size until link-time. Figure 2-5 presents a translation event that allocates a variable's storage whenever its associated type's size becomes available, whether at compile-time, link-time or during program execution.

Syntax: declaration ::= VAR identifier ':' type

Requires: - storage environment
          - type's size
Computes: - offset for variable's storage

Constraint: none

Semantic action: Allocate storage from the provided environment of the required size, receiving the offset of that storage.

Figure 2-5: Consequence of hidden type representation
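The allocation event of Figure 2-5 might be sketched as follows, using a subscription style in which each declaration waits on its type's size. The class `StorageAllocator`, the `phase` field, the type name Stack, and the sizes are hypothetical; in particular, sharing one storage environment across phases is a simplifying assumption, so offsets follow execution order rather than declaration order.

```python
class StorageAllocator:
    def __init__(self):
        self.sizes = {}          # type name -> size, once known
        self.waiting = {}        # type name -> variables awaiting allocation
        self.next_offset = 0     # the storage environment
        self.allocated = {}      # variable -> (offset, phase of allocation)
        self.phase = "compile-time"

    def _allocate(self, variable, size):
        self.allocated[variable] = (self.next_offset, self.phase)
        self.next_offset += size

    def declare(self, variable, type_name):
        if type_name in self.sizes:
            self._allocate(variable, self.sizes[type_name])  # enabled now
        else:
            self.waiting.setdefault(type_name, []).append(variable)

    def define_size(self, type_name, size):
        self.sizes[type_name] = size
        # The size was the missing datum for every waiting allocation event.
        for variable in self.waiting.pop(type_name, []):
            self._allocate(variable, size)

allocator = StorageAllocator()
allocator.define_size("INTEGER", 4)   # assumed size
allocator.declare("i", "INTEGER")
allocator.declare("s", "Stack")       # representation hidden from this module
allocator.declare("j", "INTEGER")

allocator.phase = "link-time"
allocator.define_size("Stack", 16)    # assumed size, revealed when linking
```

The same action serves all three variables, yet `i` and `j` are allocated at compile-time while `s` waits until link-time. Only the declaration that uses the hidden representation pays for the later binding.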

Another example concerns the realization of orthogonality and completeness in language design. In languages that provide mechanisms for defining abstract data types, it should be possible to declare constants of user-defined types and to specify a constant value without knowledge of a type's representation. One method for specifying such constants would allow the use of the type's constructor functions. Since a constant may be declared in a module other than the one that exports its type, the calculation of its value may have to wait until link-time. A translation event for evaluating constructor functions is presented in Figure 2-6. Note that permitting the programmer to specify an exported constant's value in the module's implementation also furthers the principles of abstraction.

Syntax: expression ::= expression '(' actuals ')'

Requires: - continuation of routine expression and each value of actuals
          - all values for free variables in continuation
Computes: - expression value

Constraint: none

Semantic action: Execute continuation on actuals and free values.

Figure 2-6: Evaluating typed constant expressions

Other examples include:

• Determining a correct module initialization order -- In languages that allow initialization code in each module (e.g. Modula-2 [Wirth 82] and Ada [Ada 83]), the ordering for module initialization is usually determined at compile-time using import dependencies. All imported modules should be initialized before the current module's initialization can proceed. Import dependencies may be circular, however; the order actually used may therefore be incorrect. By using flow analysis of initialization code at link-time, a correct ordering can be determined.

• Allowing in-line expansion and instantiation of generic operations -- Most schemes for expanding procedures in-line and specializing generic operations require the presence of the actual code [Rosenberg 83]. In most separately compiled languages, the code resides only in module implementations (not the interfaces) and therefore remains unavailable until link-time. Code generation, however, is still usually desired at compile-time.

• Defining exceptions in separate modules -- The definition of exceptions in separately compiled languages requires a global name space. This arises from the need to specify handlers in modules other than the one that specifies the exception. At compile-time, the only solution involves concatenating the module name to the exception name. Finding the appropriate handler for a raised exception would then entail inefficient string comparisons. Assigning unique integer indices, however, becomes possible at link-time, allowing faster searches during execution.

• Performing code optimization based on interprocedural flow analysis -- Wall describes a technique for performing global register allocation at link-time [Wall 86]. Normally, this analysis is performed at compile-time. When performed at compile-time, however, procedures in different modules must be analyzed separately, thereby requiring pessimistic register saves for inter-module calls. Furthermore, global variables might be assigned to different registers in different modules. As Wall indicates, link-time analysis solves these problems.

All of these problems arise from the hiding of information inherent in the concept of separate compilation. The delaying of translation event execution until link-time results from dependencies on that hidden information. Because it analyzes such data dependencies, event-driven translation automatically determines not only the correct ordering for execution but also the appropriate binding time phase for each semantic action. Thus, link-time could become a significant binding time without any special consideration.
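The way data dependencies select a binding time phase can be illustrated with a small sketch. It is not the ELI implementation: the `Event` record, the `translate` function, and the three phase names are assumptions made for illustration only. Each event fires in the earliest phase in which everything it requires is known, so an event that depends on information hidden in another module is delayed to link-time without being told to wait.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed phase names; the text speaks of compile-time, link-time and run-time.
PHASES = ["compile-time", "link-time", "run-time"]

@dataclass
class Event:
    name: str
    requires: Tuple[str, ...]      # names of the values the action needs
    computes: str                  # name of the value the action defines
    action: Callable[..., object]  # the semantic action itself

def translate(events, inputs_by_phase):
    """Fire each event in the earliest phase in which all its inputs are known."""
    values, binding_times, pending = {}, {}, list(events)
    for phase in PHASES:
        values.update(inputs_by_phase.get(phase, {}))
        progress = True
        while progress:                      # repeat until no event can fire
            progress = False
            for ev in list(pending):
                if all(r in values for r in ev.requires):
                    values[ev.computes] = ev.action(*(values[r] for r in ev.requires))
                    binding_times[ev.name] = phase
                    pending.remove(ev)
                    progress = True
    return values, binding_times

# Hypothetical example: one constant folds at compile-time, while a constant
# built with an imported constructor must wait for link-time.
events = [
    Event("fold-local", ("a", "b"), "sum", lambda a, b: a + b),
    Event("eval-imported-constant", ("ctor", "sum"), "origin",
          lambda ctor, s: ctor(s, s)),
]
values, times = translate(
    events,
    {
        "compile-time": {"a": 1, "b": 2},
        "link-time": {"ctor": lambda x, y: ("Point", x, y)},
    },
)
```

In this sketch neither event names a phase; the recorded binding times fall out of the availability of the required values alone.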

Note, however, that link-time should constitute a relatively short translation phase during software development, for obvious reasons. Determining correct module initialization order and evaluating constants might be acceptable, but in-line expansion and true separation of a type's specification from its representation could delay code generation, which would be objectionable. It becomes the language designer's responsibility to balance the ease of linking against the consistency of the language. In Chapter 5, we present the design of a language for which no such trade-off is necessary.

2.4.3. Achieving elasticity

When formulating most language constructs, language designers often have in mind a desired binding time phase for the translation of all instances of those constructs. For instance, all type checking of Modula-2 programs may occur statically during compilation [Wirth 82]. Similarly, message lookup in Smalltalk-80 can require run-time execution and most implementations do not attempt any static message lookups [Goldberg 83].

As we have seen, however, some claim that static typing is too restrictive [Goodwin 81] (see Section 1.1.4) while others show that some static message lookups are possible [Suzuki 81, Borning 82] (see Section 1.1.1). The solution, as Jones and Muchnick observe, is to design a language "which allows for extremely late binding times" and produce a language processor that implements "each program in the most efficient manner" [Jones 76]. In other words, the processor must provide binding time elasticity.

To oversimplify, late binding times facilitate flexibility while early binding times allow efficient program execution. Elasticity constitutes the means by which a language can provide flexibility without invoking any extra cost until the flexibility is actually used. As noted earlier, elasticity resembles the concepts of constant folding and partial evaluation [Schooler 84]. Elasticity just extends the principle of executing an operation as soon as its arguments become available to the translation process.

By handling each translation action individually, event-driven translators implement elasticity automatically. We now show how to take advantage of elasticity for a few language features. Some examples allow more flexibility than normally provided in most languages, while others facilitate earlier bindings for inherently dynamic operations.

In the introduction, we described a binding time problem associated with the step value in an iterator loop construct (e.g. the FOR statement in Modula-2). Some languages restrict the step value to the value 1 (i.e. one) or to constant values, thus enabling static determination of the loop exit test (either greater than or less than the limit value) and the increment operation. 9

Certainly, a moderately intelligent compiler could detect when an expression represents a constant, and, if so, make use of its value and sign. Only for non-constant step values should the "inefficient" test and increment be generated. Figure 2-7 displays a semantic event that ascertains the appropriate exit test. Event-driven translation executes this action as soon as the step value becomes known.

Syntax: iterator-step-value ::= STEP expression

Requires:
  - value of expression

Computes:
  - iterator loop exit test

Constraint: none

Semantic action: If expression value is zero, generate no test (i.e. loop forever) ; if positive, exit loop when current value is greater than the limit, otherwise exit loop when less than the limit.

Figure 2-7: Generating iterator loop exit test
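The semantic action of Figure 2-7 can be sketched as a function that picks the exit test once the step value is available. The function name `select_exit_test`, the variable names `i`, `limit`, and `step`, and the choice to return the test as source text are all assumptions of this sketch. When the step is not known to the translator, the sketch falls back to the general test that inspects the sign during execution.

```python
def select_exit_test(step=None):
    """Return the exit test to generate for an iterator loop, as source text.

    `step` is the step value if the translator already knows it, else None.
    A result of None means no test is generated (the loop never exits).
    """
    if step is None:
        # Step unknown during translation: emit the general, slower test.
        return "(step > 0 and i > limit) or (step < 0 and i < limit)"
    if step == 0:
        return None
    return "i > limit" if step > 0 else "i < limit"
```

For a constant step the generated test mentions only the loop variable and the limit; the sign comparison remains in the generated code only when the step is a run-time value.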

Many programmers find static type checking too restrictive in some cases. Certain applications, such as programming environments, need to manipulate values whose types may not be known until execution [Goodwin 81]. On the other hand, the benefits of static type checking -- static error detection and less run-time overhead -- should not be ignored.

9 Some computers provide a very fast increment-by-one operation.

Goodwin correctly notes that "dynamic typing is an orthogonal dimension along which strong typing mechanisms may be improved" [Goodwin 81]. One strongly typed language that allows dynamic type checking is EL1 [Wegbreit 74]. Figure 2-8 shows a simple semantic action that type checks an assignment in most conventional strongly typed languages. The same action suffices for EL1, since the action contains no notion of binding time. Dynamic type checks occur only when the declared types of variables are not known statically. Flexibility is provided without sacrificing the benefits of efficient translations when the types are known.

Syntax: assignment ::= l-value := expression

Requires:
  - type of l-value
  - type of expression

Computes:
  - whether a type conflict occurs

Constraint: none

Semantic action: Report an error if the two types do not match.

Figure 2-8: Type checking an assignment statement

Languages that provide a safe array construct verify that each index value is within the bounds of its array. Generally, such checks cannot occur statically unless something is known about the index, such as its actual value. Most programs, of course, use only legal values. Thus, most range checks just add to run-time overhead. Instead of performing these bounds checks statically and leaving any indeterminate ones for run-time, many compilers allow the programmer to disable the checks totally. (Use of this facility presumes that the programmer has enough knowledge and confidence that the checks are not necessary.)

Clearly, the safer solution is the former. Figure 2-9 contains one event that might be used to translate array accesses. The minimum/maximum value information may be difficult to derive statically from the program, but suffices to eliminate unnecessary run-time checks. 10

Syntax: array-access ::= expression '[' expression ']'

Requires:
  - the lower and upper index bounds of the array expression
  - the minimum and maximum values of the index expression

Computes:
  - whether the array access is legal

Constraint: none

Semantic action: If the minimum possible value is greater than or equal to the lower bound and the maximum value is less than or equal to the upper bound, then do nothing; otherwise, if the actual index value is not in bounds, report an out-of-bounds error.

Figure 2-9: Array access bounds check
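The bounds-check event of Figure 2-9 splits naturally into a decision made during translation and a check that survives into execution only when needed. The following is a sketch under that reading; `plan_bounds_check`, `checked_access`, and the two plan strings are hypothetical names, and an unknown index range is represented by `None`.

```python
def plan_bounds_check(lower, upper, idx_min=None, idx_max=None):
    """Decide during translation whether a run-time bounds check is needed.

    idx_min and idx_max are the known extremes of the index expression,
    or None when the translator could not derive them.
    """
    if idx_min is not None and idx_max is not None \
            and lower <= idx_min and idx_max <= upper:
        return "no-check"
    return "run-time-check"

def checked_access(array, lower, index, plan):
    """Access array[index] for an array whose first index is `lower`."""
    upper = lower + len(array) - 1
    if plan == "run-time-check" and not (lower <= index <= upper):
        raise IndexError(f"index {index} outside {lower}..{upper}")
    return array[index - lower]
```

A constant index, or an index of a suitable subrange type, yields the `"no-check"` plan, so the access costs nothing extra during execution.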

10 Constant index values are one easy case. Note that subrange types in Pascal allow the elimination of array bounds checks but force similar bounds checks at assignment statements [Jensen 74].

As a final example, consider the translation of message lookup in Smalltalk. Since each program variable may take on any object value, the current language design does not provide a way to determine statically whether any message invocation is legal (i.e. the message is defined by the receiver's actual class). By changing the language to allow type declarations for variables, some message lookups, if not all, can be performed during compilation. (Note that no flexibility is lost since the programmer could declare any variable to be of class Object.) Figure 2-10 describes a semantic event for message lookup in a language that allows type declarations; we devote much of Chapter 5 to such design modifications to Smalltalk.

Syntax: msg-invocation ::= receiver '.' message '(' actuals ')'

Requires:
  - the computed class of the receiver
  - the message to be invoked

Computes:
  - whether the message invocation is legal

Constraint: none

Semantic action: If the computed class does not define the message, then check whether the actual class of the receiver defines the message (the latter requires run-time evaluation).

Figure 2-10: Message lookup when type declarations are allowed
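The message lookup event of Figure 2-10 follows the same pattern. The sketch below assumes a tiny, hypothetical class table (`CLASSES`) and hypothetical helper names; it is meant only to show that a lookup resolved against the declared class needs no work during execution, while a variable declared as Object falls back to the dynamic check.

```python
# Hypothetical class table: class name -> (superclass, messages defined locally).
CLASSES = {
    "Object":  (None, {"printString"}),
    "Number":  ("Object", {"+", "<"}),
    "Integer": ("Number", {"gcd"}),
}

def defines(cls, message):
    """True if `cls` or one of its superclasses defines `message`."""
    while cls is not None:
        superclass, messages = CLASSES[cls]
        if message in messages:
            return True
        cls = superclass
    return False

def plan_lookup(declared_class, message):
    """Translation-time decision based on the declared (computed) class."""
    return "static" if defines(declared_class, message) else "run-time"

def send(actual_class, message, plan):
    """Execution-time part: only a 'run-time' plan pays for the check."""
    if plan == "run-time" and not defines(actual_class, message):
        raise LookupError(f"{actual_class} does not understand {message}")
    return True
```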

2.5. Generation of Event-driven Translators

The examples above illustrate many different problems in language translation that event-driven translation appears to solve. In particular, event-driven translation presents language designers with the opportunity to allow flexibility without necessarily sacrificing the efficiency of program translations. At this point, a more concrete demonstration should convince the reader of the value of the event-driven translation model. Even better would be to show how to produce event-driven translation systems automatically. Before describing the design and implementation of the ELI generation system, we discuss the merits and responsibilities of translator generators.

The automatic generation of language translators from (hopefully) relatively simple specifications aids in the portability of translators and encourages new research in language design. Much of the responsibility for understanding and creating a language processor is removed from the language specifier. In our case, the need to understand how to build an event-driven translator is eliminated. Furthermore, automatic generation also certifies the uniform applicability of the event-driven model to all aspects of translation.

The construction of any translator generator requires the definition of several components:

1. the target machine, on which translated client programs execute;
2. the specification language, which is used to define programming languages;
3. the generator itself, which is responsible for building translation systems from specifications;
4. the translation engine, which controls the translation of programs; and
5. the intermediate form, which represents the program to be manipulated by the generated translation system.

The generator takes as input a language specification written in the specification language and produces the different stages of a translator: lexor, parser, semantic analyzer, "code" generator, linker, and run-time support. Of these, the least understood, and therefore most researched, is the semantic analysis stage. This stage accepts a client program in a given intermediate form (constructed by the lexor/parser) and performs the actual translation of the program into one executable on the target machine. Figure 2-11 presents the form of a classical translator generator, indicating the five important elements.

[Diagram: Translator; Intermediate Representations]

Figure 2-11: Form of Classical Translator Generator

Before the specification of a programming language and the generation of its translation system can occur, the target machine on which translated programs will execute must be defined. 11 In general, some Turing-equivalent language is chosen, whether applicative (such as the lambda calculus) or operational (such as a stack machine language). The selection of a target machine often affects the ease of expressing a language's semantics.

11 This is true even if an abstract target machine is used, as when portability is an issue.

The specification language provides the framework for expressing language semantics in terms of the target machine. The design of a specification language depends on several criteria. First, constructing a generator requires that the formalism be manipulable by a program. Second, since humans write them, specifications of programming languages should be easy to compose and understand. Finally, a formalism should be just that -- a formalism. The ability to deal with specifications mathematically aids in the study of programming languages and language semantics.

Chapter 3 examines more fully the roles of the target machine and specification language in translator generators in general and in the generation of event-driven translation systems in particular. We then define the target machine and specification language used by the ELI generation system.

The generator's basic responsibility consists of translating the specification of a programming language into the components of a translation system. Normally, a generator produces the program units (or the data structures used to control such units) to perform the tasks of: lexing and parsing program text into an intermediate form; performing semantic analysis on intermediate forms; generating code executable on the target machine; and providing run-time support for translated client programs.

To produce the various components, the generator analyzes specifications in order to bridge the differences between the model represented by the specification language and that implemented by the translation engine. For example, the ELI system, in constructing an event-driven translator, extracts individual events and deduces an ordering for executing semantic actions from information implicit in a specification.

The translation engine performs the various activities needed to translate client programs. The activity of greatest interest consists of the semantic analysis stage. This involves all translation after the parsing of source text and before actual code generation. For most languages, semantic analysis includes some static checking, such as determining type correctness and identifier scoping. Current research focuses on the design and construction of this stage of language translation [Paulson 82, Farrow 84] and the generation of useful object code from translations [Sethi 83, Appel 85].

Logically, new program units are created from each language specification. As is commonly done when generating lexors and parsers, however, a single driver for semantic analysis could be built once and controlled by individualized data structures produced by the generator. Often, this program skeleton is referred to as a universal translator [Paulson 82]. The ELI system uses a universal translator for operating on intermediate forms supplied by a lexor/parser.

Intermediate forms contain the semantic data representing client programs between the various stages and phases of translation. As many different intermediate forms as phases could be used. Each form could be tailored specifically for each translation task. As we see later, however, the complexity of the translation engine could be reduced by using only one form. We make use of a single intermediate form in the ELI generation system.

We discuss more fully the responsibilities of the generator, translation engine, and intermediate form in Chapter 4. In particular, we concentrate on the issues pertinent to generating event-driven translation systems. Then, we describe each component as implemented in the ELI system in more detail.

Chapter 3

Target Machine and Specification Language

Every translation system generator builds translators from language specifications. The result must convert source language programs into target machine programs. The specification language, therefore, must facilitate the expression of source program semantics in terms of target machine program fragments. In this chapter, we present the target machine and the specification language the ELI generation system accepts.

The target machine for a translator generation system defines the set of possible language semantics. The limitations of the target machine form an upper bound on the capabilities of translated programs. A specified language, of course, may restrict these capabilities further. Many semantics-directed compiler systems use the lambda calculus [Paulson 82] or other Turing-equivalent language (e.g. IBL [Schooler 84]) as the target machine. Another approach uses a simple model of a computer that explicitly allows memory state, such as a stack machine.

In both cases, the target machine may be augmented by auxiliary models that help express the semantics of programming languages of interest. That is, the target machine may include objects and operations to facilitate the specification and implementation of common language features (e.g. identifier scoping, storage allocation). The more complex the models, the greater the variety of language features that can be specified easily. Providing such models in the target machine also permits efficient implementations of structures commonly used in translation systems (e.g. symbol tables [Reiss 83]). The first section below describes the target machine used in the ELI translation system generator and the additional models provided.

As noted, the specification language allows one to express the semantics of a programming language in terms of the target machine. A specification must provide the means to relate each syntactic construct of a programming language to its translation. The specification represents the crucial component in a generator of event-driven translation systems. Its form directly affects the construction of the generator and, indirectly, the design of the intermediate representations and the operation of the translation engine. Clearly, the generator must accept specifications in this form. The generator then collects event ordering information from a specification to direct the translation engine. The translation engine receives this information from two sources: semantic data explicitly produced by the generator and ordering contained in intermediate representations. The form of both sources may be shaped by the specification language.


For the generation of event-driven translators, the specification language must satisfy several criteria. The language must allow the discrimination of individual translation events from a specification. In addition, to control the binding times of event execution, it should provide means to specify absolute and relative timings. A language specifier should be able, for instance, to indicate in which phase an event is to be executed (perhaps signifying a delay) or that one event should be attempted only after another has been executed.

Although many specification languages have been designed [Marcotty 76], the two most popular in recent translator generation systems use either denotational semantics [Tennent 76] or attribute grammars [Watt 83]. For our translation system generator, we chose the attribute grammar formalism as the basis for the specification language. Section 3.2 below describes the motivation behind this choice and presents the language defined for our generation system.

3.1. Target Machine

The target machine for translators generated by our system consists of a simple stack machine augmented by models appropriate for language translation. The stack machine model restricts the complexity of language features that may be specified "easily". For example, no explicit support exists for coroutines or inter-process communication. Even so, the choice of a stack machine does not impair the value of event-driven translation. If these (or other) capabilities were added to the target machine model, the translation techniques developed here would still be valid. 12

The first section below presents a formal model of the basic stack machine. In the second section, the subsidiary models we provide to facilitate the specification and implementation of translators are described.

3.1.1. Stack Machine

Our stack machine model consists of four entities: the stack, global memory, program memory, and a sequencer that executes a stack machine program on the stack and global memory. No restriction is placed on the number of locations in either the stack or the global memory. (Thus, the model is Turing-equivalent.) Each location may contain a single datum, which is limited in its information content. This corresponds to the situation in real computers where each memory word typically contains 32 bits of information. The primary model operation is RunProgram:

RunProgram: stack-state x memory x program -> error ∪ stack-state

RunProgram accepts an initial stack state and a program and produces either an error or the stack state resulting from the execution of the program on the initial stack state and uninitialized memory. The resulting stack state contains the program's answers (the memory state is irrelevant).

A stack program consists of a list of labeled operation codes (opcodes) and constant values. A program value is composed of a finite sequence of locations. Figure 3-1 summarizes the operations available for constructing stack programs. The opcodes can be partitioned into three categories: stack operations, global memory operations, and sequencing operations. Stack operations include pushing/popping values on/off the stack and copying the value on the top of the stack. Operations involving global memory include storing and fetching values, allocating and deallocating locations, and determining whether a location is in use. The stack machine executes a program's list of opcodes in sequence unless it encounters explicit branch instructions: go-to, conditional branch, or entry and exit. Miscellaneous opcodes include a no-op and fetching the address of a local (i.e. variable or parameter) using either static or dynamic subroutine nesting protocols.

12 Non-nested control activations, such as coroutines, could be handled if the stack machine were augmented by a so-called "spaghetti-stack" discipline [Bobrow 73].

Stack operations:

PushValue size value   -- Push value of given size onto top of stack.
                       -- The value may be the label of a stack machine opcode.
PopStack size          -- Pop size locations off the stack.
CopyValue size         -- Copy value of given size on stack onto the stack.

Global memory operations:

FetchValue size        -- Pop memory address off stack, then
                       -- push value of given size from that address.
StoreValue size        -- Pop memory address off stack, then
                       -- pop value of given size, storing at that address.
Allocate               -- Pop size off stack,
                       -- allocate that number of locations in global memory, and
                       -- push the first location's address onto the stack.
Deallocate             -- Pop address and size off stack, and
                       -- deallocate that many locations starting at that address.
InUse                  -- Pop address off stack; push 1 as top location on the stack
                       -- if that address is in use, 0 otherwise.

Figure 3-1: Stack Machine Operations Codes, Part 1

The last opcode allows the calling of auxiliary model operations. As mentioned above, the auxiliary models exist to expedite the tasks associated with language translation. The reader may have noticed that there are no arithmetic or logical operations in the stack machine opcodes. For consistency, these operations are also implemented using auxiliary models. We enumerate these models and their operations next.

Sequencing operations:

Goto label             -- Execute the opcode at given label next.
IfZero label           -- Pop top location off stack; if zero, execute the opcode
                       -- at the given label next, else continue.
Call                   -- Pop label off stack, transfer to the opcode at that label,
                       -- and save the next label for subroutine return.
Enter parameter-size local-size static-nesting
                       -- Push activation record on stack reserving locations of given
                       -- size for parameters (already on stack) and locals, and
                       -- storing nesting level and saved return label in the record.
Return size            -- Pop activation record and parameter/local locations
                       -- off stack, pushing value of given size from top of stack
                       -- to new top of stack. Continue at label from the record.

Miscellaneous operations:

Noop                   -- Do nothing.
GetLocal               -- Pop top location as a flag, then the nesting level and the
                       -- relative offset. Push address of specified local offset at
                       -- specified nesting level, using static or dynamic nesting
                       -- as indicated by the flag.
CallModel operation-index
                       -- Call indicated auxiliary model operation.

Figure 3-1: Stack Machine Operations Codes, Part 2
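To make the opcode descriptions concrete, the following sketch interprets a subset of them. It is an illustration rather than the ELI stack machine: the program encoding as (label, opcode, operands) triples, the `models` table of (arity, function) pairs, and the trivial allocator are assumptions of the sketch. Call, Enter, Return, GetLocal, Deallocate, and InUse are omitted, and machine errors surface as Python exceptions instead of an error result.

```python
def run_program(stack, program, models):
    """Run `program` on a copy of `stack` with empty global memory.

    `program` is a list of (label, opcode, operands) triples; label may be
    None. `models` maps an operation index to (arity, function), where the
    function returns a tuple of results. Returns the final stack.
    """
    stack = list(stack)
    memory = {}        # address -> datum; unbounded, initially uninitialized
    next_free = 0      # simplistic allocator: addresses are never reused
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab is not None}

    def pop(n):
        if n > len(stack):
            raise RuntimeError("stack underflow")
        cut = len(stack) - n
        values = stack[cut:]
        del stack[cut:]
        return values

    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        pc += 1
        if op == "PushValue":
            size, value = args
            if len(value) != size:
                raise RuntimeError("value does not match declared size")
            stack.extend(value)
        elif op == "PopStack":
            pop(args[0])
        elif op == "CopyValue":
            top = pop(args[0])
            stack.extend(top + top)
        elif op == "FetchValue":
            (addr,) = pop(1)
            stack.extend(memory[addr + i] for i in range(args[0]))
        elif op == "StoreValue":
            (addr,) = pop(1)
            for i, datum in enumerate(pop(args[0])):
                memory[addr + i] = datum
        elif op == "Allocate":
            (count,) = pop(1)
            stack.append(next_free)
            next_free += count
        elif op == "Goto":
            pc = labels[args[0]]
        elif op == "IfZero":
            if pop(1)[0] == 0:
                pc = labels[args[0]]
        elif op == "CallModel":
            # Pop the parameters, execute the operation, push the results.
            arity, fn = models[args[0]]
            stack.extend(fn(*pop(arity)))
        elif op != "Noop":
            raise RuntimeError(f"unsupported opcode {op}")
    return stack

# Hypothetical model operation: replaces (acc, n) by (acc + n, n - 1).
MODELS = {"accumulate": (2, lambda acc, n: (acc + n, n - 1))}
SUM_TO_3 = [
    (None,   "PushValue", (2, (0, 3))),      # acc = 0, counter = 3
    ("loop", "CopyValue", (1,)),             # duplicate the counter
    (None,   "IfZero",    ("done",)),        # leave the loop when it is zero
    (None,   "CallModel", ("accumulate",)),
    (None,   "Goto",      ("loop",)),
    ("done", "PopStack",  (1,)),             # drop the counter, keep acc
]
result = run_program([], SUM_TO_3, MODELS)
```

Because the machine has no arithmetic opcodes, the addition and decrement in this example go through `CallModel`, as the text describes.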

3.1.2. Auxiliary Models

Auxiliary models extend the target machine so that the specification of common language constructs (e.g. identifier scoping, storage allocation) becomes easier and more natural. The values comprising these models fall into two categories: atomic values and aggregate values.

Atomic values are simple values -- the value is either totally defined or not defined. Models consisting of atomic values include Boolean, Integer, String, Name, Position, List, Stack, and ProgramLabel. Most target machines include the first three as primitive types. The Name model provides a unique mapping of program identifiers, allowing efficient comparisons. The Position model enables one to relate instances of language constructs to positions in source files, which is useful for noting syntactic and semantic errors and warnings. The List and Stack models implement two specific, often-used queuing strategies. Finally, ProgramLabel represents the set of target machine program fragments.

Model Domain   Operations

Boolean        inversion, conditional-and, conditional-or, etc.

Integer        addition, subtraction, comparison, bit-wise AND, etc.

String         length, concatenation, truncation, comparison, etc.

Name           FindNameIndex: String -> Name

Position       MakePosition: file-String x line-Integer x column-Integer -> Position
               NoteError: error-String x SourcePosition x severe-Boolean ->

List           NewList: -> List
               ListLength: List -> Integer
               AppendElt: List x element-type -> List
               AppendLists: List x List -> List
               ListHead: List -> element-type
               ListTail: List -> List

Stack          NewStack: -> Stack
               StackDepth: Stack -> Integer
               PushElt: Stack x element-type -> Stack
               TopElt: Stack -> element-type
               PopElt: Stack -> Stack

For separate compilation/linking

               Compile: module-String x source-String x output-String x list-String ->
               Import: module-String x extension-String x global-offset-Integer -> global-type
               Link: module-String x extension-String ->

Figure 3-2: Model Operations, Part 1: Atomic models

On the other hand, aggregate values have disjoint parts that may be defined independently. We use the term "aggregate" because these models provide operations that define an individual part of a value without producing a new, different value. Events may depend on a strict subset of an aggregate value's components. In this way, aggregate values help localize event dependencies in generated, event-driven translation systems. In a language like Modula-2, for instance, all of the information about a type might be placed in an aggregate value. Then, without knowing the type's size, type checking of a variable reference could proceed, but storage allocation for the variable could not. Hoover and Teitelbaum discuss other uses of aggregate values in the context of syntax directed editors [Hoover 86].

The models consisting of aggregate values include Store, Mapping, Scope, ScopeStack, and Aggregation. A language specifier may use Store values to keep track of storage allocation. The operation to allocate a storage location modifies the Store value without producing a new value. Mappings present the abstraction of associative memory, sometimes viewed as finite partial functions (i.e. functions defined on a finite subset of domain values). Scope and ScopeStack values help define the scope and extent of named entities (e.g. variables) and construct symbol tables. The modification operations for these models declare identifiers in Scope values, associating an entity with each declaration.

Note that, because of the aggregate nature of scopes, one cannot determine that an identifier lookup has failed unless a latest binding time for its declaration can be computed. For example, in Modula-2, variables need not be declared textually before their use. Thus, the event that performs the lookup of a variable reference must wait to report an error until after all declarations have been performed. The end of compilation, when all variables must be declared, comprises an excellent binding time for error reporting.

The last aggregate model is Aggregation. In specification terms, Aggregations implement Cartesian products and discriminated unions of other domains [Watt 79]. Aggregation values resemble record structures in a language like Pascal or Modula-2. Individual fields may be defined and queried independently. In the Modula-2 example above, the semantic description of a type may contain its description (e.g. its base type if a pointer type, its index bounds if an array type) and its size. A particular type can be created and used without assigning to the field representing the type's size immediately.

Figure 3-2 defines the signatures for the operations for the various models. A few operations (e.g. Compile, Import, Link) are provided to enable the translation of languages allowing separate compilation. The stack machine executes model operations in three steps: it pops the parameters off of the stack, executes the operation, then pushes the results (if any) back onto the stack.

Model Domain   Operations

Store          NewStore: -> Store
               FindOffset: Store x size-Integer -> Integer
               NextOffset: Store -> Integer
               FreeOffset: Store x offset-Integer x size-Integer ->

Scope          NewScope: -> Scope
               AddName: Scope x Name x element-type ->
               FindName: Scope x Name -> found-Boolean x element-type

ScopeStack     NewScopeStack: -> ScopeStack
               EnterScope: ScopeStack x is-open-Boolean x initial-Scope -> ScopeStack
               LeaveScope: ScopeStack -> ScopeStack x popped-Scope
               DeclareName: ScopeStack x Name x element-type -> unique-Boolean
               LookupName: ScopeStack x Name -> found-Boolean x element-type

Aggregate      NewAggregate: aggregate-kind -> Aggregation
               IsKind: Aggregation x aggregate-kind -> Boolean
               HasProperty: Aggregation x aggregate-property -> Boolean
               GetProperty: Aggregation x aggregate-property -> property-type
               SetProperty: Aggregation x aggregate-property x property-type ->

Mapping        NewMap: -> Mapping
               IsMapEmpty: Mapping -> Boolean
               IsKeyPresent: Mapping x key-type -> Boolean
               AssociateData: Mapping x key-type x data-type -> Mapping
               GetMappedData: Mapping x key-type -> data-type

Figure 3-2: Model Operations, Part 2: Aggregate models
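As an illustration of how an aggregate model is modified in place, the following sketch gives one possible reading of the ScopeStack signatures. The class and method names are assumptions, as is the interpretation of the is-open flag: here a lookup continues into enclosing scopes only through open scopes, which the text does not state.

```python
class ScopeStack:
    """Sketch of the ScopeStack model; scopes are plain dictionaries."""

    def __init__(self):
        self._scopes = []                      # list of (is_open, scope)

    def enter_scope(self, is_open, initial=None):
        self._scopes.append((is_open, dict(initial or {})))

    def leave_scope(self):
        """Pop the innermost scope and return it."""
        return self._scopes.pop()[1]

    def declare_name(self, name, entity):
        """Declare `name` in the innermost scope; False if already there."""
        scope = self._scopes[-1][1]
        if name in scope:
            return False
        scope[name] = entity
        return True

    def lookup_name(self, name):
        """Return (found, entity), searching outward through open scopes."""
        for is_open, scope in reversed(self._scopes):
            if name in scope:
                return True, scope[name]
            if not is_open:                    # assumed meaning of is-open
                break
        return False, None
```

Note that `declare_name` changes the existing value rather than producing a new one, which is what allows an event depending on one declaration to proceed before the rest of the scope is complete.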

The models described above form the target machine for generated translators. The construction of stack machine programs is embedded in language specifications. In the next section, we define the specification language and illustrate how the models are incorporated.

3.2. Specification Language

The specification language for this generator is based on attribute grammars. We chose the attribute grammar formalism for several reasons. Attribute grammars exhibit locality of reference. By "locality of reference" we mean that referenced values are defined within a limited scope in a specification (in this case, an attribute grammar production). Attribute grammars are also non-procedural in nature; that is, they satisfy the single-assignment property of data flow languages. Therefore, the evaluation order of expressions is not expressed explicitly. Furthermore, the determination of data flow interdependencies between attribute computations has known solutions [Farrow 83]. These characteristics, as well as the familiarity of the formalism, aid language designers in constructing and understanding specifications.

As we shall see, attribute grammars also provide a natural means for delineating individual translation events. Each semantic function that defines the value of a required attribute in a production represents a single translation action. Before continuing with a description of our specification language, we present a definition of attribute grammars and a short history of their use in translation generation systems. The following section then discusses the needs of event-driven translation systems and defines the specification language used by the ELI generator.

3.2.1. Attribute Grammars

Attribute grammars were introduced by Knuth as an extension of context-free grammars for describing the context-sensitive properties of programming languages [Knuth 68, Knuth 71]. In an attribute grammar, context-free grammar productions are augmented by semantic rules. The semantic rules relate and check information from the context of a production with the information represented by the terminals and nonterminals comprising its definition.

More formally, two disjoint sets of attributes (inherited and synthesized) are associated with each terminal and nonterminal symbol in the grammar. Attributes are values representing semantic information about a program written in a specified language. The inherited attributes contain information from the context of the symbol, while the synthesized attributes supply the information characterized by the symbol's instantiation.

The semantic rules for each production define the values of required attributes in terms of the values of given attributes. In the following production,

lhs ::= rhs_1 ... rhs_n

the inherited attributes of the defined symbol (lhs) and the synthesized attributes of the defining symbols (rhs_i) are the given attributes. Similarly, the synthesized attributes of lhs and the inherited attributes of the rhs_i form the required attributes. For each required attribute, the semantic rules define a function that computes its value using some subset of the given attribute values. This subset represents the dependency set for each semantic function. Using these sets, a local dependency graph expressing the dependencies between the attributes of a production can be derived.

Figure 3-3 exhibits one possible attribute grammar production for a Pascal-like variable declaration clause. We use the following syntax for each grammar symbol:

symbol<<i_1, ..., i_m ^ s_1, ..., s_n>>

where the i_k are inherited attributes and the s_k are synthesized attributes. If an attribute is given, an identifier must represent the attribute. When required, a semantic function denotes the attribute's value.

var-decl-clause<> ::= var-id-list<> ":" type-specification<>

Attribute properties:
  inherited, given:      in-scope
  synthesized, required: out-scope
  inherited, required:   id-in-scope, var-type, type-in-scope
  synthesized, given:    id-out-scope, decl-type

Semantic functions:
  out-scope = id-out-scope
  id-in-scope = in-scope
  var-type = decl-type
  type-in-scope = in-scope

Figure 3-3: Example Attribute Grammar Production

Note that the local dependency graph for a production may not express all attribute dependencies. A given attribute associated with a non-terminal may depend on the value of a required attribute associated with the same non-terminal. For example, in Figure 3-3, the attribute id-out-scope (representing the scope after all new declarations have been added) probably requires that var-type be defined. These dependencies arise from other productions that either use or define this one. A global dependency graph can be built for a derivation tree by joining the local dependency graphs of each production used in the tree. Clearly, the sequence for evaluating required attributes must follow the partial ordering information expressed in the global dependency graph.
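To make the joining of dependency graphs concrete, the following Python sketch (an illustration only, not part of the ELI system) encodes the semantic functions of Figure 3-3 as a local dependency graph, joins it with dependencies contributed by other productions, and derives one admissible evaluation order. The entries in `external_deps` are assumptions made for the example; the text only says that id-out-scope "probably" requires var-type.

```python
from graphlib import TopologicalSorter

# Local dependency graph of the Figure 3-3 production: each required
# attribute maps to the given attributes read by its semantic function.
local_deps = {
    "out-scope": {"id-out-scope"},
    "id-in-scope": {"in-scope"},
    "var-type": {"decl-type"},
    "type-in-scope": {"in-scope"},
}

# Dependencies arising from other productions (assumed for illustration):
# the type subtree computes decl-type from type-in-scope, and the
# identifier list computes id-out-scope from id-in-scope and var-type.
external_deps = {
    "decl-type": {"type-in-scope"},
    "id-out-scope": {"id-in-scope", "var-type"},
}


def evaluation_order(*graphs):
    """Join dependency graphs and return one order consistent with them."""
    joined = {}
    for graph in graphs:
        for attr, deps in graph.items():
            joined.setdefault(attr, set()).update(deps)
    # static_order() raises graphlib.CycleError if the joined graph is circular.
    return list(TopologicalSorter(joined).static_order())


order = evaluation_order(local_deps, external_deps)
print(order)
```

Any order produced this way places in-scope first and out-scope last; the relative position of independent attributes (for instance id-in-scope and type-in-scope) is not determined by the graph.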

If a global dependency graph contains a cycle, however, it becomes impossible to define an evaluation order. 13 An attribute grammar is called noncircular if no derivation tree has a global dependency graph containing a cycle. Determining whether an attribute grammar is noncircular is decidable but expensive [Jazayeri 75a]. A slightly weaker property, absolute noncircularity, can be determined much more easily [Kennedy 76]. Since these algorithms are beyond the scope of this thesis, we henceforth assume the attribute grammar portion of language specifications to be noncircular.

Although many orderings may satisfy the constraints expressed in a global dependency graph, determining one for a given derivation tree may be expensive. Early research centered on describing the conditions under which attribute evaluation can occur within a fixed number of passes (pre-order, post-order, or alternating) over an attribute grammar's derivation trees [Bochmann 76, Jazayeri 75b]. Each pass searches for and executes semantic functions whose dependencies have been satisfied. Kennedy and Warren demonstrated how to construct more efficient search patterns by statically analyzing the local dependency graphs of attribute grammars [Kennedy 76]. Katayama eliminated searching altogether for absolutely noncircular attribute grammars by considering the transitive closure of all global dependency graphs [Katayama 84]. In addition, methods that search derivation trees may require prohibitive amounts of storage to store attribute values. Several researchers have therefore devised techniques to reduce the memory in use at one time [Jazayeri 81].
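The pass-oriented style of evaluation can be sketched in a few lines of Python. This is a deliberately naive illustration and not any of the cited algorithms: the rules are kept in a flat list rather than a derivation tree, and the function and variable names are hypothetical.

```python
def evaluate_in_passes(rules, given):
    """Evaluate attributes by repeated passes over a list of semantic rules.

    rules: list of (target, dependency_names, function) in visiting order.
    given: dict of attribute values known before evaluation starts.
    Returns (values, passes_used).
    """
    values = dict(given)
    pending = list(rules)
    passes = 0
    while pending:
        passes += 1
        still_pending = []
        for target, deps, fn in pending:
            if all(d in values for d in deps):
                # All dependencies satisfied: execute the semantic function.
                values[target] = fn(*(values[d] for d in deps))
            else:
                still_pending.append((target, deps, fn))
        if len(still_pending) == len(pending):
            raise ValueError("no rule became enabled: circular or missing input")
        pending = still_pending
    return values, passes


# "c" is visited before "b", so it has to wait for a second pass.
rules = [
    ("c", ("b",), lambda b: b + 1),
    ("b", ("a",), lambda a: a * 2),
]
values, passes = evaluate_in_passes(rules, {"a": 3})
print(values, passes)
```

The example needs two passes only because of the visiting order; the static analyses cited above aim to choose orders that avoid such wasted searching.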

13 Well, not quite impossible, as it turns out. Farrow recently formulated a set of constraints on attribute grammars with circularities so that computable evaluations exist (using least fixed points) [Farrow 86].

The capability of generating efficient evaluators has increased the popularity and practicality of using attribute grammars in compiler-generation systems [Farrow 84, Watt 83]. By building on this experience, we demonstrate not only the value but also the practicality of event-driven translation.

3.2.2. The Specification of Events

Having chosen attribute grammars as the foundation for the specification language, two significant design problems remain. First, the attribute grammar formalism does not dictate the language for constructing the semantic function expressions; we must therefore define one [Marcotty 76]. Second, it must be possible to delineate individual translation events from specifications. In several ways, these two tasks are interrelated. The semantic function language must allow the specification of individual events and the stipulation of absolute and relative timings for event execution.

Attribute grammars provide a natural framework for defining translation events. Each semantic function that defines the value of a required attribute in a production represents a single translation action. Its dependency set constitutes the preconditions for the execution of the action as an event. All of the semantic functions instantiated in a derivation tree compose the set of events for the translation of the corresponding program.

Without additional features, this form of the specification language does not provide much control over the granularity or timing of translation events. We therefore introduce the concept of computational events. Each computational event may compute some number of intermediate results. These results are stored in temporary attribute "locations" declared local to a production. 14 Thus, a language specifier may divide a semantic function into several actions. Events that define required attributes (called defining events) may then depend on both given and local attributes.

Computational events also present the opportunity to attach explicit binding time constraints. Constraints are just pre-conditions that must be satisfied before the associated event becomes enabled. A language specifier may indicate the pass or phase in which an event should be executed (perhaps forcing a delay) or that one event must follow the execution of another. We now crystallize these features by describing the form of a legal specification. (The reader is advised to refer to Appendix A.)

The language specifier first defines the attribute domains. These domains correspond to the models understood by the target machine. A domain is either one of the base domains (Boolean, Integer, String, Name, Position, Store, ProgramLabel) or built from domain constructors (List, Stack, Scope, Lambda expression, Map, Aggregation). These latter are domain constructors and not just domains because we wish to type-check specifications. Thus, for instance, a List value must be homogeneous (i.e. contain elements of only one domain).

The language designer must then declare the terminals and non-terminals of the grammar, indicating which of the latter is the start symbol. Each declaration specifies the name of the symbol and the domains of its inherited and synthesized attributes. Terminals should have only synthesized attributes, and only if they represent program data (e.g. identifier names and constant values). Together with the productions, these declarations define the context-free syntax of the specified programming language.

14 These locations are not variables since we wish to maintain the single-assignment property of the specification language.

The specification of the actual semantics of a language begins with the initialization of attribute values and semantic functions. The constants, types, variables, and routines a client programmer can assume, often called the standard prelude of a language, constitute a significant fraction of this data. The term "run-time library" could be applied to the part of the standard prelude that must be available during program execution. The remainder of the semantic data involves abstractions that aid in constructing translation actions, such as a lambda expression that checks if two types match.

-- For both the left-hand-side and right-hand-side symbols, the identifier-list
-- are the given attributes, and the expression-list are the defining events
-- (i.e. semantic functions) for the required attributes. Therefore, inherited
-- attributes come before the up-arrow, synthesized attributes come after.

production ::= lhs-symbol "::=" [ rhs-symbol ]*
               [ LOCALS [ attribute-declaration ]* ]
               "{" [ computational-event ]* "}" ";"

lhs-symbol ::= symbol-identifier "<<" [ identifier-list ] "^" [ expression-list ] ">>"

rhs-symbol ::= symbol-identifier [ "<<" [ expression-list ] "^" [ identifier-list ] ">>" ]

Figure 3-4: Syntax of a Production

Productions, in addition to expressing the language's syntax, contain the translation events that define its semantics. As noted above, these events may be either defining required attributes or computing intermediate results. Figure 3-4 shows the syntax of a production and Figure 3-6 displays the production for a designator expression from a specification for Modula-2.

computational-event ::= label-identifier ":" ONEOF [ action ]* ENDONEOF
                      | label-identifier ":" action

action ::= [ PRE expression [ WHERE [ statement ]* ] ]
           BEGIN [ statement ]* END

statement ::= identifier-list ":=" expression ";"
            | IF expression THEN [ statement ]* elseif-clause ENDIF ";"
            | expression "(" [ expression-list ] ")" ";"

elseif-clause ::= [ ELSEIF expression THEN [ statement ]* ]*
                  [ ELSE [ statement ]* ]

Figure 3-5: Computational Event Syntax

The language for writing semantic functions consists solely of lambda expressions and invocations, as in normal attribute grammars. For computational events, however, it must be possible to assign values to intermediate attributes and associate binding time constraints. The syntax of computational events, therefore, appears as in Figure 3-5.

Label identifiers may be used to order computational events explicitly. The specification language defines the predicate AFTER, which succeeds only if the labeled computational event has been executed.

Each computational event may consist of a set of possible actions, only one of which will be executed. This allows the specifier to create events that depend on different subsets of attributes. Reporting errors when required information is unavailable constitutes the most important use for multiple action events (see the event labeled Lookup in Figure 3-6). Each action's binding time constraint is expressed in an optional PRE-WHERE clause. If this clause is absent, the action may be executed as soon as the attributes it requires become defined, subject to ordering implicit in the derivation tree. Finally, since the language cannot guarantee the single assignment property, the language processor will have to check the execution of each assignment statement.
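The behaviour of a multiple-action event can be modelled roughly as follows. This Python sketch is an assumption-laden illustration and not the ELI language processor: the class names are hypothetical, the choice of running the first enabled action is ours (the text only requires that exactly one action be executed), and the WHERE clause is folded into the precondition.

```python
class SingleAssignmentError(Exception):
    """Raised when an attribute location is assigned a second time."""


class Attributes:
    """Attribute store that checks the single-assignment property at run time."""

    def __init__(self, **initial):
        self._values = dict(initial)

    def defined(self, *names):
        return all(n in self._values for n in names)

    def get(self, name):
        return self._values[name]

    def assign(self, name, value):
        if name in self._values:
            raise SingleAssignmentError(name)
        self._values[name] = value


class Action:
    """One alternative of an event: required attributes, PRE predicate, body."""

    def __init__(self, needs, pre, body):
        self.needs, self.pre, self.body = needs, pre, body

    def enabled(self, attrs, ctx):
        return attrs.defined(*self.needs) and self.pre(attrs, ctx)


class Event:
    """A labeled ONEOF: at most one action is ever executed."""

    def __init__(self, label, actions):
        self.label, self.actions, self.executed = label, actions, False

    def attempt(self, attrs, ctx):
        if self.executed:
            return False
        for action in self.actions:
            if action.enabled(attrs, ctx):
                action.body(attrs, ctx)
                self.executed = True
                return True
        return False


# Modelled loosely on the Lookup event of Figure 3-6: succeed quietly once
# the symbol is known, otherwise report an error on the last pass.
errors = []
lookup = Event("Lookup", [
    Action(needs=("sym",), pre=lambda a, c: True, body=lambda a, c: None),
    Action(needs=(), pre=lambda a, c: c["last_pass"],
           body=lambda a, c: errors.append("Undeclared identifier")),
])
attrs = Attributes()
early = lookup.attempt(attrs, {"last_pass": False})  # nothing enabled yet
late = lookup.attempt(attrs, {"last_pass": True})    # error action fires
```

Keeping the precondition separate from the list of required attributes mirrors the distinction drawn above between data dependencies and explicit binding time constraints.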

Appendix B contains the signatures of all non-primitive operations allowed in specification expressions. Other than model operations (described earlier with the target machine), they include binding time inquiries (PASS, PHASE, AFTER) and target machine program construction operations.

The target machine operations do not involve a program object directly. The target machine program fragment contained by the current node of the derivation tree acts as an implicit parameter of each operation. The language processor appends the appropriate sequence of opcodes to the current program fragment. Most operations append corresponding stack machine instructions to the node's program fragment. For example, the PopValue operation concatenates the appropriate PopStack opcode.

Other operations (e.g. Execute and GetLabel) provide access to the program fragments of child nodes. The Execute operation appends child fragments to the current program fragment. It may also be used to construct loops by appending the current fragment to itself (see the statement Execute (0) ; in Figure 4-8). Similar operations allow the computations of program fragments to be used as attribute values in a specification. Finally, memory operations allow the specifier to control the use of the target machine's global memory (both static and heap) and stack (fetching and storing routine locals as well as global memory variables).
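The idea of an implicit current program fragment can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class and method names are hypothetical, the opcode name "PushStack" is invented for the example (only PopStack is named in the text), and Execute(0) is modelled literally as appending a copy of the current fragment rather than as a real loop construct.

```python
class Node:
    """Derivation-tree node holding a stack machine program fragment."""

    def __init__(self, children=()):
        self.children = list(children)
        self.fragment = []  # list of (opcode, operand) pairs


class Builder:
    """Construction operations; the current node is an implicit parameter."""

    def __init__(self, node):
        self.node = node

    def push_value(self, value):
        self.node.fragment.append(("PushStack", value))

    def pop_value(self):
        self.node.fragment.append(("PopStack", None))

    def execute(self, child_index):
        """Append a child's fragment; index 0 denotes the current fragment."""
        if child_index == 0:
            source = list(self.node.fragment)  # copy before extending
        else:
            source = self.node.children[child_index - 1].fragment
        self.node.fragment.extend(source)


child = Node()
Builder(child).push_value(1)

parent = Node([child])
builder = Builder(parent)
builder.execute(1)   # splice in the child's code
builder.pop_value()  # then discard the value it pushed
print(parent.fragment)
```

Because every operation appends to the fragment of the node being processed, a specification never has to name a program object explicitly.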

The language processor maintains the target machine program fragments at each node. In addition, the processor is responsible for generating program fragments representing translation semantics that depend on data computed by a program's execution.

Before we describe the language processor more fully and what constitutes the current target machine program fragment in the next chapter, we present a simple example in the ELI specification language below.

designator<> ::= identifier<< ^ name>>
LOCALS found: Boolean, sym: Symbol;
{
  Lookup: ONEOF
    PRE found
    WHERE found, sym := LookupName(id-scope, name);
    BEGIN END

    PRE PASS() = LASTPASS
    BEGIN
      NoteError("Undeclared identifier", GetPosition(1), TRUE);
    END
  ENDONEOF

  IdExpr: ONEOF
    PRE IsKind(sym, Constant)
    BEGIN
      IF lhs-expected THEN
        NoteError("Variable expected", GetPosition(1), TRUE);
      ELSE
        PushValue(GetProperty(sym, Value));
      ENDIF;
    END

    PRE IsKind(sym, Variable)
    BEGIN
      Local(GetProperty(sym, Offset), TRUE, GetProperty(sym, Nesting));
      IF NOT lhs-expected THEN
        Dereference(GetProperty(sym, Size));
      ENDIF;
    END

    BEGIN
      NoteError("Wrong class of identifier", GetPosition(1), TRUE);
    END
  ENDONEOF
};

Figure 3-6: Production with Computational Events

3.2.3. Specification Example

Figure 3-7 contains the full specification of a simple language that recognizes based numbers. The syntax of a based number is a sequence of digits optionally followed by a base indication. The base indication consists of a marker (#) and the name of the base (i.e. TEN, EIGHT or TWO). Thus, the following are legal examples of based numbers:

123
177 # EIGHT

Numbers are interpreted as decimal when no indication is given.

The only bases recognized are decimal, octal, and binary. An identifier scope maps a base indication to an appropriate base value (used in computing a number's value). In this specification, a semantic check is performed to ensure that a base indication is legal. For space reasons, we did not insert a semantic check to ensure that each digit is legal for the number's base. The semantic representation of the based number (held in Value) consists of the number's value.

DOMAINS
  BaseScope = SCOPE OF Integer;

TERMINALS
  DIGIT<< ^ Integer>>
  IDENTIFIER<< ^ Name>>
  MARKER

NONTERMINALS
  based_number<< ^ >>
  opt_base<< ^ Integer>>
  number<>

START based_number

CONSTANTS
  DefaultBase = 10;
  Bases = NewScope(BaseScope);

INITIALLY
  AddName(Bases, FindNameIndex("TEN"), 10);
  AddName(Bases, FindNameIndex("EIGHT"), 8);
  AddName(Bases, FindNameIndex("TWO"), 2);

GLOBALS
  Value: Integer;

PRODUCTIONS
  based_number<< ^ >> ::= number<> opt_base<< ^ base>>
  LOCALS value: Integer;
  {
    SetValue: BEGIN
      Value := value;
    END
  };

  opt_base<< ^ DefaultBase>> ::=
  {};

  opt_base<< ^ base>> ::= MARKER IDENTIFIER<< ^ name>>
  LOCALS base, found_base: Integer; found: Boolean;
  {
    CheckBase: BEGIN
      found, found_base := FindName(Bases, name);
      IF found THEN
        base := found_base;
      ELSE
        NoteError("Unknown base", GetPosition(2), TRUE);
        base := DefaultBase;
      ENDIF;
    END
  };

  number<> ::= number<> DIGIT<< ^ digit>>
  {};

  number<> ::= DIGIT<< ^ digit>>
  {};

Figure 3-7: Specification for Based Numbers

Chapter 4 ELI Generation System

The ELI generation system, like most generation systems, produces a translation system from a language specification. The components of our generation system (refer back to Figure 2-11) consist of the generator, the translation engine, and the target stack machine interpreter. Since the latter is straightforward, we omit its description from this thesis. Of more scientific interest are the first two and the means for communicating between translation phases (i.e. the intermediate form).

The ELI generator accepts specifications written in the language described in the previous chapter and produces two outputs. One output represents the specified language's semantics. It contains the data structures that control the operation of the translation engine and comprise the run-time support for client programs. In particular, these structures represent the events instantiated for the translation of a program unit. The second output is a context-free syntax specification suitable as input to the parser-generator yacc [Johnson 75]. (We assume that lexor specification and construction are performed separately.) The generated parser transforms program text into an equivalent intermediate form composed of program data (e.g. identifiers, constant values) and the instantiations of the translation events that must be executed.

The next section describes the operation of the generator in more detail. Most importantly, we discuss the influences on event ordering. The following section defines the intermediate form and how it represents events and additional ordering information. Finally, the last section presents the translation engine, including the means by which it constructs stack machine programs.

4.1. The Generator

Before producing a language's parser specification and semantics output, the generator must check that the input specification is well-formed and must analyze the translation actions to determine an efficient ordering for attempting event execution. The first task consists of checking the syntax of the specification and type checking all attribute expressions. In the course of performing this verification, the generator converts a language specification into the set of initialized attribute values (i.e. the language's standard prelude) and the productions defining the non-terminals of the language. Each production, in addition to characterizing the context-free syntax for the non-terminal, includes the events representing the semantics of that production in the order specified. 15 The responsibility of the second task, therefore, involves reordering the semantic functions and computational events so that no action is attempted before other actions on which it depends.

15 The order may be changed by the end of translator generation.

Reordering analysis must take several factors into account. The primary influence on event ordering involves the data interdependencies between them. In particular, each semantic function defining a required attribute uses some subset of a production's given and local attributes. Simple flow analysis can determine the relative ordering of computational events as well as which computational events to execute before each semantic function (because they provide intermediate, local values).

Ordering dependencies between given and required attributes, however, requires a relatively complex, global algorithm [Katayama 84]. The algorithm decides when and in what order the nodes of a derivation tree should be visited and their events attempted. Since we wish to allow explicit control over the pass in which an event is executed, the generator must maintain the preorder traversal of nodes. Thus, a simplified version of the algorithm is used to determine when to visit subnodes of a derivation subtree in the order defined by each production of an ELI language specification.

Binding time constraints exert another influence on event ordering. Because of the generality allowed in pre-conditions, the generator cannot do much to derive ordering information from them. Those predicates that involve relative ordering of a production's events or absolute pass or phase constraints, however, can be utilized.

The simplest influence on event ordering arises from program text. For example, in the case when an identifier is declared twice within the same scope, the second occurrence should be flagged as an error. Finally, when no other criterion applies, the generator uses the order of event specification.
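The combination of data dependencies with specification order as a tie-breaker amounts to a stable topological sort. The Python sketch below illustrates that idea only; it is not the generator's algorithm, the function name and its parameters are hypothetical, and it handles only dependencies among the events of a single production.

```python
def order_events(events, produces, requires):
    """Order events so that producers of local attributes precede consumers.

    events: event labels in specification order.
    produces / requires: dicts mapping a label to the set of local
    attributes it defines / reads.
    Ties are broken by specification order.
    """
    producer = {attr: ev for ev in events for attr in produces.get(ev, ())}
    ordered, placed = [], set()
    remaining = list(events)
    while remaining:
        for ev in remaining:
            deps = {producer[a] for a in requires.get(ev, ()) if a in producer}
            if deps <= placed:
                # Earliest specified event whose producers have all been placed.
                ordered.append(ev)
                placed.add(ev)
                remaining.remove(ev)
                break
        else:
            raise ValueError("circular dependency among events")
    return ordered


# Two events specified in the "wrong" order: EnumDeclare reads the local
# attribute enum, which EnumCreate defines.
order = order_events(
    ["EnumDeclare", "EnumCreate"],
    produces={"EnumCreate": {"enum"}},
    requires={"EnumDeclare": {"enum"}},
)
print(order)
```

Events with no mutual dependencies keep the order in which the specifier wrote them, which is the last-resort criterion described above.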

Figure 4-1 illustrates how the ELI generation system determines an ordering for attempting event execution within the context of a grammar production. As expected, data interdependencies play the major role. Also, catering to the mostly linear nature of programming languages, the generator maintains the order for visiting children nodes. (In the example, if the second child were a nonterminal instead of a terminal symbol, it would be visited just before attempting the declaration event.)

Once the ordering analysis is complete, the generator stores the language's semantics in a file. As indicated, the semantics consists of the "standard prelude" and the productions. The description of each production includes event representations and an order for attempting the events and visiting subnodes. This file controls the execution of the translation engine for all translation phases (e.g. compilation from program text to intermediate form, linking of several intermediate forms into another) and constitutes a resource for the stack machine interpreter (i.e. run-time support). Section 4.3 below describes how the translation engine uses this information.

The other output of the generator consists of the language's parser specification. In addition to checking the context-free syntax of client programs, the responsibilities of the constructed parser include creating the initial intermediate form on which the initial phase (usually called "compilation") of the translation engine operates. The next section describes the structure of each intermediate form as well as the content of each form before and after different translation phases.

enum_list<> ::= enum_list<> ',' identifier<< ^ name>>
LOCALS enum: Symbol;   -- Local to this production!
{
  EnumDeclare: BEGIN
    -- If declaration fails, note multiple declaration
    -- at identifier
    IF DeclareName(scope, name, enum) THEN
      NoteError("Multiple declaration", GetPosition(3), Fatal);
    ENDIF;
  END

  EnumCreate: BEGIN
    -- Create constant symbol for enumerated constant,
    -- associate its type and value.
    enum := NewObject(Symbol, Constant);
    SetProp(enum, ItsType, type);
    SetProp(enum, Value, next_value);   -- 0-based value
  END
};

Events are initially ordered as specified [(line number) event]:
  (1) AppendElt(enums, enum)
  (1) next_value + 1
  (2) scope
  (2) value
  (2) type
  (6) EnumDeclare
  (14) EnumCreate

Ordering computed by ELI generator for attempting events:
  (2) scope
  (2) value
  (2) type
  --- visit child 1
  (1) next_value + 1
  (14) EnumCreate
  (1) AppendElt(enums, enum)
  (6) EnumDeclare

Note that child 2 is not visited since it is a grammar terminal symbol; its semantic data is assigned during program parsing.

Figure 4-1: Example of event ordering

4.2. The Intermediate Form

An intermediate form comprises the data structure used to communicate between the phases of the generated translation system. Initially, the generated lexor/parser converts a source program into an equivalent intermediate form. Each execution of the translation engine then transforms intermediate forms by performing semantic actions.

For languages allowing separate compilation, the translation engine is also responsible for link-time semantic processing. Thus, it must be possible to use intermediate forms resulting from "compilation" as input to a linking phase. Similarly, the result of linking together a module suite (i.e. a subset of a program's modules) should also be acceptable to a subsequent linking phase. (See Figure 2-11 for examples of each case.)

The translation engine could produce different intermediate forms from the different phases, but the generated translation system would then have to specialize for each case. Although the information characterized evolves from syntax to pure semantics, a structure that can represent both simplifies the construction and operation of the translation engine. Thus, we have designed a single intermediate form for use by the ELI translation engine.

For event-driven translation systems, the intermediate form must represent not only the semantic data derived from the program text and computed by executed translation actions but also the set of unexecuted events. Furthermore, the structure should help keep track of event ordering, especially ordering dependent on the program text, so that the translation engine can determine when a pass is complete.

The basic intermediate form in the ELI system contains two objects. The first is global data representing the computed semantics of a program module. Linking phases use this data to allow each module access to the other modules' semantics. The second object consists of an abstract syntax tree representing both the program's syntax and the events instantiated to effect its translation.

Each node in the abstract syntax tree corresponds to a production in the language's attribute grammar. A node refers to its production's description in the language semantics file. Since this description includes the events themselves, the node need only keep a status vector indicating which events remain to be executed and local memory for attribute values. 16 Finally, each node includes a stack machine program fragment expressing the translation of the program section arising from the associated production. This fragment is used by various model operations invoked in specifications to construct larger stack machine programs; these operations are discussed in more detail below. Figure 4-2 shows an exploded view of an abstract syntax tree node.

The responsibility of the generated lexor/parser, then, is to create the initial abstract syntax tree from a module's program text. In addition to associating each node with the appropriate production's semantic description, the construction must initialize the status vector and allocate the correct amount of local storage. Figure 4-3 exhibits an example of an initial tree.

16 Local memory allows the event descriptions to be re-entrant.

[Diagram: a node with links to its parent and children, holding an event status vector, attribute storage, and a target code fragment, and referring to the description of production P (events P1 through PN) in the language semantics.]

Figure 4-2: Contents of an Abstract Syntax Tree Node

When all of the instantiated events at a node have been executed, its local attribute storage can be recovered. As soon as the same applies to all of its children, the node itself can be eliminated. Therefore, after a phase has been completed, a truncated abstract syntax tree represents unexecuted events. Nodes in this pruned tree have either a non-empty status vector or a descendant with unexecuted events. Once all translation actions have been executed, the abstract syntax tree disappears. The global data then contains the module's complete semantics. Figure 4-4 exhibits an abstract syntax tree and possible node contents at some point during translation.
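A node of this kind, together with the pruning rule just described, might be modelled as in the following Python sketch. The field names are hypothetical, the status vector is simplified to a list of Booleans, and the sketch ignores what happens to a pruned node's program fragment.

```python
from dataclasses import dataclass, field


@dataclass
class AstNode:
    production: str   # key of the production's description in the semantics file
    pending: list     # status vector: True marks an event not yet executed
    attributes: dict = field(default_factory=dict)
    fragment: list = field(default_factory=list)
    children: list = field(default_factory=list)


def prune(node):
    """Return the node with finished subtrees removed, or None if nothing remains.

    Pruned children are replaced by None so that child positions stay stable.
    """
    node.children = [prune(c) if c is not None else None for c in node.children]
    if not any(node.pending):
        node.attributes = {}  # local attribute storage can be recovered
        if all(c is None for c in node.children):
            return None       # the whole subtree has been translated
    return node


finished = AstNode("identifier", [False])
waiting = AstNode("designator", [True, False], attributes={"name": "x"})
root = AstNode("statement", [False], attributes={"tmp": 1},
               children=[finished, waiting])

after_first = prune(root)
```

In this example the root survives only because one descendant still has an unexecuted event; once that event has run, a second call to `prune` returns `None` and the tree disappears.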

Since an intermediate form must also support linking, it actually consists of a list of module translations characterized by the basic structure described above. After each phase, unexecuted events may depend on information only available from other unlinked modules. Typically, the offsets of imported global variables and routines comprise such information. As discussed in Chapter 2, other examples could include hiding type representations (events may depend on a type's size) and module initialization (events may depend on a routine's implementation). Events gain access to needed information from another module through the global data portion of its intermediate form.

The responsibility of each invocation of the translation engine involves attempting the execution of all remaining events in a given intermediate form. In addition, the translation engine must facilitate cross-references to each module's semantic data. The next section describes the operation of the translation engine in more detail.

Figure 4-3: The Abstract Syntax Tree component of an Intermediate Form

4.3. The Translation Engine

The translation engine performs all semantic analysis in the translation of programs. It functions in a data-driven manner; that is, much like a table-driven parser, the engine executes a common driver controlled by a data structure. The data in this case consists of the language semantics output of the ELI generator.

Each execution of the translation engine on one or more intermediate forms constitutes a translation phase. During a phase, the translator conducts passes over the unexecuted events until either no more translation actions exist or no more events remain enabled. In the latter case, one last pass is made to indicate the end of the current phase. (Events can take advantage of this feature by specifying the expression PASS() = LASTPASS as a binding time constraint.) Finally, if events still exist, the translation engine can inform the user which events remain unexecuted and why.

Figure 4-4: An Abstract Syntax Tree component during translation

Each pass executes a preorder traversal of the abstract syntax tree of every basic intermediate form in turn. On visiting a node, the translator examines the local event status vector indicating which events have yet to be executed. These events are then attempted in the order stipulated by the description of the corresponding language production. Recall that this ordering also indicates when to visit a child node. The visit of a node concludes when all remaining children (i.e. child nodes not yet pruned) have been visited and no unexecuted events are enabled. Thus, a pass is complete after "visiting" the root node of each abstract syntax tree in the input intermediate forms.
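The pass and phase structure can be summarized in a short Python sketch. It is a simplified model, not the translation engine itself: all names are hypothetical, every node attempts its own events before visiting its children (whereas ELI interleaves events and child visits as the production description dictates), and the pass number and last-pass flag are kept in a plain dictionary.

```python
class Event:
    def __init__(self, label, enabled, run):
        self.label, self.enabled, self.run = label, enabled, run
        self.done = False


class Node:
    def __init__(self, events, children=()):
        self.events, self.children = list(events), list(children)


def run_pass(node, state):
    """Preorder visit: attempt this node's events, then visit its children."""
    executed = 0
    for ev in node.events:
        if not ev.done and ev.enabled(state):
            ev.run(state)
            ev.done = True
            executed += 1
    for child in node.children:
        executed += run_pass(child, state)
    return executed


def remaining(node):
    own = [ev.label for ev in node.events if not ev.done]
    return own + [label for c in node.children for label in remaining(c)]


def run_phase(roots, state):
    """Make passes until no events remain or none is enabled, then a last pass."""
    state["pass"], state["last_pass"] = 0, False
    while any(remaining(r) for r in roots):
        state["pass"] += 1
        if sum(run_pass(r, state) for r in roots) == 0:
            state["pass"] += 1
            state["last_pass"] = True  # lets events test for the final pass
            for r in roots:
                run_pass(r, state)
            break
    return [label for r in roots for label in remaining(r)]


# A forward reference: the use of x is visited before its declaration.
state = {"declared": set(), "log": []}
use = Node([Event("Lookup", lambda s: "x" in s["declared"],
                  lambda s: s["log"].append("found x"))])
decl = Node([Event("Declare", lambda s: True,
                   lambda s: s["declared"].add("x"))])
left_over = run_phase([Node([], [use, decl])], state)
```

The forward reference costs one extra pass: Lookup is not enabled when first visited, Declare runs later in the same pass, and Lookup succeeds on the second pass.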

Clearly, the goal of each phase is to reduce the number of unexecuted events and increase the data representing the translation semantics of the client program. Figure 4-5 illustrates this concept. The operation of the translation engine is identical for all phases.

The simplest phase is compilation. The intermediate form constructed by the generated lexor/parser contains all of the events that must be executed in order to translate the program module (see Figure 4-6). The only semantic data available is taken directly from the program text, such as identifier names and constant values. At the end of the phase, unexecuted events may remain as a result of unfulfilled binding time constraints or unsatisfied dependencies on attributes from other modules.

[Diagram: a phase takes an intermediate form holding unexecuted events and available semantic data, and produces an intermediate form holding the remaining unexecuted events and the union of the previously available semantic data and the data computed by executed events.]

Figure 4-5: Operation of a Phase

[Diagram: the lexor/parser produces an intermediate form holding the initial set of events and the data from the program text; the compilation phase produces an intermediate form holding the unexecuted events and the union of available and computed semantic data.]

Figure 4-6: Operation of a Compilation Phase

[Diagram: several input intermediate forms, each holding unexecuted events and available semantic data, are combined into a form holding the union of unexecuted events and the union of available semantic data; the linking phase produces a form holding the remaining unexecuted events and the union of available and computed data.]

Figure 4-7: Operation of a Linking Phase

A linking phase creates an intermediate form that contains the union of the events and semantic information of the input intermediate forms. This union is produced by concatenating the lists of basic forms together. The translator then operates on the result (see Figure 4-7). Unexecuted events that depend on attribute values will be executed if those values are provided by other program units in the union. Normally, a language specifier defines the communication of information between program units through the use of aggregate values (e.g. Aggregation or Scope values; see Section 3.1.2). Aggregate values imported from another module's interface may be incomplete. Usually, however, most of these values are complete by link-time, using additional information provided by the module's implementation.
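As a rough illustration of how a linking phase can complete work left over from compilation, the following Python sketch models an intermediate form as a pair of leftover events and semantic data. The representation is an assumption made for this example only: each event is a hypothetical triple (name, attributes needed, attributes provided), and run_events, compile_module, and link are illustrative names rather than parts of ELI.

```python
def run_events(events, data):
    """Execute every event whose needed attributes are defined; return the rest."""
    pending = list(events)
    progressed = True
    while pending and progressed:
        progressed = False
        for event in list(pending):
            name, needs, provides = event
            if all(attr in data for attr in needs):
                data.update(provides)      # the event defines new attributes
                pending.remove(event)
                progressed = True
    return pending

def compile_module(events):
    """Compilation phase: only data from the module itself is available."""
    data = {}
    leftover = run_events(events, data)
    return leftover, data

def link(forms):
    """Linking phase: concatenate the events, union the data, then translate."""
    events, data = [], {}
    for leftover, form_data in forms:
        events.extend(leftover)
        data.update(form_data)
    remaining = run_events(events, data)
    return remaining, data

# Module B needs an attribute that only module A can provide.
module_a = [("export_size", (), {"A.size": 8})]
module_b = [("layout_b", ("A.size",), {"B.offset": 16})]

form_a = compile_module(module_a)
form_b = compile_module(module_b)
remaining, linked_data = link([form_a, form_b])
```

Here the event layout_b cannot run when module B is compiled alone, but it executes at link-time once the union supplies A.size.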

In addition to executing events and computing semantic data, the translation engine maintains the semantics of a program's execution. Recall that each node in an abstract syntax tree contains a stack machine program fragment expressing the execution semantics of the program text represented by that node. The translation engine produces these fragments whenever it detects that semantic actions associated with the current node depend on run-time values.

    statement<> ::= WHILE expression<> DO statement_list<> END
      ( CheckWhileExpr :
          BEGIN
            -- Check that the guard condition is of Boolean type.
            IF NOT TypesMatch(BooleanType, expr-type) THEN
              NoteError("Boolean expected", GetPosition(2), Fatal);
            ENDIF;
          END

        WhileSemantics :
          BEGIN
            -- If the execution of the expression is true, execute
            -- the statement list and then execute self.
            IF Execute(1, Boolean) THEN
              Execute(2);
              Execute(0);
            ENDIF;
          END
      );

Figure 4-8: Production for Modula-2 WHILE Statement

Simple values that depend on run-time state, such as memory contents or user input, result in simple code fragments that access the run-time state. The translation engine tags such values. Then, when it attempts to evaluate a semantic function that depends on tagged values, the translation engine composes those code fragments into an appropriate run-time library subroutine call. (Some specification language models explicitly manipulate code fragments -- refer to the Machine State and Memory models in Appendix B).

Similarly, the translation engine generates code fragments for any statement in a computational event that contains a "run-time" expression. The obvious code generation occurs for IF and call statements, using actual values when available and code fragments otherwise. The resulting code fragments are appended to the fragment maintained for the current node.

If a "run-time" value is assigned to an attribute or an aggregation's field, the translation engine tags the attribute or field and assigns the associated code fragment as its "value". More importantly, such an assignment defines the value so that translation can proceed. Subsequent uses of the attribute or field are considered as run-time. The translation engine returns the code fragment for each access. (Multiple accesses can be optimized by "executing" the fragment once and assigning the result to a temporary variable.)

The translation engine can also detect when a target machine program fragment is independent of machine state and thus represents a constant value. If translation depends on such a value, the actual value is used directly. 17

17 This resembles the concepts of constant folding and partial evaluation.

In the example of the Modula-2 WHILE statement (see Figure 4-8), a constant true expression generates an infinite loop whereas a statically false condition generates no "code" at all. On the other hand, if the guard condition is not static, the execution of the IF statement in the event labeled WhileSemantics constructs a conditional branch in the target stack machine program.
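The three outcomes for the WHILE guard can be sketched in Python as follows. This is only an illustration of the idea of tagging run-time values: the classes Static and Code, the helper less, and the stack machine mnemonics (PUSH, LESS, JUMP, JUMP_IF_FALSE) are all hypothetical and are not taken from the ELI stack machine.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Static:
    value: object          # a value known during translation

@dataclass
class Code:
    ops: List[str]         # a fragment that computes the value at run-time

Value = Union[Static, Code]

def as_code(v: Value) -> List[str]:
    """Turn any value into a fragment that pushes it on the stack."""
    return v.ops if isinstance(v, Code) else [f"PUSH {v.value}"]

def less(a: Value, b: Value) -> Value:
    """Evaluate now if both operands are static; otherwise compose code."""
    if isinstance(a, Static) and isinstance(b, Static):
        return Static(a.value < b.value)
    return Code(as_code(a) + as_code(b) + ["LESS"])

def translate_while(guard: Value, body: List[str]) -> List[str]:
    """Generate the fragment for WHILE guard DO body END."""
    if isinstance(guard, Static):
        if guard.value:
            return ["L0:"] + body + ["JUMP L0"]   # constant true: infinite loop
        return []                                 # constant false: no code
    return (["L0:"] + guard.ops + ["JUMP_IF_FALSE L1"]
            + body + ["JUMP L0", "L1:"])          # run-time guard: branch

body = ["CALL step"]
always = translate_while(less(Static(1), Static(2)), body)
never = translate_while(less(Static(2), Static(1)), body)
dynamic = translate_while(less(Code(["LOAD i"]), Static(10)), body)
```

Only the third case pays for a run-time test, which is the sense in which the cost of a late binding is incurred only when it is needed.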

The generated code fragments are interpreted during "program execution". Together with special operations for generating standard linking structures, these code fragments can also be used for code generation. As long as the link-time semantic analysis of a language requires only such activities as global address resolution, the generated translation system can perform just like a normal compiler. In Chapter 6, we discuss this possibility and the practicality of using event-driven translators in this manner, but first we take on a more ambitious project -- the design and implementation of a language requiring elastic binding times.

Chapter 5
Language Design and Elasticity

In this chapter, we examine the role of binding times during translation and the value of elastic binding times to programmers by modifying the design of an existing programming language and implementing the result using the translation system generator. We chose Smalltalk [Goldberg 83] as the language for this exercise.

In the first section below, we summarize the concepts present in Smalltalk and examine which features make good candidates for modification to allow flexible binding times during translation. Several design options are available; the second section describes some possible designs that achieve the goals of elasticity. Of course, it must be possible to specify not only the base language (i.e. Smalltalk) but also the chosen design changes. The specification for our modified Smalltalk and the resulting translation system constructed by the ELI generator are presented in the next section. Section 5.4 then reviews the specification with respect to the ELI features it uses.

Our reasons for designing elasticity into a programming language include the needs inherent in building programming environments (see Section 1.1.4) and command languages (see Section 1.1.5). These needs were strong motivations in the development of the thesis. The last section, therefore, examines how the modified Smalltalk satisfies the demands of implementing and controlling interactive applications.

5.1. Smalltalk

We use Smalltalk as our example language for two reasons. First, Smalltalk embodies a small number of concepts, allowing us to design more easily extensions that retain the language's consistency. Second, Smalltalk has features that inherently require run-time translation. It is our desire to design and implement a modified Smalltalk that allows elastic binding times for those features.

Smalltalk is an object-oriented language. In an object-oriented language, program values are considered to be objects. An object consists of private memory and a set of operations that have exclusive right to access and modify that memory. The specifications, or interfaces, of these operations are called messages. Smalltalk objects are instances of data abstractions called classes. A class defines the form of the private memory for each object instance and implementations (called methods) of the operations that apply to objects of that class. In addition, every class except one (the class Object) inherits the properties of one other existing class. The inheriting class (referred to as the subclass) initially contains the instance variables 18 and operations of the class it inherits (referred to as its superclass). These properties may be extended and, in the case of inherited methods, overridden with new implementations. Thus, operations that apply to instances of a superclass also apply to the instances of every class that directly or transitively inherits the superclass.

In any language, the invocation of an operation in a program requires two translation actions. First, a lookup must be performed to check whether the operation is allowed in the invocation's context. Then, if the invocation is legal, the implementation of the operation must be determined. In most languages, the second action is trivial since each operation may have exactly one implementation; it becomes an issue only when a context may allow more than one implementation for an operation, as in languages allowing procedures as first-class values.

In Smalltalk, the first activity is referred to as message lookup while the second is called method determination. Since a Smalltalk class may override the implementation of any inherited operation, method determination may require a search of the inheritance hierarchy. Under normal circumstances, both activities occur during the execution of Smalltalk programs. No mechanism exists to express statically the known properties (i.e. the class) of an expression, even though the programmer often knows exactly what sort of objects to expect. 19
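A minimal Python sketch of the run-time search may help fix the two activities. It assumes a hypothetical SmalltalkClass record holding a superclass link and a method dictionary; the printString override and the method bodies are invented for the example and are not part of the thesis.

```python
class SmalltalkClass:
    """A class record: a name, a superclass link, and a method dictionary."""
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        self.superclass = superclass
        self.methods = dict(methods or {})

def determine_method(cls, selector):
    """Search the class and then its ancestors for an implementation.

    Returns (defining class name, implementation), or None if no class in
    the chain accepts the message, i.e. message lookup fails."""
    while cls is not None:
        if selector in cls.methods:
            return cls.name, cls.methods[selector]
        cls = cls.superclass
    return None

def send(receiver_class, selector):
    """Dynamic invocation: lookup and method determination at run-time."""
    found = determine_method(receiver_class, selector)
    if found is None:
        raise LookupError(f"{receiver_class.name} does not understand {selector}")
    _, implementation = found
    return implementation()

object_class = SmalltalkClass("Object", None, {"printString": lambda: "an Object"})
link_list = SmalltalkClass("LinkList", object_class, {
    "car": lambda: "first element",
    "printString": lambda: "a LinkList",      # overrides the inherited method
})
sortable_list = SmalltalkClass("SortableList", link_list, {"sortList": lambda: "sorted"})
```

With these definitions, send(sortable_list, "printString") finds the override in LinkList before reaching Object, and a message that no class in the chain defines is rejected only when the invocation is executed.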

Late binding for message lookup provides several benefits. As in other interpretation systems, the overhead of "compilation" in the edit-compile-debug cycle is minimal during program development. The most important benefit, however, involves the flexibility available to programmers. For example, one message may be shared by two totally unrelated classes -- that is, when neither class is a descendant of the other in the inheritance hierarchy. Such classes conceptually share an abstract superclass. Abstract superclasses are not represented by actual classes; they therefore cannot have any instances. 20

To illustrate the need for abstract superclasses, consider Figure 5-1. Classes Ordered and Displayable present two possible properties of data abstractions: partial ordering and the ability to be displayed. One may define objects that are displayable but not ordered and objects that are ordered but not displayable. The two classes providing the messages that characterize these properties must therefore be unrelated. If they were represented by actual classes, a third class defining objects that are displayable and ordered would have to inherit both, which is impossible using single inheritance (Figure 5-1(a)). Thus, dynamic message lookup allows a limited form of multiple inheritance (Figure 5-1(b)).

In the current design of Smalltalk, these benefits come at a price. First, the minimal form for variable declarations yields little documentation as to the programmer's intent. The dynamic nature of message lookup further complicates the readability of Smalltalk programs. Second, although Smalltalk programs are type-safe in that operations cannot be executed on inappropriate objects, violations can be detected only during run-time. In general, one cannot even guarantee the absence of such errors, even with a significant amount of program testing. Finally, late binding for message lookup often results in relatively inefficient program execution.

18 The instance variables define the form of an object's private memory.

19 As we will see later, however, some information can be extracted from flow analysis of the program text [Suzuki 81].

20 In contrast to our use, Smalltalk applies the term abstract superclass to actual classes that should not be instantiated because at least one of their messages has no implementation (specified by the standard identifier subclassResponsibility).

Figure 5-1: The need for Abstract Superclasses (panels (a) and (b) show classes A, B, and C with the messages display and lessThan)

    class name      SortableList
    superclass      LinkList

    instance methods

    "sortList implements a selection sort"
    sortList
        | remainder sentinel least |
        remainder ← self.
        [ remainder cdr ~~ nil ] whileTrue: [
            sentinel ← remainder car.
            least ← (remainder cdr) select: sentinel.
            least ~~ nil ifTrue: [
                remainder replaca: (least car).
                least replaca: sentinel ].
            remainder ← remainder cdr ]

    "Determine ordering based on element ordering"
    comesBefore: aList
        | remainder |
        ((self car) comesBefore: (aList car)) ifTrue: [ ^ true ].
        ((aList car) comesBefore: (self car)) ifTrue: [ ^ false ].
        ^ (self cdr) comesBefore: (aList cdr)

    private

    "select the minimum element"
    select: sentinel
        | aList minSoFar |
        aList ← self.
        minSoFar ← nil.
        [ aList ~~ nil ] whileTrue: [
            ((aList car) comesBefore: sentinel) ifTrue: [ minSoFar ← aList ].
            aList ← aList cdr ].
        ^ minSoFar

Figure 5-2: Class using an Abstract Superclass

For example, consider the Smalltalk class defining sortable lists in Figure 5-2. The inherited list, LinkList, presents an abstraction similar to S-expressions in Lisp, including the operations car, cdr, cons, and replaca:. 21 The new class, SortableList, supplies two additional messages. One sorts a given list in place using selection sort (sortList) while the other defines a lexicographic ordering on sortable lists (comesBefore:). Both operations use the partial ordering of the list elements, which is supplied by an abstract superclass.

The class SortableList illuminates why the abstract superclass defining comesBefore: cannot be represented by an actual class. All sortable lists have a partial ordering (the lexicographic ordering comesBefore:), but not all linked lists are sortable. Thus, LinkList cannot inherit the partial ordering property, whereas SortableList must inherit LinkList. Without dynamic message lookup, such multiple inheritance would not be possible.

21 Appendix D contains a definition of LinkList in the syntax of the modified Smalltalk.

As noted above, programmers often know in advance information that can help establish the legality of expressions in their programs, especially the lookup of message invocations. All of the LinkList message invocations in Figure 5-2, for instance, apply to objects that inherit LinkList. Under such circumstances, it should be possible for a programmer to specify that information so that a translator can take advantage of it. In the next section, we explore possible extensions and changes to Smalltalk's syntax and semantics that would permit a translator to reduce the costs while maintaining the flexibility and spirit of the language.

5.2. Design Options

To achieve success, the modifications to Smalltalk should alleviate the problems associated with the lack of internal documentation and the inability to perform static type checking and message lookup without compromising the benefits of interpretation and abstract superclasses or destroying the structure of the language. Furthermore, the implementation of the new language will require elastic binding times for certain translation activities, notably message lookup. The features of interpretation (useful during program development) and abstract superclasses both require run-time message lookup, while static type checking and the generation of efficient program translations require compile-time message lookup. By using ELI, we show how to achieve all of these objectives through its implementation of the resulting design.

The lack of internal documentation in Smalltalk programs arises from the programmer's inability to provide semantic annotations (e.g. type declarations) and the dynamic nature of message lookup. Both causes can be eliminated by allowing the programmer to associate types with instance variables and the parameters and locals of methods. In order to incorporate type checking into Smalltalk, we must complete three tasks: define what constitutes a "type", determine how to compute a type for each expression in a program, and establish rules for type matching.

We must define some terminology to discuss this last task. Static type checking verifies that the "computed" type of an expression matches a "given" type. A "given" (or "expected") type is associated with the left-hand-side of every assignment and each formal parameter of a message. Note that the rules for matching an l-value's given type and for matching a parameter's given type may differ. An "actual" type of an expression is the type of any value the expression may assume during program execution.

In most other type-checked languages, an actual type always equals its corresponding computed type. In Smalltalk, however, actual types need not equal the computed type. 22 A message invocation on an object expression (called the receiver of the message) is legal whenever the actual class of the receiver inherits the class defining the message. Static type checking of Smalltalk, therefore, need only ensure that each actual class of a receiver inherits its computed type and that the computed type accepts the message.

22 In this case, type checking can only guarantee the legality of expressions if the matching relation is transitive.

Several researchers have studied the problem of adding type checking to Smalltalk. Suzuki, for instance, defines the type of an expression to be the set of classes of the objects that the expression may represent, determined by flow analysis and type inference on the program text [Suzuki 81]. His algorithms avoid the need to declare types for variables and parameters. Thus, every assignment is legal. An actual argument in a message invocation, on the other hand, only matches the corresponding formal parameter if its computed type represents a subset of the formal's expected type.

Unfortunately, Suzuki's analysis requires the availability of the entire program's text. An advantage of this technique, though, is that it can also determine statically which methods would be used for each execution of a message invocation. With this information, a compiler could build a dispatch table for each expression mapping classes to method implementations. Suzuki's work enables static type checking and message lookup but does nothing to increase the internal documentation of Smalltalk programs.

One approach that allows type declarations has been proposed by Borning and Ingalls [Borning 82]. In their type-checking system, a type is defined by an object's interface -- that is, the messages available in the object's class. A type does not include the object's message implementations (i.e. methods) nor the definition of its private memory. Types may have type parameters. Type parameters provide a limited form of polymorphism. For example, one can check that an expression is of type List-of-Integer instead of type List. The programmer need not declare a type for each variable or formal parameter. In such cases, type inference is used to determine the set of possible classes for expressions using those objects and the nearest common superclass is computed and assigned as the expression's "type".
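To make the notion of a nearest common superclass concrete, the following Python sketch computes it for a single-inheritance hierarchy. It is an illustration under stated assumptions, not the algorithm of Borning and Ingalls: the hierarchy is a hypothetical dictionary mapping each class name to its superclass, and the class Point is invented for the example.

```python
def ancestors(cls, parent):
    """Return the class followed by its superclasses, nearest first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = parent[cls]
    return chain

def nearest_common_superclass(classes, parent):
    """Return the closest class that every class in the list equals or inherits."""
    first, *rest = classes
    for candidate in ancestors(first, parent):
        if all(candidate in ancestors(other, parent) for other in rest):
            return candidate
    return None

# Hypothetical hierarchy: each class maps to its single superclass.
parent = {
    "Object": None,
    "LinkList": "Object",
    "SortableList": "LinkList",
    "Point": "Object",
}

list_type = nearest_common_superclass(["SortableList", "LinkList"], parent)
mixed_type = nearest_common_superclass(["SortableList", "Point"], parent)
```

When the inferred classes are unrelated, the result degenerates to Object, which carries the least information about the messages the expression accepts.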

The rules for type matching assignments and actual arguments are identical. The computed type of an expression matches the expected type if it equals or inherits the properties of the expected type. For parameterized types to match correctly, the corresponding parameter types must also match.

Neither approach has been used to eliminate dynamic message lookup or reduce the cost of method determination. Both, however, maintain all of Smalltalk's benefits. Both methods also provide a lot of static information as to the possible correctness of a program, although Suzuki's work does not increase the readability of programs.

Our goals involve incorporating a similar type checking mechanism into a translator for Smalltalk in order to eliminate most message lookups. For our purposes, then, it is sufficient to use a slightly weaker concept of type than the one defined by Borning and Ingalls. As above, an object's type is defined by its interface, except that no type parameters are allowed. This concept is not as powerful because we cannot check polymorphism. 23

23 I am not opposed to introducing polymorphism checking into Smalltalk, but doing so adds nothing to the demonstration of elastic binding times. Nothing in the ELI system precludes adding polymorphism to a language specification.

The rule for matching a computed type against a given type also parallels the definition used above. The computed type of a program expression matches the given type for an assignment or a method parameter if and only if the two types are equal or the computed class inherits the given class. This rule guarantees that the actual type of an expression will always match the corresponding expected type, since inheritance is transitive (see Figure 5-3(a)).

For example, any expression whose computed class is equal to or inherits SortableList matches the expected class LinkList (refer back to Figure 5-2). Thus, in the method sortList, if the local remainder is declared as a SortableList, all of the LinkList messages applied to remainder (e.g. car, cdr, replaca:) are legal invocations, since SortableList inherits LinkList.

Our design, however, computes types for expressions differently. Borning and Ingalls permit the programmer to omit type declarations, then use flow analysis to determine the set of possible types, whereas we require all type declarations. Rather than noting an error whenever a computed type does not match a given type, our language semantics will indicate an error only when the actual type cannot match the given type. This may be detected statically only when the expected type also does not match the computed type (see Figure 5-3(b)). In this case, the two classes are totally unrelated in the inheritance hierarchy and the actual type (which must match the computed type) cannot possibly be legal. If the expected type matches the computed type, a warning is issued that the actual type must be checked against the expected type, requiring run-time evaluation (see Figure 5-3(c) and (d)).
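The three static outcomes described above can be summarized in a short Python sketch. The function names inherits and check, the result strings, and the hierarchy (including the class Point) are hypothetical; only the rule itself, matching by equality or inheritance in either direction, comes from the text.

```python
def inherits(cls, ancestor, parent):
    """True if cls equals ancestor or transitively inherits it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = parent[cls]
    return False

def check(computed, expected, parent):
    """Classify an assignment or argument by its computed and expected classes."""
    if inherits(computed, expected, parent):
        return "legal"             # Figure 5-3(a): every actual class matches
    if inherits(expected, computed, parent):
        return "run-time check"    # Figure 5-3(c), (d): warn and test the actual class
    return "error"                 # Figure 5-3(b): the classes are unrelated

# Hypothetical hierarchy: each class maps to its single superclass.
parent = {
    "Object": None,
    "LinkList": "Object",
    "SortableList": "LinkList",
    "Point": "Object",
}

narrowing = check("LinkList", "SortableList", parent)
widening = check("SortableList", "LinkList", parent)
unrelated = check("Point", "LinkList", parent)
```

Only the middle case defers work to run-time, so a program whose declarations are precise pays nothing for the flexibility it does not use.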

To compute the type of an expression, the type checking algorithm needs the result type for each message invocation, which in turn requires static message lookup. Given the computed type of the invocation's receiver, one can attempt to find the message in the interface of the associated class. If found, the computed type of the invocation is the type declared for the return value of that message (see Figure 5-4(a)). If not found, an error cannot be issued. Because of the possible use of abstract superclasses, the message may be defined by the actual class of the receiver. Therefore, a warning should be noted instead that the message lookup will occur dynamically in the receiver's actual class (see Figure 5-4(b) and (c)).

Unfortunately, static type checking of the expression containing such an invocation cannot continue unless some "computed" type is assigned as its result type. Rather than perform flow analysis to attempt to discover all classes the invocation might actually produce, our semantics assumes nothing about the return type. Thus, the least informative class, Object, is assigned statically as the invocation's computed type. Flow analysis requires inter-class examination of code. If flow analysis were used, type checking for each class would have to be repeated for each program in which the class appeared. Although our method may force more dynamic message lookups, the type checking of each class need occur only once.
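A small Python sketch shows how the computed type of an invocation might be derived under these rules. The INTERFACES table is an assumption made for illustration: the return types given for the LinkList messages car and cdr are guesses, since the LinkList interface appears only in Appendix D, and result_type is a hypothetical name.

```python
# Hypothetical interfaces: superclass plus the messages each class adds,
# mapped to their declared return types (LinkList entries are assumed).
INTERFACES = {
    "Object": {"super": None, "messages": {}},
    "LinkList": {"super": "Object",
                 "messages": {"car": "Object", "cdr": "LinkList"}},
    "SortableList": {"super": "LinkList",
                     "messages": {"sortList": "SortableList",
                                  "comesBefore": "Boolean"}},
}

def result_type(receiver_type, message, warnings):
    """Static message lookup in the interface of the receiver's computed class.

    On success, return the declared return type. On failure, record a warning
    that lookup is deferred to run-time and assume the least informative
    class, Object."""
    cls = receiver_type
    while cls is not None:
        interface = INTERFACES[cls]
        if message in interface["messages"]:
            return interface["messages"][message]
        cls = interface["super"]
    warnings.append(f"{message}: lookup deferred to the actual class of a {receiver_type}")
    return "Object"

warnings = []
# remainder.cdr() with remainder declared as a SortableList
cdr_type = result_type("SortableList", "cdr", warnings)
# remainder.cdr().select(sentinel): select is not in the LinkList interface
select_type = result_type(cdr_type, "select", warnings)
```

In this sketch the first lookup succeeds statically through inheritance, while the second yields Object and a single warning, so any later message sent to that result would also be looked up dynamically.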

Of course, message lookup cannot occur without some knowledge of the messages provided by each class. Thus, each class in our modified Smalltalk will have an interface enumerating the messages accepted by objects of that class. Because of inheritance, the interface of a class need only specify its superclass and the messages it adds.

Figure 5-3: Type Checking Possibilities (panels relating the expected, computed, and actual classes: (a) Always legal, (b) Never legal, (c) Legal at run-time, (d) Illegal at run-time)

Figure 5-4: Message Lookup Possibilities (panels: (a) Always legal, where the computed class defines the message; (b) Legal at run-time, where a class below the computed class defines the message; (c) Illegal at run-time, where that class does not define the message)

Providing a fixed interface for each class also allows an optimization. Complete method determination cannot be performed statically because the actual class of an object may inherit the computed class of the corresponding expression and may override the message's implementation. Dynamic resolution, however, can be made more efficient. The messages specified in the interface of a class may be assigned sequential indices. Furthermore, the indices assigned to inherited messages coincide with those assigned by the superclass. 24 If the message index for an invocation can be determined statically, method determination reduces to a table lookup at run-time. Therefore, when static message lookup succeeds, one need only obtain the message's index from the interface of the receiver's computed class.
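The indexing scheme can be sketched in Python as follows. The functions assign_indices, build_table, and invoke are hypothetical names, and the printString message and the method bodies are invented for the example; the sketch only illustrates how indices inherited from the superclass let a statically determined index select the right method in the table of the actual class.

```python
def assign_indices(superclass_indices, added_messages):
    """Inherited messages keep the superclass's indices; new ones are appended."""
    indices = dict(superclass_indices)
    for message in added_messages:
        if message not in indices:
            indices[message] = len(indices)
    return indices

def build_table(indices, superclass_table, methods):
    """Copy the superclass's method table, then install new and overriding methods."""
    table = list(superclass_table) + [None] * (len(indices) - len(superclass_table))
    for message, implementation in methods.items():
        table[indices[message]] = implementation
    return table

def invoke(actual_table, index):
    """Run-time method determination is a single table lookup."""
    return actual_table[index]()

object_idx = assign_indices({}, ["printString"])
object_tbl = build_table(object_idx, [], {"printString": lambda: "an Object"})

link_idx = assign_indices(object_idx, ["car", "cdr"])
link_tbl = build_table(link_idx, object_tbl, {
    "car": lambda: "head",
    "cdr": lambda: "tail",
    "printString": lambda: "a LinkList",
})

sort_idx = assign_indices(link_idx, ["sortList"])
sort_tbl = build_table(sort_idx, link_tbl, {
    "sortList": lambda: "sorted",
    "printString": lambda: "a SortableList",   # override in the actual class
})

# The index is taken statically from the computed class (LinkList) ...
static_index = link_idx["printString"]
# ... and applied at run-time to the table of the actual class (SortableList).
printed = invoke(sort_tbl, static_index)
```

Because the index of printString is the same in every class that inherits it, the override in the actual class is found without searching the hierarchy.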

These modifications to the Smalltalk language increase the readability of programs, allow static indications of definite and possible type errors, and reduce the costs associated with dynamic message lookup and method determination. The only causes of dynamic type checking and message lookup arise from the desire to maintain the benefits of the original language: interpretation during system development and the use of abstract superclasses. Abstract superclasses continue to be allowed since dynamic message lookups are still possible.

24 This property takes advantage of single inheritance. Simple indices cannot be assigned so as to coincide with the indices of inherited messages in a language allowing multiple inheritance.

In the next section, we present the specification of the syntax and semantics of our new language, Elastic-Smalltalk, in the ELI system and show how all of these benefits, including interpretation, are attained. The benefits of the specification technique are also illustrated. Finally, we demonstrate how the translation system generated by ELI exhibits automatic and elastic binding time determination as well as binding times controlled explicitly by the language specifier.

5.3. Elastic-Smalltalk

Elastic-Smalltalk is the name of the language we designed that incorporates the type checking concepts introduced above. We divide the description of Elastic-Smalltalk's specification into two parts, syntax and semantics. The goal of the separation is to give the reader a flavor for the language so that we can, without recrimination, give program examples that illustrate the specification of its semantics.

5.3.1. Syntax of Elastic-Smalltalk

The modifications to Smalltalk discussed above require two changes to the syntax of the language: types for variable and parameter declarations and interfaces for class definitions. Appendix C contains the complete syntax of Elastic-Smalltalk. Here, we present only those parts necessary for describing the specification of the semantics.

The style of the syntax is more Algol-like than Smalltalk-like; this style reflects a personal preference and results in a slightly clearer specification. In the BNF fragments below, non-terminals are in italics, reserved words are capitalized, terminal classes (such as identifiers and program constants) are in italics, and terminal strings are surrounded by double quotes. Square brackets ([ ... ]) surround optional phrases. In the actual specification, however, optional phrases must be represented by non-terminals.

In Elastic-Smalltalk, a class definition is separated into two components: its interface and its implementation. As we concluded in the previous section, the class interface need only identify the inherited class and any additional messages accepted by object instances. Other classes used in the signatures for messages are declared (i.e. imported) in one place near the top. 25 Note that, to maintain the abstraction property for types, the form of an object instance's private memory is not defined in the class interface.

    class-interface ::= INTERFACE identifier
                            INHERITS class-identifier
                            [ USES class-identifier-list ]
                            CLASS MESSAGES message-sequence
                            INSTANCE MESSAGES message-sequence
                        END identifier

25 "Importing" referenced types explicitly is not strictly necessary. A class interface could be read whenever a new class name is encountered.

The interface, or signature, of a message stipulates the type for each parameter (using the appropriate class name) and the type of its return value. As in Smalltalk, the receiver's type is implicitly the one corresponding to the class being defined. Similarly, the defined class is also assumed as the return type when one is not given explicitly.

    message ::= MESSAGE identifier "(" [ class-identifier-list ] ")" [ ":" class-identifier ]

The remainder of the definition for a class is given in an implementation module. This module contains the instance variable declarations and message implementations.

    class-implementation ::= IMPLEMENTATION identifier
                                 [ USES class-identifier-list ]
                                 REPRESENTATION instance-variable-declaration-sequence
                                 CLASS METHODS method-sequence
                                 INSTANCE METHODS method-sequence
                             END identifier

A method implementation looks like a procedure definition in most Algol-like languages. The significant differences from Smalltalk involve only the assertions of a return type and of types for parameter and variable declarations.

    method ::= METHOD identifier "(" declaration-sequence ")" [ ":" class-identifier ]
                   [ VAR declaration-sequence ]
               BEGIN statement-sequence END identifier

    declaration ::= identifier-list ":" class-identifier

Before describing Elastic-Smalltalk statements and expressions, we should justify certain decisions made concerning the "primitive" types of the language. In the Smalltalk implementation described by Goldberg and Robson [Goldberg 83], the system uses special representations for integers and statement blocks and implements booleans as a normal class. For reasons of practicality and efficiency, we did not even try to cast these types as objects. Thus, expressions fall into two categories: those involving objects and those involving integers and boolean values. The major consequence of this decision is that new classes cannot inherit the properties of the primitive types. A practical benefit, however, is that Elastic-Smalltalk statements are control structures instead of messages as in Smalltalk and therefore more efficient during execution.

Currently, nothing in Elastic-Smalltalk corresponds to Smalltalk statement blocks. The main difficulty in providing such an abstraction lies only in finding means for declaring the type of values a block represents and for applying blocks to lists of arguments. Note that, in many regards, blocks resemble lambda expressions and suffer from the same problems (e.g. the funarg problem [Moses 70]).

We have included in Elastic-Smalltalk control structures corresponding to conditionals (IF-THEN-ELSEIF-ELSE), conditional repetitions (WHILE), and method returns (RETURN). Although these structures suffice for the expository purposes of this thesis, we realize that they are barely adequate in a "real" language. We also use Algol-like syntax for control structures. And, as in Smalltalk, a statement may consist of an expression. (Please refer to the appendix for the syntax of statements.)

Elastic-Smalltalk allows four forms of expressions: message invocations, primitive infix and prefix operations, assignments, and reference expressions. Reference expressions include constants, the special pseudo-variables self and super, and accesses of instance variables, parameters, and method locals. Assignments are used to modify local variables and the instance variables of an object. As an expression, an assignment yields the value of its right-hand-side.

    expression ::= identifier ":=" expression

The usual set of primitive type operations exist for integers (e.g. infix addition, prefix negation) and booleans (e.g. prefix inversion, conditional conjunction and disjunction). Finally, expressions involving objects include infix equivalence (==) and inequivalence (~~) and message invocations. We chose to use the following syntax for message invocations to enable a clearer specification of Elastic-Smalltalk's semantics.

expression ::= receiver-expression "." message-identifier "(" [ expression-list ] ")"

This form for invocations eliminates ambiguity in the syntax and the need to concatenate selector strings into a single name, as is required in Smalltalk.

INTERFACE SortableList

INHERITS LinkList

CLASS MESSAGES -- None. See LinkList for creation messages.

INSTANCE MESSAGES

-- sortList implements an in-place selection sort.
-- It returns a SortableList by default.
MESSAGE sortList ()

-- Determine lexicographic ordering based on element ordering
MESSAGE comesBefore (SortableList) : Boolean

END SortableList

Figure 5-5: Interface for SortableList

Figures 5-5 and 5-6 contain the Elastic-Smalltalk rendition of SortableList. Note how the interface indicates only the signatures of the messages accepted by objects of the class. Furthermore, only the added messages are explicitly mentioned -- all other messages are inherited from LinkList, the superclass. The implementation, for the most part, looks just like the Smalltalk version except for the associations of class names with variable and parameter declarations. Whether or not one agrees with Algol-like syntax versus Smalltalk's syntax, adding types to declarations increases the internal documentation of programs. In describing the specification of Elastic-Smalltalk's semantics, we show how these declarations also allow static message lookup and efficient method determination.

IMPLEMENTATION SortableList

REPRESENTATION -- No additional instance variables.

CLASS METHODS -- None.

INSTANCE METHODS

METHOD sortList() -- Returns a SortableList as the default.
  VAR remainder, least : SortableList
      sentinel : Object
BEGIN
  remainder := self;
  WHILE remainder.cdr() ~~ nil DO
    sentinel := remainder.car();
    least := remainder.cdr().select(sentinel);
    IF least ~~ nil THEN
      remainder.replaca(least.car());
      least.replaca(sentinel);
    END;
    remainder := remainder.cdr();
  END;
END sortList -- Returns self as the default.

METHOD comesBefore (aList : SortableList) : Boolean
  VAR remainder : SortableList
BEGIN
  IF self.car().comesBefore(aList.car()) THEN
    RETURN true;
  ELSEIF aList.car().comesBefore(self.car()) THEN
    RETURN false;
  ELSE
    RETURN self.cdr().comesBefore(aList.cdr());
  END;
END comesBefore

-- Private message that selects the minimum element.
METHOD select(sentinel : Object) -- Returns SortableList as default.
  VAR aList, minSoFar : SortableList
BEGIN
  aList := self;
  minSoFar := nil;
  WHILE aList ~~ nil DO
    IF aList.car().comesBefore(sentinel) THEN
      minSoFar := aList;
    END;
    aList := aList.cdr();
  END;
  RETURN minSoFar;
END select

END SortableList

Figure 5-6: Implementation for SortableList

5.3.2. Semantics of Elastic-Smalltalk

We now turn to describing the semantics of Elastic-Smalltalk and its specification in the ELI system. Appendix E provides the full specification. The reader should refer to Chapter 3 for a review of the specification language. In this specification, the domain definitions and the terminal/non-terminal declarations each comprise just less than five percent of the specification. The construction of the standard prelude (i.e. standard classes, metaclasses, and methods) takes up slightly over twenty percent. The remaining seventy percent consists of the attribute grammar that associates semantic events with the syntactic constructs. The entire specification, without comments, is approximately 1650 lines long.

The primary domain definition describes the various components of the language -- classes, metaclasses, instance variables, messages and associated methods, parameters, and local variables. The description of a class includes an identifier scope for declarations, a storage environment for assigning offsets to instance variables, its inherited class, its metaclass, and its program representation. The last is necessary because classes are also object values in programs. Furthermore, every class definition actually creates two classes -- the class itself and its metaclass. The metaclass is an unnamed class that contains the defined class as its only object instance and characterizes the instance creation messages for that class. Individualized metaclasses are necessary to allow specialized instance creation messages for different classes. Otherwise, all creation messages would be identical. Each metaclass is an instance of the pre-defined class Metaclass. Since metaclasses are also classes, their semantic descriptions closely resemble those of classes.

A semantic description for a message (and its associated method) contains a local identifier scope, a storage environment for parameter and variable declarations, its signature, its index within the class (for efficient table lookup of methods during execution), and the stack machine program representing the method's statement list. Finally, semantic descriptions for instance variables, parameters, and local variables associate a class for use during type checking and an assigned offset in the appropriate storage environment.

The standard prelude defines standard classes (Object, Metaclass, Class (see footnote 26)), standard messages and default methods (class and isKindOf for all objects, new and inheritsFrom for class objects), special "pseudo-classes" for integers and booleans, the base identifier scope for a new class definition, and useful lambda expressions for checking the semantics of Elastic-Smalltalk programs. The most important lambda expression decides if a computed class matches an expected class. Its algorithm just checks whether the expected class is an ancestor of the computed class in the inheritance hierarchy.
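The class-matching test described above can be illustrated with a minimal Python sketch. It is not the ELI lambda expression from Appendix E: the `ClassDesc` type, the `matches` function, and the assumption that a class also matches itself are all hypothetical, and the hierarchy (LinkList inheriting Object) is assumed for illustration only.

```python
class ClassDesc:
    """Hypothetical stand-in for a class description; only the parent link matters here."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def matches(computed, expected):
    """Return True if `expected` is `computed` itself or one of its ancestors.

    Treating a class as matching itself is an assumption of this sketch.
    """
    cls = computed
    while cls is not None:
        if cls is expected:
            return True
        cls = cls.parent
    return False


# Assumed hierarchy: Object <- LinkList <- SortableList
OBJECT = ClassDesc("Object")
LINK_LIST = ClassDesc("LinkList", OBJECT)
SORTABLE_LIST = ClassDesc("SortableList", LINK_LIST)

# A SortableList may stand in where a LinkList is expected, but not the reverse.
print(matches(SORTABLE_LIST, LINK_LIST))
print(matches(LINK_LIST, SORTABLE_LIST))
```

Walking the parent chain is linear in the depth of the hierarchy, which is acceptable for single inheritance, where each class has exactly one chain to follow.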

Each production in the attribute grammar for Elastic-Smalltalk defines the semantics for the specified syntactic construct. Productions at a high level of abstraction, such as the one for a class interface, define semantic context for more specific productions, such as the one for message interfaces. The production for a class interface builds new class and metaclass descriptions, using the semantic domain described above. Only the information that defines the semantics of a "type" is incorporated at this point. Thus, instance variable declarations and the associated storage environment are added later in the specification for a class implementation. To do this, we take advantage of the aggregate property of the semantic domain. Some of the information used by the translation events for a class interface originates in other productions -- the inherited class, "imported" classes, and message interfaces for instance creation and manipulation.

26. Class is the metaclass of Object and therefore an instance of Metaclass.

The important aspects of Elastic-Smalltalk concern automatic and controlled determination of binding times for type checking and message lookup. The language constructs eliciting such activities include assignment and message invocation expressions and the RETURN statement. We now describe the productions for these constructs and for method declarations in some detail.

-- A method includes a name, parameter declarations, return class, local variable declarations, and a statement list. -- Semantic information from context includes the identifier scope for the class, the next available message index, -- the dispatch table for objects of this class and the default return class (either the class or its metaclass). -- Provided to the context is the next index for subsequent method declarations.

-- The attribute check holds the list of parameter classes from the interface's signature; others returns the -- (hopefully empty) list of unchecked parameter classes. If not overriding, parms holds the list of parameter classes.

method<> ::= METHOD identifier<< ^ method_name>> "(" opt_parm_list<> ")" opt_return<> stmts<> END identifier<< ^ end_method_name>>

-- Local attributes include the message interface and the method; the next index for subsequent declarations;
-- the offset for the receiver; the local identifier scope and storage environment; whether this declaration
-- overrides an interface message; and the list of parameter classes to check signature compatibility.

LOCALS message, method : Symbol;
       new_index, self_offset : Integer;
       local_scope : SymbolScopeStack;
       stg_env : Store;
       override : Boolean;
       check : SymbolList;
{ MethodIds:
  BEGIN
    -- Trailing identifier and method name should be the same.
    IF method_name <> end_method_name THEN
      NoteError ("Name mismatch", GetPosition (9), Warning);
    ENDIF;
  END

Figure 5-7: Production for Method Declaration, Part 1

Each method declared in a class may override a message from an inherited interface (such as the method for new in the class FinancialHistory; see Figures D-1 and D-2 in Appendix D), or it may implement an added message (such as the message receive), or it may constitute a private message, available only to other methods in the class (such as setInitialBalance). In the first two cases, the method's index, used as its offset in a dispatch table, has already been assigned; otherwise, a new index must be calculated. Similarly, the method's signature must match the corresponding message's signature. For our purposes, it is sufficient if the signatures match exactly; a more lenient definition is possible (see footnote 27). Finally, each method must provide its own local identifier scope and storage environment for parameter and local variable declarations. Figure 5-7 contains the production for a method declaration.

27. Recall the type-matching rule: a computed class matches an expected class if the computed class inherits the expected class. Conceptually, actual values may contain more information than required in a given context. Similarly, a method signature can require less information of its parameters and provide more information in its return value than specified by its interface signature. Each invocation that executes such a method harmlessly provides more specific parameter information. Thus, a method's signature would match the message's signature if each message parameter inherited the corresponding method parameter and the method's return class inherited the message's return class. This matching rule is demonstrated when a message overrides its inherited signature in the message production; see the appendix.
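The lenient matching rule in footnote 27 can be sketched as follows. This is a hedged Python illustration, not the message production from the appendix: `ClassDesc`, `inherits`, `signature_matches`, the tuple encoding of signatures, and the example signatures are all hypothetical, and a class is assumed to inherit itself.

```python
class ClassDesc:
    """Hypothetical class description with a single parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def inherits(sub, sup):
    """True if `sub` is `sup` or a descendant of it (reflexivity is an assumption)."""
    while sub is not None:
        if sub is sup:
            return True
        sub = sub.parent
    return False


def signature_matches(method_sig, message_sig):
    """Apply the footnote's rule to (parameter_classes, return_class) pairs.

    Each message parameter must inherit the corresponding method parameter,
    and the method's return class must inherit the message's return class.
    """
    method_params, method_ret = method_sig
    message_params, message_ret = message_sig
    if len(method_params) != len(message_params):
        return False
    params_ok = all(
        inherits(msg_p, meth_p)
        for meth_p, msg_p in zip(method_params, message_params)
    )
    return params_ok and inherits(method_ret, message_ret)


OBJECT = ClassDesc("Object")
LINK_LIST = ClassDesc("LinkList", OBJECT)
SORTABLE_LIST = ClassDesc("SortableList", LINK_LIST)

# Invented signatures: the interface promises (SortableList) -> LinkList.
interface_sig = ([SORTABLE_LIST], LINK_LIST)
lenient_method = ([LINK_LIST], SORTABLE_LIST)   # asks less, returns more
strict_method = ([SORTABLE_LIST], OBJECT)       # returns less than promised

print(signature_matches(lenient_method, interface_sig))
print(signature_matches(strict_method, interface_sig))
```

The rule is the usual one of parameters varying in the opposite direction from return values; the exact-match rule used in the thesis is the special case in which every class pair is identical.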

  MethodLookup:
  BEGIN
    -- Check if this method overrides an interface message.
    override, message := LookupName (id_scope, method_name);

    IF override THEN
      -- Ensure not a multiple declaration. No need to use the next index.
      -- Check compatibility with interface's parameter list.
      IF HasProp (message, Stmts) THEN
        NoteError ("Multiple declaration", GetPosition (2), Fatal);
      ELSE
        method := message;
        new_index := next_index;
        check := GetProp (message, ArgClasses);
      ENDIF;
    ELSE
      -- Create new method, using next index and declaring in the identifier scope for the class.
      method := NewObject (Symbol, Method);
      SetProp (method, Index, next_index);
      new_index := next_index + 1;
      check := NewList (SymbolList);
      IF DeclareName (id_scope, method_name, method) THEN
        NoteError ("Internal error", GetPosition (1), Fatal);
      ENDIF;
    ENDIF;

    -- Enter method into dispatch table, allocate a local storage environment, enter a new identifier scope.
    AssociateValue (dispatch_table, GetProp (method, Index), method);
    stg_env := NewStore ();
    self_offset := FindOffset (stg_env, 1);
    SetProp (method, Environ, stg_env);
    local_scope := EnterScope (id_scope, Open, NewScope (SymbolScope));
    SetProp (method, Visible, local_scope);
  END

  MethodSemantics:
  BEGIN
    -- Record the implementation of the method.
    SetProp (method, Stmts, Label (7));
  END

  MethodCheck:
  BEGIN
    -- If overriding, check that return types match and that the parameter counts match.
    IF override THEN
      IF returns <> GetProp (message, Returns) THEN
        NoteError ("Override return mismatch", GetPosition (6), Fatal);
      ENDIF;
      IF ListLength (others) > 0 THEN
        NoteError ("Override too few parms", GetPosition (5), Fatal);
      ENDIF;
    ELSE
      -- Otherwise, set method's signature.
      SetProp (method, Returns, returns);
      SetProp (method, ArgClasses, parms);
    ENDIF;
  END
};

Figure 5-7: Production for Method Declaration, Part 2

For the most part, the semantics for statements and expressions resemble those of other languages. Semantic information required from the context includes the identifier scope for variable/parameter lookups, the return type of the current method for type-checking RETURN statements, and the current class (or its metaclass for class methods) so that the type of super may be computed. The semantics of each statement or expression is reflected in its effect on the implicit machine state. Each expression also supplies its computed class to its context.

-- A return expression must match the specified return class, returns. -- The return value is left on top of the stack, and a subroutine return is executed.

-- The return expression may be empty, in which case self is returned by default.

statement<> ::= RETURN return_expression<>
{ ReturnStmt:
  BEGIN
    -- Leave the return value on the top of the stack.
    Execute (1);

    -- If the computed type, expr_class, does not match the expected type, returns,
    -- then generate a run-time check unless it cannot match.
[1] IF Incompatible (returns, expr_class) THEN
[2]   IF Incompatible (expr_class, returns) THEN
        NoteError ("Class mismatch", GetPosition (2), Fatal);
      ELSE
        -- The types may match. Issue a static warning and check if the actual type matches the
        -- expected type. Note: the first field (ItsClassLocation) of every object points to its class.
[3]     NoteError ("Dynamic check", GetPosition (2), Warning);
[4]     IF Incompatible (returns, FetchValue (CopyStack (1, Memory), ItsClassLocation, Memory)) THEN
          NoteError ("Class mismatch", GetPosition (2), Fatal);
        ENDIF;
      ENDIF;
    ENDIF;
    CallReturn (1); -- Subroutine return, value on stack.
  END
};

Figure 5-8: Production for RETURN statement

Type checking occurs in three places: assignments, RETURN statements, and actuals in message invocations. For the latter, it may not be possible to do any static checking. When a dynamic message lookup is required, the expected types of the parameters are not available until program execution. For the first two, however, it is always possible to check statically whether the computed type of an assignment's right-hand-side or a return expression matches the expected type. As noted earlier, a type incompatibility cannot be reported merely because the computed type fails to match the expected type, since the actual type of the expression will always inherit the computed type and may therefore inherit the expected type. The productions for a RETURN statement (Figure 5-8) and an assignment (Figure 5-9) both generate a dynamic type-check and a static warning for such cases.

The intention is that each production for expression in the specification will evaluate its computed type (the only synthesized attribute) statically. Thus, the initial test (line [1] in Figure 5-8) for incompatibility should always occur statically. When there is only a possibility for a match, the impossibility check (i.e. testing whether the expected type does not match the computed type, line [2]) also occurs statically. Finally, when a possibility remains, a static warning (line [3]) and a dynamic test (line [4]) are generated. The last test is dynamic because it depends on values from the implicit machine state that normally do not represent constants.
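The three outcomes of lines [1] through [4] can be summarized in a short Python sketch. It only models the decision logic, not the ELI binding-time machinery: `ClassDesc`, `matches`, `classify_check`, the outcome strings, and the unrelated class `Account` are hypothetical names introduced for illustration.

```python
class ClassDesc:
    """Hypothetical class description with a single parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def matches(computed, expected):
    """True if `expected` is `computed` or one of its ancestors."""
    cls = computed
    while cls is not None:
        if cls is expected:
            return True
        cls = cls.parent
    return False


def classify_check(expected, computed):
    """Mirror the shape of lines [1]-[4] for statically known classes."""
    if matches(computed, expected):
        return "static-ok"        # test [1] finds no incompatibility
    if not matches(expected, computed):
        return "static-error"     # test [2]: no actual class could ever match
    return "dynamic-check"        # [3] warn statically, [4] test at run time


OBJECT = ClassDesc("Object")
LINK_LIST = ClassDesc("LinkList", OBJECT)
SORTABLE_LIST = ClassDesc("SortableList", LINK_LIST)
ACCOUNT = ClassDesc("Account", OBJECT)  # invented, unrelated to the lists

print(classify_check(LINK_LIST, SORTABLE_LIST))   # computed is more specific
print(classify_check(SORTABLE_LIST, LINK_LIST))   # actual value may still match
print(classify_check(SORTABLE_LIST, ACCOUNT))     # can never match
```

In the dynamic case, the run-time test would call `matches` again with the object's actual class in place of the computed class, which corresponds to fetching the class through ItsClassLocation in the figure.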

-- The right-hand-side must match the left-hand-side. Evaluate the right-hand-side and store, leaving a copy
-- on top of the stack for use in a surrounding expression.

expression<> ::= lhs<> ":=" expression<>
{ AssignmentExpr:
  BEGIN
    -- Leave the right-hand-side on the top of the stack.
    Execute (2);

    -- If the computed type does not match the expected type, then
    -- generate a run-time check unless they cannot match.
    IF Incompatible (lhs_class, rhs_class) THEN
      IF Incompatible (rhs_class, lhs_class) THEN
        NoteError ("Class mismatch", GetPosition (3), Fatal);
      ELSE
        -- The types may match. Issue a static warning and check if the actual type
        -- matches the expected type.
        NoteError ("Dynamic check", GetPosition (3), Warning);
        IF Incompatible (lhs_class, FetchValue (CopyStack (1, Memory), ItsClassLocation, Memory)) THEN
          NoteError ("Class mismatch", GetPosition (3), Fatal);
        ENDIF;
      ENDIF;
    ENDIF;

    -- Store a copy of the value into the appropriate location.
    StoreValue (Execute (1, Memory), 0, CopyStack (1, Memory));
  END
};

Figure 5-9: Assignment Expression Production

Before continuing with the specification of type checking for actual parameters, we can motivate the discussion by describing the semantics for message invocations. Figure 5-10 contains the specification for a message invocation. If the computed class of a receiver defines the invoked message, actual arguments can be checked in the same manner as assignments and RETURN statements. When a programmer uses an abstract superclass, however, it is normally not apparent that the message invocation is legal statically. That is, although the computed class of the receiver does not include the message, the actual class of each object instance may. The actual class of the receiver must therefore be interrogated at run-time to discover the message's signature.

Since the signature of the invoked message is not available statically, no information exists as to the number and classes of its formal parameters or the class of its return value. Not knowing the latter complicates the task of computing a type for the message invocation expression. Without investigating all classes that might define the message, the best guess must be Object, the class representing the "greatest common denominator". (See line [2] in Figure 5-10.) This allows type checking that does not depend on the actual class of the expression to continue statically.
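The fallback from static to dynamic lookup can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the production of Figure 5-10: `ClassDesc`, `find_message`, `lookup`, the message tables, and the return classes recorded in them are hypothetical.

```python
class ClassDesc:
    """Hypothetical class description: parent link plus message name -> return class name."""

    def __init__(self, name, parent=None, messages=None):
        self.name = name
        self.parent = parent
        self.messages = messages or {}


def find_message(cls, msg_name):
    """Search `cls` and its ancestors; return the message's return class name or None."""
    while cls is not None:
        if msg_name in cls.messages:
            return cls.messages[msg_name]
        cls = cls.parent
    return None


def lookup(computed, actual, msg_name):
    """Try the computed class first; otherwise fall back to the actual class.

    On the dynamic path the expression's class is taken to be Object,
    because the real return class is unknown until run time.
    """
    returns = find_message(computed, msg_name)
    if returns is not None:
        return "static", returns
    if find_message(actual, msg_name) is None:
        raise LookupError("Unknown message: " + msg_name)
    return "dynamic", "Object"


# Invented message tables for illustration.
OBJECT = ClassDesc("Object", None, {"isKindOf": "Boolean"})
LINK_LIST = ClassDesc("LinkList", OBJECT, {"cdr": "LinkList"})
SORTABLE_LIST = ClassDesc("SortableList", LINK_LIST, {"sortList": "SortableList"})

# Receiver declared as LinkList but actually holding a SortableList.
print(lookup(LINK_LIST, SORTABLE_LIST, "cdr"))
print(lookup(LINK_LIST, SORTABLE_LIST, "sortList"))
```

In the sketch both lookups run in the same process; in an elastic translator the first would be resolved during compilation and only the second would be deferred to run time.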

-- Each invocation must check its actual arguments against the message's signature.
simple_expression<> ::= receiver<> "." identifier<< ^ msg_name>> "(" opt_actual_list<> ")"

LOCALS msg_class, actual_class : Memory;
       check : SymbolList;
       message, method : Symbol;
       is_static, is_dynamic : Boolean;
       index : Integer;
{ InvocationLookup:
  BEGIN
    -- Make sure the receiver instance is not nil and find its class.
    Execute (1);
[1] IF CopyStack (1, Integer) = 0 THEN
      NoteError ("Nil value", GetPosition (1), Fatal);
    ENDIF;
    actual_class := FetchValue (CopyStack (1, Memory), ItsClassLocation, Memory);

    -- Try to find message in the receiver's computed class.
    is_static, message := FindName (FetchValue (receiver_class, ScopeLocation, SymbolScopeStack), msg_name);
    IF is_static THEN
      check := GetProp (message, ArgClasses);
      index := GetProp (message, Index);
      msg_class := GetProp (message, Returns);
    ELSE
      -- Give warning of failure of static lookup, try lookup in receiver's actual class.
      NoteError ("Dynamic lookup", GetPosition (3), Warning);
[2]   msg_class := ObjectClassValue;
      is_dynamic, method := FindName (FetchValue (actual_class, ScopeLocation, SymbolScopeStack), msg_name);
      IF is_dynamic THEN
[3]     check := GetProp (method, ArgClasses);
        index := GetProp (method, Index);
      ELSE
        NoteError ("Unknown message", GetPosition (3), Fatal);
      ENDIF;
    ENDIF;
  END

  InvocationExpr:
  BEGIN
    -- Push actual arguments onto the stack, ensure the count is correct, and call method.
    Execute (3);
    IF ListLength (others) > 0 THEN
      NoteError ("Too few actuals", GetPosition (6), Fatal);
    ENDIF;
    Evaluate (GetMappedKey (GetProp (FetchValue (actual_class, MapLocation, SymbolMap), Stmts), index));
    CallStackValue ();
  END
};

Figure 5-10: Production for Message Invocation

The same technique (that is, assuming Object) cannot be used to eliminate the static check for actual parameters. Once the event of evaluating the expected type for a parameter has been executed, it cannot be reinstated. Thus, a stack machine program fragment will be created that passes the actual signature of the invoked method to the semantic action that checks actual parameters. (Refer to line [3].)

Figure 5-11 contains the semantic action for type checking actual parameters. This event looks much like the one for assignment (Figure 5-9). The primary difference is that the expected class of the argument may not be known statically, since the message lookup itself may not be static. In such cases, there is no point in reporting the dynamic check warning, nor is the computed class of the actual argument of any consequence. Thus, we eliminate these semantic actions by examining the binding time nature (via IsStatic) of the expected class.

actual<> ::= expression<>
LOCALS others : SymbolList;
       expected : Symbol;
{ Actual:
  BEGIN
    Execute (1);
    IF ListLength (check) = 0 THEN
      NoteError ("Too many actuals", GetPosition (1), Fatal);
    ELSE
      others := ListTail (check);
      expected := GetProp (ListHead (check), Value);
      IF IsStatic (expected) AND Incompatible (expected, expr_class) THEN
        IF Incompatible (expr_class, expected) THEN
          NoteError ("Actual mismatch", GetPosition (1), Fatal);
        ELSE
          NoteError ("Dynamic check", GetPosition (1), Warning);
          IF Incompatible (expected, FetchValue (CopyStack (1, Memory), ItsClassLocation, Memory)) THEN
            NoteError ("Actual mismatch", GetPosition (1), Fatal);
          ENDIF;
        ENDIF;
      ELSEIF Incompatible (expected, FetchValue (CopyStack (1, Memory), ItsClassLocation, Memory)) THEN
        NoteError ("Actual mismatch", GetPosition (1), Fatal);
      ENDIF;
    ENDIF;
  END
};

Figure 5-11: Production for Actual Argument

One special semantic check deserves a digression. In Smalltalk, the nil object is represented by a special class. Essentially, nil accepts no messages. Its class, UndefinedObject, overrides two messages defined by Object -- isNil and notNil. Both may be replaced by the appropriate test for equivalence (i.e. "==" or "~~"); neither is therefore defined in Elastic-Smalltalk.

We would like to maintain Smalltalk's semantics, however, and allow nil as an actual argument or return value, or in an assignment. To incorporate nil cleanly into Elastic-Smalltalk, two factors must be considered. First, the test for type checking must allow nil to match any given class; otherwise these capabilities would be precluded. Second, since nil accepts no messages, a test must be inserted into the semantics for message invocations (see line [1] in Figure 5-10) that raises an exception if a receiver instance is nil. Note that, in general, one cannot guarantee statically that every instance of a message receiver will not equal nil.

5.4. Review of Specification

The ELI generation system provides several features that manipulate and control the binding times of translation actions. Recall that the system automatically determines an event's binding time phase (e.g. compile-time, link-time, run-time), pass within the phase, and relative order within a pass. The system also allows explicit control over binding time. Elastic-Smalltalk, however, makes limited use of ELI's features for explicit control.

We designed Elastic-Smalltalk so that almost all translation of a class occurs during either compilation or run-time. The specification language processor generates target machine code for translation whenever semantic events depend on run-time values. Thus, the translation of a class without statically detectable syntactic or semantic errors will execute all but one category of events during compilation.

That category of translation must occur during link-time. The only information one class needs from another consists of the code addresses of inherited methods (i.e. the implementations of inherited messages that are not overridden). Although we do not explore the potential of link-time translation, some elasticity still occurs. The semantic action that establishes the code address for a method table entry is executed at compile-time if the method exists in the implementation of the class.

The specification of Elastic-Smalltalk uses explicit control over event execution only to detect certain errors statically, such as undeclared identifiers (see references to LASTPASS in Appendix E). We could have used explicit control to increase the utility of Elastic-Smalltalk as an interpreted, prototyping language. Elastic-Smalltalk's ELI specification could include flags that would, when set, force run-time message lookup and method determination. Totally dynamic message lookup allows a programmer to make changes to a class interface without requiring re-compilation of classes that inherit or import the modified class. Totally dynamic method determination enables dynamic linking of modified class implementations without having to re-link inheriting classes.

These flags clearly trade efficiency for flexibility and simplicity. Dynamic linking of modified class interfaces and/or implementations decreases a programmer's effort during the prototyping of an application at the expense of decreased execution speed. Upon completing the prototyping stage, the programmer could reset the flags so that most message lookup would occur during compile-time and method determination could conclude by link-time.

We discuss further advantages of Elastic-Smalltalk below in the context of command languages.

5.5. Review of Motivations

The motivations that led initially to our investigation into event-driven translation arose from the needs for elasticity in programming environments (see Section 1.1.4) and command languages (see Section 1.1.5). In this section, we review those motivations and examine how Elastic-Smalltalk satisfies the needs.

A programming environment can provide support for languages requiring late binding times more easily if it allows late binding internally. Furthermore, as a program that manipulates other programs, the implementation of a programming environment could also benefit from the availability of dynamic nominal types. These considerations generally lead one to choose an interpreted language.

Implementing a programming environment, however, is just like implementing any other application. Software engineering principles advocate performing as many semantic checks as possible statically (i.e. type checking). In addition to detecting some errors early, eliminating dynamic translation increases execution performance. We contend that elastic translation provides flexibility when needed, incurring additional cost only when those features are used. Otherwise, elasticity yields the benefits of static binding times.

As programming languages, command languages should also furnish facilities to support software engineering principles and allow efficient implementations of applications. When used to control interactive applications, however, command languages must provide the kinds of flexibility normally associated with interpretation (e.g. dynamic binding, quick prototyping). Elastic translation is indicated again.

How well does Elastic-Smalltalk satisfy these prerequisites? Recall that Elastic-Smalltalk shares most of the advantages of Smalltalk. In particular, it allows abstraction and extensibility (i.e. classes and single inheritance), a limited form of multiple inheritance (i.e. dynamic message lookup), strong type checking, polymorphism, and the potential for dynamic linking and quick prototyping.

Elastic-Smalltalk improves on Smalltalk by permitting elastic translation for message lookup, method determination, and class linking. Thus, Elastic-Smalltalk programs can be more efficient during execution without losing Smalltalk's advantages. In addition, Elastic-Smalltalk permits several benefits relating to software engineering, including static type checking, internal program documentation (i.e. type declarations), and encapsulation.

As a command language, Elastic-Smalltalk could be improved. The language is weak in the control structures it provides (this could be remedied easily). The set of primitive types is also insufficient; a good character string abstraction should be incorporated. Finally, little thought has been invested in designing the user's execution environment. These issues should be addressed if Elastic-Smalltalk is ever to be used as a command language.

As a programming language, Elastic-Smalltalk could also be improved. Many features that have been explored in other languages have been omitted, such as exceptions or inter-process communication. One feature, in particular, would seem to belong -- true multiple inheritance. All of the analysis for elastic translation of single inheritance applies to multiple inheritance except for one key characteristic. In single inheritance, a method lookup table can be built for each class so that a single dereference will determine the method for each invoked message. Unfortunately, there does not seem to be a simple indexing scheme for messages in multiple inheritance systems. Thus, efficient (i.e. static) method determination may not be possible.
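The single-inheritance indexing scheme mentioned above can be illustrated with a small Python sketch. It is an assumption-laden model rather than the Elastic-Smalltalk run-time representation: `extend_table`, `send`, and the message `describe` are hypothetical, and methods are modeled as parameterless functions.

```python
def extend_table(parent_index, parent_table, methods):
    """Build a subclass's dispatch table from its parent's table.

    Overriding methods reuse the inherited slot; added methods get the next
    free slot. Slots therefore mean the same thing in every descendant.
    """
    index = dict(parent_index)
    table = list(parent_table)
    for name, impl in methods.items():
        if name in index:
            table[index[name]] = impl      # override: keep the inherited index
        else:
            index[name] = len(table)       # new message: allocate next index
            table.append(impl)
    return index, table


def send(table, slot):
    """Method determination is a single table dereference."""
    return table[slot]()


link_index, link_table = extend_table({}, [], {
    "car": lambda: "LinkList.car",
    "describe": lambda: "LinkList.describe",
})
sort_index, sort_table = extend_table(link_index, link_table, {
    "describe": lambda: "SortableList.describe",
    "sortList": lambda: "SortableList.sortList",
})

# The slot is fixed once, from the receiver's computed class (LinkList);
# the table comes from the receiver's actual class.
slot = link_index["describe"]
print(send(link_table, slot))
print(send(sort_table, slot))
```

With multiple inheritance, two parents may assign the same slot number to different messages, so a subclass cannot simply copy and extend one table; this is the indexing difficulty the paragraph refers to.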

In any case, the process of designing and implementing Elastic-Smalltalk in the ELI generation system demonstrates the potential of event-driven translation as a language design tool. In the next chapter, we review the model's effectiveness for solving translation problems and evaluate its potential as a practical methodology for implementing translation systems.

Chapter 6
Evaluation and Practical Applications

The concept of viewing translation as event-driven has existed since the development of production systems. As we noted in Section 1.3, limited forms of event-driven translation have been recognized as solutions for certain classes of translation problems (e.g. dealing with forward references [Banatre 79] and determining execution order for run-time semantics [Kaiser 86]).

Our thesis is that event-driven translation constitutes an appropriate model for all translation: compilation, link-time, and run-time semantics. The event-driven model not only determines correct binding time ordering and phase for each translation event, but also enables elasticity. Elastic execution of similar semantic actions can ease restrictive influences on language design by resolving apparent conflicts.

The primary conflict arises from the need for efficient program execution and static semantic checking on one side and the desire for language flexibility on the other. Often, flexibility can require dynamic binding, as when allowing limited multiple inheritance in Smalltalk through dynamic message lookup. Elasticity eliminates this conflict by performing dynamic translation only when necessary, thus achieving the benefits of both sides.

A second assertion of this thesis is that event-driven translation has potential in effective, practical language translation tools. The effectiveness of any model for translation depends on how well the model supports the implementation of translation systems. Implementations must deal with lexical analysis, syntactic analysis, semantic analysis, code generation, and run-time support. This thesis concentrates primarily on solving semantic analysis problems. The ultimate test of a software development, however, depends on its practicality and usability. In this chapter, we focus on substantiating the practical nature of event-driven translation.

The first section below reviews the effectiveness of event-driven translation. Section 6.2 then presents the criteria by which the model can be evaluated as a practical tool and describes how the ELI system satisfies some of those criteria. One way of demonstrating practicality is to implement an existing, well-known language using the ELI generation system. We therefore describe an implementation of Modula-2 in the following section. Finally, in the last section, we examine other application areas that could profit from event-driven translation.


6.1. Effectiveness of Model

In Chapter 2, we demonstrated the effectiveness of event-driven translation by solving several problems concerning semantic analysis. In this section, we quickly review the problems and their solutions in order to verify the effectiveness of the event-driven translation model.

One problem encountered in translation involves determining when to perform individual translation actions. Some actions must be executed before others, such as computing a variable's size before allocating its storage. Others cannot be executed when instantiated because a required value is not available, as occurs with forward references. One must also decide in which translation phase to execute each action. As we established above, event-driven translation executes each action as soon as its required data values become defined. Thus, relative ordering, delayed execution, and phase determination occur naturally in event-driven translation systems.
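The rule that an action executes as soon as its required values become defined can be illustrated with a minimal sketch. The Python below is not the ELI translation engine; the class `EventEngine`, its methods, and the value names (`type_of_v`, `size_of_v`, `storage_of_v`) are all assumptions made for illustration. It shows only how relative ordering and delayed execution fall out of the rule without any explicit scheduling.

```python
from typing import Callable, Dict, List, Tuple


class EventEngine:
    """Toy engine: an action fires once all of its input values are defined."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        # Each pending event is (input names, output name, action).
        self.pending: List[Tuple[Tuple[str, ...], str, Callable[..., object]]] = []
        self.trace: List[str] = []

    def instantiate(self, inputs, output, action) -> None:
        """Register an action; it runs immediately if it is already enabled."""
        self.pending.append((tuple(inputs), output, action))
        self._run_enabled()

    def define(self, name: str, value: object) -> None:
        """Supply a value from outside, e.g. a declaration seen later in the text."""
        self.values[name] = value
        self._run_enabled()

    def _run_enabled(self) -> None:
        progress = True
        while progress:
            progress = False
            for event in list(self.pending):
                inputs, output, action = event
                if all(name in self.values for name in inputs):
                    self.pending.remove(event)
                    self.values[output] = action(*(self.values[n] for n in inputs))
                    self.trace.append(output)
                    progress = True


engine = EventEngine()
# Storage allocation is instantiated first but must wait for the size.
engine.instantiate(["size_of_v"], "storage_of_v", lambda size: f"reserve {size} bytes")
# The size in turn waits for the type, which is a forward reference here.
engine.instantiate(["type_of_v"], "size_of_v", lambda t: {"INTEGER": 4}[t])
# Nothing has run yet; defining the type enables both actions in dependency order.
engine.define("type_of_v", "INTEGER")
```

Although the allocation action was instantiated before the size computation, the trace records the size computation first, because the order is driven entirely by the availability of data.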

A second problem concerns the implementation of languages that require run-time evaluation of actions normally associated with static translation, such as type checking or finding subroutine addresses. The event-driven model makes no distinction between translation and execution semantics and therefore treats all translation phases uniformly, including run-time. Thus, "interpretation" also happens naturally.

Finally, there is occasional need to execute similar translation actions in different phases (i.e. elasticity). In Section 2.4.3, we presented two examples of language features that could benefit greatly from elasticity. Normally, languages are designed so that a translator may generate an iterator loop test during compilation. Allowing dynamically computed step values forces run-time test determination, but also provides greater flexibility. Similarly, the design of Smalltalk requires dynamic message lookup. Modifying the design slightly to allow type declarations enables programmers to express their intent and allow more efficient translations.

Since most language implementations handle each translation phase separately, each phase may have distinct implementations of similar actions. Event-driven translation, however, attempts each instantiated action individually. As a result, elasticity occurs naturally. Furthermore, only one version of the implementation is ever needed.
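The iterator example can be sketched to show how a single implementation of an action serves both phases. This is an illustration in Python rather than in the ELI specification language; the function names and the rule that the sign of the step selects the comparison are assumptions made for the sketch.

```python
def loop_test(step):
    """Single implementation of the action that selects the loop test."""
    return "<=" if step > 0 else ">="


def translate_loop(step, compile_time_env):
    """Attempt the action now; defer it to run time if the step is not yet defined."""
    if isinstance(step, int):
        return ("static", loop_test(step))
    if step in compile_time_env:
        return ("static", loop_test(compile_time_env[step]))
    # Residual action: the same function, applied once the value exists.
    return ("dynamic", lambda run_time_env: loop_test(run_time_env[step]))


static_case = translate_loop(2, {})            # literal step, decided at compile time
const_case = translate_loop("k", {"k": -1})    # named constant, also decided statically
dynamic_case = translate_loop("n", {})         # step known only at run time
```

Only the third loop pays for a run-time decision, and no second version of `loop_test` had to be written for it.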

All of this should be evident from earlier discussions. As we hoped to demonstrate in building ELI, the event-driven model also forms a suitable basis for generating translation systems automatically. In the next section, we examine criteria for proving the practicality of implementing translation systems using the event-driven model and explore ways to achieve that potential.

6.2. Criteria of Practicality

A significant portion of this thesis has concentrated on describing the ELI generation system. We believe that the event-driven model ultimately can comprise a viable methodology for the translation of actual programming languages. Designing and building ELI constitutes just one step toward establishing the model's practicality. In this section, we present criteria by which practicality can be further judged.

6.2.1. Ease of specification

The first criterion is whether the techniques are easy to use in implementing a programming language. The existence of the ELI generator of translation systems goes a long way toward satisfying this criterion. The onus of proof, however, then falls onto the specification language. That is, unless it is easy for language designers to use the specification language, they will not use the generator.

In the ELI system, it is a relatively simple task to specify a programming language's semantics. Since we use a variant of attribute grammars, a known specification formalism, a language specifier need not acquire a deep understanding of the event-driven model. The variations affecting event binding times need only be used when the language specifier needs the power of event-driven translation.

As we noted in Section 3.2, attribute grammars possess several desirable characteristics. For instance, attribute grammars exhibit locality of reference (i.e. references to values are limited in scope). Also, attribute grammars are non-procedural, making no demands on the execution order of translation actions -- just what is needed by the event-driven model. These characteristics and the addition of auxiliary models facilitate the specification of interesting language features.

In general, the language specifier should not need the special features of ELI's specification language for controlling binding times. The event-driven processor normally determines appropriate binding times. When necessary, however, a language implementor may use special features of the specification language to control the binding times of translation actions explicitly.

By using computational events, the specifier can adjust the granularity of translation actions and thereby affect the level of interdependency among events. Greater interdependency tends to delay event execution since more defined data is required. To achieve greater control, the specifier can attach explicit constraints to events indicating that one action should follow another or that an action should be executed in a specific pass or phase. (A constraint may in fact express any precondition, even involving run-time values, that must be satisfied before the semantic action will be executed.)

6.2.2. Ease of generation

The next criterion concerns the ease of building translators using the ELI generation system. Given a language specification, the ELI generator produces the data needed to control the translation engine and a parser specification. It is the responsibility of the language designer to build a lexor and a code generator that translates target stack machine opcodes into instructions executable on a real computer. Both of these tasks can also be automated (e.g. lex [Lesk 75] for lexor generation and Graham-Glanville [Glanville 78] for automating code generation).

To be effective as a language design tool, the generator must not require excessive time to produce translators. After all, we want language designers to experiment with elasticity and explore the ramifications of event binding times. For the languages we have specified, generation time is not significantly longer than compilation of similarly sized modules -- under one minute elapsed time and 15-20 seconds CPU time on a SUN 3/160.

Once generation is complete, programs that perform translation must be built. We have written program skeletons for each translation phase (compilation, linking, and run-time). A language designer need only specify which phase, the input and output file name conventions, and the data file containing the generated language semantics. Thus, translator generation can be totally automatic.

6.2.3. Code generation

A third criterion is that generated translators produce usable code. In a language like Modula-2, usable code for a compilation unit consists of object code for each procedure and linking information for global references, such as exported variables and procedures.

The ELI translation engine constructs target machine fragments for program semantics that depend on run-time values. As examples show in Section 6.3 below, generated translators can emit the fragments that result from the translation of a compilation unit. A small change allows the specification language interpreter to emit linking information as well. Therefore, to produce usable code, a relatively simple program could be built that transliterates stack machine opcodes and generated linking structure into machine-specific object files. We further demonstrate the code generation properties of constructed translation systems in the next section by implementing Modula-2 using the ELI system.

The event-driven translation model does not dictate the target machine. That is, the target machine need not be a stack machine. As a result, the run-time fragments composed during translation could just as easily be continuation values expressing the denotational semantics of the source program [Tennent 76].28 This implies, in particular, that our research can integrate with recent research that investigates the problem of generating code from semantics-directed language translation [Sethi 83, Appel 85].

28Note that the specification language may have to change to incorporate the specific characteristics of the target machine.

6.2.4. Practicality of generated translators

A fourth criterion judges the effectiveness of generated translation systems. This criterion can be measured along two dimensions: the resources (i.e. time and space) used during translation and the resources used by translations. In neither case should resource use become excessive relative to existing translator technology.

The time spent during translation equals the sum of the time taken executing semantic actions, the general overhead associated with using a common translation engine, and the overhead needed to decide which action to execute when. The first cost is necessary in any translation system. For the most part, our system is comparable to others. It incurs some added expense because semantic actions are interpreted and not compiled into "native" code. We avoid taking too much of a hit since we augment the target machine with auxiliary models that can be implemented efficiently (refer back to the introduction of Chapter 3).

The second cost, that of using a common translation engine, is negligible as it consists solely of a tree traversal algorithm. A positive evaluation for this criterion, therefore, depends on how well the ELI generator can predetermine the execution order of instantiated semantic actions, thus reducing the need to search for enabled events during translation.

The ELI generator determines an ordering for event execution using data interdependencies and explicit constraints. Since we base our specification formalism on attribute grammars, we can take some advantage of algorithms that calculate both local and global dependency graphs [Katayama 84]. Dependencies on partial aggregate values and the use of explicit binding time constraints, however, can make static event ordering impossible. Thus, the ELI generator can only construct a set of potential orderings.29 The amount of searching required during translation is directly related to the number of potential orderings.
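When every dependency is known statically, deriving an execution order amounts to a topological sort of the dependency graph. The sketch below uses Python's standard `graphlib` module for this; it is a generic illustration and not the algorithm of [Katayama 84] or the one used by the ELI generator, and the event names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical events, each mapped to the events whose results it needs.
dependencies = {
    "allocate_storage": {"compute_size"},
    "compute_size": {"resolve_type"},
    "resolve_type": set(),
    "emit_load": {"allocate_storage"},
}

# Predecessors are listed before the events that depend on them.
static_order = list(TopologicalSorter(dependencies).static_order())
```

Because these four events form a single chain, exactly one ordering exists. Where dependencies cannot be resolved until translation time, no such single order can be fixed in advance, which is why only a set of potential orderings can be constructed.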

In practice, the translation engine spends little time searching for enabled events since the actual number of potential orderings tends to be small. In our experiments, the execution time of ELI-generated translators was not much worse (about 30-50%) than existing compilers on similarly sized compilation units. Upon investigation, we discovered that most of this additional cost results from the method we use to store intermediate forms. Although this deficiency does not affect the scientific value of the thesis, it clearly limits the model's potential for practicality. A more efficient means of storing intermediate forms should be developed.

The other resource consumed during translation is space. Generated translators require storage only for intermediate forms -- the language semantics file contains the translation actions and event ordering information. Of primary concern are the maximum storage in use at any time and the total storage needed at the end of translation. The two may not be the same since syntactic representations often require more space than semantic data.

29We represent a set of potential orderings as a set of total orderings on disjoint subsets of instantiated events. A complete ordering is determined when the interdependencies among these subsets become known.

In ELI-generated translators, as in other attribute grammar-based translation systems, maximum storage needs can become excessive. The problem arises from constructing the entire abstract syntax tree for source modules. One solution is to execute enabled events as soon as the tree nodes containing them are recognized by the parser. When exactly one pass is sufficient, only the storage used by a single path in the syntax tree would be necessary. In practice, since programming languages tend to be linear in nature (i.e. humans must be able to read programs), many tree nodes would be pruned by the end of the first "pass", even though the translation of most languages requires more than one pass. Space requirements during translation should then become reasonable, especially since the ELI generation system encourages the use of aggregate semantic values, which share storage across all levels of a derivation tree.

Total storage demands reflect the semantic content of a compilation unit, including symbol table data and target machine code fragments. The semantic data produced by ELI-generated translators is typically within 20-30% of that produced by normal compilers that supply debugging information. A practical system would permit the separation of data relating only to execution semantics. For "compiled" languages, this is easy -- just keep the target machine fragments. For languages requiring run-time translation, however, other semantic data may be needed at run-time. To identify such data, the translation system must compute the data dependencies between target machine program fragments and other semantic data. The translators generated by the current ELI system do not perform this analysis.

The final evaluation measures the performance of translations produced by generated translators. Certainly, storage requirements of translated programs should not be affected by the means of translation. The language specifier/implementor decides whether a stack-heap storage model (as in C or Pascal) or a totally heap-oriented model (as in Lisp) should be used.

One might expect that the execution efficiency of translations follows similar reasoning. The availability of elastic translation, however, raises the possibility that programs translated using an event-driven system would be more efficient. In particular, translation actions executed always at run-time in normal systems, such as message lookup in Smalltalk, can sometimes be executed statically, as in the ELI implementation of Elastic-Smalltalk.

In addition, event-driven translation does not interfere with classical code optimization. In fact, event-driven processing may actually help since some optimization techniques must wait for certain conditions before making decisions (e.g. register allocation performed after computing usage counts [Wall 86]). The potential benefits of using an event-driven model for optimization should be investigated further; we refer to this issue again in the next chapter when we discuss future research directions.

6.3. Implementing Modula-2

We illustrate the potential practicality of event-driven translation by implementing an existing programming language -- Modula-2.30 We show that the ELI specification language is easy to use, even for a language like Modula-2 that constrains translation binding times. More importantly, we demonstrate the code generation possibilities of ELI-generated translators, including the generation of linking information.

The specification of Modula-2 in the ELI specification language appears in Appendix F. We quote excerpts below to describe the code generation properties of the generated translation system. The specification itself is approximately 2500 lines long without comments. The domain definitions (for symbol table entries) and terminal/nonterminal declarations comprise about seven percent of the specification. The construction of the standard prelude (i.e. primitive types and operations) consumes another ten percent. The remainder consists of the attribute grammar, which contains almost 120 productions.31 The ELI system takes 45 seconds elapsed time, 20 seconds CPU time, to perform event ordering analysis and to generate the language semantics file.32

Note that the specification for Modula-2 is about a third longer than the one for Elastic-Smalltalk. We believe that this difference reflects the relative complexity of the two languages. Elastic-Smalltalk presents a smaller number of concepts to the programmer even though the capabilities of the two languages are similar (e.g. static type checking).

MODULE program;

MODULE inner;

IMPORT x;

END inner;

VAR x: INTEGER;

END program.

Figure 6-1: A Legal Forward Reference

Most translation of Modula-2 programs occurs, by design, during the compilation phase. The bulk of event binding time determination, therefore, involves computing a correct ordering for execution. The scope rules for identifiers in Modula-2 make this determination somewhat interesting. The rules state that the scope of a declared identifier encompasses the entire block in which the declaration occurs. Thus, a reference to an identifier may precede its declaration legally, as in Figure 6-1.

30We will not take any space here reviewing Modula-2's semantics. For those readers unfamiliar with Modula-2, the language is a close cousin of Pascal. We refer readers interested in learning more to Programming in Modula-2 by Niklaus Wirth [Wirth 82].

31The extended Backus-Naur grammar in Wirth's Modula-2 book uses about 65 productions [Wirth 82]. This is not a truly fair comparison since the extended BNF allows regular expressions, but our formalism does not.

32Times are on a single-user SUN 3/160.

Such references must wait until the end of the enclosing block has been reached before noting an error. Figure 6-2 shows the specification of an event for identifier lookup that waits for a declaration until the end of compilation before producing an error message. If and when a declaration occurs, the associated symbol table entry is passed up to the language construct containing the reference.

-- Wait for identifier to be visible in current scope.
-- If compilation phase terminates without declaration for identifier, then note error.
simple-reference<> ::= IDENTIFIER<< ^ name>>

    LOCALS
        found: Boolean;
        reference: Symbol;

    SimpleReference : ONEOF
        -- Precondition to execution of event is that identifier is declared in the given scope.
        PRE found
            WHERE found, reference := LookupName (id-scope, name) ;
        BEGIN
        END

        -- Only note an error if end of phase arrives.
        PRE PASS() = LASTPASS
        BEGIN
            NoteError ("Undeclared identifier", GetPosition (i), TRUE) ;
        END
    ENDONEOF );

Figure 6-2: Simple Identifier Reference
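The behavior of the two guarded alternatives in Figure 6-2 can be paraphrased in a few lines of Python. This is only a sketch of the control structure, under the assumption that a pending event is simply retried whenever new information arrives; the function name, its parameters, and the dictionary used as a scope are hypothetical and are not ELI constructs.

```python
def simple_reference(name, scope, is_last_pass, errors):
    """Try both alternatives of the event; return (fired, symbol)."""
    # Alternative 1: the precondition is that the name is declared in scope.
    if name in scope:
        return True, scope[name]
    # Alternative 2: the precondition is that the last pass has been reached.
    if is_last_pass:
        errors.append(f"Undeclared identifier: {name}")
        return True, None
    # Neither precondition holds yet, so the event stays pending.
    return False, None


errors = []
scope = {}
first_try = simple_reference("x", scope, False, errors)   # reference precedes declaration
scope["x"] = "INTEGER variable"                            # VAR x: INTEGER is seen later
second_try = simple_reference("x", scope, False, errors)  # now the first alternative fires
missing = simple_reference("y", scope, True, errors)      # end of phase, never declared
```

The forward reference to `x` produces no error because the event merely stays pending until the declaration arrives, whereas `y` is reported only once the end of the phase makes the second precondition true.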

The scope rule differs, however, for an identifier used in a declaration. In this case, an identifier's declaration must precede any declarations that reference it. Pointer type declarations constitute the only exception to this rule; that is, forward references are allowed in pointer type declarations. To elucidate, the array type declaration in Figure 6-3 is not legal while the pointer type declaration is. Thus, the semantic action for a simple identifier reference suffices for the latter case, subject to the discussion in Section 2.4.1. The specification of non-forward declaration identifier references appears in Figure 6-4.

MODULE program;

(* NOT LEGAL ! ! *)
TYPE arrayType = ARRAY [1 .. 10] OF forwardType;

(* LEGAL ! ! *)
TYPE pointerType = POINTER TO forwardType;

TYPE forwardType = (red, yellow, green);

END program.

Figure 6-3: Forward References to Type Identifiers

Otherwise, the specification of Modula-2 is fairly straightforward. We therefore turn now to describing the code generation properties of the resulting Modula-2 implementation.

For each compilation unit, a normal Modula-2 compiler produces object code for each procedure, object code for module initialization, and linking information. The linking information describes the semantic data required from imported modules not available from the module interfaces: offsets of global variables and addresses of exported procedures. Our implementation must produce similar information.

-- If identifier has no declaration, note error immediately.
declaration-reference<> ::= IDENTIFIER<< ^ name>>

    LOCALS
        found: Boolean;
        reference: Symbol;

    DeclarationReference : BEGIN
        found, reference := LookupName (id-scope, name) ;
        IF NOT found THEN
            NoteError ("Undeclared identifier", GetPosition (i), TRUE) ;
        ENDIF;
    END };

Figure 6-4: Declaration Identifier Reference

As mentioned earlier at the end of Chapter 4, we can identify the target machine program fragments that remain at the end of a translation phase. These program fragments correspond to the object code for procedures and module initialization. The remaining part, then, involves generating linking information.

In an event-driven system, unexecuted events that remain after the initial "compilation" phase depend on such "linking information". For example, code generation involving an imported global reference must wait until the global reference becomes defined, typically at link-time. To generate true linking information, then, we must capture the undefined values from other compilation units and convince the translation engine that these values are actually "defined", allowing compilation to proceed. Finally, at the end of compilation, we must tabulate all references to undefined imported values and the references in the object code to those values. We must also tabulate the definitions of values that might be required by other modules.

We have made a small number of modifications to ELI's translation engine to enable the behavior described above. In this "normal compilation mode," the translation engine generates a unique link name for any aggregation's field not defined by the end of the compilation phase. A reference to such a field during the compilation of another module is logged in a table; a specific target machine code fragment that depends on that reference can then be generated. Multiple uses are associated with the same entry in the table.

When, during the compilation of a module's implementation, a link value is defined, the translation engine logs an entry in another table that expresses the actual values for exported global references. It is these values that replace the link references in generated code fragments of importing modules during the linking phase.

Figure 6-5 illustrates the results. Module Defines exports two global variables and one procedure. Module Uses imports the global references from Defines and initializes one of its variables. The stack machine code produced by the generated translation system shows the two link tables and the sequence of opcodes for the implementation of Uses's module initialization.

DEFINITION MODULE Defines;

EXPORT x, y, z;

VAR x, y: INTEGER;

PROCEDURE z () ;

END Defines;

Defining Module

DEFINITION MODULE Uses;

END Uses;

Using Module's Interface

IMPLEMENTATION MODULE Uses;

FROM Defines IMPORT x, y, z;

BEGIN
    x := y;
    z();
END Uses;

Using Module's Implementation

Requires:
    Defines 0   L430            -- The use location
    Defines 1   L450 L470
    Defines 2   L510

Defines:
    Uses 0      L430            -- Label for Uses initialization
    Uses 1      %GLOBAL+0       -- Base for Uses globals (and needs)

L430    LINKREF Defines 0       -- Label for Defines initialization
L440    CALLSTK 0               -- Call Defines initialization
L450    LINKREF Defines 1       -- Base address of Defines globals
L460    IADDI 0                 -- Compute address of x
L470    LINKREF Defines 1       -- Base address of Defines globals
L480    IADDI 1                 -- Compute address of y
L490    DEREF                   -- Contents of y
L500    POPSTK                  -- Store contents at address of x
L510    LINKREF Defines 2       -- Label for z
L520    CALLSTK 0               -- Call z (no parameters)
L530    RETURN                  -- Exit Uses initialization

Generated Code

Figure 6-5: Code Generation Example
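To make the role of the two link tables concrete, the following Python sketch replays the linking step on the data of Figure 6-5. It is an illustration only: the tuple encoding of instructions, the `link` function, the `VALUE` pseudo-opcode, and the contents of the definitions table for Defines (which the figure does not show) are assumptions.

```python
# Requirements table from Figure 6-5: (module, link index) -> labels that use the value.
requires = {
    ("Defines", 0): ["L430"],
    ("Defines", 1): ["L450", "L470"],
    ("Defines", 2): ["L510"],
}

# Stack machine code for the initialization of Uses, keyed by label.
code = {
    "L430": ("LINKREF", "Defines", 0),
    "L440": ("CALLSTK", 0),
    "L450": ("LINKREF", "Defines", 1),
    "L460": ("IADDI", 0),
    "L470": ("LINKREF", "Defines", 1),
    "L480": ("IADDI", 1),
    "L490": ("DEREF",),
    "L500": ("POPSTK",),
    "L510": ("LINKREF", "Defines", 2),
    "L520": ("CALLSTK", 0),
    "L530": ("RETURN",),
}

# Hypothetical definitions table logged while compiling the implementation of Defines.
defines = {
    ("Defines", 0): "L100",
    ("Defines", 1): "%GLOBAL+8",
    ("Defines", 2): "L200",
}


def link(code, requires, defines):
    """Replace each logged link reference by its defined value; report what is missing."""
    linked = dict(code)
    unresolved = []
    for key, labels in requires.items():
        if key not in defines:
            unresolved.append(key)
            continue
        for label in labels:
            linked[label] = ("VALUE", defines[key])  # VALUE is an invented opcode
    return linked, unresolved


linked, unresolved = link(code, requires, defines)
```

Since the requirements table records every use location, the linking phase never has to scan the code fragment for references, and both uses of the base address of Defines's globals are patched from the same table entry.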

6.4. Other Applications

We can enhance the practicality and utility of event-driven translation by applying the model to other domains relating to program translation. In Chapter 5, we argued that event-driven translation can aid in the process of designing programming languages. In this section, we describe the potential effect of event-driven processing on two more concrete examples: re-compilation determination and syntax-directed editors. By doing so, we establish additional evidence as to the effectiveness and value of our model in the domain of language translation.

6.4.1. Determining Re-compilation

One problem that could benefit from event-driven translation involves determining when a program module must be re-compiled for a language that permits separate compilation. Strategies currently in use may compile some modules needlessly since the granularity of the information on which they base their decisions is too large.

In many large programs, a few modules tend to contain declarations essential to the rest of the system. Although programmers often choose compiled languages because of the modularity and static checking provided, great inertia develops against changes to these "central" modules because of the re-compilation required [Teitelman 84]. By reducing the number of needless compilations, the cost, and therefore the inertia, associated with such changes can be reduced. We believe the event-driven translation model can provide control over the granularity of information used to base decisions.

To illustrate the problem, consider a system composed of four modules: Root, ImportDirect, Intermediary and ImportIndirect. As shown in Figure 6-6, both ImportDirect and Intermediary import Root and ImportIndirect imports Intermediary. The problem, then, involves determining which modules must be re-compiled when a change is made to the source code of Root. Depending upon the nature of the change, different answers are appropriate:

• If the change corrects the spelling of a comment, no modules must be re-compiled.

• If the string terminator character is changed, only Intermediary need be re-compiled.

• If the representation of String is modified to include an initial character to maintain a String's length, all other modules must be re-compiled.

The essential observation to note is that, in each case, a module must be re-compiled only when the re-compilation of an imported interface modifies information on which the compilation of that module depends.

The most commonly used strategy re-compiles a module when the result of its last compilation is out of date with respect to any of the files representing the interfaces it imports (e.g. make of Unix33 [Feldman 79]). Under this scheme, all of the modules in our sample program would be re-compiled regardless of the nature of the change. One would hope to do much better.

DEFINITION MODULE Root;

EXPORT MaxLength, String, Terminator, Equal;

CONST MaxLength = 32;
      Terminator = CHR(0);

(* Each String contains Terminator after its last character *)
TYPE String = ARRAY [0 .. MaxLength] OF CHAR;

PROCEDURE Equal (Str1, Str2 : String) : BOOLEAN;

END Root.

IMPLEMENTATION MODULE ImportDirect;

FROM Root IMPORT String;

VAR FileName : String;
...

END ImportDirect;

DEFINITION MODULE Intermediary;

FROM Root IMPORT String, MaxLength, Terminator;

EXPORT IllegalNameCharacter, NameTable, EnterName;

CONST IllegalNameCharacter = Terminator;

(* Shmegegie proves half maximum name length optimum for hash table *)
TYPE NameTable = ARRAY [0 .. MaxLength DIV 2] OF String;

PROCEDURE EnterName (Name : ARRAY OF CHAR; VAR InTable : NameTable) ;

END Intermediary;

IMPLEMENTATION MODULE ImportIndirect;

FROM Intermediary IMPORT NameTable;

VAR SymbolTable : NameTable;
...

END ImportIndirect;

Figure 6-6: Determining Re-compilation

33Unix is a registered trademark of Bell Laboratories.

Another strategy is used for the language GNAL at Tartan Laboratories. After each compilation, the previous and new results are compared. The changes, if any, are classified as either insignificant, significant or extensional. If insignificant, no other modules must be re-compiled. If the changes are significant, all modules that import the compiled module are tagged to be re-compiled. When the changes constitute an extension, however, only those modules that use the extension (and therefore must have changed themselves) must be compiled.34

This process is repeated after each compilation to eliminate as many additional compilations as possible. Although better than the "make" approach, this approach has two major drawbacks. First, the rules concerning extensions are very restrictive (e.g. new variables must be declared after all existing variables). Second, no distinction is made as to the actual nature of a significant change. Changes to a constant declaration force the re-compilation of modules that import the interface, even those that do not use the constant.

Both strategies fail to decide correctly whether a module actually requires re-compilation because they consider only that something changed instead of what changed -- in particular, change propagation occurs at module granularity. A third approach utilized in the SAGA system (Software Automation, Generation, and Administration) attempts to make decisions at a finer granularity.

The SAGA system maintains the current parse tree for each compiled module. After re-compiling a module, the system can compute the differences between the previous and current parse trees. Based on those differences and computed dependencies between modules, the system can then select only those modules that require re-compilation. Re-compilation is performed incrementally; that is, only the individual procedures that depend on changed information are re-compiled [Campbell 84].

Although this method concentrates more on what changes and performs incremental re-compilation, it still can re-compile needlessly. Syntactic changes that do not alter semantic information can trigger re-compilations. For example, a change to a data type definition may not change its size -- thus, translation actions that depend only on the type's size need not be re-compiled.

Recall the observation we noted above: A module must be re-compiled only when the re-compilation of an imported interface modifies information on which the compilation of that module depends. Thus, to eliminate needless compilations, one must determine exactly those modules dependent on the specific data that changes in a re-compiled interface as well as the semantic translation actions within those modules.

To do this, the compilation of a module must keep track of exactly what information it requires from each imported interface. Event-driven translation can alleviate this task. The set of semantic actions instantiated for the translation of a program module depends on a fixed set of semantic data. Some of this data will be computed by executed events during translation. The data of interest, however, comes from other modules.

34Presented in a seminar at CMU by John Nestor.

During compilation, an event-driven translator can maintain the specific semantic data required from imported interfaces as well as the sources of the data. Similarly, semantic data available by the end of the compilation of an interface constitutes potential "required" data for other modules.

To decide, then, whether to re-compile a given module that imports a re-compiled interface, one need only compare the semantic data computed by the interface's previous compilation with that produced by the re-compilation. The given module must be re-compiled only when the specific data it imports from the interface changed as a result of its re-compilation.
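The decision rule can be sketched in Python on the modules of Figure 6-6. The sketch is hypothetical throughout: the way semantic data is summarized (a dictionary per interface, with String reduced to a layout tuple), the two `*_interface` functions standing in for compilation, and the hard-wired import structure are all assumptions made to keep the example small.

```python
def changed_names(old, new):
    """Names whose semantic data differs between two compilations of an interface."""
    return {n for n in old.keys() | new.keys() if old.get(n) != new.get(n)}


def must_recompile(required, old, new):
    """A module is re-compiled only if data it actually imports has changed."""
    return bool(set(required) & changed_names(old, new))


def root_interface(max_length=32, terminator=0, length_prefix=False):
    """Hypothetical semantic data for Root; String is summarized by its layout."""
    size = max_length + 1 + (1 if length_prefix else 0)
    return {"MaxLength": max_length, "Terminator": terminator,
            "String": ("array of CHAR", size, length_prefix)}


def intermediary_interface(root):
    """Hypothetical semantic data for Intermediary, derived from what it imports."""
    return {"IllegalNameCharacter": root["Terminator"],
            "NameTable": ("array of String", root["MaxLength"] // 2 + 1, root["String"])}


# What each importer actually requires, following Figure 6-6.
REQUIRES_FROM_ROOT = {"ImportDirect": ["String"],
                      "Intermediary": ["String", "MaxLength", "Terminator"]}
REQUIRES_FROM_INTERMEDIARY = {"ImportIndirect": ["NameTable"]}


def affected(old_root, new_root):
    """Modules to re-compile after Root changes from old_root to new_root."""
    result = {m for m, req in REQUIRES_FROM_ROOT.items()
              if must_recompile(req, old_root, new_root)}
    if "Intermediary" in result:
        old_mid = intermediary_interface(old_root)
        new_mid = intermediary_interface(new_root)
        result |= {m for m, req in REQUIRES_FROM_INTERMEDIARY.items()
                   if must_recompile(req, old_mid, new_mid)}
    return result


base = root_interface()
comment_fix = affected(base, root_interface())                       # no semantic change
new_terminator = affected(base, root_interface(terminator=255))      # constant changed
new_string = affected(base, root_interface(length_prefix=True))      # representation changed
```

Under these assumptions the three results reproduce the three cases listed earlier: a comment change affects no module, a new terminator affects only Intermediary, and a new String representation affects every other module.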

Within the module, only those semantic actions that actually depend on the changed data must be re-executed. An event-driven system can propagate changed semantic information easily since the action interdependencies are the same as for translation.

This method of determining re-compilation provides benefit only when the cost of comparing computed semantic data does not exceed the savings from avoiding unnecessary translations. Finer grained decisions require the maintenance of more data structures and more expensive comparisons. Campbell and Kirslis do not discuss the resource requirements for maintaining parse trees and performing tree comparisons. It is important to ascertain the break-even point.

The appropriate unit of re-compilation must also be determined. As we see in the next section, some researchers have investigated minimizing attribute change propagation for attribute grammars in the area of syntax-directed editing [Reps 83]. It may turn out, however, that procedures constitute a more suitable unit for incremental re-compilation, as in SAGA.

6.4.2. Syntax-directed Editing

The development of syntax-directed editors constitutes another area of language translation that could benefit from event-driven processing. Syntax-directed editors allow the editing of programs while concurrently performing several translation-related tasks, such as lexing, parsing, semantic checking, code generation, and, perhaps, interpretation. 35 Thus, a programmer may compose programs more efficiently since the editor can insert entire, language-specific source code fragments. More importantly, the programmer can be notified interactively of syntactic and semantic errors as he composes his programs [Medina-Mora 81, Morris 81].

The current state of syntax-directed editing goes beyond just editing programs in a known language. These editors are now essential components of integrated programming environments. In the simplest situation, the editor generates an object module for the edited program unit. For more sophisticated languages, the editor even forms a part of an interpretive, run-time environment [Kaiser 86]. Recent research has also investigated the potential of generating such editing environments automatically [Reps 84].

35 One possible technique for integrating interpretation and editing was proposed for the initial Cornell Program Synthesizer [Teitelbaum 81].

An event-driven model for implementing syntax-directed editors is appropriate for several reasons. First, as this thesis demonstrates, the event-driven model can support program translation, including the implementation of run-time semantics.

A second, stronger reason arises from the incremental nature of editing a program [Schwartz 86]. At any time, the editor has only a partial program available on which to perform its tasks. In particular, the missing pieces may be anywhere and completed in any order. Instantiated translation actions can depend on data from either earlier or later in the corresponding program text. An event-driven translator would execute enabled actions resulting from additional information provided by the programmer. No special arrangements are necessary to accommodate incremental input of programs.

A third reason concerns the ability to alter existing program fragments during editing. Existing systems maintain copies of each program's parse tree and usage counts so that comparisons can allow the re-translation of affected program units (usually at the procedure level) [Morris 81, Schwartz 86]. 36 A more ambitious approach calculates exactly which semantic values are affected and executes only those translation actions needed to recompute those values [Reps 83].

Theoretically, by maintaining the instantiated translation events and their data interdependencies, an event-driven processor could perform similarly. An editing change that parses correctly would change a subset of the semantic data available from the program text and possibly a subset of the instantiated events. Using the interdependencies, an event-driven processor could re-execute only those semantic actions that depend on the changed data. Thus, it might be possible to perform the minimum number of recomputations.
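The propagation just described might look like the following sketch, written in Python under simplifying assumptions: the dependency graph is acyclic, each action produces a single datum, and the names `Action` and `propagate` are hypothetical. It is a straightforward worklist, not the optimal algorithm of [Reps 83]; an action reachable along several paths may be re-executed more than once.

```python
from collections import deque


class Action:
    """A semantic action that reads some data items and produces one."""

    def __init__(self, name, inputs, output, compute):
        self.name = name
        self.inputs = inputs      # keys of the data this action depends on
        self.output = output      # key of the datum this action produces
        self.compute = compute    # function of the input values


def propagate(data, actions, changed):
    """Re-execute only actions that depend, directly or not, on changed data.

    `data` already holds the edited values for the keys in `changed`.
    Returns the names of the re-executed actions, in execution order.
    """
    readers = {}
    for action in actions:
        for key in action.inputs:
            readers.setdefault(key, []).append(action)

    queue = deque(changed)
    executed = []
    while queue:
        key = queue.popleft()
        for action in readers.get(key, []):
            value = action.compute(*(data[k] for k in action.inputs))
            executed.append(action.name)
            if data.get(action.output) != value:
                # Only a real change is propagated further.
                data[action.output] = value
                queue.append(action.output)
    return executed
```

Propagation stops as soon as a recomputed value equals the old one, which is what keeps the work proportional to the affected region rather than to the whole program.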

The recomputation method described by Reps et al. is based on translators built from attribute grammars. In addition, they use attribute grammars as the specification formalism for generating syntax-directed editors automatically [Demers 81, Reps 84]. Our last reason for proposing the event-driven model for syntax-directed editors, then, is a practical one.

As we have demonstrated in the ELI system, the attribute grammar formalism can suffice for the automatic generation of event-driven translators. Since attribute grammars also suffice for the generation of syntax-directed editors, it should be a fairly easy task to generate event-driven syntax-directed editors using both technologies. We believe that such additional potential uses of the event-driven model enhance its overall value as an implementation strategy for language translation activities.

36 The Magpie system even allows incremental link/loading of procedure translations [Schwartz 86].

Chapter 7 Conclusions

In previous chapters, we addressed several problems relating to programming language design and implementation. Concerns over the binding times of translation actions constitute a common thread among the problems. This dissertation represents the first focused attempt to solve these problems by viewing binding time determination as the key to these problems and formulating a translation model that alleviates the responsibilities associated with binding time determination.

In this chapter, we review the problems and the motivation that led to our solution based on event-driven translation. We then discuss the advantages of the model with respect to the design and implementation of programming languages. In particular, we explore the ramifications of automatic generation of event-driven translators.

The next four sections present our major conclusions. Then, in Section 7.5, we summarize several directions by which this research can be extended. Finally, we conclude with some observations concerning what we tried to accomplish.

7.1. Commonality of Binding Time Determination

Initially, we examined the event-driven model for translation as a result of our investigation into the design of command programming languages. Further scrutiny led to the realization that many design and implementation problems relate to the determination of translation action binding times. Thus, our first conclusion is that binding time determination provides valid and important insights into such problems.

Implementation problems pertinent to this conclusion include the ordering of translation action execution (especially the need for special back-patching structures in "near one-pass" compilers) and phase determination (i.e. compile-time, link-time, run-time, etc.). Clearly, these problems involve binding time determination.

We also discussed some language design issues relating to language flexibility. The intent of some pre-emptive decisions, when examined, is to permit early binding times for certain translation actions. For instance, a language that requires a constant step value in a FOR statement allows compile-time generation of an efficient loop exit test.
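The FOR statement example can be made concrete with a small sketch. The Python function below is purely illustrative (the name `exit_test` and the textual form of the generated test are assumptions): when the step is a compile-time constant, its sign is known and a single comparison suffices; otherwise the generated test must also examine the sign of the step at run time.

```python
def exit_test(var, limit, step):
    """Return the text of the loop exit condition for a FOR statement.

    `step` is an int when it is a constant known at compile time,
    or a str naming a variable whose value is known only at run time.
    """
    if isinstance(step, int):
        if step == 0:
            raise ValueError("a constant step of zero never terminates")
        # The sign is known statically, so one comparison is enough.
        return f"{var} > {limit}" if step > 0 else f"{var} < {limit}"
    # The sign is unknown until run time, so both directions must be tested.
    return (f"({step} > 0 and {var} > {limit}) or "
            f"({step} < 0 and {var} < {limit})")
```

For example, `exit_test("i", "n", 1)` yields `i > n`, whereas `exit_test("i", "n", "s")` yields the longer test that checks the sign of `s` on every iteration.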


We then considered several design changes that would enhance separate compilation at the cost of potentially executing more translation actions during link-time. Hiding type representations in implementation modules, for example, either delays some code generation until link-time or requires the use of a specific memory model, which would involve a decision at language definition time.

But what about our first motivation? As we noted earlier, command languages epitomize the conflicting influences between language flexibility and execution efficiency. Flexibility frequently requires late binding times while efficient translations arise from early, static binding times. One way to resolve this conflict is to design flexibility into a language and allow the programmer to provide information that a translator can use to generate efficient translations (i.e. elasticity).

The concept of elasticity describes the behavior of a translator that incurs the cost of late binding times only when the flexibility is used. Elasticity typifies the issues concerning binding time determination, since the binding time of each translation action must be calculated individually. By concentrating on implementing elasticity, we have solved all of our related problems.

7.2. Event-driven Translation

An implementation strategy that enables elasticity need not solve all of the described binding time problems; it might not deal with determining execution ordering, for instance. The strategy we advocate in this thesis, however, does. Our second conclusion is that event-driven translation constitutes an appropriate model for solving problems related to binding time determination.

Event-driven systems independently decide when each active event becomes enabled. Thus, an event-driven translator determines the binding time of each instantiated semantic action individually, not according to the action's class. Different occurrences of type checking, for example, may be executed at very different binding times, even during different binding time phases. An existing language, EL1, requires such behavior.

In addition to furnishing elasticity, event-driven translation solves the other binding time problems. Since no event is executed until its enabling conditions (i.e. data dependencies) become satisfied, action ordering and phase determination occur naturally. No special "back-patching" structures or "half-passes" are needed.
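A minimal sketch of this behavior, in Python, follows. It is not the ELI translation engine; `Event` and `run` are hypothetical names, and the enabling condition is reduced to "all needed data are available". The point is that a forward reference needs no back-patching structure, and that an event whose data arrive only in a later phase simply remains pending until then.

```python
class Event:
    """An instantiated semantic action with explicit data dependencies."""

    def __init__(self, name, needs, produces, action):
        self.name = name
        self.needs = needs          # keys that must be available first
        self.produces = produces    # key of the datum this event computes
        self.action = action


def run(events, available):
    """Execute every event whose dependencies are (or become) satisfied.

    Returns (execution order, events still pending, resulting data).
    """
    data = dict(available)
    pending = list(events)
    order = []
    progress = True
    while pending and progress:
        progress = False
        for event in list(pending):
            if all(key in data for key in event.needs):
                data[event.produces] = event.action(*(data[k] for k in event.needs))
                order.append(event.name)
                pending.remove(event)
                progress = True
    return order, pending, data


# Events listed in textual order: the jump precedes the label it refers to.
events = [
    Event("emit_goto", ["L.addr"], "goto.code", lambda addr: f"JMP {addr}"),
    Event("define_L", [], "L.addr", lambda: 40),
    Event("bounds_check", ["n.value"], "check.ok", lambda n: 0 <= 3 < n),
]
compile_order, left_over, data = run(events, {})
# The remaining event runs in a later phase, once n.value exists.
run_order, still_pending, data = run(left_over, {**data, "n.value": 5})
```

In the first call the jump is emitted after the label is defined even though it appears first, and the bounds check stays pending; the second call models a later phase in which the missing value has become available.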

Of course, unless one can build actual translators based on this model, our thesis would have little practical significance. In the next section, we discuss our conclusions concerning the suitability of the event-driven model for implementing translators.

The event-driven model may also present potential improvements in domains related to translation, such as computing minimum re-compilation and implementing syntax-directed editing. Section 7.5 below considers these possibilities as well as other directions for future research.

7.3. Generation of Event-driven Translators

The event-driven model helps language designers and implementors to quantify and solve translation problems in a theoretical framework. We have also built a generation system that automatically constructs functional event-driven translators from reasonably "normal" specifications. Based on the specific strategy we took, we conclude that there is excellent potential that event-driven translation systems can be practical.

First, by building a generation system, we removed the onus of constructing an event-driven processor from the language implementor. His major task, then, consists of specifying the semantic actions that constitute the events that must be executed to translate programs.

Second, we exploit an established formalism, which alleviates the task of specification. In particular, it is fairly straightforward to specify languages using the attribute grammar formalism. The specifier need not delineate events explicitly or even know about event-driven translation.

As a result, we have demonstrated that producing event-driven translators can be practical. The generated translators, however, must also be practical.

Using attribute grammars allows us to take advantage of known, applicable techniques for improving the performance of generated translators. In addition, event-driven translators can generate object code for standard languages in a familiar form, as we showed in our implementation of Modula-2 using ELI (see the end of Section 6.3). Thus, the potential for practical event-driven translators exists.

Achieving acceptable practicability, though, will require more research. If we wish to continue using an approach based on attribute grammars, new techniques must be developed to improve event ordering analysis and to reduce translator storage requirements. We discuss other options below.

7.4. Impact on Language Design

Our last, and most controversial, conclusion concerns the impact of binding time analysis and elasticity on programming language design. In particular, we believe that many existing designs either restrict flexibility to allow early binding times or provide flexibility without regard for efficiency. New language designs should take advantage of the elasticity that event-driven translators can yield to provide flexibility when needed and incur any associated cost (i.e. dynamic binding times) only when used.

Elasticity can open up the design of practical programming languages. The concept is not new; the language EL1, for example, was proposed over ten years ago [Wegbreit 74]. Jones and Muchnick also advocated the construction of language processors that provide elasticity. In their words, a processor should implement "... each program in the most efficient manner ... from complete interpretation to ... highly efficient compiled code, depending upon the usage of language features in each program." They employed flow analysis techniques to achieve elasticity [Jones 76].

More recently, the principles of elasticity have been applied to existing languages known for their flexibility. The design changes allow programmers to provide additional information so that errors can be detected statically and more efficient translations can be generated (CommonLISP [Steele 84] and modifications to Smalltalk [Borning 82]). As evidenced by Chapter 5, we support such modifications, but prefer, for consistency reasons, to design "elastic" languages from scratch.

A language that incorporates these principles enables the development of programs by "stepwise refinement". The programmer builds prototype software utilizing the interpretive capabilities of the language. Then, to transform the program into an efficient application, the programmer "systematically ... eliminate[s]" late binding times that affect its performance. A truly helpful language processor could even facilitate this last task by identifying exactly which uses of language features cause interpretation [Jones 76]. Event-driven translators can make such languages feasible and practical.

Language features that could provide benefit if designed to allow elasticity include:

• Type definition and type checking -- As mentioned in Section 1.1.4, dynamic type checking and definition of nominal data types would expedite the construction of programming environments, especially for interpreted languages. Static type checking, in addition to generating more efficient object code, permits early error detection.

• Array allocation -- Dynamic array allocation provides great flexibility when a maximum size is unknown or prohibitive. Statically sized arrays, of course, enable more efficient accesses.

• Storage allocation -- Most storage in programs need not be allocated dynamically. When necessary, though, the recovery of dynamically allocated storage may be explicit (as in Pascal) or implicit (as in LISP). Explicit deallocation distributes the cost of recovery throughout program execution. Implicit recovery (i.e. garbage collection), on the other hand, provides safety but may require inordinate amounts of time at inconvenient moments.

• Procedure invocation -- Procedure variables allow the creation of "generic" operations. For example, a sort routine might accept an ordering predicate as a parameter. Thus, the same routine could sort in increasing or decreasing order, as well as sort based on various disparate criteria. The use of overriding routine implementations in inheritance-based languages also forces run-time resolution of procedure values, although the work involved can vary greatly (see discussion in Chapter 5). Static determination of procedure addresses is generally more efficient.

• Interpretation -- Several applications, especially those in the domain of artificial intelligence, would benefit greatly from the ability to build and execute programs at run-time. Statically translated programs, though, execute faster.
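To illustrate what elasticity means for the first feature in this list, the following Python sketch decides the binding time of a single type check. The function `plan_type_check` and its result format are assumptions made for illustration only; types are represented by Python type names. A check whose operand types are both known is resolved statically, while a check involving an undeclared expression type is deferred to a run-time test.

```python
def plan_type_check(declared, expr_type):
    """Choose the binding time of one type check.

    `declared` is the declared type name of the target, or None if untyped.
    `expr_type` is the statically known type name of the value, or None.
    Returns a (binding time, payload) pair.
    """
    if declared is None:
        return ("none", None)  # untyped target: nothing to check
    if expr_type is not None:
        # Both types are known: the check is performed once, at compile time.
        return ("static", declared == expr_type)
    # The value's type is unknown: emit a check to be executed at run time.
    return ("dynamic", lambda value: type(value).__name__ == declared)
```

Only the third case pays a run-time cost, and it arises only where the programmer actually used the flexibility of leaving a type undeclared.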

We discuss other research directions in the next section.

7.5. Future Directions

The work of this thesis can be carried forward in several different directions. First, the ELI generation system demonstrates the potential of generating practical event-driven translation systems automatically. One research direction involves fulfilling that potential.

Second, language design comprises a significant motivation for the development of the event-driven translation model. Another research direction, then, consists of enhancing the reporting capabilities of the ELI generator and exploring the impact elasticity has on language design.

Last, as we discussed above, other applications related to translation might also benefit by applying the event-driven model to their implementation. Each application represents another research direction. In the sections below, we describe each of these directions in more detail.

7.5.1. Generation of practical translators

Several ways exist to improve the practicality of event-driven translator generation. We could choose a different specification language or a different target machine. The analysis performed by the generator itself could be more elaborate. The most important improvement lies in enhancing the translation engine's performance. Finally, code generation for actual computers could be incorporated.

A different specification language might alleviate the specifier's tasks. Another formalism, such as denotational semantics [Tennent 76], might allow easier expression of a language's semantics. Of course, it must be possible to delineate events from specifications. Furthermore, the formalism must allow the specifier to define event binding times and event granularity.

Choosing a different specification language or target machine might also simplify the generator's tasks, especially binding time analysis. The more information the generator can provide to the translation engine concerning event interdependencies, the faster the translation engine can execute. A different target machine may also facilitate code generation (e.g. some work has been invested in generating code from denotational semantics [Sethi 83, Appel 85]).

The existing ELI generator based on attribute grammars certainly operates fast enough. The analyses it performs, however, could be enhanced. Currently, the generator only takes advantage of local dependencies in the input specification grammar. Although our modifications to the formalism complicate global dependency determination, it should be possible to glean some binding time information from a global analysis of the specification. In addition, global analysis could furnish the language specifier with important information concerning the potential binding times of the semantic actions needed to translate a proposed language feature.

As we noted, more event interdependency knowledge can help the translation engine. More work should also be invested in improving the performance of the translation engine itself.

If we continue to use a translation strategy based on attribute grammar evaluation, we should explore the applicability of known techniques for fast evaluation using minimal space in the event-driven domain [Jazayeri 81, Farrow 84]. In particular, since abstract syntax tree nodes may persist throughout the translation of a program, the possibility of caching the nodes should be investigated.

Another approach would be to use a different processor for the translation engine. Several architectures exist that execute semantic actions in an event-driven manner, including the data flow and production system models. It may be that one of these architectures allows greater efficiency in execution speed and storage requirements. The generator, of course, would have to be altered to produce the appropriate event interdependency information for any new translation engine.

Finally, our methodology may be improved by incorporating true code generation and optimization into the translation engine. At that point, we could compare generated event-driven translators against existing compilers for standard languages.

7.5.2. Language design

As we concluded above, event-driven translation can affect, and hopefully improve, language design. Translator generation systems can also provide valuable information to language designers. The existence of a reasonably fast generator, the ELI system, proves that the information provided can be of use during the design process.

The information provided, however, must be of some value. When dealing with elasticity, the consequences of delayed semantic actions may not be obvious. Thus, it is essential that the generator of event-driven translators inform the specifier about event interdependencies as well as the potential binding times of semantic actions. In the ELI system, additional global dependency analysis is necessary to determine such information. One could also investigate what other practical information a generator might compute for the language designer.

By facilitating the implementation of elasticity, we hope to design more useful programming languages. To achieve this goal, each potential language feature design should be examined to determine if it could benefit from elasticity. We presented several possibilities in Chapter 2 and in Section 7.4 above, but others should be explored.

Given our motivations, we should also ascertain whether a language design based on elasticity can reconcile the flexibility/efficiency conflicts inherent in command programming languages. It may be that all language designs that provide adequate flexibility cannot achieve the required efficiency, even with elasticity. Also, the flexibility/efficiency conflict may not even represent the essential problem associated with command language design. We should resolve these issues by exploring the impact of "elastic" features in application command languages.

7.5.3. Other applications

In Section 6.4, we described two applications related to language translation that could profit from event-driven processing -- determining re-compilation and syntax-directed editing. In this section, we review what is necessary to ascertain the model's value for these applications.

To determine exactly which modules of a program must be re-compiled as a result of changes to a module interface, a successful strategy must compute which modules actually depend on the information that changed. In an event-driven system, this reduces to determining which modules contain translation actions that depend on the changed semantic data produced by the translation of the module interface.

Thus, a representation for the semantic data of a translated module must be developed that allows the determination of differences from one compilation to the next. Then, if a translated module keeps track of its initial set of instantiated events, the modules containing events dependent on the data within the difference must be re-compiled.

Once implemented, this strategy must be tested to learn the optimal granularity for re-compilation, from the module level to individual events. Also, one should check whether the cost of maintaining the extra information and of calculating the differences and dependencies outweighs the savings of fewer re-compilations.

A similar strategy could be used in syntax-directed editing to link instantiated translation actions with the semantic data on which they depend. Then, an algorithm for re-executing affected actions and propagating new semantic changes to other actions should be developed. The objective is to execute only those actions affected by editing changes, that is, to minimize the cost of incremental translation. There is even a potential for generating event-driven syntax-directed editors automatically given similar capabilities for event-driven translators (presented here) and syntax-directed editors [Reps 84].

7.6. Concluding Thoughts

Most recent research into programming language design concentrates on new models for computation, such as logic, object-oriented, and . In time, these formalisms will influence the design of practical, everyday languages. Some might argue that the advent of faster computers will eliminate the efficiency concerns associated with interpreted languages, but this has yet to happen. Furthermore, increasingly complex interactive applications have tended to consume expansions in computing capacity.

In this dissertation, we have taken a different approach. We have attempted to quantify one area of language design that affects practicality. We have demonstrated that many problems can be viewed from the point of view of binding times. Other formalized views of language design may also provide beneficial insights. For example, feature orthogonality is generally recognized as a desirable trait. It would be quite interesting if orthogonality could be modeled somehow.

In addition to the scientific contributions of this thesis, therefore, we believe we have a social contribution as well -- that well-known design "principles," such as avoiding pre-emptive decisions, achieving orthogonality, and providing expressiveness of intent [Hilfinger 81], should be quantified formally to allow language designers to compare different design decisions.

Appendix A Specification Language Syntax

Start symbol : attribute-grammar

Productions :

attribute-grammar ::= DOMAINS [ domain-declaration ]*
                      TERMINALS [ terminal-declaration ]*
                      NONTERMINALS [ nonterminal-declaration ]*
                      START symbol-identifier
                      CONSTANTS [ constant-definition ]*
                      INITIALLY [ statement ]*
                      GLOBALS [ attribute-declaration ]*
                      PRODUCTIONS [ production ]*

domain-declaration ::= identifier ..... domain ";"

domain ::= domain-identifier
           LIST OF domain-identifier
           STACK OF domain-identifier
           SCOPE OF domain-identifier
           SCOPESTACK OF domain-identifier
           domain-identifier-list "->" [ domain-identifier-list ]    -- lambda expression signature
           domain-identifier ..... domain-identifier    -- map key and value
           AGGREGATION [ "(" identifier-list ")" ]    -- list of variant names
               [ field-declaration ]* [ variant-options ]* END

declaration ::= identifier-list ":" domain-identifier ";"

kind-options ::= WHERE kind-identifier "=>" [ field-declaration ]*

terminal-declaration ::= symbol-identifier [ "<< ...... [ domain-identifier-list ] ">>" ]

nonterminal-declaration ::= symbol-identifier "<<" [ domain-identifier-list ] ..... [ domain-identifier-list ] ">>"

constant-definition ::= identifier "=" expression ";"

production ::= symbol-identifier "<<" [ identifier-list ] "^" [ expression-list ] ">>" "::="
               [ rhs-symbol ]*
               [ LOCALS [ attribute-declaration ]* ]
               "{" [ computational-event ]* "}" ";"

rhs-symbol ::= symbol-identifier [ "<<" [ expression-list ] ..... [ identifier-list ] ">>" ]
             | character

computational-event ::= label-identifier ":" ONEOF [ action ]* ENDONEOF
                      | label-identifier ":" action

action ::= [ PRE expression [ WHERE [ statement ]* ] ] BEGIN [ statement ]* END

statement ::= identifier-list ":=" expression ";"
            | IF expression THEN [ statement ]* elseif-clause ENDIF ";"
            | expression "(" [ expression-list ] ")" ";"
            | RETURN [ expression-list ] ";"

elseif-clause ::= [ ELSEIF expression THEN [ statement ]* ] [ ELSE [ statement ]* ]

expression-list ::= expression [ "," expression ]*

expression ::= TRUE
               FALSE
               integer
               string
               identifier
               unary expression
               expression binary expression
               "(" expression ")"
               expression "(" [ expression-list ] ")"
               LAMBDA "(" [ parameter-declaration ]* ")" "->" [ domain-identifier-list ] ";"
                   [ LOCALS [ attribute-declaration ]* ]
                   BEGIN [ statement ]* END

unary ::= "+" | "-" | NOT

binary ::= "+" | "-" | "*" | "/" | MOD | AND | OR | "=" | .... " | "<" | "<=" | ">" | ">="

identifier-list ::= identifier [ "," identifier ]*

Appendix B Operation Signatures

Model Operations

Binding time expression
    PASS() -> Integer                    -- Returns the current pass
    PHASE() -> String                    -- Returns the current phase
    AFTER(label-name) -> Boolean         -- TRUE iff labelled event executed
    IsStatic(any-domain) -> Boolean      -- TRUE iff value does not depend on machine state

Machine state
    Execute(Child: Integer) ->
    Execute(Child: Integer; domain-name) -> domain
    Execute(Goto: Label) ->
    Execute(Goto: Label; domain-name) -> domain
    GetLabel(Child: Integer) -> Label
    PushValue(Value: any-domain) ->
    PopValue(Size: Integer; domain-name) -> domain
    CopyValue(Size: Integer; domain-name) -> domain
    Dereference(Size: Integer) ->
    Call(LambdaValue: lambda-type) ->
    Call(EntryPoint: Label) ->
    Call() ->                            -- Call Label on top of stack
    Enter(ParmSize, LocalSize, StaticLevel: Integer) ->
    Local(Offset: Integer; NestStatic: Boolean; Level: Integer) -> Memory
    CallReturn(SizeOfReturnValue: Integer) ->

Name
    FindNameIndex(Identifier: String) -> Name

Position
    MakePosition(File: String; Line, Column: Integer) -> Position
    NoteError(Error: String; Posn: Position; Severity: Boolean) ->

List
    NewList(domain-name) -> domain-List
    ListLength(l: domain-List) -> Integer
    AppendElt(l: domain-List; Elt: domain) -> domain-List
    AppendLists(Onto, What: domain-List) -> domain-List
    ListHead(l: domain-List) -> domain
    ListTail(l: domain-List) -> domain-List

Stack
    NewStack(domain-name) -> domain-Stack
    StackDepth(s: domain-Stack) -> Integer
    PushElt(s: domain-Stack; Elt: domain) -> domain-Stack
    TopElt(s: domain-Stack) -> domain
    PopElt(s: domain-Stack) -> domain-Stack


Store
    NewStore() -> Store
    FindOffset(s: Store; Size: Integer) -> Integer
    NextOffset(s: Store) -> Integer
    FreeOffset(s: Store; Offset, Size: Integer) ->

Scope
    NewScope(domain-name) -> domain-Scope
    AddName(s: domain-Scope; n: Name; Entry: domain) ->
    FindName(s: domain-Scope; n: Name) -> Boolean, domain

ScopeStack
    NewScopeStack(domain-name) -> domain-ScopeStack
    EnterScope(s: domain-ScopeStack; Open: Boolean; Initial: domain-Scope) -> domain-ScopeStack
    LeaveScope(s: domain-ScopeStack) -> domain-ScopeStack, domain-Scope
    DeclareName(s: domain-ScopeStack; n: Name; Entry: domain) -> Boolean
    LookupName(s: domain-ScopeStack; n: Name) -> Boolean, domain
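The intended use of the ScopeStack operations can be suggested by a small Python sketch. Only the signatures above come from the specification language; the behavior shown here is an assumption: the `Open` flag is taken to mean that lookups may continue into enclosing scopes, `DeclareName` is taken to return FALSE when the name already exists in the innermost scope, and the Python function names are hypothetical renderings of the operation names.

```python
def new_scope_stack():
    """NewScopeStack: an empty stack of (is_open, scope) pairs."""
    return []


def enter_scope(stack, is_open, initial=None):
    """EnterScope: return a stack extended with a new innermost scope."""
    return stack + [(is_open, dict(initial or {}))]


def leave_scope(stack):
    """LeaveScope: return the remaining stack and the scope just left."""
    return stack[:-1], stack[-1][1]


def declare_name(stack, name, entry):
    """DeclareName: add to the innermost scope; False if already declared there."""
    scope = stack[-1][1]
    if name in scope:
        return False
    scope[name] = entry
    return True


def lookup_name(stack, name):
    """LookupName: search outward, stopping at the first closed scope (assumed)."""
    for is_open, scope in reversed(stack):
        if name in scope:
            return True, scope[name]
        if not is_open:
            break
    return False, None
```

Under these assumptions, a name declared in an outer scope is visible from an open inner scope but hidden from a closed one.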

Aggregation
    NewAggregation(domain-name; aggregate-kind) -> domain-Aggregation
    IsKind(o: domain-Aggregation; aggregate-kind) -> Boolean
    HasProperty(o: domain-Aggregation; aggregate-property) -> Boolean
    GetProperty(o: domain-Aggregation; aggregate-property) -> property-type
    SetProperty(o: domain-Aggregation; aggregate-property; Data: property-type) ->

Map
    NewMap(domain-name) -> domain-Map
    IsMapEmpty(m: domain-Map) -> Boolean
    IsKeyPresent(m: domain-Map; Key: key-type) -> Boolean
    AssociateData(m: domain-Map; Key: key-type; Data: data-type) -> domain-Map
    GetMappedData(m: domain-Map; Key: key-type) -> data-type

Appendix C Elastic-Smalltalk Syntax

Start symbol : c-unit

Productions :

c-unit ::= class-interface | class-implementation

class-interface ::= INTERFACE identifier
                    INHERITS class-identifier
                    [ USES class-identifier-list ]
                    CLASS_MESSAGES [ message ]*
                    INSTANCE MESSAGES [ message ]*
                    END identifier

class-identifier ::= identifier

class-identifier-list ::= class-identifier [ "," class-identifier ]*

message ::= MESSAGE identifier "(" [ class-identifier-list ] ")" [ ":" class-identifier ]

class-implementation ::= IMPLEMENTATION identifier
                         [ USES class-identifier-list ]
                         REPRESENTATION [ declaration ]*
                         CLASS METHOD [ method ]*
                         INSTANCE METHOD [ method ]*
                         END identifier

declaration ::= identifier-list ":" class-identifier

identifier-list ::= identifier [ "," identifier ]*

method ::= METHOD identifier "(" [ declaration ]* ")" [ ":" class-identifier ]
           [ VAR [ declaration ]* ]
           BEGIN [ statement ]* identifier

statement ::= expression
            | RETURN return-expression
            | IF expression THEN [ statement ]* elseif-part END
            | WHILE expression DO [ statement ]* END

return-expr ::=    -- A missing expression means return self.
              | expression

elseif-part ::= [ ELSEIF expression THEN [ statement ]* ]* [ ELSE [ statement ]* ]

expression ::= simple-expression
             | expression binary-op expression
             | expression "==" expression
             | expression "~~" expression
             | unary-op expression
             | variable-identifier ":=" expression

variable-identifier ::= identifier

simple-expression ::= nil
                      false
                      true
                      self
                      constant
                      variable-identifier
                      "(" expression ")"
                      receiver-expression "." message-identifier "(" [ expression-list ] ")"

receiver-expression ::= simple-expression | super

message-identifier ::= identifier

expression-list ::= expression [ "," expression ]*

Appendix D Elastic-Smalltalk Examples

Figures D-1 and D-2 show how a version of the class FinancialHistory [Goldberg 83, page 79] is written in Elastic-Smalltalk.

Figures D-3 and D-4 define a simple linked-list abstraction.

INTERFACE FinancialHistory

INHERITS Object

CLASS MESSAGES

-- As a default, this returns an object of the class FinancialHistory

MESSAGE initialBalance (Integer)

INSTANCE MESSAGES

MESSAGE receive (Integer)

MESSAGE spend (Integer)

MESSAGE cashOnHand () : Integer

END FinancialHistory

Figure D-1: Interface for FinancialHistory

IMPLEMENTATION FinancialHistory

REPRESENTATION balance: Integer

CLASS METHODS

METHOD initialBalance(amount: Integer)
BEGIN
  RETURN super.new().setInitialBalance(amount);
END initialBalance

METHOD new()
BEGIN
  RETURN super.new().setInitialBalance(0);
END new

INSTANCE METHODS

METHOD receive(amount: Integer)
BEGIN
  balance := balance + amount;
END receive -- self returned as the default.

METHOD spend(amount: Integer)
BEGIN
  balance := balance - amount;
END spend

METHOD cashOnHand(): Integer
BEGIN
  RETURN balance;
END cashOnHand

-- This method is private and not part of the FinancialHistory type.
METHOD setInitialBalance(amount: Integer)
BEGIN
  balance := amount;
END setInitialBalance

END FinancialHistory

Figure D-2: Implementation for FinancialHistory
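For readers more familiar with mainstream object-oriented languages, the behaviour of Figures D-1 and D-2 can be approximated in Python. This is only a rough analogue, not part of Elastic-Smalltalk: the snake_case names are assumptions, `__init__` stands in for the `new` class method, and each mutator returns `self` explicitly to mimic the default return of self.

```python
class FinancialHistory:
    """Rough Python analogue of the FinancialHistory class in Figure D-2."""

    def __init__(self):
        # Plays the role of the class method `new`: balance starts at 0.
        self._balance = 0

    @classmethod
    def initial_balance(cls, amount):
        # Class message `initialBalance`: create an instance, then initialise it.
        return cls()._set_initial_balance(amount)

    def receive(self, amount):
        self._balance += amount
        return self  # Elastic-Smalltalk returns self by default.

    def spend(self, amount):
        self._balance -= amount
        return self

    def cash_on_hand(self):
        return self._balance

    def _set_initial_balance(self, amount):
        # Private helper; not part of the public interface.
        self._balance = amount
        return self


history = FinancialHistory.initial_balance(100).receive(50).spend(30)
```

Because every mutator returns the receiver, calls can be chained in the same way as the cascaded message sends used later in the LinkList example.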

INTERFACE LinkList

INHERITS Object

CLASS MESSAGES

MESSAGE ncons (Object)

INSTANCE MESSAGES

MESSAGE cons (Object)

MESSAGE car () : Object

MESSAGE cdr ()

MESSAGE replaca (Object)

MESSAGE replacd (LinkList)

END LinkList

Figure D-3: Interface for LinkList

IMPLEMENTATION LinkList

REPRESENTATION
  Data: Object
  Next: LinkList

CLASS METHODS

METHOD ncons(first: Object)
BEGIN
  RETURN new().replaca(first); -- Next set to nil by default.
END ncons

INSTANCE METHODS

METHOD cons(newData: Object)
BEGIN
  RETURN new().replaca(newData).replacd(self); -- Note the cascading.
END cons

METHOD car(): Object
BEGIN
  RETURN Data;
END car

METHOD cdr()
BEGIN
  RETURN Next;
END cdr

METHOD replaca(withData: Object)
BEGIN
  Data := withData;
END replaca

METHOD replacd(withTail: LinkList)
BEGIN
  Next := withTail;
END replacd

END LinkList

Figure D-4: Implementation for LinkList

Appendix E Elastic-Smalltalk Specification

DOMAINS
  Symbol = OBJECT ( Class, Meta, Method, Field, Local, Parameter )

    Posn: Position;

    WHERE Class =>
      ItsClass: Symbol;           -- meta class
      Visible: SymbolScopeStack;
      Value: Memory;              -- memory representation
      Inherits: Symbol;           -- super class
      Messages: SymbolScope;      -- union of interface messages
      MsgCount: Integer;          -- count of Messages
      Methods: SymbolMap;         -- all instance/private methods
      Environ: Store;             -- for instance fields
      TotalSize: Integer;         -- union for size of instance
      ClassName: Name;            -- check for circular inherit
      -- Memory representation of class value
      --   0  its class (i.e. meta class)
      --   1  its super class (i.e. inherited class)
      --   2  SymbolMap for methods
      --   3  SymbolScope for messages
      --   4  TotalSize for objects of this class
      --   5  FieldStart for objects of this class

    WHERE Meta =>
      ItsClass: Symbol;           -- meta meta class (MetaClass)
      Value: Memory;              -- memory representation
      Inherits: Symbol;           -- meta super class
      Messages: SymbolScope;      -- union of class messages
      MsgCount: Integer;          -- count of Messages
      Methods: SymbolMap;         -- class methods

    WHERE Method =>
      ItsClass: Symbol;           -- check duplicate decl
      Visible: SymbolScopeStack;
      Index: Integer;
      ArgClasses: SymbolList;     -- argument classes
      Returns: Symbol;            -- return class
      Environ: Store;             -- for args and locals
      Stmts: Continuation;

    WHERE Field =>
      ItsClass: Symbol;
      Offset: Integer;

    WHERE Local =>
      ItsClass: Symbol;
      Offset: Integer;

    WHERE Parameter =>
      ItsClass: Symbol;
      Offset: Integer;
  END;

  SymbolList = LIST OF Symbol;
  SymbolScope = SCOPE OF Symbol;
  SymbolScopeStack = SCOPESTACK OF Symbol;
  SymbolMap = Integer ~ Symbol;

TERMINALS
  IDENTIFIER<< ^ Name>>
  CONSTANT<< ^ Integer, Memory>>
  BINARYOP<< ^ Continuation, Memory, Memory, Memory>>
  UNARYOP<< ^ Continuation, Memory, Memory>>
  tCLASS tINSTANCE tINTERFACE tIMPLEMENTATION tEND tINHERITS tUSE tMESSAGE
  tREPRESENTATION tMETHOD tBEGIN tVAR tRETURN tWHILE tDO tIF tTHEN tELSEIF tELSE
  ASSIGN DOT tNIL tTRUE tFALSE tSELF tSUPER EQUALS NOTEQUALS

NONTERMINALS
  c_unit<< ^ >> interface<< ^ >> class<< ^ >> opt_inherit<< ^ Symbol>>
  opt_use<> use_list<> message_seq<> formal_seq<> formal_list<> opt_returns<>
  field_seq<> field_ids<> class_id<> method_seq<> arg_seq<> arg_list<> arg_ids<>
  opt_var_seq<> stmt_seq<> statement<> elseif_part<> return_expr<>
  expression<> receiver<> call<> actual_seq<> actual_list<>

START c_unit

CONSTANTS
  NoSuperArgs = AppendElt(NewList(SymbolList), NewObject(Symbol, Class));

  BaseScope = NewScopeStack(SymbolScopeStack, NewScope(SymbolScope), FALSE);

  GetClassMessage = NewObject(Symbol, Method);
  IsKindOfMessage = NewObject(Symbol, Method);
  ObjectMsgs = NewScope(SymbolScope);
  ObjectMethods = NewMap(SymbolMap);
  ObjectClass = NewObject(Symbol, Class);
  ObjectClassName = FindNameIndex("Object");
  ObjectClassValue = Allocate(6);

  NewMessage = NewObject(Symbol, Method);
  UseSelf = NewObject(Symbol, Class);
  UseSelfValue = Allocate(1);

  InheritsMessage = NewObject(Symbol, Method);
  ClassMsgs = NewScope(SymbolScope);
  ClassMethods = NewMap(SymbolMap);
  ClassClass = NewObject(Symbol, Class);
  ClassClassName = FindNameIndex("Class");
  ClassClassValue = Allocate(6);

  MetaClass = NewObject(Symbol, Meta);
  MetaClassValue = Allocate(6);

  Primitive = NewObject(Symbol, Meta);
  PrimitiveValue = Allocate(1);

  BooleanClass = NewObject(Symbol, Class);
  BooleanClassValue = Allocate(4);
  IntegerClass = NewObject(Symbol, Class);
  IntegerClassValue = Allocate(4);

  NilClassValue = Allocate(1); -- class for "nil"
  NoReturnAllowed = Allocate(1);

  BuildMethodTable = LAMBDA (inherited: SymbolMap; count: Integer) -> SymbolMap ;
    LOCALS MethodMap: SymbolMap;
    BEGIN
      IF count = 0 THEN
        RETURN NewMap(SymbolMap);
      ELSE
        MethodMap := BuildMethodTable(inherited, count - 1);
        MethodMap \ [ count - 1 ~ inherited[count - 1] ];
        RETURN MethodMap;
      ENDIF;
    END;
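The recursion in BuildMethodTable copies the first `count` entries of the inherited method map into a fresh map, so a subclass starts with its superclass's methods at the same indices and can then override individual slots. A minimal Python sketch, assuming a plain `dict` stands in for SymbolMap and strings stand in for method symbols (both are illustrative assumptions):

```python
def build_method_table(inherited, count):
    """Return a new map holding entries 0 .. count-1 of `inherited`."""
    if count == 0:
        return {}
    # Build the table for the first count-1 entries, then add entry count-1.
    table = build_method_table(inherited, count - 1)
    table[count - 1] = inherited[count - 1]
    return table


# Hypothetical superclass table with three method slots.
object_methods = {0: "class", 1: "isKindOf", 2: "somethingElse"}
subclass_methods = build_method_table(object_methods, 2)
```

Since the result is a new map, overriding a slot in the subclass table leaves the inherited table untouched.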

  InheritsSelf = LAMBDA (inherits: Symbol; classname: Name) -> Boolean ;
    BEGIN
      IF GetProp(inherits, ClassName) = classname THEN
        RETURN TRUE;
      ELSEIF inherits = ObjectClass THEN
        RETURN FALSE;
      ELSE
        RETURN InheritsSelf(GetProp(inherits, Inherits), classname);
      ENDIF;
    END;
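InheritsSelf detects circular inheritance by walking the superclass chain until it either finds the class name being declared or reaches Object. The following Python sketch mirrors that walk; the `ClassSymbol` type and its field names are assumptions made for illustration and are not part of the specification language.

```python
class ClassSymbol:
    """Illustrative stand-in for a Symbol of kind Class."""

    def __init__(self, class_name, inherits=None):
        self.class_name = class_name
        self.inherits = inherits  # superclass symbol, None only for Object


OBJECT_CLASS = ClassSymbol("Object")


def inherits_self(inherits, classname):
    """True if `classname` already appears on the chain starting at `inherits`."""
    if inherits.class_name == classname:
        return True
    if inherits is OBJECT_CLASS:
        return False
    return inherits_self(inherits.inherits, classname)


link_list = ClassSymbol("LinkList", OBJECT_CLASS)
sorted_list = ClassSymbol("SortedList", link_list)
```

With these definitions, declaring a class named `LinkList` that inherits from `sorted_list` would be reported as circular, while a new name would not.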

  NotCompatible = LAMBDA (ancestor: Memory; descendant: Memory) -> Boolean ;
    BEGIN
      IF ancestor = descendant THEN
        RETURN FALSE;
      ELSEIF (descendant = ObjectClassValue) OR (descendant = PrimitiveValue) THEN
        RETURN TRUE;
      ELSE
        RETURN NotCompatible(ancestor, FetchValue(descendant, 1, Memory));
      ENDIF;
    END;

  Incompatible = LAMBDA (ancestor: Memory; descendant: Memory) -> Boolean ;
    BEGIN
      IF descendant = NilClassValue THEN
        RETURN PrimitiveValue = FetchValue(ancestor, 1, Memory);
      ELSE
        RETURN NotCompatible(ancestor, descendant);
      ENDIF;
    END;
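Read together, NotCompatible and Incompatible implement a subclass test over class values: a descendant is compatible with an ancestor if the ancestor appears on the descendant's superclass chain, and nil is compatible with every class whose superclass is not Primitive. A small Python sketch of that logic, assuming a `ClassValue` object with a `superclass` field in place of slot 1 of the memory representation (an illustrative simplification):

```python
class ClassValue:
    """Illustrative class value; `superclass` plays the role of slot 1."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass


OBJECT = ClassValue("Object")
PRIMITIVE = ClassValue("Primitive")
NIL = ClassValue("Nil")


def not_compatible(ancestor, descendant):
    # Walk up from `descendant` looking for `ancestor`.
    while True:
        if ancestor is descendant:
            return False
        if descendant is OBJECT or descendant is PRIMITIVE:
            return True  # reached a root without meeting `ancestor`
        descendant = descendant.superclass


def incompatible(ancestor, descendant):
    if descendant is NIL:
        # nil is rejected only where a primitive class is expected.
        return ancestor.superclass is PRIMITIVE
    return not_compatible(ancestor, descendant)


INTEGER = ClassValue("Integer", PRIMITIVE)
LINK_LIST = ClassValue("LinkList", OBJECT)
```

Note that the test is directional: a LinkList may be used where an Object is expected, but not the reverse, which is why the specification checks both directions before deciding whether a dynamic check is needed.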

  SlowFieldAccess = LAMBDA (object: Memory; offset: Integer) -> Integer ;
    BEGIN
      RETURN Address(object) + FetchValue(FetchValue(object, 0, Memory), 5, Integer) + offset;
    END;
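SlowFieldAccess computes a field's address at run time by reading FieldStart (slot 5) from the object's class value, rather than relying on a superclass size known at translation time. The sketch below models memory as a flat Python list of cells; the addresses and the concrete FieldStart value are assumptions chosen only to make the arithmetic visible.

```python
CLASS_SLOT = 0        # slot 0 of an object holds its class value
FIELD_START_SLOT = 5  # slot 5 of a class value holds FieldStart


def slow_field_address(memory, obj_addr, offset):
    """Address of the field at `offset`, found via the object's class."""
    class_addr = memory[obj_addr + CLASS_SLOT]
    field_start = memory[class_addr + FIELD_START_SLOT]
    return obj_addr + field_start + offset


# Hypothetical layout: a class value at address 8, an instance at address 20.
memory = [0] * 32
CLASS_ADDR, OBJ_ADDR = 8, 20
memory[CLASS_ADDR + FIELD_START_SLOT] = 1   # fields begin after the class slot
memory[OBJ_ADDR + CLASS_SLOT] = CLASS_ADDR
```

When the superclass's TotalSize is already known during translation, the same sum can be folded into a constant, which is the faster alternative the IdExpr and AssignExpr actions select when their preconditions allow it.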

  BuildMetaClassValue = LAMBDA (meta_inherits: Memory; cmsgs: SymbolScope) -> Memory ;
    LOCALS meta_class_value: Memory;
    BEGIN
      meta_class_value := Allocate(6);
      StoreValue(meta_class_value, 0, MetaClassValue);
      StoreValue(meta_class_value, 1, meta_inherits);
      StoreValue(meta_class_value, 3, cmsgs);
      StoreValue(meta_class_value, 4, FetchValue(MetaClassValue, 4, Integer));
      StoreValue(meta_class_value, 5, FetchValue(MetaClassValue, 5, Integer));
      RETURN meta_class_value;
    END;

  BuildClassValue = LAMBDA (meta_class_value: Memory; inherits: Memory; imsgs: SymbolScope) -> Memory ;
    LOCALS class_value: Memory;
    BEGIN
      class_value := Allocate(6);
      StoreValue(class_value, 0, meta_class_value);
      StoreValue(class_value, 1, inherits);
      StoreValue(class_value, 3, imsgs);
      RETURN class_value;
    END;

  GetClassMethod = LAMBDA () -> Continuation ;
    LOCALS GetClass: Continuation;
    BEGIN
      GetClass := Call(LOCAL, 0, 0);
      AppendContinuation(GetClass, Call(DEREF, 1));
      AppendContinuation(GetClass, Call(PROCRETURN, 1));
      RETURN GetClass;
    END;

  IsKindOfMethod = LAMBDA () -> Continuation ;
    LOCALS IsKindOf: Continuation;
    BEGIN
      IsKindOf := Call(LOCAL, 0, 1);
      AppendContinuation(IsKindOf, Call(DEREF, 1));
      AppendContinuation(IsKindOf, Call(LOCAL, 0, 0));
      AppendContinuation(IsKindOf, Call(DEREF, 1));
      AppendContinuation(IsKindOf, Call(DEREF, 1)); -- value's class
      AppendContinuation(IsKindOf, Call(NotCompatible));
      AppendContinuation(IsKindOf, Call(BNOT));
      AppendContinuation(IsKindOf, Call(PROCRETURN, 1));
      RETURN IsKindOf;
    END;

  InheritsMethod = LAMBDA () -> Continuation ;
    LOCALS InheritsFrom: Continuation;
    BEGIN
      InheritsFrom := Call(LOCAL, 0, 1);
      AppendContinuation(InheritsFrom, Call(DEREF, 1));
      AppendContinuation(InheritsFrom, Call(LOCAL, 0, 0));
      AppendContinuation(InheritsFrom, Call(DEREF, 1));
      AppendContinuation(InheritsFrom, Call(NotCompatible));
      AppendContinuation(InheritsFrom, Call(BNOT));
      AppendContinuation(InheritsFrom, Call(PROCRETURN, 1));
      RETURN InheritsFrom;
    END;

  NewMethod = LAMBDA (class_value: Memory) -> Memory ;
    LOCALS NewValue: Memory;
    BEGIN
      NewValue := Allocate(FetchValue(class_value, 4, Integer));
      StoreValue(NewValue, 0, class_value);
      RETURN NewValue;
    END;
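NewMethod shows how the six-slot class value drives instance creation: slot 4 (TotalSize) gives the number of cells to allocate, and slot 0 of the new object is set to point back at its class. A hedged Python sketch, using lists for both class values and objects (an assumption made for illustration; the thesis's Memory domain is not a Python list):

```python
TOTAL_SIZE_SLOT = 4  # slot 4 of a class value holds TotalSize


def new_instance(class_value):
    """Allocate an object of the class's TotalSize and record its class."""
    obj = [None] * class_value[TOTAL_SIZE_SLOT]
    obj[0] = class_value  # slot 0: the object's class
    return obj


# Hypothetical class value: [its class, super class, methods, messages,
#                            TotalSize, FieldStart]
link_list_class = [None, None, {}, {}, 3, 1]
node = new_instance(link_list_class)
```

Fields that have not been assigned remain empty here, which loosely corresponds to the "set to nil by default" behaviour noted in the LinkList example.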

INITIALLY
  IF DeclareName(BaseScope, ObjectClassName, ObjectClass) THEN ENDIF;
  IF DeclareName(BaseScope, ClassClassName, ClassClass) THEN ENDIF;
  IF DeclareName(BaseScope, FindNameIndex("Integer"), IntegerClass) THEN ENDIF;
  IF DeclareName(BaseScope, FindNameIndex("Boolean"), BooleanClass) THEN ENDIF;

  SetProp(GetClassMessage, Index, 0);
  SetProp(GetClassMessage, ItsClass, ObjectClass);
  SetProp(GetClassMessage, ArgClasses, NewList(SymbolList));
  SetProp(GetClassMessage, Returns, ClassClass);
  SetProp(GetClassMessage, Stmts, GetClassMethod());

  SetProp(IsKindOfMessage, Index, 1);
  SetProp(IsKindOfMessage, ItsClass, ObjectClass);
  SetProp(IsKindOfMessage, ArgClasses, AppendElt(NewList(SymbolList), ClassClass));
  SetProp(IsKindOfMessage, Returns, BooleanClass);
  SetProp(IsKindOfMessage, Stmts, IsKindOfMethod());

  AddName(ObjectMsgs, FindNameIndex("class"), GetClassMessage);
  AddName(ObjectMsgs, FindNameIndex("isKindOf"), IsKindOfMessage);

  ObjectMethods \ [ 0 ~ GetClassMessage ];
  ObjectMethods \ [ 1 ~ IsKindOfMessage ];

  SetProp(ObjectClass, ClassName, ObjectClassName);
  SetProp(ObjectClass, ItsClass, ClassClass);
  StoreValue(ObjectClassValue, 0, ClassClassValue);
  SetProp(ObjectClass, Methods, ObjectMethods);
  StoreValue(ObjectClassValue, 2, ObjectMethods);
  SetProp(ObjectClass, Messages, ObjectMsgs);
  StoreValue(ObjectClassValue, 3, ObjectMsgs);
  SetProp(ObjectClass, MsgCount, 2);
  SetProp(ObjectClass, Environ, NewStore());
  StoreValue(ObjectClassValue, 5, FindOffset(GetProp(ObjectClass, Environ), 1));
  SetProp(ObjectClass, TotalSize, NextOffset(GetProp(ObjectClass, Environ)));
  StoreValue(ObjectClassValue, 4, GetProp(ObjectClass, TotalSize));
  SetProp(ObjectClass, Value, ObjectClassValue);

  SetProp(NewMessage, Index, 2);
  SetProp(NewMessage, ItsClass, ClassClass);
  SetProp(NewMessage, ArgClasses, NewList(SymbolList));
  SetProp(NewMessage, Returns, UseSelf);
  SetProp(NewMessage, Stmts, Call(NewMethod));

  SetProp(UseSelf, Value, UseSelfValue);

  SetProp(InheritsMessage, Index, 3);
  SetProp(InheritsMessage, ItsClass, ClassClass);
  SetProp(InheritsMessage, ArgClasses, AppendElt(NewList(SymbolList), ClassClass));
  SetProp(InheritsMessage, Returns, BooleanClass);
  SetProp(InheritsMessage, Stmts, InheritsMethod());

  AddName(ClassMsgs, FindNameIndex("class"), GetClassMessage);
  AddName(ClassMsgs, FindNameIndex("isKindOf"), IsKindOfMessage);
  AddName(ClassMsgs, FindNameIndex("new"), NewMessage);
  AddName(ClassMsgs, FindNameIndex("inheritsFrom"), InheritsMessage);

  ClassMethods \ [ 0 ~ GetClassMessage ];
  ClassMethods \ [ 1 ~ IsKindOfMessage ];
  ClassMethods \ [ 2 ~ NewMessage ];
  ClassMethods \ [ 3 ~ InheritsMessage ];

  SetProp(ClassClass, ClassName, ClassClassName);
  SetProp(ClassClass, ItsClass, MetaClass);
  StoreValue(ClassClassValue, 0, MetaClassValue);
  SetProp(ClassClass, Inherits, ObjectClass);
  StoreValue(ClassClassValue, 1, ObjectClassValue);
  SetProp(ClassClass, Methods, ClassMethods);
  StoreValue(ClassClassValue, 2, ClassMethods);
  SetProp(ClassClass, Messages, ClassMsgs);
  StoreValue(ClassClassValue, 3, ClassMsgs);
  SetProp(ClassClass, MsgCount, 4);
  StoreValue(ClassClassValue, 5, FetchValue(ObjectClassValue, 5, Integer));
  SetProp(ClassClass, TotalSize, GetProp(ObjectClass, TotalSize));
  StoreValue(ClassClassValue, 4, GetProp(ClassClass, TotalSize));
  SetProp(ClassClass, Value, ClassClassValue);

  SetProp(MetaClass, ItsClass, MetaClass);
  StoreValue(MetaClassValue, 0, MetaClassValue);
  SetProp(MetaClass, Inherits, ClassClass);
  StoreValue(MetaClassValue, 1, ClassClassValue);
  SetProp(MetaClass, Methods, ClassMethods);
  StoreValue(MetaClassValue, 2, ClassMethods);
  SetProp(MetaClass, Messages, ClassMsgs);
  StoreValue(MetaClassValue, 3, ClassMsgs);
  SetProp(MetaClass, MsgCount, GetProp(ClassClass, MsgCount));
  StoreValue(MetaClassValue, 4, GetProp(ClassClass, TotalSize));
  StoreValue(MetaClassValue, 5, FetchValue(ClassClassValue, 5, Integer));
  SetProp(MetaClass, Value, MetaClassValue);

  SetProp(Primitive, ItsClass, Primitive);
  StoreValue(PrimitiveValue, 0, PrimitiveValue);

  SetProp(BooleanClass, Inherits, Primitive);
  SetProp(BooleanClass, Messages, NewScope(SymbolScope));
  SetProp(BooleanClass, MsgCount, 0);
  SetProp(BooleanClass, Methods, NewMap(SymbolMap));
  SetProp(BooleanClass, ItsClass, Primitive);
  SetProp(BooleanClass, Value, BooleanClassValue);
  StoreValue(BooleanClassValue, 0, PrimitiveValue);
  StoreValue(BooleanClassValue, 1, PrimitiveValue);
  StoreValue(BooleanClassValue, 2, GetProp(BooleanClass, Methods));
  StoreValue(BooleanClassValue, 3, GetProp(BooleanClass, Messages));

  SetProp(IntegerClass, Inherits, Primitive);
  SetProp(IntegerClass, Messages, NewScope(SymbolScope));
  SetProp(IntegerClass, MsgCount, 0);
  SetProp(IntegerClass, Methods, NewMap(SymbolMap));
  SetProp(IntegerClass, ItsClass, Primitive);
  SetProp(IntegerClass, Value, IntegerClassValue);
  StoreValue(IntegerClassValue, 0, PrimitiveValue);
  StoreValue(IntegerClassValue, 1, PrimitiveValue);
  StoreValue(IntegerClassValue, 2, GetProp(IntegerClass, Methods));
  StoreValue(IntegerClassValue, 3, GetProp(IntegerClass, Messages));

GLOBALS ClassData: Symbol; NoOptimize : Boolean;

PRODUCTIONS

c_unit<< ^ >> ::= interface<< ^ >> {};

c_unit<< ^ >> ::= class<< ^ >> {};

interface<< ^ >> ::= tINTERFACE IDENTIFIER<< ^ name1>> opt_inherit<< ^ inherits>> opt_use<> tCLASS message_seq<> tINSTANCE message_seq<> tEND IDENTIFIER<< ^ name2>>
  LOCALS visible: SymbolScopeStack;
         cmsgs, imsgs: SymbolScope;
         meta_inherits, meta_class: Symbol;
{
  NewClass :
    BEGIN
      visible := EnterScope(BaseScope, FALSE, NewScope(SymbolScope));
      meta_inherits := GetProp(inherits, ItsClass);
      cmsgs := GetProp(meta_inherits, Messages);
      imsgs := GetProp(inherits, Messages);

      meta_class := NewObject(Symbol, Meta);
      SetProp(meta_class, Posn, GetPosition(2));
      SetProp(meta_class, Inherits, meta_inherits);
      SetProp(meta_class, ItsClass, MetaClass);

      NoOptimize := TRUE;
      ClassData := NewObject(Symbol, Class);
      SetProp(ClassData, Posn, GetPosition(2));
      SetProp(ClassData, Inherits, inherits);
      SetProp(ClassData, ItsClass, meta_class);

      SetProp(ClassData, ClassName, name1);
      IF DeclareName(visible, name1, ClassData) THEN
        NoteError("MultDecl", GetPosition(2), TRUE);
      ENDIF;

      IF InheritsSelf(inherits, name1) THEN
        NoteError("InheritsSelf", GetPosition(3), TRUE);
      ENDIF;
    END

  MetaClass :
    BEGIN
      SetProp(meta_class, Messages, cmsgs);
      SetProp(meta_class, MsgCount, cmsg_count);
      SetProp(meta_class, Value, BuildMetaClassValue(GetProp(meta_inherits, Value), cmsgs));
    END

  Interface :
    BEGIN
      SetProp(ClassData, Messages, imsgs);
      SetProp(ClassData, MsgCount, imsg_count);
      SetProp(ClassData, Visible, visible);
    END

  ClassValue :
    PRE HasProp(meta_class, Value)
    BEGIN
      SetProp(ClassData, Value, BuildClassValue(GetProp(meta_class, Value), GetProp(inherits, Value), imsgs));
    END

  InterfaceIds :
    BEGIN
      IF name1 <> name2 THEN
        NoteError("NameMismatch", GetPosition(10), TRUE);
      ENDIF;
    END
};

opt_inherit<< ^ ObjectClass>> ::= {};

opt_inherit<< ^ inherits>> ::= tINHERITS IDENTIFIER<< ^ name>>
  LOCALS found: Boolean;
         base, inherits: Symbol;
{
  CheckInherit :
    BEGIN
      found, base := LookupName(BaseScope, name);
      IF found THEN
        NoteError("CannotInheritBase", GetPosition(2), TRUE);
      ELSE
        inherits := ReadModule(FALSE, name, ".int", "ClassData", Symbol);
      ENDIF;
    END
};

opt_use<> ::=

opt_use<> ::= tUSE use list<>

use list<> : := IDENTIFIER<< ^ name>> LOCALS found: Boolean; base, use: Symbol;

UseId: BEGIN found, base := LookupName(BaseScope, name); IF found THEN NoteError ("CannotUseBase-, GetPosition (i), FALSE) ; ELSE

use : =

ReadModule(FALSE, name, ".int", "ClassData", Symbol); IF DeclareName(visible, name, use) THEN NoteError ("MultUse", GetPosition (I), TRUE) ; ENDIF; ENDIF; END }; use list<> ::= use llst<> ',' IDENTIFIER<< ^ name>> LOCALS found: Boolean; base, use: Symbol;

UseIdList : BEGIN found, base := LookupName(BaseScope, name); IF found THEN NoteError ("CannotUseBase", GetPosition (3) , FALSE) ; ELSE

use : =

ReadModule(FALSE, name, ".int", "ClassData", Symbol); IF DeclareName(visible, name, use) THEN NoteError ("MultUse", GetPosition (3), TRUE) ; ENDIF; ENDIF; END }; message_seq<>

message_seq<> ::= message_seq<> tMESSAGE IDENTIFIER<< ^ name>> ' (' formal_seq<> ' ) ' opt_returns<> LOCALS message, super_msg, other: Symbol; new count : Integer; found, multiple: Boolean; super_args : SymbolList;

MessageOverride : BEGIN 122

found, super_msg := FindName(messages, name); IF found THEN IF GetProp(super_msg, ItsClass) = default THEN NoteError ("MultDecl", GetPosition (3), TRUE) ; ELSE super_args := GetProp (super_msg, ArgClasses) ; ENDIF; ELSE super_args := NoSuperArgs; ENDIF; multiple, other := LookupName (visible, name) ; IF multiple THEN NoteError ("MultDecl", GetPosition (3), TRUE) ; ENDIF; END

Message : BEGIN message := NewObject (Symbol, Method) ; SetProp (message, Posn, GetPosition (3)) ; SetProp (message, ItsClass, ClassData) ; SetProp (message, ArgClasses, arg_classes) ; SetProp (message, Returns, returns) ; END

MessageDeclare : ONEOF PRE NOT found BEGIN SetProp(message, Index, count); new count := count + i; AddName (messages, name, message) ; END

PRE found BEGIN SetProp(message, Index, GetProp(super_msg, Index)); new count := count; IF _ncompatible (GetProp (GetProp (super_msg, Returns) , Value), GetProp (returns, Value) ) THEN NoteError ("BadOverride", GetPolition (7), TRUE) ; ENDIF; AddName (messages, name, message) ; END ENDONEOF

CheckSuper : BEGIN IF found AND (ListLength(check_super) <> 0) THEN NoteError ("TooFewArgsOverride", GetPosition (6) , TRUE) ; ENDIF; END }; formal_seq<> ::= (}; formal_seq<> : := formal list<> (}; formal list<> ::= class id<> LOCALS check_args : SymbolList ; ( 123

FormalOverride : BEGIN IF super_args = NoSuperArgs THEN check_args := NoSuperArgs; ELSEIF ListLength(super_args) = 0 THEN NoteError ("TooManyArgsOverride-, GetPosition (i), TRUE) ; ELSEIF Incompatible (GetProp (class, Value), GetProp (ListHead(super_args), Value) ) THEN NoteError ("BadOverride", GetPosition (I), TRUE) ; ELSE

check_args := ListTail (super_args) ; ENDIF; END }; formal_list<> ::= formal_list<> ' e ' class id<> LOCALS check_args : SymbolList;

FormalListOverride : BEGIN IF check_super = NoSuperArgs THEN check_args := NoSuperArgs; ELSEIF ListLength(check_super) = 0 THEN NoteError ("TooManyArgsOverride", GetPosition (3), TRUE) ; ELSEIF Incompatible (GetProp (class, Value), GetProp (ListHead (check_super) , Value) ) THEN NoteError ("BadOverride", GetPosition (3) , TRUE) ; ELSE check_args := ListTail (check_super) ; ENDIF; END }; opt_returns<> ::= (}; opt_returns<> : := ':' class id<>

class<< ^ >> ::= tIMPLEMENTATION IDENTIFIER<< ^ namel>> opt_u se<> tREPRESENTATION field_seq<> tCLASS method_seq<> tINSTANCE method_seq<> tEND IDENTIFIER<< ^ name2>> LOCALS interface, meta_class, inherits, meta inherits: Symbol; interface_visible, class visible: SymbolScopeStack; cmethods, imethods : SymbolMap; environ: Store; fld_start, total_size : Integer;

ClassInterface : BEGIN interface := ReadModule(TRUE, namel, ".int", "ClassData", Symbol); meta_class := GetProp(interface, ItsClass) ; interface visible := GetProp (interface, Visible) ; class_visible := EnterScope (EnterScope (interface_visible, FALSE, GetProp (met a_class, Messages) ) , FALSE, GetProp (interface, Messages) ) ; environ := NewStore () ; inherits := GetProp(interface, Inherits) ; meta inherits := GetProp(meta_class, Inherits) ; cmethods := BuildMethodTable(GetProp(meta_inherits, Methods), GetProp (meta_inherits, MsgCount) ) ; imethods := BuildMethodTable(GetProp(inherits, Methods), GetProp (inherits, MsgCount) ) ; END

Class: BEGIN SetProp (interface, Methods, imethods) ; SetProp(interface, Environ, environ); StoreValue(GetProp(interface, Value), 2, imethods); END

MetaClassImpl : BEGIN SetProp (meta_class, Methods, cmethods) ; StoreValue (GetProp (meta_class, Value), 2, cmethods) ; END

ClassIds: BEGIN IF namel <> name2 THEN NoteError ("NameMismatch", GetPosition (ii), TRUE) ; ENDIF; END

ClassAllocate: PRE HasProp (GetProp (interface, Inherits), TotalSize) BEGIN fld start := GetProp (GetProp (interface, Inherits), TotalSize) ;

total -- size := fld m start + NextOffset(environ); SetProp(interface, TotalSize, total_size); StoreValue (GetProp (interface, Value), 4, total_size) ; StoreValue (GetProp (interface, Value), 5, fld_start) ; END ); field_seq<> ::= (); field_seq<> ::= field_seq<> field_ids<> '-' class id<> (); field_ids<> ::= IDENTIFIER<< ^ name>> LOCALS field: Symbol;

Field: BEGIN IF is local THEN field := NewObJect (Symbol, Local) ; ELSE field := NewObject (Symbol, Field) ; ENDIF; SetProp (field, Posn, GetPosition (I)) ; SetProp(field, ItsClass, ofclass); SetProp (field, Offset, FindOffset (environ, i) ) ; 125

IF DeclareName(visible, name, field) THEN NoteError ("MultDecl", GetPosition (I), TRUE) ; ENDIF; END }; field_ids<> ::= field_ids<> ' ' IDENTIFIER<< ^ name>> LOCALS field: Symbol;

FieldList : BEGIN IF is local THEN field := NewObJect (Symbol, Local) ; ELSE field := NewObJect (Symbol, Field) ; ENDIF; SetProp (field, Posn, GetPosition (3)) ; SetProp(field, ItsClass, ofclass); SetProp(field, Offset, FindOffset (environ, i) ) ; IF DeclareName(visible, name, field) THEN NoteError ("MultDecl", GetPosition (3) , TRUE) ; ENDIF; END }; class id<> : := IDENTIFIER<< ^ name>> LOCALS found: Boolean; class, found_sym: Symbol;

ClassId: BEGIN found, found_sym := LookupName (visible, name) ; IF found THEN IF IsKind(found_sym, Class) THEN class := found_sym; ELSE NoteError ("ClassExpected", GetPosition (I), TRUE) ; ENDIF; ELSE NoteError ("UndeclClassId", GetPosition (i), TRUE) ; ENDIF; END }; method_seq<> ::=

method_seq<> ::= method_seq<> tMETHOD IDENTIFIER<< ^ namel>> ' (' arg_seq<> ' ) ' opt_returns<> opt_var_seq<> tBEGIN stmt_seq<> tEND IDENTIFIER<< ^ name2>> LOCALS new_count, self_offset: Integer; method_scope : SymbolScopeStack; check_args: SymbolList; method env: Store; found: Boolean; method: Symbol; 126

semantics : Continuation; { MethodLookup: ONEOF PRE found WHERE found, method := LookupName(visible, namel); BEGIN IF HasProp(method, Struts) THEN NoteError ("MultDecl", GetPosition (3), TRUE) ; ELSE check_args := GetProp (method, ArgClasses) ; new count := current count; ENDIF; END

BEGIN method := NewObject (Symbol, Method) ; SetProp(method, Posn, GetPosition(3)); SetProp(method, Index, current_count); new count := current count + i; SetProp(method, ItsClass, ClassData); check_args := NoSuperArgs; IF DeclareName(visible, namel, method) THEN NoteError ("MultDecl2 ", GetPosition (3), TRUE) ; ENDIF; END ENDONEOF

MethodMe ssage : BEGIN methods \ [ GetProp(method, Index) ~ method ] ; method env := NewStore(); self offset := FindOffset (method_env, I) ; method_scope := EnterScope (visible, FALSE, NewScope (SymbolScope)) ; END

Method: BEGIN SetProp (method, ArgClasses, arg_classes) ; SetProp(method, Environ, mathod_env); SetProp (method, Visible, method_scope) ; semantics := Call (ENTER, ListLength (arg_classes) + I, NextOffset (method_env)) ; AppendContinuation (semantics, Label (6)) ; AppendContinuation(semantics, Call (LOCAL, 0, 0) ) ; AppendContinuation (semantics, Call (DEREF, I) ) ; AppendContinuation (semantics, Call (PROCRETURN, I) ) ; SetProp(method, Stmts, semantics); END

MethodReturn: BEGIN IF found THEN IF returns <> GetProp(method, Returns) THEN NoteError ("ReturnMismatch", GetPosition (7), TRUE) ; ENDIF; ELSE SetProp (method, Returns, returns) ; ENDIF; END

MethodIds : BEGIN IF namel <> name2 THEN NoteError ("NameMismatch", GetPosition (12), TRUE) ; END IF; 127

END

CheckHeader: BEGIN

IF found AND (ListLength(rem_check_args) <> 0) THEN NoteError ("TooFewArgsHeader-, GetPosition (6), TRUE) ; ENDIF; END }; opt_var_seq<> ::=

opt_var_seq<> ::= tVAR field_seq<> (}; arg_seq<> ::=

arg_seq<> : := arg_list<>

arg_list<> ::= arg_ids<> ''' class id<>

arg_list<> ::= arg_list<> ';' arg_ids<> ''' class id<>

arg_ids<> : := IDENTIFIER<< ^ name>> LOCALS arg: Symbol; rem_args : SymbolList;

ArgFormalMat ch: BEGIN

IF check_args - NoSuperArgs THEN rem_args := NoSuperArgs; ELSEIF ListLength(check_args) = 0 THEN NoteError ("TooManyArgsHeader", GetPosition (i) , TRUE) ; ELSEIF ofclass <> ListHead(check_args) THEN NoteError ("ArgMismatchHeader,,, GetPosition (i) , TRUE) ; ELSE

rem_args := ListTail (check_args) ; ENDIF; END

Arg: BEGIN arg := NewObJect (Symbol, Parameter) ; SetProp (arg, Posn, GetPosition (I)) ; SetProp (arg, ItsClass, ofclass) ; SetProp(arg, Offset, FindOffset (environ, i) ) ; IF DeclareName(visible, name, arg) THEN NoteError ("MultDecl", GetPosition (i), TRUE) ; 128

ENDIF; END }; arg_ids<> ::= arg_ids<> ' ' IDENTIFIER<< ^ name>> LOCALS arg: Symbol; rem_check_args : SymbolList;

ArgLi stFormalMat ch: BEGIN IF rem_args = NoSuperArgs THEN rem_check_args := NoSuperArgs; ELSEIF ListLength(rem_args) = 0 THEN NoteError ("TooManyArgsHeader", GetPosition (3) , TRUE) ; ELSEIF ofclass <> ListHead(rem_args) THEN NoteError ("ArgMismatchHeader", GetPosition (3) , TRUE) ; ELSE rem_check_args := ListTail (rem_args) ; ENDIF; END

ArgLi st : BEGIN arg := NewObJect (Symbol, Parameter) ; SetProp (arg, Posn, GetPosition (3)) ; SetProp(arg• ItsClass, ofclass); SetProp(arg, Offset• FindOffset (environ, i) ) ; IF DeclareName(visible, name, arg) THEN NoteError ("MultDecl", GetPosition (3), TRUE) ; ENDIF; END };

stmt_seq<> ::= (};

stmt_seq<> ::= stmt_seq<> statement<> ';'

StmtSeq: BEGIN Execute(1); IF is_expr THEN Execute(Call(INCSPI, -1)); ENDIF; Execute(2); END };

statement<> ::= tRETURN return_expr<>
{
  ReturnStmt :
    BEGIN
      Execute(1);
      IF returns = NoReturnAllowed THEN
        NoteError("NoReturnAllowed", GetPosition(1), TRUE);
      ELSEIF Incompatible(returns, class) THEN
        IF Incompatible(class, returns) THEN
          NoteError("ReturnMismatch", GetPosition(2), TRUE);
        ELSE
          NoteError("DynReturnCheck", GetPosition(2), FALSE);
          IF Incompatible(returns, FetchValue(Execute(Call(COPY, 1), Memory), 0, Memory)) THEN
            NoteError("ReturnMismatch", GetPosition(2), TRUE);
          ENDIF;
        ENDIF;
      ENDIF;
      Execute(Call(PROCRETURN, 1));
    END
};

return_expr<> ::=
{
  NoReturnExpr :
    BEGIN
      Execute(Call(LOCAL, 0, 0));
      Execute(Call(DEREF, 1));
    END
};

return_expr<> ::= expression<>
{
  ReturnExpr :
    BEGIN
      Execute(1);
    END
};

statement<> ::= expression<>
{
  ExprStmt :
    BEGIN
      Execute(1);
    END
};

statement<> ::= tWHILE expression<> tDO stmt_seq<> tEND
{
  WhileCheck :
    BEGIN
      IF Incompatible(BooleanClassValue, class) THEN
        NoteError("BoolExpected", GetPosition(2), TRUE);
      ENDIF;
    END

  WhileStmt :
    BEGIN
      IF Execute(1, Boolean) THEN
        Execute(2);
        IF is_expr THEN
          Execute(Call(INCSPI, -1));
        ENDIF;
        Execute(0);
      ENDIF;
    END
};

statement<> ::= tIF expression<> tTHEN stmt_seq<> elseif_part<> tEND
{
  IfCheck :
    BEGIN
      IF Incompatible(BooleanClassValue, class) THEN
        NoteError("BoolExpected", GetPosition(2), TRUE);
      ENDIF;
    END

  IfStmt :
    BEGIN
      IF Execute(1, Boolean) THEN
        Execute(2);
        IF is_expr THEN
          Execute(Call(INCSPI, -1));
        ENDIF;
      ELSE
        Execute(3);
      ENDIF;
    END
};

elseif_part<> ::= tELSEIF expression<> tTHEN stmt_seq<> elseif_part<>
{
  ElseifCheck :
    BEGIN
      IF Incompatible(BooleanClassValue, class) THEN
        NoteError("BoolExpected", GetPosition(2), TRUE);
      ENDIF;
    END

  ElseifPart :
    BEGIN
      IF Execute(1, Boolean) THEN
        Execute(2);
        IF is_expr THEN
          Execute(Call(INCSPI, -1));
        ENDIF;
      ELSE
        Execute(3);
      ENDIF;
    END
};

elseif_part<> ::= tELSE stmt_seq<>
{
  ElsePart :
    BEGIN
      Execute(1);
      IF is_expr THEN
        Execute(Call(INCSPI, -1));
      ENDIF;
    END
};

elseif_part<> ::= {};

call<> ::= tNIL
{
  NilExpr :
    BEGIN
      Evaluate(0);
    END
};

call<> ::= tFALSE
{
  FalseExpr :
    BEGIN
      Evaluate(0);
    END
};

call<> ::= tTRUE
{
  TrueExpr :
    BEGIN
      Evaluate(1);
    END
};

call<> ::= tSELF
{
  SelfExpr :
    BEGIN
      Execute(Call(LOCAL, 0, 0)); -- self addr
      Execute(Call(DEREF, 1));    -- self
    END
};

receiver<> ::= call<>
{
  CallReceiver :
    BEGIN
      Execute(1);
    END
};

receiver<> ::= tSUPER
{
  SuperReceiver :
    BEGIN
      Execute(Call(LOCAL, 0, 0)); -- self addr
      Execute(Call(DEREF, 1));    -- self's class addr
      Execute(Call(DEREF, 1));    -- self's class
      Execute(Call(IADDI, 1));    -- super class addr
    END
};

call<> ::= '(' expression<> ')'
{
  ParenExpr :
    BEGIN
      Execute(1);
    END
};

call<> ::= CONSTANT<< ^ value, class>>
{
  ConstantExpr :
    BEGIN
      Evaluate(value);
    END
};

expression<> ::= expression<> BINARYOP<< ^ binary_op, formal1, formal2, result>> expression<>
{
  BinOpCheck1 :
    BEGIN
      IF Incompatible(formal1, arg1) THEN
        NoteError("BinClassMismatch1", GetPosition(1), TRUE);
      ENDIF;
    END

  BinOpCheck2 :
    BEGIN
      IF Incompatible(formal2, arg2) THEN
        NoteError("BinClassMismatch2", GetPosition(3), TRUE);
      ENDIF;
    END

  BinOpExpr :
    BEGIN
      Execute(1);
      Execute(3);
      Execute(binary_op);
    END
};

expression<> ::= expression<> EQUALS expression<>

EqualExpr: BEGIN Execute (i) ; Execute (2) ; IF Incon_gatible(arg2, argl)AND Incompatible(argl, arg2) THEN NoteError ("CannotBeTrue", GetPosition (2), FALSE) ; ENDIF; Execute (Call (EQ, I) ) ; END };

expression<> : := expression<> NOTEQUALS expression<> { NotEqualExpr: BEGIN Execute (i) ; Execute (2) ; IF Incompatible (arg2, argl) AND Incompatible (argl, arg2) THEN NoteError ("CannotBeFalse", GetPosition (2), FALSE) ; ENDIF; Execute (Call (NE, i) ) ; END };

expression<> ::= UNARYOP<< ^ unary_op, formal, result>> expression<>

UnOpCheck: BEGIN IF Incompatlble(formal, arg) THEN 133

NoteError ("UnClassMismatch", GetPosition (2), TRUE) ; ENDIF; END

UnOpExpr : BEGIN Execute (2) ; Execute (unary_op) ; END ); call<> ::_ IDENTIFIER<< ^ name>> LOCALS found: Boolean; sym: Symbol; expr_class: Memory; { IdLookup : ONEOF PRE found WHERE found, sym := LookupName(visible, name); BEGIN IF IsKind(sym, Method) THEN NoteError ("WrongIdClass", GetPosition (I), TRUE) ; ELSE expr_class := GetProp(GetProp(sym, ItsClass), Value); ENDIF; END

PRE PASS () = LASTPASS BEGIN NoteError ("UndeclId", GetPosition (i), TRUE) ; END ENDONEOF

IdExpr: ONEOF PRE (IsKind(sym, Local) OR IsKind(sym, Parameter)) BEGIN Execute(Call (LOCAL, 0, GetProp(sym, Offset) )) ; Execute (Call (DEREF, i) ) ; END

PRE IsKind(sym, Field) AND HaBProp (GetProp (ClassData, Inherits), TotalSize) BEGIN Execute (Call (LOCAL, 0, 0) ) ; Execute (Call (DEREF, 1) ) ; Execute (Call (IADDI, GetProp (GetProp (ClassData, Inherits), TotalSize) + GetProp(sym, Offset) ) ) ; Execute (Call (DEREF, i) ) ; END

PRE IsKind(sym, Field) AND NoOptimize BEGIN Execute (Call (LOCAL, 0, 0) ) ; Evaluate (SlowFieldAccess (Execute (Call (DEREF, i) , Memory) , GetProp(sym, Offset) )) ; Execute (Call (DEREF, I) ) ; END

PRE IsKind(sym, Class) BEGIN Evaluate (GetProp (sym, Value) ) ; END ENDONEOF }; expression<> : := IDENTIFIER<< ^ name>> ASSIGN expression<> LOCALS found: Boolean; s_nn: Symbol; class : Memory; ( AssignLookup : ONEOF PRE found WHERE found, sym := LookupName(visible, name) ; BEGIN IF NOT (IsKind(sym, Local) OR IsKind(sym, Field)) THEN NoteError ("VarExpected", GetPosition (i), TRUE) ; ELSE class := GetProp(GetProp(sym, ItsClass), Value) ; END IF; END

PRE PASS () = LASTPASS BEGIN NoteError ("UndeclId", GetPosition (1), TRUE) ; END ENDONEOF

AssignExpr : ONEOF PRE IsKind(sym, Local) BEGIN Execute (Call (LOCAL, 0, GetProp (sym, Offset) ) ) ; StoreValue (Execute (Call (COPY, 1) , Memory), 0, Execute (2, Memory) ) ; Execute (Call (DEREF, 1) ) ; END

PRE IsKind(sym, Field) AND HasProp (GetProp (ClassData, Inherits), TotalSize) BEGIN Execute (Call (LOCAL, 0, 0) ) ; Execute (Call (DEREF, 1) ) ; Execute (Call (IADDI, GetProp (GetProp (ClassData, Inherits) , TotalSize) + GetProp(sym, Offset) )) ; StoreValue (Execute (Call (COPY, 1) , Memory) , 0, Execute (2, Memory) ) ; Execute (Call (DEREF, 1) ) ; END

PRE IsKind(sym, Field) AND NoOptimize BEGIN Execute(Call(LOCAL, 0, 0)); Evaluate (SlowFieldAccess (Execute (Call (DEREF, 1) , Memory) , GetProp (sym, Offset) ) ) ; StoreValue (Execute (Call (COPY, 1) , Memory) , 0, Execute (2, Memory) ) ; Execute (Call (DEREF, 1) ) ; END ENDONEOF

AssignCheck : BEGIN IF Incompatible(class, rhs) THEN IF Incompatible(rhs, class) THEN NoteError ("AssignMismatch", GetPosition (3) , TRUE) ; ELSE NoteError ("DynAssignCheck", GetPosition (3), FALSE) ; IF Incompatible (class, FetchValue (Execute (Call (COPY, 1), Memory) , 0, Memory) ) THEN NoteError ("AssignMismatch", GetPosition (3), TRUE) ; ENDIF; ENDIF; ENDIF; END };

expression<> ::= call<> { CallExpr: BEGIN Execute (1) ; END };

call<> ::= receiver<> DOT IDENTIFIER<< ^ name>> '(' actual_seq<> ')' LOCALS static, dynamic: Boolean; message, method: Symbol; self, returns: Memory; check_args : SymbolList; index : Integer;

MsgLookup: BEGIN Execute (1) ; IF Execute(Call(COPY, 1), Integer) = 0 THEN NoteError ("Nil value", GetPosition (1) , TRUE) ; ENDIF; Execute (Call (COPY, 1) ) ; self := Execute(Call(DEREF, 1), Memory) ; static, message := FindName(FetchValue(class, 3, SymbolScope), name); IF static THEN check_args := GetProp(message, ArgClasses); index := GetProp (message, Index) ; returns := GetProp (GetProp (message, Returns), Value) ; ELSE NoteError ("DynMethodLookup", GetPosition (3), FALSE) ; returns := ObjectClassValue; dynamic, method := FindName(FetchValue(self, 3, SymbolScope), name); IF dynamic THEN check_args := GetProp (method, ArgClasses) ; index := GetProp (method, Index) ; ELSE NoteError ("Unknown message", GetPosition (3) , TRUE) ; ENDIF; ENDIF; END

MsgExpr : BEGIN Execute (3) ; IF ListLength(rem_args) <> 0 THEN NoteError ("TooFewActuals", GetPosition (6), TRUE) ; ENDIF; Evaluate (GetProp (FetchValue (self, 2, SymbolMap) [ index ], Stmts) ) ; Execute (Call (CALLSTK)) ; END

}; actual_seq<> ::=

actual_seq<> ::= actual_list<>

ActualSeq: BEGIN Execute (1) ; END };

actual_list<> ::= expression<> LOCALS this_arg: Memory;

Actual : BEGIN Execute (1) ; IF ListLength(check_args) = 0 THEN NoteError ("TooManyActuals", GetPosition (1), TRUE) ; ELSE this_arg := GetProp (ListHead (check_args) , Value) ; IF Incompatible(this_arg, class) THEN IF Incompatible(class, this_arg) THEN NoteError ("ActualMismatch", GetPosition (1), TRUE) ; ELSE NoteError ("DynActualCheck", GetPosition (1), FALSE) ; IF Incompatible (this_arg, FetchValue (Execute (Call (COPY, 1) , Memory) , 0, Memory) ) THEN NoteError ("ActualMismatch", GetPosition(1), TRUE) ; ENDIF; ENDIF; ENDIF; ENDIF; END };
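The Actual action above uses a three-way pattern that also appears in AssignCheck: accept the argument statically, reject it statically, or note a dynamic check and test the run-time class. The following Python sketch illustrates that decision only. It assumes, purely for illustration, that Incompatible(expected, actual) holds when actual is neither expected nor one of its subclasses; the class table and the names `is_subclass`, `incompatible`, and `classify` are hypothetical and do not come from the specification.

```python
# Hypothetical single-inheritance class table: class name -> parent name.
PARENTS = {"Object": None, "Shape": "Object", "Circle": "Shape", "Text": "Object"}

def is_subclass(cls, ancestor):
    """True if cls is ancestor or inherits from it (assumed notion of compatibility)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENTS[cls]
    return False

def incompatible(expected, actual):
    # Assumed reading of Incompatible(expected, actual) in the specification.
    return not is_subclass(actual, expected)

def classify(formal, static_actual):
    """Return 'static-ok', 'static-error', or 'dynamic-check' for one argument."""
    if not incompatible(formal, static_actual):
        return "static-ok"          # no error is noted
    if incompatible(static_actual, formal):
        return "static-error"       # corresponds to "ActualMismatch"
    return "dynamic-check"          # corresponds to "DynActualCheck"
```

Under these assumptions, only the third outcome pays for a run-time test, which is the elastic behaviour the specification aims for.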

actual_list<> ::= actual_list<> ',' expression<> LOCALS this_arg: Memory;

ActualList : BEGIN Execute (1) ; Execute (2) ; IF ListLength(rem_args) = 0 THEN NoteError ("TooManyActuals", GetPosition (3) , TRUE) ; ELSE this_arg := GetProp (ListHead (rem_args), Value) ; IF Incompatible(this_arg, class) THEN IF Incompatible(class, this_arg) THEN NoteError ("ActualMismatch", GetPosition (3), TRUE) ; ELSE NoteError ("DynActualCheck", GetPosition (3), FALSE) ; IF Incompatible (this_arg, FetchValue (Execute (Call (COPY, 1), Memory) , 0, Memory) ) THEN NoteError ("ActualMismatch", GetPosition (3), TRUE) ; ENDIF; ENDIF; ENDIF; ENDIF; END };

Appendix F Modula-2 Specification

DOMAINS TypeSpec = OBJECT ( Primitive, Enum, Range, Array, Pointer, Record, Routine, Module )

Size: Integer;

WHERE Enum => Count: Integer; Constants: SymbolScope;

WHERE Range => StartValue, StopValue: Integer; BaseType: TypeSpec;

WHERE Array => IndexType, BaseType: TypeSpec;

WHERE Pointer => BaseType: TypeSpec;

WHERE Record => Fields: SymbolScope; Environ: Store;

WHERE Routine => BaseType: TypeSpec; -- return type
Formals: SymbolList;

WHERE Module => Fields: SymbolScope; END;

Symbol = OBJECT ( Constant, Type, Variable, Field, Parameter, Routine, Module )

Posn: Position;

WHERE Constant => ItsType: TypeSpec; Value: Integer;

WHERE Type => ItsType : TypeSpec;

WHERE Variable => ItsType: TypeSpec; Offset: Integer; NestLevel: Integer; InModule: Symbol;

WHERE Field => ItsType: TypeSpec; Offset: Integer;

WHERE Parameter => ItsType: TypeSpec; Offset: Integer; ByReference: Boolean; IsArrayOf: Boolean;

WHERE Routine => ItsType: TypeSpec; -- signature
Environ: Store; NameScope: SymbolScopeStack; Code: Continuation;

WHERE Module => NameScope: SymbolScopeStack; Environ: Store; Exported: SymbolScope; Imported: SymbolList; Code: Continuation; ItsType: TypeSpec; Offset: Integer; END;

SymbolList = LIST OF Symbol; SymbolScope = SCOPE OF Symbol; SymbolScopeStack = SCOPESTACK OF Symbol;

TERMINALS IDENTIFIER<< ^ Name>> CONSTANT<< ^ Integer, TypeSpec>> tDEFINITION tIMPLEMENTATION tMODULE tEND tIMPORT tFROM tEXPORT tQUALIFIED tCONST tTYPE tPOINTER tTO tARRAY tOF tRECORD tVAR tPROCEDURE tBEGIN tEXIT tLOOP tWHILE tDO tFOR tREPEAT tUNTIL tIF tTHEN tELSIF tELSE tNOT RANGE ASSIGN

NONTERMINALS c_unit<< ^ >>

defn_mod<< ^ >> impl_mod<< ^ >> main_mod<< ^ >>

local_mod<>

import_seq<> import<> import_ids<> opt_export<> opt_qualified<< ^ Boolean>> export_ids<> defn_seq<> defn<> opt_formals<> opt_formal_list<> formal_list<> formal<> opt_byvar<< ^ Boolean>> opt_arrayof<< ^ Boolean>> formal_ids<> opt_return<> decl_seq<> decl<> const_defn_seq<> const_defn<> var_decl_seq<> type_defn_seq<> type_defn<> type_spec<> qual_type_id<> module_qual<> enum_id_list<> index_base<> bounded_type<> field_list<> field_decl<> field_id_list<> type_decl_seq<> type_decl<> const_expr<> qual_const_id<> opt_block<> stmt_list<> statement<> elsif_part<> expression<> designator<> array_ref<> call<> opt_actuals<> actual_list<>

START c_unit

CONSTANTS BooleanType = NewObject(TypeSpec, Enum); BooleanSym = NewObject(Symbol, Type);

IntegerType = NewObject(TypeSpec, Primitive); IntegerSym = NewObject(Symbol, Type);

RealType = NewObject(TypeSpec, Primitive); RealSym = NewObject(Symbol, Type);

BooleanConstants = NewScope (SymbolScope) ; TRUESym = NewObject (Symbol, Constant) ; FALSESym = NewObject (Symbol, Constant) ;

BaseScope = NewScopeStack(SymbolScopeStack, NewScope (SymbolScope) , TRUE) ;

BuildFormals = AppendElt (NewList (SymbolList), TRUESym) ;

NoType = NewObject (TypeSpec, Primitive) ;

NoExitAllowed = Call (NOOP) ;

TypesMatch = LAMBDA (Type1, Type2: TypeSpec) -> Boolean ; BEGIN RETURN Type1 = Type2; END;

IsScalar = LAMBDA (Typ: TypeSpec; CanBeInteger: Boolean) -> Boolean ; BEGIN IF IsKind(Typ, Enum) OR IsKind(Typ, Range) THEN RETURN TRUE; ELSEIF IsKind(Typ, Primitive) AND CanBeInteger THEN RETURN Typ = IntegerType; ELSE RETURN FALSE; ENDIF; END;
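As a reading aid, the IsScalar predicate above can be restated in Python. This is a minimal sketch under stated assumptions: the `TypeSpec` dataclass, its `kind` field, and the constant names are hypothetical stand-ins for the specification's objects, and the test `Typ = IntegerType` is assumed to compare object identity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeSpec:
    kind: str        # one of the kinds listed in the TypeSpec domain
    name: str = ""   # illustrative label only

# Stand-ins for the IntegerType and RealType constants of the specification.
INTEGER_TYPE = TypeSpec("Primitive", "INTEGER")
REAL_TYPE = TypeSpec("Primitive", "REAL")

def is_scalar(typ, can_be_integer):
    """Mirror of IsScalar: enumerations and ranges are always scalar;
    a primitive type counts only if it is INTEGER and integers are allowed."""
    if typ.kind in ("Enum", "Range"):
        return True
    if typ.kind == "Primitive" and can_be_integer:
        return typ is INTEGER_TYPE   # identity test, assumed meaning of Typ = IntegerType
    return False
```

With this reading, REAL is never scalar, and INTEGER is scalar only where the caller permits it.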

INITIALLY AddName (BooleanConstants, FindNameIndex ("FALSE"), FALSESym) ; AddName (BooleanConstants, FindNameIndex ("TRUE"), TRUESym) ;

SetProp (FALSESym, Value, 0) ; SetProp(FALSESym, ItsType, BooleanType); IF DeclareName(BaseScope, FindNameIndex("FALSE"), FALSESym) THEN ENDIF; SetProp (TRUESym, Value, 1) ; SetProp (TRUESym, ItsType, BooleanType) ; IF DeclareName (BaseScope, FindNameIndex ("TRUE"), TRUESym) THEN ENDIF;

SetProp(BooleanType, Size, 1); SetProp(BooleanType, Count, 2); SetProp (BooleanType, Constants, BooleanConstants) ;

SetProp(IntegerType, Size, 1); SetProp(RealType, Size, 1);

SetProp (BooleanSym, ItsType, BooleanType) ; IF DeclareName(BaseScope, FindNameIndex("BOOLEAN"), BooleanSym) THEN ENDIF; SetProp(IntegerSym, ItsType, IntegerType); IF DeclareName(BaseScope, FindNameIndex("INTEGER"), IntegerSym) THEN ENDIF; SetProp (RealSym, ItsType, RealType) ; IF DeclareName (BaseScope, FindNameIndex ("REAL"), RealSym) THEN ENDIF;

GLOBALS ModuleData: Symbol;

PRODUCTIONS c_unit<< ^ >> ::= defn_mod<< ^ >>

c_unit<< ^ >> ::= impl_mod<< ^ >> {};

c_unit<< ^ >> ::= main_mod<< ^ >> {};

defn_mod<< ^ >> ::= tDEFINITION tMODULE IDENTIFIER<< ^ name1>> ';' import_seq<> defn_seq<> tEND IDENTIFIER<< ^ name2>> '.' LOCALS decl_scope : SymbolScopeStack; var_store: Store; exported: SymbolScope; mod_type : TypeSpec; { DefnBegin: BEGIN ModuleData := NewObject (Symbol, Module) ; exported := NewScope (SymbolScope) ; mod_type := NewObject (TypeSpec, Module) ; SetProp (mod_type, Fields, exported) ; decl_scope := EnterScope (BaseScope, FALSE, exported) ; var_store := NewStore() ; END

DefnIds : BEGIN IF name1 <> name2 THEN NoteError ("NameMismatchDefn", GetPosition (7) , TRUE) ; ENDIF; END

DefnEnd: BEGIN SetProp (ModuleData, Posn, GetPosition (3)) ; SetProp(ModuleData, NameScope, decl_scope); SetProp(ModuleData, Environ, var_store); SetProp(ModuleData, Exported, exported); SetProp(ModuleData, Imported, imports); SetProp (ModuleData, ItsType, mod_type) ; END };

import_seq<> ::= import_seq<> import<>

import_seq<> ::= {};

import<> ::= tFROM IDENTIFIER<< ^ name>> tIMPORT import_ids<> ';' LOCALS found: Boolean; entry: Symbol; globals : SymbolList; module_scope : SymbolScopeStack; { FromLookup : BEGIN IF decl_scope <> look_scope THEN found, entry := LookupName(look_scope, name); IF found THEN IF IsKind(entry, Module) THEN module_scope := GetProp (entry, NameScope) ; ELSE NoteError ("FromModule?", GetPosition (2), TRUE) ; ENDIF; ELSE NoteError ("FromNotFound", GetPosition (2), TRUE) ; ENDIF; globals := uses; ELSE entry := ReadModule (FALSE, name, ".sym", "ModuleData", Symbol) ; IF HasProp(entry, Offset) THEN ENDIF; module_scope := GetProp(entry, NameScope); globals := AppendElt (uses, entry) ; ENDIF; END };

import_ids<> ::= import_ids<> ',' IDENTIFIER<< ^ name>> LOCALS found: Boolean; entry: Symbol;

ImportIds : BEGIN found, entry := LookupName(look_scope, name); IF NOT found THEN NoteError ("ImportIdsNotFound", GetPosition (3), TRUE) ; ELSEIF DeclareName(decl_scope, name, entry) THEN NoteError ("MultDeclImportIds", GetPosition (3), TRUE) ; ENDIF; END };

import_ids<> ::= IDENTIFIER<< ^ name>> LOCALS found: Boolean; entry : Symbol;

ImportId: BEGIN found, entry := LookupName(look_scope, name); IF NOT found THEN NoteError ("ImportIdNotFound", GetPosition (1), TRUE) ; ELSEIF DeclareName(decl_scope, name, entry) THEN NoteError ("MultDeclImportId", GetPosition (1), TRUE) ; ENDIF; END };

import<> ::= tIMPORT IDENTIFIER<< ^ name>> ';' LOCALS existing, entry: Symbol; declared, found: Boolean; globals : SymbolList; { ImportCheck: BEGIN declared, existing := LookupName (decl_scope, name) ; IF declared THEN NoteError ("MultDeclImport", GetPosition (2), TRUE) ; ENDIF; END

ImportDeclare : PRE NOT declared BEGIN IF decl_scope <> look_scope THEN found, entry := LookupName(look_scope, name); IF NOT found THEN NoteError ("ImportNotFound", GetPosition (2), TRUE) ; ELSEIF DeclareName(decl_scope, name, entry) THEN NoteError ("MultDeclImport", GetPosition (2), TRUE) ; ENDIF; globals := uses; ELSE entry := ReadModule (FALSE, name, ".sym", "ModuleData", Symbol) ; IF DeclareName(decl_scope, name, entry) THEN NoteError ("MultDeclImport", GetPosition (2), TRUE) ; ELSEIF HasProp (entry, Offset) THEN ENDIF; globals := AppendElt(uses, entry); ENDIF; END };

opt_export<> ::= {};

opt_export<> ::= tEXPORT opt_qualified<< ^ is_qualified>> export_ids<> ';' {};

opt_qualified<< ^ FALSE>> ::= {};

opt_qualified<< ^ TRUE>> ::= tQUALIFIED {};

export_ids<> ::= IDENTIFIER<< ^ name>> LOCALS found: Boolean; sym: Symbol; { ExportId: ONEOF PRE PASS () = LASTPASS BEGIN NoteError ("UndeclExportId", GetPosition (1), TRUE) ; END

PRE found WHERE found, sym := LookupName(look_scope, name) ; BEGIN IF NOT is_qualified THEN IF DeclareName(outer_scope, name, sym) THEN NoteError ("ExportIdMult", GetPosition (1), TRUE) ; ELSE AddName (exported, name, sym) ; ENDIF; ENDIF; END ENDONEOF };

export_ids<> ::= export_ids<> ',' IDENTIFIER<< ^ name>> LOCALS found: Boolean; sym: Symbol; { ExportIds : ONEOF PRE PASS () = LASTPASS BEGIN NoteError ("UndeclExportIds", GetPosition (3), TRUE) ; END

PRE found WHERE found, sym := LookupName(look_scope, name) ; BEGIN IF NOT is_qualified THEN IF DeclareName(outer_scope, name, sym) THEN NoteError ("ExportIdsMult", GetPosition (3), TRUE) ; ELSE AddName(exported, name, sym); ENDIF; ENDIF; END ENDONEOF };

defn_seq<> ::= defn_seq<> defn<> {};

defn_seq<> ::= {};

defn<> ::= tCONST const_defn_seq<> {};

defn<> ::= tVAR var_decl_seq<> {};

defn<> ::= tTYPE type_defn_seq<> {};

defn<> ::= tPROCEDURE IDENTIFIER<< ^ name>> opt_formals<> ';' LOCALS entry: Symbol; signature : TypeSpec; local_scope : SymbolScopeStack; local_store: Store; { ProcDefnBegin: BEGIN local_scope := EnterScope (decl_scope, FALSE, NewScope (SymbolScope)) ; local_store := NewStore () ; END

ProcDefnDecl : BEGIN entry := NewObject (Symbol, Routine) ; IF DeclareName(decl_scope, name, entry) THEN NoteError ("ProcDefnMultDecl", GetPosition (2), TRUE) ; ENDIF; END

ProcDefnEnd: BEGIN signature := NewObject (TypeSpec, Routine) ; SetProp (signature, BaseType, return_type) ; SetProp (signature, Formals, formals) ; SetProp(entry, ItsType, signature); SetProp (entry, Environ, local_store) ; END };

opt_formals<> ::=

opt_formals<> ::=' (' opt_formal_list<> ') ' opt_return<>

opt_formal_list<> ::=

opt_formal_list<> ::= formal_list<> {};

formal_list<> ::= formal_list<> ';' formal<>

formal_list<> ::= formal<>

formal<> ::= opt_byvar<< ^ by_ref>> formal_ids<> ':' opt_arrayof<< ^ is_arrayof>> qual_type_id<>

opt_byvar<< ^ FALSE>>

opt_byvar<< ^ TRUE>> ::= tVAR {};

opt_arrayof<< ^ FALSE>> ::= {};

opt_arrayof<< ^ TRUE>> ::= tARRAY tOF

formal_ids<> ::= formal_ids<> ',' IDENTIFIER<< ^ name>> LOCALS parm: Symbol; out_formals, defn_formals: SymbolList;

FormalIdsDecl : BEGIN IF in_formals = BuildFormals THEN out_formals := BuildFormals; parm := NewObject (Symbol, Parameter) ; SetProp(parm, ItsType, type_rep); SetProp(parm, ByReference, by_ref); SetProp(parm, IsArrayOf, is_arrayof); defn_formals := AppendElt (prev_formals, parm) ; ELSE defn_formals := BuildFormals; out_formals := ListTail (mid_formals) ; parm := ListHead (mid_formals) ; IF (by_ref <> GetProp(parm, ByReference)) OR (is_arrayof <> GetProp(parm, IsArrayOf)) OR (type_rep <> GetProp(parm, ItsType)) THEN NoteError ("FormalIdsMismatch", GetPosition (3), TRUE) ; ENDIF; ENDIF; IF DeclareName(parm_scope, name, parm) THEN NoteError ("MultDeclFormalIds", GetPosition (3) , TRUE) ; ENDIF; END

FormalIdsSize : ONEOF PRE is_arrayof BEGIN SetProp (parm, Offset, FindOffset (parm_store, 2) ) ; END

PRE by_ref BEGIN SetProp(parm, Offset, FindOffset (parm_store, 1) ) ; END

PRE HasProp (type_rep, Size) BEGIN SetProp (parm, Offset, FindOffset (parm_store, GetProp (type_rep, Size) )) ; END ENDONEOF };

formal_ids<> ::= IDENTIFIER<< ^ name>> LOCALS parm: Symbol; out_formals, defn_formals: SymbolList; { FormalIdDecl : BEGIN IF in_formals = BuildFormals THEN out_formals := BuildFormals; parm := NewObject (Symbol, Parameter) ; SetProp(parm, ItsType, type_rep); SetProp(parm, ByReference, by_ref); SetProp(parm, IsArrayOf, is_arrayof); defn_formals := AppendElt(NewList(SymbolList), parm); ELSE defn_formals := BuildFormals; out_formals := ListTail (in_formals) ; parm := ListHead(in_formals) ; IF (by_ref <> GetProp(parm, ByReference)) OR (is_arrayof <> GetProp(parm, IsArrayOf)) OR (type_rep <> GetProp(parm, ItsType)) THEN NoteError ("FormalIdMismatch", GetPosition (1) , TRUE) ; ENDIF; ENDIF; IF DeclareName (parm_scope, name, parm) THEN NoteError ("MultDeclFormalId", GetPosition (1), TRUE) ; ENDIF; END

FormalIdSize : ONEOF PRE is_arrayof BEGIN SetProp (parm, Offset, FindOffset (parm_store, 2) ) ; END

PRE by_ref BEGIN SetProp (parm, Offset, FindOffset (parm_store, 1) ) ; END

PRE HasProp (type_rep, Size) BEGIN SetProp (parm, Offset, FindOffset (parm_store, GetProp (type_rep, Size) ) ) ; END ENDONEOF };

opt_return<< look_scope ^ NoType>> ::= {};

opt_return<> ::= qual_type_id<> {};

const_defn_seq<> ::= const_defn_seq<> const_defn<>

const_defn_seq<> ::= {};

const_defn<> ::= IDENTIFIER<< ^ name>> '=' const_expr<> ';' LOCALS const_sym: Symbol; { ConstCreate: BEGIN const_sym := NewObject (Symbol, Constant) ; IF DeclareName(decl_scope, name, const_sym) THEN NoteError ("MultDeclConst", GetPosition(1), TRUE) ; ENDIF; END

ConstValueType : BEGIN SetProp (const_sym, Value, value) ; SetProp (const_sym, ItsType, type_rep) ; END };

var_decl_seq<> ::= var_decl_seq<> field_id_list<> ':' type_spec<> ';'

var_decl_seq<>

type_defn_seq<> ::= type_defn_seq<> type_defn<> {};

type_defn_seq<>

type_defn<> ::= IDENTIFIER<< ^ name>> ';' LOCALS type_sym: Symbol; type_rep: TypeSpec; { HiddenDeclare : BEGIN type_sym := NewObject (Symbol, Type) ; IF DeclareName(decl_scope, name, type_sym) THEN NoteError ("MultDeclHidden", GetPosition (1), TRUE) ; ELSE type_rep := NewObject (TypeSpec, Pointer) ; SetProp(type_rep, Size, 1); SetProp(type_sym, ItsType, type_rep); ENDIF; END };

type_defn<> ::= IDENTIFIER<< ^ name>> '=' type_spec<> ';' LOCALS type_sym: Symbol; { TypeDeclare : BEGIN type_sym := NewObject (Symbol, Type) ; IF DeclareName(decl_scope, name, type_sym) THEN NoteError ("MultDeclType", GetPosition (1), TRUE) ; ENDIF; END

TypeType : BEGIN SetProp(type_sym, ItsType, type_rep); END };

type_spec<> ::= qual_type_id<>

qual_type_id<> ::= IDENTIFIER<< ^ name>> LOCALS type_rep: TypeSpec; type_sym: Symbol; found: Boolean; { TypeIdLookup : ONEOF PRE (first_pass AND (PASS() = 2)) OR (PASS() > 2) BEGIN NoteError ("UndeclTypeId", GetPosition (1) , TRUE) ; END

PRE found WHERE found, type_sym := LookupName (look_scope, name) ; BEGIN END ENDONEOF

TypeIdFetch: PRE HasProp (type_sym, ItsType) BEGIN type_rep := GetProp (type_sym, ItsType) ; END };

qual_type_id<> ::= module_qual<> '.' IDENTIFIER<< ^ name>> LOCALS type_rep : TypeSpec; type_sym: Symbol; found: Boolean; { TypeQualIdLookup : ONEOF PRE (first_pass AND (PASS() = 2)) OR (PASS() > 2) BEGIN NoteError ("UndeclQualTypeId", GetPosition (3) , TRUE) ; END

PRE found WHERE found, type_sym := FindName (find_scope, name); BEGIN END ENDONEOF

TypeQualIdFetch: PRE HasProp (type_sym, ItsType) BEGIN type_rep := GetProp (type_sym, ItsType) ; END };

module_qual<> ::= IDENTIFIER<< ^ name>> LOCALS find_scope : SymbolScope; module_sym: Symbol; found: Boolean; { ModuleQual : BEGIN found, module_sym := LookupName (look_scope, name) ; IF NOT found THEN NoteError ("UndeclModuleId", GetPosition (1), TRUE) ; ELSEIF IsKind(module_sym, Module) THEN find_scope := GetProp (module_sym, Exported) ; ELSE NoteError ("ModuleIdExpected", GetPosition (1), TRUE) ; ENDIF; END };

type_spec<> ::= '(' enum_id_list<> ')' LOCALS type_rep: TypeSpec; enum_scope: SymbolScope;

EnumCreate : BEGIN type_rep := NewObject (TypeSpec, Enum) ; enum_scope := NewScope (SymbolScope) ; END

EnumSizeCount : BEGIN SetProp (type_rep, Count, cnt) ; SetProp(type_rep, Size, 1); END

EnumConsts : BEGIN SetProp(type_rep, Constants, enum_scope); END };

enum_id_list<> ::= enum_id_list<> ',' IDENTIFIER<< ^ name>> LOCALS newconstant : Symbol; { EnumListCreate : BEGIN newconstant := NewObject(Symbol, Constant); SetProp (newconstant, Posn, GetPosition (3)) ; SetProp (newconstant, Value, last_cnt) ; SetProp (newconstant, ItsType, enum_type) ; END

EnumListDeclare : BEGIN IF DeclareName(decl_scope, name, newconstant) THEN NoteError ("MultDeclEnumList", GetPosition (3), TRUE) ; ELSE AddName (enum_scope, name, newconstant) ; ENDIF; END };

enum_id_list<> ::= IDENTIFIER<< ^ name>> LOCALS newconstant : Symbol; { EnumCreate : BEGIN newconstant := NewObject(Symbol, Constant); SetProp (newconstant, Posn, GetPosition (1)) ; SetProp (newconstant, Value, cnt) ; SetProp (newconstant, ItsType, enum_type) ; END

EnumDeclare : BEGIN IF DeclareName(decl_scope, name, newconstant) THEN NoteError ("MultDeclEnum", GetPosition (1), TRUE) ; ELSE AddName(enum_scope, name, newconstant); ENDIF; END };

type_spec<> ::= '[' const_expr<> RANGE const_expr<> ']' LOCALS type_rep: TypeSpec; okscalar1, okscalar2, okmatch: Boolean; { RngCreate : BEGIN type_rep := NewObject (TypeSpec, Range) ; END

RngSize: PRE HasProp (type1, Size) BEGIN SetProp(type_rep, Size, GetProp(type1, Size)); END

RngBase : BEGIN SetProp(type_rep, BaseType, type1); END

RngVal1 : BEGIN SetProp (type_rep, StartValue, value1) ; END

RngVal2 : BEGIN SetProp (type_rep, StopValue, value2) ; END

RngCheck: BEGIN okscalar1 := IsScalar(type1, TRUE); IF NOT okscalar1 THEN NoteError ("NeedScalar", GetPosition (2), TRUE) ; ENDIF; okscalar2 := IsScalar(type2, TRUE); IF NOT okscalar2 THEN NoteError ("NeedScalar", GetPosition (4), TRUE) ; ENDIF; okmatch := TypesMatch (type1, type2) ; IF NOT okmatch THEN NoteError ("Mismatch", GetPosition (4), TRUE) ; ENDIF; IF okscalar1 AND okscalar2 AND okmatch AND (value2 < value1) THEN NoteError ("BadOrder", GetPosition (4), TRUE) ; ENDIF; END };

type_spec<> ::= tPOINTER tTO type_spec<> LOCALS type_rep: TypeSpec; { PtrCreate: BEGIN type_rep := NewObject (TypeSpec, Pointer) ; SetProp(type_rep, Size, 1); END

PtrBase : BEGIN SetProp(type_rep, BaseType, base_type); END };

type_spec<> ::= tARRAY index_base<> {};

index_base<< look_scope ^ type_rep>> ::= bounded_type<> ',' index_base<> LOCALS type_rep: TypeSpec; index_cnt : Integer;

MultDimArrayCreate : BEGIN type_rep := NewObject(TypeSpec, Array); END

MultDimIndex: BEGIN SetProp(type_rep, IndexType, index_type); END

MultDimBase : BEGIN SetProp(type_rep, BaseType, base_type); END

MultDimCount : ONEOF PRE IsKind(index_type, Enum) AND HasProp(index_type, Count) BEGIN index_cnt := GetProp(index_type, Count); END

PRE IsKind(index_type, Range) AND HasProp (index_type, StartValue) AND HasProp (index_type, StopValue) BEGIN index_cnt := GetProp (index_type, StopValue) - GetProp(index_type, StartValue) + 1; END ENDONEOF

MultDimSize : PRE HasProp (base_type, Size) BEGIN SetProp (type_rep, Size, index_cnt * GetProp (base_type, Size) ) ; END };
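The array-size arithmetic in the Count and Size actions above can be illustrated with a short Python sketch. It is not the translator's mechanism: dictionaries stand in for TypeSpec objects, a missing key stands in for a property that has not yet been set, and returning `None` stands in for an action whose PRE condition is not yet satisfied. The function names are hypothetical.

```python
def index_count(index_type):
    """Number of index values, or None while the needed properties are unset."""
    kind = index_type.get("kind")
    if kind == "Enum" and "Count" in index_type:
        return index_type["Count"]
    if kind == "Range" and "StartValue" in index_type and "StopValue" in index_type:
        # Both bounds are inclusive, hence the + 1.
        return index_type["StopValue"] - index_type["StartValue"] + 1
    return None

def array_size(index_type, base_type):
    """Size of one array dimension: element count times element size."""
    cnt = index_count(index_type)
    if cnt is None or "Size" not in base_type:
        return None   # precondition not met yet; the action would wait
    return cnt * base_type["Size"]
```

For a range [1 .. 10] over a base type of size 2, the sketch gives 10 * 2 = 20 storage units.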

index_base<> ::= bounded_type<> tOF type_spec<> LOCALS type_rep: TypeSpec; index_cnt : Integer; { OneDimArrayCreate : BEGIN type_rep := NewObject (TypeSpec, Array) ; END

OneDimIndex : BEGIN SetProp(type_rep, IndexType, index_type); END

OneDimBase : BEGIN SetProp(type_rep, BaseType, base_type); END

OneDimCount : ONEOF PRE IsKind(index_type, Enum) AND HasProp(index_type, Count) BEGIN index_cnt := GetProp(index_type, Count); END

PRE IsKind(index_type, Range) AND HasProp (index_type, StartValue) AND HasProp (index_type, StopValue) BEGIN index_cnt := GetProp (index_type, StopValue) - GetProp(index_type, StartValue) + 1; END ENDONEOF

OneDimSize : PRE HasProp (base_type, Size) BEGIN SetProp (type_rep, Size, index_cnt * GetProp (base_type, Size) ) ; END };

bounded_type<> ::= type_spec<> { CheckBounded: BEGIN IF NOT IsScalar(type_rep, canbeinteger) THEN NoteError ("NeedScalar", GetPosition (1), TRUE) ; ENDIF; END };

type_spec<> ::= tRECORD field_list<> tEND LOCALS field_scope : SymbolScope; decl_scope : SymbolScopeStack; type_rep: TypeSpec; field_store: Store; { RecordCreate : BEGIN type_rep := NewObject(TypeSpec, Record); field_store := NewStore(); field_scope := NewScope (SymbolScope) ; SetProp (type_rep, Fields, field_scope) ; decl_scope := EnterScope (look_scope, FALSE, field_scope) ; END

RecordSize : PRE PASS () = LASTPASS BEGIN SetProp (type_rep, Environ, field_store) ; SetProp (type_rep, Size, NextOffset (field_store)) ; END };

field_list<> ::= field_list<> ';' field_decl<>

field_list<> ::= field_decl<>

field_decl<> ::=

field_decl<> ::= field_id_list<> ':' type_spec<>

field_id_list<> ::= IDENTIFIER<< ^ name>> LOCALS newfield: Symbol; { FieldCreate : BEGIN IF nest_level < 0 THEN newfield := NewObject (Symbol, Field) ; ELSE newfield := NewObject (Symbol, Variable) ; SetProp (newfield, NestLevel, nest_level) ; SetProp (newfield, InModule, ModuleData) ; ENDIF; SetProp (newfield, Posn, GetPosition (1)) ; IF DeclareName(decl_scope, name, newfield) THEN NoteError ("MultDeclField", GetPosition (1), TRUE) ; ENDIF; END

FieldOffset : PRE HasProp (type_rep, Size) BEGIN SetProp (newfield, Offset, FindOffset (field_store, GetProp (type_rep, Size) ) ) ; END

FieldType : BEGIN SetProp(newfield, ItsType, type_rep); END };

field_id_list<> ::= field_id_list<> ',' IDENTIFIER<< ^ name>> LOCALS newfield: Symbol;

FieldListCreate : BEGIN IF nest_level < 0 THEN newfield := NewObject (Symbol, Field) ; ELSE newfield := NewObject (Symbol, Variable) ; SetProp (newfield, NestLevel, nest_level) ; SetProp (newfield, InModule, ModuleData) ; ENDIF; SetProp (newfield, Posn, GetPosition (3)) ; IF DeclareName(decl_scope, name, newfield) THEN NoteError ("MultDeclField", GetPosition (3), TRUE) ; ENDIF; END

FieldListOffset : PRE HasProp (type_rep, Size) BEGIN SetProp (newfield, Offset, FindOffset (field_store, GetProp (type_rep, Size) ) ) ; END

FieldListType : BEGIN SetProp(newfield, ItsType, type_rep); END };

impl_mod<< ^ >> ::= tIMPLEMENTATION tMODULE IDENTIFIER<< ^ name1>> ';' import_seq<> decl_seq<> opt_block<> tEND IDENTIFIER<< ^ name2>> '.' LOCALS decl_scope : SymbolScopeStack; defn_imports : SymbolList; var_store: Store; defn_entry, global_mod: Symbol; init_return: Continuation; { ImplReturn : BEGIN init_return := Call (PROCRETURN, 0) ; END

ImplCont : BEGIN Execute (3) ; Execute (4) ; Execute (init_return) ;

SetProp (defn_entry, Code, Label (0)) ; END

ImplBegin: BEGIN defn_entry := ReadModule (TRUE, name1, ".sym", "ModuleData", Symbol) ; decl_scope := GetProp (defn_entry, NameScope) ; var_store := GetProp (defn_entry, Environ) ; defn_imports := GetProp (defn_entry, Imported) ; END

ImplOffset : ONEOF PRE PHASE () = "Linking" BEGIN global_mod := ReadModule (FALSE, FindNameIndex ("MAIN"), ".sym", "ModuleData", Symbol) ; SetProp (ModuleData, Offset, FindOffset (GetProp (global_mod, Environ), NextOffset (var_store)) ) ; END

PRE PASS () = LASTPASS BEGIN END ENDONEOF

ImplIds : BEGIN IF name1 <> name2 THEN NoteError ("NameMismatch", GetPosition (8), TRUE) ; ENDIF; END };

main_mod<< ^ >> ::= tMODULE IDENTIFIER<< ^ name1>> ';' import_seq<> decl_seq<> opt_block<> tEND IDENTIFIER<< ^ name2>> '.' LOCALS decl_scope : SymbolScopeStack; var_store: Store; init_return: Continuation;

MainReturn: BEGIN init_return := Call (PROCRETURN, 0) ; END

MainCont : BEGIN Execute (4) ; Execute (3) ; Execute (init_return) ; END

MainBegin: BEGIN decl_scope := EnterScope (BaseScope, FALSE, NewScope (SymbolScope)) ; var_store := NewStore(); END

MainIds : BEGIN IF name1 <> name2 THEN NoteError ("NameMismatch", GetPosition (8), TRUE) ; ENDIF; END

MainEnd: BEGIN ModuleData := NewObject (Symbol, Module) ; SetProp (ModuleData, Posn, GetPosition (2)) ; SetProp (ModuleData, NameScope, decl_scope) ; SetProp(ModuleData, Environ, var_store); SetProp(ModuleData, Imported, imports); END };

decl<> ::= local_mod<> {};

local_mod<> ::= tMODULE IDENTIFIER<< ^ name1>> ';' import_seq<> opt_export<> decl_seq<> opt_block<> tEND IDENTIFIER<< ^ name2>> ';' LOCALS new_scope : SymbolScopeStack; exported: SymbolScope; mod_type : TypeSpec; local_sym: Symbol;

LocalBegin: BEGIN exported := NewScope (SymbolScope) ; mod_type := NewObject (TypeSpec, Module) ; SetProp(mod_type, Fields, exported) ; new_scope := EnterScope (decl_scope, TRUE, NewScope (SymbolScope)) ; END

LocalIds : BEGIN IF name1 <> name2 THEN NoteError ("NameMismatchDefn", GetPosition (8), TRUE) ; ENDIF; END

LocalDecl : BEGIN local_sym := NewObject (Symbol, Module) ; SetProp (local_sym, Posn, GetPosition (2)) ; SetProp (local_sym, NameScope, new_scope) ; SetProp(local_sym, Environ, var_store); SetProp(local_sym, Exported, exported); SetProp(local_sym, Imported, imports); SetProp(local_sym, ItsType, mod_type) ; IF DeclareName(decl_scope, name1, local_sym) THEN NoteError ("MultLocalDecl", GetPosition (2) , TRUE) ; ENDIF; END

LocalCont : BEGIN Execute (4) ; Execute (5) ; END

decl_seq<> ::= decl_seq<> decl<>

decl_seq<> ::=

decl<> ::= tCONST const_defn_seq<>

decl<> ::= tVAR var_decl_seq<>

decl<> ::= tTYPE type_decl_seq<>

decl<> ::= tPROCEDURE IDENTIFIER<< ^ name1>> opt_formals<> ';' decl_seq<> opt_block<> tEND IDENTIFIER<< ^ name2>> ';' LOCALS entry: Symbol; found: Boolean; local_scope : SymbolScopeStack; local_store: Store; proc_return: Continuation; check_formals: SymbolList; signature : TypeSpec; { ProcDeclCommon: BEGIN local_scope := EnterScope (decl_scope, FALSE, NewScope (SymbolScope)) ; proc_return := Call (PROCRETURN, 0) ; END

ProcDeclBegin: ONEOF PRE found WHERE found, entry := LookupName (decl_scope, name1) ; BEGIN IF HasProp(entry, Code) THEN NoteError ("MultProcDecl", GetPosition (2), TRUE) ; ELSE local_store := GetProp (entry, Environ) ; check_formals := GetProp (GetProp (entry, ItsType) , Formals) ; ENDIF; END

BEGIN found := FALSE; entry := NewObject (Symbol, Routine) ; IF DeclareName(decl_scope, name1, entry) THEN NoteError ("ProcDeclMultDecl", GetPosition (2), TRUE) ; ELSE local_store := NewStore () ; check_formals := BuildFormals; ENDIF; END ENDONEOF

ProcDeclEnd: BEGIN IF NOT found THEN signature := NewObject (TypeSpec, Routine) ; SetProp (signature, BaseType, return_type) ; SetProp (signature, Formals, formals) ; SetProp(entry, ItsType, signature); ENDIF; SetProp (entry, NameScope, local_scope) ; SetProp(entry, Code, Label (4)) ; END

ProcIds : BEGIN IF name1 <> name2 THEN NoteError ("NameMismatch", GetPosition (8), TRUE) ; ENDIF; END };

type_decl_seq<> ::= type_decl_seq<> type_decl<>

type_decl_seq<> ::= {};

type_decl<> ::= IDENTIFIER<< ^ name>> '=' type_spec<> ';' LOCALS type_sym: Symbol; found: Boolean; { TypeDeclareImpl : ONEOF PRE found AND HasProp (type_sym, ItsType) AND IsKind(type_rep, Pointer) AND HasProp (type_rep, BaseType) WHERE found, type_sym := LookupName(decl_scope, name); BEGIN IF HasProp (GetProp (type_sym, ItsType) , BaseType) THEN NoteError ("MultDeclTypeImpl", GetPosition (1) , TRUE) ; ELSE SetProp (GetProp (type_sym, ItsType) , BaseType, GetProp (type_rep, BaseType) ) ; ENDIF; END

BEGIN type_sym := NewObject (Symbol, Type) ; IF DeclareName(decl_scope, name, type_sym) THEN NoteError ("MultDeclTypeImpl", GetPosition (1), TRUE) ; ENDIF; END ENDONEOF

TypeTypeImpl : BEGIN SetProp(type_sym, ItsType, type_rep); END };

const_expr<> ::= CONSTANT<< ^ value, type_rep>>

const_expr<> ::= qual_const_id<> {};

qual_const_id<> ::= IDENTIFIER<< ^ name>> LOCALS value : Integer; type_rep: TypeSpec; const_sym: Symbol; found: Boolean; { ConstIdLookup : BEGIN found, const_sym := LookupName(look_scope, name); IF NOT found THEN NoteError ("UndeclConstId", GetPosition (1), TRUE) ; ENDIF; END

ConstIdValue : PRE HasProp (const_sym, Value) AND HasProp (const_sym, ItsType) BEGIN value := GetProp (const_sym, Value) ; type_rep := GetProp (const_sym, ItsType) ; END };

qual_const_id<> ::= module_qual<> '.' IDENTIFIER<< ^ name>> LOCALS value : Integer; type_rep: TypeSpec; const_sym: Symbol; found: Boolean;

ConstQualIdLookup: PRE found WHERE found, const_sym := FindName(find_scope, name); BEGIN IF NOT found THEN NoteError ("UndeclConstQualId", GetPosition (3), TRUE) ; ENDIF; END

QualConstTypeValue2 : PRE HasProp (const_sym, Value) AND HasProp (const_sym, ItsType) BEGIN value := GetProp (const_sym, Value) ; type_rep := GetProp (const_sym, ItsType) ; END };

opt_block<> ::=

opt_block<> ::= tBEGIN stmt_list<>

Block: BEGIN Execute (1) ; END };

stmt_list<> ::= stmt_list<> ';' statement<>

CompoundStmtList : BEGIN Execute (1) ; Execute (2) ; END };

stmt_list<> ::= statement<>

SimpleStmtList : BEGIN Execute (1) ; END };

statement<> ::= {};

statement<> ::= call<> { CallStmt : BEGIN IF type_rep <> NoType THEN NoteError ("FunctionNotAllowed", GetPosition (1), TRUE) ; ENDIF; END };

statement<> ::= tEXIT { ExitStmt : BEGIN IF exit = NoExitAllowed THEN NoteError ("ExitNotAllowed", GetPosition (1), TRUE) ; ELSE Execute (exit) ; ENDIF; END };

statement<> ::= tLOOP stmt_list<> tEND

LoopCont : BEGIN Execute (1) ; Execute (0) ; END };

statement<> ::= tWHILE expression<> tDO stmt_list<> tEND { WhileType : BEGIN IF NOT TypesMatch (type_rep, BooleanType) THEN NoteError ("BoolExpected", GetPosition (2) , TRUE) ; ENDIF; END

WhileCont : BEGIN IF Evaluate (value, Boolean) THEN Execute (2) ; Execute (0) ; ENDIF; END };

statement<> ::= tREPEAT stmt_list<> tUNTIL expression<> { RepeatType : BEGIN IF NOT TypesMatch (type_rep, BooleanType) THEN NoteError ("BoolExpected", GetPosition (4), TRUE) ; ENDIF; END

RepeatCont : BEGIN Execute (1) ; IF Evaluate (value, Boolean) THEN Execute (0) ; ENDIF; END };

statement<> ::= tIF expression<> tTHEN stmt_list<> elsif_part<> tEND { IfType : BEGIN IF NOT TypesMatch(type_rep, BooleanType) THEN NoteError ("BoolExpected", GetPosition(2), TRUE) ; ENDIF; END

IfCont : BEGIN IF Evaluate(value, Boolean) THEN Execute (2) ; ELSE Execute (3) ; ENDIF; END };

elsif_part<> ::= tELSIF expression<> tTHEN stmt_list<> elsif_part<>
{
  ElsifType:
    BEGIN
      IF NOT TypesMatch(type_rep, BooleanType) THEN
        NoteError("BoolExpected", GetPosition(2), TRUE);
      ENDIF;
    END

  ElsifCont:
    BEGIN
      IF Evaluate(value, Boolean) THEN
        Execute(2);
      ELSE
        Execute(3);
      ENDIF;
    END
};

elsif_part<> ::= tELSE stmt_list<>
{
  ElseStart:
    BEGIN
      Execute(1);
    END
};

elsif_part<> ::= {};

statement<> ::= designator<> ASSIGN expression<>
{
  CheckLHS:
    BEGIN
      IF IsKind(lhs_type, Module) THEN
        NoteError("VarExpected", GetPosition(1), TRUE);
      ENDIF;
    END

  AssignType:
    BEGIN
      IF NOT TypesMatch(lhs_type, rhs_type) THEN
        NoteError("TypeMismatch", GetPosition(3), TRUE);
      ENDIF;
    END

  AssignCont:
    PRE HasProp(lhs, Size)
    BEGIN
      StoreValue(Evaluate(l_value, Memory), 0, r_value);
    END
};

expression<> ::= '(' expression<> ')'
{
  ParenCheck:
    BEGIN
      IF lhs_expected THEN
        NoteError("VarExpected", GetPosition(2), TRUE);
      ENDIF;
    END
};

expression<> ::= expression<> '=' expression<>
LOCALS
  value: Integer;
{
  EQopLHS:
    BEGIN
      IF lhs_expected THEN
        NoteError("VarExpected", GetPosition(2), TRUE);
      ENDIF;
    END

  EQopType:
    BEGIN
      IF NOT TypesMatch(type1, type2) THEN
        NoteError("TypeMismatch", GetPosition(3), TRUE);
      ENDIF;
    END

  EQopCont:
    PRE HasProp(type1, Size)
    BEGIN
      value := Evaluate(value1 = value2, Integer);
    END
};

expression<> ::= expression<> '+' expression<>
LOCALS
  value: Integer;
{
  PLUSopLHS:
    BEGIN
      IF lhs_expected THEN
        NoteError("VarExpected", GetPosition(2), TRUE);
      ENDIF;
    END

  PLUSopType:
    BEGIN
      IF NOT TypesMatch(type1, IntegerType) THEN
        NoteError("TypeMismatch", GetPosition(3), TRUE);
      ENDIF;
      IF NOT TypesMatch(type2, IntegerType) THEN
        NoteError("TypeMismatch", GetPosition(3), TRUE);
      ENDIF;
    END

  PLUSopCont:
    BEGIN
      value := value1 + value2;
    END
};

expression<> ::= CONSTANT<< ^ value, type_rep>>
{
  CheckLHS:
    BEGIN
      IF lhs_expected THEN
        NoteError("VarExpected", GetPosition(1), TRUE);
      ENDIF;
    END
};

expression<> ::= designator<>
{
  CheckDesig:
    BEGIN
      IF IsKind(type_rep, Module) THEN
        NoteError("ValueExpected", GetPosition(1), TRUE);
      ENDIF;
    END
};

designator<> ::= IDENTIFIER<< ^ name>>
LOCALS
  type_rep: TypeSpec;
  sym: Symbol;
  found: Boolean;
  value, level: Integer;
{

  IdDesigLookup:
    ONEOF
      PRE found
      WHERE found, sym := LookupName(look_scope, name);
      BEGIN
      END

      PRE PASS() = LASTPASS
      BEGIN
        NoteError("UndeclId", GetPosition(1), TRUE);
      END
    ENDONEOF

  IdDesigExpr:
    ONEOF
      PRE IsKind(sym, Module) AND HasProp(sym, ItsType)
      BEGIN
        type_rep := GetProp(sym, ItsType);
        value := 0;
      END

      PRE IsKind(sym, Constant) AND HasProp(sym, Value) AND HasProp(sym, ItsType)
      BEGIN
        type_rep := GetProp(sym, ItsType);
        value := GetProp(sym, Value);

        IF lhs_expected THEN
          NoteError("VarExpected", GetPosition(1), TRUE);
        ENDIF;
      END

      PRE IsKind(sym, Type)
      BEGIN
        IF lhs_expected THEN
          NoteError("VarExpected", GetPosition(1), TRUE);
        ELSE
          NoteError("ValueExpected", GetPosition(1), TRUE);
        ENDIF;
      END

      PRE IsKind(sym, Parameter)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        IF lhs_expected THEN
          IF GetProp(sym, ByReference) THEN
            value := FetchValue(GetLocal(GetProp(sym, Offset), 0), 0, Integer);
          ELSE
            value := Evaluate(GetLocal(GetProp(sym, Offset), 0), Integer);
          ENDIF;
        ELSE
          IF GetProp(sym, ByReference) THEN
            value := FetchValue(FetchValue(GetLocal(GetProp(sym, Offset), 0), 0, Memory),
                                0, Integer);
          ELSE
            value := FetchValue(GetLocal(GetProp(sym, Offset), 0), 0, Integer);
          ENDIF;
        ENDIF;
      END

      PRE IsKind(sym, Variable) AND HasProp(sym, Offset) AND HasProp(sym, NestLevel)
          AND HasProp(sym, ItsType) AND HasProp(GetProp(sym, ItsType), Size)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        level := GetProp(sym, NestLevel);
        IF level = 0 THEN
          IF lhs_expected THEN
            value := GetProp(sym, Offset);
          ELSE
            value := FetchValue(Evaluate(GetProp(sym, Offset), Memory),
                                GetProp(GetProp(sym, InModule), Offset),
                                Integer);
          ENDIF;
        ELSE
          IF lhs_expected THEN
            value := Evaluate(GetLocal(GetProp(sym, Offset), nest_level - level), Integer);
          ELSE
            value := FetchValue(GetLocal(GetProp(sym, Offset), nest_level - level),
                                0, Integer);
          ENDIF;
        ENDIF;
      END

      PRE IsKind(sym, Routine) AND HasProp(sym, ItsType) AND HasProp(sym, Code)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        value := Evaluate(GetProp(sym, Code), Integer);

        IF lhs_expected THEN
          NoteError("VarExpected", GetPosition(3), TRUE);
        ENDIF;
      END
    ENDONEOF
};

designator<> ::= designator<> '.' IDENTIFIER<< ^ name>>
LOCALS
  type_rep: TypeSpec;
  sym: Symbol;
  found: Boolean;
  value, level: Integer;
{
  QualDesigLookup:
    ONEOF
      PRE IsKind(desig_type, Module) AND HasProp(desig_type, Fields)
      BEGIN
        found, sym := FindName(GetProp(desig_type, Fields), name);
        IF NOT found THEN
          NoteError("IdNotExported", GetPosition(3), TRUE);
        ENDIF;
      END

      PRE IsKind(desig_type, Record) AND HasProp(desig_type, Fields)
      BEGIN
        found, sym := FindName(GetProp(desig_type, Fields), name);
        IF NOT found THEN
          NoteError("UndeclFieldId", GetPosition(3), TRUE);
        ENDIF;
      END

      BEGIN
        NoteError("ModuleOrRecordExpected", GetPosition(1), TRUE);
      END
    ENDONEOF

  QualDesigExpr:
    ONEOF
      PRE found AND IsKind(sym, Module) AND HasProp(sym, ItsType)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        value := 0;
        IF NOT lhs_expected THEN
          NoteError("ValueExpected", GetPosition(3), TRUE);
        ENDIF;
      END

      PRE found AND IsKind(sym, Constant) AND HasProp(sym, Value) AND HasProp(sym, ItsType)
      BEGIN
        type_rep := GetProp(sym, ItsType);
        value := GetProp(sym, Value);

        IF lhs_expected THEN
          NoteError("VarExpected", GetPosition(3), TRUE);
        ENDIF;
      END

      PRE found AND IsKind(sym, Type)
      BEGIN
        IF lhs_expected THEN
          NoteError("VarExpected", GetPosition(3), TRUE);
        ELSE
          NoteError("ValueExpected", GetPosition(3), TRUE);
        ENDIF;
      END

      PRE found AND IsKind(sym, Parameter)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        IF lhs_expected THEN
          IF GetProp(sym, ByReference) THEN
            value := FetchValue(GetLocal(GetProp(sym, Offset), 0), 0, Integer);
          ELSE
            value := Evaluate(GetLocal(GetProp(sym, Offset), 0), Integer);
          ENDIF;
        ELSE
          IF GetProp(sym, ByReference) THEN
            value := FetchValue(FetchValue(GetLocal(GetProp(sym, Offset), 0), 0, Memory),
                                0, Integer);
          ELSE
            value := FetchValue(GetLocal(GetProp(sym, Offset), 0), 0, Integer);
          ENDIF;
        ENDIF;
      END

      PRE found AND IsKind(sym, Variable) AND HasProp(sym, Offset) AND HasProp(sym, NestLevel)
          AND HasProp(sym, ItsType) AND HasProp(GetProp(sym, ItsType), Size)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        level := GetProp(sym, NestLevel);
        IF level = 0 THEN
          IF lhs_expected THEN
            value := GetProp(sym, Offset);
          ELSE
            value := FetchValue(Evaluate(GetProp(sym, Offset), Memory),
                                GetProp(GetProp(sym, InModule), Offset),
                                Integer);
          ENDIF;
        ELSE
          IF lhs_expected THEN
            value := Evaluate(GetLocal(GetProp(sym, Offset), nest_level - level), Integer);
          ELSE
            value := FetchValue(GetLocal(GetProp(sym, Offset), nest_level - level),
                                0, Integer);
          ENDIF;
        ENDIF;
      END

      PRE found AND IsKind(sym, Routine) AND HasProp(sym, ItsType) AND HasProp(sym, Code)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        value := Evaluate(GetProp(sym, Code), Integer);

        IF lhs_expected THEN
          NoteError("VarExpected", GetPosition(3), TRUE);
        ENDIF;
      END

      PRE found AND IsKind(sym, Field) AND HasProp(sym, Offset)
          AND HasProp(sym, ItsType) AND HasProp(GetProp(sym, ItsType), Size)
      BEGIN
        type_rep := GetProp(sym, ItsType);

        IF lhs_expected THEN
          value := desig_value + GetProp(sym, Offset);
        ELSE
          value := FetchValue(Evaluate(desig_value + GetProp(sym, Offset), Memory),
                              0, Integer);
        ENDIF;
      END
    ENDONEOF
};

designator<> ::= designator<> '^'
LOCALS
  type_rep: TypeSpec;
  value: Integer;
{

  DesigDeref:
    ONEOF
      PRE NOT IsKind(desig_type, Pointer) OR HasProp(desig_type, BaseType)
      BEGIN
        IF NOT IsKind(desig_type, Pointer) THEN
          NoteError("PtrExpected", GetPosition(1), TRUE);
        ELSE
          type_rep := GetProp(desig_type, BaseType);
        ENDIF;
      END
    ENDONEOF

  PtrExpr:
    PRE lhs_expected OR HasProp(desig_type, Size)
    BEGIN
      IF lhs_expected THEN
        value := FetchValue(Evaluate(value, Memory), 0, Integer);
      ELSE
        value := FetchValue(FetchValue(Evaluate(value, Memory), 0, Memory), 0, Integer);
      ENDIF;
    END
};

designator<> ::= array_ref<> ']'
LOCALS
  value: Integer;
{
  ArrayLHS:
    ONEOF
      PRE NOT lhs_expected AND HasProp(type_rep, Size)
      BEGIN
        value := FetchValue(Evaluate(value, Memory), 0, Integer);
      END

      BEGIN
        value := array_value;
      END
    ENDONEOF
};

array_ref<> ::= array_ref<> ',' expression<>
LOCALS
  type_rep: TypeSpec;
  value: Integer;
{
  MultDimArrayExpr:
    PRE HasProp(array_type, Size)
    BEGIN
      value := array_value + index_value * GetProp(array_type, Size);
    END

  MultDimArrayIndex:
    PRE IsKind(array_type, Array) AND HasProp(array_type, IndexType)
    BEGIN
      IF NOT TypesMatch(index_type, GetProp(array_type, IndexType)) THEN
        NoteError("TypeMismatch", GetPosition(3), TRUE);
      ENDIF;
    END

  MultDimArrayDesig:
    ONEOF
      PRE NOT IsKind(array_type, Array) OR HasProp(array_type, BaseType)
      BEGIN
        IF IsKind(array_type, Array) THEN
          type_rep := GetProp(array_type, BaseType);
        ELSE
          NoteError("ArrayExpected", GetPosition(1), TRUE);
        ENDIF;
      END
    ENDONEOF
};
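The array_ref productions compute an element location with one rule, value := array_value + index_value * GetProp(array_type, Size), applied once per subscript. A minimal Python sketch of that arithmetic follows. The function names and the sample base address and sizes are assumptions made for illustration; what Size denotes for a given array type is determined by the type productions, not by this sketch.

```python
def element_address(array_value, index_value, size):
    """Mirror of: value := array_value + index_value * GetProp(array_type, Size)."""
    return array_value + index_value * size


def fold_subscripts(base, subscripts):
    """Apply the rule once per (index_value, size) pair, left to right.

    Each step feeds its result in as the next step's array_value, which is
    how a chain of array_ref productions would thread the value along.
    """
    value = base
    for index_value, size in subscripts:
        value = element_address(value, index_value, size)
    return value


# Hypothetical layout: rows of 40 units, elements of 4 units, base 1000.
address = fold_subscripts(1000, [(2, 40), (3, 4)])
```

With these made-up numbers the first subscript advances the value to 1080 and the second to 1092.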

array_ref<> ::= designator<> '[' expression<>
LOCALS
  type_rep: TypeSpec;
  value: Integer;
{
  FirstDimArrayExpr:
    PRE HasProp(array_type, Size)
    BEGIN
      value := array_value + index_value * GetProp(array_type, Size);
    END

  FirstDimArrayIndex:
    PRE IsKind(array_type, Array) AND HasProp(array_type, IndexType)
    BEGIN
      IF NOT TypesMatch(index_type, GetProp(array_type, IndexType)) THEN
        NoteError("TypeMismatch", GetPosition(3), TRUE);
      ENDIF;
    END

  FirstDimArrayDesig:
    ONEOF
      PRE NOT IsKind(array_type, Array) OR HasProp(array_type, BaseType)
      BEGIN
        IF IsKind(array_type, Array) THEN
          type_rep := GetProp(array_type, BaseType);
        ELSE
          NoteError("ArrayExpected", GetPosition(1), TRUE);
        ENDIF;
      END
    ENDONEOF
};

expression<> ::= call<>
{

  ExprCallCheck:
    BEGIN
      IF type_rep = NoType THEN
        NoteError("ProcedureNotAllowed", GetPosition(1), TRUE);
      ENDIF;
    END

  ExprCallLHS:
    BEGIN
      IF lhs_expected THEN
        NoteError("VarExpected", GetPosition(1), TRUE);
      ENDIF;
    END
};

call<> ::= expression<> '(' opt_actuals<> ')'
LOCALS
  formals: SymbolList;
  value: Integer;
  type_rep: TypeSpec;
{
  CheckCall:
    BEGIN
      IF IsKind(proc_type, Routine) THEN
        formals := GetProp(proc_type, Formals);
        type_rep := GetProp(proc_type, BaseType);
      ELSE
        NoteError("RoutineExpected", GetPosition(1), TRUE);
      ENDIF;
    END

  ExecCall:
    BEGIN
      Execute(2);
      value := Evaluate(Call(Evaluate(proc_value, Continuation)), Integer);
    END
};

opt_actuals<> ::=

opt_actuals<> ::= actual_list<>

actual_list<> ::= actual_list<> ',' expression<>
LOCALS
  out_formals: SymbolList;
  formal: Symbol;
{
  ActualsFormal:
    BEGIN
      formal := ListHead(mid_formals);
      out_formals := ListTail(mid_formals);
    END

  ActualsType:
    BEGIN
      IF NOT TypesMatch(GetProp(formal, ItsType), type_rep) THEN
        NoteError("ActualsMismatch", GetPosition(3), TRUE);
      ENDIF;
    END

  ActualsExpr:
    BEGIN
      Evaluate(value);
    END
};

actual_list<> ::= expression<>
LOCALS
  out_formals: SymbolList;
  formal: Symbol;
{
  ActualFormal:
    BEGIN
      formal := ListHead(in_formals);
      out_formals := ListTail(in_formals);
    END

  ActualType:
    BEGIN
      IF NOT TypesMatch(GetProp(formal, ItsType), type_rep) THEN
        NoteError("ActualMismatch", GetPosition(1), TRUE);
      ENDIF;
    END

  ActualExpr:
    BEGIN
      Evaluate(value);
    END
};