Standard ML of New Jersey
Andrew W. Appel∗ (Princeton University) and David B. MacQueen (AT&T Bell Laboratories)

CS-TR-329-91, Dept. of Computer Science, Princeton University, June 1991. This paper appeared in Third Int'l Symp. on Prog. Lang. Implementation and Logic Programming, Springer-Verlag LNCS 528, pp. 1–13, August 1991.
∗Supported in part by NSF grant CCR-9002786.

Abstract

The Standard ML of New Jersey compiler has been under development for five years now. We have developed a robust and complete environment for Standard ML that supports the implementation of large software systems and generates efficient code. The compiler has also served as a laboratory for developing novel implementation techniques for a sophisticated type and module system, continuation-based code generation, efficient pattern matching, and concurrent programming features.

1 Introduction

Standard ML of New Jersey is a compiler and programming environment for the Standard ML language[26] that has been continuously developed since early 1986. Our initial goal was to produce a working ML front end and interpreter for programming language research, but the scope of the project has expanded considerably. We believe that Standard ML may be the best general-purpose programming language yet developed; to demonstrate this, we must provide high-quality, robust, and efficient tools for software engineering.

Along the way we have learned many useful things about the design and implementation of "modern" programming languages. There were some unexpected interactions between the module system, type system, code generator, debugger, garbage collector, runtime data format, and hardware; and some things were much easier than expected.

We wrote an early description of the compiler in the spring of 1987[7], but almost every component of the compiler has since been redesigned and reimplemented at least once, so it is worthwhile to provide an updated overview of the system and our implementation experience.

Our compiler is structured in a rather conventional way: the input stream is broken into tokens by a lexical analyzer, parsed according to a context-free grammar, semantically analyzed into an annotated abstract syntax tree, type-checked, and translated into a lower-level intermediate language. This is the "front end" of the compiler. Then the intermediate language—Continuation-Passing Style—is "optimized," closures are introduced to implement lexical scoping, registers are allocated, target-machine instructions are generated, and (on RISC machines) instructions are scheduled to avoid pipeline delays; these together constitute the "back end."

2 Parsing

Early in the development of the compiler we used a hand-written lexical analyzer and a recursive-descent parser. In both of these components the code for semantic analysis was intermixed with the parsing code. This made error recovery difficult, and it was difficult to understand the syntax or semantics individually. We now have excellent tools[8, 32] for the automatic generation of lexical analyzers and error-correcting parsers. Syntactic error recovery is handled automatically by the parser generator, and semantic actions are only evaluated on correct (or corrected) parses. This has greatly improved both the quality of the error messages and the robustness of the compiler on incorrect inputs. We remark that it would have been helpful if the definition of Standard ML[26] had included an LR(1) grammar for the language.

There are two places in the ML grammar that appear not to be context free. One is the treatment of data constructors: according to the definition, constructor names are in a different lexical class than variable names, even though the distinction depends on the semantic analysis of previous datatype definitions. However, by putting constructors and variables into the same class of lexical tokens, and the same name space, parsing can be done correctly and the difference resolved in semantic analysis.

The other context-dependent aspect of syntax is the parsing of infix identifiers.
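The treatment of data constructors described in Section 2 can be sketched in ML itself. This is an illustrative sketch, not the compiler's actual code; the datatype and function names are invented.

```sml
(* Illustrative sketch: constructors and variables share one lexical
   class, and semantic analysis consults the environment to decide
   what an identifier denotes. *)
datatype binding
  = VARbind                 (* ordinary variable *)
  | CONbind                 (* data constructor from a datatype decl *)

(* lookup : (string * binding) list -> string -> binding *)
fun lookup env id =
    case List.find (fn (name, _) => name = id) env of
        SOME (_, b) => b
      | NONE => VARbind     (* an id not known as a constructor is a variable *)

(* After "datatype t = Leaf | Node of t * t" the environment maps
   Leaf and Node to CONbind, so later occurrences of Leaf are scanned
   as ordinary identifiers but resolve to constructors here. *)
val env = [("Leaf", CONbind), ("Node", CONbind)]
val b = lookup env "Leaf"   (* CONbind *)
```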
1 the parsing of infix identifiers. ML allows the pro- to error messages. Furthermore, these line num- grammer to specify any identifier as infix, with an bers are sprinkled into the annotated abstract syn- operator precedence ranging from 0 to 9. Our solu- tax tree so that the type checker, match compiler, tion to this problem is to completely ignore operator and debugger can also give good diagnostics. precedence in writing our LALR(1) grammar; the expression a+b∗c is parsed into the list [a, +,b,∗,c] and the semantic analysis routines include a simple 3Semanticanalysis operator precedence parser (35 lines of ML). Each production of our grammar is annotated by A static environment maps each variable of the pro- a semantic action, roughly in the style made pop- gram to a binding containing its type and its runtime ular by YACC[16]. Our semantic actions are writ- access information. The type is used for compile- ten like a denotational semantics or attribute gram- time type checking, and is not used at runtime. The mar, where each fragment is a function that takes access information is (typically) the name of a low- inherited attributes as parameters and returns syn- level λ-calculus variable that will be manipulated by thesized attributes as results. Within the actions the code generator. Static environments also map there are occasional side-effects; e.g. when the type- other kinds of identifiers—data constructors, type checker performs unification by the modification of constructors, structure names, etc.—to other kinds ref-cells. of bindings. A complete parse yields a function p parameter- Our initial implementation treated environments ized by a static environment e (of identifiers defined imperatively: the operations on environments were in previous compilation units, etc.). 
No side-effects to add a new binding to the global environment; occur until p is applied to e,atwhichpointe is to “mark” (save) the state of the environment; to distributed by further function calls to many levels revert back to a previous mark; and, for imple- of the parse tree. In essence, before p is applied to mentation of the module system, to encapsulate e it is a tree of closures (one pointing to the other) into a special table everything added since a par- that is isomorphic to the concrete parse tree of the ticular mark. We did this even though we knew program. Yet we have not had to introduce a myr- better—denotational semantics or attribute gram- iad of data constructors to describe concrete parse mars would have us treat environments as pure val- trees! ues, to be combined to yield larger environments— Delaying the semantic actions is useful to the because we thought that imperative environments error-correcting parser. If an error in the parse oc- would be faster. curs, the parser might want to correct it at a point We have recently changed to a pure functional 10 tokens previous; this means discarding the last style of environments, in which the operations are few semantic actions. Since the actions have had no to create an environment with a single binding, and side-effects, it is easy to discard them. Then, when to layer one environment on top of another nonde- a complete correct parse is constructed, its seman- structively, yielding a new environment. The im- tic value can be applied to the environment e and plementation of this abstract data type has side all the side-effects will go off in the right order. effects, as sufficiently large environment-values are Finally, the treatment of mutually-recursive defi- represented as hash tables, etc. 
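The nondestructive environment operations described in Section 3 can be sketched as follows. This is a minimal functional sketch with invented names; the real implementation switches to hash-table representations for sufficiently large environments.

```sml
(* Sketch: environments as functions from names to optional bindings. *)
datatype 'a env = ENV of string -> 'a option

val empty = ENV (fn _ => NONE)

(* bind : string * 'a -> 'a env   -- an environment with one binding *)
fun bind (name, v) = ENV (fn n => if n = name then SOME v else NONE)

(* atop : 'a env * 'a env -> 'a env
   Layer inner on top of outer; neither argument is modified. *)
fun atop (ENV inner, ENV outer) =
    ENV (fn n => case inner n of SOME v => SOME v | NONE => outer n)

fun look (ENV e) n = e n

val e = atop (bind ("x", 1), bind ("x", 2))
val r = look e "x"    (* SOME 1: the inner layer shadows the outer *)
```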
We made this nitions is easier with delayed semantic actions; the change to accommodate the new debugger, which newly-defined identifiers can be entered into the must allow the user to be in several environments environment before the right-hand-sides are pro- simultaneously; and to allow the implementation of cessed. “make” programs, which need explicit control over There is one disadvantage to this arrangement. the static environments of the programs being com- It turns out that the closure representation of the piled. Though we were willing to suffer a perfor- concrete parse tree is much larger than the anno- mance degradation in exchange for this flexibility, tated parse tree that results from performing the we found “pure” environments to be just as fast as semantic actions. Thus, if we had used a more con- imperative ones. ventional style in which the actions are performed This illustrates a more general principle that we as the input is parsed, the compiler would use less have noticed in ML program development. Many memory. parts of the compiler that we initially implemented Our parser-generator provides, for each nonter- in an imperative style have been rewritten piece- minal in the input, the line number (and position meal in a cleaner functional style. This is one within the line) of the beginning and end of the pro- of the advantages of ML: programs (and program- gram fragment corresponding to that nonterminal. mers) can migrate gradually to “functional” pro- These are used to add accurate locality information gramming. Type checking when it interferes with the matching of a signature specification merely because of the use of an imper- The main type-checking algorithm has changed rel- ative style within a function’s definition. Such im- atively little since our earlier description[7]. The plementation choices should be invisible in the type. 
representations of types, type constructors, and Research continues on this problem[17, 22, 38], but type variables have been cleaned up in various ways, there is no satisfactory solution yet. but the basic algorithm for type checking is still The interface between the type checker and the based on a straightforward unification algorithm. parser is quite simple in most respects. There is The most complex part of the type-checking al- only one entry point to the type checker, a function gorithm deals with weak polymorphism, a restricted that is called to type-check each value declaration at form of polymorphism required to handle mutable top level and within a structure. However, the inter- values (references and arrays), exception transmis- face between type checking and the parser is com- sion, and communication (in extensions like Con- plicated by the problem of determining the scope or current ML[28]). Standard ML of New Jersey im- binding point of explicit type variables that appear plements a generalization of the imperative type in a program. The rather subtle scoping rules for variables described in the Definition[26, 34]. In our these type variables[26, Section 4.6][25, Section 4.4] scheme, imperative type variables are replaced by force the parser to pass sets of type variables both weak type variables that have an associated degree upward and downward (as both synthesized and in- of weakness: a nonnegative integer. A type vari- herited attributes of phrases). Once determined, able must be weak if it is involved in the type of the set of explicit type variables to be bound at a an expression denoting a reference, and its degree definition is stored in the abstract syntax represen- of weakness roughly measures the number of func- tation of the definition to make it available to the tion applications that must take place before the typechecker. reference value is actually created. 
A weakness de- gree of zero is disallowed at top level, which insures that top-level reference values (i.e. those existing 4 Modules within values in the top level environment) have monomorphic types. The type-checking algorithm The implementation of modules in SML of NJ has uses an abstract type occ to keep track of the “ap- evolved through three different designs. The main plicative context” of expression occurrences, which innovation of the second version factored signatures is approximately the balance of function abstrac- into a symbol table shared among all instances, tions over function applications surrounding the ex- and a small instantiation environment for each pression, and the occ value at a variable occurrence instance[23]. Experience with this version revealed determines the weakness degree of generic type vari- problems that led to the third implementation de- ables introduced by that occurrence. The occ value veloped in collaboration with Georges Gonthier and at a let binding is also used to determine which type Damien Doligez. variables can be generalized. The weak typing scheme is fairly subtle and has been prone to bugs, so it is important that it be for- Representations malized and proven sound (as the Tofte scheme has At the heart of the module system are the internal been [Tofte-thesis]). There are several people cur- representations of signatures, structures, and func- rently working on formalizing the treatment used in tors. Based on these representations, the following the compiler[17, 38]. principal procedures must be implemented: The weak polymorphism scheme currently used in Standard ML of New Jersey is not regarded as the fi- 1. signature creation—static evaluation of signa- nal word on polymorphism and references. It shares ture expressions; with the imperative type variable scheme the fault that weak polymorphism propagates more widely 2. structure creation—static evaluation of struc- than necessary. 
Even purely internal and tempo- ture expressions; rary uses of references in a function definition will often “poison” the function, giving it a weak type. 3. signature matching between a signature and a An example is the definition structure, creating an instance of the signature, and a view of the structure; fun f x = !(ref x) in which f has the type 1α → 1α, but ought to have 4. definition of functors—abstraction of the func- the strong polymorphic type α → α. This inessen- tor body expression with respect to the formal tial weak polymorphism is particularly annoying parameter; 5. functor application—instantiation of the for- to refer to volatile components, we can avoid the mal parameter by matching against the ac- bound stamps by using this relativized symbol ta- tual parameter, followed by instantiation of the ble alone to represent signatures. functor body. To drop the instantiation environment part of the signature representation, leaving only the symbol It is clear that instantiation of structure tem- table part, we need to revise the details of how envi- plates (i.e. signatures and functor bodies) is a criti- ronments are represented. Formerly a substructure cal process in the module system. It is also a process specification would be represented in the symbol ta- prone to consume excessive space and time if imple- ble by a binding like mented naively. Our implementation has achieved reasonable efficiency by separating the volatile part A → INDstr i of a template, that which changes with each in- stance, from the stable part that is common to all indicating that A is the ith substructure, and the instances and whose representation may therefore rest of the specification of A (in the form of a dummy be shared by all instances. The volatile compo- structure) would be found in the ith slot of the in- nents are stored in an instantiation environment stantiation environment. 
Since we are dropping the and they are referred to indirectly in the bindings dummy instantiation environment we must have all in the shared symbol table (or static environment) the information specifying A in the binding. Thus using indices or paths into the instantiation envi- the new implementation uses ronment. The instantiation environment is repre- → { } sented as a pair of arrays, one for type constructor A FORMALstrb pos = i, spec = sig A components, the other for substructures. The static representation of a structure is essen- as the binding of A. This makes the substructure tially an environment (i.e., symbol table) contain- signature specification available immediately in the ing bindings of types, variables, etc., and an iden- symbol table without having to access it indirectly tifying stamp[26, 33, 23]. In the second implemen- through an instantiation environment. tation a signature was represented as a “dummy” Another improvement in the representation of instance that differs from an ordinary structure signatures (and their instantiations) has to do with in that its volatile components contain dummy or the scope of instantiation environments. In the old bound stamps and it carries some additional infor- implementation each substructure had its own in- mation specifying sharing constraints. The volatile stantiation environment. But one substructure may components with their bound stamps are replaced, contain relative references to a component of an- or instantiated, during signature matching by cor- other substructure, as in the following example responding components from the structure being signature S1 = matched. Similarly, a functor body is represented sig as a structure with dummy stamps that are replaced structure A : sig type t end by newly generated stamps when the functor is ap- structure B : sig val x : A.t end plied. 
end The problem with representing signatures (and Here the type of B.x refers to the first type com- functor bodies) as dummy structures with bound ponent t of A. This would be represented from the stamps is the need to do alpha-conversion at var- standpoint of B as a relative path [parent, first sub- ious points to avoid confusing bound stamps. To structure, first type constructor]. To accommodate minimize this problem the previous implementation these cross-structure references when each struc- insures that the sets of bound stamps for each signa- ture has a local instantiation environment, the first ture and functor body are disjoint. But there is still structure slot in the instantiation environment con- a problem with signatures and functors that are sep- tains a pointer to the parent signature or structure. arately compiled and then imported into a new con- Defining and maintaining these parent pointers was text; here alpha-conversion of bound stamps is re- another source of complexity, since it made the rep- quired to maintain the disjointness property. Man- resentation highly cyclical. aging bound stamps was a source of complexity and The new representation avoids this problem by bugs in the module system. having a single instantiation environment shared by The usual way of avoiding the complications of the top level signature and all its embedded signa- bound variables is to replace them with an index- tures. An embedded signature is one that is writ- ing scheme, as is done with deBruijn indices in the ten “in-line” like the signatures of A and B in the lambda calculus[13]. Since in the symbol table part example above. In the above example, the new rep- we already used indices into instantiation arrays resentation of A.t within B is [first type constructor] since A.t will occupy the first type constructor slot relativized to this instantiation environment by re- in the shared instantiation environment. placing direct references by indirect paths. 
As in the A nonembedded signature is one that is defined case of signature matching, this minimizes the effort at top level and referred to by name. The signa- required to create an instance of the body when the ture S0 in the following example is a nonembedded functor is applied, because the symbol table infor- signature. mation is inherited unchanged by the instance. signature S0 = sig type t end Defining a functor is done in three steps: (1) The signature S1 = formal parameter signature is instantiated to create sig a dummy parameter structure. (2) This dummy structure A : S0 structure is bound to the formal parameter name structure B : sig val x : A.t end in the current environment and the resulting envi- end ronment is used to parse and type-check the func- In this case the type A.t of x uses the indi- tor body expression. If a result signature is spec- rect reference [first substructure, first type construc- ified the functor body is matched against it. (3) tor] meaning the first type constructor in the local The resulting body structure is scanned for volatile instantiation environment of A,whichisthefirst components, identified by having stamps belonging structure component in the instantiation environ- to the dummy parameter or generated within the ment of S1. S1 and B share a common instantiation body, and references to these volatile components environment because B is embedded in S1.ButS0, are replaced by indirect positional references into the signature of A, is nonembedded because it was an instantiation environment. defined externally to S1. It therefore can contain The instantiation of the parameter signature no references to other components of S1 and so it must produce a structure that is free modulo the is given its own private instantiation environment sharing constraints contained in the signature. In having the configuration appropriate to S0. 
other words, it must satisfy the explicit sharing constraints in the signature and all implicit shar- ing constraints implied by them, but there must Signature Matching by no extraneous sharing. The algorithm used for The goal of the representation of signatures is to this instantiation process is mainly due to George make it easy to instantiate them via signature Gonthier and is vaguely related to linear unification matching. A signature is a template for struc- algorithms. This instantiation process is also used tures, and a structure can be obtained from the sig- to create structures declared as abstractions using nature by adding an appropriate instantiation en- the abstraction declaration of Standard ML of New vironment (and recursively instantiating any sub- Jersey (a nonstandard extension of the language). structures with nonembedded signature specifica- Given this processing of the functor definition, tions). functor application is now a fairly straightforward The signature matching process involves the fol- process. The actual parameter is matched with the lowing steps: (1) Create an empty instantiation en- formal parameter signature yielding an instantia- vironment of a size specified in the signature repre- tion environment relative to the parameter signa- sentation. (2) For each component of the signature, ture. This is combined with a new instantiation in the order they were specified, check that there is a environment generated for the functor body using corresponding component in the structure and that freshly generated stamps in new volatile compo- this component satisfies the specification. When nents. this check succeeds it may result in an instance of a volatile component (e.g. a type constructor) that is entered into the new instantiation environment. 
(3) 5 Translation to λ-language Finally, having created the instantiation structure, any sharing constraints in the signature are verified During the semantic analysis phase, all static pro- by “inspection.” gram errors are detected; the result is an abstract parse tree annotated with type information. This Functors is then translated into a strict lambda calculus aug- mented with data constructors, numeric and string The key idea is to process a functor definition to constants, n-tuples, mutually-recursive functions; isolate volatile components of the result (those de- and various primitive operators for arithmetic, ma- riving from the parameter and those arising from nipulation of refs, numeric comparisons, etc. The generative declarations in the body) in an instanti- translation into λ-language is the phase of our com- ation environment. Then the body’s symbol table is piler that has changed least over the years. Though the λ language has data constructors, it equality function also uses these tags, so even with does not have pattern-matches. Instead, there is a a sophisticated garbage collector they can’t be done very simple case statement that determines which away with. (One alternative is to pass an equality- constructor has been applied at top level in a given test function along with every value of an equality value. The pattern-matches of ML must be trans- type, but this is also quite costly[36].) lated into discriminations on individual construc- Finally, the treatment of equality types in Stan- tors. This is done as described in our previous dard ML is irregular and incomplete[15]. The paper[7], though Bruce Duba has revised the de- Definition categorizes type constructors as either tails of the algorithm. “equality” or “nonequality” type constructors; but The dynamic semantics of structures and functors a more refined classification would more accurately are represented using the same lambda-language op- specify the effects of the ref operator. 
Some types erators as for the records and functions of the core that structurally support equality are classified as language. This means that the code generator and nonequality types by the Definition. runtime system don’t need to know anything about the module system, which is a great convenience. Also in this phase we handle equality tests. ML 6 Conversion to CPS allows any hereditarily nonabstract, nonfunctional values of the same type to be tested for equality; The λ-language is converted into continuation- even if the values have polymorphic types. In most passing style (CPS) before optimization and code cases, however, the types can be determined at com- generation. CPS is used because it has clean seman- pile time. For equality on atomic types (like in- tic properties (like λ-calculus), but it also matches teger and real), we substitute an efficient, type- the execution model of a von Neumann register ma- specific primitive operator for the generic equal- chine: variables of the CPS correspond closely to ity function. When constructed datatypes are registers on the machine, which leads to very effi- tested for equality, we automatically construct a cient code generaton[18]. set of mutually-recursive functions for the specific In the λ-language (with side-effecting operators) instance of the datatype; these are compiled into we must specify a call-by-value (strict) order of eval- the code for the user’s program. Only when the uation to really pin down the meaning of the pro- type is truly polymorphic—not known at compile gram; this means that we can’t simply do arbitrary time—is the general polymorphic equality function β-reductions (etc.) to partially evaluate and opti- invoked. This function interprets the tags of ob- mize the program. 
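For constructed datatypes, Section 5 explains that equality is compiled into a set of recursive functions specific to the datatype instance. The following hand-written sketch shows how such a generated function behaves; the names here are hypothetical, not what the compiler emits.

```sml
datatype tree = Leaf | Node of tree * int * tree

(* Sketch: structural equality specialized to tree, avoiding the
   generic tag-interpreting polymorphic equality routine. *)
fun eqTree (Leaf, Leaf) = true
  | eqTree (Node (l1, n1, r1), Node (l2, n2, r2)) =
      n1 = n2 andalso eqTree (l1, l2) andalso eqTree (r1, r2)
  | eqTree _ = false

val t = Node (Leaf, 3, Node (Leaf, 4, Leaf))
val same = eqTree (t, t)    (* true *)
```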
In the conversion to CPS, all jects at runtime to recursively compare bit-patterns order-of-evaluation information is encoded in the without knowing the full types of the objects it is chaining of function calls, and it doesn’t matter testing for equality. whether we consider the CPS to be strict or non- Standard ML’s polymorphic equality seriously strict. Thus, β-reductions and other optimizations complicates the compiler. In the front end, there are become much easier to specify and implement. special “equality type variables” to indicate poly- The CPS notation[30] and our representation of morphic types that are required to admit equality, it[5] are described elsewhere, as is a detailed descrip- and signatures have an eqtype keyword to denote tion of optimization techniques and runtime repre- exported types that admit equality. The eqtype sentations for CPS[4]. We will just summarize the property must be propagated among all types and important points here. structures that share in a functor definition. We es- In continuation-passing style, each function can timate that about 7% of the code in the front end have several arguments (in contrast to ML, in which of the compiler is there to implement polymorphic functions formally have only one parameter). Each equality. of the actual parameters to a function must be The effect on the back end and runtime system is atomic—a constant or a variable. The operands just as pernicious. Because ML is a statically-typed of an arithmetic operator must also be atomic; the language, it should not be necessary to have type result of the operation is bound to a newly-defined tags and descriptors on every runtime object (as variable. There is no provision for binding the re- Lisp does). 
The only reasons to have these tags are sult of a function call to a variable; “functions never for the garbage collector (so it can understand how return.” to traverse pointers and records) and for the poly- To use CPS for compiling a programming morphic equality function. But it’s possible to give language—in which functions are usually allowed the garbage collector a map of the type system[1], to return results, and expressions can have nontriv- so that it can figure out the types of runtime objects ial sub-expressions—it is necessary to use continu- without tags and descriptors. Yet the polymorphic ations. Instead of saying that a function call f(x) returns a value a, we can make a function k(a)that This means that the compiler is free to perform expresses what “the rest of the program” would do common sub-expression elimination on record ex- with the result a, and then call fcps(x, k). Then pressions (i.e. convert the first expression above to fcps, instead of returning, will call k with its result the second); the garbage collector is free to make a. several copies of a record (possibly useful for concur- After CPS-conversion, a source-language func- rent collection), or to merge several copies into one tion call looks just like a source-language function (a kind of “delayed hash-consing”); a distributed return—they both look like calls in the CPS. This implementation is free to keep separate copies of means it is easy to β-reduce the call without reduc- a record on different machines, etc. We have not ing the return, or vice versa; this kind of flexibility really exploited most of these opportunities yet, is very useful in reasoning about (and optimizing) however. tail-recursion, etc. In a strict λ-calculus, β-reduction is problemat- ical. 
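The idea behind the CPS conversion of Section 6 can be illustrated by a small hand-converted source-level example. The compiler performs this on its intermediate language, not on source ML, and `sqrcps` is an invented name.

```sml
(* Direct style: the result of sqr is returned and bound. *)
fun sqr x = x * x
val r = sqr (sqr 3)                       (* 81 *)

(* CPS: sqrcps never returns; it passes its result to continuation k. *)
fun sqrcps (x, k) = k (x * x)
val _ = sqrcps (3, fn a =>
          sqrcps (a, fn b =>
            print (Int.toString b ^ "\n")))
```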
In a strict λ-calculus, β-reduction is problematical. If the actual parameters to a function have side effects, or do not terminate, then they cannot be safely substituted for the formal parameters throughout the body of the function. Any actual parameter expression could contain a call to an unknown (at compile time) function, and in this case it is impossible to tell whether it does have a side effect. But in CPS, the actual parameters to a function are always atomic expressions, which have no side effects and always terminate; so it's safe and easy to perform β-reduction and other kinds of substitutions.

In our optimizer, we take great advantage of a unique property of ML: records, n-tuples, constructors, etc., are immutable. That is, except for ref cells and arrays (which are identifiable at compile time through the type system), once a record is created it cannot be modified. This means that a fetch from a record will always yield the same result, even if the compiler arranges for it to be performed earlier or later than specified in the program. This allows much greater freedom in the partial evaluation of fetches (e.g. from pattern-matches), in constant-folding, in instruction scheduling, and in common-subexpression elimination than most compilers are permitted. (One would think that in a pure functional language like Haskell this immutable-record property would be similarly useful, but such languages are usually lazy, so that fetches from a lazy cell will yield different results the first and second times.)

A similar property of ML is that immutable records are not distinguishable by address. That is, if two records contain the same values, they are "the same;" the expressions

    [(x,y), (x,y)]
    let val a = (x,y) in [a,a] end

are indistinguishable in any context. This is not the case in most programming languages, where the different pairs (x,y) in the first list would have different addresses and could be distinguished by a pointer-equality test.

This means that the compiler is free to perform common-subexpression elimination on record expressions (i.e. convert the first expression above to the second); the garbage collector is free to make several copies of a record (possibly useful for concurrent collection), or to merge several copies into one (a kind of "delayed hash-consing"); a distributed implementation is free to keep separate copies of a record on different machines; etc. We have not really exploited most of these opportunities yet, however.

7 Closure conversion

The conversion of λ-calculus to CPS makes the control flow of the program much more explicit, which is useful when performing optimizations. The next phase of our compiler, closure conversion, makes explicit the access to nonlocal variables (using lexical scope). In ML (and Scheme, Smalltalk, and other languages), function definitions may be nested inside each other, and an inner function can have free variables that are bound in an outer function. Therefore, the representation of a function-value (at runtime) must include some way to access the values of these free variables. The closure data structure allows a function to be represented by a pointer to a record containing

1. The address of the machine-code entry-point for the body of the function.

2. The values of the free variables of the function.

The code pointer (item 1) must be kept in a standardized location in all closures; for when a function f is passed as an argument to another function g, then g must be able to extract the address of f in order to jump to f. But it's not necessary to keep the free variables (item 2) in any standard order; instead, g will simply pass f's closure-pointer as an extra argument to f, which will know how to extract its own free variables.

This mechanism is quite old[19] and reasonably efficient. However, the introduction of closures is usually performed as part of machine-code generation; we have made it a separate phase that rewrites the CPS representation of the program to include closure records. Thus, the output of the closure-conversion phase is a CPS expression in which it is guaranteed that no function has free variables; this expression has explicit record-creation operators to build closures, and explicit fetch operators to extract code-pointers and free variables from them.

Since closure-introduction is not bundled together with other aspects of code generation, it is easier to introduce sophisticated closure techniques without breaking the rest of the compiler.
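The closure records just described can be sketched concretely. In this Python illustration (a hypothetical rendering, not the compiler's actual representation; the names `make_adder`, `add_code`, `make_adder_converted`, and `apply_closure` are invented), a closure is a tuple whose first slot is the standardized code pointer and whose remaining slots hold the free-variable values:

```python
# Source-level: the inner function 'add' has free variables x and y.
def make_adder(x, y):
    def add(z):
        return x + y + z
    return add

# After closure conversion: 'add_code' takes its own closure as an
# extra argument and fetches x and y from it explicitly, so no
# function has free variables.
def add_code(clo, z):
    code_ptr, x, y = clo           # slot 0: entry point; slots 1..: free variables
    return x + y + z

def make_adder_converted(x, y):
    return (add_code, x, y)        # explicit record creation builds the closure

# A caller needs only the standardized code-pointer slot; it passes the
# closure itself along so the callee can find its own free variables.
def apply_closure(clo, arg):
    return clo[0](clo, arg)

adder = make_adder_converted(10, 20)
apply_closure(adder, 3)            # same result as make_adder(10, 20)(3)
```

Note that `apply_closure` never inspects slots 1 and 2; only `add_code` knows their layout, which is why the free variables need no standard order across closures.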
In general, we have found that structuring our compiler with so many phases—each with a clean and well-defined interface—has proven very successful in allowing work to proceed independently on different parts of the compiler.

Initially, we considered variations on two different closure representations, which we call flat and linked. A flat closure for a function f is a record containing the code-pointer for f and the values of each of f's free variables. A linked closure for f contains the code pointer, the value of each free variable bound by the enclosing function, and a pointer to the enclosing function's closure. Variables free in the enclosing function can be found by traversing the linked list of closures starting from f; this is just like the method of access links used in implementing static scope in Pascal.

It would seem that linked closures are cheaper to build (because a single pointer to the enclosing scope can be used instead of all the free variables from that scope) but costlier to access (getting a free variable requires traversing a linked list). In fact, we investigated many different representational tricks on the spectrum between flat and linked closures[6], including tricks where we use the same closure record for several different functions with several different code-pointers[5, 4].

In a "traditional" compiler, these tricks make a significant difference. But in the CPS representation, it appears that the pattern of functions and variable access narrows the effective difference between these techniques, so that closure representation is not usually too important.

There are two aspects of closures that are important, however. We have recently shown that using linked or merged closures can cause a compiled program to use much more memory[4]. For example, a program compiled with flat closures might use O(N) memory (i.e. simultaneous live data) on an input of size N, and the same program compiled with linked closures might use O(N²). Though this may happen rarely, we believe it is unacceptable (especially since the programmer will have no way to understand what is going on). We are therefore re-examining our closure representations to ensure "safety" of memory usage; this essentially means sticking to flat closures.

We have also introduced the notion of "callee-save registers."[9, 4] Normally, when an "unknown" function (e.g. one from another compilation unit) is called in a compiler using CPS, all the registers (variables) that will be needed "after the call" are free variables of the continuation. As such, they are stored into the continuation closure, and fetched back after the continuation is invoked. In a conventional compiler, the caller of a function might similarly save registers into the stack frame, and fetch them back after the call.

But some conventional compilers also have "callee-save" registers. It is the responsibility of each function to leave these registers undisturbed; if they are needed during execution of the function, they must be saved and restored by the callee.

We can represent callee-save variables in the original CPS language, without changing the code-generation interface. We will represent a continuation not as one argument but as N+1 arguments k0, k1, k2, ..., kN. Then, when the continuation k0 is invoked with "return-value" a, the variables k1, ..., kN will also be passed as arguments to the continuation. Since our code generator keeps all CPS variables in registers—including function parameters—the variables k1, ..., kN are, in effect, callee-save registers. We have found that N = 3 is sufficient to obtain a significant (7%) improvement in performance.

8 Final code generation

The operators of the CPS notation—especially after closure-conversion—are similar to the instructions of a simple register/memory von Neumann machine. The recent trend towards RISC machines with large register sets makes CPS-based code generation very attractive. It is a relatively simple matter to translate the closure-converted CPS into simple abstract-machine instructions; these are then translated into native machine code for the MIPS, Sparc, VAX, or MC68020. The latter two machines are not RISC machines, and to do a really good job in code generation for them we would have to add a final peephole-optimization or instruction-selection phase. On the RISC machines, we have a final instruction-scheduling phase to minimize delays from run-time pipeline interlocks.

One interesting aspect of the final abstract-machine code generation is the register allocation. After closure-conversion and before code generation we have a spill phase that rewrites the CPS expression to limit the number of free variables of any subexpression to less than the number of registers on the target machine[5, 4]. It turns out that very few functions require any such rewriting, especially on modern machines with 32 registers; five spills in 40,000 lines of code is typical.

Because the free variables of any expression are guaranteed to fit in registers, register allocation is a very simple matter: when each variable is bound, only K other variables are live (i.e. free in the continuation of the operation that binds the variable), where K

the free-space register, followed by the addition of a constant (the size of the new record) to the reg-