Hygienic Macro Expansion

Eugene Kohlbecker, Daniel P. Friedman, Matthias Felleisen, Bruce Duba

Computer Science Department, Lindley Hall 101, Indiana University, Bloomington, Indiana 47405, USA

Abstract. Macro expansion in current Lisp systems is naive with respect to block structure. Every macro function can cause the capture of free user identifiers and thus corrupt intended bindings. We propose a change to the expansion algorithm so that macros will only violate the binding discipline when it is explicitly intended.

1. Problems with Macro Expansions

Lisp macro functions are powerful tools for the extension of language syntax. They allow programmers to add new syntactic constructs to a programming language. A programmer specifies a macro function which translates actual instances of a syntactic extension to core language expressions. This process can also be pyramided, i.e., macro functions may translate into an already extended language [5]. The defined set of macro functions is coordinated by a preprocessor, usually called a macro expander. The macro expander parses every user input. If the expander finds an instance of a syntactic extension, it applies the appropriate macro function. It repeats this process until an expression of the core language is obtained.

In most current Lisp systems the expander's task is confined to the process of finding syntactic extensions and replacing them by their expansions. This implies, in particular, that each macro function is responsible for the integrity of the program. For Lisp systems (and other languages with similar macro facilities) this means specifically that variable bindings must not be corrupted. This, however, is not as simple a task as it sounds.

Macro functions in Lisp generally act like the context filling operation in the λ-calculus [2], p. 29. Given its textual parameters, a macro function places them into the appropriately labeled holes of some expansion pattern. Free identifiers in user code may unintentionally be captured by macro-generated bindings. For example, a macro function for or-expressions in Lisp may be understood as a transformation from patterns of the type

    (or ⟨exp⟩₁ ⟨exp⟩₂)

to an expansion pattern like

    (let v [ ]⟨exp⟩₁ (if v v [ ]⟨exp⟩₂)).¹

In other words, the or-macro fills the hole [ ]⟨exp⟩ᵢ with ⟨exp⟩ᵢ. An instance like (or nil v) is transcribed to (let v nil (if v v v)). This example reveals that the capturing of free user identifiers is dangerous. The expanded expression will always produce the value nil, independently of the value of the user identifier v, quite contrary to the expectations of a programmer.

The real danger of these erroneous macros is that they are treacherous. They work in all cases but one: when the user (or some other macro writer) inadvertently picks the wrong identifier name.

Various techniques have been proposed to circumvent this capturing problem, but they rely on the individual macro writer. If even one of the many macro writers is negligent, the macro system becomes unsafe. We claim that the task of safely renaming macro-generated identifiers is mechanical: it is essentially an α-conversion [2], p. 26, which is knowledgeable about the origin of identifiers. For these reasons we propose a change to the naive macro expansion algorithm which automatically maintains hygienic conditions during expansion time.

The rest of the paper is devoted to the presentation of the problem definition and the new algorithm. The second section describes our target programming language and its macro expander language. In the third section we discuss the naive macro expansion algorithm. Section 4 contains the hygienic expansion algorithm and a correctness theorem. In Section 5 we show how to extend the solution to cover important constructs of Lisp. The last section highlights the merits of the new algorithm and its implications for macro writers.

1 T~tsbtb,~*o(Oml~,(W,,I I~,h)) I I~,).

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1986 ACM 0-89791-200-4/86/0800.0151 75¢

2. Language Considerations

A macro expander maps expressions from an extended programming language to an expression in a core language. Hence, our first consideration must be the source and target language of the expander.

The most important aspect of the target language with respect to the capturing problem is its lexical scoping mechanism. We have chosen to use the λ-calculus as it is the prototype of block structured programming languages. It is syntactically simple, yet contains all the required elements to make the case interesting, and has the right level of complexity. Furthermore, it is a fairly trivial task to generalize an algorithm for the λ-calculus to Common Lisp [6], Scheme [3], or Algol-like languages.

The variant of the λ-calculus we use as our target programming language is defined by the grammar:

    λterm ::= var
            | const
            | (lambda var λterm)
            | (λterm λterm).

The characters "(", ")", and "lambda" are terminal symbols and are collectively referred to as the set of core tokens: coretok = {(, ), lambda}. The set const includes all constants commonly found in Lisp such as strings, vectors, numbers, closures, etc. The set var is composed of Lisp symbols that are used as identifier names; it is disjoint from the set of core tokens.

Variable and constant expressions stand for values as usual. Abstractions, i.e., lambda-expressions, represent procedures of one variable. The lambda-bound identifier (the parameter) can only be referred to within the abstraction body, i.e., identifiers are lexically scoped. We call the occurrence of a variable in the parameter part of a lambda-expression its binding instance. Applications correspond to function invocations.

The source language needs to be an extension of the target language. It must allow for one kind of expression which is only specified in a rather general way. The concrete extensions of λterm as defined by the individual macros will be specializations of this language. We refer to the source language as the language of syntax trees and define it inductively by:

    stree ::= var
            | const
            | mstree
            | (lambda var stree)
            | (stree stree).

The set mstree is the sublanguage of syntactic extensions. We assume that instances of macro expressions are recognized by the presence of macro tokens, i.e., elements of a distinguished set mactok. Macro tokens can either be syntactic extensions by themselves or are the first component of an arbitrarily long syntax tree:

    mstree ::= mactok | (mactok stree₁ ... streeₙ)    for all n ≥ 0.

Since this syntax is ambiguous, we add the provision that an expression of the form (m s) with m ∈ mactok and s ∈ stree is a syntactic extension; it is not an application. See Figure 1 for a summary of the definitions.

3. The Naive Approach to Macro Expansion

Before we can describe the expansion algorithm which is currently employed in Lisp systems, we need to define some terminology. Recall that a possible form for a syntactic extension is:

    (mactok stree₁ ... streeₙ).

The trees stree₁ through streeₙ are called the syntactic scope of the extension.

We say a syntactic extension or any λterm-expression occurs in a syntax tree if it is a subtree that is not nested within the syntactic scope of another syntactic extension. For example, the syntactic extension (or x y) occurs within the expression

    (lambda x (or x y)),

but it does not occur within

    (case tag (name (or x y)))

nor within

    (case tag (or x y))

because it is in the syntactic scope of the case-extension.

The notion of occurrence reflects the fact that every sentence in the language of syntax trees has two interpretations. It may be considered as an element of the λterm-language or as a proper syntactic extension. Since the expansion of a syntactic extension involves a rearrangement of (parts of) syntax trees in its syntactic scope, we can only be sure about the interpretation of an expression when it is not embedded in a syntactic extension. In the above example, the list (or x y) is in the first two cases a syntactic extension; in the third one, however, it only stands for a list with or in the first position and x and y in the rest of the list. The same is true if we replace the or by lambda.

A syntactic transform function (an element of STF) is a macro function which is defined by the macro writer and which expands a particular class of syntactic extensions, e.g., the or-macro of Section 1. The result of applying a transform function to an occurrence of a syntactic extension is called a transcription. A transcription step is the one-step expansion of a syntactic extension. Symbols which are introduced during a transcription step are referred to as generated symbols.

The set of macros used during an expansion is a syntax table (an element of ST). Applied to an occurrence of a syntactic extension it produces its transcription. It serves as a dispatcher and applies the appropriate transform function to its argument.

Equipped with these definitions, we can define the expansion of a syntax tree with respect to a syntax table as the tree in which all occurrences of syntactic extensions are replaced by the expansions of their transcriptions. If the expansion process halts, the result of an expansion is a λterm, i.e., an expression of the core language. We have formalized these definitions in Figure 1.

The major problem with naive expansion is that it does not enforce the integrity of lexical bindings. This point was illustrated in Section 1 with the incorrect expansion of an or-expression. The example may suggest that one can simply rename all generated identifiers after each transcription step. But this impression is too simplistic. Generated identifiers may act as free variables which are to be captured by the user-supplied program context. They must not be renamed. On the other hand, since macro expansion is possibly pyramided, it may also not be quite obvious after a transcription step which identifier is to be free and which one is to be bound. A final difficulty is that sometimes capturing is desired. Consider a loop-macro which transforms patterns of the form

    (loop ⟨block⟩)

by filling the expansion pattern

    ((Y (lambda f (lambda e (f [ ]⟨block⟩)))) 1).

This fill-operation of contexts captures the free identifiers f and e in the expression ⟨block⟩. The capturing by f is almost certainly undesired, but the one by e may be quite useful. The identifier e is always bound to the result of the last iteration step (initially it is 1), and it might be necessary to have this value around. Given a protocol which indicates this binding of e within the syntactic scope of loop for the macro user, one can imagine that the results of applying the macro function must capture free e's in

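Figure 1: Naive macro expansion

Syntactic Domains:

$c \in \mathit{const}$ constant names, $v \in \mathit{var}$ identifier names, $m \in \mathit{mactok}$ macro tokens, $e \in \mathit{mstree}$ macro expressions, $s \in \mathit{stree}$ expressions.

Syntax:

$$
\begin{aligned}
s &::= c \mid v \mid e \mid (\texttt{lambda}\ v\ s) \mid (s_1\ s_2), \quad s_1 \notin \mathit{mactok}, \\
e &::= m \mid (m\ s_1 \ldots s_n) \quad \text{for } n \ge 0 \text{ with the above restriction.}
\end{aligned}
$$

Semantic Domains:

$\mathit{STF} = \mathit{mstree} \to \mathit{stree}$, $\theta \in \mathit{ST} = \mathit{mstree} \to \mathit{stree}$, $\lambda\mathit{term}$.

Semantic Functions:

$$
\begin{aligned}
\mathcal{E}_{naive} &: \mathit{stree} \to \mathit{ST} \to \lambda\mathit{term}; \\
\mathcal{E}_{naive}[\![c]\!] &= \lambda\theta.\,c, \\
\mathcal{E}_{naive}[\![v]\!] &= \lambda\theta.\,v, \\
\mathcal{E}_{naive}[\![e]\!] &= \lambda\theta.\,\mathcal{E}_{naive}[\![(\theta e)]\!]\theta, \\
\mathcal{E}_{naive}[\![(\texttt{lambda}\ v\ s)]\!] &= \lambda\theta.\,(\texttt{lambda}\ v\ \mathcal{E}_{naive}[\![s]\!]\theta), \\
\mathcal{E}_{naive}[\![(s_1\ s_2)]\!] &= \lambda\theta.\,(\mathcal{E}_{naive}[\![s_1]\!]\theta\ \ \mathcal{E}_{naive}[\![s_2]\!]\theta).
\end{aligned}
$$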

⟨block⟩ but must avoid the capturing of f's. The situation looks hopeless for a mechanized solution. Therefore, people have invented a variety of techniques which give the designer of the macro functions a mechanism for avoiding capture when required.

One of the common solutions to the capturing problem uses bizarre or freshly created identifier names for macro-generated bindings. Another solution involves the freezing (closing) of user-code at the right time in the correct environment [7]. It is clear that bizarre names only lower the probability of the problem occurring, but do not eliminate it. The freshly created identifier approach works if the macro writer always specifies which identifiers are to be so considered. Freezing and thawing user-code is even more complicated since it has to be done in the right environment. All of these solutions suffer from the same drawback: the macro writer is responsible for their realization. If he is negligent, the macro is insidious.

In the next section we present an expansion algorithm which automatically resolves the problem and requires little modification to the conventional macro writing style. It relies on the fact that almost all generated identifiers are to be freshly created, and it allows exceptions from this default assumption when necessary.

4. Hygienic Macro Expansion

The capturing problem of the naive expansion algorithm is analogous to the substitution problem in the λ-calculus. When an expression M with free variables is to be substituted into an expression N, the binding variables of N must be different from the free ones in M. Put differently, bindings in N must not capture free variables in M. Kleene calls this condition "being-free-for" [4]; the term "hygiene condition" is a more informal but rather descriptive name for it [1].

So, what we want to impose on macro expansion is something like a hygiene condition. With a few exceptions, we do not want generated binding instances created by one transcription step to capture user-supplied variables or variables from some other transcription step. Thus, not taking intended capturings into account, we formulate the

Hygiene Condition for Macro Expansion. Generated identifiers that become binding instances in the completely expanded program must only bind variables that are generated at the same transcription step. (HC/ME)

From the λ-calculus, one knows that if the hygiene condition does not hold, it can be established by an appropriate number of α-conversions. That is also the basis of our solution. Ideally, α-conversions should be applied with every transformation step, but as we have discussed in the previous section, that is impossible. One cannot know in advance which macro-generated identifier will end up in a binding position. Hence, it is a quite natural requirement that one retains the information about the origin of an identifier. To this end, we combine the expansion algorithm with a tracking mechanism.

Tracking is accomplished with a time-stamping scheme. Time-stamps, sometimes called clock values, are simply non-negative integers. The domain of time-stamped variables (tsvar) is isomorphic to the product of identifiers and non-negative integers. We sometimes refer to elements of tsvar as tokens. The source and target language of the individual macros is extended to the language of time-stamped syntax trees. Time-stamped syntax trees are defined like syntax trees, but instead of identifiers they include elements from the union of identifiers and tokens. The formal definition is shown in Figure 2.

Figure 3 contains functions which connect time-stamped domains with pure ones. The function $S$ takes a time-stamp as an argument and returns a function which injects identifiers into tsvar with the given time-stamp. $S_0$ is the function which stamps an identifier with a 0. It will play a role in the treatment of intended capturings. The function $[v/w]t$ acts like a substitution: given a time-stamped $w$, a $v$, and a time-stamped λterm $t$, it substitutes all free occurrences of $w$ by $v$. We have omitted the formal definition of the domain of time-stamped λterms, but it is the subset of tsstrees which do not contain syntactic extensions.

The new algorithm consists of four major phases; see Figure 4. It starts out by transforming the user-supplied stree into a time-stamped syntax tree. This is accomplished by the function $\mathcal{T}$, which parses the stree and stamps all identifier leaves with the function $\tau$. For the initial pass, $\tau$ is the function $S_0 = (S\ 0)$. Then the real expansion process begins and the clock value is increased to 1.

As before, in the naive algorithm, the function $\mathcal{E}$ parses through tsstree-expressions that are also λterm-expressions. When it discovers a syntactic extension, it generates the appropriate transcription. But instead of immediately continuing, the algorithm first time-stamps all the macro-generated identifiers. Again, this time-stamping process is performed by the function $\mathcal{T}$ in cooperation with the function $(S\ j)$, where $j$ is the current clock value. Afterwards, the clock value is increased and the expansion continues.

The result of the function $\mathcal{E}$ is a time-stamped λterm. It differs from the result of the naive algorithm. Wherever the naive result contained a variable, the modified result contains a corresponding time-stamped variable. For example, when the naive algorithm returned the tree

    (lambda x (lambda x ((f x) x)))

the modified result may appear as

    (lambda x:0 (lambda x:1 ((f:1 x:0) x:1))).

This indicates that according to the HC/ME the naive algorithm would have gotten the bindings wrong.

The third phase of the modified algorithm replaces all bound, time-stamped identifiers by unstamped identifiers. It is important that we can tell when a token was generated. Tokens with different time-stamps came from different transcription steps, and this difference must be preserved. Since the expression is now a time-stamped λterm, α-conversions easily achieve the effect. The function $\mathcal{A}$ parses the term and applies the appropriate substitution function to λ-abstractions. The above example would become something similar to

    (lambda a (lambda b ((f:1 a) b))).

The bindings are now as intended. The only remaining time-stamped identifiers correspond to free identifiers. Their meanings are determined by the identifier components and, hence, they must be unstamped. This is the task of the function $\mathcal{U}$. It parses the tree and removes all time-stamps. The result of this fourth and last phase is a pure λterm:

    (lambda a (lambda b ((f a) b))).

Before we can examine the similarities and differences between the results of the naive and hygienic expansion algorithm, we need to discuss the implications of the modified expander on the transform functions. Since we have changed the source and target languages of STFs, we should expect that transform functions must employ different functions. However, the change of languages is really a minor one. Indeed, if we consider time-stamped identifiers simply as a special kind of variable, the transform functions are not changed, except that functions that need to know or compare identifier names must unstamp the appropriate token with the function $\mathcal{U}$ (or a respective restriction thereof). Thus, a syntax table $\theta$ for the naive expander induces a syntax table $\theta'$ for the hygienic one in such a way that

$$\text{for all } f \in \mathit{tsmstree}, \quad \theta(\mathcal{U}[\![f]\!]) = \mathcal{U}[\![\theta'(f)]\!].$$
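The third and fourth phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a time-stamped variable v:i is modelled as the tuple (name, stamp), abstractions and applications are Python lists, and the supply of fresh names (here the letters a, b, ...) is hypothetical and simply assumed not to clash with user identifiers.

```python
# Assumed encoding (not from the paper): v:i is the tuple (name, stamp),
# an abstraction is ["lambda", param, body], an application is [fun, arg].

def subst(v, w, t):
    """[v/w]t: replace the free occurrences of the stamped variable w in t by v."""
    if isinstance(t, list):
        if t[0] == "lambda":
            if t[1] == w:              # w is rebound here, nothing below is free
                return t
            return ["lambda", t[1], subst(v, w, t[2])]
        return [subst(v, w, x) for x in t]
    return v if t == w else t

def A(t, fresh):
    """Phase 3: alpha-convert every abstraction to a fresh, unstamped parameter."""
    if isinstance(t, list):
        if t[0] == "lambda":
            v = next(fresh)
            return ["lambda", v, A(subst(v, t[1], t[2]), fresh)]
        return [A(x, fresh) for x in t]
    return t

def U(t):
    """Phase 4: strip the stamps that remain, i.e. those on free identifiers."""
    if isinstance(t, list):
        return [U(x) for x in t]
    return t[0] if isinstance(t, tuple) else t

# (lambda x:0 (lambda x:1 ((f:1 x:0) x:1)))
stamped = ["lambda", ("x", 0),
           ["lambda", ("x", 1), [[("f", 1), ("x", 0)], ("x", 1)]]]

renamed = A(stamped, iter("abcdefgh"))   # hypothetical fresh-name supply
pure = U(renamed)
print(renamed)   # ['lambda', 'a', ['lambda', 'b', [[('f', 1), 'a'], 'b']]]
print(pure)      # ['lambda', 'a', ['lambda', 'b', [['f', 'a'], 'b']]]
```

The two printed terms correspond to (lambda a (lambda b ((f:1 a) b))) and (lambda a (lambda b ((f a) b))) from the text: the two parameters named x stay distinct because their stamps differ, and only the free f keeps its stamp until the final unstamping.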

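Figure 2: Hygienic macro expansion (1)

Syntactic Domains:

$c \in \mathit{const}$ constant names, $v \in \mathit{var}$, $w \in \mathit{tsvar}$ (time-stamped) identifier names, $m \in \mathit{mactok}$ macro tokens, $e \in \mathit{mstree}$, $f \in \mathit{tsmstree}$ (time-stamped) macro expressions, $s \in \mathit{stree}$, $t \in \mathit{tsstree}$ (time-stamped) expressions.

We also refer to

$x \in \mathit{const} \cup \mathit{var} \cup \mathit{mactok} \cup \mathit{coretok}$, $y \in \mathit{const} \cup \mathit{tsvar} \cup \mathit{mactok} \cup \mathit{coretok}$, $z \in \mathit{coretok} \cup \mathit{tsstree}$.

Syntax:

$$
\begin{aligned}
s &::= c \mid v \mid e \mid (\texttt{lambda}\ v\ s) \mid (s_1\ s_2), \quad s_1 \notin \mathit{mactok}, \\
e &::= m \mid (m\ s_1 \ldots s_n) \quad \text{for } n \ge 0; \\
t &::= c \mid v \mid w \mid f \mid (\texttt{lambda}\ v\ t) \mid (t_1\ t_2), \quad t_1 \notin \mathit{mactok}, \\
f &::= m \mid (m\ t_1 \ldots t_n) \quad \text{for } n \ge 0 \text{ with the above restriction.}
\end{aligned}
$$

Semantic Domains:

$\mathit{STF} = \mathit{tsmstree} \to \mathit{tsstree}$, $\theta \in \mathit{ST} = \mathit{tsmstree} \to \mathit{tsstree}$.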


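Figure 3: Hygienic macro expansion (2)

Auxiliary Functions:

$$
\begin{aligned}
S &: \mathbf{N} \to \mathit{var} \to \mathit{tsvar}, \qquad S\ i\ v = v{:}i. \\
S_0 &: \mathit{var} \to \mathit{tsvar}; \qquad S_0 = S\ 0. \\
[\ /\ ] &: \mathit{tsvar} \times \mathit{var} \to \mathit{tsstree} \to \mathit{tsstree}, \text{ where the tsstrees are restricted to time-stamped } \lambda\text{terms;}
\end{aligned}
$$

$$
\begin{aligned}
[v/w]w &= v, \qquad [v/w]y = y \ \text{ for } y \neq w, \\
[v/w](\texttt{lambda}\ q\ t) &= w = q \to (\texttt{lambda}\ w\ t),\ (\texttt{lambda}\ q\ [v/w]t), \\
[v/w](t_1\ t_2) &= ([v/w]t_1\ \ [v/w]t_2).
\end{aligned}
$$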

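Figure 4: Hygienic macro expansion (3)

Semantic Functions:

$$
\begin{aligned}
\mathcal{E}_{hyg} &: \mathit{stree} \to \mathit{ST} \to \lambda\mathit{term}, \\
\mathcal{T} &: \mathit{tsstree} \to (\mathit{var} \to \mathit{tsvar}) \to \mathit{tsstree}, \\
\mathcal{E} &: \mathit{tsstree} \to \mathit{ST} \to \mathbf{N} \to \lambda\mathit{term}, \\
\mathcal{A} &: \mathit{tsstree} \to \mathit{tsstree}, \text{ with the domain restricted to time-stamped } \lambda\text{terms}, \\
\mathcal{U} &: \mathit{tsstree} \to \mathit{stree};
\end{aligned}
$$

$$
\mathcal{E}_{hyg}[\![s]\!] = \lambda\theta.\,\mathcal{U}[\![\mathcal{A}[\![\mathcal{E}[\![\mathcal{T}[\![s]\!]S_0]\!]\theta j_0]\!]]\!] \quad \text{where } j_0 = 1;
$$

$$
\begin{aligned}
\mathcal{T}[\![y]\!] &= \lambda\tau.\,y, \\
\mathcal{T}[\![v]\!] &= \lambda\tau.\,\tau v, \\
\mathcal{T}[\![(z_1 \ldots z_n)]\!] &= \lambda\tau.\,(\mathcal{T}[\![z_1]\!]\tau \ldots \mathcal{T}[\![z_n]\!]\tau);
\end{aligned}
$$

$$
\begin{aligned}
\mathcal{E}[\![w]\!] &= \lambda\theta j.\,w, \\
\mathcal{E}[\![f]\!] &= \lambda\theta j.\,\mathcal{E}[\![\mathcal{T}[\![(\theta f)]\!](S\ j)]\!]\theta(j + 1), \\
\mathcal{E}[\![(\texttt{lambda}\ w\ t)]\!] &= \lambda\theta j.\,(\texttt{lambda}\ w\ (\mathcal{E}[\![t]\!]\theta j)), \\
\mathcal{E}[\![(t_1\ t_2)]\!] &= \lambda\theta j.\,(\mathcal{E}[\![t_1]\!]\theta j\ \ \mathcal{E}[\![t_2]\!]\theta j);
\end{aligned}
$$

$$
\begin{aligned}
\mathcal{A}[\![v]\!] &= v, \\
\mathcal{A}[\![y]\!] &= y, \\
\mathcal{A}[\![(\texttt{lambda}\ w\ t)]\!] &= (\texttt{lambda}\ v\ \mathcal{A}[\![[v/w]t]\!]) \quad \text{where } v \text{ is a fresh variable}, \\
\mathcal{A}[\![(t_1\ t_2)]\!] &= (\mathcal{A}[\![t_1]\!]\ \ \mathcal{A}[\![t_2]\!]);
\end{aligned}
$$

$$
\begin{aligned}
\mathcal{U}[\![x]\!] &= x, \\
\mathcal{U}[\![v{:}i]\!] &= v, \\
\mathcal{U}[\![(z_1 \ldots z_n)]\!] &= (\mathcal{U}[\![z_1]\!] \ldots \mathcal{U}[\![z_n]\!]).
\end{aligned}
$$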
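The first two phases of Figure 4 (stamping and clocked expansion) can be compared with naive expansion in a short Python sketch. It is a sketch under assumptions rather than the paper's code: syntax trees are Python lists, a stamped variable v:i is the tuple (name, stamp), `if` is treated as an extra core token, False stands for nil, and the two-entry syntax table `theta` is a toy chosen for illustration.

```python
CORE = {"lambda", "if"}        # assumption: `if` is added to the core tokens
MACROS = {"or", "let"}

def theta(e):
    """A toy syntax table: one transcription step for `let` and `or`."""
    if e[0] == "let":                      # (let i val body) => ((lambda i body) val)
        _, i, val, body = e
        return [["lambda", i, body], val]
    if e[0] == "or":                       # (or a b) => (let v a (if v v b))
        _, a, b = e
        return ["let", "v", a, ["if", "v", "v", b]]
    raise ValueError(f"syntax table: no match for {e!r}")

def is_macro_use(t):
    return isinstance(t, list) and isinstance(t[0], str) and t[0] in MACROS

def naive(t):
    """Naive expansion: replace syntactic extensions, never rename anything."""
    if not isinstance(t, list):
        return t
    if is_macro_use(t):
        return naive(theta(t))
    return [naive(x) for x in t]

def T(t, j):
    """Stamp every still-unstamped identifier leaf with the clock value j."""
    if isinstance(t, list):
        return [T(x, j) for x in t]
    if isinstance(t, str) and t not in CORE and t not in MACROS:
        return (t, j)
    return t                               # constants, tokens, stamped variables

def E(t, j):
    """Expand; stamp each transcription with j, then advance the clock."""
    if not isinstance(t, list):
        return t
    if is_macro_use(t):
        return E(T(theta(t), j), j + 1)
    return [E(x, j) for x in t]            # lambda, if, application: same clock

source = ["or", False, "v"]                # (or nil v)
naive_result = naive(source)
stamped_result = E(T(source, 0), 1)
print(naive_result)    # [['lambda', 'v', ['if', 'v', 'v', 'v']], False]
print(stamped_result)  # [['lambda', ('v', 1), ['if', ('v', 1), ('v', 1), ('v', 0)]], False]
```

In the naive result the user's v is indistinguishable from the generated binder. In the stamped result the binder is v:1 while the user's identifier is v:0, so the later renaming and unstamping phases can keep them apart.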

In other words, if we disregard the time-stamps, $\theta$ and the induced $\theta'$ generate the same results.

Another relationship that we have to consider is the one between λterms as generated by the naive and hygienic expander. It is clear that the hygienic expansion should work so that the resulting terms are the same except for the bindings. These must respect the HC/ME. We call this relation structural equivalence and define it in the following way: two λterms are structurally equivalent if they are equal after replacing all bound variables by the symbol X. Given the notions of induced syntax tables and structurally equivalent terms, we can formalize the difference between hygienic and naive macro expansion with:

Theorem. Let $\theta$ be a syntax table and let $\theta'$ be the induced syntax table. Then, for all strees $P$, if $\mathcal{E}_{naive}[\![P]\!]\theta$ expands into a λterm, then $\mathcal{E}_{hyg}[\![P]\!]\theta'$ expands into a structurally equivalent term which satisfies the hygiene condition for macro expansion.

Proof. The proof is structured according to the four phases of the function $\mathcal{E}_{hyg}$.

Step 1. The result of $\mathcal{T}[\![P]\!]S_0$ is structurally equivalent to $P$; all identifiers have the 0 time-stamp. This claim can be verified by an induction on the structure of $P$.

Step 2. Call the output of Step 1 $P_0$. Then we can prove two statements about the relationship of $\mathcal{E}_{naive}[\![P]\!]\theta$ to $\mathcal{E}[\![P_0]\!]\theta'$.

1) The two results are equal modulo the time-stamps, i.e., $\mathcal{E}_{naive}[\![P]\!]\theta = \mathcal{U}[\![\mathcal{E}[\![P_0]\!]\theta']\!]$. This proposition depends on the fact that $\theta'$ is induced by $\theta$. It implies that the two results are structurally equivalent.

2) Moreover, all variables of a transcription step receive the same time-stamp, which is unique with respect to the path from the root of the term to the occurrence of the respective syntactic extension. This follows from the fact that all transcriptions previous to the current one were time-stamped with a clock value of less than $j$, the current clock value. This statement is true for all syntactic extensions occurring in $P_0$. It is re-established by the time-stamping that immediately follows a transcription step. The variables in $(\theta' f)$ are either pure identifiers or tokens that already occurred in $f$. The pure ones are stamped with $j$ and are thus distinguishable from all the previously generated identifiers. Afterwards the clock is advanced and all following expansions receive time-stamps at a higher level. As for applications, we know that syntactic extensions in the function and argument part cannot overlap. Hence, it is justified to continue the expansion process on both paths with the same clock value.

Step 3. We know from the previous step that the result of $\mathcal{E}$ is structurally equivalent to the result of $\mathcal{E}_{naive}$ and that all identifiers of the hygienic result have a unique time-stamp reflecting their origin. Hence, if we α-convert all λ-expressions such that each time-stamped parameter is replaced by a fresh variable, the result satisfies the HC/ME and is also structurally equivalent to the input of $\mathcal{A}$. This can easily be verified by showing that $[v/w]t$ is a substitution function and that $\mathcal{A}$ otherwise preserves the structure.

Step 4. The input to the last step is a term which satisfies the HC/ME and is structurally equivalent to the naively expanded program modulo time-stamps of free variables. It is a routine matter to prove by induction that the function $\mathcal{U}$ removes these time-stamps and leaves all other properties intact.

This concludes the proof. □

Implementation Note. From the above discussion and proof one can deduce an important fact about the implementation of the time-stamping scheme. Time-stamped variables have two essential properties. First, they are unique with respect to the rest of the program. Second, they must contain a component which indicates the original name. Hence, one can use gensym'd atoms with a property "original-name". When they turn out to be bound variables, they can simply stay in place. If they are free, they are replaced by the original name. The functions $\mathcal{A}$ and $\mathcal{U}$ have to be changed accordingly. End of Note

Now that we have a hygienic expansion algorithm, we can think about the implementation of exceptions to the HC/ME-rule. The exceptions which we have in mind should specify that certain "free identifiers" in a parameter to a transform function are captured by generated binding instances. The meaning of "free identifier," however, is not quite clear. To begin with, identifiers may occur in syntactic trees which are not expanded yet. Second, identifiers have time-stamps in the modified expander. We must ask whether we only want to consider identifiers with time-stamp 0 (they are the user-supplied ones) or identifiers with all kinds of time-stamps.

The response to the first point is clear. If some identifier is to be captured, then it must be the one which survives as a free identifier until the input is completely expanded, no matter whether we can predict it or not. On the other hand, the second point cannot be resolved so easily. If we allowed capturing at all time-stamp levels, it would mean that there could be unpredictable interactions among various syntactic extensions. Since transform functions are all declared at the same level, i.e., there is no scoping as among lexically scoped procedures, the interactions cannot be deduced from static expressions. This would render the situation worse than before. We have therefore decided that macros may only capture user-supplied identifiers. The decision should be reconsidered when a macro system is being designed which allows for the modularization of syntactic extensions, for example, by blocks in a lexically scoped language.

We modify the hygiene condition to reflect our decision:

Modified Hygiene Condition for Macro Expansion. Generated identifiers that become binding instances in the completely expanded program must only bind identifiers that are generated at the same transcription step or identifiers of the original user-input. (mHC/ME)

The realization of this modified rule is simple. We provide the macro writer the function $S_0$, which generates identifiers with a 0 time-stamp. If the transform function places these tokens in a binding position, they capture all the corresponding user-supplied identifiers. It is easy to see that $\mathcal{E}_{hyg}$ together with this variation satisfies the above theorem for the mHC/ME.

5. Adding More Lisp Constructs

Although the λ-calculus is a prototypical example of a programming language, it is by no means a real-world language. Compared to Lisp it is rather sparse. It lacks assignment statements, conditional expressions, and quoted structures. When we wish to add to these core forms, we extend the set coretok to include whatever symbols we choose to designate them. For example, coretok might become

    {(, ), lambda, set!, if, quote}.

Assignments and conditionals cause no problems at all because they are not binding constructs requiring special treatment by $\mathcal{E}$. Hence, they are treated like Lisp applications with a slightly more elaborate syntactic structure.

Quoted atoms or lists need special treatment. The expander can only recognize syntactic forms not occurring in the syntactic scope of any macro expression. Structures which seem to occur inside the syntactic scope of an extension may get rearranged during the expansion process, e.g., the or-expression in Section 3. What appears to be a quoted structure because of the presence of the symbol 'quote' may not be an actual quoted structure. Thus its components must be time-stamped. However, when $\mathcal{E}$, $\mathcal{A}$, and $[\ /\ ]$ encounter an expression of the type (quote p), they must inhibit the parsing process. The expression is a sentence of the target language and it is a constant expression. The respective additional lines in these functions are:

$$
\begin{aligned}
\mathcal{A}[\![(\texttt{quote}\ p)]\!] &= (\texttt{quote}\ p), \\
\mathcal{E}[\![(\texttt{quote}\ p)]\!] &= \lambda\theta j.\,(\texttt{quote}\ p), \\
[v/w](\texttt{quote}\ p) &= (\texttt{quote}\ p).
\end{aligned}
$$

The time-stamps in p are ultimately removed when the unstamp function $\mathcal{U}$ is applied to the entire program.

6. Conclusion

The gains of the hygienic macro expander are clear. Macro writers can concentrate on the functional aspects of transform functions and need not worry about scope issues. Prohibiting the inadvertent capture of lexical identifiers has been an additional detail that the careful macro writer has had to remember. Furthermore, users find hygienic expansion more trustworthy. Careless macro writers will no longer surprise a user with unexpected bindings. While the user needs to know a semantics for the macro expressions, he should not need to know that a particular macro accomplishes its goal by binding certain local, temporary identifiers.

For those bindings the macro writer wishes to make public, the algorithm requires a change in conventional macro writing style. We expect the writer to inform the macro system of his decision as well as to document it for the user. In the past, the writer has been able to rely on the expansion algorithm to effect his desired bindings.

We have found that macros requiring capturing identifiers are rarer than those that introduce local binding identifiers. Thus we have shifted the expander's default behavior from possibly capturing all identifiers to only capturing those explicitly designated. In summary, we have given the macro writer less to worry about. And we have assured the macro user that any identifiers he puts in a macro expression will have the bindings he expects.

A transform function must satisfy two new conditions:

(1) time-stamped identifiers must be mapped to identifier names with the function $\mathcal{U}$ in a situation where the name of an identifier is needed by a transform function;

(2) identifiers in the output of a transform function must be unstamped, generated by $S_0$, or time-stamp-equal to input identifiers.

The first refers to the situation in which a transform function takes some action that involves an actual identifier from the input. Such cases occur, for example, when different expansions are triggered by the presence of different identifiers in the user expression or when the transform function saves a piece of the input expression for some purpose other than actual expansion. The second means that we cannot allow spurious identifiers generated by the transform function which appear stamped but were not produced by $S_0$ or contained in the input expression. We must be able to recognize our own time-stamps. We have found that these restrictions on transform functions are usually satisfied or that it requires little effort to adapt existing transform functions.

In conclusion, we feel that hygienic expansion makes the writing and use of macros easier. It is safer than naive expansion since the accidental capturing of identifiers that appear in user code cannot occur.

Acknowledgements

Mitch Wand provided invaluable help in the presentation of these ideas. Guy Steele pointed out a slight generalization to our original solution. Eugene Kohlbecker is an IBM Graduate Fellow. This material is based on work supported by the National Science Foundation under grants DCR 85-01277 and MCS 83-03325.

References

1. Barendregt, H. P. Introduction to the lambda calculus. Nieuw Archief voor Wiskunde 4 (1984), 337-372.
2. Barendregt, H. P. The Lambda Calculus: Its Syntax and Semantics. Revised Edition. North-Holland, Amsterdam, 1984.
3. Clinger, W. D. (Ed.). The revised revised report on Scheme. Joint Technical Report, Indiana University and MIT Laboratory for Computer Science, 1985.
4. Kleene, Stephen Cole. Introduction to Metamathematics. Van Nostrand, New York, 1952.
5. McIlroy, M. Douglas. Macro instruction extensions of compiler languages. CACM 3, 4 (1960), 214-220.
6. Steele, Guy L., Jr. Common Lisp: the Language. Digital Press, 1984.
7. Steele, Guy L., Jr. and Gerald J. Sussman. The revised report on Scheme, a dialect of Lisp. Memo 452, MIT AI-Lab, 1978.


Appendix. An Implementation in Scheme

(define Ehyg
  (lambda (s)
    (lambda (theta)
      (U (A (((E ((T s) S-naught)) theta) 1))))))

(define T
  (lambda (t)
    (lambda (tau)
      (cond
        [(atomic-non-var? t) t]
        [(var? t) (tau t)]
        [else (map (lambda (t) ((T t) tau)) t)]))))

(define E
  (lambda (t)
    (lambda (theta)
      (lambda (j)
        (cond
          [(const? t) t]
          [(stamped? t) t]
          [(quote? t) t]
          [(macro? t)
           (((E ((T (theta t)) (S j))) theta)
            (add1 j))]
          [(lambda? t)
           `(LAMBDA ,(var t)
              ,(((E (body t)) theta) j))]
          [(app? t)
           `(,(((E (fun t)) theta) j)
             ,(((E (arg t)) theta) j))])))))

(define A
  (lambda (t)
    (cond
      [(var? t) t]
      [(atomic-non-var? t) t]
      [(quote? t) t]
      [(lambda? t)
       (let ([v (gensym (U (var t)) ":" "new")])
         `(LAMBDA ,v
            ,(A ((*/* v (var t)) (body t)))))]
      [(app? t)
       `(,(A (fun t))
         ,(A (arg t)))])))

(define U
  (lambda (t)
    (cond
      [(atomic-not-stamped? t) t]
      [(stamped? t) (get t 'original-name)]
      [else (map U t)])))

(define S
  (lambda (n)
    (let ([seen '()])
      (lambda (v)
        (let ([info (assq v seen)])
          (if info
              (cdr info)
              (let ([new (gensym v ":" n)])
                (put new 'original-name v)
                (set! seen (cons (cons v new) seen))
                new)))))))

(define S-naught (S 0))

(define */*
  (lambda (v w)
    (lambda (t)
      (cond
        [(stamped? t) (if (eq? t w) v t)]
        [(atomic-not-stamped? t) t]
        [(quote? t) t]
        [(lambda? t)
         (if (eq? w (var t))
             `(LAMBDA ,w ,(body t))
             `(LAMBDA ,(var t)
                ,((*/* v w) (body t))))]
        [(app? t)
         `(,((*/* v w) (fun t))
           ,((*/* v w) (arg t)))]))))

(define stamped?
  (lambda (w)
    (and (symbol? w)
         (get w 'original-name))))

(define mactok?
  (lambda (m)
    (and (symbol? m)
         (get m 'mactok))))

(define coretok?
  (lambda (c)
    (and (symbol? c)
         (get c 'coretok))))

(define quote?
  (lambda (t)
    (and (pair? t)
         (eq? (car t) 'QUOTE)
         (pair? (cdr t))
         (null? (cddr t)))))

(define lambda?
  (lambda (t)
    (and (pair? t)
         (eq? 'LAMBDA (car t))
         (pair? (cdr t))
         (var? (cadr t))
         (pair? (cddr t))
         (null? (cdddr t)))))

(define app?
  (lambda (t)
    (and (pair? t)
         (pair? (cdr t))
         (null? (cddr t)))))

(define atomic-non-var?
  (lambda (y)
    (or (const? y)
        (stamped? y)
        (mactok? y)
        (coretok? y))))

(define atomic-not-stamped?
  (lambda (x)
    (or (const? x)
        (and (var? x)
             (not (stamped? x)))
        (mactok? x)
        (coretok? x))))

(define var? symbol?)
(define const? number?)
(define var cadr)
(define body caddr)
(define fun car)
(define arg cadr)

(put 'LAMBDA 'coretok 'true)
(put 'QUOTE 'coretok 'true)
(put 'LET 'mactok 'true)
(put 'IF 'mactok 'true)
(put 'OR 'mactok 'true)
(put 'NAIVE-OR 'mactok 'true)
(put 'FAKE 'mactok 'true)
(put 'CASE 'mactok 'true)

(define macro?
  (lambda (m)
    (record-case m
      [LET (var val body) true]
      [IF (a b c) true]
      [OR (a b) true]
      [NAIVE-OR (a b) true]
      [FAKE (x) true]
      [CASE (a b) true]
      [else false])))

(define ST
  (lambda (m)
    (record-case m
      [LET (i e b) `((LAMBDA ,i ,b) ,e)]
      [IF (a b c) `(((ef ,a) ,b) ,c)]
      [OR (a b) `(LET v ,a (IF v v ,b))]
      [NAIVE-OR (a b)
       (let ([v (S-naught 'v)])
         `(LET ,v ,a (IF ,v ,v ,b)))]
      [FAKE (x) `(QUOTE ,x)]
      [CASE (exp pair)
       `(LET v ,exp
          (IF ((eq? v) (QUOTE ,(car pair)))
              ,(cadr pair)
              false))]
      [else (error "syntax table: no match" m)])))

-- demonstration

((Ehyg '(LET x (OR a v) (NAIVE-OR x v))) ST)

1 (LET x:0 (OR a:0 v:0) (NAIVE-OR x:0 v:0))
2  (NAIVE-OR x:0 v:0)
3   (LET v:0 x:0 (IF v:0 v:0 v:0))
4    (IF v:0 v:0 v:0)
4    (((ef:4 v:0) v:0) v:0)
3   ((LAMBDA v:0 (((ef:4 v:0) v:0) v:0)) x:0)
2  ((LAMBDA v:0 (((ef:4 v:0) v:0) v:0)) x:0)
2  (OR a:0 v:0)
3   (LET v:2 a:0 (IF v:2 v:2 v:0))
4    (IF v:2 v:2 v:0)
4    (((ef:4 v:2) v:2) v:0)
3   ((LAMBDA v:2 (((ef:4 v:2) v:2) v:0)) a:0)
2  ((LAMBDA v:2 (((ef:4 v:2) v:2) v:0)) a:0)
1 ((LAMBDA x:0
     ((LAMBDA v:0 (((ef:4 v:0) v:0) v:0)) x:0))
   ((LAMBDA v:2 (((ef:4 v:2) v:2) v:0)) a:0))

((LAMBDA x:new
   ((LAMBDA v:new (((ef v:new) v:new) v:new)) x:new))
 ((LAMBDA v:new (((ef v:new) v:new) v)) a))

((Ehyg '(LAMBDA a (CASE (FAKE a) (QUOTE a)))) ST)

1 (CASE (FAKE a:0) (QUOTE a:0))
2  (LET v:1 (FAKE a:0)
     (IF ((eq?:1 v:1) (QUOTE QUOTE)) a:0 false:1))
3   (IF ((eq?:1 v:1) (QUOTE QUOTE)) a:0 false:1)
3   (((ef:3 ((eq?:1 v:1) (QUOTE QUOTE))) a:0) false:1)
3   (FAKE a:0)
3   (QUOTE a:0)
2  ((LAMBDA v:1 (((ef:3 ((eq?:1 v:1) (QUOTE QUOTE))) a:0)
                 false:1))
    (QUOTE a:0))
1 ((LAMBDA v:1 (((ef:3 ((eq?:1 v:1) (QUOTE QUOTE))) a:0)
                false:1))
   (QUOTE a:0))

(LAMBDA a:new
  ((LAMBDA v:new (((ef ((eq? v:new) (QUOTE QUOTE))) a:new)
                  false))
   (QUOTE a)))
