Hygienic Macro Expansion
Total Page:16
File Type:pdf, Size:1020Kb
Hygienic Macro Expansion Eugene Kohibecker, Daniel P. Friedman, Matthias Felleisen, Bruce Dubs Computer Science Department Lindley Hall 101 Indiana University Bloomington, Indiana 47405 USA Abstract. Macro expansion in current Lisp systems its textual parameters, a macro function places them into is naive with respect to block structure. Every macro the appropriately labeled holes of some expansion pattern. function can cause the capture of free user identifiers and Free identifiers in user code may unintentionally be cap- thus corrupt intended bindings. We propose a change to tured by macro-generated bindings. For example, a macro the expansion algorithm so that macros will only violate function for or-exp~ions in Lisp may be understood as the binding discipline when it is explicitly intended. a transformation from patterns of the type (or (ezph (szp)2) 1. Problems with Macro Expansions to an expansion pattern like Lisp macro functions are powerful tools for the extension (let ~ [ ](~7), (if e v [ ](,~,),)).l of language syntax. They allow programmers to add new In other words, the or-macro fills the hole [ ](-i,), syntactic constructs to a programming language. A pro- with (cxp)i. An instance like (or nil v) is transcribed stammer specifies a macro function which translates ac- to (let v nil (if v v v)). This example reveals that the tual instances of a syntactic extension to core language ex- capturing of free user identifiers is dangerous. The ex- pressions. This process can also be pyramided, i.e., macro panded expression will always produce the value nil, in- functions may translate into an already extended language dependently of the value of the user identifier v, quite [5]. The defined set of macro functions is coordinated by a contrary to the expectations of a programmer. preprocessor, usually called a macro expander. The macro The real danger of these erroneous macros is that they expander parses every user input. If the expander finds are treacherous. They work in all cases but one: when the an instance of a syntactic e~tension, it applies the appro- user--or some other macro writer--inadvertently picks priate macro function. It repeats this process until an the wrong identifier name. expression of the core language is obtained. Various techniques have been proposed to circumvent In most current Lisp systems the expander's task is this capturing problem, but they rely on the individual confined to the process of finding syntactic extensions and macro writer. If even one of the many macro writers is replacing them by their expansions. This implies, in par- negligent, the maCro system becomes unsafe. We claim ticular, that each macro function is responsible for the that the task of safely renaming macro-generated identi- integrity of the program. For Lisp systems (and other tiers is mechanical, it is essentially an o-conversion [2], p. languages with similar macro facilities) this means specif- 26, which is knowledgeable about the origin of identifiers. ically that variable bindings must not be corrupted. This, For these reasons we propose a change to the naive macro however, is not as simple a task as it sounds. expansion algorithm which automatically maintains hy- Macro functions in Lisp generally act like the con- gienic conditions during expansion time. text filling operation in the A-calculus [2], p.29. Given The rest of the paper is devoted to the presentation of the problem definition and the new algorithm. The sec- ond section describes our target programming language and its macro expander language. In the third section we discuss the naive macro expansion algorithm. Section 4 permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct contains the hygienic expansion algorithm and a correct- commercial advantage, the ACM copyright notice and the title of ness theorem. In Section $ we show how to extend the the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specfic 1 T~tsbtb,~*o(Oml~,(W,,I I~,h)) I I~,). permission. © 1986 ACM 0-89791-200-4/86/0800.0151 75¢ 151 solution to cover important constructs of Lisp. The last The set mstree is the sublanguage of syntactic cztensions. section highlights the merits of the new algorithm and its We assume that instances of macro expressions are implications for macro writers. recognized by the presence of macro tokens, i.e. elements 2. Language Considerations of a distinguished set maetok. Macro tokens can either be A macro expander maps expressions from an extended syntactic extensions by themselves or are the first compo- programming language to an expression in a core lan- nent of an arbitrarily long syntax tree: guage. Hence, our first consideration must be the source mstree ::= mactok [ (mactok street.., strsem) and target language of the expander. for all n > 0. The most important aspect of the target language Since this syntax is ambiguous, we add the provision that with respect to the capturing problem is its lexical scop- an expression of the form (m a) with m ~. maetok and ing mechanism. We have chosen to use the A-calculus s E stree is a ~ntactic extension; it is not an application. as it is the prototype of block structured programming See Figure 1 for a summary of the definitions. languages. It is syntactically simple, yet contains all the required elements to make the case interesting, and has 3. The Naive Approach to Macro Expansion the right level of complexity. Furthermore, it is a fairly Before we can describe the expansion algorithm which is. trivial task to generalize an algorithm for the A-calculus currently employed in Lisp systems, we need to define to Common Lisp [6], Scheme [3], or Algol-like languages. some terminology. Recall that a possible form for a syn- The variant of the A-calculus we use as our target tactic extension is: programming language is defined by the grammar:. Atcrm ::-- vat ( mactok street.., stree,~ ). [ const The trees street through streem are called the syntactic [ (lambda ear Atcrm) scope of the extension. [ (Aterm Atcrm). We say a syntactic extension or any Atcrm-expression secure in a syntax tree if it is a subtree that is not nested The characters "(~, ~)', and ~lambda" are terminal sym- within the syntactic scope of another syntactic exten- bols and are collectively referred to as the set of core to- sion. For example, the syntactic extension (or z y) occurs kens: coretok = {(,), lambda}. The set const includes within the expression all constants commonly found in Lisp such as strings, vec- tors, numbem, closures, etc. The set ear is composed of (lambda z (or z y)), Lisp symbols that are used as identifier names; it is dis- joint from the set of core tokens. but it does not occur within Variable and constant expressions stand for values (case tag (name(or z y))) as usual. Abstractions, i.e., lambda-expressions, rep- nor within resent procedures of one variable. The lambda-bound (caee tag (or = y)) identifier--the parameter--can only be referred to within the abstraction body, i.e. identifiers are lexically scoped. because it is in the syntactic scope of the case-extension. We call the occurrence of a variable in the parameter part The notion of occurrence reflects the fact that every of a lambda*expreesion its binding instance. Applications sentence in the language of syntax trees has two interpre- correspond to function invocations. tations. It may be considered as an element of the Aterm- The source language needs to be an extension of the language or as a proper syntactic extension. Since the target language. It must allow for one kind of an expres- expansion of a syntactic extension involves a rearrange- sion which is only specified in a rather general way. The ment of (parts of) syntax trees in its syntactic scope, we concrete extension of Atcrm as defined by the indivdual can only be sure about the interpretation of an expression macros will be specializations of this language. We refer when it is not embedded in a syntactic extension. In the to the source language as the language of syntax trees and above example, the list (or z y) is in the first two cases define it inductively by: a syntactic extension; in the third one, however, it only stre¢ ::= ear stands for a list with or in the first position and z and y in the rest of the list. The same is true if we replace the [ const or by lambda. ] mstrec A syntactic transform/unction (E STF) is a macro I (lambda ear strce) function which is defined by the macro writer and which [ (stre¢ stree). expands a particular class of syntactic extensions," e.g., the or-macro of Section 1. The result of applying a trans- ter each transcription step. Bat tkis impression is too form function to an occurrence of a syntactic extension simplistic. Generated identifiers may act as free variables is called a transcription. A transcription step is the one- which are to be captured by the user-supplied program step expansion of a syntactic extension. Symbols which context. They must not be renamed. On the other hand, are introduced during a transcription step are referred to since macro expansion is possibly pyramided it may also as generated symbols. not be quite obvious after a transcription step which iden- tifier is to be free and which one is to be bound. A final The set of macros used during an expansion is a syn- difficulty is that sometimes capturing is desired.