
Down with Emacs Lisp: Dynamic Scope Analysis

Matthias Neubauer
Institut für Informatik
Universität Freiburg
[email protected]

Michael Sperber
Wilhelm-Schickard-Institut für Informatik
Universität Tübingen
[email protected]

ABSTRACT comparatively little focus on significant improvements of the It is possible to translate code written in Emacs Lisp or an- Emacs Lisp interpreter. other Lisp dialect which uses dynamic scoping to a more On the other hand, recent years have seen the advent modern with lexical scoping while of a large number of extension language implementations largely preserving structure and readability of the code. The of full programming languages suitable for the inclusion in biggest obstacle to such an idiomatic translation from Emacs application software. Specifically, several current Scheme Lisp is the translation of dynamic binding into suitable in- implementations are technologically much better suited as stances of lexical binding: Many binding constructs in real an extension language for Emacs than Emacs Lisp itself. programs in fact exhibit identical behavior under both dy- In fact, the official long-range plan for GNU Emacs is to namic and lexical binding. An idiomatic translation needs to replace the Emacs Lisp substrate with Guile, also a Scheme detect as many of these binding constructs as possible and implementation [28]. The work presented here is part of a convert them into lexical binding constructs in the target different, independent effort to do the same for XEmacs, a language to achieve readability and efficiency of the target variant of GNU Emacs which also uses Emacs Lisp as its code. extension language. The basic prerequisite for such an idiomatic translation Replacing such a central part of an application like XEmacs is thus a dynamic scope analysis which associates variable presents difficult pragmatic problems: It is not feasible to re- occurrences with binding constructs. We present such an implement the entire Emacs Lisp code base by hand. Thus, analysis. 
It is an application of the Nielson/Nielson frame- a successful migration requires at least the following ingre- work for flow analysis to a semantics for dynamic binding dients: akin to Moreau’. Its implementation handles a substantial Emacs Lisp code must continue to run unchanged for portion of Emacs Lisp, has been applied to realistic Emacs • a transitory period. Lisp code, and is highly accurate and reasonably efficient in practice. An automatic tool translates Emacs Lisp code into the • language of the new substrate, and it must produce 1. MIGRATING EMACS LISP maintainable code. Emacs Lisp [16, 29] is a popular programming language Whereas the first of these ingredients is not particularly hard for a considerable number of desktop applications which run to implement (either by keeping the old Emacs Lisp imple- within the Emacs editor or one of its variants. The actively mentation around or by re-implementing an Emacs Lisp en- maintained code base measures at around 1,000,000 loc1. gine in the new substrate), the second is more difficult. Even As the Emacs Lisp code base is growing, the language is though a direct one-to-one translation of Emacs Lisp into showing its age: It lacks important concepts from modern a modern latently-typed functional language is straightfor- practice as well as provisions for ward by using dynamic assignment or dynamic-environment large-scale modularity. Its implementations are slow com- passing to implement dynamic scoping, it does not result in pared to mainstream implementations of other Lisp dialects. maintainable output code: Users of modern functional lan- Moreover, the development of both Emacs dialects places guages use dynamic binding only in very limited contexts

1 such as exception handling or parameterization. As it turns The XEmacs package collection which includes many pop- out, the situation is not much different for Emacs Lisp users: ular add-ons and applications currently contains more than 700,000 loc. For many lets and other binding constructs in real Emacs Lisp code, dynamic scope and lexical scope are identical! Consequently, a good “idiomatic” translation of Emacs Lisp into, say, Scheme, should convert these binding constructs Permission to make digital or hard copies of all or part of this work for into the corresponding lexical binding constructs of the tar- personal or classroom use is granted without fee provided that copies are get substrate. not made or distributed for profit or commercial advantage and that copies The only problem is to recognize these binding constructs, bear this notice and the full citation on the first page. To copy otherwise, to or rather, distinguish those where the programmer “meant” republish, to post on servers or to redistribute to lists, requires prior specific dynamic scope from those where she “meant” lexical scope. permission and/or a fee. ICFP’01, September 3-5, 2001, Florence, Italy. Since with dynamic scope, bindings travel through the pro- Copyright 2001 ACM 1-58113-415-0/01/0009 ...$5.00. 
gram execution much as values do, this requires a proper (let* ((filename (expand-file-name filename)) (let ((file-name-handler-alist nil) (file (file-name-nondirectory filename)) (format-alist nil) (dir (file-name-directory filename)) (after-insert-file-functions nil) (comp (file-name-all-completions file dir)) (coding-system-for-read ’binary) newest) (coding-system-for-write ’binary) (while comp (find-buffer-file-type-function (setq file (concat dir (car comp)) (if (fboundp ’find-buffer-file-type) comp (cdr comp)) (symbol-function ’find-buffer-file-type) (if (and (backup-file-name-p file) nil))) (or (null newest) (unwind-protect (file-newer-than-file-p file newest))) (progn (setq newest file))) (fset ’find-buffer-file-type newest)) (lambda (filename) t)) (insert-file-contents Figure 1: Typical usage of let in Emacs Lisp. filename visit start end replace)) (if find-buffer-file-type-function (fset ’find-buffer-file-type find-buffer-file-type-function) flow analysis. This paper presents such an analysis called (fmakunbound ’find-buffer-file-type)))) dynamic scope analysis. Specifically, our contributions are the following: Figure 2: Parameterizations via dynamic let in We have formulated a semantics for a subset of Emacs Emacs Lisp. • Lisp, called Mini Emacs Lisp, similar to the sequential evaluation function for Λd by Moreau [20]. We have applied the flow analysis framework of Niel- contains five variable bindings, all introducing temporary • son and Nielson [22] to the semantics, resulting in an names for intermediate values. The bindings of the vari- acceptability relation for flow analyses of Mini Emacs ables filename, file, dir, comp, and newest are all visible Lisp programs. in the other functions reachable from the body of the let, yet none of them contain occurrences of these names. 
The We have used the acceptability relation to formulate only variable occurrences which access the bindings are in • and implement a flow analysis for Emacs Lisp which the body of the let* itself, and all are within the lexical tracks the flow of bindings in addition to the flow of scope of the bindings. Hence, translating the let* into a values. lexically-scoped counterpart in the target language would preserve the behavior of this function. We have applied the analysis to real Emacs Lisp code. Figure 2 shows an example for idiomatic use of dynamic • More specifically, the analysis is able to handle medium- binding (also taken from files.el): It is part of the im- sized real-world examples with high accuracy and rea- plementation of insert-file-contents-literally which sonable efficiency. calls insert-file-contents in the body of the let. The The work presented here is a part of the el2scm project that definition of insert-file-contents indeed contains occur- works on the migration from Emacs Lisp to Scheme. How- rences of the variables bound in the let with the exception ever, the other aspects of the translation (such as front-end of find-buffer-file-type-function. Therefore, it is not issues, correct handling of symbols, the code-data duality, permissible to translate the let with a lexically-scoped bind- treatment of primitives and so on) are outside the (lexical) ing construct. scope of this paper. Indeed, the analysis could be used for a For the vast majority of binding constructs in real Emacs number of other purposes, among them the development of Lisp code, dynamic scope and lexical scope coincide. Thus, an efficient compiler for Emacs Lisp, or the translation to a the ultimate goal of the analysis is to detect as many of these different substrate such as . bindings constructs as possible. In general however, value flow and the flow of bindings in- Overview. The next section presents some code examples teract during the evaluation of Emacs Lisp programs. 
Hence, which show the need for a dynamic scope analysis. Sec- it is not possible to apply standard flow analyses based on tion 3 defines the syntax of Mini Emacs Lisp. Section 4 lexical-binding semantics to solve the problem; a new anal- develops an operational semantics with evaluation contexts. ysis is necessary. Based on the semantics, Section 5 presents a specification of a correct flow analysis. The next section sketches a cor- 3. SYNTAX OF MINI EMACS LISP rectness proof. Our implementation approach is described For the sake of simplicity, we concentrate on a subset in Section 7. Section 8 describes some experimental results of Emacs Lisp called Mini Emacs Lisp in the paper. We gained with our implementation prototype. We end with a omit multi-parameter (and variable-parameter) functions, discussion of related work and a conclusion. catch/throw, dual name spaces for functions and “ordinary” values, the resulting gratuitous split between funcall and 2. EXAMPLES regular application as well as the data/code duality which Consider the Emacs Lisp code shown in Figure 1, taken appears in various contexts in Emacs Lisp. Adding these literally from files.el in the current XEmacs core. It features to the analysis is straightforward and does not re- quire significant new insights, which is why we omit it here. Only the value bound to a variable by a bind term does not Our implementation of the analysis does treat all of these carry a because bind expressions only show up during features. evaluation, but not in the analysis which only looks at the Here is the syntax for Mini Emacs Lisp: . l Lab ::= ... s, x ∈ SymVar ::= fritz franz ... 4.2 Environments ∈ Lit ::= 0 1 |2 ... | Environments ρ are finite mapping from symbols to values b ∈ Prim ::= cons| | car| ... and contain bindings: ∈ | | t Term ::= c ρ Env = SymVar fin Val. ∈ → ∈ (quote s) | (lambda (x) ) The notation for the empty environment is []. 
The modifi- | x cation of an existing environment through the new mapping | (setq x e) of a symbol x to a value v is written as ρ[x v]. | 7→ (e0 e1) | (let x e1 e2) 4.3 Evaluation Contexts | (if e0 e1 e2) Here are the evaluation contexts for Mini Emacs Lisp: | (b e1 . . . en) | EvalContext ::= [ ] l E ∈ − l e Exp ::= t ( e2) ∈ | E l (if e1 e2) | E l d Def ::= (defvar x e) (let x e2) ∈ | E l (defun x0 (x1) e) (bind x v ) | | El (setq x ) p Prg ::= d∗ e | E l (p v∗ e∗) ∈ | E All expressions carry unique labels which the analysis uses for identifying locations in the program source. The set of x0 VarContext(x0) ::= [ ] V ∈ − l literals is trivially extensible. Note that Emacs Lisp uses the ( e2) | E l nil symbol for boolean false, and everything else for true. (if x0 e1 e2) | V l An Emacs Lisp program consists of a sequence of defini- (let x x0 e2) | V l tions followed by a single expression—the entry point of the (bind x1 v x ) | V 0 program. if x0 = x1 l6 (setq x x0 ) | V l 4. A SEMANTICS FOR MINI EMACS LISP (b v∗ x e∗) | V 0 We present a structural operational or small-step seman- tics [23] for Mini Emacs Lisp. We use evaluation contexts The rules for EvalContext describes all contexts in which and syntactic rewriting as developed by Felleisen and Fried- a reduction step in Mini Emacs Lisp can occur. Variable ac- man [6]. cess needs the most recent dynamic binding of the variable. The variable contexts in VarContext(x0) help accomplish 4.1 Values and Intermediate Terms this; they describe all contexts that do not contain any bind- We use separate syntactic categories for intermediate ex- ings associated with the symbol x0. pressions and values. Here is the syntax for literals and abstractions: 4.4 Reductions f Fun ::= (func (x) e) An evaluation state consists of a partially evaluated ex- ∈ pression and a global environment. 
Thus, a configuration v Val ::= (prim c) γ of Conf is a tuple consisting of an environment and a ∈ (sym s) current expression: | f | l l (pair v 1 v 2 ) γ Conf = Env Exp. | 1 2 ∈ × The primitive steps of the evaluation process are reduction it ITerm ::= (bind x v e) ∈ rules. Some expressions immediately reduce to a value: l l e Exp ::= v it l l ∈ | [c] ρ, [c ] ρ, [(prim c) ] The elements of Val, called values, are results from success- E → E ful computations. They represent primitive values, symbols [quote] ρ, [(quote s)l] ρ, [(sym s)l] and functions, and correspond to the the expressions of Exp E → E which produce them. [lambda] ρ, [(lambda (x) e)l] ρ, [(func (x) e)l] The semantics uses intermediate bind terms to handle dy- E → E namic binding: They result from reducing let expressions Note that in Emacs Lisp, abstractions do not evaluate to with the value to be bound to the variable already evaluated. closures—this is dynamic scope, after all. Expressions attach labels to values and intermediate terms. Here are the semantic mechanics for dealing with variable access: putting all possible configurations before and after a reduc-

l l0 tion step during evaluation in relation. Its reflexive transi- [var] ρ, [(bind x v x [x ]) ] ? E V l →l0 tive closure is written . ρ, [(bind x v x [v ]) ] → E V l l 4.5 Expression Contexts [varglob] ρ, x [x ] ρ, x [v ] if x dom(ρ), ρ(x) = v V → V ∈ So far, only the meaning of expression is defined by the A variable may have either a local or a global binding. The relation. For programs, we define another kind of context,→ let and lambda constructs introduce local bindings. For a the expression contexts of ExpContext: variable occurrence, the closest bind context for that vari- able holds its value. The [var] rule expresses this behavior; ExpContext ::= (defvar x [ ]) p the context x guarantees that there is no other binding X ∈ [ ] − closer to theV variable. Lacking a local binding, a global one | − must apply; the [varglob] rule takes over. The machinery for mutating bindings by setq is analogous to the one for referencing variables: 4.6 Reductions for Programs

l0 l l1 Equipped with the notion of program configurations [setq] ρ, [(bind x v x [(setq x v0 ) ]) ] E V l l1 → ρ, [(bind x v0 x [v0]) ] δ PConf = Env Prg, E V ∈ × l0 l l [setqglobal] ρ, x [(setq x v ) ] ρ[x v0], x [v ] V 0 → 7→ V 0 as well as contexts for programs and the reduction relation for expressions, it is possibleX to state the rewriting rules In the case of a local binding, the [setq] rule changes the → d for programs: value in the corresponding bind context. Assignments to → global variables mutate the global environment. l0 [defvar] ρ, (defvar x v ) p d ρ[x v0], p Here are the reductions for function applications and local 0 → 7→ variable bindings: if x dom(ρ) 6∈ l0 l l [app] ρ, [((func (x1) e2) e1) ] ρ, [(let x1 e1 e2) ] [defun] ρ, (defun x (x ) e) p E → E 0 1 d ρ[x0 (func (x1) e)], p [let] ρ, [ x vl1 e l] ρ, [ x v e l] → 7→ (let 1 2) (bind 1 2) if x0 dom(ρ) E → E 6∈ l1 l l [bind] ρ, [(bind x v0 v ) ] ρ, [v ] ρ, e ρ0, e0 E 1 → E 1 [exp] → ρ, [e] d ρ0, [e0] The [app] rule reduces a function application to a binding of X → X the function parameter wrapped around the function body The [defvar] and [defun] rules satisfy top-level definitions. and the environment. The [defvar] rule inserts the new global binding in the vari- The [let] rule of EvalContext turns a into able environment ρ. The condition x dom(ρ) guarantees a corresponding bind expression. Evaluation continues with that there is only one global variable6∈ for every name. The the body e2 until it becomes a value. Then, the [bind] rule [defun] rule does the equivalent for procedures. The [exp] removes the obsolete context. rule allows the use of all the reductions for expressions at Note that the distinction between let expressions and the places defined using the contexts ExpContext. Again bind expressions is unnecessary when considering only the d∗ is the reflexive transitive closure of d. semantics, but the formulation of the flow analysis requires → → their separation. 
4.7 The Evaluation Function of Programs The [if1] and [if2] rules handle conditionals: The reduction relation d rewrites the program until it l0 l1 l l → [if1] ρ, [(if v t1 e2) ] ρ, [t1] if v = (sym nil) gets a final answer. This does not always happen: the pro- E → E 6 gram may loop in which case the reduction sequence is in- l0 l2 l l [if2] ρ, [(if v e1 t2 ) ] ρ, [t2] if v = (sym nil) finite, or evaluation may get stuck at a configuration with E → E no matching reduction rule. Thus, the reduction relation Here are reduction rules for selected primitives, namely those induces a partial evaluation function : dealing with pairs:

l1 l2 l l1 l2 l eval : Prg 99K Val [cons] ρ, [(cons v1 v2 ) ] ρ, [(pair v1 v2 ) ] E → E v if [], p d∗ ρ, v for some ρ eval(p) = ( → [car] ρ, [(car (pair vl1 vl2 )l0 )l] ρ, [vl ] undefined otherwise E 1 2 → E 1

l1 l2 l0 l l [cdr] ρ, [(cdr (pair v1 v2 ) ) ] ρ, [v2] E → E 5. SPECIFICATION OF THE ANALYSIS The [cons] rule produces a pair value from two argument This section specifies a flow analysis for Mini Emacs Lisp. values. Car selects the first component of pairs by rule [car], With the help of the definitions for the abstract domains the [cdr] rule handles cdr. The combination of the reduction rules defines the reduc- of the analysis we define an acceptability relation for correct flow analyses which employ these domains. The actual anal- tion relation ysis results directly from the definition of the acceptability Conf Conf relation. → ⊆ × 5.1 Abstract Domains The [c], [quote], and [lam] clauses register their abstract Here are the abstract domains of the analysis: counterpart in the abstract cache under the program point l and the current birthplace environment bpe. Note that bp BP = Lab [lam] does not require that the analysis is also valid for the ∈ ∪ {} bpe BPEnvd\ = SymVar BP body of each lambda term, because an acceptable analysis ∈ → d must only treat the reachable functions correctly. pˆ Cons\ = Lab Lab BPEnv\ ∈ × × 5.2.2 Expressions ˆv Val = (SymVar ω Fun Cons\) ∈ P ∪ { } ∪ ∪ Occurrences of variable references and mutations induce d further validity constraints. The [var] rule for variable ref- ρˆ Env = (SymVar BP) Val ∈ × → erences enforces that the abstract value for the variable x Cˆ Cached\ = (Lab BPEnv\d) Vald ∈ × → and its current birthplace bpe(x), held in the abstract envi- d ronment, must be a subset of the abstract value that linked Birthplaces—BP for short—denote syntactic locations of it to its label and birthplace environment in the abstract variable bindings. The stands for top-level bindings. The d cache: label of the body of a function or of a let expression serves l as the birthplace for the binding it creates. 
[var](Cˆ, ρˆ) =bpe x | Birthplace environments BPEnv\ are abstractions over iff ρˆ(x, bpe(x)) Cˆ(l, bpe) regular variable environments; they map variables to birth- ⊆ The [setq] clause enforces that the analysis for the right- places instead of regular values. hand side is also valid. Moreover, a valid analysis allows Cons\ is one part of the abstract value domain; it is the values that result from the subexpression t0 to be possible set of all possible abstract pairs and contains all triples of values for the variable x under the current bindings bpe and two labels and a birthplace environment. The two labels also for the whole expression: are the labels of these two argument subexpressions of the cons expression which created the pair. The birthplace en- ˆ l0 l [setq](C, ρˆ) =bpe (setq x t0 ) vironment registers the abstract bindings active at the time | l0 iff (Cˆ, ρˆ) =bpe t of creation of the pair. Registering the birthplace environ- | 0 ∧ Cˆ(l0, bpe) ρˆ(x, bpe(x)) ment is necessary because we differentiate program points ⊆ ∧ Cˆ(l0, bpe) Cˆ(l, bpe) depending on the birthplace environments they occur under. ⊆ Val is the set of all possible abstract values ˆv. An abstract The [app] clause specifies the constraints for procedure valued represents a set of run-time values. Not every run-time l0 calls. Its first and second condition, (Cˆ, ρˆ) =bpe t and value is relevant to the analysis: the single symbol ω repre- | 0 (Cˆ, ρˆ) = tl1 , guarantee that the analysis is also valid un- sents all primitive values except for symbols. The analysis bpe 1 der the| same birthplace environment for the operator t and tracks symbols needed (eventually) for variable names, func- 0 the operand t : tions, primitive values, and abstract pairs. 1 An abstract cache Cˆ of Cache\ is an abstract profile of ˆ l0 l1 l [app](C, ρˆ) =bpe (t0 t1 ) all values which occur during a program run. 
It tracks the | l0 iff (Cˆ, ρˆ) =bpe t abstract values of program subexpressions, differentiated by | 0 ∧ (Cˆ, ρˆ) = tl1 birthplace environment. bpe 1 | ∧lb ˆ ( (func (x1) tb ) C(l0, bpe). An abstract environment ρˆ of Env is a union of the en- ∀ l ∈ (Cˆ, ρˆ) = t b vironments that occur during the evaluation of a program. bpe0 b d ˆ | ∧ It associates a variable name and one of its birthplaces with C(l1, bpe) ρˆ(x1, lb) ˆ ⊆ ˆ ∧ an abstract value. C(lb, bpe0) C(l, bpe) ⊆ ∧ where bpe = bpe[x1 lb]) 5.2 Acceptability for Programs 0 7→ lb We define an acceptability relation for programs =: Every function (func (x1) tb ) which might occur in the | operator position t0 of a procedure call under bpe must have = Cache\ Env BPEnv\ Prg. a valid analysis for its body as well, under an expanded | ⊆ × × × d birthplace environment bpe which contains a binding for The = relation defines the validity of analyses (Cˆ, ρˆ) with 0 | the function parameter x1. Moreover, the analysis must regard to a program p and a current birthplace environment link the abstract values of the argument with those of the bpe. From now on, the notation is formal parameter x1 as well as the possible results of the

(Cˆ, ρˆ) =bpe p. body with the those of the whole expression. | The [let] clause works analogously to function application: 5.2.1 Value Expressions l1 l2 l [let](Cˆ, ρˆ) =bpe (let x t t ) | 1 2 iff (Cˆ, ρˆ) = tl1 ˆ l bpe 1 [c](C, ρˆ) =bpe c ˆ | l2∧ | (C, ρˆ) =bpe t2 iff ω Cˆ(l, bpe) | 0 ∧ Cˆ(l , bpe) ρˆ(x, l ) ˆ ∈ l 1 2 [quote](C, ρˆ) =bpe (quote s) ˆ ⊆ ˆ ∧ | C(l2, bpe0) C(l, bpe) iff s Cˆ(l, bpe) ⊆ where bpe = bpe[x l2] ∈ l 0 [lam](Cˆ, ρˆ) =bpe (lambda (x) e0) 7→ | iff (func (x) e0) Cˆ(l, bpe) In the [if] clause, each branch contributes to a valid analysis: ∈ that valid analyses agree with the semantics—that is, that they are semantically correct. However, the reduction rules [if](Cˆ, ρˆ) = (if tl0 tl1 tl2 )l bpe 0 1 2 generate intermediate terms not covered by the rules so far. ˆ| l0 iff (C, ρˆ) =bpe t0 Here they are: ˆ | l1 ∧ (C, ρˆ) =bpe t1 | ∧ ˆ l Cˆ(l1, bpe) Cˆ(l, bpe) [const](C, ρˆ) =bpe (prim c) ˆ ⊆ l2 ∧ | ˆ (C, ρˆ) =bpe t2 iff ω C(l, bpe) | ∧ ∈ l Cˆ(l , bpe) Cˆ(l, bpe) [sym](Cˆ, ρˆ) =bpe (sym s) 2 | ⊆ iff s Cˆ(l, bpe) 5.2.3 Primitives ∈ l [proc](Cˆ, ρˆ) =bpe (func (x) e0) | Each primitive has its own associated flow behavior. Pair iff (func (x) e0) Cˆ(l, bpe) construction and selection serve as examples. ∈ The [cons] rule for the pair constructor is straightforward: The [const], [sym], and [proc] clauses are identical to their A cons produces an abstract pair from abstract values with equivalents [c], [quote], and [lam] because their semantics the labels of its arguments, under the original birthplace are identical. environment: The [pair] clause is a simpler version of the [cons] clause. The only difference is—since a pair already carries two val- [cons](Cˆ, ρˆ) = tl1 tl2 l bpe (cons 1 2 ) ues in it—that it is unknown under which prior birthplace ˆ| l1 iff (C, ρˆ) =bpe t1 environment the evaluation took place. 
The only require- ˆ | l2 ∧ (C, ρˆ) =bpe t2 ment is that a suitable birthplace environment bpe exists: | ∧ 0 (l1, l2, bpe) Cˆ(l, bpe) ∈ ˆ l1 l2 l [pair](C, ρˆ) =bpe (pair v1 v2 ) The [car] and [cdr] clauses are also straightforward: They | ˆ l1 iff bpe0.(C, ρˆ) =bpe0 v1 induce validity constraints on the argument, and then sim- ∃ | l2 ∧ (Cˆ, ρˆ) =bpe v ply pick the first or second component respectively of the | 0 2 ∧ (l1, l2, bpe ) Cˆ(l, bpe) abstract pairs flowing into it: 0 ∈ ˆ l0 l [car](C, ρˆ) =bpe (car t0 ) 5.2.6 Intermediate Expressions ˆ| l0 iff (C, ρˆ) =bpe t0 The final clause in the definition of acceptability handles | ∧ ˆ ( (l1, l2, bpe0) C(l0, bpe). intermediate bind expressions. A bind expression binds a ∀ˆ ∈ˆ variable x to a value v during the evaluation of the body C(l1, bpe0) C(l, bpe)) 1 ˆ ⊆l0 l t2: [cdr](C, ρˆ) =bpe (cdr t0 ) | l ˆ 0 l2 l iff (C, ρˆ) =bpe t0 [bind](Cˆ, ρˆ) = (bind x v t ) | ∧ bpe 1 2 ˆ | l2 ( (l1, l2, bpe0) C(l0, bpe). iff (Cˆ, ρˆ) =bpe t ∀ˆ ∈ˆ | 0 2 ∧ C(l2, bpe0) C(l, bpe)) Cˆ(l2, bpe ) Cˆ(l, bpe) ⊆ 0 ⊆ ∧ v1 (ˆρ(x, l2), Cˆ) 5.2.4 Definitions A where bpe0 = bpe[x l2] The [defvar] and [defun] clauses extend the notion of 7→ valid scope analyses to entire programs. The [defvar] clause The [bind] rule requires a valid analysis for the body un- handles variable definitions: der a suitably extended birthplace environment bpe0. More- over, the value of the body becomes the value of the bind l0 [defvar](Cˆ, ρˆ) =bpe (defvar x t ) p 0 expression. The supplementary constraint v1 (ˆρ(x, l2), Cˆ) ˆ| l0 A iff (C, ρˆ) =bpe t0 reflects that the actual new binding also has to show up | ∧ Cˆ(l0, bpe) ρˆ(x, bpe(x)) in the abstract variable environment under the the relevant ⊆ ∧ (Cˆ, ρˆ) =bpe p birthplace l2; the relation is explained in the next section. | A A valid analysis must reflect the value initially bound to the 5.3 The Approximation Relation variable. 
It must also associate the variable x with its ab- A Intuitively, the = relation determines if a dynamic scope stract values under the current binding bpe(x). Moreover, a ˆ | valid analysis must take into account the rest of the program analysis (C, ρˆ) correctly reflects the evaluation process of a p, too. program a in an abstract sense. The formulation of = uses the approximation relation that regulates the approxima-| The [defun] clause registers the procedure in the abstract A environment. As in the [defvar] clause, the rest of the pro- tion of values Val with abstract equivalents. Here is its gram p must be valid as well. formal definition: Val Val Cache\ A ⊆ ˆ × × [defun](Cˆ, ρˆ) =bpe (defun x0 (x1) e) p v (ˆv, C) d | A iff (func (x1) e) ρˆ(x0, ) iff c s f v1 v2. ((v = c ω ˆv) ∈  ∧ ∀ ∀ ∀ ∀ ∀ ⇒ ∈ ∧ (Cˆ, ρˆ) =bpe p (v = s s ˆv) | (v = f ⇒ f ∈ ˆv)∧ ⇒ ∈l ∧l 5.2.5 Values (v = (pair v 1 v 2 ) 1 2 ⇒ So far, the definition of the relation = for all possible bpe. (l1, l2, bpe) ˆv | ∃ ∈ ∧ expressions and programs checks whether a certain analysis v1 (Cˆ(l1, bpe), Cˆ) A ∧ for a program is valid or not. Now, the next goal is to show v2 (Cˆ(l2, bpe), Cˆ))) A holds between a value v and its correct representation expressions: asA a set of abstract values and an abstract cache. This is = Cache\ Env BPEnv\ Conf straightforward except for the treatment of pairs: The rep- | ⊆ × × × d resentation of a pair consists of its components’ creation (Cˆ, ρˆ) =bpe ρ, e iff (Cˆ, ρˆ) =bpe ρ points and a birthplace environment. An abstract repre- | | ∧ (Cˆ, ρˆ) =bpe e sentation however must also map to abstract values for its | components. This is why a value cache Cˆ participates in the Furthermore, it is possible to define an acceptability relation definition of . 
for program configurations—combinations of programs and A environments: 5.4 The Well-Definedness of = | = Cache\ Env BPEnv\ PConf It is not immediately clear that the acceptability relation | ⊆ × × × ˆ d ˆ = from Subsection 5.2 is unambiguous. Structural induction (C, ρˆ) =bpe ρ, p iff (C, ρˆ) =bpe ρ | | | ∧ by itself is not sufficient because the [app] clause is not (Cˆ, ρˆ) =bpe p compositional. | On the other hand, the specifications of = can be consid- 6. SEMANTIC CORRECTNESS | ered as a functional The semantics developed in Section 4 employs evaluation : (Cache\ Env BPEnv\ Exp) contexts and rewriting rules. Hence, the specification of the Q P × × × → (Cache\ dEnv BPEnv\ Exp) semantics uses almost exclusively syntactical means with the P × × × exception of the notion of environments: a program tran- d with sitions through a sequence of configurations which include l l valid programs or expressions until it reaches a final value, (Cˆ, ρ,ˆ bpe, (let x t 1 t 2 )l) () iff 1 2 ∈ Q gets stuck or loops forever. R(Cˆ, ρ,ˆ bpe, tl1 ) 1 The definition of the acceptability relation in the previ- ˆ ∧ l2 R(C, ρ,ˆ bpe[x l2], t2 ) ous section was derived intuitively. A correctness proof is ˆ 7→ ∧ C(l1, bpe) ρˆ(x, l2) necessary which must show that every valid analysis stays ⊆ ∧ Cˆ(l2, bpe[x l2]) Cˆ(l, bpe) valid under the evaluation process. 7→ ⊆ (Cˆ, ρ,ˆ bpe,...) (R) iff ... This section summarizes the most import lemmas and ∈ Q theorems involved in the proof. For details, the reader is This change in perspective leads to a specification of = using referred to Neubauer’s thesis dissertation [21]. The first sound mathematical means. 
$\mathcal{Q}$ is a monotone function on the complete lattice

$$(\mathcal{P}(\widehat{\mathit{Cache}} \times \widehat{\mathit{Env}} \times \widehat{\mathit{BPEnv}} \times \mathit{Exp}),\ \sqsubseteq)$$

because

• $(\mathcal{P}(\widehat{\mathit{Cache}} \times \widehat{\mathit{Env}} \times \widehat{\mathit{BPEnv}} \times \mathit{Exp}),\ \sqsubseteq)$ is a complete lattice with respect to the partial order $R_1 \sqsubseteq R_2$ iff $\forall t.\ t \in R_1 \Rightarrow t \in R_2$, and

• $\mathcal{Q}$ is a monotone function on this complete lattice, that is, $\forall R_1, R_2.\ R_1 \sqsubseteq R_2 \Rightarrow \mathcal{Q}(R_1) \sqsubseteq \mathcal{Q}(R_2)$.

Consequently, $\mathcal{Q}$ has a greatest fixed point. Thus, $\models$ is well defined by coinduction as

$$\models\ :=\ \mathrm{gfp}(\mathcal{Q}).$$

5.5 Acceptability for Environments

Since dynamic scope analysis is ultimately concerned with scope and hence with environments, it is necessary to extend the notion of acceptability to environments:

$$\models\ \subseteq\ \widehat{\mathit{Cache}} \times \widehat{\mathit{Env}} \times \widehat{\mathit{BPEnv}} \times \mathit{Env}$$

$$(\hat{C}, \hat{\rho}) \models_{bpe} \rho \quad\text{iff}\quad \forall x \in \mathrm{dom}(\rho).\ \rho(x)\ \mathcal{A}\ (\hat{\rho}(x, bpe(x)), \hat{C})$$

This acceptability relation for environments examines every binding in an actual environment which occurs during evaluation and relates it to its abstract counterpart for correctness.

5.6 Acceptability for Configurations

The combination of the acceptability relation for programs with that for environments produces an acceptability relation for configurations: combinations of environments and

lemma states that a dynamic scope analysis is valid for a value if and only if the value is part of the abstract cache at its label and the given bpe:

Lemma 1 $(\hat{C}, \hat{\rho}) \models_{bpe} v^{l}$ iff $v\ \mathcal{A}\ (\hat{C}(l, bpe), \hat{C})$.

Proof. By structural induction over $v$.

Another lemma states the obvious fact that if an abstract value $\hat{v}_1$ is a correct approximation of a true value $v$, then so is every abstract value $\hat{v}_2$ which includes $\hat{v}_1$:

Lemma 2 If $v\ \mathcal{A}\ (\hat{v}_1, \hat{C})$ and also $\hat{v}_1 \subseteq \hat{v}_2$ then $v\ \mathcal{A}\ (\hat{v}_2, \hat{C})$.

Proof. By structural induction over $v$. Each case of the proof is obtained individually by inspecting the definition of $\mathcal{A}$.

The specification of the acceptability relation has the important property stated by the following lemma: if an analysis is valid for a term $t$ at label $l_1$, and the abstract values flowing through it are all contained in the values flowing through label $l_2$, the analysis is also valid at label $l_2$:

Lemma 3 If $(\hat{C}, \hat{\rho}) \models_{bpe} t^{l_1}$ and $\hat{C}(l_1, bpe) \subseteq \hat{C}(l_2, bpe)$ then also $(\hat{C}, \hat{\rho}) \models_{bpe} t^{l_2}$.

Proof. By case analysis over the rules of Term. As an example, here is the case for setq expressions. From the first premise

$$(\hat{C}, \hat{\rho}) \models_{bpe} (\texttt{setq}\ x\ t_0^{l_0})^{l_1}$$

follows

$$(\hat{C}, \hat{\rho}) \models_{bpe} t_0^{l_0} \qquad (1)$$
$$\hat{C}(l_0, bpe) \subseteq \hat{\rho}(x, bpe(x)) \qquad (2)$$
$$\hat{C}(l_0, bpe) \subseteq \hat{C}(l_1, bpe) \qquad (3)$$

by the [setq] clause of $\models$. The assumption together with (3) yields

$$\hat{C}(l_0, bpe) \subseteq \hat{C}(l_2, bpe). \qquad (4)$$

The backwards application of the [setq] clause together with (1), (2), and (4) yields the proposition. The other cases work analogously.

Another central insight is that the validity of the dynamic scope analysis of an expression carries over to those subexpressions which are in an evaluation context. Even stronger, such a subexpression can be replaced by another valid one without violating the validity of the whole expression. With this result, the further proof of the correctness of a reduction step can concentrate on the possible redexes of all expressions; the following lemma then allows us to generalize the result to the big picture. This facility is known as the replacement lemma in the realm of combinatory logic [13]:

Lemma 4 If $(\hat{C}, \hat{\rho}) \models_{bpe} E[t_1^{l_1}]$ where $E[t_1^{l_1}]$ is carrying the label $l$, then there exists $bpe'$ such that

a) $(\hat{C}, \hat{\rho}) \models_{bpe'} t_1^{l_1}$ holds.

b) If also $(\hat{C}, \hat{\rho}) \models_{bpe'} t_2^{l_1}$ then $(\hat{C}, \hat{\rho}) \models_{bpe} E[t_2^{l_1}]$.

c) If also $E \in \mathit{VarContext}(x)$ then $bpe'(x) = bpe(x)$.

Proof. Structural induction over $E$.

The first main theorem is subject reduction for expressions under the reduction relation $\to$. A valid dynamic scope analysis for an expression $e$ and a correct approximation of the environment stay valid after one step with $\to$ for the resulting expression $e'$ and the modified environment:

Theorem 1 If $(\hat{C}, \hat{\rho}) \models_{bpe} \rho, e$ and $\rho, e \to \rho', e'$ then also $(\hat{C}, \hat{\rho}) \models_{bpe} \rho', e'$.

Proof. By case analysis over the reduction relation $\to$.

The second theorem formulates subject reduction for entire programs, adapting the previous theorem to the reduction relation $\to_p$:

Theorem 2 If $(\hat{C}, \hat{\rho}) \models_{bpe} \rho, p$ and $\rho, p \to_d \rho', p'$ then also $(\hat{C}, \hat{\rho}) \models_{bpe} \rho', p'$.

Proof. By case analysis over $\to_d$.

7. IMPLEMENTATION

The definition of the acceptability relation presented in Section 5.2 is a blueprint for a practical implementation of a dynamic scope analysis. Our own analysis is constraint-based [1]; it uses a set of syntactic entities to represent applications of the rules generated by the acceptability relation. The analysis, just like every other constraint-based program analysis, consists of two phases: constraint generation and constraint simplification.

In the following we consider a fixed program $p_*$ and describe how to compute the least dynamic scope analysis for $p_*$ which is acceptable with respect to the acceptability relation $\models$.

Since the program $p_*$ is finite, it is possible to enumerate all its occurring labels, symbols, and functions. We call these finite sets $\mathit{Lab}_*$, $\mathit{SymVar}_*$ and $\mathit{Fun}_*$, respectively. Similarly, the sets of possibly occurring birthplace environments $\widehat{\mathit{BPEnv}}_*$ and possibly occurring abstract pairs $\widehat{\mathit{Cons}}_*$ are also identifiable and finite. Accordingly, the finite set of all abstract values that are conceivable for all possible program runs of $p_*$ is

$$a \in \mathit{Abs}_* = \mathit{SymVar}_* \cup \{\omega\} \cup \mathit{Fun}_* \cup \widehat{\mathit{Cons}}_*.$$

The finite sets serve as the basis for the specification of the dynamic scope analysis for a program $p_*$.

7.1 Generating Constraints

In the constraints generated by the analysis, flow variables $V$ stand for sets of abstract values. A flow variable $C_{l,bpe}$ stands for the set of abstract values in the abstract cache at label $l$ and birthplace environment $bpe$. A flow variable $r_{x,l}$ stands for a set of abstract values in the abstract environment.

A constraint $co$ in our analysis belongs to one of three different kinds. A simple constraint of the form

$$\{a\} \subseteq V,$$

where $a$ is an abstract value of $\mathit{Abs}_*$, states that the abstract value $a$ must be a member of the set of abstract values $V$. A variable constraint

$$V_0 \subseteq V_1$$

says that the abstract values of $V_0$ are all contained in those of $V_1$. A conditional constraint

$$\{a\} \subseteq V \Rightarrow co,$$

where $a$ is an abstract value of $\mathit{Abs}_*$ and $co$ is another constraint, states that the constraint $co$ must hold if the abstract value $a$ is a member of the set of abstract values denoted by $V$.

By inspecting the rules of the acceptability relation, we define the function $\mathcal{G}[\![p]\!]^{\mathcal{M}}_{bpe}$ that constructs the set of constraints to be solved, as shown in Figure 3. Its first parameter is the program or expression for which constraints are generated. The second one, $bpe$, is the birthplace environment relative to which the generation of the constraints takes place. The third parameter, $\mathcal{M}$, is a set of pairs, each consisting of the label $l_b$ of a procedure body and a birthplace environment $bpe$. This set memoizes the pairs of procedures and birthplace environments already handled by the constraint generation. The analysis uses it to prevent generating duplicate constraints.
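The three kinds of constraints and a few of the simpler generation rules can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the class names `Simple`, `Var`, and `Cond`, the tuple encoding of terms, the string encoding of flow variables, and the helpers `C`, `r`, and `generate` are all hypothetical and not taken from the prototype, which is written in Scheme. The sketch covers only the rules for constants, variable references, and setq, and omits the memoization set.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Simple:
    """Simple constraint {a} <= V."""
    value: str
    var: str


@dataclass(frozen=True)
class Var:
    """Variable constraint V0 <= V1."""
    sub: str
    sup: str


@dataclass(frozen=True)
class Cond:
    """Conditional constraint {a} <= V  =>  co (shown for completeness)."""
    value: str
    var: str
    then: "Constraint"


Constraint = Union[Simple, Var, Cond]


def C(label, bpe):
    """Name of the cache flow variable C_{l,bpe} (hypothetical encoding)."""
    return f"C[{label},{sorted(bpe.items())}]"


def r(x, birthplace):
    """Name of the environment flow variable r_{x,l} (hypothetical encoding)."""
    return f"r[{x},{birthplace}]"


def generate(term, bpe):
    """Generate constraints for a term encoded as a tuple (kind, label, ...).

    Only constants, variable references, and setq are handled here.
    """
    kind, label = term[0], term[1]
    if kind == "const":
        return {Simple("omega", C(label, bpe))}
    if kind == "var":
        x = term[2]
        return {Var(r(x, bpe[x]), C(label, bpe))}
    if kind == "setq":
        x, t0 = term[2], term[3]
        l0 = t0[1]
        return generate(t0, bpe) | {
            Var(C(l0, bpe), r(x, bpe[x])),
            Var(C(l0, bpe), C(label, bpe)),
        }
    raise ValueError(f"unsupported term kind: {kind}")


# Constraints for (setq x c^{l0})^{l} with x born at a top-level birthplace.
bpe = {"x": "top"}
constraints = generate(("setq", "l", "x", ("const", "l0")), bpe)
```

For this term the sketch produces three constraints: the constant flows into the cache at `l0`, and the cache at `l0` flows both into the environment entry for `x` at its birthplace and into the cache at `l`.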
→ l c bpe = ω Cl,bpe G M l {{ } ⊆ } J(quoteK s) bpe = s Cl,bpe G M l {{ } ⊆ } J(lambda (xK) e0) bpe = (func (x) e0) Cl,bpe GJ l K M {{ } ⊆ } x bpe = rx,bpe(x) Cl,bpe M GJ K l0 l { l0 ⊆ } (setq x t0 ) bpe = t0 bpe Cl0,bpe rx,bpe(x) GJ K M GJ K M ∪ { ⊆ } Cl0,bpe Cl,bpe l0 l1 l l0 ∪ { l1 ⊆ } (t0 t1 ) bpe = t0 bpe t1 bpe M M M GJ K GJ K ∪ GJ K lb (func (x1) tb ) Cl0,bpe = co ∪ {{ }lb ⊆ ⇒ (func (x1) t ) Fun , | b ∈ ∗ bpe = bpe[x1 lb], 0 7→ (lb, bpe ) , 0 6∈ M 0 = (lb, bpe0) , M Mlb ∪ { } co tb bpe 0 M0 ∈ GJ Klb } (func (x1) tb ) Cl0,bpe = Cl1,bpe rx1,lb ∪ {{ }lb ⊆ ⇒ ⊆ (func (x1) t ) Fun b ∗ | lb ∈ } (func (x1) tb ) Cl0,bpe = Cl ,bpe[x11 l ] Cl,bpe b 7→ b ∪ {{ }lb ⊆ ⇒ ⊆ (func (x1) t ) Fun b ∗ l1 l2 l l1 l2| ∈ } (let x t1 t2 ) bpe = t1 bpe t2 bpe G M G M ∪ G M J K J K CJ l ,bpeK rx,l ∪ { 1 ⊆ 2 } Cl2,bpe[x l2] Cl,bpe l0 l1 l2 l l0 ∪ { l1 7→ ⊆ } (if t0 t1 t2 ) bpe = t0 bpe t1 bpe GJ K M GJ K M ∪ GJ K M} Cl1,bpe Cl,bpe ∪ { l2 ⊆ } t2 bpe ∪ GJ K M} Cl2,bpe Cl,bpe l1 l2 l l1 ∪ { l2 ⊆ } (cons t1 t2 ) bpe = t1 bpe t2 bpe G M G M ∪ G M J K J K J(l1,K l2, bpe) Cl,bpe l0 l l0 ∪ {{ } ⊆ } (car t0 ) bpe = t0 bpe (l1, l2, bpe0) Cl0,bpe = Cl1,bpe0 Cl,bpe GJ K M GJ K M ∪ {{ } ⊆ ⇒ ⊆ (l1, l2, bpe0) Cons ∗ l0 l l0 | ∈ } (cdr t0 ) bpe = t0 bpe (l1, l2, bpe0) Cl0,bpe = Cl2,bpe0 Cl,bpe GJ K M GJ K M ∪ {{ } ⊆ ⇒ ⊆ (l1, l2, bpe0) Cons ∗ l0 l0 | ∈ } (defvar x t0 p) bpe = t0 bpe Cl0,bpe rx,bpe(x) G M G M ∪ { ⊆ } J K J K p bpe ∪ GJ K M (defun x0 (x1) e) p bpe = (func (x1) e) rx0, p bpe GJ K M {{ } ⊆ } ∪ GJ K M Figure 3: Generating Constraints.

The constraint generation rules in Figure 3 are mostly straightforward translations of the corresponding rules of the acceptability relation.

The most involved case is the treatment of procedure applications. In addition to the generation of constraints for the terms at the operator position and the parameter position, every procedure flowing into the operator triggers the generation of constraints for its body under the current birthplace environment via a conditional constraint.

The treatment of the primitives car and cdr works in a similar way: we do not know which abstract pairs could occur at the argument position. Therefore, the analysis generates conditional constraints for all abstract pairs.

The well-definedness and the termination of the algorithm follow by simple fixed-point arguments on an underlying finite complete lattice.

$\mathcal{G}[\![p]\!]^{\mathcal{M}}_{bpe}$ as specified generates a large number of conditional constraints in the application, car, and cdr rules, many of which are never triggered during the constraint-solving phase. Therefore, our implementation defers the generation of their right-hand sides until constraint solving. It would have been possible to specify the analysis this way from the beginning, but this would mean having to mix the constraint-generation and constraint-solving phases, which would obscure the presentation.

7.2 Solving the Constraints

The generated set of constraints expresses the behavior of all valid dynamic scope analyses. To get the least dynamic scope analysis, we close the generated constraints under the following inference rules $\mathcal{S}$:

$$\text{[VarProp]}\quad \frac{\{a\} \subseteq V_0 \qquad V_0 \subseteq V_1}{\{a\} \subseteq V_1}$$

$$\text{[CondProp]}\quad \frac{\{a\} \subseteq V \qquad \{a\} \subseteq V \Rightarrow co}{co}$$

and write $\mathcal{S}(CO)$ for the closure of a set of constraints $CO$ under $\mathcal{S}$.

The actual dynamic scope analysis results from the solving phase as all abstract values associated with a variable $V$ after generating the initial constraints and closing those constraints under $\mathcal{S}$:

$$\mathit{dsa}(p)(V) = \{a \mid \{a\} \subseteq V \in \mathcal{S}(\mathcal{G}[\![p]\!]^{\emptyset}_{bpe})\}$$

where $bpe$ denotes the top-level birthplace environment.

For our implementation, we employ the standard technique of using a graph representation for the constraint set and apply a worklist algorithm on the graph to compute the least solution of the original constraints [2, 14, 32].

Package         Lines  Prims  Bps  Dynamic Bps   Iters  Analysis Time (sec)
mail-utils.el     355     51   63            0    4159                 0.96
rfc822.el         378     48   56            1   89428                81.84
add-log.el        718     74   67            1   22284                 8.32
pop3.el           839     67  169            5   93640               130.49
footnote.el       975     47  153            0  115930                73.86

Figure 4: Analyzed Emacs Lisp packages, their size in lines of code after expansion, the number of additionally used primitives, the number of birthplaces, the number of birthplaces recognized as dynamic binding, the number of iterations the worklist algorithm used, and the analysis time.
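The closure under [VarProp] and [CondProp] described in Section 7.2 can be illustrated with a small worklist solver. The Python sketch below is not the prototype's code: the tuple encoding of constraints, the function name `solve`, and the flow-variable names in the example are assumptions made for illustration. Variable constraints become graph edges, and the right-hand side of a conditional constraint is installed only once its premise holds, which loosely mirrors the deferred generation mentioned in Section 7.1.

```python
from collections import defaultdict, deque


def solve(constraints):
    """Close constraints under [VarProp] and [CondProp] with a worklist.

    Constraints are tuples (hypothetical encoding):
      ("simple", a, V)      for {a} <= V
      ("var", V0, V1)       for V0 <= V1
      ("cond", a, V, co)    for {a} <= V  =>  co
    Returns a dict mapping each flow variable to its set of abstract values.
    """
    values = defaultdict(set)    # flow variable -> abstract values so far
    edges = defaultdict(set)     # V0 -> set of V1 with V0 <= V1
    pending = defaultdict(list)  # (a, V) -> constraints waiting on {a} <= V
    worklist = deque()

    def add_value(a, v):
        # Record {a} <= v and schedule propagation if this is new.
        if a not in values[v]:
            values[v].add(a)
            worklist.append((a, v))

    def install(co):
        kind = co[0]
        if kind == "simple":
            _, a, v = co
            add_value(a, v)
        elif kind == "var":
            _, v0, v1 = co
            if v1 not in edges[v0]:
                edges[v0].add(v1)
                for a in list(values[v0]):   # [VarProp] for known values
                    add_value(a, v1)
        elif kind == "cond":
            _, a, v, inner = co
            if a in values[v]:               # [CondProp] fires immediately
                install(inner)
            else:
                pending[(a, v)].append(inner)
        else:
            raise ValueError(f"unknown constraint kind: {kind}")

    for co in constraints:
        install(co)

    while worklist:
        a, v = worklist.popleft()
        for v1 in list(edges[v]):            # [VarProp]
            add_value(a, v1)
        for inner in pending.pop((a, v), []):  # [CondProp]
            install(inner)

    return dict(values)


solution = solve([
    ("simple", "f", "C0"),
    ("var", "C0", "C1"),
    ("cond", "f", "C1", ("simple", "omega", "C2")),
    ("cond", "g", "C1", ("simple", "s", "C3")),
])
```

In this example the value `f` reaches `C1` through the variable constraint, so the first conditional constraint fires and `omega` is added to `C2`. The second conditional constraint never fires because `g` never reaches `C1`. Each pair of value and flow variable enters the worklist at most once, which is the finiteness argument behind termination.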

The worklist algorithm always terminates. Every program induces only a finite set of abstract values ($\mathit{Abs}_*$), and there is only a finite number of potential nodes since there is only a finite number of program points, variables, and birthplace environments. Hence, the analysis propagates a finite number of data objects over a finite number of nodes. The process ends after a finite number of steps: at the latest when every datum has arrived at every node.

The algorithm has exponential worst-case complexity with respect to the size of the analyzed program: the number of all possible birthplace environments is already exponential. However, the next section shows that our prototype implementation is already practical for medium-sized real-world examples. Also, since the translation of Emacs Lisp programs into a new substrate ideally happens only once, speed is not quite as important as, say, in compilers which run often. Instead, precision is at a premium.

8. MEETING THE REAL WORLD

Our prototype implementation of the algorithm consists of about 5500 lines of Scheme code and runs atop Scheme 48 [15], a byte-code implementation of Scheme. It handles a large subset of Emacs Lisp programs. Specifically, it correctly deals with a number of aspects of the language not treated in this paper, including the following:

• multi-parameter functions and optional arguments,

• catch and throw,

• funcall and the duality between functions and their names, and

• separation of function and value components of bindings.

In this section, we present the results of the analysis run on various packages taken directly from the XEmacs package collection. To obtain accurate information from real packages, the implementation must know the flow behavior of a substantial number of the XEmacs primitives they use.

The implementation contains a small macro language to describe the flow behavior of basic primitives for which no implementation in Emacs Lisp exists. Using those macros simplifies the description of the primitives tremendously. For instance, the three lines

    (primitive-flow (FILENAME)
      ((const) (union (symbol t)
                      (symbol nil))))

describe the constraint generation for the built-in primitive file-exists-p that checks for the existence of a file with a given name.

Currently, the system emits an annotated version of the input program, marking those bindings which would have to stay dynamic under lexical binding. The binding-type condition which we use to decide to which type a variable reference belongs is the following:

Binding-type condition A variable is used dynamically iff the abstract cache registers some abstract object for the variable under its label for a birthplace different from its static one, that is,

$$x^{l_0} \text{ with static birthplace } l_1 \text{ is dynamic} \quad\text{iff}\quad \exists bpe.\ bpe(x) \neq l_1 \wedge \hat{C}(l_0, bpe) \neq \emptyset.$$

Further conditions exist for our implementation to recognize Emacs Lisp lets used in the flavor of statically scoped let*'s or letrec's in Scheme.

The results in this section were obtained by running the implementation under the Scheme system Scheme 48 0.53 on an Athlon 1 GHz system with 256 kByte second-level cache and 256 MByte of physical memory. We made no effort to optimize our implementation heavily or to compile it to native code; feasibility was our main concern.

Figure 4 lists the packages used for the experimental results. They are all part of the regular XEmacs distribution. Mail-utils contains utility functions used by the packages rmail and rnews. Rfc822 implements a parser for standard Internet messages. Pop3 provides POP3 functionality for email clients. Add-log lets a programmer manage files of changes for programs. Footnote offers the functionality to add footnotes to XEmacs buffers.

Figure 4 shows that the analysis is highly accurate: it only leaves behind a small number of dynamic binding constructs.

9. LIMITATIONS

While the analysis described here solves some of the hardest problems associated with translating Emacs Lisp programs to readable Scheme programs, a few remain:

eval Emacs Lisp has an eval function which interprets a piece of data as an Emacs Lisp expression. Its semantics is naturally quite undefined in a Scheme environment. Except for simple cases (for example, where the expression to be evaluated is a symbol), there is no idiomatic translation for eval forms. Programmers must transform the Emacs Lisp code not to use eval before attempting translation. Dynamic generation of symbols as well as some introspection capabilities of the language also belong in this category.

buffer-local variables Emacs Lisp features buffer-local variables which implicitly change value according to the current buffer. This is an unfortunate conflation of the language semantics and the application domain, and often leads to unexpected and hard-to-track behavior of Emacs Lisp code. However, buffer locality is usually a global property of variables: programs rarely use the same variable both buffer-locally and buffer-globally. Hence, a feasible approach is to translate buffer-local variables into specially designated data structures and access them via special constructs rather than preserving their implicit nature. No special analysis is required as long as the calls to make-variable-buffer-local are close to their variable declarations.

Note that these kinds of problems are inherent in almost any translation from one programming language to another, if maintainability is to be preserved.

10. RELATED WORK

10.1 Dynamic Binding

Despite the fact that languages with dynamic variable binding have existed for a long time, formulations of semantics for these languages are quite rare. On the other hand, it is folklore that dynamic binding can be eliminated by a dynamic-environment passing translation that makes the dynamic bindings explicit [24].

Gordon [9] initially formalized dynamic binding in the context of early Lisp dialects and studied their metacircular interpreters, using denotational semantics.

Moreau [20] rounded out the view on dynamic binding by introducing a syntactic theory of dynamic binding with a calculus allowing equational reasoning. From this theory, he also derived a small-step semantics using evaluation contexts and syntactic rewriting as developed by Felleisen and Friedman [6]. Wright and Felleisen [30, 31] and Harper and Stone [11] formulated semantics for exception mechanisms which also employ a kind of dynamic binding.

Lewis et al. [17] introduce a language feature called implicit parameters that provides dynamically scoped variables for languages with Hindley-Milner type systems, and formalize it with an axiomatic semantics. However, functions with implicit parameters are not first-class values in their setting.

10.2 Flow Analysis

Most realistic implementations of flow analysis for functional programming languages are simple monovariant (or 0-CFA) flow analyses [12, 7, 8, 26], that is, the analysis looks at every program point independent of context.

Shivers [27] proposed splitting the analysis at function call sites depending on the context of the most recent k procedure calls (called k-CFA). Other splittings, also depending on procedure calls, were proposed by Jagannathan and Weeks [14] as poly-k-CFA and by Wright and Jagannathan [32] as polymorphic splitting.

The concept of coinduction arose from Milner and Tofte's work [18] on semantics and type systems of an extended λ-calculus with references. Nielson and Nielson were the first to use coinduction as a means for specifying a static analysis [22]. Their work provides the theoretical framework for the specification of our analysis.

10.3 Subject reduction

The notion of correctness we used is generally called a subject reduction result. Curry and Feys [5] introduced subject reduction to show the correctness of predicates in the languages of combinatory logic. Mitchell and Plotkin [19] used the idea to show a type correctness result for a λ-calculus-like language, whereas Wright and Felleisen [30, 31] adapted it to the more flexible concept of operational semantics with reduction rules and evaluation contexts. Wright and Jagannathan [32] used the same technique for their polymorphic splitting flow analysis.

10.4 Emacs Lisp and Scheme

A number of other projects have built or are currently building Scheme-based variants of Emacs. The oldest is Matt Birkholz's Emacs Lisp interpreter, which allows running Emacs Lisp programs on top of MIT Scheme's Edwin editor. Current efforts include Ken Raeburn's work on creating a Guile-based Emacs [25], the Guile Emacs project [10], as well as Per Bothner's JEmacs [3, 4], which aims at re-implementing Emacs atop Java bytecodes, leveraging Bothner's Kawa compiler for Scheme. Of these, only JEmacs seems to have seen significant work on making Emacs Lisp programs run. None of these projects address permanently translating Emacs Lisp code to Scheme while retaining maintainability.

11. CONCLUSION AND FUTURE WORK

We have specified, proved correct, and implemented a flow analysis for Emacs Lisp whose distinguishing feature is its correct handling of dynamic binding. The primary purpose of the analysis is to aid translation of Emacs Lisp programs into more modern language substrates with lexical scoping, since most binding in real Emacs Lisp programs behaves identically under lexical and dynamic scoping. Our analysis is highly accurate in practice. Our prototype implementation is reasonably efficient.

We have two main directions for future research:

• improving the efficiency of the analysis by ordinary optimization, compilation to native code, and modularization of the constraints [8], and

• integration of the analysis into a translation suite from Emacs Lisp to Scheme.

Acknowledgments. We would like to thank the initial members of the el2scm project: Martin Gasbichler, Johannes Hirche, and Peter Biber. Specifically, Peter Biber developed a precursor to the analysis presented here, demonstrating the feasibility of the project. Peter Thiemann provided valuable suggestions for the paper. We also thank the ICFP referees for valuable comments.

12. REFERENCES

[1] A. Aiken. Set constraints: Results, applications and future directions. Lecture Notes in Computer Science, 874:326–335, 1994.

[2] A. Aiken and E. Wimmers. Type inclusion constraints and type inference. In Proceedings of the FPCA 1993, pages 31–41, 1993.

[3] P. Bothner. JEmacs: the Java/Scheme-based Emacs. In Proceedings of the FREENIX Track: 2000 USENIX Annual Technical Conference (FREENIX-00), pages 271–278, Berkeley, CA, June 18–23 2000. USENIX Ass.

[4] P. Bothner. JEmacs: the Java/Scheme-based Emacs text editor. http://jemacs.sourceforge.net/, Feb. 2001.

[5] H. B. Curry and R. Feys. Combinatory Logic, volume I. North-Holland, Amsterdam, 1958.

[6] M. Felleisen and D. P. Friedman. Control operators, the SECD-machine, and the λ-calculus. In M. Wirsing, editor, Formal Description of Programming Concepts III, pages 193–217. North-Holland, 1986.

[7] C. Flanagan and M. Felleisen. Set-based analysis for full Scheme and its use in soft-typing. Technical Report TR95-254, Rice University, Oct. 1995.

[8] C. Flanagan and M. Felleisen. Componential set-based analysis. ACM Transactions on Programming Languages and Systems, 21(2):370–416, Mar. 1999.

[9] M. J. C. Gordon. Programming Language Theory and its Implementation. Prentice-Hall, 1988.

[10] Guile Emacs. http://gemacs.sourceforge.net/, July 2000.

[11] R. Harper and C. Stone. An interpretation of Standard ML in type theory. Technical Report CMU-CS-97-147, Carnegie Mellon University, Pittsburgh, PA, June 1997. (Also published as Fox Memorandum CMU-CS-FOX-97-01.)

[12] N. Heintze. Set-based analysis of ML programs. In ACM Conference on Lisp and Functional Programming, pages 306–317, 1994.

[13] R. Hindley and J. Seldin. Introduction to Combinators and λ-Calculus, volume 1 of London Mathematical Society Student Texts. Cambridge University Press, 1986.

[14] S. Jagannathan and S. Weeks. A Unified Treatment of Flow Analysis in Higher-Order Languages. In POPL, 1995.

[15] R. A. Kelsey and J. A. Rees. A tractable Scheme implementation. Lisp and Symbolic Computation, 7(4):315–335, 1995.

[16] B. Lewis, D. LaLiberte, R. Stallman, and the GNU Manual Group. GNU Emacs Lisp reference manual. http://www.gnu.org/manual/elisp-manual-20-2.5/elisp.html, 1998.

[17] J. Lewis, M. Shields, E. Meijer, and J. Launchbury. Implicit parameters: Dynamic scoping with static types. In Proceedings of the 27th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Boston, Massachusetts, pages 108–118, Jan. 2000.

[18] R. Milner and M. Tofte. Co-induction in relational semantics. Theoretical Computer Science, 87:209–220, 1991.

[19] J. C. Mitchell and G. D. Plotkin. Abstract types have existential type. ACM Transactions on Programming Languages and Systems, 10:470–502, July 1988.

[20] L. Moreau. A Syntactic Theory of Dynamic Binding. Higher-Order and Symbolic Computation, 11(3):233–279, Dec. 1998.

[21] M. Neubauer. Dynamic scope analysis for Emacs Lisp. Master's thesis, Eberhard-Karls-Universität Tübingen, Dec. 2000. http://www.informatik.uni-freiburg.de/~neubauer/diplom.ps.gz.

[22] F. Nielson and H. R. Nielson. Infinitary control flow analysis: a collecting semantics for closure analysis. In Proc. POPL'97, pages 332–345. ACM Press, 1997.

[23] G. D. Plotkin. A structural approach to operational semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus, Denmark, Sept. 1981.

[24] C. Queinnec. Lisp in Small Pieces. Cambridge University Press, 1996.

[25] K. Raeburn. Guile-based Emacs. http://www.mit.edu/~raeburn/guilemacs/, July 1999.

[26] M. Serrano and M. Feeley. Storage use analysis and its applications. In Proceedings of the 1st International Conference on Functional Programming, page 12, Philadelphia, June 1996.

[27] O. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie-Mellon University, May 1991.

[28] R. Stallman. GNU extension language plans. Usenet article, Oct. 1994.

[29] B. Wing. XEmacs Lisp Reference Manual. ftp://ftp.xemacs.org/pub/xemacs/docs/a4/lispref-a4.pdf.gz, May 1999. Version 3.4.

[30] A. K. Wright and M. Felleisen. A syntactic approach to type soundness. Technical Report 91-160, Rice University, Apr. 1991. Final version in Information and Computation 115(1), 1994, 38–94.

[31] A. K. Wright and M. Felleisen. A syntactic approach to type soundness. Information and Computation, 115(1):38–94, 1994. Preliminary version in Rice TR 91-160.

[32] A. K. Wright and S. Jagannathan. Polymorphic splitting: an effective polyvariant flow analysis. ACM Transactions on Programming Languages and Systems, 20(1):166–207, Jan. 1998.