Generalized Escape Analysis and Lightweight

Abstract

∆CFA and ΓCFA are two analysis techniques for higher-order functional languages (Might and Shivers 2006b,a). The former uses an enrichment of Harrison’s procedure strings (Harrison 1989) to reason about control-, environment- and data-flow in a program represented in CPS form; the latter tracks the degree of approximation inflicted by a flow analysis on environment structure, permitting the analysis to exploit extra precision implicit in the flow lattice, which improves not only the precision of the analysis, but often its run-time as well.

In this paper, we integrate these two mechanisms, showing how ΓCFA’s abstract counting and collection yield extra precision in ∆CFA’s frame-string model, exploiting the group-theoretic structure of frame strings.

We apply the improved ∆CFA to escape analysis (discovering when procedures can have their environment records allocated on the stack), and lightweight continuation creation (when continuations created with call/cc can be implemented with a simple stack pointer, eliminating the need to copy the stack out into heap storage). While Might and Shivers’ description of ∆CFA applied the analysis to general environment problems in functional languages, here we focus on reasoning about stack behaviour, which is, in fact, the strength of the model, thus returning to the original application for which Harrison designed its “procedure-string” predecessor.

The paper includes a complete, standalone reformulation of ∆CFA; a new, factored abstraction for frame-strings; a generalization to alternate stack models; a discussion of frame-string-based escape analysis; a review of our implementation and results; and a description of lightweight continuation conversion.

1. Introduction

An entire class of analyses is devoted to reasoning about the lifetimes of dynamically allocated objects. If, for example, an analysis can derive that some object is never accessed after its allocating procedure returns, then this object can be allocated on the run-time stack, rather than in the garbage-collected heap. In higher-order functional languages, such as Scheme, SML and Haskell, these objects can include procedures and continuations created with primitives such as Scheme’s call/cc operator. Higher-order languages (by which we mean object-oriented languages as well as functional ones) complicate matters, as program execution can proceed “into” these dynamically (and possibly stack-) allocated objects. When we add general continuations to the set of possibilities, things become even more complex, as continuations themselves represent entire stack contexts. An object can be allocated, then returned past its creator’s stack frame and later invoked, yet still be invoked in a context “below” its creator’s frame, if its creation context happened to be captured by an escaping continuation that is later used to restore this context.

One reason continuations are an important construct to analyse is that they can be used to implement coroutines; reasoning about continuations enables us to implement these coroutines in a lightweight, fusible manner. Fusible coroutines are a linguistic facility that permits the construction of modular but high-performance stream-processing applications (Shivers and Might 2006).

Might and Shivers have published two distinct analyses for reasoning about programs written in higher-order languages that provide first-class continuations, ∆CFA and ΓCFA (Might and Shivers 2006a,b). ∆CFA is an abstract interpretation based on a variation of procedure strings as developed by Sharir and Pnueli (Sharir and Pnueli 1981), and, most proximately, Harrison (Harrison 1989). Might and Shivers have applied ∆CFA to problems that require reasoning about the environment structure relating two code points in a program. ΓCFA is a general tool for “sharpening” the precision of an abstract interpretation by tracking the amount of abstraction the analysis has introduced for the specific abstract interpretation being performed. This permits the analysis to opportunistically exploit cases where the abstraction has introduced no true approximation.

Our message in this paper is that these two techniques are synergistic: the extra precision provided by ΓCFA is exactly what is needed for us to perform extra “cancellations” in a critical approximation step of ∆CFA. This specifically improves our ability to reason about a program’s stack behaviour, and consequently, can be applied profitably to code improvements based on escape analysis. (We should note that while procedure-string abstractions are especially well suited to discovering properties about the run-time stack, Might and Shivers’ development of ∆CFA did not focus on this area of analysis.)

Additionally, the precision improvements are also specifically good for situations that arise in reasoning about and fusing networks of coroutines.

This work contributes:
• A framework for an escape analysis that generalizes to more exotic stack and control mechanisms, such as coroutines and cooperative multithreading.
• A novel abstraction for frame strings that is compatible with the ΓCFA technology suite, namely abstract counting and abstract garbage collection (Might and Shivers 2006b). In particular, abstract counting allows this new abstraction to recover enough of the group-theoretic properties of frame strings in the abstract to handle coroutines.
• An optimization, lightweight continuation conversion, designed to reduce continuation objects from entire stack copies to just a single pointer.
• A method for verifying the safety of constrained stack-plus-control-flow manipulation operators such as C’s setjmp and longjmp.

2. Partitioned CPS

In the interests of a self-contained presentation, we now recap the basic development of ∆CFA before introducing our ΓCFA-based improvements. ∆CFA (Might and Shivers 2006a) operates over a syntactically partitioned continuation-passing style (CPS) (Steele Jr. 1978) input language. Partitioned CPS is intended for use as an intermediate form generated from programs written in a direct-style λ-calculus language, with user-level access to full, first-class continuations, such as Scheme or SML/NJ.

By partitioned, we mean that all the forms (variables, call arguments, calls and λ expressions) are statically marked as belonging to either the user world, or the continuation world. The term user world emphasizes the fact that continuation forms cannot be
expressed directly by the programmer (the user) in the original, direct-style source. In the translation from direct-style code to CPS, each λ expression from the source maps to a “user” λ expression, while return points or evaluation context in the direct-style form are mapped to “continuation” λ expressions.

The point of partitioning is two-fold:
• It permits us to exploit a cheaply obtained partition amongst the abstract-semantic elements of our analysis;
• yet we can otherwise operate in a general CPS setting, where we can express and reason about all control and environment structure in the program via the uniform, universal mechanism of a CPS procedure call.

Looking ahead, this partitioning preserves enough information to allow us to recover stack operations, which is the focus of our analysis.

Figure 1 gives the partitioned CPS grammar. Every syntactic category (VAR, LAM, EXP and CALL) is composed of a user-world set and a continuation-world set. We mark user-world λ expressions and calls with labels ℓ ∈ ULAB, while continuation-world items are marked with distinct labels γ ∈ CLAB. Note the distinctive mark of a CPS grammar: procedure calls cannot be nested, as call arguments may only be variable references or λ terms. This reflects the semantic constraint that procedure calls in CPS do not return; they are a one-way control transfer. Thus the q continuation argument(s) passed at user calls package up the entire computational context of the call. To simplify the analysis and improve precision, we also require the program to be alphatised, that is, no two bound variables have the same name. More exhaustive treatments of the partitioned CPS representation are available elsewhere (Kranz et al. 1986; Might and Shivers 2006a; Steele Jr. 1978).

$$
\begin{aligned}
pr \in PR &::= (\lambda\ (halt)\ call) \\
v \in VAR &= UVAR + CVAR \\
u \in UVAR &= \text{a set of identifiers} \\
k \in CVAR &= \text{a set of identifiers} \\
lam \in LAM &= ULAM + CLAM \\
ulam \in ULAM &::= (\lambda_{\ell}\ (u^{*}\ k^{*})\ call) \\
clam \in CLAM &::= (\lambda_{\gamma}\ (u^{*})\ call) \\
f, x \in EXP &= UEXP + CEXP \\
h, e \in UEXP &= UVAR + ULAM \\
q \in CEXP &= CVAR + CLAM \\
call \in CALL &= UCALL + CCALL \\
ucall \in UCALL &::= (h\ e^{*}\ q^{*})_{\ell} \\
ccall \in CCALL &::= (q\ e^{*})_{\gamma} \\
\psi, \kappa \in LAB &= ULAB + CLAB \\
\ell \in ULAB &= \text{a set of labels} \\
\gamma \in CLAB &= \text{a set of labels}
\end{aligned}
$$

Figure 1. Partitioned CPS

(rcs 1.23 (2007 7 14))

3. Procedure strings and stack models

A procedure string, as used by Sharir and Pnueli (Sharir and Pnueli 1981), or Harrison (Harrison 1989), is the sequence of call and return actions performed during some segment of computation. E.g., were we to trace the sequence of actions involved in the execution of (- (square 3) 5), it might produce the sequence “call square, call *, return *, return square, call -, return -” (assuming the definition of square contains within it a multiplication step). Notice how the call/return entries properly nest like brackets.

We can view this trace, or procedure string, as a sequence of control (call/return) operations, but we can also view it as a sequence of stack (push/pop) operations. However, in functional languages, this direct correspondence between control and stack operations breaks down somewhat. For example, we write loops in functional languages with tail-recursive function calls. A functional programmer would think of a fifty-iteration loop as consisting of fifty calls followed by a single return. Similarly, exceptions, coroutines and general continuation invocation all depart from the simple model, where calls and returns nest in a simple way.

However, no matter what the call/return behaviour is, it is still true that the associated stack operations nest properly. That is, if we push frame a, then push frame b, the two frames will necessarily be popped in the order “b, then a.” Might and Shivers pointed out that we get a more precise model of program behaviour in the presence of these control constructs if we take models based on procedure strings and change to abstractions whose nesting and cancellation properties are driven by analogues to stack behaviour (Might and Shivers 2006a).

This takes us from the classic, “FORTRAN-like” view of procedure call to the view articulated in Steele’s Rabbit thesis (Steele Jr. 1978). Steele’s protocol for function call is the basic mechanism that manages a stack properly in the functional-language setting we’ve described. When we shift to a partitioned CPS representation, this protocol is directly tied to the syntax.

4. CPS and stack models

Thus we can develop a stack model in our more general, functional-language setting by considering how compilers based on partitioned CPS representations (Steele Jr. 1978; Kranz et al. 1986) map program structure into stack operations. The function-call protocol employed by Might and Shivers for ∆CFA is a simple variation of Steele’s protocol; we can describe it with the aid of some example code.

Consider the definition of a recursive factorial procedure shown in Figure 2. The point to keep in mind when studying this code is that the stack management associated with executing it is tied directly to the management of its continuation terms and values. Evaluating a continuation λ term to produce a continuation value closes the term over the current stack frame, which we can manage simply by adding to the stack a pointer to the machine code produced for the λ term. Thus continuations represent return points: a stack frame and an associated return pc.

    (define fact
      (λ (n k)
        (%if-zero n
          (λ () [k 1])
          (λ () (- n 1 (λ (m)
                         (fact m (λ (a) (* a n k)))))))))

Figure 2. Recursive factorial written in CPS. Continuation λ terms are marked with an underline; continuation calls are marked with square brackets. The two-way conditional is encoded by the primitive procedure %if-zero, which takes an integer and two continuations.

Assume that the arguments passed to function calls, and the return values produced by these calls are passed on the stack. Then procedure call consists of (1) evaluating the arguments, (2) pushing them on the stack, and (3) jumping to the called procedure. Consider what must be done, then, to carry out the factorial procedure’s recursive call:

    (fact m (λ (a) (* a n k)))

The continuation for this call is a λ term, which represents the work that must be done when the factorial of m has been computed: multiply the result by n and return that result to our continuation k. Evaluating this λ term as part of step 1 produces a closure that points to the current frame; from this frame, we can access the current values of n and k (that is, the free variables of the continuation), along with the call’s return pc, which is a pointer to the continuation’s machine code. (In this way, these saved values will be available when the recursive call “returns” by invoking this continuation.) Since we are passing this new continuation as the second argument of the call, it is a “live” value, and thus the stack frame that incarnates it is live and may not be reclaimed: this is a “true” recursive call.

In contrast, consider the call (* a n k). The continuation argument of this call is not a λ term, but rather a variable reference, k. Since continuations are represented as stack frames, the value of k is a stack pointer; we simply pass it along. Before performing the call, we can also pop the stack back to k, which will clear our own frame off the stack. This, then, is a tail call. So, if the continuation arguments to a user call are variables, that call is a “tail call” and does not push a frame. (Or, rather, in this model the tail call pops the current frame immediately before pushing a fresh one for the procedure being entered, for no net stack growth.) If the call takes multiple continuation arguments, for conditional control operators such as %if-zero, then we pop the stack back to the highest of the continuations being passed.

How, then, are continuations invoked? Consider the only such explicit call in our example, [k 1]. This is simply invoking a closure, given that the closure’s environment record is a stack frame: we load k into the top-of-stack register, and jump to the return pc stored in that frame. As this code executes, it can access free variables by loading their values from the frame. (For example, this is how (λ (a) (* a n k)) accesses its needed n value.)

Executing our factorial code invokes continuations at sites other than the [k 1] call point. These invocations will happen “inside” the primitive *, - and %if-zero functions. For example, after passing * two integers (a and n) and a continuation (k), it will multiply the integers together and then jump to the continuation, passing it the resulting product.

All of this gives us the following model for stack management:
• A frame is pushed when assembling arguments to be passed to a procedure. (This includes when we pass arguments to continuations, which is the CPS version of returning from a call.)
• Evaluating a continuation λ term produces a pointer to the current frame that keeps that frame live across the ensuing call.
• The stack is reset to a previously-saved pointer when performing a user call whose continuations are variables.
• The stack is reset to a previously-saved pointer when invoking a continuation.

Again, this model is a variant of Steele’s function-call protocol: Steele’s protocol passed arguments in a separate register set, instead of on the stack, which shifts frame allocation to the continuation-evaluation step.

5. Frame strings

We can shift to this stack-oriented view of an execution trace with Might and Shivers’ frame strings. A frame string is a record of the stack-frame allocation and deallocation operations over the course of some segment of a computation. More precisely, a frame string is a sequence of characters, with each character representing a frame operation (Figure 3). Each character captures three items of information about the operation it represents: (1) the label ψ of the λ term attached to that frame; (2) the time t of the frame’s creation; and (3) the action taken, either a push represented as a “bra” $\langle^{\cdot}_{\cdot}|$ or a pop represented as a “ket” $|^{\cdot}_{\cdot}\rangle$. Thus, the character $\langle^{l_3}_{87}|$ represents the act of allocating a stack frame when execution enters λ term $l_3$ at time 87, while $|^{l_3}_{87}\rangle$ represents popping that frame at some later time.

$$
\begin{aligned}
p, q \in F &= \Phi^{*} && \text{(Frame string)} \\
\phi \in \Phi &::= \langle^{\psi}_{t}| && \text{(push)} \\
&\ \ \mid\ |^{\psi}_{t}\rangle && \text{(pop)} \\
\psi \in \Psi &= \lambda \text{ term labels} \\
t, i \in Time &= \text{an infinite set of times}
\end{aligned}
$$

Figure 3. Frame strings

Keep in mind that, as we have discussed, in a sophisticated functional-language implementation, the correspondence between frame pushes and pops, and procedure calls and returns, is not one-to-one. A frame pop can occur because a procedure is returning, but might also occur early, during a tail call. Invoking a full continuation captured with the Scheme call/cc operator can re-push frames after they have previously been popped; performing a coroutine switch can likewise pop a large number of frames off the stack, followed by a large number of pushes, as the computation switches to a distinct stack context.

5.1 Operations on frame strings

The operations Harrison developed for procedure strings can be adapted, mutatis mutandis, to frame strings (Might and Shivers 2006a). It is here that we begin to introduce extensions to previous work with frame strings.

The operator + is the string-concatenation operator. The operator ⌊·⌋ cancels out opposing, adjacent frame-action pairings until no more cancellations can be made. That is, if $\langle^{\psi}_{t}|$ occurs to the immediate left or right of $|^{\psi}_{t}\rangle$ in a frame string, we may delete the pair; when no further annihilations in p are possible, the remainder is ⌊p⌋, e.g. $\lfloor \langle^{a}_{1}|\,|^{a}_{1}\rangle\,\langle^{b}_{2}| \rfloor = \langle^{b}_{2}|$. This is known as taking the net of a frame string.¹

¹ You may be wondering how a push action could possibly wind up on the right of its matching pop action. The answer involves the use of full continuations.

It follows that the set $\Phi^{*}$ modulo the net operation ⌊·⌋ is a group. This fact turned out to be critical for proving the correctness of environment analysis. These group-theoretic properties of the set $\Phi^{*}$ are also key in creating our novel abstractions of frame strings. A key distinction between this work and Might and Shivers’ work on ∆CFA is that our upcoming abstraction recovers more precisely the group-theoretic abstract concatenation operation.

The operation $p^{-1}$ reverses frame string p, flips each frame action to its anti-frame action, and then nets it. This is the inverse of p modulo ⌊·⌋.

To connect these operators back to our stack-management model, if we have a frame string p that describes the trace of a program execution up to some point in time, then ⌊p⌋ gives us a picture of the stack at that time. Alternately, if p represents some contiguous segment of a program’s trace, then ⌊p⌋ yields a summary of the stack change that occurred during the execution of that segment. We will, in fact, make more frequent use of this second interpretation, which connects two points in a program’s execution,
than we will of the first one. If frame string p describes some sequence of actions on the stack, then $p^{-1}$ produces the frame string that will “undo” these actions, restoring the stack to its state at the point in the computation corresponding to the beginning of p. This is just what we will need to handle general continuations (as well as the more prosaic task of handling simple returns).

In Figure 4, we define three tools for selecting, extracting and testing structure from frame strings. The function $tr_S$ produces the trace of a frame string with respect to procedure labels in S by throwing away any frame action whose procedure label is not in S. We can likewise take the trace of a frame string with respect to a set of creation times T with $tr_T$. The function $dir_\Delta$ returns the direction of its argument with respect to a set of regular expressions ∆. That is, it returns the subset of ∆ whose members match the argument supplied to $dir_\Delta$.²

² These regular expressions will be matching net frame strings that describe the change in the stack between two points in execution; thus the use of ∆.

$$
\begin{aligned}
tr_S(\epsilon) &= \epsilon \\
tr_S(\langle^{\psi}_{t}| + p) &= \langle^{\psi}_{t}| + tr_S(p) && \text{if } \psi \in S \\
tr_S(|^{\psi}_{t}\rangle + p) &= |^{\psi}_{t}\rangle + tr_S(p) && \text{if } \psi \in S \\
tr_S(\langle^{\psi}_{t}| + p) &= tr_S(p) && \text{if } \psi \notin S \\
tr_S(|^{\psi}_{t}\rangle + p) &= tr_S(p) && \text{if } \psi \notin S \\[4pt]
tr_T(\epsilon) &= \epsilon \\
tr_T(\langle^{\psi}_{t}| + p) &= \langle^{\psi}_{t}| + tr_T(p) && \text{if } t \in T \\
tr_T(|^{\psi}_{t}\rangle + p) &= |^{\psi}_{t}\rangle + tr_T(p) && \text{if } t \in T \\
tr_T(\langle^{\psi}_{t}| + p) &= tr_T(p) && \text{if } t \notin T \\
tr_T(|^{\psi}_{t}\rangle + p) &= tr_T(p) && \text{if } t \notin T \\[4pt]
dir_\Delta(p) &= \{ re \in \Delta : p \in L(re) \}
\end{aligned}
$$

Figure 4. Analytic tools for frame strings

Depending on the analysis or optimization we’re conducting, there are a number of sets which make sense for ∆. For instance, $\Delta_{Ton} = \{ \langle^{\cdot}_{\cdot}|^{*},\ |^{\cdot}_{\cdot}\rangle^{*},\ |^{\cdot}_{\cdot}\rangle^{*}\langle^{\cdot}_{\cdot}|^{*} \}$ extracts the tonicity of a string, that is:

$$
\begin{aligned}
p \text{ is push-monotonic} &\text{ if } \langle^{\cdot}_{\cdot}|^{*} \in dir_{\Delta_{Ton}}(p) \\
p \text{ is pop-monotonic} &\text{ if } |^{\cdot}_{\cdot}\rangle^{*} \in dir_{\Delta_{Ton}}(p) \\
p \text{ is pop/push-bitonic} &\text{ if } |^{\cdot}_{\cdot}\rangle^{*}\langle^{\cdot}_{\cdot}|^{*} \in dir_{\Delta_{Ton}}(p)
\end{aligned}
$$

The nesting property of frame strings entails the following:

Lemma 5.1 (Bitonicity of the Net). The net of any frame-string change between two points in execution is pop/push-bitonic.

6. Concrete frame-string semantics

We are now in a position to define a concrete semantics for our CPS language that provides a formal definition of the meaning of frame strings; that is, we can formally express our stack model in terms of a frame-string semantics.

For the frame-string (FS) semantics, the domains given in Figure 5 are nearly identical to standard environment-based CPS semantics domains. The changes are that closures, Clo, now carry a timestamp marking their creation time, and that machine configurations include a frame-string log. The frame-string log δ for a given configuration is a function that maps some time from the past to a frame string describing all the actions performed since then. We should call attention to the particular way we’ve defined the log: it is relative, not absolute. We could just as easily have defined the log to map a time t to the actions performed by the computation from start to t; the net of this string would tell us what the stack looked like at time t. Instead, the log tells us what has happened between time t and now; the net of this string tells us the net effect of the intervening computation on the stack. As we’ll see later, this focus on change will be key to exploiting the non-standard semantics for optimisation-driven analyses that focus on the relationship between two points in a computation.

The basic semantic domains for the language are given in Figure 5. A machine configuration is either an “eval” or an “apply” state. In an Eval state, control is at a call site; it is given by a call expression, an environment context for that expression, and the current log and time. We represent environments with the factoring taken from Shivers’ CFA work (Shivers 1991): an environment is split into a “variable environment,” ve ∈ VEnv, and a “binding environment,” β ∈ BEnv. A binding environment maps a variable to a time stamp, the time its binding was made. A variable environment records all bindings that have occurred during the execution of the program. Thus it maps a variable and a binding time to its value for that time. In an Apply state, control is moving into a user function or a continuation; it is given by the procedure to apply, a vector of user-world arguments, one of continuation arguments, the global variable environment, and the current log and time.

The set of denotable values, D, is the same as the set of procedures. A member of Proc is a procedure: either a closure or the halt continuation. We represent a closure clo with a λ term plus the contour environment β giving the bindings of its free variables, plus a third component: the birth date of the closure, that is, the time the λ expression was evaluated, producing the closure. A closure (lam, β, t) can represent either a user closure, if lam ∈ ULAM, or a continuation closure, if lam ∈ CLAM. For Time, we assume some ordered, denumerable set, and write $t_0$ for the start time at which program execution begins. We advance time with the tick function; this function may take additional arguments beyond the current time as an aid to the analysis we are trying to capture with our semantics, e.g., tick ∈ Time × State → Time.

$$
\begin{aligned}
\varsigma \in State &= Eval + Apply \\
Eval &= CALL \times BEnv \times Heap \\
Apply &= Proc \times D^{*} \times D^{*} \times Heap \\
h \in Heap &= VEnv \times Log \times Time \\
\beta \in BEnv &= VAR \rightharpoonup Time \\
b \in Bind &= VAR \times Time \\
ve \in VEnv &= Bind \rightharpoonup D \\
proc \in Proc &= Clo + \{halt\} \\
clo \in Clo &= LAM \times BEnv \times Time \\
d, c \in D &= Proc \\
\delta \in Log &= Time \rightharpoonup F
\end{aligned}
$$

Figure 5. Concrete semantics domains

Figure 6 contains the auxiliary functions used in our semantics. The function A takes an argument and returns its value in some context given by ve, β and t: if the expression is a variable, A looks it up in the current environment; if the argument is a λ expression, A uses it to construct a closure. The function $age_\delta$ produces the “life history” of a continuation: it takes the birth-date of the closure, t, and uses it to index the log. The halt continuation is handled by defining its birth as the beginning of time. The function youngest takes a vector of continuations, and returns the shortest such “life history”, that is, the frame string representing the life-span of the youngest continuation in the vector.

The function I maps a program into the machine’s initial state. Final states are apply states where the procedure to be applied is the halt continuation, but that is not important for our non-standard analysis. Instead, we define a collecting semantics with the function V, which maps a program to the entire set of states through which
its execution evolves; we write ς ⇒ ς′ to say that state ς steps to state ς′ under the machine’s small-step transition relation ⇒.

$$
\begin{aligned}
A\ \beta\ ve\ t\ lam &= (lam, \beta, t) \\
A\ \beta\ ve\ t\ v &= ve(v, \beta(v)) \\[4pt]
age_\delta(halt) &= \delta(t_0) \\
age_\delta(lam, \beta, t) &= \delta(t) \\[4pt]
youngest_\delta \langle proc_1, \ldots \rangle &= Shortest\,\{ age_\delta(proc_1), \ldots \} \\[4pt]
I(pr) &= ((pr, [\,], t_0), \langle\rangle, \langle halt \rangle, [\,], [t_0 \mapsto \epsilon], t_0) \\
V(pr) &= \{ \varsigma : I(pr) \Rightarrow^{*} \varsigma \}
\end{aligned}
$$

Figure 6. Auxiliary definitions for FS semantics

The heart of the semantics is given by the two rules of Figure 7 defining the transition relation: one axiom each for Eval and Apply machine configurations. The call rule evaluates the elements of the call, and transitions to an apply state, where the procedure will be applied to the argument values. The apply rule binds the variables of the procedure’s λ expression, then transitions to a call state, where the λ expression’s body will be evaluated in the new environment. What’s of interest in this simple, otherwise standard system is the extra machinery to manage the stack, in the form of the log. Most of the work happens in the call rule, in the calculation of the stack change ∇ς. It is managed just as described in Section 4.

The expression f in the procedure position of the call is evaluated to the value proc. If f is a continuation (f ∈ CEXP), then this call will reset the stack to proc’s stack frame. The function age tells us everything that has happened to the stack since proc was born (that is, since its frame was allocated on the stack). Inverting this frame string provides the series of actions that must be performed on the stack to revert it back to that state. Remember: continuation invocation restores the stack; this is where the restoration happens. In the standard case of a simple return, all of this machinery amounts to a single pop action. But if we were invoking a continuation to “throw” outwards in an exception-like manner, we might return over multiple frames, and thus our ∇ς action might consist of multiple pop actions. More exotic still, if we were invoking a “downwards” continuation, the action could include push actions to restore previously-popped frames. Finally, if the continuation is a “let continuation,” that is, if f is a λ expression that we are invoking at its point of appearance, the frame action is the empty string: the continuation will run in the current stack context.

On the other hand, the form f might be a user expression, rather than a continuation. If so, it won’t evaluate to a stack pointer as a continuation would, and so doesn’t require any action on the part of our stack-management policy. However, user procedures are passed continuations as arguments: these are the $q_j$ arguments in the call form. These expressions evaluate to the continuations $c_j$. If we think of these continuations $c_j$ as stack pointers, we want to reset the stack back to the outermost such pointer, the high-water mark that will preserve all of these continuations. Again, the function youngest computes this for us. It is worth considering, for a moment, how this is done, as it relates to our relative (as opposed to absolute) view of the stack, as well as the relation between our time-like and space-like view of the computation.

The mechanism we are using to track the stack is the log δ, which tells us, for time t, everything that has happened to the stack since t. Now, given a set of continuations or live stack frames, the outermost one (a space criterion) must be the youngest one (a time criterion): the stack is a LIFO mechanism. The function youngest could choose this frame based on its birth-date. However, we plan to abstract this semantics, and our abstraction will destroy the orderedness of time, so this tactic is too fragile for our purposes. Instead, we switch back to space-like criteria. The function youngest equivalently makes its choice by returning the shortest frame string: the frame with the shortest “life story” is clearly the youngest frame.

Consider what happens when a non-tail call is performed. A non-tail call is one in which a continuation argument q is a λ term (as opposed to a variable reference). In this case, evaluating q with A will capture the current time t in the (lam, β, t) tuple. Since this newborn value is as young as it is possible to be, the ∇ς frame-string change will be the empty string. So the call will not first pop the current frame off the stack, as a tail-call would.

In contrast, a tail call is one where all the q are variable references. Evaluating these variables with A will produce older continuations that were born at previous times. This will cause the $(youngest_\delta\ c)^{-1}$ expression to produce a frame string whose operations will specify some stack adjustment, in the form of $|^{\psi}_{t'}\rangle$ pop characters. Thus we will pop frames off the stack as we perform the call: this is a tail call.

Once we’ve computed the stack change needed, we update the log so that any future fetch from it will produce an answer with this new segment of actions appended.

The log maintenance for the apply rule is much simpler. When a procedure is applied, we push a frame for its arguments: $\langle^{\psi}_{t'}|$.

The net effect of this stack-maintenance machinery is to obey our informally defined protocol for functional languages with proper tail calls and even full continuations. A simple call pushes a frame; a simple return pops a frame. A tail call first pops a frame, then pushes one. Exotic uses of continuations do what is needed to be consistent with the contract. Once again, it’s worth emphasizing that these two rules give us a mechanism that enormously generalises “function call,” allowing us to handle every form of control that occurs in a program, from basic-block sequencing to coroutines.

7. Context-sensitive abstract frame strings

Given that our goals for ∆CFA differ from previous uses, our requirements on abstract frame strings also differ. In this section, we develop a novel, more precise abstraction for frame strings that better preserves their group-theoretic structure. In addition, this new abstraction is factored so as to allow full exploitation of ΓCFA’s abstract garbage collection and abstract counting.

For lightweight continuations and escape analysis, we require:
1. $\hat{F}$, a set of abstract frame strings;
2. $|\cdot| : F \to \hat{F}$, an abstraction operation for frame strings;
3. $\otimes : \hat{F} \times \hat{F} \to \hat{F}$, an operator for “concatenating” abstract frame strings; and
4. $\cdot^{-1}$, an abstract “inverse” operation.

In particular, note that we do not require the specialized abstract comparison relation from Might and Shivers’ work anymore.

To pack an infinite set of frame strings into a finite set $\hat{F}$, we have to choose where to lose precision. Might and Shivers’ original formulation of ∆CFA used a direct analog of Harrison’s abstract procedure strings, which completely discarded the time component. That is, Might and Shivers’ abstract frame strings are a mapping from procedure labels to a set of regular expressions describing net change for that procedure, i.e., Ψ → P(∆). While sound and computationally efficient, this abstraction lost most of the advantages of switching to a group. Our new abstraction seeks to recover as much of the group-theoretic structure as possible in abstract frame strings, in part, by integrating with a generalization of ΓCFA’s abstract counting machinery.
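The group structure that the abstraction aims to retain can be made concrete with a small sketch of the Section 5.1 operations on concrete frame strings. The tuple encoding of frame actions and the names `flip`, `net` and `inverse` are assumptions made for illustration; they are not taken from the paper or from any implementation of ∆CFA.

```python
from typing import List, Tuple

# Hypothetical encoding of a frame action: (action, label, time),
# where action is "push" (a bra) or "pop" (a ket).
Frame = Tuple[str, str, int]
FrameString = List[Frame]


def flip(f: Frame) -> Frame:
    """Return the anti-frame action: push becomes pop and vice versa."""
    action, label, time = f
    return ("pop" if action == "push" else "push", label, time)


def net(p: FrameString) -> FrameString:
    """Cancel adjacent opposing actions on the same frame until none remain."""
    out: FrameString = []
    for f in p:
        if out and out[-1] == flip(f):
            out.pop()          # the pair annihilates
        else:
            out.append(f)
    return out


def inverse(p: FrameString) -> FrameString:
    """Reverse p, flip every action, then take the net."""
    return net([flip(f) for f in reversed(p)])


# The example from Section 5.1: push a@1, pop a@1, push b@2.
p = [("push", "a", 1), ("pop", "a", 1), ("push", "b", 2)]
print(net(p))                 # only the push of b@2 survives
print(net(p + inverse(p)))    # p followed by its inverse nets to the empty string
```

Because cancellation is applied in both orders (push then pop, and pop then push), concatenating any string with its inverse nets to the empty string, which is the group property the text relies on.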

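$$
([\![(f\ \mathbf{e}\ \mathbf{q})_{\kappa}]\!], \beta, ve, \delta, t) \Rightarrow (proc, \mathbf{d}, \mathbf{c}, ve, \delta', t)
$$

where

$$
\begin{aligned}
proc &= \mathcal{A}\ \beta\ ve\ t\ f \\
d_i &= \mathcal{A}\ \beta\ ve\ t\ e_i \\
c_j &= \mathcal{A}\ \beta\ ve\ t\ q_j \\
\nabla\varsigma &= \begin{cases} (age_{\delta}\ proc)^{-1} & f \in CEXP \\ (youngest_{\delta}\ \mathbf{c})^{-1} & \text{otherwise} \end{cases} \\
\delta' &= \delta + (\lambda t.\nabla\varsigma)
\end{aligned}
$$

$$
\frac{length(\mathbf{d}) = length(\mathbf{u}) \qquad length(\mathbf{c}) = length(\mathbf{k})}
     {(([\![(\lambda_{\psi}\ (\mathbf{u}\ \mathbf{k})\ call)]\!], \beta, t_b), \mathbf{d}, \mathbf{c}, ve, \delta, t) \Rightarrow (call, \beta', ve', \delta', t')}
$$

where

$$
\begin{aligned}
t' &= tick(t) \\
\beta' &= \beta[u_i \mapsto t', k_j \mapsto t'] \\
ve' &= ve[(u_i, t') \mapsto d_i, (k_j, t') \mapsto c_j] \\
\nabla\varsigma &= \langle^{\psi}_{t'}| \\
\delta' &= (\delta + (\lambda t.\nabla\varsigma))[t' \mapsto \epsilon]
\end{aligned}
$$

Figure 7. The transition relation $\varsigma \Rightarrow \varsigma'$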
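The log updates in Figure 7 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the tuple encoding of frame actions, the helper names (`net`, `inverse`, `extend`, `apply_procedure`, `invoke_continuation`), and the choice to store each log entry already reduced to its net form are ours, not the paper's.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]    # ("push" | "pop", procedure label, frame time)
FrameString = List[Action]
Log = Dict[int, FrameString]     # time t -> net stack change since t

FLIP = {"push": "pop", "pop": "push"}


def net(p: FrameString) -> FrameString:
    """Cancel adjacent push/pop pairs that act on the same frame."""
    out: FrameString = []
    for kind, psi, t in p:
        if out and out[-1] == (FLIP[kind], psi, t):
            out.pop()                      # the pair cancels, in either order
        else:
            out.append((kind, psi, t))
    return out


def inverse(p: FrameString) -> FrameString:
    """Group inverse: reverse the string and swap pushes with pops."""
    return [(FLIP[kind], psi, t) for kind, psi, t in reversed(p)]


def extend(log: Log, change: FrameString) -> Log:
    """delta + (lambda t. change): append the same change to every entry."""
    return {t: net(p + change) for t, p in log.items()}


def apply_procedure(log: Log, psi: str, t_new: int) -> Log:
    """Apply-state rule: push frame (psi, t_new), then open the entry for t_new."""
    new_log = extend(log, [("push", psi, t_new)])
    new_log[t_new] = []                    # nothing has happened since t_new
    return new_log


def invoke_continuation(log: Log, t_birth: int) -> Log:
    """Eval-state rule for a continuation born at t_birth: undo its age."""
    return extend(log, inverse(log[t_birth]))
```

For instance, entering `f` at time 1 and `g` at time 2, then invoking a continuation born at time 1, leaves the entry for time 1 empty (the stack is back where it was) and leaves a pop in the entry for time 2 (the frame of `g` is gone).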

We abstract a frame string $p$ to a function mapping a procedure’s label $\psi$ and an abstract time $\hat{t}$ to a description of the net stack motion in the string $p$ for just that procedure’s frames created at times that map to abstract time $\hat{t}$. Thus our set of abstract frame strings is

$$\hat{F} = \Psi \to \widehat{Time} \to \mathcal{P}(\Delta),$$

where the set $\Delta$ is a set of regular expressions describing the net motion for a given procedure; here, we use

$$\Delta = \{\epsilon,\ \langle^{\cdot}_{\cdot}|,\ |^{\cdot}_{\cdot}\rangle,\ \langle^{\cdot}_{\cdot}|\langle^{\cdot}_{\cdot}|^{+},\ |^{\cdot}_{\cdot}\rangle|^{\cdot}_{\cdot}\rangle^{+},\ |^{\cdot}_{\cdot}\rangle^{+}\langle^{\cdot}_{\cdot}|^{+}\}.$$

Note that there is no regular expression in the set $\Delta$ for the pattern $\langle^{\cdot}_{\cdot}|^{+}|^{\cdot}_{\cdot}\rangle^{+}$, or any other exotic combination for that matter. By Lemma 5.1, any frame string generated by the concrete semantics is covered by the pattern set $\Delta$, even if we allow for full user continuations.

It might seem that allowing an abstract string to return sets of regular expressions is unnecessary, as the abstract string $|p|$ for any concrete frame string will always match only one member of $\Delta$ for each procedure. However, we require sets when concatenating two abstract frame strings, which degrades precision.

We define our abstraction operator with

$$|p| = \lambda\psi.\lambda\hat{t}.\ dir_{\Delta}(tr_{\{\psi\}}(tr_{\hat{t}}\lfloor p\rfloor)).$$

For brevity, we use the notation $|\epsilon|$ to mean $(\lambda\psi.\lambda\hat{t}.\{\epsilon\})$; and we use the notation $\langle^{\psi}_{\hat{t}}|$ in place of $|\epsilon|[\psi \mapsto (\lambda\_.\{\epsilon\})[\hat{t} \mapsto \{\langle^{\cdot}_{\cdot}|\}]]$.

We define $\cdot^{-1}$ to be the most precise operator satisfying

$$|p| \sqsubseteq \hat{p} \implies |p^{-1}| \sqsubseteq \hat{p}^{-1},$$

which is:

$$\hat{p}^{-1} = \lambda\psi.\lambda\hat{t}.\ map \left\{ \begin{array}{l} \epsilon \mapsto \epsilon,\quad \langle^{\cdot}_{\cdot}| \leftrightarrow |^{\cdot}_{\cdot}\rangle, \\ \langle^{\cdot}_{\cdot}|\langle^{\cdot}_{\cdot}|^{+} \leftrightarrow |^{\cdot}_{\cdot}\rangle|^{\cdot}_{\cdot}\rangle^{+}, \\ |^{\cdot}_{\cdot}\rangle^{+}\langle^{\cdot}_{\cdot}|^{+} \leftrightarrow |^{\cdot}_{\cdot}\rangle^{+}\langle^{\cdot}_{\cdot}|^{+} \end{array} \right\} (\hat{p}(\psi)(\hat{t})).$$

Unlike Might and Shivers’ or Harrison’s abstract concatenation operator, our abstract concatenation operator $\otimes_{\hat{\mu}} : \hat{F} \times \hat{F} \to \hat{F}$ is parameterized by an abstract-to-concrete cardinality counter $\hat{\mu}$. We delay the definition of abstract concatenation until we have defined this counter.

8. Extended ∆CFA

As an abstract interpretation over a flat state-space, there are three components to the analysis at the top level:

1. $\widehat{State}$, a finite set of abstract states.
2. $\hat{\mathcal{I}} : PR \to \widehat{State}$, an injection from programs to initial states.
3. $\approx> \subset \widehat{State} \times \widehat{State}$, a transition relation.

The result of ∆CFA for a program $pr$ is the set of visited states, $\hat{\mathcal{V}}(pr)$:

$$\hat{\mathcal{V}}(pr) = \{\hat{\varsigma} : \hat{\mathcal{I}}(pr) \approx>^{*} \hat{\varsigma}\}.$$

$$
\begin{aligned}
\hat{\varsigma} \in \widehat{State} &= \widehat{Eval} + \widehat{Apply} \\
\widehat{Eval} &= CALL \times \widehat{BEnv} \times \widehat{Heap} \\
\widehat{Apply} &= \widehat{Proc} \times \hat{D}^{*} \times \hat{D}^{*} \times \widehat{Heap} \\
\hat{h} \in \widehat{Heap} &= \widehat{VEnv} \times \widehat{Log} \times \widehat{Count} \times \widehat{Time} \\
\hat{\beta} \in \widehat{BEnv} &= VAR \to \widehat{Time} \\
\hat{b} \in \widehat{Bind} &= VAR \times \widehat{Time} \\
\widehat{ve} \in \widehat{VEnv} &= \widehat{Bind} \to \hat{D} \\
\hat{c}, \hat{d} \in \hat{D} &= \mathcal{P}(\widehat{Val}) \\
\widehat{val} \in \widehat{Val} &= \widehat{Proc} \\
\widehat{proc} \in \widehat{Proc} &= \widehat{Clo} + \{halt\} \\
\widehat{clo} \in \widehat{Clo} &= LAM \times \widehat{BEnv} \times \widehat{Time} \\
\hat{\phi} \in \widehat{Frame} &= \Psi \times \widehat{Time} \\
\hat{\delta} \in \widehat{Log} &= \widehat{Time} \to \hat{F} \\
\hat{\mu} \in \widehat{Count} &= (\widehat{Bind} + \widehat{Frame}) \to \{0, 1, \infty\} \\
\hat{t} \in \widehat{Time} &= \text{a finite set of abstract times}
\end{aligned}
$$

Figure 8. ∆CFA domains

Figure 8 defines the finite state-space $\widehat{State}$ and its component domains. As in most non-standard abstract semantics, these domains closely mimic their concrete counterparts. The notable departures are $\widehat{Time}$, which is now a finite set, and $\hat{D}$, which is now the power set of abstract procedures. To aid presentation, we use the symbol $\hat{d}$ to denote user-world values and the symbol $\hat{c}$ for continuation-world values.

These domains also include a ΓCFA-style (Might and Shivers 2006b) counter component, $\hat{\mu}$, which maps an abstract resource to the number of concrete resources it represents (zero, one or more than one). This information allows us to prove concrete objects equal to one another by knowing only their abstract counterparts: suppose two sets A and B are equal; if each set has size one, then we also know that for any values a ∈ A and b ∈ B, that a = b holds.

The program-to-state injector $\mathcal{I}$ abstracts to

$$\hat{\mathcal{I}}(pr) = ((pr, [], \hat{t}_0), \langle\rangle, \langle\{halt\}\rangle, [], [\hat{t}_0 \mapsto |\epsilon|], (\lambda\_.0), \hat{t}_0).$$

Figure 9 defines the abstract state-to-state transition relation for ∆CFA. The auxiliary next-time allocator $\widehat{tick} : \widehat{Time} \to \widehat{Time}$ can be any function obeying the following soundness constraint:

$$|t| \sqsubseteq \hat{t} \implies |tick(t)| \sqsubseteq \widehat{tick}(\hat{t}).$$

The argument evaluator $\hat{\mathcal{A}}$ abstracts directly:

$$\hat{\mathcal{A}}\ \hat{\beta}\ \widehat{ve}\ \hat{t}\ f = \begin{cases} \{(f, \hat{\beta}, \hat{t})\} & f \in LAM \\ \widehat{ve}(f, \hat{\beta}(f)) & f \in VAR. \end{cases}$$

The function $\widehat{youngest}$ simply joins over all continuation arguments:

$$\widehat{youngest}_{\hat{\delta}}\ \langle\hat{c}_1, \ldots, \hat{c}_n\rangle = \widehat{age}_{\hat{\delta}}(\hat{c}_1) \sqcup \cdots \sqcup \widehat{age}_{\hat{\delta}}(\hat{c}_n),$$

where the function $\widehat{age}$ converts an abstract value into an abstract frame string capturing relative, net stack motion since that value’s creation:

$$
\begin{aligned}
\widehat{age}_{\hat{\delta}}\ \{\widehat{proc}_1, \ldots, \widehat{proc}_n\} &= \widehat{age}^{*}_{\hat{\delta}}(\widehat{proc}_1) \sqcup \cdots \sqcup \widehat{age}^{*}_{\hat{\delta}}(\widehat{proc}_n) \\
\widehat{age}^{*}_{\hat{\delta}}\ (halt) &= \hat{\delta}\ \hat{t}_0 \\
\widehat{age}^{*}_{\hat{\delta}}\ (lam, \hat{\beta}, \hat{t}) &= \hat{\delta}\ \hat{t}.
\end{aligned}
$$

8.1 Counting-based abstract concatenation

Our novel abstract frame-string concatenation operator $\otimes_{\hat{\mu}} : \hat{F} \times \hat{F} \to \hat{F}$ now takes advantage of counting information to recover more group-theoretic properties. In the concrete, a pop followed by its push cancels out: $\lfloor\,|^{\psi}_{t}\rangle + \langle^{\psi}_{t}|\,\rfloor = \epsilon$. However, when time information is lost (as in earlier models of abstract frame strings), it is no longer the case that $|\,|^{\psi}_{t}\rangle\,| \otimes |\,\langle^{\psi}_{t}|\,| = |\epsilon|$, since it’s unknown whether or not the push and the pop represent the same frame. With counting, the analysis tracks the number of concrete frames to which an abstract frame corresponds. When an abstract frame has only one concrete counterpart, then a pop followed by a push for that frame does cancel out in the abstract, because a count of one guarantees that they’re referring to the same frame.

Thus, abstract concatenation changes depending on the count:

$$\hat{p} \otimes_{\hat{\mu}} \hat{q} = \lambda\psi.\lambda\hat{t}.\ \{a \in cat_{\hat{\mu}(\psi,\hat{t})}(a_1, a_2) : a_1 \in \hat{p}(\psi)(\hat{t}) \text{ and } a_2 \in \hat{q}(\psi)(\hat{t})\},$$

where the function $cat_{\infty} : \Delta \times \Delta \to \mathcal{P}(\Delta)$ is the function $cat$ defined in Might and Shivers (Might and Shivers 2006a). (Readers familiar with Harrison’s work (Harrison 1989) are cautioned that this operation $cat_{\infty}$ behaves slightly differently than Harrison’s cat, as abstract frame strings are an abstraction of a group, while Harrison’s abstract procedure strings are an abstraction of a monoid.) The function $cat_0$ is $\lambda(a_1, a_2).\{\epsilon\}$. Lastly, the function $cat_1 = cat_{\infty}[(|^{\cdot}_{\cdot}\rangle, \langle^{\cdot}_{\cdot}|) \mapsto \{\epsilon\}]$.

The function $cat_1$’s ability to cancel a pop-then-push pair is important when full continuations are used, as this is where such behavior arises. In $cat_{\infty}$, this input would have yielded the disastrously imprecise $\{\epsilon,\ |^{\cdot}_{\cdot}\rangle^{+}\langle^{\cdot}_{\cdot}|^{+}\}$.

8.2 Integrating ΓCFA: Abstract garbage collection

Left unattended, abstract counts quickly hit ∞. The abstract garbage collection component of ΓCFA intermittently and opportunistically resets counts back to zero, and incidentally, it also prevents zombies from polluting flow sets during the analysis. We extend ΓCFA to garbage collect dead frames in addition to dead bindings.

Step one in this process is to define the notion of touchability, or which bindings are touched by a value. The “touched-by” function $\hat{\mathcal{T}} : (\widehat{Proc} \cup \hat{D}) \to \mathcal{P}(\widehat{Bind})$ is:

$$
\begin{aligned}
\hat{\mathcal{T}}(lam, \hat{\beta}, \hat{t}) &= \{(v, \hat{\beta}(v)) : v \in free(lam)\} \\
\hat{\mathcal{T}}(halt) &= \{\} \\
\hat{\mathcal{T}}\{\widehat{val}_1, \ldots, \widehat{val}_n\} &= \hat{\mathcal{T}}(\widehat{val}_1) \cup \cdots \cup \hat{\mathcal{T}}(\widehat{val}_n).
\end{aligned}
$$

This touching function extends naturally to states to yield the root set of a garbage collection:

$$
\begin{aligned}
\hat{\mathcal{T}}(call, \hat{\beta}, \widehat{ve}, \hat{t}) &= \{(v, \hat{\beta}(v)) : v \in free(call)\} \\
\hat{\mathcal{T}}(\widehat{proc}, \hat{\mathbf{d}}, \hat{\mathbf{c}}, \widehat{ve}, \hat{t}) &= \hat{\mathcal{T}}(\widehat{proc}) \cup \bigcup_i \hat{\mathcal{T}}(\hat{c}_i) \cup \bigcup_i \hat{\mathcal{T}}(\hat{d}_i).
\end{aligned}
$$

To leap over the values between bindings in the abstract heap (since bindings are the resource of interest), we use the binding-to-binding relation $\rightsquigarrow_{\widehat{ve}}$:

$$\hat{b}_{hand} \rightsquigarrow_{\widehat{ve}} \hat{b}_{apple} \text{ iff } \hat{b}_{apple} \in \hat{\mathcal{T}}(\widehat{ve}(\hat{b}_{hand})).$$

From this, we get the set of bindings reachable from a state through the function $\hat{\mathcal{R}} : \widehat{State} \to \mathcal{P}(\widehat{Bind})$:

$$\hat{\mathcal{R}}(\hat{\varsigma}) = \{\hat{b} : \hat{b}_{root} \in \hat{\mathcal{T}}(\hat{\varsigma}) \text{ and } \hat{b}_{root} \rightsquigarrow^{*}_{\widehat{ve}_{\hat{\varsigma}}} \hat{b}\}.$$

In a non-frame-string-based abstract interpretation, this is where abstract garbage collection normally stops. For ∆CFA, however, we also want to garbage collect dead frames. The birthdates function $\hat{\chi}_{\widehat{ve}} : \mathcal{P}(\widehat{Bind}) \to \mathcal{P}(\widehat{Time})$ returns the set of abstract birthdates reachable from a set of bindings $\hat{B}$ in the heap $\widehat{ve}$:

$$\hat{\chi}_{\widehat{ve}}(\hat{B}) = \{\hat{t} : (lam, \hat{\beta}, \hat{t}) \in \widehat{ve}(\hat{b}) \text{ and } \hat{b} \in \hat{B}\}.$$

The function $\hat{\mathcal{F}}_{\hat{\delta}} : \mathcal{P}(\widehat{Time}) \to \mathcal{P}(\widehat{Frame})$ finds all of the frames reachable in the log $\hat{\delta}$ from a set of birthdates:

$$\hat{\mathcal{F}}_{\hat{\delta}}(\hat{T}) = \{(\psi, \hat{t}) : \hat{t}_{birthdate} \in \hat{T} \text{ and } \hat{\delta}(\hat{t}_{birthdate})(\psi)(\hat{t}) \neq \{\epsilon\}\}.$$

In other words, a frame $(\psi, \hat{t})$ is reachable if there’s a birthdate for which the relative motion may include this frame.

The garbage collection function $\hat{\Gamma} : \widehat{State} \to \widehat{State}$ transforms a state $\hat{\varsigma}$ into its garbage-collected counterpart:

$$\hat{\Gamma}(\hat{\varsigma} = (\ldots, \hat{\delta}, \hat{\mu}, \hat{t})) = (\ldots, \hat{\delta}|\hat{T}, \hat{\mu}|\hat{S}, \hat{t}),$$

where:

$$
\begin{aligned}
\hat{B} &= \hat{\mathcal{R}}(\hat{\varsigma}) && \text{(reachable bindings)} \\
\hat{T} &= \hat{\chi}_{\widehat{ve}}(\hat{B}) && \text{(reachable birthdates)} \\
\hat{\Phi} &= \hat{\mathcal{F}}_{\hat{\delta}}(\hat{T}) && \text{(reachable frames)} \\
\hat{S} &= \hat{B} \cup \hat{\Phi}.
\end{aligned}
$$

As in Might and Shivers’ work on ΓCFA (Might and Shivers 2006b), this yields the collecting abstract transition relation $\approx>_{\hat{\Gamma}}$:

$$\frac{\hat{\Gamma}(\hat{\varsigma}) \approx> \hat{\varsigma}'}{\hat{\varsigma} \approx>_{\hat{\Gamma}} \hat{\varsigma}'}$$

This transition can be used whenever a collection is deemed appropriate. Usually, this is only when a loss of precision would be imminent otherwise.

8.3 Correctness

Proving the soundness of this enhanced ∆CFA is nearly identical to the proofs of correctness for ∆CFA and ΓCFA (Might and Shivers 2006a,b). The novel parts to the proof require an abstraction map $|\cdot|_{\mu} : VEnv \times Log \to \widehat{Count}$ that produces the least imprecise abstract counter $\hat{\mu}$:

$$
\begin{aligned}
|(ve, \delta)|_{\mu}\ \hat{b} &= \widehat{size}\,\{b \in dom(ve) : |b| = \hat{b}\} \\
|(ve, \delta)|_{\mu}\ (\psi, \hat{t}) &= \widehat{size}\,\{(\psi, t) : |t| = \hat{t} \text{ and } p \in range(\delta) \text{ and } (tr_{\{\psi\}} \circ tr_{t})\lfloor p\rfloor \neq \epsilon\},
\end{aligned}
$$

where

$$\widehat{size}(S) = \text{if } size(S) \in \{0, 1\} \text{ then } size(S) \text{ else } \infty.$$
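The count-indexed concatenation of Section 8.1 can be illustrated with a small Python sketch. Several things here are assumptions made for illustration only: the ASCII names for the members of ∆, the dictionary encoding of abstract frame strings (missing keys stand for {ǫ}), and above all `partial_cat_inf`, a deliberately incomplete stand-in for cat∞ that covers only the identity cases and the single pop-then-push entry quoted in the text. The full table is defined in Might and Shivers (2006a) and is not reproduced here.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# ASCII stand-ins for the six patterns in Delta (names chosen for this sketch).
EPS, PUSH, POP = "eps", "push", "pop"
PUSHES, POPS, POPS_PUSHES = "push push+", "pop pop+", "pop+ push+"

INF = float("inf")
Frame = Tuple[str, str]                       # (psi, abstract time)
Cat = Callable[[str, str], FrozenSet[str]]


def partial_cat_inf(a1: str, a2: str) -> FrozenSet[str]:
    """Hypothetical, partial cat_inf: only the cases this sketch needs."""
    if a1 == EPS:
        return frozenset({a2})
    if a2 == EPS:
        return frozenset({a1})
    if (a1, a2) == (POP, PUSH):
        return frozenset({EPS, POPS_PUSHES})  # the imprecise answer from the text
    raise NotImplementedError("see Might and Shivers 2006a for the full table")


def cat_for_count(count: float, cat_inf: Cat) -> Cat:
    """Pick cat_0, cat_1 or cat_inf from an abstract count in {0, 1, inf}."""
    if count == 0:
        return lambda a1, a2: frozenset({EPS})
    if count == 1:
        def cat_one(a1: str, a2: str) -> FrozenSet[str]:
            if (a1, a2) == (POP, PUSH):       # same frame, so the pair cancels
                return frozenset({EPS})
            return cat_inf(a1, a2)
        return cat_one
    return cat_inf


def concat(p: Dict[Frame, set], q: Dict[Frame, set],
           mu: Dict[Frame, float], cat_inf: Cat) -> Dict[Frame, FrozenSet[str]]:
    """Abstract concatenation of p and q, parameterized by the counter mu."""
    result = {}
    for frame in set(p) | set(q):
        cat = cat_for_count(mu[frame], cat_inf)
        lefts, rights = p.get(frame, {EPS}), q.get(frame, {EPS})
        result[frame] = frozenset(
            a for a1 in lefts for a2 in rights for a in cat(a1, a2))
    return result
```

With a count of one for the frame, concatenating a pop with a push yields exactly {ǫ}; with a count of ∞ the same inputs fall through to cat∞ and keep the extra, imprecise alternative.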
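$$
([\![(f\ \mathbf{e}\ \mathbf{q})_{\kappa}]\!], \hat{\beta}, \widehat{ve}, \hat{\delta}, \hat{\mu}, \hat{t}) \approx> (\widehat{proc}, \hat{\mathbf{d}}, \hat{\mathbf{c}}, \widehat{ve}, \hat{\delta}', \hat{\mu}, \hat{t})
$$

where

$$
\begin{aligned}
\widehat{proc} &\in \hat{\mathcal{A}}\ \hat{\beta}\ \widehat{ve}\ \hat{t}\ f \\
\hat{d}_i &= \hat{\mathcal{A}}\ \hat{\beta}\ \widehat{ve}\ \hat{t}\ e_i \\
\hat{c}_j &= \hat{\mathcal{A}}\ \hat{\beta}\ \widehat{ve}\ \hat{t}\ q_j \\
\nabla\hat{\varsigma} &= \begin{cases} (\widehat{age}_{\hat{\delta}}\ \{\widehat{proc}\})^{-1} & f \in CEXP \\ (\widehat{youngest}_{\hat{\delta}}\ \hat{\mathbf{c}})^{-1} & \text{otherwise} \end{cases} \\
\hat{\delta}' &= \hat{\delta} \otimes_{\hat{\mu}} (\lambda\hat{t}.\nabla\hat{\varsigma})
\end{aligned}
$$

$$
\frac{length(\hat{\mathbf{d}}) = length(\mathbf{u}) \qquad length(\hat{\mathbf{c}}) = length(\mathbf{k})}
     {(([\![(\lambda_{\psi}\ (\mathbf{u}\ \mathbf{k})\ call)]\!], \hat{\beta}, \hat{t}_b), \hat{\mathbf{d}}, \hat{\mathbf{c}}, \widehat{ve}, \hat{\delta}, \hat{\mu}, \hat{t}) \approx> (call, \hat{\beta}', \widehat{ve}', \hat{\delta}', \hat{\mu}', \hat{t}')}
$$

where

$$
\begin{aligned}
\hat{t}' &= \widehat{tick}(\hat{t}) \\
\hat{\beta}' &= \hat{\beta}[u_i \mapsto \hat{t}', k_j \mapsto \hat{t}'] \\
\widehat{ve}' &= \widehat{ve} \sqcup [(u_i, \hat{t}') \mapsto \hat{d}_i, (k_j, \hat{t}') \mapsto \hat{c}_j] \\
\nabla\hat{\varsigma} &= \langle^{\psi}_{\hat{t}'}| \\
\hat{\delta}' &= (\hat{\delta} \otimes_{\hat{\mu}'} (\lambda\hat{t}.\nabla\hat{\varsigma})) \sqcup [\hat{t}' \mapsto |\epsilon|] \\
\hat{\mu}' &= \hat{\mu} \oplus (\lambda\_.0)[(u_i, \hat{t}') \mapsto 1, (k_j, \hat{t}') \mapsto 1, (\psi, \hat{t}') \mapsto 1]
\end{aligned}
$$

Figure 9. The abstract transition relation $\hat{\varsigma} \approx> \hat{\varsigma}'$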
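The reachability computation behind the abstract garbage collector of Section 8.2 (touched bindings, reachable bindings, reachable birthdates, reachable frames) can be sketched as follows. The data encoding is an assumption made for this sketch: a closure is a tuple `(free_vars, benv, birth)`, flow sets are lists, the log maps a birthdate to a dictionary from frames to pattern sets with missing frames standing for {ǫ}, and all function names are hypothetical.

```python
def touched(value):
    """Bindings touched by an abstract closure (free_vars, benv, birth) or 'halt'."""
    if value == "halt":
        return set()
    free_vars, benv, _birth = value
    return {(v, benv[v]) for v in free_vars}


def reachable_bindings(roots, venv):
    """Transitive closure of the binding-to-binding relation from the root set."""
    seen, work = set(), list(roots)
    while work:
        binding = work.pop()
        if binding in seen:
            continue
        seen.add(binding)
        for value in venv.get(binding, ()):
            work.extend(touched(value))
    return seen


def birthdates(bindings, venv):
    """Abstract birthdates of every closure stored at a reachable binding."""
    return {value[2]
            for b in bindings
            for value in venv.get(b, ())
            if value != "halt"}


def reachable_frames(times, log):
    """Frames whose recorded motion since some reachable birthdate is not {eps}."""
    return {frame
            for t_birth in times if t_birth in log
            for frame, patterns in log[t_birth].items()
            if patterns != {"eps"}}


def collect(roots, venv, log, mu):
    """Restrict the log to reachable birthdates and the counter to live resources."""
    bindings = reachable_bindings(roots, venv)
    times = birthdates(bindings, venv)
    keep = bindings | reachable_frames(times, log)
    new_log = {t: entry for t, entry in log.items() if t in times}
    new_mu = {resource: n for resource, n in mu.items() if resource in keep}
    return new_log, new_mu
```

Resources dropped from the counter are treated as having count zero, which is what lets a later allocation at the same abstract binding or frame start again from a count of one.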

9. Applications

We present escape analysis, lightweight continuations and verifying stack safety as motivating applications for this extended ∆CFA.

9.1 Generalized escape analysis

Escape analysis consists of determining whether or not an object outlives the frame in which it was born. If an object’s last use occurs while the frame in which it was born is still live on the stack, then the object may be stack-allocated instead of heap-allocated. This effectively makes the object statically garbage-collected.

In object-oriented and functional programming, this is especially important, because it directly addresses a common idiom: using objects and closures as carriers of data. For instance, in the following code:

    (map (λ (x) (+ x 1)) totals)

it is clear (to the programmer) that this λ term (λ (x) (+ x 1)) does not outlive its frame of creation. Our enhanced ∆CFA makes this fact and others like it equally clear to the compiler.

Detecting non-escaping objects through ∆CFA. To determine whether the concrete counterparts to a heap-allocated abstract object $\widehat{val}$ can escape, find the set of abstract states $\hat{\Sigma}$ that make use of this object. A use in this case constitutes applying the object as a function, or supplying it as a strict argument to a primitive operation. A use does not constitute merely passing this value as an argument. This distinction becomes important for languages allowing full use of continuations. It is entirely possible for an object to outlive its frame of creation, only to have a continuation restore that frame before the object is used once again. (This kind of behavior is commonplace when continuations are used to model coroutines.)

In each state $\hat{\varsigma} \in \hat{\Sigma}$ where the object is used, index into the state’s log $\hat{\delta}_{\hat{\varsigma}}$ with the object’s birthdate $\hat{t}_b$ to obtain an approximation of stack change since the object’s birth: $\hat{\delta}_{\hat{\varsigma}}(\hat{t}_b)$. If any entry in this abstract frame string contains a pop, then this object may have escaped its frame of creation. On the other hand, the absence of pops guarantees that it does not escape, and hence, that the object is stack-allocatable.

It’s worth noting that one of the driving factors in generalizing to context-sensitive frame strings was to boost the precision of escape analysis in the presence of coroutines. Might and Shivers’ abstraction for frame strings was simply too weak to perform any escape analysis in the presence of coroutines. That is, their analysis is sound, but the results are too imprecise to be useful. (However, escape analysis was not their stated goal.)

9.2 Lightweight continuations

Even with clever run-time support, full continuations are an expensive control construct for stack-based languages. In order to be safe, an implementation must ensure that it can restore the entire program stack from a continuation’s point of creation whenever the continuation is invoked (which could happen multiple times). Often, this means that the entire program stack is copied into the heap upon continuation creation, and copied back over the stack upon continuation invocation. (Stackless implementations such as SML/NJ (Appel 1992) go the other direction: they make continuations cheap and everything else expensive.)

Whenever possible, we would like to make continuations lightweight. A lightweight continuation is represented simply by a pointer into the run-time stack; such a continuation can be created and invoked very cheaply. Given the rules above, a continuation which does not escape is eligible for a lightweight representation.

Note that ∆CFA can also detect partially lightweight continuations. A partially lightweight continuation is one which is sometimes invoked while its frame of creation is still live and sometimes while it is not. Partially lightweight continuations must still save the entire stack; however, for contexts of invocation where the frame of creation is still live, the implementation can use a stack-pointer reset rather than a stack-copy operation.

Once again, this optimization directly addresses the common idioms involving full continuations: early return, back-tracking search and emulation of exceptions. In all three of these idioms, the continuations are guaranteed to be lightweight. Our enhanced ∆CFA makes detecting this fact possible.

9.3 Proving the stack safety of setjmp and longjmp

When translating C into partitioned continuation-passing style, setjmp and longjmp admit an explicit representation:

    setjmp ≡ (λ (env k)
               (set-cell! env (λ (val k’) (k val))
                 (λ () (k 0))))

    longjmp ≡ (λ (env val k)
                (get-cell env
                  (λ (reset)
                    (reset val k))))

The contract for setjmp and longjmp (that longjmp be called while the frame from its setjmp is still live) is impossible to capture with completeness in a type system. However, once converted to CPS, proving the safety of setjmp and longjmp reduces to checking whether the continuation captured by setjmp (cc) is lightweight.

    Name                 % Lightwt.   % Non-escape
    fact-tail            100%         100%
    fact-y-combinator    60%          29%
    loops                100%         100%
    closure-loops        33%          0%
    foldl                100%         100%
    foldr                0%           0%
    map                  100%         100%
    coroutine            86%          46%
    sat-solve-amb        64%          18%

Table 1. Results.
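The pop test from Section 9.1, which also decides whether a continuation such as the one captured by setjmp is lightweight, can be sketched as below. The encoding is assumed for illustration: an abstract frame string is a dictionary from frames to sets of ASCII pattern names, each using state contributes its own log, and the label "heavyweight" is our own name for a continuation that is never known to be invoked with its frame still live.

```python
# Members of Delta that contain at least one pop (ASCII names are assumptions).
POP_PATTERNS = {"pop", "pop pop+", "pop+ push+"}


def may_have_escaped(abstract_frame_string):
    """True if any entry of the abstract frame string may contain a pop."""
    return any(patterns & POP_PATTERNS
               for patterns in abstract_frame_string.values())


def classify(use_logs, birth):
    """Classify an object from the logs of the abstract states that use it."""
    escaped = [may_have_escaped(log[birth]) for log in use_logs]
    if not any(escaped):
        return "lightweight"            # no use sees a pop since the birthdate
    if all(escaped):
        return "heavyweight"            # every use may follow a pop
    return "partially lightweight"      # some uses still see the frame live
```

For an ordinary closure rather than a continuation, the "lightweight" outcome corresponds to the object being stack-allocatable.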

10. Implementation and experimentation

We have implemented a parameterized version of this enhanced ∆CFA for a subset of R5RS Scheme. (We utilize the Petit Larceny compiler’s front end to perform macro expansion.) We are also currently building a new back end for LLVM that targets this implementation. Beyond the pure CPS λ calculus described here, the implementation supports explicit recursion (letrec); mutable variables (set!); side-effecting stores for arrays, strings, records and lists; run-time-arity arguments (apply); labeled arguments; conditionals (cond, if, case); primitive operations; full continuations (call/cc); and branching coroutines (switch). The ideas and concepts presented previously adapt naturally to these language constructs.

The implementation underscores the framework aspect of ∆CFA, as the analysis can be run with any combination of the parameters below. With most parameters, there is a speed/precision tradeoff, although the degree and existence of this tradeoff depends heavily on the control structure of the code under scrutiny. When the analysis is modularized at the per-function level, parameters can be set for maximum precision and exponential worst-case complexity. Likewise, the parameters can be tuned toward efficient constraint-solving but lower precision as the degree of modularity approaches whole program.

• Context sensitivity. Context-sensitivity can be set to any level of the k-CFA hierarchy. Likewise, precision for the store can be set to global, per-program-point or sub-program-point. A higher context-nesting value k means more precision and less speed. Polynomial-time worst-case complexity is not possible for k > 1.

• Heap-widening. With heap-widening, the heap is widened to a global heap after each transition. With a 0CFA-level contour set, this degenerates to a polynomial-time constraint-solving algorithm.

• Abstract counting. If counting is disabled, every state uses the map ⊤ for a counter. Disabling counting gains speed but loses precision. Critically, cancelling a pop-then-push pair in the abstract becomes impossible. For programs which do not use full continuations, the impact of disabling counting on precision is less severe.

• Abstract garbage collecting. The garbage collection interval can be set anywhere between every transition and never. Support for more fine-grained, per-variable control of the garbage collector is underway. Garbage collection always increases precision and sometimes lowers analysis running time. It tends to improve speed for highly iterative or tail-recursive code.

• Contour-narrowing. With contour-narrowing enabled, the analysis is narrowed into the concrete semantics, a perfectly precise, potentially incomputable form of the analysis. This is useful for computing optimal escape analysis for code that is known to halt and receives no user input.

• Top-down counting. With top-down counting, the initial counter is set to the map ⊤, except for variables and λ terms hand-selected by the programmer. This can reduce the running time of the algorithm in its exponential forms.

• Heap-merging. When garbage collection is enabled, the heap does not grow monotonically. If set to n, the heap-merging threshold causes heaps to merge (via ⊔) once the analysis records more than n distinct heaps per program-point plus context.

• Join-point merging. If join-point merging is enabled, the state-space exploration algorithm becomes a two-stage depth-first search. In the first stage, states applying join-point continuations are enqueued into the join-point worklist. In the second stage, these join-point states become the worklist. Join-point merging removes forking in the abstract due to conditionals, lowering precision and raising speed.

• Equality-based termination. By default, the analysis uses the partial order ⊑ to test for termination of a branch. If state-equality-based termination is enabled, state hashes and a hashtable of visited states are used to check whether a state has already been seen. This will cause the algorithm to visit more states, but the termination test becomes cheaper.

10.1 Results

We have run the enhanced ∆CFA on test cases ranging from toy examples to continuation-based, back-tracking SAT solvers. For all cases, options were set to maximum precision, i.e., 1CFA, counting, garbage collection and whole-program. For each test, Table 1 reports the percentage of continuation λ terms marked as lightweight and the percentage of λ terms marked as creating non-escaping closures.

The first two tests (fact-tail, fact-y-combinator) are toy examples that compute factorial. The next two tests (loops and closure-loops) cover iteration in nested loops; the first test is a simple doubly-nested for loop, while the second computes a sum in a doubly-nested loop where the inner loop creates a closure over the outer loop’s iteration variables. The next three tests (foldl, foldr, map) cover three highly common patterns for iteration/recursion in higher-order functional programming; the poor performance on foldr indicates ∆CFA’s weakest point: code whose only looping construct is non-tail recursion. The next test (coroutine) covers a two-stage, two-stack coroutine pipeline. The last test (sat-solve-amb) analyzes a continuation-based, back-tracking SAT solver.
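The equality-based termination option from the parameter list of Section 10 amounts to a depth-first search that remembers visited states in a hash table. The sketch below shows only that skeleton; it is not the authors' implementation, the names `explore` and `step` are hypothetical, and it assumes abstract states are hashable values and that `step` enumerates the successors of a state.

```python
def explore(initial, step):
    """Collect every state reachable from `initial` under `step`.

    `step` maps a state to an iterable of successor states.
    Termination relies on state equality, checked through a hash set.
    """
    visited = {initial}
    worklist = [initial]
    while worklist:
        state = worklist.pop()                # LIFO order: depth-first search
        for successor in step(state):
            if successor not in visited:      # cheap hash-based equality test
                visited.add(successor)
                worklist.append(successor)
    return visited
```

Replacing the membership test with a check against the partial order ⊑ would prune more states at the cost of a more expensive comparison, which is the tradeoff the parameter describes.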

11. Related work

This work lies at the confluence of several lines of research: previous work with procedure strings, previous work on CPS-based program representations, and the general body of work on program analysis based on the λ-calculus.

Globally, our entire body of work is an instantiation of the Cousots’ “non-standard abstract semantics” framework of program analysis (Cousot and Cousot 1977, 1979). We make use of their widening and narrowing operations to achieve either the desired computational complexity or precision of the analysis.

Sharir and Pnueli (Sharir and Pnueli 1981) provide a good introduction to the call-string paradigm, using call strings to provide the polyvariance needed to specialise function context in interprocedural data-flow analysis. Sestoft (Sestoft 1988) has used definition-use path strings to globalize function parameters. Much of our work draws on Might and Shivers’ adaptation (Might and Shivers 2006a) of Harrison’s dissertation (Harrison 1989), which used call-down/return-up procedure strings for detecting read-write dependencies in a parallelising compiler. In particular, we have taken two items from this body of work: First, our basic string abstraction (functions mapping code points to regular expressions over stack actions) is Harrison’s. Second, the “relativization” of frame-string change was also Harrison’s insight. Might and Shivers (Might and Shivers 2006a) generalised Harrison’s procedure strings by adding contours to concrete frame strings, which enriches their structure from a monoid to a group; we make the necessary modifications to preserve more of this group structure in the abstract.

We also import Might and Shivers’ exploitation of CPS as a unifying mechanism for control. Work outside of ∆CFA on procedure strings has treated procedures as “large grain” blocks of program structure, with alternate mechanisms employed to handle “intra-procedural” control flow, such as sequencing, loops and conditional branches. Where other frameworks have call and return, ∆CFA has call and the far more general call to continuation.

Might and Shivers pointed out three distinct advantages for CPS in ∆CFA: First, simplicity: the semantics has but two transition rules. Second, universality: CPS can encode any computation. Third, power: the semantics are more precise.

Harrison discussed handling continuation-passing style in his more complicated semantics. However, he missed the fact that CPS terms can be partitioned, deciding that, in CPS, all stack motion is “downward.” In that view, CPS becomes all calls, no returns. Escape analysis loses its meaning under this perspective: nothing could ever escape. The shift toward Steele-like stack-management, with its consequent focus on stack-allocation operations as opposed to control operations, liberates the analysis to general control applicability.

We also draw heavily on Might and Shivers’ ΓCFA (Might and Shivers 2006b) for abstract garbage collection and abstract counting. Abstract counting’s ability to track the cardinality of an abstract resource improves both precision and power in this work. We have extended this tool to provide the cardinality of abstract frames, and we use that knowledge to enhance the precision of frame string operations in the abstract.

Our work is also firmly planted in the line of research developing the CPS-as-intermediate-representation thesis. We have already outlined what CPS offers as a medium for analysis by way of contrast with non-CPS work. The seminal work here is by Steele (Steele Jr. 1978), who also first articulated the style of function-call protocol we have exploited. Shivers’ dissertation (Shivers 1991) described the “k-CFA” family of abstractions, of which this work is an extension.

The power and cost of continuations have made them an attractive opportunity for optimization. SML/NJ reduced the cost of continuations by going stackless (Appel 1992); this makes continuations cheap and function call expensive. Hieb et al. used a hybrid stack technique to make continuations more expensive but function call cheaper (Hieb et al. 1990); given that this technique is dynamic, it meshes perfectly with our own. Bruggeman et al. pointed out that all one-shot continuations are also lightweight (Bruggeman et al. 1996); however, our technique is more general: not all lightweight continuations are one-shot.

References

Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.

Carl Bruggeman, Oscar Waddell, and R. Kent Dybvig. Representing control in the presence of one-shot continuations. In ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1996.

Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In ACM SIGPLAN Symposium on Principles of Programming Languages, volume 4, pages 238–252, Los Angeles, California, January 1977.

Patrick Cousot and Radhia Cousot. Systematic design of program analysis frameworks. In ACM SIGPLAN Symposium on Principles of Programming Languages, volume 6, pages 269–282, San Antonio, Texas, January 1979.

Williams Ludwell Harrison. The interprocedural analysis and automatic parallelization of Scheme programs. Lisp and Symbolic Computation, 2(3/4):179–396, October 1989.

Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing control in the presence of first-class continuations. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 66–77, June 1990.

David Kranz, Richard Kelsey, Jonathan Rees, Paul Hudak, James Philbin, and Norman Adams. ORBIT: An optimizing compiler for Scheme. In SIGPLAN Symposium on Compiler Construction, volume 21, pages 219–233, Palo Alto, California, June 1986.

Matthew Might and Olin Shivers. Environment Analysis via ∆CFA. In ACM SIGPLAN Symposium on Principles of Programming Languages, pages 127–140, Charleston, South Carolina, January 2006a.

Matthew Might and Olin Shivers. Improving Flow Analysis via ΓCFA: Abstract Garbage Collection and Counting. In Proceedings of the 11th ACM SIGPLAN International Conference on Functional Programming (ICFP 2006), Portland, Oregon, September 2006b.

Peter Sestoft. Replacing Function Parameters by Global Variables. Master’s thesis, DIKU, University of Copenhagen, Denmark, October 1988.

Micha Sharir and Amir Pnueli. Two approaches to interprocedural data flow analysis. In Steven Muchnick and Neil Jones, editors, Program Flow Analysis, Theory and Application, chapter 7. Prentice Hall International, 1981.

Olin Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania, May 1991. Technical Report CMU-CS-91-145.

Olin Shivers and Matthew Might. Continuations and Transducer Composition. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 295–307, Ottawa, Canada, June 2006.

Guy L. Steele Jr. RABBIT: a compiler for SCHEME. Master’s thesis, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, May 1978. Technical report AI-TR-474.
