Pushdown Control-Flow Analysis of Higher-Order Programs

Christopher Earl Matthew Might David Van Horn ∗ University of Utah Northeastern University {cwearl,might}@cs.utah.edu [email protected]

Abstract 0CFA says that the flow set for the variable a contains both 3 and 4. In fact, so does the flow set for the variable b. For return- Context-free approaches to static analysis gain precision over clas- 1 sical approaches by perfectly matching returns to call sites— flow, 0CFA says that the invocation of (id 4) may return to the a property that eliminates spurious interprocedural paths. Var- invocation of (id 3) or (id 4) and vice versa; that is, according doulakis and Shivers’s recent formulation of CFA2 showed that it to Shivers’s 0CFA, this program contains a loop. is possible (if expensive) to apply context-free methods to higher- To combat merging, control-flow analyses have focused on order languages and gain the same boost in precision achieved over increasing context-sensitivity [Shivers 1991]. Context-sensitivity first-order programs. tries to qualify any answer that comes back from a CFA with a To this young body of work on context-free analysis of higher- context in which that answer is valid. That is, instead of answer- order programs, we contribute a pushdown control-flow analy- ing “λ42 may flow to variable v13,” a context-sensitive analysis sis framework, which we derive as an abstract interpretation of a might answer “λ42 may flow to variable v13 when bound after call- CESK machine with an unbounded stack. One instantiation of this ing f.” While context-sensitivity recovers some lost precision, it is framework marks the first polyvariant pushdown analysis of higher- no silver bullet. A finite-state analysis stores only a finite amount order programs; another marks the first polynomial-time analy- of program context to discriminate data- and control-flows during sis. In the end, we arrive at a framework for control-flow analysis analysis. Yet, the pigeon-hole principle remains merciless: as long that can efficiently compute pushdown generalizations of classical as the state-space is finite, merging is inevitable for some programs. control-flow analyses. Of all the forms of merging, the most pernicious is the merg- ing of return-flow information. As the example shows, a finite-state control-flow analysis will lose track of where return-points return 1. Introduction once the maximum bound on context is exceeded. Even in pro- Static analysis is bound by theory to accept diminished precision as grams with no higher-order functions, return-flow merging will still the price of decidability. The easiest way to guarantee decidability happen during control-flow analysis. for a static analysis is to restrict it to a finite state-space. Not surprisingly, finite state-spaces have become a crutch. 1.2 A first shot: CFA2 Whenever an abstraction maps an infinite (concrete) state-space Vardoulakis and Shivers’s recent work on CFA2 [Vardoulakis and down to the finite state-space of a static analysis, the pigeon-hole Shivers 2010] constitutes an opening salvo on ending the return- principle forces merging. Distinct execution paths and values can flow problem for the static analysis of higher-order programs. and do become indistinguishable under abstraction, e.g., 3 and 4 CFA2 employs an implicit pushdown system that models the stack both abstract to the same value: positive. of a program. CFA2 solves the return-flow problem for higher-order Our message is that finite abstraction goes too far: we can ab- programs, but it has drawbacks: stract into an infinite state-space to improve precision, yet remain decidable. Specifically, we can abstract the concrete semantics of 1. CFA2 allows only monovariant precision. a higher-order language into a pushdown automaton (PDA). As an 2. CFA2 has exponential complexity in the size of the program. infinite-state system, a PDA-based abstraction preserves more in- formation than a classical finite-state analysis. Yet, being less pow- 3. CFA2 is restricted to -passing style. erful than a Turing machine, properties important for computing Our solution overcomes all three drawbacks: it allows polyvari- control-flow analysis (e.g. emptiness, intersection with regular lan- 6 guages, reachability) remain decidable. ant precision, we can widen it to O(n )-time complexity in the monovariant case and we can operate on direct-style programs. 1.1 The problem with merging A short example provides a sense of how the inevitable merging 1.3 Our solution: Abstraction to pushdown systems that occurs under a finite abstraction harms precision. Shivers’s To prevent return-flow merging during higher-order control-flow 0CFA [Shivers 1991] produces spurious data-flows and return- analysis, we abstract into an explicit pushdown system instead of a flows in the following example: finite-state machine. The program stack, which determines return- flow, will remain unbounded and become the pushdown stack. As a (let* ((id (lambda (x) x)) result, return-flow information will never be merged: in the abstract (a (id 3)) semantics, a function returns only whence it was called. (b (id 4))) a) 1 “Return-flow” analysis asks to which call sites a given return point may ∗ Supported by the National Science Foundation under grant 0937060 to the return. In the presence of tail calls, which break the balance between calls Computing Research Association for the CIFellow Project. and returns, return-flow analysis differs from control-flow analysis. 1.4 Overview [Sipser 2005]. Even those familiar with pushdown theory may want This paper is organized as follows: first, we define a variant of the to skim this section to pick up our notation. CESK machine [Felleisen and Friedman 1987] for the A-Normal 2.1 Syntactic sugar Form λ-calculus [Flanagan et al. 1993]. In performing analysis, we 0 wish to soundly approximate intensional properties of this machine When a triple (x, `, x ) is an edge in a labeled graph, a little when it evaluates a given program. To do so, we construct an ab- syntactic sugar aids presentation: stract interpretation of the machine. The abstracted CESK machine ` 0 0 operates much like its concrete counterpart and soundly approxi- x  x ≡ (x, `, x ). mates its behavior, but crucially, many properties of the concrete Similarly, when a pair (x, x0) is a graph edge: machine that are undecidable become decidable when considered 0 0 against the abstracted machine (e.g. “is a given machine configura- x  x ≡ (x, x ). tion reachable?” becomes a decidable property). We use both string and vector notation for sequences: The abstract counterpart to the CESK machine is constructed by bounding the store component of the machine to some finite a1a2 . . . an ≡ ha1, a2, . . . , ani ≡ ~a. size. However, the stack component (represented as a continuation) 2.2 Stack actions, stack change and stack manipulation is left unabstracted. (This is in contrast to finite-state abstractions that store-allocate [Van Horn and Might 2010].) Un- Stacks are sequences over a stack alphabet Γ. Pushdown systems like most higher-order abstract interpreters, the unbounded stack do much stack manipulation, so to represent this more concisely, implies this machine has a potentially infinite set of reachable ma- we turn stack alphabets into “stack-action” sets; each character chine configurations, and therefore enumerating them is not a fea- represents a change to the stack: push, pop or no change. sible approach to performing analysis. For each character γ in a stack alphabet Γ, the stack-action set Instead, we demonstrate how properties can be decided by trans- Γ± contains a push character γ+ and a pop character γ−; it also forming the abstracted CESK machine into an equivalent push- contains a no-stack-change indicator, : down automaton. We then reduce higher-order control-flow anal- g ∈ Γ± ::=  [stack unchanged] ysis to deciding the non-emptiness of a language derived from the | γ for each γ ∈ Γ [pushed γ] PDA. (This language happens to be the intersection of a regular lan- + guage and the context-free language described by the PDA.) This | γ− for each γ ∈ Γ [popped γ]. approach—though concise, precise and decidable—is formidably In this paper, the symbol g represents some stack action. expensive, with complexity doubly exponential in the size of the program. 2.3 Pushdown systems We simplify the algorithm to merely exponential in the size of A pushdown system is a triple M = (Q, Γ, δ) where: the input program by reducing the control-flow problem to push- down reachability [Bouajjani et al. 1997]. Unfortunately, the ab- 1. Q is a finite set of control states; stracted CESK machine has an exponential number of control states 2. Γ is a stack alphabet; and with respect to the size of the program. Thus, pushdown reachabil- ity for higher-order programs appears to be inherently exponential. 3. δ ⊆ Q × Γ± × Q is a transition relation. Noting that most control states in the abstracted CESK machine We use PDS to denote the class of all pushdown systems. are actually unreachable, we present a fixed-point algorithm for de- Unlike the more widely known pushdown automaton, a push- ciding pushdown reachability that is polynomial-time in the num- down system does not recognize a language. ber of reachable control states. Since the pushdown systems pro- duced by abstracted CESK machines are sparse, such algorithms, For the following definitions, let M = (Q, Γ, δ). though exponential in the worst case, are reasonable options. Yet, we can do better. • The configurations of this machine—Configs(M)—are pairs Next, we add an -closure graph (a graph encoding no-stack- over control states and stacks: change reachability) and a work-list to the fixed-point algorithm. Configs(M) = Q × Γ.∗ Together, these lower the cost of finding the reachable states of a 4 5 2 4 pushdown system from O(|Γ| m ) to O(|Γ| m ), where Γ is the • The labeled transition relation (7−→M ) ⊆ Configs(M) × stack alphabet and m is the number of reachable control states. Γ± × Configs(M) determines whether one configuration may To drop the complexity of our analysis to polynomial-time in transition to another while performing the given stack action: the size of the input program, we must resort to both widening and  (q, ~γ) 7−→ (q0, ~γ) iff q q0 ∈ δ [no change] monovariance. Widening with a single-threaded store and using M  a monovariant allocation strategy yields a pushdown control-flow γ− γ− 6 (q, γ : ~γ) 7−→ (q0, ~γ) iff q q0 ∈ δ [pop] analysis with polynomial-time complexity, at O(n ), where n is M  the size of the program. γ γ+ 0 + 0 Finally, we briefly highlight applications of pushdown control- (q, ~γ) 7−→ (q , γ : ~γ) iff q  q ∈ δ [push]. flow analyses that are outside the reach of classical ones, discuss M related work, and conclude. • If unlabelled, the transition relation (7−→) checks whether any stack action can enable the transition: g c 7−→ c0 iff c 7−→ c0 for some stack action g. 2. Pushdown preliminaries M M • For a string of stack actions g . . . g : In this work, we make use of both pushdown systems and push- 1 n g1...gn g1 g2 gn−1 gn down automata. (A pushdown automaton is a specific kind of push- c0 7−→ cn iff c0 7−→ c1 7−→ ··· 7−→ cn−1 7−→ cn, down system.) There are many (equivalent) definitions of these M M M M M machines in the literature, so we adapt our own definitions from for some configurations c0, . . . , cn. • For the transitive closure: requires that all arguments to a function be atomic: ~g c 7−→∗ c0 iff c 7−→ c0 for some action string ~g . e ∈ Exp ::= (let ((v call)) e) [non-tail call] M M | call [tail call] Note Some texts define the transition relation δ so that δ ⊆ | æ [return] ∗ 0 Q×Γ×Q×Γ . In these texts, (q, γ, q , ~γ) ∈ δ means, “if in control f, æ ∈ Atom ::= v | lam [atomic expressions] state q while the character γ is on top, pop the stack, transition to control state q0 and push ~γ.” Clearly, we can convert between lam ∈ Lam ::= (λ (v) e) [lambda terms] these two representations by introducing extra control states to our call ∈ Call ::= (f æ) [applications] representation when it needs to push multiple characters. v ∈ Var is a set of identifiers [variables]. 2.4 Rooted pushdown systems We use the CESK machine of Felleisen and Friedman[1987] to specify the semantics of ANF. We have chosen the CESK machine A rooted pushdown system is a quadruple (Q, Γ, δ, q ) in which 0 because it has an explicit stack, and under abstraction, the stack (Q, Γ, δ) is a pushdown system and q ∈ Q is an initial (root) state. 0 component of our CESK machine will become the stack component is the class of all rooted pushdown systems. RPDS of a pushdown system. For a rooted pushdown system M = (Q, Γ, δ, q ), we define a 0 First, we define a set of configurations (Conf ) for this machine: the root-reachable transition relation: g 0 ∗ g 0 c ∈ Conf = Exp × Env × Store × Kont [configurations] c 7−→−→ c iff (q0, hi) 7−→ c and c 7−→ c . M M M ρ ∈ Env = Var * Addr [environments] In other words, the root-reachable transition relation also makes σ ∈ Store = Addr → Clo [stores] sure that the root control state can actually reach the transition. clo ∈ Clo = Lam × Env [closures] We overload the root-reachable transition relation to operate on ∗ control states as well: κ ∈ Kont = Frame [continuations] g g φ ∈ Frame = Var × Exp × Env [stack frames] q 7−→−→ q0 iff (q, ~γ) 7−→−→ (q0, ~γ 0) for some stacks ~γ, ~γ 0. M M a ∈ Addr is an infinite set of addresses [addresses]. For both root-reachable relations, if we elide the stack-action label, To define the semantics, we need five items: then, as in the un-rooted case, the transition holds if there exists some stack action that enables the transition: 1. I : Exp → Conf injects an expression into a configuration. 0 g 0 2. A : Atom×Env ×Store * Clo evaluates atomic expressions. q 7−→−→ q iff q 7−→−→ q for some action g. M M 3. E : Exp → P (Conf ) computes the set of reachable machine 2.5 Pushdown automata configurations for a given program. 4. (⇒) ⊆ Conf × Conf transitions between configurations. A pushdown automaton is a generalization of a rooted pushdown system, a 7-tuple (Q, Σ, Γ, δ, q0, F, ~γ) in which: 5. alloc : Var × Conf → Addr chooses fresh store addresses for newly bound variables. 1. Σ is an input alphabet; Program injection The program injection function pairs an ex- 2. δ ⊆ Q × Γ± × (Σ ∪ {}) × Q is a transition relation; pression with an empty environment, an empty store and an empty 3. F ⊆ Q is a set of accepting states; and stack to create the initial configuration: ∗ 4. ~γ ∈ Γ is the initial stack. c0 = I(e) = (e, [], [], hi). We use PDA to denote the class of all pushdown automata. Atomic expression evaluation The atomic expression evaluator, Pushdown automata recognize languages over their input al- A : Atom × Env × Store * Clo, returns the value of an atomic phabet. To do so, their transition relation may optionally con- expression in the context of an environment and a store: sume an input character upon transition. Formally, a PDA M = ∗ A(lam, ρ, σ) = (lam, ρ) [closure creation] (Q, Σ, Γ, δ, q0, F, ~γ) recognizes the language L(M) ⊆ Σ : A(v, ρ, σ) = σ(ρ(v)) [variable look-up].  ∈ L(M) if q ∈ F 0 Reachable configurations The evaluator E : Exp → P (Conf ) 0 0 aw ∈ L(M) if δ(q0, γ+, a, q ) and w ∈ L(Q, Σ, Γ, δ, q , F, γ : ~γ) returns all configurations reachable from the initial configuration: 0 0 aw ∈ L(M) if δ(q0, , a, q ) and w ∈ L(Q, Σ, Γ, δ, q , F, ~γ) E(e) = {c : I(e) ⇒∗ c} . 0 0 0 aw ∈ L(M) if δ(q0, γ−, a, q ) and w ∈ L(Q, Σ, Γ, δ, q , F, ~γ ) Transition relation To define the transition c ⇒ c0, we need three 0 where ~γ = hγ, γ2, . . . , γni and ~γ = hγ2, . . . , γni, rules. The first rule handles tail calls by evaluating the function into a closure, evaluating the argument into a value and then moving to where a is either the empty string  or a single character. the body of the λ-term within the closure: 0 c c 3. Setting: A-Normal Form λ-calculus z }| { z 00}| 0 { ([[(f æ)]], ρ, σ, κ) ⇒ (e, ρ , σ , κ) , where Since our goal is to create pushdown control-flow analyses of ([[(λ (v) e)]], ρ0) = A(f, ρ, σ) higher-order languages, we choose to operate on the λ-calculus. For simplicity of the concrete and abstract semantics, we choose a = alloc(v, c) to analyze programs in A-Normal Form, however this is strictly a ρ00 = ρ0[v 7→ a] cosmetic choice; all of our results can be replayed mutatis mutandis 0 in a direct-style setting. ANF enforces an order of evaluation and it σ = σ[a 7→ A(æ, ρ, σ)]. Non-tail call pushes a frame onto the stack and evaluates the call: cˆ ∈ Conf\ = Exp × Envd × Store\ × Kont[ [configurations] c c0 z }| { z }| { ρˆ ∈ Env = Var * Addr[ [environments] ([[(let ((v call)) e)]], ρ, σ, κ) ⇒ (call, ρ, σ, (v, e, ρ): κ) . d   Function return pops a stack frame: σˆ ∈ Store\ = Addr[ → P Clod [stores] 0 c c cloc ∈ Clod = Lam × Envd [closures] z }| 0 { z 00}| 0 { ∗ (æ, ρ, σ, (v, e, ρ ): κ) ⇒ (e, ρ , σ , κ) , where κˆ ∈ Kont[ = Frame\ [continuations] a = alloc(v, c) φˆ ∈ Frame\ = Var × Exp × Envd [stack frames] ρ00 = ρ0[v 7→ a] aˆ ∈ Addr[ is a finite set of addresses [addresses]. σ0 = σ[a 7→ A(æ, ρ, σ)]. Allocation The address-allocation function is an opaque param- Figure 1. The abstract configuration-space. eter in this semantics. We have done this so that the forthcoming abstract semantics may also parameterize allocation, and in so do- ing provide a knob to tune the polyvariance and context-sensitivity the initial configuration: of the resulting analysis. For the sake of defining the concrete se- ˆ n ˆ ∗ o mantics, letting addresses be natural numbers suffices, and then the E(e) = cˆ : I(e) ; cˆ . allocator can choose the lowest unused address: Because there are an infinite number of abstract configurations, Addr = N a na¨ıve implementation of this function may not terminate. In alloc(v, (e, ρ, σ, κ)) = 1 + max(dom(σ)). Sections5 through8, we show that there is a way to compute a finite representation of this set. 4. An infinite-state abstract interpretation Transition relation The abstract transition relation (;) ⊆ Our goal is to statically bound the higher-order control-flow of the Conf\ × Conf\ has three rules, one of which has become non- CESK machine of the previous section. So, we are going to conduct deterministic. A tail call may fork because there could be multiple an abstract interpretation. abstract closures that it is invoking: 0 Since we are concerned with return-flow precision, we are go- cˆ cˆ ing to abstract away less information than we normally would. z }| { z 00}| 0 { Specifically, we are going to construct an infinite-state abstract in- ([[(f æ)]], ρ,ˆ σ,ˆ κˆ) ; (e, ρˆ , σˆ , κˆ) , where terpretation of the CESK machine by leaving its stack unabstracted. ([[(λ (v) e)]], ρˆ0) ∈ Aˆ(f, ρ,ˆ σˆ) (With an infinite-state abstraction, the usual approach for comput- ing the static analysis—exploring the abstract configurations reach- aˆ = alloc[(v, cˆ) able from some initial configuration—simply will not work. Sub- ρˆ00 =ρ ˆ0[v 7→ aˆ] sequent sections focus on finding an algorithm that can compute a finite representation of the reachable abstract configurations of the σˆ0 =σ ˆ t [ˆa 7→ Aˆ(æ, ρ,ˆ σˆ)]. abstracted CESK machine.) We define all of the partial orders shortly, but for stores: For the abstract interpretation of the CESK machine, we need an abstract configuration-space (Figure1). To construct one, we (ˆσ t σˆ0)(ˆa) =σ ˆ(ˆa) ∪ σˆ0(ˆa). force addresses to be a finite set, but crucially, we leave the stack A non-tail call pushes a frame onto the stack and evaluates the call: untouched. When we compact the set of addresses into a finite set, the machine may run out of addresses to allocate, and when cˆ cˆ0 z }| { z }| { it does, the pigeon-hole principle will force multiple closures to ([[(let ((v call)) e)]], ρ,ˆ σ,ˆ κˆ) ; (call, ρ,ˆ σ,ˆ (v, e, ρˆ) :κ ˆ) . reside at the same address. As a result, we have no choice but to force the range of the store to become a power set in the abstract A function return pops a stack frame: configuration-space. To construct the abstract transition relation, cˆ cˆ0 we need five components analogous to those from the concrete z }| 0 { z 00}| 0 { semantics. (æ, ρ,ˆ σ,ˆ (v, e, ρˆ ) :κ ˆ) ; (e, ρˆ , σˆ , κˆ) , where Program injection The abstract injection function Iˆ : Exp → aˆ = alloc[(v, cˆ) Conf\ pairs an expression with an empty environment, an empty ρˆ00 =ρ ˆ0[v 7→ aˆ] store and an empty stack to create the initial abstract configuration: σˆ0 =σ ˆ t [ˆa 7→ Aˆ(æ, ρ,ˆ σˆ)]. cˆ0 = Iˆ(e) = (e, [], [], hi). Allocation, polyvariance and context-sensitivity In the abstract Atomic expression evaluation The abstract atomic expression semantics, the abstract allocation function alloc[ : Var × Conf\ → ˆ evaluator, A : Atom × Envd × Store\ → P(Clod), returns the value Addr[ determines the polyvariance of the analysis (and, by exten- of an atomic expression in the context of an environment and a sion, its context-sensitivity). In a control-flow analysis, polyvari- store; note how it returns a set: ance literally refers to the number of abstract addresses (variants) Aˆ(lam, ρ,ˆ σˆ) = {(lam, ρ)} [closure creation] there are for each variable. By selecting the right abstract allocation ˆ function, we can instantiate pushdown versions of classical flow A(v, ρ,ˆ σˆ) =σ ˆ(ˆρ(v)) [variable look-up]. analyses. Reachable configurations The abstract program evaluator Eˆ : Monovariance: Pushdown 0CFA Pushdown 0CFA uses variables Exp → P(Conf\) returns all of the configurations reachable from themselves for abstract addresses: PDA\(e) = (Q, Σ, Γ, δ, q0,F, hi), where Addr[ = Var Q = Exp × Env × Store\ alloc(v, cˆ) = v. d Σ = Q Context-sensitive: Pushdown 1CFA Pushdown 1CFA pairs the variable with the current expression to get an abstract address: Γ = Frame\ (q, , q0, q0) ∈ δ iff (q, κˆ) ; (q0, κˆ) for all κˆ Addr[ = Var × Exp (q, φˆ , q0, q0) ∈ δ iff (q, φˆ :κ ˆ) ; (q0, κˆ) for all κˆ alloc(v, (e, ρ,ˆ σ,ˆ κˆ)) = (v, e). − (q, φˆ0 , q0, q0) ∈ δ (q, κˆ) ; (q0, φˆ0 :κ ˆ) κˆ Polymorphic splitting: Pushdown poly/CFA Assuming we com- + iff for all piled the program from a programming language with let-bound (q0, hi) = Iˆ(e) polymorphism and marked which functions were let-bound, we can F = Q. enable polymorphic splitting:

Addr[ = Var + Var × Exp Figure 2. PDA\ : Exp → PDA. ( (v, [[(f æ)]]) f is let-bound alloc(v, ([[(f æ)]], ρ,ˆ σ,ˆ κˆ)) = v otherwise. It is easy to prove that the abstract transition relation simulates the concrete transition relation: Pushdown k-CFA For pushdown k-CFA, we need to look beyond Theorem 4.1. If: the current state and at the last k states. By concatenating the expressions in the last k states together, and pairing this sequence α(c) v cˆ and c ⇒ c0, with a variable we get pushdown k-CFA: then there must exist cˆ0 ∈ Conf\ such that: Addr[ = Var × Expk α(c0) v cˆ0 and cˆ ⇒ cˆ0. alloc[(v, h(e1, ρˆ1, σˆ1, κˆ1),...i) = (v, he1, . . . , eki). Proof. The proof follows by case-wise analysis on the type of the 4.1 Partial orders expression in the configuration. It is a straightforward adaptation of For each set Xˆ inside the abstract configuration-space, we use the similar proofs, such as that of Might[2007] for k-CFA. ˆ ˆ natural partial order, (vXˆ ) ⊆ X × X. Abstract addresses and syntactic sets have flat partial orders. For the other sets: 5. From the abstracted CESK machine to a PDA • The partial order lifts point-wise over environments: In the previous section, we constructed an infinite-state abstract ρˆ v ρˆ0 iff ρˆ(v) =ρ ˆ0(v) for all v ∈ dom(ˆρ). interpretation of the CESK machine. The infinite-state nature of the abstraction makes it difficult to see how to answer static analysis • The partial orders lifts component-wise over closures: questions. Consider, for instance, a control flow-question: (lam, ρˆ) v (lam, ρˆ0) iff ρˆ v ρˆ0. At the call site (f æ), may a closure over lam be called? • The partial order lifts point-wise over stores: If the abstracted CESK machine were a finite-state machine, an 0 0 algorithm could answer this question by enumerating all reach- σˆ v σˆ iff σˆ(ˆa) v σˆ (ˆa) for all aˆ ∈ dom(ˆσ). able configurations and looking for an abstract configuration ([[(f æ)]], ρ,ˆ σ,ˆ κˆ) in which (lam, ) ∈ Aˆ(f, ρ,ˆ σˆ). However, be- • The partial order lifts component-wise over frames: cause the abstracted CESK machine may contain an infinite number (v, e, ρˆ) v (v, e, ρˆ0) iff ρˆ v ρˆ0. of reachable configurations, an algorithm cannot enumerate them. Fortunately, we can recast the abstracted CESK as a special • The partial order lifts element-wise over continuations: kind of infinite-state system: a pushdown automaton (PDA). Push- ˆ ˆ ˆ0 ˆ0 ˆ ˆ0 down automata occupy a sweet spot in the theory of computation: hφ1,..., φni v hφ1,..., φni iff φi v φi. they have an infinite configuration-space, yet many useful proper- • The partial order lifts component-wise across configurations: ties (e.g. word membership, non-emptiness, control-state reacha- 0 0 0 0 0 0 bility) remain decidable. Once the abstracted CESK machine be- (e, ρ,ˆ σ,ˆ κˆ) v (e, ρˆ , σˆ , κˆ ) iff ρˆ v ρˆ and σˆ v σˆ and κˆ v κˆ . comes a PDA, we can answer the control-flow question by check- 4.2 Soundness ing whether a specific regular language, when intersected with the language of the PDA, turns into the empty language. To prove soundness, we need an abstraction map α that connects The recasting as a PDA is a shift in perspective. A configura- the concrete and abstract configuration-spaces: tion has an expression, an environment and a store. A stack char- α(e, ρ, σ, κ) = (e, α(ρ), α(σ), α(κ)) acter is a frame. We choose to make the alphabet the set of control α(ρ) = λv.α(ρ(v)) states, so that the language accepted by the PDA will be sequences of control-states visited by the abstracted CESK machine. Thus, G α(σ) = λa.ˆ {α(σ(a))} every transition will consume the control-state to which it transi- α(a)=ˆa tioned as an input character. Figure2 defines the program-to-PDA

αhφ1, . . . , φni = hα(φ1), . . . , α(φn)i conversion function PDA\ : Exp → PDA. (Note the implicit use ∼ α(v, e, ρ) = (v, e, α(ρ)) of the isomorphism Q × Kont[ = Conf\.) At this point, we can answer questions about whether a speci- α(a) is determined by the allocation functions. fied control state is reachable by formulating a question about the intersection of a regular language with a context-free language de- 1. S is a finite set of nodes; scribed by the PDA. That is, if we want to know whether the control 2. Γ is a set of frames; state (e0, ρ,ˆ σˆ) is reachable in a program e, we can reduce the prob- lem to determining: 3. E ⊆ S × Γ± × S is a set of stack-action edges; and ∗ 0 ∗ 4. s is an initial state; Σ · (e , ρ,ˆ σˆ) · Σ ∩ L(PDA\(e)) 6= ∅, 0 such that for any node s ∈ S, it must be the case that: where L1 · L2 is the concatenation of formal languages L1 and L2. ∗ (s0, hi) 7−→ (s, ~γ) for some stack ~γ. Theorem 5.1. Control-state reachability is decidable. G Proof. The intersection of a regular language and a context-free In other words, a Dyck state graph is equivalent to a rooted push- down system in which there is a legal path to every control state language is context-free. The emptiness of a context-free language 2 is decidable. from the initial control state. We use DSG to denote the class of Dyck state graphs. Clearly: Now, consider how to use control-state reachability to answer ⊂ . the control-flow question from earlier. There are a finite number of DSG RPDS possible control states in which the λ-term lam may flow to the A Dyck state graph is a rooted pushdown system with the “fat” function f in call site (f æ); let’s call the this set of states Sˆ: trimmed off; in this case, unreachable control states and unreach- able transitions are the “fat.” n 0 0o Sˆ = ([[(f æ)]], ρ,ˆ σˆ):(lam, ρˆ ) ∈ Aˆ(f, ρ,ˆ σˆ) for some ρˆ . We can formalize the connection between rooted pushdown systems and Dyck state graphs with a map: What we want to know is whether any state in the set Sˆ is reachable in the PDA. In effect what we are asking is whether there exists a DSG : RPDS → DSG. control state q ∈ Sˆ such that: Given a rooted pushdown system M = (Q, Γ, δ, q0), its equivalent ∗ ∗ Dyck state graph is DSG(M) = (S, Γ, E, q0), where the set S Σ ·{q}· Σ ∩ L(PDA\(e)) 6= ∅. contains reachable nodes: If this is true, then lam may flow to f; if false, then it does not. n ∗ o S = q :(q0, hi) 7−→ (q, ~γ) for some stack ~γ , Problem: Doubly exponential complexity The non-emptiness- M of-intersection approach establishes decidability of pushdown and the set E contains reachable edges: control-flow analysis. But, two exponential complexity barriers n g g o E = q q0 : q 7−→−→ q0 , make this technique impractical.  M First, there are an exponential number of both environments and s0 = q0. (|Addr[ ||Var|) and stores (2|Clod|×|Addr\|) to consider for the set Sˆ. On In practice, the real difference between a rooted pushdown sys- top of that, computing the intersection of a regular language with a tem and a Dyck state graph is that our rooted pushdown system will context-free language will require enumeration of the (exponential) be defined intensionally (having come from the components of an control-state-space of the PDA. As a result, this approach is doubly abstracted CESK machine), whereas the Dyck state graph will be exponential. For the next few sections, our goal will be to lower the defined extensionally, with the contents of each component explic- complexity of pushdown control-flow analysis. itly enumerated during its construction. Our near-term goals are (1) to convert our abstracted CESK 6. Focusing on reachability machine into a rooted pushdown system and (2) to find an efficient method for computing an equivalent Dyck state graph from a rooted In the previous section, we saw that control-flow analysis reduces pushdown system. to the reachability of certain control states within a pushdown sys- To convert the abstracted CESK machine into a rooted push- tem. We also determined reachability by converting the abstracted CESK machine into a PDA, and using emptiness-testing on a lan- down system, we use the function RPDS\ : Exp → RPDS: guage derived from that PDA. Unfortunately, we also found that RPDS\ (e) = (Q, Γ, δ, q0) this approach is deeply exponential. Since control-flow analysis reduced to the reachability of Q = Exp × Envd × Store\ control-states in the PDA, we skip the language problems and go Γ = Frame\ directly to reachability algorithms of Bouajjani et al.[1997], Ko-  0 0 dumal and Aiken[2004], Reps[1998] and Reps et al.[2005] that q  q ∈ δ iff (q, κˆ) ; (q , κˆ) for all κˆ determine the reachable configurations within a pushdown system. φˆ − 0 ˆ 0 These algorithms are even polynomial-time. Unfortunately, some q  q ∈ δ iff (q, φ :κ ˆ) ; (q , κˆ) for all κˆ of them are polynomial-time in the number of control states, and φˆ in the abstracted CESK machine, there are an exponential number + 0 0 ˆ q  q ∈ δ iff (q, κˆ) ; (q , φ :κ ˆ) for all κˆ of control states. We don’t want to enumerate the entire control ˆ state-space, or else the search becomes exponential in even the best (q0, hi) = I(e). case. To avoid this worst-case behavior, we present a straightforward 7. Compacting rooted pushdown systems pushdown-reachability algorithm that considers only the reachable We now turn our attention to compacting a rooted pushdown system control states. We cast our reachability algorithm as a fixed-point (defined intensionally) into a Dyck state graph (defined extension- iteration, in which we incrementally construct the reachable subset of a pushdown system. We term these algorithms “iterative Dyck 2 We chose the term Dyck state graph because the sequences of stack state graph construction.” actions along valid paths through the graph correspond to substrings in A Dyck state graph is a compacted, rooted pushdown system Dyck languages. A Dyck language is a language of balanced, “colored” G = (S, Γ, E, s0), in which: parentheses. In this case, each character in the stack alphabet is a color. ally). That is, we want to find an implementation of the function 8. Efficiency: Work-lists and -closure graphs DSG. To do so, we first phrase the Dyck state graph construction We have developed a fixed-point formulation of the Dyck state as the least fixed point of a monotonic function. This will provide a graph construction algorithm, but found that, in each iteration, it method (albeit an inefficient one) for computing the function DSG. wasted effort by passing over all discovered states and edges, even In the next section, we look at an optimized work-list driven algo- though most will not contribute new states or edges. Taking a cue rithm that avoids the inefficiencies of this version. from graph search, we can adapt the fixed-point algorithm with a The function F : RPDS → (DSG → DSG) generates the work-list. That is, our next algorithm will keep a work-list of new monotonic iteration function we need: states and edges to consider, instead of reconsidering all of them. F(M) = f, where In each iteration, it will pull new states and edges from the work list, insert them into the Dyck state graph and then populate the M = (Q, Γ, δ, q ) 0 work-list with new states and edges that have to be added as a 0 0 f(S, Γ, E, s0) = (S , Γ,E , s0), where consequence of the recent additions. 0 n 0 0o S = S ∪ s : s ∈ S and s 7−→−→ s ∪ {s0} M 8.1 -closure graphs n g g o E0 = E ∪ s s0 : s ∈ S and s 7−→−→ s0 . Figuring out what edges to add as a consequence of another edge  M requires care, for adding an edge can have ramifications on distant  q  q0 Given a rooted pushdown system M, each application of the func- control states. Consider, for example, adding the -edge  tion F(M) accretes new edges at the frontier of the Dyck state into the following graph: graph. Once the algorithm reaches a fixed point, the Dyck state γ+ γ− q q 0 q graph is complete: 0 / q / 1 Theorem 7.1. DSG(M) = lfp(F(M)). As soon this edge drops in, an -edge “implicitly” appears between q0 and q1 because the net stack change between them is empty; the resulting graph looks like:

Proof. Let M = (Q, Γ, δ, q0). Let f = F(M). Observe that  n e b _ \ Y lfp(f) = f (∅, Γ, ∅, q0) for some n. When N ⊆ M, then it easy m h V Q to show that f(N) ⊆ M. Hence, DSG(M) ⊇ lfp(F(M)). < Ó γ γ To show DSG(M) ⊆ lfp(F(M)), suppose this is not the case. +  0 −  q0 / q / q / q1 Then, there must be at least one edge in DSG(M) that is not in lfp(F(M)). Let (s, g, s0) be one such edge, such that the state s where we have illustrated the implicit -edge as a dotted line. is in lfp(F(M)). Let m be the lowest natural number such that s To keep track of these implicit edges, we will construct a sec- appears in f m(M). By the definition of f, this edge must appear in ond graph in conjunction with the Dyck state graph: an -closure f m+1(M), which means it must also appear in lfp(F(M)), which graph. In the -closure graph, every edge indicates the existence of is a contradiction. Hence, DSG(M) ⊆ lfp(F(M)). a no-net-stack-change path between control states. The -closure graph simplifies the task of figuring out which states and edges are impacted by the addition of a new edge. 7.1 Complexity: Polynomial and exponential Formally, an -closure graph, is a pair G = (N,H), where N To determine the complexity of this algorithm, we ask two ques- is a set of states, and H ⊆ N × N is a set of edges. Of course, all tions: how many times would the algorithm invoke the iteration -closure graphs are reflexive: every node has a self loop. We use function in the worst case, and how much does each invocation cost the symbol ECG to denote the class of all -closure graphs. in the worst case? The size of the final Dyck state graph bounds the We have two notations for finding ancestors and descendants of run-time of the algorithm. Suppose the final Dyck state graph has a state in an -closure graph G = (N,H): m states. In the worst case, the iteration function adds only a single ←−  0 0 G [s] = s :(s , s) ∈ H [ancestors] edge each time. Since there are at most 2|Γ|m2 + m2 edges in the 2 2 −→  0 0 final graph, the maximum number of iterations is 2|Γ|m + m . G [s] = s :(s, s ) ∈ H [descendants]. The cost of computing each iteration is harder to bound. The cost of determining whether to add a push edge is proportional to 8.2 Integrating a work-list the size of the stack alphabet, while the cost of determining whether Since we only want to consider new states and edges in each to add an -edge is constant, so the cost of determining all new push iteration, we need a work-list, or in this case, two work-graphs. and pop edges to add is proportional to |Γ|m + m. Determining A Dyck state work-graph is a pair (∆S, ∆E) in which the set ∆S whether or not to add a pop edge is expensive. To add the pop edge contains a set of states to add, and the set ∆E contains edges to be γ− 0 3 s  s , we must prove that there exists a configuration-path to added to a Dyck state graph. We use ∆DSG to refer to the class the control state s, in which the character γ is on the top of the of all Dyck state work-graphs. stack. This reduces to a CFL-reachability query [Melski and Reps An -closure work-graph is a set ∆H of new -edges. We use 3 3 2000] at each node, the cost of which is O(|Γ±| m ) [Kodumal ∆ECG to refer to the class of all -closure work-graphs. and Aiken 2004]. To summarize, in terms of the number of reachable control 8.3 A new fixed-point iteration-space states, the complexity of the most recent algorithm is: Instead of consuming a Dyck state graph and producing a Dyck 2 2 3 3 4 5 O((2|Γ|m + m ) × (|Γ|m + m + |Γ±| m )) = O(|Γ| m ). state graph, the new fixed-point iteration function will consume and produce a Dyck state graph, an -closure graph, a Dyck state work- While this approach is polynomial in the number of reachable graph and an -closure work graph. Hence, the iteration space of control states, it is far from efficient. In the next section, we provide an optimized version of this fixed-point algorithm that maintains a 3 Technically, a work-graph is not an actual graph, since ∆E 6⊆ ∆S × work-list and an -closure graph to avoid spurious recomputation. Γ± × ∆S; a work-graph is just a set of nodes and a set of edges. 0 checks whether a new state could produce any new push edges or F (M) = f, where no-change edges. We can represent its behavior diagrammatically: M = (Q, Γ, δ, q0) s 0 0 0 0 f(G, G, ∆G, ∆H) = (G ,G, ∆G , ∆H − H), where δ γ+  (S, Γ, E, s0) = G ×Ö ÕÔ δ 0  ÐÑ ÒÓ 00 (S, H) = G q q (∆S, ∆E) = ∆G which means if adding control state s: [  0 (∆E0, ∆H0) = sprout M (s) add edge s  q if it exists in δ, and γ 00 s∈∆S add edge s  + q if it exists in δ. [ 0 (∆E1, ∆H1) = addPushM (G, G)(s, γ+, s ) Formally: 0 (s,γ+,s )∈∆E sprout (Q,Γ,δ)(s) = (∆E, ∆H), where [ 0 (∆E2, ∆H2) = addPopM (G, G)(s, γ−, s ) 0  γ γ  (s,γ−,s )∈∆E n   o + + ∆E = s  q : s  q ∈ δ ∪ s  q : s  q ∈ δ [ 0 (∆E3, ∆H3) = addEmpty (G, G)(s, s ) M n  o (s,,s0)∈∆E ∆H = s  q : s  q ∈ δ . [ 0 (∆E4, ∆H4) = addEmpty M (G, G)(s, s ) Considering the consequences of a new push edge Once our (s,s0)∈∆H algorithm adds a new push edge to a Dyck state graph, there is S0 = S ∪ ∆S a chance that it will enable new pop edges for the same stack frame 0 somewhere downstream. If and when it does enable pops, it will E = E ∪ ∆E also add new edges to the -closure graph. The addPush function: H0 = H ∪ ∆H addPush(Q,Γ,δ) : DSG × ECG → δ → (P (δ) × P (Q × Q)), 0 ∆E = ∆E0 ∪ ∆E1 ∪ ∆E2 ∪ ∆E3 ∪ ∆E4 checks for -reachable states that could produce a pop. We can ∆S0 = s0 :(s, g, s0) ∈ ∆E0 represent this action diagrammatically: 0 ∆H = ∆H ∪ ∆H ∪ ∆H ∪ ∆H ∪ ∆H γ+  γ− 0 1 2 3 4 s / q / q0 / q00 0 0  δ G = (S ∪ ∆S, Γ,E , q0) G 0 0 0  G = (S ,H ) ×Ö ÕÔ ×Ö ÕÔ ÐÑ ÒÓ ÐÑ ÒÓ  ∆G0 = (∆S0 − S0, ∆E0 − E0). γ which means if adding push-edge s  + q: 0 0 γ 00 Figure 3. The fixed point of the function F (M) contains the if pop-edge q  − q is in δ, then Dyck state graph of the rooted pushdown system M. 0 γ− 00 add edge q  q , and 00 add -edge s  q . the new algorithm is: Formally: γ+ addPush (G, G )(s q) = (∆E, ∆H), where IDSG = DSG × ECG × ∆DSG × ∆ECG. (Q,Γ,δ)    γ γ  0 − 00 0 −→ 0 − 00 (The I in IDSG stands for intermediate.) ∆E = q  q : q ∈ G [q] and q  q ∈ δ

 γ  8.4 The -closure graph work-list algorithm 00 0 −→ 0 − 00 ∆H = s  q : q ∈ G [q] and q  q ∈ δ . 0 The function F : RPDS → (IDSG → IDSG) generates the required iteration function (Figure3). Please note that we implicitly Considering the consequences of a new pop edge Once the al- distribute union across tuples: gorithm adds a new pop edge to a Dyck state graph, it will create at  0 0 0 least one new -closure graph edge and possibly more by matching (X,Y ) ∪ (X ,Y ) = (X ∪ X,Y ∪ Y ). up with upstream pushes. The addPop function:

The functions sprout, addPush, addPop, addEmpty calculate addPop(Q,Γ,δ) : DSG × ECG → δ → (P (δ) × P (Q × Q)), the additional the Dyck state graph edges and -closure graph edges checks for -reachable push-edges that could match this pop-edge. (potentially) introduced by a new state or edge. We can represent this action diagrammatically:

Sprouting Whenever a new state gets added to the Dyck state γ+  γ− s / s0 / s00 / q graph, the algorithm must check whether that state has any new  δ I edges to contribute. Both push edges and -edges do not depend on the current stack, so any such edges for a state in the pushdown  ×Ö ÕÔ ×Ö ÕÔ system’s transition function belong in the Dyck state graph. The  ÐÑ ÒÓ ÐÑ ÒÓ 00 γ sprout function: which means if adding pop-edge s  − q: γ+ 0 sprout (Q,Γ,δ) : Q → (P (δ) × P (Q × Q)), if push-edge s  s is already in the Dyck state graph, then 0 add -edge s  q. Theorem 8.1. lfp(F (M)) = (DSG(M),G, (∅, ∅), ∅). Formally: Proof. The proof is similar in structure to the previous one. γ 00 − addPop(Q,Γ,δ)(G, G)(s  q) = (∆E, ∆H), where 8.6 Complexity: Still exponential, but more efficient  γ  0 ←− 00 + 0 As with the previous algorithm, to determine the complexity of ∆E = ∅ and ∆H = s  q : s ∈ G [s ] and s  s ∈ G . this algorithm, we ask two questions: how many times would the Considering the consequences of a new -edge Once the algo- algorithm invoke the iteration function in the worst case, and how rithm adds a new -closure graph edge, it may transitively have to much does each invocation cost in the worst case? The run-time of add more -closure graph edges, and it may connect an old push to the algorithm is bounded by the size of the final Dyck state graph (perhaps newly enabled) pop edges. The addEmpty function: plus the size of the -closure graph. Suppose the final Dyck state graph has m states. In the worst case, the iteration function adds 2 2 addEmpty (Q,Γ,δ) : only a single edge each time. There are at most 2|Γ|m +m edges DSG × ECG → (Q × Q) → (P (δ) × P (Q × Q)), in the Dyck state graph and at most m2 edges in the -closure graph, checks for newly enabled pops and -closure graph edges: Once which bounds the number of iterations. again, we can represent this action diagrammatically: Next, we must reason about the worst-case cost of adding an edge: how many edges might an individual iteration consider? In   the worst case, the algorithm will consider every edge in every   iteration, leading to an asymptotic time-complexity of: 2 2 2 2 4 γ+      γ− O((2|Γ|m + 2m ) ) = O(|Γ| m ). s / s0 / s00 / s000 / s0000 / q   G δ O While still high, this is a an improvement upon the previous algo- rithm. For sparse Dyck state graphs, this is a reasonable algorithm. ×Ö ÕÔ  ×Ö ÕÔ ÐÑ ÒÓ  ÐÑ ÒÓ 9. Polynomial-time complexity from widening

 In the previous section, we developed a more efficient fixed-point  algorithm for computing a Dyck state graph. Even with the core 00 000 improvements we made, the algorithm remained exponential in the which@A means if adding -edge s  s : BC worst case, owing to the fact that there could be an exponential 0000 γ if pop-edge s  − q is in δ, then number of reachable control states. When an abstract interpreta- tion is intolerably complex, the standard approach for reducing add -edge s  q; and complexity and accelerating convergence is widening [Cousot and 0000 γ add edge s  − q; Cousot 1977]. (Of course, widening techniques trade away some add -edges s0 s000, s00 s0000, and s0 s0000. precision to gain this speed.) It turns out that the small-step variants    of finite-state CFAs are exponential without some sort of widening Formally: as well. addEmpty (G, G )(s00 s000) = (∆E, ∆H), where To achieve polynomial time complexity for pushdown control- (Q,Γ,δ)   flow analysis requires the same two steps as the classical case: γ  0000 − 0 ←− 00 0000 −→ 000 (1) widening the abstract interpretation to use a global, “single- ∆E = s  q : s ∈ G [s ] and s ∈ G [s ] and γ threaded” store and (2) selecting a monovariant allocation function + 0 s  s ∈ G to collapse the abstract configuration-space. Widening eliminates a  0 ←− 00 0000 −→ 000 source of exponentiality in the size of the store; monovariance elim- ∆H = s  q : s ∈ G [s ] and s ∈ G [s ] and inates a source of exponentiality from environments. In this section, γ + 0 we redevelop the pushdown control-flow analysis framework with s s ∈ G  a single-threaded store and calculate its complexity. n 0 000 0 ←− 00 o ∪ s s : s ∈ G [s ]  9.1 Step 1: Refactor the concrete semantics n 00 0000 0000 −→ 000 o ∪ s  s : s ∈ G [s ] First, consider defining the reachable states of the concrete seman- tics using fixed points. That is, let the system-space of the evalua- n 0 0000 0 ←− 00 0000 −→ 000 o ∪ s  s : s ∈ G [s ] and s ∈ G [s ] . tion function be sets of configurations: C ∈ System = P (Conf ) = P (Exp × Env × Store × Kont). 8.5 Termination and correctness Because the iteration function is no longer monotonic, we have to We can redefine the concrete evaluation function: prove that a fixed point exists. It is trivial to show that the Dyck state E(e) = lfp(fe), where fe : System → System and graph component of the iteration-space ascends monotonically with f (C) = {I(e)} ∪ c0 : c ∈ C and c ⇒ c0 . each application; that is: e Lemma 8.1. Given M ∈ RPDS,G ∈ DSG such that G ⊆ M, if 9.2 Step 2: Refactor the abstract semantics 0 0 0 0 0 F (M)(G, G, ∆G) = (G ,G, ∆G ), then G ⊆ G . We can take the same approach with the abstract evaluation func- Since the size of the Dyck state graph is bounded by the original tion, first redefining the abstract system-space: M   pushdown system , the Dyck state graph will eventually reach a Cˆ ∈ System\ = P Conf\ fixed point. Once the Dyck state graph reaches a fixed point, both work-graphs/sets will be empty, and the -closure graph will also   = P Exp × Envd × Store\ × Kont[ , stabilize. We can also show that this algorithm is correct: and then the abstract evaluation function: With a monovariant allocation scheme that eliminates abstract en- vironments, the number of iterations ultimately reduces to: Eˆ(e) = lfp(fˆe), where fˆe : System\ → System\ and n o n 0 0o |Exp| × (2|Vard| + 1) × |Exp| + |Var| × |Lam|, fˆe(Cˆ) = Iˆ(e) ∪ cˆ :c ˆ ∈ Cˆ and cˆ ; cˆ . which means that, in the worst case, the algorithm makes a cubic What we’d like to do is shrink the abstract system-space with a number of iterations with respect to the size of the input program.4 refactoring that corresponds to a widening. The worst-case cost of the each iteration would be dominated by a CFL-reachability calculation, which, in the worst case, must 9.3 Step 3: Single- the abstract store consider every state and every edge: 3 3 We can approximate a set of abstract stores {σˆ1,..., σˆn} with O(|Var| × |Exp| ). the least-upper-bound of those stores: σˆ t · · · t σˆ . We can 1 n Thus, each iteration takes O(n6) and there are a maximum of exploit this by creating a new abstract system space in which the O(n3) iterations, where n is the size of the program. So,total store is factored out of every configuration. Thus, the system-space 9 contains a set of partial configurations and a single global store: complexity would be O(n ) for a monovariant pushdown control- flow analysis with this scheme, where n is again the size of the 0   System\ = P PConf\ × Store\ program. Although this algorithm is polynomial-time, we can do better. πˆ ∈ PConf\ = Exp × Env × Kont[ . d 9.5 Step 5: Reintroduce -closure graphs We can factor the store out of the abstract transition relation as well, Replicating the evolution from Section8 for this store-widened σˆ so that ( ) ⊆ PConf\ × (PConf\ × Store\): analysis, we arrive at a more efficient polynomial-time analysis. _ An -closure graph in this setting is a set of pairs of store-less, σˆ 0 0 0 0 0 0 0 0 (e, ρ,ˆ κˆ) ((e , ρˆ , κˆ ), σˆ ) iff (e, ρ,ˆ σ,ˆ κˆ) ; (e , ρˆ , σˆ , κˆ ), continuation-less partial states: _ 0 0   ˆ0 which gives us a new iteration function, fe : System\ → System\ , ECG[ = P PState\ × PState\ . ˆ0 ˆ ˆ0 0 fe(P, σˆ) = (P , σˆ ), where Then, we can set the system space to include -closure graphs:   000 0 0 σˆ 0 00 System\ = DSG[ × ECG[ × Store\. Pˆ = πˆ :π ˆ (ˆπ , σˆ ) ∪ {πˆ0} _ Before we redefine the iteration function, we need another fac-   0 G 00 σˆ 0 00 tored transition relation. The stack- and action-factored transition σˆ = σˆ :π ˆ (ˆπ , σˆ ) σˆ _ relation (+g ) ⊆ PState\ × PState\ × Store determines if a tran- sition is possible under the specified store and stack-action: (ˆπ0, hi) = Iˆ(e). (e, ρˆ) +σˆ ((e0, ρˆ0), σˆ0) iff (e, ρ,ˆ σ,ˆ κˆ) ; (e0, ρˆ0, σˆ0, φˆ :κ ˆ0) ˆ 9.4 Step 4: Dyck state control-flow graphs φ+ Following the earlier Dyck state graph reformulation of the push- (e, ρˆ) +σˆ ((e0, ρˆ0), σˆ0) iff (e, ρ,ˆ σ,ˆ φˆ :κ ˆ) ; (e0, ρˆ0, σˆ0, κˆ0) ˆ down system, we can reformulate the set of partial configurations as φ− a Dyck state control-flow graph.A Dyck state control-flow graph σˆ (e, ρˆ) + ((e0, ρˆ0), σˆ0) iff (e, ρ,ˆ σ,ˆ κˆ) ; (e0, ρˆ0, σˆ0, κˆ0). is a frame-action-labeled graph over partial control states, and a  partial control state is an expression paired with an environment: Now, we can redefine the iteration function (Figure4). 00 System\ = DSCFG\ × Store\ Theorem 9.1. Pushdown 0CFA can be computed in O(n6)-time, where n is the size of the program. DSCFG\ = P(PState\ ) × P(PState\ × Frame\ ± × PState\ ) Proof. As before, the maximum number of iterations is cubic in ψˆ ∈ PState\ = Exp × Env. d the size of the program for a monovariant analysis. Fortunately, the In a Dyck state control-flow graph, the partial control states are cost of each iteration is also now bounded by the number of edges partial configurations which have dropped the continuation com- in the graph, which is also cubic. ponent; the continuations are encoded as paths through the graph. If we wanted to do so, we could define a new monotonic iter- 10. Applications ation function analogous to the simple fixed-point formulation of Pushdown control-flow analysis offers more precise control-flow Section7: 00 00 analysis results than the classical finite-state CFAs. Consequently, fˆe : System\ → System\ , pushdown control-flow analysis improves flow-driven optimiza- tions (e.g., constant propagation, global , inlin- again using CFL-reachability to add pop edges at each step. ing [Shivers 1991]) by eliminating more of the false positives that A preliminary analysis of complexity Even without defining the block their application. system-space iteration function, we can ask, How many iterations The more compelling applications of pushdown control-flow will it take to reach a fixed point in the worst case? This question analysis are those which are difficult to drive with classical control- is really asking, How many edges can we add? And, How many flow analysis. Perhaps not surprisingly, the best examples of such entries are there in the store? Summing these together, we arrive at 4 the worst-case number of iterations: In computing the number of frames, we note that in every continuation, the variable and the expression uniquely determine each other based on DSCFG edges store entries the let-expression from which they both came. As a result, the number of z }| { z }| { abstract frames available in a monovariant analysis is bounded by both the |PState\ | × |Frame\ ±| × |PState\ | + |Addr[ | × |Clod| . number of variables and the number of expressions, i.e., |Frame\ | = |Var|. ˆ ˆ ˆ ˆ ˆ0 ˆ0 ˆ 0 00 frames on the pop-edges determine the frames which could have f((P, E), H, σˆ) = ((P , E ), H , σˆ ), where been on the stack when in the control state. The frames that are live ( ) on the stack determine the procedures that are live on the stack. Ev- φˆ + 0 0 σˆ 0 0 ery procedure that is live on the stack has a read-dependence on any Tˆ+ = (ψˆ ψˆ , σˆ ): ψˆ + (ψˆ , σˆ )  ˆ resource being read in the control state, while every procedure that φ+ is live on the stack also has a write-dependence on any resource be- ˆ n ˆ  ˆ0 0 ˆ σˆ ˆ0 0 o T = (ψ  ψ , σˆ ): ψ + (ψ , σˆ ) ing written in the control state. This logic is the direct complement  of “if f calls g and g accesses a, then f also accesses a.” φˆ  00 − 000 0 00 σˆ 000 0 Tˆ− = (ψˆ ψˆ , σˆ ): ψˆ + (ψˆ , σˆ ) and  ˆ φ− 11. Related work φˆ + 0 Pushdown control-flow analysis draws on work in higher-order ψˆ ψˆ ∈ Eˆ and  control-flow analysis [Shivers 1991], abstract machines [Felleisen ˆ0 ˆ00 ˆ ψ  ψ ∈ H and Friedman 1987] and abstract interpretation [Cousot and Cousot 0 1977]. Tˆ = Tˆ+ ∪ Tˆ ∪ Tˆ− 0 n 0o Context-free analysis of higher-order programs The closest re- Eˆ = eˆ : (ˆe, ) ∈ Tˆ lated work for this is Vardoulakis and Shivers very recent work 00 G n 0 0 0o on CFA2 [Vardoulakis and Shivers 2010]. CFA2 is a table-driven σˆ = σˆ :( , σˆ ) ∈ Tˆ summarization algorithm that exploits the balanced nature of calls n o and returns to improve return-flow precision in a control-flow anal- Hˆ = ψˆ ψˆ00 : ψˆ ψˆ0 ∈ Hˆ and ψˆ0 ψˆ00 ∈ Hˆ     ysis. Though CFA2 alludes to exploiting context-free languages, φˆ context-free languages are not explicit in its formulation in the ˆ  ˆ ˆ000 ˆ + ˆ0 ˆ ˆ0 ˆ00 ˆ H+− = ψ  ψ : ψ  ψ ∈ E and ψ  ψ ∈ H same way that pushdown systems are in pushdown control-flow φˆ analysis. With respect to CFA2, pushdown control-flow analysis is ˆ00 − ˆ000 ˆ and ψ  ψ ∈ E polyvariant, covers direct-style, and the monovariant instatiation is lower in complexity (CFA2 is exponential-time). ˆ 0 ˆ ˆ H = H ∪ H+− On the other hand, CFA2 distinguishes stack-allocated and n g o Pˆ0 = Pˆ ∪ ψˆ0 : ψˆ ψˆ0 . store-allocated variable bindings, whereas our formulation of push-  down control-flow analysis does not and allocates all bindings in the store. If CFA2 determines a binding can be allocated on the Figure 4. An -closure graph-powered iteration function for push- stack, that binding will enjoy added precision during the analysis down control-flow analysis with a single-threaded store. and is not subject to merging like store-allocated bindings.

Calculation approach to abstract interpretation Midtgaard and analyses are escape analysis and interprocedural dependence anal- Jensen[2009] systematically calculate 0CFA using the Cousot- ysis. Both of these analyses are limited by a static analyzer’s abil- Cousot-style calculational approach to abstract interpretation [Cousot ity to reason about the stack, the core competency of pushdown 1999] applied to an ANF λ-calculus. Like the present work, Midt- control-flow analysis. (We leave an in-depth formulation and study gaard and Jensen start with the CESK machine of Flanagan et al. of these analyses to future work.) [1993] and employ a reachable-states model. The analysis is then constructed by composing well-known Galois connections to re- 10.1 Escape analysis veal a 0CFA incorporating reachability. The abstract semantics approximate the control stack component of the machine by its top In escape analysis, the objective is to determine whether a heap- element. The authors remark monomorphism materializes in two allocated object is safely convertible into a stack-allocated object. mappings: “one mapping all bindings to the same variable,” the In other words, the is trying to figure out whether the other “merging all calling contexts of the same function.” Essen- frame in which an object is allocated outlasts the object itself. In tially, the pushdown 0CFA of Section4 corresponds to Midtgaard higher-order languages, closures are candidates for escape analysis. and Jensen’s analsysis when the latter mapping is omitted and the Determining whether all closures over a particular λ-term lam stack component of the machine is not abstracted. may be heap-allocated is straightforward: find the control states in the Dyck state graph in which closures over lam are being created, CFL- and pushdown-reachability techniques This work also then find all control states reachable from these states over only - draws on CFL- and pushdown-reachability analysis [Bouajjani edge and push-edge transitions. Call this set of control states the et al. 1997, Kodumal and Aiken 2004, Reps 1998, Reps et al. 2005]. “safe” set. Now find all control states which are invoking a closure For instance, -closure graphs, or equivalent variants thereof, ap- over lam. If any of these control states lies outside of the safe set, pear in many context-free-language and pushdown reachability al- then stack-allocation may not be safe; if, however, all invocations gorithms. For the less efficient versions of our analyses, we implic- lie within the safe set, then stack-allocation of the closure is safe. itly invoked these methods as subroutines. When we found these algorithms lacking (as with their enumeration of control states), we 10.2 Interprocedural dependence analysis developed Dyck state graph construction. In interprocedural dependence analysis, the goal is to determine, for CFL-reachability techniques have also been used to compute each λ-term, the set of resources which it may read or write when classical finite-state abstraction CFAs [Melski and Reps 2000] it is called. Might and Prabhu showed that if one has knowledge and type-based polymorphic control-flow analysis [Rehof and of the program stack, then one can uncover interprocedural depen- Fahndrich¨ 2001]. These analyses should not be confused with dencies [Might and Prabhu 2009]. We can adapt that technique to pushdown control-flow analysis, which is computing a fundamen- work with Dyck state graphs. For each control state, find the set tally more precise kind of CFA. Moreover, Rehof and Fahndrich’s of reachable control states along only -edges and pop-edges. The method is cubic in the size of the typed program, but the types may be exponential in the size of the program. In addition, our technique SIGPLAN-SIGACT Symposium on Principles of Programming Lan- is not restricted to typed programs. guages, pages 25–37, New York, NY, USA, 1998. ACM. Ahmed Bouajjani, Javier Esparza, and Oded Maler. Reachability analysis Model-checking higher-order recursion schemes There is ter- of pushdown automata: Application to model-checking. In CONCUR minology overlap with work by Kobayashi[2009] on model- ’97: Proceedings of the 8th International Conference on Concurrency checking higher-order programs with higher-order recursion schemes, Theory, pages 135–150, London, UK, 1997. Springer-Verlag. which are a generalization of context-free grammars in which Patrick Cousot. The calculational design of a generic abstract interpreter. productions can take higher-order arguments, so that an order- In M. Broy and R. Steinbruggen,¨ editors, Calculational System Design. 0 scheme is a context-free grammar. Kobyashi exploits a re- NATO ASI Series F. IOS Press, Amsterdam, 1999. sult by Ong[2006] which shows that model-checking these re- Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice cursion schemes is decidable (but ELEMENTARY-complete) by model for static analysis of programs by construction or approximation transforming higher-order programs into higher-order recursion of fixpoints. In Conference Record of the Fourth ACM Symposium on schemes. Given the generality of model-checking, Kobayashi’s Principles of Programming Languages, pages 238–252, New York, NY, technique may be considered an alternate paradigm for the analysis USA, 1977. ACM Press. of higher-order programs. For the case of order-0, both Kobayashi’s Matthias Felleisen and Daniel P. Friedman. A calculus for assignments technique and our own involve context-free languages, though ours in higher-order languages. In POPL ’87: Proceedings of the 14th is for control-flow analysis and his is for model-checking with re- ACM SIGACT-SIGPLAN Symposium on Principles of Programming spect to a temporal logic. After these surface similarities, the tech- Languages, pages 314+, New York, NY, USA, 1987. ACM. niques diverge. Moreover, there does not seem to be a polynomial- Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The time variant of Kobyashi’s method. essence of compiling with continuations. In PLDI ’93: Proceedings of the ACM SIGPLAN 1993 conference on Programming Language Design Other escape and dependence analyses We presented escape and and Implementation, volume 28, pages 237–247, New York, NY, USA, dependence analyses to prove a point: that pushdown control-flow June 1993. ACM. analysis is more powerful than classical control-flow analysis, in Naoki Kobayashi. Types and higher-order recursion schemes for verifica- the sense that it can answer different kinds of questions. We have tion of higher-order programs. In POPL ’09: Proceedings of the 36th not yet compared our analyses with the myriad escape and depen- Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program- dence analyses (e.g.,[Blanchet 1998]) that exist in the literature, ming Languages, pages 416–428, New York, NY, USA, 2009. ACM. though we do expect that, with their increased precision, our anal- John Kodumal and Alex Aiken. The set constraint/CFL reachability con- yses will be strongly competitive. nection in practice. In PLDI ’04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, 12. Conclusion pages 207–218, New York, NY, USA, 2004. ACM. David Melski and Thomas W. Reps. Interconvertibility of a class of set con- Pushdown control-flow analysis is an alternative paradigm for straints and context-free-language reachability. Theoretical Computer the analysis of higher-order programs. By modeling the run-time Science, 248(1-2):29–98, October 2000. program stack with the stack of a pushdown system, pushdown Jan Midtgaard and Thomas P. Jensen. Control-flow analysis of function control-flow analysis precisely matches returns to their calls. We calls and returns by abstract interpretation. In ICFP ’09: Proceedings derived pushdown control-flow analysis as an abstract interpre- of the 14th ACM SIGPLAN International Conference on Functional tation of a CESK machine in which its stack component is left Programming, pages 287–298. ACM, 2009. unbounded. As this abstract interpretation ranged over an infi- Matthew Might. Environment Analysis of Higher-Order Languages. PhD nite state-space, we sought a decidable method for determining thesis, Georgia Institute of Technology, June 2007. th reachable states. We found one by converting the abstracted Matthew Might and Tarun Prabhu. Interprocedural dependence analysis CESK into a PDA that recognized the language of legal control- of higher-order programs via stack reachability. In Proceedings of state sequences. By intersecting this language with a specific regu- the 2009 Workshop on Scheme and , Boston, lar language and checking non-emptiness, we were able to answer Massachussetts, USA, 2009. control-flow questions. From the PDA formulation, we refined the C. H. Luke Ong. On model-checking trees generated by higher-order recur- technique to reduce complexity from doubly exponential, to best- sion schemes. In 21st Annual IEEE Symposium on Logic in Computer case exponential, to worst-case exponential, to polynomial. We Science (LICS’06), pages 81–90. IEEE, 2006. ended with an efficient, polyvariant and precise framework. Jakob Rehof and Manuel Fahndrich.¨ Type-based flow analysis: From poly- Future work Pushdown control-flow analysis exploits the fact morphic subtyping to CFL-reachability. In POPL ’01: Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Program- that clients of static analyzers often need information about control ming Languages, pages 54–66, New York, NY, USA, 2001. ACM. states rather than stacks. Should clients require information about complete configurations—control states plus stacks—our analy- Thomas Reps. Program analysis via graph reachability. Information and Software Technology, 40(11-12):701–726, December 1998. sis is lacking. Our framework represents configurations as paths through Dyck state graphs. Its results can provide a regular de- Thomas Reps, Stefan Schwoon, Somesh Jha, and David Melski. Weighted scription of the stack, but at a cost proportional to the size of the pushdown systems and their application to interprocedural dataflow analysis. Science of Computer Programming, 58(1-2):206–263, 2005. graph. For a client like abstract garbage collection, which would pay this cost for every edge added to the graph, this cost is unac- Olin G. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD ceptable. Our future work will examine how to incrementally sum- thesis, Carnegie Mellon University, 1991. marize stacks paired with each control state during the analysis. Michael Sipser. Introduction to the Theory of Computation. Course Technology, 2 edition, February 2005. Acknowledgements We thank Dimitrios Vardoulakis for com- David Van Horn and Matthew Might. Abstracting abstract machines. ments and discussions, and reviewers of ICFP and this workshop. In ICFP ’10: Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming. ACM, September 2010. References Dimitrios Vardoulakis and Olin Shivers. CFA2: a Context-Free Approach European Symposium on Programming Bruno Blanchet. Escape analysis: Correctness proof, implementation and to Control-Flow Analysis. In (ESOP) LNCS experimental results. In POPL ’98: Proceedings of the 25th ACM , volume 6012 of , pages 570–589. Springer, 2010.