Shape analysis of higher-order programs: A colorless green idea?

Matthew Might University of Utah matt.might.net www.ucombinator.org Is shape analysis of higher-order programs meaningful? What is shape analysis of higher-order programs? It’s still shape analysis, but with different words. address :: binding structure :: binding environment heap :: value environment shape analysis :: environment analysis Why bother? Top-down reason: Need to move beyond CFAs. Bottom-up reason Bottom-up reason

CFA Pointer analysis Bottom-up reason

CFA Pointer analysis Bottom-up reason

CFA Pointer analysis

Shape analysis Bottom-up reason

CFA Pointer analysis

? Shape analysis What is “higher order?” The essence of higher-order: Lambda calculus. Syntax

Variables; function abstractions; applications. Syntax

Variables; function abstractions; applications.

v (λ (v) e) (e1 e2) Semantics

Value = Value → Value No integers. No floats. No arrays. No structs. No pointers. No mutation. Lambda-calculus lacks linked, mutable, dynamic structures. Shape analysis studies linked, mutable, dynamic structures. So, does shape analysis of the λ-calculus mean anything? Do functions have shape?

What determines the shape of these functions? Parameters. f(x) = ax2 + bx + c f(x) = A sin(ωx + φ) f(x) = A sin(ωx + φ) f = λx.A sin(ωx + φ) Free variables determine function shape. What determines the value of free variables? Environments. Function = Closure = Lambda-term + Environment λx.A sin(ωx + φ) π (λx.A sin(ωx + φ),[A=1,ω=1,φ= /2]) cos Environments are linked, mutable, dynamic data structures. Shape analysis studies linked, mutable, dynamic structures. Shape analysis of the λ-calculus is environment analysis. Shape analysis determines the meaning of functions. Same tools apply

• Singleton abstraction • Relational abstraction • Heap/shape predicates But first, do environments matter? CFA Limitation: Super-β inlining Inlining a function based on flow infor- mation is blockedApplication: by the lack Inlining of environmental precision in control-flow analysis. Shivers termed the inlining of a function based on flow information super-β in- lining [27], because it is beyond the reach of ordinary β-reduction. Consider: (let ((f (lambda (x h) (if x (h) (lambda () x))))) (f #t (f #f nil))) Nearly any CFA will find that at the call site (h), the only procedure ever invoked is a closure over the lambda term (lambda () x). The lambda term’s only free variable, x, is in scope at the invocation site. It feels safe to inline. Yet, if the replaces the reference to h with the lambda term (lambda () x), the meaning of the program will change from #f to #t. This happens because the closure that gets invoked was closed over an earlier binding of x (to #f), whereas the inlined lambda term closes over the binding of x currently in scope (which is to #t). Programs like this mean that functional must be conservative when they inline based on information obtained from a CFA. If the inlined lambda term has a free variable, the inlining could be unsafe.

Specific problem To determine the safety of inlining the lambda term lam at the call site [[(f ...)]], we need to know that for every environment ρ in which this call is evaluated, that ρ[[ f]] = ( lam, ρ￿) and ρ(v)=ρ￿(v) for each free variable v in the term lam.2

CFA Limitation: Globalization Sestoft identified globalization as a second blindspot of control-flow analysis [25]. Globalization is an optimization that con- verts a procedure parameter into a global variable when it is safe to do so. Though not obvious, globalization can also be cast as a problem of reasoning about en- vironments: if, for every state of execution, all reachable environments which contain a variable are equivalent for that variable, then it is safe to turn that variable into a global.

Specific problem To determine the safety of globalizing the variable v,weneedto know that for each reachable state, for any two environments ρ and ρ￿ reachable inside that state, it must be that ρ(v)=ρ￿(v)ifv dom(ρ) and v dom(ρ￿). ∈ ∈

CFA Limitation: Compilers for imperative languages have found that it can be beneficial to rematerialize (to recompute) a value at its point of use if the values on which it depends are still available. On mod- ern hardware, rematerialization can decrease register pressure and improve cache performance. Functional languages currently lack analyses to drive rematerial- ization. Consider a trivial example: 2 The symbol ρ denotes a conventional variable-to-value environment map. CFA Limitation: Super-β inlining Inlining a function based on flow infor- mation is blockedApplication: by the lack Inlining of environmental precision in control-flow analysis. Shivers termed the inlining of a function based on flow information super-β in- lining [27], because it is beyond the reach of ordinary β-reduction. Consider: (let ((f (lambda (x h) (if x (h) (lambda () x))))) (f #t (f #f nil))) Nearly any CFA will find that at the call site (h), the only procedure ever invoked is a closure over the lambda term (lambda () x). The lambda term’s only free variable, x, is in scope at the invocation site. It feels safe to inline. Yet, if the compiler replaces the reference to h with the lambda term (lambda () x), the meaning of the program will change from #f to #t. This happens because the closure that gets invoked was closed over an earlier binding of x (to #f), whereas the inlined lambda term closes over the binding of x currently in scope (which is to #t). Programs like this mean that functional compilers must be conservative when they inline based on information obtained from a CFA. If the inlined lambda term has a free variable, the inlining could be unsafe.

Specific problem To determine the safety of inlining the lambda term lam at the call site [[(f ...)]], we need to know that for every environment ρ in which this call is evaluated, that ρ[[ f]] = ( lam, ρ￿) and ρ(v)=ρ￿(v) for each free variable v in the term lam.2

CFA Limitation: Globalization Sestoft identified globalization as a second blindspot of control-flow analysis [25]. Globalization is an optimization that con- verts a procedure parameter into a global variable when it is safe to do so. Though not obvious, globalization can also be cast as a problem of reasoning about en- vironments: if, for every state of execution, all reachable environments which contain a variable are equivalent for that variable, then it is safe to turn that variable into a global.

Specific problem To determine the safety of globalizing the variable v,weneedto know that for each reachable state, for any two environments ρ and ρ￿ reachable inside that state, it must be that ρ(v)=ρ￿(v)ifv dom(ρ) and v dom(ρ￿). ∈ ∈

CFA Limitation: Rematerialization Compilers for imperative languages have found that it can be beneficial to rematerialize (to recompute) a value at its point of use if the values on which it depends are still available. On mod- ern hardware, rematerialization can decrease register pressure and improve cache performance. Functional languages currently lack analyses to drive rematerial- ization. Consider a trivial example: 2 The symbol ρ denotes a conventional variable-to-value environment map. Environment in closure must match environment at call. Special environment problem

“Does env1(x) = env2(x)?” Application: Rematerialization

(f)

(lambda () z)

Compiler wants to inline, but z is out of scope at the call! Application: Rematerialization

((lambda () y))

(lambda () z)

Compiler wants to inline, but z is out of scope at the call! General environment problem

“Does env1(z) = env2(y)?” Approach: Build general solution atop special solution. Starting point: k-CFA for CPS In CPS, all calls must be tail calls. Functions never return, so no stack required. Small-step state-space

ς State = Call Env ∈ × ρ Env = Var ￿ Clo ∈ clo Clo = Lam Env ∈ × Small-step state-space

ς State = Call Env ∈ × ρ Env = Var ￿ Clo ∈ clo Clo = Lam Env ∈ × Split environments (Shivers, 1991)

ρ Env = Var ￿ Clo ∈ Split environments (Shivers, 1991)

ρ Env = Var ￿ Clo ∈ ρ Env = Var ￿ Clo ∈ Split environments (Shivers, 1991)

β BEnv = Var ￿ Bind ∈ ve VEnv = Bind ￿ Clo ∈ ς State = Call BEnv VEnv ∈ × × β BEnv = Var ￿ Bind ∈ ve VEnv = Bind ￿ Clo ∈ clo Clo = Lam BEnv ∈ × b Bind is some infinite set ∈ ς State = PC Struct Heap ∈ × × s Struct = Var ￿ Addr ∈ h Heap = Addr ￿ Tagged ∈ t Tagged = Type Struct ∈ × a Addr is some infinite set ∈ Solving the special problem Special problem

βˆ(v) = βˆ￿(v)

β(v) = β￿(v) Special problem

βˆ(v) = βˆ￿(v)

α α

β(v) = β￿(v) Special problem

βˆ(v) = βˆ￿(v)

α α

β(v) = β￿(v) Special problem

βˆ(v) = βˆ￿(v)

α α

β(v) = β￿(v) ? When does α ( b )= α ( b ￿ ) imply b = b ￿ ? When the abstract bindings are singleton abstractions. A singleton abstraction has only one concrete constituent. Next step: Engineer a singleton abstraction into semantics. Anodized bindings

Bindings Anodized bindings

Original Anodized Anodized bindings

g

Original Anodized

1 g− Anodization constraint

If g(b) and g(b￿) are reachable and α(b)=α(b￿),thenb = b￿. Policy example: Recency (Balakrishnan & Reps, 2006)

Anodize most-recently allocated binding. Solving the general problem What implies ve(b)=ve(b￿)? Fact 1: ve(b)=ve(b) Fact 2: ve(b)=ve(b￿) and ve(b￿)=ve(b￿￿) implies ve(b)=ve(b￿￿). When will we know that ve(b￿)=ve(b￿￿)? When b is bound to b￿ during function call. When (f x) calls (λ (v) call),weknowve(β(x)) = ve(β￿(v)). Solution: Track binding invariants as separate abstraction. Binding invariants

Π State Bind￿ Bind￿ ∈ ≡ ⊆ × 74 Relational Π s ŝ Mechanical

74 Relational Π Πʹ s ŝ ŝʹ Mechanical

74 Relational Π Πʹ s ŝ ŝʹ Mechanical

74 Relational Π Πʹ Πʹʹ Πʹʹʹ ... s ŝ ŝʹ ŝʹʹ ŝʹʹʹ ... Mechanical

74 βˆ(x) βˆ￿(y)? Relational ≡ Π Πʹ Πʹʹ Πʹʹʹ ... s ŝ ŝʹ ŝʹʹ ŝʹʹʹ ... Mechanical

74 Related work • Cousot & Cousot,1977, 1979, 1991, 1994. • Sagiv, Reps, & Wilhelm, 2002. • Ball et al., 2001. • Hudak et al., 1985. • Chase et al., 1990. • Shivers, 1988, 1991. • Jagannathan et al., 1998. CFA Limitation: Super-β inlining Inlining a function based on flow infor- mation is blocked by the lack of environmental precision in control-flow analysis. Shivers termed the inlining of a function based on flow information super-β in- lining [27], because it is beyond the reach of ordinary β-reduction. Consider: (let ((f (lambda (x h) (if x (h) (lambda () x))))) (f #t (f #f nil))) Nearly any CFA will find that at the call site (h), the only procedure ever invoked is a closure over the lambda term (lambda () x). The lambda term’s only free variable, x, is in scope at the invocation site. It feels safe to inline. Yet, if the compilerMore replaces the reference in topaperh with the lambda term (lambda () x), the meaning of the program will change from #f to #t. This happens because the closure that gets invoked was closed over an earlier binding of x (to #f), whereas the inlined lambda term closes over the binding of x currently in scope (which is to #t). Programs like this mean that functional compilers must be conservative when they inline based on information obtained from a CFA. If the inlined lambda term has a free variable, the inlining could be unsafe.

Specific problem To determine the safety of inlining the lambda term lam at the call site [[(f ...)]], we need to know that for every environment ρ in which this call is evaluated, that ρ[[ f]] = ( lam, ρ￿) and ρ(v)=ρ￿(v) for each free variable v in the term lam.2

CFA Limitation: Globalization Sestoft identified globalization as a second blindspot of control-flow analysis [25]. Globalization is an optimization that con- verts a procedure parameter into a global variable when it is safe to do so. Though not obvious, globalization can also be cast as a problem of reasoning about en- vironments: if, for every state of execution, all reachable environments which contain a variable are equivalent for that variable, then it is safe to turn that variable into a global.

Specific problem To determine the safety of globalizing the variable v,weneedto know that for each reachable state, for any two environments ρ and ρ￿ reachable inside that state, it must be that ρ(v)=ρ￿(v)ifv dom(ρ) and v dom(ρ￿). ∈ ∈

CFA Limitation: Rematerialization Compilers for imperative languages have found that it can be beneficial to rematerialize (to recompute) a value at its point of use if the values on which it depends are still available. On mod- ern hardware, rematerialization can decrease register pressure and improve cache performance. Functional languages currently lack analyses to drive rematerial- ization. Consider a trivial example: 2 The symbol ρ denotes a conventional variable-to-value environment map. Example 1. Suppose we have a concrete machine with three memory addresses: 0x01, 0x02 and 0x03. Suppose the addresses abstract so that α(0x01)=ˆa1 and α(0x02)=α(0x03)=ˆa . The addressa ˆ1 is a singleton abstraction, because it has only one concrete∗ constituent—0x01. After a pointer analysis, if some The constraints of a straightforward soundness theorem (Theorem 1) lead to Thepointer constraints variable of ap1 straightforwardpoints only to soundness addressa ˆ theorem￿ andan abstract another (Theorem transition pointer relation 1) lead variable over to this space:p2 an abstractpoints transition only to address relationa ˆ￿￿ overand thisa ˆ￿ =ˆ space:a1 =ˆa￿￿ then p1 must alias p2￿ . (([[(fe ...e ) ]] , βˆ, ve, tˆ), ) ❀ ((call, βˆ￿￿, ve￿, tˆ￿), ￿), where: 1 n ≡ ≡ ˆ ˆ ˆ ￿ ˆ ˆ di = (ei, β, ve) (([[In(fe order1 ...e ton solve) ]] , β the, ve super-, tˆ), )β❀inlining((call, problem,β￿￿, ve￿, tˆ￿) Shivers, ￿), where: informally￿ proposedE ￿ ˆ ￿￿ ˆ ≡ ≡ d0 ([[(λ (v1 ...vn) call)]] , β￿) a singleton abstraction for k-CFA which he termed “re-flow analysis” [27].￿ In ￿ ˆ ˆ ˆ semantics, binding environments map variables to bindings. A binding b is a di = (ei, β, ve) tˆ￿ = tick(call, tˆ) re-flow analysis, the CFA is re-run,E but with a “golden”commemorative contour token inserted minted for each at instance a of a variable receiving a value; ￿ ￿ for example, in k-CFA, a binding isˆ a variable name paired with the time-stamp ˆ ￿￿ ˆ bi = alloc￿(vi, tˆ￿) point of interest. The goldend contour—allocated0 ([[(λ (v1 ...vn only) callat once—is which)]] , itβ was￿) bound. a singleton The value environment ab-￿ ve tracks the denotable values ￿ (D) associated with every binding.ˆ A denotableˆ ˆ value d is a closure. straction by definition. While sound in theory,￿ re-flow analysis does not workB = inbi : bi Bind￿ 1 tˆ￿ = tick(call, tˆ) In CFAs, bindings—the atomic components of∈ environments—play the role ￿ 1 ￿ CFA Limitation: Super-β inlining Inlining a function based on flowthat infor- addresses do in pointer analysis.βˆ￿￿ =(ˆ Ourg ultimate− βˆ￿)[v goal isˆb to] infer relationships practice: the golden contour flows everywhere the non-golden contours flow, andBˆ i i mation is blocked by the lack of environmental precision in control-flow analysis.between the concrete values behind abstract bindings.￿→ For example, we want to Shivers termed the inliningˆ of a function based on flowˆ information super-β in- 1 1 inevitably, golden and non-goldenbi = contoursalloc￿(vi, aret￿) comparedbe able for to show equality. that bindings Neverthe- tove the￿ =(ˆ variableg− vev at) some[ˆb set(ˆ ofg− timesdˆ )], are equal, lining [27], because it is beyond the￿ reach of ordinary β-reduction. Consider: Bˆ ￿ i ￿→ Bˆ i less, we can salvage the spirit of Shivers’s golden contoursunder the through value environment,anodization to the bindings. to the variable x at some other set (let ((f (lambda (xˆ h) ˆ ˆ and singletonof times. bindings (In the pure areλ-calculus, reflexively the only equivalent: obvious relationships between bindings (if x B = bi : bi Bind￿ 1 are equality and inequality.) ￿ ￿ Under anodization, bindings are(h) not golden,∈ but may be temporarily gold-plated. In CFA theory, time-stamps also go by the less-intuitive name of contours. (lambda () x))))) ˆb Bind￿ 1 In anodization, the(f #t concrete (f #f nil)))ˆ and￿ abstract1 ˆ bindingsˆ ￿ Both are the split concrete into and the two abstract halves:∈ state-spaces leave the exact structure of β￿￿ =(ˆg−ˆ β￿)[vi bi] B time-stamps and bindings undefined.ˆb The￿ ˆb, choices for bindings determine the Nearly any CFA will find that at the call site (h), the￿→ only procedure ever ≡ invoked is a closure over the lambda term1(lambda () x). The lambdapolyvariance term’s1 of the analysis. Time-stamps encode the history of execution in only free variable, x, isve in scope￿ =(ˆ at theg− invocationve) site.[ˆb It feels safe(ˆg to−some inline.dˆ fashion,)], so that under abstraction, their structure determines the context Bind = Bind + Bind 1 Bˆ Bind￿iand= bindingsBind￿Bˆ i between+ Bind￿ singletons1, are trivially equivalent: Yet, if the compiler∞ replaces the reference to h with￿ the lambda￿→ term (lambdain context-sensitivity.∞ Formally,() x) the concrete transition rule must rebuild#f #t the value environment , the meaning of the program will change from to . This happensThe concrete and abstract state-spaces are linked by a parameterized second- withbecause every transition:the closure that gets invoked was closed over an earlier binding of x (to βˆ(e ) Bind￿ ˆb Bind￿ and singleton bindings are reflexively equivalent: Moreorder abstraction map, αiη : Σ inΣˆ,1 where the i parameterpaper1 η :(Addr Addr￿ ) and we assert “anodizing”#f), whereas the inlined bijections lambda term closes between over the binding these of x currently halves: in ∈ → ∈ → ∪ scope (which is to #t). Programs￿ ￿ like this mean￿ that functional compilers(Time must Time￿) abstracts bothβ bindingsˆ(e ) andˆb , times: ([[(fe1 ...en) ]] , β, ve,t) (call, β￿￿, ve￿,t￿), where: → i ￿ i be conservative whenbindings they inline in thebased set on⇒ informationBˆ: obtained from a CFA. If ≡ the inlined lambda term has a free variable,di = the(ei inlining, β, ve) could be unsafe. ˆ ￿ η η η η b Bind 1 E and untouched5 Application: bindingsα Higher-order( retaincall, β, ve their,t)=( equivalence:α rematerialization(ςV ), αΣ(=β),Callα (ve)BEnv, η(t)) VEnv Time ˆ g : Bind Bind 1 ￿￿ gˆ : Bind￿ Bind￿ 1, ςˆ Σ = Call BEnv￿ VEnv￿ Time￿ Specific problem To determine∈ the safetyd0 =([[ of inlining(λ ( thev1 lambda...vn) termcalllam)]]ˆ, βat￿) theˆ ˆFormally,η ˆ the concrete∈ transition rule× must rebuild× the× value environment∈ × × × ∞ → b￿∞ b→Bˆ and b =ˆαgBEnv(b￿) (β)=λv.η(β(v)) call site [[(f ...)]], we needˆ to knowˆ that for every environment1 ˆ ρ in which this with every transition:β BEnv = Var ￿ Bind ˆ ￿ ￿ t￿ = tick(call,tgˆ)−ˆ (b)=Now that we∈ have a generalizedˆbη ˆb environmentˆb ∈ Bˆ η analysis,ˆb Bˆ we can precisely state β BEnv = Var ￿ Bind call is evaluated, that ρ[[ f]]b = ( lam￿, ρb￿), and ρ(v)=ρ￿(v) forB each free variableˆ v α (￿ve)=λˆb. α (ve(￿b)) ∈ 2 b otherwise VEnv≡ ve ￿∈ VEnv = Bind￿∈ ￿ D in the term lam. ≡ bi = alloc(vi,t￿) the￿ condition under which higher-order rematerialization￿ is safe. Might’s work ˆ such that: ([[(fe1 ˆ...e∈η(nb))=ˆ ]]ˆb, β, ve,t) (call, β￿￿, ve￿,t￿), where: ve VEnv￿ = Bind￿ D b ￿ b￿, ⇒ B = 1b : b Bind on the1 correctness of1 super-β inliningη formallyη d D defined= Val safe to mean that the ∈ → gˆ− i dˆi,...,dˆ1 = gˆ− (dˆ ),...,gˆ− (dˆ ) α (d)= α≡ (d) di = (ei, β, ve) CFA Limitation: Globalizationˆ Sestoftˆ{ identified1 ∈ globalizationn} ˆtransformed as a secondˆ 1 programˆ andn the untransformedD Val ∈ program maintain a bisimulation dˆ Dˆ = (Val) and bindings between singletonsη are(b)= triviallyb iff ηB( equivalent:g1(b)) =g ˆ(b). B B { } E η val Valη = Clo ￿￿ blindspot of control-flow analysis [25].β Globalization￿￿ =(gB− β￿)[ isv ani optimizationbandi] bindings that con- re-bound toα themselves(lam, β)=(lam also, α retain(β)). their( equivalence:λ ( ) ) b ∈ P ￿ 1 ￿→ ˆ￿ in￿ their concrete1 ˆ executionsVal [16].￿ ∈ d0 =([[ v1 ...vn call ]] , β￿) verts a procedure parameter into a global variable1gˆ when−ˆ (lam it is safe, β to)=(1 do so.lam Though, gˆ−ˆ (β)) clo Clo = Lam BEnv val Val = Clo ve￿ =(gB− veB )[bi (gB− di)], B t￿ = tick(call,t) ∈ d not obvious, globalization2.3 Concrete can also and be cast abstract as a problem interpretation￿→ of reasoningTheorem about en- 4. It is safe to rematerialize the∈ expression e￿ in× place of the expres- ˆ ˆ 1 1 ˆ ˆ b Bindˆ is an infinite set of bindings ￿ Every abstract bindingvironments:β(e ) has if, for everyBind￿ two state variants, of execution,b all areachableBind summary￿gˆ− environments(βˆ)=sionλ variant,v.e whichgˆ−in( theβˆ(v call))b, site and call an iff for anodized everyβ(ei) reachablebi compoundbi = alloc( abstractvi,t￿) state of clo Clo = Lam BEnv i To evaluate a1 program1calli in the concreteBˆ 1 semantics,Bˆ its meaning is the set of ∈≡ c ∈ d d × wherecontain the de-anodization a variable are equivalent function for thatg− variable,:(BEnv thenBEnv it is safe) to(VEnv turn that VEnv) ˆ ∈ B ∈ the form ((call, βˆ￿￿, ve, tˆ), ),itisthecasethatt Time ˆis(e anB￿, βˆ=￿￿infinite, vebi )=(: bi setlamBind of￿,1βˆ times￿) and ˆ variant,g ˆ(b). Wevariable will into craft a global.states the reachable concrete from the initial and state→ abstract1ς0 =(call∪, [],ˆ[],t semantics0):1→ ˆ ∪ so thatβ theˆ(ei) an-￿ ˆbi. { ∈ } b Bind￿ is a finite set of bindings (D D) (Bind Bind) strips the anodizationgˆ− off(vebindings)=ˆλb. thatˆgˆ− ( abstractve(b)). toˆ ≡ ∈ E 1 c∈ d ˆ ˆ Bˆ (e, β2.2￿￿B,ˆve Transition)=(lam, β) rulesand the relation σ ≡Var Varβ￿￿is=( ag substitution− β￿)[v b that] uni- → ∪ → β(ei) ￿ bi, ς : ς ς . B i i odized variantany will bindingSpecific be problem in a the singletonTo set determineB: the abstraction. safety of globalizing the0 variable We∗ v mustE,weneedto anodize concrete bind-⊆ × ˆ ￿→ ˆ tˆ Time￿ is a finite set of times ≡ { ⇒ } fies the free variables￿ of lam￿ with lam and for each (v￿,v) ￿The1σ, β constraints￿(v￿) β1(v) of. a straightforward soundness theorem (Theorem 1) lead to know that for each reachable state, for any two environments ρ and ρ reachable ve￿ =(g∈B− ve)[bi ≡(gB− di)], ∈ 2.3 ConcreteA na¨ıveBecause abstract and the interpreter abstract concrete could semantics behave interpretation4.1 similarly, obey￿ Solving exploring the￿ uniqueness the the statesgeneralized reach-constraint environment (Equation 1), probleman abstract￿→ transition relation over this space: ings as well becauseinside the that state, concrete it1 must be that semanticsρ(v)=ρ￿(v)ifv havedom(ρ) and￿ tov employdom(ρ￿). ￿ the same anodization ablegB− from(b)= theb initial stateς ˆ =(call∈ , [], , tˆ0): ∈Proof.WithThe state-spaces proof of bisimulation defined, we has can a specifyFig. structure 1. theState-space identical concrete1 to transition for that the of thelambda relation proof calculus:for Concrete (left) and abstract (right). and untouched bindings retain theirthe abstract equivalence: interpretation may⊥ treat the set Bind￿ 1 whereas a set the ofde-anodization singleton abstrac- function gB− :(BEnv BEnv) (VEnv￿ ˆ VEnvˆ ) ˆ ˆ strategy as the abstractTo evaluate semantics a program incall orderin tothe prove concreteUndercorrectness soundness. theCPS, semantics, direct ( for) super- productΣ β Σinlining its; then abstraction, meaning inwe [16]. can define is the the its generalized corresponding set of environment abstraction(([[→ (fe1 ...e∪ undern theorem,) ]] ,→β, ve, t), ∪) ❀ ((call, β￿￿, ve￿, t￿), ￿), where: CFA Limitation:1 Rematerializationb η(b)=Compilersη(b￿)ςˆ for for:ˆς0 imperativesome❀∗ ςˆ g.(b￿) languagesB ⇒ η⊆ (D× ˆD) ˆ (Bind Bind) strips the anodization off bindings that abstract to≡ ≡ g−tions(g(b)) for = the purpose of testing{ binding} ∈the equality. map α ,(❀) →Σ Σ∪ . With→ the help of an argument-expression evaluator, have found thatB it can be beneficial to rematerialize (to recompute)which rules a value on the equalityany⊆ binding× of inindividual the set B: bindings, follows naturally: dˆ = ˆ(e , βˆ, ve) The concretestates semantics reachableIn practice, must from￿ wideningg also(b) the otherwise on obey value initial environments an state abstraction-uniqueness [5]ς0 accelerates=(:callExp convergence, BEnv[], [],t0VEnv [16,): 27].￿ constraintD: i i at itsˆ point ofˆ use if the valuesˆ on whichˆ it dependsˆ are stillˆ available. OnE mod- × × ￿ E ￿ b 1b b B 1 b B 6Relatedwork we can define the single concrete transition ruleˆ for CPS:￿￿ ˆ ern hardware,g− rematerialization(lam￿ , β)=(lam can,g decrease− (β)) register￿ pressure and improve cache 1 ˆ ˆ d0 ([[(λ (v1 ...vn) call)]] , β￿) B B Theorem 3. Given a compound abstractgB− (b)=b state ((call, β, ve, t), ) and two ab- over anodized bindings,performance.≡ Functionalso3.1 that Solving languages for￿∈ currently any the environment reachable lack analyses￿∈ to drive state problem rematerial- (call with, β anodization, ve,t): ≡ ￿ ￿ 2.41 Parameters1 for the analysis frameworkς stract: ς0 bindings,∗ ς . ˆb and ˆb ,ifα˙η(call(v,,ββ,,veve)=,t)ve(β(((v))call, βˆ, ve, tˆ), ) and η(b)=ˆb ˆ ˆ ization. ConsidergB− a trivial(β)= example:λv.gB− (β(v)) Clearly, this work draws￿ on the Cousots’E abstractb interpretationη(b)=η(b￿) for[5, 6]. some￿ Bindingg(b￿) B t￿ = tick(call, t) ˆ ˆ { ⇒ } 1 ￿ ([[(fe ...e ) ]]≡, β, ve,t) (call, β￿￿, ve￿,t￿), where: Time-stamp1 b incrementers￿ 1b￿, and binding allocators serve asˆ parameters:ˆ ˆ (glamB− (,gβ(,bve)))=( = lam, β), 1 n ∈ 2 The symbol ρgdenotes− (ve a)= conventionalλb.g− ( variable-to-valueve(b)). environmentandˆ map.ηinvariants(b￿)=ˆb￿ succeedand b theb Cousots’￿, thenE workve(b)= as ave relational(￿b￿g)(.b) abstraction otherwise of￿ higher-order ⇒ ˆ A na¨ıve abstractGivenB two interpreter≡ abstractB environments could behaveβ1 and similarly,β2, it is easy≡ exploring to determine the whether states the reach- bi = alloc￿(vi, tˆ￿) If g(b) dom(ve) and galloc(b￿): Var domTime (veBind) andalloc￿η(programsb: Var)=Time￿η [7,( 8],b￿)then withBind￿ the distinctionb = 1b￿ that. binding (1) invariants1 range over abstractdi = (ei, β, ve) ￿ g− (lam, β)=(lam,g− (β)) ￿ concrete constituents× → of these environmentsbindingsˆ × instead agree→ of on formal the parameters. valueB of some Binding subset invariantsB were also˙η inspired by E ∈ able from the initial∈ stateς ˆ =(call,Proof.[], , tBy0): the structure of the direct product abstraction α . ￿￿ Bˆ = ˆb : ˆb Bind￿ and bindings re-boundThe to corresponding themselves abstracttick also: Call transition retainTime ruleTime their must also equivalence:tick rebuild: Call theTime￿ value envi-Time￿. 1 1 d =([[(λ (v ...v i) calli )]] , β￿)1 of their domains, v1,...,vn : ⊥Gulwani et al.’s quantified abstractg domainsB− (β)=λ [9];v.gB− there(β(v is)) an implicit universal 0 1 n ∈ ronment with every transition: × → × → { } 1 1 ˆ ￿ 1 ˆ ˆ ￿ Time-stamps are designed to encode context/history.quantification Thus, the ranging abstract over time- concreteg constituents− (ve)=λb.g in− the(ve( definitionb)). of the abstrac-t￿ = tick(callβ,t￿￿ )=(ˆg− β￿)[vi bi] In other words, once the concrete semanticsη decidesςˆ :ˆς￿0 to❀η ∗ allocateςˆ .η an anodizedB bind- B Bˆ stampTheorem incrementer 2. Iftickαand(β the)= abstractionβˆ and mapαtionα(βη mapdecide)=αβ howˆ.,and This much workβ historyˆ ( alsov)= to fallsβˆ within(v) and andβ retainsˆ (v) the advantages of Schmidt’s ￿→ ￿ ˆ ˆ 1 ˆ {1 ˆ 2 } ≡2 1 2 1 1 1 ([[(fe1 ...eˆn) ]] , β, ve, t)ˆ❀ (call, β￿￿, ve￿, t￿), where: ∈ bi = alloc(vive,t￿￿)=(ˆg− ve) [ˆb (ˆg− dˆ )], ing, it must de-anodize existingretain￿β in( theei abstraction.) concretebi As a bindingsresult, the functionsmall-step whichtick determines abstract abstract the interpretiveThe context- corresponding to framework the same abstract [24]. As transition a generalization rule must of also control- rebuildFormally, the value the envi- concrete transitionBˆ ￿ rulei ￿→ mustBˆ rebuildi the value environment Bind 1, then β1(ˆv)=β2(ˆv). In practice,sensitivity widening of the≡ analysis. on￿di = valueˆ Similarly,(ei, β, environmentsve the) abstractflow binding analysis, [5] allocator accelerates the platform choosesronment how of with Section convergence every 2 transition: is a small-step [16, reformulation 27]. of Shivers’swithβ every￿￿ = β transition:￿[vi bi] abstract binding. Anodizationto allocate by abstract itself￿ bindings doesE to variables, not￿ dictate and indenotational doing￿ when so, itCFA fixesa the [27], concrete polyvari- which itself was seman- a extension of Jones’sand singleton original CFA bindings [13]. are reflexively￿→ equivalent: ˆ ˆˆ ￿￿ ˆ ￿ ￿ β(ei) ￿d0bi.([[(λ (v1 ...vn) call)]] , β￿) ￿ ve￿ = ve[bi di]. ￿ anceProof. of theBy analysis. the abstraction-uniqueness Once the parameters are fixed,Like constraint. the the Nielsons’semantics unifyingmust obey work a ([[ on(fe CFA...e [22],) ]] this, βˆ, ve work, tˆ) ❀ is( ancall implicit, βˆ￿￿, ve￿, argumenttˆ￿), where: ([[(fe ...e ) ]] , β, ve,t) (call, β￿￿, ve￿,t￿), where: tics should allocate an anodized binding;≡ ￿ this￿ is a policy decision; anodization1 n is ￿→1 ˆ n ⇒ straightforward soundnessˆ theorem: ˆ in favor of the inherent flexibility of abstract interpretation for the static analy- b Bind￿ 1 2.4 Parameters fort￿ the= tick analysis(call, t) framework dˆ = ˆ(e , βˆ, ve) ∈ di = (ei, β, ve) η sis of higher-order programs. In contrast with constraint-based,i E i type-based and E a mechanism. For simpleTheorem policies, 1. If α the(ς) parametersςˆ and ς ς￿, then therealloc existsand a statealloc￿ςˆ￿ such,byselecting that ￿ ￿ ˆ ˆ ˆ ￿￿ ˆ With the help of an￿ abstract￿ evaluator, : Exp bBEnv￿￿ b, VEnv￿ D: η ˆbi =￿ alloc￿(vi,⇒tˆ￿) ˆ ˆ d0 =([[(λ (v1 ...vn) call)]] , β￿) ςˆ3.2❀ ςˆ￿ and Implementingα (ς￿) ςˆ￿. anodizationmodel-checking efficiently CFAs, small-step abstract interpretived0 CFAs([[(λ are(v easy1 ...v ton) extendcall)]] , β￿) E × ≡ × → Time-stamp incrementers￿ ￿ and binding allocators serve as parameters: ￿ ￿ 4.1 Solvinganodized the or summary generalized bindings, environment jointlyˆ encodeˆ ˆ problem the policy.via direct products and parameterization. ˆ ˆ t￿ = tick(call,t) B = bi : bi Bind 1 t￿ =andtick bindings(call, t) betweenˆ ˆ singletons areˆ trivially equivalent: ∈ From shape analysis, anodized bindings draw on singleton abstraction(v, whileβ, ve)=ve(β(v)) b = alloc(v ,t) 3The Analogy: na¨ıve implementation Singleton abstraction of the abstract to binding transition anodization rule is inefficient: the de-ˆ E i i ￿ As an example of the simplest anodizationˆ ￿ 1 ˆ ˆ policy,￿ binding we invariants describe are inspired the by higher-both predicate-basedbi = alloc￿ abstractions(vi, tˆ￿) [3] and alloc : Var Timeβ￿￿ =(ˆg−1βBind￿)[vi bi] alloc￿ : Var Time￿ Bind￿ ￿ ˆ ˆ B = b : b Bind Under the direct product abstraction,anodizing function the generalizedg ˆB−ˆ must walk the environment abstract value environment theorem, with every ˆ ˆ β(ei) Bind￿ˆ1 bi Bind￿ 1i i 1 Focusing on our× goal of solving→Bˆ the generalized￿→ environmentthree-valued problem—reasoning logic analysis× [23]. Chase→ et. al had earlyˆ workˆ onˆ counting-based(lam, β, ve)= (lam, β) , { ∈ } order analog of Balakrishnan and Reps’s recency1 ˆ abstraction1 ˆ in Section 3.3. An B = bi : bi BindE 1 ∈ ∈ 1 transition. Evenve￿ in=(ˆ 0CFA,g− ve) this[bi walk(ˆg− addssingletondi)], a quadratic abstractions penalty [4], while to Hudak’s every work transition. on analysis of first-order∈ functional ￿ ￿ β￿￿ =(gB− β￿)[vi bi] about the equality of individualBˆ bindings—weBˆ turn to singleton abstraction [4]. βˆ(ei) ￿ ˆbi, ￿→ which rules on the equality oftick individual: Call Time bindings,Time￿ follows￿→ naturally:tick : Call Time￿ Time￿. ˆ ￿ 1 ˆ ˆ ￿ ￿ ￿ 1 1 SingletonTo avoid abstraction this walk, has been the used analysis in pointer should andprograms shape use analyses employed serial to numbers drive a precursor must- on to bindingscounting-based “under singletonβ￿￿ =(ˆg abstraction− β￿)[vi b [10].i] An- ≡ ve￿ =(g− ve)[bi (g− di)], example of a more complicated policy× is→ closure-focusing (Section× 3.4). →we can define an analogousBˆ ￿→ transition￿ rule for the abstract semantics:B ￿→ B ; we extend1 singleton abstraction,odization, and the framework using factored of anodiza- sets of singleton and non-singleton1 bindings, is most1 where the de-anodizationthe hood,” function so that:￿g ˆ− :(BEnv￿￿ BEnv￿) (VEnv￿ VEnv￿) ve￿ =(ˆandg− untouchedve) [ˆb bindings(ˆg− dˆ )], retain their equivalence: tion, to determine theBˆ equivalence of bindingsˆ to the same variable. That is, we Bˆ i Bˆ i 1 Theorem 3. Given aTime-stamps compound abstract are designed state to encode((→call, β context/history.∪closely, ve, relatedtˆ→), to) theand∪ Balakrishnan Thus, two the ab- and abstract Reps’s recency time- abstraction￿ [2],￿→ exceptwhere the de-anodization function g− :(BEnv BEnv) (VEnv VEnv) will be able to￿ solve the￿ environment problem with our￿ singleton abstraction, ￿ B → ∪ → ∪ (D D) (Val Val ) (Bind Bind) strips theBind￿ anodizationthatBind￿ anodization off ≡abstractNη. works on bindings instead of addresses,([[(fe and...e anodization) ]] , βˆ is(,Dve not, tˆ)D❀) (Bindcall, βˆ￿￿Bind, ve￿), tˆ strips￿), where: the anodization off bindings that abstract to → ∪ η → ∪ → ∞ 1 1 n ˆb ˆb￿ ˆb Bˆ ˆb￿ Bˆ ˆ stampˆ incrementer˙ but not the generalizedtick andenvironment the abstraction problem.ˆ ≈ In Sectionˆ map 4,× weα willwheredecide solve the the de-anodization how muchˆ function history￿g ˆ− to:(BEnv￿￿ BEnv￿) (→VEnv￿∪ VEnv￿→) stract bindings, b and b￿,ifα (call, β, ve,t) ((call, β, restrictedve, t), to a) most-recentand η( allocationb)= policy.b Superficially,Bˆ one might alsoany term binding in≡ the set B￿∈: ￿∈ generalized problem by bootstrapping binding invariants on top of anodization. → ∪ dˆ =→ˆ(e , βˆ, ve∪ˆ) ˆ ￿ retain￿ in￿ theThat￿ abstraction. is, the value environment As￿ aγ result, should theJones befunction and implemented≡ Bohr’stick work(D onasdeterminesD) terminationtwo(Val maps:Val analysis the) (Bind￿ context- of theBind￿ untyped) stripsλ-calculus the anodization via i off abstracti b ￿ b￿, ˆ ˆ ˆ A Galois connection [6] X Xˆ has a singleton abstraction iff there exists E 1 ≡ and η(b )=b and b b , then ve(b)=ve(b ). α size-change as another→ kind of∪ shape→ analysis∪ for higher-order→ programs [14].￿ ￿ ￿ ￿ ￿ ￿ −−←− −→ −− ￿ ˆ ￿￿gB− (b)=b ˆ ˆ ˆ￿ ˆ d0 ([[(λ (v1 ...vn) call)]] , β￿) ≡ sensitivitya of subset theX1 analysis.X such that Similarly,for allx ˆ X1, size the(γ(ˆx abstract)) = 1. Theˆ critical binding property￿ ￿ allocator of ￿ ￿ chooses howand bindings re-bound to themselves also retain their equivalence: ⊆ VEnv￿ (∈Bind￿ N D) (Bind￿ N). ￿ ￿ b η(b)=η(b￿) for some g(b￿) B singleton abstractions is that equality of abstract∞ ￿ representatives implies∞ equal- 1 to allocate abstract bindings≈ to variables,→ and→η in× doing￿ so,→ it fixes the polyvari- tˆ￿ = tickg(B−call(g(b,))tˆ) = ∈ Proof. By the structure of the directity of their product concrete constituents. abstraction Hence, when theα˙ set. X contains addresses, ￿g(b) otherwise βˆ(ei) ˆbi ance of thesingleton analysis. abstractions Once enable the must-alias parameters analysis. Analogously,ˆ areˆ fixed, when the the setˆ semanticsX must obey a 1 1 Given a split value environment ve =(f,h), a binding (b, n) is anodized only if ˆ ￿g− (lam,ˆβ)=(≡lam,g− (β)) contains bindings, singleton abstraction enables binding-equality testing. bi = allocB (viˆ, t￿) ˆ B ˆ ˆ ˆ ˆ ￿ β1(ei) ￿ bi. 1 straightforwardn = h(b soundness), and it is not theorem: anodized if n

1 but not the generalized environment problem. In Section 4, we will solve the where the de-anodization function￿g ˆ− :(BEnv￿￿ BEnv￿) (VEnv￿ VEnv￿) Bˆ → ∪ → ∪ generalized problem by bootstrapping binding invariants on top of anodization. (D D) (Val Val ) (Bind￿ Bind￿ ) strips the anodization off abstract γ ˆ → ∪ → ∪ → A Galois connection [6] X X has a singleton abstraction iff there exists ￿ ￿ ￿ ￿ −−←−α −→ −− a subset Xˆ1 Xˆ such that for allx ˆ Xˆ1, size(γ(ˆx)) = 1. The critical property of singleton abstractions⊆ is that equality∈ of abstract representatives implies equal- ity of their concrete constituents. Hence, when the set X contains addresses, singleton abstractions enable must-alias analysis. Analogously, when the set X contains bindings, singleton abstraction enables binding-equality testing. Shape analysis of higher-order programs exists. Shape analysis is useful. ¡Gracias!

matt.might.net I don’t know. Yes. No. Widening? Narrowing?