
The Semantics of Scheme Control-Flow Analysis

Olin Shivers

School of Computer Science, Carnegie Mellon

Pittsburgh, Pennsylvania 15213

Olin.Shivers@cs.cmu.edu

To appear at the First ACM SIGPLAN and IFIP Symposium on Partial Evaluation and Semantics-Based Program Manipulation, June 1991, Yale University, New Haven, Conn. Also available as Technical Report CMU-CS-91-119, CMU School of Computer Science.

Abstract

This is a follow-on to my 1988 PLDI paper, "Control-Flow Analysis in Scheme" [9]. I use the method of abstract semantic interpretations to explicate the control-flow analysis technique presented in that paper.

I begin with a denotational semantics for CPS Scheme. I then present an alternate semantics that precisely expresses the control-flow analysis problem. I abstract this semantics in a natural way, arriving at two different semantic interpretations giving approximate solutions to the flow analysis problem, each computable at compile time. The development of the final abstract semantics provides a clear, formal description of the analysis technique presented in "Control-Flow Analysis in Scheme."

Introduction

Control-flow analysis

Scheme control-flow analysis (CFA) is a useful technique for compile-time analysis of the control-flow structure of Scheme programs (or, more generally, programs written in languages allowing first-class functions). In a previous paper [9], I introduced the technique, gave an algorithm for it, and demonstrated two example optimisations (induction-variable elimination and useless-variable elimination) that could be achieved with the results of the analysis. Other useful applications of control-flow analysis are type recovery [11], and copy, constant and lambda propagation [13]. The fundamental ideas of control-flow analysis have also been utilised in other work on functional languages [8, 2].

The technique for performing Scheme control-flow analysis consists of translating the Scheme program into a simple intermediate representation: continuation-passing style (CPS) Scheme with a primitive functional conditional operator and all side-effects to variables converted into side-effects to data-structures. After the CPS conversion, all transfers of control in the program (sequencing, iteration, conditional transfers, procedure call/return) are represented as tail-recursive procedure calls. Thus the problem of determining the control-flow structure of the program reduces to the problem of determining for each call site the set of all lambda expressions in the program that could be branched to from that call site.

Non-standard abstract semantic interpretations

Non-standard abstract semantic interpretation is an elegant method for formally describing program analyses. Suppose we have a language $L$ with a denotational semantics $S$, and we wish to determine some property $X$ at compile time. Our first step is to develop an alternate semantics $S_X$ for $L$ that precisely expresses property $X$. That is, whereas semantics $S$ might say the meaning of a program is a function "computing" the program's result value given its inputs, semantics $S_X$ would say the meaning of a program is a function "computing" the property $X$ on its corresponding inputs.

$S_X$ is a precise definition of the property we wish to determine, but its precision typically implies that it cannot be computed at compile time. It might be uncomputable; it might depend on the runtime inputs. The second step, then, is to abstract $S_X$ to a new semantics, $\hat{S}_X$, which trades off accuracy for compile-time computability. This sort of approximation is a typical program-analysis tradeoff: the real answers we seek are uncomputable, so we settle for computable, conservative approximations to them.

For example, let property $X$ be the set of all "useless variables" in a program, where a useless variable is one referenced only to compute values bound to other useless variables. Such variables, and the computations referencing them, can then be eliminated from the program without altering its result. Our alternate semantics $S_X$ would map a program $P$ to a function that "computes" $P$'s useless-variable set. This semantics would probably be uncomputable, depending on perfect knowledge of the control-flow behavior of $P$. A useful, conservative abstraction $\hat{S}_X$ would be one that occasionally misses a truly useless variable, but never includes a useful variable in its result set.

The method of non-standard abstract semantic interpretation has several benefits. Since the analysis is expressed in terms of a formal semantics, it is possible to prove important properties about the analysis. In particular, we can prove that the abstract semantics $\hat{S}_X$ is computable, and safe with respect to $S_X$. Further, due to its formal nature, and because of its relation to the standard semantics of a programming language, the simple expression of an analysis in terms of abstract semantic interpretations helps clarify it. The abstract semantic interpretation method of program analysis has been applied to an array of program analyses [3, 5, 12, 4, 11].

In this paper, I will explicate Scheme control-flow analysis using this framework. I will show a series of semantics for CPS Scheme, beginning with the standard semantics, evolving through exact control-flow analysis, and ending up with two different computable abstractions (with different cost/precision tradeoffs).
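The claim that CPS turns every transfer of control into a tail call can be illustrated with a small sketch. This is not code from the paper: it uses Python rather than Scheme, the names `fact_direct` and `fact_cps` are hypothetical, and Python does not optimise tail calls, so it shows only the shape of the transformation.

```python
def fact_direct(n):
    # Direct style: the multiplication runs after the recursive call returns.
    return 1 if n == 0 else n * fact_direct(n - 1)


def fact_cps(n, k):
    # CPS: the "rest of the computation" is the explicit continuation k,
    # and every call (to fact_cps or to k) is in tail position.
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))


# The identity function plays the role of the terminal continuation.
result = fact_cps(5, lambda r: r)
```

In the CPS version the only control operation left is "call a function with arguments", which is why finding the functions reachable from each call site describes the whole control-flow structure.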

$$
\begin{array}{rcll}
\mathrm{PR} & ::= & \mathrm{LAM} \\
\mathrm{LAM} & ::= & [\![(\lambda\ (v_1 \ldots v_n)\ c)]\!] & [v_i \in \mathrm{VAR},\ c \in \mathrm{CALL}] \\
\mathrm{CALL} & ::= & [\![(f\ a_1 \ldots a_n)]\!] & [f \in \mathrm{FUN},\ a_i \in \mathrm{ARG}] \\
& & [\![(\mathtt{letrec}\ ((f_1\ l_1) \ldots)\ c)]\!] & [f_i \in \mathrm{VAR},\ l_i \in \mathrm{LAM},\ c \in \mathrm{CALL}] \\
\mathrm{ARG} & ::= & \mathrm{LAM} + \mathrm{VAR} + \mathrm{CONST} \\
\mathrm{FUN} & ::= & \mathrm{LAM} + \mathrm{VAR} + \mathrm{PRIM} \\
\mathrm{VAR} & ::= & \{\mathtt{x}, \mathtt{z}, \mathtt{foo}, \ldots\} \\
\mathrm{CONST} & ::= & \{\ldots\} \\
\mathrm{PRIM} & ::= & \{\mathtt{+}, \mathtt{if}, \ldots\}
\end{array}
$$

Figure 1: CPS Scheme Syntax

Notation

$D^{*}$ is used to indicate all vectors of finite length over the set $D$. Functions are updated with brackets: $e[a \mapsto b, c \mapsto d]$ is the function mapping $a$ to $b$, $c$ to $d$, and everywhere else identical to function $e$. This notation is extended by taking an update standing by itself to imply an update to the appropriate bottom function $\bot$; hence $[\,]$ is equivalent to $\bot$. Unused function variables are, by convention, subscripted with $x$, e.g., $e_x$. The power set of $D$ is $\mathcal{P}(D)$. Vectors are written $\langle a, p, z \rangle$. Lambda functions are sometimes written with a vector-destructuring syntax: the function $\lambda \langle a, b \rangle.\ \mathit{exp}$ takes a two element vector as its single argument, binding $a$ to the first element, and $b$ to the second. The $i$th element of vector $v$ is written $v{\downarrow}i$. Functions with power-set ranges can be joined with the $\sqcup$ operator: $f \sqcup g = \lambda x.\ (f\,x) \cup (g\,x)$. The "predomain" operator $+$ is used to construct the disjoint union of two sets: $A + B$. This operator does not introduce a new bottom element, and so the result object is just a set, not a domain; following Reynolds [7], I attempt to introduce domains only where necessary in semantic constructions, avoiding "spurious $\bot$ values."

CPS Scheme

The practice of converting programs into continuation-passing style (CPS) as an intermediate representation for compilation has been discussed in several papers [15, 6, 1]. CPS can be summarised by stating that function calls are one-way transfers: they do not return. So a function call can be viewed as a GOTO that passes values. In this section, we will define a very simple language, called CPS Scheme, for expressing programs written in this style.

The syntax of CPS Scheme is shown in figure 1. A program is a single lambda expression. Lambda expressions bind variables $v_1 \ldots v_n$; the body of a lambda expression must be a single call expression. There are two kinds of call expressions. A simple call expression is a function applied to a series of arguments. The function expression may only be a variable, a lambda or a primitive operation (primop). An argument expression may only be a variable, a lambda, or a constant. Notice that our syntax directly reflects the "function calls never return" prohibition of CPS: it is impossible to nest function calls, since a call such as (c d) is not a legitimate argument expression in another call. A letrec call expression is special syntax for establishing mutually recursive sets of functions. The lambda expressions $l_i$ are evaluated, and the result functions bound to the corresponding variables $f_i$. The $l_i$ are closed in an environment that includes the $f_i$. The inner call expression $c$ of the letrec form is then evaluated in this environment. This is simply a CPS version of the Scheme labels or letrec form. It would be possible to eliminate the letrec form by including the Y operator as a primop; this approach has been adopted in other presentations [6, 9]. However, we would like to limit the arguments we are willing to circularly close to be lambda expressions. By elevating this restriction to the syntactic level, we will simplify the semantics equations, and sidestep the Y operator. There is no syntax for assigning variables in this language. If we wish to allow side effects, we can introduce appropriate primops to create and side-effect mutable data structures; assignments to variables in the source language can be converted into equivalent data structure side effects during the CPS conversion [6]. Finally, we'll assume that all variables are unique in a program, that is, no identifier is bound by more than one lambda expression.

It bears emphasizing that this rather minimal language is a useful intermediate representation for compiling higher-order languages such as Scheme. Variants of CPS Scheme have been used in several Scheme and ML compilers. A full discussion of the many advantages of CPS-based intermediate representations, however, is beyond the scope of this paper [1, 6, 15].

CPS Scheme has a very simple semantics. The semantic domains and functions are given in figure 2. There is a set of basic values, Bas, which consists of the integers and a special false value (I will follow traditional Lisp practice in assuming no special boolean type; anything not false is a true value). The value set $D$ consists of the basic values and CPS Scheme functions. CPS Scheme functions are represented as functions from vectors of values to the answer set. The domain of answers Ans is the value set $D$ plus a special error element denoting a run-time error, and a bottom element denoting non-termination (see footnote 1). An environment is a function from variables to values.

Footnote 1. The bottom element $\bot$ also allows Ans and, hence, $D^{*} \to \mathrm{Ans}$ to be legitimate domains, necessary for the letrec expression to be well-defined. This sort of semantic fine detail is beyond the scope of this paper, and will not concern us further.

Note that an environment only maps to values in $D$: it will never map a variable to $\bot$ or error. This is one of the happy consequences of CPS conversion. Also note the absence of a store in this semantics. Side effects have been dropped completely from this semantics to simplify the presentation; they are not difficult to reinstate [10] once the basic methods outlined in this paper are understood.

$\mathcal{PR}$ maps a program to its result. It simply calls the $\mathcal{A}$ function to close its lambda $\ell$ in the empty environment $[\,]$, and calls the result function on a one-element argument vector, which contains the terminal continuation. If the terminal continuation is ever applied to a one-element argument vector $av$, that element $av{\downarrow}1$ is the result of running the program. If $av$ doesn't contain exactly one element, the program is aborted with a run-time error.

The $\mathcal{A}$ function evaluates function and argument expressions. It is defined by cases. A variable $v$ is simply looked up in the environment. A constant $k$ is passed to the constant function $\mathcal{K}$, which presumably maps numerals to their corresponding integers, and the

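$$
\begin{array}{lcl}
\mathrm{Bas} & = & \mathbb{Z} + \{\mathit{false}\} \\
D & = & \mathrm{Bas} + (D^{*} \to \mathrm{Ans}) \\
\mathrm{Ans} & = & (D + \{\mathit{error}\})_{\bot} \\
\mathrm{Env} & = & \mathrm{VAR} \to D \\[1ex]
\mathcal{PR} & : & \mathrm{PR} \to \mathrm{Ans} \\
\mathcal{C} & : & \mathrm{CALL} \to \mathrm{Env} \to \mathrm{Ans} \\
\mathcal{A} & : & \mathrm{ARG} \cup \mathrm{FUN} \to \mathrm{Env} \to D \\
\mathcal{K} & : & \mathrm{CONST} \to D \\
\mathcal{P} & : & \mathrm{PRIM} \to D \\[1ex]
\mathcal{PR}\,\ell & = & \mathcal{A}\,\ell\,[\,]\ \langle \lambda\,av.\ \mathit{length}(av) = 1 \to av{\downarrow}1,\ \mbox{otherwise}\ \mathit{error} \rangle \\
\mathcal{A}[\![v]\!]\,e & = & e\,v \\
\mathcal{A}[\![k]\!]\,e_x & = & \mathcal{K}\,k \\
\mathcal{A}[\![p]\!]\,e_x & = & \mathcal{P}\,p \\
\mathcal{A}[\![(\lambda\ (v_1 \ldots v_n)\ c)]\!]\,e & = & \lambda\,av.\ \mathit{length}(av) = n \to \mathcal{C}\,c\,e[v_i \mapsto av{\downarrow}i],\ \mbox{otherwise}\ \mathit{error} \\
\mathcal{C}[\![(f\ a_1 \ldots a_n)]\!]\,e & = & f' \notin \mathrm{Bas} \to f'\,\langle a'_1, \ldots, a'_n \rangle,\ \mbox{otherwise}\ \mathit{error} \\
& & \mbox{where}\ f' = \mathcal{A}\,f\,e,\quad a'_i = \mathcal{A}\,a_i\,e \\
\mathcal{C}[\![(\mathtt{letrec}\ ((f_1\ l_1) \ldots)\ c)]\!]\,e & = & \mathcal{C}\,c\,e' \\
& & \mbox{whererec}\ e' = e[f_i \mapsto \mathcal{A}\,l_i\,e'] \\
\mathcal{P}[\![\mathtt{+}]\!] & = & \lambda \langle a, b, c \rangle.\ \mbox{bad argument} \to \mathit{error},\ \mbox{otherwise}\ c\,\langle a + b \rangle \\
\mathcal{P}[\![\mathtt{if}]\!] & = & \lambda \langle p, c, a \rangle.\ \mbox{bad argument} \to \mathit{error}, \\
& & \quad \mbox{otherwise}\ (p \neq \mathit{false} \to c\,\langle\rangle,\ \mbox{otherwise}\ a\,\langle\rangle)
\end{array}
$$

Figure 2: Standard Semantics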

false identifier to the false value. The precise definition of $\mathcal{K}$ will not concern us further. A primop $p$ is passed to the primop function $\mathcal{P}$. A lambda $(\lambda\ (v_1 \ldots v_n)\ c)$ is mapped by $\mathcal{A}$ to a function that takes as its argument an argument vector $av$. If $av$ contains exactly as many elements as the lambda has formal parameters $v_i$, then the lambda's call body $c$ is evaluated in an environment which is the closure environment $e$ extended by mapping each $v_i$ to the $i$th element in the argument vector. If, however, $av$ has too few or too many elements to match up with the lambda's variables, the program is aborted, with the error value for result.

Note that it is always possible to evaluate a function or argument expression. Since function and argument expressions are so simple (variables, lambdas, primops, and constants), they can always be evaluated. This is why the range of $\mathcal{A}$ does not include $\bot$. Furthermore, the only possible case in which $\mathcal{A}$ might generate an error value lies in referencing an unbound variable. However, since CPS Scheme is lexically scoped, we can relegate this to syntax, and simply declare that programs containing unbound variable references are not syntactically legal CPS Scheme programs. Thus we can omit the error value as well from $\mathcal{A}$'s range.

The $\mathcal{C}$ function evaluates a call expression in a given environment. It is also defined by cases. In the simple function case, $\mathcal{C}$ evaluates the function expression and each of the argument expressions in the current environment. The argument values are packaged into an argument vector, which is passed to the function value. However, if the function expression evaluates to a non-function, the program is aborted with the error value for result. Given a letrec call expression, $\mathcal{C}$ establishes the inner environment by evaluating the lambda expressions $l_i$ in the appropriate recursive environment $e'$, and then evaluates the inner call $c$ in the result environment.

The $\mathcal{P}$ function must map each primop in CPS Scheme to an appropriate function. For expository purposes, I present its definition for two values: the + function, to illustrate ordinary primitive functions, and the if function, to show a primitive function that determines control-flow.

The + primop denotes a function whose argument is a triple composed of two integers $a$ and $b$, and a continuation function $c$. The integers are added, and the sum is packaged into a singleton argument vector which is passed to $c$. If the primop is called on a "bad argument," the program is aborted with the error value for result. For the purposes of +, an argument is bad if it contains fewer than or more than three elements; if its first or second element is not an integer; or if its third element is not a function.

The if primop denotes a function whose argument is a triple composed of some value $p$ and two continuation functions $c$ and $a$. If the argument does not conform to this requirement, a runtime error is returned. If the predicate $p$ is true (i.e., anything but the false value), the consequent continuation $c$ is called with no arguments (i.e., the empty argument vector). Similarly, if $p$ is false, the alternate continuation $a$ is called.
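The standard semantics described above can be read almost directly as an interpreter. The following is a minimal sketch under stated assumptions, not the paper's implementation: it is written in Python, and the tuple encoding of terms (`("lam", params, body)`, `("call", f, args)`, `("letrec", bindings, body)`, `("prim", name)`) and the `SchemeError` exception standing in for the error answer are hypothetical choices made for illustration. Non-termination is not modelled.

```python
class SchemeError(Exception):
    """Stands in for the semantics' `error` answer (an assumption of this sketch)."""


def A(exp, env):
    """Evaluate a function or argument expression in environment env."""
    if isinstance(exp, str):               # variable: look it up
        return env[exp]
    if exp is False or type(exp) is int:   # constant: already a basic value
        return exp
    if exp[0] == "prim":                   # primop: map to its function
        return PRIMS[exp[1]]
    _, params, body = exp                  # lambda: close over env

    def closure(av):
        if len(av) != len(params):
            raise SchemeError("wrong number of arguments")
        return C(body, {**env, **dict(zip(params, av))})
    return closure


def C(call, env):
    """Evaluate a call expression in environment env."""
    if call[0] == "letrec":
        _, bindings, body = call
        env2 = dict(env)
        for name, lam in bindings:         # closures share env2, so they see each other
            env2[name] = A(lam, env2)
        return C(body, env2)
    _, f, args = call
    fv = A(f, env)
    if not callable(fv):
        raise SchemeError("call to a non-function")
    return fv([A(a, env) for a in args])


def prim_add(av):
    if (len(av) != 3 or type(av[0]) is not int
            or type(av[1]) is not int or not callable(av[2])):
        raise SchemeError("bad argument to +")
    return av[2]([av[0] + av[1]])          # pass the sum to the continuation


def prim_if(av):
    if len(av) != 3 or not callable(av[1]) or not callable(av[2]):
        raise SchemeError("bad argument to if")
    p, c, a = av
    return c([]) if p is not False else a([])   # anything but false is true


PRIMS = {"+": prim_add, "if": prim_if}


def PR(lam):
    """Run a program: close it in the empty environment, pass the terminal continuation."""
    def stop(av):
        if len(av) != 1:
            raise SchemeError("terminal continuation expects one value")
        return av[0]
    return A(lam, {})([stop])


# (lambda (k) (+ 1 2 k))
prog_add = ("lam", ["k"], ("call", ("prim", "+"), [1, 2, "k"]))
# (lambda (k) (if false (lambda () (k 1)) (lambda () (k 2))))
prog_if = ("lam", ["k"], ("call", ("prim", "if"),
           [False, ("lam", [], ("call", "k", [1])), ("lam", [], ("call", "k", [2]))]))
```

Python's `return` is used only to thread the final answer back out; in the semantics itself no CPS Scheme function ever returns to its caller.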

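$$
\begin{array}{lcl}
\mathrm{Bas} & = & \mathbb{Z} + \{\mathit{false}\} \\
D & = & \mathrm{Bas} + (D^{*} \to \mathrm{VEnv} \to \mathrm{Ans}) \\
\mathrm{Ans} & = & (D + \{\mathit{error}\})_{\bot} \\
\mathrm{CN} & & \mbox{Contours} \\
\mathrm{BEnv} & = & \mathrm{LAB} \to \mathrm{CN} \\
\mathrm{VEnv} & = & (\mathrm{CN} \times \mathrm{VAR}) \to D \\[1ex]
nb & : & \mathrm{CN} \\
\mathcal{PR} & : & \mathrm{PR} \to \mathrm{Ans} \\
\mathcal{C} & : & \mathrm{CALL} \to \mathrm{BEnv} \to \mathrm{VEnv} \to \mathrm{Ans} \\
\mathcal{A} & : & \mathrm{ARG} \cup \mathrm{FUN} \to \mathrm{BEnv} \to \mathrm{VEnv} \to D \\
\mathcal{K} & : & \mathrm{CONST} \to D \\
\mathcal{P} & : & \mathrm{PRIM} \to D \\[1ex]
\mathcal{PR}\,\ell & = & \mathcal{A}\,\ell\,[\,]\,[\,]\ \langle \lambda\,av\ e_x.\ \mathit{length}(av) = 1 \to av{\downarrow}1,\ \mbox{otherwise}\ \mathit{error} \rangle\ [\,] \\
\mathcal{A}[\![v]\!]\,\beta\,e & = & e\,\langle \beta(\mathit{binder}\ v),\ v \rangle \\
\mathcal{A}[\![k]\!]\,\beta_x\,e_x & = & \mathcal{K}\,k \\
\mathcal{A}[\![p]\!]\,\beta_x\,e_x & = & \mathcal{P}\,p \\
\mathcal{A}[\![\ell{:}(\lambda\ (v_1 \ldots v_n)\ c)]\!]\,\beta\,e_x & = & \lambda\,av\ e.\ \mathit{length}(av) = n \to \mathcal{C}\,c\,\beta'\,e',\ \mbox{otherwise}\ \mathit{error} \\
& & \mbox{where}\ b = nb,\quad \beta' = \beta[\ell \mapsto b],\quad e' = e[\langle b, v_i \rangle \mapsto av{\downarrow}i] \\
\mathcal{C}[\![(f\ a_1 \ldots a_n)]\!]\,\beta\,e & = & f' \notin \mathrm{Bas} \to f'\,\langle a'_1, \ldots, a'_n \rangle\,e,\ \mbox{otherwise}\ \mathit{error} \\
& & \mbox{where}\ f' = \mathcal{A}\,f\,\beta\,e,\quad a'_i = \mathcal{A}\,a_i\,\beta\,e \\
\mathcal{C}[\![\ell{:}(\mathtt{letrec}\ ((f_1\ l_1) \ldots)\ c)]\!]\,\beta\,e & = & \mathcal{C}\,c\,\beta'\,e' \\
& & \mbox{where}\ b = nb,\quad \beta' = \beta[\ell \mapsto b],\quad e' = e[\langle b, f_i \rangle \mapsto \mathcal{A}\,l_i\,\beta'\,e] \\
\mathcal{P}[\![\mathtt{+}]\!] & = & \lambda \langle a, b, c \rangle\ e.\ \mbox{bad argument} \to \mathit{error},\ \mbox{otherwise}\ c\,\langle a + b \rangle\,e \\
\mathcal{P}[\![\mathtt{if}]\!] & = & \lambda \langle p, c, a \rangle\ e.\ \mbox{bad argument} \to \mathit{error}, \\
& & \quad \mbox{otherwise}\ (p \neq \mathit{false} \to c\,\langle\rangle\,e,\ \mbox{otherwise}\ a\,\langle\rangle\,e)
\end{array}
$$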

Figure 3: Factored Env Semantics

Factoring the environment

Before proceeding to the control-flow semantics, we must first develop a slight variant of our standard CPS Scheme semantics. This will make it easier to eventually abstract the semantics.

In the new semantics, called the factored-env semantics, we split the environment into two different structures: the contour environment and a global, shared variable environment. A contour environment maps a syntactic binding construct (a lambda or letrec call) to a contour. A contour is just some token (an integer will do) that serves to distinguish one binding instance from another. The variable environment maps a contour/variable pair $\langle b, v \rangle$ to a value.

This model requires us to alter our syntax slightly, by requiring that all our expressions have unique labels; for example, a lambda expression binding x and y carries a label $\ell$. Labels are drawn from the syntactic set LAB. To avoid cluttering our code, we'll suppress labels whenever convenient. We define a syntactic function $\mathit{binder}$ which maps a variable to the label of its binding lambda or letrec expression. Hence, in the example above, $\mathit{binder}[\![x]\!] = \ell$.

Our syntax augmented, we can now define our new semantics (figure 3). The $nb$ function creates new binding contours. We assume that each call to $nb$ produces a new, unused binding contour from CN. In this respect, $nb$ serves a "gensym" role, and is not a properly defined function. It is clear, however, that we could easily define $nb$ properly by modifying all the other functions in our factored-env semantics to pass around as an extra argument the set of binding contours already allocated; this set would serve as the argument to $nb$. Such modifications to the equations are straightforward, tedious, and obfuscatory; we will pass them by.

Since the variable environment is supposed to model a global table, all functions that could access or modify the environment must pass it around as an extra argument. CPS Scheme functions, for instance, are now represented by elements of $D^{*} \to \mathrm{VEnv} \to \mathrm{Ans}$: they are invoked with the current variable environment as a second argument. It is a general feature of this semantics that as the computation progresses forwards, the updated variable environment is passed forwards in a tail-recursive fashion.

The $\mathcal{A}$ function has changed for the variable and lambda cases. In the variable case, $\mathcal{A}$ looks up the binding contour $b$ introduced by the variable's binding lambda in the lexical contour environment. Then the contour/variable pair $\langle b, v \rangle$ is used to fetch the appropriate

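$$
\begin{array}{lcl}
\mathrm{Bas} & = & \mathbb{Z} + \{\mathit{false}\} \\
D & = & \mathrm{Bas} + (D^{*} \to \mathrm{LAB} \to \mathrm{Ans}) \\
\mathrm{Ans} & = & \mathrm{LAB} \to \mathcal{P}(\mathrm{LAB} + \mathrm{PRIM}) \\
\mathrm{Env} & = & \mathrm{VAR} \to D \\[1ex]
\mathcal{PR} & : & \mathrm{PR} \to \mathrm{Ans} \\
\mathcal{C} & : & \mathrm{CALL} \to \mathrm{Env} \to \mathrm{Ans} \\
\mathcal{A} & : & \mathrm{ARG} \cup \mathrm{FUN} \to \mathrm{Env} \to D \\
\mathcal{K} & : & \mathrm{CONST} \to D \\
\mathcal{P} & : & \mathrm{PRIM} \to D \\[1ex]
\mathcal{PR}\,\ell & = & \mathcal{A}\,\ell\,[\,]\ \langle \lambda\,av_x\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\ell_{top}\}] \rangle\ \mathsf{c}_{top} \\
\mathcal{A}[\![v]\!]\,e & = & e\,v \\
\mathcal{A}[\![k]\!]\,e_x & = & \mathcal{K}\,k \\
\mathcal{A}[\![p]\!]\,e_x & = & \mathcal{P}\,p \\
\mathcal{A}[\![\ell{:}(\lambda\ (v_1 \ldots v_n)\ c)]\!]\,e & = & \lambda\,av\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\ell\}] \sqcup (\mathit{length}(av) = n \to \mathcal{C}\,c\,e[v_i \mapsto av{\downarrow}i],\ \mbox{otherwise}\ [\,]) \\
\mathcal{C}[\![\mathsf{c}{:}(f\ a_1 \ldots a_n)]\!]\,e & = & f' \notin \mathrm{Bas} \to f'\,\langle a'_1, \ldots, a'_n \rangle\,\mathsf{c},\ \mbox{otherwise}\ \mathit{error} \\
& & \mbox{where}\ f' = \mathcal{A}\,f\,e,\quad a'_i = \mathcal{A}\,a_i\,e \\
\mathcal{C}[\![(\mathtt{letrec}\ ((f_1\ l_1) \ldots)\ c)]\!]\,e & = & \mathcal{C}\,c\,e' \\
& & \mbox{whererec}\ e' = e[f_i \mapsto \mathcal{A}\,l_i\,e'] \\
\mathcal{P}[\![\mathtt{+}]\!] & = & \lambda \langle a, b, c \rangle\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\mathtt{+}\}] \sqcup (\mbox{bad argument} \to [\,],\ \mbox{otherwise}\ c\,\langle a + b \rangle\,ic_{+,\mathsf{c}}) \\
\mathcal{P}[\![\mathtt{if}]\!] & = & \lambda \langle p, c, a \rangle\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\mathtt{if}\}] \sqcup (\mbox{bad argument} \to [\,], \\
& & \quad \mbox{otherwise}\ (p \neq \mathit{false} \to c\,\langle\rangle\,ic^{1}_{\mathit{if},\mathsf{c}},\ \mbox{otherwise}\ a\,\langle\rangle\,ic^{2}_{\mathit{if},\mathsf{c}}))
\end{array}
$$

Figure 4: Exact Control-Flow Semantics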

value from the variable environment. Factoring the environment means that $\mathcal{A}$ now closes a lambda expression in the current contour environment. It produces a function, which, when called, creates a new binding contour $b$ with $nb$, augments the contour environment $\beta$ with the new contour, and updates the variable environment $e$, binding parameters $v_1 \ldots v_n$ to the elements of argument vector $av$ in the new contour. Since $b$ is a new contour, these bindings won't collide with those made by other calls to this lambda.

Similarly, $\mathcal{C}$ introduces a new contour $b$ into the binding environment whenever it proceeds through a letrec call, and binds the lambda expressions $l_i$ in the new environment. It is interesting to note in passing that this formulation removes the circularity from the letrec environment update (since $l_i$ is a lambda expression, $\mathcal{A}$ ignores $e$).

Semantics of exact control-flow analysis

Now we may turn to control-flow analysis. To repeat an earlier point, the point of using a CPS-based intermediate representation is that all control transfers are represented with the same mechanism: tail-recursive procedure call. So the control-flow analysis problem for CPS Scheme is to find a function $\gamma : \mathrm{LAB} \to \mathcal{P}(\mathrm{LAB} + \mathrm{PRIM})$ that, given the label $\mathsf{c}$ of a call expression, will return $\gamma(\mathsf{c})$, the set of all lambdas and primops called from $\mathsf{c}$. We will call such a function a call cache.

This is easy to cast as an alternate semantics: the meaning of a program is its cache function. We can compute this cache function by taking the standard semantics and "instrumenting" it to record all the calls that happen during program execution. Instead of returning the actual value computed by the program, the new, non-standard semantics returns the cache constructed by the instrumentation.

Figure 4 presents the new, non-standard semantics. It is initially developed as an unfactored-env semantics. The answer domain Ans is the domain of cache functions; its bottom element is simply the function $\bot_{\mathrm{Ans}} = \lambda x.\ \emptyset$. CPS Scheme functions are now represented by functions that take an extra argument: the label $\mathsf{c}$ of the site from which they were called. The function augments the answer cache by including its CPS Scheme lambda $\ell$ in the set of lambdas called from $\mathsf{c}$. The meaning of call expressions is changed accordingly to pass along to the called function $f'$ the call site $\mathsf{c}$ from which it has been called.

The alterations to the primops are similarly straightforward. The + primop updates the cache to indicate where it has been called from, and passes to its continuation $c$ a marker pseudo-label $ic_{+,\mathsf{c}}$ to indicate that the continuation was called from its internal call site. The if primop is similar, its main variation being the use of two different internal call sites, one to mark consequent calls ($ic^{1}_{\mathit{if},\mathsf{c}}$), and one to mark alternate calls ($ic^{2}_{\mathit{if},\mathsf{c}}$).

The $\mathcal{PR}$ function starts the program by calling the top function, passing it a designated call-site label $\mathsf{c}_{top}$ to mark the top level call. The top level continuation has a designated label $\ell_{top}$ to indicate calls to it; when it is called, it records the final call and stops the program.
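The instrumentation described above can be sketched as a small interpreter that returns a call cache instead of a value. This is an illustration under stated assumptions, not the paper's code: it is in Python, the labelled tuple encoding (`("lam", label, params, body)`, `("call", label, f, args)`), the label strings, and the tuple-shaped internal call-site markers are all hypothetical, letrec is omitted for brevity, and a call to a non-function is assumed to contribute an empty cache.

```python
def join(c1, c2):
    """Pointwise union of two caches (dicts from call site to a set of callees)."""
    out = {k: set(v) for k, v in c1.items()}
    for k, v in c2.items():
        out.setdefault(k, set()).update(v)
    return out


def A(exp, env):
    if isinstance(exp, str):               # variable
        return env[exp]
    if exp is False or type(exp) is int:   # constant
        return exp
    if exp[0] == "prim":                   # primop
        return PRIMS[exp[1]]
    _, label, params, body = exp           # labelled lambda

    def closure(av, site):
        called = {site: {label}}           # record: `site` called this lambda
        if len(av) != len(params):
            return called                  # arity error adds nothing further
        return join(called, C(body, {**env, **dict(zip(params, av))}))
    return closure


def C(call, env):
    _, site, f, args = call
    fv = A(f, env)
    if not callable(fv):
        return {}                          # assumption: empty cache on a bad call
    return fv([A(a, env) for a in args], site)   # pass the call site along


def prim_add(av, site):
    called = {site: {"+"}}
    if (len(av) != 3 or type(av[0]) is not int
            or type(av[1]) is not int or not callable(av[2])):
        return called
    return join(called, av[2]([av[0] + av[1]], ("ic", "+", site)))


def prim_if(av, site):
    called = {site: {"if"}}
    if len(av) != 3 or not callable(av[1]) or not callable(av[2]):
        return called
    p, c, a = av
    if p is not False:
        return join(called, c([], ("ic1", "if", site)))
    return join(called, a([], ("ic2", "if", site)))


PRIMS = {"+": prim_add, "if": prim_if}


def PR(lam):
    def stop(av, site):
        return {site: {"l_top"}}           # record the final call and stop
    return A(lam, {})([stop], "c_top")


# l0:(lambda (k) c1:(if false l1:(lambda () c2:(k 1)) l2:(lambda () c3:(k 2))))
prog = ("lam", "l0", ["k"], ("call", "c1", ("prim", "if"),
        [False,
         ("lam", "l1", [], ("call", "c2", "k", [1])),
         ("lam", "l2", [], ("call", "c3", "k", [2]))]))
cache = PR(prog)
```

Because this semantics is exact, only the branch actually taken shows up: call site `c2` never appears in `cache`, since the predicate is false and the consequent is never called.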

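$$
\begin{array}{lcl}
\mathrm{Ans} & = & \mathrm{LAB} \to \mathcal{P}(\mathrm{LAB} + \mathrm{PRIM}) \\
D & = & \mathcal{P}(D)^{*} \to \mathrm{VEnv} \to \mathrm{LAB} \to \mathrm{Ans} \\
\mathrm{CN} & & \mbox{Contours} \\
\mathrm{BEnv} & = & \mathrm{LAB} \to \mathrm{CN} \\
\mathrm{VEnv} & = & (\mathrm{CN} \times \mathrm{VAR}) \to \mathcal{P}(D) \\[1ex]
nb & : & \mathrm{CN} \\
\mathcal{PR} & : & \mathrm{PR} \to \mathrm{Ans} \\
\mathcal{C} & : & \mathrm{CALL} \to \mathrm{BEnv} \to \mathrm{VEnv} \to \mathrm{Ans} \\
\mathcal{A} & : & \mathrm{ARG} \cup \mathrm{FUN} \to \mathrm{BEnv} \to \mathrm{VEnv} \to \mathcal{P}(D) \\
\mathcal{P} & : & \mathrm{PRIM} \to D \\[1ex]
\mathcal{PR}\,\ell & = & f\,\langle \{\lambda\,av_x\ e_x\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\ell_{top}\}]\} \rangle\ [\,]\ \mathsf{c}_{top} \\
& & \mbox{where}\ \{f\} = \mathcal{A}\,\ell\,[\,]\,[\,] \\
\mathcal{A}[\![v]\!]\,\beta\,e & = & e\,\langle \beta(\mathit{binder}\ v),\ v \rangle \\
\mathcal{A}[\![k]\!]\,\beta_x\,e_x & = & \emptyset \\
\mathcal{A}[\![p]\!]\,\beta_x\,e_x & = & \{\mathcal{P}\,p\} \\
\mathcal{A}[\![\ell{:}(\lambda\ (v_1 \ldots v_n)\ c)]\!]\,\beta\,e_x & = & \{\lambda\,av\ e\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\ell\}] \sqcup (\mathit{length}(av) = n \to \mathcal{C}\,c\,\beta'\,e',\ \mbox{otherwise}\ [\,])\} \\
& & \mbox{where}\ b = nb,\quad \beta' = \beta[\ell \mapsto b],\quad e' = e \sqcup [\langle b, v_i \rangle \mapsto av{\downarrow}i] \\
\mathcal{C}[\![\mathsf{c}{:}(f\ a_1 \ldots a_n)]\!]\,\beta\,e & = & \bigsqcup\,\{ f'\,\langle a'_1, \ldots, a'_n \rangle\,e\,\mathsf{c} \mid f' \in \mathcal{A}\,f\,\beta\,e \} \\
& & \mbox{where}\ a'_i = \mathcal{A}\,a_i\,\beta\,e \\
\mathcal{C}[\![\ell{:}(\mathtt{letrec}\ ((f_1\ l_1) \ldots)\ c)]\!]\,\beta\,e & = & \mathcal{C}\,c\,\beta'\,e' \\
& & \mbox{where}\ b = nb,\quad \beta' = \beta[\ell \mapsto b],\quad e' = e \sqcup [\langle b, f_i \rangle \mapsto \mathcal{A}\,l_i\,\beta'\,e] \\
\mathcal{P}[\![\mathtt{+}]\!] & = & \lambda \langle a, b, c \rangle\ e\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\mathtt{+}\}] \sqcup (\mbox{bad argument} \to [\,], \\
& & \quad \mbox{otherwise}\ \bigsqcup\,\{ c'\,\langle \emptyset \rangle\,e\,ic_{+,\mathsf{c}} \mid c' \in c \}) \\
\mathcal{P}[\![\mathtt{if}]\!] & = & \lambda \langle p, c, a \rangle\ e\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\mathtt{if}\}] \sqcup (\mbox{bad argument} \to [\,], \\
& & \quad \mbox{otherwise}\ \bigsqcup\,\{ c'\,\langle\rangle\,e\,ic^{1}_{\mathit{if},\mathsf{c}} \mid c' \in c \} \sqcup \bigsqcup\,\{ a'\,\langle\rangle\,e\,ic^{2}_{\mathit{if},\mathsf{c}} \mid a' \in a \})
\end{array}
$$

Figure 5: Control-Flow Semantics with ambiguous if and factored env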

Abstracting the control-flow analysis semantics

Now that we've defined our control-flow analysis semantics, we have a formal description of the control-flow problem. The next step is to abstract our semantics to a computable approximate semantics that is useful but safe. Since we do not in general know at compile time which way a conditional branch will go, we must abstract away conditional dependencies. (Had we included i/o in our original exact control-flow semantics, this would be the appropriate time to remove those dependencies, as well.) While we are abstracting, we'll go ahead and factor the environment as well, which will be useful in the next section. The resulting ambiguous-if factored-env semantics is presented in figure 5.

We have introduced three major changes into our new semantics. First, the environment has been factored. This is essentially identical to the factoring performed upon the standard CPS Scheme semantics in section 3. Second, the if primop now "branches both ways." That is, the caches arising from both the consequent and alternate continuations are computed; these are joined together to give the result cache returned by the if primop. Removing this data dependency has a further consequence: the semantics no longer needs the basic value domain Bas. Since our semantics concerns itself solely with control flow, the only values that need to be considered are those representing CPS Scheme functions. The final change is to arrange for argument and function expressions to evaluate to sets of values, instead of simple values. This is actually an isomorphic shift, since all the sets that result in the new formulation are singleton sets. The extra machinery will come in useful in the next approximation, however, since the semantics now tolerates ambiguity in argument evaluation: if a function expression can only be determined to lie in some set, the $\mathcal{C}$ function will find the call caches resulting from calling all the functions $f'$ in that set, and join them together to form the result cache. Also, note that because of the shift to value sets, variable environment updates are now performed with join operations, e.g., $e' = e \sqcup [\langle b, v_i \rangle \mapsto av{\downarrow}i]$, with bottom element $\bot_{\mathrm{VEnv}} = \lambda x.\ \emptyset$.
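The two ingredients of this abstraction, the join of set-valued functions and an if that branches both ways, can be sketched in a few lines. This is a hedged illustration in Python, not the paper's code: caches are modelled as dicts from call-site labels to sets, and `abstract_if`, its `call` callback, and the marker tuples `("ic1", site)` and `("ic2", site)` are hypothetical names introduced here.

```python
def join(f, g):
    """Pointwise union of two set-valued finite maps; a missing key means the empty set."""
    return {k: f.get(k, set()) | g.get(k, set()) for k in f.keys() | g.keys()}


def abstract_if(site, consequents, alternates, call):
    """Cache for an `if` called from `site` whose continuations are *sets* of functions.

    `call(k, s)` is assumed to return the cache produced by calling k from site s.
    """
    cache = {site: {"if"}}
    for k in consequents:                  # every possible consequent is called ...
        cache = join(cache, call(k, ("ic1", site)))
    for k in alternates:                   # ... and so is every possible alternate
        cache = join(cache, call(k, ("ic2", site)))
    return cache


# A stand-in for "calling" a continuation: it only records the call itself.
record = lambda k, s: {s: {k}}
both = abstract_if("c1", {"l1"}, {"l2"}, record)
merged = join({"c1": {"l1"}}, {"c1": {"l2"}, "c2": {"l3"}})
```

The result is a superset of what either concrete branch would record, which is what makes the approximation conservative: it may report calls that never happen, but it does not miss calls that do.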

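$$
\begin{array}{lcl}
\mathrm{Ans} & = & \mathrm{LAB} \to \mathcal{P}(\mathrm{LAB} + \mathrm{PRIM}) \\
D & = & \mathcal{P}(D)^{*} \to \mathrm{VEnv} \to \mathrm{LAB} \to \mathrm{Ans} \\
\mathrm{VEnv} & = & \mathrm{VAR} \to \mathcal{P}(D) \\[1ex]
\mathcal{PR} & : & \mathrm{PR} \to \mathrm{Ans} \\
\mathcal{C} & : & \mathrm{CALL} \to \mathrm{VEnv} \to \mathrm{Ans} \\
\mathcal{A} & : & \mathrm{ARG} \cup \mathrm{FUN} \to \mathrm{VEnv} \to \mathcal{P}(D) \\
\mathcal{P} & : & \mathrm{PRIM} \to D \\[1ex]
\mathcal{PR}\,\ell & = & f\,\langle \{\lambda\,av_x\ e_x\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\ell_{top}\}]\} \rangle\ [\,]\ \mathsf{c}_{top} \\
& & \mbox{where}\ \{f\} = \mathcal{A}\,\ell\,[\,] \\
\mathcal{A}[\![v]\!]\,e & = & e\,v \\
\mathcal{A}[\![k]\!]\,e_x & = & \emptyset \\
\mathcal{A}[\![p]\!]\,e_x & = & \{\mathcal{P}\,p\} \\
\mathcal{A}[\![\ell{:}(\lambda\ (v_1 \ldots v_n)\ c)]\!]\,e_x & = & \{\lambda\,av\ e\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\ell\}] \sqcup (\mathit{length}(av) = n \to \mathcal{C}\,c\,e',\ \mbox{otherwise}\ [\,])\} \\
& & \mbox{where}\ e' = e \sqcup [v_i \mapsto av{\downarrow}i] \\
\mathcal{C}[\![\mathsf{c}{:}(f\ a_1 \ldots a_n)]\!]\,e & = & \bigsqcup\,\{ f'\,\langle a'_1, \ldots, a'_n \rangle\,e\,\mathsf{c} \mid f' \in \mathcal{A}\,f\,e \} \\
& & \mbox{where}\ a'_i = \mathcal{A}\,a_i\,e \\
\mathcal{C}[\![\ell{:}(\mathtt{letrec}\ ((f_1\ l_1) \ldots)\ c)]\!]\,e & = & \mathcal{C}\,c\,e' \\
& & \mbox{where}\ e' = e \sqcup [f_i \mapsto \mathcal{A}\,l_i\,e] \\
\mathcal{P}[\![\mathtt{+}]\!] & = & \lambda \langle a, b, c \rangle\ e\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\mathtt{+}\}] \sqcup (\mbox{bad argument} \to [\,], \\
& & \quad \mbox{otherwise}\ \bigsqcup\,\{ c'\,\langle \emptyset \rangle\,e\,ic_{+,\mathsf{c}} \mid c' \in c \}) \\
\mathcal{P}[\![\mathtt{if}]\!] & = & \lambda \langle p, c, a \rangle\ e\ \mathsf{c}.\ [\mathsf{c} \mapsto \{\mathtt{if}\}] \sqcup (\mbox{bad argument} \to [\,], \\
& & \quad \mbox{otherwise}\ \bigsqcup\,\{ c'\,\langle\rangle\,e\,ic^{1}_{\mathit{if},\mathsf{c}} \mid c' \in c \} \sqcup \bigsqcup\,\{ a'\,\langle\rangle\,e\,ic^{2}_{\mathit{if},\mathsf{c}} \mid a' \in a \})
\end{array}
$$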

Figure 6: 0CFA

Computable control-flow analysis semantics

The problem with the previous semantics, abstracted though it may be, is that it is difficult to compute the fixed-point cache for a given program because the environment structure is infinite. Consider the following expression (written in full Scheme, not CPS Scheme, for clarity):

[Scheme listing damaged in extraction: a letrec that binds a recursive procedure loop, whose parameters include n and f, and then calls loop with m.]

The variable f is bound to an infinite set of functions, one for each $i \ge 0$. So it is difficult for any propagation-based fixed-point algorithm to know when to stop propagating.

There are only a finite number of lambda expressions in a given program; the infinite sets of functions arise because we can close these lambdas with an infinite set of environments. If we can collapse our infinite set of environments down to a finite approximation, then we can successfully compute a control-flow cache function.

If we examine the abstract semantics developed in the previous section (figure 5), we can see that the infinite environment structure is built with all the calls to the $nb$ function in the $\mathcal{C}$ and $\mathcal{A}$ functions. If we replaced each call $b = nb$ with $b = \ell$, we would then fold all contours created by a given lambda expression together, and our infinite environment set would collapse into a finite, manageable set.

This is precisely 0CFA, the Zeroth-Order Control-Flow Analysis technique presented in my PLDI '88 paper. As a final figure, I present the resultant 0CFA semantics in figure 6 (note that this is the degenerate contour environment case, and so this artifact has disappeared entirely).

An alternative approximation, only briefly mentioned as 1CFA in the earlier paper, is to distinguish contours created by calling a lambda from different call sites. Suppose, for example, that some lambda $(\lambda\ (x)\ \ldots)$ is called from two different call sites $\mathsf{c}_1$ and $\mathsf{c}_2$. In 1CFA, the values bound to $x$ by calls from $\mathsf{c}_1$ are kept distinct from the values bound to $x$ by calls from $\mathsf{c}_2$. This yields a tighter, higher-precision analysis.

We are now in a position to precisely express this approximation: replace the call to $nb$ in $\mathcal{A}$ with $\mathsf{c}$, the call site from which the lambda was called: $b = \mathsf{c}$. Since there are only a finite number of call sites, the environment structure this engenders is still finite in size, and hence our caches are still computable. However, the finer granularity (or increased environment structure size) will cause the cache to be more expensive to compute.

Thus, 0CFA and 1CFA allow us to trade off compile-time efficiency for compile-time precision of analysis.

Now that we've abstracted the control-flow analysis, the reason for factoring the environment should be clear. Factoring the environment exposed the binding mechanisms that gave rise to the infinite environment structure. Abstracting the contours and merging bindings was the critical step that allowed us to reduce this infinite structure to a finite, computable one.
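The difference between the exact semantics, 0CFA and 1CFA comes down to how a contour is chosen when a lambda is called. The sketch below illustrates only that choice; it is written in Python, and the function names, the dict-based variable environment, and the label strings are assumptions made for illustration rather than anything taken from the paper's implementation.

```python
import itertools

_fresh = itertools.count()


def nb_exact(lam_label, call_site):
    """Exact semantics: every call allocates a brand-new contour (gensym-style)."""
    return next(_fresh)


def nb_0cfa(lam_label, call_site):
    """0CFA: all calls to one lambda share a single contour, its own label."""
    return lam_label


def nb_1cfa(lam_label, call_site):
    """1CFA: one contour per call site from which the lambda is called."""
    return call_site


def bind(venv, policy, lam_label, call_site, var, values):
    """Join `values` into the binding of the <contour, var> pair chosen by `policy`."""
    b = policy(lam_label, call_site)
    venv.setdefault((b, var), set()).update(values)
    return venv


def demo(policy):
    """Lambda l binds x; it is called from c1 with {f1} and from c2 with {f2}."""
    venv = {}
    bind(venv, policy, "l", "c1", "x", {"f1"})
    bind(venv, policy, "l", "c2", "x", {"f2"})
    return venv
```

Under `nb_0cfa` the two calls merge into one binding holding both functions, while `nb_1cfa` keeps them apart at the cost of a larger (but still finite) environment; `nb_exact` never merges anything, which is why its environment can grow without bound.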

Implementation

I have written a prototype implementation of 1CFA; it is a straightforward translation of the semantics into Scheme code. The recursions in the semantics equations are terminated with a variant of Young and Hudak's memoised pending analysis [16]. The type recovery analysis mentioned in section 1 is built on top of this implementation. The prototype implementation uses a modified copy of the ORBIT compiler's front end to produce CPS Scheme code from programs written in full Scheme. The implemented 1CFA semantics extends the semantics presented in this paper to include side effects, external procedures and external calls. In addition, it statically separates user procedures from continuations introduced by the CPS conversion.

This last point is worth briefly discussing. CPS Scheme is an intermediate representation for full Scheme. In full Scheme, the user cannot write CPS-level continuations: all continuations, all variables bound to continuations, and all calls to continuations (i.e., returns) are introduced by the CPS converter. This divides the procedural world into two halves: user procedures and continuations introduced by the CPS converter. It is easy for the CPS converter to mark these continuation lambdas, variables and call sites as it introduces them into the program. This partition is a powerful constraint on the sets propagated around by the analysis: a given call site either only calls user procedures, or only calls continuations; a given variable is either bound to only user procedures, or bound to only continuations. This partition holds throughout all details of the CFA semantics; exploiting it produces a much tighter analysis.

Running interpreted, the 1CFA implementation is able to analyse small examples (such as fact or delq) in about a half second on a DECstation 3100 PMAX. This is quick enough that I have not bothered to either compile the code or tune the simple algorithm and data structures.

Type recovery is a particularly interesting optimisation from the semantics point of view. I managed to implement induction-variable elimination and useless-variable elimination using an early ad hoc control-flow analysis algorithm [9], before I cast CFA into the non-standard abstract semantic interpretations framework presented in this paper. I could not have done so with type recovery analysis. Due to its dependence on sophisticated environment analysis, its design depended on the guidance of the semantics presented in this paper. The semantic definition of control-flow analysis has proved to be a valuable engineering tool for delivering useful Scheme program optimisations.

Detailed discussion of the CFA implementations and the optimisations built on them is beyond the scope of this paper; they are treated elsewhere [10, 11, 13, 14].

Discussion

A note on 0CFA that may be of interest to those who have read the PLDI '88 paper: the algorithm I presented in that paper was fundamentally different from the semantics presented in this paper in one important respect. In this paper, the environment information is propagated along paths through the control structure of the program. That is, when we determine that some lambda's variable [...] from there. This, of course, saves time, and allows for simpler convergence tests.

While this approach is correct, it does not generalise. In 1CFA, we allow multiple distinct contours over a single lambda. So we can't just propagate forward from all references to $v$: the environment structure determining propagations must be established by following the control-flow paths.

This difficulty arises from the power of lambda: it provides both environment and control structure. In 0CFA, the environment structure collapses into the degenerate case exploited by the early algorithm.

Conclusion

My chief purpose in writing this paper is to show a formal description of the Scheme control-flow analysis problem. This description is useful for several reasons:

• It leads us to useful computable approximations.

• The semantic description should give the reader a detailed, rigorous understanding of the Scheme control-flow analysis problem and its approximate solutions.

• Because it is a formal description, grounded in the semantics of Scheme, it can serve as a basis for proving formal properties of the analysis, its connection to the standard Scheme semantics, and the correctness of program optimisations based on it. We would like to prove that the semantics are all well-defined; that the approximate CFA semantics is a conservative approximation of the exact CFA semantics; and that the approximate semantics is computable. Such proofs are beyond the scope of this paper, but they can be found in my dissertation [10].

• It is a description which helps the Scheme compiler writer to develop and implement useful program optimisations, such as type recovery. Hopefully, compiler writers can use this description to design and implement their own optimisations.

Control-flow analysis is an important tool for developing analyses and optimisations for higher-order programming languages such as Scheme. As such, it is too important to exist without a solid theoretical foundation. The aim of this paper is to sketch out the structure of that foundation.

Acknowledgments

Refereeing conference papers is a tedious job. The PEPM referees gave my extended abstract a careful reading and provided many thoughtful suggestions that improved the final paper. I am grateful for their efforts. John Reynolds steered me through the (forthcoming) proofs of semantic correctness. Dana Scott helped me with a difficult domain construction. My advisor, Peter Lee, substantially improved an early draft of this paper. Olivier Danvy supplied several useful references and his usual infectious enthusiasm.

v can be bound to some new function, we pass that information along to the lambda's call body, who passes it to the functions it

calls, and so forth. Eventually, this information may propagateto a References

reference to v, where it will used. In the algorithm presented in the early paper, the information [1] AndrewAppelandTrevorJim. Continuation-passing,closure- propagates along paths through the environment structure of the passing style. In Conference Record of the Sixteenth Annual

program. That is, once we determine that v can be bound to some ACM Symposium on Principles of Programming Languages,

new function, we jump straight to all references to v, and propagate pages 293±302, January 1989. CFA Scheme Semantics 9

[2] Anders Bondorf. Automatic autoprojection of higher order recursive equations. ESOP '90, Neil Jones (editor). Springer-Verlag, May 1990.

[3] Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth Annual Symposium on Principles of Programming Languages, pages 238–252. Association for Computing Machinery, 1977.

[4] Paul Hudak. A semantic model of reference counting and its abstraction. In Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, August 1986.

[5] Paul Hudak and Adrienne Bloss. Variations on strictness analysis. In Proceedings of the 1986 ACM Conference on LISP and Functional Programming, August 1986.

[6] David Kranz, et al. ORBIT: An optimizing compiler for Scheme. In Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, published as SIGPLAN Notices 21(7), pages 219–233. Association for Computing Machinery, July 1986.

[7] John Reynolds, School of Computer Science, CMU. Personal communication.

[8] Peter Sestoft. Replacing function parameters by global variables. Master's Thesis, University of Copenhagen, 1988. Student report 88-7-2, DIKU. A conference-length version of this thesis appears in the FPCA '89 Conference Proceedings, pages 39–53, September 1989.

[9] Olin Shivers. Control-flow analysis in Scheme. In Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, June 1988. Also available as Technical Report ERGO-88-60, CMU School of Computer Science, Pittsburgh, Penn.

[10] Olin Shivers. Control-Flow Analysis of Higher-Order Languages. Ph.D. Dissertation, CMU. (Forthcoming)

[11] Olin Shivers. Data-flow analysis and type recovery in Scheme. Technical Report CMU-CS-90-115. CMU School of Computer Science, Pittsburgh, Penn., March 1990. Also to appear in Topics in Advanced Language Implementation, Peter Lee (editor), MIT Press.

[12] Olin Shivers. The semantics of Scheme control-flow analysis. In Proceedings of the First ACM SIGPLAN and IFIP Symposium on Partial Evaluation and Semantics-Based Program Manipulation, June 1991. To appear in SIGPLAN Notices. Also available as Technical Report CMU-CS-91-119, CMU School of Computer Science, Pittsburgh, Penn. An early version was available as Technical Report ERGO-90-090.

[13] Olin Shivers. Super-β: Copy, constant, and lambda propagation in Scheme. Working note #3, May 1990.

[14] Olin Shivers. Useless-variable elimination. Working note #2, April 1990.

[15] Guy L. Steele Jr. RABBIT: A Compiler for SCHEME. Technical Report 474, MIT AI Lab, May 1978.

[16] Jonathan Young and Paul Hudak. Finding fixpoints on function spaces. Research Report 505, Yale University, Department of Computer Science. December 1986.