Automatic Program Optimization

SIGPLAN ’93 PLDI Tutorial Notes

Ron Cytron

Washington University St. Louis, Missouri

Printing: August 5, 1994

Copyright © 1993 by Ron K. Cytron

Program organizations

Front end: operator and storage abstractions, alias mechanisms.

Middle end (the optimizer):

- Dead code elimination
- Code motion
- Reduction in strength
- Constant propagation
- Common subexpression elimination
- Fission
- Fusion
- Strip mining
- Jamming
- Splitting
- Collapsing

Back end: finite resource issues and code generation.

[Diagram: Program → Front End (parser) → Semantics → Middle End (optimizer) → Back End (code generation) → Text.]

Copyright © 1993 Ron K. Cytron. All rights reserved.

Some thoughts

Misconceptions:

Optimization optimizes your program. There's probably a better transformation, or sequence of program transformations. While optimization hopefully improves your program, the result is usually not optimal.

Optimization requires (much) more compilation time. For example, dead code elimination can reduce the size of program text such that overall compile time is also reduced.

A clever programmer is a good substitute for an optimizer. While efficient coding of an algorithm is essential, programs should not be obfuscated by "tricks" that are architecture- (and sometimes compiler-) specific.

All too often...

Optimization is disabled by default. Debugging optimized code can be treacherous [71, 42]. Optimization is often the primary suspect of program misbehavior, sometimes deservedly so. "No, not the third switch!"

Optimization is slow. Transformations are often applied to too much of a program. Optimizations are often textbook recipes, applied without proper thought.

Optimization produces incorrect code. Although recent work is encouraging [67], optimizations are usually developed ad hoc.

Programmers are trained by their compilers. A style is inevitably developed that is conducive to optimization.

Optimization is like sex:

- Everybody claims to get good results using exotic techniques;

- Nobody is willing to provide the details.

Multilingual systems

[Diagram: on the left, each source language (C, FTN, ..., Ada) is compiled directly for each target (IBM RS/6000, Sun SPARC, ..., Cray); on the right, all front ends emit a common IL, from which all back ends generate code.]

Architecting an intermediate language reduces the incremental cost of accom- modating new source languages or target architectures [7]. Moreover, many optimizations can be performed directly on the intermediate language text, so that source- and machine-independent optimizations can be performed by a common middle-end.

Intermediate languages

It’s very easy to devote much time and effort toward choosing the “right” IL. Below are some guidelines for choosing or developing a useful intermediate language:

- The IL should be a bona fide language, and not just an aggregation of data structures.

- The semantics of the IL should be cleanly defined and readily apparent.

- The IL's representation should not be overly verbose:
  - Although some expansion is inevitable, the IL-to-source token ratio should be as low as possible.
  - It's desirable for the IL to have a verbose, human-readable form.

- The IL should be easily and cleanly extensible.

- The IL should be sufficiently general to represent the important aspects of multiple front-end languages.

- The IL should be sufficiently general to support efficient code generation for multiple back-end targets.

A sampling of difficult issues:

- How should a string operation be represented (intact or as a "loop")?

- How much detail of a procedure's behavior is relevant?

Ideally, an IL has fractal characteristics: optimization can proceed at a given level; the IL can be "lowered"; optimization is then applied to the freshly exposed description.

Intermediate languages (cont'd)

Components of an IL:

SymbolTable: declares the name space of a procedure.

AliasRelations: declares sets of aliased names.

Semantics: gives a formal specification of the procedure's behavior.

Example program:

Procedure foo(x; y)
declare
    integer x; y;
    integer a; b;
    integer *p
begin
    p ← rand() ? &a : &b
    *p ← y + x
end

The procedure randomly assigns the address of a or b to p, and then stores y + x into the location dereferenced by p.

This example is intended to illustrate may-alias behavior: *p may be an alias for a or b, and static analysis certainly can't decide which alias will hold at run-time.

We’ll now look at a possible intermediate representation of this procedure, using a Lisp-like notation that is easy to parse and to extend. The verbose keywords are easily coded and stored efficiently as terminals of the language’s grammar.
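As a small illustration of why a Lisp-like notation is easy to parse, here is a minimal s-expression reader in Python. This is my own sketch, not code from the notes; `tokenize` and `parse` are hypothetical names.

```python
def tokenize(text):
    # Pad parentheses with spaces, then split on whitespace.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Recursively build nested lists from the token stream.
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    return token

il = "(Symbol (SymbolName p) (SymbolID 3))"
tree = parse(tokenize(il))
print(tree)  # → ['Symbol', ['SymbolName', 'p'], ['SymbolID', '3']]
```

Extending such a reader with new keywords requires no change to the parser itself, which is one reason the verbose keywords cost so little.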

Symbol table

(SymbolTable
    (NumSymbols 5)
    (Symbol (SymbolName x) (SymbolID 1))
    (Symbol (SymbolName y) (SymbolID 2))
    (Symbol (SymbolName p) (SymbolID 3))
    (Symbol (SymbolName a) (SymbolID 4))
    (Symbol (SymbolName b) (SymbolID 5))
)

Symbol attributes (SymbolType, SymbolSize, SymbolVolatile, etc.) can be added as needed.

Alias relations

In programs with pointers or reference parameters, use of a lexical name could refer to one of multiple, distinct storage names. We therefore specify sets of names that might be treated in this manner. Subsequently, a given reference to a lexical name can be associated with an alias relation entry, to describe the potential effects of that reference.

In this example, we have an alias relation with may-aliases for a and b and no must-aliases. This relation might be appropriate for the assignment through *p. Also shown are two relations in which x and y are alternately must- and may-aliased. If parameters are passed "by-address", then a use of x may be a use of y if each parameter is supplied the same address.

(AliasRelations
    (NumAliasRelations 3)
    (AliasRelation
        (AliasID 1)
        (MayAliases 2 a b)
    )
    (AliasRelation
        (AliasID 2)
        (MustAliases 1 y)
        (MayAliases 1 x)
    )
    (AliasRelation
        (AliasID 3)
        (MustAliases 1 x)
        (MayAliases 1 y)
    )
)

Procedure semantics

(ProcSemantics
    (NumNodes 5)
    (NodeSemantics
        (NodeID 1)
        (Def
            (DefID 1)
            (SymbID p)
            (DefValue
                (Choose
                    ((= rand() 0)
                        (Addr a)
                        (Addr b)
                    )
                )
            )
        )
        (Jump 2)
    )
    (NodeSemantics
        (NodeID 2)
        (Def
            (DefID 2)
            (SymbID *p)
            (AliasWith 1)
            (DefValue
                (+
                    (Use
                        (UseID 1)
                        (SymbID x)
                    )
                    (Use
                        (UseID 2)
                        (SymbID y)
                    )
                )
            )
        )
    )
)

What happens in the middle end?

Essentially, the program is transformed into an observably equivalent, but less resource-consumptive, program. Such transformation is often based on:

- Assertions provided by the program author or benefactor.

- The program dependence graph [50, 35, 10].

- Static single assignment (SSA) form [26, 5, 69, 28].

- Static information gathered by solving data flow problems [44, 51, 52, 53, 41, 54, 55, 48].

- Run-time information collected by profiling [61].

[Diagram: from the flow graph and program semantics, analysis derives depth-first numbering, a spanning tree, dominators, intervals, and dominance frontiers; these feed the sparse evaluation graph, static single assignment form, control dependence edges, data flow problems, and data dependence edges, which, together with profiling, feed the program dependence graph and program transformation.]

Let's take a look at an example that benefits greatly from optimization...

Unoptimized matrix multiply

for i = 1 to N do
    for j = 1 to N do
        A[i,j] ← 0
        for k = 1 to N do
            A[i,j] ← A[i,j] + B[i,k] × C[k,j]
        od
    od
od

Note that A[i,j] is really

    Addr(A) + (i − 1) × K1 + (j − 1) × K2

which takes 6 integer operations.

The innermost loop of this "textbook" program takes

    24 integer ops
     3 loads
     1 floating add
     1 floating mpy
     1 store
    ------------------
    30 instructions
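For concreteness, the same "textbook" loop nest can be written out in Python (my own sketch; the instruction counts above refer to compiled code for the pseudocode, not to Python):

```python
def matmul(B, C, N):
    # Textbook triple loop: A[i][j] accumulates the dot product of
    # row i of B with column j of C, exactly as in the pseudocode.
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            A[i][j] = 0.0
            for k in range(N):
                A[i][j] = A[i][j] + B[i][k] * C[k][j]
    return A

B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(B, C, 2))  # → [[19.0, 22.0], [43.0, 50.0]]
```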

Optimizing matrix multiply

for i = 1 to N do
    for j = 1 to N do
        a ← &A[i,j]
        for k = 1 to N do
            *a ← *a + B[i,k] × C[k,j]
        od
    od
od

The expression A[i,j] is loop-invariant with respect to the k loop. Thus, code motion can move the address arithmetic for A[i,j] out of the innermost loop. The resulting innermost loop contains only 12 integer operations.

for i = 1 to N do
    for j = 1 to N do
        a ← &A[i,j]
        b ← &B[i,1]
        for k = 1 to N do
            *a ← *a + *b × C[k,j]
            b ← b + K_B
        od
    od
od

As loop k iterates, the addressing arithmetic for B changes from B[i,k] to B[i,1] plus a constant increment. Induction variable analysis detects the constant difference between these expressions. The resulting innermost loop contains only 7 integer operations.

Similar analysis for C yields only 2 integer operations in the innermost loop, for a speedup of nearly 5. We can do better, especially for large arrays.
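The effect of these transformations can be mimicked in Python by hoisting the loop-invariant lookups out of the inner loop (a loose analogy, since Python has no explicit address arithmetic; `matmul_opt` is an assumed name):

```python
def matmul_opt(B, C, N):
    # Hoist the loop-invariant "address arithmetic": bind row i of B
    # and an accumulator for A[i][j] outside the k loop.
    A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        Bi = B[i]              # invariant with respect to j and k
        for j in range(N):
            acc = 0.0          # stands in for *a
            for k in range(N):
                acc += Bi[k] * C[k][j]
            A[i][j] = acc
    return A

B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_opt(B, C, 2))  # → [[19.0, 22.0], [43.0, 50.0]]
```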

If optimization is...

so great, because:

- A good compiler can sell (even a slow) machine. Optimization can easily provide a factor of two in performance. Moreover, the analysis performed during program optimization can be incorporated into the "programming environment" [50, 25, 68].

- New languages and architectures motivate new program optimizations. Although some optimizations are almost universally beneficial, the advent of functional and parallel programming languages has increased the intensity of research into program analysis and transformation.

- Programs can be written with attention to clarity, rather than performance. There is no substitute for a good algorithm. However, the expression of an algorithm should be as independent as possible of any specific architecture.

then:

- Why does it take so long? Compilation time is usually 2–5 times slower, and programs with large procedures often take longer. Often this is the result of poor engineering: better data structures or algorithms can help in the optimizer.

- Why does the resulting program sometimes exhibit unexpected behavior? Sometimes the source program is at fault, and a bug is uncovered when the optimized code is executed; sometimes the optimizing compiler is itself to blame.

- Why is "no-opt" the default? Most compilations occur during the software development cycle. Unfortunately, most debuggers cannot provide useful information when the program has been optimized [71, 42]. Even more unfortunately, optimizing compilers sometimes produce incorrect code. Often, insufficient time is spent testing the optimizer, and with no-opt the default, bugs in the optimizer may remain hidden.

Outline

1. Examples of data flow problems.
2. Flow graphs and their abstractions.
3. Data flow frameworks.
   (a) Specification.
   (b) Evaluation.
   (c) Properties.
4. Sparse Evaluation Graphs.
5. Static Single Assignment (SSA) form.
6. SSA-based algorithms.

Ingredients in a data flow framework

Data flow graph: G^df = (N^df, E^df), which is based on a directed graph G^f = (N^f, E^f), typically the control flow graph of a procedure.

A data flow problem is

- forward if the solution at a node may depend only on the program's past behavior;

- backward if the solution at a node may depend only on a program's future behavior;

- bidirectional if both past and future behavior is relevant [31, 32, 33].

[Figure: an example flow graph with nodes Start, A through M, and Stop.]

- We'll assume the data flow graph is augmented with a Start and a Stop node, and an edge from Start to Stop.

- We'll limit our discussion to non-bidirectional problems, and assume that edges in the data flow graph are oriented in the direction of the data flow problem.

Ingredients in a data flow framework (cont'd)

Meet lattice: determines the outcome when disparate solutions combine. The lattice is specified with distinguished elements

- ⊤, which represents the best possible solution, and

- ⊥, which represents the worst possible solution.

Transfer functions: transform one solution into another:

    OUT = f(IN)

We'll use the meet lattice to summarize the effects of convergent paths in the data flow graph, and transfer functions to model the effects of a data flow graph path on the data flow solution. We'll begin with the four simple and related bit-vectoring data flow problems, classically solved as operations on bit-vectors. For ease of exposition, we'll associate data flow solutions with the edges, rather than the nodes, of the data flow graph.
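These two ingredients suggest a generic iterative solver. The sketch below is my own illustration (the names `solve`, `preds`, and `transfer` are not from the notes): it takes the meet, here set intersection, over a node's incoming edges and applies the node's transfer function, repeating until a fixed point is reached.

```python
def solve(nodes, preds, transfer, entry, entry_fact, top):
    # Forward iterative data flow: the solution into a node is the meet
    # (set intersection) of the solutions out of its predecessors.
    out = {n: set(top) for n in nodes}
    out[entry] = set(entry_fact)
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            ins = [out[p] for p in preds[n]]
            meet = set.intersection(*ins) if ins else set(top)
            new = transfer[n](meet)
            if new != out[n]:
                out[n] = new
                changed = True
    return out

# Tiny diamond: Start -> A, Start -> B, A -> C, B -> C; the solution
# tracks availability of the single expression "e".
preds = {"Start": [], "A": ["Start"], "B": ["Start"], "C": ["A", "B"]}
transfer = {
    "A": lambda s: s | {"e"},   # A computes e
    "B": lambda s: s - {"e"},   # B kills e
    "C": lambda s: set(s),      # C is transparent
}
out = solve(["Start", "A", "B", "C"], preds, transfer, "Start", set(), {"e"})
print(out["C"])  # the meet of Avail and not-Avail: e is not available
```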

Available expressions

An expression expr is available (Avail) at flow graph edge e if any past behavior of the program includes a computation of the value of expr at e.

Consider the expression v + w in the flow graph shown to the right. If the expression is available at the assignment to z, then it need not be recomputed.

- This is a forward problem, so the data flow graph will have the same edges and Start and Stop nodes as the flow graph.

- The solution for any given expr is either Avail or ¬Avail.

- The "best" solution for an expression is Avail. We thus obtain the two-level lattice:

  ⊤ is Avail.
  ⊥ is ¬Avail.

[Figure: a flow graph containing the assignments v = 9, y = v+w, w = 5, x = v+w, z = v+w, and v = 2.]

Available expressions (cont'd)

Nodes that compute an expression make that expression available. We also assume that every expression is available from Start.

The transfer function for each highlighted node makes the expression v + w Avail, regardless of the solution present at the node's input.

[Figure: the nodes computing y = v+w, x = v+w, and z = v+w assert Avail on their output edges.]

Available expressions (cont'd)

Nodes that assign to any variable in an expression make that expression not available, even if the variable's value is unchanged.

The transfer function for each highlighted node makes the expression v + w ¬Avail, regardless of the solution present at the node's input.

[Figure: the nodes assigning v = 9, w = 5, and v = 2 assert Not Avail on their output edges.]

Available expressions (cont'd)

Here we see the global solution for availability of the expression v + w.

Each of the highlighted nodes shown previously asserts a solution on its output edge(s). It's the job of global data flow analysis to assign the best possible solution to every edge in the data flow graph, consistent with the asserted solutions.

The expression v + w need not be computed in the assignment to z. The relevant value is held either in x or y, depending on program flow.

To solve this problem using bit-vectors, assign each expression a position in the bit-vector. When an expression is available, its associated bit is 1.

[Figure: the example flow graph with every edge labeled Avail or Not Avail.]
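The bit-vector encoding can be sketched with plain integers (my own illustration, not code from the notes; here the single expression v + w occupies bit 0, and a node's effect is OUT = (IN ∩ ¬KILL) ∪ GEN):

```python
EXPR_VW = 1 << 0  # bit 0 stands for the expression v + w

def node_effect(in_bits, gen, kill):
    # One node's transfer function on the bit-vector.
    return (in_bits & ~kill) | gen

# Effects along the path: v = 9 ; y = v+w ; w = 5 ; x = v+w
path = [
    (0, EXPR_VW),   # v = 9 kills v + w
    (EXPR_VW, 0),   # y = v+w makes v + w available
    (0, EXPR_VW),   # w = 5 kills v + w
    (EXPR_VW, 0),   # x = v+w makes v + w available
]
bits = EXPR_VW      # every expression available from Start
for gen, kill in path:
    bits = node_effect(bits, gen, kill)
print(bits & EXPR_VW)  # → 1: v + w is available after x = v+w
```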

Very busy expressions

An expression expr is very busy (VB) at flow graph edge e if any future behavior of the program references the value of expr at e.

Where an expression is very busy, resources allocated to compute or hold that expression won't be wasted.

- This is a backward problem, so the data flow graph will have every edge reversed, and nodes Start and Stop will be interchanged.

- The solution for any given expression is either VB or ¬VB.

- The "best" solution for an expression is VB. We thus obtain the two-level lattice:

  ⊤ is VB.
  ⊥ is ¬VB.

[Figure: a flow graph containing the assignments v = 2, w = 5, v = 3, x = v+w, and u = v+w.]

Very busy expressions (cont'd)

[Figure: two copies of the reversed flow graph; the nodes computing x = v+w and u = v+w assert VB on their output edges, while the nodes assigning v = 2, w = 5, and v = 3 assert Not VB.]

Nodes that compute an expression make that expression very busy. We also assume that every expression is very busy from Start. Nodes that assign to any variable in an expression make that expression not very busy.

The above graphs are data flow graphs, with their edges oriented in the direction of this backward data flow problem.

Very busy expressions (cont'd)

Here we see the global solution for the expression v + w.

The original program may call for two evaluations of v + w: one at the assignment of x and one at the assignment of u. By moving the computation of v + w from the assignment of u to node A, we can avoid two computations of the expression in an iteration of the outer loop, even though the expression v + w isn't available at the assignment to u.

As with available expressions, each expression is assigned a position in the bit-vector.

[Figure: the reversed flow graph with every edge labeled VB or Not VB.]

A variant of this data flow problem is useful for early transmission of data or messages in a distributed or hierarchical storage system (i.e., message passing architectures or programmable caches) [30].

Reaching defs

A definition d of variable v reaches an edge e if the past behavior of the program may cause d to provide the value for v at e.

Each definition shown in the flow graph is to the variable v; the definitions are subscripted so we may distinguish among the definitions of v.

- This is a forward data flow problem.

- If a definition d does not reach e, then d is irrelevant to program optimization at e. Thus, we obtain a lattice where

  ⊤ is the empty set.
  ⊥ is the set of all definitions.

- The meet of two solutions a and b is the union of the definition sites in a and b.

[Figure: a flow graph containing the definitions v1 =, v2 =, and v3 =, two uses = v, and a call f(v4).]

Reaching defs (cont'd)

If a node absolutely changes the value of a variable, then the definition kills all other values for that variable. The transfer function for each highlighted node produces only the node's definition of v in the output.

[Figure: the nodes v1 =, v2 =, and v3 = assert {v1}, {v2}, and {v3}, respectively, on their output edges.]

Reaching defs (cont'd)

Otherwise, the definition preserves all other reaching defs of the variable, as might be the case with the procedure call highlighted here. The node's transfer function adds the node's definition of v to any that already reach the node.

Examples of preserving defs include:

- potentially modified variables at a call site;

- assignments that modify some, but not all, elements of an array;

- assignments to may-aliased variables.

[Figure: the node call f(v4) asserts IN ∪ {v4} on its output edges.]

Reaching defs (cont'd)

Here we see the global solution for reaching defs of v.

Reaching defs is often solved in support of other optimization problems, such as constant propagation and register allocation.

In its bit-vector setting, each definition is assigned a position in the bit-vector.

- The transfer function at node Y has the form

      f_Y(IN) = (IN ∩ ¬KILL_Y) ∪ GEN_Y

  where KILL_Y and GEN_Y are node-specific bit-vector constants.

- The meet lattice can be formed by set-union, which is easily realized for the bit-vectors.

[Figure: the flow graph with edges labeled by reaching-def sets such as { }, {v1}, {v2}, {v1, v2}, {v3}, {v3, v4}, and {v1, v2, v3, v4}.]
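A reaching-defs instance of the bit-vector machinery, with union as meet, might look as follows (my own sketch with assumed names; bits v1..v4 stand for the four definitions of v):

```python
# Reaching definitions with integer bit-vectors; meet is union.
V1, V2, V3, V4 = 1 << 0, 1 << 1, 1 << 2, 1 << 3

# A straight-line sketch: Start -> n1 (v1 =) -> n2 (v2 =) -> n3 (v3 =)
# -> n4 (call f(v4), a preserving def) -> Stop.
nodes = ["n1", "n2", "n3", "n4"]
preds = {"n1": [], "n2": ["n1"], "n3": ["n2"], "n4": ["n3"]}
GEN  = {"n1": V1, "n2": V2, "n3": V3, "n4": V4}
KILL = {"n1": V2 | V3 | V4, "n2": V1 | V3 | V4,
        "n3": V1 | V2 | V4, "n4": 0}  # the call preserves other defs

out = {n: 0 for n in nodes}
changed = True
while changed:
    changed = False
    for n in nodes:
        in_bits = 0
        for p in preds[n]:
            in_bits |= out[p]                # meet is union
        new = (in_bits & ~KILL[n]) | GEN[n]  # (IN ∩ ¬KILL) ∪ GEN
        if new != out[n]:
            out[n] = new
            changed = True
print(bin(out["n4"]))  # → 0b1100: v3 and v4 reach Stop
```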

Live variables

A variable v is live at edge e if the future behavior of the program may reference the value of v at e.

If a variable v is not live, then any resources associated with v (registers, storage, etc.) may be reclaimed.

- This is a backward problem.

- In the bit-vector representation, each variable is associated with a bit.

- The "best" solution is ¬Live, so we obtain the two-level lattice:

  ⊤ is ¬Live.
  ⊥ is Live.

[Figure: a flow graph with nodes Start, A through M, and Stop, containing v = 1, v = 2, call f(v), x = v, and v = 3.]

Live variables (cont'd)

Each of the highlighted nodes affects the data flow solution:

- If a node uses v, then the node's output asserts that v is Live.

- If a node kills v, then the node's output asserts that v is ¬Live.

[Figure: on the reversed flow graph, the nodes call f(v) and x = v assert Live, while the nodes v = 1, v = 2, and v = 3 assert Not Live.]

Live variables (cont'd)

If a node Y preserves v (as might a procedure call), then the node does not affect the solution:

- If v is Live on "input" to Y, then Y cannot make v ¬Live.

- If v is ¬Live on "input" to Y, then Y does not make v Live.

Node Y's transfer function is therefore the identity function:

    f_Y(IN) = IN

assuming node Y does not use v.

[Figure: the global live-variables solution, with every edge of the reversed flow graph labeled Live or Not Live.]
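Liveness runs the same iteration backward, over successors. A single-variable sketch (my own; `succs`, `USE`, and `DEF` are assumed names):

```python
# Liveness of one variable v over the path: v = 1 ; x = v ; v = 3.
# live_in[n] = USE[n] or (live_out[n] and not DEF[n]);
# live_out[n] = union (any) of live_in over successors.
nodes = ["d1", "use", "d2"]
succs = {"d1": ["use"], "use": ["d2"], "d2": []}
USE = {"d1": False, "use": True, "d2": False}
DEF = {"d1": True, "use": False, "d2": True}

live_in = {n: False for n in nodes}
changed = True
while changed:
    changed = False
    for n in nodes:
        live_out = any(live_in[s] for s in succs[n])
        new = USE[n] or (live_out and not DEF[n])
        if new != live_in[n]:
            live_in[n] = new
            changed = True
print(live_in)  # v is live only on entry to the use
```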

The bit-vectoring data flow problems

The four problems we've just covered share the following properties:

- A solution can be represented as a bit-vector.

- The meet operation can be realized as a Boolean operation: or for reaching defs and live variables; and for available expressions and very busy expressions.

- At each node Y we have a transfer function of the form

      f_Y(IN) = (IN ∩ ¬KILL_Y) ∪ GEN_Y

  where KILL_Y and GEN_Y are node-specific bit-vector constants.

Flow graphs

Intraprocedural: nodes of the control flow graph should represent all potentially executable code, and any program path followed during execution should exist as a path in the control flow graph. For languages whose control flow behavior is syntactically apparent (using source text), the associated control flow graph can be constructed easily and precisely [3]. Higher-order languages such as Scheme complicate the construction [65], as do languages such as APL, where any statement could be the target of a branch [20].

Interprocedural: nodes of the procedure call graph should represent all potentially executable procedures, and any possible sequence of procedure calls during program execution should exist as a path in the call graph. Where procedures cannot be passed as parameters, construction of the call graph is straightforward; otherwise, some preliminary flow analysis is required to construct a precise graph [57, 15].

The correctness of static analysis depends on the availability of a suitably ap- proximate flow graph; the success of static analysis depends on the degree of approximation.

Flow graphs (cont'd)

Procedure main
    if cond1 then
        call foo
    fi
    while cond2 do
        w ← 5
    od
end

Procedure foo
    if cond3 then
        call A
        call B
    else
        call C
        call main
    fi
end

[Figure: the control flow graph for main(), and the procedure call graph, in which main() and foo() call each other and foo() also calls A, B, and C.]

Flow graphs (cont'd)

- Implicit exceptional control flow (division-by-zero, storage violations, etc.) can also be precisely modeled, but many analyzers do not account for exceptions that cause a program to halt.

- The procedure call graph is useful in programming environments [43, 14] and for solving certain interprocedural problems [4, 11, 24, 13].

- Notably absent from the procedure call graph are edges that represent return from procedure calls, although more sophisticated representations are possible [49, 16].

- An approach that obtains a uniform view of intra- and interprocedural flow is continuation-passing style, in which anything executable is modeled as a function, supplied with continuation expressions to accomplish future action [6].

A flow graph is...

structured if its associated procedure consists of the standard compound statements, structured loops, and structured branching constructs. A more precise definition requires graph-grammars [34] or control dependence [35].

reducible if every loop has a single point of entry. By applying a simple graph-grammar, such graphs can be reduced to a single node; moreover, applying that graph-grammar to an irreducible graph always results in the graph shown to the right [36].

Generally, analysis can be faster for structured or reducible flow graphs; however, the best algorithms work for arbitrary graphs while obtaining the performance of specialized algorithms on specialized graphs. A more detailed classification of graph structure is given by Baker [8] and Sharir [64].

Flow graph data structures

The following data structures simplify and facilitate program optimization:

DFST is the depth-first spanning tree of a flow graph, constructed by depth-first search [1].

Dom is the (immediate) dominator tree of the flow graph, constructed by transitive reduction of a full dominator graph [2] or by a more clever and direct algorithm [47].

PDom is the (immediate) postdominator tree of the flow graph, constructed by finding dominators of the reverse flow graph [35].

DomFron is the dominance frontier graph, constructed by one pass over the dominator tree [28].

CD is the control dependence relation, constructed by finding dominance frontiers of the reverse flow graph [28].

Intervals are the results of partitioning a flow graph's nodes into single-entry (perhaps strongly-connected) loops [3, 66, 63].

Depth-first spanning trees

num ← 0; root ← ⊥
foreach Z ∈ N^f do
    dfn(Z) ← 0; parent(Z) ← ⊥; child(Z) ← ⊥; sibling(Z) ← ⊥
od
foreach Z ∈ N^f do
    if dfn(Z) = 0 then
        call DFS(Z)
        sibling(Z) ← root; root ← Z
    fi
od

Procedure DFS(X)
    num ← num + 1; dfn(X) ← num
    foreach Y ∈ Succ(X) do
        if dfn(Y) = 0 then
            parent(Y) ← X
            sibling(Y) ← child(X); child(X) ← Y
            call DFS(Y)
        fi
    od
    progeny(X) ← num − dfn(X)
end
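A runnable Python rendition of the numbering pass (my own sketch; the dictionaries mirror the pseudocode's dfn and progeny) also provides the O(1) subtree test described later in these notes:

```python
def dfs_number(succ, start):
    # Depth-first numbering: dfn[X] is X's visit number; progeny[X] counts
    # the nodes in X's depth-first spanning subtree (excluding X itself).
    dfn, progeny = {}, {}
    counter = [0]

    def dfs(x):
        counter[0] += 1
        dfn[x] = counter[0]
        for y in succ.get(x, []):
            if y not in dfn:
                dfs(y)
        progeny[x] = counter[0] - dfn[x]

    dfs(start)
    return dfn, progeny

succ = {"Enter": ["A", "Exit"], "A": ["B"], "B": ["C", "D"]}
dfn, progeny = dfs_number(succ, "Enter")

def in_subtree(x, y):
    # Y lies in X's depth-first spanning subtree iff
    # dfn[X] <= dfn[Y] <= dfn[X] + progeny[X].
    return dfn[x] <= dfn[y] <= dfn[x] + progeny[x]

print(in_subtree("A", "D"), in_subtree("Exit", "A"))  # → True False
```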

Depth-first spanning trees (cont'd)

Here is shown a possible depth-first numbering of a flow graph. Although unvisited nodes can be considered in any order, the numbering on the right is obtained when:

- Enter is the first node visited;

- Unvisited successors of a branch node are considered "left to right".

[Figure: the example flow graph with depth-first numbers Enter = 1, Exit = 2, A = 3, B = 4, C = 5, E = 6, G = 7, H = 8, J = 9, K = 10, L = 11, M = 12, F = 13, D = 14.]

Depth-first spanning trees (cont'd)

Here is shown the associated depth-first spanning tree. Each node X is labeled with the range of node numbers in its depth-first spanning subtree:

    dfn(X) .. dfn(X) + progeny(X)

as computed by the algorithm.

By retaining progeny(X), we obtain an O(1) test for Y's inclusion in X's depth-first spanning subtree: we need only test dfn(Y) for inclusion in the labeled range for X. Where Y is in the depth-first spanning subtree rooted at X, we'll use the notation

    X ≼ Y

[Figure: the depth-first spanning tree, labeled with ranges Enter = 1..14, Exit = 2..2, A = 3..14, B = 4..14, C = 5..13, D = 14..14, E = 6..12, F = 13..13, G = 7..12, H = 8..12, J = 9..12, K = 10..12, L = 11..12, M = 12..12.]

Depth-first spanning trees (cont'd)

[Figure: two copies of the labeled depth-first spanning tree, each with certain flow graph edges dashed.]

The dashed edges in the first copy are cross edges with respect to our DFST; pictorially, such edges go right-to-left. The dashed edges in the second copy are back edges with respect to our DFST; such edges represent loops in the graph.

Both types of edges go from X to Y with dfn(Y) < dfn(X); however, Y ≼ X for a back edge and Y ⋠ X for a cross edge. A chord edge goes from X to Y with X ≼ Y (e.g., flow graph edge J → L, not shown).

Dominance

- Node Y dominates node Z, denoted Y ⪰ Z, if every path from the graph's root to Z includes node Y. During program execution, if node Z has executed, then so has node Y.

- A node always dominates itself.

- Node Y strictly dominates Z, denoted Y ≻ Z, if Y ⪰ Z and Y ≠ Z. A node never strictly dominates itself.

- The immediate dominator of node Z, denoted idom(Z), is the closest strict dominator of Z:

      Y = idom(Z) ⟺ Y ≻ Z and (∀X)(X ≻ Z ⟹ X ⪰ Y)

The dominator tree for G^f has nodes N^f; Y is a parent of Z in this tree if and only if Y = idom(Z).

Fiber-optic view of dominance

[Figure: two copies of the example flow graph; in the second, one node is made opaque and the nodes it dominates are in shadow.]

- Start is a light source;

- each edge conducts light from its source to its target;

- each node normally transmits light received on any in-edge to each out-edge.

Nodes dominated by X are "cast in shadow" when X is made opaque (transmission of light through X is prevented).

Dominator tree

[Figure: the dominator tree of the example flow graph. Enter is the root; its children are Exit and A; the remaining levels contain B; then C, D, and H; then E, F, G, and J; then K and L; then M.]

This is the most compact representation of dominance, requiring only two “pointers” per node: one for the node’s leftmost child and one for the node’s right sibling. In the tree representation, a node dominates all descendants, strictly dominates all proper descendants, and immediately dominates all children.
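The two-pointer encoding can be sketched as follows (my own illustration; `leftmost_child` and `right_sibling` are assumed names, and only a fragment of the tree is shown):

```python
# Left-child / right-sibling encoding of a dominator-tree fragment:
# Enter's children are Exit and A; A's only child is B.
leftmost_child = {"Enter": "Exit", "Exit": None, "A": "B", "B": None}
right_sibling = {"Enter": None, "Exit": "A", "A": None, "B": None}

def children(node):
    # Walk the sibling chain, starting from the leftmost child.
    result = []
    child = leftmost_child.get(node)
    while child is not None:
        result.append(child)
        child = right_sibling.get(child)
    return result

print(children("Enter"))  # → ['Exit', 'A']
```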

Dominance (cont'd)

Let dom(Z) be the set of nodes that dominate Z. A recursive way of specifying dom(Z) is:

- Z is in dom(Z);

- any node dominating each predecessor of Z also dominates Z.

If some node X dominates each predecessor Y of Z, then X appears on every path from root to Z, and so X dominates Z.

[Figure: a node Z with predecessors Y1 through Y4, each reachable from X.]

We then obtain the following equation at each node Z:

    dom(Z) = {Z} ∪ ⋂ { dom(Y) : Y ∈ Preds(Z) }

We essentially obtain a simple forward data flow framework, in which

- meet is set intersection;

- the transfer function for each node Z is

      f_Z(IN) = {Z} ∪ IN

Dominance as a data flow problem

Consider the three-node graph with edges R → A, A → B, and B → A. Writing a suitable equation at each node, we obtain:

    dom(R) = {R}
    dom(A) = {A} ∪ (dom(R) ∩ dom(B))
    dom(B) = {B} ∪ dom(A)

The first equation is easily evaluated: dom(R) = {R}. The equation for dom(A) is problematic, because the solution for dom(A) depends (through dom(B)) on itself. On inspection, it's clear how to evaluate these equations to obtain:

    dom(R) = {R}
    dom(A) = {R, A}
    dom(B) = {R, A, B}

In general, solving such a system of equations involves an initial approximation for the solution at each node.

Suppose we initially approximate dom(A) = ∅ at each node. We then obtain

    dom(A) = {A} ∪ (∅ ∩ dom(B)) = {A}

Further evaluation (substitution of dom(A)) does not change any solution. The above represents a fixed point, in which each node's solution satisfies its equation. However, we did not obtain the best, or maximum, fixed point: the "largest" such solution would contain every node that truly dominates A, and R truly dominates A.

Suppose instead our initial approximation were dom(Z) = N^f at every node. Iterating from that approximation, we would obtain dom(A) = {R, A}, which is indeed the desired solution for the dominance problem; bigger correct solutions are preferred here because dominators allow latitude for code motion. In any data flow problem, the best answer is obtained by initially assuming the "largest" solution at each node and then iterating to a fixed point. We may now reason informally that such evaluation is correct; a more formal basis for such reasoning will surface shortly.

Simple dominance algorithm

    (1)  foreach Y ∈ N^f do dom(Y) ← N^f od; dom(root) ← {root}; Wlist ← N^f − {root}
    (2)  while Wlist ≠ ∅ do
    (3)      select and remove an element Y from Wlist
    (4)      ndom ← {Y} ∪ ⋂ { dom(X) : (X, Y) ∈ E^f }
    (5)      if ndom ≠ dom(Y) then
                 dom(Y) ← ndom
    (6)          foreach (Y, Z) ∈ E^f do Wlist ← Wlist ∪ {Z} od
             fi
         od

The loop at (2) performs the fixed point computation, based on the initialization at (1). The set of nodes whose solution may have changed is maintained in the work list Wlist. The equation for node Y is recomputed at (4). Step (5) detects any change in solution, in which case successors of Y are added to the work list at (6). When the work list is empty, each equation has stabilized and the algorithm is finished.

The dominators algorithm is written this way to expose its data flow nature.
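The worklist computation above can be sketched in Python. This is a minimal sketch, assuming the graph is given as a dict mapping each node to its successor list; the node names are illustrative:

```python
def dominators(succs, root):
    """Iterative worklist computation of dom(Z) for every node Z.

    Every node starts at the lattice top (the full node set); a node is
    re-examined whenever one of its predecessors' solutions changes.
    """
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for x, ys in succs.items():
        for y in ys:
            preds[y].add(x)
    dom = {n: set(nodes) for n in nodes}   # step (1): top everywhere
    dom[root] = {root}
    worklist = set(nodes) - {root}         # step (1): seed the worklist
    while worklist:                        # step (2)
        y = worklist.pop()                 # step (3)
        ndom = {y} | set.intersection(*(dom[x] for x in preds[y]))  # step (4)
        if ndom != dom[y]:                 # step (5): solution changed?
            dom[y] = ndom
            worklist |= set(succs[y])      # step (6): revisit successors
    return dom

# The three-node example: R -> A, A -> B, B -> A
g = {"R": ["A"], "A": ["B"], "B": ["A"]}
print(dominators(g, "R"))   # dom(A) = {R, A}; dom(B) = {R, A, B}
```

Starting from the top element rather than from ∅ is what yields the maximum fixed point, as discussed on the previous slide.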

Fast dominance algorithm

- The dominance relation can take O(|N^f|²) space and O(|N^f| × |E^f|) time to compute.

- A faster algorithm computes immediate dominance directly [47]:

  1. The graph G^f is distilled into a dominator-equivalent graph free of cross- and back-edges.
  2. Immediate dominators are found for a graph containing only tree and chord edges.

Postdominance

[Figure: the example flow graph alongside its postdominator tree, rooted at Exit.]

To compute postdominators, simply compute dominators of the reverse flow graph. Postdominance is useful for code motion [26] and for computing control dependence [35].
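That reduction can be sketched directly. The following is an illustrative Python sketch (graph representation and names are assumptions): reverse every edge, then run the same worklist fixed-point computation with the exit node as root:

```python
def postdominators(succs, exit_node):
    """Postdominators are dominators of the reverse flow graph."""
    nodes = set(succs)
    rev = {n: [] for n in nodes}          # reversed edges (original preds)
    for x, ys in succs.items():
        for y in ys:
            rev[y].append(x)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    work = set(nodes) - {exit_node}
    while work:
        y = work.pop()
        # predecessors of y in the reversed graph are y's original successors
        new = {y} | (set.intersection(*(pdom[x] for x in succs[y]))
                     if succs[y] else set())
        if new != pdom[y]:
            pdom[y] = new
            work |= set(rev[y])           # reversed-graph successors of y
    return pdom

# If-then-else: A -> B, A -> C, B -> D, C -> D
g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(postdominators(g, "D"))   # D postdominates every node
```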

Dominance frontiers [28]

    traverse dominator tree (DT) in bottom-up order; at node X do
        DF(X) ← ∅
        foreach Y ∈ Succ(X) do
    (1)     if idom(Y) ≠ X then DF(X) ← DF(X) ∪ {Y} fi
        od
        foreach Z ∈ Children(X) do
            foreach Y ∈ DF(Z) do
    (2)         if idom(Y) ≠ X then DF(X) ← DF(X) ∪ {Y} fi
            od
        od
    od

Node Z is in the dominance frontier of node X if X dominates a predecessor Y of Z but X does not strictly dominate Z [28]:

    DF(X) = { Z | (∃ Y ∈ Preds(Z)) X ≫ Y, and X does not strictly dominate Z }

Step (1) finds successors of X in DF(X); step (2) completes DF(X) with nodes selected from dominance frontiers of nodes immediately dominated by X.
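The bottom-up traversal can be sketched as follows. A minimal Python sketch, assuming idom has already been computed (for example from the dominator sets):

```python
def dominance_frontiers(succs, idom, root):
    """Bottom-up dominance frontier computation over the dominator tree.

    idom maps each node (except the root) to its immediate dominator.
    DF(X) collects (1) successors of X not immediately dominated by X,
    and (2) members of DF(Z) for dominator-tree children Z of X, again
    keeping only nodes not immediately dominated by X.
    """
    children = {n: [] for n in succs}       # dominator-tree children
    for n, d in idom.items():
        children[d].append(n)
    df = {n: set() for n in succs}

    def walk(x):                            # bottom-up: children first
        for z in children[x]:
            walk(z)
        for y in succs[x]:                  # step (1): local successors
            if idom.get(y) != x:
                df[x].add(y)
        for z in children[x]:               # step (2): lift children's DF
            for y in df[z]:
                if idom.get(y) != x:
                    df[x].add(y)

    walk(root)
    return df

# Diamond: A -> B, A -> C, B -> D, C -> D; A immediately dominates all.
g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
idom = {"B": "A", "C": "A", "D": "A"}
print(dominance_frontiers(g, idom, "A"))   # DF(B) = DF(C) = {D}
```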

Fiber-optic view of dominance frontiers

[Figure: the example flow graph shown twice, with the dominance shadows of C and J cast.]

    DF(C) = {H}        DF(J) = {J, Exit}

Nodes just outside the dominance "shadow" cast by X are in DF(X).

Dominance frontiers (cont'd)

[Figure: the flow graph edge B → M superimposed on the dominator-tree path Enter, A, B, H, J, L, M, shown twice.]

Another way to visualize dominance frontiers is to start with a flow graph edge superimposed on the dominator tree, such as the edge B → M, and keep sliding the source of the edge up the tree. For each position where the tail is not above the head, the target of the edge is in the dominance frontier of the source.

So, M is in DF(L), DF(J), DF(H), and DF(B).

Control dependence

[Figure: the example flow graph shown twice, with controlling edges highlighted.]

Node B controls node G by edge B → C; node Enter controls H by edge Enter → A; node B controls M by edge B → M.

To compute control dependence, find the inverse of dominance frontiers of the reverse flow graph [28], but qualify the relation with the associated in-edges.

Intervals [63]

Algorithm summary:

1. Find back edges of the flow graph. Each target of a back edge is an interval header.
2. Find and collapse members of intervals, inside-out.
3. Path-compression helps speed the search across intervals that have already been identified.

[Figure: the example flow graph from Start to Stop, with nodes shaded by loop membership.] In the example:

- Highlighted nodes are interval headers.
- Dark-shaded nodes are in the innermost loop.
- Light-shaded nodes are in the outermost loop.
- The remaining nodes are not in a loop, but may be regarded as belonging to the outermost interval.

Formal specification of a data flow framework

The data flow graph

    G^df = (N^df, E^df)

has been described previously:

- its edges are oriented in the direction of the data flow problem;
- G^df is augmented with nodes Start and Stop and an edge (Start, Stop), suitably inserted with respect to the direction of the data flow problem.

Successors and predecessors are also defined with respect to the direction of the data flow problem:

    Preds(Y) = { X | (X, Y) ∈ E^df }
    Succs(Y) = { Z | (Y, Z) ∈ E^df }

The meet semilattice is

    L = (A, ∧, ⊤, ⊥, ≤)

where

- A is a set (usually a powerset), whose elements form the domain of the data flow problem;
- ⊤ and ⊥ are distinguished elements of A, usually called "top" and "bottom", respectively;
- ≤ is a reflexive partial order;
- ∧ is the associative and commutative meet operator, such that for any a, b ∈ A:

      a ∧ ⊥ = ⊥
      a ∧ ⊤ = a
      a ∧ b = b ∧ a
      a ∧ b ≤ a
      (a ≤ b) ⟺ (a ∧ b = a)

These rules allow formal reasoning about ⊤ and ⊥ in a framework.

Formal specification (cont'd)

The set F of transfer functions

    F : { f | f : L ↦ L }

has elements for describing the behavior of any flow graph node with respect to the data flow problem. To obtain a stable solution, we'll require the functions in F to be monotone:

    (∀ f ∈ F)(∀ x, y ∈ L)   x ≤ y → f(x) ≤ f(y)

In other words, a node cannot produce a "better" solution when given "worse" input. Given a two-level lattice and the (nonmonotone) transfer function

    f_Y(IN) = ⊤  if IN = ⊥
              ⊥  if IN = ⊤

evaluation of the data flow graph shown to the right (a cycle through node Y between Start and Stop) oscillates between solutions and never reaches a fixed point.

Examples of data flow frameworks: dominators

Data flow graph is the forward control flow graph of a procedure.

Lattice

- A is 2^{N^f}: the powerset of the flow graph's nodes. That is, any solution is a subset of N^f;
- ⊤ and ⊥ are N^f and ∅, respectively;
- ≤ is defined as

      (a ≤ b) ⟺ (a ⊆ b)

  For example, {A, B, C} ≤ {A, B, C, D}, but solutions {A, B} and {B, C} are incomparable, and thus unrelated by ≤;

- ∧ is set intersection, which legitimizes ⊤ and ⊥ since

      a ∧ ⊥ = a ∩ ∅ = ∅
      a ∧ ⊤ = a ∩ N^f = a

Transfer function at each node Z is

    f_Z(IN) = IN ∪ {Z}

Examples: dominators (cont'd)

Consider evaluation of the transfer function at node E, whose three in-edges carry the solutions {A,B,C}, {B,C,D}, and {A,B,C,D}.

Let's compute the meet-over-all-paths (MOP) solution:

1. Let AllPaths(E) be the set of all paths in the graph that terminate with node E, and let IN_p be the solution input to E on path p;
2. OUT = ⋀_{p ∈ AllPaths(E)} f_E(IN_p)

In our example, we would compute:

    Edge   IN           f_E(IN)
    1      {A,B,C}      {A,B,C,E}
    2      {B,C,D}      {B,C,D,E}
    3      {A,B,C,D}    {A,B,C,D,E}
    OUT                 {B,C,E}

Now, suppose that prior to applying f_E, we took the meet of all input values. We would then compute

    IN  = {A,B,C} ∧ {B,C,D} ∧ {A,B,C,D} = {B,C}
    OUT = f_E(IN) = IN ∪ {E} = {B,C,E}

These two approaches compute the same solution for this framework; unfortunately there are some frameworks for which the MOP evaluation produces a more precise solution.

Examples of data flow frameworks: live variables

Data flow graph is the reverse control flow graph of a procedure.

Lattice

- A is 2^V, where V is the set of the program's variables. Each element of A represents a set of live variables;
- ⊤ and ⊥ are ∅ and V, respectively;
- ≤ is defined as

      (a ≤ b) ⟺ (a ⊇ b)

- ∧ is set union, which again legitimizes ⊤ and ⊥.

Transfer functions

    f_Z(IN) = IN ∪ {v}   if node Z has an upwards-exposed use of v
              IN − {v}   if node Z kills v
              IN         otherwise

The two-level lattice (Dead) above (Live) occurs when the live variables data flow framework is specified only for a single variable v. It turns out that live variables is partitionable [70, 48], and so the solution for a set of variables V can be easily constructed from the separate solutions of each v ∈ V.
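A single-variable backward liveness pass can be sketched as below. This is an illustrative Python sketch; the per-node use/kill sets and node names are assumptions for the example:

```python
def live_in(succs, use, kill):
    """Backward liveness: live_out(Z) is the union (the meet) of
    live_in over Z's successors, and
    live_in(Z) = use(Z) | (live_out(Z) - kill(Z)).
    Iterates to a fixed point, starting from top (the empty set)."""
    IN = {n: set() for n in succs}
    changed = True
    while changed:
        changed = False
        for n in succs:
            out = set().union(*(IN[s] for s in succs[n])) if succs[n] else set()
            new = use[n] | (out - kill[n])
            if new != IN[n]:
                IN[n], changed = new, True
    return IN

# v defined at node 1, used at node 3; node 2 lies in between.
g = {1: [2], 2: [3], 3: []}
use = {1: set(), 2: set(), 3: {"v"}}
kill = {1: {"v"}, 2: set(), 3: set()}
print(live_in(g, use, kill))   # v is live into nodes 2 and 3, dead into 1
```

Because the framework is partitionable, running this once per variable (or with bit-vectors over all variables at once) yields the full solution.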

Examples of data flow frameworks: constant propagation

The object of constant propagation is to determine compile-time constants, to be substituted for run-time computations.

Data flow graph is the forward control flow graph of a procedure.

Lattice

- A is the set of all total functions

      t : V ↦ Z ∪ {⊤, ⊥}

  Each element of A associates a value from Z ∪ {⊤, ⊥} with each v ∈ V;
- ⊤ and ⊥ are t(v) = ⊤ and t(v) = ⊥, respectively;
- ∧ is performed by-variable, using the per-variable lattice with ⊤ above the integers ..., −2, −1, 0, 1, 2, ... and ⊥ below them;
- ≤ is defined as

      t_a ≤ t_b ⟺ (∀ v) t_a(v) ≤ t_b(v)

  also using that lattice.

Here ⊤ represents "any constant you like"; ⊥ represents "not a constant value"; and the meet of two disparate values for the same variable results in ⊥. Since the (per-variable) lattice has depth greater than two, solutions cannot be represented by simple bit-vectors. Most implementations use a list of tuples [69].
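The by-variable meet can be sketched as below. A minimal Python sketch; the TOP/BOT sentinels and variable names are illustrative:

```python
TOP, BOT = "top", "bot"   # illustrative sentinels for the lattice's T and _|_

def meet_value(a, b):
    """Meet of two per-variable values: T yields the other value,
    equal constants survive, disparate values fall to bottom."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT

def meet(ta, tb):
    """Meet two solutions, each a dict mapping variable -> lattice value."""
    return {v: meet_value(ta[v], tb[v]) for v in ta}

print(meet({"x": 3, "w": TOP}, {"x": 3, "w": 5}))  # agrees on x; w becomes 5
print(meet({"x": 3, "w": 1}, {"x": 7, "w": 1}))    # x disagrees -> bottom
```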

Examples: constant propagation (cont'd)

Transfer functions: Recall that the domain of our lattice is a set of functions. Thus, a transfer function is a mapping from one function to another. When we write

    OUT = f_Y(IN)

for the transfer function at node Y, IN and OUT are themselves functions from V to Z ∪ {⊤, ⊥}. The transfer function for a node Y is then formulated as follows:

- For each variable v whose value is not affected by Y, OUT(v) = IN(v);
- For each variable v whose value is changed by the expression at Y, OUT(v) is computed as follows:
  - If any variable y in the expression has IN(y) = ⊥, then OUT(v) = ⊥;
  - Otherwise, if any variable y has IN(y) = ⊤, then OUT(v) = ⊤;
  - Otherwise, each y has a constant value, and OUT(v) is set to the expression's resulting value.

Examples of transfer functions:

    Statement   Transfer function
    w = 5       f(IN)(v) = 5         if v = w
                           IN(v)     if v ≠ w

    w = y+2     f(IN)(v) = IN(v)     if v ≠ w
                           ⊤         if v = w and IN(y) = ⊤
                           ⊥         if v = w and IN(y) = ⊥
                           IN(y)+2   otherwise

Examples: constant propagation (cont'd)

[Figure: a flow graph with assignments w = 5, y = 3, w = y+2, y = w-2, x = 3, w = 1, and x = 7 along two paths that merge at the highlighted node u = w+x; the solution along each edge is annotated as a 4-tuple.]

Solutions in this example are given as [u; w; x; y]. Here, MOP gives a better result after the highlighted node: taking the meet of [⊥; 1; 7; 3] and [⊥; 5; 3; 3] yields [⊥; ⊥; ⊥; 3] on input, so u is computed as ⊥. However, w + x is 8 on each input edge.

Evaluation of a data flow framework

Although more specialized methods exist [46, 17, 48, 70], data flow frameworks are usually evaluated by one of the following methods:

Iteration [37, 45]

1. The flow graph is preprocessed to obtain a clever node ordering NL.
2. Within each iteration, nodes are visited in order NL:
   (a) The input to a node Y is evaluated as the meet of the solutions at Y's predecessors.
   (b) The transfer function across Y is evaluated on that input.
   (c) The solution is made available at each of Y's successors.
3. Iteration continues until convergence is reached, by predetermining the number of iterations that suffice for convergence, or by determining dynamically that subsequent iterations are unnecessary.

Elimination [58]

1. The flow graph is partitioned into a set of (typically single-entry) regions R. With interval analysis [3, 13], each region is a loop, or interval [66, 63], of the flow graph.
2. The flow graph is reduced into an interconnection of members of R.
3. Each r ∈ R is assigned a transfer function, typically computed by composition and meet within r [64].
4. Steps 1-3 are repeated on the reduced flow graph, until no further reduction can occur.
5. Solutions are computed in the outermost region, using the summarizing transfer functions of any imbedded regions. Inner regions are then recursively solved.

Evaluation of a data flow framework (cont'd)

- Iteration essentially utilizes single-node regions, and elimination may require multiple evaluations of a region to achieve a fixed point. Thus, the two methods have much in common.

- It's possible to combine aspects of both methods to achieve a hybrid evaluation method [48].

- Although each method can be incrementalized, there are difficulties and trade-offs to be considered [23, 60, 59, 12].

Evaluation by elimination

[Figure: the flow graph from Start to Stop, shown twice, with the discovered regions outlined: two "dotted" regions nested inside a "dashed" region.]

Suppose regions are discovered as shown above, with the two “dotted” regions nested inside the “dashed” region.

Evaluation by elimination (cont'd)

[Figure: the flow graph with each inner region collapsed to a summarizing transfer function f1(IN), f2(IN), or f3(IN).]

Each region is replaced by a suitable transfer function. The solution into and across the outermost region is computed first. That solution is pushed into the inner regions, so that they can be computed.

Iterative evaluation

Consider a flow graph on the nodes Start, A, B, C, D, Stop, annotated with a depth-first numbering: Start = 1, A = 2, B = 3, C = 4, Stop = 5, D = 6.

The transfer function at each node Y is

    f_Y(IN) = {Y} ∪ IN

Meet is set intersection, and ⊤ is {Start, A, B, C, D, Stop}. The following table illustrates evaluation using a poor node ordering:

    Round  Stop            D              C            B            A          Start
    1      ⊤               ⊤              ⊤            ⊤            ⊤          {Start}
    2      ⊤               ⊤              ⊤            ⊤            {Start,A}  {Start}
    3      ⊤               ⊤              ⊤            {Start,A,B}  {Start,A}  {Start}
    4      ⊤               {Start,A,B,D}  {Start,A,C}  {Start,A,B}  {Start,A}  {Start}
    5      {Start,A,Stop}  {Start,A,D}    {Start,A,C}  {Start,A,B}  {Start,A}  {Start}
    6      .......................... No change ..........................

Iteration (cont'd)

Now let's try ordering the nodes by their depth-first numbering:

    Round  Start    A          B            D              Stop                C
    1      {Start}  {Start,A}  {Start,A,B}  {Start,A,B,D}  {Start,A,B,D,Stop}  {Start,A,C}
    2      {Start}  {Start,A}  {Start,A,B}  {Start,A,D}    {Start,A,Stop}      {Start,A,C}
    3      .......................... No change ..........................

Convergence is achieved in fewer iterations because information is propagated in the direction of the data flow problem. Can we do better?

Recall edge classification with respect to a DFST:

tree and chord: these are satisfied by preorder depth-first ordering;

cross: these require visiting children right-to-left, in reverse of the order in which they were visited during depth-first numbering;

back: well, you can't win them all.

Iteration (cont'd)

Right-to-left preorder can evaluate a DAG (directed acyclic graph) in a single pass. Now let's try this DAG-optimal node ordering:

    Round  Start    A          B            C            D            Stop
    1      {Start}  {Start,A}  {Start,A,B}  {Start,A,C}  {Start,A,D}  {Start,A,Stop}
    2      .......................... No change ..........................

Convergence is achieved in a single iteration, because nodes are visited topologically.

Note that right-to-left preorder is equivalent to reverse postorder (try a structurally inductive proof). Thus, a DAG can be evaluated in a single pass. What about back edges?
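Reverse postorder falls out of a single depth-first search. The sketch below is illustrative (graph shape and node names are assumptions, not the slide's exact graph):

```python
def reverse_postorder(succs, root):
    """Record each node after all of its DFS descendants, then reverse.

    In the reversed list, the source of every tree, chord, and cross
    edge precedes its target, so a DAG converges in one pass.
    """
    seen, post = set(), []

    def dfs(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                dfs(s)
        post.append(n)          # record n only after its descendants

    dfs(root)
    return post[::-1]

# A diamond-shaped DAG from Start to Stop (illustrative):
g = {"Start": ["A"], "A": ["B", "D"], "B": ["C"],
     "D": ["C"], "C": ["Stop"], "Stop": []}
print(reverse_postorder(g, "Start"))
```

Every edge's source appears before its target in the returned order, which is exactly the topological property the single-pass argument needs.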

Termination of iterative evaluation

[Figure: node Y with inputs a and b produces f(a ∧ b).]

If we require transfer functions to be monotonic, then

    c1 ≤ c2 → f_Y(c1) ≤ f_Y(c2)

When meets are taken, we obtain

    a ∧ b ≤ a
    a ∧ b ≤ b

We can then expect that if we monitor the solution at any given flow graph edge, we will see a nonincreasing sequence: the solution never gets "better" as iteration proceeds. If the lattice has finite descending chains, then iteration must eventually converge, and so the iterative algorithm terminates.

Rate of convergence

[Figure: a flow graph on nodes 1 through 8, containing back edges.]

Reverse postorder for this flow graph is

    [1, 3, 4, 6, 7, 8, 5, 2]

We'll now look at evaluating two frameworks on this graph.

Convergence for reaching definitions

Consider the framework of reaching definitions, applied to the flow graph shown on the left, where each definition of x is preserving. [Figure: the graph, annotated with definitions x1, x2, x3 and the sets {x1}, {x1,x2}, {x1,x2,x3} that reach each edge.]

The solution stabilizes in a single pass to the solution shown on the right.

Convergence for reaching definitions (cont'd)

[Figure: first and second passes over the graph with one back edge added.]

With the back edge added, two passes are required to reach convergence as shown above.

Convergence for reaching definitions (cont'd)

[Figure: first and second passes over the graph with two nested back edges.]

With two nested back edges, two passes still suffice for convergence.

Convergence for reaching definitions (cont'd)

[Figure: first and second passes over the graph whose back edges are not nested; node 4's solution is still changing during the second pass.]

Convergence isn’t reached in the second pass, because node 4 does not have its final solution when node 1 is evaluated.

Convergence for reaching definitions (cont'd)

[Figure: the third pass; every node now carries {x1,x2,x3}.]

Suppose information from node 6 must reach node 2. The information must first use the back edge 8 → 4 (after the first pass), and then use the back edge 4 → 1 (after the second pass). In the third pass, information from node 1 can flow to node 2.

Let w be the maximum number of consecutive back edges through which information must flow to "communicate" a solution from nodes X to Y. Then at least w + 1 passes are required to reach convergence, and w is called the width of the flow graph [33]. We'll now examine the conditions under which w + 1 iterations suffice for convergence.

In the third pass, node 4 has stabilized, and so all nodes receive their final solution.

Rapid data flow frameworks

A data flow framework is rapid if

    (∀ a ∈ A)(∀ f ∈ F)   a ∧ f(⊤) ≤ f(a)

When presented with some real input a, a ∧ f(⊤) takes us just as far toward convergence (down the lattice) as would application of f on the real input a. More formally, for monotone f,

    f(a ∧ f(⊤)) ≤ f(f(a))

where f(f(a)) represents the solution after two iterations.

[Figure: lattice positions of a, f(⊤), f(a ∧ f(⊤)), f(a), f(f(a)), f(f(f(a))), etc.]

With a rapid framework, one iteration gathers all the information necessary to reach convergence; the solution is not affected by subsequent iteration.

Reaching definitions is rapid

Recall the framework of reaching definitions:

    ⊤ = ∅
    ∧ is set union
    (a ≤ b) ⟺ (a ⊇ b)

Any transfer function in this framework can be characterized by

    f_p(IN) = GEN_p ∪ (IN − KILL_p)

where KILL_p and GEN_p are constants for path p. And so:

    a ∧ f(⊤) = a ∪ (GEN_p ∪ (∅ − KILL_p))
             = a ∪ GEN_p
             ⊇ GEN_p ∪ (a − KILL_p)
             = f(a)

Since ≤ is ⊇, this establishes a ∧ f(⊤) ≤ f(a): the framework is rapid.
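The rapidity condition can be checked mechanically on a concrete gen/kill transfer function. A small Python sketch; the definition names d1, d2, d3 are illustrative:

```python
def make_transfer(gen, kill):
    """A reaching-definitions transfer function f(IN) = GEN | (IN - KILL)."""
    return lambda IN: gen | (IN - kill)

# In this framework meet is union and top is the empty set, and
# a <= b means a is a superset of b, so the rapid condition
# "a ^ f(top) <= f(a)" reads "a | f(set()) is a superset of f(a)".
f = make_transfer(gen={"d1"}, kill={"d2"})
a = {"d2", "d3"}

lhs = a | f(set())   # a meet f(top)
rhs = f(a)           # f applied to the real input
print(lhs, rhs, lhs >= rhs)   # superset check confirms rapidity here
```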

A nonrapid problem

Consider the problem of determining the number of bits that should be associated with a variable. In the program shown to the right (a loop containing the assignments q = w, x = y, w = x, and y = x+d), assume that node 1 forces the solution:

    Variable    # bits
    x, y, q, w  4
    d           7

Let NB(v) denote the number of bits associated with v; let solution a be denoted as the 5-tuple [NB(x), NB(y), NB(q), NB(w), NB(d)]; let the value for x in solution a be denoted a_x. For this framework

    a ∧ b = [max(a_x, b_x), ..., max(a_d, b_d)]
    ⊤ = [0, 0, 0, 0, 0]
    (a ≤ b) ⟺ (a ∧ b = a)

For transfer functions, let the number of bits of an assigned variable be the maximum number of bits associated with any variable used in the assignment.

Convergence takes 5 passes

[Figure: evaluating the loop in reverse postorder 1, 3, 4, 6, 7, 8, 5, 2; each pass propagates the 7-bit requirement one assignment further: first y is 7, then x is 7, then w is 7, then q is 7.]

Consider a two-variable program that assigns y = d, so that

    f(IN) = [IN_d, IN_d]

Taking f as above and a = [2, 7], we obtain

    a ∧ f(⊤) = [2, 7] ∧ f([0, 0]) = [2, 7] ∧ [0, 0] = [2, 7]
    f(a) = [7, 7]

and [2, 7] ≰ [7, 7], so the framework is not rapid.

Distributive frameworks

Monotonicity and finite lattice depth guarantee termination, but how good is the answer obtained at convergence? The meet properties give us

    a ∧ b ≤ a
    a ∧ b ≤ b

and monotonicity gives us

    f(a ∧ b) ≤ f(a)
    f(a ∧ b) ≤ f(b)

and so we obtain

    f(a ∧ b) ≤ f(a) ∧ f(b)

When

    (∀ a, b ∈ A)(∀ f ∈ F)   f(a ∧ b) = f(a) ∧ f(b)

then we call the data flow framework distributive. Evaluation of such a framework produces the best possible (static) solution.

Examples

Each of the bit-vectoring data flow frameworks is distributive, since

    f(a) ∧ f(b) = (GEN ∪ (a − KILL)) ∧ (GEN ∪ (b − KILL))
                = GEN ∪ ((a − KILL) ∧ (b − KILL))
                = GEN ∪ ((a ∧ b) − KILL)
                = f(a ∧ b)

However, constant propagation is not distributive. Consider the solution for u after the highlighted node u = w+x, reached on one path through x = 3 and w = 5, and on another through w = 1 and x = 7. If the node's transfer function is applied separately to each in-edge, then we obtain u = x + w = 8 on each path, and so the meet determines that u = 8.

On the other hand, if we take the meet of the solutions entering the node, we find w = ⊥ and x = ⊥, so that u = x + w is also ⊥.

Thus, our iterative technique does not compute the meet-over-all-paths (MOP) solution. For a DAG, we could compute MOP in nondistributive frameworks, but such computation could take exponential time in the size of the flow graph; for cyclic graphs, such computation may not terminate.
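The nondistributivity can be demonstrated directly. A self-contained Python sketch of the two-path example (TOP/BOT sentinels and variable names are illustrative):

```python
TOP, BOT = "top", "bot"

def meet(ta, tb):
    """By-variable constant-propagation meet."""
    def m(a, b):
        if a == TOP:
            return b
        if b == TOP:
            return a
        return a if a == b else BOT
    return {v: m(ta[v], tb[v]) for v in ta}

def add(a, b):
    """Addition lifted to the constant lattice."""
    if BOT in (a, b):
        return BOT
    if TOP in (a, b):
        return TOP
    return a + b

def transfer(env):
    """Transfer function for the node u = w + x."""
    out = dict(env)
    out["u"] = add(env["w"], env["x"])
    return out

p1 = {"w": 5, "x": 3, "u": TOP}   # path through w = 5; x = 3
p2 = {"w": 1, "x": 7, "u": TOP}   # path through w = 1; x = 7

mop = meet(transfer(p1), transfer(p2))   # transfer each path, then meet
mfp = transfer(meet(p1, p2))             # meet first, then transfer
print(mop["u"], mfp["u"])                # 8 versus bottom
```

Transferring before meeting preserves the fact that w + x is 8 on each path; meeting first destroys it, exactly as the slide describes.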

Data flow evaluation graphs: traditional

Consider the data flow problem of constant propagation, applied to the flow graph shown to the right, in which two definitions v = 5 merge at node M before reaching the uses x = v, y = v, and z = v. As discussed previously, solutions are traditionally propagated through the flow graph:

- At nodes with multiple in-edges, a meet is performed of the input solutions;

- Transfer functions are evaluated across nodes;

- The output solution is made available.

Solutions are iteratively computed until convergence is achieved.

Notice how solutions for v are propagated through portions of the flow graph that neither affect nor are concerned with solutions for v.

Data flow evaluation graphs: direct connection

The dashed lines in the flow graph shown to the right represent direct connections between nodes of the flow graph.

- Data flow information is propagated along the direct connections rather than along flow graph edges;

- Meets occur where multiple direct connections are incident on a node;

- Iteration occurs until convergence is achieved.

These types of connections are often called def-use chains. Some data flow problems (e.g., constant propagation) are often specified using def-use chains, which assumes that reaching definitions has already been solved.

Although uninteresting portions of the graph are avoided during evaluation, notice how the meet of the same information is computed three times. The traditional approach computed that meet only once.

Sparse data flow evaluation graphs

Here we see the best of both worlds:

- Information does not propagate through uninteresting regions of the flow graph;

- Only one meet operation is performed;

- Information flows directly to meet and useful sites.

We'll now study an algorithm for constructing sparse evaluation graphs [21]. That algorithm, applied to a special framework, will construct Static Single Assignment (SSA) form [28].

Sparse evaluation graphs (cont'd)

Let's define two special kinds of transfer functions:

identity: A node Y has identity transference if f_Y(a) = a, ∀ a ∈ A. Generally, such nodes will not be useful in evaluating a data flow framework unless information is gainfully combined there.

constant: A node Y has constant transference, denoted f_Y(IN) = K_Y, if f_Y(a) = K_Y, ∀ a ∈ A. Because these nodes produce the same solution for any input, we can improve on the efficiency of propagating information through such nodes.

Steps for constructing a sparse evaluation graph G^sg [21]:

1. Let N^sg be the set of nodes with nonidentity transference. The node Start is always included in this set.
2. Add to N^sg those nodes where meets occur. These so-called φ-nodes are found by iterative closure using dominance frontiers.
3. Construct E^sg while traversing the original flow graph's dominator tree.

Although an almost-linear algorithm has been recently developed for placing φ-nodes [27], we'll examine the method more commonly in use [28].

The resulting sparse graph is ready for evaluation using any method (e.g., iteration). Also produced is a function that maps edges in the original flow graph to nodes in the sparse graph, so that a solution is available for every original flow graph edge after the sparse graph has been evaluated.

Sparse evaluation graph construction

    (1)  N^sg ← {Start} ∪ { Y | f_Y is not the identity }
    (2)  MeetNodes ← ⋃_{X ∈ N^sg} DF⁺(X)
    (3)  N^sg ← N^sg ∪ MeetNodes
    (4)  ∀ Y ∈ N^sg, IN_Y ← ⊤
    (5)  Stack ← Empty
         call Search(Start)

    Procedure Link(Z)
    (6)      if Z ∈ N^sg − {Start} then
    (7)          if f_Z is not a constant function K_Z then
    (8)              if f_TOS is a constant function K_TOS then
                         IN_Z ← IN_Z ∧ K_TOS
                     else
    (9)                  E^sg ← E^sg ∪ {(TOS, Z)}
                     fi
                 fi
             fi
         end

- Step (1) initializes N^sg as described previously.

- Step (2) actually represents the iterative dominance frontier closure of nodes in N^sg. The set MeetNodes can be maintained as an attribute of elements in N^sg, but for clarity the set is shown explicitly.

- The sparse nodes are complete at step (3).

- Step (5) initializes the auxiliary stack, onto which Start will be pushed in the first call to Search.

- For now, ignore step (4) and pretend that Link(Z) is just steps (6) and (9).

Copyright c 1993 Ron K. Cytron. All rights reserved –86– SIGPLAN ’93 PROGRAM OPTIMIZATION TUTORIAL Sparse evaluation graph construction (cont'd)



    Procedure Search(Y)
      if Y ∉ MeetNodes then                         (10)
        call Link(Y)
      fi
      if Y ∈ N_sg then                              (11)
        call push(Y)
      fi
      foreach Z ∈ Succ(Y) do
        M(Y, Z) ← TOS                               (12)
        if Z ∈ MeetNodes then                       (13)
          call Link(Z)
        fi
      od
      foreach child C of Y in DT do
        call Search(C)
      od
      if Y ∈ N_sg then
        call pop(Y)
      fi
    end

Notes:

- If Y is not a meet node, then its input arrives from whatever is currently on top of stack. Step 10 attempts to install the appropriate edge.
- Meet nodes receive their inputs as directed by step 13.
- If Y is a sparse graph node, then it becomes the relevant solution node at step 11.
- Edges are mapped by step 12.
- The construction proceeds recursively down the dominator tree. Upon leaving node Y, Search pops Y if it had been pushed earlier, exposing the node stacked below Y as the relevant sparse graph node.
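The dominator-tree walk of Search and Link can be sketched as follows (a Python sketch under stated assumptions: the successor map, dominator-tree children, and node sets are hypothetical inputs, and the constant-transference shortcut of steps 7 and 8 is omitted, so Link is just steps 6 and 9):

```python
# Sketch of Search/Link: a dominator-tree walk that installs sparse
# edges and maps every flow edge to the sparse node whose solution
# holds on that edge.

def build_sparse_edges(succ, dom_children, root, sparse, meets):
    edges, edge_map, stack = set(), {}, []

    def link(z):                             # steps 6 and 9
        if z in sparse and stack:            # guard: nothing above the root
            edges.add((stack[-1], z))        # edge from top-of-stack to z

    def search(y):
        if y not in meets:                   # step 10: non-meet input
            link(y)
        if y in sparse:                      # step 11: y becomes relevant
            stack.append(y)
        for z in succ.get(y, ()):
            edge_map[(y, z)] = stack[-1]     # step 12: map the flow edge
            if z in meets:                   # step 13: meet inputs
                link(z)
        for c in dom_children.get(y, ()):    # recurse down dominator tree
            search(c)
        if y in sparse:
            stack.pop()                      # expose the node below y

    search(root)
    return edges, edge_map

# Diamond example: root A and branch nodes B, C are sparse; D is a meet.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
dom_children = {"A": ["B", "C", "D"], "D": ["E"]}
edges, edge_map = build_sparse_edges(succ, dom_children, "A",
                                     {"A", "B", "C", "D"}, {"D"})
```

In the example, D receives one meet edge from each branch, and the flow edge D→E is mapped back to D, whose solution applies there.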

Sparse evaluation graph for live variables

[Figure: the flow graph for live variables (left), with each edge annotated Live or Not Live, and the corresponding sparse evaluation graph (right).]

Even though the graph shown on the right is quite sparse, we’ll see how to eliminate most of its edges, obtaining an acyclic evaluation graph.

Nodes in the sparse graph

[Figure: the flow graph with nonidentity transference nodes shaded (left) and meet nodes highlighted (right).]

Shaded nodes in the graph on the left are nonidentity transference nodes. Highlighted nodes in the graph to the right are meet nodes, obtained by dominance frontier closure (e.g., C ∈ DF(E) and B ∈ DF(C)).

Edges in the sparse graph 1

[Figure: flow graph (left) and sparse graph under construction (right).]

In this example, the node currently on top of stack will be shaded; dashed edges are mapped in the flow graph (left) or newly installed in the sparse graph (right).

Edges in the sparse graph 2

[Figure: flow graph (left) and sparse graph under construction (right).]

Node L receives one of its meet edges from M .

Edges in the sparse graph 3

[Figure: flow graph (left) and sparse graph under construction (right).]

No edges are added to the sparse graph when nodes L and K are visited; however, the edges are mapped to the relevant sparse graph node (shaded).

Edges in the sparse graph 4

[Figure: flow graph (left) and sparse graph under construction (right).]

Edges are mapped, and the sparse edge to meet node L is installed.

Edges in the sparse graph 5

[Figure: flow graph (left) and sparse graph under construction (right).]

Here are shown the results of visiting nodes J, H, G, and D. For D, the meet edge is installed from L to B. Otherwise, only edge-mapping takes place.

Edges in the sparse graph 6

[Figure: flow graph (left) and sparse graph under construction (right).]

Prior to making E the relevant node, a link edge is installed to E from the former relevant node L. Edges are mapped and a meet edge is installed in the sparse graph from E to C. Similar behavior occurs when visiting F (elided).

Edges in the sparse graph 7

[Figure: flow graph (left) and sparse graph under construction (right).]

Here are shown the results of visiting node C .

Edges in the sparse graph 8

[Figure: flow graph (left) and sparse graph under construction (right).]

Here are shown the results of visiting nodes B and A , and the process is complete.

Edge optimization

If a node Y has constant transference (f_Y = K_Y), then the following optimizations are possible:

[Figure: a constant-transference node Y, with its incoming edges (left) and outgoing edges (right) each marked "Don't need these".]

If the transfer function of node Y does not reference its input solution, then no information need be transmitted into the node at evaluation-time. If the output of node Y is always the same, then node Y's solution can be forwarded to successors of Y prior to evaluation.

The left optimization is safe for any evaluation method; the right optimization is appropriate only where path-specific information is not utilized. Since our iterative evaluator performs meets prior to node evaluation, both optimizations are appropri- ate for that method.

Effects of optimization

[Figure: the sparse graph before (left) and after (right) edge optimization; the optimized graph is acyclic.]

The resulting graph is acyclic, so a single pass of the ordinary iterative evaluator suffices for convergence.
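The single-pass claim can be illustrated with a topological-order evaluator (a Python sketch; the predecessor map and transfer functions are hypothetical, and the standard library's `graphlib` supplies the ordering):

```python
from graphlib import TopologicalSorter

# Sketch: an acyclic sparse graph can be evaluated in one pass by
# visiting nodes in topological order, so every input solution is
# final before its consumer is evaluated.

def one_pass_eval(preds, transfer, entry_value):
    """preds: node -> list of predecessors; transfer: node -> function
    of the list of input solutions. Entry nodes get entry_value."""
    sol = {}
    for n in TopologicalSorter(preds).static_order():
        ins = [sol[p] for p in preds.get(n, ())]
        sol[n] = transfer[n](ins) if ins else entry_value
    return sol

# Tiny liveness-flavored example: the meet operation is set union.
preds = {"a": [], "b": ["a"], "c": ["a", "b"]}
transfer = {
    "b": lambda ins: ins[0] | {"v"},     # b generates a use of v
    "c": lambda ins: ins[0] | ins[1],    # c is a meet node
}
sol = one_pass_eval(preds, transfer, frozenset())
```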

Big picture

[Diagram: Program → Control Flow Analysis → Sparse Evaluation Graph → Evaluation, applied to data flow problems DF1, DF2, ...; SSA form supports Constant Propagation, Value Numbering, and Code Motion, after which De-SSA and lowering produce the final program.]

We’ll now examine some special algorithms for optimization, based on a single assignment representation.

Static Single Assignment (SSA) form

Recall our reaching definitions example, shown on the left, and its solution, shown on the right:

[Figure: the flow graph (left) with definitions v1, v2, v3, v4 of v, and the reaching-definitions solution on each edge (right): {v1, v2} reaches the use at G, and {v1, v2, v3, v4} reaches the use at H.]

Notice how the use of v at G is reached by two definitions, and the use at H is reached by four definitions. If each use were reached by just a single definition, then a direct connection graph would yield a sparse representation of def-use chains.

SSA form (cont'd)

Here we see the SSA form of the program.

[Figure: the flow graph in SSA form, containing v6 = φ(v0, v3), v1 = 1, v2 = 2, v4 = φ(v1, v2) with a use of v4 at G, v5 = φ(v4, v6) with a use of v5 at H, v3 = 3, and call f(v3).]

- Each definition of v is with respect to a distinct symbol: v1 is as different from v2 as x would be from y.
- Where multiple definitions reach a node, a φ-function is inserted, with arguments sufficient to receive a different "name" for v on each in-edge.
- Each use is appropriately renamed to the distinct definition that reaches it.
- Although φ-functions could have been placed at every node, the program shown has exactly the right number and placement of φ-functions to combine multiple defs from the original program.
- Our example assumes that procedure f does not modify v.

SSA form (cont'd)

Each def is now regarded as a "killing" def, even those usually regarded as preserving defs. For example, if v is potentially modified by a call site, then the old value for v must be passed into the called procedure, so that its value can be assigned to the name for v that always emerges from the procedure.

Original:

    Procedure foo(v)
      if (c) then
        v ← 7
      else
        /* Do nothing */
      fi
    end

SSA form:

    Procedure foo(v_in; v_out)
      v0 ← v_in
      if (c) then
        v1 ← 7
      else
        /* Do nothing */
      fi
      v2 ← φ(v0, v1)
      v_out ← v2
    end

SSA form can be computed by a data flow framework, in which the transfer function for a node with multiple reaching defs of v generates its own def of v. Uses are then named by the solution in effect at the associated node.

SSA form construction [28]

1. Turn every preserving def into a killing def, by copying potentially unmodified values (at subscripted defs, call sites, aliased defs, etc.).

2. Consider each node that defines v as having nonidentity transference.

3. φ-functions then go at the meet nodes, as determined by the sparse graph builder.

4. Uses are renamed using the solution propagated along the associated in-edge. For ordinary uses, nodes have only a single in-edge; for φ-uses, the use must be distinguished by in-edge.

Actually, the SSA work predates the sparse evaluation graph work, but it's easier to describe the SSA algorithm this way if one already understands sparse evaluation graphs.
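The renaming step can be sketched as a dominator-tree walk that keeps a stack of the definition of v currently in scope (a minimal Python sketch; the block contents, successor map, and dominator tree are hypothetical, and φ operands are filled in from each predecessor's current name):

```python
from collections import defaultdict

# Sketch of renaming for a single variable v. "def" and "phi"
# statements create a new name; "use" takes the name on top of the
# stack, i.e., the nearest dominating definition.

def rename(blocks, succ, dom_children, root):
    counter = 0
    stack = [0]                              # v0: v's value on entry
    out = {}                                 # (block, i) -> renamed stmt
    phi_args = defaultdict(list)             # (block, i) -> per-edge names

    def walk(b):
        nonlocal counter
        pushed = 0
        for i, stmt in enumerate(blocks.get(b, [])):
            if stmt == "use":
                out[(b, i)] = f"use v{stack[-1]}"
            else:                            # "def" or "phi"
                counter += 1
                stack.append(counter)
                pushed += 1
                out[(b, i)] = f"v{counter} = {stmt}"
        for s in succ.get(b, ()):            # fill phi operands:
            for i, stmt in enumerate(blocks.get(s, [])):
                if stmt == "phi":            # the name flowing along b->s
                    phi_args[(s, i)].append((b, f"v{stack[-1]}"))
        for c in dom_children.get(b, ()):
            walk(c)
        del stack[len(stack) - pushed:]      # leave b's scope
    walk(root)
    return out, dict(phi_args)

# Diamond: defs of v in A and B, a phi and a use at the join D.
blocks = {"A": ["def"], "B": ["def"], "C": [], "D": ["phi", "use"]}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
dom_children = {"A": ["B", "C", "D"]}
out, phis = rename(blocks, succ, dom_children, "A")
```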

Why is SSA good?

Data flow algorithms built on def-use chains gain asymptotic efficiency as shown below:

[Figure: n defs each reaching n uses directly (quadratic def-use chains) versus n defs funneled through a single φ-function to n uses (linear def-use chains).]

With each use reached by a unique def, program transformations such as code motion are simplified: motion of a use depends primarily on motion of its unique reaching def. Intuitively, the program has been transformed to represent directly the flow of values. We’ll now look at some optimizations that are simplified by SSA form.

SSA constant propagator [69]

Original Program:

    i ← 6
    j ← 1
    k ← 1
    repeat
      if (i = 6) then
        k ← 0
      else
        i ← i + 1
      fi
      i ← i + k
      j ← j + 1
    until (i = j)

SSA form:

    i1 ← 6
    j1 ← 1
    k1 ← 1
    repeat
      i2 ← φ(i1, i5)
      j2 ← φ(j1, j3)
      k2 ← φ(k1, k4)
      if (i2 = 6) then
        k3 ← 0
      else
        i3 ← i2 + 1
      fi
      i4 ← φ(i2, i3)
      k4 ← φ(k3, k2)
      i5 ← i4 + k4
      j3 ← j2 + 1
    until (i5 = j3)

Each name is initialized to the lattice value ⊤. Propagation proceeds only along edges marked executable. Such marking takes place when the associated condition reaches a non-⊤ value. The value ⊤ propagates along unexecutable edges.

SSA constant propagator (cont'd)

Pass 1 (the SSA form is repeated from the previous page; annotations show the lattice evaluation):

    i1 ← 6
    j1 ← 1
    k1 ← 1
    repeat
      i2 ← φ(i1, i5) = 6 ∧ ⊤ = 6
      j2 ← φ(j1, j3) = 1 ∧ ⊤ = 1
      k2 ← φ(k1, k4) = 1 ∧ ⊤ = 1
      if (i2 = 6) then
        k3 ← 0
      else
        /* Not executed */
      fi
      i4 ← φ(i2, i3) = 6 ∧ ⊤ = 6
      k4 ← φ(k3, k2) = 0 ∧ ⊤ = 0
      i5 ← i4 + k4 = 6 + 0 = 6
      j3 ← j2 + 1 = 1 + 1 = 2
    until (i5 = j3) = (6 = 2) = false

SSA constant propagator (cont'd)

Pass 2:

    i1 ← 6
    j1 ← 1
    k1 ← 1
    repeat
      i2 ← φ(i1, i5) = 6 ∧ 6 = 6
      j2 ← φ(j1, j3) = 1 ∧ 2 = ⊥
      k2 ← φ(k1, k4) = 1 ∧ 0 = ⊥
      if (i2 = 6) then
        k3 ← 0
      else
        /* Not executed */
      fi
      i4 ← φ(i2, i3) = 6 ∧ ⊤ = 6
      k4 ← φ(k3, k2) = 0 ∧ ⊤ = 0
      i5 ← i4 + k4 = 6 + 0 = 6
      j3 ← j2 + 1 = ⊥ + 1 = ⊥
    until (i5 = j3) = (6 = ⊥) = ⊥

Our solution has stabilized. Even though k2 is ⊥, that value is never transmitted along the unexecuted edge to the φ for k4.
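The lattice behavior driving this fixed point can be sketched as follows (a Python sketch; TOP and BOT stand for ⊤ and ⊥, and the per-edge executable flags are assumptions standing in for the propagator's edge marking):

```python
# Sketch of the lattice operations used above. TOP means "no evidence
# yet"; BOT means "not constant". A phi meets only operands arriving
# along executable edges; TOP flows on dead edges, which is why k4
# stays 0 in the example.

TOP, BOT = object(), object()

def meet(a, b):
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else BOT

def phi(args):
    """args: (value, executable?) pairs, one per in-edge."""
    result = TOP
    for val, executable in args:
        result = meet(result, val if executable else TOP)
    return result

# k4 = phi(k3, k2): the edge carrying k2 is unexecuted, so k4 = 0.
k4 = phi([(0, True), (BOT, False)])
# j2 = phi(j1, j3) with both edges executable: 1 meet 2 = BOT.
j2 = phi([(1, True), (2, True)])
```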

SSA value numbering [5, 56]

    a ← read()
    v ← a + 2
    c ← a
    w ← c + 2
    t ← a + 1
    x ← t + 1

Value numbering attempts to label each computation of the program with a number, such that identical computations are identically labeled.

- Prior to SSA form, value numbering algorithms were applied only within basic blocks (i.e., no branching) [2].
- Early value numbering algorithms relied on textual equivalence to determine value equivalence. The text of each expression (and perhaps subexpression) was hashed to a value number. Intervening defs of variables contained in an expression would kill the expression. This approach could not detect equivalence of v and w in the example above, since a + 2 is not textually equivalent to c + 2.

For the above program, constant propagation will fail to determine a compile-time value for v and w, because the behavior of the read function must be captured as ⊥ at compile-time. Nonetheless, we can see that v and w will hold the same value, even though we cannot determine at compile-time exactly what that value will be. Such knowledge helps us replace the computation of c + 2 by a simple copy from v.

It seems that x ought to have the same value as v and w, but our algorithm won't discover this, because the "function" that computes x (λn.n+1) differs from the "function" that computes v and w (λn.n+2).

SSA value numbering (cont'd)

- We essentially seek a partition of SSA names by value equivalence, since value equivalence is reflexive, symmetric, and transitive.
- We'll initially assume that all SSA names have the same value.
- When evidence surfaces that a given block may contain disparate values (names), we'll talk about splitting the block. Generally, the algorithm will only split a block in two. However, the first split is more severe, in that names are split by the functional form of the expressions that compute their value.

[Figure: the initial partition, with c, v, a, w, x, and t in a single block, and the final partition, in which v and w share a block, c and a share a block, and t and x occupy blocks of their own.]

Above are shown the initial and final partitions for the example on the previous page.

SSA value numbering (cont'd)

After construction of SSA form, we split by the function name that computes values for the assigned variables. We thus distinguish between binary addition, multiplication, etc.

One further point is that φ-functions at different nodes must also be distinguished, even though their function form appears the same. This is necessary because a branch taken into one φ-function is not necessarily the same branch taken into another, unless the two functions reside in the same node.

[Figure: distinct function classes - binary plus (x * y + z), unary minus, cos(x), foo(x,y), bar(x,y), and X = φ(X,X) at node Y versus X = φ(X,X) at node Z.]

SSA value numbering example

    if (condA) then
      a1 ← α
      if (condB) then
        b1 ← α
      else
        a2 ← β
        b2 ← β
      fi
      a3 ← φ(a1, a2)
      b3 ← φ(b1, b2)
      c2 ← *a3
      d2 ← *b3
    else
      b4 ← γ
    fi
    a5 ← φ(a1, a0)
    b5 ← φ(b0, b4)
    c3 ← *a5
    d3 ← *b5
    e3 ← *a5

For brevity, symbols α, β, and γ represent syntactically distinct function classes in the program shown above. In the figures that follow, we'll see that c2 and d2 have the same value, while c3 and d3 do not. Thus, program optimization will save a memory fetch by using the value of c2 for d2.

Note that if b is declared volatile in the language C, then this optimization would be incorrect, because each reference to b should be realized. How else can one account for volatility in this optimization? Perhaps by assuming that volatile variables cannot have the same value. It would be difficult and expensive to express all possible defs of a volatile variable in SSA form.

SSA value numbering example (cont'd)

Here we see the initial partition of SSA names:

- The syntactic classes α, β, and γ are distinguished;
- φ-functions at different nodes are distinguished;
- The initial value of each variable (v0) is considered identical;
- Within each syntactic class, values are considered identical.

[Figure: the initial partition - {a1, b1}, {a2, b2}, {b4}, {a3, b3}, {a5, b5}, {a0, b0, c0, d0}, and {c2, d2, c3, d3, e3}.]

SSA value numbering example (cont'd)

[Figure: two stages of splitting the dereference block {c2, d2, c3, d3, e3}.]

On the left, the block with a5 splits the five names shown into two subblocks; on the right, b4 splits a5 from b5.

SSA value numbering example (cont'd)

Finally, b5 splits c3 from d3. Here, note that we could have used either a5 or b5 to do the job. Asymptotic efficiency is gained by choosing b5, because there are fewer uses of that name than of a5.

[Figure: the final partition of SSA names.]

In summary, the algorithm is as follows:

1. Let W be a worklist of blocks to be used for further splitting.
2. Pick and remove (arbitrary) block D from W.
3. For each block C properly split by D,
   (a) If C is on W, then remove C and enqueue its splits by D;
   (b) Otherwise, enqueue the split with the fewest uses.
4. Loop to step 2 until W is empty.
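The splitting loop can be sketched as a fixed-point refinement (a Python sketch under stated assumptions: this simplified variant refines until stable rather than managing the worklist W and its fewest-uses rule, and copies are resolved to their source so that c shares a's value, as in the earlier example):

```python
# Sketch: partition SSA names by value equivalence. Names start grouped
# by the functional form of their defining expression; a block splits
# when its members draw operands from different current blocks.

def value_partition(defs):
    """defs: name -> (operator, operand names...)."""
    def resolve(n):                          # copies share their source
        while defs[n][0] == "copy":
            n = defs[n][1]
        return n

    names = [n for n in defs if defs[n][0] != "copy"]
    by_op = {}                               # severe first split: by form
    for n in names:
        by_op.setdefault(defs[n][0], set()).add(n)
    partition = list(by_op.values())

    changed = True
    while changed:                           # refine to a fixed point
        changed = False
        block_of = {n: i for i, b in enumerate(partition) for n in b}
        for b in partition:
            groups = {}                      # group by operand blocks
            for n in b:
                key = tuple(block_of[resolve(a)] for a in defs[n][1:])
                groups.setdefault(key, set()).add(n)
            if len(groups) > 1:              # b is properly split
                partition.remove(b)
                partition.extend(groups.values())
                changed = True
                break
    return {frozenset(b) for b in partition}

# a = read(); v = a + 2; c = a; w = c + 2; t = a + 1; x = t + 1
defs = {"a": ("read",), "c": ("copy", "a"), "v": ("+2", "a"),
        "w": ("+2", "c"), "t": ("+1", "a"), "x": ("+1", "t")}
parts = value_partition(defs)
```

On the earlier example this yields v and w together, while t and x end up alone, matching the final partition shown above.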

Conclusion

[Diagram: the Program and its Semantics feed Control Flow Analysis (depth-first structures, dominators, dominance frontiers, intervals); together with Profiling and Pragmas, these produce SSA Form and Sparse Evaluation Graphs, which drive Evaluation of Data Flow Problems and, finally, Program Transformation.]

An important open problem is the efficient integration of program transformation and program analysis.

Conclusion (cont'd)

1. The cost of optimization should be proportional to the benefits attributable to optimization.
   (a) Construction of an intermediate form should not dominate optimization time [29].
   (b) Where little benefit is expected, expensive analysis should be computed on demand rather than exhaustively [62, 22].
2. The expected asymptotic complexity of optimization applied throughout a program must be linear or almost-linear.
   (a) Expected and worst cases must be considered [27].
   (b) Profiling and control flow analysis can determine needy portions of a large program [61, 9].
   (c) Small procedures must compile quickly, even under extensive optimization.
3. Extensions and revisions of older languages are often targeted for improved program optimization [39, 38, 40].
4. New languages and language paradigms offer challenging optimization problems [19, 18].
5. New and useful program optimizations can be developed most effectively by in-the-trench source and object code evaluation.

References

[1] Alfred Aho, John Hopcroft, and Jeffrey Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[2] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[3] F. E. Allen. Control flow analysis. ACM SIGPLAN Notices, 19(6):13–24, 1970.
[4] Frances E. Allen. Interprocedural data flow analysis. Proc. IFIP Congress 74, pages 398–402, 1974.
[5] B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of values in programs. Conf. Rec. Fifteenth ACM Symp. on Principles of Programming Languages, pages 1–11, January 1988.
[6] Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
[7] M. Auslander and M. Hopkins. An overview of the PL.8 compiler. Proc. SIGPLAN'82 Symp. on Compiler Construction, pages 22–31, June 1982. Published as SIGPLAN Notices Vol. 17, No. 6.
[8] B. S. Baker. An algorithm for structuring flowgraphs. J. ACM, pages 98–120, January 1977.
[9] Thomas Ball and James R. Larus. Optimally profiling and tracing programs. Proceedings of the ACM Symposium on Principles of Programming Languages, 1992.
[10] R. A. Ballance, A. B. Maccabe, and K. J. Ottenstein. The Program Dependence Web: A representation supporting control-, data-, and demand-driven interpretation of languages. Proc. SIGPLAN'90 Symp. on Compiler Construction, pages 257–271, June 1990. Published as SIGPLAN Notices Vol. 25, No. 6.
[11] Jeffrey Barth. A practical interprocedural data flow analysis algorithm. Communications of the ACM, 21(9):724–736, September 1978.
[12] M. G. Burke and B. G. Ryder. A critical analysis of incremental iterative data flow analysis algorithms. IEEE Transactions on Software Engineering, 16(7):723–728, July 1990.
[13] Michael Burke. An interval-based approach to exhaustive and incremental interprocedural data-flow analysis. ACM Transactions on Programming Languages and Systems, 12(3):341–395, July 1990.
[14] Michael Burke and Linda Torczon. Interprocedural optimization: Eliminating unnecessary recompilation. ACM Trans. on Programming Languages and Systems, July 1993.
[15] D. Callahan, A. Carle, M. Hall, and K. Kennedy. Constructing the call multigraph. IEEE Transactions on Software Engineering, 16(4):483–487, 1990.
[16] David Callahan. The program summary graph and flow-sensitive interprocedural data flow analysis. Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, 23(7):47–56, July 1988. Atlanta, Georgia.
[17] M. D. Carroll. Data Flow Update via Dominator and Attribute Updates. PhD thesis, Rutgers University, May 1988.
[18] Craig Chambers. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. PhD thesis, Stanford University, 1992. Available as technical report STAN-CS-92-1420.
[19] Craig Chambers and David Ungar. Making pure object-oriented languages practical. SIGPLAN Notices, 26(10), 1991. OOPSLA Conference Proceedings.
[20] W.-M. Ching. Program analysis and code generation in an APL/370 compiler. IBM Journal of Research and Development, 30(6):594–602, 1986.
[21] Jong-Deok Choi, Ron Cytron, and Jeanne Ferrante. Automatic construction of sparse data flow evaluation graphs. Conference Record of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, January 1991.
[22] Jong-Deok Choi, Ron Cytron, and Jeanne Ferrante. On the efficient engineering of ambitious program analysis. Submitted to IEEE TSE, 1993.
[23] Keith Cooper. Interprocedural Data Flow Analysis in a Programming Environment. PhD thesis, Rice University, 1983.
[24] Keith D. Cooper and Ken Kennedy. Complexity of interprocedural side-effect analysis. Technical report, Department of Computer Science, Rice University, October 1987. TR87-61.
[25] Keith D. Cooper, Ken Kennedy, and Linda Torczon. The impact of interprocedural analysis and optimization in the Rn programming environment. ACM Transactions on Programming Languages and Systems, 8(4):491–523, October 1986.
[26] R. Cytron, A. Lowry, and F. K. Zadeck. Code motion of control structures in high-level languages. Conf. Rec. Thirteenth ACM Symp. on Principles of Programming Languages, pages 70–85, January 1986.
[27] Ron Cytron and Jeanne Ferrante. Efficiently computing φ-nodes on-the-fly. Proceedings of the Workshop on Languages and Compiling for Parallelism, 1993. Submitted.
[28] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, October 1991.
[29] Ron Cytron and Reid Gershbein. Efficiently accommodating may-alias information in SSA form. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 1993.
[30] Ron Cytron, Steve Karlovsky, and Kevin P. McAuliffe. Automatic management of programmable caches (extended abstract). Proceedings of the 1988 International Conference on Parallel Processing, August 1988. Also available as CSRD Rpt. No. 728 from U. of Ill.-Center for Supercomputing Research and Development.
[31] D. M. Dhamdhere. Practical adaptation of the global optimization algorithm of Morel and Renvoise. ACM Trans. on Programming Languages and Systems, 13(2):291–294, April 1991.
[32] D. M. Dhamdhere, B. K. Rosen, and F. K. Zadeck. How to analyse large programs efficiently and informatively. Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1992. Published as SIGPLAN Notices Vol. 27, No. 6.
[33] Dhananjay M. Dhamdhere and Uday P. Khedker. Complexity of bi-directional data flow analysis. Conf. Rec. Twentieth ACM Symp. on Principles of Programming Languages, 1993.
[34] R. K. Farrow, K. Kennedy, and L. Zucconi. Graph grammars and global program flow analysis. Proceedings of the 17th Annual IEEE Symposium on Foundations of Computer Science, 1975.
[35] J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319–349, July 1987.
[36] M. S. Hecht and J. D. Ullman. Characterizations of reducible flow graphs. J. ACM, 21(3):367–375, July 1974.
[37] M. S. Hecht and J. D. Ullman. A simple algorithm for global data flow analysis problems. SIAM J. Computing, 4(4):519–532, December 1975.
[38] S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, and C. Tseng. An overview of the Fortran D programming system. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, CA, August 1991. Springer-Verlag.
[39] S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66–80, August 1992.
[40] S. Hiranandani, K. Kennedy, and C. Tseng. Evaluation of compiler optimizations for Fortran D on MIMD distributed-memory machines. In Proceedings of the 1992 ACM International Conference on Supercomputing, Washington, DC, July 1992.
[41] L. H. Holley and B. K. Rosen. Qualified data flow problems. IEEE Trans. on Software Engineering, SE-7(1):60–78, January 1981.
[42] Urs Hölzle, Craig Chambers, and David Ungar. Debugging optimized code with dynamic deoptimization. SIGPLAN Notices, 27(7), 1992. SIGPLAN PLDI Conference Proceedings.
[43] Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):26–60, January 1990.
[44] J. B. Kam and J. D. Ullman. Monotone data flow analysis frameworks. Acta Informatica, 7:305–317, 1977.
[45] John B. Kam and Jeffrey D. Ullman. Global data flow analysis and iterative algorithms. J. ACM, 23(1):158–171, January 1976.
[46] Ken Kennedy. A survey of data flow analysis techniques. In Stephen S. Muchnick and Neil D. Jones, editors, Program Flow Analysis, chapter 1. Prentice-Hall, 1981.
[47] T. Lengauer and Robert Tarjan. A fast algorithm for finding dominators in a flowgraph. TOPLAS, July 1979.
[48] T. J. Marlowe. Data Flow Analysis and Incremental Iteration. PhD thesis, Dept. of Computer Science, Rutgers U., October 1989.
[49] Eugene W. Myers. A precise inter-procedural data flow algorithm. Conference Record Eighth ACM Symposium on Principles of Programming Languages, 1981.
[50] Karl J. Ottenstein and Linda M. Ottenstein. The program dependence graph in a software development environment. Software Engineering Notes, 9(3), 1984. Also appears in Proceedings of the ACM Symposium on Practical Software Development Environments, Pittsburgh, PA, April 1984, and in SIGPLAN Notices, Vol. 19, No. 5, May 1984.
[51] B. K. Rosen. High level data flow analysis. Comm. ACM, 20(10):712–724, October 1977.
[52] B. K. Rosen. Data flow analysis for procedural languages. J. ACM, 26(2):322–344, April 1979.
[53] B. K. Rosen. Monoids for rapid data flow analysis. SIAM J. Computing, 9(1):159–196, February 1980.
[54] B. K. Rosen. Degrees of availability as an introduction to the general theory of data flow analysis. In S. S. Muchnick and N. D. Jones, editors, Program Flow Analysis, chapter 2, pages 55–76. Prentice-Hall, 1981.
[55] B. K. Rosen. A lubricant for data flow analysis. SIAM J. Computing, 11(3):493–511, August 1982.
[56] B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redundant computations. Conf. Rec. Fifteenth ACM Symp. on Principles of Programming Languages, pages 12–27, January 1988.
[57] Barbara Ryder. Constructing the call graph of a program. IEEE Transactions on Software Engineering, May 1979.
[58] Barbara G. Ryder and Marvin C. Paull. Elimination algorithms for data flow analysis. ACM Computing Surveys, 18(3):277–316, September 1986.
[59] Barbara G. Ryder and Marvin C. Paull. Incremental data flow analysis algorithms. ACM Transactions on Programming Languages and Systems, 10(1):1–50, January 1988.
[60] B. G. Ryder, T. G. Marlowe, and M. C. Paull. Conditions for incremental iteration: Examples and counterexamples. Science of Computer Programming, 11(1):1–15, October 1988.
[61] Vivek Sarkar. Determining average program execution times and their variance. Proceedings of the 1989 SIGPLAN Conference on Programming Language Design and Implementation, 24(7):298–312, July 1989.
[62] Vivek Sarkar. The PTRAN parallel programming system. Parallel Languages and Compilers, pages 309–391, 1991.
[63] J. T. Schwartz and M. Sharir. Tarjan's fast interval finding algorithm. Technical report, Courant Institute, New York University, 1978. SETL Newsletter Number 204.
[64] M. Sharir. Structural analysis: a new approach to flow analysis in optimizing compilers. Computer Languages, 5:141–153, 1980.
[65] Olin Shivers. Control flow analysis in Scheme. Proc. SIGPLAN'88 Symp. on Compiler Construction, 1988. Published as SIGPLAN Notices Vol. 23, No. 7.
[66] Robert Tarjan. Testing flow graph reducibility. Journal of Computer and System Sciences, 9(3):355–365, December 1974.
[67] G. A. Venkatesh. A framework for construction and evaluation of high-level specifications for program analysis techniques. Proc. SIGPLAN'89 Symp. on Compiler Construction, pages 1–12, 1989. Published as SIGPLAN Notices Vol. 24, No. 7.
[68] Vincent A. Guarna, Jr., Dennis Gannon, David Jablonowski, Allen D. Malony, and Yogesh Gaur. Faust: An integrated environment for the development of parallel programs. Technical report, U. of Ill.-Center for Supercomputing Research and Development, November 1988. CSRD Rpt. No. 825; to appear in IEEE Software, 1989.
[69] M. N. Wegman and F. K. Zadeck. Constant propagation with conditional branches. ACM Trans. on Programming Languages and Systems, 13(2):181–210, April 1991.
[70] F. K. Zadeck. Incremental data flow analysis in a structured program editor. Proc. SIGPLAN'84 Symp. on Compiler Construction, pages 132–143, June 1984. Published as SIGPLAN Notices Vol. 19, No. 6.
[71] P. T. Zellweger. Interactive Source-Level Debugging of Optimized Programs. PhD thesis, Xerox Palo Alto Research Center, Palo Alto, CA 94304, May 1984.