Analysis and Optimizations

P3 / 2006

• Program Analysis
  – Discovers properties of a program
• Optimizations Using Program Analysis
  – Use analysis results to transform program for optimization
  – Goal: improve some aspect of program
    • number of executed instructions, number of cycles
    • cache hit rate
    • memory space (code or data)
    • power consumption
  – Has to be safe: keep the semantics of the program

Kostis Sagonas 2 Spring 2006

Control Flow Graph

int add(n, k) {
  s = 0; a = 4; i = 0;
  if (k == 0) b = 1;
  else b = 2;
  while (i < n) {
    s = s + a*b;
    i = i + 1;
  }
  return s;
}

Control flow graph of add (basic blocks in brackets):
  entry → [s = 0; a = 4; i = 0; k == 0]
  [s = 0; a = 4; i = 0; k == 0] → [b = 1;] and [b = 2;]
  [b = 1;] → [i < n]
  [b = 2;] → [i < n]
  [i < n] → [s = s + a*b; i = i + 1;] and [return s]
  [s = s + a*b; i = i + 1;] → [i < n]

• Nodes represent computation
  – Each node is a Basic Block
  – Basic Block is a sequence of instructions with
    • No branches out of middle of basic block
    • No branches into middle of basic block
    • Basic blocks should be maximal
  – Execution of basic block starts with first instruction
  – Includes all instructions in basic block
• Edges represent control flow


Two Kinds of Variables

• Temporaries introduced by the compiler
  – Transfer values only within basic block
  – Introduced as part of instruction flattening
  – Introduced by optimizations/transformations
• Program variables
  – Declared in original program
  – May transfer values between basic blocks

Basic Block Optimizations

Each pair shows the code before and after the transformation.

• Common Sub-Expression Elimination
  – a = (x+y)+z; b = x+y;
  – t = x+y; a = t+z; b = t;
• Constant Propagation
  – x = 5; b = x+y;
  – b = 5+y;
• Algebraic Simplification
  – a = x * 1;
  – a = x;
• Copy Propagation
  – a = x+y; b = a; c = b+z;
  – a = x+y; b = a; c = a+z;
• Dead Code Elimination
  – a = x+y; b = a; c = a+z;
  – a = x+y; c = a+z
• Strength Reduction
  – t = i * 4;
  – t = i << 2;

Value Numbering for CSE

• Normalize basic block so that all statements are of the form
  – var = var op var (where op is a binary operator)
  – var = op var (where op is a unary operator)
  – var = var
• Simulate execution of basic block
  – Assign a virtual value to each variable
  – Assign a virtual value to each expression
  – Assign a temporary variable to hold value of each computed expression
• As we simulate execution of program
• Generate a new version of program
  – Each new value assigned to temporary
    • a = x+y; becomes a = x+y; t = a;
  – Temporary preserves value for use later in program even if original variable rewritten
    • a = x+y; a = a+z; b = x+y becomes
    • a = x+y; t = a; a = a+z; b = t;


CSE Example

Original Basic Block      New Basic Block
a = x+y                   a = x+y
                          t1 = a
b = a+z                   b = a+z
                          t2 = b
b = b+y                   b = b+y
                          t3 = b
c = a+z                   c = t2

Var to Val          Exp to Val        Exp to Tmp
x → v1              v1+v2 → v3        v1+v2 → t1
y → v2              v3+v4 → v5        v3+v4 → t2
a → v3              v5+v2 → v6        v5+v2 → t3
z → v4
b → v5, then v6
c → v5

CSE Example

• Original            • After CSE
  a = x+y               a = x+y
  b = a+z               b = a+z
  b = b+y               t = b
  c = a+z               b = b+y
                        c = t
• Issues
  – Temporaries store values for use later
  – CSE with different names
    • a = x; b = x+y; c = a+y;
  – Excessive Temp Generation and Use
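The value-numbering scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's implementation: statements are encoded as hypothetical `(dest, op, left, right)` tuples (a copy uses `op = None`), temporaries are named `t1`, `t2`, ... and are assumed not to clash with program variables, and commutativity of operators is ignored.

```python
from itertools import count

def cse_value_numbering(block):
    """Local CSE by value numbering over one basic block.

    block: list of (dest, op, left, right); a copy is (dest, None, src, None).
    Returns the rewritten statement list (same tuple encoding).
    """
    var_to_val = {}   # variable -> value number ("Var to Val")
    exp_to_val = {}   # (op, val, val) -> value number ("Exp to Val")
    exp_to_tmp = {}   # (op, val, val) -> temporary name ("Exp to Tmp")
    fresh_val = count(1)
    fresh_tmp = count(1)

    def value_of(var):
        # A variable read before any assignment gets a fresh value number.
        if var not in var_to_val:
            var_to_val[var] = next(fresh_val)
        return var_to_val[var]

    out = []
    for dest, op, left, right in block:
        if op is None:                       # copy: dest shares the value of src
            var_to_val[dest] = value_of(left)
            out.append((dest, None, left, None))
            continue
        key = (op, value_of(left), value_of(right))
        if key in exp_to_val:                # value already computed: reuse its temp
            var_to_val[dest] = exp_to_val[key]
            out.append((dest, None, exp_to_tmp[key], None))
        else:                                # new value: compute it and save it in a temp
            tmp = f"t{next(fresh_tmp)}"
            exp_to_val[key] = next(fresh_val)
            exp_to_tmp[key] = tmp
            var_to_val[dest] = exp_to_val[key]
            out.append((dest, op, left, right))
            out.append((tmp, None, dest, None))
    return out
```

Applied to the example block (a = x+y; b = a+z; b = b+y; c = a+z), the sketch assigns the same value numbers as the tables above and rewrites the last statement to c = t2. Because copies share value numbers, it also catches the "CSE with different names" case a = x; b = x+y; c = a+y.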


Problems

• Algorithm has a temporary for each new value
  – a = x+y; t1 = a
• Introduces
  – lots of temporaries
  – lots of copy statements to temporaries
• In many cases, temporaries and copy statements are unnecessary
• So we eliminate them with copy propagation and dead code elimination

Copy Propagation

• Once again, simulate execution of program
• If possible, use the original variable instead of a temporary
  – a = x+y; b = x+y;
  – After CSE becomes a = x+y; t = a; b = t;
  – After CP becomes a = x+y; b = a;
• Key idea: determine when original variables are NOT overwritten between computation of stored value and use of stored value

Copy Propagation Maps

• Maintain two maps
  – tmp to var: tells which variable to use instead of a given temporary variable
  – var to set (inverse of tmp to var): tells which temps are mapped to a given variable by tmp to var

Copy Propagation Example

Original      After CSE      After CSE and Copy Propagation
a = x+y       a = x+y        a = x+y
              t1 = a         t1 = a
b = a+z       b = a+z        b = a+z
              t2 = b         t2 = b
c = x+y       c = t1         c = a
a = b         a = b          a = b


Copy Propagation Example

The block is processed one statement at a time. Each row shows the statement after CSE, the statement emitted after copy propagation, and the two maps once that statement has been processed.

After CSE    After CSE and Copy Prop    tmp to var          var to set
a = x+y      a = x+y                    (empty)             (empty)
t1 = a       t1 = a                     t1 → a              a → {t1}
b = a+z      b = a+z                    t1 → a              a → {t1}
t2 = b       t2 = b                     t1 → a, t2 → b      a → {t1}, b → {t2}
c = t1       c = a                      t1 → a, t2 → b      a → {t1}, b → {t2}
a = b        a = b                      t1 → t1, t2 → b     a → {}, b → {t2}

• c = t1 becomes c = a because tmp to var maps t1 to a
• a = b overwrites a, so every temp in var to set[a] is mapped back to itself and the set is emptied
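The two-map bookkeeping in this example can be sketched as follows. This is a minimal illustration under stated assumptions: statements use a hypothetical `(dest, op, left, right)` tuple encoding (a copy has `op = None`), the caller passes the set of compiler temporaries explicitly, and each temporary is assigned only once, as in the output of the CSE step.

```python
def copy_propagate(block, temps):
    """Forward copy propagation over one basic block.

    block: list of (dest, op, left, right); a copy is (dest, None, src, None).
    temps: set of names that are compiler temporaries.
    Returns (new_block, tmp_to_var, var_to_set).
    """
    tmp_to_var = {}   # temp -> variable to use instead of the temp
    var_to_set = {}   # variable -> temps currently mapped to it
    out = []
    for dest, op, left, right in block:
        # Use the original variable instead of a temporary where the map allows it.
        left = tmp_to_var.get(left, left)
        right = tmp_to_var.get(right, right)
        out.append((dest, op, left, right))
        if dest in temps:
            # t = v: from now on v can stand in for t (until v is overwritten).
            if op is None and left not in temps:
                tmp_to_var[dest] = left
                var_to_set.setdefault(left, set()).add(dest)
        else:
            # dest is overwritten: temps that stood for its old value map to themselves.
            for t in var_to_set.get(dest, ()):
                tmp_to_var[t] = t
            var_to_set[dest] = set()
    return out, tmp_to_var, var_to_set
```

On the block after CSE shown above, the sketch rewrites c = t1 to c = a and ends with tmp to var = {t1 → t1, t2 → b}, matching the last step of the example.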


Dead Code Elimination

• Copy propagation keeps all temps around
• There may be temps that are never read
• Dead Code Elimination (DCE) removes them

Basic Block After         Basic Block After
CSE + Copy Prop           CSE + Copy Prop + DCE
a = x+y                   a = x+y
t1 = a
b = a+z                   b = a+z
t2 = b
c = a                     c = a
a = b                     a = b

Dead Code Elimination

• Basic Idea
  – Process code in reverse execution order
  – Maintain a set of variables that are needed later in computation
  – On encountering an assignment to a temporary that is not needed, we remove the assignment

Dead Code Elimination Example

Basic block after CSE and Copy Prop:

a = x+y
t1 = a
b = a+z
t2 = b
c = a
a = b

Assume that initially the Needed Set is {a, c}. The statements are processed in reverse execution order:

Statement    Action                       Needed Set
a = b        keep                         {a, b, c}
c = a        keep                         {a, b, c}
t2 = b       remove (t2 is not needed)    {a, b, c}
b = a+z      keep                         {a, b, c, z}
t1 = a       remove (t1 is not needed)    {a, b, c, z}
a = x+y      keep                         {a, b, c, z}
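The reverse pass can be sketched as below. It is a simplified illustration, assuming the same hypothetical `(dest, op, left, right)` statement encoding as a plain Python tuple and an explicit set of temporaries. Like the needed set on the slides, the set here only grows: operands of kept statements are added and nothing is removed, which is safe but less precise than full liveness analysis.

```python
def eliminate_dead_temps(block, temps, needed_on_exit):
    """Backward dead code elimination for temporaries in one basic block.

    block: list of (dest, op, left, right); unused operands are None.
    temps: names of compiler temporaries (only these may be removed).
    needed_on_exit: variables assumed to be needed after the block.
    """
    needed = set(needed_on_exit)
    kept = []
    for dest, op, left, right in reversed(block):
        if dest in temps and dest not in needed:
            continue                       # temporary is never read later: drop it
        kept.append((dest, op, left, right))
        # Operands of a kept statement are needed by everything before it.
        needed.update(v for v in (left, right) if v is not None)
    kept.reverse()                         # restore execution order
    return kept
```

With the block after CSE and copy propagation and an initial needed set of {a, c}, the sketch drops t1 = a and t2 = b and keeps the other four statements.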

Basic Block after CSE + Copy Propagation + Dead Code Elimination

a = x+y
b = a+z
c = a
a = b

Needed Set: {a, b, c, z}

Interesting Properties

• Analysis and Optimization Algorithms Simulate Execution of Program
  – CSE and Copy Propagation go forward
  – Dead Code Elimination goes backwards
• Optimizations are stacked
  – Group of basic transformations
  – Work together to get good result
  – Often, one transformation creates inefficient code that is cleaned up by subsequent transformations


Other Basic Block Transformations

• Constant Propagation
• Strength Reduction
  – a << 2 = a * 4;
  – a + a + a = 3 * a;
• Algebraic Simplification
  – a = a * 1;
  – b = b + 0;

Dataflow Analysis

• Used to determine properties of programs that involve multiple basic blocks
• Typically used to enable transformations
  – common sub-expression elimination
  – constant and copy propagation
  – dead code elimination
• Analysis and transformation often come in pairs
• Unified transformation framework


Reaching Definitions

• Concept of definition and use
  – z = x+y
  – is a definition of z
  – is a use of x and y
• A definition reaches a use if
  – value written by definition
  – may be read by use

Figure: CFG of add. [s = 0; a = 4; i = 0; k == 0] → [b = 1;] or [b = 2;] → [i < n] → [s = s + a*b; i = i + 1;] (back to [i < n]) or [return s]

Reaching Definitions and Constant Propagation

• Is a use of a variable a constant?
  – Check all reaching definitions
  – If all assign variable to same constant
  – Then use is in fact a constant
• Can replace variable with constant

Is a constant in s = s+a*b?

• Yes!
• On all reaching definitions a = 4

Figure: CFG of add, with the only definition of a (a = 4 in the first block) reaching the use in s = s + a*b.


Constant Propagation Transform

• Yes! On all reaching definitions a = 4
• The loop body s = s + a*b; i = i + 1; becomes s = s + 4*b; i = i + 1;

Is b constant in s = s+a*b?

• No!
  – One reaching definition with b = 1
  – One reaching definition with b = 2
• The statement s = s + a*b is left unchanged with respect to b
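The constant test described on these slides can be sketched as a small function. This is an illustration under assumptions: each definition is recorded in a hypothetical table as `(variable, constant)` with `None` when the right-hand side is not a literal, and the set of definitions reaching the use is taken as given. The definition numbers 1 to 7 follow the numbering used for add on the later slides.

```python
def constant_at_use(var, reaching, definitions):
    """Return the constant `var` holds at a use, or None if it is not known to be constant.

    reaching: set of definition numbers that reach the use.
    definitions: dict number -> (variable, constant or None).
    """
    values = set()
    for d in reaching:
        target, value = definitions[d]
        if target != var:
            continue
        if value is None:          # some reaching definition is not a literal constant
            return None
        values.add(value)
    # Constant only if every reaching definition assigns the same value.
    return values.pop() if len(values) == 1 else None

# Definitions of add(): 1: s = 0, 2: a = 4, 3: i = 0, 4: b = 1, 5: b = 2,
# 6: s = s + a*b, 7: i = i + 1 (the last two are not constants).
definitions = {1: ("s", 0), 2: ("a", 4), 3: ("i", 0), 4: ("b", 1),
               5: ("b", 2), 6: ("s", None), 7: ("i", None)}
reaching_loop_body = {1, 2, 3, 4, 5, 6, 7}   # all definitions reach s = s + a*b
```

Here `constant_at_use("a", reaching_loop_body, definitions)` yields 4, while the same query for b yields `None` because definitions 4 and 5 disagree.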


Computing Reaching Definitions

• Compute with sets of definitions
  – represent sets using bit vectors
  – each definition has a position in bit vector
• At each basic block, compute
  – definitions that reach start of block
  – definitions that reach end of block
• Do computation by simulating execution of program until the fixed point is reached

Figure: CFG of add with numbered definitions and reaching-definition bit vectors (bit k stands for definition k).
  0000000  at entry
  [1: s = 0; 2: a = 4; 3: i = 0; k == 0]
  1110000  on both outgoing edges
  [4: b = 1;]    [5: b = 2;]
  [i < n]        1111100 on the first pass, 1111111 at the fixed point (at the start of the block and on both outgoing edges)
  [6: s = s + a*b; 7: i = i + 1;]    [return s]

Figure: GEN and KILL sets for the blocks of add.

Block                                        GEN        KILL
0: [1: s = 0; 2: a = 4; 3: i = 0; k == 0]    1110000    0000011
1: [4: b = 1;]                               0001000    0000100
2: [5: b = 2;]                               0000100    0001000
3: [i < n]                                   0000000    0000000
4: [6: s = s + a*b; 7: i = i + 1;]           0000011    1010000
5: [return s]                                0000000    0000000

Formalizing Analysis

• Each basic block has
  – IN - set of definitions that reach beginning of block
  – OUT - set of definitions that reach end of block
  – GEN - set of definitions generated in block
  – KILL - set of definitions killed in the block
• GEN[s = s + a*b; i = i + 1;] = 0000011
• KILL[s = s + a*b; i = i + 1;] = 1010000
• Compiler scans each basic block to derive GEN and KILL sets
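The scan that derives GEN and KILL can be sketched like this. It is a minimal illustration, assuming a hypothetical encoding in which every definition in the function has a number and a target variable, and a block is given as the list of its `(number, variable)` definitions in execution order. Sets of definition numbers are used instead of machine bit vectors; the helper `bits` prints them in the notation of the slides.

```python
def gen_kill(block_defs, all_defs):
    """GEN and KILL sets of one basic block for reaching definitions.

    block_defs: list of (definition number, variable) in execution order.
    all_defs: dict definition number -> variable, for the whole function.
    """
    gen, kill = set(), set()
    for number, var in block_defs:
        same_var = {d for d, v in all_defs.items() if v == var}
        kill |= same_var - {number}   # every other definition of var is killed
        gen -= same_var               # including an earlier one in this block
        gen.add(number)               # this definition reaches the end of the block
    return gen, kill

def bits(defs, width):
    """Render a set of definition numbers as a bit string (bit 1 is leftmost)."""
    return "".join("1" if i in defs else "0" for i in range(1, width + 1))

# Definitions of add(): number -> variable defined.
all_defs = {1: "s", 2: "a", 3: "i", 4: "b", 5: "b", 6: "s", 7: "i"}
loop_gen, loop_kill = gen_kill([(6, "s"), (7, "i")], all_defs)
```

For the loop body this gives GEN = 0000011 and KILL = 1010000, the values quoted above.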


Dataflow Equations

• IN[b] = OUT[b1] ∪ ... ∪ OUT[bn]
  – where b1, ..., bn are predecessors of b in CFG
• OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b]
• IN[entry] = 0000000
• Result: system of equations

Solving Equations

• Use fixed point algorithm
• Initialize with solution of OUT[b] = 0000000
• Repeatedly apply equations
  – IN[b] = OUT[b1] ∪ ... ∪ OUT[bn]
  – OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b]
• Until reaching fixed point
  – I.e., until equation application has no further effect
• Use a worklist to track which equation applications may have a further effect


Reaching Definitions Algorithm

for all nodes n in N
  OUT[n] = ∅;                    // OUT[n] = GEN[n];
Worklist = N;                    // N = all nodes in graph
while (Worklist != ∅)
  choose a node n in Worklist;
  Worklist = Worklist - { n };
  IN[n] = ∅;
  for all nodes p in predecessors(n)
    IN[n] = IN[n] ∪ OUT[p];
  OUT[n] = (IN[n] - KILL[n]) ∪ GEN[n];
  if (OUT[n] changed)
    for all nodes s in successors(n)
      Worklist = Worklist ∪ { s };

Questions

• Does the algorithm halt?
  – yes, because transfer function is monotonic
  – if increase IN, increase OUT
  – in limit, all bits are 1
• If bit is 1, is there always an execution in which corresponding definition reaches basic block?
• If bit is 0, does the corresponding definition ever reach basic block?
• Concept of conservative analysis
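The worklist algorithm translates almost line by line into Python. The sketch below is one possible rendering, not the course's code: it uses Python sets of definition numbers instead of bit vectors, a FIFO queue as the (otherwise unspecified) "choose a node" policy, and a hypothetical dictionary encoding of the CFG of add with block numbers 0 to 5 and the GEN/KILL values given earlier.

```python
from collections import deque

def reaching_definitions(nodes, preds, succs, gen, kill):
    """Solve IN[n] = union of OUT[p], OUT[n] = (IN[n] - KILL[n]) | GEN[n]."""
    in_ = {n: set() for n in nodes}
    out = {n: set() for n in nodes}           # OUT[n] starts empty
    worklist = deque(nodes)
    queued = set(nodes)                       # avoids duplicate worklist entries
    while worklist:
        n = worklist.popleft()
        queued.discard(n)
        in_[n] = set().union(*(out[p] for p in preds[n]))
        new_out = (in_[n] - kill[n]) | gen[n]
        if new_out != out[n]:                 # OUT[n] changed: revisit successors
            out[n] = new_out
            for s in succs[n]:
                if s not in queued:
                    worklist.append(s)
                    queued.add(s)
    return in_, out

# CFG of add(): 0 = first block, 1 = [b = 1], 2 = [b = 2], 3 = [i < n],
# 4 = loop body, 5 = [return s].
succs = {0: [1, 2], 1: [3], 2: [3], 3: [4, 5], 4: [3], 5: []}
preds = {n: [p for p in succs if n in succs[p]] for n in succs}
gen = {0: {1, 2, 3}, 1: {4}, 2: {5}, 3: set(), 4: {6, 7}, 5: set()}
kill = {0: {6, 7}, 1: {5}, 2: {4}, 3: set(), 4: {1, 3}, 5: set()}
in_sets, out_sets = reaching_definitions(list(succs), preds, succs, gen, kill)
```

At the fixed point all seven definitions reach the start of [i < n], which is why a (one definition, a = 4) is a constant in the loop body while b (definitions 4 and 5) is not.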

Available Expressions

• An expression x+y is available at a point p if
  – every path from the initial node to p evaluates x+y before reaching p,
  – and there are no assignments to x or y after the evaluation but before p.
• Available Expression information can be used to do global (across basic blocks) CSE.
• If an expression is available at use, there is no need to re-evaluate it.

Computing Available Expressions

• Represent sets of expressions using bit vectors
• Each expression corresponds to a bit
• Run dataflow algorithm similar to reaching definitions
• Big difference:
  – A definition reaches a basic block if it comes from ANY predecessor in CFG
  – An expression is available at a basic block only if it is available from ALL predecessors in CFG


Available Expressions Example

Expressions (bit order): 1: x+y   2: i < n   3: i+c   4: x == 0

Figure: CFG annotated with available-expression bit vectors.
  0000  at entry
  [a = x+y; x == 0]       OUT = 1001 (on both outgoing edges)
  [x = z; b = x+y;]       OUT = 1000
  [i = x+y;]              IN = 1000, OUT = 1000
  [i < n]                 OUT = 1100 (on both outgoing edges)
  [c = x+y; i = i+c;]     IN = 1100
  [d = x+y]               IN = 1100

Global CSE Transform

Figure: the same CFG after the transform (bit vectors unchanged).
  [a = x+y; t = a; x == 0]
  [x = z; b = x+y; t = b;]
  [i = t;]
  [i < n]
  [c = t; i = i+c;]
  [d = t;]

• must use same temp for CSE in all blocks


Formalizing Analysis

• Each basic block has
  – IN - set of expressions available at start of block
  – OUT - set of expressions available at end of block
  – GEN - set of expressions computed in block
  – KILL - set of expressions killed in the block
• GEN[x = z; b = x+y] = 1000
• KILL[x = z; b = x+y] = 1001
• Compiler scans each basic block to derive GEN and KILL sets

Dataflow Equations

• IN[b] = OUT[b1] ∩ ... ∩ OUT[bn]
  – where b1, ..., bn are predecessors of b in CFG
• OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b]
• IN[entry] = 0000
• Result: system of equations

Solving Equations

• Use fixed point algorithm
• IN[entry] = 0000
• Initialize OUT[b] = 1111
• Repeatedly apply equations
  – IN[b] = OUT[b1] ∩ ... ∩ OUT[bn]
  – OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b]
• Use a worklist algorithm to track which equation applications may have further effect

Available Expressions Algorithm

for all nodes n in N
  OUT[n] = E;                    // OUT[n] = E - KILL[n];
IN[Entry] = ∅;
OUT[Entry] = GEN[Entry];
Worklist = N - { Entry };        // N = all nodes in graph
while (Worklist != ∅)
  choose a node n in Worklist;
  Worklist = Worklist - { n };
  IN[n] = E;                     // E is set of all expressions
  for all nodes p in predecessors(n)
    IN[n] = IN[n] ∩ OUT[p];
  OUT[n] = (IN[n] - KILL[n]) ∪ GEN[n];
  if (OUT[n] changed)
    for all nodes s in successors(n)
      Worklist = Worklist ∪ { s };
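A Python sketch of the available-expressions solver follows. It is an illustration under assumptions: sets of expression numbers replace bit vectors, the worklist is a FIFO queue, and the example CFG (block names A to F, its edges, and the GEN/KILL sets) is a hypothetical reconstruction of the example above, with expressions numbered 1: x+y, 2: i < n, 3: i+c, 4: x == 0.

```python
from collections import deque

def available_expressions(nodes, entry, preds, succs, gen, kill, all_exprs):
    """Solve IN[n] = intersection of OUT[p], OUT[n] = (IN[n] - KILL[n]) | GEN[n]."""
    in_ = {n: set(all_exprs) for n in nodes}
    out = {n: set(all_exprs) for n in nodes}  # optimistic start: everything available
    in_[entry] = set()                        # nothing is available at entry
    out[entry] = set(gen[entry])
    worklist = deque(n for n in nodes if n != entry)
    queued = set(worklist)
    while worklist:
        n = worklist.popleft()
        queued.discard(n)
        in_[n] = set(all_exprs)
        for p in preds[n]:
            in_[n] &= out[p]                  # available only if available from ALL preds
        new_out = (in_[n] - kill[n]) | gen[n]
        if new_out != out[n]:
            out[n] = new_out
            for s in succs[n]:
                if s != entry and s not in queued:
                    worklist.append(s)
                    queued.add(s)
    return in_, out

# Assumed CFG: A = [a = x+y; x == 0], B = [x = z; b = x+y], C = [i = x+y],
# D = [i < n], E = [c = x+y; i = i+c], F = [d = x+y].
succs = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": ["E", "F"], "E": ["D"], "F": []}
preds = {n: [p for p in succs if n in succs[p]] for n in succs}
gen = {"A": {1, 4}, "B": {1}, "C": {1}, "D": {2}, "E": {1}, "F": {1}}
kill = {"A": set(), "B": {1, 4}, "C": {2, 3}, "D": set(), "E": {2, 3}, "F": set()}
avail_in, avail_out = available_expressions(list(succs), "A", preds, succs, gen, kill, {1, 2, 3, 4})
```

In the result x+y is available at the start of C, E and F, so the three recomputations i = x+y, c = x+y and d = x+y can all read the shared temporary instead.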


Questions

• Does algorithm always halt?
• If expression is available in some execution, is it always marked as available in analysis?
• If expression is not available in some execution, can it be marked as available in analysis?
• In what sense is the algorithm conservative?

Duality In Two Algorithms

• Reaching definitions
  – Confluence operation is set union
  – OUT[b] initialized to empty set
• Available expressions
  – Confluence operation is set intersection
  – OUT[b] initialized to set of all expressions
• General framework for dataflow algorithms.
• Build parameterized dataflow analyzer once, use for all dataflow problems


Liveness Analysis

• A variable v is live at point p if
  – v is used along some path starting at p, and
  – no definition of v along the path before the use.
• When is a variable v dead at point p?
  – No use of v on any path from p to exit node, or
  – All paths from p redefine v before using v.

What Use is Liveness Information?

• Register allocation.
  – If a variable is dead, we can reassign its register
• Dead code elimination.
  – Eliminate assignments to variables not read later.
  – But must not eliminate last assignment to variable (such as instance variable) visible outside CFG.
  – Can eliminate other dead assignments.
  – Handle by making all externally visible variables live on exit from CFG

Conceptual Idea of Analysis

• Simulate execution
• But start from exit and go backwards in CFG
• Compute liveness information from end to beginning of basic blocks

Liveness Example

• Assume a,b,c visible outside function
• So are live on exit
• Assume x,y,z,t are not visible
• Represent liveness using a bit vector
  – order is abcxyzt

Figure: CFG annotated with liveness bit vectors.
  [a = x+y; t = a; c = a+x; x == 0]
  1100111  live at end of the first block
  1000111  live at start of [b = t+z;]
  [b = t+z;]
  1100100  live at start of [c = y+1;]
  [c = y+1;]
  1110000  live on exit


Using Liveness Information for Dead Code Elimination

• Assume a,b,c visible outside function
• So are live on exit
• Assume x,y,z,t are not visible
• Represent liveness using a bit vector
  – order is abcxyzt

Figure: same CFG and bit vectors as in the liveness example.
  [a = x+y; t = a; c = a+x; x == 0]
  1100111  live at end of the first block
  1000111  live at start of [b = t+z;]
  [b = t+z;]
  1100100  live at start of [c = y+1;]
  [c = y+1;]
  1110000  live on exit

• c is not live at the end of the first block (its bit is 0 in 1100111) and is not read after c = a+x inside the block, so the assignment c = a+x is dead and can be eliminated

Formalizing Analysis

• Each basic block has
  – IN - set of variables live at start of block
  – OUT - set of variables live at end of block
  – USE - set of variables with upwards exposed uses in block
  – DEF - set of variables defined in block
• USE[x = z; x = x+1;] = { z } (x not in USE)
• DEF[x = z; x = x+1; y = 1;] = {x, y}
• Compiler scans each basic block to derive USE and DEF sets
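The scan for USE and DEF can be sketched as a single forward pass over the block. The encoding is an assumption made for illustration: each statement is a hypothetical `(dest, operands)` pair, with `dest = None` for statements such as branch conditions that define nothing.

```python
def use_def(block):
    """USE (upward exposed uses) and DEF sets of one basic block.

    block: list of (dest, operands) in execution order; dest may be None.
    """
    use, defs = set(), set()
    for dest, operands in block:
        # A read is upward exposed only if no earlier statement in the block defined it.
        use |= {v for v in operands if v not in defs}
        if dest is not None:
            defs.add(dest)
    return use, defs

# x = z; x = x+1; y = 1;
example_use, example_def = use_def([("x", ["z"]), ("x", ["x"]), ("y", [])])
```

For this block the sketch gives USE = {z} (the read of x in x = x+1 is preceded by a definition of x) and DEF = {x, y}, as on the slide.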

Algorithm

OUT[Exit] = ∅;
IN[Exit] = USE[Exit];
for all nodes n in N - { Exit }
  IN[n] = ∅;
Worklist = N - { Exit };
while (Worklist != ∅)
  choose a node n in Worklist;
  Worklist = Worklist - { n };
  OUT[n] = ∅;
  for all nodes s in successors(n)
    OUT[n] = OUT[n] ∪ IN[s];
  IN[n] = USE[n] ∪ (OUT[n] - DEF[n]);
  if (IN[n] changed)
    for all nodes p in predecessors(n)
      Worklist = Worklist ∪ { p };

Similar to Other Dataflow Algorithms

• Backwards analysis, not forwards
• Still have transfer functions
• Still have confluence operators
• Can generalize framework to work for both forwards and backwards analyses
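The backward worklist algorithm can be sketched in Python as follows. This is an illustration under assumptions: sets of variable names replace bit vectors, the worklist is a FIFO queue, and the example CFG (block names A, B, C, a synthetic Exit node whose USE set holds the externally visible variables a, b, c, and the edge from A directly to C) is a hypothetical encoding of the liveness example.

```python
from collections import deque

def liveness(nodes, exit_node, preds, succs, use, defs):
    """Solve OUT[n] = union of IN[s], IN[n] = USE[n] | (OUT[n] - DEF[n])."""
    in_ = {n: set() for n in nodes}
    out = {n: set() for n in nodes}
    in_[exit_node] = set(use[exit_node])      # variables live on exit
    worklist = deque(n for n in nodes if n != exit_node)
    queued = set(worklist)
    while worklist:
        n = worklist.popleft()
        queued.discard(n)
        out[n] = set().union(*(in_[s] for s in succs[n]))
        new_in = use[n] | (out[n] - defs[n])
        if new_in != in_[n]:                  # IN[n] changed: revisit predecessors
            in_[n] = new_in
            for p in preds[n]:
                if p != exit_node and p not in queued:
                    worklist.append(p)
                    queued.add(p)
    return in_, out

def bits(live, order="abcxyzt"):
    """Render a set of variables as a bit string in the slide's order."""
    return "".join("1" if v in live else "0" for v in order)

# A = [a = x+y; t = a; c = a+x; x == 0], B = [b = t+z], C = [c = y+1].
succs = {"A": ["B", "C"], "B": ["C"], "C": ["Exit"], "Exit": []}
preds = {n: [p for p in succs if n in succs[p]] for n in succs}
use = {"A": {"x", "y"}, "B": {"t", "z"}, "C": {"y"}, "Exit": {"a", "b", "c"}}
defs = {"A": {"a", "t", "c"}, "B": {"b"}, "C": {"c"}, "Exit": set()}
live_in, live_out = liveness(list(succs), "Exit", preds, succs, use, defs)
```

The computed sets reproduce the bit vectors of the example: 1100111 at the end of A, 1000111 at the start of B and 1100100 at the start of C. Since c is absent from OUT[A], the assignment c = a+x in A is dead.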

Analysis Information Inside Basic Blocks

• One detail:
  – Given dataflow information at IN and OUT of node
  – Also need to compute information at each statement of basic block
  – Simple propagation algorithm usually works fine
  – Can be viewed as restricted case of dataflow analysis

Summary

• Basic blocks and basic block optimizations
  – Copy and constant propagation
  – Common sub-expression elimination
  – Dead code elimination
• Dataflow Analysis
  – Control flow graph
  – IN[b], OUT[b], transfer functions, join points
• Pairs of analyses and transformations
  – Reaching definitions/constant propagation
  – Available expressions/common sub-expression elimination
  – Liveness analysis/Dead code elimination
