Analysis and Optimizations

• Program Analysis – Discovers properties of a program Using Program Analysis • Optimizations – Use analysis results to transform program for Optimization – Goal: improve some aspect of program • numbfber of execute diid instructions, numb bfler of cycles • cache hit rate • memory space (d(code or d ata) • power consumption – Has to be safe: Keep the semantics of the program

Control Flow Graph Control Flow Graph

entry int add(n , k) { • Nodes represent computation s = 0; a = 4; i = 0; – Each node is a Basic Block if (k == 0) b = 1; s = 0; a = 4; i = 0; k == 0 – Basic Block is a sequence of instructions with else b = 2; • No branches out of middle of basic block while (i < n) { b = 1; b = 2; • No branches into middle of basic block s = s + a*b; • Basic blocks should be maximal ii+1;i = i + 1; i < n } – Execution of basic block starts with first instruction return s; s = s + a*b; – Includes all instructions in basic block return s } i = i + 1; • Edges represent control flow Two Kinds of Variables Basic Block Optimizations

• Temporaries introduced by the compiler • Common Sub- • Copy Propagation – Transfer values only within basic block Expression Elimination – a = x+y; b = a; c = b+z; – Introduced as part of instruction flattening – a = (x+y)+z; b = x+y; – a = x+y; b = a; c = a+z; – Introduced by optimizations/transformations – t = x+y; a = t+z;;; b = t; • Program variables • Constant Propagation • – x = 5; b = x+y; – Decldiiillared in original program – a = x+y; b = a; c = a+z; – x = 5; b = 5+y; – a = x+y; c = a+z – May transfer values between basic blocks • Algebraic Simplification • – a = x * 1; – t = i * 4; – a = x; – ti<<2t = i << 2;

Value Numbering for CSE • Normalize basic block so that all statements are ofhf the f orm • As we simulate execution of program – var = var opp( var (where op is a binary yp) operator) • Generate a new version of program – var = op var (where op is a unary operator) – Each new value assigned to temporary – var = var • a = xxy+y; becomes a = xxy+y;t; t = a; • Simulate execution of basic block – Temporary preserves value for use later in program even if the original variable is rewritten – Assign a virtual value to each variable • a = x+y; a = a+z; b = x+y becomes – Assign a virtual value to each expression • a=a = x+y;t=a;a=; t = a; a = a+z;b=t;; b = t; – Assign a temporary variable to hold value of each computed expression New Basic CSE Example Original Basic Block Block a = x+y a = x+y • Original • After CSE t1 = a b = a+z b = a+z a = x+y a = x+y b = b+y t2 = b b+b = a+z b+b = a+z c = a+z bbb = b+y b = b+y t = b t3 = b c = a+z bb+b = b+y Var to Val c = t2 • Issues c = t x → v1 y → v2 Exp to Val Exp to Tmp – Temporaries store values for use later a v3 → v1+v2 → v3 v1+v2 → t1 z v4 – CSE with different names → v3+v4 → v5 v3+v4 → t2 b → v5v6 v5+v2 → t3 • a = x; b = x+y; c = a+y; c → v5 v5+v2 → v6 – Excessive Temp Generation and Use

Problems Copy Propagation

• Algorithm has a temporary for each new value • Once again, simulate execution of program – a = x+y; t1 = a • If possible, use the original variable instead of a • Introduces temporary – lots of temporaries – a = x+y; b = x+y; – lots of copy statements to temporaries – After CSE becomes a = x+y; t = a; b = t; • In many cases, temporaries and copy statements – After CP becomes a = x+y; b = a; are unnecessary •Keyyg idea: determine when original variables are • So we eliminate them with copy propagation NOT overwritten between computation of stored and dead code elimination value and use of stored value Copy Propagation Maps Copy Propagation Example

• Maintain two maps OiOrigi nal After CSE After CSE and – tmp to var: tells which variable to use instead of a Copy Propagation given temporary variable a = x+y a = x+y a = x+y b = a+z t1 = a t1 = a – var to set (inverse of tmp to var): tells which temps b = a+z c = x+y b+b = a+z are mapped to a given variable by tmp to var t2 = b a = b t2 = b c = t1 c = a a = b a = b

Copy Propagation Example Copy Propagation Example

Basic Block Basic Block After Basic Block Basic Block After After CSE CSE and Copy Prop After CSE CSE and Copy Prop a = x+y a = x+y a = x+y a = x+y t1 = a t1 = a t1 = a t1 = a b=a+zb = a+z b = a+z t2 = b t2 = b

tmp to var var to set tmp to var var to set t1 → aa →{t1} t1 → a a →{t1} t2 → b b →{t2} Copy Propagation Example Copy Propagation Example

Basic Block Basic Block After Basic Block Basic Block After After CSE CSE and Copy Prop After CSE CSE and Copy Prop a = x+y a = x+y a = x+y a = x+y t1 = a t1 = a t1 = a t1 = a b=a+zb = a+z b = a+z b=a+zb = a+z b = a+z t2 = b t2 = b t2 = b t2 = b c = t1 c = t1 c = a tmp to var var to set tmp to var var to set t1 → a a →{t1} t1 → a a →{t1} t2 → b b →{t2} t2 → b b →{t2}

Copy Propagation Example Copy Propagation Example

Basic Block Basic Block After Basic Block Basic Block After After CSE CSE and Copy Prop After CSE CSE and Copy Prop a = x+y a = x+y a = x+y a = x+y t1 = a t1 = a t1 = a t1 = a b=a+zb = a+z b = a+z b=a+zb = a+z b = a+z t2 = b t2 = b t2 = b t2 = b c = t1 c = a c = t1 c = a a = b a = b a = b a = b tmp to var var to set tmp to var var to set t1 → a a →{t1} t1 → t1 a →{} t2 → b b →{t2} t2 → b b →{t2} Dead Code Elimination Dead Code Elimination

• Copy propagation keeps all temps around • Basic Idea • There may be temps that are never read – Process code in reverse execution order • Dead Code Elimination (DCE) removes them – Maintain a set of variables that are needed later in computation Basic Block After Basic Block After – On encountering an assignment to a temporary that CSE + Copy Prop CSE + Copy Prop + DCE is not needed, we remove the assignment a = x+y a = x+y t1 = a b = a+z b = a+z c = a t2 = b a = b c = a a = b

Basic Block After Basic Block After CSE and Copy Prop CSE and Copy Prop a = x+y a = x+y t1 = a t1 = a b = a+z b = a+z t2 = b t2 = b c = a c = a a = b a = b

Assume that initially Needed Set Needed Set {a, c} {a, b, c} Basic Block After Basic Block After CSE and Copy Prop CSE and Copy Prop a = x+y a = x+y t1 = a t1 = a b = a+z b = a+z t2 = b c = a c = a a = b a = b

Needed Set Needed Set {a, b, c} {a, b, c}

Basic Block After Basic Block After CSE and Copy Prop CSE and Copy Prop a = x+y a = x+y t1 = a t1 = a b = a+z b = a+z

c = a c = a a = b a = b

Needed Set Needed Set {a, b, c, z} {a, b, c, z} Basic Block After Basic Block after CSE Copy Prop ag ation CSE and Copy Prop and Dead Code Elimination a = x+y a = x+y

b = a+z b = a+z

c = a c = a a = b a = b

Needed Set Needed Set {a, b, c, z} {a, b, c, z}

Basic Block after Interesting Properties CSE + Copy Propagation + Dead Code Elimination a = x+y • Analysis and Optimization Algorithms

b = a+z Simulate Execution of Program – CSE and Copy Propagation go forward c = a – Dead Code Elimination goes backwards a = b • Optimizations are stacked Needed Set – GfbifiGroup of basic transformations {a, b, c, z} – Work together to get good result – Often, one transformation creates inefficient code that is cleaned uppy by subse quent transformations Other Basic Block Transformations Dataflow Analysis

• Constant Propagation • Used to determine properties of programs that • Strength Reduction involve multiple basic blocks – a << 2 = a * 4; • Typically used to enable transformations – a+a+aa + a + a = 3 * a; – common sub -expression elimination • Algebraic Simplification – constant and copy propagation – a = a * 1; – dddead co de eli liiminati on – b = b + 0; •Analyypsis and transformation often come in pairs • Unified transformation framework

Reaching Definitions Reaching Definitions

• Concept of definition and use s=0;s = 0; a = 4; –z = x+y i = 0; – is a definition of z k == 0 – is a use of x and y • A definition reaches a use if b = 1; b = 2;

– valibdfiiilue written by definition i < n – may be read by use s = s + a*b; i = i + 1; return s Reaching Definitions and Constant Is a constant in s = s+a* b? Propagation • Is a use of a variable a constant? s = 0; Yes! – Check all reaching definitions a = 4; On all reaching i = 0; – If all assign variable to same constant k == 0 definitions – Then use is in fact a constant a = 4 • Can replace variable with constant b = 1; b = 2;

i < n

s = s + a*b; i = i + 1; return s

Constant Propagation Transform Is b constant in s = s+a* b?

s = 0; Yes! s = 0; No! a = 4; On all reaching a = 4; One reaching i = 0; i = 0; k == 0 definitions k == 0 definition with a = 4 b = 1 b = 1; b = 2; b = 1; b = 2; One reaching definition with i < n i < n b = 2 s = s + 4*b; s = s + a*b; i = i + 1; return s i = i + 1; return s Computing Reaching Definitions 0000000 1: s = 0; 2: a = 4; • Compute with sets of definitions 3: i = 0; – represent sets using bit vectors k == 0 1110000 1110000 – each definition has a position in bit vector • At each basic block , compute 4: b = 1; 5: b = 2; – definitions that reach start of block i < n 11111001111111 11111001111111 – dfiiidefinitions th at reach end of fblk block 11111001111111 6: s = s + a*b; •Do comppyutation by simulating execution of 7: i=i+1;i = i + 1; return s program until the fixed point is reached

GEN[0] = 1110000 Formalizing Analysis KILL[0] = 0000011 1: s = 0; 2: a = 4; • Each basic block has 3: i = 0; – IN - set of definitions that reach beginning of block k == 0 – OUT - set of definitions that reach end of block GEN[1] = 0001000 GEN[2] = 0000100 – GEN - set of definitions generated in block KILL[1] = 0000100 4: b = 1; 5: b = 2; KILL[2] = 0001000 – KILL - set of definitions killed in the block i < n GEN[3] = 0000000 • GEN[+*bii+1][s = s + a*b; i = i + 1;] = 0000011 KILL[3] = 0000000 6: s = s + a*b; •KILL[[;;]s = s + a*b; i = i + 1;] = 1010000 7: i=i+1;i = i + 1; return s • Compiler scans each basic block to derive GEN GEN[4] = 0000011 GEN[5] = 0000000 and KILL sets KILL[4] = 1010000 KILL[5] = 0000000 Dataflow Equations Solving Equations • Use fixed point algorithm • IN[b] = OUT[b1] ∪ ... ∪ OUT[bn] • Initialize with solution of OUT[b] = 0000000 – where b1, ..., bn are predecessors of b in CFG • Repeatedly apply equations • OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b] – IN[b] = OUT[b1] ∪ ... ∪ OUT[bn] • IN[entry] = 0000000 – OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b] • Result: system of equations • Until reaching fixed point – I.e., until equation application has no further effect • Use a worklist to track which equation applications ma y have a further effect

Reaching Definitions Algorithm Questions for all nodes n in N OUT[n] = ∅; // OUT[n] = GEN[n]; Worklist = N; // N = all nodes in graph • Does the algorithm halt? while (Worklist != ∅) – yes, because transfer function is monotonic choose a node n in Worklist; – if increase IN, increase OUT Worklist = Worklist - { n }; – in limit, all bits are 1 IN[n] = ∅; • If bit is 1, is there always an execution in which for all nodes p in predecessors(n) IN[n] = IN[n] ∪ OUT[p]; OUT[n] = (IN[n] - KILL[n]) ∪ GEN[n]; corresponding de fin ition reach es b asi c bl ock? if (OUT[n] changed) • If bit is 0,,pg does the corresponding definition for all nodes s in successors(n) Worklist = Worklist ∪ { s }; ever reach basic block? • Concept of conservative analysis Available Expressions Computing Available Expressions

• An expression x+y is available at a point p if • Represent sets of expressions using bit vectors – every path from the initial node to p evaluates x+y • Each expression corresponds to a bit before reaching p, • Run dataflow algorithm similar to reaching – and there are no assignments to x or y after the definitions evaluation but before p. • Big difference: • Available Expression information can be used to do global (across basic blocks) CSE. – A definition reaches a basic block if it comes from ANY predecessor in CFG •Ifiiilblhif an expression is available at use, there is no – An expression is available at a basic block only if it need to re-evaluate it. is available from ALL predecessors in CFG

Available Expressions Example 0000 Global CSE Transform a = x+y; a = x+y; t = a; 0000 Expressions x == 0 Expressions x == 0 1: x+y 1001 1001 1: x+y 1001 x = z; 2: i < n x = z; 2: i < n b = x+y; 3: i+c b+b = x+y; 3: i+c t = b; 4: x == 0 1000 1000 4: x == 0 1000 i+i = x+y; i = t; must use same temp 1000 for CSE in all blocks 1000 i < n i < n 1100 1100 1100 1100 c = x+y; d = x+y c = t; d = t; i = i+c; i = i+c; Formalizing Analysis Dataflow Equations • Each basic block has • IN[b] = OUT[b1] ∩ ... ∩ OUT[bn] – IN - set of expressions available at start of block – where b1, ..., bn are predecessors of b in CFG – OUT - set of expressions available at end of block • OUT[b] = (IN[b] - KILL[b]) ∪ GEN[b] – GEN - set of expressions computed in block – KILL - set of expressions killed in the block • IN[entry] = 0000 • GEN[x = z; b = xxy]+y] = 1000 • Result: system of equations • KILL[x = z; b = x+y] = 1001 • Compiler scans each basic block to derive GEN and KILL sets

Solving Equations Available Expressions Algorithm • Use fixed point algorithm for all nodes n in N OUT[n] = E; // OUT[n] = E - KILL[n]; • IN[entry] = 0000 IN[Entry] = ∅; OUT[Entry] = GEN[Entry]; Worklist = N - { Entry }; // N = all nodes in graph • Initialize OUT[b] = 1111 while (Worklist != ∅) • Repeatedly apply equations choose a node n in Worklist; – IN[b] = OUT[b1] ∩ ... ∩ OUT[bn] Worklist = Worklist - { n }; – OUT[b] = ( IN[b] - KILL[b]) ∪ GEN[b] IN[n] = E; // E is set of all expressions for all nodes p in predecessors(n) • Use a worklist algorithm to track which IN[n] = IN[n] ∩ OUT[p]; equation applications may have further effect OUT[n] = (IN[n] - KILL[n]) ∪ GEN[n]; if (OUT[[]n] chang ed) for all nodes s in successors(n) Worklist = Worklist ∪ { s }; Questions Duality In Two Algorithms

• Does algorithm always halt? • Reaching definitions • If expression is available in some execution, is – Confluence operation is set union it always marked as available in analysis? – OUT[b] initialized to empty set • If expression is not available in some execution , • Available expressions can it be marked as available in analysis? – Confluence operation is set intersection • In what sense is the algorithm conservative? – OUT[biiilidb] initialized to set of avail ilblable expressi ons • General framework for dataflow algorithms • Build parameterized dataflow analyzer once, use for all dataflow analysis problems

Liveness Analysis What Use is Liveness Information? • • A variable v is live at point p if – If a variable is dead, we can reassign its register – v is used along some path starting at p, and • Dead code elimination – no definition of v along the path before the use. – Eliminate assignments to variables not read later • When is a variable v dead at point p? – But must not eliminate last assignment to variable – No use of v on any path from p to exit node, or (such as instance variable) visible outside CFG – Ifllf all path hfs from p, red dfiefine v b bfefore usi ng v. – Can eliminate other dead assignments – Handle by making all externally visible variables live on exit from CFG Conceptual Idea of Analysis Liveness Example 0101110

• Simulate execution • Assume a, b, c visible a = x+y; • But start from exit and go backwards in CFG outside function and t = a; thus are live on exit c = a+x; • Compute liveness information from end to x == 0 • Assume x,y,z,t are beginning of basic blocks 1100111 not visible on exit 1000111 • RtliRepresent liveness b = t+z; 1100100 using a bit vector c = y+1; – order is abcxyzt 1110000

Using Liveness Information for Formalizing Analysis Dead Code Elimination 0101110 • Each basic block has • Assume a, b, c visible a = x+y; – IN - set of variables live at start of block outside function and t = a; thus are live on exit c = a+x; – OUT - set of variables live at end of block x == 0 – USE - set of variables with upwards exposed uses in • Assume x,y,z,t are 1100111 block not visible on exit 1000111 – DEF - set of variables defined in block • RtliRepresent liveness b = t+z; 1100100 • USE[x = z; x = x+1;] = { z } (x not in USE) using a bit vector c = y+1; • DEF[x=z;x=x+1;y=1;]={xy}[x = z; x = x+1;y = 1;] = {x, y} – order is abcxyzt 1110000 • Compiler scans each basic block to derive USE and DEF sets Algorithm Similar to Other Dataflow OUT[Exit] = ∅; Algorithms IN[[]Exit] = USE[[];n]; • Backwards analysis, not forwards for all nodes n in N - { Exit } IN[n] = ∅; Worklist = N - {{t}; Exit }; • Still have transfer functions while (Worklist != ∅) • Still have confluence operators choose a node n in Worklist; • ClifktkfbthCan generalize framework to work for both Worklist = Worklist - { n }; forwards and backwards analyses OUT[n] = ∅; for all nodes s in successors(n) OUT[n] = OUT[n] ∪ IN[s]; IN[n] = USE[n] ∪ (OUT[n] - DEF[n]); if (IN[n] changed) for all nodes p in predecessors(n) Worklist = Worklist ∪{p};{ p };

Analysis Information Inside Basic Summary Blocks • One detail: • Basic blocks and basic block optimizations – Copy and constant propagation – Given dataflow information at IN and OUT of node – Common sub -expression elimination – Also need to compute information at each statement – Dead code elimination of basic block • Dataflow Analysis – Simple propagation algorithm usually works fine – Control flow graph – Can be viewed as restricted case of dataflow – IN[b], OUT[b], transfer functions, join points analysis • Paired of analyses and transformations – Reaching definitions/constant propagation – Available expressions/common sub-expression elimination – Liveness analysis/Dead code elimination