Using Program Analysis for Optimization

Analysis and Optimizations • Program Analysis – Discovers properties of a program Using Program Analysis • Optimizations – Use analysis results to transform program for Optimization – Goal: improve some aspect of program • numbfber of execute diid instructions, num bflber of cycles • cache hit rate • memory space (d(code or data ) • power consumption – Has to be safe: Keep the semantics of the program Control Flow Graph Control Flow Graph entry int add(n, k) { • Nodes represent computation s = 0; a = 4; i = 0; – Each node is a Basic Block if (k == 0) b = 1; s = 0; a = 4; i = 0; k == 0 – Basic Block is a sequence of instructions with else b = 2; • No branches out of middle of basic block while (i < n) { b = 1; b = 2; • No branches into middle of basic block s = s + a*b; • Basic blocks should be maximal ii+1;i = i + 1; i < n } – Execution of basic block starts with first instruction return s; s = s + a*b; – Includes all instructions in basic block return s } i = i + 1; • Edges represent control flow Two Kinds of Variables Basic Block Optimizations • Temporaries introduced by the compiler • Common Sub- • Copy Propagation – Transfer values only within basic block Expression Elimination – a = x+y; b = a; c = b+z; – Introduced as part of instruction flattening – a = (x+y)+z; b = x+y; – a = x+y; b = a; c = a+z; – Introduced by optimizations/transformations – t = x+y; a = t+z;;; b = t; • Program variables • Constant Propagation • Dead Code Elimination – x = 5; b = x+y; – Decldiiillared in original program – a = x+y; b = a; c = a+z; – x = 5; b = 5+y; – a = x+y; c = a+z – May transfer values between basic blocks • Algebraic Simplification • Strength Reduction – a = x * 1; – t = i * 4; – a = x; – ti<<2t = i << 2; Value Numbering Value Numbering for CSE • Normalize basic block so that all statements are ofhf the form • As we simulate execution of program – var = var opp( var (where op is a binary yp) operator) • Generate a new version of program – var = op var (where op is a unary operator) – Each new value assigned to temporary – var = var • a = xxy+y; becomes a = xxy+y;t; t = a; • Simulate execution of basic block – Temporary preserves value for use later in program even if the original variable is rewritten – Assign a virtual value to each variable • a = x+y; a = a+z; b = x+y becomes – Assign a virtual value to each expression • a=a = x+y;t=a;a=; t = a; a = a+z;b=t;; b = t; – Assign a temporary variable to hold value of each computed expression New Basic CSE Example Original Basic Block Block a = x+y a = x+y • Original • After CSE t1 = a b = a+z b = a+z a = x+y a = x+y b = b+y t2 = b b+b = a+z b+b = a+z c = a+z bbb = b+y b = b+y t = b t3 = b c = a+z bb+b = b+y Var to Val c = t2 • Issues c = t x → v1 y → v2 Exp to Val Exp to Tmp – Temporaries store values for use later a v3 → v1+v2 → v3 v1+v2 → t1 z v4 – CSE with different names → v3+v4 → v5 v3+v4 → t2 b → v5v6 v5+v2 → t3 • a = x; b = x+y; c = a+y; c → v5 v5+v2 → v6 – Excessive Temp Generation and Use Problems Copy Propagation • Algorithm has a temporary for each new value • Once again, simulate execution of program – a = x+y; t1 = a • If possible, use the original variable instead of a • Introduces temporary – lots of temporaries – a = x+y; b = x+y; – lots of copy statements to temporaries – After CSE becomes a = x+y; t = a; b = t; • In many cases, temporaries and copy statements – After CP becomes a = x+y; b = a; are unnecessary •Keyyg idea: determine when original variables are • So we eliminate them with copy propagation NOT overwritten between computation of stored and dead code elimination value and use of stored value Copy Propagation Maps Copy Propagation Example • Maintain two maps OiOrig inal After CSE After CSE and – tmp to var: tells which variable to use instead of a Copy Propagation given temporary variable a = x+y a = x+y a = x+y b = a+z t1 = a t1 = a – var to set (inverse of tmp to var): tells which temps b = a+z c = x+y b+b = a+z are mapped to a given variable by tmp to var t2 = b a = b t2 = b c = t1 c = a a = b a = b Copy Propagation Example Copy Propagation Example Basic Block Basic Block After Basic Block Basic Block After After CSE CSE and Copy Prop After CSE CSE and Copy Prop a = x+y a = x+y a = x+y a = x+y t1 = a t1 = a t1 = a t1 = a b=a+zb = a+z b = a+z t2 = b t2 = b tmp to var var to set tmp to var var to set t1 → aa →{t1} t1 → a a →{t1} t2 → b b →{t2} Copy Propagation Example Copy Propagation Example Basic Block Basic Block After Basic Block Basic Block After After CSE CSE and Copy Prop After CSE CSE and Copy Prop a = x+y a = x+y a = x+y a = x+y t1 = a t1 = a t1 = a t1 = a b=a+zb = a+z b = a+z b=a+zb = a+z b = a+z t2 = b t2 = b t2 = b t2 = b c = t1 c = t1 c = a tmp to var var to set tmp to var var to set t1 → a a →{t1} t1 → a a →{t1} t2 → b b →{t2} t2 → b b →{t2} Copy Propagation Example Copy Propagation Example Basic Block Basic Block After Basic Block Basic Block After After CSE CSE and Copy Prop After CSE CSE and Copy Prop a = x+y a = x+y a = x+y a = x+y t1 = a t1 = a t1 = a t1 = a b=a+zb = a+z b = a+z b=a+zb = a+z b = a+z t2 = b t2 = b t2 = b t2 = b c = t1 c = a c = t1 c = a a = b a = b a = b a = b tmp to var var to set tmp to var var to set t1 → a a →{t1} t1 → t1 a →{} t2 → b b →{t2} t2 → b b →{t2} Dead Code Elimination Dead Code Elimination • Copy propagation keeps all temps around • Basic Idea • There may be temps that are never read – Process code in reverse execution order • Dead Code Elimination (DCE) removes them – Maintain a set of variables that are needed later in computation Basic Block After Basic Block After – On encountering an assignment to a temporary that CSE + Copy Prop CSE + Copy Prop + DCE is not needed, we remove the assignment a = x+y a = x+y t1 = a b = a+z b = a+z c = a t2 = b a = b c = a a = b Basic Block After Basic Block After CSE and Copy Prop CSE and Copy Prop a = x+y a = x+y t1 = a t1 = a b = a+z b = a+z t2 = b t2 = b c = a c = a a = b a = b Assume that initially Needed Set Needed Set {a, c} {a, b, c} Basic Block After Basic Block After CSE and Copy Prop CSE and Copy Prop a = x+y a = x+y t1 = a t1 = a b = a+z b = a+z t2 = b c = a c = a a = b a = b Needed Set Needed Set {a, b, c} {a, b, c} Basic Block After Basic Block After CSE and Copy Prop CSE and Copy Prop a = x+y a = x+y t1 = a t1 = a b = a+z b = a+z c = a c = a a = b a = b Needed Set Needed Set {a, b, c, z} {a, b, c, z} Basic Block After Basic Block after CSE Copy Prop ag ation CSE and Copy Prop and Dead Code Elimination a = x+y a = x+y b = a+z b = a+z c = a c = a a = b a = b Needed Set Needed Set {a, b, c, z} {a, b, c, z} Basic Block after Interesting Properties CSE + Copy Propagation + Dead Code Elimination a = x+y • Analysis and Optimization Algorithms b = a+z Simulate Execution of Program – CSE and Copy Propagation go forward c = a – Dead Code Elimination goes backwards a = b • Optimizations are stacked Needed Set – GfbifiGroup of basic transformations {a, b, c, z} – Work together to get good result – Often, one transformation creates inefficient code that is cleaned uppy by subseq uent transformations Other Basic Block Transformations Dataflow Analysis • Constant Propagation • Used to determine properties of programs that • Strength Reduction involve multiple basic blocks – a << 2 = a * 4; • Typically used to enable transformations – a+a+aa + a + a = 3 * a; – common sub -expression elimination • Algebraic Simplification – constant and copy propagation – a = a * 1; – dddead co de e liiliminati on – b = b + 0; •Analyypsis and transformation often come in pairs • Unified transformation framework Reaching Definitions Reaching Definitions • Concept of definition and use s=0;s = 0; a = 4; –z = x+y i = 0; – is a definition of z k == 0 – is a use of x and y • A definition reaches a use if b = 1; b = 2; – valibdfiiilue written by definition i < n – may be read by use s = s + a*b; i = i + 1; return s Reaching Definitions and Constant Is a constant in s = s+a*b ? Propagation • Is a use of a variable a constant? s = 0; Yes! – Check all reaching definitions a = 4; On all reaching i = 0; – If all assign variable to same constant k == 0 definitions – Then use is in fact a constant a = 4 • Can replace variable with constant b = 1; b = 2; i < n s = s + a*b; i = i + 1; return s Constant Propagation Transform Is b constant in s = s+a*b ? s = 0; Yes! s = 0; No! a = 4; On all reaching a = 4; One reaching i = 0; i = 0; k == 0 definitions k == 0 definition with a = 4 b = 1 b = 1; b = 2; b = 1; b = 2; One reaching definition with i < n i < n b = 2 s = s + 4*b; s = s + a*b; i = i + 1; return s i = i + 1; return s Computing Reaching Definitions 0000000 1: s = 0; 2: a = 4; • Compute with sets of definitions 3: i = 0; – represent sets using bit vectors k == 0 1110000 1110000 – each definition has a position in bit vector • At each basic block , compute 4: b = 1; 5: b = 2; – definitions that reach start of block i < n 11111001111111 11111001111111 – dfiiidefinitions th at reac h end o fblkf block 11111001111111 6: s = s + a*b; •Do comppyutation by simulatin g execution of 7: i=i+1;i = i + 1; return s program until the fixed point is reached GEN[0] = 1110000 Formalizing Analysis KILL[0] = 0000011 1: s = 0; 2: a = 4; • Each basic block has 3: i = 0; – IN - set of definitions that reach beginning of block k == 0 – OUT - set of definitions that reach end of block GEN[1] = 0001000 GEN[2] = 0000100 – GEN - set of definitions generated in block KILL[1] = 0000100 4: b = 1; 5: b = 2; KILL[2] = 0001000 – KILL - set of definitions killed in the block i < n GEN[3] = 0000000 • GEN[+*bii+1][s = s + a*b; i = i + 1;] = 0000011 KILL[3] = 0000000 6: s = s + a*b; •KILL[[;;]s = s + a*b; i = i + 1;] = 1010000 7: i=i+1;i = i + 1; return s • Compiler scans each basic block to derive GEN GEN[4] = 0000011 GEN[5] = 0000000 and KILL sets KILL[4] = 1010000 KILL[5] = 0000000 Dataflow Equations Solving Equations • Use fixed point algorithm • IN[b] = OUT[b1] ∪ ..

Using Program Analysis for Optimization

Redundancy Elimination Common Subexpression Elimination

Integrating Program Optimizations and Transformations with the Scheduling of Instruction Level Parallelism*

Copy Propagation Optimizations for VLIW DSP Processors with Distributed Register Files ?

Garbage Collection

Phase-Ordering in Optimizing Compilers

CSE 401/M501 – Compilers

Machine Independent Code Optimizations

Code Improvement:Thephasesof Compilation Devoted to Generating Good Code

Abstract Interpretations, 14 Access Expressions, 137 Access Graph

Compiler Construction

Optimization.Pdf

Code Optimization 2