Partial Redundancy Elimination, Loop Optimization & Lazy Code Motion
Advanced Compiler Techniques, 2005
Erik Stenman, Virtutech

Repetition: Common-Subexpression Elimination

An occurrence of an expression in a program is a common subexpression if there is another occurrence of the expression whose evaluation always precedes this one in execution order and if the operands of the expression remain unchanged between the two evaluations.

Local Common Subexpression Elimination (CSE) keeps track of the set of available expressions within a basic block and replaces instances of them by references to new temporaries that keep their value.

Before CSE:
  …
  a = (x+y) + z;
  b = a - 1;
  c = x+y;
  …

After CSE:
  …
  t = x+y;
  a = t + z;
  b = a - 1;
  c = t;
  …
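The bookkeeping that local CSE performs can be sketched in a few lines. The following Python sketch is illustrative only; the Instr record, the operator strings, and the temporary-naming scheme are assumptions made for this example, not part of the lecture.

from dataclasses import dataclass

@dataclass
class Instr:
    dest: str
    op: str          # e.g. '+', '-', 'copy'
    args: tuple      # operand names or constants

def local_cse(block):
    """Replace recomputations of available expressions inside one basic block."""
    available = {}   # (op, args) -> temporary holding the value
    out, fresh = [], 0
    for ins in block:
        key = (ins.op, ins.args)
        if ins.op != 'copy' and key in available:
            # expression already available: reuse the temporary
            out.append(Instr(ins.dest, 'copy', (available[key],)))
        elif ins.op != 'copy':
            # keep the value in a new temporary so later uses survive
            tmp = f't{fresh}'; fresh += 1
            out.append(Instr(tmp, ins.op, ins.args))
            out.append(Instr(ins.dest, 'copy', (tmp,)))
            available[key] = tmp
        else:
            out.append(ins)
        # a redefinition kills every available expression that reads the old value
        available = {k: v for k, v in available.items() if ins.dest not in k[1]}
    return out

block = [Instr('a', '+', ('x', 'y')),   # a = x + y
         Instr('b', '-', ('a', '1')),   # b = a - 1
         Instr('c', '+', ('x', 'y'))]   # c = x + y  (redundant)
for i in local_cse(block):
    print(i)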


Redundant Expressions

An expression is redundant at a point p if, on every path to p:
1. It is evaluated before reaching p, and
2. None of its constituent values is redefined before p.

Example (CFG figure): both paths into a join evaluate a←b+c, so some occurrences of b+c are redundant.

Partially Redundant Expressions

An expression is partially redundant at p if it is redundant along some, but not all, paths reaching p.

Example (CFG figure): one path evaluates a←b+c while the other executes b←b+1, so not all occurrences of b+c are redundant. Inserting a copy of "a←b+c" after the definition of b can make the later evaluation redundant.


Loop Invariant Expressions

Another example (CFG figure): x←y*z is computed before a loop whose body evaluates a←b+c without changing b or c; b + c is partially redundant here.
♦ Loop invariant expressions are partially redundant.
♦ Partial redundancy elimination performs code motion.
♦ A major part of the work is figuring out where to insert the operations.

Loop Optimizations

♦ Loop optimization is important because most of the execution time occurs in loops.
♦ First, we will identify loops.
♦ We will study three optimizations:
  ♦ Loop-invariant code motion.
  ♦ Strength reduction.
  ♦ Induction variable elimination.
♦ There are other important loop-optimization techniques that we will not talk about here.


What is a Loop?

♦ A set of nodes.
♦ Loop header:
  ♦ a single node;
  ♦ all iterations of the loop go through the header.
♦ Back edge.

Anomalous Situations

♦ Two back edges, two loops, one header:
  ♦ the compiler merges the loops.
♦ No loop header, no loop.


Defining Loops With Dominators

Recall the concept of dominators:
♦ Node n dominates node m if all paths from the start node to m go through n.
♦ The immediate dominator of m is the last dominator of m on any path from the start node.
♦ A dominator tree is a tree rooted at the start node:
  ♦ Nodes are the nodes of the control-flow graph.
  ♦ There is an edge from d to n if d is the immediate dominator of n.

Identifying Loops

♦ A loop has a unique entry point – the header.
♦ There is at least one path back to the header.
♦ Find edges whose heads dominate their tails; these edges are back edges of loops.
♦ Given a back edge n→d:
  ♦ The node d is the loop header.
  ♦ The loop consists of n plus all nodes that can reach n without going through d (all nodes "between" d and n).
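As a rough illustration of the back-edge test, here is a small sketch in Python; the iterative dominator computation and the successor-map encoding of the CFG are assumptions of this example, not material from the slides.

def dominators(succ, start):
    """Iteratively compute dom(n) = set of nodes that dominate n, for every node."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            preds_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds_doms) if preds_doms else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def back_edges(succ, start):
    """An edge n -> d is a back edge when its head d dominates its tail n."""
    dom = dominators(succ, start)
    return [(n, d) for n, ss in succ.items() for d in ss if d in dom[n]]

# Small example CFG: 1 -> 2 -> 3 -> 2 (the edge 3 -> 2 is a back edge).
cfg = {1: [2], 2: [3], 3: [2, 4], 4: []}
print(back_edges(cfg, 1))   # [(3, 2)]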


Loop Construction Algorithm

loop(d, n)
  loop = {d}; stack = ∅; insert(n);
  while stack not empty do
    m = pop stack;
    for all p ∈ pred(m) do insert(p);

insert(m)
  if m ∉ loop then
    loop = loop ∪ {m};
    push m onto stack;

Nested Loops

♦ If two loops do not have the same header then
  ♦ either one loop (the inner loop) is contained in the other (the outer loop),
  ♦ or the two loops are disjoint.
♦ If two loops have the same header, typically they are unioned and treated as one loop.
  Example (figure): two loops {1,2} and {1,3} share header 1; unioned: {1,2,3}.
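The pseudocode above translates almost directly into executable form; this Python version is an illustrative sketch, and the predecessor-map encoding of the CFG is an assumption of the example.

def natural_loop(pred, d, n):
    """Nodes of the natural loop of back edge n -> d: d plus everything
    that can reach n without passing through d."""
    loop, stack = {d}, []

    def insert(m):
        if m not in loop:
            loop.add(m)
            stack.append(m)

    insert(n)
    while stack:
        m = stack.pop()
        for p in pred.get(m, ()):
            insert(p)
    return loop

# Predecessor map for a CFG with back edge 3 -> 2.
pred = {1: [], 2: [1, 3], 3: [2], 4: [3]}
print(natural_loop(pred, 2, 3))   # {2, 3}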


Loop Preheader

♦ Many optimizations insert code before the loop.
♦ Put a special node (the loop preheader) before the loop to hold this code.

Loop Optimizations

♦ Now that we have the loop, we can optimize it!
♦ Loop invariant code motion: move loop-invariant code to the preheader.


Loop Invariant Code Motion

If a computation produces the same value in every loop iteration, move it out of the loop.

Before:
  for i = 1 to N
    x = x + 1
    for j = 1 to N
      a[i,j] = 100*N + 10*i + j + x

After:
  t1 = 100*N
  for i = 1 to N
    x = x + 1
    t2 = t1 + 10*i + x
    for j = 1 to N
      a[i,j] = t2 + j

Detecting Loop Invariant Code

♦ A statement is loop-invariant if its operands
  ♦ are constant,
  ♦ have all reaching definitions outside the loop, or
  ♦ have exactly one reaching definition, and that definition comes from an invariant statement.
♦ Concept of an exit node of a loop:
  ♦ a node with successors outside the loop.
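One quick way to convince yourself that the motion above is safe is to execute both versions and compare; this small Python check is illustrative and not part of the lecture.

def original(N):
    a, x = {}, 0
    for i in range(1, N + 1):
        x = x + 1
        for j in range(1, N + 1):
            a[i, j] = 100 * N + 10 * i + j + x
    return a

def hoisted(N):
    a, x = {}, 0
    t1 = 100 * N                      # invariant in both loops
    for i in range(1, N + 1):
        x = x + 1
        t2 = t1 + 10 * i + x          # invariant in the inner loop
        for j in range(1, N + 1):
            a[i, j] = t2 + j
    return a

assert original(5) == hoisted(5)      # same results, fewer operations inside the loops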


Loop Invariant Code Detection Algorithm

for all statements in the loop
  if the operands are constant or have all reaching definitions outside the loop,
    mark the statement as invariant
do
  for all statements in the loop not already marked invariant
    if the operands are constant, have all reaching definitions outside the loop,
       or have exactly one reaching definition and that definition comes from an invariant statement,
    then mark the statement as invariant
until there are no more new invariant statements

Loop Invariant Code Motion

♦ Conditions for moving a statement s: x = y+z into the loop preheader:
  ♦ s dominates all exit nodes of the loop.
    ♦ If it does not, some use after the loop might get the wrong value.
    ♦ Alternate condition: the definition of x from s reaches no use outside the loop (but moving s may increase run time).
  ♦ No other statement in the loop assigns to x.
    ♦ If one does, assignments might get reordered.
  ♦ No use of x in the loop is reached by a definition other than s.
    ♦ If one is, the movement may change the value read by that use.
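A simplified sketch of the iterative marking, in Python. It assumes each variable is defined at most once inside the loop, which stands in for full reaching-definitions information; the statement encoding and the helper name find_invariant are invented for the example.

def find_invariant(loop_stmts, defined_outside_only):
    """loop_stmts: list of (dest, operands) pairs for statements inside the loop.
    defined_outside_only: variables whose only reaching definitions are outside the loop.
    Assumes each dest appears at most once in loop_stmts (SSA-like simplification)."""
    invariant = set()
    changed = True
    while changed:
        changed = False
        for idx, (dest, operands) in enumerate(loop_stmts):
            if idx in invariant:
                continue
            def ok(v):
                return (isinstance(v, int)                           # constant
                        or v in defined_outside_only                 # all defs outside the loop
                        or any(i in invariant and d == v             # single in-loop definition,
                               for i, (d, _) in enumerate(loop_stmts)))  # already invariant
            if all(ok(v) for v in operands):
                invariant.add(idx)
                changed = True
    return invariant

# t = 100 * N;  u = t + 10;  x = x + 1   (x is not invariant: defined in the loop)
stmts = [('t', (100, 'N')), ('u', ('t', 10)), ('x', ('x', 1))]
print(find_invariant(stmts, defined_outside_only={'N'}))   # {0, 1}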


Order of Statements in Preheader

Preserve the data dependences of the original program (one can use the order in which the statements are discovered by the algorithm).

Example (figure): before motion the preheader holds b = 2 and i = 0 and the loop body computes a = b * b and c = a + a before i = i + c; after motion the preheader holds b = 2, a = b * b, c = a + a, i = 0 (in an order that preserves dependences) and the loop keeps only the test i < 80 and i = i + c.

Induction Variables

Example:
  for j = 1 to 100
    *(&A + 4*j) = 202 - 2*j

Basic induction variable: j = 1, 2, 3, 4, …
Induction variable &A+4*j: &A+4, &A+8, &A+12, &A+16, …


What are Induction Variables?

♦ x is an induction variable of a loop L if
  ♦ the variable changes its value every iteration of the loop, and
  ♦ the value is a function of the number of iterations of the loop.
♦ In programs, this function is normally a linear function.
  Example: for loop index variable j, the function is d + c*j.

Types of Induction Variables

♦ Base induction variable:
  ♦ The only assignments to it in the loop are of the form i = i ± c.
♦ Derived induction variables:
  ♦ The value is a linear function of a base induction variable.
  ♦ Within the loop, j = c*i + d, where i is a base induction variable.
  ♦ Very common in array index expressions: an access to a[i] produces code like p = a + 4*i.


Strength Reduction for Derived Induction Variables

Figure: before the transformation the loop tests i < 10 and its body executes i = i + 1; p = 4 * i; use of p. After the transformation the preheader also sets p = 0 and the body executes i = i + 1; p = p + 4; use of p.

Elimination of Superfluous Induction Variables

Figure: starting from i = 0; p = 0 with test i < 10 and body i = i + 1; p = p + 4; use of p, the variable i is eliminated: the loop keeps only p = 0, the test p < 40, and the body p = p + 4; use of p.


Three Algorithms

♦ Detection of induction variables:
  ♦ Find the base induction variables.
  ♦ Each base induction variable has a family of derived induction variables, each of which is a linear function of the base induction variable.
♦ Strength reduction for derived induction variables.
♦ Elimination of superfluous induction variables.

Output of the Induction Variable Detection Algorithm

♦ The set of induction variables:
  ♦ base induction variables,
  ♦ derived induction variables.
♦ For each induction variable j, a triple ⟨i, c, d⟩:
  ♦ i is a base induction variable,
  ♦ the value of j is i*c + d,
  ♦ j belongs to the family of i.


Induction Variable Detection Algorithm

Scan the loop to find all base induction variables
do
  Scan the loop to find all variables k with one assignment of the form k = j*b,
    where j is an induction variable with triple ⟨i, c, d⟩
    (j = i*c + d, so k = (i*c + d)*b = i*(c*b) + d*b);
  make k an induction variable with triple ⟨i, c*b, d*b⟩
  Scan the loop to find all variables k with one assignment of the form k = j ± b,
    where j is an induction variable with triple ⟨i, c, d⟩
    (j = i*c + d, so k = (i*c + d) ± b = i*c + (d ± b));
  make k an induction variable with triple ⟨i, c, d ± b⟩
until no more induction variables are found

Strength Reduction

  t = 202
  for j = 1 to 100 do
    t = t - 2
    *(abase + 4*j) = t

Basic induction variable: j = 1, 2, 3, 4, …                     triple ⟨j, 1, 0⟩
Induction variable t (= 202 - 2*j): 202, 200, 198, 196, …       triple ⟨j, -2, 202⟩
Induction variable abase+4*j: abase+4, abase+8, abase+12, …     triple ⟨j, 4, abase⟩
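A tiny numeric check (illustrative Python; abase is given an arbitrary value of 0 purely for the check) that the derived quantities in the example really are linear functions of the basic induction variable:

abase = 0                      # assumed base address, just for the check
for j in range(1, 5):
    t = 202 - 2 * j            # derived induction variable, triple (j, -2, 202)
    addr = abase + 4 * j       # derived induction variable, triple (j, 4, abase)
    assert t == -2 * j + 202
    assert addr == 4 * j + abase
    print(j, t, addr)          # 1 200 4 / 2 198 8 / 3 196 12 / 4 194 16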


Strength Reduction Algorithm

for all derived induction variables j with triple ⟨i, c, d⟩
  Create a new variable s
  Replace the assignment j = i*c + d with j = s
  Immediately after each assignment i = i + e,
    insert the statement s = s + c*e   (c*e is a constant)
  Place s in the family of i with triple ⟨i, c, d⟩
  Insert s = c*i + d into the preheader

Strength Reduction for Derived Induction Variables

Figure: before the transformation the loop body is i = i + 1; p = 4 * i; use of p. After the transformation the preheader initializes p = 0 and the body becomes i = i + 1; p = p + 4; use of p.
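As an illustration of the rewrite, here is a Python sketch over a tiny list-of-strings IR; the encoding, the string matching, and the helper name strength_reduce are assumptions made for this example, not the lecture's algorithmic notation.

def strength_reduce(preheader, body, derived):
    """derived maps a variable j to its triple (i, c, d) with i the base variable.
    Rewrites j = c*i + d into an addition that tracks j's value across iterations."""
    for j, (i, c, d) in derived.items():
        s = f's_{j}'                                   # new variable carrying j's value
        preheader.append(f'{s} = {c}*{i} + {d}')       # initialize in the preheader
        new_body = []
        for stmt in body:
            if stmt.startswith(f'{j} ='):              # j = c*i + d  ->  j = s
                new_body.append(f'{j} = {s}')
            else:
                new_body.append(stmt)
                if stmt.startswith(f'{i} = {i} +'):    # after i = i + e, bump s by c*e
                    e = int(stmt.split('+')[1])
                    new_body.append(f'{s} = {s} + {c * e}')
        body[:] = new_body
    return preheader, body

pre, body = ['i = 0'], ['i = i + 1', 'p = 4*i', 'use p']
print(strength_reduce(pre, body, {'p': ('i', 4, 0)}))
# (['i = 0', 's_p = 4*i + 0'], ['i = i + 1', 's_p = s_p + 4', 'p = s_p', 'use p'])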


Induction Variable Elimination

Choose a base induction variable i such that the only uses of i are in
  the termination condition of the form i < n, and
  the assignment of the form i = i + m.
Choose a derived induction variable k with triple ⟨i, c, d⟩.
Replace the termination condition with k < c*n + d.

Example

Before:
  double A[256], B[256][256]
  j = 1
  while (j < 100)
    A[j] = B[j][j]
    j = j + 2

After:
  double A[256], B[256][256]
  j = 1
  a = &A + 8
  b = &B + 2056    // 2048 + 8
  while (j < 100)
    *a = *b
    j = j + 2
    a = a + 16
    b = b + 4112   // 4096 + 16
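A small Python simulation of the example above, comparing the byte offsets touched by the two loops (8-byte doubles and 256 doubles per row of B, as declared); the check is illustrative only.

# Byte offsets touched by the original loop: A[j] at 8*j, B[j][j] at (256*j + j)*8 = 2056*j.
orig = [(8 * j, 2056 * j) for j in range(1, 100, 2)]

# Byte offsets touched by the strength-reduced loop.
reduced, a, b, j = [], 8, 2056, 1
while j < 100:
    reduced.append((a, b))
    j += 2
    a += 16          # 2 iterations * 8 bytes
    b += 4112        # 2 iterations * (256 + 1) * 8 bytes

assert orig == reduced   # both loops access exactly the same elements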


Summary: Loop Optimization

♦ Important because lots of time is spent in loops.
♦ Detecting loops.
♦ Loop invariant code motion.
♦ Induction variable analyses and optimizations:
  ♦ Strength reduction.
  ♦ Induction variable elimination.


Lazy Code Motion

The concept
♦ Solve data-flow problems that reveal the limits of code motion.
♦ Compute INSERT & DELETE sets from the solutions.
♦ Make a linear pass over the code to rewrite it (using INSERT & DELETE).

The intuitions
♦ Compute available expressions.
♦ Compute anticipable expressions.
♦ These lead to an earliest placement for each expression.
♦ Push expressions down the CFG until moving them further would change behavior.

The history
♦ Partial redundancy elimination (Morel & Renvoise, CACM, 1979).
♦ Improvements by Drechsler & Stadel, Joshi & Dhamdhere, Chow, Knoop, Ruthing & Steffen, Dhamdhere, Sorkin, …
♦ All versions of PRE optimize placement:
  ♦ Guarantee that no path is lengthened.
♦ LCM was invented by Knoop et al. in PLDI, 1992.
♦ We will look at a variation by Drechsler & Stadel, SIGPLAN Notices, 28(5), May, 1993.

Assumptions
♦ Uses a lexical notion of identity (not value identity).
♦ Code is in an intermediate representation with an unlimited name space.
♦ Consistent, disciplined use of names:
  ♦ Identical expressions define the same name.
  ♦ No other expression defines that name.
  (The result name serves as a proxy for the expression; this avoids copies.)


Lazy Code Motion: Running Example

B1: r1 ← 1
    r2 ← r1
    r3 ← r0 + @m
    r4 ← r3
    r5 ← (r1 < r2)
    if r5 then B2 else B3

B2: r20 ← r17 * r18
    r21 ← r19 + r20
    r8  ← r21
    r6  ← r2 + 1
    r2  ← r6
    r7  ← (r2 > r4)
    if r7 then B3 else B2

B3: ...

Variables: r2, r4, r8
Expressions: r1, r3, r5, r6, r7, r20, r21

Lazy Code Motion: The Name Space

♦ ri + rj → rk, always (hash to find k).
  ♦ We can refer to ri + rj by rk (bit-vector sets).
♦ Variables must be set by copies:
  ♦ There is no consistent definition for a variable.
  ♦ Break the rule for this case, but require rsource < rdestination.
  ♦ To achieve this, assign register names to variables first.
♦ Without this name space:
  ♦ LCM must insert copies to preserve redundant values.
  ♦ LCM must compute its own map of expressions to unique ids.


Lazy Code Motion: Local Predicates

Predicates (computed by local analysis):
♦ DEEXPR(b) contains expressions defined in b that survive to the end of b.
  e ∈ DEEXPR(b) ⇒ evaluating e at the end of b produces the same value for e as evaluating it in its original position.
♦ UEEXPR(b) contains expressions defined in b that have upward-exposed arguments (both arguments).
  e ∈ UEEXPR(b) ⇒ evaluating e at the start of b produces the same value for e as evaluating it in its original position.
♦ KILLEDEXPR(b) contains those expressions whose arguments are (re)defined in b.
  e ∈ KILLEDEXPR(b) ⇒ evaluating e at the start of b does not produce the same result as evaluating it at its end.

For the running example:

              B1            B2
DEEXPR        r1, r3, r5    r7, r20, r21
UEEXPR        r1, r3        r6, r20
KILLEDEXPR    r5, r6, r7    r5, r6, r7, r21
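A sketch of how the three local sets might be computed for one block, in Python; the encoding of a block as its ordered list of destination registers and the exprs map (expression name to the registers it reads, following the name-space discipline) are assumptions of this example.

def local_sets(defs_in_order, exprs):
    """defs_in_order: destination registers of a block's operations, in order.
    exprs: expression name (its destination register, per the name discipline)
           -> set of argument registers it reads.
    Returns (DEEXPR, UEEXPR, KILLEDEXPR) for the block."""
    defined = set()                      # registers (re)defined so far in the block
    de, ue, killed = set(), set(), set()
    for dest in defs_in_order:
        if dest in exprs:                # this operation evaluates expression `dest`
            if not (exprs[dest] & defined):
                ue.add(dest)             # both arguments are upward exposed
            de.add(dest)                 # tentatively survives to the end of the block
        defined.add(dest)
        for e, args in exprs.items():    # redefining `dest` kills expressions reading it
            if dest in args:
                killed.add(e)
                de.discard(e)            # ... and they no longer survive to the end
    return de, ue, killed

# All expressions of the running example and the registers they read.
exprs = {'r1': set(), 'r3': {'r0'}, 'r5': {'r1', 'r2'}, 'r6': {'r2'},
         'r7': {'r2', 'r4'}, 'r20': {'r17', 'r18'}, 'r21': {'r19', 'r20'}}
# Block B2 defines, in order: r20, r21, r8, r6, r2, r7.
print(local_sets(['r20', 'r21', 'r8', 'r6', 'r2', 'r7'], exprs))
# -> DE {r7, r20, r21}, UE {r6, r20}, KILLED {r5, r6, r7, r21}, matching B2's column above.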


Lazy Code Motion: Availability

AVAILIN(n)  = ∩ m∈preds(n) AVAILOUT(m),   n ≠ n0
AVAILOUT(m) = DEEXPR(m) ∪ (AVAILIN(m) ∩ ¬KILLEDEXPR(m))

(¬X denotes the complement of the set X.)

Initialize AVAILIN(n) to the set of all names, except at n0; set AVAILIN(n0) to Ø.

Interpreting AVAIL:
♦ e ∈ AVAILOUT(b) ⇔ evaluating e at the end of b produces the same value for e. AVAILOUT tells the compiler how far forward e can move the evaluation of e, ignoring any uses of e.
♦ This differs from the way we talk about AVAIL in global redundancy elimination.

Lazy Code Motion: Anticipability

ANTOUT(n) = ∩ m∈succs(n) ANTIN(m),   n not an exit block
ANTIN(m)  = UEEXPR(m) ∪ (ANTOUT(m) ∩ ¬KILLEDEXPR(m))

Initialize ANTOUT(n) to the set of all names, except at exit blocks; set ANTOUT(n) to Ø for each exit block n.

Interpreting ANTOUT:
♦ e ∈ ANTIN(b) ⇔ evaluating e at the start of b produces the same value for e. ANTIN tells the compiler how far backward e can move.
♦ This view shows that anticipability is, in some sense, the inverse of availability (and explains the new interpretation of AVAIL).
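The two systems can be solved with a simple round-robin fixed point. The Python sketch below runs them on the diamond-shaped CFG from the partially-redundant-expression example earlier (one expression, named t, whose operands are killed in block C); the dict-of-sets encoding is an assumption of this example, not the lecture's notation.

ALL = {'t'}                                   # one expression: t = b + c
succ = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
pred = {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}
DE   = {'A': set(), 'B': {'t'}, 'C': set(),  'D': {'t'}}
UE   = {'A': set(), 'B': {'t'}, 'C': set(),  'D': {'t'}}
KILL = {'A': set(), 'B': set(), 'C': {'t'},  'D': set()}   # C redefines b

avail_in = {b: (set() if b == 'A' else set(ALL)) for b in succ}
ant_out  = {b: (set() if not succ[b] else set(ALL)) for b in succ}
for _ in range(10):                            # small graph: a few rounds reach the fixed point
    # AVAILOUT(m) = DEEXPR(m) ∪ (AVAILIN(m) ∩ ¬KILLEDEXPR(m))
    avail_out = {m: DE[m] | (avail_in[m] - KILL[m]) for m in succ}
    # ANTIN(m) = UEEXPR(m) ∪ (ANTOUT(m) ∩ ¬KILLEDEXPR(m))
    ant_in = {m: UE[m] | (ant_out[m] - KILL[m]) for m in succ}
    avail_in = {b: set.intersection(*(avail_out[p] for p in pred[b])) if pred[b] else set()
                for b in succ}
    ant_out = {b: set.intersection(*(ant_in[s] for s in succ[b])) if succ[b] else set()
               for b in succ}

print(avail_out)   # {'A': set(), 'B': {'t'}, 'C': set(), 'D': {'t'}}
print(ant_in)      # {'A': set(), 'B': {'t'}, 'C': set(), 'D': {'t'}}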


Lazy Code Motion: Earliest Placement

EARLIEST(i,j)  = ANTIN(j) ∩ ¬AVAILOUT(i) ∩ (KILLEDEXPR(i) ∪ ¬ANTOUT(i))
EARLIEST(n0,j) = ANTIN(j) ∩ ¬AVAILOUT(n0)

EARLIEST is a predicate:
♦ Computed for edges rather than nodes (placement).
♦ e ∈ EARLIEST(i,j) if
  ♦ it can move to the head of j,
  ♦ it is not available at the end of i, and
  ♦ either it cannot move to the head of i (KILLEDEXPR(i)),
  ♦ or another edge leaving i prevents its placement in i (¬ANTOUT(i)).

Lazy Code Motion: Later (than earliest) Placement

LATERIN(j) = ∩ i∈preds(j) LATER(i,j),   j ≠ n0
LATER(i,j) = EARLIEST(i,j) ∪ (LATERIN(i) ∩ ¬UEEXPR(i))

Initialize LATERIN(n0) to Ø.

x ∈ LATERIN(k) ⇔ every path that reaches k has x ∈ EARLIEST(m) for some block m, and the path from m to k is x-clear and does not evaluate x.
⇒ the compiler can move x through k without losing any benefit.

x ∈ LATER(i,j) ⇔ ⟨i,j⟩ is its earliest placement, or it can be moved forward from i (LATERIN(i)) and placement at the entry to i does not anticipate a use in i (moving it across the edge would expose that use).
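Continuing the same diamond example, the sketch below evaluates EARLIEST for each edge and iterates LATER/LATERIN to a fixed point; the AVAIL/ANT values are copied from the previous sketch so the snippet stands alone, and the encoding is again an assumption of the example.

ALL = {'t'}
pred      = {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}
UE        = {'A': set(), 'B': {'t'}, 'C': set(), 'D': {'t'}}
KILL      = {'A': set(), 'B': set(), 'C': {'t'}, 'D': set()}
avail_out = {'A': set(), 'B': {'t'}, 'C': set(), 'D': {'t'}}
ant_in    = {'A': set(), 'B': {'t'}, 'C': set(), 'D': {'t'}}
ant_out   = {'A': set(), 'B': {'t'}, 'C': {'t'}, 'D': set()}

def earliest(i, j):
    """EARLIEST(i,j); block 'A' is the entry node n0, so the third term is dropped."""
    e = ant_in[j] - avail_out[i]
    return e if i == 'A' else e & (KILL[i] | (ALL - ant_out[i]))

edges = [(i, j) for j, ps in pred.items() for i in ps]
later_in = {b: (set() if b == 'A' else set(ALL)) for b in pred}
for _ in range(10):                                  # fixed point, as for AVAIL/ANT
    later = {(i, j): earliest(i, j) | (later_in[i] - UE[i]) for (i, j) in edges}
    later_in = {b: set.intersection(*(later[(i, b)] for i in pred[b])) if pred[b] else set()
                for b in pred}

print({e: earliest(*e) for e in edges})   # t is earliest on edges ('A','B') and ('C','D')
print(later_in)                           # {'A': set(), 'B': {'t'}, 'C': set(), 'D': set()}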


Lazy Code Motion: Rewriting the Code

INSERT(i,j) = LATER(i,j) ∩ ¬LATERIN(j)
DELETE(k)   = UEEXPR(k) ∩ ¬LATERIN(k),   k ≠ n0

INSERT & DELETE are predicates; the compiler uses them to guide the rewrite step.
♦ x ∈ INSERT(i,j) ⇒ insert x at the end of i, at the start of j, or in a new block between i and j.
♦ x ∈ DELETE(k) ⇒ delete the first evaluation of x in k.
  If local redundancy elimination has already been performed, only one copy of x exists; otherwise, remove all upward-exposed copies of x.

Lazy Code Motion: Edge Placement

Figure: where an insertion on edge (i,j) physically goes depends on the shape of the edge.
Three cases:
♦ |succs(i)| = 1 ⇒ insert x at the end of i.
♦ |succs(i)| > 1 but |preds(j)| = 1 ⇒ insert x at the start of j.
♦ |succs(i)| > 1 and |preds(j)| > 1 ⇒ create a new block in (i,j) for x.
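Finishing the diamond example: INSERT, DELETE, and the three placement cases; LATER, LATERIN, and UEEXPR below are the values computed in the previous sketch, and the encoding remains an assumption of the example.

succ  = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
pred  = {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}
UE    = {'A': set(), 'B': {'t'}, 'C': set(), 'D': {'t'}}
later    = {('A', 'B'): {'t'}, ('A', 'C'): set(), ('B', 'D'): set(), ('C', 'D'): {'t'}}
later_in = {'A': set(), 'B': {'t'}, 'C': set(), 'D': set()}

insert = {e: later[e] - later_in[e[1]] for e in later}
delete = {k: UE[k] - later_in[k] for k in pred if k != 'A'}
print(insert)   # only edge ('C','D') gets {'t'}
print(delete)   # the re-evaluation in D is deleted; B keeps its own evaluation

def placement(i, j):
    """Where an insertion on edge (i, j) physically goes (the three cases above)."""
    if len(succ[i]) == 1:
        return f'end of {i}'
    if len(pred[j]) == 1:
        return f'start of {j}'
    return f'new block between {i} and {j}'

for (i, j), xs in insert.items():
    if xs:
        print(xs, '->', placement(i, j))   # {'t'} -> end of C, i.e. right after b ← b+1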

Lazy Code Motion Example

For the running example (blocks B1, B2, B3 above), with the local sets DEEXPR, UEEXPR, and KILLEDEXPR tabulated earlier, the earliest placements on the four edges are:

              (1,2)            (1,3)    (2,2)    (2,3)
EARLIEST      { r20, r21 }     { }      { }      { }

Example is too small to show off LATER

INSERT(1,2) = { r20, r21 }

DELETE(2) = { r20, r21 }

