Loops are Key
15-745 Lecture 5

• Loops are extremely important – the "90-10" rule
• Loop optimization involves:
 – understanding control-flow structure
 – understanding data-dependence information
 – sensitivity to side-effecting operations
 – extra care in some transformations such as register spilling

Outline: control flow analysis, natural loops, classical loop optimizations, dependencies.
Copyright © Seth Goldstein, 2008
Based on slides from Lee & Callahan
Lecture 5 15-745 © 2008
Common loop optimizations

• Hoisting of loop-invariant computations
 – pre-compute before entering the loop
• Elimination of induction variables
 – change p = i*w + b to p = b; p += w, when w, b invariant
• Loop unrolling
 – to improve scheduling of the loop body
• Software pipelining
 – to improve scheduling of the loop body
• Loop permutation
 – to improve cache memory performance

(The first two build on scalar optimizations, data-flow analysis, and control flow analysis; software pipelining and loop permutation require understanding data dependencies.)

Finding Loops

• To optimize loops, we need to find them!
• Could use source language loop information in the abstract syntax tree…
• BUT:
 – There are multiple source loop constructs: for, while, do-while, even goto in C
 – Want the IR to support different languages
 – Ideally, we want a single concept of a loop so all get the same analysis and the same optimizations
 – Solution: dismantle source-level constructs, then re-find loops from fundamentals
Finding Loops

• To optimize loops, we need to find them!
• Specifically, we need:
 – loop-header node(s)
  • nodes in a loop that have immediate predecessors not in the loop
 – back edge(s)
  • control-flow edges to previously executed nodes
 – all nodes in the loop body

Control-flow analysis

• Many languages have goto and other complex control, so loops can be hard to find in general
• Determining the control structure of a program is called control-flow analysis
• Based on the notion of dominators
Dominators

• a dom b
 – node a dominates b if every possible execution path from entry to b includes a
• a sdom b
 – a strictly dominates b if a dom b and a != b
• a idom b
 – a immediately dominates b if a sdom b, AND there is no c such that a sdom c and c sdom b

Back edges and loop headers

• A control-flow edge from node B3 to B2 is a back edge if B2 dom B3
• Furthermore, in that case node B2 is a loop header

(Figure: a CFG in which B1 initializes k, i, and j, B2 tests i <= n, and the edge from B3 back to B2 is a back edge, making B2 the loop header.)
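The dom relation above can be computed with a standard iterative data-flow algorithm. A minimal Python sketch (the successor-list dict encoding of the CFG, and the function name, are assumptions of this sketch):

```python
def dominators(succ, entry):
    """Compute dom(b) for every node b: dom(entry) = {entry}; otherwise
    dom(b) = {b} union (intersection of dom(p) over all predecessors p).
    Iterate until nothing changes (a fixed point)."""
    nodes = set(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n}
            if pred[n]:
                new |= set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

With this in hand, an edge n→h is a back edge exactly when h is in dom(n).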
Natural loop

• Consider a back edge from node n to node h
• The natural loop of n→h is the set of nodes L such that for all x ∈ L:
 – h dom x, and
 – there is a path from x to n not containing h

Examples

Simple example: a single back edge around the loop body. (Often it's more complicated, since a FOR loop found in the source code might need an if/then guard.)
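The definition above translates directly into a backward walk: collect h plus everything that reaches n without passing through h. A small Python sketch (the predecessor-list dict encoding is an assumption):

```python
def natural_loop(pred, n, h):
    """Natural loop of back edge n -> h: h itself plus every node that can
    reach n without passing through h. Walk predecessors from n, never
    expanding past the header h."""
    loop = {h, n}
    work = [n]
    while work:
        x = work.pop()
        if x == h:              # never walk through the header
            continue
        for p in pred[x]:
            if p not in loop:
                loop.add(p)
                work.append(p)
    return loop
```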
Examples

Try this:

for (..) {
  if (...) {
    …
  } else {
    …
    if (x) {
      break;
    }
  }
}

(Figure: the corresponding CFG with nodes a–f.)
Examples

for (..) {
  if (...) {
    …
  } else {
    …
    if (x) {
      break;
    }
  }
}

Part of this code is lexically in the loop but not in the natural loop – another reason why CFG analysis is preferred over source/AST loops.
Examples

• Yes, it can happen in C

(Figure: an irreducible CFG – node 0 branches to nodes 1 and 2, which branch to each other.)

Natural Loops

One loop per header. What are the natural loops?

(Figures: CFGs over nodes 0–3; the answers shown are {0,1,2}; {} for an irreducible cycle; and {1,2,3}, {1,2}, {0,1,2,3} for nested loops.)
Nested Loops

• Unless two natural loops have the same header, they are either disjoint or one is nested within the other

General Loops

• A more general looping structure is a strongly connected component (a subgraph) of the control flow graph
Reducible Flow Graphs

There is a special class of flow graphs, called reducible flow graphs, for which several code optimizations are especially easy to perform. In reducible flow graphs, loops are unambiguously defined and dominators can be efficiently computed.

Definition: A flow graph G is reducible iff we can partition the edges into two disjoint groups, forward edges and back edges, with the following two properties:
1. The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.

Why isn't this reducible? (Figure: node 0 branches to nodes 1 and 2, which branch to each other.) Neither node of the 1↔2 cycle dominates the other, so this flow graph has no back edges. Thus it would be reducible only if the entire graph were acyclic, which is not the case.
Lecture 5 15-745 19 © 2008 Lecture 5 15-74520 © 2008 Alternative definition Properties of Reducible Flow Graphs • Definition: A flow graph G is reducible if we can repeatedly collapse (reduce) together blocks (x,y) where x is the only predecessor of y (ignoring self • In a reducible flow ggp,raph, loops) until we are left with a single node all loops are natural loops • Can use DFS to find loops 0 0 • Many analyses are more efficient 0 0 – polynomial versus exponential 1 1,2,3 1,2 1,,,2,3 2 3 01230,1,2,3 3
Good News

• Most flow graphs are reducible
• Some languages prohibit irreducibility
 – goto-free C
 – Java
• Programmers usually don't use such constructs even if they're available
 – >90% of old Fortran code is reducible

Dealing with Irreducibility

• Don't
• Can split nodes and duplicate code to get a reducible graph
 – possible exponential blowup
• Other techniques…

(Figure: node 2 of the irreducible example is split into copies 2 and 2a, yielding a reducible graph.)
Loop optimizations: Hoisting of loop-invariant computations

Loop-invariant computations

• A definition
  t = x op y
 in a loop is (conservatively) loop-invariant if
 – x and y are constants, or
 – all reaching definitions of x and y are outside the loop, or
 – only one definition reaches x (or y), and that definition is loop-invariant
  • so keep marking iteratively
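The "keep marking iteratively" rule is a fixed-point computation. A simplified Python sketch (it assumes an SSA-like loop body where each name is defined at most once, so the reaching-definition checks reduce to set membership; the tuple encoding is an assumption):

```python
def loop_invariant_defs(loop_stmts):
    """loop_stmts: list of (dest, operands) for definitions inside the loop;
    operands are ints (constants) or variable names. Names never defined in
    the loop are defined outside it. Returns the set of dests whose
    definitions are loop-invariant."""
    defined_in_loop = {d for d, _ in loop_stmts}
    invariant = set()
    changed = True
    while changed:                          # keep marking iteratively
        changed = False
        for dest, ops in loop_stmts:
            if dest in invariant:
                continue
            if all(isinstance(o, int)           # constant, or
                   or o not in defined_in_loop  # defined outside the loop, or
                   or o in invariant            # the one reaching def is invariant
                   for o in ops):
                invariant.add(dest)
                changed = True
    return invariant
```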
Loop-invariant computations (cont.)

• Be careful:

t = expr;
for () {
  s = t * 2;
  t = loop_invariant_expr;
  x = t + 2;
  …
}

• Even though t's two reaching definitions are each invariant, s is not invariant…

Hoisting

• In order to "hoist" a loop-invariant computation out of a loop, we need a place to put it
• We could copy it to all immediate predecessors of the loop header (except along the back-edge)...
• ...But we can avoid code duplication by inserting a new block, called the pre-header
Hoisting (cont.)

(Figures: pre-headers. A loop header B with outside predecessor A gets a new pre-header block B' placed on the edges from outside the loop into B; loop-invariant code is hoisted into B'.)
Hoisting conditions

• For a loop-invariant definition
  d: t = x op y
• we can hoist d into the loop's pre-header only if
 1. d's block dominates all loop exits at which t is live-out, and
 2. there is only one definition of t in the loop, and
 3. t is not live-out of the pre-header

We need to be careful...

• All hoisting conditions must be satisfied!

Example 1 (OK to hoist t = a * b):
L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    if i < N goto L1
L2: x = t

Example 2 (violates 1 and 3 – the loop may exit before t = a * b executes, and the def of t at L0 reaches the use at L2):
L0: t = 0
L1: if i >= N goto L2
    i = i + 1
    t = a * b
    M[i] = t
    goto L1
L2: x = t

Example 3 (violates 2 – two definitions of t in the loop):
L0: t = 0
L1: i = i + 1
    t = a * b
    M[i] = t
    t = 0
    M[j] = t
    if i < N goto L1
L2: x = t

Loop optimizations: Induction-variable elimination

The basic idea of IVE

• Suppose we have a loop variable
 – i initially 0; each iteration i = i + 1
• and a variable x that is linearly related to i: x = i * c1 + c2
• Instead of recomputing x each iteration:
 – initialize x = i0 * c1 + c2 (execute once)
 – increment x by c1 each iteration

Simple Example of IVE

i <- 0
H: if i >= n goto exit
   j <- i * 4
   k <- j + a
   M[k] <- 0
   i <- i + 1
   goto H

Clearly, j and k do not need to be computed anew each time, since they are related to i and i changes linearly.

Introduce j' and k' that track j and k across iterations:

i <- 0
j' <- 0
k' <- a
H: if i >= n goto exit
   j <- j'
   k <- k'
   M[k] <- 0
   i <- i + 1
   j' <- j' + 4
   k' <- k' + 4
   goto H

But then we don't even need j (or j'):

i <- 0
k' <- a
H: if i >= n goto exit
   k <- k'
   M[k] <- 0
   i <- i + 1
   k' <- k' + 4
   goto H

Do we need i?
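To convince ourselves the strength-reduced loop above is equivalent, we can model both versions and compare the addresses written to M. A hypothetical Python model (stores are recorded rather than performed):

```python
def original(n, a):
    """i <- 0; H: if i >= n goto exit; j <- i*4; k <- j+a; M[k] <- 0; i <- i+1."""
    writes = []
    i = 0
    while i < n:
        j = i * 4
        k = j + a
        writes.append(k)        # M[k] <- 0
        i = i + 1
    return writes

def strength_reduced(n, a):
    """After IVE: k' <- a; loop: M[k'] <- 0; k' <- k' + 4."""
    writes = []
    k = a
    i = 0
    while i < n:
        writes.append(k)        # M[k'] <- 0
        i = i + 1
        k = k + 4
    return writes
```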
Simple Example of IVE (cont.)

Rewrite the comparison in terms of k':

i <- 0
k' <- a
H: if k' >= a + (n * 4) goto exit
   k <- k'
   M[k] <- 0
   k' <- k' + 4
   goto H

But a + (n * 4) is now loop invariant, so apply invariant code motion (and i, no longer used, goes away):

k' <- a
n' <- a + (n * 4)
H: if k' >= n' goto exit
   k <- k'
   M[k] <- 0
   k' <- k' + 4
   goto H

Copy propagation then eliminates k:

k' <- a
n' <- a + (n * 4)
H: if k' >= n' goto exit
   M[k'] <- 0
   k' <- k' + 4
   goto H

Voila! Compare the original and the result of IVE:

i <- 0                        k' <- a
H: if i >= n goto exit        n' <- a + (n * 4)
   j <- i * 4                 H: if k' >= n' goto exit
   k <- j + a                    M[k'] <- 0
   M[k] <- 0                     k' <- k' + 4
   i <- i + 1                    goto H
   goto H
What we did

• identified induction variables (i, j, k)
• strength reduction (changed * into +)
• dead-code elimination (j <- j')
• useless-variable elimination (j' <- j' + 4)
 – this can also be done with ADCE
• loop-invariant identification & code motion
• almost useless-variable elimination (i)
• copy propagation

Is it faster?

• On some hardware, adds are much faster than multiplies
• Furthermore, one fewer value is computed,
 – thus potentially saving a register
 – and decreasing the possibility of spilling

Loop preparation

• Before attempting IVE, it is best to first perform:
 – constant propagation & constant folding
 – copy propagation
 – loop-invariant hoisting

How to do it, step 1

• First, find the basic IVs
 – scan the loop body for defs of the form
   x = x + c or x = x - c
  where c is loop-invariant
 – record these basic IVs as x = (x, 1, c)
 – this represents the IV update x = x * 1 + c

Representing IVs

• Characterize all induction variables by:
  (base-variable, offset, multiple)
 – where the offset and multiple are loop-invariant
• In other words, after an induction variable is defined it equals:
  offset + multiple * base-variable

How to do it, step 2

• Scan for derived IVs of the form k = i * c1 + c2
 – where i is a basic IV,
 – this is the only def of k in the loop, and
 – c1 and c2 are loop invariant
• We say k is in the family of i
• Record as k = (i, c2, c1)

How to do it, step 3

• Iterate, looking for derived IVs of the form
  k = j * c1 + c2
 – where IV j = (i, a, b), i.e., j = a + b * i,
 – this is the only def of k in the loop,
 – there is no def of i between the def of j and the def of k, and
 – c1 and c2 are loop invariant
• Record as k = (i, a*c1 + c2, b*c1), since
  k = (a + b*i) * c1 + c2 = (a*c1 + c2) + (b*c1) * i

Simple Example of IVE (finding the IVs)

i <- 0
H: if i >= n goto exit
   j <- i * 4
   k <- j + a
   M[k] <- 0
   i <- i + 1
   goto H

i: (i, 1, 1), i.e., the update i = 1 * i + 1
j: (i, 0, 4), i.e., j = 0 + 4 * i
k: (i, a, 4), i.e., k = a + 4 * i

So, j and k are in the family of i.

Finding the IVs

• Maintain three tables: basic & maybe & other
• Find basic IVs: scan stmts. If a var is not in maybe, and its def is of the proper form, put it into basic. Otherwise, put the var in other and remove it from maybe.
• Find compound IVs:
 – If a var is defined more than once, put it into other
 – For all stmts of the proper form that use a basic IV » FIX THIS SLIDE

How to do it, step 4

• This is the strength reduction step
• For an induction variable k = (i, c2, c1), i.e., k = i * c1 + c2:
 – initialize k = i * c1 + c2 in the preheader
 – replace k's def in the loop by k = k + c1
 – make sure to do this after i's def

How to do it, step 5

• This is the comparison rewriting step
• For an induction variable k = (i, ak, bk):
 – if k is used only in its definition and a comparison, and
 – there exists another variable j in the same class that is not "useless", with j = (i, aj, bj)
• Rewrite k < n as
  j < (bj/bk) * (n - ak) + aj
• Note: this works since they are in the same class:
  (j - aj)/bj = (k - ak)/bk

Notes

• Are c1 and c2 constant, or just invariant?
 – if constant, then you can keep folding them: they're always constants even for derived IVs
 – otherwise, they can be expressions of loop-invariant variables
• But if constant, we can also find IVs of the type
  x = i / b
 and know that it's legal, if b evenly divides the stride…

Is it faster? (2)

• On some hardware, adds are much faster than multiplies
• But… not always a win!
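The IV-family detection of steps 2–3 can be sketched as a small fixed-point pass. This Python sketch assumes each derived IV has a single def, encoded as (k, j, c1, c2) for k = j*c1 + c2, with c1 and c2 already known to be loop-invariant constants; triples are (base, offset, multiple) as above:

```python
def find_iv_family(basic_ivs, stmts):
    """Return {var: (base, offset, multiple)} with var = offset + multiple*base.
    basic_ivs: set of basic IV names; stmts: (k, j, c1, c2) meaning k = j*c1 + c2."""
    ivs = {i: (i, 0, 1) for i in basic_ivs}     # the basic IV i is 0 + 1*i
    changed = True
    while changed:                              # iterate: derived IVs of derived IVs
        changed = False
        for k, j, c1, c2 in stmts:
            if k not in ivs and j in ivs:
                base, off, mult = ivs[j]
                # k = (off + mult*base)*c1 + c2 = (off*c1 + c2) + (mult*c1)*base
                ivs[k] = (base, off * c1 + c2, mult * c1)
                changed = True
    return ivs
```

On the running example (taking a = 100 for concreteness), j = i*4 and k = j*1 + a give j = (i, 0, 4) and k = (i, 100, 4), matching the family shown above.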
 – Constant multiplies might otherwise be reduced to shifts/adds that result in even better code than IVE
 – Scaling of addresses (i*4) might come for free in your processor's addressing modes
• So maybe: only convert i*c1 + c2 when c1 is loop invariant but not a constant

Common loop optimizations (recap)

• Hoisting of loop-invariant computations
 – pre-compute before entering the loop
• Elimination of induction variables
 – change p = i*w + b to p = b; p += w, when w, b invariant
• Loop unrolling
 – to improve scheduling of the loop body
• Software pipelining
 – to improve scheduling of the loop body (requires understanding data dependencies)
• Loop permutation
 – to improve cache memory performance

Dependencies in Loops

• Loop-independent data dependence occurs between accesses in the same loop iteration.
• Loop-carried data dependence occurs between accesses across different loop iterations.
• There is a data dependence between access a at iteration i-k and access b at iteration i when:
 – a and b access the same memory location,
 – there is a path from a to b, and
 – either a or b is a write

Defining Dependencies

• Flow dependence (true): W → R, written δf
• Anti-dependence (false): R → W, written δa
• Output dependence: W → W, written δo

S1) a = 0;
S2) b = a;
S3) c = a + d + e;
S4) d = b;
S5) b = 5 + e;
Example Dependencies

These are scalar dependencies; the same idea holds for memory accesses.

source  type  target  due to
S1      δf    S2      a
S1      δf    S3      a
S2      δf    S4      b
S3      δa    S4      d
S4      δa    S5      b

Data Dependence in Loops

• Dependence can flow across iterations of the loop.
• Dependence information is annotated with iteration information.
• If the dependence is across iterations it is loop-carried, otherwise loop-independent.

for (i=0; i …

Unroll Loop to Find Dependencies

for (i=0; i …

Iteration Space

Every iteration generates a point in an n-dimensional space, where n is the depth of the loop nest.

Distance Vector

for (i=0; i …

Example of Distance Vectors

for (i=0; i …

Data Dependences

Loop carried: between two statement instances in two different iterations of a loop.
Loop independent: between two statement instances in the same loop iteration.
Lexically forward: the source comes before the target.
Lexically backward: otherwise.
The right-hand side of an assignment is considered to precede the left-hand side.

Review of Linear Algebra: Lexicographic Order

Two n-vectors a and b are equal, a = b, if ai = bi, 1 ≤ i ≤ n.
We say that a is less than b, a < b, if ai < bi, 1 ≤ i ≤ n.
We say that a is lexicographically less than b at level j, a «j b, if ai = bi for 1 ≤ i < j, and aj < bj.
We say that a is lexicographically less than b, a « b, if there is a j, 1 ≤ j ≤ n, such that a «j b.

Example of vectors

Consider the vectors a = (1, −1, 0, 2) and b = (1, −1, 1, −1).

Properties of Lexicographic Order

Let n ≥ 1, and let i, j, k denote arbitrary vectors in Rⁿ.
1. For each u in 1 ≤ u ≤ n, the relation «u in Rⁿ is irreflexive and transitive.
2. The n relations «u are pairwise disjoint: i «u j and i «v j imply that u = v.
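The level-j definition can be checked mechanically. A small Python sketch (vectors as tuples; the function names are assumptions):

```python
def lex_less(a, b):
    """a lexicographically less than b: at the first position where they
    differ, a's component is smaller."""
    for x, y in zip(a, b):
        if x < y:
            return True
        if x > y:
            return False
    return False                    # equal vectors are not lex-less

def lex_level(a, b):
    """The level j (1-based) at which a is lex-less than b, or None."""
    for j, (x, y) in enumerate(zip(a, b), start=1):
        if x < y:
            return j
        if x > y:
            return None
    return None
```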
3. If i ≠ j, there is a unique integer u such that 1 ≤ u ≤ n and exactly one of the following two conditions holds: i «u j or j «u i.
4. i «u j and j «v k together imply that i «w k, where w = min(u, v).

For the example vectors, a is lexicographically less than b at level 3, a «3 b, or simply a « b. Both a and b are lexicographically positive, because 0 « a and 0 « b.

Data Dependence in Loops: An Example

Find the dependence relations due to the array X in the program below:

(S1) for i = 2 to 9 do
(S2)   X[i] = Y[i] + Z[i]
(S3)   A[i] = X[i-1] + 1
(S4) end for

Solution: to find the data dependence relations in a simple loop, we can unroll the loop and see which statement instances depend on which others:

       i = 2             i = 3             i = 4
(S2)   X[2]=Y[2]+Z[2]    X[3]=Y[3]+Z[3]    X[4]=Y[4]+Z[4]
(S3)   A[2]=X[1]+1       A[3]=X[2]+1       A[4]=X[3]+1

There is a loop-carried, lexically forward, flow dependence from S2 to S3.

(Data dependence graph for the statements in the loop: an edge S2 → S3 labeled (1,3), where the iteration distance is 1 and the latency is 3.)

From Wolfe
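The unroll-and-inspect solution can also be mechanized: execute the iterations, remember which iteration last wrote each X element, and record a flow dependence whenever a read sees such a write. A Python sketch of this specific example:

```python
def x_flow_deps():
    """Unroll 'for i = 2 to 9: S2: X[i] = ...; S3: A[i] = X[i-1] + 1' and
    return (writing iteration, reading iteration) pairs for array X."""
    writer = {}       # X index -> iteration whose S2 wrote it
    deps = []
    for i in range(2, 10):          # i = 2 .. 9
        writer[i] = i               # S2 writes X[i]
        if i - 1 in writer:         # S3 reads X[i-1]
            deps.append((writer[i - 1], i))
    return deps
```

Every pair has distance 1, confirming the loop-carried flow dependence from S2 to S3.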