<<

Reducible Flow Graphs

• A flow graph is reducibleif and only if we can partition the edges into 2 disjoint groups often called forward and back Lecture 4: Optimization edges with the following properties • The forward edges form an acyclic graph in which every node can be reached from the Entry COS 598C – Advanced Compilers • The back edges consist only of edges whose destinations dominate their sources • More simply œ Take a CFG, remove all the backedges(x‰ y where y dominates x), you should have a connected, bb1 Spyridon Triantafyllis acyclicgraph Non-reducible!

Prof. David August bb2 bb3 Department of Science Princeton University

COS 598C - Advanced Compilers 1 Prof. David August Loop Induction Variables Back to Loops – Assembly Generation Schema • Induction variables are variables such that every time for (i=x; i

COS 598C - Advanced Compilers 2 Prof. David August COS 598C - Advanced Compilers 3 Prof. David August Class Problem 1 (4 from last time) Loop Unrolling

• Most renowned control flow opti • Replicate the body of a loop N-1 times (giving N total copies) r1 = 0 • Loop unrolled N times or Nxunrolled Identify the basic, r7 = &A primary and derived • Enable overlap of operations from Loop: Loop: r2 = r1 * 4 inductions variables in different iterations r1 = MEM[r2 + 0] this loop. r4 = r7 + 3 • Increase potential for ILP (instruction r4 = r1 * r5 r7 = r7 + 1 r6 = r4 << 2 level parallelism) r1 = load(r2) MEM[r3 + 0] = r6 r3 = load(r4) • 3 variants r2 = r2 + 1 r9 = r1 * r3 • Unroll multiple of known trip count blt r2 100 Loop r10 = r9 >> 4 • Unroll with remainder loop store (r10, r2) • unroll r1 = r1 + 4 blt r1 100 Loop

COS 598C - Advanced Compilers 4 Prof. David August COS 598C - Advanced Compilers 5 Prof. David August Loop Unroll – Type 1 Loop Unroll – Type 2

Counted loop Counted loop tc = final – initial All parms known Loop: Some parms unknown tc = tc / increment Loop: rem = tc % N r2 is the loop variable, r1 = MEM[r2 + 0] Loop: r1 = MEM[r2 + 0] r2 is the loop variable, fin = rem * increment Increment is 1 r4 = r1 * r5 r1 = MEM[r2 + 0] r4 = r1 * r5 Increment is ? Initial value is 0 r6 = r4 << 2 Initial value is ? r4 = r1 * r5 Final value is 100 MEM[r3 + 0] = r6 r6 = r4 << 2 Final value is ? RemLoop: r6 = r4 << 2 Trip count is 100 r2 = r2 + 1 MEM[r3 + 0] = r6 Trip count is ? r1 = MEM[r2 + 0] MEM[r3 + 0] = r6 r4 = r1 * r5 r1 = MEM[r2 + 0] r1 = MEM[r2 + 1] r6 = r4 << 2 r1 = MEM[r2 + X] r4 = r1 * r5 r4 = r1 * r5 Loop: Loop: MEM[r3 + 0] = r6 r4 = r1 * r5 r6 = r4 << 2 r6 = r4 << 2 r1 = MEM[r2 + 0] r1 = MEM[r2 + 0] r2 = r2 + X r6 = r4 << 2 MEM[r3 + 0] = r6 MEM[r3 + 0] = r6 r4 = r1 * r5 r4 = r1 * r5 blt r2 fin RemLoop MEM[r3 + 0] = r6 r2 = r2 + 1 r2 = r2 + 2 r6 = r4 << 2 r6 = r4 << 2 r2 = r2 + (N*X) blt r2 100 Loop blt r2 100 Loop MEM[r3 + 0] = r6 MEM[r3 + 0] = r6 blt r2 Y Loop r2 = r2 + 1 r2 = r2 + X blt r2 100 Loop blt r2 Y Loop Remainder loop executes Remove branch from Remove r2 increments the “leftover” iterations Unrolled loop same as Type 1, first N-1 iterations from first N-1 iterations and is guaranteed to execute and update last increment a multiple of N times

COS 598C - Advanced Compilers 6 Prof. David August COS 598C - Advanced Compilers 7 Prof. David August Loop Unroll – Type 3 Loop Unroll Summary

• Goal œ Enable overlap of multiple iterations to increase ILP Non-counted loop Some parms unknown Just duplicate the • Type 1 is the most effective Loop: body, none of the pointer chasing, loop r1 = MEM[r2 + 0] loop branches can • All intermediate branches removed, least code expansion var modified in a strange r4 = r1 * r5 be removed. Instead • Limited applicability way, etc. r6 = r4 << 2 they are converted into MEM[r3 + 0] = r6 conditional breaks • Type 2 is almost as effective r2 = MEM[r2 + 0] • All intermediate branches removed beq r2 0 Exit • Remainder loop is required since trip count not known at compiletime Can apply this Loop: r1 = MEM[r2 + 0] to any loop including • Need to make sure don‘ spend much time in rem loop r4 = r1 * r5 r1 = MEM[r2 + 0] a superblock or • Type 3 can be effective r4 = r1 * r5 r6 = r4 << 2 hyperblock loop ! r6 = r4 << 2 MEM[r3 + 0] = r6 • No branches eliminated MEM[r3 + 0] = r6 r2 = MEM[r2 + 0] • But operation overlap still possible bne r2 0 Loop r2 = MEM[r2 + 0] • Always applicable (most loops fall into this category!) bne r2 0 Loop Exit: • Use expected trip count to guide unroll amount

COS 598C - Advanced Compilers 8 Prof. David August COS 598C - Advanced Compilers 9 Prof. David August Class Problem 2 Control Flow Optimizations for Acyclic Code

• Generally quite simplistic Unroll both the outer loop and inner loop 2x • Goals • Reduce the number of dynamic branches • Make larger basic blocks for (i=0; i<100; i++) { j = i; • Reduce code size while (j < 100) { • Classic control flow optimizations A[j]--; j += 5; • Branch to unconditional branch } • Unconditional branch to branch B[i] = 0; • Branch to next basic } • Basic block merging • Branch to same target • Branch target expansion • elimination

COS 598C - Advanced Compilers 10 Prof. David August COS 598C - Advanced Compilers 11 Prof. David August Control Flow Optimizations (1) Control Flow Optimizations (2)

1. Branch to unconditional branch 3. Branch to next basic block Branch is unnecessary

L1: if (a < b) goto L2 L1: if (a < b) goto L3 ...... BB1 BB1 ...... L1: if (a < b) goto L2 L1: L2: goto L3 L2: goto L3 ‰ may be deleted

L2: BB2 L2: BB2 2. Unconditional branch to branch ......

L1: goto L2 L1: if (a < b) goto L3 4. Basic block merging Merge BBs when single edge between . . . goto L4: L2: if (a < b) goto L3 ...... BB1 L4: BB1 L2: if (a < b) goto L3 ‰ may be deleted L1: . . . L4: L1: L2: L2: . . . BB2 . . .

COS 598C - Advanced Compilers 12 Prof. David August COS 598C - Advanced Compilers 13 Prof. David August Control Flow Optimizations (3) Unreachable Code Elimination

5. Branch to same target ...... L1: if (a < b) goto L2 entry L1: goto L2 Mark procedure entry BB visited goto L2 to_visit = procedure entry BB while (to_visit not empty) { bb1 bb2 6. Branch target expansion current = to_visit.pop() for (each successor block of current) { stuff1 Mark successor as visited; bb3 bb4 stuff1 BB1 L1: stuff2 BB1 to_visit += successor L1: goto L2 . . . } ...... } bb5 Eliminate all unvisited blocks Which BB(s) can be deleted? L2: stuff2 L2: stuff2 BB2 BB2 ......

What about expanding a conditional branch?

COS 598C - Advanced Compilers 14 Prof. David August COS 598C - Advanced Compilers 15 Prof. David August Class Problem 3 Regions

• Region: A collection of operations that are treated as a Maximally optimize the control flow of this code single unit by the compiler • Examples L1: if (a < b) goto L11 L2: goto L7 • Basic block L3: goto L4 • Procedure L4: stuff4 • Body of a loop L5: if (c < d) goto L15 • Properties L6: goto L2 L7: if (c < d) goto L13 • Connected subgraphof operations L8: goto L12 • Control flow is the key parameter that defines regions L9: stuff 9 • Hierarchicallyorganized (sometimes) L10: if (a < c) goto L3 L11:goto L9 • Problem L12: goto L2 • Basic blocks are too small (3-5 operations) L13: stuff 13 • Hard to extract sufficient parallelism L14: if (e < f) goto L11 L15: stuff 15 • Procedure control flow too complex for many compiler xforms L16: rts • Plus only parts of a procedure are important (90/10 rule)

COS 598C - Advanced Compilers 16 Prof. David August COS 598C - Advanced Compilers 17 Prof. David August Regions (2) Region Type 1 – Trace

• Want • Trace- Linear collection of basic blocks that tend to execute in 10 • Intermediate sized regions with simple control flow sequence • Bigger basic blocks would be ideal !! • —Likely control flow path“ BB1 • Separate important code from less important • Acyclic (outer backedgeok) 90 80 20 • Optimize frequently executed code at the expense of the rest • Side entranceœ branch into the BB2 BB3 middle of a trace • Solution • Side exitœ branch out of the 80 20 • Define new region types that consist of multiple BBs middle of a trace BB4 • Profile information used in the identication • Compilation strategy 10 • Sequential control flow (sorta) • Compile assuming path occurs BB5 90 100% of the time 10 • Pretend the regions are basic blocks • Patch up side entrances and exits afterwards BB6 • Motivated by (i.e., trace scheduling) 10

COS 598C - Advanced Compilers 18 Prof. David August COS 598C - Advanced Compilers 19 Prof. David August Linearizing a Trace Issues With Selecting Traces

• Acyclic 10 (entry count) • Cannot go past a backedge 10 • Trace length BB1 BB1 20 (side exit) • Longer = better ? 80 90 80 20 • Not always ! BB2 BB3 BB2 BB3 90 (entry/ • On-trace / off-trace transitions exit count) 80 80 20 20 (side entrance) • Maximize on-trace BB4 • Minimize off-trace BB4 10 (side exit) • Compile assuming on-trace is 10 90 BB5 100% (iesingle BB) BB5 90 • Penalty for off-trace 10 10 (side entrance) BB6 • Tradeoff (heuristic) BB6 • Length 10 (exit count) • Likelihood remain within the 10 trace

COS 598C - Advanced Compilers 20 Prof. David August COS 598C - Advanced Compilers 21 Prof. David August Best Successor/Predecessor Trace Selection Algorithm • Node weight vsedge weight i = 0; • edge more accurate best_successor_of(BB) mark all BBs unvisited • THRESHOLD while (there are unvisited nodes) do e = control flow edge with highest seed = unvisited BB with largest execution freq • controls off-trace probability probability leaving BB trace[i] += seed • 60-70% found best if (e is a backedge) then mark seed visited return 0 current = seed • Notes on this algorithm endif /* Grow trace forward */ • BB only allowed in 1 trace if (probability(e) <= THRESHOLD) then return 0 while (1) do • Cumulative probability ignored next = best_successor_of(current) endif if (next == 0) then break • Min weight for seed to be chose d = destination of e trace[i] += next (ieexecuted 100 times) if (d is visited) then mark next visited return 0 current = next endif endwhile return d /* Grow trace backward analogously */ endprocedure i++ endwhile

COS 598C - Advanced Compilers 22 Prof. David August COS 598C - Advanced Compilers 23 Prof. David August Class Problem 4 Free-form regions

• Choose a —related“ control- flow subgraph 100 • profile-guided selection Find the traces BB1 0.99 • other criteria? 0.01 20 80 0.1 • Optimize as a unit BB2 BB3 0.9 • radically reduced compile time 80 20 0.1 • minimal performance loss BB4 51 49 0.9 • … if regions are selected right!

BB5 BB3 • Found at: 10 41 • IMPACT (earlier versions) 49 • ORC (for scheduling only) 450 BB6 BB6 41 • Open64 10 • Area still open for BB6 experimentation

COS 598C - Advanced Compilers 24 Prof. David August COS 598C - Advanced Compilers 25 Prof. David August Intervals & Structural Analysis Intervals and Structural Analysis: Example

do-loop while-loop if-then if-then-else etc.

• Structural regions correspond to source-language structures • Structural regions can be nested! • Intervals: Structural regions with less detail (loops vs. non-loops) • Useful for dataflow analysis • Could they be used as compilation regions?

COS 598C - Advanced Compilers 26 Prof. David August COS 598C - Advanced Compilers 27 Prof. David August