Control Flow Optimization COS 598C

Control Flow Optimization COS 598C

Reducible Flow Graphs • A flow graph is reducibleif and only if we can partition the edges into 2 disjoint groups often called forward and back Lecture 4: Control Flow Optimization edges with the following properties • The forward edges form an acyclic graph in which every node can be reached from the Entry COS 598C – Advanced Compilers • The back edges consist only of edges whose destinations dominate their sources • More simply œ Take a CFG, remove all the backedges(x y where y dominates x), you should have a connected, bb1 Spyridon Triantafyllis acyclicgraph Non-reducible! Prof. David August bb2 bb3 Department of Computer Science Princeton University COS 598C - Advanced Compilers 1 Prof. David August Loop Induction Variables Back to Loops – Assembly Generation Schema • Induction variables are variables such that every time for (i=x; i<y; i+=z) { they changevalue, they are incremented/decremented by body; some constant } • Basic induction variableœ induction variable whose only while-do schema do-while schema assignments within a loop are of the form j = j+- C, where C is a constant loop: if (i >= y) goto done if (i >= y) goto done body; loop: body; • Primary induction variableœ basic induction variable that i += z; i += z; controls the loop execution (for I=0; I<100; I++), I goto loop if (i < y) goto loop (virtual register holding I) is the primary induction variable done: done: • Derived induction variableœ variable that is a linear function of a basic induction variable Question: which schema is better and why? COS 598C - Advanced Compilers 2 Prof. David August COS 598C - Advanced Compilers 3 Prof. David August Class Problem 1 (4 from last time) Loop Unrolling • Most renowned control flow opti • Replicate the body of a loop N-1 times (giving N total copies) r1 = 0 • Loop unrolled N times or Nxunrolled Identify the basic, r7 = &A primary and derived • Enable overlap of operations from Loop: Loop: r2 = r1 * 4 inductions variables in different iterations r1 = MEM[r2 + 0] this loop. r4 = r7 + 3 • Increase potential for ILP (instruction r4 = r1 * r5 r7 = r7 + 1 r6 = r4 << 2 level parallelism) r1 = load(r2) MEM[r3 + 0] = r6 r3 = load(r4) • 3 variants r2 = r2 + 1 r9 = r1 * r3 • Unroll multiple of known trip count blt r2 100 Loop r10 = r9 >> 4 • Unroll with remainder loop store (r10, r2) • While loop unroll r1 = r1 + 4 blt r1 100 Loop COS 598C - Advanced Compilers 4 Prof. David August COS 598C - Advanced Compilers 5 Prof. David August Loop Unroll – Type 1 Loop Unroll – Type 2 Counted loop Counted loop tc = final – initial All parms known Loop: Some parms unknown tc = tc / increment Loop: rem = tc % N r2 is the loop variable, r1 = MEM[r2 + 0] Loop: r1 = MEM[r2 + 0] r2 is the loop variable, fin = rem * increment Increment is 1 r4 = r1 * r5 r1 = MEM[r2 + 0] r4 = r1 * r5 Increment is ? Initial value is 0 r6 = r4 << 2 Initial value is ? r4 = r1 * r5 Final value is 100 MEM[r3 + 0] = r6 r6 = r4 << 2 Final value is ? RemLoop: r6 = r4 << 2 Trip count is 100 r2 = r2 + 1 MEM[r3 + 0] = r6 Trip count is ? r1 = MEM[r2 + 0] MEM[r3 + 0] = r6 r4 = r1 * r5 r1 = MEM[r2 + 0] r1 = MEM[r2 + 1] r6 = r4 << 2 r1 = MEM[r2 + X] r4 = r1 * r5 r4 = r1 * r5 Loop: Loop: MEM[r3 + 0] = r6 r4 = r1 * r5 r6 = r4 << 2 r6 = r4 << 2 r1 = MEM[r2 + 0] r1 = MEM[r2 + 0] r2 = r2 + X r6 = r4 << 2 MEM[r3 + 0] = r6 MEM[r3 + 0] = r6 r4 = r1 * r5 r4 = r1 * r5 blt r2 fin RemLoop MEM[r3 + 0] = r6 r2 = r2 + 1 r2 = r2 + 2 r6 = r4 << 2 r6 = r4 << 2 r2 = r2 + (N*X) blt r2 100 Loop blt r2 100 Loop MEM[r3 + 0] = r6 MEM[r3 + 0] = r6 blt r2 Y Loop r2 = r2 + 1 r2 = r2 + X blt r2 100 Loop blt r2 Y Loop Remainder loop executes Remove branch from Remove r2 increments the “leftover” iterations Unrolled loop same as Type 1, first N-1 iterations from first N-1 iterations and is guaranteed to execute and update last increment a multiple of N times COS 598C - Advanced Compilers 6 Prof. David August COS 598C - Advanced Compilers 7 Prof. David August Loop Unroll – Type 3 Loop Unroll Summary • Goal œ Enable overlap of multiple iterations to increase ILP Non-counted loop Some parms unknown Just duplicate the • Type 1 is the most effective Loop: body, none of the pointer chasing, loop r1 = MEM[r2 + 0] loop branches can • All intermediate branches removed, least code expansion var modified in a strange r4 = r1 * r5 be removed. Instead • Limited applicability way, etc. r6 = r4 << 2 they are converted into MEM[r3 + 0] = r6 conditional breaks • Type 2 is almost as effective r2 = MEM[r2 + 0] • All intermediate branches removed beq r2 0 Exit • Remainder loop is required since trip count not known at compiletime Can apply this Loop: r1 = MEM[r2 + 0] to any loop including • Need to make sure don‘t spend much time in rem loop r4 = r1 * r5 r1 = MEM[r2 + 0] a superblock or • Type 3 can be effective r4 = r1 * r5 r6 = r4 << 2 hyperblock loop ! r6 = r4 << 2 MEM[r3 + 0] = r6 • No branches eliminated MEM[r3 + 0] = r6 r2 = MEM[r2 + 0] • But operation overlap still possible bne r2 0 Loop r2 = MEM[r2 + 0] • Always applicable (most loops fall into this category!) bne r2 0 Loop Exit: • Use expected trip count to guide unroll amount COS 598C - Advanced Compilers 8 Prof. David August COS 598C - Advanced Compilers 9 Prof. David August Class Problem 2 Control Flow Optimizations for Acyclic Code • Generally quite simplistic Unroll both the outer loop and inner loop 2x • Goals • Reduce the number of dynamic branches • Make larger basic blocks for (i=0; i<100; i++) { j = i; • Reduce code size while (j < 100) { • Classic control flow optimizations A[j]--; j += 5; • Branch to unconditional branch } • Unconditional branch to branch B[i] = 0; • Branch to next basic block } • Basic block merging • Branch to same target • Branch target expansion • Unreachable code elimination COS 598C - Advanced Compilers 10 Prof. David August COS 598C - Advanced Compilers 11 Prof. David August Control Flow Optimizations (1) Control Flow Optimizations (2) 1. Branch to unconditional branch 3. Branch to next basic block Branch is unnecessary L1: if (a < b) goto L2 L1: if (a < b) goto L3 . BB1 BB1 . L1: if (a < b) goto L2 L1: L2: goto L3 L2: goto L3 may be deleted L2: BB2 L2: BB2 2. Unconditional branch to branch . L1: goto L2 L1: if (a < b) goto L3 4. Basic block merging Merge BBs when single edge between . goto L4: L2: if (a < b) goto L3 . BB1 L4: BB1 L2: if (a < b) goto L3 may be deleted L1: . L4: L1: L2: L2: . BB2 . COS 598C - Advanced Compilers 12 Prof. David August COS 598C - Advanced Compilers 13 Prof. David August Control Flow Optimizations (3) Unreachable Code Elimination 5. Branch to same target Algorithm . L1: if (a < b) goto L2 entry L1: goto L2 Mark procedure entry BB visited goto L2 to_visit = procedure entry BB while (to_visit not empty) { bb1 bb2 6. Branch target expansion current = to_visit.pop() for (each successor block of current) { stuff1 Mark successor as visited; bb3 bb4 stuff1 BB1 L1: stuff2 BB1 to_visit += successor L1: goto L2 . } . } bb5 Eliminate all unvisited blocks Which BB(s) can be deleted? L2: stuff2 L2: stuff2 BB2 BB2 . What about expanding a conditional branch? COS 598C - Advanced Compilers 14 Prof. David August COS 598C - Advanced Compilers 15 Prof. David August Class Problem 3 Regions • Region: A collection of operations that are treated as a Maximally optimize the control flow of this code single unit by the compiler • Examples L1: if (a < b) goto L11 L2: goto L7 • Basic block L3: goto L4 • Procedure L4: stuff4 • Body of a loop L5: if (c < d) goto L15 • Properties L6: goto L2 L7: if (c < d) goto L13 • Connected subgraphof operations L8: goto L12 • Control flow is the key parameter that defines regions L9: stuff 9 • Hierarchicallyorganized (sometimes) L10: if (a < c) goto L3 L11:goto L9 • Problem L12: goto L2 • Basic blocks are too small (3-5 operations) L13: stuff 13 • Hard to extract sufficient parallelism L14: if (e < f) goto L11 L15: stuff 15 • Procedure control flow too complex for many compiler xforms L16: rts • Plus only parts of a procedure are important (90/10 rule) COS 598C - Advanced Compilers 16 Prof. David August COS 598C - Advanced Compilers 17 Prof. David August Regions (2) Region Type 1 – Trace • Want • Trace- Linear collection of basic blocks that tend to execute in 10 • Intermediate sized regions with simple control flow sequence • Bigger basic blocks would be ideal !! • —Likely control flow path“ BB1 • Separate important code from less important • Acyclic (outer backedgeok) 90 80 20 • Optimize frequently executed code at the expense of the rest • Side entranceœ branch into the BB2 BB3 middle of a trace • Solution • Side exitœ branch out of the 80 20 • Define new region types that consist of multiple BBs middle of a trace BB4 • Profile information used in the identication • Compilation strategy 10 • Sequential control flow (sorta) • Compile assuming path occurs BB5 90 100% of the time 10 • Pretend the regions are basic blocks • Patch up side entrances and exits afterwards BB6 • Motivated by scheduling (i.e., trace scheduling) 10 COS 598C - Advanced Compilers 18 Prof. David August COS 598C - Advanced Compilers 19 Prof. David August Linearizing a Trace Issues With Selecting Traces • Acyclic 10 (entry count) • Cannot go past a backedge 10 • Trace length BB1 BB1 20 (side exit) • Longer = better ? 80 90 80 20 • Not always ! BB2 BB3 BB2 BB3 90 (entry/ • On-trace / off-trace transitions exit count) 80 80 20 20 (side entrance) • Maximize on-trace BB4 • Minimize off-trace BB4 10 (side exit) • Compile assuming on-trace is 10 90 BB5 100% (iesingle BB) BB5 90 • Penalty for off-trace 10 10 (side entrance) BB6 • Tradeoff (heuristic) BB6 • Length 10 (exit count) • Likelihood remain within the 10 trace COS 598C - Advanced Compilers 20 Prof.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us