COS 598C – Advanced Compilers

COS 598C – Advanced Compilers

Where are we? • Analysis • Control Flow/Predicate Lecture 12: Optimization • Dataflow • SSA • Optimization COS 598C – Advanced Compilers Prof. David August Department of Computer Science Princeton University COS 598C - Advanced Compilers 1 Prof. David August Optimization Classical Optimizations • Make the code run faster on the target processor • Operation-level œ 1 operation in isolation • My favorite topic !! • Constant folding, strength reduction • Anything goes • Dead code elimination (global, but 1 op at a time) • Look at benchmark kernels, what‘s the bottleneck?? • Local/Global œ Pairs of operations • Invent your own optimizations (easier and harder than you think) • Constant propagation • Classes of optimization • Forward copy propagation • 1. Classical(machine independent) • Backward copy propagation • Reducing operation count (redundancy elimination) • CSE • Simplifying operations • Constant combining • Generally good for any kind of machine • Operation folding • 2. Machine specific • Peephole optimizations • Loop œ Body of a loop • Take advantage of specialized hardware features • Invariant code removal • 3. ILP enhancing • Global variable migration • Increasing parallelism • Induction variable strength reduction • Possibly increase instructions • Induction variable elimination COS 598C - Advanced Compilers 2 Prof. David August COS 598C - Advanced Compilers 3 Prof. David August Caveat Static Single Assignment (SSA) • Traditional compiler class • Sophisticated implementations of optimizations, efficient algorithms • Spend entire class on 1 optimization • For this class œ Go over concepts of each optimization • What it is • When can it be applied (set of conditions that must be satisfied) COS 598C - Advanced Compilers 4 Prof. David August COS 598C - Advanced Compilers 5 Prof. David August Dominance Property of SSA Dead Code Elimination COS 598C - Advanced Compilers 6 Prof. David August COS 598C - Advanced Compilers Prof. David August Dead Code Elimination Dead Code Elimination • Remove any operation who‘s result is never consumed • Rules r1 = 3 • X can be deleted r2 = 10 • no stores or branches • DU chain empty or destregister not live r4 = r4 + 1 • This misses some dead code!! r7 = r1 * r4 • Especially in loops • Critical operation r2 = 0 r3 = r3 + 1 • store or branch operation • Any operation that does not directly or indirectly feed a r3 = r2 + r1 critical operation is dead • Trace UD chains backwards from critical operations store (r1, r3) • Any op not visited is dead COS 598C - Advanced Compilers 8 Prof. David August COS 598C - Advanced Compilers 9 Prof. David August Constant Folding Strength Reduction • Simplify 1 operation based on values of srcoperands • Constant propagation creates opportunities for this • Replace expensive ops with cheaper ones • All constant operands • Constant propagation creates opportunities for this • Evaluate the op, replace with a move • Power of 2 constants • r1 = 3 * 4 r1 = 12 • Multiply by power of 2, replace with left shift • r1 = 3 / 0 ??? Don‘t evaluate excepting ops !, what about floating-point? • r1 = r2 * 8 r1 = r2 << 3 • Evaluate conditional branch, replace with BRU or noop • Divide by power of 2, replace with right shift • if (1 < 2) gotoBB2 BRU BB2 • r1 = r2 / 4 r1 = r2 >> 2 • if (1 > 2) gotoBB2 convert to a noop • Remainder by power of 2, replace with logical and • Algebraic identities • r1 = r2 REM 16 r1 = r2 & 15 • r1 = r2 + 0, r2 œ 0, r2 | 0, r2 ^ 0, r2 << 0, r2 >> 0 • More exotic • r1 = r2 • Replace multiply by constant by sequence of shift and adds/subs • r1 = 0 * r2, 0 / r2, 0 & r2 • r1 = r2 * 6 • r1 = 0 • r100 = r2 << 2; r101 = r2 << 1; r1 = r100 + r101 • r1 = r2 * 1, r2 / 1 • r1 = r2 * 7 • r1 = r2 • r100 = r2 << 3; r1 = r100 œ r2 COS 598C - Advanced Compilers 10 Prof. David August COS 598C - Advanced Compilers 11 Prof. David August Class Problem Constant Propagation • Forward propagation of moves of the form Optimize this applying r1 = 0 • rx = L (where L is a literal) r1 = 5 1. constant folding • Maximally propagate r2 = r1 + r3 2. strength reduction • Assume no instruction encoding r4 = r1 | -1 3. dead code elimination r7 = r1 * 4 restrictions r6 = r1 • When is it legal? r1 = r1 + r2 r7 = r1 + r4 • SRC: Literal is a hard coded constant, so never a problem r3 = 8 * r6 r3 = 8 / r6 r3 = r3 + r2 • DEST: Must be available r8 = r1 + 3 • Guaranteed to reach r2 = r2 + r1 • May reach not good enough r6 = r7 * r6 r9 = r1 + r11 r1 = r1 + 1 store (r1, r3) COS 598C - Advanced Compilers 12 Prof. David August COS 598C - Advanced Compilers 13 Prof. David August Simple Constant Propagation Local Constant Propagation • Consider 2 ops, X and Y in a BB, X is before Y • 1. X is a move • 2. src1(X) is a literal r1 = 5 • 3. Y consumes dest(X) r2 = ‘_x’ • 4. There is no definition of r3 = 7 dest(X) between X and Y r4 = r4 + r1 r1 = r1 + r2 • 5. No danger betwX and Y r1 = r1 + 1 • When dest(X) is a Macro reg, r3 = 12 BRL destroys the value r8 = r1 - r2 r9 = r3 + r5 r3 = r2 + 1 r10 = r3 – r1 COS 598C - Advanced Compilers 14 Prof. David August COS 598C - Advanced Compilers 15 Prof. David August Global Constant Propagation Class Problem • Consider 2 ops, X and Y in different BBs • 1. X is a move r1 = 0 Optimize this applying r1 = 5 • 2. src1(X) is a literal r2 = 10 r2 = ‘_x’ • 3. Y consumes dest(X) 1. constant propagation 2. constant folding • 4. X is in a_in(BB(Y)) r4 = 1 3. strength reduction • 5. Dest(x) is not modified between the r7 = r1 * 4 4. dead code elimination top of BB(Y) and Y r6 = 8 r1 = r1 + r2 r7 = r1 – r2 • 6. No danger betwX and Y • When dest(X) is a Macro reg, BRL destroys r2 = 0 r3 = r4 * r6 the value r8 = r1 * r2 r3 = r2 / r6 r3 = r3 + r2 r2 = r2 + r1 r9 = r1 + r2 r6 = r7 * r6 r1 = r1 + 1 store (r1, r3) COS 598C - Advanced Compilers 16 Prof. David August COS 598C - Advanced Compilers 17 Prof. David August Forward Copy Propagation Backward Copy Propagation • Forward propagation of the RHS of • Backward propagation of the LHS moves of moves • r1 = r2 • r1 = r2 + r3 r4 = r2 + r3 • … r1 = r2 • … • r5 = r1 + r6 r5 = r4 + r6 • r4 = r1 + 1 r4 = r2 + 1 r3 = r4 r1 = r8 + r9 • … r2 = r9 + r1 • Benefits • r4 = r1 noop r4 = r2 • Reduce chain of dependences r6 = r2 + 1 r2 = 0 r6 = r3 + 1 • Rules (ops X and Y in same BB) • Eliminate the move • dest(X) is a register r9 = r1 • Rules (ops X and Y) • dest(X) not live out of BB(X) r10 = r6 r5 = r6 + 1 • X is a move • Y is a move r4 = 0 r5 = r2 + r3 • dest(Y) is a register • src1(X) is a register r8 = r2 + r7 • Y consumes dest(X) • Y consumes dest(X) • dest(Y) not consumed in (X…Y) • X.destis an available def at Y • dest(Y) not defined in (X…Y) • X.src1 is an available exprat Y • There are no uses of dest(X) after the first redefinition of dest(Y) COS 598C - Advanced Compilers 18 Prof. David August COS 598C - Advanced Compilers 19 Prof. David August CSE ± Common Subexpression Elimination Class Problem • Eliminate recomputationof an expression by reusing the previous result Optimize this applying • r1 = r2 * r3 r1 = 9 r1 = r2 * r6 r4 = 4 1. constant propagation • r100 = r1 r3 = r4 / r7 r5 = 0 2. constant folding • … r6 = 16 3. strength reduction • r4 = r2 * r3 r4 = r100 r2 = r3 * r4 4. dead code elimination 5. forward copy propagation • Benefits r8 = r2 + r5 r2 = r2 + 1 r6 = r3 * 7 6. backward copy propagation • Reduce work r9 = r3 r7 = load(r2) 7. CSE • Moves can get copy propagated r5 = r9 * r4 • Rules (ops X and Y) r3 = load(r2) r5 = r2 * r6 r10 = r3 / r6 • X and Y have the same opcode r8 = r4 / r7 store (r8, r7) • src(X) = src(Y), for all srcs r9 = r3 * 7 r11 = r2 • expr(X) is available at Y r12 = load(r11) • if X is a load, then there is no store that if op is a load, call it redundant store(r12, r3) may write to address(X) along any path load elimination rather than CSE between X and Y COS 598C - Advanced Compilers 20 Prof. David August COS 598C - Advanced Compilers 21 Prof. David August Constant Combining Operation Folding • Combine 2 dependent ops into 1 by • Combine 2 dependent ops into 1 combining the literals complex op • r1 = r2 + 4 • Classic example is MPYADD • r1 = r2 * r3 • … r1 = r2 + 4 r1 = r2 & 4 • … • r5 = r1 - 9 r5 = r2 œ 5 r3 = r1 < 0 r3 = r1 ^ -1 • r5 = r1 + r4 r5 = r2 * r3 + r4 • First op often becomes dead r2 = r3 + 6 r2 = r3 < 6 r7 = r1 – 3 • First op often becomes dead r4 = r2 == 0 • Rules (ops X and Y in same BB) r8 = r7 + 5 • Borders on machine dependent r5 = r6 << 1 • X is of the form rx +- K opti(often it is !! ) r7 = r5 + r8 • dest(X) != src1(X) • Rules (ops X and Y in same BB) • Y is of the form ry+- K • X is an arithmetic operation (comparison also ok) • dest(X) != any src(X) • Y consumes dest(X) • Y is an arithmetic operation • src1(X) not modified in (X…Y) • Y consumes dest(X) • X and Y can be merged • src(X) not modified in (X…Y) COS 598C - Advanced Compilers 22 Prof. David August COS 598C - Advanced Compilers 23 Prof. David August.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us