Lecture 12: Optimization
COS 598C - Advanced Compilers

Prof. David August
Department of Computer Science
Princeton University

Where are we?

• Analysis
  • Control Flow/Predicate
  • Dataflow
  • SSA
• Optimization

COS 598C - Advanced Compilers 1 Prof. David August

Optimization

• Make the code run faster on the target processor
  • My favorite topic!!
  • Anything goes
  • Look at benchmark kernels, what's the bottleneck?
  • Invent your own optimizations (easier and harder than you think)
• Classes of optimization
  • 1. Classical (machine independent)
    • Reducing operation count (redundancy elimination)
    • Simplifying operations
    • Generally good for any kind of machine
  • 2. Machine specific
    • Peephole optimizations
    • Take advantage of specialized hardware features
  • 3. ILP enhancing
    • Increasing parallelism
    • Possibly increase instructions

Classical Optimizations

• Operation-level - 1 operation in isolation
  • Constant folding, strength reduction
  • Dead code elimination (global, but 1 op at a time)
• Local/Global - Pairs of operations
  • Constant propagation
  • Forward copy propagation
  • Backward copy propagation
  • CSE
  • Constant combining
  • Operation folding
• Loop - Body of a loop
  • Invariant code removal
  • Global variable migration
  • Strength reduction
  • Induction variable elimination

Caveat

• Traditional class
  • Sophisticated implementations of optimizations, efficient algorithms
  • Spend entire class on 1 optimization
• For this class - Go over concepts of each optimization
  • What it is
  • When can it be applied (set of conditions that must be satisfied)

Static Single Assignment (SSA)


Dominance Property of SSA

Dead Code Elimination

• Remove any operation whose result is never consumed
• Rules
  • X can be deleted
    • no stores or branches
    • DU chain empty or dest register not live
• This misses some dead code!!
  • Especially in loops
• Critical operation
  • store or branch operation
• Any operation that does not directly or indirectly feed a critical operation is dead
• Trace UD chains backwards from critical operations
• Any op not visited is dead

Example code (from a control-flow graph, listed in reading order):

    r1 = 3
    r2 = 10
    r4 = r4 + 1
    r7 = r1 * r4
    r2 = 0
    r3 = r3 + 1
    r3 = r2 + r1
    store (r1, r3)
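The "trace UD chains backwards from critical operations" rule can be sketched for straight-line code as follows. This is a minimal illustration, not the course's implementation: the `Op` tuple, the `CRITICAL` set, and the assumption that no register is live out of the sequence are all hypothetical choices made for the example.

```python
from collections import namedtuple

# Hypothetical IR: dest is a register name (or None), srcs mixes
# register names (str) and literals (int).
Op = namedtuple("Op", "dest opcode srcs")

CRITICAL = {"store", "branch"}  # ops that must always be kept


def dead_code_elimination(ops):
    """Keep only ops that directly or indirectly feed a critical op.

    Assumes straight-line code and that no register is live afterwards.
    """
    # Build UD chains: for each op, the indices of the ops defining its sources.
    last_def = {}
    ud = []
    for i, op in enumerate(ops):
        ud.append([last_def[s] for s in op.srcs if s in last_def])
        if op.dest is not None:
            last_def[op.dest] = i

    # Trace backwards from the critical ops, marking everything visited.
    worklist = [i for i, op in enumerate(ops) if op.opcode in CRITICAL]
    visited = set(worklist)
    while worklist:
        for d in ud[worklist.pop()]:
            if d not in visited:
                visited.add(d)
                worklist.append(d)

    # Any op not visited is dead.
    return [op for i, op in enumerate(ops) if i in visited]
```

Applied to the slide's operations treated as one straight-line sequence (an assumption; the slide shows a control-flow graph), only `r1 = 3`, `r2 = 0`, `r3 = r2 + r1`, and the store survive.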


Constant Folding

• Simplify 1 operation based on values of src operands
  • Constant propagation creates opportunities for this
• All constant operands
  • Evaluate the op, replace with a move
    • r1 = 3 * 4 → r1 = 12
    • r1 = 3 / 0 → ??? Don't evaluate excepting ops! What about floating-point?
  • Evaluate conditional branch, replace with BRU or noop
    • if (1 < 2) goto BB2 → BRU BB2
    • if (1 > 2) goto BB2 → convert to a noop
• Algebraic identities
  • r1 = r2 + 0, r2 - 0, r2 | 0, r2 ^ 0, r2 << 0, r2 >> 0
    • r1 = r2
  • r1 = 0 * r2, 0 / r2, 0 & r2
    • r1 = 0 (0 / r2 only if r2 cannot be 0, otherwise the op may except)
  • r1 = r2 * 1, r2 / 1
    • r1 = r2

Strength Reduction

• Replace expensive ops with cheaper ones
  • Constant propagation creates opportunities for this
• Power of 2 constants
  • Multiply by power of 2, replace with left shift
    • r1 = r2 * 8 → r1 = r2 << 3
  • Divide by power of 2, replace with right shift (exact as written for unsigned or non-negative r2)
    • r1 = r2 / 4 → r1 = r2 >> 2
  • Remainder by power of 2, replace with logical and (same restriction on r2)
    • r1 = r2 REM 16 → r1 = r2 & 15
• More exotic
  • Replace multiply by constant by sequence of shifts and adds/subs
    • r1 = r2 * 6
      • r100 = r2 << 2; r101 = r2 << 1; r1 = r100 + r101
    • r1 = r2 * 7
      • r100 = r2 << 3; r1 = r100 - r2
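A rough sketch of how constant folding, the algebraic identities, and the multiply-by-power-of-2 case might be applied to a single operation. The `(dest, opcode, src1, src2)` tuple form and the opcode names are assumptions made for illustration. Divide and remainder strength reduction are left out because they need the non-negative operand assumption.

```python
import operator

# Hypothetical op form: (dest, opcode, src1, src2); registers are str,
# literals are int.
FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def is_lit(x):
    return isinstance(x, int)


def simplify(op):
    """Apply constant folding, then identities, then strength reduction."""
    dest, opcode, a, b = op

    # All-constant operands: evaluate the op and replace it with a move.
    if is_lit(a) and is_lit(b):
        if opcode in FOLDABLE:
            return (dest, "mov", FOLDABLE[opcode](a, b), None)
        if opcode == "div" and b != 0:
            q = abs(a) // abs(b)  # truncate toward zero
            return (dest, "mov", -q if (a < 0) != (b < 0) else q, None)
        return op  # e.g. 3 / 0: do not evaluate an excepting op

    # Put the literal second for commutative ops.
    if opcode in ("add", "mul") and is_lit(a):
        a, b = b, a

    # Algebraic identities.
    if opcode in ("add", "sub") and b == 0:
        return (dest, "mov", a, None)
    if opcode == "mul" and b == 0:
        return (dest, "mov", 0, None)
    if opcode in ("mul", "div") and b == 1:
        return (dest, "mov", a, None)

    # Strength reduction: multiply by a power of 2 becomes a left shift.
    if opcode == "mul" and is_lit(b) and b > 1 and (b & (b - 1)) == 0:
        return (dest, "shl", a, b.bit_length() - 1)

    return (dest, opcode, a, b)
```

For instance, `simplify(("r1", "mul", "r2", 8))` gives `("r1", "shl", "r2", 3)`, while `simplify(("r1", "div", 3, 0))` is returned unchanged.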

Class Problem

Optimize this applying
1. constant folding
2. strength reduction
3. dead code elimination

Code (from a control-flow graph, listed in reading order):

    r1 = 0
    r4 = r1 | -1
    r7 = r1 * 4
    r6 = r1
    r3 = 8 * r6
    r3 = 8 / r6
    r3 = r3 + r2
    r2 = r2 + r1
    r6 = r7 * r6
    r1 = r1 + 1
    store (r1, r3)

Constant Propagation

• Forward propagation of moves of the form
  • rx = L (where L is a literal)
  • Maximally propagate
  • Assume no instruction encoding restrictions
• When is it legal?
  • SRC: Literal is a hard coded constant, so never a problem
  • DEST: Must be available
    • Guaranteed to reach
    • May reach not good enough

Example code (from a control-flow graph, listed in reading order):

    r1 = 5
    r2 = r1 + r3
    r1 = r1 + r2
    r7 = r1 + r4
    r8 = r1 + 3
    r9 = r1 + r11


Simple Constant Propagation

Local Constant Propagation

• Consider 2 ops, X and Y in a BB, X is before Y
  • 1. X is a move
  • 2. src1(X) is a literal
  • 3. Y consumes dest(X)
  • 4. There is no definition of dest(X) between X and Y
  • 5. No danger between X and Y
    • When dest(X) is a Macro reg, BRL destroys the value

Example code:

    r1 = 5
    r2 = '_x'
    r3 = 7
    r4 = r4 + r1
    r1 = r1 + r2
    r1 = r1 + 1
    r3 = 12
    r8 = r1 - r2
    r9 = r3 + r5
    r3 = r2 + 1
    r10 = r3 - r1
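Rules 1 through 4 of local constant propagation can be sketched as a single forward pass over a basic block. The `(dest, opcode, srcs)` tuple form is an assumption made for this example, and rule 5 (macro registers destroyed by BRL) is not modeled.

```python
def local_constant_propagation(block):
    """Propagate literals from moves to later consumers within one BB.

    block: list of (dest, opcode, srcs); registers are str, literals are int.
    """
    consts = {}  # register -> literal from a move that is still valid
    out = []
    for dest, opcode, srcs in block:
        # Rule 3: Y consumes dest(X), so substitute the literal.
        new_srcs = tuple(
            consts.get(s, s) if isinstance(s, str) else s for s in srcs
        )
        out.append((dest, opcode, new_srcs))
        # Rule 4: a new definition of the register ends the propagation.
        consts.pop(dest, None)
        # Rules 1 and 2: X is a move whose source is a literal.
        if opcode == "mov" and isinstance(new_srcs[0], int):
            consts[dest] = new_srcs[0]
    return out
```

On a block such as `r1 = 5; r4 = r4 + r1; r1 = r1 + r2; r8 = r1 - r2`, the first two uses of `r1` become `5`, and the use in `r8` is left alone because `r1` was redefined in between.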

Global Constant Propagation

• Consider 2 ops, X and Y in different BBs
  • 1. X is a move
  • 2. src1(X) is a literal
  • 3. Y consumes dest(X)
  • 4. X is in a_in(BB(Y))
  • 5. dest(X) is not modified between the top of BB(Y) and Y
  • 6. No danger between X and Y
    • When dest(X) is a Macro reg, BRL destroys the value

Example code (from a control-flow graph, listed in reading order):

    r1 = 5
    r2 = '_x'
    r1 = r1 + r2
    r7 = r1 - r2
    r8 = r1 * r2
    r9 = r1 + r2

Class Problem

Optimize this applying
1. constant propagation
2. constant folding
3. strength reduction
4. dead code elimination

Code (from a control-flow graph, listed in reading order):

    r1 = 0
    r2 = 10
    r4 = 1
    r7 = r1 * 4
    r6 = 8
    r2 = 0
    r3 = r4 * r6
    r3 = r2 / r6
    r3 = r3 + r2
    r2 = r2 + r1
    r6 = r7 * r6
    r1 = r1 + 1
    store (r1, r3)


Forward Copy Propagation

• Forward propagation of the RHS of moves
  • r1 = r2
  • …
  • r4 = r1 + 1 → r4 = r2 + 1
• Benefits
  • Reduce chain of dependences
  • Eliminate the move
• Rules (ops X and Y)
  • X is a move
  • src1(X) is a register
  • Y consumes dest(X)
  • X.dest is an available def at Y
  • X.src1 is an available expr at Y

Example code (from a control-flow graph, listed in reading order):

    r1 = r2
    r3 = r4
    r2 = 0
    r6 = r3 + 1
    r5 = r2 + r3

Backward Copy Propagation

• Backward propagation of the LHS of moves
  • r1 = r2 + r3 → r4 = r2 + r3
  • …
  • r5 = r1 + r6 → r5 = r4 + r6
  • …
  • r4 = r1 → noop
• Rules (ops X and Y in same BB)
  • dest(X) is a register
  • dest(X) not live out of BB(X)
  • Y is a move
  • dest(Y) is a register
  • Y consumes dest(X)
  • dest(Y) not consumed in (X…Y)
  • dest(Y) not defined in (X…Y)
  • There are no uses of dest(X) after the first redefinition of dest(Y)

Example code:

    r1 = r8 + r9
    r2 = r9 + r1
    r4 = r2
    r6 = r2 + 1
    r9 = r1
    r10 = r6
    r5 = r6 + 1
    r4 = 0
    r8 = r2 + r7
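For forward copy propagation, the availability conditions collapse to simple bookkeeping when X and Y sit in the same basic block. The sketch below assumes that restricted, single-block setting and a hypothetical `(dest, opcode, srcs)` tuple form. The global version on the slide needs available-definition and available-expression dataflow instead.

```python
def forward_copy_propagation(block):
    """Replace uses of a move's dest with the move's source register.

    block: list of (dest, opcode, srcs); registers are str, literals are int.
    """
    copies = {}  # dest(X) -> src1(X) for moves that are still valid
    out = []
    for dest, opcode, srcs in block:
        # Y consumes dest(X): read the original register instead.
        new_srcs = tuple(copies.get(s, s) for s in srcs)
        out.append((dest, opcode, new_srcs))
        # Redefining either side of a copy makes it unavailable.
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        # X is a move and src1(X) is a register.
        if opcode == "mov" and isinstance(new_srcs[0], str) and new_srcs[0] != dest:
            copies[dest] = new_srcs[0]
    return out
```

With `r1 = r2; r4 = r1 + 1; r2 = 0; r5 = r1 + r3`, the second op becomes `r4 = r2 + 1`, but the last op keeps `r1` because `r2` was redefined first. The move itself is left in place. Removing it once it has no remaining uses is a job for dead code elimination.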

CSE - Common Subexpression Elimination

• Eliminate recomputation of an expression by reusing the previous result
  • r1 = r2 * r3
  •   → r100 = r1
  • …
  • r4 = r2 * r3 → r4 = r100
• Benefits
  • Reduce work
  • Moves can get copy propagated
• Rules (ops X and Y)
  • X and Y have the same opcode
  • src(X) = src(Y), for all srcs
  • expr(X) is available at Y
  • if X is a load, then there is no store that may write to address(X) along any path between X and Y
• If op is a load, call it redundant load elimination rather than CSE

Example code (from a control-flow graph, listed in reading order):

    r1 = r2 * r6
    r3 = r4 / r7
    r2 = r2 + 1
    r6 = r3 * 7
    r5 = r2 * r6
    r8 = r4 / r7
    r9 = r3 * 7

Class Problem

Optimize this applying
1. constant propagation
2. constant folding
3. strength reduction
4. dead code elimination
5. forward copy propagation
6. backward copy propagation
7. CSE

Code (from a control-flow graph, listed in reading order):

    r1 = 9
    r4 = 4
    r5 = 0
    r6 = 16
    r2 = r3 * r4
    r8 = r2 + r5
    r9 = r3
    r7 = load(r2)
    r5 = r9 * r4
    r3 = load(r2)
    r10 = r3 / r6
    store (r8, r7)
    r11 = r2
    r12 = load(r11)
    store(r12, r3)
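The CSE rules can be sketched for a single basic block by tracking which expressions are currently available. This is a simplified illustration under stated assumptions: ops use a hypothetical `(dest, opcode, srcs)` form, only the opcodes in `PURE` are considered (loads would need the store check from the rules), and the earlier result is reused directly from its destination register rather than through a fresh temporary such as `r100`.

```python
PURE = {"add", "sub", "mul", "div"}  # assumed side-effect-free opcodes


def local_cse(block):
    """Replace a recomputed expression with a move from the earlier result."""
    avail = {}  # (opcode, srcs) -> register currently holding that value
    out = []
    for dest, opcode, srcs in block:
        key = (opcode, srcs)
        reuse = avail.get(key) if opcode in PURE else None
        if reuse is not None:
            out.append((dest, "mov", (reuse,)))  # same opcode, same srcs
        else:
            out.append((dest, opcode, srcs))
        # Redefining dest kills expressions that read it or are held in it.
        avail = {
            k: r for k, r in avail.items() if dest not in k[1] and r != dest
        }
        # Record the new expression unless the op overwrote its own source.
        if opcode in PURE and reuse is None and dest not in srcs:
            avail[key] = dest
    return out
```

Given `r1 = r2 * r3; r4 = r2 * r3; r2 = r2 + 1; r5 = r2 * r3`, the second op becomes `r4 = r1`, while the last multiply is kept because `r2` changed in between.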


Constant Combining

• Combine 2 dependent ops into 1 by combining the literals
  • r1 = r2 + 4
  • …
  • r5 = r1 - 9 → r5 = r2 - 5
• First op often becomes dead
• Rules (ops X and Y in same BB)
  • X is of the form rx +- K
  • dest(X) != src1(X)
  • Y is of the form ry +- K (comparison also ok)
  • Y consumes dest(X)
  • src1(X) not modified in (X…Y)

Example code:

    r1 = r2 + 4
    r3 = r1 < 0
    r2 = r3 + 6
    r7 = r1 - 3
    r8 = r7 + 5

Operation Folding

• Combine 2 dependent ops into 1 complex op
  • Classic example is MPYADD
  • r1 = r2 * r3
  • …
  • r5 = r1 + r4 → r5 = r2 * r3 + r4
• First op often becomes dead
• Borders on machine dependent opti (often it is!!)
• Rules (ops X and Y in same BB)
  • X is an arithmetic operation
  • dest(X) != any src(X)
  • Y is an arithmetic operation
  • Y consumes dest(X)
  • X and Y can be merged
  • src(X) not modified in (X…Y)

Example code:

    r1 = r2 & 4
    r3 = r1 ^ -1
    r2 = r3 < 6
    r4 = r2 == 0
    r5 = r6 << 1
    r7 = r5 + r8
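The constant combining rules can be sketched for a basic block containing only `add`/`sub` ops with a literal second operand. The `(dest, opcode, src, k)` tuple form is an assumption for this example, and the "comparison also ok" case is not handled.

```python
def constant_combining(block):
    """Fold chained 'reg +- K' ops so each reads the original base register.

    block: list of (dest, opcode, src, k) with opcode in {"add", "sub"},
    src a register name and k an int literal.
    """
    defs = {}  # dest(X) -> (src1(X), signed offset) for still-valid X ops
    out = []
    for dest, opcode, src, k in block:
        offset = k if opcode == "add" else -k
        if src in defs:  # Y consumes dest(X): combine the literals
            src, base_offset = defs[src]
            offset += base_offset
        if offset >= 0:
            out.append((dest, "add", src, offset))
        else:
            out.append((dest, "sub", src, -offset))
        # A redefinition invalidates X ops that wrote or read that register.
        defs = {r: (b, o) for r, (b, o) in defs.items() if r != dest and b != dest}
        # dest(X) != src1(X), otherwise X cannot be looked through.
        if dest != src:
            defs[dest] = (src, offset)
    return out
```

For `r1 = r2 + 4; r5 = r1 - 9` this yields `r5 = r2 - 5`, matching the slide. The first op is left in place and becomes a candidate for dead code elimination if nothing else reads `r1`.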
