Outline

• Unreachable-Code Elimination • Straightening • If and Loop Simplifications • and Unswitching Control-Flow and Low-Level • Branch Optimizations Optimizations • Tail Merging (Cross Jumping) • Conditional Moves • Dead-Code Elimination • Branch Prediction • • Machine Idioms & Instruction Combining

Kostis Sagonas 2 Spring 2012

Unreachable-Code Elimination Unreachable-Code Elimination

Unreachable code is code that cannot be executed, entry entry regardless of the input data = a + b c = a + b – code that is never for any input data d = c d = c f = a + c e > c e > c – code that has become unreachable due to a previous g = e transformation a = e +c f = c – g f = c – g b = c + 1 b = c + 1 elimination removes this code h = e + 1 d = 4 * a d = 4 * a e < a e = d – 7 e = d – 7 – reduces the code space f = e + 2 f = e + 2 – improves instruction-cache utilization – enables other control-flow transformations exit exit

Kostis Sagonas 3 Spring 2012 Kostis Sagonas 4 Spring 2012

Outline Straightening

• Unreachable-Code Elimination Straightening is applicable to pairs of basic blocks such • Straightening that the first has no successors other than the second • If and Loop Simplifications and the second has no predecessors other than the first. • Loop Inversion and Unswitching • Branch Optimizations • Tail Merging (Cross Jumping) … … • Conditional Moves a = b + c a = b + c • Dead-Code Elimination b = c * 2 • Branch Prediction b = c * 2 a = a + 1 • Peephole Optimization a = a + 1 c < 0 • Machine Idioms & Instruction Combining c < 0

Kostis Sagonas 5 Spring 2012 Kostis Sagonas 6 Spring 2012

1 Straightening Example Outline

Straightening in the presence of fall-throughs is tricky... • Unreachable-Code Elimination • Straightening L1: … L1: … • If and Loop Simplifications a = b + c a = b + c • Loop Inversion and Unswitching goto L2 b = c * 2 • Branch Optimizations a = a + 1 • Tail Merging (Cross Jumping) L6: … if c < 0 goto L3 goto L4 goto L5 • Conditional Moves • Dead-Code Elimination L2: b = c * 2 L6: … • Branch Prediction a = a + 1 goto L4 • Peephole Optimization if c < 0 goto L3 • Machine Idioms & Instruction Combining L5: … L5: …

Kostis Sagonas 7 Spring 2012 Kostis Sagonas 8 Spring 2012

If Simplifications If Simplification Example

If simplifications apply to conditional constructs a > d a > d Y N Y N one or both of whose branches are empty: b = a – if either the then or the else part of an if-construct is c = 4 * b b = a c = 4 * b empty, the corresponding branch can be eliminated (a >= d) or bool YN – one branch of an if with a constant-valued condition d = b d = c d = b can also be eliminated – we can also simplify ifs whose condition, C, occurs e = a + b e = a + b in the of a condition that implies C (and none of the condition’s operands has changed value) … …

Kostis Sagonas 9 Spring 2012 Kostis Sagonas 10 Spring 2012

Loop Simplifications Loop Simplification Example

• A loop whose body is empty can be eliminated s = 0 if the iteration-control code has no side-effects i = 0 s = 0 i = i + 1 (Side-effects might be simple enough that they can be i = 0 s = s + i i = 4 replaced with non-looping code at ) L1: if i >= 4 goto L2 i = i + 1 i = i + 1 s = 10 s = s + i L2: … s = s + i i = i + 1 • If number of iterations is small enough, loops goto L1 s = s + i can be unrolled into branchless code and the L2: … i = i + 1 s = s + i loop body can be executed at compile time L2: …

Kostis Sagonas 11 Spring 2012 Kostis Sagonas 12 Spring 2012

2 Outline Loop Inversion

• Unreachable-Code Elimination Loop inversion transforms a while loop into a • Straightening repeat loop (i.e. moves the loop-closing test • If and Loop Simplifications from before the loop to after it). • Loop Inversion and Unswitching • Branch Optimizations – Has the advantage that only one branch instruction • Tail Merging (Cross Jumping) needs to be executed to close the loop. • Conditional Moves – Requires that we determine that the loop is entered • Dead-Code Elimination at least once! • Branch Prediction • Peephole Optimization • Machine Idioms & Instruction Combining

Kostis Sagonas 13 Spring 2012 Kostis Sagonas 14 Spring 2012

Loop Inversion Example 1 Loop Inversion Example 2 for (i = 0; i < 100; i++) { Loop bounds a[i] = i + 1; are known } if (k >= n) goto L for (i = k; i < n; i++) { i = k; a[i] = i + 1; do { } a[i] = i + 1; i++; i = 0; i = 0; Loop bounds } while (i < n) while (i < 100) { do { are unknown L: a[i] = i + 1; a[i] = i + 1; i++; i++; } } while (i < 100)

Kostis Sagonas 15 Spring 2012 Kostis Sagonas 16 Spring 2012

Unswitching Unswitching Example

Unswitching is a control-flow transformation that moves loop-invariant conditional branches out if (k == 2) { of loops for (i = 1; i < 100; i++) { for (i = 1; i < 100; i++) { if (k == 2) { if (a[i] > 0) if (k == 2 && a[i] > 0) for (i = 1; i < 100; i++) { for (i = 1; i < 100; i++) a[i] = a[i] + 1; a[i] = a[i] + 1; if (k == 2) a[i] = a[i] + 1; } } a[i] = a[i] + 1; } else { } else { else for (i = 1; i < 100; i++) i = 100; a[i] = a[i] – 1; a[i] = a[i] – 1; } } }

Kostis Sagonas 17 Spring 2012 Kostis Sagonas 18 Spring 2012

3 Outline Branch Optimizations

• Unreachable-Code Elimination Branches to branches are remarkably common! • Straightening – An unconditional branch to an unconditional branch can be • If and Loop Simplifications replaced by a branch to the latter’s target • Loop Inversion and Unswitching – A conditional branch to an unconditional branch can be • Branch Optimizations replaced by the corresponding conditional branch to the • Tail Merging (Cross Jumping) latter branch’s target • Conditional Moves – An unconditional branch to a conditional branch can be • Dead-Code Elimination replaced by a copy of the conditional branch • Branch Prediction – A conditional branch to a conditional branch can be replaced • Peephole Optimization by a conditional branch with the former’s test and the latter’s target as long as the latter condition is true whenever the • Machine Idioms & Instruction Combining former one is

Kostis Sagonas 19 Spring 2012 Kostis Sagonas 20 Spring 2012

Branch Optimization Examples Eliminating Useless Control-Flow

if a == 0 goto L1 if a == 0 goto L2 The Problem: … … – After optimization, the CFG can contain empty blocks L1: if a >= 0 goto L2 L1: if a >= 0 goto L2 … … – “Empty” blocks still end with either a branch or jump L2: … L2: … – Produces jump to jump, which wastes time and space

goto L1 The Algorithm: (Clean) L1: … L1: … – Use four distinct transformations if a == 0 goto L1 – Apply them in a carefully selected order goto L2 if a != 0 goto L2 – Iterate until done L1: … L1: …

Kostis Sagonas 21 Spring 2012 Kostis Sagonas 22 Spring 2012

Eliminating Useless Control-Flow Eliminating Useless Control-Flow

Transformation 1 Both sides of branch target B2 Transformation 2 Merging an empty block – Neither block must be empty – Empty B1 ends with a jump B1 B1 – Replace it with a jump to B1 empty – Coalesce B1 and B2 B1 – Simple rewrite of the last – Move B1’s incoming edges – Eliminates extraneous jump operation in B1 B2 B2 B2 B2 – Faster, smaller code Branch, not a jump How does this happen? How does this happen? Eliminating redundant branches – By rewriting other branches Eliminating empty blocks – By eliminating operations in B1 How do we recognize it? How do we recognize it? – Check each branch – Test for empty block

Kostis Sagonas 23 Spring 2012 Kostis Sagonas 24 Spring 2012

4 Eliminating Useless Control-Flow Eliminating Useless Control-Flow

Transformation 3 Coalescing blocks Transformation 4 Jump to a branch – Neither block must be empty – B1 ends with a jump, B2 is B1 – B1 ends with a jump to B2 B1 B1 empty – Eliminates pointless jump B1 – B2 has one predecessor – Combine the two blocks – Copy branch into end of B1 B2 empty empty – Might make B2 unreachable B2 – Eliminates a jump B2 B2

How does this happen? How does this happen? Eliminating non-empty blocks – By eliminating operations in B1 – By simplifying edges out of B1 Hoisting branches How do we recognize it? from empty blocks How do we recognize it? – Check target of jump – Jump to empty block

Kostis Sagonas 25 Spring 2012 Kostis Sagonas 26 Spring 2012

Eliminating Useless Control-Flow Tail Merging (Cross Jumping)

Putting the transformations together Tail merging applies to basic blocks whose last few instructions – Process the blocks in postorder are identical and that continue to the same location and replaces the matching instructions of one block with a branch to the • Clean up Bi’s successors before Bi corresponding point in the other. • Simplifies implementation and understanding – At each node, apply transformations in a fixed order … … • Eliminate redundant branch r1 = r2 + r3 r1 = r2 + r3 r4 = r3 shl 2 goto L2 • Eliminate empty block r2 = r2 + 1 … • Merge block with successors r2 = r4 – r2 r5 = r4 – 6 • Hoist branch from empty successor goto L1 L2: r4 = r3 shl 2 – May need to iterate … r2 = r2 + 1 r5 = r4 – 6 r2 = r4 – r2 •Postorder ⇒ unprocessed successors along back edges r4 = r3 shl 2 L1: … • Can bound iterations, but deriving a tight bound is hard r2 = r2 + 1 • Must recompute postorder between iterations r2 = r4 – r2 L1: … Kostis Sagonas 27 Spring 2012 Kostis Sagonas 28 Spring 2012

Outline Conditional Moves

• Unreachable-Code Elimination Conditional moves are instructions that copy a source to • Straightening a target if and only if a specified condition is satisfied • If and Loop Simplifications – available in several modern architectures • Loop Inversion and Unswitching – are used to replace simple branching code sequences with • Branch Optimizations non-branching code • Tail Merging (Cross Jumping) • Conditional Moves if a > b goto L1 • Dead-Code Elimination max = b t1 = a > b • Branch Prediction goto L2 max = b • Peephole Optimization L1: max = a max = (t1) a • Machine Idioms & Instruction Combining L2: …

Kostis Sagonas 29 Spring 2012 Kostis Sagonas 30 Spring 2012

5 Conditional Moves help Outline

for (i = 1; i <= n; i++) { for (i = 1; i <= n; i++) { • Unreachable-Code Elimination x = a[i]; x = a[i]; • Straightening if (x>0) u = z * x; w = z * x; u = b[i]; • If and Loop Simplifications else u = b[i]; u = (x>0) w; • Loop Inversion and Unswitching s = s + u; s = s + u; • Branch Optimizations } } • Tail Merging (Cross Jumping) • By using conditional move instructions, we can unroll • Conditional Moves loops containing internal control-flow and end up with • Dead-Code Elimination “straight-line” code • Branch Prediction – helps because is then more effective • Peephole Optimization – works if the two instruction blocks of the if are small in size • Machine Idioms & Instruction Combining

Kostis Sagonas 31 Spring 2012 Kostis Sagonas 32 Spring 2012

Dead-Code Elimination Dead-Code Elimination Example entry entry A variable is dead if it is not used on any path from the location in k is only used the code where it is defined to the exit point of the routine. i = 1 j = 2 to define new i = 1 An instruction is dead if it computes values that are not used on j = 2 k = 3 values for itself! any executable path leading from the instruction. n = 4 n = 4

• Many compiler optimizations create dead code as part of the i = i + j i = i + j division of labor principle: keep each optimization phase as l = j + 1 l = j + 1 simple as possible (to make it easy to implement and maintain) j = j + 2 j = j + 2 and leave it to other passes to clean up the mess… j > n j > n • Detecting dead code local to a procedure is simple • Interprocedural analysis is required to detect dead variables k = k – j return j + i print(l) return j + i with wider visibility print(l)

Kostis Sagonas 33 Spring 2012 Kostis Sagonas 34 Spring 2012

Outline Branch Prediction

• Unreachable-Code Elimination Branch prediction refers to predicting whether a • Straightening conditional branch transfers flow of control or not • If and Loop Simplifications Modern machines rely on branch prediction to make the • Loop Inversion and Unswitching right guess on which instructions to fetch after a branch • Branch Optimizations • Tail Merging (Cross Jumping) Static prediction: the compiler predicts which way the • Conditional Moves branch is likely to go and places its prediction in the • Dead-Code Elimination branch instruction itself • Branch Prediction Dynamic prediction: the hardware remembers for each • Peephole Optimization recently executed branch, which way it went the • Machine Idioms & Instruction Combining previous time and predicts that it will go the same way

Kostis Sagonas 35 Spring 2012 Kostis Sagonas 36 Spring 2012

6 Static Branch Prediction Static vs. Dynamic Branch Prediction

A simple rule used by many machines: Perfect static production results in a dynamic misprediction rate of about 9% for C and about 6% for Backward branches are assumed to be taken, Fortran programs forward branches are assumed to be not-taken Profile-based prediction approaches the accuracy of • When generating code for machines following perfect static prediction Heuristic-based static prediction results in a dynamic this prediction rule, a compiler can order the misprediction rate of about 20% (for C) basic blocks in such a way that the predicted- Hardware-based prediction typically results in a taken branches go towards lower addresses misprediction rate of about 11% (for C) • Several empirically validated heuristics help the Relying on heuristics that mispredict 20% of branches is better compiler predict the direction of a branch than no prediction, but does not suffice in practice!

Kostis Sagonas 37 Spring 2012 Kostis Sagonas 38 Spring 2012

Outline Peephole Optimization

• Unreachable-Code Elimination Peephole optimization is an effective post-pass • Straightening technique for improving assembly code • If and Loop Simplifications • Loop Inversion and Unswitching Basic Idea: • Branch Optimizations • Tail Merging (Cross Jumping) – Discover local improvements by looking at a window • Conditional Moves of the code (a peephole) • Dead-Code Elimination Peephole: a short sequence of (usually contiguous) instructions • slide the peephole over the code, and examine the contents • Branch Prediction – The optimizer replaces the sequence with another • Peephole Optimization equivalent one (but faster) • Machine Idioms & Instruction Combining

Kostis Sagonas 39 Spring 2012 Kostis Sagonas 40 Spring 2012

Peephole Optimization (Cont.) Peephole Optimization Examples

Write peephole optimizations as rewrite rules store r1 ⇒ r0, 8 store r1 ⇒ r0, 8 load r0, 8 ⇒ r2 move r1 ⇒ r2 i1, …, in → j1, …, jm where the RHS is the improved version of the LHS addiu r1, 0 ⇒ r2 mult r3, r1 ⇒ r2 •Example: mult r3, r2 ⇒ r2 move r1⇒r2, move r2⇒r1 → move r1⇒r2 – Works if move r2⇒r1 is not the target of a jump jumpl L1 jumpl L2 L1: jumpl L2 L1: jumpl L2 • Another example: addiu r1, i ⇒ r1 addiu r1, j ⇒ r1 → addiu r1, i+j ⇒ r1

Kostis Sagonas 41 Spring 2012 Kostis Sagonas 42 Spring 2012

7 Peephole Optimization (Cont.) Outline

• Many (but not all) of the basic block (i.e. local) • Unreachable-Code Elimination optimizations can be cast as peephole • Straightening • If and Loop Simplifications optimizations • Loop Inversion and Unswitching – Example: add r1, 0 ⇒ r2 → move r1 ⇒ r2 • Branch Optimizations – Example: move r ⇒ r → • Tail Merging (Cross Jumping) – These two together eliminate add r, 0 ⇒ r • Conditional Moves • Dead-Code Elimination • Just like most compiler optimizations, peephole • Branch Prediction optimizations need to be applied repeatedly to • Peephole Optimization achieve maximum effect • Machine Idioms & Instruction Combining

Kostis Sagonas 43 Spring 2012 Kostis Sagonas 44 Spring 2012

Machine Idioms & Instruction Examples of Instruction Combining Combining Machine idioms are (sequences of) instructions for a If high-order 20 bits of const are all 0 particular architecture that provide a more efficient sethi %hi(const) ⇒ r1 add r0, const ⇒ r1 way of performing a computation than one might use if or r1, %lo(const) ⇒ r1 compiling for a more generic architecture. shl r1, 2 ⇒ r2 Pattern matching is used to recognize opportunities where mult r1, 5 ⇒ r2 add r1, r2 ⇒ r2 – Individual instructions can be substituted by faster and more specialized instructions that achieve the same purpose – Groups of instructions can be combined into a shorter or sub r1, r2 ⇒ r3 subcc r1, r2 ⇒ r3 faster sequence …. …. subcc r1, r2 ⇒ r0 bg L1 bg L1

Kostis Sagonas 45 Spring 2012 Kostis Sagonas 46 Spring 2012

8