Dead Code Elimination & Constant Propagation on SSA form
Slides mostly based on Keith Cooper's set of slides (COMP 512 class at Rice University, Fall 2002). Used with kind permission.
Using SSA – Dead Code Elimination
Dead code elimination
• Conceptually similar to mark-sweep garbage collection
  > Mark useful operations
  > Everything not marked is useless
• Need an efficient way to find and to mark useful operations
  > Start with critical operations
  > Work back up SSA edges to find their antecedents
Define critical
• I/O statements, linkage code (entry & exit blocks), return values, calls to other procedures
Algorithm will use post-dominators & reverse dominance frontiers
KT2, 2003 2
Using SSA – Dead Code Elimination
Mark
    WorkList ← Ø
    for each op i
        clear i's mark
        if i is critical then
            mark i
            add i to WorkList
    while (WorkList ≠ Ø)
        remove i from WorkList
        (i has form "x ← y op z")
        if def(y) is not marked then
            mark def(y)
            add def(y) to WorkList
        if def(z) is not marked then
            mark def(z)
            add def(z) to WorkList
        for each b ∈ RDF(block(i))
            mark the block-ending branch in b
            add it to WorkList

Sweep
    for each op i
        if i is not marked then
            if i is a branch then
                rewrite i with a jump to i's nearest useful post-dominator
            if i is not a jump then
                delete i

Notes:
• Eliminates some branches
• Reconnects dead branches to the remaining live code
• Find useful post-dominator by walking the post-dominator tree
  > Entry & exit nodes are useful
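The Mark phase above can be sketched in Python over a toy SSA IR. This is a minimal, illustrative sketch, assuming each op is a tuple (name, args, is_critical), where `name` is the SSA value the op defines and `args` are the SSA names it uses; the RDF step (marking control-dependent branches) and the branch-rewriting part of Sweep are omitted for brevity, and every identifier here is hypothetical.

```python
def mark_useful(ops):
    """Return the set of defs transitively reachable from critical ops."""
    # toy IR assumption: ops is a list of (name, args, is_critical) tuples
    def_of = {name: (name, args, crit) for name, args, crit in ops}
    worklist = [op for op in ops if op[2]]        # start with critical ops
    marked = {op[0] for op in worklist}
    while worklist:
        _, args, _ = worklist.pop()
        for arg in args:                          # walk back up SSA edges
            if arg in def_of and arg not in marked:
                marked.add(arg)
                worklist.append(def_of[arg])
    return marked

def sweep(ops, marked):
    """Delete every unmarked op (branch rewriting is not modeled here)."""
    return [op for op in ops if op[0] in marked]
```

On a straight-line example where only the return uses `b`, which uses `a`, an unrelated constant `c` stays unmarked and is swept away.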
Using SSA – Dead Code Elimination
Handling Branches
• When is a branch useful?
  > When a useful operation depends on its existence
In the CFG, j is control dependent on i if
1. ∃ a non-null path p from i to j such that j post-dominates every node on p after i
2. j does not strictly post-dominate i
• j control dependent on i ⇒ one path from i leads to j, one doesn't
• This is the reverse dominance frontier of j (RDF(j))
Algorithm uses RDF(n) to mark branches as live
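Both the nearest-useful-post-dominator step and RDF(n) rest on the post-dominator relation, which can be computed by running the classic iterative dominator data-flow on the reversed CFG. A sketch, assuming `succ` maps every node to its successor list and the CFG has the single exit node passed in (so every other node has at least one successor); the function name and representation are illustrative:

```python
def postdominators(succ, exit_node):
    nodes = set(succ)
    # optimistic start: every node post-dominates every node
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # n post-dominates itself, plus whatever all successors agree on
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom
```

On a diamond CFG (entry branching to a and b, both falling into exit), entry's only proper post-dominator is exit, since neither branch arm post-dominates it.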
Using SSA – Dead Code Elimination
What's left?
• Algorithm eliminates useless definitions & some useless branches
• Algorithm leaves behind empty blocks & extraneous control-flow
Two more issues
• Simplifying control-flow
• Eliminating unreachable blocks
Both are CFG transformations (no need for SSA)

Algorithm from: Cytron, Ferrante, Rosen, Wegman, & Zadeck, Efficiently Computing Static Single Assignment Form and the Control Dependence Graph, ACM TOPLAS 13(4), October 1991, with a correction due to Rob Shillner
Constant Propagation
Safety
• Proves that name always has known value
• Specializes code around that value
  > Moves some computations to compile time (⇒ code motion)
  > Exposes some unreachable blocks (⇒ dead code)
Opportunity
• Value ≠ ⊥ signifies an opportunity
Profitability
• Compile-time evaluation is cheaper than run-time evaluation
• Branch removal may lead to block coalescing (CLEAN)
  > If not, it still avoids the test & makes the branch predictable
Using SSA — Sparse Constant Propagation
    ∀ expression e
        Value(e) ←  TOP  if its value is unknown
                    ci   if its value is known
                    BOT  if its value is known to vary
    WorkList ← Ø
    ∀ SSA edge s = <u,v>
        if Value(u) ≠ TOP then
            add s to WorkList

    while (WorkList ≠ Ø)
        remove s = <u,v> from WorkList
        let o be the operation that uses v
            (i.e., o is "a ← b op v" or "a ← v op b")
        if Value(o) ≠ BOT then
            t ← result of evaluating o
            if t ≠ Value(o) then
                Value(o) ← t
                ∀ SSA edge <o,x>, add <o,x> to WorkList

Evaluating a Φ-node:
    Φ(x1, x2, x3, …, xn) is Value(x1) ∧ Value(x2) ∧ Value(x3) ∧ … ∧ Value(xn)

Rules for ∧:
    TOP ∧ x = x, ∀ x
    ci ∧ cj = ci if ci = cj
    ci ∧ cj = BOT if ci ≠ cj
    BOT ∧ x = BOT, ∀ x

Performs ∧ only at Φ-nodes — same result, fewer ∧ operations
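The lattice and its ∧ rules can be sketched directly. A minimal sketch, assuming TOP and BOT are sentinel strings standing in for the lattice's top and bottom, with any other value treated as a constant ci:

```python
TOP, BOT = "TOP", "BOT"

def meet(a, b):
    if a == TOP:                 # TOP ∧ x = x
        return b
    if b == TOP:
        return a
    if a == BOT or b == BOT:     # BOT ∧ x = BOT
        return BOT
    return a if a == b else BOT  # ci ∧ cj = ci if equal, else BOT

def eval_phi(arg_values):
    """A Φ-node's value is the meet of the values on its incoming edges."""
    v = TOP
    for a in arg_values:
        v = meet(v, a)
    return v
```

For example, meeting TOP with 12 yields 12 (still a constant), while meeting 12 with 11 yields BOT (the value varies).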
Using SSA — Sparse Constant Propagation
How long does this algorithm take to halt?
• Initialization is two passes
  > |ops| + 2 × |ops| edges
• Value(x) can take on 3 values
  > TOP, ci, BOT
  > Each use can be on WorkList twice
  > 2 × |args| = 4 × |ops| evaluations, WorkList pushes & pops

(The lattice: TOP on top, the constants … ci, cj, ck, cl, cm, cn, … in the middle, BOT at the bottom.)
This is an optimistic algorithm:
• Initialize all values to TOP, unless they are known constants
• Every value becomes BOT or ci, unless its use is uninitialized
Sparse Conditional Constant Propagation
Optimism

Consider this loop:

    i0 ← 12
    while ( … )
        i1 ← Φ(i0, i3)
        x  ← i1 * 17
        j  ← i1
        i2 ← …
        i3 ← j

• This version of the algorithm is an optimistic formulation
• Initializes values to TOP
• Prior version used ⊥ (implicit)

Clear initializations: i0 ≡ 12, everything else starts at TOP.
Optimistic evaluation leads to: i1 ≡ 12 ∧ TOP ≡ 12, so x ≡ 12 * 17 ≡ 204, j ≡ 12, i3 ≡ 12, and re-evaluating gives i1 ≡ 12 ∧ 12 ≡ 12: the loop's i is the constant 12.
Pessimistic initialization (i ≡ ⊥ at its definition) leads to: x ≡ ⊥ * 17 ≡ ⊥ and j ≡ ⊥, so nothing is known.

In general
• Optimism helps inside loops
• Largely a matter of initial value
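The contrast between the two initializations can be traced by hand with a meet function. A sketch of two evaluations of i1 ← Φ(i0, i3) from the loop above, assuming TOP and BOT are sentinel strings for the lattice's extremes:

```python
TOP, BOT = "TOP", "BOT"

def meet(a, b):
    """Lattice meet: TOP is identity, BOT absorbs, constants must agree."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT

i0 = 12                  # i0 ← 12 before the loop

# Optimistic: the back-edge value i3 starts at TOP
i3 = TOP
i1 = meet(i0, i3)        # first evaluation of Φ(i0, i3): 12 ∧ TOP = 12
i3 = i1                  # j ← i1; i3 ← j carries 12 around the back edge
opt_i1 = meet(i0, i3)    # re-evaluation: 12 ∧ 12 = 12, a constant

# Pessimistic: the back-edge value starts at BOT
pess_i1 = meet(i0, BOT)  # 12 ∧ BOT = BOT, nothing is known
```

The optimistic start lets the constant survive the trip around the back edge; the pessimistic start destroys it before the loop body is ever examined.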
M.N. Wegman & F.K. Zadeck, Constant propagation with conditional branches, ACM TOPLAS, 13(2), April 1991, pages 181–210.
Sparse Constant Propagation
What happens when it propagates a value into a branch?
• TOP ⇒ we gain no knowledge
• BOT ⇒ either path can execute
• TRUE or FALSE ⇒ only one path can execute
(But the algorithm does not use this … )
Working this into the algorithm
• Use two worklists: SSAWorkList & CFGWorkList
  > SSAWorkList determines values
  > CFGWorkList governs reachability
• Don't propagate into operation until its block is reachable
Sparse Conditional Constant Propagation
Initialization Step

    CFGWorkList ← { n0 }
    SSAWorkList ← Ø
    ∀ block b
        clear b's mark
        ∀ expression e in b
            Value(e) ← TOP

Propagation Step

    while ((CFGWorkList ∪ SSAWorkList) ≠ Ø)
        while (CFGWorkList ≠ Ø)
            remove b from CFGWorkList
            mark b
            evaluate each Φ-function in b
            evaluate each op in b, in order
        while (SSAWorkList ≠ Ø)
            remove s = <u,v> from SSAWorkList
            let o be the operation that contains v
            t ← result of evaluating o
            if t ≠ Value(o) then
                Value(o) ← t
                ∀ SSA edge <o,x>, if block(x) is marked, add <o,x> to SSAWorkList

To evaluate a branch
    if arg is BOT then
        put both targets on CFGWorkList
    else if arg is TRUE then
        put TRUE target on CFGWorkList
    else if arg is FALSE then
        put FALSE target on CFGWorkList
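The algorithm above can be sketched compactly over a toy IR. This is an illustrative sketch, not the paper's implementation: the block layout, field names, and helpers are assumptions, each block holding "phis" (dest plus (predecessor, source-name) pairs), "ops" (dest, evaluator, args), and a "term" that is ("br", target), ("cbr", cond, t, f), or ("ret",). For brevity it iterates the reachable blocks to a fixed point instead of managing a separate SSA worklist; since the lattice has height 2 and values only descend, it settles quickly.

```python
TOP, BOT = "TOP", "BOT"

def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT

def sccp(blocks, entry):
    value = {}                       # SSA name -> TOP / constant / BOT
    reachable = {entry}

    def val(x):                      # integer literals stand for themselves
        return x if isinstance(x, int) else value.get(x, TOP)

    def visit(name):
        """Evaluate one block; return its currently feasible successors."""
        b = blocks[name]
        for dest, incoming in b["phis"]:
            v = TOP                  # unexecuted CFG edges contribute TOP
            for pred, src in incoming:
                if pred in reachable:
                    v = meet(v, val(src))
            value[dest] = v
        for dest, fn, srcs in b["ops"]:
            args = [val(s) for s in srcs]
            if TOP in args:
                value[dest] = TOP    # TOP checked first, so TOP op BOT -> TOP
            elif BOT in args:
                value[dest] = BOT
            else:
                value[dest] = fn(*args)
        term = b["term"]
        if term[0] == "br":
            return [term[1]]
        if term[0] == "cbr":         # condition assumed to evaluate to a bool
            c = val(term[1])
            if c == BOT:
                return [term[2], term[3]]
            if c == TOP:
                return []
            return [term[2]] if c else [term[3]]
        return []                    # ("ret",) ends the procedure

    while True:                      # values only descend and blocks only
        old_vals = dict(value)       # become reachable, so this terminates
        old_reach = set(reachable)
        for name in blocks:
            if name in reachable:
                for succ in visit(name):
                    reachable.add(succ)
        if value == old_vals and reachable == old_reach:
            return value, reachable
```

On the i ← 17 example used later in these slides, only the TRUE target becomes reachable, the dead else-block contributes TOP to the join's Φ, and k folds to the constant 170.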
Sparse Conditional Constant Propagation
There are some subtle points
• Branch conditions should not be TOP when evaluated
  > Indicates an upwards-exposed use (no initial value)
  > Hard to envision compiler producing such code
• Initialize all operations to TOP
  > Block processing will fill in the non-TOP initial values
  > Unreachable paths contribute TOP to Φ-functions
• Code shows CFG edges first, then SSA edges
  > Can intermix them in arbitrary order (correctness)
  > Taking CFG edges first may help with speed (minor effect)
Sparse Conditional Constant Propagation
More subtle points
• TOP * BOT → TOP
  > If TOP becomes 0, then 0 * BOT → 0
  > This prevents non-monotonic behavior for the result value
  > Uses of the result value might otherwise go irretrievably to BOT
  > Similar effects with any operation that has a "zero"
• Some values reveal simplifications, rather than constants
  > BOT * ci → BOT, but might turn into shifts & adds (ci = 2, BOT ≥ 0)
  > Removes commutativity (reassociation)
  > BOT ** 2 → BOT * BOT (vs. series or call to library)
• cbr TRUE → L1, L2 becomes br → L1
  > Method discovers this; it must rewrite the code, too!
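That last rewriting step can be sketched directly: once the propagator proves a branch condition constant, the conditional branch is replaced by a jump. The tuple shapes here are illustrative assumptions for a toy IR, not a real compiler's representation:

```python
def fold_branch(op, cond_value):
    """Rewrite ("cbr", cond, L1, L2) to a jump when cond is known."""
    _, _, t_target, f_target = op
    if cond_value is True:
        return ("br", t_target)      # cbr TRUE → L1, L2 becomes br → L1
    if cond_value is False:
        return ("br", f_target)
    return op                        # condition varies: keep the cbr
```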
Sparse Conditional Constant Propagation
Unreachable Code

Consider this example:

    i ← 17
    if (i > 0) then
        j1 ← 10
    else
        j2 ← 20
    j3 ← Φ(j1, j2)
    k ← j3 * 17

With SCC propagation, i ≡ 17, so the test is TRUE and only the then-block is marked executable; the unreachable else-block contributes TOP, so j3 ≡ 10 ∧ TOP ≡ 10 and k ≡ 170. With all paths assumed executing, j3 ≡ 10 ∧ 20 ≡ ⊥ and k ≡ ⊥.

• Initialization to TOP is still important
• Unreachable code keeps TOP
• ∧ with TOP has desired result
• Cannot get this any other way
• DEAD code elimination cannot test (i > 0)
• DEAD marks j2 as useful

In general, combining two optimizations can lead to answers that cannot be produced by any combination of running them separately. This algorithm is one example of that general principle. Combining register allocation & instruction scheduling is another …
Using SSA Form for Compiler Optimizations
In general, using SSA conversion leads to
• Cleaner formulations
• Better results
• Faster algorithms

We've seen two SSA-based algorithms.
• Dead-code elimination
• Sparse conditional constant propagation
We'll see more …