15-745 Optimizing Compilers Spring 2008
School of Computer Science School of Computer Science
Back end structure
[Figure: back end pipeline; visible labels: "register allocator", "TempMap"]

After Instruction Selection

  define i32 @erk(i32 %a, i32 %b) {
  entry:
    %tmp3 = mul i32 %a, 37   ; …

  entry: 0x803ce0, LLVM BB @0x801a90, ID#0:
    %reg1024 = MOV32ri 33
    %reg1025 = ADD32rm %reg1024, …
Abstract View

  Assembly            Abstract view (Written <- Read)
  …                   …
  movl $2, t7         t7 <- ()
  movl t7, t11        t11 <- (t7)
  imull t7, t11       t11 <- (t7, t11)
  movl t11, t10       t10 <- (t11)
  imull $37, t10      t10 <- (t10)
  movl t10, t8        t8 <- (t10)
  movl t7, t17        t17 <- (t7)
  addl $1, t17        t17 <- (t17)
  movl $33, %eax      %eax <- ()
  cltd                %edx,%eax <- (%eax)
  idivl t17           %eax,%edx <- (t17,%eax,%edx)
  …                   …

The abstract view, for register allocation purposes

Register allocator's job

The register allocator's job is to assign each temp to a machine register.
If that fails, the register allocator must rewrite the code so that it can succeed.

  TempMap
  t7  : %ebx
  t8  : %ecx
  t10 : %eax
  t11 : %eax
  t17 : %esi
  …
Some terminology

Two temps interfere if at some point in the program they cannot both occupy the same register.

Which temps interfere?

  v <- 1
  w <- v + 3
  x <- w + v
  u <- v
  t <- u + x
  <- w
  <- t
  <- u

A graph-coloring problem

Interference graph: an undirected graph where
•  nodes = temps
•  there is an edge between two nodes if their corresponding temps interfere

[Figure: the same code next to its interference graph over nodes v, w, x, u, t]
A graph-coloring problem

A graph is k-colorable if every node in the graph can be colored with one of k colors such that two adjacent nodes do not have the same color

Assigning k registers = Coloring with k colors

  v <- 1
  w <- v + 3
  x <- w + v
  u <- v
  t <- u + x
  <- w
  <- t
  <- u

[Figure: interference graph over v, w, x, u, t colored with eax, edx, ecx]

History

For early architectures, register allocation was not very important
Early work by Cocke (in 1971) proposed the idea that register allocation can be viewed as a graph coloring problem
Chaitin was the first to implement this idea for the IBM 370 PL/1 compiler, in 1981
In 1982, at IBM, Chaitin's allocator was used for the PL.8 compiler for the IBM 801 RISC system
Today, register allocation is the most essential of code optimizations
History, cont'd

Motivated by the first MIPS architecture, Chow and Hennessy developed priority-based graph coloring in 1984
–  "top down" coloring
Another popular algorithm for register allocation based on graph coloring is due to Briggs in 1992
–  "bottom up" coloring

Steps in register allocation

  Build
  Color
  Spill
Building the interference graph

Given liveness information, we can build the interference graph (IG)
How?

  instruction      live-in
  v <- 1           {}
  w <- v + 3       {v}
  x <- w + v       {w,v}
  u <- v           {w,x,v}
  t <- u + x       {w,u,x}
  <- x             {x,w,u,t}
  <- w             {w,u,t}
  <- t             {u,t}
  <- u             {u}

[Figure: interference graph over v, w, x, u, t]

Edges of Interference Graph

Intuitively:
Two variables interfere if they overlap at some point in the program.
Algorithm: At each point in program, enter an edge for every pair of variables live at that point

An optimized definition & algorithm for edges:

  For each defining inst i
    Let x be definition at inst i
    For each variable y live at end of inst i
      insert an edge between x and y

Faster?
Better quality?

Example:
  {D}
  A = 2 (A2)
  {A,D}
  Edge between A and D
Building the interference graph

  for each defining inst i
    let x be temp defined at inst i
    for all y in LIVE-IN of succ(i)
      insert an edge between x and y

  instruction      live-in
  v <- 1           {}
  w <- v + 3       {v}
  x <- w + v       {w,v}
  u <- v           {w,x,v}
  t <- u + x       {w,u,x}
  <- x             {x,w,u,t}
  <- w             {w,u,t}
  <- t             {u,t}
  <- u             {u}

[Figure: interference graph over v, w, x, u, t]

A Better Interference Graph

  x = 0;
  for(i = 0; i < 10; i++)
  {
    x += i;
  }

  y = global;
  y *= x;

  for(i = 0; i < 10; i++)
  {
    y += i;
  }

What does the interference graph look like?
What's the minimum number of registers needed?
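The edge rule above (connect each defined temp to everything in LIVE-IN of its successor) can be sketched in Python for the straight-line example. This is a minimal sketch, not the course's implementation: the function name `build_interference`, the `(dest, sources)` instruction encoding, and the restriction to a single basic block are assumptions made for illustration.

```python
def build_interference(instrs):
    """Build interference edges for straight-line code.

    instrs: list of (dest, sources) pairs in program order; dest is None
    for an instruction that only reads (written "<- x" on the slides).
    This encoding is a hypothetical one chosen for the sketch.
    """
    live = set()    # live-out set of the instruction being visited
    edges = set()
    for dest, sources in reversed(instrs):
        if dest is not None:
            # the defined temp interferes with everything live after it
            for y in live:
                if y != dest:
                    edges.add(frozenset((dest, y)))
            live.discard(dest)
        live.update(sources)  # now the live-in set of this instruction
    return edges


# The example program from the slide (including the "<- x" use).
PROGRAM = [
    ("v", []), ("w", ["v"]), ("x", ["w", "v"]), ("u", ["v"]),
    ("t", ["u", "x"]), (None, ["x"]), (None, ["w"]), (None, ["t"]),
    (None, ["u"]),
]
EDGES = build_interference(PROGRAM)
print(sorted("-".join(sorted(e)) for e in EDGES))
```

Note that the move `u <- v` produces no edge between u and v, because v is not live after the move; this is what later makes the pair a coalescing candidate.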
Live Ranges & Merged Live Ranges

A live range consists of a definition and all the points in a program (e.g. end of an instruction) in which that definition is live.
–  How to compute a live range?

[Figure: two definitions "a = …" in separate blocks both reach one use "… = a"; label: Merge]

Two overlapping live ranges for the same variable must be merged

Example

Each program point is annotated with {Live Variables} {Reaching Definitions}.

                {}    {}
                A = ... (A1)
                {A}   {A1}
                IF A goto L1
                {A}   {A1}

  fall-through:                        L1:
  {A}   {A1}                           {A}   {A1}
  B = ... (B1)                         C = ... (C1)
  {A,B} {A1,B1}                        {A,C} {A1,C1}
  = A                                  = A
  {B}   {A1,B1}                        {C}   {A1,C1}
  D = B (D2)                           D = ... (D1)
  {D}   {A1,B1,D2}                     {D}   {A1,C1,D1}

                {D}   {A1,B1,C1,D1,D2}
                A = 2 (A2)
                {A,D} {A2,B1,C1,D1,D2}
                = A
                {D}   {A2,B1,C1,D1,D2}
                ret D

A live range consists of a definition and all the points in a program in which that definition is live.
Merging Live Ranges

Merging definitions into equivalence classes:
–  Start by putting each definition in a different equivalence class
–  For each point in a program
   •  if variable is live, and there are multiple reaching definitions for the variable
   •  merge the equivalence classes of all such definitions into one equivalence class

Merged live ranges are also known as "webs"

Example: Merged Live Ranges

                {}
                A = ... (A1)
                {A1}
                IF A goto L1
                {A1}

  fall-through:                        L1:
  {A1}                                 {A1}
  B = ... (B1)                         C = ... (C1)
  {A1,B1}                              {A1,C1}
  = A                                  = A
  {B1}                                 {C1}
  D = B (D2)                           D = ... (D1)
  {D1,2}                               {D1,2}

                {D1,2}
                A = 2 (A2)
                {A2,D1,2}
                = A
                {D1,2}
                ret D

A has two "webs"
makes register allocation easier
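The equivalence-class merging can be sketched with a small union-find. This is an illustrative sketch only: `merge_webs` and its input format (one set of reaching definitions per program point where a live variable has several of them) are assumptions, and the liveness and reaching-definitions analyses are taken as already done.

```python
def merge_webs(definitions, live_reaching):
    """Group definitions into webs (merged live ranges).

    definitions: iterable of definition names, e.g. "A1", "D2".
    live_reaching: iterable of sets; each set holds the definitions of one
        variable that reach a point where that variable is live.
    """
    parent = {d: d for d in definitions}

    def find(d):
        # follow parent links to the class representative (path halving)
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for defs in live_reaching:
        defs = sorted(defs)
        for other in defs[1:]:
            parent[find(other)] = find(defs[0])  # union the two classes

    webs = {}
    for d in parent:
        webs.setdefault(find(d), []).append(d)
    return sorted(sorted(w) for w in webs.values())


# Slide example: after the join, D is live and both D1 and D2 reach it.
# A1 and A2 never reach a common point where A is live, so A keeps two webs.
WEBS = merge_webs(["A1", "A2", "B1", "C1", "D1", "D2"], [{"D1", "D2"}])
print(WEBS)
```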
Steps in register allocation

  Build
  Color
  Spill

Steps in register allocation

  Build
  Color: Prioritize, Select
  Spill
Priority Coloring

Heuristic priority function computes priority for each variable
–  highest priority allocated first
Ideas for priority functions?

  v <- 1
  w <- v + 3
  x <- w + v
  u <- v
  t <- u + x
  <- w
  <- t
  <- u

Priorities:
  v: 4
  w: 3
  x: 2
  u: 3
  t: 2
Order: v, w, u, x, t
k = 3

Priority Coloring

Allocate in priority order
Another heuristic selects which register to assign to variable
–  rotating registers
–  lowest numbered register

Order: v, w, u, x, t
k = 3
[Figure: interference graph over v, w, x, u, t colored with eax, edx, ecx]
Steps in register allocation

  Build
  Color: Simplify, Coalesce, Potential Spill, Select
  Spill

Graph coloring

Once we have the interference graph, we can attempt register allocation by searching for a K-coloring
This is an NP-complete problem (K≥3)*
But a linear-time simplification algorithm by Kempe (in 1879) tends to work well in practice

* [1] H. Bodlaender, J. Gustedt, and J. A. Telle, "Linear-time register allocation for a fixed number of registers," in Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, pp. 574–583, Society for Industrial and Applied Mathematics, 1998.
  [2] S. Kannan and T. Proebsting, "Register allocation in structured programs," in Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, pp. 360–368, Society for Industrial and Applied Mathematics, 1995.
  [3] M. Thorup, "All structured programs have small tree width and good register allocation," Inf. Comput., vol. 142, no. 2, pp. 159–181, 1998.
Kempe's algorithm

Basic observation:
–  given a graph that contains a node with degree less than K, the graph is K-colorable iff the graph without that node is K-colorable
–  step #1: remove a node with degree less than K (and its edges), and repeat

Kempe's algorithm, cont'd

If all nodes are removed by step #1, then the graph is K-colorable
However, the degree …
[Figure: example graph; visible node labels t1, t4]

Kempe's algorithm, cont'd

In step #1, each removed node should be pushed onto a stack
–  when all are removed, we pop each node and put it back into the graph, assigning a suitable color as we go
In case we get stuck (i.e., there are no nodes with degree less than K) …

Example

k = 3
[Figures: interference graph over v, w, x, u, t; nodes are removed one at a time and pushed on the Stack]

Another Example

Now what? (k = 3)
Be optimistic:
-  Put a node with degree ≥ k on stack
-  Lose guarantee that anything we put on stack is colorable
-  If we're lucky this node will still be colorable when popped from stack
Be realistic:
-  If unlucky, this node will have to be spilled (allocated to memory)
-  Mark as potential spill to avoid recomputation later

[Figure: interference graph over v, w, x, u, t; Stack; colors eax, edx, ecx]

Select

Pop a node from the stack
Assign it a color that does not conflict with neighbors in interference graph
This will always be possible, unless the node is a potential spill
If it is not possible, mark as actual spill

[Figure: k = 3; stack as listed on the slide: x, u, w, t, v]

Steps in register allocation

  Build
  Color
  Spill

Spilling to Memory

RISC Architectures
–  Only load and store can access memory
   •  every use requires load
   •  every def requires store
   •  create new temporary for each location
CISC Architectures
–  can operate on data in memory directly
   •  makes writing compiler easier(?), but isn't necessarily faster
–  pseudo-registers inside memory operands still have to be handled

Spilling a use

For an instruction like
–  t <- (u,v)
If u is marked as an actual spill, transform to
–  u' := u (i.e., a load instruction)
–  t <- (u',v)
where u' is a new temp
u and u' are special:
–  u is spilled and thus unallocatable
–  u' is marked as unspillable

Spilling a def

For an instruction like
–  t <- (u,v)
If t is marked as an actual spill, transform to
–  t' <- (u,v)
–  t := t' (i.e., a store instruction)
where t' is a new temp
t and t' are special:
–  t is spilled and thus unallocatable
–  t' is marked as unspillable

Spilled (unallocable) temps

Question: Where do the spilled temps get stored?
Answer: On the stack, in stack slots

           return addr
  ebp ->   old ebp
           …
           slot n
           …
           slot 1
  esp ->   slot 0

Each spilled temp should be allocated into a stack slot
The compiler can maintain a counter for the "next" slot number
To "mark" an actual spill, give it a slot number

Stack slots

In order to create the stack slots at run time, the prelude code needs to modify %esp

  Before:                    After:
  _main:                     _main:
    pushl %ebp                 pushl %ebp
    movl %esp, %ebp            movl %esp, %ebp
    movl %edi, t2              subl $(n*4), %esp
    movl %ebx, t3              movl %edi, t2
    movl %esi, t4              movl %ebx, t3
                               movl %esi, t4

Note that the subl can be generated only after register allocation is finished

Spill code generation

The effect of spill code generation is to turn long live ranges into a bunch of tiny live ranges
This introduces new temps
Hence, register allocation must start over from scratch whenever spill code is generated

Spilling

Allocate w to memory location Mw

  v <- 1
  w1 <- v + 3
  Mw[] <- w1
  w2 <- Mw[]
  x <- w2 + v
  u <- v
  t <- u + x
  <- x
  w3 <- Mw[]
  <- w3
  <- t
  <- u

Spilled variables are allocated to the stack in an area completely controlled by the compiler.
These memory locations are special in that they can be optimized without concern for memory aliasing issues.

Now Start Over...
...compute live ranges...

What to Spill?

When choosing potential spill node want:
–  A node that makes graph easier to color
   •  Fewer spills later
–  A node that isn't "expensive" to spill
   •  An expensive node would slow down the program if spilled
–  We can apply heuristics both when choosing potential spill nodes and when choosing actual spill nodes
   •  not required to spill node that we popped off stack and can't color

A Spill Heuristic

Pick node (live range) n that minimizes:

$$\frac{\sum_{\mathit{def} \in n} 10^{\mathit{depth}(\mathit{def})} + \sum_{\mathit{use} \in n} 10^{\mathit{depth}(\mathit{use})}}{\mathit{degree}(n)}$$

This heuristic prefers nodes that:
–  Are used infrequently
–  Aren't used inside of loops
–  Have a large degree

Could use any one of several other heuristics as well…

Rematerialization

An alternative to spilling
–  Recompute value of variable instead of store/load to memory
–  Example:

  Before:            After:
  v <- 1             v <- 1
  w <- v + 3         w <- v + 3
  x <- w + v         x <- w + v
  u <- v             u <- v
  t <- u + x         t <- u + x
  <- w               w <- 4
  <- t               <- w
  <- u               <- t
                     <- u

Build Take Two

  v <- 1
  w1 <- v + 3
  Mw[] <- w1
  w2 <- Mw[]
  x <- w2 + v
  u <- v
  t <- u + x
  <- x
  w3 <- Mw[]
  <- w3
  <- t
  <- u

Recalculate interference graph
[Figure: k = 3; interference graph over v, x, u, t, w1, w2, w3]

Simplify->Select

[Figure: k = 3; the same graph over v, x, u, t, w1, w2, w3 with its stack]

Another Example

  t1 <- ()
  t2 <- ()
  t3 <- (t1,t2)
  t4 <- (t1,t3)
  t5 <- (t1,t2)
  t6 <- (t4,t5)

Assume 2 machine registers, {r1,r2}. Assume t4 may not be in r1. Then we have the interference graph
[Figure: k = 2; interference graph over t1, t2, t3, t4, t5, t6, r1, r2]

The following slides step through this example with k = 2 (each shows the same graph and the stack so far):
–  Simplification steps…
–  After some simplification steps… (stack as listed: t6, t5, r2, r1)
–  Choosing potential spills… (added: t3 ps, t4 ps)
–  Completing simplification (added: t1, t2)
–  Selecting colors…
–  Actual spills…
–  Select complete!
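The simplify and select phases described in the Kempe's algorithm and Select slides above, including the optimistic treatment of potential spills, can be sketched as follows. This is a sketch under stated assumptions rather than the allocator used in the course: `simplify_select` is a hypothetical name, the potential-spill choice (highest current degree) is one simple stand-in for a real spill heuristic, and colors are plain integers 0..k-1.

```python
def simplify_select(graph, k):
    """Kempe-style simplify, then select with optimistic potential spills.

    graph: dict mapping each node to the set of its neighbors.
    Returns (colors, actual_spills).
    """
    work = {n: set(adj) for n, adj in graph.items()}  # shrinking copy
    stack = []
    while work:
        low = [n for n in work if len(work[n]) < k]
        if low:
            node = min(low)  # any node with degree < k is safe to remove
        else:
            # stuck: push a potential spill (assumed heuristic: max degree)
            node = max(work, key=lambda n: (len(work[n]), n))
        stack.append(node)
        for m in work.pop(node):
            work[m].discard(node)

    colors, actual_spills = {}, []
    while stack:
        node = stack.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            actual_spills.append(node)  # the potential spill became actual
    return colors, actual_spills


# Assumed interference graph for the running v/w/x/u/t example.
GRAPH = {
    "v": {"w", "x"},
    "w": {"v", "x", "u", "t"},
    "x": {"v", "w", "u"},
    "u": {"x", "w", "t"},
    "t": {"w", "u"},
}
print(simplify_select(GRAPH, 3))
print(simplify_select(GRAPH, 2))
```

With k = 3 every node is removed by the degree < k rule, so no spills occur. With k = 2 the triangle v, w, x forces at least one actual spill, after which spill code would be inserted and the whole process restarted.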
[Figures: interference graph over t1–t6, r1, r2 with the select stack (t6, t5, r2, r1); final graph after select]

Spill code generation…

  Original            With spill code     As overlaid on the "…and start over!" slide
  t1 <- ()            t1 <- ()            r2 <- ()
  t2 <- ()            t2 <- ()            r1 <- ()
  t3 <- (t1,t2)       t7 <- (t1,t2)       r1 <- (r2,r1)
                      t3 := t7            slot0 := r1
  t4 <- (t1,t3)       t8 := t3            r1 := slot0
                      t9 <- (t1,t8)       r1 <- (r2,r1)
                      t4 := t9            slot1 := r1
  t5 <- (t1,t2)       t5 <- (t1,t2)       r1 <- (r2,r1)
  t6 <- (t4,t5)       t10 := t4           r2 := slot1
                      t6 <- (t10,t5)      r1 <- (r2,r1)

Notice: Live ranges for t3 and t4 have been chopped up into lots of small live ranges

…and start over!

[Figure: new interference graph over t1, t2, t5, t6, t7, t8, t9, t10, r1, r2]

Steps in register allocation

  Build
  Color: Simplify, Coalesce, Potential Spill, Select
  Spill

Move Coalescing

Eliminate moves by assigning the src and dest to the same register

  movl t1,t2           movl %eax,%eax
  addl t3,t2           addl %edx,%eax           addl %edx,%eax

When can we coalesce t1 and t2?
How can we modify our interference graph to do this?

Example

  v <- 1
  w <- v + 3
  x <- w + v
  u <- v
  t <- u + x
  <- w
  <- t
  <- u

First compute live ranges...
... then construct interference graph
u and v are special:
A move whose source is not live-out of the move is a candidate for coalescing
That is, if the src and dest don't interfere

Example

Want u and v to be assigned same color...
...merge u and v to form a single node
[Figure: interference graph before (nodes v, w, x, u, t) and after merging (nodes uv, w, x, t)]

Is Coalescing Always Good?

[Figure: graph over y, u, x, v, a, b and the coalesced graph with node uv; legend: move edge vs. …]

When should we coalesce?

Always
–  If we run into trouble start un-coalescing
   •  no nodes with degree < k, see if breaking up coalesced nodes fixes
–  Can we always coalesce all coalescable moves?
Only if we can prove it won't cause problems
–  Briggs: Conservative Coalescing
–  George: Iterated Coalescing
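One way to answer "how can we modify our interference graph" for a coalescable move is to merge the two nodes and union their neighbor sets. The sketch below assumes the graph is a dict of neighbor sets and that node names are strings; `coalesce` and the naming of the merged node are illustrative choices, not something the slides specify.

```python
def coalesce(graph, u, v):
    """Return a new interference graph with u and v merged into one node.

    graph: dict mapping each node to the set of its neighbors.
    Raises ValueError if u and v interfere (the move is not coalescable).
    """
    if v in graph[u]:
        raise ValueError(f"{u} and {v} interfere; cannot coalesce")
    merged = u + v  # hypothetical naming: "u" + "v" -> "uv"
    new_graph = {}
    for node, adj in graph.items():
        if node in (u, v):
            continue
        # redirect edges that pointed at u or v to the merged node
        new_graph[node] = {merged if m in (u, v) else m for m in adj}
    new_graph[merged] = (graph[u] | graph[v]) - {u, v}
    return new_graph


# Assumed interference graph for the running example; u <- v is the move.
GRAPH = {
    "v": {"w", "x"},
    "w": {"v", "x", "u", "t"},
    "x": {"v", "w", "u"},
    "u": {"x", "w", "t"},
    "t": {"w", "u"},
}
MERGED = coalesce(GRAPH, "u", "v")
print(MERGED)
```

Here w and x were neighbors of both u and v, so their degree drops by one after the merge, while uv inherits the union of both neighbor sets.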
[Figure: graph over y, u, x, v, a, b; labels "2 colorable" and "3 colorable"]

When we simplify the graph, we remove nodes of degree < k...
want to make sure we will still be able to simplify coalesced node, uv
And the winner is?

Briggs: Conservative Coalescing

•  Can coalesce u and v if:
   –  (# of neighbors of uv with degree ≥ k) < k
•  Why?
   –  Simplify pass removes all nodes with degree < k
   –  # of remaining nodes < k
   –  Thus, uv can be simplified

What does Briggs say about k = 3? k = 2?
[Figure: graph over y, x, u, v, a, b]

George: Iterated Coalescing

Can coalesce u and v if
foreach neighbor t of u
•  t interferes with v (doesn't change degree), or,
•  degree of t < k (removed by simplification)
Resulting node uv will (after simplification) have degree equal to degree of v
Why?
–  let S be set of neighbors of u with degree < k
–  If no coalescing, simplify removes all nodes in S, call that graph G1
–  If we coalesce we can still remove all nodes in S, call that graph G2
–  G2 is a subgraph of G1

George: Iterated Coalescing

k = 4
[Figure: graph with u, v, x1, x2 and S1–S4. No coalescing, after simplification: G1 (nodes u, v, x1, x2). After coalescing and simplification: G2 (nodes uv, x1, x2)]

Why Two Methods?

•  Why not?
•  With Briggs, one needs to look at all neighbors of a & b
•  With George, only need to look at neighbors of a.
So:
–  Use George if one of a & b has very large degree
–  Use Briggs otherwise

Optimistic Coalescing

Aggressively coalesce
If coalesced node spills, uncoalesce
Will this always work?
[Figure: graph over y, u, x, v, a, b and the coalesced graph with node uv]

Steps in register allocation

top-down vs. bottom-up
  top-down: Prioritize, Select
  bottom-up: Simplify, Coalesce, Potential Spill, Select
Which one is better?

Alternative Allocators

Graph allocator, as described, has issues
–  What are they?
Alternative: Single pass graph coloring
–  Build, Simplify, Coalesce as before
–  In select, if can't color with register, color with stack location
   •  Keep going
–  Requires second, reload phase
   •  "fixes" spilled variables
   •  Might require that we reserve a register
   •  Can get messy
Claim: Does a pretty good job
–  Why?
   •  Key is order nodes are colored (top-down)
Advantages? Disadvantages?

Alternative Allocators

Local/Global Allocation (gcc's approach, unless -fnew-ra)
–  Allocate "local" pseudo-registers
   •  Lifetime contained within basic block
   •  Register sufficiency no longer NP-Complete!
–  Allocate global pseudo-registers
   •  Single pass global coloring
   •  Coloring heuristic may reverse local allocation
–  Reload pass to fix spills (allocator does not generate spill code)
–  Can also do global then local
–  Advantages? Disadvantages?

Alternative Allocators

Linear Scan
–  Performs single sweep over live ranges
–  Each range represented by a single interval
–  Greedily allocates/spills
Second Chance Binpacking
–  maintains more state
   •  lifetime holes, register-memory consistency
–  will split a live range
–  less greedy; may reevaluate previous allocations
Advantages? Disadvantages?

In Chaitin's words

"…since I was a mathematician, the register allocation kept getting simpler and faster as I understood better what was required. I preferred to base algorithms on a simple, clean idea that was intellectually understandable rather than write complicated ad hoc computer code…
So I regard the success of this approach, which has been the basis for much future work, as a triumph of the power of a simple mathematical idea over ad hoc hacking. Yes, the real world is messy and complicated, but one should try to base algorithms on clean, comprehensible mathematical ideas and only complicate them when absolutely necessary. In fact, certain instructions were omitted from the 801 architecture because they would have unduly complicated register allocation…"
(G. Chaitin, 2004)

Avoiding Spills

[Chart: Percent Functions With No Spills, top down vs. bottom up, for x86 (8), 68k (16), PPC (32). 1.8 Ghz Pentium 4; -O3 -funroll-loops; gcc version 3.2.2]

Bottom-up vs Top-down: Speed

[Chart: Performance improvement of graph allocator, per SPECint and SPECfp benchmark. 1.8 Ghz Pentium 4; -O3 -funroll-loops -fnew-ra; gcc version 3.2.2]

Bottom-up vs Top-down: Size

[Chart: Percent size improvement of graph allocator, per SPECint and SPECfp benchmark. x86; -Os; gcc version 3.2.2]

Complexity of Register Allocation

Graph coloring is NP-complete
–  what does this tell us about register allocation?
Given arbitrary graph can construct program with matching interference graph [1]
–  simply determining if spilling is necessary is therefore NP-complete… or is it?
Can exploit structure of reducible program [2,3,4]

[1] G.J. Chaitin, M. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, and P. Markstein. Register allocation via coloring. Computer Languages, 6:47-57, 1981.
[2] H. Bodlaender, J. Gustedt, and J. A. Telle, "Linear-time register allocation for a fixed number of registers," in Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, pp. 574–583, Society for Industrial and Applied Mathematics, 1998.
[3] S. Kannan and T. Proebsting, "Register allocation in structured programs," in Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, pp. 360–368, Society for Industrial and Applied Mathematics, 1995.
[4] M. Thorup, "All structured programs have small tree width and good register allocation," Inf. Comput., vol. 142, no. 2, pp. 159–181, 1998.

Complexity of Register Allocation

Complexity of local register allocation?
–  linear algorithm for register sufficiency
SSA Form?
–  interference graph turns out to be both perfect [1] and chordal [2]
   •  can color in linear time
–  BUT all bets are off after SSA elimination [3]

[1] Philip Brisk, Foad Dabiri, Jamie Macbeth, and Majid Sarrafzadeh. Polynomial time graph coloring register allocation. In 14th International Workshop on Logic and Synthesis. ACM Press, 2005.
[2] Sebastian Hack. Interference graphs of programs in SSA-form. Technical Report ISSN 1432-7864, Universitat Karlsruhe, 2005.
[3] Jens Palsberg and Fernando Magno Quintao Pereira. Register allocation after classical SSA elimination is NP-complete. In Proceedings of FOSSACS'06, Foundations of Software Science and Computation Structures. Springer-Verlag (LNCS), Vienna, Austria, March 2006.

Complexity of Register Allocation

Complexity of optimizing spill code?
–  NP-complete even without control flow [1]
Complexity of optimal coalescing?
–  NP-complete [2]

[1] Martin Farach and Vincenzo Liberatore. On local register allocation. In 9th ACM-SIAM symposium on Discrete Algorithms, pages 564–573. ACM Press, 1998.
[2] Andrew W. Appel and Lal George. Optimal spilling for CISC machines with few registers. In Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, pages 243–253. ACM Press, 2001.
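As a companion to the Briggs and George coalescing slides, the two conservative tests can be written as small predicates over a neighbor-set graph. This is a sketch under assumptions: the names `briggs_ok` and `george_ok` are hypothetical, degrees for the Briggs test are taken in the graph as it would look after merging, and the example graph is the assumed interference graph of the running v/w/x/u/t program with the move `u <- v`.

```python
def briggs_ok(graph, u, v, k):
    """Briggs: uv may have fewer than k neighbors of degree >= k."""
    combined = (graph[u] | graph[v]) - {u, v}

    def degree_after_merge(t):
        # a neighbor of both u and v loses one edge when they merge
        d = len(graph[t])
        return d - 1 if (u in graph[t] and v in graph[t]) else d

    significant = [t for t in combined if degree_after_merge(t) >= k]
    return len(significant) < k


def george_ok(graph, u, v, k):
    """George: every neighbor t of u interferes with v or has degree < k."""
    return all(t in graph[v] or len(graph[t]) < k
               for t in graph[u] if t != v)


GRAPH = {
    "v": {"w", "x"},
    "w": {"v", "x", "u", "t"},
    "x": {"v", "w", "u"},
    "u": {"x", "w", "t"},
    "t": {"w", "u"},
}
for k in (3, 2):
    print(k, briggs_ok(GRAPH, "u", "v", k), george_ok(GRAPH, "u", "v", k))
```

On this graph both tests accept the merge for k = 3 and reject it for k = 2. The George test only walks the neighbors of one node, which is the reason the slides suggest it when the other node has very large degree.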