15-745 Optimizing Spring 2008

School of Computer Science School of Computer Science

Back end structure After define i32 @erk(i32 %a, i32 %b) {! entry:! ! %tmp3 = mul i32 %a, 37!! ; [#uses=1]! ! %tmp4 = mul i32 %tmp3, %a!! ; [#uses=1]! instruction ! %tmp6 = add i32 %b, 33!! ; [#uses=1]! Assem ! %tmp7 = sdiv i32 %tmp4, %tmp6!! ; [#uses=1]! IR selector ! %tmp9.neg = xor i32 %a, -1!! ; [#uses=1]! ! %tmp10 = add i32 %tmp7, %tmp9.neg!! ; [#uses=1]! ! ret i32 %tmp10! }!

register entry: 0x803ce0, LLVM BB @0x801a90, ID#0:! allocator TempMap ! %reg1024 = MOV32ri 33! ! %reg1025 = ADD32rm %reg1024, , 1, %NOREG, 0! ! %reg1026 = MOV32rm , 1, %NOREG, 0! ! %reg1027 = IMUL32rr %reg1026, %reg1026! ! %reg1028 = IMUL32rri8 %reg1027, 37! ! %EAX = MOV32rr %reg1028! instruction ! %EAX = CDQ %EDX, %EAX! Assem ! IDIV32r %reg1025, %EAX, %EDX, %EAX, %EDX! scheduler ! %reg1029 = MOV32rr %EAX! ! %reg1030 = NOT32r %reg1026! ! %reg1031 = ADD32rr %reg1029, %reg1030! ! %EAX = MOV32rr %reg1031! ! RET!

School of Computer Science School of Computer Science

1 Abstract View Register allocator’s job

…! … …Written Read t7 <- 2! movl $2, t7 t7 <- () The register allocator’s t11 <- t7! movl t7, t11 t11 <- (t7) t11 <- (t7, t11)! job is to assign each t10 <- t11! imull t7, t11 t11 <- (t7, t11) temp to a machine t10 <- t10! movl t11, t10 t10 <- (t11) t8 <- t10! imull $37, t10 t10 <- (t10) register. t17 <- t7! movl t10, t8 t8 <- (t10) t17 <- t17! movl t7, t17 t17 <- (t7) %eax <- ! addl $1, t17 t17 <- (t17) %edx <- (%eax,%edx)! movl $33, %eax %eax <- () If that fails, the register %eax,%edx <- t17! cltd %edx,%eax <- (%eax) allocator must rewrite …! idivl t17 %eax,%edx <- (t17,%eax,%edx) the code so that it can … … … t7 : %ebx succeed. t8 : %ecx t10 : %eax The t11 : %eax Abstract view, for register allocation purposes t17 : %esi TempMap … School of Computer Science School of Computer Science

Some terminology A graph-coloring problem Two temps interfere if at some point in the program they cannot both occupy the same Interference graph: an undirected graph where •!nodes = temps register. •!there is an edge between two nodes if their v corresponding temps interfere

v <- 1 x w Which temps interfere? v <- 1 w <- v + 3 w <- v + 3 x <- w + v x <- w + v u <- v u <- v t <- u + x u t <- u + x <- w <- w <- t <- t <- u <- u t

School of Computer Science School of Computer Science

2 A graph-coloring problem History

For early architectures, register allocation was not very A graph is k-colorable if every node in the graph can important be colored with one of k colors such that two adjacent Early work by Cocke (in 1971) proposed the idea that nodes do not have the same color v register allocation can be viewed as a problem Assigning k registers = Coloring with k colors Chaitin was the first to implement this idea for the IBM 370 PL/1 , in 1981 x w v <- 1 In 1982, at IBM, Chaitin’s allocator was used for the PL.8 w <- v + 3 compiler for the IBM 801 RISC system x <- w + v eax u <- v Today, register allocation is the most essential of code edx u optimizations t <- u + x ecx <- w <- t <- u t

School of Computer Science School of Computer Science

History, cont’d Steps in register allocation Motivated by the first MIPS architecture, Chow and Hennessy developed priority-based graph coloring in 1984 “top down” coloring Build

Another popular algorithm for register Color allocation based on graph coloring is due to Briggs in 1992 Spill “bottom up” coloring

School of Computer Science School of Computer Science

3 Building the interference graph Edges of Interference Graph Given liveness information, we can build the Intuitively: interference graph (IG) Two variables interfere if they overlap at some point in the program. Algorithm: At each point in program, enter an edge for every pair of variables at that point How? v v <- 1 {} w <- v + 3 {v} An optimized definition & algorithm for edges: x <- w + v {w,v} x w u <- v {w,x,v} "" For each defining inst i Let x be definition at inst i t <- u + x {w,u,x} For each variable y live at end of inst i <- x {x,w,u,t} u insert an edge between x and y <- w {w,u,t} Faster? <- t {u,t} {D} A = 2 (A ) Edge between A and D <- u {u} Better quality? 2 {A,D} t

School of Computer Science School of Computer Science

Building the interference graph A Better Interference Graph for each defining inst i! let x be temp defined at inst i" x = 0;! What does the interference for all y in LIVE-IN of succ(i)! for(i = 0; i < 10; i++)! graph look like? insert an edge between x and y! {! x += i;! v What’s the minimum }! number of registers v <- 1 {} needed? w <- v + 3 {v} x w y = global;! x <- w + v {w,v} y *= x;! u <- v {w,x,v} t <- u + x {w,u,x} <- x {x,w,u,t} u for(i = 0; i < 10; i++)! <- w {w,u,t} {! <- t {u,t} y += i;! <- u {u} t }!

School of Computer Science School of Computer Science

4 Live Ranges & Merged Live Ranges Example Live Variables A live range consists of a definition and all Reaching Definitions {} {} A = ... (A1) {A} {A } IF A goto L1 1 the points in a program (e.g. end of an {A} {A1} instruction) in which that definition is live. {A} {A1} B = ... (B ) 1 L1: {A} {A } {A,B} {A1,B1} = A 1 = ... (C1) –!How to compute a live range? {B} {A1,B1} {A,C} {A1,C1} D = B (D2) = A {D} {A1,B1,D2} {C} {A1,C1} D = ... (D1) a = … a = … {D} {A1,C1,D1}

{D} {A ,B ,C ,D ,D } A = 2 (A ) 1 1 1 1 2 Merge … = a 2 {A,D} {A ,B ,C ,D ,D } 2 1 1 1 2 Two overlapping live ranges for the same {A,D} {A2,B1,C1,D1,D2} {D} {A ,B ,C ,D ,D } = A variable must be merged 2 1 1 1 2 ret D A live range consists of a definition and all the points in a program in which that definition is live.

School of Computer Science School of Computer Science

Merging Live Ranges Example: Merged Live Ranges

Merging definitions into equivalence classes: {} A = ... (A1) {A } –! Start by putting each definition in a different equivalence class IF A goto L1 1 {A1} –! For each point in a program

•! if variable is live, {A1} B = ... (B ) 1 L1: {A } and there are multiple reaching definitions for the variable {A1,B1} = A 1 C = ... (C1) {B1} {A1,C1} •! merge the equivalence classes of all such definitions into a one D = B (D2) = A {D1,2} {C1} equivalence class D = ... (D1) {D1,2}

{D1,2} A = 2 (A2) {A2,D1,2}

{A ,D } 2 1,2 = A {D1,2} Merged live ranges are also known as “webs” A has two “webs” ret D makes register allocation easier

School of Computer Science School of Computer Science

5 Steps in register allocation Steps in register allocation

Build Build Prioritize Color Color Select Spill Spill

School of Computer Science School of Computer Science

Priority Coloring Priority Coloring Heuristic priority function computes Allocate in priority order k = 3 k = 3 priority for each variable Another heuristic selects which –!highest priority allocated first v register to assign to variable v – rotating registers Ideas for priority functions? ! –!lowest numbered register x w x w v <- 1 Priorities: w <- v + 3 v: 4 w: 3 x <- w + v eax x: 2 u u u <- v edx u: 3 t <- u + x ecx <- w t: 2 <- t Order: Order: t t <- u v, w, u, x, t v, w, u, x, t

School of Computer Science School of Computer Science

6 Steps in register allocation Graph coloring

Once we have the interference graph, we can attempt Simplify register allocation by searching for a K-coloring Build This is an NP-complete problem (K!3)* Coalesce But a linear-time simplification algorithm by Kempe (in 1879) Color tends to work well in practice Potential Spill Spill Select * [1] H. Bodlaender, J. Gustedt, and J. A. Telle, “Linear-time register allocation for a fixed number of registers,” in Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, pp. 574–583, Society for Industrial and Applied Mathematics, 1998. [2] S. Kannan and T. Proebsting, “Register allocation in structured programs,” in Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, pp. 360–368, Society for Industrial and Applied Mathematics, 1995. [3] M. Thorup, “All structured programs have small tree width and good register allocation,” Inf. Comput., vol. 142, no. 2, pp. 159–181, 1998. School of Computer Science School of Computer Science

Kempe’s algorithm Kempe’s algorithm, cont’d

Basic observation: If all nodes are removed by step #1, then –!given a graph that contains a node with the graph is K-colorable degree less than K, the graph is K-colorable However, the degree

t1 t4 School of Computer Science School of Computer Science

7 Kempe’s algorithm, cont’d Example

In step #1, each removed node should be pushed onto a k = 3 stack v –!when all are removed, we pop each node and put it back into v the graph, assigning a suitable color as we go In case we get stuck (i.e., there are no nodes with x degree

u u

t t Stack School of Computer Science School of Computer Science

Example Another Example Now what? k = 3 k = 3 Be optimistic: v - Put a node with degree ! k on stack v v - Lose guarantee that anything we put on stack is colorable x x w -! If we’re lucky this node will still be x w colorable when popped from stack w Be realistic: t u u -! If unlucky, this node will have to be u spilled (allocated to memory) w eax -! Mark as potential spill to avoid t edx recomputation later t ecx t v Stack School of Computer Science School of Computer Science

8 Select Steps in register allocation

Pop a node from the stack k = 3 Assign it a color that does not v conflict with neighbors in Build interference graph x This will always be possible, x w unless the node is a potential spill u Color

t If it is not possible, mark as u Spill actual spill w

t v

School of Computer Science School of Computer Science

Spilling to Memory Spilling a use RISC Architectures –! Only load and store can access memory For an instruction like •! every use requires load –!t <- (u,v) •! every def requires store If u is marked as an actual spill, transform to •! create new temporary for each location –!u’ := u (i.e., a load instruction) –!t <- (u’,v) CISC Architectures where u’ is a new temp –! can operate on data in memory directly u and u’ are special: •! makes writing compiler easier(?), but isn’t necessarily faster –!u is spilled and thus unallocatable –! pseudo-registers inside memory operands still have to be handled –!u’ is marked as unspillable

School of Computer Science School of Computer Science

9 Spilling a def Spilled (unallocable) temps

For an instruction like Question: Where do the spilled temps get stored? –!t <- (u,v) Answer: On the stack, in stack slots If t is marked as an actual spill, transform to –!t’ <- (u,v) return addr –!t := t’ (i.e., a store instruction) ebp old ebp where t’ is a new temp Each spilled temp … should be allocated into t and t’ are special: a stack slot –!t is spilled and thus unallocatable slot n –!t’ is marked as unspillable … The compiler can slot 1 maintain a counter for esp slot 0 the “next” slot number

To “mark” an actual spill, give it a slot number

School of Computer Science School of Computer Science

Stack slots Spill code generation

In order to create the stack slots at run time, The effect of spill code generation is to turn the prelude code needs to modify %esp long live ranges into a bunch of tiny live ranges _main: _main: This introduces new temps pushl %ebp pushl %ebp movl %esp, %ebp movl %esp, %ebp Hence, register allocation must start over from movl %edi, t2 movl %edi, t2 movl %ebx, t3 movl %ebx, t3 scratch whenever spill code is generated movl %esi, t4 movl %esi, t4 subl $(n"4), %esp

Note that the subl can be generated only after register allocation is finished

School of Computer Science School of Computer Science

10 Spilling What to Spill? v <- 1 Allocate w to memory

w1 <- v + 3 location Mw When choosing potential spill node want: w1 Mw[]<- w1 Spilled variables are allocated to –!A node that makes graph easier to color the stack in an area completely w2 <- Mw[] w2 •! Fewer spills later controlled by the compiler. x <- w2 + v These memory locations are –!A node that isn’t “expensive” to spill v special in that they can be •! An expensive node would slow down the program if spilled u <- v optimized without concern for memory aliasing issues. t <- u + x x –!We can apply heuristics both when choosing potential spill nodes and when choosing actual spill nodes <- x •! not required to spill node that we popped off stack and can’t w3 color w3 <- Mw[] t <- w3 Now Start Over... <- t u ...compute live ranges... <- u

School of Computer Science School of Computer Science

A Spill Heuristic

Pick node (live range) n that minimizes: An alternative to spilling

depth(def ) depth(use ) –!Recompute value of variable instead of store/ #10 + #10 load to memory def "n use "n –!Example: degree(n) v <- 1 v <- 1 w <- v + 3 w <- v + 3 This heuristic prefers nodes that: x <- w + v x <- w + v –! Are used infrequently u <- v u <- v –! Aren’t used inside of loops t <- u + x t <- u + x –! Have a large degree <- w w <- 4 ! <- t <- w Could use any one of several other heuristics as well… <- u <- t <- u School of Computer Science School of Computer Science

11 Build Take Two Simplify->Select

v <- 1 k = 3 w <- v + 3 k = 3 1 v v Mw[]<- w1

w2 <- Mw[] x <- w + v x w w w 2 1 2 3 x w1 w2 w3 u <- v t <- u + x u <- x u

w3 <- Mw[] <- w 3 t t <- t Recalculate interference graph <- u School of Computer Science School of Computer Science

Another Example Simplification steps…

t3 t1 t5 t1 <- () t3 t1 t5 t2 <- () t2 t4 t3 <- (t1,t2) t4 <- (t1,t3) t2 t4 t5 <- (t1,t2) t6 r1 r2 t6 <- (t4,t5) t6 r1 r2

Assume 2 machine registers, {r1,r2}. Assume t4 may not be in r1.

Then we have the interference graph k = 2 School of Computer Science School of Computer Science

12 After some simplification steps… Choosing potential spills…

t6 t6 t3 t1 t5 t5 t3 t1 t5 t5 r2 r2 r1 r1 t2 t4 t2 t4 t3 ps t4 ps

t6 r1 r2 t6 r1 r2

k = 2 k = 2 School of Computer Science School of Computer Science

Completing simplification Selecting colors…

t6 t6 t3 t1 t5 t5 t3 t1 t5 t5 r2 r2 r1 r1 t2 t4 t3 ps t2 t4 t3 ps t4 ps t4 ps t1 t6 r1 r2 t2 t6 r1 r2

k = 2 k = 2 School of Computer Science School of Computer Science

13 Actual spills… Select complete!

t6 t3 t1 t5 t5 t3 t1 t5 r2 r1 t2 t4 t2 t4

t6 r1 r2 t6 r1 r2

School of Computer Science School of Computer Science

Spill code generation… …and start over!

t1 <- ()#() r2t1 <-<- ()() t3 t1 t5 t8 t1 t5 t10 t2 <- ()#() r1t2 <-<- ()() t3t7 <- (t1,t2)#(t1,t2) r1t7 <-<- (r2,r1)(t1,t2) t2 t4 t3 # t2 slot0t3 t4 :=<- t7(t1,t3) t9 := :=t7 r1 t5t8 <-:= #t3(t1,t2) r1t8 :=:= slot0t3 t6t9 <- (t1,#(t4,t5)t8) r1t9 <-<- (r2,r1)(t1,t8) t6 t7 t6 r1 r2 t4 := t9 r1 r2 slot1t4 := :=t9 r1 t5 <- (t1,t2) r1t5 <-<- (r2,r1)(t1,t2) t10 := t4 r2t10 := := slot1 t4 Notice: Live ranges for t3 and t6 <- (t10,t5) r1t6 <-<- (r2,r1)(t10,t5) t4 have been chopped up into lots of small live ranges

School of Computer Science School of Computer Science

14 Steps in register allocation Move Coalescing

Eliminate moves by assigning the src and Simplify dest to the same register Build movl t1,t2 movl %eax,%eax Coalesce addl %edx,%eax addl t3,t2 addl %edx,%eax Color Potential Spill When can we coalesce t1 and t2? Spill Select How can we modify our interference graph to do this?

School of Computer Science School of Computer Science

Example Example

v <- 1 v v <- 1 Want u and v to be v assigned same color... w <- v + 3 w <- v + 3 ...merge u and v x <- w + v v x <- w + v to form a single x w node x uv w u <- v x u <- v t <- u + x w t <- u + x <- w t u <- w u <- t u <- t <- u <- u t t u and v are special: First compute live ranges...... then construct interference graph A move whose source is not live-out of the That is, if the src and dest move is a candidate for coalescing don’t interfere School of Computer Science School of Computer Science

15 Is Coalescing Always Good? When should we coalesce?

y Always y –! If we run into trouble start un-coalescing •! no nodes with degree < k, see if breaking up coalesced nodes fixes –! Can we always coalesce all coalescable moves? Only if we can prove it won’t cause problems u x –! Briggs: Conservative Coalescing u x –! George: Iterated Coalescing uv move edge vs. y u x v v a a When we simplify the graph, b we remove nodes of degree < b k... v a And the winner is? want to make sure we will 2 colorable 3 colorable still be able to simplify coalesced node, uv b School of Computer Science School of Computer Science

Briggs: Conservative Coalescing George: Iterated Coalescing

•!Can coalesce u and v if: Can coalesce u and v if –!(# of neighbors of uv with degree ! k) < k foreach neighbor t of u Resulting node uv will •! t interferes with v, or, doesn’t change degree (after simplification) •!Why? •! degree of t < k removed by simplification have degree equal to degree of v –!Simplify pass removes all nodes with degree < k –!# of remaining nodes < k Why? –!let S be set of neighbors of u with degree < k u –!Thus, uv can be simplified y x –!If no coalescing, simplify removes all nodes in S, call that graph G1 What does Briggs –!If we coalesce we can still remove all nodes in S, call say about that graph G2 k = 3? v a –!G2 is a subgraph of G1 k = 2? b

School of Computer Science School of Computer Science

16 George: Iterated Coalescing Why Two Methods?

S No coalescing, •!Why not? 2 S 3 after •!With Briggs, one needs to look at all neighbors of a & b S1 u simplification 1 •!With George, only need to look at neighbors of a. u S4 G So: x2 v –!Use George if one of a & b has very large degree x2 v x1 –!Use Briggs otherwise x1

After coalescing and k = 4 simplification

x2 uv G2 x1

School of Computer Science School of Computer Science

Optimistic Coalescing Steps in register allocation Aggressively coalesce top-down vs. bottom-up If coalesced node spills, uncoalesce y y Prioritize Simplify

Select Coalesce u x u x Potential Spill uv v a Which one is better? v a Select b b Will this always work? School of Computer Science School of Computer Science

17 Alternative Allocators Alternative Allocators

Graph allocator, as described, has issues Local/Global Allocation gcc’s approach, –! What are they? –!Allocate “local” pseudo-registers unless -fnew-ra Alternative: Single pass graph coloring •! Lifetime contained within basic block –! Build, Simplify, Coalesce as before –! In select, if can’t color with register, color with stack location •! Register sufficiency no longer NP-Complete! •! Keep going –!Allocate global pseudo-registers –! Requires second, reload phase •! Single pass global coloring •! “fixes” spilled variables •! Might require that we reserve a register •! Coloring heuristic may reverse local allocation •! Can get messy –!Reload pass to fix spills (allocator does not generate Claim: Does a pretty good job spill code) –! Why? •! Key is order nodes are colored (top-down) –!Can also do global then local –!Advantages? Disadvantages? Advantages? Disadvantages?

School of Computer Science School of Computer Science

Alternative Allocators In Chaitin’s words

Linear Scan “…since I was a mathematician, the register allocation kept –!Performs single sweep over live ranges getting simpler and faster as I understood better what was –!Each range represented by a single interval required. I preferred to base algorithms on a simple, clean idea that was intellectually understandable rather than write – Greedily allocates/spills ! complicated ad hoc computer code… Second Chance Binpacking So I regard the success of this approach, which has been the basis –!maintains more state for much future work, as a triumph of the power of a simple •! lifetime holes, register-memory consistency mathematical idea over ad hoc hacking. Yes, the real world is –!will split a live range messy and complicated, but one should try to base algorithms –!less greedy; may reevaluate previous allocations on clean, comprehensible mathematical ideas and only complicate them when absolutely necessary. In fact, certain Advantages? Disadvantages? instructions were omitted from the 801 architecture because they would have unduly complicated register allocation…” — G. Chaitin, 2004 School of Computer Science School of Computer Science

18 Avoiding Spills Bottom-up vs Top-down: Speed 15.00%

100.00%

10.00% 90.00%

80.00% 5.00%

70.00%

60.00% 0.00%

50.00% -5.00% 40.00%

30.00% -10.00%

20.00% Percent Functions With No Spills -15.00% Performance improvement of graph allocator

10.00% e ack ortex 179.art 175.vpr 181.mcf 254.gap 0.00% 164.gzip 301.apsi 300.twolf 171.swim 256.bzip2 177.mesa 173.applu 172.mgrid 188.ammp 197.parser 255.v

(8) 68k (16) PPC (32) 183.equak 200.sixtr 168.wupwise top down bottom up SPECint SPECfp 1.8 Ghz Pentium 4; -O3 -funroll-loops -fnew-ra; gcc version 3.2.2 1.8 Ghz Pentium 4; -O3 -funroll-loops; gcc version 3.2.2 School of Computer Science School of Computer Science

Bottom-up vs Top-down: Size

2.00%

0.00%

-2.00%

-4.00%

-6.00% -8.00% Complexity of Register Allocation -10.00%

-12.00%

-14.00%

-16.00%

Percent size improvement of graph allocator -18.00% e y ack aft ortex 179.art 175.vpr 181.mcf 252.eon 254.gap 253.perl 164.gzip 301.apsi 300.twolf 171.swim 256.bzip2 177.mesa 173.applu 186.cr 172.mgrid 188.ammp 197.parser 255.v 183.equak 200.sixtr 168.wupwise SPECint SPECfp x86; -Os; gcc version 3.2.2 School of Computer Science

19 Complexity of Register Allocation Complexity of Register Allocation

Graph color is NP-complete Complexity of local register allocation? –!what does this tell us about register allocation? –!linear algorithm for register sufficiency Given arbitrary graph can construct program SSA Form? with matching interference graph1 –!interference graph is turns out to be both perfect1 –!simply determining if spilling is necessary is and chordal2 therefore NP-complete… or is it? •!can color in linear time Can exploit structure of reducible program2,3,4 –!BUT all bets are off after SSA elimination3

[1] G.J. Chaitin, M. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, and P. Markstein. Register allocation via coloring. Computer Languages, 6:47-57, 1981.! [1] Philip Brisk, Foad Dabiri, Jamie Macbeth, and Majid Sarrafzadeh. Polynomial time graph coloring register allocation. In 14th [2] H. Bodlaender, J. Gustedt, and J. A. Telle, “Linear-time register allocation for a fixed number of registers,” in Proceedings of the International Workshop on Logic and Synthesis. ACM Press, 2005.! ninth annual ACM-SIAM symposium on Discrete algorithms, pp. 574–583, Society for Industrial and Applied Mathematics, 1998.! [2] Sebastian Hack. Interference graphs of programs in SSA-form. Technical Report ISSN 1432-7864, Universitat Karlsruhe, 2005.! [3] S. Kannan and T. Proebsting, “Register allocation in structured programs,” in Proceedings of the sixth annual ACM-SIAM [3] Jens Palsberg and Fernando Magno Quintao Pereira Register allocation after classical SSA elimination is NP-complete, In symposium on Discrete algorithms, pp. 360–368, Society for Industrial and Applied Mathematics, 1995.! Proceedings of FOSSACS'06, Foundations of Software Science and Computation Structures. Springer-Verlag (LNCS), Vienna, [4] M. Thorup, “All structured programs have small tree width and good register allocation,” Inf. Comput., vol. 142, no. 2, pp. 159– Austria, March 2006.! 181, 1998.! School of Computer Science School of Computer Science

Complexity of Register Allocation

Complexity of optimizing spill code? –!NP-complete even without control flow1 Complexity of optimal coalescing? –!NP-complete2

[1] Martin Farach and Vincenzo Liberatore. On local register allocation. In 9th ACMSIAM symposium on Discrete Algorithms, pages 564 { 573. ACM Press, 1998.! [2] Andrew W. Appel and Lal George. Optimal spilling for cisc machines with few registers. In Proceedings of the ACM SIGPLAN 2001 conference on design and implementation, pages 243–253. ACM Press, 2001.!

School of Computer Science

20