
15-745 Optimizing Compilers, Spring 2007

School of Computer Science

Compiler Backend: Intermediate Representations

The compiler pipeline:

    Source Code → Front End → IR → Middle End → IR → Back End → Target Code

– Front end: produces an intermediate representation (IR)
– Middle end: transforms the IR into an equivalent IR that runs more efficiently; usually consists of several passes
– Back end: transforms the IR into native code

The IR encodes the compiler's knowledge of the program. Within the back end, the instruction selector turns IR into Assem, the register allocator produces a TempMap, and the instruction scheduler orders the Assem.


Intermediate Representations

Decisions in IR design affect the speed and efficiency of the compiler. Some important IR properties:
– Ease of generation
– Ease of manipulation
– Procedure size
– Freedom of expression
– Level of abstraction

The importance of different properties varies between compilers, so selecting an appropriate IR for a compiler is critical.

Types of Intermediate Representations

Structural
– Graphically oriented; heavily used in source-to-source translators
– Examples: trees, DAGs
– Tend to be large

Linear
– Pseudo-code for an abstract machine; level of abstraction varies
– Simple, compact data structures; easier to rearrange
– Examples: 3-address code, stack code

Hybrid
– Combination of graphs and linear code
– Example: control-flow graph


Level of Abstraction

The level of detail exposed in an IR influences the profitability and feasibility of different optimizations. Consider two different representations of an array reference A[i,j].

High-level AST: subscript(A, i, j)

Low-level linear code:

    loadI 1      => r1
    sub   rj, r1 => r2
    loadI 10     => r3
    mult  r2, r3 => r4
    sub   ri, r1 => r5
    add   r4, r5 => r6
    loadI @A     => r7
    add   r7, r6 => r8
    load  r8     => rAij

Structural IRs are usually considered high-level and linear IRs low-level, but this is not necessarily true: a low-level AST can spell out the address arithmetic, and high-level linear code (e.g., loadArray A,i,j) also exists.
– High-level AST: good for memory disambiguation
– Low-level linear code: good for address calculation
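The low-level sequence can be checked directly. A minimal sketch (the function name, the 1-based indexing, and the dimension of 10 are read off the code above, not stated in the slides):

```python
def array_address(base, i, j, dim=10):
    """Mirror the low-level linear code for the address of A[i,j]."""
    r1 = 1
    r2 = j - r1        # sub   rj, r1 => r2
    r3 = dim           # loadI 10     => r3
    r4 = r2 * r3       # mult  r2, r3 => r4
    r5 = i - r1        # sub   ri, r1 => r5
    r6 = r4 + r5       # add   r4, r5 => r6
    r7 = base          # loadI @A     => r7
    return r7 + r6     # add   r7, r6 => r8

# A[1,1] falls at the base address; bumping j moves by a whole dimension.
assert array_address(1000, 1, 1) == 1000
assert array_address(1000, 1, 2) == 1010
```

The point of the high-level form is that none of this arithmetic is visible, so the optimizer can still reason about which array element is touched.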

Abstract Syntax Tree (Structural IR)

An abstract syntax tree (AST) is the procedure's parse tree with the nodes for most non-terminal symbols removed. For z ← x - 2 * y, the right-hand side is the tree -(x, *(2, y)).

If the same expression appears twice in the source, it appears twice in the AST: the AST encodes redundancy. When is an AST a good IR?

Directed Acyclic Graph (Structural IR)

A directed acyclic graph (DAG) is an AST with a unique node for each value. For the pair

    z ← x - 2 * y
    w ← x / 2

the DAG has a single node for x (and for 2), shared by both assignments. A DAG makes sharing explicit: the same expression appearing twice means that the compiler might arrange to evaluate it just once!
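One common way to build such a DAG is hash-consing: intern each (operator, operand-ids) tuple so that identical subexpressions map to the same node. A sketch (class and method names are mine, not from the slides):

```python
class DagBuilder:
    """Intern (op, operand ids) so each distinct value has a unique node."""
    def __init__(self):
        self.ids = {}                    # (op, child ids...) -> node id

    def node(self, op, *kids):
        key = (op,) + kids
        if key not in self.ids:          # create only if not seen before
            self.ids[key] = len(self.ids)
        return self.ids[key]

b = DagBuilder()
x, two, y = b.node('x'), b.node('2'), b.node('y')
b.node('-', x, b.node('*', two, y))      # z <- x - 2 * y
b.node('/', x, two)                      # w <- x / 2
# x and 2 each get one node: 6 nodes total (x, 2, y, *, -, /)
assert len(b.ids) == 6
```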


Stack Machine Code (Linear IR)

Originally used for stack-based computers, now Java (JVM bytecode). Example: x - 2 * y becomes

    push x
    push 2
    push y
    multiply
    subtract

Advantages:
– Compact form: introduced names are implicit, not explicit (implicit names take up no space, where explicit ones do!)
– Simple to generate and execute code
– Useful where code is transmitted over slow communication links (the net)

Pegasus IR

Pegasus (Predicated Explicit GAted Simple Uniform SSA) is a structural IR used in CASH (Compiler for Application-Specific Hardware).
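The semantics of the stack code are easy to pin down with a toy interpreter (a sketch; the opcode spelling and the environment argument are my choices):

```python
def run(code, env):
    """Evaluate postfix stack code against an environment of variable values."""
    stack = []
    for op, *args in code:
        if op == 'push':
            v = args[0]
            stack.append(env.get(v, v))      # a variable name, or a literal
        elif op == 'multiply':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == 'subtract':
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack[-1]

prog = [('push', 'x'), ('push', 2), ('push', 'y'),
        ('multiply',), ('subtract',)]
assert run(prog, {'x': 10, 'y': 3}) == 4     # x - 2*y = 10 - 6
```

Note that the operands of multiply and subtract never appear in the code: the operand stack supplies them, which is exactly why the form is so compact.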


Three-Address Code (Linear IR)

Three-address code has statements of the form x ← y op z, with one operator (op) and at most three names (x, y, and z). Example: z ← x - 2 * y becomes

    t ← 2 * y
    z ← x - t

Advantages:
– Resembles many machines (RISC)
– Compact form

Two-Address Code (Linear IR)

Two-address code allows statements of the form x ← x op y, with one operator (op) and at most two names (x and y). Example: z ← x - 2 * y becomes

    t1 ← 2
    t2 ← load y
    t2 ← t2 * t1
    z  ← load x
    z  ← z - t2

– Can be very compact
– Destructive operations make reuse hard
– Good model for machines with destructive operations (x86)
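Generating three-address code from an expression tree is a single post-order walk that mints one temporary per operator. A sketch (the tuple encoding and helper names are mine):

```python
import itertools

def to_tac(expr):
    """Flatten a tuple-encoded expression tree into x <- y op z statements."""
    code, counter = [], itertools.count(1)
    def gen(node):
        if not isinstance(node, tuple):       # a name or a constant
            return str(node)
        op, left, right = node
        a, b = gen(left), gen(right)
        t = f"t{next(counter)}"               # fresh temporary for this op
        code.append(f"{t} <- {a} {op} {b}")
        return t
    gen(expr)
    return code

# z <- x - 2 * y
assert to_tac(('-', 'x', ('*', 2, 'y'))) == ['t1 <- 2 * y', 't2 <- x - t1']
```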


Control-Flow Graph (Hybrid IR)

A control-flow graph (CFG) models the transfer of control in the procedure.
– Nodes in the graph are basic blocks: maximal-length sequences of straight-line code, held in either a linear or a structural representation
– Edges in the graph represent control flow

Example: if (x = y) branches to one of two blocks (a ← 2; b ← 5, or a ← 3; b ← 4), and both flow into a join block computing a * b.

Using Multiple Representations

    Source Code → Front End → IR 1 → Middle End → IR 2 → Middle End → IR 3 → Back End → Target Code

Repeatedly lower the level of the intermediate representation; each intermediate representation is suited towards certain optimizations.
– Example: the Open64 compiler's WHIRL intermediate format consists of 5 different IRs that are progressively more detailed
– gcc also lowers through multiple IRs, but is not explicit about it :-(

Instruction Selection

The instruction selector maps IR to Assem using IR → Assem templates.

Instruction Selection Example

Suppose we have

    MOVE(TEMP r, MEM(BINOP(TIMES, TEMP s, CONST c)))

We can generate the x86 code

    movl (,%s,c), %r

if c = 1, 2, or 4; otherwise

    imull $c, %s, %r
    movl  (%r), %r


Selection Dependencies

We can see that the selection of instructions can depend on the constants. The context of an IR expression can also affect the choice of instruction.

Example, cont'd

For

    MEM(BINOP(TIMES, TEMP s, CONST c))

we might sometimes want to generate

    testl $0, (,%s,c)
    je    L

Consider

    MOVE(TEMP r, MEM(BINOP(TIMES, TEMP s, CONST c)))

What context might cause us to do this?


Instruction Selection as Tree Matching

To take context into account, instruction selectors often use pattern-matching on IR trees; each pattern specifies what instructions to select.

Sample tree-matching rules:

    IR pattern                        code              cost
    BINOP(PLUS,i,j)                   leal (i,j),r      1
    BINOP(TIMES,i,j)                  movl j,r          2
                                      imull i,r
    BINOP(PLUS,i,CONST c)             leal c(i),r       1
    MEM(BINOP(PLUS,i,CONST c))        movl c(i),r       1
    MOVE(MEM(BINOP(PLUS,i,j)),k)      movl k,(i,j)      1
    BINOP(TIMES,i,CONST c)            leal (,i,c),r     1    (if c is 1, 2, or 4)
    BINOP(TIMES,i,CONST c)            movl c,r          2
                                      imull i,r
    MEM(i)                            movl (i),r        1
    MOVE(MEM(i),j)                    movl j,(i)        1
    MOVE(MEM(i),MEM(j))               movl (j),t        2
                                      movl t,(i)
    …

Tiling an IR Tree

For a[x] = *y; (assume a is a formal parameter passed on the stack), the IR tree is

    MOVE(
      MEM(PLUS(
        MEM(PLUS(EBP, CONST a)),
        TIMES(x, CONST 4))),
      MEM(y))

v.1, using small tiles:

    leal $a(%ebp),r1
    movl (r1),r2
    leal (,x,$4),r3
    leal (r2,r3),r4
    movl (y),r5
    movl r5,(r4)

v.2, using larger tiles:

    movl $a(%ebp),r1
    leal (,x,4),r2
    movl (y),r3
    movl r3,(r1,r2)

Tiling Choices

In general, for any given tree, many tilings are possible, each resulting in a different instruction sequence. We can ensure pattern coverage by covering, at a minimum, all atomic IR trees.

The Best Tiling?

We want the "lowest cost" tiling:
– usually, the shortest sequence
– but we can also take into account the cost/delay of each instruction

Optimum tiling: the lowest-cost tiling.
Locally optimal tiling: no two adjacent tiles can be combined into one tile of lower cost.


Locally Optimal Tilings

Locally optimal tiling is easy: a simple greedy algorithm, "maximal munch", works extremely well in practice:
– start at the root
– use the "biggest" match (in # of nodes), i.e., choose the largest pattern with lowest cost
– use cost to break ties

For the tree for a[x] = *y, the relevant root rules are:

    IR pattern                        code              cost
    MOVE(MEM(BINOP(PLUS,i,j)),k)      movl k,(i,j)      1
    MOVE(MEM(i),j)                    movl j,(i)        1
    MOVE(MEM(i),MEM(j))               movl (j),t        2
                                      movl t,(i)
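The greedy structure of maximal munch fits in a few lines once trees and patterns share an encoding. A sketch (the tuple encoding, the tiny rule set, and the register naming are my own illustration, not the lecture's):

```python
import itertools

REG = itertools.count(1)

RULES = [  # (pattern, assembly template); lowercase strings are variables
    (('MEM', ('PLUS', 'i', ('CONST', 'c'))), 'movl {c}({i}),{r}'),
    (('MEM', 'i'),                           'movl ({i}),{r}'),
    (('PLUS', 'i', 'j'),                     'leal ({i},{j}),{r}'),
    (('CONST', 'c'),                         'movl ${c},{r}'),
    ('i',                                    None),   # a bare temp: no code
]

def size(pat):
    return 1 + sum(size(p) for p in pat[1:]) if isinstance(pat, tuple) else 0

def match(pat, tree, env):
    if not isinstance(pat, tuple):            # a variable: bind the subtree
        env[pat] = tree
        return True
    return (isinstance(tree, tuple) and pat[0] == tree[0]
            and len(pat) == len(tree)
            and all(match(p, t, env) for p, t in zip(pat[1:], tree[1:])))

def munch(tree, out):
    """Greedily cover `tree` with the biggest pattern, then its subtrees."""
    for pat, tmpl in sorted(RULES, key=lambda rule: -size(rule[0])):
        env = {}
        if match(pat, tree, env):
            args = {v: munch(t, out) if isinstance(t, tuple) else t
                    for v, t in env.items()}
            if tmpl is None:
                return args['i']
            r = f'r{next(REG)}'
            out.append(tmpl.format(r=r, **args))
            return r

out = []
munch(('MEM', ('PLUS', 'ebp', ('CONST', 8))), out)
assert out == ['movl 8(ebp),r1']     # one big tile covers the whole tree
```

Sorting the rules by pattern size and trying them at the root first is exactly the "biggest match wins" policy; the recursion then munches whatever the chosen tile left uncovered.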

Maximal Munch

Maximal munch does not necessarily produce the optimum selection of instructions. But:
– it is easy to implement
– it tends to work well for current instruction-set architectures

Maximal Munch Is Not Optimum

Consider what happens, for example, if two of our rules are changed as follows:

    IR pattern                        code              cost
    MEM(BINOP(PLUS,i,CONST c))        movl c,r          3
                                      addl i,r
                                      movl (r),r
    MOVE(MEM(BINOP(PLUS,i,j)),k)      movl j,r          3
                                      addl i,r
                                      movl k,(r)


Sample Tree-Matching Rules (numbered)

    Rule #   IR pattern                        cost
    0        TEMP t                            0
    1        CONST c                           1
    2        BINOP(PLUS,i,j)                   1
    3        BINOP(TIMES,i,j)                  2
    4        BINOP(PLUS,i,CONST c)             1
    5        MEM(BINOP(PLUS,i,CONST c))        3
    6        MOVE(MEM(BINOP(PLUS,i,j)),k)      3
    7        BINOP(TIMES,i,CONST c)            1    (if c is 1, 2, or 4)
    8        BINOP(TIMES,i,CONST c)            2
    9        MEM(i)                            1
    10       MOVE(MEM(i),j)                    1
    11       MOVE(MEM(i),MEM(j))               2

Tiling an IR Tree, New Rules

With the changed rules, maximal munch tiles a[x] = *y; as

    movl $a,r1
    addl %ebp,r1
    movl (r1),r1
    leal (,x,4),r2
    movl (y),r3
    movl r2,r4
    addl r1,r4
    movl r3,(r4)

Optimum Selection

To achieve optimum instruction selection, we must use a more complex algorithm: dynamic programming. In contrast to maximal munch, the trees are matched bottom-up.

Dynamic Programming

The idea is fairly simple. Working bottom up, given the optimum tilings of all subtrees, generate the optimum tiling of the current tree:
– consider all tiles for the root of the current tree
– sum the cost of the best subtree tilings and each tile
– choose the tile with minimum total cost

A second pass generates the code using the results from the bottom-up pass.
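The bottom-up pass is a few lines of recursion: each node's best (rule, cost) pair considers every applicable tile plus the best costs of the subtrees that tile leaves uncovered. A sketch using a fragment of the numbered rule table (the tuple tree encoding and the '*' wildcard are mine):

```python
RULES = [  # (rule number, pattern, cost of the tile itself)
    (0, ('TEMP',), 0),
    (1, ('CONST',), 1),
    (2, ('PLUS', '*', '*'), 1),
    (4, ('PLUS', '*', ('CONST',)), 1),
    (5, ('MEM', ('PLUS', '*', ('CONST',))), 3),
    (9, ('MEM', '*'), 1),
]

def matches(pat, tree, subs):
    """True if the tile fits; collect the uncovered subtrees in `subs`."""
    if pat == '*':
        subs.append(tree)
        return True
    return (isinstance(tree, tuple) and tree[0] == pat[0]
            and len(tree) == len(pat)
            and all(matches(p, t, subs) for p, t in zip(pat[1:], tree[1:])))

def best(tree):
    """Pass 1: the (rule, total cost) annotation for `tree`."""
    choices = []
    for num, pat, cost in RULES:
        subs = []
        if matches(pat, tree, subs):
            choices.append((num, cost + sum(best(s)[1] for s in subs)))
    return min(choices, key=lambda rc: rc[1])

# MEM(PLUS(TEMP, CONST)): the big tile (rule 5) costs 3, but rule 9 over
# rule 4 over rule 0 costs only 1 + 1 + 0 = 2; maximal munch would miss it.
assert best(('MEM', ('PLUS', ('TEMP',), ('CONST',)))) == (9, 2)
```

A memoized version visits each node once per rule, which is where the linear-time behavior of the real algorithm comes from.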


Bottom-Up Code Generation, Passes 1 and 2

Pass 1 annotates each node, bottom-up, with pairs (r, c), meaning rule r gives total cost c at that node. Leaves are cheap: a TEMP such as %EBP gets (0,0) and a CONST gets (1,1). Each interior node adds the best subtree costs into the cost of each applicable rule and keeps the minimum, so a MEM over a cheap address computation can be annotated (9,2) where a larger tile would cost more, and the root MOVE ends up with tied candidates such as (10,6) and (11,6).

Pass 2 then walks top-down, emitting the code for the rule recorded at each node.

Bottom-Up Code Generation

How does the running time compare to maximal munch? Memory usage?

Tools

A lot of tools have been developed to automatically generate instruction selectors:
– TWIG [Aho, Ganapathi, Tjiang '86]
– BURG [Fraser, Henry, Proebsting '92]
– BEG [Emmelmann, Schroer, Landwehr '89]
– …

These generate bottom-up instruction selectors from tree-matching rule specifications.


Code-Generator Generators

Twig, Burg, and Beg use the dynamic programming approach; a lot of work has gone into making these cgg's highly efficient. There are also grammar-based cgg's that use LR(k) parsing to perform the tree-matching.

Tiling a DAG

How would you tile a DAG in which two loads share the base address &x?

    t0 = ld(+(4, &x))
    t1 = ld(+(&x, 8))

One option recomputes each address:

    mov x+4 -> t0
    mov x+8 -> t1

Another shares the base:

    mov  &x -> t3
    add  t3,4 -> t4
    load (t4) -> t0
    add  t3,8 -> t5
    load (t5) -> t1


Peephole Matching

Basic idea: the compiler can discover local improvements locally.
– Look at a small set of adjacent operations
– Move a "peephole" over the code & search for improvement

The classic example is a store followed by a load:

    Original code                 Improved code
    storeAI r1   ⇒ r0,8           storeAI r1 ⇒ r0,8
    loadAI  r0,8 ⇒ r15            i2i     r1 ⇒ r15

Simple algebraic identities:

    Original code                 Improved code
    addI r2,0  ⇒ r7               mult r4,r2 ⇒ r10
    mult r4,r7 ⇒ r10


Jumps to jumps:

    Original code                 Improved code
    jumpI → L10                   L10: jumpI → L11
    L10: jumpI → L11

Implementing It

Early systems used a limited set of hand-coded patterns; the window size ensured quick processing. Modern peephole instruction selectors (Davidson) break the problem into three tasks:

    IR → Expander → LLIR → Simplifier → LLIR → Matcher → ASM
         (IR→LLIR)        (LLIR→LLIR)        (LLIR→ASM)

and apply symbolic interpretation & simplification systematically.


Page ‹#› Peephole Matching Peephole Matching Expander Simplifier Turns IR code into a low-level IR (LLIR) such as RTL Looks at LLIR through window and rewrites it Uses forward substitution, algebraic simplification, local constant Operation-by-operation, template-driven rewriting propagation, and dead-effect elimination LLIR form includes all direct effects (e.g., setting cc) Performs local optimization within window Significant, albeit constant, expansion of size

IR Expander LLIR Simplifier LLIR Matcher ASM IR→LLIR LLIR→LLIR LLIR→ASM IR Expander LLIR Simplifier LLIR Matcher ASM IR→LLIR LLIR→LLIR LLIR→ASM This is the heart of the peephole system – Benefit of shows up in this step
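The core simplifier move, forward substitution plus dead-definition elimination inside a small window, can be sketched as follows (the (dest, expr) tuple encoding of LLIR is mine; real simplifiers must also track machine effects such as condition codes):

```python
def uses(reg, expr):
    """Does `expr` mention `reg` anywhere?"""
    if expr == reg:
        return True
    return isinstance(expr, tuple) and any(uses(reg, a) for a in expr)

def simplify(code):
    """Forward-substitute a definition into the next op; drop it if then dead."""
    out = list(code)
    i = 0
    while i + 1 < len(out):
        (d1, e1), (d2, e2) = out[i], out[i + 1]
        if isinstance(e2, tuple) and d1 in e2:
            folded = tuple(e1 if a == d1 else a for a in e2)
            if not any(uses(d1, e) for _, e in out[i + 2:]):   # d1 now dead
                out[i:i + 2] = [(d2, folded)]
                i = max(i - 1, 0)        # re-examine the new neighborhood
                continue
        i += 1
    return out

llir = [('r11', '@y'),
        ('r12', ('+', 'r0', 'r11')),
        ('r13', ('MEM', 'r12'))]
# r11 and r12 fold away, leaving r13 <- MEM(r0 + @y)
assert simplify(llir) == [('r13', ('MEM', ('+', 'r0', '@y')))]
```

This sketch shows only the substitution step; the window in a real simplifier also applies algebraic identities and local constant propagation.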


Matcher (LLIR→ASM)

– Compares the simplified LLIR against a library of patterns
– Picks the low-cost pattern that captures the effects
– Must preserve LLIR effects, and may add new ones (e.g., setting condition codes)
– Generates the assembly code output

Example

Original IR code for w ← x - 2 * y:

    op    arg1  arg2  result
    mult  2     y     t1
    sub   x     t1    w

The expander produces this LLIR code:

    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18


Example, cont'd

The simplifier reduces the LLIR to:

    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18

The matcher then emits ILOC code:

    loadAI  r0,@y   ⇒ r13
    multI   r13,2   ⇒ r14
    loadAI  r0,@x   ⇒ r17
    sub     r17,r14 ⇒ r18
    storeAI r18     ⇒ r0,@w

Steps of the Simplifier

The simplifier slides a 3-operation window over the LLIR code above. In each position it forward-substitutes definitions into later uses and deletes definitions whose values become dead:

1. Window [r10 ← 2; r11 ← @y; r12 ← r0 + r11]: substituting @y for r11 gives r12 ← r0 + @y, and r11 is dead.
2. Window [r10 ← 2; r12 ← r0 + @y; r13 ← MEM(r12)]: substituting gives r13 ← MEM(r0 + @y), and r12 is dead.
3. Window [r10 ← 2; r13 ← MEM(r0 + @y); r14 ← r10 x r13]: substituting 2 for r10 gives r14 ← 2 x r13, and r10 is dead. r13 ← MEM(r0 + @y) is the first operation to roll out of the window.
4. The same substitutions on r15 ← @x and r16 ← r0 + r15 give r17 ← MEM(r0 + @x); r15 and r16 are dead.
5. r18 ← r17 - r14 passes through unchanged.
6. Finally, folding r19 ← @w and r20 ← r0 + r19 into the store gives MEM(r0 + @w) ← r18.

The result is the simplified code:

    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18

Making It All Work

– We've assumed that each tile is connected
– The LLIR is largely machine independent (e.g., RTL)
– The target machine is described as a set of LLIR → ASM patterns
– Actual pattern matching:
  • use a hand-coded pattern matcher (gcc)
  • turn the patterns into a grammar & use an LR parser (VPO)
– Several important compilers use this technology; it seems to produce good portable instruction selectors
– Its key strength appears to be late low-level optimization

SIMD Instructions

What about SIMD instructions? For example, the TI C62x add2 exploits subword parallelism, operating on 16-bit halves of a 32-bit word at once.

Automatic Vectorization

Loop level:

    for (i = 0; i < 1024; i++) {
      C[i] = A[i]*B[i];
    }

becomes

    for (i = 0; i < 1024; i+=4) {
      C[i:i+3] = A[i:i+3]*B[i:i+3];
    }

Basic block level:

    x = a+b;
    y = c+d;

becomes

    (x,y) = (a,c)+(b,d);

In both cases we are looking for independent, isomorphic operations.
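The loop-level transformation is easy to sanity-check in plain Python, where slice assignment plays the role of the vector operation (a sketch; the list-based encoding is mine):

```python
N = 1024
A = list(range(N))
B = [2] * N

C_scalar = [0] * N
for i in range(N):                     # original scalar loop
    C_scalar[i] = A[i] * B[i]

C_vector = [0] * N
for i in range(0, N, 4):               # strip-mined by 4, as in C[i:i+3]
    C_vector[i:i + 4] = [a * b for a, b in zip(A[i:i + 4], B[i:i + 4])]

assert C_scalar == C_vector            # same result, 4 elements per step
```

The transformation is legal here because the iterations are independent: no C[i] feeds a later A[j] or B[j].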
