Where We Are Abstract Assembly Instruction Selection Example X86

CS 4120 Introduction to Compilers
Ross Tate, Cornell University
Lecture 18: Instruction Selection
10/15/2013

Where we are

    Intermediate code
        ↓ syntax-directed translation, reordering with traces
    Canonical intermediate code
        ↓ tiling, dynamic programming
    Abstract assembly code
        ↓ register allocation
    Assembly code

Abstract Assembly

• Abstract assembly = assembly code w/ infinite register set
• Canonical intermediate code = abstract assembly code + expression trees

    MOVE(e1, e2)     ⇒  mov e1, e2
    JUMP(e)          ⇒  jmp e
    CJUMP(e, l)      ⇒  cmp e1, e2
                        [jne|je|jg|…] l
    CALL(e, e1, …)   ⇒  push e1; … ; call e
    LABEL(l)         ⇒  l:

Instruction selection

• Conversion to abstract assembly is the problem of instruction selection for a single IR statement node
• Full abstract assembly code: glue together the translated instructions from each of the statements
• Problem: more than one way to translate a given statement. How to choose?

Example

    MOVE(TEMP(t1), ADD(TEMP(t1), MEM(ADD(TEMP(fp), 4))))

One instruction per IR node:

    mov %rbp,t2
    add $4,t2
    mov (t2),t3
    add t3,t1

Or a single instruction?

    add 4(%rbp),t1

x86-64 ISA

• Need to map IR tree to actual machine instructions: need to know how instructions work
• A two-address CISC architecture (inherited from 4004, 8008, 8086…)
• Typical instruction has
    • opcode (mov, add, sub, shl, shr, mul, div, jmp, jcc, push, pop, test, enter, leave)
    • destination (r, n, (r), k(r), (r1,r2), (r1,r2,w), k(r1,r2,w)) (may also be an operand)
    • source (any legal destination, or a constant $k)

    opcode  src, src/dest

    mov $1,%rax
    add %rcx,%rbx
    sub %rbp,%rsi
    add %edi,(%rcx,%rdi,8)
    je label1
    jmp 4(%rbp)

AT&T vs Intel

• Intel syntax:
    • opcode dest, src
    • registers rax, rbx, rcx, ... r8, r9, ... r15
    • constants k
    • memory operands [n], [r+k], [r1+w*r2], ...
• AT&T syntax (GNU assembler default):
    • opcode src, dest
    • registers %rax, %rbx, …
    • constants $k
    • memory operands n, k(r), (r1,r2,w), ...

Tiling

• Idea: each Pentium instruction performs computation for a piece of the IR tree: a tile

    MOVE(TEMP(t1), ADD(TEMP(t1), MEM(ADD(TEMP(fp), 4))))

    mov %rbp,t2      tile TEMP(fp), result in t2
    add $4,t2        tile ADD(t2, 4), result in t2
    mov (t2),t3      tile MEM(t2), result in t3
    add t3,t1        tile MOVE(TEMP(t1), ADD(TEMP(t1), t3))

• Tiles connected by new temporary registers (t2, t3) that hold result of tile

Tiles

    MOVE(TEMP(t1), t2)   ⇒  mov t2,t1

    tf = ADD(t1, t2)     ⇒  mov t1,tf
                            add t2,tf

• Tiles capture compiler's understanding of instruction set
• Each tile: sequence of instructions that update a fresh temporary (may need extra mov's) and associated IR tree
• All outgoing edges are temporaries

Some tiles

    t2 = +(t1, 1)                         ⇒  mov t1,t2
                                             add $1,t2

    MOVE(MEM(ADD(t1, t2)), CONST(i))      ⇒  mov $i,(t1,t2)

Designing tiles

• Only add tiles that are useful to compiler
• Many instructions will be too hard to use effectively or will offer no advantage
• Need tiles for all single-node trees to guarantee that every tree can be tiled, e.g.

    t1 = +(t2, t3)   ⇒  mov t2,t1
                        add t3,t1

More handy tiles

The lea instruction computes a memory address but doesn't actually load from memory.

    tf = +(t1, t2)                  ⇒  lea (t1,t2),tf

    tf = +(t1, *(t2, CONST(k1)))    ⇒  lea (t1,t2,k1),tf    (k1 one of 1, 2, 4, 8)

Matching CJUMP for RISC

• As defined in lecture, have CJUMP(cond, destination)
• Appel: CJUMP(op, e1, e2, destination) where op is one of ==, !=, <, <=, >=, >
• Our CJUMP translates easily to RISC ISAs that have an explicit comparison result

    MIPS:
    CJUMP(<(t2, t3), NAME(n))   ⇒  cmplt t2,t3,t1
                                   br t1,n

Condition code ISA

• Appel's CJUMP corresponds more directly to Pentium conditional jumps

    CJUMP(<(t1, t2), NAME(n))   ⇒  cmp t1,t2     set condition codes
                                   jl n          test condition codes

• However, can handle Pentium-style jumps with lecture IR with appropriate tiles

Branches

• How to tile a conditional jump?
• Fold comparison operator into tile

    CJUMP(t1, l1, (l2))           ⇒  test t1,t1
                                     jnz l1

    CJUMP(EQ(t1, t2), l1, (l2))   ⇒  cmp t1,t2
                                     je l1

An annoying instruction

• Pentium mul instruction multiplies a single operand by eax, puts result in eax (low 32 bits), edx (high 32 bits)
• Solution: add extra mov instructions, let register allocation deal with edx overwrite

    tf = MUL(t1, t2)   ⇒  mov t1,%eax
                          mul t2
                          mov %eax,tf

Tiling Problem

• How to pick tiles that cover IR statement tree with minimum execution time?
• Need a good selection of tiles
    • small tiles to make sure we can tile every tree
    • large tiles for efficiency
• Usually want to pick large tiles: fewer instructions
• instructions ≠ cycles: RISC core instructions take 1 cycle, other instructions may take more

    add %rax,4(%rcx)   ⇔   mov 4(%rcx),%rdx
                           add %rdx,%rax
                           mov %rax,4(%rcx)

Example

x = x + 1;    (ebp: Pentium frame-pointer register)

    MOVE(MEM(+(FP, x)), +(MEM(+(FP, x)), 1))

    mov (%ebp,x),t1
    mov t1,t2
    add $1,t2
    mov t2,(%ebp,x)

Alternate (non-RISC) tiling

x = x + 1;

    MOVE(MEM(+(FP, x)), +(MEM(+(FP, x)), 1))

    add $1,(%ebp,x)

Greedy tiling

• Assume larger tiles = better
• Greedy algorithm: start from top of tree and use largest tile that matches tree
• Tile remaining subtrees recursively

    [Figure: a MOVE statement tree built from MEM, ADD, and MUL nodes over FP offsets 8 and 12 and the constant 4, covered by tiles with r/m32 and CONST(k) operands]

Improving instruction selection

• Greedy tiling may not generate best code
    • Always selects largest tile, not necessarily fastest instruction
    • May pull nodes up into tiles when better to leave below
• Can do better using dynamic programming algorithm

Timing model

• Idea: associate cost with each tile (proportional to # cycles to execute)
    • caveat: cost is fictional on modern architectures
• Estimate of total execution time is sum of costs of all tiles

    [Figure: the x = x + 1 tree covered by three tiles with costs 2, 1, and 2. Total cost: 5]

Finding optimum tiling

• Goal: find minimum total cost tiling of tree
• Algorithm: for every node, find minimum total-cost tiling of that node and sub-tree.
• Lemma: once minimum-cost tiling of all children of a node is known, can find minimum-cost tiling of the node by trying out all possible tiles matching the node
• Therefore: start from leaves, work upward to top node

Recursive implementation

• Any dynamic-programming algorithm is equivalent to a memoized version of the same algorithm that runs top-down
• For each node, record best tile for node
• Start at top, recurse:
    • First, check in table for best tile for this node
    • If not computed, try each matching tile to see which one has lowest cost
    • Store lowest-cost tile in table and return
• Finally, use entries in table to emit code

Problems with model

• Modern processors:
    • execution time not sum of tile times
    • instruction order matters
        • Processors are pipelining instructions and executing different pieces of instructions in parallel
        • bad ordering (e.g. too many memory operations in sequence) stalls processor pipeline
        • processor can execute some instructions in parallel (super-scalar)
    • cost is merely an approximation
    • instruction scheduling needed

Finding matching tiles

• Explicitly building every tile: tedious
• Easier to write subroutines for matching Pentium source, destination operands
• Reuse matcher for all opcodes

Matching tiles

    abstract class IR_Stmt {
        abstract Assembly munch();
    }
    class IR_Move extends IR_Stmt {
        IR_Expr src, dst;
        Assembly munch() {
            // Tile MOVE(r/m32, +(r/m32, e)): dst = dst + e becomes one add
            if (src instanceof IR_Plus &&
                ((IR_Plus)src).lhs.equals(dst) &&
                is_regmem32(dst)) {
                Assembly e = ((IR_Plus)src).rhs.munch();
                return e.append(new AddIns(dst, e.target()));
            }
            else if ...
        }
    }

Tile Specifications

• Previous approach simple, efficient, but hard-codes tiles and their priorities
• Another option: explicitly create data structures representing each tile in instruction set
    • Tiling performed by a generic tree-matching and code generation procedure
    • Can generate from instruction set description: generic back end!
• For RISC instruction sets, over-engineering

Summary

• Can specify code-generation process as a set of tiles that relate IR trees to instruction sequences
• Instructions using fixed registers are problematic but can be handled using extra temporaries
• Greedy algorithm implemented simply as recursive traversal
• Dynamic-programming algorithm generates better code, can also be implemented recursively using memoization
• Real optimization will require instruction scheduling
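
The "Finding optimum tiling" and "Recursive implementation" slides describe the algorithm only in outline. The following is a minimal Java sketch of that memoized recursion, not code from the lecture. The class TilingSketch, its Node and Tile types, the "_" hole convention, the tile set, and all costs are assumptions made up for illustration. The tiles are chosen so that the x = x + 1 tree comes out at the total cost of 5 shown on the "Timing model" slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of minimum-cost tiling via memoized recursion (all names hypothetical). */
final class TilingSketch {
    /** IR tree node; an op of "_" in a tile pattern is a hole that matches any subtree. */
    static final class Node {
        final String op;
        final List<Node> kids;
        Node(String op, Node... kids) { this.op = op; this.kids = Arrays.asList(kids); }
    }

    /** A tile: a tree pattern plus an assumed cost for the instructions it stands for. */
    static final class Tile {
        final String name; final int cost; final Node pattern;
        Tile(String name, int cost, Node pattern) { this.name = name; this.cost = cost; this.pattern = pattern; }
    }

    static Node n(String op, Node... kids) { return new Node(op, kids); }
    static Node hole() { return new Node("_"); }

    /** Illustrative tile set; the costs are assumptions, not measured cycle counts. */
    static final List<Tile> TILES = Arrays.asList(
        new Tile("temp", 0, n("TEMP")),
        new Tile("mov $k,t", 1, n("CONST")),
        new Tile("add t,t", 1, n("ADD", hole(), hole())),
        new Tile("add $k,t", 1, n("ADD", hole(), n("CONST"))),
        new Tile("mov (t),t", 2, n("MEM", hole())),
        new Tile("mov k(t),t", 2, n("MEM", n("ADD", hole(), n("CONST")))),
        new Tile("mov t,(t)", 2, n("MOVE", n("MEM", hole()), hole())),
        new Tile("mov t,k(t)", 2, n("MOVE", n("MEM", n("ADD", hole(), n("CONST"))), hole())));

    // The memo table: best cost and best tile per node, keyed by node identity.
    final Map<Node, Integer> bestCost = new IdentityHashMap<>();
    final Map<Node, Tile> bestTile = new IdentityHashMap<>();

    /** If pattern matches at node, adds the uncovered subtrees to holes and returns true. */
    static boolean match(Node pattern, Node node, List<Node> holes) {
        if (pattern.op.equals("_")) { holes.add(node); return true; }
        if (!pattern.op.equals(node.op) || pattern.kids.size() != node.kids.size()) return false;
        for (int i = 0; i < node.kids.size(); i++) {
            if (!match(pattern.kids.get(i), node.kids.get(i), holes)) return false;
        }
        return true;
    }

    /** Minimum total cost of tiling the subtree rooted at node. */
    int cost(Node node) {
        Integer memo = bestCost.get(node);
        if (memo != null) return memo;               // already in the table
        int best = Integer.MAX_VALUE;
        Tile chosen = null;
        for (Tile t : TILES) {                       // try every tile that matches here
            List<Node> holes = new ArrayList<>();
            if (!match(t.pattern, node, holes)) continue;
            int total = t.cost;
            for (Node h : holes) total += cost(h);   // plus best tilings of what is left
            if (total < best) { best = total; chosen = t; }
        }
        if (chosen == null) throw new IllegalStateException("no tile matches " + node.op);
        bestCost.put(node, best);
        bestTile.put(node, chosen);
        return best;
    }

    /** Builds the IR tree for x = x + 1, with x at a constant offset from the frame pointer. */
    static Node example() {
        Node load = n("MEM", n("ADD", n("TEMP"), n("CONST")));
        Node dest = n("MEM", n("ADD", n("TEMP"), n("CONST")));
        return n("MOVE", dest, n("ADD", load, n("CONST")));
    }

    public static void main(String[] args) {
        TilingSketch s = new TilingSketch();
        Node root = example();
        System.out.println(s.cost(root) + " via " + s.bestTile.get(root).name);  // prints "5 via mov t,k(t)"
    }
}
```

After cost(root) returns, bestTile holds the chosen tile for every node that ended up as a tile root, so a second top-down walk over that table could emit the instructions. The sketch leaves out the read-modify-write tile from the "Alternate (non-RISC) tiling" slide because matching it would also require checking that the two address subtrees are equal.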
