1 Instruction Selection DAG Tiling Problems Pentium ISA Example Tiling

1 Instruction Selection DAG Tiling Problems Pentium ISA Example Tiling

Instruction Selection CS412/CS413 1. Translate low-level IR code into DAG representation Introduction to Compilers 2. Then find a good tiling of the DAG Tim Teitelbaum - disjoint set of tiles that cover the DAG - Maximal munch algorithm Lecture 32: More Instruction Selection - Dynamic programming algorithm 15 Apr 07 CS 412/413 Spring 2007 Introduction to Compilers 1 CS 412/413 Spring 2007 Introduction to Compilers 2 DAG Tiling Problems • Classes of registers •Goal:find a good covering of DAG with tiles – Registers may have specific purposes – Example: Pentium multiply instruction •Problem:need to know what variables are in registers - multiply register eax by contents of another register - store result in eax (low 32 bits) and edx (high 32 bits) • Assume abstract assembly: - need extra instructions to move values into eax – Machine with infinite number of registers • Two-address machine instructions – Temporary/local variables stored in registers – Three-address low-level code – Parameters/heap variables: use memory accesses – Need multiple machine instructions for a single tile •CISC versus RISC – Complex instruction sets => many possible tiles and tilings – Example: multiple addressing modes (CISC) versus load/store architectures (RISC) CS 412/413 Spring 2007 Introduction to Compilers 3 CS 412/413 Spring 2007 Introduction to Compilers 4 Pentium ISA Example Tiling •Consider: t = t + i •Pentium:two-address CISC architecture t = temporary variable i = parameter • Multiple addressing modes: source operands may be = – Immediate value: imm • Need new temporary registers t + – Register: reg between tiles (unless operand t1 t – Indirect address: [reg], [imm], [reg+imm], node is labeled with temporary) load {i} – Indexed address: [reg+reg’], [reg+imm*reg’], t0 [reg+imm*reg’+imm’] • Resulting code: - mov %ebp, t0 ebp 20 • Destination operands = same, except immediate values sub $20, t0 mov 0(t0), t1 add t1, t CS 412/413 Spring 2007 Introduction to Compilers 5 CS 412/413 Spring 2007 Introduction to Compilers 6 1 Tiles Some Tiles t2 store + mov t1, t2 = mov t1 mov t2, t1 + add $1, t2 t1 t2 $10, 0(t1,t2) 10 1 t1 t2 • Tiles capture compiler’s understanding of instruction set t3 • Each tile: sequence of machine instructions that mov t2, t3 mov t1, %eax t3 match a subgraph of the DAG + add t1, t3 mul t2 * • May need additional move instructions t1 t2 mov %eax, t3 t1 t2 • Tiling = cover the DAG with tiles CS 412/413 Spring 2007 Introduction to Compilers 7 CS 412/413 Spring 2007 Introduction to Compilers 8 Conditional Branches Maximal Munch Algorithm • How to tile a conditional jump? • Maximal Munch = find largest tiles (greedy algorithm) • Fold comparison into tile • Start from top of tree • Find largest tile that matches top node • Tile remaining subtrees recursively tjump tjump store t1 L == L + 4 t1 t2 load * test t1,t1 cmp t1,t2 + 4 load jnz L je L ebp 8 + ebp 12 CS 412/413 Spring 2007 Introduction to Compilers 9 CS 412/413 Spring 2007 Introduction to Compilers 10 DAG Representation Example •DAG:a node may have multiple parents • Algorithm: same, but a node with multiple parents x = x + 1; occurs inside a tile only if all its parents are in the tile store store + + + load * ebp 8 load 1 + 4 load + ebp 8 + ebp 8 ebp 12 CS 412/413 Spring 2007 Introduction to Compilers 11 CS 412/413 Spring 2007 Introduction to Compilers 12 2 Example Alternate (CISC) Tiling x = x + 1; x = x + 1; store store add $1, 8(%ebp) t2 mov 8(%ebp), t1 + + + + t1 mov t1, t2 = ebp 8 load 1 add $1, t2 ebp 8 load 1 + + mov t2, 8(%ebp) + r/m32 const ebp 8 ebp 8 r/m32 CS 412/413 Spring 2007 Introduction to Compilers 13 CS 412/413 Spring 2007 Introduction to Compilers 14 ADD Expression Tiles ADD Statement Tiles = t1 t1 Intel Architecture + mov t2, t1 r/m32 const + + add r/m32, t1 t2 r/m32 add imm32, %eax t2 t3 add imm32, r/m32 = add imm8, r/m32 + t1 add r32, r/m32 r/m32 mov t2, t1 + add r/m32, r32 = add imm32, t1 t2 const + r32 r/m32 CS 412/413 Spring 2007 Introduction to Compilers 15 CS 412/413 Spring 2007 Introduction to Compilers 16 Designing Tiles More Handy Tiles • Only add tiles that are useful to compiler lea instruction computes a memory address • Many instructions will be too hard to use effectively or will offer no advantage t3 • Need tiles for all single-node trees to guarantee lea (t1,t2), t3 + that every tree can be tiled, e.g. t1 t2 t3 t1 mov t2, t1 + + add t3, t1 lea c1(t1,t2,c2), t3 + * t2 t3 t1 c1 c2 t2 CS 412/413 Spring 2007 Introduction to Compilers 17 CS 412/413 Spring 2007 Introduction to Compilers 18 3 Matching Jump for RISC Condition Code ISA • As defined in lecture, have • Pentium: condition encoded in jump instruction tjump(cond, destination) • cmp: compare operands and set flags fjump(cond, destination) • jcc: conditional jump according to flags • Our tjump/fjump translates easily to RISC ISAs that have explicit comparison result set condition codes tjump cmp t1, t2 tjump MIPS < L jl L t1 cmplt t2, t3, t1 t2 < L t3 br t1, L test condition codes t2 t3 CS 412/413 Spring 2007 Introduction to Compilers 19 CS 412/413 Spring 2007 Introduction to Compilers 20 Fixed-register instructions Implementation mul r/m32 • Maximal Munch: start from top node Multiply value in register eax • Find largest tile matching top node and all of the Result: low 32 bits in eax, high 32 bits in edx children nodes jecxz L • Invoke recursively on all children of tile Jump to label L if ecx is zero • Generate code for this tile • Code for children will have been generated already in add r/m32, %eax recursive calls Add to eax • How to find matching tiles? • No fixed registers in low IR except frame pointer • Need extra move instructions CS 412/413 Spring 2007 Introduction to Compilers 21 CS 412/413 Spring 2007 Introduction to Compilers 22 Matching Tiles Tile Specifications abstract class LIR_Stmt { Assembly munch(); • Previous approach simple, efficient, but hard-codes tiles } and their priorities class LIR_Assign extends LIR_Stmt { LIR_Expr src, dst; • Another option: explicitly create data structures Assembly munch() { representing each tile in instruction set if (src instanceof IR_Plus && = ((IR_Plus)src).lhs.equals(dst) && – Tiling performed by a generic tree-matching and + is_regmem32(dst)) { code generation procedure Assembly e = ((LIR_Plus)src).rhs.munch(); – Can generate from instruction set description: return e.append(new AddIns(dst, r/m32 e.target())); code generator generators } – For RISC instruction sets, over-engineering else if ... } } CS 412/413 Spring 2007 Introduction to Compilers 23 CS 412/413 Spring 2007 Introduction to Compilers 24 4 How Good Is It? Improving Instruction Selection • Very rough approximation on modern pipelined • Because it is greedy, Maximal Munch does not architectures: execution time is number of tiles necessarily generate best code – Always selects largest tile, but not necessarily the • Maximal munch finds a locally optimal (two adjacent fastest instruction tiles can never be combined into one) but not – May pull nodes up into tiles inappropriately – it may necessarily globally optimum tiling (least cost of all covers) be better to leave below (use smaller tiles above and larger, or faster tiles below) • Metric used: tile size • Better to use dynamic programming, an optimization technique that uses memoization to assure that subproblems are never solved more than once. CS 412/413 Spring 2007 Introduction to Compilers 25 CS 412/413 Spring 2007 Introduction to Compilers 26 Timing Cost Model Finding globally optimum tiling •Idea:associate cost with each tile (say proportional to •Goal:find minimum total cost tiling of DAG number of cycles to execute) – may not be a good metric on modern architectures • Algorithm: for every node, find minimum total cost tiling • Total execution time is sum of costs of all tiles of that node and subgraph below it Cost = 2 store • Lemma: Given minimum cost tiling of all nodes in + + Cost=1 subgraph, we can find minimum cost tiling of the node by trying out all possible tiles matching the node Total cost: 5 ebp 8 load 1 • Therefore: start from leaves, work upward to top node + ebp 8 Cost = 2 CS 412/413 Spring 2007 Introduction to Compilers 27 CS 412/413 Spring 2007 Introduction to Compilers 28 Dynamic Programming: a[i] Recursive Implementation mov 8(%ebp), t1 • Traverse DAG recursively, and for each node n, record <t,c>, where load mov 12(%ebp), t2 mov (t1,t2,4), t3 – t is the best tile to use for subgraph rooted at n, – c is the total cost of tiling the subgraph rooted at n if t is chosen. + • To compute <t,c> for node n load * – Consider every tile t’ that matches rooted at n, and compute 4 load total cost c’ = cost of tile t’ + sum of the costs of tiling the + subgraphs rooted at the leaves of t’ (which costs can be + computed recursively and memoized) ebp 8 ebp 12 – Store lowest-cost tile t’ and its total cost c’ • To emit code, traverse least-cost tiles recursively and emit code in postorder CS 412/413 Spring 2007 Introduction to Compilers 29 CS 412/413 Spring 2007 Introduction to Compilers 30 5 Memoization Problems with Model class IR_Move extends IR_Stmt { = • Modern processors: IR_Expr src, dst; + – execution time not sum of tile times Assembly best; // initialized to null – instruction order matters int optTileCost() { if (best != null) return best.cost(); r/m32 • Processors pipeline instructions and execute if (src instanceof IR_Plus && different pieces of instructions in parallel ((IR_Plus)src).lhs.equals(dst) && is_regmem32(dst)) { • bad ordering (e.g.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us