Register Allocation Back End Structure After Instruction Selection


Register Allocation
15-745 Optimizing Compilers, Spring 2008
School of Computer Science

Back end structure: after instruction selection

The back end feeds the IR through the instruction selector (producing
Assem over temps), then the register allocator (producing a TempMap),
then the instruction scheduler.  For example, the LLVM IR

    define i32 @erk(i32 %a, i32 %b) {
    entry:
      %tmp3 = mul i32 %a, 37              ; <i32> [#uses=1]
      %tmp4 = mul i32 %tmp3, %a           ; <i32> [#uses=1]
      %tmp6 = add i32 %b, 33              ; <i32> [#uses=1]
      %tmp7 = sdiv i32 %tmp4, %tmp6       ; <i32> [#uses=1]
      %tmp9.neg = xor i32 %a, -1          ; <i32> [#uses=1]
      %tmp10 = add i32 %tmp7, %tmp9.neg   ; <i32> [#uses=1]
      ret i32 %tmp10
    }

is selected into machine instructions over virtual registers:

    entry: 0x803ce0, LLVM BB @0x801a90, ID#0:
      %reg1024 = MOV32ri 33
      %reg1025 = ADD32rm %reg1024, <fi#-2>, 1, %NOREG, 0
      %reg1026 = MOV32rm <fi#-1>, 1, %NOREG, 0
      %reg1027 = IMUL32rr %reg1026, %reg1026
      %reg1028 = IMUL32rri8 %reg1027, 37
      %EAX = MOV32rr %reg1028
      %EAX = CDQ %EDX<imp-def>, %EAX<imp-use>
      IDIV32r %reg1025, %EAX<imp-def>, %EDX<imp-def>, %EAX<imp-use>, %EDX<imp-use>
      %reg1029 = MOV32rr %EAX
      %reg1030 = NOT32r %reg1026
      %reg1031 = ADD32rr %reg1029, %reg1030
      %EAX = MOV32rr %reg1031
      RET

Abstract view

The register allocator's job is to assign each temp to a machine
register.  If that fails, the register allocator must rewrite the code
so that it can succeed.  For register allocation purposes, each
instruction is viewed abstractly as the set of temps it writes and the
set of temps it reads:

    Assem              Written <- (Read)
    movl $2, t7        t7 <- ()
    movl t7, t11       t11 <- (t7)
    imull t7, t11      t11 <- (t7, t11)
    movl t11, t10      t10 <- (t11)
    imull $37, t10     t10 <- (t10)
    movl t10, t8       t8 <- (t10)
    movl t7, t17       t17 <- (t7)
    addl $1, t17       t17 <- (t17)
    movl $33, %eax     %eax <- ()
    cltd               %edx,%eax <- (%eax)
    idivl t17          %eax,%edx <- (t17, %eax, %edx)

The allocator's result is a TempMap:

    t7  : %ebx
    t8  : %ecx
    t10 : %eax
    t11 : %eax
    t17 : %esi

Some terminology

Two temps interfere if at some point in the program they cannot both
occupy the same register.

A graph-coloring problem

Interference graph: an undirected graph where
  - nodes = temps
  - there is an edge between two nodes if their corresponding temps
    interfere

Which temps interfere in this program?

    v <- 1
    w <- v + 3
    x <- w + v
    u <- v
    t <- u + x
      <- x
      <- w
      <- t
      <- u

(The slide draws the resulting interference graph over the nodes
v, w, x, u, t.)

A graph is k-colorable if every node in the graph can be colored with
one of k colors such that two adjacent nodes do not have the same
color.  Assigning k registers = coloring with k colors.  (The slide
colors the example graph with the registers eax, edx, and ecx.)

History

For early architectures, register allocation was not very important.
Early work by Cocke (in 1971) proposed the idea that register
allocation can be viewed as a graph-coloring problem.  Chaitin was the
first to implement this idea, for the IBM 370 PL/1 compiler, in 1981.
In 1982, at IBM, Chaitin's allocator was used for the PL.8 compiler
for the IBM 801 RISC system.  Today, register allocation is the most
essential of code optimizations.

History, cont'd

Motivated by the first MIPS architecture, Chow and Hennessy developed
priority-based graph coloring in 1984 ("top down" coloring).  Another
popular algorithm for register allocation based on graph coloring is
due to Briggs in 1992 ("bottom up" coloring).

Steps in register allocation

    Build -> Color -> Spill

Building the interference graph

Given liveness information, we can build the interference graph (IG).
The live sets for the running example:

    v <- 1        {}
    w <- v + 3    {v}
    x <- w + v    {w,v}
    u <- v        {w,x,v}
    t <- u + x    {w,u,x}
      <- x        {x,w,u,t}
      <- w        {w,u,t}
      <- t        {u,t}
      <- u        {u}

Edges of the interference graph

Intuitively: two variables interfere if they overlap at some
point in the program.

Naive algorithm: at each point in the program, enter an edge for every
pair of variables live at that point.

An optimized definition and algorithm for edges: for each defining
instruction i, let x be the definition at i; for each variable y live
at the end of i, insert an edge between x and y.  For example, if D is
live across the definition A = 2 (A2), we get an edge between A and D.
This is faster than the naive algorithm; does it produce a
better-quality graph?

As pseudocode:

    for each defining inst i
      let x be the temp defined at inst i
      for all y in LIVE-IN of succ(i)
        insert an edge between x and y

A better interference graph

What does the interference graph look like for the following code, and
what is the minimum number of registers needed?

    x = 0;
    for (i = 0; i < 10; i++)
    {
      x += i;
    }

    y = global;
    y *= x;

    for (i = 0; i < 10; i++)
    {
      y += i;
    }

Live ranges and merged live ranges

A live range consists of a definition and all the points in a program
(e.g., the end of an instruction) in which that definition is live.
How do we compute a live range?  Two overlapping live ranges for the
same variable must be merged:

    a = ...      a = ...
          \      /
           merge
             |
          ... = a

Example: live variables and reaching definitions

    A = ...        (A1)
    IF A goto L1
      B = ...      (B1)
      ... = A
      D = B        (D2)
    L1:
      C = ...      (C1)
      ... = A
      D = ...      (D1)
    A = 2          (A2)
    ... = A
    ret D

(The slide annotates each program point with its live-variable set,
e.g. {A,D}, and its reaching-definition set, e.g. {A2,B1,C1,D1,D2}.)
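The optimized edge-insertion rule above can be sketched in Python.  The (def, uses) instruction encoding and the restriction to a single straight-line block are assumptions made for illustration, not part of the slides:

```python
# Build an interference graph from liveness in a straight-line block.
# Each instruction is a (defined_temp_or_None, list_of_used_temps) pair.
from collections import defaultdict

def live_out_sets(instrs):
    """Backward liveness pass: live-out set of each instruction."""
    live = set()
    outs = []
    for d, uses in reversed(instrs):
        outs.append(set(live))
        live.discard(d)        # the definition kills its temp
        live |= set(uses)      # uses make temps live
    outs.reverse()
    return outs

def build_interference(instrs):
    """Optimized rule: connect each definition to the temps live after it."""
    graph = defaultdict(set)
    for (d, _), out in zip(instrs, live_out_sets(instrs)):
        if d is None:
            continue
        for y in out:
            if y != d:
                graph[d].add(y)
                graph[y].add(d)
    return graph

# The running example from the slides:
code = [
    ("v", []),          # v <- 1
    ("w", ["v"]),       # w <- v + 3
    ("x", ["w", "v"]),  # x <- w + v
    ("u", ["v"]),       # u <- v
    ("t", ["u", "x"]),  # t <- u + x
    (None, ["x"]),      #   <- x
    (None, ["w"]),      #   <- w
    (None, ["t"]),      #   <- t
    (None, ["u"]),      #   <- u
]
```

On this example, `build_interference(code)` connects v to w and x but not to u or t, matching the optimized rule: edges are only drawn from a definition to whatever is live right after it.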
Merging live ranges

Merging definitions into equivalence classes:
  - Start by putting each definition in a different equivalence class.
  - For each point in a program: if a variable is live and there are
    multiple reaching definitions for the variable, merge the
    equivalence classes of all such definitions into one equivalence
    class.

Merged live ranges are also known as "webs".

Example: merged live ranges

In the example above, D1 and D2 merge into a single live range D1,2
(both reach the use in "ret D"), while A1 and A2 remain separate: A
has two webs.  Splitting a variable into webs makes register
allocation easier.

Steps in register allocation

Priority-based allocation refines the basic pipeline:

    Build -> Color -> Spill                        (basic)
    Build -> Prioritize -> Color/Select -> Spill   (priority-based)

Priority coloring

A heuristic priority function computes a priority for each variable;
the highest-priority variable is allocated first.  Another heuristic
selects which register to assign to the variable, e.g. rotating
registers, or the lowest-numbered register.  For the running example
with k = 3:

    Priorities:  v: 4, w: 3, x: 2, u: 3, t: 2
    Order:       v, w, u, x, t

Steps in register allocation

The graph-coloring pipeline adds a Coalesce stage after Build and
refines Color into Simplify, Potential Spill, and Select, with actual
Spill code restarting the process.

Graph coloring

Once we have the interference graph, we can attempt register
allocation by searching for a K-coloring.  This is an NP-complete
problem (for K >= 3)*, but a linear-time simplification algorithm by
Kempe (from 1879) tends to work well in practice.

* [1] H. Bodlaender, J. Gustedt, and J. A.
Telle, "Linear-time register allocation for a fixed number of
  registers," in Proceedings of the Ninth Annual ACM-SIAM Symposium on
  Discrete Algorithms, pp. 574-583, SIAM, 1998.
  [2] S. Kannan and T. Proebsting, "Register allocation in structured
  programs," in Proceedings of the Sixth Annual ACM-SIAM Symposium on
  Discrete Algorithms, pp. 360-368, SIAM, 1995.
  [3] M. Thorup, "All structured programs have small tree width and
  good register allocation," Inf. Comput., vol. 142, no. 2,
  pp. 159-181, 1998.

Kempe's algorithm

Basic observation: given a graph that contains a node with degree less
than K, the graph is K-colorable iff the graph without that node is
K-colorable.  This is called the "degree < K" rule.  So, step #1 of
Kempe's algorithm: iteratively remove nodes with degree < K.

Kempe's algorithm, cont'd

If all nodes are removed by step #1, then the graph is K-colorable.
However, the degree < K rule does not always work.  The slide shows a
five-node example (t1..t5) that is 3-colorable, but on which the
degree < 3 rule gets stuck.

In step #1, each removed node is pushed onto a stack.  When all nodes
have been removed, we pop each node, put it back into the graph, and
assign it a suitable color as we go.  In case we get stuck (i.e.,
there are no nodes with degree < K), we apply step #2: choose a node
with degree >= K, optimistically remove it, and continue.

Example (k = 3): the slide simplifies the v, w, x, u, t graph,
pushing the nodes onto the stack in turn.

Another example (k = 3): now what?

Be optimistic:
  - Put a node with degree >= k on the stack.
  - We lose the guarantee that anything we put on the stack is
    colorable.
  - If we're lucky, this node will still be colorable when popped from
    the stack.

Be realistic: ...
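The two steps above, simplification with the degree < K rule plus an optimistic (Briggs-style) push when stuck, can be sketched as follows.  The dict-of-neighbor-sets graph representation and the max-degree choice of optimistic candidate are illustrative assumptions:

```python
def kempe_color(graph, k):
    """Kempe-style simplification with optimistic pushes.

    graph: dict mapping each node to the set of its neighbors.
    Returns (coloring, spilled): colors are 0..k-1; spilled lists the
    nodes for which optimism failed and the code must be rewritten.
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Step #1: remove a node with degree < k (guaranteed colorable).
        node = next((n for n in work if len(work[n]) < k), None)
        if node is None:
            # Step #2: stuck -- optimistically remove a high-degree node.
            node = max(work, key=lambda n: len(work[n]))
        stack.append(node)
        for m in work.pop(node):
            work[m].discard(node)
    coloring, spilled = {}, []
    # Pop nodes back, assigning each a color unused by its neighbors.
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in graph[node] if m in coloring}
        free = next((c for c in range(k) if c not in used), None)
        if free is None:
            spilled.append(node)
        else:
            coloring[node] = free
    return coloring, spilled

# A 5-cycle: every node has degree 2, and an odd cycle needs 3 colors.
cycle = {"a": {"b", "e"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d", "a"}}
```

With k = 3 on the 5-cycle, step #1 alone removes every node and the coloring succeeds; with k = 2 the algorithm is stuck immediately, falls back to step #2, and at least one popped node ends up spilled.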