4. Selected Topics in Compiler Construction

Content of Lecture

Compilers and Language Processing Tools
Summer Term 2011
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group, TU Kaiserslautern

1. Introduction
2. Syntax and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-Free Syntax Analysis
   2.3 Context-Dependent Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Topics in Compiler Construction
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Register Allocation
   4.4 Just-in-time Compilation
   4.5 Further Aspects of Compilation
5. Garbage Collection
6. XML Processing (DOM, SAX, XSLT)

Chapter Outline

4. Selected Topics in Compiler Construction
   4.1 Intermediate Languages
       4.1.1 3-Address Code
       4.1.2 Other Intermediate Languages
   4.2 Optimization
       4.2.1 Classical Optimization Techniques
       4.2.2 Potential of Optimizations
       4.2.3 Data Flow Analysis
       4.2.4 Non-local Optimization
   4.3 Register Allocation
       4.3.1 Sethi-Ullman Algorithm
       4.3.2 Register Allocation by Graph Coloring
   4.4 Just-in-time Compilation
   4.5 Further Aspects of Compilation

Selected topics in compiler construction

Focus:
• techniques that go beyond the direct translation of source languages to target languages
• concentration on concepts instead of language-dependent details
• program representations tailored to the considered tasks (instead of source language syntax):
  - this simplifies the representation,
  - but needs more work to integrate the tasks

Learning objectives:
• intermediate languages for the translation and optimization of imperative languages
• different optimization techniques
• different static analysis techniques for (intermediate) programs
• register allocation
• some aspects of code generation

4.1 Intermediate Languages

Intermediate languages are used as
• an appropriate program representation for certain language implementation tasks, and
• a common representation for programs of different source languages:

    Source Language 1   Source Language 2   ...   Source Language n
                     \          |          /
                        Intermediate Language
                     /          |          \
    Target Language 1   Target Language 2   ...   Target Language m

• Intermediate languages for translation are comparable to data structures in algorithm design: for each task, a given intermediate language is more or less suitable.
• Intermediate languages can conceptually be seen as abstract machines.

4.1.1 3-Address Code

3-address code (3AC) is a common intermediate language with many variants.

Properties:
• only elementary data types (but often arrays)
• no nested expressions
• sequential execution; jumps and procedure calls as statements
• named variables as in a high-level language
• unbounded number of temporary variables
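As a small illustration of the "no nested expressions" property (this example is not from the slides): a nested C expression is flattened into a sequence of elementary 3AC commands over temporaries t1, t2:

    /* C source with a nested expression */
    x = (a + b) * (c - d);

    /* corresponding 3AC: one elementary operation per command */
    t1 := a + b
    t2 := c - d
    x  := t1 * t2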
A program in 3AC consists of
• a list of global variables,
• a list of procedures with parameters and local variables, and
• a main procedure.
Each procedure has a sequence of 3AC commands as its body.

3AC commands

    Syntax               Explanation
    x := y bop z         x: variable (global, local, parameter, temporary);
                         y, z: variable or constant; bop: binary operator
    x := uop z           uop: unary operator
    x := y               copy assignment
    goto L               jump to label L
    if x cop y goto L    conditional jump; cop: comparison operator;
                         only procedure-local jumps
    x := a[i]            a: one-dimensional array
    a[i] := y
    x := &a              &a: address of a; a: global or local variable,
                         or parameter
    x := *y              *: dereferencing operator
    *x := y

A call p(x1, ..., xn) is encoded as follows (the whole block is considered as one command):

    param x1
    ...
    param xn
    call p

return y causes a jump to the return address, with optional result y.

We assume that 3AC only contains labels for which jumps are used in the program.

Basic blocks

A sequence of 3AC commands can be uniquely partitioned into basic blocks. A basic block B is a maximal sequence of commands such that
• exactly one jump, procedure call, or return command occurs, at the end of B, and
• labels only occur at the first command of a basic block.

Remarks:
• The commands of a basic block are always executed sequentially; there are no jumps into the interior of a block.
• Often a designated exit block for a procedure, containing the return jump at its end, is required. This is handled by additional transformations.
• The transitions between basic blocks are often depicted as flow charts.
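The partitioning into basic blocks can be computed with the classic "leader" approach: mark every command that must start a block and cut before each of them. Below is a minimal C sketch (not from the lecture; the Cmd representation and its field names are hypothetical simplifications):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical, simplified 3AC command representation. */
    typedef struct {
        const char *text;  /* the command, e.g. "param 0" */
        bool has_label;    /* command carries a label that a jump targets */
        bool ends_block;   /* goto, if..goto, call, or return */
    } Cmd;

    /* Mark the "leaders", i.e. the first commands of basic blocks:
       - the first command of a procedure,
       - every labeled command,
       - every command following a jump, call, or return. */
    static void mark_leaders(const Cmd *cmds, size_t n, bool *leader) {
        for (size_t i = 0; i < n; i++)
            leader[i] = (i == 0) || cmds[i].has_label
                                 || cmds[i - 1].ends_block;
    }

    int main(void) {
        /* Tail of the procedure main from the example below: the call
           ends one block, so "return 0" starts a new basic block. */
        Cmd cmds[] = {
            { "param 0",     false, false },
            { "param 1",     false, false },
            { "param 2",     false, false },
            { "call skprod", false, true  },
            { "return 0",    false, true  },
        };
        size_t n = sizeof cmds / sizeof cmds[0];
        bool leader[sizeof cmds / sizeof cmds[0]];

        mark_leaders(cmds, n, leader);
        for (size_t i = 0; i < n; i++)
            printf("%s%s\n", leader[i] ? "block> " : "       ", cmds[i].text);
        return 0;
    }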
Example: 3AC and basic blocks

We consider the 3AC for a C program:

    int a[2]; int b[7];
    int skprod(int i1, int i2, int lng) { ... }
    int main() {
        a[0] = 1; a[1] = 2;
        b[0] = 4; b[1] = 5; b[2] = 6;
        skprod(0,1,2);
        return 0;
    }

3AC with basic block partitioning for the procedure main (the param/call block and the return form separate basic blocks):

    main:
        a[0] := 1
        a[1] := 2
        b[0] := 4
        b[1] := 5
        b[2] := 6
        param 0
        param 1
        param 2
        call skprod

        return 0

Procedure skprod with 3AC and basic block partitioning:

    int skprod(int i1, int i2, int lng) {
        int ix, res = 0;
        for( ix=0; ix <= lng-1; ix++ ){
            res += a[i1+ix] * b[i2+ix];
        }
        return res;
    }

    skprod:
        res := 0
        ix  := 0

        t0 := lng-1
        if ix <= t0          (true: loop body; false: return)

    true:
        t1  := i1+ix
        t2  := a[t1]
        t1  := i2+ix
        t3  := b[t1]
        t1  := t2*t3
        res := res+t1
        ix  := ix+1
        (back to the loop test)

    false:
        return res

Variation within an intermediate language: the same example as 3AC after elimination of array operations. Array accesses are resolved into address arithmetic and dereferencing; the factor 4 is the size of an array element in bytes:

    skprod:
        res := 0
        ix  := 0

        t0 := lng-1
        if ix <= t0

    true:
        t1  := i1+ix
        tx  := t1*4
        ta  := a+tx
        t2  := *ta
        t1  := i2+ix
        tx  := t1*4
        tb  := b+tx
        t3  := *tb
        t1  := t2*t3
        res := res+t1
        ix  := ix+1

    false:
        return res

Characteristics of 3-address code:
• Control flow is explicit.
• There are only elementary operations.
• Rearrangement and exchange of commands can be handled relatively easily.

4.1.2 Other Intermediate Languages

We consider
• 3AC in Static Single Assignment (SSA) representation, and
• stack machine code.

Static Single Assignment Form

SSA is essentially a refinement of 3AC. In SSA representation, every variable has exactly one definition; if a variable a is read at a program position, this is a use of a. The connection between uses and definitions is thereby explicit in the intermediate language, i.e., no additional def-use or use-def chaining is needed.
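To illustrate (a sketch, not from the slides; it uses the standard φ-notation of the SSA literature): the skprod loop from above in SSA form. Every variable is split into versions with exactly one definition each, and φ-commands select the incoming version at the control flow join at the loop head:

    skprod:
        res0 := 0
        ix0  := 0
    L1: res1 := phi(res0, res2)    ; from entry or from the loop body
        ix1  := phi(ix0, ix2)
        t0   := lng - 1
        if ix1 <= t0 goto L2
        return res1
    L2: t1   := i1 + ix1
        t2   := a[t1]
        t3   := i2 + ix1
        t4   := b[t3]
        t5   := t2 * t4
        res2 := res1 + t5
        ix2  := ix1 + 1
        goto L1

Note that the temporary t1, which the plain 3AC version reuses three times, is split into t1, t3, t5 here, since every SSA name may be defined only once.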