CSE 501: Implementation of Programming Languages (Course Outline)
Craig Chambers


Course outline

• Models of compilation/analysis
• Standard optimizing transformations
• Basic representations and analyses
• Fancier representations and analyses
• Interprocedural representations, analyses, and transformations
  • for imperative, functional, and OO languages
• Run-time system issues
  • garbage collection
  • compiling dynamic dispatch, first-class functions, ...
• Dynamic (JIT) compilation
• Other program analysis frameworks and tools
  • model checking, constraints, best-effort "bug finders"

Main focus: program analysis and transformation
• how to represent programs?
• how to analyze programs? what to analyze?
• how to transform programs? what transformations to apply?

We study imperative, functional, and object-oriented languages.

Official prerequisites:
• CSE 401 or equivalent
• CSE 505 or equivalent

Reading: Appel's "Modern Compiler Implementation", plus roughly 20 papers from the literature; "Compilers: Principles, Techniques, & Tools" (a.k.a. the Dragon Book) as a reference.

Coursework:
• periodic homework assignments
• major course project
• midterm, final

Why study compilers?

Compilers are the meeting area of programming languages and architectures:
• the capabilities of compilers greatly influence the design of both.

Program representation, analysis, and transformation is widely useful beyond pure compilation:
• software engineering tools
• DB query optimizers, programmable graphics renderers (domain-specific languages and optimizers)
• safety/security checking of code, e.g. in programmable/extensible systems, networks, databases

Cool theoretical aspects, too:
• lattice domains, graph algorithms, computability/complexity

Goals for language implementation

Correctness.

Efficiency:
• of: time, data space, code space
• at: compile-time, run-time

Support expressive, safe language features:
• first-class, higher-order functions
• method dispatching
• exceptions, continuations
• reflection, dynamic code loading
• bounds-checked arrays
• garbage collection
• ...

Support desirable programming environment features:
• fast turnaround
• separate compilation, shared libraries
• source-level debugging
• profiling
• ...

Standard compiler organization

Analysis of the input program (the front-end) feeds synthesis of the output program (the back-end):

    character stream
      → Lexical Analysis → token stream
      → Syntactic Analysis → abstract syntax tree
      → Semantic Analysis → annotated AST
      → Intermediate Code Generation → intermediate form
      → Optimization → intermediate form
      → Code Generation → target language

An interpreter can also execute the annotated AST or the intermediate form directly.

Mixing front-ends and back-ends

Define an intermediate language (e.g. Java bytecode, MSIL, SUIF, WIL, C, C--, ...):
• compile multiple source languages into it (each such compiler may not be much more than a front-end)
• compile to multiple targets from it (each such compiler may not be much more than a back-end)
• or interpret/execute it directly
• or perform other analyses of it

Advantages:
• reuse of front-ends and back-ends
• portable "compiled" code

BUT: designing a portable intermediate language is hard:
• how universal? across input language models? target machine models?
• high-level or low-level?
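As a concrete illustration of what an intermediate form might look like (a sketch for discussion, not code from the course notes; all type and field names here are invented), a three-address-code IR can be modeled with a couple of small record types:

    # Minimal sketch of a three-address-code intermediate representation.
    # Each instruction has an opcode, an optional destination, and up to
    # two operands; a procedure would be a list of basic blocks.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Instr:
        op: str                      # e.g. "add", "shl", "load", "store", "br"
        dst: Optional[str] = None    # destination temporary/variable, if any
        src1: Optional[str] = None   # first operand (variable name or constant)
        src2: Optional[str] = None   # second operand, if any

    @dataclass
    class BasicBlock:
        label: str
        instrs: List[Instr] = field(default_factory=list)
        succs: List[str] = field(default_factory=list)   # labels of successor blocks

    # x := a + b ; y := x << 2, written as three-address code
    block = BasicBlock("entry", [
        Instr("add", "x", "a", "b"),
        Instr("shl", "y", "x", "2"),
    ])

Representations like this (and richer ones such as SSA form) are exactly what the analyses below operate on.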
Key questions

• How are programs represented in the compiler?
• How are analyses organized/structured?
  • over what region of the program are analyses performed?
  • what analysis algorithms are used?
• What kinds of optimizations can be performed?
  • which are profitable in practice?
  • how should analyses/optimizations be sequenced/combined?
• How best to compile in the face of:
  • pointers, arrays
  • first-class functions
  • inheritance & message passing
  • parallel target machines
• Other issues:
  • speeding compilation
  • making compilers portable, table-driven
  • supporting tools like debuggers, profilers, garbage collectors

Overview of optimizations

First analyze the program to learn things about it, then transform it based on that information; repeat.

Requirement: don't change the semantics!
• transform the input program into a semantically equivalent but better output program

Analysis determines when transformations are:
• legal
• profitable

Caveat: "optimize" is a misnomer:
• the result is almost never optimal
• some programs are sometimes slowed down on some inputs (although the hope is to speed up most programs on most inputs)

Semantics

Exactly what are the semantics to be preserved? Subtleties include:
• evaluation order
• arithmetic properties like associativity and commutativity
• behavior in "error" cases

Some languages are very precise:
• programmers always know what they're getting.

Others are weaker:
• they allow better performance (but how much?)

Semantics selected by compiler option?

Scope of analysis

Peephole: across a small number of "adjacent" instructions (adjacent in space or time)
• trivial analysis

Local: within a basic block
• simple, fast analysis

Intraprocedural (a.k.a. global): across basic blocks, within a procedure
• analysis more complex: branches, merges, loops

Interprocedural: across procedures, within a whole program
• analysis even more complex: calls, returns
• hard with separate compilation

Whole-program: the analysis examines the whole program in order to prove safety.

A tour of common optimizations/transformations

Arithmetic simplifications:
• constant folding
    x := 3 + 4    ⇒    x := 7
• strength reduction
    x := y * 4    ⇒    x := y << 2

Constant propagation:
    x := 5             x := 5            x := 5
    y := x + 2    ⇒    y := 5 + 2   ⇒    y := 7

Integer range analysis:
• fold comparisons based on known ranges, and eliminate unreachable code; here the bounds check can be removed because index < 10 throughout the loop:
    for(index = 0; index < 10; index++) {
      if index >= 10 goto _error
      a[index] := 0
    }
• more generally, symbolic assertion analysis

Common subexpression elimination (CSE):
    x := a + b         x := a + b
    ...           ⇒    ...
    y := a + b         y := x
• can also eliminate redundant memory references and branch tests

Partial redundancy elimination (PRE):
• like CSE, but the earlier expression is only available along a subset of the possible paths
    if ... then              if ... then
      x := a + b               t := a + b; x := t
    end                 ⇒    else
    ...                        t := a + b
    y := a + b               end
                             ...
                             y := t

Copy propagation:
    x := y             x := y
    w := w + x    ⇒    w := w + y

Dead (unused) assignment elimination:
    x := y ** z
    ... // no use of x        ⇒    (the assignment to x can be removed)
• a common clean-up after other optimizations:
    x := y                  x := y
    w := w + x         ⇒    w := w + y    ⇒    w := w + y
    ... // no use of x

Partial dead assignment elimination:
• like DAE, except the assignment is only used on some later paths

Dead (unreachable) code elimination:
    if false goto _else
    ...
    goto _done
    _else:
    ...
    _done:
• another common clean-up after other optimizations

Pointer/alias analysis:
    p := &x            p := &x           p := &x
    *p := 5       ⇒    *p := 5      ⇒    *p := 5
    y := x + 1         y := 5 + 1        y := 6

    x := 5
    *p := 3       ⇒    y := 6 ???   (legal only if *p cannot refer to x)
    y := x + 1
• augments lots of other optimizations/analyses
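To make the flavor of these scalar optimizations concrete before moving on to loop transformations, here is a minimal sketch (not from the course notes) of local CSE within one basic block; the (dst, op, src1, src2) tuple encoding, and the restriction to pure arithmetic statements, are assumptions made for illustration:

    # Minimal sketch of local common subexpression elimination.
    # Assumptions: each statement is a tuple (dst, op, src1, src2); operands
    # are variable names or constant strings; there are no memory operations
    # or calls, so only assignments to a variable invalidate expressions.
    def local_cse(block):
        available = {}   # (op, src1, src2) -> variable currently holding that value
        out = []
        for dst, op, src1, src2 in block:
            key = (op, src1, src2)
            if key in available:
                # reuse the previously computed value instead of recomputing
                out.append((dst, "copy", available[key], None))
                computed = False
            else:
                out.append((dst, op, src1, src2))
                computed = True
            # dst has been redefined: kill every recorded expression that
            # reads dst or whose value was being held in dst
            available = {k: v for k, v in available.items()
                         if v != dst and dst not in k[1:]}
            if computed and dst not in (src1, src2):
                available[key] = dst
        return out

    # x := a + b ; y := a + b   becomes   x := a + b ; y := copy x
    print(local_cse([("x", "add", "a", "b"),
                     ("y", "add", "a", "b")]))

A production pass would typically use value numbering, handle loads, stores, and calls, and work across basic blocks (global CSE or PRE); this sketch deliberately ignores all of that.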
Loop-invariant code motion:
    for j := 1 to 10               for j := 1 to 10
      for i := 1 to 10        ⇒      t := b[j]
        a[i] := a[i] + b[j]          for i := 1 to 10
                                       a[i] := a[i] + t

Induction variable elimination:
    for i := 1 to 10          ⇒    for p := &a[1] to &a[10]
      a[i] := a[i] + 1               *p := *p + 1
• a[i] takes several instructions to compute; *p takes one

Loop unrolling:
    for i := 1 to N           ⇒    for i := 1 to N by 4
      a[i] := a[i] + 1               a[i]   := a[i]   + 1
                                     a[i+1] := a[i+1] + 1
                                     a[i+2] := a[i+2] + 1
                                     a[i+3] := a[i+3] + 1

Parallelization:
    for i := 1 to 1000        ⇒    forall i := 1 to 1000
      a[i] := a[i] + 1               a[i] := a[i] + 1

Loop interchange, skewing, reversal, ...

Blocking/tiling:
• restructuring loops for better data cache locality
    for i := 1 to 1000                    for i := 1 to 1000 by TILESIZE
      for j := 1 to 1000                    for j := 1 to 1000 by TILESIZE
        for k := 1 to 1000          ⇒         for k := 1 to 1000
          c[i,j] += a[i,k] * b[k,j]             for i' := i to i+TILESIZE
                                                  for j' := j to j+TILESIZE
                                                    c[i',j'] += a[i',k] * b[k,j']

Inlining (here followed by constant propagation and strength reduction):
    l := ...              l := ...           l := ...
    w := 4           ⇒    w := 4        ⇒    w := 4
    a := area(l,w)        a := l * w         a := l << 2
• lots of "silly" optimizations become important after inlining

Interprocedural constant propagation, alias analysis, etc.

Static binding of dynamic calls:
• in imperative languages, for a call through a function pointer: if we can compute the unique target of the pointer, we can replace it with a direct call
• in functional languages, for a call of a computed function: if we can compute the unique value of the function expression, we can replace it with a direct call
• in OO languages, for a dynamically dispatched message: if we can deduce the class of the receiver, we can replace it with a direct call
• other optimizations are possible even if there are several possible targets

Procedure specialization.

Register allocation.

Instruction selection:
    p1 := p + 4      ⇒    ld %g3, [%g1 + 4]
    x := *p1
• particularly important on CISCs

Instruction scheduling:
    ld  %g2, [%g1 + 0]         ld  %g2, [%g1 + 0]
    add %g3, %g2, 1       ⇒    ld  %g5, [%g1 + 4]
    ld  %g2, [%g1 + 4]         add %g3, %g2, 1
    add %g4, %g2, 1            add %g4, %g5, 1
• particularly important with instructions that have delayed results, and on wide-issue machines
• vs. dynamically scheduled machines?

Optimization themes

• Don't compute it if you don't have to
  • dead assignment elimination
• Compute it at compile-time if you can
  • constant folding, loop unrolling, inlining
• Compute it as few times as possible ...

The phase ordering problem

Typically we want to perform a number of optimizations; in what order should the transformations be performed?
• some optimizations create opportunities for other optimizations ⇒ order optimizations using this dependence
• some optimizations are simplified if they can assume another optimization will run later and "clean up"
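The interplay that drives phase ordering can be shown with a toy pipeline (an invented sketch, not the course's code): constant propagation/folding turns expressions into constants, which makes earlier assignments dead, so the two passes are iterated until nothing changes. Statements use the same hypothetical (dst, op, src1, src2) tuple form as above, and only non-negative integer constants and "add"/"const" ops are handled.

    # Toy illustration of phase ordering: constant propagation/folding
    # followed by dead-assignment elimination, repeated to a fixed point.
    def const_prop_fold(block):
        consts, out = {}, []
        for dst, op, src1, src2 in block:
            s1 = consts.get(src1, src1)          # substitute known constants
            s2 = consts.get(src2, src2)
            if op == "add" and str(s1).isdigit() and str(s2).isdigit():
                op, s1, s2 = "const", str(int(s1) + int(s2)), None   # fold
            if op == "const":
                consts[dst] = s1
            else:
                consts.pop(dst, None)            # dst no longer a known constant
            out.append((dst, op, s1, s2))
        return out

    def dead_assign_elim(block, live_out):
        # Backward pass over a single block: keep only assignments whose
        # destination is still needed (toy model; no side effects handled).
        live, out = set(live_out), []
        for dst, op, src1, src2 in reversed(block):
            if dst in live:
                out.append((dst, op, src1, src2))
                live.discard(dst)
                live.update(s for s in (src1, src2)
                            if s is not None and not str(s).isdigit())
        out.reverse()
        return out

    def optimize(block, live_out=("y",)):
        while True:
            new = dead_assign_elim(const_prop_fold(block), live_out)
            if new == block:
                return block
            block = new

    # x := 5 ; t := x + 2 ; y := t + 1   collapses to   y := 8
    print(optimize([("x", "const", "5", None),
                    ("t", "add", "x", "2"),
                    ("y", "add", "t", "1")]))

Running the example reduces the three statements to the single assignment y := 8: folding exposes dead assignments, elimination removes them, and a second round confirms the fixed point, which is exactly the create-opportunities dependence the notes describe.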