Lecture 25: Dynamic Compilation

Goals of This Lecture
• Beyond static compilation
• Example of a complete system
• Use of data flow techniques in a new context
• Experimental approach

Outline
I. Motivation & Background
II. Overview
III. Compilation Policy
IV. Partial Method Compilation
V. Partial Dead Code Elimination
VI. Escape Analysis
VII. Results

“Partial Method Compilation Using Dynamic Profile Information”, John Whaley, OOPSLA ’01. (Slide content courtesy of John Whaley & Monica Lam.)

Carnegie Mellon

15-745: Dynamic Compilation — Todd C. Mowry

Static/Dynamic × High-Level/Binary

• Compilation: high-level → binary, static
• Binary translation: binary → binary; mostly dynamic
  – Run “as-is”
  – Software migration (x86 → alpha, sun, transmeta; 68000 → powerPC → x86)
  – Virtualization (make hardware virtualizable)
  – Dynamic optimization (DynamoRIO)
  – Security (execute out of code in a cache that is “protected”)
• Interpretation: high-level, emulate, dynamic
• Dynamic compilation: high-level → binary, dynamic
  – machine-independent, dynamic loading
  – cross-module optimization
  – specialize program using runtime information, even without profiling


Closed-World vs. Open-World

• Closed-world assumption (most static compilers)
  – All code is available a priori for analysis and compilation.
• Open-world assumption (most dynamic compilers)
  – Code is not all available; arbitrary code can be loaded at run time.
• The open-world assumption precludes many optimization opportunities.
  – Solution: optimistically assume the best case, but provide a way out if necessary.

II. Overview of Dynamic Compilation

• Interpretation/compilation policy decisions
  – Choosing what and how to compile
• Collecting runtime information
  – Instrumentation
  – Sampling
• Exploiting runtime information
  – e.g., frequently-executed code paths


Speculative Inlining

[Figure: v-table dispatch — Method 1 … Method 5 entries, each pointing to its method’s bytecode.]

• Virtual call sites are deadly.
  – They kill optimization opportunities.
  – Virtual dispatch is expensive on modern CPUs.
  – They are very common in object-oriented code.
• Speculatively inline the most likely call target based on class hierarchy or profile information.
  – Many virtual call sites have only one target, so this technique is very effective in practice.

III. Compilation Policy

• ΔT_total = T_compile − (n_executions × T_improvement)
  – If ΔT_total is negative, our compilation policy decision was effective.
• We can try to:
  – Reduce T_compile (faster compile times)
  – Increase T_improvement (generate better code)
  – Focus on large n_executions (compile hot spots)
• 80/20 rule (Pareto principle): 20% of the work for 80% of the advantage
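The guard-and-fallback shape of speculative inlining can be sketched in source-level Java (the Shape/Circle/Square hierarchy is hypothetical; a real JIT emits the equivalent of the guarded fast path below in machine code, often transferring to the interpreter on guard failure instead of making the virtual call):

```java
// Sketch of speculative inlining, assuming a hypothetical Shape hierarchy.
// When profile data says the receiver is almost always a Circle, the compiler
// replaces the virtual dispatch with a cheap class test plus the inlined body.
interface Shape { double area(); }

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class SpeculativeInlining {
    // What the JIT conceptually generates for "shape.area()".
    static double speculativeArea(Shape shape) {
        if (shape.getClass() == Circle.class) {
            // Guard succeeded: inlined body of Circle.area(), no dispatch.
            double r = ((Circle) shape).r;
            return Math.PI * r * r;
        }
        // Guard failed (rare): fall back to the full virtual call.
        return shape.area();
    }

    public static void main(String[] args) {
        System.out.println(speculativeArea(new Circle(1.0)));
        System.out.println(speculativeArea(new Square(2.0)));
    }
}
```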

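The policy equation ΔT_total = T_compile − (n_executions × T_improvement) is a simple break-even test; a minimal sketch, with all numbers hypothetical:

```java
// Break-even test for the compilation policy:
//   deltaT = compileCost - executions * perExecutionImprovement
// Compiling pays off exactly when deltaT is negative.
public class CompilePolicy {
    static double deltaT(double tCompile, long nExecutions, double tImprovement) {
        return tCompile - nExecutions * tImprovement;
    }

    static boolean worthCompiling(double tCompile, long nExecutions, double tImprovement) {
        return deltaT(tCompile, nExecutions, tImprovement) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical costs in microseconds: 10 ms to compile,
        // 5 us saved per execution of the compiled code.
        System.out.println(worthCompiling(10_000, 100, 5));     // false: too few runs
        System.out.println(worthCompiling(10_000, 100_000, 5)); // true: hot method
    }
}
```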

Latency vs. Throughput

• Tradeoff: startup speed vs. execution performance

                       Startup speed    Execution performance
  Interpreter          Best             Poor
  “Quick” compiler     Fair             Fair
  Optimizing compiler  Poor             Best

Multi-Stage Dynamic Compilation System

  Stage 1: interpreted code
      ↓  when execution count = t1 (e.g. 2000)
  Stage 2: compiled code
      ↓  when execution count = t2 (e.g. 25000)
  Stage 3: fully optimized code

• Execution count is the sum of method invocations & back edges executed.
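The staged promotion can be sketched as a per-method counter, bumped on every invocation and back edge, with the slide’s thresholds (t1 = 2000, t2 = 25000) triggering recompilation:

```java
// Sketch of the three-stage policy: a method starts interpreted, is compiled
// by the quick compiler once its count (invocations + back edges) reaches t1,
// and is recompiled with full optimization at t2.
public class TieredCounter {
    enum Stage { INTERPRETED, QUICK_COMPILED, FULLY_OPTIMIZED }

    static final int T1 = 2_000;    // interpreted -> quick-compiled
    static final int T2 = 25_000;   // quick-compiled -> fully optimized

    int count = 0;
    Stage stage = Stage.INTERPRETED;

    // Called by the runtime on every method invocation and loop back edge.
    void tick() {
        count++;
        if (stage == Stage.INTERPRETED && count >= T1) {
            stage = Stage.QUICK_COMPILED;       // Stage 1 -> Stage 2
        } else if (stage == Stage.QUICK_COMPILED && count >= T2) {
            stage = Stage.FULLY_OPTIMIZED;      // Stage 2 -> Stage 3
        }
    }

    public static void main(String[] args) {
        TieredCounter m = new TieredCounter();
        for (int i = 0; i < 30_000; i++) m.tick();
        System.out.println(m.stage); // FULLY_OPTIMIZED
    }
}
```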


Granularity of Compilation

• Compilation time is proportional to the amount of code being compiled.
• Many optimizations are not linear.
• Methods can be large, especially after inlining.
• Cutting inlining too much hurts performance considerably.
• Even “hot” methods typically contain some code that is rarely or never executed.

Example: SpecJVM db

  void read_db(String fn) {
    int n = 0, act = 0, b;
    byte buffer[] = null;
    try {
      FileInputStream sif = new FileInputStream(fn);
      buffer = new byte[n];
      while ((b = sif.read(buffer, act, n-act)) > 0) {   // hot loop
        act = act + b;
      }
      sif.close();
      if (act != n) {
        /* lots of error handling code, rare */
      }
    } catch (IOException ioe) {
      /* lots of error handling code, rare */
    }
  }


Example: SpecJVM db (continued)

The same method, annotated: aside from the hot read loop, read_db is mostly rare code — the error-handling block after the loop and the catch clause (“lots of rare code!”) are compiled but almost never executed.

Optimize Hot “Regions”, Not Methods

• Optimize only the most frequently executed segments within a method.
  – Simple technique: any basic block executed during Stage 2 is considered to be hot.
• Beneficial secondary effect of improving optimization opportunities on the common paths.


Method-at-a-Time Strategy / Actual Basic Blocks Executed

[Charts: % of basic blocks compiled (left) and % of basic blocks actually executed (right) as a function of the execution threshold (1–5000), for Linpack, JavaCUP, JavaLEX, SwingSet, check, compress, jess, db, javac, mpegaudio, mtrt, and jack.]

Dynamic Code Transformations

• Compiling partial methods
• Partial dead code elimination
• Escape analysis

IV. Partial Method Compilation

1. Based on profile data, determine the set of rare blocks.
   – Use code coverage information from the first compiled version.
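Step 1 can be sketched as a pass over the coverage bits recorded by the first compiled version (block ids and the coverage bitmap are hypothetical):

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Sketch of rare-block selection: any basic block never executed while the
// first compiled (Stage 2) version was collecting coverage is classified rare.
public class RareBlocks {
    static Set<Integer> findRareBlocks(int numBlocks, BitSet executed) {
        Set<Integer> rare = new HashSet<>();
        for (int b = 0; b < numBlocks; b++) {
            if (!executed.get(b)) rare.add(b);   // never covered -> rare
        }
        return rare;
    }

    public static void main(String[] args) {
        BitSet covered = new BitSet();
        covered.set(0); covered.set(1); covered.set(3);  // blocks seen during Stage 2
        System.out.println(findRareBlocks(5, covered));  // blocks 2 and 4 are rare
    }
}
```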


Partial Method Compilation (continued)

2. Perform live variable analysis.
   – Determine the set of live variables at rare block entry points (e.g. live: x, y, z).
3. Redirect the control flow edges that targeted rare blocks, and remove the rare blocks.
   – The redirected edges now transfer control to the interpreter.
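Step 2 is the standard backward dataflow: liveIn(B) = use(B) ∪ (liveOut(B) − def(B)). A minimal iterative sketch over a toy CFG (the block structure and variable sets are hypothetical):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Iterative live-variable analysis over a tiny CFG: the backward dataflow
// used in step 2 to find what is live at each rare-block entry point.
public class Liveness {
    // liveIn(B) = use(B) U (liveOut(B) - def(B));  liveOut(B) = union of liveIn(succ)
    static Map<Integer, Set<String>> liveIn(Map<Integer, List<Integer>> succ,
                                            Map<Integer, Set<String>> use,
                                            Map<Integer, Set<String>> def) {
        Map<Integer, Set<String>> in = new HashMap<>();
        for (Integer b : succ.keySet()) in.put(b, new HashSet<>());
        boolean changed = true;
        while (changed) {                            // iterate to a fixed point
            changed = false;
            for (Integer b : succ.keySet()) {
                Set<String> live = new HashSet<>();
                for (Integer s : succ.get(b)) live.addAll(in.get(s)); // liveOut(b)
                live.removeAll(def.get(b));
                live.addAll(use.get(b));
                if (!live.equals(in.get(b))) { in.put(b, live); changed = true; }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // Toy CFG: block 0 branches to rare block 1, which uses x, y, and z.
        Map<Integer, List<Integer>> succ = Map.of(0, List.of(1), 1, List.of());
        Map<Integer, Set<String>> use = Map.of(0, Set.of(), 1, Set.of("x", "y", "z"));
        Map<Integer, Set<String>> def = Map.of(0, Set.of("x"), 1, Set.of());
        System.out.println(liveIn(succ, use, def).get(1)); // x, y, z live at rare entry
    }
}
```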


Partial Method Compilation (continued)

4. Perform compilation normally.
   – Analyses treat the interpreter transfer point as an unanalyzable method call.
5. Record a map for each interpreter transfer point.
   – In code generation, generate a map that specifies the location, in registers or memory, of each of the live variables (e.g. live: x, y, z → x: sp−4, y: r1, z: sp−8).
   – Maps are typically < 100 bytes.
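The per-transfer-point map from step 5 (like the slide’s x: sp−4, y: r1, z: sp−8) is just a small table from each live variable to its machine location; a sketch with hypothetical register and slot names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the map recorded at each interpreter transfer point: for every
// live variable, where the compiled code keeps it (a register or a stack
// slot), so the interpreter's state can be rebuilt on a rare-path exit.
public class TransferPointMap {
    // A location is either a register name or a stack-pointer offset.
    record Location(String register, int spOffset) {
        static Location reg(String r)  { return new Location(r, 0); }
        static Location stack(int off) { return new Location(null, off); }
        public String toString() {
            return register != null ? register
                                    : "sp" + (spOffset >= 0 ? "+" : "") + spOffset;
        }
    }

    final Map<String, Location> locations = new LinkedHashMap<>();

    public static void main(String[] args) {
        TransferPointMap map = new TransferPointMap();
        // The slide's example: x at sp-4, y in r1, z at sp-8.
        map.locations.put("x", Location.stack(-4));
        map.locations.put("y", Location.reg("r1"));
        map.locations.put("z", Location.stack(-8));
        System.out.println(map.locations); // {x=sp-4, y=r1, z=sp-8}
    }
}
```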


V. Partial Dead Code Elimination

• Move computation that is only live on a rare path into the rare block, saving computation in the common case.

Partial Dead Code Example

Before:
  x = 0;
  if (rare branch 1) { ... z = x + y; ... }
  if (rare branch 2) { ... a = x + z; ... }

After:
  if (rare branch 1) { x = 0; ... z = x + y; ... }
  if (rare branch 2) { x = 0; ... a = x + z; ... }
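The transformation can be checked end to end: the before and after versions compute the same values, but the after version only pays for x = 0 on the rare paths (the rare1/rare2 flags stand in for the slide’s rare branches; the extra z/a initialization is added only to make the sketch deterministic):

```java
import java.util.Arrays;

// Sketch verifying the partial-dead-code move: "x = 0" is only live on the
// rare paths, so sinking it into both rare blocks preserves semantics while
// removing the assignment from the common (no-rare-branch) path.
public class PartialDeadCode {
    static int[] before(boolean rare1, boolean rare2, int y) {
        int x = 0, z = 0, a = 0;               // x assigned unconditionally
        if (rare1) { z = x + y; }
        if (rare2) { x = 0; a = x + z; }
        return new int[] { z, a };
    }

    static int[] after(boolean rare1, boolean rare2, int y) {
        int z = 0, a = 0;
        if (rare1) { int x = 0; z = x + y; }   // x initialized only when needed
        if (rare2) { int x = 0; a = x + z; }
        return new int[] { z, a };
    }

    public static void main(String[] args) {
        for (int mask = 0; mask < 4; mask++) {
            boolean r1 = (mask & 1) != 0, r2 = (mask & 2) != 0;
            System.out.println(Arrays.equals(before(r1, r2, 7), after(r1, r2, 7)));
        }
    }
}
```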


VI. Escape Analysis

• Escape analysis finds objects that do not escape a method or a thread.
  – “Captured” by method: can be allocated on the stack or in registers.
  – “Captured” by thread: can avoid synchronization operations.
• All Java objects are normally heap allocated, so this is a big win.

Escape Analysis (continued)

• Stack allocate objects that don’t escape in the common blocks.
• Eliminate synchronization on objects that don’t escape the common blocks.
• If a branch to a rare block is taken:
  – Copy stack-allocated objects to the heap and update pointers.
  – Reapply eliminated synchronizations.
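A minimal Java illustration of the two cases (the code itself is hypothetical): the StringBuffer below never escapes its method, so a JIT with escape analysis may place it on the stack and drop the locking its synchronized append method implies; the second object escapes through a static field, so neither optimization applies.

```java
// Sketch of the two escape cases.  A JIT may stack-allocate `sb` in
// captured() and strip its monitor operations, because no reference to it
// survives the method; the object in escaping() must stay on the heap.
public class EscapeDemo {
    static StringBuilder shared;   // anything stored here escapes the method

    // `sb` is captured by the method: never stored, returned, or passed on.
    static int captured(int n) {
        StringBuffer sb = new StringBuffer();  // StringBuffer methods are synchronized
        for (int i = 0; i < n; i++) sb.append(i);
        return sb.length();                    // only a scalar leaves the method
    }

    // The allocation escapes: it must be heap-allocated, synchronization kept.
    static int escaping(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i);
        shared = sb;                           // escapes through a static field
        return sb.length();
    }

    public static void main(String[] args) {
        System.out.println(captured(10)); // 10
        System.out.println(escaping(10)); // 10
    }
}
```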


VII. Run Time Improvement

[Chart: run time, normalized to 100%, for check, compress, jess, db, javac, mpegaudio, mtrt, jack, SwingSet, linpack, JLex, and JCup.]

First bar: original (whole-method optimization)
Second bar: partial method compilation (PMC)
Third bar: PMC + optimizations
Bottom bar: execution time if the code were compiled/optimized from the beginning

