Lecture 25: Dynamic Compilation

Goals of This Lecture
• Beyond static compilation
• Example of a complete system
• Use of data flow techniques in a new context
• Experimental approach

Outline
I. Motivation & Background
II. Overview
III. Compilation Policy
IV. Partial Method Compilation
V. Partial Dead Code Elimination
VI. Escape Analysis
VII. Results

“Partial Method Compilation Using Dynamic Profile Information”, John Whaley, OOPSLA ’01. (Slide content courtesy of John Whaley & Monica Lam.)

Carnegie Mellon

15-745: Dynamic Compilation — Todd C. Mowry

Static/Dynamic × High-Level/Binary

• Compilation: high-level → binary, static
• Binary translation: binary → binary; mostly dynamic
  – Run “as-is”
  – Software migration (x86 → alpha, sun, transmeta; 68000 → powerPC → x86)
  – Virtualization (make hardware virtualizable)
  – Dynamic optimization (DynamoRIO)
  – Security (execute out of code in a cache that is “protected”)
• Interpretation: high-level, emulate, dynamic
• Dynamic compilation: high-level → binary, dynamic
  – machine-independent, dynamic loading
  – cross-module optimization
  – specialize program using runtime information, even without profiling


Closed-World vs. Open-World

• Closed-world assumption (most static compilers)
  – All code is available a priori for analysis and compilation.
• Open-world assumption (most dynamic compilers)
  – Code is not all available; arbitrary code can be loaded at run time.
• The open-world assumption precludes many optimization opportunities.
  – Solution: optimistically assume the best case, but provide a way out if necessary.

II. Overview of Dynamic Compilation

• Interpretation/compilation policy decisions
  – Choosing what and how to compile
• Collecting runtime information
  – Instrumentation
  – Sampling
• Exploiting runtime information
  – e.g., frequently-executed code paths


Speculative Inlining

[Figure: v-table dispatch — Method 1 … Method 5 entries, each pointing to its method’s bytecode.]

• Virtual call sites are deadly.
  – They kill optimization opportunities.
  – Virtual dispatch is expensive on modern CPUs.
  – They are very common in object-oriented code.
• Speculatively inline the most likely call target based on class hierarchy or profile information.
  – Many virtual call sites have only one target, so this technique is very effective in practice.

III. Compilation Policy

• ΔT_total = T_compile − (n_executions × T_improvement)
  – If ΔT_total is negative, our compilation policy decision was effective.
• We can try to:
  – Reduce T_compile (faster compile times)
  – Increase T_improvement (generate better code)
  – Focus on large n_executions (compile hot spots)
• 80/20 rule (Pareto principle): 20% of the work for 80% of the advantage
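The guard-and-fallback shape of speculative inlining can be sketched in source-level Java (the Shape/Circle/Square hierarchy is hypothetical; a real JIT emits the equivalent of the guarded fast path below in machine code, often transferring to the interpreter on guard failure instead of making the virtual call):

```java
// Sketch of speculative inlining, assuming a hypothetical Shape hierarchy.
// When profile data says the receiver is almost always a Circle, the compiler
// replaces the virtual dispatch with a cheap class test plus the inlined body.
interface Shape { double area(); }

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class SpeculativeInlining {
    // What the JIT conceptually generates for "shape.area()".
    static double speculativeArea(Shape shape) {
        if (shape.getClass() == Circle.class) {
            // Guard succeeded: inlined body of Circle.area(), no dispatch.
            double r = ((Circle) shape).r;
            return Math.PI * r * r;
        }
        // Guard failed (rare): fall back to the full virtual call.
        return shape.area();
    }

    public static void main(String[] args) {
        System.out.println(speculativeArea(new Circle(1.0)));
        System.out.println(speculativeArea(new Square(2.0)));
    }
}
```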

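The policy equation ΔT_total = T_compile − (n_executions × T_improvement) is a simple break-even test; a minimal sketch, with all numbers hypothetical:

```java
// Break-even test for the compilation policy:
//   deltaT = compileCost - executions * perExecutionImprovement
// Compiling pays off exactly when deltaT is negative.
public class CompilePolicy {
    static double deltaT(double tCompile, long nExecutions, double tImprovement) {
        return tCompile - nExecutions * tImprovement;
    }

    static boolean worthCompiling(double tCompile, long nExecutions, double tImprovement) {
        return deltaT(tCompile, nExecutions, tImprovement) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical costs in microseconds: 10 ms to compile,
        // 5 us saved per execution of the compiled code.
        System.out.println(worthCompiling(10_000, 100, 5));     // false: too few runs
        System.out.println(worthCompiling(10_000, 100_000, 5)); // true: hot method
    }
}
```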

Latency vs. Throughput

• Tradeoff: startup speed vs. execution performance

                       Startup speed    Execution performance
  Interpreter          Best             Poor
  “Quick” compiler     Fair             Fair
  Optimizing compiler  Poor             Best

Multi-Stage Dynamic Compilation System

  Stage 1: interpreted code
      ↓  when execution count = t1 (e.g. 2000)
  Stage 2: compiled code
      ↓  when execution count = t2 (e.g. 25000)
  Stage 3: fully optimized code

• Execution count is the sum of method invocations & back edges executed.
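The staged promotion can be sketched as a per-method counter, bumped on every invocation and back edge, with the slide’s thresholds (t1 = 2000, t2 = 25000) triggering recompilation:

```java
// Sketch of the three-stage policy: a method starts interpreted, is compiled
// by the quick compiler once its count (invocations + back edges) reaches t1,
// and is recompiled with full optimization at t2.
public class TieredCounter {
    enum Stage { INTERPRETED, QUICK_COMPILED, FULLY_OPTIMIZED }

    static final int T1 = 2_000;    // interpreted -> quick-compiled
    static final int T2 = 25_000;   // quick-compiled -> fully optimized

    int count = 0;
    Stage stage = Stage.INTERPRETED;

    // Called by the runtime on every method invocation and loop back edge.
    void tick() {
        count++;
        if (stage == Stage.INTERPRETED && count >= T1) {
            stage = Stage.QUICK_COMPILED;       // Stage 1 -> Stage 2
        } else if (stage == Stage.QUICK_COMPILED && count >= T2) {
            stage = Stage.FULLY_OPTIMIZED;      // Stage 2 -> Stage 3
        }
    }

    public static void main(String[] args) {
        TieredCounter m = new TieredCounter();
        for (int i = 0; i < 30_000; i++) m.tick();
        System.out.println(m.stage); // FULLY_OPTIMIZED
    }
}
```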


Granularity of Compilation

• Compilation time is proportional to the amount of code being compiled.
• Many optimizations are not linear.
• Methods can be large, especially after inlining.
• Cutting inlining too much hurts performance considerably.
• Even “hot” methods typically contain some code that is rarely or never executed.

Example: SpecJVM db

  void read_db(String fn) {
    int n = 0, act = 0, b;
    byte buffer[] = null;
    try {
      FileInputStream sif = new FileInputStream(fn);
      buffer = new byte[n];
      while ((b = sif.read(buffer, act, n-act)) > 0) {   // hot loop
        act = act + b;
      }
      sif.close();
      if (act != n) {
        /* lots of error handling code, rare */
      }
    } catch (IOException ioe) {
      /* lots of error handling code, rare */
    }
  }


Example: SpecJVM db (continued)

The same method, annotated: aside from the hot read loop, read_db is mostly rare code — the error-handling block after the loop and the catch clause (“lots of rare code!”) are compiled but almost never executed.

Optimize Hot “Regions”, Not Methods

• Optimize only the most frequently executed segments within a method.
  – Simple technique: any basic block executed during Stage 2 is considered to be hot.
• Beneficial secondary effect of improving optimization opportunities on the common paths.


Method-at-a-Time Strategy / Actual Basic Blocks Executed

[Charts: % of basic blocks compiled (left) and % of basic blocks actually executed (right) as a function of the execution threshold (1–5000), for Linpack, JavaCUP, JavaLEX, SwingSet, check, compress, jess, db, javac, mpegaudio, mtrt, and jack.]

Dynamic Code Transformations

• Compiling partial methods
• Partial dead code elimination
• Escape analysis

IV. Partial Method Compilation

1. Based on profile data, determine the set of rare blocks.
   – Use code coverage information from the first compiled version.
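Step 1 can be sketched as a pass over the coverage bits recorded by the first compiled version (block ids and the coverage bitmap are hypothetical):

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Sketch of rare-block selection: any basic block never executed while the
// first compiled (Stage 2) version was collecting coverage is classified rare.
public class RareBlocks {
    static Set<Integer> findRareBlocks(int numBlocks, BitSet executed) {
        Set<Integer> rare = new HashSet<>();
        for (int b = 0; b < numBlocks; b++) {
            if (!executed.get(b)) rare.add(b);   // never covered -> rare
        }
        return rare;
    }

    public static void main(String[] args) {
        BitSet covered = new BitSet();
        covered.set(0); covered.set(1); covered.set(3);  // blocks seen during Stage 2
        System.out.println(findRareBlocks(5, covered));  // blocks 2 and 4 are rare
    }
}
```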


Partial Method Compilation (continued)

2. Perform live variable analysis.
   – Determine the set of live variables at rare block entry points (e.g. live: x, y, z).
3. Redirect the control flow edges that targeted rare blocks, and remove the rare blocks.
   – The redirected edges now transfer control to the interpreter.
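Step 2 is the standard backward dataflow: liveIn(B) = use(B) ∪ (liveOut(B) − def(B)). A minimal iterative sketch over a toy CFG (the block structure and variable sets are hypothetical):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Iterative live-variable analysis over a tiny CFG: the backward dataflow
// used in step 2 to find what is live at each rare-block entry point.
public class Liveness {
    // liveIn(B) = use(B) U (liveOut(B) - def(B));  liveOut(B) = union of liveIn(succ)
    static Map<Integer, Set<String>> liveIn(Map<Integer, List<Integer>> succ,
                                            Map<Integer, Set<String>> use,
                                            Map<Integer, Set<String>> def) {
        Map<Integer, Set<String>> in = new HashMap<>();
        for (Integer b : succ.keySet()) in.put(b, new HashSet<>());
        boolean changed = true;
        while (changed) {                            // iterate to a fixed point
            changed = false;
            for (Integer b : succ.keySet()) {
                Set<String> live = new HashSet<>();
                for (Integer s : succ.get(b)) live.addAll(in.get(s)); // liveOut(b)
                live.removeAll(def.get(b));
                live.addAll(use.get(b));
                if (!live.equals(in.get(b))) { in.put(b, live); changed = true; }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // Toy CFG: block 0 branches to rare block 1, which uses x, y, and z.
        Map<Integer, List<Integer>> succ = Map.of(0, List.of(1), 1, List.of());
        Map<Integer, Set<String>> use = Map.of(0, Set.of(), 1, Set.of("x", "y", "z"));
        Map<Integer, Set<String>> def = Map.of(0, Set.of("x"), 1, Set.of());
        System.out.println(liveIn(succ, use, def).get(1)); // x, y, z live at rare entry
    }
}
```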


Partial Method Compilation (continued)

4. Perform compilation normally.
   – Analyses treat the interpreter transfer point as an unanalyzable method call.
5. Record a map for each interpreter transfer point.
   – In code generation, generate a map that specifies the location, in registers or memory, of each of the live variables (e.g. live: x, y, z → x: sp−4, y: r1, z: sp−8).
   – Maps are typically < 100 bytes.
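The per-transfer-point map from step 5 (like the slide’s x: sp−4, y: r1, z: sp−8) is just a small table from each live variable to its machine location; a sketch with hypothetical register and slot names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the map recorded at each interpreter transfer point: for every
// live variable, where the compiled code keeps it (a register or a stack
// slot), so the interpreter's state can be rebuilt on a rare-path exit.
public class TransferPointMap {
    // A location is either a register name or a stack-pointer offset.
    record Location(String register, int spOffset) {
        static Location reg(String r)  { return new Location(r, 0); }
        static Location stack(int off) { return new Location(null, off); }
        public String toString() {
            return register != null ? register
                                    : "sp" + (spOffset >= 0 ? "+" : "") + spOffset;
        }
    }

    final Map<String, Location> locations = new LinkedHashMap<>();

    public static void main(String[] args) {
        TransferPointMap map = new TransferPointMap();
        // The slide's example: x at sp-4, y in r1, z at sp-8.
        map.locations.put("x", Location.stack(-4));
        map.locations.put("y", Location.reg("r1"));
        map.locations.put("z", Location.stack(-8));
        System.out.println(map.locations); // {x=sp-4, y=r1, z=sp-8}
    }
}
```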


V. Partial Dead Code Elimination

• Move computation that is only live on a rare path into the rare block, saving computation in the common case.

Partial Dead Code Example

Before:
  x = 0;
  if (rare branch 1) { ... z = x + y; ... }
  if (rare branch 2) { ... a = x + z; ... }

After:
  if (rare branch 1) { x = 0; ... z = x + y; ... }
  if (rare branch 2) { x = 0; ... a = x + z; ... }
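The transformation can be checked end to end: the before and after versions compute the same values, but the after version only pays for x = 0 on the rare paths (the rare1/rare2 flags stand in for the slide’s rare branches; the extra z/a initialization is added only to make the sketch deterministic):

```java
import java.util.Arrays;

// Sketch verifying the partial-dead-code move: "x = 0" is only live on the
// rare paths, so sinking it into both rare blocks preserves semantics while
// removing the assignment from the common (no-rare-branch) path.
public class PartialDeadCode {
    static int[] before(boolean rare1, boolean rare2, int y) {
        int x = 0, z = 0, a = 0;               // x assigned unconditionally
        if (rare1) { z = x + y; }
        if (rare2) { x = 0; a = x + z; }
        return new int[] { z, a };
    }

    static int[] after(boolean rare1, boolean rare2, int y) {
        int z = 0, a = 0;
        if (rare1) { int x = 0; z = x + y; }   // x initialized only when needed
        if (rare2) { int x = 0; a = x + z; }
        return new int[] { z, a };
    }

    public static void main(String[] args) {
        for (int mask = 0; mask < 4; mask++) {
            boolean r1 = (mask & 1) != 0, r2 = (mask & 2) != 0;
            System.out.println(Arrays.equals(before(r1, r2, 7), after(r1, r2, 7)));
        }
    }
}
```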


VI. Escape Analysis

• Escape analysis finds objects that do not escape a method or a thread.
  – “Captured” by method: can be allocated on the stack or in registers.
  – “Captured” by thread: can avoid synchronization operations.
• All Java objects are normally heap allocated, so this is a big win.

Escape Analysis (continued)

• Stack allocate objects that don’t escape in the common blocks.
• Eliminate synchronization on objects that don’t escape the common blocks.
• If a branch to a rare block is taken:
  – Copy stack-allocated objects to the heap and update pointers.
  – Reapply eliminated synchronizations.
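A minimal Java illustration of the two cases (the code itself is hypothetical): the StringBuffer below never escapes its method, so a JIT with escape analysis may place it on the stack and drop the locking its synchronized append method implies; the second object escapes through a static field, so neither optimization applies.

```java
// Sketch of the two escape cases.  A JIT may stack-allocate `sb` in
// captured() and strip its monitor operations, because no reference to it
// survives the method; the object in escaping() must stay on the heap.
public class EscapeDemo {
    static StringBuilder shared;   // anything stored here escapes the method

    // `sb` is captured by the method: never stored, returned, or passed on.
    static int captured(int n) {
        StringBuffer sb = new StringBuffer();  // StringBuffer methods are synchronized
        for (int i = 0; i < n; i++) sb.append(i);
        return sb.length();                    // only a scalar leaves the method
    }

    // The allocation escapes: it must be heap-allocated, synchronization kept.
    static int escaping(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i);
        shared = sb;                           // escapes through a static field
        return sb.length();
    }

    public static void main(String[] args) {
        System.out.println(captured(10)); // 10
        System.out.println(escaping(10)); // 10
    }
}
```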


VII. Run Time Improvement

[Chart: run time, normalized to 100%, for check, compress, jess, db, javac, mpegaudio, mtrt, jack, SwingSet, linpack, JLex, and JCup.]

First bar: original (whole-method optimization)
Second bar: partial method compilation (PMC)
Third bar: PMC + optimizations
Bottom bar: execution time if the code were compiled/optimized from the beginning

