Lecture 17: Dynamic Code Optimization
15-745 (Phillip B. Gibbons, Carnegie Mellon)

I. Beyond Static Compilation

Outline:
I. Motivation & Background
II. Overview
III. Partial Method Compilation
IV. Partial Dead Code Elimination
V. Partial Escape Analysis
VI. Results

Based on "Partial Method Compilation Using Dynamic Profile Information", John Whaley, OOPSLA '01.

Three approaches beyond purely static compilation:
1) Profile-based compiler: high-level → binary, static
   – uses dynamic (= runtime) information collected in profiling passes
2) Interpreter: high-level, emulated, dynamic
3) Dynamic compilation / code optimization: high-level → binary, dynamic
   – an interpreter/compiler hybrid
   – supports cross-module optimization
   – can specialize the program using runtime information, without separate profiling passes

1) Dynamic Profiling Can Improve Compile-time Optimizations

• Understanding common dynamic behaviors may help guide optimizations, e.g., control flow, data dependences, input values:

    void foo(int A, int B) {      // What are typical values of A, B?
      ...
      while (...) {
        if (A > B)                // How often is this condition true?
          *p = 0;                 // How often does *p == val[i]?
        C = val[i] + D;           // Is this loop invariant?
        E += C - B;
        ...
      }
    }

• Collecting control-flow profiles is relatively inexpensive; profiling data dependences, data values, etc., is more costly.

Profile-Based Compile-time Optimization

1. Compile statically: prog1.c → compiler → runme.exe (instrumented)
2. Collect a profile by running runme.exe on typical inputs (e.g., input1) → execution profile
3. Re-compile using the profile: prog1.c + execution profile → compiler → runme_v2.exe

• Profile-based compile-time optimizations include, e.g., speculative scheduling, cache optimizations, and code specialization.
• Limitations of this approach?

Instrumenting Executable Binaries

How is the instrumentation performed?
1. The compiler could insert it directly.
2. A binary instrumentation tool could modify the executable directly:
   – that way, we don't need to modify the compiler
   – compilers that target the same architecture (e.g., x86) can use the same tool

Binary Instrumentation/Optimization Tools

• Unlike typical compilation, the input is a binary (not source code).
• One option: static binary-to-binary rewriting: runme.exe → instrumentation tool → runme_modified.exe
• Challenges with the static approach:
  – What about dynamically-linked shared libraries?
  – If our goal is optimization, are we likely to make the code faster? A compiler already tried its best, and it had source code (we don't).
  – If we are adding instrumentation code, what about time/space overheads? Instrumented code might be slow & bloated if we aren't careful; optimization may be needed just to keep these overheads under control.
• Bottom line: the purely static approach to binary rewriting is rarely used.

2) (Pure) Interpreter

• One approach to dynamic code execution/analysis is an interpreter.
• Basic idea: a software loop that grabs, decodes, and emulates each instruction:

    while (stillExecuting) {
      inst = readInst(PC);
      instInfo = decodeInst(inst);
      switch (instInfo.opType) {
        case binaryArithmetic: ...
        case memoryLoad: ...
        ...
      }
      PC = nextPC(PC, instInfo);
    }

• Advantages:
  – also works for dynamic programming languages (e.g., Java)
  – easy to change the way we execute code on-the-fly (software controls everything)
• Disadvantages:
  – runtime overhead! Each dynamic instruction is emulated individually by software.
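To make the fetch/decode/dispatch loop concrete, here is a minimal runnable sketch in Java for a hypothetical three-instruction stack bytecode; the opcodes, encoding, and class name are invented for illustration and do not correspond to any real VM:

    // Toy interpreter for a hypothetical stack bytecode (PUSH, ADD, HALT).
    // All opcodes and the instruction encoding are invented for illustration.
    import java.util.ArrayDeque;
    import java.util.Deque;

    public class ToyInterpreter {
        static final int PUSH = 0, ADD = 1, HALT = 2;  // hypothetical opcodes

        public static void main(String[] args) {
            // "Program": PUSH 2, PUSH 3, ADD, HALT
            int[] code = { PUSH, 2, PUSH, 3, ADD, HALT };
            Deque<Integer> stack = new ArrayDeque<>();
            int pc = 0;
            boolean stillExecuting = true;
            while (stillExecuting) {
                int inst = code[pc];                   // fetch
                switch (inst) {                        // decode + emulate
                    case PUSH: stack.push(code[pc + 1]); pc += 2; break;
                    case ADD:  stack.push(stack.pop() + stack.pop()); pc += 1; break;
                    case HALT: stillExecuting = false; break;
                    default: throw new IllegalStateException("bad opcode " + inst);
                }
            }
            System.out.println(stack.peek());          // prints 5
        }
    }

Even in this toy form, the per-instruction switch and PC update show where the overhead comes from: every dynamic instruction pays for a fetch, a decode, and a dispatch in software.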
A Sweet Spot?

• Is there a way that we can combine:
  – the flexibility of an interpreter (analyzing and changing code dynamically); and
  – the performance of direct hardware execution?
• Key insights:
  – increase the granularity of interpretation: from individual instructions to chunks of code (e.g., procedures, basic blocks)
  – dynamically compile these chunks into directly-executed optimized code:
    • store the compiled chunks in a software code cache
    • jump in and out of the cached chunks when appropriate
    • the cached code chunks can be updated!
    • invest more time optimizing code chunks that are clearly hot/important
    • instrumenting the code is easy, since we are already rewriting it
    • must balance (dynamic) compilation time against the likely benefits

3) Dynamic Compiler

    while (stillExecuting) {
      if (!codeCompiledAlready(PC)) {
        compileChunkAndInsertInCache(PC);
      }
      jumpIntoCodeCache(PC);
      // compiled chunk returns here when finished
      PC = getNextPC(...);
    }

• Cached chunks of compiled code run at hardware speed; control returns to the "interpreter" loop when a chunk is finished.
• This general approach is widely used:
  – Java virtual machines
  – dynamic binary instrumentation tools (Valgrind, Pin, DynamoRIO)
  – hardware virtualization

Components in a Typical Just-In-Time (JIT) Compiler

• The input program enters an "interpreter"/control loop, which exchanges control with a compiled code cache (chunks).
• Profiling information feeds a dynamic code optimizer; a cache manager implements the eviction policy.
• The dynamic optimizer uses profiling information to guide code optimization: as code becomes hotter, more aggressive optimization is justified, replacing the old compiled code chunk with a faster version.

II. Overview of Dynamic Compilation / Code Optimization

• Interpretation/compilation/optimization policy decisions: choosing what and how to compile, and how much to optimize
• Collecting runtime information: instrumentation, sampling
• Optimizations exploiting runtime information: focus on frequently-executed code paths

Dynamic Compilation Policy

• ΔT_total = T_compile − (n_executions × T_improvement)
  – If ΔT_total is negative, our compilation policy decision was effective.
  – For example, if compiling a method costs 10 ms and each compiled execution saves 1 µs, compilation pays off only after 10,000 executions.
• We can try to:
  – reduce T_compile (faster compile times)
  – increase T_improvement (generate better code, but at the cost of increasing T_compile)
  – focus on code with large n_executions (compile/optimize hot spots)
• 80/20 rule (Pareto Principle): 20% of the work for 80% of the advantage.

Latency vs. Throughput

• Tradeoff: startup speed (latency) vs. execution performance (throughput)

                           Startup speed    Execution performance
    Interpreter            Best             Poor
    'Quick' compiler       Fair             Fair
    Optimizing compiler    Poor             Best

Multi-Stage Dynamic Compilation System

• Stage 1: interpreted code
• Stage 2: compiled code, entered when the execution count reaches t1 (e.g., 2,000)
• Stage 3: fully optimized code, entered when the execution count reaches t2 (e.g., 25,000)
• The execution count is the sum of method invocations & back edges executed.
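As a concrete illustration of this staging policy, here is a minimal sketch in Java of how a runtime might drive the Stage 1 → 2 → 3 promotions, assuming a per-method counter bumped on every invocation and back edge; the thresholds use the example values above, and all class and method names are hypothetical:

    // Sketch of staged (tiered) promotion, assuming a per-method counter
    // incremented on every invocation and back edge taken. Thresholds use
    // the example values above; all names are hypothetical.
    public class MethodProfile {
        static final int T1 = 2_000;    // interpreted -> compiled
        static final int T2 = 25_000;   // compiled -> fully optimized

        enum Stage { INTERPRETED, COMPILED, OPTIMIZED }

        private Stage stage = Stage.INTERPRETED;
        private long executionCount = 0;  // invocations + back edges

        // Called on each method invocation or back edge.
        void countEvent() {
            executionCount++;
            if (stage == Stage.INTERPRETED && executionCount >= T1) {
                stage = Stage.COMPILED;
                requestCompile(false);    // quick, lightly optimizing compile
            } else if (stage == Stage.COMPILED && executionCount >= T2) {
                stage = Stage.OPTIMIZED;
                requestCompile(true);     // full optimization; replaces old chunk
            }
        }

        private void requestCompile(boolean fullyOptimize) {
            // A real VM would enqueue the method for (re)compilation and
            // later redirect call sites to the new code; elided here.
        }
    }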
Granularity of Compilation: Per Method?

• Methods can be large, especially after inlining.
  – Cutting back or avoiding inlining hurts performance considerably.
• Compilation time is proportional to the amount of code being compiled; moreover, many optimizations are not linear in code size.
• Even "hot" methods typically contain some code that is rarely or never executed.

Example: SpecJVM98 db

    void read_db(String fn) {
      int n = 0, act = 0;
      int b;
      byte buffer[] = null;
      try {
        FileInputStream sif = new FileInputStream(fn);
        n = sif.getContentLength();
        buffer = new byte[n];
        while ((b = sif.read(buffer, act, n-act)) > 0) {  // hot loop
          act = act + b;
        }
        sif.close();
        if (act != n) {
          /* lots of error handling code, rare */
        }
      } catch (IOException ioe) {
        /* lots of error handling code, rare */
      }
    }

Only the read loop is hot; the rest of the method is dominated by rare error-handling code.

Optimize Hot "Regions", Not Methods

• Optimize only the most frequently executed segments within a method.
• Simple technique:
  – Track execution counts of basic blocks in Stages 1 & 2.
  – Any basic block that executes in Stage 2 is considered to be not rare.
• Beneficial secondary effect: improves optimization opportunities on the common paths.
• No need to profile any basic block executing in Stage 3; that code is already fully optimized.

[Two charts, omitted: "% of basic blocks in methods that are executed > threshold times" (hence would get compiled under a per-method strategy) and "% of basic blocks that are executed > threshold times" (hence get compiled under a per-basic-block strategy), plotted against execution thresholds from 1 to 5,000 for Linpack, JavaCUP, JavaLEX, SwingSet, check, compress, jess, db, javac, mpegaudio, mtrt, and jack.]
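To illustrate the bookkeeping behind the per-basic-block strategy, here is a hedged sketch in Java, assuming each basic block is given an integer id and Stage 1/2 code is instrumented to call a hook on block entry; the names and structure are invented for illustration and are not Whaley's actual implementation:

    // Sketch of per-basic-block rare-code detection, assuming each basic
    // block has an integer id and Stage 1/2 code calls recordExecution on
    // block entry. Hypothetical names; not Whaley's actual implementation.
    import java.util.BitSet;

    public class BlockProfile {
        private final BitSet seenInStage2 = new BitSet();

        // Instrumentation hook, invoked on basic-block entry in Stages 1 & 2.
        void recordExecution(int blockId, int stage) {
            if (stage == 2) {
                seenInStage2.set(blockId);  // executed in Stage 2 => not rare
            }
        }

        // Blocks never seen in Stage 2 are treated as rare.
        boolean isRare(int blockId) {
            return !seenInStage2.get(blockId);
        }
    }

When the method is promoted to Stage 3, blocks for which isRare returns true can be left out of the optimized compilation, with some fallback mechanism (e.g., transferring control back to the interpreter) if a rare block is ever reached; this is the essence of partial method compilation.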
