Transparent Dynamic Optimization: the Design and Implementation of Dynamo
Total Page:16
File Type:pdf, Size:1020Kb
Transparent Dynamic Optimization: The Design and Implementation of Dynamo Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia HP Laboratories Cambridge HPL-1999-78 June, 1999 E-mail: [email protected] dynamic Dynamic optimization refers to the runtime optimization optimization, of a native program binary. This report describes the compiler, design and implementation of Dynamo, a prototype trace selection, dynamic optimizer that is capable of optimizing a native binary translation program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language, compiler, operating system or hardware support. Contrary to intuition, we demonstrate that it is possible to use a piece of software to improve the performance of a native, statically optimized program binary, while it is executing. Dynamo not only speeds up real application programs, its performance improvement is often quite significant. For example, the performance of many +O2 optimized SPECint95 binaries running under Dynamo is comparable to the performance of their +O4 optimized version running without Dynamo. Internal Accession Date Only Ó Copyright Hewlett-Packard Company 1999 Contents 1 INTRODUCTION ........................................................................................... 7 2 RELATED WORK ......................................................................................... 9 3 OVERVIEW ................................................................................................. 11 3.1 Design Goals .................................................................................................................11 3.2 How Dynamo works ......................................................................................................12 3.3 Innovative features.........................................................................................................13 3.4 Bailing out to direct execution........................................................................................14 3.5 System calls...................................................................................................................14 3.6 Self-modifying code.......................................................................................................15 4 MEMORY LAYOUT..................................................................................... 16 4.1 Address space separation ...............................................................................................18 5 CONTROL................................................................................................... 20 5.1 Context switches............................................................................................................21 5.2 Trapping control ............................................................................................................21 5.3 Signal handling..............................................................................................................22 6 INTERPRETER ........................................................................................... 24 6.1 Plug-in interpreter interface............................................................................................24 6.2 Interpretation via just-in-time translation........................................................................25 7 TRACE SELECTOR.................................................................................... 27 7.1 Online profiling .............................................................................................................27 7.2 Speculative trace selection (SPECL) ..............................................................................28 7.2.1 Basic block selection (BLOCK)..............................................................................29 7.2.2 Static trace selection (STATIC)..............................................................................30 7.2.3 Two-level hierarchical selection (2-LEVEL) ..........................................................31 7.3 Trace selection rate ........................................................................................................32 7.4 Bailing out.....................................................................................................................33 7.5 Evaluation .....................................................................................................................34 7.6 Trace quality metric .......................................................................................................36 7.7 Selection threshold.........................................................................................................38 8 FRAGMENT OPTIMIZER............................................................................ 40 8.1 Dynamic optimization opportunities...............................................................................41 8.1.1 Exposed partial redundancies .................................................................................41 8.1.2 Suboptimal register allocation ................................................................................42 8.1.3 Shared library invocation........................................................................................44 8.2 Intermediate form ..........................................................................................................44 8.3 Fragment formation .......................................................................................................45 8.3.1 Direct branch Adjustments .....................................................................................46 2 8.3.2 Indirect branch Adjustments...................................................................................46 8.3.3 Linker Stubs...........................................................................................................46 8.4 Optimization..................................................................................................................47 8.4.1 Optimization levels ................................................................................................47 8.4.2 Speculative optimizations.......................................................................................48 8.4.3 Multi-versioning optimization ................................................................................49 8.4.4 Fragment-link-time optimization ............................................................................49 8.5 Undesirable side-effects of optimization.........................................................................51 8.6 Register allocation .........................................................................................................52 8.7 Experimental evaluation.................................................................................................52 9 FRAGMENT LINKER.................................................................................. 56 9.1 Emitting the fragment code ............................................................................................56 9.2 Pros and cons of branch linking......................................................................................56 9.3 Link records and the fragment lookup table....................................................................57 9.4 Direct branch linking .....................................................................................................58 9.5 Indirect branch linking ...................................................................................................59 9.6 Linker stubs revisited.....................................................................................................59 9.7 Branch unlinking ...........................................................................................................60 9.8 Performance benefit from linking...................................................................................62 10 FRAGMENT CACHE MANAGER ............................................................... 63 10.1 Fragment Cache design ..............................................................................................63 10.2 Working-set based Fragment Cache flushing..............................................................64 10.2.1 The case for Fragment Cache flushing ................................................................65 10.2.2 Selecting a time to flush......................................................................................66 10.2.3 Performance: memory usage...............................................................................68 10.2.4 Performance: speedups .......................................................................................71 10.3 Alternative placement strategy ...................................................................................74 10.3.1 Functionality and behavior .................................................................................75 10.3.2 Performance .......................................................................................................75 11 SIGNAL HANDLER..................................................................................... 78 12 PERFORMANCE SUMMARY ..................................................................... 80 12.1 Experimental