Optimising Java Programs Through Basic Block Dynamic Compilation
A modified version of a thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science and Engineering

September 2002

By Ian A. Rogers
Department of Computer Science

Contents

Abstract
Declaration
Copyright
The Author
Acknowledgements
Alterations

1 Introduction
  1.1 The problem
  1.2 The solution
  1.3 Contributions
  1.4 Outline

2 The Java Virtual Machine
  2.1 What is Java?
  2.2 Objects
  2.3 Class file
  2.4 Class library and native methods
  2.5 Linking and loading
  2.6 Exceptions
  2.7 Interpreter JVMs
  2.8 Hardware JVMs
  2.9 Just-In-Time and dynamic compilation JVMs
    2.9.1 Method inlining
  2.10 HotSpot
  2.11 Summary

3 Dynamic Binary Translation
  3.1 Overview
  3.2 Above operating system dynamic binary translators
    3.2.1 FX!32
    3.2.2 Dynamo
    3.2.3 UQDBT
    3.2.4 DAISY
  3.3 Between operating system dynamic binary translators
    3.3.1 VEST and mx
    3.3.2 Macintosh application environment
    3.3.3 Wabi
    3.3.4 VMWare
  3.4 Below operating system dynamic binary translators
    3.4.1 Transmeta
  3.5 Other directions
  3.6 Summary

4 Dynamite
  4.1 The Dynamite fuse module
  4.2 The Dynamite front-end module
    4.2.1 Building Dynamite's intermediate representation
    4.2.2 Substitute calls
  4.3 The Dynamite kernel module
    4.3.1 Control
    4.3.2 Basic block cache
    4.3.3 Basic block compatibility
    4.3.4 Group blocks
    4.3.5 Dead code elimination
    4.3.6 Constant propagation
    4.3.7 Value-specific optimisation
    4.3.8 Code duplication
  4.4 Back-end
    4.4.1 Instruction scheduling
    4.4.2 Idiom recognition
    4.4.3 Code generation
  4.5 Summary

5 Dynamite JVM
  5.1 Instruction decoding
    5.1.1 Expression stack
    5.1.2 Control-of-flow bytecodes
  5.2 Exceptions
  5.3 Object layout
  5.4 Class loader
    5.4.1 Boot strapping
    5.4.2 New
    5.4.3 Checkcast and instanceof
  5.5 JNI
  5.6 Threading
  5.7 Related work
  5.8 Summary

6 Inter-Procedure Optimisation
  6.1 The method inlining problem
  6.2 Sliding register-window scheme
  6.3 Fixed register-window scheme
  6.4 Choice of register-window scheme
    6.4.1 Introduction of benchmarks
    6.4.2 Number of fix-up blocks for fixed register-window scheme
    6.4.3 Number of retranslations for sliding register-window scheme
    6.4.4 Coverage of fixed register scheme
    6.4.5 Conclusion
  6.5 Summary

7 Recursion
  7.1 Lazy recursion detection
    7.1.1 Single recursion
    7.1.2 Direct mutual-recursion
    7.1.3 Indirect mutual-recursion
  7.2 Fixing the call stack
    7.2.1 Immediate stack fix-up technique
    7.2.2 Delayed stack fix-up technique
    7.2.3 Chosen scheme
    7.2.4 Example stack fix-up
  7.3 Performance
    7.3.1 Cost of recursion
    7.3.2 Cost of stack fix-up
    7.3.3 Performance comparison
  7.4 Summary

8 Overall System Performance
  8.1 Break down of JVM performance
  8.2 Translation saving
  8.3 Performance of translated code
    8.3.1 Externally measured performance
    8.3.2 Internally measured performance
    8.3.3 Analysis
  8.4 Translation performance
  8.5 Back-end performance
  8.6 Summary

9 Summary and Conclusions
  9.1 Summary
    9.1.1 Background
    9.1.2 Design
    9.1.3 Experiments
  9.2 Conclusions
  9.3 Future work
    9.3.1 Threading
    9.3.2 Garbage collection
    9.3.3 Exception handling
    9.3.4 Class library
    9.3.5 Memory usage
    9.3.6 Latency
    9.3.7 Performance
  9.4 Final remark

A feJAVA Dynamite JVM Front-End

B Result Data
  B.1 Number of fix-up blocks for fixed register-window scheme
  B.2 Number of retranslations for sliding register-window scheme
  B.3 Input to Takeuchi function

Bibliography

List of Tables

3.1 Translation environments
6.1 Benchmark translation and execution bytecode counts
6.2 Percentage of method call bytecodes translated and executed in fixed register-window scheme
6.3 Fixed register-window copying overhead per virtual call
6.4 Overall sliding register-window statistics
7.1 Descriptor abbreviation descriptions
7.2 Recursive instruction counts
7.3 Recursive instructions as a percentage of their base instruction
7.4 Recursive bytecodes as a percentage of the total instruction mix
7.5 The total size of frames saved and loaded from the memory stack by recursive invoke bytecodes
7.6 The average size of frames saved to the memory stack by recursive invoke bytecodes
7.7 Stack fix-up cost
8.1 Number of translated bytes of a JIT compiler and the Dynamite JVM
8.2 Reduction in translated bytes by basic block compilation