Combining Speculative Optimizations with Flexible Scheduling of Side-effects

DISSERTATION submitted in partial fulfillment of the requirements for the academic degree Doktor der Technischen Wissenschaften in the Doctoral Program in Engineering Sciences

Submitted by: Gilles Marie Duboscq

At the: Institute for System Software

Accepted on the recommendation of: Univ.-Prof. Dipl.-Ing. Dr. Dr. h.c. Hanspeter Mössenböck and Dr. Laurence Tratt

Linz, April 2016

Oracle, Java, HotSpot, and all Java-based trademarks are trademarks or registered trademarks of Oracle in the United States and other countries. All other product names mentioned herein are trademarks or registered trademarks of their respective owners.

Eidesstattliche Erklärung

I declare under oath that I have written this dissertation independently and without outside help, that I have not used any sources or aids other than those indicated, and that all passages taken verbatim or in substance from other sources are marked as such. This dissertation is identical to the electronically submitted text document.

Linz, April 2016

Gilles Marie Duboscq

Abstract

Speculative optimizations make it possible to optimize code based on assumptions that cannot be verified at compile time. Taking advantage of the specific run-time situation opens up more optimization possibilities. Speculative optimizations are key to the implementation of high-performance language runtimes. Using them requires cooperation between the just-in-time compiler and the runtime system, and it influences the design and the implementation of both. New speculative optimizations, as well as their application to more dynamic languages, stress these systems far more than current implementations were designed for. We first quantify the run-time and memory footprint caused by their usage. We then propose a compiler structure that separates the compilation process into two stages and helps to deal with these issues without giving up on other traditional optimizations. In the first stage, floating guards can be inserted for speculative optimizations; at the end of this stage, the guards are fixed in the control flow at appropriate positions. In the second stage, side-effecting instructions can be moved or reordered. Using this framework, we present two optimizations that help reduce the run-time costs and the memory footprint. We study the effects of both stages, as well as the effects of these two optimizations, in the Graal compiler. We evaluate this on classical benchmarks targeting the JVM: SPECjvm2008, DaCapo, and Scala-DaCapo. We also evaluate JavaScript benchmarks running on the Truffle platform, which uses the Graal compiler. We find that combining both stages can bring up to 84 % improvement in performance (9 % on average), and that our memory footprint optimization can bring memory usage down by 27 % to 92 % (45 % on average).

Kurzfassung

Speculative optimizations are optimizations that are based on assumptions which cannot be evaluated at compile time. They can optimize specific run-time situations that are not optimizable in the general case. Speculative optimizations are essential for the performance of a runtime environment for high-level programming languages. These optimizations require the just-in-time compiler and the runtime environment to cooperate closely, which must be taken into account in the design and implementation of both components. New speculative optimizations and their application in highly dynamic languages can put a heavy load on the compiler and the runtime environment, since these were originally not designed for such optimizations. First, the effects of such speculative optimizations on run time and memory behavior are quantified. Then a new compiler design is presented in which the compilation of a program happens in two stages. This avoids the problems that arise with the new optimizations without negatively affecting existing optimizations. In the first stage, run-time guards for the speculative optimizations can be introduced, which can still change their position in the control flow. At the end of the first stage, these guards are fixed in the control flow. In the second stage, instructions with side effects can be moved or reordered. Based on this framework, two optimizations are presented which reduce the run time and the memory consumption, respectively. This thesis examines the effects of the stages described above and the effects of the two optimizations in the context of the Graal compiler. The following benchmarks were used for the evaluation: SPECjvm2008, DaCapo, and Scala-DaCapo. In addition, JavaScript benchmarks are used that run on the Truffle platform, which in turn uses the Graal compiler. When both stages are combined, the run time can be reduced by up to 84 % (9 % on average). The memory optimization can reduce memory consumption by 27 % to 92 % (45 % on average).

Contents

1 Introduction 1
  1.1 Research Context ...... 1
  1.2 Problem Statement ...... 1
    1.2.1 Existing Solutions ...... 2
    1.2.2 Current Problems ...... 3
    1.2.3 Proposed Solution ...... 3
  1.3 Scientific Contributions ...... 3
  1.4 Project Context ...... 4
  1.5 Structure of this Thesis ...... 5

2 Background 7
  2.1 Java and the Java Virtual Machine ...... 7
  2.2 The HotSpot Java Virtual Machine ...... 8
    2.2.1 Deoptimization ...... 9
  2.3 The Graal Compiler ...... 10
    2.3.1 Overall Design ...... 10
    2.3.2 Intermediate Representation ...... 12
    2.3.3 Snippets ...... 19
  2.4 Truffle ...... 22

3 Opportunities for Speculative Optimizations 25
  3.1 Traditional Usages of Speculation ...... 25
    3.1.1 Exception Handling ...... 25
    3.1.2 Unreached Code ...... 27
    3.1.3 Type Assumptions ...... 28
    3.1.4 Loop Safepoint Checks Elimination ...... 29
  3.2 Advanced Usages of Speculation ...... 29
    3.2.1 Speculative Alias Analysis ...... 30
    3.2.2 Speculative Store Checks ...... 30
    3.2.3 Speculative Guard Motion ...... 31
    3.2.4 Truffle ...... 31

4 The Costs of Speculation 35
  4.1 Runtime Costs ...... 35
    4.1.1 Assumptions vs. Guards ...... 36
  4.2 Memory Footprint ...... 38
    4.2.1 Experimental Data ...... 38
  4.3 Managing Deoptimization Targets ...... 43
    4.3.1 Java Memory Model Constraints ...... 43
    4.3.2 Deoptimizing to the Previous Side-effect ...... 45
    4.3.3 Fixed Deoptimizations ...... 46
  4.4 Conclusion ...... 47

5 Optimization Stages 49
  5.1 First Stage: Optimizing Guards ...... 50
    5.1.1 Representation ...... 50
    5.1.2 Optimizations ...... 50
  5.2 Stage Transition ...... 52
    5.2.1 Guard Lowering ...... 52
    5.2.2 FrameState Assignment ...... 53
  5.3 Second Stage: Optimizing Side-effecting Nodes ...... 56

6 Case Studies 59
  6.1 Speculative Guard Motion ...... 59
    6.1.1 Rewriting Bounds Checks in Loops ...... 60
    6.1.2 Speculation Log ...... 62
    6.1.3 Policy ...... 64
    6.1.4 Processing Order ...... 64
  6.2 Deoptimization Grouping ...... 67
    6.2.1 Relation with Deoptimization Metadata Compression ...... 68
  6.3 Vectorization ...... 69

7 Evaluation 73
  7.1 Methodology ...... 73
    7.1.1 Benchmarks ...... 73
    7.1.2 Baseline ...... 75
  7.2 Compilation Stages ...... 75
    7.2.1 Effects of the First Stage ...... 75
    7.2.2 Effects of the Second Stage ...... 81
  7.3 Speculative Guard Motion ...... 83
  7.4 Comparison to Other Compilers ...... 87
    7.4.1 C1 ...... 88
    7.4.2 C2 ...... 88
  7.5 Deoptimization Grouping ...... 91
    7.5.1 Debug Information Sharing ...... 91
  7.6 Deoptimization Grouping without Debug Information Sharing ...... 94
    7.6.1 Combining Debug Information Sharing and Deoptimization Grouping ...... 94

8 Related Work 97
  8.1 Intermediate Representations ...... 97
  8.2 Deoptimization ...... 98
    8.2.1 Exception Handling ...... 99
  8.3 Speculative Guard Motion ...... 100
  8.4 Deoptimization Data ...... 100
    8.4.1 Delta Encoding ...... 100
    8.4.2 Deoptimization Metadata Size ...... 101
    8.4.3 Deoptimizing with Metadata vs. with Specialized Code ...... 101

9 Summary 103
  9.1 Future Work ...... 103
  9.2 Conclusion ...... 104

List of Figures 107

List of Tables 109

List of Listings 111

List of Algorithms 113

Bibliography 115


Acknowledgments

I thank my advisor, Professor Hanspeter Mössenböck, for his valuable feedback, for welcoming me at the Institute for System Software, and for his patience throughout the process. I also thank my second advisor, Dr. Laurence Tratt, for his comments. Both their efforts helped greatly in improving the quality of my thesis. I thank Oracle Labs for funding my work and supporting many other academic projects at the Institute for System Software. In particular, I thank the VM research group, which allowed me to work on such an interesting subject. I was extremely lucky to be part of the Graal project from the start and to see it take shape. I thank the people I met at Oracle Labs and at the Institute for System Software: Thomas Würthinger, without whom this project would not have been possible, but also Lukas Stadler, Doug Simon, Bernhard Urban, Tom Rodriguez, Stefan Marr, Andreas Wöß, Christian Wimmer, and Felix Kaser, for sharing ideas and enriching discussions. I thank my parents, Bertrand and Béatrice, and my family, in particular my godmother Lise, for supporting and encouraging me throughout my life and in these studies. Finally, I thank Aga for her support and patience during this process.


Chapter 1

Introduction

1.1 Research Context

Programming language implementations can divide their work between a preparation step that runs before the program starts and an execution step that happens when the program is running. Unless the program is trivial, there is always work to be done at run time: at the very least, the program must run. However, the amount of work done ahead of run time varies a lot. Some language implementations do as much work as possible during the preparation step and compile the program down to a binary that can be run by the target computer. On the other hand, some language implementations do not perform any preparation at all and do everything at run time. This is often the case for implementations of programming languages that use a lot of dynamic features.

Program optimization is almost ubiquitous in implementations that concentrate on the preparatory phase. Optimizing compilers use extensive static analysis to try to produce a binary form of the program that executes as fast as possible. Language implementations that delegate more work to the runtime have also been an active area of research. Just-in-time (JIT) compilers are used to generate optimized native code for fast execution. While compiling a program at run time leaves less time for static analysis, it allows for more adaptive optimizations based on knowledge of the program's execution profile. Such dynamic optimizations are an active field of research, and their increased usage has prompted our work.

1.2 Problem Statement

High-level languages such as Java offer type and memory safety. These features often require checks to be performed at runtime. Languages with very dynamic semantics such as JavaScript, Ruby or R require even more runtime checks.


Such checks can be, for example, a null check when accessing a field of a Java object, or type checks when using JavaScript's + operator. Naive implementations of run-time checks incur a significant overhead during execution. These costs often force software developers to sacrifice safety and ease of development for high performance. Minimizing the costs of safety and dynamic features is important to allow their usage in high-performance software.
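
As a concrete illustration (our own sketch, not taken from any particular runtime; the class and method names are invented), the following Java methods make such checks explicit:

final class NaiveChecks {
    // A Java field access must reject a null receiver.
    static int readField(MyObject receiver) {
        if (receiver == null) {          // run-time null check
            throw new NullPointerException();
        }
        return receiver.field;
    }

    // A JavaScript-style '+' must dispatch on the run-time operand types.
    static Object jsAdd(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b;             // numeric addition
        }
        if (a instanceof String || b instanceof String) {
            return String.valueOf(a) + String.valueOf(b); // string concatenation
        }
        throw new UnsupportedOperationException("other cases omitted");
    }

    static final class MyObject { int field; }
}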

1.2.1 Existing Solutions

An efficient way of implementing run-time checks is to predict their outcome. While this does not remove the need to perform the checks, it allows runtime systems and compilers to provide an optimized fast path based on the assumed outcome. A common way to predict outcomes is for the runtime to monitor the execution of the program and to gather statistics. The runtime also has the unique chance to observe the current state of the system. Based on the assumption that the state of the system will not change too much, speculative optimizations can be applied. An optimization is speculative if it uses assumptions, i.e., hypotheses that were not verified at compile time. For example, a speculation may be that the definition of a function will not change or that the data types of the operands of an operation will remain the same. The runtime can then use the JIT compiler to emit native code that takes advantage of these speculations.

For some of these speculations, it is possible to be notified when the assumption is invalidated. For example, one could maintain a list of all compiled code that depends on a function not being redefined. If the function happens to be redefined, one can invalidate all compiled code that depends on the assumption that no longer holds. For other speculations, a check needs to be done by the compiled code that uses the assumption. For example, if the compiler assumed that the input to a function is always of a specific type, it needs to insert a type check in that function to be sure that the assumption holds.

In both cases, execution must exit the code that was compiled with the now invalid assumption and proceed in a non-speculative version. In order to do that, the state of the program (e.g., its local variables) must be transferred from the state used by the compiled code to the state used by the new execution system (different compiled code, the interpreter, etc.). This process is called deoptimization.

Deoptimization can be done using metadata that describes the mapping between those states. It was first applied to allow debugging of optimized programs [40]. During the early stages of the compilation, before any optimization has taken place, the compiler selects locations where this mapping needs to be possible. The compiler then makes sure that it keeps the mapping up to date at these points. The mappings are then stored alongside the compiled code, and the runtime can use them whenever necessary.
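
As an illustration of the first, notification-based kind of speculation, here is a minimal sketch of such bookkeeping (all names are invented; a real VM tracks compiled methods and their dependencies internally rather than strings):

import java.util.*;

final class AssumptionRegistry {
    private final Map<String, List<CompiledCode>> dependents = new HashMap<>();

    // Compiled code registers its dependency on "functionName is not redefined".
    void registerDependency(String functionName, CompiledCode code) {
        dependents.computeIfAbsent(functionName, k -> new ArrayList<>()).add(code);
    }

    // Redefining the function breaks the speculation: every dependent
    // compilation must be thrown away (deoptimized).
    void onRedefine(String functionName) {
        for (CompiledCode code : dependents.getOrDefault(functionName, List.of())) {
            code.invalidate();
        }
        dependents.remove(functionName);
    }

    interface CompiledCode { void invalidate(); }
}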

1.2.2 Current Problems

While speculative techniques are already used in existing compilers and language runtimes, these systems were not necessarily designed to accommodate the increasing usage of speculative optimizations. Also, since the compiler has to select the locations where it can exit optimized code before performing any optimizations, it needs to predict those exit locations very early in the compilation process. Finally, the necessary amount of metadata grows with the number of exits from the optimized code. This can lead to the output of the compiler being composed mostly of metadata, significantly increasing the footprint of compiled code.
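
To see why this metadata can dominate the compiler's output, consider a minimal sketch of what a single deoptimization point must record (a simplification with invented field names; real formats additionally cover inlined frames, locks, and similar state):

final class DeoptMetadataEntry {
    final int nativeCodeOffset;  // deoptimization point in the machine code
    final int bytecodeIndex;     // where the interpreter resumes execution
    final int[] localLocations;  // register/stack slot holding each local variable
    final int[] stackLocations;  // register/stack slot holding each operand-stack slot

    DeoptMetadataEntry(int nativeCodeOffset, int bytecodeIndex,
            int[] localLocations, int[] stackLocations) {
        this.nativeCodeOffset = nativeCodeOffset;
        this.bytecodeIndex = bytecodeIndex;
        this.localLocations = localLocations;
        this.stackLocations = stackLocations;
    }
}

One such entry is needed for every possible exit from the optimized code, so aggressive speculation multiplies the number of entries.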

1.2.3 Proposed Solution

After studying the current situation, this thesis sets out to ease the use of speculative optimizations in compilers and to reduce their run-time and compile-time costs without compromising existing optimizations. We propose to use two different models for the representation of runtime checks and deoptimization metadata during a compilation. Each model allows some optimizations that would conflict with the other model. We then show how to use these two models during a single compilation in order to perform all optimizations. To show how this framework can help to reduce the costs of speculative optimizations, we present two optimizations. First, we propose a speculative optimization that improves the ability of the compiler to hoist runtime checks out of loops. Since programs usually spend most of their time inside loops, this helps to reduce the run-time costs of these checks. Second, we propose a compiler optimization that reduces the amount of metadata necessary for deoptimization. Since it does not require special support from the runtime, it can easily be adapted to many runtime systems to reduce the memory costs of speculative optimizations.
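
As a source-level preview of the first optimization (our own sketch in Java; deoptimize() stands in for the transfer back to the interpreter):

final class BoundsCheckHoisting {
    // Naive version: an implicit bounds check runs on every iteration.
    static int sumChecked(int[] a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i]; // checks 0 <= i < a.length each time
        }
        return sum;
    }

    // With the check hoisted speculatively: one guard before the loop covers
    // the whole iteration range. If it fails, execution deoptimizes and the
    // interpreter re-executes the loop with the original per-iteration checks.
    static int sumSpeculative(int[] a, int n) {
        if (n > a.length) {
            deoptimize();
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i]; // bounds check now provably redundant
        }
        return sum;
    }

    private static void deoptimize() {
        throw new IllegalStateException("stand-in: execution would resume in the interpreter");
    }
}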

1.3 Scientific Contributions

In this thesis we study the time and memory impact of speculative optimizations. In particular, we study the behavior of current techniques supporting speculative optimization in the context of heavy speculation.

We propose a new way of structuring the compiler and the intermediate representation that allows us to combine the heavy usage of speculative optimizations with traditional compiler optimizations that need to reorder side effects (e.g., vectorization).

Using this structure and intermediate representation we propose two optimizations. One improves the runtime performance of loops when using speculative optimizations. The other reduces the memory footprint of speculative optimizations.

We study the effects of our optimizations using an implementation of these concepts and optimizations in the Graal compiler. This allows us to study the effects on Scala and Java running on a Java virtual machine (HotSpot) and on JavaScript.

1.4 Project Context

This research was done in the context of a long-term research collaboration between the Johannes Kepler University of Linz and Oracle Labs. The collaboration started with Sun Microsystems in 2001 and continues with Oracle since Sun's acquisition in 2010. It is focused on language runtime systems and in particular on Oracle's Java platform implementation, i.e., Oracle's HotSpot Java virtual machine and Oracle's Graal JIT compiler. The experience accumulated on the Java platform, and in particular on the HotSpot Java virtual machine, has helped in devising and implementing the ideas developed in this thesis. The ongoing research in high-performance language implementation has also triggered the work of this thesis, because speculative optimizations put unprecedented pressure on the abilities of the JIT compiler and the runtime platform to handle an increasing number of runtime checks and deoptimization points. In particular, the collaboration with Oracle yielded the following research results which are directly related to this thesis:

Linear-scan register allocation Its usage for JIT compilation has been studied and implemented in HotSpot [51, 52, 76] and was later extended to be applied directly on the SSA form of the IR [78]. The Graal compiler uses this register allocator.

Escape analysis Escape Analysis has been studied and implemented in HotSpot [44, 45, 46]. Partial Escape Analysis [66, 69] takes Escape Analysis further and was implemented as part of the Graal compiler.

Trace Compilation A trace compiler was implemented in HotSpot. Trace compilation techniques and policies were studied in [37, 39, 38]. The experience accumulated during this work has been used to design and tune the inlining heuristics used in Graal.


Array bounds check elimination This optimization was studied and implemented in the context of HotSpot [82, 83]. This work showed the importance of moving some dynamic guards out of loops and was a precursor of some of the optimizations discussed in this thesis.

JIT compilation policies and optimizations The effects of various optimization phases of the Graal JIT compiler were studied in the context of Scala [68]. Various JIT compilation policies were also compared [67].

Language implementation framework The Truffle high-performance language implementation framework was developed as part of this collaboration [85, 84, 31, 33, 81, 41]. This framework was extended to allow high-performance language interoperability [34, 36, 32, 35]. This framework motivated some of the work of this thesis.

In this context, the author also published on some of the subjects covered in this thesis, in particular on the intermediate representation used in the Graal compiler [18, 20] as well as on the "deoptimization grouping" optimization [19] presented in Section 6.2.

1.5 Structure of this Thesis

In Chapter 2 we first describe the background of our work. This includes the Java platform, the HotSpot VM, and the Graal compiler, which form the environment in which we implemented and tested our ideas. We then describe the opportunities for speculative optimization in a JIT compiler in Chapter 3. Once we have seen these opportunities, we analyze their costs in Chapter 4. In particular, we look at the problems that can appear in systems where speculation is used very aggressively. In Chapter 5 we then explain how splitting the compilation process into two distinct stages allows us to be more aggressive in the usage of speculative optimizations while still being able to perform advanced optimizations of side effects. We then present two optimizations in Chapter 6. The first one runs in the first compilation stage and helps us to lower the costs of speculation by moving some runtime checks out of loops. The second one runs in the second compilation stage and helps us to reduce the memory footprint of using speculative optimizations. We evaluate the effects of these techniques on various benchmarks in Chapter 7. Finally, we compare different aspects of this thesis to related work in Chapter 8 before concluding.


Chapter 2

Background

To validate the ideas presented in this thesis, we implemented them in a just-in-time compiler called Graal that is used on top of the HotSpot Java virtual machine. In this chapter we describe this environment and why it is an interesting platform to study speculative optimizations. We also present an overview of the Graal compiler's structure and intermediate representation.

2.1 Java and the Java Virtual Machine

The Java platform is a software layer and a set of specifications that allow developers to build Java applications. Typical applications developed on the Java platform range from server-side web applications to desktop applications. The Java platform is based on two main components: the Java language [28] and the Java virtual machine [49].

The Java language is an object-oriented, imperative programming language. It offers a static type system with class- and interface-based inheritance. Its specification also guarantees type safety.

The Java Virtual Machine (JVM) specifies an abstract computation model. It defines a class file format that describes data types and code. Code is defined through a bytecode format for which the JVM specification describes the execution model. The standard Java compiler targets the JVM and compiles Java source code to class files that contain both the description of the program's data types and the program's methods compiled to Java bytecode. This allows applications built on the Java platform to be portable to a variety of hardware platforms and operating systems for which a JVM was implemented.

The Java platform supports execution threads sharing a common memory heap. The Java memory model [50] defines the semantics of memory and concurrency operations independently of the memory model of the hardware that hosts the virtual machine.

Security is an important aspect of the Java platform. In particular, it offers the possibility to run code from an untrusted source in a restricted context.
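
For example, a trivial Java method and the stack-machine bytecode it compiles to (as rendered by javap -c; constant-pool details omitted):

class Arithmetic {
    int add(int a, int b) {
        return a + b;
    }
}

// javap -c Arithmetic shows for add(int, int):
//   iload_1   // push local variable 1 (a) onto the operand stack
//   iload_2   // push local variable 2 (b)
//   iadd      // pop both operands, push their sum
//   ireturn   // return the top of the operand stack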


The security features are built on top of the memory safety, type safety, and encapsulation guarantees that are offered by the Java platform. For example, type casts are always checked at run time and object member accesses follow strict rules. Some of these safety rules are enforced at load time: when a class is loaded, a number of checks are performed, including a verification of the bytecode. Further checks are then performed at run time. Minimizing the run-time impact of these checks has been an important reason for the development of speculative optimizations in JVM implementations.

2.2 The HotSpot Java Virtual Machine

The Java HotSpot VM is the reference implementation of the Java Virtual Machine. It is an industry-scale implementation that offers both high performance and a range of productivity features such as debugging and profiling. It is available for various operating systems and hardware architectures.

The HotSpot VM uses mixed-mode execution: methods are initially interpreted and are scheduled for JIT compilation as soon as they have been executed frequently enough. Thus execution starts in the interpreter, which is slow but has low startup and footprint costs. Profiling information is also gathered during interpretation. When the VM decides that a method should be compiled, it makes a request to the compiler, which can base its optimizations on the collected profiling information.

The HotSpot VM has two JIT compilers: the C1 compiler (sometimes also called the "Client" compiler) and the C2 compiler (sometimes also called the "Server" compiler). The C1 compiler [47] aims at fast compilation speed, while the C2 compiler [55] aims at better optimizations at the expense of slower compilation speed. The HotSpot VM comes in different "flavors": the Client VM only contains the C1 compiler and is tuned for fast start-up and low footprint. The Server VM contains both the C1 and C2 compilers and is optimized for high performance. It is tuned for software that runs for a long time, where a longer warmup phase is acceptable if it pays off in higher peak performance in the long run.

By default, the Server VM operates in tiered mode: once a method has been executed frequently enough in the interpreter, it is first compiled by the C1 compiler. If this compiled method is still executed frequently, it is scheduled for compilation by the C2 compiler. Profiling information is still gathered by the C1-compiled code. The details of the tiered policy are more involved and include the possibility to enable or disable profiling for code generated by the C1 compiler. Overall, it aims at providing a faster warmup than what would be achieved with the C2 compiler alone, while still obtaining the long-term peak performance benefits.


2.2.1 Deoptimization

Hölzle, Chambers, and Ungar [40] introduced the concept of deoptimization, which allows compiled code to jump back to the interpreter when needed. They implemented this concept in the Self VM [74], on which the HotSpot VM is based. In this VM, deoptimization was mainly used to make it possible to debug optimized code. Most code could then run in its optimized compiled version, and only when a breakpoint was hit or a thread was stopped would the VM use the deoptimization mechanism to transfer execution back to the interpreter.

However, the deoptimization mechanism can also be used by a compiler to make assumptions that it cannot prove (whether for lack of time, lack of decidability, or lack of a whole-program view). When such an assumption turns out to be wrong at run time, deoptimization occurs, which transfers execution back to the interpreter or to a different version of the compiled code that does not make this assumption. In order to perform deoptimization, the runtime system needs to know where execution should be resumed. In a system like the HotSpot JVM, this means that every location in compiled code where deoptimization is possible needs to be associated with a location in the Java bytecode where execution should be resumed.

Performing compiler optimizations based on optimistic assumptions is called speculation. Supporting speculation involves both the runtime and the compiler. In particular, deoptimization needs to be supported by the compiler's intermediate representation (IR) and by its optimization phases. The compiler needs to have enough information in its IR that the state of the interpreter can be rebuilt during deoptimization.

Deoptimization can be triggered by a check inserted in the compiled code, e.g., by checking at run time that a variable is of the expected profiled type. If the check fails, the compiled code can call the runtime to signal that the current frame needs to be deoptimized. Deoptimization can also be triggered externally if some event in the VM invalidates an assumption that the compiler made, for example when the VM loads a new class that invalidates the results of a previous class hierarchy analysis. In this case, threads are stopped at safepoints (i.e., at locations where deoptimization metadata is available) and the stack frames of methods that depend on the invalidated assumption are deoptimized. To support this, the compiled code contains "safepoint checks" which test whether a stop at a safepoint was requested by the VM and pause execution if necessary. Safepoint checks are usually inserted at loop back-edges and non-leaf method returns to ensure that all threads eventually reach a safepoint. Safepoints are associated with the necessary metadata to be able to perform deoptimization if needed.

The compiler or runtime can also specify which action should be taken when deoptimization occurs. For example, the compiled code can be discarded immediately or not, and the method can be recompiled on the next invocation or only after a new profiling period, which can be useful to gather new profiling data and optimize according to the new execution patterns. Along with this action, a deoptimization reason can be passed to the runtime, which is used to remember which types of speculation have failed in previous executions. Both the C1 and the C2 compiler of the HotSpot VM use speculative optimizations and deoptimization.

The deoptimization capabilities of the HotSpot VM provide a stable and proven framework to experiment with speculative optimizations. This is valuable since we concentrate our work on the usage and the impact of speculative optimizations in optimizing compilers and not on the implementation of deoptimization in the runtime.
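
As a source-level sketch of the two trigger mechanisms (our illustration; the real checks are emitted as machine code, and safepointRequested stands in for the VM's polling mechanism):

final class CompiledFastPath {
    static Class<?> profiledType = String.class; // type observed during profiling
    static volatile boolean safepointRequested;  // stand-in for the VM's safepoint poll

    static int speculatedLength(Object value) {
        // Check-based trigger: the guard tests the speculation directly.
        if (value.getClass() != profiledType) {
            deoptimize("type speculation failed");
        }
        int result = ((String) value).length(); // devirtualized fast path

        // External trigger: a safepoint check (e.g., at a loop back-edge or a
        // method return) lets the VM stop this thread and deoptimize the frame
        // if an assumption such as a class hierarchy analysis was invalidated.
        if (safepointRequested) {
            pauseAtSafepoint();
        }
        return result;
    }

    private static void deoptimize(String reason) {
        throw new IllegalStateException("would resume in the interpreter: " + reason);
    }

    private static void pauseAtSafepoint() {
        // The runtime would inspect the frame using the safepoint's metadata.
    }
}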

2.3 The Graal Compiler

2.3.1 Overall Design

Predecessors

The Maxine project [77] aimed at creating a meta-circular Java VM: a JVM written in Java itself. While Maxine was not the first of its kind (it was created long after the Jikes RVM [2, 58], for example), it used the latest version of the Java language, with the aim of providing an easily customizable and maintainable VM that can be used as a research vehicle. During this effort, a number of JIT compilers were used inside the Maxine VM, at run time as well as while compiling the VM itself. In particular, HotSpot's C1 compiler was ported to Java to be used in the Maxine VM. This port was called the C1X [73] compiler. Since the Maxine VM needed to support a variety of compilers, it had a clear compiler–runtime interface. The C1X compiler was then re-integrated into the HotSpot VM as a meta-circular compiler, "C1X4HotSpot". This also helped validate the other side of the compiler–runtime interface, since the C1X compiler had to work with both HotSpot and Maxine through this interface.

The Graal Project

The Graal project and compiler started from this point with the aim of creating a compiler producing high-performance code for the HotSpot VM. The major difference from the original C1 compiler is the choice of intermediate representation [18, 20]. The Graal compiler also supports late inlining, which enables inlining after the full method has been parsed and other optimizations have already been applied, as opposed to inlining "on the fly" during bytecode parsing. It also includes optimizations that were not part of the C1 compiler, for example Partial Escape Analysis [69]. The Graal project is an open source project that is part of OpenJDK [30].


By participating in the Graal project from the beginning, we were able to make important modifications to its intermediate representation and to the structure of its optimization phases. These modifications allowed us to make the Graal compiler well suited for the optimizations proposed in this thesis. This would have been more difficult in an existing compiler, where such modifications have a big impact on existing optimization phases.

Modular framework

The compiler is built in a modular fashion in order to clearly define interfaces between different parts of the compiler. This is important in order to be able to reuse the compiler for different runtimes, hardware platforms, and operating systems. Each module contains a set of Java packages. Dependencies between modules are explicitly declared, and the dependency graph of modules is checked to ensure that it does not contain any cycles. The modules are then verified at build time by building them separately and allowing each module to access only classes from its declared dependencies. Extensibility and modularity are also implemented by using service lookup mechanisms: a service is defined by a Java interface, and service lookup finds the available implementations. This modularity helps to ensure that the overall design is applicable in other cases as well (different language runtimes, different target architectures, etc.). Working in this context was important to make sure that our ideas can be applied more broadly.

Optimization phases

The Graal compiler is structured into distinct phases. A phase receives the current intermediate representation, inspects it, and can transform it. A phase can, for example, apply an optimization or refine the representation of some operations. Phases can also be nested: one phase can apply another one, which allows phases to be easily reused. Phases also provide natural boundaries where the IR can be verified and dumped for inspection in a visualization tool. This design, while classical, made sure that we could split the optimization phases into multiple stages, which is critical for the approach developed in this thesis.
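
A minimal sketch of such a phase architecture (illustrative names only, not Graal's actual classes):

interface Graph {
    void verify(); // hook for consistency checks and IR dumping
}

interface Phase {
    void apply(Graph graph);
}

// Phases compose: a tier is itself a phase that runs nested phases in order,
// verifying the IR at each phase boundary.
final class PhaseSuite implements Phase {
    private final Phase[] phases;

    PhaseSuite(Phase... phases) {
        this.phases = phases;
    }

    @Override
    public void apply(Graph graph) {
        for (Phase phase : phases) {
            phase.apply(graph);
            graph.verify(); // natural boundary for verification/inspection
        }
    }
}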

Configurability

The Graal compiler is configurable so that it can be used in different contexts. For example, it can be configured with various levels of optimization so that it can be used for different compilation tiers (fast with few optimizations, or slower but with better optimizations). The compiler can also be used for a variety of purposes that require different configurations, such as a configuration for a JIT compiler, for an ahead-of-time (AOT) compiler, or for a static analysis tool. It is important that multiple configurations can exist side by side, since different configurations are needed inside a single VM. For example, some low-level "stubs" used in the HotSpot VM require a different compilation than standard Java code. Different configurations can also be used side by side when executing on a heterogeneous platform (e.g., CPU & GPU).

2.3.2 Intermediate Representation

Structure

The Graal IR is based on a directed graph structure. Each node in the graph produces at most one value, and the graph is in static single assignment (SSA) form [3, 59, 12]. To represent data flow, a node has data-flow edges (also called input or use-def edges) pointing "upwards" to the nodes that produce its operands. To represent control flow, a node has control-flow edges pointing "downwards" to its possible successors. Note that the two kinds of edges point in opposite directions.

Figure 2.1: Example IR graph with control-flow and data-flow edges, for the source fragment "if (cond) { foo = v1 + v2; } else { foo = v2; }"

In the example in Figure 2.1, the If node has one data-flow input edge pointing "upwards" for the condition and two successor edges pointing "downwards" for the true and false branches. This mirrors the edges usually found in a data-flow graph and a control-flow graph, respectively. In summary, the IR graph is a superposition of two directed graphs: the data-flow graph and the control-flow graph. Data-flow edges are further refined into various categories. For example, in the figure, a gray edge represents the association between the Phi node and the Merge node. A description of all these categories is given later in this section. Contrary to many types of IR, in Graal's IR some nodes (such as the Add node) do not seem to belong to any basic block; we will come back to this later in this section.

The IR automatically maintains reverse edges for all node connections. Therefore usage edges (also called def-use edges; the reverse of input edges) and predecessor edges (the reverse of successor edges) can also be used to traverse the graph. Unlike the direct edges, these reverse edges are maintained implicitly and are not ordered, which means that they can only be observed as an unordered set of usages or predecessors.

Each block of the control-flow graph begins with a Begin node. Two or more control-flow branches can merge at a Merge node, which is also a Begin node. These Merge nodes need to know the order of their predecessors so that an SSA Phi node can select the correct input value for each of the merge predecessors. As predecessor edges are not ordered, Merge nodes are connected to their predecessors using input edges pointing to End nodes. These End nodes are at the end of the control flow of the Merge node's predecessors. The Phi nodes are attached to their Merge node through a special input edge. This structure is illustrated in Figure 2.1. For clarity, in these figures, the edges between Merge nodes and End nodes are represented using control-flow edges from End to Merge, even though in practice they are implemented using input edges pointing in the other direction, as explained above.

For simplicity, the IR only represents reducible loops, i.e., loops with a single entry point. Methods containing irreducible loops are not compiled and are left for the interpreter to execute. In Java bytecode produced from a Java source program without obfuscation, there can never be a method with irreducible loops. The IR models loops explicitly: the loop header is a LoopBegin node. The back-edges of a loop are represented with LoopEnd nodes, which are attached to their LoopBegin node through a special input edge. Since the LoopBegin node merges the control flow of the loop header's predecessor and the backward edges, Phi nodes can be attached to LoopBegin nodes. This structure is illustrated in Figure 2.2. For the transparent management of LoopBegin nodes (which merge control flow), LoopBegin is a subclass of Merge and LoopEnd is a subclass of End.
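
A simplified sketch of this node structure in Java (Graal's real node classes are far richer; this only mirrors the edge kinds described above):

import java.util.*;

// Every node has ordered input (data-flow) edges pointing "upwards" and
// ordered successor (control-flow) edges pointing "downwards"; the reverse
// edges (usages, predecessors) are kept as unordered sets.
abstract class Node {
    final List<Node> inputs = new ArrayList<>();     // use-def edges, ordered
    final List<Node> successors = new ArrayList<>(); // control flow, ordered
    final Set<Node> usages = new HashSet<>();        // implicit reverse edges
}

class MergeNode extends Node {
    // Input edges to End nodes, ordered so that Phi nodes can select the
    // correct value for each predecessor.
    final List<Node> ends = new ArrayList<>();
}

class PhiNode extends Node {
    MergeNode merge; // association edge to the merge this phi belongs to
    // inputs.get(i) is the value flowing in from merge.ends.get(i)
}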

Floating Nodes and Code Motion

Nodes are not necessarily fixed to a specific point in the control flow. The control-flow splits and merges provide a backbone around which most other nodes are floating. For example, the Add node in Figure 2.1 is floating. Those floating nodes are not part of the control-flow graph but only part of the data-flow graph. The data-flow dependencies describe scheduling constraints that are later used to assign a position to those floating nodes before register allocation and code emission. As described by Click [9], using floating nodes in combination with global value numbering allows the concern of code motion to be separated from many optimizations. Optimizations do not need to preserve a schedule anymore; they only need to make sure that the data-flow dependencies of floating nodes are not invalidated. Making these dependencies as loose as possible while maintaining correctness then gives the scheduler more freedom. This means that some optimizations do not need to find the optimal placement for an operation but only need to optimize the dependencies between nodes. Nodes are scheduled in a similar way to what Click [8] describes in his thesis: they are generally scheduled at the latest possible position in order to push them into those branches where they are actually used, with the exception of loops, where nodes are scheduled before the loop if possible.

Figure 2.2: Example IR graph with a loop, for the source fragment "for (int i = 0; i < n; i++) { ... }"
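
A small source-level example of the effect of late scheduling (our own example): the product below floats and is sunk into the only branch that uses it, so it is not computed on the other path.

final class LateScheduling {
    static int example(int a, int b, boolean c) {
        int x = a * b; // floating node: held in place only by data-flow edges
        if (c) {
            return x;  // latest possible position: the scheduler emits the
                       // multiplication here, inside the taken branch
        }
        return 0;      // x is never computed on this path
    }
}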

Node Types

Nodes in the IR each have a type that defines their semantics. There are also categories of nodes that are used to group types that share some characteristics. Some of the important node categories and types are:


Value nodes are those that return a value.

Logic nodes are a special type of value node: they return boolean values.

Fixed nodes are those that are part of the control flow. Amongst fixed nodes, ControlSplit nodes have multiple successors and Merge nodes have multiple predecessors. Begin nodes start a control-flow block and End nodes finish it. LoopBegin nodes are a special type of Merge node which denotes a loop header. They are associated with at least one LoopEnd node, a special type of End node that is used for control-flow back-edges.

Floating nodes are those that are not part of the control flow. As explained above, they are only part of the data flow.

Side-effecting nodes are those that may have a side effect when they are executed.

Deoptimizing nodes are those that may trigger deoptimization when they are executed.

Guarding nodes are those that can be used by other nodes to ensure that a condition is true at runtime. For example, Guard nodes are floating nodes that check a condition and deoptimize if it is not true. A Begin node that follows a ControlSplit node can also be used as a guarding node; the condition it checks depends on the ControlSplit node and on which successor of this ControlSplit node it is (e.g., the "false" successor of an If node guards that the If node's condition is false).

FrameState nodes represent a state of the virtual machine. They contain a bytecode location and have the values of the current locals and stack slots as inputs.

Edge Types

Since every node produces at most one value, different edge types can be used in cases where there could be an ambiguity. For example, an Invoke node produces a value if it calls a non-void function, but it may also have side effects that need to be modeled. This means that an Invoke node can be referred to by both a value edge and a memory edge (see below). So far, we have only distinguished between control-flow edges and data-flow edges. We now refine the data-flow edges so that they can represent different kinds of data flow:

Control-Flow This edge points to a successor of the current node. It can only be used by fixed nodes. (Control-flow edges are represented with red lines in figures.)


Value This edge consumes the value produced by the node it points to. This value will exist at runtime (in a register or on the stack). (Value edges are represented with black lines in figures.)

Memory This edge consumes the memory state produced by the node it points to. It can only be used to point to nodes that may have an observable side effect. This edge ensures that, for example, read-after-write dependencies are respected in the schedule. (Memory edges are represented with green lines in figures.)

Condition This edge consumes the condition produced by the node it points to. A condition edge can only point to a Logic node. This is used instead of a value edge because on many architectures the result of comparisons is not reified in a standard register but in a special-purpose condition register. (Condition edges are represented with black lines in figures.)

State This edge points to a FrameState node which represents a state of the virtual machine. (State edges are represented with dashed black lines in figures.)

Guard This edge points to a guarding node. The guarding node checks a condition that must be satisfied before the nodes that point to it can safely be executed. (Guard edges are represented with blue lines in figures.)

Anchor This edge points to a fixed node above which the referring node should not float. For example, it can be used to anchor a guard into a branch by making the Guard node point to the Begin node of the branch it should be executed in. (Anchor edges are represented with dashed blue lines in figures.)

Association This edge represents an association between two nodes, for example, between a Phi and a Merge node or a LoopEnd and LoopBegin node. (Association edges are represented with gray lines in figures.)

Granularity

After a method has been parsed, the Graal IR contains high-level nodes whose operations have a granularity similar to that of Java bytecodes. For example, there are LoadField and InstanceOf nodes. These operations have semantics that are similar to those of the corresponding bytecodes. A LoadField node first checks the receiver for null and then accesses the field. It does not yet describe the details of how fields are retrieved in a particular runtime. A graph containing such a high-level LoadField node is illustrated in Figure 2.3. Keeping the graph at a high level in the beginning of the compilation allows the compiler to optimize at a high level first (coarse granularity).

Figure 2.3: High-level LoadField node before lowering

The IR is then progressively lowered towards a finer granularity. The high-level concepts are first broken down into runtime-specific operations, which allows lower-level optimizations to happen. For example, for HotSpot, a LoadField node is lowered to a Read node that reads memory at a certain offset from the object's base. The Read node is given a Guard node that checks that the receiver object is not null. This structure is illustrated in Figure 2.4. We can see that the Guard node is anchored in the current branch by an anchor edge to the Begin node. This is needed because the condition under which this branch is taken can be related to the object not being null in non-trivial ways. If the guard were not anchored to the branch, it could be executed before we know whether this branch is taken, which could result in unnecessary deoptimizations.

Figure 2.4: Low-level nodes after lowering of a LoadField node
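
A sketch of such a lowering step (all classes here are simplified stand-ins for Graal's real node API, only to show the shape of the transformation):

final class LoweringSketch {
    static final class Node {
        final String kind;
        final Node[] inputs;
        Node(String kind, Node... inputs) {
            this.kind = kind;
            this.inputs = inputs;
        }
    }

    // Lower a high-level field load into a guarded low-level memory read.
    static Node lowerLoadField(Node object, int fieldOffset, Node branchBegin) {
        Node isNull = new Node("IsNull", object);
        // The guard deoptimizes if the object is null; its anchor input to the
        // branch's Begin node keeps it from floating above the branch.
        Node guard = new Node("Guard(negated)", isNull, branchBegin);
        // The read depends on the guard, so it cannot be scheduled before it.
        return new Node("Read@" + fieldOffset, object, guard);
    }
}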


Lowering is primarily used to apply optimizations at the most appropriate level of abstraction, but it also has an impact on the size of graphs: graphs contain fewer nodes before lowering than after lowering. This affects compilation speed, since the complexity of many optimizations depends on the number of nodes and edges in the graph. Keeping graphs small at the beginning of the optimization process improves the performance of the optimizations applied before lowering.

Floating Reads

In order to benefit from code motion and global value numbering applied to floating nodes, Read nodes, which read data from memory, are transformed from fixed nodes into floating nodes in the course of the compilation. This is done by explicitly describing the dependencies between Read nodes and nodes that have observable side effects. The model that Graal uses is similar to the one of Click and Paleczny [10]: nodes that have side effects produce a memory state that can be used by nodes that read memory (e.g., Read nodes). This dependency is specified by a memory edge. It is important to note that in this model, write-after-read edges are not necessary. Conceptually, each side effect produces an immutable snapshot of the memory that read operations can use. Since taking such snapshots is not technically feasible, the scheduler must schedule a read of a certain memory state before this memory state can be changed by a side effect. While this adds some complexity in the scheduler compared to having explicit write-after-read edges, it reduces the number of edges in the graph.

In order to optimize these dependencies and provide more freedom to the scheduler, the memory is not always considered as a whole but is divided into smaller alias classes [48]. An alias class is a group of names that may alias, i.e., names that may refer to the same location in memory. Graal supports a simple model of alias classes. There is a global one called ANY_LOCATION, which aliases with everything. It is used, for example, for a call to a method that may have any kind of side effect. Any number of disjoint classes can then be created: for example, each Java field has its own class. Finally, there can be immutable classes which do not alias with anything, not even ANY_LOCATION: for example, the length of a Java array is immutable. Graal only inserts dependencies between memory reads and side-effecting nodes if they may alias according to this system. As a result, different Java fields will not interfere, and thus writing a field inside a loop does not prevent the read of an array's length inside the same loop from "floating" (i.e., being scheduled) outside the loop.

For example, in the graph from Figure 2.5, the read of field a does not interfere with the write of field b. Figure 2.6 illustrates the same graph after the transition to FloatingRead nodes and the introduction of memory edges. The read of field a is done on the state produced by the Invoke node above the loop. It can thus be scheduled before the loop. On the other hand, the read of field b is done on the state produced by a memory Phi node (i.e., a Phi node that merges memory states). This Phi node merges the state from the Invoke node before the loop and from the Write node on the loop's back-edge. Its presence models the fact that the alias class of b can be written to inside the loop, and the read of b can thus not be scheduled outside the loop. Before the introduction of the memory graph, these dependencies are implicit and only implied by the order of memory operations. The introduction of explicit dependencies makes it possible to turn the Read nodes into floating nodes and to schedule them better. This explicit memory graph can also be used by optimizations.

Figure 2.5: Example of IR for a loop with Read nodes. To simplify the figure, Location, End and Begin nodes were omitted.
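
A source-level example of what the alias classes buy (our own example, with hypothetical fields a and b):

final class AliasClasses {
    static final class Holder { int a; int b; }

    static int example(int[] data, Holder holder) {
        int sum = 0;
        // data.length is in an immutable alias class: its FloatingRead can be
        // scheduled once, before the loop.
        for (int i = 0; i < data.length; i++) {
            // Field a is never written in this loop, and fields a and b are
            // distinct alias classes, so the read of a may also float out.
            sum += data[i] * holder.a;
            // The write to b forces any reads of b to stay inside the loop,
            // ordered through the memory Phi on the loop's back-edge.
            holder.b = sum;
        }
        return sum;
    }
}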

2.3.3 Snippets

The Graal compiler uses so-called snippets [64] to express low-level operations. A snippet is Java code that expresses the low-level implementation of a high-level or complex operation. For example, type checks and fast-path allocation are implemented with snippets in Graal for HotSpot. Snippets can be used during lowering (e.g., for type checks), to specify compiler intrinsics (e.g., for Java's System.arraycopy), and for some runtime stubs.


Figure 2.6: Example of IR for a loop after the Read nodes have been transformed into FloatingRead nodes and the memory graph has been introduced

Snippets are used instead of manual IR construction because they are easier to read and to maintain. For example, Listing 2.1 shows the snippet used in HotSpot to perform an "exact" type check. An exact type check checks for exactly one type instead of a full type hierarchy. This is possible, for example, when checking for final classes in Java. The snippet shows how this type check is done: first, the object is checked for null. If it is null, the type check fails. Otherwise, the "hub" (i.e., the pointer to the object's type metadata) of the object is loaded and compared to the hub of the expected type.

Since snippets are used to express low-level semantics, they need access to some low-level operations that may not be available in Java. This is done using node intrinsics. A node intrinsic is a special method whose signature matches that of a specific node's constructor. Graal automatically replaces calls to a node intrinsic with the corresponding node in the graph. Node intrinsics are special cases of the usual compiler intrinsics in that they translate to a single IR node and can only be used inside of snippets. For example, in Listing 2.2, the snippet used to implement the Math.cos compiler intrinsic uses a node intrinsic to do a foreign call (i.e., a call to a native function). Calls to the callDouble1 node intrinsic will be replaced with a ForeignCall node. Note that one of the arguments used to initialize


/**
 * A test against a final type.
 */
@Snippet
public static Object instanceofExact(Object object, KlassPointer exactHub,
                Object trueValue, Object falseValue) {
    if (probability(NOT_FREQUENT_PROBABILITY, object == null)) {
        isNull.inc();
        return falseValue;
    }
    GuardingNode anchorNode = SnippetAnchorNode.anchor();
    KlassPointer objectHub = loadHubIntrinsic(object, anchorNode);
    if (probability(NOT_FREQUENT_PROBABILITY, objectHub.notEqual(exactHub))) {
        exactMiss.inc();
        return falseValue;
    }
    exactHit.inc();
    return trueValue;
}

Listing 2.1: Snippet of the instanceofExact type check

public static double cos(double x) {
    if (Math.abs(x) < PI_4) {
        return AMD64MathIntrinsicNode.compute(x, Operation.COS);
    } else {
        return callDouble1(ARITHMETIC_COS, x);
    }
}

@NodeIntrinsic(value = ForeignCallNode.class, setStampFromReturnType = true)
private static native double callDouble1(
                @ConstantNodeParameter ForeignCallDescriptor descriptor,
                double value);

Listing 2.2: Snippet of the Math.cos intrinsic

the ForeignCall node has a @ConstantNodeParameter annotation. This annotation tells the compiler that this argument must be a compile-time constant and must be evaluated at compile time before being passed to the ForeignCallNode constructor. The other argument, on the other hand, does not have this annotation. This means that it is a runtime value, and the ValueNode representing this SSA value during compilation must be passed to the ForeignCallNode constructor. Another node intrinsic is used in this snippet to produce a node that can use the x87 FCOS instruction.


if (probability(NOT_FREQUENT_PROBABILITY, object == null)) {
    isNull.inc();
    if (!nullSeen) {
        DeoptimizeNode.deopt(InvalidateReprofile, OptimizedTypeCheckViolated);
    }
    return falseValue;
}

Listing 2.3: Snippet of the instanceofExact type check using a profile for the “null” case

Deoptimization can be used in snippets, for example to take advantage of profiling information. Deoptimization is performed using a node intrinsic that is translated to a Deoptimize node. For example, the null check used in the previous instanceofExact type-check snippet could be parametrized by whether null has already been seen during profiling. This is done in Listing 2.3, which deoptimizes if the input is null although this has never been observed during profiling. A similar idea can be applied to the type check itself if a single type has been observed during profiling.

2.4 Truffle

Truffle [85, 84, 81, 41] is a Java-based language implementation framework which relies on a self-optimizing abstract syntax tree interpreter and on Graal as a highly optimizing just-in-time compiler. In order to implement a language using Truffle, an abstract syntax tree (AST) interpreter of this language must be written. This AST interpreter can use run-time profiling and specialization to customize itself to the current usage patterns. For example, the addition node of a JavaScript interpreter must handle many different cases based on the run-time types of its operands (which can be numbers, strings, or objects). Instead of having a single kind of "add" node that handles all these cases, different kinds of "add" nodes can be used, each of them handling operands of specific types. At the start of the program, an "uninitialized" node will be used. This node is not biased towards specific operand types, but it can also not handle any input yet. The first time it is called, it rewrites itself to a node that can handle the types of the actual inputs. The node is now biased towards specific operand types. If other operand types appear later at this point in the program, the node rewrites itself to a more generic node that can handle more types. Every AST node has its own mini-interpreter, i.e., an execute() method that calls the execute() methods of its children and performs the node-specific operation on their results. An example of a Truffle interpreter is shown in Listing 2.4. The test method builds an AST for a function containing something akin to "return 21 + 21;" and executes it.

public class ExampleRootNode extends RootNode {
    private final @Children ExampleStatementNode[] body;

    @ExplodeLoop
    @Override
    public Object execute(VirtualFrame frame) {
        try {
            for (ExampleStatementNode node : body) {
                node.execute();
            }
        } catch (ReturnException e) {
            return e.getValue();
        }
        return null;
    }
}

public class ReturnException extends ControlFlowException {
    private final Object value;

    public Object getValue() {
        return value;
    }
}

public class ReturnNode extends ExampleStatementNode {
    private final @Child ExampleExpressionNode valueNode;

    public void execute() {
        throw new ReturnException(valueNode.execute());
    }
}

public class AddNode extends ExampleExpressionNode {
    private @Child ExampleExpressionNode left, right;

    public Object execute() {
        return (int) left.execute() + (int) right.execute();
    }
}

public class ConstantNode extends ExampleExpressionNode {
    private final int value;

    public Object execute() {
        return value;
    }
}

public Object test() {
    ExampleRootNode rootNode = new ExampleRootNode(new ExampleStatementNode[] {
            new ReturnNode(
                    new AddNode(
                            new ConstantNode(21),
                            new ConstantNode(21)))
    });
    return Truffle.getRuntime().createCallTarget(rootNode).call();
}

Listing 2.4: Example Truffle interpreter. The test method builds an AST for a function containing something akin to “return 21 + 21;” and executes it. For simplicity, constructors and some classes have been omitted.
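Listing 2.4 shows a stable AST and does not demonstrate the node rewriting described above. The following sketch illustrates how a self-optimizing add node could specialize itself. It is a minimal illustration, not the actual Truffle API: for brevity, the different node kinds are folded into a single class with a state field, whereas a real Truffle interpreter would rewrite the node object itself.

abstract class ExprNode {
    abstract Object execute();
}

final class SpecializingAddNode extends ExprNode {
    private enum State { UNINITIALIZED, INT, GENERIC }
    private State state = State.UNINITIALIZED; // starts out unbiased
    private final ExprNode left, right;

    SpecializingAddNode(ExprNode left, ExprNode right) {
        this.left = left;
        this.right = right;
    }

    Object execute() {
        Object l = left.execute();
        Object r = right.execute();
        if (state == State.UNINITIALIZED) {
            // First execution: bias the node towards the observed operand types.
            state = (l instanceof Integer && r instanceof Integer) ? State.INT : State.GENERIC;
        }
        if (state == State.INT) {
            if (l instanceof Integer && r instanceof Integer) {
                return (int) l + (int) r; // fast path, biased towards integers
            }
            state = State.GENERIC; // unexpected operand types: generalize
        }
        return String.valueOf(l) + r; // stand-in for the generic (e.g., string) case
    }
}

In the actual framework, rewriting a node that was already part of compiled code additionally triggers deoptimization, as discussed in Section 3.2.4.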


When the AST has been stable for a period of time, machine code specialized to the current state will be emitted using the Graal compiler. During this JIT compilation the execute() methods of the specialized AST are inlined at their call sites and the resulting code is compiled. This is a case of partial evaluation because the compiled code is specialized to the observed input (in this case a specific AST). This allows the Truffle framework to produce compiled code without needing a specific compiler for the language that is being implemented. Since the assumption that the AST will remain stable is a speculation, deoptimization is used in case the AST changes again. Section 3.2.4 contains more details and examples about how Truffle uses speculation. The example in Listing 2.4 does not use speculation beyond stability of the AST. Using the example AST built in method test, when considering fields annotated with @Child and @Children as constants and repeatedly inlining and simplifying everything in ExampleRootNode.execute, the code finally simplifies to “return 42;”. Realistic interpreters are much more complex and rely heavily on speculation. Note that the Graal compiler does not know about Truffle ASTs or about methods from the language implemented using Truffle. The optimizing Truffle runtime adds some special compiler intrinsics to the Graal compiler and uses a special inlining policy. But from the Graal compiler’s point of view, all compilations coming from the Truffle runtime compile the same root Java method: the execute() method of the root node type. The Truffle runtime replaces the parameter of this method that corresponds to the AST’s root node with a constant in the IR. Similarly, the underlying HotSpot runtime only sees different compiled code versions of this method but does not know about Truffle ASTs or methods from the language implemented using Truffle.

Chapter 3

Opportunities for Speculative Optimizations

3.1 Traditional Usages of Speculation

To better understand our speculative optimizations, we will first present traditional usages of speculation which usually occur in language runtimes that support speculative optimizations. For example, such optimizations are used in the standard compilers of the HotSpot virtual machine. For some of them we explain how they are implemented in the Graal compiler and also discuss minor improvements over their traditional implementation.

3.1.1 Exception Handling

When compiling a language supporting exception handling, exception edges have to be added to the control-flow graph to account for the control-flow transitions that can happen when an exception is raised. In a language such as Java, a significant number of instructions may throw an exception and, as a result, a large number of exception edges is needed. Using speculation, we can instead take the optimistic assumption that exceptions are not thrown and can thus completely eliminate most of the exception edges. This simplifies further analyses and optimizations by reducing the size and the complexity of the IR graph. Also, if an exception handler is not reached by any exception edge then this exception handler does not need to be parsed and compiled, which improves compilation time and reduces the size of the compiled code. When an exception is thrown at a position where the compiled code does not expect it, the runtime uses deoptimization to go back to the interpreter. The interpreter then re-executes the code and throws the exception. To implement this in the Graal compiler, we use Guard nodes that check the assumption

that no exception needs to be thrown. Any further analysis can rely on the assumption without taking any special care. For example, before loading a field with the getfield bytecode, we first need to check that the receiver is not null. A null check guard is inserted in the graph and is used to guard the memory access. When the guard encounters a null value, deoptimization is triggered and the interpreter throws the NullPointerException. Since deoptimization is costly, we need to handle exceptions efficiently in programs that use exception handling for modeling normal control-flow. Thus, if an exception is thrown frequently for a certain node, we insert an explicit exception check instead of the guard. An explicit check is simply an If node followed by an explicit exception edge for the case when the check fails. On this edge, the exception object can even be a pre-allocated object, eliminating the allocation cost.
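The difference between the two strategies can be sketched at the source level (this is only an illustration of the two compiled-code shapes, not actual compiler output; deoptimize() and CACHED_NPE are placeholders for the runtime’s deoptimization mechanism and a pre-allocated exception object):

final class NullCheckShapes {
    static final NullPointerException CACHED_NPE = new NullPointerException();

    // Shape 1: speculative guard, no exception edge in the compiled code.
    static int guardedLoad(Holder receiver) {
        if (receiver == null) {
            deoptimize(); // back to the interpreter, which throws the NullPointerException
        }
        return receiver.field; // memory access guarded by the null check
    }

    // Shape 2: explicit check, chosen when the profile shows frequent exceptions.
    static int explicitLoad(Holder receiver) {
        if (receiver == null) {
            throw CACHED_NPE; // pre-allocated exception object: no allocation cost
        }
        return receiver.field;
    }

    static void deoptimize() { throw new IllegalStateException("deoptimize"); }

    static final class Holder { int field; }
}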

try {
    foo();
    bar();
} catch (MyException s) {
    // catch handler
}

Figure 3.1: Exception handling at Invoke nodes and a simple exception dispatch chain

In the Graal IR, two different node types are used for method invocation: Invoke nodes do not expect an exception to be raised while InvokeWithException nodes act as a control-flow split between normal control-flow and exceptional control-flow. On its exception edge, an exception-aware invoke is followed by a special ExceptionObject node that represents the exception raised during the method call. This structure can be seen in Figure 3.1. In this example, the profile indicates that the call to foo has never thrown any exception, so a simple Invoke node is used. On the other hand, the call to bar has already thrown exceptions, so an InvokeWithException node is used.


When the runtime is unwinding the stack because of an exception, it directly maps the program counter of the invoke throwing the exception to the program counter of the exception branch. If such a mapping does not exist, it means that the invoke did not have an exception edge, and a mapping to the proper deoptimization metadata allows the runtime to handle the exception in the interpreter. In this case the exception has already been thrown and is in a “pending” state, so that the interpreter restarts execution at the bytecode of the invoke and immediately handles the exception. The explicit exception edges lead to a chain of If nodes that check the type of the exception object to find the right exception handler. At the end of this chain, if the type of the exception does not match any of the exception handlers, control-flow goes to an Unwind node that forwards the exception to the next method on the call stack. A simple example is illustrated in Figure 3.1. Explicitly modeling exception handling chains for points where exceptions actually happen allows us to apply the same optimizations to the exception handling code as to the rest of the code. For example, if the exact type of an exception object is discovered through inlining or other transformations, the exception handling chain can be optimized away and the exception-throwing instruction can jump directly to the correct handler.

3.1.2 Unreached Code

While branches that are explicitly used for exceptional behavior are an obvious target for speculative optimization, it is also common for parts of the normal control-flow to handle unlikely cases. These may be corner cases that do not appear very often or simply cases that may not appear depending on the input data. Such behavior can be observed using branch profiles. For a control-flow split, a branch profile tells us how many times each successor has been selected. This data is often collected to improve low-level block ordering such that the most likely path through the control-flow graph forms a straight sequence of contiguous blocks. When the profile says that a successor has never been selected, we can speculate that it will never be used in the future. We replace this successor with a deoptimization and thus remove unreached code from the compilation unit. By reducing the size of the compilation unit, we improve both compilation speed and code locality. This size reduction also allows more budget for inlining. More importantly, by cutting the unlikely branch off, we prevent it from merging back into the control-flow. This is important because merge points in the control-flow usually lead to merge points (Phi nodes) in the data-flow. These data-flow merge points are detrimental to many optimizations. For example, they dilute the dynamic types of merged reference values by forcing a union operation, potentially preventing further optimizations such

as devirtualization. They can also force the materialization of an object that would otherwise be scalar-replaced by escape analysis (see [69]).
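At the source level, the transformation can be sketched as follows (an illustration of the compiled-code shape, not actual compiler output; slowPath() and deoptimize() are placeholders):

final class UnreachedCodeShapes {
    static int compute(int x) {
        if (x >= 0) {
            return 2 * x;       // always taken according to the branch profile
        } else {
            return slowPath(x); // never reached during profiling
        }
    }

    // Compiled speculatively, the method behaves as if it were written as:
    static int computeCompiled(int x) {
        if (x < 0) {
            deoptimize();       // the unreached branch is cut off: neither parsed nor compiled
        }
        return 2 * x;           // straight-line code, no merge point and no Phi node
    }

    static int slowPath(int x) { return -x; } // placeholder for the rare case
    static void deoptimize() { throw new IllegalStateException("deoptimize"); }
}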

3.1.3 Type Assumptions

Different kinds of speculations can be applied for operations based on the dynamic type of an object. First, profiling data can be exploited. For example, for a dynamic type check such as Java’s instanceof or cast operation, if the profile tells the compiler that a single concrete type has been seen at this location, we can speculate that this will also be the case in the future. In this case, an exact type check can be used by the compiler. Instead of performing the full sub-class check, the type of the object can be directly compared to this single concrete type, which is usually faster than the full sub-class check. In case of success, the compiler already knows the outcome for this concrete type and after the exact type check, the object’s exact type is known statically. If the exact type check does not succeed at run-time, deoptimization can be used to let the interpreter perform the full sub-class check and continue execution. A similar strategy can be used to devirtualize virtual calls. If a single concrete type has been seen for the receiver, an exact type check can be performed in order to make the call monomorphic and thus to devirtualize it. Such a call is then a candidate for inlining. When the type check fails, deoptimization can be used. This can also be extended to polymorphic cases where several types are observed during profiling by doing a cascade of exact type checks which finishes with a deoptimization. For example, Graal uses this technique to apply polymorphic inlining for up to 8 concrete types. This kind of speculation is important because calls are usually a barrier to optimizations. Speculative inlining allows the compiler to inline methods also in cases where static devirtualization cannot be applied. Finally, a range of techniques can be applied by speculating that the class hierarchy will not change [14]. For example, one can find the single implementation of an abstract method or the single concrete implementation of an abstract type. This kind of analysis can be used to improve type checks and to devirtualize calls. However, since Java supports dynamic class loading, the class hierarchy can change at run-time. In order to speculate on the state of the class hierarchy, instead of using a guard, an assumption is registered. This assumption is linked with the compiled code. If class loading later invalidates an assumption, the corresponding code can be evicted from the code cache. However, it is not enough to evict a method from the code cache as a thread could be currently executing that code. The VM first needs to pause all threads and to inspect the code that they are running. However, it cannot stop the threads at any location. Rather, it uses a safepoint mechanism (described by Hölzle, Chambers, and Ungar [40] as “interrupt points”) that ensures that all threads are stopped at safe positions. When the threads are stopped, the VM walks their stacks and inspects all the methods that need to be deoptimized and are currently on any of the threads’ stacks. Method activations at the top of a stack are deoptimized in place. Other activations are patched in such a way that they will be deoptimized when execution returns to them.
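Returning to the profile-based exact type check, the monomorphic devirtualization case can be sketched as follows (an illustration of the compiled-code shape, not actual Graal output; deoptimize() is a placeholder):

final class DevirtShapes {
    interface Shape { int area(); }

    static final class Square implements Shape {
        int side;
        public int area() { return side * side; }
    }

    // Source: a virtual call whose profile only ever saw Square receivers.
    static int area(Shape s) {
        return s.area();
    }

    // Compiled with a monomorphic type speculation.
    static int areaCompiled(Shape s) {
        if (s.getClass() != Square.class) {
            deoptimize();         // unexpected receiver type: back to the interpreter
        }
        Square sq = (Square) s;   // the exact type is now statically known
        return sq.side * sq.side; // the call is devirtualized and inlined
    }

    static void deoptimize() { throw new IllegalStateException("deoptimize"); }
}

The polymorphic case chains several such exact type checks, with a deoptimization at the end of the cascade.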

int limit = ...
for (int i = 0; i < limit; i += stride) {
    // ...
}

Listing 3.1: Simple loop structure


3.1.4 Loop Safepoint Checks Elimination

Virtual machines that use a safepoint mechanism often need to insert safepoint checks in loops. These checks can be removed if the loop is guaranteed to terminate in a “reasonable” amount of time (the thread will then be stopped after the loop and not in the loop). The Graal compiler uses a heuristic to decide if that is the case. It assumes that leaf loops that do not contain calls to other methods terminate fast enough if their number of iterations is statically known to be in the range of a 32 bit integer. This works well for a lot of loops that have the structure depicted in Listing 3.1. Indeed, if the stride is 1 then this loop’s iteration count is limit, which is a 32 bit integer. However, if the stride is larger than 1, the counter (i) could overflow and the loop might not terminate at all. In this case, we can speculate that limit ≤ MAX_INTEGER − stride and that overflow thus cannot happen. This condition can be checked once above the loop using a Guard node. We can then eliminate the safepoint check from the loop. This is beneficial if the loop’s iteration count is high enough such that the gains of not executing the safepoint check can cover the extra condition checked before the loop.
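At the source level, the transformation can be sketched as follows (illustrative only; deoptimize() stands in for the guard’s deoptimization):

final class SafepointCheckElimination {
    // With stride > 1 the counter i could overflow, so without further knowledge
    // a safepoint check has to be kept in the loop body.
    static long loop(int limit, int stride) {
        long sum = 0;
        for (int i = 0; i < limit; i += stride) {
            sum += i; // a safepoint poll would be emitted in here
        }
        return sum;
    }

    // With the speculation: a single guard above the loop proves that i cannot
    // overflow, so the loop terminates quickly and needs no safepoint check.
    static long loopCompiled(int limit, int stride) {
        if (!(limit <= Integer.MAX_VALUE - stride)) {
            deoptimize();
        }
        long sum = 0;
        for (int i = 0; i < limit; i += stride) {
            sum += i; // no safepoint poll in the body
        }
        return sum;
    }

    static void deoptimize() { throw new IllegalStateException("deoptimize"); }
}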

3.2 Advanced Usages of Speculation

Speculation can also be used to perform more advanced optimizations. We also explain how the Truffle API offers language implementers a generic framework for speculation. This allows them to implement new speculative optimizations for high-performance language interpreters without having to modify the Graal compiler.


3.2.1 Speculative Alias Analysis

Alias analysis [16] is used to decide if two accesses can refer to the same location at run-time. This is useful for various optimizations in the compiler. For example, it can be used to find out whether memory accesses can be reordered. It can also be used during vectorization of loops to find out if and how the memory writes of one iteration affect the memory reads of others. Statically proving an absence of aliasing between two variables is often hard or impossible if they are both of the same type. However, we can insert dynamic checks to see if the two variables actually point to the same concrete object: if they do not, then we have (dynamically) guaranteed that they do not alias.
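Such a dynamic check can be sketched as follows (illustrative only; deoptimize() is a placeholder). Note that in Java two distinct arrays never overlap, so a single reference comparison is enough to establish the absence of aliasing:

final class AliasCheckShape {
    static void addOne(int[] src, int[] dst) {
        if (src == dst) {
            deoptimize(); // speculation failed: both variables refer to the same array
        }
        // src and dst are now dynamically guaranteed to be distinct objects, so the
        // loads from src and the stores to dst can be reordered or vectorized.
        int n = Math.min(src.length, dst.length);
        for (int i = 0; i < n; i++) {
            dst[i] = src[i] + 1;
        }
    }

    static void deoptimize() { throw new IllegalStateException("deoptimize"); }
}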

3.2.2 Speculative Store Checks

The Java Virtual Machine specification [49, §6.5 aastore] requires a “store check” when storing an object reference into a reference array. The Java language and bytecode verification already ensure that the declared type of the stored reference is assignable to the declared element type of the array. However, the dynamic type of the array’s elements could be a subclass of their declared type. To be sure that this operation is type-safe, a run-time “store check” must happen. The store check ensures that the dynamic type of the reference being stored is assignable to the dynamic type of the array elements. This type of check does not only appear in the Java platform; it appears as soon as a language has covariant arrays. For example, the C# language requires the same type of checks. In general, such type checks can be complex and can expand to a large subgraph in the intermediate representation. A full type check even requires a loop [11]. However, in most cases the dynamic type of the array is the same as the declared type of the array. If we speculate that this is true, then, because of the properties already checked during compilation and bytecode verification, the store check is not necessary anymore. This speculation is also relatively cheap to check since it is an exact type check. This exact type check tests that the dynamic type of the array is exactly the declared type and not some subtype. This can be performed by a direct comparison of the array’s dynamic type with a constant. Another advantage is that the check does not depend on the type of the reference being stored, so if multiple references are stored, a single check suffices. This speculation works particularly well for Java’s collection framework. For example, Java’s ArrayList is backed by an Object array. While it would be possible to statically prove that it is exactly an Object array and not an array of subclass objects, this would require an extensive whole-class analysis (or even whole-program because of Java reflection). The speculative optimization,

on the other hand, can be done locally in the context of a single compilation unit.
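The resulting compiled-code shape can be sketched as follows (illustrative only; deoptimize() is a placeholder):

final class StoreCheckShape {
    static void fill(Object[] array, Object value) {
        // Speculate that the array's dynamic type is exactly Object[]: a single
        // comparison with a constant replaces the per-store sub-type check.
        if (array.getClass() != Object[].class) {
            deoptimize();
        }
        for (int i = 0; i < array.length; i++) {
            array[i] = value; // store checks elided: any reference fits into an Object[]
        }
    }

    static void deoptimize() { throw new IllegalStateException("deoptimize"); }
}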

3.2.3 Speculative Guard Motion

To avoid unnecessary deoptimizations, run-time checks performed to test the validity of a speculation are usually not executed unless the speculation will be used. For the compiler, this means that Guard nodes are always scheduled at a position where it is guaranteed that at least one of the guarded nodes will be executed. Lifting this restriction gives the compiler more freedom in the scheduling of guards at the price of potential unnecessary deoptimizations. This optimization is described in detail in Section 6.1.

3.2.4 Truffle

Truffle API

As described in Section 2.4, the Truffle framework can be used to create high-performance language implementations. This can be done at a high level using node rewriting through the Truffle DSL [41]. The Truffle framework then uses the Graal compiler to apply partial evaluation to the language interpreter by speculating that the AST of a method will not change. This partial evaluation coupled with aggressive and predictable inlining allows us to leverage the interpreter to automatically produce compiled code. This is also called the 1st Futamura projection [24]. If a later rewrite changes the AST, deoptimization is used to continue execution with the new AST and eventually another partial evaluation will be applied to the new AST. The Truffle API also exposes a lower-level API to take advantage of speculation. Some annotations (such as @Child or @CompilationFinal) allow the programmer to specify fields which should be considered constant during partial evaluation. Here the programmer explicitly speculates that these values do not change in the future. If these values need to be changed, the optimized code must first be exited with an explicit deoptimization.

Explicit Deoptimization

The transferToInterpreterAndInvalidate() method allows the language developer to explicitly request deoptimization. The compiled code of the currently executing method is discarded and the AST that was considered constant during the previous partial evaluation can be mutated. After that, another partial evaluation and compilation can take place. For example, in Listing 3.2, the @CompilationFinal annotation and the transferToInterpreterAndInvalidate() method are used to create a branch profile that is used to speculate on whether a branch has ever been entered. By default the visited field is false. During partial evaluation,


public final class BranchProfile extends NodeCloneable {
    @CompilationFinal private boolean visited = false;

    public void enter() {
        if (!visited) {
            CompilerDirectives.transferToInterpreterAndInvalidate();
            visited = true;
        }
    }
}

public class SomeNode extends Node {
    private final BranchProfile profile;
    ...
    private boolean isComplexCase(...) { ... }

    public void execute(...) {
        if (isComplexCase(...)) {
            profile.enter();
            // handle rare complex case
            ...
        }
        ...
    }
}

Listing 3.2: BranchProfile class using the Truffle API for speculation

enter is inlined into execute and the load of the visited field is replaced with its current value, which allows the compiler to simplify the if (!visited) statement. If this value was still false (i.e., if isComplexCase was always false so far), then the enter method always calls transferToInterpreterAndInvalidate(). The call to transferToInterpreterAndInvalidate() is replaced by an explicit deoptimization and the branch that contains this deoptimization will be removed from the compilation unit and replaced with a guard. Due to the inlining performed during partial evaluation, this means that the branch handling the complex case is eliminated and replaced by a guard checking that isComplexCase remains false. If that condition becomes true, the compiled code is deoptimized, visited is set to true and further partial evaluations will include this branch.

Assumptions

Another type of speculation can be done with Assumption objects of the Truffle API. When assumption objects are created, they are considered to be in a “valid” state. They will remain valid until they are explicitly invalidated by calling their invalidate() method. When running code that relies on such an assumption, the check() method is called to verify that it is still valid.

public class AddNode extends Node {
    private static final Assumption basicAdd;
    private @Child Node left, right;

    public int executeInt() {
        try {
            basicAdd.check();
        } catch (InvalidAssumptionException e) {
            // rewrite
            ...
        }
        return left.executeInt() + right.executeInt();
    }

    public static void notifyAddRedefined() {
        basicAdd.invalidate();
    }
}

Listing 3.3: A node using a Truffle Assumption to speculate

This exposes the ability to deoptimize code externally by requesting a safepoint [13]. Indeed, during partial evaluation, calls to the check() method are completely eliminated. Instead, a dependency is registered between the compiled code and the assumption object. When the invalidate() method is called for an assumption, a safepoint is used to deoptimize all compiled code that has a dependency on this assumption object. For example, in Listing 3.3 an AST node implementing basic integer addition assumes that the language’s addition operation has not been redefined1. This is done through a global assumption. Every time an addition is done, the assumption is checked. In terms of compiled code, this just means that any code using the default addition registers a dependency on this assumption object. No code is emitted for the check. If integer addition is redefined, the assumption is invalidated using notifyAddRedefined() and compiled code depending on it is invalidated and deoptimized. Any AST that contains this basic AddNode will eventually execute the check in the interpreter. The check will fail, the exception will be thrown, and the node will be rewritten to one that executes the redefined addition. Since this is a completely different node, this case does not need to be handled in the AddNode. By combining this generic assumptions API with the possibility to explicitly trigger deoptimization, the Truffle API makes it possible to use speculation for anything language implementers may need. In order to choose the right type of speculation, it is useful to understand the associated costs. In Chapter 4, we will look into these costs.

1This is possible for example in Ruby where the + operator of Fixnum can be redefined


Chapter 4

The Costs of Speculation

While speculation allows many optimizations, it comes at a cost. In this chapter we look at these costs. We first look at the run-time costs of checking whether speculations hold. We then look at the memory footprint costs associated with speculation. This aspect is less commonly studied and in particular we look at the effects when speculation is heavily used. Finally, we bring forward a problem that is usually not discussed in the literature: the consequences of speculation on the compiler IR and how different representations of speculation in the IR can constrain possible optimizations.

4.1 Runtime Costs

Speculation causes run-time costs in two different ways: when a run-time check must be performed and when deoptimization occurs. Using assumptions or dependencies allows moving the costs of run-time checks from where the speculation is used to where the speculation may be invalidated. The compiler and the runtime always try to use speculative optimizations only when their costs are lower than their gains. However, the increased use of speculation results in an increased percentage of time spent on checking their validity. Even if there are still gains, the gains get smaller. For example, in Figure 4.1 we discuss the performance of three hypothetical compilers. Compiler 1 uses only a few speculative optimizations, for which it has to insert a few run-time checks. Compiler 2 uses more speculative optimizations, thus the resulting code can complete the same amount of work in fewer cycles. However, because of the additional speculations, more cycles are spent on run-time checks. These checks now make up a bigger proportion of the cycles necessary to complete the work. It would be beneficial to spend some compilation time on optimizing those run-time checks. This is what has been done in compiler 3, which results in even better performance. When deoptimization occurs, it causes certain run-time costs. Part of this overhead comes from the fact that execution is suspended while the VM



Figure 4.1: Hypothesis on cycles spent on the same workload by the code produced by three hypothetical compilers. Compiler 1 uses almost no speculative optimizations, compiler 2 uses speculative optimizations, and compiler 3 uses speculative optimizations and additionally optimizes the run-time guards introduced by these speculative optimizations.

rebuilds the interpreter state. The other reason is that execution restarts in the interpreter, which is slower than compiled code. Figure 4.2 illustrates a possible life cycle of a method. First, the method is executed in the interpreter (1) until the VM triggers its compilation. During the compilation (2), the code continues to run in the interpreter. When compilation finishes and the compiled code has been installed, execution continues (3) at a much higher speed. When a deoptimization event occurs, execution is suspended (4) while the interpreter frames are being rebuilt. Execution then restarts in the interpreter and new profiling information is gathered (5). Finally, a new compilation is requested and runs in the background (6). During this compilation the speculation that failed before and caused deoptimization is not used any more. The new compiled code is installed and execution continues at high speed again, although a bit slower than before because the failing speculation was not used (7). Overusing speculative optimizations therefore does not improve peak performance but severely worsens warm-up time.

4.1.1 Assumptions vs. Guards

The run-time checks necessary for speculation can be performed at two positions: either in the compiled code, just before using the speculation, or in the runtime, when changing something that might invalidate speculations made by some compiled code. For example, if a function speculates that its parameter is of a


specific type, this could either be checked in compiled code when entering the function, or in the runtime when loading new code into the VM that contains a call to this function, by static type-checking of the parameters passed during this call.

Figure 4.2: Execution speed over time with a deoptimization event

Assumptions

Assumptions (also called “dependencies” in the HotSpot VM) are speculations for which the check is performed when changing some global state. Their advantage is that they do not require any instruction in the compiled code where the speculation is used. However, they require maintaining a list of compiled code that uses them. They also always require the VM to use a safepointing mechanism that stops all threads before performing the deoptimization. This means that while they have no run-time cost as long as deoptimization is not necessary, they are rather costly when deoptimization does happen.

Guards

Guards are speculations for which the check is performed directly in the compiled code. They can be used when there is no global state that could be watched with an assumption. Since they need to be executed just before the code they guard, their condition needs to remain relatively simple. They offer more possibilities than assumptions when deoptimization happens: since

other threads executing the same code are also going to execute the guard, it is often not necessary to use a safepoint to atomically deoptimize all threads.

4.2 Memory Footprint

While deoptimization is a rare event, the information needed to deoptimize is always present alongside the optimized code. In the HotSpot VM, metadata is associated with the program locations where deoptimization might be needed. This metadata is then used by the VM to rebuild the state of the interpreter from the state of the machine. Since compilers can use inlining, a single frame from compiled code can correspond to multiple interpreter frames. For example, on the JVM, the metadata of a location in inlined code contains a stack of JVM frames that need to be rebuilt. Each frame is an activation of a specific JVM method at a specific bytecode index (“BCI”). For each frame there is also a mapping from JVM stack and local values to their locations in the native registers and stack. This metadata is present for all program locations that may cause deoptimization. Thus, the more deoptimization is used, the more metadata is produced by the compiler and needs to be stored for the lifetime of the compiled code. The amount of produced metadata is significant; in fact, there is often more metadata than code to be stored after a compilation. In environments where memory is scarce this can be an obstacle to an extensive use of deoptimization. Memory can be an issue both on small low-end devices as well as on server systems where a lower memory usage means a higher service density.
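The shape of this metadata can be sketched as follows (a hypothetical illustration: the names are ours and do not correspond to HotSpot’s actual data structures):

// One entry per program location that may deoptimize.
record DeoptSite(long codeOffset, InterpreterFrame[] frames) {} // innermost frame first

record InterpreterFrame(
        String method,            // the JVM method of this activation
        int bci,                  // bytecode index at which the interpreter resumes
        ValueLocation[] locals,   // where each JVM local currently lives
        ValueLocation[] stack) {} // same for the operand stack

// A value is found in a register, in a stack slot of the compiled frame,
// or is a compile-time constant.
sealed interface ValueLocation permits Register, StackSlot, Constant {}
record Register(int number) implements ValueLocation {}
record StackSlot(int offset) implements ValueLocation {}
record Constant(Object value) implements ValueLocation {}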

4.2.1 Experimental Data

We slightly extended existing instrumentation from the HotSpot VM to gather statistics about the memory footprint of metadata. The information is collected when the result of a compilation is inserted in the VM’s code cache. Thus, it shows how much metadata has been generated by the compiler over the whole lifetime of the VM and not statistics about the contents of the code cache at a specific point in time. These contents could be different at a specific point in time because unused and deoptimized code can be evicted from the code cache.

Java Benchmarks

In Figure 4.3, we show the amount of code and metadata installed into the VM’s code cache for a number of Java benchmarks when using the Graal compiler without any optimization targeted at reducing metadata footprint. We used the DaCapo [6] and Scala-DaCapo [62] benchmarks, which simulate typical applications running on the JVM. We divided the memory footprint of

installed code into four categories. Code corresponds to machine code. Deoptimization metadata is data that is used only for deoptimization: it encodes the state of each frame that needs to be reconstructed during deoptimization. The Constants category is for constants that can be referenced either from deoptimization metadata or from the code. Finally, the Other category accounts for relocation information as well as other data needed for the VM’s bookkeeping of the code cache. We can already see that the footprint of deoptimization metadata is not negligible, as it occupies 1.7 to 1.8 times as much space as the code itself. This makes it the main contributor to the memory footprint of JIT installed code.


Figure 4.3: Amount of data installed into the VM’s code cache for various Java benchmarks. Code corresponds to machine code. Deoptimization metadata is data that is used only for deoptimization. Constants is for constant values that can be referenced either from deoptimization metadata or from the code. Other is for relocation information as well as other bookkeeping data.

We ran the same measurement for other compilers of the HotSpot VM. The results are summarized in Table 4.1 where we show the average memory footprint per compiled method. We found that for the Server compiler metadata

occupies 1.4 to 2.0 times as much space as code; for the Client compiler this ratio is between 1.3 and 1.5. The Graal compiler typically produces more deoptimization metadata than the two other compilers. Some of the differences can be explained by the different typical sizes of the compilation units due to different inlining policies. Indeed, we can see that on average, the Graal compiler produces bigger compilation units than the Server compiler (1.5× to 2.1× bigger) which in turn produces larger compilation units than the Client compiler (1.3× bigger). With deeper inlining, most deoptimization sites need to store metadata for more frames. Some of the differences can also be explained by the intensive usage of deoptimization in the Graal compiler. Overall, these numbers are similar in magnitude to the numbers put forward by Hölzle, Chambers, and Ungar [40] for the Self VM.

Truffle

Languages implemented using Truffle rely heavily on speculation (see Section 3.2.4). In Figure 4.4 we look at the amount of code and metadata installed in the code cache for some JavaScript benchmarks from the Octane suite. To run these, we use TruffleJS, a JavaScript implementation using Truffle. Overall, much more data is installed in the code cache and most of it is deoptimization metadata. In Table 4.1, we have summarized these numbers under the name Octane. For this benchmark suite, we could only collect numbers for the Graal compiler as it is the only compiler that is supported for Truffle’s partial evaluation. We can see that the code cache contains more than 83 times as much metadata as it contains code. In particular, much more deoptimization metadata is produced than for the Java benchmarks. This can be explained by multiple factors. First, by design, Truffle interpreters use deoptimization a lot. But the size of deoptimization metadata is also influenced by the depth of the inlining and the number of objects that are replaced by scalars after escape analysis, and Truffle relies heavily on both. We can also see that, compared to the Java benchmarks, there is an unusually large amount of metadata categorized as “constant”. Since Truffle relies heavily on partial evaluation, many things are constant at compile time (AST, profile, inline caches, etc.). At the end of the compilation, these constants can be found in the metadata. Overall, we can see that in the context of Truffle, reducing the memory footprint of deoptimization metadata becomes very important.



Figure 4.4: Amount of data installed into the VM’s code cache for various JavaScript benchmarks from the Octane suite running on TruffleJS. This shows that for Truffle, almost the entire memory overhead is used for deoptimization metadata.

Table 4.1: Memory footprint of deoptimization metadata in bytes for different benchmarks and compilers (DaCapo, Scala-DaCapo and SPECjvm2008 for the Graal, Client and Server compilers, and Octane for the Graal compiler). For each configuration, the table reports the number of compiled methods and the average memory footprint per compiled method, split into code, deoptimization metadata, constants, other metadata, and the total. All results are given ± the width of the 99 % confidence interval.


4.3 Managing Deoptimization Targets

Finally, besides runtime and footprint costs of speculation, we want to discuss a cost that is often overlooked. In this section we look at how the modeling of speculation in compiler IRs can affect the possibilities for optimizations. Some models limit the possibilities for speculative optimizations or increase their costs. Other models offer more freedom for speculative optimizations but prevent other traditional compiler optimizations.

4.3.1 Java Memory Model Constraints

The deoptimization process must respect the Java Memory Model (JMM) [50, 28] and its rules that dictate which program transformations are valid. The JMM allows some reordering of observable side-effects1, but in general, side-effects cannot be repeated or ignored. There are two consequences of this:

Reordering of side-effects Two side-effecting nodes can only be reordered if there is no deoptimizing node between them. Indeed, if there is a deoptimizing node between two side-effecting nodes (or if one could be added later), these side-effecting nodes cannot be reordered even if otherwise allowed by the JMM. In Figure 4.5, we can see an example of this situation. After parsing a program that contains two side-effecting nodes (in this case memory writes), a hypothetical optimization swaps those nodes. Later, a deoptimizing node is inserted between the two writes. At this point, the only possible deoptimization targets would be:

• Either after putfield a, which results in writing field b once or twice and never writing field a (traces 2 and 3).

• Or before putfield a, which results in writing field b twice and writing field a in between (trace 1).

None of these targets leads to valid executions. As a consequence, optimizations that may reorder side-effecting nodes cannot be performed until the final positions of all possible deoptimizations are known.

Deoptimization targets If there is a deoptimizing node between two side-effecting nodes, the deoptimization target must be between the corresponding side-effecting bytecodes. Indeed, using a similar reasoning as for the previous point, deoptimizing to a target outside of this range would lead to either repeating the first side-effect or omitting the second one.

1An “observable side-effect” is one of the JMM’s write, synchronization or external actions.



Figure 4.5: Example of parsing and transformations leading to a deoptimization between two reordered side-effecting nodes and the resulting execution possibilities

In Figure 4.6, we can see that the traces that deoptimize to a location before the first side-effecting bytecode (1) or after the second one (3) are not valid. On the other hand, the trace that deoptimizes to a location between the side-effecting bytecodes (2) is valid. As a result, there is a range of valid deoptimization targets, bounded by side-effecting bytecodes.

This means that if a deoptimization has an associated deoptimization target, swapping it with a side-effecting node invalidates that deoptimization target.



Figure 4.6: Deoptimization target possibilities relative to the surrounding side-effecting nodes

4.3.2 Deoptimizing to the Previous Side-effect

If we want to keep the possibility to move and insert deoptimizing nodes, we need to know all the possible deoptimization targets, even if no deoptimization currently targets them. As we have seen before, for each deoptimizing node, all bytecode positions between the surrounding side-effecting bytecodes are valid targets.

Creating Deoptimization Metadata

The deoptimization process needs to recreate the full program state at the deoptimization target. The information necessary to recreate this state is the deoptimization metadata (FrameState nodes in the Graal IR). Besides the location of this target, the bulk of this metadata is the mapping from values of the unoptimized program to values in the optimized program. During parsing of the bytecode the compiler can keep track of this mapping to be able to create deoptimization metadata when necessary. However, once parsing is finished, keeping this mapping for all nodes in the control-flow is impractical: all of these nodes would need data-flow edges to all the values necessary to recreate the deoptimized program state. The number of edges would have negative consequences for the memory footprint and the compilation speed. More importantly, these data-flow edges

would introduce a lot of ordering constraints between otherwise unrelated nodes, hindering optimization opportunities. It is also not possible to recreate this mapping later, as that would require the compilation process to be entirely deterministic, which is not possible since we want to use speculative optimizations that depend on profiling data that changes from one compilation to another. As a result, it is necessary to keep this mapping only at selected locations in the IR. Using the observation that deoptimization targets must be between the locations of the surrounding side-effecting nodes, we only keep information about deoptimization targets right after every side-effecting node in the IR. For a specific side-effecting node, this deoptimization target can then be used for all deoptimizing nodes until the next side-effecting node, which in turn provides a new deoptimization target for the deoptimizing nodes that follow. This model makes it possible to insert and move deoptimizing nodes during compilation since they are not associated with any particular deoptimization target. They can float anywhere and, after the final schedule, they will just use the deoptimization target attached to the previous side-effecting node. However, using this model has the drawback that side-effecting nodes cannot be re-ordered, which prevents certain optimizations. This model is used in parts of the Graal compiler (as we will see in Chapter 5) and in the Crankshaft compiler of the JavaScript V8 engine [23].
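Schematically, the rule can be pictured as follows (pseudo-IR, not actual Graal syntax):

write a      // records FrameState F1: the state after "putfield a"
guard c1     // on failure, deoptimizes to F1
guard c2     // also deoptimizes to F1: guards share the previous state
write b      // records FrameState F2: the state after "putfield b"
guard c3     // on failure, deoptimizes to F2

Guards can be freely inserted or moved between the two writes; whatever their final position, their target is simply the FrameState of the closest preceding side-effect.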

4.3.3 Fixed Deoptimizations

Some compilers take a different approach to the management of deoptimization targets and assign the deoptimization target immediately when creating the deoptimizing nodes. This means that deoptimizing nodes can only be created during bytecode parsing while there is still enough information to assign deoptimization targets. In this model, since deoptimizing nodes have a target, they cannot be reordered. However, it lets the compiler reorder side-effecting nodes. This is the model used in the existing compilers of the HotSpot VM: C1 and C2. It is also used in parts of the Graal compiler (as described in Chapter 5). Figure 4.7 illustrates the types of reordering possible when deoptimizing nodes are fixed. In this example, both the Deoptimize node and the ForeignCall node are deoptimizing nodes: the Deoptimize node always deoptimizes and the ForeignCall node may deoptimize due to a safepoint. The Write node for field y was moved below the write to field x since side-effecting nodes can be reordered. However, while this side-effecting node is pushed down, it must be pushed down into all the branches. When reaching the control-split, the compiler must ensure that the side-effect will be performed regardless of which branch is taken and it creates a copy of the side-effecting node for each branch. When the Deoptimize node is encountered, the side-effecting node



cannot be pushed down further and must be committed. For the same reason, the side-effecting node cannot be pushed below the ForeignCall node on the other branch. This transformation is interesting because if x and y are consecutive locations in memory and the writes are directed to the same base object, both writes could be performed in a single instruction. For example, two 32 bit fields can be written in a single 64 bit store.

Figure 4.7: Moving a guard check before a side-effecting node when deoptimizing nodes are fixed

Note that if two branches merge without intervening deoptimizing nodes, side-effecting instructions can be moved without being duplicated in the branches. Side-effecting instructions from inside the branches can also be merged if they just differ by their input values. This is illustrated in Figure 4.8.

4.4 Conclusion

The capabilities of both models are important in an optimizing compiler using speculative optimizations. Being able to insert deoptimizing nodes after bytecode parsing gives more freedom in the structure of the compiler and makes it possible to use more aggressive optimizations. It also allows optimizations to make more global decisions taking into account the whole compilation unit. The ability to move deoptimizing nodes is important in order to reduce the run-time costs of speculation. It allows the compiler to easily group and merge run-time checks as well as schedule them optimally. Finally, being able to re-order side-effecting nodes is important for some advanced optimizations such as vectorization.



Figure 4.8: Moving and reordering side-effecting nodes when deoptimizing nodes are fixed. The Write nodes for field x were merged below the control-flow diamond. The one for field y was moved through the control-flow diamond and below the Write to x.

Chapter 5

Optimization Stages

In Section 4.3 we presented two models for the management of speculation in the compiler IR. In particular, we discussed the advantages and drawbacks of both models. To be able to profit from the benefits of both models, we now propose to transition from one model to the other during compilation. The compilation starts out with side-effecting nodes being fixed in the control-flow, while speculative optimizations can insert and move deoptimizing nodes, which are still floating. Then, deoptimizing nodes are fixed to their final locations and a deoptimization target is assigned to them. At this point, the Graal compiler could already emit machine code and the compilation would be complete. However, we instead propose to continue optimizations. In this second stage, the compiler can optimize side-effecting nodes. It can also lower (see Section 2.3.2) operations which need to be atomic with regard to deoptimization. The overall process is illustrated in Figure 5.1.

1st Stage: speculative optimizations (floating guards, FrameStates at side-effects)
Guard Lowering
FrameState Assignment
2nd Stage: optimizations of the schedule of side-effecting nodes (fixed deoptimizations, FrameStates at deoptimizations)

Figure 5.1: Organization of the optimization stages


5.1 First Stage: Optimizing Guards

5.1.1 Representation

In the first compilation stage, side-effecting IR nodes are fixed in the control-flow. They may be removed by an optimization but they cannot be reordered. Each side-effecting node is associated with deoptimization metadata in the form of a FrameState node. Deoptimizing nodes on the other hand can be modeled as floating nodes and can be reordered with each other and with side-effecting nodes. They are not yet associated with any deoptimization target. New deoptimizing nodes can also be inserted by speculative optimizations. The IR provides Guard nodes that can be used by speculative optimizations to check their assumptions at run time. Guard nodes are floating nodes that take the condition that needs to be checked as input. An additional input, called the guard’s anchor, is used to denote the branch in which the condition needs to be checked. This scheduling restriction is not used for correctness reasons but because of performance concerns. The deoptimization process has high performance costs in itself since it involves saving live values, tearing down the compiled frames, and creating interpreter frames. It also causes the rest of the method to be executed in the interpreter, and if the method is still hot, a new compilation is necessary. For these reasons, it is generally not desirable to execute guards which are not required. The additional scheduling constraint for guards ensures that they are executed only if what they guard is guaranteed to be executed. Without this constraint, global value numbering of guards could cause them to be executed speculatively in a dominating block. The representation used at this stage is illustrated in Figure 5.2. The side-effecting Write node is associated with a FrameState representing the interpreter’s state after the corresponding putfield bytecode has been executed. The Read node uses a floating Guard node to check the compiler’s speculation that obj2 is not null and thus the y field can be read and no exception needs to be thrown1.

5.1.2 Optimizations

In the remainder of this section, we discuss some of the optimizations suitable for this stage. First, all the optimizations described in Chapter 3 should be performed here. They take advantage of the possibility to introduce Guard nodes for speculation. In this stage, optimizations can also take advantage of the possibility to move Guard nodes, by changing their anchor.

1A similar null check on obj1 for the Write node has been omitted for simplicity


obj1.x = 1;
if (c) {
    t = obj2.y;
} else {
    ...
}

Figure 5.2: Example of IR during the first stage. Floating Guard nodes are used and side-effecting nodes have an associated FrameState node.

Guard Anchor Optimization

In order to give the scheduler the most freedom to move guards, the guard’s anchor should be as high as possible in the control-flow. Optimally, guards should always be anchored at the beginning of the block where they are guaranteed to be needed (i.e., the highest block that is post-dominated by at least one of their usages). The first compiler stage makes sure that this is the case. Also, if all branches of a control-split require the same Guard node, these Guard nodes are replaced with a single Guard node in the branch above the control-split. This optimization was partially described by the author in [20, §4.2].

Speculative Guard Motion

This optimization tries to execute Guard nodes speculatively regardless of the possible false positives described above and their performance implications. The main idea is to be able to speculatively execute Guard nodes outside of loops if their condition is loop invariant. This optimization is speculative because it moves Guard nodes that are control-dependent on nodes in the loop. We speculate that there is no correlation between the condition that is checked by the Guard node and the condition under which it is executed in the loop. This optimization is described in more detail in Section 6.1.
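At the source level, the effect can be sketched like this (illustrative only; deoptimize() is a placeholder):

final class GuardMotionShape {
    // Before: the bounds guard is control-dependent on c[i] and sits inside the loop.
    static int sumBefore(int[] a, boolean[] c, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (c[i]) {
                if (i >= a.length) {
                    deoptimize(); // bounds guard executed on every matching iteration
                }
                sum += a[i];
            }
        }
        return sum;
    }

    // After speculative guard motion: the loop-invariant condition n <= a.length is
    // checked once above the loop, even though no c[i] might ever be true.
    static int sumAfter(int[] a, boolean[] c, int n) {
        if (n > a.length) {
            deoptimize(); // may be a false positive if the branch is never taken
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (c[i]) {
                sum += a[i]; // no guard left in the loop body
            }
        }
        return sum;
    }

    static void deoptimize() { throw new IllegalStateException("deoptimize"); }
}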


5.2 Stage Transition

Once the optimizations from the first stage have been performed, the compila- tion can proceed to the second stage. This is done by transforming the IR to conform to the model used in the second stage. This is a two-step process.

5.2.1 Guard Lowering

In order to reach the second stage, all floating guards are first assigned a position in the control-flow graph by scheduling the floating nodes. These guards are then transformed into a control-flow split and an explicit deoptimization node.


Figure 5.3: Guard lowering: a floating Guard node is transformed into fixed control-flow using an If node and a Deoptimize node.

Guarded nodes still keep their guard edges, which now point to the Begin node instead of the Guard node. This is important so that the guarded node is still constrained to be scheduled after the guarding check has succeeded. This is illustrated in Figure 5.3. The Read node that was pointing to the Guard now points to the non-deoptimizing successor of the corresponding If node. Note that the anchor edge of the Guard node has disappeared. It only constrained the position at which the guard should be checked and is not needed anymore since the corresponding If node is now part of the control-flow.


5.2.2 FrameState Assignment

Once all deoptimizing nodes are fixed, a deoptimization target is assigned to them by using the FrameState from the closest previous side-effecting node as illustrated in Figure 5.4. The algorithm used for this assignment is described in detail later in this section.


Figure 5.4: FrameState assignment: FrameState nodes are transferred from side-effecting nodes to deoptimizing nodes

FrameState Nodes at Control-Flow Merges

An important requirement of FrameState assignment is that all control-flow merge points must be considered as side-effecting nodes and thus have an associated FrameState node during the first compilation stage. Indeed, if any of the merging branches contains a side-effecting node (as illustrated in Figure 5.5), any deoptimization triggered after the Merge node must deoptimize to the state associated with this Merge node. There is no other correct deoptimization target: deoptimizing to a location before the control-flow merge would mean deoptimizing to its dominator block and would risk re-executing the side-effect; on the other hand, deoptimizing to a location after the last



side-effecting node inside a branch is not possible because we do not statically know which branch to choose.

Figure 5.5: Example of a situation where a Merge node requires a FrameState node

While it could be tempting to relax this requirement and only keep FrameState nodes at control-flow merges if one of the incoming branches contains a side-effecting node, this is not a good idea in practice. Indeed, some optimizations, such as tail-duplication, could move the control-flow merge point and introduce side-effecting nodes into the branches. At this point, since it is not possible to create new FrameState nodes after bytecode parsing, the IR would become invalid: it would contain a control-flow merge without an associated FrameState node but with side-effecting nodes in its incoming branches. As a result, in the Graal IR, all Merge nodes have an associated FrameState node in the first compilation stage.

Algorithm

FrameState assignment is done using Algorithm 5.1. This algorithm traverses the IR’s fixed nodes, which describe control-flow, in reverse postorder, i.e., each node is visited before its successors except for loop back-edges (LoopEnd nodes). During this traversal, it remembers the state associated with the last side-effecting node (StateSplit nodes) and assigns it to deoptimizing nodes. Both DeoptimizingBefore nodes and DeoptimizingAfter nodes are deoptimizing nodes; the difference is only relevant when they are also side-effecting nodes: DeoptimizingBefore nodes can deoptimize before applying their side-effects, and DeoptimizingAfter nodes can deoptimize after applying their side-effects.


Data: An IR graph where all deoptimizing nodes are fixed to the control-flow and FrameState nodes are still associated with side-effecting nodes
Result: The graph is modified in-place and deoptimizing nodes are associated with their final deoptimization target

    initialize Q to an empty queue
    visitedEnds ← ∅
1:  enqueue((graph.start, null), Q)
    while Q is not empty do
        (node, state) ← dequeue(Q)
        while node is a FixedWithNext do
            if node is a DeoptimizingBefore then
                node.deoptBeforeState ← state
            end
            if node is a StateSplit then
2:              state ← node.stateAfter
                node.stateAfter ← null
            end
            if node is a DeoptimizingAfter then
                node.deoptAfterState ← state
            end
            node ← node.next
        end
        if node is an End ∧ node is not a LoopEnd then
            visitedEnds ← visitedEnds ∪ {node}
            merge ← node.merge
            ends ← {e ∈ merge.ends : e is not a LoopEnd}
            if ends ∩ visitedEnds = ends then
                /* all incoming ends have been visited */
3:              enqueue((merge, null), Q)
            end
        else if node is a ControlSplit then
            foreach s ∈ node.successors do
                enqueue((s, state), Q)
            end
        end
    end

Algorithm 5.1: FrameState assignment

When reaching a control-flow merge, the algorithm does not need to merge the incoming states since all control-flow merges already have a FrameState attached. Also, since the start node and control-flow merge nodes are StateSplit nodes but not DeoptimizingBefore nodes, no state needs to be associated with them in the work queue (Lines 1 and 3); their state will be picked up when processing them (Line 2). Note that all side-effecting nodes are seen during this traversal since, in the first stage, all side-effecting nodes need to be fixed nodes.

Since, after this assignment, all deoptimizing nodes have a target, the edges between side-effecting nodes and FrameState nodes are not needed anymore and can be cleared to simplify the graph. This simplification removes constraints in the graph so that the subsequent schedules have more freedom. Not all FrameState nodes will be used after this transformation. The dead FrameState nodes can be removed from the graph. Some code can also become dead if it was only used by those FrameState nodes.

Note that after this assignment, in practice, a lot of deoptimizing nodes share the same deoptimization target. This is because, in typical compilation units, there are more deoptimizing nodes than side-effecting nodes and because, when fixing guards into the control-flow, they tend to cluster at the beginning of blocks.

5.3 Second Stage: Optimizing Side-effecting Nodes

As a result, in the second stage of the compilation, all deoptimizing nodes have a fixed position in the control-flow graph. They cannot be moved or added anymore. Also, deoptimizing nodes have been assigned their final targets in the form of FrameState nodes, which are not attached to side-effecting nodes anymore. The second stage can now be used to optimize side-effecting nodes. As explained in Section 4.3.3, as long as it is allowed by the JMM, side-effecting nodes can be reordered among each other since the locations of all deoptimizing nodes in the control-flow are known. They can still not be reordered with deoptimizing nodes. However, since guards have been transformed into explicit control-flow splits, many deoptimizing nodes are now in slow-paths that branch off the main execution path. A few optimizations that benefit from being able to reorder side-effecting nodes and can only be done in this second stage are described in the remainder of this section.
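As a small source-level sketch of this constraint (an illustration, not an example from the Graal sources): the Java Memory Model allows the two plain field stores below to be reordered among each other, but neither may move across the potentially deoptimizing array access between them once its position and FrameState are fixed.

    class ReorderingExample {
        int x;
        int y;

        void update(int[] a, int i) {
            x = 1;          // plain stores to distinct fields: under the JMM
            y = 2;          // these two may be reordered among each other...
            int v = a[i];   // ...but not across this bounds-checked access,
            x = v;          // whose FrameState fixes which effects the
        }                   // interpreter must observe on deoptimization
    }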

Effect Sinking This optimization tries to move side-effecting nodes to blocks that are executed less frequently. For example, side-effecting nodes in a loop can be moved to a place after the loop if they do not need to be observed during the execution of the loop.

    void minmax(Point p, double[] a) {
        for (int i = 0; i < a.length; ++i) {
            if (p.x < a[i]) { p.x = a[i]; }
            if (p.y > a[i]) { p.y = a[i]; }
        }
    }

Listing 5.1: Example where effect sinking can be beneficial

    void minmax(Point p, double[] a) {
        double x = p.x;
        double y = p.y;
        for (int i = 0; i < a.length; ++i) {
            if (x < a[i]) { x = a[i]; }
            if (y > a[i]) { y = a[i]; }
        }
        p.x = x;
        p.y = y;
    }

Listing 5.2: Example of the result of effect sinking

This is the case in Listing 5.1: after effect sinking (Listing 5.2), the stores to the fields x and y are done only once. Effect sinking moves nodes with observable side-effects and can thus only be done in the second compilation stage.

Advanced Loop Optimizations Loop tiling, interchange and fusion require re-ordering observable side-effects and thus can only be done in the second compilation stage. This type of optimization is very important for high-performance computing [4]. These optimizations require re-ordering side-effects because they often rely on applying the computations in a different order, either to use CPU caches more efficiently or to avoid the creation of intermediate results. The optimizations supported by the Graal compiler are described in more detail in Section 6.3.
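For illustration, loop interchange is shown below as a generic, hypothetical Java example (interchange itself is not claimed here to be implemented in Graal): swapping the loops changes the order in which the stores, which are observable side-effects, are applied, but makes the innermost accesses walk memory sequentially.

    class LoopInterchangeExample {
        // Column-major traversal of a (non-empty, rectangular) matrix:
        // strided accesses with poor cache behavior.
        static void fillColumnMajor(double[][] m) {
            for (int j = 0; j < m[0].length; j++) {
                for (int i = 0; i < m.length; i++) {
                    m[i][j] = i + j;   // store order: (0,0), (1,0), (2,0), ...
                }
            }
        }

        // After interchange: the same stores happen in row-major order,
        // so the innermost loop walks each row sequentially.
        static void fillRowMajor(double[][] m) {
            for (int i = 0; i < m.length; i++) {
                for (int j = 0; j < m[i].length; j++) {
                    m[i][j] = i + j;   // store order: (0,0), (0,1), (0,2), ...
                }
            }
        }
    }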


Deoptimization Grouping After FrameState assignment, because of the procedure described in Section 5.2.2, many deoptimizing nodes share the same FrameState. We can take advantage of this to reduce the amount of metadata that needs to be stored to implement deoptimization by merging the control-flow that leads to deoptimization with the same FrameState. We call this deoptimization grouping. Since deoptimization grouping requires that FrameStates have already been assigned to deoptimizing nodes, it can only be done during the second compilation stage. This optimization is described in more detail in Section 6.2.

Chapter 6

Case Studies

We have implemented our new concepts presented in the previous chapter in the Graal compiler. To show how this framework can support advanced speculative optimizations, we now describe two original optimizations that we implemented. The first one helps to reduce the run-time cost of speculation guards by moving them out of loops and is applied in the first stage. The second one is designed to reduce the memory footprint of the deoptimization metadata necessary for speculative optimizations and is applied in the second stage. We also briefly present a vectorization optimization that others have implemented for Graal, taking advantage of our concepts in the second compilation stage.

6.1 Speculative Guard Motion

Most programs spend the majority of their time in loops. Loops are thus a profitable target for optimizations. In this chapter, we discuss an optimization targeting guards in loops. Since it needs Guard nodes to still be floating, it is applied in the first stage of the compilation (see Section 5.1). Because the Graal IR uses floating nodes, in general, there is no need for a phase dedicated to loop-invariant code hoisting: as long as a floating node does not depend on loop-variant code, the scheduler will place this node above the loop. Speculative guard motion tries to move Guard nodes out of loops even if their anchor edge would normally keep them inside the loop. This is done by changing the anchor of the Guard node to the branch above the loop if the Guard node’s condition is loop-invariant. As illustrated in Figure 6.1, changing the Guard node’s anchor can mean that the Guard node is now evaluated in cases where it was not before. Before the optimization, the Guard node was only executed if both If nodes in the loop took their right branch. After the optimization, the Guard node can be scheduled outside the loop and is executed only once, independently of the branches taken by the If nodes inside the loop. This only works if the condition that is checked by the guard can be computed outside the loop, i.e., it does not depend on loop-variant values.


Note that as soon as a guard’s condition does not depend on loop-variant values, there cannot be any correctness issue about executing it outside the loop. Any check or precondition required for the evaluation of the guard’s condition is captured by dependencies in the IR: either they are loop-invariant and thus can also be hoisted, or they are loop-variant and then the condition itself cannot be considered loop-invariant. Also, it is never a correctness issue to deoptimize earlier than strictly necessary since, in the first stage, guards are not attached to a particular deoptimization target but will be assigned one during FrameState assignment that is correct at their final position.

This optimization is speculative because there could be a correlation between the Guard’s condition and the conditions under which it was originally anchored. In the case of Figure 6.1, there could be a correlation between c and a or b. Since we cannot prove that there is no correlation in general, we apply the optimization speculatively. The possible gains of not having to execute the Guard in the loop usually outweigh the performance risks as long as we are able to remedy undue speculations. Just like any other speculation, if it fails, the code will be deoptimized and this optimization will not be attempted again in the next compilation. Note that avoiding guard checks in the loop is not the only gain of this optimization: if the guard is hoisted out of the loop, other nodes that depend on it might also be hoisted. Also, if all guards can be scheduled outside of the loop and no more deoptimizing nodes remain in the loop, the loop will be amenable to more optimizations (such as vectorization) in the second stage.

We rely on previous optimizations having removed the most trivial cases where speculative guard motion should not be performed. For example, if the Guard node’s condition is the same as the condition of one of its dominating If nodes, the Guard node can either be eliminated1 (when the If node guarantees that the condition is true) or replaced by an unconditional Deoptimize node (when the If node guarantees that the condition is false). In these cases, speculative guard motion should not be performed because the guard can be optimized better at its current position in the loop.
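The speculation can be pictured with the following self-contained Java sketch (a hypothetical example; guard stands in for a Guard node and deoptimizes by throwing): the condition is loop-invariant, but the guard was anchored under the branch on cond[i], so hoisting it means it is evaluated even in runs where that branch is never taken.

    class GuardMotionExample {
        // Hypothetical stand-in for a Guard node: deoptimizes on failure.
        static void guard(boolean condition) {
            if (!condition) throw new RuntimeException("deoptimize");
        }

        // Before: the guard is only evaluated when cond[i] is true.
        static int before(boolean[] cond, int[] table, int limit) {
            int sum = 0;
            for (int i = 0; i < cond.length; i++) {
                if (cond[i]) {
                    guard(limit >= 0 && limit < table.length); // loop-invariant
                    sum += table[limit];
                }
            }
            return sum;
        }

        // After speculative guard motion: the guard runs once before the loop,
        // speculating that its condition is uncorrelated with cond[i].
        static int after(boolean[] cond, int[] table, int limit) {
            guard(limit >= 0 && limit < table.length);
            int sum = 0;
            for (int i = 0; i < cond.length; i++) {
                if (cond[i]) {
                    sum += table[limit];
                }
            }
            return sum;
        }
    }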

6.1.1 Rewriting Bounds Checks in Loops

In order to increase the number of loop-invariant conditions, we rewrite comparisons of loop variables in bounds checks: instead of checking whether a loop variable is within the bounds of an array, we check whether its minimum and maximum values are within these bounds. If those values are loop-invariant, then the whole check becomes loop-invariant and can be hoisted out of the loop.

1In this case, nodes that used to depend on the Guard node will be modified to depend on the Begin node in which the guarded condition is guaranteed to hold.



Figure 6.1: Speculative guard motion: the anchor of a Guard node with a loop-invariant condition is moved to the branch dominating the loop.

One of the most probable causes of failed speculations is that a hoisted guard fails although the loop would not have been entered. In particular, this is often the case for bounds checks. Using loop inversion2 would provide space before the loop where the Guard nodes can be hoisted to and are only executed if the loop is entered. However, the compiler might not provide loop inversion; for example, at the time we implemented speculative guard motion in the Graal compiler, loop inversion was not available. To work around this problem, when rewriting bounds checks on loop variables, we not only check whether the variables’ minimum and maximum values are within the bounds but also whether the loop is entered at all. An example of a loop where a bounds check can be rewritten is shown in Listing 6.1. Listing 6.2 shows the same loop after the bounds check has been rewritten. This is done only for counted loops, where we can compute the minimum and maximum values of induction variables. In the current implementation, this affects loops that start with an If comparing an induction variable to a loop-invariant value.

2Loop inversion transforms a loop of the form “while(c) {...}” into one of the form “if(c){do{...} while(c);}”

    for (int i = start; i < end; i += stride) {
        guard((unsigned) i < a.length);
        a[i] = ...;
    }

Listing 6.1: Example of a loop where a bounds check can be rewritten to apply speculative guard motion

    guard(start >= end ||                   // loop not entered
          ((unsigned) start < a.length &&   // check minimum value
           (unsigned) end <= a.length));    // check maximum value
    for (int i = start; i < end; i += stride) {
        a[i] = ...;
    }

Listing 6.2: Example of a loop where a bounds check has been rewritten and moved outside the loop.

One of the successors of this If node must exit the loop, and the induction variables must be incremented or decremented by a loop-invariant amount.

6.1.2 Speculation Log

If a speculative optimization fails at run time, the compiler should not attempt to apply it again in the same context. For most of the speculative optimizations in the Graal compiler, this is achieved by gathering new profiling information after deoptimization occurred. This profiling data contains information about the case that made the speculation fail. The next compilation will then be based on this profiling data and the unwarranted speculation will not be repeated. In the case of speculative guard motion, however, the profiling data does not contain any information that would help us make a better decision. Therefore we added another data structure which records when a speculation fails. This speculation log contains a list of those speculations that should not be attempted again because they have already caused deoptimization. When code is installed in the VM’s code cache, a speculation log can be associated with it. Deoptimization sites can optionally be associated with a “speculation” object (see the next section). If deoptimization occurs at such a site and the site has a speculation object, the failing speculation is added to the speculation log associated with the compiled code which triggered deoptimization.


The Graal compiler associates a speculation log with the root method of the compilation unit so that it can be retrieved and examined during the next compilation.
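A minimal sketch of such a log is shown below. The class and method names are hypothetical and only illustrate the contract described above; they are not the actual Graal API.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of a speculation log: records failed speculations
    // so that the next compilation of the same method can avoid them.
    class SpeculationLog {
        private final Set<Object> failedSpeculations = ConcurrentHashMap.newKeySet();

        // Called by the deoptimization handler when a site with an attached
        // speculation object triggers deoptimization.
        void speculationFailed(Object speculation) {
            failedSpeculations.add(speculation);
        }

        // Queried by the compiler before applying a speculative optimization.
        boolean maySpeculate(Object speculation) {
            return !failedSpeculations.contains(speculation);
        }
    }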

Speculation Objects

The compiler can use any kind of object to represent a speculation as long as it is later able to compare this object to check whether a similar speculation has already failed. This flexibility allows the compiler to use varying degrees of precision to describe the speculation. For example, we can describe the guard motion speculation by a stateless object that merely denotes the fact that speculative guard motion was used. In this case we only know that speculative guard motion has failed before for a method. If we want to be a bit more precise, we can identify the loop from which guards were moved by specifying the BCI of the loop header. We can also add more context by adding information about the inlining context in which this loop appears. To be even more precise, we can also include the reason of the deoptimization in the speculation object. A coarse-grained description of the speculation will ensure that only a few deoptimizations are needed to prevent any further speculation and will thus ensure faster warmup. A more fine-grained description will prevent a single failed speculation from inhibiting many other speculations that might have worked and will thus increase peak performance. However, in the worst case, many deoptimizations are needed before the compiler figures out that none of the speculations can be applied. This can have a negative impact on warmup time, so that it takes longer to achieve peak performance.
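Continuing the sketch above, a guard motion speculation object could be modeled as a small value class whose equals method defines the granularity. The fields shown here (root method, loop header BCI, deoptimization reason) are one possible choice for illustration, not the exact encoding used in Graal.

    import java.util.Objects;

    // Hypothetical speculation object: two instances denote the same
    // speculation iff they describe guard motion out of the same loop
    // for the same reason.
    final class GuardMotionSpeculation {
        final String rootMethod;   // inlining context could be added here
        final int loopHeaderBci;   // identifies the loop guards were hoisted from
        final String reason;       // deoptimization reason, for finer granularity

        GuardMotionSpeculation(String rootMethod, int loopHeaderBci, String reason) {
            this.rootMethod = rootMethod;
            this.loopHeaderBci = loopHeaderBci;
            this.reason = reason;
        }

        @Override public boolean equals(Object o) {
            return o instanceof GuardMotionSpeculation
                    && ((GuardMotionSpeculation) o).rootMethod.equals(rootMethod)
                    && ((GuardMotionSpeculation) o).loopHeaderBci == loopHeaderBci
                    && ((GuardMotionSpeculation) o).reason.equals(reason);
        }

        @Override public int hashCode() {
            return Objects.hash(rootMethod, loopHeaderBci, reason);
        }
    }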

Truffle

Graal achieves high performance for Truffle interpreters by applying partial evaluation to the interpreter method given a specialized AST. From the Graal compiler’s point of view, Truffle ASTs are just like any other Java objects, so this partial evaluation is done by compiling the execute method of the root node type of the AST and replacing the parameter that corresponds to the AST by a constant in the Graal IR. This means that, in practice, all Truffle compilations have the same Java method as compilation root. Also, the loop constructs of the language implemented using Truffle will be implemented using Java loops in the execute method of the corresponding AST node types. This means that all the loops in the language implemented using Truffle will map to only a few Java loops. As a result, as soon as a guard motion speculation fails for one loop of the Truffle language, it would prevent guard motion for all of the loops. To avoid this problem, when doing a Truffle compilation, Graal does not use a speculation log associated with the Java compilation root but rather a speculation log associated with the Truffle AST.


6.1.3 Policy

In order to further improve the profitability of speculative guard motion, it is only performed under certain conditions. First, the relative frequencies of the block referenced by the original guard anchor and of the block before the loop are taken into account. If, according to profiling information, the block of the original anchor executes less often than the block before the loop, the optimization will not be performed. This can happen, for example, for loops that are rarely entered. Also, if the guard is a bounds check that needs to be rewritten to a more complex condition, the extra cost of executing the rewritten guard is taken into account when comparing the relative frequencies. The extra cost is set to make a rewritten bounds check cost three times more than the original one, as the rewritten bounds check might contain up to three checks instead of one. Finally, the optimization is only performed if the speculation log does not indicate that the same optimization was already attempted and failed.
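As a worked example of this policy (with invented numbers): if profiling reports that the block holding the original anchor runs with relative frequency 12 per loop entry while the block before the loop has frequency 1, a plain guard has a cost factor of 1/12 ≈ 0.08 and is hoisted. A rewritten bounds check costs 3 × 1/12 = 0.25, which is still below 1, so it is hoisted as well. If the anchor’s block only ran with frequency 2, the rewritten check would score 3 × 1/2 = 1.5 > 1 and the guard would stay in the loop.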

6.1.4 Processing Order

Hoisting a guard out of a loop might cause other dependent nodes to be hoisted as well. These dependent nodes could be the condition of some other guard, which means that it might only be possible to hoist some guard if some other guard has been hoisted before. We can still apply all the possible speculative guard motions in one pass. This is done by integrating speculative guard motion into a partial scheduling pass. The scheduling algorithm is presented as Algorithm 6.1. It tries to find the earliest block into which a node can be placed. It works recursively by first finding the earliest schedule of the node’s inputs. Recursion stops at fixed nodes since they are already part of a block. When the earliest block for a node has been found, it is memorized in order to reduce complexity. This algorithm results in a partial schedule since it only finds a block for the necessary nodes and their transitive inputs.

The scheduling is modified for Guard nodes to apply speculative guard motion. This can be seen in Function 6.2. First, the guard’s condition is checked to see if it can be rewritten as described in Section 6.1.1. Then, the loop nest in which the guard was originally found is explored, from the innermost loop to the outermost one. At each step, the speculation log is checked to see if a previous attempt to hoist such a guard out of a specific loop has already been made. The policy from Section 6.1.3 is also used to see if speculative guard motion should be applied. The search stops at the first loop in which the condition (rewritten if possible) is not loop-invariant anymore. At the end, the guard’s anchor and condition edges are rewired as necessary to hoist it above the best loop selected by the policy, and the guard is scheduled in the block above this loop. If no such loop was found, the guard is scheduled like any other node and remains inside the loop.


Data: An IR graph
Result: The graph is modified in-place to apply speculative guard motion

    initialize earliestCache to an empty map
    guards ← {n ∈ graph : n is a Guard}
    foreach g ∈ guards do
        earliestBlock(g)
    end

    Function earliestBlock(n : Node)
        if earliestCache[n] is defined then
            return earliestCache[n]
        end
        if n is a Fixed then
            return n.block
        end
        if n is a Guard then
            earliest ← computeEarliestBlockForGuard(n)
        else
            earliest ← computeEarliestBlock(n)
        end
        earliestCache[n] ← earliest
        return earliest
    end

    Function computeEarliestBlock(n : Node)
        dominators ← ∅
        earliest ← null
        foreach i ∈ n.inputs do
            block ← earliestBlock(i)
            if block ∉ dominators then
                dominators ← dominators ∪ block.dominators ∪ {block}
                earliest ← block
            end
        end
        if earliest = null then
            return graph.start.block
        end
        return earliest
    end

Algorithm 6.1: Speculative guard motion


    Function computeEarliestBlockForGuard(g : Guard)
        rewrittenCondition ← tryRewriteCompare(g.condition)
        conditionBlock ← earliestBlock(g.condition)
        rewrittenConditionBlock ← earliestBlock(rewrittenCondition)
        anchorBlock ← earliestBlock(g.anchor)
        minCostFactor ← 1.0
        optimizedAnchorBlock ← null
        l ← anchorBlock.loop
        while l ≠ null ∧ rewrittenConditionBlock ∉ l.blocks ∧ canSpeculate(l, g) do
            needsRewrite ← conditionBlock ∈ l.blocks
            rewriteCostFactor ← 3 if needsRewrite, 1 otherwise
            costFactor ← rewriteCostFactor × l.dominator.probability / anchorBlock.probability
            if costFactor < minCostFactor then
                minCostFactor ← costFactor
                optimizedAnchorBlock ← l.dominator
                optimizedAnchorNeedsRewrite ← needsRewrite
            end
            l ← l.parent
        end
        if optimizedAnchorBlock ≠ null then
            g.anchor ← optimizedAnchorBlock.block
            if optimizedAnchorNeedsRewrite then
                g.condition ← rewrittenCondition
            end
            return optimizedAnchorBlock
        else
            return computeEarliestBlock(g)
        end
    end

Function 6.2: computeEarliestBlockForGuard(g)


6.2 Deoptimization Grouping

After FrameState assignment, many deoptimizing nodes will share the same FrameState (see Section 5.2.2). While this means that they deoptimize to the same frame, the low-level deoptimization metadata that needs to be stored is not necessarily the same. Indeed, the physical locations of the values may be different. For example, as illustrated in Figure 6.2, two Deoptimize nodes use the same FrameState. At both Deoptimize nodes, the same values need to be restored in the interpreter for the locals and the expression stack but they are found in different registers or stack slots. This is important because the deoptimization metadata that needs to be produced by the VM is low-level data that contains references to the physical machine state. As a result, two different frame descriptions will need to be stored for these two deoptimization sites.

    ScopeDesc(offset=156): Test::foo@5 (line 14)
      Locals
        - l0: stack[0]
        - l1: stack[4]
      Expression stack
        - @0: reg rax

    ScopeDesc(offset=173): Test::foo@5 (line 14)
      Locals
        - l0: reg rsi
        - l1: reg rdx
      Expression stack
        - @0: reg rax

Figure 6.2: Low-level representation of the same FrameState used at two different positions. Note that the locals are in different physical locations.

The idea of deoptimization grouping is to combine all Deoptimize nodes that share the same FrameState into a single Deoptimize node, which will then result in a single deoptimization site in the generated code that needs only a single low-level translation of this FrameState.

As a result, we expect a reduction of the size of the deoptimization metadata stored along with the generated code. This is achieved in the following way: for every group of Deoptimize nodes that share the same FrameState, a Merge node is created, followed by a single Deoptimize node. All branches that previously led to Deoptimize nodes of this group now flow into this Merge node (see Figure 6.3).


Figure 6.3: Merging deoptimization control-flow (high-level IR)

Multiple deoptimizations sharing the same FrameState might want to pass different actions, reasons and speculation objects to the runtime call that performs the deoptimization. These values are used to communicate to the runtime what to do with the current code and why deoptimization happened, so that the same speculative optimization is not attempted again once it has failed. To still be able to group deoptimizations with different actions, reasons or speculation objects, Phi nodes are created to merge the data flow of those values. This transformation only covers cases where there is explicit control-flow leading to a Deoptimize node in the IR. However, there are other types of deoptimizing nodes such as Invoke nodes or SafePoint nodes for which the control-flow to a potential deoptimization site is implicit. Experiments showed that 38 % of all FrameState nodes are associated with Deoptimize nodes that are reached via an explicit control-flow edge and are thus amenable to this optimization.
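At the source level, the effect of grouping, including the Phi node that merges differing values, can be pictured by the following hypothetical sketch, where deoptimize again stands in for the runtime call:

    class GroupingExample {
        static RuntimeException deoptimize(String reason) {
            return new RuntimeException("deoptimize: " + reason);
        }

        static int get(int[] a, int i, int j) {
            String reason = null;                 // Phi merging the reasons
            if (i >= a.length) {
                reason = "BoundsCheck(i)";
            } else if (j >= a.length) {
                reason = "BoundsCheck(j)";
            }
            if (reason != null) {
                throw deoptimize(reason);         // single site, one FrameState
            }
            return a[i] + a[j];
        }
    }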

6.2.1 Relation with Deoptimization Metadata Compression

While deoptimization grouping is purely a compiler optimization, the underlying runtime can also optimize the storage of deoptimization metadata. In particular, since we run the Graal compiler in the HotSpot VM, we need to know what it does in this area to understand how it can interact with deoptimization grouping.

The HotSpot VM can compress low-level deoptimization metadata while storing it. This is normally done only when some debugging features of the VM are used and thus more deoptimization metadata is produced by the compiler. Compression works by finding common byte sequences in the serialized deoptimization metadata and sharing them. In HotSpot, this compression is called “Debug Information Sharing” [65]. It is done for specific chunks of serialized metadata, namely the list of all values of the expression stack, the list of all values of the locals, the list of all locked monitors, and full frames. This ad-hoc compression is rather effective. In particular, since it separates the expression stack from the locals, it is able to share information at a smaller granularity than the full stack of frames. However, it cannot find common sequences when the values have moved to different physical locations between two usages of the same FrameState. Also, it has quadratic complexity since every time a new chunk is recorded, it needs to be compared with the previous ones. In order to limit the time taken by this search, HotSpot only looks at the 50 previous chunks. In contrast, since Graal maintains skip-lists for some node types and def-use edges [18], deoptimization grouping has linear complexity in the number of Deoptimize nodes.

    double[] b = new double[a.length];
    for (int i = 0; i < a.length; ++i) {
        b[i] = a[i] * 2;
    }
    double[] c = new double[a.length];
    for (int i = 0; i < a.length; ++i) {
        c[i] = b[i] + 1;
    }

Listing 6.3: Example of loops that can be fused

6.3 Vectorization

Using our two-stage optimization framework, others have successfully implemented vectorization optimizations in the Graal compiler. The second stage allows the compiler to optimize side-effecting nodes; this is important for many loop vectorization optimizations, which can change the order of the execution of side-effects. The current vectorization optimizations include support for simple “map” operations. A map operation on a vector creates a new vector by applying the same function to all elements of the input vector. For example, the code in Listing 6.4 is a mapping operation that applies the function x ↦ 2 × x + 1.

    double[] c = new double[a.length];
    for (int i = 0; i < a.length; ++i) {
        c[i] = a[i] * 2 + 1;
    }

Listing 6.4: Result of fusing the loops from Listing 6.3

Current optimizations also include loop fusion. Loop fusion allows the compiler to avoid the creation of some temporary results by merging loops together. For example, the loops of Listing 6.3 can be fused together and the resulting code is shown in Listing 6.4. This is done by using Graal IR that models the vector operations directly rather than representing the loops that perform those operations. In the case of the previous example, this would mean the transformation of Graal IR illustrated in Figure 6.4.


Figure 6.4: IR transformation for loop fusion

With this example, we can see why vectorization optimizations can only be done while deoptimizing nodes are fixed: if a deoptimizing node could still be scheduled in the fused loop, it could not have a valid FrameState because, in the original program, there is no location where results have been partially written to the final array (c) while the intermediate array (b) does not exist. This vectorization implementation does not only take advantage of the second stage, it also uses speculative alias analysis. During the first stage it detects patterns of memory dependencies in loops that could later prevent vectorization but that disappear when it is known that two arrays are distinct.

When such a pattern is detected, a guard node that checks at run time that the arrays are distinct is inserted in the IR and is associated with another node that represents the possible memory graph optimization if the guard holds. During vectorization, if this speculative memory edge permits vectorization while the conservative one does not, the speculative one is used. If some speculative memory edges end up unused, the associated guards are removed from the IR to avoid unnecessary costs.
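A source-level picture of this speculation is sketched below (hypothetical and simplified to reference equality of the array parameters):

    class AliasSpeculationExample {
        static RuntimeException deoptimize() {
            return new RuntimeException("deoptimize: aliasing speculation failed");
        }

        // Fusing the two passes is only valid if the arrays cannot alias;
        // the inserted guard checks distinctness at run time.
        static void scaleThenOffset(double[] a, double[] b, double[] c) {
            if (a == b || a == c || b == c) {
                throw deoptimize();      // guard: arrays must be distinct
            }
            for (int i = 0; i < a.length; ++i) {
                b[i] = a[i] * 2;         // fused body, single pass
                c[i] = b[i] + 1;
            }
        }
    }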


Chapter 7

Evaluation

In order to measure the impact of our optimizations, we evaluated them using benchmarks targeting the JVM as well as JavaScript benchmarks running on a JavaScript implementation under Truffle. After establishing a baseline, we looked at the impact of the first and the second compilation stage on peak performance. We also reviewed in detail the impact of speculative guard motion. Finally, we evaluated the impact of deoptimization grouping on the memory footprint.

7.1 Methodology

7.1.1 Benchmarks

JVM Benchmarks Since Graal is a Java JIT compiler, we used a set of standard Java benchmarks to evaluate our work.

SPECjvm2008 This is an industry-standard benchmark suite [70] for Java VMs. It contains several workloads such as typical single-server or workstation applications or libraries focusing on specific aspects such as XML processing. These benchmarks try to avoid relying on disk I/O and never use network I/O. The scores of these benchmarks are given in operations per minute. A higher score means faster execution. We ran the benchmarks for 2 minutes of warmup and 4 minutes of measurement. Each benchmark in the suite was run in its own VM. We excluded the “compiler.sunflow” benchmark because it does not run on JDK 8.

DaCapo This is a commonly used benchmark suite [6] designed to represent real-world applications. We used version 9.12 “Bach” of the suite. Each benchmark was executed in its own VM and consisted of 50 iterations. The result of each benchmark is the time in milliseconds to run the final iteration. A lower score means faster execution. We excluded the “eclipse” benchmark because it does not run on JDK 8.


We also did not run the “tradesoap” and “tradebeans” benchmarks because their network usage was incompatible with the cluster used for our measurements.

Scala-DaCapo This benchmark suite [62, 61] was inspired by DaCapo but it consists of programs written in the Scala programming language. Scala code usually compiles to Java bytecode and executes directly on the JVM. Similar to DaCapo, we ran 50 iterations per benchmark and the result is the wall time of the last iteration in milliseconds.

We ran all those benchmarks with large heaps (64GB) in order to minimize GC pressure and thus to focus on the performance of the compiled code.

JavaScript Benchmarks Languages implemented under Truffle and compiled with the Graal compiler heavily rely on speculative optimizations. In order to benchmark Truffle-based language implementations, we used TruffleJS [81, 36], a JavaScript implementation under Truffle. We used the following JavaScript benchmarks.

Octane This benchmark suite [26] was developed by Google to test their V8 JavaScript engine. We used benchmarks from version 2.0 of the suite. The benchmark results of this suite are expressed as “scores”. A benchmark’s score is inversely proportional to the time it takes to run a single iteration of the benchmark. A higher score means faster execution. We ran each benchmark in its own VM. We excluded the “Code loading” and “zlib” benchmarks that focus on JavaScript code parsing and compilation since we are more interested in code execution. Similarly, we did not run “Splay”, which focuses on garbage collection. We do not present results of “GB Emulator” because measurements for this benchmark were too unstable and the very high variance of the results did not produce any statistically significant results. This was due to a bug in the version of TruffleJS used for the benchmarks.

Kraken This benchmark suite [53] developed by Mozilla includes real-world JavaScript applications and libraries. We included some of the benchmarks from this suite that focus more on small computational kernels. We adapted them for use with the Octane harness.

NBody This benchmark [29] is another small computational kernel. We adapted it for use with the Octane harness.

For all JavaScript benchmarks, warmup iterations are run first and then multiple measurement iterations are run to compute the score. We ran all JVM and TruffleJS benchmarks 10 times and present averages over those 10 runs.

In the benchmarks where we compare two configurations, we did an independent two-sample t-test with p = 0.01 to check whether the difference is significant. The benchmarks were run on computers equipped with two Intel® Xeon® E5-2699 v3 CPUs @ 2.30GHz and 384GB RAM. To achieve more stable results, Intel® Turbo Boost was disabled. For the various benchmarks, manual inspection of the warmup curves was used to ensure that the benchmarks had reached their peak performance.

7.1.2 Baseline

Since our work is fundamental to the structure of the Graal compiler, there is no version of Graal that does not contain what we try to evaluate. Comparing Graal to a completely different compiler would not give a lot of insight because such a compiler would have a completely different design and a different set of optimizations. To evaluate our work, we rather take the full Graal compiler as a baseline, with all optimizations enabled. We later compare this baseline to configurations that disable some optimizations or limit the effects of certain aspects of the design. For reference, we also compare this baseline against the other compilers of the HotSpot VM (see Section 7.4). To establish this baseline, we ran several benchmarks with the Graal compiler and recorded benchmark scores and compilation times. This baseline is shown in Tables 7.1 and 7.2. In this configuration, both the first stage and the second stage of the compiler are enabled, and speculative guard motion as well as vectorization are enabled.

7.2 Compilation Stages

7.2.1 Effects of the First Stage

To evaluate the effects of the first compilation stage, we compare the baseline with a compiler configuration where fixed guards are used instead of floating ones. In this configuration, optimizations that need to insert or move guards cannot be applied, while other optimizations are still applied. Speculations are still used, but they remain at fixed positions based on decisions taken during bytecode parsing. We recorded benchmark scores and compilation times for this configuration.

Performance

We first look at the performance of the compiled code. The results for the SPECjvm2008 benchmark suite are presented in Figure 7.1. We can see that the performance of many computationally intensive benchmarks such as “crypto”, “mpegaudio”, “lu” and “sparse” is better in the baseline, which contains floating guards (first stage). Speculative guard motion is part of the first stage and needs to move guards, so it is disabled in this configuration.


Benchmark                      Score                 Compilation time/s

SPECjvm2008 (ops/m)
  compiler.compiler            5009.5 ±   92.55      365.4 ± 31.64
  compress                     1876.8 ±    9.60       51.1 ±  1.50
  crypto.aes                    630.7 ±    1.86       72.2 ±  5.70
  crypto.rsa                   2927.3 ±   24.41       58.2 ±  2.00
  crypto.signverify            2767.8 ±    7.16       69.4 ±  3.66
  derby                        2829.1 ±   76.18      128.1 ±  4.08
  mpegaudio                    1280.6 ±    8.37      127.0 ± 10.49
  scimark.fft.large             303.0 ±    2.13       51.4 ±  2.66
  scimark.fft.small            3044.1 ±   36.37       52.6 ±  2.21
  scimark.lu.large               39.0 ±    3.98       47.9 ±  2.15
  scimark.lu.small             3925.1 ±   46.52       61.0 ±  4.26
  scimark.monte_carlo           868.9 ±    9.77      141.8 ± 28.53
  scimark.sor.large             256.5 ±    2.34       44.6 ±  1.84
  scimark.sor.small            1819.0 ±   10.06       48.1 ±  2.67
  scimark.sparse.large          149.2 ±    2.44       44.9 ±  1.65
  scimark.sparse.small         2365.7 ±    3.52       49.5 ±  1.78
  serial                       1318.4 ±   13.76       73.1 ±  2.24
  sunflow                       778.5 ±   13.37      110.4 ±  4.80
  xml.transform                3197.1 ±   32.81      145.9 ±  4.31
  xml.validation               3809.4 ±  108.26       97.8 ±  3.16

Kraken
  ai-astar                      153.1 ±    3.21        9.4 ±  0.60
  audio-dft                     180.2 ±    1.34        4.9 ±  0.47
  audio-oscillator             1130.9 ±    1.45        9.2 ±  1.05
  imaging-desaturate          61 817.9 ±  326.94       1.6 ±  0.11
  imaging-gaussian-blur          42.4 ±    1.92        8.3 ±  1.00

NBody
  nbody                       31 849.5 ±   97.72       5.3 ±  0.61

Octane
  box2d                       39 584.1 ±  251.81      46.9 ±  1.89
  crypto                      14 488.2 ±  211.72      37.9 ±  2.93
  deltablue                   33 694.5 ±  775.27      21.7 ±  0.35
  earley-boyer                20 612.3 ±  439.75      92.4 ±  1.31
  mandreel                    13 501.4 ±   71.48      56.2 ±  3.65
  navier-stokes               21 801.1 ±   47.20      10.2 ±  0.64
  pdfjs                       18 283.2 ±  243.61      88.8 ±  1.84
  raytrace                    50 470.8 ± 1305.08      18.4 ±  0.71
  richards                    22 509.3 ±   99.26       8.0 ±  0.47

Table 7.1: Performance baseline for SPECjvm2008, Kraken and Octane. Results are given ± the width of the 99 % confidence interval.


Benchmark              Time/ms               Compilation time/s

DaCapo
  avrora                3299.7 ± 114.34       42.5 ±  1.17
  batik                 1277.8 ±   9.98       73.6 ±  1.12
  fop                    216.4 ±   3.81       74.1 ±  1.72
  h2                    5835.0 ± 164.47      116.9 ±  6.01
  jython                1712.6 ±  40.03      292.2 ± 10.59
  luindex                644.3 ±  52.62       45.0 ±  1.09
  lusearch              1178.3 ± 138.87      122.2 ±  6.13
  pmd                   2018.5 ±  31.77      143.4 ±  4.81
  sunflow                441.8 ±  26.04      117.5 ±  6.44
  tomcat                 990.8 ±  16.24      212.5 ±  2.97
  xalan                  382.3 ±  10.89      157.2 ±  8.94

Scala-DaCapo
  actors                5704.2 ±  92.20       76.2 ±  2.09
  apparat               9800.2 ± 117.37      104.9 ±  4.36
  factorie            15 262.8 ± 444.41       43.3 ±  1.70
  kiama                  322.8 ±  12.90       68.1 ±  2.55
  scalac                1296.4 ±  30.30      272.2 ± 12.71
  scaladoc              1388.9 ±  27.82      193.3 ±  3.99
  scalap                 136.0 ±   5.57       50.0 ±  1.34
  scalariform            473.0 ±  16.33       88.0 ±  3.87
  scalatest             1029.4 ±  43.61      191.3 ± 60.34
  scalaxb                389.5 ±  64.20       64.8 ±  1.80
  specs                 1622.6 ±  17.49      100.0 ±  4.02
  tmt                   6574.3 ±  48.08       70.3 ±  6.40

Table 7.2: Performance baseline for DaCapo and Scala-DaCapo. Results are given ± the width of the 99 % confidence interval.

Further analysis of the effects of speculative guard motion alone in Section 7.3 shows that the entire performance difference for “lu” and “sparse” can be explained by speculative guard motion. While this optimization also has an impact on “crypto” and “mpegaudio”, it does not entirely explain the speedups there. For these benchmarks, we believe that the speedups come from the better scheduling possibilities due to floating guards. These results show that speculative guard motion is not the only important optimization in the first stage. In the case of “monte_carlo” and “fft”, however, we see a slight performance regression introduced by the floating guards of the first stage. This regression is due to sub-optimal scheduling, which has more freedom when guards are floating. While overall this freedom seems to bring benefits in other benchmarks, poor scheduling in the innermost loops of those two benchmarks costs up to 3.7 % of performance.



Figure 7.1: SPECjvm2008 performance results without the first stage. Normalized using the baseline from Tables 7.1 and 7.2. Higher is better. Error bars indicate the 99 % confidence interval.


Figure 7.2: DaCapo and Scala-DaCapo performance results without the first stage. Normalized using the baseline from Tables 7.1 and 7.2. Higher is better. Error bars indicate the 99 % confidence interval.


For the DaCapo and Scala-DaCapo suites, we make a similar comparison. Note that while these suites’ results are times in milliseconds, in order to harmonize the comparison with the other suites, we compare the inverse of those times. As a result, all the graphs in Figure 7.2 are to be interpreted as “higher is better”: a higher number means that more operations run in the same amount of time. For these suites, we see less impact of the first stage, with only a few benchmarks showing statistically significant different scores: “batik”, “apparat” and “tmt” run from 2 % to 6 % faster when the first stage is used.


Figure 7.3: Kraken and Octane performance results without the first stage. Normalized using the baseline from Table 7.1. Higher is better. Error bars indicate the 99 % confidence interval.

Finally, the results for the JavaScript benchmarks are shown in Figure 7.3. The effects of the first stage are very noticeable for these benchmarks. Almost all benchmarks show a strong improvement (up to +76 %) when enabling the first stage. Since the TruffleJS implementation relies heavily on speculative optimizations, the flexibility of floating guards in the first stage brings clear benefits.

Compilation Time Regarding compilation time for the Java benchmarks, the results are presented in Figure 7.4. Here, since we compare times, the graphs are to be interpreted as “lower is better”: a lower value means less time is spent in the compilation process. The impact of the first stage is generally low, but enabling it can add up to 17 % compilation time in the case of the “lu” benchmark.



Figure 7.4: SPECjvm2008, DaCapo and Scala-DaCapo compilation time results without the first stage. Normalized using the baseline from Tables 7.1 and 7.2. Lower is better. Error bars indicate the 99 % confidence interval.

For most benchmarks, the presence of the first stage has no impact on compilation time: of these 23 benchmarks, only 6 show a statistically significant difference. The biggest impact is on the Scala-DaCapo suite, where “apparat”, “factorie” and “scalac” have up to 7.5 % longer compilation times when the first stage is enabled.


Figure 7.5: Kraken and Octane compilation time results without the first stage. Normalized using the baseline from Table 7.1. Lower is better. Error bars indicate the 99 % confidence interval.

For the JavaScript benchmarks, the results are shown in Figure 7.5. For these benchmarks, the impact on compilation time is bigger than for the JVM benchmarks, with the compiler taking up to 130 % more time when the first stage is enabled. Fortunately, most benchmarks show a smaller compilation overhead. Since these are also the benchmarks that derive the highest benefit from the first stage, the additional compilation time seems well invested. Overall, these results show that the first stage allows the compiler to generate code with better peak performance. It helps both computationally intensive code as well as code that relies heavily on speculative optimizations.

7.2.2 Effects of the Second Stage

The second compilation stage allows the reordering of side-effecting nodes. Currently in the Graal compiler, only vectorization uses this capability. To evaluate the impact of the second stage we disable vectorization and compare this configuration to the baseline. We compare both benchmark scores and compilation times.



Figure 7.6: SPECjvm2008 performance results without vectorization. Normalized using the baseline from Table 7.1. Higher is better. Error bars indicate the 99 % confidence interval.


Figure 7.7: SPECjvm2008 compilation time results without vectorization. Normalized using the baseline from Table 7.1. Lower is better. Error bars indicate the 99 % confidence interval.


The results for SPECjvm2008 are in Figure 7.6. We can see that Graal’s vectorization only has a noticeable effect on the “lu” benchmark. When vectorization is enabled, this benchmark is 34 % faster. However, vectorization does not come for free: it can cost up to 31 % more compilation time (see Figure 7.7). This effect is limited, though: for most benchmarks, there was no statistically significant difference in compilation time. Overall, while we can see that it is possible to get performance improvements from optimizations such as vectorization, more work is needed to gain performance from the second stage.

7.3 Speculative Guard Motion

Amongst the optimizations that run in the first stage, we presented speculative guard motion. To analyze its impact, we disabled it and compared the results to the baseline. We first discuss the scores and the compilation times.


Figure 7.8: SPECjvm2008 performance results without speculative guard motion. Normalized using the baseline from Table 7.1. Higher is better. Error bars indicate the 99 % confidence interval.

The results for SPECjvm2008 are found in Figure 7.8. When comparing these results with Figure 7.1, we can see that all the benchmarks that benefit from speculative guard motion were also affected by the first stage. The speedups obtained by speculative guard motion are always at most as big as those obtained with the first compilation stage. This makes sense because speculative guard motion is part of the first compilation stage.

In particular, disabling speculative guard motion has the same impact as disabling the complete first compilation stage for the “lu.small” and “sparse.small” benchmarks. This means that for these benchmarks, speculative guard motion is the only important optimization in the first stage. For other benchmarks such as “rsa”, “signverify” or “mpegaudio”, we can see that while speculative guard motion contributes to the gains of the first stage, it is not the only optimization that is beneficial: while some performance is lost when disabling speculative guard motion alone, performance is even lower when disabling the whole first compilation stage. For “lu”, we also tried to disable both speculative guard motion and vectorization. In this case we get the same results as when we only disable speculative guard motion. The reason is that without speculative guard motion, vectorization cannot trigger because deoptimizing nodes remain in the loops and thus the loops cannot be vectorized. In Figure 7.9 we can see that for most of the SPECjvm2008 benchmarks, speculative guard motion does not increase compilation time significantly. However, we can see an increase of about 15 % for “mpegaudio” and “lu”.


Figure 7.9: SPECjvm2008 compilation time results without speculative guard motion. Normalized using the baseline from Table 7.1. Lower is better. Error bars indicate the 99 % confidence interval.

The results for Octane and Kraken on TruffleJS are shown in Figure 7.10. Speculative guard motion is effective for many benchmarks. When enabled, we observe up to 34 % improvement in performance. As for SPECjvm2008, in some of the benchmarks, speculative guard motion accounts for most of the performance difference observed when enabling the full first stage. We see this behavior for both “imaging” benchmarks from Kraken as well as for “NBody”.



Figure 7.10: Octane and Kraken performance results without speculative guard motion. Normalized using the baseline from Table 7.1. Higher is better. Error bars indicate the 99 % confidence interval.

The impact on compilation time can be seen in Figure 7.11. For those benchmarks, it is greater than for SPECjvm2008. This can be explained by the fact that IR graphs produced by partial evaluation in Truffle are usually larger and contain many guards that need to be processed by speculative guard motion. These results show that speculative guard motion is an important optimization for cases where a lot of guards are used. It can also be important to enable other optimizations such as vectorization.

For a detailed look, we also instrumented each guard so that we can obtain the dynamic count of guards executed. We captured this data both for our baseline with all optimizations enabled and for executions with speculative guard motion disabled. We then look at the number of guards executed for a fixed amount of time. The reason we use this metric rather than the raw number of guards executed per run is that the benchmark harness of SPECjvm2008 runs for a fixed amount of time rather than running a fixed amount of work. The results in Table 7.3 show that enabling speculative guard motion causes a significant decrease in the number of guards executed per second. This means that, for a fixed amount of time, when speculative guard motion is enabled, fewer guards are executed and CPU cycles can be used for other things. However, this does not always directly translate into performance improvements.


                               Million guards/s
Benchmark                 Without guard motion   With guard motion   Difference

SPECjvm2008
  compiler.compiler          203.4 ±  3.4          168.2 ±  2.2       −17.3 %
  compress                   261.9 ±  1.0          279.2 ±  1.8        +6.6 %
  crypto.aes                 330.9 ±  0.1          321.9 ±  2.1        −2.7 %
  crypto.rsa                 362.6 ±  3.2          132.9 ±  3.3       −63.3 %
  crypto.signverify          260.4 ±  1.2           43.5 ±  0.2       −83.3 %
  derby                      237.4 ±  4.4          218.2 ±  3.2        −8.1 %
  mpegaudio                  291.6 ±  1.8          232.4 ±  1.7       −20.3 %
  scimark.fft.large          178.8 ±  6.2          169.5 ±  5.7        −5.2 %
  scimark.fft.small          253.2 ± 16.5          226.6 ± 10.8       −10.5 %
  scimark.lu.large           384.4 ±  0.0           10.7 ±  0.2       −97.2 %
  scimark.lu.small           350.3 ±  2.6           65.8 ±  0.8       −81.2 %
  scimark.monte_carlo        174.8 ±  4.0          175.0 ±  3.0        +0.2 %
  scimark.sor.large          377.0 ±  0.0            2.2 ±  0.0       −99.4 %
  scimark.sor.small          376.3 ±  0.0            5.1 ±  0.0       −98.6 %
  scimark.sparse.large       377.8 ±  0.0          384.8 ±  0.0        +1.8 %
  scimark.sparse.small       362.5 ±  0.0          380.2 ±  0.0        +4.9 %
  serial                     246.5 ±  1.7          188.0 ±  2.1       −23.7 %
  sunflow                    657.9 ±  2.5          657.4 ± 14.6        −0.1 %
  xml.transform              315.8 ±  2.1          256.3 ±  2.4       −18.8 %
  xml.validation             334.3 ±  4.5          286.2 ±  6.5       −14.4 %

Kraken
  ai-astar                   221.1 ±  7.2          142.1 ±  4.8       −35.7 %
  audio-dft                   69.4 ±  1.0           39.5 ±  0.3       −43.2 %
  audio-oscillator           159.1 ±  1.0          104.9 ±  0.5       −34.1 %
  imaging-desaturate         419.1 ±  2.7          152.5 ±  2.3       −63.6 %
  imaging-gaussian-blur     1021.0 ±  8.2          876.3 ± 29.8       −14.2 %

NBody
  nbody                       44.4 ±  1.0           36.0 ±  1.6       −19.1 %

Octane
  box2d                       40.3 ±  0.4           37.0 ±  0.4        −8.1 %
  crypto                     157.1 ±  4.9           68.7 ±  3.8       −56.3 %
  deltablue                  169.9 ±  3.8          148.8 ±  3.1       −12.4 %
  earley-boyer                27.6 ±  0.2           27.5 ±  0.5        −0.4 %
  mandreel                   180.0 ±  2.5          140.5 ±  1.2       −21.9 %
  navier-stokes              463.9 ±  3.2          159.2 ±  0.6       −65.7 %
  pdfjs                       14.0 ±  0.2           13.0 ±  0.1        −7.5 %
  raytrace                    75.7 ±  1.4           76.5 ±  1.5        +1.0 %
  richards                    98.5 ±  1.0           97.2 ±  2.5        −1.3 %

Table 7.3: Number of guards executed per second for SPECjvm2008, Kraken and Octane. Results are given ± the width of the 99 % confidence interval.



Figure 7.11: Kraken and Octane compilation time results without speculative guard motion. Normalized using the baseline from Table 7.1. Lower is better. Error bars indicate the 99 % confidence interval.

In certain cases, such as “sor”, while the number of guards executed per second dropped dramatically, there was no performance impact. This means that the execution of guards was not the bottleneck there. We can observe this even better on the “lu” benchmark. We already know that speculative guard motion improves the performance significantly for “lu.small”, but there is no such effect for “lu.large”. However, we can confirm that speculative guard motion has an effect on the number of guards executed for both sizes of “lu”. This discrepancy can be explained by the fact that the large workload is cache-bound and execution time is dominated by cache misses. In this case, the presence of guards even in the innermost loops does not really influence the performance. We also checked that the number of failed speculations due to speculative guard motion stays low. In all benchmarks, deoptimizations due to failed speculations accounted for no more than a few tens of events. On average, deoptimization was due to speculative guard motion in only 3.2 % of the cases. This helps us confirm that speculative guard motion is not too eager in its speculations.

7.4 Comparison to Other Compilers

In order to have some points of reference for the performance results, we now compare the Graal compiler to the two existing compilers in the HotSpot VM: the C1 and C2 compilers, which we briefly presented in Section 2.2.


comparisons we use the Java benchmark suites: SPECjvm2008, DaCapo and Scala-DaCapo.

7.4.1 C1

To benchmark the C1 compiler, we use HotSpot in its “client” configuration, where C1 is the only JIT compiler. Since C1 aims at fast warmup rather than peak performance, we expect Graal to be faster overall. This is confirmed by the results presented in Figure 7.12: Graal has better peak performance for almost all benchmarks. The only exception is “scalatest”, where C1 outperforms Graal by 33 %. As pointed out by Sewe [61], this benchmark executes many small methods only once. In this context, a compiler with a lower compilation threshold and faster warmup such as C1 is more efficient than Graal.

7.4.2 C2

HotSpot’s second compiler, C2, has goals similar to Graal’s: peak performance in the long run. It also uses speculative optimizations and is overall more comparable to Graal than C1. To benchmark C2, we use HotSpot in its “server” configuration. The results, presented in Figure 7.13, show that for most benchmarks the performance of C2 and Graal is very similar. Graal is faster than C2 for two of the Scala-DaCapo benchmarks: “factorie” (40 % faster) and “tmt” (26 % faster). C2 is faster than Graal on a few benchmarks; in particular, for the “monte_carlo” benchmark, C2 is more than twice as fast as Graal. Overall, we believe that on Java workloads, with its current design, Graal can achieve at least performance similar to C2. This has been confirmed by the evolution of Graal’s performance since these experiments were run: for example, later versions of Graal have completely caught up with C2 on “signverify” and have halved the gap on “monte_carlo”.



Figure 7.12: SPECjvm2008, DaCapo and Scala-DaCapo performance results for the C1 compiler. Normalized using the baseline for Graal from Tables 7.1 and 7.2. Higher is better. Error bars indicate the 99 % confidence interval.



Figure 7.13: SPECjvm2008, DaCapo and Scala-DaCapo performance results for the C2 compiler. Normalized using the baseline for Graal from Tables 7.1 and 7.2. Higher is better. Error bars indicate the 99 % confidence interval.


7.5 Deoptimization Grouping

Using deoptimization grouping allows us to reduce the amount of low-level deoptimization metadata that the compiler produces. To measure the effect of this technique, we count the number of code positions with associated deoptimization metadata. We do that for our benchmarks by enabling or disabling deoptimization grouping in the Graal compiler. Table 7.4 reports those numbers and shows the differences between the two configurations. When deoptimization grouping is enabled, the number of positions with deoptimization metadata is reduced by about 26 % for JVM programs. This reduction rises to almost 50 % for the Octane benchmarks running on TruffleJS. This confirms our intuition that a significant number of Deoptimize nodes use the same FrameState nodes.

Benchmark        Grouping disabled   Grouping enabled   Change
SPECjvm2008      937,947             694,378            −25.97 %
Octane           113,275              56,672            −49.97 %
DaCapo           1,197,277           880,136            −26.49 %
Scala-DaCapo     928,651             695,735            −27.74 %

Table 7.4: Effect of deoptimization grouping on the number of code positions with associated deoptimization metadata in the Graal compiler.
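The mechanism behind these numbers can be pictured with a minimal sketch in Java. The Deoptimize and FrameState types and the redirectTo helper are illustrative stand-ins rather than Graal’s actual API: every deoptimizing site that restores an already-seen FrameState is redirected to the first site with that state, so only one code position per state needs low-level metadata.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative stand-ins for the IR classes involved in grouping.
    interface FrameState { }

    interface Deoptimize {
        FrameState state();                // state to restore on deoptimization
        void redirectTo(Deoptimize other); // merge this site's control-flow into another site
    }

    final class DeoptimizationGrouping {
        // Keep one representative Deoptimize per FrameState and redirect all
        // duplicates to it, leaving a single metadata-carrying code position.
        static void group(List<Deoptimize> deopts) {
            Map<FrameState, Deoptimize> representatives = new HashMap<>();
            for (Deoptimize d : deopts) {
                Deoptimize first = representatives.putIfAbsent(d.state(), d);
                if (first != null && first != d) {
                    d.redirectTo(first);
                }
            }
        }
    }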

7.5.1 Debug Information Sharing

Before looking at the impact of deoptimization grouping in more detail, we want to see the effects of the existing deoptimization metadata compression techniques of the underlying platform. As described in Section 6.2.1, the HotSpot VM supports a compression technique for deoptimization metadata. While it is usually disabled, we enabled it and recorded code cache usage. Table 7.5 shows the change in memory usage with debug information sharing enabled and deoptimization grouping disabled. The categories used in this table are the same as the ones described in Section 4.2.1. Debug information sharing is very effective at compressing deoptimization metadata and results in a 40 % to 50 % decrease in the size of this metadata for JVM benchmarks. This number goes up to 89 % for the Octane benchmarks. However, since this sharing is done while deoptimization metadata is recorded, after the compilation has finished, it has no influence on the other categories of metadata in Table 7.5.¹

¹ The observed variations for the other categories of metadata are not significant: they lie within the 99 % confidence interval.

Overall, for the JVM benchmarks, debug information sharing results in an 11 % to 25 % decrease in code cache occupation. This brings the overhead of metadata over machine code down to between 0.9× and 1.3× for all compilers, and to between 1.1× and 1.2× for the Graal compiler in particular. For the Octane benchmarks, the total code cache occupation goes down by 85 % and the metadata overhead goes down to 14.2×.

Table 7.5: Reduction of code and metadata sizes for different benchmarks and compilers (Graal, Client, Server) when HotSpot’s debug information sharing feature is enabled; deoptimization grouping is not enabled. The metadata is broken down into the deoptimization, constants and other categories; the Total column corresponds to the size of code and all deoptimization metadata. Results are given ± the width of the 99 % confidence interval. The percentages are relative to the baseline established in Table 4.1.
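The sharing itself can be sketched as follows (hypothetical names; HotSpot’s actual recorder works on its serialized scope descriptors): two sites whose debug information serializes to identical bytes end up referencing a single recorded blob.

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    final class DebugInfoSharing {
        private final Map<ByteBuffer, Integer> known = new HashMap<>();
        private int nextOffset = 0;

        // Returns the offset of this serialized debug information in the
        // metadata section, reusing an identical, previously recorded blob
        // when possible. Assumes the caller does not mutate the array later.
        int record(byte[] serialized) {
            ByteBuffer key = ByteBuffer.wrap(serialized);
            Integer existing = known.get(key);
            if (existing != null) {
                return existing; // shared: no new metadata is emitted
            }
            known.put(key, nextOffset);
            int offset = nextOffset;
            nextOffset += serialized.length; // append a new blob
            return offset;
        }
    }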


7.6 Deoptimization Grouping without Debug Information Sharing

Now, we look at the impact of deoptimization grouping alone, when debug information sharing is disabled. In Table 7.6, the “Debug Information Sharing Disabled” section shows the effect of deoptimization grouping on the memory overhead of deoptimization metadata. We can see that the reduction in positions with deoptimization metadata translates almost directly into a proportional reduction of the deoptimization metadata size. It also translates into a slight decrease in code size, which we expected because of the lower number of deoptimization calls.² Interestingly, it also leads to a decrease in “other” metadata, which contains the relocation metadata for those deoptimization call sites.

7.6.1 Combining Debug Information Sharing and Deoptimization Grouping

We now want to study how debug information sharing and deoptimization grouping work together. Table 7.6 shows the change in memory usage of deoptimization metadata when grouping is enabled. The results show that deoptimization grouping is still able to reduce the code and metadata sizes on top of the gains from debug information sharing. Compared to debug information sharing alone, grouping further reduces the size of deoptimization metadata by an additional 22 % to 25 % for the JVM benchmarks and 8 % for the Octane benchmarks. For the other categories, on which sharing alone has no effect, grouping has the same effect as when studied alone. While combining grouping and sharing does not reduce the size of deoptimization metadata for the Octane benchmarks much further than sharing alone, it substantially reduces the size of “constants” metadata. This is important because, as we have seen in Section 4.2.1, the Octane benchmarks produce a lot of “constants” metadata. This shows that both techniques combine well and can be used together. Overall, the overhead of metadata is now down to between 0.9× and 1.1× for the Graal compiler for JVM benchmarks and 6.4× for the Octane benchmarks. The total code cache size is reduced by about 31 % for the JVM benchmarks and by 92 % for the Octane benchmarks.

² The observed decrease in code size is significant: it is outside the 99 % confidence interval.

Table 7.6: Reduction of code and metadata sizes for different benchmarks when Graal’s deoptimization grouping feature is enabled, with debug information sharing either disabled or enabled. The metadata is broken down into the deoptimization, constants and other categories; the Total column corresponds to the size of code and all metadata. Results are given ± the width of the 99 % confidence interval. The percentages are relative to the baseline established in Table 4.1.


Chapter 8

Related Work

8.1 Intermediate Representations

The Graal IR is related to the graph IR presented by Click [10, 8], where nodes are not necessarily fixed to a specific point in the control-flow. This kind of IR, often called “sea of nodes”, retains ideas from the Program Dependence Graph (PDG) of Ferrante, Ottenstein, and Warren [21], where only the dependencies necessary to express the program semantics are kept. The IR of the HotSpot Server compiler is almost exactly the IR described by Click. A notable difference between the Graal IR and Click’s IR is the different direction of edges for control-flow and data-flow. This property allows our IR to avoid projection nodes in the control-flow graph. In Click’s IR, such nodes are necessary in order to be able to distinguish between the successors of a control split. In our IR, projection nodes are not needed because a control split points to its successors. In Click’s approach, a lot of the optimization work is already done during bytecode parsing. Parsing also directly produces rather low-level code, which means the IR is rather large for the whole duration of the compilation. In Graal, there are different levels of IR. Some optimizations such as inlining or escape analysis are done on a (relatively small) high-level IR, which is then “lowered” to a more low-level (and larger) IR that allows optimizations such as partial redundancy elimination or global value numbering to be performed on the more fine-grained low-level instructions. Overall, in the Graal IR, more nodes are fixed into the control-flow. For example, memory writes are fixed nodes in the Graal IR, while they are floating and only ordered through their memory dependencies in Click’s IR. As a result, the control-flow graph that forms the backbone of the IR is more prominent in Graal, where we try to find a sweet spot between the more familiar IRs purely based on a control-flow graph and a sea of nodes.


Also, in Click’s IR, the edges of a node are referenced with integer indexes. We use named edges in order to improve the maintainability of the IR itself as well as of the code that uses the IR.
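As a schematic illustration (simplified class shapes, not Graal’s actual node hierarchy), a control split with named, typed edge fields might look as follows. Because the split points to its successors by name, no projection nodes are needed to tell the branches apart:

    // Simplified sketch of named edges; Graal additionally marks such fields
    // with annotations so that the graph framework can traverse them generically.
    abstract class Node { }

    abstract class ValueNode extends Node { }

    final class IfNode extends Node {
        ValueNode condition;  // data-flow input, named instead of indexed
        Node trueSuccessor;   // control-flow edge to the taken branch
        Node falseSuccessor;  // control-flow edge to the not-taken branch
    }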

8.2 Deoptimization

Deoptimization was first used in the implementation of the Self language [40]. From there, it was transferred to the HotSpot VM, where it is used in the Client compiler [47], in the Server compiler [55] and in Graal. In HotSpot, deoptimization transfers the execution to the interpreter. The Jikes RVM [2, 58] also uses deoptimization. However, it has no interpreter, so deoptimization is done by transferring the execution to a less optimized version of the compiled code. This is called On-Stack Replacement (OSR) and was proposed for the Jikes RVM by Fink and Qian [22]. In the HotSpot VM, the term OSR is only used for the transfer of execution from the interpreter to compiled code in long-running loops [47]. The RPython tracing JIT [60] also supports guards and handles them in a similar fashion to the HotSpot VM. In all these compilers, unlike in Graal, nodes that can trigger deoptimization maintain their own deoptimization metadata throughout the complete compilation process. Compared to our approach, it is as if these compilers went directly to the second compilation stage, where guards have to remain fixed, which adds strong ordering constraints for guards. This difficulty is well illustrated by Odaira and Hiraki [54] and by Sundaresan, Stoodley, and Ramarao [72], who want to move instructions that may throw exceptions. Using our IR, we can move and insert Guard nodes in the first compilation stage without having to employ dynamic code patching or other techniques to overcome the inflexible placement of guards. The Crankshaft compiler of the V8 JavaScript engine [27, 23] handles deoptimization using a model similar to the one we use in the first stage of the compilation. While its IR does not have floating nodes, JavaScript-specific nodes that can cause deoptimization are manually moved during the compilation process. Before emitting the low-level IR used for register allocation, deoptimization targets are associated with deoptimization instructions. This is similar to FrameState assignment but does not leave room for a flexible reordering of side-effecting nodes. In contrast to our approach, the Crankshaft compiler does not have our second compilation stage. Binary translators such as Transmeta’s Code Morphing Software (CMS) [15] also support speculative optimizations. In CMS, this is done by using a rollback mechanism that triggers when an assumption fails. Effects similar to deoptimization can also be achieved via code duplication and modification of the control-flow: a predicate can be checked to dispatch the code between one version that makes an assumption and another that does not. This has been done, for example, by Sias et al. [63]. This method can make

the global control-flow very complex if a large number of assumptions needs to be made. Thus, it would be impractical for cases where we want to use speculative optimizations heavily. Special control-flow for specific assumptions has been studied by Djoudi, Acquaviva, and Barthou [17] in the context of nested loops, but their method cannot be applied more generally. Another technique for speculative optimizations has been proposed by Kelsey et al. [43]. Their model targets sequential code, where blocks of speculatively optimized code and safe code run in parallel. The execution of the safe code is used to verify the execution of the speculatively optimized code. The speedup comes from the fact that the execution of a safe block can start as soon as the speculatively optimized execution of the previous block has finished. This allows overlap in the execution of the safe code. Even if reasonable speedups are achieved, this technique only works for sequential programs and requires the developer to annotate the source code. Also, it only works if one can allocate multiple cores to run a sequential program.

8.2.1 Exception Handling

One of the important usages of speculative optimizations in the Graal compiler is exception handling. To mitigate the high number of exception edges, some compilers, such as the compilers of the Jikes RVM [7] as well as HotSpot’s Client compiler [47], use a factored control-flow graph where exception edges are implicit and are summarized for each basic block. But this still requires the compiler’s analyses to take the instructions that may throw an exception into account while iterating over the instructions during optimizations. Our implementation using guards simply makes the assumption that exceptions do not happen and can then optimize as if this assumption had been verified. Our approach follows the general principle that if some code inside a method has never been executed during profiling, it can be speculatively left out during compilation. This has been studied for general control-flow by Whaley [75]. The HotSpot Server compiler [55] also uses deoptimization to handle unlikely exceptions. It also uses profiling information to exclude explicit exception edges when possible, but it cannot exclude exception edges for invokes. Our evaluation has shown that invokes account for a large number of the exception edges. Another approach to reducing the overhead of exceptions is described by Su and Lipasti [71]. They compile regions of code under the assumption that exceptions are never triggered. At run time, this assumption is checked using special hardware support. If the assumption is invalidated, the changes that happened in the failing region are rolled back using hardware support. The region is then recompiled without assumptions and is re-executed. This approach is similar to ours in that no time is spent compiling exception

handlers that are never executed. Our technique has a small run-time overhead that does not exist when assumptions are monitored by hardware. But the necessity of hardware support can be prohibitive since it requires features that do not exist on common platforms. Also, adding a new kind of assumption could require new hardware support, which is costly, while in our model a new kind of assumption can be introduced very easily.
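In concrete terms, the guard-based handling of an implicit bounds check can be sketched in plain Java (the deoptimize helper is an illustrative stand-in for the runtime call that exits compiled code): the array access is compiled without any exception edge, and the rare failing case leaves compiled code entirely, so the interpreter throws the actual exception.

    final class SpeculativeBoundsCheck {
        // Stand-in for the transfer to the interpreter; always throws so the
        // compiler-generated fast path needs no local exception handler.
        static RuntimeException deoptimize() {
            throw new IllegalStateException("deoptimize");
        }

        static int readSpeculatively(int[] a, int i) {
            if (i < 0 || i >= a.length) {
                throw deoptimize(); // speculation failed: no exception edge here
            }
            return a[i]; // compiled as if it could never throw
        }
    }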

8.3 Speculative Guard Motion

Speculative guard motion hoists certain guards out of loops. Loop inversion can achieve a similar effect for some guards: if a guard is directly anchored under the loop condition that is inverted, guards with loop-invariant conditions can be hoisted into the new block created before the loop header. In a system that supports speculation, speculative guard motion is simpler to implement than loop inversion. Its only modification to the IR is the change of the anchor position of the guard node that needs to be hoisted, while loop inversion requires modifying the control-flow graph. More importantly, speculative guard motion applies to guards that are anchored anywhere in the loop, while with loop inversion only guards anchored after the loop’s condition can be hoisted. Some trace compilers also move guards speculatively. For example, Bebenita et al. [5] propose guard strengthening in a trace compiler called SPUR. This optimization replaces a guard that is implied by a later guard with this later guard, which causes some guards to move up. It is speculative because the trace may be exited earlier than necessary. The main difference from speculative guard motion is that an implied guard needs to exist for this movement to happen, which limits the applicability of guard strengthening.
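In source-level terms, the transformation can be sketched like this (the guard helper stands in for a Guard node; the real optimization rewrites the IR, in the spirit of Listings 6.1 and 6.2): the per-iteration bounds guard is replaced by one loop-invariant guard that speculates that the whole range is in bounds.

    final class GuardMotionSketch {
        // Stand-in for a Guard node: deoptimizes when its condition fails.
        static void guard(boolean condition) {
            if (!condition) {
                throw new IllegalStateException("deoptimize");
            }
        }

        // Before: the guard executes on every iteration.
        static int sumBefore(int[] a, int n) {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                guard(0 <= i && i < a.length);
                sum += a[i];
            }
            return sum;
        }

        // After speculative guard motion: a single strengthened guard before
        // the loop. If it ever fails, the method deoptimizes and the
        // speculation is not attempted again on recompilation.
        static int sumAfter(int[] a, int n) {
            int sum = 0;
            guard(n <= a.length);
            for (int i = 0; i < n; i++) {
                sum += a[i];
            }
            return sum;
        }
    }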

8.4 Deoptimization Data

8.4.1 Delta encoding

We have already described the compression techniques present in the HotSpot VM in Section 6.2.1. Different compression schemes have been proposed for other VMs. Based on the insight that deoptimization metadata is the result of the abstract interpretation of the compiled program, some VMs use delta encoding for storing deoptimization metadata. The idea is that the delta between two instances of deoptimization metadata that follow each other in the control-flow should be rather small. This was described by Schneider and Bolz [60] for RPython and is particularly effective for trace compilers, where the compilation unit is essentially linear with side exits. In RPython, the serialized deoptimization metadata is split into two parts: the “resume data”, which corresponds to Graal’s high-level FrameState nodes,

and the “back-end maps”, which give the low-level information by mapping values to their physical locations. As we have seen with deoptimization grouping, the same deoptimization metadata is used at many deoptimization sites, but with different physical locations for the values. Thus, the separation used by RPython makes delta encoding more efficient: the high-level resume data does not change much from one site to another, compared to the low-level back-end maps. The LuaJIT compiler [56, 57], like most trace compilers, uses deoptimization intensively for trace exits. It stores deoptimization metadata in “snapshots”. Similar to Graal’s handling of FrameState nodes, snapshots are only taken if there has been a side-effect since the last snapshot or if an exit is likely to be taken. Similar to RPython, snapshots reference the IR values and are thus independent of the values’ physical locations. Instead of using back-end maps, LuaJIT, which keeps the IR in memory, scans the IR during deoptimization for special IR nodes that indicate that the register allocator moved a value from one physical location to another.
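The core of the delta encoding idea can be sketched with illustrative types (RPython’s actual format is more involved): a frame state is stored as the set of slots that differ from the previous state along the control-flow, which is typically a small fraction of the full state.

    import java.util.HashMap;
    import java.util.Map;

    final class DeltaEncoding {
        // Encode 'current' relative to 'previous': only changed or new slots
        // are stored. Assumes slots are never removed between two states.
        static Map<Integer, Object> encode(Map<Integer, Object> previous,
                                           Map<Integer, Object> current) {
            Map<Integer, Object> diff = new HashMap<>();
            for (Map.Entry<Integer, Object> e : current.entrySet()) {
                if (!e.getValue().equals(previous.get(e.getKey()))) {
                    diff.put(e.getKey(), e.getValue());
                }
            }
            return diff;
        }

        // Decoding replays the diff on top of the previous state.
        static Map<Integer, Object> decode(Map<Integer, Object> previous,
                                           Map<Integer, Object> diff) {
            Map<Integer, Object> state = new HashMap<>(previous);
            state.putAll(diff);
            return state;
        }
    }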

8.4.2 Deoptimization Metadata Size

The Self VM [74, 40] is an ancestor of the HotSpot VM and contains a lot of the infrastructure that was later used for deoptimization in the HotSpot VM. Its authors report that deoptimization metadata takes almost as much space (1.2×) as the compiled code for their baseline compiler and more than twice as much space (2.3×) as the compiled code for their optimizing compiler. Using the delta encoding described above, the RPython implementation is able to compress “resume data” (the biggest part of RPython’s deoptimization metadata) by up to 82 %. After compression, the total deoptimization metadata takes 2.6× to 5× as much space as the machine code. In comparison, the Graal compiler produces between 0.9× and 1.1× as much metadata as compiled code for standard Java programs, and this number goes up to 6.4× for JavaScript programs running on Truffle.

8.4.3 Deoptimizing with Metadata vs. with Specialized Code

While most systems that support deoptimization do so by using metadata, it is also possible to support it by using only code. Instead of emitting metadata and calling a centralized deoptimization handler to rebuild the deoptimized state, specialized code which does not need the metadata can be emitted for each deoptimization site. For example, with the Hotpath VM, Gal, Probst, and Franz [25] have explored both options and report that using specialized code is better for performance. However, they do not report any findings in terms of memory overhead.


Chapter 9

Summary

9.1 Future Work

Some of the possibilities established by our compiler structure and the optimizations proposed in this thesis can still be explored further. In particular, future work would be interesting in the following areas:

Reordering of Side-effecting Instructions We have mentioned vectorization as an example of an optimization that reorders side-effecting nodes and was implemented in the second compilation stage, but other such optimizations could be implemented as well. For example, more complex loop optimizations such as loop tiling [42, 80] and loop interchange [1, 79] could be added. While these optimizations are already well documented, they have not yet been explored in the context of speculation. Indeed, their complexity often comes from having to establish complex properties of inter-iteration dependencies that are hard to prove at compile time. Exploring the opportunities of speculative optimization to reduce this complexity would be interesting; a textbook example of loop interchange is sketched below.
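As a reminder of what such a transformation looks like, here is the textbook form of loop interchange in Java (our own illustration, not an implementation from the thesis): interchanging the loops turns a strided traversal of a row-major array into a contiguous one. Proving that the interchange is legal requires exactly the kind of inter-iteration dependence reasoning mentioned above.

    final class LoopInterchangeSketch {
        // Before: the inner loop strides across rows, touching a different
        // row (and usually a different cache line) on every access.
        static void scaleColumnOrder(double[][] m, double f) {
            for (int j = 0; j < m[0].length; j++) {
                for (int i = 0; i < m.length; i++) {
                    m[i][j] *= f;
                }
            }
        }

        // After interchange: the inner loop walks each row contiguously.
        // Legal here because all iterations are independent; in general this
        // must be proven at compile time or, as suggested above, speculated.
        static void scaleRowOrder(double[][] m, double f) {
            for (int i = 0; i < m.length; i++) {
                for (int j = 0; j < m[i].length; j++) {
                    m[i][j] *= f;
                }
            }
        }
    }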

Speculative Guard Motion without Loops The speculative guard motion optimization that we presented concentrates on hoisting guards out of loops. However, guards could also be moved speculatively out of their branch even if there is no loop, namely in cases where a guard can be combined with another one. While hoisting guards out of loops leads to higher performance gains, there might be cases where speculative guard motion is still profitable without loops.

Further Deoptimization Grouping When we group deoptimizing nodes, we only consider those that have the same FrameState target as well as the same values to be restored. This could be extended to sites with the same target but different values. Phi nodes could be created to merge those values.


It would be interesting to investigate the additional costs and benefits of this kind of grouping.

Lowering all Deoptimization Sites to Control-flow Splits In our current approach for the second compilation stage, while all deoptimizing nodes are fixed, some are still located on the fast path. This is the case, for example, for Invoke nodes or loop safepoints, which may deoptimize. Having all deoptimizing nodes on separate, slow control-flow paths would give the compiler even more freedom to reorder side-effecting nodes and to group deoptimizations. In some cases, such as loop safepoints, this would require support from the runtime to be able to divert the control-flow to another branch instead of deoptimizing directly.

9.2 Conclusion

In this thesis, we presented a new structure for compilers that use speculative optimizations. This structure divides the compilation process into two stages. It helps the implementation and use of speculative optimizations by allowing free insertion and movement of deoptimizing nodes during the first stage of the compilation. The second stage then enables other types of optimizations that need to be able to move side-effecting nodes and can assume that deoptimizing nodes stay at fixed positions. Using this framework, we presented two optimizations that help compilers in situations where speculative optimizations are heavily used or in performance-critical code. Speculative guard motion improves the performance of programs by speculatively hoisting guards out of loops. Deoptimization grouping helps the compiler reduce the memory footprint of the metadata necessary to support speculative optimizations. Our new compiler structure and its optimizations have been implemented in the Graal compiler. This allowed us to show their feasibility for optimizing languages targeting the JVM (e.g., Java and Scala) as well as languages implemented on top of Truffle (e.g., JavaScript). We evaluated this implementation using SPECjvm2008 and DaCapo for Java, Scala-DaCapo for Scala, and Kraken and Octane for JavaScript. While the optimizations used in the two stages usually have moderate costs in terms of compilation time, they increase the peak performance of Java applications by up to 84 % (4.7 % on average) and reduce the memory footprint of just-in-time compilation for Java by 27 % to 37 % (31 % on average). For JavaScript, they increase the peak performance by up to 76 % (23 % on average) and reduce the memory footprint of JIT-compiled code by 75 % to 96 % (90 % on average).


List of Figures

2.1 Example IR graph with control-flow and data-flow edges ...... 12
2.2 Example IR graph with a loop ...... 14
2.3 High level LoadField node before lowering ...... 17
2.4 Low-level nodes after lowering of a LoadField node ...... 17
2.5 Example of IR for a loop with Read nodes ...... 19
2.6 FloatingRead nodes and the memory graph ...... 20

3.1 Exception handling at Invoke nodes with exception dispatch . . . 26

4.1 Cycles spent by the code produced by 3 hypothetical compilers ...... 36
4.2 Execution speed over time ...... 37
4.3 Data installed in the code cache for Java benchmarks ...... 39
4.4 Data installed into the code cache for JavaScript benchmarks ...... 41
4.5 Deoptimization between two reordered side-effecting nodes ...... 44
4.6 Deoptimization target possibilities ...... 45
4.7 Reordering possibilities when deoptimizing nodes are fixed ...... 47
4.8 Further reordering possibilities when deoptimizing nodes are fixed ...... 48

5.1 Organization of the optimization stages ...... 49
5.2 Example of IR during the first stage ...... 51
5.3 Guard lowering ...... 52
5.4 FrameState assignment ...... 53
5.5 Example where a Merge node requires a FrameState node ...... 54

6.1 Speculative guard motion ...... 61
6.2 Low-level representation of the same FrameState used at two different positions ...... 67
6.3 Merging deoptimization control-flow (high-level IR) ...... 68
6.4 IR transformation for loop fusion ...... 70

7.1 SPECjvm2008 performance results without the first stage ...... 78
7.2 DaCapo and Scala-DaCapo performance results without the first stage ...... 78
7.3 Kraken and Octane performance results without the first stage ...... 79


7.4 SPECjvm2008, DaCapo and Scala-DaCapo compilation time results without the first stage ...... 80
7.5 Kraken and Octane compilation time results without the first stage ...... 81
7.6 SPECjvm2008 performance results without vectorization ...... 82
7.7 SPECjvm2008 compilation time results without vectorization ...... 82
7.8 SPECjvm2008 performance results without speculative guard motion ...... 83
7.9 SPECjvm2008 compilation time results without speculative guard motion ...... 84
7.10 Octane and Kraken performance results without speculative guard motion ...... 85
7.11 Kraken and Octane compilation time results without speculative guard motion ...... 87
7.12 SPECjvm2008, DaCapo and Scala-DaCapo performance results for C1 ...... 89
7.13 SPECjvm2008, DaCapo and Scala-DaCapo performance results for C2 ...... 90

List of Tables

4.1 Memory footprint of deoptimization metadata ...... 42

7.1 Baseline for SPECjvm2008, Kraken and Octane ...... 76
7.2 Baseline for DaCapo and Scala-DaCapo ...... 77
7.3 Guards executed per second for SPECjvm2008, Kraken and Octane ...... 86
7.4 Effect of deoptimization grouping on the number of positions with associated deoptimization metadata ...... 91
7.5 Effects of debug information sharing ...... 93
7.6 Effects of deoptimization grouping on code and metadata sizes ...... 95


List of Listings

2.1 Snippet of the instanceofExact type check ...... 21
2.2 Snippet of the Math.cos intrinsic ...... 21
2.3 Snippet of the instanceofExact type check using a profile ...... 22
2.4 Example Truffle interpreter ...... 23

3.1 Simple loop structure ...... 29
3.2 BranchProfile class using the Truffle API for speculation ...... 32
3.3 A node using a Truffle Assumption to speculate ...... 33

5.1 Example where effect sinking can be beneficial ...... 57
5.2 Example of the result of effect sinking ...... 57

6.1 Example of loop where a bounds check can be rewritten ...... 62
6.2 Example of loop where a bounds check has been rewritten ...... 62
6.3 Example of loops that can be fused ...... 69
6.4 Result of fusing the loops from Listing 6.3 ...... 70


List of Algorithms

5.1 FrameState assignment ...... 55

6.1 Speculative guard motion ...... 65
6.2 Function computeEarliestBlockForGuard(n) ...... 66


Bibliography

[1] John R. Allen and Ken Kennedy. “Automatic Loop Interchange.” In: Proceedings of the International Conference on Compiler Construction. ACM Press, 1984, pp. 233–246. isbn: 0-89791-139-3. doi: 10.1145/502874.502897.
[2] Bowen Alpern, S. Augart, Stephen M. Blackburn, M. Butrico, A. Cocchi, P. Cheng, J. Dolby, S. Fink, D. Grove, M. Hind, K. S. McKinley, M. Mergen, J. E. B. Moss, T. Ngo, and V. Sarkar. “The Jikes Research Virtual Machine project: Building an open-source research community.” In: IBM Systems Journal 44.2 (2005), pp. 399–417. doi: 10.1147/sj.442.0399.
[3] Bowen Alpern, Mark N. Wegman, and Frank Kenneth Zadeck. “Detecting Equality of Variables in Programs.” In: Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 1988, pp. 1–11. isbn: 0-89791-252-7. doi: 10.1145/73560.73561.
[4] David F. Bacon, Susan L. Graham, and Oliver J. Sharp. “Compiler Transformations for High-performance Computing.” In: ACM Computing Surveys 26.4 (Dec. 1994), pp. 345–420. issn: 0360-0300. doi: 10.1145/197405.197406.
[5] Michael Bebenita, Florian Brandner, Manuel Fahndrich, Francesco Logozzo, Wolfram Schulte, Nikolai Tillmann, and Herman Venter. “SPUR: A Trace-based JIT Compiler for CIL.” In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 2010, pp. 708–725. isbn: 978-1-4503-0203-6. doi: 10.1145/1869459.1869517.
[6] Stephen M. Blackburn et al. “The DaCapo Benchmarks: Java Benchmarking Development and Analysis.” In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. Portland, OR, USA: ACM Press, Oct. 2006, pp. 169–190. doi: 10.1145/1167473.1167488.


[7] Jong-Deok Choi, David Grove, Michael Hind, and Vivek Sarkar. “Efficient and precise modeling of exceptions for the analysis of Java programs.” In: Proceedings of the ACM SIGPLAN-SIGSOFT workshop on Program Analysis for Software Tools and Engineering. Toulouse, France: ACM Press, 1999, pp. 21–31. isbn: 1-58113-137-2. doi: 10.1145/316158.316171.
[8] Cliff Click. “Combining analyses, combining optimizations.” PhD thesis. Rice University, 1995.
[9] Cliff Click. “Global code motion/global value numbering.” In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. La Jolla, California, United States: ACM Press, 1995, pp. 246–257. isbn: 0-89791-697-2. doi: 10.1145/207110.207154.
[10] Cliff Click and Michael Paleczny. “A Simple Graph-based Intermediate Representation.” In: Papers from the ACM SIGPLAN Workshop on Intermediate Representations. ACM Press, 1995, pp. 35–49. isbn: 0-89791-754-5. doi: 10.1145/202529.202534.
[11] Cliff Click and John Rose. “Fast Subtype Checking in the HotSpot JVM.” In: Proceedings of the Joint ACM-ISCOPE Conference on Java Grande. ACM Press, 2002, pp. 96–107. isbn: 1-58113-599-8. doi: 10.1145/583810.583821.
[12] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph.” In: Transactions on Programming Languages and Systems 13.4 (Oct. 1991), pp. 451–490. issn: 0164-0925. doi: 10.1145/115372.115320.
[13] Benoit Daloze, Chris Seaton, Daniele Bonetta, and Hanspeter Mössenböck. “Techniques and Applications for Guest-Language Safepoints.” In: Proceedings of the Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems. 2015.
[14] Jeffrey Dean, David Grove, and Craig Chambers. “Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis.” In: Proceedings of the European Conference on Object-Oriented Programming. Springer, 1995, pp. 77–101. isbn: 3-540-60160-0.
[15] James C. Dehnert, Brian K. Grant, John P. Banning, Richard Johnson, Thomas Kistler, Alexander Klaiber, and Jim Mattson. “The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. San Francisco, California: IEEE, 2003, pp. 15–24. isbn: 0-7695-1913-X.


[16] Alain Deutsch. “Interprocedural May-alias Analysis for Pointers: Beyond K-limiting.” In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, 1994, pp. 230–241. isbn: 0-89791-662-X. doi: 10.1145/178243.178263.
[17] Lamia Djoudi, Jean-Thomas Acquaviva, and Denis Barthou. “Compositional approach applied to loop specialization.” In: Concurrency and Computation: Practice and Experience 21.1 (Jan. 2009), pp. 71–84. issn: 1532-0626. doi: 10.1002/cpe.v21:1.
[18] Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. “Graal IR: An Extensible Declarative Intermediate Representation.” In: Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop. 2013.
[19] Gilles Duboscq, Thomas Würthinger, and Hanspeter Mössenböck. “Speculation Without Regret: Reducing Deoptimization Meta-data in the Graal Compiler.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2014, pp. 187–193. isbn: 978-1-4503-2926-2. doi: 10.1145/2647508.2647521.
[20] Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. “An Intermediate Representation for Speculative Optimizations in a Dynamic Compiler.” In: Proceedings of the ACM workshop on Virtual Machines and Intermediate Languages. ACM Press, 2013, pp. 1–10. isbn: 978-1-4503-2601-8. doi: 10.1145/2542142.2542143.
[21] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. “The program dependence graph and its use in optimization.” In: Transactions on Programming Languages and Systems 9.3 (July 1987), pp. 319–349. issn: 0164-0925. doi: 10.1145/24039.24041.
[22] Stephen J. Fink and Feng Qian. “Design, implementation and evaluation of adaptive recompilation with on-stack replacement.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. San Francisco, California: IEEE, 2003, pp. 241–252. isbn: 0-7695-1913-X.
[23] Olivier Flückiger. “Compiled Compiler Templates for V8. or: How I Learned to Stop Worrying and Love JavaScript.” MA thesis. University of Bern, Faculty of Science, 2014.
[24] Yoshihiko Futamura. “Partial computation of programs.” In: RIMS Symposia on Software Science and Engineering. Ed. by Eiichi Goto, Koichi Furukawa, Reiji Nakajima, Ikuo Nakata, and Akinori Yonezawa. Vol. 147. Lecture Notes in Computer Science. Springer, 1983, pp. 1–35. isbn: 978-3-540-11980-7. doi: 10.1007/3-540-11980-9_13.


[25] Andreas Gal, Christian W. Probst, and Michael Franz. “HotpathVM: An Effective JIT Compiler for Resource-constrained Devices.” In: Proceedings of the ACM/USENIX International Conference on Virtual Execution Environments. Ottawa, Ontario, Canada: ACM Press, 2006, pp. 144–153. isbn: 1-59593-332-8. doi: 10.1145/1134760.1134780.
[26] Google. Octane. 2014. url: https://developers.google.com/octane/.
[27] Google. V8 JavaScript Engine. 2012. url: http://code.google.com/p/v8/.
[28] James Gosling, Bill Joy, Guy Steele, Gilad Bracha, and Alex Buckley. The Java Language Specification. Java SE 8 Edition. 1st. Addison-Wesley, 2014. isbn: 978-0133900699.
[29] Isaac Gouy. n-body JavaScript V8 program, Computer Language Benchmarks Game. 2014. url: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=nbody&lang=v8&id=1.
[30] Graal Project. OpenJDK Community. url: http://openjdk.java.net/projects/graal/ (visited on Oct. 2, 2014).
[31] Matthias Grimmer. “A Runtime Environment for the Truffle/C VM.” MA thesis. Institute for System Software, Johannes Kepler University Linz, 2013.
[32] Matthias Grimmer. “High-performance Language Interoperability in Multi-language Runtimes.” In: Proceedings of the Companion Publication of the ACM SIGPLAN Conference on Systems, Programming, and Applications: Software for Humanity. ACM Press, 2014, pp. 17–19. isbn: 978-1-4503-3208-8. doi: 10.1145/2660252.2660256.
[33] Matthias Grimmer, Manuel Rigger, Roland Schatz, Lukas Stadler, and Hanspeter Mössenböck. “TruffleC: Dynamic Execution of C on a Java Virtual Machine.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2014, pp. 17–26. isbn: 978-1-4503-2926-2. doi: 10.1145/2647508.2647528.
[34] Matthias Grimmer, Manuel Rigger, Lukas Stadler, Roland Schatz, and Hanspeter Mössenböck. “An Efficient Native Function Interface for Java.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2013, pp. 35–44. isbn: 978-1-4503-2111-2. doi: 10.1145/2500828.2500832.
[35] Matthias Grimmer, Chris Seaton, Thomas Würthinger, and Hanspeter Mössenböck. “Dynamically Composing Languages in a Modular Way: Supporting C Extensions for Dynamic Languages.” In: Proceedings of the 14th International Conference on Modularity. ACM Press, 2015, pp. 1–13. isbn: 978-1-4503-3249-1. doi: 10.1145/2724525.2728790.


[36] Matthias Grimmer, Thomas Würthinger, Andreas Wöß, and Hanspeter Mössenböck. “An Efficient Approach for Accessing C Data Structures from JavaScript.” In: Proceedings of the International Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems PLE. ACM Press, 2014, 1:1–1:4. isbn: 978-1-4503-2914-9. doi: 10.1145/2633301.2633302.
[37] Christian Häubl and Hanspeter Mössenböck. “Trace-based Compilation for the Java HotSpot Virtual Machine.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2011, pp. 129–138. isbn: 978-1-4503-0935-6. doi: 10.1145/2093157.2093176.
[38] Christian Häubl, Christian Wimmer, and Hanspeter Mössenböck. “Context-sensitive trace inlining for Java.” In: Computer Languages, Systems & Structures 39.4 (2013), pp. 123–141. issn: 1477-8424. doi: 10.1016/j.cl.2013.04.002.
[39] Christian Häubl, Christian Wimmer, and Hanspeter Mössenböck. “Evaluation of Trace Inlining Heuristics for Java.” In: Proceedings of the ACM Symposium on Applied Computing. ACM Press, 2012, pp. 1871–1876. isbn: 978-1-4503-0857-1. doi: 10.1145/2245276.2232084.
[40] Urs Hölzle, Craig Chambers, and David Ungar. “Debugging optimized code with dynamic deoptimization.” In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, 1992, pp. 32–43. isbn: 0-89791-475-9. doi: 10.1145/143095.143114.
[41] Christian Humer, Christian Wimmer, Christian Wirth, Andreas Wöß, and Thomas Würthinger. “A Domain-specific Language for Building Self-optimizing AST Interpreters.” In: Proceedings of the International Conference on Generative Programming: Concepts and Experiences. ACM Press, 2014, pp. 123–132. isbn: 978-1-4503-3161-6. doi: 10.1145/2658761.2658776.
[42] François Irigoin and Remi Triolet. “Supernode Partitioning.” In: Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 1988, pp. 319–329. isbn: 0-89791-252-7. doi: 10.1145/73560.73588.
[43] Kirk Kelsey, Tongxin Bai, Chen Ding, and Chengliang Zhang. “Fast Track: A Software System for Speculative Program Optimization.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. IEEE, 2009, pp. 157–168. isbn: 978-0-7695-3576-0. doi: 10.1109/CGO.2009.18.


[44] Thomas Kotzmann. “Escape Analysis in the Context of Dynamic Compilation and Deoptimization.” PhD thesis. Institute for System Software, Johannes Kepler University Linz, 2005.
[45] Thomas Kotzmann and Hanspeter Mössenböck. “Escape Analysis in the Context of Dynamic Compilation and Deoptimization.” In: Proceedings of the ACM/USENIX International Conference on Virtual Execution Environments. ACM Press, 2005, pp. 111–120. isbn: 1-59593-047-7. doi: 10.1145/1064979.1064996.
[46] Thomas Kotzmann and Hanspeter Mössenböck. “Run-Time Support for Optimizations Based on Escape Analysis.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. IEEE, Mar. 2007, pp. 49–60. isbn: 0-7695-2764-7. doi: 10.1109/CGO.2007.34.
[47] Thomas Kotzmann, Christian Wimmer, Hanspeter Mössenböck, Thomas Rodriguez, Kenneth Russell, and David Cox. “Design of the Java HotSpot™ client compiler for Java 6.” In: ACM Transactions on Architecture and Code Optimization 5.1 (May 2008), 7:1–7:32. issn: 1544-3566. doi: 10.1145/1369396.1370017.
[48] William Landi and Barbara G. Ryder. “A Safe Approximate Algorithm for Interprocedural Aliasing.” In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, 1992, pp. 235–248. isbn: 0-89791-475-9. doi: 10.1145/143095.143137.
[49] Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. The Java Virtual Machine Specification. Java SE 8 Edition. 1st. Addison-Wesley, 2014. isbn: 978-0133905908.
[50] Jeremy Manson, William Pugh, and Sarita V. Adve. “The Java Memory Model.” In: Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. New York, NY, USA: ACM Press, 2005, pp. 378–391. isbn: 1-58113-830-X. doi: 10.1145/1040305.1040336.
[51] Hanspeter Mössenböck. Adding Static Single Assignment Form and a Graph Coloring Register Allocator to the Java HotSpot™ Client Compiler. Tech. rep. 15. Institute for Practical Computer Science, Johannes Kepler University Linz, 2000.
[52] Hanspeter Mössenböck and Michael Pfeiffer. “Linear Scan Register Allocation in the Context of SSA Form and Register Constraints.” In: Proceedings of the International Conference on Compiler Construction. Springer, 2002, pp. 229–246. isbn: 3-540-43369-4.
[53] Mozilla. Kraken JavaScript benchmarks. 2014. url: http://hg.mozilla.org/projects/kraken/.


[54] Rei Odaira and Kei Hiraki. “Sentinel PRE: Hoisting beyond Exception Dependency with Dynamic Deoptimization.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. IEEE, 2005, pp. 328–338. isbn: 0-7695-2298-X. doi: 10.1109/CGO.2005.32.
[55] Michael Paleczny, Christopher Vick, and Cliff Click. “The Java HotSpot™ Server Compiler.” In: Proceedings of the Symposium on Java Virtual Machine Research and Technology. USENIX, 2001. isbn: 1-880446-11-1.
[56] Mike Pall. LuaJIT 2.0 intellectual property disclosure and research opportunities. Lua community Mailing list. 2009. url: http://lua-users.org/lists/lua-l/2009-11/msg00089.html.
[57] Mike Pall. src/lj_snap.c. Source code. url: http://luajit.org/git/luajit-2.0.git.
[58] Ian Rogers and Dave Grove. “The Strength of Metacircular Virtual Machines: Jikes RVM.” In: Beautiful Architecture. Ed. by Diomidis Spinellis and Georgios Gousios. O’Reilly, 2009. Chap. 10.
[59] Barry K. Rosen, Mark N. Wegman, and Frank Kenneth Zadeck. “Global Value Numbers and Redundant Computations.” In: Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 1988, pp. 12–27. isbn: 0-89791-252-7. doi: 10.1145/73560.73562.
[60] David Schneider and Carl Friedrich Bolz. “The efficient handling of guards in the design of RPython’s tracing JIT.” In: Proceedings of the ACM workshop on Virtual Machines and Intermediate Languages. Tucson, Arizona, USA: ACM Press, 2012, pp. 3–12. isbn: 978-1-4503-1633-0. doi: 10.1145/2414740.2414743.
[61] Andreas Sewe. “Design and Analysis of a Scala Benchmark Suite for the Java Virtual Machine.” PhD thesis. Technische Universitaet Darmstadt, 2013.
[62] Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. “Da Capo con Scala: design and analysis of a Scala benchmark suite for the Java virtual machine.” In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 2011, pp. 657–676.
[63] John W. Sias, Sain-zee Ueng, Geoff A. Kent, Ian M. Steiner, Erik M. Nystrom, and Wen-mei W. Hwu. “Field-testing IMPACT EPIC research results in Itanium 2.” In: Proceedings of the International Symposium on Computer architecture. München, Germany: IEEE, 2004, pp. 26–. isbn: 0-7695-2143-6.


[64] Doug Simon, Christian Wimmer, Bernhard Urban, Gilles Duboscq, Lukas Stadler, and Thomas Würthinger. “Snippets: Taking the High Road to a Low Level.” In: ACM Transactions on Architecture and Code Optimization 12.2 (June 2015), 20:20:1–20:20:25. issn: 1544-3566. doi: 10.1145/2764907.
[65] src/share/vm/code/debugInfoRec.cpp. Source code. url: http://hg.openjdk.java.net/jdk8/jdk8/hotspot.
[66] Lukas Stadler. “Partial Escape Analysis and Scalar Replacement for Java.” PhD thesis. Institute for System Software, Johannes Kepler University Linz, 2014.
[67] Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, and Thomas Würthinger. “Compilation Queuing and Graph Caching for Dynamic Compilers.” In: Proceedings of the ACM workshop on Virtual Machines and Intermediate Languages. ACM Press, 2012, pp. 49–58. isbn: 978-1-4503-1633-0. doi: 10.1145/2414740.2414750.
[68] Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, Thomas Würthinger, and Doug Simon. “An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance.” In: Proceedings of the Workshop on Scala. ACM Press, 2013, 9:1–9:8. isbn: 978-1-4503-2064-1. doi: 10.1145/2489837.2489846.
[69] Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. “Partial Escape Analysis and Scalar Replacement for Java.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. ACM Press, 2014, 165:165–165:174. isbn: 978-1-4503-2670-4. doi: 10.1145/2544137.2544157.
[70] Standard Performance Evaluation Corporation. SPECjvm2008. 2014. url: http://www.spec.org/jvm2008/.
[71] Lixin Su and Mikko H. Lipasti. “Speculative optimization using hardware-monitored guarded regions for java virtual machines.” In: Proceedings of the ACM/USENIX International Conference on Virtual Execution Environments. San Diego, California, USA: ACM Press, 2007, pp. 22–32. isbn: 978-1-59593-630-1. doi: 10.1145/1254810.1254814.
[72] Vijay Sundaresan, Mark Stoodley, and Pramod Ramarao. “Removing redundancy via exception check motion.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization. Boston, MA, USA: ACM Press, 2008, pp. 134–143. isbn: 978-1-59593-978-4. doi: 10.1145/1356058.1356077.


[73] Ben L. Titzer, Thomas Würthinger, Doug Simon, and Marcelo Cintra. “Improving compiler-runtime separation with XIR.” In: Proceedings of the ACM/USENIX International Conference on Virtual Execution Environments. ACM Press, 2010, pp. 39–50. isbn: 978-1-60558-910-7. doi: 10.1145/1735997.1736005.
[74] David Ungar and Randall B. Smith. “Self: The Power of Simplicity.” In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. Orlando, Florida, USA: ACM Press, 1987, pp. 227–242. isbn: 0-89791-247-0. doi: 10.1145/38765.38828.
[75] John Whaley. “Partial method compilation using dynamic profile information.” In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. Tampa Bay, FL, USA: ACM Press, 2001, pp. 166–179. isbn: 1-58113-335-9. doi: 10.1145/504282.504295.
[76] Christian Wimmer. “Linear Scan Register Allocation for the Java HotSpot™ Client Compiler.” MA thesis. Institute for System Software, Johannes Kepler University Linz, 2004.
[77] Christian Wimmer, Michael Haupt, Michael L. Van De Vanter, Mick Jordan, Laurent Daynès, and Douglas Simon. “Maxine: An Approachable Virtual Machine for, and in, Java.” In: ACM Transactions on Architecture and Code Optimization 9.4 (Jan. 2013), 30:1–30:24. issn: 1544-3566. doi: 10.1145/2400682.2400689.
[78] Christian Wimmer and Hanspeter Mössenböck. “Optimized Interval Splitting in a Linear Scan Register Allocator.” In: Proceedings of the ACM/USENIX International Conference on Virtual Execution Environments. ACM Press, 2005, pp. 132–141. isbn: 1-59593-047-7. doi: 10.1145/1064979.1064998.
[79] Michael Wolfe. “Advanced loop interchanging.” In: Proceedings of the International Conference on Parallel Processing. IEEE, 1986, pp. 536–543.
[80] Michael Wolfe. “More Iteration Space Tiling.” In: Proceedings of the ACM/IEEE Conference on Supercomputing. ACM Press, 1989, pp. 655–664. isbn: 0-89791-341-8. doi: 10.1145/76263.76337.
[81] Andreas Wöß, Christian Wirth, Daniele Bonetta, Chris Seaton, Christian Humer, and Hanspeter Mössenböck. “An Object Storage Model for the Truffle Language Implementation Framework.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2014, pp. 133–144. isbn: 978-1-4503-2926-2. doi: 10.1145/2647508.2647517.


[82] Thomas Würthinger, Christian Wimmer, and Hanspeter Mössenböck. “Array Bounds Check Elimination for the Java HotSpot™ Client Compiler.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2007, pp. 125–133. isbn: 978-1-59593-672-1. doi: 10.1145/1294325.1294343.
[83] Thomas Würthinger, Christian Wimmer, and Hanspeter Mössenböck. “Array bounds check elimination in the context of deoptimization.” In: Science of Computer Programming 74.5–6 (2009), pp. 279–295. issn: 0167-6423. doi: 10.1016/j.scico.2009.01.002.
[84] Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. “One VM to Rule Them All.” In: Proceedings of the ACM international symposium on New ideas, new paradigms, and reflections on programming & software. ACM Press, 2013, pp. 187–204. isbn: 978-1-4503-2472-4. doi: 10.1145/2509578.2509581.
[85] Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. “Self-optimizing AST Interpreters.” In: Proceedings of the Symposium on Dynamic languages. ACM Press, 2012, pp. 73–82. isbn: 978-1-4503-1564-7. doi: 10.1145/2384577.2384587.

Curriculum Vitæ

Gilles Marie Duboscq
Born on April 5, 1988
[email protected]

Education

2011–2016: Doctorate candidate Johannes Kepler University, Linz – Austria Research on the topic of Advanced Techniques for Speculative Optimization in a Dynamic Compiler. This is done in the framework of the Graal compiler, an optimizing, metacircular Just-In-Time compiler written in Java.

2010–2011: Master Université Paris Sud (Paris XI), Orsay (Paris) – France This Master had a research specialization in Computer Science. Elective courses in Compilers and Optimization and in Fault Tolerant Distributed Systems.

2008–2011: Master École Supérieure d’Électricité (Supélec), Gif-sur-Yvette (Paris) – France A top graduate school of engineering, specialized in computer science, electronics, telecommunications and energy. Graduated with a specialization in Computer Science & Software Engineering.

2006–2008: Classe préparatoire aux grandes écoles Lycée Michel Montaigne, Bordeaux – France Two years of advanced courses in mathematics, physics and computer science for preparation of the nationwide competitive entrance exams to top French graduate engineering and science schools.

2006: Baccalauréat Lycée de la mer, Gujan Mestras – France Baccalauréat with a major in maths, physics and engineering, with honors.


Experience

Since 2015: Senior Member of Technical Staff Oracle, Zürich – Switzerland Research and development around the Graal project. Working on the Graal compiler and the associated Truffle language implementation framework.

2011–2014: Research Assistant Johannes Kepler University, Linz – Austria Development of the Graal compiler in collaboration with Oracle Labs, working on most aspects of the compiler with a particular focus on the support and implementation of advanced speculative optimizations. Design, implementation, and maintenance of the continuous testing and benchmarking infrastructure used to track the progress of the Graal compiler and of other projects based on it.

April–September 2011: Intern Oracle, Grenoble – France Internship in the Maxine Java Virtual Machine project. Worked on the implementation of the Graal compiler as an evolution of the C1X compiler: changing its intermediate representation and implementing optimizations.

June–September 2010: Intern IBM, La Gaude – France eXtreme Blue internship on Smarter Rail. Worked in a team of four to build a prototype of an automated customer-service application for rail transportation companies, using J2EE technologies.

2010: Contract Job Junior Supélec Stratégie, Gif-sur-Yvette (Paris) – France In a team of two, worked on the design and implementation of a Drupal (PHP) website for a student group managing Supélec’s recruitment forum. The project included the development of several custom modules, covering customer relationship management, contract management, invoicing, and event planning & registration.

July 2009: Intern IBM, Montpellier – France Prototyping of cloud-computing dashboards in Java. Automated the gathering of data from the heterogeneous software of the cloud infrastructure and presented it on dashboards.

2009: Contract Job Junior Supélec Stratégie, Gif-sur-Yvette (Paris) – France Authored Java web-based management applications used by a leading French construction and public works company for their on-site medical clinics.

2008: Contract Job Junior Supélec Stratégie, Gif-sur-Yvette (Paris) – France Authored a Java-powered tool used by a leading aerospace, defense, and security company in France. The application generates human-readable reports by aggregating source-code audit dumps from multiple external tools.

Publications

Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. “Graal IR: An Extensible Declarative Intermediate Representation.” In: Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop. 2013.

Gilles Duboscq, Thomas Würthinger, and Hanspeter Mössenböck. “Speculation Without Regret: Reducing Deoptimization Meta-data in the Graal Compiler.” In: Proceedings of the International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. ACM Press, 2014, pp. 187–193. isbn: 978-1-4503-2926-2. doi: 10.1145/2647508.2647521.

Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. “An Intermediate Representation for Speculative Optimizations in a Dynamic Compiler.” In: Proceedings of the ACM Workshop on Virtual Machines and Intermediate Languages. ACM Press, 2013, pp. 1–10. isbn: 978-1-4503-2601-8. doi: 10.1145/2542142.2542143.

Doug Simon, Christian Wimmer, Bernhard Urban, Gilles Duboscq, Lukas Stadler, and Thomas Würthinger. “Snippets: Taking the High Road to a Low Level.” In: ACM Transactions on Architecture and Code Optimization 12.2 (June 2015), 20:1–20:25. issn: 1544-3566. doi: 10.1145/2764907.

Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, and Thomas Würthinger. “Compilation Queuing and Graph Caching for Dynamic Compilers.” In: Proceedings of the ACM Workshop on Virtual Machines and Intermediate Languages. ACM Press, 2012, pp. 49–58. isbn: 978-1-4503-1633-0. doi: 10.1145/2414740.2414750.


Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, Thomas Würthinger, and Doug Simon. “An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance.” In: Proceedings of the Workshop on Scala. ACM Press, 2013, 9:1–9:8. isbn: 978-1-4503-2064-1. doi: 10.1145/2489837.2489846.

Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. “One VM to Rule Them All.” In: Proceedings of the ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software. ACM Press, 2013, pp. 187–204. isbn: 978-1-4503-2472-4. doi: 10.1145/2509578.2509581.

Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. “Self-optimizing AST Interpreters.” In: Proceedings of the Symposium on Dynamic Languages. ACM Press, 2012, pp. 73–82. isbn: 978-1-4503-1564-7. doi: 10.1145/2384577.2384587.
