Symbolic Execution for Compiler Optimizations
Submitted by: Sebastian Kloibhofer, BSc
Submitted at: Institute for System Software
Supervisor: o.Univ.-Prof. DI Dr.Dr.h.c. Hanspeter Mössenböck
Co-Supervisor: DI Dr. David Leopoldseder
September 2020

Master Thesis to obtain the academic degree of Diplom-Ingenieur in the Master's Program Computer Science

JOHANNES KEPLER UNIVERSITY LINZ, Altenbergerstraße 69, 4040 Linz, Österreich, www.jku.at, DVR 0093696

Abstract

Modern compilers apply many different optimizations to programs without additional input from the developer. Method inlining, constant folding, and escape analysis represent just some of the ways in which program code may be transformed to increase run-time performance. Naturally, each optimization comes with a trade-off: for ahead-of-time (AOT) compilers, it implies longer build times, as more time is spent analyzing the source code. Just-in-time (JIT) compilers are affected even more, as it takes longer until optimization is done and the compiled code is available.

The GraalVM is a state-of-the-art Java Virtual Machine that specializes in such optimizations. The Graal JIT compiler actively gathers profiling information during program execution and then uses it to generate highly optimized machine code. While existing optimization phases are based on established patterns and heuristics to identify targets, formal methods may reveal further optimization potential via logical reasoning. The number of optimizations and the transformations that source code undergoes before being compiled make it even harder for developers to manually identify promising optimization targets, let alone verify them.

This thesis proposes an approach for integrating symbolic execution into the Graal compiler in order to find new optimization candidates automatically.
By utilizing the intermediate representation of the Graal compiler, we can generate holistic formulas from methods to enable reasoning about individual expressions and statements. This process allows us to search for optimization patterns, which may in turn be implemented as actual compiler phases. We showcase a custom optimization phase for the Graal compiler and multiple optimization techniques, which we evaluated on a variety of benchmarks as well as the Graal compiler test suite. Our findings include several bug reports for high-performance theorem provers, as well as multiple optimization patterns not yet recognized by the Graal compiler, including arithmetic operations, floating-point computations, and Boolean simplifications. An evaluation of the identified optimizations showed performance improvements of up to 15% in microbenchmarks.

Kurzfassung

Developers today benefit from a multitude of optimizations that modern compilers apply automatically. Transformations such as method inlining, constant folding, or escape analysis play an important role in improving run-time performance. At the same time, such optimizations usually come with drawbacks. For ahead-of-time (AOT) compilers, the applied optimization algorithms and the analysis of the program code result in longer build times. Just-in-time (JIT) compilers are affected even more, since they compile programs at run time, so additional optimizations may delay this process.

GraalVM is a modern Java Virtual Machine specialized in such optimizations. The Graal JIT compiler analyzes run-time information that is collected from running programs and later used to generate highly optimized machine code. The optimizations applied in this process are mostly based on established patterns and proven heuristics for finding optimization targets in programs and determining their conformity. Integrating formal methods could help to identify and prove new optimizations; manual approaches are made considerably harder by the large number of optimization steps between the program code and the resulting machine code.

In this thesis, we describe our approach for integrating a symbolic execution engine into the Graal compiler in order to identify new optimization targets automatically. Symbolic execution helps us generate logical formulas from the compiler's intermediate language (Graal IR) that describe the program semantics and, at the same time, allow us to check and prove corresponding optimizations. In a next step, patterns found this way can be integrated into the compiler as native optimization phases.

Based on this approach, we developed a compiler phase for the Graal compiler that uses symbolic execution to detect a wide variety of optimization targets. Benchmarks and existing Graal test suites yielded a number of previously unknown optimizations, including simplifications of arithmetic and floating-point operations as well as the detection of redundant comparisons and propositional-logic operations. In addition, the work on the symbolic execution engine revealed several bugs in modern theorem provers. Our concluding evaluation of the work and of the discovered optimization patterns showed performance improvements of up to 15% in microbenchmarks.

Contents

1 Introduction
  1.1 Motivation
  1.2 Challenges
  1.3 Contributions
  1.4 Thesis Structure
2 System Overview
  2.1 Java Virtual Machine
  2.2 GraalVM
    2.2.1 Graal Compiler
      2.2.1.1 Graal IR
      2.2.1.2 Graal Compilation Tiers
      2.2.1.3 Optimization Phases
3 Symbolic Execution
  3.1 Analysing Software Symbolically
    3.1.1 Symbolic Execution in the Field of Program Analysis
    3.1.2 Memory Modeling
    3.1.3 Constraint Solving
      3.1.3.1 SMT-LIB
    3.1.4 Limitations of Symbolic Execution
  3.2 Graal Symbolic Execution
    3.2.1 Preparation of the Graal IR
    3.2.2 Traversal and State Propagation
    3.2.3 Constraint Generation from Graal IR
      3.2.3.1 SymJEx Expression IR
      3.2.3.2 A Type System for Java Values in SMT-LIB
    3.2.4 Memory Modeling
    3.2.5 Constraint Solver Interface
    3.2.6 Limitations
      3.2.6.1 Restrictions on Input Domain
      3.2.6.2 Limitations of the Reasoning Process
      3.2.6.3 Limitations in Solving
4 Symbolic Execution for Compiler Optimizations
  4.1 Unbounded Reasoning
    4.1.1 Dominator-based traversal
    4.1.2 Block-based Constraints
  4.2 Optimization Algorithm
  4.3 System Architecture
  4.4 Symbolic Optimization Phase Implementation
    4.4.1 Error Handling
    4.4.2 Optimization Template Design
      4.4.2.1 Implemented Optimizations
  4.5 Example: Optimizing String.charAt
  4.6 Limitations
5 Evaluation
  5.1 Identified Optimization Patterns
  5.2 Solver-Time Analysis
  5.3 Compile-Time Analysis
  5.4 Performance Analysis
    5.4.1 Performance Microbenchmarks
  5.5 Discussion of Results
6 Related Work
  6.1 Symbolic Execution Engines
  6.2 Formal Methods in Compilers
  6.3 Symbolic Execution for Optimization
7 Future Work
  7.1 Implementation as Native Compilation Phases
  7.2 Extension of Capabilities
  7.3 Alternative Traversal Methods
  7.4 Performance Optimization
  7.5 Toolkit Improvements
8 Conclusion

Chapter 1
Introduction

In recent years, formal methods and tools have gained significant influence in the software development lifecycle.
System designers have started to use theorem provers to verify critical systems, operating system kernels, and compilers, while model-based or contract-driven development methodologies leverage precise specifications and requirements definitions in formal languages to ensure conformity of the corresponding implementations [155, 104]. In addition, research has shown that tools and methods in this area are manifold, ranging from model checking techniques [113] and run-time assertion checks [20] to applications in continuous integration environments [57].

One concept that is often explored in this context is symbolic execution. Symbolic execution is a program analysis technique that enables reasoning about erroneous program paths and states with respect to run-time semantics such as data flow, memory effects, loops, and, to a certain degree, I/O. Typically, symbolic execution engines create an algebraic model of a program that describes the behavior of individual statements and expressions, thus enabling analysis of execution paths. Theorem provers are then used to check whether program faults (failed assertions, null pointer dereferences, division by zero, etc.) can occur. Nowadays, symbolic execution engines are available for many different languages (source languages, intermediate languages, or even assembly), and research describes numerous success stories. Notable examples include bugs identified in NASA software [125], automated test generation for GNU core utilities [43], verification of kernel functions [101, 92], as well as the identification of security flaws in Windows [76].

Symbolic execution also proves to be a useful tool in other areas: Prepack [70], developed by Facebook, and SPEjs by Süslü et al. [149, 148] are partial evaluators that use symbolic execution to minimize source bundle sizes by simplifying intermediate computations and by removing dead code. Additionally, other approaches suggest applying symbolic execution and model checking in the