Submitted by Sebastian Kloibhofer, BSc

Submitted at Institute for System Software

Supervisor o.Univ.-Prof. DI Dr. Dr.h.c. Hanspeter Mössenböck

Co-Supervisor DI Dr. David Leopoldseder

September 2020

Symbolic Execution for Optimizations

Master's Thesis to obtain the academic degree of Diplom-Ingenieur in the Master's Program Computer Science

JOHANNES KEPLER UNIVERSITY LINZ Altenbergerstraße 69 4040 Linz, Österreich www.jku.at DVR 0093696

Abstract

Modern compilers apply many different optimizations to programs without additional input from the developer. Method inlining, constant folding, and escape analysis represent just some of the ways in which program code may be transformed to increase run-time performance. Naturally, each optimization also involves a trade-off: for ahead-of-time (AOT) compilers it implies longer build times, as more time is spent on analyzing the program code. Just-in-Time (JIT) compilers are affected even more, as they take longer until optimization is done and the compiled code is available.

GraalVM is a state-of-the-art Java virtual machine which specializes in such optimizations. The Graal JIT compiler actively gathers profiling information during program execution and then uses it to generate highly optimized machine code. While existing optimization phases are based on established patterns and heuristics to identify targets, formal methods may reveal further optimization potential via logical reasoning. The number of optimizations and the transformations which source code undergoes before being compiled make it even harder for developers to manually identify promising optimization targets, let alone verify them.

This thesis proposes an approach for integrating symbolic execution into the Graal compiler in order to find new optimization candidates automatically. By utilizing the intermediate representation of the Graal compiler, we can generate holistic formulas from methods to enable reasoning about individual expressions and statements. This process allows us to search for optimization patterns which may in turn be implemented as actual compiler phases.

We showcase a custom optimization phase for the Graal compiler and multiple optimization techniques, which we evaluated on a variety of benchmarks as well as the Graal compiler test suite. Our findings include several bug reports for high-performance theorem provers, as well as multiple optimization patterns not yet recognized by the Graal compiler, concerning arithmetic operations, floating-point computations, and Boolean simplifications. An evaluation of the identified optimizations showed performance improvements of up to 15% in microbenchmarks.

Kurzfassung

Developers today benefit from a multitude of optimizations that modern compilers apply automatically. Transformations such as method inlining, constant folding, or escape analysis play an important role in improving run-time performance. At the same time, such optimizations usually come with drawbacks: for ahead-of-time (AOT) compilers, the applied optimization algorithms and the analysis of the program code lead to longer build times. Just-in-time (JIT) compilers are affected even more, since they compile programs at run time, so additional optimizations may delay this process.

GraalVM is a modern Java virtual machine specialized in such optimizations. The Graal JIT compiler analyzes profiling information that is gathered from running programs and later used to generate highly optimized machine code. The optimizations applied in this process are mostly based on established patterns and proven heuristics to find optimization targets in programs and to establish their validity. The integration of formal methods could help to identify and prove new optimizations; a manual approach is severely hampered by the multitude of optimization steps between the program code and the resulting machine code.

In this thesis, we describe our approach for integrating a symbolic execution engine into the Graal compiler in order to identify new optimization targets automatically. Symbolic execution helps us generate logical formulas from the compiler's intermediate representation (Graal IR) that describe the program semantics and at the same time allow us to check and prove corresponding optimizations. In a next step, patterns found this way can be integrated into the compiler as native optimization phases.

Based on this approach, we developed a compiler phase for the Graal compiler that can detect a wide variety of optimization targets with the help of symbolic execution. Benchmarks as well as existing Graal test suites revealed a number of previously unknown optimizations, among them simplifications of arithmetic and floating-point operations as well as the detection of redundant comparisons and propositional operations. Furthermore, the work on the symbolic execution engine uncovered several bugs in modern theorem provers. Our concluding evaluation of the work and the identified optimization patterns showed performance gains of up to 15% in microbenchmarks.

Contents

1 Introduction
  1.1 Motivation
  1.2 Challenges
  1.3 Contributions
  1.4 Thesis Structure

2 System Overview
  2.1 Java Virtual Machine
  2.2 GraalVM
    2.2.1 Graal Compiler
      2.2.1.1 Graal IR
      2.2.1.2 Graal Compilation Tiers
      2.2.1.3 Optimization Phases

3 Symbolic Execution
  3.1 Analysing Software Symbolically
    3.1.1 Symbolic Execution in the Field of Program Analysis
    3.1.2 Memory Modeling
    3.1.3 Constraint Solving
      3.1.3.1 SMT-LIB
    3.1.4 Limitations of Symbolic Execution
  3.2 Graal Symbolic Execution
    3.2.1 Preparation of the Graal IR
    3.2.2 Traversal and State Propagation
    3.2.3 Constraint Generation from Graal IR
      3.2.3.1 SymJEx Expression IR
      3.2.3.2 A Type System for Java Values in SMT-LIB
    3.2.4 Memory Modeling
    3.2.5 Constraint Solver Interface
    3.2.6 Limitations
      3.2.6.1 Restrictions on Input Domain
      3.2.6.2 Limitations of the Reasoning Process
      3.2.6.3 Limitations in Solving

4 Symbolic Execution for Compiler Optimizations
  4.1 Unbounded Reasoning
    4.1.1 Dominator-based traversal
    4.1.2 Block-based Constraints
  4.2 Optimization Algorithm
  4.3 System Architecture
  4.4 Symbolic Optimization Phase Implementation
    4.4.1 Error Handling
    4.4.2 Optimization Template Design
      4.4.2.1 Implemented Optimizations
  4.5 Example: Optimizing String.charAt
  4.6 Limitations

5 Evaluation
  5.1 Identified Optimization Patterns
  5.2 Solver-Time Analysis
  5.3 Compile-Time Analysis
  5.4 Performance Analysis
    5.4.1 Performance Microbenchmarks
  5.5 Discussion of Results

6 Related Work
  6.1 Symbolic Execution Engines
  6.2 Formal Methods in Compilers
  6.3 Symbolic Execution for Optimization

7 Future Work
  7.1 Implementation as Native Compilation Phases
  7.2 Extension of Capabilities
  7.3 Alternative Traversal Methods
  7.4 Performance Optimization
  7.5 Toolkit Improvements

8 Conclusion

Chapter 1

Introduction

In recent years, formal methods and tools have experienced a significant gain in influence in the software development lifecycle. System designers have started to use theorem provers to verify critical systems, kernels, and compilers, while model-based or contract-driven development methodologies leverage precise specifications and requirements definitions in formal languages to ensure conformity of the corresponding implementations [155, 104]. In addition to that, research has shown that tools and methods in this area are manifold, ranging from model checking techniques [113] over run-time assertion checks [20] to applications in continuous integration environments [57].

One concept which is often explored in this context is symbolic execution. Symbolic execution is a program analysis technique that enables reasoning about erroneous program paths and states with respect to run-time semantics such as control flow, memory effects, loops, and, to a certain degree, also I/O. Typically, symbolic execution engines create an algebraic model of a program that describes the behavior of individual statements and expressions, thus enabling analysis of execution paths. Theorem provers are then used to check whether program faults (assertions, null pointers, division by zero, etc.) can occur. Nowadays, symbolic execution engines are available for many different languages (source, intermediate languages, or even assembly) and research describes numerous success stories. Notable examples include bugs identified in NASA software [125], automated test generation for GNU core utilities [43], verification of kernel functions [101, 92], as well as the identification of security flaws in Windows [76].

Symbolic execution also proves to be a useful tool in other areas: Prepack [70], developed by Facebook, or SPEjs by Süslü et al. [149, 148] are partial evaluators that use symbolic execution to minimize source bundle sizes by simplifying intermediate computations and by removing dead code. Additionally, other approaches suggest applying symbolic execution and model checking in the context of compilers or partial evaluators to ensure compliance of the produced results or to run the analysis on a compiler's intermediate representation [83, 96, 113].

This thesis extends such concepts to the inner workings of a modern Just-in-Time (JIT) compiler, the Graal compiler. This compiler uses profiling information gathered at run time to perform optimizations on an intermediate representation—Graal IR—and is a major component of the GraalVM [64, 146, 99]. GraalVM is a multi-purpose Java Virtual Machine (JVM) which specializes in run-time optimization of programs. It additionally offers the language implementation framework Truffle to allow execution of dynamic languages—JavaScript, Python, Ruby—and native languages such as C/C++, while fully utilizing the capabilities of the Graal compiler [156, 159].

In this work, we present a novel compilation phase that uses symbolic execution for proving additional optimizations outside of existing Graal optimization phases. While existing optimizations are based on common patterns and heuristics developed over years, this technique allows us to identify further patterns by checking properties of frequent IR nodes and applying global reasoning over whole method graphs.
Conceptually, we generate a set of constraints for statements—similar to traditional symbolic execution engines—and then check a predefined set of Graal IR nodes for optimization potential such as constant inputs/results.

1.1 Motivation

Compiler optimizations aim to simplify program code by transforming expensive operations such as method calls or loops into cheaper alternatives and by reducing the overall size of the executed code while preserving the original program semantics. Compiler optimizations are typically defined on recurring patterns with limited scope, so as not to compromise compilation time. The end goal nevertheless remains the same: simplification of the underlying code via reduction or by replacing expensive operations with cheaper alternatives (e.g., performing a method call vs. inlining the method code). Eventually, this leads to better-optimized machine code and therefore run-time benefits such as higher performance or better memory utilization.

With this project, we aim to identify additional optimizations by checking individual operations (nodes) within the Graal compiler's intermediate representation (Graal IR). Using symbolic execution, we can, for example, show that certain operations always result in constant values or that individual conditions always represent a tautology or contradiction, therefore rendering whole program paths infeasible. Thus, we can reduce program complexity and subsequently the run time. Integration into the Graal compiler allows us to reap the benefits of the IR, in the form of existing compiler optimizations and additional compile-time information, compared to typical analysis methods on source code.

The benefit of using symbolic execution for optimizations is the ability to reason about global effects. By breaking down complex control and data flow structures into logical constraints, we can check inputs or results at arbitrary locations in the code (IR). This significantly extends the possibilities for optimizations compared to built-in phases, which are usually limited to local observations. A subsequent analysis of those results may then reveal previously undetected or ignored optimization patterns. Implementing such patterns in an effective way would then make those optimizations available in general and avoid the overhead that symbolic execution inevitably entails. Ideally, this process yields new compiler phases or highlights weaknesses in existing ones, which can then be adapted according to the new insights.
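To give a flavor of the kind of pattern this enables (a hypothetical example of ours, not one of the patterns evaluated later): symbolic reasoning can prove a branch condition to be a tautology even across an overflowing computation, something a purely local pattern match would not attempt.

// Hypothetical target of the kind described above: for every int x, x * 2 is
// even (two's-complement overflow preserves parity), so the condition below is
// a tautology and the second return is provably dead.
static int alwaysOne(int x) {
    int y = x * 2;       // may overflow, but remains even for all inputs
    if (y % 2 == 0) {
        return 1;        // always taken
    }
    return 0;            // unreachable: a symbolic proof allows removing it
}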

1.2 Challenges

As this project touches upon both symbolic execution and JIT compilation, we also have to face several challenges in both areas. Research has shown that certain program constructs represent challenges for contemporary symbolic execution engines or have to be approximated to allow further reasoning. Similarly, we are bound to the restrictions of the Graal compilation process as well as the context in which the compiler phase is executed.

Loops The behavior of loops is hard to model with static constraints such as those used in symbolic execution [24]. When the iteration count of a loop cannot be determined statically, symbolic execution faces a problem called state explosion—a rapid increase in potential program paths that exceeds the capabilities of the engine or even the system. This is due to the unknown number of times that the loop may be iterated, whereby symbolic execution engines naively have to consider all possible paths—i.e., iterations—through the loop. Therefore, symbolic execution engines use techniques such as approximations or unrolling of several loop iterations, trading precision for feasibility. In this system, we leverage some of the capabilities of the Graal compiler to gain more information about loops but fundamentally share the same limitations.
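A minimal sketch of the problem (illustrative only): in the loop below, every iteration adds an independent branching decision, so a naive symbolic executor tracks 2^n paths for n iterations, and n itself is unknown when the bound is symbolic.

// Illustration of state explosion: each iteration branches independently,
// yielding 2^n feasible paths for n iterations; with a symbolic bound n the
// number of iterations to consider is itself unbounded.
static int countPositive(int[] a, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) { // unknown trip count for symbolic n
        if (a[i] > 0) {           // independent decision per iteration
            count++;
        }
    }
    return count;
}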

Memory effects Contemporary symbolic execution engines often use a memory or heap model to capture memory operations. Due to our focus on low-level Graal IR and the resulting deviations from the standard Java memory model, we restrict our optimizations to primitive types. Nevertheless, we still have to consider memory accesses when generating constraints. As our optimizations have to apply for all values, we can neither ignore memory reads/writes nor make naive assumptions. Thus, we treat such values as unrestricted, unknown variables to loosen their constraints while still preventing false positives. Despite that, our evaluation still shows that we can reason about optimizations even when memory is involved.
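The following sketch (a hypothetical example with an assumed Holder class) shows why this conservative treatment still leaves room for proofs: a heap read becomes a fresh, unconstrained symbolic value, yet some facts hold for every possible value.

// A heap read is modeled as an unconstrained symbolic value v; an optimization
// must hold for every v. Here, v - v == 0 for all possible field contents, so
// the result can still be proven constant despite the memory access.
class Holder { int field; }

class MemorySketch {
    static int provablyZero(Holder h) {
        int v = h.field;   // unknown contents: fresh symbolic variable v
        return v - v;      // 0 for every v, a valid folding target
    }
}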

Java-SMT-Mismatch Internally, our approach leverages the SMT-LIB format to invoke SMT solvers. This format is standardized and widely supported by theorem provers, but we identified several instances where the semantics of certain operations do not match the definitions from the JVM specification. Therefore, we have to emulate certain Java-specific behavior to keep our formulas sound.
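One concrete instance of such a mismatch (our illustration; the exact cases are not enumerated here): Java masks the shift distance of int shifts to the lowest five bits (JLS §15.19), whereas SMT-LIB's bvshl shifts by the full bit-vector value, so a faithful encoding must insert the mask explicitly. A hypothetical helper emitting the corrected SMT-LIB term:

// Hypothetical encoding helper (not SymJEx's actual API): translate Java's
// 32-bit left shift into SMT-LIB. Java evaluates x << d as x << (d & 0x1f),
// while bvshl shifts by the full value of d, so the mask must be made explicit.
static String encodeIntShl(String x, String d) {
    return "(bvshl " + x + " (bvand " + d + " #x0000001f))";
}

Without the mask, a query could, for example, claim that x << 32 is always 0, whereas Java defines it to be x.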

Graal Compiler Ecosystem As our proposed symbolic execution compiler phase is tightly coupled with the Graal compiler, all performed optimizations have to abide by the rules of the GraalVM. This means that the IR must remain stable across all changes that occur while performing optimizations. Since the IR is subsequently translated to machine code, inconsistencies or flaws directly impact performance or can introduce faults into the compiled code. Similarly, due to the level at which optimizations take place, the IR may already contain machine-dependent structures (e.g. specialized operations, intrinsics) [64, 123]. The mapping from IR to constraints has to take this into account.

1.3 Contributions

Our contributions in this area center around integrating a symbolic execution engine into the Graal compiler. We adapted and contributed to the symbolic execution engine on GraalVM, called SymJEx. This represents the base of our approach by allowing us to generate constraints from Graal IR and invoke theorem provers. During this process, testing also revealed several faults in popular solver backends, which we subsequently identified and reported in [3, 4, 13, 7].

Then, we developed a compilation phase that works on a scheduled Graal IR in the low-tier context of the Graal compiler and uses the aforementioned engine to generate constraints. The phase features a framework for individual optimizations on specific Graal IR nodes, based on the accumulated symbolic constraints for a given program state. Notable examples of such optimizations include checks for Boolean operations, (integer and floating-point) arithmetic, as well as bitwise operations. The implemented framework also allows us to easily expand our optimizations to new target operations, while the overall phase can be added as a plug-in to the GraalVM.

The optimization phase also introduces a dominator-based algorithm (Section 4.1) for traversing the Graal IR to generate valid constraints that map supported Java semantics to a single logic formula. This is accomplished via an extension of the symbolic execution engine that, on the one hand, supports this reasoning process and, on the other hand, correctly manages the IR at this level. Our approach to generating constraints differs from that of traditional symbolic execution engines in that we propagate a holistic formula across all program states by linking constraints to corresponding program blocks (cf. Section 2.2.1.1)—a process we call block-based reasoning (Section 4.1.2). This allows us to generate the whole formula and cover all operations of a method in a single traversal of the IR.

Finally, we performed an extensive evaluation of the optimization phase to find potential improvements for the Graal compiler. To this end, we used benchmark suites such as SPECjvm2008, DaCapo, and Octane, as well as CLBG¹ examples to stress the phase and verify the applied optimizations. We furthermore evaluated the effects of the optimizations on the compilation process and the resulting code. Additionally, we captured identified optimizations in a microbenchmark suite to measure their impact on run-time performance.

1.4 Thesis Structure

The first part of this thesis—Chapter 2—explains the system architecture and describes the foundation on which this project is built, namely the GraalVM and its compiler. First, the chapter gives a brief overview of the components of the GraalVM. Then, some of the novel techniques of the Graal compiler, such as speculative optimizations or the intermediate representation, are explained, in combination with an overview of existing optimizations. Additionally, the compilation process is outlined to highlight where and at which level in the compiler our phase is executed. We also describe compiler-specific concepts such as basic blocks and node scheduling, as they are key to our unbounded reasoning approach.

Chapter 3 is dedicated to the concept of symbolic execution. We explain the key aspects of this technique and highlight the differences from related program analysis techniques. Constraint generation from (source) code is an important part of every symbolic execution engine, hence this chapter explores some approaches that are proposed in the literature to create and optimize logic formulas. Furthermore, we go into detail about the application of theorem provers and briefly explain the SMT-LIB standard. This standard and language format emphasizes the capabilities as well as the limitations of common solvers and therefore impacted the development of our own symbolic execution engine for GraalVM.

¹benchmarksgame-team.pages.debian.net/benchmarksgame

This engine—SymJEx—is subsequently described in detail, as it was developed simultaneously and is used in our approach to generate and store constraints and as an interface to the theorem provers. We showcase the advantages that the GraalVM provides compared to typical execution environments and explain the major components and ideas behind our engine.

The core part in Chapter 4 explains the concept of using symbolic execution to identify compiler optimizations. It very much embraces the ideas of symbolic execution; nevertheless, we had to adapt some processes, such as formula creation and solver interaction, to enable this kind of use case. Those deviations and our approach to traversing the IR are described in more detail in that chapter. Additionally, it features an outline of the general algorithm behind the compilation phase, as well as a list of the implemented optimization templates.

The evaluation of our solution is showcased via several benchmark results, which are displayed in Chapter 5. In addition to performance measures, we also show the overhead that solving/optimizing introduces during compilation and showcase the performance impact on a dedicated microbenchmark suite.

Finally, Chapter 6 covers other applications of symbolic optimizations in similar contexts and also mentions other research in this area. This leads to an outlook on further improvements and future work in Chapter 7, followed by a summary and conclusion in Chapter 8.

Chapter 2

System Overview

From a technical perspective, our system is intertwined with the Java environment, specifically the Java Virtual Machine and its target derivative, the GraalVM. In this section, we discuss those components and present some of the intricacies of both the JVM and GraalVM, with brief mentions of components such as the Truffle framework, polyglot execution, and SubstrateVM. As the compilation process is one of the main topics of this thesis, we put much focus on the Graal compiler. Specifically, the intermediate representation, the compilation process, and some of the optimization phases are described in more detail, as they are important later on when we discuss our particular implementation and its integration into the GraalVM.

2.1 Java Virtual Machine

The Java Virtual Machine (JVM) is a virtual machine in which Java bytecode is executed [150], typically consisting, among other components, of a bytecode interpreter as well as a Just-in-Time (JIT) compiler. In most JVM implementations, the bytecode is initially executed by an interpreter, while the JIT compiler generates machine code from frequently executed parts. This facilitates significant speed-ups compared to the slow interpreter, while still preserving the platform independence of the Java bytecode format.

Executing bytecode also entails that the JVM is not restricted to the Java programming language but supports all source languages that compile to Java bytecode. Hence, languages such as Kotlin [6], Scala [9], Clojure [2], or Apache Groovy [1] can be executed with the additional benefit of interoperability on the bytecode level. Due to its performance and stability, over the years several other languages also received ports for execution on the JVM, notably Jython (Python) [14] and JRuby [8]. Figure 2.1 contains an overview of the JVM and depicts the transformation from source code in various languages to Java bytecode via the Java compiler, as well as the consecutive interpretation and compilation at run time.

The bytecode instructions of a Java program are stored in binary format as so-called class files [150]. At run time, the class loader of the corresponding JVM implementation loads the bytecode from those files and verifies the types and definitions to prevent security issues. The virtual machine at run time works with a garbage-collected heap for object and array allocations as well as a separate region for method frames. The actual garbage collection algorithms are specific to the individual implementations.

While multiple vendors offer JVM implementations, one of the most commonly distributed instances stems from Oracle, namely HotSpot.

[Figure: source code (Java, Scala, Kotlin, ...) is compiled by javac/scalac/kotlinc to bytecode (class files), which the Java class loader loads, links, and initializes into the run-time data area (method area, stack, heap, registers). The execution engine initially interprets the bytecode directly and compiles hot spots to machine code via the JIT compiler (HotSpot client/server compiler); the Graal JIT compiler integrates via JVMCI.]

Figure 2.1: Overview of the Java Virtual Machine.

The HotSpot JVM is typically shipped in the Java Runtime Environment (JRE)—for execution on end-user devices—or the Java Development Kit (JDK)—for development purposes. It implements both the interpreter and the JIT compiler as well as multiple garbage collection algorithms. At run time, the VM actively tracks hot spots—i.e., frequently executed code parts—to target them specifically for compilation and subsequent optimization. To serve different usage scenarios, HotSpot features two versions: client [93] and server [123]. While the client version ensures fast start-up times by utilizing the interpreter more frequently, the server VM puts more emphasis on the JIT compiler to achieve better run-time performance. More aggressive optimizations (inlining [129], global value numbering [64, 51], global code motion [51]) and therefore longer compilation times are especially useful for long-running server applications where the initial warm-up time is not a key metric [123].

2.2 GraalVM

GraalVM [64, 146, 99] is a multi-language Java Virtual Machine that is specialized in run-time performance and polyglot execution to allow cross-language interaction between supported languages. It not only allows execution of typical JVM languages—Java, Scala, Kotlin—but also of dynamic languages such as JavaScript [16], Python [17], or Ruby [18], in addition to languages that translate to LLVM bitcode [97] (C/C++). For this purpose, GraalVM offers the Truffle Language Implementation Framework, which enables guest language execution via the implementation of an Abstract Syntax Tree (AST) interpreter [159, 157]. Additionally, different deployment targets such as common OpenJDK environments, Node.js, or Oracle Database are supported [122]. With Native Image / Substrate VM (SVM) [156], GraalVM also introduces Ahead-of-Time (AOT) compilation to the JVM ecosystem. This enables users to generate standalone executables from Java bytecode, thus mitigating the start-up overhead that is typically introduced by execution on the JVM, in combination with aggressive offline optimizations and points-to analysis at the cost of longer build times.

2.2.1 Graal Compiler

Similar to other compilers, the HotSpot server and client compilers are written in a systems language—C++ in this case. While this approach seems obvious to minimize compile time, the compiler of the GraalVM somewhat contradicts this idea. The Graal compiler [64, 146, 99], a dynamic Just-in-Time (JIT) compiler, sits at the heart of the GraalVM and parses the incoming bytecode of target methods. It is written in Java to increase maintainability and extensibility, as well as to allow integration of other GraalVM tools such as the Truffle framework or GraalVM Native Image [159, 156].

The main goal of the Graal compiler is to generate highly optimized machine code. To achieve this, the compiler uses profiling information that is provided by HotSpot and gathered during execution. Methods that exceed certain invocation count thresholds are compiled, with the profiling information being used to optimize the code accordingly—unrolling loops to a certain degree, inlining frequently called methods, etc.

Another key benefit based on profiling information comes from speculative optimizations. Speculative optimizations are a technique that Graal employs to specialize “hot” methods with respect to frequent execution semantics, such as recurring types and branching decisions. As an example, Graal could inline polymorphic method calls if it observes stable targets in the profile. At run time, this avoids the overhead of the call and could lead to further optimizations on the basis of assuming a specific receiver type. While running this optimized and compiled version, the runtime ensures with added checks that the assumptions remain correct at each invocation and falls back to the interpreter otherwise. While this concept—known as deoptimization [80, 71]—is not unique to Graal, its IR offers several benefits to reduce the overhead and optimize the transition from compiled code back to the same program state in the interpreted version [66, 67, 65].

Fig. 2.2 shows the interaction between the Java HotSpot VM, the Graal compiler, and its components, as well as our specific compilation phase (“symbolic execution optimizations”). As depicted, Graal integrates with HotSpot via the JVM Compiler Interface (JVMCI) [131]. This enables installation of machine code in the VM and provides a native interface to access the corresponding bytecode. This bytecode is subsequently parsed to generate the intermediate representation of the Graal compiler—Graal IR. In the course of the compilation, this IR is gradually modified and specialized and finally used to generate the machine code which is eventually executed. The individual parts and details of this figure are explained in more detail in the next paragraphs.
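To make the speculation idea concrete, the following self-contained sketch (hypothetical types, not actual compiler output) shows the shape of a speculatively devirtualized call site: the profiled receiver type guards an inlined fast path, and any other type escapes to a slow path that stands in for deoptimization.

// Conceptual sketch of speculative devirtualization with a deoptimization
// fallback. The guard encodes the profile's assumption "receiver is Circle".
interface Shape { double area(); }

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class SpeculationSketch {
    static double areaCompiled(Shape s) {
        if (s instanceof Circle) {          // guard derived from profiling data
            double r = ((Circle) s).radius;
            return Math.PI * r * r;         // inlined Circle.area()
        }
        return areaSlowPath(s);             // models the deoptimization path
    }

    static double areaSlowPath(Shape s) {
        return s.area();                    // the interpreter takes over here
    }
}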

[Figure: the Java HotSpot VM passes Java bytecode (class files) to the Graal compiler via the JVM Compiler Interface (JVMCI). The frontend's graph builder (bytecode parser) produces Graal IR, which passes through the high-tier (inlining, dead code elimination, partial escape analysis, loop unrolling, lowering), the mid-tier (lock elimination, guard lowering, frame state assignment, write barrier addition, lowering), and the low-tier (logic expansion, memory barrier insertion, scheduling, and the added symbolic execution optimizations); the backend (AMD64, AArch64, SPARC, ...) emits machine code.]

Figure 2.2: The architecture of the Graal compiler, its interaction with the Java HotSpot VM and the added symbolic execution optimization phase.

Since the Graal compiler also has to compile itself on execution—including the warm-up phase when the compiler is run in the interpreter—there is also the option to use an AOT-compiled, native binary called libgraal that eliminates the typical warm-up time [144]. Otherwise, all of the compiler's codebase runs through the same steps as the executed Java code and is also optimized at run time. The AOT capabilities of the GraalVM have been integrated into OpenJDK as of Java 9 to enable native code compilation of end-user code with Graal as its code-generating backend [132]. As of Java 10, the Graal compiler itself is available as an experimental compiler as part of the JDK [5].

2.2.1.1 Graal IR

The intermediate representation of Graal is known as Graal IR. Structured as a directed graph, its nodes represent statements or expressions, while the edges denote inputs, control/data flow, or similar dependencies. During compilation, the IR is generated from the parsed bytecode and subsequently used in most of the operations within the compiler. The IR is not a one-to-one translation of the bytecode but is already transformed into static single assignment form (SSA) [58]. This means that every “variable” is only assigned once, which brings benefits to subsequent optimizations. Additionally, the IR distinguishes between control flow and data flow as well as between side-effecting and side-effect-free nodes. Each node manages links to input nodes and also keeps track of reverse edges from its usages, thus allowing free traversal of the graph in arbitrary directions. Additionally, information about the underlying expression is propagated, such as the expected type (including value ranges for primitives), length providers for array operations, or intermediate links between nodes (e.g., phi nodes and merges, loop exits and their headers) [64].

A single IR graph typically denotes a single method (including potentially inlined method calls), which originates in a StartNode and ends either in a (successful or exceptional) return from the method or a DeoptimizeNode (cf. speculative optimizations in Section 2.2.1). The graph, therefore, has to maintain a schedule that defines a partial order over all nodes, so that valid code is emitted at the end of the compilation. Graal applies code motion [51] by distinguishing between fixed nodes, which are placed at a specific location in the control flow (e.g., memory accesses, side-effecting computations), and floating nodes, whose computation may potentially be delayed until the corresponding result is required (arithmetic operations, comparisons) [64]. The different scheduling phases then arrange all those nodes in an optimal order, based on their usages and the current compilation context.

1 int getOrDefault(int[] arr, int i, int alt) {
2     if (i < 0 || i >= arr.length)
3         return alt;
4     return arr[i];
5 }
Listing 2.1: Method for safe array access

Figure 2.3 shows the Graal IR representing the source code of the simple method from Listing 2.1 that allows safe array access. It depicts the different control flow branches that are introduced by the compiler to account for the different program paths in the method and also highlights the SSA form of the data flow nodes (constant 0, arguments arr and i). While the source code already checks for illegal array accesses, the Graal compiler adds an additional check right before the array access—it may come as no surprise that such a check represents an ideal target for later optimizations (cf. Chapter 4).

Additionally, the IR contains a merge block that unites branches into a common path. Here, the paths for out-of-bounds indices are combined and result in a corresponding deopt node which—when reached at run time—causes the VM to switch to interpreted mode. The compiler reduces this branch since it may have determined the probability of taking it to be low (or significantly lower than that of the main branch). This reduces the complexity of the IR and the compilation overhead, as exception handlers or other unlikely paths do not have to be parsed. Therefore, not all of the method is compiled to machine code and the compiler can focus on optimizing the main program paths [66].

The merge block additionally has a link to a phi node. This node represents an input that depends on the taken branch. Here, the phi node defines the deoptimization action which is performed in case this branch is taken. As an example, one branch may directly invalidate the compiled code and schedule recompilation, while another branch may enforce an immediate switch to the interpreter. Other common usages for phi nodes are loop indices, where the value may either come from a previous iteration or from the initialization before the loop.
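As a source-level illustration of the phi concept (our example, not taken from the figure):

// After the merge point, result depends on the taken branch; in SSA form the
// two definitions become result1 and result2, and the read after the merge
// becomes result3 = phi(result1, result2).
static int pick(boolean c) {
    int result;
    if (c) {
        result = 1;    // result1 in SSA form
    } else {
        result = 2;    // result2 in SSA form
    }
    return result;     // reads phi(result1, result2)
}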

[Figure: the Graal IR graph of Listing 2.1. Control flow nodes (Start, If, Begin, End, Merge, Return, Deopt) are connected by control flow edges and grouped into basic blocks; data flow nodes (constant 0, arguments arr and i, ArrayLength, the comparisons < and |<|, LoadIndexed, Phi) float between them. The compiler-inserted unsigned bounds check |<| guards the LoadIndexed access, and the out-of-bounds paths meet in a Merge whose Phi selects the Deopt action.]

Figure 2.3: Annotated Graal IR of a simple method

One detail that has been omitted from the illustration is the attached frame state for deoptimization edges. To enable a fluent transition from compilation to interpreter mode, such nodes capture the state of the VM (e.g., links to memory effects that have occurred) and also memorize the local variable settings as well as a reference to the current position in the bytecode (bytecode index or bci). Upon deoptimization, the corresponding frame state node is used to recreate the state in the interpreter.

Basic Blocks and Dominator Trees Nodes are usually grouped into basic blocks [123], which represent the largest uninterrupted sequences of code with in- or outgoing branching instructions only at block start or end. Conditions and true/false branches, loop exits and continuation points, merge blocks that combine control flow from different branches, or sequential control flow—these different links between blocks allow the creation of the so-called dominator tree [98]. One basic block dominates another basic block if every path to the dominated block inevitably leads through the dominating block. To reach either the true or the false successor of an “if”, the block holding the condition must always be traversed first, regardless of the execution context. Therefore, in this example, the conditional block dominates both branch successors. A subsequent merge block that unites both successor branches (e.g., to represent common code after an “if”) is at the same time also dominated by the conditional block, but not by any basic block of the branches, since either branch may be visited to reach this merge. Scheduling the graph now involves determining a valid order of all basic blocks that also respects the dominator relation, as well as the optimal ordering of nodes within each block [64, 51]. The concept of basic blocks is revisited later in this thesis in Section 4.1.2, when we discuss our traversal approach and our methods for creating constraints.

All basic blocks of the IR in Fig. 2.3 are visualized as well. We see how each branch spawns two new basic blocks and how a merge node combines branches from different basic blocks. Additionally, some basic blocks do not actually contain any logic (e.g., those that only contain begin and end nodes) but are nevertheless managed in the IR. During scheduling, the compiler assigns the different nodes to the basic blocks such that they are always arranged before their actual usages.

Like most other Graal components, the IR is defined in Java for increased maintainability and extensibility. In our implementation, we use the definitions of the IR to directly interact with nodes and access their properties (cf. Chapter 4).
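The dominator relation just defined can be computed with the classic iterative data-flow algorithm; the following is a minimal self-contained sketch of that textbook approach (not Graal's implementation, which operates on its own block structures):

import java.util.BitSet;

// Textbook iterative dominator computation: Dom(entry) = {entry}; for every
// other block b, Dom(b) = {b} ∪ ⋂ Dom(p) over all predecessors p of b.
// Iterate until a fixpoint is reached.
class Dominators {
    /** preds[b] lists the predecessors of block b; block 0 is the entry. */
    static BitSet[] compute(int[][] preds) {
        int n = preds.length;
        BitSet[] dom = new BitSet[n];
        for (int b = 0; b < n; b++) {
            dom[b] = new BitSet(n);
            if (b == 0) dom[b].set(0);  // the entry dominates only itself
            else dom[b].set(0, n);      // all other blocks start at "all blocks"
        }
        for (boolean changed = true; changed; ) {
            changed = false;
            for (int b = 1; b < n; b++) {
                BitSet next = new BitSet(n);
                next.set(0, n);
                for (int p : preds[b]) next.and(dom[p]); // intersect predecessors
                next.set(b);                             // a block dominates itself
                if (!next.equals(dom[b])) { dom[b] = next; changed = true; }
            }
        }
        return dom;
    }
}

For the diamond of Fig. 2.3 (block 0: condition, blocks 1 and 2: branches, block 3: merge), compute(new int[][]{{}, {0}, {0}, {1, 2}}) yields Dom(3) = {0, 3}: the merge is dominated by the condition but by neither branch, exactly as described above.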

2.2.1.2 Graal Compilation Tiers

During compilation, the Graal IR undergoes several stages before the actual code generation. These levels—known as compilation tiers—contain optimization phases that gradually transform the IR from a high-level intermediate representation to a low-level, system-dependent version. This enables more abstract/generic optimizations to work on higher levels, while memory- or platform-dependent operations may occur in the low-tier of the IR. This process is also depicted in Fig. 2.2. Note that this concept is not to be confused with the principle of tiered compilation, which enables different levels of execution (interpreted, compiled) in the HotSpot VM [121].

The Graal compiler defines three compilation tiers, namely high-tier, mid-tier, and low-tier. Initially, the Graal IR retains most of the Java semantics in the high tier, meaning that high-level structures such as objects and fields as well as the corresponding forms of allocation and access are directly preserved. Gradually, this IR is specialized with respect to the target hardware, resulting in individual nodes of the IR being replaced by platform-specific variants—for example, the generic FloatConvertNode, which is replaced by an AMD64FloatConvertNode¹ on x86-64 architectures. This process is usually called “lowering” and is expressed as an additional phase in each of the compilation tiers (cf. Fig. 2.2). At the low-tier, field accesses and object allocations are replaced with address lookups, computations, and other nodes representing raw memory operations, thus reducing the Java memory semantics to a C-like model. The final IR, therefore, is very much platform-specific and also uses numerous intrinsics to optimize certain operations (mathematical functions, string operations) [145].

While our symbolic execution engine—SymJEx—is mostly based on high-tier IR, we aim to do optimizations at the low-tier, to ensure that we do not interfere with preceding optimizations already existing in the Graal compiler. This entails that we have to deal with a potentially very diverse and highly specialized node set. Also, we cannot rely on the comforts and safety of the Java memory model, hence memory operations have to be considered while evaluating optimizations.

¹https://github.com/oracle/graal/blob/c0113302e6d257d6f9b811d2d8080521958a4e84/compiler/src/org.graalvm.compiler.replacements.amd64/src/org/graalvm/compiler/replacements/amd64/AMD64FloatConvertNode.java

2.2.1.3 Optimization Phases

As already mentioned, while going through the different compilation levels, multiple phases are applied to the graph that gradually try to optimize the IR. While “offline” compilers perform the bulk of the work before actually running the program, JIT compilation happens on the fly, therefore immediately impacting run time. Thus, it is vital for such optimizations to be as fast as possible and to strike a balance between compile-time overhead and actual performance gain. The GraalVM features several optimization phases that are specifically built for that purpose. Each of those phases receives the current version of the graph, in addition to contextual information about the current compilation, and modifies the IR. One example thereof is the InliningPhase², which uses the accumulated profiling information to decide whether to inline individual method calls. This decision may be based on the receiver type, the target method size, or whether the targets of polymorphic calls can be inferred [129]. We furthermore mention several other compilation phases in their corresponding compilation tiers in Fig. 2.2.

Despite there being no direct dependency between most phases, they are nevertheless executed in a predefined order, with canonicalization steps scheduled in between. Canonicalization in Graal describes a transformation phase that applies the bulk of the individual node simplifications, such as constant folding, branch pruning, or global value numbering (a conceptual sketch follows below). In fact, we also apply the canonicalizer as a cleanup step after all optimizations were attempted, to finally reduce the graph according to our changes in the IR.

As shown in the overview in Fig. 2.2, our optimization phase is placed at the very end of the low-tier optimization pipeline. This is due to the fact that we, on the one hand, require a fully scheduled graph as input and, on the other hand, do not want to interfere with existing optimizations (i.e., prove optimizations via symbolic execution that native phases cover anyway). While there are many more nuances and steps to the compilation process in Graal, this should cover some of the most important and most interesting aspects, as well as provide the necessary basis for the topics going forward.

²https://github.com/oracle/graal/blob/1928ecd565097a6416f3614be0e174c9e617d3b7/compiler/src/org.graalvm.compiler.phases.common/src/org/graalvm/compiler/phases/common/inlining/InliningPhase.java
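To illustrate the kind of local rewrite canonicalization performs (a simplified sketch in plain Java, not Graal's Node API):

// Strength reduction in the style of a canonicalization rule: a multiplication
// by a power-of-two constant is rewritten to a shift. Real canonicalizers
// perform such rewrites on IR nodes rather than on concrete integers.
static int mulByConstant(int x, int c) {
    if (c > 0 && (c & (c - 1)) == 0) {                  // c is a power of two
        return x << Integer.numberOfTrailingZeros(c);   // cheaper shift form
    }
    return x * c;                                       // no simpler form known
}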

Chapter 3

Symbolic Execution

Symbolic execution [91] is the key component of our approach that sets it apart from traditional program optimization techniques. This chapter, therefore, is solely dedicated to this topic in order to explain some of the main ideas and theories behind it. Additionally, we highlight challenges of the approach and also provide some examples of problems that can be solved via symbolic execution. One of the main drivers behind the idea of symbolic-execution-based compiler optimizations was the development of a dedicated symbolic execution engine for GraalVM, SymJEx. This engine represents the roots of our approach and directly impacts how we generate constraints and how we interact with theorem provers. Therefore, we also describe this project in more detail and highlight the most important components, such that we can later discuss our extensions and changes to this engine that made our approach viable.

3.1 Analysing Software Symbolically

Symbolic execution fundamentally is a program analysis technique that originated in the 1970s to generate verification conditions and test cases from programs [91, 50, 90, 36, 84]. In recent years, the concept has gained newfound attention due to technological advancements in the fields of theorem proving and model checking [161, 45], leading to wide acceptance across the field of software verification. Symbolic execution is used to track faults and security issues within software by mapping program semantics to logical constraints. The idea behind symbolic execution is to effectively interpret a program and treat input values or other contextual information as so-called symbolic variables [50, 76, 44]—i.e., variables with an unknown value (and sometimes even arbitrary types when considering polymorphism). Those values are subsequently constrained and refined via logic formulas that are generated from statements and expressions within the code. Symbolic execution engines then use theorem provers [54] to check whether constraints describing error states are (un)satisfiable (may / may not be reached via a valid program path) and extract counterexamples and assignments for inputs and symbolic variables to reproduce such faults and generate test cases [91].

Engines typically follow the control flow of a program while managing a program state that describes the current formula and any associated mappings such as symbolic memory and type information. On branching instructions, execution is split up into the individual branches while adding the corresponding conditions to their states. This accumulates to a path condition [79, 43], whose formulas define the program path up to this point. Additionally, it allows reasoning over the program inputs that are required to reach certain states such as exceptions—e.g., whether the index of an array access is always within its boundaries—or to also define the behavior in terms of its final states when returning successful results. The latter approach is especially useful for generating test cases that verify the program semantics over multiple outputs. Compared to common testing utilities, symbolic execution therefore enables much higher coverage, since it is not bound to predefined inputs or states (cf. unit testing).

1  boolean isPalindrome(String s) {
2      if (s == null)
3          return false;
4      if (s.length() <= 1)
5          return true;
6      for (int i = 0, j = s.length()-1; i != j; i++, j--) {
7          if (s.charAt(i) != s.charAt(j))
8              return false;
9      }
10     return true;
11 }
12 /* class String: char[] value; */
13 char charAt(int index) {
14     if ((index < 0) || (index >= value.length))
15         throw new StringIndexOutOfBoundsException(index);
16     return value[index];
17 }

Listing 3.1: Java source code of isPalindrome

Figure 3.1 contains an example of a symbolic execution problem. We analyze the code from Listing 3.1, which checks whether a given input string denotes a palindrome¹. With symbolic execution, we can check all individual program paths through this example for potential faults and produce model assignments that recreate the corresponding behavior. Each branch condition in the source code creates a new branch in the symbolic execution tree, where a leaf is either a successful return (Line 5) or a potential exception that may occur at run time (Line 15). While in this case the initial exception branches are unreachable or sufficiently covered by checks in the program, the implementation actually contains a bug: Since the loop condition only evaluates to false when the indices i and j match, this may never happen for a true palindrome of even length. Indeed, this behavior is reproducible within two loop iterations: Assuming a palindrome of length 2 as the input string, the variables hold the following values after the first loop iteration: i0 = j1 = 0, j0 = i1 = 1. The loop is re-entered at this point, as the indices do not match, leading to these variable assignments after the next iteration: i2 = 2, j2 = −1. Due to the out-of-bounds index i2 (and the negative index j2), the next invocation of String.charAt from the Java Class Library [120] takes the exceptional branch—i.e., in a real execution the program reports an index-out-of-bounds error. A simple fix for this fault is to change the loop condition to i < j. This whole evaluation process is conducted automatically by symbolic execution engines such as SymJEx.

¹A palindrome is a word or phrase that reads the same backward as forward.
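The flaw is also directly observable with a concrete driver (our sketch; it assumes isPalindrome from Listing 3.1 is in scope):

// With the even-length palindrome "aa" the loop indices cross without ever
// being equal (i: 0, 1, 2; j: 1, 0, -1), so charAt is eventually called with
// an illegal index and the method throws instead of returning true.
public static void main(String[] args) {
    System.out.println(isPalindrome("ab"));  // false: works as intended
    System.out.println(isPalindrome("aba")); // true: odd length terminates
    System.out.println(isPalindrome("aa"));  // throws StringIndexOutOfBoundsException
}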

[Figure: the symbolic execution tree of isPalindrome. The root introduces the symbolic input s; branches cover s = null, length(s) ≤ 1, and the loop iterations with i0 = 0, j0 = length(s) − 1 and per-iteration updates i(k+1) = ik + 1, j(k+1) = jk − 1. Every charAt call splits into an in-bounds path (ik ≥ 0 ∧ ik < length(s), and likewise for jk) and an error leaf for the out-of-bounds case.]

Figure 3.1: isPalindrome symbolic execution tree

A similar example is provided in Fig. 3.2. The analyzed source code contains a helper method that inserts an integer value twice at a given position in the target array, including corresponding bounds checks (Listing 3.2). Again, the implementation contains a subtle error, in this instance caused by an arithmetic overflow. In Java, the default arithmetic operations do not exhibit errors when computations exceed the value range (for 32-bit integers, −2³¹ to 2³¹ − 1). Thus, the check in Line 5 is flawed for the unique case when i has the maximum value of the 32-bit integer range (2,147,483,647): the bounds check then trivially succeeds, as 2,147,483,647 + 1 = −2,147,483,648 due to arithmetic overflow. Inevitably, the array access in Line 8 then results in an error (the array length in Java is per specification a 32-bit integer [77], hence an index of this size is impossible). In this case, a symbolic execution engine does not have to unroll a loop or follow loop iterations but has to track modifications to memory, such as the given array, which—in practice—presents a significant challenge in symbolic execution (cf. Section 3.2.4). In this simplified example, each write is represented as creating a copy of the initial array, as depicted in Fig. 3.2. Otherwise, the symbolic execution tree again covers all paths and yields two potential error sources: the aforementioned arithmetic overflow and subsequent exception, as well as a potential null pointer for the input array.

1  int insertTwice(int[] arr, int i, int value) {
2      if (i < 0)
3          // underflow
4          return -1;
5      if (i + 1 >= arr.length)
6          // overflow
7          return -2;
8      arr[i] = value;
9      arr[i + 1] = value;
10     return 0;
11 }
Listing 3.2: Java source code of insertTwice

[Figure: the symbolic execution tree of insertTwice with symbolic inputs arr0, i, and value. Error leaves cover i < 0, arr0 = null, and the failing bounds checks; on the success path the two writes are modeled as array updates arr1 = arr0[i ↦ value] and arr2 = arr1[i+1 ↦ value], each guarded by its bounds condition.]

Figure 3.2: insertTwice symbolic execution tree

Over the years, numerous symbolic execution engines for various programming languages and use cases have emerged. Symbolic PathFinder [126]—a prominent engine based on the Java PathFinder project [38]—successfully identified bugs in NASA flight software via test generation [126, 125]. Baldoni et al. [23] describe how symbolic execution can be used to analyze malicious software with the fuzzer² and binary analysis tool Angr. Finally, KLEE by Cadar et al. [43] is one of the currently most prominent open-source solutions. Extended to numerous use cases, KLEE has been successful in identifying issues in GNU Coreutils [43], GPU programs [102], distributed sensor networks [136], and BIOS implementations [27].

3.1.1 Symbolic Execution in the Field of Program Analysis

Due to the rise in complexity and scale of software [85, 69, 19], program analysis is more important than ever. Especially when critical (safety/security) requirements are on the line, verification of software plays an important part in ensuring its functionality. Even more so, automation has found its way into this process, with static analysis as well as random testing or fuzzing as frequent candidates to find errors due to undefined behavior or lack of coverage. In general, the field of program analysis nowadays is vast and ranges from tools that perform offline analysis—namely static analysis, symbolic execution—to online testing in a safe environment—fuzzing, concolic execution. Cohen [53] gives an overview of the different methodologies of automated program analysis. Figure 3.3 presents an excerpt taken from his work that showcases the variety of the available tools.

²Fuzzing generates random inputs to test the behavior of programs on unexpected or invalid data.

Figure 3.3: Overview of automated program analysis tools [53]

The use cases for static analysis are widespread, ranging from optimizations in compilers using escape and dead-code analysis, over proving correctness properties and the absence of data races, to built-in analysis tools for modern IDEs [114]. Static analysis tools usually work by identifying patterns or sequences within programs that may indicate flaws, as well as by local analysis of control/data flow. In contrast to that, symbolic execution creates formulas from program paths by accumulating a path condition of all branching conditions visited so far. This allows a much more powerful and precise exploration of a program, as the analysis is not bound to ordinary issues such as out-of-bounds errors or null pointers but also includes defects based on complex data flow [45]. It also emphasizes the dynamic nature compared to traditional static analysis, since the execution mode of engines often resembles traditional interpreters. Both methods, however, are typically performed “offline”, i.e., without actually executing the program.

Some symbolic execution engines also combine concrete and symbolic execution—an effort commonly known as concolic execution [24, 162, 139]. By executing programs with concrete inputs, such engines can prioritize program states on the concrete execution path and thus save time on evaluation and solver calls. This specific kind of concolic execution is typically called dynamic symbolic execution (DSE) [75, 137, 103, 74] and is often combined with random testing techniques such as fuzzing. Nevertheless, the applicability of symbolic execution in practice still very much relies on the underlying constraint solver capabilities. While the technique itself has been known and researched for over four decades [91, 50, 90, 36, 84], significant improvements and technological advancements in the field of constraint solving in recent years have fostered its widespread acceptance [60, 111].

3.1.2 Memory Modeling

While atomic data such as (integer, floating-point) numbers or Boolean values are typically directly supported in logical languages and theorem provers, complex memory structures represent a different challenge. Overlapping pointers, aliasing³, null dereferencing, as well as the corresponding types and polymorphism are usually handled in dedicated structures (memory/type models) to preserve the soundness of the system. Naturally, the solutions to this challenge very much depend on the target domain: Where engines running on memory-safe languages like Java can apply simpler memory models without mimicking address and pointer semantics as well as the corresponding arithmetic, unsafe languages such as C require sophisticated heap modeling. Data abstraction plays an important part here as well, and early research already proposed similar ideas using specifications to overcome the problems of formally modeling complex data [108]. Nowadays, there are various ideas for different target domains and complexity levels to manage symbolic memory. Common solutions try to model the heap symbolically as well—either using fixed “slots” (e.g., for memory-safe languages) [125, 73] or addressable memory regions for C-like domains [151, 44].
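The array updates written as arr1 = arr0[i ↦ value] in Fig. 3.2 correspond to the select/store view of symbolic memory. The following minimal sketch (hypothetical classes, not SymJEx's actual model) shows how a symbolic read resolves against a chain of symbolic writes:

// Select/store model of symbolic memory: a write produces a new array value;
// a read of index i either hits the stored index or falls through to the
// previous array value, mirroring SMT-LIB's theory of arrays.
interface SymArray {
    String select(String index);                  // symbolic read as a term
    default SymArray store(String index, String value) {
        return new Store(this, index, value);     // symbolic write
    }
}

record Base(String name) implements SymArray {
    public String select(String index) {
        return "(select " + name + " " + index + ")";
    }
}

record Store(SymArray prev, String index, String value) implements SymArray {
    public String select(String i) {
        // (ite (= i index) value <read from the previous array version>)
        return "(ite (= " + i + " " + index + ") " + value
                + " " + prev.select(i) + ")";
    }
}

For instance, new Base("arr0").store("i", "v").select("j") yields (ite (= j i) v (select arr0 j)), i.e., the read returns the written value exactly when the two indices alias.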

3.1.3 Constraint Solving

While it is generally up to the symbolic execution engines to generate formulas from their respective input domain, the heavy lifting in terms of constraint evaluation is typically relayed to dedicated theorem provers and model checkers [91, 30]. Verification problems in the realm of symbolic execution typically fall into the category of decision problems [35]. Queries thereof mostly resemble requests such as: Can the index exceed the target array? Can a certain reference be null at some point? Does a computation potentially involve arithmetic overflow? These problems are NP-complete in the worst case [26, 54], which means that all currently known solutions experience exponential growth in their run time, but at the same time every problem of this class can be reduced to any other one in polynomial time. Despite the complexity of these problems, however, theorem provers allow us to solve some of them efficiently and in reasonable time by applying a variety of optimizations that speed up the decision procedure. Theorem provers—colloquially also simply called “solvers”—are specialized software to determine, for a given logical formula, whether it is satisfiable or not—i.e., whether there is a valid assignment for all free variables of the formula such that it evaluates to true.

³the aliasing problem concerns multiple pointers that may reference the same memory location or object

SAT solvers represent one category of theorem provers. They receive formulas written in propositional logic as input and can be used to solve boolean satisfiability problems. Satisfiability Modulo Theories (SMT) solvers generalize these principles and accept formulas in first-order logic [59, 39]. While SMT in theory also supports quantified formulas, most solvers focus on quantifier-free representations [25, 39] or perform extensive simplifications to remove quantifiers [59, 117]. Modern SMT solvers are typically specialized for individual reasoning domains. By limiting themselves to individual theories, they can achieve better performance and solve problems more efficiently [25]. As an example, the SMT solver Boolector by Brummayer and Biere [39] only supports quantifier-free reasoning over bit-vector arithmetic and arrays, but experiments have shown significant performance increases compared to general-purpose solvers.

3.1.3.1 SMT-LIB

SMT-LIB [25] is a collective effort for research on SMT. It furthermore proposes a standardized format for a logic language that serves as input and output of SMT solvers. The SMT-LIB format encompasses a set of theories to represent the different domains of SMT, such as fixed-size bit-vectors [10], arrays [11] or reals [12]. Various extensions exist that make SMT-LIB applicable to a number of problem statements, including linear arithmetic and real numbers, arrays and bit-precise reasoning using bit-vectors, as well as experimental extensions such as IEEE 754 floating-point or strings [106, 31, 133, 147, 163], which are often implementation-dependent. Theories define the available operations for the data types and furthermore specify the ways to combine different theories (e.g. the FloatingPoint theory defines operations to convert bit-vectors of the FixedSizeBitVector theory to floating-point values—corresponding to a primitive typecast in the programming world) [119]. In our work, we use the SMT-LIB v2 language format to express logic constraints. Henceforth, we refer to this representation whenever mentioning SMT-LIB.

Logic-bomb Example The SMT-LIB syntax resembles the LISP programming language, hence there are notable differences when encoding problems of source code from a C-language-family contender. One example thereof is given in Listing 3.3, which encodes a so-called “logic bomb”—a program that only triggers on specific predefined conditions. These code snippets were developed by Xu et al. [160] and cover language semantics such as floating-point operations, arithmetic or buffer overflows as well as loops or memory operations, intended to stress the capabilities of symbolic execution engines. While the original sources are written in C⁴, Listing 3.3 contains a slightly simplified version in Java representation, with a single assertion that checks the “expected behavior”.

Here, the problem exemplifies the impreciseness of typical IEEE-754 floating-point

⁴https://github.com/hxuhack/logic_bombs/blob/2ec0df256aa20abb2662e846a9f437f8cf2b47f2/src/floating_point/float2_fp_l1.c

operations, in this case manifested in the two seemingly contradicting statements:

i + 0.0000005 ≠ 7        (3.1)
i + 0.0000005 + 1 = 8    (3.2)

void float2_fp_l1(int i) {
    float x = i + 0.0000005f;
    float y = x + 1;
    if (x != 7) {
        assert y != 8;
    }
}

Listing 3.3: Java representation of a logic bomb from [160]

Using the SMT-LIB format, we can now verify the assertion and get a counterexample for the failure case. The corresponding constraints are depicted in Listing 3.4, which contains a valid SMT-LIB2 formula describing all conditions up to the assertion. Note that we use the theories of FixedSizeBitVector and FloatingPoint (as defined via the set-logic option) and that the formula on display contains additional variable and constant definitions to improve its readability.

; defines the set of available theories (QF_BVFP = Quantifier-Free fixed-size Bit-Vectors and Floating Point)
(set-logic QF_BVFP)
; variable declarations
(declare-fun i () (_ BitVec 32))
(declare-fun x () (_ FloatingPoint 8 24))
(declare-fun y () (_ FloatingPoint 8 24))

; integer-to-float converted values
(declare-fun i_float () (_ FloatingPoint 8 24))
(assert (= i_float ((_ to_fp 8 24) RNE i)))

; x = i + 0.0000005f
(assert (= x (fp.add RNE i_float (fp #b0 #b01101010 #b00001100011011110111101)))) ; 0.0000005 in IEEE-754 floating-point representation

; y = x + 1
(assert (= y (fp.add RNE x ((_ to_fp 8 24) RNE (_ bv1 32))))) ; implicit conversion of bit-vector constant

; x != 7
(assert (not (fp.eq x ((_ to_fp 8 24) RNE (_ bv7 32))))) ; "fp.eq" represents inexact floating-point equality (as opposed to bit-exact "=")

; to check whether the assertion holds, its condition is negated
; y == 8
(assert (fp.eq y ((_ to_fp 8 24) RNE (_ bv8 32))))

; satisfiability check
(check-sat)

; produce model assignment that satisfies the formula
(get-model)

Listing 3.4: SMT-LIB2 formula to verify the assertion

sat
(model
  (define-fun i_float () (_ FloatingPoint 8 24)
    (fp #b0 #x00 #b00000000000000000000111))
  (define-fun y () (_ FloatingPoint 8 24)
    (fp #b0 #x00 #b00000000000000000001000))
  (define-fun x () (_ FloatingPoint 8 24)
    (fp #b0 #x6a #b00001100011011110111101))
  (define-fun i () (_ BitVec 32)
    #x00000007)
)

Listing 3.5: Generated model/counterexample that violates the assertion

When evaluating this formula, a solver can indeed prove that there is a variable assignment (model) that “triggers the bomb”, i.e., where the above conditions are both true. Listing 3.5 contains the resulting model, showing that an input of i = 7 already violates the assertion. Additionally, the model shows the results of the intermediate computations, namely x = 7 + 0.0000005 = 7.0000005 and y = 7 + 0.0000005 + 1 = 8. This furthermore proves that the above conditions are indeed both true, caused by the limited precision of 32-bit floating-point arithmetic.
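The counterexample is easy to validate by running the corresponding Java code directly:

public class LogicBombDemo {
    public static void main(String[] args) {
        float x = 7 + 0.0000005f; // the closest float to 7.0000005, not equal to 7
        float y = x + 1;          // rounds to exactly 8.0f at 24-bit precision
        System.out.println(x != 7); // true -> the if branch is taken
        System.out.println(y == 8); // true -> the assertion y != 8 fails
    }
}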

3.1.4 Limitations of Symbolic Execution

In theory, symbolic execution very much seems to be the “holy grail” of program verification. Global reasoning about program faults including proof generation—all with little to no concrete execution or simulation—enables great coverage as well as high precision, which is an inherent goal of most program analysis methods. However, despite the many advancements in the field, research has also pointed out its limitations. The capabilities of symbolic execution depend both on the executed code and on the scope of the engine. In SymJEx, as an example, we leverage the memory model of Java, which allows us to forgo complex memory operations such as pointer arithmetic or raw memory access. Nevertheless, some engines even simulate such memory operations by managing a symbolic memory [52, 139, 128, 151]. Similarly, IO operations and even system calls are sometimes simulated [43, 76].

State Explosion State explosion [36, 43, 95, 105] refers to the problem of quickly rising numbers of potential program states, typically growing exponentially or worse [24]. This problem arises in both theorem proving and symbolic execution due to, on the one hand, the large number of variable/input permutations that have to be observed to detect models [26] and, on the other hand, the potentially infinite search space, i.e., the number of program paths that engines would have to analyze to reach completeness [127, 45]. In symbolic execution, such problems typically arise when loops or highly nested method hierarchies cannot be symbolically reduced—e.g. via upper bounds or recursion limits. As the search space in symbolic execution theoretically consists of all possible program states—i.e., all possible variable assignments and memory settings—at every point in the program, infinite or highly branching control flow can quickly deteriorate the performance of corresponding engines and make exhaustive searches infeasible. Workarounds for these kinds of problems try to either limit the search space or use predefined heuristics to visit more promising states/paths first [45] and to prevent following long or even infinite program paths. Similar techniques have also been popularized for model checking with the introduction of Bounded Model Checking [30]. Symbolic execution engines additionally apply techniques such as path or state merging to prevent state explosion [95] or approximate solutions with additional verification steps to prevent false-positives [107].
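As a simple illustration of bounding the search space, a scheduler can refuse to re-enqueue states that exceed a fixed per-loop iteration bound. This is only a sketch of the idea; the bound of 8 and all names are arbitrary example choices:

import java.util.ArrayDeque;
import java.util.Deque;

// States that have iterated the same loop too often are abandoned instead of
// being scheduled, trading completeness for termination (cf. Bounded Model
// Checking). Reports beyond the bound could therefore be false-positives.
final class BoundedScheduler {
    static final int LOOP_BOUND = 8; // arbitrary example bound

    private final Deque<State> pool = new ArrayDeque<>();

    void schedule(State s) {
        if (s.loopIterations() <= LOOP_BOUND) {
            pool.add(s);
        } // else: the path is cut off here
    }

    State next() { return pool.poll(); }
}

interface State { int loopIterations(); }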

IO and System Environment As previously mentioned, symbolic execution is usually intended for offline application, without actual program execution. The inherent disadvantage is that—in its most basic form—this prevents reasoning over the runtime environment or interaction with the system itself. Notably, file IO, network access, and system calls are particularly affected and require special handling by symbolic execution engines. Abstraction is a common approach here as well, with files being represented as collections of bytes in a corresponding symbolic memory region, with appropriate file handles that are managed in the symbolic state. KLEE [43], as an example, applies this technique. In contrast, Cloud9 [41] replaces libraries before analysis with customized variants that allow modeling of IO and POSIX functions to prevent modifications of the target file system.

Multithreading and Concurrency Multithreading or parallel execution in general can lead to an exponential expansion of the state space (cf. state explosion): To correctly analyze all possible states, a symbolic execution engine would have to model all combinations of thread interleavings and schedules, which is de facto infeasible. There are approaches, however, that use model checkers to simulate the non-determinism of the different thread interactions. Such approaches can be refined using manually annotated correctness properties or pre- and postconditions to set a specification for the corresponding model checker and to allow race condition identification [89, 38, 126]. Additional proposals consider more sophisticated search strategies and state merging to limit the state space to relevant or promising states [28, 78, 48].

3.2 Graal Symbolic Execution

To broaden their domain of application, symbolic execution engines often work on intermediate representations such as Java bytecode [126] or LLVM bitcode [43, 113], or directly on assembly [76]. In this spirit, the idea of using symbolic execution in the context of the Graal compiler originated during the development of a symbolic execution engine based on Graal, named SymJEx. This engine uses Graal IR (cf. Section 2.2.1.1) as its input domain and, among other things, supports lazy symbolic execution, array and memory modeling as well as loop approximation. SymJEx is built on top of the GraalVM by leveraging components such as the compiler infrastructure—namely the graph builder and the compilation phases—as well as Native Image to provide mostly seamless interaction with the Graal IR. While we support most functionality of contemporary symbolic execution engines (path traversal, test case generation, etc.), this design allows us to leverage information from the compiler to extend our reasoning capabilities. Aside from the already optimized IR, this also includes the already structured and expanded control flow (with exception edges, loops, and branches) in addition to type information (e.g. “nullability” of reference types) and value ranges which are inferred by the compiler.

[Figure: SymJEx architecture — a preparation phase on the Graal compiler side (bytecode parser and graph builder producing Graal IR), the state space exploration loop (iterate IR nodes, select the next state, create constraints, invoke solvers, report results, generate test cases, schedule new states), and the expression framework (symbolic state, symbolic memory, expressions, state pool) connected to an SMT solver]

Figure 3.4: Overview of SymJEx, the symbolic execution engine for GraalVM

The system itself consists of three major parts, as depicted in Fig. 3.4: the initial preparation phase, where Graal IR is generated from Java bytecode; the main engine, which creates constraints from the IR nodes and attaches to a proprietary solver; as well as our state and formula structures that manage the information flow within the engine. Moreover, Listing 3.6 highlights the individual steps of the engine, from receiving the IR, over selecting a state, to resolving nodes to constraints. The overall algorithm therefore very much resembles a typical interpreter loop, where the engine inspects each node of the incoming IR individually and then decides whether to solely generate constraints, spawn new states for successors, or treat the node as a potential error source—assertions, divisions (by 0), array accesses (index out of bounds)—and invoke the solver accordingly. The following sections explain all of these steps in further detail.

SYMEX(method, graal_ref)
    ir := PARSE(method, graal_ref)
    PREPARE(ir)                       # apply custom graph builder
    pool := INIT_STATE_POOL(ir)
    WHILE pool has state DO
        state := NEXT_STATE(pool)
        FOR EACH node in current block DO
            CREATE_CONSTRAINTS(state, node)
            IF operation can cause error THEN
                IF current formula is SAT THEN
                    GENERATE_TEST_CASE / REPORT_ERROR
                END
            END
            SCHEDULE NEW STATES       # true/false state for IF
        END
    END

Listing 3.6: SymJEx engine interpreter loop

3.2.1 Preparation of the Graal IR

Initially, a preparation phase using the Graal compiler infrastructure parses the bytecode and creates the IR. We rely on Graal high-tier IR to retain the Java memory semantics and use a custom graph builder to generate the IR and to remove compiler information that is not needed for the analysis (e.g. frame states, debugging information). During this process, SymJEx also applies so-called “graph builder plugins”. We customized those plugins to expand all potential error paths for operations that would usually happen at later stages of the compilation pipeline. For instance, for array copying, we explicitly add all error conditions, including null dereferencing, negative index, and index out of bounds errors. Additionally, we introduce custom IR nodes such as “symbolic assumptions” and “assertions”.

After the IR is generated, SymJEx performs several custom graph optimizations to simplify later analysis. Such optimizations include loop unrolling and inlining to expand program paths beforehand. We further added options that let the engine introduce analysis boundaries during compilation to prevent entering loops or method calls at all. This should yield performance improvements at the risk of producing false-positives.

3.2.2 Traversal and State Propagation

In contrast to other symbolic execution engines, our input already has a graph structure, hence we visit the different Graal IR nodes via graph traversal. Additionally, the Graal compiler ensures a fixed schedule for all nodes within a basic block, where nodes are always visited before their corresponding usages (cf. Section 2.2.1.1). Consequently, SymJEx traverses the IR by first following the sequence of basic blocks within a given IR and subsequently iterating the contained nodes based on their schedule.

Throughout the analysis, SymJEx manages a symbolic state. This state contains the constraints (the formula) that have been accumulated so far and the symbolic memory that represents the current heap configuration. Each state additionally has to reference its exact location in the search space. Therefore, both the current Graal IR node and the parent state must be known to pinpoint both location and path. For loops and nested method calls, we also track iteration counts as well as a caller hierarchy. Each symbolic state (except the initial state) also has a predecessor that can be used to look up already processed information. First, this allows us to share most information on a path on state splits (e.g. share all constraints before an if for both branches). Second, we use this sequence of states later on to reconstruct the execution path as well as memory contents at certain save points (“snapshots”)—for example, a symbolic object.

Research has shown that efficient state selection and search strategies can positively impact the performance of symbolic execution as well as mitigate the problem of state explosion (Section 3.1.4) [76, 44]. In our case, a method’s Graal IR represents the search space that is explored during analysis. Consequently, we traverse all individual paths through a given IR from the start to a leaf (typically an exception edge or a return). A state pool collects all currently unexplored states and schedules them according to a predefined cost strategy. This strategy assigns weights to a state depending on its current depth in the search space as well as the “complexity” of its current path. Sequential control flow typically has a low weight, while branches, loops, and method calls are penalized in this system. Hence, the engine discourages continuing multiple iterations through a loop or deep recursion during analysis. The state pool also allows access to resources independent of the current traversal, such as the compiler interface. Thereby, we can access compiler metadata to infer concrete types and class instances, get class information, or retrieve constant and reflection information. For performance reasons, we also manage a cache of already parsed method graphs to prevent repeated recreation of known method IR. Both state and state pool are depicted in the class diagram in Fig. 3.5, in addition to other components that are discussed in more detail later, such as the symbolic memory (Section 3.2.4) and the formula/logical expressions (Section 3.2.3).

[Class diagram: State (id: Integer, graalIR: Graph, currentBlock: BasicBlock, currentNode: IRNode, formula: Formula; declareVariable(Type, IRNode), getCached(IRNode): Expression, cache(IRNode, Expression)) referencing a StatePool interface (schedule(State), stop(State), states(): Collection) implemented by ScheduledStatePool (costStrategy: Strategy, graphCache: Map, states: List; next(): State); Formula (parent, origin: Formula, facts: List; add(BooleanExpression), checkIfTrivial(BooleanExpression): Boolean, getRequiredCapabilities(): Set, advance(): Formula, clone(): Formula); PathState (parent, origin: PathState, loop: LoopInformation, method: InvokeInformation, memory: FieldHeap; add(BooleanExpression), advance(): State)]

Figure 3.5: Class structure of symbolic states within SymJEx
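Conceptually, the cost strategy behaves like a priority queue over states whose weight grows with the depth in the search space and with the number of loop iterations and calls on the state's path. The following sketch only illustrates this idea; the concrete weights and accessor names are invented for the example:

import java.util.Comparator;
import java.util.PriorityQueue;

// States with lower cost are explored first; loop iterations and calls are
// penalized, so deep loops and recursion are scheduled late. The weights 10
// and 5 are arbitrary example values.
final class CostBasedStatePool {
    static int cost(State s) {
        return s.depth() + 10 * s.loopIterations() + 5 * s.callDepth();
    }

    private final PriorityQueue<State> states =
            new PriorityQueue<>(Comparator.comparingInt(CostBasedStatePool::cost));

    void schedule(State s) { states.add(s); }
    State next() { return states.poll(); }
    boolean hasState() { return !states.isEmpty(); }
}

interface State { int depth(); int loopIterations(); int callDepth(); }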

3.2.3 Constraint Generation from Graal IR

A typical characteristic of symbolic execution engines concerns how they create constraints from their target language. Whether it is LLVM bitcode [97], Java bytecode [150] or machine code [76, 162]—engines often have to write complex interpreters for their input domain to capture all language semantics. On the other end of the spectrum, theorem provers and solvers traditionally also only accept inputs in predefined formats (cf. SMT-LIB in Section 3.1.3.1). There are frameworks and bindings to bridge the gap from traditional high-level languages to such proprietary formats—in the Java world notably JConstraints [81] and JavaSMT [88]—but we nevertheless decided to develop our own intermediate representation to store constraints in-engine and later translate them to a solver-compatible format. This allows us to incorporate specifics of the Graal IR and the corresponding classes into the framework and expressions respectively, without having to confine ourselves to a proprietary format.

The Graal IR by now defines over 500 node types across all components. In order to generate constraints and enable interaction with solvers, we defined transformations for these nodes into our custom constraint intermediate representation. While our engine is designed to work mostly with high-tier Graal IR, in the end this resulted in still over 100 different node types for which we implemented transformations. A transformation takes the current state as well as the target node as input. It then maps the behavior of the respective node to logical constraints in the form of our expression framework and adds them to the current state. Depending on the node type, a transformation may spawn multiple new successor states—for conditions or loops—or modify the symbolic memory when memory operations are involved (cf. Section 3.2.4). We denote collections of such transformations as resolvers, which generally conform to individual classes. With this pattern, we gain an additional abstraction layer for our transformations to support extensions such as proving compiler optimizations via symbolic execution—these require a slightly different approach for propagating state and generating constraints (cf. Section 4.1). The different resolvers and their implementations—collectively denoted “formula builders”—are invoked in a visitor structure that relays all defined node types to their corresponding methods. This concept is emphasized in Fig. 3.6, which contains an excerpt from the class structure. As depicted, the different types of node transformations are implemented in different formula builders, which allows us to provide different implementations for different engine modes—seen here in the visualization of BasicDataFlowResolver, where one transformation takes into account potentially allocated memory from parameters, while the simplified variant solely handles primitive variables.

[Class diagram: a Resolver base interface with specializations ArithmeticResolver (resolve(State, AddNode), resolve(State, MulNode), resolve(State, FloatDivNode)), SymbolicNodeResolver (resolve(State, VariableNode), resolve(State, AssumeNode), resolve(State, AssertNode)), LowTierResolver (resolve(State, ReadNode), resolve(State, WriteNode), resolve(State, AddressNode)) and BasicDataFlowResolver (resolve(State, ConstantNode), resolve(State, ParameterNode), resolve(State, ConditionalNode)); implemented by ArithmeticFormulaBuilder, SymbolicFormulaBuilder, LowTierFormulaBuilder, and the two BasicDataFlowResolver variants SimpleDataFlowFormulaBuilder and MemoryDataFlowFormulaBuilder]

Figure 3.6: Class diagram of Graal IR transformations in SymJEx
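Conceptually, each resolve method maps one kind of node to constraints on the current state. The following sketch illustrates the pattern with simplified stand-ins for the Graal IR and SymJEx classes; Expressions.bvadd is an invented factory method, while getCached and cache mirror the State operations from Fig. 3.5:

// Simplified stand-ins for Graal IR nodes and the SymJEx expression classes.
interface IRNode {}
record AddNode(IRNode x, IRNode y) implements IRNode {}

interface Expression {}

interface State {
    Expression getCached(IRNode node);
    void cache(IRNode node, Expression expression);
}

final class Expressions {
    // would build the term (bvadd x y) in the real expression IR
    static Expression bvadd(Expression x, Expression y) { return new Expression() {}; }
}

interface ArithmeticResolver {
    // Maps an AddNode to a bit-vector addition over the operands' cached
    // expressions and stores the result for later usages of the node.
    default State resolve(State state, AddNode node) {
        Expression x = state.getCached(node.x());
        Expression y = state.getCached(node.y());
        state.cache(node, Expressions.bvadd(x, y));
        return state;
    }
}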

3.2.3.1 SymJEx Expression IR

We capture logical constraints in a strongly typed intermediate representation based on SMT-LIB [25]. The type system maps Java primitives and arrays to SMT-LIB sorts, with expressions mostly being based on SMT-LIB functions. All expressions are treated as (immutable) first-class objects that support operations such as cloning or local simplifications. Constraints in our expression format resemble directed acyclic graph structures with variables and constants at the leaves. Our framework additionally provides a basic DSL for generating and combining expressions. During development, we identified several simplifications that heavily reduce our formulas, hence we apply those eagerly upon creation of a new expression. Notably, we propagate and fold constant expressions and apply strength reduction [22] to operations. Additionally, we define (labeled) symbolic variables that correspond to program parameters and are later used to map model assignments back to the original inputs.
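Such eager simplifications can live directly in the expression factory, so that no formula ever contains a foldable operation. The following is a sketch of the idea with heavily simplified expression types; the real SymJEx factory is of course richer:

// Minimal expression types for the sketch.
sealed interface Expr permits Constant, Mul, Shl {}
record Constant(long value) implements Expr {}
record Mul(Expr x, Expr y) implements Expr {}
record Shl(Expr x, int distance) implements Expr {}

final class ExprFactory {
    // Simplify at construction time: constants are folded, and a
    // multiplication by a power of two is strength-reduced to a shift.
    static Expr mul(Expr a, Expr b) {
        if (a instanceof Constant ca && b instanceof Constant cb) {
            return new Constant(ca.value() * cb.value());             // constant folding
        }
        if (b instanceof Constant c && c.value() > 0 && Long.bitCount(c.value()) == 1) {
            return new Shl(a, Long.numberOfTrailingZeros(c.value())); // strength reduction
        }
        return new Mul(a, b);
    }
}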

(a) Java source code:

double a = ...;        /*1*/
double b = ...;        /*2*/
int c = ...;           /*3*/
return (
    (a * b)            /*4*/
    + (c / 3/*5*/)     /*6,7*/
);                     /*8*/

(b) Graal IR: [graph omitted — control flow Start → Return; data flow: c and the constant 3 feeding a signed division (6), its result converted via FloatConvert (7), a and b feeding * (4), and both results feeding + (8)]

(c) Expression DSL:

Variable a = logic.variable(FloatType.FP64, "a");         /*1*/
Variable b = logic.variable(FloatType.FP64, "b");         /*2*/
Variable c = logic.variable(BitVectorType.BV32, "c");     /*3*/
Constant three = logic.constant(BitVectorType.BV32, 3);   /*5*/
Variable result = logic.variable(FloatType.FP64, "result");

logic.eq(result, logic.add(                               /*8*/
    logic.mul(a, b),                                      /*4*/
    logic.cast(FloatType.FP64,                            /*7*/
        logic.sdiv(c, three))));                          /*6*/

(d) SMT-LIB2 formula:

(set-logic QF_BVFP)
(declare-const a (_ FloatingPoint 11 53)) ; 1
(declare-const b (_ FloatingPoint 11 53)) ; 2
(declare-const c (_ BitVec 64))           ; 3
(declare-const result (_ FloatingPoint 11 53))
(assert (= result (fp.add RNE             ; 8
    (fp.mul RNE a b)                      ; 4
    ((_ to_fp 11 53)                      ; 7
        (bvsdiv c (_ bv3 64))))))         ; 6,5

Figure 3.7: Transformation from Java source code and Graal IR into expressions and constraints

Fig. 3.7 denotes the steps from the source code that has to be analyzed and the resulting Graal IR to the expressions that SymJEx subsequently produces. In the end, the depicted SMT-LIB2 formula denotes the encoded expression. As shown in the image, all operations and variables are statically typed, with explicit conversion operations to transform values from one type into another (e.g. an integer result to a floating-point value). This intermediate representation is generated for all operations of a program path and later used for generating SMT-LIB formulas or interaction with proprietary solver bindings.

While some expressions translate well from Java to SMT-LIB (or our intermediate representation), we nevertheless had to overcome minor differences in their semantics. As an example, while in Java the shift operations are limited in their distance due to bit-masking of the operands [77], the SMT-LIB variants can utilize the full width of the corresponding sort. Additionally, while both formats follow the IEEE Standard for Floating-Point Arithmetic [15], we also had to incorporate some conventions of the Java specification (predefined rounding modes, mostly unique special values) as well as peculiarities of the Graal IR (floating-point zero constants usually providing no sign information) into our expression format.
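The shift mismatch is easy to demonstrate: Java masks the shift distance of an int to its five lowest bits (JLS §15.19), while SMT-LIB's bvshl consumes the full operand value, so a faithful translation has to mask the distance explicitly:

public class ShiftSemantics {
    public static void main(String[] args) {
        // JLS §15.19: for int shifts only the five lowest bits of the
        // distance are used, so 1 << 35 is really 1 << (35 & 31) == 1 << 3.
        System.out.println(1 << 35); // prints 8, not 0
        // SMT-LIB's bvshl shifts by the full distance; a direct translation
        // would yield 0 here. An encoder therefore masks the distance first,
        // e.g. (bvshl x (bvand d #x0000001f)).
    }
}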

3.2.3.2 A Type System for Java Values in SMT-LIB

Each of the aforementioned expressions is bound to a specific type that is inferred from the corresponding Graal IR node. As our expression IR is meant to bridge the gap between Java and SMT-LIB, the type system has to take both areas into account. Our engine covers a large subset of the Java language and feature space. All primitive types are natively supported; we allow bit-precise reasoning over integers of varying widths as well as floating-point values according to (the Java implementation of) IEEE 754 [15]. Therefore, we implemented a type system for storing Java values annotated with SMT-LIB types (“sorts”). Since Java natively treats all integers as 32- or 64-bit values, we try to infer the underlying primitives from the value ranges which are provided by the Graal compiler. This is primarily used to handle boolean values, which are encoded as integers in the JVM, hence requiring type conversions when encoded in SMT-LIB. Objects and arrays are stored in our memory model, where references (“addresses”) are again mapped to numeric types. While we have functions to interact with the string theory extensions of SMT-LIB, strings are typically also mapped to character arrays. In summary, we mostly rely on four SMT-LIB sorts to which all Java structures are mapped: the built-in Boolean sort, fixed-size bit-vectors for all integer types, floating-point, and arrays for all memory data structures.
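For boolean values, this amounts to a conversion at the SMT-LIB boundary in both directions; one common encoding looks as follows (a sketch that generates raw SMT-LIB strings, with invented helper names):

final class BoolEncoding {
    // A JVM boolean arrives as a 32-bit integer (0 or 1); as an SMT-LIB Bool
    // it becomes a comparison against zero.
    static String toBool(String bv32) {
        return "(not (= " + bv32 + " #x00000000))";
    }

    // The reverse direction re-materializes the integer encoding via ite.
    static String toBv32(String bool) {
        return "(ite " + bool + " #x00000001 #x00000000)";
    }
}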

3.2.4 Memory Modeling

“Pure” or side-effect-free operations such as arithmetic computations and read accesses can typically be serialized relatively efficiently. Due to the requirement of being able to transform program behavior into static constraints, as well as due to the constraint format itself, memory operations represent a significant challenge for symbolic execution engines. When memory is referenced symbolically—e.g. array access with a symbolic index, field access of a symbolic object—the potential value has to be encoded based on this symbolic reference. The underlying solvers also mostly support only “simple” theories such as linear algebra and basic (immutable) structures in the form of arrays or tuples, and quantification is not widely adopted due to its limitations [25]. Thus, it is not an option to formulate memory effects directly in a solver-specific format. Therefore, most engines implement sophisticated memory models tailored to their corresponding language semantics [34].

Already in 1976, King [91] proposed several solutions to the so-called variable storage-referencing problem. This problem manifests in memory operations where different variables may reference the same memory location (“pointer aliasing”), with both being affected by modifications simultaneously. Notably, King suggests to either split up the symbolic execution process on each memory access based on the potential reference target—similar to how engines typically split up execution on conditionals for each branch—or to encode the conditions and variants in the values themselves. Both approaches, however, have their respective downsides: The former model would essentially lead to a significant expansion of the state space, with the benefit of achieving exhaustive coverage of all possibilities. Lazy initialization is one way to improve upon this idea by keeping memory (objects in particular) “uninitialized” until they or their contents/fields are accessed [89, 61]. Then, an engine would fork different successor states to denote the possibilities for the actual object contents, namely one case where it is actually uninitialized, one where it references an already known and mapped object, and another one where a “new” memory region is targeted—i.e., one not previously referenced by any other object known at this point of the analysis. The second alternative trades additional state splits and forks for more complex formulas by encoding the variation directly as constraints. This means that every memory access may not only contain symbolic or constant values, but also complex constraints [151]. As an example, an access to an array with known contents—arr = [1, 2, 3]—may be encoded via arr[i] := IF i = 0 THEN 1 ELSE IF i = 1 THEN 2 ELSE 3, which can subsequently be translated into constraints for evaluation with a solver. The problem of this approach is that every memory modification has to be encoded for each variant as well, again leading to more complex constraints.

The memory model in SymJEx uses a mixture of those variants, mostly using encodings as proposed in the latter approach, with basic support for forking and lazy initialization as in the former. Modeled after the Burstall-Bornat “component-as-array” model [42], we encode distinct memory regions—such as the potential values of a field, or specific types of arrays—as separate arrays. This allows us to use the SMT-LIB ArraysEx theory [25] to reflect our memory as actual constraints and to distinguish memory operations on different types via different arrays.

class Stack {
    int top;
    int[] values;
    /* ... implementation ... */
}

Stack s0 = ...;
Stack s1 = ...;
combine(s0, s1);

Listing 3.10: Simple definition of a Stack type and an example call

To give an exemplary use case for this memory model, Listing 3.10 contains the source code of a basic implementation of a stack data structure for storing integer values. The stack consists of an array of values as well as an index denoting the current top of the stack. The snippet also contains an exemplary method call that combines two such stack objects. When we symbolically execute this method, we have to reason about the memory occupied by those objects. Assuming no prior knowledge about the parameters, multiple different cases are possible, as visualized in Fig. 3.12.

[Figure 3.12: Potential memory settings when combining two stacks — four panels: Figure 3.8: Distinct references—no aliasing; Figure 3.9: One reference is null; Figure 3.10: All references are null; Figure 3.11: Aliasing between references]

First, Fig. 3.8 depicts the situation when both stacks are properly initialized and point to different memory locations. As mentioned before, our model encodes memory in arrays that are created for each possible field or array type. Thus, in this example, the memory encompasses four distinct arrays: the top field, the references of the value array, and the array values and length fields themselves. For this first case, the variables therefore denote the positions of the elements within each array (similar to “real” memory addresses). As the values field represents another array, it again only stores the reference ID, once again denoting a position in the corresponding target array. Figure 3.9 shows the example for the case that one reference is null. We represent this in our memory model by assigning 0 as the corresponding reference value. Figure 3.10 then denotes the case where both elements are null, hence not referencing any actual memory contents. Finally, Fig. 3.11 describes the memory setting where both values actually denote the same memory position—i.e., aliasing references. This pinpoints the problem of managing memory, as each change to such a value in memory now has to be reflected in our memory model and in all other reference variables as well.

In symbolic execution problems, we typically cannot precisely determine which of those cases applies to individual variables. Therefore, we use the aforementioned encodings to directly represent this variability in our reference variables, which may now be symbolic variables with predefined limits or ranges. In the above example, we could—if no other objects of this kind are possible at this point in the analysis—limit the value of both references to either 0 (for null), the first, or the second memory position, with any combination being possible. In contrast to actual memory addresses, this is now easy to express via logical constraints.
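Using the expression DSL from Section 3.2.3.1, such a reference-range constraint could be built roughly as follows. This is only a sketch: logic.or, the BV16 reference width, and the state.formula() accessor are assumptions, while Formula.add(BooleanExpression) mirrors Fig. 3.5:

// s0 is a symbolic 16-bit reference variable (BV16 is an assumed width)
Variable s0 = logic.variable(BitVectorType.BV16, "s0");

// s0 is either null (0) or one of the two known Stack allocations (1, 2)
BooleanExpression range = logic.or(
        logic.eq(s0, logic.constant(BitVectorType.BV16, 0)),  // null
        logic.eq(s0, logic.constant(BitVectorType.BV16, 1)),  // first allocation
        logic.eq(s0, logic.constant(BitVectorType.BV16, 2))); // second allocation

state.formula().add(range); // cf. Formula.add(BooleanExpression) in Fig. 3.5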

3.2.5 Constraint Solver Interface

We manage connections to the supported SMT solvers—“solver backends”—using a dedicated abstraction layer. This layer uses the aforementioned constraint IR to either generate raw SMT-LIB strings or to interact with proprietary solver bindings via custom formats. The basic interface that we provide for all solvers therefore merely spawns the corresponding solver instance in a new process, with communication in the form of SMT-LIB strings over the input and output streams. While most solvers currently lack corresponding Java connection libraries, we also allow interaction with proprietary Java bindings such as Z3’s, where we translate our expression IR into their custom expression format and interact with the solver over the Java Native Interface (JNI) to bypass the SMT-LIB generation and parsing. This increases the extensibility of the system by allowing fast integration of new backends and reduces the reliance on specific solver implementations.

All our interfaces support the basic SMT-LIB commands (check-sat, push, pop, etc.), thus enabling fine-grained control over the solving process during analysis. We furthermore utilize incremental solving, where we try to reduce the solver overhead by reusing connections as much as possible. Thereby, we use push and pop to create and drop temporary solver contexts for individual checks and branches. This optimization also allows the solver to reuse results and insights from already passed constraints, without having to evaluate the whole formula again for each new constraint. We encode all of our generated formulas in the quantifier-free theories of fixed-size bit-vectors, IEEE 754-2008 floating-point, and arrays. While universal quantification is useful to constrain unbounded arrays and loops, such formulas are in general harder to solve due to increased complexity [94, 116] and only enjoy very limited solver support.

As an example, we provide the SMT-LIB formula for the error path of isPalindrome (cf. Fig. 3.1) in Listing 3.11. In this case, the formula depicts the program path which causes an index-out-of-bounds error at run time due to an invalid loop condition in the source code. Memory operations are encoded using the names of the fields and types which they reference, while the different iterations of the loop counters are represented via multiple variables. Due to our memory model, we require array-in-array support in this case to capture the (symbolic) character array that represents the string. Similarly, the formula also defines the values of the array length in an additional symbolic array. This represents the length field of the Java array, where each entry denotes the length of a potentially allocated character array. The variables |s.value|, |s.length|, and |s.chars| then respectively represent the concrete allocation identifier of the parameter that references an entry in the |char[]| memory region as well as the concrete length and array values. As shown in the example before, this formula actually is satisfiable, hence the error state is indeed reachable.

; declare required theories (logic): Quantifier-free Arrays and Fixed-size Bit-vectors
(set-logic QF_ABV)
; variable declarations
; concrete reference of input variable
(declare-const s (_ BitVec 16))
; memory settings for all String.value fields
(declare-const |String.value| (Array (_ BitVec 16) (_ BitVec 16)))
; memory settings for all character array length fields
(declare-const |char[].length| (Array (_ BitVec 16) (_ BitVec 32)))
; memory settings for all character arrays (a character is 16-bit)
(declare-const |char[]| (Array (_ BitVec 16) (Array (_ BitVec 32) (_ BitVec 16))))
; concrete reference of input value field
(declare-const |s.value| (_ BitVec 16))
; concrete value of input string length
(declare-const |s.length| (_ BitVec 32))
; concrete values of input array field
(declare-const |s.chars| (Array (_ BitVec 32) (_ BitVec 16)))
; all permutations of loop variables
(declare-const j0 (_ BitVec 32))
(declare-const j1 (_ BitVec 32))

(assert (not (= s (_ bv0 16)))) ; isPalindrome():2; s != null
(assert (= |s.value| (select |String.value| s)))
(assert (not (= |s.value| (_ bv0 16))))
(assert (= |s.length| (select |char[].length| |s.value|)))
; isPalindrome():4; s.length() >= 2
(assert (bvuge |s.length| (_ bv2 32)))
; ------ iteration 0 ------ i0 = 0; j0 = s.length() - 1
(assert (= j0 (bvsub |s.length| (_ bv1 32))))
(assert (not (= (_ bv0 32) j0))) ; isPalindrome():6; i0 != j0
; String.charAt(0):14; length > 0
(assert (and (bvslt (_ bv0 32) |s.length|) (bvslt j0 |s.length|)))
; isPalindrome():7; charAt(i0) == charAt(j0)
(assert (= |s.chars| (select |char[]| |s.value|)))
(assert (= (select |s.chars| (_ bv0 32)) (select |s.chars| j0)))
; ------ iteration 1 ------ i1 = 1; j1 = s.length() - 2
(assert (= j1 (bvsub |s.length| (_ bv2 32))))
(assert (not (= (_ bv3 32) |s.length|)))
(assert (and (bvslt (_ bv1 32) |s.length|) (bvslt j1 |s.length|)))
(assert (= (select |s.chars| (_ bv1 32)) (select |s.chars| j1)))
; ------ iteration 2 ------ i2 = 2; j2 = s.length() - 3
(assert (not (= (_ bv5 32) |s.length|)))
; String.charAt(2):14; length <= 2; exception branch
(assert (bvsle |s.length| (_ bv2 32)))
(check-sat) ; sat

Listing 3.11: SMT-LIB2 formula for analysis of the error branch in isPalindrome

The corresponding SMT-LIB2 code for the second symbolic execution example—Fig. 3.2—is shown in Listing 3.12. Here, both error cases are explored in one formula by utilizing incremental solving: Via the push and pop commands, we can open up new assertion scopes that contain expressions and checks only valid for one branch, and subsequently pop them from the assertion stack again, thereby checking the satisfiability of both error paths (null pointer and index-out-of-bounds error). As presented in Fig. 3.2, both errors can occur at run time.

; declare required theories (logic): Quantifier-free Fixed-size Bit-vectors
(set-logic QF_BV)
; variable declarations
; memory operations are not required for the error branches, hence no actual
; field-heap variable is declared - no array operations required (cf. QF_BV
; without array support)
; concrete reference of input variable
(declare-fun arr () (_ BitVec 16))
; concrete value of input array length
(declare-fun |arr.length| () (_ BitVec 32))
(declare-fun i () (_ BitVec 32))

; input reference either denotes a valid memory position or is null (0)
(assert (or (= arr (_ bv0 16)) (= arr (_ bv1 16))))
; array length field is always positive (regardless of whether the array is null)
(assert (bvslt (_ bv4294967295 32) |arr.length|))
(assert (not (bvslt i (_ bv0 32)))) ; insertTwice():2; i >= 0

(push) ; incremental solving branch for null-pointer error
(assert (= arr (_ bv0 16))) ; insertTwice():5; arr == null; null pointer exception
(check-sat) ; sat
(pop) ; clear assertions since last push (restore state before branch)

(push) ; incremental solving branch for index-out-of-bounds error
(assert (= arr (_ bv1 16))) ; insertTwice():5; arr != null
(assert (bvslt (bvadd i (_ bv1 32)) |arr.length|)) ; insertTwice():5; i+1 < arr.length
(assert (not (bvult i |arr.length|))) ; insertTwice():8; i >= arr.length; index may be larger than array size
(check-sat) ; sat

Listing 3.12: SMT-LIB2 formula for analysis of both errors in insertTwice
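The same push/pop pattern maps directly onto proprietary bindings. With Z3's Java API, for instance, a branch-local check could look roughly like this (a sketch, assuming the com.microsoft.z3 bindings are on the classpath; the overflow constraint is just an example query):

import com.microsoft.z3.*;

public class IncrementalCheckSketch {
    public static void main(String[] args) {
        try (Context ctx = new Context()) {
            BitVecExpr i = ctx.mkBVConst("i", 32); // a symbolic Java int
            Solver solver = ctx.mkSolver();

            solver.push(); // open a temporary scope for one branch
            // branch-local constraint: i + 1 < i (signed), i.e. overflow
            solver.add(ctx.mkBVSLT(ctx.mkBVAdd(i, ctx.mkBV(1, 32)), i));
            if (solver.check() == Status.SATISFIABLE) {
                System.out.println(solver.getModel()); // e.g. i = 2147483647
            }
            solver.pop(); // drop the branch-local constraints again
        }
    }
}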

3.2.6 Limitations

At the time of writing this thesis, SymJEx is still very much in active development. Despite evaluations showing promising results and support for most functionality of related engines, we still see much potential in improving and extending the project. Nevertheless, in its current state, the engine still has some severe limitations, here categorized into three distinct areas: limitations regarding the input domain, such as assumptions about the program under analysis; problems in the reasoning process, such as memory management and performance; and indirect restrictions due to the dependency on proprietary formats and solvers and their inherent limitations and requirements.

3.2.6.1 Restrictions on Input Domain

Our implementation is currently restricted to the Java environment. While GraalVM inherently supports polyglot execution and also AST interpretation via the Truffle framework, as of yet we cannot utilize those concepts to their full extent. This is due to the lack of information that is initially present in Truffle AST nodes, which are gradually specialized via self-optimization at run time [157, 159]. Hence, our engine would have to work on mostly untyped and abstract nodes for dynamically-typed languages such as JavaScript or Ruby.

Another limitation concerns environment modeling. While GraalVM provides intrinsics and optimizations for IO operations and native (JNI) calls, SymJEx is currently unable to reason beyond such boundaries. For JNI, due to the lack of information about the implementation of the corresponding native methods, we obviously cannot determine the actual memory effects and results. IO operations suffer from a similar problem, as we currently do not provide any form of a symbolic file system to emulate those operations without requiring actual disk access. Furthermore, we assume that the application is single-threaded. While this is sufficient to analyze a variety of implementations, it also prevents us from identifying concurrency issues such as race conditions or deadlocks. Also, introspection features such as Java reflection are only implemented to the extent that prominent functions from the Java Class Library (collections, arrays) use them.

3.2.6.2 Limitations of the Reasoning Process

In addition to limitations due to our target domain, the engine also suffers from problems well-known in the field of symbolic execution. Loops and recurring control flow are a prominent source of problems, as they can quickly cause state explosion due to the exponential increase of the search space. GraalVM allows us to perform aggressive optimizations to mitigate this risk by inlining control flow, unrolling loops, or peeling individual loop iterations. Nevertheless, when such control flow structures remain in the Graal IR, we can only use approximation techniques (e.g. cutting off the loop after a predefined number of iterations or loosening the constraints) to continue the analysis, while potentially risking false-positives. As an example, the engine may report errors that are not valid due to symbolic variable mutations or memory modifications in loops.

3.2.6.3 Limitations in Solving

Other restrictions in our engine stem from the underlying solvers and input encoding: Most prominently, to our knowledge, there are no widely supported theories for strings or structures, and even floating-point is still considered experimental. Additionally, arrays in SMT-LIB behave differently from typical Java arrays, as they commonly are not equipped with a fixed length. SymJEx therefore uses workarounds to limit array lengths and perform the corresponding checks. The stateless nature of SMT-LIB2 additionally limits the supported array (and thus string) operations: Length-dependent operations such as concatenation and copying have to be approximated whenever the corresponding boundaries cannot be resolved directly, as there are no equivalent operations in the SMT-LIB standard and we do not want to use quantifiers to express those constraints logically. Besides SMT-LIB, there are other formats such as BTOR2 [118] that support state manipulations in the form of register and memory modifications; however, they are still not widely supported by solvers and are also restricted in their respective theory support.

Chapter 4

Symbolic Execution for Compiler Optimizations

Application scenarios for symbolic execution in today’s software development processes are manifold. Researchers continuously develop or improve symbolic execution engines, produce better techniques, and extend the field to new language domains and usage scenarios. Symbolic execution has established itself as a reliable and precise tool to improve application security, as evidenced by the numerous achievements in finding weaknesses and flaws in core software systems [115, 43, 143, 125].

Since most implementations operate on intermediate representations of programs, symbolic execution engines are already trivially linked to compilers, benefiting from optimizations and refinements that apply when generating the corresponding IR. With symbolic-execution-based compiler optimizations, we aim to show how we can reverse this link and use symbolic execution as an extension to traditional compilation and program optimization by proving optimization patterns symbolically. Specifically, the definition of a mapping of Graal IR expressions to formulas (cf. Section 3.2.3), in combination with predefined optimization targets, allows us to prove optimizations using high-performance theorem provers, as is common in traditional symbolic execution. This logical mapping has the added benefit of inherently allowing global reasoning by incorporating all conditions and operations in a method, as opposed to typical optimizations.

This chapter explores the details of this approach as well as our implementation. Before we give an overview of the technical nature of our compilation phase and the different optimizations, the next section describes the theoretical background and the deviations from the engine in terms of graph traversal, formula generation, and solver interaction.

4.1 Unbounded Reasoning

The most basic idea of many symbolic execution engines and applications is to determine inputs (models) that cause faults in individual program paths. At first glance, this is in direct contradiction to the principal idea of program optimization, namely introducing transformations for instructions and instruction sequences that apply in general, i.e., for all program paths that cross the specific section in the code. While researchers have proposed ideas to also apply traditional optimizations in such a path-based fashion—even in GraalVM itself with the idea of simulation-based duplication [99]—compiler optimizations are still mostly based on general code invariants that apply regardless of the run-time path context—escape analysis for objects that do not require heap allocation, constant folding, and loop unrolling representing examples thereof.

For merging these two approaches, we therefore have to adapt the traditional way of symbolic execution. Instead of proving that there exists an input combination such that the program exhibits erroneous behavior due to an issue on the corresponding program path, we have to provide evidence that a respective optimization is valid for all possible inputs or states of an operation.

Consequently, there are two key points that have to be adapted: First, path-based reasoning (as in SymJEx) traditionally involves following individual program branches, e.g. the true and false branches of a condition. While we still have to incorporate the semantics implied by the condition in this example into any generated formula, we also face the problem of having to observe both branches simultaneously. As different branches may merge again at some point, both branches and their corresponding effects (variable assignments, memory modifications) have to be considered to enable a proof for an optimization. Figure 4.1 contains a simple Graal IR including a merging path between two conditions.

[Figure 4.1: Graal IR merge sample — a first condition i < 11 whose branches merge into a phi (taking i or 10), which feeds a second condition phi < 1]

At run time, SymJEx would traverse the paths through this IR and accumulate the corresponding constraints. For one path, the engine could perform in-engine simplifications to reduce the second condition: i ≥ 11 results in the second condition 10 < 1, which is false and thus simplified by our constant propagation. The engine can therefore prune the corresponding true branch at run time and reduce its search space. While this may also seem like a viable target for optimizations, our requirement of generating a single formula prevents us from doing so. As we observe all program paths simultaneously, we have to encode the corresponding conditionals as well as the subsequent phi result: i < 11 ⇒ phi = i, i ≥ 11 ⇒ phi = 10. Similarly, the second condition has to be encoded, but as we do not know the actually taken program path, we cannot propagate any constants to optimize a branch or even prune it completely. While there are minor simplifications possible for this example, it nevertheless shows how we have to merge constraints from different branches at the cost of losing the ability to specialize formulas for individual program paths.

These requirements furthermore entail a change in how we generate constraints. Where traditional engines again follow individual branches and encode their semantics directly, the aforementioned updates to the traversal method demand a slightly different approach: Operations on different branches are incorporated into a single formula and therefore have to be adapted to encode the branching behavior directly in the constraints.

While SymJEx primarily supports path-based reasoning, we extended it to also support another method of analysis: unbounded reasoning. In this mode, we do not visit and check individual program paths for errors, but perform a single traversal over the whole IR and generate a holistic formula capturing the whole program behavior at once.

4.1.1 Dominator-based Traversal

As described in Section 3.2, SymJEx follows individual paths through the IR of a method by iterating over all nodes within a basic block on said path. For creating one holistic formula, we therefore adapted this approach to use the dominator relation as described in Section 2.2.1.1 instead. For our purposes, we use the dominator tree to perform graph traversal and iterate over the basic blocks of an IR. Figure 4.2 highlights the differences in our traversal methods using the example of the Graal IR of java.lang.String.charAt from the Java Class Library.

[Figure 4.2: Path-based vs. dominator-based traversal — left: the IR expanded into four individual paths (Path 1–4) from Start to the respective Exception/Return ends; right: the same IR partitioned into basic blocks B0–B4, visited in dominator order]

As depicted in the “expanded” version on the left, symbolic execution in SymJEx would follow the four individual branches to the respective control flow ends and accumulate constraints while checking for potential errors. In dominator-based traversal, however (right-hand side of Fig. 4.2), we visit each basic block of an IR in the order of the dominator tree. In the example, this results in the following order of visited blocks: B0 → B1 → B2 → B3 → B4. The dominator tree here ensures that the input of every node is always visited before the node itself. Additionally, the Graal compiler already provides such a schedule for all basic blocks and the nodes within. The main benefit of this approach is that it allows us to traverse a method’s IR exactly once, while still being able to visit all nodes. Nevertheless, it complicates loop handling, as we lose the ability to reiterate paths through loops.
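A minimal sketch of this traversal, with a hypothetical Block type standing in for Graal's basic blocks; loop back edges are assumed to be excluded from the predecessor lists, otherwise a merge block with a loop predecessor would never become ready:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical basic-block type; the real Graal classes differ.
final class Block {
    final String name;
    final List<Block> predecessors = new ArrayList<>();
    final List<Block> dominated = new ArrayList<>();
    Block(String name) { this.name = name; }
}

final class DominatorTraversal {
    // Visits every block exactly once in dominator order, deferring merge
    // blocks until all their predecessors were processed (cf. B3 in Fig. 4.2).
    static void traverse(Block start) {
        Deque<Block> work = new ArrayDeque<>(List.of(start));
        Set<Block> done = new HashSet<>();
        while (!work.isEmpty()) {
            Block b = work.poll();
            if (!done.containsAll(b.predecessors)) {
                work.addLast(b); // merge block: not ready yet, revisit later
                continue;
            }
            process(b); // generate constraints for all nodes scheduled in b
            done.add(b);
            work.addAll(b.dominated); // children in the dominator tree
        }
    }

    static void process(Block b) { System.out.println("visit " + b.name); }
}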

4.1.2 Block-based Constraints Generation of constraints now also has to take into account the branching and merging behavior of the underlying program. We therefore encode this directly in the formula by assigning constraints always to the basic block of the respective operation. Thereby, we introduce additional boolean symbolic variables for all basic blocks of the program. Those essentially encode the branching behavior / the control flow of the whole program. To combine the now encoded program paths with the corresponding data dependencies (e.g., conditions, inputs), we logically link the operations within specific basic blocks to these control flow variables—i.e., an operation and its corresponding constraint only have to hold for all paths through the respective basic block. Thus, we get one single formula for the solver to evaluate. The difference in semantics to our previous implementation is highlighted in the for- mula that is generated for the analysis of String.charAt (Fig. 4.2). Listing 4.1 shows the constraints for each of the possible program paths using path-based reasoning, resulting 40 4.1. UNBOUNDED REASONING

in four distinct formulas1. 1 Path 1: 2( i < 0) 3 4 Path 2: 5 ¬(i < 0) ∧ ¬(i < value.length) 6 7 Path 3: 8 ¬(i < 0) ∧ (i < value.length) ∧ ¬(|i| < value.length) 9 10 Path 4: 11 ¬(i < 0) ∧ (i < value.length) ∧ (|i| < value.length)

Listing 4.1: Path-based reasoning over String.charAt The formulas for unbounded reasoning are shown in Listing 4.2. B0–B4 denote the aforementioned basic block variables. In addition to the constraints that are now bound to (“implied by”) these basic blocks, the formula also shows how we recreate the actual control flow via those variables. While conditions before impacted the whole formula (i.e., if one condition in the path-based reasoning approach does not hold, the actual branch is unreachable), they now influence the setting of the basic block variables by specifying the successors that are taken depending on the result of the condition. While this system may seem counter-intuitive to tackle loops, we compensate for this problem by transforming all loop effects that are used in subsequent operations (phi nodes) into unbounded symbolic variables to prevent false-positives. This is required anyway, as we cannot model multiple loop iterations due to the lack of quantification or summarization of loops. With that, we also do not prevent optimizations in sequential control flow within loops if they do not depend on the loop variables (or are sound regardless). 1 Case distinction per branch − only one successor is possible in a single path ( B3 ∧ ¬B1 if (i < 0) 2 B0 =⇒ ¬B3 ∧ B1 otherwise 3 4 Visiting B1 is only possible when B0 was visited as well 5 B1 =⇒ B0 ( B2 ∧ ¬B3 if (i < value.length) 6 B1 =⇒ ¬B2 ∧ B3 otherwise 7 8 B2 =⇒ B1 ( B4 ∧ ¬B3 if (i| < |value.length) 9 B2 =⇒ ¬B4 ∧ B3 otherwise 10 11 B3 is a merge block − exactly one of the given blocks must have been visited before 12 B3 =⇒ (B0 ∧ ¬B1 ∧ ¬B2) ∨ (¬B0 ∧ B1 ∧ ¬B2) ∨ (¬B0 ∧ ¬B1 ∧ B2) 13 14 B4 =⇒ B2

Listing 4.2: Unbounded reasoning over String.charAt

Similar to our extensions regarding traversal, such block-based constraints have the inherent benefit of covering the whole method in a single traversal of the IR. Consequently, we can also reduce our interaction with the underlying constraint solver, as

every individual condition is only checked once, as opposed to on every program path leading through it. However, the example already shows that the formula itself is more complex than one for an individual path. This in turn can impact solving times quite significantly. Due to our encoding of control flow behavior, we also limit the potential for in-engine optimizations, as we can no longer simply propagate or nest expressions that depend on preceding branches. Nevertheless, in our opinion, the aforementioned benefits outweigh those problems and also simplify the actual implementation.

¹ |.| denotes the magnitude of a value—with the assumption that value.length cannot be negative, the condition amounts to an unsigned comparison of integers.

4.2 Optimization Algorithm

Using unbounded reasoning, we developed an algorithm for performing compiler optimizations using symbolic execution. Inherently, our approach consists of two parts: the navigation through the IR that accumulates the formula, and the optimization templates that are invoked for individual nodes. The overall process is depicted in pseudocode in Listing 4.3. For a given method’s Graal IR, we create a symbolic state (as described in Section 3.2.2) that contains all generated constraints as well as cached expressions for already resolved Graal IR nodes. Then, we iterate over the Graal IR in order of the dominator relation, with the minor distinction that we also ensure that merge blocks are always visited after their corresponding branches. Otherwise, we would violate the invariant that each node is visited before its usage. This is exemplified in Fig. 4.2, where we ensure that B3 is only visited after both B1 and B2 have been visited as well, despite not being dominated by them. For each node, the phase invokes a slightly customized formula builder of SymJEx and generates the corresponding constraints, adding them to the current state. Additionally, the phase checks whether there is an optimization available for the current node—typically, individual optimizations are registered for specific node types (e.g., LogicNode, AddNode).

Each optimization works on the current formula and temporarily adds its individual constraints to encode the corresponding optimization. In contrast to symbolic execution, where we invoke a solver to determine whether an erroneous program state is satisfiable/reachable via some input assignment, we use the solver to check if an optimization holds for all possible cases. As an example, an optimization that checks whether a given condition is always true has to prove that there is no case where this assumption does not hold. To once again avoid quantification, optimizations usually check the existence of the inverse case—for the example from before, this implies a check whether the corresponding condition may be false. An optimization succeeds if one of the aforementioned checks passes. Then, depending on the implementation, the corresponding node is modified to reflect the optimization in the Graal IR. Currently, most optimizations perform rather simple operations such as introducing new (constant) nodes, replacing existing ones, or modifying and rewiring inputs. Regardless of whether an optimization was applied to the current node, the phase then repeats the above process for subsequent nodes and basic blocks. The exit criterion for the iteration is either that the whole IR was traversed or that a user-defined limit or restriction (maximum number of timeouts, solver calls, etc.) was met. Finally, we still have to ensure that the resulting Graal IR is valid and that all performed optimizations and replacements are appropriately propagated.

SymExBasedOptimization(graalir)
    # create base state for storing constraints
    state := CREATE_INITIAL_STATE(graalir)
    FOR EACH basic block in reverse post-order of the dominator tree DO
        FOR EACH node in schedule of basic block DO
            CREATE_FORMULA(state, node)                   # invoke SymJEx
            optimization := SELECT_OPTIMIZATION_STRATEGY(node)
            IF optimization is available THEN
                APPLY optimization(state, node)
            END
        END
    END
    IF graalir was modified THEN
        CLEANUP(graalir)                                  # scheduling, propagate changes
    END

# basic template for all optimizations (e.g. arithmetic, logic)
Optimization(state, node)
    formula := GET_CURRENT_FORMULA(state)
    ADD_CONSTRAINTS(formula)        # express corresponding optimization via formula
    IF formula is sound THEN        # check if optimization applies
        UPDATE(node)                # reflect optimization in actual IR
    END

Listing 4.3: Pseudocode of our symbolic-execution-based compilation phase

4.3 System Architecture

[Figure: formula accumulation (graph traversal, SymJEx resolvers for e.g. AddNode, IfNode, MulNode, symbolic formula state) and optimization modules (arithmetic, logic, floats) connected to the solver API (Z3, CVC4, Boolector). Steps: (1) create formulas from nodes, (2) propagate state, (3) determine optimizations for node, (4) express potential optimizations as constraints, (5) use solver to prove optimization, (6) mutate graph if optimization is sound.]

Figure 4.3: Schematic overview of our implementation of symbolic-execution-based compiler optimizations

We integrated parts of the previously developed symbolic execution engine, SymJEx, into a newly developed compilation phase for the Graal compiler. To provide a layer of abstraction from the Graal compiler core, our approach was designed to be usable in a plug-and-play fashion. Therefore, our system is wrapped into a custom compiler configuration—an infrastructure provided by GraalVM that is accessible via command-line arguments. This enables an opt-in approach for users and does not interfere with the default workflow of Graal.

The overall architecture of our phase and its components is depicted in Fig. 4.3. Our approach is divided into two parts: First, the formula accumulation process. This component handles the traversal and contains the interface to SymJEx. The latter is used to derive constraints and propagate the state accordingly, while also managing the solver connection. Second, the optimization process that is invoked in between traversal steps, whenever a node is found for which an optimization is available. Optimizations are designed as decoupled modules, typically operating on one type of Graal IR node. This allows for future extensions and additionally simplifies testing of individual optimization patterns. Furthermore, we can selectively enable or disable individual optimizations, which is especially useful for evaluation and benchmarking. An optimization takes the current state and the formula accumulated up to this point and uses a theorem prover to verify its claims or constraints. If such an optimization is proven, the corresponding nodes are modified to reflect the optimization in the Graal IR.
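To illustrate the opt-in mechanism: a compiler configuration is typically selected via a system property when starting the VM. The following invocation is only a sketch; the flag names follow common GraalVM/JVMCI conventions of the time, and the configuration name symex is purely illustrative:

    java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler \
         -Dgraal.CompilerConfiguration=symex MyApplication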

4.4 Symbolic Optimization Phase Implementation

Graal presents a uniform interface for all phases that perform mutations on a Graal IR. As all such phases are executed at run time, on a variety of hardware, and in multiple compilation threads, there are some requirements to prevent negative impacts on the JIT compiler. First and foremost, each phase has to emit valid Graal IR for its current level (as later phases may accept or require different nodes than those operating on the high tier). This includes—for the level at which our approach operates—a valid schedule that defines a partial order over an IR’s basic blocks and their nodes (cf. Section 2.2.1.1). Our algorithm in Listing 4.3 therefore has to perform a cleanup step, where it ensures that the Graal IR is once again in a valid state.

Our phase is designed to work strictly on low-tier Graal IR; thus it is executed at the very end of the low-tier context—the last of three compilation levels within the Graal IR. This stage simultaneously represents the end of the Graal compiler frontend. In the subsequent backend, the compiler generates machine code for the corresponding Graal IR. As the different compilation levels are usually executed strictly sequentially, we also have to ensure that the IR only contains nodes allowed at this stage. As an example, at the low tier, memory access nodes are already annotated with their corresponding address offsets. Thus, we cannot emit a “higher-level” allocation node, as this would cause issues in subsequent compilation steps.

Graal may spawn several interleaving compiler threads, so a compiler phase has to be designed to be thread-safe when accessing shared resources (notably reporting and debugging tools). Every time our phase is started, it therefore initializes a separate solver connection to a predefined backend (Z3 is used by default) and sets up the state and formula. The incoming Graal IR is assumed to be already scheduled—the dominator relation is consequently taken from Graal. More specifically, we traverse the individual block nodes in reverse post-order, which ensures that—also within basic blocks—inputs are always visited before usages. For the example in Fig. 4.2, this results in the following order for the nodes in basic block B0: Start → i → value → 0 → < → if. As optimizations gradually modify the Graal IR and may remove or add nodes, we also check for each

node whether it is still “alive”, i.e., whether the node is still used in the IR. At this point, the phase invokes a slightly modified formula builder from SymJEx that resolves the current node to a logical constraint, which is appended to the formula. Our extensions in that regard allow us to handle low-tier IR nodes or create symbolic variables where we cannot determine actual constraints. Currently, our approach does not feature a memory model, as the memory semantics at this level differ from those at the Graal high tier (for which SymJEx implements its memory model). Therefore, we also extend the existing resolvers to handle memory effects appropriately. As an example, we perform no allocation or type assignment for parameters and constants that are reference types, but simply denote them as unbounded symbolic values. The result of this process is a symbolic state containing constraints for all nodes up to this point.

Optimizations are invoked depending on the current node. During traversal, we check whether an optimization is available for a given node type and then apply it. Each optimization receives the current symbolic state (including the formula), the target Graal IR node, as well as a reference to the solver connection. We realized different optimizations by extending a common optimization template. Its design and the individual implementations are described in more detail in Section 4.4.2. In any case, each optimization reports its status and whether it has modified the Graal IR. This process repeats until all nodes are visited, an error occurs, or a user-defined exit criterion is reached.

After the traversal is done, we apply a canonicalizer to the Graal IR. This built-in compiler phase performs more aggressive optimizations such as pruning of branches, constant folding, and simplification of conditions. Canonicalization therefore typically causes the majority of changes in the graph, as previously inserted or replaced constants get propagated and redundant branches are removed. We apply this step only once at the end to ensure that the cost of our optimizations and changes to the IR remains linear in the number of actual IR nodes within a method.

The compilation phase also has to ensure that the resulting Graal IR is valid. Therefore, we invoke a special scheduler after canonicalization. This Graal phase computes the already mentioned dominator relation of the basic blocks and subsequently schedules all Graal IR nodes within. This ensures that newly added nodes (e.g., replacements due to optimizations) are correctly recognized by the IR and added to the appropriate basic block.
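To make the embedding more concrete, the following minimal sketch shows how such a phase could be structured on top of Graal's phase API. The types BasePhase, StructuredGraph, CoreProviders, ControlFlowGraph, and Block follow the GraalVM compiler sources of the time, whereas SolverConnection, SymbolicState, and OptimizationRegistry are placeholders for our own components and not part of Graal:

    // Sketch only: the helper types are stand-ins, not actual Graal or SymJEx API.
    public class SymbolicOptimizationPhase extends BasePhase<CoreProviders> {
        @Override
        protected void run(StructuredGraph graph, CoreProviders context) {
            // one fresh solver connection per invocation keeps the phase thread-safe
            try (SolverConnection solver = SolverConnection.open("z3")) {
                SymbolicState state = SymbolicState.forGraph(graph, solver);
                ControlFlowGraph cfg = ControlFlowGraph.compute(graph, true, true, true, true);
                for (Block block : cfg.reversePostOrder()) {
                    for (Node node : state.scheduledNodes(block)) {
                        if (!node.isAlive()) {
                            continue; // an earlier optimization removed this node
                        }
                        state.addConstraintsFor(node);     // invoke the SymJEx resolvers
                        OptimizationModule opt = OptimizationRegistry.lookup(node);
                        if (opt != null) {
                            opt.apply(state, node);        // may mutate the Graal IR
                        }
                    }
                }
            }
        }
    }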

4.4.1 Error Handling

As any error in a compilation phase may prevent compilation of the method as a whole, it is vital to prevent foreseeable exceptions and perform frequent error checks. In our case, we have to distinguish between two types of errors, namely solver errors and Graal errors. The former kind is raised whenever the solver interface—regardless of implementation (text-based, native, etc.)—reports an error or a connection is interrupted prematurely. Typical scenarios involve memory issues on the solver side or unsupported constructs (theories or logics—cf. Section 3.1.3.1). Additionally, we have several expected errors for timeouts, which are handled in-engine, depending on the current use case. None of these errors actually impact the current Graal IR. Solver timeouts are treated as if the solver had reported an optimization to be invalid, so as not to make false assumptions; the analysis itself is simply interrupted if a different solver error occurs. In any case, the Graal IR should still be valid after subsequent canonicalization and scheduling.

Graal errors normally arise when the IR becomes inconsistent. Dangling control flow

nodes, type mismatches, or the aforementioned unexpected node types for a compilation tier can all result in a compilation issue. In such cases, we cannot guarantee a valid IR—and therefore correct execution—anymore and have to interrupt compilation completely. While we developed a sophisticated test suite to mitigate such errors, we can never fully rule them out.
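A sketch of how the two error classes could be separated inside the phase; the solver exception types are illustrative stand-ins, while GraalError is the error type the Graal compiler itself uses to signal inconsistencies:

    try {
        if (solver.isUnsatisfiable(formula)) {
            applyOptimization(node);
        }
    } catch (SolverTimeout e) {
        // treated like "optimization invalid": keep the IR untouched, continue
    } catch (SolverError e) {
        // connection loss, solver memory issues, unsupported theory: abort the
        // analysis, but the IR is still valid and gets canonicalized and
        // rescheduled as usual
        abortAnalysis();
    } catch (GraalError e) {
        // inconsistent IR: correct execution can no longer be guaranteed,
        // so compilation of this method must be interrupted completely
        throw e;
    }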

4.4.2 Optimization Template Design

The premise of our work included multiple optimizations for a variety of Graal IR nodes. Therefore, we designed them based on a generic template to minimize the effort of introducing new optimizations. The basic template is given in Listing 4.4. An OptimizationModule essentially captures optimizations on a single Graal IR node type (or its subclasses) and uses multiple implemented Optimizers to check for specific optimizations based on individual values and expressions. The module, in any case, first sets up the environment by cloning the current formula to prevent interference with the actual constraints. As shown in Line 21, we then use the control flow variables of our formula (cf. Section 4.1.2) to assert that the current node (or rather its basic block) is indeed visited. Due to our encoding, this ensures that only constraints on paths that reach this node are active. Then a template method is called with those arguments, which performs the actual checks and optimizations in the corresponding implementation/subclass.

Additionally, this template class defines a helper construct to perform simple checks based on a given Optimizer. Implementations use this interface in a strategy pattern to define individual optimizations. The framework in the base class OptimizationModule first loads the cached expression for a corresponding node and then uses the provided optimizer to add constraints to the formula that logically define the optimization (or rather its validity) (Line 28). We formulate such constraints by assuming the case that the optimization does not hold and then checking for satisfiability. As a consequence, we can avoid quantification and verify each optimization in a single check. The solver is thus used to determine whether the inverse case is indeed unsatisfiable (i.e., whether the optimization always holds). We also ensure that this check is conservative, meaning that it only passes if the solver reports strict unsatisfiability, to exclude timeouts or potential errors. If the check passes, there is no possible assignment under which the optimization does not apply; the optimization is then actually valid and the process can continue. Subsequently, the replacement node is determined (again abstracted via the optimizer in Line 32) and added to the Graal IR. As a final cleanup step, we also have to ensure that subsequent optimizations “know” about this change—i.e., that the constraints mirror the behavior of the Graal IR again. Thus, we use the optimizer once more to replace the target node as well as update the corresponding expressions in the formula and expression cache, as depicted in Line 38.

4.4.2.1 Implemented Optimizations

Our design allows us to add new optimizations for specific nodes with only minor overhead. By now, we support 19 different optimizations that cover a large variety of Graal IR node types, with each optimization performing a number of different checks for specific values or patterns. While the full list also contains more experimental optimizations such as constant input checks on Graal intrinsics (e.g., for Math.sin), we showcase the supported checks for a selected number of optimizations in the next few

 1 abstract class OptimizationModule<T extends Node> {
 2     interface Optimization {
 3         Node getReplacement(Node node);
 4         void addOptimizationExpression(Formula formula, Expr expr);
 5         void onOptimizationSuccessful(State state, Node replacement, Expr expr);
 6     }
 7
 8     ExpressionDSL logic = /* ... */;
 9     Solver solver = /* ... */;
10
11     // template method that is implemented by the concrete optimization
12     abstract boolean apply(State state, Formula formula, T node);
13
14     boolean apply(State state, T node) {
15         // clone current formula to prevent modifications
16         Formula formula = state.getFormula().clone();
17
18         // the current block must be traversed
19         Block block = state.getBasicBlockOfNode(node);
20         // i.e., B == true
21         formula.add(logic.variable(BoolType.BOOL, block));
22         return apply(state, formula, node);
23     }
24
25     boolean check(Optimization optimization, State state, Formula formula, Node node) {
26         // get actually resolved expression for target node
27         Expr expr = state.load(node);
28         optimization.addOptimizationExpression(formula, expr);
29         try {
30             if (solver.isUnsatisfiable(formula)) {
31                 // state is unsatisfiable -> optimization is valid
32                 Node replacement = optimization.getReplacement(node);
33
34                 // replaces target node in Graal IR
35                 Utils.replaceInIR(node, replacement);
36
37                 // also reflect optimization in formula (i.e. replace original expression)
38                 optimization.onOptimizationSuccessful(state, replacement, expr);
39
40                 return true;
41             } else {
42                 // state is satisfiable -> optimization invalid
43                 return false;
44             }
45         } catch (SolverTimeout e) {
46             // solver could not complete request -> cannot safely apply optimization
47             return false;
48         }
49     }
50 }

Listing 4.4: Basic templates for optimizations

 1 class LogicModule extends OptimizationModule<LogicNode> {
 2     Optimization IsAlwaysTrue = new Optimization() {
 3         Node getReplacement(Node node) {
 4             // Graal IR node for "true" constant
 5             return LogicConstantNode.tautology();
 6         }
 7
 8         void addOptimizationExpression(Formula formula, Expr<BoolType> expr) {
 9             formula.add(logic.not(expr));
10         }
11
12         void onOptimizationSuccessful(State state, Node replacement, Expr<BoolType> expr) {
13             state.cache(replacement, logic.TRUE);
14         }
15     };
16
17     Optimization IsAlwaysFalse = new Optimization() {
18         /* ... */
19     };
20
21     boolean apply(State state, Formula formula, LogicNode node) {
22         // check both potential optimization cases
23         return check(IsAlwaysFalse, state, formula, node) ||
24                check(IsAlwaysTrue, state, formula, node);
25     }
26 }

Listing 4.5: Implementation of optimizations on boolean nodes and values

paragraphs. These optimizations resulted in the most findings in our evaluation and give a good overview of the kinds of checks we support in general.

Logic Optimizations While—by the JVM specification [150]—boolean values are encoded as integers, conditions and checks are represented in Graal as LogicNodes. All boolean comparisons and checks are variations of this type; hence this was the first node type that we targeted. Listing 4.5 shows the extensions required to perform optimizations for this node type. First, this class defines Optimizers for the different cases (Line 2 and Line 17). As the implementation shows, the only checks that we perform here are for constant values: true or false. The “always-true” optimizer asserts that the current expression is false by simply negating it (Line 9). If this case is unsatisfiable—i.e., the program state at this point always implies that the expression is true; there is no case where it is false—then the corresponding replacement (a Graal IR constant for true as in Line 5) is inserted. Additionally, we replace the cached value in Line 13 to reflect the change in the formulas as well. The check whether the node always resolves to false is implemented analogously. While these optimizers define the behavior of the checks, their application in Line 23 specifies the actual order in which they are attempted and also selects the corresponding nodes or inputs.

Arithmetic Optimizations GraalVM defines a variety of nodes for arithmetic operations, ranging from simple integer computations (addition, subtraction), over

exact operations that prohibit arithmetic overflows, to floating-point computations. Since we already cover most of these specific implementations in our engine and are therefore able to map them to logical constraints, our optimizations can leverage that fact by mostly using the abstract supertypes of these operations (e.g., defining optimizations for all implementations of AddNode, SubNode, etc.). As exemplified in Listing 4.6, we mostly define a variety of optimizers for the different possible constant values or edge cases. This includes constant checks for 0, 1, −1, etc. for integer values, as well as NaN (Not-a-Number), +0, −0, ∞, etc. for their floating-point counterparts. In Line 9, for example, we define the check for whether the result of an addition is always 0. The formula is therefore adapted to check if the case that the result is non-zero is possible. When successful, the optimizer inserts the corresponding Graal IR constant node (Line 5) and adapts the cache (Line 13). As depicted in Line 26, the integer variant performs all those checks in order and also attempts them on the individual operands of the node, to enable subsequent simplification in the canonicalization step that removes neutral operations (such as addition with 0). Similarly, the floating-point variant (Line 34) adds checks for floating-point constants and special values. As with all floating-point operations, this first and foremost includes a check for NaN, the special value in the floating-point standard [15], typically resulting from 0/0 operations or propagated from other operations involving NaN.

Additionally, we provide optimization templates for subtraction (SubNode) and multiplication (MulNode), as well as dedicated optimizations for integer division and remainder operations (IntegerDivRemNode) and floating-point division (FloatDivNode).

Bitwise Optimizations Our phase attempts bitwise optimizations on operations such as bitwise AND, OR, and XOR, in addition to shift operations (left-shift, arithmetic/logical right-shift), which are encapsulated in ShiftNodes. We can thus check for both constant results and constant operands for all kinds of bitwise operations with minimal additional implementation effort.
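Following the template design, a bitwise module can reuse the same machinery as Listing 4.6. The sketch below is hypothetical (not one of our verbatim modules) and tests whether a bitwise AND always evaluates to 0:

    class AndModule extends OptimizationModule<AndNode> {
        Optimization IsZero = new Optimization() {
            Node getReplacement(Node node) {
                // Graal IR constant 0 of the matching kind
                return ConstantNode.forIntegerKind(node.getStackKind(), 0);
            }

            void addOptimizationExpression(Formula formula, Expr expr) {
                // speculative constraint: "the result can be non-zero"
                formula.add(logic.not(logic.eq(expr, logic.constant(BitVectorType.BV32, 0))));
            }

            void onOptimizationSuccessful(State state, Node replacement, Expr expr) {
                state.cache(replacement, logic.constant(BitVectorType.BV32, 0));
            }
        };

        boolean apply(State state, Formula formula, AndNode node) {
            // if "result != 0" is unsatisfiable, the AND is provably constant 0
            return check(IsZero, state, formula, node);
        }
    }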

Other Optimizations We also implemented several optimizations for nodes that are commonly found in low-tier Graal IR or that may represent potentially expensive operations in machine code later on. This includes narrowing operations for converting values from wider data types to narrower ones (e.g., 64-bit long to 32-bit int) and also pointer compression nodes, whose pointer values—despite our lack of actual memory management—are still checked for potentially constant values. We also defined optimizations for Graal intrinsics. Those mainly cover mathematical operations, including cosine, power, and square root.
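As an illustration of the narrowing checks, the following self-contained SMT-LIB2 query (written by hand, not generated by the phase) asks whether narrowing a 64-bit value to 32 bits and sign-extending it back can ever change the value; the sample range constraint stands in for the path constraints accumulated during traversal:

    (set-logic QF_BV)
    (declare-const x (_ BitVec 64))
    ; stand-in for accumulated path constraints: x is known to be small
    (assert (bvult x (_ bv100 64)))
    ; speculative constraint: the round-trip through 32 bits changes x
    (assert (not (= ((_ sign_extend 32) ((_ extract 31 0) x)) x)))
    (check-sat) ; unsat => the narrowing operation is lossless on this path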

4.5 Example: Optimizing String.charAt

One of the first identified optimizations came from the analysis of the Java Class Library method java.lang.String.charAt(int). This method returns the character at the given position within the target string and is therefore frequently used in most applications. This section gives a practical example of our optimizations based on this method and its Graal IR in Figure 4.4.

 1 class AddModule extends OptimizationModule<AddNode> {
 2     Optimization IsZero = new Optimization() {
 3         Node getReplacement(Node node) {
 4             // Graal IR constant 0
 5             return ConstantNode.forIntegerKind(node.getStackKind(), 0);
 6         }
 7
 8         void addOptimizationExpression(Formula formula, Expr expr) {
 9             formula.add(logic.not(logic.eq(expr, logic.constant(BitVectorType.BV32, 0))));
10         }
11
12         void onOptimizationSuccessful(State state, Node replacement, Expr<BitVectorType> expr) {
13             state.cache(replacement, logic.constant(BitVectorType.BV32, 0));
14         }
15     };
16
17
18     Optimization IsOne = /* ... */;
19     /* ... */
20     Optimization IsNaN = /* ... */;
21     /* ... */
22
23     boolean apply(State state, Formula formula, AddNode node) {
24         if (node.stamp() instanceof IntegerStamp) {
25             // integer operation
26             return check(IsZero, state, formula, node) ||
27                    // apply check to all operands as well
28                    check(IsZero, state, formula, node.x()) ||
29                    check(IsZero, state, formula, node.y()) ||
30                    check(IsOne, state, formula, node) ||
31                    /* ... */;
32         } else {
33             // floating-point operation
34             return check(IsNaN, state, formula, node) ||
35                    /* ... */;
36         }
37     }
38 }

Listing 4.6: Implementation of optimizations on binary addition

We removed some of the more detailed memory-related nodes (e.g., address nodes before every memory access) that usually occur at this compilation tier. In terms of analysis, they would only introduce additional symbolic values without actual usage and—in terms of the example—just add needless complexity to the IR. Nodes that are important for the example are furthermore numbered, to link them to their corresponding variable names later on.

[Figure: the Graal IR of charAt with basic blocks B0–B4 and numbered nodes: 1: parameter i, 2: constant 0, 3: “<”, 4: If (B0); 5: Read#Length, 6: “<”, 7: If (B1); 8: “|<|”, 9: If (B2); Merge/Exception (B3); LoadIndexed/Return (B4). Legend: control flow node/edge, data flow node/edge, basic block.]

Figure 4.4: Graal IR of java.lang.String.charAt

The compilation phase now traverses this IR and uses the formula generator of SymJEx to construct logical constraints. Due to the order of the nodes, we first visit the corresponding inputs and constants of the first basic block, namely i and 0. As they are not yet attached to control flow, we store the generated expression IR in a node cache that maps Graal IR nodes to logical constraints. For i (node 1) we simply create a new symbolic variable and infer its type from the Graal IR node, resulting in a bit-vector variable of length 32—(declare-const i (_ BitVec 32)) when declared in SMT-LIB2. The constant (node 2) is also mapped to a symbolic expression, a 32-bit bit-vector constant ((_ bv0 32)). Then, the logical combinator “<” in node 3 combines those two expressions into a new boolean expression by accessing the cache for the corresponding input nodes: (bvslt i (_ bv0 32)). At this point, our optimizations are invoked for the first time, as the comparison is a LogicNode. Therefore, our implementation checks whether the condition is constant.

The fundamental principle of our unbounded reasoning approach is that we generate one single formula over the course of the whole traversal, saving as many solver calls (which generally represent overhead) as possible in the process. Therefore, we actually push two variants of the formula to the solver. First, we add everything that is factual according to the Graal IR and therefore a fixed part of the overall formula. Then, we add our speculative constraints to “try” the corresponding optimizations. Incremental solving allows us to add the second variant only temporarily by using different assertion contexts, where we first reserve a new slot via (push) and afterwards remove it again with (pop). Thus, the first check—whether the condition is always true—is encoded as in Listing 4.7. Additionally, this also contains the variable for the current basic block, but since we did not encounter any actual control flow yet, it is superfluous for this check. Naturally, in this case, both optimization checks fail, as the condition can evaluate to both true and false, depending on the value of i. The optimization therefore simply finishes without changing the IR or impacting the formula.

Finally, the traversal procedure reaches control flow node “if 4” and generates the first actual control flow constraint for the formula. As B0 is the current basic block, the boolean variable B0 is used to imply the actual constraint. From the Graal IR node we can further infer that the corresponding successor blocks are B3—for the true case—and B1 if the condition evaluates to false. Combined with the assertion about the current block, this fact is encoded in a corresponding conditional: (=> B0 (ite (bvslt i (_ bv0 32)) (and B3 (not B1)) (and (not B3) B1))).

(set-logic QF_BV)
(declare-const i (_ BitVec 32))
(declare-const N3 Bool)
(declare-const B0 Bool)
(assert (= N3 (bvslt i (_ bv0 32))))
(push)
(assert (not N3))
(assert B0)
(check-sat)

Listing 4.7: Encoding of the first check for String.charAt

 1 (pop)
 2 (declare-const B1 Bool)
 3 (declare-const B3 Bool)
 4 (assert (=> B0 (ite N3 (and B3 (not B1)) (and (not B3) B1))))
 5 (declare-const N5 (_ BitVec 32))
 6 (declare-const N6 Bool)
 7 (assert (=> B1 B0))
 8 (assert (=> B1 (bvslt (_ bv4294967295 32) N5))) ; -1 < N5 (<=> N5 >= 0)
 9 (assert (= N6 (bvslt i N5)))
10 (push)
11 (assert N6)
12 (assert B1)
13 (check-sat)

Listing 4.8: Encoding of the false-check in basic block B1 of String.charAt

This constraint now encodes the full branching behavior of the condition, and since this is the last node of the block, the traversal process continues with basic block B1.

Now, node 5 is the first node within the IR that accesses actual memory. Due to our restrictions and lacking memory model, we simply encode this node as a symbolic variable—(declare-const N5 (_ BitVec 32)). While we do not encode the link between the array (the input parameter value) and its length, this still enables us to use the length in checks and computations. Arrays in Java must not have a length of less than zero, information that is also available for the given Graal node. Hence, we can add this additional boundary for the array length node (encoded in SMT-LIB as greater than −1). Next, another logic node (node 6—“<”) is visited and subsequently cached, with another optimization attempt following immediately after. Listing 4.8 depicts the newly introduced constraints, starting with the aforementioned (pop) command to remove the previous speculative constraints. Also, the constraints containing the branching behavior as well as the link between the blocks (Line 7) are pushed to the solver. Once again, the current basic block has to be visited for the optimization check to be valid. Asserting the condition itself checks whether it can ever be true; if the solver falsifies this, the condition is always false and an optimization is possible. Nevertheless, again both branches are feasible; no modification takes place.

In basic block B2 we identify optimization potential for the first time. Following the encoding of logic node 8 (representing an unsigned comparison), we perform another check to determine constant branches. For the true case, this results in the formula

(pop)
(declare-const B2 Bool)
(assert (=> B1 (ite N6 (and B2 (not B3)) (and (not B2) B3))))
(declare-const N8 Bool)
(assert (=> B2 B1))
(assert (= N8 (bvult i N5)))
(push)
(assert (not N8))
(assert B2)
(check-sat)

Listing 4.9: Encoding of the true-check in basic block B2 of String.charAt

presented in Listing 4.9. Once again, those parts of the formula not yet evaluated by the solver are added and the previously attempted checks are removed from the assertion stack. This time, however, the solver yields UNSAT for the check, as the condition at this point can never be false. This is due to the preceding conditions on the current control flow path, where we already ensure that i is not less than 0 (condition (not (bvslt i (_ bv0 32))), which is implied by the path from B0 to B1) and also that i is (signed) less than the array length (symbolic variable N5 in (bvslt i N5), implied by the path from B1 to B2). Therefore, the unsigned check of node 8 in basic block B2 is inevitably true. Thus, our implementation replaces the Graal IR node with a constant and also updates the cached value for N8, which is now the constant true.
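The core of this proof can also be replayed in isolation with any SMT-LIB2 solver. The following self-contained query (ours, not emitted by the phase) combines the two path conditions and asserts the negation of the unsigned comparison:

    (set-logic QF_BV)
    (declare-const i (_ BitVec 32))
    (declare-const n (_ BitVec 32))     ; stands for the array length N5
    (assert (not (bvslt i (_ bv0 32)))) ; path B0 -> B1: i >= 0 (signed)
    (assert (bvslt i n))                ; path B1 -> B2: i < n (signed)
    (assert (not (bvult i n)))          ; speculative: the unsigned check fails
    (check-sat)                         ; unsat => the unsigned check is always true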

[Two IR graphs of charAt side by side: (a) with the “|<|” comparison in B2 replaced by the constant true; (b) after canonicalization, with block B2 and the redundant branch removed.]

(a) Graal IR after optimization (b) Graal IR after canonicalization

Figure 4.5: Resulting Graal IR at the end of the optimization of String.charAt

Figure 4.5a presents the resulting Graal IR immediately after replacement. Since we do not yet propagate the changes to surrounding nodes, only the old node is removed. While the phase would now continue to the end of the IR, we skip this part in this example, as there is no more potential for optimization at this point. In the end, the canonicalizer is invoked on the changed nodes, which eventually simplifies the bulk of the IR, leading to the representation in Fig. 4.5b. Finally, the now redundant conditional as well as the data flow comparison and the whole of basic block B2 are removed, leading

to a simplified variant of the input Graal IR.

4.6 Limitations

Low-tier Graal IR Node Support As our implementation of symbolic-execution-based compiler optimizations fundamentally works on low-tier Graal IR, its operations and expressions are tailored to the nodes appearing at that level. Via SymJEx, we also support a large subset of high-tier IR nodes. Other nodes, however, that do not fall under one of the supported node types or extend them—custom nodes in compiler extensions, platform-specific nodes without abstract supertypes—currently cannot be modeled correctly, due to their unknown semantics. Our current fallback is to simply create unbounded symbolic variables in such cases, which prevents false positives and incorrect optimizations, but also weakens our constraints due to the loss of information.

Memory Operations While the engine itself does not offer a memory model for this level of Graal IR, we still have to ensure that nodes which may imply memory effects are not reduced or modified by mistake. Additionally, any reasoning beyond such operations has to be prohibited, so as to prevent erroneous proofs of optimizations (false positives). The example in Section 4.5 shows how we can nevertheless extract knowledge from memory-accessing methods, but this restriction still significantly limits the analysis of larger methods and more intricate data flow.

Loops and Method Calls In contrast to SymJEx, we cannot enforce a more aggressive inlining policy to remove method invocations—in our case this depends solely on the preceding compilation phases. Furthermore, as the compiler always applies our phase to a single method’s Graal IR, we also cannot follow the control flow into a called method to analyze it and determine its effects on the caller. Method invocations therefore represent another analysis boundary where we have to introduce unbounded symbolic variables instead, to avoid premature optimizations.

Loops represent a slightly different challenge, as they contradict our unbounded reasoning approach (Section 4.1). On the one hand, by following the dominator tree we inherently visit a loop body only once—hence, iterative summarization or repeated invocation as in SymJEx cannot be performed that way. Block-based constraints, on the other hand, also do not favor loops, as this would require additional “pseudo-blocks” (B3 for iteration 1, B3 for iteration 2, etc.) to denote individual iterations in our control flow formulas. Another limitation comes from our run-time constraints: Our goal is for traversal and optimization to scale close to linearly with the method size (in terms of IR). Analyzing loops would increase the complexity, with even more severe repercussions when loops are nested or have unbounded conditions (e.g., depending on a symbolic variable). The implementation therefore manages loops similarly to method invocations, treating loop variables and subsequent control flow as analysis boundaries—i.e., a loop may be preceded by an arbitrary number of invocations of any loop branch, and phi values or results of loops that are used in later computations are fully unbounded and symbolic. While this does limit our capabilities, it still allows us to determine optimizations within loops, as evidenced by some of the examples in our evaluation.
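A small, hypothetical Java example illustrates this boundary: the loop-carried values become unbounded symbols, but a branch in the body that depends only on loop-invariant facts remains provable:

    static int sumIfValid(int[] data) {
        int sum = 0;
        for (int k = 0; k < data.length; k++) {
            // `sum` and `k` are phi values and thus unbounded symbols, but the
            // branch below depends only on the array length, which is known to
            // be non-negative, so it is provably constant even inside the loop
            if (data.length >= 0) {   // always true
                sum += data[k];
            }
        }
        return sum;
    }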

Chapter 5

Evaluation

As the main goal of this project was the identification of new optimizations as well as the resulting performance improvements, our evaluation covers three critical aspects: (1) the detected optimization patterns in actual Graal IR with corresponding proofs, (2) a measurement of the performance impact directly caused by those optimizations, and (3) the overhead of the compilation phase itself, including the time spent in-engine and during solving via theorem provers.

While we also used benchmark suites such as Dacapo [32] and ScalaDacapo [140] to ensure that compilation via our phase succeeds, we mainly used specific benchmarks within the SPECjvm2008 benchmark suite [142] to determine performance metrics. We made this decision based on feedback from researchers and compiler developers, in order to focus on those benchmarks that may be particularly susceptible to compiler optimizations. In addition to those main points, we also developed a test suite that allowed us to verify our optimizations and ensure that the resulting Graal IR is valid. We used Graal-specific compilation tests—namely JTT tests—which evaluate the fundamental transformations that the compiler performs on the Graal IR, to further verify our approach by selectively enabling our compilation phase during those runs. This was simplified by our plug-and-play-style integration. All of the following benchmarks and measurements were taken on an Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz with 16GiB RAM.

5.1 Identified Optimization Patterns

While executing our proposed symbolic-execution-based optimizations on the different test and benchmark suites, we identified several optimization patterns. Using continuous integration, we compiled a large amount of data for hundreds of methods. In a first step, we focused on those optimization locations that could potentially be implemented as native compilation phases and whose correctness could also be verified by hand. Our data therefore contains even more identified optimizations on different methods than presented in this part of the evaluation. The already showcased example in Section 4.5 denotes the first identified optimization. The next paragraphs detail some of the other patterns that were not yet covered by a Graal compiler phase.

Redundant addition The first pattern concerns arithmetic additions, represented by AddNodes in Graal. Figure 5.1 showcases the snippet of Graal IR to which we could apply the optimization. In this snippet, the object field LinkedList.size—henceforth denoted x—is used for a simple decrement, represented in Graal IR as x + (−1).

The field value itself is unbounded. The preceding control flow, however, allows our engine to reason about its value symbolically. The first “if” condition already ensures that the value is not less than 1 on the current path—i.e., x ≥ 1. The second “if” performs an IntegerTest. This operation essentially compares the bit representations of two values and only yields true if no bit is set in both values. As the second operand in this case is a constant with value −2—its 32-bit representation being 11111111 11111111 11111111 11111110—the test only succeeds for operands whose bits are confined to the least-significant position, i.e., 0 or 1 (00000000 00000000 00000000 00000001 in 32-bit representation). Since the target AddNode is on the true-path of the corresponding conditional and the earlier condition already rules out 0, x must be 1 at this point. The subsequent addition (decrement) therefore always results in the value 0.


Figure 5.1: Optimization of a constant arithmetic addition
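A hypothetical Java shape that lowers to this pattern could look as follows; the masked comparison corresponds to the IntegerTest node:

    static int decrementIfSingle(int x) {   // x stands for the LinkedList.size field
        if (x >= 1) {                       // first if: x >= 1 on this path
            if ((x & -2) == 0) {            // IntegerTest with -2: only bit 0 may be set
                // the test limits x to {0, 1}; the outer check rules out 0, so x == 1
                return x - 1;               // x + (-1): provably the constant 0
            }
        }
        return x;
    }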

Floating-point division Graal defines several constant folding operations for all kinds of arithmetic nodes. Floating-point operations, however, represent a challenge due to special values such as ∞, NaN, and signed zeroes. We identified a minor pattern for one such case that was not yet covered by Graal. The Graal IR in Fig. 5.2 contains the corresponding pattern—in this case taken directly from one of the Graal compiler tests. The IEEE-754 floating-point standard [15] specifies that invalid operations such as the depicted 0/0 result in NaN, which our optimizations correctly determine. In an additional step, another (logic-based) optimization evaluates the now constant operands of the comparison and simplifies it as well.


Figure 5.2: Optimization of a floating-point division that is inevitably NaN
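In plain Java, the pattern boils down to two IEEE-754 facts:

    static boolean nanCompare() {
        double d = 0.0 / 0.0;   // invalid operation: the result is NaN
        return d < 1.0;         // every ordered comparison involving NaN is false
    }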

Object equality As mentioned before, we cannot reason about memory effects or the potential contents of certain memory locations. Nevertheless, we identified one optimization pattern that actually involves an address comparison. Without any notion of whether the uncompressed pointer references a correct memory location, the visualization in Fig. 5.3 shows that we can still determine that the corresponding equality check already occurred at a previous location in the IR; thus the result of the second comparison is inevitably constant.


Figure 5.3: Optimization of an address comparison
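Reduced to Java, the pattern corresponds to a repeated reference comparison on a path where its outcome is already fixed (a hypothetical shape, as the actual pattern operates on uncompressed pointers):

    static boolean repeatedEquality(Object a, Object b) {
        if (a == b) {
            return true;
        }
        // on this path the references are known to be unequal, so the
        // repeated comparison is provably the constant false
        return a == b;
    }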

Bitwise simplification Fig. 5.4 showcases an optimization pattern based on a bitwise operation. Here, the bitwise OR depends on values from different branches. These are encoded as phi values, whose inputs denote the values depending on the originally taken branch. In this instance, the operands are simply swapped in either case. The solver can therefore prove that the result of the bitwise operation is always 1, and our phase simplifies the IR accordingly.


Figure 5.4: Optimization of a bitwise computation, where the branching behavior merely swaps the operands
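As plain Java (a hypothetical shape), the pattern looks roughly like this: both branches assign the constants 1 and 0, merely in swapped order, so the OR of the two phi values is 1 on every path:

    static int swappedPhiOr(boolean cond) {
        int x, y;
        if (cond) { x = 1; y = 0; }
        else      { x = 0; y = 1; }
        return x | y;   // provably the constant 1
    }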

Bit shift optimization We also provide optimizations for bit shift operations of all kinds. This last pattern, in Fig. 5.5, contains one such optimization, where once again an array-length access is encoded and used. An arithmetic right-shift¹ by 31 as in this example moves the old sign bit to the position of the least-significant bit while preserving the value’s sign. Therefore, the operation either results in −1 (all bits set) for values below zero or in 0 otherwise. Once again, Graal already infers that the array length must not be below zero, with the subsequent comparison excluding 0 itself as well. The value range of the left input of the shift operation therefore is [1, 2³¹ − 1], which—even when reduced by one as in the IR—is still never less than zero. Thus, the solver can prove that the result of the shift operation can never be different from 0 at this point.

¹ While an arithmetic right-shift maintains the sign of the shifted value, a logical right-shift does not.


Figure 5.5: Optimization of an actually constant shift operation
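As plain Java (a hypothetical shape), the pattern corresponds to:

    static int signShift(int[] arr) {
        int n = arr.length;   // Graal already bounds n >= 0
        if (n != 0) {         // excludes 0, so n is in [1, 2^31 - 1]
            // (n - 1) lies in [0, 2^31 - 2]: the sign bit is never set, so the
            // arithmetic right-shift by 31 is provably the constant 0
            return (n - 1) >> 31;
        }
        return -1;
    }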

5.2 Solver-Time Analysis

We evaluated the performance of our phase on the scimark benchmarks of the SPECjvm2008 benchmark suite. All of these runs were performed with a strict limit on the maximum duration of each solver call—here at most 1000 ms. We had to impose such a limit to make benchmarking possible in the first place, as the accumulated overhead in each method would otherwise slow down compilation too much. While the optimization phase allows such customization, we did not limit the number of solver calls in general. Additionally, we ensured that compilation was completed before taking measurements, so that no interleaving JIT compilation tasks would interfere with the execution. To compensate for the expected solver overhead and thus extended compilation period, we also added additional iterations for the execution with our symbolic compiler optimization phase. In the end, we used 200 iterations for the run without our phase and 400 iterations with integrated symbolic execution.

First, we tried to measure the impact of our phase on the actual compilation time and how much of it is caused by the solver. As research as well as our previous evaluation of SymJEx had shown, the satisfiability checks for large formulas often account for the bulk of the run time in symbolic execution. We experienced similar results—as shown in Fig. 5.6—where the time spent in the solver (for optimization checks) contributes a large part of the overhead that is generally introduced by our phase. The formula builder from SymJEx (the engine) is only a minor part, since we mostly create simple constraints from individual Graal IR nodes. Another significant part of the overhead, however, comes from the optimizations themselves. In this time span, we perform the individual optimization checks for a node, including mutating the state and formula, and also modify the IR if an optimization is found.

While we cannot give more detailed insights into the main reasons for the slowdown in the solver, we can identify reasons for the slowdown in our optimizations. As described in Chapter 4, we perform a variety of checks for each node and sometimes even for each input. For every one of those checks, we have to ensure that the original state and

[Stacked bar chart. Title: SPECjvm2008 scimark benchmark suite phase time analysis; series: optimization time, engine time, solver time; y-axis: Time [ms] (0–12000); x-axis: the individual scimark benchmarks.]

Figure 5.6: SPECjvm2008 solver time analysis

formula are not tainted; hence we have to clone or copy multiple structures and create new expressions that define the target optimization for the current node. In future updates, we can probably reduce this overhead by performing even more in-engine simplifications to reduce the formula altogether and by manually selecting the checks with the most optimization potential for each node.

5.3 Compile-Time Analysis

In addition to observing the phase timings in the performance benchmarks, we also captured the timings of several limited runs (on a reduced set of iterations). These runs highlight one of the main problems of our approach, namely the overhead induced by the solving process. This overhead directly impacts compilation time, as it stalls individual compilation threads and therefore prevents or delays compilation of additional methods, thus decreasing run time performance.

Unlimited Solver Time Fig. 5.7 contains the results of our first evaluation, where we again used the SPECjvm2008 benchmark suite. While we performed this run on only a subset of the iterations compared to our performance-based evaluation, it nevertheless shows the impact that the solvers have on our optimization phase. In this run, we drastically increased the solver timeout to 10 minutes (compared to the 1000 milliseconds of our other runs) to showcase its effects and thus its necessity. While the overall captured solver times already increased in some of the benchmarks, we still identified several occurrences where the corresponding solver ran into timeouts—displayed in Fig. 5.8—and consequently delayed compilation of a single method by over 10 minutes. It is important to note that those runs typically exceeded the corresponding benchmark run and therefore were either not captured by the timer or manually excluded so as not to skew the graphs. These timeouts usually occurred on larger methods, which in turn produce more complex formulas. The number of optimizations found in each benchmark is displayed in Fig. 5.9.

[Stacked bar chart. Series: optimization time, engine time, solver time; y-axis: Time [ms] (0–12000); x-axis: the individual scimark benchmarks.]

Figure 5.7: SPECjvm2008 compile time analysis without early interrupt

[Bar chart. y-axis: Timeouts (0–10); x-axis: the individual scimark benchmarks.]

Figure 5.8: SPECjvm2008 timeouts without early interrupt

[Bar chart. y-axis: Optimizations (0–120); x-axis: the individual scimark benchmarks.]

Figure 5.9: SPECjvm2008 optimizations without early interrupt

Early Interrupt After Timeout We duplicated the previous run with another engine mode, in which we ensured that optimization attempts stop on the first solver timeout. Due to our approach of continuously expanding one single formula, timeouts in solver calls most likely also result in timeouts on consecutive optimization attempts. The results of this approach are showcased in Fig. 5.10, where we again see a spike in solver time despite the small workload. The corresponding timeouts are displayed in Fig. 5.11. As shown in Fig. 5.12, the number of optimizations remained relatively stable.

Minimal Timeout Finally, we performed another benchmark run with a minimal (1000 ms) solver timeout. The results thereof are displayed in Fig. 5.13. As shown, the solver times are drastically reduced compared to the runs with a higher timeout. While the number of timeouts increased significantly (Fig. 5.14), the number of identified optimizations still remained relatively stable (Fig. 5.15).

[Stacked bar chart. Series: optimization time, engine time, solver time; y-axis: Time [ms] (0–25000); x-axis: the individual scimark benchmarks.]

Figure 5.10: SPECjvm2008 compile time analysis with early interrupt

[Bar chart. y-axis: Timeouts (0–5); x-axis: the individual scimark benchmarks.]

Figure 5.11: SPECjvm2008 timeouts with early interrupt

[Bar chart. y-axis: Optimizations (0–90); x-axis: the individual scimark benchmarks.]

Figure 5.12: SPECjvm2008 optimizations with early interrupt

[Stacked bar chart. Series: optimization time, engine time, solver time; y-axis: Time [ms] (0–4500); x-axis: the individual scimark benchmarks.]

Figure 5.13: SPECjvm2008 compile time analysis with early interrupt and minimal timeout

[Bar chart. y-axis: Timeouts (0–50); x-axis: the individual scimark benchmarks.]

Figure 5.14: SPECjvm2008 timeouts with early interrupt and minimal timeout

[Bar chart. y-axis: Optimizations (0–90); x-axis: the individual scimark benchmarks.]

Figure 5.15: SPECjvm2008 optimizations with early interrupt and minimal timeout

5.4 Performance Analysis

For the performance evaluation, we also analyzed the identified and applied optimizations. For our benchmark run on SPECjvm2008, those numbers are visualized in Fig. 5.16. It shows that—despite the minimal solver timeout—the phase is still able to identify numerous optimizations.

[Bar chart. Title: SPECjvm2008 scimark optimization counts; y-axis: Successful optimizations (0–120); bar labels as extracted: 93, 93, 89, 91, 91, 87, 84, 85, 79; x-axis: the individual scimark benchmarks.]

Figure 5.16: SPECjvm2008 applied optimizations

Finally, the actual performance comparison is depicted in the chart of Fig. 5.17. Unfortunately, it shows that the identified optimizations and the corresponding performance gain (if any) still could not outweigh the introduced overhead, producing mixed or worse results for almost all runs.

To get another estimate of the effectiveness of our optimizations, we also ran benchmarks on a slightly modified variant of GraalVM. In this version, IfNodes do not perform canonicalization, resulting in less optimized conditions. We settled on this exact change because previous attempts showed that logic optimizations are most useful and could therefore potentially compensate for some of the now missing simplifications. The results are shown in Fig. 5.18. While the numbers are mostly in favor of our approach, we would still argue that the actual benefit in this case is negligible, especially when comparing the range of the scores with those of the results on the unmodified GraalVM.

5.4.1 Performance Microbenchmarks

As the previous benchmarks yielded no satisfying results, we still aimed to evaluate the gain of particular optimizations. Therefore, we designed a microbenchmark suite, consisting mostly of simple, individual method calls for which benchmarks showed actual, reproducible optimization potential. Those calls were repeated a predefined number of times to force JIT compilation, and their corresponding execution

[Bar chart. Title: SPECjvm2008 scimark benchmark suite score; series: Graal w/ Symbolic Compiler Optimizations, Graal Core; y-axis: Score [ops/s] (0–500); x-axis: the individual scimark benchmarks.]

Figure 5.17: SPECjvm2008 results

[Bar chart. Title: SPECjvm2008 scimark benchmark suite score w/o If canonicalization; series: Graal w/ Symbolic Compiler Optimizations, Graal Core; y-axis: Score [ops/s] (0–350); x-axis: the individual scimark benchmarks.]

Figure 5.18: SPECjvm2008 results without If canonicalization

time was measured. These results proved more fruitful, as visible in Fig. 5.19. The suite contains some of the methods from Section 5.1, notably the Java Class Library methods of java.lang.String, as well as some optimizations that were identified while compiling actual Graal IR nodes. While some of these benchmarks—indexOf, isPalindrome—show no performance benefits or even reductions, the others yielded performance improvements ranging from 1% to 15%.

Figure 5.19: Microbenchmark results (performance increase in %, for charAt, indexOf, forEach, insertTwice, writeString, toUpperCase, isPalindrome, and canonicalize)

5.5 Discussion of Results

Overall, our evaluation could not prove the general usefulness of our optimizations. For the given workloads, however, the microbenchmarks showed that there is potential, which has to be balanced against the overhead introduced by running the phase itself. Individual examples such as those presented before may be viable candidates for integration into the Graal compiler, as further shown in our microbenchmarks. Yet, it would still require a more in-depth analysis to determine why existing optimizations do not yet cover some of the cases (e.g. the redundant array length checks for String.charAt in Section 4.5), or why some of the redundant checks and operations are added to the Graal IR in the first place.

The analysis of the imposed solver overhead in Fig. 5.6 also showed that, despite our strict timeout of 1s, the accumulated solver calls still contribute a large share of the total run-time overhead. Currently, this heavily impacts the JIT compilation process, thus severely affecting the performance of the executed benchmarks and tests. While we performed many runs with different timeout values, experience showed that we had to limit the timeout to a minimum to prevent expensive delays in compilation. This is emphasized by our compile-time analysis results.

In general, evaluating this project is rather tricky due to the lack of insight into the actual compilation process. Because of the multiple, interleaving compilation threads, we typically cannot measure results on-the-fly but have to capture the corresponding metrics, timers, and counters for later evaluation. The Graal compiler provides us with some utilities in that regard, but actual evaluation still often required various runs over multiple days, due to the uncertainty that the solver overhead introduces to the JIT compilation process. The fact that there are several abstraction layers between incoming code and the SMT-LIB formulas passed to the solvers also exacerbates evaluation and debugging. This was most prominent when designing tests and microbenchmarks that verify certain optimizations, as we could never rely on source code being transformed by the compiler in the same way as in the benchmark that triggered the optimizations. As the profile (the contextual information, speculative optimizations, etc.) often plays an important part during compilation, we sometimes were simply unable to reproduce certain patterns without manually constructing Graal IR, which is again a rather infeasible task for larger methods or more complex patterns.
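To make the String.charAt case mentioned above concrete, the following simplified, hypothetical snippet illustrates the kind of redundancy in question: the caller's explicit range check already implies the bounds check that charAt performs internally, so the internal check can be proven redundant on this path.

// Illustrative example (simplified): the explicit check below already guarantees
// 0 <= i < s.length(), so the internal bounds check of String.charAt(i)
// can never fail on this path and is provably redundant.
static char safeCharAt(String s, int i) {
    if (i < 0 || i >= s.length()) {
        return '\0';        // explicit range check by the caller
    }
    return s.charAt(i);     // charAt repeats an equivalent bounds check internally
}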

Chapter 6

Related Work

Our work essentially touches two primary topics, namely symbolic execution and compiler optimization. To bridge the gap between them, we distinguish three areas of related work: contemporary and influential symbolic execution engines, both for the Java world and for other languages; more general formal methods used in compilers, primarily for verification and certification; and existing optimization techniques in the context of symbolic execution.

6.1 Symbolic Execution Engines

Nowadays, the field of symbolic execution is already well-populated, with numerous symbolic execution engines available for a variety of purposes and languages. Despite that, our approach of integrating an engine directly into the JIT compiler distinguishes SymJEx from many other solutions in the field. This includes our way of managing a sea-of-nodes IR and the different optimizations that we already apply before the actual analysis takes place. Nevertheless, continuous research in the area has produced numerous mature symbolic execution engines, some of which greatly influenced the development of this project.

Java PathFinder Derivatives Due to our common target language, the engines most closely related to ours stem from the Java PathFinder (JPF) project [38]. JPF is a model checker for Java bytecode built on a customized JVM that provides its own implementations of Java Class Library functions. This enables JPF to emulate real program execution and verify the behavior of target applications against a given specification in the form of invariants. While JPF itself does not constitute a symbolic execution engine, extensions such as Symbolic PathFinder (SPF) [126] and JDart [110] accomplish that. Additionally, SPF acts as the basis for Java Ranger [141], yet another symbolic execution engine, which extends its functionality with dynamic path-merging techniques by creating summaries of control flow regions to reduce the search space. JPF-based engines mostly use the framework to operate on Java bytecode and its provided utilities for memory management and state propagation. SymJEx does not depend on a custom JVM implementation but rather on a real production-ready compiler; in turn, it lacks support for multithreaded systems. Another difference emerges from the way that SymJEx handles memory operations. While the model-checking aspect of JPF typically checks all variants for symbolic references and potentially aliasing structures, we encode such variations directly in our memory model as complex constraints that are propagated as array operations for evaluation via solvers (cf. Section 3.2.4).
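As a brief sketch of what such an array-based encoding looks like, the standard SMT-LIB theory of arrays resolves a read over a prior write via the read-over-write axiom (standard theory; not necessarily the exact encoding SymJEx emits):

$\mathit{select}(\mathit{store}(h, p, v),\, q) = \mathrm{ite}(p = q,\ v,\ \mathit{select}(h, q))$

Encoding heap accesses this way leaves the aliasing question $p = q$ to the solver instead of enumerating the variants in the engine.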

Standalone Java Engines The verification tool JayHorn [86] shares some similarities with SymJEx by also using an intermediate representation (in fact, multiple intermediate representations) generated from Java bytecode, on which most of the optimization and transformation is accomplished. Despite that, JayHorn is inherently designed as a model checker, producing Horn clauses to encode verification conditions; underneath, a Horn clause engine is used to verify the produced clauses. While the engine typically has to resort to over-approximation (i.e., allowing false positives) in order to verify heap data structures, it uses a counterexample validation step to check potentially incorrect solutions. In contrast to SymJEx, JayHorn also does not fully support Java features such as floating-point values, enums, and strings [87]. JBMC [55] is another Java model checker, developed as a port of CBMC for the C programming language. Unlike our engine, JBMC also uses quantified expressions to provide support for string operations (concatenation, replacement, etc.). It also provides a Java Operational Model to represent parts of the Java Class Library. On the one hand, this puts it at an advantage over SymJEx, as we do not provide such simplifications or a model of this kind at the moment; on the other hand, it may also prevent optimizations, as we perform them after method inlining.

Concolic Engines Concolic execution [139, 162, 152, 154] (symbolic plus concrete execution) is often used to reduce the search space and save solver invocations by additionally allowing concrete execution of individual program paths. Particular instances perform analysis on the main branch that is derived by a concrete execution to find critical or more frequent issues first. Additionally, engines use random testing to quickly determine whether certain paths are feasible instead of employing constraint solvers. For Java, we want to highlight two solutions that effectively use concolic execution. First, jCUTE [138] (and its counterpart for the C language, CUTE [139]) uses instrumentation to add operations to Java code that perform the symbolic execution at run time. This allows symbolic and concrete execution simultaneously, where hard-to-solve constraints are replaced by their concrete counterparts. SymJEx currently lacks the ability to execute analyzed programs concretely. However, as we let the Graal compiler perform the bulk of the setup, preparation, and optimization instead of relying on direct instrumentation of (most likely less-than-ideal) source code, we depend less on the input format and can thus avoid superfluous analysis and checks. COASTAL [152] also performs concolic execution on instrumented Java bytecode and similarly uses concrete execution and random input generation to explore the search space faster. Additionally, it allows accumulating path conditions of concrete executions to create a path tree that summarizes the actual application behavior on the individual paths. COASTAL compares to SymJEx in a similar way as jCUTE. However, an evaluation in a separate, yet-to-be-published paper has shown that we achieve a slightly higher score than COASTAL on the SV-COMP'20 [29] benchmark suite.

KLEE KLEE [43] is currently one of the most prominent symbolic execution engines. Originating from EXE [44], it uses the LLVM compiler infrastructure to apply optimizations akin to SymJEx. Throughout the years, KLEE was extended to numerous different use cases and continuously improved, including support for concolic execution [154], multiple solvers running in parallel [124], as well as taint analysis [56]. KLEE therefore exceeds SymJEx in terms of capabilities, with native support for library calls and IO via built-in models at the system call level. Yet, with our focus on the GraalVM/Java environment, SymJEx (while not directly comparable) can be further improved by the insights gained from KLEE.

6.2 Formal Methods in Compilers

Due to their requirements for correctness and safety, compilers and their corresponding code transformations are a frequent target for formal methods. Early ideas by Boyle et al. [37] go as far as to design a fully verified compiler, consisting of several verifiably correct transformation steps, specified in denotational semantics of a generic language. A more recent approach by Lacey et al. [96] uses temporal logic (CTL) to prove compiler optimizations such as dead code elimination and loop invariant hoisting by applying a model checker to the operational semantics of a simple research language. Temporal logic allows queries that are quantified over paths of states (e.g. "for all paths, n0 must hold at some point"), which makes it very suitable for this kind of task. The scarcity of mature solvers for these kinds of constraints, however, led us to choose quantifier-free SMT instead. While they showcase their theory by manually solving the corresponding proofs, they argue that an automated (or semi-automated) approach may be viable as well. Given the similarity between their research language and compiler intermediate representations, the latter would arguably be a well-suited target. Frederiksen [72] furthermore extends this approach to prove optimizations on redundant, repeated computations.

More practical proposals for the verification of compilers come from Mathews and Jha [112], who also use temporal logic in combination with a custom intermediate representation to verify compiler optimizations. Lopes et al. [109] designed a custom domain-specific language that allows implementation and integrated verification of LLVM compiler optimizations, where counterexamples are generated for optimizations that do not hold. Additionally, Leroy [100] describes the use of a formal proof assistant to build and certify the compiler backend for a C-like language; the corresponding frontend is verified in additional work by Blazy et al. [33]. Our approach differs from this work in the sense that verification of known compiler optimizations is not our goal; rather, we try to find new optimization patterns that exceed or complement existing ones.
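For illustration, the path quantification mentioned above can be written in standard CTL notation (our own rendering, not taken from [96]): the query "for all paths, $n_0$ must hold at some point" corresponds to

$AF\, n_0,$

while a property of the flavor used for dead-code-style reasoning, "on every path, a definition is eventually followed by a use," could read $AG(\mathit{def} \Rightarrow AF\, \mathit{use})$.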

6.3 Symbolic Execution for Optimization

We could identify some approaches that also use symbolic execution for optimizing programs. However, the field remains scarcely populated, with little to no contemporary work. Specifically, to the best of our knowledge, there is no approach for integrating a symbolic execution engine into a compiler infrastructure for performing and identifying compiler optimizations. Nonetheless, we want to highlight some of the key ideas and projects in the research area.

Rus and Wyk [135, 134] proposed a very early idea in this area, namely the integration of temporal logic into a Fortran compiler to identify new opportunities for compiler optimizations. This pioneering work already correctly handles a variety of data types and values by first transforming the input into a process graph, its custom intermediate representation. It then uses an extended form of Computation Tree Logic (CTL) to encode the corresponding branching conditions and states. Temporal logic enables their approach, in contrast to our implementation, to fully utilize quantification and therefore also allows more straightforward handling of loops. Subsequently, a model checker is used to determine whether the input contains actual optimization potential. Despite this promising early work, we could not determine any meaningful successors or contemporary work that ports or translates their approach to a modern (intermediate) language or compiler infrastructure. Their work nevertheless is an important inspiration and may well drive future extensions to our approach. We are currently restricted to first-order logic due to our expression format and the supported solvers, but the inclusion of temporal logic could represent an interesting extension, provided that there is actual solver support.

Another closely related approach stems from Süslü [148], who integrated a symbolic execution engine into a JavaScript (symbolic) partial evaluator, JSSpe (also denoted SPEjs). This partial evaluator aims to minimize JavaScript code by eliminating redundant branches and simplifying corresponding expressions. JSSpe uses a JavaScript source-to-source transpiler to check branch conditions via Microsoft's Z3 SMT solver [59] and subsequently remove dead code. It is presented as an improvement over Facebook's partial evaluator Prepack [70], development of which has since been put on hold. In contrast to those approaches, symbolic-execution-based compiler optimizations are not limited to branch conditions. As we base our optimizations on multiple Graal IR nodes, we also perform other simplifications (arithmetic, floating-point operations) by proving them via constraint solvers. JSSpe furthermore does not fully support complex control flow and is therefore probably most useful for semi-local optimizations, as it seemingly over-approximates data flow after merging branches: updates in two branches of conditionals are thus simply represented as symbolic variables [149].

Not only program sources are a viable target for symbolic optimizations. SYMPLE by Raychev et al. [130] is essentially a query optimization system for processing and transforming large datasets via extensive use of parallelization. It evaluates input queries symbolically to derive summaries that allow for efficient distribution and subsequent parallel execution. Users can describe queries in a C-like language. In contrast to our approach, however, SYMPLE mainly translates individual queries in a restricted format that can be converted to a transformation function. Additionally, due to their target domain, they focus mostly on primitive data types, with basic support for structured data such as vectors and structs, which have to be defined manually before analysis.

Hu et al. [83] present a slightly different idea, where they use symbolic execution to identify minimal conditions for C/C++ preprocessor directives. Such directives can enable or disable platform-specific source code regions or redefine values, a technique frequently used for conditional compilation. In their work, they symbolically analyze the dependencies and values of such preprocessor directives to determine which conditions have to apply in order to compile specific sections of code.

The inverse idea, (compiler) optimizations for symbolic execution, is significantly more populated due to the everlasting requirement of making symbolic execution faster and more scalable: Dong et al. [63] evaluate different compiler optimizations on KLEE to determine their usefulness in terms of test coverage. Similarly, Wagner et al. [153] argue that compilers should only apply optimizations with care when symbolic execution or program analysis of the produced results is the target. They propose an experimental compilation mode that applies only optimizations that do not negatively impact the performance and complexity of subsequent analysis, with little to no regard to actual run-time performance of concrete execution. Due to the way we integrate our symbolic execution phase into the Graal compilation pipeline, and our thus limited influence on the preceding optimizations, this idea is closely related to our implementation of SymJEx, where we also apply a selected number of optimizations and modify the Graal IR to simplify the analysis step.

In addition to its application in JSSpe, partial evaluation has also been proposed in this area: Bubel et al. [40] showcase how the optimizing capabilities of a partial evaluator can effectively boost the performance of a symbolic execution engine that is executed concurrently on the target program. While their system is based on a subset of Java, it nevertheless emphasizes the potential of using mature optimizations to prevent redundant solver invocations and perform much of the evaluation already in-engine.

Chapter 7

Future Work

SymJEx is still an engine in active development, with a variety of options for improvement. As the developed compilation phase largely reuses parts of the symbolic execution engine, most updates also directly impact our implementation. The most promising candidates are probably improvements in our expression framework, including additional in-engine optimizations (cf. Section 3.2.3), as well as advancements in our solver interaction layer to decrease the corresponding overhead and to add further solver backends specialized for our tasks. But we also see potential for extensions and improvements in other aspects, ranging from (better) memory handling to alterations in our traversal method.

7.1 Implementation as Native Compilation Phases

A natural next step of this project would be the implementation of identified optimizations as native Graal optimization phases. The evaluation has shown that there is indeed potential for additional optimizations, for which we intend to provide implementations that could avoid the solver overhead imposed by symbolic execution. Not only are such phases much easier to integrate into the compiler, but we would also gain a more practical way to conduct evaluation and experiments.
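As a minimal sketch of what such a native phase could look like, the following illustrative Java code removes additions with a constant zero operand. It assumes Graal's phase and node API (Phase, StructuredGraph, AddNode); the exact calls and signatures are simplified assumptions and may differ across compiler versions, so this is a sketch rather than a verified implementation:

import org.graalvm.compiler.nodes.StructuredGraph;
import org.graalvm.compiler.nodes.ValueNode;
import org.graalvm.compiler.nodes.calc.AddNode;
import org.graalvm.compiler.phases.Phase;

// Illustrative sketch of a native optimization phase (simplified; the Graal
// API usage below is an assumption, not taken verbatim from the thesis).
public class RedundantAddPhase extends Phase {
    @Override
    protected void run(StructuredGraph graph) {
        // Snapshot the iterable so nodes can be deleted while iterating.
        for (AddNode add : graph.getNodes().filter(AddNode.class).snapshot()) {
            ValueNode x = add.getX();
            ValueNode y = add.getY();
            // x + 0 == x: a pattern proven once via the solver-backed phase,
            // now applied natively without any solver call.
            if (isConstantZero(y)) {
                add.replaceAtUsagesAndDelete(x);
            } else if (isConstantZero(x)) {
                add.replaceAtUsagesAndDelete(y);
            }
        }
    }

    private static boolean isConstantZero(ValueNode node) {
        return node.getStackKind().isNumericInteger()
                && node.isConstant()
                && node.asJavaConstant() != null
                && node.asJavaConstant().asLong() == 0;
    }
}

Hard-coding a proven pattern like this trades the generality of the solver-backed phase for predictable, constant-time matching during compilation.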

7.2 Extension of Capabilities

SymJEx Extensions SymJEx by now supports most functionality of contemporary symbolic execution engines. Perpetual new developments in the field, however, make continuous improvement essential. This includes the incorporation of new theories and approaches such as lazy initialization [89], concolic execution [74], or approximation techniques such as under-constrained execution [68] and counterexample-guided abstraction refinement [49], which reduce the state space of symbolic execution problems or enable better handling of loops. Since most of those approaches also require modifications in our constraint generation procedure, our compilation phase would be a direct beneficiary.

Low-level Memory Modelling The lack of a memory model for low-tier Graal IR severely restricts our reasoning capabilities. While the evaluation has shown that we can still find optimizations despite this limitation, a functioning memory model, as well as its proper integration into the unbounded reasoning approach, could potentially produce better results. Memory access also represents one category of operations where, in our opinion, symbolic execution could infer the most information compared to traditional compiler optimizations, provided a global memory model is available to correctly emulate the program behavior.

Functional Extensions Our work includes the implementation of several optimization templates that tackle some of the most frequent nodes and operations that can occur in Graal IR. Nevertheless, further extensions in that regard are still possible. Notable targets for such improvements include Graal intrinsics for native (math) methods (in addition to those already implemented) as well as simple memory operations with potentially constant results.

7.3 Alternative Traversal Methods

Unbounded reasoning relies on the dominator relation of basic blocks for traversal. We nevertheless considered alternative approaches that may allow faster analysis for individual target nodes, at the cost of additional overhead when extended to whole programs. One idea concerns backward symbolic execution, where traversal starts from a target expression (or a leaf) and iterates over the path to this point in reverse [82, 36]. As this approach is also used successfully in traditional symbolic execution to generate test cases for individual target branches [62, 47] or to perform weakest precondition computation [46], we think that a similar strategy could identify a specific set of optimizations faster. The drawback in our case would be the added complexity due to necessary changes in our traversal method as well as the handling of path merges (in the reverse notation, the conditions represent the actual merges). If this approach is extended to iteratively check all nodes with optimization potential, we furthermore have to find a solution to merge the different paths that originate in the individual optimization targets, in order to reduce complexity and make the approach scalable for larger IRs.
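As a brief illustration of the backward style, standard weakest-precondition reasoning (a textbook example, not specific to the cited works) pushes a target condition backwards over an assignment by substitution:

$\mathit{wp}(x := x + 1,\ x > 0) \;=\; (x + 1 > 0)$

The proof obligation for a target expression can thus be propagated towards the method entry without a forward traversal.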

7.4 Performance Optimization

Online Formula Simplifications While our expression and formula framework already includes many optimizations, we still see potential for more reductions that take the global context into account. Constraint contextual rewriting [21] would be one potential candidate to infer knowledge from constraints and apply it to optimize our formulas for particular checks or branches. Even on a more restricted level, this could help us reduce solver calls, thus improving the overall performance of the phase. Observations during the evaluation indicate that this potential is real: many of the generated formulas still exhibit structure that could be simplified in-engine.
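As a small illustration of such context-aware simplification (our own example, not taken from [21]): under a path constraint $x > 0$, a later condition $x \neq 0$ can be rewritten entirely in-engine,

$x > 0 \ \vdash\ (x \neq 0) \rightsquigarrow \mathit{true},$

eliminating the solver query for the corresponding check.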

Solver Interface Improvements As discussed in Section 3.2.5, our solver interface currently communicates with solver backends either via a textual interface to a separate solver process, or, specifically for Z3, via proprietary Java bindings. We designed the “raw” interface with an emphasis on performance, but the evaluation nevertheless showed that direct integration via the third-party API still yields a significant performance boost.

Z3 is currently the primary solver for our implementation of symbolic-execution-based compiler optimizations, due to its wide support of different theories and its competitive performance. The integration into the Graal compiler, however, complicates the use of the aforementioned proprietary bindings, as we need to register the corresponding libraries already when building the compiler phase, i.e. when compiling the compiler. This limits the performance of our solver connection, as we have to resort to the raw, textual interface. Our solver layer therefore also leaves room for improvement, with the added potential for integrating other solvers as well, if we can determine a measurable impact.
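For illustration, a minimal sketch of such a raw, textual solver connection (our own simplified example; the actual interface in SymJEx is more elaborate): it starts a z3 process reading SMT-LIB from standard input and checks a single formula.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Minimal textual SMT-LIB connection to an external solver process
// (illustrative sketch; assumes a z3 binary on the PATH).
public class RawSolverConnection {
    public static void main(String[] args) throws Exception {
        Process z3 = new ProcessBuilder("z3", "-in").start();
        try (Writer in = new OutputStreamWriter(z3.getOutputStream());
             BufferedReader out = new BufferedReader(
                     new InputStreamReader(z3.getInputStream()))) {
            // Ask whether a counterexample to "x + 0 == x" exists (it must not).
            in.write("(set-logic QF_BV)\n");
            in.write("(declare-const x (_ BitVec 32))\n");
            in.write("(assert (not (= (bvadd x (_ bv0 32)) x)))\n");
            in.write("(check-sat)\n");
            in.flush();
            System.out.println(out.readLine()); // expected output: unsat
        } finally {
            z3.destroy();
        }
    }
}

An unsat answer proves the optimization valid for all inputs, which is exactly the query pattern our phase issues for each candidate.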

7.5 Toolkit Improvements

In terms of debugging and evaluation, the generated (SMT-LIB) formulas represent a significant barrier to entry for newcomers due to their mismatch with the input language and operations. The different abstraction layers and intermediate representations in between further impair any form of analysis. Moreover, the resulting formulas, especially in unbounded reasoning, also tend to be significantly larger than their source language counterparts. To combat those challenges, visualization of formulas or a corresponding mapping to the source language or IR would be beneficial. GraalVM's Ideal Graph Visualizer [158] already accomplishes a related task for Graal IR, hence an extension or a similar tool would improve the understandability and debugging capabilities of SymJEx significantly.

Chapter 8

Conclusion

All of our work originated in the idea of combining the polyglot nature of GraalVM with the expressiveness and reasoning capabilities of symbolic execution. While our current implementation of SymJEx still cannot quite utilize all of GraalVM's tools and frameworks (Truffle support is still a work in progress), our work nevertheless highlights the potential of symbolic execution in an environment such as the Graal compiler. Symbolic execution in general is a trend on the rise: numerous sophisticated solutions have appeared in recent years that still enjoy support, community-driven extension, and extensive usage for a variety of verification-, test-, or coverage-related problems. With KLEE [43] being one of the cornerstones for symbolic execution on LLVM, Java/Symbolic PathFinder [126] plus similar derivatives for Java bytecode and JVM languages, and projects such as SAGE for assembly code, the field is already well-populated; continued success, in addition to steady improvements in the research area, further emphasizes the importance of the concept. All of this makes any attempt to contribute to the field and bring forward new engines or ideas from scratch very challenging. Nonetheless, by combining our approach with a mature compiler infrastructure and adopting new developments regarding symbolic analysis and execution, we could quickly produce promising results and continuously improve the engine from there.

The second part of our work, and in fact the primary subject of this thesis, concerns the idea of not only using the compiler to aid the analysis but also using our analysis framework in the actual JIT compiler. Generating constraints for Graal IR at compile time in a dedicated compilation phase allows us to use theorem provers to identify potential compiler optimizations. While SymJEx provides the basis for our approach, as we reuse its expression framework, the process of generating constraints from Graal IR, and the interface to the constraint solvers, we nevertheless had to adapt it to make our approach viable. Our compilation phase still very much uses symbolic execution, but our reasoning approach differs significantly from SymJEx. This is due to our requirement of proving optimizations that are valid for all program paths, in contrast to SymJEx, where we search for errors that are reachable at all. This disconnect was overcome by modularizing SymJEx and introducing unbounded reasoning, an approach that constructs a holistic formula of all Graal IR nodes covering all program states in a single graph traversal. We accomplish this by also encoding the control flow in the formula itself, thereby giving the solver the power to determine a valid execution sequence. This technique speeds up our traversal at the cost of generating more and more complex constraints.
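As a small sketch of this control-flow encoding (our own simplified illustration, not a formula produced by the implementation): for a branch if (c) x = a; else x = b;, the value of x after the merge can be encoded as

$x_{\mathrm{merge}} = \mathrm{ite}(c,\, a,\, b),$

so both branch outcomes are captured in a single formula, and the solver, rather than the traversal, selects the feasible execution.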
In the course of this project, we implemented several optimizations that check individual operations such as comparisons, integer and floating-point arithmetic, and bitwise operations for optimization potential by invoking a proprietary theorem prover on an accumulated formula. These were evaluated on a number of test and benchmark suites, in the course of which we identified several actual optimizations in Graal IR. Subsequently, we measured their impact on run-time performance via additional benchmark runs as well as separately crafted microbenchmarks. Unfortunately, the evaluation did not produce the expected successes but rather proved that the Graal IR, while sometimes still not quite reduced to a minimum, is already significantly optimized. Additionally, we learned the hard way about the overhead of using theorem provers, which sometimes took up more than half of the total phase execution time. Nonetheless, we think that the compilation phase provides a well-suited base for further extensions and optimizations. In our opinion, more extensive benchmarking could also show further potential or give us more insight into the best parameter settings, solver timeouts, and boundaries.

While working on our engine and the compilation phase, we gained significant insight into the development of the Graal compiler and its many different components. Those include the various types of nodes in the intermediate representation, the compilation pipeline with its variety of optimizations and transformations, and AOT compilation and points-to analysis via SubstrateVM. As SMT-LIB was our format of choice for constraint generation and solving, we furthermore learned to use the language efficiently, studying the different logics and theories and memorizing the semantic differences compared to the familiarity of Java and similar languages. It furthermore presented a challenge to map from object-oriented, memory- and side-effect-ridden Java code (albeit in the form of a structured graph) to this static, pure, and complete format without any notion of memory, references, or complex structures.

Overall, we can summarize that it is indeed an interesting idea to combine formal methods and verification techniques with concepts such as JIT compilation, compiler IR, and program optimizations. While our work could not yet directly contribute to improvements of the GraalVM, we hope that we opened up possibilities for future extensions of the engine as well as the compilation phase, and showcased how verification methods of this kind can be combined with a state-of-the-art compiler.

List of Figures

2.1 Overview of the Java Virtual Machine
2.2 The architecture of the Graal compiler, its interaction with the Java HotSpot VM and the added symbolic execution optimization phase
2.3 Annotated Graal IR of a simple method

3.1 isPalindrome symbolic execution tree
3.2 insertTwice symbolic execution tree
3.3 Overview of automated program analysis tools [53]
3.4 Overview of SymJEx, the symbolic execution engine for GraalVM
3.5 Class structure of symbolic states within SymJEx
3.6 Class diagram of Graal IR transformations in SymJEx
3.7 Transformation from Java source code and Graal IR into expressions and constraints
3.8 Distinct references—no aliasing
3.9 One reference is null
3.10 All references are null
3.11 Aliasing between references
3.12 Potential memory settings when combining two stacks

4.1 Graal IR merge sample
4.2 Path-based vs. dominator-based traversal
4.3 Schematic overview of our implementation of symbolic-execution-based compiler optimizations
4.4 Graal IR of java.lang.String.charAt
4.5 Resulting Graal IR at the end of the optimization of String.charAt

5.1 Optimization of a constant arithmetic addition
5.2 Optimization of a floating-point division that is inevitably NaN
5.3 Optimization of an address comparison
5.4 Optimization of a bitwise computation, where the branching behavior merely swaps the operands
5.5 Optimization of an actually constant shift operation
5.6 SPECjvm2008 solver time analysis
5.7 SPECjvm2008 compile time analysis without early interrupt
5.8 SPECjvm2008 timeouts without early interrupt
5.9 SPECjvm2008 optimizations without early interrupt
5.10 SPECjvm2008 compile time analysis with early interrupt
5.11 SPECjvm2008 timeouts with early interrupt
5.12 SPECjvm2008 optimizations with early interrupt
5.13 SPECjvm2008 compile time analysis with early interrupt and minimal timeout
5.14 SPECjvm2008 timeouts with early interrupt and minimal timeout
5.15 SPECjvm2008 optimizations with early interrupt and minimal timeout
5.16 SPECjvm2008 applied optimizations
5.17 SPECjvm2008 results
5.18 SPECjvm2008 results without If canonicalization
5.19 Microbenchmark results

Listings

2.1 Method for safe array access
3.1 Java source code of isPalindrome
3.2 Java source code of insertTwice
3.3 Java representation of a logic bomb from [160]
3.4 SMT-LIB2 formula to verify the assertion
3.5 Generate model/counterexample that violates the assertion
3.6 SymJEx engine interpreter loop
3.7 Source code to expression transformation: Java source code
3.8 Source code to expression transformation: Expression DSL
3.9 Source code to expression transformation: SMT-LIB2 formula
3.10 Simple definition of a Stack type and an example call
3.11 SMT-LIB2 formula for analysis of the error branch in isPalindrome
3.12 SMT-LIB2 formula for analysis of both errors in insertTwice
4.1 Path-based reasoning over String.charAt
4.2 Unbounded reasoning over String.charAt
4.3 Pseudocode of our symbolic-execution-based compilation phase
4.4 Basic templates for optimizations
4.5 Implementation of optimizations on boolean nodes and values
4.6 Implementation of optimizations on binary addition
4.7 Encoding of the first check for String.charAt
4.8 Encoding of the false-check in basic block B1 of String.charAt
4.9 Encoding of the true-check in basic block B2 of String.charAt

Bibliography

[1] “The Apache Groovy Programming Language,” http://www.groovy-lang.org/.

[2] “The Clojure Programming Language,” https://clojure.org/.

[3] “CVC4 suffered a segfault: Offending address is 0x30 · Issue #3601 · CVC4/CVC4,” https://github.com/CVC4/CVC4/issues/3601.

[4] “Error in bitvector reasoning with QF BV · Issue #2856 · Z3Prover/z3,” https://github.com/Z3Prover/z3/issues/2856.

[5] “JDK 10,” https://openjdk.java.net/projects/jdk/10/.

[6] “The Kotlin Programming Language,” https://kotlinlang.org/.

[7] “Memory leak (inifinite loop?) · Issue #3602 · CVC4/CVC4,” https://github.com/CVC4/CVC4/issues/3602.

[8] “The Ruby Programming Language on the JVM,” https://www.jruby.org/.

[9] “The Scala Programming Language,” https://www.scala-lang.org/.

[10] “SMT-LIB The Satisfiability Modulo Theories Library,” http://smtlib.cs.uiowa.edu/theories-FixedSizeBitVectors.shtml.

[11] “SMT-LIB The Satisfiability Modulo Theories Library,” http://smtlib.cs.uiowa.edu/theories-ArraysEx.shtml.

[12] “SMT-LIB The Satisfiability Modulo Theories Library,” http://smtlib.cs.uiowa.edu/theories-Reals_Ints.shtml.

[13] “Unexpected model for ALL logic · Issue #2857 · Z3Prover/z3,” https://github.com/Z3Prover/z3/issues/2857.

[14] “What is Jython?” https://www.jython.org/.

[15] “IEEE Standard for Floating-Point Arithmetic,” Institute of Electrical and Electronics Engineers, Tech. Rep., Jul. 2019.

[16] “Graalvm/graaljs,” Oracle, Sep. 2020.

[17] “Graalvm/graalpython,” Oracle, Sep. 2020.

[18] “Oracle/truffleruby,” Oracle, Sep. 2020.

[19] B. Algaze, “Software is Increasingly Complex. That Can Be Dangerous. - ExtremeTech,” https://www.extremetech.com/computing/259977-software-increasingly-complex-thats-dangerous, Dec. 2017.

[20] W. Araujo, L. C. Briand, and Y. Labiche, “Enabling the runtime assertion checking of concurrent contracts for the Java modeling language,” in 2011 33rd International Conference on Software Engineering (ICSE), May 2011, pp. 786–795.

[21] A. Armando and S. Ranise, “Constraint contextual rewriting,” Journal of Symbolic Computation, vol. 36, no. 1, pp. 193–216, Jul. 2003.

[22] D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Computing Surveys, vol. 26, no. 4, pp. 345–420, Dec. 1994.

[23] R. Baldoni, E. Coppa, D. C. D’Elia, and C. Demetrescu, “Assisting Malware Analysis with Symbolic Execution: A Case Study,” in Cyber Security Cryptography and Machine Learning, ser. Lecture Notes in Computer Science, S. Dolev and S. Lodha, Eds. Cham: Springer International Publishing, 2017, pp. 171–188.

[24] R. Baldoni, E. Coppa, D. C. D’Elia, C. Demetrescu, and I. Finocchi, “A Survey of Symbolic Execution Techniques,” ACM Computing Surveys, vol. 51, no. 3, pp. 1–39, Jul. 2018.

[25] C. Barrett, A. Stump, and C. Tinelli, “The SMT-LIB standard: Version 2.0,” in Proceedings of the 8th International Workshop on Satisfiability modulo Theories (Edinburgh, UK), A. Gupta and D. Kroening, Eds., 2010.

[26] C. Barrett, D. Dill, and J. Levitt, “A decision procedure for bit-vector arithmetic,” in Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175), Jun. 1998, pp. 522–527.

[27] O. Bazhaniuk, J. Loucaides, L. Rosenbaum, M. R. Tuttle, and V. Zimmer, “Symbolic execution for BIOS security,” in Proceedings of the 9th USENIX Conference on Offensive Technologies, ser. WOOT’15. USA: USENIX Association, Aug. 2015, p. 8.

[28] T. Bergan, D. Grossman, and L. Ceze, “Symbolic execution of multithreaded programs from arbitrary program contexts,” ACM SIGPLAN Notices, vol. 49, no. 10, pp. 491–506, Oct. 2014.

[29] D. Beyer, “Advances in Automatic Software Verification: SV-COMP 2020,” in Tools and Algorithms for the Construction and Analysis of Systems, ser. Lecture Notes in Computer Science, A. Biere and D. Parker, Eds. Cham: Springer International Publishing, 2020, pp. 347–367.

[30] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu, “Bounded Model Checking,” Advances in Computers, vol. 58, 2018.

[31] N. Bjørner, V. Ganesh, R. Michel, and M. Veanes, “An SMT-LIB Format for Sequences and Regular Expressions,” in SMT Workshop, p. 10, Jan. 2012.

[32] S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann, “The DaCapo benchmarks: Java benchmarking development and analysis,” in Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, ser. OOPSLA ’06. New York, NY, USA: Association for Computing Machinery, Oct. 2006, pp. 169–190.

[33] S. Blazy, Z. Dargaye, and X. Leroy, “Formal Verification of a C Compiler Front-End,” in FM 2006: Formal Methods, ser. Lecture Notes in Computer Science, J. Misra, T. Nipkow, and E. Sekerinski, Eds. Berlin, Heidelberg: Springer, 2006, pp. 460–475.

[34] S. Böhme and M. Moskal, “Heaps and Data Structures: A Challenge for Automated Provers,” in Automated Deduction – CADE-23, N. Bjørner and V. Sofronie-Stokkermans, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, vol. 6803, pp. 177–191.

[35] E. Börger, E. Grädel, and Y. Gurevich, The Classical Decision Problem. Springer Science & Business Media, 2001.

[36] R. S. Boyer, B. Elspas, and K. N. Levitt, “SELECT—a formal system for testing and debugging programs by symbolic execution,” in Proceedings of the International Conference on Reliable Software. Los Angeles, California: Association for Computing Machinery, Apr. 1975, pp. 234–245.

[37] J. Boyle, R. Resler, and V. Winter, “Do you trust your compiler? Applying formal methods to constructing high-assurance compilers,” in Proceedings 1997 High-Assurance Engineering Workshop, Aug. 1997, pp. 14–24.

[38] G. Brat, K. Havelund, S. Park, and W. Visser, “Java PathFinder - Second Generation of a Java Model Checker,” in Proceedings of the Workshop on Advances in Verification, 2000.

[39] R. Brummayer and A. Biere, “Boolector: An Efficient SMT Solver for Bit-Vectors and Arrays,” in Tools and Algorithms for the Construction and Analysis of Systems, ser. Lecture Notes in Computer Science, S. Kowalewski and A. Philippou, Eds. Springer Berlin Heidelberg, 2009, pp. 174–177.

[40] R. Bubel, R. Hähnle, and R. Ji, “Interleaving Symbolic Execution and Partial Evaluation,” in Formal Methods for Components and Objects. Springer, Berlin, Heidelberg, Nov. 2009, pp. 125–146.

[41] S. Bucur, V. Ureche, C. Zamfir, and G. Candea, “Parallel symbolic execution for automated real-world software testing,” in Proceedings of the Sixth Conference on Computer Systems, ser. EuroSys ’11. Salzburg, Austria: Association for Computing Machinery, Apr. 2011, pp. 183–198.

[42] R. M. Burstall, “Some Techniques for Proving Correctness of Programs which Alter Data Structures,” Machine Intelligence, vol. 7, no. 23-50, p. 3, 1972.

[43] C. Cadar, D. Dunbar, and D. R. Engler, “KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs,” in OSDI, 2008.

[44] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler, “EXE: Automatically Generating Inputs of Death,” ACM Transactions on Information and System Security, vol. 12, no. 2, pp. 10:1–10:38, Dec. 2008.

[45] C. Cadar and K. Sen, “Symbolic execution for software testing: Three decades later,” Communications of the ACM, vol. 56, no. 2, p. 82, Feb. 2013.

[46] S. Chandra, S. J. Fink, and M. Sridharan, “Snugglebug: A powerful approach to weakest preconditions,” in Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’09. New York, NY, USA: Association for Computing Machinery, Jun. 2009, pp. 363–374.

[47] F. Charreteur and A. Gotlieb, “Constraint-Based Test Input Generation for Java Bytecode,” in 2010 IEEE 21st International Symposium on Software Reliability Engineering. San Jose, CA, USA: IEEE, Nov. 2010, pp. 131–140.

[48] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea, “Cloud9: A software testing service,” ACM SIGOPS Operating Systems Review, vol. 43, no. 4, pp. 5–10, Jan. 2010.

[49] E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith, “Counterexample-Guided Abstraction Refinement,” in Computer Aided Verification, ser. Lecture Notes in Computer Science, E. A. Emerson and A. P. Sistla, Eds. Berlin, Heidelberg: Springer, 2000, pp. 154–169.

[50] L. A. Clarke, “A program testing system,” in Proceedings of the 1976 Annual Conference, ser. ACM ’76. Houston, Texas, USA: Association for Computing Machinery, Oct. 1976, pp. 488–491.

[51] C. Click, “Global Code Motion/Global Value Numbering,” in Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, ser. PLDI ’95. New York, NY, USA: ACM, 1995, pp. 246–257.

[52] E. Cohen, M. Moskal, S. Tobies, and W. Schulte, “A Precise Yet Efficient Memory Model For C,” Electronic Notes in Theoretical Computer Science, vol. 254, pp. 85–103, Oct. 2009.

[53] J. Cohen, “Contemporary Automatic Program Analysis,” Las Vegas, 2014.

[54] S. A. Cook, “The complexity of theorem-proving procedures,” in Proceedings of the Third Annual ACM Symposium on Theory of Computing, ser. STOC ’71. Shaker Heights, Ohio, USA: Association for Computing Machinery, May 1971, pp. 151–158.

[55] L. Cordeiro, P. Kesseli, D. Kroening, P. Schrammel, and M. Trtik, “JBMC: A Bounded Model Checking Tool for Verifying Java Bytecode,” in Computer Aided Verification, H. Chockler and G. Weissenbacher, Eds. Cham: Springer International Publishing, 2018, vol. 10981, pp. 183–190.

[56] R. Corin and F. A. Manzano, “Taint Analysis of Security Code in the KLEE Symbolic Execution Engine,” in Information and Communications Security, ser. Lecture Notes in Computer Science, T. W. Chim and T. H. Yuen, Eds. Berlin, Heidelberg: Springer, 2012, pp. 264–275.

[57] L. D. Couto, P. W. V. Tran-Jørgensen, R. S. Nilsson, and P. G. Larsen, “Enabling continuous integration in a formal methods setting,” International Journal on Software Tools for Technology Transfer, Oct. 2019.

[58] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently computing static single assignment form and the control dependence graph,” ACM Transactions on Programming Languages and Systems, vol. 13, no. 4, pp. 451–490, Oct. 1991.

[59] L. de Moura and N. Bjørner, “Z3: An Efficient SMT Solver,” in Tools and Algorithms for the Construction and Analysis of Systems, ser. Lecture Notes in Computer Science, C. R. Ramakrishnan and J. Rehof, Eds. Springer Berlin Heidelberg, 2008, pp. 337–340.

[60] L. De Moura and N. Bjørner, “Satisfiability modulo theories: Introduction and applications,” Communications of the ACM, vol. 54, no. 9, pp. 69–77, Sep. 2011.

[61] X. Deng, J. Lee, and Robby, “Efficient and formal generalized symbolic execution,” Automated Software Engineering, vol. 19, no. 3, pp. 233–301, Sep. 2012.

[62] P. Dinges and G. Agha, “Targeted test input generation using symbolic-concrete backward execution,” in Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ser. ASE ’14. New York, NY, USA: Association for Computing Machinery, Sep. 2014, pp. 31–36.

[63] S. Dong, O. Olivo, L. Zhang, and S. Khurshid, “Studying the influence of standard compiler optimizations on symbolic execution,” in 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE), Nov. 2015, pp. 205–215.

[64] G. Duboscq, L. Stadler, T. Wuerthinger, D. Simon, C. Wimmer, and H. Mössenböck, “Graal IR: An Extensible Declarative Intermediate Representation,” in Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop, Shenzhen, China, Feb. 2013.

[65] G. Duboscq, T. Würthinger, and H. Mössenböck, “Speculation without regret: Reducing deoptimization meta-data in the Graal compiler,” in Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, ser. PPPJ ’14. Cracow, Poland: Association for Computing Machinery, Sep. 2014, pp. 187–193.

[66] G. Duboscq, T. Würthinger, L. Stadler, C. Wimmer, D. Simon, and H. Mössenböck, “An intermediate representation for speculative optimizations in a dynamic compiler,” in Proceedings of the 7th ACM Workshop on Virtual Machines and Intermediate Languages, ser. VMIL ’13. Indianapolis, Indiana, USA: Association for Computing Machinery, Oct. 2013, pp. 1–10.

[67] G. M. Duboscq, “Combining speculative optimizations with flexible scheduling of side-effects,” Ph.D. dissertation, Johannes Kepler University Linz, Apr. 2016.

[68] D. Engler and D. Dunbar, “Under-constrained execution: Making automatic code destruction easy and scalable,” in Proceedings of the 2007 International Symposium on Software Testing and Analysis - ISSTA ’07. London, United Kingdom: ACM Press, 2007, pp. 1–4.

[69] J. Etheredge, “Software Complexity Is Killing Us,” https://www.simplethread.com/software-complexity-killing-us/, Jan. 2018.

[70] Facebook, “Prepack · Partial evaluator for JavaScript,” https://prepack.io/, 2015.

[71] S. J. Fink and F. Qian, “Design, Implementation and Evaluation of Adaptive Recompilation with On-Stack Replacement,” in International Symposium on Code Generation and Optimization (CGO), 2003, pp. 241–252.

[72] C. C. Frederiksen, “Correctness of Classical Compiler Optimizations using CTL,” Electronic Notes in Theoretical Computer Science, vol. 65, no. 2, pp. 37–51, Apr. 2002.

[73] A. Fromherz, K. S. Luckow, and C. S. Păsăreanu, “Symbolic Arrays in Symbolic PathFinder,” ACM SIGSOFT Software Engineering Notes, vol. 41, no. 6, pp. 1–5, Jan. 2017.

[74] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed automated random testing,” in Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’05. Chicago, IL, USA: Association for Computing Machinery, Jun. 2005, pp. 213–223.

[75] P. Godefroid, M. Y. Levin, and D. Molnar, “SAGE: Whitebox Fuzzing for Security Testing,” Queue, vol. 10, no. 1, pp. 20–27, Jan. 2012.

[76] P. Godefroid, M. Y. Levin, and D. A. Molnar, “Automated whitebox fuzz testing,” in Proceedings of the Network and Distributed System Security Symposium, NDSS 2008, San Diego, California, USA, 10th February - 13th February 2008. The Internet Society, 2008.

[77] J. Gosling, B. Joy, G. Steele, G. Bracha, and A. Buckley, The Java® Language Specification, Java SE 8 Edition, 1st ed. Upper Saddle River, NJ: Addison-Wesley, 2014.

[78] S. Guo, M. Kusano, C. Wang, Z. Yang, and A. Gupta, “Assertion guided symbolic execution of multithreaded programs,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2015. Bergamo, Italy: Association for Computing Machinery, Aug. 2015, pp. 854–865.

[79] S. L. Hantler and J. C. King, “An Introduction to Proving the Correctness of Programs,” ACM Computing Surveys, vol. 8, no. 3, pp. 331–353, Sep. 1976.

[80] U. Hölzle, C. Chambers, and D. Ungar, “Debugging optimized code with dynamic deoptimization,” ACM SIGPLAN Notices, vol. 27, no. 7, pp. 32–43, Jul. 1992.

[81] F. Howar, F. Jabbour, and M. Mues, “JConstraints: A Library for Working with Logic Expressions in Java,” in Models, Mindsets, Meta: The What, the How, and the Why Not? Essays Dedicated to Bernhard Steffen on the Occasion of His 60th Birthday, ser. Lecture Notes in Computer Science, T. Margaria, S. Graf, and K. G. Larsen, Eds. Cham: Springer International Publishing, 2019, pp. 310–325.

[82] W. Howden, “Methodology for the Generation of Program Test Data,” IEEE Transactions on Computers, vol. C-24, no. 5, pp. 554–560, May 1975.

[83] Y. Hu, E. Merlo, M. Dagenais, and B. Lagüe, “C/C++ Conditional Compilation Analysis Using Symbolic Execution,” in Proceedings of the International Conference on Software Maintenance (ICSM’00), ser. ICSM ’00. Washington, DC, USA: IEEE Computer Society, 2000, pp. 196–.

[84] J. C. Huang, “An Approach to Program Testing,” ACM Computing Surveys, vol. 7, no. 3, pp. 113–128, Sep. 1975.

[85] Jim Highsmith, Mike Mason, and Neal Ford, “Implications of Tech Stack Complexity for Executives,” https://www.thoughtworks.com/insights/blog/implications-tech-stack-complexity-executives, Dec. 2015.

[86] T. Kahsai, P. Rümmer, H. Sanchez, and M. Schäf, “JayHorn: A Framework for Verifying Java programs,” in Computer Aided Verification, S. Chaudhuri and A. Farzan, Eds. Cham: Springer International Publishing, 2016, vol. 9779, pp. 352–358.

[87] T. Kahsai, P. Rümmer, and M. Schäf, “JayHorn: A Java Model Checker,” in Tools and Algorithms for the Construction and Analysis of Systems, ser. Lecture Notes in Computer Science, D. Beyer, M. Huisman, F. Kordon, and B. Steffen, Eds. Cham: Springer International Publishing, 2019, pp. 214–218.

[88] E. G. Karpenkov, K. Friedberger, and D. Beyer, “JavaSMT: A Unified Interface for SMT Solvers in Java,” in Verified Software. Theories, Tools, and Experiments, ser. Lecture Notes in Computer Science, S. Blazy and M. Chechik, Eds. Cham: Springer International Publishing, 2016, pp. 139–148.

[89] S. Khurshid, C. S. Păsăreanu, and W. Visser, “Generalized Symbolic Execution for Model Checking and Testing,” in Tools and Algorithms for the Construction and Analysis of Systems, ser. Lecture Notes in Computer Science, H. Garavel and J. Hatcliff, Eds. Springer Berlin Heidelberg, 2003, pp. 553–568.

[90] J. C. King, “A new approach to program testing,” ACM SIGPLAN Notices, vol. 10, no. 6, pp. 228–233, Apr. 1975.

[91] J. C. King, “Symbolic Execution and Program Testing,” Commun. ACM, vol. 19, no. 7, pp. 385–394, Jul. 1976.

[92] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood, “seL4: Formal verification of an OS kernel,” in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP ’09. Big Sky, Montana, USA: Association for Computing Machinery, Oct. 2009, pp. 207–220.

[93] T. Kotzmann, C. Wimmer, H. Mössenböck, T. Rodriguez, K. Russell, and D. Cox, “Design of the Java HotSpot™ client compiler for Java 6,” ACM Transactions on Architecture and Code Optimization, vol. 5, no. 1, pp. 7:1–7:32, May 2008.

[94] G. Kovásznai, A. Fröhlich, and A. Biere, “On the Complexity of Fixed-Size Bit-Vector Logics with Binary Encoded Bit-Width,” in SMT 2012. 10th International Workshop on Satisfiability Modulo Theories, 2012, pp. 44–30.

[95] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea, “Efficient State Merging in Symbolic Execution,” in Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’12. New York, NY, USA: ACM, 2012, pp. 193–204.

[96] D. Lacey, N. D. Jones, E. Van Wyk, and C. C. Frederiksen, “Compiler Optimization Correctness by Temporal Logic,” Higher-Order and Symbolic Computation, vol. 17, no. 3, pp. 173–206, Sep. 2004.

[97] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in International Symposium on Code Generation and Optimization, 2004. CGO 2004., Mar. 2004, pp. 75–86.

[98] T. Lengauer and R. E. Tarjan, “A fast algorithm for finding dominators in a flowgraph,” ACM Transactions on Programming Languages and Systems, vol. 1, no. 1, pp. 121–141, Jan. 1979.

[99] D. Leopoldseder, L. Stadler, T. Würthinger, J. Eisl, D. Simon, and H. Mössenböck, “Dominance-based duplication simulation (DBDS): Code duplication to enable compiler optimizations,” in Proceedings of the 2018 International Symposium on Code Generation and Optimization - CGO 2018. Vienna, Austria: ACM Press, 2018, pp. 126–137.

[100] X. Leroy, “Formal certification of a compiler back-end or: Programming a compiler with a proof assistant,” in POPL ’06, 2006.

[101] G. Li and G. Gopalakrishnan, “Scalable SMT-based verification of GPU kernel functions,” in Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE ’10. Santa Fe, New Mexico, USA: Association for Computing Machinery, Nov. 2010, pp. 187–196.

[102] G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan, “GKLEE: Concolic verification and test generation for GPUs,” ACM SIGPLAN Notices, vol. 47, no. 8, pp. 215–224, Feb. 2012.

[103] L. Li, Y. Lu, and J. Xue, “Dynamic symbolic execution for polymorphism,” in Proceedings of the 26th International Conference on Compiler Construction, ser. CC 2017. Austin, TX, USA: Association for Computing Machinery, Feb. 2017, pp. 120–130.

[104] X. Li, D. Shannon, I. Ghosh, M. Ogawa, S. P. Rajan, and S. Khurshid, “Context-Sensitive Relevancy Analysis for Efficient Symbolic Execution,” in Programming Languages and Systems, ser. Lecture Notes in Computer Science, G. Ramalingam, Ed. Berlin, Heidelberg: Springer, 2008, pp. 36–52.

[105] Y. Li, Z. Su, L. Wang, and X. Li, “Steering symbolic execution to less traveled paths,” ACM SIGPLAN Notices, vol. 48, no. 10, pp. 19–32, Oct. 2013.

[106] T. Liang, A. Reynolds, N. Tsiskaridze, C. Tinelli, C. Barrett, and M. Deters, “An efficient SMT solver for string constraints,” Formal Methods in System Design, vol. 48, no. 3, pp. 206–234, Jun. 2016.

[107] Y. Lin, “Symbolic Execution with Over-Approximation,” Thesis, University of Melbourne, Melbourne, Dec. 2017.

[108] B. H. Liskov and S. N. Zilles, “Specification techniques for data abstractions,” IEEE Transactions on Software Engineering, vol. SE-1, no. 1, pp. 7–19, Mar. 1975.

[109] N. P. Lopes, D. Menendez, S. Nagarakatte, and J. Regehr, “Provably correct peephole optimizations with alive,” ACM SIGPLAN Notices, vol. 50, no. 6, pp. 22–32, Jun. 2015.

[110] K. Luckow, M. Dimjašević, D. Giannakopoulou, F. Howar, M. Isberner, T. Kahsai, Z. Rakamarić, and V. Raman, “JDart: A Dynamic Symbolic Analysis Framework,” in Tools and Algorithms for the Construction and Analysis of Systems, M. Chechik and J.-F. Raskin, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016, vol. 9636, pp. 442–459.

[111] S. Malik and L. Zhang, “Boolean satisfiability from theoretical hardness to practical success,” Communications of the ACM, vol. 52, no. 8, pp. 76–82, Aug. 2009.

[112] A. Mathews and S. K. Jha, “Verifying Compiler Optimizations in the CASH compiler framework,” http://www.cs.cmu.edu/~jha/VCASH/, Apr. 2007.

[113] F. Merz, S. Falke, and C. Sinz, “LLBMC: Bounded Model Checking of C and C++ Programs Using a Compiler IR,” in Verified Software: Theories, Tools, Experiments, ser. Lecture Notes in Computer Science, R. Joshi, P. Müller, and A. Podelski, Eds. Springer Berlin Heidelberg, 2012, pp. 146–161.

[114] A. Møller and M. I. Schwartzbach, “Static program analysis,” Lecture notes on Static Program Analysis, Oct. 2018.

[115] D. Molnar, X. C. Li, and D. A. Wagner, “Dynamic test generation to find integer bugs in x86 binary linux programs,” in Proceedings of the 18th Conference on USENIX Security Symposium, ser. SSYM’09. Montreal, Canada: USENIX Association, Aug. 2009, pp. 67–82.

[116] D. Monniaux, “A Quantifier Elimination Algorithm for Linear Real Arithmetic,” arXiv:0803.1575 [cs], Sep. 2008.

[117] L. Moura and N. Bjørner, “Efficient E-Matching for SMT Solvers,” in Proceedings of the 21st International Conference on Automated Deduction: Automated Deduction, ser. CADE-21. Bremen, Germany: Springer-Verlag, Jul. 2007, pp. 183–198.

[118] A. Niemetz, M. Preiner, C. Wolf, and A. Biere, “Btor2, BtorMC and Boolector 3.0,” in Computer Aided Verification, ser. Lecture Notes in Computer Science, H. Chockler and G. Weissenbacher, Eds. Cham: Springer International Publishing, 2018, pp. 587–595.

[119] N. Bjørner, “Satisfiability Modulo Theories,” https://theory.stanford.edu/~nikolaj/setss.html#/.

[120] Oracle, “Java Platform Standard Edition 8 Documentation,” https://docs.oracle.com/javase/8/docs/.

[121] Oracle, “Java™ HotSpot Virtual Machine Performance Enhancements,” https://docs.oracle.com/javase/8/docs/technotes/guides/vm/performance-enhancements-7.html.

[122] Oracle, “Why GraalVM,” https://www.graalvm.org/docs/why-graal/.

[123] M. Paleczny, C. Vick, and C. Click, “The Java HotSpot™ Server Compiler,” in Proceedings of the 1st Java Virtual Machine Research and Technology Symposium. Monterey, CA, USA: USENIX, Apr. 2001.

[124] H. Palikareva and C. Cadar, “Multi-solver Support in Symbolic Execution,” in Computer Aided Verification, ser. Lecture Notes in Computer Science, N. Sharygina and H. Veith, Eds. Springer Berlin Heidelberg, 2013, pp. 53–68.

[125] C. Păsăreanu, P. Mehlitz, D. Bushnell, K. Gundy-Burlet, M. Lowry, S. Person, and M. Pape, “Combining unit-level symbolic execution and system-level concrete execution for testing NASA software,” in ISSTA, Jan. 2008, pp. 15–26.

[126] C. S. Păsăreanu, W. Visser, D. Bushnell, J. Geldenhuys, P. Mehlitz, and N. Rungta, “Symbolic PathFinder: Integrating symbolic execution with model checking for Java bytecode analysis,” Automated Software Engineering, vol. 20, no. 3, pp. 391–425, Sep. 2013.

[127] C. Peraire, “Formal testing of object-oriented software: From the method to the tool,” Ph.D. dissertation, EPF Lausanne, 1015 Lausanne, Switzerland, 1998.

[128] L. H. Pham, Q. L. Le, Q.-S. Phan, J. Sun, and S. Qin, “Enhancing Symbolic Execution of Heap-based Programs with Separation Logic for Test Input Generation,” arXiv:1712.06025 [cs], Dec. 2017.

[129] A. Prokopec, G. Duboscq, D. Leopoldseder, and T. Würthinger, “An Optimization-Driven Incremental Inline Substitution Algorithm for Just-in-Time Compilers,” in 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). Washington, DC, USA: IEEE, Feb. 2019, pp. 164–179.

[130] V. Raychev, M. Musuvathi, and T. Mytkowicz, “Parallelizing user-defined aggregations using symbolic execution,” in Proceedings of the 25th Symposium on Operating Systems Principles, ser. SOSP ’15. New York, NY, USA: Association for Computing Machinery, Oct. 2015, pp. 153–167.

[131] J. Rose, “JEP 243: Java-Level JVM Compiler Interface,” https://openjdk.java.net/jeps/243.

[132] J. Rose, “JEP 317: Experimental Java-Based JIT Compiler,” https://openjdk.java.net/jeps/317.

[133] P. Rümmer and T. Wahl, “An SMT-LIB Theory of Binary Floating-Point Arithmetic,” International Workshop on Satisfiability Modulo Theories (SMT), p. 14, 2010.

[134] T. Rus and E. Van Wyk, “Using Model Checking in a Parallelizing Compiler,” Parallel Processing Letters, vol. 8, pp. 459–471, Dec. 1998.

[135] T. Rus and E. Van Wyk, “Model checking as a tool used by parallelizing compilers,” in 2nd International Workshop on Formal Methods for Parallel Programming: Theory and Applications, 1997, p. 13.

[136] R. Sasnauskas, O. Landsiedel, M. H. Alizai, C. Weise, S. Kowalewski, and K. Wehrle, “KleeNet: Discovering insidious interaction bugs in wireless sensor networks before deployment,” in Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, ser. IPSN ’10. New York, NY, USA: Association for Computing Machinery, Apr. 2010, pp. 186–196.

[137] K. Sen, “Concolic testing,” in Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’07. Atlanta, Georgia, USA: Association for Computing Machinery, Nov. 2007, pp. 571–572.

[138] K. Sen and G. Agha, “CUTE and jCUTE: Concolic Unit Testing and Explicit Path Model-Checking Tools (Tools Paper),” Defense Technical Information Center, Fort Belvoir, VA, Tech. Rep., Jan. 2006.

[139] K. Sen, D. Marinov, and G. Agha, “CUTE: A concolic unit testing engine for C,” ACM SIGSOFT Software Engineering Notes, vol. 30, no. 5, pp. 263–272, Sep. 2005.

[140] A. Sewe, M. Mezini, A. Sarimbekov, and W. Binder, “Da Capo con Scala: Design and analysis of a Scala benchmark suite for the Java virtual machine,” in Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, ser. OOPSLA ’11. New York, NY, USA: Association for Computing Machinery, Oct. 2011, pp. 657–676.

[141] V. Sharma, S. Hussein, M. W. Whalen, S. McCamant, and W. Visser, “Java Ranger at SV-COMP 2020 (Competition Contribution),” in Tools and Algorithms for the Construction and Analysis of Systems, A. Biere and D. Parker, Eds. Cham: Springer International Publishing, 2020, vol. 12079, pp. 393–397.

[142] K. Shiv, K. Chow, Y. Wang, and D. Petrochenko, “SPECjvm2008 Performance Characterization,” in Computer Performance Evaluation and Benchmarking, ser. Lecture Notes in Computer Science, D. Kaeli and K. Sachs, Eds. Berlin, Heidelberg: Springer, 2009, pp. 17–35.

[143] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna, “SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis,” in 2016 IEEE Symposium on Security and Privacy (SP), May 2016, pp. 138–157.

[144] D. Simon, “Libgraal: GraalVM compiler as a precompiled GraalVM native image,” https://medium.com/graalvm/libgraal-graalvm-compiler-as-a-precompiled-graalvm-native-image-26e354bee5c, Jul. 2019.

[145] D. Simon, C. Wimmer, B. Urban, G. Duboscq, L. Stadler, and T. Würthinger, “Snippets: Taking the High Road to a Low Level,” ACM Transactions on Architecture and Code Optimization, vol. 12, no. 2, pp. 20:1–20:25, Jun. 2015.

[146] L. Stadler, T. Würthinger, and H. Mössenböck, “Partial Escape Analysis and Scalar Replacement for Java,” in Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, ser. CGO ’14. Orlando, FL, USA: Association for Computing Machinery, Feb. 2014, pp. 165–174.

[147] S. Subramanian, M. Berzish, V. Ganesh, and O. Tripp, “A Solver for a Theory of Strings and Bit-vectors,” in Proceedings of the 39th International Conference on Software Engineering Companion, ser. ICSE-C ’17. Piscataway, NJ, USA: IEEE Press, 2017, pp. 124–126.

[148] S. Süslü, “JSSpe: A Symbolic Partial Evaluator for JavaScript,” Thesis, The University of Texas at Arlington, Apr. 2018.

[149] S. Süslü and C. Csallner, “SPEjs: A Symbolic Partial Evaluator for JavaScript,” in Proceedings of the 1st International Workshop on Advances in Mobile App Analysis, ser. A-Mobile 2018. New York, NY, USA: ACM, 2018, pp. 7–12.

[150] T. Lindholm, F. Yellin, G. Bracha, and A. Buckley, The Java Virtual Machine Specification, Java SE 8 Edition, 1st ed. Addison-Wesley Professional, 2014.

[151] M. Trtík and J. Strejček, “Symbolic Memory with Pointers,” in Automated Technology for Verification and Analysis, F. Cassez and J.-F. Raskin, Eds. Cham: Springer International Publishing, 2014, vol. 8837, pp. 380–395.

[152] W. Visser and J. Geldenhuys, “COASTAL: Combining Concolic and Fuzzing for Java (Competition Contribution),” in Tools and Algorithms for the Construction and Analysis of Systems, A. Biere and D. Parker, Eds. Cham: Springer International Publishing, 2020, vol. 12079, pp. 373–377.

[153] J. Wagner, V. Kuznetsov, and G. Candea, “-OVERIFY: Optimizing programs for fast verification,” in 14th Workshop on Hot Topics in Operating Systems (HotOS XIV). Santa Ana Pueblo, NM: USENIX Association, May 2013.

[154] X. Wang, J. Sun, Z. Chen, P. Zhang, J. Wang, and Y. Lin, “Towards Optimal Concolic Testing,” in 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), May 2018, pp. 291–302.

[155] V. Wiels, R. Delmas, D. Doose, P.-L. Garoche, J. Cazin, and G. Durrieu, “Formal Verification of Critical Aerospace Software,” AerospaceLab, no. 4, pp. 1–8, May 2012.

[156] C. Wimmer, C. Stancu, P. Hofer, V. Jovanovic, P. Wögerer, P. B. Kessler, O. Pliss, and T. Würthinger, “Initialize once, start fast: Application initialization at build time,” Proceedings of the ACM on Programming Languages, vol. 3, no. OOPSLA, pp. 184:1–184:29, Oct. 2019.

[157] C. Wimmer and T. Würthinger, “Truffle: A self-optimizing runtime system,” in Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity, ser. SPLASH ’12. Tucson, Arizona, USA: Association for Computing Machinery, Oct. 2012, pp. 13–14.

[158] T. Würthinger, C. Wimmer, and H. Mössenböck, “Visualization of Program Dependence Graphs,” in Compiler Construction: 17th International Conference, Held as Part of the Joint European Conferences on Theory and Practice of Software, vol. 4959, Mar. 2008, pp. 193–196.

[159] T. Würthinger, C. Wimmer, A. Wöß, L. Stadler, G. Duboscq, C. Humer, G. Richards, D. Simon, and M. Wolczko, “One VM to rule them all,” in Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, ser. Onward! 2013. Indianapolis, Indiana, USA: Association for Computing Machinery, Oct. 2013, pp. 187–204.

[160] H. Xu, Z. Zhao, Y. Zhou, and M. R. Lyu, “Benchmarking the Capability of Symbolic Execution Tools with Logic Bombs,” IEEE Transactions on Dependable and Secure Computing, pp. 1–1, 2018.

[161] G. Yang, A. Filieri, M. Borges, D. Clun, and J. Wen, “Advances in Symbolic Execution,” in Advances in Computers. Elsevier, 2019, vol. 113, pp. 225–287.

[162] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim, “QSYM: A practical concolic execution engine tailored for hybrid fuzzing,” in Proceedings of the 27th USENIX Conference on Security Symposium, ser. SEC’18. Baltimore, MD, USA: USENIX Association, Aug. 2018, pp. 745–761.

[163] Y. Zheng, X. Zhang, and V. Ganesh, “Z3-str: A Z3-based String Solver for Web Application Analysis,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2013. New York, NY, USA: ACM, 2013, pp. 114–124.

Statutory Declaration

I declare under oath that I have written this master's thesis independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked all passages taken verbatim or in substance from other sources as such. This master's thesis is identical to the electronically submitted text document.

Linz, September 14, 2020

Sebastian Kloibhofer, BSc