jbse-manual Documentation Release latest

Pietro Braione

Aug 25, 2021

Contents

1 About this book
  1.1 What is JBSE?
  1.2 Who is the author of this book?
  1.3 Where do I find this book?

2 Introduction
  2.1 Software analysis
  2.2 What is symbolic execution?
  2.3 Symbolic execution with objects as inputs

3 Getting started with JBSE
  3.1 Obtaining and installing JBSE
  3.2 A basic example
  3.3 Assertions and assumptions

4 Using JBSE
  4.1 The symbolic execution classes
  4.2 Creating a symbolic executor

CHAPTER 1

About this book

This book teaches you how to install, use and modify JBSE, an open source framework for the analysis of Java programs.

1.1 What is JBSE?

JBSE is the Java Bytecode Symbolic Executor. Basically, it is a special-purpose Java Virtual Machine written in Java that can be used for program analysis and automated test generation. The homepage of JBSE is at https://pietrobraione.github.io/jbse/ and its Github repository is at https://github.com/pietrobraione/jbse.

1.2 Who is the author of this book?

The author of this book is the main maintainer of JBSE, Pietro Braione. You can contact me via email at [email protected].

1.3 Where do I find this book?

This book is available on Github. Its repository is https://github.com/pietrobraione/jbse-manual. The book is written in reStructuredText and is published on readthedocs at https://jbse-manual.readthedocs.io.

CHAPTER 2

Introduction

Welcome. Let me introduce you to JBSE and explain what it is and what it can do.

JBSE is a special-purpose Java Virtual Machine (JVM). As you may know, a JVM is what is necessary to execute a program written in the programming languages Java, Scala, Clojure, Groovy, and many others. To be more precise, a JVM is able to execute the special format emitted by the compilers of these languages, the so-called Java bytecode. The programming languages that compile to Java bytecode achieve their portability across different platforms because their compilers do not translate programs directly to machine language, but to Java bytecode that, unlike machine language, is CPU- and OS-independent. It is sufficient to port a JVM implementation across different platforms, and automatically all the programs compiled to Java bytecode can be executed unchanged on all of them. The Java bytecode format is precisely documented in the Java Virtual Machine Specification (JVMS) books, which describe how a compliant JVM must execute a program in Java bytecode. The reference JVM implementation is Oracle’s Hotspot JVM, but there are many other ones, e.g., IBM’s OpenJ9 or aicas’ JamaicaVM.

So JBSE is a JVM, and therefore it can be used as a drop-in replacement for Hotspot to execute Java (or Scala, Clojure, Groovy...) software. Right? Well, not really. JBSE’s main purpose is to analyze, rather than execute, Java software.

2.1 Software analysis

Let us face the reality: Too often software systems do not work as expected. There are many reasons why this happens, but the most cogent one is possibly that software systems quickly turn complex, and when they turn complex, they usually turn extremely complex. The Windows 10 operating system’s source code, for example, is about 50 million lines across 3.5 million files, and when checked out it occupies about 300 GB of disk space. Complexity in structure implies complexity in behavior: Unforeseeable behaviors are its main consequence, and the root cause of bugs.

A possible way to dominate this complexity is to empower the software engineer with tools that help him or her understand how the system behaves. These tools perform what is commonly called software analysis and can be roughly classified into two categories: static and dynamic analysis tools.

Static analysis tools extract information about a software system without executing it. The well known Findbugs, Checkstyle and PMD tools perform a kind of static analysis based on the idea of scanning a system’s source code in search of occurrences of a number of predefined code patterns, each indicating the possible occurrence of a different kind of bug. Static analysis techniques usually require the availability of the source code and produce approximate answers, where false alarms and missed bugs are the norm, but their answers, when correct, can provide very general information on the correctness of the program.

Conversely, dynamic analysis tools gather information on the software under analysis by observing the effects of its execution. Testing is the quintessential dynamic analysis activity: It observes the effects of the execution of the software system when fed with a finite set of inputs, in search of the manifestations of software defects. Dynamic analyses are usually very precise, but since they are bound to the (few) executions they are able to observe, they usually produce less general results than static analyses. For example, as observed by Dijkstra, testing alone cannot be used to assess the absence of any category of software bugs, while static analyses, in principle, may.

JBSE performs a kind of analysis called symbolic execution, which can be used both to verify the correctness of a program with respect to some desired properties expressed as assertions, and to generate test vectors for the program. When used for verification JBSE expects that you specify the verification properties of interest for your project as a set of assumptions and assertions. Assumptions specify the conditions that must be satisfied for an execution to be relevant. Preconditions are a typical form of assumptions, allowing, e.g., to specify the range of the possible values for the program inputs. Assertions specify the conditions that must be satisfied for an execution to be correct. JBSE attempts to determine whether some input exists that satisfies all the assumptions and falsifies at least one assertion. In this regard JBSE is more similar in spirit, implementation and mode of use to tools like Symbolic PathFinder, Sireum/Kiasan and JNuke.
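The question "does some input satisfy all the assumptions and falsify at least one assertion?" can be illustrated with a deliberately naive sketch. The class below is not part of JBSE and does not work the way JBSE does: it simply enumerates a finite range of int inputs, whereas a symbolic executor reasons about whole sets of inputs at once. The class name, the method name and the example predicates are all assumptions made for illustration.

```java
import java.util.OptionalInt;
import java.util.function.IntPredicate;

/** Toy stand-in for a verifier: it enumerates a finite input range instead of reasoning symbolically. */
final class CounterexampleSearch {

    /**
     * Returns the first input in [lo, hi] that satisfies the assumption
     * and falsifies the assertion, or an empty result if there is none.
     */
    static OptionalInt find(IntPredicate assumption, IntPredicate assertion, int lo, int hi) {
        // A long counter avoids an endless loop when hi == Integer.MAX_VALUE.
        for (long v = lo; v <= hi; v++) {
            int x = (int) v;
            if (assumption.test(x) && !assertion.test(x)) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        IntPredicate nonNegative = x -> x >= 0;           // assumption (a precondition)
        IntPredicate squareNotSmaller = x -> x * x >= x;  // assertion that holds on the range
        IntPredicate squareGreater = x -> x * x > x;      // assertion that fails for 0 and 1

        System.out.println(find(nonNegative, squareNotSmaller, -100, 100));
        System.out.println(find(nonNegative, squareGreater, -100, 100));
    }
}
```

An empty result only says that no counterexample exists in the enumerated range, which is exactly the limitation of testing that the text attributes to dynamic analyses.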

2.2 What is symbolic execution?

If you do not know what “symbolic execution” is, then you may have a look at the corresponding Wikipedia article or at some textbook. But if you are really impatient, here is a very short tutorial.

To explain what symbolic execution is we can consider that symbolic execution is to testing what symbolic equation solving is to numeric equation solving. Let us consider, for instance, the equation $x^2 - 2x + 1 = 0$, of which we are asked to find the real solutions. This second-degree equation is numeric, meaning that all its coefficients are numbers. According to the value of the discriminant $\Delta$ the equation can have two real solutions (this happens when $\Delta > 0$), one real solution (when $\Delta = 0$) or no real solution (when $\Delta < 0$). In this case the equation has one real solution, since $\Delta = (-2)^2 - 4 \cdot 1 \cdot 1 = 4 - 4 = 0$.

Conversely, the equation $x^2 - bx + 1 = 0$ is symbolic, because one of its coefficients, $b$, is not a number but a symbol, standing for an unknown numeric value ranging in a (possibly infinite) set of admissible values. If we assume that this set is the set of all the real numbers, then the discriminant of the second equation is $\Delta = b^2 - 4$, for any real value of $b$. As with numeric equations, to determine the solutions of the symbolic equation we need to split cases based on the sign of the discriminant. But differently from our first example, where exactly one case holds, symbolic equation solving may require following more than one of them. Depending on the possible values of $b$ our example symbolic equation may fall in one of three cases: If $|b| > 2$ the discriminant is greater than zero and the equation has two real solutions. If $b = 2$ or $b = -2$ the discriminant is zero and the equation has one real solution. Finally, if $-2 < b < 2$, the discriminant is less than zero and the equation has no real solutions. Since all three subsets for $b$ are nonempty, any of the three cases may hold.
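The case split on the sign of the discriminant can be sketched in Java as follows. This is only an illustration for the specific equation $x^2 - bx + 1 = 0$: the class and method names are made up for the example, and the exact comparison of a floating-point discriminant with zero is a simplification that works for the values of $b$ used here.

```java
final class QuadraticCases {

    /** Real solutions of x^2 - b*x + 1 = 0, in increasing order. */
    static double[] realRoots(double b) {
        double delta = b * b - 4.0; // discriminant of x^2 - b*x + 1
        if (delta > 0) {
            // |b| > 2: two real solutions
            double s = Math.sqrt(delta);
            return new double[] { (b - s) / 2.0, (b + s) / 2.0 };
        } else if (delta == 0) {
            // b == 2 or b == -2: one real solution
            return new double[] { b / 2.0 };
        } else {
            // -2 < b < 2: no real solution
            return new double[0];
        }
    }

    public static void main(String[] args) {
        for (double b : new double[] { 3.0, 2.0, -2.0, 0.0 }) {
            System.out.println("b = " + b + ": " + realRoots(b).length + " real solution(s)");
        }
    }
}
```

For a concrete $b$ exactly one branch is taken. Solving the equation for a symbolic $b$ amounts to following all three branches and recording, for each, the condition on $b$ under which it is taken.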
As a consequence, the solution of a symbolic equation is usually expressed as a set of summaries. A summary associates a condition on the symbolic parameters with a corresponding possible result of the equation, where the result can be a number or an expression in the symbols. For our running example the solution produces as summaries $|b| > 2 \rightarrow x = (b + \sqrt{b^2 - 4})/2$, $|b| > 2 \rightarrow x = (b - \sqrt{b^2 - 4})/2$, $b = 2 \rightarrow x = 1$, and $b = -2 \rightarrow x = -1$. Note that summaries overlap where a combination of parameter values ($|b| > 2$ in the previous case) yields multiple results, and that the union of the summaries does not span the whole domain for $b$, because some values for $b$ yield no result.

Symbolic execution is a program analysis technique that is based on executing a program with input values that may be symbols standing for sets of possible numeric (concrete) values. Consider for example the following Java program:

    package smalldemos.ifx;

    public class IfExample {
        boolean a, b;

        public void m(int x) {
            if (x > 0) {
                a = true;
            } else {
                a = false;
            }
            if (x > 0) {
                b = true;
            } else {
                b = false;
            }
            assert a == b;
        }
    }

This program is the customary “double-if” example that is often used to illustrate how symbolic execution works. It is a sequence of two if statements with the same condition, where the variables involved in the condition are not modified throughout the program. This ensures that every execution of the program will execute either both the then branches or both the else branches, never a then branch and an else branch. The final assertion requires, for the program to be correct, that the variables a and b have the same final value, a fact that trivially holds for all the possible executions. Let us test the method m with input, say, x == 3:

• The JVM first evaluates the branch condition x > 0 of the first if statement. Since x == 3, this yields (3 > 0) == true: Thus the then branch of the first if statement is selected for execution and a is set to true. Then the execution continues with the second if statement.

• The JVM evaluates the branch condition of the second if statement, that is again x > 0. Since the value of x is still 3, the then branch of the second if statement is selected for execution and b is set to true. Then the execution continues with the assert statement.

• The JVM evaluates the condition a == b of the assert statement. Since both a and b are set to true, the condition holds and the method terminates correctly.
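The concrete run just described can be reproduced with a small driver. The sketch below copies the IfExample class into a nested class so that the example is self-contained, and replaces the assert statement with an explicit check, because Java assertions are disabled unless the JVM is started with the -ea option. The name ConcreteRun is an assumption made for this example.

```java
final class ConcreteRun {

    /** Copy of the IfExample class from the text, with an explicit check in place of the assert statement. */
    static final class IfExample {
        boolean a, b;

        void m(int x) {
            if (x > 0) {
                a = true;
            } else {
                a = false;
            }
            if (x > 0) {
                b = true;
            } else {
                b = false;
            }
            if (a != b) {
                throw new AssertionError("a != b");
            }
        }
    }

    public static void main(String[] args) {
        IfExample e = new IfExample();
        e.m(3); // both then branches are executed
        System.out.println("a=" + e.a + ", b=" + e.b); // prints "a=true, b=true"
    }
}
```

A single run like this one exercises exactly one path through the method. Covering the else branches requires a second test with a non-positive input.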

Now let us perform symbolic execution of the same method m with a symbolic value, say $x_0$, for its input x. We do not make any assumption on what the value of $x_0$ might be: It could stand for any possible int value. This is how JBSE executes the method:

• JBSE evaluates the branch condition x > 0 of the first if statement. Since x == $x_0$, and no assumption is made on the concrete value $x_0$ stands for, JBSE cannot determine which statement must be executed next. Therefore JBSE does what we did in the case of the quadratic equation with symbolic coefficients: It splits cases.

• First, JBSE assumes that the branch condition x > 0 evaluates to true. Since x == $x_0$, this happens when $x_0 > 0$.

  – In this case, JBSE selects for execution the then branch of the first if statement, a is set to true, and the execution continues with the second if statement.

  – JBSE then evaluates the second branch condition. Since it has previously assumed that $x_0 > 0$, the second branch condition always evaluates to true. JBSE selects the then branch of the second if statement, b is set to true, and the execution continues with the assert statement.

  – JBSE evaluates the condition a == b of the assert statement. Again, a and b are set to true, the condition holds and the method execution terminates correctly.

• Once it has finished analyzing the case $x_0 > 0$, JBSE backtracks, i.e., it restores the state of the execution where the next statement to be executed is the first if statement, and considers the opposite case, i.e., the case where the branch condition x > 0 evaluates to false. Since in the backtracked state it is again x == $x_0$, this happens when $x_0 \leq 0$.

  – Now the else branch of the first if statement is followed and a is set to false.

  – The execution continues with the second if statement, and since JBSE has now assumed that $x_0 \leq 0$ it will evaluate the second branch condition to false. The else branch of the second if statement is followed and b is set to false.

  – Finally, JBSE executes the assert statement. Since a and b are both set to false, the assertion condition holds and the method execution terminates correctly.

This example shows you that symbolic execution is not much different from ordinary (also called concrete) execution of programs. The main difference is that, at some point of a symbolic execution, the presence of symbolic values might make it unclear what to do next (which branch of the next if statement should I follow? Should I do another iteration of the while statement I am in or should I exit the loop?). In this case a symbolic executor must introduce an assumption on the possible values of the symbolic inputs so that the next action is unambiguously identified. The sequence of all the assumptions introduced during the execution is called the path condition, because it is determined by the path followed by the execution through the branches in the control flow graph of the program. All the values for the input symbols that satisfy (i.e., solve) a path condition drive the execution of the program through the path that generated the path condition. For instance, all the input values $x_0$ for the “double-if” program that satisfy the condition $x_0 \leq 0$ drive the program execution through the else branches of the two if statements. Conversely, if a path condition has no solution, then no program inputs drive the program through the corresponding path, and the path is said to be infeasible. For instance, the path in the “double-if” program that goes through the then branch of the first if statement and the else branch of the second if statement has path condition $x_0 > 0 \wedge x_0 \leq 0$, which clearly has no solution. Correspondingly, no input exists that drives the program through this path.
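The notions of path condition and infeasible path can be made concrete with a toy enumeration of the four branch combinations of the “double-if” program. The sketch below is not how JBSE is implemented: a real symbolic executor decides whether a path condition has a solution with a decision procedure, while here satisfiability is approximated by looking for a witness in the small range [-2, 2], which suffices only because every constraint compares $x_0$ with zero. All class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class DoubleIfPaths {

    /** One path through the two if statements, identified by its two branch decisions. */
    static final class Path {
        final boolean firstThen;
        final boolean secondThen;

        Path(boolean firstThen, boolean secondThen) {
            this.firstThen = firstThen;
            this.secondThen = secondThen;
        }

        /** Path condition: the conjunction of the assumptions made at the two branches. */
        IntPredicate condition() {
            return branch(firstThen).and(branch(secondThen));
        }

        /** On this path a == firstThen and b == secondThen, so the assertion a == b holds iff they agree. */
        boolean assertionHolds() {
            return firstThen == secondThen;
        }

        String describe() {
            return label(firstThen) + " && " + label(secondThen);
        }
    }

    /** Assumption on x0 introduced when the condition x > 0 is taken as true (then) or false (else). */
    static IntPredicate branch(boolean then) {
        if (then) {
            return x0 -> x0 > 0;
        }
        return x0 -> x0 <= 0;
    }

    static String label(boolean then) {
        return then ? "x0 > 0" : "x0 <= 0";
    }

    /** Toy satisfiability check: looks for a witness of the path condition in [-2, 2]. */
    static boolean feasible(Path p) {
        IntPredicate pc = p.condition();
        for (int x0 = -2; x0 <= 2; x0++) {
            if (pc.test(x0)) {
                return true;
            }
        }
        return false;
    }

    /** Path conditions of the feasible paths, in the order then/then, then/else, else/then, else/else. */
    static List<String> feasiblePaths() {
        List<String> result = new ArrayList<>();
        boolean[] choices = { true, false };
        for (boolean first : choices) {
            for (boolean second : choices) {
                Path p = new Path(first, second);
                if (feasible(p)) {
                    result.add(p.describe());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(feasiblePaths());
    }
}
```

Only the then/then and else/else combinations survive the feasibility check, and on both of them the assertion holds. The two mixed combinations are the paths whose condition $x_0 > 0 \wedge x_0 \leq 0$ has no solution.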
If a program is deterministic, i.e., it always does the same things when fed with the same inputs, then each of its possible concrete executions yields a linear sequence of states, where each state has exactly one successor. The sequence corresponds to a single path in the control flow graph of the program. Conversely, its possible symbolic executions yield a symbolic execution tree, rooted at the initial symbolic state and branching whenever a symbolic state has more than one successor because of case splitting. Figure 2.1 reports the symbolic execution tree for the “double-if” program. Circles are program states, indicating the values stored for all the variables in the program. Arrows join a state with its possible successors, and are labeled according to the next statement to be executed: If this is an assignment, the label reports the assignment; if it is a conditional, the label reports the evaluation of the conditional in the pre-state. The final states that pass the assertion are represented by a green tick. The figure does not show the infeasible paths, but we will often consider the case of symbolic execution trees where all the paths through the control flow graph, be they infeasible or not, are reported. We will call static a symbolic execution tree that reports all the static paths (either feasible or infeasible) through the program, and by contrast we will call dynamic a symbolic execution tree that reports only feasible paths. Figure 2.2, for instance, reports the static symbolic execution tree of the “double-if” example program. The red cross signifies a final state that does not pass the assertion. The path condition for a certain path is obtained by visiting the symbolic execution tree from the root through the path, and conjoining all the edge labels for the evaluations of conditional expressions.
In Figure 2.3 the path marked with the red dashed arrow has as path condition the logical “and” of the two expressions surrounded by a red circle, i.e., $x_0 > 0 \wedge x_0 \leq 0$. Since the path condition is unsatisfiable, the path is infeasible.

The “double-if” program has a finite symbolic execution tree, but this is not the case in general. If the program has loops the static symbolic execution tree is infinite, and most likely the dynamic symbolic execution tree is infinite as well. Consider for instance the following program:

    package smalldemos.loop;

    public class LoopExample {
        public void m(int n) {
            while (n > 0) {
                --n;
            }
            assert n <= 0;
        }
    }

[Figure 2.1: Symbolic execution tree for the “double-if” program.]

[Figure 2.2: Static symbolic execution tree for the “double-if” program.]

[Figure 2.3: A path in the “double-if” program and its path condition.]

The static symbolic execution tree of this loop program, reported in Figure 2.4, is clearly infinite. If a program may diverge, i.e., it has at least one (concrete) execution that does not terminate, then this execution is infinite, and correspondingly there is an infinite path in the static symbolic execution tree for it. Note however that the converse does not hold in general: If the static symbolic execution tree has an infinite path, this does not necessarily imply that the program may diverge. The example loop program illustrates that: Its static symbolic execution tree has one infinite path, highlighted in Figure 2.5 with a red dashed arrow, but since we can easily prove that the program always terminates, the path is infeasible. Note also that it is not possible to exclude this path from the tree without excluding some feasible paths: In other words, it is not possible to build a dynamic symbolic execution tree that exactly contains all the feasible paths.

To summarize, the symbolic execution of a program with loops may not terminate, as the symbolic executor may get stuck analyzing an infinite path, or an infinite set of finite paths. For this reason symbolic executors allow users to set an analysis budget (maximum time, maximum depth), and when they exhaust the budget they abort the analysis. Consider however that, although a symbolic executor is able in practice to analyze only a finite subset of the possible symbolic paths of a program, each symbolic path stands for a potentially infinite set of concrete paths. For this reason symbolic execution is a more powerful program analysis technique than testing.
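The effect of an analysis budget on the loop program can be sketched as follows. The code does not use JBSE: it writes down by hand the path conditions of the paths that leave the loop after $k$ iterations, for $k$ up to a maximum number of iterations that plays the role of the budget, and reports the condition of the prefix that is abandoned when the budget runs out. The symbol n0 is treated as a mathematical integer, and the names LoopBudgetSketch, iterations and explore are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class LoopBudgetSketch {

    /** Concrete version of LoopExample.m that returns the number of loop iterations it performed. */
    static int iterations(int n) {
        int count = 0;
        while (n > 0) {
            --n;
            ++count;
        }
        if (n > 0) {
            throw new AssertionError("n > 0 after the loop");
        }
        return count;
    }

    /**
     * Lists the path conditions of the paths that leave the loop after at most
     * maxIterations iterations, followed by the condition of the abandoned prefix.
     */
    static List<String> explore(int maxIterations) {
        List<String> conditions = new ArrayList<>();
        for (int k = 0; k <= maxIterations; k++) {
            // After k iterations n == n0 - k. Leaving the loop now requires n0 - k <= 0,
            // while entering the k-th iteration required n0 - (k - 1) > 0.
            conditions.add(k == 0 ? "n0 <= 0" : "n0 > " + (k - 1) + " && n0 <= " + k);
        }
        conditions.add("n0 > " + maxIterations + " (budget exhausted)");
        return conditions;
    }

    public static void main(String[] args) {
        for (String condition : explore(3)) {
            System.out.println(condition);
        }
    }
}
```

Every condition except the last one describes a finite path on which the final assertion holds, since n0 <= k implies n0 - k <= 0. The last entry is the part of the input space that the bounded analysis says nothing about.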

2.3 Symbolic execution with objects as inputs

When the inputs to a program are numeric, this is pretty much what one needs to know about symbolic execution. Things become more complex when one allows programs to take objects as inputs, as customarily happens with the Java programming language:

    package smalldemos.node;

    public class NodeExample {
        public void m(Node node) {
            node.swap();
        }
    }

What if we symbolically execute the m method? As usual the value of the parameter variable node is unknown at the beginning of the execution, and the variable is initialized with a symbol $node_0$ standing for the unknown value. This symbol is a symbolic reference, and in the absence of assumptions it may stand for any possible reference, either valid or invalid (null), to the heap memory at the initial state of the execution.

Now what if the class Node is abstract and has $N$ concrete subclasses, each implementing a different version of the swap method? When JBSE arrives at the node.swap() statement, it must split cases to determine what is the next statement to be executed. The possible cases are, at least, $N + 1$: One case where node == null and the next statement will be the catch block, if present, for the NullPointerException that the execution of the node.swap() statement raises, plus the $N$ cases where $node_0$ is a reference to an object of each of the different concrete subclasses of Node, and the next statement will be the first statement of the implementation of swap() in the assumed class. This differs from the case where only numeric symbolic values are present, where the number of possible cases at a branch is at most two.

The situation is actually worse than this, and symbolic execution typically needs to consider many more subcases than $N + 1$. How many? The answer to this question is found in this paper, which introduces a technique, called “lazy initialization”, that is the one used by JBSE to determine which cases need to be analyzed when using a symbolic reference. According to the lazy initialization technique symbolic execution needs to consider the following cases:

• The symbolic reference may be null, or

• The symbolic reference may be a reference to a fresh type-compatible object, for all $N$ compatible types, or

• The symbolic reference may be a reference to a non-fresh type-compatible object, where by “non-fresh” we mean that the object was assumed to exist initially by a lazy initialization step earlier during the execution, for all $K$ such objects.

Now some terminology. We will say that a symbolic reference on which symbolic execution has made no assumption is unresolved, that a symbolic reference that is assumed to be null is resolved by null, that a symbolic reference that is assumed to refer to a fresh object is resolved by expansion, and that a symbolic reference that is assumed to refer to a non-fresh object is resolved by alias.

[Figure 2.4: Static symbolic execution tree for the loop program.]

[Figure 2.5: Infinite path in the static symbolic execution tree for the loop program.]

To clarify how lazy initialization works we will now consider the following example program, which scans a list of integers and returns the sum of the stored values:

    package esecfse2013;

    public class Target {
        int sum(List<Integer> list) {
            int tot = 0;
            for (int item : list) {
                tot += item;
            }
            return tot;
        }
    }

Let us suppose that List is an abstract class or interface whose only concrete subclass is a LinkedList class defined as follows:

    public class LinkedList<I> {
        private Node<I> head;

        private static class Node<I> {
            private I value;
            private Node<I> next;
            ...
        }
        ...
    }

Back to the Target.sum() method: To scan the input list the for loop must first access list itself, then list.head, then list.head.next, then list.head.next.next... and so on, until the list termination criterion is met (for LinkedList data structures we will consider the case where they are null-terminated). Symbolic execution of Target.sum() will initially need to determine whether the symbolic reference stored in list, say $l_0$, points to an object or not. The following cases may hold:

1. Either $l_0$ == null,

2. Or $l_0$ != null, i.e., it refers to some object of class LinkedList.

No more cases need to be considered, since List has only one concrete subclass. In case 1, the method raises a NullPointerException. In case 2, the method starts iterating through the nodes of the list, and accesses list.head. Since no assumption is made on what the fields of the list object store, JBSE assumes that list.head stores an unresolved symbolic reference, say $n_0$. Because of this access two subcases arise:

2. $l_0$ != null:

   1. Either $n_0$ == null,

   2. Or $n_0$ != null, i.e., it refers to some object of class LinkedList.Node.

In case 2.1 (empty list) the method stops iterating and returns the value of the tot variable, i.e., its initialization value 0. In case 2.2 the method adds to tot the content of the value field of the $n_0$ object. But we did not make any assumption about the $n_0$ object itself except that it exists, so what should its value field contain? Lacking any assumption we can only say that this field contains a value constrained by its static type, thus an int value, and represent this unknown value with a symbol, say, $v_0$ [1]. Then the method performs another iteration of the loop body by accessing list.head.next, which stores another symbolic reference $n_1$. This time three cases may arise:

2. $l_0$ != null:

   2. $n_0$ != null:

      1. Either $n_1$ == null,

      2. Or $n_1$ == $n_0$,

      3. Or $n_1$ refers to some object of class LinkedList.Node different from $n_0$.

In case 2.2.1 (list with one element) the method stops iterating and returns the value of tot, that is, $v_0$. In case 2.2.2 ($n_1$ is non-fresh) the method will diverge by iterating indefinitely through list.head.next.next == list.head.next.next.next == ... == $n_0$, never returning to the caller. In case 2.2.3 ($n_1$ fresh) the method will add list.head.next.value (say, $v_1$) to tot, and iterate the loop body once again by accessing list.head.next.next, which stores yet another symbolic reference $n_2$. This access yields four subcases:

2. $l_0$ != null:

   2. $n_0$ != null:

      3. $n_1$ != null and fresh:

         1. Either $n_2$ == null,

         2. Or $n_2$ == $n_0$,

         3. Or $n_2$ == $n_1$,

         4. Or $n_2$ refers to some object of class LinkedList.Node different from $n_0$ and $n_1$.

Case 2.2.3.1 is the case of a list with exactly two elements: The method stops iterating and returns $v_0 + v_1$. Cases 2.2.3.2 and 2.2.3.3 are similar to case 2.2.2: The $n_2$ symbolic reference is non-fresh, and the method diverges by iterating indefinitely through the chain of ...next... references. Case 2.2.3.4 is similar to case 2.2.3: The $n_2$ symbolic reference is fresh, and the method adds the content of the value field (say, $v_2$) of the fresh object to tot and iterates the loop once more by accessing list.head.next.next.next.

Figure 2.6 represents the portion of the symbolic execution tree we discussed until now. Unresolved symbolic references are depicted as arrows pointing to question marks, and resolved symbolic references are depicted as arrows pointing either to null (if the reference is resolved by null) or to a heap object represented as a box (if the reference is resolved by alias or expansion). It can be easily inferred that, whenever an $n_i$ symbolic reference, obtained by accessing a list.head.next.next...next sequence, is used during symbolic execution, up to $i + 2$ cases must be considered: The case where $n_i$ == null, the $i$ cases where $n_i$ is equal to (i.e., aliases) $n_0, n_1, \ldots, n_{i-1}$, and the case where $n_i$ points to a fresh object, different from the objects pointed to by $n_0, n_1, \ldots, n_{i-1}$. If we impose a bound on the number of possible objects with class LinkedList.Node, say not more than $W$, the symbolic execution tree will have $1 + 1 + 2 + 3 + \ldots + W = 1 + \sum_{i=1}^{W} i = 1 + W(W + 1)/2 \in O(W^2)$ paths.

Note, however, that not all these paths are relevant to the analysis of the behaviour of the Target.sum() method. When analyzing a piece of code we usually make the implicit assumption that its inputs must be well-formed.
In the case of null-terminated linked lists, all the arrangements of list nodes that contain loops are ill-formed: If such “garbage” lists can be produced, this is due to a bug in the implementation of the LinkedList class, not of the sum method, and usually we are not interested in how the code under analysis behaves when fed with garbage inputs: We want to assume that the input lists are correctly structured. In other words, while lazy initialization performs an exhaustive analysis of all the possible arrangements of the objects in the input heap, not all these arrangements are relevant to the analysis of the target code. Discarding them would significantly increase the speed and precision of the analysis, possibly ruling out spurious errors and diverging paths.

In our example, JBSE should discard all the cases where the $n_i$ symbolic references are resolved by alias, retaining only the resolutions by null or by expansion. Were JBSE able to do that, the resulting symbolic execution tree would have the shape depicted in Figure 2.7. If we bound the maximum number of LinkedList.Node objects to be at most $W$, the total number of paths becomes $1 + 1 + \ldots + 1 = 1 + \sum_{i=1}^{W} 1 = 1 + W \in O(W)$, much less than without filtering out the irrelevant traces. Moreover, JBSE would discard all the diverging (infinite) paths except the one marked by a red dashed arrow in Figure 2.7, which is infeasible because it requires the presence of an infinite number of nodes in the heap memory, and the heap memory of a program always contains a finite, albeit unbounded, number of objects. This allows us to conclude that, when fed with well-formed linked lists, the Target.sum() method always converges and returns the sum of the values stored in the list.

This example shows that excluding irrelevant inputs from the symbolic analysis of a program is of paramount importance, both to make the analysis feasible within the typically limited computational resources available (time, memory), and to exclude spurious analysis results from its output. JBSE implements a number of techniques that empower its users by allowing them to specify rich classes of assumptions on the shape of the input heap objects. We will discuss them in detail in the later chapters of this manual.

[Figure 2.6: Symbolic execution tree for the list scanning program.]

[1] Actually, because of boxing and type erasure, the value field has type Object, thus $v_0$ should be a symbolic reference that might potentially point to any Object. To simplify the presentation we will suppose that it is an int symbolic value instead. Later in this book we will define techniques that allow to constrain list to have only elements with type Integer.
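The two path counts of this section, with and without resolution by alias, can be checked with a short sketch. The code below is an illustration and not JBSE code: it lists the resolutions considered for the reference $n_i$ and counts the finished paths under the assumption that the bound $W$ cuts the exploration after the references $n_0, \ldots, n_{W-1}$ have been resolved, which is the reading that reproduces the sums in the text. The names LazyInitCount, resolutions and paths are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyInitCount {

    /** Cases for the i-th node reference n_i: null, an alias of n_0..n_{i-1}, or a fresh node. */
    static List<String> resolutions(int i, boolean allowAlias) {
        List<String> cases = new ArrayList<>();
        cases.add("n" + i + " == null");            // resolved by null
        if (allowAlias) {
            for (int j = 0; j < i; j++) {
                cases.add("n" + i + " == n" + j);   // resolved by alias
            }
        }
        cases.add("n" + i + " fresh");              // resolved by expansion
        return cases;
    }

    /**
     * Number of finished paths when the references n_0..n_{maxNodes-1} are resolved.
     * The fresh case does not end a path: it continues with the next reference.
     */
    static int paths(int maxNodes, boolean allowAlias) {
        int total = 1; // the path where the list reference itself is null
        for (int i = 0; i < maxNodes; i++) {
            total += resolutions(i, allowAlias).size() - 1; // every case but "fresh" ends a path
        }
        return total;
    }

    public static void main(String[] args) {
        int w = 3;
        System.out.println(resolutions(2, true));
        System.out.println("with aliases:    " + paths(w, true));
        System.out.println("without aliases: " + paths(w, false));
    }
}
```

With $W = 3$ the count is $1 + 3 \cdot 4 / 2 = 7$ paths when aliases are allowed and $1 + 3 = 4$ when only well-formed lists are considered, matching the quadratic and linear growth discussed above.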


[Figure placeholder. Leaves of the tree, in order: Throws NullPointerException; Returns 0; Returns v0; Returns v0 + v1; ...]

Figure 2.7: Symbolic execution tree for the list scanning program (only well-formed lists).

CHAPTER 3

Getting started with JBSE

In this section you will start to get your feet wet with JBSE. We will show its basic usage by analyzing a number of simple Java programs, including the ones presented in the Introduction. But first, let us discuss how you may obtain it.

3.1 Obtaining and installing JBSE

JBSE is an open-source project distributed according to the terms of the GNU General Public License version 3.0. Its source code is available at its Github repository https://github.com/pietrobraione/jbse. Right now there are no formal releases of JBSE, so it must be installed by building it from source. Follow the instructions in the README.md file at the repository to build and deploy JBSE.

3.2 A basic example

JBSE is, first and foremost, a Java library. Compiling the JBSE source code will yield a jar file that you need to link against a main Java program using it. No class in the JBSE jar file contains a public static void main(String[]) method that you can invoke from the command line. To start using JBSE we advise installing the latest Eclipse IDE and importing the JBSE project in an empty workspace by following the instructions in the README.md file at the JBSE repository. Then, create a new project with name example in the same Eclipse workspace where JBSE resides, and set its project dependencies to include the JBSE project. Add to the example project this class:

    package smalldemos.ifx;

    import static jbse.meta.Analysis.ass3rt;

    public class IfExample {
        boolean a, b;

        public void m(int x) {
            if (x > 0) {
                a = true;
            } else {
                a = false;
            }
            if (x > 0) {
                b = true;
            } else {
                b = false;
            }
            ass3rt(a == b);
        }
    }

This is the “double-if” example presented in the Introduction, the only difference being that, instead of using the standard Java assert statement, we use a library method jbse.meta.Analysis.ass3rt that is more JBSE-friendly.

The easiest and most direct way to symbolically execute a Java method and obtain some feedback about the execution is to use the jbse.apps.run.Run class. This class demonstrates how one can build an application based on the JBSE library: Specifically, Run performs an exhaustive, possibly bounded symbolic execution of a prescribed Java method, by assigning symbolic values to all its parameters, including this if the method is not static. During the symbolic execution, it prints to the console information about the progress of the execution. The Run application is highly configurable in the format and degree of detail of what it shows. On one extreme, it can be instructed to dump the full JVM state after the execution of each single bytecode. On the other, it is possible to make it emit just some end-of-execution statistics, and nothing else.

Currently the Run application cannot be invoked from the command line: We need to write a main method that configures and creates an object of class jbse.apps.run.Run, and then uses it to perform symbolic execution. Due to the high number of configuration parameters available for Run objects, configurations are encapsulated in suitable objects of class jbse.apps.run.RunParameters. We therefore need to build a RunParameters object, use it to specify a set of symbolic execution and output parameters, and pass it as an argument to the constructor of a Run object. Finally, we start symbolic execution by invoking Run.run(). Do this by creating this Java class in the example project:

    package smalldemos.ifx;

    import jbse.apps.run.RunParameters;
    import jbse.apps.run.Run;
    ...

    public class RunIf {
        public static void main(String[] args) {
            final RunParameters p = new RunParameters();
            set(p);
            final Run r = new Run(p);
            r.run();
        }

        private static void set(RunParameters p) {
            ...
        }
    }

Which parameters should we set, and how? First, we need to tell JBSE where the program to execute is. Since JBSE is a Java Virtual Machine, we do this by specifying the classpath (more precisely, the so-called user classpath) where JBSE will look for the binaries (classfiles) of the program to be executed. The user classpath shall contain the path pointing to the target smalldemos.ifx.IfExample class, and is set with the addUserClasspath method of the RunParameters class. Note that the jbse.meta.Analysis class must also be in the classpath, since it contains the ass3rt method invoked by m. However, the path to it must be set by means of a different method in the RunParameters class, the setJBSELibPath method. As for the actual paths, Eclipse emits the compiled smalldemos.ifx.IfExample class to a hidden bin directory in the example project, while Gradle puts the compiled JBSE classfiles in a build/classes subdirectory, and the JBSE JAR files in a build/libs subdirectory. Given that the implicit execution directory will be the home of the example project, and supposing that the local clone of the jbse git repository is, say, at /home/me/git/jbse, the required paths should be as follows:

    ...
    public class RunIf {
        ...
        private static void set(RunParameters p) {
            p.addUserClasspath("./bin");
            p.setJBSELibPath("/home/me/git/jbse/build/libs/jbse-0.10.0-SNAPSHOT.jar");
            ...
        }
    }

The addUserClasspath method is varargs, so you can list as many paths as you want. Next, we must specify which method JBSE must run (remember, JBSE can symbolically execute any method). We do it by setting the method’s signature:

    ...
    public class RunIf {
        ...
        private static void set(RunParameters p) {
            p.addUserClasspath("./bin");
            p.setJBSELibPath("/home/me/git/jbse/build/libs/jbse-0.10.0-SNAPSHOT.jar");
            p.setMethodSignature("smalldemos/ifx/IfExample", "(I)V", "m");
            ...
        }
    }

A method signature has three parts: The binary name of the class that contains the method ("smalldemos/ifx/IfExample"), a method descriptor specifying the types of the method’s parameters and of its return value ("(I)V"), and finally the name of the method ("m"). You can use the javap command, included with every JDK setup, to obtain the internal format signatures of methods: javap -s my.Class prints the list of all the methods in my.Class with their signatures in internal format.

Another essential parameter is the specification of which decision procedure JBSE must interface with in order to detect unfeasible paths. Without a decision procedure JBSE conservatively assumes that all paths are feasible. This is undesirable, since it would lead one to conclude, for instance, that every assertion you put in your code can be violated. Supposing that you want to use Z3 and that the Z3 binary is located, e.g., at /opt/local/bin/z3, you need to configure the RunParameters object as follows:

    ...
    import static jbse.apps.run.RunParameters.DecisionProcedureType.Z3;

    public class RunIf {
        ...
        private static void set(RunParameters p) {
            p.addUserClasspath("./bin");
            p.setJBSELibPath("/home/me/git/jbse/build/libs/jbse-0.10.0-SNAPSHOT.jar");
            p.setMethodSignature("smalldemos/ifx/IfExample", "(I)V", "m");
            p.setDecisionProcedureType(Z3);
            p.setExternalDecisionProcedurePath("/opt/local/bin/z3");
            ...
        }
    }

Now that we have set the parameters that allow the target code to be symbolically executed, we turn our attention to the parameters that customize the output. First, we ask JBSE to put a copy of the output in a dump file for offline inspection. For this purpose, create an out folder in the example project and add the following line to the set(RunParameters) method:

    ...
    public class RunIf {
        ...
        private static void set(RunParameters p) {
            p.addUserClasspath("./bin");
            p.setJBSELibPath("/home/me/git/jbse/build/libs/jbse-0.10.0-SNAPSHOT.jar");
            p.setMethodSignature("smalldemos/ifx/IfExample", "(I)V", "m");
            p.setDecisionProcedureType(Z3);
            p.setExternalDecisionProcedurePath("/opt/local/bin/z3");
            p.setOutputFileName("./out/runIf_z3.txt");
            ...
        }
    }

Next, we specify which parts of the symbolic execution Run shall display on the output. By default Run dumps the whole JVM symbolic state (path condition, stack, heap, static memory) after the execution of every single bytecode, which is a bit extreme, and slows down the execution considerably. We will therefore instruct the Run object to omit the unreachable objects and the standard library objects when printing a JVM symbolic state, and to omit some (scarcely interesting) path condition clauses. We will further reduce the amount of produced output by choosing to print only the leaves of the symbolic execution tree, i.e., the last states of all the execution paths.

    ...
    import static jbse.apps.run.RunParameters.StateFormatMode.TEXT;
    import static jbse.apps.run.RunParameters.StepShowMode.LEAVES;

    public class RunIf {
        ...
        private static void set(RunParameters p) {
            p.addUserClasspath("./bin");
            p.setJBSELibPath("/home/me/git/jbse/build/libs/jbse-0.10.0-SNAPSHOT.jar");
            p.setMethodSignature("smalldemos/ifx/IfExample", "(I)V", "m");
            p.setDecisionProcedureType(Z3);
            p.setExternalDecisionProcedurePath("/opt/local/bin/z3");
            p.setOutputFileName("./out/runIf_z3.txt");
            p.setStateFormatMode(TEXT);
            p.setStepShowMode(LEAVES);
        }
    }

Finally, run the RunIf class. The out/runIf_z3.txt file will contain something like this:

    This is the Java Bytecode Symbolic Executor's Run Tool (JBSE v.0.10.0-SNAPSHOT).
    Connecting to Z3 at /opt/local/bin/z3.
    Starting symbolic execution of method smalldemos/ifx/IfExample:(I)V:m at Tue Dec 01 18:30:22 CET 2020.
    .1.1[22]
    Leaf state
    Path condition:
        {R0} == Object[4471] (fresh) &&
        ({V0}) > (0)
        where:
        {R0} == {ROOT}:this &&
        {V0} == {ROOT}:x
    Static store: {

    }
    Heap: {
        Object[4471]: {
            Origin: {ROOT}:this
            Class: (2,smalldemos/ifx/IfExample)
            Field[0]: Name: a, Type: Z, Value: true (type: Z)
            Field[1]: Name: b, Type: Z, Value: true (type: Z)
        }

    }

    .1.1[22] path is safe.
    .1.2[20]
    Leaf state
    Path condition:
        {R0} == Object[4471] (fresh) &&
        ({V0}) <= (0)
        where:
        {R0} == {ROOT}:this &&
        {V0} == {ROOT}:x
    Static store: {

    }
    Heap: {
        Object[4471]: {
            Origin: {ROOT}:this
            Class: (2,smalldemos/ifx/IfExample)
            Field[0]: Name: a, Type: Z, Value: false (type: Z)
            Field[1]: Name: b, Type: Z, Value: false (type: Z)
        }

    }

    .1.2[20] path is safe.
    Symbolic execution finished at Tue Dec 01 18:30:24 CET 2020.
    Analyzed states: 637453, Analyzed pre-initial states: 637409, Analyzed paths: 2, Safe: 2, Unsafe: 0, Out of scope: 0, Violating assumptions: 0, Unmanageable: 0.
    Elapsed time: 1 sec 843 msec, Elapsed pre-initial phase time: 1 sec 806 msec, Average speed: 345877 states/sec, Average post-initial phase speed: 1189 states/sec, Elapsed time in decision procedure: 9 msec (0,49% of total).

Let’s analyze the output.

• {V0}, {V1}, {V2}... (primitives) and {R0}, {R1}, {R2}... (references) are the symbolic initial values of the program inputs. To track down which initial value a symbol corresponds to (what we call the symbol’s origin) you may read the Path condition: section of a final symbolic state. After the where: row you will find a sequence of equations that associate some of the symbols with their origins. The list is incomplete, but it contains the associations we care about. For instance you can see that {R0} == {ROOT}:this; {ROOT} is a moniker for the root frame, i.e., the invocation frame of the initial method m, and this indicates the “this” parameter. Overall, the equation means that the origin of {R0} is the instance of the IfExample class to which the m message is sent at the start of the symbolic execution. Similarly, {V0} == {ROOT}:x indicates that {V0} is the value of the x parameter of the initial m(x) invocation.

• .1.1[22] and .1.2[20] are the identifiers of the leaf symbolic states, i.e., the states that return from the initial m invocation to the (unknown) caller. The state identifiers follow the structure of the symbolic execution. The initial state always has identifier .1[0], and its immediate successors have identifiers .1[1], .1[2], etc. until JBSE must take some decision involving symbolic values. In this example, JBSE takes the first decision when it hits the first if (x > 0) statement. Since at that point of the execution x still has value {V0} and JBSE has not yet made any assumption on the possible value of {V0}, two outcomes are possible: Either {V0} > 0, and the execution takes the “then” branch, or {V0} <= 0, and the execution takes the “else” branch. JBSE therefore produces two successor states, gives them the identifiers .1.1[0] and .1.2[0], and adds the assumptions {V0} > 0 and {V0} <= 0 to their respective path conditions. When the execution along the .1.1 path hits the second if statement, JBSE detects that the execution cannot take the “else” branch (otherwise, the path condition would be {V0} > 0 && {V0} <= 0 ..., which has no solution for any value of {V0}) and does not create another branch. Similarly for the .1.2 path.

• The two leaf states can be used to extract summaries for m. A summary is extracted from the path condition and the values of the variables and object fields at a leaf state. In our example from the .1.1[22] leaf we can extrapolate that {V0} > 0 => {R0}.a == true && {R0}.b == true, and from .1.2[20] that {V0} <= 0 => {R0}.a == false && {R0}.b == false. This proves that for every possible value of the x parameter the execution of m always satisfies the assertion.

• Beware! The dump shows the final, not the initial state of the symbolic execution. For example, while Object[4471] is the initial this object, as stated by the path condition clause {R0} == Object[4471], the values of its fields displayed at states .1.1[22] and .1.2[20] are the final, not the initial, ones. The initial, symbolic values of these fields are lost because the code under analysis never uses them. If you want to display all the details of the initial state, suitable step show modes exist.

• The last rows report some statistics. Here we are interested in the total number of paths (two paths, as discussed above), the number of safe paths, i.e., the paths that pass all the assertions (also two as expected), and the number of unsafe paths, which falsify some assertion (zero as expected). The dump also reports the total number of paths that violate an assumption (zero in this case, see the next subsection for a discussion of assumptions), and the total number of unmanageable paths. These are the paths that JBSE is not able to execute up to their leaves because of some limitation of JBSE itself.
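The two summaries can also be sanity-checked on concrete inputs. The following is a minimal sketch in plain Java that does not use JBSE at all: IfExampleConcrete and summaryHolds are hypothetical names introduced only for illustration, and the ass3rt call is replaced by a boolean return value so that the class runs without the JBSE jar. Unlike symbolic execution, such a check covers only the sampled values of x.

```java
// Plain-Java copy of the example class (hypothetical name, no JBSE dependency).
final class IfExampleConcrete {
    boolean a, b;

    // Same control flow as IfExample.m; returns whether a == b holds at the end
    // instead of invoking ass3rt.
    boolean m(int x) {
        if (x > 0) { a = true; } else { a = false; }
        if (x > 0) { b = true; } else { b = false; }
        return a == b;
    }

    // Checks, for one concrete x, the summary of the matching leaf:
    //   x > 0  => a == true  && b == true   (leaf .1.1)
    //   x <= 0 => a == false && b == false  (leaf .1.2)
    static boolean summaryHolds(int x) {
        IfExampleConcrete o = new IfExampleConcrete();
        boolean assertionHolds = o.m(x);
        boolean expected = x > 0;
        return assertionHolds && o.a == expected && o.b == expected;
    }
}
```

Calling IfExampleConcrete.summaryHolds with a positive, a zero and a negative argument exercises both leaves of the symbolic execution tree.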

3.3 Assertions and assumptions

An area where JBSE stands apart from all the other symbolic executors is its support for specifying custom assumptions on the symbolic inputs. Assumptions are indispensable to express preconditions over the input parameters of a method, invariants of data structures, and in general to constrain the range of the possible values of the symbolic inputs, either to exclude meaningless values, or just to reduce the scope of the analysis. Let us reconsider our running example and suppose that the method m has a precondition stating that it cannot be invoked with a value for x that is less than or equal to zero. Stating that a method has a precondition usually implies that we are not interested in analyzing how the method behaves when its inputs violate the precondition. In other words, we want to assume that the inputs always satisfy the precondition, and analyze the behaviour of m under this assumption. The easiest way to introduce an assumption on the possible values of the x input is by injecting at the entry point of m a call to the jbse.meta.Analysis.assume method as follows:

    ...
    import static jbse.meta.Analysis.assume;

    public class IfExample {
        boolean a, b;

        public void m(int x) {
            assume(x > 0);
            if (x > 0) {
            ...
        }
    }

When JBSE hits a jbse.meta.Analysis.assume method invocation it evaluates its argument, then it either continues the execution of the trace (if true) or discards it and backtracks to the next trace (if false). With the above changes the last rows of the dump will be as follows:

    ...
    .1.2[4] path violates an assumption.
    Symbolic execution finished at Tue Dec 01 18:48:54 CET 2020.
    Analyzed states: 637445, Analyzed pre-initial states: 637409, Analyzed paths: 2, Safe: 1, Unsafe: 0, Out of scope: 0, Violating assumptions: 1, Unmanageable: 0.
    Elapsed time: 1 sec 794 msec, Elapsed pre-initial phase time: 1 sec 756 msec, Average speed: 355320 states/sec, Average post-initial phase speed: 947 states/sec, Elapsed time in decision procedure: 6 msec (0,33% of total).

The total number of traces is still two, but now JBSE reports that one of the traces violates an assumption. Putting the assume invocation at the entry of m ensures that the useless traces are discarded as soon as possible.

When one needs to constrain symbolic numeric inputs, using jbse.meta.Analysis.assume can be enough. When one needs to enforce assumptions on symbolic reference inputs, using jbse.meta.Analysis.assume is in most cases unsuitable. This is because jbse.meta.Analysis.assume evaluates its argument when it is invoked, which is fine for symbolic numeric inputs, but not in general for symbolic references, since JBSE resolves a reference as soon as it is used (more precisely, as soon as it is loaded on the operand stack). Let us consider, for example, the linked list example of the Introduction and let’s say we want to assume that the value stored in the fourth list item is different from 0. If we follow the previous pattern and inject at the method entry point the statement assume(list.header.next.next.next.value != 0), JBSE will first access {ROOT}:list, then {ROOT}:list.header, then {ROOT}:list.header.next, then {ROOT}:list.header.next.next and then {ROOT}:list.header.next.next.next. All these references are symbolic, and JBSE will resolve all of them, causing an early explosion of the total number of paths to be analyzed just to prune one of them.

A possible way to avoid the issue is to manually move the assume right after the point where {ROOT}:list.header.next.next.next.value is accessed for the first time, a procedure that is in general complex and error-prone. It would be much better if the symbolic executor could automatically detect the first access, should it ever happen, and prune the violating trace on-the-fly. Another issue is that often we want to express assumptions over arbitrarily big sets of symbolic references. If, for example, we would like to assume that all the list items are nonzero, we should have a way to constrain all the symbolic values {ROOT}:list.header.value, {ROOT}:list.header.next.value, {ROOT}:list.header.next.next.value... A similar problem arises if we want to specify the structural invariant stating that list shall have no loops. Expressing this kind of constraint by using Analysis.assume is impossible in many cases, and impractical in almost all the others.

For this reason JBSE allows specifying rich classes of assumptions on the shape of the input objects by means of a number of techniques: Conservative repOk methods, LICS rules, and triggers.

• Conservative repOk methods validate the shape of a data structure by traversing it without resolving the unresolved symbolic references in it. Every time JBSE resolves a symbolic reference, it executes the conservative repOk method on all the objects that have one. If the execution of a conservative repOk on an object detects that the object violates its structural invariant, the trace is rejected.

• LICS rules restrain the possible resolutions of (sets of) symbolic references specified by means of regular expressions. For instance, a rule {ROOT}:list/header(/next)* aliases nothing forbids resolution by alias of all the symbolic references with origins {ROOT}:list.header, {ROOT}:list.header.next, {ROOT}:list.header.next.next... thus excluding the presence of loops between list nodes.


• Triggers are user-defined instrumentation methods that JBSE executes right after the resolution of a symbolic reference matching a prescribed regular expression. Triggers can be used to update ghost variables, e.g., to update an object counter as a fresh object is assumed by the expansion of a symbolic reference. They can also be used to automatically detect when a symbolic reference is first used and, e.g., invoke jbse.meta.Analysis.assume without having to manually detect the points in the code where the reference is first used.
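To give a feeling of what a structural invariant such as "the list has no loops" looks like in code, here is a minimal sketch of an ordinary, concrete repOk method. It is only an illustration under stated assumptions: SimpleList, Node and header are hypothetical names that loosely mirror the linked list of the Introduction, and the method is not a conservative repOk, which would additionally have to avoid resolving unresolved symbolic references. How JBSE is told which methods are conservative repOk methods is not shown here.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical minimal singly linked list, used only for illustration.
final class SimpleList {
    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    Node header;

    // Concrete repOk: returns true iff following the next links from header
    // never visits the same node twice, i.e., the list has no loops.
    boolean repOk() {
        // Nodes are compared by identity, not by equals().
        Map<Node, Boolean> seen = new IdentityHashMap<>();
        for (Node n = header; n != null; n = n.next) {
            if (seen.put(n, Boolean.TRUE) != null) {
                return false; // n was already visited: loop detected
            }
        }
        return true;
    }
}
```

On a concrete heap this check terminates on every input, because each node is visited at most once before the traversal either reaches null or detects a repetition.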

CHAPTER 4

Using JBSE

This section will describe in detail how to use JBSE to symbolically execute Java bytecode programs and analyze them by means of assertions and assumptions.

4.1 The symbolic execution classes

The class that is responsible for performing symbolic execution of a program is jbse.jvm.Engine. It implements a symbolic JVM that can execute an arbitrary Java method in a step-by-step fashion, i.e., one bytecode at a time. When a symbolic state has more than one successor, the Engine selects one of them as the next state, and stores the others in an internal data structure. At the end of a path, the Engine can be backtracked to one of the previously stored states. The Engine class offers a low-level API that allows fine-grained control of symbolic execution, but is also quite tedious to use correctly. For this reason, JBSE offers the jbse.jvm.Runner class. A Runner object drives an Engine to perform an exhaustive symbolic execution of a method. To this end, the Runner repeatedly steps and backtracks the Engine until it visits all (or a suitable subset of) the states in the symbolic execution tree of the method under execution. Users can customize the behavior of a Runner object by registering a suitable listener object that provides a set of callback methods. The Runner will invoke the appropriate callback method upon occurrence of an event, e.g., before or after a bytecode step, at the entry of a method, upon backtrack, etc.

By injecting behavior through callbacks it is possible to create applications on top of a Runner object. The jbse.apps.run.Run class is an example of an application that is built this way. As shown in the previous chapter, this application symbolically executes a method and prints to the console information about the execution such as, e.g., the traversed states, their path conditions, the shape of the symbolic execution tree, the violations of the assertions, etc. We will introduce the capabilities of the Run application in a later section, while this section focuses on the use of the Engine and Runner classes.
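The step-and-backtrack scheme described above can be sketched with a toy model. The code below is not the JBSE API: ToyExplorer, State and Listener are hypothetical names, every state is assumed to have either zero or two successors, and no decision procedure prunes unfeasible branches. It only illustrates how a driver can continue with one successor, store the other, backtrack at the end of each path, and notify a listener through a callback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of exhaustive exploration by stepping and backtracking.
// All names are hypothetical; this is NOT the Engine/Runner API.
final class ToyExplorer {
    // Plays the role of the listener: one callback, invoked at each leaf.
    interface Listener {
        void atLeaf(String pathId);
    }

    // A toy state: a path identifier and the number of branching
    // decisions that still lie ahead on this path.
    static final class State {
        final String id;
        final int decisionsLeft;
        State(String id, int decisionsLeft) {
            this.id = id;
            this.decisionsLeft = decisionsLeft;
        }
    }

    // Visits every leaf of a complete binary tree with the given number
    // of decisions per path.
    static void explore(int decisions, Listener listener) {
        Deque<State> stored = new ArrayDeque<>(); // states to backtrack to
        State current = new State(".1", decisions);
        while (current != null) {
            if (current.decisionsLeft == 0) {
                listener.atLeaf(current.id);
                current = stored.poll(); // backtrack; null when none is left
            } else {
                // Two successors: store the second, continue with the first.
                stored.push(new State(current.id + ".2", current.decisionsLeft - 1));
                current = new State(current.id + ".1", current.decisionsLeft - 1);
            }
        }
    }

    // Convenience wrapper that collects the identifiers of all the leaves.
    static List<String> leaves(int decisions) {
        List<String> result = new ArrayList<>();
        explore(decisions, result::add);
        return result;
    }
}
```

With one decision the toy visits the paths .1.1 and .1.2, in this order. The real Engine differs in many respects, for instance it consults a decision procedure so that unfeasible successors are never created.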

4.2 Creating a symbolic executor

The creation of an Engine is performed by a suitable Builder object of class jbse.jvm.EngineBuilder. This object exposes a single method EngineBuilder.build(EngineParameters), that accepts as argument an object of class jbse.jvm.EngineParameters and returns a new Engine. The EngineParameters object gathers the (many) configuration parameters that must be provided to create an Engine. Similarly, a Runner must be created via a jbse.jvm.RunnerBuilder object by invoking its RunnerBuilder.build(RunnerParameters) method. The jbse.jvm.RunnerParameters class offers a superset of the methods of the EngineParameters class, i.e., all the methods to configure the parameters necessary to create the Engine underlying the Runner, plus the methods to configure the parameters specific to the Runner (e.g., the listener). We now describe how an EngineParameters or a RunnerParameters object is set to suitably configure an Engine or a Runner.

4.2.1 Setting the paths

An Engine (and similarly a Runner) must know the paths where it can find the classes of the application that it must symbolically execute, and of the libraries the application invokes, including the standard Java library. Mirroring what Oracle’s Hotspot JVM does, JBSE distinguishes the bootstrap classpath, where the core JDK classes reside, the extensions classpath, home of the classes that must be loaded with the extensions mechanism, and the user classpath, pointing to the classes that belong to the application to be run (the latter is what we usually mean when we talk about the “classpath of an application”). The bootstrap and the extensions classpath are, by default, automatically configured by the EngineParameters and RunnerParameters objects by detecting the bootstrap and the extensions classpath of the JDK on which JBSE executes. This means that in most cases you do not need to configure them manually. Should you ever need to override this default, you can set the bootstrap classpath by specifying the home directory of a Java 8 JDK setup via the EngineParameters.setJavaHome(String) or EngineParameters.setJavaHome(Path) methods. To modify the extensions classpath you may add paths to it by invoking EngineParameters.addExtClasspath(String...) or EngineParameters.addExtClasspath(Path...). If you want to completely override it, you need to first clear its default value by invoking EngineParameters.clearExtClasspath(). Finally, to add paths to the user classpath use either the EngineParameters.addUserClasspath(String...) or the EngineParameters.addUserClasspath(Path...) method (by default, the user classpath is empty). The RunnerParameters class offers identical methods.

4.2.2 Setting the target method

To specify which method the Engine must symbolically execute, invoke EngineParameters.setMethodSignature(String, String, String) with the signature of the method. The signature of the method is composed of three strings: The name of the class the method belongs to, the method descriptor (a string listing the types of the method parameters and of the return value), and the method’s name. The name of the class must be specified in binary name format: It must be fully qualified with the name of the package, and possibly of the other classes, that contain it. The separator dots must be replaced by slash (/) characters in case the container on the left of the dot is a package, and by a dollar sign ($) in case the container is a class. For example the binary name of the class java.util.LinkedList is java/util/LinkedList, and the binary name of the nested class java.util.LinkedList.Node is java/util/LinkedList$Node. The descriptor is necessary to distinguish overloaded methods. It is composed of a list of parameter types, enclosed in parentheses, followed by the type of the return value, or V if the method is declared void. The types of the parameters must be encoded as type descriptors according to Table 4.3-A of the Java Virtual Machine Specification.


Table 4.1: Type descriptors for Java types

    ===========  ==================
    Java type    Type descriptor
    ===========  ==================
    byte         B
    char         C
    double       D
    float        F
    int          I
    long         J
    ClassName    LClassBinaryName;
    JavaType[]   [TypeDescriptor
    ===========  ==================

Table 4.1 reports the correspondence between Java types and type descriptors for reference. As an example, if we want to symbolically execute a method double m(int, Object[][]) of class foo.Baz, we must configure the EngineParameters object by invoking setMethodSignature("foo/Baz", "(I[[Ljava/lang/Object;)D", "m").
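If you prefer not to write descriptors by hand, the standard library can compute them. The following sketch uses java.lang.invoke.MethodType, which belongs to the JDK and not to JBSE, to build the descriptor of the example above. DescriptorDemo, descriptor and slashName are hypothetical helper names introduced for illustration, and slashName is meant for non-array classes only.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper class; it relies only on the standard JDK API.
final class DescriptorDemo {
    // Returns the descriptor of a method with the given return type
    // and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    // Returns the name of a (non-array) class with dots replaced by slashes;
    // Class.getName() already separates nested classes with a dollar sign.
    static String slashName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    public static void main(String[] args) {
        // double m(int, Object[][])
        System.out.println(descriptor(double.class, int.class, Object[][].class));
        // void m(int)
        System.out.println(descriptor(void.class, int.class));
        // The nested interface java.util.Map.Entry
        System.out.println(slashName(java.util.Map.Entry.class));
    }
}
```

The first call yields (I[[Ljava/lang/Object;)D, the second (I)V, and the third java/util/Map$Entry, matching the encoding rules described above.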

4.2.3 Setting the calculator

An Engine needs to be able to create, from time to time, new symbolic expressions. When this happens, some basic manipulations are usually performed on the created symbolic expression for the purpose of simplifying it. For example, it is possible to configure an Engine so that, whenever it must add the symbol a to the number 0, it simplifies the resulting expression a + 0 to a. To this end, an Engine depends on an object extending the abstract class jbse.val.Calculator, that it uses to create and manipulate all the symbolic expressions, and that performs all the simplifications on-the-fly. It is therefore necessary to inject the dependency to a suitable subclass of Calculator through the method setCalculator(Calculator) of the classes EngineParameters and RunnerParameters. Currently, the only concrete subclass of Calculator is the class jbse.rewr.CalculatorRewriting. A CalculatorRewriting applies a set of rewriting rules to simplify all the symbolic expressions it produces. It is possible to plug the rewriting rules, implemented as subclasses of jbse.rewr.RewriterCalculatorRewriting, by invoking the CalculatorRewriting.addRewriter(RewriterCalculatorRewriting) method. The package jbse.rewr contains a collection of rewriting rules performing some useful simplifications. The most important ones, that are essentially compulsory, are:

• jbse.rewr.RewriterExpressionOrConversionOnSimplex: necessary to simplify all the expressions whose operands are numeric, e.g., to simplify 3 + 2 to 5;

• jbse.rewr.RewriterFunctionApplicationOnSimplex: similar to the previous, where the operator is a (symbolic) function application such as sin, cos, max, min...

• jbse.rewr.RewriterZeroUnit: simplifies some operations with zero or one that have trivial result: e.g., simplifies a * 0 to 0, and 1 * b to b;

• jbse.rewr.RewriterNegationElimination: eliminates double negations simplifying, e.g., - (- a) to a.

The other rewriters in the package jbse.rewr can be used to simplify nonlinear expressions with trigonometric operators and square roots. Historically they have been used to check properties involving distances in the Cartesian plane and polar-to-Cartesian coordinate conversions and their inverses. A more mundane setup of JBSE would be as follows:

    import jbse.jvm.EngineParameters;
    import jbse.rewr.CalculatorRewriting;
    import jbse.rewr.RewriterExpressionOrConversionOnSimplex;
    import jbse.rewr.RewriterFunctionApplicationOnSimplex;
    import jbse.rewr.RewriterNegationElimination;
    import jbse.rewr.RewriterZeroUnit;
    ...

    EngineParameters p = new EngineParameters();
    ...
    CalculatorRewriting calc = new CalculatorRewriting();
    calc.addRewriter(new RewriterExpressionOrConversionOnSimplex());
    calc.addRewriter(new RewriterFunctionApplicationOnSimplex());
    calc.addRewriter(new RewriterZeroUnit());
    calc.addRewriter(new RewriterNegationElimination());
    p.setCalculator(calc);

Unfortunately the order in which the rewriters are added to the calculator matters. Moreover, some rewriters depend on the presence of other rewriters. Refer to the Javadoc of the rewriter classes for more information.
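To make the idea of on-the-fly simplification more concrete, here is a toy expression tree that applies three of the simplifications mentioned above (constant folding, a + 0 to a, and - (- a) to a). It is a sketch under stated assumptions: ToyRewriting and its nested classes are hypothetical and unrelated to the jbse.val and jbse.rewr classes, the rules are hard-wired into the node classes instead of being pluggable, and only integer addition and negation are modeled.

```java
// Toy expression tree with built-in simplification rules.
// Hypothetical classes; NOT the jbse.val / jbse.rewr ones.
final class ToyRewriting {
    static abstract class Expr {
        // Returns a simplified expression, rewriting bottom-up.
        abstract Expr simplify();
    }

    // An integer constant.
    static final class Num extends Expr {
        final int value;
        Num(int value) { this.value = value; }
        @Override Expr simplify() { return this; }
        @Override public String toString() { return Integer.toString(value); }
    }

    // A symbol such as a or b.
    static final class Sym extends Expr {
        final String name;
        Sym(String name) { this.name = name; }
        @Override Expr simplify() { return this; }
        @Override public String toString() { return name; }
    }

    // The sum of two expressions.
    static final class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override Expr simplify() {
            Expr l = left.simplify();
            Expr r = right.simplify();
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value + ((Num) r).value); // 3 + 2 -> 5
            }
            if (l instanceof Num && ((Num) l).value == 0) { return r; } // 0 + a -> a
            if (r instanceof Num && ((Num) r).value == 0) { return l; } // a + 0 -> a
            return new Add(l, r);
        }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // The arithmetic negation of an expression.
    static final class Neg extends Expr {
        final Expr operand;
        Neg(Expr operand) { this.operand = operand; }
        @Override Expr simplify() {
            Expr e = operand.simplify();
            if (e instanceof Neg) { return ((Neg) e).operand; } // -(-a) -> a
            if (e instanceof Num) { return new Num(-((Num) e).value); }
            return new Neg(e);
        }
        @Override public String toString() { return "-(" + operand + ")"; }
    }
}
```

For instance, new ToyRewriting.Add(new ToyRewriting.Sym("a"), new ToyRewriting.Num(0)).simplify() returns the symbol a. In this toy the rules are applied in a fixed order inside each simplify method, so it does not reproduce the dependence on the order of the rewriters noted above.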

4.2.4 Setting the decision procedures

To prune the unfeasible branches of the symbolic execution tree an Engine must decide whether a symbolic expression of boolean type is satisfiable. To this end, the Engine uses an object with class jbse.dec.DecisionProcedureAlgorithms. Upon configuration it is necessary to create this object and inject the dependency by invoking the setDecisionProcedure(DecisionProcedureAlgorithms) method exposed by the EngineParameters and RunnerParameters classes. This is necessary because a DecisionProcedureAlgorithms object can be configured according to the capabilities it needs to provide. For the sake of simplicity, a JBSE decision procedure object recognizes a proper subset of the possible boolean clauses produced by an Engine. As a consequence, no single decision procedure object is able to decide the satisfiability of all the clauses generated by all the possible symbolic executions. For this reason, decision procedure objects must be organized in a Chain of Responsibility: Whenever a decision procedure is unable to decide the satisfiability of a clause, it delegates the task to the next decision procedure in the chain. A JBSE decision procedure class extends the abstract class jbse.dec.DecisionProcedure, and provides a set of methods to check the satisfiability of all the possible boolean clauses that JBSE may produce. The class DecisionProcedureAlgorithms, on the other hand, is a Decorator adding to an arbitrary DecisionProcedure a set of methods that, for each bytecode instruction, perform the correct sequences of satisfiability queries necessary to implement the correct bytecode semantics.
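The Chain of Responsibility arrangement can be illustrated with a toy model. The sketch below is not the jbse.dec API: ToyChain, Decider, ConstantDecider and AlwaysSat are hypothetical names, and clauses are represented as plain strings. Each element of the chain answers the queries it recognizes and delegates all the others to its successor, which is passed to its constructor.

```java
// Toy Chain of Responsibility for satisfiability queries.
// Hypothetical classes; NOT the jbse.dec ones.
final class ToyChain {
    static abstract class Decider {
        private final Decider next; // null for the last element of the chain

        Decider(Decider next) { this.next = next; }

        // True iff this element can decide the clause by itself.
        abstract boolean recognizes(String clause);

        // Decides a recognized clause; true means satisfiable.
        abstract boolean decide(String clause);

        // Entry point: decide locally if possible, otherwise delegate.
        final boolean isSat(String clause) {
            if (recognizes(clause)) {
                return decide(clause);
            }
            if (next == null) {
                throw new IllegalStateException("No decider for: " + clause);
            }
            return next.isSat(clause);
        }
    }

    // Recognizes only the constant clauses "true" and "false".
    static final class ConstantDecider extends Decider {
        ConstantDecider(Decider next) { super(next); }
        @Override boolean recognizes(String clause) {
            return clause.equals("true") || clause.equals("false");
        }
        @Override boolean decide(String clause) { return clause.equals("true"); }
    }

    // Recognizes every clause and conservatively answers "satisfiable";
    // it only makes sense as the last element of a chain.
    static final class AlwaysSat extends Decider {
        AlwaysSat() { super(null); }
        @Override boolean recognizes(String clause) { return true; }
        @Override boolean decide(String clause) { return true; }
    }

    // Builds a two-element chain: constants first, catch-all last.
    static Decider defaultChain() {
        return new ConstantDecider(new AlwaysSat());
    }
}
```

A query for the clause "false" is answered by the first element, while a query such as "x > 0" falls through to the catch-all element, which conservatively treats it as satisfiable.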
In summary, the necessary steps to configure the Engine are:
• Creating a set of DecisionProcedure objects to decide a sufficient subset of the clauses that might appear in the path conditions of the program under analysis, and arranging them in a Chain of Responsibility;
• Wrapping the topmost DecisionProcedure in the Chain of Responsibility in a DecisionProcedureAlgorithms object;
• Passing the DecisionProcedureAlgorithms object to the EngineParameters or RunnerParameters object via the setDecisionProcedure(DecisionProcedureAlgorithms) method.

Most decision procedures are declared in the package jbse.dec. They are typically configured through their constructor, which is also used to set the next decision procedure in the Chain of Responsibility. The most important classes are:
• jbse.dec.DecisionProcedureClassInit: This decision procedure recognizes only the clauses that predicate on the initialization status of a class (i.e., whether a class must be assumed to be loaded before the start of symbolic execution or not). It is indispensable, and therefore shall always be present in a Chain of Responsibility of decision procedures.


• jbse.dec.DecisionProcedureSMTLIB2_AUFNIRA: This decision procedure interacts via console with any SMT solver that is compliant with the SMTLIB 2 standard and that supports the AUFNIRA logic. Currently the SMT solver is used to decide only the numeric clauses. The only SMT solvers known to work with JBSE are Z3 and CVC4.
• jbse.dec.DecisionProcedureLICS: This decision procedure implements the LICS rule language, which makes it possible to restrict the possible resolutions of (sets of) symbolic references. It therefore recognizes the clauses that predicate on symbolic references.
• jbse.dec.DecisionProcedureAlwSat: This is a dummy decision procedure that recognizes all the clauses and always answers that a clause is satisfiable. It must be used as the last decision procedure in the Chain of Responsibility.

When combined, these classes typically yield a sufficiently powerful and flexible solver. A possible configuration example follows:

import java.util.ArrayList;

import jbse.dec.DecisionProcedureAlgorithms;
import jbse.dec.DecisionProcedureAlwSat;
import jbse.dec.DecisionProcedureClassInit;
import jbse.dec.DecisionProcedureLICS;
import jbse.dec.DecisionProcedureSMTLIB2_AUFNIRA;
import jbse.jvm.EngineParameters;
import jbse.rewr.CalculatorRewriting;
import jbse.rules.ClassInitRulesRepo;
import jbse.rules.LICSRulesRepo;

EngineParameters p= new EngineParameters();
...
CalculatorRewriting calc= new CalculatorRewriting();
...
// Command line used to launch the SMT solver (here Z3 in a UNIX-like environment)
ArrayList<String> z3CommandLine= new ArrayList<>();
z3CommandLine.add("/opt/local/bin/z3");
z3CommandLine.add("-smt2");
z3CommandLine.add("-in");
z3CommandLine.add("-t:100");
LICSRulesRepo licsRules= new LICSRulesRepo();
ClassInitRulesRepo initRules= new ClassInitRulesRepo();
...
// Build the chain from its last link (decAlwSat) up to its topmost link (decInit)
DecisionProcedureAlwSat decAlwSat= new DecisionProcedureAlwSat(calc);
DecisionProcedureSMTLIB2_AUFNIRA decSMT= new DecisionProcedureSMTLIB2_AUFNIRA(decAlwSat, z3CommandLine);
DecisionProcedureLICS decLICS= new DecisionProcedureLICS(decSMT, licsRules);
DecisionProcedureClassInit decInit= new DecisionProcedureClassInit(decLICS, initRules);
// Wrap the topmost link and inject it into the parameters object
DecisionProcedureAlgorithms decAlgo= new DecisionProcedureAlgorithms(decInit);
p.setDecisionProcedure(decAlgo);

We add some final remarks:
• The DecisionProcedureAlwSat constructor accepts a Calculator as a parameter. It is good practice (although not strictly necessary) to pass the same calculator object that is passed to the EngineParameters or RunnerParameters object via the setCalculator method.
• A DecisionProcedureSMTLIB2_AUFNIRA must receive as a constructor parameter the command line used to invoke the SMT solver. This is of course platform- and environment-dependent. The above example proposes a possible command line for invoking Z3 in a UNIX-like environment.
• Unlike other decision procedures, to which assertions can be sent and retracted interactively, the DecisionProcedureLICS and DecisionProcedureClassInit objects must be configured at construction time by passing suitable objects (of class jbse.rules.LICSRulesRepo and jbse.rules.ClassInitRulesRepo, respectively) that gather the assertions about reference resolution and class initialization, respectively. We will discuss these constraints in a later section.
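Since the solver command line is platform- and environment-dependent, one possible way to avoid hard-coding it is to build it from a configurable executable path. The following sketch is only an illustration: the class, its methods and the Z3_PATH environment variable are hypothetical and are not read or provided by JBSE, and the flags are simply those of the example above.

```java
import java.util.ArrayList;

public class SolverCommandLineSketch {
    /** Builds the Z3 command line of the example above for a given executable path. */
    static ArrayList<String> z3CommandLine(String z3Executable) {
        ArrayList<String> commandLine = new ArrayList<>();
        commandLine.add(z3Executable);
        commandLine.add("-smt2");
        commandLine.add("-in");
        commandLine.add("-t:100");
        return commandLine;
    }

    /** Picks the configured path if present, otherwise the fallback. */
    static String executableOrDefault(String configured, String fallback) {
        return (configured == null || configured.isEmpty()) ? fallback : configured;
    }

    public static void main(String[] args) {
        // Z3_PATH is a hypothetical environment variable chosen for this sketch.
        String z3 = executableOrDefault(System.getenv("Z3_PATH"), "/opt/local/bin/z3");
        System.out.println(z3CommandLine(z3));
    }
}
```

The resulting list could then be passed to the DecisionProcedureSMTLIB2_AUFNIRA constructor in place of the hard-coded z3CommandLine of the example.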
