DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

Evaluating the Ratio of Alive Code in Java Third-Party Libraries

A Comparison between a Static and a Dynamic Approach

ANDREAS BROMMUND

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Master in Computer Science
Date: July 11, 2019
Supervisor: Pontus Johnson
Examiner: Elena Troubitsyna
School of Electrical Engineering and Computer Science
Host company: Omegapoint Stockholm AB
Swedish title: Analysera andelen använd kod i tredjepartsbibliotek – en jämförelse mellan ett statiskt och ett dynamiskt tillvägagångssätt


Abstract

Today's software development heavily relies on the use of third-party libraries. However, some libraries have a rich set of functionalities of which only a few are used. This leads to an unnecessarily complex codebase that needs maintenance. This thesis compares two methods used to calculate the ratio of used code in third-party libraries. The first method uses the already existing tool JTombstone, which analyses the code statically. This static approach always examines the whole program; however, it overestimates the result. The second method uses a dynamic approach. This method always underestimates the result, because only the part of the program which is executed will be examined. The dynamic code analyser tool modifies all classes contained in the third-party library. At the beginning of every method a print statement is added, which prints the signature of the current method. In this way, the list of all executed methods is generated. The findings of the thesis are that the first approach always yields a higher value and that the difference between the two methods decreases as the code coverage increases. The thesis cannot state which method is the best; however, a good solution is to combine both methods to generate an interval which bounds the correct value.

Sammanfattning

Today's software development relies heavily on the use of third-party libraries. However, many of the libraries contain a lot of functionality of which only a small part is used. This creates unnecessarily complex software that must be maintained. This thesis compares two different methods used to calculate the ratio of used code in third-party libraries. The first method uses JTombstone, a tool that analyses the code statically. Since it analyses the code statically, the whole project is always analysed; on the other hand, the tool calculates a value that is too high. The second method is instead based on a dynamic evaluation of the code. With a dynamic approach, only the part of the code that was executed is evaluated, which means that the program will generate a result that is too low. The tool that analyses the code dynamically modifies all classes belonging to the third-party library. At the beginning of every method, the tool adds a print statement which prints the method signature of that specific method. In this way, a list of the methods that have been invoked is obtained. The thesis found that the first method always generates a larger value. The results also show that the difference between the two methods decreases when the tests cover a larger part of the code. With the generated results it is not possible to decide which of the two methods is best. A good solution is to combine the methods and, with the help of the two results, create an upper and a lower bound for the correct value.

Contents

1 Introduction
  1.1 Background
  1.2 Research Question
  1.3 Hypothesis
  1.4 Delimitations
  1.5 Contribution

2 Theory
  2.1 Third-Party Libraries
  2.2 Dead Code
  2.3 Dynamic and Static Dispatch
  2.4 Call Graph
    2.4.1 Sound and Precise Call Graph
  2.5 Code Coverage
  2.6 Static Code Analysis
    2.6.1 Class Hierarchy Analysis
    2.6.2 Rapid Type Analysis
  2.7 Dynamic Analysis

3 Related Work
  3.1 Call Graph Construction for Java Libraries
  3.2 DUM-Tool
  3.3 Dead Code Elimination for Web Systems Written in PHP: Lessons Learned from an Industry Case

4 Methods
  4.1 Test Data
  4.2 Dead Code Granularity
  4.3 Tools Used


    4.3.1 Java
    4.3.2 Javap
    4.3.3 JTombstone
    4.3.4 Java Agent
  4.4 Experiment Process
    4.4.1 Initialisation
    4.4.2 Dynamic Analysis
    4.4.3 Static Analysis
    4.4.4 Calculating Code Coverage
    4.4.5 Calculating and Validating the Result

5 Results

6 Discussions
  6.1 Result
  6.2 Methodology
  6.3 Sources of Error
  6.4 Future Work
    6.4.1 Improve the Code Coverage
    6.4.2 Improve the Static Code Analysis
    6.4.3 Analyse the Functionality of the Code
  6.5 Ethical Considerations

7 Conclusions

Bibliography

Glossary

API Application Programming Interface

CHA Class Hierarchy Analysis

CPA Closed-package assumption

HTTP Hypertext Transfer Protocol

JVM Java Virtual Machine

OPA Open-package assumption

RTA Rapid Type Analysis


Chapter 1

Introduction

This chapter gives a brief background of the problem statement, the aim of the thesis and the research question.

1.1 Background

Today's software development heavily relies on the use of third-party code libraries, and the reuse of code is a necessary part of modern development. It is an easy way to include functionality and speed up the development process. However, the risk analysis is usually omitted when the decision to include a library is made. Most third-party libraries have a rich set of functions of which only a few are used in the software. This leads to an increased codebase with a high ratio of dead code. [1]

An increased codebase leads to a bigger attack surface, and more resources are needed to maintain the software. To avoid being exposed to known vulnerabilities, the maintainer must always stay informed about new patches and actively patch the libraries. The burden of patching increases with the number of libraries and with the size of the codebase. Therefore, it is necessary to have control of the dependency tree to reduce unnecessary code.

A first step towards controlling the dependency problem is to gain knowledge of which dependencies are included. Secondly, it is necessary to find which functions are essential for the project and which are not. One way of solving this is to measure the ratio of used and unused code in the dependencies. This thesis is focused on the second problem and investigates two different approaches to solve this challenge.

1.2 Research Question

Is a dynamic approach a better technique for measuring the amount of unused code in third-party libraries included in Java open source projects with high code coverage, compared to a static approach?

1.3 Hypothesis

1. The number of methods categorised as alive is higher for the static analysis approach compared to the dynamic approach.

2. The difference between the two methods decreases when the code coverage increases.

1.4 Delimitations

The test data only contains a few projects, all of which must be written in the programming language Java. Only static code analysis and dynamic analysis are investigated.

1.5 Contribution

In this thesis, a new dead code analyser is developed. The analyser measures the number of used methods in third-party libraries. The application is classified as a dynamic approach and is compared with the already existing tool JTombstone1.

1http://jtombstone.sourceforge.net

Chapter 2

Theory

The relevant theory is presented in this chapter.

2.1 Third-Party Libraries

Third-party libraries are a vital part of this thesis and must be defined more precisely. The definition is taken from Heinemann et al. [1]; however, they use the term software reuse. All code not written by the developers themselves is categorised as third-party code. Furthermore, code which is provided by the operating system or the programming language is not included in the definition. Therefore, the Java API is not classified as a third-party library. To distinguish between the third-party code and the self-written code in this chapter and throughout the rest of the thesis, the following six definitions are used; the definitions are taken from Romano et al. [2]:
• internal code is the code written specifically for the software,
• external code is the code in the third-party libraries,
• internal classes are the classes in the internal code,
• external classes are the classes in the external code,
• internal methods are the methods in the internal classes and
• external methods are the methods in the external classes.


1 public int add(int a, int b){
2     int c = a;
3     int d = 3;
4     if(false){
5         System.out.print("Dead");
6     }
7     return c + b;
8 }

Listing 2.1: This code snippet exemplifies dead code. For example, line 5 is dead because of the if statement.

2.2 Dead Code

Dead code is one of the key concepts in this thesis, and it is crucial to understand the meaning of the concept. In computer science, dead code or unreachable code has different meanings in different fields. Software engineers define dead code[3] as the part of the program which is never executed. For example, the print statement at line 5 in listing 2.1 is considered dead according to software engineers. This is the case because of the if statement, which will always be false, and therefore, the program never reaches this line. In the compiler field the definition is slightly different. In this area, it is more relevant to minimise the number of executed instructions. Therefore, all expressions whose results are never used are also considered dead[4]. An example exists in listing 2.1, where lines 3 and 5 are both dead. Line 3 is dead because the variable d is never used later in the program, and line 5 is considered dead for the same reason as in the previous definition.

2.3 Dynamic and Static Dispatch

The information in this section is taken from Abadi and Cardelli [5]. Dynamic dispatch and static dispatch are two essential concepts when analysing dead code in software, especially in an object-oriented language such as Java; more on that in section 2.4.

1 public void move(boolean isBike){
2     Vehicle v;
3     if(isBike)
4         v = new Bike();
5     else
6         v = new Car();
7     v.drive();
8 }

Listing 2.2: In this example, it is not possible to statically determine the type of v.

The notion of dynamic and static dispatch is used to distinguish whether the binding between a call site and a method is decided at runtime or at compile time. Static dispatch means that it is possible to create the binding at compile time. For example, in Java, all calls to static methods and calls to constructors can be analysed using static dispatch. Dynamic dispatch must be used when it is not possible to calculate the binding at compile time; in this case, the binding is set while the program is running. For example, for a call a.run() where a is an object whose type cannot be determined at compile time, dynamic dispatch must be used.
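For illustration, the minimal sketch below (made up for this section, not taken from the thesis test data) shows both forms: the constructor call and the static method call can be bound at compile time, while the call through the Vehicle reference is resolved at runtime.

class Vehicle {
    void drive() { System.out.println("vehicle"); }
}

class Car extends Vehicle {
    @Override
    void drive() { System.out.println("car"); }
}

public class DispatchDemo {
    public static void main(String[] args) {
        Vehicle v = new Car(); // constructor call: static dispatch
        v.drive();             // virtual call: dynamic dispatch, prints "car"
        Math.abs(-1);          // call to a static method: static dispatch
    }
}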

2.4 Call Graph

One of the methods investigated in this thesis uses a static approach. To fully understand how this method works, one must understand the concept of a call graph, and this section covers the fundamental theory on this topic. A call graph is a directed multigraph[6] G := (V, A) where V is the set of vertices and A is a multiset of directed edges. The methods in the program (both internal and external) are represented as vertices. Method calls are represented as edges, and there is an edge from v1 to v2 if and only if the body of method v1 invokes method v2.

The call graph is usually an approximation due to the difficulty of precisely calculating the graph statically[7]. The code example in listing 2.2 illustrates the problem. The program must first calculate the type of v to know which drive method is invoked at line 7. The type of v can only be obtained during runtime, and therefore, it is necessary to approximate the graph and add an edge from move to both Bike's and Car's drive method. To handle this problem in an object-oriented language, a set-based analysis is often used[7]. For a call e.m() the set S_e is computed; it is an approximation of all possible types of the receiver e.
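To make the definition concrete, the following minimal Java sketch (illustrative only; the names are not taken from any tool in the thesis) stores the graph as a map from a caller to a list of its callees. Using a list rather than a set preserves parallel edges, which is what makes the structure a multigraph.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CallGraph {
    // caller signature -> list of callee signatures (one entry per call site)
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
    }

    List<String> calleesOf(String caller) {
        return edges.getOrDefault(caller, Collections.emptyList());
    }
}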

2.4.1 Sound and Precise Call Graph

The terms sound and precise are used to describe the properties of the call graph. Grove et al. [8] have a good explanation of these concepts, and the source is used in this section. A call graph is sound if the graph contains all edges which are possible during runtime. A precise call graph only has the edges which occur during runtime. It is easy to create a sound graph; the difficulty is to make it precise. An example of a sound but not precise call graph is the complete graph. The complete call graph is only precise if all methods invoke all other methods.

2.5 Code Coverage

The two following sections in this chapter describe the relevant theory for the methods used in the thesis. It is necessary to understand the concept of code coverage to recognise the advantages and disadvantages of the two methods. Therefore, the topic is explained here. Code coverage is a metric used to evaluate how comprehensive the tests are. It tells what percentage of the source code is reached when the test suite is executed. The metric only shows how much of the code the tests hit; it does not take into account the values of variables and parameters. Therefore, it is not a metric for evaluating the quality of the tests, but rather a check of whether all parts of the program are exercised. [9]

1 public void hello(){
2     World w = new World();
3     if(w.isGreen()){
4         w.print("Green");
5     }
6 }

Listing 2.3: The example illustrates an overestimation when a static code analyser is used. The print method is always marked as used regardless of whether the World is green.

2.6 Static Code Analysis

Static code analysis is a widely used technique in software development, for example for finding security bugs (Sonarqube1 is one example of a static code analysis tool); however, in this thesis, it is used to find dead code in third-party libraries. It examines the software by investigating the code it contains, without running it. The advantage of examining the code without running the software is that the analysis is independent of the code coverage and always examines the entire program[10]. On the other hand, false positives can arise; a method is marked as used even if it is never invoked. An example is shown in listing 2.3; in this case, a naïve static code analyser could mark w.print("Green") as used. However, this is only true if w.isGreen returns true, and maybe this method always returns false. Because of this, the result of static code analysis is always a superset of the correct result. The static code analyser must first create a call graph[7]. When the call graph is created, it is possible to traverse the graph and find all unreachable vertices from the desired starting points[6].
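The traversal step can be sketched as a plain work-list search over the CallGraph structure from section 2.4; this is an illustrative sketch of the general technique, not JTombstone's implementation.

import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class Reachability {
    // Every method not returned by this search is reported as dead.
    static Set<String> reachable(CallGraph g, Collection<String> roots) {
        Set<String> visited = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            for (String callee : g.calleesOf(work.pop())) {
                if (visited.add(callee)) {
                    work.push(callee); // first time seen: explore it later
                }
            }
        }
        return visited;
    }
}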

1https://www.sonarqube.org

Figure 2.1: The figure illustrates an example of a class hierarchy for a program.

2.6.1 Class Hierarchy Analysis

Class hierarchy analysis[11] (CHA) is an example of an algorithm which is used to create a call graph. It uses a naïve approach when creating the call graph and combines the class hierarchy of the program with the static type of the object. Therefore, the whole program must be available when the analysis is performed, as the algorithm depends on the class hierarchy. Figure 2.1 illustrates the class hierarchy for a program. If the print method in class A executes C.read(), CHA will add one edge to the call graph, from A.print to A.read(). This result is calculated by combining the fact that the static type is C with an examination of the class hierarchy. C does not implement the read method, and therefore, the implementation of the closest superclass must be used, in this case, A.read(). The second option would be that one of C's subclasses overrides read; however, none of the subclasses does. A more formal definition of this set is given in equations 2.1, 2.2 and 2.3. getMethod (equation 2.1) returns the method that is invoked during execution. It first checks if the method signature exists in class c. If this is not the case, the type hierarchy is traversed from c towards the root of the tree until the method is found. In other words, it returns the method that exists in c or the closest superclass. The second part (equation 2.2) is the set of the methods in the subclasses which override the invoked method; s is the signature, and the equal sign compares the signatures of two methods. Lastly (equation 2.3), the union operation is applied to these sets; the result of the operation is the set containing all possible methods.

X = c.getMethod(s) (2.1)

Y = {m|∀m∃c, m = s ∧ m ∈ c ∧ X ∈ c.getSuperClasses} (2.2)

X ∪ Y (2.3)
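The lookup in equation 2.1 corresponds to walking the class hierarchy from c towards the root. A small reflection-based Java sketch of this lookup (illustrative only, not part of any tool in the thesis) could look as follows.

import java.lang.reflect.Method;

final class Lookup {
    // Illustrative reconstruction of equation 2.1: resolve a signature against
    // class c by walking from c towards the root of the hierarchy.
    static Method getMethod(Class<?> c, String name, Class<?>... params) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            try {
                return k.getDeclaredMethod(name, params); // declared here: done
            } catch (NoSuchMethodException e) {
                // not declared in k, continue with the superclass
            }
        }
        return null; // not found anywhere in the hierarchy
    }
}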

Algorithm 2.1 is an example implementation of CHA. The algorithm iterates over every method in the project and finds every call site. Every call site is examined once. In the trivial case, when the call site is a static dispatch call, an edge is added from the current method to the method that is called. In the second scenario, a deeper analysis is used: an edge must be added from the current method to all possible methods. The set of possible methods is computed with the same approach as explained above in equation 2.3. The described algorithm produces a sound but not precise call graph. If the algorithm is applied to the example in listing 2.4, the set of valid types for v1 is equal to {Car, Bike, Vehicle}. Therefore, three edges are added to the call graph (one for every class).

1 public void move(){
2     Vehicle v1 = new Car();
3     Vehicle v2 = new Bike();
4     v1.drive();
5 }

Listing 2.4: Vehicle has a method drive; Car and Bike implement Vehicle.

Data: M: a set of all methods
Data: G: call graph
1  forall m ∈ M do
2      forall c ∈ m.getCallSites do
3          if c.isStaticDispatch then
4              to ← c.type.getMethod(c.signature)
5              G.addEdge(m, to)
6          else
7              to ← c.type.getMethod(c.signature)
8              subTypes ← getSubTypes(c.type)
9              forall t ∈ subTypes do
10                 to ← to ∪ t.getMethod(c.signature)
11             end
12             forall to′ ∈ to do
13                 G.addEdge(m, to′)
14             end
15         end
16     end
17 end
18 return G

Algorithm 2.1: Pseudo code for a CHA algorithm.

2.6.2 Rapid Type Analysis

A second algorithm used to create a call graph is rapid type analysis[12] (RTA). This method is more sophisticated than CHA: the graph estimation is improved, but RTA requires more computational power. RTA uses the graph created by CHA and information about instantiated classes to reduce the call graph. Algorithm 2.2 is a fully implemented RTA algorithm. It starts with two graphs (G_CHA and G_RTA). The first one is a copy of the graph generated by the CHA algorithm. The second graph is the same graph with all edges removed. The algorithm iterates until the list S is empty; this set is initiated with methods specified as reachable without having a call site, for example, the main method. For every iteration, the first method m from the set is retrieved and removed. Lines 3 and 4 in the algorithm generate a set of all types initialised in m and in all methods invoking m. In the next step, all call sites in m are iterated. If the call site is a static dispatch call, the edge is added in the same way as in the CHA algorithm. When the call must use dynamic dispatch, the procedure is more complicated. Lines 10 and 11 assign m′ the set of all targets of edges in G_CHA starting from m and assign t′ a set of all types that are supertypes of the types in t. Line 12 is an integral part of the algorithm; it removes edges by ensuring that m′ only contains methods that belong to a class of the correct type. The right part of the intersection contains all methods that belong to a class whose type is initialised in the method, or to one of its supertypes (line 11). If this algorithm is executed on the example in listing 2.4, only Car and Bike are instantiated. Therefore, they are the only possible types for v1, and the call graph is reduced.

Data: G_CHA: a call graph generated by CHA
Data: G_RTA: all nodes from G_CHA but no edges
Data: S: a list containing all initial methods
1  while S ≠ ∅ do
2      m ← S.next
3      t ← m.initialisedTypes
4      t ← t ∪ {e.from.initialisedTypes | ∀e ∈ G_RTA ∧ e.to = m}
5      forall c ∈ m.getCallSites do
6          if c.isStaticDispatch then
7              to ← c.type.getMethod(c.signature)
8              G_RTA.addEdge(m, to)
9          else
10             m′ ← {e.to | ∀e ∈ G_CHA ∧ e.from = m}
11             t′ ← getSuperTypes(t)
12             m′ ← m′ ∩ {m″ | ∀m″ ∈ {t.getMethods ∪ t′.getMethods}}
13             forall to′ ∈ m′ do
14                 G_RTA.addEdge(m, to′)
15                 S ← S ∪ {to′}
16             end
17         end
18     end
19 end
20 return G_RTA

Algorithm 2.2: Pseudo code for an RTA algorithm. [12]

1  public void main(String[] args){
2      Car car = getVehicle();
3  }
4
5  public void travel(Car car){
6      car.drive();
7  }
8
9  public Car getVehicle(){
10     return new Car();
11 }

Listing 2.5: An example of why RTA is not sound.

The algorithm is not sound because it does not include all edges which are possible during runtime. The example in listing 2.5 illustrates the problem. The generated call graph misses the edge between travel and Car.drive(). The problem arises because the Car is not instantiated in the travel method nor in the method that invoked travel. If the type of the return value and types of the parameters are taken into account, it is possible to fulfil the criteria for a sound call graph[7].

2.7 Dynamic Analysis

Dynamic analysis is a second method for testing or analysing a project. In contrast to static code analysis, the dynamic analyser must run the code and only examines the part of the code that is executed. Because of this, it is crucial for the tests to be exhaustive and have a high code coverage metric. The analysis only gives the correct result if the code coverage reaches 100%. In most cases, 100% code coverage is infeasible. However, the analysis is still relevant and provides useful information if the code coverage is high enough, because the result is a lower bound of the correct result. [10]

To obtain a call graph when using dynamic analysis, the following process can be used. First, the program must be executed while monitored by another application; for example, one possibility is to run all tests included in the program. Second, the program which is overseeing the application must keep track of when a function is executed, and keep track of which function invoked it. Lastly, with this information, it is possible to create a call graph. Once again, it is important to mention that the accuracy of the analysis depends on how much of the code is executed. If some part of the system is not executed because the tests were too limited, this part is marked as dead even if it is theoretically possible to run this part of the application.
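To make the process concrete, the bookkeeping can be sketched as follows. The Tracker class is a hypothetical illustration, not a tool from the thesis: every instrumented method would call Tracker.enter with its own signature on entry and Tracker.exit on return, and the method on top of the per-thread stack is taken as the caller.

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public final class Tracker {
    // One call stack per thread, so concurrent tests do not mix up callers.
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);
    private static final Set<String> EDGES =
            Collections.synchronizedSet(new HashSet<>());

    public static void enter(String method) {
        String caller = STACK.get().peek(); // null when this is an entry point
        if (caller != null) {
            EDGES.add(caller + " -> " + method); // record one call-graph edge
        }
        STACK.get().push(method);
    }

    public static void exit() {
        STACK.get().pop();
    }

    public static Set<String> edges() {
        return EDGES;
    }
}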

Chapter 3

Related Work

In this chapter, the related work in the field of dead code analysis is summarised. The analysis of dead code in the internal code and in the external code (third-party libraries) are related but have some differences. Because of this, articles focusing on both internal and external code analysis are included in this chapter.

3.1 Call Graph Construction for Java Libraries

The report by Reif et al. [13] designs two different algorithms for building a call graph for Java libraries. The researchers claim that the gold standard is to treat all non-private methods as initially reachable methods. According to the report, this is not a good representation of the real world, and it is necessary to split the analysis into two categories: firstly, the open-package assumption (OPA), where the libraries can be extended or implemented by the user; secondly, the closed-package assumption (CPA), where only the methods listed by the API are contained in the set of initially reachable methods. The authors develop two different tools that assume either OPA or CPA. The OPA method makes a conservative worst-case assumption and can be used for security purposes. CPA is used in software development, for example, to analyse dead methods in a library. The latter method is less conservative and therefore works better in practice, but can generate false negatives. An extensive set of libraries is used to benchmark the tools. The report concludes that it is necessary to use different methods to create the graph depending on the use case, which is not necessary when analysing the dead code in the internal part of the program.

3.2 DUM-Tool

Romano and Scanniello [6] investigate whether it is possible to develop a more accurate dead code analyser and created a tool named DUM-tool. The tool is implemented as a graphical Eclipse1 plugin and analyses the ratio of unused internal methods in a project. Four open source projects are used to benchmark DUM-tool: ArtOfIllusion2, LaTeXDraw3, aTunes4 and MediaPesata. The result is compared with two existing tools, JTombstone5 and Google CodePro AnalytiX. Three different measures are used to evaluate the result:
• Correctness: A method marked as unreachable is unreachable.
• Completeness: The ratio of actually unreachable methods compared to the output of unreachable methods.
• Accuracy: A balanced value between correctness and completeness.
The exact number of unreachable methods is calculated by hand. The result of the report shows a trend in favour of DUM-tool. DUM-tool is better than the other two tools if correctness and accuracy are taken into account. JTombstone is slightly worse than DUM-tool if completeness is taken into account, and Google CodePro AnalytiX received the worst completeness result.

1http://www.eclipse.org
2http://www.artofillusion.org
3http://latexdraw.sourceforge.net
4https://sourceforge.net/projects/atunes/
5http://jtombstone.sourceforge.net

3.3 Dead Code Elimination for Web Systems Written in PHP: Lessons Learned from an Industry Case

Boomsma, Hostnet, and Gross [14] investigate how to design a dead code analysis for a PHP project. Dynamic and weak typing are two factors that make it hard to find dead code in PHP. A dynamic analyser is developed by the authors to solve the problem. The smallest unit they use in the report is a whole PHP file. The first reason for this decision is that it is easy to remove a PHP file; secondly, it is easy to measure a single file. For every file, the following events are logged: (1) when it was used the first time, (2) how many times it is used, (3) when it was last used and (4) when the file was changed. A file is marked as "potentially dead" if it is never used after some execution time. However, this does not mean that the file is dead; the file may only be executed once a month or perhaps only once per year. To safely remove a "potentially dead" file, the developers must manually decide if the file is dead. The researchers tested their approach on a web server developed by Hostnet. They managed to remove almost 30% of the original code inventory. A similar analysis has not been done before; therefore, no data exists to compare with. However, after three months of running, only half of the files were not marked as "potentially dead". Therefore, more dead files probably exist in the project.

Chapter 4

Methods

This chapter contains a detailed explanation of the two experiments performed in this thesis. Both experiments are performed on the same data set. The first experiment uses a static code analysis approach, and the second uses a dynamic code analyser. The dynamic code analyser is a custom-developed program, while the first experiment uses an already existing tool.

4.1 Test Data

For testing, five different projects are used. All the projects fulfil two requirements:
• they exist on github.com and
• they have a code coverage over 60%.
The selected projects are:

Spring-boot-admin1 is an application providing a user interface for the administration of a spring-boot application.

Nanohttpd2 is an HTTP server. The server is lightweight and designed for embedding in other applications.

1https://github.com/codecentric/spring-boot-admin
2https://github.com/NanoHttpd/nanohttpd


Jadx3 provides the functionality to produce Java source code from an Android executable file.

Activiti4 is a product targeting business people. Activiti helps the user to manage workflows and business processes.

Genie5 is a federated job orchestration engine. Genie provides an API to run and monitor big data jobs.

4.2 Dead Code Granularity

Before designing the experiments, the smallest unit of dead code must be defined. Other reports have used different granularities; for example, Boomsma, Hostnet, and Gross [14] use a PHP file as the smallest unit. It is also common to use a method as the smallest unit[13, 7, 15]. In this thesis, the granularity is set at the method level. If a coarser granularity is used, such as a class, it is possible that the accuracy is reduced. On the other hand, if a more fine-grained unit is used, such as an instruction, the complexity of the analysis increases.

4.3 Tools Used

In this thesis, some existing tools are used to simplify the experiment; the tools are listed in this section.

4.3.1 Java

Java is used to compile and run all projects. For all projects except Activiti Java 1.86 is used. Activiti is not compatible with Java 8. Therefore, Java 117 is used for this project. However, the dynamic code analyser is compatible with both versions, and therefore, this has no impact on the result.

3https://github.com/skylot/jadx
4https://github.com/Activiti/Activiti
5https://github.com/Netflix/genie
6https://www.java.com/sv/download/mac_download.jsp
7https://www.oracle.com/technetwork/java/javase/downloads/jdk11-downloads-5066655.html

4.3.2 Javap

Javap is used to extract all method signatures from the compiled class files. Javap is a disassembler for Java which takes one or more class files as input and outputs the code in a Java source code format, which makes it easier for a human to read. [16]
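As an illustration, running javap on a compiled class prints the declarations in a Java-like form from which method signatures can be collected; the class Foo below is made up for this example.

$ javap -p Foo.class
Compiled from "Foo.java"
public class Foo {
  public Foo();
  public int add(int, int);
  private void helper();
}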

4.3.3 JTombstone

A static code analyser is a sophisticated program which takes time to develop and is hard to implement correctly; therefore, an existing tool is used in this thesis. JTombstone8 is a static code analyser that finds dead code in a Java project. The tool examines the Java bytecode and not the source code. JTombstone uses a variant of CHA to perform the analysis. The program requires the user to specify which methods are rooted. Marking a method as rooted means that it is flagged as used even if it is not invoked; this is used as a starting point for the analyser. Romano et al. [2] use the term initially reachable methods for these methods. Examples of such methods are:
• the main method in Java,
• methods executed by reflection or
• methods executed to initialise fields.
The tool is examined in previous research[15] and performed well in most cases, which is why it is used in this thesis. JTombstone has some limitations that are important to keep in mind. First, JTombstone cannot handle reflection on its own, which means that methods that are only called via reflection are flagged as dead. A workaround is to mark the desired methods as rooted in the configuration file. A second limitation is that JTombstone cannot handle native function calls which later call a method in the project; the solution to this problem is the same as for the previous one. The third limitation is that the smallest supported granularity level is the method level. [17]

8http://jtombstone.sourceforge.net

4.3.4 Java Agent

One way of designing a dynamic code analyser for a Java application is to run a Java agent in parallel with the program. A Java agent[18] is a Java program used to debug or alter the behaviour of another Java program. The agent can modify the class files while they are loaded into the JVM. For example, the agent can add instructions to an existing method or add a new method to a class. Depending on the desired outcome, the agent executes before or after the main application is started. It is necessary to be aware that most of the safety checks are disabled when the agent is executed; therefore, it can crash the JVM. In this thesis, a custom-made agent is used to instrument the software under observation. To simplify the modification of the program, Javassist9 is used. It is an API which simplifies the modification of Java programs[19]. In this project, version 3.24.0-GA of Javassist is used.
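A minimal sketch of such an agent, using Javassist's insertBefore to inject the print statement described in section 4.4.2, could look as follows. This is an illustrative reconstruction, not the exact agent used in the experiments: the package prefix com/example/lib/ is a made-up stand-in for the third-party packages selected during initialisation, and the agent jar's manifest must declare PrintAgent as Premain-Class so that it can be started with -javaagent.

import java.io.ByteArrayInputStream;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.Modifier;

public class PrintAgent {
    public static void premain(String args, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String name, Class<?> cls,
                                    ProtectionDomain pd, byte[] bytes) {
                // Only touch classes in the (hypothetical) third-party package.
                if (name == null || !name.startsWith("com/example/lib/")) {
                    return null; // null tells the JVM to keep the class unchanged
                }
                try {
                    CtClass ct = ClassPool.getDefault()
                            .makeClass(new ByteArrayInputStream(bytes));
                    for (CtMethod m : ct.getDeclaredMethods()) {
                        // Skip the kinds of methods the experiment ignores.
                        if (m.isEmpty() || Modifier.isNative(m.getModifiers())) {
                            continue;
                        }
                        m.insertBefore(
                            "System.err.println(\"" + m.getLongName() + "\");");
                    }
                    byte[] out = ct.toBytecode();
                    ct.detach();
                    return out;
                } catch (Exception e) {
                    return null; // on failure, fall back to the original bytecode
                }
            }
        });
    }
}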

4.4 Experiment Process

This section contains a more in-depth description of how the experiment is executed. To understand the experiment, the process is divided into different parts. The two main parts are the dynamic code analyser (the Java agent) and the static code analyser (JTombstone); the agent is developed from scratch, whereas JTombstone is not. Figure 4.1 contains a flow diagram of the experiment process, and it shows how the different parts depend on each other.

9http://www.javassist.org

Figure 4.1: The picture illustrates in which order every part of the experiment is conducted and how they depend on each other. The names in the figure correspond to the headlines below.

4.4.1 Initialisation

This step is responsible for deciding which packages should be used in the analysis and, secondly, finding all methods in these packages. Both of these parts are developed explicitly for this experiment. Retrieving all packages is a manual step and requires the developer to go through the software under scrutiny and find all packages which should be included in the examination. In this thesis, all external methods are of interest. Therefore, all packages which belong to third-party libraries are included in the examination. The second step is to find all the methods in the selected packages. However, interfaces, enums, native methods and empty methods are ignored. This decision was taken because the agent cannot handle these types of methods. The selected methods are later used to calculate the ratio of alive methods. To achieve this, all class files are disassembled by Javap, and the output from Javap is analysed to find all signatures.
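The signature extraction can be sketched by invoking javap as a subprocess and keeping the declaration lines; this is a crude illustration of the idea, not the exact extraction code used in the experiment.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

final class SignatureExtractor {
    // Run javap on one class file and keep lines that look like declarations.
    static List<String> methodSignatures(String classFile)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("javap", "-p", classFile).start();
        List<String> signatures = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String t = line.trim();
                if (t.endsWith(");")) signatures.add(t); // crude signature filter
            }
        }
        p.waitFor();
        return signatures;
    }
}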

4.4.2 Dynamic Analysis

One of the two main parts of the examination is the dynamic analysis. In the experiment, the dynamic analysis is achieved via a Java agent, and this agent is custom developed for this experiment. The agent uses a similar approach to the one Boomsma, Hostnet, and Gross [14] use in their report; however, their agent is more sophisticated. The agent used in our experiment has one primary action: to add a System.err.println instruction at the beginning of every method in the specified packages. These print statements are executed every time the method is invoked. In this way, it is possible to track which methods are executed when running the program. To retrieve statistics, the agent must be loaded while the project under investigation is executed. In this case, all the existing test cases are executed to retrieve data. To get a reliable result, the project under observation must have a high code coverage. Boomsma, Hostnet, and Gross [14] cannot guarantee that a method is dead; they can only tell if it is not dead, because they do not know if the method will be executed in the future. The agent in this thesis suffers from the same problem as long as the test data does not have 100% code coverage. This is discussed further in chapter 6.

4.4.3 Static Analysis

The static analysis part is also one of the main parts of the experiment. This part depends on the already existing tool JTombstone, discussed in section 4.3.3. The root methods are configured to be all internal methods, and the tool is configured to analyse only external methods. In this way, only methods in the third-party libraries are analysed. JTombstone returns a list of dead methods; to obtain all alive methods, the relationship shown in equation 4.1 is used.

M_alive_JTombstone = M_all \ M_dead_JTombstone    (4.1)
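In code, the relationship in equation 4.1 is a plain set difference; an illustrative sketch:

import java.util.HashSet;
import java.util.Set;

final class SetOps {
    // Equation 4.1 as a set difference: M_alive = M_all \ M_dead.
    static Set<String> aliveMethods(Set<String> all, Set<String> dead) {
        Set<String> alive = new HashSet<>(all); // copy so the input is untouched
        alive.removeAll(dead);                  // drop everything marked dead
        return alive;
    }
}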

4.4.4 Calculating Code Coverage

The code coverage for every project in the test data is also needed later when analysing the result. This part does not contain any self-written code; instead, the Java build tools Maven10 and Gradle11 are used to calculate the code coverage.

4.4.5 Calculating and Validating the Result

The final part is the calculation and validation of the result. The result is represented as the ratio of alive methods for every project in the test data. The ratio is calculated both for the result yielded by the dynamic code analyser (equation 4.2) and for the static code analyser (equation 4.3). Lastly, a validation check is performed on the result generated by the static code analyser; the check is shown in equation 4.4 and validates that all methods JTombstone finds exist in the set of all methods, which is explained in section 4.4.1.

|M_alive_agent| / |M_all|    (4.2)

|M_alive_JTombstone| / |M_all|    (4.3)

∅ = M_dead_JTombstone \ M_all    (4.4)

10https://maven.apache.org
11https://gradle.org

Chapter 5

Results

In this chapter, the result of the analysis is presented. The result of the static code analyser has an error; therefore, it is not possible to obtain an exact result, and the ratio of alive methods is presented as an interval bounded by a maximum and a minimum value. The cause of this error is discussed in section 6.3. The difference between the static and dynamic result is also presented in this chapter. This value is likewise divided into two values: the maximum value, which is the largest possible difference between the two methods, and the minimum value, which is the smallest possible difference between the two methods. In table 5.1, the calculated code coverage for the five projects is shown. Table 5.2 contains the maximum and minimum difference between the agent and the static code analyser, and table 5.3 displays the standard deviation and the mean value of these differences. Figure 5.2 contains a plot of the code coverage against the difference, with the code coverage along the x-axis and the difference with respect to the total number of methods along the y-axis. A linear least squares approximation[20] is used to find the trend line. Lastly, figure 5.3 is a bar diagram where the error is shown.
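For reference, the trend line in figure 5.2 is an ordinary linear least squares fit y = kx + m; for data points (x_i, y_i), i = 1, ..., n, the standard formulas[20] give:

k = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2},
\qquad
m = \frac{\sum_{i=1}^{n} y_i - k \sum_{i=1}^{n} x_i}{n}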

           Activiti   Genie   Jadx   Nanohttpd   Spring-boot-admin
Coverage   61%        70%     63%    85%         88%

Table 5.1: The calculated code coverage for every project.


          Activiti   Genie   Jadx   Nanohttpd   Spring-boot-admin
Minimum   47.5       43.7    33.6   6.97        14.0
Maximum   47.8       45.0    36.2   7.81        15.3

Table 5.2: The ratio of the difference between the agent result and the static code analyser result with respect to the total number of methods.

          Mean   Standard deviation
Minimum   29.2   17.9
Maximum   30.4   17.9

Table 5.3: The mean value and the standard deviation of table 5.2.

Figure 5.1: The diagram shows the percentage of alive methods in third-party libraries.

Figure 5.2: A plot of the difference between the two methods. Both the maximum and the minimum are plotted. The code coverage is specified along the x-axis, and the difference with respect to the total number of methods is plotted along the y-axis. The trend line is calculated using the linear least squares approximation[20].

Figure 5.3: The static code analyser returns some methods (unknown methods) which do not appear in the list of all methods. The reason for this behaviour is discussed in section 6.3. This graph shows this error in percentages. The value on the y-axis is calculated using the formula unknown methods / all methods.

Chapter 6

Discussions

This chapter contains a discussion of the results and the methodology used in the experiments. The chapter also addresses the topics of future work and ethical considerations.

6.1 Result

This section contains a comparison and analysis of the results in chapter 5; it also relates the results to the research question stated in section 1.2. The first finding obtained in the experiment is that the two methods yield significantly different results for most of the test data. This can be seen in figure 5.1 and table 5.2, and it is clear that the agent yields a lower value compared to the static analyser. The outcome is expected according to the theory of static and dynamic analysis discussed in sections 2.6 and 2.7. However, the result does not say anything about which method is closest to the correct value. The mean difference for the five projects is shown in table 5.3; the table shows that the difference is somewhere between 29.2% and 30.4%. One can argue that this is a big difference, as the two experiments classify almost a third of the methods differently. It is also interesting that the standard deviation is high (17.9 percentage points). One reason for the high standard deviation is probably the variation in code coverage (table 5.1) in the test data.


The correlation between the code coverage and the difference between the methods is shown in figure 5.2. The result in this figure strengthens the hypothesis that the difference decreases when the code coverage increases. The reason for this behaviour is likely that the dynamic code analyser executes more of the code when more comprehensive test suites are used and therefore yields a better result. Additionally, the result of the static code analyser does not depend on the code coverage, because it examines the code without running it; therefore, it has the same level of accuracy independent of how comprehensive the tests are. These two facts together explain why the difference decreases while the code coverage increases. The research question was: is a dynamic approach a better technique for measuring the amount of unused code in third-party libraries included in Java open source projects with high code coverage, compared to a static approach? The answer to this question, based on the results in this thesis, is that it depends; both methods should be used to create a bound instead of an exact value, because it is hard to obtain the correct value with either the static or the dynamic approach according to theory. It is hard, if not impossible, to develop a perfect static analyser because of the difficulty of creating a precise call graph[13, 8, 15]. Developing a perfect dynamic code analyser has its own challenges, for example, the difficulty of being certain that a method will never be executed at a later stage[14].

6.2 Methodology

The method chosen in this thesis has both advantages and disadvantages, which are discussed in this section. The dynamic code analyser phase is straightforward (print all signatures and count them). The simplicity of the method makes it transparent, and it is possible to verify and find errors throughout the full process quickly. However, the rather simple method limits the size of the test data; the problem arises when the number of invoked methods increases, which increases the runtime by a few hours in the worst case. To conclude, the method used is reasonable in this case because it is transparent, which makes it easier to verify and find errors. However, it is hard to scale this process to analyse a bigger project.

The decision to use an already existing static code analyser was the best choice for this thesis. Writing and validating a static code analyser from scratch would not have been possible within the time frame of this thesis. The chosen analyser has known weaknesses, mainly when it must deal with reflection[17]. Better tools exist, for example DUM-tool (section 3.2); however, these tools could not be modified to fit the requirements of this thesis, because they analyse the internal code and not the external code.

With the method used in this thesis, the control over whether to count a method that is only called as a transitive dependency[21] is lost. It is not possible to limit the analyser to only count a method if it is called from one of the chosen dependencies or from the internal code. For example, suppose a project M depends on two libraries A and B. M depends on A and B directly; however, A also depends on B. In this case, B is both a transitive dependency and a first-order dependency, and it is impossible to know if a method is alive because it was called from M or from A. This is not a problem in this thesis, because the two methods behave the same in this respect; however, it is vital to consider if a more in-depth dependency analysis is done.

6.3 Sources of Error

The experiments contain two primary sources of error, both of which are discussed in this section. The first one is shown in figure 5.3. This number is calculated using the formula |M_dead_JTombstone \ M_all| / |M_all|. A lower value is better, and in theory, this ratio should be 0, because no methods should exist in M_dead_JTombstone which do not exist in M_all. However, this is not the case in the experiments. The problem occurs because JTombstone and Javap represent the method signatures differently in some cases. One example is how the two programs handle generics: in Java, it is possible to write a generic type[22] with a letter, for example, E or T. JTombstone never modifies this signature and therefore has signatures with generic types. Javap, on the other hand, sometimes has more information and specifies the type, for example, setting the type to java.lang.Object. Because of this, two signatures exist for the same method, which is the reason for the error. This problem is handled in the thesis by representing the result as a bound instead of an exact value.

Figure 5.3 shows the error for the different projects. It is interesting to reason about why some projects are more sensitive to this error. Jadx has a high error value, a bit over 2.5%, while Activiti only has around 0.3%. One explanation could be that Jadx uses more methods from libraries which require generics. For example, Google commons is one of the dependencies used in Jadx. This library has many methods working with collections, and generally, collections use generics because the type of the items in a collection is usually unknown.

In the experiment, only five different projects were used as test data, which is not a particularly big data set. Because of this, the results of the experiments can only be used to show a pattern and cannot be used to conclude that the hypotheses are correct. The reason for not using more data points was the limited time frame of the thesis; it was not feasible to test more projects, because both the tests themselves and the configuration of every project were time-consuming.

6.4 Future Work

6.4.1 Improve the Code Coverage

The precision of the result the agent yields seems to be correlated with the code coverage. In this thesis, no effort has been made to improve the code coverage. Higher code coverage could be achieved by writing more tests and by using more rigorous methods to force the tests to reach higher code coverage, for example, fuzzing techniques[23]. Hopefully, the increase in code coverage would lead to an even smaller difference between the two methods, and therefore a more accurate result.

6.4.2 Improve the Static Code Analysis

The algorithm used by JTombstone is limited in some scenarios, and the result may benefit from a better approach[6]. First of all, the tool does not perform well when reflection is used[17]. Secondly, another graph construction algorithm may improve the result. One example could be to use an RTA approach, and a starting point could be to take a closer look at the WALA1 framework. The framework has implemented some useful algorithms for this purpose.

1http://wala.sourceforge.net/wiki/index.php/Main_Page

6.4.3 Analyse the Functionality of the Code

One purpose of this thesis was to get a better understanding of the dependency graph. The focus of this thesis was to discover which parts of the dependencies are dead and alive, and two different methods were compared. The next natural step is to analyse the functionality of the dead code. Is it possible to categorise the code into different groups based on functionality? This information could help the developer make better decisions about whether a library is necessary or not. Maybe the developer can find a smaller library which fulfils the requirements, or even implement the code without using a third-party library.

6.5 Ethical Considerations

Ethical aspects are something everyone who works with research must consider when planning and designing their research. Which test data is used is one important aspect, for example, whether one is allowed to use the data in the research. In this project, the licenses of the software projects were considered: all projects included in the test data are licensed under either the Apache License 2.0[24] or the BSD 3-Clause license[25], and both of these licenses allow the projects to be used as test data.

Chapter 7

Conclusions

This is the last chapter of the thesis, and the conclusions based on the experiments are presented here. Because the data set in the experiment was limited, it is not possible to draw any general conclusions. However, some findings in the experiments aligned well with the theory. First of all, the hypothesis which states that the static method yields a higher number of alive methods compared to the dynamic method is supported by the results of the experiments in this thesis. The second hypothesis suggested that the difference between the two methods decreases while the code coverage increases; according to the findings in this report, this is also true. Lastly, regarding the research question: according to the findings in this thesis, it is not possible to tell which approach is the best. A suitable method is to use both approaches and estimate a bound instead of an exact value. However, the code coverage of the tests is essential and can minimise the size of the bound.

Bibliography

[1] Lars Heinemann et al. "On the Extent and Nature of Software Reuse in Open Source Java Projects". In: Top Productivity through Software Reuse. Ed. by Klaus Schmid. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 207–222. isbn: 978-3-642-21347-2.

[2] Simone Romano et al. "A Graph-based Approach to Detect Unreachable Methods in Java Software". In: Proceedings of the 31st Annual ACM Symposium on Applied Computing. SAC '16. Pisa, Italy: ACM, 2016, pp. 1538–1541. isbn: 978-1-4503-3739-7. doi: 10.1145/2851613.2851968. url: http://doi.acm.org.focus.lib.kth.se/10.1145/2851613.2851968.

[3] Robert C. Martin. Clean code: a handbook of agile software craftsmanship. 1st ed. Prentice Hall PTR, 2008.

[4] Saumya K. Debray et al. "Compiler Techniques for Code Compaction". In: ACM Trans. Program. Lang. Syst. 22.2 (Mar. 2000), pp. 378–415. issn: 0164-0925. doi: 10.1145/349214.349233. url: http://doi.acm.org.focus.lib.kth.se/10.1145/349214.349233.

[5] Martin Abadi and Luca Cardelli. A Theory of Objects (Monographs in Computer Science). 1st ed. Springer, 1996.

[6] S. Romano and G. Scanniello. "DUM-Tool". In: 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME). Sept. 2015, pp. 339–341. doi: 10.1109/ICSM.2015.7332484.

[7] Frank Tip and Jens Palsberg. "Scalable Propagation-based Call Graph Construction Algorithms". In: SIGPLAN Not. 35.10 (Oct. 2000), pp. 281–293. issn: 0362-1340. doi: 10.1145/354222.353190. url: http://doi.acm.org.focus.lib.kth.se/10.1145/354222.353190.

[8] David Grove et al. "Call Graph Construction in Object-oriented Languages". In: SIGPLAN Not. 32.10 (Oct. 1997), pp. 108–124. issn: 0362-1340. doi: 10.1145/263700.264352. url: http://doi.acm.org.focus.lib.kth.se/10.1145/263700.264352.

[9] Shekhar Gulati and Rahul Sharma. Java unit testing with JUnit 5: test driven development with JUnit 5. Apress, 2017.

[10] Atanas Rountev, Scott Kagan, and Michael Gibas. "Static and Dynamic Analysis of Call Chains in Java". In: SIGSOFT Softw. Eng. Notes 29.4 (July 2004), pp. 1–11. issn: 0163-5948. doi: 10.1145/1013886.1007514. url: http://doi.acm.org.focus.lib.kth.se/10.1145/1013886.1007514.

[11] Jeffrey Dean, David Grove, and Craig Chambers. "Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis". In: ECOOP'95 — Object-Oriented Programming, 9th European Conference, Åarhus, Denmark, August 7–11, 1995. Ed. by Mario Tokoro and Remo Pareschi. Berlin, Heidelberg: Springer Berlin Heidelberg, 1995, pp. 77–101. isbn: 978-3-540-49538-3.

[12] David Francis Bacon. "Fast and Effective Optimization of Statically Typed Object-oriented Languages". AAI9828589. PhD thesis. 1997. isbn: 0-591-81143-X.

[13] Michael Reif et al. "Call Graph Construction for Java Libraries". In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. FSE 2016. Seattle, WA, USA: ACM, 2016, pp. 474–486. isbn: 978-1-4503-4218-6. doi: 10.1145/2950290.2950312. url: http://doi.acm.org.focus.lib.kth.se/10.1145/2950290.2950312.

[14] H. Boomsma, B. V. Hostnet, and H. Gross. "Dead code elimination for web systems written in PHP: Lessons learned from an industry case". In: 2012 28th IEEE International Conference on Software Maintenance (ICSM). Sept. 2012, pp. 511–515. doi: 10.1109/ICSM.2012.6405314.

[15] S. Romano and G. Scanniello. "Exploring the Use of Rapid Type Analysis for Detecting the Dead Method Smell in Java Code". In: 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). Aug. 2018, pp. 167–174. doi: 10.1109/SEAA.2018.00035.

[16] Oracle. Javap. url: https://docs.oracle.com/javase/8/docs/technotes/tools/windows/javap.html#BEHDBJHJ (visited on 04/09/2019).

[17] JTombstone. JTombstone Limitations. url: http://jtombstone.sourceforge.net/limitations.html (visited on 04/08/2019).

[18] Peter Verhas. Java 9 Programming By Example. 1st ed. Packt Publishing Ltd., 2017.

[19] Shigeru Chiba. Javassist. url: http://www.javassist.org (visited on 04/09/2019).

[20] Otto Bretscher. Linear Algebra With Applications. 3rd ed. Prentice Hall, 1995.

[21] Tim Berglund. Gradle Beyond the Basics. 1st ed. O'Reilly Media, 2013. isbn: 9781449304676.

[22] Maurice Naftalin and Philip Wadler. Java Generics and Collections. 1st ed. O'Reilly Media, 2006. isbn: 9780596527754.

[23] Ari Takanen, Jared DeMott, and Charlie Miller. Fuzzing for Software Security Testing and Quality Assurance. Artech House, 2008. isbn: 9781596932142.

[24] The Apache Software Foundation. Apache License, Version 2.0. url: https://www.apache.org/licenses/LICENSE-2.0.html (visited on 05/28/2019).

[25] GNU. Various Licenses and Comments about Them. url: https://www.gnu.org/licenses/license-list.html#ModifiedBSD (visited on 05/28/2019).

TRITA-EECS-EX-2019:515
