DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Quantitative Analysis of Exploration Schedules for Symbolic Execution

CHRISTOPH KAISER

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Master in Computer Science
Date: August 21, 2017
Supervisor: Cyrille Artho
Examiner: Mads Dam
Swedish title: Kvantitativ analys av utforskningsscheman för Symbolisk Exekvering

Abstract

Due to complexity in software, manual testing is not enough to cover all of its relevant behaviours. A different approach to this problem is Symbolic Execution. Symbolic Execution is a software testing technique that tests all possible inputs of a program in the hope of finding all bugs. Due to the often exponential increase in possible program paths, Symbolic Execution usually cannot test a program exhaustively. To nevertheless cover the most important or error-prone areas of a program, search strategies that prioritize these areas are used. Such search strategies navigate the program execution tree, analysing which paths seem interesting enough to execute and which to prune. These strategies are typically grouped into two categories: general purpose searchers, with no specific target but the aim to cover the whole program, and targeted searchers, which can be directed towards specific areas of interest. To analyse how different searching strategies in Symbolic Execution affect the finding of errors and how they can be combined to improve the general outcome, the experiments conducted consist of several different searchers and combinations of them, each run on the same set of test targets. This set of test targets contains, amongst others, one of the most heavily tested sets of open source tools, the GNU Coreutils. With these, the different strategies are compared in distinct categories such as the total number of errors found or the percentage of covered code. The results of this thesis show the potential of targeted searchers, with an example implementation of the Pathscore-Relevance strategy. Further, the results obtained from the conducted experiments endorse the use of combinations of search strategies. It is also shown that even simple combinations of strategies can be highly effective. For example, interleaving strategies can provide good results even if the underlying searchers might not perform well by themselves.

Sammanfattning

På grund av programvarukomplexitet är manuell testning inte tillräcklig för att täcka alla relevanta beteenden av programvaror. Ett annat tillvägagångssätt till detta problem är Symbolisk Exekvering (Symbolic Execution). Symbolisk Exekvering är en mjukvarutestningsteknik som testar alla möjliga inmatningar i ett program i hopp om att hitta alla buggar. På grund av den ofta exponentiella ökningen i möjliga programsökvägar kan Symbolisk Exekvering vanligen inte uttömmande testa ett program. För att ändå täcka de viktigaste eller felbenägna områdena i ett program, används sökstrategier som prioriterar dessa områden. Sådana sökstrategier navigerar i programexekveringsträdet genom att analysera vilka sökvägar som verkar intressanta nog att utföra och vilka att beskära. Dessa strategier grupperas vanligtvis i två kategorier, sökare med allmänt syfte, utan något specifikt mål förutom att täcka hela programmet, och riktade sökare som kan riktas mot specifika intresseområden. För att analysera hur olika sökstrategier i Symbolisk Exekvering påverkar upptäckandet av fel och hur de kan kombineras för att förbättra det allmänna utfallet, bestod de experiment som utfördes av flera olika sökare och kombinationer av dem, som alla kördes på samma uppsättning av testmål. Denna uppsättning av testmål innehöll bland annat en av de mest testade uppsättningarna av öppen källkod-verktyg, GNU Coreutils. Med dessa jämfördes de olika strategierna i distinkta kategorier såsom det totala antalet fel som hittats eller procenttalet av täckt kod. Med resultaten från denna avhandling visas potentialen hos riktade sökare, med ett exempel i form av implementeringen av Pathscore-Relevance strategin. Vidare stöder resultaten som erhållits från de utförda experimenten användningen av sökstrategikombinationer. Det visas också att även enkla kombinationer av strategier kan vara mycket effektiva. Interleaving-strategier kan till exempel ge bra resultat även om de underliggande sökarna kanske inte fungerar bra själva.

Contents

1 Introduction
  1.1 Research Question
  1.2 Scope
  1.3 Ethics and sustainability
  1.4 Structure of this thesis

2 Background
  2.1 Symbolic Execution
  2.2 Search Strategies
    2.2.1 Depth-First
    2.2.2 Breadth-First
    2.2.3 Random
    2.2.4 Coverage-Optimized
    2.2.5 Others
  2.3 Meta Strategies
  2.4 KLEE
  2.5 Cluster

3 Methods
  3.1 Pathscore-Relevance
    3.1.1 Path Score
    3.1.2 Component Relevance
    3.1.3 Coalesce Pathscore-Relevance
  3.2 Random-Shuffle-Round-Robin
  3.3 Evaluation
    3.3.1 Metrics
    3.3.2 Test Design
    3.3.3 Evaluation on a Cluster
    3.3.4 Evaluating the Evaluation
  3.4 Test Setup
    3.4.1 Software
    3.4.2 Hardware
    3.4.3 Searchers
    3.4.4 Targets

4 Results
  4.1 Number of Found Errors
  4.2 Time until First Error
  4.3 Coverage
  4.4 Consistency of the Results
  4.5 Quality of Targeting for
  4.6 Cluster vs. Dedicated Machine

5 Related Work
  5.1 Automated Testing Techniques
  5.2 Symbolic Execution
  5.3 Solvers

6 Conclusion
  6.1 Discussion
  6.2 Future Work

Bibliography

A Appendix
  A.1 KLEE Test Arguments
  A.2 Reduction Proof
  A.3 List of GNU Coreutils
  A.4 Results

Chapter 1

Introduction

Software development is a complex process and often results in complex software as well. Since any complex process can easily lead to mistakes, some form of quality assurance is required. This is usually done by testing the product continuously along its development [1]. Most of today's software tests are implemented by the developers themselves. In many of these cases this is done by hand with respect to the intended outcome of a function or even a full program, which has several problems, the most important one being incompleteness. To avoid the exhausting process of writing test cases by hand, mechanisms for automated test case generation already exist. One of these is a method called Symbolic Execution.

Symbolic Execution [2] is a technique in software testing which analyses a given program automatically. To do so, the program's (or function's) inputs are represented as symbolic values and test cases that cover all possible combinations are generated automatically. By these means, a completely and successfully tested piece of code either yields errors for some specific inputs, which can then be used for further debugging, or yields no errors and is therefore proven to be correct with respect to the executed assertions. The last part also makes Symbolic Execution interesting for software verification, although this is mainly of theoretical interest, since analysing all possible paths of a program is hard to achieve in practice. Most of the time it is close to impossible with today's technology in any feasible amount of time. This problem is due to the fact that Symbolic Execution tries every possible path of a given piece of software to test it completely, and the number of paths in a program typically grows exponentially, which is often denoted as the path explosion problem [3]. For practical usage this therefore represents a huge problem when thinking about scalability.

With these problems of an otherwise great process in mind, one can see that the task of finding a good execution sequence is crucial for the practical use of this method. One possible approach, which will be followed in this thesis, is to build or modify the searcher in such a way that the choice of paths to take through the program is improved, and an error is therefore more likely to be found early on.


1.1 Research Question

The main research question this thesis answers is:

What effect do different searching heuristics in Symbolic Execution have on finding errors and how could they be combined or influence each other to improve the general outcome?

Because there exist two typical methods to approach that problem, the question can be broken down further into two distinct categories. The two categories typically followed when searching are directed search, where the goal is to navigate primarily towards a specified area, and general search, which aims to cover as much as possible. The same principles also apply to the automated search heuristics analysed in this thesis, which can be seen as part of either of the categories. A more specific research question results from that:

Are search heuristics of one category strictly better in their purpose than other search heuristics which do not share the same specialisation?

1.2 Scope

The scope of this thesis includes the identification and classification of existing heuristics for exploration strategies (searchers). This mainly focuses on the strategies available in KLEE, but also on prototypes of other strategies implemented within the course of this thesis. Furthermore, the overall evaluation process, including a quantitative analysis of a chosen set of promising searchers executed in a sufficiently large environment, is naturally within the scope of this thesis.

Symbolic Execution still has to deal with a certain number of unsolved problems, like the interaction with the general environment or dealing with parallelism. These problems are clearly out of the scope of this thesis and thus not addressed further.

1.3 Ethics and sustainability

This thesis follows all ethical standards to the best of the knowledge of its author.

Even though the purpose of this thesis has only the best intentions, it cannot be barred from profoundly dishonourable use. That is the case because Symbolic Execution is, generally speaking, an area within software testing and thus has the possibility to affect every software product. Thinking further, this not only allows improving the quality of society and in general making every technical product more secure, but could also improve the reliability of weapon systems. Such assumptions are, however, a bit far-fetched, and even if that were the case, the improvements to society seem more important and the accompanying increase in security is expected to even out possible negative side effects.

Sustainability and the development goals connected to it are formed by three pillars [4, 5]. These are economic development, social development and environmental protection. Out of these three main pillars, the one most related to the work within this thesis is the economic pillar. That is because the results from this thesis are structured to give an insight into the scheduling algorithms used for Symbolic Execution. These are typically used to reduce the execution time required for testing and thus also cause a reduction in the amount of required resources, leading to a more sustainable environment. Within this thesis, social development does not really play a huge role, even though some work defines the economic pillar as a subset of the social one [6]. Finally, environmental protection is covered by the argument that fewer bugs in code subsequently lead to fewer critical events which could otherwise have a huge impact on the environment. Besides that, faster and more effective software testing not only reduces the overall required resources like energy, but also saves material. That positive effect of software testing in general is due to the resulting lower likelihood of damaging components with inappropriate, faulty software, thus reducing the amount of generated waste.

1.4 Structure of this thesis

In Chapter 2, the necessary background for this thesis is explained, including the basics of Symbolic Execution as well as different search strategies that are used later in the evaluation. Chapter 3 presents the method used. The results are then presented in Chapter 4. Related work is presented in Chapter 5. As a conclusion, Chapter 6 discusses the main contributions of this thesis and points out certain highlights together with possible future work.

Chapter 2

Background

Within this chapter the overall background of this thesis' specific field, Symbolic Execution, is explained. It is a method of software testing by program analysis that uses symbolic input values instead of concrete input values and was first introduced in 1976 by King [2]. Further, a special focus is set on the exploration strategies most important for this thesis, which are also referred to as searchers. Additionally, an insight into the practical usage of Symbolic Execution is given by presenting KLEE [7], the Symbolic Execution engine used in this thesis. Finally, in the last section, a basic introduction to the evaluation environment, a cluster, is given.

2.1 Symbolic Execution

The basic idea and ultimate goal of Symbolic Execution is to test every single possible input value of a program to eventually find all bugs within it. This is, sadly, nearly impossible in practice. For example, a program taking a 20-character-long string as input would end up with 256^20 (= 2^160) different possibilities. In practice this means that not even all of the earth's computing power could brute-force such a Symbolic Execution in a lifetime. To put this in perspective, AES [8] has a typical key size of 128, 192 or 256 bit, which leads to 2^128, 2^192 or 2^256 possibilities respectively. It is currently a commonly used algorithm for encryption, which substantiates, based on general security requirements, the assumption that such immense numbers of possibilities are perceived as safe (at least against brute-force attacks, to which Symbolic Execution is basically comparable).
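Spelled out, the arithmetic behind this example (added here for clarity) is:

256^{20} = (2^8)^{20} = 2^{160}, \qquad \frac{2^{160}}{2^{128}} = 2^{32},

so the input space of the 20-character string is already 2^32 times larger than the key space of 128-bit AES.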

Now, instead of executing the program once for every single possible input, Symbolic Execution groups different inputs that show the same behaviour together to reduce the actual number of executions. To do so, all possible variable assignments are tracked in the form of so-called path constraints. Path constraints form a set alongside each path, consisting of all conditions that have to hold at the current depth of the path. In Figure 2.1b an example of how path conditions are structured can be seen next to each transition of the tree. These path constraints are required to keep track of all the different input groups that are formed during the exploration of the program. In Symbolic Execution these inputs are referred to as symbolic inputs, or more generally symbolic values, because the actual value is yet to be determined, which can be seen as the overall goal of Symbolic Execution.

An example of how symbolic input can be utilized within a program can be seen in line 3 of Listing 2.1a. Using symbolic values means that instead of reading a concrete value (e.g. 5) and assigning it to a variable a, it is assigned a symbolic value (e.g. α). Any further usage of the variable is treated as a function over the symbolic value. For example, a multiplication of a by 3, as in line 4 of Listing 2.1a, would change the assignment of a from α to 3 ∗ α. Another special case besides symbolic inputs are conditional statements (i.e. if, while, etc.). Because such conditions can have symbolic values as arguments, they have to be handled separately. This happens when, for example, a condition is reached as in line 5 of Listing 2.1a. In the example the condition checks whether a == 9, but a was, according to the previous example, defined as 3 ∗ α. Recall that α is the symbolic value read in line 3 of Listing 2.1a. So when checking the condition there are two possible ways of evaluating it: true if α assumes the value 3, and false in any other case. To still be able to explore both branches, the program state is forked. In the context of Symbolic Execution the fork is a bit different: even though it is in fact analogous to the more well-known (process-)fork, the target here is not the process. After the fork two individual paths exist, each consisting of a copy of the original program state and the added path constraints. In our example the path constraints would be 3 ∗ α = 9 for the true case and ¬(3 ∗ α = 9) for the false case. Before each further fork the new conditions are checked against the already gathered ones to see which of the branches are still satisfiable.
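To make the forking step more concrete, the following C++ sketch shows one way a state with its path constraints could be represented and forked at a symbolic branch. It is an illustration only; the Expr type, the placeholder solver call and the function names are assumptions and do not correspond to KLEE's actual data structures.

    #include <map>
    #include <string>
    #include <vector>

    // Illustrative sketch only: Expr and the solver call are placeholders.
    using Expr = std::string;

    struct State {
        std::map<std::string, Expr> symbolic;   // e.g. "a" -> "3*alpha"
        std::vector<Expr> pathConstraints;      // conditions collected along this path
    };

    Expr negate(const Expr &e) { return "!(" + e + ")"; }

    // A real engine would hand the conjunction of pc and cond to an SMT solver.
    bool maySatisfy(const std::vector<Expr> &pc, const Expr &cond) {
        (void)pc; (void)cond;
        return true;                            // placeholder answer
    }

    // At a branch on a symbolic condition, keep one copy of the state per feasible outcome.
    std::vector<State> forkAtBranch(const State &s, const Expr &cond) {
        std::vector<State> successors;
        if (maySatisfy(s.pathConstraints, cond)) {
            State t = s;                        // copy of the whole program state
            t.pathConstraints.push_back(cond);  // e.g. "3*alpha == 9"
            successors.push_back(t);
        }
        if (maySatisfy(s.pathConstraints, negate(cond))) {
            State f = s;
            f.pathConstraints.push_back(negate(cond));  // e.g. "!(3*alpha == 9)"
            successors.push_back(f);
        }
        return successors;
    }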

According to the program shown in Listing 2.1a, a Symbolic Execution Tree as in Figure 2.1b can be created. This tree visualizes the set of path constraints after every branching point alongside its edges. A branching point in this case specifies a point in the program that is able to alter the flow of control and thus decide which branch to execute next [2]. For example, the root node presents the condition of the branching point in line 5 of Listing 2.1a. Following this are two paths: one that is displayed in the tree to the left, if 3 ∗ a == 9, and one to the right, if the equation does not hold. From then on, after each branching point the new constraints are added to the set of already existing path constraints individually for each path and checked again when arriving at the next branching point. This process repeats for each path and goes on until a program exit point or a failure is reached.

During this process, Symbolic Execution needs to keep track of all gathered constraints and check at each branching point for further paths. This checking of constraints is done with the help of solvers, because such queries are proven to be NP-complete [9], making them obviously hard to solve. The commonly used standard solvers are typically satisfiability modulo theories (SMT) solvers. Generally speaking, SMT solvers are an extended version of SAT solvers, since they extend the Boolean logic of SAT solvers with first-order logic [10].

With the help of these solvers, Symbolic Execution can reveal under which conditions a path is reachable. This, in combination with the final step of picking a concrete value that follows all such gathered path constraints, leads to specific input values which are necessary to explore that path. Thus it inherently and automatically creates test cases for all such discovered paths. This helps to reproduce found errors, assisting in the development process.

Listing 2.1a:

 1  int main(void) {
 2      int a, b;
 3      a = read_symbolic();
 4      a = 3 * a;
 5      if (a == 9) {
 6          do {
 7              b = read_symbolic();
 8          } while (a != b);
 9      }
10      doStuff();
11      return 0;
12  }

(a) Example of a program using Symbolic Input Values

[Figure 2.1b, partial Symbolic Execution Tree: the root node (path constraints {}) branches at "if (a == 9)". The left edge, labelled {3 ∗ α = 9}, leads to the loop node "while (a != b)"; the right edge, labelled {¬(3 ∗ α = 9)}, leads to "doStuff()". From the loop node, the edge labelled {(3 ∗ α = 9) ∧ ¬(3 ∗ α = β)} leads back to "while (a != b)", while the edge labelled {(3 ∗ α = 9) ∧ (3 ∗ α = β)} leads to "doStuff()".]

(b) Partial Symbolic Execution Tree

Figure 2.1: An example program using Symbolic Input Values in the listing on the left and the Symbolic Execution Tree generated from it on the right, showing the program's search space

In the example tree, shown in Figure 2.1b, the set of path conditions (e.g. {3 ∗ α = 9}) of each path is displayed alongside the transitions from one state to another. An important aspect that has to be kept in mind when looking at the shown example of (3 ∗ a = 9 ⇔ a = 3) is that it is not always valid to cancel out factors (e.g. 4 ∗ a = 8 ⇔ a = 2 does not hold in 32-bit arithmetic). The important point here is that the constant factor has to be coprime to 2^32 in order to qualify for such cancelling. A more detailed proof using SMT queries can be found in Appendix A.1 and A.2.
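To spell out why coprimality matters (a short illustration added here; the thesis' own proof via SMT queries is in the appendix), consider the two equations over 32-bit machine arithmetic:

3\alpha \equiv 9 \pmod{2^{32}} \iff \alpha \equiv 3 \pmod{2^{32}}, \quad \text{since } \gcd(3, 2^{32}) = 1,

4a \equiv 8 \pmod{2^{32}} \iff a \equiv 2 \pmod{2^{30}}, \quad \text{i.e. } a \in \{2,\ 2 + 2^{30},\ 2 + 2^{31},\ 2 + 3 \cdot 2^{30}\},

so cancelling the factor 3 is valid, while cancelling the factor 4 would lose three of the four solutions.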

Symbolic Execution is a sound and, in the case of a finite execution, also complete method of locating all possible paths of a program [11]. This can easily be proven via induction. Looking at the root of the execution tree, no steps have been taken so far, so it is sound and complete in the beginning. From there on, two possible kinds of steps can be taken. The first possibility involves variable access, which requires Symbolic Execution to keep track of every such instruction. Because this requirement is fulfilled, the reasoning still holds. The other possible step is a conditional branching point. At such a point a decision has to be made whether to explore the left or the right path. As explained earlier, at these points queries are sent to and executed by the SMT solver to support the decision. This makes the soundness and completeness of Symbolic Execution dependent on the soundness and completeness of the underlying SMT solver.

One of the major challenges when working with Symbolic Execution is the path explosion problem. It occurs because today's software typically consists of thousands or even millions of lines of code [12] and therefore, presumably, also contains many branching points. For example, the average GNU Coreutil has between two and ten thousand lines of code [13], while more complex software like OpenOffice consists of around 9 million lines of code [14]. Due to the nature of branching points, which open up two new paths, this typically results in a lot of forks, thus letting the state space grow exponentially.

When recalling the previously shown example of the Symbolic Execution Tree in Figure 2.1b, one realizes that the loop containing symbolic values, as shown in lines 6 to 8 of Listing 2.1a, causes the tree to keep branching indefinitely, which results in an infinite state space. Every iteration of the loop requires a fork and a new symbolic value is read, and this keeps going on since there will always exist at least one possible assignment of the variables such that ¬(a = b) holds. The general question of termination when performing dynamic analysis, to which Symbolic Execution also belongs, is a well-known problem, also known as the Halting Problem [15].

Other unsolved problems, like general environment interaction or how to deal with parallelism, are still open modelling problems; they are not relevant for this thesis and will therefore not be explained any further. Real-world applications of course use all of these features, which reduces the applicability of Symbolic Execution to a subset of those applications.

Even though King originally did not define a specific approach for systematically exploring the tree, today search strategies are used to direct the execution towards the most promising paths. This is a necessity because of the rapidly increasing size of the execution tree, which often makes it impossible to cover all of it in reasonable time. Therefore searchers became a big topic in research on Symbolic Execution. This led to experiments with search strategies that had already been proven to be effective in other areas, but also caused the development of brand new ones.

2.2 Search Strategies

Starting with a set of so far unexplored paths, a search strategy is supposed to navigate through them as efficiently as possible. What effective navigation actually looks like depends a lot on the intended purpose. Within this thesis, therefore, the two main categories, consisting of targeted searchers and general purpose searchers, are analysed. In practice each searcher maintains a frontier of nodes that can be explored next. This frontier gets updated after every new exploration by removing the explored node and adding all newly discovered nodes that follow the explored one. Which of these nodes will be explored next is determined by the implemented strategy, ideally by choosing the most promising paths first. This, however, is not necessarily easy to decide, due to the differing definitions of what is promising and what is not. Thus, over the years, a lot of research has introduced novel search strategies aiming to solve this problem. Despite all this work, these strategies often perform very well only in certain areas, but lack efficiency in others. Hence there currently is no such thing as the optimal universal search strategy, which sets the base for this thesis. To create a fundamental understanding of the basic search principles, some common search strategies are briefly explained in the following:
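The following C++ sketch illustrates this generic select-execute-update loop around the frontier. It is not KLEE's actual searcher interface; selectNext and executeOneStep are simplified placeholders.

    #include <algorithm>
    #include <vector>

    struct State { bool terminated = false; };

    // Placeholder strategy: here simply the most recently added state (DFS-like).
    State *selectNext(std::vector<State *> &frontier) { return frontier.back(); }

    // Placeholder engine step: would run the state up to its next branch and
    // return all newly forked states.
    std::vector<State *> executeOneStep(State *s) { s->terminated = true; return {}; }

    void explore(std::vector<State *> frontier) {
        while (!frontier.empty()) {
            State *next = selectNext(frontier);                      // strategy picks a state
            std::vector<State *> discovered = executeOneStep(next);  // run until the next branch
            // Update the frontier: drop the explored node, add the newly discovered ones.
            frontier.erase(std::remove(frontier.begin(), frontier.end(), next), frontier.end());
            frontier.insert(frontier.end(), discovered.begin(), discovered.end());
        }
    }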

2.2.1 Depth-First

One of the longest existing search strategies is Depth-First Search (DFS). Its roots go back as far as the 19th century, when Charles Pierre Trémaux [16] supposedly already worked with a version of the algorithm.

The way DFS works is probably best described with a stack: it always navigates towards the most recently discovered new state, as this is the last one added to the search frontier. As a result, DFS typically reaches depth fast. Since it follows one path completely until reaching an end, DFS can get stuck in loops very easily. This is especially a problem in Symbolic Execution when reading user input in a loop, because it can result in an infinite loop due to the symbolism of the read input. Figure 2.1b shows an example where, under some circumstances, DFS would not be able to make any further progress.

2.2.2 Breadth-First

Originally, Breadth-First Search (BFS) was invented by Moore [17] to find the shortest path out of a maze. In parallel, Lee [18] also discovered this algorithm while trying to find a solution for routing wires on a circuit board.

Opposite to Depth-First Search, this strategy traverses the execution tree slowly, by exploring all states of the same depth before going further into depth. This behaviour can be compared to a queue, which usually also works in a First-In-First-Out (FIFO) manner. Because BFS does not focus on a specific path to follow all the way until the end, it does not show any loop-related problems like DFS. Nevertheless, a downside of BFS for this use case is the large state space in Symbolic Execution due to the path explosion problem. This makes it overall hard for BFS to reach certain depths.

2.2.3 Random

Another well-known approach is randomization. The most popular randomized search is also referred to as Random State Search. It typically chooses the next state to discover completely at random, according to a uniform distribution over all possible next states.

A modified version of the Random State Search is the Random Path Search. In contrast to the former, Random Path Search builds a binary tree which is traversed to determine the next state to discover. Thereby, nodes closer to the root of the execution tree are favoured, since they require fewer coin flips to be won. This change in distribution should increase the possibility of exploring new, maybe more interesting, paths quicker. It also reduces the chances of getting stuck in a loop like DFS, while still having a chance to reach depth quickly.
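A C++ sketch of the coin-flip descent described above is given below, under the simplifying assumption of a binary tree whose leaves hold the pending states; the node layout and names are assumptions, not KLEE's implementation.

    #include <random>

    struct State;                               // opaque execution state

    struct TreeNode {
        TreeNode *left = nullptr;
        TreeNode *right = nullptr;
        State *state = nullptr;                 // non-null only at leaves (pending states)
    };

    // Assumes every leaf holds a pending state.
    State *randomPathSelect(TreeNode *root, std::mt19937 &rng) {
        std::bernoulli_distribution coin(0.5);
        TreeNode *node = root;
        while (node->state == nullptr) {        // descend until a pending state is reached
            if (node->left && node->right)
                node = coin(rng) ? node->left : node->right;   // fair coin flip per branch
            else
                node = node->left ? node->left : node->right;  // single child: follow it
        }
        return node->state;  // a leaf behind d coin flips is picked with probability about 2^-d
    }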

2.2.4 Coverage-Optimized

As one can already tell by the name, this search preferably navigates towards states that have a higher chance of covering new code. The following explanation is based on KLEE's [7] default searcher, but similar methods are presented in MAYHEM [19] and S2E [20].

To determine whether states are likely to cover new code or not, heuristics are used. Such heuristics could be, for example, the minimum distance to a so far uncovered instruction, or whether the state recently covered new code. A combination of these two is part of the current default strategy in KLEE. Some others would be, for example, query cost, exponential depth or an instruction count. Based on such heuristics, weights are assigned to each possible next state. The next state is then chosen randomly, with a probability based on the weight of each state.
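The weighted choice itself can be sketched as follows in C++; how the weights are computed from the heuristics named above is deliberately left out, so this is only a minimal illustration.

    #include <cstddef>
    #include <random>
    #include <vector>

    // Draw the index of the next state with probability proportional to its weight.
    std::size_t selectWeightedIndex(const std::vector<double> &weights, std::mt19937 &rng) {
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        return pick(rng);   // index into the frontier of candidate states
    }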

2.2.5 Others

In addition to the searchers listed above, which are the most relevant for this thesis, many other approaches exist. The most interesting and perhaps most diverse strategies are explained briefly in the following paragraphs:

One of the most classic approaches for a searcher is fitness guidance. A more precise use of such a searcher in the Symbolic Execution area was proposed by Xie et al. [21] with the Fitnex search strategy. Fitnex is a strategy making use of individually calculated fitness values, depending on the current state. This is done by taking the fitness value of the current path (F(path)) and subtracting the fitness gain for a branch (FGain(branch)) from it. Branches with a better (lower) composite fitness value then have priority when discovering new states. It was implemented for Pex [22], an automated structural testing tool developed at Microsoft Research.

Another similar strategy is the subpath-guided search (SGS), developed by Li et al. [23]. The main idea of this strategy is to exploit the length-n subpath program spectra, which comes down to focusing on less travelled parts of the program. This is done by using a frequency distribution of explored length-n subpaths and shall therefore improve the overall coverage and error detection.

Besides focusing on the execution tree, there are other methods aiming at program-specific calls. One example are the heuristics presented by Cha et al. [19], introduced with their Symbolic Execution engine MAYHEM. These heuristics prioritize paths containing symbolism, such as symbolic instruction pointers or symbolic memory accesses, which are supposed to be more likely to contain bugs.

In another tool named AEG [24], two new heuristics were also presented. The first heuristic is called Buggy-Path-First, which is similar to the earlier presented Coverage-Optimized ones, but continues the exploration with strong priority on paths which have already been proven to contain bugs. This rests upon the assumption that humans are likely to repeat mistakes throughout their work. The other heuristic, in contrast to many other heuristics, gives priority to exhausting loops and is thus called Loop Exhaustion. To avoid the problem of getting stuck deep in a loop, preconditioned symbolic execution alongside pruning is used.

2.3 Meta Strategies

As previously stated, most of the presented strategies have a typical domain connected to their strengths, but then fall off in other areas. To improve the general effectiveness of these search strategies, more than one of the strategies mentioned in the previous section can be combined. This opens up a completely new area of research, since such combinations can be done in many different ways. Currently there are not many such strategies, except for very basic ones, which are explained below:

Round-Robin A deterministic and rather simple technique is the concept of Round-Robin. Hereby each search strategy receives a fixed number of executions or even a dedicated timeslot before switching to the next. This Meta Strategy is currently the default choice in KLEE, where it is used to alternate between Random-Path Search and a form of the Coverage-Optimized Search explained earlier.

Random The random meta strategy is very similar to the Random State Search presented in the previous section. The only real difference is that the meta version chooses the next search strategy at random rather than the next state to discover. This is a fairly simple approach, but could help overcome local optima or specific deficits of deterministic searches by combining two or more strategies.

2.4 KLEE

KLEE is an open source Symbolic Execution engine, published in 2008 by Cristian Cadar, Daniel Dunbar and Dawson Engler [7] and written in C++. It operates on bytecode created by the LLVM compiler [25] for GNU C, by directly interpreting the created instruction set and mapping instructions to constraints with bit-level accuracy. These constraints are forwarded to solvers, where they are then checked for satisfiability. As for the state scheduling, KLEE determines at each instruction step which state will be discovered next, according to the chosen heuristic. By default that is Random Path interleaved with Coverage Optimized.

Besides currently only focusing on the language C, KLEE is, at the point this thesis is written, not able to support some other features, for example operations involving symbolic floating point values, threads and assembly code.

Once set up, KLEE itself is rather easy to start and test with. Before the actual KLEE run, the target code first needs to be compiled to bytecode using the LLVM compiler:

llvm-gcc --emit-llvm -c example.c -o example.o

After the bytecode was generated, KLEE can be run. Users can state the number, size and type of used symbolic input arguments by adding parameters to the call:

klee --max-time=3600 example.bc --sym-args 0 2 2 --sym-files 1 8

The very first option, --max-time, specifies the maximum time for which KLEE checks the targeted program. The latter two options describe parameters specifying the symbolic inputs. With the first symbolic input option, --sym-args 0 2 2, the user specifies the use of 0 to 2 symbolic inputs, each with a length of up to 2. The other option, --sym-files 1 8, specifies that, besides the already defined symbolic inputs, 1 file is used, holding symbolic data with a size of 8 bytes.

A more in-depth analysis of all arguments used within this thesis can be found in the Appendix A.1. Additionally, a more detailed explanation of the general usage of KLEE can be found on their respective website 1.

2.5 Cluster

A cluster, in the sense of a computer cluster, is a typically tightly connected set of computers which work together so that they can be viewed as a single system. The most obvious benefit is the greater computational power, because of the typically large number of mounted cores. This especially comes into play for programs or tests that can be scheduled in parallel, since a cluster typically has many nodes which, compared individually, are not necessarily the most powerful. Besides the benefits there also exist a couple of trade-offs one has to deal with. The following paragraphs point out some of the most important aspects when planning an evaluation like the one in this thesis.

The first and most important point concerns other users. One usually does not get a whole cluster for a single project without sharing some resources with others. This obviously leaves an open question about the quality of the results, which will be addressed later on in this thesis.

Another often occurring aspect is the privileges of the test user. Typically, users do not receive root access to the system, which makes setting up the necessary tools or working with them particularly difficult. In most cases a workaround exists, because this is a common problem, but one still has to keep it in mind.

Finally, when running tests on a cluster it is important to keep in mind that it is a distributed environment. Such an often heterogeneous environment requires in most cases a special setup to even be able to run tests. Many clusters make use of workload managers to queue and run commands on their machines. Thus it is important to be aware of the possible parallelism of the test setup and of how to best utilize the given tools.

1 http://klee.github.io/docs/options/

Chapter 3

Methods

Currently, a lot of research focuses on different Symbolic Execution engines and the development of search strategies which typically aim to achieve the best possible results single-handedly. Most of these searchers are either tuned for very specific use cases, but tend to leave out some parts of the program due to this specialisation, or are more general, exploring a lot of different paths and thus not necessarily good at finding deep errors in certain areas. Because of these often distinct approaches, an analysis of whether these assumptions really hold seems interesting. Additionally, the question arises whether a combination of such rather different approaches could be of any use.

The focus of this thesis therefore is to analyse what effects different search strategies in Symbolic Execution have on finding errors and how they could be combined or influence each other to improve the general outcome.

Before going into detail and presenting the test environment in section 3.3, one major search strategy, Pathscore-Relevance, which was implemented during the course of this thesis, as well as a novel approach for a meta searcher combining different strategies, will be explained.

3.1 Pathscore-Relevance

Besides the already existing searchers, another promising strategy was implemented during the course of this thesis. This search strategy is a special form of a weighted strategy and was initially presented by Andrica and Candea [26]. The original thought behind this strategy was to present a new metric to combine two usually different perspectives of a program.

The first perspective defines a component by its relevance, which can be represented as either the developer's or the end user's view of the program. This aspect is supposed to make sure that all interesting, critical or often used paths are prioritized, for obvious reasons. The other aspect is a static analysis of the program's execution paths, which is represented by the so-called path score.


3.1.1 Path Score

The path score weighs execution paths inversely to their length, thus encouraging the execution of shorter paths before longer ones. The underlying concept behind this strategy is that many longer paths are a combination of partially overlapping shorter paths anyway. This should help to reduce test redundancy, thus improving the overall efficiency. A possible formula following this strategy, as presented by the authors, could be denoted as:

P(c) = \sum_{L=1}^{\max} \frac{1}{2^L} \cdot \frac{pathsExercised_L(c)}{totalPathsExercised_L(c)}    (3.1)

In Formula 3.1, pathsExercised_L(c) stands for the number of exercised paths of length L in c, while totalPathsExercised_L(c) is the number of all possible execution paths of length L. In theory the length L ranges from 1 to max, but when executing according to this formula the factor of 1/2^L drastically limits the impact of longer paths. Therefore any paths of length L > 8 (1/2^8 < 0.4%) are unlikely to improve P in any significant way. Since this function has a very aggressive decay, it is possible to reach an overall value plateau before running out of resources. For a more extensive search, and to counter such an early plateau, the function can be changed to any slower (but still monotonically) decreasing function. It is also important that \sum_{l=1}^{\infty} f(l) = 1 still holds.
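One slower-decaying but still monotonically decreasing choice that keeps this property (an illustrative example added here, not taken from the original paper) is:

f(l) = \frac{1}{l(l+1)}, \qquad \sum_{l=1}^{\infty} \frac{1}{l(l+1)} = \sum_{l=1}^{\infty}\left(\frac{1}{l} - \frac{1}{l+1}\right) = 1 .

This function decays polynomially rather than exponentially, so longer paths retain noticeably more weight while the sum still telescopes to one.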

3.1.2 Component Relevance

The second part, the relevance, is determined by calculating the importance from a certain perspective (or several). This should therefore target more important functions, hence lead to better coverage around critical or often executed areas. As mentioned before, this could be determined implicitly by the end-users, which would lead to calculation based on e.g. the usage of certain parts of the program or set explicitly based on feedback.

The other perspective is the rating of a component's relevance based on the developer's point of view. Thereby the code itself is explicitly marked regarding different levels of interest (e.g. critical, standard, low, . . . ). Another possible approach from the developer's perspective could be to take time-related data like recent changes or updates into account. One example is a point system working with different levels of importance, leading to a formula like this:

R(c) = \frac{pointSum(c)}{\max_{c_i \in P} pointSum(c_i)}    (3.2)

Formula 3.2 makes sure to capture the relative importance of all components in a program. With that, it is ensured that the search navigates towards the more interesting parts of the program. In this case, pointSum can for example be based on new changes and gathered as pointSum = \sum_{l \in lines} newCode_l, or, with a (not-)important flag, as pointSum = \sum_{l \in lines} flagged_l.

A similar formula can be used when making use of the end users' usage information, except that the pointSum property would be usage-related (e.g. NumTimesExecuted).

3.1.3 Coalesce Pathscore-Relevance

The combined Pathscore-Relevance rating is built such that the higher the relevance of a component, the higher its path score must be in order to compete with other, less relevant components. This is achieved by combining the two parts, path score and relevance, to calculate the final weight of each component, as seen in Formula 3.3, and ranking all paths according to it.

PR(c) = \frac{P(c)}{R(c)}    (3.3)
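As an illustration of this competition (with made-up numbers, not taken from the source): under this definition a component with relevance R = 0.9 needs a path score of P = 0.6 to reach the same rating as a component with R = 0.3 and P = 0.2, since

\frac{0.6}{0.9} = \frac{0.2}{0.3} \approx 0.67 .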

The goal is to maximize the overall PR over all components (\sum_{c_i \in P} PR(c_i)). This can be achieved with a rather naive greedy approach, which guides the efforts primarily towards components with high relevance in order to improve their path score.

Even though Pathscore-Relevance is an innovative and quite promising approach, it of course also has some possible drawbacks, due to the way longer paths become gradually less likely to be visited. This can result in a problem if most of the bugs tend to lie on long paths within a component. The deliberation about whether most resources should be spent on finding a very deep bug or on continuing to search for more bugs in shallower paths is fully up to the component's relevance. This means that the overall quality of this searcher depends a lot on the quality of the relevance. If the developers misinterpret some important sections or simply ignore possibly error-prone parts of the program, the strategy will most likely miss out on some bugs, if they are not shallow enough to be caught by the general maximization of P.

Nevertheless, most of these problems heavily depend on human interaction and can be worked around. The strategy also provides the advantage that both aspects, the developer's estimates as well as the way customers use the system, have an impact on its decisions, in order to maximize its effectiveness. To do so it prioritizes execution based on usage information and potentially unstable or critical code.

3.2 Random-Shuffle-Round-Robin

In addition to the rather basic meta strategies introduced in the background chapter, another strategy combining more than one searcher has been developed during the course of this thesis. It was developed as a prototype not only to test the combination of different search strategies, but also to introduce a principle of scheduling said searchers in a reactive manner, based on some runtime information. For the sake of this thesis, newly discovered states, further referred to as coverage, were tracked as an example of such runtime information.

This strategy mirrors the behaviour of the basic Round-Robin strategy by default. While executing all queued searchers in this fashion, the generated coverage is tracked. This is necessary to base further decisions on the information generated during the run. If the search reaches a point where the searcher cannot cover any new code within a pre-specified number of steps, a random shuffle is executed. This random shuffle can be seen as one step of the Random State Search described in the previous chapter. The goal of this meta strategy is to combine the benefits of chaining specific search strategies after each other with the advantages gained by targeted randomness at otherwise occurring weak points.
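A C++ sketch of the described behaviour is given below. It is only a prototype-style illustration of the idea (round-robin with a stall counter that triggers one random pick); the class and member names are assumptions and do not reflect the actual implementation.

    #include <cstddef>
    #include <random>
    #include <vector>

    struct State;                                       // opaque execution state

    struct Searcher {
        virtual ~Searcher() = default;
        virtual State *selectNext(std::vector<State *> &frontier) = 0;
    };

    class RandomShuffleRoundRobin {
        std::vector<Searcher *> searchers;              // underlying strategies, used in turn
        std::size_t current = 0;                        // whose turn it is
        std::size_t stallSteps = 0;                     // steps without newly covered code
        std::size_t stallLimit;                         // pre-specified threshold for the shuffle
        std::mt19937 rng;
    public:
        RandomShuffleRoundRobin(std::vector<Searcher *> s, std::size_t limit)
            : searchers(std::move(s)), stallLimit(limit) {}

        // Assumes a non-empty frontier and at least one underlying searcher.
        State *selectNext(std::vector<State *> &frontier, bool coveredNewCode) {
            stallSteps = coveredNewCode ? 0 : stallSteps + 1;
            if (stallSteps >= stallLimit) {             // no progress: one Random State step
                stallSteps = 0;
                std::uniform_int_distribution<std::size_t> pick(0, frontier.size() - 1);
                return frontier[pick(rng)];
            }
            current = (current + 1) % searchers.size(); // otherwise plain round-robin
            return searchers[current]->selectNext(frontier);
        }
    };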

The shown example of tracking coverage is of course only one of many possible quality measurement methods and was used in this thesis as a proof of concept. Furthermore, this opens up the possibility of extending the meta search with nearly any other heuristic. Some fitting examples could be the query cost or the expected coverage, as mentioned earlier.

3.3 Evaluation

The goal of this thesis is to analyse what effects different search strategies in Symbolic Execution have on finding errors and how they can be combined or influence each other to improve their effectiveness. To measure such effectiveness, searchers can, as presented before, be divided into two groups, each having its own qualities. The first category are targeted searchers, which, as the name already tells, focus on a specific target that is supposed to be reached and explored quickly. The second category consists of so-called general purpose searchers, which typically explore the program more broadly to cover as much as possible, but are therefore not really able to explore long single paths to a deep extent.

A good targeted searcher by these means is supposed to navigate as directly as possible to the defined area within the code and explore that area quickly. Such a searcher typically requires at least some previous knowledge about the program to do well. If no such knowledge exists, these types of searchers are not as applicable, because the targeted area might be completely off, and the searcher would therefore spend too much time on a too deep analysis of a possibly uninteresting area. For such a case, on the other hand, there are general purpose searchers. The grade of a general purpose searcher is defined by how well it can cover as much of the program as possible in a short time. Thus it aims to execute all lines of the program at least once quickly, with the goal of reducing the chance of errors due to completely unexplored areas.

3.3.1 Metrics

Because of the quite different aspects denoted as effective for the two types, different metrics will also be used to classify them and their performance. To fit their supposed goals as well as possible, these metrics are as follows, displayed without any particular order:

• Time until first error
• Number of found errors
• Coverage
• Consistency

The first metric, time until first error, seems especially fitting for targeted searchers, since their purpose is to get quickly to an estimated error-prone area and explore it quickly; they are thus supposed to find possible errors quickly, if the chosen area indeed houses at least one. The number of found errors is a globally interesting factor, especially for practical use. An important note for that metric, though, is that these values can depend a lot on the test targets, since some could favour one specific type of searcher through the layout of their errors. Errors spread more widely throughout a program of course favour general purpose searchers more than programs that have all of their errors bunched in one specific area. Of course these two metrics only make sense in practice when tested on targets with already known errors to look for. This was taken care of by choosing a good and already well tested target set, as presented in section 3.4.4.

The next metric, coverage, was chosen because it gives a good estimate of how much of the program was visited at least once by a searcher, which is, as stated above, a quality label of general purpose searchers. The generated coverage will be measured as executable line coverage, reported by gcov [27]. This is surely a rather conservative measurement, but was chosen because it is widely understood, easy to measure and, due to its simplicity, fairly uncontroversial. Finally, the overall consistency of these results needs to be highlighted to figure out which of the tested searchers can provide high-quality results, aiming to eliminate one-hit wonders. This is necessary because it hardly seems acceptable to have to run several iterations of a searcher to find some errors, when others are usually able to find them in every iteration.

3.3.2 Test Design

To evaluate such a complex problem, a specific setup is required. In this case the environment needs to be even more special, because the number of different possible factors for the experiment is rather large and each run requires a lot of resources (i.e. computing power).

Such a test setup with two or more different factors, each with discrete possible values that are experimented with in every possible combination, is also known as Factorial Design [28]. The typical case of a Factorial Design, practised by the vast majority of factorial experiments, uses only two different levels for each factor, which is then called 2^k Factorial Design [28]. Already in the early 20th century, Ronald Fisher argued that such complex designs are more efficient than studying one factor at a time [29].

The factorial factors for this thesis are the distinct search strategies and all combinations of them, tested on various different targets. This obviously makes the number of tests that need to be executed rise quickly. In addition, each of these test runs has to be performed multiple times to create statistical significance. This is especially important because, as was presented in the background, most of the searchers rely on randomness to at least some extent. In the end the mean of all these results for each searcher is calculated. Since just the mean of such an evaluation does not contain enough information to be significant for the later analysis, the respective confidence interval is also calculated.

This can be done because of the Central Limit Theorem [30]. The Central Limit Theorem basically states that the sum of independent identically distributed random variables tends, for a sufficiently large number of summands, towards a normal distribution. Such independent and identically distributed random variables are generated as results from the experiments in this thesis by construction. Additionally, if all parts of a sum are independent, the sum is too, and division by a constant factor does not change these properties. Therefore the calculated mean is also independent and follows, approximately, a normal distribution. An even looser version of the CLT was given by Aleksandr Mikhailovich Lyapunov [31], stating that it is not even necessary to have identically distributed, but still independent, values.
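For completeness, the usual two-sided confidence interval for such a sample mean (a standard textbook formula, added here; the thesis does not spell it out at this point) is, for n runs with sample mean \bar{x}, sample standard deviation s and Student's t quantile:

\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}} .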

As mentioned before, to make the analysis possible a sufficiently large number of iterations is required. In a perfect world, without any constrictions and resource limitations, the experiment would basically run close to forever. This would result in ongoing iterations until the value of the sample mean of the measured outcomes for each metric had fully converged towards the population mean, therefore representing the final result of the evaluation. It would also include every single possible searcher plus all possible combinations of them, with every target that can be found. Obviously this prospect is utopian and nowhere close to any doable evaluation.

In reality the number of iterations can already be seen as sufficient with around 20 to 30 runs per searcher configuration and test target. The final lists of chosen searchers (3.4.3) and test targets (3.4.4) are displayed and explained further below.

Even the above proposed and more realistic approach for the evaluation would exceed the time scope of this thesis. A rather typical personal computer these days has about four cores, which would allow the execution to be parallelised, thus saving time. The total load of an estimated 30,000 hours (10 searchers, 30 iterations, 100 hours of test targets) could therefore be split across these four cores, still requiring a runtime of roughly 7,500 hours in total. That is of course only the number in a perfect case with the personal computer being used to its full capacity all the time. In practice this was not applicable. Therefore the time spent for this evaluation needed to be reduced. Two possible methods to reduce the time taken came up:

1. Reduce the number of experiments (i.e. combinations, targets, runs)

2. Increase the amount of resources (i.e. core hours)

Since a further reduction of either combinations of searchers, testing targets or total runs would also mean a reduction in the quality of this evaluation, the obvious choice was to aim for the second method. To follow this method, two common solutions exist: either the use of shared computer processing resources, called cloud computing [32, 33], or the use of a tightly connected set of computing nodes, also known as a cluster [34]. In the case of this thesis the chosen hardware solution for running the evaluation is a cluster. This choice was made because the university was able to provide access to one without charging a fee.

3.3.3 Evaluation on a Cluster

One major concern when working with clusters, though, is the often old, outdated software or scattered, maybe even incomplete, toolchains. This especially becomes a problem when working with software that has a very specific set of dependencies. In the case of this thesis such a rather strict piece of software is KLEE. A detailed installation guide for KLEE can be found on the official KLEE website 1.

Within the course of creating a suitable environment for the evaluation, one goal was to create an easy-to-migrate solution, so that the same evaluation can preferably be run on several different machines and not be too attached to the currently chosen system. The thought behind that was not only backup reasons, but also to make the found results easily repeatable, which is sadly not too common within research. For example, the original plan for the evaluation also included KITE [35] and combinations of it with other searchers, but even with the help of the developer it could not be made to run on the provided machines. Part of the problem was an incomplete setup instruction.

One choice to make sure that a specific setup of tools or a complete toolchain also runs on different hardware is Docker [36]. Docker is an open platform for developers to build, ship and run various types of applications. For the purposes of this thesis Docker seemed perfect, due to the common difficulties in the KLEE setup and an already existing image free to use 2. Sadly, because of security issues through which one could hijack the host system, Docker is usually not standard software on clusters like the one used for the evaluation and could also not be made available. Therefore KLEE had to be set up differently for the use on a cluster.

To assist in setup processes, clusters often provide a module system which allows their users to load programs in different versions dynamically. This, of course, is not a reliable source for reproducibility, since it is very likely that different clusters use different module systems with more or less changing versions. Thus the complete setup had to be done manually to figure out pitfalls and general difficulties, which are explained below.

Macro problems when compiling KLEE

Listing 3.1: Error showing an undefined occurrence of PRIu64

MemoryManager.cpp:99:43: error: expected ')' before 'PRIu64'
    klee_warning_once(0, "Large alloc: %" PRIu64

Listing 3.1 shows, for the chosen setup, an undefined macro called PRIu64. It is typically used as a string replacement technique for printing uint64_t values. Such problems can happen because some header further up in the include chain already pulls in the necessary <inttypes.h> before __STDC_FORMAT_MACROS is defined. An alternative cause could also be an outdated GCC version (a minimum of GCC 4.4 is suggested). There exist two possible solutions to fix this kind of problem.

A more general but fundamental fix would be to make sure that the definition of __STDC_FORMAT_MACROS happens before doing any includes. This change, however, requires studying the include chain, which of course can be a bit tricky.

1 http://klee.github.io/build-llvm34/
2 https://hub.docker.com/r/klee/klee/
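A minimal C++ sketch of that first fix (assuming the standard <cinttypes> format macros; not taken from the KLEE sources):

    // Define the format-macro guard before the very first include,
    // so that the PRIu64 macro is actually exposed.
    #define __STDC_FORMAT_MACROS
    #include <cinttypes>
    #include <cstdint>
    #include <cstdio>

    int main() {
        uint64_t bytes = 1ULL << 32;
        std::printf("Large alloc: %" PRIu64 " bytes\n", bytes);
        return 0;
    }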

For the setup on the cluster a simpler method was chosen. The same can also be achieved by replacing all the occurrences of PRIu64 with "llu". This signals the insertion of an unsigned long long, which can basically be seen as equivalent to uint64_t.

Versioning

Generally, a large problem during the manual setup of a number of tools are the different versions, and the dependencies each of the tools has itself do not simplify this. A good approach is to start from the final tool, in this case KLEE, and work towards the front of the dependency chain. This gives a higher chance of meeting all the criteria than starting somewhere in between. Additionally, it is recommended to compile all the manually built tools with the same compiler, if possible.

Libraries

Listing 3.2: Error showing absence of libcap

/usr/bin/ld: cannot find -lcap
collect2: error: ld returned 1 exit status

Another often occurring problem involves libraries. One example of such a problem is shown in Listing 3.2, where a commonly used library, libcap, could not be found.

Listing 3.3: Error showing conflicts with libcap version

cap_file.c: In function 'cap_get_fd':
cap_file.c:199:33: error: 'XATTR_NAME_CAPS' undeclared (first use in this function)
    sizeofcaps = fgetxattr(fildes, XATTR_NAME_CAPS,

Besides the total absence of, or inability to find, a library, compilation or version problems can also occur. The example in Listing 3.3 shows an error during the KLEE compilation, which apparently referenced the wrong version of the library.

3.3.4 Evaluating the Evaluation

Due to the earlier mentioned problems of a multi-user environment such as a cluster, the question arises how effective and how meaningful the obtained results actually are. To address these concerns, a smaller test sample, with a setup as close to the original as possible, was prepared on a separate, dedicated server. This separate setup is supposed to evaluate the effectiveness of running such tests on a cluster. This of course conflicts to some extent with the argument for choosing a cluster to reduce the required time, but to still stay within the time constraints of this thesis, the target set was reduced and fewer runs were executed.

With the use of a cluster as well as a dedicated server, both common and economical architectures for larger test environments are tested. This was important when planning the environment of the evaluation, because it means that the applicability at a large, but still realistic and practical, scale is measured.

Tool              Version
GCC               5.4.0
Python            2.7.12
LLVM              3.4.2
MiniSat           2.2.0
STP               2.1.2
uclibc & POSIX    klee_uclibc_v1.0.0
Z3                4.5.0
KLEE              1.3.0

Table 3.1: Tools in use for the evaluation, based on KLEE

3.4 Test Setup

Within this section all parts that contributed to the evaluation are listed. This is supposed to give a reader not only an insight into the dimension of such a project, but also to make a repetition of the experiments possible.

3.4.1 Software

To begin with, a list of the used software is given in Table 3.1, combined with the versions that were used during the experiments. It presents the most important tools required for this evaluation. The list is not exhaustive and only focuses on the specific needs of KLEE on the tested machines. More basic tools like build-essential or curl are also required, but these are usually expected to be present anyway.

3.4.2 Hardware

As the main test environment, all tests were run on a cluster. That cluster is named Hebbe [37] and was provided by the Chalmers University of Technology. It is built on Intel Xeon E5-2650v3 ("Haswell") CPUs and consists of 315 compute nodes, summing up to a total of 6300 cores, with at least 64 GiB of RAM per node, resulting in a total of 26 TiB of RAM. A more detailed list of the specifications can be found on the Hebbe Hardware website 3.

3 http://www.c3se.chalmers.se/index.php/Hardware_Hebbe

To evaluate the capability of running such evaluations on a cluster, additional runs on dedicated machines, provided by the Chair of Communication and Distributed Systems at RWTH Aachen University [38], were executed. Such a dedicated machine is built on two Intel Xeon E5-2643v4 ("Broadwell") CPUs, adding up to a total of 12 cores, with a total of 256 GiB of RAM.

Mnemonic    Searcher                                              Meta Strategy
COVNEW      Coverage Optimized: Cover New                         —
DEFAULT     Random Path, Coverage Optimized: Cover New            Round Robin
DFS         Depth-First Search                                    —
PR          Pathscore-Relevance                                   —
PRCOVNEW    Pathscore-Relevance, Coverage Optimized: Cover New    Round Robin
DFSBFS      DFS, BFS                                              Random-Shuffle-Round-Robin
QCMDU       QueryCost, Minimum Distance to Uncovered              Random-Shuffle-Round-Robin
RND         Random State Search                                   —
TRIPLE      Pathscore-Relevance, Coverage Optimized: Cover New,   Round Robin
            QueryCost

Table 3.2: List of all searchers (and combinations) included in the evaluation

Since the configuration of these machines can of course also impact certain parts of the evaluation, it is important to note that both of the chosen environments, the cluster as well as the dedicated server, do not allow Hyper-Threading 4, but make use of the Turbo Boost Technology 5.

4 http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html
5 http://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-technology.html

3.4.3 Searchers

As described in the previous section, it was not possible to evaluate all presented searchers, including all of their possible combinations. In Table 3.2, the set of chosen searchers, including several combinations, is presented. The choice to limit the set to these was made either because they seemed promising due to previous work, like the different versions of Pathscore-Relevance, or because they are already well known, like DFS and BFS, or represent the current standard, like the KLEE default. To give a baseline to evaluate against, the naive random searcher was also included in the evaluation.

3.4.4 Targets

To create a well-balanced evaluation suite, the main target was the GNU Coreutils [13]. Over the last decades they have become the single most heavily tested set of open-source programs and a rather typical test target for any kind of work related to software testing. In addition, the GNU Coreutils provide a complete public error-tracking system, which allowed setting up a specific evaluation that tests the error detection rate of all featured searchers. To make the generated results easier to compare with related work, version 6.10, as in the classic KLEE evaluation [7], was used. The total test set of all GNU Coreutils used for the evaluation contains 88 stand-alone programs of the 101 officially listed in the most recent version. These include programs from all different categories, like file utilities, text utilities and shell utilities. Only utilities that function as wrapper calls to others, like arch (the same as uname -m), are left out, so the selection was not made with a preference for any of the tested searchers.

In addition, to also cover the most recent version at the time of this thesis, a subset of the GNU Coreutils in version 8.27 was used for evaluation. This aims to check whether the conclusions drawn for the older version also hold in the most recent one. A complete list of all tested GNU Coreutils can be found in Appendix A.2.

Another test subject chosen for the evaluation are benchmark verification tasks taken from the Competition on Software Verification at TACAS'17, part of the European Joint Conferences on Theory & Practice of Software [39]. The specific set of benchmarks, taken from the 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), consists of only a selected subset of the competition's tasks, since the overall competition targets a very wide spread of programs. This selected set contains three major tasks from the ReachSafety module, namely ReachSafety-Arrays, ReachSafety-ControlFlow and ReachSafety-Loops.

Chapter 4

Results

This chapter presents the results obtained from the experiments conducted during this thesis. All tested searchers and their combinations are analysed based on their effectiveness regarding the test targets presented in section 3.4.4, according to the four mentioned metrics. This includes the main test target, the GNU Coreutils 6.10, and comparisons with the current version 8.27. In addition to the chosen metrics, the impact of knowledge on a targeted searcher was also tested, with different settings for Pathscore-Relevance, to see how a well chosen target differs from a poorly chosen or a completely wrong one.

The results presented in the following are the outcome of at least 25 iterations on the cluster described in section 3.4.2 1. To verify the obtained results, an additional 20 iterations with a somewhat smaller sample size were executed on the dedicated machines, making sure the cluster's results are actually meaningful. All of these together sum up to a total of about 20,000 CPU hours. To help the readability of all further presented results, they include confidence intervals, unless clearly mentioned otherwise, showing the interval containing 99% of all samples.
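For orientation, one standard construction of such an interval for the mean of n iteration results x_1, ..., x_n is the t-based 99% confidence interval; the thesis does not spell out which construction was actually used, so this is only the textbook form under an approximate-normality assumption:

\bar{x} \pm t_{0.995,\,n-1} \frac{s}{\sqrt{n}}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2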

First of all, one important outcome, before going more into depth, is that the combination of DFS and BFS with the Random-Shuffle-Round-Robin meta was not able to produce any meaningful results and was thus omitted from all following figures.

4.1 Number of Found Errors

To begin with, the very first metric to measure the tested searchers' quality is the number of found errors. Measuring this was possible because the used version 6.10 of the GNU Coreutils has a public error tracker with already known errors to validate against, as well as the errors reported in the original KLEE experiment [7]. Regarding the type of errors, it might be interesting that all of the found errors were memory errors that triggered an out-of-bounds pointer. Figure 4.1 includes all tools where at least one searcher was able to find an error.

1 The coverage-specific results are currently based on only around 10 iterations, because their calculation was much more expensive than expected and, at the moment this final draft was handed in, these calculations were still ongoing. Nevertheless, the remaining iterations are not expected to change the overall outcome by much, except to tighten the confidence intervals.


It displays the results for all searchers, grouped per tool. Because the combined number of searchers and tools where an error was found is too high for a single plot, the figure was split into three separate plots, limiting the number of tools per plot for easier readability.

One important aspect, presented by Figure 4.1, is that no searcher could manage to find all errors, thus no clear winner can be crowned. This is especially shown in the very first image of Figure 4.1, where a lot of spaces are left empty, indicating that no error was found, throughout all iterations, for a specific searcher. The searcher that was able to find errors in most of the tools was Pathscore-Relevance. This searcher only missed errors completely in 2 out of the 15 tools during the tests, which consisted of at least 25 iterations each.

Despite its weak point of not finding an error in any of the tools presented in the first image, the current default searcher of KLEE nearly outperformed all other searchers for the remaining tools. In every tool but one it found the most errors, if it managed to locate any at all. Taking into account its consistency, which is explained further below in section 4.4, the default searcher seems like a good overall choice. Pathscore-Relevance and its combination with Coverage-New, as well as the QueryCost and Minimum Distance to Uncovered combination with the Random-Shuffle-Round-Robin meta searcher, also score really well throughout the tests, and additionally detect errors in some of the tools that the default searcher leaves uncovered. To determine whether any of these similarly effective searchers is strictly better than the others, the Wilcoxon signed-rank test [40] was used. Its results showed no significant difference between the four mentioned top searchers, pr, default, qcmdurnd and prcovnew.
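For reference, the statistic behind the Wilcoxon signed-rank test in its standard textbook form (the exact variant and software implementation used are not detailed in the thesis): for paired per-tool results x_{1,i} and x_{2,i} of two searchers, with N_r the number of pairs with non-zero difference and R_i the rank of |x_{2,i} - x_{1,i}|,

W = \sum_{i=1}^{N_r} \operatorname{sgn}(x_{2,i} - x_{1,i}) \, R_i

and the observed W is compared against its distribution under the null hypothesis of no difference to obtain the reported p-values.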

Unlike these more sophisticated searchers with overall good results, other searchers did not score as well, but still draw interest because they found errors which the others could not. One such example is the pure Depth-First Search.

4.2 Time until First Error

The actual time until the first error is found is usually of similar interest as the total number of found errors. This is especially the case due to the iterative process that is software development. Further, this metric is also interesting because generally none of the chosen searchers was designed to ignore simple cases right away, and thus each searcher is expected to find an error for such cases within reasonable time. Figure 4.2 shows the average time until the first error was found for each searcher, grouped by the tools in which an error was found, as presented previously in Figure 4.1, including the confidence interval for 0.01 significance.

Similar to the average amount of found errors, Figure 4.2 leaves a blank gap for searchers that did not find any error at all. Apart from the resulting gaps in the plot, many different times were reported. However, most searchers typically are close to each other, supporting the assumption that every searcher has at least some general, broader exploration scheme that also discovers earlier errors, which are typically closer to the root of the execution tree. A counterexample to that is DFS, which, if it finds something, typically does so rather quickly. Other, obviously less predictable times come from the Random State Search.

Figure 4.1: Average amount of errors found per searcher for all tools in the GNU Coreutils version 6.10

Figure 4.2: Average time (in seconds) until first error found, per searcher for all tools in the GNU Coreutils version 6.10

An important note for this metric is that the total execution time of a searcher for each tool was stopped after 3600 seconds, and several of the searchers really needed this time to actually be successful in their attempt. Still, this maximum boundary seemed fitting, since most of the searchers were successful before meeting the deadline.

Also similar to the results presented for the average number of found errors, the default searcher as well as Pathscore-Relevance seem to again outperform their competitors slightly, even though these differences mostly do not even exceed the one-minute mark.

4.3 Coverage

The overall executable line coverage produced by a searcher is another really interesting metric that, depending on the use case, might be highly valued. Typically, high code coverage is used as an indicator for low error probability and is thus definitely desirable. To give an estimate of this specific quality, Figure 4.3 shows how the different searchers perform regarding one specific coverage metric, namely executable line coverage. The figure shows the achieved coverage in percent for each tool, ordered by coverage, which allows for assumptions about the generated coverage, but not for a comparison of the separate tools.

Figure 4.3: Average executable line coverage, per searcher for all tools in the GNU Coreutils version 6.10

At first glance the overall generated coverage, as presented in Figure 4.3, does not look too promising, with roughly 15 of all tested tools achieving less than 50% and only about 10 tools reaching a coverage of more than 90%. The majority, 40 to 50 tools, was covered between these 50% and 90%, with only DFS consistently showing results with about 10% less coverage than its competitors. The results for the measured executable line coverage were also tested with the Wilcoxon signed-rank test [40], to determine if any of the searchers was able to significantly outperform the others. Similar to the test conducted for the average number of found errors, the triple searcher was again less effective than pr, default, qcmdurnd and prcovnew; additionally, the test showed that the default searcher dominates every other implementation, with a p-value of at most 0.014.

Important for all of these results, however, is that this evaluation took place on a live system, without any sandbox mode. This means that in several tools some of the paths, or in a few cases even nearly the complete program, could not be executed for the calculation of coverage. Problematic tools are, for example, chcon and related tools with their direct influence on permission criteria. Other tools could also lead to problems when reaching certain protected areas of the system. Additionally, when comparing to the original KLEE test cases, one should keep in mind that in about ten cases their team decided to use different parameters which are more suitable for generating higher coverage, which was omitted for this evaluation. With all that in mind, the coverage generated by these searchers might not compare well to tests run in other evaluations, but since all of them were run in the same environment, they all have the same reference point.

4.4 Consistency of the Results

Another important aspect to point out is the consistency of the retrieved results. This is needed because most of the chosen searchers make use of heuristics, often including randomization, to increase their chances of success. Therefore an estimate of their consistency is required. To evaluate that, the average of the variance of all results, across every iteration, was calculated. The comparison between the mean variance of the different searchers is presented for the GNU Coreutils 6.10 in Figure 4.4 and for 8.27 in Figure 4.5.
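A plausible formalisation of this consistency measure, with symbols chosen here purely for illustration since the thesis does not state an explicit formula, is

\overline{s^2} = \frac{1}{|T|} \sum_{t \in T} \frac{1}{n_t - 1} \sum_{i=1}^{n_t} \left( x_{t,i} - \bar{x}_t \right)^2,

where T is the set of tools, n_t the number of iterations for tool t, x_{t,i} the number of errors found in tool t during iteration i, and \bar{x}_t the corresponding mean over the iterations.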

As can be seen in Figure 4.4, the current default searcher of KLEE showed close to no variance in finding errors at all and also seems rather consistent in the timing when such an error is found. This consistency throughout all iterations can seem a bit surprising at first, because the searcher makes use of randomness in addition to the searcher that scored worst in this test, namely Coverage-New. Similarly, the qcmdurnd searcher combination also provided quite surprisingly consistent overall results. Besides these not necessarily expected results, all other results were as expected. For example, a pure random search can of course not be as consistent as a deterministic search like DFS. The same holds for the other searchers that all make use of randomness at some point.

The variance shown in Figure 4.5 for the results of the GNU Coreutils 8.27 was expected to be this low due to the overall small number of found errors. Nevertheless, a similar behaviour as in the results for the GNU Coreutils 6.10 can be observed, with the default strategy being the only searcher whose overall consistency barely changed.

Figure 4.4: Average variance of the amount of errors found per searcher for the GNU Coreutils in version 6.10

Figure 4.5: Average variance of the amount of errors found per searcher for the GNU Coreutils in version 8.27

4.5 Quality of Targeting for PR

Recalling that the major quality of targeted searchers is supposed to be their capability of quickly following a direction towards specified targets, another assumption was made. That assumption is that the effectiveness of targeted searchers depends highly on the quality of prior knowledge, which is supposed to lead the searcher in the correct direction for higher chances of finding an error. To evaluate this assumption, the example implementation of the Pathscore-Relevance searcher, a targeted searcher, developed during the course of this thesis, was tested with different parameters for guidance. Figures 4.6 and 4.7 show the impact of the three different forms of guidance used. These three different settings are:

• Distraction: directing the searcher towards an area with no errors
• Area: setting the target to a broader area in which an error is located
• Direct: aiming towards one specific line containing an error

The presented results were of course only possible to obtain by testing targets where the errors were already known, like the widely used GNU Coreutils. That knowledge was necessary to set up the different targets, directing the searcher towards a certain area. To keep the comparison as clean and easy to follow as possible, only results of targets where at least one of the settings was able to detect an error were included.

With the distracted setting the searcher received bad directions on purpose. This was of course expected to produce rather bad results, even though it did manage to find some errors that are located on a more global scale. The searcher still managed to find them because of the Pathscore part, which makes sure that the searcher does not get stuck in certain areas, even if they are defined as interesting. The second tested targeting is set to a broader area in which one of the errors could be located, rather than a very specific line of code. This pays off especially in tools like ptx, where most of the found errors are located back to back. Finally, the last tested variant was meant to snipe down one specific error by directing the searcher exactly towards one line where another tool had previously reported an error. The presented results, however, show that this was not completely followed by the searcher, most likely because of an unbalanced Pathscore-to-Relevance setting, which reduces the impact of Relevance too fast and thus aborts deeper searches too early.

As far as the time until the first error is concerned, Figure 4.7 only shows one real irregularity, namely for the pr tool. The error in pr is located fairly deep within the execution tree, which is typically hard to reach, leaving the distracted search completely in the dark. The reason that the areal setting scored way better than the direct approach lies in the broader field of relevance for the areal setting, in combination with the previously mentioned imbalance of the two factors, Pathscore and Relevance. In the case of the direct setting, the searcher only has one real target, which loses impact with every further step until all paths even out, while the areal setting features a larger area of interest, thus allowing for a slower decrease of the impact such interesting paths have.

Figure 4.6: Average number of found errors for different target settings for Pathscore-Relevance

Figure 4.7: Average time until first error for different target settings for Pathscore-Relevance

4.6 Cluster vs. Dedicated Machine

As already mentioned in section 2.5, clusters naturally come with some uncontrollable difficulties. Because of several of these possible negatively influencing factors, like multiple other users or the absence of real load control, an effort was made to evaluate how the results obtained from the cluster compare to those of a dedicated machine. Important to note for the experiments presented in the following is that the dedicated machine has stronger single cores, as presented in section 3.4.2. Therefore it covered, as expected, more instructions during the same time, which can be seen in Figure 4.8. This roughly 20% increase in executed instructions of course also makes it more likely to detect errors that the cluster might have missed within the set time constraint.

Figure 4.8: Average number of executed instructions for the dedicated machine and cluster

That this assumption holds is shown in Figure 4.9: the overall increase in executed instructions also led to an overall increase of about 17% in found errors on the dedicated machine, thus showing nearly perfect efficiency.

Figure 4.9: Average total of found errors for the dedicated machine and cluster

An arguably even more important result for the evaluation was that all the errors found on the cluster were also found on the dedicated machine, as displayed in Figure 4.10, thus validating the used methods. Besides that, the average time until the first error is compared in Figure 4.11. Important to notice in this figure is that many of the bars are very high, indicating a long time until an error was found, because of the low overall detection rate. This is due to the fact that, to be able to compare the two environments, every time no error was found the time was set to the maximum time (3600 s) + 1.
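Written out, the time value entering the comparison for each tool and searcher is

t = \begin{cases} t_{\text{first error}} & \text{if an error was found within the time limit,} \\ 3600\,\mathrm{s} + 1 & \text{otherwise,} \end{cases}

which is why tools with a low detection rate show average times close to the 3600 s limit.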

Figure 4.10: Average number of found errors for the dedicated machine and the cluster

Figure 4.11: Average time until first error for the dedicated machine and the cluster

Chapter 5

Related Work

During the last decade a lot of related work in the field of Symbolic Execution was published. Even though King published his ideas about four decades ago (1976), his vision appeared utopian in the past, because the machines back then could not handle the complexity. Time passed, the hardware became more powerful, and fundamental algorithms, like those of solvers, were improved. As a result, most of the research done in the domain of Symbolic Execution happened during the last decade.

This mainly shows itself in the number of distinct engines developed, each following different approaches to deal with scalability problems and secure efficiency in their own way. To begin with, however, other techniques that also perform automated testing will be discussed. Finally, this chapter also highlights some of the most interesting implementations of solvers used within these engines.

5.1 Automated Testing Techniques

Obviously Symbolic Execution is not the only existing method for ensuring software quality, by finding inputs to cover as much of the target code as possible. There are several other related areas which also operate towards a similar goal. These are displayed briefly in the following.

Probably the most common and well-known testing method is manual testing. A typical form of manual testing puts the developer in the role of an end-user to find defects. In the role of the end-user the developer then tries to use different parts of the program.

During the testing process this manual testing typically quickly becomes use-case testing, because most developers manually test with certain goals or according to a test plan rather than just randomly [41]. Such use-case testing is done by translating the specification into small test cases, hoping to cover most of the underlying code. Of course this helps to verify whether the right program is built and also allows for fast manual testing at every level of development. On the other hand, manual testing in any form is a rather expensive way of testing exhaustively and typically loses against automated testing mechanisms as soon as the program gets just a bit more complex [7]. Thus dynamic testing methods, as described next, should usually be pursued more.

Besides manual testing, which is a special form that can be applied to either of the categories, software testing for correctness is typically split into two different categories [42]: white-box testing [43], which aims to test the internal structure of a program, as opposed to black-box testing, which tests the functionality. Symbolic Execution itself falls into the category of black-box testing, thus the following methods are focused on black-box testing.

One rather simple, but still commonly used strategy in this area is random testing [44, 45, 46]. It is typically easy to implement for low-level data types like integers, but gets much harder for higher-level ones (e.g. graphs). Additionally, it has decreasing effectiveness with increasing dimensions of inputs, since exhaustively finding all paths connected to one input is obviously by far easier than finding them for ten. For example, ten 32-bit integer inputs would already lead to (2^32)^10 = 2^320 possible input values.

In contrast to the completely uninformed random testing, fuzzing or fuzz testing is based on some previous knowledge [47, 48, 49]. That previous knowledge is required because fuzzing starts with a specific input that gets slightly altered with every further run to explore as many paths in the targeted area as possible. Based on its behaviour, fuzzing can be seen as the step preceding Symbolic Execution, since Symbolic Execution advances on fuzz testing's methodology by not requiring a starting input.

Another form of black-box testing is boundary value testing. Instead of generating all inputs at random, boundary value testing defines a small set of so-called boundary values, which in the simplest case are the maximum and minimum. The problem here is the same as in Symbolic Execution: the state space rises exponentially, making it infeasible for large projects.

A more scalable solution is presented with pairwise testing. By making use of this testing technique, the state space is reduced to grow at most quadratically in the dimension of the input. Of course it is usually not possible to cover the same amount as with more accurate techniques. This, however, seems in many cases like a good trade-off according to Dunietz et al. [50], since the coverage still seems sufficient and bugs involving three or more parameters are progressively less common [51, 52, 53].

Besides the conventional testing methods described above, a more formal technique for program verification exists. This technique is called Model Checking; its concept was first presented by Queille and Sifakis [54] in the early 1980s and revived a decade later by Clarke et al. [55]. As the name suggests, verification is here based on a model, which can either be created on its own or, often, be automatically derived from the sources. Such a model describes the program's boundaries, and its behaviour can be determined via queries over this model, which are typically expressed as temporal logic properties.

The main difference to Symbolic Execution is that Model Checking typically operates on statements about behaviour over time (e.g. liveness), whereas Symbolic Execution's assumptions operate on a general (memory) state. On the other hand, Symbolic Execution as well as Model Checking both face the problem of path explosion and thus often lack scalability.

Some of the most well-known tools for today's Model Checking are the Java PathFinder [56] and NuSMV [57]. JPF probably comes closest to common Symbolic Execution engines as far as the application goes, because it also operates directly on target programs. Opposite to that stands NuSMV, which practises Model Checking in a more classic sense, directly on generated models. Similar to Symbolic Execution, this only shows a small selection; the actual list of tools is much longer.

5.2 Symbolic Execution

Within this section a small selection of the Symbolic Execution engines most interesting for this thesis will be presented. A more in-depth analysis of modern Symbolic Execution techniques was published by Cadar and Sen [58].

In this thesis several techniques based on the open source Symbolic Execution engine KLEE [7] are tested. KLEE itself has its roots in an engine called EXE [59], one of the first attempts to utilize symbolic execution for automatic test case creation. Further, KLEE provided a base for other research.

One of the engines that evolved from it over the last years is, for example, S2E [20]. S2E is, briefly summarized, a symbolic execution engine that utilizes bidirectional state conversion between symbolic and concrete states. This means that S2E is able to automatically convert data between the two domains, which is supposed to improve scalability.

A similar hybrid approach is followed with MAYHEM [19], a hybrid execution engine which combines classical Symbolic Execution with Concolic Execution. Even though MAYHEM and all of the above mentioned engines have some differences in their approaches and nature of execution, they all use built-in searchers based on the principle of maximizing code coverage.

Another KLEE-based engine is KITE [35]. Unlike the previous engines, KITE focuses on an algorithm named Conflict-Driven Symbolic Execution to maximize learning from previous conflicts. It was inspired by Conflict-Driven Clause Learning, which is a central feature of today's SMT solvers [60, 61, 62]. With this new conflict-driven approach, the knowledge gained from conflicts detected during Symbolic Execution promises to reduce the total number of paths to explore during a complete run. As a result, the required amount of resources is also reduced, which takes Symbolic Execution a step further towards scalability.

Besides the pure symbolic or hybrid execution engines shown above, pure concolic execution engines also exist. Examples are DART [63] and SAGE [64], which concretely execute only one path at a time, collecting all constraints while doing so. On the other hand, engines like CUTE [65] and CREST [66] only reason about equality constraints for pointers, but fall back to concretization for all other symbolic references.

The final notable engine is based on the core model checking framework JPF, which builds a bridge between Model Checking and Symbolic Execution with an extension called JPF-SE [67]. This extension to the regular model checker was developed as a prototype to also support the use of Symbolic Execution. Later this extension was developed further and put into a stand-alone project known as SPF [68].

In addition to the closely related engines just mentioned, a more exhaustive list featuring several others is presented in Appendix A.1.

5.3 Solvers

Even though search strategies are the main topic of this thesis, Symbolic Execution does not rely solely on them when testing a program. In addition to the necessary path finding, Symbolic Execution also depends a lot on the underlying solvers, which evaluate the found path constraints. An example showing the actual relevance of solvers is provided by Rakadjiev et al. [69]: with the constraint-solving approach proposed in that paper, the measured performance increase was a factor of 7 compared to regular solving strategies.

Probably one of the most well-known solvers is Z3 [70], which is widespread and used (or at least supported) by most engines, such as PEX, SAGE, KLEE and several others. Another rather famous combination is STP [71] with MiniSat [72], which is the default solver of engines like EXE and KLEE; engines that emerged from these, like KITE, utilize it as well.

In addition to the solvers most commonly used in Symbolic Execution, there are also numerous others. The following list is not fully exhaustive, but it represents a summary of most solvers used or supported in today's Symbolic Execution engines.

The previously mentioned JPF extension JPF-SE, for example, uses the Omega library [73], which specialises mainly in manipulating integer tuple relations and sets and would therefore count, opposite to the others presented here, as a rather unconventional solver.

Another big player in the field of solvers is CVC4 [74]. It has its roots in CVC-Lite [75], which is often compared with YICES [76], since both follow similar directions with their solvers. They also support a very similar set of operations, like both linear and non-linear arithmetic over integers and bit vectors.

Lastly, another interesting solver is RealPaver [77]. RealPaver is interesting because it allows constraints over floating-point numbers, whereas the majority of other solvers typically support at most (linear) arithmetic over integers.

Chapter 6

Conclusion

This chapter is primarily concerned with giving a conclusion on the findings regarding the primary research question defined in the introduction, including the main contributions of this thesis. Further, in section 6.1, the main findings of the conducted experiments are highlighted and critically evaluated against the background of the study's assumptions and limitations. The chapter ends with a brief outlook on possible future work in section 6.2.

The original problem addressed in this thesis is to find out what effect different search heuristics in Symbolic Execution have on finding errors and how they could be combined to improve the general outcome. To address this problem and the related research question, the following main contributions were made with this thesis:

• Identification and classification of existing heuristics for exploration strategies used in Symbolic Execution, with a primary focus on strategies already available in KLEE.

• Implementation of a promising targeted search strategy that combines two usually different perspectives of a program: a static analysis of the program's execution paths, expressed as a calculated pathscore, and a user-guided targeting mechanism, the relevance.

• Evaluation of the implementation of the Pathscore-Relevance searcher, together with others, in the example environment KLEE. Additionally, it was used to evaluate how the quality of prior knowledge affects the effectiveness of such a targeted searcher.

• Presentation of a novel meta strategy that combines multiple search strategies in a Round-Robin fashion. In addition, it tracks runtime information, like coverage, to analyse whether the exploration makes any progress and, if starvation is detected, switches to randomization to avoid getting stuck.

• Quantitative analysis of a chosen set, composed of promising searchers and combinations of them, to evaluate what effect different search strategies have on finding errors and how they can be combined to improve the general outcome, in experiments spanning around 20,000 CPU hours.


6.1 Discussion

The very first finding to point out features the effectiveness of the current KLEE default searcher, Coverage-New interleaved with Random Path Search. Even though it was known to perform quite well, the overall error detection rate, with especially high consistency compared to the others, was still surprising. Because of its reliance on random factors with the Random Path Search, slightly worse results were anticipated. Resulting from these findings, the default searcher indeed arguably makes the most sense as a default implementation for the general purpose, because it can cover most of the errors within reasonable time and additionally provides enough certainty of meaningful results after a single iteration.

A good alternative to that default implementation was presented with the Random-Shuffle-Round-Robin meta combination of the QueryCost and Minimum Distance to Uncovered searchers (qcmdurnd). Throughout all tests it provided a very similar detection rate as the default searcher, but in some instances managed to detect even more errors. On the other hand, the qcmdurnd searcher did not manage to cover as much as the default searcher and also usually took longer to find these errors. Based on these results a trade-off has to be made, depending on the preferences of the user. Overall, the qcmdurnd searcher definitely seems like a good alternative to the default implementation, especially when resources like computation time are not too limited.

When taking a look at the general outcome of the combination of three searchers, it became quite clear that including more searchers does not necessarily perform better than combinations of fewer. This can easily be seen when comparing the triple searcher with prcovnew. These results made clear that it is more important to care about which searchers to combine and how to schedule them. With rather simple meta implementations as used within this thesis, more searchers only increase the possible influence of a single one by a small margin, and their compatibility is therefore more important. With two searchers and combination strategies like the ones used here, it is most of the time already possible to overcome the weaknesses of the respective other, leaving little need for another searcher, which sometimes, as in the case of the triple searcher, even lowers the overall efficiency. On the other hand, with more intelligent meta strategies, as discussed in section 6.2, searcher combinations of more than just two could make sense.

Another interesting outcome of the experiments conducted in this thesis, even though it was not expected to score well, is that the combination of Depth-First Search and Breadth-First Search did not manage to perform at all for any of the chosen metrics. It was expected not to score well, similar to the single DFS, but not to miss all errors in every tool. The conclusion was that DFS and BFS work too differently and nearly against each other, such that the loosening introduced by the Random-Shuffle-Round-Robin meta strategy also did not manage to improve anything. This finally led to the decision not to display it further in the result chapter, due to the absence of meaningful results.

In addition to all these general-purpose search strategies, a different type of searcher was tested with the Pathscore-Relevance implementation. This searcher requires some prior knowledge about the program to perform best, which was verified by the presented results; otherwise it is rather bad and quite unreliable. Besides the definite requirement of a rough area of interest, further details depend a lot on the implementation. In the case of the implementation used for evaluation in this thesis, the guidance lost its impact too fast for smaller targets at a larger distance, which led to slightly worse results for the strict direct targeting than expected. On the other hand, broader areas, to some extent, still scored pretty well, even if they were deeper in the execution tree. In the end it most likely comes down to the requirements towards the searcher, so that it can be tuned in a way that more precise targets are located more easily, or to focus on slightly broader areas, which would reflect larger changes.

As a conclusion and answer to the main research question, an overall combination of different search strategies definitely seems worthwhile, with nearly all the combined variants outperforming their respective solo variants. The presented results also show that there lies even more potential in more complex ways of interleaving these strategies, as will also be outlined as possible future work in the next section.

6.2 Future Work

In the following, a brief overview of some possible future work is given, like further benchmarking or different, more advanced scheduling strategies within meta searchers.

For the analysis conducted in this thesis, a set of different promising searchers was chosen, but it clearly could not have been fully exhaustive. Although the searchers analysed in this thesis were picked with clear goals, as they seemed to be the most promising, resources were limited and thus not every possible combination has been tested. Therefore, a rather simple addition to this work would be to include several more searchers in the evaluation and analyse how they perform. Especially interesting here would be a couple of more distinct combinations of other searchers, but also a more exhaustive test of how the quality of targeting in searchers like Pathscore-Relevance affects such combinations.

With the Random-Shuffle-Round-Robin, this thesis proposes an approach that uses runtime information, like generated coverage, to create schedules for the implemented searchers. Following this thought of more intelligent meta searchers even further, it is possible to merge the field of Symbolic Execution with Machine Learning. This would mean using pattern recognition to extract patterns found during the execution, to later be able to identify exactly in which situations which searcher with which specific parameters will lead to the best possible output.

Another big topic in Symbolic Execution, even though they were not really in the focus of this thesis, are solvers. One possible relation that came up during the evaluation process was that the effectiveness of searchers might be coupled to specific solvers. This would mean that one searcher might end up scoring better than its competitors when used with some solvers, while underperforming with others. With more resources, future work could focus on this stated assumption and evaluate whether there indeed is a correlation between searchers and solvers and how to possibly make the best use of it.

Bibliography

[1] William E. Lewis and W. H. C. Bassetti. Software Testing and Continuous Quality Improve- ment, Second Edition. Auerbach Publications, Boston, MA, USA, 2004. ISBN 0849325242.

[2] James C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, July 1976. ISSN 0001-0782. doi: 10.1145/360248.360252. URL http://doi.acm.org/ 10.1145/360248.360252.

[3] Peter Boonstoppel, Cristian Cadar, and Dawson Engler. Rwset: Attacking path ex- plosion in constraint-based test generation. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pages 351–366, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3-540-78799-2, 978-3-540-78799-0. URL http://dl.acm.org/ citation.cfm?id=1792734.1792770.

[4] United Nations General Assembly. Resolution a/60/1, adopted by the gen- eral assembly. In 2005 World Summit Outcome, September 2005. URL http: //data.unaids.org/topics/universalaccess/worldsummitoutcome_ resolution_24oct2005_en.pdf.

[5] John Morelli. Environmental sustainability: A definition for environmental profession- als. In Journal of Environmental Sustainability, volume 1. Rochester Institute of Tech- nology, 2011. URL http://scholarworks.rit.edu/cgi/viewcontent.cgi? article=1007&context=jes.

[6] Molly Scott Cato. Green economics. london: Earthscan. Ecological Economics, 69(1):36– 37, 2009.

[7] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2008.

[8] Joan Daemen and Vincent Rijmen. Specification for the advanced encryption standard (aes). Federal Information Processing Standards Publication 197, 2001. URL http: //csrc.nist.gov/publications/fips/fips197/fips-197.pdf.

[9] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC ’71, pages 151–158, New York, NY, USA, 1971. ACM. doi: 10.1145/800157.805047. URL http://doi.acm. org/10.1145/800157.805047.


[10] Herbert B. Enderton. A Mathematical Introduction to Logic. Academic Press, second edi- tion, 2000.

[11] Roberto Baldoni, Emilio Coppa, Daniele Cono D’Elia, Camil Demetrescu, and Irene Finocchi. A survey of symbolic execution techniques. CoRR, abs/1610.00502, 2016. URL http://arxiv.org/abs/1610.00502.

[12] David McCandless, Pearl Doughty-White, and Miriam Quick. Codebases - mil- lions of lines of code. URL http://www.informationisbeautiful.net/ visualizations/million-lines-of-code/.

[13] GNU. Coreutils, . URL https://www.gnu.org/software/coreutils/ coreutils.html.

[14] Apache Open Office. Build faq. URL https://www.openoffice.org/FAQs/ build_faq.html#source.

[15] A. M. Turing. On computable numbers, with an application to the entscheidungsprob- lem. Proceedings of the London Mathematical Society, s2-42(1):230–265, 1937. ISSN 1460- 244X. doi: 10.1112/plms/s2-42.1.230. URL http://dx.doi.org/10.1112/plms/ s2-42.1.230.

[16] French engineer of the telegraph Charles Pierre Trémaux (1859–1882), École polytechnique of Paris (X:1876).

[17] E. F. Moore. The shortest path through a maze. In Proceedings of the International Sympo- sium on the Theory of Switching, pages 285–292. Harvard University Press, 1959.

[18] C. Y. Lee. An algorithm for path connections and its applications. IRE Transactions on Electronic Computers, EC-10(3):346–365, Sept 1961. ISSN 0367-9950. doi: 10.1109/TEC. 1961.5219222. URL https://doi.org/10.1109/TEC.1961.5219222.

[19] S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley. Unleashing mayhem on binary code. In 2012 IEEE Symposium on Security and Privacy, pages 380–394, May 2012. doi: 10.1109/SP.2012.31. URL http://doi.acm.org/10.1109/SP.2012.31.

[20] Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. The s2e platform: De- sign, implementation, and applications. ACM Trans. Comput. Syst., 30(1):2:1–2:49, Febru- ary 2012. ISSN 0734-2071. doi: 10.1145/2110356.2110358. URL http://doi.acm. org/10.1145/2110356.2110358.

[21] T. Xie, N. Tillmann, J. de Halleux, and W. Schulte. Fitness-guided path exploration in dynamic symbolic execution. In 2009 IEEE/IFIP International Conference on Dependable Systems Networks, pages 359–368, June 2009. doi: 10.1109/DSN.2009.5270315.

[22] Nikolai Tillmann and Jonathan De Halleux. Pex: White box test generation for .net. In Proceedings of the 2Nd International Conference on Tests and Proofs, TAP’08, pages 134–153, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3-540-79123-X, 978-3-540-79123-2. URL http://dl.acm.org/citation.cfm?id=1792786.1792798.

[23] You Li, Zhendong Su, Linzhang Wang, and Xuandong Li. Steering symbolic execution to less traveled paths. SIGPLAN Not., 48(10):19–32, October 2013. ISSN 0362-1340. doi: 10.1145/2544173.2509553. URL http://doi.acm.org/10.1145/2544173.2509553.

[24] T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley. Aeg: automatic exploit generation. In In Proceedings of the Network and Distributed System Security Symposium,, 2011. URL http://www.isoc.org/isoc/conferences/ndss/11/pdf/5_5.pdf.

[25] Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Pro- gram Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04), Palo Alto, California, March 2004.

[26] Silviu Andrica and George Candea. Pathscore-relevance: A metric for improving test quality. 2009.

[27] GNU. Gcov, . URL https://gcc.gnu.org/onlinedocs/gcc/Gcov.html.

[28] Ben James Winer, Donald R Brown, and Kenneth M Michels. Statistical principles in experimental design, volume 2. McGraw-Hill New York, 1971.

[29] R. A. Fisher. The Arrangement of Field Experiments, pages 82–91. Springer New York, New York, NY, 1992. ISBN 978-1-4612-4380-9. doi: 10.1007/978-1-4612-4380-9_8. URL http://dx.doi.org/10.1007/978-1-4612-4380-9_8.

[30] J.A. Rice. Mathematical Statistics and Data Analysis. Number p. 3 in Advanced series. Cengage Learning, 2006. ISBN 9780534399429. URL https://books.google.se/ books?id=EKA-yeX2GVgC.

[31] C. Berzin, A. Latour, and J.R. León. Inference on the Hurst Parameter and the Variance of Diffusions Driven by Fractional Brownian Motion, page 362. Lecture Notes in Statistics. Springer International Publishing, 2014. ISBN 9783319078755. URL https://books. google.se/books?id=6tTVBAAAQBAJ.

[32] Qusay F. Hassan. Demystifying cloud computing. In The Journal of De- fense Software Engineering, 2011. URL http://static1.1.sqspcdn.com/ static/f/702523/10181434/1294788395300/201101-Hassan.pdf?token= R1MJl7CfIESODLO4c1klLcRMtfw%3D.

[33] Peter Mell and Timothy Grance. The nist definition of cloud computing. In National Institute of Standards and Technology: U.S. Department of Commerce, September 2011. URL http://nvlpubs.nist.gov/nistpubs/Legacy/SP/ nistspecialpublication800-145.pdf.

[34] Mark Baker. Cluster computing white paper. CoRR, cs.DC/0004014, 2000. URL http: //arxiv.org/abs/cs.DC/0004014.

[35] Celina G. Val. Conflict-driven symbolic execution: How to learn to get better. Master’s thesis, 2014.

[36] Docker Inc. Docker. URL https://www.docker.com/.

[37] Chalmers Centre for Computational Science and Engineering. Hebbe. URL http://www.c3se.chalmers.se/index.php/Hebbe.

[38] RWTH Aachen University. Chair of communication and distributed systems. URL https://www.comsys.rwth-aachen.de/home/.

[39] European Joint Conference on Theory & Practice of Software. Benchmark verification tasks. URL https://sv-comp.sosy-lab.org/2017/benchmarks.php.

[40] Ronald H. Randles. Wilcoxon Signed Rank Test. John Wiley & Sons, Inc., 2004. ISBN 9780471667193. doi: 10.1002/0471667196.ess2935.pub2. URL http://dx.doi.org/ 10.1002/0471667196.ess2935.pub2.

[41] Ieee standard for software and system test documentation. IEEE Std 829-2008, pages 1–150, July 2008. doi: 10.1109/IEEESTD.2008.4578383.

[42] Mohd Ehmer Khan. Different forms of software testing techniques for finding errors. International Journal of Computer Science Issues, pages 11–16, 2010.

[43] Mohd Ehmer Khan. Different approaches to white box testing technique for finding errors. International Journal of Software Engineering and Its Applications, 5, July 2011.

[44] Joe W. Duran and Simeon Ntafos. A report on random testing. In Proceedings of the 5th International Conference on Software Engineering, ICSE ’81, pages 179–183, Piscataway, NJ, USA, 1981. IEEE Press. ISBN 0-89791-146-6. URL http://dl.acm.org/citation. cfm?id=800078.802530.

[45] Richard Hamlet. Random testing. In Encyclopedia of Software Engineering, pages 970–978. Wiley, 1994.

[46] Rainer Gerlich, Ralf Gerlich, and Thomas Boll. Random testing: From the classical approach to a global view and full test automation. In Proceedings of the 2Nd International Workshop on Random Testing: Co-located with the 22Nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), RT ’07, pages 30–37, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-881-7. doi: 10.1145/1292414.1292424. URL http:// doi.acm.org/10.1145/1292414.1292424.

[47] Ilja Van Sprundel. Fuzzing. In Chaos Communication Congress (CCC), 2005.

[48] Ari Takanen, Jared DeMott, and Charlie Miller. Fuzzing for Software Security Testing and Quality Assurance. Artech House, Inc., Norwood, MA, USA, 1 edition, 2008. ISBN 1596932147, 9781596932142.

[49] John Neystadt. Automated penetration testing with white-box fuzzing. Microsoft, February 2008.

[50] I. S. Dunietz, W. K. Ehrlich, B. D. Szablak, C. L. Mallows, and A. Iannino. Applying design of experiments to software testing: Experience report. In Proceedings of the 19th International Conference on Software Engineering, ICSE ’97, pages 205–215, New York, NY, USA, 1997. ACM. ISBN 0-89791-914-9. doi: 10.1145/253228.253271. URL http:// doi.acm.org/10.1145/253228.253271.

[51] K. Tatsumi, S. Watanabe, Y. Takeuchi, and H. Shimokawa. Conceptual support for test case design. Proc. 11th Intl. Computer Software and Applications Conf. (COMPSAC), pages 285–290, 1987. URL http://a-lifelong-tester.cocolog-nifty.com/publications/Conceptual_Support_for_Test_Case_Design-COMPSAC87.pdf.

[52] Alan W. Williams. Determination of test configurations for pair-wise interaction cov- erage. In Proceedings of the IFIP TC6/WG6.1 13th International Conference on Testing Communicating Systems: Tools and Techniques, TestCom ’00, pages 59–74, Deventer, The Netherlands, The Netherlands, 2000. Kluwer, B.V. ISBN 0-7923-7921-7. URL http: //dl.acm.org/citation.cfm?id=648131.748027.

[53] Jerry Huller. Reducing time to market with combinatorial design method testing. In Proceedings of the 2000 International Council on Systems Engineering (INCOSE) Conference, pages 16–20, 2000. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.145.8887&rep=rep1&type=pdf.

[54] Jean-Pierre Queille and Joseph Sifakis. Specification and verification of concurrent systems in CESAR. In Proceedings of the 5th Colloquium on International Symposium on Programming, pages 337–351, London, UK, UK, 1982. Springer-Verlag. ISBN 3-540-11494-7. URL http://dl.acm.org/citation.cfm?id=647325.721668.

[55] Edmund M. Clarke, Jr., Orna Grumberg, and Doron A. Peled. Model Checking. MIT Press, Cambridge, MA, USA, 1999. ISBN 0-262-03270-8.

[56] Guillaume Brat, Klaus Havelund, Seungjoon Park, and Willem Visser. Java PathFinder - second generation of a Java model checker. In Proceedings of the Workshop on Advances in Verification, 2000.

[57] Alessandro Cimatti, Edmund M. Clarke, Fausto Giunchiglia, and Marco Roveri. NuSMV: A new symbolic model verifier. In Proceedings of the 11th International Conference on Computer Aided Verification, CAV ’99, pages 495–499, London, UK, UK, 1999. Springer-Verlag. ISBN 3-540-66202-2. URL http://dl.acm.org/citation.cfm?id=647768.733923.

[58] Cristian Cadar and Koushik Sen. Symbolic execution for software testing: Three decades later. Commun. ACM, 56(2):82–90, February 2013. ISSN 0001-0782. doi: 10.1145/2408776.2408795. URL http://doi.acm.org/10.1145/2408776.2408795.

[59] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. EXE: Automatically generating inputs of death. In Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS ’06, pages 322–335, New York, NY, USA, 2006. ACM. ISBN 1-59593-518-5. doi: 10.1145/1180405.1180445. URL http://doi.acm.org/10.1145/1180405.1180445.

[60] J. P. Marques Silva and K. A. Sakallah. GRASP - a new search algorithm for satisfiability. In Proceedings of the International Conference on Computer Aided Design, pages 220–227, Nov 1996. doi: 10.1109/ICCAD.1996.569607.

[61] J. P. Marques-Silva and K. A. Sakallah. GRASP: A search algorithm for propositional satisfiability. IEEE Transactions on Computers, 48(5):506–521, May 1999. ISSN 0018-9340. doi: 10.1109/12.769433.

[62] Roberto J. Bayardo Jr. and Robert C. Schrag. Using CSP look-back techniques to solve real-world SAT instances. In Proc. 14th Nat. Conf. on Artificial Intelligence (AAAI), pages 203–208, 1997.

[63] Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. SIGPLAN Not., 40(6):213–223, June 2005. ISSN 0362-1340. doi: 10.1145/1064978.1065036. URL http://doi.acm.org/10.1145/1064978.1065036.

[64] Patrice Godefroid, Michael Y. Levin, and David Molnar. SAGE: Whitebox fuzzing for security testing. Queue, 10(1):20:20–20:27, January 2012. ISSN 1542-7730. doi: 10.1145/2090147.2094081. URL http://doi.acm.org/10.1145/2090147.2094081.

[65] Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A concolic unit testing engine for C. SIGSOFT Softw. Eng. Notes, 30(5):263–272, September 2005. ISSN 0163-5948. doi: 10.1145/1095430.1081750. URL http://doi.acm.org/10.1145/1095430.1081750.

[66] J. Burnim and K. Sen. Heuristics for scalable dynamic test generation. In 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, pages 443–446, Sept 2008. doi: 10.1109/ASE.2008.69.

[67] Saswat Anand, Corina S. Păsăreanu, and Willem Visser. JPF-SE: A symbolic execution extension to Java PathFinder. In Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’07, pages 134–138, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 978-3-540-71208-4. URL http://dl.acm.org/citation.cfm?id=1763507.1763523.

[68] Corina S. Păsăreanu and Neha Rungta. Symbolic PathFinder: Symbolic execution of Java bytecode. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, ASE ’10, pages 179–180, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0116-9. doi: 10.1145/1858996.1859035. URL http://doi.acm.org/10.1145/1858996.1859035.

[69] Emil Rakadjiev, Taku Shimosawa, Hiroshi Mine, and Satoshi Oshima. Parallel SMT solving and concurrent symbolic execution. In Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA - Volume 03, TRUSTCOM-BIGDATASE-ISPA ’15, pages 17–26, Washington, DC, USA, 2015. IEEE Computer Society. ISBN 978-1-4673-7952-6. doi: 10.1109/Trustcom-BigDataSe-ISPA.2015.608. URL http://dx.doi.org/10.1109/Trustcom-BigDataSe-ISPA.2015.608.

[70] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3-540-78799-2, 978-3-540-78799-0. URL http://dl.acm.org/citation.cfm?id=1792734.1792766.

[71] Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors and arrays. In Proceedings of the 19th International Conference on Computer Aided Verification, CAV’07, pages 519–531, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 978-3-540-73367-6. URL http://dl.acm.org/citation.cfm?id=1770351.1770421.

[72] Niklas Eén and Niklas Sörensson. An Extensible SAT-solver, pages 502–518. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004. ISBN 978-3-540-24605-3. doi: 10.1007/978-3-540-24605-3_37. URL http://dx.doi.org/10.1007/978-3-540-24605-3_37.

[73] William Pugh. The omega test: A fast and practical integer programming algorithm for dependence analysis. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing ’91, pages 4–13, New York, NY, USA, 1991. ACM. ISBN 0-89791-459-7. doi: 10.1145/125826.125848. URL http://doi.acm.org/10.1145/125826.125848.

[74] Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV’11, pages 171–177, Berlin, Heidelberg, 2011. Springer-Verlag. ISBN 978-3-642-22109-5. URL http://dl.acm.org/citation.cfm?id=2032305.2032319.

[75] Clark Barrett and Sergey Berezin. CVC Lite: A New Implementation of the Cooperating Validity Checker, pages 515–518. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004. ISBN 978-3-540-27813-9. doi: 10.1007/978-3-540-27813-9_49. URL http://dx.doi.org/10.1007/978-3-540-27813-9_49.

[76] Bruno Dutertre and Leonardo de Moura. A fast linear-arithmetic solver for DPLL(T). In Proceedings of the 18th International Conference on Computer Aided Verification, CAV’06, pages 81–94, Berlin, Heidelberg, 2006. Springer-Verlag. ISBN 3-540-37406-X, 978-3-540-37406-0. doi: 10.1007/11817963_11. URL http://dx.doi.org/10.1007/11817963_11.

[77] Laurent Granvilliers and Frédéric Benhamou. Algorithm 852: RealPaver: An interval solver using constraint satisfaction techniques. ACM Trans. Math. Softw., 32(1):138–156, March 2006. ISSN 0098-3500. doi: 10.1145/1132973.1132980. URL http://doi.acm.org/10.1145/1132973.1132980.

Appendix A

Appendix

A.1 KLEE Test Arguments

Full argument list for all KLEE calls:

klee $SEARCHER --output-dir=$OUTPUTDIR --simplify-sym-indices --optimize --libc=uclibc --posix-runtime --only-output-states-covering-new --max-sym-array-size=4096 --max-instruction-time=10. --max-time=3600. $FILE.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout
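Here $OUTPUTDIR names the output directory of the run, $FILE.bc is the LLVM bitcode of the tool under test, and $SEARCHER selects the exploration schedule. For the stock KLEE searchers this is done with the --search flag, which interleaves searchers round-robin when given several times. The lines below are only illustrative substitutions; in particular, the flag name for the Pathscore-Relevance searcher is an assumption, since it depends on the modified KLEE build used in this work:

SEARCHER="--search=dfs"                               # plain depth-first search
SEARCHER="--search=random-path --search=nurs:covnew"  # KLEE's default round-robin interleaving
SEARCHER="--search=pathscore-relevance"               # hypothetical flag name for the PR searcher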

A.2 Reduction Proof

In the following two listings, A.1 and A.2, a proof for the claim (TODO) is given in the form of SMT queries. They can be verified by, e.g., copy-pasting them into the Z3 solver1.

Listing A.1: Proof for the reduction of 3 ∗ a = 9 to a = 3

(declare-fun a () (_ BitVec 32))
(assert (= #x00000009 (bvmul a #x00000003)))
(assert (not (= #x00000003 a)))
(check-sat)

Listing A.2: Proof that the reduction of 4 ∗ a = 8 to a = 2 is not valid

(declare-fun a () (_ BitVec 32))
(assert (= #x00000008 (bvmul a #x00000004)))
(assert (not (= #x00000002 a)))
(check-sat)
(get-model)

1 http://rise4fun.com/Z3/
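The same two queries can also be checked programmatically. The following is a minimal sketch using the Z3 Python bindings (an assumption: the z3-solver package is installed; the thesis itself only relies on the SMT-LIB listings above). The first check is unsatisfiable because 3 is odd and therefore invertible modulo 2^32, so a = 3 is the only 32-bit solution of 3 ∗ a = 9; the second is satisfiable because 4 ∗ a = 8 is, for example, also solved by a = 2 + 2^30.

from z3 import BitVec, Solver, sat

a = BitVec('a', 32)

# Listing A.1: 3*a = 9 together with a != 3 must be unsatisfiable.
s1 = Solver()
s1.add(3 * a == 9, a != 3)
print(s1.check())  # expected: unsat

# Listing A.2: 4*a = 8 together with a != 2 is satisfiable,
# so reducing 4*a = 8 to a = 2 is not valid in 32-bit arithmetic.
s2 = Solver()
s2.add(4 * a == 8, a != 2)
if s2.check() == sat:
    print(s2.model())  # prints a counterexample, e.g. a = 2 + 2**30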


Table A.1: Snapshot of current Symbolic Execution engines

Engine      URL
angr        http://angr.io/
BAP         https://users.ece.cmu.edu/~aavgerin/papers/bap-cav-11.pdf
BITBLAZE    http://bitblaze.cs.berkeley.edu/
CATG        https://people.eecs.berkeley.edu/~ksen/papers/tesma.pdf
CIVL        https://vsl.cis.udel.edu/civl/
CLOUD9      http://cloud9.epfl.ch/
CREST       https://www.burn.im/crest/
CUTE        http://dl.acm.org/citation.cfm?doid=1081706.1081750
DART        http://dl.acm.org/citation.cfm?id=1065036
FuzzBALL    http://bitblaze.cs.berkeley.edu/fuzzball.html
janala2     https://github.com/ksen007/janala2
Jalangi2    https://github.com/Samsung/jalangi2
jCUTE       https://github.com/osl/jcute
JDART       https://github.com/psycopaths/jdart
JPF(-SE)    https://babelfish.arc.nasa.gov/trac/jpf
KeY         https://www.key-project.org/
KITE        http://www.cs.ubc.ca/labs/isd/Projects/Kite/
KLEE        http://klee.github.io/
MAYHEM      http://forallsecure.com/mayhem.html
McVeto      https://research.cs.wisc.edu/wpis/papers/cav10-mcveto.pdf
MIASM       https://github.com/cea-sec/miasm
OTTER       https://bitbucket.org/khooyp/otter/overview
PATHGRIND   https://github.com/codelion/pathgrind
PEX         http://research.microsoft.com/en-us/projects/pex/
PYEXZ3      https://github.com/thomasjball/PyExZ3
PYSYMEMU    https://github.com/feliam/pysymemu/
RUBYX       http://www.cs.umd.edu/~avik/papers/ssarorwa.pdf
S2E         http://s2e.epfl.ch/
SAGE        http://patricegodefroid.github.io/public_psfiles/ndss2008.pdf
SMART       http://dl.acm.org/citation.cfm?id=1190226
SYMDROID    http://www.cs.umd.edu/~jfoster/papers/symdroid.pdf
SYMJS       http://www.cs.utah.edu/~ligd/publications/SymJS-FSE14.pdf
TRITON      https://triton.quarkslab.com/

A.3 List of GNU Coreutils

Table A.2: List of tested GNU Coreutils from version 6.10 and 8.27

File utilities
Name in 6.10 in 8.27 Short description
chcon X X Changes file security context
chgrp X X Changes file group ownership
chmod X X Changes the permissions of a file or directory
chown X X Changes file ownership
cp X X Copies a file or directory
dd X X Copies and converts a file
df X Shows disk free space on file systems
dircolors X X Sets up color for ls
ginstall X X Copies files and sets attributes
ln X X Creates a link to a file
ls X X Lists the files in a directory
mkdir X X Creates a directory
mkfifo X X Makes named pipes (FIFOs)
mknod X X Makes block or character special files
mktemp X X Creates a temporary file or directory
mv X X Moves or renames files
rm X Removes (deletes) files
rmdir X Removes empty directories
shred X Overwrites a file to hide its contents, and optionally deletes it
sync X Flushes file system buffers
touch X Changes file timestamps

Text utilities
Name in 6.10 in 8.27 Short description
base64 X X Base64 encodes or decodes data and prints to standard output
cat X X Concatenates and prints files on the standard output
cksum X X Checksums and counts the bytes in a file
comm X X Compares two sorted files line by line
csplit X X Splits a file into sections determined by context lines
cut X X Removes sections from each line of files
expand X X Converts tabs to spaces
fmt X X Simple optimal text formatter
fold X X Wraps each input line to fit in specified width
head X X Outputs the first part of files
join X X Joins lines of two files on a common field
md5sum X X Computes and checks MD5 message digest
nl X X Numbers lines of files
od X Dumps files in octal and other formats
paste X Merges lines of files

pr X Converts text files for printing
ptx X Produces a permuted index of file contents
shuf X Generates random permutations
sort X Sorts lines of text files
split X Splits a file into pieces
sum X Checksums and counts the blocks in a file
tac X Concatenates and prints files in reverse order line by line
tail X Outputs the last part of files
tr X Translates or deletes characters
tsort X Performs a topological sort
unexpand X Converts spaces to tabs
uniq X Removes duplicate lines from a sorted file
wc X Prints the number of bytes, words, and lines in files

Shell utilities
Name in 6.10 in 8.27 Short description
basename X X Removes the path prefix from a given pathname
chroot X X Changes the root directory
date X X Prints or sets the system date and time
dirname X X Strips non-directory suffix from file name
du X X Shows disk usage on file systems
echo X X Displays a specified line of text
env X X Displays and modifies environment variables
expr X X Evaluates expressions
factor X Factors numbers
false X X Does nothing.
hostid X X Prints the numeric identifier for the current host
id X X Prints real or effective UID and GID
link X X Creates a link to a file
logname X X Prints the user's login name
nice X X Modifies scheduling priority
nohup X Allows a command to continue running after logging out
pathchk X Checks whether file names are valid or portable
pinky X A lightweight version of finger
printenv X Prints environment variables
printf X Formats and prints data
pwd X Prints the current working directory
readlink X Displays value of a symbolic link
runcon X Runs a command with a specified security context
seq X Prints a sequence of numbers
sleep X Delays for a specified amount of time
stat X Returns data about an inode
stty X Changes and prints terminal line settings
tee X Sends output to multiple files
tty X Prints terminal name

uname X Prints system information
unlink X Removes the specified file
uptime X Tells how long the system has been running
users X X Prints the user names of users currently logged into the current host
whoami X Prints the effective userid
who X X Prints a list of all users currently logged in
yes X Prints a string repeatedly

Other utilities
Name in 6.10 in 8.27 Short description
[ X A synonym for test, which permits expressions like [ expression ]
kill X X Closes the target process
setuidgid X Runs another program under a specified account's uid and gid

A.4 Results

All results gathered and visualized over the course of this thesis are shown within this section.

Figure A.1: Amount of Errors for all searcher combinations for the GNU Coreutils 6.10 on Hebbe

(Figure content: eight panels, (a) Default, (b) PR, (c) Covnew, (d) DFS, (e) PR-Covnew, (f) QCMDUrnd, (g) Random, (h) Triple; each panel plots the average number of found errors per tool: cut, head, ls, md5sum, mkdir, mkfifo, mknod, nl, paste, pr, ptx, seq, shred, sort, touch.)

Figure A.2: Time to first Error for all searcher combinations for the GNU Coreutils 6.10 on Hebbe

(Figure content: eight panels, (a) Default, (b) PR, (c) Covnew, (d) DFS, (e) PR-Covnew, (f) QCMDUrnd, (g) Random, (h) Triple; each panel plots the average time until the first error (in s, capped at 3600) per tool: cut, head, ls, md5sum, mkdir, mkfifo, mknod, nl, paste, pr, ptx, seq, shred, sort, touch.)

Figure A.3: Executable Line Coverage for all searcher combinations for the GNU Coreutils 6.10 on Hebbe

(Figure content: eight panels, (a) Default, (b) PR, (c) Covnew, (d) DFS, (e) PR-Covnew, (f) QCMDUrnd, (g) Random, (h) Triple; each panel plots the average executable line coverage in % over the tested tools.)

Figure A.4: Amount of Errors for all searcher combinations for the GNU Coreutils 8.27 on Hebbe

(Figure content: eight panels, (a) Default, (b) PR, (c) Covnew, (d) DFS, (e) PR-Covnew, (f) QCMDUrnd, (g) Random, (h) Triple; each panel plots the average number of found errors per tool: chcon, chgrp, chroot, cut, dd, ls, mv, nl.)

Figure A.5: Time to first Error for all searcher combinations for the GNU Coreutils 8.27 on Hebbe

(Figure content: eight panels, (a) Default, (b) PR, (c) Covnew, (d) DFS, (e) PR-Covnew, (f) QCMDUrnd, (g) Random, (h) Triple; each panel plots the average time until the first error (in s, capped at 3600) per tool: chcon, chgrp, chroot, cut, dd, ls, mv, nl.)

Figure A.6: Amount of Errors for all searcher combinations for the GNU Coreutils 6.10 on a dedicated server

(Figure content: five panels, (a) Default, (b) PR, (c) PR-Covnew, (d) QCMDUrnd, (e) Triple; each panel plots the average number of found errors per tool: chgrp, cp, cut, du, ls, md5sum, mkdir, mkfifo, mknod, paste, pr, ptx, seq, shred, sort, touch.)

Figure A.7: Time to first Error for all searcher combinations for the GNU Coreutils 6.10 on a dedicated server

(Figure content: five panels, (a) Default, (b) PR, (c) PR-Covnew, (d) QCMDUrnd, (e) Triple; each panel plots the average time until the first error (in s, capped at 3600) per tool: chgrp, cp, cut, du, ls, md5sum, mkdir, mkfifo, mknod, paste, pr, ptx, seq, shred, sort, touch.)