Legion: Best-First Concolic Testing

Legion: Best-First Concolic Testing Dongge Liu∗ Gidon Ernst The University of Melbourne LMU Munich School of Computing and Information Systems Software and Computational Systems Lab Melbourne, Victoria, Australia Munich, Bavaria, Germany [email protected] [email protected] Toby Murray Benjamin I.P. Rubinstein The University of Melbourne The University of Melbourne School of Computing and Information Systems School of Computing and Information Systems Melbourne, Victoria, Australia Melbourne, Victoria, Australia [email protected] [email protected] ABSTRACT 1 INTRODUCTION Concolic execution and fuzzing are two complementary coverage- Theseus killed Minotauros in the furthest section of the based testing techniques. How to achieve the best of both remains labyrinth and then made his way out again by pulling an open challenge. To address this research problem, we propose himself along the thread. —Pseudo-Apollodorus, Bib- and evaluate Legion. Legion re-engineers the Monte Carlo tree liotheca E1. 7 - 1. 9 trans. Aldrich search (MCTS) framework from the AI literature to treat automated The complexity of modern software programs are like labyrinths test generation as a problem of sequential decision-making un- for software testers to wander: their program states and execution der uncertainty. Its best-first search strategy provides a principled paths form a confusing set of connecting rooms and paths. Like the way to learn the most promising program states to investigate at Minotaur, faults often hide deep inside. One might guess at a fault’s each search iteration, based on observed rewards from previous possible location via static analysis, but in order to slay it Theseus iterations. Legion incorporates a form of directed fuzzing that we needs to know the path to it for sure, and the software tester needs call approximate path-preserving fuzzing (APPFuzzing) to investi- to know which input will trigger it. gate program states selected by MCTS. APPFuzzing serves as the In the myth of Theseus, the hero king finds Minotauros by ac- Monte Carlo simulation technique and is implemented by extending curately tracing past paths with a ball of thread, allowing him to prior work on constrained sampling. We evaluate Legion against learn and estimate the maze structure. We argue that the very same competitors on 2531 benchmarks from the coverage category of tricks, namely recording exact concrete execution traces and apply- Test-Comp 2020, as well as measuring its sensitivity to hyperpa- ing machine learning to estimate software structure and guide its rameters, demonstrating its effectiveness on a wide variety of input exploration, can also benefit coverage-based testing. programs. The focus of this paper is the quest of coverage-based testing, which is to cover as many paths in as little time as possible, dele- KEYWORDS gating Minotaur detection to separate tools (e.g. AddressSanitiser Concolic execution, constrained fuzzing, Monte Carlo tree search [25], UBSan [23], Valgrind [19], Purify [21]). Traditional methods for coverage-based testing have been dominated by the two com- ACM Reference Format: plimentary approaches of concolic execution (as exemplified by Dongge Liu, Gidon Ernst, Toby Murray, and Benjamin I.P. Rubinstein. DART [13] and SAGE [14]) and coverage-guided greybox fuzzing 2020. Legion: Best-First Concolic Testing. In 35th IEEE/ACM International (as exemplified by libFuzzer [24], AFL [36], its various extensions Conference on Automated Software Engineering (ASE ’20), September 21– such as AFLFast [6], AFLGo [5], CollAFL [12], Angora [10]. arXiv:2002.06311v3 [cs.SE] 23 Sep 2020 25, 2020, Virtual Event, Australia. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3324884.3416629 Continuing the mythological metaphor, with concolic execution one spends a long time rigorously planning each path through the maze via constraint solving, to make the correct turn at each ∗This research was supported by Data61 under the Defence Science and Technology branching point and ensure that no path will ever be repeated. Group’s Next Generation Technologies Program. However, such computation is expensive and, for most modern software, the maze is so large that repeating it for every path is infeasible. Permission to make digital or hard copies of part or all of this work for personal or In contrast, a coverage-guided fuzzer like AFL blindly scurries classroom use is granted without fee provided that copies are not made or distributed around the maze, neither spending much time on planning nor for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. accurately memorising the paths and structure traversed. Thus For all other uses, contact the owner/author(s). much time is inevitably spent unnecessarily repeating obvious ASE ’20, September 21–25, 2020, Virtual Event, Australia execution paths. © 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6768-4/20/09. Observing the complementary nature of these two methods, https://doi.org/10.1145/3324884.3416629 our research aims to generalise them with Theseus’s strategy. Our ASE ’20, September 21–25, 2020, Virtual Event, Australia Dongge Liu, Gidon Ernst, Toby Murray, and Benjamin I.P. Rubinstein tool Legion1 traces observed execution paths to estimate the maze program states deemed promising (i.e. those for which existing structure. Then it identifies the most promising location to explore statistics indicate are worthy of investigation). next, plans its path to reach that location, and applies a form of Legion’s adoption of MCTS is designed to ensure that com- directed fuzzing to explore extensions of the path to that location. putational power is used to explore the most beneficial program Legion precisely traces each concrete execution (i.e. fuzzing run) locations, as determined by the score function. In Legion scores and gathers statistics to refine its knowledge of the maze to inform are evaluated according to a modularised reward heuristic, with its decisions about where to explore next. interchangeable coverage metrics as discussed in Section 5.1. Statistics gathering is central to Legion’s approach, unlike tradi- Our contributions are: tional fuzzers like AFL that eschew gathering detailed statistics to • We propose a variation of Monte Carlo tree search (MCTS) save time. Instead, Legion aims to harness the power of modern that maintains a balance between concolic execution and machine learning algorithms, informed by detailed execution traces fuzzing, to mitigate the path explosion problem of the former as in other contemporary tools [1]. and the redundancy of the latter (Section 3). Real-life coverage testing is more complicated than the Theseus • We propose approximate path-preserving fuzzing, which ex- myth, as it requires a universal strategy that can adjust itself ac- tends a constrained sampling technique, QuickSampler [11], cording to different program structures: i.e. to fuzz the program to generate inputs efficiently that preserve a given path with parts that are more suitable to fuzz, and favour concolic execution high probability (Section 4.2). elsewhere. However, how to determine the best balance between • We conduct experiments to demonstrate that Legion is these two strategies remains an open question. competitive against other state-of-the-art approaches (Sec- To address this challenge, Legion adopts the Monte Carlo tree tion 6.2.1), and evaluate the effect of different MCTS hyper- search (MCTS) algorithm from machine learning [17]. MCTS has parameter settings. proven to work well in complex games like Go [27], multiplayer board games [29, 32] and poker [30], as it can adapt to the game it 2 OVERVIEW is playing via iterations of simulations and reward analysis. Specif- ically, MCTS learns a policy by successively simulating different Legion generalises the two traditional approaches to coverage- plays and tracking the rewards obtained during each. In Legion, based testing: concolic execution and coverage-guided fuzzing. plays correspond to concrete execution (directed fuzzing), while re- Concolic execution relies on a constraint solver to generate con- wards correspond to increased coverage (discovering new execution crete inputs that can traverse a particular path of interest. New paths paths). More importantly, MCTS’s guiding principle of optimism in are selected by flipping path constraints of previously-observed the face of uncertainty is appropriate for exploring a maze with an execution traces, thereby attempting to cover all feasible execution unknown structure, randomness, and large branching factors where paths. However, it suffers from the high computation cost ofcon- rigorously analysing every detail is infeasible. Instead, MCTS bal- straint solving and exponential path growth in large applications. ances exploitation of the branches that appear to be most rewarding Coverage-guided fuzzing has become increasingly popular dur- based on past experience, against exploration of less-well under- ing the past decade due to its simplicity and efficiency [6, 33]. It gen- stood parts of the maze where rewards (path discovery) are less erates randomised concrete inputs at low cost (e.g. via bit flipping), certain. by mutating inputs previously observed to lead to new execution With the two tricks of Theseus, Legion provides a principled paths. However, the resulting inputs more often than not

Load more