ExploitMeter: Combining Fuzzing with Machine Learning for Automated Evaluation of Software Exploitability

Guanhua Yan, Junchen Lu, Zhan Shu, Yunus Kucuk
Department of Computer Science, Binghamton University, State University of New York
{ghyan, jlu56, zshu1, [email protected]}

978-1-5386-1027-5/17/$31.00 © 2017 IEEE

Abstract—Exploitable software vulnerabilities pose severe threats to information security and privacy. Although a great amount of effort has been dedicated to improving software security, research on quantifying software exploitability is still in its infancy. In this work, we propose ExploitMeter, a fuzzing-based framework for quantifying software exploitability that facilitates decision-making for software assurance and cyber insurance. Designed to be dynamic, efficient, and rigorous, ExploitMeter integrates machine learning-based prediction and dynamic fuzzing tests in a Bayesian manner. Using 100 Linux applications, we conduct extensive experiments to evaluate the performance of ExploitMeter in a dynamic environment.

I. INTRODUCTION

Software security plays a key role in ensuring the trustworthiness of cyber space. Due to the blossoming underground market for zero-day software exploits, a large portion of cyber crimes are committed through the exploitation of vulnerable software systems. Unfortunately, it is unlikely that software vulnerabilities can be eradicated in the foreseeable future: modern software systems have become so complicated that it is almost impossible for software programmers, with only limited cognitive capabilities, to test all their corner cases, let alone that unsafe languages like C are still in wide use.

In this work, we explore the following fundamental question: given the possibly many software vulnerabilities in a software system, can we quantify its exploitability? Quantification of software exploitability finds applications in two important settings: software assurance and cyber insurance. In the National Information Assurance Glossary [30], software assurance is defined as the "level of confidence that software is free from vulnerabilities, either intentionally designed into the software or accidentally inserted at anytime during its lifecycle and that the software functions in the intended manner." Quantifiable software exploitability offers a quantitative measure of such confidence in the non-existence of exploitable software vulnerabilities and thus facilitates decision-making in deployments of security-critical software programs. On the other hand, the emerging cyber insurance market calls for rigorous methods that insurance companies can use to quantitatively assess the risks associated with insureds who use potentially vulnerable software systems.

As suggested in [33], quantifiable security measures are hard to achieve due to the adversarial and dynamic nature of operational cyber security. Indeed, the landscape of software exploitation is constantly changing as new exploitation techniques are developed and new vulnerability mitigation features are deployed. A survey of software vulnerabilities targeting the Microsoft Windows platform from 2006 to 2012 revealed that the percentage of exploits for stack corruption vulnerabilities has declined while that of exploits for use-after-free vulnerabilities has been on the rise [26].

Although many efforts have been dedicated to improving software security, there is still no coherent framework for quantifying software exploitability in a dynamic operational environment, as needed by both software assurance and cyber insurance. In industry, CVSS (the Common Vulnerability Scoring System) [1] is widely used to estimate the severity of known software vulnerabilities, including exploitability metrics calculated from their attack vectors, attack complexities, privileges required, and user interactions. In addition to CVSS, other methods have been proposed to assess software security, such as attack surface metrics [22], vulnerability density metrics [7], [9], reachability from entry points with dangerous system calls [39], and machine learning-based predictions [12], [19]. However, none of these methods allows us to quantify software exploitability in a dynamic execution environment.

Against this backdrop, we propose a new framework called ExploitMeter aimed at assessing software exploitability quantitatively and dynamically to facilitate decision-making for software assurance and cyber insurance. At the heart of the ExploitMeter framework is a Bayesian reasoning engine that mimics the cognitive process of a human evaluator: the evaluator first derives her prior confidence in the exploitability of a software system from machine learning-based predictions using its static features, and then updates her beliefs in software exploitability with new observations from a number of dynamic fuzzing tests with different fuzzers. Moreover, the evaluator's experiences with these different fuzzers are used to update their perceived performances (also in a Bayesian manner), and these performance measures form the basis for the evaluator to quantify software exploitability. Hence, the reasoning engine of ExploitMeter can be characterized as a dynamic nonlinear system whose inputs are taken from machine learning-based predictions and dynamic fuzzing tests.

Towards the end of building the ExploitMeter framework, our contributions in this work are summarized as follows:
• We extract various features from static analysis of software programs, from which we train classification models to predict the types of vulnerabilities that a software program may have (e.g., stack overflow and use-after-free). The classification performances of these classifiers are used by the evaluator to derive her initial belief levels in the prediction results.
• For each software program under test, we use various fuzzers to generate crashes, from which we identify the types of software vulnerabilities that have caused the crashes. For each type of software vulnerability discovered, we use the Bayesian method to calculate the evaluator's posterior beliefs in the vulnerability of the software.
• Based on probability theory, we combine the exploitability scores from the different types of software vulnerabilities that a software program contains to generate its final exploitability score.

We make the source code of ExploitMeter available at: http://www.cs.binghamton.edu/~ghyan/code/ExploitMeter/. To demonstrate its practicality, we perform extensive experiments in which two different software fuzzers are used to fuzz 100 standard Linux utilities inside a virtual machine. We use ExploitMeter under different configurations to quantify the exploitabilities of these programs dynamically, thereby gaining insights into how ExploitMeter facilitates decision-making in practice for software assurance and cyber insurance.

The remainder of the paper is organized as follows. Section II summarizes the related work. Section III provides the practical backdrop of ExploitMeter. The main methodologies adopted by ExploitMeter are discussed in Section IV. In Section V, we present the experimental results, and we draw concluding remarks in Section VI.

II. RELATED WORK

Many efforts have been dedicated to improving the efficiency of software vulnerability discovery. These techniques largely fall into two

[…] text-based features extracted from their descriptions in two public vulnerability data sources. The prediction model proposed in [12] cannot be used to predict the exploitability of zero-day vulnerabilities. In [19], Grieco et al. extracted sequences of C standard library calls from both static analysis and dynamic execution of software programs to classify whether they are vulnerable to memory corruption. Although this work applies directly to executable programs, it does not offer a quantifiable measure of software exploitability as ExploitMeter does. Common to all these previous works is that they train models from historical data to predict characteristics of future threats. ExploitMeter differs from these works because its predictive models are only used to derive prior beliefs, and these beliefs are dynamically updated with new test results. This effectively addresses the degradation of predictability in a dynamic environment where there is an arms race between software exploitation techniques and threat mitigation features.

ExploitMeter also relates to some recent works on automatic generation of software exploits, such as AEG [10] and Mayhem [14]. Although the design of ExploitMeter has been inspired by these works, it is unnecessary to find actionable exploits against a vulnerable program to evaluate its exploitability. The methodology adopted by ExploitMeter is similar to our previous work [38], which applies a Bayesian approach to quantifying software exploitability. However, that work is mostly theoretical and does not specify how to derive prior beliefs or what tools should be used for testing software vulnerabilities. In contrast, ExploitMeter provides a practical framework for quantifying software exploitability.

III. BACKGROUND

Quantifiable software exploitability facilitates decision-making for both software assurance and cyber insurance. To explain the motivation, we provide an example scenario for each of these two applications:
• Software assurance: Consider a security-critical environment where each software program running inside it should be immune to exploitation, even though the software may
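The introduction describes a Bayesian loop in which a machine-learning prediction supplies a prior belief that fuzzing observations then update. The following is a minimal sketch of that idea, not the paper's actual model: the Beta-Bernoulli formulation, the prior-strength constant, and all function names are our own illustrative assumptions.

```python
# Illustrative Beta-Bernoulli sketch of the Bayesian loop described in the
# introduction. All names and the prior-strength constant are assumptions
# for this example, not identifiers from ExploitMeter itself.

def prior_from_classifier(p_predicted, strength=10.0):
    """Turn a classifier's predicted probability that a vulnerability type
    is present into Beta(alpha, beta) pseudo-counts; `strength` encodes how
    much the evaluator trusts the classifier."""
    return p_predicted * strength, (1.0 - p_predicted) * strength

def update_with_fuzzing(alpha, beta, crash_of_this_type):
    """One fuzzing campaign adds one pseudo-observation: a crash of the
    predicted type supports the belief, a clean run weakens it."""
    return (alpha + 1.0, beta) if crash_of_this_type else (alpha, beta + 1.0)

def belief(alpha, beta):
    """Posterior mean probability that the vulnerability type is present."""
    return alpha / (alpha + beta)

# The classifier predicts a 30% chance of a stack-overflow vulnerability;
# two fuzzing campaigns then each produce a stack-corrupting crash.
a, b = prior_from_classifier(0.3)
for observed in (True, True):
    a, b = update_with_fuzzing(a, b, observed)
print(round(belief(a, b), 3))  # → 0.417, up from the 0.3 prior
```

Fuzzing evidence thus moves the belief away from the static prior, which matches the paper's stated goal of keeping predictions current in a changing threat landscape.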
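The third contribution combines per-vulnerability-type exploitability scores into a single final score "based on probability theory," but the excerpt does not give the exact rule. One natural probabilistic combination, shown purely as our own assumption, is a noisy-OR over independent per-type scores:

```python
# Hedged illustration only: ExploitMeter's actual combination rule is not
# given in this excerpt. Under an independence assumption, the probability
# that at least one vulnerability type is exploitable is a noisy-OR.

def combine_exploitability(type_scores):
    """Return 1 - P(no type is exploitable), treating per-type scores as
    independent probabilities."""
    p_none = 1.0
    for p in type_scores:
        p_none *= 1.0 - p
    return 1.0 - p_none

# e.g. stack overflow 0.4, use-after-free 0.2, format string 0.1:
print(round(combine_exploitability([0.4, 0.2, 0.1]), 3))  # → 0.568
```

A convenient property of this rule is monotonicity: adding any vulnerability type with a nonzero score can only raise the final exploitability score, never lower it.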
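Section I notes that CVSS derives exploitability metrics from attack vector, attack complexity, privileges required, and user interaction. For concreteness, the sketch below evaluates the published CVSS v3.0 exploitability sub-score; the paper itself does not commit to a CVSS version, so this is only an illustration (weights are the v3.0 specification values, with PR weights for an unchanged scope).

```python
# CVSS v3.0 exploitability sub-score: 8.22 * AV * AC * PR * UI.
# Metric weights from the CVSS v3.0 specification (PR for unchanged scope).

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}  # network/adjacent/local/physical
AC = {"L": 0.77, "H": 0.44}                        # low/high attack complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}             # privileges: none/low/high
UI = {"N": 0.85, "R": 0.62}                        # user interaction: none/required

def exploitability(av, ac, pr, ui):
    """Exploitability sub-score of a vulnerability from its CVSS v3.0 vector."""
    return 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]

# A remotely reachable, low-complexity flaw needing no privileges and no
# user interaction reaches the maximum sub-score of about 3.9:
print(round(exploitability("N", "L", "N", "N"), 1))  # → 3.9
```

Note the contrast with ExploitMeter: this sub-score is a static property of a known vulnerability's description, whereas ExploitMeter's score is updated dynamically as fuzzing evidence accumulates.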