Approximating Attack Surfaces with Stack Traces

Christopher Theisen†, Kim Herzig‡, Patrick Morrison†, Brendan Murphy‡, Laurie Williams† [email protected], [email protected], [email protected], [email protected], [email protected]

†Department of Computer Science, NCSU, Raleigh, North Carolina ‡Microsoft Research, Cambridge, UK

Abstract – Security testing and reviewing efforts are a necessity for software projects, but are time-consuming and expensive to apply. Identifying vulnerable code supports decision-making during all phases of software development. An approach for identifying vulnerable code is to identify its attack surface, the sum of all paths for untrusted data into and out of a system. Identifying the code that lies on the attack surface requires expertise and significant manual effort. This paper proposes an automated technique to empirically approximate attack surfaces through the analysis of stack traces. We hypothesize that stack traces from user-initiated crashes have several desirable attributes for measuring attack surfaces. The goal of this research is to aid software engineers in prioritizing security efforts by approximating the attack surface of a system via stack trace analysis. In a trial on Windows 8, the attack surface approximation selected 48.4% of the binaries and contained 94.6% of known vulnerabilities. Compared with vulnerability prediction models (VPMs) run on the entire codebase, VPMs run on the attack surface approximation improved recall from .07 to .1 for binaries and from .02 to .05 for source files. Precision remained at .5 for binaries, while improving from .5 to .69 for source files.

Index Terms—stack traces, security, vulnerability, models, testing, reliability, attack surface.

I. INTRODUCTION

Howard et al. [17] introduced the concept of an attack surface, describing entry points to a system that might be vulnerable along three dimensions: targets and enablers, channels and protocols, and access rights. Later, Manadhata and Wing [18] formalized the notion of attack surface, including methods, channels, untrusted data, and a direct and indirect entry and exit point framework that identifies methods through which untrusted data passes. Security professionals can focus their efforts on code on the attack surface because it contains vulnerabilities that are reachable, and therefore exploitable, by malicious users. Code not on the attack surface may contain latent vulnerabilities, but these are unreachable by malicious users. With this prioritization, the security professional could find vulnerabilities more efficiently.

As valuable as the concept of attack surface is, we still lack a practical means of identifying the parts of the system that are contained on the attack surface. Manadhata and Wing proposed metrics to measure attack surface size, and empirically measured direct entry points by building a call graph, but they left identifying indirect entry and exit points for future work. Other approaches to defining the attack surface have operated at a configuration level without considering code, and approaches identifying exposed code using API calls have required significant manual work [18].

We propose attack surface approximation, an automated approach to identifying parts of the system that are contained on the attack surface through stack trace analysis. We parse stack traces, adding all code found in these traces onto the attack surface approximation. By definition, code that appears in stack traces caused by user activity is on the attack surface because it appears in a code path reached by users.

We hypothesize that stack traces from user-initiated crashes have three desirable attributes for measuring attack surfaces: (a) they represent user activity that puts the system under stress; (b) they include both direct and indirect entry points; and (c) they provide automatically generated control and data flow graphs. We seek to assess the degree to which these attributes of stack traces support the identification of attack surfaces. We call our approach attack surface approximation because code entities will only be added to the attack surface when a crash has occurred. As such, the attack surface approximation will evolve over time. We assess our approach by analyzing the percentage of actual reported vulnerabilities in the code and whether they occur in our approximated attack surface. The goal of this research is to aid software engineers in prioritizing security efforts by approximating the attack surface of a system via stack trace analysis.

We explore the following questions as a part of this paper:

RQ1: How effectively can stack traces be used to approximate the attack surface of a system?

RQ2: Can the performance of vulnerability prediction be improved by limiting the prediction space to the approximated attack surface?

We build an attack surface approximation for the Windows operating system based on stack traces. This is a completely automated process, requiring no human judgment as to what is and is not considered on the attack surface, unlike previous approaches requiring human input. To assess our approximation, we compared the set of known security vulnerabilities from earlier versions of Windows 8 against this attack surface.

The contributions of this paper include:

• A practical, automated attack surface approximation based upon analysis of stack traces
• An evaluation of attack surface approximation performance for vulnerability prediction in Windows 8
• Visualizations of the attack surface approximation for use during security reviews to find security-related relationships between code

The rest of the paper is organized as follows: Section II discusses background and related work, Section III discusses the stack trace data sources, Section IV presents our methodology, Section V discusses vulnerability prediction model (VPM) construction for our evaluation, Section VI presents our results and a discussion of these results, Section VII discusses our lessons learned and challenges, Section VIII presents limitations and threats to validity, and Section IX discusses future work.

II. BACKGROUND AND RELATED WORK

Vulnerabilities can be seen as a special case of software defects [1]. Vulnerabilities tend to be sparser than general software defects [41], as not all defects may allow an attacker to gain anything. In this section, we provide a brief overview of related work.

A. Attack Surface

As mentioned previously, Howard et al. [17] provided a definition of attack surface using three dimensions: targets and enablers, channels and protocols, and access rights. Not all areas of a system may be directly or indirectly exposed to the outside. Some parts of a complex system, e.g. the Windows OS, may be for internal use only and cannot be reached or exploited by an attacker. For example, installation routines are left in the system after initialization, but they are never accessed again and are unlikely to have security implications for the system.

Knowing the attack surface of a piece of software supports decision-making during all phases of software development. To date, approaches to empirical measurement of attack surfaces have relied on manual effort or on alternative definitions of 'attack surface'. Tools like Microsoft's Attack Surface Analyzer1 determine where potential input vectors exist on a system. However, this tool currently focuses on delivered systems that are code-static; it detects configuration changes, not code changes.

Manadhata et al. [43] describe how an attack surface might be approximated by looking at API entry points. However, this approach does not cover all exposed code, as the authors mention. Specifically, internal flow of data through a system could not be identified. While the external points of a system are a useful place to start, they do not encompass the entirety of exposed code in the system. These intermediate points within the system could also contain security vulnerabilities that the reviewer should be aware of. Further, their approach to measuring attack surfaces required expert judgment and manual effort.

B. Exploiting Crash Reports

The use of crash reporting systems, including stack traces from the crashes, is becoming a standard industry practice2 [24][26]. Bug reports contain information to help engineers replicate and locate software defects. Liblit and Aiken [20] introduced a technique automatically reconstructing complete execution paths using stack traces and execution profiles. Later, Manevich et al. [21] added data flow analysis information to Liblit and Aiken's approach. Other studies use stack traces to localize the exact fault location [22][23][24]. Lately, an increasing number of empirical studies use bug reports and crash reports to cluster bug reports according to their similarity and diversity; Podgurski et al. [25] were among the first to take this approach, and other studies followed [26][27]. Not all crash reports are precise enough to allow for this clustering. Guo et al. [28] used crash report information to predict which bugs will get fixed. Bettenburg et al. [29] assessed the quality of bug reports to suggest better and more accurate information helping developers to fix the bug.

With respect to vulnerabilities, Huang et al. [30] used crash reports to generate new exploits, while Holler et al. [31] used historic crash reports to mutate corresponding input data to find incomplete fixes. Kim et al. [32] analyzed security bug reports to predict "top crashes" (those few crashes that account for the majority of crash reports) before new software releases.

However, we are not aware of any study parsing stack traces to identify and predict vulnerable source files or to measure vulnerability exposure of software artifacts.

C. Defect Prediction Models

The goal of defect prediction models (DPMs) is to identify code that is most likely to contain defects. For example, Nagappan and Ball [3] showed that code churn metrics can be used to predict defect density, the number of defects per line of code. Later, Zimmermann et al. [4] used code dependency information and network metrics to classify defect prone code. Other studies used change-related [1], developer-related [5], organizational [6], process [7], change dependency [8], and test [9] metrics to build DPMs. Hall et al. [10] presented a systematic literature review of DPMs and showed that model-building methodology impacts prediction accuracy. DPMs are the basis for Vulnerability Prediction Models.

D. Vulnerability Prediction Models

Similar to DPMs, Vulnerability Prediction Models (VPMs) predict code with the highest chance of containing vulnerabilities. Studying VPMs for Microsoft Windows, Zimmermann et al. concluded that vulnerabilities are not as simple to predict as defects [11].

1 http://www.microsoft.com/en-us/download/details.aspx?id=24487
2 http://www.crashlytics.com/blog/its-finally-here-announcing-crashlytics-for-android/

Figure 1 - Example of an anonymized stack trace. From the stack trace, we can identify binary, function, and sometimes source file information. Each of these stack traces can be transformed into a graphical representation. Each stack trace contributes one particular path (see middle) to the overall attack surface graph (see right-hand side).

A study on Mozilla's Firefox web browser showed that, on Firefox, the fault prediction model and the vulnerability prediction model provided similar prediction performance [12]. In general, many VPMs presented in the literature are based on the same basic principles as DPMs, with adjustments to account for the relative rarity of security vulnerabilities. Transferring the concept of DPMs to VPMs, many studies use complexity, code churn, or static-alert measurements to predict vulnerabilities [13][14][15]. Published VPMs [16] have had challenges achieving the precision and recall rates that DPMs have been able to achieve, presumably due to the relative rarity of vulnerabilities. As a result, VPMs may not be considered for practical use. Neuhaus et al. [16] used historical data on imports and functions. VPMs are used in this work to evaluate whether attack surface approximation improves the state of the art in vulnerability prediction.

III. DATA SOURCES

The stack traces used in this research are from the Windows 8 operating system. Stack traces for this research come from three complementary sources: fuzzing crashes, user mode crashes, and kernel mode crashes. We keep track of the source of each crash as we parse each stack trace. We now describe each crash source.

A. Fuzzing Crashes

Microsoft's security testing teams generate fuzzing crashes. Fuzzing is a testing strategy revolving around sending random or deliberately malformed/malicious data to the input points of a system. The goal of fuzzing is to simulate attacks and to get the system to behave in an unexpected manner. Typically, any response from the system that differs from a standard error message is flagged and investigated. Possible results from fuzz testing include crashes, memory leaks, and security bugs such as data loss or improper access. Fuzzing crashes are a useful data source, as any input generated and fed into the system under test could also originate from a user and thus from a potential hacker. Fuzzing aims to discover security vulnerabilities, and any irregular fuzzing results are relevant for determining what code areas were involved in handling the malformed input.

B. User Mode Crashes

Windows users generate user mode crashes. User mode crashes are field/customer crashes that are not due to hardware failures. Crashes in applications running without administrator rights trigger the collection of crash information that may be sent to Microsoft. The system responsible for collecting the data, Microsoft Error Reporting [19], is executed under the same user privileges as the user that ran the crashed application. However, running in user mode limits the stack trace generator's ability to access and resolve which resources were involved in the crash. Thus, user mode crashes may only identify code areas that are accessible by the user in that mode.

C. Kernel Mode Crashes

Kernel crashes occur on field/customer machines, but unlike application crashes, they occur within the Windows kernel. In general, kernel crashes indicate more severe failures and usually include full resolution of code artifacts, as long as they are part of the Windows system. The kernel runs under administrative privileges, allowing the stack trace generator to gather more granular details about the system.

D. Known Vulnerabilities

To measure the effectiveness of our approach, we required a set of vulnerabilities to compare our approach against. For this purpose, we use the set of vulnerabilities seen in Windows 8 both pre- and post-release of the product. We then check to see if these vulnerabilities appear on our attack surface approximation to determine whether they would be missed by our approach.

IV. AUTOMATED APPROXIMATION METHODOLOGY (RQ1)

Stack traces identify code that was loaded in memory at the time of the crash. Typically used for debugging, traces define the logical path between an external input and the system crash. Through the parsing of crashes, we can build a set of artifacts that users have had influence over. While bugs can appear in any artifact, the subset of code appearing in these stack traces has important security implications. Malicious users can only exploit vulnerabilities that they can access. If a bad check on incoming data could result in a buffer overflow attack in a specific function, but no outside user can ever pass input to that bad check, then that vulnerability has a lower priority unless code or configuration changes cause it to be exposed. By de-prioritizing code unlikely to be on the execution path, we can limit the scope of what security professionals have to review by having them focus on files that outside users can pass input to.

To identify artifacts that may be reached, and thus exploited, by users, we use stack traces of existing crashes to generate our data set. These stack traces identify entities that were executed just before the crash took place. Figure 1 shows an anonymized example stack trace describing a crash of Windows. Each line of the stack trace identifies code artifacts on the memory stack prior to the crash occurring. The order of lines, and thus the order in which these artifacts occur, represents the time sequence of events. The initial entry point is the top line item on a stack trace, and should in most cases be the first artifact touched by outside user input. Consecutive lines in a stack trace identify artifacts that called each other; if a function in foo.cpp calls a function in bar.cpp, we say that there is a direct neighbor relationship from foo.cpp to bar.cpp.

Each line of a stack trace is organized as follows. The binary is shown at the beginning of the string, followed by a "!" delimiter and the function name. In the square brackets, the full path of the file associated with this binary/function relationship is shown. Not all stack traces will include the name of the source file. Some stack traces may even display anonymous placeholders for functions and binaries, depending on the permissions and ability to identify these details during runtime. For example, Windows stack traces contain no details about artifacts outside Windows, e.g. a third-party application causing the crash.
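To make the frame format concrete, the following sketch parses lines of the form described above: binary, a "!" delimiter, the function name, and an optional bracketed file path. This is a minimal Python illustration, not Microsoft's internal tooling; the regular expression, field names, and example lines are all assumptions based on the description above.

```python
import re
from typing import NamedTuple, Optional

# One frame per stack trace line: binary!function [path\to\file]
# The bracketed file path is optional; some traces omit it entirely.
FRAME_RE = re.compile(
    r"^(?P<binary>[^!\s]+)!(?P<function>[^\[\]]+?)"
    r"(?:\s*\[(?P<file>.+)\])?\s*$")

class Frame(NamedTuple):
    binary: str
    function: str
    file: Optional[str]  # None when the trace omits the source file

def parse_frame(line: str) -> Optional[Frame]:
    """Parse a single stack trace line; return None if unparsable."""
    m = FRAME_RE.match(line.strip())
    return Frame(m["binary"], m["function"], m["file"]) if m else None

# Hypothetical, anonymized lines in the style of Figure 1:
trace = ["binaryA!FunctionName1 [d:\\windows\\src\\foo.cpp]",
         "binaryB!FunctionName2"]
frames = [f for f in map(parse_frame, trace) if f is not None]
```

In practice, as Section VII notes, different parts of a system emit differently formatted traces, so a production parser would need one such pattern per format.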
prioritizing code unlikely to be on the execution path, we can When doing mapping from stack traces to actual entities limit the scope of what security professionals have to review by within the system, sometimes mappings are unable to be made. having them focus on files that outside users can pass input to. Two examples of this are when errors occur storing the stack To identify artifacts that may be reached and thus exploited trace, such as when the system is under duress, and from users, we use stack traces of existing crashes to generate mismatched names between the report to crash handlers and our data set. These stack traces identify entities that were data about the system. When a mapping is unable to be made, executed just before the crash took place. Figure 1 shows an we label that entity as “unknown,” and do not place that entity anonymized example stack trace describing a crash of on the attack surface. Windows. Each line of the stack trace identifies code artifacts For this work, we specifically remove hardware related on the memory stack prior to the crash occurring. The order of crashes as errors resulting from such hardware failures do not lines and thus the order in which these artifacts occur represent indicate a potential input vector for potential attackers. The the time sequence of events. The initial entry point is the top identification of hardware crashes is done by an automated line item on a stack trace, and should in most cases be the first stack trace classification system within Microsoft. Code that is artifact touched by outside user input. Consecutive lines in a inaccessible by user activity cannot be manipulated by an stack trace identify artifacts that called each other; if a attacker, and therefore is not to be on the attack surface. The function in foo.cpp calls a function in bar.cpp, we say that assumption carries forward when discussing our results below. there is a direct neighbor relationship from foo.cpp to bar.cpp. The ultimate output used by the development and security Each line of a stack trace is organized as follows. The binary teams is a classification of whether an entity is on or off the is shown at the beginning of the string, followed by a “!” attack surface. This classification can be used for prioritizing delimiter and the function name. In the square brackets, the full defect fixing and validation and verification efforts. path of the file associated with this binary/function relationship V. VULNERABILITY PREDICTION MODEL CONSTRUCTION is shown. Not all stack traces will include the name of the METHODOLOGY (RQ2) source file. Some stack traces may even display anonymous placeholders for functions and binaries, depending on the In addition to the classification scheme discussed in Section permissions and ability to identify these details during runtime. IV, we limit the prediction space to the approximated attack For example, Windows stack traces contain no details about surface for Vulnerability Prediction Models (VPMs) to see if artifacts outside Windows, e.g. a third-party application we get better prediction accuracy measurement when compared causing the crash. to VPMs that do not use attack surface information. Each stack trace is parsed and separated into individual We replicated VPMs for Windows, as published by artifacts, including binary name, function name, and file name. Zimmermann at al. [11]. The experiments conducted by We then map each of these artifacts to code as they are named Zimmerman et al. 
V. VULNERABILITY PREDICTION MODEL CONSTRUCTION METHODOLOGY (RQ2)

In addition to the classification scheme discussed in Section IV, we limit the prediction space to the approximated attack surface for Vulnerability Prediction Models (VPMs) to see whether we get better prediction accuracy when compared to VPMs that do not use attack surface information.

We replicated VPMs for Windows, as published by Zimmermann et al. [11]. The experiments conducted by Zimmermann et al. were conducted on datasets collected for Windows Vista, a product of significant size. Choosing the same product as the original study enables a comparison with the original study, giving insight into how vulnerability prediction metrics in a codebase change over time.

The VPM described by Zimmermann et al. [11] is based on static code metrics and pre- and post-release vulnerabilities. As an analogue to the original study, we gathered these measurements using the CODEMINE process [33]. Microsoft developed CODEMINE to allow the company to monitor the development attributes of its products during development and after product release. The CODEMINE process provides a central repository of development and vulnerability metrics that were used within this research study.

A. Code Metrics

The VPMs developed by Zimmermann et al. and replicated in our study are based on 29 metrics broadly classified into 6 categories:

• Churn metrics [3]. Churn measures are relative to a time period; the period for all presented calculations is between the start and Release to Manufacturing (RTM) date of the project.
• Complexity metrics [38]. More complicated code is more likely to exhibit errors.
• Dependency metrics [13]. The degree to which a piece of code is depended upon, or depends upon other code, influences its impact on software vulnerabilities.
• Legacy metrics. Metrics of particular interest to Microsoft. The importance of security in the development of software at Microsoft began receiving increased attention after Bill Gates' 2002 Trustworthy Computing memo [34], with significant investments made in security training, tools, and process [2]. Code written after these processes were put in place has had a higher, more process-driven, level of attention to security applied in its design, construction and testing. These metrics are evidence for the theory that code written before the security reset may be more likely to contain vulnerabilities.
• Size metrics. Larger source files are more difficult to mentally manage, and, therefore, are more prone to defects and vulnerabilities.
• Pre-release vulnerabilities. For VPMs predicting post-release vulnerabilities, we used pre-release vulnerabilities to model the usual suspects.

Table 2, in the appendix, identifies all metrics used in the study and provides a description of each metric. Where noted, average, maximum and total values were taken for several of the metrics. Depending on the metric, data was available at either the source file level or at the function level. In cases where function level data was present, amounts were aggregated up to the file level via averages, totals and maximums. Binary-level data was obtained by aggregating source-file level data up to the binary in which each source file is used, as sketched below. This study uses additional metrics that were not available at the time of the original study by Zimmermann et al. [11]. The table identifies which metrics are common between the two studies and which are unique to this study. All size, churn, complexity, and dependency metrics were measured as of each release's Release to Manufacturing date.
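As an illustration of this roll-up, the sketch below aggregates a function-level metric to file level via average, total, and maximum, and then file-level values to binary level. It is a pandas sketch with hypothetical column names; the actual metric set is listed in Table 2.

```python
import pandas as pd

# One row per function, tagged with its owning file and binary.
func_df = pd.DataFrame({
    "binary":     ["a.dll", "a.dll", "b.dll"],
    "file":       ["foo.cpp", "foo.cpp", "bar.cpp"],
    "complexity": [4, 10, 7],   # e.g. McCabe complexity per function
})

# Function level -> file level: Avg, Sum, Max per metric.
file_df = (func_df.groupby(["binary", "file"])["complexity"]
           .agg(ComplexityAvg="mean", ComplexitySum="sum",
                ComplexityMax="max"))

# File level -> binary level: aggregate the file-level values again.
binary_df = file_df.groupby("binary").agg(
    {"ComplexityAvg": "mean", "ComplexitySum": "sum",
     "ComplexityMax": "max"})
```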
Code overall data collection into a training dataset that contains 2/3 written after these processes were put in place has had a of all data points. The remaining data is used for testing higher, more process-driven, level of attention to security purposes. To split the data, we used stratified sampling—the applied in its design, construction and testing. These ratio of code entities associated with vulnerabilities from the metrics are evidence for the theory that code written original dataset is preserved for both subsets. We repeatedly before the security reset may be more likely to contain sampled the original dataset 100 times (100-cross-fold- vulnerabilities. validation). In total, we generated 200 independent training • Size metrics. Larger source files are more difficult to and testing sets: two levels of granularity and 100-cross folds mentally manage, and, therefore, are more prone to each (similar to [4][11]). defects and vulnerabilities. We remove all code entities that are not part of the attack • Pre-Release vulnerabilities. For VPMs predicting post- surface of Windows 8.1 from both training and testing sets release vulnerabilities, we used pre-release vulnerabilities before sampling takes place. Thus, classification models to model usual suspects. referring to attack surfaces contain less data points. Table 2, in the appendix, identifies all metrics used in the We conducted the experiments using the R statistical study and provides a description of each metric. Where noted, software [35] (Version 3.10). Instead of using the original average, maximum and total values were taken for several of feature vectors provided by the raw metric values, we applied the metrics. Depending on the metric, data was available at R’s prcomp [39] procedure to our data to produce principal either the source file level or at the function level. In cases components. Principal Component Analysis (PCA) [36] where function level data was present, amounts were reduces redundancy in our matrix of metrics and observations aggregated up to the file level via averages, totals and by maximizing the variance of linearly independent variables. maximums. Binary-level data was obtained by aggregating Deciding how many of these variables to use in model source-file level data up to the binary in which each source file building typically takes one of two forms; either a limit on the is used. This study uses additional metrics that were not number of terms in the model is set, or some total amount of available at the time of the original study by Zimmermann at variance to be accounted for by the model is set. We selected al. [11]. The table identifies which metrics are common principal components that accounted for 95% of variance. between the two studies and which are unique to this study. In pursuit of high prediction performance, we used Max All size, churn, complexity, and dependency metrics were Kuhn’s R package caret [37] to build VPMs based on the measured as of each releases’ Release to Manufacturing date. components selected by PCA. We used a Random Forest [40] machine learning technique to generate our prediction. B. Pre and Post-Release Vulnerabilities Random Forest is a variant of decision trees that can be As dependent variables, we used the number of post- represented as a binomial tree, and is popularly used for release security vulnerabilities detected and fixed within the classification tasks. We chose Random Forest because this corresponding source files and code binaries respectively. 
Table 1 - DESCRIPTIVE STATISTICS FOR ATTACK SURFACE AT BINARY LEVEL, BROKEN DOWN BY TYPE OF STACK TRACE CATEGORY THAT IDENTIFIED THE BINARY.

                     User mode   Kernel mode   Fuzzing   Kernel and user mode
  %binaries              40.2%          7.1%      0.9%                  48.4%
  %vulnerabilities       66.7%         40.6%     14.9%                  94.6%

Figure 3 - Heatmap visualizing relationships between files on the attack surface and the frequency of these relationships.

VI. RESULTS AND DISCUSSION

In this section, we present the data collected to address the research questions. After presenting the results as they relate to our two research questions, we discuss how these results might benefit practitioners.

A. Attack Surface Approximation (RQ1)

Table 1 contains a summary of our Windows 8 attack surface approximation from our different data sources. We split our data sources into three separate columns, indicating user mode stack traces, kernel mode stack traces, and fuzz testing stack traces. The %binaries row indicates the percentage of all shipped binaries seen in stack traces from that data source. For our purposes, shipped binaries are any binary that is included in a commercial release of the Windows product. For example, the 32 binaries seen in fuzzing stack traces account for 0.9% of all shipped binaries. There can be overlap between multiple sources of stack traces. The %vulnerabilities row contains the percentage of all known vulnerabilities seen in that specific subset of binaries. Similar to the %binaries row, there can also be overlap on this row, which is why the percentages add up to over 100%. The table shows that 48.4% of binaries contain 94.6% of the known vulnerabilities within Windows 8 when considering kernel and user mode crashes. Fuzzing crashes were left out of the combined column because they offered no improvement to our approximation: user and kernel mode crashes covered all known vulnerabilities in the fuzzing dataset.

Based on this result, we suggest that this approach can reduce the amount of time spent inspecting code that currently cannot be exploited directly by malicious users. This time savings is critical when faced with incoming delivery deadlines and time crunches.
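The percentages in Table 1 reduce to set arithmetic over per-source binary sets, sketched below with hypothetical variable names. Because a binary can appear in traces from several sources, the per-source percentages can sum to more than 100%.

```python
def coverage(source_binaries, shipped, vuln_to_binary):
    """source_binaries: binaries seen in one crash source's traces.
    shipped: all binaries in a commercial release of the product.
    vuln_to_binary: known vulnerability id -> binary containing it."""
    pct_binaries = 100 * len(source_binaries & shipped) / len(shipped)
    covered = [v for v, b in vuln_to_binary.items()
               if b in source_binaries]
    pct_vulns = 100 * len(covered) / len(vuln_to_binary)
    return pct_binaries, pct_vulns

# The combined column omits fuzzing, which added nothing new:
# combined = user_mode_binaries | kernel_mode_binaries
```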
B. Vulnerability Prediction Models (RQ2)

We explore how limiting the prediction space to the approximated attack surface affects the accuracy of VPMs. As discussed earlier, VPMs predict where vulnerabilities might appear in a particular codebase based on software engineering metrics, such as code churn, code complexity, code dependencies, and pre-release vulnerabilities, among others. In Figure 2, we display the precision and recall of our VPM runs when considering the entire codebase and only code on our attack surface approximation. Compared with VPMs run on the entire codebase, VPMs run on the attack surface approximation improved recall from .07 to .1 for binaries and from .02 to .05 for source files. Precision remained at .5 for binaries, while improving from .5 to .69 for source files. While these figures are low due to the scarcity of vulnerabilities, this represents improvement over the state of the art and is an open research area.

Figure 2 - Precision and recall of file level comparison of VPMs on all files and files on the attack surface approximation.

The initial results of applying the attack surface concept to VPMs are encouraging. We have demonstrated a statistical improvement in the accuracy of VPMs when the set of artifacts is narrowed down at both the binary and file level, and continual improvement based on better targeting of test data is possible. By continually improving the accuracy of these VPMs, we hope to allay developer concerns about them and hope to see additional use of them to detect vulnerabilities.

C. Discussion

Within Microsoft, security teams have already bought into the idea that attack surface approximation based upon stack traces could be useful in their day-to-day operations. Security professionals were quoted:

"Attack surface approximation would be useful to help target our review efforts."

This buy-in on the soundness of the attack surface concept is an important step as we look to take the next steps in demonstrating the value of such tools.

One of the benefits of this approach is the high degree of automation possible for this process. The only software-specific element is the parsing of the individual stack traces. Everything else in this process is portable and could be applied to any other software project in development, though the accuracy and reliability of these results on different systems have not been tested as a part of this paper. By construction of our automated stack trace parser, we can provide an approximation of the attack surface of a system with some accuracy.

VII. LESSONS LEARNED AND CHALLENGES

One of the difficulties with parsing stack traces is the fact that the problem reduces to string parsing to pull out code entities. While string parsing in itself is not a difficult task, stack traces from different parts of a system may be in different formats, and some stack traces contain different information. For example, during this research we found some traces that contained only the file or function involved, without the binary. We then built a mapping tool to look up which binary these artifacts belonged to. In other systems with less mature software engineering tools and data, mapping missing code may not be as straightforward, though smaller systems may have more consistent stack traces.

We found that our fuzzing set of stack traces overlapped with the user and kernel datasets, finding only a few additional vulnerabilities. While there remained some overlap between user and kernel mode stack traces, together these sources are able to account for the majority of known vulnerabilities. Our results indicate that the attack surface approximation may be computed using only user and kernel crashes.

As this research progressed, we received a high amount of feedback related to the visualizations of this data and how it could be used to strengthen the security review process. In particular, graph visualizations of this data were of interest to security professionals within Microsoft. During this process, we devised a series of different graph formats for the data, including a graph representation of the entire system with single node entities, a centralized node example showing all of the neighboring relationships around a specific artifact, and the use of color to show the source of specific artifacts in stack traces. When these visualizations were shown to security professionals within Microsoft, they were received positively as a way to understand what the security-relevant relationships were between code entities.

Heatmaps are another possible way to demonstrate the relationships between code. A relationship means that one part of the codebase directly follows another in a stack trace, typically indicating a direct call from one function to another. An example heatmap is shown in Figure 3. In this example, files appear on the X and Y axes of the graph. A file on the Y-axis of the chart has the files on the X-axis of the chart directly follow it in a stack trace at least once if a purple box appears at that location. As the color of the box gets darker, that sequence is seen more often in the set of stack traces. This visualization can show practitioners at a glance which files have more appearances in the stack trace data and the variety in which different files appear around a specific file. For example, e6 is immediately followed by e1 in many stack traces, but no other files ever follow e6 in this dataset. One possibility is that this is crash handling code or a standard path taken during a crash. In contrast, e2 also appears frequently in stack traces, but a wide variety of files may immediately follow it in a trace. This visualization could help developers understand the context that a function or file operates in, without overwhelming the user with a graph of the entire system.

Graphs of these relationships may be useful in several ways. By displaying the graph of security-relevant relationships to security reviewers while they work, they can focus on known possible malicious entry points during review rather than checking all of the input edges into a function or file. An example of such a graph is shown in Figure 1. By presenting this data to reviewers, they can make more informed decisions about the context in which a file or function is operating. The data is similar to the heatmap data discussed above, presented in a different visual format. These visualizations, in the view of the security professionals that the researchers spoke to, may have an impact on security reviewer productivity in the same way call graph displays can assist software developers.
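The follows-relationship behind Figure 3 is a frequency matrix over consecutive file pairs in the parsed traces. Below is a small illustrative sketch; the file lists and names are hypothetical.

```python
from collections import Counter
import numpy as np

def follows_matrix(traces):
    """traces: lists of file names in stack trace order.
    Returns (names, m) where m[y, x] counts how often file x
    directly follows file y in a trace; darker heatmap cells in
    Figure 3 correspond to larger counts."""
    pairs = Counter()
    for files in traces:
        pairs.update(zip(files, files[1:]))
    names = sorted({f for pair in pairs for f in pair})
    index = {f: i for i, f in enumerate(names)}
    m = np.zeros((len(names), len(names)), dtype=int)
    for (a, b), count in pairs.items():
        m[index[a], index[b]] = count
    return names, m

# e.g. names, m = follows_matrix([["e6", "e1"], ["e2", "e3"]])
# matplotlib's plt.imshow(m) then yields a Figure 3-style heatmap.
```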
VIII. LIMITATIONS AND THREATS TO VALIDITY

In this preliminary exploration of the research topic, we have only explored one specific software system. This approach may not generalize to other systems without similar studies in different domains. In the absence of an oracle for the complete attack surface, we cannot assess the completeness of our approximation. Our determination of accuracy currently is based only on known vulnerabilities, which may introduce a bias towards code previously seen to be vulnerable. While this may be a good assumption, further exploration is needed. The set of artifacts identified as part of the attack surface is an approximation, and we do not claim to capture all possible vulnerable nodes.

We have done our VPM analysis on two levels of abstraction of code: binary level and file level. In this paper, we present results reflecting both of these levels. Each of these levels of abstraction has different strengths and shortcomings. For example, while the binary level allows us to make more accurate claims about where vulnerabilities might be, we are painting with a broad brush. A single binary could contain hundreds of files that need to be looked at, which limits the usefulness of binary-level results. Running future analysis at the file level and continuing to improve at that abstraction level is one way to mitigate this. By the same reasoning, an attack surface approximation that exists only at the binary level is a limitation in the granularity of our approximation.

Finally, due to code or configuration changes, code that is not on the attack surface may be moved onto the attack surface.
However, prioritization of code on the attack surface, using our method or other attack surface identification methods, can be used to reduce security risk.

IX. FUTURE WORK

We explore several different avenues for the continuation of this work. One such avenue is a further exploration of statistical data on frequency of appearance and the number of times a specific sequence of artifacts appears in user-generated stack traces from crashes. By assigning weights to specific artifacts based on the frequency with which they appear in stack traces and in relationships with other artifacts, we may be able to improve VPM performance in detecting vulnerabilities if there is a relationship between these metrics and how often vulnerabilities appear. We hypothesize, based on a cursory look at the data, that this sort of exposure metric may be related to the discovery of security vulnerabilities.

We plan to explore the expansion of the graph visualization and parsing of the stack traces, as this area seems to hold the most promise. In particular, discovering particular flow patterns within the security graph representation of the system is of particular interest to the security professionals we spoke to as a part of this research. Empirical studies to show improvement in the efficiency of security reviews when using these visualizations are one avenue this research branch could take.

As mentioned in the limitations, these results cannot be generalized to all systems. Doing a similar analysis on other software systems, such as web applications, could give us insight into how generalizable this approach could be. Identifying how many stack traces are required to build a reasonably accurate approximation of the attack surface would be useful when trying to apply this technique to smaller software systems.

All current results can be replicated at a finer level of granularity. The attack surface approximation could be done at both the file and function level, and the VPMs could be run with a function level attack surface as well. To do this, we will need to resolve unmapped file and function data for some of the entities on our attack surface approximation.

X. ACKNOWLEDGEMENTS

This work was completed as part of a summer internship at Microsoft Research Cambridge UK in the summer of 2014. The researchers would like to thank the Windows Security Teams for their feedback, as well as the intern support staff at MSR Cambridge for their work. Funding for Laurie Williams for part of the advising of this proposal was provided by the National Security Agency. We also thank the Realsearch group at NCSU for their invaluable feedback.
XI. REFERENCES

[1] R. Moser, W. Pedrycz and G. Succi, "A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction," in Proceedings of the 30th International Conference on Software Engineering, 2008.
[2] M. Howard and S. Lipner, The Security Development Lifecycle, Microsoft Press, 2006.
[3] N. Nagappan and T. Ball, "Use of relative code churn measures to predict system defect density," in Proceedings of the 27th International Conference on Software Engineering (ICSE), 2005.
[4] T. Zimmermann and N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering, 2008.
[5] M. Pinzger, N. Nagappan and B. Murphy, "Can developer-module networks predict failures?," in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008.
[6] N. Nagappan, B. Murphy and V. Basili, "The Influence of Organizational Structure on Software Quality: An Empirical Case Study," in Proceedings of the 30th International Conference on Software Engineering, 2008.
[7] A. E. Hassan, "Predicting faults using the complexity of code changes," in Proceedings of the 31st International Conference on Software Engineering, 2009.
[8] K. Herzig, S. Just, A. Rau and A. Zeller, "Predicting Defects Using Change Genealogies," in Proceedings of the 2013 IEEE 24th International Symposium on Software Reliability Engineering, 2013.
[9] K. Herzig, "Using Pre-Release Test Failures to Build Early Post-Release Defect Prediction Models," in Proceedings of the 25th IEEE International Symposium on Software Reliability Engineering (ISSRE), Naples, 2014.
[10] T. Hall, S. Beecham, D. Bowes, D. Gray and S. Counsell, "A Systematic Literature Review on Fault Prediction Performance in Software Engineering," IEEE Transactions on Software Engineering, vol. 38, pp. 1276-1304, 2012.
[11] T. Zimmermann, N. Nagappan and L. Williams, "Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista," in Proceedings of the Third International Conference on Software Testing, Verification and Validation (ICST), 2010.
[12] Y. Shin and L. Williams, "Can traditional fault prediction models be used for vulnerability prediction?," Empirical Software Engineering, vol. 18, pp. 25-59, 2013.
[13] M. Gegick, L. Williams, J. Osborne and M. Vouk, "Prioritizing software security fortification through code-level metrics," in Proceedings of the 4th ACM Workshop on Quality of Protection, 2008.
[14] Y. Shin, A. Meneely, L. Williams and J. Osborne, "Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities," IEEE Transactions on Software Engineering, vol. 37, pp. 772-787, 2011.
[15] I. Chowdhury and M. Zulkernine, "Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities," Journal of Systems Architecture, vol. 57, pp. 294-313, 2011.
[16] S. Neuhaus, T. Zimmermann, C. Holler and A. Zeller, "Predicting vulnerable software components," in Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007.
[17] M. Howard, J. Pincus and J. M. Wing, "Measuring Relative Attack Surfaces," in Computer Security in the 21st Century, Springer US, 2005, pp. 109-137.
[18] P. Manadhata and J. Wing, "An Attack Surface Metric," IEEE Transactions on Software Engineering, vol. 37, no. 3, pp. 371-386, 2011.
[19] "Description of the Dr. Watson for Windows," Microsoft Corporation, [Online]. Available: http://support.microsoft.com/kb/308538/en-us.
[20] B. Liblit and A. Aiken, "Building a Better Backtrace: Techniques for Postmortem Program Analysis," University of California, Berkeley, Berkeley, 2002.
[21] R. Manevich, M. Sridharan, S. Adams, M. Das and Z. Yang, "PSE: Explaining Program Failures via Postmortem Static Analysis," in Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Newport Beach, CA, USA, 2004.
[22] W. Jin and A. Orso, "F3: Fault Localization for Field Failures," in Proceedings of the 2013 International Symposium on Software Testing and Analysis, 2013.
[23] R. Wu, H. Zhang, S.-C. Cheung and S. Kim, "CrashLocator: Locating Crashing Faults Based on Crash Stacks," in Proceedings of the 2014 International Symposium on Software Testing and Analysis, 2014.
[24] S. Wang, F. Khomh and Y. Zou, "Improving bug localization using correlations in crash reports," in Proceedings of the 10th IEEE Working Conference on Mining Software Repositories (MSR), 2013.
[25] A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun and B. Wang, "Automated support for classifying software failure reports," in Proceedings of the 25th International Conference on Software Engineering, 2003.
[26] Y. Dang, R. Wu, H. Zhang, D. Zhang and P. Nobel, "ReBucket: A Method for Clustering Duplicate Crash Reports Based on Call Stack Similarity," in Proceedings of the 34th International Conference on Software Engineering, 2012.
[27] S. Kim, T. Zimmermann and N. Nagappan, "Crash graphs: An aggregated view of multiple crashes to improve crash triage," in Proceedings of the 41st IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2011.
[28] P. J. Guo, T. Zimmermann, N. Nagappan and B. Murphy, "Characterizing and Predicting Which Bugs Get Fixed: An Empirical Study of Microsoft Windows," in Proceedings of the 32nd International Conference on Software Engineering, 2010.
[29] N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj and T. Zimmermann, "What makes a good bug report?," in SIGSOFT '08/FSE-16: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2008.
[30] S.-K. Huang, M.-H. Huang, P.-Y. Huang, H.-L. Lu and C.-W. Lai, "Software Crash Analysis for Automatic Exploit Generation on Binary Programs," IEEE Transactions on Reliability, vol. 63, pp. 270-289, March 2014.
[31] C. Holler, K. Herzig and A. Zeller, "Fuzzing with Code Fragments," in Proceedings of the 21st USENIX Conference on Security Symposium, 2012.
[32] D. Kim, X. Wang, S. Kim, A. Zeller, S. Cheung and S. Park, "Which Crashes Should I Fix First?: Predicting Top Crashes at an Early Stage to Prioritize Debugging Efforts," IEEE Transactions on Software Engineering, vol. 37, no. 3, pp. 430-447, 2011.
[33] J. Czerwonka, N. Nagappan, W. Schulte and B. Murphy, "CODEMINE: Building a Software Development Data Analytics Platform at Microsoft," IEEE Software, vol. 30, no. 4, pp. 64-71, 2013.
[34] "Four Grand Challenges in Trustworthy Computing," Computing Research Association, Warrenton, VA, 2003.
[35] R Development Core Team, "R: A Language and Environment for Statistical Computing," 2010.
[36] K. Pearson, "LIII. On lines and planes of closest fit to systems of points in space," Philosophical Magazine Series 6, vol. 2, pp. 559-572, 1901.
[37] M. Kuhn, "caret: Classification and Regression Training," 2011.
[38] K. E. Emam, W. Melo and J. C. Machado, "The prediction of faulty classes using object-oriented design metrics," Journal of Systems and Software, vol. 56, pp. 63-75, 2001.
[39] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S, 4th ed., Springer, 2002.
[40] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[41] Y. Shin and L. Williams, "Can Fault Prediction Models and Metrics be Used for Vulnerability Prediction?," Empirical Software Engineering, vol. 18, no. 1, pp. 25-59, 2013.
[42] Building Security In Maturity Model (BSIMM).
[43] P. Manadhata, J. Wing, M. Flynn and M. McQueen, "Measuring the attack surfaces of two FTP daemons," in Proceedings of the 2nd ACM Workshop on Quality of Protection, 2006, pp. 3-10.
[44] A. A. Younis, Y. K. Malaiya and I. Ray, "Using Attack Surface Entry Points and Reachability Analysis to Assess the Risk of Software Vulnerability Exploitability," in Proceedings of the IEEE 15th International Symposium on High-Assurance Systems Engineering, 2014, pp. 1-8.

XII. APPENDIX

Table 2 - METRICS DEFINITIONS

Metric | Category | Definition
Added LOC | Churn | Lines of code added during the development cycle.
ChurnedLOC | Churn | Lines of code added, deleted or altered during the development cycle.
DeletedLOC | Churn | Lines of code deleted during the development cycle.
Editors | Churn | Count of unique users who have made changes to each file.
NumberOfEdits | Churn | Count of distinct commits made to each file.
RelativeChurn | Churn | Relative code measure, ratio of ChurnedLOC to LOC Project End.
Complexity (Avg, Sum, Max) | Complexity | McCabe complexity measure, number of linearly-independent paths through each function, rolled up to source-file level.
Arcs (Avg, Sum, Max) | Dependency | Number of transfer-of-control points between basic blocks defined in the file.
FanIn (Avg, Sum, Max) | Dependency | Number of calls to functions within each source file.
FanOut (Avg, Sum, Max) | Dependency | Number of calls by functions within each source file to other functions.
FanOutExternal (Avg, Sum, Max) | Dependency | Number of calls by functions within each source file to other functions outside the source file.
Incoming Cross Binary | Dependency | Number of calls from outside a source file's binary to functions within the source file.
Incoming Dependencies (Avg, Sum, Max) | Dependency | Number of function call, import, export, RPC, COM and Registry access dependencies on the source file.
Outgoing Dependencies (Avg, Sum, Max) | Dependency | Number of function call, import, export, RPC, COM and Registry access dependencies by the source file.
AddedLOCSinceReset | Legacy | Lines of code added since the security reset date.
AgeinWeeks | Legacy | Number of weeks since the recorded file creation date of the source file.
ChurnedLOCSinceReset | Legacy | Lines of code added, deleted or altered between the security reset date and RTM.
ChurnSinceReset | Legacy | Lines of code deleted or altered between the security reset date and RTM.
DeletedLOCSinceReset | Legacy | Lines of code deleted between the security reset date and RTM.
LegacyLOCPct | Legacy | Relative code measure, percentage of LOC written prior to the reset.
LOC Pre-Reset | Legacy | Total lines of code in the file on the security reset date.
NumberOfEditsSinceReset | Legacy | Count of distinct commits made to each file since the reset.
Arguments (Avg, Sum, Max) | Size | Number of function arguments defined within the file.
Blocks (Avg, Sum, Max) | Size | Number of basic blocks contained within the file (a block is a single, contiguous set of instructions with one entry, one exit and no branches).
Functions | Size | Number of functions defined within the file.
LOC Project End | Size | Total lines of code in the file at Release-To-Manufacturing (RTM).
LOC Project Start | Size | Total lines of code in the file at the start of the development cycle.
Locals (Avg, Sum, Max) | Size | Number of local variables defined within the file.
Paths (Avg, Sum, Max) | Size | Number of paths.
VulnCount_PreRelease | Defects | Number of vulnerabilities fixed prior to software release.
VulnCount_SecurityReview | Defects | Number of security-related changes prior to software release.