Static security analysis based on input-related software faults

Csaba Nagy
Department of Software Engineering, University of Szeged, Szeged, Hungary

Spiros Mancoridis
Department of Computer Science, Drexel University, Philadelphia, USA

Abstract

It is important to focus on security aspects during the development cycle to deliver reliable software. However, locating security faults in complex systems is difficult and there are only a few effective automatic tools available to help developers. In this paper we present an approach to help developers locate vulnerabilities by marking parts of the source code that involve user input. We focus on input-related code, since an attacker can usually take advantage of vulnerabilities by passing malformed input to the application. The main contributions of this work are two metrics to help locate faults during a code review, and algorithms to locate buffer overflow and format string vulnerabilities in C source code. We implemented our approach as a plugin to the GrammaTech CodeSurfer tool. We tested and validated our technique on open source projects and we found faults in software that includes Pidgin and cyrus-imapd.

1. Introduction

As IT solutions become more common, the security of software systems becomes an increasingly important consideration. Companies are paying close attention to software security, as evidenced by their increased focus on security aspects during the development cycle. For instance, Microsoft has published its Security Development Lifecycle (SDL) standard, which has been a Microsoft-wide initiative and mandatory policy since 2004 [25]. One important activity of SDL, and of many other software development process models, is the use of static code-scanning tools during the implementation phase [25, 9]. These code-scanning tools perform a static analysis of the source code, without executing the programs built from the code, which is the domain of dynamic analysis. Locating security faults in software is still difficult to do even with the help of static analyzers. Moreover, there are only a few effective automatic tools available that can help developers locate security faults. Even if an effective code-scanning tool is available, it is difficult to locate security faults during a code review: in general, a good code review can uncover around 50% of the security problems [9].

This paper presents an approach to helping developers locate faults that are related to security by identifying parts of the source code that involve user input. The focus is on the input-related parts of the source code, since attackers commonly exploit security vulnerabilities by passing malformed input data to applications. Mishandling input data can be a source of common security faults in many languages that support pointer arithmetic such as C and C++. Examples of security faults are buffer overflows, format string vulnerabilities, and integer overflows [19]. The best known and, arguably, the most dangerous security faults are caused by buffer overflows, which are described in an article published in 1996 [1], and appear in the literature as far back as 1988 [12]. This type of vulnerability is still common in software systems and is difficult to locate either automatically or by a manual code review. Recent research has shown that code defects related to buffer overflows are still frequent in open source projects [10].

The main contributions of this paper are:

• two metrics (input coverage, input distance), which can help developers during a code review to locate functions that likely contain security faults,

• two algorithms, one published previously [26] and a new algorithm we created, to locate buffer overflow and format string vulnerabilities in C source code,

• a demonstration of the effectiveness of these metrics and algorithms as they are applied to open source software projects.

Our technique is implemented as a plugin to the CodeSurfer product of GrammaTech, Inc. The technique was validated on open source projects and successfully identified several faults in software including Pidgin and Cyrus Imapd.

The remainder of this paper is organized as follows: Section 2 is an overview of our analysis technique. Section 3 describes the metrics and algorithms underlying our technique. The results of applying our technique on security-critical open source projects are presented in Section 4. Section 5 presents related work. Lastly, the paper concludes and outlines plans for future work in Section 6.

2. Overview

This section presents an overview of the technique employed for a static security analysis based on input-related faults.

2.1. Technique

Figure 1. Illustration of input-related security faults. Faults related to user input are marked with “bombs” indicating vulnerabilities.

The main idea behind our approach is to focus on the input-related parts of the source code, since an attacker can usually take advantage of a security vulnerability by passing malformed input data to the application. If this data is not handled correctly it can cause unexpected behavior while the program is running. The path along which the data travels can be tracked using dataflow analysis [20] to determine the parts of the source code that involve user input. Software faults can appear anywhere in the source code, but if a fault is somewhere along the path of input data it can act as a “land mine” of a security vulnerability. (An illustration can be seen in Figure 1.)

The main steps of our approach (Figure 2) are the following:

1. find locations in the source code where data is read using a system call of an I/O operation. These calls are marked as input points,

2. get the set of program points involved in user input,

3. get a list of dangerous functions using metrics,

4. perform automatic fault detection to find vulnerabilities.

Figure 2. The overview of our system.

2.1.1. Locate I/O points. Input data can come from many different sources, not only from the standard input. It can come from input character devices, Internet sockets, files in the file system, et cetera. In general, input points are statements used to read input data from an external source by calling a system function to perform an I/O operation. The input data is often a string that is stored in a buffer that has been allocated on the stack or the heap.

2.1.2. Extract input-related program points. After locating the input points in the source code, it is possible to determine how the input data travels from one statement to another statement. This can be done using dataflow analysis, a technique for gathering information about the possible set of values calculated at various points in a program. Once we have the path for all input points, we can determine which parts of the source code involve user input by computing the union of these paths. To perform dataflow analysis on C/C++ code we use the CodeSurfer tool of GrammaTech, Inc.

2.1.3. Get the list of dangerous functions. By determining which parts of the source code involve user input, we can obtain a list of functions that warrant increased scrutiny. We call the functions on this list dangerous functions.

To give developers more information about a dangerous function we measure its coverage as the percentage of its source code statements that are tainted by user input. We also measure the distance in the dataflow graph between the entry point of the function and the origin of the input data (i.e., the statement where input occurs). These metrics are used to rank the functions in order to identify the functions that are the most tainted by user input.

2.1.4. Automatic fault detection. Automatic fault detection is performed by our technique to detect security problems in dangerous functions. These fault detections are based on algorithms that are applied to the code’s corresponding data dependence graph and can point to buffer overflow or format string vulnerabilities.

2.2. CodeSurfer

Our technique is implemented as a CodeSurfer plugin (http://www.grammatech.com). CodeSurfer is a powerful static-analysis tool for C/C++ programs. This tool was chosen because it is able to create a wide range of intermediate representations [2] for a given program including: Abstract Syntax Tree (AST), Call Graph, Interprocedural Control-Flow Graph (CFG), Points-to Graph, set of variables used and modified for each function, Control Dependence Graph, and Data Dependence Graph. The CodeSurfer tool can be extended with plugins using its internal scripting language or its C/C++ API.

The most important feature of CodeSurfer, for the purposes of this work, is that, after a whole-program analysis is performed, CodeSurfer can build a precise system dependence graph [18] due to its pointer-analysis [3] capability.

2.3. System dependence graph

Depending on the application there are different definitions for program dependence graph (PDG) [18, 14, 22]. PDG is a directed graph representation (G_P) of a program (P), where vertices represent program points (e.g., assignment statements, call-sites, variables, control predicates) [...] ditional parameters to procedures.

A system dependence graph (SDG) can be used for many purposes such as code optimization [14], reverse engineering, program testing [6], program slicing [18], software quality assurance [17], and software safety analysis [29].

This work employs SDG, and the extracted dataflow information stored in this representation, to determine the paths on which user-related input travels from its input point.

3. Technique

This section describes our technique and presents details about how metrics and algorithms are used to locate buffer overflow and format string vulnerabilities.

3.1.
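As an illustration of the overview in Section 2.1, the following is a minimal sketch of how input-related program points, coverage, and distance could be computed over a toy data dependence graph. It is not the CodeSurfer plugin: the graph, the statement and function names, and the helpers `propagate_taint` and `rank_functions` are all hypothetical, and the distance is approximated here as the smallest number of dataflow hops from any input point to any statement of the function.

```python
from collections import deque


def propagate_taint(edges, input_points):
    """Forward breadth-first traversal of a toy data dependence graph.

    Returns {statement: hops from the nearest input point} for every
    statement reachable from an input point (the union of all paths).
    """
    depth = {p: 0 for p in input_points}
    queue = deque(input_points)
    while queue:
        node = queue.popleft()
        for succ in edges.get(node, ()):
            if succ not in depth:
                depth[succ] = depth[node] + 1
                queue.append(succ)
    return depth


def rank_functions(statements, edges, input_points):
    """Rank functions by coverage (descending), then distance (ascending)."""
    depth = propagate_taint(edges, input_points)
    report = []
    for func, stmts in statements.items():
        tainted = [s for s in stmts if s in depth]
        if not tainted:
            continue  # no input reaches this function
        coverage = 100.0 * len(tainted) / len(stmts)
        # Assumption: distance is taken from the closest tainted statement.
        distance = min(depth[s] for s in tainted)
        report.append((func, coverage, distance))
    report.sort(key=lambda r: (-r[1], r[2]))
    return report


# Hypothetical toy program: function name -> its statements.
STATEMENTS = {
    "read_request": ["r1", "r2"],
    "parse_header": ["p1", "p2", "p3", "p4"],
    "log_startup": ["l1", "l2"],
}
# Hypothetical data dependence edges between statements.
EDGES = {"r1": ["r2"], "r2": ["p1"], "p1": ["p2"], "p2": ["p3"]}
# Hypothetical input point, e.g. a statement that calls read().
INPUT_POINTS = ["r1"]

REPORT = rank_functions(STATEMENTS, EDGES, INPUT_POINTS)
for func, coverage, distance in REPORT:
    print(f"{func}: coverage={coverage:.0f}% distance={distance}")
```

In this toy example `log_startup` is never reached by input data and is therefore left out of the ranking, while the two remaining functions are ordered so that the most tainted one comes first.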