Improving Software Security with a C Pointer Analysis

Dzintars Avots, Michael Dalton, V. Benjamin Livshits, Monica S. Lam
Computer Science Department, Stanford University, Stanford, CA 94305
{dzin, mwdalton, livshits, lam}@cs.stanford.edu

ABSTRACT

This paper presents a context-sensitive, inclusion-based, field-sensitive points-to analysis for C and uses the analysis to detect and prevent security vulnerabilities in programs. In addition to a conservative analysis, we propose an optimistic analysis that assumes a more restricted C semantics that reflects common C usage to increase the precision of the analysis.

This paper uses the proposed pointer alias analyses to infer the types of variables in C programs and shows that most C variables are used in a manner consistent with their declared types. We show that pointer analysis can be used to reduce the overhead of a dynamic string-buffer overflow detector by 30% to 100% among applications with significant overheads. Finally, using pointer analysis, we statically found six format string vulnerabilities in two of the 12 programs we analyzed.

Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging; D.3.4 [Programming Languages]: Processors - Compilers; D.2.3 [Operating Systems]: Security and Protection

General Terms

Algorithms, Programming languages, Software security, Vulnerabilities, Software errors.

Keywords

Program analysis, context-sensitive, pointer analysis, type safety, error detection, software security, buffer overflows, dynamic analysis, security flaws, format string violations.

This material is based upon work supported by the National Science Foundation under Grant No. 0326227 and an NSF Graduate Student Fellowship.

1. INTRODUCTION

Software vulnerabilities in C programs have led to tremendous losses of productivity and have caused damages estimated in billions of dollars. These vulnerabilities often stem from the lack of type safety in the C language, with buffer overflows and format string exploits being two well-known security vulnerabilities.

Tools exist which audit programs for vulnerabilities; however, the unsafe nature of C makes it difficult to create precise software analysis tools. In particular, pointer alias analysis is an important component in any program auditing tool. The semantics of C can cause a sound pointer analysis to generate many false points-to relations. Existing software checkers often minimize false positive warnings by using unsound pointer analysis instead. The Metal compiler and the Intrinsa system, known to find numerous serious vulnerabilities in widely used operating systems, assume that pointers are unaliased until proven otherwise with a local analysis [4, 9]. The same assumption was used in a buffer overflow detection tool by Wagner et al. [18]. However, by making such unsound assumptions, these tools may not find all possible vulnerabilities and cannot be used to guarantee that a program is safe.

This paper explores the use of more advanced pointer alias analysis to create practical tools for improving software security. We present a context-sensitive, inclusion-based, and field-sensitive points-to analysis for C. We apply the analysis to (1) infer the type of variables in C based on their usage, (2) reduce the overhead of a dynamic bounds checker, and (3) find format string vulnerabilities.

1.1 Pointer Alias Analysis

Our analysis is based on a points-to algorithm developed originally for Java [20] using a binary-decision-diagram (BDD) representation. Our analysis is context-sensitive, meaning that calling contexts along different acyclic call paths have distinct points-to relations. It is inclusion-based, meaning that two pointers may point to overlapping but different sets of objects. It is also field-sensitive, meaning that separate fields in an object have distinct points-to relations.
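
To make the term inclusion-based concrete, the following is a minimal sketch of a flow-insensitive, inclusion-based points-to solver. It is not the analysis proposed in this paper: it is context-insensitive and field-insensitive, and it uses plain bitmasks instead of BDDs. All names in it (struct constraint, solve_points_to, MAX_LOCS, demo_points_to) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constraint forms for a toy pointer language:
 *   ADDR:  lhs = &rhs     COPY:  lhs = rhs
 *   LOAD:  lhs = *rhs     STORE: *lhs = rhs
 * Every variable doubles as an abstract memory location. */
enum kind { ADDR, COPY, LOAD, STORE };
struct constraint { enum kind kind; int lhs; int rhs; };

#define MAX_LOCS 32  /* one bit per location in a uint32_t set */

/* Union src into *dst; return 1 if *dst grew, 0 otherwise. */
static int add_all(uint32_t *dst, uint32_t src)
{
    uint32_t before = *dst;
    *dst |= src;
    return *dst != before;
}

/* Apply all constraints repeatedly until no points-to set grows.
 * On return, bit o of pts[v] is set iff v may point to location o. */
void solve_points_to(const struct constraint *cs, size_t n,
                     uint32_t pts[MAX_LOCS])
{
    int changed = 1;
    for (int v = 0; v < MAX_LOCS; v++)
        pts[v] = 0;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            int l = cs[i].lhs, r = cs[i].rhs;
            switch (cs[i].kind) {
            case ADDR:
                changed |= add_all(&pts[l], UINT32_C(1) << r);
                break;
            case COPY:  /* subset constraint: pts[l] includes pts[r] */
                changed |= add_all(&pts[l], pts[r]);
                break;
            case LOAD:  /* pts[l] includes pts[o] for each o in pts[r] */
                for (int o = 0; o < MAX_LOCS; o++)
                    if ((pts[r] >> o) & 1u)
                        changed |= add_all(&pts[l], pts[o]);
                break;
            case STORE: /* pts[o] includes pts[r] for each o in pts[l] */
                for (int o = 0; o < MAX_LOCS; o++)
                    if ((pts[l] >> o) & 1u)
                        changed |= add_all(&pts[o], pts[r]);
                break;
            }
        }
    }
}

/* Example program: p = &a; q = &b; q = p; r = &q; s = *r
 * with the numbering a=0, b=1, p=2, q=3, r=4, s=5. */
void demo_points_to(void)
{
    const struct constraint cs[] = {
        { ADDR, 2, 0 }, { ADDR, 3, 1 }, { COPY, 3, 2 },
        { ADDR, 4, 3 }, { LOAD, 5, 4 },
    };
    uint32_t pts[MAX_LOCS];
    solve_points_to(cs, sizeof cs / sizeof cs[0], pts);
    assert(pts[2] == 0x1u);  /* p -> {a}: not widened by q = p */
    assert(pts[3] == 0x3u);  /* q -> {a, b} */
    assert(pts[5] == 0x3u);  /* s -> {a, b}, loaded through r */
}
```

In the example, the assignment q = p makes the set for q a superset of the set for p, but p still points only to a. The two pointers therefore have overlapping but different points-to sets, which is the property the term inclusion-based refers to.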

The BDD representation efficiently handles the exponential number of contexts by exploiting their similarities. It also accelerates inclusion-based analysis through efficient transitive closure computations [3, 20, 24]. Field-sensitive analyses are known to be much more precise than field-insensitive approaches. Our analysis, however, is flow-insensitive: the order of execution of program statements is not modeled.

Because of the semantics of the C language, pointer analysis for C is much more complex than that for Java. While Java enforces type safety by limiting user access to memory, C affords the user complete, direct control. A C programmer can take the address of any field within an object, treat any portion of memory as a contiguous region of bytes, perform arbitrary address arithmetic, and perform arbitrary object casts, whether between variant types of a union or in complete violation of declared type. Although many of these actions may violate the letter or spirit of the C99 standard, in practice most C compilers allow them. None of the proposed C pointer alias analyses are strictly sound because such an analysis would conclude that most locations in memory may point to any location in memory. In particular, they all assume that the relative placement of objects in memory is undefined. Most programs satisfy this assumption, which is part of the C99 standard, for the sake of portability. We propose in this paper a conservative pointer analysis, referred to as cons, that is sound for all programs satisfying the above assumption.

Our conservative pointer analysis is likely to be imprecise because programs frequently conform to the C99 standard. By imposing additional assumptions on the C programs that reflect common usage, we can improve upon the pointer analysis results. We thus also propose an optimistic analysis, referred to as pcp (practical C pointers), that is not only sound for programs adhering to the C99 standard, but also for programs that may violate its type model. For example, we use structural equivalence to determine type compatibility, whereas C99 resorts to name equivalence for aggregates such as structures and unions.

Like the analysis for Java, we express our points-to analysis in Datalog, a logic programming language used in deductive databases [17]. We use the bddbddb tool to translate the Datalog rules into efficient BDD operations [20]. The resulting points-to relations can be used in subsequent Datalog programs, making it easy to develop further algorithms based on the results.

1.2 Type Inference

The declared types of C variables are hints indicating how the variables are likely to be used. Type inference determines the actual types of objects by analyzing the usage of those objects. This is of great interest, since it can show that a program is type safe and portable, or identify unsafe program operations that should be audited statically or checked dynamically.

[...] threat in software systems today, consistently dominating CERT advisories [5]. Despite years of effort on this topic, there is no practical static checker that can guarantee to find all buffer overflows automatically. The C semantics make it difficult to create a dynamic bounds checker that can ensure no buffer overflows occur without breaking existing programs.

CRED (C Range Error Detector) is a recently developed dynamic bounds checker that has been shown to work properly on over one million lines of C code [15]. Since buffer overflow exploits tend to be transported as user input strings, CRED can be run with less overhead by checking only string buffer overflows. Such an approach is shown to be effective against a testbed of 20 different buffer overflow attacks [21].

CRED is based on Jones and Kelly's concept of referent objects [11]. A pointer's referent object is the object that it is intended to reference. A pointer arithmetic operation produces a new address, but the result has the same referent object as the original pointer. When a pointer is dereferenced, it must point within the bounds of its referent object. CRED uses an object table to dynamically track the base and extent of each object. It instruments pointer arithmetic, dereferences, and string-manipulation functions such as memcpy to associate referent objects with pointer addresses and check that out-of-bounds pointers are not dereferenced.

Normally, CRED enters all referent objects into the object table. To reduce the overhead of the strings-only CRED system, we use our points-to results to inform CRED about objects which do not contain string-typed values. This optimization reduces the overhead of some of the benchmarks by 30% to 100%.

1.4 Format String Vulnerabilities

Another common security threat is the format string vulnerability exploit. A program may be exploited if it passes a user-supplied string as the format string argument to a system function of the printf family. The user-supplied string may contain format specifiers that can cause the program to write to unintended memory locations. The static detection of user-supplied format strings is an example of a taint analysis, and requires the results of a pointer analysis. We use our pointer analysis to identify strings containing data that may be supplied by the user, and to report instances of format string arguments which point to these tainted strings. We find actual format string vulnerabilities in two of our benchmarks.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE'05, May 15–21, 2005, St. Louis, Missouri, USA.
Copyright 2005 ACM 1-58113-963-2/05/0002 ...$5.00.
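
As a concrete illustration of the pattern described in Section 1.4, the sketch below contrasts passing user data as the format string with passing it as a data argument. The function names format_user_message and demo_format_string are hypothetical. The vulnerable call appears only in a comment, since running it on a string with attacker-chosen specifiers has undefined behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy a user-supplied string into out.
 *
 * Vulnerable form (shown for contrast, do not use):
 *     snprintf(out, n, user);
 * There, user is the format string, so any conversion specifiers
 * it contains are interpreted by snprintf.
 *
 * Safe form, used below: the format string is a constant and the
 * user-supplied string is passed only as data. */
int format_user_message(char *out, size_t n, const char *user)
{
    return snprintf(out, n, "%s", user);
}

/* The specifiers in the user string are copied verbatim, not interpreted. */
void demo_format_string(void)
{
    char out[64];
    const char *user = "%x%x%n";
    int len = format_user_message(out, sizeof out, user);
    assert(len == 6);
    assert(strcmp(out, user) == 0);
}
```

A static taint analysis of the kind described above would aim to report call sites of the vulnerable form, where the format argument may point to user-supplied data. The safe form needs no report because its format argument is a string constant.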
