Understanding Integer Overflow in C/C++

Appeared in Proceedings of the 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, June 2012.

Will Dietz,∗ Peng Li,† John Regehr,† and Vikram Adve∗

∗Department of Computer Science, University of Illinois at Urbana-Champaign
fwdietz2,[email protected]
†School of Computing, University of Utah
fpeterlee,[email protected]

© 2012 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Abstract

Integer overflow bugs in C and C++ programs are difficult to track down and may lead to fatal errors or exploitable vulnerabilities. Although a number of tools for finding these bugs exist, the situation is complicated because not all overflows are bugs. Better tools need to be constructed, but a thorough understanding of the issues behind these errors does not yet exist. We developed IOC, a dynamic checking tool for integer overflows, and used it to conduct the first detailed empirical study of the prevalence and patterns of occurrence of integer overflows in C and C++ code. Our results show that intentional uses of wraparound behaviors are more common than is widely believed; for example, there are over 200 distinct locations in the SPEC CINT2000 benchmarks where overflow occurs. Although many overflows are intentional, a large number of accidental overflows also occur. Orthogonal to programmers' intent, overflows are found in both well-defined and undefined flavors. Applications executing undefined operations can be, and have been, broken by improvements in compiler optimizations. Looking beyond SPEC, we found and reported undefined integer overflows in SQLite, PostgreSQL, SafeInt, GNU MPC and GMP, Firefox, GCC, LLVM, Python, BIND, and OpenSSL; many of these have since been fixed. Our results show that integer overflow issues in C and C++ are subtle and complex, that they are common even in mature, widely used programs, and that they are widely misunderstood by developers.

Keywords: integer overflow; integer wraparound; undefined behavior

I. INTRODUCTION

Integer numerical errors in software applications can be insidious, costly, and exploitable. These errors include overflows, underflows, lossy truncations (e.g., a cast of an int to a short in C++ that results in the value being changed), and illegal uses of operations such as shifts (e.g., shifting a value in C by at least as many positions as its bitwidth). These errors can lead to serious software failures, e.g., a truncation error on a cast of a floating point value to a 16-bit integer played a crucial role in the destruction of Ariane 5 flight 501 in 1996. These errors are also a source of serious vulnerabilities, such as integer overflow errors in OpenSSH [1] and Firefox [2], both of which allow attackers to execute arbitrary code. In their 2011 report MITRE places integer overflows in the "Top 25 Most Dangerous Software Errors" [3].

Detecting integer overflows is relatively straightforward by using a modified compiler to insert runtime checks. However, reliable detection of overflow errors is surprisingly difficult because overflow behaviors are not always bugs. The low-level nature of C and C++ means that bit- and byte-level manipulation of objects is commonplace; the line between mathematical and bit-level operations can often be quite blurry. Wraparound behavior using unsigned integers is legal and well-defined, and there are code idioms that deliberately use it. On the other hand, C and C++ have undefined semantics for signed overflow and shift past bitwidth: operations that are perfectly well-defined in other languages such as Java. C/C++ programmers are not always aware of the distinct rules for signed vs. unsigned types in C, and may naïvely use signed types in intentional wraparound operations.^1 If such uses were rare, compiler-based overflow detection would be a reasonable way to perform integer error detection. If it is not rare, however, such an approach would be impractical and more sophisticated techniques would be needed to distinguish intentional uses from unintentional ones.

^1 In fact, in the course of our work, we have found that even experts writing safe integer libraries or tools to detect integer errors are not always fully aware of the subtleties of C/C++ semantics for numerical operations.

Although it is commonly known that C and C++ programs contain numerical errors and also benign, deliberate use of wraparound, it is unclear how common these behaviors are and in what patterns they occur. In particular, there is little data available in the literature to answer the following questions:

1) How common are numerical errors in widely-used C/C++ programs?
2) How common is use of intentional wraparound operations with signed types (which has undefined behavior), relying on the fact that today's compilers may compile these overflows into correct code? We refer to these overflows as "time bombs" because they remain latent until a compiler upgrade turns them into observable errors.
3) How common is intentional use of well-defined wraparound operations on unsigned integer types?

Although there have been a number of papers on tools to detect numerical errors in C/C++ programs, no previous work we know of has explicitly addressed these questions, or contains sufficient data to answer any of them. The closest is Brumley et al.'s work [4], which presents data to motivate the goals of the tool and also to evaluate false positives (invalid error reports) due to intentional wraparound operations. As discussed in Section V, that paper only tangentially addresses the third point above. We study all of these questions systematically.

This paper makes the following primary contributions. First, we developed Integer Overflow Checker (IOC), an open-source tool that detects both undefined integer behaviors as well as well-defined wraparound behaviors in C/C++ programs. IOC is an extension of the Clang compiler for C/C++ [5]. Second, we present the first detailed, empirical study (based on SPEC 2000, SPEC 2006, and a number of popular open-source applications) of the prevalence and patterns of occurrence of numerical overflows in C/C++ programs. Part of this study includes a manual analysis of a large number of intentional uses of wraparound in a subset of the programs. Third, we used IOC to discover previously unknown overflow errors in widely-used applications and libraries, including SQLite, PostgreSQL, BIND, Firefox, OpenSSL, GCC, LLVM, the SafeInt library, the GNU MPC and GMP libraries, Python, and PHP. A number of these have been acknowledged and fixed by the maintainers (see Section IV).

The key findings from our study of overflows are as follows: First, all four combinations of intentional and unintentional, well-defined and undefined integer overflows occur frequently in real codes. For example, the SPEC CINT2000 benchmarks had over 200 distinct occurrences of intentional wraparound behavior, for a wide range of different purposes. Some uses for intentional overflows are well-known, such as hashing, cryptography, random number generation, and finding the largest representable value for a type. Others are less obvious, e.g., inexpensive floating point emulation, signed negation of INT_MIN, and even ordinary multiplication and addition. We present a detailed analysis of examples of each of the four major categories of overflow. Second, overflow-related issues in C/C++ are very subtle and we find that even experts get them wrong. For example, the latest revision of Firefox (as of Sep 1, 2011) contained integer overflows in the library that was designed to handle untrusted integers safely in addition to overflows in its own code. [...] cannot simply distinguish errors from benign operations by checking rules from the ISO language standards. Rather, tools will have to use highly sophisticated techniques and/or rely on manual intervention (e.g., annotations) to distinguish intentional and unintentional overflows.

II. OVERFLOW IN C AND C++

Mathematically, n-bit two's complement arithmetic is congruent, modulo 2^n, to n-bit unsigned arithmetic for addition, subtraction, and the n least significant bits in multiplication; both kinds of arithmetic "wrap around" at multiples of 2^n. On modern processors, integer overflow is equally straightforward: n-bit signed and unsigned operations both have well-defined behavior when an operation overflows: the result wraps around and condition code bits are set appropriately. In contrast, integer overflows in C/C++ programs are subtle due to a combination of complex and counter-intuitive rules in the language standards, non-standards-conforming compilers, and the tendency of low-level programs to rely on non-portable behavior. Table I contains some C/C++ expressions illustrating cases that arise in practice. There are several issues; to clarify them we make a top-level distinction between well-defined (albeit perhaps non-portable) and undefined operations.

Table I: Examples of C/C++ integer operations and their results

Expression               Result
-----------------------  ----------------------------------------------------
UINT_MAX+1               0
LONG_MAX+1               undefined
INT_MAX+1                undefined
SHRT_MAX+1               SHRT_MAX+1 if INT_MAX>SHRT_MAX, otherwise undefined
char c = CHAR_MAX; c++   varies^1
-INT_MIN                 undefined^2
(char)INT_MAX            commonly -1
1<<-1                    undefined
1<<0                     1
1<<31                    commonly INT_MIN in ANSI C and C++98;
                         undefined in C99 and C++11^2,3
1<<32                    undefined^3
1/0                      undefined
INT_MIN%-1               undefined in C11, otherwise undefined in practice

^1 The question is: Does c get "promoted" to int before being incremented? If so, the behavior is well-defined. We found disagreement between compiler vendors' implementations of this construct.
^2 Assuming that the int type uses a two's complement representation
^3 Assuming that the int type is 32 bits long

A. Well-Defined Behaviors

