IntPTI: Automatic Integer Error Repair with Proper-Type Inference

Xi Cheng∗, Min Zhou∗, Xiaoyu Song†, Ming Gu∗, Jiaguang Sun∗
∗School of Software, TNLIST, KLISS, Tsinghua University, China
†Electrical and Computer Engineering, Portland State University, USA
[email protected], {mzhou,guming,sunjg}@tsinghua.edu.cn, [email protected]

ASE 2017, Urbana-Champaign, IL, USA. Tool Demonstrations. 978-1-5386-2684-9/17 © 2017 IEEE.
Accepted for publication by IEEE. © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract—Integer errors in C/C++ are caused by arithmetic operations yielding results that are unrepresentable in a certain type. They can lead to serious safety and security issues. Due to the complicated semantics of C/C++ integers, integer errors are widely harbored in real-world programs, and repairing them is error-prone even for experts. An automatic tool is desired to 1) automatically generate fixes that assist developers in correcting the buggy code, and 2) provide sufficient hints to help developers review the generated fixes and better understand integer types in C/C++. In this paper, we present a tool, IntPTI, that implements the desired functionalities for C programs. IntPTI infers appropriate types for variables and expressions to eliminate representation issues, and then utilizes the derived types with fix patterns codified from successful human-written patches. IntPTI provides a user-friendly web interface that allows users to review and manage the fixes. We evaluate IntPTI on 7 real-world projects, and the results show its competitive repair accuracy and its scalability on large code bases. The demo video for IntPTI is available at: https://youtu.be/9Tgd4A_FgZM.

Index Terms—integer error, type inference, fix pattern

I. INTRODUCTION

In C/C++ programs, integer arithmetic operations (e.g. addition and assignment) may produce results that the expression type cannot represent, and such values are converted in some way to fit the target type. Some conversions are well-defined by the language standard (e.g. unsigned wraparound), but others are undefined (e.g. signed overflow). Integer errors are generally caused by misuse of well-defined conversions or undefined behaviors due to the developer's empirical certainty of expected outcomes. Integer errors are known to be one of the main threats to the safety and security of software systems. A potential total power loss in Boeing 787 Dreamliners [1] was caused by the signed overflow of a 32-bit counter. Multiple integer errors in the Linux kernel can be exploited for denial-of-service attacks [2] or privilege escalations [3]. A CVE report in 2007 [4] suggests that integer overflow is the second most common vulnerability in OS vendor advisories.

Challenge. It is error-prone to correctly repair integer errors, even for experts, due to the complicated semantics of integers in C/C++. The machine representation of an integer is a fixed-size bit-vector restricted by its type-specific characteristics: signedness and width. Generally, the semantics over fixed-size bit-vectors and $\mathbb{Z}$ are inconsistent. For example, $(x - y > 0) \Leftrightarrow (x > y)$ holds over $\mathbb{Z}$, but no longer holds over fixed-size bit-vectors owing to the overflow in $x - y$. Even worse, not all integer arithmetic operations are well-defined. Although undefinedness grants compilers freedom to generate efficient code by exploiting specific properties of a certain instruction set, it could lead to unexpected runtime behaviors across different architectures or optimization levels. For example, an overflow in signed addition silently wraps around on x86 but traps on MIPS [5].

Related Work. Numerous automatic solutions for integer errors have been proposed, but they have various limitations in real-world applicability. One thread of the related work focuses on integer error detection by symbolic execution [6], [7], [8], static analysis [9] or code instrumentation [10], [11], [12]. These tools produce reports on where integer errors are and how to trigger them, but they are unable to guide developers to correct the buggy implementation. Generic program repair techniques are proposed to automatically correct the implementation with respect to its specifications. They generate patches that address certain defects by, typically, validating heuristically generated patches with test suites [13], [14], [15], [16], [17], [18], or synthesizing desired expressions with respect to constraints derived from test suites [19], [20], [21], [17]. The effectiveness of these tools, however, heavily relies on specifications, which are often insufficient in practice. Moreover, even the state-of-the-art generate-and-validate systems do not scale to large software systems with thousands of potential defects, as they generally require hours to find a plausible patch for one real-world bug. Some tools are designed for integer errors specifically [22], [23]. They transform the internal integer model of a program towards a safer model, but they make an excessive number of unnecessary changes to the program.

Approach. We present IntPTI, an automatic tool that generates and applies fixes for integer errors in C programs. It aims to assist developers and testers in improving code quality against integer errors. First, IntPTI preprocesses the source files on the fly in the building process. Next, IntPTI computes the appropriate types (i.e. proper-types) for variables and expressions to eliminate representation issues and generates fixes by utilizing proper-types. Then, users interact with IntPTI via a web interface to review fixes. Finally, accepted fixes are collected and applied to the source code. Users can benefit from IntPTI as it proposes fixes for possible integer errors with proper explanations, which helps users to (1) locate new integer errors in code and repair them correctly, and (2) better understand the integer types in the C language.

Our key approach is proper-type inference, which finds appropriate types for expressions and variables such that each expression has a type that covers all its possible values. The goal of proper-type inference is achieved by static value analysis (§II-B, which approximates possible values of expressions) and type inference (§II-A, which computes types for expressions and variables with respect to the proper-type property and the well-typedness of the program). Inferred types are utilized to generate fixes (§II-C) by common fix patterns codified from the real world: sanity check, explicit type casting and declared type changing. For scalability in real-world projects, IntPTI adopts multi-entry analysis (§II-D) to run proper-type inference in a compositional manner.

To demonstrate the accuracy and runtime efficiency of IntPTI, we apply it to 7 widely-used open-source projects with known integer vulnerabilities. The results show that IntPTI succeeds in repairing 23 out of 25 defects. Furthermore, IntPTI substantially addresses the limitations of existing tools as it: 1) does not rely on specifications such as test suites, 2) generates fixes with proper explanations of why the fix is generated and how it transforms the code, 3) reduces false-positives (i.e. fixes that correspond to no genuine bugs) on the critical program sites (where attacks are typically performed on the subject programs) by 93.3% compared to the state-of-the-art approach [22], and 4) scales to large code bases as it spends no more than 11 minutes on Vim with over 244 KLOC.

Contribution. Main contributions are summarized as:
1) We propose a novel approach that automatically generates

$$\tau ::= \varsigma \mid \tau{*} \mid \texttt{void} \mid \texttt{struct}\{\tau\, c_1, \ldots, \tau\, c_n\} \mid \tau(\tau, \ldots, \tau)$$
$$e ::= n \mid x \mid e.c \mid e \diamondsuit e \mid (\tau)e \mid {*}e \mid \&e \mid e = e \mid f(e, \ldots, e)$$
Fig. 1. The core language syntax.

To derive proper-types, we scan the program and collect constraints on the types of expressions and variables by proper-type inference rules. All non-arithmetic expressions keep their original types. Rules for arithmetic expressions are listed in Fig. 2. The type judgment $\Gamma, \Theta, \Upsilon \vdash e : \tau \mapsto C$ denotes that, given the context $(\Gamma, \Theta, \Upsilon)$, the type of $e$ is inferred as $\tau$ along with the constraint set $C$. The context consists of the typing hypothesis $\Gamma$, which maps variables to their declared types, $\Theta$, which assigns arithmetic expressions their enforced types, and $\Upsilon = (\llbracket \cdot \rrbracket, \llparenthesis \cdot \rrparenthesis)$, where the former is computed by value analysis (§II-B) and the latter is given by the C language data model in use (e.g. data type width schemes, including LP32, ILP32, LP64, etc.). The notation $\varsigma_1 \preceq \varsigma_2$ denotes that the byte length of $\varsigma_2$ is no less than that of $\varsigma_1$ (namely, $\varsigma_2$ elevates $\varsigma_1$). $E$ maps expressions to their original types.

We give brief explanations for some rules. In the BINARY-ARITH rule, the type $\varsigma$ is required to 1) be eligible to represent possible values of $e_1 \diamondsuit e_2$, and 2) be the common type of $\varsigma_1$ and $\varsigma_2$ (namely $\varsigma_1 \uparrow \varsigma_2$) to preserve well-typedness. In the BINARY-LOGICAL rule, however, operands of a logical operation are enforced to have their common type in order to prevent 1) implicit conversion in comparison and 2) overflow bugs in each operand. The ASSIGN-VAR rule elevates the declared type of the variable to be assigned with respect to the right operand, while the ASSIGN-NONVAR rule enforces the non-variable L-value to have its original type. In the LIBRARY-CALL rule, each argument expression is enforced to fit its parameter type. The operand of an address-of operation keeps its original type to prevent memory issues after repair.
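As a rough illustration of the ideas above, the following C sketch shows the comparison that is valid over the mathematical integers but not over fixed-size bit-vectors, together with one hand-written instance of each of the three fix patterns named in the text (sanity check, explicit type casting, declared type changing). All function names are hypothetical, and the code is an assumption about what such fixes might look like; it is not output produced by IntPTI.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy: assumes (x - y > 0) <=> (x > y). With unsigned operands the
 * subtraction wraps around, so the result is true whenever x != y. */
bool greater_by_subtraction(unsigned int x, unsigned int y) {
    return x - y > 0;
}

/* Repaired: compare the operands directly, so no wrapping value is formed. */
bool greater_directly(unsigned int x, unsigned int y) {
    return x > y;
}

/* Fix pattern "sanity check" (hypothetical example): reject operands whose
 * signed sum would overflow, instead of evaluating an undefined a + b. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* the sum is not representable in int */
    }
    *out = a + b;
    return true;
}

/* Fix pattern "explicit type casting" (hypothetical example): evaluate the
 * product in a 64-bit type that covers every possible 32-bit product. */
int64_t widened_product(int32_t a, int32_t b) {
    return (int64_t)a * b;
}

/* Fix pattern "declared type changing" (hypothetical example): the
 * accumulator is declared as int64_t rather than int32_t, so the running
 * sum of 32-bit lengths stays representable for realistic array sizes. */
int64_t sum_lengths(const int32_t *lens, size_t n) {
    assert(lens != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += lens[i];
    }
    return total;
}
```

For instance, `greater_by_subtraction(1u, 2u)` evaluates to true because `1u - 2u` wraps to `UINT_MAX`, whereas `greater_directly(1u, 2u)` is false. The widened variants trade a larger result type for the guarantee that the expression type covers all possible values, which is the property that proper-type inference aims to establish.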
