Symbolic Object Code Analysis⋆

Software Tools for Technology Transfer manuscript No. (will be inserted by the editor) Symbolic Object Code Analysis? Jan Tobias Mühlberg1, Gerald Lüttgen2 1 IBBT-DistriNet, KU Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium e-mail: [email protected] 2 Software Technologies Research Group, University of Bamberg, 96045 Bamberg, Germany e-mail: [email protected] The date of receipt and acceptance will be inserted by the editor Abstract Software model checkers quickly reach their pointers” [3], perform poorly in the presence of point- limits when being applied to verifying pointer safety ers [38], or simply cannot handle certain software. A properties in source code that includes function point- particular challenging kind of software are operating sys- ers and inlined assembly. This article introduces a novel tem (OS) components such as device drivers, which are technique for checking pointer safety violations, called usually written in C code involving function pointers, Symbolic Object Code Analysis (SOCA), which is based pointer arithmetic, and inlined assembly. Further issues on bounded symbolic execution, incorporates path-sensi- arise because of platform-specific and compiler-specific tive slicing, and employs the SMT solver Yices as its ex- details concerning memory layout, padding, and offsets ecution and verification engine. Extensive experimental [2]. In addition, several approaches to model checking results of a prototypic SOCA Verifier, using the Verisec compiled programs given in assembly or bytecode [7,27, suite and almost 10,000 Linux device driver functions as 35,48,52] and to integrating symbolic execution [28] with benchmarks, show that SOCA performs competitively model checking [17,18,26,43,49] have been presented. to modern source-code model checkers, scales well when However, these are tailored to exploit specific character- applied to real operating systems code and pointer safety istics of certain programming paradigms such as object- issues, and effectively explores niches of pointer-complex oriented programming, or lack support for data struc- software that current software verifiers do not reach. tures, function pointers, and computed jumps, or require substantial manual modeling effort. This article introduces and evaluates a novel, au- tomated technique to identifying pointer safety viola- Symbolic Object Code Analysis 1 Introduction tions, called (SOCA). This technique is based on the symbolic execution [28] of compiled and linked programs. In contrast to other ver- One challenge in verifying complex software is the proper ification techniques, SOCA requires only a minimum of analysis of pointer operations. A recent study shows that manual modeling effort, namely the abstract, symbolic the majority of errors found in device drivers involve specification of a program’s execution context in terms of pointer safety [10]. Writing software that is free of mem- function inputs and initial heap content. Our extensive ory safety concerns, e.g., free of errors caused by pointers evaluation of a prototypic SOCA implementation shows to invalid memory cells, is difficult since many such issues that SOCA performs competitively to state-of-the-art result in program crashes at later points in execution. model checkers such as SLAM/SDV [4], SatAbs [12], or Hence, a statement causing a memory corruption may BLAST [20] on programs with “well-behaved” pointers, not be easily identifiable using conventional validation and also that it scales well when applied to “dirty” pro- and testing tools such as Purify [45] and Valgrind [40]. grams such as device drivers that cannot be properly Today’s static verification tools, including software analyzed with source-code model checkers. model checkers such as [4,11,12,20], are also not of much Technically, the SOCA technique traverses the pro- help: they either assume that programs do “not have wild gram’s object code in a systematic fashion up to a cer- ? An extended abstract of this article appeared in the proceed- tain depth and width, and calculates at each assembly ings of SPIN 2010: “Model Checking Software”, volume 6349 of instruction a slice [53] required for checking the relevant Lecture Notes in Computer Science, pages 4–21, Springer, 2010. pointer safety properties. It translates such a slice and properties into a bit-vector constraint problem, and ex- explores semantic niches that neither currently available ecutes the property checks by invoking the Yices SMT testing tools nor software model checkers reach. solver [15]. During program traversal, SOCA unwinds The remainder of this article is organized as follows. loops and follows function calls. Since address calcula- In Sec. 2 we outline the objectives and challenges for tion and the management of stack frames is completely our SOCA technique. We also provide background infor- revealed in object code, SOCA does not require a sepa- mation on Valgrind’s intermediate representation (IR), rate reasoning to account for interprocedural effects. To on which our tool is based. A detailed explanation of the best of our knowledge, SOCA is the only program SOCA follows in Sec. 3, where we discuss the translation verification technique reported in the literature that fea- from our intermediate representation to bit-vector con- tures full support for pointer arithmetics, function point- straints for the SMT solver Yices, and explain how the ers, and computed jumps, rather than the partial sup- constraint representation can be annotated with asser- port offered in static analysis tools with bounded address tions expressing pointer safety properties. We also give tracking [27]. While SOCA is based on existing and well- a high-level explanation of our slicing algorithm and dis- known techniques, combining and implementing these cuss the handling of register access, memory access, and for object-code analysis proved challenging. Much engi- computed jumps in SOCA. Sec. 4 is dedicated to a more neering effort went into our SOCA implementation, so detailed presentation of our algorithms in pseudo-code, that it scales to complex real-world OS code such as and to an overview of the design decisions that influenced Linux device drivers. the development of our prototypical SOCA implementation, the SOCA Verifier. Our experimental results are The particular combination of techniques in SOCA is reported in Sec. 5. We present an evaluation of the ef- well suited for checking pointer safety. Analyzing object fectiveness of the SOCA Verifier for finding bugs in the code is beneficial in that it inherently considers compiler Verisec benchmark suite, give details on the impact of specifics such as code optimizations, makes memory lay- the slicing algorithm, and analyze the SOCA Verifier’s out obvious, and does away with the challenge of han- scalability based on an extensive case study of Linux de- dling mixed input languages involving assembly code. vice drivers. Finally, we discuss related work in Sec. 6 Symbolic execution, rather than the concrete execution and our conclusions in Sec. 7. adopted in testing, can handle software functions with many input parameters whose values are typically not known at compile time. It is the existence of efficient 2 Pointer Safety, Aliasing & IR SMT solvers that makes the symbolic approach feasi- ble. Symbolic execution also implies a path-wise explo- ration, thus reducing the aliasing problem and allowing The verification technique developed in this article aims us to handle complex pointer operations and computed at ensuring that every pointer in a given program is valid (i) jumps. In addition, slicing is now conducted at path- in the sense that it never references a memory loca- level instead of at program-level, resulting in drastically tion outside the address space allocated by or for that (ii) smaller slices to the extent that abstraction is not neces- program, and respects the usage rules determined sary for achieving scalability. However, the price of sym- by the Application Programming Interfaces (APIs) em- bolic execution is that it must be bounded and can thus ployed by the program. only analyze code up to a finite depth and breadth. 2.1 Pointer Safety To evaluate our technique, we have implemented a prototypic SOCA tool, the SOCA Verifier, for programs There are six categories of pointer safety properties: compiled for the 32-bit Intel Architecture (IA32) and performed extensive experiments. Using the Verisec [33] – Dereferencing invalid pointers: A pointer must not benchmark we show that our verifier performs on par be NULL, shall be initialized, and shall not point to a with the model checkers LoopFrog [31] and SatAbs [12] memory location outside the address space allocated with regards to performance, error detection, and false- by or for the program; positive rates. We have also applied the SOCA Verifier – Uninitialized reads: Memory cells shall be initialized to 9,296 functions taken from 250 Linux device drivers. before they are read; Our tool is able to successfully analyze 95% of these – Violation of memory permissions: Permissions to a functions and, despite the fact that SOCA performs a program’s segment shall be observed, which are as- bounded analysis, 28% of the functions are analyzed ex- signed when the program is loaded into memory and haustively. Since heap-aware program slicing is key to determine whether a segment can be read, written, SOCA’s scalability, we also provide a detailed experi- or executed; mental evaluation of the slicing strategy implemented in – Buffer overflows: Out-of-bounds read and write op- our tool. Overall, SOCA proves itself to be a capable erations to objects on the heap and stack shall not technique when being confronted with checking pointer- occur, as this might lead to memory corruption and complex software such as OS components. It effectively give way to various security problems; 2 – Memory leaks: A program shall not lose all handles to looking at the program’s source code but valid from the dynamically allocated memory, as this would mean compiler’s point of view, since it assumes that the two that the memory cannot be deallocated anymore; pointers are pointing to different data objects.

Symbolic Object Code Analysis⋆

Executable Code Is Not the Proper Subject of Copyright Law a Retrospective Criticism of Technical and Legal Naivete in the Apple V

The LLVM Instruction Set and Compilation Strategy

Architectural Support for Scripting Languages

CDC Build System Guide

Using Ld the GNU Linker

Optimizing Function Placement for Large-Scale Data-Center Applications

LLVM, Clang and Embedded Linux Systems

Compiling.Pdf

Code Generation

ACUCOBOL-GT | Open Systems COBOL Compiler and Runtime System

Looking Under the Hood: Software

Safety Certification of Compiler Generated Object Code