AUTOMATIC REUSABILITY ANALYSIS OF BINARIES

RAMAKRISHNAN VENKITARAMAN and GOPAL GUPTA
Applied Logic, Programming-Languages, and Systems (ALPS) Laboratory
Department of Computer Science
The University of Texas at Dallas
Email: [email protected], [email protected]

We consider reusability of software component binaries. Reuse of code at the binary level is important because usually only the machine code for system components is available; vendors do not want to share their source code for proprietary reasons. We develop necessary and sufficient conditions for ensuring that software binaries are reusable and relate them to the coding standards that have been developed in the industry to ensure binary code reusability. These coding standards, in essence, discourage (i) the use of hard-coded pointers, and (ii) the writing of non-reentrant code. Checking that binary code satisfies these standards/conditions, however, is undecidable in general. We thus develop static analysis based methods for checking if a software binary satisfies these conditions. We also consider the problem of automatically checking if coding standards have been followed in the development of embedded applications. The problem arises from practical considerations because DSP chip manufacturers (in our case Texas Instruments) want various third party software developers to adhere to a certain coding standard to facilitate system integration during application development. This static analysis rests on an abstract interpretation framework. Our analyzer takes object code as input, disassembles it, builds the flow-graph, and statically analyzes the flow-graph for the presence of dereferenced pointers that are hard coded. The analyzer is currently being extended to check for compliance with other rules adopted by TI as part of its coding standards.

Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems—Real-time and Embedded Systems; D.2 [Software]: Software Engineering

General Terms: Reliability, Standardization, Verification

Additional Key Words and Phrases: Embedded Software Components, Executable Code, Static Analysis, Abstract Interpretation, Assembly Code

1. INTRODUCTION

Software components have received considerable attention in recent years. The dream is to develop a virtual marketplace of commercial-off-the-shelf (COTS) software components developed by third party vendors. To assemble new applications, developers merely choose the right components and glue them together, perhaps with a small amount of additional code (glue code). Software components thus promote software reuse (plug-and-play), which helps reduce software development time, development cost, and the time-to-market for new software based systems.

1 This paper is an expanded version of the paper "Static Program Analysis of Embedded Executable Assembly Code" that appears in ACM CASES'04.

Most third party vendors are unwilling to provide the source code of a component for proprietary reasons. Thus, in most cases, only binary code is available. Distributing software in binary form means that integration of the software with other applications does not require recompilation, only linking with the application. That is, application developers use the API provided to call the functions available in the component; the application code then has to be linked with the software component's binary prior to being loaded into main memory for execution. However, the problem that arises then is ensuring that the software component is written in such a way that its binary code can be freely reused. For example, the component should be written in such a way that it does not change its own binary code during execution (i.e., no self-modifying code). In this paper we are interested in analyzing the necessary and sufficient conditions under which a component binary can be reused.2 We are also interested in detecting automatically if a binary satisfies these conditions.

Our work is motivated by practical concerns for software reuse in the digital signal processing (DSP) industry [8]. Texas Instruments (TI), the world's leading manufacturer of DSP hardware, is interested in developing a marketplace for DSP software COTS components. However, most of the DSP code from vendors is available as a binary for DSP processors in the TI TMS320 family. DSP software developers tend to use low level optimizations to make their software very efficient. One has to ensure that these low level optimizations do not interfere with reusability. Researchers at Texas Instruments have developed "general programming rules" as part of their Express DSP Algorithm Interoperability Standard (XDAIS) [37], which defines a set of requirements for DSP code (for the TI TMS320 family of DSP processors). If DSP software developers follow these coding guidelines, then it will be possible for system integrators to quickly assemble production quality DSP systems from one or more subsystems. The programming rules essentially lay down restrictions that, if not followed, will result in code incompatibility during reuse (e.g., one rule restricts the usage of hard coded pointers in the program).

In this paper we analyze necessary and sufficient conditions for software binary code reusability. We relate these conditions to TI's XDAIS standard. The necessary and sufficient conditions are derived from the fact that linking and loading of binaries is done under certain assumptions. The binaries must not execute any instruction that will violate these assumptions.

Further, we are interested in developing automatic tools that will detect if a program is not compliant with these conditions. However, the compliance checking is undecidable in general (this is the reason why TI has found it hard to develop a tool for checking compliance of program code with the XDAIS standard [8]). We propose to use static analysis to perform this compliance check. However, static analysis of the program code is complicated by the fact that the source code of the program is not available—vendors generally just ship binaries that can be linked with other code.

2 Note that software reusability is a very broad term [17]; however, for our work we are primarily interested in reuse of software binary code. In the rest of the paper, the term software reuse should be taken to mean software binary code reuse.

Thus, to check for compliance, assembly code has to be analyzed. It should be noted that static analysis of assembly code is quite hard, as no type information is available. Thus, for example, distinguishing a pointer variable from a data value becomes quite difficult. Also, most compilers take the instruction level parallelism and instruction pipelining provided by modern processors into account while generating code. This further complicates the automatic static analysis of assembly code.

Our static analysis framework is based on abstract interpretation. Thus, assembly code is abstractly interpreted (taking instruction level parallelism and pipelining into account) to infer program properties. A backward analysis is used since in most cases the data type of a memory location has to be inferred from how the value it stores is used. Once a point of use is determined, the analysis proceeds backwards to check for the desired property. We illustrate our approach by considering how we statically analyze for the presence of hard-coded pointers (rule 3 of the TI XDAIS standard [37]), i.e., how we check whether a pointer variable has been assigned a constant value by the programmer. We give details of the tool that we have built for this purpose. Other rules can be checked in a similar manner.

Note that our goal in this project has been to produce an analyzer that can be used to check for compliance of large commercial quality program codes. Thus, all features of C that may impact the analysis have been considered (most DSP code is written in C). For example, in hard-codedness analysis, we have to consider cases where pointers are implicitly obtained via array declarations, pointers with double or more levels of indirection (e.g., int **p), pointers to the statically allocated global data area, etc. Our work makes the following contributions:

—Necessary and sufficient conditions are developed for checking if a software binary code is reusable.
—A static analysis based framework is proposed for checking if these reusability conditions are satisfied. We show that software binaries can be successfully analyzed via abstract interpretation based techniques, even in the presence of instruction level parallelism and pipelining.
—The static analysis based framework is used to develop a practical system for compliance checking for one of the conditions (hard-codedness of pointers).

We assume that the reader is familiar with abstract interpretation; a tutorial introduction can be found in Chapter 1 of [1]. The main contribution of this paper is to show that embedded machine code can be successfully analyzed via abstract interpretation based techniques, even in the presence of instruction level parallelism and pipelining. We show that abstract interpretation based static analysis can be successfully used to check compliance with coding standards for improving code reuse and interoperability. Indeed, our work has resulted in a practical tool for doing hard-codedness analysis of embedded DSP software.


2. REUSE OF SOFTWARE BINARIES

Third party code is generally made available by supplying a software binary that is linked with the applications that make use of it. Vendors are generally reluctant to share their source code for proprietary reasons. If software component binaries are to be reused, then one must ensure that these binaries do not include code that will compromise the reusability of the code. Reusability can be compromised, for example, if the code contains hard-coded pointers, or if the code is self-modifying or modifies other binaries that are linked to it.

Note that it is important to make a distinction between usability and reusability of a software system. Reusability is a broad term [17] and its meaning may differ based on the context (for example, depending on the context, software with no documentation can be considered non-reusable). By software usability we mean that the program has been debugged and errors removed to the extent possible. Usable software can be safely used in the context for which it was created. Note that when we use the term "reuse" in this paper, we are interested in using the software as is, that is, without modification for compatibility with the new system. For example, if a programmer wrote a program for mobile phone of type A, it is considered usable if it can be safely used in mobile phone of type A. If some of the characteristics of the program make it unusable in mobile phone of type B, then we consider the program to be non-reusable. The presence of hard-coded addresses may not hinder the use of the software but may hinder the reuse of the software in other systems.

Certain programming idioms, if used, may appear to compromise the reusability of the software system; however, on closer analysis one realizes that they relate to issues of usability. For example, if a binary code makes array references that are out of bounds, then this may appear as violating binary code reusability (the references may be to addresses outside the binary's code or data area). Indeed, reusability is compromised if out-of-bound array references are made intentionally. However, if out-of-bound array references are present unintentionally, then this means that the program still has software bugs that have not been removed. Thus, the software is not even usable, let alone reusable. In this paper we are not interested in issues of software usability.

Once binary code is given, it goes through two more steps prior to execution: (i) linking this binary with other binaries (done by a linker); and (ii) the final loading of the resultant executable into main memory by the loader after address relocation. A linker primarily performs the task of symbol resolution (determining relative offsets for labels), while a loader adds the proper offsets to addresses so that the program can be loaded in the area of the virtual memory allocated by the operating system. Linking and loading operations are performed under certain assumptions. Reusability is compromised if the execution of the binary code results in these assumptions being violated. A linker obviously assumes that the offset of a label will not change later. That is, given an instruction in a binary B1 involving a label, e.g., jump L1, where L1 is a label defined in another binary B2 that is being linked to B1, the offset for L1 should not change after linking. That is, the code starting from location L1 will not be relocated somewhere else after the linking phase is over.
Similarly, a loader assumes that an executable can be loaded anywhere in the virtual memory. Thus, to ensure reusability of binary code, the following two conditions must hold:

C1: The binary code should not change during execution in a way that link-time symbol resolution will become invalid.

C2: The binary code should not be written in a way that it needs to be located starting from some fixed location in the virtual memory.

We assume that no additional information is given with respect to the conditions under which a software component binary is to be used, apart from a specification of the API. Note that if the code modifies itself during execution, then it is not re-entrant. Similarly, program code that hard-wires code or data locations violates the assumption that the code can be loaded anywhere in virtual memory. In the case of hard-wiring, arguably reusability can be maintained, provided the set of hard-wired locations is known in advance. However, given that a loader may load a program anywhere, these hard-wired addresses can clash with the code itself (i.e., they may fall in the code area). Thus, the code is not reusable if it violates the assumption made by the loader.

Theorem: Conditions C1 and C2 are necessary for binary code reusability.

Proof: The proof by contradiction is straightforward. These conditions are also sufficient, because they cover all the assumptions made during linking and loading. □

Note, however, that conditions C1 and C2 are hard to characterize and even harder to detect. Thus, in practice we broaden these conditions and consider more general conditions that are easier to characterize and detect. A broader condition that captures C1 is that the binary code should be re-entrant. Similarly, for C2 it is sufficient to check that there are no hard coded memory addresses in the program. Thus, checking for reusability can be reduced to checking for the following conditions:

C3: The binary code is re-entrant.

C4: The binary code does not contain any hard-wired memory addresses.

Note that code re-entrance is a very useful way of characterizing conditions for reusability, because very often the same component binary may be executed by multiple threads or processes. Code re-entrance implies that such concurrent execution can take place safely.

Theorem: If conditions C3 and C4 hold, then the binary code is reusable (i.e., C3 and C4 are sufficient).

Proof: We show that if C1 (resp. C2) does not hold, then C3 (resp. C4) does not hold either. If C1 does not hold, then the symbol mapping for address labels determined at link-time does not hold at execution time. This implies that the symbol mapping was altered at execution time, i.e., the binary code got altered during execution, which in turn implies that the code is not re-entrant. Similarly, if C2 does not hold, then there must be some address in the binary code used during execution that is fixed. Thus, C4 does not hold. □

Note that while C3 implies C1 and C4 implies C2, the implications do not necessarily hold in the other direction. Thus, the code may not be re-entrant yet may be reusable, as long as the modifications made to the binary at run-time are such that the symbol mapping is not altered and only one thread uses the binary code at any given time.

Likewise, hard wired addresses may be present, yet the code may still be loaded anywhere, as long as the specific hard-wired addresses are known and do not interfere with the area where the code is loaded. Thus, C3 and C4 are sufficient conditions, but not necessary conditions.

Finally, note that the application must be re-entrant as a whole. Checking for re-entrancy of a component binary may not be enough, because some other component binary may modify it during execution. Thus, each component binary, when checked in isolation, may appear to be re-entrant, and yet the application as a whole may not be re-entrant when the binaries are put together.
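To make condition C3 concrete, consider the following C fragment (a hypothetical illustration of ours, not taken from the XDAIS standard or from TI code). It contains no hard-coded addresses, yet it is not re-entrant, because it accumulates state in a static buffer; two threads or components calling it concurrently can corrupt each other's results:

#include <string.h>

static char scratch[64];            /* shared, writable static state */

/* Not re-entrant: every call writes into the same static buffer
 * and returns a pointer to it, so concurrent callers interfere. */
const char *tag_message(const char *msg)
{
    strcpy(scratch, "tag:");
    strncat(scratch, msg, sizeof(scratch) - 5);
    return scratch;                 /* all callers share this buffer */
}

A re-entrant version would write into a caller-supplied buffer instead of the static one.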

3. COMPLIANCE WITH THE XDAIS PROGRAMMING STANDARD

The Texas Instruments (TI) Express DSP Algorithm Interoperability Standard (XDAIS) [40] defines a set of requirements for DSP code (for the TI TMS320 family of DSP processors) that, if followed, will allow system integrators to quickly assemble production quality embedded systems from one or more subsystems. The standard aims to provide a framework that will enable the development of a rich commercial-off-the-shelf (COTS) marketplace for DSP component technology for the TI TMS320 family of DSP chips. This will significantly reduce the time to market for new DSP based products and encourage reuse.

There are 34 rules and 15 guidelines defined by the standard that program code developed by a third party DSP software vendor should follow. Rules 1 to 6 in the standard fall under the category of "general programming rules" (Section 4) and pertain to standards that should be followed during program coding. Tools are available from TI that check for the compatibility of a program code with most of the non-programming rules (i.e., rules other than 1 to 6). The non-programming rules are relatively straightforward and hence easy to check automatically (e.g., rule 11 states that all modules must have an initialization and finalization method). Compliance of a program code with the general programming rules, in contrast, is much harder to check. In fact, this compliance checking is undecidable in general (this is the reason why the DSP industry has found it hard to develop a tool for checking compliance of program code with rules 2 through 6 [40]). As a result, no tool has been developed for checking compliance of a program code with rules 1 through 6. We propose to use static analysis to perform this compliance check. However, static analysis of the program code is complicated by the fact that the source code of the program is not available—vendors generally just ship binaries that can be linked with other code. Thus, to check for compliance, assembly code has to be analyzed.

This paper describes our approach to checking a program's binary code for compatibility with the "general programming rules" defined by the standard. Our approach is based on statically analyzing the disassembled code. Static analysis of assembly code is quite hard, as no type information is available. Thus, for example, distinguishing a pointer variable from a data value becomes quite difficult. Also, most compilers take the instruction level parallelism and instruction pipelining provided by modern processors into account while generating code. This further complicates the automatic static analysis of assembly code.


4. GENERAL PROGRAMMING RULES

The coding standard rules, published by TI for software vendors of its DSP chips, that fall under the category of "general programming rules" [40] are the following:

(1) All programs must follow the runtime conventions imposed by TI's implementation of the C programming language.
(2) All programs must be reentrant within a preemptive environment, including time-sliced preemption.
(3) All data references must be fully relocatable (subject to alignment requirements). That is, there must be no "hard coded" data memory locations.
(4) The code must be fully relocatable. That is, there can be no hard coded program memory locations.
(5) Programs must characterize their ROM-ability, i.e., state whether they are ROM-able or not. ROM-ability means that if part of the executable is placed in the DSP ROM, it would still function; this restricts the way global data can be accessed, etc. [40].
(6) Programs must never directly access any peripheral device. This includes but is not limited to on-chip DMAs, timers, I/O devices, and cache control registers.

Rule 1 is not really a programming rule, since it requires compliance with TI's definition of C. However, rules 2 through 6 are manifestations of conditions C3 and C4 above: rules 2 and 5 correspond to condition C3, while rules 3, 4, and 6 correspond to condition C4. In light of these conditions, and examining the entire instruction set of TI's TMS320 family of DSP processors, one can show that the XDAIS standard is indeed sufficient for ensuring that binary code that is compliant with it is reusable.

There are a number of advantages for DSP software vendors in writing programs that comply with the published standard [40]. Compliance with the standard:

(1) Allows system integrators to easily migrate between TI DSP chips.
(2) Enables host tools to simplify a system integrator's tasks, including configuration, performance modeling, standard conformance, and debugging.
(3) Allows subsystems from multiple software vendors to be integrated into a single system.
(4) Makes programs framework-agnostic [40], that is, reusable: the same program can be efficiently used in virtually any application or framework.
(5) Allows programs to be deployed in purely static as well as dynamic run-time environments (due to code relocatability).

5. AUTOMATIC REUSABILITY ANALYSIS

Next, we are interested in developing tools that automatically detect if a binary code is reusable. This entails automatically determining if any of the five programming rules above (rules 2 through 6) are not complied with. Detecting if rules 3, 4, or 6 are violated involves checking that there are no hard-coded references in the code. Checking for rules 2 and 5 involves ensuring that no writes are made to the code area during execution.

Checking for hard-codedness, or checking that no writes are made to a specific memory area, is undecidable in general [18]. Thus, one has to resort to approximating this automated check. A standard method is to use static analysis [3]. Static analysis, however, is complicated by the fact that only binary code is available. All the type information is lost in the binary code; thus even determining whether a value is an address or data is not easy. In the rest of the paper we consider the problem of detecting hard-codedness and develop an abstract interpretation based framework for detecting hard-coded references. A similar analysis can be developed to check for code re-entrance (that is, for checking that no writes are made to the code area; see the work by Debray [15], for instance).

6. ABSTRACT INTERPRETATION BASED STATIC PROGRAM ANALYSIS

Using hard coded memory addresses is considered a bad programming practice (except in some situations, e.g., while programming device drivers), and we would like to automatically detect such hard coded references in order to certify a program as relocatable and/or reusable. Detecting if a given pointer points to a fixed memory address is undecidable in general (the proof is straightforward and omitted). Static program analysis is a standard tool for approximating properties that are undecidable. Given this undecidability, any static program analysis to detect hard-codedness will only give approximate answers.

Static program analysis (or static analysis for brevity) is defined as any analysis of a program carried out without completely executing the program. Static analysis provides significant benefits and is increasingly recognized as a fundamental tool for analyzing programs [19]. The traditional data-flow analysis found in compiler back-ends is an example of static analysis [3]. Another example of static analysis is abstract interpretation, in which a program's data and operations are approximated and the program abstractly executed (the abstraction is done in a way that ensures termination) to collect information [13; 1].

Fig. 1. Interpretations using Static Analysis

Since static analysis can only be approximate, decisive results cannot always be obtained. For example, when doing analysis to detect whether the code is hard coded or not, two approaches are possible:

(1) The approximation is done in such a way that non hard-coded references may be declared as hard-coded, but hard-coded references will never be declared as non hard-coded. Thus, when the analysis result says "not hard coded" then the algorithm is certainly compliant, but if it says "hard coded" then it may or may not be compliant.


(2) The approximation is done in such a way that hard-coded references may be declared as non hard-coded, but non hard-coded references will never be declared as hard-coded. Thus, when the analysis result says "hard coded" then the program code is certainly not compliant, but if it says "not hard coded" then it may or may not be compliant.

Thus, static analysis based approaches give the correct answer in most cases but may fall into the area of uncertainty for certain kinds of inputs (see Figure 1). Our experience indicates that for most practical programs, hard-codedness analysis can precisely indicate whether an assignment to a pointer is hard coded or not hard coded. For the static analyzer to place an assignment to a pointer in the "not sure" area (Figure 1), the code has to be really convoluted. That is, a programmer would have to go to great lengths to fool the static analyzer and prevent it from making a precise determination. Our experience indicates that such convoluted code is rarely written in practical applications.

7. DOMAINS IN OUR STATIC ANALYSIS

In abstract interpretation based static analysis [13; 1], the domains from which variables draw their values are approximated by abstract domains. The original domain is called the concrete domain. Further, for each operation over these domains, a corresponding abstract operation is defined over the abstract domain.

A program is represented by a flow chart, and its semantics is given via a mapping from arcs that connect nodes of the flow chart to environments, where an environment is a mapping from variables to values in the concrete domain. In the abstract semantics, the environment is abstracted as a mapping from variables to values in the abstract domain. A state is defined as a pair ⟨arc, env⟩, where arc is an arc in the flow chart and env is the environment that exists along that arc at a given moment. The same arc may be in different states (depending on the values of the variables) at different moments during the execution. The function next maps a given state to the next state that will be reached in the execution. The meaning of the program P is the solution of the recursive equation

    P = next • P

which is given by fix(λf. next • f).

In the abstract interpretation framework [13], a collecting semantics is used, i.e., we consider the set of all the abstract environments that might be associated with a program point (an arc in the flowgraph). This set of abstract environments is called a context. The collecting semantics thus associates a context with each arc. A context is a member of the powerset of the set of all environments:

    Contexts = 2^Env

The context associated with a particular arc is the set of all environments that can exist along that arc, if we started execution from any of the initial nodes of the flow graph.


Fig. 2. Lattice Abstraction

The collecting semantics is not computable because it gives exact information; it is therefore approximated. Given a pointer variable that is dereferenced, we are only interested in whether this pointer was hard coded earlier or not. Let A be the set of all memory addresses. An environment maps a pointer variable to an address in the set A (for hard-codedness analysis we are only interested in pointer variables). A can be divided into two sets, Anhc and Ahc, where Anhc represents legitimate (i.e., not hard coded) memory addresses (those that might be returned by the system calls malloc, calloc, or realloc, or by an address-of operation, e.g., the & operator in C), and Ahc represents the rest of the memory addresses. Thus, Anhc ∪ Ahc = A and Anhc ∩ Ahc = ∅.

We approximate the domain of addresses by an abstract domain Aα, where Aα = {⊥, hc, nhc, ⊤}. Note that hc abstracts the elements of the set Ahc while nhc abstracts the elements of the set Anhc; the value ⊥ represents complete lack of information, while ⊤ denotes that we cannot decide whether the value is hc or nhc. Aα forms a lattice, as shown in Figure 2. Note that we assume that NHC addresses are those that are derived from calls to memory allocation routines (malloc, etc.), and we also assume that any offset from an NHC address is also NHC. Thus, the sets Ahc and Anhc are determined by each program and may vary from execution to execution depending on where the heap and stack are allocated. Our analysis is able to cope with this, since it regards any address derived from malloc, calloc, realloc, and the & operator as safe, while any address directly assigned in the program is regarded as unsafe.

Following the abstract interpretation approach, we define the abstraction and concretization functions, α and γ, respectively. α : Contexts → Abstract Contexts, where Abstract Contexts consists of abstract environments which map pointer variables to values in Aα.

    α(C) = ⊥     if C = {}
         = nhc   if C ⊆ Anhc
         = hc    if C ⊆ Ahc
         = ⊤     otherwise

γ : Abstract Contexts → Contexts

    γ(S) = {}     if S = ⊥
         = Ahc    if S = hc
         = Anhc   if S = nhc
         = A      otherwise

We next have to abstract the operators involving pointers. Pointers can be involved in pointer arithmetic expressions that use the + and - operations. Given a statement p = q + i, where p and q are pointers to integers and i is an integer, if q is hard coded then our analysis should infer that p is also hard coded. Likewise, if q is NHC, our analysis should infer that p is also NHC. However, given pointers p, q, and r, and p = q + r, for p to be inferred as hard coded, both q and r must be hard coded. If, say, q is hard coded but r is not, then q must be treated as an offset from the safe pointer r, and our analysis should infer that p is not hard coded. With this in mind, the definition of the abstracted + and - operations for pointers is shown in Table I.

+/-  |  hc    nhc   ⊥     ⊤
-----+------------------------
hc   |  hc    nhc   ⊥     ⊤
nhc  |  nhc   nhc   nhc   nhc
⊥    |  ⊥     nhc   ⊥     ⊥
⊤    |  ⊤     nhc   ⊥     ⊤

Table I. Abstracting Pointer Arithmetic
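The abstract domain and the operation of Table I are small enough to express directly in code. The following C sketch is ours: the enum encoding and the function names are our own inventions, not part of the analyzer's interface.

typedef enum { BOT, HC, NHC, TOP } AbsVal;    /* ⊥, hc, nhc, ⊤ */

/* Abstracted +/- of Table I: any NHC operand makes the result NHC,
 * since an offset from a safe address is itself safe. */
AbsVal abs_arith(AbsVal a, AbsVal b)
{
    if (a == NHC || b == NHC) return NHC;
    if (a == BOT || b == BOT) return BOT;
    if (a == TOP || b == TOP) return TOP;
    return HC;                     /* both operands hard coded */
}

/* Join (⊔) in the lattice of Figure 2, used where paths merge. */
AbsVal abs_join(AbsVal a, AbsVal b)
{
    if (a == b)   return a;
    if (a == BOT) return b;
    if (b == BOT) return a;
    return TOP;                    /* hc ⊔ nhc = ⊤; x ⊔ ⊤ = ⊤ */
}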

Once the abstract operators are defined, we can compute the abstract semantics of the program by computing the fix point of the recursive equation:

    Pα = next • Pα

where Pα is the abstract program (the abstract program ignores instructions that are determined not to affect a pointer value). Note that the abstract contexts (Abstract Contexts) form a lattice under the subset relation ⊆, where the join (⊔) and meet (⊓) operations correspond to set union and set intersection, respectively. The join operation is used to merge the abstract contexts of multiple arcs leading into a common node. The abstract semantics, once computed, tell us which pointer references are hard coded.

Finally, we have to show that our analysis is sound. The soundness of the analysis follows if we can show that α and γ are mutually consistent and that the Abstract Contexts form a lattice.

Theorem: The hard-codedness analysis formulated above is sound.

Proof Sketch: Consistency of the α and γ functions is established by showing that they constitute a Galois connection [13]. That is:

    x = α(γ(x))      ...(1)
    γ(α(y)) ⊇ y      ...(2)

It is easy to check that both (1) and (2) hold. Given the way Abstract Contexts are defined, it is also easy to see that the set of Abstract Contexts forms a lattice under the subset (⊆) relation. Thus, our analysis is indeed sound. □

8. ANALYSIS OF HARD-CODED POINTERS

We illustrate our static analysis based approach to compliance checking by showing how we check compliance for rule #3, which states that there should be no hard-coded data memory locations (analysis for rules #4 and #6 can be carried out in a similar manner).

A data memory location is hard coded in the assembly code if a constant is moved into a register Ri, and Ri is then used as a base register in a later instruction. The constant value may, of course, be transferred to another register Rj, directly or indirectly, and Rj then used in the dereferencing. Since most of TI's DSP code is written in C, data memory locations can be hard coded either in assembly code embedded in a C program, or by using the pointers provided in the C language. Thus, the problem of detecting hard coded references is to check whether a pointer variable that is assigned a constant value is dereferenced in the program. Using C syntax for illustration, given a variable p of type int *, we want to check if there is an execution path between a statement of the type p = k, where k is an expression that yields a constant value, and a later statement containing *p (which dereferences the pointer). Of course, the dereferencing may take place directly or indirectly, i.e., we might have an intervening statement int *q = p followed by a later statement containing *q. Note that if a program only hard codes a pointer variable but never dereferences it, the program is deemed safe. It is only after such a pointer variable is dereferenced during subsequent execution that the program is deemed unsafe.

The analyzer adopts a set of criteria to decide whether a dereference is hard coded or not. As mentioned earlier, a register corresponding to a pointer is hard coded if it is assigned a constant value, as in the statement "Reg=0x8800". Some of the ways in which an address can be legitimate are:

—The address is derived from a call to a memory allocation routine like "malloc" or "calloc", or from other legitimate calls.
—The address is derived as a function of the stack pointer.
—The address is derived from another pointer that is legitimate.

Based on the above, the following cases can be distinguished. We illustrate each case with a simple example.

—A dereferenced pointer is NOT hard coded if even one of the elements in its unsafe set is not hard coded.
Example: Consider the following C code fragment. (Note: some examples throughout this document are given as C code fragments for clarity, though the real analysis is done on the assembly code obtained from the object code.)

int *p, *q, *r, val;              // line 0
p = (int*)malloc(sizeof(int));    // line 1
q = (int*)0x8000;                 // line 2
r = p + q;                        // line 3
val = *r;                         // line 4

In this example, the pointer "p" is legitimate and the pointer "q" is hard coded. The pointer "r" is derived from the pointers "p" and "q". The pointer "r" is dereferenced and assigned to the variable "val". Clearly, "r" is not a hard coded pointer, as it is derived as a function of the legitimate pointer "p". When the variable "r" is dereferenced in line 4, the unsafe set gets initialized to {r}. When the assignment statement in line 3 is encountered, the unsafe set gets modified to {p, q}. Analysis of line 2 concludes that "q" is hard coded, and analysis of line 1 concludes that the pointer "p" is legitimate. Since "p" is safe and "r" is derived as a function of "p", the dereference of the pointer "r" in line 4 is also safe.


—A dereferenced pointer is hard coded if and only if all the elements in the corresponding unsafe set are detected to be hard coded.
Example: Consider the following C code fragment.

int *p, *q, *r, val;     // line 0
p = (int*)0x8800;        // line 1
q = (int*)0x8000;        // line 2
r = p + q;               // line 3
val = *r;                // line 4

In the example above, the pointers "p" and "q" are hard coded. The pointer "r" is derived from the pointers "p" and "q". The pointer "r" is dereferenced and assigned to the variable "val". Clearly, "r" is hard coded, as it is derived from two pointers that were hard coded. When the variable "r" is dereferenced in line 4, the unsafe set is initialized to {r}. When the assignment statement in line 3 is encountered, the unsafe set gets modified to {p, q}. Analysis of line 2 concludes that "q" is hard coded, and further analysis reveals that the pointer "p" is also hard coded (line 1). Since all the elements in the unsafe set, namely "p" and "q", are hard coded, the original dereference of "r", to which the current unsafe set corresponds, is also declared hard coded.

—A program is said to be hard coded if even one of the pointers that is dereferenced is detected to be hard coded.
Example: Consider the following C code fragment.

int *p, *q, v1, v2;               // line 0
p = (int*)malloc(sizeof(int));    // line 1
q = (int*)8000;                   // line 2
v1 = *p;                          // line 3
v2 = *q;                          // line 4

In the example code above there are two pointers, of which "p" is clearly legitimate and "q" is hard coded. The analyzer first detects the dereference of the pointer "p" (line 3) and, after analysis, concludes that this dereference is safe (since the address is derived from a call to the memory allocation routine "malloc" in line 1). At this point the analyzer cannot yet declare that the program is safe and free of hard coded pointers; it needs to make sure that every other pointer dereference in the program is also safe. The dereference of the pointer "q" is unsafe, as the pointer "q" gets hard coded in line 2. So, though the dereference in line 3 is safe, the unsafe dereference in line 4 makes the entire program unsafe.

—A program is NOT hard coded if and only if all the pointers that are dereferenced are detected to be not hard coded.
Example: Consider the following C code fragment.


int *p, *q, v1, v2;                 // line 0
p = (int*)malloc(sizeof(int));      // line 1
q = (int*)calloc(1, sizeof(int));   // line 2
v1 = *p;                            // line 3
v2 = *q;                            // line 4

In the example code above, there are two pointer declarations and both pointers are derived legitimately (lines 1 and 2). The analyzer will detect the dereferences of the pointers in lines 3 and 4, respectively. Further analysis of the code will reveal that the pointers are not hard coded (lines 1 and 2). Since there are only two pointers that are dereferenced in the code, and since both of them are safe, the entire code is safe.

Clearly, the problem of detecting hard coded references is undecidable in general [18]. So we employ static analysis for the detection of hard-codedness (from this point on, we call the analysis hard-codedness analysis). The hard-codedness analysis analyzes each pointer variable and determines that the pointer is definitely hard-coded (HC), definitely not hard-coded (NHC), or that its hard-codedness status cannot be deduced. Thus, the analysis is conservative (as all static analyses are), in that there are instances where a determination of hard-codedness/non-hard-codedness cannot be precisely made. In practice, however, we have observed that our analysis—described in subsequent sections—is able to make a precise determination of hard-codedness/non-hard-codedness for nearly all pointer variables occurring in programs. Thus, in the benchmark programs we obtained from Texas Instruments, we rarely came across pointers whose hard-codedness or non-hard-codedness status could not be precisely determined (see Section 14).

Note also that the problem of detecting the dereferencing of hard-coded pointers subsumes the problem of detecting the dereferencing of NULL pointers. This is because a NULL pointer is a pointer that has been assigned a special constant (usually 0x0). Thus our analysis will also detect NULL pointer dereferences. Similarly, hard-codedness analysis subsumes the analysis for checking if un-initialized pointer variables are dereferenced. This is because hard-codedness analysis attempts to check if a pointer dereference is reachable from a point of initialization, and thus will detect any pointers that are dereferenced but not initialized. Thus, our hard-codedness analysis performs two of the checks proposed by the UNO project [23] at the assembly level. The UNO project claims that NULL pointers, un-initialized pointers, and out-of-bounds array references are the three most common run-time programming errors.

When analyzing at the assembly level, all the type information is unavailable, and distinguishing constants stored in integer variables from constant addresses stored in pointer variables is hard. The only way to distinguish between the two is to check if a register is dereferenced or used as a base register at some program point; if so, we can go backwards from that program point and check whether this register was directly or indirectly assigned a constant. We use an abstract interpretation [13; 1] based framework for static analysis. The abstract domain is quite simple and consists of four values: ⊥, ⊤, HC, and NHC. Abstract operators are defined for pointer arithmetic based on this abstract domain, and the abstract values are propagated in the flowgraph of the program (obtained by disassembling the machine code and analyzing the control flow). The abstraction is shown to be safe (by constructing a Galois connection [13; 1]).


Fig. 3. Activity Diagram for our tool

Abstract semantics of the program are defined via recursive equations. The "collecting semantics" of the flow-graph is then computed via fix-points, which allows us to check for hard-codedness. For most programs, our system is able to detect occurrences of hard-codedness with good accuracy.

9. THE ANALYSIS ALGORITHM

As discussed earlier, the analyzer has access only to the object (binary) code which is to be checked for compliance with the standard. So the given object code is disassembled and the corresponding assembly language code is obtained. The disassembly is performed using TI Code Composer Studio [38; 39]. The disassembled code is provided as input to the static analyzer, which produces a result indicating whether the code is compliant with the rule. To check for compliance with the standard, the binary code that is given as input is never executed. The analyzer scans through the disassembled code statically and checks whether there are any hard coded addresses. The basic aim of the analysis is to find a path in the disassembled code from the point at which the dereferencing of a pointer occurs to the point at which an address is assigned to the pointer, and then check whether that address is legitimate or not. Figure 3 shows the various steps involved in the analyzer we have developed.

After the disassembly of the object code, the assembly code is split into functions. The analysis is first done function by function (corresponding to individual functions in the source code). For each function, the analyzer computes its basic blocks and constructs the flow graph. The flow graph is then statically analyzed. Once information for the various functions (including main) has been collected, an interprocedural analysis should be carried out to analyze the entire program. The functions may have to be re-analyzed (until a fix-point is reached) during this interprocedural analysis.

It should be noted that there are advantages as well as disadvantages to performing static analysis at the assembly level. With respect to advantages, note that (i) parsing of source code is replaced by parsing of assembly instructions, which is considerably easier; (ii) in the source code, pointers may occur in complex expressions (e.g., a pointer in a struct inside another struct), which makes the analysis complex at the source code level; at the assembly level all pointer manipulations are handled via registers, and thus all occurrences of pointers appear very similar, regardless of how they were expressed in the source code; (iii) since we analyze assembly code, which is source language independent, many of the complex constructions allowed by C (e.g., an array of pointers where each pointer points to a struct, one of whose fields is a pointer) get handled by the compiler and appear in considerably sanitized form in the assembly code.

With respect to disadvantages, (i) all type information is lost at the assembly level, so addresses are indistinguishable from data; thus abstracting information becomes harder, complicating the analysis; (ii) registers are repeatedly reused for holding the values of different variables, and thus a lot of analysis is involved in figuring out which occurrences of a register in various instructions refer to the same value; (iii) assembly code is much more verbose than source code, and it heavily employs pipelining and instruction level parallelism; thus, a very good understanding of the processor architecture is needed, which then has to be modeled in the abstract semantics.

As an example, consider the following C code:

void main(){
    int a=1, b=2, c=3;
}

which translates to the assembly code shown below:

000007A0 main:
000007A0 07BE09C2     SUB.D2    SP,0x10,SP
000007A4 02002042     MVK.D2    1,B4
000007A8 0200012B     MVK.S2    0x0002,B4
000007AC 023C22F6  || STW.D2T2  B4,*+SP[0x1]
000007B0 020001AB     MVK.S2    0x0003,B4
000007B4 023C42F6  || STW.D2T2  B4,*+SP[0x2]
000007B8 023C62F6     STW.D2T2  B4,*+SP[0x3]
000007BC 00002000     NOP       2
000007C0 008C8362     BNOP.S2   B3,4
000007C4 07800852     ADDK.S2   16,SP
000007C8 00000000     NOP
000007CC 00000000     NOP

The presence of the || characters denotes instruction level parallelism. We can see the usage of register B4 in line 5 and a value assigned to it in line 4. But the content of B4 used in line 5 is not the one assigned in line 4; this is due to the presence of parallelism. Note that our analyzer takes care of parallelism and instruction pipelining while computing the abstract semantics.

So, in essence, at the assembly level the problems lie in detecting the registers that represent pointers; even if they are detected, multiple occurrences of the same register can hold different meanings, and even when the detected occurrences mean the same, the presence of parallelism may influence the value a register holds. The following sections elaborate on how we handle these problems to produce an elegant static analysis based solution to the problem of detecting hard coded pointers in embedded software. Since sound analysis cannot be performed directly on the executable code, the static analysis is performed at the assembly level.

10. ALGORITHM FOR OBTAINING BASIC BLOCKS AND FLOW GRAPH

A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or the possibility of branching except at the end. The basic blocks are constructed from the disassembled code [3].


(1) The analyzer first determines the set of leaders, the first statements of the basic blocks. The rules used are the following.
    (a) The first statement is a leader.
    (b) Any statement that is the target of a conditional or unconditional branch statement ("goto" statements, function calls, and the like) is a leader.
    (c) Any statement that immediately follows a branch or conditional branch statement is a leader.
(2) For each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program.

The next step is to construct the control flow-graph from the basic blocks. The basic blocks form the nodes of a directed graph called the control flow-graph [3]. The control flow-graph helps us visualize all possible paths through which program control could flow at runtime. All such paths must be analyzed for compliance. In the control flow-graph, one node is distinguished as the initial node: the block whose leader is the first statement. There is a directed edge from node A to node B if B can immediately follow A in some execution sequence [3]. That is, if:

(1) there is a conditional or unconditional jump from the last statement of A to the first statement of B, or
(2) B immediately follows A in the order of the program and A does not end in an unconditional jump.

A is said to be a predecessor of B, and B a successor of A.
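As a concrete illustration, the leader computation can be sketched as follows. This is our code, not the analyzer's: the Insn representation is hypothetical and much simplified compared to real disassembled instruction records.

typedef struct {
    int is_branch;    /* conditional or unconditional branch/call */
    int target;       /* index of the branch target, or -1        */
} Insn;

/* Mark leaders according to rules (a)-(c); each basic block then
 * runs from a leader up to, but not including, the next leader. */
void mark_leaders(const Insn *code, int n, int *leader)
{
    for (int i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                         /* rule (a) */
    for (int i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = 1;        /* rule (b) */
        if (i + 1 < n)
            leader[i + 1] = 1;                 /* rule (c) */
    }
}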

11. PHASES IN THE ANALYSIS

The analyzer functions in two phases. In the first phase, it scans through the flow-graph and detects all the register dereferences that correspond to the dereferencing of pointer variables in the source code. It stores this information in the various basic blocks in a set. We call such a set an unsafe set. The unsafe sets represent the abstract contexts discussed earlier. There is an unsafe set for each pointer that is dereferenced. This unsafe set records all the registers that may potentially hold the address corresponding to this pointer. In the second phase the unsafe sets are iteratively refined, until a fixpoint is reached.

Phase 1: Detecting dereferencing of pointers: To detect the dereferencing of pointers, the analyzer starts from the entry nodes in the flow-graph and visits every reachable node in the flow-graph. While visiting any node in the flow-graph, it checks for occurrences of pointer dereferencing. Dereferencing of a pointer is detected in the disassembled code when a register other than the stack pointer (SP) is used as the base register. Dereferenced registers are recorded in the unsafe sets. There is an unsafe set for each dereferencing operation in the program.

Phase 2: Checking if dereferencing is safe: In the second phase, the analyzer uses the information gathered in the first phase and ascertains the safety of each of the unsafe sets. This is done by analyzing the safety of the pointer across all possible paths through which the pointer might have got its value at the point of its dereferencing. That is, if there are multiple locations at which the same pointer may be hard coded (which may correspond to multiple paths in the flow graph), the analyzer will be able to detect and report all such locations.

Note that while scanning backward in the disassembled code, the analyzer takes into consideration a subgraph of the control flow-graph that it has already built. This subgraph consists of the nodes in the flow-graph which have a predecessor relation with the node in which the dereferencing occurred.

Refining unsafe sets: The unsafe sets are built iteratively, via a fixpoint computation, as described earlier in the presentation of the abstract interpretation framework. Initially, the unsafe set is empty; once phase 1 detects the dereferencing of a register, it adds that register to the set. So, if register "Reg" is seen being dereferenced, it is added to the unsafe set, which will now appear as {Reg}. In phase 2, the analyzer looks for statements in which an element of the unsafe set (in this case "Reg") is used as the destination register. That is, the analyzer is trying to find the most recent value that was assigned to the register "Reg". When it detects such an occurrence, say one corresponding to a statement like "Reg = Reg1 + Reg2", it deletes the element "Reg" from the unsafe set and inserts the elements "Reg1" and "Reg2" into the unsafe set. The unsafe set now becomes {Reg1, Reg2}. Phase 2 then continues with the current unsafe set (looking for occurrences of both Reg1 and Reg2 as destination registers in this case). Phase 2 terminates when the status of each of the unsafe sets has been determined. As discussed earlier, the status of an unsafe set is deemed to be HC if all the elements in the unsafe set are hard coded. If at least one of the elements in the unsafe set is not hard coded, then the corresponding pointer is safe. Note that if no pointers are dereferenced, the analyzer does not even enter phase 2. In phase 1, the analyzer inspects each line of the assembly code only once. This is achieved by maintaining a set of unsafe sets (SOUS) which is updated and modified as we move down the various instructions in the flow graph.

Merging information: During phase 2 of the analysis, the analyzer builds and populates the unsafe sets. Consider Figure 4, which represents part of some control flow graph. If a basic block has multiple successors, then the instructions of that basic block will be analyzed after merging the SOUS of all the successors. If the analyzer did not perform this merging operation, the analysis would be of exponential complexity. This is true especially if loops are involved. Since the unsafe sets (abstract contexts) form a lattice under subset ordering, the merging involves a join (⊔) operation, which is simply set union in this case. The static analysis based approximation and the merging of information make the complexity of the analysis O(n × m), where n is a measure of the size of the flowgraph and m a measure of the size of the (finite) lattices involved. Merging results in information loss. That is, since the common predecessor will be analyzed using a merged set, the information about the source paths of these merged SOUS is lost. But merging information does not make the analyzer give incorrect results, as the integrity of the individual unsafe sets is preserved.
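A single phase-2 refinement step can be sketched as follows. The sketch is ours and represents an unsafe set as a bitmask over register numbers; the real analyzer's data structures differ.

typedef unsigned int UnsafeSet;        /* bit i set => register i unsafe */

/* One backward refinement step for an instruction "dst = src1 op src2":
 * if dst is in the unsafe set, replace it by the source registers.
 * Pass -1 for a missing source operand (e.g., a constant). */
UnsafeSet refine(UnsafeSet s, int dst, int src1, int src2)
{
    if (!(s & (1u << dst)))
        return s;                      /* instruction does not affect set */
    s &= ~(1u << dst);                 /* delete the destination register */
    if (src1 >= 0) s |= 1u << src1;    /* insert the source registers     */
    if (src2 >= 0) s |= 1u << src2;
    return s;
}

Starting from {Reg} and scanning backward past "Reg = Reg1 + Reg2", refine removes Reg and adds Reg1 and Reg2, yielding {Reg1, Reg2} exactly as in the description above.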

12. HANDLING COMPLEX PROGRAMMING CONSTRUCTS

This section describes how the analyzer handles complex programming constructs like loops, arrays, pointers, global variables, function calls, and instruction level parallelism.


Fig. 4. Need for Merging Unsafe Sets

12.1 Handling Loops

A loop is detected when the predecessor of the current block is a block that has already been encountered in the analysis. Loops, if not properly handled, will result in the formation of wrong unsafe sets and incorrect results. As soon as the analyzer detects a loop, a new set called a loop-set, which is a set of sets of unsafe sets, is created. That is, each element of the loop-set is a set of unsafe sets (thus, a loop-set is a set of SOUS). The analyzer also remembers the starting and ending points (blocks) of the loop.

The first element added to the loop-set is the SOUS that the analyzer has computed when it detects the cycle. The analyzer then uses the current SOUS to analyze all the blocks forming the cycle, including all the possible paths involving those blocks in the control flow graph. When the end block of the loop is reached, the analyzer checks if the current SOUS is already a member of the loop-set. If the current SOUS is already a member, then the analyzer has reached a fixed point for the loop. That is, the analyzer has collected all the information from the loop, and additional cycles through the loop will not add any new information to the sets. At this point the analyzer can safely exit the loop and continue the analysis of other unvisited blocks.

If the current SOUS is not already a member of the loop-set, then the analyzer merges (through the ⊔ operation in the lattice) the current SOUS into the loop-set and continues to cycle through the loop with the current SOUS, until a fixed point is reached. Loops can be nested; in that case, the above procedure is performed for each of the inner loops. The fixed point is re-calculated for each of the inner loops on each cycle through the outer loop, until the outer loop reaches a fixed point.

The following example illustrates the rationale for keeping the loop-set as a set of SOUS and not as just the most recent SOUS.

void main(){
    int *b, *r, *p;
    //some code
    while(1){
        r = b;
        b = p;
        p = r;
    }
    *b;
}

In the above example, if the analyzer had maintained as its loop-set only the most recent SOUS as it cycles through the loop, the analyzer would never reach a fixed point: the loop-set would keep changing p to b and then b to p during subsequent iterations. This problem is avoided by keeping the loop-set as a set of unsafe sets.
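The fixed-point test for loops can be sketched as follows. This is our code: SOUS, sous_equal, and analyze_loop_body are hypothetical stand-ins for the analyzer's real data structures and its per-iteration analysis.

#include <stdbool.h>

typedef struct SOUS SOUS;                        /* a set of unsafe sets */
extern bool  sous_equal(const SOUS *a, const SOUS *b);
extern SOUS *analyze_loop_body(const SOUS *in);  /* one trip round loop  */

#define MAX_LOOP_SET 64

/* Iterate the loop body until the current SOUS has been seen before;
 * since the underlying lattices are finite, a fixed point is reached. */
SOUS *analyze_loop(SOUS *current)
{
    SOUS *loop_set[MAX_LOOP_SET];                /* the loop-set */
    int   n = 0;
    while (n < MAX_LOOP_SET) {
        for (int i = 0; i < n; i++)
            if (sous_equal(loop_set[i], current))
                return current;                  /* fixed point reached */
        loop_set[n++] = current;                 /* remember this SOUS  */
        current = analyze_loop_body(current);
    }
    return current;                              /* defensive bound */
}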

12.2 Handling Arrays

Operations on arrays, e.g.,

    int a[] = {...};
    ...
    a[..] = ....;

exactly resemble pointer operations at the assembly level. That is, if statically allocated array elements are accessed, the corresponding assembly code resembles the dereferencing of pointer variables. The analyzer handles these cases by looking at what the destination location of the access corresponds to, i.e., whether it is on the stack or in the heap.
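For instance, the test can be as simple as inspecting the base register of the effective address; the fragment below is an illustrative heuristic under that assumption, not the analyzer's actual code.

    #include <stdbool.h>
    #include <string.h>

    /* Treat SP-relative addressing as an access to statically allocated
       (stack) data rather than the dereference of a computed pointer. */
    static bool is_stack_access(const char *base_reg)
    {
        return strcmp(base_reg, "SP") == 0;
    }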

12.3 Pointers and Global Variables

There are two cases in which global variables influence the analysis. The first case arises when a pointer (a local variable) is assigned a value from a variable that is declared global (say, an integer after typecasting). The second case arises when a pointer that is itself declared as a global variable is dereferenced. In both cases interprocedural analysis is needed.

Suppose a pointer p is dereferenced in a function after being assigned the value of a global variable. Since the variable is global, it could be modified by any part of the program. If there is no assignment to that global variable in the current function prior to the dereferencing, then its value is obtained from outside the function, and a global interprocedural analysis has to be performed. The same situation arises when a pointer that is declared global is dereferenced. Interprocedural analysis is addressed in the next section.
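Both cases can be pictured with a small, purely illustrative fragment; the names are invented:

    int  g_addr;    /* global integer later cast to a pointer */
    int *g_ptr;     /* globally declared pointer              */

    void f(void)
    {
        int *p = (int *)g_addr;  /* case 1: a local pointer assigned from a
                                    global; g_addr may be set anywhere in
                                    the program                             */
        *p = 1;
        *g_ptr = 2;              /* case 2: a global pointer dereferenced
                                    directly                                */
    }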

12.4 Handling Functions

As already discussed, the analyzer first splits any given program into functions and analyzes each function for safety. However, interprocedural analysis is needed to detect hard coding in the following cases: (i) the return value of a function may be a hard-coded pointer that is later dereferenced in the calling function; (ii) arguments may be passed to a function by reference, the called function may hard code them, and the calling function may then use the hard-coded values; and (iii) function bodies may contain statements involving either a globally declared pointer or a pointer that is assigned an expression involving global variables.

We need to compute and keep track of the calling and return context for each function in a memo table. Thus, for each function, values are set in the memo table to denote whether the function's arguments or return values are hard coded. This memo table is built iteratively during the analysis of the program (since functions may be mutually recursive) until it reaches a fixed point. A similar iterative interprocedural analysis can be used to analyze the hard-codedness of global pointers and of pointers whose values depend on global variables. Our analyzer pre-processes the flow graph to determine whether some of the functions can be analyzed "stand alone." These are analyzed and their resulting contexts stored in the memo table in advance; the other functions are iteratively analyzed as outlined above.
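A memo-table entry might look as follows. This is a minimal sketch reusing the abstract values of our lattice (HC, NHC, ⊥, ⊤); the field names and bounds are invented for illustration.

    typedef enum { BOTTOM, NHC, HC, TOP } AbsVal;   /* ⊥, not hard coded,
                                                       hard coded, ⊤        */
    #define MAX_ARGS 8

    typedef struct {
        const char *name;      /* function being summarized                  */
        AbsVal arg[MAX_ARGS];  /* hard-codedness of each by-reference arg    */
        AbsVal ret;            /* hard-codedness of the returned pointer     */
        int    changed;        /* did the last analysis pass change a value? */
    } MemoEntry;

    /* All entries start at BOTTOM; functions are re-analyzed until no entry
       changes, yielding a fixed point even for mutually recursive functions. */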

12.5 Pointers Requiring Multilevel Dereferencing

The usage of hard-coded double pointers (**), or multilevel pointers in general, complicates the analysis, because double pointers can be used to indirectly hard code other single-level pointer variables.

When a single-level pointer (say int *p) is used for hard coding, the typical sequence of operations is as shown below:

    somefunction(...) {
        declare pointer variables;
        hard code the pointer variables;
        create aliasing between the pointers;
        dereference the pointers;
    }

Aliasing two pointers before hard coding one of them does not affect the analysis, since after the hard coding the two pointers are no longer aliased. However, in the case of doubly (or more deeply) indirected pointers, there can be sequences such as the following:

    somefunction(...) {
        declare pointer variables;
        create aliasing between the pointers;
        hard code the pointer variables;
        dereference the pointers;
    }

A concrete example is shown below:

    void main() {
        int *p, val;
        int **q = &p;
        *q = (int *)0x8000;   /* p is hard coded via q */
        val = *p;
    }

Detecting hard coding in these indirect cases involves analyzing each line of the code multiple times or carrying huge sets of aliases, making the analysis very costly. The analyzer that we have built instead flags a warning when it sees a multilevel pointer dereference. Note that the analyzer will detect hard coding of double pointers that are hard coded directly, i.e., occurrences such as (int **)p = 90; **p; will be detected without any extra effort. Multilevel pointers cause problems in the analysis only when they are used to hard code other pointers indirectly.

12.6 Handling Parallel Instructions

The || characters in the disassembled code signify that an instruction executes in parallel with the previous instruction [41]:

    instruction A
    || instruction B
    || instruction C

If a code sequence as shown above is encountered, where A, B, and C are assembly-level instructions, then A, B, and C are executed in parallel. That is, the instructions A, B, and C in the fetch packet belong to the same execute packet and are executed in the same cycle; moreover, they do not use any of the same functional units, cross paths, or other data-path resources. Static analysis of disassembled code must handle this kind of parallelism, and our analyzer does. As soon as the dereferencing of a base register, or the occurrence of an unsafe-set element as a destination register, is found to occur in parallel with other instructions, the analyzer continues the analysis with the instructions that occur in the previous cycle for that register or matched element.
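Recognizing execute-packet boundaries in the disassembly is straightforward; the fragment below is an illustrative sketch of the test, with the parsing deliberately simplified.

    #include <stdbool.h>
    #include <string.h>

    /* A disassembled line containing "||" runs in the same cycle as the
       previous line; a line without it starts a new execute packet, which is
       the granularity at which the backward analysis must step. */
    static bool starts_new_packet(const char *line)
    {
        return strstr(line, "||") == NULL;
    }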

13. ILLUSTRATIVE EXAMPLES

In this section we include some illustrative examples to show the capabilities of the analyzer we have developed. In the following discussion, all line and block indices are assumed to start at index 0. Samples of the input and of the output produced by our tool are given below; only simple examples have been selected, to keep the presentation short.


13.1 Compliant Code

Example 1: Sample disassembled code which is compliant

    000014C0 main:
    000014C0 01BC54F6    STW.D2T2   B3,*SP--[0x2]
    000014C4 00002000    NOP        2
    000014C8 0FFDDE10    B.S1       malloc
    000014CC 01856162    ADDKPC.S2  RL0,B3,3
    000014D0 02008040    MVK.D1     4,A4
    000014D4 RL0:
    000014D4 023C22F4    STW.D2T1   A4,*+SP[0x1]
    000014D8 00002000    NOP        2
    000014DC 02101AF3    MV.D2X     A4,B4
    000014E0 0280052A || MVK.S2     0x000a,B5
    000014E4 029002F6    STW.D2T2   B5,*+B4[0x0]
    000014E8 00002000    NOP        2
    000014EC 01BC52E6    LDW.D2T2   *++SP[0x2],B3
    000014F0 00006000    NOP        4
    000014F4 008CA362    BNOP.S2    B3,5
    000014F8 00000000    NOP
    000014FC 00000000    NOP

Result produced by our tool: "The code is compliant with Rule3"

Explanation: The above piece of code consists of only one basic block. During phase 1 of the analysis, the analyzer finds that the base register B4 gets dereferenced (in line 11), so the unsafe set is initialized to {B4}. Scanning backward (phase 2 of the analysis), it finds that register 'B4' is assigned a value from register 'A4', so the contents of the unsafe set are modified to {A4}. Scanning backward and looking for 'A4', the analyzer finds that register 'A4' got its value from a call to the memory allocation routine "malloc". So 'A4' is a legitimate pointer, and the dereferencing of register 'B4' (in line 11) is declared to be safe.

13.2 Non-Compliant Code

Example 2: Sample disassembled code which is NOT compliant

    00001440 main:
    00001440 01BCD4F6    STW.D2T2   B3,*SP--[0x6]
    00001444 00002000    NOP        2
    00001448 0FFDEE10    B.S1       malloc
    0000144C 01856162    ADDKPC.S2  RL0,B3,3
    00001450 02008040    MVK.D1     4,A4
    00001454 RL0:
    00001454 023C22F4    STW.D2T1   A4,*+SP[0x1]
    00001458 00002000    NOP        2
    0000145C 020FA02A    MVK.S2     0x1f40,B4
    00001460 023C42F6    STW.D2T2   B4,*+SP[0x2]
    00001464 00002000    NOP        2
    00001468 02002042    MVK.D2     1,B4
    0000146C 023C82F6    STW.D2T2   B4,*+SP[0x4]
    00001470 00002000    NOP        2
    00001474 001008F2    OR.D2      0,B4,B0
    00001478 200AA120 [ B0] BNOP.S1 L1,5
    0000147C 000D6120    BNOP.S1    L2,3
    00001480 02101AF2    MV.D2X     A4,B4
    00001484 023C62F6    STW.D2T2   B4,*+SP[0x3]
    00001488 L1:
    00001488 023C42E6    LDW.D2T2   *+SP[0x2],B4
    0000148C 00006000    NOP        4
    00001490 023C62F6    STW.D2T2   B4,*+SP[0x3]
    00001494 L2:
    00001494 023C62E6    LDW.D2T2   *+SP[0x3],B4
    00001498 00006000    NOP        4
    0000149C 021002E6    LDW.D2T2   *+B4[0x0],B4
    000014A0 00006000    NOP        4
    000014A4 023C82F6    STW.D2T2   B4,*+SP[0x4]
    000014A8 00002000    NOP        2
    000014AC 01BCD2E6    LDW.D2T2   *++SP[0x6],B3
    000014B0 00006000    NOP        4
    000014B4 008CA362    BNOP.S2    B3,5
    000014B8 00000000    NOP

Result produced by our tool:

    Hard Coding "MVK" detected @ Block Index = 0 and Line Index = 9
    0000145C 020FA02A    MVK.S2     0x1f40,B4
    The code is NOT compliant with Rule3

Explanation: The assembly language code in example 2 has four basic blocks; the corresponding flow graph is shown in figure 5. Phase 1 of the analysis detects the dereferencing of register 'B4' in block 3. The unsafe set is initialized to {B4} and phase 2 begins. Scanning backward in block 3, the analyzer finds that 'B4' got its value from the contents of location 'SP[0x3]', so the unsafe set is modified to {*+SP[0x3]}. Since there are no more lines to scan backward in block 3, the predecessors of block 3, namely blocks 1 and 2, are scanned next with the current unsafe set, {*+SP[0x3]}.

Scanning backward in block 1 with the unsafe set holding {*+SP[0x3]}, the analyzer detects that '*+SP[0x3]' came from 'B4'; the unsafe set becomes {B4}. Continued backward scanning reveals that 'B4' came from 'A4', so the unsafe set is modified to {A4}. Since there are no more lines to scan in this block, its predecessor, block 0 (a transitive predecessor of block 3), is to be scanned after merging the information from blocks 1 and 2.

The path through block 2 is checked next. Scanning backward in block 2 with the unsafe set holding {*+SP[0x3]} (the contents of the unsafe set when the analyzer was at the top of block 3), the analyzer detects that '*+SP[0x3]' came from 'B4', so the unsafe set is modified to {B4}. Continuing to scan backward in block 2, the analyzer finds that 'B4' got its value from the contents of location 'SP[0x2]', so the unsafe set is updated to hold {*+SP[0x2]}.

Fig. 5. Sample Control Flow Graph

Since the analyzer has reached the top of block 2, it now scans the predecessor of block 2, namely block 0, with the information collected from blocks 1 and 2. In this step, the unsafe sets from blocks 1 and 2 are merged to obtain a new SOUS, which holds {{A4}, {*+SP[0x2]}}. Analysis of block 0 concludes that 'A4' is a legitimate pointer, as it is derived from a call to "malloc"; {A4} is therefore deleted, and the SOUS becomes {{*+SP[0x2]}}. Analysis of block 0 (the transitive predecessor of block 3) then finds that '*+SP[0x2]' came from 'B4', so the SOUS becomes {{B4}}. The analyzer continues to scan backward and finds that 'B4' is assigned the constant value 0x1f40. Hence the analysis concludes that 'B4' (in block 0) is hard coded, and therefore the original dereferencing of register 'B4' in block 3 is also hard coded.

Note that at this point we can no longer tell which path caused the pointer to be hard coded, since the information from the unsafe sets was merged. Also note that because the integrity of the individual unsafe sets was maintained during merging, the analysis is still able to detect the hard coding. Since the dereferencing of the base register 'B4' in block 3 is hard coded, the analysis concludes that the entire program is unsafe and is not compliant with Rule 3 of the standard.

13.3 More Examples

Example 3:

    void main() {
        p = ...;
        q = 0;
        for(i=0;i

Example 4:

    void main() {
        p = hard coded;
        q = good pointer;
        r = q + p - q;
        *r;
    }

In example 4, though the dereferenced pointer r is derived as a function of a good pointer and a hard-coded pointer, in reality r is assigned the hard-coded pointer p. But since all pointer operations are abstracted, our static analyzer is not able to detect this hard coding of r.

14. PERFORMANCE RESULTS

                   t read  timer1  mcbsp1  figset  m hdrv   dat   gui  codec  codec stress  demo
    Num fns             1       2       1       2       6     5     8      7            16    16
    Num lines          80     126     196     292     345   950  1139   1188          1202  1350
    Num BB              7       2      15      30      22    72    48     49            28    78
    Num *ptr            3      17       0      19       6    10   102    109           105    82
    Num HC              0       6       0      10       2     8    40     28             0    47
    Max US size         0       1       0       2       1     3     1      1             1     2
    Avg US size         0       1       0       2       1     2     1      1             1     1
    Max SOUS size       0      13       0       8       1     5     9      4             4    13
    Avg SOUS size       0      13       0       4       1     2     3      3             4     5
    Max Chain len       0       1       0       2       1    12     1      1             1     9
    Avg Chain len       0       1       0       2       1     3     1      1             1     2
    RT (ms)          1280    1441    1270    1521    2262  2512  3063   3043          4505  4716

Table II. Performance Results for the analyzer

The performance figures for our analyzer are given in Table II. The first column lists the metrics used to gauge the performance of the analyzer; the subsequent columns show the performance for each of the selected DSP programs. The test programs were taken randomly from TI's distribution of CCS [39]. We took the a.out files of the randomly selected programs, disassembled them, and fed the assembly code to the analyzer. The analyzer produced a log file containing the results of the analysis, from which Table II was generated.

'Num fns' is the number of functions in the program, 'Num lines' is the number of lines in the input file subject to analysis, and 'Num BB' is the number of basic blocks in the input file. 'Num *ptr' is the number of pointers that were dereferenced in the file. 'Num HC' is the number of hard-coded occurrences detected by the analyzer. 'Max US size' and 'Avg US size' are the maximum and average numbers of elements, respectively, in the unsafe sets created by the analyzer. 'Max SOUS size' and 'Avg SOUS size' are the maximum and average numbers of elements in the set of unsafe sets, respectively. 'Max Chain len' is the maximum path length from the point at which a dereferencing occurs to the point at which the analyzer detects that the pointer is assigned a value. 'RT (ms)' is the running time of our analyzer in milliseconds.

From Table II we find that the average number of elements in an unsafe set at any point in time is small. One reason is that each modification of an unsafe set deletes one element and then adds the one or more elements related to the deleted element; this also means that the number of elements to which a pointer gets related through pointer arithmetic, assignment, and other pointer operations is always small. Note also that the number of elements in the SOUS was at most 13, which means that the analyzer was analyzing at most 13 pointers simultaneously. The maximum chain length was 12, and in most cases the chain length is 1 or 2; that is, most pointers were dereferenced soon after being hard coded. The time the analyzer takes to run depends on the number of hard-coded pointers, the number of basic blocks in the disassembled code, and the number of lines of code in the input file. The maximum time spent on analysis was less than 5 seconds, on code 1350 lines long.

These results were produced by a prototype implementation of the analyzer without any fine tuning. The main aim of building the current prototype was to demonstrate the efficacy and viability of static analysis based tools for performing these checks on commercial software when source code is not available. While the largest program we have tried was only 1350 lines long, the sizes of the unsafe sets remain relatively small, causing the fixed-point computation to converge rapidly. Thus, we are reasonably confident that our analyzer will scale to larger code sizes. Also, note that a component binary needs to be statically analyzed only once, before being integrated into an application.

Finally, note that for each pointer variable in each of the benchmark programs, our analysis was able to determine precisely whether it is hard coded or not hard coded (i.e., no pointer variable was inferred by the analysis to have the value ⊥ or ⊤). Thus, we believe that a static analysis based approach will catch most cases of hard-codedness/non-hard-codedness in practical programs. Essentially, one has to write very convoluted programs for the analysis to infer the "don't know" (⊤) abstract value for a pointer variable. Likewise, for the value undefined (⊥) to be inferred, the program must have undesirable features such as uninitialized pointer variables, unreachable code, etc. Most programs do not have such convoluted structures or undesirable features, especially if their creators have followed good software engineering and coding practices.

15. RELATED WORK

Static analysis has been recognized as an important technology for software quality assurance [27; 34]. However, the limited efforts described in the literature primarily analyze source code [14; 10; 11; 34; 23; 27]; none of them deals with code reusability. Those that analyze assembly code are interested only in security properties [5; 14] and not in reusability. Thus, to the best of our knowledge, there is no existing work that statically analyzes assembly code to check for software reusability.


There is a wealth of literature on static program analysis. These static analyses either analyze the data flow or control flow [3] of the program or employ an abstract interpretation based framework [13; 1]. However, much of this work has been done in a setting where source code is available; much less attention has been paid to analyzing machine code. Several researchers have looked at link-time optimization [12; 21; 29; 30; 35], where machine code has to be analyzed; however, as pointed out by Debray et al. [15], these are "limited to fairly simple local analysis."

There is also a wealth of literature on pointer analysis [20; 2; 26; 9; 24; 36; 15; 4; 16; 5], but only a limited number of these efforts [15; 4; 16; 5] consider analyzing machine code statically. Most of them concentrate on aliasing analysis of machine code, i.e., detecting whether two instructions will access the same memory location. Many of the issues that arise are similar to those in our analysis, since these analyses also have to keep track of use-def chains (i.e., given a use of a register, check where it was modified). Debray et al. [15] use a mod-k abstraction in which no distinction is made between two addresses that have the same lower k bits; an interprocedural, context-sensitive data flow analysis is performed to see whether two instructions access the same abstract address. Fernandez and Espasa [16] extend this analysis a little further by also analyzing whether a memory reference is to the heap, the stack, or global memory. Amme et al. [4] perform a similar analysis to aid parallelizing compilers: they symbolically abstract the values of registers, propagate the abstractions through the flow graph, and gather dependence information via data-flow analysis. Analysis of assembly code via data-flow analysis has also been used for other applications, including estimating memory use and execution time for interrupt-driven software [7] and verifying security properties [5].

Other related work includes efforts by software groups at Texas Instruments to develop tools for automatic compliance checking. A static analysis based approach was considered too costly for the benefits obtained, and a testing-based approach was adopted instead, in which a program is run in different scenarios (for example, in different parts of memory) and checked for erroneous output. Of course, this method is not complete either, nor can it pinpoint the problem (for example, the pointer that is hard coded cannot be automatically identified; all one can conclude is that the code has some compliance problem). Our results in this paper show that, contrary to this belief, a static analysis based technique is effective, practical, and useful.

16. CONCLUSIONS AND FUTURE WORK

In this paper we developed and analyzed necessary and sufficient conditions for binary code reusability. We showed that the absence of hard-coded memory addresses and the re-entrance of the code are sufficient conditions to ensure binary code reusability. However, automatically checking that these conditions hold for a binary is undecidable in general. We proposed static analysis as a technique for approximating this check, and illustrated the approach by developing an abstract interpretation based static analysis framework and, on top of it, a static analyzer for hard-coded pointers in embedded assembly code. Our results show that static analysis based approaches are viable in industrial settings for checking compliance with coding standards; such compliance checking is critical for code reuse and COTS compatibility in applications. A complete analyzer has been developed for pointer hard-codedness analysis and shown to run successfully on code samples taken from Texas Instruments' DSP code suite. The prototype system is currently being refined to provide more accurate results in the presence of global pointers and mutually recursive functions.

Future work includes extending the system to handle rules 1, 2, and 4 through 6 [40] laid out by TI. The analyses needed for rules 4 and 6 are very similar to hard-codedness analysis, and rules 2 and 5 require an analysis that determines whether a binary is re-entrant, which is itself similar to pointer hard-codedness analysis; we are therefore quite confident that the abstract interpretation based static analysis framework is sufficient. A binary is re-entrant if it does not contain any instruction that will modify a location in the code area (i.e., where the binary resides in main memory). Thus, once again, all memory locations can be divided into two sets: those that correspond to the code area and those that do not. Each instruction then has to be analyzed to ensure that, if it performs a memory-write operation, the target location is in the latter set. As with the hard-codedness analysis, we can abstract the two sets by the constants CA (for code area) and NCA (for non-code area); a lattice can then be built just as was done for the sets corresponding to HC and NHC, and an abstract interpretation based analysis performed in the manner described in section 6.
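A sketch of this planned re-entrance check is given below. It is illustrative only, with invented names, and assumes that the code-segment bounds are known (e.g., from the binary's headers).

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum { BOTTOM, CA, NCA, TOP } AbsLoc;  /* ⊥, code area,
                                                      non-code area, ⊤ */

    /* Join in the two-point lattice: equal values stay; conflicts go to TOP. */
    static AbsLoc join(AbsLoc a, AbsLoc b)
    {
        if (a == BOTTOM) return b;
        if (b == BOTTOM) return a;
        return (a == b) ? a : TOP;
    }

    /* Abstract a write target, given the code segment bounds. */
    static AbsLoc abstract_target(uint32_t addr, uint32_t code_lo, uint32_t code_hi)
    {
        return (addr >= code_lo && addr < code_hi) ? CA : NCA;
    }

    /* A memory write preserves re-entrance only if its abstract target is
       definitely outside the code area. */
    static bool write_is_reentrant(AbsLoc target)
    {
        return target == NCA;
    }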

17. ACKNOWLEDGEMENTS

We are grateful to Steve Blonstein of Texas Instruments for bringing the problem to our attention and for providing information during the project. The authors have been partially supported by grants from the National Science Foundation, the Environmental Protection Agency, and the U.S. Department of Education.

REFERENCES

(1) S. Abramsky and C. Hankin. Abstract Interpretation of Declarative Languages. Ellis Horwood, 1987.
(2) S. Adams, T. Ball, et al. Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis. SAS 2002, pp. 230-246.
(3) A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1988.
(4) W. Amme, P. Braun, E. Zehendner, and F. Thomasset. Data Dependence Analysis of Assembly Code. Proc. PACT 1998.
(5) J. Bergeron, M. Debbabi, M. M. Erhioui, and B. Ktari. Static Analysis of Binary Code to Isolate Malicious Behaviors. IEEE 8th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Palo Alto, California, 1999.
(6) J. Bergeron, M. Debbabi, J. Desharnais, M. M. Erhioui, Y. Lavoie, and N. Tawbi. Static Detection of Malicious Code in Executable Programs. Symposium on Requirements Engineering for Information Security (SREIS'01).
(7) D. Brylow, N. Damgaard, and J. Palsberg. Static Checking of Interrupt-driven Software. International Conference on Software Engineering, 2001.
(8) S. Blonstein (Texas Instruments). Personal Communication.
(9) D. R. Chase, M. Wegman, and F. K. Zadeck. Analysis of Pointers and Structures. PLDI '90, June 1990, pp. 296-310.
(10) H. Chen and J. S. Shapiro. Exploring Static Checking for Software Assurance. SRL Technical Report SRL-2003-06.
(11) B. V. Chess. Improving Computer Security Using Extended Static Checking. IEEE Symposium on Security and Privacy, 2002.
(12) R. Cohn, D. Goodwin, P. G. Lowney, and N. Rubin. Spike: An Optimizer for Alpha/NT Executables. Proc. USENIX Windows NT Workshop, Aug. 1997.
(13) P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. Fourth Annual ACM Symp. on Principles of Programming Languages, 1977, pp. 238-252.
(14) M. Christodorescu and S. Jha. Static Analysis of Executables to Detect Malicious Patterns. 12th USENIX Security Symposium, August 2003.
(15) S. Debray, R. Muth, and M. Weippert. Alias Analysis of Executable Code. POPL '98.
(16) M. Fernandez and R. Espasa. Speculative Alias Analysis for Executable Code. International Conference on Parallel Architectures and Compilation Techniques, 2002.
(17) W. Frakes and C. Terry. Software Reuse: Metrics and Models. ACM Computing Surveys 28(2), 1996.
(18) M. R. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman and Company, New York, 1979.
(19) B. Gates. The Future of Programming in a World of Web Services (keynote address). 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and Applications, Seattle, Washington, November 8, 2002.
(20) S. Z. Guyer and C. Lin. Client-Driven Pointer Analysis. Static Analysis Symposium 2003, Springer LNCS 2694, pp. 214-236.
(21) D. W. Goodwin. Interprocedural Dataflow Analysis in an Executable Optimizer. Proc. PLDI '97, pp. 122-133.
(22) N. Heintze and O. Tardieu. Demand-Driven Pointer Analysis. Conference on Programming Language Design and Implementation, 2001.
(23) G. J. Holzmann. Static Source Code Checking for User-defined Properties. Conference on Integrated Design and Process Technology, IDPT-2002.
(24) S. Horwitz, P. Pfeiffer, and T. Reps. Dependence Analysis for Pointer Variables. Proc. PLDI '89, pp. 28-40.
(25) W. Landi and B. G. Ryder. A Safe Approximate Algorithm for Interprocedural Pointer Aliasing. Proc. SIGPLAN PLDI '92, pp. 235-248.
(26) D. Liang and M. J. Harrold. Efficient Computation of Parameterized Pointer Information for Interprocedural Analyses. SAS 2001, Springer LNCS 2126, pp. 279-298.
(27) H. Lichter and G. Riedinger. Improving Software Quality by Static Program Analysis. Proc. SPI '97 Software Process Improvement, Barcelona, 1997.
(28) G. C. Necula, S. McPeak, and W. Weimer. CCured: Type-Safe Retrofitting of Legacy Code. POPL '02, Jan. 16-18, 2002, Portland, OR, USA.
(29) E. Ruf. Context-Insensitive Alias Analysis Reconsidered. Proc. SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pp. 13-22.
(30) A. Srivastava and D. W. Wall. Link-time Optimization of Address Calculation on a 64-bit Architecture. Proc. PLDI 1994, pp. 49-60.
(31) B. Steensgaard. Points-to Analysis in Almost Linear Time. Symposium on Principles of Programming Languages, 1996.
(32) R. Venkitaraman and G. Gupta. Static Program Analysis of Embedded Executable Assembly Code. Compilers, Architecture, and Synthesis for Embedded Systems (ACM CASES), September 2004, pp. 157-166.
(33) R. Venkitaraman and G. Gupta. Framework for Safe Reuse of Software Binaries. 1st International Conference on Distributed Computing and Internet Technology (ICDCIT), Springer LNCS, December 2004.
(34) D. A. Wagner. Static Analysis and Computer Security: New Techniques for Software Assurance. Ph.D. dissertation, University of California at Berkeley, Dec. 2000.
(35) W. E. Weihl. Interprocedural Data Flow Analysis in the Presence of Pointers, Procedure Variables, and Label Variables. Proc. ACM POPL, pp. 83-94.
(36) R. P. Wilson and M. S. Lam. Efficient Context Sensitive Pointer Analysis for C Programs. Proc. PLDI '95, pp. 1-12.
(37) Texas Instruments. Code Composer Studio and XDAIS/TMS320 Algorithmic Standards literature (http://support.ti.com).
(38) Texas Instruments. Code Composer Studio Getting Started Guide. Literature No. SPRU509C.
(39) Texas Instruments. TMS320C6000 Code Composer Studio Tutorial. Literature No. SPRU301C.
(40) Texas Instruments. TMS320 DSP Algorithm Standard Rules and Guidelines. Literature No. SPRU352D.
(41) Texas Instruments. TMS320C6000 CPU and Instruction Set Reference Guide. Literature No. SPRU189F.
