Precise Null Pointer Analysis Through Global Value Numbering

Ankush Das1 and Akash Lal2

1 Carnegie Mellon University, Pittsburgh, PA, USA 2 Microsoft Research, Bangalore, India

Abstract. Precise analysis of pointer information plays an important role in many static analysis tools. The precision, however, must be balanced against the scalability of the analysis. This paper focuses on improving the precision of standard context- and flow-insensitive algorithms at a low scalability cost. In particular, we present a semantics-preserving program transformation that drastically improves the precision of existing analyses when deciding if a pointer can alias Null. Our program transformation is based on Global Value Numbering, a scheme inspired by the compiler-optimization literature. It allows even a flow-insensitive analysis to make use of branch conditions such as checking if a pointer is Null and gain precision. We perform experiments on real-world code and show that the transformation improves precision (in terms of the number of dereferences proved safe) from 86.56% to 98.05%, while incurring a small overhead in the running time.

Keywords: Alias Analysis, Global Value Numbering, Static Single Assignment, Null Pointer Analysis

1 Introduction

Detecting and eliminating null-pointer exceptions is an important step towards developing reliable systems. Static analysis tools that look for null-pointer exceptions typically employ techniques based on alias analysis to detect possible aliasing between pointers. Two pointer-valued variables are said to alias if they hold the same memory location at runtime. Statically, aliasing can be decided in two ways: (a) may-alias [1], where two pointers are said to may-alias if they can point to the same memory location under some possible execution, and (b) must-alias [27], where two pointers are said to must-alias if they always point to the same memory location under all possible executions.

Because a precise alias analysis is undecidable [24], and even a flow-insensitive pointer analysis is NP-hard [14], much of the research in the area plays on the precision-efficiency trade-off of alias analysis. For example, practical algorithms for may-alias analysis lose precision (but retain soundness) by over-approximating: a verdict that two pointers may-alias does not imply that there is some execution in which they actually hold the same value. In contrast, a verdict that two pointers cannot may-alias must imply that there is no execution in which they hold the same value.

We use a sound may-alias analysis in an attempt to prove the safety of a program with respect to null-pointer exceptions. For each pointer dereference, we ask the analysis if the pointer can may-alias Null just before the dereference. If the answer is that it cannot may-alias Null, then the pointer cannot hold a Null value under any possible execution, hence the dereference is safe. The more precise the analysis, the more dereferences it can prove safe. This paper demonstrates a technique that improves the precision of may-alias analysis at little cost when answering aliasing queries of pointers with the Null value.
The Null value is special because programmers tend to be defensive against null-pointer exceptions. If there is doubt about whether a pointer, say x, can be Null or not, the programmer uses a check "if (x ≠ Null)" before dereferencing x. Existing alias analysis techniques, especially flow-insensitive techniques for may-alias analysis, ignore all branch conditions. As we demonstrate in this paper, exploiting these defensive checks can significantly increase the precision of alias analysis. Our technique is based around a semantics-preserving program transformation and requires only a minor change to the alias analysis algorithm itself.

Program transformations have been used previously to improve the precision of alias analysis. For instance, it is common to use a Static Single Assignment (SSA) conversion [5] before running flow-insensitive analyses. The use of SSA automatically adds some level of flow sensitivity to the analysis [12]. SSA, while useful, is still limited in the amount of precision that it adds; in particular, it does not help with using branch conditions. We present a program transformation based on Global Value Numbering (GVN) [16] that adds significantly more precision on top of SSA by leveraging branch conditions.

The transformation works by first inserting an assignment v := e on the then branch of a check if (e ≠ Null), where v is a fresh program variable. This gives us the global invariant that v can never hold the Null value. However, this invariant is of no use unless the program uses v. Our transformation then searches locally, in the same procedure, for program expressions e′ that are equivalent to e, that is, expressions that hold the same value as e at runtime. The transformation then replaces e′ with v. The search for equivalent expressions is done by adapting the GVN algorithm (originally designed for compiler optimizations [10]).
Our transformation can be followed by a standard alias analysis to infer the points-to set for each variable, with the slight change that the new variables introduced by our transformation (such as v above) cannot be Null. This change stops spurious propagation of Null and makes the analysis more precise. We perform extensive experiments on real-world code. The results show that the precision of the alias analysis (measured in terms of the number of pointer dereferences proved safe) goes from 86.56% to 98.05%. This work is used in Microsoft's Static Driver Verifier tool [22] for finding null-pointer bugs in device drivers³.

The rest of the paper is organized as follows: Section 2 provides background on flow-insensitive alias analysis and how SSA can improve its precision. Section 3 illustrates our program transformation via an example and Section 4 presents

³ https://msdn.microsoft.com/en-us/library/windows/hardware/mt779102(v=vs.85).aspx

var x : int

procedure main() {
  var a : int;
  var b : int;
L1:
  a := new();
  b := call f(a);
  goto L2;
L2:
  assert (x ≠ Null);
  return;
}

procedure f(var y : int) returns u : int {
  var z : int;
L1:
  x := y.f;
  assume (x ≠ Null);
  goto L2;
L2:
  z.g := y;
  u := x;
  return;
}

Fig. 1. An example program in our language

it formally. Section 5 presents experimental results, Section 6 describes some of the related work in the area and Section 7 concludes.

2 Background

2.1 A Simple Language

We introduce a simplistic language to demonstrate the alias analysis and how program transformations can be used to increase its precision. As is standard, we concern ourselves only with statements that manipulate pointers; all other statements are ignored (i.e., abstracted away) by the alias analysis. Our language has assignments of one of the following forms: pointer assignments x := y, dereferencing via field writes x.f := y and reads x := y.f, creating new memory locations x := new(), or assigning Null as x := Null. The language also has assume and assert statements:

– assume B checks the Boolean condition B and continues execution only if the condition evaluates to true. The assume statement is a convenient way of modeling branching in most existing source languages. For instance, a branch "if (B)" can be modeled using two basic blocks, one beginning with assume B and the other with assume ¬B.
– assert B checks the Boolean condition B and continues execution if it holds. If B does not hold, then it raises an assertion failure and stops program execution.

A program in our language begins with global variable declarations followed by one or more procedures. Each procedure starts with declarations of local variables, followed by a sequence of basic blocks. Each basic block starts with a label, followed by a list of statements, and ends with a control transfer, which is either a goto or a return. A goto in our language can take multiple labels; the choice of which label to jump to is non-deterministic. Finally, we disallow loops in the control-flow of a procedure; they can instead be encoded using procedures with recursion. This restriction simplifies the presentation of our algorithms. Figure 1 shows an illustrative example in our language.

Statement          Constraint
i : x := new()     aSi ∈ pt(x)
x := Null          aS0 ∈ pt(x)
x := y             pt(y) ⊆ pt(x)
x := y.f           aSi ∈ pt(y) implies pt(aSi.f) ⊆ pt(x)
x.f := y           aSi ∈ pt(x) implies pt(y) ⊆ pt(aSi.f)

Fig. 2. Program statements and corresponding points-to set constraints

Algorithm 1 Algorithm for computing points-to sets
 1: For each program variable x, let pt(x) = ∅
 2: repeat
 3:   opt := pt
 4:   for all program statements st do
 5:     if st is i : x := new() then
 6:       pt(x) := pt(x) ∪ {aSi}
 7:     if st is x := Null then
 8:       pt(x) := pt(x) ∪ {aS0}
 9:     if st is x := y then
10:       pt(x) := pt(x) ∪ pt(y)
11:     if st is x := y.f then
12:       for all aSi ∈ pt(y) do
13:         pt(x) := pt(x) ∪ pt(aSi.f)
14:     if st is x.f := y then
15:       for all aSi ∈ pt(x) do
16:         pt(aSi.f) := pt(aSi.f) ∪ pt(y)
17:   for all x such that tagged(x) do
18:     pt(x) := pt(x) − {aS0}
19: until opt = pt

2.2 Alias Analysis

This section describes Andersen's may-alias analysis [1]. The analysis is context- and flow-insensitive, which means that it completely abstracts away the control of the program. But the analysis is field-sensitive, which means that a value can be obtained by reading a field f only if it was previously written to the same field f. (Field-insensitive analyses, for example, abstract away the field name as well.)

The analysis outputs an over-approximation of the set of memory locations each pointer can hold under all possible executions. Since a program can potentially execute indefinitely (because of loops or recursion), the number of memory locations allocated by a program can be unbounded. We consider a finite abstraction of memory locations, commonly called the allocation-site abstraction [15]. Each memory location allocated by the same new statement is represented using the same abstract value. This abstract value is also called an allocation site. We label each new statement with a unique number i and refer to its corresponding allocation site as aSi. We use the special allocation site aS0 to denote Null.

We follow a description of Andersen's analysis in terms of set constraints [26], shown in Figure 2. The analysis outputs a points-to relation pt where pt(x) represents the points-to set of a variable x, i.e., (an over-approximation of) the set of allocation sites that x may hold under all possible executions. In addition, it also computes pt(aSi.f), for each allocation site aSi and field f, representing (an over-approximation of) the set of values written to the f field of an object in pt(aSi).

x := new();            x1 := new();
assert (x ≠ Null);     assert (x1 ≠ Null);
y := x.f;              y := x1.f;
x := Null;             x2 := Null;

Fig. 3. A program snippet before SSA (left) and after SSA (right)

The analysis abstracts away program control along with assert and assume statements. It considers a program as a bag of pointer-manipulating statements where each statement can be executed any number of times and in any order. Function calls are processed by adding assignments between formal and actual variables. For instance, in Figure 1, the call to f from main will result in the assignments y := a and b := u. For each statement, the analysis follows Figure 2 to generate a set of rules that define constraints on the points-to solution pt. The rules can be read as follows.

– If the program has an allocation x := new() and this statement is labeled with the unique integer i, then the solution must have aSi ∈ pt(x).
– If the program has the statement x := Null, then it must be that aS0 ∈ pt(x).
– If the program has an assignment x := y, then the solution must have pt(y) ⊆ pt(x), because x may hold any value that y can hold.
– If the program has a statement x := y.f and aSi ∈ pt(y), then it follows that pt(aSi.f) ⊆ pt(x), because x may hold any value written to the f field of aSi.
– If the program has a statement x.f := y and aSi ∈ pt(x), then it must be that pt(y) ⊆ pt(aSi.f).

These set constraints can be solved using a simple fix-point iteration, shown in Algorithm 1. (Our tool uses a more efficient implementation [26].) For now, ignore the loop on line 17. Once the solution is computed, we check all assertions in the program. We say that an assertion assert (x ≠ Null) is safe (i.e., the assertion cannot be violated) if aS0 ∉ pt(x). We do not consider other kinds of assertions in the program because our goal is only to show null-exception safety. The complexity of Andersen's analysis is cubic in the size of the program.
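The constraint rules above translate into the fix-point loop of Algorithm 1 almost directly. Below is a minimal Python sketch under an invented tuple encoding of statements (this is illustrative only, not the paper's Boogie-based implementation); site 0 plays the role of aS0, and the tagged parameter models the #-tagged variables introduced later in the paper.

```python
from collections import defaultdict

def andersen(stmts, tagged=()):
    """Fix-point solver in the spirit of Algorithm 1.
    stmts is a list of tuples (hypothetical encoding):
      ('new',  x, i)      for  i : x := new()
      ('null', x)         for  x := Null
      ('copy', x, y)      for  x := y
      ('load', x, y, f)   for  x := y.f
      ('store', x, f, y)  for  x.f := y
    Returns pt, mapping a variable or (site, field) to a set of sites."""
    pt = defaultdict(set)
    snap = lambda: {k: frozenset(v) for k, v in pt.items() if v}
    while True:
        old = snap()
        for s in stmts:
            if s[0] == 'new':
                pt[s[1]].add(s[2])                 # aSi ∈ pt(x)
            elif s[0] == 'null':
                pt[s[1]].add(0)                    # aS0 ∈ pt(x)
            elif s[0] == 'copy':
                pt[s[1]] |= pt[s[2]]               # pt(y) ⊆ pt(x)
            elif s[0] == 'load':
                _, x, y, f = s
                for site in set(pt[y]):
                    pt[x] |= pt[(site, f)]         # pt(aSi.f) ⊆ pt(x)
            elif s[0] == 'store':
                _, x, f, y = s
                for site in set(pt[x]):
                    pt[(site, f)] |= pt[y]         # pt(y) ⊆ pt(aSi.f)
        for x in tagged:
            pt[x].discard(0)                       # line 17: tagged vars are never Null
        if snap() == old:
            return pt
```

An assertion assert (x ≠ Null) is then declared safe exactly when 0 ∉ pt(x). Note that the naive re-iteration over all statements mirrors the pseudocode; a production implementation would use a worklist instead.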

2.3 Static Single Assignment (SSA)

This section shows how a program transformation can improve the precision of an alias analysis. Consider the program on the left in Figure 3. A flow-insensitive analysis does not look at the order of statements. Under this abstraction, the analysis cannot prove the safety of the assertion in this snippet of code because it does not know that the assignment of Null to x only happens after the assertion.

To avoid such loss in precision, most practical implementations of alias analysis use the Static Single Assignment (SSA) form [5]. Roughly, SSA introduces multiple copies of each original variable such that each variable in the new program has only a single assignment. The SSA form of the snippet is shown on the right in Figure 3. Clearly, this program has the same semantics as the original program. But a flow-insensitive analysis will now be able to show the safety of the assertion in the program because the assignment of Null is to x2 whereas the assertion is on x1.

assume (x ≠ Null);     assume (x ≠ Null);
y := x;                cseTmp# := x;
assert (x ≠ Null);     y := cseTmp#;
z := x.f;              assert (cseTmp# ≠ Null);
                       z := cseTmp#.f;

Fig. 4. A program snippet before CSE (left) and after CSE (right)
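The precision gain from SSA can be seen even in a toy flow-insensitive model. The sketch below treats Figure 3 as an unordered bag of facts (a hypothetical encoding, with the field read simplified away to keep the sketch minimal): before SSA, both the allocation site and aS0 flow into the same x, while after renaming only x2 receives aS0.

```python
def flow_insensitive(facts):
    """facts: list of (variable, set-of-abstract-values) pairs.
    Order is irrelevant: every fact contributes to the final
    points-to set, modeling the 'bag of statements' abstraction."""
    pt = {}
    for var, vals in facts:
        pt.setdefault(var, set()).update(vals)
    return pt

# Before SSA: the allocation site and Null both reach x.
pre = flow_insensitive([('x', {'aS1'}), ('x', {'aS0'})])
# After SSA: the Null assignment targets x2, so x1 stays non-null
# and the assertion on x1 can be proved safe.
post = flow_insensitive([('x1', {'aS1'}), ('x2', {'aS0'})])
```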

3 Overview

This section presents an overview of our technique of using program transformations that add even more precision to the alias analysis than standard SSA. We start by using Common Subexpression Elimination [4] and build towards using Global Value Numbering [16], which is used in our implementation and experiments.

3.1 Common Subexpression Elimination

We demonstrate how we can leverage assume and assert statements to add precision to the analysis. Consider the program on the left in Figure 4. Once the program control passes the assume statement, we know that x cannot point to Null, hence the assertion is safe, irrespective of what preceded this code snippet. Note, also, that SSA renaming does not help prove the assertion in this case (it is essentially a no-op for this program).

We now make the case for a different program transformation. As a first step, we introduce a new local variable cseTmp# to the procedure and assign it the value of x right after the assume. The new variables that we introduce to the program carry the tag "#" to distinguish them from other program variables. For a tagged variable w#, we say that tagged(w#) is true. Tagged variables carry the special invariant that they cannot be Null: their only assignment is placed after an assume statement that prunes away the Null value. (The same reasoning applies to assert statements too, i.e., once control passes a statement assert(x ≠ Null), x cannot point to Null.)

After introducing the variable cseTmp#, we make use of a technique similar to Common Subexpression Elimination (CSE) to replace all expressions that are equivalent to cseTmp# with the variable itself, resulting in the program on the right in Figure 4. This snippet is clearly equivalent to the original one. We perform the alias analysis on this snippet as usual, but enforce that pt(cseTmp#) cannot contain aS0 because cseTmp# cannot be Null. (See the loop on line 17 of Algorithm 1.) The analysis can now prove that the assertion is safe.

The process of finding equivalent expressions is not trivial. For instance, consider the following program where we have introduced the variable cseTmp#.

assume (x.f ≠ Null);
cseTmp# := x.f;
y.f := z;
z := x.f;

In the last assignment, x.f cannot be substituted by cseTmp#, because there is an assignment to the field f in the previous statement. As there is no aliasing information present at this point, we have to conservatively assume that y and x could be aliases; thus, the assignment y.f := z can potentially change the value of x.f, breaking its equivalence to cseTmp#.
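This conservative invalidation can be sketched as a forward scan that stops at the first write to the relevant field. The statement encoding below is invented for illustration; it is deliberately field-based (any write to f kills x.f), mirroring the fact that no aliasing information is available during the transformation.

```python
def substitutable(stmts, base, fld):
    """Return indices of reads of base.fld that may be replaced by the
    temporary bound to base.fld, scanning forward from the temporary's
    definition. Hypothetical encoding:
      ('read',  dst, base, fld)  for  dst := base.fld
      ('write', obj, fld, src)   for  obj.fld := src
    The scan aborts at the first write to fld, since an aliasing store
    obj.fld := src may change base.fld."""
    ok = []
    for i, (kind, *args) in enumerate(stmts):
        if kind == 'write' and args[1] == fld:
            break                      # y.f := z may change x.f: stop here
        if kind == 'read' and (args[1], args[2]) == (base, fld):
            ok.append(i)
    return ok

# The snippet above, after "cseTmp# := x.f":
prog = [('write', 'y', 'f', 'z'),      # y.f := z  (kills the equivalence)
        ('read', 'z', 'x', 'f')]       # z := x.f  (must NOT be replaced)
```

On this snippet, substitutable(prog, 'x', 'f') is empty; with the two statements swapped, the read would be eligible.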

3.2 Global Value Numbering

We improve upon the previous transformation by using a more precise method of determining expression equalities. The methodology remains the same: we introduce temporary variables that cannot be Null and use them to replace syntactically equivalent expressions. But this time we adapt the Global Value Numbering (GVN) scheme to detect equivalent expressions. Consider the following program. (For now, ignore the right-hand side of the figure after the "=⇒".)

1 y := x.f.g;          =⇒ t1 ← x, t2 ← t1.f, t3 ← t2.g, y ↪ t3
2 z := y.h;            =⇒ t3 ← y, t4 ← t3.h, z ↪ t4
3 assume (z ≠ Null);   =⇒ add t4 to nonNullExprs
4 a := x.f;            =⇒ t1 ← x, t2 ← t1.f, a ↪ t2
5 b := a.g.h;          =⇒ t2 ← a, t3 ← t2.g, t4 ← t3.h, b ↪ t4
6 assert (b ≠ Null);   =⇒ check t4 ∈ nonNullExprs
7 c.g := d;

It is clear that z and b are equivalent at the assertion location, and because z ≠ Null, the assertion is safe. However, none of the previous methods would allow us to prove the safety of this assertion. We adapt the GVN scheme to help us establish the equality between z and b. We introduce the concept of terms, which are used as placeholders for subexpressions. The intuitive idea is that equivalent subexpressions are represented by the same term.

We start by giving an overview of the transformation for a single basic block, and generalize it to a full procedure later in this section. For a single basic block, we walk through the statements in order, and as we encounter a new variable, we assign it a new term and remember this mapping in a dictionary called hashValue. We also store the mapping from terms to other terms through operators in a separate dictionary called hashFunction. For example, if x is assigned term t1 and we encounter the assignment y := x.f, we store hashFunction[f][t1] = t2 and assign the term t2 to y. We also maintain a separate list nonNullExprs of terms that are not null. Finally, for performing the actual substitution, we maintain a dictionary defaultVar that maps terms to the temporary variables that we introduced for non-null expressions.

We go through the program snippet starting at the first statement and move down to the last statement. At statement i, we follow the description written in the ith item below. This description is also shown on the right side of the program snippet, after the =⇒ arrow.

1. Assign a new term t1 to x, and set hashValue[x] = t1. Then, set hashFunction[f][t1] = t2, and hashFunction[g][t2] = t3. Finally, the assignment to y sets hashValue[y] = t3.
2. We already have hashValue[y] = t3, so assign hashFunction[h][t3] = t4. The assignment to z sets hashValue[z] = t4.
3. We have hashValue[z] = t4, so we add t4 to nonNullExprs. We create a new temporary variable gvnTmp#, construct an extra assignment gvnTmp# := z, and add it after the assume statement. Because hashValue[z] = t4, we also add defaultVar[t4] = gvnTmp#, which we will use later for substitutions of all expressions that hash to t4.
4. We already have hashValue[x] = t1 and hashFunction[f][t1] = t2, so we set hashValue[a] = t2.
5. Since hashValue[a] = t2, hashFunction[g][t2] = t3 and hashFunction[h][t3] = t4, the hash value of the expression a.g.h is t4. We also have defaultVar[t4] = gvnTmp#. At this point, we observe that t4 is in nonNullExprs and substitute the RHS a.g.h with gvnTmp#. Finally, we set hashValue[b] = t4.
6. Because hashValue[b] = t4, defaultVar[t4] = gvnTmp#, and nonNullExprs contains t4, we replace the expression b with gvnTmp#.

The resulting code is shown below.

1 y := x.f.g;
2 z := y.h;
3 assume (z ≠ Null);
4 gvnTmp# := z;
5 a := x.f;
6 b := gvnTmp#;
7 assert (gvnTmp# ≠ Null);
8 c.g := d;
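The term computation in steps 1-6 can be sketched with two dictionaries mirroring hashValue and hashFunction; the intern helper and the access-path encoding are our own simplifications for this sketch, not the paper's implementation.

```python
import itertools

fresh = itertools.count(1)        # generates t1, t2, ... as integers
hashValue = {}                    # variable -> term
hashFunction = {}                 # field -> {term -> term}
nonNullExprs = set()              # terms known to be non-null

def intern(d, k):
    # assign a fresh term on first sight, reuse it afterwards
    if k not in d:
        d[k] = next(fresh)
    return d[k]

def term_of_path(base, fields):
    """Term for an access path base.f1.f2..., e.g. x.f.g."""
    t = intern(hashValue, base)
    for f in fields:
        t = intern(hashFunction.setdefault(f, {}), t)
    return t

# Lines 1-5 of the example above:
hashValue['y'] = term_of_path('x', ['f', 'g'])   # y := x.f.g
hashValue['z'] = term_of_path('y', ['h'])        # z := y.h
nonNullExprs.add(hashValue['z'])                 # assume (z != Null)
hashValue['a'] = term_of_path('x', ['f'])        # a := x.f
hashValue['b'] = term_of_path('a', ['g', 'h'])   # b := a.g.h

# Line 8: c.g := d invalidates field g, so future x.f.g gets a new term.
hashFunction.pop('g', None)
```

After the first five statements, z and b share a term, so the assertion on b can be discharged via the non-null term; after the store to field g, re-hashing x.f.g no longer matches y's term.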

Clearly, we retain the invariant that #-tagged variables cannot be Null, and it is now straightforward to prove the safety of the assertion. We also note that the expression substitution is performed in a conservative manner: it is aborted as soon as a subexpression is assigned to. For example, at line 8, we encounter an assignment to the field g, so we remove g from the dictionary hashFunction. This has the effect of g acting as a new field, and all expressions referencing this field will now be assigned new terms.

The above transformation, in general, is performed on the entire procedure, not just a single basic block, in order to fully exploit its potential. This occurs in two steps. First, loops are lifted and converted to procedures (with recursion), so that the control-flow of each resulting procedure is acyclic. Next, we perform a topological sort of the basic blocks of a procedure and analyze the blocks in this order. This ensures that by the time the algorithm visits a basic block, it has already processed all predecessors of the block. When analyzing a block, the algorithm considers all its predecessors and takes the intersection of their nonNullExprs lists and hashValue maps. This is because an expression is non-null only if it is non-null in all its predecessors and, further, we can use a term for a variable only if it is associated with the same term in all its predecessors.

Finally, an important aspect of the algorithm is to perform a sound substitution at the merge point of basic blocks. Consider the code snippet on the left in Figure 5.

L1:                        L1:
  assume (x ≠ Null);         assume (x ≠ Null);
  gvnTmp1# := x;             gvnTmp1# := x;
  goto L3;                   goto L3;
L2:                        L2:
  assume (x ≠ Null);         assume (x ≠ Null);
  gvnTmp2# := x;             gvnTmp2# := x;
  goto L3;                   goto L3;
L3:                        L3:
  assert (x ≠ Null);         gvnTmp3# := x;
                             assert (gvnTmp3# ≠ Null);

Fig. 5. A program snippet before GVN (left) and after GVN (right)
In this example, although x is available as a non-null expression in L3, we cannot substitute x in the assertion by either gvnTmp1# or gvnTmp2#, because neither preserves program semantics. Instead, we introduce a new variable gvnTmp3#, add the assignment gvnTmp3# := x right before the assertion in L3, and use that for substituting x. This is achieved by the map var2expr in the main algorithm. It maps a #-tagged variable to the expression that it substitutes. In the above program, suppose we assign the term t to the non-null expression x. Hence, nonNullExprs[L1] and nonNullExprs[L2] both contain t. We also have defaultVar[L1][t] = gvnTmp1# and var2expr[gvnTmp1#] = x. Since t is available from all predecessors of L3, we know that this term is non-null in L3. The question is finding the expression corresponding to this term t and introducing a new assignment for it. At this point, the map var2expr comes into play. We pick a predecessor of L3, say L1. We look for the default variable of t and find defaultVar[L1][t] = gvnTmp1#; we then look up var2expr[gvnTmp1#] = x. At this point, we find that the expression corresponding to term t is x, and we introduce a new assignment gvnTmp3# := x at the start of L3 and use this for the substitution of x in L3. The next section describes the algorithm formally.

4 Algorithm

We present the pseudocode of our program transformation in this section, as Algorithms 2 and 3. The transformation takes a program as input and produces a semantically-equivalent program with new #-tagged variables that can never point to Null. This involves adding assignments for these new variables, and substituting existing expressions with these variables whenever we determine that the substitution will preserve semantics. A proof of correctness of our transformation can be found in our full version [7].
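As a concrete illustration of the merge-point handling from Section 3.2, the sketch below intersects the predecessors' non-null terms and emits one fresh assignment per surviving term. The dictionary shapes follow the paper's data structures; the concrete names such as gvnTmpL3_1 are invented for this sketch.

```python
# State for the Figure 5 shape: term 't' stands for the expression x,
# already known non-null in both predecessors of L3.
nonNullExprs = {'L1': {'t'}, 'L2': {'t'}}
defaultVar   = {'L1': {'t': 'gvnTmp1'}, 'L2': {'t': 'gvnTmp2'}}
var2expr     = {'gvnTmp1': 'x', 'gvnTmp2': 'x'}

def merge(block, preds):
    """Intersect predecessors' non-null terms; for each surviving term,
    recover its expression via var2expr and emit a fresh assignment so
    the block gets its own default variable (sound at the merge point)."""
    common = set.intersection(*(nonNullExprs[p] for p in preds))
    nonNullExprs[block] = common
    defaultVar[block] = {}
    stmts = []
    for n, t in enumerate(sorted(common), 1):
        expr = var2expr[defaultVar[preds[0]][t]]   # any predecessor works
        v = f'gvnTmp{block}_{n}'                   # fresh #-tagged variable
        var2expr[v] = expr
        defaultVar[block][t] = v
        stmts.append(f'{v} := {expr}')
    return stmts

stmts = merge('L3', ['L1', 'L2'])
```

The emitted assignment plays the role of gvnTmp3# := x in Figure 5: the assertion in L3 can then be rewritten to use the block's own default variable rather than either predecessor's.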

Algorithm 2 Algorithm to perform GVN
 1: nonNullExprs = {}    ▷ block → non-null terms in block
 2: var2expr = {}        ▷ #-tagged variable → expression
 3: defaultVar = {}      ▷ block, term → variable for substitution
 4: hashValue = {}       ▷ block, variable → term
 5: hashFunction = {}    ▷ operator, terms → term
 6: currBlock            ▷ current block
 7: function DoGVN
 8:   for proc in program do
 9:     for block in proc.Blocks do
10:       for stmt in block.Stmts do
11:         if stmt is "assert expr ≠ Null" or "assume expr ≠ Null" then
12:           gvnTmp# ← GetNewTaggedVar()
13:           s ← "gvnTmp# := expr"
14:           block.Stmts.Add(s)
15:           var2expr[block][gvnTmp#] ← expr
16:   for proc in program do
17:     sortedBlocks ← TopologicalSort(proc.Blocks)
18:     for block in sortedBlocks do
19:       nonNullExprs[block] ← ⋂_{blk ∈ block.Preds} nonNullExprs[blk]
20:       hashValue[block] ← ⋂_{blk ∈ block.Preds} hashValue[blk]
21:       currBlock ← block
22:       for term in nonNullExprs[block] do
23:         expr ← var2expr[defaultVar[blk][term]]   ▷ for some blk ∈ block.Preds
24:         gvnTmp# ← GetNewTaggedVar()
25:         var2expr[gvnTmp#] ← expr
26:         s ← "gvnTmp# := expr"
27:         block.Stmts.AddFront(s)
28:       for stmt in block.Stmts do
29:         stmt ← ProcessStmt(stmt)
30:         if stmt is "gvnTmp# := expr" then
31:           term ← ComputeHash(expr)
32:           nonNullExprs[block].Add(term)
33:           defaultVar[block][term] ← gvnTmp#

At a high level, the idea is to use assume and assert statements to identify non-null expressions. We introduce fresh #-tagged variables and assign these non-null expressions to them. Then, in a second pass, we compute a term corresponding to each expression. Terms are assigned in such a manner that if two expressions have the same term, then they are equivalent to each other. If we encounter an expression e with the same term as one of the non-null expressions e′, we substitute e with the #-tagged variable corresponding to e′. We start by describing the role of each data structure used in Algorithm 2.

Algorithm 3 Helper Functions for DoGVN
 1: function ProcessStmt(stmt)
 2:   if stmt is "assume expr" or "assert expr" then
 3:     expr ← GetExpr(expr)
 4:     return stmt
 5:   else if stmt is "v := expr" then
 6:     hashValue[currBlock][v] ← ComputeHash(expr)
 7:     expr ← GetExpr(expr)
 8:     return stmt
 9:   else if stmt is "v.f := expr" then
10:     expr ← GetExpr(expr)
11:     v ← GetExpr(v)
12:     hashFunction.Remove(f)
13:     return stmt
14: function GetExpr(expr)
15:   if expr is v then
16:     term ← ComputeHash(v)
17:     if nonNullExprs[currBlock] contains term then
18:       return defaultVar[currBlock][term]
19:     return v
20:   if expr is v.f then
21:     v ← GetExpr(v)
22:     return v.f
23: function ComputeHash(expr)
24:   if expr is v then
25:     if hashValue[currBlock] does not contain v then
26:       hashValue[currBlock][v] ← GetNewTerm()
27:     return hashValue[currBlock][v]
28:   else if expr is v.f then
29:     term ← ComputeHash(v)
30:     if hashFunction[f] does not contain term then
31:       hashFunction[f][term] ← GetNewTerm()
32:     return hashFunction[f][term]

– nonNullExprs stores the terms corresponding to the non-null expressions of a particular block.
– var2expr maps a #-tagged variable to the expression it is assigned to in each block. This is used to perform sound substitution at merge points of basic blocks, as discussed in the last example of Section 3.2.
– defaultVar maps the term of an expression to the #-tagged variable that will be used for its substitution. Whenever we compute the term for an expression, if the term is present in nonNullExprs, we use defaultVar to find the #-tagged variable to be used for the substitution.
– hashValue maps variables to the terms assigned to them in a particular block.
– hashFunction stores the mapping from a field and a term to a new term. It is used to store the terms of expressions with fields.
– currBlock keeps track of the current block (used in the helper functions).

We explain the algorithm step by step.

1. Lines 8-15 – In this first pass of the algorithm, we search for program statements of the form "assert expr ≠ Null" or "assume expr ≠ Null". Such a statement guarantees that expr cannot be Null after this program location under all executions. Hence, we introduce a new variable gvnTmp# and assign expr to it. This mapping is also added to var2expr.
2. Line 17 – Before the second pass, we perform a topological sort of the basic blocks according to the control-flow graph. This is necessary because the nonNullExprs information for the predecessors of a basic block is needed before analyzing the block itself. Note that the control-flow graphs of procedures in our language must be acyclic (we convert loops to recursion), thus a topological sort always succeeds.
3. Lines 18-27 – We compute the set of expressions that are non-null in all predecessors. These expressions will also be non-null in the current block. We also need the term for each variable in the current block, which again comes from the intersection over all predecessors. Finally, for each non-null expression, we add an assignment, since these expressions may be available from different variables in different predecessors, as discussed in Section 3.2.
4. Lines 28-33 – Finally, we process each statement in the current block.
This performs the substitution for each expression in the statement (the GetExpr function in Algorithm 3). GetExpr computes the term for the expression (the ComputeHash function in Algorithm 3), and if the term is contained in nonNullExprs, the substitution is performed. Finally, if we encounter a store statement "v.f := expr", we remove all mappings with respect to f from hashFunction. Thus, for future statements (and future blocks in the topological order), new terms will be assigned to expressions involving field f.
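For straight-line code, steps 1-4 can be condensed into a small sketch of ProcessStmt that combines hashing, substitution, and store invalidation. The tuple encoding and helper names are ours; the real algorithm additionally maintains per-block state and handles merges as in Algorithm 2.

```python
import itertools

fresh = itertools.count(1)
hashValue, hashFunction = {}, {}          # variable -> term; field -> {term -> term}
nonNull, defaultVar, out = set(), {}, []  # non-null terms; term -> #-tagged var; output

def intern(d, k):
    # assign a fresh term on first sight, reuse it afterwards
    if k not in d:
        d[k] = next(fresh)
    return d[k]

def hash_path(base, flds):
    # term for an access path base.f1.f2...
    t = intern(hashValue, base)
    for f in flds:
        t = intern(hashFunction.setdefault(f, {}), t)
    return t

def process(stmts):
    for s in stmts:
        if s[0] == 'assume_nonnull':             # assume (v != Null)
            t = intern(hashValue, s[1])
            v = 'gvnTmp%d' % (len(defaultVar) + 1)
            nonNull.add(t)
            defaultVar[t] = v
            out.append(s)
            out.append(('copy', v, s[1]))        # insert gvnTmp# := v
        elif s[0] == 'assign':                   # dst := base.f1.f2...
            _, dst, base, flds = s
            t = hash_path(base, flds)
            # substitute the RHS when its term is known non-null
            out.append(('copy', dst, defaultVar[t]) if t in nonNull else s)
            hashValue[dst] = t
        elif s[0] == 'store':                    # base.f := src
            hashFunction.pop(s[2], None)         # field f behaves as new
            out.append(s)
    return out
```

Running it over the Section 3.2 example (y := x.f.g; z := y.h; assume (z ≠ Null); a := x.f; b := a.g.h) rewrites the final assignment into a copy from the inserted #-tagged temporary, just as lines 4 and 6 of the transformed snippet show.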

Following Algorithm 2, we generate a semantically equivalent program that, as we show in our experiments, improves the precision of the alias analysis. The main reason behind this improvement is that the #-tagged variables can never contain aS0 in their points-to sets, hence Null cannot flow through these variables in the analysis.

5 Experimental Evaluation

We have implemented the algorithms presented in this paper for the Boogie language [19]. Boogie is an intermediate verification language. Several front-ends are available that compile source languages such as C/C++ [17,23] and C# [2] to Boogie, making it a useful target for developing practical tools. (For C/C++, we make the standard assumption that pointer arithmetic does not change the allocation site of the pointer, and thus can be ignored by the alias analysis [29]; due to space constraints we do not describe these details in this paper.)

Our work fits into a broader verification effort. The Angelic Verification (AV) project⁴ at Microsoft Research aims to design push-button technology for finding software defects. In an earlier effort, AV was targeted at finding null-pointer bugs [6].

⁴ https://www.microsoft.com/en-us/research/project/angelic-verification/

                              SSA only          SSA with GVN
Bench    KLOC   Asserts   Time(s)  Asserts   Time(s)     GVN  Asserts
Mod 1     3.2      1741      9.08       61     11.37    0.88       17
Mod 2     8.4      4035     11.34      233     17.62    1.13       45
Mod 3     6.5      4375     10.26      617     19.43    2.15       52
Mod 4    20.9      7523     24.04      543     33.99    2.43      123
Mod 5    30.9     11184     35.02     1881     59.84    7.11      232
Mod 6    37.8     12128     35.94     2675     70.71   11.13      452
Mod 7    37.2      6840     36.88     1396     53.24    3.44      127
Mod 8    43.8     12209     28.91     2854     62.27    5.38      475
Mod 9    56.6     19030     60.05     5444    106.61   12.40      508
Mod 10   76.5     39955    171.43     2887    839.58  475.08      372
Mod 11   23.5      6966     49.17      875     69.10   10.14      103
Mod 12   14.9      8359     24.57      820     59.13   13.41      210
Mod 13   22.1     11471     38.27      869     87.07   24.03      248
Mod 14   36.2     18026     48.56     2501    149.60   41.93      478
Mod 15   19.4     20555     55.07      586    269.35  134.06      131
Mod 16   54.0     16957     62.86     2821    127.67   30.46      342
Total   491.9    201354    701.45    27063   2036.58  775.16     3915

Table 1. Results showing the effect of the SSA and GVN program transformations on the ability of alias analysis to prove the safety of non-null assertions.

Programs from the Windows codebase, written in C/C++, were compiled down to Boogie with assertions guarding every pointer access to check for null dereferences. These Boogie programs were fed to a verification pipeline that applied heavyweight SMT-solver technology to reason over all possible program behaviors. To optimize the verification time, an alias analysis is run at the Boogie level to remove assertions that can be proved safe by the analysis. As our results will show, this optimization is necessary. The alias analysis is based on Andersen's analysis, as described in Figure 2. We follow the algorithm given in Sridharan et al.'s report [26] with the extra constraint that #-tagged variables cannot alias with Null, i.e., they cannot contain the allocation site aS0. We can optionally perform the program transformation of Section 4 before running the alias analysis. Our implementation is available open-source⁵.
We evaluate the effect of our program transformation on the precision of the alias analysis for checking the safety of null-pointer assertions. The benchmarks are described in the first three columns of Table 1. We picked 16 modules from the Windows codebase. The table lists an anonymized name for each module (Bench), the lines of code in thousands (KLOC), and the number of assertions (one per pointer dereference) in the code (Asserts). The first ten modules are the same as those used in the earlier study with AV [6]; the rest were added later.

We ran our tool using either SSA alone or SSA followed by our GVN transformation, in each case followed by the alias analysis. We list the total time taken by the tool (Time(s)), including the time to run the transformation, and the number of assertions that were not proved safe (Asserts). In the case of GVN, we also isolate and list the time taken by the GVN transformation itself (GVN). The experiments were run sequentially, single-threaded, on a server-class machine with an Intel(R) Xeon(R) processor (single core) running at 2.4 GHz with 32 GB RAM.

5 At https://github.com/boogie-org/corral, project AddOns\AliasAnalysis

It is clear from the table that GVN offers a significant increase in precision. With SSA alone, the analysis proved the safety of 86.56% of the assertions, while with the GVN transformation we can prune away 98.05% of the assertions. This is approximately a 7X reduction in the number of assertions that remain. This level of pruning is surprising because the alias analysis is still context and flow insensitive. Our program transformation crucially exploits the fact that programmers tend to be defensive against null-pointer bugs, allowing the analysis to get away with a very coarse abstraction. In fact, this level of pruning means that investing in more sophisticated alias analyses (e.g., flow-sensitive ones) would have sharply diminishing returns. The alias analysis itself scales quite well: it finishes on about half a million lines of code in approximately 700 seconds with SSA alone (86.56% pruning) or about 2000 seconds with GVN (98.05% pruning).

We note that the running time increases when using GVN. This happens because the transformation introduces more variables than SSA alone does. However, this increase is more than offset by the improvement presented to the AV toolchain. For example, with the GVN transformation, AV takes 11 hours to finish the first 10 modules [6], whereas with the SSA transformation alone it does not finish even in 24 hours. Furthermore, AV reports fewer bugs when using just SSA, because the extra computational load translates into a loss in program coverage as timeouts are hit more frequently.
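The headline percentages follow directly from the totals row of Table 1; as a quick back-of-the-envelope check (in Python, not part of the toolchain):

```python
# Reproducing the headline numbers from the Total row of Table 1:
# 201,354 assertions overall, 27,063 left unproved with SSA alone,
# and 3,915 left unproved with SSA followed by GVN.
total_asserts = 201354
left_after_ssa = 27063
left_after_gvn = 3915

pruned_ssa = 100 * (1 - left_after_ssa / total_asserts)  # ~86.56%
pruned_gvn = 100 * (1 - left_after_gvn / total_asserts)  # ~98.06%
reduction = left_after_ssa / left_after_gvn              # ~6.9, i.e. "7X"

print(f"SSA only:  {pruned_ssa:.2f}% proved safe")
print(f"SSA + GVN: {pruned_gvn:.2f}% proved safe")
print(f"remaining assertions reduced {reduction:.1f}x")
```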

6 Related Work

Pointer analysis is a well-researched branch of static analysis. Several techniques have been proposed that trade off context, flow, and field sensitivity. Our choice of a context-insensitive, flow-insensitive, but field-sensitive analysis picks a scalable starting point, to which we then add precision at low cost. The distinguishing factors of our work are: (1) the ability to leverage information from assume and assert statements (or branch conditions), and (2) specialization for the purpose of checking non-null assertions (as opposed to general aliasing assertions). In the rest of this section, we very briefly survey previous work on adding precision to alias analysis or making it more scalable.

Context sensitivity. Sharir and Pnueli [25] introduced the concept of call strings to add context sensitivity to static analysis techniques. Call strings may grow extremely long and limit efficiency, so Lhoták and Hendren [21] used k-limiting approaches to bound the size of call strings. Whaley and Lam [28] instead use Binary Decision Diagrams (BDDs) to scale a context-sensitive analysis.

Flow sensitivity. Hardekopf and Lin [11] present a staged flow-sensitive analysis in which a less precise auxiliary pointer analysis computes def-use chains that enable the sparsity of the primary flow-sensitive analysis. The technique is quite scalable on large benchmarks, but it abstracts away assume statements. De and D'Souza [8] compute a map from access paths to sets of abstract objects at each program statement. This enables them to perform strong updates at indirect assignments. The technique has been shown to scale only to small benchmarks; moreover, it also abstracts away all assume statements. Finally, Lerch et al. [20] introduce the access-path abstraction, where access paths rooted at the same base variable are represented by this base variable at control-flow merge points.
The technique is quite expensive even on small benchmarks (less than 25 KLOC) and does not deal with assume statements in any way.

Other techniques. Heintze and Tardieu [13] improved performance through demand-driven pointer analysis, computing only enough information to determine the points-to sets of the query variables. Fink et al. [9] developed a staged verification system, in which fast and naive techniques run as early stages to prune away assertions that are easy to prove, reducing the load on the more precise but slower techniques that run later. Landi and Ryder [18] use conditional may-alias information to over-approximate the points-to set of each pointer. Context sensitivity is added using a k-limiting approach, and a set of aliases is maintained for every statement within a procedure to achieve flow sensitivity. Choi et al. [3] follow [18] closely but use sparse representations of the control-flow graphs and transfer functions instead of alias-generating rules. To the best of our knowledge, none of these techniques is able to leverage assume statements to improve precision.

7 Conclusion

This paper presents a program transformation that improves the precision of alias analysis with minor scalability overhead. The transformation is proved to be semantics preserving. Our evaluation demonstrates the merit of our approach on a practical end-to-end scenario: finding null-pointer dereferences in software.

References

1. Andersen, L.O.: Program Analysis and Specialization for the C Programming Language. Ph.D. thesis, DIKU, University of Copenhagen (May 1994)
2. Barnett, M., Qadeer, S.: BCT: A translator from MSIL to Boogie (2012), Seventh Workshop on Bytecode Semantics, Verification, Analysis and Transformation
3. Choi, J.D., Burke, M., Carini, P.: Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In: Principles of Programming Languages (POPL). pp. 232–245 (1993)
4. Cocke, J.: Global common subexpression elimination. In: Proceedings of a Symposium on Compiler Optimization. pp. 20–24. ACM, New York, NY, USA (1970)
5. Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13(4), 451–490 (Oct 1991)
6. Das, A., Lahiri, S.K., Lal, A., Li, Y.: Angelic verification: Precise verification modulo unknowns. In: Computer Aided Verification (CAV). pp. 324–342 (2015)
7. Das, A., Lal, A.: Precise null pointer analysis through global value numbering. CoRR abs/1702.05807 (2017), http://arxiv.org/abs/1702.05807
8. De, A., D'Souza, D.: Scalable flow-sensitive pointer analysis for Java with strong updates. In: ECOOP. pp. 665–687. Springer (2012)
9. Fink, S.J., Yahav, E., Dor, N., Ramalingam, G., Geay, E.: Effective typestate verification in the presence of aliasing. ACM Trans. Softw. Eng. Methodol. 17(2), 9:1–9:34 (May 2008)
10. Gulwani, S., Necula, G.C.: Global value numbering using random interpretation. In: Principles of Programming Languages (POPL). pp. 342–352 (2004)
11. Hardekopf, B., Lin, C.: Flow-sensitive pointer analysis for millions of lines of code. In: Code Generation and Optimization (CGO). pp. 289–298 (2011)
12. Hasti, R., Horwitz, S.: Using static single assignment form to improve flow-insensitive pointer analysis. In: Programming Language Design and Implementation (PLDI). pp. 97–105 (1998)
13. Heintze, N., Tardieu, O.: Demand-driven pointer analysis. In: Programming Language Design and Implementation (PLDI). pp. 24–34 (2001)
14. Horwitz, S.: Precise flow-insensitive may-alias analysis is NP-hard. ACM Trans. Program. Lang. Syst. 19(1), 1–6 (Jan 1997)
15. Jones, N.D., Muchnick, S.S.: A flexible approach to interprocedural data flow analysis and programs with recursive data structures. In: Principles of Programming Languages (POPL). pp. 66–74 (1982)
16. Kildall, G.A.: A unified approach to global program optimization. In: Principles of Programming Languages (POPL). pp. 194–206 (1973)
17. Lal, A., Qadeer, S.: Powering the static driver verifier using Corral. In: Foundations of Software Engineering (FSE). pp. 202–212 (2014)
18. Landi, W., Ryder, B.G.: A safe approximate algorithm for interprocedural pointer aliasing. SIGPLAN Not. 39(4), 473–489 (Apr 2004)
19. Leino, K.R.M.: This is Boogie 2 (2008), https://github.com/boogie-org/boogie
20. Lerch, J., Späth, J., Bodden, E., Mezini, M.: Access-path abstraction: Scaling field-sensitive data-flow analysis with unbounded access paths (T). In: Automated Software Engineering (ASE). pp. 619–629 (2015)
21. Lhoták, O., Hendren, L.: Evaluating the benefits of context-sensitive points-to analysis using a BDD-based implementation. ACM Transactions on Software Engineering and Methodology (TOSEM) 18(1), 3 (2008)
22. Microsoft: Static Driver Verifier, http://msdn.microsoft.com/en-us/library/windows/hardware/ff552808(v=vs.85).aspx
23. Rakamarić, Z., Emmi, M.: SMACK: Decoupling source language details from verifier implementations. In: Computer Aided Verification (CAV). pp. 106–113 (2014)
24. Ramalingam, G.: The undecidability of aliasing. ACM Trans. Program. Lang. Syst. 16(5), 1467–1471 (Sep 1994)
25. Sharir, M., Pnueli, A.: Two approaches to interprocedural data flow analysis, chap. 7, pp. 189–234. Prentice-Hall, Englewood Cliffs, NJ (1981)
26. Sridharan, M., Chandra, S., Dolby, J., Fink, S.J., Yahav, E.: Alias analysis for object-oriented programs. In: Aliasing in Object-Oriented Programming. Types, Analysis and Verification, pp. 196–232. Springer (2013)
27. Steensgaard, B.: Points-to analysis in almost linear time. In: Principles of Programming Languages (POPL). pp. 32–41. ACM, New York, NY, USA (1996)
28. Whaley, J., Lam, M.S.: An efficient inclusion-based points-to analysis for strictly-typed languages. In: Static Analysis Symposium (SAS). pp. 180–195 (2002)
29. Zheng, X., Rugina, R.: Demand-driven alias analysis for C. In: Principles of Programming Languages (POPL). pp. 197–208. ACM, New York, NY, USA (2008)