A Flow-Sensitive Approach for Steensgaard's Pointer Analysis
José Wesley de Souza Magalhães
Federal University of Viçosa
Florestal, Minas Gerais, Brazil

Daniel Mendes Barbosa
Federal University of Viçosa
Florestal, Minas Gerais, Brazil

ABSTRACT

Pointer analysis is a very useful technique that helps compilers apply safe optimizations to code by determining which memory locations each pointer may point to during the execution of a program. One of the main characteristics that increases the precision of a pointer analysis is flow-sensitivity. Flow-sensitive pointer analyses are more precise than flow-insensitive ones because they respect the control flow of a program and compute the points-to relation for each program point. However, this aspect makes flow-sensitive pointer analysis more expensive and inefficient on very large programs. In this work, we implemented a Steensgaard-style pointer analysis in a flow-sensitive approach, combining the efficiency of Steensgaard’s algorithm, which executes in almost linear time, with the precision of flow-sensitivity to achieve better results, keeping the best features of each aspect. We evaluate our analysis on open-source benchmarks and achieve better performance in comparison with an original Steensgaard’s algorithm and another flow-sensitive analysis.

CCS CONCEPTS

• Theory of computation → Program analysis;

KEYWORDS

Flow-Sensitivity, Steensgaard’s Pointer Analysis, LLVM

1 INTRODUCTION

Compilers always face difficulties when analyzing and handling complex code that makes heavy use of pointers. The main reason for these difficulties is that it is not possible to know the memory locations accessed by pointers just by analyzing the statements in the source code [9]. It is important to know these memory locations in order to perform safe optimizations such as dead code elimination, as well as error detection [5]. Eliminating code without knowing which memory regions are accessed through that code could lead to the loss of valuable information and, consequently, to malfunctioning of the program.

Pointer analysis is a technique that consists in statically determining which memory locations each pointer may point to at execution time, building a points-to graph containing the pointers and their relations, also called points-to sets. Achieving precise results with pointer analysis is a complex process whose execution time grows as the size of programs increases and, moreover, precise static analysis is undecidable in general [13][15]. Recent works in the area [7][8][13][17][16] aim to increase the precision and speed of the analysis while keeping it scalable to very large programs.

The precision of a pointer analysis is important to ensure the accuracy of the solution; however, computing highly precise solutions is NP-hard [6]. Nowadays the most efficient way to increase the precision of pointer analysis is to use flow-sensitivity, which has been considered in a larger number of works than in the past. Many of the flow-sensitive pointer analyses in recent works use inclusion-based analysis, also known as Andersen’s style [2], an iterative solution that is more expensive in terms of processing, having O(n³) complexity.

Performance is also an important aspect of pointer analysis, since we are dealing with large programs with many lines of code. In order to achieve better results in terms of performance, our algorithm was implemented using equality-based, or Steensgaard’s, pointer analysis, which is known to be faster [15]. Steensgaard’s pointer analysis executes in almost linear time without a large increase in resource consumption and memory usage [9].

However, Steensgaard’s pointer analysis does not generate very precise solutions because it is originally a flow-insensitive algorithm [15]. A determining factor in this imprecision is the merging of equivalent nodes. When a pointer changes to point to another memory location, the original Steensgaard’s algorithm merges the nodes for the old and the new locations pointed to by this pointer into a single node. Besides that, this new unified node must point to wherever both of the original nodes pointed, which causes the merge to be propagated to all subsequent points-to relations [12]. This propagation leads to incorrect relations in the final graph, and the analysis may conclude, e.g., that pointers that are never assigned to one another in the source code are related.

The remainder of the paper is organized as follows. Section 2 gives the background necessary for a better understanding of Steensgaard’s pointer analysis, flow-sensitivity and LLVM. Section 3 presents related work. In Section 4 we detail our flow-sensitive algorithm. Section 5 discusses the experimental results and Section 6 concludes.

2 BACKGROUND

This section provides background information useful for understanding the rest of the paper. First, we present Steensgaard’s algorithm, then we describe the basics of flow-sensitivity in the context of pointer analysis, and finally we describe LLVM, the compiler infrastructure used in this work, focusing on the LLVM internal representation along with static single assignment (SSA).

2.1 Steensgaard’s Pointer Analysis

Steensgaard’s pointer analysis [15] is an algorithm that solves a constraint system to build the points-to sets and the points-to graph. The constraints considered are the same as those addressed in Andersen’s pointer analysis and consist of the following four statements [13].

• x=&y (Address-of): the pointer x is assigned the address of variable y.
• x=y (Copy): pointer variable y is copied over to pointer variable x. This means that x will point to where y points to.
• x=*y (Load): for each variable v that y may point to, x will point to where v points to.
• *x=y (Store): for each variable v that x may point to, v will now point to where y points to.

For each statement listed above, Steensgaard’s algorithm updates the points-to set of each pointer assuming equivalence between both sides of the statement [12]. This means that whenever a constraint is processed, the updated points-to set is assumed to be equal to the source points-to set, rather than merely a superset of it as in inclusion-based analysis. This increases the speed of the analysis, and it is efficiently implemented using the Union-Find algorithm [4]. The table below shows the effect of each statement on the points-to sets involved. We use the abbreviation p for points-to set.

Table 1: Constraint System for Steensgaard’s Pointer Analysis

Statement   Constraint Name   Result
x=&y        Address-of        address(y) ∈ p(x)
x=y         Copy              p(x) = p(y)
x=*y        Load              ∀ v ∈ p(y), p(x) = p(v)
*x=y        Store             ∀ v ∈ p(x), p(v) = p(y)

The nodes in the points-to graph represent the pointers used in the code and their points-to sets, and there is an edge from a node x to a node y if y ∈ p(x). In Steensgaard’s graph, every node has a single outgoing edge, and this leads to the merging of the corresponding nodes when there is a new assignment to a pointer. As we mentioned earlier, this merge may produce incorrect results.

Figure 1: Example of source code in C and the corresponding graph generated by Steensgaard’s algorithm (panels (a) to (d)).

In Figure 1, the graph in 1b represents the points-to relation up to line 5. The statement in line 8 causes a merge between the y and x nodes (1c). Since we now have a single node (x,y), the propagation unifies the p1 and p2 nodes in 1d. However, this graph tells us that p1 may point to y and that p2 may point to x, which is not true.

Steensgaard’s pointer analysis is interprocedural [15], which means the analysis takes the uses of pointers in function calls into account, either as parameters or as return values, and not just the local scope of those pointers [12]. This characteristic increases the complexity of the analysis, but it increases precision as well. Steensgaard’s pointer analysis is also context-insensitive and field-insensitive, i.e., the analysis does not consider the calling context when analyzing the target of a function call [12] and always makes the same decisions for all calls to a function, even if they have distinct behaviors. A field-insensitive pointer analysis does not distinguish the individual fields of structure variables [12].

2.2 Flow-Sensitivity

Flow-sensitivity is the property of a pointer analysis that refers to how the analysis respects the control flow of the program and the order of statements. A flow-insensitive pointer analysis computes a single solution for the whole program, in contrast to a flow-sensitive one, which computes a solution for each program point [7]. Thus, a flow-sensitive analysis achieves more precise results than a flow-insensitive analysis, since the points-to set keeps the locations that a pointer points to at each program point, instead of all the locations that the pointer may point to at any time during execution. In Figure 1, a flow-sensitive pointer analysis would tell us that p3 points to x at line 5 and points to y at line 8, while also maintaining the correct relations involving the other pointers.

Steensgaard’s algorithm is originally flow-insensitive [15], but in this work we implement a flow-sensitive version of Steensgaard’s algorithm, increasing the precision of the results while keeping high performance and scalability on very large programs.

2.3 LLVM IR

LLVM (Low Level Virtual Machine) [10] has its own intermediate representation (IR), which is used by the analyses performed in this compiler infrastructure. This IR uses a partial SSA form [8] that separates variables into two classes: one containing the variables that have their address exposed and can therefore be referenced by pointers, the address-taken variables; and another containing the variables that are never referenced by pointers, the top-level variables.