
University of London
Imperial College of Science, Technology and Medicine
Department of Computing

Some directed graph algorithms and their application to pointer analysis

David J. Pearce

February 2005

Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering of the University of London

Abstract

This thesis focuses on improving the execution time and precision of scalable pointer analysis. Such an analysis statically determines the targets of all pointer variables in a program. We formulate the analysis as a directed graph problem, where the solution can be obtained by a computation similar, in many ways, to transitive closure. As with transitive closure, identifying strongly connected components and transitive edges offers significant gains. However, our problem differs because the computation can add new edges to the graph; hence, dynamic algorithms are needed to identify these structures efficiently. Thus, pointer analysis has often been likened to the dynamic transitive closure problem.

Two new algorithms for dynamically maintaining the topological order of a directed graph are presented. The first is a unit change algorithm, meaning the solution must be recomputed immediately following an edge insertion. Although its worst-case time bound is marginally inferior to that of a previous solution, it is far simpler to implement and has fewer restrictions. For these reasons, we find it to be faster in practice, and we provide an experimental study over random graphs to support this. The second is a batch algorithm, meaning the solution can be updated after several insertions, and it is the first truly dynamic solution to obtain an optimal time bound of O(v + e + b) over a batch of b edge insertions. Again, we provide an experimental study over random graphs comparing this against the standard approach to topological sort.
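To make the unit change setting concrete, here is a minimal Python sketch under our own assumptions: the node set is fixed at construction, edges are stored as adjacency sets, and the names `DynamicTopoOrder`, `add_edge` and `order` are hypothetical rather than taken from the thesis. When an inserted edge x → y contradicts the current order, the sketch searches only the nodes whose positions lie between those of y and x (forwards from y, backwards from x), then reassigns the positions those nodes already occupy so that everything reaching x comes before everything reachable from y. It illustrates the general idea of a localised repair; it is not a transcription of the pseudo-code for any algorithm presented in Chapter 3.

```python
from collections import defaultdict


class DynamicTopoOrder:
    """Keep a topological order of a DAG valid under edge insertions.

    Hypothetical sketch: when x -> y violates the current order, only the
    part of the order between y and x is searched and renumbered.
    """

    def __init__(self, nodes):
        self.succ = defaultdict(set)  # node -> successors
        self.pred = defaultdict(set)  # node -> predecessors
        self.ord = {n: i for i, n in enumerate(nodes)}  # node -> position

    def add_edge(self, x, y):
        """Insert x -> y, raising ValueError if it would close a cycle."""
        if x == y:
            raise ValueError("self-loop creates a cycle")
        lb, ub = self.ord[y], self.ord[x]
        if lb < ub:  # y currently precedes x, so the order must be repaired
            # Forward search from y; reaching x means y already reaches x.
            fwd = self._search(y, self.succ, lambda w: self.ord[w] < ub, x)
            # Backward search from x over nodes positioned after y.
            bwd = self._search(x, self.pred, lambda w: self.ord[w] > lb, None)
            self._reorder(bwd, fwd)
        self.succ[x].add(y)
        self.pred[y].add(x)

    def _search(self, start, edges, in_region, target):
        """Depth-first search from start, restricted to nodes in_region."""
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for w in edges[n]:
                if w == target:
                    raise ValueError("edge would create a cycle")
                if w not in seen and in_region(w):
                    seen.add(w)
                    stack.append(w)
        return seen

    def _reorder(self, bwd, fwd):
        """Put all of bwd before all of fwd, reusing their existing positions."""
        by_ord = lambda n: self.ord[n]
        moved = sorted(bwd, key=by_ord) + sorted(fwd, key=by_ord)
        slots = sorted(self.ord[n] for n in moved)
        for n, i in zip(moved, slots):
            self.ord[n] = i

    def order(self):
        """Return the nodes in their current topological order."""
        return sorted(self.ord, key=self.ord.get)
```

For example, starting from the order a, b, c, d, inserting c → a yields c, b, a, d, and a subsequent insertion of a → c is rejected because c already reaches a. Raising an exception on a cycle is a simplification made for this sketch; detecting strongly connected components instead is the extension described in the rest of this abstract.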
Furthermore, we demonstrate how both algorithms can be extended to the problem of dynamically detecting strongly connected components (i.e. cycles), thus achieving the first solutions that do not need to traverse the entire graph for half of all edge insertions. Several other new techniques for improving pointer analysis are also presented. These include difference propagation, which avoids redundant work by tracking changes in the points-to sets, and a novel approach to field-sensitive analysis of C. Finally, we present a detailed study of numerous solving algorithms, evaluating our techniques and algorithms against previous work. Our benchmark suite consists of many common C programs ranging in size from 15,000 to 200,000 lines of code.

Acknowledgements

I am grateful to my supervisor Paul Kelly for his guidance throughout this work and for having the courage to let me develop my own directions. He has always supported my work through helpful advice, astute criticism and stimulating conversation. He also encouraged me to undertake internships at Bell Labs and IBM Hursley. For these things, I thank him.

Many other people have been helpful to me throughout my time at Imperial College. My second supervisor, Chris Hankin, has provided many excellent comments and suggestions on my work. His depth of knowledge of program analysis has also been invaluable. I would also like to thank Oskar Mencer, who has always given an interesting and alternative viewpoint on life, and those members of the Software Performance Group, in particular Olav Beckmann and Kwok Cheung Yeung, for many interesting and delightful discussions.

To my parents I am, of course, indebted for giving me such an excellent start in life. They encouraged my interest in computers from an early age and have provided both moral and financial support throughout the years.
I must also thank the Engineering and Physical Sciences Research Council (EPSRC), without whose financial support I could not have done this work. I would also like to thank my examiners, Andy King and Mark Harman, for their excellent and helpful comments and their general appreciation of my work.

Lastly, but by no means least, I must thank my partner Melika King for her love and patience throughout the final and most testing years of my work.

Contents

1 Introduction
1.1 Applications
1.2 Contributions
1.3 Thesis Organisation
2 Constraint-Based Pointer Analysis
2.1 Solving the Analysis
2.1.1 Set Implementation
2.2 Extending the Basic Model
2.2.1 Context-Sensitivity
2.2.2 Flow-Sensitivity
2.2.3 Field-Sensitivity
2.2.4 The Heap
2.2.5 Arrays, Conditionals and Loops
2.2.6 Metrics
2.2.7 Concluding Remarks
2.3 Alternative Approaches to Pointer Analysis
2.3.1 Abstract Interpretation
2.3.2 Unification
2.4 Concluding Remarks
3 Dynamic Topological Order
3.1 Background
3.1.1 The Complexity Parameter δxy
3.1.2 The MNR Algorithm
3.1.3 The AHRSZ Algorithm
3.2 Algorithm PTO1
3.3 Algorithm PTO2
3.4 Experimental Study
3.4.1 Generating a Random DAG
3.4.2 Experimental Procedure
3.4.3 Single Insertion Experiments
3.4.4 Experiment 2 - Batch Insertions
3.5 Dynamic Strongly Connected Components
3.6 Concluding Remarks
4 Efficient Pointer Analysis
4.1 Worklist Solvers
4.1.1 Background
4.1.2 Algorithm PW1, a Simple Worklist Solver
4.1.3 Algorithm PWD, a Difference Propagation Solver
4.1.4 Experimental Study
4.2 Beyond the Worklist
4.2.1 Algorithm PW2
4.2.2 The Heintze-Tardieu Algorithm
4.2.3 Experimental Study
4.3 Concluding Remarks
5 Field-Sensitive Pointer Analysis
5.1 Indirect Function Calls
5.2 Field-Sensitive Pointer Analysis
5.3 Experimental Study
5.4 Related Work
5.4.1 Field-Based Pointer Analysis
5.5 Concluding Remarks
6 Conclusions and Future Work
6.1 Review of Contributions
6.2 Future Work for the Dynamic Topological Order Problem
6.2.1 Experiments on Real-World Graphs
6.2.2 A Bounded Complexity Result for PTO2
6.2.3 A Batch Variant of PTO1
6.2.4 Improving PTO1
6.3 Future Work on Pointer Analysis
6.3.1 Eliminating Positive Weight Cycles
6.3.2 Developing the Heintze-Tardieu Algorithm
6.3.3 Transitive Edges
6.4 Conclusions
A Relating to Heintze-Aiken Systems
A.1 Inductive Form
B Strongly Connected Components

List of Figures

2.1 An inference system for flow- and context-insensitive pointer analysis
2.2 An illustration of how get/set methods affect field-sensitivity
2.3 An example of how a dynamic heap model can improve the precision of pointer analysis
2.4 An example showing a pointer analysis formulated using abstract interpretation
2.5 Pseudo-code for a simple worklist solver
2.6 An illustration of how unification avoids revisiting statements
3.1 Algorithm STO, a simple solution to the dynamic topological order problem
3.2 Pseudo-code for algorithm MNR, an existing solution for the (unit change) dynamic topological order problem
3.3 Pseudo-code for algorithm AHRSZ, an optimal solution for the (unit change) dynamic topological order problem
3.4 Pseudo-code for PTO1, a new algorithm for the unit change dynamic topological order problem
3.5 Pseudo-code for PTO2, a novel and unique solution to the batch dynamic topological order problem
3.6 Pseudo-code for our procedure measuring the Average Cost Per Insertion (ACPI) of algorithms for the dynamic topological order problem
3.7 Experimental data illustrating how the Average Cost Per Insertion (ACPI) and certain complexity metrics vary with density for three unit change solutions to the dynamic topological order problem
3.8 Experimental data illustrating how the Average Cost Per Insertion (ACPI) varies with batch size for all five solutions to the dynamic topological order problem
3.9 Pseudo-code demonstrating how the depth-first search component of MNR can be modified to back-propagate component information
3.10 An example showing MSCC, a dynamic algorithm for detecting strongly connected components, in use
3.11 The extended shift procedure for MSCC, a dynamic algorithm for detecting strongly connected components
4.1 Pseudo-code for a standard worklist solver
4.2 Pseudo-code for PW1, an extended worklist algorithm for solving pointer analysis
4.3 Pseudo-code for PWD, an extended worklist algorithm for solving pointer analysis which employs difference propagation
4.4 A chart of our experimental data investigating the effect of iteration strategy on the performance of PW1, a worklist algorithm for solving pointer analysis
4.5 A chart of our experimental data looking at visit count for PW1, a worklist algorithm for solving pointer analysis
4.6 A chart of our experimental data looking at the effect of dynamic cycle detection on the performance of PW1, a worklist algorithm for solving pointer analysis
4.7 A chart of our experimental data looking at the effect of dynamic cycle detection on visit count for PW1, a worklist algorithm for solving pointer analysis
4.8 A chart of our experimental data looking at the effect of difference propagation on the performance of PW1, a worklist algorithm for solving pointer analysis
4.9 A chart of our experimental data looking at the effect of difference propagation on average set size for PW1, a worklist algorithm for solving pointer analysis