Refinement-Based Program Analysis Tools

Manu Sridharan

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2007-125
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-125.html

October 23, 2007

Copyright © 2007, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Refinement-Based Program Analysis Tools

by

Manu Sridharan

B.S. (Massachusetts Institute of Technology) 2001
M.Eng. (Massachusetts Institute of Technology) 2002

A dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy

in

Computer Science

in the

GRADUATE DIVISION

of the

UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:

Professor Rastislav Bodík, Chair
Professor George Necula
Professor Leo Harrington

Fall 2007

The dissertation of Manu Sridharan is approved:

Professor Rastislav Bodík, Chair Date

Professor George Necula Date

Professor Leo Harrington Date

University of California, Berkeley

Fall 2007

Refinement-Based Program Analysis Tools

Copyright © 2007

by

Manu Sridharan

Abstract

Refinement-Based Program Analysis Tools

by Manu Sridharan

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Rastislav Bodík, Chair

Program analysis tools are starting to change how software is developed. Verifiers can now eliminate certain complex bugs in large code bases, and automatic refactoring tools can greatly simplify code cleanup. Nevertheless, writing robust large-scale software remains a challenge, as greater use of component frameworks complicates debugging and program understanding. Developers need more powerful programming tools to combat this complexity and produce reliable code.

This dissertation presents two techniques for more powerful program analysis and program understanding tools based on refinement. In general, refinement-based techniques aim to discover interesting properties of a large program by reasoning precisely only about the most important parts of the program (typically a small amount of code), abstracting away the behavior of much of the program. Our key contribution is the first framework for effective refinement-based handling of object-oriented data structures; pervasive use of such data structures thwarts the effectiveness of most existing analyses and tools.

Our two refinement-based techniques significantly advance the state of the art in program analyses and tools for object-oriented languages. The first technique is a refinement-based points-to analysis that can compute precise answers in interactive time. This analysis enables precise reasoning about data structure behavior in clients that require interactive performance, e.g., just-in-time compilers and interactive development environment (IDE) tools. Our second technique is thin slicing, which gives usable answers to code relevance questions—e.g., "What code might have caused this crash?"—addressing a long-standing challenge for analysis tools. In this dissertation, we describe the design and implementation of these techniques and present experimental evaluations that validate their key properties.

Professor Rastislav Bodík, Chair Date

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Our Approach: Refinement
  1.2 Refinement-Based Points-To Analysis
    1.2.1 Motivation
    1.2.2 A Brief History of Points-to Analysis
    1.2.3 Our Approach
  1.3 Thin Slicing
    1.3.1 Motivation
    1.3.2 Our Approach
  1.4 Dissertation Overview

2 Points-To Analysis Background
  2.1 Points-To Analysis Definition
  2.2 Points-To Analysis Precision
    2.2.1 Assignments
    2.2.2 Field accesses
    2.2.3 Call Graph
    2.2.4 Method Calls
    2.2.5 Heap Abstraction
    2.2.6 Control Flow

3 Points-To Analysis Formulations
  3.1 Context-Free Language Reachability
  3.2 Context-Insensitive Formulation
    3.2.1 Graph Representation
    3.2.2 Analysis Grammar
    3.2.3 Other Language Features
  3.3 Context-Sensitive Formulation
  3.4 On-The-Fly Call Graph Formulation
    3.4.1 Intuition
    3.4.2 Graph Representation
    3.4.3 The LOTF Language
    3.4.4 Adding Context Sensitivity
    3.4.5 Discussion

4 Context-Insensitive Points-To Analysis
  4.1 Algorithm Overview
    4.1.1 Demand-Driven Points-To Analysis
    4.1.2 Client-Driven Refinement
    4.1.3 Simplified Formulation
    4.1.4 Refinement Algorithm
    4.1.5 Proofs of Termination and Soundness
  4.2 Regular Approximation
    4.2.1 Regular Reachability
    4.2.2 RegularPT
    4.2.3 Improving Precision with Types
  4.3 Refinement
    4.3.1 Refining through match edge removal
    4.3.2 RefinedRegularPT
  4.4 Evaluation
    4.4.1 Experimental Configuration
    4.4.2 Experimental Results
  4.5 Java vs. C
  4.6 FullFS Details
  4.7 Adaptation of Tabulation Algorithm
    4.7.1 Tabulation Algorithm for Points-To Analysis
    4.7.2 Discussion

5 Context-Sensitive Points-To Analysis
  5.1 Algorithm Overview
    5.1.1 Simplified Formulation
    5.1.2 Context-Sensitive Refinement Algorithm
    5.1.3 Refinement on Java programs
  5.2 Decidable Formulation
    5.2.1 Context-Sensitive Analysis in CFL-Reachability
    5.2.2 Handling Recursion
  5.3 Refinement Algorithm
    5.3.1 Refinement for LRF-reachability
    5.3.2 Pseudocode
      5.3.2.1 Refinement Loop
      5.3.2.2 Computing Points-To Sets
      5.3.2.3 Refinement Policy
  5.4 Evaluation
    5.4.1 Experimental Configuration
    5.4.2 Experimental Results

6 Thin Slicing
  6.1 Defining Thin Slices
  6.2 Thin Slices as Dependences
  6.3 Expanding Thin Slices
    6.3.1 Question 1: Explaining Aliasing
    6.3.2 Question 2: Control Dependence
  6.4 Computing Thin Slices
    6.4.1 Graph Construction
    6.4.2 Context-Insensitive Thin Slicing
    6.4.3 Context-Sensitive Thin Slicing
  6.5 Evaluation
    6.5.1 Configuration and Methodology
    6.5.2 Experiment: Locating Bugs
    6.5.3 Experiment: Understanding Tough Casts

7 Related Work
  7.1 Pointer Analysis Related Work
    7.1.1 Context-insensitive points-to analysis
    7.1.2 Context-sensitive points-to analysis
    7.1.3 Refinement-based points-to analysis
    7.1.4 Demand-driven points-to analysis
    7.1.5 CFL-reachability
    7.1.6 Incremental points-to analysis
    7.1.7 Cast verification
  7.2 Thin Slicing Related Work

8 Conclusions and Future Work

Bibliography

List of Figures

1.1 Graphs showing the number of public methods for the Eclipse platform [Ecl] and the Java standard library [J2S] in successive releases. The size of Eclipse has increased by a factor of 3 since its initial release, while the Java library size has increased by a factor of 3.8.

1.2 Code examples to informally illustrate the effects of our refinement technique.

1.3 Example showing the advantages of thin slicing. The six statements with underlined expressions are in the thin slice seeded with line 24, while the traditional slice for line 24 contains all displayed code. The bodies of functions with inessential behavior (e.g., print()) have been elided for clarity.

2.1 A simple example to illustrate different treatments of assignments.

2.2 A simple code example to illustrate virtual calls.

2.3 An example illustrating the precision benefit of context-sensitive points-to analysis.

2.4 A simple example to illustrate the benefits of a context-sensitive heap abstraction.

2.5 A simple example to illustrate the benefits of flow-sensitive points-to analysis.

2.6 A simple example to illustrate the benefits of path-sensitive points-to analysis.

3.1 A small code example and its graph representation for CFL-reachability-based points-to analysis. Line numbers from (a) are given on corresponding edges in (b). Dashed edges in (b) indicate the existence of a flowsTo-path from the source to the sink.

3.2 A context-free grammar for LF.

3.3 A context-free grammar for LC that only includes strings corresponding to realizable paths, adapted from previous work [Rep98, FRD99, KA04].

3.4 A small example program and graph to illustrate context-sensitive analysis.

3.5 A small example to illustrate handling of globals in our context-sensitive formulation. The code is given in (a), and (b) shows a (LF ∩ LC)-path from o11 to y in the corresponding graph. The dashed return[12] self edge on A.f is added to ensure sound handling of the global.

3.6 An abstracted view of a path in our graph for computing virtual call targets on-the-fly (some edge parameters have been removed). The path is for some virtual call “r.m(x),” and it connects actual parameter x to formal parameter pA.m by going through oi and back, as indicated by the dotted edge. Dashed edges indicate sub-paths, and the gray edge is filtered out by our context-free language.

3.7 A small example illustrating our graph representation for on-the-fly call graph construction. The graph in (b) contains a relevant subset of the edges for representing the program in (a).

3.8 A context-free grammar for LOTF, an extension of LF that includes on-the-fly call graph construction. The rules for flowsTo, flowsTo-bar, pointsTo, and alias are unchanged from those in Figure 3.2.

3.9 A grammar for LC′, a modification of LC that works on the representation of Table 3.2. The productions for csStart, unbalExits, unbalEntries, and balanced are unmodified from those for LC in Figure 3.3.

3.10 A small example to illustrate possible reduced precision with demand-driven analysis and on-the-fly call graph handling.

4.1 Paths to illustrate the behavior of our refinement algorithm.

4.2 An example showing why our algorithm requires well-structured graphs.

4.3 Pseudocode for a single pass of our refinement algorithm, as applied to the single-path problem.

4.4 Outer loop of our refinement algorithm for the single-path problem.

4.5 A graph illustrating match edges.

4.6 Pseudocode for the RegularPT algorithm.

4.7 Pseudocode for the RefinedRegularPT algorithm.

4.8 A typical example where RegularPT succeeds, but FullFS does too much work, derived from code in the jedit benchmark.

4.9 RegularPT and RefinedRegularPT performed very well under early termination. We give the cumulative distribution of percentage of feasible queries positively answered vs. node traversal budget for the virtcall client on jedit, for all three algorithms. The top graph shows the distribution from 0 to 2000 nodes traversed, while the bottom graph focuses on 0 to 100 nodes traversed. The distributions for other benchmarks look very similar.

4.10 Complete time/precision comparison of several demand algorithm configurations and exhaustive algorithms on virtcall queries. The x axis is analysis time in seconds, and the y axis is the percentage of feasible queries that were positively answered. For RegularPT and FullFS the data points left to right are for traversal budgets of 50 nodes, 100 nodes, 200 nodes, and 500 nodes. Times are also given for ExhaustiveFB and ExhaustiveFS (described in Table 4.1). From top to bottom, the graphs show virtual calls in hot methods in javac, all virtual calls in javac application code, and all virtual calls in jedit application code.

4.11 Inference rules for FullFS.

4.12 Example code and partial derivation for FullFS.

4.13 Pseudocode for the FullFS algorithm.

4.14 Tabulation algorithm of Reps et al. [RHSR94, RHS95], adapted for Java points-to analysis.

5.1 Paths to illustrate the behavior of our context-sensitive refinement algorithm.

5.2 Example code for illustrating our points-to analysis algorithm.

5.3 Analysis result at different stages of approximation for proving safety of the cast at line 27 of Figure 5.2.

5.4 State transitions in the FSM for language RC.

5.5 Relevant portion of graph for code in Figure 5.2. Solid edges represent program statements, with single edges for intraprocedural statements and double edges for call entry and exit statements. Variables are subscripted with the name of the enclosing method, and line numbers in labels refer to call sites or allocation sites. The dashed edges are match edges. For space, getfield and putfield are abbreviated gf and pf.

5.6 Pseudocode for the refinement loop of our points-to analysis.

5.7 Pseudocode for a single iteration of our context-sensitive refinement algorithm. See Figure 5.8 for the HandleCopy and HandleOTF procedures.

5.8 Pseudocode for HandleCopy and HandleOTF, helper functions for the pseudocode in Figure 5.7.

5.9 Example illustrating the need to properly track states at virtual call sites.

6.1 A small program to illustrate thin slicing. Directly-used locations (see §6.1) in the thin slice for line 7 are underlined.

6.2 A dependence graph for the program of Figure 6.1. Thick edges indicate non-base-pointer flow dependences, used for thin slicing. Traditional slicing also uses base pointer flow dependences (the dashed edges) and control dependences (the dotted edge).

6.3 An example for showing expansion of thin slices, similar to an example we saw in our evaluation. The bug is an exception thrown at line 11, and understanding the bug requires an explanation of aliasing (§6.3.1) and following a control dependence (§6.3.2). We use single underlines to highlight relevant expressions in the initial thin slice, and double underlines for expressions in explainer statements for aliasing.

6.4 An example illustrating a tough cast. Expressions in the thin slice used to understand the safety of the cast are underlined.

List of Tables

1.1 A summary of selected work in points-to analysis, categorized by what language was analyzed (C or Java) and whether the analysis was context sensitive. A key attribute of each analysis is briefly described.

1.2 A summary of the relative precision and scalability of our points-to analysis, in both its context-insensitive (CI) and context-sensitive (CS) configurations. We give the state-of-the-art analysis used as a baseline, and the relative precision, time, and memory for a representative experiment.

3.1 Canonical statements for Java points-to analysis, and the edge(s) for each statement in our graph representation.

3.2 The changes in graph representation required for on-the-fly call graph construction. Remaining assignments are modeled as in Table 3.1.

4.1 Descriptions of points-to analysis algorithms used in our experiments.

4.2 Information about our benchmarks.

4.3 Number of virtual calls unresolvable by types in each benchmark (the Virt column), and the number of such calls resolvable by field-sensitive Andersen’s (the FeasVirt column).

4.4 Information on virtcall and localalias queries in hot methods. Hot gives the number of hot methods. Virt gives the number of virtual calls in hot methods, and FeasVirt gives the number of those calls that can be resolved by field-sensitive Andersen’s (the number of feasible queries). Alias and FeasAlias are analogous, but for localalias queries. We did not collect hot method information for the other three benchmarks because our experimental infrastructure did not support it.

4.5 RegularPT and RefinedRegularPT have nearly the precision of field-sensitive Andersen’s. The table gives the percentage of virtcall queries positively answered by an intraprocedural field-based analysis (the Intra column), RegularPT (the Reg column), and RefinedRegularPT with a 5 second time limit per query (the RefReg column), as a percentage of those answered positively by field-sensitive Andersen’s. The parenthesized Live numbers indicate the result if limited to queries in code that cannot be proven dead by the points-to analysis.

4.6 Precision of the demand-driven algorithms with traversal budgets of 50 nodes and 1250 nodes. The columns give the percentage of feasible virtcall queries positively answered by RegularPT (the Reg column), RefinedRegularPT (the RefReg column), and FullFS (the FullFS column).

4.7 Results for virtcall queries in hot methods, showing that RegularPT positively answers the same number of queries as field-sensitive Andersen’s. The FeasVirt column gives the number of feasible queries (repeated from Table 4.4), the Reg column the number resolved by RegularPT with a 250 node traversal budget, and the FullFS column the number resolved by FullFS with a 500 node traversal budget.

4.8 Results for localalias queries in hot methods, showing RegularPT matching field-sensitive Andersen’s. FeasAlias gives the number of queries resolved by field-sensitive Andersen’s (repeated from Table 4.4). Reg gives the number resolved by RegularPT, and FullFS the number resolved by FullFS, with the same traversal budgets used for Table 4.7.

5.1 Information about our benchmarks. We include the SPECjvm98 suite, soot-c and sablecc-j from the Ashes suite [Ash], several benchmarks from the DaCapo suite version beta050224 [DaC], and the Polyglot Java front-end [NCM03]. The “Statements” column gives the number of edges in the graph representation. The numbers include the reachable portions of the Java library, determined using a call graph constructed on the fly with Andersen’s analysis [And94] by Spark [LH03].

5.2 Results for the cast safety client. The “Casts” column gives the number of downcasts that context-insensitive analysis cannot prove safe; these numbers differ from those in Lhoták’s work because we exclude casts of non-pointers (e.g., float to int), as they cannot cause a runtime exception. The three rightmost columns respectively give the percentage of these casts proved safe by our refinement algorithm (“DemRef”), our demand-driven algorithm configured to treat all code precisely (“Full”), and the object-sensitive analysis of Lhoták’s work [LH06] (“1H”). The “DemRef Time” column gives the running time for the refinement algorithm in seconds.

5.3 Results for the factory method client. The “# Factory” column gives the number of detected factory methods, and the “Dist” column gives the percentage of those factory methods for which the analysis could distinguish the contents of the allocated objects. The “Time” column gives the running time in seconds.

5.4 Results for querying all application variables for which the 1H algorithm of Lhoták’s work [LH06] yielded a more precise result than a context-insensitive analysis. The “# Queries” column gives the number of such variables, the “Av. Query” column gives the average query time in seconds, and the “Time” column gives the total time in seconds. Results for chart are not shown, as we could not run the 1H algorithm on it in available memory.

6.1 Benchmark characteristics, derived from methods discovered during on-the-fly call graph construction, including Java library methods. The number of call graph nodes exceeds the number of distinct methods due to limited cloning-based context-sensitivity in the points-to analysis. SDG Statements reports the number of scalar statements, but excludes parameter passing statements introduced to model the heap.

6.2 Evaluation of thin slicing for debugging. For each bug, we show the number of statements that must be inspected in the thin slice (the “Thin” column) and the traditional slice (the “Trad” column) to discover the bug using BFS traversal (see §6.5.1). We also give the ratio of traditional statements to thin slice statements, and the number of control dependences that must be exposed to find the bug; the numbers for thin and traditional slices include these control dependences. Finally, we give the number of inspected statements for thin and traditional slicing when context-insensitive points-to analysis is used, without object-sensitive handling for containers (the “ThinCIPA” and “TradCIPA” columns). Slicing of any kind was not useful for five bugs in -security and one bug in ant; these bugs do not appear in the table.

6.3 Evaluation of thin slicing for understanding tough casts. The types of data in the table columns are described with Table 6.2.

7.1 A comparison of key properties of previous analyses. Algorithms are named by first author unless they have been referred to differently in this paper; note that Steensgaard (CS) [Ste] is different than his original analysis [Ste96]. The “Eq/Sub” column indicates whether assignments are modeled with equality or subset constraints, and the “CS CG” and “CS Heap” columns respectively indicate the use of a context-sensitive call graph and heap abstraction (see §2.2.4 and §2.2.5 for definitions). Finally, the “Shown to Scale” column indicates whether the algorithm has been shown to scale to large Java benchmarks; “1.1 lib” means the smaller Java 1.1 libraries were analyzed, and k-limiting [Shi88] is indicated where used.

Acknowledgements

I owe deep thanks to my advisor Ras Bodík, who has been an outstanding mentor and collaborator. I cannot hope to list all the ways that Ras has helped me improve as a researcher, but perhaps the most important thing he taught me was how to recognize and communicate the broader importance of my research to others. His tireless feedback on ideas, paper drafts, and talks vastly increased the quality of my work and its impact. I am certain that his influence and example will continue to inspire me in my future pursuits. Much of the work in this thesis benefited enormously from other collaborators. Denis Gopan did some initial work on points-to analysis via CFL-reachability and continued to give essential feedback after I took over the project. Lexin Shan helped greatly with the experimental evaluation of the context-insensitive points-to analysis. Mooly Sagiv contributed significantly to the formalization of the context-sensitive points-to analysis and participated in many other fun brainstorming sessions. Finally, Stephen Fink helped come up with the idea of thin slicing and implemented both the traditional and thin slicers for our experimental evaluation, a heroic effort in the weeks before the PLDI deadline that made the paper possible. I have received excellent advice from other mentors that helped me greatly at many stages of graduate school. Daniel Jackson has continued to provide valuable guidance in spite of my traitorous decision to leave MIT for Berkeley, for which I’m very grateful. Zhe Yang was my mentor during a summer internship at Microsoft Research, an exciting experience that helped inspire ideas in my other work. Manuvir Das provided much good advice to me, both during that internship and afterward. George Necula and Leo Harrington graciously agreed to be on my thesis committee and provided valuable feedback during my qualifying exam. Susan Graham also gave

useful feedback during my qualifying exam, including the suggestion of the name ’thin slicing’. I’d also like to thank Robert O’Callahan and Jim Larus for their occasional advice and support.

Graduate school would not have been nearly as enjoyable without my fantastic officemates and friends. Brian Fields, Dave Mandelin, AJ Shankar, and Bill McCloskey made 517 Soda a really fun place to be. Though having such interesting people in the office didn’t always improve my research productivity, I definitely learned a lot and enjoyed school a lot more because of them. David Ratajczak was a brilliant research collaborator and a good friend during his time at Berkeley. Naveen Sastry helped me stay in shape and gave lots of useful advice, research-related and otherwise. Nick Eriksson helped smooth my transition to Berkeley and cooked up many a tasty dessert in our apartment. Anand Sarwate was a fellow Indian tenor EECS graduate student from MIT who always had insightful things to say about research, music, or just about any other topic.

My parents and my brother Vishnu have always provided me with tireless love and support. From letting me play with our VIC-20 when I was six to tolerating my hours tying up the phone line with the modem in high school, my family constantly nurtured my technical interests. They have always encouraged me to work hard and persevere, and they have been incredibly supportive anytime I needed help. This dissertation would not have been possible without them.

Finally, I thank my wonderful wife Padma, who has given me tremendous support and encouragement. She both kept me sane during the weeks of craziness before paper deadlines and kept me motivated when I was feeling unproductive and frustrated. I am very lucky to be leaving Berkeley with such an amazing life companion.

Chapter 1

Introduction

Nearly 40 years since the popularization of the term “software engineering” [NR69], building robust, large-scale software remains an enormous engineering challenge. Stories abound of failed software projects with disastrous consequences, from the well-known Therac-25 incident [LT93] to more recent troubles at the IRS [McC06].1 A recent NIST study estimates the costs of failed software projects to be in the billions of dollars per year [oST]. Hence, techniques for reducing the cost and difficulty of building software could have enormous economic and societal impact.

Software components, long heralded as a possible solution to software complexity, have significantly eased the process of developing large-scale software. As documented by Lampson [Lam04], large “platform” components like operating systems, databases, and web browsers have greatly reduced the effort required to develop many applications. At a smaller scale, platforms based on the Java language like J2SE [J2S], J2EE [J2E], and Eclipse [Ecl] have proved to be enormously popular with developers. As of July 2007, 904 Eclipse plugins are available at one popular repository [EcP], reflecting the platform’s popularity and extensibility.

1In the IRS case, an attempt to build and deploy new fraud detection software failed. By the time it was known that the new system would not work, it was too late to re-start the old system, leading to an estimated $200 million in fraudulent refunds.


[Figure 1.1: two bar charts, “Size of Eclipse Platform” and “Size of Java Library.” Each y-axis measures thousands of public methods; each x-axis gives version and release date (Eclipse 1.0 (11/01) through 3.2 (6/06); Java 1.2 (12/98) through 1.6 (12/06)).]

Figure 1.1: Graphs showing the number of public methods for the Eclipse platform [Ecl] and the Java standard library [J2S] in successive releases. The size of Eclipse has increased by a factor of 3 since its initial release, while the Java library size has increased by a factor of 3.8.

Though they have eased development to some degree, components have not solved software complexity, and in some ways they have exacerbated the problem. Component platforms are often a key source of software complexity due to their enormous size, ballooning with each release as new features are added. For example, Figure 1.1 shows the dramatic size increase of J2SE [J2S] and Eclipse [Ecl] over their last few releases. Increasing platform size often translates to greater complexity, as discovering how to use the few platform features needed for a particular task can be non-trivial. Good documentation can partially alleviate this problem, but in practice, documentation is often incomplete and, when it exists, difficult to locate. The complexity of platforms can turn even the simplest tasks into a challenge: work on the Prospector system [MXBK05] describes how it took hours for experienced programmers to learn the three lines of code needed to re-use Eclipse’s Java parser.

How can the challenge of programming to large platforms be eased? We believe that better development tools are the most promising approach for attacking this problem, as they can provide significant benefits for both existing and future software projects. Recent tools like ESP [DLS02] and xgcc [HCXE02] are able to find many serious bugs in large code bases, requiring no effort from the user beyond running the tool. Similarly, automatic refactoring tools [Fow99] can be used in an “off-the-shelf” fashion to improve maintainability of existing code. Due to their usefulness and ease of adoption, such tools are already having a significant impact on large-scale software projects, e.g., [HDWY06]. While some problems addressed by tools could also be solved by other techniques (e.g., new programming language constructs), the integration of tools into an existing development process is often much easier, and hence preferable.

Unfortunately, many current development tools are too resource intensive to provide immediate feedback to developers as they write and debug their code. Many questions that arise during development can be answered by existing static analyses, e.g., “What methods can be invoked by this virtual call?” or “What code writes into this object field?” However, applying these analyses to large component-based software can require up to hours of time and gigabytes of memory. Hence, today these questions are usually answered by hand—an error-prone process that can require digging through many layers of component abstractions. Also, the output of bug-finding tools would be even more useful if it were provided immediately after a developer introduces a bug, since at that point her understanding of the code is still fresh.

Just as large-scale software can overwhelm current tools, the voluminous output of those tools can sometimes overwhelm developers, greatly lessening the tools’ usefulness. When presenting some program behavior to the user, a standard approach in tools is to show all code that is possibly relevant to that behavior. This approach is highly problematic with large, component-based software for two reasons:


1. The amount of possibly relevant code can be enormous, often a significant fraction of the whole program.

2. Code from libraries is often relevant, but developers are typically insulated from details of library implementations.

As understandable results are key to the usefulness of development tools, existing tools sometimes suppress reports of difficult-to-explain behavior. For example, bugs involving aliased pointers are not reported by ESP since the tool has no mechanism for properly explaining the aliasing [MSA+04]. This choice is unfortunate for tool users, as such suppressed results often correspond to issues that are also very difficult to understand without tool support, e.g., aliasing bugs.

The problem statement of this thesis is to develop techniques that address the scalability and understandability drawbacks of current development tools. In this work, we focus specifically on treatment of pointers and heap-allocated data structures, since understanding heap behavior is critical to understanding most other behaviors of large object-oriented software (further discussion in §1.2.1 and §1.3.1). First, we aim to develop a pointer analysis that can match or exceed the precision of existing analyses while providing results in interactive time, enabling immediate tool feedback for developers. Second, we pursue a technique for explaining heap behaviors to users in a more concise and useful manner than simply showing all possibly relevant program statements. Together, these techniques could enable a new class of development tools that significantly ease the debugging and understanding of component-based software.

1.1 Our Approach: Refinement

This dissertation presents techniques based on refinement that address drawbacks of current development tools. A refinement-based technique assumes that in practice, precise reasoning about a small amount of important code is sufficient for understanding many interesting program properties. Refinement is the iterative process by which an analysis tries to discover this important code. Our use of refinement enables our techniques to overcome the aforementioned drawbacks of current tools: performance is dramatically improved since only important code is analyzed precisely, and understandability is improved since attention is focused on important code.

A key open problem addressed by our work is how to use refinement effectively for reasoning about heap behavior in large object-oriented programs. Refinement has previously been employed with great success in other program analysis domains (e.g., model checking [BR01]). However, previous refinement techniques for heap behavior [PC94, GL03] proved ineffective at identifying a small amount of important code in large Java programs, and hence were insufficient for our needs.

Our refinement framework rests on our identification of hierarchical structure in heap-based value flow for Java-like languages. Questions about value flow concern how some object o flows to some variable x,2 i.e., what statements cause the flow to occur. In Java, an object flows through a heap memory location through field writes and reads, respectively of the form w.f = z and x = y.f. When an object is written and read from the heap through such field accesses, a secondary question may arise: how do the field accesses come to operate on the same object, i.e., what causes base pointers w and y to be pointer aliases? Explaining this condition requires recursively tracing the flow of some object o′ to both w and y. The object o′ may itself flow through the heap, yielding the aforementioned hierarchy in heap-based flow.

2 When discussing the flow of an object to a variable, we mean the flow of a pointer to the object, consistent with Java semantics.

We exploit hierarchical heap accesses in our refinement technique by only investigating aliasing of base pointers for a small number of field accesses. In practice, many interesting program properties can be discovered without considering the statements that cause base pointers of field accesses to be aliased, instead simply assuming that such statements exist. This treatment of field accesses can dramatically decrease the amount of code considered by the enclosing analysis or the user, increasing both scalability and understandability. For some field accesses, considering the statements causing base pointer aliasing is important, e.g., because further inspection may prove that no aliasing is possible. Our refinement technique succeeds by enabling fast identification of these important field accesses.

(a) Initial example:
    1 ...; z = p.g;
    2 x = q.g;
    3 w = new Obj();
    4 z.f = w;
    5 y = x.f;

(b) No important field accesses:
    1 w = new Obj();
    2 o1.f = w;
    3 y = o1.f;

(c) Treating the f accesses as important:
    1 ...; z = o2.g;
    2 x = o2.g;
    3 w = new Obj();
    4 z.f = w;
    5 y = x.f;

Figure 1.2: Code examples to informally illustrate the effects of our refinement technique.

Figure 1.2 informally illustrates the effects of our refinement technique with a small example. Say that for Figure 1.2(a), one is interested in how objects flow to y. Ignoring aliasing explanations for all field accesses essentially abstracts the program to that of Figure 1.2(b). The base pointers x and z from Figure 1.2(a) have been replaced with a new variable o1, reflecting the assumption that some object flows to both base pointers.3 Note that attention is now focused on the value flow from w to y, as code related to base pointers x and z can be elided. Figure 1.2(c) illustrates what happens when refinement exposes the statements causing base pointer aliasing for the accesses of field f (line 4 and line 5). Here, the base pointers x and z from Figure 1.2(a) return, but the base pointers for the g accesses at line 1 and line 2 have been abstracted. Our refinement technique heuristically chooses where to inspect statements causing base pointer aliasing, aiming to focus effort on those statements most relevant to the property of interest for the user.

3 Note that this use of new scalar variables is intended only to provide the intuition behind our refinement technique; in general, it may not be possible to model our treatment of the heap with such a translation.

We have instantiated our refinement framework in two different ways. The first instantiation is a refinement-based points-to analysis, the first to compute precise answers in “interactive time.” This analysis dramatically improves on the scalability of previous approaches, enabling precise reasoning about heap behavior in interactive development tools. §1.2 discusses the importance of points-to analysis and further details how our analysis advances the state of the art.

Thin slicing, our second technique, uses refinement to give usable answers to code relevance questions, e.g., “What code might have caused this crash?” Providing such answers has been a long-standing challenge for analysis tools. To improve on program slicing [Wei79], the state-of-the-art technique, we devised a definition of code relevance whose key novelty is its exclusion of statements only relevant to base pointer aliasing. In practice, this notion of relevance captures nearly exactly the desired statements for many programmer tasks. In cases when more statements are needed, the user can refine by selectively exposing statements relevant to base pointer aliasing—again exploiting the hierarchy of Java heap accesses. §1.3 describes why classical program slicing is often not useful in practice and further details the new notion of relevance used by thin slicing.
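To tie the pieces of this section together, the following small, runnable Java program is our own illustrative sketch of the flow hierarchy (the class Box and all names in it are hypothetical and do not appear in the dissertation):

    // Sketch (ours): the hierarchy in heap-based value flow. Explaining how
    // the object allocated at (o) reaches y requires explaining why the base
    // pointers b1 and b2 alias, which in turn requires tracing the flow of
    // the Box allocated at (o'). If that Box were itself stored in and read
    // from another object's field, the same question would recur one level up.
    class Box {
        Object f;
    }

    public class FlowHierarchy {
        public static void main(String[] args) {
            Box b1 = new Box();          // (o'): allocation of the Box
            Box b2 = b1;                 // makes b1 and b2 aliases
            Object v = new Object();     // (o): the object whose flow we trace
            b1.f = v;                    // heap write w.f = z, with w = b1
            Object y = b2.f;             // heap read x = y.f, with y = b2
            System.out.println(y == v);  // prints true: (o) flowed to y
        }
    }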


1.2 Refinement-Based Points-To Analysis

Here we give an overview of our refinement-based points-to analysis technique. Via refinement, our analysis is able to match or exceed the precision of state-of-the-art analyses while requiring orders-of-magnitude less memory and providing “interactive” running times. We first detail the increasing need for scalable and precise points-to analysis in §1.2.1. Then, we give a brief introduction to points-to analyses of the last 15 years to place our work in context (§1.2.2). Finally, we describe in more detail the key contributions of our points-to analysis (§1.2.3).

1.2.1 Motivation

Effective analysis of pointer variables in programs is becoming increasingly critical, as more software is being written in object-oriented programming languages that encourage pervasive use of heap-allocated data. For the widely-adopted Java and C# languages, data must be stored on the heap to make effective use of many language features and library classes. Increasingly popular scripting languages like Python, Javascript, and Ruby have essentially pure object models, in which all values are objects, making programs even more heap intensive. Finally, use of platforms like Eclipse [Ecl] often requires greater use of the heap; for example, data in the often-used XML format are typically represented with a tree of objects.

To aid reasoning about pointer behavior in large object-oriented programs, we have developed a scalable and precise points-to analysis. A points-to analysis (first defined in [EGH94]) typically computes a points-to relation that conservatively maps each pointer variable to the heap objects it can point to at runtime. This points-to relation is often represented with a points-to set for each variable, with each points-to set containing abstract locations from some finite model of all possible runtime heaps (further discussion in §2.1).
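As a small worked example (our own, with hypothetical abstract-location names o1 and o2), the comments in the following Java fragment show the points-to sets such an analysis would typically compute:

    // Illustrative sketch (ours): each allocation site becomes an abstract
    // location, and each variable maps to the locations it may point to.
    public class PointsToExample {
        public static void main(String[] args) {
            Object x = new Object();    // allocation site o1
            Object y = new Object();    // allocation site o2
            Object z = x;               // assignment propagates: pt(z) ⊇ pt(x)
            // A typical analysis computes pt(x) = {o1}, pt(y) = {o2}, pt(z) = {o1}.
            // pt(x) and pt(y) are disjoint, so x and y can never alias;
            // x and z may alias, since both sets contain o1.
            System.out.println(x == z); // prints true
        }
    }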


The results of a points-to analysis have many uses in program analysis. One common use of points-to sets is in reasoning about pointer aliasing, which can greatly complicate reasoning about a program’s behavior (breaking Hoare’s elegant axiomatic treatment of assignments [Hoa69]). A precise points-to analysis can alleviate the problem of reasoning about pointer aliases by showing that most pointer variable pairs cannot be aliased, i.e., they can never point to the same object. The points-to set of a single variable is also useful to other analyses, for example to refine the variable’s declared type. If variable x of declared type Object can only point to objects of type A, other analyses can use the more precise type A for x when performing optimization or verification.

An increasing number of state-of-the-art Java development tools are limited by the need for a context-sensitive points-to analysis. Intuitively, a context-sensitive analysis treats a program as if all method calls were inlined, thereby gaining precision by computing results separately for each invocation of a method (see §2.2.4 for a more detailed definition). Studies have shown the importance of context sensitivity in precisely analyzing Java pointer behavior [LPH02, LH06]. Also, context-sensitive points-to analysis is critical to the usefulness of recent tools like the static race detector of Naik et al. [NAW06] and the type-state verifier of Fink et al. [FYD+06].4 Unfortunately, existing context-sensitive points-to analyses do not scale well: although the aforementioned tools are effective on medium-sized programs, points-to analysis remains a scalability bottleneck for handling even larger programs.

Beyond context sensitivity, important modern clients of program analysis also require interactive performance, i.e., time and space overheads suitable for an interactive application. Just-in-time (JIT) compilers require interactive performance from analyses, since analysis time and space add directly to the requirements of the executing application. Interactive performance is also required for tools running in interactive development environments (IDEs), since they run as the developer edits code. Note that running a points-to analysis once up-front and re-using its result is not sufficient for interactive performance since the input program can change over time, due to dynamic class loading for a JIT or code editing in an IDE. Before our work, no points-to analysis could provide context-sensitive results for large programs in interactive time. As shown in Table 1.2, a state-of-the-art context-sensitive points-to analysis took 90 minutes and 2GB of memory to analyze a large Java program in our experiments, clearly not acceptable for IDE usage.

4 Regarding context-sensitivity in the points-to analysis, [FYD+06] states: “We ran many of the analyses with a context-insensitive Andersen-style pointer analysis [...] Many of the benchmarks timed out on several rules; we conclude that adequate precision in the preceding pointer analysis is vital.”

Language   Context Insensitive                        Context Sensitive
C          subset based [And94],                      partial summaries [WL95],
           equality based [Ste96],                    equality based [OJ97, LLA07],
           mix of eq and subset [Das00],              BDD based [ZC04]
           optimized subsets [FFSA98, SFA00, HT01b]
Java       adapted from C [RMR01, WL02, LH03],        equality based [O’C00],
           BDD based [BLQ+03]                         BDD based [WL04],
                                                      object sensitive [MRR05]

Table 1.1: A summary of selected work in points-to analysis, categorized by what language was analyzed (C or Java) and whether the analysis was context sensitive. A key attribute of each analysis is briefly described.

1.2.2 A Brief History of Points-to Analysis

Hind’s 2001 discussion of pointer analysis [Hin01] begins:

During the past twenty-one years, over seventy-five papers and nine Ph.D. theses have been published on pointer analysis. Given the tomes of work on this topic, one may wonder, “Haven’t we solved this problem yet?”


Hind then continued to show that the points-to analysis problem had not been solved, and today the topic remains an active research area. Here we provide a very brief overview of points-to analyses for both C and Java developed over the last 15 years, with the aim of emphasizing two key points:

1. Context-sensitive precision is more important for Java points-to analysis than for C; therefore, the large body of work on C points-to analysis does not immediately yield an effective Java analysis.

2. No previous analysis was able to provide scalable context-sensitive results for Java with the interactive performance needed for development tools.

Table 1.1 briefly summarizes the analyses discussed here. For a more detailed technical discussion of related points-to analyses, see §7.1.

Points-To Analysis for C. Scalable, precise points-to analysis for C was a long-standing open problem in program analysis. Steensgaard’s equality-based analysis [Ste96] ran in almost linear time, but lost precision due to its imprecise handling of the semantics of assignments. Andersen’s subset-based analysis [And94] handled assignments precisely, but required a worst-case cubic time algorithm and initially did not scale to large programs. Andersen’s analysis was first made scalable through optimizations devised by Aiken and collaborators [SFA00, FFSA98], and further scalability improvements were obtained by Heintze and Tardieu [HT01b]. Also, Das’s one-level flow algorithm [Das00] found a “sweet spot” in the precision/scalability spectrum for C points-to analysis, obtaining nearly the precision of Andersen’s analysis with a very scalable, worst-case quadratic algorithm.

While several context-sensitive points-to analyses for C have been developed, the precision gain from context sensitivity is typically not worth the added computational expense. In C, many functions with pointer parameters do not mutate pointer values, instead only mutating primitive values like integers reached through pointer


dereferences (this phenomenon often arises when passing values by reference). Since such functions do not change what pointers point to, a context-sensitive treatment of the functions would not yield much benefit for a points-to analysis. In the most comprehensive study of the topic [FFA00], Foster et al. show that context sensitivity adds significant precision to the results of an equality-based analysis, but has little effect on the precision of a subset-based analysis. Similarly, an evaluation by Das et al. [DLFR01] showed that context sensitivity adds little precision to the one-level flow algorithm [Das00].

Nevertheless, some of the work on context-sensitive points-to analysis for C was influential on later work for Java. O’Callahan and Jackson’s Lackwit system [OJ97] performs a context-sensitive equality-based analysis based on polymorphic type inference. Wilson and Lam [WL95] developed a summary-based flow- and context-sensitive analysis which worked well for C programs with up to 20,000 lines of code; however, further research has been unable to make this algorithm scale to larger programs. Recently, Lattner and Adve have successfully used a context-sensitive points-to analysis [LLA07] for improving performance through automatic pool allocation of heap-allocated data structures [LA05].

Points-To Analysis for Java. Early work on Java points-to analysis focused on adapting existing work like Andersen’s analysis to handle Java’s language constructs [RMR01, WL02, LH03], and today, these early analyses scale well to large Java programs. However, it was soon discovered that unlike C, context sensitivity could provide a large precision benefit for Java points-to analysis [O’C00, LPH02]. Java pointers are often used to manipulate data structures provided by the standard

library, e.g., java.util.Vector. Since the methods associated with these data structures (e.g., Vector.add()) are called many times on many different objects, context sensitivity is needed to reason separately about distinct instances of these data structures.
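The following hypothetical example (ours, not code from the dissertation) makes the issue concrete; o1 and o2 name the two element allocation sites:

    // Sketch (ours): a context-insensitive analysis merges the two calls to
    // Vector.add(), so the elements of v1 and v2 are conflated; a context-
    // (or object-) sensitive analysis keeps the two Vectors' contents apart.
    import java.util.Vector;

    public class ContextSensitivityDemo {
        public static void main(String[] args) {
            Vector v1 = new Vector();
            Vector v2 = new Vector();
            v1.add(new Object());        // element allocation site o1
            v2.add(new String("hi"));    // element allocation site o2
            Object a = v1.get(0);
            // Context-insensitive result: pt(a) = {o1, o2}
            // Context-sensitive result:   pt(a) = {o1}
            System.out.println(a);
        }
    }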


The key challenge of context-sensitive points-to analysis of large Java programs is handling the exponentially larger analysis result stemming from keeping the result for each call to a method separately (see §2.2.4 for more discussion). Recent analyses that are relatively scalable and precise rely on two key insights. The first is that the computed points-to relation often has a lot of redundancy (i.e., many variables have mostly identical points-to sets), and hence can be represented compactly by binary decision diagrams (BDDs), as shown by Berndl et al. [BLQ+03] and Zhu and Calman [ZC04]. Whaley and Lam used BDDs for context-sensitive points-to analysis [WL04], yielding large scalability gains over previous work. The second insight is that sufficiently precise results can often be obtained by only analyzing methods once per receiver object instead of once per call, termed an object-sensitive analysis by Milanova et al. [MRR05]. For example, an object-sensitive analysis would analyze the Vector.add() method only once per Vector abstract location. The benefits of object sensitivity as compared to standard context sensitivity were shown in the work of Lhoták and Hendren [LH06].

In spite of many impressive advances, no points-to analysis could provide scalable context sensitivity with interactive performance before our work. BDD-based analyses are only able to analyze large programs through various approximations to full context sensitivity (discussed in detail in §2.2), causing some precision loss. Even with this approximation, such analyses can require several minutes of time and more than a gigabyte of memory to run on large Java programs, and they have not successfully scaled to huge frameworks like Eclipse. Finally, time and memory requirements make existing context-sensitive analyses unsuitable when interactive performance is required.


1.2.3 Our Approach

The main contribution of our work is a novel points-to analysis technique that simultaneously (1) scales to large programs, (2) yields more precise results than most existing analyses, and (3) can provide interactive performance for demanding clients like JIT compilers and IDEs. Our analysis currently provides the most precise information about heap behavior in large Java programs. Obtaining significantly more precise information (e.g., with shape analysis [SRW02]) drastically reduces scalability; hence, our analysis sits in a sweet spot in the performance-precision space of static heap analyses for Java. We achieve this precision and scalability through a fundamentally different approach than other state-of-the-art Java points-to analyses: rather than finding clever ways to represent the exponentially large result of a context-sensitive analysis, we use refinement to only apply precise analysis when needed, greatly increasing scalability while maintaining sufficiently precise results.

Our refinement technique leverages the needs of the analysis client (a client-driven approach) to limit analysis effort. The client of a points-to analysis is the enclosing program analysis (a verifier, optimizer, etc.) that requires points-to information. Typically, a client uses points-to analysis to prove some other property P about the program, e.g., that some variable only points to objects of a specific type. Given a variable x and a property P, our technique first computes a result for x using less precise but cheaper techniques. If this result is sufficient for proving P, the loop terminates, as the analysis client will be satisfied with the result. If not, a new result for x is recomputed with certain parts of the program analyzed with greater precision. This loop continues until either (1) P can be proved, (2) no more refinement is possible, or (3) some time budget for the query is exceeded (a small code sketch of this loop appears at the end of this subsection).

Integrating client needs into our refinement technique yields scalability benefits over previous work. Unlike work by Plevyak and Chien [PC94] and Guyer and Lin [GL03], which performs all refinement that could possibly improve the result, our technique only refines for certain key parts of the program, using P to gauge when refinement is sufficient. This difference has a large impact on scalability, since we have observed that in practice a fully refined result cannot be tractably computed for many variables.

To further aid scalability, our analysis is demand driven, only computing results for variables relevant to the client. Clients typically interact with a points-to analysis by issuing queries for points-to information for specific variables. Most points-to analyses are exhaustive, doing most of the analysis work up front and then answering queries with a constant-time lookup operation. A demand-driven analysis takes the opposite approach: after minimal up-front parsing of the program, the analysis only computes required points-to information once a variable is queried. The demand-driven approach allows for time savings when only a small percentage of variables in the program are queried, the behavior of most analysis clients. Furthermore, demand-driven analysis is critical for effective refinement, as detailed analysis state can easily be maintained to help determine where to refine; maintaining such state efficiently in an exhaustive analysis would be difficult. Note that demand-driven analysis alone (e.g., the work of Heintze and Tardieu [HT01a]) does not yield a scalable and precise analysis for Java. While demand-driven analysis only does the work necessary to compute results for queried variables, its worst-case behavior is identical to that of an exhaustive analysis, and we have observed this behavior in practice. The key to scalable and precise analysis is to combine demand-driven analysis with the right approximation and refinement strategy.

Our initial work on refinement-based points-to analysis focused on devising an analysis suitable for JIT compilers [SGSB05]. We began by formulating Andersen’s analysis [And94] for Java, a flow-insensitive, context-insensitive points-to analysis, as a context-free-language reachability (CFL-reachability) problem [Rep98]. In contrast, existing work on Java points-to analysis had primarily used the set constraints formalism. Our CFL-reachability formulation showed that Andersen’s analysis for Java is a balanced-parentheses problem, an insight that enabled our new techniques. We exploited the balanced-parentheses structure to approximate Andersen’s analysis by regularizing the CFL-reachability problem, yielding an asymptotically cheaper algorithm. We also showed how refinement could regain most of the precision lost through regularization. Our evaluation showed that our regularization and refinement approach achieves nearly the precision of field-sensitive Andersen’s analysis in time budgets as small as 2 ms per query. Our technique yielded speedups of up to 16-fold over computing Andersen’s analysis exhaustively for some clients, with little to no precision loss.

We next developed a scalable and precise context-sensitive points-to analysis [SB06] based on a novel refinement strategy. The analysis has three types of context sensitivity: (1) filtering out of unrealizable paths, (2) a context-sensitive heap abstraction, and (3) a context-sensitive call graph (detailed definitions in Chapter 2). Previous work [LH06] has shown that all three properties are important for precisely analyzing large programs, e.g., to show safety of downcasts. Existing analyses typically give up one or more of the properties for scalability. The key insight behind the analysis is that to maintain all three types of context sensitivity, handling of method calls and heap accesses should be refined simultaneously. This technique allows the analysis to precisely analyze important code while entirely skipping irrelevant code. In our experimental evaluation, our analysis proved the safety of 61% more casts than one of the most precise existing analyses [LH06] across a suite of large benchmarks. The analysis checked the casts in under 13 minutes per benchmark (taking less than 1 second per cast) and required only 35MB of memory, compared to 1GB or more for previous approaches [Ste, LH06]. Table 1.2 summarizes the results of our two points-to analysis configurations.
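The following minimal Java sketch is our own illustration of the client-driven refinement loop described above, not the dissertation’s implementation; the Analysis and Result interfaces, their methods, and the budget parameter are all hypothetical placeholders:

    // Sketch (ours) of the client-driven refinement loop: compute a cheap
    // result first, then refine until the client's property P is proved,
    // refinement is exhausted, or the time budget runs out.
    final class RefinementLoop {
        interface Result {
            boolean provesProperty();          // does the result suffice to prove P?
        }
        interface Analysis {
            Result analyze(String variable);   // (re)compute a result for variable x
            boolean canRefine();               // is further refinement possible?
            void refine();                     // analyze key program parts more precisely
        }

        static Result query(Analysis analysis, String x, long budgetMillis) {
            long deadline = System.currentTimeMillis() + budgetMillis;
            Result result = analysis.analyze(x);     // cheapest approximation first
            while (!result.provesProperty()
                    && analysis.canRefine()
                    && System.currentTimeMillis() < deadline) {
                analysis.refine();                   // expose more precision where needed
                result = analysis.analyze(x);        // recompute with refined handling
            }
            return result;   // the client checks result.provesProperty()
        }
    }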


CI/CS | Baseline                 | Relative Prec.                  | Relative Time            | Relative Mem.
CI    | Andersen for Java [LH03] | matched for 90% of virtual calls | 0.46s (2ms/var) vs. 16s  | 50KB vs. 30MB
CS    | 1-ObjSens [LH06]         | 60% more casts proved safe      | 13min (1s/var) vs. 90min | 35MB vs. 2GB

Table 1.2: A summary of the relative precision and scalability of our points-to analysis, in both its context-insensitive (CI) and context-sensitive (CS) configurations. We give the state-of-the-art analysis used as a baseline, and the relative precision, time, and memory for a representative experiment.

In both cases, the precision of a state-of-the-art baseline analysis was matched or improved upon, memory usage was reduced by orders of magnitude, and interactive performance was provided for the first time.

1.3 Thin Slicing

Ideas from our work on refinement-based points-to analysis led us to develop thin slicing, a refinement-based technique for answering code relevance questions (i.e., questions of the form "What code affects the behavior of this statement?"). A key result of our points-to analysis work is that, through refinement, sufficiently precise results can be computed while completely ignoring much of the code that a traditional algorithm would analyze. With thin slicing, we applied this same refinement technique to make code more understandable to humans: our hope was that if the points-to analysis could produce good results while ignoring some code, a human could probably also ignore that code when trying to understand the program. This section first describes why better program understanding tools are needed (§1.3.1). Then, in §1.3.2, we explain how we use a new notion of relevance and refinement to improve significantly on the state of the art.


1.3.1 Motivation

As discussed at the beginning of the chapter, large-scale object-oriented programs can be very hard to understand and debug. Pervasive use of heap-allocated data and complex data structures in these programs causes much of this difficulty. Specifically, multiple levels of pointer indirection in common data structures can make manually tracing the flow of data through the heap prohibitively difficult. When using frameworks like Eclipse or J2EE, the details of such data structures are often unfamiliar to the programmer, making debugging even more difficult. In these situations, programmers could benefit from a tool that abstracts away details of heap behavior irrelevant to code inspection and debugging.

Program slicing is a well-known technique that identifies a program subset containing all code that is possibly relevant to a statement or value of interest, called a seed.5 Slicing applies to a variety of program understanding tasks, ranging from testing and debugging to reverse engineering [Tip95]. Weiser [Wei79] originally defined a slice as an executable program subset in which the seed statement performs the same computation as in the original program. Weiser's definition is elegant and intuitive, but imposes a rather broad definition of relevance: any statement t that could possibly affect the computation at the seed statement s must appear in the slice. This definition pollutes slices with many statements that indirectly affect a seed but are not pertinent to typical program understanding tasks.

Data structures are a key source of slice pollution. Slices often include internal implementation details of these data structures, which are almost always irrelevant to programmer tasks. Consider a value stored in a deeply nested data structure, e.g., a hash table which holds trees with lists at each tree node. A backwards slice for a statement reading from one such list must include the statements that construct

5The seed is often termed the slicing criterion in the literature [Tip95]; we use ‘seed’ for brevity.

and manipulate all levels of this complex data structure, since they affect the read. For many program understanding tasks, however, the programmer needs information about the values stored in the list, but does not care about other details of nested data structures containing the value. Furthermore, modern programs typically rely heavily on well-tested data structures provided by standard libraries, whose internal details rarely concern the end-user programmer. For these common cases, an ideal slicing tool would abstract away internal data structure details in results presented to the user.

1.3.2 Our Approach

Thin slicing is a program understanding technique that redefines relevance in a manner aimed at only including statements useful for developer tasks. For thin slicing, statements that compute or copy a value to the seed are relevant; we call such statements producers. Statements that explain why producers affect the seed are excluded from a thin slice. In practice, producer statements alone are sufficient for many debugging and program understanding tasks. We demonstrate the relevance notion used by thin slicing on the Java program fragment of Figure 1.3, which manipulates Strings stored in a container. Given a stream of full names as input, the example extracts the first names and stores them in a Vector (the readNames() function), and then later prints out the first names

(the printNames() function). The main() method illustrates a use of the code in a web application, storing and retrieving the names from a SessionState object. The example contains a bug: when the program receives the full name “John Doe” as input, line 24 erroneously prints “FIRST NAME: Joh”. Traditional slicing does not help in diagnosing this bug, as a slice seeded with line 24 includes all the code in the example. The entire example is included because


 1 class Vector {
 2   Object[] elems; int count;
 3   Vector() { elems = new Object[10]; }
 4   void add(Object p) {
 5     this.elems[count++] = p;
 6   }
 7   Object get(int ind) {
 8     return this.elems[ind];
 9   } ...
10 }
11 Vector readNames(InputStream input) {
12   Vector firstNames = new Vector();
13   while (!eof(input)) {
14     String fullName = readFullName(input);
15     int spaceInd = fullName.indexOf(' ');
16     String firstName = fullName.substring(0, spaceInd-1);
17     firstNames.add(firstName);
18   }
19   return firstNames;
20 }
21 void printNames(Vector firstNames) {
22   for (int i = 0; i < firstNames.size(); i++) {
23     String firstName = (String) firstNames.get(i);
24     print("FIRST NAME: " + firstName);
25   }
26 }
27 void main(String[] args) {
28   Vector firstNames = readNames(new InputStream(args[0]));
29   SessionState s = getState();
30   s.setNames(firstNames);
31   ...;
32   SessionState t = getState();
33   printNames(t.getNames());
34 }

Figure 1.3: Example showing the advantages of thin slicing. The six statements with underlined expressions are in the thin slice seeded with line 24, while the traditional slice for line 24 contains all displayed code. The bodies of functions with inessential behavior (e.g., print()) have been elided for clarity.


the slice must include all the code to construct and populate the Vector passed to printNames() and the code in main() retrieving the Vector from the SessionState object (all of which affects line 24). As in this example, slices for Java programs typically include most of the program. What lines of code would be shown by an ideal debugging tool for this example?

The bug lies at line 16, which incorrectly passes spaceInd-1 (rather than spaceInd) to String.substring(). Seeing how this erroneous String flows to where it is printed would almost immediately lead the user to the problematic line. In this case, the flow traverses a Vector: line 17 adds the String, and line 23 retrieves it. A thin slice only includes producer statements for the seed. We say statement s is a producer for statement t if s is part of a chain of assignments that computes and copies a value to t. In terms of our refinement technique (see §1.1), only including producers in a thin slice abstracts away the question of why base pointers are aliased for all field accesses. In Figure 1.3, the producer statements for the seed, highlighted with underlining, are a superset of the statements most relevant to the bug in question.

We are interested in the pointer value in firstName at line 24, and the thin slice allows us to easily trace its flow (relevant expressions are underlined):

• Line 23 copies the value returned by Vector.get().

• Vector.get() obtains the value from an array read (line 8).

• The value is copied into the array in Vector.add() (line 5).

• Vector.add() gets the value from the actual parameter at line 17.

• Line 17 passes the value returned at line 16, the buggy statement.

Unlike a traditional slice, the thin slice does not provide an executable program; for example, statements initializing the Vector containing the erroneous String are excluded. However, the thin slice more directly leads the user to the bug.


Advantages of Thin Slicing One reason that thin slicing works well is that it ignores value flow into base pointers of heap accesses, focusing just on the value read from or written to the heap. For example, line 8 reads this.elems[ind]. A thin slicer ignores the values of the two dereferenced base pointers (this and this.elems), focusing solely on statements that can write into the array (i.e., the statement this.elems[count++] = p; at line 5). In contrast, a traditional slicer includes statements that influence both the base pointers this and this.elems, contributing to the blowup in slice size. For many program understanding tasks, base pointer manipulation matters less than actual copying of the value through the heap.

In addition to being better focused on statements of interest for programmer tasks, thin slices have an intuitive semantic definition that makes them understandable in isolation. Simply stated, a thin slice contains all statements flowing values to the seed, ignoring uses of base pointers. These well-defined semantics allow a user to reason about thin slices in a self-contained manner, since she knows that all producer statements are included in a thin slice. If slices were shrunk using some ad-hoc method, such as setting a constant limit on slice size, the user could not easily characterize what is in the presented slice subset and what is missing.

In cases where a thin slice alone is insufficient for some programmer task, it can be refined through expansion with additional thin slices to ease the understanding of other relevant statements. One case in which statements outside the thin slice may be needed is to explain the cause of base pointer aliasing for heap accesses, as previously discussed in §1.1. Given a field read x := y.f and a field write w.f := z in a thin slice, a user may ask how aliasing between y and w arises, causing heap-based flow from z to x. This question can be answered via two more thin slices, respectively showing how some object o flows to both y and w. Note that this observation is simply an instantiation of our general refinement technique (see §1.1): the requested base pointer aliasing is explained, while the behavior of other field accesses remains

abstracted. This refinement of the thin slice can be repeated recursively for further aliasing questions, exploiting the hierarchical nature of Java heap accesses as was done with our pointer analysis. A similar refinement technique applies to explaining why statements in the thin slice can execute. Explaining why a statement can execute requires showing its control dependences. Since traditional slices include all possibly relevant statements, they must include all transitive control dependences. Unfortunately, Java's semantics make many statements a type of conditional branch, often yielding a huge number of control dependences. For example, if a statement might throw an exception, many statements will be control dependent on its successful execution. Similarly, each virtual call x.m() is a conditional expression because it branches on the runtime type of x. In practice, we found that when control dependences are relevant, they usually appear lexically near a thin slice statement, and hence can usually be identified from browsing the surrounding source code. Thin slices seeded with these conditionals can be computed to further understand their behavior. In the limit, hierarchically expanding a thin slice to show control and aliasing explanations yields a traditional slice; hence, any possibly relevant statement can eventually be discovered.

Of course, thin slicing does not provide a panacea: in certain cases, thin slices with expansion grow too large to effectively identify statements of interest. In our experiments, thin slicing was useful for 13 of 19 debugging tasks and 22 of 29 program understanding tasks, an encouraging result. Furthermore, of those tasks where thin slicing was useful, the thin slice alone was sufficient for half of them, and for the others typically one or two additional statements were needed. Hence, we believe thin slicing holds great promise as a basis for a practical program understanding tool.

Our work focuses on static thin slicing for Java, but the technique is more broadly applicable. Thin slicing itself relies on standard data dependence concepts [HPR89] and hence should apply to many programming languages. Our hierarchical expansion


technique relies on properties of Java pointer accesses, however, and may not work as effectively for languages like C (see §6.3). Also note that dynamic thin slices can be defined in a straightforward manner using dynamic data dependences. Our initial evaluation of thin slicing yielded highly promising results. Our experiments compared thin slicing and traditional slicing for several debugging and program understanding tasks, using a methodology that simulates realistic use of a slicing tool (details in §6.5). Our results showed that thin slicing had the following properties:

• Thin slices usually included the desired statements for the tasks (e.g., the buggy statement for a debugging task).

• Thin slices revealed desired statements after inspecting 3.3 times fewer statements than traditional slicing for debugging tasks and 9.4 times fewer statements for program understanding tasks.

• Our thin slicing algorithm (a simple modification of well-known techniques [HRB88, RHSR94]) scales to relatively large Java benchmarks.

1.4 Dissertation Overview

The bulk of this dissertation is devoted to describing our work on refinement-based points-to analysis. Chapter 2 gives definitions and explanations of various terminology relevant to points-to analysis and context-free-language reachability (CFL-reachability). Then, Chapter 3 presents formulations of several points-to analysis variants in CFL-reachability, highlighting the structure exposed by the formalism. Chapter 4 discusses our refinement-based context-insensitive points-to analysis, and finally, Chapter 5 presents our refinement-based context-sensitive points-to analysis. Both techniques provide high precision and dramatic scalability improvements, as summarized in Table 1.2.


Chapter 6 describes thin slicing in detail, including its new notion of relevance, algorithms for computing thin slices, and results from our experimental evaluation. Related work for both our refinement-based points-to analysis and thin slicing is discussed in Chapter 7. Finally, Chapter 8 has concluding thoughts and possible directions for future work.

Chapter 2

Points-To Analysis Background

This chapter provides a brief introduction to points-to analysis terminology. Our aim is to provide sufficient background to make the presentation of our points-to analysis understandable, but not to give a complete, formal treatment of all discussed terminology; we provide references to formalizations in the literature where possible. We begin in §2.1 by defining what program properties are computed by a points-to analysis. Then, in §2.2 we discuss terminology related to points-to analysis precision.

2.1 Points-To Analysis Definition

A points-to analysis computes an over-approximation of the possible objects that each pointer variable may point to during any execution of a program. Runtime objects are represented with a finite set of abstract locations. Abstract locations are necessary because non-terminating programs may allocate an unbounded number of objects (impossible to represent directly), and distinguishing such programs from those that do terminate is undecidable. The granularity with which a points-to analysis models dynamic objects using abstract locations is termed its heap abstraction. We discuss heap abstractions in greater detail in §2.2.5.


Traditionally, points-to analysis results are represented with points-to sets; we write pt(x) for the points-to set of variable x. If o ∉ pt(x) for abstract location o and variable x from program P, then x can never point to an object represented by o in any execution of P. If o ∈ pt(x), then x may point to some object represented by o in some execution of P; since points-to sets are over-approximations, it is possible that x never points to such an object. We illustrate the input and output of a points-to analysis with a simple example.

Consider a program containing a statement "x = new Obj();" and no other statements defining the variable x. The heap abstraction used by Andersen's points-to analysis [And94] and many others (details in §2.2.5) represents the objects allocated by all executions of this statement with a single abstract location o. This heap abstraction leads to the analysis result pt(x) = {o}.

2.2 Points-To Analysis Precision

Points-to analyses use various approximations of the language semantics to balance precision of the analysis results with performance of the analysis (both in time and space). In this section, we will describe the types of precision that can be used to model various language features, introducing relevant terminology along the way. We will also indicate how the points-to analyses in our work handle these language features. In §2.2.1 through §2.2.5, discussions of precision assume that the points-to analysis is flow insensitive. In brief, a flow-insensitive analysis treats each intraprocedural control-flow graph as if it has all possible edges, preventing reasoning about the order in which statements execute or how many times they execute. For scalability reasons, most points-to analyses in the literature, including ours, are flow insensitive. We discuss flow sensitivity in detail in §2.2.6.


1 y = new Obj();
2 z = new Obj();
3 x = y;
4 x = z;

Figure 2.1: A simple example to illustrate different treatments of assignments.

Before we begin, a quick note on heap abstractions: unless stated otherwise, in the rest of this dissertation we assume a heap abstraction with one abstract location per allocation site (a new expression for Java) when giving examples of points-to sets. (We discuss other possible heap abstractions in §2.2.5.) Furthermore, by convention we name an abstract location o_k when the corresponding allocation site occurs on line k of an example program (so o1 names the objects allocated at line 1).

2.2.1 Assignments

A points-to analysis can model assignment statements in either a precise subset-based manner or in an approximate equality-based manner. A subset-based analysis handles an assignment x = y by ensuring that pt(y) ⊆ pt(x) in the analysis result. This constraint precisely models the assignment, since it updates x to point to whatever value y points to. In contrast, an equality-based analysis ensures pt(y) = pt(x) given the same assignment, an approximation since the statement does not cause a flow of values from x to y. (In essence, an equality-based analysis effectively adds the inverse assignment y = x to the analyzed program.) While algorithms exist for equality-based analysis that are nearly linear in the size of the program [Ste96], subset-based analysis requires worst-case cubic time. Some analyses use a mix of equality- and subset-based techniques to find a sweet spot in scalability and precision, e.g., Das's one-level flow analysis [Das00]. To see how modeling of assignments affects the analysis result, consider the simple


example in Figure 2.1. A subset-based analysis models line 3 with the constraint pt(y) ⊆ pt(x) and line 4 with the constraint pt(z) ⊆ pt(x). Solving these constraints yields the precise result pt(y) = {o1}, pt(z) = {o2}, pt(x) = {o1, o2}. On the other hand, an equality-based analysis models line 3 with the constraint pt(x) = pt(y) and line 4 with pt(x) = pt(z). Solving these constraints yields identical points-to sets for all the variables, i.e., pt(x) = pt(y) = pt(z) = {o1, o2}. Our refinement-based points-to analysis always treats assignments in a subset-based manner.
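To make the subset-based treatment concrete, the following minimal sketch (our own illustration with hypothetical names, not the dissertation's implementation) propagates points-to sets over allocation and copy statements with a simple worklist fixpoint:

import java.util.*;

// A minimal sketch of subset-based propagation for allocations and copies only.
class SubsetSolver {
    Map<String, Set<String>> pt = new HashMap<>();       // variable -> abstract locations
    Map<String, List<String>> copies = new HashMap<>();  // y -> all x with "x = y"

    void allocation(String x, String o) { pt.computeIfAbsent(x, k -> new HashSet<>()).add(o); }
    void copy(String x, String y)       { copies.computeIfAbsent(y, k -> new ArrayList<>()).add(x); }

    // Propagate until pt(y) ⊆ pt(x) holds for every copy x = y.
    void solve() {
        Deque<String> worklist = new ArrayDeque<>(pt.keySet());
        while (!worklist.isEmpty()) {
            String y = worklist.poll();
            for (String x : copies.getOrDefault(y, List.of())) {
                Set<String> ptX = pt.computeIfAbsent(x, k -> new HashSet<>());
                if (ptX.addAll(pt.getOrDefault(y, Set.of())))
                    worklist.add(x);  // pt(x) grew; revisit copies out of x
            }
        }
    }
}

On Figure 2.1, calling allocation("y", "o1"), allocation("z", "o2"), copy("x", "y"), copy("x", "z"), and then solve() leaves pt(y) = {o1}, pt(z) = {o2}, and pt(x) = {o1, o2}, matching the subset-based result above.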

2.2.2 Field Accesses

Points-to analyses use various techniques to model the semantics of reads and writes to fields of structures or objects. Handling of fields in a points-to analysis boils down to answering the following question: Given some field write statement w.f = z and some field read statement x = y.f, can executing the statements cause values to flow from z to x? A precise answer to this question requires checking the following two conditions:1

1. Base pointer equality In order to cause heap-based flow, the field read and write statements must be accessing the same object / structure. For the above statements, this condition corresponds to checking that the base pointers w and y may be aliased, i.e., pt(w) ∩ pt(y) ≠ ∅.

2. Offset equality The field accesses must also be reading and writing the same offset in the object in order to cause a flow. Strongly-typed languages like Java guarantee that distinct object fields cannot name the same memory location, in which case this condition reduces to ensuring the statements access the same object field. For languages like C that allow unchecked casting and pointer

1Again, this discussion assumes that the points-to analysis is flow insensitive (details in §2.2.6).


 1 class A {
 2   A foo() {
 3     return new A();
 4   }
 5 }
 6 class B extends A {
 7   A foo() {
 8     return new B();
 9   }
10 }
11 main() {
12   A x = new A();
13   A y = new B();
14   A z = y.foo();
15 }

Figure 2.2: A simple code example to illustrate virtual calls.

arithmetic, more complex checking may be necessary (see [YHR99] for a detailed discussion).

There are three well-known treatments of fields in the points-to analysis literature. A field-sensitive analysis checks both of the above conditions. A field-insensitive analysis (also known as field independent [HT01b]) only checks the first condition, while a field-based analysis only checks the second condition. Chapter 4 presents a configuration of our analysis in which refinement adds targeted field sensitivity to an initial field-based analysis pass.
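To illustrate how the three treatments differ, consider the following small example; the code and names are our own illustration, not from the dissertation:

// Hypothetical example contrasting the three field treatments.
A x = new A();  // o1
A z = new A();  // o2
x.f = p;        // write (1): base x, field f
x.g = q;        // write (2): base x, field g
z.f = s;        // write (3): base z, field f
r = x.f;        // read: base x, field f

A field-sensitive analysis reports only write (1) as flowing to r (same base object and same field). A field-insensitive analysis, which checks only base pointer aliasing, additionally reports write (2), since both writes go through x. A field-based analysis, which checks only field names, instead additionally reports write (3), since it also writes field f, even though x and z are never aliased.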

2.2.3 Call Graph

For languages with any form of indirect function calls (e.g., calls through function pointers in C, virtual calls in Java, higher-order functions in Scheme, etc.), performing a points-to analysis and computing a call graph for a program are mutually dependent: points-to information is needed to determine possible targets of indirect function


calls, and the possible targets of those calls must be known to compute points-to information for variables involved in the call (e.g., return values). Consider the code example in Figure 2.2. In order to determine pt(z), a points-to analysis must consult a call graph to determine the possible targets of the virtual call y.foo() at line 14, as z is assigned the return value of the call. Computing the possible targets of this virtual call requires determining the possible runtime types of its receiver argument y. But finding y's possible runtime types requires computing what it can point to, illustrating the circular dependence between call graph construction and points-to analysis.

Call graphs for points-to analysis are either constructed ahead of time or on the fly. With ahead-of-time call graph construction, first some less expensive analysis, e.g., rapid type analysis (RTA) [BS96], is used to construct a conservative call graph. This pre-computed call graph is used to find virtual call targets during points-to analysis. With on-the-fly call graph construction, the points-to analysis uses its current points-to information to determine call graph information, iterating between points-to analysis and call graph construction until a fixed point is reached. On-the-fly call graph construction is the most precise technique for points-to analysis, since all virtual call targets are determined using the results of the points-to analysis itself.

Building a call graph ahead of time may yield less precise points-to analysis results than building it on the fly. Consider again the example of Figure 2.2. The y.foo() call at line 14 invokes B.foo(), since y points to a B object. However, a call graph constructed ahead of time with rapid type analysis will conclude imprecisely that the call could invoke either A.foo() or B.foo().2 This imprecise call graph will then lead the points-to analysis to conclude that z can point to either o3 or o8. In contrast, a points-to analysis with on-the-fly call graph construction would find only the B.foo()

2The creation of an A object at line 12 of Figure 2.2 causes this imprecise result, as RTA inspects which classes have been instantiated when determining call targets.


target for the virtual call, thereby concluding that z can only point to o8.

Though it may decrease precision, in some cases ahead-of-time call graph construction performs better than on-the-fly call graph building. Consider the case where ahead-of-time and on-the-fly techniques yield similarly sized call graphs for a particular program. In this case, ahead-of-time call graph construction often yields a faster running time, since on-the-fly call graph construction increases the number of iterations required for the points-to analysis to reach a fixed point. However, an on-the-fly call graph may include many fewer methods and call graph edges than an ahead-of-time call graph, since it is constructed more precisely. In this case, the time required to analyze the additional methods in the ahead-of-time call graph may dwarf the cost of the extra iterations required for on-the-fly call graph construction. Our points-to analyses can refine an ahead-of-time call graph on demand. This refinement technique yields the precision benefits of on-the-fly call graph construction at lesser cost, since many unimportant call sites will not have their targets refined. Our call graph handling is discussed more in §3.4 and §5.3.2.2.
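To make the on-the-fly iteration structure concrete, here is a minimal sketch of the fixed-point loop; this is our own illustration with assumed, simplified interfaces (PointsToEngine, CallGraph, dispatch are hypothetical), not the dissertation's implementation:

import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

// Assumed, simplified interfaces for illustration only.
interface PointsToEngine {
    boolean propagate();                        // one round over the current call graph; true if any pt-set grew
    Set<String> receiverTypes(String callSite); // runtime types currently in pt(receiver) at a virtual call
}
interface CallGraph {
    boolean addEdge(String callSite, String targetMethod); // true if the edge is new
}

class OnTheFly {
    // Iterate points-to propagation and call graph growth to a mutual fixed point.
    static void solve(List<String> virtualCallSites, PointsToEngine pt, CallGraph cg,
                      BiFunction<String, String, String> dispatch) { // (type, site) -> target method
        boolean changed = true;
        while (changed) {
            changed = pt.propagate(); // new points-to facts may reveal new receiver types
            for (String site : virtualCallSites)
                for (String type : pt.receiverTypes(site))
                    changed |= cg.addEdge(site, dispatch.apply(type, site));
        }
    }
}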

2.2.4 Method Calls

Points-to analyses vary in how precisely they model the semantics of method calls and returns. A context-insensitive analysis does not precisely model the target address of return statements, instead treating calls as goto instructions. A return from a call to some method f is conservatively treated as if it could branch to the point after any call to f, not just the actual caller. Hence, context-insensitive analyses include results from impossible control-flow paths with mismatched calls and returns, termed unrealizable paths [RHS95]. A context-sensitive analysis does not have this imprecision; method call semantics are precisely modeled, and only realizable paths


affect analysis results.3 Let us define context-sensitive points-to analysis more formally. For the moment, we assume that the program P to be analyzed is recursion-free; we will discuss relaxing this assumption shortly. We also temporarily assume the existence of a pre-computed call graph CG for P, for simplicity. Let P′ be the program constructed by inlining all method calls in P, according to CG. The context-sensitive points-to analysis problem is defined as computing the result of a context-insensitive points-to analysis on the (call-free) program P′. Note that the actual algorithm need not work over P′; it must only compute the same points-to relation. With this definition, a context-sensitive points-to analysis computes a separate points-to set for each inlined copy of a variable. We denote a (local) variable in P′ as ⟨v, c⟩, where v is the corresponding variable in P and c is the complete sequence of call sites (or call string) that was inlined to create the copied variable, from the root of the call graph CG for P to the method declaring v. The (copied) allocation sites of P′ have analogously named abstract locations. Thus, pt′ maps "context-refined" variables to "context-refined" abstract locations: ⟨o, c′⟩ ∈ pt′(⟨x, c⟩). For reasoning about variables in the original program P, the context-sensitive result is projected back in the natural manner:

pt(x) ≡ { o | ∃c, c′ . ⟨o, c′⟩ ∈ pt′(⟨x, c⟩) }

We illustrate the potential precision benefits of context-sensitive analysis with the example of Figure 2.3. A context-insensitive points-to analysis would treat the two calls to id in the example as if they could return their result to either call site, yielding the imprecise result pt(c) = pt(d) = {o2, o3}. In contrast, a context-sensitive analysis

3Note that in type-inference-based formulations of points-to analysis, context-insensitive and context-sensitive analyses are respectively referred to as monomorphic and polymorphic analy- ses [OJ97, FFA00, O’C00, RF01, KA07], standard type inference terminology.


1 Obj id(Obj p) { return p; }
2 a = new Obj();
3 b = new Obj();
4 c = id(a);
5 d = id(b);

Figure 2.3: An example illustrating the precision benefit of context-sensitive points-to analysis.

keeps the effects of the two calls separate, yielding the projected result pt(c) = {o2} and pt(d) = {o3}.

A second precision benefit of context-sensitive analysis stems from direct use of the context-sensitive points-to relation. For the example of Figure 2.3, pt(p) = {o2, o3}, since both o2 and o3 are passed as a parameter to id. However, the context-sensitive points-to relation pt′ maintains two different points-to sets for p, one for each call: pt′(⟨p, [4]⟩) = {o2} and pt′(⟨p, [5]⟩) = {o3}. (Here and in the rest of the dissertation, we name call sites in example programs by their line number.) Certain tools can benefit from directly querying the context-sensitive points-to relation, for example static race detection [NAW06].

Context-sensitive analysis combined with on-the-fly call graph construction yields a context-sensitive call graph.4 Such an analysis can be defined as performing inlining on-the-fly: starting from the entry methods of a recursion-free program P (e.g.,

main()), a context-insensitive points-to analysis is performed, but as call targets are discovered on-the-fly, the callees are inlined and the analysis continues.

A context-sensitive call graph yields additional precision by computing virtual call targets separately for each calling context. For example, consider the call to Object.equals() inside the Vector.contains() method in the Java standard library,

4While it is theoretically possible for an ahead-of-time call graph to be context sensitive, this configuration makes little practical sense due to the high cost of computing a context-sensitive call graph. Hence, we assume that any context-sensitive call graph is computed on-the-fly.


used to check if the Vector contains some object. Say the program has a call "v.contains(a)", where v only contains A objects. A context-sensitive call graph has a distinct call site for the equals() call within this contains() call, and hence can prove that it invokes A.equals(). In contrast, a context-insensitive call graph has only one copy of contains(), and therefore one copy of its nested equals() call site. With this representation, the call graph will indicate that the nested equals() call could invoke the equals() method of objects stored in any Vector.

Handling recursive calls in a context-sensitive points-to analysis requires approximation, due to interactions with the handling of fields. Reps proved that context-sensitive, field-sensitive analysis is undecidable [Rep00]. Therefore, either fields or method calls must be modeled approximately by a points-to analysis to ensure termination. Our refinement-based points-to analysis approximates handling of recursive method calls for decidability, as discussed in §5.2.

Even with state-of-the-art algorithms, full context sensitivity (approximated for decidability) is still intractable for large programs. The exhaustive inlining approach to context sensitivity suggested by our analysis definition can cause a worst-case exponential blowup in the size of the program, as there are a worst-case exponential number of paths in a program's call graph. In practice, exhaustive inlining has been shown to create up to 10²³ copies of a method for large Java programs [WL04]. To date, no technique for context-sensitive points-to analysis has been devised that has better worst-case complexity than the exhaustive inlining technique. Hence, fully context-sensitive points-to analysis for large programs remains intractable.

To overcome the intractability of context-sensitive points-to analysis, numerous approximations have been devised. One of the earliest approximation techniques was k-limiting [Shi88], which limits the modeling of the call stack to depth k. Typically, points-to analyses have only scaled to large programs with k ≤ 3. Cartesian product analysis [Age95] and its variants analyze each method once for each set of


concrete argument types, rather than once per call. A similar recent approximation is object sensitivity [MRR05], which analyzes methods separately for each (abstract) receiver object rather than for each call site. Recent work [LH06, NAW06] has shown that a k-limited object-sensitive analysis is empirically more scalable and precise than the comparable k-limited context-sensitive analysis. Finally, another approximation of full context sensitivity is to use a context-insensitive heap abstraction (e.g., in [WL04]), discussed further in the next section.
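To make the distinction between call-site and object-sensitive contexts concrete, consider the following sketch; the code and names are our own illustration, not from the dissertation:

// Hypothetical sketch contrasting 1-call-site and 1-object-sensitive contexts.
Vector v1 = new Vector();   // receiver object o1
Vector v2 = new Vector();   // receiver object o2
addDefault(v1);             // call site c1
addDefault(v2);             // call site c2
void addDefault(Vector v) {
    v.add(makeElem());      // the only call site for add(), c3
}
// A 1-call-site-sensitive analysis gives add() a single context, [c3], and
// so merges the contents of v1 and v2. A 1-object-sensitive analysis instead
// analyzes add() once per abstract receiver (o1 and o2), keeping the two
// Vectors' contents separate.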

2.2.5 Heap Abstraction

As discussed in §2.1, a key parameter of a points-to analysis is its heap abstraction, i.e., its finite model of a program's possible runtime objects. Here, we first describe some typical heap abstractions used with context-insensitive analyses. Then, we discuss the interaction between context sensitivity and the heap abstraction, showing why a context-sensitive heap abstraction can yield greatly improved precision.

Context-insensitive heap abstractions can vary widely in coarseness, depending on the desired trade-off between scalability and precision. Analyses using equality-based techniques [Ste96, Das00] unify abstract locations deemed equivalent during analysis, sometimes resulting in a single abstract location representing a large portion of the heap. While this representation may be coarse, such analyses scale very well to large programs. Other analyses utilize a more precise type-based heap abstraction, where all objects of a particular type are represented with a single abstract location. This abstraction is useful for tasks like performing type inference for object-oriented programs, but can be too imprecise for more demanding tasks like disambiguating aliases [DMM98]. Andersen's analysis [And94] and its variants use an even more precise heap abstraction, representing the objects created by each allocating statement (malloc in C, new in Java) with a separate abstract location. This heap abstraction is


1 Obj[] makeArr() { return new Obj[10]; }
2 Obj[] a = makeArr();
3 Obj[] b = makeArr();
4 a[0] = new Obj();
5 b[0] = new Obj();
6 Obj x = a[0];
7 Obj y = b[0];

Figure 2.4: A simple example to illustrate the benefits of a context-sensitive heap abstraction.

the most difficult to scale, though modern BDD-based techniques can represent the resulting points-to sets quite compactly [BLQ+03, ZC04].

Some context-sensitive points-to analyses also use a context-sensitive heap abstraction, which especially improves treatment of heap-based data structures. As defined in §2.2.4, a context-sensitive analysis has a separate abstract location for each inlined copy of an allocation site. In practice, the key benefit of this heap abstraction is the ability to distinguish the contents of different data structure instances [LH06, LLA07]. Internal objects of such data structures are often allocated in standard methods (e.g., the constructor in Java), and a context-insensitive heap abstraction uses a single abstract location to represent all objects allocated in such methods, causing merging of the contents of all data structure instances in the analysis results. In contrast, a context-sensitive heap abstraction can represent the internal objects of each data structure separately, allowing for separate reasoning about the contents of each data structure instance [LLA07].

Figure 2.4 provides an example illustrating how a context-sensitive heap abstraction can better distinguish the contents of arrays, a simple heap-based data structure. The heap abstraction of Andersen's analysis [And94]—using one abstract location per new expression—models the arrays created in all calls to makeArr() with a single abstract location o1. This coarse model would lead a points-to analysis to conclude


that a and b point to the same array, and therefore that line 4 through line 7 all access the same array. The points-to analysis would then conclude, for example, that pt(x) = {o4, o5}, an imprecise result. A context-sensitive heap abstraction uses distinct abstract locations for the arrays allocated by the two calls to makeArr(), removing this imprecision. Standard data structures like Java's Vector benefit from a context-sensitive heap in a similar way, as the internal array implementing each

Vector is allocated inside its constructor method (shown in greater detail in §5.1.3).

Our refinement-based points-to analysis is the first to provide a context-sensitive heap abstraction sufficient for distinguishing data structure contents while scaling to large programs. The BDD-based points-to analysis of Whaley and Lam [WL04] uses a context-insensitive heap abstraction, which aids scalability but dramatically decreases precision for demanding clients [LH06]. Equality-based analyses with a fully context-sensitive heap abstraction have been developed, but in our experience do not scale to large Java programs (discussed in more detail in §5.4). Subset-based analyses have scaled with a k-limited context-sensitive heap abstraction [LH06, NAW06], but k-limiting decreases precision for some clients. Our analysis is able to both provide a context-sensitive heap and scale because it is able to refine the heap abstraction during analysis, thereby only utilizing the context-sensitive heap where needed; see §5.2.1 for further discussion.

Shape analyses (e.g., [SRW02]) can provide precision beyond that available with a context-sensitive heap abstraction. A context-sensitive heap abstraction is often insufficiently precise to prove interesting properties about recursive data structures (e.g., that a linked list is acyclic). The objects comprising such data structures are often allocated in a loop, and hence even a context-sensitive heap abstraction will represent the objects with a single abstract location. In contrast, shape analyses are often able to reason separately about objects allocated and manipulated in distinct loop iterations (at great cost in scalability). The more precise heap abstraction


1 x = new Obj();
2 x.f = new Obj();
3 y = x.f;
4 x.f = new Obj();
5 z = x.f;
6 y = z;

Figure 2.5: A simple example to illustrate the benefits of flow-sensitive points-to analysis.

provided by shape analysis often only makes sense if the analysis is sensitive to control flow, as discussed in the next section.
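Returning to Figure 2.4, the context-sensitive heap abstraction can be written out explicitly; the ⟨site, call string⟩ names below follow the convention defined in §2.2.4 and are our own rendering of the example:

pt(a) = { ⟨o1, [2]⟩ }    pt(b) = { ⟨o1, [3]⟩ }
line 4 writes only into the array ⟨o1, [2]⟩; line 5 writes only into ⟨o1, [3]⟩
projected results: pt(x) = {o4}, pt(y) = {o5}

With the two makeArr() arrays split by call string, the writes at lines 4 and 5 no longer merge, removing the imprecision discussed above.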

2.2.6 Control Flow

More precise modeling of a program's control flow can lead to various precision benefits for points-to analysis. Most points-to analyses are flow insensitive, i.e., they assume that statements within a procedure may execute (1) in any order and (2) any number of times. A flow-sensitive analysis aims to reduce imprecision by eliminating consideration of impossible statement execution orderings (much like context-sensitive analysis eliminates consideration of unrealizable paths).5 Our points-to analyses are not flow sensitive, to remain scalable. Here, for completeness, we discuss some of the potential benefits of flow sensitivity.

Strong Updates Flow-sensitive analyses can benefit from performing strong updates of memory locations. An analysis performs a strong update when, in reasoning about a write to some memory location l, it models the fact that the old value in l is overwritten. In the example of Figure 2.5, consider computing the points-to set of z. The value in z is copied from x.f, which is assigned different values at line 2 and line 4. A flow-insensitive analysis assumes that those assignments can execute

5In this work, flow sensitivity has its typical meaning of precise modeling of control flow. In other work (for example [O’C00]), flow sensitivity is used to refer to data-flow sensitivity, i.e., subset-based modeling of assignments (see §2.2.1).


in any order, and hence concludes that pt(z) = {o2, o4}. In contrast, a flow-sensitive analysis knows that line 4 must execute after line 2. Hence, the analysis can use a strong update to treat the statement at line 4 as overwriting the o2 value in x.f with o4, yielding the more precise result pt(z) = {o4}.

The key challenge of strong updates is in identifying the single heap location updated by a statement. Strong updates on local variables can be performed transparently by first converting the input program into static single assignment (SSA) form [CFR+91].6 Doing a strong update for a write to a heap location is more difficult because the analysis must identify a single runtime object being updated by the write. A strong update cannot be performed if a statement updates an abstract location that can represent multiple runtime objects. For Figure 2.5, say that line 1 were replaced by the following:

while (...) {
  ...; x = new Obj();
}

Most points-to analyses would model the allocation in the loop with a single abstract location o_l, representing the objects allocated in all loop iterations. However, at runtime, line 2 and line 4 only update the f field of one of those dynamic objects. So, performing a strong update on o_l.f (representing the objects pointed to by the f field of all the loop-allocated objects) would be unsound for those statements. Doing strong updates for cases like this one typically requires a very precise heap model, e.g., that used in shape analysis [SRW02], though in certain cases less expensive analysis can be used [DADY04, FYD+06].

Separate Result per Program Point Flow-sensitive analysis may also enable greater precision by computing a distinct result for each program point. In

6In fact, SSA can be used to make otherwise flow-insensitive points-to analyses like ours partially flow sensitive.


1 if (...)
2   a = y;
3 else
4   b = y;

Figure 2.6: A simple example to illustrate the benefits of path-sensitive points-to analysis.

Figure 2.5, y is assigned at both line 3 and line 6. A flow-insensitive analysis can only provide one points-to set for y that includes the effects of both of these assignments. However, a flow-sensitive analysis can give more precise answers when asked about a points-to set at a specific program point. In this case, the flow-sensitive analysis would compute a separate points-to set for y for each assignment, {o2} after line 3 and {o4} after line 6. Note that conversion to SSA form provides much of this benefit, since in SSA form each local variable is assigned exactly once.

Path Sensitivity Further precision gains may be possible through path sensitivity, i.e., reasoning separately about different control-flow paths rather than merging their effects. In Figure 2.6, a path-insensitive analysis would imprecisely conclude that a and b may be aliased at the end of the example, as it would merge the effects of line 2 and line 4, essentially treating the statements as if they could both execute in one run of the program. A path-sensitive analysis would keep the effects of the 'then' and 'else' branches of the conditional separate, and hence could prove that a and b cannot be aliased. A path-sensitive analysis may also use conditional expressions to prove that certain sequences of branch outcomes are infeasible. Recent work has made impressive progress on building a scalable points-to analysis with intraprocedural path sensitivity [HA06], but scalable interprocedural path sensitivity for points-to analysis is still an open problem.
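To make the SSA observation above concrete, here is a hypothetical SSA renaming of y from Figure 2.5 (the names y1 and y2 are our own):

// Hypothetical SSA form of the y-assignments in Figure 2.5. Each SSA name is
// assigned exactly once, so an analysis can report a distinct points-to set
// per name rather than one merged set for y.
Obj y1 = x.f;  // line 3's definition of y
Obj y2 = z;    // line 6's definition of y
// Without SSA, a flow-insensitive analysis reports one set for y covering
// both definitions; with SSA, pt(y1) and pt(y2) are kept separate (their
// precision for heap reads still depends on strong updates, as above).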

Chapter 3

Points-To Analysis Formulations

Chapter 2 introduced the terminology of points-to analysis. Here, we build on that background to formally specify Java points-to analysis using the context-free language reachability (CFL-reachability) framework. Our formulations will show how balanced parentheses can be used to model a variety of Java language features. Apart from the standard use of balanced parentheses in modeling method calls and returns [RHSR94, RHS95], we also use them to handle heap accesses and virtual calls precisely. In Chapter 4 and Chapter 5, we give refinement-based points-to analysis algorithms that exploit this balanced parentheses structure. We define our analysis problem as computing the best possible (i.e., most precise) flow-insensitive points-to information for a program in a slightly restricted subset of Java. (In terms of the precision axes introduced in §2.2, our formulation employs the most precise handling for assignments, fields, method calls, virtual calls, and heap allocation, but it is flow insensitive.) The analysis is precise for programs that (1) have no arrays and (2) do not make use of native methods or reflection; our approximate handling of these program features is discussed in §3.2.3. Furthermore, note that our analysis algorithm (described in Chapter 5) does not compute the fully precise result


formulated here, as recursive method call semantics are approximated for decidability (see §5.2). We present a series of increasingly precise points-to analysis formulations, culminating in the aforementioned best possible flow-insensitive points-to analysis. We first present some background on CFL-reachability in §3.1. Then, §3.2 presents a points-to analysis that is context insensitive (defined in §2.2.4) and uses ahead-of-time call graph construction (defined in §2.2.3). In §3.3, we show how to add context sensitivity (defined in §2.2.4) to this formulation. Finally, §3.4 shows how to enhance the formulation with on-the-fly call graph construction (defined in §2.2.3).

3.1 Context-Free Language Reachability

Our points-to analysis work differs from most previous work in its use of context-free language reachability (CFL-reachability) as an underlying formalism. Our formulation of Java points-to analysis as a CFL-reachability problem (based on a previous formulation for C [Rep98]) exposed structure that enabled many of the key insights behind our analysis algorithms. In this section, we briefly define CFL-reachability and discuss some of its key properties; see the excellent overview by Reps [Rep98] for a more detailed discussion. CFL-reachability is an extension of traditional graph reachability that allows for filtering of uninteresting paths. The input for a CFL-reachability problem is a directed graph G with edge labels taken from some alphabet Σ. Each path p in G has a corresponding string s(p) in Σ∗, constructed by concatenating in order the labels of edges in p. Let L be a context-free language over Σ. We say p is an L-path iff s(p) ∈ L. The (all-pairs) CFL-reachability problem requires determining for all nodes s and t whether G contains an L-path from s to t. The language L characterizes interesting paths for CFL-reachability, as non-L-paths are filtered from consideration.


(For program analysis, uninteresting paths are typically those that correspond to infeasible executions of the analyzed program.) When an L-path exists from s to t, we say t is L-reachable from s, or simply, s L t; we similarly refer to S-paths and use notation s S t for any non-terminal S in L's grammar. Finally, the notation s S t T u means that there is an S-path from s to t and a T-path from t to u. As an example, let Σ be the letters '(' and ')', and L be the set of strings with balanced parentheses, generated by the grammar S → SS | ( S ) | ε. Consider the following input graph (adapted from an example in [Rep98]):

[Example graph (not reproduced): nodes s, u, and t, with edges labeled '(', ')', and ε; the labels along the single path from s to t concatenate to "(())", while no path from u to t yields a balanced string.]

There is exactly one L-path p from s to t, with s(p) = "(())", so t is L-reachable from s. However, there is no L-path from u to t. For points-to analysis, G represents the program: its nodes model variables and abstract locations, and its edges model different types of assignments. L describes paths in G corresponding to program executions that might cause a variable to point to some abstract location; other paths are guaranteed not to affect the points-to relation. Later in this chapter, we define L such that if x may point to o (i.e., o ∈ pt(x)), then o L x. Perhaps counter-intuitively, the L-path goes from o to x rather than from x to o, since our edges are oriented in the direction of value flow, i.e., from the right-hand side of an assignment to the left-hand side (see §3.2.1 for details). Hence, computing the points-to set of a variable x is a single-target L-path problem [Rep98], requiring backwards reachability. In most existing CFL-reachability formulations of program analyses, the filter language is a balanced parentheses language (also known as a Dyck language [Har78]).
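As a quick illustration of the filter language itself, the following check (our own sketch, not part of the analysis) decides membership of an edge-label string in the balanced-parentheses language above:

// A minimal membership check for the Dyck language S → SS | ( S ) | ε,
// treating ε-labeled edges as contributing nothing to the path string.
static boolean balanced(String s) {
    int depth = 0;
    for (char c : s.toCharArray()) {
        if (c == '(') depth++;
        else if (c == ')' && --depth < 0) return false; // unmatched ')'
    }
    return depth == 0; // every '(' matched
}
// balanced("(())") == true, so the s-to-t path above is an L-path;
// an unbalanced string such as "(()" is rejected.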


The best-known work on using CFL-reachability for program analysis [RHSR94, RHS95] uses a balanced parentheses language for context sensitivity, i.e., precise handling of method call semantics (discussed further in §3.3). In addition to using balanced parentheses to achieve context sensitivity, our work contributes two novel uses of balanced parentheses, modeling both Java heap accesses (see §3.2) and on-the-fly determination of virtual call targets (see §3.4) with the technique. Note that not all analyses formulated in CFL-reachability use a balanced parentheses language, for example the formulation of Andersen's analysis for C in [Rep98].1

Determining CFL-reachability is in general computationally more expensive than standard graph reachability. For many years, the best known algorithm for CFL-reachability required worst-case O(Γ³N³) time [Rep98], where N is the number of nodes in G and Γ is the size of a normalized grammar for L. Chaudhuri has recently devised a more efficient algorithm which runs in worst-case O(N³/log N) time [Cha06] (this bound assumes that Γ is a constant). However, standard transitive closure can be solved more efficiently, in worst-case O(NE) time.2

Note that for our points-to analysis, we do not use the generic CYK-based CFL-reachability algorithm [Rep98], due to space overhead. The CYK-based algorithm requires saturating the graph with closure edges corresponding to each non-terminal in a normalized version of the language grammar [Rep98]. In our experience, these closure edges add an enormous space overhead for points-to analysis, without yielding commensurate time savings. Hence, we develop our own algorithms for reachability specific to the context-free languages we formulate for points-to analysis. We are not aware of a scalable program analysis based on the generic CFL-reachability algorithm.

In certain cases, the structure of a CFL-reachability problem can be exploited

1Note that Zheng and Rugina [ZR07] have recently presented an alternate formulation of alias analysis for C with balanced parentheses.
2The greater efficiency assumes a sparse graph, with the number of edges E being O(N). In practice, the graphs we construct to model programs are always sparse.


to yield better worst-case complexity bounds. When L is a regular language, L-reachability can be solved in O(SNE) time, where S is the number of states in a deterministic finite automaton for L [Yan90]. Also, for context-sensitive dataflow analysis and slicing, algorithms with better worst-case complexity for real-world programs have been devised [RHSR94, RHS95], exploiting the typical structure of procedure calls. §4.7 shows that a similar algorithm for context-insensitive Java points-to analysis has better worst-case complexity than existing algorithms.

3.2 Context-Insensitive Formulation

Our first Java points-to analysis formulation is for context-insensitive points-to analysis with ahead-of-time call graph construction. The key distinguishing feature of this formulation is its use of balanced parentheses to model the semantics of Java heap accesses, i.e., reads and writes of object fields. We first describe how, given a program P, a graph G representing pointer behavior in P is constructed (§3.2.1). We then formulate points-to analysis for P as a CFL-reachability problem over G (§3.2.2). Finally, we discuss our handling of Java arrays and reflection in §3.2.3.

3.2.1 Graph Representation

Our graph-reachability formulation rests on representing the pointer-manipulating statements in a program P with a graph G. Nodes in G represent variables and abstract locations, with one abstract location node per new statement in the program.3 Within a procedure, edges represent four canonical assignment forms: (1) allocation statements x = new T(), (2) copy statements x = y, (3) heap reads x = y.f, and (4) heap writes x.f = y. Table 3.1 shows the edges and edge labels used to represent these statement types; more complex statements can be modeled by suitable introduction of temporary variables.

3Note that in spite of representing each new statement with one abstract location node, we are still able to formulate an analysis with a context-sensitive heap abstraction over this graph; see §3.3 for details.


Statement                    Graph Edge(s)
s: x = new T()               o_s --new--> x
x = y                        y --assignglobal--> x  if x or y is a global;
                             y --assign--> x  otherwise
x = y.f                      y --getfield[f]--> x
x.f = y                      y --putfield[f]--> x
s: x = m(a1, a2, ..., ak)    a_i --param[s]--> f_m,i  for i ∈ [1..k];
                             ret_m --return[s]--> x

Table 3.1: Canonical statements for Java points-to analysis, and the edge(s) for each statement in our graph representation.

For intraprocedural statements, the edges in our graph represent the value flow from the right-hand side to the left-hand side of the assignment. Hence, all edges in Table 3.1 are oriented in the direction of value flow. We use the assignglobal label for copy assignments where either x or y is a global variable (i.e., a static field), and the assign label otherwise; distinguishing such assignments is important for context-sensitive analysis (see §3.3). The source of a new edge is the corresponding abstract location node. For getfield[f] and putfield[f] edges, the field name f is part of the edge label. These field access terminals will serve as the parentheses in the points-to analysis formulation of §3.2.2. Figure 3.1(b) gives G for the program of Figure 3.1(a).

We model method calls in our graph with specially labeled assignment edges, as shown by the final entry in Table 3.1. For each method m, G has nodes f_m,i for m's formal parameters and a special ret_m node for m's return statements. At call site s of m, we add param[s] edges from each actual parameter to the corresponding formal parameter and a return[s] edge from the ret_m node to the appropriate caller's variable, as shown in Table 3.1.


1 x = new Obj(); // o1
2 z = new Obj(); // o2
3 w = x;
4 y = x;
5 y.f = z;
6 v = w.f;

(a) Code example

[Graph omitted: o1 --new(1)--> x, o2 --new(2)--> z, x --assign(3)--> w, x --assign(4)--> y, z --putfield[f](5)--> y, w --getfield[f](6)--> v, plus dashed flowsTo edges.]

(b) Graph representation

Figure 3.1: A small code example and its graph representation for CFL-reachability-based points-to analysis. Line numbers from (a) are given on corresponding edges in (b). Dashed edges in (b) indicate the existence of a flowsTo-path from the source to the sink.

This is a standard technique for modeling method call semantics for points-to analyses. Note that in code examples, our convention is to identify call sites by their line number, so a call at line j will have param[j] and return[j] edges. For this section, we assume the existence of some pre-computed conservative call graph to determine possible targets of virtual calls. §3.4 shows how on-the-fly call graph construction can be formulated in CFL-reachability.
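To make the edge construction concrete, the following is a minimal sketch of how the rules of Table 3.1 might be applied. The PAGBuilder and Edge types, the string-based node naming, and the method signatures are all hypothetical simplifications introduced for illustration; they are not the data structures of our actual implementation.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the graph construction of Table 3.1; nodes are string names.
class PAGBuilder {
    record Edge(String src, String dst, String label) {}
    final List<Edge> edges = new ArrayList<>();
    final Set<String> globals = new HashSet<>(); // static fields

    void newStmt(String x, String o) { edges.add(new Edge(o, x, "new")); }   // x = new T()
    void copyStmt(String x, String y) {                                      // x = y
        String l = globals.contains(x) || globals.contains(y) ? "assignglobal" : "assign";
        edges.add(new Edge(y, x, l));
    }
    void loadStmt(String x, String y, String f) {                            // x = y.f
        edges.add(new Edge(y, x, "getfield[" + f + "]"));
    }
    void storeStmt(String x, String f, String y) {                           // x.f = y
        edges.add(new Edge(y, x, "putfield[" + f + "]"));
    }
    // s: x = m(a1,...,ak); targets come from the pre-computed call graph
    void callStmt(int s, String x, List<String> actuals, List<String> formals, String ret) {
        for (int i = 0; i < actuals.size(); i++)
            edges.add(new Edge(actuals.get(i), formals.get(i), "param[" + s + "]"));
        if (x != null)
            edges.add(new Edge(ret, x, "return[" + s + "]"));
    }

    public static void main(String[] args) {
        PAGBuilder b = new PAGBuilder();   // builds the graph of Figure 3.1
        b.newStmt("x", "o1");              // 1: x = new Obj()
        b.newStmt("z", "o2");              // 2: z = new Obj()
        b.copyStmt("w", "x");              // 3: w = x
        b.copyStmt("y", "x");              // 4: y = x
        b.storeStmt("y", "f", "z");        // 5: y.f = z
        b.loadStmt("v", "w", "f");         // 6: v = w.f
        b.edges.forEach(System.out::println);
    }
}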

3.2.2 Analysis Grammar

We now define the language LF used to compute context-insensitive points-to analysis with CFL-reachability. Recall from §3.1 that we want to define LF such that x is LF-reachable from o iff o ∈ pt(x), i.e., the address of o may flow to x. We first consider programs without field accesses, corresponding to graphs restricted to new and assign edges. For such graphs, LF is defined by the regular expression below (flowsTo is the start non-terminal):

flowsTo → new ( assign )∗

Informally, an object can flow from an allocation site to a variable only through a new edge followed by a (possibly empty) sequence of assign statements. For example, in Figure 3.1(b), the path o1 --new--> x --assign--> w is a flowsTo-path witnessing o1 ∈ pt(w).
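For this restricted regular case, the demand-driven computation is simply a backwards traversal from the queried variable: walk assign edges against their direction and collect the allocation sites reached through new edges. The sketch below illustrates this under the stated restriction; the map-based graph encoding is a hypothetical simplification, and the full analysis (with fields) additionally needs the alias language developed next.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SimplePointsTo {
    // pt(x) for graphs with only new and assign edges (flowsTo -> new assign*).
    // assignInto.get(v) holds all y with an edge y --assign--> v;
    // newInto.get(v) holds all allocation sites o with o --new--> v.
    static Set<String> pointsTo(String x,
                                Map<String, List<String>> assignInto,
                                Map<String, List<String>> newInto) {
        Set<String> result = new HashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(x);
        visited.add(x);
        while (!worklist.isEmpty()) {
            String v = worklist.pop();
            // Any o --new--> v edge completes a flowsTo-path from o to x.
            result.addAll(newInto.getOrDefault(v, List.of()));
            // Keep walking assign edges backwards toward allocation sites.
            for (String y : assignInto.getOrDefault(v, List.of()))
                if (visited.add(y)) worklist.push(y);
        }
        return result;
    }

    public static void main(String[] args) {
        // The new/assign sub-graph of Figure 3.1.
        Map<String, List<String>> assignInto = Map.of("w", List.of("x"), "y", List.of("x"));
        Map<String, List<String>> newInto = Map.of("x", List.of("o1"), "z", List.of("o2"));
        System.out.println(pointsTo("w", assignInto, newInto)); // prints [o1]
    }
}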

We now extend LF to track value flow through the heap via putfield[f] and getfield[f] statements. We seek a precise handling of field accesses, defined as a field-sensitive handling in §2.2.2. Recall from §2.2.2 that field sensitivity requires that, given a heap read r and write w, the analysis must check that (1) r and w access the same field and (2) the base pointers of r and w may be aliased, in order to establish a heap-based flow from w to r. To achieve this precision, we extend the flowsTo production:

flowsTo → new ( assign | putfield[f] alias getfield[f])∗

This flowsTo production assumes the existence of an alias language (to be defined shortly) that captures may-aliasing, i.e., the condition when two variables may point to the same object: x alias y ⇔ pt(x) ∩ pt(y) ≠ ∅. The alias path connects the base variables of the field accesses, as desired by rule (2) of the field sensitivity definition.

Our remaining task is to define the alias language. Observe that we can check for aliasing of x and y by means of flowsTo-paths: x and y are may-aliased iff there is an object o such that o flowsTo x and o flowsTo y, i.e., o ∈ pt(x) ∩ pt(y). Unfortunately, reasoning about may-aliasing in terms of two flowsTo-paths is unsuitable for CFL-reachability, which can only check language membership of a single path connecting x and y. The two aforementioned flowsTo-paths cannot be concatenated to form a single path from x to y, since x and y are both sinks of the paths. Hence, we must extend our graph representation to allow for inverse paths, as in Melski's formulation of Andersen's analysis for C [Rep98].

Inverse paths enable alias paths by introducing reversed flowsTo paths (from variables to abstract locations), thereby allowing for a single path connecting two may-aliased variables. More concretely, the desired (x alias y)-path can be constructed by concatenating the inverse of the (o flowsTo x)-path with the (o flowsTo y)-path. We invert the flowsTo-path using inverse edges: for each edge x → y in G labeled t, we add an inverse edge y → x in G labeled t‾ (writing ℓ‾ for the inverse of a label ℓ), following the notation of [Rep98]. Given a path p, the inverse path p‾ is then constructed using inverse edges in the obvious way. So, an (x alias y)-path can now be defined as a path x flowsTo‾ o flowsTo y, for some node o. The alias language is defined by the following grammar:

alias → flowsTo‾ flowsTo
flowsTo‾ → ( assign‾ | getfield[f]‾ alias putfield[f]‾ )∗ new‾

Note the absence of an alias‾ non-terminal symbol; we use alias instead because the two generate the same language: alias‾ → (flowsTo‾ flowsTo)‾ = flowsTo‾ (flowsTo‾)‾ = flowsTo‾ flowsTo = alias.

Figure 3.2 gives the complete context-free grammar for LF. Each flowsTo‾ production reverses the preceding flowsTo production and inverts its edges. The ciAssign and ciAssign‾ non-terminals treat edges corresponding to assignments to globals, parameter passing, and return values as if they were assignments between locals. This treatment yields a sound but context-insensitive analysis, as no filtering of unrealizable paths (defined in §2.2.4) is performed. (§3.3 gives a context-sensitive formulation that uses param[i] and return[i] edges to filter unrealizable paths.)


flowsTo → new
flowsTo‾ → new‾
flowsTo → flowsTo ciAssign
flowsTo‾ → ciAssign‾ flowsTo‾
flowsTo → flowsTo putfield[f] alias getfield[f]
flowsTo‾ → getfield[f]‾ alias putfield[f]‾ flowsTo‾
alias → flowsTo‾ flowsTo
ciAssign → assign | assignglobal | param[i] | return[i]
ciAssign‾ → assign‾ | assignglobal‾ | param[i]‾ | return[i]‾
pointsTo → flowsTo‾

Figure 3.2: A context-free grammar for LF.

As discussed in §3.1, determining a points-to set for a variable x requires solving a backwards LF-reachability problem from x, i.e., finding those abstract locations that can flow to x. Figure 3.2 gives a production pointsTo → flowsTo‾ that makes this backwards-reachability correspondence explicit: o ∈ pt(x) iff x pointsTo o, i.e., x flowsTo‾ o. In the remainder of this thesis, we sometimes refer to pointsTo-paths rather than flowsTo-paths to make the discussion more intuitive.

The balanced-parentheses structure of LF will be exploited by our points-to analysis algorithms. Note that because of inverse edges, we have two pairs of matched parentheses for each field f, (putfield[f], getfield[f]) and (getfield[f]‾, putfield[f]‾). We will use these parentheses to guide our approximation and refinement, as discussed in Chapter 4 and Chapter 5.

Example Let us derive a flowsTo-path from o2 to v in Figure 3.1(b). First, we derive y alias w using statements 1, 3, and 4.

y assign‾ x new‾ o1 new x assign w

→ y ciAssign‾ x new‾ o1 new x ciAssign w

→ y flowsTo‾ o1 flowsTo w → y alias w

With this alias path, we can derive o2 flowsTo v using statements 2, 5 and 6:

o2 new z putfield[f] y alias w getfield[f] v

→ o2 flowsTo v

In contrast, there is no flowsTo-path from o2 to y, and hence o2 ∉ pt(y).

3.2.3 Other Java Language Features

Here, we briefly detail our handling of Java language constructs relevant to points-to analysis that were not discussed in §3.2.1.

Arrays Loads and stores to array elements are modeled by collapsing all array elements into a single element, modeled with a field arr. For example, x.a[i]=y is translated to tmp=x.a; tmp.arr=y;. For the programs and points-to analysis clients we have tested, a more faithful modeling of array indices would not significantly improve precision. However, the ability to refine the handling of array indices may be useful for more demanding clients, especially if combined with greater flow sensitivity (in the handling of both array indices and array contents). More precise modeling of arrays is a topic for future work.

Reflection and Native Methods Our formulation does not include handling of reflection or native code, and hence it is unsound for programs using those constructs. Our implementation makes a best effort to conservatively model the most commonly used reflective constructs and native methods in the standard library, as in other work using Java points-to analysis [O'C00, LH03, FYD+06, NAW06]. Common techniques that we employ include modeling of native methods in the library and manually annotating, where possible, which classes can be loaded reflectively at runtime. Nevertheless, the implementation can return unsound results if native methods or certain other reflective constructs are used in the application, like nearly all other Java points-to analyses.4

3.3 Context-Sensitive Formulation

Our second use of balanced parentheses in CFL-reachability will be for context sensitivity, defined in §2.2.4. In this section, we show how to add context sensitivity to the context-insensitive points-to analysis formulated in §3.2 as LF-reachability. The key idea is to formulate a language LC that filters unrealizable paths [Rep98], but does not precisely model other language features like fields. Then, a field-sensitive and context-sensitive points-to analysis can be formulated as reachability over the intersection of LF and LC. This specification strategy simplifies LC greatly, as it need not model any program semantics beyond method calls and returns.

Note that computing (LF ∩ LC)-reachability has been proved undecidable [Rep00], a result we discuss in greater detail later in the section. Since fully field- and context-sensitive points-to analysis is undecidable, our algorithm computes an approximation of (LF ∩ LC)-reachability, as described in §5.2.

4The points-to analysis of Hirzel et al. [HDDH07] is the only sound Java points-to analysis we know of; it achieves soundness for a particular execution of a program by analyzing the behavior of reflection and native code at runtime.


The formulation in this section both filters unrealizable paths and yields a context-sensitive heap abstraction (see §2.2.5). However, as in §3.2, the formulation assumes an ahead-of-time call graph (see §2.2.3). In §3.4, we show how to adapt this formulation for on-the-fly call graph construction, thereby yielding a context-sensitive call graph (defined in §2.2.4).

Specifying LC The language LC only contains strings where letters representing entries and exits of method calls are appropriately balanced. Because of inverse paths (needed to specify may-aliasing; see §3.2.2), defining which terminals correspond to call entries and exits is slightly tricky. In the absence of inverse paths, a realizable flowsTo-path traverses a method in the direction of value flow, entering through a param[i] edge and exiting through a matching return[i] edge. However, when a realizable flowsTo-path p contains inverse flowsTo‾ sub-paths, p might enter a method through a return[i]‾ edge and/or exit through a param[i]‾ edge. Hence, we define call entries and exits through the following non-terminals callEntry[i] and callExit[i]:

callEntry[i] → param[i] | return[i]‾
callExit[i] → return[i] | param[i]‾

Unlike LF, LC includes some strings with partially balanced parentheses. In the context of interprocedural dataflow analysis, Reps defined realizable paths to include paths ending with unmatched call entry parentheses, as those paths correspond to control-flow paths where some method calls have not yet returned [Rep98]. For points-to analysis, call parentheses may be partially balanced on valid paths in our graph since, for example, values may flow from callee to caller or vice versa.5 Consider the following simple example:

1 Foo makeFoo() { return new Foo(); }
2 ...
3 x = makeFoo();

5In general, values may flow from ancestor methods in the call graph to descendant methods or vice versa.


csStart → unbalExits unbalEntries
unbalExits → balanced unbalExits | callExit[i] unbalExits | ε
unbalEntries → balanced unbalEntries | callEntry[i] unbalEntries | ε
balanced → callEntry[i] balanced callExit[i] | balanced balanced | nonCallTerm | ε
callEntry[i] → param[i] | return[i]‾
callExit[i] → return[i] | param[i]‾
nonCallTerm → new | new‾ | assign | assign‾ | assignglobal | assignglobal‾ | getfield[f] | getfield[f]‾ | putfield[f] | putfield[f]‾

Figure 3.3: A context-free grammar for LC that only includes strings corresponding to realizable paths, adapted from previous work [Rep98, FRD99, KA04].

Here, an object allocated in makeFoo() can flow to x. The path in our graph from the abstract location to x is labeled "new return[3]", and hence is a valid flow path with an unmatched call exit edge. Similarly, a flowsTo-path whose sink is a formal parameter will end with an unmatched call entry edge. To avoid filtering out valid flows, LC allows a prefix of unmatched call exit edges and a suffix of unmatched call entry edges, as in previous work [Rep98, FRD99, KA04].6

Figure 3.3 gives a complete grammar for LC, adapted from previous work on context-sensitive analysis [Rep98, FRD99, KA04]. The unbalExits and unbalEntries non-terminals respectively allow for unmatched call exit and call entry edges, while the balanced non-terminal only allows balanced parentheses. Notice that since the nonCallTerm non-terminal reduces to any terminal not related to calls, those non-call terminals can appear anywhere in an LC string, i.e., they are essentially ignored. Hence, LC solely filters out unrealizable paths, yielding context sensitivity as desired.

6As originally defined, realizable paths do not allow a prefix of unmatched call exit edges, though slice paths do allow such a prefix [Rep98]. In this work, we use the term “realizable paths” to refer to paths that may include both the unmatched prefix and the unmatched suffix.


1 static Object id(Object o) {
2   return o;
3 }
4 main() {
5   x = new Object();
6   y = new Object();
7   a = id(x);
8   b = id(y);
9 }

(a) Code example

[Graph omitted: o5 --new--> x, o6 --new--> y, x --param[7]--> p_id, y --param[8]--> p_id, p_id --assign--> ret_id, ret_id --return[7]--> a, ret_id --return[8]--> b.]

(b) Graph representation

Figure 3.4: A small example program and graph to illustrate context-sensitive analysis.

Computing a context- and field-sensitive points-to analysis requires reachability over the language LCF = LF ∩ LC.

Example Here we give an example illustrating how LC-reachability—and hence (LF ∩ LC)-reachability—filters out unrealizable paths. Figure 3.4 presents an example program that makes two calls to the identity function id, along with its graph representation. The graph contains an LC-path from o5 to a:

o5 new x param[7] p_id assign ret_id return[7] a

→ o5 nonCallTerm x callEntry[7] p_id nonCallTerm ret_id callExit[7] a

→ o5 balanced x balanced a

→ o5 balanced a

Since this path is also an LF-path, a is LCF-reachable from o5, i.e., o5 ∈ pt(a). On the other hand, there is no LC-path from o5 to b, since the x --param[7]--> p_id edge is not matched by the ret_id --return[8]--> b edge, so o5 ∉ pt(b). Hence, LCF-reachability successfully keeps the effects of the two calls to id separate by filtering out unrealizable paths.
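The filtering LC performs along a single path can also be phrased as a simple stack algorithm: call entries are pushed, a call exit must match the entry on top of the stack, exits seen with an empty stack form the allowed unmatched prefix, and entries left on the stack form the allowed unmatched suffix. The sketch below is a hypothetical illustration of this check, not our analysis implementation; it assumes non-call labels have already been filtered out and encodes site numbers as positive integers.

import java.util.ArrayDeque;
import java.util.Deque;

class RealizablePathCheck {
    // Labels encoded as +i for callEntry[i] and -i for callExit[i].
    static boolean realizable(int[] labels) {
        Deque<Integer> openEntries = new ArrayDeque<>();
        for (int l : labels) {
            if (l > 0) {
                openEntries.push(l);                       // callEntry[l]
            } else {
                if (openEntries.isEmpty()) continue;       // unmatched exit: legal prefix
                if (openEntries.pop() != -l) return false; // mismatched call sites
            }
        }
        return true; // entries left on the stack form the legal unmatched suffix
    }

    public static void main(String[] args) {
        // The two paths of Figure 3.4(b):
        System.out.println(realizable(new int[]{7, -7})); // param[7] return[7]: true
        System.out.println(realizable(new int[]{7, -8})); // param[7] return[8]: false
    }
}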


1 class A {
2   static Obj f;
3 }
4 Obj rf() {
5   return A.f;
6 }
7 void wf(Obj p) {
8   A.f = p;
9 }
10 main() {
11   Obj x = new Obj();
12   wf(x);
13   Obj y = rf();
14 }

(a) Code example

o11 --new--> x --param[12]--> p_wf --assignglobal--> A.f --assignglobal--> ret_rf --return[13]--> y, with a dashed return[12] self-edge on A.f

(b) Graph representation

Figure 3.5: A small example to illustrate handling of globals in our context-sensitive formulation. The code is given in (a), and (b) shows a (LF ∩ LC)-path from o11 to y in the corresponding graph. The dashed return[12] self edge on A.f is added to ensure sound handling of the global.

Heap Abstraction By treating new and new‾ edges identically to other intraprocedural edges, LC-reachability yields a context-sensitive heap abstraction. We defined a context-sensitive heap abstraction in §2.2.5 as using a separate abstract location for each inlined copy of an allocation site, assuming that context sensitivity is achieved via exhaustive inlining. LC-reachability tracks the unmatched callEntry[i] edges on paths through abstract location nodes, just as it does for local variables. This tracking is equivalent to creating a copy of the abstract location particular to that sequence of call entries, yielding a context-sensitive heap abstraction.


Handling Globals Global variables (i.e., Java static fields) require additional attention for context-sensitive analysis, since the graph edges representing global accesses can connect locals in different methods. As discussed in §3.2.1, we model accesses to globals in our graph with direct edges to a node representing the global, labeling the edges assignglobal and assignglobal‾. These edges allow paths to connect local variables in distinct methods without including the call entry and exit edges showing how those methods were invoked. For example, consider the path in Figure 3.5(b) (temporarily ignoring the dashed edge) taken from the graph for the code in Figure 3.5(a). The path connects the p parameter of wf() to the return value of rf() with assignglobal edges to and from A.f, skipping edges that reflect the call exit from wf() and the call entry to rf() necessary for the global accesses to execute.

Without modifying our graph representation, LC-reachability would unsoundly filter some paths with assignglobal edges. Such unsound filtering can occur when a call entry edge entering method m precedes assignglobal edges on a path, but those assignglobal edges lead to a local in a distinct method m′. For example, consider again the path in Figure 3.5(b), ignoring the dashed edge. The x --param[12]--> p_wf edge enters wf(), while the subsequent assignglobal edges lead to method rf(). The param[12] edge should not be used for filtering after the assignglobal edges, as the path has exited the wf() method. However, as written, LC will unsoundly filter the path, treating the param[12] edge and the later return[12] edge as mismatched.

A simple technique to restore the soundness of LC-reachability is to add self-edges on all global variable nodes labeled with all possible callExit[i] labels [KA04]. These edges will balance any callEntry[i] edges on paths leading to the global, essentially "consuming" the calling context accumulated on incoming paths. In Figure 3.5(b), the dashed A.f --return[12]--> A.f self-edge matches the previous param[12] edge, and hence LC will not filter the path that includes the dashed edge, as desired. Our analysis implementation does not explicitly add these self edges to the graph, but its handling of globals is equivalent, as described in Chapter 5.

Undecidability LCF is not a context-free language, and hence LCF-reachability is in fact not a CFL-reachability problem. In general, context-free languages are not closed under intersection. LCF is not context free since the parentheses of LF and LC need not be properly nested on a valid graph path (unlike the parentheses of LF and LOTF, as discussed in §3.4.5). As an example, consider a call y = x.getFoo(), where getFoo() is a "getter" method for a field foo. There is a valid partial LCF-path from x to y labeled param[i] getfield[foo] return[i]. On this path, the param[i] parenthesis is balanced before the later getfield[foo] parenthesis, showing non-nesting of call and field parentheses.

Reps proved that LCF-reachability is in fact undecidable through a reduction from Post's correspondence problem [Rep00]. Hence, any terminating algorithm aiming to compute LCF-reachability must approximate in some way. Our algorithm achieves decidability by using a regular language approximation of LC, presented fully in Chapter 5.

3.4 On-The-Fly Call Graph Formulation

The analysis formulations in both §3.2 and §3.3 assumed that virtual call targets were determined via a pre-computed (i.e., “ahead-of-time”) call graph. Here, we formulate an analysis that computes virtual call targets on-the-fly, i.e., by computing points-to sets for receivers of virtual calls during the analysis (previously discussed in §2.2.3).

First, we formulate a context-insensitive version of the analysis as LOTF-reachability over a modified graph representation of the program; LOTF captures both field access semantics (like LF of §3.2.2) and on-the-fly discovery of virtual call targets. Then, similar to §3.3, we formulate a context-sensitive variant of the analysis as reachability over the language LOTF ∩ LC′, where LC′ is a slightly modified version of LC from §3.3.


The section is organized as follows. First, §3.4.1 provides the intuition behind our formulation by relating LOTF sub-paths for discovering virtual call targets to the dynamic semantics of virtual calls. Then, §3.4.2 presents the necessary modifications to our graph representation for on-the-fly call graph reasoning. §3.4.3 describes the context-free grammar of LOTF, and §3.4.4 shows how context sensitivity can be added through language intersection. Finally, §3.4.5 further discusses some aspects of our formulation and how it relates to other formulations in the literature.

3.4.1 Intuition

Our CFL-reachability formulation of on-the-fly call graph construction is best understood as an abstraction of the dynamic semantics of virtual calls. Executing a virtual call "r.m(x)" requires the following three key steps:

1. Determining the object o that r points to.

2. Determining the concrete type T of o.

3. Determining the method C.m() that should be invoked when the receiver object is of type T, based on the class hierarchy.

Our formulation determines possible value flow at virtual calls by abstractly computing the above steps through CFL-reachability: step 1 is computed by finding pointsTo-paths from the receiver, and steps 2 and 3 are computed by encoding concrete types and virtual method dispatch tables in our graph representation. Our on-the-fly call graph formulation replaces direct param[i] and return[i] edges at virtual call sites with virtParam[i] and virtReturn[i] paths, whose discovery requires on-the-fly computation of virtual call targets. With an ahead-of-time call graph CG, parameters and return values at virtual call sites are connected with direct param[i] and return[i] edges added according to CG (see §3.2.1).


[Figure omitted: an abstract virtParam path for a call "r.m(x)": x --receiver--> r, a pointsTo-path from r to o_i, a concType[A] self-edge on o_i, a flowsTo-path from o_i back to r, a receiver‾ edge from r to x, and finally a paramForType[A] edge from x to p_A.m; a gray paramForType[B] edge from x to p_B.m is filtered out.]

Figure 3.6: An abstracted view of a path in our graph for computing virtual call targets on-the-fly (some edge parameters have been removed). The path is for some virtual call "r.m(x)," and it connects actual parameter x to formal parameter p_A.m by going through o_i and back, as indicated by the dotted edge. Dashed edges indicate sub-paths, and the gray edge is filtered out by our context-free language.

For on-the-fly call graph construction, an x --param[i]--> p_A.m edge for a virtual call r.m(x) is replaced by a virtParam[i] path, shown abstractly in Figure 3.6. This virtParam[i] path has several sub-paths (virtReturn[i] paths are similarly constructed):

• First, a receiver edge connects x to the receiver r of the virtual call.

• Then, a pointsTo-path connects r to some abstract location o_i it may point to. This pointsTo-path corresponds to abstractly computing step 1 of virtual call execution, i.e., determining what the receiver points to. Computing this information using the points-to analysis itself is precisely what makes the call graph construction on-the-fly.

• Next, a concType[A] edge indicates that o_i has concrete type A, corresponding to step 2 of virtual call execution (determining the concrete type of the receiver).

• A flowsTo-path and a receiver‾ edge connect the path back to the actual parameter x.

• Finally, a paramForType[A] edge connects x to p_A.m, indicating that when the receiver of the call is of type A, x is copied to p_A.m, the formal parameter of A.m().


Statement                       Graph Edge(s)

s: x = new T()                  o_s --new--> x
                                o_s --concType[T]--> o_s

s: x = a1.m(a2, a3, ..., ak)    a_i --receiver[i][s]--> a1   for i ∈ [1..k]
                                x --receiver[ret][s]--> a1
                                for each possible concrete type T of a1, and
                                for each instance method C.m() of T:
                                  a_i --paramForType[T][s]--> f_{C.m,i}
                                  ret_{C.m} --returnForType[T][s]--> x

Table 3.2: The changes in graph representation required for on-the-fly call graph construction. Remaining assignments are modeled as in Table 3.1.

Note that the A type in the paramForType[A] and concType[A] edge labels must be matched. This final edge corresponds to step 3 of virtual call execution (determining which method is invoked for the concrete receiver type).

The paramForType[B] edge from x to p_B.m in Figure 3.6 is filtered from consideration, as there is no matching concType[B] edge indicating the receiver can point to a B object. This filtering shows the potentially increased precision from on-the-fly call graph construction, as a less precise ahead-of-time call graph may have caused the inclusion of param[i] edges from x to both p_A.m and p_B.m. Several details were abstracted from the edge labels in Figure 3.6: further edge label parameters and matchings are required to handle multi-parameter functions and multiple calls to the same function precisely. We flesh out these remaining details in §3.4.2 and §3.4.3.

3.4.2 Graph Representation

To facilitate on-the-fly call graph construction, we add two new types of edges to our graph representation:


• A receiver[i][s] edge connects the ith actual parameter at virtual call site s to the receiver argument at the call site. Similarly, a receiver[ret][s] edge connects the caller variable holding the return value to the receiver. The i or ret argument is used to ensure that the receiver edge source and the receiver‾ edge sink on a virtParam[s] path (see Figure 3.6) are the same actual parameter or return value. The call site s is used to keep targets for distinct virtual calls separate, as described in more detail in §3.4.3.

• A concType[T] edge connects an abstract location of concrete type T to itself. This concrete type information for abstract locations is needed to determine virtual call targets, as discussed in §3.4.1.

In addition, we modify the param[s] and return[s] edges previously used to connect caller and callee at virtual calls (see §3.2.1) to be paramForType[T][s] and returnForType[T][s] edges. A paramForType[T][s] edge connects an actual parameter at call site s to the corresponding formal parameter in method C.m(), such that C.m() is the method invoked at s when the receiver has concrete type T. (The meaning of returnForType[T][s] edges is similar.) As discussed in §3.4.1, the type T on a paramForType[T][s] or returnForType[T][s] edge is matched with a previous concType[T] edge on a virtParam[s] path to ensure feasibility of the corresponding virtual call targets. As with param[s] and return[s] edges, the call site s is used to facilitate context sensitivity, discussed further in §3.4.4. Note that param[s] and return[s] edges are still used in the graph for non-virtual calls, i.e., calls to static methods, constructors, and private methods. Table 3.2 gives a complete description of the changes to our graph representation for on-the-fly call graph construction.

Since we aim to compute a program's call graph on-the-fly, it is not immediately clear how paramForType[T][s] and returnForType[T][s] edges can be added to the representation, since their addition requires some knowledge of possible targets for virtual calls. For our demand-driven analyses, we add these edges using a separate, pre-computed call graph. We discuss this choice further in §3.4.5.
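The Table 3.2 edge additions can be sketched as an extension of the hypothetical PAGBuilder from §3.2.1 above. The dispatch, formal, and retNode helpers below are illustrative stand-ins for class-hierarchy and method lookup, not methods of our implementation; the dispatch stub in particular naively assumes the type itself implements m.

import java.util.List;
import java.util.Set;

// Sketch of the Table 3.2 edges for a virtual call s: x = a1.m(a2,...,ak).
class OtfPagBuilder extends PAGBuilder {
    // concType self-edge, added alongside the new edge at s: x = new T()
    void newStmtOtf(String x, String o, String type) {
        newStmt(x, o);
        edges.add(new Edge(o, o, "concType[" + type + "]"));
    }

    void virtualCallStmt(int s, String x, List<String> actuals, String m,
                         Set<String> possibleTypes) {
        String recv = actuals.get(0);                     // a1
        for (int i = 1; i <= actuals.size(); i++)         // a_i --receiver[i][s]--> a1
            edges.add(new Edge(actuals.get(i - 1), recv,
                               "receiver[" + i + "][" + s + "]"));
        if (x != null)                                    // x --receiver[ret][s]--> a1
            edges.add(new Edge(x, recv, "receiver[ret][" + s + "]"));
        for (String t : possibleTypes) {                  // per possible concrete type T
            String target = dispatch(t, m);               // e.g. "A.m"
            for (int i = 1; i <= actuals.size(); i++)
                edges.add(new Edge(actuals.get(i - 1), formal(target, i),
                                   "paramForType[" + t + "][" + s + "]"));
            if (x != null)
                edges.add(new Edge(retNode(target), x,
                                   "returnForType[" + t + "][" + s + "]"));
        }
    }

    // Hypothetical helpers: virtual dispatch and callee node naming.
    String dispatch(String type, String m) { return type + "." + m; }
    String formal(String method, int i)    { return "f_" + method + "," + i; }
    String retNode(String method)          { return "ret_" + method; }
}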


Figure 3.7 gives a small code example and a relevant subset of our modified graph representation for the code. The graph in Figure 3.7(b) changes the param edges of §3.2.1 to paramForType edges, additionally parametrized with either A or B depending on the containing method of the sink node. The receiver and concType edges are also added, with appropriate parameters. In §3.4.3, we give a grammar that shows how the edge label parameters are matched against each other on valid LOTF paths.

3.4.3 The LOTF Language

Here we present the full LOTF language for on-the-fly call graph construction via CFL-reachability. First, we present the production for virtParam[s] paths in detail, fleshing out the abstract example given in Figure 3.6. Once virtParam[s] paths are understood, reasoning about the rest of the LOTF grammar becomes more straightforward. We then discuss the meaning of the different edge label matchings that must occur on virtParam[s] paths, analogous to the matching of fields on getfield[f] and putfield[f] edges for LF (see §3.2) or of call sites on param[i] and return[i] edges for LC (see §3.3). Finally, we present the full grammar for LOTF.

The virtParam[s] production Following the example of Figure 3.6, one could write the following production for virtParam[s]:

virtParam[s] → receiver[i][s] pointsTo concType[T] flowsTo receiver[i][s]‾ paramForType[T][s]
             | param[s]

The param[s] case of the production handles non-virtual calls, which are still represented with param[s] and return[s] edges.


1 class A {
2   void m(Object p) { ... }
3 }
4 class B extends A {
5   void m(Object p) { ... }
6 }
7 main() {
8   A a = new A();
9   A b = new B();
10   Object x = new Object();
11   Object y = new Object();
12   a.m(x);
13   b.m(y);
14 }

(a) Code example

[Graph omitted: edges o8 --new--> a, o9 --new--> b, o10 --new--> x, o11 --new--> y; self-edges o8 --concType[A]--> o8 and o9 --concType[B]--> o9; receiver edges x --receiver[2][12]--> a and y --receiver[2][13]--> b; and paramForType[A][12] and paramForType[B][12] edges from x, and paramForType[A][13] and paramForType[B][13] edges from y, to the formals p_A.m and p_B.m.]

(b) Graph representation

Figure 3.7: A small example illustrating our graph representation for on-the-fly call graph construction. The graph in (b) contains a relevant subset of the edges for representing the program in (a).

The other case corresponds directly to the abstract virtParam[s] path shown in Figure 3.6, except that the extra edge label parameters introduced in §3.4.2 have been added. To simplify the virtParam[s] production, we introduce a dispatch[i][s] non-terminal and re-write the production as follows:

virtParam[s] → receiver[i][s] pointsTo dispatch[i][s] | param[s]
dispatch[i][s] → concType[T] flowsTo receiver[i][s]‾ paramForType[T][s]

A dispatch[i][s] path includes the sub-path of the virtParam[s] path from the abstract location o that the receiver may point to, to the path's end at some formal parameter f. A dispatch path models the value flow that occurs at a virtual call site when dispatched for a particular receiver object. More precisely, a dispatch[i][s] path indicates which method's formal parameter f gets the value of the actual parameter in position i at call site s when o is the receiver object at s.

Meaning of matchings Several edge label parameters must be matched on a virtParam[s] path, ensuring precise modeling of corresponding aspects of the program semantics. Here, we discuss the meaning of each of the necessary matchings in turn. Given the virtParam[s] production,

virtParam[s] → receiver[i][s] pointsTo dispatch[i][s]

there are two edge label parameters that must be matched:

• The i in receiver[i][s] and dispatch[i][s]: This matching ensures that flows between distinct method parameters are kept separate. Without the matching, given a virtual call "r.m(x,y)" and target "A.m(p,q)", a virtParam[s] path could connect x to q or y to p, a clear violation of program semantics.


• The s in receiver[i][s] and dispatch[i][s]: This matching filters out unrealizable paths between two invocations of the same virtual method. Without the matching, given call site s1 "q.m(x)" and site s2 "r.m(y)", a virtParam[s1] path may begin at actual parameter y from site s2, indicating that y may be passed at s1, an unrealizable result.

Within the dispatch[i][s] production,

dispatch[i][s] → concType[T] flowsTo receiver[i][s]‾ paramForType[T][s]

we have two more required matchings in the edge labels:

• The T in concType[T] and paramForType[T][s]: As discussed in §3.4.1, this matching filters out virtual call targets that do not correspond to some abstract location in the receiver's points-to set.

• The s in receiver[i][s] and paramForType[T][s]: This matching ensures that virtual call targets are computed separately in the case where two different virtual calls pass the same parameter. Consider the following example program (assume classes A and B are as in Figure 3.7(a)):

x = new A(); // o1
z = new B(); // o2
x.m(y); // s1
z.m(y); // s2

Note that y is passed at both call sites s1 and s2. Without this matching, the following dispatch[2][s2] path could arise:

o1 concType[A] o1 flowsTo x receiver[2][s1]‾ y paramForType[A][s2] p_A.m


flowsTo → new
flowsTo‾ → new‾
flowsTo → flowsTo ciAssign
flowsTo‾ → ciAssign‾ flowsTo‾
flowsTo → flowsTo putfield[f] alias getfield[f]
flowsTo‾ → getfield[f]‾ alias putfield[f]‾ flowsTo‾
alias → flowsTo‾ flowsTo
ciAssign → assign | assignglobal | virtParam[s] | virtReturn[s]
ciAssign‾ → assign‾ | assignglobal‾ | virtParam[s]‾ | virtReturn[s]‾
pointsTo → flowsTo‾
virtParam[s] → receiver[i][s] pointsTo dispatch[i][s] | param[s]
virtParam[s]‾ → dispatch[i][s]‾ flowsTo receiver[i][s]‾ | param[s]‾
virtReturn[s] → dispatch[ret][s] flowsTo receiver[ret][s]‾ | return[s]
virtReturn[s]‾ → receiver[ret][s] pointsTo dispatch[ret][s]‾ | return[s]‾
dispatch[i][s] → concType[T] flowsTo receiver[i][s]‾ paramForType[T][s]
dispatch[i][s]‾ → paramForType[T][s]‾ receiver[i][s] pointsTo concType[T]
dispatch[ret][s] → returnForType[T][s] receiver[ret][s] pointsTo concType[T]
dispatch[ret][s]‾ → concType[T] flowsTo receiver[ret][s]‾ returnForType[T][s]‾

Figure 3.8: A context-free grammar for LOTF, an extension of LF that includes on-the-fly call graph construction. The rules for flowsTo, flowsTo‾, pointsTo, and alias are unchanged from those in Figure 3.2.

Though o1 flows to the receiver at call site s1, the path ends with the paramForType[A][s2] edge, imprecisely indicating that "A.m()" can be invoked at call site s2.

In addition to the matchings enforced on virtParam[s] paths, the call site parameters on paramForType[T][s] and returnForType[T][s] edges can be matched for context sensitivity, as we shall discuss in §3.4.4.


The LOTF grammar Figure 3.8 presents a context-free grammar for LOTF. Much of the grammar is identical to that of LF (see Figure 3.2), as LOTF includes precise handling of field accesses. The key difference with LF is that the param[s] and return[s] cases of the ciAssign production have been replaced with virtParam[s] and virtReturn[s] (and similarly for the barred equivalents).

The new productions in the LOTF grammar include the already-discussed virtParam[s] production and similar productions for virtReturn[s], virtParam[s]‾, and virtReturn[s]‾. The construction of the latter paths and the reasoning behind their matchings can be understood analogously to virtParam[s] paths.

Example Here we show a derivation of a flowsTo path in LOTF for the example of Figure 3.7 and illustrate some of the filtering done by the language. In particular, we will show that there is an o10 flowsTo p_A.m path in Figure 3.7(b), but no o10 flowsTo p_B.m path, showing how LOTF-reachability can filter out consideration of virtual call targets inconsistent with the receiver points-to set.

The graph in Figure 3.7(b) contains a virtParam[12] path from x to p_A.m, showing that A.m() is a feasible target of the virtual call at line 12 of Figure 3.7(a) according to the on-the-fly call graph. To derive this path, we first derive a dispatch[2][12] path from o8 to p_A.m:

o8 concType[A] o8 new a receiver[2][12]‾ x paramForType[A][12] p_A.m

→ o8 concType[A] o8 flowsTo a receiver[2][12]‾ x paramForType[A][12] p_A.m

→ o8 dispatch[2][12] p_A.m

This dispatch[2][12] path shows that the receiver a can point to o8, an object of type A, in which case p_A.m will get the value of x at the call. Given this path, deriving the virtParam[12] path is straightforward:

x receiver[2][12] a new‾ o8 dispatch[2][12] p_A.m

→ x receiver[2][12] a pointsTo o8 dispatch[2][12] p_A.m

→ x virtParam[12] p_A.m

Once we have the virtParam[12] path, we can easily derive the desired flowsTo-path from o10 to p_A.m:

o10 new x virtParam[12] p_A.m

→ o10 flowsTo x ciAssign p_A.m

→ o10 flowsTo p_A.m

Note that we cannot derive an o10 flowsTo p_B.m path (i.e., o10 ∉ pt(p_B.m)), since on-the-fly call graph construction shows that B.m() cannot be called at line 12 of Figure 3.7(a). With LOTF-reachability, deriving an o8 dispatch[2][12] p_B.m path fails, as the o8 --concType[A]--> o8 and x --paramForType[B][12]--> p_B.m edge labels are mismatched. Since the receiver a can only point to o8, there is no way to derive an x virtParam[12] p_B.m path, i.e., B.m() cannot be a target of the virtual call at line 12.

3.4.4 Adding Context Sensitivity

Here, we show how to combine context sensitivity with on-the-fly call graph discovery, yielding a context-sensitive call graph (defined in §2.2.4). Formulating this combination requires minor modifications to the LC language of §3.3.

We formulate field-sensitive, context-sensitive points-to analysis with on-the-fly call graph construction as (LOTF ∩ LC′)-reachability, with the grammar for LC′ appearing in Figure 3.9.


csStart → unbalExits unbalEntries
unbalExits → balanced unbalExits | callExit[i] unbalExits | ε
unbalEntries → balanced unbalEntries | callEntry[i] unbalEntries | ε
balanced → callEntry[i] balanced callExit[i] | balanced balanced | nonCallTerm | ε
callEntry[i] → param[i] | paramForType[T][i] | return[i]‾ | returnForType[T][i]‾
callExit[i] → return[i] | returnForType[T][i] | param[i]‾ | paramForType[T][i]‾
nonCallTerm → new | new‾ | assign | assign‾ | assignglobal | assignglobal‾ | getfield[f] | getfield[f]‾ | putfield[f] | putfield[f]‾ | receiver[i][s] | receiver[ret][s] | receiver[i][s]‾ | receiver[ret][s]‾ | concType[T]

Figure 3.9: A grammar for LC′, a modification of LC that works on the representation of Table 3.2. The productions for csStart, unbalExits, unbalEntries, and balanced are unmodified from those for LC in Figure 3.3.

Compared to LC, LC′ has two key changes:

• The callEntry[i] and callExit[i] productions have been extended to include paramForType[T][i] and returnForType[T][i] edges (and their barred equivalents). This extension handles the interprocedural edges for virtual calls introduced in §3.4.2 by treating them like param[i] and return[i] edges, i.e., using their call site annotations to filter out unrealizable paths. Note that the type T annotations on paramForType[T][i] and returnForType[T][i] edges are ignored, as LC′ only filters unrealizable paths and does not reason about virtual call targets.

• The nonCallTerm production now includes all receiver and receiver‾ labels and concType[T] labels. As these edges are intraprocedural, they should be ignored like any other intraprocedural edge by LC′.

As in the case of context- and field-sensitive analysis with an ahead-of-time call graph (see §3.3), (LOTF ∩ LC′)-reachability is undecidable, requiring approximation in the implementation (described in Chapter 5).

3.4.5 Discussion

Here, we discuss two more aspects of our formulation of on-the-fly call graph construction. We first show that LOTF-reachability is in fact a type of balanced parentheses problem. Then, we discuss issues related to how our graph representation is constructed.

Balanced parentheses LOTF-reachability can be viewed as a type of balanced parentheses problem, just like LF-reachability (§3.2) and LC-reachability (§3.3). The "parentheses" in LOTF would be receiver[i][s] edges and dispatch[i][s] paths (and their barred equivalents), which must be fully balanced on virtParam[s] and virtReturn[s] paths, as shown in Figure 3.8. Having paths serve as parentheses makes LOTF-reachability slightly different from the other formulations, where only terminal edges served as parentheses. However, this difference is not fundamental: a pre-processing step could discover all dispatch[i][s] paths in a graph and mark them with dispatch[i][s] edges, after which any technique applicable to balanced-parentheses problems could also be applied to LOTF-reachability.

Interestingly, the receiver[i][s] and dispatch[i][s] parentheses of LOTF can be viewed as reads and writes of fields in an object's dispatch table. Suppose that Java supported function pointers invoked with a C-like syntax, i.e., "(*func_ptr)(...)". Given a class A with an implementation of virtual method m(), one could translate the code "x = new A(); ...; x.m();" into the following equivalent code with function pointers:7

x = new A(); x.m_func = A.m; ...; (*x.m_func)();

7Standard implementations of virtual calls are similar to this translation except that a single dispatch table per class is used, with a pointer in each object to the appropriate dispatch table.


This translation shows how receiver[i][s] and dispatch[i][s] parentheses correspond to field accesses: the receiver[i][s] parenthesis corresponds to reading the appropriate function field from the receiver, and the dispatch[i][s] parenthesis corresponds to the initial write of the field to point to the appropriate method.

This analogy between virtual method calls and field accesses provides the intuition behind the fact that field access and virtual call parentheses are properly nested in LOTF. By properly nested, we mean that unmatched field access parentheses cannot appear between balanced virtual call parentheses and vice versa. For example, a path with labels 'getfield[f] receiver[ret][s] putfield[f]' cannot be part of a flowsTo-path, since the field access parenthesis cannot be matched before the virtual call parenthesis. This proper nesting is sensible since the virtual call parentheses can be viewed as field accesses, and hence they should of course be nested with other field accesses. Proper nesting of these parentheses allows for performing field-sensitive analysis with on-the-fly call graph construction via reachability over a single context-free language LOTF. In contrast, method call parentheses for context sensitivity are not always properly nested with field access parentheses, making simultaneous precise treatment of method calls and field accesses undecidable, as discussed in §3.3.

Graph construction In §3.4.2, we briefly raised the issue of how paramForType[T][s] and returnForType[T][s] edges should be added to the graph representation, given that we aim to compute a call graph for the program on-the-fly. There are two possible ways to add these edges to the graph:

• Using a pre-computed call graph: One solution is to use some pre-computed call graph to add the edges. LOTF-reachability will ignore edges from the pre-computed call graph if the on-the-fly call graph shows the edges to be infeasible.

• Simultaneous reachability computation: The second solution is to compute what code is reachable during on-the-fly call graph construction, adding relevant edges for reachable code as it is discovered. This technique begins by analyzing only the entrypoints of a program (typically main(), the static initializer for the main class, and other known entrypoints in the standard library). Then, whenever the analysis discovers a method m() to be reachable (e.g., if it is called from main()), edges for m() and calls to m() are added, and then analysis continues. (A sketch of this technique appears after Figure 3.10 below.)


1 static void doSomething(Object o) { ... }
2 class A {
3   void foo() { // reachable
4     doSomething(new Integer());
5   }
6 }
7 class B {
8   void foo() { // unreachable
9     doSomething(new String());
10   }
11 }

Figure 3.10: A small example to illustrate possible reduced precision with demand-driven analysis and on-the-fly call graph handling.

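The worklist structure of simultaneous reachability computation can be sketched as follows. This is a minimal, hypothetical illustration: analyzeAndResolveTargets stands in for adding a method's edges to the graph and resolving its call sites with the current (partial) points-to results, and is given here only as a stub.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SimultaneousReachability {
    static Set<String> analyze(List<String> entryPoints) {
        Set<String> reachable = new HashSet<>(entryPoints);
        Deque<String> worklist = new ArrayDeque<>(entryPoints);
        while (!worklist.isEmpty()) {
            String m = worklist.pop();
            // Add m's edges, then resolve m's call sites using the current
            // (partial) points-to results; newly discovered targets are queued.
            for (String target : analyzeAndResolveTargets(m))
                if (reachable.add(target))
                    worklist.push(target);
        }
        return reachable; // the analysis only ever examined these methods
    }

    // Stub: a real analysis would build m's edges and return resolved targets.
    static List<String> analyzeAndResolveTargets(String m) { return List.of(); }
}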

There are several advantages to using simultaneous reachability computation to add graph edges instead of a pre-computed call graph. The on-the-fly call graph is likely to have fewer methods than the pre-computed call graph, i.e., less code is treated as reachable. Apart from the obvious space and time benefits of analyzing fewer methods, treating less code as reachable can possibly improve the precision of analysis results.8 The reason for this benefit is that the behavior of unreachable code may affect points-to results for reachable code, and hence completely ignoring the unreachable code can improve precision.

8We thank Barbara Ryder for discussions that led to this insight.


As an example of the potential negative effects of unreachable code on points-to analysis precision, consider the example in Figure 3.10. Say that a pre-computed call graph determines that both A.foo() and B.foo() are reachable, but that simultaneous reachability computation with on-the-fly call graph construction determines that B.foo() is not reachable. If the points-to analysis graph representation is constructed with the pre-computed call graph, the analysis will conclude that pt(o) = {o4, o9}, since both A.foo() and B.foo() are assumed to be reachable. In contrast, simultaneous reachability computation will yield the result pt(o) = {o4}, since the behavior of code in the unreachable method B.foo() is never considered. The precision improvement with simultaneous reachability computation has been seen in other analyses, e.g., sparse conditional constant propagation [WZ91].

Though our demand-driven points-to analysis uses a pre-computed call graph to construct its graph representation, we have not observed the above precision loss for the benchmarks and clients we tested. The choice between using a pre-computed call graph and simultaneous reachability computation applies independent of how a points-to analysis is formulated, and hence it arises for previous analyses based on set constraints or other formalisms. All exhaustive Java points-to analyses with on-the-fly call graph construction we are aware of [RMR01, LH03, WL04, LH04] use simultaneous reachability computation. We use a pre-computed call graph with our demand-driven analysis since simultaneous reachability computation could dramatically increase the time required to compute a result for a single variable (the reachability of all analyzed methods and their transitive callers would have to be determined), and we have not observed a precision decrease in our experiments.

Chapter 4

Context-Insensitive Points-To Analysis

In this chapter, we present a context-insensitive demand-driven points-to analysis for Java, suitable for use in environments with extreme resource constraints like just-in-time (JIT) compilers. Just-in-time compiler optimizations could benefit greatly from precise points-to information, for example through more inlining of virtual calls or more aggressive optimization in the presence of aliasing. Our refinement-based analysis is able to achieve nearly the precision of field-sensitive Andersen's analysis with up to a 16-fold speedup over a state-of-the-art algorithm for the analysis, and is hence the first points-to analysis with performance suitable for a JIT compiler.

The chapter is organized as follows. We begin in §4.1 by giving a high-level overview of our refinement algorithm, illustrated on a simplified version of the points-to analysis problem. Then, in §4.2, we describe our approximation technique, which regularizes the original CFL-reachability problem to obtain asymptotic speedups. We show a refinement algorithm in §4.3 that can, when needed, recover most of the precision lost by this initial approximation. In §4.4, we present an experimental evaluation of our approximation and refinement algorithms. §4.5 discusses language issues that make our refinement-based technique more suitable for Java points-to analysis than for C. Finally, §4.6 gives a formulation of the Heintze and Tardieu demand-driven algorithm [HT01a] adapted to Java, used as a baseline in our evaluation.

4.1 Algorithm Overview

This section provides an overview of our refinement-based algorithm. First, we introduce the ideas of demand-driven points-to analysis (§4.1.1) and client-driven refinement (§4.1.2). Then, we present a simplified formulation of the context-insensitive points-to analysis problem in §4.1.3, focusing attention on its balanced-parentheses nature. We present our refinement algorithm in §4.1.4, formulated for an even further simplified problem, aiming to make the key properties of the algorithm clear. Finally, §4.1.5 proves termination and soundness for the algorithm presented in §4.1.4.

4.1.1 Demand-Driven Points-To Analysis

Points-to analyses differ in how eagerly they compute points-to sets for the client (the user or enclosing analysis). Here, we discuss the potential benefits of computing points-to information in a lazy, demand-driven manner (the strategy used by our points-to analysis), rather than using the typical exhaustive strategy of eagerly computing all results.

Before discussing the performance trade-offs of demand-driven vs. exhaustive points-to analysis, we must first describe how a points-to analysis is typically invoked. The results of a points-to analysis are often consumed by some enclosing analysis that essentially invokes the points-to analysis as a subroutine. For example, a program optimizer may use points-to analysis results to prove that two pointers cannot be aliased, enabling more powerful optimization. We term the enclosing analysis making use of a points-to analysis the client analysis. Each request for a variable's points-to set from the client is called a query.

If a client raises only a small number of points-to analysis queries, computing points-to information lazily can yield improved performance. Typically, points-to analyses are exhaustive, i.e., they compute and store points-to sets for all variables before answering any client queries. With the exhaustive strategy, queries can be answered in constant time, since all points-to sets have already been computed. In contrast, a demand-driven analysis [Rep94] only computes the points-to information necessary to answer client queries.1 Many points-to analysis clients only query a small number of program variables. For example, an optimizer may only care about aliasing of variables in frequently executed methods, often a small subset of all variables in the program. For such clients, a demand-driven points-to analysis may provide better performance than an exhaustive analysis.

Program analyses formulated in CFL-reachability can naturally be made demand driven, a particularly useful feature for our points-to analysis work. A demand-driven version of a CFL-reachability problem corresponds to solving the single-source L-path problem, i.e., only determining which nodes are L-reachable from some specific source node. Also, an all-pairs CFL-reachability algorithm can be employed to solve a single-source L-path problem by performing the magic-sets transformation on the grammar for L, a technique from deductive databases [Rep94].

In practice, demand-driven points-to analysis alone often does not provide a performance improvement over exhaustive analysis. In the worst case, answering a single query with demand-driven analysis can require as much computation as an exhaustive

1Note that some points-to analyses do a mix of pre-processing work and on-demand computation [SFA00, Das00]. In this work, we consider an analysis to be demand driven if it only examines statements that may change the analysis result (as in the work of Duesterwald et al. [DGS97]), and hence we do not consider these "mixed" points-to analyses to be truly demand driven.


analysis [HRS95]. Though previous work has shown significant performance gains with demand-driven points-to analysis for C [HT01a], we have found that for Java points-to analysis, behavior close to the worst case occurs often (see §5.4.2). The key to the success of our points-to analyses is our combination of demand-driven analysis and refinement, which together lead to much improved precision and performance.

4.1.2 Client-Driven Refinement

As described previously in §1.2.3, we aim to develop an analysis based on client-driven refinement. A client of a points-to analysis typically aims to perform some transformation or verification of a program that relies on some pointer-related program property. We say that a points-to query is positively answered when the points-to information computed by the analysis is sufficiently precise for the client to prove its property of interest. The refinement loop of our algorithm iterates until the given query is (i) positively answered, (ii) no further refinement is possible, or (iii) some time budget for the query is exceeded.

for the purpose of inlining. Given a virtual call x.foo(), this client tries to use points-

to information for x to show that only one implementation of foo() can be invoked by this call at runtime, allowing for inlining of that implementation at the call site.

The client issues a query for the receiver x of the call to foo, and considers the query

positively answered when the points-to information for x shows that the call to foo has only one possible target. We refer to the use of a time budget to prematurely end long-running queries as early termination. When early termination is employed, a sound result must still be returned to the client. For points-to analysis, an early-terminated query could simply return a result stating that the queried variable can point to any abstract location.
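The shape of such a client-driven refinement loop, with all three exit conditions, is sketched below for the virtual call resolution example. PointsToAnalysis and its two methods are hypothetical interfaces standing in for our analysis; they are not its actual API.

import java.util.Set;

// Hypothetical demand-driven analysis interface with refinement.
interface PointsToAnalysis {
    Set<String> possibleTargets(String receiver, String method); // current answer
    boolean refine(); // false when no further refinement is possible
}

class InliningClient {
    // Returns the unique target of receiver.method(), or null if the query
    // is not positively answered.
    static String tryDevirtualize(PointsToAnalysis pta, String receiver,
                                  String method, long budgetMillis) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        while (true) {
            Set<String> targets = pta.possibleTargets(receiver, method);
            if (targets.size() == 1)                    // (i) positively answered
                return targets.iterator().next();
            if (!pta.refine())                          // (ii) no refinement left
                return null;
            if (System.currentTimeMillis() > deadline)  // (iii) budget exceeded:
                return null;                            // early termination is sound;
                                                        // the client simply inlines nothing
        }
    }
}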


Note that early termination does not necessarily imply a precision loss for the client. Consider the case where, even when run to completion, a points-to analysis cannot positively answer a query. (This can be caused by analysis imprecision or by feasible program behaviors.) In this case, early termination of the query makes no difference to the client: with or without early termination, the query is not positively answered. If early termination is employed primarily in cases where the points-to analysis cannot provide a positive answer, performance is improved without negative effects on precision for the client. In §4.4, we show that early termination has precisely this effect for our points-to analysis.

4.1.3 Simplified Formulation

In this section, we present a simplified CFL-reachability formulation of context-insensitive, flow-insensitive, field-sensitive Java points-to analysis (previously formulated in §3.2). This simplified formulation retains the essential properties of the sound formulation while allowing us to more clearly explain the key properties of our refinement algorithm. The present formulation simplifies the full formulation of §3.2 in two (unsound) ways: (1) it ignores simple copy assignments (e.g., x = y), and (2) it does not include the inverse edges required to soundly handle paths between field parentheses (cf. §3.2.2). Our simplifications yield a formulation of points-to analysis as a straightforward balanced-parentheses problem. Let ΣP be the alphabet of open and close brackets, respectively representing heap writes and reads:

ΣP = { [f , ]f | f is a field }

We formulate our analysis as CFL-reachability with language Lsf (s for simplified) over ΣP:

Lsf : F → [f F ]f | [g F ]g | ... | F F | ε

Lsf captures the key balanced parentheses property of LF (see §3.2.2).
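Membership in Lsf for a single string of parentheses is ordinary balanced-bracket matching with field names, as the following minimal sketch shows; the string encoding of the bracket labels is an assumption of the sketch. The single-path problem considered next asks exactly this question for the labels of a path p.

import java.util.ArrayDeque;
import java.util.Deque;

class LsfCheck {
    // Labels encoded as "[f" (heap write, open bracket for field f) and
    // "]f" (heap read, close bracket for field f).
    static boolean inLsf(String[] labels) {
        Deque<String> open = new ArrayDeque<>();
        for (String l : labels) {
            String field = l.substring(1);
            if (l.charAt(0) == '[') open.push(field);         // open bracket [f
            else if (open.isEmpty() || !open.pop().equals(field))
                return false;                                 // unbalanced ]f
        }
        return open.isEmpty(); // every [f must be closed by a matching ]f
    }

    public static void main(String[] args) {
        System.out.println(inLsf(new String[]{"[f", "[g", "]g", "]f"})); // true
        System.out.println(inLsf(new String[]{"[g", "]j"}));             // false
    }
}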

4.1.4 Refinement Algorithm

Here we present our refinement-based algorithm for solving the single-path prob-

lem, a further simplification of the Lsf-reachability problem presented in §4.1.3. We

focus on techniques for showing that a node x is not Lsf-reachable from a node o; our refinement algorithm is designed to quickly prove such unreachability properties. The key idea of the algorithm is to focus effort on parts of the graph likely to have unbal- anced parentheses in practice, handling the rest of the graph approximately (in fact by skipping over it entirely). Refinement allows for checking more and more of the graph for balanced parentheses, in the limit yielding the same answer as computing

full Lsf-reachability. Though our refinement algorithm does not provide any asymptotic improvement over computing CFL-reachability directly, in practice we have found its benefits to be dramatic. For the context-insensitive analysis of this chapter, even our initial approximation (with no refinement) often yields positive answers for queries from the tested clients. Though it provides some benefit here, refinement becomes more critical for the context-sensitive analysis of Chapter 5. Single-Path Problem To focus on the key ideas of our technique, we consider the

following additional simplification of the single-source/single-target Lsf-reachability problem. We constrain the input graph to contain a single acyclic path p from some


node o to some node x, with edge labels chosen from ΣP (defined in §4.1.3). The

analysis problem is then to determine whether p is an Lsf-path, i.e., whether x is Lsf-reachable from o. To model how refinement improves performance, we add the following optimality constraint to the problem: if p is not an Lsf-path, the algorithm should determine this fact while minimizing the total number of edges inspected (we consider an edge inspected if it is processed by the algorithm in any way, e.g., to check its label). As a trivial example, if the first edge e of p is a closed parenthesis, the algorithm should conclude that p is not an Lsf-path by inspecting only e. Note that our algorithm does not necessarily visit the minimal number of edges. However, we will show that for certain inputs, the refinement algorithm inspects fewer edges than if Lsf-reachability were directly computed by traversing p. Our refinement technique as applied to the single-path problem generalizes to points-to analysis for arbitrary programs. In the general reachability problem, we are given a variable x and must find all o such that x is Lsf-reachable from o. Furthermore, for each o, we must consider all paths from o to x, not just one. For an acyclic graph, both of these generalizations can be viewed as solving multiple instances of the single-path problem (though of course our generalized algorithm aggregates information from different paths for efficiency). We discuss cyclic graphs when presenting the full refinement algorithm in §4.3. Also note that while refinement may seem unnecessary for the single-path problem, its empirical benefit becomes significant when computing full Java points-to analysis.

Since Lsf-reachability for a single path can be computed in linear time, refinement may not be necessary to obtain good performance for such cases. However, as discussed in §3.1, full CFL-reachability is more expensive, requiring O(N³/ log N) time in the worst case. While refinement does not improve on this worst-case behavior, it yields significantly improved performance in practice for Java points-to analysis, as shown in §4.4.


Figure 4.1(a) gives an example input path for our simplified problem (the dashed edges will be explained shortly). The path is not an Lsf-path, as the [g and ]j parentheses are mismatched. We will illustrate how our analysis can discover this fact without inspecting the entire path. Key Ideas Our technique heuristically improves performance through approximation and refinement. For the simplified problem, an approximate analysis must answer correctly when p is an Lsf-path, but can answer incorrectly when it is not. Refinement gradually removes the imprecision of this analysis, eventually yielding the same answer as directly computing Lsf-reachability. Our analysis approximates by only checking selected parts of p for balanced parentheses, entirely skipping inspection of other edges on p. Proving the existence of just one unbalanced parenthesis on p (i.e., an open parenthesis without a balancing close parenthesis or vice versa) is sufficient to show that p is not an Lsf-path. Hence, if our analysis chooses the right parts of p to check for balanced parentheses, it may be able to show p is unbalanced without inspecting the entire path. For example, let

α be any string from Lsf (i.e., a string with balanced parentheses). Then, the string s = [a [b α ]c ]a cannot be balanced, since the [b and ]c parentheses are mismatched. Our refinement algorithm is able to only check the beginning and end of a path with label s for balanced parentheses, discovering the mismatched [b and ]c parentheses while skipping inspection of α. For correctness, our algorithm cannot skip inspection of arbitrary sub-paths of p. In particular, the algorithm must avoid the case where skipped parentheses can possibly balance inspected parentheses. Consider again the string s = [a [b α ]c ]a, which must be unbalanced if α is balanced. If α need not be balanced, then skipping inspection of α and concluding that [b and ]c are mismatched may be incorrect—it is possible that α = ]b [c, making s = [a [b ]b [c ]c ]a balanced. To maintain correctness, our algorithm only skips sub-paths beginning and ending with matched parentheses


[Figure: (a) the initial path o → t0 → t1 → t2 → t3 → t4 → x with edge labels [f [g [h ]h ]j ]f and its match edges (dashed); (b) the path examined by the first pass of the algorithm; (c) the path seen by the second pass of the algorithm, with labels [f [g ]j.]

Figure 4.1: Paths to illustrate the behavior of our refinement algorithm.

(i.e., parentheses of the form [f and ]f ) and performs additional checking in the case of multiple open and close parentheses for a single field. We describe the implementation of this policy and its correctness in more detail below. Match edges Our algorithm skips sub-paths of p by way of additional match edges.

A match edge connects the source of some [f edge to the sink of any matching ]f edge. In Figure 4.1, match edges are shown as dashed edges. We use match edges in two key ways in our refinement algorithm:

1. Our algorithm only skips sub-paths whose source and sink are connected by a match edge. This choice is key for algorithm correctness, as discussed in more detail below.

2. Our algorithm refines its precision by removing match edges from the graph, which removes sub-path skipping opportunities and forces the algorithm to inspect more of p’s edges.

Our analysis approximates Lsf-reachability by computing reachability over a language Lsfr (r for refinement) that includes match edges:

Lsfr : T → [f T ]f | [g T ]g | ... | match | T T | ε


[Figure: (a) a graph for which our algorithm returns an incorrect result; (b) the graph of (a) fixed to be well-structured. Both show a path o → t0 → t1 → t2 → x with edge labels [f [f ]f ]f and match edges.]

Figure 4.2: An example showing why our algorithm requires well-structured graphs.

Lsfr is identical to Lsf except for the additional match production. Thus, Lsfr is a

superset of Lsf, and computing Lsfr-reachability approximates Lsf-reachability.

Computing Lsfr-reachability Each pass of our refinement algorithm computes

Lsfr-reachability by using match edges to skip sub-paths of p whenever possible. If a node x on p is encountered with a single outgoing match edge x −match→ v, our algorithm only follows the match edge, saving time by avoiding inspection of all edges on p between x and v. Figure 4.1(b) shows the Lsfr-path discovered by a pass of our analysis when given Figure 4.1(a) as input. The analysis skips the entire original path using the o −match→ x edge and concludes that x is Lsfr-reachable from o. Note that

x is not Lsf-reachable from o, showing how Lsfr-reachability can over-approximate

Lsf-reachability; this over-approximation can be removed through refinement.

Note that as described, the algorithm above does not correctly compute Lsfr-reachability when certain key match edges are missing. In particular, the algorithm fails when, given a path p with some edge x −[f→ y, the following conditions hold:

1. There are multiple edges on p with the matching label ]f .

2. x has at least one outgoing match edge.

3. The match edge connecting x to the balancing ]f parenthesis is missing.

For example, consider the graph in Figure 4.2(a). We have an initial edge o −[f→ t0 and

two subsequent edges with the matching ]f label (condition 1). The graph includes


an outgoing match edge from o (condition 2), but is missing the o −match→ x edge

corresponding to the balancing t2 −]f→ x edge (condition 3). Our algorithm only follows the o −match→ t2 edge from o and ignores other edges between o and t2. Therefore, it

encounters the t2 −]f→ x edge without having observed a matching [f edge and incorrectly

concludes that x is not Lsfr-reachable from o (incorrect since the original path has

balanced parentheses and hence is an Lsfr-path). To avoid this incorrectness, we require that each pass of our refinement algorithm work on a well-structured graph:

Definition 1. A graph including a path p and corresponding match edges is well structured iff each node x on p either (1) has no outgoing match edges or (2) has all possible outgoing match edges.

The graph in Figure 4.2(a) is not well-structured, since only some possible match edges from o are present. In contrast, the graph of Figure 4.2(b) is well-structured. When a node has multiple outgoing match edges, our algorithm checks if any of the

match edges is on an Lsfr-path from the source to sink of p, ensuring that an Lsfr-path is found if it exists.
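A well-structured input can be obtained by simply adding every possible match edge up front. The sketch below does so for a single-path instance; the Lab record and the numbering convention (path edge i runs from node i to node i + 1) are assumptions of this sketch.

import java.util.*;

final class MatchEdgeConstruction {
    // Label of path edge i (from node i to node i+1): open "[f" or close "]f".
    record Lab(boolean open, String field) {}

    // Adds a match edge from the source of every open [f edge to the sink of
    // every close ]f edge on the same field, regardless of their order on the
    // path (backward match edges are possible). Since all possible match
    // edges are present, every node has either all or none of its outgoing
    // match edges: the resulting graph is well-structured.
    static Map<Integer, Set<Integer>> allMatchEdges(Lab[] path) {
        Map<Integer, Set<Integer>> match = new HashMap<>();
        for (int i = 0; i < path.length; i++) {
            if (!path[i].open()) continue;
            for (int j = 0; j < path.length; j++) {
                if (!path[j].open() && path[j].field().equals(path[i].field())) {
                    // source of the [f edge is node i; sink of the ]f edge is node j+1
                    match.computeIfAbsent(i, k -> new HashSet<>()).add(j + 1);
                }
            }
        }
        return match;
    }
}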

Figure 4.3 presents pseudocode for computing Lsfr-reachability in each refinement pass. The input to the CheckPath procedure is a well-structured graph G that includes a path p with edge labels from ΣP (see §4.1.3), starting at node src and

ending at sink. CheckPath returns true iff sink is Lsfr-reachable from src in

G. Note that if CheckPath returns false, then p cannot be an Lsf-path, since Lsfr-

reachability over-approximates Lsf-reachability. CheckPath is computed recursively via the CheckPathHelper procedure, which maintains a set V of visited nodes and a stack S of thus-far unmatched open parentheses. For the most part, the CheckPathHelper procedure performs straightforward checking for balanced parentheses using the stack S. Two aspects of its pseudocode


CheckPath(src)
1  return CheckPathHelper(src, ∅, ⟨⟩)

CheckPathHelper(x, V, S)
 1  if edge x −[f→ y exists for some f
 2    then if x has outgoing match edges
 3           then for each edge x −match→ z
 4                  do if CheckIfUnvisited(z, V, S)
 5                       then return true
 6                return false            ▷ no success on any match edge
 7           else return CheckIfUnvisited(y, V, S.f)
 8  elseif edge x −]f→ y exists for some f
 9    then if S = T.f for some (possibly empty) T
10           then return CheckIfUnvisited(y, V, T)
11           else return false            ▷ mismatched parentheses
12  else return S = ⟨⟩                    ▷ end of path, so stack must be empty

CheckIfUnvisited(x, V, S)
1  if x ∈ V
2    then return false
3    else return CheckPathHelper(x, V ∪ {x}, S)

Figure 4.3: Pseudocode for a single pass of our refinement algorithm, as applied to the single-path problem.

merit particular mention. The first is the handling of match edges (line 2 through line 6). When outgoing match edges are present, the algorithm checks all the match edges to see if any are on a path with balanced parentheses. The algorithm returns false at line 6 iff none of the match edges are on a path with balanced parentheses. The second interesting aspect of CheckPathHelper is the use of the CheckIfUnvisited procedure to ensure nodes are not visited twice. This check is necessary because some match edges may connect nodes to predecessors on the original path,


CheckPathRefine(src)
1  while true
2    do if ¬CheckPath(src)
3         then return false
4       else if there exists node x with outgoing match edges
5         then remove all outgoing match edges from x
6       else return true

Figure 4.4: Outer loop of our refinement algorithm for the single-path problem.

creating cycles in the graph. For example, consider the following input path:

t −[f→ x −]f→ y −[f→ z

Since y is both the source of an edge labeled [f and the sink of an edge labeled ]f , the input graph will include a y −match→ y edge. Without checking for previously visited nodes, this input would lead to non-termination in the algorithm. Refinement Refinement in our algorithm is accomplished by removing match edges from the graph, forcing checking of more parentheses on the original path. Given an edge e = t −[f→ u, if outgoing match edges from t are removed, our algorithm can no longer

“skip” over e. Instead, the algorithm must check for a sub-path labeled [f T ]f from t (T being the start symbol for Lsfr), and in the process it may discover unbalanced parentheses that were skipped when match edges from t were present. Figure 4.1(c) shows the sub-path of Figure 4.1(a) explored by our algorithm after removing the o −match→ x edge. This removal exposes the unbalanced parentheses [g and ]j, leading the

analysis to conclude that p is not an Lsf-path. Note that the unbalanced parentheses

are still discovered without analyzing the whole path (the t1 ⇝ t3 sub-path was skipped), indicating the possible performance benefits of removing the “right” match edges from the graph. Figure 4.4 gives pseudocode for the outer refinement loop of our algorithm. The

algorithm assumes a well-structured input graph G that includes all possible match edges. The CheckPathRefine procedure returns true iff p is an Lsf-path, since it continues refinement until all match edges are removed.

In each iteration, the algorithm of Figure 4.4 first computes Lsfr-reachability using CheckPath (line 2). If CheckPath returns false, then the algorithm immediately returns false (line 3), as the path cannot have balanced parentheses. If CheckPath returns true, refinement is performed by removing all outgoing match edges from some node x on the path (line 5). The removal of all outgoing match edges from the node ensures that the graph remains well-structured, key to correctness. If no match edges can be removed, the algorithm returns true (line 6), since at that point no more refinement is possible. Note that the pseudocode in Figure 4.4 leaves out certain details irrelevant to correctness. For example, it does not specify which outgoing match edges are removed in each pass, though this choice is important for efficiency in practice. Also, the pseudocode continues refinement until all match edges are removed, while in practice, some budget would be used to terminate the algorithm earlier for some queries.
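The following compact Java sketch puts the two figures together for the single-path problem. The representation (path edge i from node i to i + 1, match edges as a node-indexed map) is our own assumption, and the persistent visited set and stack of the pseudocode are emulated by mutate-and-undo backtracking; it is an illustration, not the dissertation's implementation.

import java.util.*;

final class SinglePathRefinement {
    record Lab(boolean open, String field) {}  // label of path edge i -> i+1

    private final Lab[] path;
    private final Map<Integer, Set<Integer>> match;  // node -> match-edge targets

    SinglePathRefinement(Lab[] path, Map<Integer, Set<Integer>> match) {
        this.path = path;
        this.match = match;
    }

    // Outer refinement loop (Figure 4.4): returns true iff the path is an
    // Lsf-path, removing one node's match edges per iteration.
    boolean checkPathRefine() {
        while (true) {
            if (!checkPath()) return false;    // sound: definitely unbalanced
            Iterator<Integer> it = match.keySet().iterator();
            if (!it.hasNext()) return true;    // no refinement possible
            it.next();
            it.remove();  // drop ALL outgoing match edges of one node,
                          // keeping the graph well-structured
        }
    }

    // One pass (Figure 4.3): Lsfr-reachability from node 0 to the last node.
    boolean checkPath() {
        return helper(0, new HashSet<>(), new ArrayDeque<>());
    }

    private boolean helper(int x, Set<Integer> visited, Deque<String> stack) {
        if (x == path.length) return stack.isEmpty();  // end of path
        Lab e = path[x];
        if (e.open()) {
            Set<Integer> targets = match.get(x);
            if (targets != null && !targets.isEmpty()) {
                // only follow match edges, skipping the edges they span
                for (int z : targets)
                    if (ifUnvisited(z, visited, stack)) return true;
                return false;  // no success on any match edge
            }
            stack.push(e.field());
            boolean r = ifUnvisited(x + 1, visited, stack);
            stack.pop();  // undo: emulate the pseudocode's persistent stack
            return r;
        }
        // close parenthesis: the top of the stack must match
        if (stack.isEmpty() || !stack.peek().equals(e.field())) return false;
        String f = stack.pop();
        boolean r = ifUnvisited(x + 1, visited, stack);
        stack.push(f);
        return r;
    }

    private boolean ifUnvisited(int x, Set<Integer> visited, Deque<String> stack) {
        if (!visited.add(x)) return false;  // cycle via a backward match edge
        boolean r = helper(x, visited, stack);
        visited.remove(x);  // undo: emulate the pseudocode's persistent set
        return r;
    }
}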

4.1.5 Proofs of Termination and Soundness

Here, we present proofs of termination and soundness for the refinement algorithm of §4.1.4, shown in Figure 4.4. Termination The termination argument for our refinement algorithm is straightforward. First, the outer loop in Figure 4.4 can only run for a finite number of iterations. Each iteration must either terminate the loop immediately or remove some match edges from the graph; when no match edges remain, the loop is terminated. Since the input graph has a finite number of match edges, the loop can only run for a finite number of iterations.


The call to CheckPath in each loop iteration of Figure 4.4 must also terminate. As seen in Figure 4.3, CheckPathHelper visits each node on the path at most once, due to the check in CheckIfUnvisited. Since the path has a finite number of nodes, each call of CheckPath clearly terminates. Soundness Proving soundness requires showing that our algorithm never claims that a path has unbalanced parentheses when in fact they are balanced. Such a claim would be unsound, since it would filter a possibly valid flow from consideration, while points-to analysis clients rely on points-to sets over-approximating all valid flows. More formally, we prove the following soundness theorem:

Theorem 1. If CheckPathRefine (defined in Figure 4.4) returns false, indicating that the input path p has unbalanced parentheses, then p cannot have balanced parentheses.

To prove Theorem 1, we show that (1) assuming the input graph is well-structured (see Definition 1 in §4.1.4), the CheckPath procedure (from Figure 4.3) always returns a sound result, and that (2) the outermost CheckPathRefine procedure always invokes CheckPath with a well-structured graph. We state these two conditions for proving Theorem 1 as lemmas:

Lemma 1. If CheckPath (defined in Figure 4.3) returns false for a well-structured graph, then the original input path cannot have balanced parentheses.

Lemma 2. At line 2 of CheckPathRefine (Figure 4.4), the graph containing the node src is always well structured.

Lemma 2 clearly holds: we assume the graph contains all possible match edges initially (and hence is well structured), and line 5 in Figure 4.4 only removes all outgoing match edges from a node, maintaining well-structuredness. Hence, we have


reduced the problem of proving Theorem 1 to that of proving the soundness of each individual refinement pass, i.e., Lemma 1. We first argue that the CheckPath procedure is sound for graphs with no match edges, i.e., that the following lemma holds:

Lemma 3. For input graphs consisting of a single path without match edges, CheckPath returns false iff the path has unbalanced parentheses.

For inputs with no match edges, CheckPathHelper from Figure 4.3 essentially simulates a push-down automaton checking for a balanced parentheses string. We elide the straightforward soundness proof for Lemma 3, focusing on the more interesting case of inputs with match edges. We shall prove Lemma 1 by contradiction, making an inductive argument that CheckPath cannot return false for a well-structured graph in which the input path has balanced parentheses. An inductive proof requires a way to relate the behavior of our algorithm on longer input paths to its behavior on shorter paths. We do so by proving that in certain cases, pruning the path skipped by a match edge has no effect on the result of the CheckPath procedure. We first define the pruning operation:

Definition 2. Given an input path p with node x and subsequent node y, pruning the sub-path from x to y yields a graph with the following properties:

1. All nodes between x and y on p are removed, and all incoming and outgoing edges to those nodes (including match edges) are removed.

2. Nodes x and y are merged into a single node z, such that incoming edges to x go to z and outgoing edges from y start from z.

Given this definition, we can now state and prove our pruning lemma:


Lemma 4. Consider a graph G containing input path p and corresponding match edges. Let x be the first node on p with outgoing match edges, and let x −match→ y be some outgoing match edge from x. Consider the graph G′ with path p′ obtained by pruning the sub-path of p from x to y. If CheckPath returns true for p′ in G′, then it must return true for p in G.

Proof. First, we have that for the CheckPathHelper calls for x on p and for z (the merged version of x and y) on p′, the V and S arguments are identical. This holds because x was the first node on p with outgoing match edges, and hence both x and z are reached by the algorithm before any match edges are traversed, and the sub-paths preceding x on p and z on p′ are identical. Second, when handling the x −match→ y edge in G, CheckPathHelper is invoked for y (through a call to CheckIfUnvisited) with the V and S arguments unchanged from those passed in for x (line 4 in Figure 4.3). Hence, the successors of y on p and the successors of z on p′ are reached with identical V and S arguments in CheckPathHelper calls. So, we know that the sub-path of p after y and the sub-path of p′ after z are both checked in the same algorithm state. If the graphs examined in both these cases are identical, then the lemma clearly holds. One graph difference is possible: p′ may be missing backward match edges from nodes following y to nodes between x and y on p, since the target nodes were pruned. However, we are only concerned with cases in which CheckPath returns true for p′. The loop handling match edges in CheckPathHelper (line 3 through line 5 in Figure 4.3) returns true if a balanced path is found with any match edge: additional backward match edges would not affect this result. So, we have that in all cases, if CheckPath returns true for p′, it also returns true for p, as desired.

We are finally ready to prove the soundness of CheckPath (Lemma 1):

Proof of Lemma 1. Our proof is by induction on the length of the original input path


p. Base Case: In the base case, p has 0 edges and consists of a single node. In this case, CheckPathHelper simply returns true at line 12. Since the premise of Lemma 1 requires CheckPath to return false, Lemma 1 is vacuously true in the base case. Inductive Case: Here, we assume that CheckPath is sound for all paths up to length n, and prove that it is sound for paths of length n + 1. For the purposes of contradiction, assume that Lemma 1 holds for paths up to length n but does not hold for all paths of length n + 1. Let p be some path of length n + 1 with balanced parentheses for which CheckPath unsoundly returns false. We shall show that in fact, CheckPath must return true for p. Note that we assume the well-structured graph containing p has some match edges; if not, we get an immediate contradiction by Lemma 3. We first claim that for the first node x on p with outgoing match edges, CheckPathHelper(x, V, S) must return false. To see why, assume (for contradiction) that CheckPathHelper(x, V, S) returns true. Then, in order for CheckPath to return false for p, CheckPathHelper must return false for some node preceding x on p. However, from Lemma 3 we know that CheckPath is sound for paths with no match edges. Hence, since we assumed p has balanced parentheses, and there are no nodes with outgoing match edges before x, CheckPathHelper must return true for nodes preceding x, and we have a contradiction. So, we have shown that CheckPathHelper returns false for the first node x on p with outgoing match edges. Given x, the assumption that p is balanced, and the assumption that the graph containing p is well structured (see Definition 1), there must be an edge x −match→ v such that the corresponding open and close parenthesis edges for the match edge balance each other on p. Consider the path p′ obtained from pruning the sub-path from x


to v from p. We have two cases, corresponding to whether p′ does or does not have balanced parentheses. We shall show that each case leads to a contradiction. In the first case, assume that p′ has balanced parentheses. Since p′ must have length less than or equal to n, we know that CheckPath returns true for p′ by the inductive hypothesis. But, by Lemma 4, CheckPath must also return true for p. We have a contradiction, eliminating this case from consideration. We are left with the case in which p has balanced parentheses and p′ has unbalanced parentheses. Let the x −match→ v edge correspond to [f and ]f edges for some field f, and let the concatenated labels of the edges between x and v on p be α. Since p is balanced and p′ is not, it must be the case that parentheses in α balance other parentheses appearing before x and/or after v, i.e., we have a balanced parentheses string of the form [g [f α ]f ]j for some fields g and j (without loss of generality). However, recall our stipulation that the x −match→ v edge corresponds to the [f and ]f edges that balance each other on p. Hence, due to proper nesting of parentheses, the

[g [f α ]f ]j string cannot possibly be balanced. The [g parenthesis cannot be balanced

within α since then we have an unbalanced string of the form [g [f ]g ]f ; a similar

argument holds for the ]j parenthesis. Hence, we have another contradiction. Since all cases lead to a contradiction, we have proved that Lemma 1 holds for paths of length n + 1, completing the inductive proof of Lemma 1.

4.2 Regular Approximation

In this section, we present an algorithm that computes only the initial approximation of our refinement algorithm, i.e., reachability over a graph with all possible match edges. We show that this approximation requires reachability over a regular language

RF, and hence is asymptotically less expensive than CFL-reachability. We give a simple and efficient demand-driven algorithm RegularPT, essentially depth-first search,


for finding points-to information based on RF-reachability. §4.4 will show that RegularPT achieves most of the precision of LF-reachability within a time budget of only 1 second per query.

4.2.1 Regular Reachability

Here, we formulate a regular approximation of LF-reachability (the field-sensitive, context-insensitive analysis formulated in §3.2.2) via match edges. Recall our definition² of a flowsTo-path for LF:

flowsTo → new ( assign | putfield[f] alias getfield[f])∗

The context-free aspect of LF-reachability is checking putfield[f] and getfield[f] edges,

the balanced parentheses of LF, for the field-sensitivity conditions specified in §2.2.2. The most expensive part of field-sensitivity is checking for an alias-path between the base variables of putfield[f] and getfield[f] edges, since the path may be complex and must itself have balanced parentheses. We approximate the search for alias-paths between matched parentheses by conservatively assuming that such a path always exists, using match edges. As in §4.1, match edges connect matched parenthesis edges, going from the source of each putfield[f] edge to the target of each getfield[f] edge on the same field f. Figure 4.5 shows an example graph with all match edges included. Given a graph with all possible match edges, we can over-approximate LF-reachability with regular reachability

using language RF, defined as follows:

flowsToReg → new ( assign | match )∗

²Note that since this chapter is concerned only with context-insensitive analysis, we assume here that assignglobal, param[i], and return[i] edges are labeled assign in all input graphs.


[Figure: a graph with abstract locations (new edges) and putfield[f], getfield[f], and assign edges among variables including p, q, u, v, w, x, y, and z, together with all corresponding match edges.]

Figure 4.5: A graph illustrating match edges.

Given graph G, let GR be G with match edges added from the source of each putfield[f] edge to the target of each getfield[f] edge for all fields f. To show that

RF-reachability over-approximates LF-reachability and thus is sound, we must show that if x is LF-reachable from o in G, then x is RF-reachable from o in GR. Note that the flowsToReg production differs from the flowsTo production only in that “putfield[f] alias getfield[f]” has been replaced by match. So, the soundness proof reduces to showing that whenever nodes x and y in G are connected by a path labeled putfield[f] alias getfield[f], GR includes a match edge from x to y. This clearly holds by construction, since we add match edges between putfield[f] and getfield[f] edges in GR regardless of whether their base variables are connected by an alias-path.

Since RF-reachability over-approximates LF-reachability, node x may be RF-reachable from node o in GR but not LF-reachable from o. For example, in Figure 4.5, v is RF-reachable from o2 but not LF-reachable, since there is no alias-path

from q to p. In general, precision is lost in cases where an RF-path includes an invalid match edge m, where the base variables of the field accesses corresponding to m are not connected by an alias-path. In §4.3, we show how refinement can be used to recover most of the precision lost by using match edges.

Note that a points-to analysis technique that, like RF-reachability, handles field


accesses by checking only for matching fields is in fact a field-based analysis (cf. §2.2.2). This technique has been shown to have precision relatively close to that of a field-sensitive analysis in previous work [LPH01, LH03], a result reproduced in our experiments.

Since RF is regular, answering the single-source RF-problem is asymptotically

cheaper than the single-source LF-problem (O(NE) for regular reachability vs. O(N³/ log N) for CFL-reachability, as discussed in §3.1). Note the simplicity of this regular expression for flowsToReg, relative to the grammar of Figure 3.2. One consequence of our use of match edges is that we no longer need to consider both standard

and barred edges when determining RF-reachability; this leads to both conceptual simplicity and a much simpler reachability algorithm (depth-first search) than what

is required for LF-reachability.

4.2.2 RegularPT

Figure 4.6 gives pseudocode for an algorithm RegularPT that determines points-to information on demand using RF-reachability. The RegularPT procedure takes a node x and returns the set of all nodes o such that x is RF-reachable from o.

This returned set is a points-to set for x, since RF-reachability over-approximates the flows-to relation. The RegularPT procedure is a standard worklist-based depth-first search, traversing incoming assign and match edges to find reachable new edges and their abstract locations. The worst-case complexity of RegularPT is O(E + M), where E is the number of edges in G and M is the number of match edges in GR. The derivation of this bound is straightforward, as GR has E + M edges and RegularPT essentially performs depth-first search on GR. In real-world Java programs, E is typically O(N) (where N is the number of nodes in G), since variables are typically assigned very few times.


RegularPT(x)
 1  pointsTo ← ∅
 2  marked ← ∅
 3  worklist ← [ ]
 4  Propagate(x, marked, worklist)
 5  while worklist ≠ [ ]
 6    do w ← Pop(worklist)
 7       for each edge w ←new− o
 8         do AddTo(pointsTo, o)
 9       for each edge w ←assign− y
10         do Propagate(y, marked, worklist)
11       for each edge w ←match− y
12         do Propagate(y, marked, worklist)
13  return pointsTo

Propagate(x, marked, worklist)
1  if x ∉ marked
2    then AddTo(marked, x)
3         Push(worklist, x)

Figure 4.6: Pseudocode for the RegularPT algorithm.

While M can be O(N²) in the worst case, a space blowup can be avoided with an implicit representation of match edges in which each field is mapped to its corresponding read and write edges.
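The sketch below renders RegularPT in Java with such an implicit representation; the index maps (newSources, assignSources, fieldsReadInto, putfieldSourcesByField) are hypothetical names for the data structures, introduced only for this illustration.

import java.util.*;

final class RegularPTSketch {
    // Incoming-edge indexes over the pointer assignment graph (assumed names):
    Map<String, List<String>> newSources = new HashMap<>();     // w -> {o | o -new-> w}
    Map<String, List<String>> assignSources = new HashMap<>();  // w -> {y | y -assign-> w}
    // Implicit match edges: fields read into w, and putfield sources per field.
    Map<String, List<String>> fieldsReadInto = new HashMap<>(); // w -> {f | some p -getfield[f]-> w}
    Map<String, List<String>> putfieldSourcesByField = new HashMap<>(); // f -> {y | y -putfield[f]-> q}

    // Depth-first search over incoming new, assign, and (implicit) match
    // edges; the result over-approximates the points-to set of x.
    Set<String> pointsTo(String x) {
        Set<String> pts = new HashSet<>();
        Set<String> marked = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        propagate(x, marked, worklist);
        while (!worklist.isEmpty()) {
            String w = worklist.pop();
            pts.addAll(newSources.getOrDefault(w, List.of()));
            for (String y : assignSources.getOrDefault(w, List.of()))
                propagate(y, marked, worklist);
            // incoming match edges of w: sources of all putfield[f] edges for
            // every field f that is read into w
            for (String f : fieldsReadInto.getOrDefault(w, List.of()))
                for (String y : putfieldSourcesByField.getOrDefault(f, List.of()))
                    propagate(y, marked, worklist);
        }
        return pts;
    }

    private void propagate(String n, Set<String> marked, Deque<String> worklist) {
        if (marked.add(n)) worklist.push(n);  // visit each node at most once
    }
}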

4.2.3 Improving Precision with Types

The Java type system can be used to improve the precision of RegularPT, similarly to

previous work [LH03]. We say types A and B are incompatible if A is not a subtype of B and B is not a subtype of A.³ If variable x has declared type A, and variable y

has incompatible declared type B, then any flowsToReg-path ending at x that passes through y can safely be ignored, since the type system (both at compile time and

³If A is an interface, we must check that all classes implementing A are incompatible.


through runtime downcast checks) prohibits a flow of objects from y to x. Such flowsToReg-paths can exist in the graph because of downcasts. For example, given

statements Object o = a; b = (B)o, we have a flowsToReg b, even if a and b have incompatible declared types.⁴ When answering a query for variable x with declared

type A, RegularPT does not add nodes whose declared types are incompatible with A to the worklist. We can also decrease the number of added match edges using types. It is possible for the base variables of a getfield[f] edge and putfield[f] edge (e.g. nodes v and w in Figure 4.5) to have incompatible types, even though the f field is accessed on

both variables. For example, if we have class A with field f, class B extends A, and class C extends A, then B and C are incompatible in spite of both having field f. When the base variables of a getfield[f] edge v → y and a putfield[f] edge z → w have incompatible types, we can safely avoid adding a match edge from z to y, since there will never be an alias-path between v and w. Empirically, these type-based tests considerably improved precision on our benchmarks, which was consistent with results in past work on Java points-to analysis that used similar techniques [LH03].
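A minimal sketch of these type-based tests follows; TypeHierarchy is a placeholder for the actual class-hierarchy information, and the interface case of footnote 3 is omitted for brevity.

// Sketch of the type-based filtering tests used to improve precision.
interface TypeHierarchy {
    boolean isSubtypeOf(String a, String b);
}

final class TypeFilter {
    private final TypeHierarchy h;

    TypeFilter(TypeHierarchy h) { this.h = h; }

    // A and B are incompatible if neither is a subtype of the other.
    boolean incompatible(String a, String b) {
        return !h.isSubtypeOf(a, b) && !h.isSubtypeOf(b, a);
    }

    // Skip nodes whose declared type is incompatible with the query's type...
    boolean shouldVisit(String queryType, String nodeDeclaredType) {
        return !incompatible(queryType, nodeDeclaredType);
    }

    // ...and suppress a match edge when the getfield and putfield base
    // variables have incompatible declared types.
    boolean shouldAddMatchEdge(String getfieldBaseType, String putfieldBaseType) {
        return !incompatible(getfieldBaseType, putfieldBaseType);
    }
}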

4.3 Refinement

Here we show how the refinement technique described in §4.1 allows us to recover

most of the precision lost by approximating LF-reachability with RF-reachability. Our

evaluation shows the precision of RF-reachability to be relatively close to that of LF-reachability for the clients we tested (see §4.4). However, in cases where the precision

of RF-reachability is insufficient and time constraints are tight, our experiments show that our refinement technique is more precise than computing fully field-sensitive

⁴Opportunities for type filtering stem primarily from context- and flow-insensitivity in the points-to analysis.


LF-reachability. In this section, we first formulate the reachability problem that allows for refinement by removing match edges. We then give an algorithm RefinedRegularPT that solves this reachability problem through iterative refinement. In each iteration, RefinedRegularPT adds precision by removing more match edges, terminating when either the points-to analysis client is satisfied or all inspected match edges have been removed. When all relevant match edges are removed, RefinedRegularPT can provide

nearly the same precision as LF-reachability. Note that Chapter 5 describes a different refinement algorithm that could also be applied to the context-insensitive analysis considered in this chapter. The algorithm improves on RefinedRegularPT in some ways but makes different performance trade-offs (see §5.3.2 for more discussion). We present RefinedRegularPT since it may be the best algorithm for certain clients and it was used for the experimental evaluation of §4.4.

4.3.1 Refining through match edge removal

When match edges introduce imprecision relative to computing LF-reachability, refinement can recover the lost precision. Recall that the soundness of RF-reachability relies on the fact that whenever there is a “putfield[f] alias getfield[f]”-path from x to y in the original graph G, there is a match edge from x to y in GR. Our technique refines by removing some match edge m and checking for the existence of an alias-path between the base variables of the corresponding putfield[f] and getfield[f] edges.

RF-reachability is used to approximate the search for an alias-path during refinement, making use of the remaining match edges. The alias production (alias → flowsTō flowsTo, given in Figure 3.2) requires possibly long-running LF-reachability computations, as an alias-path does not contain match edges. Instead of searching for alias-paths, we refine a match edge by looking for aliasReg-paths, defined with a simple

extension of the RF grammar (see §4.2.1):

aliasReg → flowsToReḡ flowsToReg

flowsToReḡ → ( assign̄ | match̄ )∗ new̄

Finding aliasReg-paths instead of alias-paths during refinement is clearly sound, by

an argument similar to the soundness argument for RF-reachability. An aliasReg-path may itself contain match edges, and could therefore connect two nodes that are not connected by an alias-path. However, we can once again regain precision using refinement, this time refining the match edges on the aliasReg-path. Our iterative refinement algorithm is based on repeatedly discovering aliasReg-paths while refining match edges, and then refining the match edges on those aliasReg-paths. As an example, let us consider refining by removing the match edge z → y in Figure 4.5. To do so, we search for an aliasReg-path from w to v, and we find one via o2 labeled new̄ new match. This path includes the match edge w → v, which we can in turn refine by searching for an aliasReg-path from q to p. No such path exists, so the match edge w → v can be safely removed from the graph. This removal eliminates the only aliasReg-path from w to v, meaning the match edge z → y can also be safely removed.
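The aliasReg test itself reduces to two regular points-to traversals and a set intersection, as the following sketch shows; pointsToReg stands in for a RegularPT-style traversal (§4.2.2) over the current graph, with already-removed match edges excluded.

import java.util.HashSet;
import java.util.Set;

final class AliasRegCheck {
    // Placeholder for a RegularPT-style backward traversal over the graph.
    interface RegularPointsTo {
        Set<String> pointsToReg(String variable);
    }

    // An aliasReg-path from p to q exists iff the regular points-to sets of
    // p and q intersect: any common abstract location o is a witness.
    static boolean aliasRegPathExists(RegularPointsTo pt, String p, String q) {
        Set<String> common = new HashSet<>(pt.pointsToReg(p));
        common.retainAll(pt.pointsToReg(q));
        return !common.isEmpty();
    }
}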

4.3.2 RefinedRegularPT

Figure 4.7 gives pseudocode for the RefinedRegularPT algorithm, an extension of the RegularPT algorithm that performs iterative refinement based on the needs of the points-to analysis client (previously described in §4.1.2). The RefinedRegularPT procedure takes as input a node x and returns true if the query has been positively

getfieldsToRefine : Set of getfield[f] edges
getfieldsSeen : Set of getfield[f] edges

DoTraversal(x)
 1  pointsTo ← ∅
 2  marked ← ∅
 3  worklist ← [ ]
 4  Propagate(x, marked, worklist)
 5  while worklist ≠ [ ]
 6    do w ← Pop(worklist)
 7       for each edge w ←new− o
 8         do AddTo(pointsTo, o)
 9       for each edge w ←assign− y
10         do Propagate(y, marked, worklist)
11       for each edge e = w ←getfield[f]− p
12         do if e ∉ getfieldsToRefine
13              then AddTo(getfieldsSeen, e)
14                   for each edge q ←putfield[f]− y
15                     do Propagate(y, marked, worklist)
16              else RemoveFrom(getfieldsToRefine, e)
17                   ptOfP ← DoTraversal(p)
18                   for each edge q ←putfield[f]− y
19                     do ptOfQ ← DoTraversal(q)
20                        if ptOfP ∩ ptOfQ ≠ ∅
21                          then Propagate(y, marked, worklist)
22                   AddTo(getfieldsToRefine, e)
23  return pointsTo

RefinedRegularPT(x)
1  getfieldsToRefine ← ∅
2  while true
3    do getfieldsSeen ← ∅
4       pointsTo ← DoTraversal(x)
5       if PositivelyAnswered(pointsTo)
6         then return true
7       else if getfieldsSeen ⊆ getfieldsToRefine
8         then return false
9       else getfieldsToRefine ← getfieldsToRefine ∪ getfieldsSeen

Figure 4.7: Pseudocode for the RefinedRegularPT algorithm.


answered, and false otherwise.⁵ The DoTraversal procedure is similar in function to the RegularPT procedure of Figure 4.6, computing a points-to set for x through a depth-first traversal of the graph; each call to DoTraversal computes a new points-to set. The Propagate procedure is identical to that of Figure 4.6. The definitions of Pop, AddTo, and RemoveFrom are straightforward and we elide them for clarity. Note that match edges are not explicitly represented in the pseudocode of Figure 4.7. Instead, an implicit representation is used, with the assumption that data structures exist for finding reads and writes to any field in constant time (our implementation builds such data structures). Line 14 and line 15 in Figure 4.7 do the equivalent of traversing all incoming match edges for a node, by finding all putfield[f] edges matching an incoming getfield[f] edge and propagating their sources. Refinement Lines 16-22 of Figure 4.7 perform refinement of match edges. The pseudocode checks for an aliasReg-path between a p −getfield[f]→ w edge and a y −putfield[f]→ q edge by finding points-to sets pt(p) and pt(q) (lines 17 and 19), and then checking if pt(p) ∩ pt(q) ≠ ∅. The points-to sets include nodes that are reachable along flowsToReg-paths from p and q. Therefore, for any o ∈ pt(p) ∩ pt(q), an aliasReg-path from p to q through o can be constructed. For a given getfield[f] edge p → w, there may be many putfield[f] edges on the same field, and therefore many incoming match edges to w. Instead of refining each such match edge individually, RefinedRegularPT refines them together, thereby avoiding redundant computation of pt(p) for each edge. We have found an alternate strategy for refining match edges to be empirically more efficient in certain cases. The alternate strategy first finds pt(p), but then finds the set of nodes Q that are reachable along flowsToReg-paths from nodes in pt(p); if q ∈ Q, an aliasReg-path from p to q clearly exists. We observed that in practice

⁵For positively answered queries, our implementation also makes the computed points-to set available to the client.


this alternate strategy traverses fewer nodes when there are more than two matching putfield[f] edges for a getfield[f] edge, since it does not compute a points-to set for the target of each putfield[f] edge. Our implementation employs the appropriate strategy based on the number of matching putfield[f] edges. Choosing match edges to refine RefinedRegularPT focuses analysis effort on match edges that have already been observed to possibly reflect imprecise flow. RefinedRegularPT first tries to answer a query on some variable x without any refinement, just

using RF-reachability. Consider the case where the points-to set for x found using

RF-reachability cannot positively answer the query. Let K be the set of match edges on any RF-path from some abstract location o to x. If using RF-reachability to find x’s points-to set is less precise than LF-reachability, then K must contain at least one invalid match edge (i.e., a match edge with no corresponding alias-path). In its next pass, RefinedRegularPT only refines match edges in K, aiming to positively answer the query with this (typically) small amount of refinement; we have found this amount of refinement to often be sufficient in practice. Proving that some match edge in K is invalid may however require refinement of match edges outside of K, leading to an iterative refinement process. In the pseudocode of Figure 4.7, we maintain a set getfieldsToRefine, containing the getfield[f] edges whose corresponding match edges should be refined, and a set getfieldsSeen, containing getfield[f] edges whose match edges were traversed but not refined in the current iteration of the algorithm. getfieldsToRefine is maintained across refinement iterations for a single query, while getfieldsSeen is cleared on each refinement iteration (line 3 of RefinedRegularPT). If the points-to result computed by an iteration of the algorithm is sufficient for a positively answered query, we terminate and return true (lines 5-6); the PositivelyAnswered procedure is provided by the client. Otherwise, we add the getfield[f] edges in getfieldsSeen to getfieldsToRefine (line 9), and begin a new iteration. When we cannot add any new


getfield[f] edges to getfieldsToRefine, we give up on positively answering the query and return false (lines 7-8). While refining match edges corresponding to a getfield[f] edge e, we remove e from getfieldsToRefine (line 16 of Figure 4.7). To see why, consider refining a match edge m for a getfield[next] edge e = x → x, corresponding to the statement

x = x.next. The recursive call to DoTraversal at line 17 will pass x as its argument, and if e remained in getfieldsToRefine, RefinedRegularPT would again try to refine m, leading to an infinite loop. Removing a getfield[f] edge from getfieldsToRefine during refinement of a corresponding match edge m can lead to imprecision. With the getfield[f] edge removed, RefinedRegularPT may find aliasReg-paths during refinement of m that include m itself, as m will not be refined again when encountered. If all aliasReg-paths discovered during refinement of m include m, then m is still invalid, as m cannot be used to justify its own existence. However, in such cases RefinedRegularPT is unable to show that m is invalid, losing precision relative to a fully field-sensitive analysis. In general, if RefinedRegularPT refines all match edges, it may compute a less precise result than LF-reachability in cases where GR contains cyclic paths that include field dereferences, e.g., the cyclic getfield[next] edge x → x for x = x.next. We have not found the precision loss due to this aspect of our algorithm to be significant in practice. In the worst case, a single iteration of RefinedRegularPT may require O(M^M E) time, with E and M defined as in §4.2.2. This worst case occurs when all match edges are being refined, and refining one match edge requires refining M − 1 other match edges, each of which requires refining M − 2 match edges, and so on. We have not encountered this worst-case behavior in practice, and since we envision clients using strict time budgets and early termination with our algorithms (see §4.1.2), it is not a practical concern.
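The bookkeeping of the outer loop can be summarized as follows; DoTraversal and the client's PositivelyAnswered test are left abstract, and all names here are ours, not those of the implementation.

import java.util.HashSet;
import java.util.Set;

final class RefinedRegularPTLoop {
    interface GetfieldEdge {}

    interface Analysis {
        // One RF-reachability pass; refines the match edges of getfield[f]
        // edges in toRefine, and records edges whose match edges were
        // traversed but not refined in seen.
        Set<String> doTraversal(String x, Set<GetfieldEdge> toRefine,
                                Set<GetfieldEdge> seen);
        boolean positivelyAnswered(Set<String> pointsTo);
    }

    static boolean query(Analysis a, String x) {
        Set<GetfieldEdge> toRefine = new HashSet<>();
        while (true) {
            Set<GetfieldEdge> seen = new HashSet<>();      // cleared each pass
            Set<String> pointsTo = a.doTraversal(x, toRefine, seen);
            if (a.positivelyAnswered(pointsTo)) return true;
            if (toRefine.containsAll(seen)) return false;  // nothing new to refine
            toRefine.addAll(seen);                         // refine these next pass
        }
    }
}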


4.4 Evaluation

We evaluate the behavior of RegularPT and RefinedRegularPT with two clients and several benchmarks. Our evaluation validates the following experimental hypotheses about the algorithms:

The algorithms are precise We show that RegularPT has precision close to that of field-sensitive Andersen’s analysis. It resolves more than 89% of the virtual calls that field-sensitive Andersen’s analysis can across our benchmarks, and more than 96% of those virtual calls that are not in dead code. RefinedRegularPT provides more precision than RegularPT, resolving nearly all of the virtual calls in live code that field-sensitive Andersen’s can. We also show that an intraprocedural version of RegularPT resolves far fewer calls, indicating that our results cannot be obtained with purely intraprocedural analysis.

Precision retained under early termination We show that RegularPT and RefinedRegularPT retain almost all their precision when run with small time budgets and early termination. For nearly all benchmarks, the two algorithms can resolve 90% of the virtual calls that field-sensitive Andersen’s can within a 50 node traversal limit (2ms / query). We show that an adaptation of a previously presented demand-driven algorithm [HT01a] that uses full field-sensitivity does not perform nearly as well within a small budget. RegularPT and RefinedRegularPT also answer all virtual call and aliasing queries in hot methods of the SPEC benchmarks as precisely as field-sensitive Andersen’s analysis, requiring less than 108 nodes of traversal per query.

The algorithms meet our performance goals Since our algorithms perform well with small time budgets, timeouts can be used to ensure good performance while maintaining precise results. For example, we can answer all virtual call queries


in hot methods of the javac benchmark 16x faster than exhaustive field-based Andersen’s analysis and 34x faster than exhaustive field-sensitive Andersen’s analysis. The memory consumption of RegularPT and RefinedRegularPT is also much less than that of an exhaustive algorithm.

4.4.1 Experimental Configuration

Implementation We implemented our analyses using the Soot 2.2.1 [VRHS+99] and SPARK [LH03] frameworks. We re-used the pointer assignment graph built by SPARK, thereby leveraging their existing analyses for determining reachable code. To handle method calls, we configured SPARK to build a conservative call graph using a class-hierarchy analysis [DGC95, BS96]. We used assign edges rather than param[i], return[i], and assignglobal edges of §3.2.1 to model parameter and return value flow and assignments to globals, as our analysis is context-insensitive. A best effort was made to handle reflective constructs and native methods, as discussed previously in §3.2.3. To compare against the state-of-the-art, we also implemented a demand-driven algorithm FullFS that uses the same techniques as the algorithm in [HT01a], but works for Java pointer constructs. The basic idea of the algorithm is to find points-to sets for only the variables necessary to answer a top-level points-to query. The points-to sets are found by iterating over relevant statements, applying inference rules to introduce new points-to queries for relevant variables and to propagate abstract locations to queried variables. The algorithm is similar to exhaustive propagation algorithms for Java [LH03, WL04], except that it only propagates the abstract locations relevant to the query. We chose to make FullFS treat fields with full field-sensitivity, thereby

computing LF-reachability. The C algorithm in [HT01a] is field-sensitive for the

unnamed field accessed by the C * operator, but is field-based for structure fields.


Algorithm         Description
RegularPT         See §4.2.2
RefinedRegularPT  See §4.3.2
FullFS            Adaptation of algorithm in [HT01a]; see §4.6
ExhaustiveFB      Exhaustive field-based Andersen's from SPARK [LH03]
ExhaustiveFS      Exhaustive field-sensitive Andersen's from SPARK [LH03]

Table 4.1: Descriptions of points-to analysis algorithms used in our experiments.

Since the * operator is so frequently used in C programs, full field-sensitivity seemed to be the analogous handling of Java fields. Details of FullFS appear in §4.6. Table 4.1 lists all points-to analysis algorithms used in our experiments. ExhaustiveFB and ExhaustiveFS are efficient implementations of exhaustive field-based and field-sensitive Andersen’s analysis respectively, as provided by SPARK [LH03]; to the best of our knowledge, their speed is competitive with any published implementation of Andersen’s analysis for Java. All experiments were run on a machine with a Pentium 4 Xeon 2.4GHz processor and 2GB RAM, running Redhat 9. We used the Java 1.4.2 JVM as the underlying VM for our experiments, but we analyzed the 1.3.1_01 libraries, to be consistent with [LH03] and because Soot provides models for the native methods in those libraries. Benchmarks and Clients The characteristics of our benchmarks are presented in Table 4.2. We used the SPEC JVM98 benchmark suite, two benchmarks from the Ashes suite [Ash], soot and sablecc, and jedit [jEd], an open-source text editor. Subsets of these benchmarks were also utilized in previous Java pointer analysis studies [LPH01, RMR01, WL02, LH03, WL04]. The “# Methods” column reports the number of methods found reachable by


Benchmark  # Methods  # Vars   # Stmts
soot       6089       51853    146292
compress   12244      95463    269289
jess       12878      101332   289514
raytrace   12378      96873    271980
db         12249      95665    270571
javac      13385      107753   318411
mpeg       12456      98458    276062
jack       12502      98579    278965
sablecc    14065      110292   352338
jedit      17510      144062   412835

Table 4.2: Information about our benchmarks.

SPARK’s class-hierarchy analysis (these numbers differ from those in [LH03] due to improvements in the handling of reflection in Soot 2.2.1). “# Vars” is the number of variables (locals or static fields) in the program, and “# Stmts” is the number of assignment statements (the number includes the temporary variables and assignments introduced to make each assignment one of our simple forms). Our largest benchmark

jedit is comparable in size with the largest benchmarks used in other pointer analysis studies [BLQ+03, WL04]. We evaluated our analyses using two clients, virtcall and localalias. virtcall attempts to resolve virtual calls to a single target by finding the points-to set of the call’s receiver. We only consider calls where cheaper type-based techniques cannot resolve the call. localalias attempts to disambiguate pairs of local variables in methods that are potentially involved in conflicting field reads / writes. For variables x and y and field f, we will query x and y if we see writes to both x.f and y.f or a write

(read) of x.f and a read (write) of y.f. This information can be useful for a variety of optimizations, including eliminating redundant loads and dead stores [FKS00]. For the SPEC benchmarks, we checked virtcall and localalias queries for the hot methods of each benchmark, those methods that execute frequently at runtime. We


found the hot methods by running the benchmarks through Jikes RVM [AAB+00], and observing which methods get recompiled with the optimizing compiler (at any optimization level) in its adaptive optimization system. This experiment reflects the queries likely to be raised by an optimizing JIT compiler. To simulate how inlining may affect our analysis results, we modified our graph to inline all getter (e.g.

Obj getFoo() { return this.foo; }) and setter methods. This transformation essentially adds context-sensitivity for getter and setter methods, and possibly makes analysis more difficult for RegularPT, since there are more putfield and getfield statements and potentially more invalid match edges. We also ran the virtcall client for all virtual calls in the program (including the Java libraries) with no inlining, to reflect an IDE client where a developer wishes to navigate to the invoked method for some virtual call. For the virtcall client, a query is positively answered (see §4.1.2) when the points-to analysis shows that the call has 0 or 1 targets. A virtual call can have 0 targets if it resides in a method that is included in the initial call graph but that the points-to analysis can prove is dead. The localalias client actually raises two points-to queries, as it is checking if two variables can point to some common object. Together, these queries are positively answered if they show that the queried variables cannot be aliased. Handling the paired queries of localalias in RefinedRegularPT required minor, straightforward modifications to its refinement loop. When measuring the precision of our algorithms, we used field-sensitive Andersen’s analysis as a “gold standard,” i.e., we measured how much of the precision of computing field-sensitive Andersen’s could be obtained quickly by our demand-driven algorithms. In the rest of this section, we refer to the set of queries positively answered by field-sensitive Andersen’s as the feasible queries, since these are the only queries that our demand algorithms can hope to positively answer. Table 4.3 shows the total number of virtual calls in our benchmarks (excluding those resolvable


Benchmark  Virt  FeasVirt
soot       2812  1051
compress   5428  1801
jess       5540  1861
raytrace   5438  1803
db         5450  1819
javac      6334  1952
mpeg       5451  1800
jack       6022  2370
sablecc    6101  1898
jedit      7480  2612

Table 4.3: Number of virtual calls unresolvable by types in each benchmark (the Virt column), and the number of such calls resolvable by field-sensitive Andersen’s (the FeasVirt column).

with types alone), and the number of those that are feasible queries (virtual calls that field-sensitive Andersen’s resolved). The exclusion of calls resolvable with types alone makes the field-sensitive Andersen’s analysis look less precise than in previous work [LH03], since we exclude many easy queries. Also note that some of the calls unresolved by field-sensitive Andersen’s actually have multiple targets at runtime; these calls are unresolvable by any analysis without more precision (e.g., context sensitivity). Table 4.4 gives data on our queries in hot methods. Although the number of queries raised is small, their importance is potentially very high since they all occur in hot methods. For example, if an alias query is positively answered, it may allow for a load to be eliminated in frequently executing code, which could have a significant impact on performance.


Benchmark  Hot  Virt  FeasVirt  Alias  FeasAlias
compress   7    0     0         0      0
jess       28   9     3         5      0
raytrace   23   4     0         6      4
db         4    5     5         9      0
javac      95   115   30        68     10
mpeg       45   2     0         1      0
jack       22   12    9         3      0

Table 4.4: Information on virtcall and localalias queries in hot methods. Hot gives the number of hot methods. Virt gives the number of virtual calls in hot methods, and FeasVirt gives the number of those calls that can be resolved by field-sensitive Andersen’s (the number of feasible queries). Alias and FeasAlias are analogous, but for localalias queries. We did not collect hot method information for the other three benchmarks because our experimental infrastructure did not support it.

4.4.2 Experimental Results

Precision Table 4.5 shows the results of measuring the precision of our algorithms for the virtcall client. The table shows the percentage of feasible virtcall queries that an intraprocedural field-based analysis, RegularPT (a field-based analysis), and RefinedRegularPT also positively answered. RefinedRegularPT can take a very long time to answer some queries, so we timed each query out at 5 seconds, well above the tolerable time budgets of our target clients. RegularPT positively answered more than 89% of feasible queries in all cases, and more than 96% if restricted to code that could not be proven dead by the analysis (since it contains a virtual call with 0 targets). These results are consistent with previous work studying field-based analysis [LPH01, LH03]. RefinedRegularPT could answer nearly all feasible queries. We show below that nearly all of the precision of RegularPT and RefinedRegularPT is preserved under early termination of queries. The purely intraprocedural field-based analysis did much worse than RegularPT, showing that virtual calls that cannot be resolved with the type system usually cannot be resolved with a purely local analysis.


Benchmark   Intra (Live)   Reg (Live)    RefReg (Live)
soot        18.4 (16.0)    94.1 (98.5)   96.9 (99.8)
compress    26.0 (23.1)    89.1 (96.4)   93.7 (98.9)
jess        25.4 (22.5)    89.4 (96.6)   93.9 (99.0)
raytrace    26.1 (23.1)    89.1 (96.4)   93.7 (98.9)
db          25.7 (22.7)    89.3 (96.5)   93.7 (98.9)
javac       25.3 (22.3)    89.9 (96.7)   94.1 (98.8)
mpeg        26.0 (23.1)    89.1 (96.4)   94.4 (98.9)
jack        27.0 (25.1)    91.8 (97.5)   95.2 (99.2)
sablecc     23.9 (20.6)    89.7 (96.3)   93.9 (98.8)
jedit       21.7 (19.0)    92.7 (99.1)   97.2 (99.9)

Table 4.5: RegularPT and RefinedRegularPT have nearly the precision of field-sensitive Andersen’s. The table gives the percentage of virtcall queries positively answered by an intraprocedural field-based analysis (the Intra column), RegularPT (the Reg column), and RefinedRegularPT with a 5 second time limit per query (the RefReg column), as a percentage of those answered positively by field-sensitive Andersen’s. The parenthesized Live numbers indicate the result if limited to queries in code that cannot be proven dead by the points-to analysis.

Figure 4.8 shows some code adapted from the jedit benchmark that illustrates why RegularPT had nearly the precision of field-sensitive Andersen's. Consider a query to resolve the call to remove() on the propTable variable in setProperty(); possible targets are in Hashtable or one of its subclasses. RegularPT handles the query by immediately traversing across the incoming match edge from the source of the putfield[properties] edge corresponding to the field write in the Buffer constructor. It then finds a new edge from a Hashtable abstract location and resolves the call to the implementation of remove() in the Hashtable class.

This pattern of a field being written only once in a constructor or other initialization method occurs frequently in Java programs, and RegularPT handles it well, as it immediately traverses to the write upon encountering any read of the field. In general, the precision of RegularPT for resolving virtual calls is less than that of a field-sensitive algorithm only when the algorithm encounters a field read x = y.f such that objects of multiple types are written into the f field (and such types lead to multiple targets for the call), and not all such objects can be written into y's f field.

class Buffer {
  private Hashtable properties;

  public Buffer() {
    this.properties = new Hashtable();
  }

  public void setProperty(String name, Object val) {
    Hashtable propTable = this.properties;
    propTable.remove(name);
    ...
    propTable.put(name, val);
  }
}

Figure 4.8: A typical example where RegularPT succeeds, but FullFS does too much work, derived from code in the jedit benchmark.

Such polymorphic uses of fields occur relatively rarely, and hence the precision of RegularPT for resolving virtual calls is close to that of field-sensitive Andersen's.

Precision under early termination We evaluated the precision of RegularPT, RefinedRegularPT, and FullFS under early termination with varying time budgets, to simulate the strict time constraints of IDEs and JIT compilers. Recall from §4.1.2 that under early termination, if a points-to analysis exceeds some time budget for answering a query for variable x, the analysis is terminated and the client is told that x can point to any location. To simplify implementation, instead of using actual timeouts to enforce budgets, we limit the number of nodes that can be traversed by a particular query. Note that we count a node as traversed each time it is removed from the worklists seen in Figure 4.6 and Figure 4.7; RefinedRegularPT can visit a node multiple times, and each visit is counted against the traversal budget.

To study behavior under early termination, we computed the cumulative distribution of positively answered queries vs. node traversal budget for each of our clients, benchmarks, and algorithms.


[Figure: two plots of the percentage of feasible queries resolved (y axis) vs. node traversal budget (x axis) for RefinedRegularPT, RegularPT, and FullFS; an annotation marks that a 50 node budget already resolves 90%.]

Figure 4.9: RegularPT and RefinedRegularPT performed very well under early termination. We give the cumulative distribution of the percentage of feasible queries positively answered vs. node traversal budget for the virtcall client on jedit, for all three algorithms. The top graph shows the distribution from 0 to 2000 nodes traversed, while the bottom graph focuses on 0 to 100 nodes traversed. The distributions for other benchmarks look very similar.

Figure 4.9 shows the cumulative distributions for jedit, using the virtcall client to query all virtual calls in the program and library; the distributions for other benchmarks look very similar. The vertical axis is the percentage of feasible queries that were positively answered, and the horizontal axis is the amount of allowed traversal. Recall from Table 4.5 that the maximum possible percentage that RegularPT can reach is 92.7%, and 97.2% for RefinedRegularPT. FullFS should reach 100% if allowed enough traversal.


RegularPT and RefinedRegularPT both performed very well under early termination, retaining most of their precision even under tight time budgets. Figure 4.9 shows that even with a 50 node traversal limit (where queries take 2ms or less), both algorithms positively answered more than 90% of feasible queries. RegularPT and RefinedRegularPT behaved similarly since the first iteration of RefinedRegularPT is exactly RegularPT, and it therefore positively answered all queries that RegularPT did with the same amount of traversal. RegularPT reached its maximum percentage of 92.7% with 182 nodes of traversal; for jedit, traversing more than this amount is pointless for RegularPT. A similar result was seen across benchmarks, with the largest amount of traversal required to get all positive answers with RegularPT being 259 nodes for javac. Traversing 250 nodes takes under 5ms with our untuned implementation.

RefinedRegularPT resolved many of the queries that RegularPT could not as the traversal limits grew, reaching 96.5% of feasible queries with a traversal budget of 1250 nodes. Traversing 1250 nodes took 20ms or less with our implementation. ExhaustiveFS (described in Table 4.1) took almost 30 seconds to analyze jedit, in which time RefinedRegularPT could answer 1500 queries with a 1250 node budget. Therefore, RefinedRegularPT could be very useful in an application that required more precision than RegularPT and only needed pointer information for a subset of the program, perhaps constructing a call graph for part of the libraries to aid in program understanding.

FullFS did not perform well under early termination. The plateau for FullFS was reached at a 522 node limit, and at this point it had positively answered only 47.2% of feasible queries. The algorithm required more than 30000 nodes to positively answer any more queries, and many queries required several hundred thousand nodes of traversal, taking more than 10 seconds of analysis time (sometimes longer than the time required to run ExhaustiveFS).
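The budget enforcement described above reduces to counting worklist removals. A minimal sketch of a budgeted traversal follows; Node and its successors() method are hypothetical, and a real implementation would also track visited analysis states and record points-to facts as it goes.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch of early termination via a node traversal budget: each
    // removal from the worklist counts against the budget, matching
    // how we count traversed nodes in our experiments.
    class BudgetedTraversal {
      interface Node { Iterable<Node> successors(); }

      // Returns true if the query completed within the budget; on
      // budget exhaustion the client must conservatively assume the
      // queried variable can point to any location.
      static boolean traverseWithinBudget(Node start, int budget) {
        Deque<Node> worklist = new ArrayDeque<>();
        worklist.add(start);
        int traversed = 0;
        while (!worklist.isEmpty()) {
          Node n = worklist.remove();
          if (++traversed > budget)
            return false; // early termination: give up on this query
          for (Node succ : n.successors())
            worklist.add(succ);
          // ... process n, recording points-to facts ...
        }
        return true;
      }
    }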


Budget       Benchmark   Reg    RefReg   FullFS
50 nodes     soot        93.7   93.7     46.6
             compress    88.7   89.7     41.6
             jess        89.1   89.9     41.0
             raytrace    88.8   89.6     41.8
             db          88.9   89.7     42.3
             javac       88.9   89.3     42.5
             mpeg        88.8   89.0     41.7
             jack        91.5   92.2     44.2
             sablecc     89.4   89.6     41.0
             jedit       91.4   91.6     42.1
1250 nodes   soot        94.1   95.9     55.7
             compress    89.1   92.9     50.3
             jess        89.4   93.2     49.3
             raytrace    89.1   93.0     50.5
             db          89.3   93.0     50.9
             javac       89.9   93.3     51.9
             mpeg        89.1   92.9     50.3
             jack        91.8   94.6     59.3
             sablecc     89.7   92.1     49.9
             jedit       92.7   96.4     47.2

Table 4.6: Precision of the demand-driven algorithms with traversal budgets of 50 nodes and 1250 nodes. The columns give the percentage of feasible virtcall queries positively answered by RegularPT (the Reg column), RefinedRegularPT (the RefReg column), and FullFS (the FullFS column).


Benchmark   FeasVirt   Reg   FullFS
jess           3         3     2
db             5         5     5
javac         30        30    15
jack           9         9     9

Table 4.7: Results for virtcall queries in hot methods, showing that RegularPT positively answers the same number of queries as field-sensitive Andersen’s. The FeasVirt column gives the number of feasible queries (repeated from Table 4.4), the Reg column the number resolved by RegularPT with a 250 node traversal budget, and the FullFS column the number resolved by FullFS with a 500 node traversal budget.

Benchmark   FeasAlias   Reg   FullFS
raytrace        4         4     4
javac          10        10     1

Table 4.8: Results for localalias queries in hot methods, showing RegularPT matching field-sensitive Andersen’s. FeasAlias gives the number of queries resolved by field-sensitive Andersen’s (repeated from Table 4.4). Reg gives the number resolved by RegularPT, and FullFS the number resolved by FullFS, with the same traversal budgets used for Table 4.7.

Table 4.6 shows the precision of RegularPT, RefinedRegularPT, and FullFS for the virtcall client on all our benchmarks, with traversal budgets of 50 nodes (2ms per query) and 1250 nodes (20ms per query). With a 50 node traversal budget, the precision of RegularPT and RefinedRegularPT was almost identical, 88.8-93.7% of field-sensitive Andersen's. A traversal budget of 1250 nodes allowed RefinedRegularPT to positively answer 1.8-3.8% more queries than RegularPT, relative to field-sensitive Andersen's. FullFS could not answer more than 59.3% of feasible queries with a 1250 node traversal budget, and as with jedit, a much larger traversal budget (30000 nodes or more) was required on all benchmarks to substantially improve this precision, well beyond the constraints of our target environments.

Table 4.7 and Table 4.8 give results for virtcall and localalias queries in hot methods; benchmarks with 0 queries are not listed. For this experiment, we ran RegularPT


with a traversal budget of 250 nodes, and FullFS with a traversal budget of 500 nodes. We gave FullFS a larger budget since in our implementation it seemed to process nodes faster, and with these budgets the time allowed for each query is roughly even for the two algorithms (about 5ms per query). RegularPT positively answered all feasible queries in the hot methods; there is no need to show RefinedRegularPT since its results were exactly the same. FullFS did well on some benchmarks, but even with the larger traversal budget, it could not positively answer many queries in javac, the largest of the benchmarks and the one with the most queries. RegularPT traversed 108 nodes or less for all positively answered queries, and therefore could have been run with a smaller traversal budget with no precision penalty.

Our results lead to the slightly counter-intuitive conclusion that within tight time budgets, greater precision can be obtained with an overall less precise algorithm. Consider again the example of Figure 4.8. Answering the virtcall query on the propTable.remove() call with full field sensitivity, as FullFS does, leads to extra work, since the object written to the properties field in the constructor of Buffer will certainly flow to any read of the field, and hence all match edges involving properties cannot be removed by refinement. Furthermore, this extra work can increase costs substantially, as RegularPT only requires traversing 3 nodes to answer this query. The example reflects a common case, and illustrates that the greater precision of our algorithms is due both to the relatively small difference in precision between field-based and field-sensitive analysis and to the fact that RegularPT often requires very little traversal to answer a query.

We inspected by hand several of the feasible virtual call queries that RefinedRegularPT could not quickly answer, and found that the reasons for their difficulty were independent of field sensitivity. Some queries involved a parameter of a function nested deeply in the libraries that gets called from many places; avoiding long traversals in these cases would be difficult, and the likelihood of resolving such calls to a single target is lower.


[Figure: three plots of the percentage of feasible queries resolved (y axis) vs. analysis time in seconds (x axis) for ExhaustiveFS, ExhaustiveFB, RegularPT, and FullFS.]

Figure 4.10: Complete time/precision comparison of several demand algorithm configurations and exhaustive algorithms on virtcall queries. The x axis is analysis time in seconds, and the y axis is the percentage of feasible queries that were positively answered. For RegularPT and FullFS the data points left to right are for traversal budgets of 50 nodes, 100 nodes, 200 nodes, and 500 nodes. Times are also given for ExhaustiveFB and ExhaustiveFS (described in Table 4.1). From top to bottom, the graphs show virtual calls in hot methods in javac (115 queries), all virtual calls in javac application code (908 queries), and all virtual calls in jedit application code (1210 queries).


In other cases, imprecision in the conservative call graph led to traversals of excess methods; refining this call graph on-the-fly is possible (and is done by the algorithm described in Chapter 5), but it is unclear if the extra work involved would be worthwhile with strict time constraints. Context insensitivity also caused excessive traversal; when the traversal entered a call to ArrayList.get(), for example, it exited at all of the many call sites of the method. Context-sensitive analysis could be used to avoid this issue, but again, this requires extra work that may not be beneficial with very tight time budgets.

Performance Since we run our demand algorithms with early termination, time performance can be adjusted depending on how much precision is necessary for the client. Figure 4.10 shows that high precision and performance can be obtained with RegularPT through aggressive early termination. We ran experiments where we measured the total time required for RegularPT and FullFS to process all virtcall queries in hot methods and all virtcall queries in application code (i.e., excluding the Java libraries). The latter experiment simulated some potential program understanding functionality in an IDE where a call graph is built for the application. We compared these total running times to the time required to run ExhaustiveFB and ExhaustiveFS, described in Table 4.1. We show graphs of running time vs. percentage of feasible queries resolved for three representative benchmarks and clients: calls in hot methods in javac, calls in application code in javac, and calls in application code in jedit. In all cases, the precision of a field-based analysis was nearly exactly that of a field-sensitive analysis, so we excluded RefinedRegularPT from the graphs (its performance is almost identical to that of RegularPT).

RegularPT gave significant speedups over ExhaustiveFB in all cases without sacrificing precision. For javac virtual calls in hot methods, RegularPT had the same precision as a field-sensitive analysis with a running time of .46 seconds, a 34x speedup over the 16 seconds for ExhaustiveFS and a 16x speedup over the 7.4 seconds for ExhaustiveFB. For virtual calls in application code, speedups over ExhaustiveFS (ExhaustiveFB) ranged from 3.62x (1.68x) for javac (the smallest speedup in our benchmarks) to 20.4x (8.8x) for jedit. FullFS failed to provide nearly the same precision within the same time budgets given to RegularPT.

The memory consumption of our algorithms was quite reasonable. Given a budget of 250 nodes of traversal, neither RegularPT nor RefinedRegularPT allocated more than 50 kilobytes of memory across our benchmarks, and our implementation could be made more memory efficient. In contrast, even with an efficient BDD representation, exhaustive field-sensitive Andersen's analysis takes 23 MB for javac and 28 MB for jedit [BLQ+03], since all points-to sets need to be represented.

Other Factors We have evaluated our algorithms in a static environment, where all of the benchmark code is available and the graph representation of pointer assignments is built up-front. In a JIT compiler or IDE, such representations may not be readily available. Since our graph representation essentially matches the assignment statements in the program, it can be constructed efficiently. In an IDE, the representation for a method can simply be rebuilt from scratch after its code changes; no complex incremental update is required. Such rebuilding can occur in the background while the user continues working. In a JIT compiler, the graph can be constructed immediately from an intermediate representation. If a method has no intermediate representation because it has only been interpreted, its graph can be constructed at query time, and the cost of building the graph can be factored into early termination heuristics.

Because new code may become available after analysis in our environments (via editing in an IDE or dynamic class loading in a JIT compiler), analysis results may become invalid. If any analysis results are cached in an IDE, they can simply be flushed and the corresponding queries re-run. A JIT compiler presents a greater challenge, since some (possibly running) code may have already been optimized based

on previous analysis results. There are three ways in which dynamic loading of class C can invalidate the results of some query q:

1. C provides a new target for some method invocation that was traversed in answering q. This could affect analysis results by returning some new value that was not previously possible, for example.

2. C calls some existing method m whose parameters were traversed in answering q (since new values could now be passed into m).

3. C has a putfield to a field f, and a getfield on f was encountered when answering q. This new putfield could lead to new match edges that must be considered.

Since our algorithms traverse a representation close to the statements of the program, they could potentially keep track of which statements could be affected by dynamic class loading, and then add appropriate guards to such statements. If a guard later failed due to dynamic class loading, the optimized code could be invalidated, using on-stack replacement [CU91, FQ03] if necessary. See [HDDH07] for an enumeration of the issues related to running pointer analysis in a JIT compiler.
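As a rough illustration of this idea (not part of our implementation), a JIT could record, per query, the program elements whose extension by a newly loaded class triggers invalidation, mirroring the three cases above. All names here are hypothetical:

    import java.util.HashSet;
    import java.util.Set;

    // Sketch of per-query dependence tracking for dynamic class loading;
    // invalidatedBy() checks the three cases enumerated above.
    class InvalidationTracker {
      static class Query {
        Set<String> traversedCallSigs = new HashSet<>(); // case 1: invocations traversed
        Set<String> traversedParamsOf = new HashSet<>(); // case 2: methods whose params were traversed
        Set<String> readFields = new HashSet<>();        // case 3: fields with a getfield traversed
      }

      interface LoadedClass {
        boolean overrides(String methodSig);
        boolean calls(String methodName);
        boolean writesField(String fieldName);
      }

      static boolean invalidatedBy(Query q, LoadedClass c) {
        for (String sig : q.traversedCallSigs)
          if (c.overrides(sig)) return true;   // new dispatch target
        for (String m : q.traversedParamsOf)
          if (c.calls(m)) return true;         // new values may flow into m
        for (String f : q.readFields)
          if (c.writesField(f)) return true;   // new putfield, new match edges
        return false;
      }
    }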

4.5 Java vs. C

Given the effectiveness of our refinement technique for Java points-to analysis, an interesting question is whether the same technique could be used to improve C points-to analysis. Unfortunately, due to C's less restricted constructs for pointer accesses, a straightforward use of our refinement algorithm for C pointer analysis is unlikely to yield good results, since the natural initial approximation would match any heap read and write. Here we discuss the reasons why our refinement technique cannot easily be applied to C; an effective adaptation is future work.


As formulated previously by Melski and Reps [Rep98], Andersen's analysis for C does not seem to be a balanced parentheses problem.6 Recall that our refinement technique uses the balanced parentheses of heap reads and writes in Java to construct an initial approximation to be refined as needed (i.e., the graph representation with all possible match edges). For C, the statements most closely related to Java field reads and writes are pointer reads and writes through the * operator, respectively of the form x = *y and *x = y. However, due to the address-of operator &, these reads and writes can create a points-to relation without being balanced, for example in the following program:

z = &p;

y = &z;

x = *y;

With this program, we have the points-to sets pt(z) = {p}, pt(y) = {z}, and pt(x) = {p}. Hence, in contrast to Java, a points-to relationship (pt(x) = {p}) was caused by the heap read x = *y without a balancing heap write.

Though recent work presents an alternate balanced-parentheses formulation of C points-to analysis, our initial match edge approximation would likely be too coarse to use as an effective basis for refinement. The lack of balanced parentheses in the Melski-Reps formulation of Andersen's analysis is not a fundamental problem; recently, Zheng and Rugina presented an alternate formulation of C alias analysis with balanced parentheses [ZR07]. However, in this formulation a write to the heap could potentially be balanced by any read from the heap. So, in the initial graph representation for refinement, the source of each *z = w statement would be connected by a match edge to the sink of each x = *y statement. The * operator is often used to access many disparate data structures in a C program, in which case this initial approximation would likely be too coarse to yield good results.

6Our discussion ignores pointer arithmetic, as does most previous work on flow-insensitive C points-to analysis.


In Java, field names can be used to initially match a much smaller set of heap read and write pairs, since the type system ensures that different object fields are not memory aliases. In order to apply our refinement-based technique to C points-to analysis, an initial approximation of aliasing better than what can be found syntactically would be needed; further investigation of this issue is future work.
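The Java-side contrast fits in a two-line example; this minimal illustration is ours, not taken from the benchmarks:

    // Distinct fields occupy distinct memory in Java, so this putfield[f]
    // and getfield[g] pair can never be a write/read of the same location,
    // and no match edge relates them; in C, *p = o and x = *q might alias.
    class P { Object f; Object g; }

    class FieldPartition {
      static Object demo(P a, P b, Object o) {
        a.f = o;      // putfield[f]
        return b.g;   // getfield[g]: can never observe o through this read
      }
    }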

4.6 FullFS Details

Here we give the details of FullFS, an algorithm employing the techniques used by the demand-driven points-to analysis algorithm of Heintze et al. [HT01a]. We present inference rules for FullFS in Figure 4.11, showing which points-to queries must be raised, given an initial set of queries and the statements in a particular program. Each rule can be instantiated when the corresponding statement is present. We adopt the notation of [HT01a], where x ↪ . means that a query has been raised to find what x points to. So, rule (1) states that if there is a points-to query for p and a statement p = new T(), then add p ↪ oa to the points-to relation (a is some label for the statement). Rule (3) states that given statement p = r and a query p ↪ ., we must add the query r ↪ . to see what r points to. Once we discover r ↪ o1, rule (4) will add p ↪ o1.

Handling getfield and putfield statements is more complex. Given a getfield statement p = r.f and a query p ↪ ., we must find the values written into the f field of abstract locations that r can point to. Rule (6) introduces the query r ↪ . to find what r points to. Given that r ↪ o1, rule (7) introduces the pointed-to-by query . ↪ o1(fw) to find what variables point to o1. In [HT01a], there are queries of the form . ↪ x for variables x, since there can be pointers to variables in C through the & operator. In Java, we only have such pointed-to-by queries for allocation sites.


The parenthesized fw gives the reason for this query; the form is a field name f with subscript w for write and r for read. Here, we are looking for writes to field f, or fw, since the getfield statement reads from field f. We keep these reasons to avoid unnecessary work (see below). Rule (8) says that once the analysis has discovered that r ↪ o1 and o1.f ↪ o2, we can add the fact p ↪ o2.

Putfield statements p.f = r are handled as follows. If we have p ↪ o1 and . ↪ o1(fw), the statement is relevant since we must discover what is written into the f field of o1. In such a case, rule (10) introduces the query r ↪ ., and once we find r ↪ o2, rule (11) adds o1.f ↪ o2. This latest points-to fact allows rule (8) to trigger for the appropriate getfield statements, flowing o2 across the field. Notice that if we did not have the reason fw and just had pointed-to-by queries of the form . ↪ o1, then when we encountered a statement p.g = r, we would still have to introduce r ↪ . to be sound, since a read from the g field may have been encountered. Keeping the reason ensures we only do work for relevant field writes.

Rules (2), (5), (9), (12), and (13) concern properly handling the pointed-to-by queries. Rule (2) says that if there is a pointed-to-by query . ↪ oa(any) (where (any) means for any reason), we must add p ↪ oa; rule (5) similarly handles assignments. Rules (9), (12), and (13) handle the case when we have . ↪ o2(any) and o2 is written into a field. For statement p.f = r, rule (12) adds the query p ↪ ., since we must discover which abstract locations have o2 written into their f field. When we find such an abstract location o1, rule (13) adds the fact o1.f ↪ o2, and introduces the pointed-to-by query . ↪ o1(fr) since we are looking for reads of f. Rule (9) handles the case when we find a statement p = r.f that reads f from o1, adding the fact p ↪ o2.

Figure 4.12 gives some sample code and a partial derivation of the fact x ↪ o6, assuming x ↪ . is the initial query. Uses of inference rules are labeled with the corresponding statement. Notice that all leaves of the derivation will be x ↪ ., as expected.


statement a: p = new T()
  (1)  p ↪ .                                            ⟹  p ↪ oa
  (2)  . ↪ oa(any)                                      ⟹  p ↪ oa

statement p = r
  (3)  p ↪ .                                            ⟹  r ↪ .
  (4)  p ↪ . ,  r ↪ o1                                  ⟹  p ↪ o1
  (5)  r ↪ o1 ,  . ↪ o1(any)                            ⟹  p ↪ o1

statement p = r.f
  (6)  p ↪ .                                            ⟹  r ↪ .
  (7)  p ↪ . ,  r ↪ o1                                  ⟹  . ↪ o1(fw)
  (8)  p ↪ . ,  r ↪ o1 ,  o1.f ↪ o2                     ⟹  p ↪ o2
  (9)  r ↪ o1 ,  . ↪ o1(fr) ,  o1.f ↪ o2 ,  . ↪ o2(any) ⟹  p ↪ o2

statement p.f = r
  (10) p ↪ o1 ,  . ↪ o1(fw)                             ⟹  r ↪ .
  (11) p ↪ o1 ,  . ↪ o1(fw) ,  r ↪ o2                   ⟹  o1.f ↪ o2
  (12) r ↪ o2 ,  . ↪ o2(any)                            ⟹  p ↪ .
  (13) r ↪ o2 ,  . ↪ o2(any) ,  p ↪ o1                  ⟹  o1.f ↪ o2 and . ↪ o1(fr)

Figure 4.11: Inference rules for FullFS (premises to the left of each ⟹, conclusions to the right).


1 x = y.f;
2 w = new T();
3 y = w;
4 z = w;
5 z.f = p;
6 p = new T();

[Derivation tree: starting from the initial query x ↪ ., the analysis derives y ↪ o2 and o2.f ↪ o6, and rule (8) concludes x ↪ o6.]

Figure 4.12: Example code and partial derivation for FullFS.

Pseudocode for FullFS appears in Figure 4.13. This algorithm is Heintze and Tardieu's two-set algorithm [HT01a] adapted for field-sensitive Java points-to analysis. We adopt their notation for points-to sets [HT01a]: X represents the points-to set of variable x, and X′ represents its "tracked set," defined as {o : o ∈ X ∧ . ↪ o(any)}. Similarly, we denote the points-to set and tracked set of each abstract location field o.f as O.f and O.f′ respectively. The algorithm iterates over graph edges, storing the results of points-to and pointed-to-by queries in points-to sets and tracked sets respectively, following the inference rules of Figure 4.11.

Certain basic optimizations must be implemented to obtain a scalable implementation of FullFS. First, worklists should be used to iterate only over necessary graph edges, a standard technique for such analyses. The points-to set data structure should have efficient methods for iteration and adding all elements from another set; we re-used SPARK's hybrid points-to set representation [LH03]. Finally, incrementalization (i.e., only propagating newly added abstract locations in each iteration) can also greatly improve performance, as seen in other work [LH03, WL04].

Due to excessive caching, FullFS is not an ideal basis for a context-insensitive refinement algorithm. FullFS could easily be adapted for the purposes of refinement by adding appropriate handling of match edges. However, FullFS caches points-to


FullFS(v)
 1  add query v ↪ .
 2  repeat
 3      for each edge x ←[new]− o
 4          do if x ↪ . then add o to X
 5             if . ↪ o(any) then add o to X′
 6      for each edge x ←[assign]− y
 7          do if x ↪ .
 8             then add y ↪ .
 9                  add all of Y to X
10                  if Y′ ≠ ∅ then add all of Y′ to X′
11      for each edge x ←[getfield[f]]− y
12          do if x ↪ .
13             then add y ↪ .
14                  for each o ∈ Y
15                      do add . ↪ o(fw)
16                         add all of O.f to X
17                  for each o ∈ Y′
18                      do if . ↪ o(fr) then add all of O.f′ to X′
19      for each edge x ←[putfield[f]]− y
20          do for each o ∈ X′
21                 do if . ↪ o(fw)
22                    then add y ↪ .
23                         add all of Y to O.f
24             if Y′ ≠ ∅
25             then add x ↪ .
26                  for each o ∈ X
27                      do add all of Y′ to O.f′
28                         add . ↪ o(fr)
29  until no change
30  return V

Figure 4.13: Pseudocode for the FullFS algorithm.

sets for all queried variables. We found that for our context-insensitive refinement technique, this caching imposes a memory overhead without improving running times. RegularPT and RefinedRegularPT do not cache intermediate results and provide better performance. For context-sensitive refinement, caching of intermediate results becomes much more important, and hence our context-sensitive refinement algorithm


is based on FullFS, as discussed in Chapter 5.
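To illustrate the incrementalization mentioned above, the following sketch maintains, per variable, a set of newly discovered abstract locations that have not yet been propagated; only these flow across an assign edge in the next iteration. The representation (strings for variables, integers for abstract locations) is purely illustrative and is not our implementation.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Illustrative difference propagation: only newly added abstract
    // locations are pushed across edges in each iteration.
    class DiffPropagation {
      private final Map<String, Set<Integer>> pointsTo = new HashMap<>(); // full points-to sets
      private final Map<String, Set<Integer>> delta = new HashMap<>();    // added but not yet propagated

      void addFact(String var, int loc) {
        if (pointsTo.computeIfAbsent(var, k -> new HashSet<>()).add(loc))
          delta.computeIfAbsent(var, k -> new HashSet<>()).add(loc);
      }

      // Process an assign edge x <- y: move only y's unpropagated
      // locations (copied first, since addFact may touch delta).
      void processAssign(String x, String y) {
        for (int loc : new HashSet<>(delta.getOrDefault(y, Set.of())))
          addFact(x, loc);
      }

      // Once every edge out of y has been processed in an iteration,
      // y's delta has been fully propagated and can be cleared.
      void commitDelta(String y) {
        delta.remove(y);
      }
    }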

4.7 Adaptation of Tabulation Algorithm

While arbitrary CFL-reachability problems can be solved in worst-case O(N³/log N) time [Cha06], the tabulation algorithm of Reps et al. [RHSR94, RHS95] can run asymptotically faster for reachability over a balanced parentheses language. Given our formulation of field-sensitive Andersen's analysis for Java as a balanced-parentheses problem, a natural question arises: can the tabulation algorithm be adapted to create a fast Java points-to analysis? Here, we show that the answer to the question is mixed. In §4.7.1, we give an adapted version of the tabulation algorithm for solving LF-reachability and show that it has worst-case O(NE) complexity for realistic Java programs, asymptotically faster than the standard O(N³) worst-case bound for Andersen's analysis. Then, in §4.7.2, we discuss the reasons why existing propagation-based implementations of Andersen's analysis are likely to be faster than the tabulation algorithm in practice.

4.7.1 Tabulation Algorithm for Points-To Analysis

Here we present an adaptation of the Reps-Horwitz-Sagiv tabulation algorithm [RHSR94, RHS95] for computing all-pairs LF-reachability, i.e., field-sensitive Andersen's analysis for Java. We first describe the key idea of the algorithm, namely the efficient addition of vmatch edges to the graph. In contrast to match edges, vmatch edges are only present in the graph when an alias-path exists between the base pointers of the corresponding field accesses. On a graph with vmatch edges, LF-reachability can be computed more cheaply through reachability over a regular language. We then give pseudocode for the adapted tabulation algorithm, and finally discuss its worst-case complexity.


Algorithm Idea The key idea of the tabulation algorithm is to add a summary edge between each parenthesis source and sink connected by a path with balanced parentheses [RHS95]. Balanced-parentheses reachability problems over the input graph can then be posed as regular reachability problems over a graph with summary edges. The same results are obtained since whenever a balanced path exists from an open to a close parenthesis in the input graph, a summary edge is present in the modified graph. Say that the balanced parentheses production in the context-free language is S → (i S )i and that the summary edge is labeled s. In essence, after the summary edges are added, the grammar production can be modified to be S → s. The modified reachability problem is regular since the parentheses have been eliminated from the grammar.

For points-to analysis, recall again the definition of a flowsTo-path in LF:

flowsTo → new ( assign | putfield[f] alias getfield[f])∗

For LF, we are interested in adding summary edges from putfield[f] edge sources to getfield[f] edge sinks, the same nodes connected by match edges in our refinement algorithm. However, while match edges are added for each pair of accesses of the same field, summary edges are only added when those accesses are connected by an alias-path, as in the above grammar. We call these summary edges vmatch edges, since they can be thought of as match edges for which we have verified the existence of a connecting alias-path. More formally, we have putfield[f] alias getfield[f] ⇔ vmatch, while match edges had a weaker guarantee, putfield[f] alias getfield[f] ⇒ match.

Given a graph with all possible vmatch edges, LF-reachability can be computed via reachability over a regular language. The key invariant of vmatch edges is that there is a vmatch edge x → y in the graph iff there is a path from x to y labeled putfield[f] alias getfield[f]. Given this invariant, we can construct a regular language RVF such that x is RVF-reachable from o iff x is LF-reachable from o. The grammar for RVF is quite similar to that of RF from §4.2.1:

flowsToVReg → new ( assign | vmatch )∗

While RF-reachability over-approximated LF-reachability, RVF-reachability is equivalent to LF-reachability because of the use of vmatch edges. We shall show shortly that both the addition of vmatch edges and the computation of all-pairs RVF-reachability can be done in O(NE) time, yielding an O(NE) algorithm for Andersen's analysis for Java.
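Computing RVF-reachability is then an ordinary graph search: from each allocation site, follow one new edge and then any number of assign or vmatch edges. A minimal sketch follows, with a hypothetical adjacency-map graph encoding; running the search once per allocation gives the claimed O(NE) total time.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Sketch of flowsToVReg-reachability via BFS; newEdges maps an
    // allocation to the variables it is directly assigned to, and succ
    // maps a node to its assign/vmatch successors.
    class FlowsToVReg {
      static Set<String> reachableFrom(String alloc,
                                       Map<String, List<String>> newEdges,
                                       Map<String, List<String>> succ) {
        Set<String> seen = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        // first a single new edge out of the allocation site...
        for (String v : newEdges.getOrDefault(alloc, List.of()))
          if (seen.add(v)) worklist.add(v);
        // ...then any number of assign or vmatch edges
        while (!worklist.isEmpty()) {
          String n = worklist.remove();
          for (String m : succ.getOrDefault(n, List.of()))
            if (seen.add(m)) worklist.add(m);
        }
        return seen; // every variable here is LF-reachable from alloc
      }
    }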

Algorithm Idea Given a graph with all vmatch edges, all-pairs RVF-reachability can be computed in O(NE) time since RVF is a regular language (as discussed in §3.1). Here, we show how vmatch edges can be added to the graph by the tabulation algorithm in O(NE) time in practice, yielding an overall efficient algorithm for Andersen's analysis. At a high level, the algorithm inserts a vmatch edge whenever it finds an alias-path from a sink v of a putfield[f] edge to a source w of a matching getfield[f] edge. The algorithm looks for such paths with purely regular searches starting from each such node v. The label of each explored path must meet the following regular expression Rexp:

Rexp = ( assign̅ | vmatch̅ )∗ new̅ new ( assign | vmatch )∗

We obtain Rexp by first substituting vmatch into the flowsTo production, based on the invariant putfield[f] alias getfield[f] ⇔ vmatch:

flowsTo → new ( assign | putfield[f] alias getfield[f] )∗
        → new ( assign | vmatch )∗

Then, given the production alias → flowsTo̅ flowsTo, we substitute for flowsTo̅ and flowsTo appropriately, yielding Rexp. Once all vmatch edges are present in the graph, all nodes connected by an alias-path will also be connected by a path matching Rexp, i.e., alias ⇔ Rexp.

Initially, no vmatch edges are present, so the algorithm only finds paths whose labels meet the regular expression R1:

R1 = assign̅∗ new̅ new assign∗

As vmatch edges are added, the algorithm includes those edges in its searches, eventually discovering and adding all vmatch edges as desired.

Pseudocode Figure 4.14 gives pseudocode for our adapted version of the tabulation algorithm. The algorithm of Figure 4.14 discovers paths meeting the above regular expression Rexp using worklists and path-edge sets. Path edges record the graph exploration already performed by the algorithm, allowing it to avoid repeated work. The path-edge sets maintain the following invariant: when backPathEdges (forwPathEdges) contains an edge x → y, then an Rb-labeled (Rf-labeled) path has been found from x to y, where

Rb = ( assign̅ | vmatch̅ )∗ ( new̅ | ε )

Rf = ( assign̅ | vmatch̅ )∗ new̅ ( new | ε ) ( assign | vmatch )∗

Essentially, the backPathEdges set records the exploration of partial Rexp-paths that only include barred edges, while the forwPathEdges set records explored partial (or full) Rexp-paths with both barred and non-barred edges.

forwPathEdges, backPathEdges: Set of Edge
forwWorklist, backWorklist: Set of Edge

AddVMatches()
 1  for each w that is the sink of a putfield[f] edge
 2      do add w → w to backWorklist
 3         add w → w to backPathEdges
 4  while backWorklist ≠ ∅ ∨ forwWorklist ≠ ∅
 5      do while backWorklist ≠ ∅
 6             do remove v → w from backWorklist
 7                for each edge w ←[assign]− x and w ←[vmatch]− x
 8                    do Propagate(v → x, backPathEdges, backWorklist)
 9                for each edge w ←[new]− o
10                    do Propagate(v → o, forwPathEdges, forwWorklist)
11         while forwWorklist ≠ ∅
12             do remove v → w from forwWorklist
13                for each edge w −[new]→ x, w −[assign]→ x, and w −[vmatch]→ x
14                    do Propagate(v → x, forwPathEdges, forwWorklist)
15                AddVMatchesForNodes(v, w)

AddVMatchesForNodes(v, w)
 1  for each edge x −[putfield[f]]→ v
 2      do for each edge w −[getfield[f]]→ y
 3             do add edge x −[vmatch]→ y if needed
 4                for each a → y ∈ backPathEdges
 5                    do Propagate(a → x, backPathEdges, backWorklist)
 6                for each b → x ∈ forwPathEdges
 7                    do Propagate(b → y, forwPathEdges, forwWorklist)

Propagate(e, pathEdges, worklist)
 1  if e ∉ pathEdges
 2      then add e to pathEdges
 3           add e to worklist

Figure 4.14: Tabulation algorithm of Reps et al. [RHSR94, RHS95], adapted for Java points-to analysis.

The check at line 1 in Propagate ensures that the algorithm only attempts to complete such partial paths once. The worklists maintain path edges for which further exploration from the sink node is required. As the algorithm adds vmatch edges to the graph, traversals previously blocked may be able to continue using the new edges. To "restart" such traversals, whenever a vmatch edge is added, we check for traversals that reached either side of the newly added edge, and appropriately update the worklists (line 5 and line 7 in AddVMatchesForNodes in Figure 4.14).

Complexity To determine the complexity of this algorithm, we observe the key invariant: the number of visits to each edge (whether present in the original graph or inserted later as a vmatch edge) is bounded by the number of nodes inserted into the worklist in line 2 of AddVMatches in Figure 4.14 (these are the sink nodes of putfield[f] edges). The bound is obvious once we realize that each node w can be the sink of a path edge in the worklist at most as many times as the number of putfield sink nodes. Note that vmatch edges, once inserted, are treated exactly like the originally present edges. Also, we are assuming suitable data structures that require a constant amount of work for each edge traversed.

So, for each putfield[f] sink node, we perform O(S + V) work, where S is the number of assignment statements in the program and V is the number of vmatch edges added. In the worst case, there is a vmatch edge between every matched getfield[f] and putfield[f] edge. Hence, V is bounded above by W = Σ_f G(f)·P(f), where G(f) is the number of getfield[f] edges and P(f) is the number of putfield[f] edges. Since there are at most N putfield sink nodes, where N is the number of nodes in the input graph or variables in the program, the addition of vmatch edges has overall worst-case time complexity O(N(S+W)). As previously discussed, computing all-pairs RVF-reachability on the resulting graph requires O(NE) time. Since RVF-reachability is computed on a graph with vmatch edges, we have E = S + W in the


worst case. Therefore, we have a worst-case O(N(S+W)) algorithm for field-sensitive Andersen's analysis for Java.

Let us discuss the typical complexity for Java programs. In practice, S = O(N). Similarly, we have observed in practice that while W can get large, typically V = O(N). An intuitive reason for V not growing quadratically is the prevalence of getter and setter methods in Java programs, i.e., methods that respectively wrap reads and writes of fields. When a field f is accessed through a getter and setter, there is a single getfield[f] and putfield[f] edge for f, and hence only one possible corresponding vmatch edge. If all fields in a program are accessed solely through getter and setter methods (and most are in typical Java programs), then the maximum number of vmatch edges will equal the number of fields, which will be O(N) for any realistic Java program. Since V = O(N) and S = O(N) in practice, the tabulation-based points-to analysis runs in O(N²) time, asymptotically better than the worst-case cubic bound of the standard closure algorithm.
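For instance, in the following class (a typical encapsulation pattern, written here for illustration rather than taken from our benchmarks), field f induces exactly one getfield[f] edge and one putfield[f] edge, and therefore at most one vmatch edge, no matter how many call sites use the accessors:

    // Why V = O(N) in practice: with f accessed only through this
    // getter/setter pair, the whole program contains one getfield[f]
    // and one putfield[f] edge, so at most one vmatch edge for f.
    class Wrapped {
      private Object f;
      Object getF() { return this.f; }      // the only getfield[f]
      void setF(Object v) { this.f = v; }   // the only putfield[f]
    }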

4.7.2 Discussion

The tightest worst-case bound proved for the most efficient existing Java points-to analysis algorithms [LH03, BLQ+03, WL04] is still O(N³) (to the best of our knowledge),7 allowing for the tabulation algorithm to be more efficient. Nevertheless, in practice we have found that existing algorithms tend to outperform the tabulation algorithm. Here we discuss how the existing efficient algorithms work, why their worst-case cubic behavior tends not to arise in practice, and why they tend to run faster than the tabulation algorithm.

Efficient existing algorithms for Java points-to analysis are based on propagation of points-to sets, like the FullFS algorithm of §4.6. Like the FullFS pseudocode

7The aforementioned O(N³/log N) algorithm [Cha06] has not yet been implemented and shown to scale well in practice.

136 Chapter 4. Context-Insensitive Points-To Analysis

shown in Figure 4.13, these algorithms maintain points-to sets for each variable and abstract-location field in the program, and then process each assignment statement by propagating abstract locations between the corresponding points-to sets. Efficiency is gained through type filters (i.e., only allowing abstract locations of the appropriate static type into a points-to set), smart iteration ordering over the statements (i.e., processing statements in topological order), and incrementalization, described in more detail in [LH03]. Nevertheless, no worst-case bound better than O(N³) has been proved for these algorithms as far as we know, so in theory the tabulation algorithm could be more efficient.

Though the tabulation algorithm has better asymptotic complexity than propagation, realistic Java programs do not expose the worst-case cubic behavior of propagation-based algorithms, lessening the opportunity for tabulation to yield a performance advantage. Cubic behavior for a propagation-based algorithm could arise in the following situation. Say a program has N abstract locations (allocation sites) of a class with N different fields, such that each field of each abstract location can point to all N other abstract locations. Then, each of the N abstract locations appears in N² points-to sets, which could require O(N³) work to compute. However, no realistic program would cause this behavior. In particular, Java's type system often limits the set of abstract locations in a points-to set to a small subset of all abstract locations, and points-to analyses can use type filters to keep these sets small. Hence, propagation-based points-to analyses tend not to exhibit cubic behavior in practice.8

The Master's thesis of Lhoták [Lho02] shows an empirical comparison between a standard propagation-based algorithm and an "alias edge" algorithm quite similar to the tabulation algorithm. The alias edge algorithm in [Lho02] computes aliasing relationships between base pointers of field accesses rather than keeping points-to sets

8It would be an interesting theoretical result to prove a tighter worst-case bound for propagation algorithms on realistic Java programs.

for object fields, just like the tabulation algorithm. It differs in using points-to sets for variables rather than path edges to record existing analysis work, a choice that we believe does not make performance worse in practice.9 The performance comparison in Chapter 5 of [Lho02] confirms the trade-offs discussed above. With type filters enabled, the standard propagation algorithm tends to be slightly faster than the alias edge algorithm. However, without type filters, the size of object-field points-to sets increases dramatically, making the propagation algorithm run out of memory while the alias edge algorithm still terminates. In practice, type filters will always be used with a propagation-based algorithm for Java, meaning that the tabulation algorithm is unlikely to give any performance gain.
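As an illustration of the type filters discussed above (a sketch only, using Class objects to stand in for static types and abstract locations):

    import java.util.HashSet;
    import java.util.Set;

    // Sketch of a type-filtered points-to set: an abstract location
    // enters only if its concrete type is assignable to the variable's
    // declared type, keeping sets small in type-safe programs.
    class TypeFilteredSet {
      private final Class<?> declaredType;
      private final Set<Class<?>> locs = new HashSet<>();

      TypeFilteredSet(Class<?> declaredType) { this.declaredType = declaredType; }

      boolean add(Class<?> allocatedType) {
        if (!declaredType.isAssignableFrom(allocatedType))
          return false; // filtered: cannot flow here in a type-safe execution
        return locs.add(allocatedType);
      }
    }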

9In fact, due to the structure of graphs arising from real Java programs, recording points-to sets rather than path edges is likely to be a performance win.

Chapter 5

Context-Sensitive Points-To Analysis

This chapter describes a context-sensitive refinement-based points-to analysis for Java. Our analysis has three types of context sensitivity: (1) filtering out of unrealizable paths (see §2.2.4), (2) a context-sensitive heap abstraction (see §2.2.5), and (3) a context-sensitive call graph (see §2.2.3 and §2.2.4). Previous work [LH06] has shown that all three properties are important for precisely analyzing large programs, e.g., to show safety of downcasts. Existing context-sensitive analyses typically give up one or more of the properties for scalability, as discussed in §1.2.2 and §2.2. Through refinement, our analysis retains all three properties while using orders of magnitude less memory than less precise existing analyses. Thus, our refinement-based analysis is the most precise heap analysis available for realistic Java programs.


§5.3 presents our refinement-based points-to analysis algorithm in detail. Finally, §5.4 gives results from our experimental evaluation of our algorithm.

5.1 Algorithm Overview

This section provides an overview of our context-sensitive refinement-based algorithm in a manner similar to the overview of our context-insensitive algorithm in §4.1. First, we present a simplified version of our analysis formulation in §5.1.1. This simplified formulation shows the essence of the analysis problem, namely reachability over the intersection of two balanced-parentheses languages. Then, we present our context-sensitive refinement algorithm in §5.1.2, an extension that adds context sensitivity to the refinement algorithm of §4.1.4. Finally, §5.1.3 illustrates how the analysis works on a Java code example. We show that in typical programs our refinement explores nested data structures in a hierarchical progression, visiting only a small part of the program to obtain sufficient precision for the client.

5.1.1 Simplified Formulation

Here, we present a simplified formulation of the context- and field-sensitive points-to analysis problem, approximated for decidability. Recall that fully context- and field-sensitive points-to analysis is undecidable (see §3.3). In this section, we skirt the undecidability issue by assuming that all analyzed programs have no recursive method calls. Under this assumption, checking for balanced method call parentheses can be formulated as a regular reachability problem, since the number of open call parentheses on any path is bounded. Hence, the problem of checking for both balanced field and method call parentheses requires reachability over the intersection of a context-free language and a regular language, guaranteed to remain context-free and decidable. We discuss our approximate handling of programs with recursion in §5.2.

The present formulation adds context sensitivity to the simplified formulation of §4.1.3. In addition to the simplifications introduced in §4.1.3, the formulation simplifies context sensitivity by (unsoundly) disallowing partially balanced call parentheses (sound formulations appear in §3.3 and §5.2). Also, we assume the presence of an ahead-of-time call graph, again to make the key aspects of our algorithm clear.

We formulate our analysis as reachability over the intersection of a slightly modified version of Lsf (from §4.1.3) and a language for context sensitivity. Let ΣP be the alphabet of open and close brackets, respectively representing heap writes and reads, and open and close parentheses, representing method call entries and exits:

ΣP = { [f , ]f | f is a field } ∪ { (i , )i | i is a call site }

Our simplified points-to analysis requires CFL-reachability with the language Lscf = Lsf ∩ Rsc over ΣP, with Lsf and Rsc respectively representing key properties of LF and LC (defined in Chapter 3):

Lsf : F → [f F ]f | [g F ]g | ... | F F | (i | )i | ... | ε

Rsc : C → (i C )i | (j C )j | ... | C C | [f | ]f | ... | ε

Lsf is modified from §4.1.3 to allow call parentheses to be mixed anywhere in its strings, and Rsc similarly ignores field parentheses. Note that although it has a balanced-parentheses grammar, Rsc is regular, since we assume recursion-free programs and hence the number of open call parentheses in a string is bounded.
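To see why bounded call depth makes Rsc regular, note that membership can be checked with a stack whose depth never exceeds the program's maximal call nesting K; with K fixed, the checker has only finitely many states. A small sketch follows, using a hypothetical string encoding of the parentheses ("(i" and ")i" for call site i, "[f" and "]f" for field brackets):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch of a bounded-depth checker for Rsc: field brackets are
    // ignored, and call parentheses must nest and match by call site.
    class RscChecker {
      static boolean inRsc(String[] labels, int maxDepth) {
        Deque<String> stack = new ArrayDeque<>();
        for (String l : labels) {
          if (l.startsWith("[") || l.startsWith("]")) continue; // field parens ignored
          if (l.startsWith("(")) {
            if (stack.size() == maxDepth)
              return false; // cannot arise in a recursion-free program
            stack.push(l.substring(1)); // remember the call site index
          } else if (l.startsWith(")")) {
            if (stack.isEmpty() || !stack.pop().equals(l.substring(1)))
              return false; // mismatched call parentheses
          }
        }
        return stack.isEmpty();
      }
    }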


[Figure: example paths from o through nodes t0 … t6 to x, labeled with field brackets ([f, [h, ]h, ]f) and call parentheses ((1, (2, )3, )1): (a) the initial path with all possible match edges; (b) the path examined by the first pass of the algorithm, which skips t1 through t5 via the match edge from t0 to t6; (c) the path examined by the second pass, after that match edge has been removed.]

Figure 5.1: Paths to illustrate the behavior of our context-sensitive refinement algorithm.

5.1.2 Context-Sensitive Refinement Algorithm

Here we extend the refinement-based algorithm of §4.1.4 with context sensitivity, again illustrating the algorithm on a simplified single-path problem. In this setting, the single-path problem requires determining if a path p is an Lscf-path, where Lscf = Lsf ∩ Rsc. As in §4.1.4, the algorithm selectively skips checking sub-paths of p using match edges, which connect matched field parentheses. We refer the reader to §4.1.4 for an explanation of the basic features of this algorithm; here we focus on extending the algorithm to handle method call parentheses.

Figure 5.1(a) gives an example input path for our simplified problem, along with all possible match edges. The path is not an Rsc-path, as the (2 and )3 call parentheses are mismatched. We will illustrate how our analysis can discover this fact without inspecting the entire path. Since a match edge can skip over an arbitrary sequence of method call edges, our algorithm only checks for balanced call parentheses on sub-paths free of match edges. Consider the path of Figure 5.1(b), which skips much of the original path of Figure 5.1(a) using the match edge t0 → t6. While it may be tempting to conclude that (1 and )3 are mismatched on this path, (1 is in fact balanced on the path of Figure 5.1(a).


To avoid unsoundly concluding that a path has unbalanced call parentheses, our analysis handles match edges by assuming that any possible sequence of call parentheses may appear on the skipped sub-path. For Figure 5.1(b), the analysis assumes that )1 (3 may have been skipped by the match edge, and hence approximately answers that p may be an Lscf-path.

Removal of match edges allows for simultaneous refinement of the handling of method call and field parentheses, as it exposes more of both of them for checking. In the example of Figure 4.1 in §4.1.3, we showed how removing a match edge exposed mismatched field parentheses. For Figure 5.1, say that after inspecting the path of Figure 5.1(b), the analysis refined by removing the match edge t0 → t6. In the subsequent iteration, the analysis would traverse the path of Figure 5.1(c). Here, the match edge removal exposes the mismatched (2 and )3 parentheses, allowing the analysis to conclude that x is not Lscf-reachable from o. This ability to refine both types of parentheses through the single mechanism of match edge refinement is key to the efficacy of our algorithm, as we will illustrate further in §5.1.3. Again, the path of Figure 5.1(a) was shown to be unbalanced without inspecting all of its edges, illustrating the performance benefits of refinement.
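A sketch of this single-path check for call parentheses follows, with match edges treated as wildcards; the edge encoding is hypothetical, and the real algorithm works over the graph representation described later in this chapter.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Conservative single-path check: a match edge may hide any call-paren
    // sequence, so call parentheses are only checked on match-free sub-paths.
    class CallParenCheck {
      enum Kind { OPEN, CLOSE, MATCH, FIELD }
      static class Edge {
        final Kind kind; final int callSite;
        Edge(Kind kind, int callSite) { this.kind = kind; this.callSite = callSite; }
      }

      // Returns false only if the path certainly has mismatched call parens.
      static boolean mayBeBalanced(Edge[] path) {
        Deque<Integer> opens = new ArrayDeque<>();
        for (Edge e : path) {
          switch (e.kind) {
            case OPEN:
              opens.push(e.callSite);
              break;
            case CLOSE:
              // an unmatched close may be balanced within a skipped region,
              // so only a definite mismatch (e.g., (2 closed by )3) refutes
              if (!opens.isEmpty() && opens.pop() != e.callSite) return false;
              break;
            case MATCH:
              opens.clear(); // skipped sub-path could contain anything;
              break;         // restart on the next match-free sub-path
            default:
              break; // field parentheses are handled separately
          }
        }
        return true; // p may be an Lscf-path
      }
    }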

5.1.3 Refinement on Java Programs

In §5.1.2, we showed how our algorithm can save work by skipping inspection of certain sub-paths and can refine to obtain a precise answer. Here, we show what the skipped paths correspond to in typical programs (i.e., we show what code is typically analyzed at a particular approximation level) and what code is not visited at all. In particular, the analysis typically only applies full sensitivity for classes whose object contents must be distinguished to precisely answer a query, e.g., to find the contents of a particular Vector object.


 1  class Vector {
 2    Object[] elems; int count;
 3    Vector() { t = new Object[10];
 4      this.elems = t; }
 5    void add(Object p) {
 6      t = this.elems;
 7      t[count++] = p; // writes t.arr
 8    }
 9    Object get(int ind) {
10      t = this.elems;
11      return t[ind]; // reads t.arr
12    } ...
13  }
14  class AddrBook {
15    private Vector names;
16    AddrBook() { t = new Vector();
17      this.names = t; }
18    void addEntry(String n, ...) {
19      t = this.names; ...;
20      t.add(n);
21    }
22    void update() {
23      t = this.names;
24      for (int i = 0; i < t.size(); i++) {
25        Object name = t.get(i);
26        // is this cast safe?
27        String nameStr = (String)name;
28        ...
29      }
30    }
31  }
32  void useVec() {
33    Vector v = new Vector();
34    Integer i1 = new Integer();
35    v.add(i1);
36    Integer i2 = (Integer)v.get(0);
37  }

Figure 5.2: Example code for illustrating our points-to analysis algorithm.

We will show that both field and context sensitivity are necessary for sufficiently precise results, and that encapsulation allows us to analyze only a small amount of code precisely.

We use the example in Figure 5.2, a partial implementation of an AddrBook class with a Vector of names, to illustrate the effects of refinement.


[Figure: three points-to graphs: (a) initial analysis result; (b) result after distinguishing Object[] contents; (c) result after also distinguishing Vector contents.]

Figure 5.3: Analysis result at different stages of approximation for proving safety of the cast at line 27 of Figure 5.2.

We consider an analysis client that aims to statically prove that downcasts cannot fail, in particular the downcast to String at line 27. To prove this cast safe with points-to analysis, it suffices to show that the name local variable in update() can only point to String objects. Figure 5.3(a) through Figure 5.3(c) give an abstract view of the analysis result at each refinement stage while computing pt(name). Each graph shows how the analysis computes points-to sets of the object fields that are read to obtain the value assigned to name: AddrBook.names at line 23, Vector.elems at line 10, and finally arr (a pseudo-field for modeling array accesses) at line 11. Ovals in the graphs enclose points-to sets; pt(name) is shown on the right. A dashed arrow indicates a pointer from some class not shown in Figure 5.2. Figure 5.3(c) shows that after two passes of refinement, the analysis proves that pt(name) contains only String objects, shown as diamonds.

The effects of match edges The initial analysis result, shown in Figure 5.3(a), is imprecise due to match edges, which cause the analysis to merge the contents of the corresponding fields across objects. match edges cause merging since the analysis uses them to jump from a field read to all writes of the field, ignoring which object's field is being accessed. (This is a field-based treatment of fields, as defined in §2.2.2.) Initially, all possible match edges are present in the graph representation of the program, and hence the analysis initially merges the contents of any object field f across all objects. In the graphs of Figure 5.3, a dashed box indicates that field contents for objects inside the box have been merged due to match edges. Figure 5.3(a) indicates that the arr fields for arrays from Vector.elems as well as those from some other data structure have been merged; fields elems and names are similarly collapsed. When computing pt(name), the analysis finds that name gets its value from a read of the arr field at line 11 of Figure 5.2. Through match edges, the analysis then concludes that any object written into arr can flow to name, essentially merging the contents of all arrays.

While imprecise, the initial analysis skips inspection of field accesses deeper than those of arr, thereby finishing quickly and allowing time for more precise analysis. The gray shading in the graphs of Figure 5.3 indicates which of the shown field dereferences are inspected by that pass of the analysis. In Figure 5.3(a), only accesses of the arr field are inspected, while fields elems and names are skipped. This occurs because when the analysis reaches the array read at line 11 of Figure 5.2, it can jump

146 Chapter 5. Context-Sensitive Points-To Analysis

immediately to the array write at line 7 using a match edge, and to all other array writes (not shown) using other match edges. Through this skipping, the analysis saves times by avoiding analysis of other code in Vector and AddrBook and code that uses

AddrBook objects. Refining through match edge removal Our analysis refines by removing all match edges for all fields of some class T, with the goal of distinguishing the contents of dif-

ferent instances of T. In the case of our running example, our analysis removes match

edges for accesses to arr after its first pass, yielding the result in Figure 5.3(b). The

dashed box around Object arrays has disappeared, and the analysis now distinguishes

the arr field of arrays stored in Vector.elems (i.e., the internal arrays of Vectors),

from other arrays in the program. The elems field is still merged across Vectors because of match edges; hence, the analysis concludes that name can point to any

object stored in the internal array of any Vector, still too imprecise a result for the

cast-checking client. However, the match edges on elems allow the analysis to skip

inspection of accesses to names and code that uses AddrBooks, again saving time and allowing for more refinement. In its third pass, our analysis succeeds in showing safety of the downcast by re-

moving match edges on elems, exposing calls to Vector methods for context-sensitive

handling. With match edges on elems present, the analysis would exit Vector’s meth-

ods on a match edge (e.g., from the read of elems at line 10 in get() to the write at

line 4 in Vector()), skipping an unknown sequence of calls and returns and hence forc- ing approximation of context sensitivity. Context sensitivity for calls to these methods

is required to distinguish contents of different Vector instances (see §5.3 for details).

After removing match edges on elems, the analysis yields the result in Figure 5.3(c),

showing that name only gets its value from a Vector stored in AddrBook.names; since

such Vectors only contain Strings, this is sufficient to show the downcast at line 27 cannot fail, and the analysis terminates.


Computing a precise result for the example of Figure 5.2 requires unrealizable-path filtering, a context-sensitive heap abstraction, and a context-sensitive call graph. If the analysis did not filter out unrealizable paths, the parameters and return values of the calls to Vector.get() at line 25 and line 36 of Figure 5.2 would be conflated, preventing the analysis from proving that the two calls can return distinct objects. The context-sensitive heap abstraction is required to distinguish the internal arrays of the two Vectors, both allocated at line 3. A context-insensitive heap abstraction represents both of these arrays with a single abstract object, and hence merges the contents of the Vector objects. A context-sensitive call graph is needed to avoid approximation due to spurious recursion, which appears in a context-insensitive call graph when the full java.util.Vector class is used; this phenomenon is discussed further in §5.2.2.

Our refinement technique exploits encapsulation in object-oriented code for better performance. In Figure 5.2, the names field is encapsulated, i.e., the field and the Vector it points to cannot be directly accessed outside the AddrBook class. Similarly, Vector encapsulates its elems field. When fields are encapsulated, match edges for those fields can only connect accesses in the same class, limiting the scope of our analysis. For example, with match edges present for elems in Figure 5.3(b), the analysis processed code in Vector, but not code in AddrBook that uses a Vector. Furthermore, encapsulated fields allow the refinement to explore data structures in a hierarchical progression: in the example, we explore the array, then the Vector pointing to the array, and finally the AddrBook pointing to the Vector. Note that our analysis exploits encapsulation without any code annotations, instead discovering encapsulation automatically. When fields are not encapsulated, our analysis can still provide precise results, but it may run slower, as more code needs to be analyzed with full precision.


In the FSM for RC, S denotes a stack of open call sites, S.i denotes the stack S with call site i pushed on top, and ε denotes the empty stack:

S —callEntry[i]→ S.i
S.i —callExit[i]→ S
ε —callExit[i]→ ε
S.j —callExit[i]→ error   (i ≠ j)
S —assignglobal→ ε
S —āssignglobal→ ε
S —match→ ε
S —m̄atch→ ε

Figure 5.4: State transitions in the FSM for language RC.

5.2 Decidable Formulation

This section gives a decidable formulation of context- and field-sensitive Java points-to analysis, approximating the precise but undecidable analysis formulated in §3.3.

First, §5.2.1 presents the regular language RC for computing context-sensitive analysis on recursion-free programs; decidable context- and field-sensitive analysis requires computing (LF ∩ RC)-reachability. Then, §5.2.2 discusses how our analysis approximates for programs with recursion.

5.2.1 Context-Sensitive Analysis in CFL-Reachability

Here we describe a regular language RC for computing a context-sensitive points-to analysis on recursion-free programs. We will show how to answer both a projected query “is o ∈ pt(x)?” and a context-sensitive query “is ⟨o, c′⟩ ∈ pt′(⟨x, c⟩)?”. Rather than explicitly constructing an inlined program P′, context sensitivity is achieved by filtering out paths in our graph representation that correspond to unrealizable paths, as in the precise (but undecidable) formulation of §3.3.

Figure 5.4 defines the transitions in the finite-state machine for RC. In understanding this FSM, it may be useful to refer back to our original formulation of context-sensitive points-to analysis in §3.3, which defines the callEntry[i] and callExit[i] non-terminals and provides more intuition on the formulation. Each state in the FSM


represents a finite stack of callEntry[i] edges, the open parentheses for method calls.

Though it checks for balanced parentheses, RC is a regular language because we assume the input program is recursion-free, and hence there are a finite number of possible stack configurations. Transitions in the FSM manipulate the stack in the usual way. The initial state is an empty stack configuration. All states except the error state are accept states, and all transitions not shown are self-transitions (i.e., the state machine ignores edges that match the nonCallTerm non-terminal of Figure 3.9).
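To make the stack discipline concrete, the following is a minimal Java sketch of the RC transition function, assuming stacks are represented as immutable lists of call-site IDs; the RCState class, Kind enum, and sentinel states are illustrative names, not part of our implementation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Edge-label kinds relevant to RC; OTHER stands for all labels (new, assign,
// etc.) on which every state has a self-transition.
enum Kind { CALL_ENTRY, CALL_EXIT, ASSIGN_GLOBAL, MATCH, OTHER }

final class RCState {
    static final RCState ERROR = new RCState(null);
    static final RCState EMPTY = new RCState(Collections.<Integer>emptyList());

    final List<Integer> stack; // open callEntry[i] sites; top of stack is last

    private RCState(List<Integer> stack) { this.stack = stack; }

    RCState trans(Kind kind, int site) {
        if (this == ERROR) return ERROR;
        switch (kind) {
            case CALL_ENTRY: { // push the call site (open parenthesis)
                List<Integer> s = new ArrayList<>(stack);
                s.add(site);
                return new RCState(s);
            }
            case CALL_EXIT:
                if (stack.isEmpty()) return EMPTY; // unbalanced close: allowed
                if (stack.get(stack.size() - 1) != site) return ERROR; // mismatch
                return new RCState(stack.subList(0, stack.size() - 1)); // pop
            case ASSIGN_GLOBAL: // a global access skips calls and returns
            case MATCH:         // a match edge skips an unknown call sequence
                return EMPTY;   // forget any stored context
            default:
                return this;    // self-transition on new, assign, and the rest
        }
    }
}

Because the input program is assumed recursion-free, the stacks this sketch builds are bounded, which is exactly why RC remains a regular language.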

Note that RC allows partially balanced parentheses, just like LC in §3.3. An RC-path p is allowed to contain a prefix with unbalanced closed parentheses, due to the ε —callExit[i]→ ε transition, and a suffix with unbalanced open parentheses, as only mismatched callExit[i] edges cause a transition to error. The transitions in Figure 5.4 for match and assignglobal edges (and their barred equivalents) show that no checking for balanced call parentheses is done across such edges. Reads and writes to global variables are represented by assignglobal and āssignglobal edges, which “skip” the sequence of calls and returns between the accesses (discussed previously in §3.3). Hence, the FSM for RC transitions from any state to ε at such edges, “forgetting” any stored call stack information.1 We transition to ε since RC accepts any callExit[i] symbol in that state. The same transition is used for match and m̄atch edges, since as discussed in §5.1.2, call parentheses cannot soundly be checked across such edges.

RC provides a context-sensitive heap abstraction by treating abstract location nodes identically to variable nodes. Since new and n̄ew are not mentioned in Figure 5.4, RC has self-transitions from all states on both symbols. Hence, the RC stack is maintained at abstract location nodes, yielding a context-sensitive heap abstraction in an elegant manner.

1With such a transition, the self-edges on globals described in §3.3 are no longer necessary for soundness.


Given RC, a context-sensitive points-to analysis algorithm can be stated concisely. The answer to a projected query “is o ∈ pt(x)?” for program P, represented by graph G, is found by checking for the existence of a flowsTo-path p from o to x in G such that p is also an RC-path. In other words, we compute CFL-reachability on G with language LRF = LF ∩ RC, where LF was developed in Section 3.2. The answer to the context-sensitive query “is ⟨o, c′⟩ ∈ pt′(⟨x, c⟩)?” is obtained by modifying RC: we set the initial state of the state machine for RC to c′ and make c the only accepting state, thereby mapping the nodes o and x to the appropriate inlined copies ⟨o, c′⟩ and ⟨x, c⟩. Note that while the call strings of §2.2.4 must start at the root of the call graph, our analysis allows queries with partial calling contexts, since they are valid RC states. Since RC only filters paths with mismatched call entries and exits, computing points-to information for program P using LRF-reachability is equivalent to the projected result of computing LF-reachability for the fully inlined program P′, as desired.

LRF-reachability can be computed by tracking the state of RC for each explored path while computing LF-reachability; paths which cause RC to reach its error state

are excluded. This technique essentially explodes the input graph for RC [RHS95],

creating a node (x, s) when node x is reached with state s of RC. As |RC| is exponential in the size of the program (due to the exponential number of paths in the call graph), this algorithm has worst-case exponential time complexity. In practice the algorithm does not scale, motivating our refinement approach.

5.2.2 Handling Recursion

Up to this point in the chapter, we have assumed that the input programs for our analysis are recursion-free; here, we discuss how we handle recursive programs. We leverage our demand-driven approach to only approximate recursive calls in the program subset relevant to a query, reducing the number of calls that must be handled


imprecisely.

As context-sensitive and field-sensitive analysis for programs with recursion has been shown undecidable [Rep00], any analysis with both properties must approximate to guarantee termination. One approximation approach is to use k-limiting, i.e., tracking at most k levels of calling context [Shi88]; however, this approach approximates more than what is necessary to achieve decidability. A less drastic approximation is to treat calls within strongly-connected components (SCCs) of the call graph as gotos [WL04, LLA07]. This approach essentially re-labels param[i] and return[i] edges within an SCC with assign, collapsing the SCC into a single method and making the program recursion-free, yielding decidability. With a context-insensitive call graph, the SCC-collapsing approximation leads to a large precision loss. As shown in [LH06], for large programs a context-insensitive call graph typically has an SCC with more than 1000 methods, including methods whose handling is critical to analysis precision, e.g., those of Vector. A more precise call graph is necessary to avoid approximate handling of all calls within this large SCC.

Our analysis only approximates handling of method calls that are recursive in a call graph computed during the demand-driven analysis. As our analysis only touches edges in the graph representation relevant to the current query, the recursive cycles it encounters are a subset of all the recursive cycles in the call graph. Consider a query for the objects possibly returned by a call to this simplified version of

Vector.elementAt() from the Java standard library:

Object elementAt(int index) {
  if (index >= numElements) {
    throw new OutOfBoundsException(index + " too big");
  }
  return elems[index];
}

Our analysis considers only the array access and the read of the elems field in the

return statement, ignoring the calls to the OutOfBoundsException constructor and

String and StringBuffer methods from the string concatenation. In a context-insensitive call graph that considers all control flow, these ignored calls lead to elementAt() being included in the aforementioned large SCC. Our analysis approximates less due to recursion since it constructs a context-sensitive call graph in the program subset relevant to each query, ignoring many potentially recursive calls.
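As a concrete illustration of the SCC-collapsing approximation of [WL04, LLA07] discussed above, the following sketch re-labels interprocedural edges within call-graph SCCs as plain assignments. The Edge and EdgeKind types and the sccId map are hypothetical stand-ins for a real graph representation; this is a sketch of the prior approach, not our on-the-fly recursion detection.

import java.util.List;
import java.util.Map;

// Hypothetical minimal graph types for the sketch.
enum EdgeKind { PARAM, RETURN, ASSIGN, OTHER }

final class Edge {
    EdgeKind kind;
    String srcMethod; // method containing the edge's source node
    String dstMethod; // method containing the edge's target node
}

final class CollapseRecursion {
    // Re-label param[i]/return[i] edges whose endpoints lie in the same
    // call-graph SCC as plain assign edges, treating recursive calls as
    // gotos and making the program effectively recursion-free.
    static void collapse(List<Edge> edges, Map<String, Integer> sccId) {
        for (Edge e : edges) {
            boolean interprocedural =
                e.kind == EdgeKind.PARAM || e.kind == EdgeKind.RETURN;
            if (interprocedural
                    && sccId.get(e.srcMethod).equals(sccId.get(e.dstMethod))) {
                e.kind = EdgeKind.ASSIGN; // treat the recursive call as a goto
            }
        }
    }
}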

5.3 Refinement Algorithm

In this section, we give a detailed description of our refinement-based points-to analysis algorithm. In §5.1, we presented our refinement algorithm for the problem of checking a single path’s string for membership in a simplified version of our reachability languages. In §5.3.1, we show how the refinement algorithm of §5.1.2 extends

to the full LRF language, using a detailed example. Then, §5.3.2 presents pseudocode for our algorithm as applied to arbitrary graphs.

5.3.1 Refinement for LRF-reachability

Here we explain our refinement-based algorithm for full LRF-reachability using a detailed example. We use the AddrBook example of Figure 5.2 to illustrate the algorithm,

again considering a client trying to prove safety of the cast of name to String at line 27, which requires computing pt(name). Figure 5.5 shows the relevant part of the graph representation for the code in Figure 5.2. To prove the cast safe, the analysis must discover that there is no (LF ∩ RC)-path from o34, an Integer object, to the nameupdate node. The columns in Figure 5.5 indicate how the analysis explores the graph during



Figure 5.5: Relevant portion of graph for code in Figure 5.2. Solid edges represent program statements, with single edges for intraprocedural statements and double edges for call entry and exit statements. Variables are subscripted with the name of the enclosing method, and line numbers in labels refer to call sites or allocation sites. The dashed edges are match edges. For space, getfield and putfield are abbreviated gf and pf.

refinement. The initial pass of the algorithm visits the left-most column, and each subsequent pass additionally visits the next column to the right, until the safety of the cast is proved.

Approximation   As discussed previously (e.g., in §4.1.4), we use match and m̄atch edges between corresponding field accesses to enable approximation. LF (defined in §3.2) has two types of matched parentheses pairs, (putfield[f], getfield[f]) and (getfield[f], putfield[f]); we add match edges for the former type (shown as dashed edges in Figure 5.5) and m̄atch edges for the latter type. The match edges define the


columns of Figure 5.5, as the analysis uses match edges to limit its scope in each pass. This correspondence was shown previously in Figure 5.3, where in each analysis pass, the gray shading (indicating analyzed code) stops at the first field whose accesses still have match edges.

Our language for computing reachability with match edges is LFR (R for refinement), with start symbol flowsToR:

flowsToR → new ( assign | putfield[f] aliasR getfield[f] | match )*
aliasR → f̄lowsToR flowsToR

flowsToR is exactly the language used for context-insensitive refinement (cf. §4.3). Our algorithm uses match edges whenever possible to skip the (potentially expensive)

check for an aliasR-path. For Figure 5.5, the algorithm skips from retget to padd using a match edge in its first pass, never leaving the first column of the graph. However, this leads the analysis to conclude that o34 is in the points-to set, necessitating refinement. Note that the analysis cannot filter out the param[35] edge in this pass, as it reaches

add() through a match edge and hence cannot rule out flow from call site 35.

Refinement   Selective refinement is accomplished by removing match edges from the graph, as in §5.1. Removal of match edges forces the analysis to search for more aliasR-paths, as it can no longer skip between the corresponding field accesses; this search can yield a more precise analysis result. In the example, the second pass of our analysis refines by removing the match edge retget ←match— padd in Figure 5.5. This removal forces the analysis to handle the read and write of arr by checking for an aliasR-path from tget to tadd. Since match edges in the second column remain, the analysis only explores the first two columns of the graph. In this case, an aliasR-path is found by connecting the f̄lowsToR-path tget —m̄atch→ tVector —n̄ew→ o3 with the flowsToR-path o3 —new→ tVector —match→ tadd.


Again, the analysis cannot filter the path using the param[35] edge, as add() is reached through a match edge. So, further refinement is needed.

Context Sensitivity   As described in §5.1, a match edge can skip over an arbitrary sequence of method calls and returns, requiring approximation of context sensitivity.

As shown in Figure 5.4, RC “forgets” calling context information when “traversing” match edges using the transition S —match→ ε, and treats m̄atch analogously. Each

pass of our analysis algorithm computes (LFR ∩ RC)-reachability, using RC to filter unrealizable flowsToR-paths as much as possible, i.e., by checking that sub-paths

between match edges are RC-paths. As previously noted, the first two passes of the

analysis of the example were unable to disregard the param[35] edge, as RC state was

cleared across a match edge before reaching add(). Removal of match edges can lead to filtering of paths because it exposes more call entry and exit edges for analysis. After the second pass, our analysis further refines field-sensitivity by removing match edges on the elems field, i.e., the tVector —match→ tget and tVector —match→ tadd edges in the second column of Figure 5.5. So, in the third pass

the analysis must find aliasR-paths from thisget to thisVector and from thisVector to

thisadd. Clearly, the this parameters of the Vector methods can be aliased; hence,

the key to precision will be to use the state of RC tracked along the aliasR-paths

between the this parameters to do filtering.

The tracking of RC state in the third pass of our analysis yields a result precise enough for proving the downcast safe in the example. When searching for an aliasR-path from thisget to thisVector, the analysis starts with RC call stack ⟨25⟩, a call site for Vector.get() in Figure 5.2. The aliasR-path to thisVector combines the f̄lowsToR-path thisget —p̄aram[25]→ tupdate —m̄atch→ tAddrBook —n̄ew→ o16 and the flowsToR-path o16 —new→ tAddrBook —param[16]→ thisVector. Tracking RC state along this aliasR-path yields call stack ⟨16⟩ at thisVector; essentially, the analysis has concluded that the get() call at line 25

of Figure 5.2 must be on a Vector allocated at line 16, i.e., a Vector used by an


AddrBook.

Similarly, after finding an aliasR-path from thisVector to thisadd, the analysis reaches thisadd with RC call stack ⟨20⟩, concluding that add() must be entered from

the call site at line 20. The analysis can then filter out the add() call at line 35, thereby

filtering o34 from pt(name) and yielding the desired result, i.e., that nameupdate is not

(LF ∩ RC)-reachable from o34. Notice that the result is computed without touching the fourth column of Figure 5.5, illustrating how our technique can compute a sufficiently precise result while limiting analysis scope.

5.3.2 Pseudocode

Here we present pseudocode for our refinement-based context-sensitive points-to analysis. We first discuss the outer refinement loop of the analysis (§5.3.2.1), which removes match edges from the graph until a sufficiently precise answer is computed for the analysis client. Then, in §5.3.2.2 we present in detail the algorithm for computing points-to information in each refinement pass, a modification of the FullFS algorithm from §4.6. Finally, §5.3.2.3 presents our refinement policy for removing match edges.

Interface to Clients   The presented algorithm computes (LOTF ∩ R)-reachability2 for LOTF as defined in Figure 3.8 and any regular language R. For context-sensitive analysis, the RC language of Figure 5.4 would be used.3 The algorithm takes as

inputs some variable v and some state s from an automaton describing R, notated vs. It also requires a function ClientSatisfied that returns true when a computed

points-to set for vs satisfies the client. As an example, a client wishing to use context-

sensitive points-to analysis to prove that a cast y = (A)x can never fail would query

xε (where ε represents the empty call stack of RC) with a ClientSatisfied function

2There are minor restrictions on the regular language for the presented pseudocode, for simplicity. The restrictions are described along with the corresponding pseudocode.
3Other regular languages may be useful, for example to detect sensitive sinks in tainting analysis [LL05]; developing such languages is future work.


CSRefinePTo(vs)
1  while true
2      do pto ← RefinePass(vs)
3         if ClientSatisfied(pto)
4            then return true
5            else if ¬UpdateRefinementPolicy() then return false

Figure 5.6: Pseudocode for the refinement loop of our points-to analysis.

that returns true iff the types of all abstract locations in the computed points-to set are subtypes of A. The algorithm then computes increasingly precise points-to sets using refinement until (1) the client is satisfied, (2) no more refinement is possible, or (3) some time budget has been exhausted.
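As an illustration of this client interface, here is a minimal Java sketch of a cast-checking ClientSatisfied function; the Type and AbstractLoc interfaces are hypothetical stand-ins for the implementation’s type hierarchy and abstract locations, not our actual API.

import java.util.Set;

// Hypothetical stand-ins for the analysis' types and abstract locations.
interface Type { boolean isSubtypeOf(Type other); }
interface AbstractLoc { Type type(); }

// The client is satisfied once every abstract location in the computed
// points-to set has a type that is a subtype of the cast target A.
final class CastCheckClient {
    private final Type castTarget; // the A in "y = (A) x"

    CastCheckClient(Type castTarget) { this.castTarget = castTarget; }

    boolean clientSatisfied(Set<AbstractLoc> pointsToSet) {
        for (AbstractLoc o : pointsToSet) {
            if (!o.type().isSubtypeOf(castTarget)) {
                return false; // a possibly-failing target: keep refining
            }
        }
        return true; // the cast is proved safe
    }
}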

5.3.2.1 Refinement Loop

Figure 5.6 gives pseudocode for the outermost refinement loop of our algorithm. Each loop iteration first computes a points-to set for the input variable and state vs at line 2, approximated based on which match edges are refined. (RefinePass, which actually computes the points-to set, is defined in Figure 5.7 and will be discussed shortly.) If the computed points-to set satisfies the client, the loop terminates (line 3 and line 4). Otherwise, the refinement policy for match edges is updated (line 5). If UpdateRefinementPolicy returns false, then no more match edges can be refined—i.e., the algorithm has computed the most precise points-to set possible given its abstraction—and the algorithm terminates. Otherwise, a more precise points-to set is computed in the next iteration.

Our implementation adds two features not present in the pseudocode of Figure 5.6. First, in the case where the client is satisfied with a computed points-to set, the set itself is made available to the client for further processing. Second, the implementation allocates a resource budget for each query, such that if the budget is exceeded in

computing points-to sets, the query is terminated early. We discuss budgets more in §5.4.1.

5.3.2.2 Computing Points-To Sets

Pseudocode for RefinePass, which computes a points-to set in each pass of the refinement algorithm, is presented in Figure 5.7, with helper functions in Figure 5.8. The algorithm is an extended version of the FullFS algorithm of Figure 4.13. We chose to extend FullFS for the context-sensitive algorithm since caching of intermediate results is essential for good performance (unlike the context-insensitive case), and FullFS integrates such caching nicely. Our extensions to FullFS are:

• Handling of the regular language R to be intersected with LOTF, which requires computing state transitions and checking for the error state in several places.

• Handling of match edges.

• On-the-fly call graph refinement, which requires the graph representation of Table 3.2 (including concType[T], receiver[i][s], paramForType[T][s], and returnForType[T][s] edges). The algorithm computes (LOTF ∩ R)-reachability, where LOTF was defined in Figure 3.8.

We will explain each of these extensions in turn. The discussion assumes an understanding of the pseudocode in Figure 4.13, and the reader may benefit from reviewing that figure again before proceeding.

Handling the Regular Language   Our algorithm computes (LOTF ∩ R)-reachability by exploding the input graph G for R. The exploded graph G#R is constructed such that computing standard graph reachability on G#R is equivalent to computing R-reachability on G [RHS95]. The nodes and edges of G#R incorporate the states and transitions of some FSM F for R:


RefinePass(vs)
 1  Add query vs ,→ .
 2  repeat
 3      for each edge x ←new— o and state s
 4          do if xs ,→ .
 5                then s′ ← Trans(s, new)
 6                     if s′ ≠ error then add os′ to Xs
 7             if . ,→ os (any)
 8                then s′ ← Trans(s, n̄ew)
 9                     if s′ ≠ error then add os to X̄s′
10      for each edge x ←assign— y and state s
11          do HandleCopy(x, y, assign, s)
12      for each edge x ←assignglobal— y and state s
13          do HandleCopy(x, y, assignglobal, s)
14      for each edge x ←param[i]— y and state s
15          do HandleCopy(x, y, param[i], s)
16      for each edge x ←return[i]— y and state s
17          do HandleCopy(x, y, return[i], s)
18      for each edge x ←getfield[f]— y and state s
19          do if UseMatchEdges(f)
20                then for each edge x ←match— p
21                         do HandleCopy(x, p, match, s)
22                else if xs ,→ .
23                         then add ys ,→ .
24                              for each os′ ∈ Ys
25                                  do add . ,→ os′ (fw)
26                                     add all of Os′.f to Xs
27      for each edge x ←putfield[f]— y and state s
28          do for each os′ ∈ Xs
29                 do if . ,→ os′ (fw)
30                       then add ys ,→ .
31                            add all of Ys to Os′.f
32             if Ȳs ≠ ∅
33                then add xs ,→ .
34                     for each os′ ∈ Xs
35                         do add all of Ȳs to Ōs′.f
36                            add . ,→ os′ (fr)
37      HandleOTF()
38  until no change
39  return Vs

Figure 5.7: Pseudocode for a single iteration of our context-sensitive refinement algorithm. See Figure 5.8 for the HandleCopy and HandleOTF procedures.


HandleCopy(x, y, label, s)
   £ Handle a copy edge x ←label— y,
   £ considering the cases of either x or y in state s
 1  if xs ,→ .
 2      then s′ ← Trans(s, label)
 3           if s′ ≠ error
 4              then add ys′ ,→ .
 5                   add all of Ys′ to Xs
 6  if Ȳs ≠ ∅
 7      then s′ ← Trans(s, l̄abel)
 8           if s′ ≠ error then add all of Ȳs to X̄s′

HandleOTF()
   £ Partial pseudocode; other cases are similar
 1  for each edge f ←paramForType[T][i]— a and state s
 2      do if fs ,→ .
 3            then s′ ← Trans(s, paramForType[T][i])
 4                 if s′ = error then continue
 5                 add rs′ ,→ ., where a —receiver[j][i]→ r exists
 6                 for each os″ ∈ Rs′
 7                     do if no edge o —concType[T]→ o exists then continue
 8                        add . ,→ os″ (call)
 9                        for each state sc with os″ ∈ R̄sc
10                            do add asc ,→ .
11                               add all of Asc to Fs

Figure 5.8: Pseudocode for HandleCopy and HandleOTF, helper functions for the pseudocode in Figure 5.7.

• Each node ns in G#R corresponds to some node n in G and some state s in F.

• An edge ns —f→ n′s′ exists in G#R iff (1) G has an edge n —f→ n′ and (2) F has a transition from s to s′ on f.

Determining whether n′ is (LOTF ∩ R)-reachable from n in G is equivalent to checking if n′a is LOTF-reachable from ni in G#R, where i and a are respectively initial and accept states for F.


As seen in Figure 5.7, our algorithm modifies FullFS to operate on nodes in the exploded graph. States from the FSM for R are associated with both variable nodes

(e.g., the initial query vs ,→ . at line 1) and abstract location nodes (e.g., os′ at line 6). Points-to sets and tracked sets are also maintained for exploded graph nodes, e.g., Xs at line 6 and X̄s′ at line 9.

For scalability, our algorithm lazily explodes the input graph during query computation. Constructing a fully exploded graph for R is both unnecessary, since many node-state combinations do not arise during a query, and intractable for large languages (e.g., RC, whose FSM has an exponential number of states). Lazy explosion requires computing state transitions in the FSM for R while traversing the original graph G. This computation is performed by calls to the Trans procedure, e.g., at line 2 of HandleCopy in Figure 5.8. Given edge n —f→ n′ and node ns from G#R, the corresponding successor in G#R is computed to be n′Trans(s,f). Note that it is possible that Trans(s, f) = error, indicating that the FSM has no transition from s on f. In such a case, R has filtered out the current path, at which point the pseudocode aborts exploration of the path.
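A minimal sketch of this lazy explosion, assuming hypothetical Node and State type parameters: exploded nodes (n, s) are memoized so each is materialized at most once, the first time n is reached in state s.

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Exploded nodes are created on demand rather than built up front.
final class LazyExplodedGraph<Node, State> {
    private final Map<ExplodedNode<Node, State>, ExplodedNode<Node, State>>
            interned = new HashMap<>();

    // Returns the unique exploded node for (n, s), creating it if needed.
    ExplodedNode<Node, State> node(Node n, State s) {
        ExplodedNode<Node, State> key = new ExplodedNode<>(n, s);
        return interned.computeIfAbsent(key, k -> k);
    }
}

final class ExplodedNode<N, S> {
    final N node;  // node n from the original graph G
    final S state; // FSM state s for the regular language R

    ExplodedNode(N node, S state) { this.node = node; this.state = state; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ExplodedNode)) return false;
        ExplodedNode<?, ?> e = (ExplodedNode<?, ?>) o;
        return node.equals(e.node) && state.equals(e.state);
    }

    @Override public int hashCode() { return Objects.hash(node, state); }
}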

RC for context sensitivity. In this case, x and y in HandleCopy are respectively the formal and actual parameter associated with the edge. If xs has been queried (line 1 in HandleCopy), then s0 is computed for the param[i] according to the rules of Figure 5.4 (line 2). This transition will result in an error state if the top of the call stack represented by s is not i. Otherwise, propagation across the edge is handled as in FullFS (line 4 and line 5). Tracked sets are handled in a similar manner in line 6 through line 8.


Handling match edges   The only change required for handling match edges appears in line 19 through line 21. The UseMatchEdges procedure, provided by the refinement policy, returns true if match edges are present in the graph for accesses of field f. (We discuss our refinement policy further in §5.3.2.3.) We have found that in practice, removing or keeping all match edges for particular fields (instead of particular field accesses) is sufficient for devising an effective refinement policy. If match edges are present for some field, they are handled through a call to HandleCopy at line 21.

On-the-fly Call Graph   Recall from §2.2.3 that on-the-fly call graph refinement requires determining virtual call targets directly in the points-to analysis, by computing points-to information for receiver arguments. We formulated this problem in Chapter 3 as adding edges reflecting virtual call dispatch semantics to our graph representation and finding paths to compute call targets. Since here we aim to perform context-sensitive refinement of the pre-computed call graph, our algorithm searches

for virtParam[i] and virtReturn[i] paths (and their inverses) from the LOTF language, as defined in Figure 3.8.4 The HandleOTF procedure in Figure 5.8 has pseudocode for finding virtParam[s] paths (finding v̄irtParam[s], virtReturn[s], and v̄irtReturn[s] paths requires similar logic), according to the following productions from Figure 3.8:

virtParam[s] → dispatch[i][s] flowsTo receiver[i][s]
dispatch[i][s] → paramForType[T][s] receiver[i][s] pointsTo concType[T]

Recall that i represents the parameter position, while s identifies the call site. Line 3 in HandleOTF (Figure 5.8) tracks the state along the paramForType[T][i] edge to actual parameter a, and line 5 follows the receiver[j][i] edge to the receiver argument

4Note that an implementation can use a separate refinement policy to select a subset of virtual call sites to refine; for simplicity, our pseudocode refines all virtual call sites.


r.5 Line 5 also introduces a points-to query for r, thereby finding pointsTo-paths from r to abstract locations. For each such abstract location o, line 7 checks for the existence of the requisite o —concType[T]→ o edge. Finally, line 8 finds a flowsTo-path from o back to r, and line 10 continues points-to computations at a, since from line 5 it is known that an edge a —receiver[j][i]→ r must exist, completing the virtParam[i] path. Line 8 and line 9 in HandleOTF, which follow a flowsTo-path from o to r, are necessary to compute the proper regular language state at the virtual call site.

The pointed-to-by query . ,→ os″ (call) uses a new reason call to avoid unnecessary processing at field accesses. At first glance, this traversal of the flowsTo-path may seem unnecessary, since its existence has already been proved by the discovery of a pointsTo-path from r to o (line 5 and line 6). However, the algorithm must traverse the flowsTo-path to determine the correct regular language state at r, which may differ from the s′ state of line 3. As a simple example, if we have the path r —p̄aram[j]→ o —param[j]→ r, where the regular language is RC for context sensitivity and the initial state is ε, the state at the end of the path will change to ⟨j⟩ (due to the admittance of partially balanced parentheses by RC). How does this additional tracking of regular language states affect analysis precision? Consider the example of Figure 5.9. Say there is a query for the points-to

set of pB.foo, i.e., the p parameter of B.foo(). The virtual call at line 8 may invoke

B.foo(), and hence value flow from pcallFoo must be considered. However, only the

callFoo() call at line 16 should be considered for this query, as the previous call at

line 15 does not lead to B.foo() being invoked at line 8 (since for that call acallFoo

has concrete type A). Without the tracking of regular language states at line 8 and

line 9 of HandleOTF, flow from both callFoo() call sites would be considered in

this example, leading to the result pt(pB.foo) = {o13, o14} instead of the more precise

5Note that for simplicity, we do not allow state machine transitions on receiver or concType edges. Adding support for such transitions would be straightforward, but such support is unnecessary for context-sensitive analysis.


 1 class A {
 2   void foo(Object p) { ... }
 3 }
 4 class B extends A {
 5   void foo(Object p) { ... }
 6 }
 7 void callFoo(A a, Object p) {
 8   a.foo(p);
 9 }
10 main() {
11   A a = new A();
12   B b = new B();
13   Object p1 = new Object();
14   Object p2 = new Object();
15   callFoo(a, p1);
16   callFoo(b, p2);
17 }

Figure 5.9: Example illustrating the need to properly track states at virtual call sites.

answer pt(pB.foo) = {o14}.

Recursion Detection   In our implementation, we detect recursive calls during computation of points-to information, as discussed in §5.2.2. When a cycle in the input graph corresponding to recursive method calls is detected, all call sites in the cycle are marked as recursive. Once marked as recursive, call sites are never again pushed on a call stack in our state machine code, in effect collapsing the recursive cycles. A question arises of how to handle existing call stacks (stored as states in exploded graph nodes) which contain call sites that were later detected as recursive. One solution is to remove recursive call sites from existing call stacks whenever they are inspected by the algorithm. Since we have found in practice that recursive cycles rarely arise for the clients we tested, we instead use a simpler solution and just re-start a query computation after each recursive cycle is detected.

Performance Considerations   As with FullFS, certain implementation details are important for good performance. Worklists are used to only process necessary graph edges (as opposed to iterating over the entire graph). Rather than iterating over all


states as in the loops of Figure 5.7, the implementation only considers those states in which a node has been queried. Finally, an efficient bit-vector set implementation is used for points-to sets and tracked sets, to ensure that set unions can be computed quickly.
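For illustration, a minimal sketch of such a bit-vector set, assuming abstract locations are densely numbered; java.util.BitSet makes the union a word-parallel operation. This is an illustrative design, not our exact implementation.

import java.util.BitSet;

// Points-to and tracked sets over densely numbered abstract locations.
final class BitVectorSet {
    private final BitSet locs = new BitSet();

    void add(int locId) { locs.set(locId); }

    boolean contains(int locId) { return locs.get(locId); }

    // Returns true if this set grew, so fixed-point loops can detect change.
    boolean unionWith(BitVectorSet other) {
        int before = locs.cardinality();
        locs.or(other.locs); // fast word-at-a-time union
        return locs.cardinality() != before;
    }
}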

5.3.2.3 Refinement Policy

We have yet to specify how we choose match edges to remove after each analysis pass. We remove match edges with the goal of distinguishing the field contents of

different objects of some class T; as shown in Figure 5.3, match edges for accesses of field f cause merging of the contents of f across objects. Our method for choosing

T is straightforward, but empirically effective: we choose the enclosing class for the field corresponding to the first match edge encountered in the previous analysis pass, and then remove match edges on all fields of this class.6 In the example of Figure 5.5,

we encounter a match edge on arr in the first pass and elems in the second pass, leading to removal of match edges on those fields. Removing these match edges allows

the analysis to distinguish the contents of the internal Object array of a particular

Vector, as desired.
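A minimal sketch of this policy follows, assuming hypothetical Clazz and Field interfaces: UseMatchEdges consults the set of already-refined fields, and UpdateRefinementPolicy refines all fields of the class enclosing the first match edge seen in the previous pass, returning false when nothing is left to refine. (The extension to superclasses and inner classes mentioned in footnote 6 is omitted here.)

import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for class and field metadata.
interface Clazz { Set<Field> fields(); }
interface Field { Clazz declaringClass(); }

final class RefinementPolicy {
    private final Set<Field> refinedFields = new HashSet<>();
    private Field firstMatchField; // recorded during each analysis pass

    // Figure 5.7 queries this before using a match edge for field f.
    boolean useMatchEdges(Field f) { return !refinedFields.contains(f); }

    // Called whenever a pass traverses a match edge for field f.
    void sawMatchEdge(Field f) {
        if (firstMatchField == null && useMatchEdges(f)) firstMatchField = f;
    }

    // Returns false when no further refinement is possible.
    boolean updateRefinementPolicy() {
        if (firstMatchField == null) return false;
        // Remove match edges for all fields of the enclosing class.
        refinedFields.addAll(firstMatchField.declaringClass().fields());
        firstMatchField = null;
        return true;
    }
}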

5.4 Evaluation

Our experiments validated the following three experimental hypotheses:

Some clients need context sensitivity. We confirmed (as shown previously [LH06]) that context-insensitive analysis does not have enough precision for the cast-checking client, as it could only prove 7.8% of the downcasts in our benchmarks safe.

6We also remove match edges for fields in superclasses and inner classes of T, as they also tend to be relevant.


Our refinement approach is precise. Our refinement algorithm proved 61% more casts safe on average than one of the most precise existing algorithms [LH06], and refinement was critical for this precision gain. Also, our algorithm proved a disjointness property of objects allocated in some factory methods, requiring precision beyond that of the existing algorithm.

Our refinement approach is scalable. With the analysis budget we chose, our algorithm checked all application downcasts in under 13 minutes on all benchmarks. Furthermore, our algorithm required no more than 35MB of memory for any of the benchmarks, an order of magnitude less than the memory requirements for existing comparable analyses.

5.4.1 Experimental Configuration

Implementation   We implemented our analysis using the Soot 2.2.1 [VRHS+99] and Spark [LH03] frameworks. For our graph representation, we augment the pointer assignment graph built by Spark with param[i] and return[i] edges for context sensitivity. We analyzed the Sun JDK 1.3.1_01 libraries, as Soot provides models of this version’s native methods. All experiments were performed on a machine with a Xeon 2.4GHz processor and 2GB RAM, running Fedora Core 1 Linux.

Our Soot-based implementation differs from the pseudocode of Figure 5.7 in some minor ways. First, the implementation uses recursive calls rather than worklists to traverse the graph for a query. We found the behavior of the recursive implementation easier to reason about during algorithm design, and the performance impact was small. Also, the implementation sometimes uses an alternate strategy for discovering aliasing between base pointers of field accesses. Given base pointer p of a field read and q of a field write, the algorithm of Figure 5.7 searches for a flowsTo-path from any abstract location in pt(p) to q. The alternate strategy computes pt(p) and pt(q)


and then checks if pt(p) ∩ pt(q) ≠ ∅, tracking RC state appropriately. This alternate strategy sometimes requires far less graph traversal, and the implementation uses it when the default strategy fails. We give experimental results for the following analyses:

DemRef: our demand-driven, refinement-based algorithm.

Full: our demand-driven algorithm configured to treat all code with full precision, rather than refining.

1H: a 1-limited object-sensitive analysis [MRR05] (i.e., limited to 1 level of object sensitivity) with a (1-limited) context-sensitive heap abstraction and call graph, provided as part of the Paddle framework for BDD-based analysis [Lho06].

The 1H algorithm was chosen because in recent work [LH06], it was shown to be the most precise of a set that included the Zhu and Calman and Whaley and Lam algorithms [ZC04, WL04] and a call string approach [Shi88]. We were unable to run

the 1H algorithm on the chart benchmark within 2GB of RAM; the result for chart in Table 5.2 is taken from [LH06], as its results for other benchmarks exactly matched our observations. To compare with an analysis that handles assignments with equality constraints, we also implemented data structure analysis [LLA07], a context-sensitive analysis for C that we adapted to Java. We implemented the analysis both with and without on-the-fly context-sensitive call graph construction. We found that without a context-sensitive call graph, the analysis was much too imprecise for our clients; e.g., it could not prove any casts safe in most benchmarks. This imprecision stemmed from the collapsing of call graph SCCs by the analysis (see §5.2.2), which are large in a context-insensitive call graph [LH06]. We were unable to sufficiently scale the algorithm variant with context-sensitive call graph construction to analyze our benchmarks; the most similar published analysis for Java [O’C00] had similar scalability issues.


Configuration   We configure our analysis to refine the precision of context-insensitive field-sensitive Andersen’s analysis with an on-the-fly call graph, as implemented in Spark [LH03]. We use the context-insensitive analysis to answer queries that require less precision, and to rule out certain paths in our analysis. For example, if we are trying to prove a cast of x to type T safe, we do not traverse to nodes y where the context-insensitive analysis shows that all locations in pt(y) are subtypes of T. The analysis is scalable, analyzing all benchmarks in under 3.5 minutes, including the time required to construct the graph representation. We include the context-insensitive analysis time in all presented running times for our analysis.

Our refinement analysis is best run with a budget: after some fixed amount of time for each query, the analysis terminates and returns a conservative result to the client [SGSB05]. This budget prevents the analysis from running excessively long on queries it cannot hope to answer precisely, e.g., those that require flow-sensitive precision. For our experiments, we configured our analysis to traverse at most 75000 nodes per query, divided evenly among a maximum of 10 refinement iterations, a sweet spot for the tested clients; doubling the budget yielded a negligible precision gain.
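A minimal sketch of such budget enforcement, with the per-pass share computed from the totals above; one Budget instance per refinement pass, and the Exceeded exception is an illustrative device for aborting a pass, not our actual mechanism.

// Per-query traversal budget, divided evenly among refinement passes.
final class Budget {
    static final class Exceeded extends RuntimeException {}

    private int remaining;

    Budget(int totalNodes, int maxPasses) {
        this.remaining = totalNodes / maxPasses; // e.g., 75000 / 10 per pass
    }

    // Called once per node traversed; aborts the pass when its share is spent.
    void charge() {
        if (--remaining < 0) throw new Exceeded();
    }
}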

types of T. The analysis is scalable, analyzing all benchmarks in under 3.5 minutes, including the time required to construct the graph representation. We include the context-insensitive analysis time in all presented running times for our analysis. Our refinement analysis is best run with a budget: after some fixed amount of time for each query, the analysis terminates and returns a conservative result to the client [SGSB05]. This budget prevents the analysis from running excessively long on queries it cannot hope to answer precisely, e.g., those that require flow-sensitive precision. For our experiments, we configured our analysis to traverse at most 75000 nodes per query, divided evenly among a maximum of 10 refinement iterations, a sweet spot for the tested clients; doubling the budget yielded a negligible precision gain. Benchmarks Our benchmark suite is described in Table 5.1. We use the same suite as that of [LH06], to compare with its object-sensitive analysis. The size of the benchmarks are comparable to those used in other recent Java points-to analysis studies [LH03, WL04]. Clients We evaluated our analysis using three clients. The first was a client that checked the safety of downcasts in application code; as in [LH06], library casts were excluded to make benchmark differences clear, but the library was still analyzed when necessary for application casts. As illustrated in §5.1.3, downcast checking is an exacting test of points-to analysis precision, especially of the ability to distinguish the contents of different data structures; an analysis that fares poorly at proving


Benchmark    Methods   Statements
compress     2722      36690
db           2741      37243
jack         2996      42729
javac        3916      77619
jess         3354      47645
mpegaudio    2927      41009
mtrt         2873      39180
soot-c       4979      90355
sablecc-j    8853      164056
polyglot     6227      120634
antlr        4021      77934
bloat        5415      106629
chart        7323      110594
jython       4560      69026
pmd          7388      115857
ps           5320      106718

Table 5.1: Information about our benchmarks. We include the SPECjvm98 suite, soot-c and sablecc-j from the Ashes suite [Ash], several benchmarks from the DaCapo suite version beta050224 [DaC], and the Polyglot Java front-end [NCM03]. The “Statements” column gives the number of edges in the graph representation. The numbers include the reachable portions of the Java library, determined using a call graph constructed on the fly with Andersen’s analysis [And94] by Spark [LH03].


downcast safety is unlikely to satisfy other demanding clients. We also experimented with a client that tries to prove disjointness of the contents of objects allocated in factory methods, i.e., methods that return a newly-allocated object for each call. For example, an iterator() method typically allocates a new

Iterator object for each call. The client looks for factory methods using simple pattern matching, and then tries to prove disjointness of method return values for objects

allocated in different calls to these methods. For iterator(), the client tries to show

that calls to next() on Iterator objects allocated by different calls to iterator() can return distinct objects. Proving such disjointness properties could be important, e.g., to reduce false positives for a verification client. Furthermore, this client requires greater precision than the 1H algorithm of [LH06] can provide (since it requires at least 2 levels of object-sensitivity), and hence illustrates the benefits of having a more precise analysis.

Finally, to further test performance, we ran a client that queried the DemRef analysis for all application variables where the 1H analysis yielded a more precise result than context-insensitive analysis, representing a client that requires near-exhaustive points-to information.
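To make the factory-method property concrete, consider the following hypothetical snippet (not taken from the benchmarks): the client tries to prove that the Iterators produced by the two iterator() calls, and hence the objects returned through them, can be distinct.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class FactoryExample {
    public static void main(String[] args) {
        List<String> xs = new ArrayList<>();
        xs.add("a");
        List<String> ys = new ArrayList<>();
        ys.add("b");
        Iterator<String> it1 = xs.iterator(); // first factory call
        Iterator<String> it2 = ys.iterator(); // second factory call
        String a = it1.next();
        String b = it2.next();
        // A sufficiently precise analysis (at least 2 levels of object
        // sensitivity) can show a and b may refer to distinct objects.
        System.out.println(a + " " + b);
    }
}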

5.4.2 Experimental Results

Imprecision of context-insensitive analysis   We found context-insensitive Andersen’s analysis (from Spark [LH03]) to be insufficient for proving downcasts safe in our benchmarks. The analysis could prove an average of only 7.8% of casts safe, ranging from 0% for compress to 31.7% for sablecc-j. This result is consistent with previous work [LH06], and shows the client’s need for more precise analysis.

Precision for cast-checking   Table 5.2 shows that our refinement algorithm provides more precision for the cast-checking client than the 1H algorithm.


Benchmark   Casts  DemRef Time (s)  DemRef Safe (%)  Full Safe (%)  1H Safe (%)
compress    6      44.8             33.3             33.3           0.0
db          24     44.4             79.2             37.5           25.0
jack        135    62.8             52.6             23.0           31.1
javac       315    150.3            20.6             12.4           13.3
jess        76     63.7             72.4             6.6            57.9
mpegaudio   12     58.8             25.0             25.0           33.3
mtrt        10     47.4             50.0             40.0           40.0
soot-c      906    387.8            28.0             14.1           8.3
sablecc-j   362    315.8            18.5             5.5            11.9
polyglot    3482   750.3            88.1             6.8            72.5
antlr       281    118.2            50.9             2.8            21.7
bloat       1217   472.6            12.6             5.2            6.7
chart       535    283.5            38.5             9.0            30.5
jython      464    84.9             8.8              2.8            6.5
pmd         1135   571.7            15.1             10.0           11.2
ps          659    131.1            6.2              5.5            41.0

Table 5.2: Results for the cast safety client. The “Casts” column gives the number of downcasts that context-insensitive analysis cannot prove safe; these numbers differ from those in Lhotak’s work because we exclude casts of non-pointers (e.g., float to int), as they cannot cause a runtime exception. The three rightmost columns respectively give the percentage of these casts proved safe by our refinement algorithm (“DemRef”), our demand-driven algorithm configured to treat all code precisely (“Full”), and the object-sensitive analysis of Lhotak’s work [LH06] (“1H”). The “DemRef Time” column gives the running time for the refinement algorithm in seconds.

The refinement technique proved an average of 1.61x as many casts safe as the 1H algorithm

(excluding compress where 1H proved no casts safe), ranging from 0.15x for ps to

3.39x for soot-c. The large precision benefit for the soot-c benchmark stems primarily from precise handling of iterators. For example, proving the cast to Foo safe is beyond the capabilities of the 1H algorithm for the following code:7

Iterator i = x.iterator();
o = (Foo)i.next();

7Two levels of object-sensitivity (including the heap abstraction) would suffice for this case, but that analysis does not yet scale in the Paddle framework [Lho].


Benchmark   # Factory  Dist (%)  Time (s)
db          1          100.0     44.2
jack        1          100.0     52.7
javac       7          28.6      63.1
jess        12         91.7      52.6
soot-c      19         63.2      58.0
sablecc-j   9          55.6      106.4
polyglot    22         31.8      183.6
antlr       3          100.0     56.8
bloat       13         30.8      63.0
chart       19         36.8      110.3
jython      14         21.4      55.4
pmd         20         25.0      101.6
ps          1          100.0     69.5

Table 5.3: Results for the factory method client. The “# Factory” column gives the number of detected factory methods, and the “Dist” column gives the percentage of those factory methods for which the analysis could distinguish the contents of the allocated objects. The “Time” column gives the running time in seconds.

The refinement algorithm is significantly more precise for cast checking (within the same budget) than a demand-driven analysis that treats all code precisely (the “Full” algorithm), as shown in Table 5.2. Given the analysis budget of 75000 nodes, the algorithm with refinement proved 4.25x more casts safe than without refinement, ranging from 1x for mpegaudio to 17.88x for antlr; doubling the analysis budget had a negligible impact on this result. The algorithm without refinement is often even less precise than the 1H algorithm, showing the importance of using both the demand-driven approach and refinement.

There are several reasons why some casts cannot be proved safe by our analysis.

DemRef was less precise than 1H for the ps benchmark due to 181 casts of objects read from an operator stack mutated in many parts of the program; the large amount of relevant code led to DemRef choosing incorrect fields to refine for these casts. Proving certain casts safe requires flow- or path-sensitivity, e.g., for casts dominated by an


Benchmark   # Queries  Av. Query (s)  Time (s)
compress    35         0.08           46.9
db          98         0.09           53.3
jack        350        0.19           119.2
javac       2727       0.32           942.2
jess        587        0.43           303.6
mpegaudio   69         0.08           61.4
mtrt        84         0.07           50.5
soot-c      2481       0.58           1503.6
sablecc-j   5704       0.50           2957.0
polyglot    18893      0.26           5118.6
antlr       3143       0.32           1070.7
bloat       3354       0.31           1103.4
chart       —          —              —
jython      3127       0.19           647.3
pmd         8079       0.68           5593.3
ps          3859       0.16           690.7

Table 5.4: Results for querying all application variables for which the 1H algorithm of Lhotak’s work [LH06] yielded a more precise result than a context-insensitive analysis. The “# Queries” column gives the number of such variables, the “Av. Query” column gives the average query time in seconds, and the “Time” column gives the total time in seconds. Results for chart are not shown, as we could not run the 1H algorithm on it in available memory.


instanceof check that ensures their safety; many such casts can be proved safe by an extra intraprocedural analysis [O’C00, WS01]. Sometimes, context-sensitive call graph construction consumes the bulk of analysis time for DemRef, but is unnecessary for a precise result; automatically determining which virtual call sites require precise handling is future work.

Precision for factory methods   Our analysis proved the contents of many factory-allocated objects disjoint, as shown in Table 5.3. Excluding benchmarks with fewer than 5 factory methods, the analysis proved disjointness for an average of 42.8% of

the methods in each benchmark, ranging from 21.4% for jython to 91.7% for jess. This result shows that precision greater than that provided by the 1H algorithm is required for realistic clients besides downcast checking, and that our analysis can provide that precision.

Scalability of refinement approach   Due to our demand-driven approach, the memory requirements of our analysis are significantly less than those of previous approaches. In the experiments, our analysis never consumed more than 5MB of memory for any query, and our implementation does no caching between queries. The memory required to store the results from the context-insensitive analysis pre-pass is less than 30MB using BDDs [BLQ+03], yielding a maximum memory requirement of 35MB for these benchmarks. In comparison, we could not run the object-sensitive

algorithm of [LH06] on the chart benchmark within 2GB of RAM, and a precise equality-based analysis requires 1GB of RAM on large benchmarks [Ste]. Table 5.2 shows that the refinement algorithm scaled well for the cast checking client, taking under 13 minutes for each benchmark. The factory method client took under 4 minutes per benchmark (as seen in Table 5.3), as it raised few queries. Table 5.4 gives the data for the client that queried all application variables for which the 1H algorithm gave a more precise answer than context-insensitive analysis. The

longest running time for this client was 94 minutes for pmd, with an average

query time of 0.68 seconds. Our current implementation computes each query result from scratch, and we believe that for large numbers of queries, performance could be significantly improved through more caching.

Chapter 6

Thin Slicing

“Thin-slicing is part of what makes the unconscious so dazzling. But it’s also what we find most problematic about rapid cognition. How is it possible to gather the necessary information for a sophisticated judgment in such a short time?”

Malcolm Gladwell, Blink: The Power of Thinking Without Thinking

This chapter presents thin slicing, a refinement-based program understanding tool based on ideas from our points-to analysis. As discussed in §1.3, thin slicing aims to use refinement to focus programmer attention on those statements most relevant to her development task, just as our points-to analysis uses refinement to focus analysis effort on statements most relevant to the client property. In fact, the technique for explaining heap-based value flow in thin slices is based directly on the match-edge-based refinement in our points-to analysis. Here, we present the details of how thin slices are defined, refined, and computed; see §1.3 for a high level overview of thin slicing.

The rest of this chapter is organized as follows. §6.1 defines producer statements and the thin slicing process, and §6.2 defines thin slices using traditional dependences.


§6.3 describes our technique for expanding (refining) thin slices to explain heap-based value flow and control dependences. §6.4 presents algorithms for computing thin slices as variants of a traditional slicing algorithm. Finally, §6.5 gives results from our experimental evaluation.

6.1 Defining Thin Slices

In this section, we define the producer statements included in a thin slice. We also show how statements excluded from the thin slice explain why the producer statements affect the seed. A simple example, seen in Figure 6.1, is used to illustrate these concepts. §6.2 defines the statements in a thin slice using traditional notions of dependence.

Slicing determines the parts of a program “relevant” to some seed statement. In traditional slicing, relevance is defined as any statement possibly affecting the values computed by the seed. As originally stated by Weiser [Wei79], this relevance definition requires the slice to include an executable subset of the program in which the seed always performs the same computation as in the original program.

Thin slicing differs from classical slicing primarily in its more selective notion of relevance. With thin slicing, only producer statements for the seed are relevant. We define producer statements in terms of direct uses of memory locations (variables or object fields in Java). A statement s directly uses a location l iff s uses l for some computation other than a pointer dereference. For example, the statement y = x.f does not

directly use x, but it does directly use o.f, where x points to o. A statement t is a producer for a seed s iff (1) s = t or (2) t writes a value to a location directly used by some other producer. Consider computing a thin slice for line 7 in the toy example of Figure 6.1. Line 7

directly uses an object field written at line 5 (since w and z are aliased), and therefore,


1 x = new A();
2 z = x;
3 y = new B();
4 w = x;
5 w.f = y;
6 if (w == z) {
7   v = z.f; // the seed
8 }

Figure 6.1: A small program to illustrate thin slicing. Directly-used locations (see §6.1) in the thin slice for line 7 are underlined.

Consider computing a thin slice for line 7 in the toy example of Figure 6.1. Line 7 directly uses an object field written at line 5 (since w and z are aliased), and therefore, line 5 is a producer. Similarly, line 5 directly uses y, which is written at line 3, making line 3 a producer as well. Hence, line 5 and line 3 comprise the thin slice for line 7 (along with line 7 itself). In contrast, the traditional slice for line 7 is the entire example.

We call the non-producer statements in the traditional slice explainer statements. These statements show why the producer statements can affect the seed. Explainer statements can show one of two things about the producers:

Heap-based value flow When values flow between producers through heap locations, the locations are accessed using aliased pointers. Explainer statements show how these base pointers may become aliased.

Control flow The remaining explainer statements show the conditions (i.e., the expressions in conditional branches) under which producer statements actually execute.

Consider again the example of Figure 6.1. Line 2 and line 4 show that w and z both point to the A object allocated at line 1. Hence, these lines are explainers for the heap-based value flow between line 5 and line 7 in the thin slice. Line 6 explains control flow, showing the condition under which the seed statement actually executes.


Thin slicing’s separation of producer and explainer statements provides a natural, structured method for exploring a traditional slice. Traditional slices must include transitive explainer statements (i.e., explainers for the explainers and so on), since any statement possibly affecting the seed is relevant for such a slice. While this transitivity can lead to an overwhelming number of explainer statements, thin slices structure them into a manageable hierarchy. Explainers for heap-based value flow in a thin slice can be shown using two additional thin slices, as shown in §6.3.1. The behavior of a conditional guarding a thin slice statement can also be understood through an additional thin slice. In this manner, more and more thin slices can be used to show explainers, in the limit yielding the traditional slice. In practice, we have found that very few explainers are needed to accomplish typical debugging and understanding tasks. In our evaluation, over half the tasks could be completed with a thin slice alone. In most other cases, only one or two explainer statements were required, and these explainers were lexically close to thin slice statements (further discussed in §6.3.2). Hence, thin slicing is highly effective at identifying the statements in a traditional slice most relevant to developer tasks.

6.2 Thin Slices as Dependences

In §6.1, we defined thin slices in terms of producer statements. Here we define thin slices in terms of the dependences typically used to define traditional slices. The thin slice for a seed s is a subset of those statements upon which s is transitively flow dependent (also known as data dependent), obtained by ignoring uses of base pointers in dereferences. A statement s is flow dependent on statement t if the following three conditions hold [HPR89]:

1. s can read from some storage location l.


[Figure 6.2 depicts the dependence graph, with one node per statement of Figure 6.1: x = new A(), w = x, z = x, w == z, y = new B(), w.f = y, and v = z.f.]

Figure 6.2: A dependence graph for the program of Figure 6.1. Thick edges indicate non-base-pointer flow dependences, used for thin slicing. Traditional slicing also uses base pointer flow dependences (the dashed edges) and control dependences (the dotted edge).

2. t can write to l.

3. There exists a control-flow path from t to s on which l is not re-defined.
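For instance, in this small fragment of ours (a hypothetical example, not from the dissertation's benchmarks), the final read of x is flow dependent on the second write but not the first:

    static int flowDependenceExample() {
        int x = 1;   // t1: defines x
        x = 2;       // t2: re-defines x, killing t1's definition
        int y = x;   // s: flow dependent on t2 but not on t1, since x is
                     // re-defined at t2 on every control-flow path from t1 to s
        return y;
    }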

For Java-like languages, storage locations are either variables (local or global) or object fields, with the latter accessed through some field dereference of the form x.f. Traditional slices must include the transitive flow dependences of the seed. Thin slices ignore base pointer flow dependences, thereby excluding statements explaining heap-based value flow. A base pointer flow dependence is a flow dependence due solely to the use of a pointer in a field dereference. For the statement y = x.f, flow dependences due to the use of x are base pointer flow dependences. Similarly, a statement of the form p.f = q has base pointer flow dependences due to the use of p. Ignoring base pointer flow dependences leaves only producer flow dependences, which transitively connect a statement to its producers. For example, y = x.f would have a producer flow dependence to some statement z.f = w, where x and z may be aliased.

Figure 6.2 shows an example dependence graph for the program of Figure 6.1. Nodes represent statements, and edges represent dependences between statements. As is standard for dependence graphs [HRB88, RHSR94], edges are drawn in the direction opposite of the dependences, so thin slicing requires computing backwards reachability. In Figure 6.2, the solid edges indicate the producer flow dependences, while the dashed edges indicate ignored base pointer flow dependences. The dotted edge is a control dependence, to be discussed shortly. The seed v = z.f is only reachable from w.f = y and y = new B() via solid edges, and these statements are the producers for the seed, as expected.

An interesting correspondence exists between producer flow dependences and edges in our pointer analysis graph representation with match edges. The producer flow dependences for a field write x.f = y are a subset of those field reads w = z.f such that a match edge from y to w exists in our initial pointer analysis graph (which includes all possible match edges). (There may not be a producer flow dependence for each possible match edge since a more precise alias analysis may be used to compute the flow dependences.) Ignoring base pointer flow dependences in thin slicing allows the user to view heap-based value flow without necessarily understanding why the corresponding base pointers are may-aliased. This abstraction corresponds exactly to how match edges allow our pointer analysis to selectively skip checking for aliasing of base pointers.

Note that thin slices also exclude control dependences, explainers of control flow. Intuitively, statement s is control dependent on conditional e if e can affect how many times s executes (Tip's survey [Tip95] has a more formal definition). Figure 6.2 has a dotted control dependence edge from conditional w == z to v = z.f, the statement in its if block in Figure 6.1. §6.3 describes our empirical observation that important control dependences are nearly always lexically close to thin slice statements, and hence can be discovered easily.
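To make the distinction concrete, the following sketch (ours, with invented names such as DepGraph and EdgeKind; it is not the dissertation's implementation) models the three edge kinds and computes a thin slice as a closure over producer flow edges only. Run on an encoding of Figure 6.2, it returns exactly the producers identified in §6.1.

    import java.util.*;

    /** A minimal dependence-graph sketch distinguishing the edge kinds of this
     *  section. Edges are stored from each statement to its dependences, so the
     *  backward slice is a simple traversal of this map. */
    class DepGraph {
        enum EdgeKind { PRODUCER_FLOW, BASE_POINTER_FLOW, CONTROL }

        private final Map<String, Map<String, EdgeKind>> deps = new HashMap<>();

        void addDep(String stmt, String dependsOn, EdgeKind kind) {
            deps.computeIfAbsent(stmt, k -> new HashMap<>()).put(dependsOn, kind);
        }

        /** Thin slice: transitive closure over producer flow dependences only. */
        Set<String> thinSlice(String seed) {
            Set<String> slice = new LinkedHashSet<>();
            Deque<String> work = new ArrayDeque<>(List.of(seed));
            while (!work.isEmpty()) {
                String s = work.pop();
                if (!slice.add(s)) continue;
                deps.getOrDefault(s, Map.of()).forEach((t, kind) -> {
                    if (kind == EdgeKind.PRODUCER_FLOW) work.push(t);
                });
            }
            return slice;
        }

        public static void main(String[] args) {
            DepGraph g = new DepGraph();
            g.addDep("v = z.f", "w.f = y", EdgeKind.PRODUCER_FLOW);
            g.addDep("v = z.f", "z = x", EdgeKind.BASE_POINTER_FLOW);
            g.addDep("v = z.f", "w == z", EdgeKind.CONTROL);
            g.addDep("w.f = y", "y = new B()", EdgeKind.PRODUCER_FLOW);
            g.addDep("w.f = y", "w = x", EdgeKind.BASE_POINTER_FLOW);
            g.addDep("z = x", "x = new A()", EdgeKind.PRODUCER_FLOW);
            g.addDep("w = x", "x = new A()", EdgeKind.PRODUCER_FLOW);
            // prints [v = z.f, w.f = y, y = new B()]: lines 7, 5, and 3
            System.out.println(g.thinSlice("v = z.f"));
        }
    }

A traditional slice would follow all three edge kinds, which for Figure 6.2 pulls in the entire program.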

6.3 Expanding Thin Slices

Here, we discuss in more detail how thin slices can be refined, or expanded, to show explainer statements, as discussed in §6.1. To review, explainer statements can answer questions of the following form about a thin slice T:

1. Given statements x := y.f and w.f := z in T such that w and y are aliased (causing value flow from z to x), what statements cause the aliasing?

2. Under what conditions can some statement s in T execute?

A thin slicing tool answers these questions when requested by the user. §6.3.1 discusses a technique for explaining aliasing using two additional thin slices. §6.3.2 discusses how relevant control dependences are usually “close” to thin slice statements, making their discovery relatively straightforward.

Example We use the example in Figure 6.3, a simple program fragment manipulating a file, to illustrate thin slice expansion. The example displays only a small part of the File implementation, the tracking of whether the file is open using a boolean field. The readFromFile() function throws an exception if the file passed to it is not open. Finally, the main() method creates a file, erroneously closes it, and then passes it to readFromFile(), causing the exception. The File object is read from a Vector before being passed to close() and readFromFile(), complicating discovery of the bug.

6.3.1 Question 1: Explaining Aliasing

When a thin slice includes statements that copy a value through the heap, sometimes the user needs to understand why those statements access the same heap location. For the example of Figure 6.3, suppose that the user asks for a thin slice from line 10 to determine why line 11 threw an exception. The computed thin slice will be {3, 4, 5, 9, 10} (highlighted with underlines), the only statements that can produce the boolean open value. Clearly, these statements fail to diagnose the bug completely: the user still does not know which File is passed to close() before being passed to isOpen(). To diagnose this bug, the user must determine which statements cause the ‘this’ pointers of close() and isOpen() to be aliased.


1  class File {
2    boolean open;
3    File() { ...; this.open = true; }
4    isOpen() { return this.open; }
5    close() { ...; this.open = false; }
6    ...
7  }
8  readFromFile(File f) {
9    boolean open = f.isOpen();
10   if (!open)
11     throw new ClosedException();
12   }
     ...
13 }
14 main() {
15   File f = new File();
16   Vector files = new Vector();
17   files.add(f);
18   ...;
19   File g = (File)files.get(i);
20   g.close();
21   ...;
22   File h = (File)files.get(i);
23   readFromFile(h);
24 }

Figure 6.3: An example for showing expansion of thin slices, similar to an example we saw in our evaluation. The bug is an exception thrown at line 11, and understanding the bug requires an explanation of aliasing (§6.3.1) and following a control dependence (§6.3.2). We use single underlines to highlight relevant expressions in the initial thin slice, and double underlines for expressions in explainer statements for aliasing.

We can expand thin slices to explain aliasing by computing additional thin slices for the base pointers in question. Given aliased base pointers x and y, we compute thin slices seeded with the statements defining x and y (unique assuming SSA form). These thin slices will show why some common object o can flow to both x and y, causing them to be aliased. For Figure 6.3, the common object for the ‘this’ parameters of close() and isOpen() is the File allocated at line 15. Double underlines in Figure 6.3 indicate the statements added to explain the flow of the File (the Vector class is elided for clarity). Note line 16 is still omitted, as it does not touch the File object. Given these thin slices, the user sees that line 20 closes the File, and that the bug could be fixed by either not closing the file or by removing it from the Vector.

Note that this expansion technique for explaining aliasing closely matches our technique for refining match edges in our points-to analysis. Expanding a thin slice to explain a particular alias has an effect similar to removing the match edge between the corresponding field access edges in our points-to analysis graph representation. Removing this match edge forces our points-to analysis to see if the corresponding base pointers are may-aliased, but other match edges are still used to skip the alias checking for other field accesses. Similarly, recursive thin slices are used to explain may-aliasing to the user in thin slice expansion, thereby avoiding expansion for other related field accesses.

Explaining aliasing using additional thin slices yields an intuitive hierarchical structure to heap-based flow, making it more understandable for the user. Suppose that statements x := y.f and w.f := z appear in a thin slice. Expanding the thin slice to show flow into x and w adds one more level of data dependences to the slice. If during expansion, statements a := b.g and c.g := d are added, the aliasing of b and c could be explained with another level of data dependences, and so on. If Figure 6.3 were changed so that the flow of the Vector to the add() and get() in main() was complex (e.g., it got stored in a data structure), another level of thin slices would explain that flow. The ability to show these different levels of aliasing in a structured manner relies on the fact that only field reads and writes can dereference pointers in Java; in C, which allows for creating pointers to pointers and taking addresses of variables, explanations of why two statements access the same memory location may not be so simple.


Array accesses can require explainer statements beyond those showing the aliasing of the array pointers. Say that we have statements a = b[i] and c[j] = d in the thin slice, such that there is value flow from d to a. In trying to understand this heap-based flow, the user may wonder both (1) how b and c can be aliased (the same question as with field accesses), and additionally (2) how the array indices i and j can have the same value. The latter question can be answered through thin slices on each of the array index expressions (with any necessary expansion).

Two additional technical points about explaining aliasing merit mention. First, the thin slices explaining aliasing should be restricted to only show the flow of objects that can flow to both base pointers, filtering statements showing flow of an object to just one of them. This filtering eliminates some statements irrelevant to explaining the aliasing. Second, context sensitivity may be necessary to focus the aliasing explanations in some cases. For example, if the code of Figure 6.3 were part of a large program where many File objects were used, the user would likely want to ask about aliasing of ‘this’ in isOpen() for the particular call at line 9, rather than for all calls.

We encountered one case in which an explanation of aliasing was necessary in our experiments, and we believe that similar situations arise often in practice. In our programming experience, we have found that when such bugs arise, they can be tricky to debug, as values can be mutated in unexpected places. Analyses that find typestate bugs [DLS02, FYD+06], e.g., reading from a file after closing it, could benefit from using thin slices to explain bugs that involve aliasing. Such tools sometimes hide error reports that involve aliasing, since there is no mechanism in the tool for explaining the aliasing succinctly [MSA+04].
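The expansion step of §6.3.1 can be sketched as follows; Stmt, Slicer, and their methods are invented for illustration, and movesCommonObject() stands in for the object-flow filtering described above:

    import java.util.HashSet;
    import java.util.Set;

    /** Sketch of expanding a thin slice to explain one heap-based flow: two
     *  more thin slices, seeded at the definitions of the two base pointers,
     *  filtered to statements moving an object that can reach both pointers. */
    class AliasExplainer {
        interface Stmt { String baseVar(); }   // base pointer of a field access
        interface Slicer {
            Set<Stmt> thinSlice(Stmt seed);
            Stmt ssaDef(String var);           // unique definition under SSA
            boolean movesCommonObject(Stmt s, String p, String q);
        }

        /** Explain why the bases of read x := y.f and write w.f := z may alias. */
        static Set<Stmt> explainAliasing(Slicer slicer, Stmt read, Stmt write) {
            String y = read.baseVar(), w = write.baseVar();
            Set<Stmt> explainers = new HashSet<>();
            explainers.addAll(slicer.thinSlice(slicer.ssaDef(y)));
            explainers.addAll(slicer.thinSlice(slicer.ssaDef(w)));
            // filtering step from above: drop statements whose object can flow
            // to only one of the two base pointers
            explainers.removeIf(s -> !slicer.movesCommonObject(s, y, w));
            return explainers;
        }
    }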

6.3.2 Question 2: Control Dependence

In our experience, when a debugging or program understanding task requires viewing control dependences, the control-relevant statements usually lie lexically close to some statement in the thin slice. In Figure 6.3, the bug manifests at line 11, which throws the exception. As no value flows into the throw statement, a thin slice from the throw statement will not aid debugging. However, code inspection immediately shows that the condition of the if statement at line 10 is relevant to the bug, as it directly controls whether the exception is thrown. With this information, the obvious next step is to thin slice from line 10 to learn more about the bug, as described in §6.3.1.

While this example may seem contrived, our experiments show that Figure 6.3 reflects the common case. For nearly all tasks in our evaluation, at most one or two control dependences were relevant, and they all lay syntactically close to statements in the thin slice. We also found that the vast majority of control dependences are unnecessary for understanding the seed behavior. Hence, we believe that in practice, simply showing the thin slice statements in the source code suffices for identifying any relevant control dependences; the user can take additional thin slices from relevant conditionals to understand their behavior. Additional tool support may be useful for indicating non-obvious control dependences, e.g., due to exceptions.

6.4 Computing Thin Slices

Computing a thin slice entails computing a statement’s transitive flow dependences, ignoring uses of base pointers (as discussed in §6.2). As in previous work on slicing [HRB88, Rep98], we compute thin slices using variants of graph reachability. Here, we first describe the basics of constructing our graph representation, a subset of system dependence graphs [HRB88] (§6.4.1). Then, we briefly present two simple algorithms to compute thin slices, one context insensitive (§6.4.2) and one context sensitive (§6.4.3).


6.4.1 Graph Construction

In both thin slice algorithms, we first compute a subset of the system dependence graph (SDG) program representation of Horwitz et al. [HRB88]. Previous work [LH96, AH03] has described how to compute SDGs for Java-like languages, and we mostly re-use those techniques (slight differences are discussed in §7.2). Our implementation handles the full Java Virtual Machine bytecode language, excluding concurrency. Our representation differs in that we (1) exclude control dependence edges and (2) handle heap-based flow dependences differently, depending on the thin slicing algorithm (details in §6.4.2 and §6.4.3).

SDG construction relies on the results of a points-to analysis. We use the points-to analysis to compute a call graph for the program, necessary for tracking interprocedural dependences. We also use the points-to analysis to determine which heap locations can be defined (used) by field writes (reads), in order to track heap-based value flow. §6.5 shows that precise points-to analysis is key for effective thin slicing of Java programs.

Our representation of data dependences for local variables and method parameters is straightforward. At a high level, we represent such dependences as follows:

1. For a statement x = e, where x is a local, we add edges to all statements using x, excluding uses in pointer dereferences of the form x.f. We operate on an SSA representation, so these edges are added flow sensitively.

2. For an actual parameter node for a call to method m(), we query the call graph to find the possible call targets m1, . . . , mk. Then, for each mi, we add an edge from the actual parameter node to the corresponding formal parameter node. Return values are handled similarly (see the sketch after this list).
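The following sketch illustrates these two rules over an assumed SSA-based IR; all of the types (Stmt, Method, CallSite, CallGraph, DepGraphBuilder) are invented for illustration and do not correspond to WALA's interfaces:

    /** Sketch of adding the scalar dependence edges described above. */
    class ScalarEdges {
        interface Stmt {
            Iterable<Stmt> ssaUses();                 // SSA def-use chains
            String definedVar();
            boolean usesOnlyAsBasePointer(String v);  // use is x in x.f
        }
        interface CallSite {}
        interface Method { Iterable<Stmt> statements(); Iterable<CallSite> callSites(); }
        interface CallGraph { Iterable<Method> possibleTargets(CallSite c); }
        interface DepGraphBuilder {
            void addEdge(Stmt from, Stmt to);
            void addParameterEdges(CallSite call, Method target); // actual -> formal
            void addReturnEdge(Method target, CallSite call);     // return -> result
        }

        static void addScalarEdges(Method m, DepGraphBuilder g, CallGraph cg) {
            for (Stmt def : m.statements())
                // rule 1: def-use edges, skipping base-pointer uses like x in x.f
                for (Stmt use : def.ssaUses())
                    if (!use.usesOnlyAsBasePointer(def.definedVar()))
                        g.addEdge(def, use);
            for (CallSite call : m.callSites())
                // rule 2: match actuals to formals for every possible callee
                for (Method target : cg.possibleTargets(call)) {
                    g.addParameterEdges(call, target);
                    g.addReturnEdge(target, call);
                }
        }
    }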

Our thin slicing algorithms differ from the standard SDG handling of data dependence, and from each other, in their treatment of definitions of heap locations (i.e., statements of the form x.f := e), as described below.

6.4.2 Context-Insensitive Thin Slicing

Our first algorithm computes traditional (context-insensitive) graph reachability on our SDG variant to compute thin slices. In this approach, we represent data dependences for heap access statements as follows:

• For a statement x.f := e, we add an edge to each statement with an expression w.f on its right-hand side, such that the points-to analysis indicates x may-alias w.

Note that we add direct edges to statements in other procedures. In contrast, the traditional SDG only includes interprocedural edges for parameter passing and return values. The advantage of this approach is that we need not model heap accesses using additional parameters and return values, as is done with traditional slicing [HRB88]. In practice, not using heap parameters dramatically increases scalability without significant loss in precision (discussed further in §6.4.3 and §6.5). Having computed the graph, a simple transitive closure gives the thin slice for a particular seed. It is straightforward to construct the graph and do the traversal in a demand-driven fashion. A potential disadvantage of this approach is that it may return unrealizable paths [RHS95] due to lack of context sensitivity (§6.5 shows this issue is not significant in practice).
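Concretely, the demand-driven traversal can be pictured with the sketch below (our code, not WALA's API; Stmt, PointsTo, and Program are assumed interfaces):

    import java.util.*;

    class ContextInsensitiveThinSlicer {
        interface Stmt {
            Collection<Stmt> localProducerDeps(); // SSA def-use, params, returns
            boolean isFieldRead();                // statement w = z.f
            String field();
            String baseVar();
        }
        interface PointsTo { boolean mayAlias(String p, String q); }
        interface Program { Collection<Stmt> writesOfField(String f); }

        /** Context-insensitive thin slice: backward reachability from the seed,
         *  with heap dependence edges discovered on demand from may-alias facts. */
        static Set<Stmt> thinSlice(Stmt seed, PointsTo pta, Program prog) {
            Set<Stmt> slice = new LinkedHashSet<>();
            Deque<Stmt> work = new ArrayDeque<>(List.of(seed));
            while (!work.isEmpty()) {
                Stmt s = work.pop();
                if (!slice.add(s)) continue;
                work.addAll(s.localProducerDeps());
                if (s.isFieldRead())
                    // direct interprocedural edges to all writes x.f := e whose
                    // base pointer may alias this read's base pointer
                    for (Stmt wr : prog.writesOfField(s.field()))
                        if (pta.mayAlias(wr.baseVar(), s.baseVar()))
                            work.push(wr);
            }
            return slice;
        }
    }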

6.4.3 Context-Sensitive Thin Slicing

The context-sensitive thin slicing algorithm uses an SDG variant closer to that used in traditional slicing, created compositionally from program dependence graphs (PDGs) for each procedure. Intraprocedurally, this approach handles heap accesses as follows:


• For a statement x.f := e, we add an edge to each statement with an expression w.f on its right-hand side in the same procedure, such that the points-to analysis indicates x may-alias w.

We handle interprocedural heap flow in the same way as the standard SDG, with heap reads and writes modeled as extra parameters and return values to each procedure [AG98, HRB88]. Our implementation introduces such parameters using the same heap partitions used by the preliminary pointer analysis. Discovering the appropriate set of parameters for each procedure requires an interprocedural mod-ref analysis [RLS+01], computed using the result of the points-to analysis. Having built the graph, we compute context-sensitive reachability as a partially balanced parentheses problem [Rep98]. Our implementation relies on a backwards, demand-driven tabulation algorithm [RHS95].

In our experience, constructing an SDG using heap parameters can be very expensive for large programs. Furthermore, we found that for realistic usage patterns, context sensitivity did not provide much benefit for thin slicing. See §6.5 for details.
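As a rough intuition for the partially balanced traversal, here is a much-simplified sketch of ours (the actual implementation uses the tabulation algorithm [RHS95]; Stmt, DepEdge, and CallSite are invented names). Going backward, a step out of a callee through a parameter edge must match the call site through which the traversal previously entered that callee via a return edge; an empty stack leaves the caller unconstrained:

    import java.util.ArrayDeque;
    import java.util.Deque;

    class ContextSensitiveStep {
        interface CallSite {}
        interface Stmt { Iterable<DepEdge> backwardEdges(); }
        interface DepEdge {
            Stmt target();
            boolean isParamEdge();   // formal <- actual: leaves the callee
            boolean isReturnEdge();  // result <- return: enters a callee
            CallSite callSite();
        }
        record State(Stmt stmt, Deque<CallSite> pending) {}

        /** One backward step, discarding unrealizable (mismatched) paths. */
        static void expand(State st, Deque<State> work) {
            for (DepEdge e : st.stmt().backwardEdges()) {
                Deque<CallSite> pending = new ArrayDeque<>(st.pending());
                if (e.isReturnEdge()) {
                    pending.push(e.callSite());   // remember how we entered
                } else if (e.isParamEdge() && !pending.isEmpty()) {
                    if (!pending.peek().equals(e.callSite()))
                        continue;                 // unbalanced: skip this edge
                    pending.pop();                // matched call/return pair
                }                                 // empty stack: any caller is ok
                work.push(new State(e.target(), pending));
            }
        }
    }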

6.5 Evaluation

We now present an empirical evaluation of thin slicing for debugging and program understanding tasks. Our experiments validate four hypotheses:

• Thin slices lead the user to desired statements. For the tasks we considered, thin slices often contain the desired statements (e.g., the buggy statement for a debugging task). When statements explaining pointer aliasing or control flow were relevant, the statements were always lexically close to statements in the thin slice. Subjectively, we also found a thin slicer very useful for understanding one set of benchmarks.


• Thin slices focus better on desired statements than traditional slices. We compared context-insensitive thin slicing to context-insensitive traditional slicing (the context-sensitive configurations did not scale) with identical handling of control dependences and a breadth-first strategy for inspecting statements, simulating real-world use of a program understanding tool. The experiments showed that finding desired statements in a traditional slice required inspecting 3.3 times more statements than a thin slice for the debugging tasks, and 9.4 times more statements for the program understanding tasks.

• A precise pointer analysis is key to effective thin slicing. We used a pointer analysis with object-sensitive handling [MRR05] of key collections classes for the thin slicer. With a less precise pointer analysis, up to 17.2X more statements required inspection in thin slices to find desired statements.

• Thin slices can be computed efficiently. Our context-insensitive thin slicing algorithm scaled well to large programs, with the cost of computing thin slices insignificant compared to the prerequisite call graph construction and pointer analysis. We were unable to scale a context-sensitive traditional slicer [HRB88] to our larger benchmarks.

The remainder of this section proceeds as follows. §6.5.1 describes our experimental configuration and methodology. §6.5.2 presents details of the debugging experiment, which studied the effectiveness of thin and traditional slicing for locating injected bugs in a standard suite for evaluating debugging tools [DER05]. Finally, §6.5.3 describes the program understanding experiment, which tests the effectiveness of thin and traditional slicing for discovering why certain downcasts cannot fail in the SPECjvm98 benchmark suite.


Program    Methods   Bytecode Size (KB)   Call Graph Nodes   SDG Statements

Software-Artifact Infrastructure Repository
nanoxml        541            35                  817              22205
jtopas         337            24                  397              23766
ant          11147           632                20164             584155
xmlsec       11192           678                17075             525886

SPECjvm98
mtrt           470            32                  514              19699
jess          1061            67                 1466              46037
javac         1610           118                 2127              71041
jack           592            55                 1088              38114

Table 6.1: Benchmark characteristics, derived from methods discovered during on-the-fly call graph construction, including Java library methods. The number of call graph nodes exceeds the number of distinct methods due to limited cloning-based context-sensitivity in the points-to analysis. SDG Statements reports the number of scalar statements, but excludes parameter passing statements introduced to model the heap.

6.5.1 Configuration and Methodology

We implemented the thin and traditional data slicers using the IBM T.J. Watson Libraries for Analysis (WALA) [WAL]. We utilized call graph construction and pointer analysis algorithms provided by WALA, along with its tabulation solver for context-sensitive analysis [RHS95]. We analyzed our benchmarks with the Sun JDK 1.4.2_09 standard library code, for which WALA provides models of important native methods. WALA uses heuristics to analyze the most common uses of reflection in Java, but in general reflection and native methods may still cause some unsoundness, as is typical in Java static analysis implementations. All experiments were performed on a Lenovo ThinkPad t60p with dual 2.2GHz Intel T2600 processors and 2GB RAM. The analyzer ran on the Sun JDK 1.5_07 using at most 1GB of heap space.

Table 6.1 provides information about the programs used in our experiments. For pointer analysis and call graph construction, we used a variant of Andersen’s analysis with on-the-fly call graph construction [And94, RMR01], with fully object-sensitive cloning [MRR05] for objects of key collections classes, as described in [FYD+06] (the importance of this precision is discussed later in the section). We excluded from the call graphs a few large standard libraries (e.g., javax.swing, java.awt) which we deemed a priori uninteresting for the tasks at hand, since none of our tested tasks involved those libraries. For all experiments reported, call graph construction and pointer analysis ran in under 5 minutes.

Note that the points-to analysis information needed for thin slicing could be naturally computed on-demand along with thin slices. We outlined the connections between thin slicing and our refinement-based points-to analysis in §6.2 and §6.3. In a realistic IDE setting, pre-computing an exhaustive pointer analysis would probably not be suitable from a performance standpoint, making demand-driven pointer analysis critical for usability. At the time of this evaluation, our refinement-based points-to analysis was not implemented in WALA, and hence we decided to base our experiments on existing WALA points-to analyses. We plan to have a version of thin slicing based on refinement-based points-to analysis implemented in the near future.

Scalability For the dependence graph traversal, we considered both the context-insensitive (flat graph reachability) and context-sensitive (tabulation) algorithms presented in §6.4. In all cases, the time and space to compute the thin slice or traditional slice with the context-insensitive algorithm was insignificant compared to the preliminary pointer analysis. Context-insensitive thin slicing took under 6 seconds for all tests except ant, which took 47 seconds since a large number of interprocedural heap dependence edges had to be added. These low running times are not surprising, as context-insensitive slicing (thin or traditional) reduces to simple graph reachability on a demand-driven construction of the SDG program representation.

Our implementation of context-sensitive traditional slicing [RHSR94] scales to handle most experiments on the smaller test cases (nanoxml, jtopas, mtrt, jack). For the larger codes, our implementation could not complete in reasonable time and/or space. We believe our implementation is fairly well-tuned, as the analysis engine (based on tabulation [Rep98]) has evolved over several years and been used in several studies reporting scalable interprocedural dataflow analyses (e.g., [FYD+06]). For slicing, the key bottleneck comes from handling of the heap; as programs grow larger, the number of SDG statements introduced to model heap parameter-passing quickly explodes, dramatically increasing space and time requirements. For our larger benchmarks, the full SDG grew to over 10 million nodes before exhausting available memory; we suspect the number of nodes would grow much larger given adequate space. Note that heap parameters are also a scalability bottleneck in a commercial slicing tool [Tei].

In all results reported, we compare results from the context-insensitive thin slicer to a context-insensitive traditional slicer, which scaled to all benchmarks. This provides an apples-to-apples comparison, as all experimental parameters match exactly for the two algorithms; the only difference was how each handled data dependences.

Measuring Slice Size Nearly all existing work measures the precision of a slice by its full size. However, in practice, once a user of a program understanding tool has discovered all of the desired statements for her original problem (e.g., those causing some bug), she will not inspect the rest of the slice. Our experiments aim to simulate this realistic usage pattern. For each task, we identify both a seed statement for the slice and a set of desired statements, i.e., those statements whose discovery suffices for completing the task. For example, for a debugging task, the seed is the point of failure, and the desired statement is the cause of the bug. We then aim to measure how many statements in the slice the user must inspect to discover the desired statements.

We use a breadth-first traversal strategy to simulate the order in which statements are inspected by the user, as in the work of Renieris and Reiss [RR03]. Intuitively, statements “closer” to the seed seem more likely to be relevant to its behavior. Hence, we assume the user gradually explores statements of increasing distance (defined by the dependence graph of the technique) from the seed until the desired statements are found; a breadth-first search of the dependence graph simulates this strategy. Note that CodeSurfer [Cod], perhaps the most widely-used slicing tool, supports such dependence-graph browsing for viewing slices. The BFS evaluation metric has also been used in other recent work [RR03, ZGG06b, ZJL+06]. For thin and traditional slicing, our tables report the number of statements inspected using this breadth-first inspection strategy.

To our knowledge, ours is the first work to compare static slicing algorithms using a measure intended to simulate the usage of a realistic tool, rather than just comparing the full slice sizes. We note that the two measures produce qualitatively different results. For example, in one of our smaller test cases, nanoxml-1, context sensitivity reduces the traditional slice size from 8067 statements to 381 statements, but the number of statements explored in the traversal decreases only from 32 to 26. We observed similar results for thin slices. Given these results, the context-sensitive algorithm of §6.4.3 does not seem beneficial for thin slicing as likely used in practice.

Control Dependence As discussed in §6.3.2, relevant control dependences were observed to be always lexically close to statements in the thin slice, as in the example of Figure 6.3. Furthermore, most control dependences were not useful for the tested tasks, and it is not obvious how to automatically expose important control dependences. Hence, we manually pre-determined the important control dependences for our tasks, and counted only those control dependences as inspected for both the thin and traditional slicers. This handling of control dependences allowed us to focus on the effectiveness of thin slicing’s handling of data dependences compared with a traditional slicer’s.
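The breadth-first inspection metric itself can be sketched as follows (our illustration; Stmt and DepGraph are assumed interfaces exposing only backward neighbors in the dependence graph):

    import java.util.*;

    class InspectionMetric {
        interface Stmt {}
        interface DepGraph { Collection<Stmt> backwardNeighbors(Stmt s); }

        /** Number of statements a user exploring the dependence graph breadth-
         *  first from the seed would read before seeing every desired statement. */
        static int inspected(Stmt seed, Set<Stmt> desired, DepGraph g) {
            Set<Stmt> enqueued = new LinkedHashSet<>(List.of(seed));
            Queue<Stmt> frontier = new ArrayDeque<>(List.of(seed));
            Set<Stmt> remaining = new HashSet<>(desired);
            int count = 0;
            while (!frontier.isEmpty() && !remaining.isEmpty()) {
                Stmt s = frontier.poll();
                count++;                      // the user inspects this statement
                remaining.remove(s);
                for (Stmt t : g.backwardNeighbors(s))
                    if (enqueued.add(t)) frontier.add(t);
            }
            return count;
        }
    }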


Threats to Validity One threat to the validity of our results is that our study of debugging tasks (§6.5.2) uses injected bugs from the SIR suite [DER05], which may not accurately reflect the characteristics of real bugs. Several techniques were used to make the injected bugs in the SIR suite realistic, described in detail in [DER05]. The bugs were of a wide variety: they could alter both the control and data flow of the program, and the resulting failures ranged from program crashes to incorrect output. Nevertheless, we intend to do a future study with real bugs to confirm that thin slicing still provides a significant benefit.

Our use of breadth-first search on the dependence graph to simulate programmer exploration of the slice may not accurately reflect how developers would use a slicing tool. If most developers are able to very quickly prune statements in a traditional slice irrelevant to their tasks, then the BFS metric would overstate the advantage of thin slicing. In the future, we aim to do a user study to obtain more definitive answers on the productivity benefits of thin slicing.

Finally, our use of whole-program pointer analysis and call graph construction for the thin slicer may not scale to larger benchmarks. These analyses also may not be suitable for use inside a development environment, as code edits could require expensive re-computation of the pointer analysis results. We plan to employ demand-driven, refinement-based pointer analysis [SB06] in the next version of the thin slicer to overcome these drawbacks.

6.5.2 Experiment: Locating Bugs

Our first experiment tested locating several bugs: (1) to see if thin slices include the buggy statement when slicing from the seed, and (2) to compare the number of inspected statements for thin and traditional slices. We investigated several injected bugs in the Java programs in the Software-Artifact Infrastructure Repository (SIR) [DER05]. SIR provides both several injected bugs for each program and test suites that can be used to expose the bugs. For each injected bug, we ran the corresponding test suite to discover a failure. Then, we ran both thin and traditional slicing from the failure point, measuring how many statements had to be inspected to find the bug (as described in §6.5.1).

Three points should be noted about the SIR programs and injected bugs. First, we were unable to include two SIR programs in these experiments, jmeter and siena. We could not determine the appropriate library dependences to build jmeter, and in our runs, no test cases exposed the injected bugs in siena. Also, the suite provides several versions of each benchmark; we chose bugs from the most recent versions. Finally, some of the injected bugs represent bugs of omission, i.e., bugs that deleted necessary code. If the omission bug removed an assignment to a local or a conditional branch, we chose as the desired target statements the immediate data or control dependent successor statements, respectively. We excluded bugs that deleted field writes, as there was no obvious relationship between the deleted write and the surrounding code in the method.

Table 6.2 presents results for our debugging experiment. Several of the injected buggy statements were quite close to the failure points of the programs, and hence both the traditional and thin slicers found the bugs very quickly. For example, with jtopas-1, the buggy statement itself fails with a NullPointerException. These sorts of bugs can be easily debugged without tool support, but we include them for completeness.

Using the traditional slicer required inspecting 1 to 4.52 times more statements than thin slicing to find the bug. The total number of inspected statements for traditional slicing was 3.3 times higher than with thin slicing, a measure of the total inspection effort saved. The injected bugs in nanoxml in particular often required tracing a value as it is inserted and later retrieved from one or two Vectors, as in the example of Figure 1.3.


Bug              # Thin   # Trad.   Ratio   # Control   # ThinCIPA   # TradCIPA
nanoxml-1            12        32    2.67           0           12           32
nanoxml-2            25       113    4.52           0          431         1675
nanoxml-3            29       123    4.24           0          472         1883
nanoxml-4            12        33    2.75           1           17           44
nanoxml-5            35       156    4.46           1          159           45
nanoxml-6            12        52    4.33           0           35           90
jtopas-1              1         1       1           0            1            1
jtopas-2              2         2       1           1            2            2
ant-1                 2         2       1           1            2            2
ant-2                 4         5    1.25           0            4            5
ant-3                34        55    1.62          15          251          501
ant-4                 3         3       1           2            3            3
xml-security-1        2         2       1           1            2            2

Table 6.2: Evaluation of thin slicing for debugging. For each bug, we show the number of statements that must be inspected in the thin slice (the “Thin” column) and the traditional slice (the “Trad” column) to discover the bug using BFS traversal (see §6.5.1). We also give the ratio of traditional statements to thin slice statements, and the number of control dependences that must be exposed to find the bug; the numbers for thin and traditional slices include these control dependences. Finally, we give the number of inspected statements for thin and traditional slicing when context-insensitive points-to analysis is used, without object-sensitive handling for containers (the “ThinCIPA” and “TradCIPA” columns). Slicing of any kind was not useful for five bugs in xml-security and one bug in ant; these bugs do not appear in the table.

Tracing this flow by hand can be difficult and time-consuming, and hence we think that thin slicing can have the greatest impact for this type of bug.

Debugging nanoxml-5 required exposing statements causing aliasing (see §6.3.1), for reasons similar to those of the example in Figure 6.3. To simulate this user action, we ran the thin slicer in a configuration that included statements explaining one level of indirect aliasing. The results show that exposing such statements in this controlled manner is useful, as we still inspected significantly fewer statements than the traditional slice.

Few control dependences were relevant for these debugging tests, validating our decision to ignore control dependence in thin slices. For all but one bug, the number of control dependences that needed to be followed was two or fewer. These control dependences were always obvious from code surrounding the thin slice (as discussed in §6.3.2).

The high number of control dependences for ant-3 arises because the buggy function has 12 return statements, and one of them is directly control dependent on the bug; we included one control dependence for each return, as it is not obvious which one caused the bug. Nevertheless, all the control dependences were still near statements in the thin slice.

The precision of our preliminary points-to analysis was key to the effectiveness of the thin slicer. The “ThinCIPA” and “TradCIPA” columns in Table 6.2 show our results to be considerably worse with a points-to analysis that does not treat container classes like Vector object sensitively. In cases involving such data structures, the number of statements inspected with the thin slice increased by up to a factor of 17.2X with the less precise analysis, likely making the thin slicing tool unusable.

Finally, for five bugs in xml-security and one bug in ant, no type of slicing could help the user find the bug, and hence they do not appear in the table. The xml-security bugs all followed the same pattern:

long hash = computeHash(input); // buggy
assert hash == expectedHash;    // fails

In xml-security, the computeHash() equivalent is complex, spanning several .class files, and the injected bugs were buried in the algorithm internals. In such cases, slicing from this assertion failure (whether static or dynamic) will inevitably bring in most or all of the code that computes the hash function. This example illustrates that slicing of course is not a panacea; delta debugging [Zel02] or refactoring to test at a finer granularity may help in these situations. We find it encouraging that thin slicing was useful for 13 out of the 19 inspected bugs.


1  class Node {
2    final int op;
3    static int ADD_NODE_OP = 1;
4    Node(int op) { this.op = op; }
5  }
6  class AddNode extends Node {
7    AddNode(...) {
8      super(ADD_NODE_OP); ...
9    }
10 }
11 void simplify(Node n) {
12   int op = n.op;
13   switch (op) {
14     case ADD_NODE_OP:
15       AddNode add = (AddNode) n;
16       ...
17   }
18 }

Figure 6.4: An example illustrating a tough cast. Expressions in the thin slice used to understand the safety of the cast are underlined.

In summary, we found that for these injected bugs, thin slices very often contain the buggy statements, and the bugs could be found more quickly with a thin slicer than a traditional slicer. Also, 11.5 statements on average required inspection with the thin slicer (ranging from 1 to 35), quite a manageable number; the average for the traditional slicer was significantly larger at 54.8 statements, ranging from 1 to 156.

6.5.3 Experiment: Understanding Tough Casts

Our second experiment involved using slicing to hand-validate the safety of tough casts in the SPECjvm98 benchmarks. A tough cast is a downcast in a program that cannot be verified by precise and scalable pointer analysis (we used the same pointer analysis used to construct our call graph). For example, the cast at line 15 in Figure 6.4, adapted from the javac benchmark, is a tough cast. This cast cannot fail because the value of the op field of AddNode objects is ADD_NODE_OP, as guaranteed by line 8, and no other subclasses of Node (not shown) have ADD_NODE_OP in their op field. Typically, tough casts are those that (1) are not used to cast values retrieved from a container (due to lack of generics) and (2) are not dominated by an explicit instanceof check ensuring their safety.

Tough casts present a good test of the efficacy of thin slicing in aiding program understanding. The safety of tough casts is often due to some global invariant of a program. These invariants are often (in our experience) undocumented, and discovering the invariants can aid the programmer in understanding the overall structure and behavior of the program. Furthermore, discovering these invariants by hand can be difficult, as it often requires tracing value flow through several disparate parts of the program. Hence, easing the understanding of tough casts with tool support aids overall program understanding and additionally can be useful for refactoring or adding parametrized types or annotations.

Our experimental configuration involved first manually identifying those statements that showed each tough cast could not fail (the desired statements of §6.5.1) with the help of the thin slicer, and then comparing the BFS traversal sizes of the thin and traditional slices from the cast to these desired statements. In the example of Figure 6.4, we can understand the tough cast through thin slicing by following a control dependence from the cast, and then computing a thin slice for line 12 to see what value op gets for different subclasses of Node. For each SPEC benchmark, we investigated 10 tough casts at random, or all tough casts if there were fewer than 10.

Note that the compress and db benchmarks had no tough casts, and mpegaudio was excluded since its bytecodes are obfuscated, making understanding its casts difficult. Also, we failed to determine the reason for cast safety for 6 casts in javac and one cast in jess. In these cases, the safety of the cast relies on some subtle invariant that is not easy to determine for one unfamiliar with the code.

The thin slicer significantly eased the manual process of determining the desired statements for each tough cast.


Cast      # Thin   # Trad.   Ratio   # Control   # ThinCIPA   # TradCIPA
mtrt-1        22        51    2.32           0           22           51
mtrt-2        23        52    2.26           0           23           52
jess-1         6         7    1.17           2            6            7
jess-2        13        39       3           0           25           93
jess-3         6         6       1           2            6            6
jess-4         6         7    1.17           2            6            7
jess-5         6         7    1.17           2            6            7
jess-6         6         6       1           2            6            6
javac-1       57       910      16           1           57          910
javac-2       43       853    19.8           1           43          853
javac-3       65      2224    34.2           1           65         2267
javac-4       45       855      19           1           45          855
jack-1        18        79    4.39           0          303          758
jack-2        57       151    2.65           0          339          647
jack-3        18        69    3.83           0          304          603
jack-4        18        79    4.39           0          304          759
jack-5        57       151    2.65           0          339          647
jack-6        35       132    3.77           0          338          802
jack-7        35       132    3.77           0          338          802
jack-8        35       132    3.77           0          338          802
jack-9        30        79    2.63           0          304          759
jack-10       57       151    2.65           0          339          647

Table 6.3: Evaluation of thin slicing for understanding tough casts. The types of data in the table columns are described in the caption of Table 6.2.

Although the code was unfamiliar to us, our thin slicing tool guided us through heap-based value flow, saving a great deal of time. The thin slicer was especially helpful when source code was not available, e.g., for the jack benchmark, as we had to study a compiler representation of the bytecodes and could not use standard IDE-based source navigation tools.

Results for the tough casts experiment appear in Table 6.3. Thin slicing helped understand tough casts more effectively than traditional slicing: the number of statements examined using a traditional slice exceeded the number examined using a thin slice by factors of 1.17 to 34.2. In total, 9.4 times more statements were examined with the traditional slicer than the thin slicer. In javac, the casts resembled Figure 6.4. The code includes a large number of Node subclasses used pervasively in the program, resulting in large numbers for the traditional slicer. The importance of object-sensitive container handling in the points-to analysis is seen for the jack casts, where the number of inspected statements increased by factors of 5.9-16.9X with less precise analysis.

The absolute numbers of inspected statements exceeded those for the debugging tests, but they remained manageable for a user. The thin slicer required inspecting an average of 29.3 statements (ranging from 6 to 65), while the traditional slicer required an average of 280 (ranging from 6 to 2224). For javac, many of the thin slice statements were writes of opcodes in a large number of constructors (as in Figure 6.4), which could be quickly inspected to ensure that a suitable constant is written. For jack, the BFS traversal over-estimated the number of thin slice statements that needed to be inspected; once we understood the benchmark, we could terminate the search early at some statements which we knew would not cause the cast to fail.

In summary, we conclude thin slicing can effectively provide tool support to identify statements that ensure tough casts cannot fail. A traversal based on thin slicing typically touches significantly fewer statements than a traversal based on traditional transitive flow dependence.

Chapter 7

Related Work

Here we discuss the previous work most closely related to our work on points-to analysis and thin slicing. First, we discuss previous work on pointer analysis in §7.1. Then, we discuss work related to thin slicing in §7.2.

7.1 Pointer Analysis Related Work

In §1.2.2, we gave a brief overview of the progression of points-to analysis research over the last 15 years. Here, we give a more detailed comparison of our points-to analysis with closely related work, specifically covering the following key areas:

• Context-insensitive points-to analysis

• Context-sensitive points-to analysis

• Refinement-based points-to analysis

• Demand-driven points-to analysis

• Incremental points-to analysis


• CFL-reachability

• Cast verification

We do not aim to comprehensively discuss all work on points-to analysis; see [O’C00, Hin01, GC01, Ryd03] for broader surveys and comparisons of various points-to analyses.

7.1.1 Context-insensitive points-to analysis

The key difference between the context-insensitive points-to analysis of Chapter 4 and most previous work is our demand-driven, refinement-based approach. Several exhaustive context-insensitive points-to analyses [And94, Ste96, FFSA98, Das00, SFA00, HT01b, RMR01, WL02, LH03, BLQ+03, ZC04] were discussed in §1.2.2. We showed in §4.4 that our analysis could nearly match the precision of Andersen’s analysis for tested clients with a budget of 2ms per query, yielding up to a 16X speed improvement over a state-of-the-art exhaustive algorithm [LH03].

Previous work has studied the trade-offs between field-based and field-sensitive points-to analysis for Java (see §2.2.2 for relevant definitions), as we do through refinement with match edges. Liang et al. experimented with a variety of points-to analysis algorithms for Java [LPH01]. They conclude that field-sensitivity yields little benefit over field-based analysis for the extra required effort. Our precision results in the demand-driven setting support this conclusion. The field-based program representation used in the work of Lhoták and Hendren [LH03] is similar to our graph with conservative match edges; instead of a match edge, they create a node for each field, and represent getfields and putfields as assignments from and to the field node. This representation works well for exhaustive propagation of points-to sets, but the match edge representation is more suitable for our refinement techniques.


Algorithm                          Eq/Sub   CS CG   CS Heap   Shown to Scale
Zhu / Whaley [WL04, ZC04]          Sub                        X
Whaley [WR99]                      Sub
Choi [CGS+99]                      Sub
Fähndrich [FRD00]                  Eq       X                 X
Rehof [RF01]                       Sub      X
Emami [EGH94]                      Sub      X
Wilson [WL95]                      Sub              X
Cherem [CR04]                      Eq               X
Ruf [Ruf00]                        Eq               X         1.1 lib
Liang [LH01]                       Eq               X
Guyer [GL03]                       Sub      X
Lattner [LLA07]                    Eq               X         X
O’Callahan [O’C00]                 Eq               X         X
Steensgaard (CS) [Ste]             Eq       X       X         X
Wang [WS01]                        Sub      X       X         1.1 lib
Object-sensitive [MRR05, LH06]     Sub      X       X         1-limited
Naik [NAW06]                       Sub      X       X         3-limited
Current paper                      Sub      X       X         X

Table 7.1: A comparison of key properties of previous analyses. Algorithms are named by first author unless they have been referred to differently in this paper; note that Steensgaard (CS) [Ste] is different than his original analysis [Ste96]. The “Eq/Sub” column indicates whether assignments are modeled with equality or subset constraints, and the “CS CG” and “CS Heap” columns respectively indicate the use of a context-sensitive call graph and heap abstraction (see §2.2.4 and §2.2.5 for definitions). Finally, the “Shown to Scale” column indicates whether the algorithm has been shown to scale to large Java benchmarks; “1.1 lib” means the smaller Java 1.1 libraries were analyzed, and k-limiting [Shi88] is indicated where used.

7.1.2 Context-sensitive points-to analysis

The context-sensitive points-to analysis of Chapter 5 is distinguished from previous work by its ability to scalably compute a context-sensitive heap abstraction and call graph while requiring far less memory than existing approaches. Table 7.1 gives key properties of several other context-sensitive points-to analysis algorithms; here we summarize some of the approaches taken by these analyses.

While effective for clients like escape analysis, summary-based analyses with subset constraints [WL95, WR99] have only been shown to scale to medium-sized programs. Summary-based analyses that use equality constraints have typically been more scalable [O’C00, FRD00, Ste, LLA07]. The algorithms of O’Callahan [O’C00] and Lattner and Adve [LLA07] scale well with a context-sensitive heap abstraction, but are less scalable when computing a context-sensitive call graph. A similar analysis for C# scales with a context-sensitive call graph [Ste], but still requires more than 1GB of memory on its largest benchmark.

Binary decision diagrams (BDDs) have been used in several recent systems to greatly improve the scalability of context-sensitive analysis. The Zhu and Calman [ZC04] and Whaley and Lam [WL04] algorithm, while quite scalable, uses a context-insensitive heap abstraction and call graph, leading to precision loss [LH06]. We compare extensively with the BDD-based 1-limited object-sensitive analysis of [LH06] in §5.4; object-sensitive analysis [MRR05] analyzes methods separately based on the receiver object instead of using call strings, exploiting typical object-oriented code structure for greater precision and scalability. Naik et al. present a static race detection tool based on a scalable 3-limited object-sensitive analysis [NAW06]. It is difficult to compare our analysis with theirs directly, as their race detection client raises object-sensitive queries, which are in general incomparable with context-sensitive queries [MRR05]. As future work we plan to design an object-sensitive version of our analysis, allowing for a better comparison.

Hackett and Aiken [HA06] present a points-to analysis for C with context sensitivity (including a context-sensitive heap abstraction), flow sensitivity, and partial path sensitivity. The analysis is summary based and implemented through a translation to SAT. Their analysis takes several hours to analyze large C programs.


7.1.3 Refinement-based points-to analysis

Plevyak and Chien [PC94] present a type inference algorithm for object-oriented programs that uses refinement whenever a potential imprecision in the analysis result is detected. The refinement is accomplished through cloning, e.g., analyzing a function separately for different call sites. In contrast to their work, which aggressively refines away all imprecision, our analyses only refine in parts of the program deemed likely to yield a more precise result for the client. This more selective refinement is key to scalability.

Guyer and Lin [GL03] present a client-driven points-to analysis for C that detects which statements cause imprecision for a given client, and then re-analyzes the program with greater flow and context sensitivity for those statements. Their results show that they obtain much of the precision benefit of flow and context sensitivity at a small extra cost, and their work was an inspiration for ours. The key difference with our work is similar to the difference with Plevyak and Chien’s, namely that their analysis adds sensitivity to all possibly polluting statements when imprecision is detected. This approach does not scale for Java, as it requires too much code to be treated precisely.

7.1.4 Demand-driven points-to analysis

The demand-driven points-to analysis of Heintze and Tardieu [HT01a] is the best known in the literature. The analysis is for C, and they show promising results with a client that tries to resolve calls through function pointers. We show in §4.4 that when adapted to Java, this analysis does not provide satisfactory performance.


7.1.5 CFL-reachability

Our use of CFL-reachability is based on the work of Reps et al. on developing and utilizing the CFL-reachability framework [RHSR94, RHS95, HRS95, Rep98]. The LF grammar in Figure 3.2 is an adaptation of the grammar given for Andersen's analysis for C in [Rep98]. Our key insight was to recognize that Andersen's analysis for Java is a balanced-parentheses problem when expressed in terms of CFL-reachability, a structure we exploit in both our context-insensitive and context-sensitive algorithms (a small example of this structure appears at the end of this subsection).

Our match edges are related to the summary edges used by the tabulation algorithm for balanced-parentheses languages [RHS95, RHSR94, Rep94]. Summary edges are computed bottom-up as paths between parentheses are found. In contrast, match edges are added exhaustively and then refined by checking for paths between parentheses.

CFL-reachability formulations lead directly to demand-driven algorithms through the use of the magic-sets transformation [Rep94]. The inference rules of FullFS (presented in §4.6), an adaptation of the demand-driven algorithm of Heintze and Tardieu [HT01a], correspond exactly to a magic-sets transformation of the grammar in Figure 3.2 with pointsTo as the start symbol.

Recent work of Kodumal and Aiken [KA04, KA07] has elucidated the connection between CFL-reachability and set constraints. Their work shows how to efficiently solve balanced-parentheses CFL-reachability problems with a set constraints solver [KA04], and more recent work shows how to additionally handle an intersected regular language [KA07]. As presented, their work requires explicit construction of the state machine for the regular language, which would make applying their technique to our context-sensitive points-to analysis formulation difficult to scale (the number of states in RC (see §5.2) is exponential in the size of the program).
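To make the balanced-parentheses structure concrete, consider this fragment (our own illustration; the edge labels are informal glosses, not the exact notation of Figure 3.2):

    class C { Object f; }

    class Example {
        static Object copy(C x, C w, Object y) {
            x.f = y;       // putfield[f] — an "open parenthesis" for field f
            return w.f;    // getfield[f] — the matching "close parenthesis"
        }
    }

A value flows from y to the returned expression only if the putfield[f] is matched by a getfield[f] for the same field f and the base pointers x and w may point to the same abstract object. A match edge conservatively assumes every such put/get pair for f is matched, and is refined away later if no witnessing alias path between the base pointers exists.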


7.1.6 Incremental points-to analysis

Some recent work makes promising advances in performing incremental points-to analysis. Kodumal and Aiken show how to perform a limited form of incremental analysis in a set constraints solver using backtracking [KA05]. Their technique is most effective in cases where code changes are primarily limited to a small set of source files, which they show is a typical development pattern. Hirzel et al. present a points-to analysis implemented in a JIT compiler that handles all Java language features and can quickly update its computed results after program changes [HDDH07]. Our approach to handling code changes is to recompute from scratch points-to queries that are affected by a program change. In cases where the number of queried variables is moderate, our approach has the advantages of simplicity, as no engineering for incrementality is needed, and of not needing to cache intermediate analysis results, which can consume significant amounts of memory.

Choi et al. [CGS+99] and Whaley and Rinard [WR99] define flow- and context-sensitive points-to and escape analyses for Java. Vivien and Rinard extend the analysis in [WR99] to be incremental [VR01], focusing analysis effort on code deemed likely to yield profitable results. Their results show that their incremental analysis obtains most of the effect of an exhaustive analysis in much less time. Our positive results for early termination may be explained by similar underlying principles. It is unclear how long their analysis takes to answer individual queries.

7.1.7 Cast verification

Constraint-based analyses have been developed to convert legacy Java programs to use Java 5 generics, and they have been shown to prove many downcasts safe [DKTE04, FTD+05]. These analyses rely on the generics annotations of the Java 5 java.util classes to model their behavior. In contrast, our approach determines properties of library code without annotations, and hence handles application data structures as well (see the example below). Furthermore, our analysis can be used for more than cast safety, as shown by our factory method client. In other work, Wang and Smith present a context-sensitive constraint-based type analysis [WS01] and show that it is effective at proving downcasts safe. However, they analyze the Java 1.1 libraries, which are significantly smaller than the Java 1.3 libraries used in the present work.

In general, there are various type inference algorithms that are in some cases comparable to points-to analyses. Cartesian-product analysis [Age95] was an early inference algorithm that precisely handled polymorphic code (through context sensitivity), and numerous variants have been proposed since then (e.g., in some of the aforementioned cast verification work [WS01, DKTE04]). For some clients, such type inference algorithms are insufficient, and a points-to analysis is required. For example, tainting analysis (e.g., [LL05]) relies on tracking the flow of individual objects from untrusted sources to trusted sinks, rather than just inferring a precise concrete type for the sink.
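For instance, consider this contrived pre-generics fragment (our own example):

    import java.util.*;

    class CastExample {
        public static void main(String[] args) {
            List names = new ArrayList();          // raw, pre-generics container
            names.add("Alice");
            names.add("Bob");
            String first = (String) names.get(0);  // downcast to be verified
        }
    }

A generics-migration tool proves the cast safe by inferring the type List<String> for names, which requires annotated java.util sources. A sufficiently precise points-to analysis instead shows that only the two String allocations can flow out of names.get(0), with no annotations at all, so the same reasoning applies to unannotated application-defined containers.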

7.2 Thin Slicing Related Work

Since first being defined by Weiser in 1979 [Wei79], slicing has inspired a large body of work on computing slices and on applications to a variety of software engineering tasks. We refer the reader to Tip's survey [Tip95] and Krinke's thesis [Kri03] for broad overviews of slicing technology and challenges. Here, we focus on the work most relevant to our own.

Our thin slicing algorithm is a straightforward adaptation of the SDG-based approach first presented by Horwitz et al. [HRB88]. Our implementation of a traditional slicer is in fact tabulation-based, as suggested in [Rep98] and the 20-year Retrospective to [HRB88].
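The following fragment of ours (all names invented) recalls the producer/explainer terminology used in the comparisons below; see §6.1 for the precise definitions:

    import java.util.*;

    class SliceExample {
        static Object compute(String key) { return key.toUpperCase(); }

        static String process(Map cache, String key) {
            Object raw = compute(key);     // producer: creates the value
            cache.put(key, raw);           // producer: value flows into the heap
            Object v = cache.get(key);     // producer: value flows back out
            return (String) v;             // seed: the statement under investigation
        }
    }

The thin slice of the seed contains only the producer statements that copy the value to it, tracing heap flow field-sensitively. Statements that merely establish that the base pointers at the put and the get refer to the same Map are explainers, shown only when the user expands the slice.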


CodeSurfer [Cod] is a program understanding tool for C and C++ based on the analysis techniques of [HRB88, RHSR94]. CodeSurfer also uses pointer analysis to allow navigation from a use of a heap location to potential defs. Our evaluation metric of a breadth-first traversal strategy aims to simulate use of a tool like CodeSurfer, which allows for navigating the dependence graph. While CodeSurfer allows navigation of all control and data dependences, thin slicing emphasizes producer statements and shows explainer statements using additional thin slices (see §6.1); our evaluation has shown that this technique quickly leads users to the most relevant statements.

Atkinson and Griswold [AG98] present a slicer relying on a preliminary flow-insensitive pointer analysis. This work targets C, and so had to deal with difficulties arising from issues such as stack-directed pointers and unsafe memory access, which do not arise in Java.

Larsen and Harrold [LH96] presented one of the first slicing approaches for object-oriented software, adding pseudo-parameters for fields to track dependences through the heap (sketched below). Our context-sensitive slicer implementation uses a similar approach, but relies on a partially context-sensitive preliminary pointer analysis to disambiguate locations with field- and object-sensitivity, and additional pseudo-parameters to soundly handle all fields that may be accessed transitively by callees.
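Roughly, the pseudo-parameter idea treats each field a callee may access as an extra in/out parameter. The following sketch is our own rendering of the idea, not the cited papers' notation:

    class Account { int balance; }

    class Bank {
        static int getBalance(Account a) { return a.balance; }
        // modeled as if declared: int getBalance(Account a, int a_balance_in)

        static void deposit(Account a, int amt) { a.balance += amt; }
        // modeled as if declared: void deposit(Account a, int amt, int a_balance_in),
        // with an additional output pseudo-parameter a_balance_out
    }

Dependences through a.balance then flow along ordinary parameter-passing edges in the system dependence graph, at the cost of extra parameters for every (transitively) accessed field.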

In recent years, several papers have improved precision by integrating more precise static alias analysis into slicing. Liang and Harrold [LH98] present a novel approach to represent formal parameter objects with trees. Hammer and Snelting [HS04] present an enhancement to this approach, including a criterion for sound limiting of tree sizes for recursive data structures. Both these approaches are more powerful than relying solely on a preceding flow-insensitive alias analysis, since must-alias information on parameters can allow sound strong updates. It is not clear how far these algorithms scale; the experimental results of Hammer and Snelting address programs significantly smaller than the benchmarks considered here. In future work, we plan to incorporate aspects of Hammer and Snelting's approach for thin slices.

Orso et al. [OSH04] present a classification of data dependence edges in an SDG, based on certainty of may-alias information and the span (scope) of a program over which a data dependence flows. They propose an incremental slicing procedure to aid debugging, whereby a tool can provide progressively larger slices by including progressively more classes of data dependences. Our thin slice expansion technique is similar in spirit, but goes in a different direction by expanding slices to include statements that indirectly give rise to the primary alias relations.

Mock et al. [MACE02] showed that for C programs with heavy pointer use, using dynamic points-to data significantly improved slice precision over a conservative flow-insensitive pointer analysis. We suspect Java programs resemble C programs with heavy pointer use with regard to data dependence.

PSE [MSA+04] is a static analysis tool for localizing the causes of typestate errors in C and C++ programs by essentially computing a variant of a backward slice, with extra filtering based on the type of error. Their system is able to perform strong updates on the heap in some situations, using a technique we plan to try in our thin slicer. Their work unsoundly ignores may-aliasing in some configurations, partly because traces involving aliasing are hard for developers to understand. If applied to a Java-like language, our technique for explaining aliases in thin slices may help solve this problem, as discussed in §6.3.1.

Recently, Zhang et al. have considerably improved the state-of-the-art in dynamic slicing [ZTG06, ZGZ04, ZGG06a, ZGG06b]. Thin slicing applies naturally to dynamic data dependences, and we believe dynamic thin slices could provide benefits similar to static thin slices. Zhang et al.'s work on improving scalability [ZGZ04, ZTG06] could be leveraged to create a more scalable dynamic thin slicer. Their recent work on pruning dynamic slices [ZGG06a] is complementary to ours: thin slicing and their heuristics for determining when a statement is unlikely to be relevant (based on which statements output good and bad values) could be fruitfully combined. Recent work [ZGG06b] observes that using dynamic data dependences alone can often identify buggy statements in C programs; we suspect that in fact those data dependences considered by a thin slicer would also be sufficient. Finally, this work [ZGG06b] also suggests exploring statements closer to the seed first when viewing a slice, an idea we also use in our evaluation.

Chapter 8

Conclusions and Future Work

We have presented two refinement-based techniques for analysis and understanding of large object-oriented programs. Our refinement-based points-to analysis exploits the balanced-parentheses structure of Java heap accesses to provide both scalability and precision. Thin slicing uses a novel notion of relevance and user-guided refinement to better focus programmer attention on those statements most relevant to development tasks. Together, our points-to analysis and thin slicing enable new types of programming tools that could significantly ease the process of developing and maintaining large-scale object-oriented programs.

Many clients other than those we tested could benefit from the scalability and precision of our refinement-based points-to analysis. An incomplete list of recent analyses that may benefit includes work on typestate verification [FYD+06], race detection [NAW06], reflection analysis [LWL05], and analyses for security bugs [LL05]. By using our pointer analysis, these analyses may be able to scale to much larger programs or reduce their false positive rate through more precise handling of the heap. Additionally, use with a wider variety of clients may expose opportunities to improve the points-to analysis itself, e.g., with a different refinement policy.


A practical IDE-based implementation of our points-to analysis would pose some interesting engineering challenges. One key issue would be how to maintain the global program representations still required by the points-to analysis, i.e., a pre-computed call graph and information on where fields are accessed (for addition of match edges). IDEs like Eclipse do very little caching of program representations like abstract syntax trees, to reduce memory overhead. Maintaining a call graph in memory for a large program may be too expensive, necessitating a balance of on-demand computation and limited caching. In addition, any global data like a call graph must be updated incrementally as the programmer edits the code, which may pose another interesting engineering challenge.

A parallel implementation of our points-to analysis could yield a large scalability improvement. Since our analysis shares no state across queries, it can be trivially parallelized by running each query on its own processor (see the sketch below). The memory usage of our analysis may need to be more carefully engineered to make good use of the cache on a multiprocessor. However, with proper engineering, a parallel implementation of the pointer analysis could provide large real-world speedups, as multicore processors are becoming pervasive.
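A minimal sketch of such per-query parallelism using java.util.concurrent follows; PointsToQuery, PointsToResult, and Analysis.answer() are hypothetical stand-ins for the analysis's actual interface:

    import java.util.*;
    import java.util.concurrent.*;

    class ParallelQueries {
        // Hypothetical types: PointsToQuery, PointsToResult, Analysis.
        static List<Future<PointsToResult>> runAll(
                final Analysis analysis, List<PointsToQuery> queries)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
            List<Callable<PointsToResult>> tasks =
                new ArrayList<Callable<PointsToResult>>();
            for (final PointsToQuery q : queries) {
                // Each task reads the shared, immutable program representation
                // (call graph, field-access index) but keeps all per-query
                // analysis state private, so no synchronization is needed.
                tasks.add(new Callable<PointsToResult>() {
                    public PointsToResult call() { return analysis.answer(q); }
                });
            }
            return pool.invokeAll(tasks);   // one query per worker thread
        }
    }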

The next step in validating thin slicing would be to build an IDE-based implementation and test its usefulness with developers. Various user-interface issues may arise during this process. The thin slice could be shown to the user in a standard tree view, but a graph view may be more intuitive, especially in cases where expansion is necessary. Also, assuming the thin slicer is built atop our refinement-based pointer analysis, it remains an open question how to handle budgets and early termination in the user interface.

Ideally, an IDE-based thin slicer would make extensive use of developer feedback. Better incorporation of such feedback into the compilation and analysis process has been a longstanding open problem (e.g., see Knuth's study [Knu71]). Thin slicing provides a promising avenue for progress on this issue, as users can easily track a small number of statements that the analysis thinks are relevant to the seed statement. One could imagine various ways that the user could help improve the analysis result, e.g., by providing reasons why certain statements are not relevant or by suggesting a different refinement policy for the underlying pointer analysis. Lessons learned from implementing such a thin slicer and evaluating it in a user study may be broadly applicable to software quality tools.

Bibliography

[AAB+00] Bowen Alpern, Dick Attanasio, John Barton, Michael Burke, Perry Cheng, Jong-Deok Choi, Anthony Cocchi, Stephen Fink, David Grove, Michael Hind, Susan Flynn Hummel, Derek Lieber, Vassily Litvinov, Ton Ngo, Mark Mergen, Vivek Sarkar, Mauricio Serrano, Janice Shepherd, Stephen Smith, V. C. Sreedhar, Harini Srinivasan, and John Whaley. The Jalapeno virtual machine. IBM Systems Journal, Java Performance Issue, 39(1), 2000.

[AG98] Darren C. Atkinson and William G. Griswold. Effective whole-program analysis in the presence of pointers. In Foundations of Software Engineering, pages 46–55, 1998.

[Age95] Ole Agesen. The Cartesian product algorithm: Simple and precise type inference of parametric polymorphism. In ECOOP, pages 2–26, 1995.

[AH03] Matthew Allen and Susan Horwitz. Slicing Java programs that throw and catch exceptions. In PEPM '03: Proceedings of the 2003 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pages 44–54, New York, NY, USA, 2003. ACM Press.

[And94] Lars O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, University of Copenhagen, DIKU, 1994.

[Ash] Ashes suite collection. http://www.sable.mcgill.ca/software/.

[BLQ+03] Marc Berndl, Ondřej Lhoták, Feng Qian, Laurie Hendren, and Navindra Umanee. Points-to analysis using BDDs. In Conference on Programming Language Design and Implementation (PLDI), June 2003.

[BR01] Thomas Ball and Sriram K. Rajamani. Automatically validating temporal safety properties of interfaces. In SPIN '01: Proceedings of the 8th International SPIN Workshop on Model Checking of Software, pages 103–122, New York, NY, USA, 2001. Springer-Verlag New York, Inc.


[BS96] David Bacon and Peter Sweeney. Fast static analysis of C++ virtual function calls. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), San Jose, CA, October 1996.

[CFR+91] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst., 13(4):451–490, 1991.

[CGS+99] Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam Sreedhar, and Sam Midkiff. Escape analysis for Java. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 1999.

[Cha06] Swarat Chaudhuri. CFL-reachability in subcubic time. Technical report, IBM Research Report RC24126, 2006.

[Cod] CodeSurfer. http://www.grammatech.com/products/codesurfer/.

[CR04] Sigmund Cherem and Radu Rugina. Region analysis and transformation for Java programs. In ISMM '04: Proceedings of the 4th International Symposium on Memory Management, 2004.

[CU91] Craig Chambers and David Ungar. Making pure object-oriented languages practical. In Norman Meyrowitz, editor, Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), volume 26, pages 1–15, New York, NY, 1991. ACM Press.

[DaC] DaCapo Benchmark Suite. http://www-ali.cs.umass.edu/DaCapo/gcbm.html.

[DADY04] Nurit Dor, Stephen Adams, Manuvir Das, and Zhe Yang. Software validation via scalable path-sensitive value flow analysis. In ISSTA '04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 12–22, New York, NY, USA, 2004. ACM Press.

[Das00] Manuvir Das. Unification-based pointer analysis with directional assignments. In Conference on Programming Language Design and Implementation (PLDI), Vancouver, British Columbia, Canada, June 2000.


[DER05] Hyunsook Do, Sebastian Elbaum, and Gregg Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering, 10(4), October 2005.

[DGC95] Jeffrey Dean, David Grove, and Craig Chambers. Optimization of object-oriented programs using static class hierarchy analysis. In European Conference on Object-Oriented Programming (ECOOP), August 1995.

[DGS97] Evelyn Duesterwald, Rajiv Gupta, and Mary Lou Soffa. A practical framework for demand-driven interprocedural data flow analysis. ACM Transactions on Programming Languages and Systems, 19(6):992–1030, November 1997.

[DKTE04] Alan Donovan, Adam Kiezun, Matthew S. Tschantz, and Michael D. Ernst. Converting Java programs to use generic libraries. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2004.

[DLFR01] Manuvir Das, Ben Liblit, Manuel Fähndrich, and Jakob Rehof. Estimating the impact of scalable pointer analysis on optimization. In SAS, pages 260–278, 2001.

[DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: path-sensitive program verification in polynomial time. In Conference on Programming Language Design and Implementation (PLDI), 2002.

[DMM98] Amer Diwan, Kathryn S. McKinley, and J. Eliot B. Moss. Type-based alias analysis. In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pages 106–117, New York, NY, USA, 1998. ACM Press.

[Ecl] The Eclipse Platform. http://www.eclipse.org/.

[EcP] Eclipse Plugin Central. http://www.eclipseplugincentral.com.

[EGH94] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In PLDI '94: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, pages 242–256, New York, NY, USA, 1994. ACM Press.


[FFA00] Jeffrey S. Foster, Manuel Fähndrich, and Alexander Aiken. Polymorphic versus monomorphic flow-insensitive points-to analysis for C. In SAS '00: Proceedings of the 7th International Symposium on Static Analysis, pages 175–198, London, UK, 2000. Springer-Verlag.

[FFSA98] Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su, and Alex Aiken. Partial online cycle elimination in inclusion constraint graphs. In Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.

[FKS00] Stephen Fink, Kathleen Knobe, and Vivek Sarkar. Unified analysis of array and object references in strongly typed languages. In Proceedings of the 2000 Static Analysis Symposium, June 2000.

[Fow99] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[FQ03] S. Fink and F. Qian. Design, implementation and evaluation of adaptive recompilation with on-stack replacement, 2003.

[FRD99] Manuel Fähndrich, Jakob Rehof, and Manuvir Das. From polymorphic subtyping to CFL-reachability: Context-sensitive flow analysis using instantiation constraints. Technical report, Microsoft Research Technical Report MSR-TR-99-84, 1999.

[FRD00] Manuel Fähndrich, Jakob Rehof, and Manuvir Das. Scalable context-sensitive flow analysis using instantiation constraints. In Conference on Programming Language Design and Implementation (PLDI), 2000.

[FTD+05] Robert Fuhrer, Frank Tip, Julian Dolby, Adam Kiezun, and Markus Keller. Refactoring techniques for migrating applications to generic Java container classes. In European Conference on Object-Oriented Programming (ECOOP), 2005.

[FYD+06] Stephen Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. Effective typestate verification in the presence of aliasing. In International Symposium on Software Testing and Analysis (ISSTA), 2006.

[GC01] David Grove and Craig Chambers. A framework for call graph construction algorithms. ACM Trans. Program. Lang. Syst., 23(6):685–746, 2001.

[GL03] Samuel Z. Guyer and Calvin Lin. Client-driven pointer analysis. In International Static Analysis Symposium (SAS), San Diego, CA, June 2003.


[HA06] Brian Hackett and Alex Aiken. How is aliasing used in systems software? In SIGSOFT '06/FSE-14: Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 69–80, New York, NY, USA, 2006. ACM Press.

[Har78] Michael A. Harrison. Introduction to Formal Language Theory. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1978.

[HCXE02] Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler. A system and language for building system-specific, static analyses. In Conference on Programming Language Design and Implementation (PLDI), 2002.

[HDDH07] Martin Hirzel, Daniel Von Dincklage, Amer Diwan, and Michael Hind. Fast online pointer analysis. ACM Trans. Program. Lang. Syst., 29(2):11, 2007.

[HDWY06] Brian Hackett, Manuvir Das, Daniel Wang, and Zhe Yang. Modular checking for buffer overflows in the large. In Proceedings of the International Conference on Software Engineering (ICSE), 2006.

[Hin01] Michael Hind. Pointer analysis: Haven't we solved this problem yet? In ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE), Snowbird, Utah, June 2001.

[Hoa69] C. A. R. Hoare. An axiomatic basis for computer programming. Commun. ACM, 12(10):576–580, 1969.

[HPR89] S. Horwitz, P. Pfeiffer, and T. Reps. Dependence analysis for pointer variables. In Conference on Programming Language Design and Implementation (PLDI), 1989.

[HRB88] Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. In Conference on Programming Language Design and Implementation (PLDI), 1988.

[HRS95] Susan Horwitz, Thomas Reps, and Mooly Sagiv. Demand interprocedural dataflow analysis. In SIGSOFT '95: Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1995.

[HS04] Christian Hammer and Gregor Snelting. An improved slicer for Java. In Proceedings of the ACM-SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pages 17–22, 2004.


[HT01a] Nevin Heintze and Olivier Tardieu. Demand-driven pointer analysis. In Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, June 2001.

[HT01b] Nevin Heintze and Olivier Tardieu. Ultra-fast aliasing analysis using CLA: A million lines of C code in a second. In Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, June 2001.

[J2E] Java Platform Enterprise Edition. http://java.sun.com/javaee.

[J2S] Java Platform Standard Edition. http://java.sun.com/javase.

[jEd] jEdit: Open source programmer's text editor. http://www.jedit.org.

[KA04] John Kodumal and Alex Aiken. The set constraint/CFL-reachability connection in practice. In Conference on Programming Language Design and Implementation (PLDI), June 2004.

[KA05] John Kodumal and Alex Aiken. Banshee: A scalable constraint-based analysis toolkit. In SAS ’05: Proceedings of the 12th International Static Analysis Symposium. London, United Kingdom, September 2005.

[KA07] John Kodumal and Alex Aiken. Regularly annotated set constraints. In PLDI, pages 331–341, 2007.

[Knu71] Donald E. Knuth. An empirical study of FORTRAN programs. Softw., Pract. Exper., 1(2):105–133, 1971.

[Kri03] Jens Krinke. Advanced Slicing of Sequential and Concurrent Programs. PhD thesis, University of Passau, 2003.

[LA05] Chris Lattner and Vikram Adve. Automatic pool allocation: improving performance by controlling data structure layout in the heap. In PLDI '05: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 129–142, New York, NY, USA, 2005. ACM Press.

[Lam04] Butler W. Lampson. Software components: Only the giants survive. In K. Sparck-Jones and A. Herbert, editors, Computer Systems: Theory, Technology, and Applications, pages 137–146. Springer-Verlag, 2004.

[LH96] Loren Larsen and Mary Jean Harrold. Slicing object-oriented software. In International Conference on Software Engineering (ICSE), 1996.


[LH98] Donglin Liang and Mary Jean Harrold. Slicing objects using system dependence graphs. In ICSM, pages 358–367, 1998.

[LH01] Donglin Liang and Mary Jean Harrold. Efficient computation of parameterized pointer information for interprocedural analyses. In Proceedings of the 8th International Symposium on Static Analysis, 2001.

[LH03] Ondřej Lhoták and Laurie Hendren. Scaling Java points-to analysis using Spark. In International Conference on Compiler Construction (CC), Warsaw, Poland, April 2003.

[LH04] Ondřej Lhoták and Laurie Hendren. Jedd: a BDD-based relational ex- tension of Java. In Conference on Programming Language Design and Implementation (PLDI), 2004.

[LH06] Ondřej Lhoták and Laurie Hendren. Context-sensitive points-to analysis: Is it worth it? In International Conference on Compiler Construction (CC), 2006.

[Lho] Ondřej Lhoták. Personal communication. 2005.

[Lho02] Ondřej Lhoták. Spark: A flexible points-to analysis framework for Java. Master’s thesis, McGill University, December 2002.

[Lho06] Ondřej Lhoták. Program Analysis using Binary Decision Diagrams. PhD thesis, McGill University, January 2006.

[LL05] V. Benjamin Livshits and Monica S. Lam. Finding security errors in Java programs with static analysis. In Proceedings of the 14th Usenix Security Symposium, pages 271–286, August 2005.

[LLA07] Chris Lattner, Andrew Lenharth, and Vikram Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07), San Diego, California, June 2007.

[LPH01] Donglin Liang, Maikel Pennings, and Mary Jean Harrold. Extending and evaluating flow-insensitive and context-insensitive points-to analyses for Java. In ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE), Snowbird, Utah, June 2001.


[LPH02] Donglin Liang, Maikel Pennings, and Mary Jean Harrold. Evaluating the precision of static reference analysis using profiling. In International Symposium on Software Testing and Analysis (ISSTA), Rome, Italy, July 2002.

[LT93] Nancy G. Leveson and Clark S. Turner. Investigation of the Therac-25 accidents. IEEE Computer, 26(7):18–41, 1993.

[LWL05] Benjamin Livshits, John Whaley, and Monica S. Lam. Reflection analysis for Java. In Kwangkeun Yi, editor, Proceedings of the 3rd Asian Symposium on Programming Languages and Systems, volume 3780. Springer-Verlag, November 2005.

[MACE02] Markus Mock, Darren C. Atkinson, Craig Chambers, and Susan J. Eggers. Improving program slicing with dynamic points-to data. SIGSOFT Softw. Eng. Notes, 27(6):71–80, 2002.

[McC06] Kevin McCoy. How the IRS failed to stop $200M in bogus refunds. USA Today, December 5, 2006.

[MRR05] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. Parameterized object sensitivity for points-to analysis for Java. ACM Trans. Softw. Eng. Methodol., 14(1):1–41, 2005.

[MSA+04] Roman Manevich, Manu Sridharan, Stephen Adams, Manuvir Das, and Zhe Yang. PSE: explaining program failures via postmortem static analysis. In SIGSOFT '04/FSE-12: Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 63–72, New York, NY, USA, 2004. ACM Press.

[MXBK05] David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. Jungloid mining: helping to navigate the API jungle. In Conference on Programming Language Design and Implementation (PLDI), 2005.

[NAW06] Mayur Naik, Alex Aiken, and John Whaley. Effective static race detection for Java. In Conference on Programming Language Design and Implementation (PLDI), 2006.

[NCM03] Nathaniel Nystrom, Michael R. Clarkson, and Andrew C. Myers. Polyglot: An extensible compiler framework for Java. In CC, pages 138–152, 2003.

[NR69] Peter Naur and Brian Randell, editors. Software Engineering: Report of a conference sponsored by the NATO Science Committee, Garmisch, Germany, 7-11 Oct. 1968. Brussels: Scientific Affairs Division, NATO, 1969.

[O’C00] Robert O’Callahan. Generalized Aliasing as a Basis for Program Analysis Tools. PhD thesis, Carnegie Mellon University, November 2000.

[OJ97] Robert O'Callahan and Daniel Jackson. Lackwit: a program understanding tool based on type inference. In ICSE '97: Proceedings of the 19th International Conference on Software Engineering, pages 338–348, New York, NY, USA, 1997. ACM Press.

[OSH04] Alessandro Orso, Saurabh Sinha, and Mary Jean Harrold. Classifying data dependences in the presence of pointers for program comprehension, testing, and debugging. ACM Transactions on Software Engineering and Methodology (TOSEM), 13(2):199–239, 2004.

[oST] National Institute of Standards and Technology. Software errors cost U.S. economy $59.5 billion annually. NIST News Release 2002-10.

[PC94] John Plevyak and Andrew A. Chien. Precise concrete type inference for object-oriented languages. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Portland, Oregon, October 1994.

[Rep94] Thomas Reps. Solving demand versions of interprocedural analysis problems. In International Conference on Compiler Construction (CC), Edinburgh, Scotland, April 1994.

[Rep98] Thomas Reps. Program analysis via graph reachability. Information and Software Technology, 40(11-12):701–726, November/December 1998.

[Rep00] Thomas Reps. Undecidability of context-sensitive data-independence analysis. ACM Trans. Program. Lang. Syst., 22(1):162–186, 2000.

[RF01] Jakob Rehof and Manuel Fähndrich. Type-based flow analysis: from polymorphic subtyping to CFL-reachability. In ACM Symposium on Principles of Programming Languages (POPL), 2001.

[RHS95] Thomas Reps, Susan Horwitz, and Mooly Sagiv. Precise interprocedural dataflow analysis via graph reachability. In ACM Symposium on Principles of Programming Languages (POPL), 1995.


[RHSR94] Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. Speeding up slicing. In ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), New Orleans, LA, December 1994.

[RLS+01] Barbara G. Ryder, William A. Landi, Philip A. Stocks, Sean Zhang, and Rita Altucher. A schema for interprocedural modification side-effect analysis with pointer aliasing. ACM Trans. Program. Lang. Syst., 23(2):105–186, 2001.

[RMR01] Atanas Rountev, Ana Milanova, and Barbara G. Ryder. Points-to analysis for Java using annotated constraints. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Tampa Bay, Florida, October 2001.

[RR03] Manos Renieris and Steven P. Reiss. Fault localization with nearest neighbor queries. In IEEE International Conference on Automated Software Engineering (ASE), 2003.

[Ruf00] Erik Ruf. Effective synchronization removal for Java. In Conference on Programming Language Design and Implementation (PLDI), 2000.

[Ryd03] Barbara G. Ryder. Dimensions of precision in reference analysis of object-oriented programming languages. In International Conference on Compiler Construction (CC), Warsaw, Poland, April 2003.

[SB06] Manu Sridharan and Rastislav Bodík. Refinement-based context-sensitive points-to analysis for Java. In Conference on Programming Language Design and Implementation (PLDI), 2006.

[SFA00] Zhendong Su, Manuel Fähndrich, and Alexander Aiken. Projection merging: Reducing redundancies in inclusion constraint graphs. In ACM Symposium on Principles of Programming Languages (POPL), Boston, Massachusetts, pages 81–95, January 2000.

[SGSB05] Manu Sridharan, Denis Gopan, Lexin Shan, and Rastislav Bodík. Demand-driven points-to analysis for Java. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2005.

[Shi88] O. Shivers. Control flow analysis in Scheme. In Conference on Programming Language Design and Implementation (PLDI), 1988.


[SRW02] Mooly Sagiv, Thomas Reps, and Reinhard Wilhelm. Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst., 24(3):217–298, 2002.

[Ste] Bjarne Steensgaard. Personal communication on analysis for Microsoft Bartok compiler. 2005.

[Ste96] Bjarne Steensgaard. Points-to analysis in almost linear time. In ACM Symposium on Principles of Programming Languages (POPL), 1996.

[Tei] Tim Teitelbaum. Personal communication regarding CodeSurfer. 2007.

[Tip95] F. Tip. A survey of program slicing techniques. Journal of programming languages, 3:121–189, 1995.

[VR01] Frederic Vivien and Martin C. Rinard. Incrementalized pointer and escape analysis. In Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, June 2001.

[VRHS+99] Raja Vallée-Rai, Laurie Hendren, Vijay Sundaresan, Patrick Lam, Etienne Gagnon, and Phong Co. Soot - a Java optimization framework. In Proceedings of CASCON 1999, pages 125–135, 1999.

[WAL] T.J. Watson Libraries for Analysis. http://wala.sourceforge.net.

[Wei79] Mark David Weiser. Program slices: formal, psychological, and practical investigations of an automatic program abstraction method. PhD thesis, University of Michigan, Ann Arbor, 1979.

[WL95] Robert P. Wilson and Monica S. Lam. Efficient context-sensitive pointer analysis for C programs. In Conference on Programming Language Design and Implementation (PLDI), 1995.

[WL02] John Whaley and Monica Lam. An efficient inclusion-based points-to analysis for strictly-typed languages. In International Static Analysis Symposium (SAS), Madrid, Spain, September 2002.

[WL04] John Whaley and Monica S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In Conference on Programming Language Design and Implementation (PLDI), 2004.

[WR99] John Whaley and Martin Rinard. Compositional pointer and escape analysis for Java programs. In Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 1999.


[WS01] Tiejun Wang and Scott F. Smith. Precise constraint-based type inference for Java. In 15th European Conference on Object-Oriented Programming (ECOOP), 2001.

[WZ91] Mark N. Wegman and F. Kenneth Zadeck. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst., 13(2):181–210, 1991.

[Yan90] Mihalis Yannakakis. Graph-theoretic methods in database theory. In ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), April 1990.

[YHR99] Suan Hsi Yong, Susan Horwitz, and Thomas W. Reps. Pointer analysis for programs with structures and casting. In SIGPLAN Conference on Programming Language Design and Implementation, pages 91–103, 1999.

[ZC04] Jianwen Zhu and Silvian Calman. Symbolic pointer analysis revisited. In Conference on Programming Language Design and Implementation (PLDI), 2004.

[Zel02] Andreas Zeller. Isolating cause-effect chains from computer programs. SIGSOFT Softw. Eng. Notes, 27(6):1–10, 2002.

[ZGG06a] Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. Pruning dynamic slices with confidence. In Conference on Programming Language Design and Implementation (PLDI), 2006.

[ZGG06b] Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. A study of effectiveness of dynamic slicing in locating real faults. Empirical Software Engineering, 2006. To appear.

[ZGZ04] Xiangyu Zhang, Rajiv Gupta, and Youtao Zhang. Efficient forward computation of dynamic slices using reduced ordered binary decision diagrams. In International Conference on Software Engineering (ICSE), 2004.

[ZJL+06] Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. Statistical debugging: Simultaneous identification of multiple bugs. In Proceedings of the 23rd International Conference on Machine Learning, 2006.

[ZR07] Xin Zheng and Radu Rugina. Demand-driven alias analysis for C. Technical report, Cornell University CIS TR2007-2089, 2007.


[ZTG06] Xiangyu Zhang, Sriraman Tallam, and Rajiv Gupta. Dynamic slicing long running programs through execution fast forwarding. In ACM SIGSOFT Symposium on Foundations of Software Engineering, 2006.
