Automating Modular Program Verification by Refining Specifications

by

Mana Taghdiri

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology

February 2008

© Massachusetts Institute of Technology 2008. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, December 2007
Certified by: Daniel N. Jackson, Professor of Electrical Engineering and Computer Science, Thesis Supervisor
Accepted by: Terry P. Orlando, Chairman, Department Committee on Graduate Students

Abstract

Modular analyses of software systems rely on the specifications of the analyzed modules. In many analysis techniques (e.g., ESC/Java), the specifications have to be provided by users. This puts a considerable burden on users and thus limits the applicability of such techniques. To avoid this problem, some modular analysis techniques automatically extract module summaries that capture specific aspects of the modules' behaviors. However, such summaries are only useful in checking a restricted class of properties.

We describe a static modular analysis that automatically extracts procedure specifications in order to check heap-manipulating programs against rich data structure properties. Extracted specifications are context-dependent; their precision depends on both the property being checked and the calling context in which they are used.
Starting from a rough over-approximation of the behavior of each call site, our analysis computes an abstraction of the procedure being analyzed and checks it against the property. Specifications are further refined, as needed, in response to spurious counterexamples. The analysis terminates when either the property has been validated (with respect to a finite domain), or a non-spurious counterexample has been found.

Furthermore, we describe a lightweight static technique to extract specifications of heap-manipulating procedures. These specifications are not context-dependent, and they do not require any domain finitization. They summarize the general behavior of procedures in terms of their effect on program state. They bound the values of all variables and fields in the post-state of the procedure by relational expressions in terms of their values in the pre-state. The analysis maintains both upper and lower bounds so that in some cases an exact result can be obtained.

Thesis Supervisor: Daniel N. Jackson
Title: Professor of Electrical Engineering and Computer Science

Acknowledgments

I would like to thank my advisor, Professor Daniel Jackson, not only for his invaluable technical guidance on this project over the past few years, but also for his patience, encouragement, and enthusiasm, which made the task of preparing talks and papers enjoyable. I also thank my thesis committee members, Professor Martin Rinard and Professor Srini Devadas, for their helpful comments.

I thank my friends at MIT, Emina Torlak, Robert Seater, Gregory Dennis, Felix Chang, Jonathan Edwards, Derek Rayside, Sarfraz Khurshid, Ilya Shlyakhter, Manu Sridharan, Tina Nolte, Viktor Kuncak, Alexandru Salcianu, and Mandana Vaziri, who made my graduate studies a wonderful academic and social experience. I especially thank Emina Torlak for her insights on the Alloy modeling language, and her implementation of Kodkod, the most efficient Alloy model finder currently available.
I thank Gregory Dennis for many useful discussions about checking Java code, and for providing me with his tool Forge. I thank Mandana Vaziri for providing me with Jalloy, a Java bug finder that made it possible for me to evaluate my ideas at the very early stages. I thank Robert Seater, Viktor Kuncak, and Alexandru Salcianu for their valuable comments on my procedure summarization technique.

But above all, I would like to thank my family: my parents Afsaneh and Hossein, and my sisters Shiva and Sara, who always encouraged me to learn and supported me with love throughout my life, and my husband Mehdi, for all his love, patience, encouragement, and support, without which this thesis would never have been possible.

This thesis is dedicated to my family.

Contents

1 Introduction
  1.1 Problem Description
    1.1.1 Automating Modular Program Verification
    1.1.2 Existing Approaches
  1.2 Proposed Solution
    1.2.1 Specification Refinement
    1.2.2 Instantiation: Karun
    1.2.3 Limitations
  1.3 Example
    1.3.1 User Experience
    1.3.2 Underlying Analysis
  1.4 Evaluation
  1.5 Contributions
  1.6 Thesis Organization
2 Method
  2.1 Overview
  2.2 Definitions
  2.3 Input Functions
  2.4 Computed Functions
  2.5 Input Functions Properties
  2.6 Algorithm
  2.7 Properties
    2.7.1 Termination
    2.7.2 Completeness
    2.7.3 Soundness
3 Context
  3.1 Programming Language: Java
  3.2 Specification Language: Alloy
    3.2.1 Declarations
    3.2.2 Expressions
    3.2.3 Formulas
    3.2.4 Auxiliary Constraints
    3.2.5 Examples
  3.3 Bounds
4 Extracting Initial Specifications
  4.1 Overview
  4.2 Logic
  4.3 Examples
  4.4 Abstraction Technique
    4.4.1 Definitions
    4.4.2 Transfer Function
    4.4.3 Simplifications
  4.5 Properties
    4.5.1 Safety
    4.5.2 Termination
    4.5.3 Complexity
  4.6 Optimization
5 Checking Abstract Programs: Forge
  5.1 Overview
  5.2 Translating Java to Alloy
    5.2.1 Encoding Basic Statements
    5.2.2 Encoding Call Site Specifications
    5.2.3 Example
  5.3 Generating Relation Bounds
  5.4 Solving Alloy Formulas: Kodkod
6 Refining Specifications
  6.1 Validity Checking of Counterexamples
    6.1.1 Interpretation of a Counterexample
    6.1.2 Checking Call Sites
  6.2 Specification Refinement
    6.2.1 Generating Small Unsatisfiability Proofs
    6.2.2 Example
7 Experiments
  7.1 Overview
  7.2 Case Studies
    7.2.1 Quartz API
    7.2.2 OpenJGraph API
    7.2.3 Discussions
  7.3 Evaluating Performance
  7.4 Evaluating Initial Summaries
    7.4.1 Accuracy
    7.4.2 Effectiveness
8 Conclusion
  8.1 Summary
  8.2 Related Work
    8.2.1 Underlying Ideas
    8.2.2 Overall Goal
  8.3 Discussion
    8.3.1 Merits
    8.3.2 Limitations
    8.3.3 Future Directions

List of Figures

1-1 Example.
1-2 Properties: (a) the resulting order of nodes is a permutation of the original nodes of the graph, (b) the in-degree of each node is smaller than the number of nodes preceding it in the topological sort, (c) the representation invariants.
1-3 Counterexample: (a) textual description, (b) pre-state heap, (c) corresponding graph.
1-4 Program execution corresponding to the counterexample. Executed statements are given in bold. Called methods are expanded below call sites. Program states before and after the execution, and the state updates after each executed statement are also given.
1-5 Initial specifications.
1-6 Final specification for fixIns.
2-1 An overview of the framework.
2-2 Abstract syntax for the analyzed language.
2-3 Example: an xor operation.
2-4 Example: (a) an initial program state, (b) the corresponding trace: each line gives an executed statement and the state before the execution of that statement.
2-5 Relationship between different domains.
2-6 Translation example: (a) initial variable and value mappings, (b) the transformed xorAll method, (c) variable mappings after translating each statement, and (d) the translation of statements to a boolean formula.
2-7 Examples of initial specification: (a) the xor method and its pre-state mappings, (b) first specification and the post-state variable mapping, (c) second specification and the post-state variable mapping.