Discovering and Debugging Algebraic Specifications for Java Classes
Discovering and Debugging Algebraic Specifications for Java Classes

by Johannes Henkel

Vordiplom, Darmstadt University of Technology, 1999
M.S., University of Colorado at Boulder, 2001

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirement for the degree of Doctor of Philosophy
Department of Computer Science
2004

This thesis entitled: Discovering and Debugging Algebraic Specifications for Java Classes, written by Johannes Henkel, has been approved for the Department of Computer Science.

Amer Diwan
Daniel Connors
James Martin
William Waite
Alexander Wolf

Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Henkel, Johannes (Ph.D., Computer Science)
Discovering and Debugging Algebraic Specifications for Java Classes
Thesis directed by Assistant Professor Amer Diwan

This thesis presents a system for reducing the cost of developing algebraic specifications for Java classes. The system consists of two components: an algebraic specification discovery tool and an algebraic interpreter. The first tool automatically discovers algebraic specifications from Java classes. The tool generates tests and captures the information it observes during their execution as algebraic axioms. In practice, this tool is accurate, but not complete. Still, the discovered specifications are a good starting point for writing a specification. The second component of our system is the algebraic specification interpreter, which helps developers in achieving specification completeness. Given an algebraic specification of a class, the interpreter generates a rapid prototype which can be used within an application just like any regular Java class.
When running an application that uses the rapid prototype, the interpreter prints error messages that tell the developer in which way the specification is incomplete. We evaluate the performance of our tools and present case studies that demonstrate the usefulness of the approach.

Dedication

To my mother, my father, and my sister.

Acknowledgments

First and foremost, I want to thank my advisor Amer Diwan for supporting me with freedom and guidance, inspiration and feedback, patience and challenges, and everything else I needed as a student. Amer has taught me all important aspects of conducting research, including how to communicate ideas and results. Most importantly though, Amer never lost interest in my work, and his friendship with his students makes working with him a true privilege.

I want to thank the members of my thesis committee, Daniel Connors, James Martin, William Waite, and Alexander Wolf, for being a great resource and for providing constructive and supportive feedback on the ideas presented in this thesis. I want to thank Alexander Wolf and Dirk Grunwald for being interested in my work, giving great advice, and for their support of my job search.

I want to thank Michael Burke for making two summer internships at the IBM Watson lab possible for me. He was a great mentor, and I enjoyed working with him on several cool research projects. Michael also improved my writing by iteratively eliminating many redundant words from a report we wrote together; this was a very useful experience. I want to thank Michael for being my promoter inside and outside of IBM. Most importantly, I want to thank Michael for being a great friend.

I want to thank the people whom I worked closely with at IBM: Peri Tarr, for the nice experience of hacking a little Hyper/J tool together, and Harold Ossher, for being a most supportive manager.
I want to thank many others at IBM Research whom I was happy to exchange ideas with: Michael Hind, Vivek Sarkar, Mukund Raghavachari, William Harrison, Bob Schloss, John Field, Frank Tip, Robert O'Callahan, Mark Wegman, Philippe Charles, and many others.

I want to thank Wolfgang Henhapl and Ulrik Schroeder for teaching me axiomatic specifications, algebraic specifications, and object-oriented programming at the very beginning of my career as a computer science student. I want to thank Ulrik for also being a great mentor when I was a TA for his class. I want to also thank him for supporting my application as an exchange student, and for supporting my academic job search.

I want to thank the members of our research group in Boulder, for being a nice, friendly environment. Despite the span of research topics that our group is exploring, there has always been opportunity for collaboration and great feedback. I want to thank Martin Hirzel; working with him was a lot of fun, and I also enjoyed our excursions to the Rocky Mountains. I want to thank Han Lee for sharing a cubicle with me, Matthias Hauswirth, for teaching me snowboarding, Christoph Reichenbach, for being interested in my work and for helping me a lot, and Daniel von Dincklage, for our political discussions. I want to thank everyone I shared the systems lab with, and even the machine learning folks from downstairs and the software engineering students from the other side of the hallway; and Marco Gruteser, for sharing an apartment with me.

I want to thank my mentors at GMD-IPSI, Silvia Hollfelder, Andre Everts, Gerald Huck, and Ingo Macherius, who sparked my interest in computer science research problems very early in my career as a CS student. Even without air conditioning, working at the IPSI lab was always fun.

Last, but not least, I want to thank all my friends, and my family, whom I managed to stay in contact with despite the geographical distance.
Having good friends and family at home will help me to make any place my home, as long as there is a telephone and a network connection.

Johannes Henkel
Boulder, Colorado, USA
May 2004

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Quality criteria for documentation
    1.1.2 Deficiencies of informal documentation
    1.1.3 Formal specifications
    1.1.4 Other properties of formal specifications
    1.1.5 On the Importance of Container Classes
    1.1.6 Case Study: Axiomatic Specifications versus Algebraic Specifications
  1.2 Overview of Our Approach
    1.2.1 Idea
    1.2.2 An Algebraic Specification Discovery Tool
    1.2.3 An Algebraic Specification Interpreter
    1.2.4 Usage Scenarios
  1.3 Dependency on the Java Programming Language
  1.4 Organization of this thesis
  1.5 Origin of the chapters
2 An Algebraic Specification Language
  2.1 Scope of the Algebraic Specification Language
  2.2 Definition of the Algebraic Specification Language
    2.2.1 Specification Name
    2.2.2 Sorts
    2.2.3 Function Types
    2.2.4 Simulation Set
    2.2.5 Algebraic Axioms
3 Discovering Algebraic Specifications from Java Classes
  3.1 Overview
  3.2 Automatically Mapping Java Classes to Algebras
  3.3 Generating Ground Terms
    3.3.1 Randomized Selection of Terms
    3.3.2 Exhaustive Term Generation
    3.3.3 Growing Terms
    3.3.4 Generating Arguments for Terms
    3.3.5 Future Work on Term Generation
  3.4 Term Equivalence
    3.4.1 Equivalence for Primitive Types
    3.4.2 Comparing the References Computed by Terms
    3.4.3 Representation Equivalence
    3.4.4 Observational Equivalence
  3.5 Finding Equations
    3.5.1 State Equations: Equality of Distinct Terms
    3.5.2 Observer Equations: Equality of a Term to a Constant
    3.5.3 Difference Equations: Constant Difference Between Terms
  3.6 Generating Axioms
  3.7 Axiom Redundancy Elimination by Axiom Rewriting
  3.8 Optimizing the Specification Discovery Process
    3.8.1 Early Rewriting
    3.8.2 Making Use of Discovered Knowledge
  3.9 Towards Support for Conditional Axioms
  3.10 Discussion
4 An Embedded Algebraic Specification Interpreter
  4.1 Overview
  4.2 Motivation
  4.3 Approach
    4.3.1 Required Inputs for Simulation
    4.3.2 Modifying Applications during Class-Loading
    4.3.3 Generating Simulation Stubs
    4.3.4 Modelling Object State with Algebraic Terms
    4.3.5 Incomplete Specifications
  4.4 Algebraic Term Rewriting
    4.4.1 Overview of Rewriting
    4.4.2 Strategies for Algebraic Term Rewriting
    4.4.3 Conditional Axioms
    4.4.4 References to External Methods
    4.4.5 Debugging Support
  4.5 Discussion
5 Evaluation
  5.1 Performance Evaluation of our Algebraic Specification Discovery Tool
    5.1.1 Performance of the Tool
    5.1.2 Coverage Measurements
    5.1.3 Manual Inspection of Axioms
  5.2 Performance Evaluation of our Algebraic Term Rewriting Engine
  5.3 Case Study: Extreme Specifying
  5.4 Case Study: Discovering and Debugging a Specification
6 Related Work
  6.1 Specification Languages and Algebraic Specifications
  6.2 Term Rewriting
  6.3 Dynamic Invariant Detection
  6.4 Discovering Programs from Examples
  6.5 Static Program Analysis
    6.5.1 Reliably Approximating the Dynamic Behavior of Programs
    6.5.2 Reverse Engineering and Design Recovery
  6.6 Testing
7 Conclusion

List of Tables

1.1 Modularity in large Java projects
1.2 Container class usage in large Java projects
3.1 Example Terms
5.1 Java classes used in our evaluation
5.2 Timings for our benchmark programs
5.3 Efficiency of Term generation