Adaptive Approximate State Storage

A dissertation presented by Peter C. Dillinger to the Faculty of the Graduate School of the College of Computer and Information Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Northeastern University
Boston, Massachusetts
December, 2010

Acknowledgments

I can only begin to thank all who have helped me in completing this Ph.D. I especially thank my advisor, Panagiotis (Pete) Manolios, for his patience, insight, encouragement, constructive criticism, and perspective on computer science, working as a scientist, and everything. After my Bachelor’s and Master’s degrees at Georgia Tech, I decided to continue there for a Ph.D. because of an awesome professor I had recently started working with, Pete Manolios. I was able to transition to Ph.D. work with the confidence that I had an advisor who would help me find the best ways to apply my strengths and who would work with me on my weaknesses. Without Pete, I would not have been able to distinguish the things that only I would care about from the things that would become widely read and respected. For such successes, first thanks goes to Pete.

Although I transferred to Northeastern University to continue working under Pete, I have many people to thank from my time at Georgia Tech. Perhaps at the top of the list is the great teacher and wise computer scientist, Prof. Olin Shivers, because I took three great classes from him while at Georgia Tech and he moved to Northeastern University a little before Pete and I did. I value the time and many interesting interactions I had with Professors Yannis Smaragdakis and Mary Jean Harrold. I thank Professors Spencer Rugaber and Dick Lipton for serving on my qualifying exam committee, and also Santosh Pande, Alex Orso, Jim Xu, and H. Venkateswaran for many useful discussions over the years. Fellow Ph.D. students were always great for bouncing ideas off of and helped me to keep a shred of sanity.
They included Daron Vroon, Sudarshan Srinivasan, Yimin Zhang, G.J. Halfond, Jim Clause, Matt Might, David Fisher (who also moved to Northeastern!), Gayatri Subramanian, Roma Kane, Saswat Anand, David Dagon, David Hilley, and many others.

At Northeastern University, I would like to thank those who welcomed Pete and me into collaborative relationships. Prof. Gene Cooperman and his students in particular were great about welcoming us into a forum to throw around and vet ideas relating to data structures, systems, and enumeration algorithms. On my thesis committee, Gene was active, thoughtful, and always constructive, and thanks to his input, my dissertation is better. After proving my skill in his beautifully-constructed algorithms class, Prof. Jay Aslam agreed to be on my committee, and I thank him for his insightful input and cheerful service. Thanks to other professors and students who sent ideas and feedback my way. Once again, I did not go completely insane, thanks to fellow students including Stevie Strickland (an old friend), David Herman, Dimitris Vardoulakis, Richard Cobbe, David Fisher (again!), Carl Eastlund, Dan Kunkle, Felix Klock, Christine Hang (Chambers), Ben Chambers, and heir to ACL2s, Harsh Raju Chamarthi.

Others to thank include those I worked with during internships, and others who have made important contributions and have been willing to correspond on technical points. I thank Gerard Holzmann for all his work on SPIN, and for the opportunity to work with him and Rajeev Joshi as an intern at NASA/JPL. I thank Willem Visser, Corina Pasareanu, Peter Mehlitz and others for the opportunity to work with them at NASA/Ames. I particularly thank Willem, now a professor at Stellenbosch, for serving on my thesis committee and, specifically, striking down an erroneous claim I tried to make regarding transitive omissions. My thesis builds on important contributions by Gerard Holzmann, Michael Mitzenmacher, Rasmus Pagh, John G.
Cleary, and others, but I thank these in particular for taking time to answer my technical queries.

I would also like to thank the National Science Foundation, for funding a grant that supported me during much of my time as a Ph.D. student. For that grant, I was the primary developer of ACL2s, “The ACL2 Sedan,” which has helped make computer-aided theorem proving accessible even to college freshmen¹. Though it took time away from my unrelated thesis work, my work on ACL2s was unquestionably valuable in boosting and expanding my reputation and experience. And it kept the stipend coming. And it gave me a chance to collaborate with some great people at the University of Texas at Austin, most notably Matt Kaufmann and my grand-advisor, J Moore. I thank them for all the ways in which they have helped and supported me over the years. In particular, I thank Matt for co-authoring a paper with me [22] and, thus, lowering my Erdős number to three, and for offering a job reference that seems compelling enough to land me any job anywhere. Ever.

Olin Shivers and Matt Might hold a special place in my heart. According to my recollection, each of them said that by taking a full time job before finishing my dissertation, I would not finish my Ph.D. I assume they were merely giving me the perfect encouragement to finish: an opportunity to prove them wrong. I thank them for that extra motivation. It is likewise apropos that I thank my employer, Coverity, for allowing a leave of absence in which to finish. Proving Matt and Olin wrong: it still counts, right?

I proudly also thank my favorite distraction from Ph.D. work. No, I’m not talking about Portal². In fact, “distraction” is not the right word, because it might be more accurate to say that the Ph.D. work was a distraction while I was waiting to meet this one. I’m talking about Miss Natasha Herman, my fiancée.
She has enriched my life immensely and has helped me to become more the person I want to be, and for that I will always thank her.

¹ I would not have believed it if I had not seen it with my own eyes, and my own red pen. I thank the Northeastern College of Computer and Information Science for allowing me to teach the course one semester, and supporting me during teaching.

² Registered trademark of the Valve Corporation.

Earlier steps in making this achievement possible are thanks to my high quality public education in Thomasville, Georgia, at the Georgia Governor’s Honors Program, and at Georgia Tech. I am grateful to all the teachers who put in an extra effort to challenge an arrogant kid like me.

Final thanks go to those who took the earliest steps in making this achievement possible, my parents Charles and Sissy Dillinger. Their dedication, moral support, and logistical support cannot be overstated. At an early age, they taught me to love knowledge, to love computers and programming, and to love a noble challenge. From that point, they made their best possible move: to let me explore my passion on my own terms and according to my own motivation, but to diligently smooth out any structural or logistical barriers to my learning. Perhaps the best way I know to thank my parents is with the observation, “I am pleased with my accomplishment, proving you should be with yours.”

Thesis Statement

In an explicit-state model checker, no knowledge of the reachable state space size is needed for the speed and the possibility of overlooking errors to be near optimal for available memory.

Abstract

Efficiently storing and matching visited states is key to the effectiveness of explicit-state model checkers such as SPIN. Models of concurrent programs often have too many reachable states to enumerate easily in main memory, and an efficient model checker can exhaust main memory in minutes by storing state descriptors exactly.
A popular alternative is to over-approximate the set of visited states using a randomized, probabilistic data structure, such as a Bloom filter. Because the approximation is sound and does not slow down the search with revisitation of states, it tends to find errors quickly. Because it is probabilistically complete, the approach can also convincingly demonstrate lack of errors.

In this dissertation, I analyze the approximate state storage problem in unprecedented detail, improve upon standard solutions, and demonstrate a novel approach that solves a configuration dilemma facing users of the standard solutions. Especially with my improvements, the best Bloom filter or hash compaction configuration for a given situation is quite good, but choosing the best configuration depends on a good estimate of the number of reachable states. Such an estimate is usually only available after checking a model. I solve this dilemma with an efficient storage scheme that is not tied to a particular estimate, because it is adaptive. Regardless of the number of states encountered at run time, its accuracy is near the information-theoretic optimal. It is also competitively fast, thanks to a novel in-place adaptation algorithm and a favorable access pattern to memory.

Contents

Acknowledgments
Thesis Statement
Abstract
Contents
List of Figures
Outline of Contributions
1 Motivation and Scope
1.1 Verification problems
1.2 Explicit-state vs. alternatives
1.3 State enumeration
1.4 Out-of-core storage and caching
1.5 Heuristic storage
1.6 Non-heuristic, potentially over-approximate storage
1.7 Hash functions
2 Overview of Dissertation
2.1 Understanding the problem
