Acknowledgments Diagnosing and Tolerating Bugs in Deployed Systems
Total Page:16
File Type:pdf, Size:1020Kb
The Dissertation Committee for Michael David Bond certifies that this is the approved version of the following dissertation: Acknowledgments Diagnosing and Tolerating Bugs in Deployed Systems I am deeply grateful to Kathryn McKinley for supporting and mentor- by ing me in many ways. Kathryn has provided invaluable technical expertise Diagnosing and Tolerating Bugs in Deployed Systems and feedback even as we worked in new areas. She has been unwaveringly Michael David Bond, B.S., M.C.S. enthusiastic and positive about students and ideas and projects, and she has been a strong advocate, putting my and other students’ interests first. I en- tered graduate school uncertain about getting a Ph.D., and I leave inspired to pursue an academic career because of Kathryn. Committee: Steve Blackburn has been an enthusiastic supporter and has devoted a lot of his time to solving problems and giving advice, both solicited and unso- Copyright DISSERTATION Kathryn S. McKinley, Supervisor licited. Steve, Keshav Pingali, Peter Stone, and Emmett Witchel have given by Presented to the Faculty of the Graduate School of helpful feedback and spent a lot of time reading long documents and attending Michael David Bond Stephen M. Blackburn The University of Texas at Austin long talks. Vitaly Shmatikov and Steve Keckler have provided valuable advice 2008 in Partial Fulfillment and mentoring. David Padua and Craig Zilles were supportive advisors when Keshav Pingali of the Requirements I was a master’s student at the University of Ilinois. for the Degree of The graduate student community at UT has been incredibly support- Peter Stone ive. I am thankful to have Ben Wiedermann, Xianglong Huang, and Milind DOCTOR OF PHILOSOPHY Kulkarni as friends and colleagues. I am indebted to them and to the fol- Emmett Witchel lowing friends and colleagues for their technical and personal support: Kar- tik Agaram, Katherine Coons, Boris Grot, Sam Guyer, Jungwoo Ha, Maria Jump, Byeongcheol Lee, Bert Maher, Kristi Morton, Nick Nethercote, Phoebe THE UNIVERSITY OF TEXAS AT AUSTIN Nethercote, Alison Norman, Dimitris Prountzos, Jennifer Sartor, Simha Sethu- madhavan, and Suriya Subramaniam. In my cohort but outside my area, Jason December 2008 iv Davis, Justin Brickell, and Matt Taylor have been good friends and colleagues Diagnosing and Tolerating Bugs in Deployed Systems To diagnose post-deployment failures, programmers need to understand some embedded systems have no disk. Our second technique, leak pruning, without the good sense to be systems researchers. the program operations—control and data flow—responsible for failures. Prior addresses these limitations by automatically reclaiming likely leaked memory. approaches for widespread tracking of control and data flow often slow pro- It preserves semantics by waiting until heap exhaustion to reclaim memory— This thesis would have not have been possible without Jikes RVM. I am Publication No. indebted to all Jikes RVM developers and especially Steve Blackburn, Daniel grams by two times or more and increase memory usage significantly, making then intercepting program attempts to access reclaimed memory. them impractical for online use. We present novel techniques for representing Frampton, Robin Garner, David Grove, and Ian Rogers for their generous Michael David Bond, Ph.D. We demonstrate the utility and efficiency of post-deployment debugging control and data flow that add modest overhead while still providing diagnos- efforts maintaining and improving Jikes RVM. The University of Texas at Austin, 2008 on large, real-world programs—where they pinpoint bug causes and improve tic information directly useful for fixing bugs. The first technique, probabilistic I am thankful for guidance and help provided by Gem Naviar, Tom software availability. Post-deployment debugging efficiently exposes and ex- calling context (PCC), provides low-overhead context sensitivity to dynamic Horn, Aubrey Hooser, and Gloria Ramirez. Gloria Ramirez and Bill Mark Supervisor: Kathryn S. McKinley ploits programming language semantics and opens up a promising direction analyses that detect new or anomalous deployed behavior. Second, Bell statis- helped solve a two-body problem that would have prevented me from attending for improving software robustness. tically correlates control flow with data, and it reconstructs program locations UT. Intel funded my research with an internship and a fellowship, and Suresh associated with data. We apply Bell to leak detection, where it tracks and Srinivas has supported and mentored me. Deployed software is never free of bugs. These bugs cause software to fail, wasting billions of dollars and sometimes causing injury or death. Bugs reports program locations responsible for real memory leaks. The third tech- I am thankful for the unconditional love and support that my parents are pervasive in modern software, which is increasingly complex due to demand nique, origin tracking, tracks the originating program locations of unusable have provided throughout my life. for features, extensibility, and integration of components. Complete validation values such as null references, by storing origins in place of unusable values. My partner Julia has selflessly supported and tolerated me while also and exhaustive testing are infeasible for substantial software systems, and These origins are cheap to track and are directly useful for diagnosing real- being awesome and fun, and I am grateful. therefore deployed software exhibits untested and unanalyzed behaviors. world null pointer exceptions. Software behaves differently after deployment due to different environ- Post-deployment diagnosis helps developers find and fix bugs, but in ments and inputs, so developers cannot find and fix all bugs before deploying the meantime, users need help with failing software. We present techniques software, and they cannot easily reproduce post-deployment bugs outside of that tolerate memory leaks, which are particularly difficult to diagnose since the deployed setting. This dissertation argues that post-deployment is a com- they have no immediate symptoms and may take days or longer to mate- pelling environment for diagnosing and tolerating bugs, and it introduces a rialize. Our techniques effectively narrow the gap between reachability and general approach called post-deployment debugging. Techniques in this class liveness by providing the illusion that dead but reachable objects do not con- are efficient enough to go unnoticed by users and accurate enough to find and sume resources. The techniques identify stale objects not used in a while and report the sources of errors to developers. We demonstrate that they help de- remove them from the application and garbage collector’s working set. The velopers find and fix bugs and help users get more functionality out of failing first technique, Melt, relocates stale memory to disk, so it can restore objects if software. the program uses them later. Growing leaks exhaust the disk eventually, and v vi vii viii 3.2 Probabilistic Calling Context . 22 4.4.1 Methodology ........................ 60 6.2.3 Multithreading . 112 3.2.1 CallingContext ...................... 22 4.4.2 SPECjbb2000........................ 62 6.2.4 SavingStaleSpace . 113 Table of Contents 3.2.2 ProbabilisticApproach . 23 4.4.3 Eclipse ........................... 66 6.3 Results............................... 113 3.2.3 Computing Calling Context Values . 24 4.4.4 AdaptiveProfiling . 68 6.3.1 Melt’sOverhead . 114 3.2.4 Querying Calling Context Values . 27 4.4.5 Discussion ......................... 70 6.3.2 ToleratingLeaks . 120 Acknowledgments iv 3.3 Implementation .......................... 28 4.5 Performance ............................ 72 6.4 ConclusionandInterpretation . 135 3.3.1 ComputingthePCCvalue . 28 4.5.1 Methodology ........................ 72 Abstract vi 3.3.2 QueryingthePCCvalue. 29 4.5.2 SpaceOverhead ...................... 73 Chapter 7. Leak Pruning 137 3.3.3 Defining Calling Context . 30 4.5.3 CompilationOverhead. 73 7.1 ApproachandSemantics . 138 List of Tables xiv 3.4 Results............................... 32 4.5.4 TimeOverhead....................... 74 7.1.1 Motivation ......................... 139 7.1.2 TriggeringLeakPruning. 140 List of Figures xv 3.4.1 Deterministic Calling Context Profiling . 32 4.6 ConclusionandInterpretation . 77 3.4.2 PotentialPCCClients . 33 7.1.3 Reclaiming Reachable Memory . 144 Chapter 5. Tracking the Origins of Unusable Values 78 Chapter 1. Introduction 1 3.4.3 PCCAccuracy ....................... 34 7.1.4 Exception and Collection Semantics . 144 5.1 OriginTrackinginJava. 81 1.1 Bugs in Deployed Software Systems . 1 3.4.4 PCCPerformance . .. .. 36 7.2 AlgorithmandImplementation. 146 5.1.1 Supporting Nonzero Null References . 81 1.2 Improving Reliability in the Deployed Setting . .. 3 3.4.5 Evaluating Context Sensitivity . 41 7.2.1 PredictingDeadObjects. 146 5.1.2 EncodingProgramLocations . 82 1.2.1 Diagnosing Bugs by Tracking Flow . 4 3.5 ConclusionandInterpretation . 43 7.2.2 The OBSERVE State .................... 147 1.2.2 ToleratingMemoryLeaks . 6 5.1.3 RedefiningProgramOperations. 83 7.2.3 The SELECT State ..................... 148 1.3 MeaningandImpact ....................... 9 Chapter 4. Storing Per-Object Sites in a Bit with Bell 45 5.1.4 Implementation . .. .. 86 7.2.4 The PRUNE State ..................... 150 4.1 Encoding and Decoding Per-Object Information . 46 5.2 FindingandFixingBugs . 87 7.2.5 Intercepting Accesses to Pruned References . 151 Chapter 2. Background 11 4.1.1 Encoding .........................