Safe Programming at the C Level of Abstraction
SAFE PROGRAMMING AT THE C LEVEL OF ABSTRACTION

A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Daniel Joseph Grossman
August 2003

© Daniel Joseph Grossman 2003
ALL RIGHTS RESERVED

Daniel Joseph Grossman, Ph.D.
Cornell University 2003

Memory safety and type safety are invaluable features for building robust software. However, most safe programming languages are at a high level of abstraction; programmers have little control over data representation and memory management. This control is one reason C remains the de facto standard for writing systems software or extending legacy systems already written in C.

The Cyclone language aims to bring safety to C-style programming without sacrificing the programmer control necessary for low-level software. A combination of advanced compile-time techniques, run-time checks, and modern language features helps achieve this goal.

This dissertation focuses on the advanced compile-time techniques. A type system with quantified types and effects prevents incorrect type casts, dangling-pointer dereferences, and data races. An intraprocedural flow analysis prevents dereferencing NULL pointers and uninitialized memory, and extensions to it can prevent array-bounds violations and misused unions. Formal abstract machines and rigorous proofs demonstrate that these compile-time techniques are sound: The safety violations they address become impossible.

A less formal evaluation establishes two other design goals of equal importance. First, the language remains expressive. Although it rejects some safe programs, it permits many C idioms regarding generic code, manual memory management, lock-based synchronization, NULL-pointer checking, and data initialization. Second, the language represents a unified approach. A small collection of techniques addresses a range of problems, indicating that the problems are more alike than they originally seem.

BIOGRAPHICAL SKETCH

Dan Grossman was born on January 20, 1975 in St. Louis, Missouri. Thus far, his diligent support of St. Louis sports teams has produced one World Series victory in three appearances and one Super Bowl victory in two appearances, but no Stanley Cup finals. Upon graduating from Parkway Central High School in 1993, Dan’s peers selected him as the male most likely to become a politician.

Dan spent five summers working at the S-F Scout Ranch. For three years, he managed camp business operations with a typewriter, carbon paper, and an adding machine.

In 1997, Dan received a B.A. in Computer Science and a B.S. in Electrical Engineering from Rice University. These awards signified the termination of convenient access to La Mexicana, a restaurant that produces the world’s best burritos. Also in 1997, Dan completed hiking the 2160-mile Appalachian Trail. To this day, he smirks when others describe somewhere as a “really long walk.”

For the last six years, Dan has lived primarily in Ithaca, New York while completing his doctorate in computer science at Cornell University. His ice-hockey skills have improved considerably.

Dan has been to every state in the United States except Alaska, Hawaii, Nevada, Michigan, Minnesota, Wisconsin, and South Carolina. (He has been to airports in Michigan, Minnesota, Nevada, and South Carolina.) In his lifetime, Dan has eaten only two green olives.

ACKNOWLEDGMENTS

Attempting to acknowledge those who have helped me in endeavors that have led to this dissertation is an activity doomed to failure. In my six years of graduate school and twenty-two years of formal education, far too many people have shared a kind word, an intellectual insight, or a pint of beer for this summary to be complete. But rest assured: I thank you all.

My advisor, Greg Morrisett, has been an irreplaceable source of advice, guidance, and encouragement. This dissertation clearly represents an outgrowth of his research vision. He has also shown an amazing ability to determine when my stubbornness is an asset and when it is a liability.

This dissertation owes its credibility to the actual Cyclone implementation, which is joint work with Greg, Trevor Jim, Michael Hicks, James Cheney, and Yanling Wang. These people have all made significant intellectual contributions as well as doing grunt work I am glad I did not have to do.

My thesis committee, Greg, Keshav Pingali, and Dexter Kozen (filling in for Jim West) have done a thorough and admirable job, especially considering this dissertation’s length.

My research-group predecessors deserve accolades for serving as mentors, tutors, and collaborators. Moreover, they deserve individual mention for unique lessons such as the importance of morning coffee (Neal Glew), things to do in New Jersey (Fred Smith), fashion sense (Dave Walker), the joy of 8AM push-ups (Stephanie Weirich), and the proper placement of hyphens (Steve Zdancewic).

My housemates (fellow computer scientists Amanda Holland-Minkley, Nick Howe, and Kevin O’Neill) deserve mention for helping from beginning (when I didn’t know where the grocery store was) to middle (when I couldn’t get a date to save my life) to end (when I spent all my time deciding what to do next). My officemates (Neal, Steve Zdancewic, Yanling Wang, Stephen Chong, and Alexa Sharp) would only occasionally roll their eyes at a whiteboard full of ⊢ characters. (Not to mention an acknowledgments section with one.)

I have benefitted from the watchful eye of many faculty members at Cornell, most notably Bob Constable, Jon Kleinberg, Dexter Kozen, Lillian Lee, and Andrew Myers. I also enjoyed great mentors during summer internships (John Reppy and Rob DeLine) and as an undergraduate (Matthias Felleisen).
The Cornell Computer Science staff members, particularly Juanita Heyerman, have been a great help.

Friends who cooked me dinner, bought me beer, played hockey or soccer or softball, went to the theatre, played bridge, or in some cases all of the above, have made my time in Ithaca truly special. Only one such person gets to see her name in this paragraph, though. In an effort to be brief, let me just say that Kate Forester is responsible for me leaving Ithaca a happier person than I arrived.

My family has shown unwavering support in all my pursuits, so it should come as no surprise that they encouraged me throughout graduate school. They have occasionally even ignored my advice not to read my papers. But more importantly, they taught me the dedication, perseverance, and integrity that are prerequisites for an undertaking of this nature.

Finally, I am grateful for the financial support of a National Science Foundation Graduate Fellowship and an Intel Graduate Fellowship.

TABLE OF CONTENTS

1 Introduction and Thesis
    1.1 Safe C-Level Programming
    1.2 Relation of This Dissertation to Cyclone
    1.3 Explanation of Thesis
    1.4 Contributions
    1.5 Overview

2 Examples and Techniques
    2.1 Type Variables
    2.2 Singleton Integer Types
    2.3 Region Variables
    2.4 Lock Variables
    2.5 Summary of Type-Level Variables
    2.6 Definite Assignment
    2.7 NULL Pointers
    2.8 Checking Against Tag Variables
    2.9 Interprocedural Flow
    2.10 Summary of Flow-Analysis Applications

3 Type Variables
    3.1 Basic Constructs
        3.1.1 Universal Quantification
        3.1.2 Existential Quantification
        3.1.3 Type Constructors
        3.1.4 Default Annotations
    3.2 Size and Calling Convention
    3.3 Mutation
        3.3.1 Polymorphic References
        3.3.2 Mutable Existential Packages
        3.3.3 Informal Comparison of Problems
    3.4 Evaluation
        3.4.1 Good News
        3.4.2 Bad News
    3.5 Formalism
        3.5.1 Syntax
        3.5.2 Dynamic Semantics
        3.5.3 Static Semantics
        3.5.4 Type Safety
    3.6 Related Work

4 Region-Based Memory Management
    4.1 Basic Constructs
        4.1.1 Region Terms
        4.1.2 Region Names
        4.1.3 Quantified Types and Type Constructors
        4.1.4 Subtyping
        4.1.5 Default Annotations
    4.2 Interaction With Type Variables
        4.2.1 Avoiding Effect Variables
        4.2.2 Using Existential Types
    4.3 Run-Time Support
    4.4 Evaluation
        4.4.1 Good News
        4.4.2 Bad News
        4.4.3 Advanced Examples
    4.5 Formalism
        4.5.1 Syntax
        4.5.2 Dynamic Semantics
        4.5.3 Static Semantics
        4.5.4 Type Safety
    4.6 Related Work

5 Type-Safe Multithreading
    5.1 Basic Constructs
        5.1.1 Multithreading Terms
        5.1.2 Multithreading Types
        5.1.3 Multithreading Kinds
        5.1.4 Default Annotations
    5.2 Interaction With Type Variables
    5.3 Interaction With Regions
        5.3.1 Comparing Locks and Regions
        5.3.2 Combining Locks and Regions
    5.4 Run-Time Support
    5.5 Evaluation
        5.5.1 Good News
        5.5.2 Bad News
    5.6 Formalism
        5.6.1 Syntax
        5.6.2 Dynamic Semantics
        5.6.3 Static Semantics
        5.6.4 Type Safety
    5.7 Related Work

6 Uninitialized Memory and NULL Pointers
    6.1 Background and Contributions
        6.1.1 A Basic Analysis
        6.1.2 Reasoning About Pointers
        6.1.3 Evaluation Order
    6.2 The Analysis
        6.2.1 Abstract States
        6.2.2 Expressions
        6.2.3 Statements