The Mechanized Verification of Garbage Collector Implementations

Total Page:16

File Type:pdf, Size:1020Kb

The Mechanized Verification of Garbage Collector Implementations Abstract The Mechanized Verification of Garbage Collector Implementations Andrew Evan McCreight 2008 Languages such as Java, C], Standard ML, and Haskell use automated memory man- agement to simplify development and improve reliability by eliminating the need for programmers to manually free unused objects. The cost of this improved program- mer experience is that the implementation of the language becomes more complex, requiring a garbage collector. Garbage collectors are becoming increasingly sophis- ticated to adapt them to high-performance, concurrent and real-time applications, making internal collector invariants and the interface with user programs (mutators in garbage collector parlance) subtle and difficult to implement correctly. My thesis is that treating the garbage collected heap as an abstract data type allows the specification and verification of flexible and expressive garbage collector- mutator interfaces, and that the mechanized verification of garbage collector im- plementations using Hoare-style logic is both practical and effective. The mutator- garbage collector interface I describe in this paper is expressive enough to be used with verified mutators, while being abstract enough that a single interface can be used with both stop-the-world and incremental collectors. It is also powerful enough to be used with collectors that require read or write barriers. In this dissertation, I describe my framework for the mechanized verification of garbage collector implementations. I discuss the formal setting, and my approach to garbage collector interfaces. I also describe the specification and verification of the Cheney and Baker copying collectors. The Baker collector is an incremental collector 2 that requires a read barrier. Finally, I discuss some practical tools I developed for program verification using separation logic in the Coq proof assistant. The Mechanized Verification of Garbage Collector Implementations A Dissertation Presented to the Faculty of the Graduate School of Yale University in Candidacy for the Degree of Doctor of Philosophy by Andrew Evan McCreight Dissertation Director: Zhong Shao December 2008 Copyright c 2008 by Andrew Evan McCreight All rights reserved. ii Contents Acknowledgments xiii 1 Introduction 1 2 Formal Setting 5 2.1 Introduction . .5 2.2 The machine . .6 2.2.1 Syntax . .6 2.2.2 Dynamic semantics . .9 2.3 SCAP . 10 2.3.1 Instruction sequence typing rules . 11 2.3.2 Top level typing rules . 14 2.3.3 Soundness . 16 2.4 Weak SCAP . 16 2.5 Separation logic . 20 2.5.1 Predicates . 21 2.5.2 Basic properties . 23 2.5.3 Machine properties . 24 2.6 Implementation . 24 2.7 Specification style . 27 iii 2.8 Example . 30 2.8.1 Verification . 32 2.9 Conclusion . 39 3 Abstract Data Types for Hoare Logic 40 3.1 Introduction . 40 3.2 Basic abstract data types . 42 3.3 Indexed abstract data types . 43 3.3.1 Example . 44 3.3.2 Depth-indexed stack ADT . 46 3.3.3 List-indexed ADT . 47 3.4 Functor-based implementation of ADTs . 48 3.5 ADTs for deep embeddings . 52 3.5.1 Example specification . 55 3.6 Related work and conclusion . 55 4 The Garbage Collector Interface 59 4.1 Introduction . 59 4.2 Garbage collection as ADT . 61 4.3 The abstract state . 64 4.3.1 Representation predicate . 65 4.3.2 Minor interface parameters . 66 4.3.3 Properties of the representation predicate . 67 4.4 Operation specifications . 71 4.4.1 GC step . 71 4.4.2 Read specification . 74 4.4.3 Write specification . 76 iv 4.4.4 Allocator specification . 77 4.5 Top level components . 79 4.6 Collector coercions . 80 4.7 Conclusion . 81 5 Representation Predicates 83 5.1 Introduction . 83 5.2 Mark-sweep collector . 85 5.3 Copying collector . 88 5.4 Incremental copying collector . 92 5.5 Conclusion . 99 6 Cheney Collector Verification 100 6.1 Introduction . 100 6.2 Generic predicates and definitions . 105 6.3 Field scanning . 110 6.3.1 Implementation . 111 6.3.2 Specification . 114 6.3.3 Verification . 121 6.4 The loop . 123 6.4.1 Implementation . 123 6.4.2 Specification . 124 6.4.3 Verification . 128 6.5 The collector . 130 6.5.1 Implementation . 130 6.5.2 Specification . 132 6.5.3 Verification . 137 v 6.6 The allocator . 143 6.6.1 Implementation . 145 6.6.2 Specification . 145 6.6.3 Verification . 146 6.7 Reading and writing . 150 6.7.1 Implementation . 151 6.7.2 Specification . 152 6.7.3 Verification . 152 6.8 Putting it together . 153 6.9 Conclusion . 154 7 Baker Collector Verification 156 7.1 Introduction . 156 7.2 Field scanning . 160 7.3 The loop . 161 7.3.1 Implementation . 161 7.3.2 Specification . 163 7.3.3 Verification . 168 7.4 Allocator . 175 7.4.1 Implementation . 177 7.4.2 Specification . 177 7.4.3 Allocator verification . 183 7.4.4 Restore root verification . 193 7.4.5 Specification weakening . 193 7.5 Read barrier . 195 7.5.1 Implementation . 195 vi 7.5.2 Specification . 197 7.5.3 Verification . 198 7.6 Write barrier . 202 7.6.1 Verification . 203 7.7 Putting it all together . 203 7.8 Conclusion . 204 8 Tools and Tactics 206 8.1 Introduction . 206 8.2 Machine semantics . 207 8.2.1 Command step simplifications . 207 8.2.2 State update simplifications . 209 8.3 Finite sets . 210 8.4 Separation logic . 212 8.4.1 Simplification . 212 8.4.2 Matching . 213 8.4.3 Reordering . 214 8.4.4 Reassociation . 215 8.4.5 Reordering and reassociation . 217 8.5 Related work . 218 8.6 Conclusion . ..
Recommended publications
  • Gnu Smalltalk Library Reference Version 3.2.5 24 November 2017
    gnu Smalltalk Library Reference Version 3.2.5 24 November 2017 by Paolo Bonzini Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled \GNU Free Documentation License". 1 3 1 Base classes 1.1 Tree Classes documented in this manual are boldfaced. Autoload Object Behavior ClassDescription Class Metaclass BlockClosure Boolean False True CObject CAggregate CArray CPtr CString CCallable CCallbackDescriptor CFunctionDescriptor CCompound CStruct CUnion CScalar CChar CDouble CFloat CInt CLong CLongDouble CLongLong CShort CSmalltalk CUChar CByte CBoolean CUInt CULong CULongLong CUShort ContextPart 4 GNU Smalltalk Library Reference BlockContext MethodContext Continuation CType CPtrCType CArrayCType CScalarCType CStringCType Delay Directory DLD DumperProxy AlternativeObjectProxy NullProxy VersionableObjectProxy PluggableProxy SingletonProxy DynamicVariable Exception Error ArithmeticError ZeroDivide MessageNotUnderstood SystemExceptions.InvalidValue SystemExceptions.EmptyCollection SystemExceptions.InvalidArgument SystemExceptions.AlreadyDefined SystemExceptions.ArgumentOutOfRange SystemExceptions.IndexOutOfRange SystemExceptions.InvalidSize SystemExceptions.NotFound SystemExceptions.PackageNotAvailable SystemExceptions.InvalidProcessState SystemExceptions.InvalidState
    [Show full text]
  • Secure the Clones - Static Enforcement of Policies for Secure Object Copying Thomas Jensen, Florent Kirchner, David Pichardie
    Secure the Clones - Static Enforcement of Policies for Secure Object Copying Thomas Jensen, Florent Kirchner, David Pichardie To cite this version: Thomas Jensen, Florent Kirchner, David Pichardie. Secure the Clones - Static Enforcement of Policies for Secure Object Copying. ESOP 2011, 2011, Saarbrucken, Germany. hal-01110817 HAL Id: hal-01110817 https://hal.inria.fr/hal-01110817 Submitted on 28 Jan 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Secure the Clones * Static Enforcement of Policies for Secure Object Copying Thomas Jensen, Florent Kirchner, and David Pichardie INRIA Rennes – Bretagne Atlantique, France [email protected] Abstract. Exchanging mutable data objects with untrusted code is a delicate matter because of the risk of creating a data space that is accessible by an attacker. Consequently, secure programming guidelines for Java stress the importance of using defensive copying before accepting or handing out references to an inter- nal mutable object. However, implementation of a copy method (like clone()) is entirely left to the programmer. It may not provide a sufficiently deep copy of an object and is subject to overriding by a malicious sub-class. Currently no language-based mechanism supports secure object cloning.
    [Show full text]
  • Lazy Object Copy As a Platform for Population-Based Probabilistic Programming
    Lazy object copy as a platform for population-based probabilistic programming Lawrence M. Murray Uber AI Abstract This work considers dynamic memory management for population-based probabilistic programs, such as those using particle methods for inference. Such programs exhibit a pattern of allocating, copying, potentially mutating, and deallocating collections of similar objects through successive generations. These objects may assemble data structures such as stacks, queues, lists, ragged arrays, and trees, which may be of random, and possibly unbounded, size. For the simple case of N particles, T generations, D objects, and resampling at each generation, dense representation requires O(DNT ) memory, while sparse representation requires only O(DT + DN log DN) memory, based on existing theoretical results. This work describes an object copy-on-write platform to automate this saving for the programmer. The core idea is formalized using labeled directed multigraphs, where vertices represent objects, edges the pointers between them, and labels the necessary bookkeeping. A specific labeling scheme is proposed for high performance under the motivating pattern. The platform is implemented for the Birch probabilistic programming language, using smart pointers, hash tables, and reference-counting garbage collection. It is tested empirically on a number of realistic probabilistic programs, and shown to significantly reduce memory use and execution time in a manner consistent with theoretical expectations. This enables copy-on-write for the imperative programmer, lazy deep copies for the object-oriented programmer, and in-place write optimizations for the functional programmer. 1 Introduction Probabilistic programming aims at better accommodating the workflow of probabilistic modeling and inference in general-purpose programming languages.
    [Show full text]
  • Concurrent Copying Garbage Collection with Hardware Transactional Memory
    Concurrent Copying Garbage Collection with Hardware Transactional Memory Zixian Cai 蔡子弦 A thesis submitted in partial fulfilment of the degree of Bachelor of Philosophy (Honours) at The Australian National University November 2020 © Zixian Cai 2020 Typeset in TeX Gyre Pagella, URW Classico, and DejaVu Sans Mono by XƎTEX and XƎLATEX. Except where otherwise indicated, this thesis is my own original work. Zixian Cai 12 November 2020 To 2020, what does not kill you makes you stronger. Acknowledgments First and foremost, I thank my shifu1, Steve Blackburn. When I asked you how to learn to do research, you said that it often takes the form of an apprenticeship. Indeed, you taught me the craft of research by example, demonstrating how to be a good teacher, a good researcher, and a good community leader. Apart from the vast technical ex- pertise, you have also been a constant source of advice, support, and encouragement throughout my undergraduate career. I could not ask for a better mentor. I thank Mike Bond from The Ohio State University, who co-supervises me. Meet- ings with you and Steve are always enjoyable for me. I am sorry for the meetings that went over time, often around the dinner time for you, lowering your glucose level. You help me turn complicated ideas into implementation with your experiences in hard- ware transactional memory. I am thankful for your patient guidance and inspiration. Many people have helped me with this thesis. I thank Adrian Herrera, Kunal Sa- reen, and Brenda Wang for subjecting themselves to the draft of this document, and providing valuable feedback.
    [Show full text]
  • Obstacl: a Language with Objects, Subtyping, and Classes
    OBSTACL: A LANGUAGE WITH OBJECTS, SUBTYPING, AND CLASSES A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY By Amit Jayant Patel December 2001 c Copyright 2002 by Amit Jayant Patel All Rights Reserved ii I certify that I have read this dissertation and that in my opin- ion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. John Mitchell (Principal Adviser) I certify that I have read this dissertation and that in my opin- ion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. Kathleen Fisher I certify that I have read this dissertation and that in my opin- ion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. David Dill Approved for the University Committee on Graduate Studies: iii Abstract Widely used object-oriented programming languages such as C++ and Java support soft- ware engineering practices but do not have a clean theoretical foundation. On the other hand, most research languages with well-developed foundations are not designed to support software engineering practices. This thesis bridges the gap by presenting OBSTACL, an object-oriented extension of ML with a sound theoretical basis and features that lend themselves to efficient implementation. OBSTACL supports modular programming techniques with objects, classes, structural subtyping, and a modular object construction system. OBSTACL's parameterized inheritance mechanism can be used to express both single inheritance and most common uses of multiple inheritance.
    [Show full text]
  • Lecture 8 EECS 498 Winter 2000
    C++ Object Model Classes are abstract types. Objects are instances of that type. Three distinctive aspects of an OO System • Encapsulation Classes group related data and methods • Inheritance Derived classes extend and refine existing abstractions ... the design of consistent, intuitive, and useful class hierarchies is a complex and difficult art. • Dynamic Binding (polymorphism) Delay selection of appropriate abstraction implementation until program execution. ... OO programming is abstract data types with polymorphism C++ Object Model 1 of 19 Lecture 8 EECS 498 Winter 2000 Aspects to Examine • Distinctive Types of Programming • Semantics of Constructors • Semantics of Data • Semantics of Functions • Semantics of Construction, Destruction & Copy C++ Object Model 2 of 19 Lecture 8 EECS 498 Winter 2000 Distinctive Types of Programming • Procedural char *target, *source = “Hello World!”; target = malloc( strlen( source )); strcpy( target, source ); • Abstract Data Type String target, source = “Hello World!”; target = source; if (source == target) ... • Object Oriented class B { virtual int F(int) {return 1;} /* pure? /; } class D : B { virtual void F(int) { return 2;} } class E : B { virtual void F(int) { return 3;} } ... B *b = new B, B *d = new D, E *e = new D; cout << b->F() << d->F() << e->F() << endl; Object oriented model requires reference to operating object. Referred to as this (self, current, etc.) Often first parameter to function C++ Object Model 3 of 19 Lecture 8 EECS 498 Winter 2000 Storage Layout Procedural • Storage sizes are known at compile time Pointers have fixed size on target platform Data namespace encapsulated in type structure Function namespace global Abstract Data Types • Namespace for functions bound to data structure Signature of function is name, result type, parameter types Object Oriented • Virtual functions invocation structure Virtual base classes A virtual base class occurs exactly once in a derived class regardless of the number of times encountered in the class inheritance heirarchy.
    [Show full text]
  • SECURE the CLONES 1. Introduction Exchanging Data Objects with Untrusted Code Is a Delicate Matter Because of the Risk of Creati
    Logical Methods in Computer Science Vol. 8 (2:05) 2012, pp. 1–30 Submitted Sep. 17, 2011 www.lmcs-online.org Published May. 31, 2012 SECURE THE CLONES THOMAS JENSEN, FLORENT KIRCHNER, AND DAVID PICHARDIE INRIA Rennes { Bretagne Atlantique, France e-mail address: fi[email protected] Abstract. Exchanging mutable data objects with untrusted code is a delicate matter be- cause of the risk of creating a data space that is accessible by an attacker. Consequently, secure programming guidelines for Java stress the importance of using defensive copying before accepting or handing out references to an internal mutable object. However, im- plementation of a copy method (like clone()) is entirely left to the programmer. It may not provide a sufficiently deep copy of an object and is subject to overriding by a mali- cious sub-class. Currently no language-based mechanism supports secure object cloning. This paper proposes a type-based annotation system for defining modular copy policies for class-based object-oriented programs. A copy policy specifies the maximally allowed sharing between an object and its clone. We present a static enforcement mechanism that will guarantee that all classes fulfil their copy policy, even in the presence of overriding of copy methods, and establish the semantic correctness of the overall approach in Coq. The mechanism has been implemented and experimentally evaluated on clone methods from several Java libraries. 1. Introduction Exchanging data objects with untrusted code is a delicate matter because of the risk of creating a data space that is accessible by an attacker. Consequently, secure programming guidelines for Java such as those proposed by Sun [17] and CERT [6] stress the importance of using defensive copying or cloning before accepting or handing out references to an internal mutable object.
    [Show full text]
  • Memory Management for Multi-Language Multi-Runtime Systems on Multi-Core Architectures
    UNIVERSITY OF CALIFORNIA Santa Barbara Memory Management for Multi-Language Multi-Runtime Systems on Multi-Core Architectures A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Michal Wegiel Committee in Charge: Professor Chandra Krintz, Chair Professor Amr El Abbadi Professor Ben Zhao March 2011 The Dissertation of Michal Wegiel is approved: Professor Amr El Abbadi Professor Ben Zhao Professor Chandra Krintz, Committee Chairperson January 2011 Memory Management for Multi-Language Multi-Runtime Systems on Multi-Core Architectures Copyright © 2011 by Michal Wegiel iii Dedication and Gratitude I dedicate this dissertation to my family: my parents, Maria and Krzysztof, and my sister, Barbara, for their unconditional support and encouragement throughout all stages of my education. I am deeply grateful to Chandra Krintz for all the support, guidance, mentorship, and help that she has provided during the entire process. I would like to thank Ben Zhao, Amr El Abbadi, and Rich Wolski for serving on my Ph.D. committee. I am grateful to Grzegorz Czajkowski and Laurent Daynes for being my mentors and collaborators during my internship at Sun Labs. Finally, I would like to thank the staff, faculty, and fellow graduate students at the Com- puter Science department at UC Santa Barbara for their support and the opportunity to pursue this work. iv Acknowledgements The text of Chapters 3–7 is in part a reprint of the material as it appears in the conference proceedings listed below. The dissertation author was the primary researcher while the co-author listed on each publication directed and supervised the research which forms the basis for these chapters.
    [Show full text]
  • Objective-C 2.0 Essentials Third Edition
    Objective-C 2.0 Essentials Third Edition i Objective-C 2.0 Essentials – Third Edition ISBN-13: 978-1480262102 © 2012 Neil Smyth. This book is provided for personal use only. Unauthorized use, reproduction and/or distribution strictly prohibited. All rights reserved. The content of this book is provided for informational purposes only. Neither the publisher nor the author offers any warranties or representation, express or implied, with regard to the accuracy of information contained in this book, nor do they accept any liability for any loss or damage arising from any errors or omissions. This book contains trademarked terms that are used solely for editorial purposes and to the benefit of the respective trademark owner. The terms used within this book are not intended as infringement of any trademarks. Find more eBooks online at http://www.eBookFrenzy.com. Rev. 3.0 ii Table of Contents 1. About Objective-C Essentials ...................................................................................................... 1 1.1 Why are you reading this? .................................................................................................... 1 1.2 Supported Platforms ............................................................................................................. 2 2. The History of Objective-C .......................................................................................................... 3 2.1 The C Programming Language .............................................................................................
    [Show full text]
  • Memory Management Programming Guide
    Memory Management Programming Guide Performance 2010-06-24 PROVIDED “AS IS,” AND YOU, THE READER, ARE ASSUMING THE ENTIRE RISK AS TO ITS QUALITY Apple Inc. AND ACCURACY. © 2010 Apple Inc. IN NO EVENT WILL APPLE BE LIABLE FOR DIRECT, All rights reserved. INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES RESULTING FROM ANY DEFECT OR INACCURACY IN THIS DOCUMENT, even No part of this publication may be reproduced, if advised of the possibility of such damages. stored in a retrieval system, or transmitted, in THE WARRANTY AND REMEDIES SET FORTH ABOVE any form or by any means, mechanical, ARE EXCLUSIVE AND IN LIEU OF ALL OTHERS, ORAL OR WRITTEN, EXPRESS OR IMPLIED. No Apple electronic, photocopying, recording, or dealer, agent, or employee is authorized to make otherwise, without prior written permission of any modification, extension, or addition to this Apple Inc., with the following exceptions: Any warranty. person is hereby authorized to store Some states do not allow the exclusion or limitation of implied warranties or liability for incidental or documentation on a single computer for consequential damages, so the above limitation or personal use only and to print copies of exclusion may not apply to you. This warranty gives you specific legal rights, and you may also have documentation for personal use provided that other rights which vary from state to state. the documentation contains Apple’s copyright notice. The Apple logo is a trademark of Apple Inc. Use of the “keyboard” Apple logo (Option-Shift-K) for commercial purposes without the prior written consent of Apple may constitute trademark infringement and unfair competition in violation of federal and state laws.
    [Show full text]
  • Life with Object-Relational Mappers Or, How I Learned to Stop Worrying and Love the ORM
    Life with Object-Relational Mappers Or, how I learned to stop worrying and love the ORM. PGCon 2011 Christophe Pettus PostgreSQL Experts, Inc. [email protected] Has this ever happened to you? • “This query is running way too slowly. God, RDBMSes suck!” • “Well, you just need to change the WHERE clause…” • “I can’t change the SQL. We’re using an…” ORM Let’s talk about ORMs. • What is an ORM? • Why do we have to put up with them? • What are they good at? • What are the problems? • Can’t we just make them go away? • No. Sorry. • How can we live with them? Oh, right. Hi! • Christophe Pettus • Consultant with PostgreSQL Experts, Inc. • PostgreSQL person since 1998. • Application and systems architect. • Designed a bunch of ORMs for various languages. WHEN WORLDS COLLIDE. The Two Worlds • Object-Oriented Programming. • Relational Database Management. Object-Oriented Programming. • Let’s ask Wikipedia! • “Object-oriented programming (OOP) is a programming paradigm using ‘objects’ …” • Ask three programmers, get five answers. • “… – data structures consisting of data fields and methods together with their interactions – to design applications and computer programs.” What is the critical OO feature? • Data abstraction? Nope. • Messaging? Nope. • Modularity? Nope. • Polymorphism? Nope, but getting warmer. • Inheritance? Getting colder… • Encapsulation. Encapsulation • Objects export behavior, not data. • Many language expose the data — but that’s a shortcut. • Many objects don’t have significant behavior — but that’s a degenerate case. • OO is all about wrapping up the behavior and the data into a single package, the object. Object Relationships. • Object models are collections of graphs.
    [Show full text]
  • Important Java Programming Concepts
    Appendix A Important Java Programming Concepts This appendix provides a brief orientation through the concepts of object-oriented programming in Java that are critical for understanding the material in this book and that are not specifically introduced as part of the main content. It is not intended as a general Java programming primer, but rather as a refresher and orientation through the features of the language that play a major role in the design of software in Java. If necessary, this overview should be complemented by an introductory book on Java programming, or on the relevant sections in the Java Tutorial [10]. A.1 Variables and Types Variables store values. In Java, variables are typed and the type of the variable must be declared before the name of the variable. Java distinguishes between two major categories of types: primitive types and reference types. Primitive types are used to represent numbers and Boolean values. Variables of a primitive type store the actual data that represents the value. When the content of a variable of a primitive type is assigned to another variable, a copy of the data stored in the initial variable is created and stored in the destination variable. For example: int original = 10; int copy = original; In this case variable original of the primitive type int (short for “integer”) is assigned the integer literal value 10. In the second assignment, a copy of the value 10 is used to initialize the new variable copy. Reference types represent more complex arrangements of data as defined by classes (see Section A.2).
    [Show full text]