UNIVERSITY OF CALIFORNIA SANTA CRUZ
Weak-consistency group communication and membership
A dissertation submitted in partial satisfaction
of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

COMPUTER AND INFORMATION SCIENCES

by

Richard Andrew Golding

December 1992
The dissertation of Richard Andrew Golding is approved:
Prof. Darrell Long
Prof. Charles McDowell
Dr. Kim Taylor
Dr. John Wilkes
Dean of Graduate Studies and Research

Copyright © by Richard Andrew Golding 1992
Contents
Abstract
Acknowledgments
1 Introduction
  1.1 Requirements
  1.2 Using replication
  1.3 Group communication mechanism
  1.4 Weak consistency
  1.5 The Refdbms 3.0 system
  1.6 Conventions in the text
  1.7 Organization of the dissertation
2 Terms and definitions
  2.1 Consensus
  2.2 Principals
  2.3 Time and clocks
  2.4 Network
  2.5 Network services
3 A framework for group communication systems
  3.1 The framework
  3.2 The application
    3.2.1 The Refdbms application
    3.2.2 The Tattler system
    3.2.3 Handling update collisions
  3.3 Message delivery
    3.3.1 Propagating messages versus state
  3.4 Message ordering
    3.4.1 Using message ordering
  3.5 Group membership
    3.5.1 Using group membership
  3.6 Summary
4 Existing group communication systems
  4.1 Centralized protocols
  4.2 Consistent replication protocols
  4.3 Orca RTS
  4.4 Isis
  4.5 Epsilon serializability
  4.6 Psync
  4.7 A reliable multicast protocol
  4.8 OSCAR
  4.9 Lazy Replication
  4.10 Epidemic replication
  4.11 Summary
5 Weak-consistency communication
  5.1 Reliable, eventual message delivery
    5.1.1 Data structures for timestamped anti-entropy
    5.1.2 The timestamped anti-entropy protocol
  5.2 Correctness
    5.2.1 Logical communication topology
    5.2.2 Eventual communication
    5.2.3 Summary vector progress
  5.3 Purging the message log
  5.4 Extensions
    5.4.1 Selecting a session partner
    5.4.2 Principal failure and volatile storage
    5.4.3 Combining anti-entropy with unreliable multicast
    5.4.4 Anti-entropy with unsynchronized clocks
  5.5 Message ordering
  5.6 Summary
6 Group membership
  6.1 Message delivery and dynamic membership
  6.2 Correctness
  6.3 Fault tolerance
  6.4 Protocols
    6.4.1 Data structures
    6.4.2 Initializing a new group
    6.4.3 Group join
    6.4.4 Group leave
    6.4.5 Failure recovery
  6.5 Summary
7 Performance of weak-consistency protocols
  7.1 Message reliability
    7.1.1 Analytical modeling
    7.1.2 Results
    7.1.3 Volatile storage
  7.2 Message latency
    7.2.1 Simulation modeling
    7.2.2 Results
  7.3 Group membership resilience
    7.3.1 Simulation modeling
    7.3.2 Results
  7.4 Traffic
    7.4.1 Simulation modeling
    7.4.2 Results using ring topology
    7.4.3 Results using backbone topology
    7.4.4 Traffic and propagation time
  7.5 Consistency
    7.5.1 Simulation modeling
    7.5.2 Results
  7.6 Comparison
    7.6.1 Efficiency
    7.6.2 Implementation effort
  7.7 Summary
8 Multiple membership roles
  8.1 Limiting write access
  8.2 Clients
  8.3 Storing a subset of group state
    8.3.1 Caches
    8.3.2 Slices
    8.3.3 Using slices for resource discovery
  8.4 Location service
    8.4.1 Existing location services
9 Continuing work
  9.1 Performance
  9.2 Fault tolerance
  9.3 Reducing space requirements
  9.4 Hybrid consistency
  9.5 Authentication
  9.6 Location services
  9.7 Refdbms
10 Summary
Bibliography
List of Figures
1.1 Overall system architecture.
1.2 Placing replicas in an internetwork.
1.3 Components of a group communication mechanism.
1.4 An example reference.
3.1 A framework for constructing a group communication system.
3.2 Structure of a Refdbms principal.
3.3 Structure of a Tattler.
5.1 The timestamp data structure.
5.2 The timestamp vector data structure.
5.3 Data structures used by the TSAE communication protocol.
5.4 How the summary vector summarizes the messages in the log.
5.5 Summary and acknowledgment vectors for principals with loosely-synchronized clocks.
5.6 An example anti-entropy session.
5.7 Originator's protocol for TSAE with loosely-synchronized clocks.
5.8 Partner's protocol for TSAE with loosely-synchronized clocks.
5.9 A function to purge messages from the message log.
5.10 The checksum vector data type.
5.11 Originator's protocol for TSAE combined with unreliable multicast.
5.12 Summary and acknowledgment data structures for TSAE for unsynchronized clocks.
5.13 Function to deliver messages in per-principal FIFO order.
5.14 Function to deliver messages in a total order.
5.15 Function to deliver messages in a causal order.
6.1 The group membership view data structure.
6.2 Initializing a new group.
6.3 The join protocol followed by a new member.
6.4 Obtaining the first sponsor.
7.1 Model of message receipt and failure for five principals.
7.2 Probability of failing to deliver a message to all sites (linear vertical scale).
7.3 Probability of failing to deliver a message to all sites (logarithmic vertical scale).
7.4 Cumulative probability distribution for propagating a message to all principals.
7.5 Cumulative probability distribution for receiving an acknowledgment from all principals.
7.6 Effect of partner selection policy on scaling of propagation time.
7.7 Effect of partner selection policy on scaling of mean time to acknowledgment.
7.8 Progress of the minimum cut and in-degree measures in a group of 25 principals, using one sponsor, with no failures.
7.9 Progress of the minimum cut and in-degree measures in a group of 25 principals, using two sponsors, with one initial failure.
7.10 Progress of the group membership resilience, with varying numbers of sponsors.
7.11 Progress of the average in-degree as anti-entropy propagates membership information.
7.12 Mean time for views to converge, varying number of sponsors.
7.13 Mean time for views to converge, varying number of failing principals.
7.14 The ring and backbone physical topologies simulated for traffic analysis.
7.15 Traffic per network link on a ring network, varying the number of principals.
7.16 Effect of partner selection policy on the average number of network links used in an anti-entropy session.
7.17 Effect of partner selection policy on the mean traffic per link, for all links.
7.18 Effect of partner selection policy on the mean traffic per backbone ring link.
7.19 Scatterplot of the relationship between link traffic and propagation delay.
7.20 Relationship between link traffic and time to acknowledgment.
7.21 Probability of getting old value as the per-principal anti-entropy rate varies, for 500 principals.
7.22 Expected data age as anti-entropy rate varies, for 500 principals.
7.23 Probability of getting old value as the number of principals varies, with anti-entropy occurring 100 times as often as writes.
7.24 Expected data age as the number of principals varies, with anti-entropy occurring 100 times as often as writes.
7.25 Effect of partner selection policy on expected data age.
8.1 A skeleton client.
8.2 The interface to the location service.
8.3 How the location service receives and propagates samples of membership views.
List of Tables
2.1 Conditions under which consensus is possible.
3.1 Possible message delivery reliability guarantees, from strongest to weakest.
3.2 Possible message delivery latency guarantees.
3.3 Some popular message ordering guarantees.
4.1 The group communication systems surveyed.
5.1 Partner selection policies.
7.1 Performance comparison of several group communication systems.
7.2 Implementation complexity of Isis compared with TSAE in Refdbms.
8.1 Refdbms privilege levels.
ABSTRACT
Many distributed systems for wide-area networks can be built conveniently, and operate efficiently and correctly, using a weak consistency group communication mechanism. This mechanism organizes a set of principals into a single logical entity, and provides methods to multicast messages to the members. A weak consistency distributed system allows the principals in the group to differ on the value of shared state at any given instant, as long as they will eventually converge to a single, consistent value. A group containing many principals and using weak consistency can provide the reliability, performance, and scalability necessary for wide-area systems.

I have developed a framework for constructing group communication systems, for classifying existing distributed system tools, and for constructing and reasoning about a particular group communication model. It has four components: message delivery, message ordering, group membership, and the application. Each component may have a different implementation, so that the group mechanism can be tailored to application requirements.

The framework supports a new message delivery protocol, called timestamped anti-entropy, which provides reliable, eventual message delivery; is efficient; and tolerates most transient processor and network failures. It can be combined with message ordering implementations that provide ordering guarantees ranging from unordered to total, causal delivery. A new group membership protocol completes the set, providing temporarily inconsistent membership views resilient to up to k simultaneous principal failures.

The Refdbms distributed bibliographic database system, which has been constructed using this framework, is used as an example. Refdbms databases can be replicated on many different sites, using the group communication system described here.
Acknowledgments
Several people have assisted in this research. Kim Taylor, at UC Santa Cruz, assisted with the proofs; she was supported in part by NSF grant CCR-9111132. Darrell Long assisted with some of the performance evaluation; he was supported in part by the National Science Foundation under Grant NSF CCR-9111220, by the Institute for Scientific Computing Research at Lawrence Livermore National Laboratory, and by the Office of Naval Research under grant N00014-92-J-1807. John Wilkes, at Hewlett-Packard Laboratories, provided additional suggestions and critique, particularly during the early exploration of the ideas.

The Refdbms 3.0 system was derived from the original refdbms system, written by John Wilkes in the Concurrent Systems Project at Hewlett-Packard Laboratories. Development of version 3 was aided by Eric Allman and the Mammoth Project at UC Berkeley under National Science Foundation Infrastructure Grant CDA-8722788. George Neville-Neil wrote the X11 user interface for version 3.0, and the alpha testers have provided important feedback in improving the system.

This research was supported by several different organizations. Early portions of this work were supported by the Concurrent Systems Project at Hewlett-Packard Laboratories, and by a University of California Seed Grant. The Santa Cruz Operation provided me with a one-year graduate fellowship.

Some simulation results were obtained with the aid of SIMSCRIPT II.5, a simulation language developed and supported by CACI Products Company of La Jolla, California.

Alan Emtage and Peter Deutsch, of Bunyip Information Systems and McGill University, helped me understand the properties that wide-area information services need. Calton Pu, of Columbia University, encouraged me to work on the classification approach. Daniel Barbará, at the Matsushita Information Technology Laboratory, encouraged my investigation of weak consistency and information services.

The other graduate students in the Concurrent Systems Laboratory provided encouragement and support, most particularly Dean Long and William Osser.

My committee has been helpful in refining my ideas from a confused pudding of "things I'd like to look at" and "interesting directions" into a coherent whole. John Wilkes taught me how to write; Kim Taylor taught me how to prove correctness; Charlie McDowell kept me honest; and Darrell Long, my advisor, made me quantify my claims.

Three people have provided the love and support I needed to finish a work this size in altogether too short a time: Alan Emtage, Craig Cruz, and my partner, George Neville-Neil. These three have put up with my fears, grouchiness, and elation.

This dissertation is dedicated to the memory of my grandfather, Harry Lawrence Golding (1908-69). I think he would have liked it.
Chapter 1 Introduction
Most systems to date that operate across wide-area networks have been developed without a consistent set of tools for reasoning about or organizing their structure. As a result, wide-area systems are viewed as difficult to write, and ensuring their good performance is a black art.

My thesis is that many wide-area distributed systems can be built conveniently, and can operate efficiently and correctly, using the weak consistency group communication approach presented in this dissertation. In it, a set of principals are organized into a group, which applications can communicate with as if it were a single logical entity (Figure 1.1). The group communication mechanism is decomposed into a framework [Campbell92] of well-defined components and interfaces. The implementations of each component can be customized to meet application requirements. In particular, the state kept by the principals can be weakly consistent, meaning that the copies are allowed to diverge temporarily, as long as they will eventually come to agreement.

The framework provides a way to construct the group communication mechanism. The mechanism provides a group multicast service, allowing a principal to send a message to every group member, and a group membership service, allowing principals to join and leave the group. Each component can be implemented using one of many different protocols, providing different levels of service. For example, the component that sends and receives messages over the network will provide one of several message reliability guarantees. The framework approach allows code to be reused between applications, and ensures that the group communication semantics closely match application requirements.

This approach is useful for reasoning about distributed systems. The semantics of each component can be used to classify existing distributed systems. For example, systems can be classified
FIGURE 1.1: Overall system architecture. A wide-area system consists of a group of principals. Members and clients logically communicate with the group, and the group communication mechanism coordinates the communication with each member.

by the reliability of their message delivery mechanism. The correctness, performance, and fault tolerance of each component can be evaluated separately.

Using this framework, I have developed protocols that provide weak consistency, and investigated their correctness and performance. Weak consistency is provided by delivering messages reliably and eventually. Reliable delivery ensures that every principal will eventually observe any group message, while eventual delivery allows messages to be delayed while systems are not functioning or are disconnected from the network. A group membership protocol complements the message delivery protocol.
The protocols have been used in the Refdbms 3.0 distributed bibliographic database system. This prototype system indicates that the framework is appropriate for building wide-area systems, and that the weak consistency protocols can be built simply.
1.1 Requirements
Wide-area systems must take network behavior and user expectations into account. These include the scale and reliability of the network, mobile computing systems, and application availability. Two hosts on an Ethernet can exchange a pair of datagram packets in a few milliseconds, while hosts on different continents can require many hundreds of milliseconds. Packet loss rates of 40% are common, and can go much higher [Golding91b]. This argues for a system taking advantage of locality: using nearby hosts when possible, and avoiding long-distance communication.

Despite this environment, users expect a service to behave as if it were being provided on a local system. Several studies have shown that people work best if response time is under one second for queries presenting new information, and much less for queries that provide additional details [Schatz90]. Users also expect to be able to make use of the service as long as their local system is functioning. A widely-used information system should be unavailable to any user at most a few minutes each year, as long as the user's local system is functioning.

A recent study of host reliability [Long91] shows that most hosts are available better than 90% of the time, and are continuously available for ten days on the average. My own research [Golding91b] has found that hosts within North America respond when polled about 90% of the time, indicating that long-term network failure is probably uncommon. This study also showed that communication between two hosts near each other was more reliable than between distant hosts. There are many points in the Internet that can fail and partition the network; indeed, it is usually partitioned into several non-communicating subsets. The system therefore cannot assume that the principals that compose it will be able to communicate with each other all the time.
The introduction of mobile computer systems exacerbates this problem, since they can be disconnected from the network for a long time, or may be "semi-connected" by an expensive low-bandwidth connection. Several researchers are investigating file systems that can tolerate disconnection [Kistler91, Heidemann92, Alonso90a].

The application architecture must also scale to the vast number of users that can access a widely available service. The Internet included more than a million hosts in July 1992 [Long91]; the potential user base was probably then in the several millions, and these numbers are increasing by about 30% every four to six months [Lottor92, Ganatra92]. Specialized services with limited audiences currently receive on the order of 10 000 queries per day (0.12 queries per second mean) [Emtage92b], while widely-used services such as library card catalogues can receive nearly 100 queries per second [Emtage92a].
1.2 Using replication
These requirements cannot be met without replicating parts of the system. A replicated system allows load to be shared by many replicas, improving availability and scalability. Clients use the service by contacting one replica. The service is available as long as the client can connect to at least one functioning replica. The replicas in turn communicate amongst themselves to coordinate the service. Figure 1.2 shows how replicas might be placed in a simple internetwork.

Portable systems include clients, and can possibly include replicas. When a portable system maintains a replica, the service continues to be available even when the system has been disconnected from the network. The local replica does not receive updates made by other replicas until the system is reconnected to the network.

Clients can contact the nearest replica, improving communication locality. This reduces communication latency. It also decreases the load each communication imposes on the network by reducing the number of routers and communication links that must handle the messages. One approach is to place one replica in each geographic region or organization. Clients must be able to identify which replicas are nearby and maintain performance when nearby replicas fail; I have considered this problem elsewhere [Golding92b, Golding92c].
FIGURE 1.2: Placing replicas in an internetwork. Some local-area networks will have a nearby replica, while others must communicate with more distant replicas. Portable systems may include a “slice” replica that maintains a copy of part of the database. Properly-placed replicas that cache service information can improve performance.
1.3 Group communication mechanism
A group communication mechanism can be used to construct a replicated service. This mechanism (sometimes called a distributed process group mechanism) organizes a set of principals into a group [Birman87, Cheriton84]. The group acts as a single abstract entity.

The group supports two kinds of operations: group multicast to send messages to group members, and group membership to add or delete principals from the set of members. Applications apply group operations without concern for the principals that make up the group, and the group communication mechanism converts operations on the abstract group to communication with principals (Figure 1.3).
FIGURE 1.3: Components of a group communication mechanism. The group communication mechanism provides a group multicast protocol, which is implemented in the ordering and communication components, and a group membership protocol, which is implemented in the membership component. These mechanisms may make use of other outside services, including a location service or a performance prediction service.
The group multicast operation sends a message from one principal to every group member. The operation can be implemented using one of many different protocols. The implementation defines whether messages are delivered reliably or not, and how long delivery takes. Likewise, there are many possible implementations of the group membership operations.

Both the multicast and membership implementations may rely on other network services. For example, the membership protocol might use a location service to learn what servers make up the group when a principal joins. The communication component might make use of a performance prediction service (Section 2.5) to improve performance.
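The two group operations described above can be sketched as a small interface whose delivery and membership components are pluggable. This is an illustrative sketch only, not the dissertation's actual API; all class and method names here are assumptions.

```python
class Membership:
    """Membership component: tracks the current view of the group."""
    def __init__(self):
        self.members = set()

    def add(self, principal):
        self.members.add(principal)

    def remove(self, principal):
        self.members.discard(principal)

    def view(self):
        return set(self.members)


class DirectDelivery:
    """Trivial delivery component: hands each message straight to the member.
    A real component would implement a protocol with particular reliability
    and latency guarantees."""
    def __init__(self):
        self.delivered = []

    def send(self, sender, member, message):
        self.delivered.append((member, message))


class Group:
    """The abstract group: applications call multicast/join/leave without
    naming the principals behind it."""
    def __init__(self, delivery, membership):
        self.delivery = delivery
        self.membership = membership

    def multicast(self, sender, message):
        # The group mechanism converts one abstract operation into
        # communication with each current member.
        for member in self.membership.view():
            self.delivery.send(sender, member, message)

    def join(self, principal):
        self.membership.add(principal)

    def leave(self, principal):
        self.membership.remove(principal)
```

Swapping in a different delivery component changes the reliability guarantee without touching application code, which is the point of the decomposition.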
1.4 Weak consistency
Each group member has a copy of group state, and uses the group communication mechanism to coordinate changes to it with the other members. The copies are consistent if they have the same value, while inconsistent copies can differ. The group multicast protocol determines the degree of consistency by providing guarantees on how reliably and quickly messages containing state changes will be delivered to group members.

Many existing group communication systems, among them Isis [Birman87, Birman91], Psync [Mishra89], Arjuna [Little90], and Lazy Replication [Ladin91], provide strong consistency guarantees, meaning that the system provides a multicast message service that ensures that every principal views every message in a strictly controlled order, and that no two principals can differ at any moment by more than a limited degree.

Other approaches provide intermediate guarantees. The quasi-copy approach allows specific bounds on the difference between copies [Alonso90b, Barbara90, Alonso90a], while epsilon serializability relaxes traditional serializability definitions to control the number of updates by which copies can differ [Pu91b, Pu91a].

In contrast, the timestamped anti-entropy group multicast protocols presented in this dissertation provide eventual or weak consistency. While multicast messages will be delivered to every group member, the time required is unbounded (though finite). Thus, any two group members can hold different copies of group state at any instant, but eventually both members will receive the same set of state-change messages.

These weaker guarantees can have important performance and availability benefits as compared to strong consistency, particularly when considering wide-area and mobile systems. Strong consistency systems require expensive protocols, and perform poorly (or not at all) when communication is unreliable or when the network is partitioned.
By contrast, weak consistency protocols can use fewer network packets, allow caching and delayed operation for mobile systems, and are not affected by many forms of processor and network failure.

The timestamped anti-entropy protocol, like other weak-consistency protocols, achieves its fault tolerance and efficiency by performing delayed communication between principals. Rather than multicasting a message right away, messages are placed in a queue and delivered later. Pairs of principals periodically contact each other to exchange the messages in their queues. This exchange is called an anti-entropy session. If a host is unavailable for some time, the principals that it hosts can perform exchanges when they begin functioning again. In this way the group communication mechanism hides host and network failures.

Many existing information systems, such as Usenet [Quarterman86] and the Xerox Clearinghouse system [Oppen81], use similar techniques. The work presented in this dissertation formalizes the weak consistency model and provides new mechanisms to make weak consistency group communication flexible, controllable, and robust.
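The queue-and-exchange behavior described above can be sketched as follows. This is a simplified illustration of an anti-entropy session, not the full TSAE protocol developed in Chapter 5; the names, and the convention that a summary vector records the highest timestamp seen from each originator, are assumptions of the sketch.

```python
class Principal:
    def __init__(self, name):
        self.name = name
        self.log = []       # queued messages: (originator, timestamp, payload)
        self.summary = {}   # originator -> highest timestamp seen from it

    def post(self, timestamp, payload):
        """Queue a message locally rather than multicasting it right away."""
        self.log.append((self.name, timestamp, payload))
        self.summary[self.name] = max(self.summary.get(self.name, 0), timestamp)

    def missing_for(self, peer_summary):
        """Log entries the peer has not yet seen, judged by its summary vector."""
        return [e for e in self.log if e[1] > peer_summary.get(e[0], 0)]

    def absorb(self, entries):
        """Add received entries to the log and advance the summary vector."""
        for entry in entries:
            if entry not in self.log:
                self.log.append(entry)
            originator, ts, _ = entry
            self.summary[originator] = max(self.summary.get(originator, 0), ts)


def anti_entropy_session(a, b):
    """One pairwise exchange: trade summary vectors, then ship the differences."""
    a_sends = a.missing_for(b.summary)
    b_sends = b.missing_for(a.summary)
    b.absorb(a_sends)
    a.absorb(b_sends)
```

After a session the two partners hold identical logs, so a message posted anywhere reaches every principal once enough pairwise sessions have occurred, even if some hosts were unreachable in between.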
1.5 The Refdbms 3.0 system
Refdbms 3.0 is a distributed bibliographic database system. Its implementation uses the weak-consistency protocols presented in this dissertation. I used its implementation to test many of the ideas presented here, and it will provide motivating examples through the next several chapters.

The Refdbms 3.0 system is based on the refdbms version 1 system that has been under development for several years at Hewlett-Packard Laboratories [Wilkes91]. That system emphasizes bibliographic information shared within a research group. Users can search databases by keywords, use references in TeX, locate copies of papers, and add, change, or delete references. I have extended it into a distributed, replicated database [Golding92a]. (Version 2 is an independent version, also based on the original.)

The extended system provides multiple databases distributed to widely dispersed sites. Databases can be specialized to particular topics, such as operating systems or an organization's technical reports. Each database can be replicated at several sites on the Internet, and users can create their own copy of interesting parts of the database. When a user enters a new reference in one copy, the reference is propagated to all other copies. The system also includes a simple mechanism for notifying users when interesting papers are entered into the database.
Refdbms stores references in a format similar to that used by refer [Lesk78, Tuthill83], as shown in Figure 1.4. Every reference has a type, such as TechReport or Article, and a unique, mnemonic tag like Lamport78a. Since these tags are determined by users and can potentially collide, the system internally uses a unique identifier consisting of a timestamp plus the address of the site that created the reference. References are stored in hashed and B-tree files using the BSD 4.4 libdb library, and are indexed both by tag and by keyword.

The weak-consistency framework in this dissertation was used to design and implement the new version of Refdbms. In general this has not affected the use of the system: users can do all of the same operations they could on the older centralized system. However, since replicas at different sites can have different contents while updates are propagating, users will occasionally see inconsistent information. This could be a problem when two authors at different sites are collaborating on a paper, or when one person tells another about an interesting reference they just found. Refdbms
%z Article                     (the type)
%K Lamport78a                  (the tag)
%A Leslie Lamport
%T Time, clocks, and the ordering of events in a distributed system
%J CACM
%V 21
%N 7
%D 1978
%P 558-565
%x The concept of one event happening before another in a distributed
%x system is examined, and is shown to define a partial ordering of
%x the events. A distributed algorithm is given for synchronizing a
%x system of logical clocks which can be used to totally order the
%x events. The use of the total ordering is illustrated with a method
%x for solving synchronization problems. The algorithm is then
%x specialized for synchronizing physical clocks, and a bound is
%x derived on how far out of synchrony the clocks can become.
%k causal consistency, asynchrony, happens before
%k clock synchronization
FIGURE 1.4: An example reference.

resolves this problem by making potentially-inconsistent information available, but only if users ask for it. These problems are discussed further in Chapter 3.
1.6 Conventions in the text
When multiple citations are presented together, they are listed in order of decreasing importance or relevance. While this is not the usual practice, I have found it to be more useful than ordering them alphabetically. The references were maintained and formatted using Refdbms.
Program fragments, user commands, and variable names are presented in a sans-serif face. Names of protocols are printed in a bold face. Most other names are presented in a standard Roman face.
1.7 Organization of the dissertation
In the next chapter I will define a number of terms and assumptions used in later chapters. In Chapter 3, I discuss group communication systems, and present a framework for constructing them and tailoring them to specific application requirements. Chapter 4 is a survey of existing group communication systems. I present the timestamped anti-entropy protocol in Chapter 5. That protocol guarantees reliable eventual message delivery, which is used in weak consistency group communication. Chapter 6 includes a group membership protocol that is a companion to timestamped anti-entropy. Chapter 7 investigates the performance of these protocols. Chapter 8 explores how the group communication framework can be extended to build sophisticated wide-area information systems. Finally, Chapters 9 and 10 present topics for future research and my conclusions on this work.
Chapter 2 Terms and definitions
In this chapter I will define a number of terms and assumptions used throughout this dissertation. The term protocol is used throughout to mean a computational procedure that is performed by two or more separate principals and coordinated by messages passed over a network. This is distinct from an algorithm, which is more generally any computational procedure.
2.1 Consensus
The problem of reaching consistency between copies of group state is a form of distributed consensus [Turek92, Fischer85]. Consensus has been studied extensively, and it is well known that specific conditions on processors and the network are required for it to be possible. Some of these conditions are listed in Table 2.1. The Internet and the hosts on it cannot formally achieve consensus because they cannot meet the necessary conditions. However, in practice the Internet closely approximates several of the conditions. These approximations will be presented briefly here, and discussed more completely in later sections of this chapter.

Hosts on the Internet approximate synchronous processors: there is some bound on the differences between the rates at which hosts operate. In practice this means there is a bound on the time required for any host to complete any protocol step. The Internet provides at worst unbounded communication latency. In practice a bound can be established on the latency between two hosts when they are able to communicate, but network failures can delay messages for arbitrarily long periods.
TABLE 2.1: Conditions under which consensus is possible. Adapted from Turek and Shasha [Turek92].

                                Point-to-point   Broadcast    Broadcast   Point-to-point
                                unordered        unordered    ordered     ordered
  Processors     Communication  messages         messages     messages    messages
  Asynchronous   Unbounded      No               No           Yes         No
  Asynchronous   Bounded        No               No           Yes         No
  Synchronous    Bounded        Yes              Yes          Yes         Yes
  Synchronous    Unbounded     No               No           Yes         Yes
Finally, principals do not fail. A principal may appear to stop for some time while the host on which it runs is out of service. However, when the host has recovered the principal will recreate its state from stable storage and resume operation.
2.2 Principals
Principals are the entities that participate in group operations. Other terms such as site, replica, process, and server might seem appropriate, but are well-defined in other contexts and have inappropriate connotations. Principals survive temporary failures and host crashes. They have some form of stable storage to record information that must survive failure. They also have volatile storage that is lost on failure. Both principals and hosts fail by stopping (also called crashing), so that spurious data are never transmitted on the network or written to stable storage. In practice a carefully implemented disk storage system can closely approximate this ideal [Gray86, Sullivan92, Seltzer90]. Many Unix network services, such as network file systems, name services, and mail routing, behave in just this way: they are created afresh from data on disk every time a host recovers [Leffler89].

Principals have a mean time-to-failure (MTTF) much longer than the time required to perform certain protocols. Such principals can be constructed from less-reliable principals if stable storage is available. These assumptions eliminate pathological situations where principals recover, stay up for a very short time, then fail again. Studies of host reliability [Long91] indicate that most hosts function continuously for several days between crashes, while most protocols take at most a few minutes to complete.
Each principal has a unique identity, and principals that cease to exist do so for all time. It is not possible in the short term to distinguish between a slow principal and one that has exhibited a temporary failure and will soon recover. However, in the long term principals make progress at a bounded rate. When functioning, no principal is infinitely fast, and any temporary failure is recovered within a bounded interval.
2.3 Time and clocks
Throughout this dissertation, the word time refers to the time that might be measured by an external observer, as opposed to any internal or virtual time measure. When an event is said to happen eventually after time t, the probability that the event will not happen during the period (t, t + Δ] goes to zero as Δ → ∞.

Clocks provide monotonically increasing time-like measures within the system. Clocks progress at the same rate as real time in the long term; however, over short intervals clocks may advance at uneven rates.
Every principal p has access to some clock, denoted clock(p); the value of the clock at time t is clock(p, t). This clock allows every important event performed by the principal to be assigned a distinct timestamp. The clocks can be loosely synchronized, meaning that the clocks at any two principals differ by at most a constant ε:

    (∀t)(∀p, q ∈ P)  |clock(p, t) − clock(q, t)| < ε

This assumption is not required for most of the results in this dissertation. The text indicates any place where loose clock synchrony is assumed. Clock synchronization is a well-studied problem [Lamport78, Cristian89], and the NTP protocol currently provides this degree of synchrony on the Internet [Mills88].

In my experience most host clocks are within half an hour of the correct time, indicating a maximum ε of an hour. Hosts that synchronize using NTP are much more accurate, and an ε of about a minute appears sufficient.
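As an illustration of how a principal can assign a distinct, monotonically increasing timestamp to every important event even when the underlying host clock repeats a reading, consider the following sketch. It is my own illustration, with an arbitrarily chosen tick increment, and is not drawn from any implementation in this dissertation:

```python
import time


class PrincipalClock:
    """Issues a distinct, monotonically increasing timestamp per event.

    If the underlying clock has not advanced since the last reading,
    the previous timestamp is bumped by a small increment, so two
    events at one principal never share a timestamp.
    """

    def __init__(self, tick=1e-6):
        self.last = 0.0
        self.tick = tick

    def timestamp(self):
        now = time.time()
        # Strictly greater than the previous value, even if now == last.
        self.last = max(now, self.last + self.tick)
        return self.last


clock = PrincipalClock()
stamps = [clock.timestamp() for _ in range(1000)]
assert all(s1 < s2 for s1, s2 in zip(stamps, stamps[1:]))
```

The returned values behave as a clock in the sense used here: distinct per event, monotonically increasing, and in the long term advancing at the rate of the underlying host clock.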
2.4 Network
Hosts are connected by a network, and communicate using messages. In the short term, message transmission latency between two functioning hosts is bounded [Golding92b]. This assumption is required for standard Internet protocols such as TCP [Postel80, Comer88]. In the long term, temporary host and network failures, coupled with message retry, make message transmission latency finite but unbounded. The network includes both the physical communication media and the low-level protocols that use it. For the Internet, this includes the long-distance backbone links, local-area networks, and the IP communication protocols. The communication network does not always deliver messages in FIFO order, and it may lose or duplicate messages from time to time. It does not spontaneously create messages.¹ Networks have both a physical topology and a logical topology. The physical topology is determined by connections between physical components, and many parts are often tree-like in structure where a single failure can disconnect, or partition, the network. The logical topology of the open Internet is a completely-connected graph, because the IP protocols hide the physical topology to allow every host to communicate with every other host. However, I make a weaker assumption: the network is connected, but it need not be complete. Since many organizations choose to protect their internal networks from the rest of the Internet, in practice the logical topology of the Internet is composed of a number of completely-connected subcomponents. A host has a set of neighbors in the logical network with which it can communicate. No part of the network fails permanently, though temporary partitions can occur. The network need never be free of partitions, as long as any principal can eventually send a message to any other principal on a neighboring host if it continually tries to send until it receives an acknowledgment.
This is a much weaker assumption than requiring periods when the network is free of partitions. My studies of message reliability on the Internet [Golding92b] suggest that the probability that the
¹Spurious packets can occur on the Internet; however, it is unlikely that they would fall into an existing TCP conversation and have a valid checksum.
Internet is ever free of partitions is effectively zero, and the advent of portable computing systems ensures that there will never be a time when all systems are connected and functioning. Semi-partitions are possible, where only a low-bandwidth connection is available. For example, a mobile system could be connected through a low-bandwidth cellular modem or a noisy telephone line.
2.5 Network services
As mentioned earlier, clients must be able to identify the group members that are near them. This presumes the existence of two services: a name or location service, which identifies the principals in a group, and a performance prediction service that orders principals by locality. The location service might, for example, map a service name into a set of server addresses. This service might be implemented using the current DNS, or by a more advanced system [Bowman90, Deutsch92]. Indeed, it can be implemented using weak consistency, as in the Xerox Clearinghouse system [Oppen81]. The service must always provide some way of locating at least one current group member, as long as the group still exists. I will discuss some related issues in Chapter 8. The performance prediction service provides a way to select from the principals based on expected communication performance. Expected performance is based on a prediction of communication latency, failure, and bandwidth. If an operation requires that only a small amount of information be moved between sites, message and processing latency will dominate performance. If large amounts of information must be transferred, then bandwidth will dominate. The prediction should be biased by the probability that the client can communicate with the member. Concurrent with my work on group communication, I have begun investigating the problems of performance prediction [Golding91b, Golding92b] and of using these predictions in the quorum multicast protocol [Golding91a, Golding92d]. Preliminary results suggest that significant performance improvements can be achieved using simple prediction strategies.
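A performance prediction service of the kind described might rank members as in the following sketch. The cost model, member record fields, and numbers are my own assumptions for illustration; this is not the quorum multicast protocol itself:

```python
def expected_cost(member, transfer_bytes):
    """Predicted time to move transfer_bytes to or from a member.

    Latency dominates small transfers; bandwidth dominates large ones.
    Dividing by the reachability probability biases the estimate
    against members the client is unlikely to reach.
    """
    transit = member["latency_s"] + transfer_bytes / member["bandwidth_Bps"]
    return transit / member["p_reachable"]


def rank_members(members, transfer_bytes):
    """Order group members from cheapest to most expensive to contact."""
    return sorted(members, key=lambda m: expected_cost(m, transfer_bytes))


members = [
    {"name": "near", "latency_s": 0.01, "bandwidth_Bps": 1e5, "p_reachable": 0.99},
    {"name": "far",  "latency_s": 0.20, "bandwidth_Bps": 1e6, "p_reachable": 0.95},
]

# A small request favours the low-latency member; a bulk transfer
# favours the high-bandwidth one.
assert rank_members(members, 100)[0]["name"] == "near"
assert rank_members(members, 10_000_000)[0]["name"] == "far"
```

The crossover between the two rankings illustrates why the text distinguishes latency-dominated from bandwidth-dominated operations.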
Chapter 3 A framework for group communication systems
A wide-area system can include a large number of principals running at different sites in the network. In Chapter 1 I proposed using a group communication mechanism to coordinate the activities of these principals. The mechanism must be flexible, so that it can be adapted to the needs of an application. It should also provide a structure that can be used to reason about a system, and to re-use code between systems. A framework is an object-oriented description of the components that make up a system, and the interfaces between them. It generalizes concepts such as layered design, often used in specifying network protocols [Tanenbaum81], and structured design [PageJones88]. It is related to the Object-Oriented Design methodology [Rumbaugh91]. The Choices object-oriented operating system provides frameworks for process management, virtual memory, storage, and other services [Campbell92, Islam92]. Each principal that is a member of a group will include an instance of the group communication framework. The framework defines components, which abstractly document the essential semantics of the system and can be viewed as abstract object classes. Concrete classes specialize these abstract classes by providing an implementation of the component. An instance of the framework consists of various objects instantiated from the concrete classes. A framework is useful both as a tool to design components, and as a method for sharing design and coding effort between applications. In this chapter I will present a framework for constructing a group communication mechanism.
3.1 The framework
The group communication framework has four components, as shown in Figure 3.1: application, message delivery, message ordering, and group membership components. They communicate through three shared data structures: a message log, message summary information, and a group view. A principal includes one instance of each component and data structure. The message delivery component implements a multicast communication service that exchanges messages with other principals. It decodes incoming messages and writes them to the message log, from which they will be delivered to the application or group membership component. It also maintains summary information of the messages sent and received that can be used by the message ordering component. The message delivery component determines whether the group communication system provides weak or strong consistency, by providing eventual or immediate message delivery. The group membership component maintains a set of the principals that are in the group. The set is called the local view of the group. When the set changes, this component communicates with the group components at other principals according to a group membership protocol. The protocol ensures a degree of consistency between group views. The communication consists of messages sent through the message delivery component. The network and the message delivery component can reorder messages arbitrarily. The message ordering component processes the stream of incoming messages to ensure they are presented to the application according to some ordering. This step may require delaying some messages until the ordering component can correctly establish the order. To ensure this is possible, the ordering component also processes outgoing messages so that the ordering components at other principals will have enough information to properly order messages, usually by adding a header to each message. The application manages group state. 
It might receive requests from clients outside the group, and translate those requests into group messages. The message would be given to the message ordering component, which would add a header containing ordering information. The message would then be stored in the log until the message delivery component sent it to the other principals.
[Figure 3.1: the application, message ordering, message delivery, and group membership components within one principal, communicating through the shared message log, summary timestamps, and group view; the message delivery component exchanges timestamped messages and membership information with other principals.]
FIGURE 3.1: A framework for constructing a group communication system. Each principal in the group includes an instance of this framework, in the form of objects instantiated from concrete implementation classes.
At the other principal, the message would be received by the message delivery component and written to the message log. Some time later, enough information would be available so the ordering component could deliver the message to the application. The application component in this second principal would then act on the message and change its copy of the group state.
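The components and shared data structures can be viewed as abstract classes, in the spirit of the framework description above. The following Python sketch is purely illustrative; the class names and method signatures are mine, not taken from the Refdbms or Tattler implementations:

```python
from abc import ABC, abstractmethod


class MessageDelivery(ABC):
    """Exchanges logged messages with other principals."""

    @abstractmethod
    def exchange(self, partner): ...


class MessageOrdering(ABC):
    """Decides when a logged message may be handed to the application."""

    @abstractmethod
    def deliverable(self, log): ...


class GroupMembership(ABC):
    """Maintains the local view of the group."""

    @abstractmethod
    def view(self): ...


class Application(ABC):
    """Manages the principal's copy of the group state."""

    @abstractmethod
    def apply(self, message): ...


class Principal:
    """One framework instance: four components sharing a message log,
    summary information, and a group view."""

    def __init__(self, delivery, ordering, membership, application):
        self.log = []      # shared message log
        self.summary = {}  # summary of messages sent and received
        self.delivery = delivery
        self.ordering = ordering
        self.membership = membership
        self.application = application
```

Concrete classes would specialize each abstract component; instantiating a Principal with particular concrete objects corresponds to one instance of the framework.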
In the following sections I will discuss how this framework has been used to build two different wide-area services: the Refdbms bibliographic database [Golding92a] and the Tattler distributed reliability monitor [Long92]. I then detail each component and the implementations used by the two applications.
3.2 The application
The application component maintains the principal's copy of group state. The state has a logical data model, whether or not the principals actually store the data. The data model defines the data to be shared, the operations to be performed on that data, and the correctness constraints that must be maintained. The model determines what guarantees must be provided by the other components of the framework, and therefore what implementations can be used for them. When a principal needs to perform an operation that effects a change to the group state, it encodes the operation in a message that is sent to the group. When it receives the message back, it performs the operation. When a principal is to execute an operation that does not change group state, it might be able to perform the operation using only local information, or it may need to send the operation to the group. Some operations can tolerate inconsistent or out-of-date information. For example, updating a host address in a distributed name service does not require knowing the previous address, and it is not necessary for every replica in the service to observe the change immediately as long as the change is propagated without too much delay. If every operation on the group state can tolerate inconsistency, then the message delivery component can be implemented with a protocol that provides weak consistency. The operations allowed on the data can dictate a particular message ordering. If all operations are commutative, that is, if they can be applied in any order with the same net result, the message ordering component need not impose an order on the messages specifying the operations. It is more likely that operations will be order-dependent, in which case a total message order will ensure that every principal computes the same result for each operation.
If operations are order-dependent and messages are delivered eventually, the application will need to provide mechanisms for detecting and resolving conflicting messages. For example, one principal could send a message changing the state to one value, and another could concurrently send a message changing it to a different value. Local-area distributed systems can use locking mechanisms to avoid conflicts, but many wide-area applications cannot wait for a global locking operation before performing an update. Instead, principals make optimistic updates that must be checked before they are applied to the database. A message ordering implementation that delivers messages in a total order can provide a basis for consistent conflict detection. Some applications require that the data contain unique identifiers. Unique identifiers are a common source of update collisions in weakly consistent systems, because different principals can use the same identifier in different ways. In some cases identifiers can be generated internally, but in other cases they must be provided by the user. Their presence can also determine whether two groups can merge their state. The shared data may include explicit version or timestamp information. If they do, it may be possible to resolve update conflicts without requiring strict message orderings, and the ordering component may not need to append timestamp information to messages.
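For instance, an internally generated unique identifier of the kind described for Refdbms (a timestamp plus the address of the creating site) might be built as follows. This is a sketch; the concrete representation is an assumption of mine, not the system's actual format:

```python
import time


def unique_id(site_address, clock=time.time):
    """Build a unique identifier from a timestamp and the creating
    site's address. Identifiers from different sites can never
    collide, because the addresses differ; a single site avoids
    reuse as long as its clock yields a distinct reading per event
    (a per-principal monotonic clock would guarantee this)."""
    return (clock(), site_address)


# Hypothetical site addresses, for illustration only.
i1 = unique_id("10.0.0.1")
i2 = unique_id("10.0.0.2")
assert i1 != i2  # different sites always yield different identifiers
```

Because such identifiers are generated internally and never modified, they sidestep the update collisions that user-chosen names can cause.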
3.2.1 The Refdbms application
As discussed in Chapter 1, the Refdbms 3.0 system implements a distributed bibliographic database. A Refdbms database consists of a set of references, each with an internal unique identifier and a
tag like Smith91 that humans can use to name a reference. At all times the internal identifier is guaranteed to be unique. The tag should be unique, but this is not guaranteed for newly-added references until all sites holding a replica of the database can observe and resolve conflicting updates. The references are indexed by the tag and by an inverted index of content keywords. Three operations can update the database: adding, changing, and deleting references. The update operations are neither commutative nor idempotent, meaning that every update operation must be performed exactly once, and in exactly the same order by every principal, if the databases
[Figure 3.2: a Refdbms principal comprises the user-level add, change, delete, post, and search programs, a database, a message log, and the message delivery program and daemon that exchange log contents with other principals.]
FIGURE 3.2: Structure of a Refdbms principal. The system uses reliable eventual delivery, implemented in the message delivery program and daemon, and total message ordering, implemented in the posting program.

are to reach agreement. This suggests that a message delivery component should deliver update messages in a total order, and that messages should be delivered reliably. Users at different sites can submit conflicting updates. There are three sources of conflict: adding two different references with the same tag; changing one reference in two different ways; or deleting a reference then submitting another operation on it to a different principal. The basic mechanism for handling conflicts is to process update messages in the same order at every principal. I will discuss how conflicts are resolved in more detail in Section 3.2.3. Users can also search for references. Searches need not return completely current information, as long as a search will eventually reflect any update. This implies that eventual message delivery is acceptable in the message delivery component. Refdbms is implemented as a set of programs that communicate over the Internet using TCP (see Figure 3.2). Users can submit operations, which are written as messages to a log. From time to time the message delivery program propagates these messages to another replica by connecting to a daemon there, which in turn writes the update message to its log. Group membership changes are
exchanged at the same time. The message delivery program and daemon together form the message delivery and group membership components. The message ordering component is contained in a posting program that periodically determines what updates can be delivered to the database.
3.2.2 The Tattler system
The Tattler system is a distributed availability monitor for the Internet [Long92], built by Long and Sriram. It monitors a set of Internet hosts, measuring how often they are rebooted and what fraction of the time they are available. The measurements are taken from several different network sites to minimize the effect of network failure on the results, and to make the sampling mechanism very reliable. Each measurement site runs a tattler, which samples host uptimes and shares these measurements
with other tattlers. Collectively the tattlers maintain a list of hosts to monitor and collect statistics on them. A record of the form ⟨host address, poll method, poll interval⟩ is kept for each host. The
client interface allows hosts to be added or deleted from this list. The recorded statistics are stored in a database, which stores tuples of the form ⟨host address, boot time, sample time⟩. Only one operation updates a Tattler database: merging a set of samples. Each sample represents an interval when the host was known to be available. A sample that is being merged into a database will either be disjoint from every other sample recorded for the same host, or it will overlap with another sample. If it overlaps, the two samples are combined. Otherwise, the host has been rebooted and a new interval has begun. Each time a tattler obtains a new sample, it logically multicasts the sample to other tattlers. Sample merging is commutative and idempotent, so message ordering is unimportant as long as messages are delivered reliably. However, unlike Refdbms, the Tattler does not explicitly implement a message log. The database contains all the information that would be maintained in the message log, so the implementations of the message ordering and delivery components can work directly from the database. Each tattler is composed of four parts: a client interface, a polling daemon, a data base daemon, and a tattler daemon. Figure 3.3 shows this structure. The polling daemon produces sample
[Figure 3.3: a tattler comprises a client interface, a polling daemon, a data base daemon, and a tattler daemon; the tattler daemon communicates with the tattler daemons at other sites.]
FIGURE 3.3: Structure of a Tattler.

observations. It takes samples at a specified rate, and can be requested to start or stop sampling using the client interface. The data base daemon provides stable storage for sample observations (from the polling daemon), and meta-data from the client interface and the tattler daemon. All of the group communication components are implemented in the tattler daemon, which exchanges samples, host lists, and membership information between tattler sites using a reliable, eventual delivery protocol.
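The commutative, idempotent sample-merging operation described above can be sketched as follows. This is my own simplified model, which treats two samples as belonging to the same uptime interval when they record the same boot time; the actual Tattler may define overlap differently:

```python
def merge_sample(intervals, sample):
    """Merge one availability sample (boot_time, sample_time) into a
    host's recorded intervals.

    A sample sharing a boot time with a recorded interval extends that
    interval (the host was still up); a sample with a new boot time
    starts a fresh interval (the host rebooted). The result does not
    depend on merge order, and re-merging a sample changes nothing.
    """
    boot, seen = sample
    merged = []
    for b, s in intervals:
        if b == boot:
            seen = max(seen, s)  # same uptime interval: combine
        else:
            merged.append((b, s))
    merged.append((boot, seen))
    return sorted(merged)


db = []
db = merge_sample(db, (100, 150))
db = merge_sample(db, (100, 200))  # overlapping sample: combined
db = merge_sample(db, (300, 350))  # disjoint sample: a reboot occurred
assert db == [(100, 200), (300, 350)]
assert merge_sample(db, (100, 150)) == db  # idempotent
```

Because the result is independent of merge order and unchanged by re-merging a sample already recorded, tattlers can exchange samples in any order, as the text notes.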
3.2.3 Handling update collisions
Wide-area applications generally perform optimistic updates to group state that may conflict with other updates because pessimistic conflict-prevention mechanisms involve expensive, consistent coordination steps. In some applications, such as the Tattler, optimism is not a problem since all operations are commutative and cannot conflict. Other applications, including Refdbms, define operations that can conflict, so these applications must provide mechanisms to detect and resolve conflicting updates. These applications can also provide mechanisms to make conflicts unlikely even when they cannot be prevented. As noted earlier, there are three kinds of conflict in Refdbms: between two add operations, between two change operations, and between a deletion and any other update. Different techniques are used to detect, resolve, and avoid each kind of conflict. All of the techniques make use of messages being delivered in the same order at every principal. Two newly-added references conflict if they have the same tag. Recall that tags are assigned by users and are supposed to be unique within a database, but this cannot be guaranteed when users
at different sites add references independently. This kind of conflict is detected when the second add message is delivered to the application at each principal. The first reference will already have
been added to the database. When Refdbms finds that the tag has already been used, it computes a new tag for the reference by adjusting a suffix on the tag: Smith90 would become Smith90a, and Jones90b would be changed to Jones90c. There is a limit of ten suffix characters, but it is most unlikely that there will be more than 26¹⁰ references from one author in one year. The update message can then be re-processed using the new tag. There is one problem with this scheme: users may have submitted change or delete operations for the modified reference. These operations should not be associated with the tag of that reference, since it could change when the add operation is performed and the change would then be applied to the wrong reference. Instead, each reference is given an internal identifier composed of a host name and timestamp that is guaranteed always to be unique and is never modified. Change operations can then be associated with the correct reference, even if its tag has been modified. Conflicting change operations in Refdbms are more complex. They are not explicitly detected or resolved; instead, change operations are simply applied in the same order by every principal. However, change operation messages carry only the difference the change is supposed to apply to the reference. In this way if one user corrects the spelling of an author's name while another user at a different principal independently adds keywords, both changes will eventually appear in the reference. Fields can be grouped together, and a separate policy is used for each group of fields. For example, a change to any author-related field will result in all author-related fields being overwritten, while location lines can be inserted or deleted individually. This technique reduces the probability that two change operations will conflict, even if they apply to the same reference. Finally, deletion cancels any other operations. Change or delete operations delivered after a reference has been deleted are simply ignored.
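The tag-adjustment scheme can be sketched as follows. The handling of a trailing z is my own assumption; the dissertation specifies only the suffix examples and the ten-character limit:

```python
def next_tag(tag, in_use):
    """Adjust the suffix of a colliding tag: Smith90 -> Smith90a,
    Jones90b -> Jones90c. A sketch of the scheme described in the
    text; the real system limits suffixes to ten characters, and
    its exact rollover rule is not specified here."""
    while tag in in_use:
        if tag and tag[-1].isalpha() and tag[-1] != "z":
            # Bump the final suffix letter: ...a -> ...b, and so on.
            tag = tag[:-1] + chr(ord(tag[-1]) + 1)
        else:
            # No alphabetic suffix yet (or it is exhausted): append one.
            tag = tag + "a"
    return tag


assert next_tag("Smith90", {"Smith90"}) == "Smith90a"
assert next_tag("Jones90b", {"Jones90b"}) == "Jones90c"
```

Note that the adjusted tag is only a human-visible name; as the text explains, change and delete operations are keyed to the immutable internal identifier, so they still find the right reference after a tag is adjusted.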
3.3 Message delivery
The message delivery component fills the same function as the transport layer in the ISO layered network model [Tanenbaum81], in that it exchanges messages with other principals without interpreting message contents. In my group communication framework, it retrieves messages entered into a message log by other components and transmits them to other principals. The delivery component provides guarantees on message reliability and latency. The reliability guarantee determines which principals must receive a copy of the message, and latency determines how long delivery will take.

TABLE 3.1: Possible message delivery reliability guarantees, from strongest to weakest.

  Kind         Guarantee
  Atomic       Message is either delivered to every group member, or to none.
               Message is aborted if any group member fails.
  Reliable     Delivered to every functioning group member or to none, but
               failed members need not receive the message. If the sender
               fails, delivery is not guaranteed but may occur.
  Quorum       Delivered to at least some fraction of the membership. If the
               sender fails, delivery is not guaranteed.
  Best effort  Delivery attempted to every member, but none are guaranteed
               to receive the message.

There are several possible message reliability levels, ranging from atomic to best effort, as listed in Table 3.1. Reliable mechanisms generally require extra state at each principal and induce more message traffic than unreliable ones. They require the sender to retain a copy of the message in its message log so the message can be retransmitted if necessary, and they require receivers to acknowledge incoming messages. Best effort mechanisms need not keep a copy of the message. Reliable delivery was used for both Refdbms and the Tattler. Reliable delivery is essential for Refdbms, because even a single lost message can cause some principal to miss an update and permanently diverge from the proper value. Reliability is less essential for the Tattler, because that system can recover from a lost message the next time two databases are merged.

Message latency complements reliability: it determines how long principals may have to wait to receive a message if it is delivered to them. There are two aspects to latency: when the delivery process begins, and when it ends. The process can either begin immediately, or messages can be queued for later delivery. Once started, delivery can complete in either a bounded time, or eventually. The four combinations are listed in Table 3.2.

TABLE 3.2: Possible message delivery latency guarantees.

  Kind         Guarantee
  Synchronous  Delivery begins immediately, and completes within a bounded
               time.
  Interactive  Delivery begins immediately, but may require a finite but
               unbounded time.
  Bounded      Messages may be queued or delayed, but delivery will complete
               within a bounded time.
  Eventual     Messages may be queued or delayed, and may require a finite
               but unbounded time to deliver.

Other guarantees can be used that fall between the ones listed. Eventual delivery was used in both systems because synchronous or interactive delivery can severely limit fault tolerance. In particular it makes the system less tolerant of network partitions and site failures. If messages can be delayed, they can be delivered after the network or system failure has been repaired. The Internet is essentially never without partitions, and mobile computers may spend a substantial fraction of the time disconnected. Eventual delivery also allows the system to delay messages until inexpensive communication is available. This might mean waiting to transmit messages until evening when the network is less loaded. Some mobile systems spend long periods "semi-connected" through a low-bandwidth wireless link, and it may be more effective to wait to transmit messages until the system is reconnected to a higher-speed link. While interactive delivery is not necessary, both Refdbms and the Tattler are most convenient when updates propagate quickly. The Tattler takes steps to increase the propagation rate on observing changes to group membership or the list of monitored hosts. This propagates important changes quickly, while ordinary updates are propagated normally. Reliable eventual delivery provides weak consistency. Every update to group state is encoded in a message, which is delivered to every principal. While the message is being sent, some principals
will have received the message while others will not. This inconsistency between principals is removed when delivery completes.

I have developed the timestamped anti-entropy protocol as one implementation of the message delivery component. It provides reliable eventual message delivery in wide-area distributed systems; Chapter 5 discusses this protocol in detail. Each principal maintains a summary of the messages and acknowledgments it has received, and pairs of principals periodically exchange batches of messages. The summaries make the exchange efficient by allowing each principal to send only the messages the other has not yet received. As long as every principal periodically performs these exchanges, every message will eventually be delivered to every principal, thus providing reliable eventual delivery. The protocol masks transient failures by periodically retrying message exchanges, making it ideal for the Internet and mobile computing.
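To make the exchange concrete, the following sketch shows how per-origin summaries let two principals trade only the messages the other lacks. It is an illustration only, with invented names and a single timestamp per origin rather than the full protocol state of Chapter 5; it also assumes messages from a given origin are exchanged in timestamp order.

```python
# Sketch of a timestamped anti-entropy exchange (illustrative only; the
# real protocol is specified in Chapter 5). Each principal keeps a log of
# (origin, timestamp, payload) messages and a summary mapping each origin
# to the highest timestamp received from it.

class Principal:
    def __init__(self, name):
        self.name = name
        self.log = []          # list of (origin, timestamp, payload)
        self.summary = {}      # origin -> highest timestamp seen

    def send(self, timestamp, payload):
        """Originate a new message, logging it locally first."""
        self.log.append((self.name, timestamp, payload))
        self.summary[self.name] = timestamp

    def missing_from(self, their_summary):
        """Messages in our log the other principal has not yet seen."""
        return [m for m in self.log
                if m[1] > their_summary.get(m[0], 0)]

    def receive(self, messages):
        for origin, ts, payload in messages:
            if ts > self.summary.get(origin, 0):
                self.log.append((origin, ts, payload))
                self.summary[origin] = ts

def anti_entropy(a, b):
    """One pairwise exchange: each side sends what the other lacks."""
    a_to_b = a.missing_from(b.summary)
    b_to_a = b.missing_from(a.summary)
    b.receive(a_to_b)
    a.receive(b_to_a)
```

A single call to `anti_entropy` brings the two logs and summaries into agreement; repeated pairwise exchanges among all principals eventually deliver every message everywhere.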
3.3.1 Propagating messages versus state
There are two models for storing and transmitting messages. In the first model, each message is entered into a message log, sent to other principals, and later applied to the group state by each principal. In the second model, each update is immediately applied to the group state, and its effects are logged and transmitted to other principals. Refdbms uses a message log, while the Tattler operates from the sample database.

Message logs are simple. Every update operation produces one update message, which is then sent to every group member. After the message arrives at other principals, its operation can be applied to the group state. The messages can be tagged with timestamp information so that any ordering is possible. The group state need not include any extra information to ensure that messages are applied in the right order.

Propagating effects rather than updates is more complex, but it can be a more efficient solution when eventual delivery is allowable. If a part of the group state is updated very often, the results of several operations can be collapsed into a single result. That result can be sent to other principals, rather than one message for each operation.
Since there are no messages, the group state must include ordering or timestamp information. In the Tattler each sample contains a timestamp. When updates are propagated from one principal to another, samples are exchanged and merged into the other database. In the Tattler, the sample timestamp is used just as a message timestamp would be. A sample in the database may reflect the merging of several measurements, so there can be fewer samples sent between principals than if each measurement were logged individually. Some systems that use state exchange can also tolerate some lost “messages” because the value can be obtained from a different principal in a later update exchange. Unfortunately, many applications cannot use state exchange. It is impossible to construct global orderings on updates before they are applied to the database because updates are always applied immediately. In some distributed systems, such as Refdbms, update conflicts cannot be resolved without global message orderings. Other applications simply cannot maintain the necessary information in their group state. Deleting items from the group state requires special consideration when message logs are not used. Deletion should be a stable property: once an item has been deleted, it should remain so forever. The item should not spontaneously reappear, though of course a new item with the same value could be added by an application. A record of the deletion must be maintained until the deletion has been observed by all principals, so that no principal can miss the operation and re-introduce the item to other principals. In the Clearinghouse these records were called death certificates [Demers88], while the Bloch-Daniels-Spector distributed dictionary algorithm [Bloch87] places timestamps on the gaps between items as well as on the items themselves. The Tattler uses the death certificate approach to track hosts that should no longer be polled.
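The death-certificate idea can be sketched as follows. The function names and representation are invented for illustration, not taken from the Tattler source; the point is that a deletion is itself a timestamped item, so merging with a stale copy cannot resurrect the deleted entry.

```python
# Sketch of state exchange with per-item timestamps and death
# certificates (illustrative). Each item carries the timestamp of its
# last update; a deletion is recorded as a tombstone rather than by
# removing the item outright.

def update(state, key, value, ts):
    state[key] = (value, ts, False)          # (value, timestamp, deleted?)

def delete(state, key, ts):
    state[key] = (None, ts, True)            # death certificate

def merge(state, other):
    """Merge another principal's state: newest timestamp wins per item."""
    for key, (value, ts, dead) in other.items():
        if key not in state or ts > state[key][1]:
            state[key] = (value, ts, dead)

def live_items(state):
    return {k: v for k, (v, ts, dead) in state.items() if not dead}
```

Merging a stale copy back after a deletion leaves the tombstone in place, which is exactly the stability property deletion requires; the remaining problem, discussed above, is deciding when the death certificate itself can be discarded.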
3.4 Message ordering
The message ordering component is responsible for ensuring that messages are delivered to the application in a well-defined order. This order may be different from the order in which messages are received. For example, an application should receive updates to a database record after the message creating the record. Even if the messages were sent in the right order, they may be rearranged in transit and arrive at their destination in a different order.

TABLE 3.3: Some popular message ordering guarantees.
Kind Guarantee
Total, causal The strongest ordering. Messages are delivered in the same order at every principal, and that order respects potential causal relations between messages.
Total, noncausal Messages are delivered in the same order at every principal, but that order may not always respect potential causal relations.
Causal Messages are delivered in an order that respects potential causal relations. If two messages could be causally related they are delivered in the same order at every principal. If they are not, they may be delivered in different orders.
FIFO Messages from each principal will be delivered in order, but the messages from different principals may be interleaved in any order.
Unordered Messages are delivered without regard for order.

Table 3.3 lists some of the most common message orderings. Some of these ensure that every principal delivers messages in the same order. An application can use this property to ensure that updates occur in the same order everywhere. Total causal ordering, for example, is provided by the Isis ABCAST protocol [Birman90]. Other orderings respect potential causality [Lamport78]. If there is any possibility that the contents of one message depend on the effects of another message, the ordering component guarantees that the other message will be delivered first. The Isis CBCAST protocol provides this ordering.

Message ordering guarantees can be limited just to message senders, to the principal group, or among all principals anywhere in the network. The FIFO guarantee is limited to message senders, and can be useful when each principal is sending out an independent stream of updates. Limiting consistency to the group is more common, but it is insufficient when the group must interact with other systems. Ladin’s Lazy Replication mechanism [Ladin91] provides ways to order messages
by any potential causal relation that can be detected by a principal, even those caused by activities outside the group. This guarantee is sometimes called external causal consistency. A message ordering mechanism can be evaluated by the amount of extra information that must be appended to messages, by the amount of state each principal must maintain, and by the delay it imposes between receipt and delivery. Some causally-consistent mechanisms require that messages be tagged with a number of timestamps or message identifiers [Mishra89]. Total orderings can be accomplished with a per-principal counter or timestamp, though the resulting order will not be causal unless the counter or timestamp respects the happens-before relation [Lamport78].
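A per-principal counter of the kind just mentioned can be made to respect happens-before by advancing it past every timestamp observed, in the manner of Lamport's logical clocks [Lamport78]. The sketch below is illustrative, with invented names; principal names break ties so that the resulting order is total.

```python
# Sketch of a totally-ordered timestamp that respects happens-before
# [Lamport78] (illustrative). The counter advances on every local event
# and past every received timestamp, so a message's timestamp always
# exceeds those of the messages it could causally depend on. Ties between
# principals are broken by principal name to make the order total.

class LamportClock:
    def __init__(self, name):
        self.name = name
        self.counter = 0

    def stamp(self):
        """Timestamp a locally originated message."""
        self.counter += 1
        return (self.counter, self.name)

    def observe(self, timestamp):
        """Advance past a received message's timestamp."""
        self.counter = max(self.counter, timestamp[0])
```

Sorting messages by these (counter, name) pairs yields one total order at every principal, and that order is causal because a reply is always stamped after its clock has observed the message it answers.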
3.4.1 Using message ordering
The Tattler does not require a message order because the operation of merging a sample into the database is not order-dependent. A sample represents a range of times that a host was known to be continuously available. When a new sample is to be processed, it will either overlap an existing sample, in which case the two will be combined, or it represents a new range.

The operations on a Refdbms database, on the other hand, are order-dependent. The value of a reference is the value of the last update applied to it. For two principals to record the same value for a reference, they must apply the same updates in the same order. For Refdbms, each update message is tagged with a timestamp from its originator’s clock. Messages are then applied to the database in timestamp order. Recall that every principal has access to a local clock that is loosely synchronized with other clocks, and that every event can be marked with a unique timestamp from that clock.

This simple ordering is total, but it is not necessarily causal. Consider two principals A and B that can communicate with latency δ, where this latency is much smaller than the difference ε between their clocks. A sends a message to B, which then sends another message. The second message is causally dependent upon the first message. However, if the clock at A is ahead of the clock at B, the first message will receive a timestamp greater than that of the second message. Furthermore, this scheme is biased so that messages from principals whose clocks lag behind others will always be applied before those with faster-running clocks. As long as clocks are loosely synchronized to within some ε, and the mean time between updates to a reference is larger than ε, this bias has little effect.

Message ordering can require delaying updates for extended periods. Users, on the other hand, may need to use the results of an update immediately. Refdbms resolves this by making recent database changes available in a pending image of a reference. If there are conflicting updates, the contents of the pending image are only an approximation of the final reference. The pending image is removed when there are no update operations pending for the reference. The pending image
can be retrieved by providing a tag of the form Smith92.pending. This allows citations of pending references to be embedded in a LaTeX document or sent to another user by electronic mail.

My performance evaluation in Chapter 7 shows that the simple total ordering used in Refdbms does not substantially delay message delivery on average. Messages are delayed at most by the maximum difference between clocks, plus the delay between receiving a message and receiving a greater or equal timestamp from every other group member. The difference between clocks is bounded by ε. The performance evaluation of the timestamped anti-entropy protocol shows that the variance in delivery latency is small, so that a message with one timestamp will arrive at about the same time as messages with similar timestamps from other principals.
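The delivery rule just analyzed can be sketched as follows. This is an illustration with invented names, not the Refdbms implementation: a message is held until a timestamp at least as large has been received from every group member, after which no earlier-stamped message can still be outstanding and delivery in timestamp order is safe.

```python
# Sketch of hold-back delivery for a timestamp-based total order
# (illustrative). A received message is held until every member has been
# heard from with an equal or greater timestamp, then delivered in
# timestamp order.

import heapq

class OrderedDelivery:
    def __init__(self, members):
        self.latest = {m: 0 for m in members}   # newest timestamp per member
        self.pending = []                        # min-heap of (ts, sender, msg)
        self.delivered = []

    def receive(self, ts, sender, msg):
        self.latest[sender] = max(self.latest[sender], ts)
        heapq.heappush(self.pending, (ts, sender, msg))
        self._flush()

    def _flush(self):
        # Safe to deliver everything stamped no later than the minimum
        # timestamp heard from each member.
        horizon = min(self.latest.values())
        while self.pending and self.pending[0][0] <= horizon:
            self.delivered.append(heapq.heappop(self.pending)[2])
```

The delay this rule imposes is exactly the quantity measured in Chapter 7: the gap between receiving a message and hearing an equal or greater timestamp from every other member.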
3.5 Group membership
This component is responsible for maintaining the view of what principals make up the group. The group components at different principals exchange messages among themselves, separate from the normal application messages. In some systems these group operation messages are processed by the message ordering component so that group changes are consistent with application messages. For example, every member can observe a principal joining the group at the same point in the message sequence. In the Refdbms and Tattler systems, however, this sort of consistency is not important because none of the operations on group state depend on the membership. Therefore group messages are delivered independent of application update messages.

There are two fundamentally different models for group membership, depending on whether group membership is based on a join/leave protocol or whether it is a process of discovering group members. The first mechanism is used in many existing systems, including Isis, Arjuna, most replication protocols, Refdbms, and the Tattler. The second mechanism has been proposed by Cristian [Cristian91], and works by discovering which principals believe they are members. It generally requires global broadcast, which is infeasible in networks the size of the Internet. This mechanism is not considered further.

Four operations can be performed on the membership view: hosts can join, leave, fail, and recover. The membership component incrementally builds up group membership as principals execute protocols for each of the four operations. Some implementations will also provide a protocol for merging two groups. A principal is considered to be a member if it has successfully executed the join protocol, and it remains so until it executes the leave protocol. This implies that there is some notion of the existence of a group independent of the principals that make it up. It might even be possible for a group to exist without any members.
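The four view operations can be sketched as updates to a timestamped status table. The statuses and function names below are invented for illustration; the actual protocols appear in Chapter 6.

```python
# Sketch of a membership view supporting join, leave, fail, and recover
# (illustrative). Each entry records a status and the timestamp of the
# operation that set it; a newer operation always supersedes an older one.

def apply_op(view, principal, op, ts):
    """Apply a membership operation if it is newer than what we know."""
    status = {"join": "member", "leave": "left",
              "fail": "failed", "recover": "member"}[op]
    if principal not in view or ts > view[principal][1]:
        view[principal] = (status, ts)

def members(view):
    return {p for p, (status, ts) in view.items() if status == "member"}
```

Because operations carry timestamps, two principals that have seen different subsets of membership operations can reconcile their views by applying each other's entries, just as they reconcile application messages.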
Group state management is an essential part of the join and leave protocols. When a principal finishes executing the join protocol, it must have received a copy of the group state. This copy will be derived from the state maintained by one or more principals that were already group members. The new member also must receive a copy of the message log, message summaries, and group view. It is important that this state transfer not violate the message reliability and ordering guarantees provided by the other components. For example, the message log should include any update message that has not yet been applied to the group state, but which has been received by the principals that supplied the state. If it were otherwise, an update message might never be delivered to a new member and its copy of the group state could permanently diverge from other copies. Group membership mechanisms that allow groups to merge must also provide a way to merge the state of both groups. The group membership component must provide a guarantee on its fault tolerance, which is measured by resilience to member failure. Since a principal can only contact member principals in its view, the group membership mechanism will fail if the only principal to know about another fails. The “knows-about” graph is correct if the transitive closure of all views is equal to the group membership. This ensures that every group member can contact every other group member, and that no other principals are in a view at any principal. To ensure that the graph stays correct after as
many as k failures, the minimum vertex-cut of the graph between any two principals must be k + 1 or greater. The group mechanism can also be evaluated by the amount of state each principal must maintain.
Existing mechanisms range from centralized registries to fully distributed systems where every principal is a peer. Few mechanisms require more than O(n) state in the number of group members, and some require only Θ(log n).
3.5.1 Using group membership
I have developed two group membership mechanisms, one that only allows principals to join and leave, the other allowing group merges (Chapter 6). Both implementations maintain a tuple ⟨principal, status, timestamp⟩ for each principal in a view, requiring Θ(n) state at every principal. These protocols ensure fault tolerance by requiring new members to obtain at least k + 1 sponsors among the membership, ensuring that the minimum vertex-cut is never too low. As long as fewer than k member principals fail, the graph will remain connected.

Refdbms uses the join-leave implementation because there is neither any need nor any sensible way to merge two databases. In Refdbms, a partitioned membership view graph will usually cause some updates never to be propagated from one partition to another, because the update will disappear once it has propagated everywhere in the partition. I balanced the expense of obtaining multiple sponsors against these problems, and decided that principals should obtain two sponsors when they join the group. This ensures that the view graph will always be resilient to at least one member failure.

The Tattler uses the implementation that allows group merges because its sampling operation is based on merging sample results. It allows principals to obtain just a single sponsor when joining because the effects of partitioning are not very severe. Tattlers can merge their sample databases after a partition has healed, and no information will be lost. The only negative effect is that some principals in a membership view, or hosts in a polling list, that had been deleted in one partition will reappear when the two are reconnected. This occurs because the record of deletion is maintained only until every principal in the partition has observed it.
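The sponsorship rule can be sketched as a simple admission check. The function below is illustrative, not taken from either implementation; it shows how requiring k + 1 sponsors gives the new member k + 1 initial edges in the knows-about graph, so the graph tolerates k member failures.

```python
# Sketch of the sponsored-join rule (illustrative). A newcomer must
# collect k + 1 sponsors that are already members; its initial
# knows-about edges are those sponsors, so losing any k of them still
# leaves the newcomer connected to the group.

def join(view, newcomer, sponsors, k):
    """Admit `newcomer` only if at least k + 1 current members sponsor it."""
    live = {p for p, status in view.items() if status == "member"}
    valid = [s for s in sponsors if s in live]
    if len(valid) < k + 1:
        raise ValueError("need %d sponsors, have %d" % (k + 1, len(valid)))
    view[newcomer] = "member"
    return set(valid)        # the newcomer's initial knows-about edges
```

With k = 1 this is the two-sponsor rule used by Refdbms; the Tattler's single-sponsor joins correspond to k = 0.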
3.6 Summary
The Refdbms and Tattler applications have been built and are running on the Internet. These represent two of the many kinds of wide-area applications that are likely to become available in the next several years. Both applications were constructed as a collection of principals organized into a weak-consistency principal group. Weak-consistency mechanisms provide fault tolerance and communication efficiency. The applications can tolerate extended host failure and can continue to operate when a principal becomes disconnected from others in the group. Messages can be delayed and batched to reduce the load the applications impose on the Internet. In particular, I have found that the timestamped anti-entropy protocol provides a convenient message delivery mechanism that is flexible enough to support both applications.

I have developed a framework for constructing group communication mechanisms. The framework consists of an application, which defines the semantics of the state shared among the group; a message delivery component, which communicates messages from one member to another; a message ordering component, which assembles the incoming stream of messages into a coherent order and delivers them to the application; and a group membership component, which maintains a view of the membership. Each component can be implemented in many different ways, in order to match the semantics required by the application. Eventually I expect this work to lead to a general-purpose toolkit, but even now it provides a structure for reasoning about and designing applications, and it is a valuable alternative to ad hoc application construction. Some modular architecture of this sort is necessary if wide-area distributed applications are to become common, efficient, and easy to construct.
Building programming language translators was once an expensive process, requiring many years of programmer effort; the separation of compilation into a distinct set of phases and the introduction of interoperable tools for each phase has made compiler-writing a subject for one-semester undergraduate courses. I believe that structuring wide-area applications in this way will yield similar results.
Chapter 4 Existing group communication systems
Several group communication systems have been proposed or built. Many other mechanisms have been developed that provide similar functions under a different name. In this chapter I will survey some existing approaches to constructing group communication mechanisms, discussing how each can be built and the guarantees they provide. These approaches are classified according to the component guarantees presented in Chapter 3 so that they can be compared with each other and with the weak-consistency implementations I have developed. Each section in this chapter concentrates on one particular approach. The first two sections of this chapter cover two general approaches: centralization and consistent replication. While there are many variations on each, they all provide essentially the same guarantees. Since neither approach is well suited to large-scale wide-area systems, I only discuss them briefly. The remaining sections present different protocols or systems, each of which provides group communication in a different way. The systems can be classified by the message reliability, latency, and ordering guarantees they provide. Table 4.1 summarizes the guarantees provided by each of the systems surveyed. The systems are organized vertically by increasing strength of the message ordering guarantee. Columns show the latency and reliability guarantees. The sections that follow are organized roughly from strongest guarantee to weakest. The approaches in the first sections do not work well for large-scale groups, while the later sections discuss systems explicitly built for the wide area.
TABLE 4.1: The group communication systems surveyed. Listed in roughly increasing strength of ordering guarantee.
Message ordering | Reliable, interactive delivery | Reliable, eventual delivery | Unreliable delivery
Unordered | Reliable multicast | Anti-entropy; Tattler; OSCAR | Direct mail; Rumor mongery
Causal | Lazy Replication; ISIS CBCAST; Psync | Lazy Replication |
Total, noncausal | | ε-serializability; Refdbms; OSCAR |
Total, causal | ISIS ABCAST; Centralized systems; Orca RTS; Consistent replication | |
4.1 Centralized protocols
The simplest way to build a wide-area service is to implement a server and allow clients everywhere to connect to it. This is the centralized approach. A centralized group communication system requires that all group members communicate with the central group server to send and receive every message. Many current wide-area services have taken this approach, including the WAIS text-retrieval system [Kahle89, Kahle91], the World Wide Web distributed hypertext system [Berners-Lee92], and the Archie FTP location service [Emtage92b]. A central server is easy to implement and uncomplicated to communicate with. Unfortunately, it is only as available as the host it runs on and the network between it and its clients. If the service becomes popular, a centralized server has no mechanism for spreading load to other systems, which was a problem for the Archie system within a year of its introduction.

The fault tolerance and scalability of a centralized server can be improved by providing additional servers using a primary copy or master-slave approach. Application requests are sent to the primary copy, which synchronously sends the request to all secondary copies. The primary copy is responsible for sequencing operations. When a replica recovers from a failure, or when the primary
fails, an election is held to determine which replica becomes the primary. The Echo file system [Mann89, Hisgen90], for example, combined primary-copy replication with client caching. The Sun Network Interface Service [Sun88] (commonly called the “Yellow Pages” service) also uses primary and secondary servers. The Domain Name Service [Mockapetris87] and the Clearinghouse name service [Oppen81] both combine centralization and replication. In both systems, the name space database is divided into a set of domains, and each domain must be stored at one or more servers. A server may store more than one domain. Some domains have only one server, while other domains are replicated to several servers. The Clearinghouse used epidemic replication (Section 4.10) to maintain multiple copies of a domain.
The quasi-copy technique [Alonso90b, Barbara90, Alonso90a] provides a way to specify bounds on the inconsistency allowed between master and slave copies of data. A user can specify that the value of a copy should differ from the “real” value by no more than some constant, or that it should not be out-of-date by more than some period or number of versions.
4.2 Consistent replication protocols
A replication protocol defines operations on a replicated data object. One principal is a client, and one or more principals are servers or replicas. The protocols provide client-to-server operations to read and write data, and server-to-server operations for adding and removing replicas and for handling replica failure. A principal can act both as a client and as a replica. Every read and write operation is atomic and consistent; the protocol aborts any operation that cannot observe an up-to-date value.

There are three broad classes of replication protocol: available copy, voting, and hybrids. The available copy protocols [Bernstein84, Bernstein87] are sometimes called the “read-one-write-all” protocols. Update operations must be applied at all available replicas, while a client can read from any replica. Correct execution of these protocols requires that the network never partition, and that messages be delivered reliably. In the voting protocols, each replica is assigned one or more votes. Each operation collects votes from replicas, and can proceed when it has collected a quorum of votes. Majority Consensus Voting protocols [Thomas79, Gifford79] assign each replica one
vote, and require each operation to collect a majority of votes. Other voting protocols change vote assignments when replicas fail [Barbara86], or change quorum requirements [Davčev85, Jajodia87, Long88]. The Virtual Partition protocol [El-Abbadi86] is a hybrid between available copy and voting.

Most of these protocols cannot scale to large numbers of replicas and require excessive communication overhead on wide-area networks. The voting protocols generally require communication with several replicas for every operation. All consistent replication protocols block a replica from providing service when it is partitioned from the rest of the network. The protocols rely on interactive message delivery, so they are sensitive to transient network and host failure. I have considered how several replication protocols can be implemented using a group multicast message delivery protocol [Golding92d, Golding91b].
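As a small illustration of the voting approach, the check below decides whether a collected set of votes forms a quorum under majority voting. The representation is invented for illustration; assigning more than one vote to a replica, as some of the protocols above allow, falls out of the same check.

```python
# Sketch of a majority quorum check (illustrative). An operation may
# proceed only when the replicas it has heard from hold a strict majority
# of all votes; any two majorities intersect, so a read always overlaps
# the most recent write.

def has_quorum(votes_collected, vote_assignment):
    """True when the collected replicas hold a strict majority of votes."""
    total = sum(vote_assignment.values())
    collected = sum(vote_assignment[r] for r in votes_collected)
    return 2 * collected > total
```

The intersection property is what buys consistency, and the cost is what the text describes: every operation must reach several replicas, and a replica cut off from a majority can serve no operations at all.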
4.3 Orca RTS
The Orca programming language [Bal89, Bal90] provides language constructs for distributed programming. Distributed applications are written in terms of shared data objects that can be accessed by any cooperating process. The shared objects are similar to a distributed shared memory, except that each object is a structured encapsulation of data values and operations. All update operations on distributed shared objects are serializable; that is, their effects are the same as if they had been applied in a serial order on a single central copy of the object.

The Orca compiler generates code that uses a run time system to manipulate shared objects. The run time system includes a group communication system and a process manager. Bal discusses three distributed run time systems, as well as one for a shared-memory multiprocessor [Bal89, Chapter 6]. One of the distributed implementations relied on reliable multicast, one on unreliable multicast, and the third used Amoeba RPC [Mullender90, Mullender86].

The first implementation is based upon a reliable, interactive multicast protocol that delivers update messages in a total order. Every process is multithreaded, and contains an object manager thread plus some application threads. When a thread issues an update operation, it multicasts an update message to every process and then blocks. The object manager receives these messages and
executes the operations in the order received. When the update message has been executed at the process that issued the update, the blocked thread is awakened and proceeds. This implementation is simple because the underlying multicast protocol provides semantics that match the requirements of the application data model.

The second implementation, based on an unreliable interactive multicast protocol, is more complex. Processes maintain a count of the messages they have sent, along with a vector of the message counts of other processes. A process appends its message count vector to every outgoing message. When a process receives a message that was multicast by another process, it increments its view of the message count for the other process, then compares the vector on the message with its own. If they do not match, the process has missed some messages and can contact the process for which the counts do not match to obtain the missing ones. Since missing messages are detected only when other messages are received, the run time system periodically generates dummy messages to ensure missing messages are detected in a timely fashion. This results in reliable, interactive message delivery.

The message count vector is similar to the message summary maintained by the timestamped anti-entropy protocol. Both mechanisms summarize the set of messages that have been received, and are used to detect messages that a principal or process has not yet received.

The Amoeba RPC implementation achieves serializability using a primary-copy update protocol. One process is designated to maintain the primary copy of an object, and all write operations are forwarded to that process. Different objects can use different processes to maintain their primary copy. Other processes can maintain read-only secondary copies, which are updated by the primary copy using a two-phase locking protocol.
The run time system dynamically allocates secondary copies at those processes that perform frequent read-only operations.
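The message-count vectors used by the second run time system can be sketched as follows (names invented for illustration). Comparing the vector attached to an incoming message against the local vector reveals exactly which processes have sent multicasts that were missed.

```python
# Sketch of missing-message detection with message count vectors
# (illustrative). Each process counts the multicasts it has received from
# every other process; the sender attaches its own vector to each
# outgoing message. This sketch assumes the receiver has already counted
# the message that carried the vector.

def detect_gaps(my_counts, attached_counts):
    """Return {process: number of multicasts we have missed from it}."""
    return {p: c - my_counts.get(p, 0)
            for p, c in attached_counts.items()
            if c > my_counts.get(p, 0)}
```

A non-empty result tells the receiver which processes to contact for retransmission, which is how the run time system turns an unreliable multicast into reliable delivery.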
4.4 Isis
The Isis distributed programming toolkit [Birman87, Birman91] is without doubt one of the most successful group communication systems yet developed. It has been used to develop many applications, ranging from replicated file systems to financial applications.
Isis provides atomic, interactive delivery with total or causal message orderings. Processes use a group membership service to join and leave process groups, and a process can belong to more than one group. Processes join groups either as a member or as a client. Group multicast is provided using either the ABCAST totally-ordered multicast protocol or the CBCAST causally-ordered protocol.

The newest Isis implementation is expected to be composed of a number of different components. Application processes include a toolkit library that implements some of the higher-level group membership services, and provides access to lower-level components. The basic group communication and membership protocols are implemented in another component, which in turn uses a raw communication component. Other services, including naming and failure detection, are implemented as user-level processes with which an application can communicate.

Isis addresses different problems than weak-consistency mechanisms do. Its intended audience is different: it is aimed at smaller systems that must often provide consistent, interactive service. It coordinates groups of transient processes, unlike the fault-tolerant “processes” assumed for wide-area services. Isis is also intended as a toolkit that even unsophisticated programmers can use, and thus presents a safer, more comprehensive application interface than the framework I have developed.
4.5 Epsilon serializability
Epsilon serializability (ESR) is a correctness criterion for controlling transaction concurrency, and is not specific to distributed systems. Unlike most of the other mechanisms in this chapter, it is concerned with transactions where several operations must be executed as a group. It is an extension of strict serializability that allows transactions to observe controlled inconsistency [Pu91b, Pu91a]. Rather than enforcing a strict ordering on all transactions, perhaps using a two-phase locking protocol, orderings are applied only to update operations.

Pu and Leff [Pu91a] discuss four methods for implementing a replication control protocol that works with an ESR concurrency control protocol. These replica control protocols use asynchronous message propagation. Each update message contains the effects of an entire transaction. The
Ordered Update method executes update transactions in the same order at every process, which is trivial to implement with totally-ordered, reliable, eventual message delivery. The Commutative Operations method is limited to commutative update operations, while the Read-independent Timestamped Updates method is limited to operations that either append information or only overwrite older versions. These methods can use unordered, reliable, eventual message delivery. Some of the methods used in Refdbms for avoiding update conflicts take advantage of commutative and append-only operations. Finally, an optimistic method applies operations right away, then uses compensations to undo the effects of transactions that have caused or observed too much inconsistency or that have violated serializability.

An ESR concurrency control protocol allows applications to limit the amount of inconsistency a transaction can observe. Transactions can acquire a certain number of read-write or write-write locks on objects, locks that would be disallowed under strict two-phase locking. If a transaction attempts to acquire more conflicting locks than the limit, the transaction is blocked. Commutative and timestamped operations introduce additional kinds of locks.

Because ESR requires locking, it is not a good choice for services that must scale to large numbers of replicas. However, it scales better than strictly serialized systems, and the definitions of operation compatibility can be used to build conflict-reduction mechanisms in weak-consistency systems that do not provide concurrency control (Section 3.2.3).
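The Read-independent Timestamped Updates method tolerates unordered delivery because a timestamped overwrite only replaces older data; applying the same set of writes in any order yields the same final state. A minimal sketch (invented names, not from [Pu91a]):

```python
def apply_timestamped_write(store, key, value, ts):
    """Apply a write only if it is newer than what the replica holds.
    Replaying or reordering such writes never changes the final state,
    so unordered, reliable, eventual delivery suffices."""
    old_ts = store.get(key, (None, -1))[1]
    if ts > old_ts:
        store[key] = (value, ts)
```

Two replicas that see the same writes in opposite orders still converge, which is exactly the property that lets this method avoid ordered delivery.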
4.6 Psync
The Psync system [Mishra89] is a complete group communication system that provides causally consistent, atomic, interactive message delivery and group membership operations. The system is based around the Psync communication protocol. Processes start communication by joining a group, which is called a conversation in Psync. Each message has an identifier. The protocol appends causal information to each message, and group members use this information to
construct a graph of the causal relations between messages. The causal information consists of the
identifiers of every message on which this message depends, which requires O(n) space for each message.
Messages are sent to group members using a best-effort multicast. A recipient can detect when it has missed a message, because some other message will name it as a predecessor. The recipient can request a copy of the message it missed from the process that sent the other message.

Psync allows fine-grained control over delivery order. As messages are received and added to the dependence graph, some messages may become deliverable. They are delivered to the application, which performs the corresponding operations. The set of operations can be partitioned into sets of related commutative operations, and each partition assigned a priority. Within a partition, operations that are not causally related are executed in any order, while non-commutative operations are executed in order by priority.

Psync allows processes to join multiple groups, though it only ensures causal consistency within a group. Every message must be qualified by the identifier of the group to which it is sent. Processes join a group by executing a join protocol, after which they will begin receiving group messages. The protocol includes a special operation to remove a temporarily failed process from the group,
and another operation to allow that process to recover.
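The predecessor-based loss detection and deliverability test described above can be sketched as follows. This is a simplified illustration, not the Psync implementation; priorities and commutativity partitions are omitted.

```python
class Conversation:
    """Sketch of Psync-style causal delivery: a message becomes
    deliverable once every message it depends on has been delivered,
    and a named-but-unseen predecessor reveals a lost message."""

    def __init__(self):
        self.received = {}    # message id -> set of predecessor ids
        self.delivered = []   # delivery order seen by the application

    def receive(self, msg_id, preds):
        self.received[msg_id] = set(preds)
        self._deliver_ready()
        # any predecessor named in the graph that we have never received
        # was missed in transit and can be re-requested from its naming sender
        missing = {p for deps in self.received.values() for p in deps
                   if p not in self.received}
        return sorted(missing)

    def _deliver_ready(self):
        done = set(self.delivered)
        progress = True
        while progress:
            progress = False
            for m, deps in self.received.items():
                if m not in done and deps <= done:
                    self.delivered.append(m)
                    done.add(m)
                    progress = True
```

Receiving a message whose dependence list names an unseen message both blocks its delivery and identifies exactly which message to recover.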
The weak-consistency protocols used in Refdbms avoid the overhead of attaching O(n) state to each message by attaching causal information to batches of messages. Each message only carries
a single timestamp, which is used to identify the message, while communication sessions include
O(n) timestamps.

The Consul system [Mishra92] provides a more complete group communication system, adding failure detection, group membership, and stable storage modules to the basic Psync protocol. This modularization of a group communication system includes many more parts than my framework because it includes support for failure detection and recovery. This support is not as important in a weak-consistency system like Refdbms that uses long-lived principals and a message delivery protocol that hides transient host failure.
4.7 A reliable multicast protocol
Garcia-Molina and Kogan [Garcia-Molina88] have developed an internetwork multicast protocol that provides reliable interactive delivery. It uses an unreliable interactive (best-effort) multicast
protocol to try to disseminate a message. Messages include sequencing information that allows receivers to detect when they have missed one or more earlier messages. When a receiver detects one or more missing messages, it contacts another principal to obtain a copy of them. As in the Orca unreliable multicast protocol, senders periodically send null messages if they have not recently sent a useful message, allowing receivers to detect missing messages quickly.

One unique feature of this protocol is that it takes advantage of network topology. Each principal has a prioritized list of other principals, and it selects in order from that list when it needs to contact another principal to retrieve a missing message. The protocol includes an algorithm for building priority lists that form a spanning tree over the principals. It uses communication distance between hosts, or the number of network links that messages between the two must traverse, to order principals.

This protocol is similar to the Orca protocol based on unreliable multicast. Both protocols attach sequencing information to messages, and use this information to detect messages that have been missed. Both send periodic null messages to ensure progress. The way this protocol recovers from missed messages and partitions is similar to the timestamped anti-entropy protocol, except that it tries to deliver messages individually and interactively rather than queuing messages for delivery in batches. The protocol builds a spanning tree over the principals in the group, and messages are propagated along edges in that tree rather than between arbitrary pairs of principals. If the spanning tree is built carefully, this approach can minimize network traffic, though it can increase the time required to propagate a message.
It is not clear how to build a group membership system that properly recomputes the spanning tree as principals join and leave a group without either centralizing the computation or involving the entire group.
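The priority-list idea can be illustrated with a minimal sketch (invented names; the actual algorithm builds lists that jointly form a spanning tree, which is not attempted here):

```python
def priority_list(me, hops):
    """Order the other principals by communication distance (link count),
    nearest first, so that recovery of a missing message is attempted
    from nearby hosts before distant ones.  `hops[(a, b)]` gives the
    number of network links between hosts a and b."""
    others = {b for (a, b) in hops if a == me}
    # ties broken by name so the ordering is deterministic
    return sorted(others, key=lambda b: (hops[(me, b)], b))
```

A principal recovering a missing message would simply try the resulting list in order until some partner supplies a copy.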
4.8 OSCAR
OSCAR (Open System for Consistency and Replication) [Downing90a, Downing90b] implements weak-consistency replication using a mixture of distributed and centralized elements. It provides reliable eventual message delivery, with a variety of message orderings. The system is notable
because it is implemented as a set of cooperating components, though the architecture is significantly different from that in Chapter 3.

Every replica is paired with a replicator and a mediator. During normal operation, whenever an update is initiated at a replica the corresponding replicator sends an unreliable multicast message to other replicators. The message includes a version number and timestamp. The replicators use this information to present update messages to their local replicas in a correct order. From time to time a master mediator polls every replicator, obtaining a version vector for every database item. The version vectors from different replicas are combined, and the result is multicast to every replicator. Replicators use the vector to detect messages they have missed, and to determine when messages in their local logs can be purged.

When the network partitions, one mediator becomes master in each partition. Mediators are prioritized, and a low-priority mediator becomes active when it has received no messages from higher-priority mediators for a certain period. When the network partition is repaired, the lower-priority mediator will become dormant, while the higher-priority mediator takes over for the entire partition. Updates missed by replicas in one partition or the other will be propagated the next time the mediator broadcasts a version vector.

This approach is not as efficient as my weak-consistency protocols over wide-area networks, and cannot scale as well. Replicators perform an interactive multicast every time they send a message. This causes much more network traffic than pairwise message exchange, which can approximate an optimal spanning tree. Likewise, mediators must communicate interactively with replicators, which causes a similar amount of network traffic and can overload the mediator if the group grows too large.
Further, this approach cannot function if the logical network topology is not completely connected, while the timestamped anti-entropy protocol can.
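The mediator's version-vector step can be sketched as follows (a simplified illustration with invented names):

```python
def combine_version_vectors(vectors):
    """Sketch of an OSCAR-style mediator step: take the elementwise
    maximum of the version vectors reported by each replicator.
    Multicasting the combined vector lets every replicator detect
    updates it has missed."""
    combined = {}
    for vec in vectors:
        for item, version in vec.items():
            combined[item] = max(combined.get(item, 0), version)
    return combined

def missed_updates(local, combined):
    """Items for which the combined vector is ahead of this replica,
    i.e. updates this replica must still obtain."""
    return sorted(item for item, v in combined.items()
                  if v > local.get(item, 0))
```

A replicator can also use the combined vector for log purging: any logged update already reflected in every reported vector is safe to discard.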
4.9 Lazy Replication
The Lazy Replication system [Ladin90, Ladin91, Liskov87] provides reliable eventual message delivery with a mix of causally and totally consistent orderings. Applications can specify exactly what causal relations should be enforced between messages, so weaker orderings can be specified
by omitting some specification. Applications can also specify that causal relationships caused by events outside the group should be considered when ordering messages.

The system uses message count vectors much like those in the Orca unreliable multicast RTS (Section 4.3). The message count vector summarizes a set of messages. Two message count vectors are attached to each message: one that specifies what messages must be delivered before this one, the other serving as a unique identifier. Each principal maintains a vector summarizing the messages that have been applied to the database, along with copies of the vectors from other principals.

The Lazy Replication protocol resembles a version of the timestamped anti-entropy protocol that allows unsynchronized clocks (Unsync TSAE). This protocol is detailed in Section 5.4.4. Both protocols store incoming messages in a log, and use message count or timestamp vectors to summarize sets of messages. Both lazily update principals, and use the vectors to guide message exchange and delivery.

The timestamped anti-entropy protocol is more efficient than Lazy Replication, because Lazy Replication does not take full advantage of the information available in its timestamps. It causes principals to transmit a message from one to the other many times, even when the message has already
been observed. It also requires both Θ(n²) state for acknowledgments and an extra message log to ensure that duplicate messages are not processed twice, even though it requires loosely-synchronized clocks for good performance. The TSAE protocols can use Θ(n) state when loosely-synchronized clocks are available. Lazy Replication performs one-way updates in its gossip messages, instead of bidirectional updates.
4.10 Epidemic replication
The Xerox Clearinghouse service [Oppen81] is the name and location service for Xerox Network Systems. It maintains a distributed database that maps structured names into a set of properties, such as network addresses. The names are organized into a three-level hierarchical space, structured into organizations, domains within organizations, and local names within domains. The mappings for each domain are stored at one or more Clearinghouse servers. Clients can request any server to look up the binding for a name, and the server will forward the request to the appropriate server if necessary.

The Clearinghouse uses three different mechanisms to propagate updates between servers: direct mail, anti-entropy, and rumor mongery [Demers88, Demers89]. Direct mail is simply an unreliable best-effort multicast. Rumor mongery provides unreliable eventual delivery, but it is more reliable than a one-time best-effort multicast. To be a rumor monger, a principal selects another principal and sends it one or more hot rumors. Hot rumors are recent update messages that the principal believes the other is not likely to have observed. Several different stopping rules will ensure that a message does not continue propagating forever, but none of the rules can ensure that a message has been propagated to every principal before stopping.

The Clearinghouse anti-entropy protocol guarantees reliable eventual delivery, as does the timestamped anti-entropy protocol. To execute the protocol, one principal selects another, and the two exchange update messages until they are mutually consistent. Unlike timestamped anti-entropy, messages are not timestamped, and the protocol does not maintain summaries of the messages that have been received. Instead, database contents are exchanged directly, using checksums to determine when enough entries have been exchanged to make both principals mutually consistent. Demers et al.
describe heuristics for reducing the expense of this exchange, including building an index on the message log so messages can be ordered from most recently received to least recent.

This anti-entropy protocol provided inspiration for the timestamped anti-entropy protocol. However, the Clearinghouse protocol can only provide unordered message delivery because it operates from the database rather than a message log. It provides no mechanism to detect when a message has been received everywhere, so database entries cannot be deleted reliably, and a principal can receive a message more than once. The Clearinghouse protocols therefore could not be used for applications like Refdbms that need stronger guarantees.
4.11 Summary
Many group communication systems have been proposed and implemented. They provide guarantees ranging from atomic, synchronous, totally-ordered message delivery to unreliable eventual delivery. Only a few provide reliable eventual delivery, the guarantee used in the Tattler and Refdbms systems. The Clearinghouse anti-entropy protocol and Lazy Replication are closest to the timestamped anti-entropy protocol presented in the next chapter.

The weak-consistency protocols I have developed improve on these systems by providing a combination of efficiency and well-defined guarantees. In particular, the timestamped anti-entropy protocol delivers a message at most once to any principal, allows correct detection of fully-acknowledged messages, and can be composed with a number of message ordering components. It improves efficiency by transmitting messages in batches rather than singly.
Chapter 5 Weak-consistency communication
The previous chapters introduced a framework for building a group communication system as the basis of a wide-area application. In this chapter I focus attention on the message delivery and ordering components of the framework. These two components deliver application messages to group members, and ensure that the messages are delivered in order. As discussed in Chapter 3, there are several guarantees that can be provided by an implementation of either component. For the delivery component, the timestamped anti-entropy protocol provides reliable, eventual delivery. Various implementations of the corresponding message ordering component can provide any ordering guarantee.
5.1 Reliable, eventual message delivery
Systems like Refdbms and the Tattler use the timestamped anti-entropy protocol to build a message delivery component that provides reliable, eventual delivery. This means that every functioning group member will receive every message, but a message may require a finite but unbounded time for delivery. These guarantees reflect a tension between an application’s needs for timely information, accurate information, and reliability. Reliable delivery was chosen because information services are generally expected to provide authoritative answers to queries. If one Refdbms replica were to miss an update, for example, the database copy at that replica could permanently diverge from others. Systems like location services that provide hints rather than authoritative answers are good candidates for unreliable mechanisms [Terry85, Oppen81, Jul88, Fowler85].
Eventual delivery trades latency for reliability. The message delivery component can mask transient network and host failures by delaying messages and resending them after the fault is repaired. It also allows messages to be batched together for transmission, which is often more efficient than transmitting each message singly. Both of these features are especially important for mobile systems that can be disconnected from the Internet for extended periods.

The timestamped anti-entropy (TSAE) protocol is similar to the anti-entropy protocol used in the Xerox Clearinghouse [Demers88, Demers89]. Each principal periodically contacts another principal, and the two exchange messages from their logs until both logs contain the same set of messages. The TSAE protocol maintains extra data structures that summarize the messages each principal has received, and uses this information to guide the exchanges. There are two versions of the TSAE protocol: one that requires loosely-synchronized clocks, and one that does not. The loosely-synchronized version is presented in this section, while the general version is deferred until Section 5.4.4.

One important feature of TSAE is that it delivers messages in batches. Consider the stream of messages sent from a particular principal. Those messages will be received by other principals in batches, where each batch is a run of messages, with no messages missing in the middle of the run. When a principal receives a batch, the run of messages will immediately follow any messages already received from that sender. In this way principals receive streams of messages, without ever observing a “gap” in the sequence.

The TSAE protocol provides additional features that are necessary for information services. The protocol can be composed with a message ordering component to produce specific message ordering guarantees. The ordering component makes use of the batching property to reduce overhead.
TSAE provides positive acknowledgment when all principals have received a message, so that the message can be removed from logs and so that death certificates can be removed from the database. It also provides a mechanism for two principals to measure how far out of date they are with respect to each other. Applications can use this feature to prioritize communication when resources are limited, and to prompt users of mobile systems to temporarily connect to a high-bandwidth link.
class timestamp {
    hostId host;
    clockSample clock;

    Boolean sameHost(timestamp t);
    Boolean lessThan(timestamp t);
    // (etc...)
};

timestamp CurrentTimestamp();
// returns a unique timestamp for the local host

FIGURE 5.1: The timestamp data structure.

In this chapter, the group is assumed to have a fixed membership M of n principals. This restriction is removed in Chapter 6. Chapter 7 explores the performance of these protocols.
5.1.1 Data structures for timestamped anti-entropy
Timestamps are used in every component to represent temporal relations and to name events. As shown in Figure 5.1, a timestamp consists of a sample of the clock at a host, and is represented as the tuple ⟨hostId, clock⟩. The clock resolution must be fine enough that every important event in a principal, such as sending a message or performing anti-entropy, can be given a unique timestamp. Timestamps are compared based only on their clock samples, so that timestamps from different hosts can be compared. The more specialized case of only comparing samples from a single host is a trivial extension. Each host provides a function to retrieve a current timestamp; this function will be named CurrentTimestamp in this dissertation.

Timestamps can be organized into timestamp vectors. A timestamp vector is a set of timestamps, each from a different principal, indexed by their host identifiers (Figure 5.2). It represents a snapshot of the state of communication in a system. In particular, it represents a cut of the communication. Mattern [Mattern88] provides a well-written introduction to the use of time measures such as cuts in reasoning about the global states of distributed systems.
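As an illustrative aside, the timestamp interface of Figure 5.1 might be rendered in Python as follows. This is a sketch, not the dissertation's implementation; a monotonic counter stands in for a fine-grained host clock.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timestamp:
    host: str          # hostId
    clock: int         # clockSample

    def same_host(self, other):
        return self.host == other.host

    def less_than(self, other):
        # timestamps are compared on clock samples only, so samples
        # from different hosts can be compared
        return self.clock < other.clock

_clock = 0

def current_timestamp(host):
    """Return a unique timestamp for the local host; the counter stands
    in for a clock with sufficiently fine resolution."""
    global _clock
    _clock += 1
    return Timestamp(host, _clock)
```

The counter guarantees the uniqueness property the text requires: no two events at a host ever receive the same timestamp.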
typedef set⟨principalId, timestamp⟩ timestampSet;

class timestampVector {
    timestampSet timestamps;

    // update the entry for one principal
    update(principalId, timestamp);
    // merge in another vector, taking the elementwise maximum
    updateMax(timestampVector);

    // determine temporal relation of a timestamp
    Bool laterThan(timestamp);
    Bool earlierThan(timestamp);
    Bool concurrentWith(timestamp);

    // determine temporal relation of another vector
    Bool laterThan(timestampVector);
    Bool earlierThan(timestampVector);
    Bool concurrentWith(timestampVector);

    // return the smallest timestamp in this vector
    timestamp minElement();
};

FIGURE 5.2: The timestamp vector data structure.
Some of the operations on a timestamp vector deserve special mention. Two timestamp vectors can be combined by computing their elementwise maximum. A timestamp is considered later than a timestamp vector if the vector contains a lesser timestamp for the same host. A timestamp is considered concurrent with a vector if either the vector has exactly the same timestamp for the same host, or the vector does not include a timestamp for the host. Note that these comparisons are limited to one host, and do not consider possible temporal relations between different hosts. One timestamp vector is later than another if every timestamp in the first vector is greater than or equal to the corresponding timestamp in the other, and the two vectors are not equal.

Each principal executing the TSAE protocol must maintain three data structures: a message log and two timestamp vectors (Figure 5.3). These data structures must be maintained on stable storage, so they are not lost when the host or principal crashes.
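The vector operations just described might be sketched as follows (illustrative Python, with timestamps simplified to integers indexed by host identifier):

```python
class TimestampVector:
    """Sketch of the timestamp vector operations of Figure 5.2."""

    def __init__(self, stamps=None):
        self.stamps = dict(stamps or {})

    def update(self, host, ts):
        """Update the entry for one principal."""
        self.stamps[host] = ts

    def update_max(self, other):
        """Merge in another vector, taking the elementwise maximum."""
        for host, ts in other.stamps.items():
            self.stamps[host] = max(self.stamps.get(host, 0), ts)

    def later_than(self, other):
        """Every entry at least the other's, and the vectors differ."""
        ge = all(self.stamps.get(h, 0) >= ts
                 for h, ts in other.stamps.items())
        return ge and self.stamps != other.stamps

    def min_element(self):
        """Return the smallest timestamp in this vector."""
        return min(self.stamps.values())
```

The elementwise maximum is the operation principals apply to their summary and acknowledgment vectors at the end of an anti-entropy session; min_element is what produces the acknowledgment timestamp.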
timestampVector summary;
timestampVector ack;

typedef list⟨principalId, timestamp, message, delivered⟩ msgList;

class msgLog {
    msgList messages;

    // manipulate messages in the log
    add(principalId, timestamp, message);
    deliver(principalId, timestamp);
    remove(principalId, timestamp, message);

    // query the log for all messages newer than some vector
    msgList listNewer(timestampVector);
    // query the log for all messages older than some timestamp
    msgList listOlder(timestamp);
    // query for all messages sent by a principal between two timestamps
    // (can use special value 'ANY' for principalId)
    msgList listMsgs(principalId, timestamp first, timestamp last);
};

msgLog log;

FIGURE 5.3: Data structures used by the TSAE communication protocol.
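As a sketch of how the log's central query supports anti-entropy (illustrative Python, not the dissertation's code; timestamps are integers and the summary vector a plain dict):

```python
class MsgLog:
    """Sketch of the message log of Figure 5.3: timestamped messages
    plus the listNewer query used during an anti-entropy session."""

    def __init__(self):
        self.messages = []   # (principal, timestamp, message) triples

    def add(self, principal, ts, message):
        self.messages.append((principal, ts, message))

    def list_newer(self, summary):
        """All messages later than the events recorded in a summary
        timestamp vector: exactly what a partner is missing."""
        return [(p, ts, m) for (p, ts, m) in self.messages
                if ts > summary.get(p, 0)]
```

Given a partner's summary vector, list_newer selects precisely the run of messages the partner has not yet received.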
The message log contains messages that have been received by a principal. A timestamped message is entered into the log on receipt, and removed when all other principals have also received it. Messages are eventually delivered from the log to the application. The log includes functions to retrieve messages that were sent later than the events recorded in a timestamp vector.

Not all applications will use a message log. Many applications, including the Tattler, can operate directly from the copy of group state maintained by the application component, as discussed in Section 3.3.1. In that case the application must provide an interface to retrieve “messages” along with the associated principal identifier and timestamp from the group state.

Principals maintain a summary timestamp vector to record what updates they have observed. The vector contains one timestamp for each group member, and the principal has received every message from that member with a lesser or equal timestamp. Figure 5.4 shows how the summary vector relates to the messages in the log.

FIGURE 5.4: How the summary vector summarizes the messages in the log. Each message is marked with its timestamp. The timestamped anti-entropy protocol ensures that there are no “gaps” in the sequence of messages in the log, so that the last timestamp stands for every previous message.

Recall that messages are transmitted in batches, and that there are never gaps in the message sequence, so the timestamp of the latest message indicates that every message with an earlier timestamp has been received. The summary vector provides a fast mechanism for transmitting information about the state of a principal. It is used during an anti-entropy exchange to determine which messages have not yet been received by a principal, and two principals can compare their summary vectors to measure
how far out of date they are with respect to each other.
Formally, the summary vector maintained by principal A is written summary_A; the subscript will be omitted when it is clear from context. Principal A records a timestamp t for principal B in summary_A(B) when A has received all messages generated at B with timestamps less than or equal to t. The timestamp is measured from the clock at B. Each principal maintains one such timestamp for every principal in the group.

Each principal also maintains information about message acknowledgments. Rather than explicitly send an acknowledgment for every message, the information in the summary vector is used to build a summary acknowledgment. As long as principals use loosely-synchronized clocks, the smallest timestamp in the summary vector can be used as a single acknowledgment timestamp for all messages received by the principal (Figure 5.5). Clearly every message with a timestamp less than or equal to the minimum has been received, though there are later messages that are not yet acknowledged. The TSAE protocol ensures that the acknowledgment timestamp makes forward progress, so every message will eventually be acknowledged.

FIGURE 5.5: Summary and acknowledgment vectors for principals with loosely-synchronized clocks. The dark cell in the acknowledgment vector contains the minimum timestamp from the summary vector, while the other cells contain copies, usually slightly out of date, of the minima from other principals.

Each principal stores a copy of the acknowledgment timestamp of every other group member. The TSAE protocol propagates acknowledgment timestamps just as it propagates application
messages.
The acknowledgment timestamp vector at principal A is written ack_A. Principal A holds a timestamp t for principal B as ack_A(B) if B has received every message from any sender with timestamps less than or equal to t. Principal B periodically sets its own entry in its acknowledgment vector, that is, ack_B(B), to the minimum timestamp recorded in its summary vector (Figure 5.5).

A principal can determine that every other group member has observed a particular message when the message timestamp is earlier than all entries in the local ack vector. This feature is used to purge messages from the log safely, and in handling dynamic group membership (Chapter 6).
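These acknowledgment rules can be sketched as follows (illustrative Python; summary and ack vectors are dicts of integer timestamps):

```python
def ack_timestamp(summary):
    """A principal's acknowledgment timestamp is the minimum entry of
    its summary vector: every message with a timestamp at or below it
    has certainly been received by that principal."""
    return min(summary.values())

def can_purge(msg_ts, ack_vector):
    """A message may be purged from the log once every group member's
    acknowledgment timestamp has reached its timestamp, i.e. once
    every member is known to have observed it."""
    return all(msg_ts <= ack for ack in ack_vector.values())
```

The same test governs when death certificates can be removed from the database: once a deletion message is acknowledged everywhere, no replica can reintroduce the deleted entry.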
5.1.2 The timestamped anti-entropy protocol
The anti-entropy protocol maintains the timestamp vectors and message log at each principal. It
does so by periodically exchanging messages between pairs of principals.

From time to time, a principal A will select a partner principal B and start an anti-entropy session. A session begins with the two principals allocating a session timestamp, then exchanging their summary and acknowledgment vectors. Each principal determines if it has messages the other has not yet received, by seeing if some of its summary timestamps are greater than the corresponding ones of its partner. These messages are retrieved from the log and sent to the other principal using a reliable stream protocol. If any step of the exchange fails, either principal can abort the session, and any changes made to the state of either principal are discarded. The session ends with an exchange of acknowledgment messages.
At the end of a successful session, both principals have received the same set of messages. Principals A and B set their summary and acknowledgement vectors to the elementwise maximum of their current vector and the one received from the other principal.

Figure 5.6 shows what might happen to two principals during an anti-entropy session. The two principals start with the logs shown at the top of the figure, where A has messages from itself and from C that B has not yet received, while B has sent messages that A has not received. They determine which messages must be sent by comparing their summary vectors (middle row), discovering that the lightly shaded messages must be sent from A and the darker shaded messages must be sent from B. At the end of the session, both principals have received the same set of messages and update their summary vectors to the value shown in the bottom row.

Figure 5.7 details the protocol executed by a principal originating an anti-entropy session, while Figure 5.8 shows the corresponding protocol the partner principal must follow.

After anti-entropy sessions have completed, the message ordering component can deliver messages from the log to the database and purge unneeded log entries. It uses the summary and acknowledgment vectors to guide this process, as discussed in Sections 5.3 and 5.5.

By the end of an anti-entropy session, the originator and partner principals have both received any messages sent by either one of them up to the time the exchange started. In addition, one or
FIGURE 5.6: An example anti-entropy session. Principals A and B begin with the logs in the top of the figure. They exchange summary vectors, discovering that the shaded messages must be exchanged. After the exchange, they update their summary vectors to the bottom vector.
both will probably have received messages from principals other than their partner. In the example in Figure 5.6, principal A forwarded messages originally sent by C. This means that one principal need not directly contact another to receive its messages. Instead, some sequence of principals exchanging messages can eventually propagate the message. The correctness of TSAE is based on the reliability of this kind of diffusion, as discussed in the next section.
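The session just described can be condensed into a small executable sketch (Python, with invented names; a real TSAE principal uses host-clock timestamps, acknowledgment vectors, and stable storage, all omitted here):

```python
class Principal:
    """Minimal TSAE principal: a log of (sender, ts, msg) triples and a
    summary vector mapping each sender to the latest contiguous
    timestamp received from it."""

    def __init__(self, name, members):
        self.name = name
        self.log = []
        self.summary = {m: 0 for m in members}
        self.clock = 0   # per-principal counter standing in for a clock

    def send(self, msg):
        self.clock += 1
        self.log.append((self.name, self.clock, msg))
        self.summary[self.name] = self.clock

    def _newer_than(self, their_summary):
        """Messages in our log that the other's summary shows it lacks."""
        return [(p, ts, m) for (p, ts, m) in self.log
                if ts > their_summary[p]]

def anti_entropy(a, b):
    """One session: each side sends the messages the other lacks, then
    both take the elementwise maximum of the two summary vectors."""
    for receiver, msgs in ((b, a._newer_than(b.summary)),
                           (a, b._newer_than(a.summary))):
        receiver.log.extend(msgs)
    for p in a.summary:
        a.summary[p] = b.summary[p] = max(a.summary[p], b.summary[p])
```

After one session the two logs and summary vectors are identical, and a third principal that later contacts either of them receives the messages of both, which is the diffusion property the correctness argument relies on.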
// Information about the partner
principalId partner;
timestampVector partnerSummary, partnerAck;
// A temporary copy of the local summary and ack vectors, to avoid
// timing problems with concurrent sessions
timestampVector localSummary, localAck;
msgList messages;

partner = selectPartner();

// update local vectors and exchange them with partner
summary[thisPrincipal] = CurrentTimestamp();
localSummary = summary;
ack[thisPrincipal] = minElement(localSummary);
localAck = ack;
send(partner, "AE request", localSummary, localAck);
receive(partner, partnerSummary, partnerAck);

// exchange messages
messages = log.listNewer(partnerSummary);
for ⟨pid, timestamp, message⟩ in messages:
    send(partner, pid, timestamp, message);
while (receive(partner, pid, timestamp, message)):
    log.add(pid, timestamp, message);

// finish communication
send(partner, "Acknowledged");
receive(partner, "Acknowledged");

// update summaries and trigger the message ordering component
summary.updateMax(partnerSummary);
ack.updateMax(partnerAck);
DeliverMessages();
FIGURE 5.7: Originator’s protocol for TSAE with loosely-synchronized clocks. Note that error handling is not included to make the presentation readable. An anti-entropy session is aborted if either principal detects an error in communication, in which case any updates to the message log or timestamp vectors are discarded. The acknowledgment vector is updated by this protocol but used by the message ordering and log purging functions.
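The two protocols of Figures 5.7 and 5.8 can be condensed into a single executable sketch. The following Python rendering is mine, not the dissertation's code: networking is replaced by direct data access, acknowledgment vectors and error handling are omitted, and `merge_max` stands in for the figures' `updateMax`. It shows the essential invariant: after one session, both principals hold the same message set and the same summary vector.

```python
class Principal:
    def __init__(self, pid):
        self.pid = pid
        self.log = set()        # {(sender, timestamp, message)}
        self.summary = {}       # sender -> latest timestamp received

    def list_newer(self, their_summary):
        # log entries the other principal's summary vector does not cover
        return {(p, t, m) for (p, t, m) in self.log
                if t > their_summary.get(p, 0)}

def merge_max(vec, other):
    # element-wise maximum of two timestamp vectors (cf. updateMax)
    for p, t in other.items():
        vec[p] = max(vec.get(p, 0), t)

def anti_entropy(orig, part):
    # 1. exchange copies of the summary vectors, as in the figures
    o_sum, p_sum = dict(orig.summary), dict(part.summary)
    # 2. each side sends the log entries the other has not seen
    part.log |= orig.list_newer(p_sum)
    orig.log |= part.list_newer(o_sum)
    # 3. merge the vectors; both now summarize the union of the logs
    merge_max(orig.summary, p_sum)
    merge_max(part.summary, o_sum)

A, B = Principal("A"), Principal("B")
A.log = {("A", 12, "x"), ("A", 13, "y")}; A.summary = {"A": 13}
B.log = {("B", 11, "u"), ("C", 2, "v")};  B.summary = {"B": 11, "C": 2}
anti_entropy(A, B)
assert A.log == B.log and A.summary == B.summary
```

B's entry from C also reaches A here, illustrating the indirect diffusion discussed above: A never contacted C directly.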
// Information about the other principal
principalId originator;
timestampVector originatorSummary, originatorAck;
// A temporary copy of the local summary and ack vectors, to avoid
// timing problems with concurrent sessions
timestampVector localSummary, localAck;
msgList messages;

// receive request from originator and update local state
receive(originator, "AE request", originatorSummary, originatorAck);
summary[thisPrincipal] = CurrentTimestamp();
localSummary = summary;
ack[thisPrincipal] = minElement(summary);
localAck = ack;
send(originator, localSummary, localAck);

// exchange messages
while (receive(originator, pid, timestamp, message)):
    log.add(pid, timestamp, message);
messages = log.listNewer(originatorSummary);
for ⟨pid, timestamp, message⟩ in messages:
    send(originator, pid, timestamp, message);

// finish communication
receive(originator, "Acknowledged");
send(originator, "Acknowledged");

// update summaries and trigger the message ordering component
summary.updateMax(originatorSummary);
ack.updateMax(originatorAck);
DeliverMessages();
FIGURE 5.8: Partner’s protocol for TSAE with loosely-synchronized clocks.
5.2 Correctness
In this section I define correctness for reliable eventual delivery protocols, and establish it for TSAE. Discussion is limited to the version of TSAE that uses loosely-synchronized clocks. Recall that the term eventually was defined in Chapter 2 to mean that an event occurs in a finite
but unbounded time after some time t.
The reliability condition can then be stated formally:

Condition 5.1 If a message is sent by principal p at real time t, the message will eventually be received at any group member principal q.
Note that correctness is defined in terms of time as might be measured by an external observer, and not in terms of virtual time or clocks [Mattern88]. There are two reasons for this choice. First, I found it easier to reason about behavior of the protocol using time rather than clocks, especially when replica and message failure is involved. In addition, I believe that this formulation is more useful, since the intended application of these techniques is for applications interacting with people and physical devices. This implies that the applications will have channels of communication outside of the group communication system, and outside the domain covered by any virtual time measure. This is the same problem that motivated Lamport’s Strong Clock Condition [Lamport78]. Recall also that the group membership is assumed static, that the network need not be fully connected, and that principals do not fail permanently. The static membership limitation is removed in Chapter 6. Using the TSAE protocol, every principal periodically attempts to perform an anti-entropy session with each of its neighbor principals to deliver a message. Eventually, sessions will succeed with each of them, propagating the message. Each of those principals in turn will eventually contact all of their neighbors, and so on until all principals have received the message. In the next several sections I detail a proof of this property.
5.2.1 Logical communication topology
Consider the communications between principals. Recall that the network is connected but not necessarily complete (Section 2.4). The relation T defines the logical topology of the network by specifying which hosts are neighbors; P is the set of all principals.

Definition 5.2 The logical principal topology graph G_T = (P, T) is an undirected graph. The relation T : P → P is a symmetric relation that defines which pairs of principals can exchange messages, that is,

(A, B) ∈ T ↔ A, B ∈ P ∧ A can communicate with B.
The logical topology graph G_T must be connected. In an environment such as an Ethernet, G_T is the complete graph C_|P|, that is, every principal can communicate with every other principal. In systems such as Usenet, G_T ≠ C_|P| because each node only communicates with a few other nodes. While the physical links in most internetworks do not form a complete graph, the logical communication topology provided by the IP protocol is a complete graph. In practice the Internet is composed of a few communication cliques.
Definition 5.3 Communication cliques are completely connected subgraphs of the topology graph G_T. If a principal is a member of a communication clique, it can communicate with every other principal in that clique.
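Since the correctness argument depends on G_T being connected, it may help to see that requirement operationally. The sketch below (a hypothetical helper, not from the dissertation) checks connectivity of a logical topology by breadth-first search: every principal must be reachable from every other via some chain of neighbors, even if no direct edge exists.

```python
from collections import deque

def is_connected(principals, edges):
    """principals: a set of ids; edges: unordered pairs drawn from the
    symmetric relation T. Returns True iff G_T = (P, T) is connected."""
    if not principals:
        return True
    neighbors = {p: set() for p in principals}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)   # T is symmetric: A can talk to B iff B to A
    start = next(iter(principals))
    seen, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        for q in neighbors[p] - seen:
            seen.add(q)
            queue.append(q)
    return seen == set(principals)

# A ring of four principals is connected even though it is far from
# the complete graph C_|P|:
ring = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")}
print(is_connected({"A", "B", "C", "D"}, ring))   # -> True
```

An application restricting its topology to a tree or ring, as discussed below, would want exactly this kind of check before adopting the restricted graph.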
An application can elect to restrict the logical topology graph to a subset of the topology provided by the network protocols on which it is built. For example, it may be advantageous to structure the principals into a tree or a ring. Recall that there are no permanent failures in the communication network or principals. That is, any pair of principals connected in the logical topology graph can eventually successfully send and receive a message; more formally, the probability that one principal cannot successfully send a message to another principal during the time interval (t, t + Δ) goes to zero as Δ → ∞.
5.2.2 Eventual communication
The first step of the proof is to show that the TSAE protocol eventually performs anti-entropy
sessions between every pair of principals connected in the logical topology graph.
Definition 5.4 (Attempted communication relation) The relation A(t, t + Δ) : P → P is the set of ordered pairs of principals where the first principal has attempted to send one or more messages to the second principal during the period (t, t + Δ).
I assume that A ⊆ T, so that no attempt is ever made to communicate between two principals that the logical topology prevents from communicating directly. The graph G_A is defined in the obvious way.

Definition 5.5 (Successful communication relation) The relation S(t, t + Δ) : P → P contains a pair for every principal that successfully sent a message to another principal during the time period (t, t + Δ). The graph G_S is defined in the obvious way.

S(t, t + Δ) is clearly a subset of A(t, t + Δ).
Lemma 5.6 (Eventual communication) As Δ → ∞, S(t, t + Δ) converges to A(t, t + Δ) as long as principals periodically retry messages that failed to be delivered.

Proof: If S does not converge to A, then there is a pair (a, b) ∈ A such that (a, b) ∉ S for all times t + Δ. However, if a message failed during a period (t, t + Δ₁), by assumption the message will be retried during some period (t + Δ₁, t + Δ₁ + Δ₂). As Δ₂ → ∞ the probability of the message not getting through goes to zero. Thus the probability of there being a pair (a, b) that have not been able to communicate goes to zero, and S converges to A. □
Principals periodically perform anti-entropy sessions with neighbor principals. I assume there is an upper bound k on the time between attempts to communicate, and that a principal selects its partner so that every neighbor will eventually be selected after any time t. The probability that a principal a has not performed an anti-entropy session with a neighbor b during a period Δ is thus bounded by the probability of not selecting b (where s_b is the probability of selecting neighbor b on each attempt), raised to the power of a lower bound on the number of times a has performed the anti-entropy algorithm:

(1 − s_b)^⌊Δ/k⌋
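A quick numeric evaluation makes the behavior of this bound tangible. The sketch below uses illustrative values of my choosing (four neighbors selected uniformly, one attempt per time unit); it is not data from the dissertation. The probability that a has not contacted b shrinks geometrically with the observation period Δ.

```python
import math

def miss_probability(s_b, delta, k):
    """Upper bound on P(a never selected neighbor b during (t, t + delta)),
    i.e. (1 - s_b) raised to floor(delta / k)."""
    attempts = math.floor(delta / k)   # lower bound on anti-entropy attempts
    return (1 - s_b) ** attempts

# With s_b = 0.25 (four neighbors, uniform selection) and k = 1:
for delta in (10, 50, 100):
    print(delta, miss_probability(0.25, delta, 1))
```

The bound for Δ = 10 is 0.75¹⁰ ≈ 0.056, and it falls toward zero as Δ grows, which is exactly the convergence property the proof relies on.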