Distributed Logging for Transaction Processing
Dean Spencer Daniels
December 1988
CMU-CS-89-114

Submitted to Carnegie Mellon University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania

Copyright © 1988 Dean Spencer Daniels

This work was supported by IBM and the Defense Advanced Research Projects Agency, ARPA Order No. 4976, monitored by the Air Force Avionics Laboratory under Contract F33615-87-K-C-1499. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any of the sponsoring agencies or of the United States Government.

Abstract

This dissertation defends the thesis that recovery logs for transaction processing can be efficiently and reliably provided by a highly available network service. Recovery logs are a special implementation of stable storage that transaction processing systems use to record information essential to their operation. Recovery logs must be very reliable and have fast access. Typically, mirrored magnetic disks are dedicated to log storage in high performance transaction systems. Dedicated mirrored disks are an expensive resource for small processors like workstations or nodes in a non-shared memory multiprocessor. However, it is these types of processors that participate in many distributed programs and benefit from the availability of a general purpose distributed transaction facility. Distributed logging promotes reliable distributed computing by addressing the problem of the resources needed by the recovery log for a general purpose distributed transaction processing facility. The distributed logging thesis is defended by discussion of the design, implementation, and evaluation of distributed logging services.
The design issues considered include the global representation of distributed logs, communication, security, log server data structures, log space management, and load assignment. A new distributed algorithm for replicating recovery logs and a new data structure for direct access to log records in append-only storage are presented. The dissertation explores the use of uninterruptible power supplies to implement low-latency non-volatile virtual memory disk buffers. The implementation and evaluation of the Distributed Log Facility for the Camelot Distributed Transaction Facility are described. The Camelot DLF uses the new distributed replication algorithm for representing distributed logs and uses uninterruptible power supplies to implement non-volatile virtual memory. Specially designed protocols are used for communication between clients and log servers in the Camelot DLF. The performance of the Camelot DLF is competitive with the Camelot local log implementation for a variety of benchmarks. The throughput capacity of log servers is reported.

Acknowledgments

This thesis, and my entire graduate career, would not have been possible without support, guidance, encouragement, and friendship from Alfred Spector. I doubt that I will ever participate in collaborations as successful as my eight years (so far) of working with Alfred.

My readers, Maurice Herlihy, Bruce Lindsay, and Rick Rashid, are to be commended for promptly producing helpful comments on (often very rough) drafts of this thesis. Bruce's efforts at quality control are particularly appreciated.

Dean Thompson collaborated in the development of many of the new ideas presented here and implemented portions of the Camelot Distributed Log Facility. The system described here should really be called Deans' logger.

The entire Camelot group, especially Dean, Alfred, Josh Bloch, Dan Duchamp, Jeff Eppinger and Randy Pausch, is to be thanked for providing an exciting family in which to work and play.
Thanks to Steve Berman for procuring the UPSs used for the Camelot DLF.

My graduate career and life in Pittsburgh was a really great experience because of many fine friends. There are too many to list them all, but some deserve special mention, including: Alan, Mike, Ann, Jumpin, Chris, Archie, Steve, David, Dan, Sherri, Doug, Penn, and of course, Bill.

For W.A.B.

Table of Contents

1. Introduction
  1.1. Distributed Logging: Thesis and Motivation
  1.2. Goals
  1.3. Outline
2. Background
  2.1. Distributed Systems
  2.2. What a Log Is
    2.2.1. Transactions and Transaction Processing Facilities
      2.2.1.1. The Transaction Concept
      2.2.1.2. Transaction System Applications
      2.2.1.3. Distributed Transaction Processing Facilities
    2.2.2. Log Definition, Use, and Implementation
      2.2.2.1. A Simple Log Definition
      2.2.2.2. Log-based Transaction Recovery Algorithms
      2.2.2.3. Practical Log Interfaces
      2.2.2.4. Log Implementation Issues
3. Design Issues and Alternatives
  3.1. Representation of Distributed Log Data
    3.1.1. Mirroring
    3.1.2. Distributed Replication
      3.1.2.1. Replicated Time-Ordered Unique Identifiers
      3.1.2.2. Replicated Log Algorithm
      3.1.2.3. Formal Proof of Restart Procedure
    3.1.3. Hybrid Schemes
    3.1.4. Comparison Criteria
      3.1.4.1. Reliability
      3.1.4.2. Availability
      3.1.4.3. Performance
  3.2. Client/Server Communication
    3.2.1. Stream Protocols
    3.2.2. RPC Protocols
    3.2.3. The Channel Model and LU 6.2
    3.2.4. Parallel Communications and Multicast Protocols
    3.2.5. Comparison Criteria
      3.2.5.1. Load Models
      3.2.5.2. Force and Random Read Times
      3.2.5.3. Streaming Rates
      3.2.5.4. Resilience
      3.2.5.5. Complexity and Suitability
  3.3. Security
    3.3.1. Alternative Mechanisms
      3.3.1.1. Authentication Mechanisms
      3.3.1.2. Physical Security
      3.3.1.3. End-to-End Encryption
      3.3.1.4. Encryption-Based Protocols
    3.3.2. Policies and Comparison Criteria
      3.3.2.1. Threat Models and Security Requirements
      3.3.2.2. Cost
  3.4. Log Representation on Servers
    3.4.1. Disk Representation Alternatives
      3.4.1.1. Files
      3.4.1.2. Partitioned Space
      3.4.1.3. Single Data Stream
    3.4.2. Low-Latency Buffer Alternatives
    3.4.3. Comparison Criteria
      3.4.3.1. Low-Latency Buffer Costs
      3.4.3.2. Disk Utilization
      3.4.3.3. Performance
  3.5. Log Space Management
    3.5.1. Mechanisms
      3.5.1.1. Server Controlled Mechanisms
      3.5.1.2. Client Controlled Mechanisms
      3.5.1.3. Compression
    3.5.2. Policy Alternatives
    3.5.3. Comparison Criteria
      3.5.3.1. Use Models
      3.5.3.2. Costs
      3.5.3.3. Performance
  3.6. Load Assignment
    3.6.1. Mechanisms
      3.6.1.1. Load Assessment
      3.6.1.2. Load Assignment Mechanisms
    3.6.2. Policies
  3.7. Summary (and Perspective)
4. The Design of the Camelot Distributed Log Facility
  4.1. Camelot Distributed Log Design Decisions
    4.1.1. Distributed Log Representation
    4.1.2. Communication
    4.1.3. Security
    4.1.4. Server Data Representation
    4.1.5. Log Space Management
    4.1.6. Load Assignment
  4.2. Communication Design
    4.2.1. Transport Protocol
    4.2.2. Message Protocols
      4.2.2.1. Data Message Packing and ReadLog Buffering
      4.2.2.2. RPC Subprotocols
      4.2.2.3. WriteLog Subprotocol
      4.2.2.4. CopyLog Subprotocol
  4.3. Log Client Structure
    4.3.1. Camelot Architecture
    4.3.2. Camelot's Local and Network Loggers
      4.3.2.1. The Camelot Log Interface
      4.3.2.2. The Local Logger
      4.3.2.3. The Network Logger
  4.4. Log Server Design
    4.4.1. Log Server Threads
    4.4.2. Log Server Data Structures
      4.4.2.1. Main Memory Structures
      4.4.2.2. Disk Data Structures
    4.4.3. Uninterruptible Power Supply Operations
5. Performance of the Camelot Distributed Logging Facility
  5.1. Methodology
    5.1.1. Latency Experiments
      5.1.1.1. CPA Tests
      5.1.1.2. Debit/Credit Tests
      5.1.1.3. Debit/Credit Latency Breakdown Tests
    5.1.2. Throughput Experiments
  5.2. Experiments
    5.2.1. Experimental Environment
    5.2.2. Latency Experiments
      5.2.2.1. CPA Tests
      5.2.2.2. Debit/Credit Tests
      5.2.2.3. Debit/Credit Latency Breakdown Test
    5.2.3. Throughput Experiments
  5.3. Results and Discussion
    5.3.1. Latency Experiments
      5.3.1.1. CPA Tests
      5.3.1.2. Debit/Credit Tests
      5.3.1.3. Debit/Credit Latency Breakdown Test
    5.3.2. Throughput Experiments
    5.3.3. Performance Summary
6. Conclusions
  6.1. Evaluations
    6.1.1. The Camelot Distributed Logging Facility
    6.1.2. Future Distributed Logging Systems
    6.1.3. High Performance Network Services
  6.2. Future Research
    6.2.1. Camelot DLF Enhancements
    6.2.2. Future Distributed Log Servers
  6.3. Summary

List of Figures

Figure 2-1: A Simple Log Interface
Figure 2-2: Practical Log Interface
Figure 2-3: Figure 2-2 Continued
Figure 3-1: Unique Identifier Generator State Representative Interface
Figure 3-2: Program for OrderedId NewId
Figure 3-3: Three log servers with LSN 10 Partially Replicated
Figure 3-4: Figure 3-3 after Restart with Servers 1 and 2
Figure 3-5: Directory for Distributed Log in Figure 3-4 (merges interval lists from Server 1 and 2)
Figure 3-6: Type Definitions for Distributed Log Replication
Figure 3-7: Log Server Interface for Distributed Log Replication
Figure 3-8: Global Variables and Open Procedure for Distributed Log Replication
Figure 3-9: Figure 3-8 Continued
Figure 3-10: Read Procedure for Distributed Log Replication
Figure 3-11: Write Procedure for Distributed Log Replication
Figure 3-12: Markov Reliability Model for 2 Copy Logs
Figure 3-13: Write Availability of Different