Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems
Total Page:16
File Type:pdf, Size:1020Kb
Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems Nicolas Viennot Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2017 c 2017 Nicolas Viennot All Rights Reserved ABSTRACT Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems Nicolas Viennot Application record and replay is the ability to record application execution and replay it at a later time. Record-replay has many use cases including diagnosing and debugging applications by capturing and re- producing hard to find bugs, providing transparent application fault tolerance by maintaining a live replica of a running program, and offline instrumentation that would be too costly to run in a production environ- ment. Different record-replay systems may offer different levels of replay faithfulness, the strongest level being deterministic replay which guarantees an identical reenactment of the original execution. Such a guarantee requires capturing all sources of nondeterminism during the recording phase. In the general case, such record-replay systems can dramatically hinder application performance, rendering them unpractical in certain application domains. Furthermore, various use cases are incompatible with strictly replaying the original execution. For example, in a primary-secondary database scenario, the secondary database would be unable to serve additional traffic while being replicated. No record-replay system fit all use cases. This dissertation shows how to make deterministic record-replay fast and efficient, how broadening replay semantics can enable powerful new use cases, and how choosing the right level of abstraction for record-replay can support distributed and heterogeneous database replication with little effort. We explore four record-replay systems with different semantics enabling different use cases. We first present SCRIBE, an OS-level deterministic record-replay mechanism that support multi-process applications on multi-core systems. One of the main challenge is to record the interaction of threads running on different CPU cores in an efficient manner. SCRIBE introduces two new lightweight OS mechanisms, rendezvous point and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. SCRIBE allows the capture and replication of hard to find bugs to facilitate debugging and serves as a solid foundation for our two following systems. We then present RACEPRO, a process race detection system to improve software correctness. Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. Detecting process races is difficult due to the elusive nature of these bugs, and the het- erogeneity of frameworks involved in such bugs. RACEPRO is the first tool to detect such process races. RACEPRO records application executions in deployed systems, allowing offline race detection by analyzing the previously recorded log. RACEPRO then replays the application execution and forces the manifesta- tion of detected races to check their effect on the application. Upon failure, RACEPRO reports potentially harmful races to developers. Third, we present DORA, a mutable record-replay system which allows a recorded execution of an appli- cation to be replayed with a modified version of the application. Mutable record-replay provides a number of benefits for reproducing, diagnosing, and fixing software bugs. Given a recording and a modified appli- cation, finding a mutable replay is challenging, and undecidable in the general case. Despite the difficulty of the problem, we show a very simple but effective algorithm to search for suitable replays. Lastly, we present SYNAPSE, a heterogeneous database replication system designed for Web applica- tions. Web applications are increasingly built using a service-oriented architecture that integrates services powered by a variety of databases. Often, the same data, needed by multiple services, must be replicated across different databases and kept in sync. Unfortunately, these databases use vendor specific data repli- cation engines which are not compatible with each other. To solve this challenge, SYNAPSE operates at the application level to access a unified data representation through object relational mappers. Additionally, SYNAPSE leverages application semantics to replicate data with good consistency semantics using mecha- nisms similar to SCRIBE. Table of Contents List of Figures v List of Tables ix 1 Introduction 1 2 SCRIBE: Transparent Deterministic Record-Replay 11 2.1 Introduction . 11 2.2 Architecture Overview . 14 2.3 System Calls . 17 2.3.1 Record . 17 2.3.2 Replay . 18 2.3.3 Go Live . 19 2.4 Shared Memory . 21 2.5 Rendezvous Points . 23 2.6 Sync Points . 26 2.6.1 Signal Delivery . 28 2.6.2 Page Ownership Transfer . 29 2.6.3 Signature Record and Replay . 30 2.7 Performance Evaluation . 33 2.8 Related Work . 39 2.9 Summary . 42 i 3 RACEPRO: Detection of Process Races in Deployed Systems 43 3.1 Introduction . 43 3.2 Process Race Study . 46 3.2.1 Findings . 47 3.2.2 Process Race Examples . 49 3.3 Architecture Overview . 52 3.4 Recording Executions . 52 3.5 Detecting Process Races . 54 3.5.1 The Happens-Before Graph . 55 3.5.2 Modeling Effects of System Calls . 57 3.5.3 Race Detection Algorithms . 60 3.6 Validating Races . 64 3.6.1 Creating Execution Branches . 64 3.6.2 Replaying Execution Branches and Going Live . 67 3.6.3 Checking Execution Branches . 70 3.7 Experimental Results . 72 3.7.1 Bugs Found . 73 3.7.2 Bug Statistics . 74 3.7.3 Performance Overhead . 75 3.8 Differences from SCRIBE .................................... 77 3.9 Related Work . 79 3.10 Summary . 81 4 DORA: Transparent Mutable Record-Replay 82 4.1 Introduction . 82 4.2 Mutable Replay Concept . 85 4.3 Recorder . 87 4.4 Replayer . 89 4.4.1 Additions . 90 4.4.2 Deletions . 92 4.4.3 Going Live . 93 ii 4.5 Explorer . 93 4.6 Properties . 97 4.7 Limitations . 98 4.8 Evaluation . 99 4.8.1 Debugging and Diagnosis Techniques . 101 4.8.2 Patch Validation . 105 4.8.3 Release Upgrades . 106 4.8.4 Performance . 107 4.9 Related Work . 108 4.10 Comparison with SCRIBE and RACEPRO ........................... 110 4.11 Summary . 111 5 SYNAPSE: Distributed Record-Replay for Database Systems 112 5.1 Introduction . 112 5.2 Background . 114 5.3 SYNAPSE API.......................................... 116 5.3.1 SYNAPSE Abstractions . 117 5.3.2 Comparison with SCRIBE ............................... 120 5.3.3 SYNAPSE Delivery Semantics . 122 5.3.4 SYNAPSE Programming by Example . 124 5.4 SYNAPSE Architecture . 127 5.4.1 Model-Driven Replication . 130 5.4.2 Enforcing Delivery Semantics . 131 5.4.3 Live Schema Migrations . 136 5.4.4 Bootstrapping and Reliability . 136 5.4.5 Testing Framework . 137 5.4.6 Supporting New DBs and ORMs . 138 5.5 Applications . 138 5.5.1 SYNAPSE at Crowdtap . 138 5.5.2 Integrating Open-Source Apps with SYNAPSE .................... 140 5.5.3 Splitting a monolithic app into services with SYNAPSE ................ 141 iii 5.6 Evaluation . 142 5.6.1 Sample Executions . 142 5.6.2 Application Overheads (Q1) . 143 5.6.3 Scalability (Q2) . ..