Cornus: One-Phase Commit for Cloud Databases with Storage Disaggregation

Zhihan Guo* (University of Wisconsin-Madison, Madison, Wisconsin), [email protected]
Xinyu Zeng* (University of Wisconsin-Madison, Madison, Wisconsin), [email protected]
Ziwei Ren (University of Wisconsin-Madison, Madison, Wisconsin), [email protected]
Xiangyao Yu (University of Wisconsin-Madison, Madison, Wisconsin), [email protected]
*Both authors contributed equally.

ABSTRACT

Two-phase commit (2PC) has been widely used in distributed databases to ensure atomicity for distributed transactions. However, 2PC suffers from two limitations. First, 2PC incurs long latency as it requires two logging operations on the critical path. Second, when a coordinator fails, a participant may be blocked waiting for the coordinator's decision, leading to indefinitely long latency and low throughput. We make a key observation that modern cloud databases feature a storage disaggregation architecture, which allows a transaction's final decision to not rely on the central coordinator. We propose Cornus, a one-phase commit (1PC) protocol specifically designed for this architecture. Cornus can solve the two problems mentioned above by leveraging the fact that all compute nodes are able to access and modify the log data on any storage node. We present Cornus in detail, formally prove its correctness, develop certain optimization techniques, and evaluate it against 2PC on YCSB and TPC-C workloads. The results show that Cornus can achieve a 1.5× speedup in latency.

1 INTRODUCTION

Modern database management systems (DBMS) are increasingly distributed due to the growing data volume and the diverse demands of modern Internet services. To ensure the atomicity of distributed transactions, an atomic commitment protocol (ACP) is required for transactions that access data across distributed machines. Two-phase commit (2PC) is so far the most widely used ACP.

Albeit widely implemented, 2PC has two major problems that limit its performance. First, 2PC requires two round-trip network messages and the associated logging operations. Previous works [8, 9, 16, 20, 23, 24, 33] have demonstrated that 2PC can account for the majority of a transaction's execution time due to the incurred network messages and disk logging, which directly affects the query response time that a user experiences. Second, 2PC has a well-known blocking problem [11, 12, 25]. If a coordinator crashes before notifying participants of the decision, the participants may not know the decision and will be blocked until the coordinator recovers. Meanwhile, uncertain transactions cannot release their locks, which blocks other transactions from making forward progress.

The two problems above have inspired two separate lines of research seeking solutions. To mitigate the long latency problem, previous works have proposed one-phase commit (1PC) protocols [8, 13, 22, 26] that remove one phase from the commit procedure. Existing 1PC protocols, however, make extra assumptions beyond those of 2PC [8]. Most of these assumptions are impractical in a production environment, which stymies the wide adoption of existing 1PC protocols. To solve the blocking problem, previous works have proposed three-phase commit (3PC) protocols [25] such that an uncertain transaction can learn the decision even if the coordinator crashes. However, as the name suggests, these protocols must pay the overhead of one extra phase in the commit procedure, further exacerbating the latency problem. According to the fundamental non-blocking theorem [25], no protocol can achieve non-blocking behavior without introducing extra assumptions or communication.

In this paper, we make the key insight that the architectural paradigm shift happening in cloud databases (i.e., storage disaggregation [2, 3, 5, 10, 15, 30, 32]) is fundamentally changing the design space of atomic commitment protocols. Specifically, disaggregating the storage from computation allows a server to directly access all the storage nodes rather than only its own storage as in a conventional shared-nothing architecture. This insight allows us to design a new 1PC protocol that achieves low latency and non-blocking behavior altogether, without making further assumptions besides the disaggregation of storage.

To this end, we propose Cornus, a new non-blocking 1PC protocol designed for the storage-disaggregation architecture. Cornus solves the latency problem by eliminating the decision logging at the coordinator. Instead, a transaction relies on the collective logs of all the participating nodes for its final decision. This change is made possible because all the logs are accessible to all transactions in a disaggregation architecture. If any failure occurs, an uncertain transaction can rebuild the decision by accessing all the logs. Cornus is also non-blocking — if any participant fails to flush its log, other uncertain transactions can insert an abort record on behalf of the non-responding node. We introduce the LogIfNotExist() function to avoid race conditions in corner cases. In summary, this paper makes the following key contributions:

• We develop Cornus, a one-phase commit (1PC) protocol designed for a storage-disaggregation architecture to reduce the latency overhead in 2PC and alleviate the blocking problem at the same time.
• We prove the correctness of Cornus by showing that it satisfies all five properties of an atomic commitment protocol that 2PC can satisfy [11, 12].
• We evaluate Cornus in a great variety of settings on both YCSB and TPC-C workloads. Cornus shows an improvement in latency of 50% compared to 2PC.

Figure 1: Illustration of Two-Phase Commit (2PC) — The lifecycle of a committing transaction (a) and a scenario of coordinator failure (b). The compute node and the corresponding storage node are drawn close to each other.

2 BACKGROUND AND MOTIVATION

Section 2.1 describes two-phase commit (2PC). We discuss two major concerns of the protocol, long latency and blocking, and briefly describe existing works addressing each problem. Section 2.2 discusses storage disaggregation as a trending architecture in cloud databases and how it motivates the design of Cornus.

2.1 Two-Phase Commit

In a distributed database management system (DDBMS), data are partitioned across multiple sites, which can be accessed by a distributed transaction. After the execution, all the sites involved must reach a consensus on committing or aborting the transaction to ensure the transaction's atomicity. An atomic commitment protocol (ACP) is required to achieve this goal.

Two-phase commit (2PC) [23] is a widely used ACP in current DDBMSs. It contains a prepare phase and a commit phase. A demonstration of the protocol is shown in Figure 1. When no failure happens and all participating nodes agree to commit the transaction, the protocol behaves as in Figure 1a. For each transaction, one node is pre-designated to be the coordinator and the other nodes involved in the transaction become participants.

During the prepare phase, the coordinator of the transaction starts by sending prepare requests (also called vote requests) to all the participants and, in parallel, writing a START-2PC log record to its own log file. Upon receiving the prepare request, each participant logs a VOTE-YES record (assuming a committing transaction) to the corresponding log file and then responds to the coordinator.

After receiving all the votes, the coordinator enters the commit phase by logging the final decision (i.e., commit/abort) and notifying the caller. The coordinator then forwards the decision to each participant, which logs the decision accordingly and releases all the locks held by the concurrency control protocol.

If the coordinator fails before sending the decision to participants, as shown in Figure 1b, participants may not know the decision of the transaction. Upon a timeout, a participant will initiate a termination protocol to contact other nodes to learn the decision; it will repeat this process until at least one node replies with the decision. In case the coordinator takes a long time to recover, the nodes under uncertainty cannot learn the decision, and the associated transactions will block. In 2PC, the coordinator's decision log record serves as the ground truth of the commit/abort decision — the final outcome of the transaction relies on the success of logging this record.

Limitation 1: The Latency of Two Phases

In the standard 2PC protocol, the transaction caller experiences an average latency of one network round-trip and two logging operations, as shown in Figure 1a. Such a delay directly affects the query response time that an end user experiences.

Previous works have proposed various one-phase commit (1PC) protocols to reduce this latency. Some works combine the voting phase with the execution of the transaction [7, 9, 22, 26, 27] to remove one phase, yet they make assumptions that are too strong to be practical [8]. These protocols assume that serialization and consistency are ensured before an acknowledgment of each operation is sent from the participant to the coordinator, and that no abort due to consistency or serialization is allowed after the successful execution of all operations. Thus, they do not support common concurrency control protocols such as Optimistic Concurrency Control [21] and Timestamp Ordering [12], according to Abdallah et al. [8]. Moreover, most protocols either pay extra overhead such as blocking I/O during execution [26] or violate site autonomy [9, 22, 26, 27] — a property that allows each node to manage its data and recovery independently [8].

Limitation 2: The Blocking Problem

In 2PC, a participant learns the decision of a transaction either directly from the coordinator or indirectly from other participants. In an unfortunate corner case shown in Figure 1b, where the coordinator fails before sending any notifications, no participant can make or learn the decision, since it is unclear whether the decision has been made. Meanwhile, the participants must hold the locks on tuples until the coordinator has recovered. This is well known as the blocking problem; it causes certain data to be inaccessible because some other node not holding the data is down, limiting the performance and data availability of 2PC. Existing works [11, 19, 25] resolve the blocking problem by introducing extra inter-node communication and imposing more assumptions on the failure model.

For instance, the three-phase commit (3PC) protocol eliminates blocking by introducing an extra prepared-to-commit phase [25]. The coordinator will not send out a commit message until all participants have acknowledged that they are prepared to commit. Although this approach can eliminate blocking, it introduces another network round-trip into each transaction and exacerbates the long latency of 2PC. Moreover, 3PC assumes a synchronous system where the network delays among nodes are bounded. In practical systems with unbounded delay, 3PC cannot guarantee atomicity [18].
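To make the two limitations in Section 2.1 concrete, below is a minimal sketch of the coordinator side of textbook 2PC. It is written in Python purely for illustration; the injected helpers (durable logging, messaging, vote collection) are hypothetical stand-ins, not part of any system discussed in this paper.

```python
from typing import Callable, Dict, List

def two_phase_commit(
    txn_id: str,
    participants: List[str],
    log_sync: Callable[[str, str], None],       # durably append a record to the local log
    send: Callable[[str, str, str], None],      # send(participant, msg_type, txn_id)
    gather_votes: Callable[[str, List[str]], Dict[str, str]],  # blocks for all votes
    reply_to_caller: Callable[[str, str], None],
) -> str:
    """Coordinator side of textbook 2PC (Section 2.1), for illustration only."""
    # Prepare phase: log START-2PC and request votes in parallel.
    log_sync(txn_id, "START-2PC")
    for p in participants:
        send(p, "VOTE-REQ", txn_id)
    votes = gather_votes(txn_id, participants)

    # Commit phase: the decision record is the ground truth, so it must be
    # durable before the caller is notified -- the second log write on the
    # caller's critical path.
    decision = "COMMIT" if all(v == "VOTE-YES" for v in votes.values()) else "ABORT"
    log_sync(txn_id, decision)
    reply_to_caller(txn_id, decision)
    for p in participants:
        send(p, decision, txn_id)               # participants log the decision and release locks
    return decision
```

The two durable log writes (START-2PC and the decision) sit on the caller's critical path, and a participant that has voted yes can learn the outcome only from the final decision messages, which is why it blocks if the coordinator fails before sending them.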

Figure 2: Shared-Nothing vs. Storage-Disaggregation Architectures. (a) Shared-nothing; (b) storage-disaggregation.

2.2 Opportunities for Improving 2PC with Storage Disaggregation

Modern cloud-native databases are shifting to a storage-disaggregation architecture where the storage and computation are separately managed as different layers of services (Figure 2b). This brings significant benefits including lower cost, simpler fault tolerance, and higher hardware utilization, compared to the common shared-nothing architecture (Figure 2a). A number of cloud-native databases are adopting such an architecture, with examples including Aurora [32], Redshift Spectrum [3], Athena [2], Presto [5], Hive [30], SparkSQL [10], and Snowflake [15]. A storage-disaggregation architecture has the following two key properties:

First, all the data in the storage layer are accessible to all the computation nodes. In terms of logging, this means a computation node can access not only its own log file but also the log files corresponding to other computation nodes. This contrasts with a shared-nothing architecture (Figure 2a), where a computation node can only directly access its own storage — accessing remote storage requires sending an explicit request to a remote computation node. Moreover, modern disaggregated storage services like Amazon S3 [4] are typically highly available and fault-tolerant.

Second, the storage layer can perform certain computation tasks. The storage layer is typically implemented as a cluster of servers that can perform general-purpose computation. Although the storage servers are not as powerful as computation servers and cannot communicate with each other arbitrarily, such computation capability can substantially improve the performance of a database, for both OLTP [32] and OLAP [34] workloads. This contrasts with a conventional shared-disk architecture, where the disks are passive devices and perform no computation.

Through scrutinizing the 2PC protocol, we learned that it is designed with a basic assumption of a shared-nothing architecture; namely, a computation node can learn the content of a remote log only through explicitly contacting the remote node. If the remote node is down, the log on its local storage is no longer accessible.

With the disaggregation architecture, however, a computation node can learn the content of a remote log by directly accessing the log itself. Furthermore, it may even directly manipulate the log content if necessary. With this subtle difference, we can optimize 2PC to solve both limitations discussed in Section 2.1 — first, we can eliminate one phase from the protocol to reduce latency; second, we are able to solve the blocking problem. The details of our solution are discussed in the following section.

3 CORNUS

We first present the high-level ideas of Cornus in Section 3.1. We then describe the APIs of the protocol in detail in Sections 3.2 and 3.3. Section 3.4 describes how the protocol handles failures and recovery. Section 3.5 proves the correctness of Cornus. Section 3.6 discusses some optimization techniques in Cornus.

Figure 3: Illustration of Cornus — The lifecycle of a committing transaction.

3.1 Design Overview

This section describes the high-level intuition behind Cornus. In particular, we explain why the two properties of a disaggregation architecture (explained in Section 2.2) can reduce the protocol latency and eliminate blocking at the same time.

Latency reduction. Storage disaggregation enables a 1PC design that reduces latency. In the conventional 2PC protocol, the ground truth of a transaction's outcome (i.e., commit or abort) is the coordinator's decision log; in Cornus, the ground truth is instead the collective votes in all participants' logs. For example, a transaction reaches the commit decision once each participant's local log contains VOTE-YES. Such a decision cannot roll back once reached. If a particular node is uncertain about the outcome (e.g., the node times out while waiting for the coordinator's decision), the node can directly check all participating nodes' logs to learn the final decision and thus does not have to rely on the coordinator's decision log. This means the coordinator's decision logging no longer has to be on the critical path of the protocol. In other words, the coordinator can respond to the caller of the transaction immediately after receiving votes from participants, without logging first. Given that a logging operation is quite expensive in a highly available distributed DBMS, this can substantially reduce a transaction's latency. Figure 3 shows the procedure of a committing transaction using Cornus, which saves the latency of one logging operation. This figure can be compared with Figure 1a to see the difference between 2PC and Cornus. Note that the optimization cannot be applied to 2PC in a conventional shared-nothing architecture, because a node cannot directly access the log of another node if the remote node has failed.

Non-blocking. Cornus addresses the blocking problem in 2PC without introducing significant complexity. When a participant experiences a timeout while waiting for the coordinator's decision, it executes the termination protocol. In 2PC, the participants contact the coordinator for the decision and must block if the coordinator cannot be reached (e.g., the coordinator has failed). In Cornus, in contrast, the termination protocol checks all the votes of other participants to learn the final decision. This is doable in a disaggregation architecture because all the logs can be accessed from any computation node, and because the storage layer itself is highly available.

In case a particular vote is missing in the storage (e.g., the corresponding participant failed before logging), the current node running the termination protocol will write an ABORT into the log of the failed node. To guarantee atomicity, we implement the LogIfNotExist() function in the storage layer, which guarantees that only one of the two votes (i.e., VOTE-YES or ABORT) can exist for any transaction on any node. The protocol guarantees that neither the coordinator nor the participants will block due to unknown decisions.

Note that Cornus does not introduce extra assumptions like the previous 1PC protocols mentioned in Section 2.1, besides the storage-disaggregation architecture, making it applicable to more general settings. In the next two sections, we describe the APIs of Cornus on the storage and compute nodes, respectively.
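As an illustration of the collective-votes rule above, the global outcome can be viewed as a pure function over the latest log record of each participating node. The sketch below is for intuition only; the record names follow the paper, but the function itself is not part of Cornus's API.

```python
from typing import Dict, Optional

def global_decision(latest_records: Dict[str, Optional[str]]) -> Optional[str]:
    """Derive a transaction's outcome from the collective logs (Section 3.1):
    latest_records maps each participating node to its most recent record for
    the transaction, or None if nothing has been logged yet."""
    records = latest_records.values()
    if any(r == "ABORT" for r in records):
        return "ABORT"                         # one ABORT anywhere decides abort
    if records and all(r in ("VOTE-YES", "COMMIT") for r in records):
        return "COMMIT"                        # every node has voted yes: commit is decided
    return None                                # some vote is still missing: undecided
```

An ABORT anywhere fixes the outcome to abort, a full set of VOTE-YES records fixes it to commit, and anything else leaves the transaction undecided, which is exactly the state the termination protocol in Section 3.3 resolves.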

3.2 API of the Storage Nodes

We first describe our RPC notation, followed by the functions that are supported on the storage nodes in Cornus.

Remote Procedure Calls (RPC)
In this paper, we model the communication across nodes through RPCs. An RPC can be either synchronous or asynchronous. A synchronous call means the program logic will block until the RPC returns; an asynchronous call means the program can continue executing until it is made to wait explicitly for the response. We represent an RPC using the following notation:

    RPC_{sync/async}^{n}::FuncName()

where the subscript can be sync or async for synchronous and asynchronous RPCs, respectively. The superscript n denotes the destination node of the RPC. Finally, FuncName() is the function that will be called through this RPC on the remote node; the function can take arbitrary arguments if needed.

Log(txn, type)
The Log(txn, type) function simply appends a log record of a certain type to the end of transaction txn's log. It is the log function used in conventional 2PC protocols.

LogIfNotExist(txn, type)
In Cornus, we introduce a new log function, LogIfNotExist(txn, type), to guarantee that different nodes do not write conflicting log records. Cornus uses both types of log functions. LogIfNotExist() is called only when a node logs VOTE-YES (lines 2 and 17 of Algorithm 2) or when a node logs ABORT on behalf of a remote node when running the termination protocol (line 30 of Algorithm 2).

Algorithm 1 shows the pseudocode for the LogIfNotExist() function. When a storage node receives an RPC call on this function, it first checks if a conflicting decision has already been logged for the transaction. If so, the transaction's most recent status in the log (i.e., ABORT, COMMIT, or VOTE-YES) is returned. Otherwise, it appends the requested log record and returns its content.

Algorithm 1: Cornus API on the Storage Nodes — The implementation of the Log() and LogIfNotExist() functions on each storage node.
 1  Function Storage::Log(txn, content)
 2      append content to the local log
 3  Function Storage::LogIfNotExist(txn, content)
 4      if content == VOTE-YES then
 5          # begin atomic section
 6          return ABORT if an ABORT record exists for txn in the log
 7          otherwise log and return VOTE-YES
 8          # end atomic section
 9      else
10          # begin atomic section
11          return VOTE-YES or COMMIT if such a record exists for txn in the log
12          otherwise log ABORT if not exists and return ABORT
13          # end atomic section

Specifically, when a compute node tries to log VOTE-YES on its own storage node (line 4), it checks whether other nodes have already logged ABORT on its behalf. If so, the function returns ABORT (line 6); otherwise, the function logs and returns VOTE-YES (line 7). Note that the check and the append to the log must be done atomically, such that a conflicting log record cannot be appended after the check is performed.

LogIfNotExist() is also called to log ABORT on behalf of another compute node during the termination protocol. In this case, if a VOTE-YES or COMMIT log record already exists, ABORT will not be logged and the existing log record is returned (line 11); otherwise the ABORT decision is logged and returned (line 12). Later, in Section 3.6, we describe in more detail how to implement the LogIfNotExist() function efficiently on storage nodes.
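To make the semantics of Algorithm 1 concrete, here is a minimal single-process sketch in Python. The class and method names are ours, and a mutex stands in for the atomic section; it illustrates the intended behavior rather than the storage service's actual interface.

```python
import threading

class StorageNodeLog:
    """Illustrative sketch of Algorithm 1. Each transaction's records form an
    append-only list; one lock models the atomic check-and-append section."""
    def __init__(self):
        self._log = {}                  # txn_id -> list of records
        self._mutex = threading.Lock()  # stands in for the atomic section

    def log(self, txn_id, record):
        # Log(txn, type): plain append, as used by conventional 2PC.
        with self._mutex:
            self._log.setdefault(txn_id, []).append(record)

    def log_if_not_exist(self, txn_id, record):
        # LogIfNotExist(txn, type): check and append happen atomically, so
        # VOTE-YES and ABORT can never both be written for one transaction.
        with self._mutex:
            existing = self._log.setdefault(txn_id, [])
            if record == "VOTE-YES":
                if "ABORT" in existing:     # another node aborted on our behalf
                    return "ABORT"
                existing.append("VOTE-YES")
                return "VOTE-YES"
            # Otherwise the termination protocol is trying to log ABORT.
            for prior in ("COMMIT", "VOTE-YES"):
                if prior in existing:       # a vote or decision already exists
                    return prior
            existing.append("ABORT")
            return "ABORT"
```

The essential property is that the membership check and the append happen under the same critical section, so VOTE-YES and ABORT can never both be recorded for the same transaction on the same node.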
3.3 API of the Compute Nodes

This section explains in detail how Cornus works on the compute nodes. The pseudocode is shown in Algorithm 2. We highlight the key changes in Cornus in contrast to standard 2PC with a gray background color. In the following, we go through the pseudocode for the coordinator's procedure, the participant's procedure, and the termination protocol.

Algorithm 2: API of the Compute Nodes in Cornus — Assuming a committing transaction. Differences between 1PC and 2PC are highlighted in gray.
 1  Function Coordinator::Start1PC(txn)
 2      RPC_async^{local SN}::LogIfNotExist(VOTE-YES)
 3      for p in txn.participants do
 4          send VOTE-REQ to p asynchronously
 5      wait for all responses from participants and storage node
 6      on receiving ABORT: decision ← ABORT
 7      on receiving all responses: decision ← COMMIT
 8      on timeout: decision ← TerminationProtocol(txn)
 9      reply decision to the txn caller
10      RPC_async^{local SN}::Log(decision)
11      for p in txn.participants do
12          send decision to p asynchronously
13  Function Participant::Start1PC(txn)
14      wait for VOTE-REQ from coordinator
15      on timeout: RPC_sync^{local SN}::Log(ABORT); return
16      if participant votes yes for txn then
17          resp ← RPC_sync^{local SN}::LogIfNotExist(VOTE-YES)
18          if resp is ABORT then
19              reply ABORT to coordinator    # another node has logged ABORT on its behalf
20          else
21              reply VOTE-YES to coordinator
22              wait for decision from coordinator
23              on timeout: decision ← TerminationProtocol(txn)
24              RPC_async^{local SN}::Log(decision)
25      else
26          RPC_sync^{local SN}::Log(ABORT)
27          reply ABORT to coordinator
28  Function TerminationProtocol(txn)
29      for every node p participating in txn other than self do
30          RPC_async^{p.SN}::LogIfNotExist(ABORT)
31      wait for responses
32      on receiving ABORT: decision ← ABORT
33      on receiving COMMIT: decision ← COMMIT
34      on receiving all responses: decision ← COMMIT
35      on timeout: retry from the beginning
36      return decision

Coordinator::Start1PC(txn)
After a transaction txn finishes the execution phase, it starts the atomic commitment protocol by calling Start1PC(txn) at the coordinator. The coordinator logs VOTE-YES asynchronously (line 2) and sends vote requests, along with a list of all nodes involved in the transaction, to all participants simultaneously (lines 3–4). There are two major differences compared to a standard 2PC protocol. First, the coordinator logs through LogIfNotExist() instead of append-only logging. Second, the coordinator logs VOTE-YES instead of START-2PC — this second change is possible because all nodes can recover independently without the involvement of the coordinator, as we will explain in Section 3.4.

Then the coordinator waits for responses from all the participants (line 5). If an ABORT is received, the transaction reaches an abort decision (line 6); if all responses are received and none of them is an ABORT (i.e., all responses are VOTE-YES), the transaction reaches a commit decision (line 7); if there is a timeout, the termination protocol is executed to finalize a decision (line 8). Note that the last condition is different from 2PC, which would unilaterally abort the transaction without running the termination protocol.

Once the decision is reached, it can be replied to the transaction caller immediately, before the decision is logged durably (line 9). This is a key difference between Cornus and 2PC; the latter replies to the caller only after the decision log is flushed. This optimization reduces the caller-observed latency by one logging time. Finally, the coordinator asynchronously writes the decision to its local storage node (line 10) and also broadcasts the decision to all the participants (lines 11–12).

Participant::Start1PC(txn)
The logic executed by a participant is very similar between Cornus and 2PC. A participant waits for a VOTE-REQ message from the coordinator (line 14). If a timeout occurs, the participant can unilaterally abort the transaction (line 15), which can involve rolling back the database state, releasing locks, and logging ABORT.

Once VOTE-REQ is received, the participant votes VOTE-YES or VOTE-NO based on its local state of the transaction. For a VOTE-NO, an ABORT record is written to the storage node and ABORT is replied to the coordinator (lines 26–27). This log can also be asynchronous, following the presumed-abort optimization in conventional 2PC [23].

For a VOTE-YES, the record is logged to the corresponding storage node through LogIfNotExist() (line 17). There are two possible outcomes. If the function returns ABORT, this means another node has already aborted the transaction on behalf of the current node through the termination protocol. In this case, the current node aborts the transaction and returns ABORT to the coordinator (lines 18–19). Otherwise the storage node returns VOTE-YES, in which case the participant also returns VOTE-YES to the coordinator (line 21) and starts waiting for the decision message from the coordinator (line 22). Upon a timeout, the termination protocol is executed (lines 22–23). Upon receiving the decision, it is logged to the storage node (line 24); here we mark the log as asynchronous because other transactional logic, like releasing locks, does not need to wait for this decision log to complete.
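The coordinator-side control flow (Algorithm 2, lines 1–12) can be sketched as follows. The storage, rpc, and caller objects and their method names are hypothetical stand-ins rather than Sundial's API; the point of the sketch is that the caller is answered as soon as a decision is reached, before the decision record becomes durable.

```python
def coordinator_start_1pc(txn, participants, storage, rpc, caller, timeout=1.0):
    """Sketch of Coordinator::Start1PC (Algorithm 2, lines 1-12)."""
    storage.log_if_not_exist_async(txn, "VOTE-YES")              # line 2
    for p in participants:
        rpc.send_async(p, "VOTE-REQ", txn, participants)         # lines 3-4
    try:
        votes = rpc.wait_for_votes(txn, participants, timeout)   # line 5
        decision = "ABORT" if "ABORT" in votes else "COMMIT"     # lines 6-7
    except TimeoutError:
        # Unlike 2PC, the coordinator does not unilaterally abort here.
        decision = termination_protocol(txn, participants, rpc, timeout)  # line 8 (sketched below)
    caller.reply(txn, decision)        # line 9: reply before the decision log is flushed
    storage.log_async(txn, decision)   # line 10: decision log is off the critical path
    for p in participants:
        rpc.send_async(p, decision, txn)                         # lines 11-12
    return decision
```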
TerminationProtocol(txn)
In both 2PC and Cornus, the termination protocol is executed when a compute node times out while waiting for a message and the node cannot unilaterally abort the transaction. In 2PC, the node running the termination protocol contacts all the other nodes for the outcome of the transaction. If any node returns the final outcome, the uncertainty is resolved. However, in certain corner cases, no active node has the outcome — for example, the coordinator has failed right before sending out the final outcome to any participant. In this case, the transaction can neither commit nor abort, and must block with all the locks held, until the failed node has been recovered.

Cornus avoids the problem described above. Specifically, the node running the termination protocol contacts all the participating storage nodes rather than peer compute nodes, by trying to log an ABORT record to each storage node through LogIfNotExist() (lines 29–30). If the remote storage node has already received a decision log record (i.e., COMMIT or ABORT) for this transaction, that decision will be returned and followed by the current node (lines 32–33). If the remote storage node has not received any log record yet, the ABORT record will be logged and returned (line 32). The last case is that a VOTE-YES record is logged at the remote node and is returned; if the current node receives such responses from all the remote storage nodes, the transaction also reaches a commit decision (line 34). Finally, if the current node experiences a timeout again, it retries the termination protocol (line 35).

Note that as long as the storage nodes are accessible, the protocol above is non-blocking. A compute node can always reach a decision with a small number of messages. The only case in which Cornus will run into blocking (line 35) is when the storage service cannot be reached. However, as discussed in Section 2.2, we assume this case is rare with a storage service that maintains high availability on its own.
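A matching sketch of TerminationProtocol (Algorithm 2, lines 28–36), using the same hypothetical rpc layer as the coordinator sketch above; p.storage_node is likewise an assumed attribute.

```python
def termination_protocol(txn, participants, rpc, timeout=1.0):
    """Sketch of TerminationProtocol (Algorithm 2, lines 28-36). The uncertain
    node appeals directly to every other participant's storage node instead of
    to the (possibly failed) coordinator."""
    while True:
        for p in participants:  # every node participating in txn other than self
            # Try to log ABORT on p's behalf; the storage node returns the
            # existing VOTE-YES/COMMIT/ABORT record if one is already there.
            rpc.send_async(p.storage_node, "LogIfNotExist", txn, "ABORT")   # lines 29-30
        try:
            responses = rpc.wait_for_responses(txn, participants, timeout)  # line 31
        except TimeoutError:
            continue                             # line 35: storage unreachable, retry
        if "ABORT" in responses:
            return "ABORT"                       # line 32
        if "COMMIT" in responses:
            return "COMMIT"                      # line 33
        if len(responses) == len(participants):
            return "COMMIT"                      # line 34: every log contains VOTE-YES
```

Because the requests go to the storage nodes rather than to the coordinator, the loop terminates as soon as the storage layer is reachable, which is the non-blocking argument made above.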

Figure 4: Cornus under Failures — The behavior of Cornus under two failure scenarios: (a) the coordinator fails before sending the decision; (b) a participant fails before logging its vote.

3.4 Failure and Recovery

This section discusses the behavior of Cornus when failures occur. For simplicity, we discuss cases where only a single node fails at a time. Specifically, Table 1 and Table 2 summarize the system behavior when the coordinator and a participant fails, respectively. The tables also describe the behavior of the failed node after it is recovered.

Coordinator Failure
Table 1 lists the system behaviors if the coordinator fails at different points of the Cornus protocol.

Case 1: The coordinator fails before the protocol starts. In this case, a participant will experience a timeout waiting for VOTE-REQ (line 15 in Algorithm 2). Therefore, all the participants will unilaterally abort the transaction locally. After the coordinator is later recovered, it can run the termination protocol to learn this outcome.

Case 2: The coordinator fails after sending some but not all vote requests. For participants that did not receive the request, the behavior is the same as in Case 1, namely, they will unilaterally abort the transaction. Participants that received the request will log their votes to the storage nodes, send responses back to the coordinator, and experience a timeout while waiting for the final decision because the coordinator has failed (line 23 in Algorithm 2). Then they will run the termination protocol. The protocol checks the votes of the participants that have unilaterally aborted the transaction; it either learns the abort decision or appends an ABORT to their logs, thereby aborting the transaction. Once the coordinator is recovered from the failure, it learns the outcome through the termination protocol.

Case 3: The coordinator fails after sending all the vote requests but before sending out any decision. In this case, all participants have logged their votes to the storage nodes. They will all time out while waiting for the decision from the coordinator. They will all run the termination protocol to learn the final outcome of the transaction and act accordingly. After the coordinator is recovered, it will also run the termination protocol to learn the outcome. Note that if this scenario occurs in 2PC, the participants will block instead.

Figure 4a illustrates an example of such a case. The figure can be compared with Figure 1b, showing how Cornus avoids blocking. After a participant's timeout, instead of contacting the coordinator, which has failed, it contacts all the storage nodes using the LogIfNotExist() function. Since all nodes have VOTE-YES in their logs, each participant learns the decision of COMMIT and avoids blocking.

Case 4: The coordinator fails after sending out the decision to some but not all participants. For participants that have already received the decision, their local 1PC protocol will terminate. The other participants will time out waiting for the decision and run the termination protocol to learn the final decision.

Case 5: The coordinator fails after sending out the decision to all participants. In this case, all participants have completed the local 1PC protocol, so the failure has no effect on them. After the coordinator is recovered, if the final decision has been logged in its storage node, that decision is used. Otherwise, the coordinator runs the termination protocol to learn the final decision.

Participant Failure
Table 2 lists the effects of a participant failure at different points of the Cornus protocol.

Case 1: The participant fails before receiving the vote request from the coordinator. In this case, the coordinator will experience a timeout waiting for all responses from participants (line 8 in Algorithm 2) and then run the termination protocol. The coordinator will log an ABORT record for the failed participant, thereby aborting the transaction. The coordinator will then broadcast the decision to the remaining participants. It is also possible that another participant also has a timeout and initiates the termination protocol; the end effect would be the same. After the failed participant is recovered, it runs the termination protocol to learn the final decision.

Case 2: The participant fails after it receives the vote request but before its vote is logged. In this case, the behavior of the system is the same as in Case 1. This is because, from the other nodes' perspective, the behavior of the failed node is identical.

Figure 4b shows an example of such a case. The coordinator receives a VOTE-YES from participant 2 and times out while waiting for the response from participant 1. At this point, the coordinator runs the termination protocol by issuing LogIfNotExist() to the storage node of every participant, trying to log ABORT on each participant's behalf. As the coordinator logs an ABORT for participant 1 and learns that participant 2 has already logged VOTE-YES, it reaches a decision of abort and sends it to participant 2. For simplicity, the example assumes only the coordinator has a timeout. It is possible that participant 2 also experiences a timeout, which will lead to the same outcome.
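The per-case recovery behaviors, summarized in Tables 1 and 2 below, reduce to one simple rule for a recovered compute node. The sketch uses the same hypothetical helpers as the earlier ones.

```python
def recover_node(txn, my_log, participants, rpc):
    """Sketch of the recovery rule behind Tables 1 and 2 (Section 3.4).
    Any COMMIT or ABORT record in the node's own log already fixes the
    transaction's outcome, so only the uncertain case needs extra work."""
    records = my_log.read(txn)
    if "COMMIT" in records:
        return "COMMIT"
    if "ABORT" in records:          # unilateral abort or a logged abort decision
        return "ABORT"
    # Only VOTE-YES (or nothing) is logged: the outcome is unknown locally,
    # so the recovered node runs the termination protocol to learn it.
    return termination_protocol(txn, participants, rpc)
```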

Table 1: Effects of Coordinator Failures.

| Time of Coordinator Failure | Effect of Failure | After Node is Recovered |
| --- | --- | --- |
| Before 1PC starts | Participants (if any) will time out and unilaterally abort the transaction. | Abort the transaction through the termination protocol. |
| After sending some vote requests | Participants that did not receive the request will time out and abort unilaterally. Participants that received the request will time out waiting for the decision and execute the termination protocol, which aborts the transaction. | Run the termination protocol to learn the decision, which is abort. |
| After sending all vote requests but before sending any decision | All participants will time out while waiting for the decision and execute the termination protocol to learn the outcome. | Run the termination protocol to learn the decision. |
| After sending some decisions | Participants that did not receive the decision will run the termination protocol to learn the decision. | Same as above. |
| After sending all decisions | No effect. | If the decision is logged, follow that decision; otherwise run the termination protocol to learn it. |

Table 2: Effects of Participant Failures.

| Time of Participant Failure | Effect of Failure | After Node is Recovered |
| --- | --- | --- |
| Before receiving the vote request | The coordinator will time out and run the termination protocol to abort the transaction. | Abort the transaction. |
| After receiving the vote request, before logging the vote | Same as above. | Same as above. |
| After logging the vote, before replying to the coordinator | The coordinator will run the termination protocol to see the vote and learn the final outcome. | Abort the transaction if the local vote is abort; otherwise run the termination protocol to learn the outcome. |
| After replying the vote to the coordinator | No effect. | If a decision log exists, follow the decision. Otherwise, same as above. |

Case 3: The participant fails after it logs the vote but before replying to the coordinator. In this case, the coordinator will experience a timeout waiting for votes and run the termination protocol. Then it can see all the participants' votes from their storage nodes and learn the outcome. The remaining participants will learn the decision either from the coordinator or from running the termination protocol themselves. After the failed participant is recovered, it will abort the transaction if its local vote is an abort; otherwise, it will run the termination protocol to learn the outcome.

Case 4: The participant fails after sending out the vote. This failure does not affect the rest of the nodes — the coordinator and remaining participants execute the rest of the protocol normally. After the failed participant is recovered, it will follow its own decision log if one exists; otherwise, it will run the termination protocol to learn the outcome.

3.5 Proof of Correctness

This section formally proves the correctness of Cornus. Our proof follows the structure used in [12], where an atomic commit protocol is proven through five separate properties (AC1–5, see below). We start by introducing the following definition of a global decision of a distributed transaction.

Definition 1 [Global Decision]: A transaction reaches a COMMIT global decision if all nodes have logged VOTE-YES; it reaches an ABORT global decision if any node has logged ABORT. Otherwise the decision is undetermined.

We first introduce and prove the following lemma.

Lemma 1 [Irreversible Global Decision]: Once a global decision is reached for a transaction, the decision will not change.
Proof: There are two cases to consider. In case 1, an abort global decision has been reached, meaning one log contains an ABORT record. According to the semantics of LogIfNotExist(), no VOTE-YES can be appended to that log anymore, meaning that the global decision cannot switch to commit. In case 2, a commit global decision is reached and all nodes have VOTE-YES in their logs. The only way to append an ABORT record is through the termination protocol. But according to the logic of LogIfNotExist() (i.e., Algorithm 1), ABORT will not be appended since VOTE-YES already exists. □

We now prove the five properties in order.

Theorem 1 [AC1]: The decision of each participant is identical to the global decision.
Proof: A participant can learn the global decision in two ways: (1) by receiving the decision from the coordinator (line 22 in Algorithm 2) or (2) by running the termination protocol (line 23 in Algorithm 2). Following the protocol, the decision at the coordinator is identical to the global decision, and thus in the first case the participant's decision is also identical to the global decision. In the second case, the termination protocol collects the votes of each individual participant and reaches a local decision that is identical to the global one. □

Theorem 2 [AC2]: A participant cannot reverse its decision after it has reached one.
Proof: Due to Lemma 1, once a global decision is reached, it cannot reverse. Due to Theorem 1, each participant will reach the same decision as the global decision, finishing the proof. □

The correctness of 2PC requires the following two properties according to [12].
AC3: The commit decision can only be reached if all participants voted Yes.
AC4: If there are no failures and all participants voted Yes, then the decision will be to commit.
For the proof of Cornus, we combine the two properties into the following theorem.

Theorem 3 [AC3&4]: The decision of a transaction is a commit if and only if all participants vote Yes and write VOTE-YES to their corresponding logs.
Proof: According to Definition 1, a transaction's global decision is a commit if and only if all participants write VOTE-YES to their corresponding logs. According to Theorem 1, the decision of each participant is identical to the global decision, finishing the proof. □

Finally, 2PC requires the following property.
AC5: Consider any execution containing only failures that the algorithm is designed to tolerate. At any point in this execution, if all existing failures are repaired and no new failures occur for sufficiently long, then all processes will eventually reach a decision.
For Cornus, we prove the following theorem, which achieves a stronger property.

Theorem 4 [AC5]: Assuming the storage layer is fault tolerant, with any failures that occur to the compute nodes, the remaining participants will always reach a decision without requiring the failed nodes to be recovered.
Proof: Since the storage layer is fault tolerant, once a global decision is reached, an active participant can always learn the global decision through the termination protocol. The only case where a decision cannot be reached is when one participant fails to log its vote. In this case, a coordinator or participant that experiences a timeout will run the termination protocol and directly write an ABORT into the pending participant's log, which enforces a global decision. □

3.6 Optimizations

This section discusses some optimization techniques for Cornus, including optimizing the LogIfNotExist() function and optimizing for read-only transactions.

Optimizing LogIfNotExist()
The behavior of LogIfNotExist() depends on the current content of the log (e.g., whether a particular log record exists). With a naive implementation, the system would scan the entire log on the storage node to decide whether a vote has already been logged for a particular transaction. Since the log can be large, the naive solution can lead to very long processing times. Below we describe two techniques to reduce this overhead.

We observe that LogIfNotExist() is called in only two cases (according to Algorithm 2): (1) a coordinator or participant logs VOTE-YES, or (2) the termination protocol logs ABORT. The termination protocol is called only after a timeout, hence the first case is much more common than the second. We propose the following two techniques to optimize for both cases.

Optimize for logging VOTE-YES: We propose the following design to optimize for the common case. Each storage node maintains a hash table for ABORT records from remote nodes (i.e., initiated through the termination protocol). During normal processing, a VOTE-YES looks up the hash table (which should be empty in the common case). If no ABORT exists in the hash table for the transaction, then VOTE-YES can be written without scanning the entire log. Otherwise, if an ABORT exists in the hash table for the transaction, the storage node immediately replies with it to the compute node. With this optimization, the overhead of logging VOTE-YES through LogIfNotExist() becomes minimal.

Optimize for logging ABORT in the termination protocol: The naive implementation scans the entire log searching for VOTE-YES upon receiving a LogIfNotExist() request for an ABORT record. Although the termination protocol is rarely called, we still want to reduce this overhead. Our idea is to identify a watermark position in the log such that the VOTE-YES cannot be located before the watermark. Therefore, only the tail of the log after the watermark needs to be scanned. There are many ways to identify such a safe watermark. For example, every network message can be associated with the current log sequence number (LSN) of the corresponding storage node. The coordinator of a transaction can collect these LSNs and send them to participants during the commit procedure. For every node, we know for sure that no log record can possibly exist for the transaction before the collected LSN, because the transaction does not touch the corresponding node before that LSN. Such LSNs can serve as our watermarks.

In summary, with the two techniques described above, LogIfNotExist() requests sent during normal execution (lines 2 and 17 in Algorithm 2) only require an in-memory hash table lookup; LogIfNotExist() requests sent during the termination protocol (line 30 in Algorithm 2) require scanning only the tail of the log after the watermark, which is much smaller than the entire log.

Optimizing for Read-only Transactions
Conventional 2PC can be optimized for read-only transactions [23]. Specifically, if a node is read-only, it does not need to log anything during the prepare phase and can simply release locks and end the local transaction. In 1PC, this optimization has a small subtlety. If the coordinator does not know that a participant is actually read-only, the participant cannot avoid logging VOTE-YES, because otherwise it might be aborted by a remote node executing the termination protocol. This may cause a performance degradation. However, we believe that in many practical scenarios the coordinator does know that a participant is read-only during the execution phase. Then, when 1PC starts, the coordinator can send the list of non-read-only nodes (rather than the list of all nodes) together with the prepare requests to participants. This way, 1PC no longer needs to log VOTE-YES for read-only nodes. Even if some nodes run the termination protocol, they will not check the logs of read-only nodes.
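The two LogIfNotExist() optimizations above (the in-memory hash table for remote ABORTs and the LSN watermark that bounds the scan) can be sketched as follows. The data structures and names are ours, not the paper's implementation; the watermark is assumed to be piggybacked on messages as described above.

```python
import threading

class OptimizedStorageNode:
    """Illustrative sketch of the Section 3.6 optimizations for LogIfNotExist()."""
    def __init__(self):
        self._mutex = threading.Lock()
        self._remote_aborts = set()   # txn ids aborted on behalf of this node
        self._log = []                # list of (lsn, txn_id, record)

    def current_lsn(self):
        return len(self._log)         # piggybacked on messages as the watermark

    def log_vote_yes(self, txn_id):
        # Common case: consult the in-memory hash table instead of the log.
        with self._mutex:
            if txn_id in self._remote_aborts:
                return "ABORT"
            self._log.append((len(self._log), txn_id, "VOTE-YES"))
            return "VOTE-YES"

    def log_abort_on_behalf(self, txn_id, watermark_lsn):
        # Termination protocol: only the log tail after the watermark can
        # possibly contain this transaction's records, so scan just that tail.
        with self._mutex:
            for _, tid, record in self._log[watermark_lsn:]:
                if tid == txn_id and record in ("VOTE-YES", "COMMIT"):
                    return record
            self._remote_aborts.add(txn_id)
            self._log.append((len(self._log), txn_id, "ABORT"))
            return "ABORT"
```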
In the x-axis we change The communication across nodes is implemented through gRPC [1] the amount of distributed transactions from 0% to near 100%. We and the communication can be either synchronous or asynchronous. report the average and the 99th tail latency of both Cornus and Each node in the system has a gRPC client for issuing remote re- 2PC in lines and the latency speedup of Cornus over 2PC in bars. quests and a gRPC server for receiving remote requests. Each node As the figure shows, Cornus’s latency speedup increases asthe manages a pool of server threads to handle remote requests. number of distributed transactions increases. The maximum speedup 4.1.2 Hardware. Most of the experiments are performed on a is about 1.4×, which is achieved when nearly all the transactions cluster with up to eight servers running Ubuntu 18.04 on Cloud- are distributed. The speedup of tail latency is similar to that of the Lab [17]. Half of the servers serve as compute nodes and the others average latency. serve as storage nodes. Each server contains two Intel Xeon Silver Figure 5b shows the latency breakdown for both local and dis- 4114 CPUs (10 cores × 2 HT) and 196 GB DRAM. The servers are tributed transactions for both Cornus and 2PC, when majority (i.e., connected together with a 10 Gigabit Ethernet. The logs are stored 97%) of transactions are distributed. For local transactions, there is on Intel DC S3500 480 GB SATA SSD. no need to run the commit protocol so there is no prepare phase but only the commit phase. We notice that for local transactions, Cornus 4.1.3 Workloads. We use two different OLTP workloads for per- is slight slower than 2PC. This is because Cornus allows more trans- formance evaluation. All transactions are executed as stored proce- actions to run concurrently, incurring more resource contention. For dures that contain program logic intermixed with queries. distributed transactions, Cornus can almost completely eliminate the YCSB: The Yahoo! Cloud Serving Benchmark [14] is a synthetic latency of the commit phase that a user experiences. This is mainly benchmark modeled after cloud services. It contains a single table due to two reasons: (1) As discussed in Section 3.1, the decision that is partitioned across servers in a round-robin fashion. Each logging in the commit phase can be done asynchronously in both the partition contains 10 GB data with 1 KB tuples. Each transaction coordinator and the participants, thereby eliminating the extra delay. accesses 16 tuples as a mixture of reads (50%) and writes (50%) (2) The logging overhead in cloud storage is significantly higher with on average 5% of the accesses being remote (selected uniformly than a simple network round-trip. at random). The queries access tuples following a power law distri- Figure 5c shows the throughput of 2PC and Cornus as the per- bution controlled by a parameter 휃. By default, we use 휃 = 0, which centage of distributed transactions increases. We see that Cornus means data access is uniformally distributed. constantly outperforms 2PC in throughput. High throughput im- TPC-C: This is a standard benchmark for evaluating the perfor- provement is achieved when more transactions are distributed. The mance of OLTP DBMSs [29]. TPC-C models a warehouse-centric gain in throughput speedup with repect to increased distributed order processing application that contains five transaction types. All transactions is relatively small compared to latency speedup. 
Note the tables except ITEM are partitioned based on the warehouse ID. that throughput is not the primary metric that Cornus is trying to By default, the ITEM table is replicated at each server. We use a sin- improve; in later experiments, we will focus more on the latency gle warehouse per server to model high contention. Each warehouse speedup instead of throughput speedup. contains around 100 MB of data. For all the five transactions, 10% of NEW-ORDER and 15% of PAYMENT transactions access the 4.3 Percentage of read-only transactions data across multiple servers; other transactions access the data on a We now evaluate the performance of Cornus under YCSB with single server. different percentage of read-only transactions. In our system, read 4.1.4 Implementation Details and Parameter Setup. Unless ratio is a per request setting and we control the percentage of read- otherwise specified, we will use the following default parameter only transactions indirectly by controlling the read ratio of each settings: we will evaluate the system on four compute nodes and data access request. In the experiment, we expect Cornus to obtain four storage nodes. The number of worker threads executing the latency speedup only for read-write transactions since both Cornus Guo, Zeng, Ren and Yu, et al.


Figure 5: Percentage of Distributed Transactions — YCSB with varying percentage of distributed transactions. (a) Latency; (b) latency breakdown (97% distributed transactions); (c) throughput.


Figure 6: YCSB with varying percentage of read-only transactions. (a) Latency; (b) latency breakdown; (c) throughput.


Figure 7: YCSB with varying logging delay. (a) Latency; (b) latency breakdown; (c) throughput.

The results shown in Figure 6a match the expectation that the improvement of Cornus increases as the percentage of read-only transactions decreases. When all transactions are read-only, Cornus and 2PC have the same performance. When there are more than 80% read-write transactions, the average latency speedup of Cornus over 2PC is nearly 1.5×.
The results in Figure 6b show that when all transactions are read-only, Cornus and 2PC perform the same, as expected. The commit phases in both protocols are short due to the optimizations for read-only transactions described in Section 4.1.4. The prepare phases are the same because in our implementation we do not omit the prepare phase, and there is no logging in the prepare phase but only vote requests. When all transactions are read-write, we can clearly see that Cornus eliminates the commit phase in the latency breakdown. We still see that Cornus is slightly slower in the prepare phase, which we ascribe to increased network traffic and resource contention.
Figure 6c shows that there is still no significant benefit for Cornus on throughput for this workload; the numbers are close for both protocols. For the speedup in throughput, although there are some variations in the middle due to the close numbers, we can still see that when all the transactions are read-write, Cornus achieves the best speedup in throughput.

4.4 Logging Delay

In real-world applications, the time spent on logging varies due to factors like variance in network latency and different choices of underlying storage services. For example, a geo-replicated, highly available storage service can be orders of magnitude slower than a local disk. In this experiment, we evaluate the performance of Cornus as the latency of logging increases. We simulate the effect by introducing artificial delays in logging.
Figure 7a shows that the speedup of Cornus increases as the latency of logging increases. With 32 ms of extra logging delay introduced, Cornus can achieve a 2× speedup with respect to 2PC in average latency. Figure 7b further demonstrates that the improvements in Cornus come from the savings on logging in the commit phase. Figure 7c again shows a constant but slight benefit of Cornus over 2PC in throughput.
These results indicate that Cornus is particularly beneficial when the storage layer incurs longer delays. This is the case when compute nodes and storage nodes are geo-distributed and the storage needs to maintain consistency among multiple replicas, as done in modern cloud storage services.

Figure 8: YCSB varying data distribution θ. (a) Latency (log scale on y-axis); (b) latency breakdown; (c) throughput.

Figure 9: TPC-C varying the number of warehouses. (a) Latency; (b) latency breakdown; (c) throughput.

These results indicate that Cornus is particularly beneficial when the storage layer incurs a longer delay. This is the case when compute nodes and storage nodes are geo-distributed and the storage must maintain consistency among multiple replicas, as modern cloud storage services do.

4.5 Contention

In this section we evaluate the performance of Cornus under different levels of contention on the YCSB and TPC-C workloads, respectively.

We vary the level of contention in YCSB by adjusting the Zipfian distribution of data accesses through θ, as mentioned in Section 4.1.3; the larger the θ, the higher the level of contention. Figure 8a shows the results. Overall, the improvement of Cornus can be close to 2× at high contention. Interestingly, the speedup of Cornus follows a v-curve as θ increases: it first decreases until θ reaches 0.9 and then increases as θ grows beyond 0.9.

Figure 8b suggests an interpretation of the v-curve. When contention is low (θ < 0.9), aborts are rare and the main benefit of Cornus comes from the time saved in the commit phase. However, when contention is high (θ ≥ 0.9), the time spent on aborts increases significantly and becomes a dominant factor in the overall latency. In this case, transactions in Cornus hold locks for a shorter time due to the smaller average latency, so Cornus starts to gain improvements from having fewer aborts. Figure 8c shows a similar pattern for throughput: Cornus benefits because of the shorter lock holding time.
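As a reference for how θ controls skew, the following is a minimal and deliberately unoptimized Zipfian sampler. It is an illustrative sketch, not the generator used in our YCSB implementation.

    import random

    def make_zipfian_sampler(num_keys, theta):
        """Return a sampler of key indices in [0, num_keys) with Zipfian skew.
        theta = 0 gives uniform access; larger theta concentrates accesses on
        a few hot keys and therefore raises contention."""
        weights = [1.0 / (rank ** theta) for rank in range(1, num_keys + 1)]
        total = sum(weights)
        cdf, acc = [], 0.0
        for w in weights:
            acc += w / total
            cdf.append(acc)

        def sample():
            r = random.random()
            for idx, c in enumerate(cdf):  # linear scan: O(n), fine for a sketch
                if r <= c:
                    return idx
            return num_keys - 1

        return sample

    # theta = 0.9 makes a small set of hot keys absorb most accesses.
    hot_skewed = make_zipfian_sampler(num_keys=1024, theta=0.9)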

We also evaluate the effect of contention on TPC-C by controlling the number of warehouses. In Figure 9a, the speedup of Cornus increases as the level of contention decreases. The benefit from saving one logging operation diminishes as the level of contention increases and the time spent on aborts dominates the runtime, as shown in Figure 9b. Figure 9c shows that Cornus and 2PC have similar throughput due to the low percentage of distributed transactions in TPC-C.

Figure 10: Scalability. (a) YCSB latency; (b) YCSB throughput; (c) TPC-C latency; (d) TPC-C throughput.

4.6 Scalability Analysis

Finally, we evaluate the scalability of Cornus on both YCSB and TPC-C as the number of compute nodes varies from 2 to 8. We set the parameters to the default values described in Section 4.1.3. We run each setup five times with a 40-second runtime each. The results in Figure 10 demonstrate that both Cornus and 2PC scale well on YCSB and TPC-C. The latency of both 2PC and Cornus remains the same as the number of nodes increases, and the throughput of both protocols increases linearly with the number of nodes. The speedup in latency and throughput also remains constant.

5 RELATED WORK

This section describes additional related work on reducing the latency of 2PC and on solving the blocking problem of 2PC.

5.1 Prior Work on One-Phase Commit

Many previous works propose to remove one phase from 2PC by combining the voting phase with the execution of the transaction, which means the log must be forced before the commit procedure starts. There are two common approaches. The first approach was originally introduced as early prepare (EP) [26]: participants are required to force logging before acknowledging every remote operation, which introduces blocking I/O for every remote operation. To address this overhead, other works propose that each participant send its log along with the acknowledgement to the coordinator, and the coordinator forces the log before the commit phase. This approach is applied in coordinator log (CL) [26, 27], implicit yes vote (IYV) [9], and Lee and Yeom's protocol [22] for in-memory databases. Although some protocols try to reduce the amount of log data at the coordinator, these approaches all suffer from larger acknowledgement messages. Furthermore, these protocols lose site autonomy, a property that requires the recovery of a participant to rely largely on its own logs; instead, they rely on the coordinator's log to recover a participant [7, 8]. To preserve site autonomy and avoid the communication cost of piggybacking redo logs during normal processing, Abdallah et al. [7, 8] propose to use logical logging and to log operations instead of values at the coordinator before issuing remote operations.

However, all these protocols that embed the voting into the execution phase place strict restrictions on the choice of concurrency control protocol. Such protocols assume a transaction can only commit after acknowledgments for all operations are executed successfully, and that no aborts due to serialization or consistency are allowed afterwards [8]. This assumption is incompatible with concurrency control protocols in which aborts due to serializability validation may occur after the execution of an operation, such as optimistic concurrency control. These strong assumptions make such 1PC protocols impractical for real-world systems.

Other works save one phase for a specific use case or system. For example, Congiu et al. [13] designed a 1PC protocol tailored for metadata services; in their design, the voting phase is cut because only two nodes (metadata servers) are involved. The recent parallel commits protocol [6, 31] in CockroachDB [28] removes one network roundtrip from 2PC. It bears similarity to Cornus in that it also considers a transaction committed once all the participants' writes succeed. However, the proposal is designed for a specific system with a specific architecture and concurrency control protocol. A formal specification of the protocol and proofs have not been published yet, and thus it is unclear whether the protocol can be made as general as 2PC and integrated into any cloud database. A comparison with Cornus can be done in the future if details of the protocol are published.
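To make the early-prepare idea described at the beginning of this subsection concrete, the sketch below shows a participant that forces its redo log before acknowledging each remote operation, so that every acknowledgement doubles as a durable yes vote and the explicit prepare round can be skipped. This is our own illustration of the EP approach under assumed log and storage interfaces, not code from any of the cited systems.

    class EarlyPrepareParticipant:
        """Illustrative early-prepare (EP) participant: the vote is folded
        into execution by forcing the log before every acknowledgement."""

        def __init__(self, log, storage):
            self.log = log          # durable log handle (assumed interface)
            self.storage = storage  # local data store (assumed interface)

        def handle_remote_op(self, txn_id, op):
            result = self.storage.execute(txn_id, op)  # run the operation
            self.log.force_write(txn_id, op)           # blocking I/O on every remote operation
            return ("ACK", result)                     # the ack is an implicit yes vote

        def handle_decision(self, txn_id, decision):
            # Because every acknowledged operation is already durable, the
            # coordinator sends the decision directly, with no prepare phase.
            self.log.write(txn_id, decision)
            if decision == "COMMIT":
                self.storage.commit(txn_id)
            else:
                self.storage.abort(txn_id)

The sketch also makes the cost visible: every remote operation now pays a forced log write, which is the overhead that coordinator-log-style variants shift to the coordinator at the price of larger acknowledgements and lost site autonomy.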
5.2 Prior Work on Non-blocking Atomic Commitment Protocols

Skeen [25] showed the necessary and sufficient conditions for a correct non-blocking commitment protocol, known as the fundamental nonblocking theorem. Specifically, nodes in 2PC have four possible states: initial, wait, abort, and commit. The paper points out that, under the same assumptions made in 2PC, a non-blocking protocol can be achieved by introducing another state, a buffer state, in the transition from the wait state to the commit state. Adding such a state requires one more network roundtrip, and the protocol becomes three-phase commit (3PC). Although it solves the blocking issue, 3PC exacerbates the latency problem of 2PC.

Babaoglu and Toueg [11] proposed a non-blocking atomic commitment protocol based on 2PC. The protocol applies three strategies: (1) synchronizing clocks on different nodes so that out-of-time messages can be ignored; (2) having participants forward the decision to other participants upon receiving the message from the coordinator (the Uniform Timed Reliable Broadcast algorithm); and (3) presuming abort instead of running a termination protocol upon timeout. The first two strategies enable the last strategy to eliminate the chance of blocking. However, the protocol introduces more communication across nodes even when no failure happens, and its correctness relies on synchronized clocks, which is a non-trivial requirement for real-world applications.

Similar to Babaoglu and Toueg's protocol, EasyCommit [19] solves the problem by requiring participants to forward the decision to each other upon receiving it but before logging it. When a participant times out, a leader is elected. The leader then consults all the active nodes for the decision and can decide to abort if no active node has a decision. However, the protocol satisfies the atomic commitment properties only under the assumption that the forwarded message will be delivered to at least one node without delay or loss before the log of the decision is flushed. Moreover, it still introduces extra communication overhead when no failure occurs, as well as the complexity of leader election when a failure happens.
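To make the state argument concrete, the toy transition tables below contrast the 2PC participant states with the buffered state that 3PC inserts between wait and commit. The state name PRE_COMMIT and the table encoding are our own illustration, not notation from [25].

    # Allowed participant state transitions, restricted to the states discussed above.
    TWO_PC = {
        "INITIAL": {"WAIT", "ABORT"},    # vote yes -> WAIT, vote no -> ABORT
        "WAIT":    {"COMMIT", "ABORT"},  # blocked here if the coordinator fails before deciding
        "COMMIT":  set(),
        "ABORT":   set(),
    }

    THREE_PC = {
        "INITIAL":    {"WAIT", "ABORT"},
        "WAIT":       {"PRE_COMMIT", "ABORT"},  # the extra buffer state
        "PRE_COMMIT": {"COMMIT"},               # one more roundtrip, but no blocking
        "COMMIT":     set(),
        "ABORT":      set(),
    }

The buffer state is what allows an uncertain participant to learn the outcome from its peers, at the cost of the additional roundtrip described above.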

6 CONCLUSION

We proposed Cornus, a one-phase commit protocol designed for the storage disaggregation architecture that is widely used in cloud databases. Cornus solves both the long latency and the blocking problem in 2PC at the same time, while not introducing extra impractical assumptions. We formally proved the correctness of Cornus and proposed some optimizations. Our evaluations on two benchmarks show a speedup of 1.5× in latency.

REFERENCES
[1] 2015. gRPC: A high performance, open-source universal RPC framework. https://grpc.io/.
[2] 2018. Amazon Athena — Serverless Interactive Query Service. https://aws.amazon.com/athena.
[3] 2018. Amazon Redshift. https://aws.amazon.com/redshift.
[4] 2018. Amazon S3. https://aws.amazon.com/s3/.
[5] 2018. Presto. https://prestodb.io.
[6] 2020. Parallel Commits. https://www.cockroachlabs.com/docs/v20.2/architecture/transaction-layer.html#parallel-commits
[7] Maha Abdallah. 1997. A non-blocking single-phase commit protocol for rigorous participants. In Proceedings of the National Conference Bases de Données Avancées. Citeseer.
[8] Maha Abdallah, Rachid Guerraoui, and Philippe Pucheral. 1998. One-phase commit: does it make sense?. In Proceedings 1998 International Conference on Parallel and Distributed Systems (Cat. No. 98TB100250). IEEE, 182–192.
[9] Y. Al-Houmaily and P. Chrysanthis. 1995. Two-phase commit in gigabit-networked distributed databases. In Int. Conf. on Parallel and Distributed Computing Systems (PDCS).
[10] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, et al. 2015. Spark SQL: Relational Data Processing in Spark. In SIGMOD.
[11] Ozalp Babaoglu and Sam Toueg. 1993. Understanding non-blocking atomic commitment. Distributed Systems (1993).
[12] Philip A. Bernstein. 1987. Concurrency Control and Recovery in Database Systems. Vol. 370. Addison-Wesley, New York.
[13] Giuseppe Congiu, Matthias Grawinkel, Sai Narasimhamurthy, and André Brinkmann. 2012. One phase commit: A low overhead atomic commitment protocol for scalable metadata services. In 2012 IEEE International Conference on Cluster Computing Workshops. IEEE, 16–24.
[14] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing. 143–154.
[15] Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, et al. 2016. The Snowflake Elastic Data Warehouse. In SIGMOD.
[16] Aleksandar Dragojević, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. 2015. No Compromises: Distributed Transactions with Consistency, Availability, and Performance. In SOSP. 54–70.
[17] Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong, Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David Johnson, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart, Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet, Snigdhaswin Kar, and Prabodh Mishra. 2019. The Design and Operation of CloudLab. In Proceedings of the USENIX Annual Technical Conference (ATC). 1–14. https://www.flux.utah.edu/paper/duplyakin-atc19
[18] Rachid Guerraoui, Mikel Larrea, and André Schiper. 1995. Non blocking atomic commitment with an unreliable failure detector. In Proceedings 14th Symposium on Reliable Distributed Systems. IEEE, 41–50.
[19] Suyash Gupta and Mohammad Sadoghi. 2018. EasyCommit: A Non-blocking Two-phase Commit Protocol. In EDBT. 157–168.
[20] Rachael Harding, Dana Van Aken, Andrew Pavlo, and Michael Stonebraker. 2017. An Evaluation of Distributed Concurrency Control. VLDB (2017), 553–564.
[21] Hsiang-Tsung Kung and John T. Robinson. 1981. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS) 6, 2 (1981), 213–226.
[22] Inseon Lee and Heon Young Yeom. 2002. A single phase distributed commit protocol for main memory database systems. In Proceedings 16th International Parallel and Distributed Processing Symposium. IEEE, 8 pp.
[23] C. Mohan, Bruce Lindsay, and Ron Obermarck. 1986. Transaction management in the R* distributed database management system. ACM Transactions on Database Systems (TODS) 11, 4 (1986), 378–396.
[24] George Samaras, Kathryn Britton, Andrew Citron, and C. Mohan. 1993. Two-phase commit optimizations and tradeoffs in the commercial environment. In Proceedings of IEEE 9th International Conference on Data Engineering. IEEE, 520–529.
[25] Dale Skeen. 1981. Nonblocking commit protocols. In Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data. 133–142.
[26] James W. Stamos and Flaviu Cristian. 1990. A low-cost atomic commit protocol. In Proceedings Ninth Symposium on Reliable Distributed Systems. IEEE, 66–75.
[27] James W. Stamos and Flaviu Cristian. 1993. Coordinator log transaction execution protocol. Distributed and Parallel Databases 1, 4 (1993), 383–408.
[28] Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, et al. 2020. CockroachDB: The resilient geo-distributed SQL database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1493–1509.
[29] The Transaction Processing Council. 2007. TPC-C Benchmark (Revision 5.9.0).
[30] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu, and Raghotham Murthy. 2010. Hive — A Petabyte Scale Data Warehouse Using Hadoop. In ICDE.
[31] Nathan VanBenschoten. 2019. Parallel Commits: An Atomic Commit Protocol For Globally Distributed Transactions. https://www.cockroachlabs.com/blog/parallel-commits/
[32] Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, and Xiaofeng Bao. 2017. Amazon Aurora: Design considerations for high throughput cloud-native relational databases. In Proceedings of the 2017 ACM International Conference on Management of Data. 1041–1052.
[33] Xiangyao Yu, Yu Xia, Andrew Pavlo, Daniel Sanchez, Larry Rudolph, and Srinivas Devadas. 2018. Sundial: Harmonizing concurrency control and caching in a distributed OLTP database management system. Proceedings of the VLDB Endowment 11, 10 (2018), 1289–1302.
[34] Xiangyao Yu, Matt Youill, Matthew Woicik, Abdurrahman Ghanem, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker. 2020. PushdownDB: Accelerating a DBMS using S3 Computation. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1802–1805.