Rethinking Database High Availability with RDMA Networks
Total Page:16
File Type:pdf, Size:1020Kb
Rethinking Database High Availability with RDMA Networks Erfan Zamanian1, Xiangyao Yu2, Michael Stonebraker2, Tim Kraska2 1 Brown University 2 Massachusetts Institute of Technology [email protected], fyxy, stonebraker, [email protected] ABSTRACT copy propagate to all the backup copies synchronously such that Highly available database systems rely on data replication to tol- any failed primary server can be replaced by a backup server. erate machine failures. Both classes of existing replication algo- The conventional wisdom of distributed system design is that rithms, active-passive and active-active, were designed in a time the network is a severe performance bottleneck. Messaging over when network was the dominant performance bottleneck. In essence, a conventional 10-Gigabit Ethernet within the same data center, these techniques aim to minimize network communication between for example, delivers 2–3 orders of magnitude higher latency and replicas at the cost of incurring more processing redundancy; a lower bandwidth compared to accessing the local main memory of trade-off that suitably fitted the conventional wisdom of distributed a server [3]. Two dominant high availability approaches, active- database design. However, the emergence of next-generation net- passive and active-active, both adopt the optimization goal of min- works with high throughput and low latency calls for revisiting imizing network overhead. these assumptions. With the rise of the next-generation networks, however, conven- In this paper, we first make the case that in modern RDMA- tional high availability protocol designs are not appropriate any- enabled networks, the bottleneck has shifted to CPUs, and there- more, especially in a setting of Local Area Network (LAN). The fore the existing network-optimized replication techniques are no latest remote direct memory access (RDMA) based networks, for longer optimal. We present Active-Memory Replication, a new high example, achieve a bandwidth similar to that of main memory, availability scheme that efficiently leverages RDMA to completely while having only a factor of 10× higher latency. Our investigation eliminate the processing redundancy in replication. Using Active- of both active-passive and active-active schemes demonstrates that Memory, all replicas dedicate their processing power to executing with a modern RDMA network, the performance bottleneck has new transactions, as opposed to performing redundant computa- shifted from the network to CPU’s computation overhead. There- tion. Active-Memory maintains high availability and correctness fore, the conventional network-optimized schemes are not the best in the presence of failures through an efficient RDMA-based undo- fit anymore. This calls for a new protocol design to fully unleash logging scheme. Our evaluation against active-passive and active- the potential of RDMA networks. active schemes shows that Active-Memory is up to a factor of 2 To this end, we propose Active-Memory Replication, a new high faster than the second-best protocol on RDMA-based networks. availability protocol designed specifically for the next-generation RDMA networks in the LAN setting. The optimization goal in PVLDB Reference Format: Active-Memory is to minimize the CPU overhead of performing Erfan Zamanian, Xiangyao Yu, Michael Stonebraker, Tim Kraska. Rethink- data replication rather than minimizing network traffic. The core ing Database High Availability with RDMA Networks. PVLDB, 12(11): idea of Active-Memory is to use the one-sided feature of RDMA 1637-1650, 2019. to directly update the records on remote backup servers without DOI: https://doi.org/10.14778/3342263.3342639 involving the remote CPUs. One key challenge in such design is to achieve fault tolerance when the CPUs on backup servers do not participate in the replication protocol. To address this problem, we 1. INTRODUCTION designed a novel undo-logging based replication protocol where A key requirement of essentially any transactional database sys- all the logic is performed unilaterally by the primary server. Each tem is high availability. A single machine failure should neither transaction goes through two serial phases: (1) undo logging and render the database service unavailable nor should it cause any data in-place updates and (2) log space reclamation, where each update loss. High availability is typically achieved through distributed data is performed by a separate RDMA write. We have proved that the replication, where each database record resides in a primary replica protocol has correct behavior under different failure scenarios. as well as one or multiple backup replicas. Updates to the primary We compared Active-Memory with both active-passive (i.e., log- shipping [26, 17]) and active-active (i.e., H-Store/VoltDB [16, 35] and Calvin [37]) schemes on various workloads and system config- This work is licensed under the Creative Commons Attribution- urations. Evaluation shows that Active-Memory is up to a factor of NonCommercial-NoDerivatives 4.0 International License. To view a copy 2× faster than the second-best baseline protocol that we evaluated of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For over RDMA-based networks. any use beyond those covered by this license, obtain permission by emailing Specifically, the paper makes the following contributions: [email protected]. Copyright is held by the owner/author(s). Publication rights • We revisit the conventional high availability protocols on the licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 12, No. 11 next-generation networks and demonstrate that optimizing ISSN 2150-8097. for network is no longer the most appropriate design goal. DOI: https://doi.org/10.14778/3342263.3342639 1637 • We propose Active-Memory, a new replication protocol for Exec T1 Commit Exec T2 (A) Commit RDMA-enabled high bandwidth networks, which is equipped P1 Replay Replay with a novel undo-log based fault tolerance protocol that is logs logs both correct and fast. B1 • We perform extensive evaluation of Active-Memory over con- Exec T2 (B) Commit ventional protocols and show it can perform 2× faster than P2 Replay the second-best protocol that we evaluated. logs The rest of the paper is organized as follows: Section 2 de- B2 scribes the background of the conventional high availability proto- Figure 1: Active-passive replication using log shipping. cols. Section 3 analyzes why conventional wisdom is no longer ap- propriate for modern RDMA-based networks. Section 4 describes 2.1 Active-Passive Replication the Active-Memory replication protocol in detail, and Section 5 Active-passive replication is one of the most commonly used demonstrates that the protocol is fault tolerant. In Section 6, we replication techniques. Each database partition consists of one ac- present the results of our performance evaluation. Section 7 re- tive copy (known as the primary replica) which handles transac- views the related work and Section 8 concludes the paper. tions and makes changes to the data, and one or more backup repli- cas, which keep their copies in sync with the primary replica. When a machine p fails, the system maintains its high availability by pro- moting one of p’s backup nodes as the new primary and fails over to that node. There are many different implementations of active- passive replication both in academic projects [9, 15, 18, 39] and in 2. HIGH AVAILABILITY IN DBMSS commercial databases [4] (such as Postgres Replication [17], Or- Database systems experience failures for different reasons: hard- acle TimesTen [19], and Microsoft SQL Server Always On [26]). ware failures, network communication failures, software bugs, hu- Active-passive schemes are often implemented through log ship- man errors, among others. Highly available database systems en- ping where the primary executes the transaction, then ships its log sure that even in the face of such failures, the system remains oper- to all its backup replicas. The backup replicas replay the log so that ational with close to zero downtime. the new changes are reflected in their copy. High availability is typically achieved through replication: every Figure 1 illustrates how log shipping is used in an active-passive record of the database gets replicated to one or more machines. To replication scheme. Here, we assume that the database is split into survive k machine failures, the system must make sure that for each two partitions (P 1 and P 2), with each partition having a backup transaction, its effects are replicated on at least k+1 machines. This copy (B1 is backup for P 1, and B2 is backup for P 2). In practice, is known as the k-safety rule. For example, for k = 1, each record P 1 and B2 may be co-located on the same machine, while P 2 is stored on two different machines, so that a failure of either of and B1 may reside on a second machine. T 1 (colored in blue) them does not disrupt the continuous operation of the system. is a single-partition transaction which touches data only on P 1. According to the widely-cited taxonomy of Gray el al. [10], repli- Therefore, P 1 executes this transaction, and before committing, it cation protocols can be either eager or lazy (which pertains to when sends the change log to all its backup. Upon receiving all the acks, the updates are propagated to the replicas), and be either primary P 1 can commit. Transactions that span multiple partitions, such copy or update anywhere (which concerns where data-modifying as T 2 (colored in orange), follow the same replication protocol, transactions must be issued). Lazy replication is often used in data except that an agreement protocol such as 2PC is also needed to stores where strong consistency is not crucial, and the possibility of ensure consistency. data loss can be accepted in exchange for possibly better through- Wide adoption of active-passive replication is due to its simplic- put, such as in Amazon Dynamo [7] and Facebook Cassandra [20].