Rethinking Database High Availability with RDMA Networks Erfan Zamanian1, Xiangyao Yu2, Michael Stonebraker2, Tim Kraska2 1 Brown University 2 Massachusetts Institute of Technology
[email protected], fyxy, stonebraker,
[email protected] ABSTRACT copy propagate to all the backup copies synchronously such that Highly available database systems rely on data replication to tol- any failed primary server can be replaced by a backup server. erate machine failures. Both classes of existing replication algo- The conventional wisdom of distributed system design is that rithms, active-passive and active-active, were designed in a time the network is a severe performance bottleneck. Messaging over when network was the dominant performance bottleneck. In essence, a conventional 10-Gigabit Ethernet within the same data center, these techniques aim to minimize network communication between for example, delivers 2–3 orders of magnitude higher latency and replicas at the cost of incurring more processing redundancy; a lower bandwidth compared to accessing the local main memory of trade-off that suitably fitted the conventional wisdom of distributed a server [3]. Two dominant high availability approaches, active- database design. However, the emergence of next-generation net- passive and active-active, both adopt the optimization goal of min- works with high throughput and low latency calls for revisiting imizing network overhead. these assumptions. With the rise of the next-generation networks, however, conven- In this paper, we first make the case that in modern RDMA- tional high availability protocol designs are not appropriate any- enabled networks, the bottleneck has shifted to CPUs, and there- more, especially in a setting of Local Area Network (LAN).