Dissecting the Performance of Strongly-Consistent Replication Protocols

Ailidani Ailijiang∗ (Microsoft), Aleksey Charapko† (University at Buffalo, SUNY), Murat Demirbas† (University at Buffalo, SUNY)
[email protected] [email protected] [email protected]
∗Work completed at University at Buffalo, SUNY. †Also with Microsoft, Redmond, WA.

ABSTRACT
Many distributed databases employ consensus protocols to ensure that data is replicated in a strongly-consistent manner on multiple machines despite failures and concurrency. Unfortunately, these protocols show widely varying performance under different network, workload, and deployment conditions, and no previous study offers a comprehensive dissection and comparison of their performance. To fill this gap, we study single-leader, multi-leader, hierarchical multi-leader, and leaderless (opportunistic leader) consensus protocols, and present a comprehensive evaluation of their performance in local area networks (LANs) and wide area networks (WANs). We take a two-pronged systematic approach. We present an analytic modeling of the protocols using queuing theory and show simulations under varying controlled parameters. To cross-validate the analytic model, we also present empirical results from our prototyping and evaluation framework, Paxi. We distill our findings to simple throughput and latency formulas over the most significant parameters. These formulas enable developers to decide which category of protocols would be most suitable under given deployment conditions.

ACM Reference Format: Ailidani Ailijiang, Aleksey Charapko, and Murat Demirbas. 2019. Dissecting the Performance of Strongly-Consistent Replication Protocols. In 2019 International Conference on Management of Data (SIGMOD '19), June 30-July 5, 2019, Amsterdam, Netherlands. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3299869.3319893

1 INTRODUCTION
Coordination services and protocols play a key role in modern distributed systems and databases. Many distributed databases and datastores [4, 7–10, 12, 13, 16, 18, 23, 24, 31, 40] use consensus to ensure that data is replicated in a strongly-consistent manner on multiple machines despite failures and concurrency.
The fault-tolerant distributed consensus problem is addressed by the Paxos [25] protocol and its numerous variations and extensions [1, 19–21, 26, 30, 33–35]. The performance of these protocols becomes important for the overall performance of the distributed databases. These protocols show widely varying performance under different conditions: network (latency and bandwidth), workload (command interference and locality), deployment size and topology (LAN/WAN, quorum sizes), and failures (leader and replica crash and recovery). Unfortunately, there has been no study that offers a comprehensive comparison across consensus protocols, and that dissects and explains their performance.

1.1 Contributions
We present a comprehensive evaluation of consensus protocols in local area networks (LANs) and wide area networks (WANs) and investigate many single-leader, multi-leader, hierarchical multi-leader and leaderless (opportunistic leader) consensus protocols. We take a two-pronged systematic approach and study the performance of these protocols both analytically and empirically.
For the analytic part, we devise a queuing theory based model to study the protocols, controlling for workload and deployment characteristics, and present high-fidelity simulations of the protocols. Our model captures parameters impacting throughput, such as inter-node latencies, node processing speed, network bandwidth, and workload characteristics. We made the Python implementations of our analytical models available as opensource.
For our empirical study, we developed Paxi, a prototyping and evaluation framework for consensus and replication protocols. Paxi provides a leveled playground for protocol evaluation and comparison. The protocols are implemented using common building blocks for networking, message handling, quorums, etc., and the developer only needs to fill in two modules for describing the distributed coordination protocol. Paxi includes benchmarking support to evaluate the protocols in terms of their performance, availability, scal-

ability, and consistency. We implemented Paxi in Go [17] Reject Quorum programming language and made it available as opensource at https://github.com/ailidani/paxi. Reject The analytical model and the Paxi experimental frame- work are complementary. The Paxi experiments cross-validate Figure 1: State transitions for two-phase coordination the analytical model. And the analytical model allows ex- ploring varying deployment conditions that are difficult to increasing the number of leaders (which helps for throughput arrange and control using the experimental framework. and latency) may cause an increase on conflicts (which hurts 1.2 Results throughput and latency). EPaxos [30] protocol suffers from this problem. Multi-leader protocols that learn and adapt to Armed with both the simulation results from the analyti- locality, such as WPaxos [1] and WanKeeper [2], are less cal model and the experimental results obtained from the susceptible to this problem. Paxi platform implementations, we distill the performance Finally, the deployment parameters, distance to the leader results into simple throughput and latency formulas and and distance from leader to the quorum number of nodes, present these in Section 6. These formulas provide a simple also have a big effect on the latency in WAN deployments. unified theory of strongly-consistent replication in terms of Note that these deployment parameters shadow the protocol throughput —Formula 3: L/(1 + c)(Q + L − 2)— and latency parameters, the number or leaders and the quorum size. In —Formula 7:(1+c)∗((1−l)∗(DL +DQ )+l ∗DQ ). As such, these WANs, other factors also affect latency. The asymmetric formulas enable developers to perform back-of-the-envelope distances between datacenters, the access pattern locality, performance forecasting. In Section 6 we discuss these re- and unbalanced quorum distances complicate forecasting sults in detail and provide a flowchart to serve as a guideline the performance WAN deployments. to identify which consensus protocols would be suitable for a given deployment environment. Here we highlight some 1.3 Outline of the rest of the paper significant corollaries from these formulas. In Section 2 we briefly introduce the protocols we study. L number of leaders Protocol parameters We discuss our analytical model in Section 3 and our proto- Q quorum size typing/evaluation framework in Section 4. We present the c conflict probability Workload parameters evaluation in Section 5, discussion of the findings in Section 6, l locality and conclude the paper in Section 7. D latency to leader Deployment parameters L DQ latency to quorum 2 PROTOCOLS Considering protocol parameters, an effective protocol- Many coordination and replication protocols share a similar level revision for improving throughput and latency is to state transition pattern as shown in Figure 1. These protocols increase the number of leaders in the protocol, while trying typically operate in two phases. In the phase-1, some node to avoid an increase on the number of conflicts. Increasing establishes itself as a leader by announcing itself to other the number of leaders is also good for availability: In Paxos, nodes and gaining common consent. 
During this stage, an failure of the single leader leads to unavailability until a new incumbent leader also acquires information related to any leader is elected, but in multi-leader protocols most requests prior unfinished commands in order to recover them in the do not experience any disruption in availability, as the failed next phase. The phase-2 is responsible for replicating the leader is not in their critical path. Another protocol revision state/commands from the leader to the nodes. that helps to improve throughput and latency is to reduce Q, Leveraging this two phase pattern, we give brief descrip- the quorum size, provided that fault-tolerance requirements tions of the protocols in our study below. These protocols are still met. provide strong consistency guarantees for data replication As workload parameters are concerned, reducing conflict in distributed databases. probability and increasing locality (in the presence of mul- Paxos. The Paxos consensus protocol [25] is typically em- tiple leaders) are beneficial. However, there is an interplay ployed for realizing a fault-tolerant replicated state machine between the number of leaders and probability of conflicts: (RSM), where each node executes the same commands in the self-appoint wait for wait for as command Flexible Paxos. Flexible quorums Paxos, or FPaxos [20], majority majority leader observes that the majority quorum is not necessary in phase- "lead with command "Ok, but" "Ok" commit c ballot b?" c? 1 and phase-2. Instead, FPaxos shows that the Paxos prop- Leader erties hold as long as all quorums in phase-1 intersect with all quorums in phase-2. This result enables deploying multi- decree Paxos protocols with a smaller quorum in phase-2, providing better performance at the cost of reduced fault Phase: phase-1a phase-1b phase-2a phase-2b phase-3 tolerance. Propose Promise Accept Accepted Commit Vertical Paxos. Vertical Paxos (VPaxos) [26] separates Figure 2: Overview of Paxos algorithm the control plane from the data plane. VPaxos imposes a master Paxos cluster above some Paxos groups in order to control any configuration changes, and enables a quick and safe transition between configurations without incurring any stop time: one Paxos group finishes the commands proposed same order to arrive to identical states. Paxos achieves this in the previous configuration, while another Paxos group in 3 distinct phases: propose (phase-1), accept (phase-2), and starts on the commands in the new configuration. The ability commit as shown in Figure 2. During phase-1, a node tries to safely switch configurations is useful in geo-replicated to become the leader for a command by proposing it with datastores, since it allows for relocating/assigning data/ob- a ballot number. The other nodes acknowledge this node to jects to a different leader node in order to adjust to changes lead the proposal only if they have not seen a higher ballot in access locality. Spanner [12] and CockroachDB [24] are before. Upon reaching a majority quorum of acks in the pro- examples of databases that uses Paxos groups to work on pose phase, the “leader” advances to phase-2 and tells the partitions/shards with another solution on top for relocat- followers to accept the command. The command is either ing/assigning data to another Paxos group. a new one proposed by the leader, or an older one learned WanKeeper. WanKeeper [2] is a hierarchical protocol in phase-1. (If the leader learns some older pending/uncom- composed of two consensus/Paxos layers. 
It employs a to- mitted commands in phase-1, it must instruct the followers ken broker architecture to control where the commands to accept the pending commands with the highest ballot take place across a WAN deployment. The master resides numbers.) Once the majority of followers acknowledge the at level-2 and controls all token movement, while the level- acceptance of a command, the command becomes anchored 1 Paxos groups located in different datacenters across the and cannot be lost. Upon receiving the acks, the leader sends globe, execute commands only if they have a token for the a commit message that allows the followers to commit and corresponding objects. When multiple level-1 Paxos groups execute the command in their respected state machines. require access to the same object, the master layer at level-2 Several optimizations are adopted over this basic scheme. retracts the token from the lower level and performs com- The commit phase is typically piggybacked to the next mes- mands itself. Once the access locality settles to a single re- sage broadcasted from the leader, alleviating the need for gion, the master can pass the token back to that level-1 Paxos an extra message. Another popular optimization, known as group to improve latency. multi-Paxos or multi-decree Paxos [37], reduces the need for WPaxos. WPaxos [1] is a multi-leader Paxos variant de- extra messages further by allowing the same leading node to signed for WANs. It takes advantage of flexible quorums idea instruct multiple commands to be accepted in different slots to improve WAN performance, especially in the presence of without re-running the propose phase, as long as its ballot access locality. In WPaxos, every node can own some objects number remains the highest the followers have seen. In the and operate on these objects independently. Unlike Vertical rest of the paper, we use Paxos to refer to the multi-Paxos Paxos, WPaxos does not consider changing the object own- implementation. ership as a reconfiguration operation and does not require As examples of databases that uses Paxos, FaunaDB [16] an external consensus group. Instead WPaxos performs ob- uses Raft [33] (a Paxos implementation) to achieve consensus, ject migration between leaders by carrying out a phase-1 Gaios [7] uses Paxos to implement a storage service, WAN- across the WAN, and commands are committed via phase-2 Disco [8] uses Paxos for active-active replication, Bizur [19] within the region or neighboring regions. Since the process key-value store uses Paxos for reaching consensus indepen- of moving ownership between leaders is performed using the dently on independent keys, pg_paxos [13] adopts Paxos core Paxos protocol, WPaxos operates safely without requir- for fault-tolerant, consistent table replication in PostgreSQL, ing an external master as in vertical Paxos or WanKeeper. and [10] distributed SQL uses Paxos for FleetDB [9] adopts WPaxos for implementing distributed distributed transaction resolution. transactions over multiple datacenter deployments in WAN. Egalitarian Paxos. Egalitarian Paxos [30], or EPaxos, is a leaderless solution, where every node can opportunistically AWS Latency : μ = 0.4271, σ = 0.0476 become a leader for some command and commit it. When 10 a command does not interfere with other concurrent com- mands, it commits in a single round after receiving the acks 8 from a fast quorum, which is approximately 3/4ths of all nodes. In a sense, EPaxos compacts phase-2 to be part of 6 phase-1, when there are no conflicts. 
However, if the fast quorum detects a conflict between the commands, EPaxos Probability 4 defaults back to the traditional Paxos mode proceeds with a second phase to establish order on the conflicting commands. The main advantage of EPaxos is the ability to commit non- 2 interfering commands in just a single round trip time. This 0 works well when the cluster operates on many objects, and 0.30 0.35 0.40 0.45 0.50 0.55 0.60 the probability of any two nodes operating on the same Local Ping Latency (ms) objects is small. As an example of a leaderless approach like EPaxos, MDCC [23] has used Fast Paxos and General- Figure 3: Histogram of local area RTTs within Amazon ized Paxos (before EPaxos) to implement multi-datacenter AWS EC2 region over a course of a few minutes strongly consistent replication. 3.2 Simple Queueing Models 3 PERFORMANCE MODEL Queueing theory [3] serves as the basis for our analytical Modeling the performance of protocols provides an easy way study of consensus protocols. We treat each node in a system to test different configurations or ideas. In this section, we as a single processing queue consisting of both NIC and CPU. develop our models for estimating latency and maximum The protocols operate by exchanging the messages that go throughput of the strong consistency replication protocols. through the queue and use the machine’s resources. Sending Our models leverage queueing theory [3] and k-order statis- a message out from a node requires some time to process tics to account for various delays and overheads due to the at the CPU and some time to transmit through the NIC. message exchange and processing. Similarly, an incoming message first needs to clear the NIC before being deserialized and processed by a CPU. When a message enters a queue, it needs to wait for any prior 3.1 Assumptions messages to clear and resources to become available. Queueing models enable estimating the average waiting We make a number of assumptions about the machines and time spent in the queue before the resources become avail- network in order to constrain the scope of our models and able. To make such estimates, queueing models require just keep them simple and easy to understand. two parameters: service time and inter-arrival time. Service We assume the round-trip communication latency (RTT) time describes how long it takes to process each item in the in the network between any two nodes to be normally dis- queue once it is ready to be consumed. Inter-arrival time tributed. This assumption simplifies the reasoning about the controls the rate at which items enter the queue. With high effects of network latency on the consensus performance. inter-arrival time, new items come in rarely, keeping the Exponential distribution is often used to model network la- queue rather empty. Low inter-arrival time causes the queue tencies [6], however, our experiments conducted in AWS to fill up faster, increasing the chance of items having towait EC2 point to an approximately normal latency distribution for predecessors to exit the queue. in local area in AWS cluster, as shown in Figure 3. Since in our model messages immediately transition from We assume all nodes to have stable, uniform network band- CPU to NIC (or vice-versa) and have no possibility of leaving width. Our models do not account for bandwidth variations the system without bypassing either the CPU or NIC, we among the nodes. For simplicity, we assume all nodes in the treat these two components as a single queue. 
This simpli- modeled cluster to be of identical performance. In particular, fies the “queueing network” significantly and facilitates our we assume identical CPU and network interface card (NIC) modeling. performance. We only consider the case of a single process- For our purposes we have considered four different types ing pipeline – our modeled machines have a single network of queue approximations: M/M/1, M/D/1, M/G/1 andG/G/1, card and CPU. where the first letter represents the inter-arrival assumption, Request Arrival Service Time Wq ρ2 M/M/1 Poisson Process rate λ Exponential Distribution rate µ λ(1−ρ) / ρ M/D/1 Poisson Process Constant s rate µ = 1 s 2µ(1−ρ) λ2σ 2+ρ2 M/G/1 Poisson Process General Distribution 2λ(1−ρ) 2 2 ρ (1+Cs )(Ca +ρ Cs ) G/G/1 General Distribution General Distribution ≈ 2 2λ(1−ρ)(1+ρ Cs ) Table 1: Queue types and assumptions

The simplest model, M/M/1, assumes both inter-arrival and service time to be approximated by a Poisson process. The M/D/1 model makes the service time constant, and the M/G/1 queue assumes the service time to follow a general distribution. The most general model we have considered is G/G/1. It assumes both the service time and the inter-arrival time to be arbitrary distributed random variables.

Table 2: Model parameters
N: number of nodes participating in a Paxos phase
Q: quorum size; for a majority quorum Q = ⌊N/2⌋ + 1
D_L: RTT between the client and the leader node
D_Q: RTT between the leader and the (Q − 1)th follower
b: network bandwidth available at the node
s_m: message size
t_i: processing time for an incoming message
t_o: processing time for an outgoing message
t_s: service time; for Paxos t_s = 2t_o + N·t_i + 2N·s_m/b
µ: max throughput or service rate, µ = 1/t_s
λ: workload throughput or arrival rate in rounds per second
ρ: queue utilization; for the M/D/1 queue ρ = λ/µ
w_Q: queue wait time for a round; for the M/D/1 queue w_Q = ρ/(2µ(1−ρ))

3.3 Modeling Consensus Performance
To model consensus performance, we are interested in estimating the average latency of a consensus round as perceived by the client. For such latency estimates, we consider the parameters outlined in Table 2. The average latency is comprised of a few different components: the round's queue waiting time w_Q, the round's service time t_s, and the network delays D_L and D_Q:

Latency = w_Q + t_s + D_L + D_Q

For Paxos, the network delays consist of a round-trip time (RTT) between the client and the leader, D_L, and between the leader and the follower whose reply forms a quorum at the leader, D_Q. For the network delays, we assume only the time-in-transit for messages, since the time to clear the NIC is accounted for in our service time computations. To calculate D_Q in Paxos, we need to consider the quorum size Q of the deployment. For a cluster with N nodes, the quorum size is Q = ⌊N/2⌋ + 1, making a self-voting leader wait for Q − 1 follower messages before reaching a majority quorum. The RTT for this (Q − 1)th follower reply is D_Q. In a LAN we assume the RTTs between all nodes to be drawn from the same Normal distribution, therefore we use a Monte Carlo approximation of k-order statistics [14] to compute the RTT for the (Q − 1)th reply. In a WAN, however, the RTTs between different nodes may be significantly different, therefore we pick the (Q − 1)th smallest RTT between the leader and its followers.
Round service time t_s is a measure of how long it takes the leader to process all messages for a given round. For simplicity, we assume that each message, incoming and outgoing, needs to be processed by both the NIC and the CPU. The round's service time t_s is then a sum of t_NIC and t_CPU: t_s = t_NIC + t_CPU. We compute t_NIC as the time required to push all of the round's messages of size s_m through the network of some bandwidth b: t_NIC = 2N·s_m/b. The CPU portion of the service time is estimated as the sum of the costs for processing incoming messages t_i and outgoing messages t_o. For Paxos, each phase-2 round requires the leader to receive a message from the client (t_i), broadcast one outgoing message (t_o, because the CPU serializes the broadcast message only once), receive N − 1 messages from the followers ((N − 1)·t_i), and reply back to the client (t_o). As a result, we have t_CPU = 2t_o + N·t_i.
Note that only service time impacts the maximum throughput of the system, since it alone governs how much of the processing resources is consumed for each round. The protocol reaches its peak throughput when it fully saturates the queue and leaves no unused resources in it. The maximum throughput is the reciprocal of the service time: µ = 1/t_s.
Finally, we estimate the queueing cost w_Q by computing the average queue wait time.
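Putting the pieces of Table 2 together, the sketch below estimates the Paxos round latency in a LAN exactly as described above: compute the service time, derive the utilization and the M/D/1 wait, approximate the quorum RTT D_Q by Monte Carlo k-order statistics over normally distributed RTTs, and sum the components. The structure and names are our own illustration, not the paper's released Python model.

```go
package model

import (
	"math/rand"
	"sort"
)

// Params mirrors Table 2 (times in seconds, bandwidth in bytes/s, sizes in bytes).
type Params struct {
	N       int     // nodes participating in the Paxos phase
	Lambda  float64 // arrival rate, rounds per second
	Ti, To  float64 // CPU processing time per incoming / outgoing message
	Sm      float64 // message size
	B       float64 // network bandwidth
	DL      float64 // client-to-leader RTT
	MeanRTT float64 // mean node-to-node RTT in the LAN
	StdRTT  float64 // standard deviation of node-to-node RTT
}

// serviceTime implements t_s = 2*t_o + N*t_i + 2*N*s_m/b from Table 2.
func serviceTime(p Params) float64 {
	return 2*p.To + float64(p.N)*p.Ti + 2*float64(p.N)*p.Sm/p.B
}

// quorumRTT estimates D_Q via Monte Carlo k-order statistics: it repeatedly
// samples N-1 follower RTTs from a Normal distribution and averages the
// (Q-1)th smallest sample.
func quorumRTT(p Params, trials int, rng *rand.Rand) float64 {
	q := p.N/2 + 1 // majority quorum size
	sum := 0.0
	for t := 0; t < trials; t++ {
		rtts := make([]float64, p.N-1)
		for i := range rtts {
			rtts[i] = rng.NormFloat64()*p.StdRTT + p.MeanRTT
		}
		sort.Float64s(rtts)
		sum += rtts[q-2] // (Q-1)th follower reply, zero-indexed
	}
	return sum / float64(trials)
}

// roundLatency returns Latency = w_Q + t_s + D_L + D_Q, using the M/D/1
// waiting time w_Q = rho/(2*mu*(1-rho)); it is only meaningful while the
// utilization rho stays below 1.
func roundLatency(p Params, rng *rand.Rand) float64 {
	ts := serviceTime(p)
	mu := 1 / ts
	rho := p.Lambda / mu
	wq := rho / (2 * mu * (1 - rho))
	return wq + ts + p.DL + quorumRTT(p, 10000, rng)
}
```

For a 9-node LAN cluster, Q = 5, so quorumRTT averages the 4th smallest of 8 sampled follower RTTs per trial.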


To select the most appropriate queue approximation, we used all four queueing models described earlier to estimate the performance of the Paxos protocol in a LAN. We used empirically obtained values for all relevant parameters, such as network RTTs, serialization/deserialization costs, etc. Figure 4 shows the comparison between the models and a real Paxos implementation in Paxi. Based on our findings, the M/D/1 and M/G/1 models perform nearly identically and are the most similar to our reference Paxos implementation. Since the M/D/1 model is simpler, we use it for all our further analysis.

Figure 4: Comparison of different queueing models to a reference implementation in the Paxi framework
Figure 5: Paxi module usage, where the user implements the Messages and Replica types (shaded)

3.4 Expanding Models Beyond Paxos and LANs
The other protocols, albeit being more complicated, share the same modeling components with the Paxos model described above. Thus, we rely on the same queuing model and round latency computation principles for them as well.
Multi-leader protocols add a few additional parameters to consider. In such protocols a node can both lead a round and participate in concurrent rounds as a follower, therefore we account for the messages processed as a follower in addition to the messages processed as the round's leader to estimate queue waiting time. The total number of requests coming to the system is spread out evenly (we assume a uniform workload at each leader for simplicity) across all leaders. This allows us to compute per-leader latency based on each leader's processing queue.
The performance of leaderless protocols, such as EPaxos, varies with respect to the command conflict ratio. A conflict typically arises when two or more replicas try to run a command against the same conflict domain (e.g., the same key) concurrently. The conflict must be resolved by the protocol, which leads to extra message exchanges. Therefore, we introduce a conflict probability parameter c for EPaxos to compute different latencies for conflicting and non-conflicting commands. The overall average latency of a round accounts for the ratio of conflicting to non-conflicting commands as follows: Latency = c · Latency_conflict + (1 − c) · Latency_nonconflict.
WAN modeling also requires changes over the LAN model. In particular, communication latency between nodes in a WAN cannot be drawn from the same distribution, since the datacenters are not uniformly distanced from each other. For example, in a 3-node Paxos configuration with replicas in the Eastern U.S., Ireland and Japan, the communication latency between the US and Ireland is significantly smaller than that of Ireland and Japan or the US and Japan. For that reason, our WAN modeling no longer assumes the same Normal distribution of latency for all nodes, and instead we use a different distribution for communication between every pair of datacenters.
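The two extensions above reduce to small arithmetic changes on the base model. The following minimal Go sketch shows how they can be folded in; the function names and the validation are our own illustration and do not reflect the exact structure of the released model.

```go
package model

import "math"

// perLeaderArrival splits a system-wide arrival rate evenly across L leaders,
// matching the uniform-workload assumption for multi-leader protocols.
func perLeaderArrival(lambdaTotal float64, L int) float64 {
	return lambdaTotal / float64(L)
}

// mixedConflictLatency combines the latencies of conflicting and
// non-conflicting rounds for a leaderless protocol such as EPaxos:
// Latency = c*Latency_conflict + (1-c)*Latency_nonconflict.
func mixedConflictLatency(c, conflictLat, noConflictLat float64) float64 {
	if c < 0 || c > 1 {
		return math.NaN() // the conflict probability must lie in [0, 1]
	}
	return c*conflictLat + (1-c)*noConflictLat
}
```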
4 PAXI FRAMEWORK
In order to provide empirical comparisons of different protocols in the same platform under the same conditions, we developed a prototyping framework for coordination/replication protocols called Paxi. Leveraging our observation that strongly consistent replication protocols share common parts, Paxi provides implementations for these shared components. Each component resides in its own loosely coupled module and exposes a well-designed API for interaction across modules. Modules can be extended or replaced easily, provided that a new version of a module follows the interface. We show the architectural overview of the Paxi framework in Figure 5. The developer can easily prototype a distributed coordination/replication protocol by filling in the missing components shown as the shaded blocks. Often the engineers only need to specify message structures and write the replica code to handle client requests.

4.1 Components
Configurations. Paxi is a highly configurable framework both for the nodes and the benchmarks. A configuration in Paxi provides various vital information about the protocol under examination and its environment: the list of peers with their reachable addresses, quorum configurations, buffer sizes, the replication factor, message serialization and networking parameters, and other configurable settings. The framework allows managing the configuration in two distinct ways: via a JSON file distributed to every node, or via a central master that distributes the configuration parameters to all nodes.

Table 3: Benchmark parameters
T (default 60): run for T seconds
N (default 0): run for N operations (if N > 0)
K (default 1000): total number of keys
W (default 0.5): write ratio
Concurrency (default 1): number of concurrent clients
LinearizabilityCheck (default true): check linearizability at the end of the benchmark
Distribution (default "uniform"): name of the distribution used for key generation; includes uniform, normal and zipfian
Min (default 0): random: minimum key number
Conflicts (default 100): random: percentage of conflicting keys
Mu (default 0): normal: mean
Sigma (default 60): normal: standard deviation
Move (default false): normal: moving average (mu)
Speed (default 500): normal: moving speed in milliseconds
Zipfian_s (default 2): Zipfian: s parameter
Zipfian_v (default 1): Zipfian: v parameter

Quorum systems. A quorum system is a key abstraction for ensuring consistency in fault-tolerant distributed computing. A wide variety of distributed coordination algorithms rely on quorum systems. Typical examples include consensus algorithms, atomic storage, mutual exclusion, and replicated databases. Paxi implements and provides multiple types of quorum systems, like simple majority, fast quorum, grid quorum, flexible grid and group quorums. The quorum system module only needs two simple interfaces, ack() and quorum satisfied(). By offering different types of quorum systems out of the box, Paxi enables users to easily probe the design space without changing their code.
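To make the ack()/quorum-satisfied pairing concrete, here is a minimal Go sketch of a majority quorum in that style. The interface and type names (QuorumSystem, MajorityQuorum, NodeID) are our own illustration and are not taken from the Paxi codebase, whose actual API may differ.

```go
package quorum

// NodeID identifies a replica; a stand-in for whatever ID type a framework uses.
type NodeID int

// QuorumSystem is a hypothetical interface mirroring the two calls described
// in the text: record an acknowledgment and test whether a quorum is reached.
type QuorumSystem interface {
	ACK(id NodeID)
	Satisfied() bool
}

// MajorityQuorum is satisfied once more than half of the N nodes have acked.
type MajorityQuorum struct {
	n    int
	acks map[NodeID]struct{}
}

func NewMajorityQuorum(n int) *MajorityQuorum {
	return &MajorityQuorum{n: n, acks: make(map[NodeID]struct{})}
}

func (q *MajorityQuorum) ACK(id NodeID) { q.acks[id] = struct{}{} }

func (q *MajorityQuorum) Satisfied() bool { return len(q.acks) >= q.n/2+1 }
```

Swapping in a flexible grid or fast quorum then only changes the Satisfied predicate, which is what lets protocol code stay oblivious to the quorum shape.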
Networking. When designing the Paxi framework, we refrained from any blocking primitives such as remote procedure calls (RPC), and implemented a simple message passing model that allows us to express any algorithmic logic as a set of event handlers [36, 38]. The networking module transparently handles message encoding, decoding, and transmission with a simple Send(), Recv(), Broadcast() and Multicast() interface. The transport layer instantiates TCP, UDP, or a Go channel for communication without any changes to the caller from an upper layer. Paxi supports both TCP and UDP to eliminate any bias for algorithms that benefit from different transport protocols. For example, the single-leader approach may benefit from TCP, as messages are reliably ordered from leader to followers, whereas conflict-free updates in small messages gain nothing from ordered delivery and pay the latency penalty in congestion control; such a system might perform better on UDP. Paxi also implements an intra-process communication transport layer with Go channels for cluster simulation, where all nodes run concurrently within a single process. The simulation mode simplifies the debugging of Paxi protocols since it avoids a cluster deployment step.
Data store. Many protocols separate the protocol-level outputs (e.g., operation commits) from the application state outputs (e.g., operation execution results) for versatility and performance reasons. Therefore, evaluating either one alone is insufficient. Paxi can be used to measure both protocol and application state performance. To this end, our framework comes with an in-memory multi-version key-value datastore that is private to every node. The datastore is used as the deterministic state machine abstraction commonly used by coordination protocols. Any other data model can be used for this purpose as long as a Paxi node can query the current state, submit state transform operations, and generate a directed acyclic graph (DAG) of past states.
RESTful client. The Paxi client library uses a RESTful API to interact with any system node for read and write requests. This allows users to run any benchmark (e.g., YCSB [11]) or testing tool (e.g., Jepsen [22]) against their implementation in Paxi without porting the client library to other programming languages.

4.2 Paxi Benchmark Components
Benchmarker. The benchmarker component of Paxi serves as the core of our benchmarking capabilities. The benchmarker both generates tunable workloads with rich features, including access locality and dynamicity, and collects performance data for these workloads. Similar to YCSB [11], Paxi provides a simple read/write interface to interact with the client library. The workload generator reads the configuration to load the workload definition parameters, as summarized in Table 3.
The benchmark component can generate a variety of workloads by tuning the read-to-write ratios, creating hot objects, conflicting objects and locality of access. The locality characteristic in a workload is especially important for WAN distributed protocols, as each region has a set of keys it is more likely to access. In order to simulate workloads with tunable access locality patterns across regions, Paxi uses a normal distribution to control the probability of generating a request on each key, and denotes a pool of K common keys with the probability function of each region, as shown in Figure 6. In other words, Paxi introduces locality to the evaluation by drawing the conflicting keys from a Normal distribution N(µ, σ²), where µ can be varied for different regions to control the locality, and σ is shared between regions. The locality can be visualized as the non-overlapping area under the probability density functions in Figure 6.
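The locality knob described above can be reproduced in a few lines: each region draws keys from its own normal distribution over the shared pool of K keys, using the Mu/Sigma parameters of Table 3. The sketch below is a simplified illustration under those assumptions, not the actual Paxi generator.

```go
package workload

import (
	"math"
	"math/rand"
)

// RegionKeyGen draws keys for one region from a normal distribution over a
// shared pool of K keys. Mu shifts the region's "hot" area of the key space;
// Sigma (shared across regions) controls how much the regions overlap, which
// is what controls workload locality.
type RegionKeyGen struct {
	K     int
	Mu    float64
	Sigma float64
	Rand  *rand.Rand
}

// NextKey samples a key in [0, K). Samples falling outside the pool are
// clamped so every draw maps to a valid key.
func (g *RegionKeyGen) NextKey() int {
	k := int(math.Round(g.Rand.NormFloat64()*g.Sigma + g.Mu))
	if k < 0 {
		k = 0
	}
	if k >= g.K {
		k = g.K - 1
	}
	return k
}
```

Two regions with means far apart relative to the shared Sigma mostly touch disjoint keys; moving the means closer (or growing Sigma) increases the overlap and hence the cross-region conflict rate.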

Figure 6: Probability distributions (uniform, exponential, Zipfian, and normal) with total number of data records equal to K

The Paxi benchmarker is capable of testing and evaluating four different aspects of coordination/replication protocol behavior: performance, scalability, availability, and consistency.
Performance. Paxi measures performance via the latency and throughput metrics. The latency of every individual request is stored and output to a file for future analysis. Since it is important to show how a protocol performs in terms of tail latency under stress, Paxi supports this by increasing the benchmark throughput (via increasing the concurrency level of the workload generator) until the system is saturated and throughput stops increasing or latency starts to climb. The user may conduct this benchmark tier with different workloads to understand how a protocol performs, or use the same workload with increasing throughput to find a bottleneck of the protocol.
Scalability. One of the most desirable properties for cloud-native protocols is elastic scalability: when the application grows/shrinks, the underlying protocol should be able to adapt to the load by expanding/reducing the number of servers. In Paxi, we support benchmarking scalability by adding more nodes into the system configuration and by increasing the size of the dataset (K).
Availability. High availability is an indispensable requirement for distributed protocols, as they are expected to maintain progress under a reasonable number of failures. While testing for availability seems straightforward, it requires laborious manual work to simulate all combinations of failures. Many failure types are hard to produce in uncontrolled runtime environments without utilizing third-party tools specific to their operating systems. Typical examples include asymmetric network partitions, out-of-order messages and random message drops/delays, to name only a few. Several projects have automated fault injection procedures, but with limitations. For instance, Jepsen [22] issues "tc" (traffic control) commands to manipulate network packets on every node, but can only run on Linux systems. ChaosMonkey [32] is a resiliency tool that only randomly terminates virtual machine instances and containers. Paxi, being a prototyping framework, makes it easy to simulate any node or network failure. We provide four special commands in the Paxi client library and realize those in the networking modules (a sketch of how a test might drive them follows at the end of this subsection):
• Crash(t) will "freeze" the node for t seconds.
• Drop(i, j, t) drops every message sent from node i to node j.
• Slow(i, j, t) delays messages for a random period.
• Flaky(i, j, t) drops messages by chance.
Consistency. For the consistency checker component of Paxi, we implement the simple offline read/write linearizability checker from the Facebook TAO system [28]. Our linearizability checker takes a list of all the operations per record sorted by invocation time as an input. The output of the checker is a list of all anomalous reads, i.e., read operations that returned results they would not be able to return in a linearizable system. Our checker maintains a graph whose vertices are read or write operations, and edges are constraints. It checks for cycles as it adds operations to the graph. If the graph is acyclic, then there exists at least one total order over the operations with all constraints satisfied. Otherwise a linearizability violation is reported.
Unlike the linearizability checker, our consensus checker validates whether consensus for every state transition has been reached among the nodes in a replicated state machine. External, client-observed linearizability can be reached without having consensus among the state machines; however, satisfying consensus is vital for consensus algorithms, such as Paxos. To test for consensus, Paxi includes a multi-version datastore as the state machine. We implement a special command in the client library to collect the entire history H_r of some data record from every system node, then verify that all histories H_r^i from node i share a common prefix.
As we show in Section 5.1, the REST-based interaction also allows us to directly run the Paxi benchmark against etcd without any changes. Without considering reconfiguration and recovery differences, Paxos and Raft are essentially the same protocol with a single stable leader driving the command replication.
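As referenced above, the four failure commands can be driven from a test harness. The following Go sketch shows one hypothetical shape for such a client-side interface; the method names mirror the commands listed above, but the signatures and the FaultInjector type are our own illustration rather than the actual Paxi client API.

```go
package chaos

import "time"

// FaultInjector is a hypothetical client-side handle for the four failure
// commands described in the text. The real Paxi client library may expose
// these differently.
type FaultInjector interface {
	Crash(node int, d time.Duration)    // freeze a node for d
	Drop(from, to int, d time.Duration) // drop every message from -> to
	Slow(from, to int, d time.Duration) // delay messages by a random period
	Flaky(from, to int, d time.Duration) // drop messages with some probability
}

// partitionNode sketches a test step: isolate node 1 from nodes 2 and 3 for
// ten seconds so the benchmark can observe availability during the fault.
func partitionNode(f FaultInjector) {
	const fault = 10 * time.Second
	f.Drop(1, 2, fault)
	f.Drop(1, 3, fault)
	f.Drop(2, 1, fault)
	f.Drop(3, 1, fault)
}
```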
As a result, they should exhibit similar performance in the normal case. We ran each system with 9 replicas within the same availability zone. For a fair comparison with Paxi, we dis-

Average Latency (ms) LatencyAverage 1 0 abled persistent logging and snapshots and increased the 0 2000 4000 6000 8000 10000 maximum number of inflight messages to 10,000 in etcd. The Throughput (ops/s) client request is replied only after the request is committed in Raft. In Figure 7, we show that both systems converge to Figure 7: Single leader consensus protocol imple- similar maximum throughput around 8000 operations per mented in Paxi versus etcd second due to the single leader bottleneck, but Paxi exhibits lower latency when the system is not saturated. The latency difference is likely due to etcdâĂŹs use of http for inter-node 5 EVALUATION OF THE PROTOCOLS communication instead of TCP sockets and differences in We perform protocol evaluations using both simulated ex- message serialization. periments with our model and deployed experiments with Paxi framework in LANs and WANs. In Paxi, we carry out 5.2 Protocol Comparisons in LANs all experiments on AWS EC2 [5] m5.large instances with 2 We perform a set of experiments studying the performance vCPUs and 8 GB of RAM. For WAN evaluations, we use AWS of the consensus protocols in LANs. Figure 8 shows the re- datacenter topology with the N.Virginia (VA), Ohio (OH), sults obtained from our model for LANs. We present the California (CA), Ireland (IR) and Japan (JP) regions. Our Paxi evaluation in Figure 9 using uniformly random work- modeling effort is calibrated to CPU speed of the m5.large load against Paxi’s internal key-value store on 1000 objects instances and uses the communication delays corresponding with 50% read operations. Both figures show how latency of to latencies within an AWS region and across regions. protocols change as the throughput increases. In WPaxos, we limit the number of nodes that can act as Single leader bottleneck. Both the model and Paxi eval- leaders to just one per region; in a 9-node 3-region cluster uations point to the scalability limitation of the single leader WPaxos will have only 3 leaders. This gives WPaxos similar solutions. Several papers [20, 27, 29, 39] observed—but failed and comparable deployment to WanKeeper that is restricted to analyze further— that the single leader becomes a bottle- by its design to have just a single leader in each region. In neck in Paxos protocol, having to do with sending/receiving EPaxos, every node can become an opportunistic leader. Ad- N messages and the CPU utilization at the leader. The bot- ditionally, EPaxos model penalizes the message processing tleneck is due to the leader getting overwhelmed with the to account for extra resources required to compute depen- messages it needs to process for each round. For example, dencies and resolve conflicts. For all protocols we assume from our modeling effort, we estimate a Paxos leader to han- full-replication scheme in which a leader node replicates the dle N incoming messages, one outgoing message and one commands to all other nodes. outgoing broadcast2, for a total of N + 2 messages per round. At the same time, the follower nodes only process 2 mes- 5.1 Paxi Performance sages per round, assuming phase-3 is piggybacked to the The Paxi framework serves the primary goal of comparing next phase-2 message. For a cluster of 9 nodes, this trans- many different consensus and replication protocols against lates in 11 messages per round at the leader against just 2 each other under the same framework with the same imple- messages at the replicas, making the leader the bottleneck. mentation conditions. 
However, to show that our Paxi imple- Multi-leader protocols, such as WPaxos and WanKeeper mentation of protocols are representative of Paxos variant perform better than single leader protocols. Their advantage implementations used in other real-world production grade comes from the ability to process commands for indepen- systems, we compared the Paxos protocol implemented in dent objects in parallel at different nodes. While multi leader Paxi against Raft [33] implemented in etcd [15]. The etcd 1 project provides a standalone sample code in Go that uses 2For the broadcast, the CPU is involved for serialization just once, and then Raft and exposes a simple REST API for a key-value store, the message is send to the other nodes individually by the NIC (amounting to N-1 transmission). Since NIC is much faster than the CPU processing, 1https://github.com/etcd-io/etcd/tree/master/contrib/raftexample the NIC cost becomes negligible for small messages. MultiPaxos (CA Leader) EPaxos (Conflict= [0.02, 0.70]) MultiPaxos EPaxos FPaxos (CA Leader) WPaxos (Locality=0.7) FPaxos 9 Nodes (|q2|=3) WPaxos EPaxos (Conflict=0.3) 6 200

(a) Max throughput

Figure 10: Modeled performance in WANs

The figures also show that the 3-leader WanKeeper performs better than the 3-leader WPaxos. By taking a hierarchical approach, WanKeeper reduces the number of messages each leader processes, alleviating the leader bottleneck further.
Leaderless solutions suffer from conflict. EPaxos is

0.00 an example of a leaderless consensus, where each node can 0 1000 2000 3000 4000 5000 6000 7000 8000 Throughput (rounds/sec) act as an opportunistic leader for commands, as long as (b) Latency at lower throughput there are no collisions between the commands from different nodes. Command conflicts present a big problem for such Figure 8: Modeled performance in LANs opportunistic approaches. The ability to identify and resolve conflicts increases the message size and processing capacity needed for messages. The conflict resolution mechanism 8 Paxos also requires an additional phase, making conflicting rounds 7 FPaxos default to a two-phase Paxos implementation. These result WPaxos in EPaxos performing the worst in Paxi LAN experiments. It 6 EPaxos is worth mentioning that EPaxos shows better throughput 5 WanKeeper (but not latency) than Paxos in our model even with 100% 4 conflict, since it does not have a single-leader bottleneck problem. However, when we add message processing penalty 3 Latency (ms) Latency to account for extra weight of finding and resolving conflict, 2 EPaxos’ performance degrades greatly. 1 Small flexible quorums benefit. Our model results show a modest average latency improvement of just 0.03 ms due 0 0 5000 10000 15000 to using single-leader flexible quorums solution (FPaxos). Throughput (op/s) Our Paxi evaluation shows a slightly bigger improvement for going from Paxos to FPaxos. Figure 9: Experimental performance in LAN 5.3 Protocol Comparison in WANs In this set of experiments, we compare protocols deployed solutions can take advantage of the capacity left unused across the WANs. WAN deployments present many new chal- at the single-leader protocol replicas, they still suffer from lenges for consensus and replication protocols originating the relatively high number of messages to process, so they from large and non-uniform latencies between datacenters don’t scale linearly: the 3-leader WPaxos does not perform 3 and nodes: the inter-datacenter latency can vary from a few times better than Paxos. Our model showed a roughly 55% to a few hundred milliseconds. improvement in maximum throughput in WPaxos, which is In Figure 10 we show the modeled throughput and latency consistent with our experimental observations in Paxi. results for different consensus algorithms. Unlike LAN models, wide area network models differ WPaxos (fz=0) WPaxos(fz=1) WanKeeper greatly in the average latency, with more than a 100 ms differ- EPaxos VPaxos Paxos ence in latency between the slowest (Paxos) and the fastest 30 (WPaxos) protocols. Flexible quorums make a great differ- 25 ence in latency in this environment – they allow WPaxos to 20 commit many commands with near local latency, and reduce 15 the overall quorum wait time for FPaxos. 10 To combat the adversarial effects of WAN, many protocols Latency (ms) 5 try to optimize for various common aspects of operation in 0 0% 20% 40% 60% 80% 100% such environment: some assume the objects over which a consensus is needed get rarely accessed from many places (a) Virginia at once, while others may go even further and assume a 30 strict placement of the objects in certain regions for better 25 locality of access. When these assumptions break, algorithms’ 20 performance often degrades. We designed our experiments 15 to test these conflict and locality assumptions. 10 Conflict Experiments. We define conflict as commands Latency (ms) 5 accessing the same object from different region. 
To study 0 the protocol performance under the conflict workload, we 0% 20% 40% 60% 80% 100% create one “hot” conflicting key that will be accessed byall (b) Ohio clients. We control the ratio of conflicting requests and make every conflicting request perform an operation against the 100 designated conflict objects. For instance, 20% conflict mean 80 that 20% of requests issues by the benchmarker clients target 60 the same object. 40 The results of our conflict experiments, shown in Figure 11, reveal the following observations: Latency (ms) 20 (1) The protocols that does not tolerate entire region fail- 0 0% 20% 40% 60% 80% 100% ure (WPaxos fz = 0, WanKeeper, VPaxos) exhibit the same performance in every location. This is because when fz = 0, (c) California the non-interfering commands are able to commit by a quo- rum in the same region. The interfering command is for- Figure 11: Comparison of protocols under a conflict warded to the object’s current leader region. workload (2) When the protocol embraces a leader/owner concept, the leader’s region of the conflicting object is at an advantage 25000 and experience optimal latency for its local quorum. In this EPaxos Throughput case, the Ohio region is the leader of conflicting object, and Paxos Throughput 20000 thus have a low steady latency. On the contrary, EPaxos, a leaderless approach, experiences latency due to interfering 15000 commands even in the Ohio region. (3) Among the protocols which can tolerate entire region 10000 failure, including Paxos, EPaxos, and WPaxos fz = 1, WPaxos performs best until 100% interfering commands where it Throughput (rounds/sec) 5000 starts to provide the same latency as Paxos. 0 (4) Unlike other protocols, EPaxos average latency is a non- 0 20 40 60 80 100 Conflict % linear function of conflicting ratio. This is because even with low conflict rate like 20%, the previous conflicting command Figure 12: Model of EPaxos maximum throughput in may not have been committed yet when the new requests 5-nodes/regions deployment arrive, leading to more rounds of RTT delays to resolve the conflict. The situation gets worse when the region is farfrom other regions, such as California region in our experiment. Our model also suggests the maximum throughput of EPaxos is severely affected by the conflict ratio, with asmuch 300 Name Protocols Wpaxos fz=0 WanKeeper VPaxos WPaxos fz=2 Paxos Epaxos 250 L Leaders EPaxos. WPaxos 200 c Conflicts Generalized Paxos, EPaxos 150 Q Quorum FPaxos, WPaxos

l Locality VPaxos, WPaxos, WanKeeper

Table 4: Parameters explored

(a) Average latency per region almost identical CDFs in Figure 6.4. When we look at all re- LoFDlLWy WorkloDd quests globally, WanKeeper experience more WAN latencies 1.0 than WPaxos and VPaxos. 0.8 6 DISCUSSION 0.6 WDn.eeper In this section, we examine the evaluation results from Sec- W3DxoV CD) 0.4 V3DxoV tion 5 and distill those performance results into simple through- 3DxoV put and latency formulas for uniformly distributed work- 0.2 (3DxoV loads. These formulas present a simple unified theory of W3DxoV(fz 2) strongly-consistent replication throughput in Section 6.1, 0.0 0 50 100 150 200 250 300 and latency in Section 6.2. Finally, in Section 6.3, we demon- LDWenFy (PV) strate how these formulas allow us to perform back-of-the- envelope performance forecasting of the protocols. (b) Latency distribution 6.1 Load and Capacity Figure 13: WPaxos, WanKeeper and Vertical Paxos in The capacity of the system Cap(S) is the highest request pro- WAN with locality workload cessing rate that system S can handle. As we have observed from Paxi experiments, the capacity of a given protocol is determined by the busiest node in the system with load L [? ], such that as 40% degradation in capacity between no conflict and full Cap(S) = 1/L(S). (1) conflict scenarios, as illustrated in Figure 12. Definition 6.1. Load of the system L(S) is the minimum Locality Experiments. For these experiments, we study number of operations invoked on the busiest node for every the performance of protocols that take advantage of locality request on average, where an operation is the work required (WPaxos, WanKeeper, and our augmented version of Verti- to handle round-trip communication between any two nodes. cal Paxos) across a WAN with a locality workload. We use a For example, for every quorum access, the leader must han- locality workload with object access locality controlled by a dle Q number of outgoing and incoming messages, which normal distribution, as described earlier. We start the experi- correspond to a total of Q operations. Whereas in single- ment by initially placing all objects in the Ohio region and leader protocols, the busiest node is typically the leader, then running the locality workload for 60 seconds. All three multi-leader algorithms have more than one busiest node. In protocols are deployed with fault-tolerance level = 0 and fz such protocols, nodes that have the leader capabilities tend the simple three-consecutive access policy to adapt to opti- to be under greater load than others. mal performance. Figure 13a compares per-region latency, while Figure 13b shows a CDF for operation latencies. In WanKeeper, the Ohio region is the higher level region 1 1 L(S) = (1 + c)(Q − 1) + (1 − )(1 + c) (2) that will keep the tokens for any objects shares access from L L another region. Thus, Ohio shows the best average latency (1 + c)(Q + L − 2) = (3) close to local area RTT, at the cost of suffering at the other L two regions. WPaxos and VPaxos are more balanced and where 0 ≤ c ≤ 1 is probability of conflicting operations, Q share very similar performance in this deployment. This is is the quorum size chosen by leader, and L is the number of because when stabilized, both protocols balance the num- operation leaders. The derivation is as follows. There is a 1/L ber objects in each region in the same way, as confirmed by chance the node is the leader of a request, which induces one Consensus protocols implement SMR for critical coordination Do you need distributed No tasks. 
Figure 14 (caption below) is a decision flowchart; its questions and the corresponding recommendations are:
• Do you need distributed consensus? If not, consensus is not required to provide read/write linearizability to clients; consider atomic storage, chain replication, or eventually consistent replication.
• Are you deploying in a WAN? If not, a deployment with a small number of nodes in a LAN preserves decent performance even with single-leader protocols, while benefiting from a simple implementation; consider Multi-Paxos, Raft, Zab.
• Are there more read operations than write operations? More frequent read operations mean fewer interfering commands, which benefits a leaderless approach; consider Generalized Paxos, EPaxos.
• Is datacenter failure a concern? If not, the group of replicas can be deployed in one region and managed by a master or hierarchical architecture; consider Vertical Paxos, WanKeeper.
• Is there locality in the workload, and is it dynamic? Static locality means a sharding technique works in the best-case scenario; consider Paxos groups. A multi-leader protocol with the ability to dynamically adapt to locality and tolerate datacenter failures is the best fit for dynamic locality; consider WPaxos, or Vertical Paxos with a cross-region Paxos group deployment.
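Complementing the flowchart, the back-of-the-envelope formulas of Section 6 (load and capacity from Formulas 1-3, latency from Formula 7) are easy to evaluate programmatically. The Go sketch below encodes them directly; the function and parameter names are ours, and the per-protocol instantiations in the comment follow the N = 9 examples given in the text.

```go
package forecast

// Load computes Formula 3: the per-request number of operations at the
// busiest node, given conflict probability c, quorum size Q, and the number
// of operation leaders L.
func Load(c float64, Q, L int) float64 {
	return (1 + c) * float64(Q+L-2) / float64(L)
}

// Capacity is Formula 1: the highest request rate the system can handle is
// the reciprocal of the load on its busiest node (in operation units).
func Capacity(c float64, Q, L int) float64 {
	return 1 / Load(c, Q, L)
}

// Latency computes Formula 7: requests local to their leader (probability l)
// only pay the quorum RTT dQ, while non-local requests also pay the round
// trip to the leader dL; conflicts (probability c) add an extra round.
func Latency(c, l, dL, dQ float64) float64 {
	return (1 + c) * ((1-l)*(dL+dQ) + l*dQ)
}

// Example instantiations for N = 9 nodes, mirroring Equations 4-6 in the text:
//   Paxos:  c = 0, L = 1, Q = N/2 + 1 = 5      -> Load = 4
//   EPaxos: L = N = 9, Q = 5                   -> Load = (1+c) * 4/3
//   WPaxos: L = 3 leaders, Q = N/L = 3, c = 0  -> Load = 4/3
```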

Figure 14: Flowchart for identifying the suitable consensus protocol round of quorum access with Q−1 communication, plus extra 6.2 Latency round of quorum access if there is conflict. The probability The expected latency of a protocol in WAN is determined by of the node being a follower is 1 − 1/L, where it only handles the location, minimum quorum latency DQ , and the locality one received message in the best case. From the simplified l of the requests. form, it is easy to see that the protocols that utilize more leaders reduce the load (and hence increase capacity) because the user requests are shared between multiple leaders. On Latency(S) = (1 + c) ∗ ((1 − l) ∗ (DL + DQ ) + l ∗ DQ ) (7) the other hand, this also increases the chance of extra round to resolve any conflicts between leaders. Equation 3 uses where DL is the distance from where the request is gener- thrifty optimization where leader only communicates with ated and the operation leader. When a request is local with minimum number of nodes to reach the quorum of size Q. probability l, it only requires time of the quorum access with In the general case, however, the leader communicates with closest neighbors DQ . For non-local requests occurred with all N − 1 followers, making Q = N − 1 for the purpose of probability of 1 − l, a round trip of distance to leader DL this equation. also contributes to the average latency. For EPaxos l = 1 but In the below equations, we present the simplified form of c is workload specific, in contrast for the other consensus load for three protocols, and calculate the result for N = 9 protocols we study, we have c = 0 and l is workload specific. nodes. The protocols perform better as the load decreases. 6.3 Comparing the Protocols N L(Paxos) = ⌊ ⌋ = 4 (4) By using the two formulas 3 and 7 we present in the previous 2 subsections, we discuss how these protocols compare with N 4 L(EPaxos) = (1 + c)(⌊ ⌋ + N − 1)/N = (1 + c) (5) each other and how we can provide back-of-the-envelope 2 3 performance forecasting of these protocols. N 4 L(WPaxos) = ( + L − 2)/L = (6) EPaxos and Generalized Paxos try to make single leader L 3 Paxos more scalable by extending the leadership to every In the single leader Paxos protocol (Equation 4) with N replica therefore sharing the load; this increases L to its max- nodes, c = 0 as the conflicting operation is serialized by the imum value of all nodes in the system and reduces L(S). single leader, and L = 1, and quorum size Q = ⌊N /2⌋ + But it comes with a complication: any conflicting commands 1. In contrast, EPaxos (Equation 5) uses every node as an from two replicas require extra round of quorum acknowl- opportunistic leader, and uses L = N . WPaxos (Equation edgement for order resolution before the requests can be 6) utilizes a flexible grid quorum system, such that every executed and replied. This extra round puts more load to leader only accesses its own phase-2 quorum with size N /L. the system, reducing throughput and increasing latency. Our In a 3 × 3 grid, the load of WPaxos is only 4/3, giving it the evaluations show that the problem becomes even worse in smallest load (and as a result the highest capacity) among WANs: since requests take much longer time to finish in the three consensus protocols. WANs, that also contributes to an increase in the probability of contention. In the worst case, even with 25% of conflicting [5] Amazon Inc. 2008. Elastic Compute Cloud. requests, system can experience c = 100% load. 
Flexible Paxos and WPaxos benefit from flexible quorums, such that both Q and DQ are reduced for phase-2 (the sketch at the end of this subsection illustrates the resulting quorum sizes).

The three WAN protocols we evaluated in this work exploit the locality l in the workload to optimize latency for the wide area. When the locality is static and an optimal policy is used for placing the data close to most requests, these protocols will experience the same WAN latency.

In Table 4, we show the parameters each protocol aims to exploit. Given that each protocol emphasizes a couple of these parameters and trades them off against the others, there is no single protocol that fits all needs and expectations. Our results and formulas are also useful for deciding which category of Paxos protocols would be optimal under given deployment conditions. In Figure 14, we give a flowchart that serves as a guideline for identifying which consensus protocol is suitable for a given deployment environment.
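As a companion to the flexible-quorum discussion above, the following Python sketch contrasts the phase-2 quorum sizes of majority-quorum Paxos, Flexible Paxos, and a WPaxos-style 3 × 3 grid for N = 9. It is a simplified illustration using the cardinality-based intersection condition |Q1| + |Q2| > N; the function names and the choice of a phase-1 quorum of 7 for Flexible Paxos are our own assumptions for the example.

from math import floor

def majority_phase2(n):
    # Classic Paxos: both phases use majority quorums of size floor(N/2) + 1.
    return floor(n / 2) + 1

def fpaxos_phase2(n, q1):
    # Flexible Paxos only requires |Q1| + |Q2| > N, so a larger phase-1 quorum
    # buys the smallest phase-2 quorum that still intersects every Q1.
    q2 = n - q1 + 1
    assert q1 + q2 > n
    return q2

def wpaxos_grid_phase2(n, num_leaders):
    # WPaxos-style grid with L columns: a leader's phase-2 quorum has N/L nodes,
    # so it can commit by contacting a small set of (typically nearby) replicas.
    return n // num_leaders

if __name__ == "__main__":
    N = 9
    print(majority_phase2(N))        # 5
    print(fpaxos_phase2(N, q1=7))    # 3 (assumes a phase-1 quorum of 7)
    print(wpaxos_grid_phase2(N, 3))  # 3, matching the 3x3 grid discussed above

Shrinking the phase-2 quorum is what reduces both Q in the load formula and DQ in the latency formula, since the leader commits by contacting fewer, closer replicas.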
7 CONCLUDING REMARKS
We presented a two-pronged approach to analyzing strongly-consistent replication protocols. We distilled their throughput and latency performance into simple formulas that generalize over Paxos protocols, uniting them as well as emphasizing the different design decisions and tradeoffs they take.

We anticipate that the simple exposition and analysis we provide will lead the way to the development of new protocols, especially WAN coordination protocols. The unbalanced topology with respect to obtaining a quorum causes complications in WAN deployments, and achieving good locality as well as load balancing remains an open problem for efficient strongly-consistent WAN replication. In addition, as part of future work, we aim to extend our analytical model to cover replication protocols with relaxed consistency guarantees, such as bounded consistency and session consistency.

ACKNOWLEDGMENTS
This project is in part sponsored by the National Science Foundation (NSF) under award numbers CNS-1527629 and XPS-1533870.

REFERENCES
[1] Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, and Tevfik Kosar. 2017. Multileader WAN Paxos: Ruling the Archipelago with Fast Consensus. arXiv preprint arXiv:1703.08905 (2017).
[2] Ailidani Ailijiang, Aleksey Charapko, Murat Demirbas, Bekir Oguz Turkkan, and Tevfik Kosar. 2017. Efficient Distributed Coordination at WAN-scale. In Distributed Computing Systems (ICDCS), 2017 37th International Conference on. IEEE.
[3] Arnold O. Allen. 2014. Probability, Statistics, and Queueing Theory. Academic Press.
[4] Deniz Altinbuken and Emin Gun Sirer. 2012. Commodifying Replicated State Machines with OpenReplica. Technical Report.
[5] Amazon Inc. 2008. Elastic Compute Cloud.
[6] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. 2012. Probabilistically bounded staleness for practical partial quorums. Proceedings of the VLDB Endowment 5, 8 (2012), 776–787.
[7] William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters, and Peng Li. 2011. Paxos replicated state machines as the basis of a high-performance data store. In 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI '11).
[8] Jim Campigli and Yeturu Aahlad. 2018. The Distributed Coordination Engine (DConE). WANDisco, Inc (2018).
[9] Aleksey Charapko, Ailidani Ailijiang, and Murat Demirbas. 2018. FleetDB: Follow-the-workload Data Migration for Globe-Spanning Databases. Technical Report. https://cse.buffalo.edu/tech-reports/2018-02.pdf
[10] Clustrix. 2017. A New Approach to Scale-out RDBMS: Massive Transactional Scale for an Internet-Connected World. https://www.clustrix.com/wp-content/uploads/2017/01/Whitepaper-ANewApproachtoScaleOutRDBMS.pdf (2017).
[11] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC '10). ACM, New York, NY, USA, 143–154. https://doi.org/10.1145/1807128.1807152
[12] J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. 2012. Spanner: Google's globally-distributed database. Proceedings of OSDI (2012).
[13] Citus Data. 2016. Master-less Distributed Queue with PG Paxos. https://www.citusdata.com/blog/2016/04/13/masterless-distributed-queue/ (2016).
[14] Herbert Aron David and Haikady Navada Nagaraja. 2004. Order Statistics. Encyclopedia of Statistical Sciences 9 (2004).
[15] etcd. [n. d.]. etcd: Distributed reliable key-value store for the most critical data of a distributed system. https://coreos.com/etcd/.
[16] Matt Freels. 2018. FaunaDB: An Architectural Overview. https://fauna-assets.s3.amazonaws.com/public/FaunaDB-Technical-Whitepaper.pdf
[17] Google. 2018. The Go Programming Language. https://golang.org/.
[18] Heroku. 2018. Doozerd. https://github.com/ha/doozerd.
[19] Ezra N. Hoch, Yaniv Ben-Yehuda, Noam Lewis, and Avi Vigder. 2017. Bizur: A key-value consensus algorithm for scalable file-systems. arXiv preprint arXiv:1702.04242 (2017).
[20] Heidi Howard, Dahlia Malkhi, and Alexander Spiegelman. 2016. Flexible Paxos: Quorum intersection revisited. arXiv preprint arXiv:1608.06696 (2016).
[21] F. Junqueira, B. Reed, and M. Serafini. 2011. Zab: High-performance broadcast for primary-backup systems. In Dependable Systems & Networks (DSN). IEEE, 245–256.
[22] Kyle Kingsbury. 2017. Jepsen Tests. https://jepsen.io/.
[23] Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden, and Alan Fekete. 2013. MDCC: Multi-data center consistency. In Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 113–126.
[24] Cockroach Labs. 2018. CockroachDB: The SQL database for global cloud services. https://www.cockroachlabs.com/docs/stable/architecture/overview.html.
[25] L. Lamport. 2001. Paxos made simple. ACM SIGACT News 32, 4 (2001), 18–25.
[26] Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. 2009. Vertical Paxos and primary-backup replication. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing. ACM, 312–313.
[27] Jialin Li, Ellis Michael, Naveen Kr. Sharma, Adriana Szekeres, and Dan R. K. Ports. 2016. Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 467–483.
[28] Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, and Wyatt Lloyd. 2015. Existential consistency: measuring and understanding consistency at Facebook. In Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 295–310.
[29] Yanhua Mao, Flavio Paiva Junqueira, and Keith Marzullo. 2008. Mencius: Building efficient replicated state machines for WANs. In OSDI, Vol. 8. 369–384.
[30] Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in egalitarian parliaments. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 358–372.
[31] Neo4j. 2012. Neo4j - The World's Leading Graph Database. http://neo4j.org/
[32] Netflix. 2017. Chaos Monkey. https://github.com/Netflix/chaosmonkey.
[33] Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14). 305–319.
[34] Sebastiano Peluso, Alexandru Turcu, Roberto Palmieri, Giuliano Losa, and Binoy Ravindran. 2016. Making fast consensus generally faster. In Dependable Systems and Networks (DSN), 2016 46th Annual IEEE/IFIP International Conference on. IEEE, 156–167.
[35] Denis Rystsov. 2018. CASPaxos: Replicated State Machines without logs. arXiv preprint arXiv:1802.07000 (2018).
[36] Yee Jiun Song, Robbert Van Renesse, Fred Schneider, and Danny Dolev. 2008. The building blocks of consensus. Distributed Computing and Networking (2008), 54–72.
[37] Robbert Van Renesse and Deniz Altinbuken. 2015. Paxos made moderately complex. ACM Computing Surveys (CSUR) 47, 3 (2015), 42.
[38] Robbert Van Renesse and Deniz Altinbuken. 2015. Paxos made moderately complex. ACM Computing Surveys (CSUR) 47, 3 (2015), 42.
[39] Hanyu Zhao, Quanlu Zhang, Zhi Yang, Ming Wu, and Yafei Dai. 2018. SDPaxos: Building Efficient Semi-Decentralized Geo-replicated State Machines. In Proceedings of the ACM Symposium on Cloud Computing. ACM, 68–81.
[40] Jianjun Zheng, Qian Lin, Jiatao Xu, Cheng Wei, Chuwei Zeng, Pingan Yang, and Yunfan Zhang. 2017. PaxosStore: High-availability storage made practical in WeChat. Proceedings of the VLDB Endowment 10, 12 (2017), 1730–1741.