Derecho: Fast State Machine Replication for Cloud Services
SAGAR JHA, Cornell University, USA
JONATHAN BEHRENS, Cornell University, USA and MIT, USA
THEO GKOUNTOUVAS, MATTHEW MILANO, WEIJIA SONG, EDWARD TREMEL, ROBBERT VAN RENESSE, SYDNEY ZINK, and KENNETH P. BIRMAN, Cornell University, USA

Cloud computing services often replicate data and may require ways to coordinate distributed actions. Here we present Derecho, a library for such tasks. The API provides interfaces for structuring applications into patterns of subgroups and shards, supports state machine replication within them, and includes mechanisms that assist in restart after failures. Running over 100Gbps RDMA, Derecho can send millions of events per second in each subgroup or shard and throughput peaks at 16GB/s, substantially outperforming prior solutions. Configured to run purely on TCP, Derecho is still substantially faster than comparable widely used, highly-tuned, standard tools. The key insight is that on modern hardware (including non-RDMA networks), data-intensive protocols should be built from non-blocking data-flow components.

CCS Concepts: • Computer systems organization → Dependable and fault tolerant systems and networks; • Software and its engineering → Cloud computing;

Additional Key Words and Phrases: Cloud computing, RDMA, non-volatile memory, replication, consistency

ACM Reference format:
Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Matthew Milano, Weijia Song, Edward Tremel, Robbert van Renesse, Sydney Zink, and Kenneth P. Birman. 2019. Derecho: Fast State Machine Replication for Cloud Services. ACM Trans. Comput. Syst. 36, 2, Article 4 (March 2019), 49 pages. https://doi.org/10.1145/3302258

1 INTRODUCTION

There is a pervasive need for cloud-hosted services that guarantee rapid response based on the most up-to-date information, scale well, and support high event rates.