Scalable but Wasteful: Current State of Replication in the Cloud


Venkata Swaroop Matte (The Pennsylvania State University)
Aleksey Charapko (University of New Hampshire)
Abutalib Aghayev (The Pennsylvania State University)

Strongly Consistent Replication
- Used in Cloud datastores and Configuration management
- Rely on Consensus protocols (replication protocols)
- Achieve High-throughput

How to optimize for Throughput?
[Figure: Multi-Paxos with its leader]

One way to optimize: Shift the load
[Figure: leader-based Multi-Paxos next to EPaxos]
- Many protocols shift work from the bottleneck to the under-utilized node
- Examples: EPaxos, SDPaxos and PigPaxos

Resource utilization of replication protocols
[Figure: 3 nodes with 2 cores each; under Multi-Paxos, a leader and two followers, with a legend for utilized and idle cores]
[Figure: the same nodes under Multi-Paxos and under EPaxos]
EPaxos also utilizes the idle cores to achieve high throughput

Confirming performance gains
• Single instance
• 5 AWS EC2 m5a.large nodes
• Each 2 vCPU, 8GB RAM
• 50% write workload
[Figure: Throughput of Multi-Paxos and EPaxos]
EPaxos achieves 20% higher throughput compared to Multi-Paxos

Missing piece: Resource efficiency

Protocol      Utilization   Throughput
EPaxos        500%          18 kops/s
Multi-Paxos   200%          14 kops/s

Multi-Paxos shows better resource efficiency compared to EPaxos

Metric to analyze Resource efficiency
Throughput-per-unit-of-constraining-resource-utilization
- Used CPU utilization to identify resource efficiency
- This metric determines the added cost of removing bottleneck

Throughput-per-unit-of-aggregate-CPU-Utilization
[Figure: throughput per unit of aggregate CPU utilization]
Metric shows the resource efficiency of replication protocols

Relevance of resource efficiency in Cloud
- Important in a pay-as-you-go utility model like Cloud
- Replication protocols are optimized for dedicated VMs
- Whereas Cloud is sharded and resource packed
- Spanner, CockroachDB, and YugabyteDB support many instances from different shards on the same physical machine

Example: Packing in a resource constrained setting
[Figure sequence: 5 nodes with 6 cores each; Instance 1 through Instance 5 are added one at a time]

Experiment: Packing 5 instances in Cloud
• 5 instances of Multi-Paxos/EPaxos
• 5 AWS EC2 m5a.2xlarge nodes
• Each 8 vCPU, 32GB RAM
• 50% write workload

Aggregate Throughput
[Figure: Aggregate throughput of Multi-Paxos and EPaxos with 5 instances packed together]

Why throughput-per-unit-of-constraining-resource-utilization?
It is a good proxy for the performance of replication protocols in Cloud setting
[Figure panels: dedicated resource setting; shared resource setting]

Conclusion: Scalable but Wasteful
[Figure panels: dedicated resource setting; fixed-budget shared resource setting]
Resource efficiency plays a key role for replication protocols when moving from a dedicated to shared resource setting.
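
The throughput-per-unit-of-aggregate-CPU-utilization metric from the slides can be illustrated with a short calculation. The sketch below is not the authors' code. It uses only the two single-instance data points quoted in the talk (EPaxos: 500% utilization, 18 kops/s; Multi-Paxos: 200% utilization, 14 kops/s) and assumes that 100% utilization corresponds to one fully busy core. The names `Measurement` and `throughput_per_core` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One protocol's aggregate numbers, as quoted on the slides."""
    protocol: str
    throughput_kops: float      # aggregate throughput in kops/s
    cpu_utilization_pct: float  # CPU utilization summed over all nodes


def throughput_per_core(m: Measurement) -> float:
    """Return kops/s per fully utilized core.

    Assumption (not stated in the source): 100% utilization
    equals one fully busy core.
    """
    if m.cpu_utilization_pct <= 0:
        raise ValueError("CPU utilization must be positive")
    return m.throughput_kops / (m.cpu_utilization_pct / 100.0)


# Data points taken from the "Missing piece: Resource efficiency" slide.
measurements = [
    Measurement("Multi-Paxos", throughput_kops=14.0, cpu_utilization_pct=200.0),
    Measurement("EPaxos", throughput_kops=18.0, cpu_utilization_pct=500.0),
]

for m in measurements:
    print(f"{m.protocol}: {throughput_per_core(m):.2f} kops/s per core")
```

Under that assumption, Multi-Paxos delivers 7.00 kops/s per busy core and EPaxos 3.60, which is consistent with the slide's statement that Multi-Paxos shows better resource efficiency even though EPaxos has the higher raw throughput.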
