Scale Out and Conquer: Architectural Decisions Behind Distributed In-Memory Systems

Denis Magda Apache Ignite PMC Chair GridGain Product Management

© 2018 GridGain Systems, Inc. GridGain Company Confidential Agenda

• Partitioning – Pitfalls of Even Distribution

• Affinity Collocation – Laid Out in Numbers

• Multi-threading of Distributed Systems – Things to Know

© 2018 GridGain Systems, Inc. GridGain Company Confidential Memory-centric distributed , caching, and processing platform

Memory-centric platform based on Apache Ignite adds enterprise features and support for mission critical deployments

© 2018 GridGain Systems, Inc. GridGain Company Confidential Among Top 5 Apache Projects Total number of Apache Projects: 318 Top-Level Apache Projects: 193

Top 5 Apache Projects by Commits Top 10 most active Apache mailing lists 1. Flex 1. Hadoop 2. Lucene 2. Ambari 3. Ignite 3. Lucene-Solr 4. Kafka 4. Camel 5. Geode 5. Ignite 6. Flink 7. Tomcat 8. Cassandra 9. Beam 10. Sentry © 2018 GridGain Systems, Inc. GridGain Company Confidential Partitioning – Pitfalls of Even Distribution

© 2018 GridGain Systems, Inc. GridGain Company Confidential Where request goes?

© 2018 GridGain Systems, Inc. GridGain Company Confidential Affinity Function

Server Node

Key Partition

ON-DISK

© 2018 GridGain Systems, Inc. GridGain Company Confidential Affinity Function

© 2018 GridGain Systems, Inc. GridGain Company Confidential Naïve Affinity

© 2018 GridGain Systems, Inc. GridGain Company Confidential Naïve Function: New Node

© 2018 GridGain Systems, Inc. GridGain Company Confidential Naïve Function: What’s Wrong?

© 2018 GridGain Systems, Inc. GridGain Company Confidential Naïve Affinity: Redundant Partitions Re-shuffling!

© 2018 GridGain Systems, Inc. GridGain Company Confidential Productized Affinity Functions

• Consistent hashing [1] • Rendezvous hashing (HRW) [2]

[1] https://en.wikipedia.org/wiki/Consistent_hashing [2] https://en.wikipedia.org/wiki/Rendezvous_hashing

© 2018 GridGain Systems, Inc. GridGain Company Confidential Rendezvous Affinity: Weight Calculation

WEIGHT(partition, node)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Rendezvous Affinity: Weight Calculation

WEIGHT(partition, node)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Rendezvous Affinity: Weight Calculation

WEIGHT(partition, node)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Rendezvous Affinity: Two Nodes to Three

© 2018 GridGain Systems, Inc. GridGain Company Confidential Rendezvous Affinity: Distribute Evenly at Best

© 2018 GridGain Systems, Inc. GridGain Company Confidential Rendezvous Affinity: Possible Distribution

© 2018 GridGain Systems, Inc. GridGain Company Confidential Affinity Collocation – Laid Out in Numbers

© 2018 GridGain Systems, Inc. GridGain Company Confidential Running Transaction: No Collocation

1: class Account { 2: long id; 3: long city; 4: }

© 2018 GridGain Systems, Inc. GridGain Company Confidential Running Transaction: No Collocation

© 2018 GridGain Systems, Inc. GridGain Company Confidential Non Collocated Data: Number of Operations

2 (2 nodes)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Non Collocated Data: Number of Operations

2 (2 nodes) 2 (primary + backup)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Non Collocated Data: Number of Operations

2 (2 nodes) 2 (primary + backup) 2 (two-phase commit)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Non Collocated Data: Number of Operations

2 (2 nodes) 2 (primary + backup) 2 (two-phase commit) 2 (request-response) -- 16 network operations

© 2018 GridGain Systems, Inc. GridGain Company Confidential Network Operations!

© 2018 GridGain Systems, Inc. GridGain Company Confidential Running Transaction: Collocated Data

1: class Account{ 2: long id; 3: 4: @AffinityKeyMapped 5: long city; 6: }

© 2018 GridGain Systems, Inc. GridGain Company Confidential Running Transaction: Collocated Data!

© 2018 GridGain Systems, Inc. GridGain Company Confidential Collocated Data: Number of Operations

1 (1 node)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Collocated Data: Number of Operations

1 (1 node) 2 (primary + backup)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Collocated Data: Number of Operations

1 (1 node) 2 (primary + backup) 1 (one-phase commit)

© 2018 GridGain Systems, Inc. GridGain Company Confidential Collocated Data: Number of Operations

1 (1 node) 2 (primary + backup) 1 (one-phase commit) 2 (request-response) -- 4 network operations

© 2018 GridGain Systems, Inc. GridGain Company Confidential Let’s Run an SQL Query – No Collocation

© 2018 GridGain Systems, Inc. GridGain Company Confidential Executing SQL: Full Scan

• 1/3x latency • 3x throughput

© 2018 GridGain Systems, Inc. GridGain Company Confidential What About Indexes?

© 2018 GridGain Systems, Inc. GridGain Company Confidential Index Search Complexity log 1_000_000 ≈ 20 vs. log 333_333 ≈ 18 log 333_333 ≈ 18 log 333_333 ≈ 18

© 2018 GridGain Systems, Inc. GridGain Company Confidential Executing SQL: Indexes Search

• ~same latency • ~same throughput

© 2018 GridGain Systems, Inc. GridGain Company Confidential Let’s Collocate and Run SQL Again

© 2018 GridGain Systems, Inc. GridGain Company Confidential Executing SQL: Indexed Search and Collocation

• same latency • 3x throughput

© 2018 GridGain Systems, Inc. GridGain Company Confidential Multi-threading of Distributed Systems – Things to Know

© 2018 GridGain Systems, Inc. GridGain Company Confidential How Requests/Tasks/Queries Processed?

© 2018 GridGain Systems, Inc. GridGain Company Confidential Requests Processing: Bottleneck!

© 2018 GridGain Systems, Inc. GridGain Company Confidential Making Situation Worse

© 2018 GridGain Systems, Inc. GridGain Company Confidential Cluster Node Chokes!

© 2018 GridGain Systems, Inc. GridGain Company Confidential Tread-Per-Partition and Workers

© 2018 GridGain Systems, Inc. GridGain Company Confidential Thread-Per-Partition and I/O Queues

© 2018 GridGain Systems, Inc. GridGain Company Confidential Thread-Per-Partition: Not a Magic Bullet

© 2018 GridGain Systems, Inc. GridGain Company Confidential Thread-Per-Partition: Not a Magic Bullet

© 2018 GridGain Systems, Inc. GridGain Company Confidential Upshot

© 2018 GridGain Systems, Inc. GridGain Company Confidential Upshot

• Partitioning – it’s not about even distribution only

• Affinity Collocation – a golden concept of distributed systems

• Business Data Model should be adopted/reconsidered

• Thread-per-partition – speeds up most but not all usage scenarios

© 2018 GridGain Systems, Inc. GridGain Company Confidential These Days in New York

Docker NYC Meetup, Tomorrow, 6:00 PM • Location: PeerSpace, 1740 Broadway, 15th Floor • Distributed Database DevOps Dilemmas? to the rescue

https://ignite.apache.org/events.html

© 2018 GridGain Systems, Inc. GridGain Company Confidential Any Questions?

#apacheignite Thank you for joining us. Follow the conversation. #gridgain https://ignite.apache.org #TheASF #denismagda

https://www.meetup.com/NYC-In-Memory-Computing-Meetup/

© 2018 GridGain Systems, Inc. GridGain Company Confidential