Geo-Distributed Databases: Engineering Around the Physics of Latency
© 2021 All Rights Reserved

Who we are
Taylor Mull
● Senior Data Engineer
● DataStax, Charter Comms

Suda Srinivasan
● VP of Solutions
● ~15 years in tech - many hats
● Nutanix, Deloitte, Microsoft, bunch of startups
Cloud native relational database for cloud native applications
Transactional distributed SQL database built for resilience and scale. 100% open source. Runs in any cloud.

● PostgreSQL-compatible SQL
● Resilience and high availability
● Horizontal scalability
● Geographic distribution
● ACID transactions
● Security
What is a geo-distributed database?
A single database spread across two or more geographically distinct locations - data centers, availability zones, or regions - that runs without performance delays in executing transactions.

But: physics!
The physics of wire latency
Round trips are bounded by the speed of light: roughly 1-2 ms within a region, ~60 ms across a continent, ~150 ms across the globe. On top of propagation delay, latency is affected by transmission media, packet size, packet loss, and signal strength.
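A back-of-envelope calculation shows why these numbers are a hard floor. The sketch below assumes light in fiber travels at roughly two thirds of its vacuum speed and that the fiber path is about 1.3x the great-circle distance (both common rules of thumb):

```python
# Lower bound on network RTT from propagation delay alone.
# Assumptions (not measurements): light in fiber ~200,000 km/s,
# fiber path ~1.3x the great-circle distance.
SPEED_IN_FIBER_KM_PER_MS = 200_000 / 1000  # ~200 km per millisecond
PATH_STRETCH = 1.3

def round_trip_ms(great_circle_km: float) -> float:
    """Propagation-only round-trip time; real RTTs are higher."""
    one_way = great_circle_km * PATH_STRETCH / SPEED_IN_FIBER_KM_PER_MS
    return 2 * one_way

# New York -> London is ~5,570 km great-circle:
print(f"{round_trip_ms(5570):.0f} ms")  # ~72 ms, before any queuing or retransmits
```

No amount of engineering in the database can push a cross-Atlantic round trip below this bound, which is why data placement, not raw speed, is the lever.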
Latency in the I/O path
Keep your data close to usage and compute close to data
Why deploy geo-distributed databases?
Resilience
● Datacenters, cloud AZs, even regions can fail
● Applications and databases should be resilient and available through failures

Performance
● Customers and users are located around the world
● Moving data close to usage and compute close to data lowers latency

Compliance
● Data residency laws require data about a nation's citizens or residents to be collected, processed, and/or stored inside the country
Core concepts
0. YugabyteDB architecture
1. Synchronous replication within a YugabyteDB cluster
2. Follower reads
3. xCluster asynchronous replication - unidirectional and bidirectional
4. Read replicas
5. Geo-partitioning
Core concept 0: YugabyteDB architecture
[Diagram: app connected to three nodes]

● Nodes across DCs, zones, and regions
● User tables sharded into tablets (groups of rows)
● Tablets evenly distributed across nodes, both per-table and across tables
● Sharding and distribution are transparent to the user
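The sharding step can be sketched in a few lines. This is an illustration of hash-sharding rows into a fixed number of tablets, not YugabyteDB's actual scheme (which splits a 2-byte hash space into ranges and can split tablets dynamically):

```python
# Illustrative sketch: map each row's primary key to one of a fixed
# number of tablets by hashing. Tablet count and hash choice are
# assumptions for the demo, not YugabyteDB internals.
import hashlib

NUM_TABLETS = 8

def tablet_for(key: str) -> int:
    """Map a row's primary key to a tablet by hashing."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "big")
    return h % NUM_TABLETS

# Rows spread across tablets; each tablet is then replicated across nodes.
rows = [f"user-{i}" for i in range(1000)]
counts = [0] * NUM_TABLETS
for r in rows:
    counts[tablet_for(r)] += 1
print(counts)  # roughly even distribution across the 8 tablets
```

Because the mapping is deterministic, any node can route a request for a given key to the right tablet without a central lookup.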
Designing the perfect distributed SQL DB
Aurora much more popular than Spanner
YugabyteDB architecture:
● Yugabyte Query Layer: YSQL and YCQL APIs
● DocDB distributed document store: sharding & load balancing, Raft consensus replication, distributed transaction manager & MVCC
● Document storage layer: custom RocksDB storage engine

Amazon Aurora: a highly available MySQL- and PostgreSQL-compatible relational database service. Not scalable, but HA; all RDBMS features; PostgreSQL & MySQL syntax.

Google Spanner: the first horizontally scalable, strongly consistent relational database service. Scalable and HA; missing RDBMS features; new SQL syntax.

bit.ly/distributed-sql-deconstructed
Core concept 1: Synchronous replication by default
[Diagram: app writing to an RF-3 tablet replicated across three nodes]

● Each tablet is replicated
● YugabyteDB uses the Raft consensus protocol for leader election and replication
● Writes are replicated to all the tablet's peers; a majority of the peers must acknowledge before the write succeeds
● Reads and writes are served by the tablet leader (by default)
● Sync replication offers: consistency, resilience
● Sync replication costs: latency
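The majority-acknowledgment rule above is the heart of the latency cost. A minimal sketch, assuming a replication factor of 3 (this is the commit rule in miniature, not YugabyteDB's Raft code):

```python
# Minimal sketch of the majority-acknowledgment rule: a write commits
# once more than half of a tablet's peers have acknowledged it.
def write_succeeds(acks: int, replication_factor: int = 3) -> bool:
    """True once a majority of tablet peers acknowledge the write."""
    return acks > replication_factor // 2

# With RF-3, the leader plus one follower is enough; a single slow or
# failed peer does not block writes.
print(write_succeeds(2))  # True
print(write_succeeds(1))  # False
```

The flip side is that in a stretched cluster the "one follower" in the majority may sit in another region, so every write pays at least one cross-region round trip.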
Geo-distribution with sync replication
Enabling business outcomes: Top 5 global retailer
An American multinational retail corporation that operates a chain of hypermarkets, department stores, and grocery stores in countries around the world
WHY YUGABYTE
● Linear scale with product growth
● Open source
● Cloud-agnostic, geo-distributed
● Multi-row ACID transactions
● Alternate key lookups
● Better performance and resiliency than Azure CosmosDB, Azure Cloud SQL, and other databases

SOLUTION AND BENEFITS
● SOR for product catalog of 100+ million items with billions of mappings, serving over 100K qps
● Enhanced product agility
● Handled Black Friday and Cyber Monday peaks
● Service remained resilient and available through TX cloud outage
● $10M in lost revenue recovered
Multi-region deployment for resilience: Top 5 retailer
Deployment: 27 Azure nodes across 3 regions - US-East, US-West Seattle, and US-South Texas
Per-node configuration:
● Cores: 16
● Memory: 128 GB
● Disk: 2 x 1024 GB premium P40 disks per node
● OS: CentOS 7.8
Preferred leaders in US-South Central region
Service remained resilient and available through the Texas cloud power outage
Core concept 2: Follower reads trade off freshness for latency
Timeline (value of a row on the leader and two followers of an RF-3 tablet):
● Initially: leader 15, follower 1: 15, follower 2: 15
● Write request received: leader 20, follower 1: 15, follower 2: 15
● Write completed (majority ack): leader 20, follower 1: 20, follower 2: 15
● Read request received: leader 20, follower 1: 20, follower 2: 15 - a follower read served by follower 2 returns the stale value 15
● Tablet fully replicated: leader 20, follower 1: 20, follower 2: 20

● Follower reads can return stale data
● Followers located near the client can serve data with low latency
● Follower read configuration is at the app level
● Follower reads offer: low latency
● Follower reads cost: data accuracy (freshness)
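The timeline above can be modeled as a toy leader/follower tablet, where a write commits on majority ack and the remaining follower catches up later. This is an illustration of the trade-off, not YugabyteDB's API:

```python
# Toy model: one leader and two followers. A write commits once the
# leader plus one follower (a majority of 3) has it; the last follower
# lags until replication catches up.
class Tablet:
    def __init__(self):
        self.replicas = {"leader": 15, "follower1": 15, "follower2": 15}

    def write(self, value: int):
        # Majority commit: leader + follower1 apply the write now;
        # follower2 will catch up asynchronously.
        self.replicas["leader"] = value
        self.replicas["follower1"] = value

    def read(self, replica: str = "leader") -> int:
        # Reading from a nearby follower is faster but may be stale.
        return self.replicas[replica]

t = Tablet()
t.write(20)
print(t.read())              # 20 - a leader read is always current
print(t.read("follower2"))   # 15 - a follower read may be stale
```

Whether the stale 15 is acceptable is an application decision, which is why follower reads are configured at the app level.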
Core concept 3: xCluster asynchronous replication
[Diagram: two independent clusters, each spanning three availability zones, linked by unidirectional or bidirectional async replication]

Master Cluster 1 in Region 1 <-> Master Cluster 2 in Region 2

Each cluster is consistent across its own zones, with no cross-region latency for either writes or reads.
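A sketch of the unidirectional case: the source cluster acknowledges writes locally and ships changes to the sink in the background. This illustrates the eventual (timeline) consistency model, not the real xCluster implementation:

```python
# Toy async replication: writes commit locally with no cross-region
# round trip, and a background step drains changes to the other cluster.
from collections import deque

class Cluster:
    def __init__(self):
        self.data = {}
        self.outbox = deque()  # pending changes to replicate

    def write(self, key, value):
        self.data[key] = value            # commit locally: low latency
        self.outbox.append((key, value))  # replicate later

def replicate(source: "Cluster", sink: "Cluster"):
    """Drain the source's change stream into the sink, in order."""
    while source.outbox:
        key, value = source.outbox.popleft()
        sink.data[key] = value

east, west = Cluster(), Cluster()
east.write("cart:1", "3 items")
print(west.data.get("cart:1"))  # None - sink has not caught up yet
replicate(east, west)
print(west.data["cart:1"])      # '3 items' - eventually consistent
```

The bidirectional case runs this in both directions, which is why conflicting concurrent writes to the same key become the application's problem to avoid or resolve.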
Enabling business outcomes: Kroger
Largest supermarket chain in the US with over 2,750 supermarkets and multi-department stores. Rapidly growing digital channel, especially during the COVID-19 crisis.
WHY YUGABYTE
● Distributed ACID transactions, scalability
● Geo-distributed deployment for resilience
● Multi-API support - YSQL, YCQL
● Automatic data sharding
● Open source

SOLUTION AND BENEFITS
● YugabyteDB is the SOR for the shopping list service
● 42 states, 9m shoppers
● Multi-region deployment with sync replication for resilience with single digit latency
● xCluster bidirectional replication
● Designed to be multi-cloud on GCP and Azure
"We have been leveraging YugabyteDB as the distributed SQL database running natively inside Kubernetes to power the business-critical apps that require scale and high availability." - Mahesh Thyagarajan, VP Engineering
Core concept 4: Read replicas
[Diagram: a primary cluster across AZ 1-3 in one region, with a read replica cluster in another region]

● Read replicas offer low latency reads
● Read replicas can't be used for resilience/failover
Admiral architecture
Deployed across 5 countries, 3 continents
● Synchronous cluster across US West, US Central, and US East ● Each region has a master process for HA ● Read replica clusters in Asia and Europe
Core concept 5: Row-level geo-partitioning
[Diagram: one table partitioned by a geo column - e.g. row id 1 (geo US) stored in a US region, row id 4 (geo UK) in a UK region, rows id 2 and 3 (geo IND) in an India region]

● Pin rows of a table or its indexes to specific geos
● Strong consistency
● Low read and write latency
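In YSQL this is expressed with PostgreSQL-style table partitioning plus tablespaces pinned to regions; as a language-agnostic sketch, the routing rule behind it looks like this (region names and the shopping-row shape are illustrative assumptions):

```python
# Sketch of the routing rule behind row-level geo-partitioning: each
# row's geo column decides which regional partition (and hence which
# region's tablets) stores it. Region names here are illustrative.
PARTITIONS = {
    "US": "aws-us-east-1",
    "UK": "aws-eu-west-2",
    "IND": "aws-ap-south-1",
}

def partition_for(row: dict) -> str:
    """Route a row to the partition pinned to its geo."""
    return PARTITIONS[row["geo"]]

rows = [
    {"id": 1, "geo": "US"},
    {"id": 2, "geo": "IND"},
    {"id": 3, "geo": "IND"},
    {"id": 4, "geo": "UK"},
]
for row in rows:
    print(row["id"], "->", partition_for(row))
# Indian users' rows never leave ap-south-1: residency compliance
# plus local read/write latency, within one strongly consistent table.
```

Because all partitions are still one logical table, cross-geo queries remain possible; they just pay cross-region latency when they touch remote partitions.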
Flexible deployment options in a single database
Multi-zone cluster
● Consistency: strong
● Read latency: low within region (1-10 ms)
● Write latency: low within region (1-10 ms)
● Used for: zone-level resilience

Multi-region stretched cluster
● Consistency: tunable (with follower reads)
● Read latency: high with strong consistency (40-100 ms), or low with eventual consistency
● Write latency: 40-100 ms, always strongly consistent
● Used for: region-level resilience

xCluster async, single-direction
● Consistency: eventual (timeline)
● Read latency: low within region (1-10 ms)
● Write latency: low within region (1-10 ms)
● Used for: backup and DR

xCluster async, bidirectional
● Consistency: eventual (timeline)
● Read latency: low within region (1-10 ms)
● Write latency: low within region (1-10 ms)
● Used for: backup and DR

Read replicas
● Consistency: strong in primary cluster; eventual in read replica clusters
● Read latency: low within primary cluster region (1-10 ms)
● Write latency: low within region (1-10 ms)
● Used for: low latency reads; not a DR solution (not an independent failure domain)

Geo-partitioning
● Consistency: strong
● Read latency: low within region (1-10 ms); high across regions (40-100 ms)
● Write latency: low within region (1-10 ms)
● Used for: compliance
Summary of core concepts
1. Data is synchronously replicated within a Yugabyte cluster by default.
2. Nodes can be placed in different zones, different regions (stretched), or different clouds.
3. Reads and writes are handled by the tablet leader.
4. Follower reads trade off data freshness for lower latency.
5. xCluster replication asynchronously replicates data across clusters for backup/DR.
6. Read replicas enable low latency reads from local clusters.
7. Geo-partitioning allows table rows and indexes to be pinned to specific geographies.
These options let you trade off different objectives - resilience, data freshness, latency, and compliance - to achieve what your apps need.
Thank You
Join us on Slack: yugabyte.com/slack
Star us on GitHub: github.com/yugabyte/yugabyte-db