YugabyteDB – Distributed SQL Database on Kubernetes
Amey Banarse VP of Product, Yugabyte, Inc. Taylor Mull Senior Data Engineer, Yugabyte, Inc.
© 2020 - All Rights Reserved
Introduction – Amey
Amey Banarse
VP of Product, Yugabyte, Inc.
Pivotal • FINRA • NYSE
University of Pennsylvania (UPenn)
@ameybanarse
about.me/amey
Introduction – Taylor
Taylor Mull
Senior Data Engineer, Yugabyte, Inc.
DataStax • Charter
University of Colorado at Boulder
Kubernetes Is Massively Popular in Fortune 500s
● Walmart – Edge Computing KubeCon 2019 https://www.youtube.com/watch?v=sfPFrvDvdlk
● Target – Data @ Edge https://tech.target.com/2018/08/08/running-cassandra-in-kubernetes-across-1800-stores.html
● eBay – Platform Modernization https://www.ebayinc.com/stories/news/ebay-builds-own-servers-intends-to-open-source/
The State of Kubernetes 2020
● Substantial Kubernetes growth in large enterprises
● Clear evidence of production use in enterprise environments
● On-premises is still the most common deployment method
● Though there are pain points, most developers and executives alike feel Kubernetes is worth it
VMware The State of Kubernetes 2020 report https://tanzu.vmware.com/content/ebooks/the-state-of-kubernetes-2020 https://containerjournal.com/topics/container-ecosystems/vmware-releases-state-of-kubernetes-2020-report/
Data on K8s Ecosystem Is Evolving Rapidly
Why Data Services on K8s?
Containerized data workloads running on Kubernetes offer several advantages over traditional VM / bare-metal based data workloads, including but not limited to:
● Better cluster resource utilization
● Portability between cloud and on-premises
● Robust automation framework that can be embedded inside CRDs (Custom Resource Definitions), commonly referred to as a ‘K8s Operator’
● Self-service experience and seamless scale on demand during peak traffic
● Simple and selective instant upgrades
A Brief History of Yugabyte
● Part of Facebook’s cloud native DB evolution
● Builders of multiple popular databases
● Yugabyte founding team ran Facebook’s public cloud scale DBaaS: +1 Trillion ops/day, +100 Petabytes data set sizes
● Yugabyte team dealt with this growth first hand
● Massive geo-distributed deployment given global users
● Worked with world-class infra team to solve these issues
Transactional, distributed SQL database designed for resilience and scale
100% open source, PostgreSQL compatible, enterprise-grade RDBMS …..built to run across all your cloud environments
Enabling Business Outcomes
● System of record driving a shopping list service; designed to be multi-cloud on GCP and Azure; scaling to 42 states and 9M shoppers
● Real-time APIs for financial services data; 14 billion requests per day with data from over 100 worldwide exchanges; serves major financial tech and services firms from Betterment to Schwab
● Retail personalization platform serving 600+ retailers like Walmart and Nike; designed to be multi-cloud/multi-AZ and tuned to handle Black Friday
YugabyteDB – The Promise of Distributed SQL
● PostgreSQL-compatible SQL
● Resilience & HA
● Low latency
● Multi-cloud / hybrid cloud / K8s
● 100% open source
● Horizontal scalability
● Geo distribution
Designing the Perfect Distributed SQL Database
PostgreSQL is more popular than MongoDB; Aurora is much more popular than Spanner.

Amazon Aurora – a highly available MySQL- and PostgreSQL-compatible relational database service:
● Not scalable but HA
● All RDBMS features
● PostgreSQL & MySQL syntax

Google Spanner – the first horizontally scalable, strongly consistent relational database service:
● Scalable and HA
● Missing RDBMS features
● New SQL syntax
Comparing PostgreSQL and Google Spanner
Feature                 | PostgreSQL                    | Google Spanner
Query Layer             | ✓ Very well known             | ❌ New SQL flavor
SQL Feature Depth       | High (many advanced features) | Low (basic features missing)
Fault Tolerance with HA | ❌                            | ✓
Horizontal Scalability  | ❌                            | ✓
Distributed Txns        | ❌                            | ✓
Replication             | Async                         | Sync, Read Replicas

Feature set and ecosystem support are critical for adoption.
Spanner Design + PostgreSQL Compatibility
Feature                 | PostgreSQL                    | Google Spanner               | YugabyteDB
SQL Wire Protocol       | ✓ Very well known             | ❌ New SQL flavor            | ✓ Reuses PostgreSQL
RDBMS Feature Support   | High (many advanced features) | Low (basic features missing) | High (supports most Postgres features)
Fault Tolerance with HA | ❌                            | ✓                            | ✓
Horizontal Scalability  | ❌                            | ✓                            | ✓
Distributed Txns        | ❌                            | ✓                            | ✓
Global Txn Replication  | Async                         | Sync, Read Replicas          | Sync, Read Replicas, Async
Designed for Cloud Native Microservices
Feature          | PostgreSQL                        | Google Spanner   | YugabyteDB
Query Layer      | ✓ Massively adopted SQL ecosystem | ✘ New SQL flavor | ✓ Reuses PostgreSQL
RDBMS Features   | Advanced                          | Basic            | Advanced
Target Apps      | Complex                           | Cloud-native     | Complex and cloud-native
Highly Available | ✘                                 | ✓                | ✓
Horizontal Scale | ✘                                 | ✓                | ✓
Distributed Txns | ✘                                 | ✓                | ✓
Data Replication | Async                             | Sync             | Sync + Async

Architecture: the Yugabyte Query Layer (YSQL, YCQL) sits on top of DocDB, a distributed document store providing sharding & load balancing, Raft consensus replication, a distributed transaction manager & MVCC, and a document storage layer built on a custom RocksDB storage engine.
All Nodes Are Identical
● Microservices can connect to ANY node
● Add/remove nodes anytime
● Every YugabyteDB node is identical, running the same YugabyteDB Query Layer on top of the same DocDB Storage Layer
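Because every node runs the same query and storage layers, a client can open a connection against any node. A minimal Python sketch of that idea (node addresses are hypothetical; real deployments would use a smart driver or load balancer rather than this toy):

```python
from itertools import cycle

class ClusterClient:
    """Toy client illustrating that any YugabyteDB node can serve a request.

    The node list is a hypothetical example; this is not a real driver."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def next_node(self):
        # Every node is identical, so simple round-robin distribution works.
        return next(self._nodes)

client = ClusterClient(["node1:5433", "node2:5433", "node3:5433"])
picks = [client.next_node() for _ in range(4)]
# Requests spread across all nodes, wrapping back to the first.
```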
YugabyteDB Deployed as StatefulSets
● Admin clients reach the yb-masters through one headless service; app clients reach the yb-tservers through another
● yb-master StatefulSet: pods yb-master-0, yb-master-1, yb-master-2
● yb-tserver StatefulSet: pods yb-tserver-0 through yb-tserver-3, one per K8s node (node1–node4)
● Each pod is backed by its own local or remote persistent volume
Under the Hood – 3 Node Cluster
● YB-Master (yb-master1, yb-master2, yb-master3): manages shard metadata & coordinates cluster-wide ops
● YB-TServer (yb-tserver1, yb-tserver2, yb-tserver3): stores/serves data in/from tablets (shards); each tablet has one leader and followers on other nodes (e.g. tablet1-leader on node1, tablet1-followers on node2 and node3)
● Global Transaction Manager: tracks ACID txns across multi-row ops, incl. clock skew mgmt.
● DocDB Storage Engine: purpose-built for ever-growing data, extended from RocksDB
● Raft Consensus Replication: highly resilient, used for both data replication & leader election
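Raft-based replication means a write to a tablet is committed once a majority of its replicas acknowledge it. A toy Python sketch of that quorum arithmetic (illustrative only, not YugabyteDB code):

```python
def majority(n_replicas):
    # Raft commits an entry once more than half the replicas have it.
    return n_replicas // 2 + 1

def is_committed(acks, n_replicas=3):
    """Toy check: with replication factor 3, two acks are enough."""
    return acks >= majority(n_replicas)

# RF=3 tablet: leader alone -> not committed; leader + one follower -> committed.
results = [is_committed(1), is_committed(2), is_committed(3)]
```

This is also why a 3-replica tablet keeps serving writes through one node failure: the two surviving replicas still form a majority.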
Deployment Topologies
1. Single Region, Multi-Zone – spans Availability Zones 1–3; consistent across zones; no WAN latency, but no auto region-level failover/repair
2. Single Cloud, Multi-Region – spans Regions 1–3; consistent across regions; cross-region WAN latency with auto region-level failover/repair
3. Multi-Cloud, Multi-Region – spans Clouds 1–3; consistent across clouds; cross-cloud WAN latency with auto cloud-level failover/repair
Multi-Master Deployments w/ xCluster Replication
Master Cluster 1 in Region 1 ⟷ Bidirectional Async Replication ⟷ Master Cluster 2 in Region 2, each spanning Availability Zones 1–3.
● Consistent across zones within each cluster
● No cross-region latency for both writes & reads
● On failure of one region, apps connect to the master cluster in the other region
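A minimal Python sketch of the async-replication idea: each cluster applies the other's changes after the fact, keeping only the newest version of a key. The last-writer-wins rule and the key/timestamp shapes here are illustrative assumptions, not YugabyteDB's actual xCluster conflict handling:

```python
def apply_replicated(local_store, change):
    """Toy async apply: accept a remote change only if it is newer than
    what this cluster already has (last-writer-wins by timestamp)."""
    key, value, ts = change
    current = local_store.get(key)
    if current is None or ts > current[1]:
        local_store[key] = (value, ts)

region1 = {"cart:42": ("2 items", 100)}
region2 = {}

# Region 2 asynchronously receives region 1's write...
apply_replicated(region2, ("cart:42", "2 items", 100))
# ...then takes a newer local write and ships it back to region 1.
apply_replicated(region2, ("cart:42", "3 items", 150))
apply_replicated(region1, ("cart:42", "3 items", 150))
# A stale change arriving late is ignored on both sides.
apply_replicated(region1, ("cart:42", "1 item", 90))
```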
Deploying Yugabyte on Multi-Region K8s
● Scalable and highly available data tier
● Business continuity
● Geo-partitioning and data compliance
YugabyteDB on K8s Multi-Region Requirements
● Pod to pod communication over TCP ports using RPC calls across n K8s clusters
● Global DNS Resolution system ○ Across all the K8s clusters so that pods in one cluster can connect to pods in other clusters
● Ability to create load balancers in each region/DB
● RBAC: ClusterRole and ClusterRoleBinding
● Reference: Deploy YugabyteDB on multi-cluster GKE https://docs.yugabyte.com/latest/deploy/kubernetes/multi-cluster/gke/helm-chart/
YugabyteDB on K8s
Demo – Single YB Universe Deployed on 3 GKE Clusters
YugabyteDB Universe on 3 GKE Clusters
Deployment: 3 GKE clusters, each with 3 × n1-standard-8 nodes and 3 YugabyteDB pods
Cores: 4 cores per pod
Memory: 7.5 GB per pod
Disk: ~ 500 GB total for universe
Yugabyte Platform
Demo
Ensuring High Performance
LOCAL STORAGE (since v1.10):
● Lower latency, higher throughput
● Recommended for workloads that do their own replication
● Pre-provision outside of K8s
● Use SSDs for latency-sensitive apps

REMOTE STORAGE (most used):
● Higher latency, lower throughput
● Recommended for workloads that do not perform any replication on their own
● Provision dynamically in K8s
● Use alongside local storage for cost-efficient tiering
Configuring Data Resilience
POD ANTI-AFFINITY:
● Pods of the same type should not be scheduled on the same node
● Keeps the impact of node failures to an absolute minimum

MULTI-ZONE / REGIONAL / MULTI-REGION POD SCHEDULING:
● Multi-Zone – tolerate zone failures for K8s worker nodes
● Regional – tolerate zone failures for both K8s worker and master nodes
● Multi-Region / Multi-Cluster – requires network discovery between clusters
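The anti-affinity rule can be pictured as a placement loop that refuses to co-locate two pods of the same app. A toy Python sketch (pod and node names are illustrative; in K8s this is expressed declaratively via a podAntiAffinity rule, not imperative code):

```python
def schedule_with_anti_affinity(pods, nodes):
    """Toy placement: never put two pods of the same app on one node."""
    placement = {}                      # pod -> node
    used = {n: set() for n in nodes}    # node -> apps already placed there
    for pod, app in pods:
        for node in nodes:
            if app not in used[node]:
                placement[pod] = node
                used[node].add(app)
                break
        else:
            # Fewer nodes than same-app pods: the constraint cannot be met.
            raise RuntimeError(f"no node left for {pod}")
    return placement

pods = [("yb-tserver-0", "yb-tserver"), ("yb-tserver-1", "yb-tserver"),
        ("yb-tserver-2", "yb-tserver")]
placement = schedule_with_anti_affinity(pods, ["node1", "node2", "node3"])
# Each tserver pod lands on a distinct node, so one node failure
# takes down at most one replica.
```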
Automating Day 2 Operations
HANDLING FAILURES:
● Pod failure is handled by K8s automatically; a new instance of the pod is spawned with the same network ID and storage
● Node failure has to be handled manually by adding a new node to the K8s cluster
● Local storage failure has to be handled manually by mounting a new local volume to K8s

ROLLING UPGRADES:
● Supports two upgradeStrategies: onDelete (default) and rollingUpgrade
● Pick the rolling upgrade strategy for DBs that support zero-downtime upgrades, such as YugabyteDB

BACKUP & RESTORE:
● Backups and restores are a database-level construct
● YugabyteDB can perform a distributed snapshot and copy it to a target for a backup
● Restore the backup into an existing cluster or a new cluster with a different number of TServers
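The rolling upgrade strategy amounts to restarting one pod at a time while the rest keep serving, so availability never drops below the quorum floor. A toy Python illustration (pod names and the version tagging are hypothetical):

```python
def rolling_upgrade(pods, upgrade_fn, min_available):
    """Toy rolling upgrade: take down one pod at a time, and only proceed
    if the remaining pods still satisfy the availability floor
    (e.g. a Raft majority)."""
    upgraded = []
    for pod in pods:
        available = len(pods) - 1        # only the restarting pod is down
        assert available >= min_available, "would break quorum"
        upgraded.append(upgrade_fn(pod)) # pod rejoins before the next one
    return upgraded

pods = ["yb-tserver-0", "yb-tserver-1", "yb-tserver-2"]
result = rolling_upgrade(pods, lambda p: p + ":v2", min_available=2)
# All pods upgraded one at a time, with 2 of 3 always serving.
```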
Extending StatefulSets with Operators
● Based on custom controllers that have direct access to lower-level K8s APIs
● Excellent fit for stateful apps requiring human operational knowledge to correctly scale, reconfigure and upgrade, while simultaneously ensuring high performance and data resilience
● Complementary to Helm for packaging
● Example rule: when CPU usage in the yb-tserver StatefulSet is > 80% for 1 min and max_threshold is not exceeded, scale pods
https://github.com/yugabyte/yugabyte-platform-operator
Yugabyte Platform
Demo
Live Demo: Cluster Scale Up
1. Before expansion, all 3 nodes (Tablet Servers 1–3, each holding tablets t1–t4) are taking traffic.
2. A new node (Tablet Server 4) is added. No traffic goes to the new node yet.
3. The new node receives tablet t2 from the current leader, which is guaranteed to have a consistent copy: a simple file transfer.
4. With just that one tablet, the new node is ready to take traffic while expansion is still in progress. Zero-downtime cluster expansion is much simpler and faster because of Raft and strong consistency.
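The expansion steps above amount to moving tablets onto the new server until load is roughly even. A toy Python sketch of such a rebalance (tablet and server names are illustrative, not YugabyteDB's actual load balancer):

```python
def rebalance(assignment, new_server):
    """Toy tablet rebalance: shift tablets from loaded servers onto a
    newly added server, up to an even per-server share."""
    assignment[new_server] = []
    total = sum(len(t) for t in assignment.values())
    target = total // len(assignment)   # even share per server (rounded down)
    for server, tablets in assignment.items():
        while (server != new_server and len(tablets) > target
               and len(assignment[new_server]) < target):
            # Moving a tablet is just copying it from a consistent replica.
            assignment[new_server].append(tablets.pop())
    return assignment

servers = {"ts1": ["t1", "t4"], "ts2": ["t2", "t3"], "ts3": ["t1'", "t2'"]}
rebalance(servers, "ts4")
# The new server now owns a share of the tablets; none were lost in transit.
```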
Live Demo: Terminate a YB Pod
➜ ~ kubectl delete pods yb-tserver-1 -n yb-dev-yb-webinar-us-west1-c
➜ ~ kubectl get pods -n yb-dev-yb-webinar-us-west1-c
A Classic Enterprise App Scenario
Yugastore – E-Commerce App: A Real-World Demo
Deployed on
https://github.com/yugabyte/yugastore-java
Yugastore – Kronos Marketplace
Classic Enterprise Microservices Architecture
UI APP → REST API Gateway → microservices, each backed by the Yugabyte Cluster:
● PRODUCT MICROSERVICE – YCQL
● CHECKOUT MICROSERVICE – YCQL
● CART MICROSERVICE – YSQL
Istio Traffic Management for Microservices
UI APP → Istio Edge Proxy / API Gateway → CART, CHECKOUT and PRODUCT microservices
● Istio Edge Gateway using Envoy Proxy
● Istio Service Discovery
● Istio Route Configuration
● Istio Control Plane: Pilot, Citadel, Galley
A Platform Built for a New Way of Thinking
➔ Event + Microservice first design
➔ Team autonomy with platform efficiency
➔ 100% Cloud Native operating model on K8s
➔ Turnkey multi-cloud
➔ Full Spring Data support
We’re a Fast-Growing Project
Growth in 1 year: Clusters ▲ 12x • Slack ▲ 12x • GitHub Stars ▲ 7x
We ♥ stars! Give us one: github.com/YugaByte/yugabyte-db
Join our community: yugabyte.com/slack
Thank You
Join us on Slack: yugabyte.com/slack
Star us on GitHub: github.com/yugabyte/yugabyte-db