Distributed SQL Database on Kubernetes
YugabyteDB – Distributed SQL Database on Kubernetes
Amey Banarse, VP of Product, Yugabyte, Inc.
Taylor Mull, Senior Data Engineer, Yugabyte, Inc.
© 2020 - All Rights Reserved

Introduction – Amey
Amey Banarse
VP of Product, Yugabyte, Inc.
Pivotal • FINRA • NYSE
University of Pennsylvania (UPenn)
@ameybanarse
about.me/amey

Introduction – Taylor
Taylor Mull
Senior Data Engineer, Yugabyte, Inc.
DataStax • Charter
University of Colorado at Boulder

Kubernetes Is Massively Popular in Fortune 500s
● Walmart – Edge Computing, KubeCon 2019
  https://www.youtube.com/watch?v=sfPFrvDvdlk
● Target – Data @ Edge
  https://tech.target.com/2018/08/08/running-cassandra-in-kubernetes-across-1800-stores.html
● eBay – Platform Modernization
  https://www.ebayinc.com/stories/news/ebay-builds-own-servers-intends-to-open-source/

The State of Kubernetes 2020
● Substantial Kubernetes growth in large enterprises
● Clear evidence of production use in enterprise environments
● On-premises is still the most common deployment method
● Though there are pain points, most developers and executives alike feel Kubernetes is worth it
Source: VMware, The State of Kubernetes 2020 report
https://tanzu.vmware.com/content/ebooks/the-state-of-kubernetes-2020
https://containerjournal.com/topics/container-ecosystems/vmware-releases-state-of-kubernetes-2020-report/

Data on K8s Ecosystem Is Evolving Rapidly

Why Data Services on K8s?
Containerized data workloads running on Kubernetes offer several advantages over traditional VM / bare-metal data workloads, including but not limited to:
● Better cluster resource utilization
● Portability between cloud and on-premises
● Self-service experience and seamless scaling on demand during peak traffic
● Simple and selective instant upgrades
● A robust automation framework can be embedded inside CRDs (Custom Resource Definitions), commonly referred to as a 'K8s Operator'

A Brief History of Yugabyte
Part of Facebook's cloud native DB evolution
Builders of multiple popular databases
Yugabyte founding team ran Facebook's public cloud scale DBaaS
+1 Trillion ops/day
+100 Petabytes data set sizes
● Yugabyte team dealt with this growth first hand
● Massive geo-distributed deployment given global users
● Worked with world-class infra team to solve these issues

Transactional, distributed SQL database designed for resilience and scale
100% open source, PostgreSQL compatible, enterprise-grade RDBMS
...built to run across all your cloud environments

Enabling Business Outcomes
● System of Record driving shopping list service
● Designed to be Multi-Cloud on GCP and Azure
● Scaling to 42 states and 9m shoppers
● Real-time APIs for financial services data
● 14 Billion requests per day with data from over 100 world-wide exchanges
● Serves major financial tech and services firms from Betterment to Schwab
● Retail personalization platform serving 600+ retailers like Walmart and Nike
● Designed to be Multi-cloud/Multi-AZ and tuned to handle Black Friday

YugabyteDB – The Promise of Distributed SQL
● SQL: PostgreSQL compatible
● Resilience & HA
● Low Latency
● Multi/Hybrid cloud/K8s
● 100% Open Source
● Horizontal Scalability
● Geo Distribution

Designing the Perfect Distributed SQL Database
PostgreSQL is more popular than MongoDB
Aurora much more popular than Spanner

Amazon Aurora | Google Spanner
A highly available MySQL and PostgreSQL-compatible relational database service | The first horizontally scalable, strongly consistent, relational database service
Not scalable but HA | Scalable and HA
All RDBMS features | Missing RDBMS features
PostgreSQL & MySQL | New SQL syntax

Comparing PostgreSQL and Google Spanner
Feature | PostgreSQL | Google Spanner
Query Layer | ✓ Very well known | ❌ New SQL flavor
SQL Feature Depth | High (many advanced features) | Low (basic features missing)
Fault Tolerance with HA | ❌ | ✓
Horizontal Scalability | ❌ | ✓
Distributed Txns | ❌ | ✓
Replication | Async | Sync, Read Replicas
Callout: Feature set and ecosystem support critical for adoption

Spanner Design + PostgreSQL Compatibility
Feature | PostgreSQL | Google Spanner | YugabyteDB
SQL Wire Protocol | ✓ Very well known | ❌ New SQL flavor | ✓ Reuses PostgreSQL
RDBMS Feature Support | High (many advanced features) | Low (basic features missing) | High (supports most Postgres features)
Fault Tolerance with HA | ❌ | ✓ | ✓
Horizontal Scalability | ❌ | ✓ | ✓
Distributed Txns | ❌ | ✓ | ✓
Global Txn Replication | Async | Sync, Read Replicas | Sync, Read Replicas, Async

Designed for Cloud Native Microservices
Feature | PostgreSQL | Google Spanner | YugabyteDB
SQL Ecosystem | ✓ Massively adopted | ✘ New SQL flavor | ✓ Reuse PostgreSQL
RDBMS Features | ✓ Advanced, complex | ✘ Basic, cloud-native | ✓ Advanced, complex and cloud-native
Highly Available | ✘ | ✓ | ✓
Horizontal Scale | ✘ | ✓ | ✓
Distributed Txns | ✘ | ✓ | ✓
Data Replication | Async | Sync | Sync + Async
[Architecture diagram]
● Yugabyte Query Layer: YCQL, YSQL
● DocDB Distributed Document Store: Sharding & Load Balancing, Raft Consensus Replication, Distributed Transaction Manager & MVCC
● Document Storage Layer: Custom RocksDB Storage Engine

All Nodes Are Identical
● Microservices platform can connect to ANY node
● Add/remove nodes anytime
[Diagram: three identical YugabyteDB nodes, each with a YugabyteDB Query Layer on top of a DocDB Storage Layer]

YugabyteDB Deployed as StatefulSets
[Diagram]
● Admin Clients connect through the yb-masters Headless Service to the yb-master StatefulSet: yb-master-0, yb-master-1, yb-master-2 pods
● App Clients connect through the yb-tservers Headless Service to the yb-tserver StatefulSet: yb-tserver-0, yb-tserver-1, yb-tserver-2, yb-tserver-3, … pods, each hosting tablets
● Pods run on node1, node2, node3, node4, each with a Local/Remote Persistent Volume

Under the Hood – 3 Node Cluster
● YB-Master: Manage shard metadata & coordinate cluster-wide ops
● YB-TServer: Stores/serves data in/from tablets (shards)
● Global Transaction Manager: Tracks ACID txns across multi-row ops, incl. clock skew mgmt.
● DocDB Storage Engine: Purpose-built for ever-growing data, extended from RocksDB
● Raft Consensus Replication: Highly resilient, used for both data replication & leader election
[Diagram]
● node1: yb-master1, yb-tserver1 (tablet1-leader, tablet2-follower, tablet3-follower, …)
● node2: yb-master2, yb-tserver2 (tablet1-follower, tablet2-leader, tablet3-follower, …)
● node3: yb-master3, yb-tserver3 (tablet1-follower, tablet2-follower, tablet3-leader, …)

Deployment Topologies
1. Single Region, Multi-Zone (Availability Zone 1, Availability Zone 2, Availability Zone 3)
   ○ Consistent Across Zones
   ○ No WAN Latency But No Auto Region-Level Failover/Repair
2. Single Cloud, Multi-Region (Region 1, Region 2, Region 3)
   ○ Consistent Across Regions
   ○ Cross-Region WAN Latency with Auto Region-Level Failover/Repair
3. Multi-Cloud, Multi-Region (Cloud 1, Cloud 2, Cloud 3)
   ○ Consistent Across Clouds
   ○ Cross-Cloud WAN Latency with Cloud-Level Failover/Repair

Multi-Master Deployments w/ xCluster Replication
Bidirectional Async Replication between the two clusters
● Master Cluster 1 in Region 1 (Availability Zone 1, Availability Zone 2, Availability Zone 3)
   ○ Consistent Across Zones
   ○ No Cross-Region Latency for Both Writes & Reads
   ○ App Connects to Master Cluster in Region 2 on Failure
● Master Cluster 2 in Region 2 (Availability Zone 1, Availability Zone 2, Availability Zone 3)
   ○ Consistent Across Zones
   ○ No Cross-Region Latency for Both Writes & Reads
   ○ App Connects to Master Cluster in Region 1 on Failure

Deploying Yugabyte on Multi-Region K8s
● Scalable and highly available data tier
● Business continuity
● Geo-partitioning and data compliance

YugabyteDB on K8s Multi-Region Requirements
● Pod to pod communication over TCP ports using RPC calls across n K8s clusters
● Global DNS Resolution system
   ○ Across all the K8s clusters so that pods in one cluster can connect to pods in other clusters
● Ability to create load balancers in each region/DB
● RBAC: ClusterRole and ClusterRoleBinding
● Reference: Deploy YugabyteDB on multi cluster GKE
  https://docs.yugabyte.com/latest/deploy/kubernetes/multi-cluster/gke/helm-chart/

YugabyteDB on K8s Demo – Single YB Universe Deployed on 3 GKE Clusters

YugabyteDB Universe on 3 GKE Clusters
● Deployment: 3 GKE clusters
   ○ Each with 3 x N1 Standard 8 nodes
   ○ 3 pods in each cluster using 4 cores
● Cores: 4 cores per pod
● Memory: 7.5 GB per pod
● Disk: ~500 GB total for universe
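
The demo deployment figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the class and function names are hypothetical, the per-pod figures are taken from the slide, and the replication factor of 3 (one replica per GKE cluster) is an assumption that the slides do not state. The majority rule itself is the standard Raft quorum rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniverseSizing:
    """Sizing figures quoted on the demo slide (names are hypothetical)."""
    clusters: int = 3                # GKE clusters
    pods_per_cluster: int = 3        # pods in each cluster
    cores_per_pod: int = 4           # cores per pod
    memory_gb_per_pod: float = 7.5   # memory per pod, in GB

    @property
    def total_pods(self) -> int:
        return self.clusters * self.pods_per_cluster

    @property
    def total_cores(self) -> int:
        return self.total_pods * self.cores_per_pod

    @property
    def total_memory_gb(self) -> float:
        return self.total_pods * self.memory_gb_per_pod


def raft_majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority still remains."""
    return replicas - raft_majority(replicas)


DEMO = UniverseSizing()

if __name__ == "__main__":
    print(f"pods={DEMO.total_pods} cores={DEMO.total_cores} "
          f"memory_gb={DEMO.total_memory_gb}")
    # ASSUMPTION: replication factor 3, one replica per GKE cluster.
    rf = 3
    print(f"majority={raft_majority(rf)} "
          f"tolerated_failures={tolerated_failures(rf)}")
```

With the slide's figures this works out to 9 pods, 36 cores and 67.5 GB of memory across the universe. Under the assumed replication factor of 3, a majority is 2 replicas, so the universe could lose one cluster and still accept writes.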

Yugabyte Platform Demo

Ensuring High Performance
LOCAL STORAGE | REMOTE STORAGE
Since v1.10 | Most used
Lower latency, higher throughput | Higher latency, lower throughput
Recommended for workloads that do their own replication | Recommended for workloads that do not perform any replication on their own
Pre-provision outside of K8s | Provision dynamically in K8s
Use SSDs for latency-sensitive apps | Use alongside local storage for cost-efficient tiering

Configuring Data Resilience
POD ANTI-AFFINITY | MULTI-ZONE/REGIONAL/MULTI-REGION