Systems for Cloud Data Analytics
Peter Boncz
SYSTEMS FOR CLOUD DATA ANALYTICS
www.cwi.nl/~boncz/bads

Credits
• David DeWitt & Willis Lang (Microsoft) – cloud DW material
• Stratis Viglas (Google) – extreme computing course (University of Edinburgh)
• Marcin Zukowski (Snowflake)
• Ippokratis Pandis (Amazon Redshift/Spectrum)
• Spark Team
  – Matei Zaharia, Xiangrui Meng (Stanford)
  – Ion Stoica, Xifan Pu (UC Berkeley)
  – Reynold Xin, Alex Behm (Databricks)

Is it safe to have enterprise data in the Cloud?
• 2005: No way! Are you crazy?
• 2012: Don't think so... But wait, we store our email where?
• 2018: Of course!

Getting a database in a cloud
• "Hi! I'm a Data Scientist! I'm looking for a database for our cloud system."
• "Hello! I am your account manager at X! Sure thing! Let's install our product, DBMS X, for you!"
• "Awesome! It seems to work!"
• "Great. Let me send you that invoice!"
• "Just a sec... How much does the storage cost?"
• "Hold on, let me check that."
• "Wait, what? And the system is elastic, right? And I only pay for what I use, right?"
• "Mommy!!!"

Traditional DB systems and the cloud
• Designed for:
  – Small, fixed, optimized clusters of machines
  – Constrained amount of data and resources
• Can be delivered via the Cloud
  – Reduces the complexity of hardware setup and software installation
• But:
  – No elasticity
  – No cheap storage
  – Not designed for the cloud's poor stability
  – Not easy to use
  – Not "always on"
  – ...
Data in the Cloud
• Data traditional DW systems are built for:
  – Predictable, slowly evolving internal data
  – Complex ETL (extract-transform-load) pipelines and physical tuning
  – Limited number of users and use cases
  – OK to cost $100K per TB
• Data in the cloud:
  – Dynamic, external sources: web, logs, mobile devices, sensor data, ...
  – ELT instead of ETL (data transformation inside the system)
  – Often in semi-structured form (JSON, XML, Avro)
  – Access required by many users, with very different use cases
  – 100 TB volumes are common

10,000 ft. view: Complexity vs Cost
Only 2 options 5 years ago!
• Roll-your-own (RYO)
  – Buy & install a cluster of servers
  – Buy, install & configure software (Vertica, Asterdata, Greenplum, ...)
  – High complexity
  – Medium capex and opex
• Buy an appliance (Teradata, Microsoft APS, Netezza)
  – High capex, low opex
  – Low complexity
  – Gold standard for performance
(Chart: complexity (deployment & operational) vs cost (capex + opex), both low to high.)

10,000 ft. view: Complexity vs Cost
• Use a SaaS DW in the cloud (AWS Redshift, MSFT SQL DW, Snowflake, BigQuery)
  – Low complexity
  – No capex, low opex
• Roll-your-own-Cloud (RYOC)
  – Rent a cluster of cloud servers
  – Buy, install & configure software (Spark, Hive, Vertica, Asterdata, Greenplum, ...)
  – Medium to high complexity
  – Low capex, medium opex
(Chart: same axes; CLOUD DW sits at low complexity and low cost, RYOC between RYO and Appliance.)

Scalability and the price of agility
(Chart: time to make an adjustment vs cost of making an adjustment — Appliance: months/high; RYO and RYOC: weeks/medium; CLOUD DW: minutes/low.)

Why Cloud DW?
• No CapEx and low OpEx
• Go from conception to insight in hours
• Rock-bottom storage prices (Azure, AWS S3, GFS)
• Flexibility to scale compute capacity up and down
• Simple upgrade process

Parallel Processing in Analytical DBs
• Alternative architectures:
  – Shared-memory
  – Shared-disk/storage
  – Shared-nothing ("The Case for Shared Nothing," Stonebraker, HPTS '85)
• Partitioned tables
• Partitioned parallelism

Shared-Nothing
• Commodity servers connected via commodity networking
• DB storage is strictly local to each node: compute and storage are co-located
• Nodes 1..K each have CPU, memory, and local disks, linked by an interconnection network
• Design scales extremely well

Shared Disk/Storage
• Commodity servers connected to each other and to storage using commodity networking
• DB is stored on "remote storage" (e.g. a SAN, S3, Azure Storage)
• Local disks are used only for caching DB pages, temp files, ...
• Network can limit scaling, as it must carry I/O traffic

Table Partitioning
• What? Distribute the rows of each table across multiple storage devices
• Why?
  – Spread I/O load
  – Parallel query execution
  – Data lifecycle management
• How? Hash, round-robin, or range partitioning

Shared-Nothing Ex.
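The three partitioning schemes just listed can be sketched in a few lines. This is illustrative Python, not from the slides; the row layout follows the Orders(CID, OID, Item) table used in the shared-nothing example.

```python
import bisect

def hash_partition(rows, key, n_nodes):
    """Assign each row to a node by hashing its key column."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts

def round_robin_partition(rows, n_nodes):
    """Deal rows out to nodes in turn: balances load, ignores values."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts

def range_partition(rows, key, boundaries):
    """Route each row to the range its key value falls into."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect(boundaries, row[key])].append(row)
    return parts

orders = [{"CID": 602, "OID": 10, "Item": "Tivo"},
          {"CID": 933, "OID": 20, "Item": "Surface"},
          {"CID": 752, "OID": 31, "Item": "iPhone"}]
parts = hash_partition(orders, "CID", 2)
# Every row with the same CID lands on the same node, which is what
# makes a "local" join on CID possible.
assert sum(len(p) for p in parts) == len(orders)
```

Hash partitioning on the join key is what enables the co-located join in the following example; round-robin spreads load evenly but supports no such locality.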
Select Name, Item
from Orders O, Customers C
where O.CID = C.ID

• Orders table hash-partitioned on CID; Customers table hash-partitioned on ID
• The join can be done "locally" on each node (JOIN O.CID = C.ID per node): no data movement
• An example of "partitioned parallelism"
(Figure: application sends the query through parser, optimizer, and execution coordinator with catalogs; each node joins its own Orders and Customers partitions.)

Shared-Nothing Ex. (cont.)
Same query, but now the Orders table is hash-partitioned on OID (Customers still on ID)
• The join cannot be done "locally": data movement is needed
• The biggest table (Orders) needs to be shuffled: all-to-all communication

Shared-Storage Ex.
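The contrast between the two shared-nothing cases can be sketched as follows (illustrative Python, not from the slides): when Orders is partitioned on OID, rows with the same CID sit on different nodes, so Orders must first be re-hashed on CID — the all-to-all shuffle — before each node can join locally.

```python
def repartition(parts, key, n_nodes):
    """Shuffle: re-hash every row on `key` -- the all-to-all exchange."""
    out = [[] for _ in range(n_nodes)]
    for part in parts:                                  # every node sends...
        for row in part:
            out[hash(row[key]) % n_nodes].append(row)   # ...to every node
    return out

def local_join(orders_parts, customers_parts):
    """Per-node hash join; correct only if both inputs are
    partitioned on the join key (O.CID = C.ID)."""
    result = []
    for o_part, c_part in zip(orders_parts, customers_parts):
        by_id = {c["ID"]: c for c in c_part}
        for o in o_part:
            if o["CID"] in by_id:
                result.append((by_id[o["CID"]]["Name"], o["Item"]))
    return result

# Orders partitioned on OID: customer 602's two orders end up on
# different nodes, so the join misses rows unless we shuffle first.
orders_by_oid = [[{"CID": 602, "OID": 10, "Item": "Tivo"}],
                 [{"CID": 602, "OID": 11, "Item": "iPod"}]]
customers_by_id = [[{"ID": 602, "Name": "Larry"}], []]
shuffled = repartition(orders_by_oid, "CID", 2)
names_items = local_join(shuffled, customers_by_id)
```

Joining without the shuffle (`local_join(orders_by_oid, customers_by_id)`) silently drops the order on the "wrong" node, which is exactly why the optimizer must insert the exchange.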
Select Name, Item
from Orders O, Customers C
where O.CID = C.ID

• Both tables are remote: the compute nodes read the Orders partitions (hash-partitioned on CID) and Customers partitions (hash-partitioned on ID) over the LAN
(Figure: application, parser, optimizer, and execution coordinator above Nodes 1 and 2, which access the partitioned tables via the network.)

For 30+ years
• Shared-nothing has been the "gold standard"
  – Teradata, Netezza, DB2/PE, SQL Server PDW, ParAccel, Greenplum, ...
• Simplest design
• Excellent scalability
• Minimizes data movement
  – Especially for DBs with a star schema design
• The "cloud" has changed the game
  – shared nothing: 2017

Outline
• Part 1: Intro
• Part 2: Databases in the Cloud
  – Amazon Redshift
  – Snowflake
  – Microsoft Azure Synapse Analytics
  – Google BigQuery
  – Databricks
• Part 3: Cloud Research Challenges

Amazon (AWS) Redshift
• Classic shared-nothing design with locally attached storage
  – Engine is the ParAccel database system (classic MPP, JIT C++)
• Leverages AWS services
  – EC2 compute instances
  – S3 storage system
  – Virtual Private Cloud (VPC)
• Leader in market adoption

A Redshift Instance
• A single leader node (holding the catalogs) plus one or more compute nodes (EC2 instances)
• One slice per core; memory, storage, and data are partitioned among slices
• Hash and round-robin table partitioning
(Figure: leader node above two compute nodes with two slices each, holding a Customers(ID, Name, AmtDue) table hash-partitioned on ID.)

Within a slice
• Columns stored in 1MB blocks
• Min and max value of each block retained in a "zone map"
• Two sort options:
  1) Compound sort key
  2) "Interleaved" sort key (multidimensional sorting)
• Rich collection of compression options (RLE, dictionary, gzip, ...)

Unique Fault Tolerance Approach
• Each 1MB block gets replicated on a different compute node
• And also on S3; S3, in turn, triply replicates each block

Handling Node Failures
• Assume Node 1 fails
• Alternative #1: Node 2 processes the load until Node 1 is restored
• Alternative #2: a new node (Node 3) is instantiated and processes the workload using data in S3, until its local disks are restored

Redshift Summary
• Highly successful cloud SaaS DW service
• Classic shared-nothing design
• Leverages S3 to handle node and disk failures
• Key strength: performance through use of local storage
• Key weakness: compute cannot be scaled independently of storage (and vice versa)

Redshift Spectrum
(Figure-only slides.)

Outline
• Part 1: Intro
• Part 2: Databases in the Cloud
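To make the zone-map idea from the Redshift slides concrete: each column block keeps its min and max value, and a scan can skip any block whose range cannot contain the predicate value. A minimal sketch, with small blocks standing in for Redshift's 1MB blocks (illustrative Python, all names hypothetical):

```python
BLOCK_ROWS = 4  # stand-in for Redshift's 1MB blocks

def build_zone_map(values):
    """Split a column into blocks and record (min, max) per block."""
    blocks = [values[i:i + BLOCK_ROWS]
              for i in range(0, len(values), BLOCK_ROWS)]
    return blocks, [(min(b), max(b)) for b in blocks]

def scan_eq(blocks, zone_map, target):
    """Read only blocks whose zone-map range may contain `target`."""
    hits, blocks_read = [], 0
    for block, (lo, hi) in zip(blocks, zone_map):
        if lo <= target <= hi:          # otherwise: skip the block entirely
            blocks_read += 1
            hits.extend(v for v in block if v == target)
    return hits, blocks_read

ids = sorted([602, 933, 752, 633, 322, 19, 471, 850])  # compound sort key
blocks, zmap = build_zone_map(ids)
hits, read = scan_eq(blocks, zmap, 850)
# With the column sorted, only one of the two blocks needs reading.
```

This also shows why the sort key matters: on an unsorted column the block ranges overlap, the min/max filter rarely excludes anything, and every block must be read.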