Modernize Your Data Warehouse
Isabel Huerga Ayza
Senior Developer Advocate
© 2020, Amazon Web Services, Inc. or its Affiliates.

Agenda
• Cloud data warehouses
• Amazon Redshift architecture and features
• Accelerating your data warehouse migration
• Demo

Benefits of a cloud data warehouse
• Get insights from all your data
• Scale, elasticity, and flexibility
• Increases in productivity
• No infrastructure costs and pay-as-you-go

JustGiving Supports 24 Million Users on Charity Site Using AWS
• Needed a new platform to support general operations and a new analytics service
• Moved to AWS, using a wide range of services
• Can scale system faster in response to unanticipated spikes in traffic
• Receives query results in seconds compared to 30 minutes under old system
• Obtains deeper insights into billions of data points, using information to deliver better services
"Using the new AWS tools, we can extract much finer-grained data points based on millions of donations and billions of visits, and then use that information to provide a better platform for our visitors."
Richard Atkinson, Chief Information Officer, JustGiving
JustGiving is a major online platform that supports charitable giving. The organization is based in London.

Amazon Redshift Benefits
• Integrated catalog & security
• Exabyte data lake querying
• Massively parallel processing
• Columnar data storage
• Result caching
• Usage-based pricing
• Predictable costs
• Virtually unlimited elastic linear scaling
• AWS-grade security
• Certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA
• Easy to provision & manage
• Automated administrative tasks

Customers moving to data lake architectures
Amazon Redshift enables you to have a lake house approach
[Diagram: Data warehouse (business data), Amazon Redshift, Data lake (event data)]

Amazon Redshift federated query
• Queries on Amazon RDS and Amazon Aurora PostgreSQL databases
• Analytics on live data without data movement
• Unified analytics across data warehouse, data lake & operational databases
• Flexible and easy way to ingest data
• Performant and secure access to data
[Diagram: JDBC/ODBC client connection]

Redshift Cluster Architecture
• Leader node
    • SQL endpoint
    • Stores metadata
    • Coordinates parallel SQL processing & ML optimizations
    • Leader node is free with 2+ nodes
• Compute nodes
    • Local, columnar storage
    • Executes queries in parallel
    • Load, unload, backup, restore from S3
• Amazon Redshift Spectrum nodes
    • Execute queries directly against data lake
[Diagram: SQL Clients / BI Tools connect over JDBC/ODBC to the leader node; compute nodes load, unload, back up, and restore against Amazon S3; Redshift Spectrum nodes 1 to N query Amazon S3 (exabyte-scale object storage)]

Redshift Instance Types
• Amazon Redshift analytics: RA3 (new)
    • Solid-state disks + Amazon S3
    • Amazon Redshift Managed Storage (RMS)
• Dense compute: DC2
    • Solid-state disks
• Dense storage: DS2
    • Magnetic disks
A Redshift cluster can have up to 128 ds2.8xlarge or RA3.16xlarge nodes (i.e. 2 PB or 8 PB of local or managed storage, respectively) & can support EBs of data with its Redshift Lakehouse feature

Instance type   Disk type   Size              Memory   # CPUs   # Slices
RA3 4xlarge     RMS         Scales to 64 TB   96 GB    12       4
RA3 16xlarge    RMS         Scales to 64 TB   384 GB   48       16
DC2 large       SSD         160 GB            16 GB    2        2
DC2 8xlarge     SSD         2.56 TB           244 GB   32       16
DS2 xlarge      Magnetic    2 TB              32 GB    4        2
DS2 8xlarge     Magnetic    16 TB             244 GB   36       16
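
The cluster capacity figures quoted on the Redshift Instance Types slide can be checked with a little arithmetic. The sketch below is only an illustration: the per-node storage sizes come from the instance type table and the 128-node limit from the slide text, while the names `NODE_STORAGE_TB`, `MAX_NODES`, and `max_cluster_storage_pb` are made up for this example. It assumes 1 PB = 1024 TB, which is what makes the slide's 2 PB and 8 PB come out as round numbers.

```python
# Per-node storage in TB, taken from the instance type table.
NODE_STORAGE_TB = {
    "ds2.8xlarge": 16,   # magnetic disks, local storage
    "ra3.16xlarge": 64,  # Redshift Managed Storage, scales up to this size
}

MAX_NODES = 128   # cluster node limit quoted on the slide
TB_PER_PB = 1024  # assumption: binary units


def max_cluster_storage_pb(node_type: str, nodes: int = MAX_NODES) -> float:
    """Return the maximum cluster storage in PB for the given node type."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_NODES}")
    return NODE_STORAGE_TB[node_type] * nodes / TB_PER_PB


for node_type in NODE_STORAGE_TB:
    print(f"{node_type}: {max_cluster_storage_pb(node_type):g} PB")
# ds2.8xlarge: 2 PB
# ra3.16xlarge: 8 PB
```

In decimal units (1 PB = 1000 TB) the same 128-node RA3 cluster works out to 8.192 PB, i.e. roughly 8.2 PB.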

Evolving Architecture RA3 Managed Storage
Amazon Redshift Managed Storage
• Pay separately for storage and compute
• Large high-speed SSD backed cache
• Automatic scaling (up to 64 TB/instance)
• Supports up to 8.2 PB of cluster storage
[Diagram: SQL Clients / BI Tools connect over JDBC/ODBC to the leader node; compute nodes sit on top of Amazon Redshift Managed Storage (exabyte-scale object storage)]

Evolution – shared nothing to disaggregated compute
• Local storage has enabled the fastest cloud-based DWs
• Shared storage enables flexibility at the cost of performance
• What if we could get the benefits of both without a network performance penalty?
[Diagram: four compute nodes]

AQUA: Advanced Query Accelerator
Preview!
• New distributed & hardware-accelerated processing layer
• With AQUA, Amazon Redshift is up to 10x faster than any other cloud data warehouse, at no extra cost
• AQUA nodes with custom AWS-designed analytics processors to make operations (compression, encryption, filtering, and aggregations) faster than traditional CPUs
• Available in Preview with RA3. No code changes required
[Diagram: Redshift compute clusters, AQUA nodes, Amazon Redshift Managed Storage]

Node Scaling
• Modify node type, number of nodes, or both
• Execute immediately or on a schedule
• During resize, the cluster is in read-only mode

Concurrency Scaling
• Scale out to multiple Amazon Redshift clusters from a single endpoint in seconds
• Support virtually unlimited concurrent users and queries while maintaining SLAs
• Per-second billing for additional clusters used
• Free 1-hr. usage per day (free for 97% of clusters)
[Diagram: JDBC/ODBC endpoint in front of an Amazon Redshift cluster plus additional concurrency scaling clusters]

Efficiency with workload management and query priorities
• Auto WLM - Dynamically manages concurrency and memory to optimize throughput and performance
• Auto WLM priorities - Influence workload performance; intelligent algorithms keep low-priority queries running
• SQA (Short Query Accelerator) - Prioritizes selected queries in a dedicated space
• QMR (Query Monitoring Rules) - Define actions based on thresholds
• Efficient sharing of cluster between users & business groups
[Diagram: Workload manager with ETL (Priority: High), BI/Analytics (Priority: Normal), and Data science (Priority: Low); concurrency query slots = Auto; Amazon Redshift cluster with leader node and compute nodes]

Machine learning based automatic optimizations
• Automates table maintenance
• Optimizes for peak performance as data and workloads scale
• Leverages machine learning
• Offers prescriptive recommendations with ability to dynamically apply changes

Accelerating your migration

AWS migration tooling
• AWS Schema Conversion Tool (AWS SCT) converts your commercial database and data warehouse schemas to open-source engines or AWS native services, such as Amazon Aurora and Amazon Redshift
• AWS Database Migration Service (AWS DMS) easily and securely migrates and/or replicates your databases and data warehouses to AWS
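
Returning to the Concurrency Scaling slide: per-second billing with one free hour per day can be sketched as follows. This is a simplified illustration, not AWS's actual billing logic. It assumes the free allowance is exactly 3,600 seconds per day and that unused time does not carry over (the slide does not say either way). The function names and the per-second rate are hypothetical.

```python
from typing import Iterable

FREE_SECONDS_PER_DAY = 3600  # "Free 1-hr. usage per day" from the slide


def billable_seconds(daily_usage_seconds: Iterable[int]) -> int:
    """Sum the concurrency scaling seconds that would be billed.

    Assumption: the free hour applies to each day separately and
    unused free time is lost at the end of the day.
    """
    return sum(max(0, used - FREE_SECONDS_PER_DAY) for used in daily_usage_seconds)


def concurrency_scaling_cost(daily_usage_seconds: Iterable[int],
                             rate_per_second: float) -> float:
    """Cost of the billable seconds at a hypothetical per-second rate."""
    return billable_seconds(daily_usage_seconds) * rate_per_second


# Three days of usage: 20 minutes, exactly 1 hour, and 90 minutes.
usage = [1200, 3600, 5400]
print(billable_seconds(usage))  # 1800 (only the third day exceeds the free hour)
```

Under these assumptions a cluster that never needs more than an hour of concurrency scaling per day is billed nothing, which is the case the slide describes as covering 97% of clusters.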

AWS SCT
The AWS SCT helps automate database schema and code conversion tasks when migrating from source to target database engines
Features
• Create assessment reports for homogeneous/heterogeneous migrations
• Convert database schema
• Convert data warehouse schema
• Convert embedded application code
• Code browser that highlights places where manual edits are required
• Secure connections to your databases with SSL
• Service substitutions/ETL modernization to AWS Glue
• Migrate data to data warehouses using SCT data extractors
• Optimize schemas in Amazon Redshift
[Diagram: Source DB, AWS SCT (Convert), Target DB]

AWS SCT data extractors
Extract data from your data warehouse and migrate to Amazon Redshift
• Extracts data through local migration agents
• Data is optimized for Amazon Redshift and saved in local files
• Files are loaded to an Amazon S3 bucket (through network or AWS Snowball Edge) and then to Amazon Redshift
[Diagram: Source DW (Microsoft SQL Server, Netezza), AWS SCT, Amazon S3 bucket, Amazon Redshift]

AWS DMS
Migrating databases to AWS
• Migrate between on-premises and AWS
• Migrate between databases
• Automated schema conversion
• Data replication for zero downtime migration

Demo

Thank you!
Isabel Huerga Ayza
@isahuerga