DAT373 Data platform engineering: How Vanguard is migrating data to AWS
Rafael Suguiura, Solutions Architect, Amazon Web Services
Donovan Stockton, Platform Owner, Cloud Data as a Service, Vanguard
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Migrating data to AWS – Lift and shift
[Diagram: SQL databases, file shares, and tape storage in the corporate data center are moved as-is to matching SQL databases, file shares, and tape storage in the AWS Cloud.]
Migrating data to AWS
[Diagram: SQL databases, file shares, and tape storage in the corporate data center move into a modern data platform in the AWS Cloud.]
Common use cases
• Relational: referential integrity, ACID transactions, schema-on-write. Use cases: lift and shift, ERP, CRM, finance.
• Key value: high throughput, low-latency reads and writes, endless scale. Use cases: real-time bidding, shopping cart, social, product catalog, customer preferences.
• Document: store documents and quickly access them by querying on any attribute. Use cases: content management, personalization, mobile.
• In memory: query by key with microsecond latency. Use cases: leaderboards, real-time analytics, caching.
• Graph: quickly and easily create and navigate relationships between data. Use cases: fraud detection, social networking, recommendation engine.
• Time series: collect, store, and process data sequenced by time. Use cases: IoT applications, event tracking.
• Ledger: complete, immutable, and verifiable history of all changes to application data. Use cases: systems of record, supply chain, health care, registrations, financial.
Purpose-built databases
• Relational: Amazon RDS, Amazon Aurora
• Key value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In memory: Amazon ElastiCache
• Graph: Amazon Neptune
• Time series: Amazon Timestream
• Ledger: Amazon QLDB, Amazon Managed Blockchain
Step 2/3 Amazon S3
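The storage lifecycle integration listed on this slide is configured per bucket as lifecycle rules. A minimal sketch, assuming a hypothetical `access-logs/` prefix and illustrative 30- and 90-day thresholds (neither comes from the talk), that builds the rule set boto3's `put_bucket_lifecycle_configuration` accepts:

```python
# Sketch only: the prefix and the 30/90-day thresholds are illustrative
# assumptions, not values from the talk.


def build_lifecycle_configuration(prefix: str, ia_days: int = 30,
                                  glacier_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration that moves objects under `prefix`
    from S3 Standard to S3 Standard-IA, then to S3 Glacier."""
    if ia_days < 30:
        # S3 does not transition objects younger than 30 days to Standard-IA
        raise ValueError("ia_days must be at least 30")
    if glacier_days <= ia_days:
        # The Glacier transition has to come after the Standard-IA one
        raise ValueError("glacier_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": "tier-" + (prefix.strip("/") or "all"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = build_lifecycle_configuration("access-logs/")
    print(config["Rules"][0]["Transitions"])
    # Applying it needs AWS credentials; the bucket name is hypothetical:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-data-lake-bucket", LifecycleConfiguration=config)
```

Keeping the rule builder separate from the API call makes the tiering policy easy to review and test before it touches a real bucket.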
• Secure, highly scalable, durable object storage with millisecond latency for data access
• Store any type of data from websites, mobile apps, corporate applications, and IoT sensors, at any scale
• Store data in the format you want: unstructured (logs, dump files), semi-structured (JSON, XML), structured (CSV, Parquet)
• Storage lifecycle integration: Amazon S3 Standard, Amazon S3 Standard-Infrequent Access, Amazon S3 Glacier
Step 2/3 Amazon Athena
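Because Athena reads data in place and applies a schema at query time, running a query needs only a SQL string, a database, and a location for results. A minimal sketch that builds the arguments for boto3's `athena.start_query_execution`; the `weblogs` database, `access_logs` table, `status` column, and result bucket are hypothetical:

```python
# Sketch only: database, table, column, and result bucket are hypothetical.


def build_athena_query(database: str, table: str, status_code: int,
                       output_s3: str) -> dict:
    """Return keyword arguments for boto3's athena.start_query_execution.

    Athena applies the table's schema when the query runs; no data is
    loaded first. `table` is assumed to be defined over files in Amazon S3.
    """
    query = (f"SELECT COUNT(*) AS hits FROM {table} "
             f"WHERE status = {int(status_code)}")
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


if __name__ == "__main__":
    kwargs = build_athena_query("weblogs", "access_logs", 404,
                                "s3://example-athena-results/")
    print(kwargs["QueryString"])
    # With AWS credentials:
    # import boto3
    # query_id = boto3.client("athena").start_query_execution(**kwargs)["QueryExecutionId"]
```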
• Interactive query service to analyze data in Amazon S3 using standard SQL
• No infrastructure to set up or manage and no data to load
• Supports multiple data formats; define schema on demand
Step 3/3 AWS data transfer and migration portfolio
Online data transfer:
• Load streaming data into AWS: Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose
• Accelerated transfer of active data: AWS DataSync
• SFTP transfers into and out of Amazon S3: AWS Transfer for SFTP
Offline data transfer:
• Ship static data into Amazon S3: AWS Snowball, AWS Snowmobile
• Storage and compute in disconnected environments: AWS Snowball Edge
Hybrid storage:
• Access AWS storage from on-premises: AWS Storage Gateway
Recap
Step 1: Identify your data requirements
Step 2: Map data requirements to AWS services
Step 3: Select the data transfer tools and methods
Sample migration
[Diagram: Sample migration from the corporate data center to the AWS Cloud. Customer records and inventory and purchases data move through AWS DMS, landing in Aurora (alongside AWS Fargate) and in DynamoDB Inventory and Purchases tables (alongside Lambda); access logs move through DataSync to Amazon S3 and are queried with Athena; documentation (PDF, images) ships on Snowball to Amazon S3.]
Vanguard – Background
• One of the world’s largest investment companies
• Began operations May 1, 1975, in Valley Forge, PA
• Multiple lines of business: Retail, Institutional
Session overview
• Rationale for migration to the AWS Cloud
• Formation of a data engineering team
• Defining data platform engineering at Vanguard
• Our cloud database platform
• Solving high-volume, high-performance replication
• Lessons learned
• Future platform direction
Rationale to migrate to public cloud
Factor: on premises vs. cloud
• Database: monolithic vs. microservice
• Database: shared database dependencies vs. bounded context, loose coupling
• Database: limited database choices vs. fit for purpose
• Core competency: Vanguard = investments vs. AWS = cloud
• Availability: limited vs. unlimited
• Infrastructure: fixed capacity vs. elastic
• Service: separate support teams vs. DevOps
• Cost: CapEx vs. pay by the drink
Program objectives
• Cloud native directly to DBaaS
• Read replica for microservices
• Near-real-time, CDC-based replication
• Bring data closer to compute and our clients
• DevOps
• Modernize file transfer
• Modernize data lake hydration (Amazon S3)
Three main pillars of cloud data as a service
Data platform engineering is known at Vanguard as “cloud data as a service” (CDaaS)
Part of the chief technology office, which is responsible for overall cloud architecture and platform enablement
• Database as a service (DBaaS): persist the data
• Data replication as a service (DRaaS): replicate using change data capture (CDC)
• File transfer as a service (FTaaS): upload files easily and at scale to Amazon S3
Building the data engineering team
[Diagram: The CDaaS staff combines architect and Java/Python developer skills, SRE, AWS Cloud development expertise, and an understanding of CAP theorem trade-offs, with a focus on educating LOBs, continuous learning, and innovation.]
Guiding principles
• Database freedom (move off franchise DBs)
• Cloud DBA and DevOps
• Fully managed > managed > DIY
• Fit-for-purpose databases
• Automation over clicks
• Engineered for LOBs
• Data infrastructure is not data management
CDaaS platform services
• Build
• Functional gaps
• Vendor management
• Tool selection
• Service enablement
• Architecture patterns
• Audit
• Alert
• Cost management
Getting databases the on-premises way (14–28 days) vs. DBaaS: Amazon RDS/Aurora (4–5 minutes)
[Diagram: The DBaaS team keeps the AWS CloudFormation template in its Bitbucket (DBaaS) repository; the LOB team keeps its config/parms in its own Bitbucket (SI) repository. Bamboo deploys them through AWS CloudFormation, which creates Amazon RDS in a VPC in the AWS Cloud.]
DBaaS Amazon RDS exemplar configuration
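The exemplar configuration on this slide uses `${AWS_ACCOUNT}`, `${AWS_REGION}`, and `${STACK_NAME}` placeholders. How the Bamboo pipeline resolves them is not shown; a minimal sketch, assuming plain shell-style substitution with Python's `string.Template` (the account, region, and stack values are hypothetical):

```python
import json
from string import Template

# Sketch only: assumes placeholders are resolved by simple ${NAME} substitution.


def resolve_config(raw_json: str, values: dict) -> list:
    """Substitute ${NAME} placeholders in a DBaaS-style config and parse it.

    Template.substitute raises KeyError for any placeholder without a value,
    so an unresolved name fails here instead of reaching AWS CloudFormation.
    """
    return json.loads(Template(raw_json).substitute(values))


EXAMPLE = """[{"name": "stack",
  "stackPrefix": "exemplar-rds-",
  "configBucket": "s3://application-payload-${AWS_ACCOUNT}-${AWS_REGION}/exemplar-rds/${STACK_NAME}"}]"""

if __name__ == "__main__":
    stacks = resolve_config(
        EXAMPLE,
        {"AWS_ACCOUNT": "111122223333", "AWS_REGION": "us-east-1",
         "STACK_NAME": "exemplar-rds-demo"},
    )
    print(stacks[0]["configBucket"])
```

Failing loudly on a missing value is the main design choice here; `Template.safe_substitute` would instead leave the placeholder in place.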
[
  {
    "name": "stack",
    "template": "rds-service/templates/rds_dbaas_postgres.json",
    "stackPrefix": "exemplar-rds-",
    "copyTemplatesToS3": "Y",
    "deleteTemplatesFromS3": "Y",
    "configBucket": "s3://application-payload-${AWS_ACCOUNT}-${AWS_REGION}/exemplar-rds/${STACK_NAME}",
    "templateBucket": "s3://application-payload-${AWS_ACCOUNT}-${AWS_REGION}/exemplar-rds/${STACK_NAME}/cf-templates",
    "files": []
  }
]
DBaaS: Alerting/notification infrastructure
[Diagram: In a VPC spanning two Availability Zones, Amazon RDS and Aurora masters and replicas, and DynamoDB, are monitored by Amazon CloudWatch alarms and event-based rules, which send notifications through Amazon SNS.]
DBaaS: Database alarms
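Each alarm on this slide is phrased as "N datapoints within M minutes", which maps onto CloudWatch's N-out-of-N alarm settings. A minimal sketch, assuming 5-minute periods (so 4 datapoints span 20 minutes) and a hypothetical SNS topic ARN and alarm-naming scheme, that builds arguments for boto3's `cloudwatch.put_metric_alarm`:

```python
# Sketch only: the period choice, alarm naming, and SNS topic are assumptions.


def rds_alarm_kwargs(db_instance_id: str, metric: str, threshold: float,
                     severity: str, sns_topic_arn: str,
                     datapoints: int = 4, window_minutes: int = 20,
                     comparison: str = "GreaterThanOrEqualToThreshold") -> dict:
    """Build arguments for boto3's cloudwatch.put_metric_alarm.

    "N datapoints within M minutes" becomes N equal periods covering M
    minutes, all of which must breach (an N-out-of-N alarm).
    """
    window_seconds = window_minutes * 60
    period_seconds = window_seconds // datapoints
    # Require whole-minute periods that split the window exactly
    if window_seconds % datapoints or period_seconds < 60 or period_seconds % 60:
        raise ValueError("window must split into whole-minute periods")
    return {
        "AlarmName": f"{db_instance_id}-{metric}-{severity}",
        "Namespace": "AWS/RDS",
        "MetricName": metric,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": datapoints,
        "DatapointsToAlarm": datapoints,
        "Threshold": float(threshold),
        "ComparisonOperator": comparison,
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    kwargs = rds_alarm_kwargs("exemplar-db", "CPUUtilization", 70, "warning",
                              "arn:aws:sns:us-east-1:111122223333:dbaas-alerts")
    print(kwargs["Period"], kwargs["EvaluationPeriods"])
    # With AWS credentials: boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```

Metrics such as FreeableMemory and FreeStorageSpace are reported in bytes, so the percentage thresholds would first need converting for the instance size and pairing with `LessThanOrEqualToThreshold`; DynamoDB alarms would use the `AWS/DynamoDB` namespace with a `TableName` dimension.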
Amazon RDS/Aurora:
• CPU Usage >= 70% for 4 datapoints within 20 minutes (warning)
• CPU Usage >= 85% for 4 datapoints (critical)
• Database Connections >= 550 connections for 4 datapoints within 20 minutes (warning)
• Disk Queue Depth >= 255 for 4 datapoints (warning)
• Read IOPS >= 1,000/second for 4 datapoints within 20 minutes (warning)
• Write IOPS >= 1,000/second for 4 datapoints within 20 minutes (warning)
• Read Latency >= 10 seconds for 4 datapoints within 20 minutes (warning)
• Write Latency >= 15 seconds for 4 datapoints within 20 minutes (warning)
• Freeable Memory <= 30% for 4 datapoints within 20 minutes (warning)
• Freeable Memory <= 15% for 4 datapoints within 20 minutes (critical)
• Free Storage Space <= 50% for 4 datapoints within 20 minutes (warning)
• Free Storage Space <= 30% for 4 datapoints within 20 minutes (critical)
• ReplicaLag >= 350 milliseconds for 4 datapoints within 20 minutes (warning)
DynamoDB:
• ThrottledRequests >= 4 for 12 datapoints within 60 minutes (warning)
• ReadThrottleEvents >= 4 for 12 datapoints within 60 minutes (warning)
• WriteThrottleEvents >= 4 for 12 datapoints within 60 minutes (warning)
• OnlineIndexThrottleEvents >= 4 for 12 datapoints within 60 minutes (warning)
CDaaS: Cost management architecture
[Diagram: A CDaaS cost Lambda function uses CloudWatch to watch line-of-business IT resources in the VPC: Aurora, Amazon Kinesis Data Streams, DynamoDB, Amazon RDS, Amazon EC2 (for Attunity), and Storage Gateway.]
CDaaS cost strategy
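One of the current cost measures listed on this slide downsizes idle provisioned DynamoDB tables to one WCU and one RCU. A minimal sketch using boto3's `dynamodb.describe_table` and `update_table`, assuming the caller has already worked out how many days the table has been idle (for example, from CloudWatch consumed-capacity metrics); the idle-day input and two-day default are assumptions based on the slide's wording:

```python
# Sketch only: how `idle_days` is measured is an assumption left to the caller.


def downsize_idle_table(dynamodb_client, table_name: str, idle_days: int,
                        threshold_days: int = 2) -> bool:
    """Set an idle provisioned table to 1 RCU / 1 WCU.

    Returns True when an update was issued. On-demand tables, and tables
    already at 1/1, are left alone.
    """
    if idle_days < threshold_days:
        return False
    table = dynamodb_client.describe_table(TableName=table_name)["Table"]
    # BillingModeSummary can be absent for tables that were always provisioned
    billing = table.get("BillingModeSummary", {}).get("BillingMode", "PROVISIONED")
    if billing != "PROVISIONED":
        return False
    current = table["ProvisionedThroughput"]
    if current["ReadCapacityUnits"] == 1 and current["WriteCapacityUnits"] == 1:
        return False
    dynamodb_client.update_table(
        TableName=table_name,
        ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    )
    return True
```

Passing the client in, rather than creating it inside, keeps the rule testable with a stub and lets one Lambda invocation reuse a single boto3 client across tables.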
Current:
• Stop Amazon RDS when there have been no DB connections for four days
• Alert the LOB when CPU utilization is under 10% for Amazon RDS
• Downsize WCU and RCU to one when provisioned DynamoDB tables are not in use for two days
• Convert provisioned to on-demand when there are highly unpredictable spikes due to market activities
• Alert the LOB when Amazon Kinesis Data Streams are provisioned with a high number of shards
Upcoming:
• Downsize the number of shards for Kinesis Data Streams after the initial data load
• Minimize unnecessary shards for Kinesis Data Streams based on activity
• Downgrade the instance class when Amazon RDS is overprovisioned
• Extend quarantine and eviction to cover PROD
CDaaS cost Lambda – Stop RDS DB instance
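The handler on this slide elides how `dayswoconnection` is computed. One possible approach, sketched here with hypothetical helper names that are not Vanguard's code, counts the days whose maximum `DatabaseConnections` datapoint was zero and then stops the instance with boto3's `rds.stop_db_instance`. It assumes the datapoints come from CloudWatch `get_metric_statistics` with a one-day period and cover only the recent window:

```python
from datetime import datetime, timedelta

# Hypothetical helpers, not Vanguard's code. `datapoints` is assumed to be the
# "Datapoints" list from cloudwatch.get_metric_statistics for DatabaseConnections
# with Period=86400 and Statistics=["Maximum"], covering only the recent window.


def idle_days(datapoints: list) -> list:
    """Timestamps of the days whose maximum connection count was zero."""
    return sorted(p["Timestamp"] for p in datapoints if p["Maximum"] == 0)


def maybe_stop(rds_client, instance: dict, datapoints: list,
               now: datetime, window_days: int = 4) -> bool:
    """Stop `instance` when it had no connections on `window_days` days and
    was created before the window, so brand-new databases are left alone."""
    if instance.get("DBInstanceStatus") != "available":
        return False  # only running ("available") instances can be stopped
    created_before_window = (
        instance["InstanceCreateTime"] < now - timedelta(days=window_days))
    if len(idle_days(datapoints)) >= window_days and created_before_window:
        rds_client.stop_db_instance(
            DBInstanceIdentifier=instance["DBInstanceIdentifier"])
        return True
    return False
```

The creation-time check mirrors the `InstanceCreateTime` condition in the slide's code: a database created yesterday has no connection history yet and should not count as idle.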
from datetime import timedelta

def handle_rds(self, service, cw_service):
    """Stop RDS instances that have had no connections for 4 days."""
    instances_affected = []
    # Iterate through the RDS instances that are eligible for scheduling
    for rds_instance in service.describe_schedulable_instance():
        # Check whether the RDS instance was used at all in the last 4 days
        ...
        # Stop only instances idle for 4+ days that also existed before the window
        if len(dayswoconnection) >= 4 \
                and rds_instance['InstanceCreateTime'] < (
                    end_of_nonusage_window - timedelta(days=4)):
            service.stop_instance(rds_instance)
            instances_affected.append(rds_instance['DBInstanceIdentifier'])
    ...
CDaaS cost Lambda – Alert LOB for low CPU usage
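The handler on this slide calls `get_rds_metric` and `process_datapoints`, which are not shown. A minimal sketch of what such helpers could look like, assuming CloudWatch `get_metric_statistics` with hourly averages over the two-week window; the function names here are hypothetical stand-ins, not Vanguard's code:

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical stand-ins for get_rds_metric / process_datapoints; the hourly
# period and 14-day window are assumptions based on the "last 2 weeks" docstring.


def cpu_stats_request(instance_id: str, end: datetime, days: int = 14) -> dict:
    """Arguments for cloudwatch.get_metric_statistics over the last `days` days."""
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 3600,  # 14 days of hourly points = 336, under the 1,440 limit
        "Statistics": ["Average"],
    }


def average_of(response: dict) -> Optional[float]:
    """Mean of the hourly "Average" values in a get_metric_statistics response."""
    values = [p["Average"] for p in response.get("Datapoints", [])]
    return sum(values) / len(values) if values else None


def is_underused(avg: Optional[float], instance_class: str,
                 threshold: float = 10.0) -> bool:
    """Mirror the slide's rule, including its db.t2.small exclusion."""
    return avg is not None and avg < threshold and instance_class != "db.t2.small"
```

In a Lambda function these would combine as `is_underused(average_of(cloudwatch.get_metric_statistics(**cpu_stats_request(...))), instance_class)`, with `None` covering instances that reported no datapoints.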
def lambda_handler(event, context):
    """Alert the LOB when average CPU utilization is below 10% for the last 2 weeks."""
    ...
    # Iterate through the CloudWatch metrics listed for the RDS instances
    for item in metrics_info['MetricsList']:
        my_metrics_info = {}
        # my_metrics_info is presumably filled in by the helpers (not shown)
        response = get_rds_metric(cloudwatch_client, my_metrics_info)
        avg = process_datapoints(response, my_metrics_info)
        metric_name = my_metrics_info['MetricName']
        # db.t2.small instances are excluded from the alert
        if (metric_name == "CPUUtilization" and avg < 10
                and MyInstanceClass != "db.t2.small"):
            send_email_message(subject, message, support_email, mail_server)
    ...
CDaaS: Rogue database detection
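This slide shows a compliance Lambda function scanning line-of-business accounts but does not say how a database is judged rogue. One plausible rule, assumed here purely for illustration, is that every database created through DBaaS carries a provisioning tag, so anything without it is flagged. The tag key `provisioned-by` and value `dbaas` are hypothetical:

```python
# Sketch only: the tag-based rule and the tag key/value are assumptions.


def find_rogue_instances(db_instances: list, tag_key: str = "provisioned-by",
                         tag_value: str = "dbaas") -> list:
    """Return identifiers of RDS instances missing the DBaaS provisioning tag.

    `db_instances` is shaped like the "DBInstances" list returned by
    rds.describe_db_instances, where each entry may include a "TagList".
    """
    rogue = []
    for instance in db_instances:
        tags = {t["Key"]: t["Value"] for t in instance.get("TagList", [])}
        if tags.get(tag_key) != tag_value:
            rogue.append(instance["DBInstanceIdentifier"])
    return rogue


# With AWS credentials, across all pages of results:
# import boto3
# paginator = boto3.client("rds").get_paginator("describe_db_instances")
# rogue = [i for page in paginator.paginate()
#          for i in find_rogue_instances(page["DBInstances"])]
```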
[Diagram: A CDaaS Lambda function for compliance uses CloudWatch to scan line-of-business IT resources in the VPC (Aurora, Kinesis Data Streams, SQL databases, Amazon RDS, and Amazon DynamoDB) to detect rogue databases.]
Program objectives
• Cloud native directly to DBaaS
• Read replica for microservices
• Near-real-time, CDC-based replication
• Bring data closer to compute and our clients
• DevOps
• Modernize file transfer
• Modernize data lake hydration (Amazon S3)
Targets called out on the slide: initial loads, mainframe volumes, 1–2 seconds latency, reduced daytime MIPS, RTO = 2–3 minutes, RPO = 0.
DRaaS: Infrastructure automation
[Diagram: In an Availability Zone, AWS Service Catalog and AWS CloudFormation deploy the DRaaS stack: Attunity Replicate on Amazon EC2 Auto Scaling, with Amazon S3 for task import/export, AWS Secrets Manager, AWS Directory Service, Amazon FSx, and CloudWatch monitoring alarms.]
DRaaS: Availability infrastructure
[Diagram: Across two Availability Zones in a VPC, an Attunity primary and an Attunity warm standby each run in an Amazon EC2 Auto Scaling group with Amazon FSx storage; event-based CloudWatch rules trigger a Lambda function.]
DRaaS: Custom automated task promotion
[Diagram: Task definitions are built from Bitbucket and deployed by Bamboo to Amazon S3. On the Attunity Replicate (Windows) server in the AWS Cloud VPC, DRaaS utilities run from the Windows scheduler: 1) pull task files from Amazon S3, 2) resolve configurations, 3) import tasks into Attunity Replicate, 4) email a process summary through Amazon SNS. The Attunity Replicate UI and CloudWatch are also shown.]
DRaaS: Move data to AWS patterns guide
Access pattern-driven architecture
Strategic goals: preserve database business logic; save on MIPS for reads; decrease time to update the data lake; simplify and reduce coding; speed of reads.
Architecture patterns:
• DB2 to microservices and data lake: fanout
• DB2 to cloud relational: Attunity to Postgres (Amazon RDS/Aurora)
• DB2 to cloud NoSQL: Attunity to Kinesis to DynamoDB
• DB2 to data lake: Attunity to Amazon S3
• Files to data lake: Storage Gateway file gateway
• MSSQL to cloud relational: DMS to PostgreSQL (Amazon RDS/Aurora)
• Oracle to relational: DMS to PostgreSQL (Amazon RDS/Aurora)
DRaaS: On-prem RDBMS → Amazon RDS
[Diagram: Database per service. An on-premises Oracle database replicates through AWS DMS to Aurora PostgreSQL, and a mainframe RDBMS replicates through Attunity Replicate to Aurora PostgreSQL; each Aurora database backs a web service in the AWS Cloud VPC. An MSSQL source is also shown.]
DRaaS: On-prem RDBMS → DynamoDB
[Diagram: Denormalize to NoSQL, database per service. A mainframe RDBMS replicates through Attunity Replicate into Kinesis; a Lambda function writes the denormalized data to DynamoDB, which backs a web service in the AWS Cloud VPC.]
DRaaS: Transaction integrity for Kinesis
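[Diagram: Attunity Replicate sends changes to a Kinesis shard with PutRecords in batches of 500 records ("Records": [batch of 500]). A partial failure returns a response such as { "FailedRecordCount": 2, "Records": [...] }; because the failed records are resent, the record sequence in the shard may change. A Lambda function reading the shard uses a DynamoDB staging table to restore transaction consistency before writing to the DynamoDB operational table.]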
• Save incomplete units of work (UOWs) to a staging table
• Retrieve them and sort them with the next batch
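The two steps above can be sketched as a small buffer keyed by transaction ID. This is a minimal sketch, assuming (the slide does not say so) that `transactionEventCounter` is the 1-based position of an event within its transaction and that `transactionLastEvent` marks the final one, as in the sample CDC message; an in-memory dict stands in for the DynamoDB staging table:

```python
from collections import defaultdict

# Sketch only. Assumptions (not stated on the slide): transactionEventCounter
# is the 1-based position of an event in its transaction, transactionLastEvent
# marks the final event, and changeSequence values are fixed-width digit
# strings, so string order is log order. A dict stands in for the DynamoDB
# staging table.


def split_complete_uows(batch: list, staging: dict) -> list:
    """Release complete units of work; park incomplete ones in `staging`.

    Returns complete transactions (each a list of messages sorted by
    changeSequence), ordered by the changeSequence of their first event.
    """
    pending = defaultdict(list)
    for txn_id, msgs in staging.items():  # carry-over from earlier batches
        pending[txn_id].extend(msgs)
    staging.clear()
    for msg in batch:
        pending[msg["headers"]["transactionId"]].append(msg)

    complete = []
    for txn_id, msgs in pending.items():
        # PutRecords retries can reorder records, so restore log order first
        msgs.sort(key=lambda m: m["headers"]["changeSequence"])
        counters = {m["headers"]["transactionEventCounter"] for m in msgs}
        last = [m for m in msgs if m["headers"]["transactionLastEvent"]]
        expected = (set(range(1, last[0]["headers"]["transactionEventCounter"] + 1))
                    if last else None)
        if expected is not None and counters == expected:
            complete.append(msgs)
        else:
            staging[txn_id] = msgs  # incomplete: retry with the next batch
    complete.sort(key=lambda uow: uow[0]["headers"]["changeSequence"])
    return complete
```

Only whole transactions reach the operational table, so readers never see half of a unit of work even when a PutRecords retry splits it across batches.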
A closer look at a CDC Kinesis message
{
  "magic": "atMSG",
  "type": "DT",
  "headers": null,
  "messageSchemaId": null,
  "messageSchema": null,
  "message": {
    "data": {
      "CLIENT_ID": "123456789",
      "SYS_BGN_TS": "2019-11-28 13:41:35.318646949",
      "SYS_END_TS": "9999-12-30 00:00:00.000000000",
      "TXN_STRT_TS": "2019-08-28 13:41:35.318646949",
      "TABLENAME": "TT_CLIENT_TABLE"
    },
    "beforeData": null,
    "headers": {
      "operation": "INSERT",
      "changeSequence": "20190828174140330000000000000000009",
      "timestamp": "2019-11-28T17:41:40.325",
      "streamPosition": "84;637026109062073520A|00D6A412F3F287F2C6000001",
      "transactionId": "00000000FC52D5F25355000000000400",
      "transactionEventCounter": 3,
      "transactionLastEvent": true
    }
  }
}
The "data" block is the table record, the "headers" block inside "message" carries the transaction log details, and transactionEventCounter and transactionLastEvent identify the unit of work.
DRaaS: Kinesis enhanced fanout
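Each consumer of the stream, such as the Lambda function that writes to DynamoDB, receives CDC messages like the one on the previous slide base64-encoded inside the Lambda Kinesis event. A minimal sketch of unpacking them; the field names come from the sample message, while the function name and the choice of fields to keep are assumptions:

```python
import base64
import json

# Sketch only: field names follow the sample CDC message; what to keep is assumed.


def decode_cdc_records(event: dict) -> list:
    """Decode Attunity-style CDC messages from a Lambda Kinesis event.

    Returns one dict per record with the table row, operation, and the
    unit-of-work fields needed to rebuild transactions.
    """
    decoded = []
    for record in event.get("Records", []):
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        message = payload["message"]
        headers = message["headers"]
        decoded.append({
            "table": message["data"].get("TABLENAME"),
            "operation": headers["operation"],
            "row": message["data"],
            "before": message.get("beforeData"),
            "transaction_id": headers["transactionId"],
            "event_counter": headers["transactionEventCounter"],
            "last_event": headers["transactionLastEvent"],
            "change_sequence": headers["changeSequence"],
        })
    return decoded
```

With enhanced fanout, each registered consumer gets its own read throughput from the shard, so the DynamoDB writer and other consumers can run this decoding independently without slowing each other down.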
[Diagram: A mainframe RDBMS replicates through Attunity Replicate into Kinesis, which fans out to several consumers: a Lambda function writing to DynamoDB, Amazon Kinesis Data Firehose delivery streams, and Aurora PostgreSQL.]
DRaaS: On-prem RDBMS → Amazon S3 for analytics – Amazon EMR
[Diagram: A SQL database replicates through AWS DMS, and a mainframe RDBMS through Attunity Replicate, into an Amazon S3 staging area; Amazon EMR processes the staged data into the Amazon S3 data lake.]
Data validation for data quality
• Because the source and target span separate storage, physical data corruption detection does not apply
• Logical data checks compare data values across the architecture components that move or change the data
• These checks complement the replication alerting and notifications, which are triggered only by fatal errors
• Level 1: validate at the replication level using CDC metrics, comparing DB2 CRUD statistics with the number of Kinesis messages
• Level 2: compare data values across source and target
DRaaS: Data validation
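The two validation levels described on the previous slide can be sketched as two comparisons. A minimal sketch, assuming the DB2-side CRUD statistics and the Kinesis-side message counts have already been collected per (table, operation) key (how Vanguard gathers them is not shown), and that Level 2 compares rows by primary key using a hash of their values:

```python
import hashlib
import json

# Sketch only: the count keys and the row-hash comparison are assumptions.


def level1_mismatches(source_counts: dict, kinesis_counts: dict) -> dict:
    """Level 1: compare CRUD counts per (table, operation) key.

    Returns {key: (source, kinesis)} for every key whose counts differ.
    """
    keys = set(source_counts) | set(kinesis_counts)
    return {
        k: (source_counts.get(k, 0), kinesis_counts.get(k, 0))
        for k in keys
        if source_counts.get(k, 0) != kinesis_counts.get(k, 0)
    }


def row_digest(row: dict) -> str:
    """Hash of a row's values that does not depend on column order."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()


def level2_mismatches(source_rows: dict, target_rows: dict) -> list:
    """Level 2: primary keys whose values differ or exist on only one side."""
    keys = set(source_rows) | set(target_rows)
    return sorted(
        k for k in keys
        if k not in source_rows or k not in target_rows
        or row_digest(source_rows[k]) != row_digest(target_rows[k])
    )
```

Level 1 is cheap enough to run continuously against the metrics table; Level 2 touches actual data, so it would more likely run on samples or on a schedule.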
[Diagram: A mainframe RDBMS replicates through Attunity Replicate into Kinesis; a Lambda function writes to DynamoDB and records real-time CDC stats in a metrics table. A compare-process Lambda function checks those stats against the mainframe and sends notifications.]
FTaaS: File transfer as a service using Storage Gateway and DataSync
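For the NFS path shown on this slide, DataSync moves files as a task between a source location and a destination location. A minimal sketch using boto3's DataSync `create_task`, `start_task_execution`, and `describe_task_execution`; the location ARNs are assumed to exist already (created, for example, with `create_location_nfs` and `create_location_s3`), and the task name and polling interval are hypothetical:

```python
import time

# Sketch only: location ARNs are assumed to exist; task name and polling
# interval are hypothetical.

TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def start_nfs_to_s3_transfer(datasync_client, source_location_arn: str,
                             destination_location_arn: str,
                             task_name: str = "ftaas-nfs-to-s3") -> str:
    """Create a DataSync task and start one execution; return its ARN."""
    task_arn = datasync_client.create_task(
        SourceLocationArn=source_location_arn,
        DestinationLocationArn=destination_location_arn,
        Name=task_name,
    )["TaskArn"]
    execution = datasync_client.start_task_execution(TaskArn=task_arn)
    return execution["TaskExecutionArn"]


def wait_for_execution(datasync_client, execution_arn: str,
                       poll_seconds: int = 30, sleep=time.sleep) -> str:
    """Poll until the execution reaches SUCCESS or ERROR; return that status."""
    while True:
        status = datasync_client.describe_task_execution(
            TaskExecutionArn=execution_arn)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

In practice a task would be created once and executed on a schedule; creating it per run just keeps the sketch short.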
[Diagram: Analytics users in the corporate data center send and retrieve SMB files through Storage Gateway, encrypted in transit, into the data lake in the AWS Cloud, where Amazon EMR processes them; NFS files move through DataSync. A web service is also shown.]
[Charts: CDaaS adoption, DBaaS adoption, DRaaS adoption]
Lessons learned
• The innovation cycle at AWS is very fast
• Cost management is really important to get right
• Vendor roadmaps can be influenced
• Ensure security management is aligned with your roadmap
• Upskilling for the cloud is imperative
• Fully managed databases turned out to need only a fraction of the headcount
• Mainframe MIPS used for replication have to be managed
Future
• Serverless, serverless, serverless, in that order
• Buffered write architecture
• Data virtualization: avoid always moving data
• DBaaS: integrate DBaaS with AWS Service Catalog and ServiceNow
• Test data management for cloud databases
• Integrated build stack
• Multi-Region DRaaS
• Amazon Redshift, Amazon Neptune, Amazon QLDB, Amazon DocumentDB
Learn databases with AWS Training and Certification
Resources created by the experts at AWS to help you build and validate database skills
25+ free digital training courses cover topics and services related to databases, including:
• Amazon Aurora
• Amazon Neptune
• Amazon DocumentDB
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon Redshift
• Amazon RDS
Validate expertise with the new AWS Certified Database - Specialty beta exam
Visit aws.training
Thank you!