DAT373 Data platform engineering: How Vanguard is migrating data to AWS

Rafael Suguiura, Solutions Architect
Donovan Stockton, Platform Owner, Data, Vanguard

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Migrating data to AWS – Lift and shift

[Diagram: lift and shift. Each resource in the corporate data center (two SQL databases, a file share, and tape storage) maps to a like-for-like counterpart in the AWS Cloud.]

Migrating data to AWS

[Diagram: corporate data center resources (SQL databases, file share, tape storage) migrating to a modern data platform in the AWS Cloud.]

Common use cases

Relational
  Characteristics: Referential integrity, ACID transactions, schema-on-write
  Use cases: Lift and shift, ERP, CRM, finance

Key value
  Characteristics: High throughput, low latency reads and writes, endless scale
  Use cases: Real-time bidding, shopping cart, social, product catalog, customer preferences

Document
  Characteristics: Store documents and quickly access querying on any attribute
  Use cases: Content management, personalization, mobile

In memory
  Characteristics: Query by key with microsecond latency
  Use cases: Leaderboards, real-time analytics, caching

Graph
  Characteristics: Quickly and easily create and navigate relationships between data
  Use cases: Fraud detection, social networking, recommendation engine

Time series
  Characteristics: Collect, store, and process data sequenced by time
  Use cases: IoT applications, event tracking

Ledger
  Characteristics: Complete, immutable, and verifiable history of all changes to application data
  Use cases: Systems of record, supply chain, health care, registrations, financial

Purpose-built databases

Relational: Amazon RDS, Amazon Aurora
Key value: Amazon DynamoDB
Document: Amazon DocumentDB
In memory: Amazon ElastiCache
Graph: Amazon Neptune
Time series: Amazon Timestream
Ledger: Amazon QLDB, Amazon Managed Blockchain

Step 2/3

Amazon S3

Secure, highly scalable, durable object storage with millisecond latency for data access

• Store any type of data: websites, mobile apps, corporate applications, and IoT sensors, at any scale
• Store data in the format you want: unstructured (logs, dump files), semi-structured (JSON, XML), structured (CSV, Parquet)
• Storage lifecycle integration: Amazon S3 Standard, Amazon S3 Standard-Infrequent Access, Amazon S3 Glacier

Step 2/3 Amazon Athena

• Interactive query service to analyze data in Amazon S3 using standard SQL
• No infrastructure to set up or manage and no data to load
• Supports multiple data formats: define schema on demand

Step 3/3 AWS data transfer and migration portfolio

Online data transfer
• Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose: load streaming data into AWS
• AWS DataSync: accelerated transfer of active data
• AWS Transfer for SFTP: transfers into Amazon S3

Offline data transfer
• AWS Snowball, AWS Snowmobile: ship static data into and out of Amazon S3
• Snowball Edge: storage and compute in disconnected environments

Hybrid storage
• Storage Gateway: access AWS storage from on-premises

Recap

Step 1: Identify your data requirements

Step 2: Map data requirements to AWS services

Step 3: Select the data transfer tools and methods

Sample migration

[Diagram: sample migration from the corporate data center to the AWS Cloud]
• Customer records → AWS DMS → Aurora (AWS Fargate)
• Inventory, purchases → DynamoDB tables Inventory and Purchases (Lambda)
• Access logs → DataSync → Amazon S3 (Athena)
• Documentation (PDF, images) → Snowball → Amazon S3

Vanguard – Background

• One of the world’s largest investment companies
• Began operations May 1, 1975, in Valley Forge, PA
• Multiple lines of business: Retail, Institutional

Session overview

• Rationale for migration to AWS Cloud
• Formation of a data engineering team
• Define data platform engineering at Vanguard
• Our platform
• Solving high volume and performance replication
• Lessons learned
• Future platform direction

Rationale to migrate to public cloud

Factors           On premises                    Cloud
Database          Monolithic                     Microservice
Database          Shared database dependencies   Bounded context, loose coupling
Database          Limited database choices       Fit for purpose
Core competency   Vanguard = investments         AWS = Cloud
Availability      Limited                        Unlimited
Infrastructure    Fixed capacity                 Elastic
Service           Separate support teams         DevOps
Cost              CapEx                          Pay by the drink

Program objectives

• Read replica for microservices
• Cloud native directly to DBaaS
• Bring data closer to compute and our clients
• Near-real-time CDC-based replication
• DevOps
• Modernize file transfer
• Modernize data lake hydration
• Amazon S3

Three main pillars of cloud data as a service

Data platform engineering is known at Vanguard as “cloud data as a service”

Part of the chief technology office, which is responsible for overall cloud architecture and platform enablement

Service name                            Function
Database as a service (DBaaS)           Persist the data
Data replication as a service (DRaaS)   Replicate using change data capture (CDC)
File transfer as a service (FTaaS)      Upload files easily and at scale to Amazon S3

Building the data engineering team

[Diagram: CDaaS staff at the center, surrounded by the team's skills and responsibilities]
• Educate LOBs
• SRE
• Continuous learning
• Innovation
• Architect/developer
• Java/Python
• CAP theorem/trade-offs
• AWS Cloud dev expertise

Guiding principles

• Database Freedom (move off franchise DBs)
• Cloud DBA and DevOps
• Fully managed > Managed > DIY
• Fit-for-purpose databases
• Automation over clicks
• Engineered for LOBs
• Data infrastructure is not data management

CDaaS platform services

• Build
• Vendor management
• Functional gaps
• Tool selection
• Service enablement
• Architecture patterns
• Audit
• Alert
• Cost management

Getting databases the on-premises way (14–28 days)

DBaaS: Amazon RDS/Aurora (4–5 minutes)

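[Diagram: DBaaS deployment pipeline. In the corporate data center, the DBaaS team keeps the AWS CloudFormation template in a Bitbucket DBaaS repository, and the LOB team keeps its config/parms in its own Bitbucket repository (labeled "SI Amazon RDS repository"). Bamboo deploys through AWS CloudFormation to create Amazon RDS inside a VPC in the AWS Cloud.]

DBaaS Amazon RDS exemplar configuration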

[
  {
    "name": "stack",
    "template": "rds-service/templates/rds_dbaas_postgres.json",
    "stackPrefix": "exemplar-rds-",
    "copyTemplatesToS3": "Y",
    "deleteTemplatesFromS3": "Y",
    "configBucket": "s3://application-payload-${AWS_ACCOUNT}-${AWS_REGION}/exemplar-rds/${STACK_NAME}",
    "templateBucket": "s3://application-payload-${AWS_ACCOUNT}-${AWS_REGION}/exemplar-rds/${STACK_NAME}/cf-templates",
    "files": []
  }
]

DBaaS: Alerting/notification infrastructure
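The exemplar configuration above leaves ${AWS_ACCOUNT}, ${AWS_REGION}, and ${STACK_NAME} to be filled in at deploy time. The deck does not show how the pipeline resolves them; the following is only a minimal sketch, assuming plain ${NAME} substitution. The function name and the sample values are hypothetical.

```python
from string import Template


def resolve_placeholders(value, variables):
    """Substitute ${NAME} placeholders in a config string.

    string.Template uses the same ${NAME} syntax as the exemplar
    configuration. substitute() raises KeyError when a name is missing,
    so an unresolved placeholder fails loudly instead of leaking through.
    """
    return Template(value).substitute(variables)


# Hypothetical values, for illustration only.
variables = {
    "AWS_ACCOUNT": "123456789012",
    "AWS_REGION": "us-east-1",
    "STACK_NAME": "exemplar-rds-demo",
}

config_bucket = resolve_placeholders(
    "s3://application-payload-${AWS_ACCOUNT}-${AWS_REGION}/exemplar-rds/${STACK_NAME}",
    variables,
)
print(config_bucket)
```

Failing on a missing variable is a deliberate choice here: a half-resolved bucket path would otherwise only surface later as a deployment error.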

[Diagram: alerting and notification infrastructure. Within a VPC spanning two Availability Zones, Amazon RDS and Aurora master and replica instances, plus DynamoDB, send events (event-based) and metrics to Amazon CloudWatch for monitoring and alarms, which notifies through Amazon SNS.]

DBaaS: Database alarms

Amazon RDS/Aurora
• CPU Usage >= 70% for 4 datapoints within 20 minutes (warning)
• CPU Usage >= 85% for 4 datapoints (critical)
• Database Connections >= 550 connections for 4 datapoints within 20 minutes (warning)
• Disk Queue Depth >= 255 for 4 datapoints (warning)
• Read IOPs >= 1,000/second for 4 datapoints within 20 minutes (warning)
• Write IOPs >= 1,000/second for 4 datapoints within 20 minutes (warning)
• Read Latency >= 10 seconds for 4 datapoints within 20 minutes (warning)
• Write Latency >= 15 seconds for 4 datapoints within 20 minutes (warning)
• Freeable Memory <= 30% for 4 datapoints within 20 minutes (warning)
• Freeable Memory <= 15% for 4 datapoints within 20 minutes (critical)
• Free Storage Space <= 50% for 4 datapoints within 20 minutes (warning)
• Free Storage Space <= 30% for 4 datapoints within 20 minutes (critical)
• ReplicaLag >= 350 milliseconds for 4 datapoints within 20 minutes (warning)

DynamoDB
• ThrottledRequests >= 4 for 12 datapoints within 60 minutes (warning)
• ReadThrottleEvents >= 4 for 12 datapoints within 60 minutes (warning)
• WriteThrottleEvents >= 4 for 12 datapoints within 60 minutes (warning)
• OnlineIndexThrottleEvent >= 4 for 12 datapoints within 60 minutes (warning)

CDaaS: Cost management architecture
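Returning to the RDS/Aurora thresholds listed above: a rule such as "CPU Usage >= 70% for 4 datapoints within 20 minutes" maps onto CloudWatch alarm parameters fairly directly. The sketch below builds the keyword arguments for the CloudWatch PutMetricAlarm call. It assumes a 5-minute period (so 4 datapoints span 20 minutes) and the Average statistic; the deck states neither, and the function name, alarm naming scheme, and ARN are hypothetical.

```python
def rds_cpu_alarm_params(db_instance_id, threshold_pct, severity, sns_topic_arn):
    """Build PutMetricAlarm keyword arguments for an RDS CPU alarm.

    Assumption: 300-second periods, so 4 evaluation periods cover the
    20-minute window named on the slide.
    """
    return {
        "AlarmName": f"{db_instance_id}-cpu-{severity}",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",          # assumption, not stated in the deck
        "Period": 300,                   # seconds per datapoint
        "EvaluationPeriods": 4,          # look at the last 4 datapoints
        "DatapointsToAlarm": 4,          # all 4 must breach
        "Threshold": float(threshold_pct),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # notify through Amazon SNS
    }


warning = rds_cpu_alarm_params(
    "exemplar-db", 70, "warning",
    "arn:aws:sns:us-east-1:123456789012:dbaas-alerts",  # hypothetical topic
)

# With AWS credentials configured, the dict can be passed straight through:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**warning)
```

Keeping the parameter construction separate from the API call makes the thresholds easy to review and unit test without touching AWS.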

[Diagram: CDaaS cost management architecture. A cost Lambda function uses CloudWatch to watch Aurora, Amazon RDS, DynamoDB, Amazon Kinesis Data Streams, Amazon EC2 for Attunity, and Storage Gateway inside the VPC. Actors shown: CDaaS cost and line of business IT.]

CDaaS cost strategy

Current
• Stop Amazon RDS when no DB connections for four days
• Alert LOB when CPU utilization is under 10% for Amazon RDS
• Downsize WCU and RCU to one when provisioned DynamoDB tables are not in use for two days
• Convert provisioned to on-demand when highly unpredictable spike due to market activities
• Alert LOB when Amazon Kinesis Data Streams are provisioned with high number of shards

Upcoming
• Downsize number of shards for Kinesis Data Streams after initial data load
• Minimize unnecessary number of shards for Kinesis Data Streams based on activity
• Downgrade instance class when Amazon RDS is overprovisioned
• Extend quarantine and eviction to cover PROD

CDaaS cost Lambda – Stop RDS DB instance
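The first rule in the current strategy, stopping Amazon RDS after four days without DB connections, reduces to a small predicate. The deck's own Lambda excerpt elides how idle days are counted, so the following is only a sketch under assumptions: the input is one maximum-connection value per day, oldest first (for example derived from the CloudWatch DatabaseConnections metric), and recently created instances are skipped. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

IDLE_DAYS = 4  # "no DB connections for four days"


def should_stop(daily_max_connections, created_at, now, idle_days=IDLE_DAYS):
    """Return True when an instance has been idle for the whole window.

    daily_max_connections: one value per day, oldest first (assumed shape).
    created_at / now: timezone-aware datetimes.
    """
    # An instance younger than the window cannot have been idle that long.
    if created_at >= now - timedelta(days=idle_days):
        return False
    recent = daily_max_connections[-idle_days:]
    # Require a full window of data, all of it showing zero connections.
    return len(recent) == idle_days and all(count == 0 for count in recent)


now = datetime(2019, 12, 1, tzinfo=timezone.utc)
old = now - timedelta(days=30)

print(should_stop([3, 0, 0, 0, 0], old, now))  # idle for the last 4 days
print(should_stop([0, 0, 0, 1], old, now))     # used yesterday
```

Requiring a full window of datapoints means missing metrics are treated as "do not stop", which errs on the side of leaving a database running.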

def handle_rds(self, service, cw_service):
    """ Stop RDS instances if no connection for 4 days """
    # iterate through RDS instances
    for rds_instance in service.describe_schedulable_instance():
        # check if the RDS instance is used at all for last 4 days
        ...
        if len(dayswoconnection) >= 4 \
                and rds_instance['InstanceCreateTime'] < (
                    end_of_nonusage_window - timedelta(days=4)
                ):
            service.stop_instance(rds_instance)
            instances_effected.append(rds_instance['DBInstanceIdentifier'])
    ...

CDaaS cost Lambda – Alert LOB for low CPU usage

def lambda_handler(event, context):
    """ Alert LOB when average CPU Utilization is lower than 10% for last 2 weeks """
    ...
    # iterate through RDS instances
    for item in metrics_info['MetricsList']:
        my_metrics_info = {}
        response = get_rds_metric(cloudwatch_client, my_metrics_info)
        avg = process_datapoints(response, my_metrics_info)
        metric_name = my_metrics_info['MetricName']
        if metric_name == "CPUUtilization" and avg < 10 and (MyInstanceClass != "db.t2.small"):
            send_email_message(subject, message, support_email, mail_server)
    ...

CDaaS: Rogue database detection
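The low-CPU excerpt above calls process_datapoints without showing it. As a rough sketch of what such a helper might do, the code below averages the "Average" field of the datapoints in a CloudWatch GetMetricStatistics response and applies the 10% threshold and the db.t2.small exemption from the excerpt. The function names are hypothetical, and treating "no datapoints" as "do not alert" is an assumption.

```python
def average_cpu(response):
    """Average the per-period 'Average' values in a GetMetricStatistics response.

    Returns None when there are no datapoints, so callers can tell
    "no data" apart from "0% CPU".
    """
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    return sum(dp["Average"] for dp in datapoints) / len(datapoints)


def is_underused(response, instance_class, threshold=10.0, exempt=("db.t2.small",)):
    """True when average CPU is under the threshold and the class is not exempt."""
    avg = average_cpu(response)
    return avg is not None and avg < threshold and instance_class not in exempt


# Hand-built response in the shape CloudWatch returns (values are made up).
sample = {"Label": "CPUUtilization",
          "Datapoints": [{"Average": 4.0, "Unit": "Percent"},
                         {"Average": 8.0, "Unit": "Percent"}]}

print(average_cpu(sample))                    # 6.0
print(is_underused(sample, "db.m5.large"))    # True
print(is_underused(sample, "db.t2.small"))    # False: already the smallest class in the check
```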

[Diagram: rogue database detection. A compliance Lambda function uses CloudWatch to check Aurora, Amazon RDS, Amazon DynamoDB, and Kinesis Data Streams inside the VPC. Actors shown: CDaaS and line of business IT.]

Program objectives

• Read replica for microservices
• Cloud native directly to DBaaS
• Bring data closer to compute and our clients
• Near-real-time CDC-based replication
• DevOps
• Modernize file transfer
• Modernize data lake hydration
• Amazon S3

Callouts: initial loads; 1–2 seconds latency; reduce daytime MIPS; RTO = 2–3 mins.; RPO = 0; mainframe volumes

DRaaS: Infrastructure automation

[Diagram: DRaaS infrastructure automation. AWS Service Catalog and AWS CloudFormation deploy a stack that runs Attunity Replicate on Amazon EC2 Auto Scaling in an Availability Zone, together with Amazon S3 (import/export), Amazon FSx, AWS Secrets Manager, AWS Directory Service, and CloudWatch monitoring and alarms.]

DRaaS: Availability infrastructure

[Diagram: DRaaS availability. An Attunity primary and an Attunity warm standby run on Amazon EC2 Auto Scaling in separate Availability Zones within a VPC, each with Amazon FSx. CloudWatch events (event-based) trigger Lambda.]

DRaaS: Custom automated task promotion

[Diagram: task promotion. Task files flow from Bitbucket through a Bamboo build and deploy into Amazon S3. In the VPC, a Windows scheduler runs the DRaaS utilities alongside Attunity Replicate (Windows), with CloudWatch and Amazon SNS email notification. The Attunity Replicate UI is also shown.]

DRaaS utilities:
1) Pull task files from Amazon S3
2) Resolve configurations
3) Import tasks into Attunity Replicate
4) Email process summary

DRaaS: Move data to AWS patterns guide

Access pattern-driven architecture

Strategic goals
• Preserve database business logic
• Save on MIPS for read
• Decrease time to update data lake
• Simplify and reduce coding
• Speed of reads

Architecture patterns
• DB2 to cloud relational: Attunity to Postgres Amazon RDS/Aurora
• DB2 to cloud NoSQL: Attunity to Kinesis to DynamoDB
• DB2 to microservices and data lake: Fanout
• DB2 to data lake: Attunity to Amazon S3
• Files to data lake: Storage Gateway file gateway
• MSSQL to cloud relational: DMS to PostgreSQL Amazon RDS/Aurora
• Oracle to relational: DMS to PostgreSQL Amazon RDS/Aurora

DRaaS: On-prem RDBMS → Amazon RDS

[Diagram: Oracle database, MSSQL, and mainframe RDBMS sources in the corporate data center replicate through AWS DMS and Attunity Replicate into Aurora PostgreSQL databases in the VPC, with one database per web service (database per service).]

DRaaS: On-prem RDBMS → DynamoDB

[Diagram: the mainframe RDBMS in the corporate data center replicates through Attunity Replicate to Kinesis. Lambda denormalizes the records to NoSQL and writes them to DynamoDB, with one database per web service.]

DRaaS: Transaction integrity for Kinesis

[Diagram: Attunity Replicate sends PutRecords calls ("Records": [batch of 500]) to a Kinesis shard. A response such as { "FailedRecordCount": 2, "Records": [] } reports a partial failure, so the record sequence may change. Lambda reads from Kinesis and restores transaction consistency before writing to the DynamoDB operational table, using a DynamoDB staging table.]

• Save incomplete UOWs to staging table
• Retrieve and sort with next batch
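The two bullets above describe a reassembly step that can be sketched in a few lines. This is not Vanguard's Lambda code; it is a minimal illustration that groups events by the transactionId, transactionEventCounter, and transactionLastEvent header fields visible in the deck's CDC message. It assumes the counter starts at 1 within each transaction and that changeSequence sorts lexicographically in source order; both are assumptions. An in-memory list stands in for the DynamoDB staging table, and duplicate deliveries are not handled.

```python
from collections import defaultdict


def split_units_of_work(events, staged=None):
    """Merge staged events with a new batch and split them by completeness.

    Returns (complete, incomplete): complete is a list of units of work,
    each sorted by event counter; incomplete is a flat list to stage again.
    """
    by_txn = defaultdict(list)
    for event in list(staged or []) + list(events):
        by_txn[event["headers"]["transactionId"]].append(event)

    complete, incomplete = [], []
    for txn_events in by_txn.values():
        # Restore the order within the transaction.
        txn_events.sort(key=lambda e: e["headers"]["transactionEventCounter"])
        counters = [e["headers"]["transactionEventCounter"] for e in txn_events]
        has_last = txn_events[-1]["headers"]["transactionLastEvent"]
        contiguous = counters == list(range(1, len(counters) + 1))
        if has_last and contiguous:
            complete.append(txn_events)
        else:
            incomplete.extend(txn_events)  # wait for the next batch

    # Apply complete units of work in source order.
    complete.sort(key=lambda uow: uow[0]["headers"]["changeSequence"])
    return complete, incomplete


def make_event(txn, counter, last, seq):
    """Hypothetical helper that builds a minimal event for the example."""
    return {"headers": {"transactionId": txn, "transactionEventCounter": counter,
                        "transactionLastEvent": last, "changeSequence": seq}}


# Batch 1: transaction A arrives out of order, transaction B is incomplete.
batch1 = [make_event("A", 2, True, "002"), make_event("B", 1, False, "003"),
          make_event("A", 1, False, "001")]
done1, staging = split_units_of_work(batch1)

# Batch 2: the rest of B arrives and is merged with the staged event.
done2, staging2 = split_units_of_work([make_event("B", 2, True, "004")], staging)
```

After the first call only transaction A is applied; B's single event waits in staging until the second batch completes it.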

A closer look at a CDC Kinesis message

{
  "magic": "atMSG",
  "type": "DT",
  "headers": null,
  "messageSchemaId": null,
  "messageSchema": null,
  "message": {
    "data": {
      "CLIENT_ID": "123456789",
      "SYS_BGN_TS": "2019-11-28 13:41:35.318646949",
      "SYS_END_TS": "9999-12-30 00:00:00.000000000",
      "TXN_STRT_TS": "2019-08-28 13:41:35.318646949",
      "TABLENAME": "TT_CLIENT_TABLE"
    },
    "beforeData": null,
    "headers": {
      "operation": "INSERT",
      "changeSequence": "20190828174140330000000000000000009",
      "timestamp": "2019-11-28T17:41:40.325",
      "streamPosition": "84;637026109062073520A|00D6A412F3F287F2C6000001",
      "transactionId": "00000000FC52D5F25355000000000400",
      "transactionEventCounter": 3,
      "transactionLastEvent": true
    }
  }
}

Callouts: Kinesis (the message envelope); Table record (the "data" object); Transaction log (the inner "headers" object); Unit of work (transactionId, transactionEventCounter, transactionLastEvent)

DRaaS: Kinesis enhanced fanout

[Diagram: Kinesis enhanced fanout. The mainframe RDBMS in the corporate data center replicates through Attunity Replicate to Kinesis in the VPC. Kinesis fans out to Lambda writing to DynamoDB and to Amazon Kinesis Data Firehose; Aurora PostgreSQL and a target labeled "SL Database" are also shown.]

DRaaS: On-prem RDBMS → Amazon S3 for analytics – Amazon EMR

[Diagram: in the corporate data center, a SQL database replicates through AWS DMS and the mainframe RDBMS through Attunity Replicate into an Amazon S3 staging area in the VPC. Amazon EMR processes the staged data into the Amazon S3 data lake.]

Data validation for data quality

• Spanning separate storage rules out physical data corruption detection
• Logical data checks mean comparing data values across the architecture process components that move or change the data
• Complements the replication alerting and notifications, which are triggered only by fatal errors
• Level 1 – Validate at the replication level using CDC metrics by comparing DB2 CRUD stats with number of Kinesis messages
• Level 2 – Data value comparison across source and target

DRaaS: Data validation
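The Level 1 check above, comparing DB2 CRUD stats with the number of Kinesis messages, amounts to a per-operation count comparison. The sketch below is illustrative only: it assumes the source stats arrive as a dict of change-operation counts and that each message carries message.headers.operation as in the deck's CDC example. The "UPDATE" and "DELETE" values and the function name are assumptions, since the deck shows only "INSERT".

```python
from collections import Counter


def compare_cdc_counts(source_stats, kinesis_messages):
    """Return {operation: (source_count, message_count)} for every mismatch.

    An empty dict means the replication-level counts agree.
    """
    observed = Counter(
        msg["message"]["headers"]["operation"] for msg in kinesis_messages
    )
    mismatches = {}
    for operation in set(source_stats) | set(observed):
        expected = source_stats.get(operation, 0)
        actual = observed.get(operation, 0)
        if expected != actual:
            mismatches[operation] = (expected, actual)
    return mismatches


def make_message(operation):
    """Hypothetical helper that builds a minimal CDC message for the example."""
    return {"message": {"headers": {"operation": operation}}}


# Made-up counts: the source reports 2 inserts and 1 update,
# but only 1 insert and 1 update reached Kinesis.
source_stats = {"INSERT": 2, "UPDATE": 1, "DELETE": 0}
messages = [make_message("INSERT"), make_message("UPDATE")]

print(compare_cdc_counts(source_stats, messages))  # {'INSERT': (2, 1)}
```

A non-empty result would feed the notification step; Level 2 value comparison is a separate, heavier check and is not sketched here.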

[Diagram: data validation. The mainframe RDBMS replicates through Attunity Replicate, Kinesis, and Lambda to DynamoDB. Real-time CDC stats are written to a metrics table; a Lambda compare process checks them against the mainframe and sends notifications.]

FTaaS: File transfer as a service using Storage Gateway and DataSync

[Diagram: FTaaS. Analytics users in the corporate data center send and retrieve SMB and NFS files, encrypted in transit, through Storage Gateway and DataSync into the VPC, where the data lake, Amazon EMR, and a web service use them.]

[Charts: CDaaS adoption, DBaaS adoption, DRaaS adoption]

Lessons learned

• Innovation cycle is very accelerated at AWS
• Cost management is really important to get right
• Vendor roadmaps can be influenced
• Ensure security management is aligned with your roadmap
• Need for upskilling for cloud is imperative
• Fully managed databases turned out to be a fraction of headcount
• Mainframe MIPS for replication have to be managed

Future

• Serverless, serverless, serverless, in that order
• Buffered write architecture
• Data virtualization – Avoid always moving data
• Integrate DBaaS with AWS Service Catalog and ServiceNow
• Test data management for cloud databases
• Integrated build stack
• Multi-Region DRaaS

Diagram labels: DBaaS, App, Amazon Redshift, Neptune, Amazon QLDB, Amazon DocumentDB

Learn databases with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:
• Amazon Aurora
• Amazon DocumentDB
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon Redshift
• Amazon RDS

Validate expertise with the new AWS Certified Database - Specialty beta exam

Visit aws.training

Thank you!