ANT333 How Woot.com built a serverless data lake with AWS analytics
Theo Carpenter, Sr. Systems Manager, Woot!
Karthik Kumar Odapally, Sr. Solutions Architect, Amazon Web Services
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Introduction
Problem statement
Woot’s solution
Lessons learned
Results
Related breakouts
ARC310 Serverless data lake patterns for voice, vision, and ML
ANT335 Build data analytics stacks with Amazon Redshift, featuring Warner Bros.
ANT334 Migrate your data warehouse to the cloud in record time, featuring Nielsen
ANT204 How Amazon leverages AWS to deliver analytics at enterprise scale
AMZ304 Prime Video: Processing analytics at petabyte scale
ARC214 Data lake DevOps on AWS
What is Woot!?
“One day, one deal”
Problem statement
Add data in minutes
Exponential growth
User configurable
Data now
Tools and log-on
Legacy solution
Single database (DB) instance
Shared resource
Complex custom ingestion
Difficult to use
Separate DB users
Learning curve
Operational cost
Multiple updates and patches
Programmatic reporting
Requirements
Any data, any source
Separation of duties
Data democratization
10,000-ft view
[Architecture diagram. Groups: Woot production VPC, Woot corporate, Woot DW VPC. Components: Woot Web Services, NAV services, SSIS, AWS Lambda, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Firehose, AWS DMS, Amazon EMR, third-party data, Amazon data warehouse (DW), Amazon S3, AWS Glue, AWS Fargate, Amazon Redshift, Amazon Athena, Amazon QuickSight.]
Migrate existing data
[Architecture diagram. Groups: Woot production VPC, Woot corporate. Components: Woot Web Services, NAV services, SSIS, AWS Lambda, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Firehose, AWS DMS, Amazon EMR.]
Building data pipelines
[Architecture diagram. Group: Woot production VPC. Components: Woot Web Services, AWS Lambda, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Firehose, AWS DMS, Amazon EMR.]
Bringing it all together
[Architecture diagram. Group: Woot DW VPC. Components: third-party data, Amazon DW, Amazon Redshift, Amazon Athena, Amazon QuickSight, AWS Glue, AWS Lambda, AWS Fargate.]
The Woot data lake solution
• Amazon Kinesis Data Firehose for data ingestion
• Amazon Simple Storage Service (Amazon S3) for data storage
• AWS Lambda and AWS Glue for data processing
• AWS Database Migration Service (AWS DMS) and AWS Glue for data migration
• AWS Glue for orchestration and metadata management
• Amazon Athena and Amazon QuickSight for querying and for visualizing data
• AWS Directory Service for user identity
GODS architecture
[Architecture diagram. Components: AWS connectors, GODS service, GODS data service, pandas data frames, Amazon CloudWatch, Amazon S3, Amazon Redshift, Amazon DynamoDB, Amazon Athena, Amazon EMR, Amazon EC2, AWS Lambda, AWS Fargate, AWS Secrets Manager, Amazon QuickSight, Amazon DocumentDB, AWS IAM, AWS PrivateLink.]
Job orchestration
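The handler excerpt on this slide reacts to Glue job state-change events and ignores a trigger unless every job in its predicate has succeeded. A minimal sketch of that check follows, under stated assumptions: the function names, the sample job and trigger names, and the in-memory `latest_state` dictionary (standing in for the DynamoDB status table) are all hypothetical and are not taken from Woot's code.

```python
from typing import Dict, Iterable, List, Optional


def succeeded_job_name(event: dict) -> Optional[str]:
    """Return the Glue job name if the event reports a successful run, else None."""
    if event.get("detail-type") != "Glue Job State Change":
        return None
    detail = event.get("detail") or {}
    if detail.get("state") != "SUCCEEDED":
        return None
    return detail.get("jobName")


def trigger_is_ready(predicate_jobs: Iterable[str], latest_state: Dict[str, str]) -> bool:
    """True only when every job in the trigger's predicate last finished as SUCCEEDED."""
    jobs = list(predicate_jobs)
    return bool(jobs) and all(latest_state.get(job) == "SUCCEEDED" for job in jobs)


def ready_triggers(
    job_name: str,
    triggers: Dict[str, List[str]],
    latest_state: Dict[str, str],
) -> List[str]:
    """Names of triggers that include job_name and whose predicate jobs all succeeded."""
    return [
        name
        for name, predicate_jobs in triggers.items()
        if job_name in predicate_jobs and trigger_is_ready(predicate_jobs, latest_state)
    ]


# Hypothetical sample data: one event, two triggers, and the latest known job states.
sample_event = {
    "source": "aws.glue",
    "detail-type": "Glue Job State Change",
    "detail": {"jobName": "load_orders", "state": "SUCCEEDED"},
}
triggers = {
    "build_sales_mart": ["load_orders", "load_customers"],
    "refresh_inventory": ["load_orders"],
}
latest_state = {"load_orders": "SUCCEEDED", "load_customers": "FAILED"}

job = succeeded_job_name(sample_event)
if job is not None:
    print(ready_triggers(job, triggers, latest_state))  # prints ['refresh_inventory']
```

Keeping the "all predicate jobs succeeded" test in a pure function makes it easy to unit test without AWS access. In a real deployment the job states would come from the status table rather than a dictionary.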
# Get status for all jobs from Dynamo
response = table.scan(IndexName='JobStatusEndTime',
                      Select='ALL_PROJECTED_ATTRIBUTES')
# Only want to process job successes
if event.get('detail-type') == 'Glue Job State Change':
    if event.get('detail'):
        job_name = (event['detail']['jobName']
                    if event['detail'].get('state') == 'SUCCEEDED'
                    else None)
# Now that we have a valid event to handle, let's get the triggers
triggers = get_conditional_triggers(job_name)
print("Job name: ", job_name)
# If the job isn't part of any other triggers, get out
...
# Get status and last start time for all jobs in action
for i in triggers:
    # If not all jobs in predicate are successful, ignore trigger
    ...

What’s next?
• AWS Lake Formation
• Multiple environments
• Configuration simplification
• Transactional data
• Incremental data loads
• ETL and view simplification
• More deal evaluation
• Models
• Historical
Lessons learned
Aggregation
Preserve raw data
Service limits
Data quality
Pain points
Wins
Magic features
Performance
AWS integrated
Ease of use
Flexibility
Data points
60 TB vs. 12 TB
40 hours saved weekly
90% operating cost reduction
8 AWS accounts sharing data
~600 million rows
0 screaming Woot monkeys harmed
#Woot #reinvent #AWS #rocks
Thank you!
Learn big data with AWS Training and Certification
Resources created by the experts at AWS to help you build and validate data analytics skills
New free digital course, Data Analytics Fundamentals, introduces Amazon S3, Amazon Kinesis, Amazon EMR, AWS Glue, and Amazon Redshift
Classroom offerings, including Big Data on AWS, feature AWS expert instructors and hands-on labs
Validate expertise with the AWS Certified Big Data - Specialty exam or the new AWS Certified Data Analytics - Specialty beta exam
Visit aws.amazon.com/training/paths-specialty/
Thank you!
Theo Carpenter
Karthik Kumar Odapally