ANT333

How Woot.com built a serverless data lake with AWS analytics

Theo Carpenter, Sr. Systems Manager, Woot!

Karthik Kumar Odapally, Sr. Solutions Architect, Amazon Web Services

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda

Introduction

Problem statement

Woot’s solution

Lessons learned

Results

Related breakouts

ARC310 Serverless data lake patterns for voice, vision, and ML

ANT335 Build data analytics stacks with , featuring Warner Bros.

ANT334 Migrate your data warehouse to the cloud in record time, featuring Nielsen

ANT204 How Amazon leverages AWS to deliver analytics at enterprise scale

AMZ304 Prime Video: Processing analytics at petabyte scale

ARC214 Data lake DevOps on AWS

What is Woot!?

“One day, one deal”

Problem statement

Add data in minutes

Exponential growth

User configurable

Data now

Tools and log-on

Legacy solution

Single (DB) instance

Shared resource

Complex custom ingestion

Difficult to use

Separate DB users

Learning curve

Operational cost

Multiple updates and patches

Programmatic reporting

Requirements

Any data, any source

Separation of duties

Data democratization

10,000-ft view

[Architecture diagram. Labels shown: Woot DW VPC; Woot production VPC; Woot corporate; third-party data; Amazon Redshift; Amazon Athena; Woot Web Services; NAV services; AWS Lambda; Amazon RDS; Amazon QuickSight; AWS Glue; SSIS; Amazon Kinesis Data Firehose; AWS DMS; Amazon DynamoDB; Amazon EMR; Amazon data warehouse (DW); AWS Fargate]

Migrate existing data

[Architecture diagram. Labels shown: Woot production VPC; Woot corporate; Woot Web Services; NAV services; AWS Lambda; Amazon RDS; SSIS; Amazon Kinesis Data Firehose; AWS DMS; Amazon DynamoDB; Amazon EMR]

Building data pipelines

[Architecture diagram. Labels shown: Woot production VPC; Woot Web Services; AWS Lambda; Amazon RDS; Amazon Kinesis Data Firehose; AWS DMS; Amazon DynamoDB; Amazon EMR]

Bringing it all together

[Architecture diagram. Labels shown: Woot DW VPC; third-party data; Amazon Redshift; Amazon Athena; Amazon QuickSight; AWS Glue; AWS Lambda; Amazon DW; AWS Fargate]

The Woot data lake solution
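This slide names Amazon Kinesis Data Firehose as the ingestion service. As a rough sketch of what a producer might do before sending events to it, the snippet below serializes events as newline-delimited JSON and groups them into batches of at most 500 records, which is the per-call limit of the `PutRecordBatch` API. The function names and the sample events are hypothetical, and nothing here is taken from Woot's actual pipeline.

```python
import json
from typing import Dict, Iterable, Iterator, List

# PutRecordBatch accepts at most 500 records per call.
MAX_RECORDS_PER_BATCH = 500


def to_firehose_records(events: Iterable[dict]) -> Iterator[Dict[str, bytes]]:
    """Serialize each event as one newline-terminated JSON record."""
    for event in events:
        # The trailing newline keeps one JSON object per line once
        # Firehose concatenates records into a single S3 object.
        yield {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def batches(
    records: Iterable[Dict[str, bytes]],
    size: int = MAX_RECORDS_PER_BATCH,
) -> Iterator[List[Dict[str, bytes]]]:
    """Group records into lists no longer than `size`."""
    batch: List[Dict[str, bytes]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample_events = [{"id": i} for i in range(1201)]  # made-up data
    for batch in batches(to_firehose_records(sample_events)):
        print(len(batch))
```

Each batch could then be passed as the `Records` argument of boto3's `put_record_batch` on a Firehose client, together with the name of a delivery stream. A real producer would also need to respect the per-request size limit and retry any records the response reports as failed.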

• Amazon Kinesis Data Firehose for data ingestion
• Amazon Simple Storage Service (Amazon S3) for data storage
• AWS Lambda and AWS Glue for data processing
• AWS Database Migration Service (AWS DMS) and AWS Glue for data migration
• AWS Glue for orchestration and metadata management
• Amazon Athena and Amazon QuickSight for querying and visualizing data
• AWS Directory Service for user identity

GODS architecture

[Architecture diagram. Labels shown: AWS connectors; GODS service; GODS data service; Amazon S3; Amazon Redshift; Amazon CloudWatch; Amazon Athena; Amazon EMR; Amazon EC2; AWS Lambda; AWS Fargate; Pandas data frame; AWS Secrets Manager; Amazon QuickSight; Amazon DocumentDB; AWS IAM; AWS PrivateLink]

Job orchestration
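As an illustration of the conditional-trigger pattern this slide presents, here is a minimal, self-contained sketch: react only to successful AWS Glue job state-change events, then release a downstream job once every job in its predicate has succeeded. The trigger table, the job names, and the helpers `succeeded_job_name`, `ready_triggers`, and `handle_event` are hypothetical, and an in-memory dictionary stands in for the DynamoDB status lookup. It is an assumption-laden sketch, not Woot's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical trigger definitions:
# downstream job -> jobs that must all succeed first (its predicate).
TRIGGERS: Dict[str, List[str]] = {
    "build_sales_mart": ["load_orders", "load_customers"],
    "refresh_dashboard": ["build_sales_mart"],
}


def succeeded_job_name(event: dict) -> Optional[str]:
    """Return the job name for a successful Glue job event, else None."""
    if event.get("detail-type") != "Glue Job State Change":
        return None
    detail = event.get("detail") or {}
    if detail.get("state") != "SUCCEEDED":
        return None
    return detail.get("jobName")


def get_conditional_triggers(job_name: str) -> List[str]:
    """Return downstream jobs whose predicate includes job_name."""
    return [t for t, predicate in TRIGGERS.items() if job_name in predicate]


def ready_triggers(job_name: str, job_status: Dict[str, str]) -> List[str]:
    """Return downstream jobs whose predicate jobs have all succeeded."""
    ready = []
    for target in get_conditional_triggers(job_name):
        # If not all jobs in the predicate are successful, ignore the trigger.
        if all(job_status.get(job) == "SUCCEEDED" for job in TRIGGERS[target]):
            ready.append(target)
    return ready


def handle_event(event: dict, job_status: Dict[str, str]) -> List[str]:
    """Process one event and return the downstream jobs ready to start."""
    job_name = succeeded_job_name(event)
    if job_name is None:
        return []  # only job successes are processed
    job_status[job_name] = "SUCCEEDED"  # record this success first
    return ready_triggers(job_name, job_status)
```

Keeping the decision logic in pure functions like these makes it testable without AWS access; a deployed handler would replace `job_status` with a real status store and start the returned jobs.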

    # Only want to process job successes
    if event.get('detail-type') == 'Glue Job State Change':
        if event.get('detail'):
            job_name = event['detail']['jobName'] if event['detail'].get('state') == 'SUCCEEDED' else None

    # Now that we have a valid event to handle, let's get the triggers
    triggers = get_conditional_triggers(job_name)
    print("Job name: ", job_name)
    # If the job isn't part of any other triggers, get out

    # Get status for all jobs from Dynamo
    response = table.scan(IndexName='JobStatusEndTime',
                          Select='ALL_PROJECTED_ATTRIBUTES')

    # Get status and last start time for all jobs in action
    for i in triggers:
        # If not all jobs in predicate are successful, ignore trigger

What’s next?

• AWS Lake Formation
• Multiple environments
• Configuration simplification
• Transactional data
• Incremental data loads
• ETL and view simplification
• More deal evaluation
• Models
• Historical

Lessons learned

Aggregation

Preserve raw data

Service limits

Data quality

Pain points

Wins

Magic features

Performance

AWS integrated

Ease of use

Flexibility

Data points

60 TB vs. 12 TB

40 hours saved weekly

90% operating cost reduction

8 AWS accounts sharing data

~600 million rows

0 screaming Woot monkeys harmed

#Woot #reinvent #AWS #rocks

Thank you!

Learn with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate data analytics skills

New free digital course, Data Analytics Fundamentals, introduces Amazon S3, Amazon Kinesis, Amazon EMR, AWS Glue, and Amazon Redshift

Classroom offerings, including Big Data on AWS, feature AWS expert instructors and hands-on labs

Validate expertise with the AWS Certified Big Data - Specialty exam or the new AWS Certified Data Analytics - Specialty beta exam

Visit aws.amazon.com/training/paths-specialty/

Thank you!

Theo Carpenter

Karthik Kumar Odapally
