
DAT326 DocumentDB deep dive

Joseph Idziorek, Principal Product Manager
Antra Grover, Software Development Engineer, Fulfillment by Amazon

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda

What is the purpose of a document database?

What customer problems does Amazon DocumentDB (with MongoDB compatibility) solve and how?

Customer use case and learnings: Fulfillment by Amazon

What did we deliver for customers this year?

What’s next?

Purpose-built

Relational, key value, document, in-memory, graph, search, time series, ledger

Why document databases?

Normalized data model vs. denormalized data model

{
  'name': 'Bat City Gelato',
  'price': '$',
  'rating': 5.0,
  'review_count': 46,
  'categories': ['gelato', 'ice cream'],
  'location': {
    'address': '6301 W Parmer Ln',
    'city': 'Austin',
    'country': 'US',
    'state': 'TX',
    'zip_code': '78729'
  }
}

Why document databases?

GET https://api.yelp.com/v3/businesses/{id}

{
  'name': 'Bat City Gelato',
  'price': '$',
  'rating': 5.0,
  'review_count': 46,
  'categories': ['gelato', 'ice cream'],
  'location': {
    'address': '6301 W Parmer Ln',
    'city': 'Austin',
    'country': 'US',
    'state': 'TX',
    'zip_code': '78729'
  }
}

Why document databases?

response = yelp_api.search_query(term='ice cream', location='austin, tx', sort_by='rating', limit=5)

Why document databases?

for i in response['businesses']: col.insert_one(i)

db.businesses.aggregate([ { $group: { _id: "$price", ratingAvg: { $avg: "$rating"}} } ])

db.businesses.find({ $and: [{"price" : "$"}, {"rating": { $gt: 4.5}}]})
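To make the semantics of the aggregation and the query above concrete, here is a minimal plain-Python sketch that mirrors them over an in-memory list. No database is involved. Only the first record comes from the slides; the other two documents and both function names are hypothetical, added purely for illustration.

```python
from collections import defaultdict

# Sample documents. Only "Bat City Gelato" appears in the slides;
# "Shop B" and "Shop C" are hypothetical records added for illustration.
businesses = [
    {"name": "Bat City Gelato", "price": "$", "rating": 5.0, "review_count": 46},
    {"name": "Shop B", "price": "$", "rating": 4.0, "review_count": 120},
    {"name": "Shop C", "price": "$$", "rating": 4.5, "review_count": 80},
]


def average_rating_by_price(docs):
    """Mirror { $group: { _id: "$price", ratingAvg: { $avg: "$rating" } } }."""
    ratings = defaultdict(list)
    for doc in docs:
        ratings[doc["price"]].append(doc["rating"])
    return {price: sum(vals) / len(vals) for price, vals in ratings.items()}


def cheap_and_highly_rated(docs):
    """Mirror find({ $and: [{ price: "$" }, { rating: { $gt: 4.5 } }] })."""
    return [d for d in docs if d["price"] == "$" and d["rating"] > 4.5]


print(average_rating_by_price(businesses))
print([d["name"] for d in cheap_and_highly_rated(businesses)])
```

In the database the same work runs server-side, and an index such as the one on review_count lets sorted reads avoid scanning every document.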

db.businesses.createIndex( { review_count: -1 } )

Why document databases?

JSON all the way

When should you use a document database?

Flexible schema for fast iteration
JSON data
Ad hoc query capabilities
Flexible indexing
Operational and analytics workloads

When are other databases more appropriate?

Database to enforce referential integrity: relational
Known, static access patterns for primary key lookups: key value
Large binary data
Ultralow latency, ephemeral data: in memory
Highly connected social dataset: graph
Log analytics, full-text searches: search

Customer use cases

Product catalog, FinTech, content management

Customer challenges with document databases

Self-managing is hard: undifferentiated heavy lifting
Scaling is hard: time-consuming, costly, complex

Working backward

We start with the customer and we work backward

Amazon DocumentDB (with MongoDB compatibility)

Fully managed: managed by AWS; no hardware provisioning; auto patching, quick setup, secure, and automatic backups

Scalable: separation of compute and storage enables both layers to scale independently; scale out to 15 read replicas in minutes

MongoDB compatible: compatible with MongoDB 3.6; use the same SDKs, tools, and applications with Amazon DocumentDB

Cloud-native database architecture

Fully managed

Automatic failure recovery and failover: replicas are automatically promoted to primary
Automatic patching: up to date with the latest patches
Integrated with AWS services: Amazon CloudWatch, AWS CloudTrail, AWS CloudFormation, AWS Secrets Manager, Amazon VPC, IAM, AWS CLI
Continuous backup: enabled by default, up to 35 days of PITR

“Our engineering teams now spend less time on operations like backup scripts, scale testing, and managing high availability and instead are able to focus on developing new capabilities for our customers.”

Fully managed: Safe defaults

Encryption at rest
Amazon VPC only
TLS by default
Compliant by default
Highly available and durable by default
Backup enabled by default
Authentication by default
Deletion protection

Scaling your database

[Figure: marker labeled "You are here"]

Scalable

Scale up in minutes: scale from 16 to 768 GiB of RAM
Scale out in minutes: scale to 15 read replicas, millions of reads
Autoscaling storage: storage automatically grows from 10 GB to 64 TB
Load balancing: scale reads across replicas

“Adopting Amazon DocumentDB is a game-changer . . . with Amazon DocumentDB, we can add or scale instances in minutes, regardless of data size.”

Amazon DocumentDB

Q: How? Cloud-native database architecture

Challenges with traditional database architectures

[Diagram: the application talks to a monolithic database stack of API parsing, query processor, buffer cache, logging/replication, and storage. The stack scales monolithically and fails monolithically.]

Cloud-native database architecture

[Diagram: the compute layer (API, query processor, caching) scales compute; the storage layer (logging/replication, storage) scales storage.]

1. Decouple compute and storage

Cloud-native database architecture

[Diagram: logging/replication and storage form a distributed storage volume spanning AZ1, AZ2, and AZ3.]

2. Replicate 10-GB data segments 6× across 3 AZs

DocumentDB Architecture

[Diagram: compute layer with one primary instance and two replica instances; writes go to the primary, reads are served by all three instances; a distributed storage volume spans AZ1, AZ2, and AZ3.]

Flexible

Instances: 3 / 2 / 1
Environment: production / dev/test
Availability goal: 99.99% / 99.9% / 99%
Durability: highly durable

[Diagram: one primary instance and two replica instances over a distributed storage volume spanning AZ1, AZ2, and AZ3.]

Scaling reads

replicaSet=rs0 readPreference=secondaryPreferred
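As a minimal sketch of where these two options go, the snippet below assembles a MongoDB-style connection URI with the Python standard library. The user name, password, and host are hypothetical placeholders, and the build_uri helper is not part of any driver. The TLS settings a real cluster connection would also need are omitted for brevity.

```python
from urllib.parse import quote_plus, urlencode


def build_uri(user, password, host, port=27017):
    """Build a connection URI carrying the two read-scaling options from the slide."""
    options = {
        "replicaSet": "rs0",                     # connect in replica set mode
        "readPreference": "secondaryPreferred",  # prefer replicas for reads
    }
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/?{urlencode(options)}"


# Hypothetical credentials and host, for illustration only.
uri = build_uri("appuser", "p@ss/word", "docdb.example.com")
print(uri)
```

With secondaryPreferred, a driver sends reads to replicas when any are available and falls back to the primary otherwise, so replicas added later can take read traffic without a change to the URI.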

[Diagram: one primary instance and two replica instances; writes go to the primary, reads are served by all instances; a distributed storage volume spans AZ1, AZ2, and AZ3.]

[Diagram: a fourth instance (replica) is added in ~8–10 minutes and serves reads.]

Failure recovery

Failover to a replica: ~30 seconds
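One way an application might ride out a failover of this length is to retry failed operations with exponential backoff. The sketch below is an assumption, not something shown in the talk: the helper name, the 60-second budget, the 0.5-second starting delay, and the use of ConnectionError as a stand-in for a driver's own exception types are all illustrative choices.

```python
import time


def retry_with_backoff(operation, max_wait=60.0, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the next delay would exceed max_wait."""
    waited = 0.0
    delay = base_delay
    while True:
        try:
            return operation()
        except ConnectionError:  # stand-in for driver-specific errors (assumption)
            if waited + delay > max_wait:
                raise
            sleep(delay)
            waited += delay
            delay *= 2  # double the wait after each failed attempt


# Simulated write that fails twice before succeeding.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("primary unavailable")
    return "ok"


delays = []  # record delays instead of actually sleeping
result = retry_with_backoff(flaky_write, sleep=delays.append)
print(result, delays)
```

With these defaults the helper waits up to 31.5 seconds in total before giving up, which is slightly longer than the failover window quoted on the slide.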

[Diagram: a replica is promoted to primary and takes over writes; a replacement instance is added in ~8–10 minutes; reads continue from the remaining instances; a distributed storage volume spans AZ1, AZ2, and AZ3.]

Scale-up for analytics

[Diagram: an additional replica instance is added alongside the primary and two existing replicas; writes go to the primary, reads are served by the instances; a distributed storage volume spans AZ1, AZ2, and AZ3.]

Scaling storage

[Diagram: one primary instance and two replica instances; the distributed storage volume expands across AZ1, AZ2, and AZ3.]

Grows automatically from 10 GB to 64 TB

Why not sharding?

Sharding motivator: DocumentDB approach
Scale reads: add up to 15 read replicas
Scale writes: scale vertically with very low impact
Scale storage: storage scales automatically to 64 TB

Spot the difference!

“Third-party sales have grown from 3% of the total to 58%. To put it bluntly: Third-party sellers are kicking our first party butt. Badly.”

https://blog.aboutamazon.com/company-news/2018-letter-to-shareholders

FBA overview

[Diagram: six numbered steps arranged in a cycle around "Customer success"]

Inventory Authority Platform (IAP)

Real-time serverless stream processing

Multitude of use cases and growing

Multiple data vending channels including real-time queries

Challenges with previous data store

Many transactions per second

Large datasets

Heavy resource and compute

High costs

High operations

Different and complex nature of client requirements

Why we chose DocumentDB

MongoDB API
Flexible schema
Scalability
Separation of storage and compute
Client isolation
Instance isolation
Seamless AWS integration
Exhaustive monitoring
Availability and reliability
Backups and PITR

Architecture

[Diagram, left to right: ingestion layer (service), Amazon Kinesis, Amazon Kinesis, AWS Lambda, Amazon DocumentDB, client-facing APIs; a "Business logic" label sits above the pipeline.]

Results

More client onboarding avenues

Convenient peak load testing

Validation against upstream

Less operations

Results

Better resource utilization: 96 hosts -> 2 Amazon DocumentDB instances
Improved performance: 66% improvement in average latency
Room to scale: 19% CPU utilization
Cost savings: 45% cost savings

Lessons learned: Scaling writes

Dedupe fast-moving updates
Split your writes across collections
Archive cold data
Evaluate updates vs. replace
Handling conditional updates (i.e., upserts)

Lessons learned: Scaling reads

Connect as a replica set rather than through reader endpoints
Working with long-running operations
Close cursors after use

Lessons learned: Defining indexes

Choosing instance type for reader
Single rather than compound indexes
Define indexes before enabling writes
Background rather than foreground index creation
Evaluate the trade-offs

Best practices

Restrict access
Test performance
Segregate code
Amazon S3 with mongodump
Automate monitoring via CloudWatch
Use the “start/stop” feature to save money

Looking forward

Change streams

Aggregation pipelines

Automated alarming on query profiling

Working backward

[Timeline, first row. Dates: 9 Jan, 12 Feb, 28 Feb, 13 Mar, 14 Mar, 15 Mar, 4 Apr, 8 May, 17 May, 5 Jun, 2 Jul. Releases: launch; DDL auditing; 14 operators; Frankfurt; DataDog support; Secrets Manager; 17 operators; Tokyo and Seoul; R5 in all regions; Sydney; start/stop; deletion protection.]

[Timeline, second row. Dates: 3 Jul, 19 Jul, 1 Aug, 19 Aug, 14 Oct, 16 Oct, 23 Oct, 31 Oct. Releases: deletion protection APIs; 7 operators + $lookup; London; slow query logger; Singapore; 3 operators; change streams; Paris.]

. . . with more to come in 2019

Highlights

Change streams: get more value out of your data, utilize purpose-built databases
Slow query logger: improve operational insights
+40 operators, APIs: increase MongoDB compatibility
Cost savings: R5 instances, up to 100% better performance for the same cost
Regions: 12 new region launches

Federated Query for Amazon Athena (Preview)

Run SQL queries on data spanning multiple data stores

Run SQL queries on relational, nonrelational, object, or custom data sources; in the cloud or on premises

Open-source connectors for common data sources

Build connectors to custom data sources

Run connectors in AWS Lambda: no servers to manage

Data sources shown:
S3/Glacier
Amazon Redshift (data warehousing)
ElastiCache (Redis)
Amazon Aurora (MySQL, PostgreSQL)
Amazon DocumentDB (document)
Amazon DynamoDB (key value)

Getting started

https://docs.aws.amazon.com/documentdb/latest/developerguide/getting-started.html

Migration guide

Learn how to migrate to Amazon DocumentDB

Offline migration using common utilities: get started quickly, great for POCs
Online migration using AWS DMS: near-zero downtime migration
Hybrid migration leverages both solutions: best of both worlds

You can use AWS Database Migration Service (DMS) free (for 6 months) to easily migrate to DocumentDB

Sources for both relational and document databases

All options support migrations from on-premises and EC2, for both replica sets and sharded clusters

https://docs.aws.amazon.com/documentdb/latest/developerguide/docdb-migration.html

What’s on the roadmap?

We start with the customer and we work backward

Related breakouts

DAT326 – Amazon DocumentDB Deep Dive

DAT370 - Best practices for Amazon DocumentDB
DAT372 - Migrating your databases to Amazon DocumentDB
DAT338-R - Hands-on workshop: How to migrate to Amazon DocumentDB
DAT338-R1 - [REPEAT 1] Hands-on workshop: How to migrate to Amazon DocumentDB
STP05 – Building the factory of the future today with robotics & ML
STP14 – Cloud, blockchain, and the resource-optimized future

Learn databases with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:
• Amazon ElastiCache
• Amazon DocumentDB
• Amazon RDS
• Amazon DynamoDB

In beta now: validate expertise with the new AWS Certified Database - Specialty beta exam

Visit aws.training

Thank you!

Joseph Idziorek, @josephidziorek
Antra Grover, [email protected]