DAT326 Amazon DocumentDB deep dive
Joseph Idziorek, Principal Product Manager, Amazon Web Services
Antra Grover, Software Development Engineer, Fulfillment by Amazon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
What is the purpose of a document database?
What customer problems does Amazon DocumentDB (with MongoDB compatibility) solve and how?
Customer use case and learnings: Fulfillment by Amazon
What did we deliver for customers this year?
What’s next?

Purpose-built databases
Relational, Key value, Document, In-memory, Graph, Search, Time series, Ledger

Why document databases?
Normalized data model
Denormalized data model
{ 'name': 'Bat City Gelato',
  'price': '$',
  'rating': 5.0,
  'review_count': 46,
  'categories': ['gelato', 'ice cream'],
  'location': { 'address': '6301 W Parmer Ln', 'city': 'Austin', 'country': 'US', 'state': 'TX', 'zip_code': '78729'} }

Why document databases?
GET https://api.yelp.com/v3/businesses/{id}
{ 'name': 'Bat City Gelato',
  'price': '$',
  'rating': 5.0,
  'review_count': 46,
  'categories': ['gelato', 'ice cream'],
  'location': { 'address': '6301 W Parmer Ln', 'city': 'Austin', 'country': 'US', 'state': 'TX', 'zip_code': '78729'} }

Why document databases?
response = yelp_api.search_query(term='ice cream', location='austin, tx', sort_by='rating', limit=5)

Why document databases?
for i in response['businesses']:
    col.insert_one(i)
db.businesses.aggregate([ { $group: { _id: "$price", ratingAvg: { $avg: "$rating"}} } ])
db.businesses.find({ $and: [{"price" : "$"}, {"rating": { $gt: 4.5}}]})
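For readers less familiar with the MongoDB query language, here is a minimal plain-Python sketch of what the aggregation and the query above compute. It runs in memory rather than against a database, and every sample document except Bat City Gelato is made up for illustration; the function names are hypothetical as well.

```python
from collections import defaultdict

# Sample documents. Only 'Bat City Gelato' comes from the slides;
# the other two are hypothetical and exist only for illustration.
businesses = [
    {"name": "Bat City Gelato", "price": "$", "rating": 5.0, "review_count": 46},
    {"name": "Hypothetical Scoops", "price": "$", "rating": 4.0, "review_count": 120},
    {"name": "Hypothetical Creamery", "price": "$$", "rating": 4.5, "review_count": 80},
]


def average_rating_by_price(docs):
    """In-memory analogue of { $group: { _id: "$price", ratingAvg: { $avg: "$rating" } } }."""
    ratings = defaultdict(list)
    for doc in docs:
        ratings[doc["price"]].append(doc["rating"])
    return {price: sum(vals) / len(vals) for price, vals in ratings.items()}


def cheap_and_highly_rated(docs):
    """In-memory analogue of find({ $and: [{ price: "$" }, { rating: { $gt: 4.5 } }] })."""
    return [d for d in docs if d["price"] == "$" and d["rating"] > 4.5]


print(average_rating_by_price(businesses))
print([d["name"] for d in cheap_and_highly_rated(businesses)])
```

In a real application the database evaluates these expressions server-side; the sketch only makes the grouping and filtering semantics explicit.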
db.businesses.createIndex( { review_count: -1 } )

Why document databases?
JSON all the way

When should you use a document database?
Flexible schema for fast iteration
JSON data
Ad hoc query capabilities
Flexible indexing
Operational and analytics workloads

When are other databases more appropriate?
Database to enforce referential integrity: Relational
Known, static access patterns for primary key lookups: Key value
Large binary data: Amazon S3
Ultralow latency, ephemeral data: In memory
Highly connected social dataset: Graph
Log analytics, full-text searches: Search

Customer use cases
Product catalog
FinTech
Content management

Customer challenges with document databases
Self-managing is hard: undifferentiated heavy lifting
Scaling is hard: time-consuming, costly, complex

Working backward
We start with the customer and we work backward

Amazon DocumentDB (with MongoDB compatibility)
Fully managed: managed by AWS with no hardware provisioning; auto patching, quick setup, secure, and automatic backups
Scalable: separation of compute and storage enables both layers to scale independently; scale out to 15 read replicas in minutes
MongoDB compatible: compatible with MongoDB 3.6; use the same SDKs, tools, and applications with Amazon DocumentDB
Cloud-native database architecture

Fully managed
Automatic failure recovery and failover: replicas are automatically promoted to primary
Automatic patching: up to date with the latest patches
Integrated with AWS services: Amazon CloudWatch, AWS CloudTrail, AWS CloudFormation, AWS Secrets Manager, Amazon VPC, IAM, AWS CLI
Continuous backup: enabled by default, up to 35 days of PITR
“Our engineering teams now spend less time on operations like backup scripts, scale testing, and managing high availability and instead are able to focus on developing new capabilities for our customers.”

Fully managed: Safe defaults
Encryption at rest
Amazon VPC only
TLS by default
Compliant by default
Highly available and durable by default
Backup enabled by default
Authentication by default
Deletion protection

Scaling your database
You are here

Scalable
Scale up in minutes: scale from 16 to 768 GiB of RAM
Scale out in minutes: scale to 15 read replicas, millions of reads
Autoscaling storage: storage automatically grows from 10 GB to 64 TB
Load balancing: scale reads across replicas
“Adopting Amazon DocumentDB is a game-changer . . . with Amazon DocumentDB, we can add or scale instances in minutes, regardless of data size.”

Amazon DocumentDB
Q: How? Cloud-native database architecture

Challenges with traditional database architectures
[Diagram: an application talks to a monolithic database stack (API parsing, query processor, buffer cache, logging/replication, storage) that must scale monolithically and fails monolithically.]

Cloud-native database architecture
[Diagram: step 1, decouple compute and storage. The compute layer (API, query processor, caching) scales compute; the storage layer (logging/replication, storage) scales storage.]

Cloud-native database architecture
[Diagram: step 2, replicate 10-GB data segments 6× across 3 AZs (AZ1, AZ2, AZ3) in a distributed storage volume.]

DocumentDB Architecture
[Diagram: a compute layer with one primary instance (writes and reads) and two replica instances (reads), on top of a distributed storage volume spanning AZ1, AZ2, and AZ3.]

Flexible
Instances: 3, 2, or 1
Environment: production or dev/test
Availability goal: 99.99%, 99.9%, or 99%
Durability: highly durable
[Diagram: primary instance (writes and reads) and replica instances (reads) on top of a distributed storage volume spanning AZ1, AZ2, and AZ3.]

Scaling reads
replicaSet=rs0 readPreference=secondaryPreferred
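As a minimal sketch of how the two options above might end up in a MongoDB-style connection string, the helper below assembles one with the standard library only. The host name, user, password, and the helper name `build_uri` are hypothetical placeholders, and TLS-related options are deliberately left out.

```python
from urllib.parse import quote_plus, urlencode


def build_uri(host, user, password, port=27017):
    """Assemble a MongoDB-style URI that prefers replicas for reads."""
    options = urlencode({
        "replicaSet": "rs0",                     # connect as a replica set
        "readPreference": "secondaryPreferred",  # send reads to replicas when available
    })
    # Percent-encode credentials so characters such as '@' or '/' stay valid in the URI.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/?{options}"


# Hypothetical endpoint and credentials, for illustration only.
uri = build_uri("example-cluster.example.com", "appuser", "p@ss/word")
print(uri)
```

A driver such as PyMongo accepts a URI of this shape in `MongoClient(uri)`. With `secondaryPreferred`, reads go to replicas when any are available and fall back to the primary otherwise.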
[Diagram: one primary instance (writes and reads) and two replica instances (reads) on top of a distributed storage volume spanning AZ1, AZ2, and AZ3.]

Scaling reads
replicaSet=rs0 readPreference=secondaryPreferred
[Diagram: a third replica instance is added in ~8–10 minutes; one primary (writes and reads) and three replicas (reads) share the distributed storage volume spanning AZ1, AZ2, and AZ3.]

Failure recovery
[Diagram: failure recovery. Labels: ~30 seconds (a replica takes over as primary) and ~8–10 minutes (an instance is brought back as a replica); instances share the distributed storage volume spanning AZ1, AZ2, and AZ3.]

Scale-up for analytics
[Diagram: one primary instance (writes and reads) and three replica instances (reads), including an additional replica for analytics, on top of the distributed storage volume spanning AZ1, AZ2, and AZ3.]

Scaling storage
[Diagram: one primary instance (writes and reads) and two replica instances (reads) on top of a growing distributed storage volume spanning AZ1, AZ2, and AZ3.]
Grows automatically from 10 GB to 64 TB

Why not sharding?
Sharding motivator: DocumentDB approach
Scale reads: add up to 15 read replicas
Scale writes: scale vertically with very low impact
Scale storage: storage scales automatically to 64 TB

Spot the difference!
“Third-party sales have grown from 3% of the total to 58%. To put it bluntly: Third-party sellers are kicking our first party butt. Badly.”
https://blog.aboutamazon.com/company-news/2018-letter-to-shareholders

FBA overview
[Diagram: six numbered steps (1 to 6) arranged around “Customer success”.]

Inventory Authority Platform (IAP)
Real-time serverless stream processing
Multitude of use cases and growing
Multiple data vending channels including real-time queries

Challenges with previous data store
Many transactions per second
Large datasets
Heavy resource and compute
High costs
High operations
Different and complex nature of client requirements

Why we chose DocumentDB
MongoDB API
Flexible schema
Scalability
Separation of storage and compute
Client isolation
Instance isolation
Seamless AWS integration
Exhaustive monitoring
Availability and reliability
Backups and PITR

Architecture
[Diagram, labeled “Business logic”: Ingestion layer (service APIs), Amazon Kinesis, Amazon Kinesis, AWS Lambda, Amazon DocumentDB, Client-facing APIs.]

Results
More client onboarding avenues
Convenient peak load testing
Validation against upstream
Less operations

Results
Better resource utilization: 96 hosts -> 2 Amazon DocumentDB instances
Improved performance: 66% improvement in average latency
Room to scale: 19% CPU utilization
Cost savings: 45% cost savings

Lessons learned: Scaling writes
Dedupe fast-moving updates
Split your writes across collections
Archive cold data
Evaluate updates vs replace
Handling conditional updates (i.e., upserts)

Lessons learned: Scaling reads
Connect as a replica set rather than reader endpoints
Close cursors after use
Working with long-running operations

Lessons learned: Defining indexes
Choosing instance type for reader
Single rather than compound indexes
Define indexes before enabling writes
Background rather than foreground index creation
Evaluate the trade-offs

Best practices
Restrict access
Test performance
Segregate code
Automate Amazon S3 with mongodump
Monitoring via CloudWatch
Use the “start/stop” feature to save money

Looking forward
Change streams
Aggregation pipelines
Automated alarming on query profiling

Working backward
[Timeline of 2019 releases. Dates shown: 9 Jan, 12 Feb, 28 Feb, 13 Mar, 14 Mar, 15 Mar, 4 Apr, 8 May, 17 May, 5 Jun, 2 Jul, 3 Jul, 19 Jul, 1 Aug, 19 Aug, 14 Oct, 16 Oct, 23 Oct, 31 Oct. Milestones shown include: launch; DDL auditing; 14 operators; Frankfurt; DataDog support; Secrets Manager; 17 operators; Tokyo; Seoul; R5 all regions; Sydney; start/stop; deletion protection; 7 operators/APIs; London; 3 operators; change streams; Paris; slow query logger + $lookup; Singapore.]
. . . with more to come in 2019

Highlights
Change streams: get more value out of your data, utilize purpose-built databases
Slow query logger: improve operational insights
+40 operators, APIs: increase MongoDB compatibility
Cost savings: R5 instances, up to 100% better performance for the same cost
Regions: 12 new regions launch

Federated Query for Amazon Athena (Preview)
Run SQL queries on data spanning multiple data stores
Run SQL queries on relational, nonrelational, object, or custom data sources; in the cloud or on premises
Open-source connectors for common data sources
Build connectors to custom data sources
Run connectors in AWS Lambda: no servers to manage
Data sources shown: S3/Glacier; Amazon Redshift (data warehousing); ElastiCache (Redis); Amazon Aurora (MySQL, PostgreSQL); Amazon DocumentDB (document); Amazon DynamoDB (key value)

Getting started
https://docs.aws.amazon.com/documentdb/latest/developerguide/getting-started.html

Migration guide
Learn how to migrate to Amazon DocumentDB
Offline migration using common utilities: get started quickly, great for POCs
Online migration using AWS DMS: near-zero downtime migration
Hybrid migration leverages both solutions: best of both worlds
You can use AWS Database Migration Service (DMS) free (for 6 months) to easily migrate to DocumentDB
Sources for both relational and document databases
All options support migrations from on-premises and EC2, for both replica sets and sharded clusters
https://docs.aws.amazon.com/documentdb/latest/developerguide/docdb-migration.html

What’s on the roadmap?
We start with the customer and we work backward

Related breakouts
DAT326 – Amazon DocumentDB Deep Dive
DAT370 – Best practices for Amazon DocumentDB
DAT372 – Migrating your databases to Amazon DocumentDB
DAT338-R – Hands-on workshop: How to migrate to Amazon DocumentDB
DAT338-R1 – [REPEAT 1] Hands-on workshop: How to migrate to Amazon DocumentDB
STP05 – Building the factory of the future today with robotics & ML
STP14 – Cloud, blockchain, and the resource-optimized future

Learn databases with AWS Training and Certification
Resources created by the experts at AWS to help you build and validate database skills
25+ free digital training courses cover topics and services related to databases, including:
• Amazon Aurora
• Amazon ElastiCache
• Amazon Neptune
• Amazon Redshift
• Amazon DocumentDB
• Amazon RDS
• Amazon DynamoDB
In beta now
Validate expertise with the new AWS Certified Database - Specialty beta exam
Visit aws.training
Thank you!
Joseph Idziorek, @josephidziorek
Antra Grover, [email protected]