Amazon DocumentDB Deep Dive
DAT326
Amazon DocumentDB deep dive

Joseph Idziorek, Principal Product Manager, Amazon Web Services
Antra Grover, Software Development Engineer, Fulfillment By Amazon

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
- What is the purpose of a document database?
- What customer problems does Amazon DocumentDB (with MongoDB compatibility) solve and how?
- Customer use case and learnings: Fulfillment by Amazon
- What did we deliver for customers this year?
- What’s next?

Purpose-built databases
Relational, Key value, Document, In-memory, Graph, Search, Time series, Ledger

Why document databases?
Normalized data model vs. denormalized data model

    {
      'name': 'Bat City Gelato',
      'price': '$',
      'rating': 5.0,
      'review_count': 46,
      'categories': ['gelato', 'ice cream'],
      'location': {
        'address': '6301 W Parmer Ln',
        'city': 'Austin',
        'country': 'US',
        'state': 'TX',
        'zip_code': '78729'}
    }

Why document databases?
GET https://api.yelp.com/v3/businesses/{id}
(response: the same JSON document as shown above)

Why document databases?

    # yelp_api is an already-initialized Yelp API client (setup not shown on the slide)
    response = yelp_api.search_query(term='ice cream', location='austin, tx', sort_by='rating', limit=5)

Why document databases?

    # col is a collection handle (setup not shown on the slide);
    # each business in the API response is stored as one document
    for i in response['businesses']:
        col.insert_one(i)

    // Average rating per price tier
    db.businesses.aggregate([
      { $group: { _id: "$price", ratingAvg: { $avg: "$rating" } } }
    ])

    // Businesses in the "$" price tier rated above 4.5
    db.businesses.find({ $and: [ { "price": "$" }, { "rating": { $gt: 4.5 } } ] })

    // Descending index on review_count
    db.businesses.createIndex({ review_count: -1 })

Why document databases?
JSON all the way

When should you use a document database?
- Flexible schema for fast iteration
- JSON data
- Ad hoc query capabilities
- Flexible indexing
- Operational and analytics workloads

When are other databases more appropriate?
- Database to enforce referential integrity: Relational
- Known, static access patterns for primary key lookups: Key value
- Large binary data: Amazon S3
- Ultralow latency, ephemeral data: In memory
- Highly connected social dataset: Graph
- Log analytics, full-text searches: Search

Customer use cases
- Product catalog
- FinTech
- Content management

Customer challenges with document databases
- Self-managing is hard
- Scaling is hard
- Undifferentiated heavy lifting
- Timely, costly, complex

Working backward
We start with the customer and we work backward.

Amazon DocumentDB (with MongoDB compatibility)
- Fully managed: Managed by AWS: no hardware provisioning; auto patching, quick setup, secure, and automatic backups
- Scalable: Separation of compute and storage enables both layers to scale independently; scale out to 15 read replicas in minutes
- MongoDB compatible: Compatible with MongoDB 3.6; use the same SDKs, tools, and applications with Amazon DocumentDB

Fully managed
- Automatic failure recovery and failover: Replicas are automatically promoted to primary
- Automatic patching: Up to date with the latest patches
- Integrated with AWS services: Amazon CloudWatch, AWS CloudTrail, AWS CloudFormation, AWS Secrets Manager, Amazon VPC, IAM, AWS CLI
- Continuous backup: Enabled by default, up to 35 days of PITR

“Our engineering teams now spend less time on operations like backup scripts, scale testing, and managing high availability and instead are able to focus on developing new capabilities for our customers.”

Fully managed: Safe defaults
- Encryption at rest
- Amazon VPC only
- TLS by default
- Compliant by default
- Highly available and durable by default
- Backup enabled by default
- Authentication by default
- Deletion protection

Scaling your database

Scalable
- Scale up in minutes: Scale from 16 to 768 GiB of RAM
- Scale out in minutes: Scale to 15 read replicas, millions of reads
- Autoscaling storage: Storage automatically grows from 10 GB to 64 TB
- Load balancing: Scale reads across replicas

“Adopting Amazon DocumentDB is a game-changer ... with Amazon DocumentDB, we can add or scale instances in minutes, regardless of data size.”

Amazon DocumentDB
Q: How?
Cloud-native database architecture

Challenges with traditional database architectures
- A single stack serves the application: API parsing, query processor, buffer cache, logging/replication, storage
- Scale monolithically
- Fail monolithically

Cloud-native database architecture
1. Decouple compute and storage
- Compute layer (scale compute): API, query processor, caching
- Storage layer (scale storage): logging/replication, storage

Cloud-native database architecture
2. Replicate 10-GB data segments 6× across 3 AZs
- Logging/replication and storage form a distributed storage volume spanning AZ1, AZ2, AZ3

DocumentDB Architecture
- Compute: one primary instance and replica instances
- The primary instance serves writes and reads; replica instances serve reads
- Storage: distributed storage volume across AZ1, AZ2, AZ3

Flexible
- Instances: 3, 2, or 1
- Environment: production or dev/test
- Availability goal: 99.99%, 99.9%, or 99%
- Durability: highly durable

Scaling reads
replicaSet=rs0
readPreference=secondaryPreferred
[Diagram: primary instance (writes, reads) and two replica instances (reads) over the distributed storage volume across AZ1, AZ2, AZ3]

Scaling reads
replicaSet=rs0
readPreference=secondaryPreferred
[Diagram: a third replica instance is added; label: ~8–10 minutes]

Failure recovery
[Diagram: a replica instance is promoted to primary; labels: ~30 seconds, ~8–10 minutes]

Scale-up for analytics
[Diagram: an additional replica instance alongside the primary and replica instances, reading from the distributed storage volume across AZ1, AZ2, AZ3]
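
The Scaling reads slides above show two connection-string options, replicaSet=rs0 and readPreference=secondaryPreferred. As a minimal sketch of how such a connection string might be assembled, the Python below uses only the standard library. The host name, user name, and password are hypothetical placeholders, and `build_uri` is an illustrative helper rather than part of any driver or AWS SDK.

```python
from urllib.parse import quote_plus, urlencode


def build_uri(host, user, password, port=27017, **options):
    """Assemble a MongoDB-style connection URI with the given query options."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # Options are appended in the order they were passed.
    query = urlencode(options)
    return f"mongodb://{credentials}@{host}:{port}/?{query}"


# Hypothetical endpoint and credentials, for illustration only.
uri = build_uri(
    "example-cluster.cluster-abc123.us-east-1.docdb.amazonaws.com",
    "appuser",
    "p@ss/word",
    replicaSet="rs0",
    readPreference="secondaryPreferred",
)
print(uri)
```

A string like this could then be handed to a MongoDB driver, for example `pymongo.MongoClient(uri)`. With readPreference=secondaryPreferred, a driver connected in replica set mode sends reads to replicas when any are available and falls back to the primary otherwise. A cluster with TLS enabled would also need the driver's TLS options, which are omitted here.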

Scaling storage
Grows automatically from 10 GB to 64 TB
[Diagram: the distributed storage volume across AZ1, AZ2, AZ3 expands under the primary and replica instances]

Why not sharding?
Sharding motivator: DocumentDB approach
- Scale reads: Add up to 15 read replicas
- Scale writes: Scale vertically with very low impact
- Scale storage: Storage scales automatically to 64 TB

Spot the difference!

“Third-party sales have grown from 3% of the total to 58%. To put it bluntly: Third-party sellers are kicking our first party butt. Badly.”
https://blog.aboutamazon.com/company-news/2018-letter-to-shareholders

FBA overview
[Diagram: six numbered steps arranged around "Customer success"]

Inventory Authority Platform (IAP)
- Real-time serverless stream processing
- Multitude of use cases and growing
- Multiple data vending channels including real-time queries

Challenges with previous data store
- Many transactions per second
- Large datasets
- Heavy resource and compute
- High costs
- High operations
- Different and complex nature of client requirements

Why we chose DocumentDB
- MongoDB API
- Flexible schema
- Scalability
- Separation of storage and compute
- Client isolation, instance isolation
- Seamless AWS integration
- Exhaustive monitoring
- Availability and reliability
- Backups and PITR

Architecture
[Diagram components: ingestion layer (service APIs), business logic, Amazon Kinesis (shown twice), AWS Lambda, Amazon DocumentDB, client-facing APIs]

Results
- More client onboarding avenues
- Convenient peak load testing
- Validation against upstream
- Less operations

Results
- Better resource utilization: 96 hosts -> 2 Amazon DocumentDB instances
- Improved performance: 66% improvement in average latency
- Room to scale: 19% CPU utilization
- Cost savings: 45% cost savings

Lessons learned: Scaling writes
- Dedupe fast-moving updates
- Split your writes across collections
- Archive cold data
- Evaluate updates vs replace
- Handling conditional updates (i.e., upserts)

Lessons learned: Scaling reads
- Connect as a replica set rather than to reader endpoints
- Working with long-running operations
- Close cursors after use

Lessons learned: Defining indexes
- Choosing instance type for reader
- Single rather than compound indexes
- Define indexes before enabling writes
- Background rather than foreground index creation
- Evaluate the trade-offs

Best practices
Restrict access Test performance Segregate code Automate Amazon S3 with