Amazon DocumentDB (with MongoDB compatibility) Deep Dive
Jeff Duffy DocumentDB Specialist SA
© 2020, Amazon Web Services, Inc. or its Affiliates. AWS: Purpose-built databases
Relational Key-value Document In-memory Graph Time-series Ledger Wide Column
Amazon Relational Amazon Amazon Amazon Amazon Database Service Amazon Amazon Amazon DynamoDB DocumentDB ElastiCache Neptune (RDS) Timestream Quantum Keyspaces Ledger Aurora Community Commercial Redis Memcached Database
© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DocumentDB Fast, scalable, and fully managed MongoDB-compatible database service
MongoDB Fast Scalable Fully managed compatible
Millions of requests per second Separation of compute and Managed by AWS: Compatible with MongoDB 3.6; with millisecond latency storage scales both no hardware provisioning; use the same SDKs, tools, and independently; scale out to auto patching, quick setup, applications with Amazon 15 read replicas in minutes secure, and automatic DocumentDB backups
Purpose-built document database engineered for the cloud
© 2020, Amazon Web Services, Inc. or its Affiliates. Customers
Learn more: https://aws.amazon.com/documentdb/customers/
© 2020, Amazon Web Services, Inc. or its Affiliates. Document Databases
© 2020, Amazon Web Services, Inc. or its Affiliates. Evolution of document databases !=
JSON RelationalJSON (Client)
(App) (Database)
JSON became Friction when Object-relational Document the de facto converting JSON mappers (ORMs) databases solved data interchange to the relational were created to the problem format model minimize friction
© 2020, Amazon Web Services, Inc. or its Affiliates. Why document databases?
Documents map naturally to how humans model data Denormalized data Normalized data model model Documents (objects/JSON) are common application data models
Document databases store JSON-like documents
Document databases provide flexible schema and indexing
Ad hoc querying and aggregations
© 2020, Amazon Web Services, Inc. or its Affiliates. Use cases for document data
Content Mobile Personalization management
Catalog Retail and User profiles marketing
© 2020, Amazon Web Services, Inc. or its Affiliates. Document Use Case: Gaming User Profiles
{ userid: 181276, { username: "sue1942", userid: 181276, name: {first: "Susan", username: "sue1942", last: "Benoit"}, name: {first: "Susan", ExplodingSnails: { last: "Benoit"} hi_score: 3185400, } global_rank: 5139, bonus_levels: true },} } promotions: ["new user","5%","snail lover"] }
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenges scaling document databases
© 2020, Amazon Web Services, Inc. or its Affiliates. Traditional Database Architecture Challenges
Application
Single monolithic architectures API
Query processor Not designed for Caching the cloud
Logging
Storage Scale monolithically Fail monolithically
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Add read capacity on-demand
Replication
Disk Disk Disk Disk Node 1 Node 2 Node 3 Node 4
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Recover quickly from node failure
Replication
Disk Disk Disk Disk Node 1 Node 2 Node 3 Node 3’
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Scale storage as data grows
Node Node Node
Storage Storage Storage Volume Volume Volume
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Backup data without affecting performance
Operational Backup Node Node
Snapshot Snapshot
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Data Durability
Replication Replication Node Node Node
Storage Volume Storage Volume Storage Volume
© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DocumentDB Purpose-built and engineered for the cloud
© 2020, Amazon Web Services, Inc. or its Affiliates. API Scale compute Compute layer Query processor
Caching
Logging
Decouple computeStorage and storage
Scale storage Storage layer
© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DocumentDB: Cloud Native Architecture
Instance Instance Instance (Replica) (Primary) (Replica) Compute Writes Reads Reads Reads
Distributed storage volume
Storage
AZ1 AZ2 AZ3
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Add read capacity on-demand
Instance Instance Instance Instance (Replica) (Primary) (Replica) (Replica) Compute Writes
~8-10 mins Reads Reads Reads Reads
Distributed storage volume
Storage
AZ1 AZ2 AZ3
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Quickly recover from node failure
~30 secs
Instance Instance InstanceInstance Instance (Replica) (Primary) (Replica)(Primary) (Replica) Writes Writes ~8-10 mins Reads Reads Reads Reads Reads Distributed storage volume
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Scale storage as data grows
Instance Instance Instance Compute (Replica) (Primary) (Replica) Writes Reads Reads Reads Distributed storage volume
Storage
Storage
Grows Automatically from 10GB-64TB © 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Backup data without affecting performance
Compute Instance Instance Instance (Replica) (Primary) (Replica) Writes Reads Reads Reads Distributed storage volume
Storage
Continuous Backups Snapshots (PITR) Amazon S3 (Automated and Manual)
© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Data Durability
Instance Instance Instance Instances: 321 (Replica) (Primary) (Replica) Environment: productiondev/test
Availability goal: 99.99%99.9%99% Writes Durability: highly durable Reads Reads Reads
Distributed storage volume
AZ1 AZ2 AZ3
© 2020, Amazon Web Services, Inc. or its Affiliates. Service overview
© 2020, Amazon Web Services, Inc. or its Affiliates. Fully managed
Automatic failover Automated AWS Integration Pay-as-you-go and recovery maintenance pricing
Failed instances are Automated upgrades and CloudWatch, CloudTrail, Per-second instance automatically replaced patching CloudFormation, VPC, billing, no long-term Secrets Manager commitment
“Our engineering teams now spend less time on operations like backup scripts, scale testing, and managing high availability and instead are able to focus on developing new capabilities for our customers.”
© 2020, Amazon Web Services, Inc. or its Affiliates. MongoDB compatible
MongoDB 3.6 Same drivers, Replica sets tools
Compatible with MongoDB Use the same MongoDB Read scaling is easy with Community Edition 3.6 drivers and tools with replica set emulation Amazon DocumentDB
“Getting started with DocumentDB was also simple and we migrated our application in a couple of days without needing to make any meaningful code changes. Everything just worked.”
© 2020, Amazon Web Services, Inc. or its Affiliates. Why choose the MongoDB API?
• Over 60 million downloads • Easy for developers to get started • MongoDB API is powerful • Managing MongoDB is hard
Source: http://db-engines.com
© 2020, Amazon Web Services, Inc. or its Affiliates. Scalable
Scale out Scale up Autoscale Load in minutes in minutes storage balancing
Scale to 15 read Scale from Storage Scale reads across replicas 16 to 768 GiB automatically replicas of RAM grows from 10 GB to 64 TB
© 2020, Amazon Web Services, Inc. or its Affiliates. Working Backwards
FEB 2019 JAN 2019 MAR 2019 APR 2019 MAY 2019 JUN 2019 JUL 2019 AUG 2019 OCT 2019
Launch DDL Frankfurt Aggregation Per-second Sydney Start/Stop Slow query Change Streams logger Auditing Secrets Operators Billing Cluster Mumbai Manager Tokyo Deletion Aggregation Paris Protection Operators Seoul Singapore London
DEC 2019 FEB 2020 MAR 2020 APR 2020 … with more to come in 2020! Aggregation Canada RBAC Glue ETL Operators
© 2020, Amazon Web Services, Inc. or its Affiliates. Why not shard?
Sharding motivator DocumentDB approach
Scale Reads Add up to 15 read replicas
Scale Writes Scale vertically with very low impact
Scale Storage Storage scales automatically to 64TB
© 2020, Amazon Web Services, Inc. or its Affiliates. Security and compliance
VPC Only Encrypted by Safe defaults Compliance default
Strict network isolation with Encryption at rest with Best practices are the PCI DSS, Amazon Virtual Private Cloud AWS KMS defaults ISO 9001, 27001, 27017, (VPC) Encryption in transit and 27018, with TLS SOC 1, 2 & 3, HIPAA
© 2020, Amazon Web Services, Inc. or its Affiliates. Backup and recovery
Automatic backups No performance 35 days of Archive impact PITR snapshots
Automatic, incremental, and Backups do not affect Point-in-time recovery Keep snapshots for as long as continuous backups database performance (PITR) for up to 35 days you need
"Adopting Amazon DocumentDB is a game-changer because we offload management, security, and backup of our MongoDB databases to AWS. With Amazon DocumentDB, we can add or scale instances in minutes, regardless of data size. Further, we get automatic backups and point-in-time restore capabilities, which far exceed other managed DB services at less cost.”
© 2020, Amazon Web Services, Inc. or its Affiliates. Pricing (us-east-1)
Instances: Size/hr * count (db.r5.large $0.277/hr)
Compute Writes I/O: Count ($0.20/million) Reads Reads Reads Distributed storage volume
Storage Storage: GB/mo ($0.10/GB)
Backup: GB/mo Amazon S3 (100% Free! then $0.021/GB) © 2020, Amazon Web Services, Inc. or its Affiliates. Migration Guide Learn how to migrate from MongoDB to Amazon DocumentDB
Offline migration using MongoDB utilities: Get started quickly, great for POCs
Online migration using AWS DMS Near-zero downtime migration
Hybrid migration leverages both solutions Best of both worlds
https://docs.aws.amazon.com/documentdb/latest/developerguide/docdb-migration.html All options support MongoDB on-premises and EC2, for both replica sets and sharded clusters
© 2020, Amazon Web Services, Inc. or its Affiliates. Purpose built The right tool for the right job
Thank you © 2020, Amazon Web Services, Inc. or its Affiliates.