DocumentDB (with MongoDB compatibility) Deep Dive

Jeff Duffy DocumentDB Specialist SA

© 2020, , Inc. or its Affiliates. AWS: Purpose-built

Relational Key-value Document In-memory Graph Time-series Ledger Wide Column

Amazon Relational Amazon Amazon Amazon Amazon Service Amazon Amazon Amazon DynamoDB DocumentDB ElastiCache Neptune (RDS) Timestream Quantum Keyspaces Ledger Aurora Community Commercial Redis Memcached Database

© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DocumentDB Fast, scalable, and fully managed MongoDB-compatible database service

MongoDB Fast Scalable Fully managed compatible

Millions of requests per second Separation of compute and Managed by AWS: Compatible with MongoDB 3.6; with millisecond latency storage scales both no hardware provisioning; use the same SDKs, tools, and independently; scale out to auto patching, quick setup, applications with Amazon 15 read replicas in minutes secure, and automatic DocumentDB backups

Purpose-built document database engineered for the cloud

© 2020, Amazon Web Services, Inc. or its Affiliates. Customers

Learn more: https://aws.amazon.com/documentdb/customers/

© 2020, Amazon Web Services, Inc. or its Affiliates. Document Databases

© 2020, Amazon Web Services, Inc. or its Affiliates. Evolution of document databases !=

JSON RelationalJSON (Client)

(App) (Database)

JSON became Friction when Object-relational Document the de facto converting JSON mappers (ORMs) databases solved data interchange to the relational were created to the problem format model minimize friction

© 2020, Amazon Web Services, Inc. or its Affiliates. Why document databases?

Documents map naturally to how humans model data Denormalized data Normalized data model model Documents (objects/JSON) are common application data models

Document databases store JSON-like documents

Document databases provide flexible schema and indexing

Ad hoc querying and aggregations

© 2020, Amazon Web Services, Inc. or its Affiliates. Use cases for document data

Content Mobile Personalization management

Catalog Retail and User profiles marketing

© 2020, Amazon Web Services, Inc. or its Affiliates. Document Use Case: Gaming User Profiles

{ userid: 181276, { username: "sue1942", userid: 181276, name: {first: "Susan", username: "sue1942", last: "Benoit"}, name: {first: "Susan", ExplodingSnails: { last: "Benoit"} hi_score: 3185400, } global_rank: 5139, bonus_levels: true },} } promotions: ["new user","5%","snail lover"] }

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenges scaling document databases

© 2020, Amazon Web Services, Inc. or its Affiliates. Traditional Database Architecture Challenges

Application

Single monolithic architectures API

Query processor Not designed for Caching the cloud

Logging

Storage Scale monolithically Fail monolithically

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Add read capacity on-demand

Replication

Disk Disk Disk Disk Node 1 Node 2 Node 3 Node 4

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Recover quickly from node failure

Replication

Disk Disk Disk Disk Node 1 Node 2 Node 3 Node 3’

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Scale storage as data grows

Node Node Node

Storage Storage Storage Volume Volume Volume

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Backup data without affecting performance

Operational Backup Node Node

Snapshot Snapshot

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Data Durability

Replication Replication Node Node Node

Storage Volume Storage Volume Storage Volume

© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DocumentDB Purpose-built and engineered for the cloud

© 2020, Amazon Web Services, Inc. or its Affiliates. API Scale compute Compute layer Query processor

Caching

Logging

Decouple computeStorage and storage

Scale storage Storage layer

© 2020, Amazon Web Services, Inc. or its Affiliates. Amazon DocumentDB: Cloud Native Architecture

Instance Instance Instance (Replica) (Primary) (Replica) Compute Writes Reads Reads Reads

Distributed storage volume

Storage

AZ1 AZ2 AZ3

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Add read capacity on-demand

Instance Instance Instance Instance (Replica) (Primary) (Replica) (Replica) Compute Writes

~8-10 mins Reads Reads Reads Reads

Distributed storage volume

Storage

AZ1 AZ2 AZ3

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Quickly recover from node failure

~30 secs

Instance Instance InstanceInstance Instance (Replica) (Primary) (Replica)(Primary) (Replica) Writes Writes ~8-10 mins Reads Reads Reads Reads Reads Distributed storage volume

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Scale storage as data grows

Instance Instance Instance Compute (Replica) (Primary) (Replica) Writes Reads Reads Reads Distributed storage volume

Storage

Storage

Grows Automatically from 10GB-64TB © 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Backup data without affecting performance

Compute Instance Instance Instance (Replica) (Primary) (Replica) Writes Reads Reads Reads Distributed storage volume

Storage

Continuous Backups Snapshots (PITR) (Automated and Manual)

© 2020, Amazon Web Services, Inc. or its Affiliates. Challenge: Data Durability

Instance Instance Instance Instances: 321 (Replica) (Primary) (Replica) Environment: productiondev/test

Availability goal: 99.99%99.9%99% Writes Durability: highly durable Reads Reads Reads

Distributed storage volume

AZ1 AZ2 AZ3

© 2020, Amazon Web Services, Inc. or its Affiliates. Service overview

© 2020, Amazon Web Services, Inc. or its Affiliates. Fully managed

Automatic failover Automated AWS Integration Pay-as-you-go and recovery maintenance pricing

Failed instances are Automated upgrades and CloudWatch, CloudTrail, Per-second instance automatically replaced patching CloudFormation, VPC, billing, no long-term Secrets Manager commitment

“Our engineering teams now spend less time on operations like backup scripts, scale testing, and managing high availability and instead are able to focus on developing new capabilities for our customers.”

© 2020, Amazon Web Services, Inc. or its Affiliates. MongoDB compatible

MongoDB 3.6 Same drivers, Replica sets tools

Compatible with MongoDB Use the same MongoDB Read scaling is easy with Community Edition 3.6 drivers and tools with replica set emulation Amazon DocumentDB

“Getting started with DocumentDB was also simple and we migrated our application in a couple of days without needing to make any meaningful code changes. Everything just worked.”

© 2020, Amazon Web Services, Inc. or its Affiliates. Why choose the MongoDB API?

• Over 60 million downloads • Easy for developers to get started • MongoDB API is powerful • Managing MongoDB is hard

Source: http://db-engines.com

© 2020, Amazon Web Services, Inc. or its Affiliates. Scalable

Scale out Scale up Autoscale Load in minutes in minutes storage balancing

Scale to 15 read Scale from Storage Scale reads across replicas 16 to 768 GiB automatically replicas of RAM grows from 10 GB to 64 TB

© 2020, Amazon Web Services, Inc. or its Affiliates. Working Backwards

FEB 2019 JAN 2019 MAR 2019 APR 2019 MAY 2019 JUN 2019 JUL 2019 AUG 2019 OCT 2019

Launch DDL Frankfurt Aggregation Per-second Sydney Start/Stop Slow query Change Streams logger Auditing Secrets Operators Billing Cluster Mumbai Manager Tokyo Deletion Aggregation Paris Protection Operators Seoul Singapore London

DEC 2019 FEB 2020 MAR 2020 APR 2020 … with more to come in 2020! Aggregation Canada RBAC Glue ETL Operators

© 2020, Amazon Web Services, Inc. or its Affiliates. Why not shard?

Sharding motivator DocumentDB approach

Scale Reads Add up to 15 read replicas

Scale Writes Scale vertically with very low impact

Scale Storage Storage scales automatically to 64TB

© 2020, Amazon Web Services, Inc. or its Affiliates. Security and compliance

VPC Only Encrypted by Safe defaults Compliance default

Strict network isolation with Encryption at rest with Best practices are the PCI DSS, Amazon AWS KMS defaults ISO 9001, 27001, 27017, (VPC) Encryption in transit and 27018, with TLS SOC 1, 2 & 3, HIPAA

© 2020, Amazon Web Services, Inc. or its Affiliates. Backup and recovery

Automatic backups No performance 35 days of Archive impact PITR snapshots

Automatic, incremental, and Backups do not affect Point-in-time recovery Keep snapshots for as long as continuous backups database performance (PITR) for up to 35 days you need

"Adopting Amazon DocumentDB is a game-changer because we offload management, security, and backup of our MongoDB databases to AWS. With Amazon DocumentDB, we can add or scale instances in minutes, regardless of data size. Further, we get automatic backups and point-in-time restore capabilities, which far exceed other managed DB services at less cost.”

© 2020, Amazon Web Services, Inc. or its Affiliates. Pricing (us-east-1)

Instances: Size/hr * count (db.r5.large $0.277/hr)

Compute Writes I/O: Count ($0.20/million) Reads Reads Reads Distributed storage volume

Storage Storage: GB/mo ($0.10/GB)

Backup: GB/mo Amazon S3 (100% Free! then $0.021/GB) © 2020, Amazon Web Services, Inc. or its Affiliates. Migration Guide Learn how to migrate from MongoDB to Amazon DocumentDB

Offline migration using MongoDB utilities: Get started quickly, great for POCs

Online migration using AWS DMS Near-zero downtime migration

Hybrid migration leverages both solutions Best of both worlds

https://docs.aws.amazon.com/documentdb/latest/developerguide/docdb-migration.html All options support MongoDB on-premises and EC2, for both replica sets and sharded clusters

© 2020, Amazon Web Services, Inc. or its Affiliates. Purpose built The right tool for the right job

Thank you © 2020, Amazon Web Services, Inc. or its Affiliates.