DAT370 Best practices for DocumentDB

Joseph Idziorek, Principal Product Manager, Amazon DocumentDB

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda

Getting started

Migrations

Cluster sizing

Benchmarking

Endpoints, consistency, replica sets

Backup settings

Cost optimization

Related breakouts

DAT326 - Amazon DocumentDB Deep Dive
DAT372 - Migrating your to Amazon DocumentDB
DAT338-R - Hands-on workshop: How to migrate to Amazon DocumentDB
DAT338-R1 - Hands-on workshop: How to migrate to Amazon DocumentDB

When you should use a document database

• JSON data
• Flexible schema for fast iteration
• Highly available, durable
• Operational and analytics workloads
• Ad hoc query capabilities
• Flexible indexing

When are other databases more appropriate?

• Relational: database to enforce referential integrity
• Key-value: known, static access patterns for primary key lookups
• S3: large binary data
• In-memory: ultra-low latency, ephemeral data
• Graph: highly connected social data set
• Search: log analytics, full-text searches

Getting started

Getting started: https://docs.aws.amazon.com/documentdb/latest/developerguide/getting-started.html

Migrations
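One practice listed on this slide is to create indexes first, before importing data. A minimal sketch of that ordering follows. It assumes `target` behaves like a PyMongo `Collection` (only `create_index` and `insert_many` are called); the function name `migrate_collection`, the `index_specs` shape, and the batch size are hypothetical choices for illustration, not part of any AWS tool.

```python
def migrate_collection(target, index_specs, documents, batch_size=1000):
    """Create every index on the empty target first, then import in batches.

    target      -- assumed to expose create_index() and insert_many(),
                   as a PyMongo Collection does
    index_specs -- list of (keys, options) pairs, e.g.
                   ([("email", 1)], {"name": "email_1", "unique": True})
    documents   -- any iterable of dicts to import
    """
    # Step 1: build the indexes while the collection is still empty.
    created = [target.create_index(keys, **options)
               for keys, options in index_specs]

    # Step 2: import the data in fixed-size batches.
    inserted = 0
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            target.insert_many(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        target.insert_many(batch)
        inserted += len(batch)

    return created, inserted
```

With a real cluster, `target` would be something like `client["mydb"]["users"]`, where both names are placeholders.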

• Amazon DocumentDB is MongoDB-compatible
  • Supported APIs
  • Functional differences
  • Limits
  • Continually working backward from customers
• Migration methods
  • AWS Database Migration Service (DMS) – free
  • mongodump/mongorestore
  • Create indexes first, before importing data

Cluster sizing

[Diagram: one primary instance and two replica instances (compute) spread across AZ1, AZ2, and AZ3, on top of a distributed storage volume (storage). Writes go to the primary; reads are served by all three instances.]

Cluster sizing

Instances:          3, 2, 1
Environment:        production, dev/test
Availability goal:  99.99%, 99.9%, 99%
Durability:         highly durable

[Diagram: one primary instance and two replica instances spread across AZ1, AZ2, and AZ3, on top of a distributed storage volume. Writes go to the primary; reads are served by all three instances.]

Cluster sizing

Performance is a function of the instance size and the utilization of those instances:
• vCPU count
• Working set memory (2/3 of RAM)
• Read velocity
• Write velocity

R5 >> R4 instances
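The "working set memory (2/3 of RAM)" rule of thumb can be turned into a quick first-pass sizing check. The sketch below uses the db.r5 memory figures from this slide; the function name `smallest_class_for` is hypothetical, and it looks at memory only, so vCPU count and read/write velocity still need to be checked separately.

```python
# (class name, vCPU, memory in GiB) for the db.r5 classes listed on this slide
R5_CLASSES = [
    ("r5.large", 2, 16),
    ("r5.xlarge", 4, 32),
    ("r5.2xlarge", 8, 64),
    ("r5.4xlarge", 16, 128),
    ("r5.12xlarge", 48, 384),
    ("r5.24xlarge", 96, 768),
]


def smallest_class_for(working_set_gib, usable_fraction=2 / 3):
    """Return the smallest class whose usable memory holds the working set.

    usable_fraction reflects the slide's rule of thumb that roughly 2/3 of
    RAM is available for the working set. Returns None if nothing fits.
    """
    for name, _vcpus, memory_gib in R5_CLASSES:
        if memory_gib * usable_fraction >= working_set_gib:
            return name
    return None


# A 20 GiB working set does not fit in r5.large (about 10.7 GiB usable)
# but does fit in r5.xlarge (about 21.3 GiB usable).
choice = smallest_class_for(20)
```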

db.r5 – Current generation memory-optimized instance classes

Class          vCPU   Memory (GiB)   Network performance
r5.large          2             16   Up to 10 Gbps
r5.xlarge         4             32   Up to 10 Gbps
r5.2xlarge        8             64   Up to 10 Gbps
r5.4xlarge       16            128   Up to 10 Gbps
r5.12xlarge      48            384   10 Gbps
r5.24xlarge      96            768   25 Gbps

Benchmarking
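This slide asks you to weigh price/performance rather than raw throughput. One simple way to express that is cost per million operations, sketched below. The hourly costs and throughput figures are made-up inputs for illustration only; they are not measured results or published prices.

```python
def cost_per_million_ops(hourly_cost_usd, ops_per_second):
    """Price/performance: dollars spent per one million operations."""
    if ops_per_second <= 0:
        raise ValueError("ops_per_second must be positive")
    ops_per_hour = ops_per_second * 3600
    return hourly_cost_usd / ops_per_hour * 1_000_000


# Hypothetical benchmark results for two systems under the same workload.
system_a = cost_per_million_ops(hourly_cost_usd=1.00, ops_per_second=5_000)
system_b = cost_per_million_ops(hourly_cost_usd=2.50, ops_per_second=15_000)

# System B costs more per hour but less per operation.
cheaper = "B" if system_b < system_a else "A"
```

A comparison like this is only meaningful when both systems run with equivalent backup, encryption, durability, and network settings.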

• Benchmarking has been around for a long time
• Objective measure for relative performance, typically between two or more systems
• Commonly seen: YCSB
• Limitations
  • No substitute for your actual workload
  • Strive for an apples-to-apples comparison
    • Backup, encryption, durability settings, network, etc.
  • Take into account price/performance and TCO

Endpoints

Cluster endpoint    Reader endpoint

[Diagram: each of the three instances (one primary, two replicas) has its own instance endpoint. Writes go to the primary; reads are served by all three instances, on top of a distributed storage volume spanning AZ1, AZ2, and AZ3.]

Consistency

• Primary reads are read-after-write consistent
• Replica reads are eventually consistent
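The difference between the two read paths can be illustrated with a toy in-memory model. This is not how Amazon DocumentDB is implemented; `ToyCluster` and its methods are hypothetical names, and replication is triggered by hand so the stale read is easy to observe. The slide's diagram puts typical replica lag at roughly 10-100 ms.

```python
class ToyCluster:
    """Toy model: one primary and one replica that applies writes later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet applied on the replica

    def write(self, key, value):
        # Writes always go to the primary and are queued for the replica.
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key, from_primary=True):
        store = self.primary if from_primary else self.replica
        return store.get(key)

    def replicate(self):
        # Stand-in for the replication lag elapsing.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


cluster = ToyCluster()
cluster.write("order-1", "shipped")

fresh = cluster.read("order-1", from_primary=True)    # read-after-write
stale = cluster.read("order-1", from_primary=False)   # replica not caught up
cluster.replicate()
caught_up = cluster.read("order-1", from_primary=False)
```

In practice, a read that must see the caller's own latest write should be sent to the primary; reads that tolerate slightly stale data can go to replicas.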

[Diagram: state propagates from the primary instance to each replica instance, typically in ~10-100 ms. Writes go to the primary; reads are served by all three instances, on top of a distributed storage volume spanning AZ1, AZ2, and AZ3.]

Connecting as a replica set

Cluster endpoint

[Diagram: one primary instance and two replica instances across AZ1, AZ2, and AZ3, on top of a distributed storage volume. Writes go to the primary; reads are served by all three instances.]

Connecting as a replica set

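import pymongo

# <username> and <password> are placeholders for your cluster credentials.
# ssl_ca_certs is the PyMongo 3.x option name; PyMongo 4 uses tlsCAFile.
client = pymongo.MongoClient(
    'mongodb://<username>:<password>@mycluster.node.us-east-1.docdb.amazonaws.com:27017/'
    '?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem'
    '&replicaSet=rs0&readPreference=secondaryPreferred')

Scaling reads

replicaSet=rs0
readPreference=secondaryPreferred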
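The replica set connection string can also be assembled programmatically, which helps when credentials contain characters such as `@` or `/` that must be percent-encoded. This is a sketch using only the Python standard library; `build_docdb_uri` is a hypothetical helper, the credentials are placeholders, and the query parameters mirror the ones shown on the slide.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(username, password, host, port=27017,
                    read_preference="secondaryPreferred",
                    ca_file="rds-combined-ca-bundle.pem"):
    """Build a replica-set-mode connection string like the one on the slide."""
    query = urlencode({
        "ssl": "true",
        "ssl_ca_certs": ca_file,          # option name as used on the slide
        "replicaSet": "rs0",
        "readPreference": read_preference,
    })
    # Percent-encode credentials so reserved characters do not break the URI.
    return (f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
            f"@{host}:{port}/?{query}")


# Placeholder credentials and the example host name from the slide.
uri = build_docdb_uri("appuser", "p@ss/word",
                      "mycluster.node.us-east-1.docdb.amazonaws.com")
```

The resulting string can be passed to `pymongo.MongoClient(uri)`.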

[Diagram: one primary instance and two replica instances across AZ1, AZ2, and AZ3, on top of a distributed storage volume. Writes go to the primary; reads are served by all three instances.]

Connecting as a replica set

Read preference – five options to choose from:
1. Primary (default)
2. Primary preferred (recommended)
3. Secondary
4. Secondary preferred
5. Nearest

Backup settings

• Backup retention period: Period of time between 1 and 35 days during which you can perform a point-in-time restore
• Backup window: Period of time during which automatic snapshots are taken
• Automatic snapshot: Full backup, automatically created
• Manual snapshot: Customer-initiated, long-term retention

Backup window: 03:00-11:00
Backup retention period: 4 (days)

[Timeline legend: backup window, retention period, automatic snapshot, manual snapshot]
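To make the retention period concrete, the sketch below computes the window in which a point-in-time restore is possible for a given retention setting. The helper names are hypothetical, and the model is simplified: it treats "now" as the latest restorable time, whereas a real cluster's latest restorable time may trail the current time slightly.

```python
from datetime import datetime, timedelta, timezone


def restore_window(now, retention_days):
    """Return (earliest, latest) restorable times for a retention setting."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention period must be between 1 and 35 days")
    return now - timedelta(days=retention_days), now


def can_restore_to(target, now, retention_days):
    """True if target falls inside the point-in-time restore window."""
    earliest, latest = restore_window(now, retention_days)
    return earliest <= target <= latest


# With a 4-day retention period, on 8/5 you can restore back to 8/1.
now = datetime(2019, 8, 5, 12, 0, tzinfo=timezone.utc)
earliest, latest = restore_window(now, retention_days=4)
ok = can_restore_to(datetime(2019, 8, 3, 9, 30, tzinfo=timezone.utc), now, 4)
too_old = can_restore_to(datetime(2019, 7, 30, tzinfo=timezone.utc), now, 4)
```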

[Timeline: Cluster #1 shown for 8/1-8/5, then two days later (8/1-8/7) and four days later (8/1-8/9), with the retention period marked on each timeline.]

Cost optimization

[Diagram: one primary instance and two replica instances (compute) across AZ1, AZ2, and AZ3, on top of a distributed storage volume (storage). Writes go to the primary; reads are served by all three instances.]

Cost optimization

• Start/stop cluster: One-click stop/start to save on instance costs
• Per-second billing: Instance hours are billed in 1-second increments
• Single instance cluster: Single instance cluster is highly durable
• Billing alerts: Set up alerts to protect against surprises

Flexible
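The cost levers above (stopping a cluster when idle, per-second billing, and running fewer instances for dev/test) can be combined into a rough instance-cost estimate. In the sketch below, the hourly rate is a made-up placeholder, not a published price, and `instance_cost` is a hypothetical helper. Storage, I/O, and backup charges are not modeled.

```python
SECONDS_PER_HOUR = 3600


def instance_cost(hourly_rate_usd, running_seconds, instance_count=1):
    """Instance cost when usage is billed in 1-second increments."""
    return hourly_rate_usd * instance_count * running_seconds / SECONDS_PER_HOUR


HYPOTHETICAL_RATE = 0.30  # USD per instance-hour, for illustration only

# Production: three instances running for a full 30-day month.
always_on = instance_cost(HYPOTHETICAL_RATE, 30 * 24 * SECONDS_PER_HOUR,
                          instance_count=3)

# Dev/test: one instance, started for 10 hours on each of 22 working days
# and stopped the rest of the time.
dev_test = instance_cost(HYPOTHETICAL_RATE, 22 * 10 * SECONDS_PER_HOUR,
                         instance_count=1)
```

Under these assumed inputs the production cluster comes to about 648 USD in instance charges and the dev/test cluster to about 66 USD.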

Instances:          3, 2, 1
Environment:        production, dev/test
Availability goal:  99.99%, 99.9%, 99%
Durability:         highly durable

[Diagram: one primary instance and two replica instances spread across AZ1, AZ2, and AZ3, on top of a distributed storage volume. Writes go to the primary; reads are served by all three instances.]

Learn databases with AWS Training and Certification

Resources created by the experts at AWS to help you build and validate database skills

25+ free digital training courses cover topics and services related to databases, including:
• Amazon DocumentDB
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon RDS

Validate expertise with the new AWS Certified Database - Specialty beta exam

Visit aws.training

Thank you!