AWS DynamoDB • Pay-As-You-Go Model



Best Practices by Application Scenario

Lang Fei, Big Data Systems Engineer, Amazon Web Services

2017.7

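• NoSQL database

• Fully managed

• Seamless scaling at massive scale

• Millisecond response times

• High availability

• Works seamlessly with other AWS services

• Pay according to usage

Amazon's Journey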

• December 6, 2004: Amazon.com website outage
• December 2007: Dynamo paper published
• January 2012: DynamoDB launched
• Q3 2016: Leader in the Gartner Magic Quadrant (MQ) and the Forrester Wave
• Today: a Tier-0 service that most of Amazon's internal architecture depends on

[Figure: The Forrester Wave™: Big Data NoSQL, Q3 2016]

The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.

DynamoDB at a Glance
• Available in 16 regions worldwide
• Tables serving millions of requests; many tables larger than 100 TB
• Tens of petabytes of data; trillions of requests per month

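[Map: DynamoDB regions]
Europe: Dublin, London, Frankfurt
China
North America: US East – N. Virginia, US East – Ohio, US West – Oregon, US West – San Francisco, Canada – Central, AWS GovCloud (US)
Asia Pacific: Tokyo, Singapore, Sydney, Mumbai, Seoul
South America: Sao Paulo

Multi-AZ & Cross-Region Replication (Available Today)

Features
1. By default, data is replicated across 3 Availability Zones (AZs) within a region
2. Cross-region replication can be set up with the open-source cross-region replication library together with existing AWS services

Key Benefits
• Highly available data at no extra charge
• Data stays available even if an entire AZ goes down
• Horizontal scaling, by directing read traffic to one of the replicas

Why Use DynamoDB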

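1. Highly scalable
2. Consistent performance
3. Secure and observable
4. Highly available, with data protection
5. Fully managed & TCO
6. Compatible with development platforms & tools

Scalability

[Chart: Writes and Reads throughput over time; increases of >300% and >200% from baseline]

Auto Scaling (Available Today)

Features
• Fully managed; reads and writes are configured separately, for both tables and GSIs
• You only set a target utilization percentage and the maximum/minimum throughput
• Accessible through the DynamoDB console, CLI, and SDKs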
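A minimal sketch of the target-utilization setting described above. It assumes a simplified rule: provision enough capacity that current consumption sits at the target percentage, then clamp the result to the configured minimum and maximum. The function name `target_capacity` and the sample numbers are hypothetical. The real Application Auto Scaling policy also involves CloudWatch alarms, cooldowns, and limits on how often capacity can be decreased, and this sketch ignores all of those.

```python
def target_capacity(consumed: int, target_pct: int,
                    min_capacity: int, max_capacity: int) -> int:
    """Simplified target-tracking rule (an assumption, not AWS's exact algorithm).

    Picks the capacity at which `consumed` units would equal `target_pct`
    percent of provisioned capacity, then clamps it to [min, max].
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    # Integer ceiling of consumed * 100 / target_pct (avoids float rounding).
    desired = -(-consumed * 100 // target_pct)
    return max(min_capacity, min(max_capacity, desired))


# Consumed capacity sampled over time (hypothetical numbers), 70% target.
for consumed in (40, 350, 900, 120):
    print(consumed, "->", target_capacity(consumed, 70, 5, 1000))
```

Capacity follows usage up and down, but it never leaves the user-defined bounds. This is the "only set a target % and min/max" model from the feature list.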

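[Chart: Without Auto Scaling]

Key Benefits
• Eliminates all manual throughput forecasting and provisioning
• Raises or lowers throughput according to actual usage
• Throughput adjustments are visible in the console

[Chart: With Auto Scaling]

Agenda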

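• Core components, APIs, secondary indexes (LSI/GSI)
• DynamoDB partitioning behavior and performance scaling
• Throttling and how to resolve it
• Customer cases and best practices (with a preview of new features)
• *Q&A

Core components, APIs, secondary indexes (LSI/GSI)

[Diagram: a Table is made up of Items; each Item is made up of Attributes]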

Partition key (required)
• Key-value access pattern
• Determines data distribution

Sort key (optional)
• Models 1:N relationships
• Enables rich query capabilities

For all items that share the same partition key, the sort key supports:
• ==, <, >, >=, <=
• "begins with"
• "between"
• sorted results
• counts
• top/bottom N values
• paged responses

API Operations on Tables, Items, and DynamoDB Streams

Control plane (Table)
• CreateTable
• DescribeTable
• ListTables
• UpdateTable
• DeleteTable

Data plane (Item)
• PutItem
• BatchWriteItem (up to 25 items)
• GetItem
• BatchGetItem (up to 100 items)
• Query
• Scan
• UpdateItem
• DeleteItem
• *BatchWriteItem to delete

DynamoDB Streams API
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords

Local Secondary Index (LSI)

10 GB max per partition key, i.e. LSIs limit the number of sort keys!

[Diagram: a DynamoDB table with partition key A1 and sort key A2; each LSI keeps A1 as its partition key and uses another attribute (A3, A4, or A5) as its sort key]

• The sort key can be any attribute
• The index is scoped to a single partition (key)

Global Secondary Index (GSI)

GSIs have their own RCUs/WCUs

[Diagram: a DynamoDB table (partition key A1, sort key A2) and GSIs with their own partition key (e.g. A3), which carry the table key and project ALL, INCLUDE, or KEYS_ONLY attributes]

• Any partition (+ sort) key
• The index spans the entire table (unlike an LSI)
• GSIs can be created or deleted at any time

[Diagram: LSIs and GSIs side by side on one DynamoDB table; LSI limit 5, GSI limit 5]

Partitions

[Diagram: a table's keyspace split at 00, 55, AA, FF into Partition A, Partition B, and Partition C; each partition owns 33.33% of the keyspace and 33.33% of the provisioned capacity]

[Diagram: CustomerOrdersTable keyspace, from Hash.MIN = 0 to Hash.MAX = FF]

Partitioning Behavior

[Diagram: CustomerOrdersTable items are placed in the keyspace by hashing the partition key OrderId: Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD. Example item: OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E]; the other items carry ASINs B00U3FPN4U and B00OQVZDJM.]
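To illustrate the partitioning behavior above, here is a sketch that maps a hash value onto Partitions A, B, and C. It uses the key ranges from the replication diagram (00–54, 55–A9, AA–FF). The hash values are copied from the slide. DynamoDB's actual hash function and partition boundaries are internal, so `partition_for` is purely hypothetical.

```python
from bisect import bisect_right

# Lower bounds of Partitions B and C, from the slide's ranges 00-54, 55-A9, AA-FF.
BOUNDARIES = [0x55, 0xAA]
PARTITIONS = ["Partition A", "Partition B", "Partition C"]

def partition_for(hash_value: int) -> str:
    """Return the partition whose key range contains hash_value (0x00-0xFF)."""
    if not 0x00 <= hash_value <= 0xFF:
        raise ValueError("hash_value must lie in the 0x00-0xFF keyspace")
    return PARTITIONS[bisect_right(BOUNDARIES, hash_value)]


# Hash values copied from the slide; DynamoDB's real hash function is internal.
slide_hashes = {1: 0x7B, 2: 0x48, 3: 0xCD}
for order_id, h in slide_hashes.items():
    print(f"OrderId {order_id}: Hash = {h:02X} -> {partition_for(h)}")
```

Each partition owns a contiguous slice of the hashed keyspace, so locating an item is a range lookup on its hash rather than on the raw key. That is why sequential OrderIds can land on different partitions.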

Split for provisioned capacity

Each partition has a throughput ceiling. A partition splits when $3w + 1r > 3000$ (*subject to change), where w = WCUs and r = RCUs.

[Diagram: as provisioned throughput grows, CustomerOrdersTable's Partitions A, B, and C (33.33% of the keyspace and of provisioned capacity each) split over time into Partitions A through F (16.66% each)]

Split for partition size

Each partition can store 10 GB of data; once that limit is exceeded, the partition may split.

[Diagram: as data grows, Partition C (33.33%) splits into Partitions D and E (16.66% of the keyspace and of provisioned capacity each), while Partitions A and B keep 33.33% each]
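This back-of-the-envelope estimate combines the two split rules above. It assumes, as the slide notes (subject to change), a per-partition throughput budget of $3w + 1r \le 3000$ and 10 GB of storage per partition. It also assumes the partition count is whichever rule demands more. `estimate_partitions` is a hypothetical helper: DynamoDB manages partitions internally, and these thresholds describe its 2017 behavior.

```python
def estimate_partitions(rcu: int, wcu: int, size_gb: int) -> int:
    """Rough 2017-era partition estimate (thresholds subject to change).

    Throughput rule: a partition can serve 3*w + 1*r <= 3000 (w = WCUs, r = RCUs).
    Size rule: a partition stores at most 10 GB.
    """
    if min(rcu, wcu, size_gb) < 0:
        raise ValueError("inputs must be non-negative")
    by_throughput = -(-(3 * wcu + rcu) // 3000)  # ceil((3w + r) / 3000)
    by_size = -(-size_gb // 10)                  # ceil(size_gb / 10)
    return max(1, by_throughput, by_size)


print(estimate_partitions(rcu=3000, wcu=300, size_gb=5))   # throughput-bound
print(estimate_partitions(rcu=1000, wcu=100, size_gb=45))  # size-bound
```

Provisioned throughput is divided evenly among partitions. Either kind of split therefore lowers the share each partition gets, which is why a single hot key can be throttled even when the table as a whole has spare capacity.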

Partition Replication (Redundant Storage)

[Diagram: 3-way replication of CustomerOrdersTable across Availability Zones A, B, and C. Each AZ holds a replica of Partition A (00–54), Partition B (55–A9), and Partition C (AA–FF) on separate hosts; each partition is provisioned with 1000 RCUs and 100 WCUs. Example item: OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E], Hash(1) = 7B.]

Throttling: How Is It Detected?

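• CloudWatch throttling metrics: the table's actual consumed throughput exceeds its provisioned throughput
• Throttling that originates in individual partitions: internal tool, heatmap analysis (hot keys)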

[Heatmap: Top Items, heat over time]

Hot key example: SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
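The heatmap tool mentioned above is internal to AWS. As a rough stand-in, this sketch counts requests per partition key in a hypothetical request log and flags any key that receives more than a chosen share of the traffic, as POPULAR_PRODUCT would. `hot_keys`, the log format, and the 50% threshold are all assumptions.

```python
from collections import Counter

def hot_keys(request_log, threshold=0.5):
    """Return (key, share) pairs for partition keys above `threshold` of traffic.

    request_log is a hypothetical list of partition-key values, one per request.
    """
    counts = Counter(request_log)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(key, n / total) for key, n in counts.most_common()
            if n / total > threshold]


log = ["POPULAR_PRODUCT"] * 8 + ["PRODUCT_42", "PRODUCT_7"]
print(hot_keys(log))
```

All requests for one key hit the same partition, so a key that dominates the traffic is a throttling candidate no matter how much capacity the table has in total.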

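[Diagram: users read the ProductCatalog table through DynamoDB (Partition 1, Partition 2); charts show requests per second and request distribution per partition key (item primary key), comparing DynamoDB requests with cache hits]

DynamoDB Accelerator (DAX) (Available Today)

Features
• Fully managed: users do not have to worry about upgrades or software management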

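• Flexible: one DAX cluster can cache multiple tables
• Highly available: fault tolerant, replicated across multiple Availability Zones
• Scalable: scales out horizontally to up to 10 cache replicas based on load
• Manageable: works seamlessly with other AWS services: Amazon CloudWatch, Tagging for DynamoDB, AWS Console
• Secure: supports Amazon VPC, AWS IAM, AWS CloudTrail, AWS Organizations

[Diagram: Your Applications → DynamoDB Accelerator (DAX) (New!) → DynamoDB]

DynamoDB Accelerator (DAX) (Available Today)

Key Benefits
• Extreme performance: read latency in microseconds from a single DAX cluster
• Easy to use: compatible with the DynamoDB API, so existing applications need minimal code changes
• Lower cost: reduces direct reads against DynamoDB tables and therefore the throughput you must provision; well suited to tables holding hot data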
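To show why caching relieves a hot key, here is a minimal read-through cache sketch: repeated reads of one item reach the table once and are then served from memory. `ReadThroughCache` is hypothetical. It does not model the DAX client, its item and query caches, TTLs, or write-through behavior.

```python
class ReadThroughCache:
    """Hypothetical read-through cache placed in front of a key-value table.

    Illustrates the caching idea behind DAX only; this is NOT the DAX client API.
    No eviction or TTL is modeled.
    """

    def __init__(self, table: dict):
        self._table = table      # stands in for a DynamoDB table
        self._cache = {}         # key -> cached item (misses cached as None)
        self.table_reads = 0     # requests that reached the table
        self.cache_hits = 0      # requests served from the cache

    def get_item(self, key):
        if key in self._cache:
            self.cache_hits += 1
            return self._cache[key]
        self.table_reads += 1
        item = self._table.get(key)
        self._cache[key] = item
        return item


product_catalog = {"POPULAR_PRODUCT": {"Id": "POPULAR_PRODUCT", "Description": "..."}}
cache = ReadThroughCache(product_catalog)
for _ in range(1000):                 # a burst of reads on one hot key
    cache.get_item("POPULAR_PRODUCT")
print(cache.table_reads, cache.cache_hits)  # 1 999
```

Only the first read consumes table capacity. The "lower cost" benefit rests on this effect: the table no longer needs to be provisioned for the hot key's full read rate.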

Messaging App
• Large items
• Filters vs. indexes
• M:N modeling: inbox and outbox

[Diagram: the Messages app shows David's Inbox and Outbox]

Inbox: SELECT * FROM Messages WHERE Recipient='David' ORDER BY Date DESC LIMIT 50
Outbox: SELECT * FROM Messages WHERE Sender='David' ORDER BY Date DESC LIMIT 50

Messages table: a mix of large and small attributes

Messages Table
Recipient | Date | Sender | Message
David | 2016-10-02 | Bob | …
… 48 more messages for David …
David | 2016-10-03 | Alice | …
Alice | 2016-09-28 | Bob | …
Alice | 2016-10-01 | Carol | …
(Many more messages)

The Inbox query returns 50 items × 256 KB each, because message bodies and attachments are very large.

Query Cost Calculation

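Query cost = items evaluated by the query × average item size × conversion ratio × eventually consistent reads factor:

$50 \times 256\ \text{KB} \times \frac{1\ \text{RCU}}{4\ \text{KB}} \times \frac{1}{2} = 1600\ \text{RCUs}$

Separating Out the Large Data (uniformly distributes large item reads)
1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes)
2. BatchGetItem Messages: 1600 RCUs (50 separate items at 256 KB)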
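The two read costs above can be reproduced with DynamoDB's documented rounding rules. A Query sums the sizes of all returned items before rounding up to 4 KB, while BatchGetItem rounds each item up separately. Eventually consistent reads cost half. The helper names `query_rcus` and `batch_get_rcus` are illustrative only.

```python
import math

BLOCK = 4 * 1024  # one RCU covers a strongly consistent read of up to 4 KB

def _apply_consistency(units: int, eventually_consistent: bool) -> float:
    # Eventually consistent reads cost half as much.
    return units / 2 if eventually_consistent else float(units)

def query_rcus(item_sizes, eventually_consistent=True):
    """Query: the sizes of all returned items are summed, then rounded up to 4 KB."""
    units = math.ceil(sum(item_sizes) / BLOCK)
    return _apply_consistency(units, eventually_consistent)

def batch_get_rcus(item_sizes, eventually_consistent=True):
    """BatchGetItem: each item is rounded up to 4 KB on its own."""
    units = sum(math.ceil(size / BLOCK) for size in item_sizes)
    return _apply_consistency(units, eventually_consistent)


print(query_rcus([128] * 50))              # Inbox-GSI query: 1.0
print(batch_get_rcus([256 * 1024] * 50))   # Messages bodies: 1600.0
```

The large bodies still cost 1600 RCUs. However, those reads are now spread across 50 different items (and so across partitions) instead of being concentrated under one recipient's partition key. Listing the inbox itself costs only 1 RCU.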

Inbox-GSI
Recipient | Date | Sender | Subject | MsgId
David | 2016-10-02 | Bob | Hi!… | afed
David | 2016-10-03 | Alice | RE: The… | 3kf8
Alice | 2016-09-28 | Bob | FW: Ok… | 9d2b
Alice | 2016-10-01 | Carol | Hi!... | ct7r

Messages Table
MsgId | Body
9d2b | …
3kf8 | …
ct7r | …
afed | …

Simplified Writes
PutItem { MsgId: 123, Body: ..., Recipient: Steve, Sender: David, Date: 2016-10-23, ... }

[Diagram: Messaging app with a Messages table and an Inbox global secondary index, giving David an Inbox and an Outbox]
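This in-memory model shows how a single PutItem serves both views. The assumption is an Inbox GSI keyed on Recipient and an Outbox GSI keyed on Sender, each sorted by Date. `MessagesTable` and its methods are hypothetical. Real GSIs are maintained by DynamoDB asynchronously, whereas here they are updated inline.

```python
from collections import defaultdict

class MessagesTable:
    """Hypothetical model: one Messages table plus Inbox and Outbox "GSIs".

    Inbox GSI:  partition key Recipient, sort key Date.
    Outbox GSI: partition key Sender,    sort key Date.
    """

    def __init__(self):
        self.items = {}                   # MsgId -> item
        self.inbox = defaultdict(list)    # Recipient -> [(Date, MsgId)]
        self.outbox = defaultdict(list)   # Sender -> [(Date, MsgId)]

    def put_item(self, item):
        old = self.items.get(item["MsgId"])
        if old is not None:  # overwrite: drop the old index entries first
            self.inbox[old["Recipient"]].remove((old["Date"], old["MsgId"]))
            self.outbox[old["Sender"]].remove((old["Date"], old["MsgId"]))
        self.items[item["MsgId"]] = item
        self.inbox[item["Recipient"]].append((item["Date"], item["MsgId"]))
        self.outbox[item["Sender"]].append((item["Date"], item["MsgId"]))

    @staticmethod
    def _latest(entries, limit):
        # ORDER BY Date DESC LIMIT n (ISO dates sort correctly as strings).
        return [msg_id for _, msg_id in sorted(entries, reverse=True)[:limit]]

    def query_inbox(self, recipient, limit=50):
        return self._latest(self.inbox[recipient], limit)

    def query_outbox(self, sender, limit=50):
        return self._latest(self.outbox[sender], limit)


table = MessagesTable()
table.put_item({"MsgId": "123", "Body": "...", "Recipient": "Steve",
                "Sender": "David", "Date": "2016-10-23"})
print(table.query_outbox("David"))   # ['123']
print(table.query_inbox("Steve"))    # ['123']
```

The application writes each message once. The M:N relationship between senders and recipients is then served by two index lookups instead of a Scan with a filter.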

[Diagram: Inbox global secondary index, Messages table, Outbox global secondary index]

• Distribute the large data
• Reduce the size of one-to-many data
• Create GSIs
• Use GSIs to handle the M:N relationship between senders and recipients
• Accessing many large items at the same time

Real-Time Voting

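Write-heavy items

Voting System Design Requirements
• Each account may vote only once
• A vote, once cast, cannot be changed
• Real-time analytics
• Voter distribution statistics

Real-Time Voting System Architecture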

[Diagram: Voters → Voting App → RawVotes Table and AggregateVotes Table]

Scaling Bottleneck

[Diagram: Voters write votes for Candidate A and Candidate B; the table is provisioned with 200,000 WCUs spread over Partitions 1 … K … M … N at 1000 WCUs each, but each candidate's item sits on a single partition]
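The bottleneck above comes down to simple arithmetic. An item lives on exactly one partition, so the write rate for a single key is capped by that partition's WCUs, whatever the table total is. This sketch computes that upper bound for 1 KB writes. It assumes the best case, in which every hot key sits on a different partition. `max_sustained_writes` is a hypothetical helper.

```python
def max_sustained_writes(table_wcu: int, partition_wcu: int, hot_keys: int) -> int:
    """Upper bound on writes/second (1 KB items) when all traffic targets `hot_keys` items.

    Every item lives on exactly one partition, so each key can use at most
    partition_wcu. Best case assumed: each hot key sits on a different partition.
    """
    return min(table_wcu, hot_keys * partition_wcu)


# Slide figures: 200,000 WCUs provisioned, 1000 WCUs per partition.
print(max_sustained_writes(200_000, 1000, hot_keys=2))    # 2 candidates: 2000
print(max_sustained_writes(200_000, 1000, hot_keys=20))   # 10 shards each: 20000
```

With only two candidate items, at most 2,000 of the 200,000 provisioned WCUs can ever be used. Spreading each candidate over more keys is the only way to reach the rest.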

Write Sharding

[Diagram: a Voter's write goes to one of the shard items Candidate A_1 … Candidate A_8 or Candidate B_1 … Candidate B_8 in the Votes Table]

Write Sharding Design

Voter → UpdateItem: “CandidateA_” + hash(voterID)%10, ADD 1 to Votes
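Here is a sketch of the sharded write above, with a `Counter` standing in for the Votes table and a `+= 1` standing in for UpdateItem with ADD. The slide does not specify the hash function. `zlib.crc32` is an assumption, chosen because Python's built-in `hash()` for strings changes between processes. `shard_key` and `cast_vote` are hypothetical names.

```python
import zlib
from collections import Counter

NUM_SHARDS = 10  # mirrors hash(voterID) % 10 on the slide

def shard_key(candidate: str, voter_id: str) -> str:
    """Return the shard item key for this vote, e.g. "CandidateA_7".

    zlib.crc32 stands in for the unspecified hash (an assumption); unlike
    Python's built-in hash() for str, it is stable across processes.
    """
    return f"{candidate}_{zlib.crc32(voter_id.encode('utf-8')) % NUM_SHARDS}"

def cast_vote(votes_table: Counter, candidate: str, voter_id: str) -> None:
    # In-memory stand-in for UpdateItem with ADD Votes 1 on the shard item.
    votes_table[shard_key(candidate, voter_id)] += 1


votes = Counter()
for i in range(1000):
    cast_vote(votes, "CandidateA", f"voter-{i}")
print(len(votes), sum(votes.values()))
```

Hashing on the voter ID makes the choice of shard deterministic for a given voter. Writes for one candidate then spread over up to 10 keys, and therefore potentially over up to 10 partitions.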

Shard Aggregation

[Diagram: a Periodic Process (1) sums the shard items Candidate A_1 … Candidate A_8 in the Votes Table and (2) stores the total, e.g. Candidate A Total: 2.5M]
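The periodic process above can be sketched as a sum over shard items grouped by candidate. This assumes shard keys of the form `<candidate>_<n>`. `aggregate_shards` is a hypothetical helper, and the vote counts are made-up numbers chosen to match the slide's 2.5M total.

```python
from collections import defaultdict

def aggregate_shards(votes_table: dict) -> dict:
    """Periodic process: (1) sum the shard items per candidate, (2) return totals to store.

    Assumes shard keys look like "<candidate>_<n>", e.g. "CandidateA_3".
    """
    totals = defaultdict(int)
    for shard_key, count in votes_table.items():
        candidate, _, _ = shard_key.rpartition("_")  # "CandidateA_3" -> "CandidateA"
        totals[candidate] += count
    return dict(totals)


# Hypothetical shard counts.
shards = {"CandidateA_1": 1_200_000, "CandidateA_2": 1_300_000, "CandidateB_1": 900_000}
print(aggregate_shards(shards))
```

Each aggregation pass reads every shard item, which is the "trade reads for writes" cost. Running it periodically rather than per vote keeps that read cost bounded.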

• Re-shard partitions that receive heavy write traffic
• Trade reads for writes (scalability)
• Consider IOPS per partition key as well as IOPS per partition
• Write load does not scale out well