
AWS DynamoDB • Pay-as-you-go model

Best Practices by Application Scenario

Lang Fei, Big Data Systems Engineer, Amazon Web Services

2017.7

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

SQL vs. NoSQL (data trends)

Amazon DynamoDB in Brief

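• NoSQL database

• Fully managed

• Seamless scaling at massive scale

• Millisecond response times

• High availability

• Works seamlessly with other AWS services

• Pay according to usage

Amazon's Journey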

• December 6, 2004: Amazon.com website outage

• December 2007: Dynamo paper published

• January 2012: DynamoDB launched

• Q3 2016: Leader in the Gartner MQ and the Forrester Wave

• Today: a Tier-0 service that most of Amazon's internal architecture depends on

The Forrester Wave™: Big Data NoSQL, Q3 2016

The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.

DynamoDB at a Glance

• Available in 16 regions worldwide

• Tables serving millions of requests; many tables larger than 100 TB

• Tens of petabytes of data; trillions of requests per month

Europe: Dublin, London, Frankfurt

China

North America: US East – N. Virginia, US East – Ohio, US West – Oregon, US West – San Francisco, Canada – Central, AWS GovCloud (US)

Asia Pacific: Tokyo, Singapore, Sydney, Mumbai, Seoul

South America: Sao Paulo

Multi-AZ & Cross-Region Replication (Available Today)

Features

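1. By default, data is replicated across three Availability Zones (AZs) within a region.

2. Cross-region replication is available through the open-source cross-region replication library together with existing AWS services.

Key Benefits

• High availability without paying extra for it

• Data remains available even if an entire AZ goes down

• Scale out by directing reads to one of the replicas

Why Use DynamoDB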

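1. High scalability

2. Consistent performance

3. Security and monitoring

4. High availability and data protection

5. Fully managed & TCO

6. Developer platform & tool compatibility

Scalability

[Figure: writes and reads scaling to more than 300% and more than 200% above baseline]

Auto Scaling (Available Today)

Features

• Fully managed; reads and writes are configured separately, for both tables and GSIs

• You only set a target utilization (%) and the minimum/maximum throughput

• Available through the DynamoDB console, the CLI, and the SDKs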

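[Figure: provisioned vs. consumed capacity without Auto Scaling and with Auto Scaling]

Key Benefits

• Eliminates manual throughput forecasting and provisioning

• Raises or lowers throughput according to actual usage

• Throughput adjustments are visible in the console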
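The settings above map onto two calls in the Application Auto Scaling API: one registers the scalable target with its minimum and maximum, and one attaches a target-tracking policy. The sketch below only builds the keyword arguments for boto3's `register_scalable_target` and `put_scaling_policy`. The table and index names are hypothetical, and the policy naming scheme is an illustrative assumption.

```python
# Sketch: request payloads for the Application Auto Scaling API (boto3).
# Table/index names are hypothetical. To apply them:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)

def autoscaling_requests(table, dimension, min_capacity, max_capacity,
                         target_utilization_pct, index=None):
    """Build (target, policy) kwargs for one read or write dimension.

    dimension is "Read" or "Write"; pass index to scale a GSI instead
    of the base table (reads and writes are configured separately).
    """
    if dimension not in ("Read", "Write"):
        raise ValueError("dimension must be 'Read' or 'Write'")
    scope = "index" if index else "table"
    resource_id = f"table/{table}" + (f"/index/{index}" if index else "")
    scalable_dimension = f"dynamodb:{scope}:{dimension}CapacityUnits"
    target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": scalable_dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        # Hypothetical naming scheme for the policy.
        "PolicyName": f"{resource_id.replace('/', '-')}-{dimension.lower()}",
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": scalable_dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep consumed/provisioned capacity near this percentage.
            "TargetValue": float(target_utilization_pct),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType":
                    f"DynamoDB{dimension}CapacityUtilization",
            },
        },
    }
    return target, policy
```

Call it once per dimension, and once more per GSI, because reads and writes scale independently.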

Agenda

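• Core components, API, and secondary indexes (LSI/GSI)

• DynamoDB partitioning behavior and performance scaling

• Throttling and how to resolve it

• Customer cases and best practices (with a preview of new features)

• *Q&A

Core Components, API, and Secondary Indexes (LSI/GSI)

Table

Items

Attributes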

Partition key (required)
• Key-value access pattern
• Determines data distribution

Sort key (optional)
• Models 1:N relationships
• Enables rich query capabilities. For all items with the same partition key, you can use: ==, <, >, >=, <=; "begins with"; "between"; sorted results; counts; top/bottom N values; paged responses

API Operations on Tables, Items, and DynamoDB Streams

Control plane (tables)
• CreateTable
• DescribeTable
• ListTables
• UpdateTable
• DeleteTable

Data plane (items)
• PutItem
• BatchWriteItem (up to 25 items)
• GetItem
• BatchGetItem (up to 100 items)
• Query
• Scan
• UpdateItem
• DeleteItem
• *BatchWriteItem to delete

DynamoDB Streams API
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords

Local Secondary Index (LSI)

[Figure: a DynamoDB table keyed on A1 (partition key) and A2 (sort key), with LSIs that keep A1 as the partition key and use A3, A4, or A5 as the sort key]

• 10 GB max per partition key, i.e., LSIs limit the number of sort keys!

• The sort key can be any attribute

• The index covers items with the same partition key

Global Secondary Index (GSI)

• A GSI has its own RCUs/WCUs, separate from the table

[Figure: GSIs on the same table using another attribute (e.g., A3) as the partition key, with KEYS_ONLY, INCLUDE, or ALL attribute projections]

• Any attribute can be the partition key (plus an optional sort key)

• The index spans the entire table (unlike an LSI)

• GSIs can be created or deleted at any time

• Limits: 5 LSIs and 5 GSIs per table
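The sort-key operations listed earlier (=, begins_with, between, descending order for top-N, page size) and index selection all go through a single Query request. The sketch below builds keyword arguments for the low-level boto3 DynamoDB client's `query()`. The table, attribute, and index names in the usage are hypothetical.

```python
def build_query(table, pk_name, pk_value, sk_name=None, *,
                begins_with=None, between=None, index=None,
                newest_first=False, limit=None):
    """kwargs for client.query(): one partition key, optional sort-key condition.

    String key values are assumed; pass index to query an LSI or GSI.
    """
    if begins_with is not None and between is not None:
        raise ValueError("use either begins_with or between, not both")
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}
    condition = "#pk = :pk"
    if begins_with is not None or between is not None:
        if sk_name is None:
            raise ValueError("a sort key condition needs sk_name")
        names["#sk"] = sk_name
    if begins_with is not None:
        condition += " AND begins_with(#sk, :prefix)"
        values[":prefix"] = {"S": begins_with}
    elif between is not None:
        low, high = between
        condition += " AND #sk BETWEEN :low AND :high"
        values[":low"] = {"S": low}
        values[":high"] = {"S": high}
    params = {
        "TableName": table,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
        # Results come back in sort-key order; False means descending (top-N).
        "ScanIndexForward": not newest_first,
    }
    if index is not None:
        params["IndexName"] = index
    if limit is not None:
        params["Limit"] = limit  # page size
    return params
```

For paged responses, pass the response's `LastEvaluatedKey` back as `ExclusiveStartKey` on the next call. Adding `"Select": "COUNT"` returns only counts.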

[Figure: LSIs and GSIs compared side by side on the same table]

Partitions

[Figure: a table split into Partition A, Partition B, and Partition C; each owns 33.33% of the hash keyspace (boundaries 00, 55, AA, FF) and 33.33% of the provisioned capacity]

Partitioning Behavior

CustomerOrdersTable keyspace: Hash.MIN = 0, Hash.MAX = FF. Each item is stored in the partition whose key range contains the hash of its partition key.

• OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E]; Hash(1) = 7B
• OrderId: 2; Hash(2) = 48
• OrderId: 3; Hash(3) = CD

The other two items belong to CustomerIds 3 and 4 and carry ASINs B00OQVZDJM and B00U3FPN4U.
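Finding an item's partition is a range lookup on the hash keyspace. The sketch uses the three-way split from the figure (A = 00 to 54, B = 55 to A9, C = AA to FF) and the hash values printed on the slide. It does not reproduce DynamoDB's internal hash function, which is not public.

```python
import bisect

# First hash value owned by Partition B and by Partition C; A starts at 00.
LOWER_BOUNDS = [0x55, 0xAA]
PARTITIONS = ["A", "B", "C"]

def partition_for(hash_value: int) -> str:
    """Return the partition whose key range contains hash_value."""
    if not 0x00 <= hash_value <= 0xFF:
        raise ValueError("hash value outside the 00..FF keyspace")
    return PARTITIONS[bisect.bisect_right(LOWER_BOUNDS, hash_value)]

# OrderId -> Hash(OrderId), as printed on the slide (not computed here).
SLIDE_HASHES = {1: 0x7B, 2: 0x48, 3: 0xCD}
placement = {order: partition_for(h) for order, h in SLIDE_HASHES.items()}
```

With these values, OrderId 1 lands in Partition B, OrderId 2 in A, and OrderId 3 in C.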

CustomerOrdersTable Partitioning: Partition Splits

Split for provisioned capacity: each partition has a throughput ceiling, $3w + r < 3000$, where w = WCUs and r = RCUs (subject to change). Once provisioned throughput exceeds this ceiling, partitions may split. For example, Partitions A, B, and C (33.33% of the keyspace and of the provisioned capacity each) become Partitions A through F (16.66% each).

Split for partition size: each partition can store 10 GB of data. Once a partition exceeds this limit, it may split. For example, Partition C splits into Partitions D and E (16.66% of the keyspace each), while Partitions A and B keep 33.33% each.
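The two split triggers above give a rough way to estimate a table's partition count. This is a rule of thumb that assumes both limits hold as stated: $3w + r < 3000$ per partition (about 3000 RCUs or 1000 WCUs) and 10 GB per partition. DynamoDB's actual partition management is internal and subject to change.

```python
import math

def estimate_partitions(rcu: float, wcu: float, size_gb: float) -> int:
    """Rough partition count: whichever trigger (throughput or size) needs more."""
    by_throughput = (rcu + 3 * wcu) / 3000  # r + 3w per 3000-unit partition
    by_size = size_gb / 10                  # 10 GB per partition
    return max(1, math.ceil(max(by_throughput, by_size)))
```

For example, a table that has grown to 1 TB needs about 100 partitions for storage alone, whatever its provisioned throughput.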

Partition Replication

3-way replication: the item OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E], with Hash(1) = 7B, is stored in Availability Zone A, Availability Zone B, and Availability Zone C.

[Figure: CustomerOrdersTable Partitions A (00–54), B (55–A9), and C (AA–FF), each with 1000 RCUs and 100 WCUs, replicated on separate hosts in each of the three Availability Zones]

Throttling: How Do You Detect It?

• CloudWatch throttling metrics: the table's total consumed throughput exceeds its provisioned throughput

• Throttling that originates in individual partitions

Internal Tool: Heatmap Analysis (Hot Keys)

[Figure: heatmap of partitions over time; the top items include Fire TV Stick, Echo Dot (White and Black), Kindle Paperwhite, Amazon Fire TV, Fire Tablet with Alexa, and Fire HD 8 tablets]

Internal Tool: Heatmap Analysis (Traffic Spikes)

[Figure: heatmap of partitions over time, with an alarm setting]

How Does DynamoDB Handle Traffic Spikes?

• Per partition, DynamoDB can bank up to 300 seconds of unused throughput and spend it on a one-off traffic spike
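The following is a simplified model of the burst behavior just described, not DynamoDB's actual algorithm. It assumes unused provisioned capacity accrues second by second into a bank capped at 300 seconds' worth, and that demand above the provisioned rate draws on that bank until it is empty.

```python
def simulate_burst(provisioned_per_sec, demand, window_sec=300):
    """Return the throttled units for each second of demand (simplified model)."""
    cap = window_sec * provisioned_per_sec  # at most 300 s of unused capacity
    banked = 0.0
    throttled = []
    for requested in demand:
        served = min(requested, provisioned_per_sec + banked)
        throttled.append(requested - served)
        # Unused capacity is banked; usage above the provisioned rate drains it.
        banked = min(cap, banked + provisioned_per_sec - served)
    return throttled
```

After five idle minutes at 100 units per second, a single 30,100-unit spike is fully absorbed. The next second above the provisioned rate is throttled.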

Bursting is best effort!

How Access Patterns Show Up in Heatmaps

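1. Partition key choice: high cardinality (highly unique values)

2. Access pattern: requests spread evenly across partitions

3. Traffic spread evenly over time (rather than sudden spikes)

Best heatmap: uniform in both time and space

[Figure: partition heatmaps over time]

IOPS "Dilution" (Throughput Dilution)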

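$$\text{Throughput per partition} = \frac{\text{Total provisioned throughput}}{\text{Number of partitions}}$$

$$\frac{10000\ \text{RCUs}}{100\ \text{partitions}} = 100\ \text{RCUs per partition}$$

$$\frac{100\ \text{RCUs}}{100\ \text{partitions}} = 1\ \text{RCU per partition}$$

The number of partitions also grows with data size, so a table can end up with many partitions and little provisioned throughput. In the second case, a client request that lands on any single partition has only 1 RCU available.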
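Dilution matters because a single partition key always maps to a single partition. A key's traffic must therefore fit within one partition's share, not within the table total. The sketch below assumes an even split of provisioned throughput across partitions, as in the formula above, and ignores burst capacity.

```python
def per_partition(total_units: float, partitions: int) -> float:
    """Provisioned throughput each partition receives, assuming an even split."""
    if partitions < 1:
        raise ValueError("need at least one partition")
    return total_units / partitions

def key_fits(total_units: float, partitions: int, key_units_per_sec: float) -> bool:
    """Does one partition key's steady load fit within its partition's share?

    Burst capacity is ignored because bursting is best effort.
    """
    return key_units_per_sec <= per_partition(total_units, partitions)
```

A table provisioned at 10,000 RCUs across 100 partitions still throttles a key that needs 150 RCUs per second.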

A Few Questions

• How do you handle continuously growing data? (Time series data)

• How do you handle "hot" data? (Popular items)

• How do you handle spiky access patterns? (Bursting traffic)

Scenario-Based Best Practices

Event Logging

Storing Time Series Data: Large Volumes of Timestamped Logs

Time Series Tables

All tables share the same schema: Event_id (partition key), Timestamp (sort key), Attribute1 … Attribute N.

• Events_table_2016_April (current table, hot data): RCUs = 10000, WCUs = 10000

• Events_table_2016_March (older table): RCUs = 1000, WCUs = 100

• Events_table_2016_February (older table): RCUs = 100, WCUs = 1

• Events_table_2016_January (older table, cold data): RCUs = 10, WCUs = 1

Keep hot and cold data in separate tables; infrequently accessed data can be moved to S3.

DynamoDB Streams

• Preserves lineage even when partitions split

• Stream records appear in the same sequence as the actual modifications to the item

• Each stream record appears exactly once in the stream

• Durable and scalable

• Stream records are retained for 24 hours

• Sub-second latency

• Compatible with the Kinesis Client Library (KCL)

[Figure: updates to table Partitions A, B, and C flow into stream shards 1, 2, and 3; KCL shard workers in the application read them with GetRecords]

Time-To-Live (TTL)

• Deletions of TTL-marked items can be identified in DynamoDB Streams

• Fine-grained access control with IAM, and monitoring with AWS CloudTrail

• Marked items are deleted free of charge
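A TTL attribute holds a Unix epoch time in seconds. TTL deletions show up in DynamoDB Streams as REMOVE records whose `userIdentity` names the DynamoDB service principal, which is how they can be told apart from application deletes. The sketch below assumes an attribute called `MyTTL`, as on the slide, and builds kwargs for boto3's `update_time_to_live()`. The choice of table for enabling TTL is illustrative.

```python
from datetime import datetime, timedelta, timezone

TTL_ATTRIBUTE = "MyTTL"  # whichever attribute you enable TTL on

def ttl_value(expires_at: datetime) -> int:
    """Epoch seconds for an aware datetime, as stored in the TTL attribute."""
    if expires_at.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int(expires_at.timestamp())

def expire_after(created_at: datetime, days: int) -> int:
    """TTL value for an item that should expire `days` after creation."""
    return ttl_value(created_at + timedelta(days=days))

def is_ttl_delete(record: dict) -> bool:
    """True for stream records written when TTL (not the application) removed an item."""
    identity = record.get("userIdentity") or {}
    return (record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com")

# kwargs for the boto3 DynamoDB client's update_time_to_live().
ENABLE_TTL = {
    "TableName": "CustomerActiveOrder",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
}
```

A stream consumer, such as the KCL workers above, can route records that pass `is_ttl_delete` to an archive instead of treating them as user deletes.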

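Time-To-Live: Removing Data That Is No Longer Relevant

A background process deletes items according to the Unix timestamp they are marked with. This consumes no throughput, so it is free.

[Figure: a TTL job deletes the CustomerActiveOrder item (OrderId: 1, CustomerId: 1, MyTTL: 1492641900); the deletion flows from the DynamoDB table through DynamoDB Streams and Amazon Kinesis to Amazon Redshift]

Using DynamoDB TTL and Streams to Delete and Archive Cold Data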

• Events_table_2016_April (current table, hot data): Event_id (partition key), Timestamp (sort key), myTTL (e.g., 1489188093), … Attribute N; RCUs = 10000, WCUs = 10000

• Events_Archive (cold data): Event_id (partition key), Timestamp (sort key), Attribute1 … Attribute N; RCUs = 100, WCUs = 1

For Data That May Split into Hot and Cold Portions (Time Series Data)

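• Create tables per day, week, or month

• Provision the required throughput for the current table

• Write to the current table

• Turn off (or reduce) the provisioned throughput of non-current tables, or move the data into a separate table and use TTL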
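The steps above can be sketched as a small routing and capacity plan for monthly tables. The naming follows the slides (`Events_table_<year>_<Month>`). The capacity tiers reuse the example numbers from the time series slide, as an illustration rather than a recommendation. `update_table_request` builds kwargs for the boto3 DynamoDB client's `update_table()`.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# (RCUs, WCUs) by table age in months, from the slide's example;
# anything older falls back to COLDEST.
TIERS = ((10000, 10000), (1000, 100), (100, 1))
COLDEST = (10, 1)

def table_name(month: datetime) -> str:
    """Monthly table holding events written in this month (writes go to table_name(now))."""
    return f"Events_table_{month.year}_{MONTHS[month.month - 1]}"

def age_in_months(month: datetime, now: datetime) -> int:
    return (now.year - month.year) * 12 + (now.month - month.month)

def capacity_for(month: datetime, now: datetime) -> tuple:
    age = age_in_months(month, now)
    if age < 0:
        raise ValueError("table month is in the future")
    return TIERS[age] if age < len(TIERS) else COLDEST

def update_table_request(month: datetime, now: datetime) -> dict:
    """update_table() kwargs that dial a monthly table down as it ages."""
    rcu, wcu = capacity_for(month, now)
    return {
        "TableName": table_name(month),
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu,
                                  "WriteCapacityUnits": wcu},
    }
```

A scheduled job could run this once per month for each existing table, after pre-creating the next month's table.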

Product Catalog

Popular Items (Reads): A Scaling Bottleneck

SELECT Id, Description, ... FROM ProductCatalog WHERE Id='POPULAR_PRODUCT'

$$\frac{100{,}000\ \text{RCU}}{50\ \text{partitions}} \approx 2000\ \text{RCU per partition}$$

The table is provisioned with 100,000 RCUs, but every read of one popular product goes to the single partition that holds it, and that partition can serve only about 2000 RCUs.

[Figure: shoppers' requests across Partition 1 … Partition 50 (2000 RCUs each), concentrated on the partitions holding Product A and Product B; chart of request distribution per partition key, in requests per second by item primary key]

As Seen in the Heatmap

[Figure: heatmap of partitions over time; the top items include Fire TV Stick, Echo Dot (White and Black), Kindle Paperwhite, Amazon Fire TV, Fire Tablet with Alexa, Amazon Echo, and Fire HD 8 tablets]

SELECT Id, Description, ... FROM ProductCatalog WHERE Id='POPULAR_PRODUCT'

[Figure: users' requests for the popular item answered from a cache (cache hits) in front of DynamoDB instead of Partition 1 and Partition 2; chart of request distribution per partition key]

DynamoDB Accelerator (DAX) (New! Available Today)

Features

• Fully managed: no upgrades or software management to worry about

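• Flexible: one DAX cluster can cache multiple tables

• Highly available: fault tolerant, replicated across multiple Availability Zones

• Scalable: scales out to as many as 10 replicas based on load

• Manageable: integrates with other AWS services, including Amazon CloudWatch, Tagging for DynamoDB, and the AWS Console

• Secure: supports Amazon VPC, AWS IAM, AWS CloudTrail, and AWS Organizations

[Figure: Your Applications → DynamoDB Accelerator (DAX) → DynamoDB]

Key Benefits

• Extreme performance: microsecond read latency from a single DAX cluster

• Easy to use: compatible with the DynamoDB API, so existing applications need minimal code changes

• Lower cost: fewer direct reads against DynamoDB tables reduce the throughput you need, which suits tables with hot data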
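To see why a cache takes load off a hot item's partition, consider a generic read-through cache with a per-entry TTL. This is a minimal sketch of the idea only. It is not the DAX client, its API, or its caching algorithm, and `fetch` is a hypothetical stand-in for a GetItem call.

```python
import time
from typing import Any, Callable, Dict, Tuple

class ReadThroughCache:
    """Minimal read-through item cache: only misses and expired entries reach the table."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical stand-in for GetItem
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:  # cached and not yet expired
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A thousand reads of `POPULAR_PRODUCT` within one TTL window cost the table a single read. The trade-off is that cached items can be up to one TTL stale.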

Messaging App

• Large items
• Filters vs. indexes
• M:N modeling: inbox and outbox

[Figure: David's view of the messaging app, with an Inbox and an Outbox]

Inbox:
SELECT * FROM Messages WHERE Recipient='David' ORDER BY Date DESC LIMIT 50

Outbox:
SELECT * FROM Messages WHERE Sender='David' ORDER BY Date DESC LIMIT 50

Messages Table: A Mix of Large and Small Attributes

David's inbox query against the Messages table (Recipient, Date, Sender, Message):
• David, 2016-10-02, Bob, …
• … 48 more messages for David …
• David, 2016-10-03, Alice, …
• Alice, 2016-09-28, Bob, …
• Alice, 2016-10-01, Carol, …
• (many more messages)

Message bodies and attachments are very large: 50 items × 256 KB each.

Query Cost Calculation

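$$\underbrace{50}_{\text{items evaluated by query}} \times \underbrace{256\ \text{KB}}_{\text{average item size}} \times \underbrace{\frac{1\ \text{RCU}}{4\ \text{KB}}}_{\text{conversion ratio}} \times \underbrace{\frac{1}{2}}_{\text{eventually consistent reads}} = 1600\ \text{RCU}$$

Separating Out the Large Data: Uniformly Distributes Large-Item Reads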

1. Query Inbox-GSI (50 sequential items at 128 bytes): 1 RCU
2. BatchGetItem on Messages (50 separate items at 256 KB): 1600 RCU
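The two costs above follow from how reads are metered. A Query sums the sizes of all items it evaluates before rounding up to 4 KB units, and a FilterExpression does not reduce that cost. GetItem and BatchGetItem round each item up separately. Eventually consistent reads cost half. The sketch applies these rules to the slide's numbers.

```python
import math

RCU_BYTES = 4 * 1024  # one strongly consistent read covers up to 4 KB

def _units(size_bytes: int, eventually_consistent: bool) -> float:
    units = math.ceil(size_bytes / RCU_BYTES)
    return units / 2 if eventually_consistent else units

def query_rcu(item_sizes, eventually_consistent=True) -> float:
    """Query: the sizes of all evaluated items are summed, then rounded up."""
    return _units(sum(item_sizes), eventually_consistent)

def batch_get_rcu(item_sizes, eventually_consistent=True) -> float:
    """GetItem/BatchGetItem: each item is rounded up on its own."""
    return sum(_units(s, eventually_consistent) for s in item_sizes)
```

Reading the 50 small index entries costs 1 RCU as a Query but 25 RCUs as individual gets. This is why the listing goes through the GSI and only the bodies actually opened are fetched by key.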

Inbox-GSI (Recipient, Date, Sender, Subject, MsgId):
• David, 2016-10-02, Bob, Hi!…, afed
• David, 2016-10-03, Alice, RE: The…, 3kf8
• Alice, 2016-09-28, Bob, FW: Ok…, 9d2b
• Alice, 2016-10-01, Carol, Hi!..., ct7r

Messages Table (MsgId, Body):
• 9d2b, …
• 3kf8, …
• ct7r, …
• afed, …

Simplifying Writes

PutItem { MsgId: 123, Body: ..., Recipient: Steve, Sender: David, Date: 2016-10-23, ... }

[Figure: a single PutItem writes to the Messages table; the Inbox and Outbox global secondary indexes are maintained automatically and serve David's Inbox and Outbox views in the messaging app]

Distributing Large Items

• Reduce the size of one-to-many items

• Create GSIs

• Use GSIs to model the M:N relationship between senders and recipients
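The design above can be expressed as one table definition. Messages is keyed by MsgId, and two GSIs handle the M:N relationship. The Inbox GSI is keyed by Recipient and the Outbox GSI by Sender, both sorted by Date, and each projects only small attributes so the large Body stays in the base table. The sketch builds kwargs for the boto3 DynamoDB client's `create_table()` and for the inbox `query()`. "Outbox-GSI" and all capacity numbers are hypothetical; the other names follow the slides.

```python
def _gsi(name, partition_key, extra_attributes):
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},
            {"AttributeName": "Date", "KeyType": "RANGE"},
        ],
        # INCLUDE projects the keys plus these small attributes; Body is not copied.
        "Projection": {"ProjectionType": "INCLUDE",
                       "NonKeyAttributes": extra_attributes},
        "ProvisionedThroughput": {"ReadCapacityUnits": 100,
                                  "WriteCapacityUnits": 100},
    }

def messages_table_request():
    """create_table() kwargs: Messages keyed by MsgId, with Inbox/Outbox GSIs."""
    return {
        "TableName": "Messages",
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"}
            for name in ("MsgId", "Recipient", "Sender", "Date")
        ],
        "KeySchema": [{"AttributeName": "MsgId", "KeyType": "HASH"}],
        "ProvisionedThroughput": {"ReadCapacityUnits": 100,
                                  "WriteCapacityUnits": 100},
        "GlobalSecondaryIndexes": [
            _gsi("Inbox-GSI", "Recipient", ["Sender", "Subject"]),
            _gsi("Outbox-GSI", "Sender", ["Recipient", "Subject"]),
        ],
    }

def inbox_query(recipient, limit=50):
    """query() kwargs equivalent to ORDER BY Date DESC LIMIT 50 on the inbox."""
    return {
        "TableName": "Messages",
        "IndexName": "Inbox-GSI",
        "KeyConditionExpression": "#r = :r",
        "ExpressionAttributeNames": {"#r": "Recipient"},
        "ExpressionAttributeValues": {":r": {"S": recipient}},
        "ScanIndexForward": False,  # newest first
        "Limit": limit,
    }
```

A single PutItem on Messages then updates both GSIs. Bodies are fetched by MsgId, for example with BatchGetItem in groups of up to 100 keys, only when a message is opened.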

[Figure: querying many large items at once, through the Outbox GSI, the Messages table, and the Inbox GSI]

Real-Time Voting

Write-Heavy Items

Voting System Design Requirements

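• Each account may vote only once

• A vote cannot be changed once cast

• Real-time analytics

• Voter demographics statistics

Real-Time Voting System Architecture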

[Figure: voters → Voting App → RawVotes table and AggregateVotes table]

Scaling Bottleneck

[Figure: voters write to the Votes table, provisioned with 200,000 WCUs; each partition (Partition 1 … Partition N) gets only 1000 WCUs, and all writes for Candidate A or for Candidate B land on a single partition]

Votes Table: Write Sharding

[Figure: a voter's writes spread across the Votes table items Candidate A_1 … Candidate A_8 and Candidate B_1 … Candidate B_8]

Write Sharding Design

Voter → UpdateItem: key "CandidateA_" + hash(voterID) % 10, ADD 1 to Votes
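The sharding design above can be sketched end to end. A conditional write to RawVotes enforces one vote per voter. An UpdateItem with ADD increments one of ten shards for the candidate, and a periodic job sums the shards. The sketch makes several assumptions. RawVotes is keyed by a hypothetical `VoterId` attribute, and the Votes key attribute `CandidateShard` is also hypothetical. A SHA-256 digest stands in for the slide's `hash()`, because Python's built-in string hash differs between processes. The payloads are kwargs for the boto3 DynamoDB client's `put_item()` and `update_item()`.

```python
import hashlib

SHARDS = 10  # hash(voterID) % 10, as on the slide

def stable_hash(text: str) -> int:
    # Same voter -> same number in every process (unlike built-in hash()).
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")

def shard_key(candidate: str, voter_id: str, shards: int = SHARDS) -> str:
    return f"{candidate}_{stable_hash(voter_id) % shards}"

def raw_vote_request(voter_id: str, candidate: str) -> dict:
    """put_item() kwargs for RawVotes; the condition rejects a second or changed vote."""
    return {
        "TableName": "RawVotes",
        "Item": {"VoterId": {"S": voter_id}, "Candidate": {"S": candidate}},
        "ConditionExpression": "attribute_not_exists(VoterId)",
    }

def shard_increment_request(voter_id: str, candidate: str) -> dict:
    """update_item() kwargs for Votes: ADD 1 to one shard of the candidate."""
    return {
        "TableName": "Votes",
        "Key": {"CandidateShard": {"S": shard_key(candidate, voter_id)}},
        "UpdateExpression": "ADD #votes :one",
        "ExpressionAttributeNames": {"#votes": "Votes"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }

def total_votes(shard_counts: dict, candidate: str) -> int:
    """Periodic step: sum every shard of one candidate (the read side of the trade)."""
    prefix = candidate + "_"
    return sum(n for key, n in shard_counts.items() if key.startswith(prefix))
```

The two writes are independent here: the conditional put guards against double voting, and the shard counter feeds the real-time totals.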

Aggregating the Shards

[Figure: a periodic process (1) sums each candidate's shards in the Votes table and (2) stores the result, e.g., Candidate A Total: 2.5M]

Further Sharding of Write-Heavy Partitions

• "Trade reads for (write) scalability"

• Consider the IOPS per partition key as well as the IOPS per partition

Problem addressed: write workloads that do not scale out horizontally