NoSQL? No Worries: DynamoDB and ElastiCache
Dan Zamansky, Sr. Product Manager, AWS
Siva Raghupathy, Principal Solutions Architect, AWS

Agenda
• NoSQL
• Why managed database service?
• DynamoDB
• ElastiCache
• Takeaways

NoSQL

Benefits
• Schema-less
• Highly Scalable
  – Size
  – Throughput
• Highly Available

Constraints
• No cross table/item transactions
• No complex queries or joins

NoSQL available on AWS

Managed
• Amazon DynamoDB
• Amazon ElastiCache
  – Memcached
  – Redis

Unmanaged
• Apache Cassandra
• MongoDB
• CouchDB
• Riak
• ….

Why managed database services?

If you host your databases on-premises
You: App optimization, Scaling, High availability, Database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, Server maintenance, Rack & stack, Power, HVAC, net

If you host your databases on Amazon EC2
You: App optimization, Scaling, High availability, Database backups, DB s/w patches, DB s/w installs, OS patches
AWS: OS installation, Server maintenance, Rack & stack, Power, HVAC, net

If you choose a managed DB service
You: App optimization
AWS: Scaling, High availability, Database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, Server maintenance, Rack & stack, Power, HVAC, net

Who uses AWS Managed Database Services?
Amazon DynamoDB
• Managed NoSQL database service
• Accessible via simple and powerful APIs
• Supports both document and key-value data models
• Highly scalable
• Consistent, single-digit millisecond latency at any scale
• Highly durable and available: 3x replication
• No table size or throughput limits

Table
A table holds items, and each item holds attributes.

Hash key
• Mandatory
• Key-value access pattern
• Determines data distribution

Range key
• Optional
• Model 1:N relationships
• Enables rich query capabilities

Query: all items for a hash key
• ==, <, >, >=, <=
• "begins with"
• "between"
• sorted results
• counts
• top/bottom N values
• paged responses

Provisioned Throughput Model
• Throughput provisioned at the table level
  – Write capacity units (WCU) are measured in 1 KB per second
  – Read capacity units (RCU) are measured in 4 KB per second
    • RCUs measure strongly consistent reads
    • Eventually consistent reads cost 1/2 of consistent reads
• WCU and RCU are independent
• Consumed capacity is measured per operation

Scaling
• Scaling is achieved through partitioning
• Tables are partitioned for
  – Throughput
    • Provision any amount of throughput to a table
  – Size
    • Add any number of items to a table
(Diagram: a table split into Partition 1, Partition 2, Partition 3, Partition 4, ... Partition N)

Indexing
User-files-table: User (hash), File (range), Date, Shared, Size
• Local Secondary Index
  – Local to a hash key
  – Alternate range key
  File-size-LSI: User (hash), Size (range), File (table key), Date (projected)
• Global Secondary Index
  – Across all hash keys
  – Alternate hash (+range) key
  Shared-files-GSI: Shared (hash), User (table key), File (table key), Date (projected)

Data types
• String (S)
• Number (N)
• Binary (B)
• String Set (SS)
• Number Set (NS)
• Binary Set (BS)
• Boolean (BOOL)
• Null (NULL)
• List (L)
• Map (M)
Used for storing nested JSON documents

DynamoDB Table and Item API
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
• GetItem
• Query
• Scan
• BatchGetItem
• PutItem
• UpdateItem
• DeleteItem
• BatchWriteItem

DynamoDB Streams API
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords

DynamoDB Streams
• Stream of updates to a table
• Asynchronous
• Exactly once
• Strictly ordered
  – Per item
• Highly durable
• Scale with table
• 24-hour lifetime
• Sub-second latency

DynamoDB Streams and AWS Lambda

Cross-region replication
(Diagram labels: US East (N. Virginia), DynamoDB Streams, Open Source Cross-Region Replication Library, Replica, Asia Pacific (Sydney), EU (Ireland))

Data & Access Modeling
Store data based on how you will access it!

1:1 relationships or key-values
• Use a table or GSI with a hash key
• Use GetItem or BatchGetItem API
Example: Given a user or email, get attributes

Users Table
Hash key                  Attributes
UserId = bob              Email = [email protected], JoinDate = 2011-11-15
UserId = fred             Email = [email protected], JoinDate = 2011-12-01

Users-Email-GSI
Hash key                  Attributes
Email = [email protected]   UserId = bob, JoinDate = 2011-11-15
Email = [email protected]   UserId = fred, JoinDate = 2011-12-01

1:N relationships or parent-children
• Use a table or GSI with hash and range key
• Use Query API
Example:
  – One device has many readings
  – For DeviceId = 1, find all readings where epoch >= 1435457946

Device-measurements
Hash key        Range key             Attributes
DeviceId = 1    epoch = 1435457946    Temperature = 30, pressure = 90
DeviceId = 1    epoch = 1435457960    Temperature = 32, pressure = 91
DeviceId = 2    epoch = 1435458028    Temperature = 32, pressure = 91

N:M relationships
• Use a table and GSI with hash and range key elements switched
• Use Query API
Example: Given a user, find all games. Or given a game, find all users.
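The switched-key idea for N:M relationships can be sketched in plain Python, without any DynamoDB SDK. This is only an illustration under stated assumptions: `KeyedIndex` and `add_membership` are hypothetical names, the lookup only imitates what a Query by hash key returns, and the sample pairs are the ones from the deck's User-Games-Table example.

```python
from collections import defaultdict


class KeyedIndex:
    """Hypothetical stand-in for a table or GSI with a hash key and a range key."""

    def __init__(self):
        self._items = defaultdict(set)  # hash key -> set of range keys

    def put(self, hash_key, range_key):
        self._items[hash_key].add(range_key)

    def query(self, hash_key):
        # Imitates a Query: every range key stored under one hash key, sorted.
        return sorted(self._items.get(hash_key, ()))


user_games = KeyedIndex()  # table: hash = UserId, range = GameId
game_users = KeyedIndex()  # GSI:   hash = GameId, range = UserId (keys switched)


def add_membership(user_id, game_id):
    # DynamoDB maintains a GSI for you; this sketch writes both sides explicitly.
    user_games.put(user_id, game_id)
    game_users.put(game_id, user_id)


for user_id, game_id in [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]:
    add_membership(user_id, game_id)
```

Calling `user_games.query("bob")` answers "given a user, find all games", and `game_users.query("Game1")` answers the reverse question from the switched-key index.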
User-Games-Table
Hash key         Range key
UserId = bob     GameId = Game1
UserId = fred    GameId = Game2
UserId = bob     GameId = Game3

Game-Users-GSI
Hash key          Range key
GameId = Game1    UserId = bob
GameId = Game2    UserId = fred
GameId = Game3    UserId = bob

Documents (JSON)
JavaScript    DynamoDB
string        S
number        N
boolean       BOOL
null          NULL
array         L
object        M
• New data types (M, L, BOOL, NULL) introduced to support JSON
• Document SDKs
  – Simple programming model
  – Conversion to/from JSON
  – Java, JavaScript, Ruby, .NET
• Cannot index (S,N) elements of a JSON object stored in M
  – They need to be modeled as top-level table attributes to be used in LSIs or GSIs

DynamoDB use cases - IoT
    case class CameraRecord(
      cameraId: Int, // hash key
      ownerId: Int,
      subscribers: Set[Int],
      hoursOfRecording: Int,
      ...
    )
    case class Cuepoint(
      cameraId: Int, // hash key
      timestamp: Long, // range key
      `type`: String, // backticks needed: type is a Scala keyword
      ...
    )
Video: https://youtu.be/-0FtKBgYiik?t=79

DynamoDB use cases - AdTech
Requirements:
  – Low (<5 ms) response time
  – 1,000,000+ global requests/second
  – 100B items
DynamoDB table
HashKey    RangeKey    Value
Key        Segment     1234554343254
Key        Segment1    1231231433235
Video: https://youtu.be/qV7yAwcMtYE?t=598

DynamoDB use cases - Retail
Video: https://youtu.be/AHk3RhrETi4?t=1616

Amazon DynamoDB Best Practices
• Keep item size small
  – Compress large items
  – Store metadata in Amazon DynamoDB and large blobs in Amazon S3
• Use table per day, week, month etc. for storing time series data
• Use conditional updates for de-duping & versioning
• Avoid hot keys and hot partitions
(Diagram: Events_table_2012, Events_table_2012_05_week1, Events_table_2012_05_week2 and Events_table_2012_05_week3, each with Event_id (hash key), Timestamp (range key), Attribute1 …. Attribute N)

Amazon ElastiCache

Why In-Memory?
ms → μs
• Everything is connected: phones, tablets, cars, air conditioners, toasters
• Demand for real-time performance: online games, AdTech, eCommerce, social apps etc.
• Load is spiky and unpredictable
• DB performance is often the bottleneck

Amazon ElastiCache
• AWS managed service that lets you easily create, use and scale in-memory key-value stores in the cloud. It comes in two flavors: Memcached and Redis.

Memcached
• In-memory key-value datastore
• Insanely fast!
• Patterns for sharding
• Slab allocator
• No persistence
• Supports strings, objects
• Very established
• Multi-threaded

Redis
• In-memory key-value datastore
• Ridiculously fast!
• Pub/sub functionality
• More like a NoSQL db (http://redis.io/commands)
• Persistence: snapshots or append-only log
• Supports data types: strings, lists, hashes, sets, sorted sets, bitmaps & HyperLogLogs
• Read replicas
• Single-threaded
• Atomic operations: supports transactions, has ACID properties

Memcached or Redis?
Criteria compared for Memcached and Redis:
• Simple caching to offload DB burden
• Ability to scale horizontally (Redis: yes, with Redis 3.0)
• Multithreaded performance
• Advanced data types
• Sorting/ranking data sets
• Pub/Sub capability
• HA through replication
• Persistence

How can I leverage In-Memory?

Key Use Cases

Caching
(Diagram labels: Clients, Elastic Load Balancing, EC2 App Instances, App Reads, Cache Updates, ElastiCache, Database Reads, Database Writes, Amazon RDS, DynamoDB)

Be Lazy
    # Python pseudocode; cache and db are the app's client handles
    def get_user(user_id):
        # Check the cache
        record = cache.get(user_id)
        if record is None:
            # Cache miss: run a DB query, then populate the cache
            record = db.query("select * from users where id = ?", user_id)
            cache.set(user_id, record)
        return record

    # App code
    user = get_user(17)

Write-through Caching
    # Python
    def save_user(user_id, values):
        # Save to DB
        record = db.query("update users ... where id = ?", user_id, values)
        # Push into cache
        cache.set(user_id, record)
        return record

    # App code
    user = save_user(17, {"name": "Nate Dogg"})

Leaderboards - Redis
• Easy to implement using Sorted Sets
• Simultaneously guarantees:
  – uniqueness and ordering

    def save_score(user, score):
        # redis-py 3.0+ takes a {member: score} mapping
        redis.zadd("leaderboard", {user: score})

    def get_rank(user):
        # ZREVRANK is 0-based, so add 1 for a human-readable rank
        return redis.zrevrank("leaderboard", user) + 1

Example
    ZADD "leaderboard" 1201 "Gollum"
    ZADD "leaderboard" 963 "Sauron"
    ZADD "leaderboard" 1092 "Bilbo"
    ZADD "leaderboard" 1383 "Frodo"
    ZREVRANGE "leaderboard" 0 -1
    1) "Frodo"
    2) "Gollum"
    3) "Bilbo"
    4) "Sauron"
    ZREVRANK "leaderboard" "Sauron"
    (integer)
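The sorted-set behavior behind the leaderboard can be tried without a Redis server. The sketch below is a plain-Python stand-in, not Redis itself: the `Leaderboard` class is hypothetical and only imitates what ZADD, ZREVRANGE and ZREVRANK do for this use case, using the four scores from the example.

```python
class Leaderboard:
    """In-memory stand-in for one Redis sorted set (illustrative only)."""

    def __init__(self):
        self._scores = {}  # member -> score; dict keys keep members unique

    def zadd(self, member, score):
        # Like ZADD: insert the member, or overwrite its score if it exists.
        self._scores[member] = score

    def zrevrange(self):
        # Like ZREVRANGE key 0 -1: every member, highest score first.
        # Equal scores fall back to reverse order of the member name.
        return sorted(
            self._scores,
            key=lambda member: (self._scores[member], member),
            reverse=True,
        )

    def zrevrank(self, member):
        # Like ZREVRANK: 0-based position from the top, or None if absent.
        if member not in self._scores:
            return None
        return self.zrevrange().index(member)


board = Leaderboard()
for member, score in [("Gollum", 1201), ("Sauron", 963), ("Bilbo", 1092), ("Frodo", 1383)]:
    board.zadd(member, score)


def get_rank(user):
    # Same convention as the slide: convert the 0-based rank to 1-based.
    return board.zrevrank(user) + 1
```

Re-sorting on every read is fine for a sketch but is not how Redis works internally; a real sorted set keeps its members ordered as they are written, which is what makes rank lookups cheap at scale.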