1. Pelikan with ADP
   Yao Yue (@thinkingfish), Twitter, Inc.

2. Pelikan: an Open-source, Modular Cache

3. Cache @ Twitter
   Clusters: >400 in prod (single-tenant)
   QPS: max 50M (single cluster)
   Hosts: many thousands
   SLO: p999 < 5ms*
   Instances: tens of thousands
   Protocols: Memcached, Redis/RESP, thrift, …
   Job size: 2-6 cores, 4-48 GiB
   Data structures: simple KV, counter, list, hash, sorted map, …

4. A Modular Architecture
   [Diagram: services (Rds, Twemcache, Slimcache) are assembled from shared modules: protocols (RESP, Memcache, admin, …), data structures (ziplist, sarray, bitmap, …), and data stores (slab, cuckoo). Rds pairs RESP with SArray/List on slab storage; Twemcache pairs Memcache with slab; Slimcache pairs Memcache with cuckoo. A request flows rbuf → parse → request → process → response → compose → wbuf.]
   process_request(struct response **, struct request *);
   Built on a high-performance RPC server.
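The request path on the architecture slide runs rbuf → parse → process → compose → wbuf, with process_request() as the seam between protocol code and the storage engine. Below is a minimal, self-contained C sketch of that shape; only the process_request() signature is taken from the slide, while the struct fields, the toy "get <key>" parser, and the helper names (parse_req, compose_rsp) are hypothetical stand-ins, not Pelikan's actual definitions.

/* Sketch of the rbuf -> parse -> process -> compose -> wbuf flow.
 * Only the process_request() signature comes from the slide; every
 * other type, field, and helper here is a hypothetical placeholder. */
#include <stdio.h>
#include <string.h>

struct request  { char key[64]; };
struct response { const char *val; };

enum parse_status { PARSE_OK, PARSE_EINVALID };

/* protocol module: decode a "get <key>" line from the read buffer */
static enum parse_status
parse_req(struct request *req, const char *rbuf)
{
    return sscanf(rbuf, "get %63s", req->key) == 1 ? PARSE_OK : PARSE_EINVALID;
}

/* service module: consult the data store and hand back a response;
 * the signature matches the one shown on the slide */
static struct response static_rsp;
static void
process_request(struct response **rsp, struct request *req)
{
    static_rsp.val = (strcmp(req->key, "foo") == 0) ? "bar" : NULL;
    *rsp = &static_rsp;
}

/* protocol module: encode the response into the write buffer */
static void
compose_rsp(char *wbuf, size_t cap, const struct response *rsp)
{
    if (rsp->val != NULL) {
        snprintf(wbuf, cap, "VALUE %s\r\nEND\r\n", rsp->val);
    } else {
        snprintf(wbuf, cap, "END\r\n");
    }
}

int
main(void)
{
    struct request req;
    struct response *rsp;
    char wbuf[128];

    if (parse_req(&req, "get foo\r\n") == PARSE_OK) {   /* bytes from rbuf */
        process_request(&rsp, &req);
        compose_rsp(wbuf, sizeof(wbuf), rsp);           /* bytes for wbuf */
        fputs(wbuf, stdout);  /* a real server writes this to the socket */
    }
    return 0;
}

The point of keeping the stages in separate modules, as the slide shows, is that a service such as Twemcache or Slimcache is assembled by picking one protocol, one set of data structures, and one data store.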
5. Pelikan + Intel® Optane™ DC Persistent Memory

6. Intel® Optane™ DC Persistent Memory
   Memory mode
   ➔ The application sees one large volatile memory pool, with DRAM acting as a cache in front of Optane persistent memory
   ➔ Affordable, large volatile memory capacity
   ➔ No code changes
   App Direct mode
   ➔ The application addresses DRAM and Optane persistent memory separately
   ➔ Large-capacity persistent memory

7. Pelikan with Apache Pass (AEP)
   Motivation
   ➔ Cache more data per instance: reduce TCO if memory-bound; improve hit rate
   ➔ Persistent data => warmer cache: improve operations with graceful shutdown and faster rebuild; higher availability during maintenance
   Constraints
   ➔ Maintainable changes: same codebase; non-invasive, retain high-level APIs
   ➔ Operability: flexible invocation
   ➔ Predictable performance

8. Data Pool Abstraction with PMDK
   [Diagram: the slab and cuckoo data stores sit on a common datapool abstraction. The datapool is backed either by persistent memory (a file-backed pool mapped with libpmem from PMDK) or by DRAM through "cc_alloc" (malloc).]

9. Durable Storage with DRAM Compatibility
   [Diagram: the cuckoo hash-table buckets and the slabs keep the same layout whether they live in PMEM or in DRAM.]
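To make the two backings concrete, here is a minimal C sketch of a datapool in the spirit of the slide: libpmem from PMDK for the file-backed persistent case, malloc for the DRAM case. The libpmem calls (pmem_map_file, pmem_persist, pmem_unmap) are real PMDK functions, but the datapool_open()/datapool_close() names, the header/signature layout, and the pool path are illustrative assumptions rather than Pelikan's actual API; a production pool would also need versioning and a clean-shutdown flag before trusting recovered contents.

/* Sketch of a datapool that is either file-backed persistent memory
 * (mapped with libpmem) or plain DRAM (malloc). The datapool_* names,
 * header layout, and pool path are illustrative assumptions only.
 * Build with: cc datapool.c -lpmem */
#include <libpmem.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIGNATURE "PELIKAN_DATAPOOL"   /* hypothetical magic string */

struct pool_header { char signature[32]; }; /* stored at offset 0 on pmem */

struct datapool {
    void  *base;        /* start of usable memory */
    size_t size;        /* usable bytes */
    size_t mapped_len;  /* full mapping, including header (pmem only) */
    int    is_pmem;
    int    recovered;   /* nonzero if an existing pool was reopened */
};

/* Open a pool: pmem-backed when a path is given, heap-backed otherwise. */
static int
datapool_open(struct datapool *dp, const char *path, size_t size)
{
    memset(dp, 0, sizeof(*dp));

    if (path == NULL) {                        /* DRAM fallback ("cc_alloc") */
        dp->base = malloc(size);
        dp->size = size;
        return dp->base != NULL ? 0 : -1;
    }

    /* Create the file if missing; map it whether it is new or existing. */
    void *raw = pmem_map_file(path, size + sizeof(struct pool_header),
                              PMEM_FILE_CREATE, 0600,
                              &dp->mapped_len, &dp->is_pmem);
    if (raw == NULL) {
        return -1;
    }

    struct pool_header *hdr = raw;
    if (strncmp(hdr->signature, POOL_SIGNATURE, sizeof(hdr->signature)) == 0) {
        dp->recovered = 1;                     /* warm restart: data survived */
    } else {
        /* new pools come back zero-filled; stamp a signature and flush it */
        strncpy(hdr->signature, POOL_SIGNATURE, sizeof(hdr->signature) - 1);
        pmem_persist(hdr, sizeof(*hdr));
    }

    dp->base = hdr + 1;                        /* data starts after the header */
    dp->size = dp->mapped_len - sizeof(struct pool_header);
    return 0;
}

static void
datapool_close(struct datapool *dp, const char *path)
{
    if (path == NULL) {
        free(dp->base);
    } else {
        pmem_unmap((char *)dp->base - sizeof(struct pool_header),
                   dp->mapped_len);
    }
}

int
main(void)
{
    struct datapool dp;
    /* Example path; on a real host this lives on a DAX-mounted filesystem. */
    const char *path = "/mnt/pmem0/pelikan.pool";

    if (datapool_open(&dp, path, 1UL << 30) == 0) {      /* 1 GiB pool */
        printf("pool %s: %zu usable bytes, pmem=%d, recovered=%d\n",
               path, dp.size, dp.is_pmem, dp.recovered);
        datapool_close(&dp, path);
    }
    return 0;
}

Roughly, the recovered case is the property behind the recovery numbers later in the deck: in app direct mode the slab data survives a process restart, and the server rebuilds its hash table by walking that data instead of waiting for organic backfill.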
10. Results

11. Benchmark Overview
    Core parameters
    ➔ Instance density: 18-30 instances / host is common
    ➔ Object size: between 64 and 2048 bytes, step x2
    ➔ Dataset size: between 4 GiB and 32 GiB / instance, step x2
    ➔ Connections per server: 100 / 1000
    ➔ R/W ratio: read-only, 90/10, 80/20
    ➔ Twemcache-only for this presentation
    Focus
    ➔ Serving performance (vs. DRAM)
    ➔ Performance scalability with different dataset sizes (app direct mode)
    ➔ Rebuild performance (app direct vs. memory mode)
    ➔ Lab vs. data center
    ➔ Bottleneck analysis

12. Stage 1: Serving Performance (memory mode), aka "does this work at all?"
    Hardware config (Intel lab): 2 x Intel Xeon 8160 (24 cores), 12 x 32 GB DIMM, 12 x 128 GB AEP, 2-2-2 config, 1 x 25 Gb NIC, CentOS 7
    Test config: 30 instances per node, 32-byte keys, 100 connections, NUMA-aware, 90% read / 10% write
    [Chart: throughput (QPS) and p999 latency (µs) vs. value size (64-2048 bytes), for 4/8/16/32 GiB datasets per instance]

13. Stage 2: Serving Performance (app direct mode)
    Hardware config (Intel lab): 2 x Intel Xeon 8260 (24 cores), 12 x 32 GB DIMM, 12 x 128 GB AEP, 2-2-2 config, 1 x 25 Gb NIC, CentOS 7
    Test config: 24 instances per node, 32-byte keys, 100 connections, NUMA-aware
    [Chart: throughput (QPS) and p999 latency (µs) vs. value size (64-1024 bytes), for 4/8/16/32 GiB datasets per instance]

14. Stage 2: Recovery Performance
    Status quo
    ➔ Data availability: no redundancy in cache by default; some clusters are mirrored
    ➔ Backfill: mostly relies on organic traffic; the "bootstrapper" is bounded by QPS; full warmup takes from minutes to days
    ➔ Constraints on maintenance: 20-minute restart interval by default; large clusters take days to restart
    Rebuild from AEP
    ➔ Single instance: 100 GiB of slab data, complete rebuild in 4 minutes
    ➔ Concurrent: 18 instances per host, complete rebuild in 5 minutes
    ➔ Potential impact: speed up maintenance by 1-2 orders of magnitude (often needs other changes)

15. Stage 3: Testing In-house (memory mode)
    Hardware config (Twitter DC): 2 x Intel Xeon 6222 (20 cores), 12 x 16 GB DIMM, 4 x 512 GB AEP, 2-1-1 config, 1 x 25 Gb NIC, CentOS 7
    Test config: 20 instances per node, 64-byte keys, 256-byte values, 1000 connections, NUMA-aware, read-only
    [Chart: latency (µs) by percentile (p25-p9999) for datasets of 10M keys / 4 GiB up to 160M keys / 74 GiB; SLO: p999 < 5 ms]
    Results: p999 max = 16 ms, p9999 max = 148 ms, throughput 1.08M QPS

16. Stage 3: Testing In-house (app direct mode)
    Hardware config (Twitter DC): 2 x Intel Xeon 6222 (20 cores), 12 x 16 GB DIMM, 4 x 512 GB AEP, 2-1-1 config, 1 x 25 Gb NIC, CentOS 7
    Test config: 20 instances per node, 64-byte keys, 256-byte values, 1000 connections, NUMA-aware, read-only
    [Chart: latency (µs) by percentile (p25-p9999) for datasets of 10M keys / 4 GiB up to 160M keys / 74 GiB; SLO: p999 < 5 ms]
    Results: p999 max = 1.4 ms, p9999 max = 2.5 ms, throughput 1.08M QPS

17. Conclusion / Next Step
    Conclusion
    ● App direct mode: changes were modest; can serve all data structures; serving performance comparable to DRAM for tested Twitter workloads; recovery performance was good
    ● Memory mode: a fully-loaded config performs like DRAM; less scalable with a wimpier config
    ● Bottleneck: the network is still primary
    Next step
    ● Network: testing in-house with ADQ
    ● Production canary: will we see the same performance? how does a larger heap affect hit rate?
    ● Performance: scaling with connection counts; profiling, especially for memory mode; tuning data structure/storage design; testing AEP with pelikan_rds

18. Further Read
    Pelikan
    ● Redis at Scale
    ● Caching with Twemcache
    ● Why Pelikan
    ● Pelikan Github
    Cache w/ AEP
    ● Redis-pmem
    ● Memcached with pmem

    Contributors
    Thank you, Intel team: Ali Alavi (@TheAliAlavi), Andy Rudoff (@andyrudoff), Jakub Schmiegel, Jason Harper, Michal Biesek, Mauricio Cuervo (@mauriciocuervo), Piotr Balcer, Usha Upadhyayula
    Thank you, Twitter team: Brian Martin (@brayniac), Kevin Yang (@kevjyang), Matt Silver (@msilver)
    #collaborate