
Revisiting Concurrency in High-Performance NoSQL

Yuvraj Patel¹, Mohit Verma², Andrea C. Arpaci-Dusseau¹, Remzi H. Arpaci-Dusseau¹
¹ University of Wisconsin-Madison   ² NVIDIA

Technological changes

Faster storage devices

                 DRAM         NVM                SSD           HDD
Read latency     100-200 ns   500 ns-2 us       20-50 us      5-10 ms
Write bandwidth  20-50 GB/s   5-10 GB/s/die     200-500 MB/s  50-100 MB/s

More processing power available per node

Year        2006  2010  2014   2019
# of cores  2     8     32-64  128

7/10/2018

Concurrency control

• Act of coordinating concurrent accesses
• Key to scaling
• Decades worth of effort
• Optimistic concurrency control
• Adaptive concurrency control
• Many other ways…

[Diagram: threads accessing shared data – uncoordinated access raises correctness questions ("Correctness???"); coordinated access goes through concurrency control]

Key Question

Concurrency control was designed based on older technology
• Very few cores
• Hard disk

Is concurrency control a bottleneck?
• Can current techniques work well with newer technological changes?
• Is it time to revisit concurrency control?

Answer – Quantitative study

We study the performance scaling of 5 popular systems
• MongoDB, Cassandra, CouchDB, ArangoDB, Oracle NoSQL DB
We measure actual and zero-contention concurrency control (ZCCC) scaling performance on a single node
Observations
• No system scales well
• The presence of a faster storage device does not help
• The difference between ZCCC and actual performance is more than 3X

Answer – Qualitative study

Qualitative study of the techniques used by these systems
• Categorize to understand weaknesses
• Thread architecture, batching, granularity, partitioning, scheduling, low-level efficiency
Observations
• Batching, thread architecture – used by all
• Partitioning, scheduling not used efficiently for scaling
• Scope for improvement to optimize common cases

Our Solution

We present Xyza – an extension of MongoDB
• Concentrates on partitioning, scheduling & low-level efficiency
• Per-client and key-space partitioning
• Two novel scheduling techniques: contention-aware scheduling & semantic-aware scheduling
• Optimizes common cases using atomics
Performance
• 2X to 3X faster than MongoDB
• Xyza single-instance performance is 0.8X-0.9X of ZCCC performance

Outline

• Overview
• Concurrency Analysis
• Xyza Design
• Evaluation
• Conclusion

Zero-contention concurrency control (ZCCC)

Performance increases linearly as the workload increases; it saturates once the CPU/disk/network/memory is exhausted

[Figure: throughput (ops/sec) vs. workload – linear growth up to capacity, then flat at saturation]

Concurrent application

Today’s concurrent application architecture
• Clients connect to the server
• Application – multi-threaded
• Multi/many processing cores
• One or more disks
• Memory

[Diagram: clients C1-C4 connect to a multi-threaded application running on multi-core processing, backed by memory and disks]

Identify zero-contention concurrency control

For ZCCC,
• Hypothesis – less contention for resources by perfectly partitioning
How to reduce contention for resources?
• Instantiate multiple instances of the same application
• CPU/disk/memory/network still stressed

[Diagram: clients C1-C4, each served by its own instance of the multi-threaded application on the same multi-core machine and disks]

Actual vs ZCCC

Why writes?
• Modify global structures frequently
• Hard to scale
Why the weakest consistency option?
• Designed to achieve high performance
Workload
• 50M key-value pair inserts, value size – 100 bytes
Setup
• 2-socket 8-core hyperthreaded, one 480 GB SSD, 128 GB RAM, 10 Gbps network
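As a concrete illustration, the insert-only workload above can be sketched as a simple generator (a hypothetical sketch: the key format and random value contents are illustrative; only the 100-byte value size comes from the slide):

```python
import os

VALUE_SIZE = 100  # bytes, per the workload description

def generate_inserts(n):
    """Yield n (key, value) pairs mimicking the insert-only workload."""
    for i in range(n):
        # Key format is illustrative; each value is a random 100-byte blob.
        yield (f"key-{i}", os.urandom(VALUE_SIZE))

batch = list(generate_inserts(1000))
```

The study issues 50M such inserts; a smaller batch like this is enough to drive a per-client scaling experiment.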

Results - MongoDB

[Chart: single-instance vs. ZCCC throughput (Kops/sec) as no. of clients grows from 1 to 35]
• ZCCC: disk b/w ranges 26-350 MB/s, CPU utilization ranges 6-90%; near-zero contention helps in scaling performance
• Single instance: disk b/w ranges 25-130 MB/s, CPU utilization ranges 6-85%; CPU utilization increases even though throughput is saturated, pointing to resource contention

Results – Cassandra

[Chart: single-instance vs. ZCCC (2 instances) throughput (Kops/sec) as no. of clients grows from 1 to 420]

Results – Oracle NoSQL DB

[Chart: single-instance vs. ZCCC (10 instances) throughput (Kops/sec) as no. of clients grows from 1 to 30]

Results – CouchDB

[Chart: single-instance vs. ZCCC throughput (Kops/sec) as no. of clients grows from 1 to 50]

Results – ArangoDB

[Chart: single-instance vs. ZCCC throughput (Kops/sec) as no. of clients grows from 1 to 120]

Categorization

We study the 5 systems qualitatively
• Analyze code, design, and architecture documents
• Categorize the techniques – understand the weaknesses
6 categories
• Thread architecture, batching, granularity, partitioning, scheduling, low-level efficiency (read the paper)
Observations
• Partitioning and scheduling not used efficiently
• Scope to improve common-case performance

Analysis - Summary

The quantitative study shows
• Resource competition kills actual performance
• ZCCC performance scales as hypothesized
• Storage and network are not a bottleneck
The qualitative study shows
• Partitioning & scheduling are not used efficiently
• There is a need to optimize for common cases
Systems are designed for slow storage media rather than for faster devices

Outline

• Overview
• Concurrency Analysis
• Xyza Design
• Evaluation
• Conclusion

Xyza Design

Xyza – an extension of MongoDB
Why MongoDB?
• Certain design ideas useful for Xyza
• Scope for improvements
• Popular and heavily used
Xyza’s design concentrates on partitioning, scheduling, and low-level efficiency

MongoDB architecture

MongoDB’s concurrency architecture diagram

[Diagram: clients served by per-client threads plus checkpointing and logging threads on CPUs 1 and 2; all threads share global data – a vector (V), a journal (J), a per-collection B-tree over the key space, and the disk cache]

MongoDB architecture problems

MongoDB’s concurrency architecture diagram

[Diagram: the same architecture – the vector, journal, per-collection B-tree, and disk cache are globally shared, so every thread contends for them]

Xyza - Partitioning

Xyza’s concurrency architecture diagram
• Take advantage of per-client data: each per-client thread gets exclusive access to its own vector (V1, V2) and journal (J1, J2)
• Key-space partitioning facilitates execution of non-overlapping operations
• Threads are pinned to CPUs

[Diagram: clients, pinned per-client threads plus checkpointing and logging threads on CPUs 1 and 2, per-client vectors and journals, a per-collection B-tree over key-space partitions K1/K2, and the disk cache]
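Key-space partitioning of this kind can be sketched as a deterministic hash of the key (a minimal sketch: the partition count and hash choice are assumptions, not Xyza’s actual scheme):

```python
import zlib

NUM_PARTITIONS = 4  # illustrative; not Xyza's actual configuration

def partition_for(key: str) -> int:
    """Map a key deterministically to a key-space partition.

    Operations whose keys land in different partitions touch disjoint
    state and can execute without coordinating with each other.
    """
    return zlib.crc32(key.encode()) % NUM_PARTITIONS
```

Because the mapping is deterministic, every thread independently agrees on which partition owns a given key, with no shared lookup table.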

Xyza – Contention-aware scheduling

Xyza’s concurrency architecture diagram
• Locks as a resource
• Exclusive access per key-space partition
• Execute operations that will not contend for the same resources
• Overlapping operations wait
Scheduling – a form of primitive synchronization

[Diagram: a scheduler sits between the pinned per-client threads and the partitioned data – per-client vectors (V1, V2) and journals (J1, J2), a per-collection B-tree over key-space partitions KS1/KS2, and the disk cache]

Xyza scheduling example – No conflict

Different partition access

[Diagram: two clients issue Key:1/Value:1 and Key:10/Value:2; the keys fall in different key-space partitions (KS1, KS2), so the scheduler runs both operations concurrently]

Xyza scheduling example - Conflict

Same partition access

[Diagram: two clients issue Key:1/Value:1 and Key:2/Value:2; both keys fall in the same key-space partition, so the second operation waits for the first one to complete]
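The conflict/no-conflict behavior above can be sketched with one lock per key-space partition (a toy sketch assuming a crc32-based partitioner; Xyza’s actual scheduler treats locks as schedulable resources rather than simply blocking on a mutex):

```python
import threading
import zlib

NUM_PARTITIONS = 2  # KS1 and KS2 in the diagrams
partition_locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def insert(key: str, value):
    """Insert under the owning partition's lock.

    Inserts to different partitions proceed concurrently; a second
    insert to the same partition waits for the first to complete.
    """
    p = zlib.crc32(key.encode()) % NUM_PARTITIONS
    with partition_locks[p]:
        partitions[p][key] = value

threads = [threading.Thread(target=insert, args=(f"key-{i}", i))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With more partitions than active clients, the common case is the no-conflict path, where no operation ever waits.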

Xyza - Semantic-aware scheduling

Same partition access

What can be done while an operation waits for its turn?
Semantic-aware scheduling is the answer (read the paper)

[Diagram: the same conflict scenario – the second operation (Key:2/Value:2) waits for the first (Key:1/Value:1) to complete]

Xyza – Low-level efficiency

MongoDB
• Uses atomics internally
• Lock manager for multiple-granularity locking (databases → collections → documents)

Xyza
• Lock manager replaced with a simple wait-signal mechanism
• Optimize the common case
• Mutex locks are heavy compared to atomics
• Bypass the mutex – use atomics instead
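A wait-signal replacement for a lock manager can be sketched with a condition variable (an illustrative sketch, not Xyza’s actual implementation; the class and method names are made up):

```python
import threading

class WaitSignal:
    """Toy wait-signal mechanism standing in for a full lock manager.

    A waiter blocks until it is signaled that its turn has come,
    instead of going through multiple-granularity lock acquisition.
    """
    def __init__(self):
        self._cond = threading.Condition()
        self._busy = False

    def acquire(self):
        with self._cond:
            while self._busy:       # wait for the current holder
                self._cond.wait()
            self._busy = True

    def release(self):
        with self._cond:
            self._busy = False
            self._cond.notify()     # signal one waiter

ws = WaitSignal()
counter = 0

def bump():
    global counter
    for _ in range(1000):
        ws.acquire()
        counter += 1
        ws.release()

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The simplification pays off precisely because, after per-client and key-space partitioning, most operations never conflict and the wait path is rarely taken.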

Outline

• Overview
• Concurrency Analysis
• Xyza Design
• Evaluation
• Conclusion

Evaluation

[Chart: throughput (Kops/sec) vs. no. of clients (1-35) for single-instance MongoDB, ZCCC MongoDB, and Xyza; *same workload & setup as earlier]
• Xyza is 2X faster than single-instance MongoDB
• Disk b/w ranges 25-280 MB/s; CPU utilization ranges 6-78%
• Partitioning and efficient scheduling help reduce contention

Evaluation – contention reduction

[Chart: throughput (Kops/sec) vs. no. of clients (1-35), adding one curve per build – ZCCC MongoDB, single-instance MongoDB, single-instance Mongo_PCDB, and Xyza_PCDB (*PCDB – per-client database)]
• Mongo_PCDB: the lock manager is stressed even when contention is reduced
• Xyza_PCDB: 0.8-0.9X of ZCCC performance; optimizing the common case helps scaling
• Less resource contention – key to scaling

Outline

• Overview
• Concurrency Analysis
• Xyza Design
• Evaluation
• Conclusion

Conclusion

Today’s NoSQL DBs do not scale well despite the presence of faster storage media and high processing capacity
We present Xyza, which scales well and comes close to ZCCC performance
It is time to revisit concurrency and consider aggressive concurrency mechanisms
The proposed techniques can be applied to many other systems
….WXYZAB….

Thank you
