Shared-Memory Synchronization
Total Page:16
File Type:pdf, Size:1020Kb
Shared-Memory Synchronization MIchael L. Scott University of Rochester SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #1 &MC Morgan& cLaypool publishers ABSTRACT Ever since the advent of time sharing in the 1960s, designers of concurrent and parallel systems have needed to synchronize the activities of threads of control that share data structures in memory. In recent years, the study of synchronization has gained new urgency with the proliferation of multicore processors, on which even relatively simple user-level programs must frequently run in parallel. This monograph offers a comprehensive survey of shared-memory synchronization, with an emphasis on “systems-level” issues. It includes sufficient coverage of architectural details to understand correctness and performance on modern multicore machines, and sufficient coverage of higher-level issues to understand how synchronization is embedded in modern programming languages. The primary intended audience is “systems programmers”—the authors of operating systems, library packages, language run-time systems, and server and utility programs. Much of the discussion should also be of interest to application programmers who want to make good use of the synchronization mechanisms available to them, and to computer architects who want to understand the ramifications of their design decisions on systems- level code. KEYWORDS Atomicity, barriers, busy-waiting, conditions, locality, locking, memory mod- els, monitors, multiprocessor architecture, nonblocking algorithms, scheduling, semaphores, synchronization, transactional memory. iii To Kelly, my wife and partner of more than 30 years. iv Contents 1 Introduction .................................................................1 1.1 Atomicity.................................................................3 1.2 Condition Synchronization . 5 1.3 Spinning vs. Blocking . 6 1.4 Safety and Liveness . 7 1.5 The Rest of this Monograph . 8 2 Architectural Background ...................................................9 2.1 Cores and Caches: Basic Shared-Memory Architecture . 9 2.1.1 Temporal and Spatial Locality . 11 2.1.2 Cache Coherence. .12 2.1.3 Processor (Core) Locality . 13 2.2 Cache Consistency . 14 2.2.1 Sources of Inconsistency. .14 2.2.2 Memory Fence Instructions . 16 2.2.3 Example Architectures . 17 2.3 Atomic Primitives . 18 2.3.1 The ABA Problem . 21 2.3.2 Other Synchronization Hardware . 23 3 Some Useful Theory ........................................................24 3.1 Safety....................................................................24 3.1.1 Deadlock Freedom . 25 3.1.2 Atomicity . 26 3.2 Liveness..................................................................33 3.2.1 Nonblocking Progress . 34 CONTENTS v 3.2.2 Fairness. .36 3.3 The Consensus Hierarchy. .37 3.4 Memory Models . 38 3.4.1 Formal Framework . 39 3.4.2 Data Races . 41 3.4.3 Real-World Models. .42 4 Practical Spin Locks ........................................................44 4.1 Classical load-store-only Algorithms . 44 4.2 Centralized Algorithms. .47 4.2.1 Test and set Locks.................................................48 4.2.2 The Ticket Lock . 50 4.3 Queued Spin Locks. .51 4.3.1 The MCS Lock . 52 4.3.2 The CLH Lock. .56 4.3.3 Which Spin Lock Should I Use? . 60 4.4 Special-case Optimizations . 60 4.4.1 Nested Locks . 60 4.4.2 Locality-conscious Locking . 61 4.4.3 Double-checked Locking. .63 4.4.4 Asymmetric Locking . 63 5 Spin-based Conditions and Barriers.........................................67 5.1 Flags. .67 5.2 Barrier Algorithms . 68 5.2.1 The Sense-Reversing Centralized Barrier . 69 5.2.2 Software Combining. .69 5.2.3 The Dissemination Barrier . 71 5.2.4 Tournament Barriers . 72 5.2.5 Static Tree Barriers . 74 5.2.6 Which Barrier Should I Use? . 74 vi CONTENTS 5.3 Barrier Extensions . 76 5.3.1 Fuzzy Barriers . 76 5.3.2 Adaptive Barriers . 79 5.3.3 Barrier-like Constructs . 81 6 Read-mostly Atomicity .....................................................83 6.1 Reader-writer Locks . 83 6.1.1 Centralized Algorithms . 84 6.1.2 Queued Reader-writer Locks . 85 6.2 Sequence Locks . 90 6.3 Read-Copy Update . 93 7 Synchronization and Scheduling ............................................98 7.1 Scheduling . 98 7.2 Semaphores . 98 7.3 Monitors . 98 7.4 Other Language Mechanisms . 98 7.4.1 Conditional Critical Regions . 98 7.4.2 Futures . 98 7.4.3 Series-Parallel Execution . 98 7.5 Kernel-Only Mechanisms . 98 7.6 Performance Considerations . 98 7.6.1 Spin-Then-Wait. .99 7.6.2.