Shared-Memory Synchronization

Michael L. Scott, University of Rochester

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #1

Morgan & Claypool Publishers

ABSTRACT

Ever since the advent of time sharing in the 1960s, designers of concurrent and parallel systems have needed to synchronize the activities of threads of control that share data structures in memory. In recent years, the study of synchronization has gained new urgency with the proliferation of multicore processors, on which even relatively simple user-level programs must frequently run in parallel.

This monograph offers a comprehensive survey of shared-memory synchronization, with an emphasis on "systems-level" issues. It includes sufficient coverage of architectural details to understand correctness and performance on modern multicore machines, and sufficient coverage of higher-level issues to understand how synchronization is embedded in modern programming languages.

The primary intended audience is "systems programmers"—the authors of operating systems, library packages, language run-time systems, and server and utility programs. Much of the discussion should also be of interest to application programmers who want to make good use of the synchronization mechanisms available to them, and to computer architects who want to understand the ramifications of their design decisions on systems-level code.

KEYWORDS

Atomicity, barriers, busy-waiting, conditions, locality, locking, memory models, monitors, multiprocessor architecture, nonblocking algorithms, scheduling, semaphores, synchronization, transactional memory

To Kelly, my wife and partner of more than 30 years.

Contents

1 Introduction
    1.1 Atomicity
    1.2 Condition Synchronization
    1.3 Spinning vs. Blocking
    1.4 Safety and Liveness
    1.5 The Rest of this Monograph

2 Architectural Background
    2.1 Cores and Caches: Basic Shared-Memory Architecture
        2.1.1 Temporal and Spatial Locality
        2.1.2 Cache Coherence
        2.1.3 Processor (Core) Locality
    2.2 Cache Consistency
        2.2.1 Sources of Inconsistency
        2.2.2 Memory Fence Instructions
        2.2.3 Example Architectures
    2.3 Atomic Primitives
        2.3.1 The ABA Problem
        2.3.2 Other Synchronization Hardware

3 Some Useful Theory
    3.1 Safety
        3.1.1 Deadlock Freedom
        3.1.2 Atomicity
    3.2 Liveness
        3.2.1 Nonblocking Progress
        3.2.2 Fairness
    3.3 The Consensus Hierarchy
    3.4 Memory Models
        3.4.1 Formal Framework
        3.4.2 Data Races
        3.4.3 Real-World Models

4 Practical Spin Locks
    4.1 Classical Load-Store-Only Algorithms
    4.2 Centralized Algorithms
        4.2.1 Test-and-Set Locks
        4.2.2 The Ticket Lock
    4.3 Queued Spin Locks
        4.3.1 The MCS Lock
        4.3.2 The CLH Lock
        4.3.3 Which Spin Lock Should I Use?
    4.4 Special-Case Optimizations
        4.4.1 Nested Locks
        4.4.2 Locality-Conscious Locking
        4.4.3 Double-Checked Locking
        4.4.4 Asymmetric Locking

5 Spin-Based Conditions and Barriers
    5.1 Flags
    5.2 Barrier Algorithms
        5.2.1 The Sense-Reversing Centralized Barrier
        5.2.2 Software Combining
        5.2.3 The Dissemination Barrier
        5.2.4 Tournament Barriers
        5.2.5 Static Tree Barriers
        5.2.6 Which Barrier Should I Use?
    5.3 Barrier Extensions
        5.3.1 Fuzzy Barriers
        5.3.2 Adaptive Barriers
        5.3.3 Barrier-like Constructs

6 Read-Mostly Atomicity
    6.1 Reader-Writer Locks
        6.1.1 Centralized Algorithms
        6.1.2 Queued Reader-Writer Locks
    6.2 Sequence Locks
    6.3 Read-Copy Update

7 Synchronization and Scheduling
    7.1 Scheduling
    7.2 Semaphores
    7.3 Monitors
    7.4 Other Language Mechanisms
        7.4.1 Conditional Critical Regions
        7.4.2 Futures
        7.4.3 Series-Parallel Execution
    7.5 Kernel-Only Mechanisms
    7.6 Performance Considerations
        7.6.1 Spin-Then-Wait
        7.6.2 Preemption and Convoys

8 Nonblocking Algorithms
    8.1 The M&S Nonblocking Queue
    8.2 Combining and Elimination

9 Transactional Memory

A The pthread API

Bibliography

Author's Biography

CHAPTER 1
Introduction

In computer science, as in real life, concurrency makes it much more difficult to reason about events. In a linear sequence, if E1 occurs before E2, which occurs before E3, and so on, we can reason about each event individually: Ei begins with the state of the world (or the program) after Ei−1, and produces some new state of the world for Ei+1. But if the sequence of events {Ei} is concurrent with some other sequence {Fi}, all bets are off. The state of the world prior to Ei can now depend not only on Ei−1 and its predecessors, but also on some prefix of {Fi}.

Consider a simple example in which two threads attempt—concurrently—to increment a shared global counter:

thread 1:       thread 2:
    ctr++           ctr++

On any modern computer, the increment operation (ctr++) will comprise at least three separate machine instructions: one to load ctr into a register, a second to increment the register, and a third to store the register back to memory. This gives us a pair of concurrent instruction sequences:

thread 1:           thread 2:
1: r := ctr         1: r := ctr
2: inc r            2: inc r
3: ctr := r         3: ctr := r

Intuitively, if our counter is initially 0, we should like it to be 2 when both threads have completed. If each thread executes line 1 before the other executes line 3, however, then both will store a 1, and one of the increments will be "lost."

The problem here is that concurrent sequences of events can interleave in arbitrary ways, many of which may lead to incorrect results. In this specific example, only two of the (6 choose 3) = 20 possible interleavings—the ones in which one thread completes before the other starts—will produce the result we want. Synchronization is the art of precluding interleavings that we consider incorrect. In a distributed (i.e., message-passing) system, synchronization is subsumed in communication: if thread T2 receives a message from T1, then in all possible execution interleavings, all the events performed by T1 prior to its send will occur before any of the events performed by T2 after its receive. In a shared-memory system, however, things are not so simple. Instead of exchanging messages, threads communicate implicitly through loads and stores. Implicit communication gives the programmer substantially more flexibility in design, but it requires separate mechanisms for explicit synchronization. Those mechanisms are the subject of this monograph.

Significantly, the need for synchronization arises whenever operations are concurrent, regardless of whether they actually run in parallel. This observation dates from the earliest work in the field, led by Edsger Dijkstra [1965; 1968a; 1968b] and performed in the early 1960s. If a single processor core context switches among concurrent operations at arbitrary times, then while some interleavings of the underlying events may be less probable than they are with truly parallel execution, they are nonetheless possible, and a correct program must be synchronized to protect against any that would be incorrect. From the programmer's perspective, a multiprogrammed uniprocessor with preemptive scheduling is no easier to program than a multicore or multiprocessor machine.

A few languages and systems guarantee that only one thread will run at a time, and that context switches will occur only at well defined points in the code. The resulting execution model is sometimes referred to as "cooperative" multithreading. One might at first expect it to simplify synchronization, but the benefits tend not to be significant in practice. The problem is that potential context-switch points may be hidden inside library routines, or in the methods of black-box abstractions. Absent a programming model that attaches a true or false "may cause a context switch" tag to every method of every system interface, programmers must protect against unexpected interleavings by using synchronization techniques analogous to those of truly concurrent code.

As it turns out, almost all synchronization patterns in real-world programs (i.e., all conceptually appealing constraints on acceptable execution interleaving) can be seen as instances of either atomicity or condition synchronization. Atomicity ensures that a specified sequence of instructions participates in any possible interleavings as a single, indivisible unit—that nothing else appears to occur in the middle of its execution. (Note that the very concept of interleaving is based on the assumption that underlying machine instructions are themselves atomic.) Condition synchronization ensures that a specified operation does not occur until some necessary precondition is true.
Often, this precondition is the completion of some other operation in some other thread.
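Returning to the counter example at the start of the chapter, the lost-update interleaving is easy to provoke in practice. The following minimal C sketch is our illustration, not code from the monograph; it uses the POSIX thread library summarized in Appendix A, and the final count is usually well below the expected total.

    /* Lost-update race: two threads increment a shared counter with no
       synchronization.  The final value is typically less than 2 * ITERS. */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 1000000
    static long ctr = 0;                /* shared, unprotected counter */

    static void *worker(void *arg) {
        (void)arg;
        for (long i = 0; i < ITERS; i++)
            ctr++;                      /* load, increment, store: not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("ctr = %ld (expected %d)\n", ctr, 2 * ITERS);
        return 0;
    }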

Distribution

At the level of hardware devices, the distinction between shared memory and message passing disappears: we can think of a memory cell as a simple process that receives load and store messages from more complicated processes, and sends value and ok messages, respectively, in response. While theoreticians often think of things this way (the annual PODC [Symposium on Principles of Distributed Computing] and DISC [International Symposium on Distributed Computing] conferences routinely publish shared-memory algorithms), systems programmers tend to regard shared memory and message passing as fundamentally distinct. This monograph covers only the shared-memory case.

1.1 ATOMICITY

The example at the beginning of this chapter requires only atomicity: correct execution will be guaranteed (and incorrect interleavings avoided) if the instruction sequence corresponding to an increment operation executes as a single atomic unit:

thread 1:       thread 2:
    atomic          atomic
        ctr++           ctr++

The simplest (but not the only!) means of implementing atomicity is to force threads to execute their operations one at a time. This strategy is known as mutual exclusion. The code of an atomic operation that executes in mutual exclusion is called a critical section. Traditionally, mutual exclusion is obtained by performing acquire and release operations on an abstract data object called a lock:

lock L

thread 1:           thread 2:
    L.acquire()         L.acquire()
    ctr++               ctr++
    L.release()         L.release()
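In a concrete library such as POSIX threads (Appendix A), the same pattern might be written as follows. This is a sketch of ours with error checking omitted; the helper name increment_ctr is an assumption, not part of the monograph's examples.

    #include <pthread.h>

    static long ctr = 0;
    static pthread_mutex_t ctr_lock = PTHREAD_MUTEX_INITIALIZER;

    static void increment_ctr(void) {
        pthread_mutex_lock(&ctr_lock);      /* L.acquire() */
        ctr++;                              /* critical section */
        pthread_mutex_unlock(&ctr_lock);    /* L.release() */
    }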

The acquire and release operations are assumed to have been implemented (at some lower level of abstraction) in such a way that (1) each is atomic and (2) acquire waits if the lock is currently held by some other thread.

In our simple increment example, mutual exclusion is arguably the only implementation strategy that will guarantee atomicity. In other cases, however, it may be overkill. Consider an operation that increments a specified element in an array of counters:

ctr_inc(i):
    L.acquire()
    ctr[i]++
    L.release()

Concurrency and Parallelism

Sadly, the adjectives "concurrent" and "parallel" are used in different ways by different authors. For some authors (including the current one), two operations are concurrent if both have started and neither has completed; two operations are parallel if they may actually execute at the same time. Parallelism is thus an implementation of concurrency. For other authors, two operations are concurrent if there is no correct way to assign them an order in advance; they are parallel if their executions are independent of one another, so that any order is acceptable. An interactive program and its event handlers, for example, are concurrent with one another, but not parallel. For yet other authors, two operations that may run at the same time are considered concurrent (also called task parallel) if they execute different code; they are parallel if they execute the same code using different data (also called data parallel).

If thread 1 calls ctr_inc(i) and thread 2 calls ctr_inc(j), we will need mutual exclusion only if i = j. We can increase potential concurrency with a finer granularity of locking—for example, by declaring a separate lock for each counter, and acquiring only the one we need. In this example, the only downside is the space consumed by the extra locks. In other cases, however, fine-grain locking can introduce performance or correctness problems. Consider an operation designed to move n dollars from account i to account j in a banking program. If we want to use fine-grain locking (so unrelated transfers won't exclude one another in time), we need to acquire two locks:

move(n, i, j):
    L[i].acquire()
    L[j].acquire()      // (there's a bug here)
    acct[i] -= n
    acct[j] += n
    L[i].release()
    L[j].release()

If lock acquisition and release are expensive, we will need to consider whether the benefit of concurrency in independent operations outweighs the cost of the extra lock. More significantly, we will need to address the possibility of deadlock:

thread 1:               thread 2:
    move(100, 2, 3)         move(50, 3, 2)

If execution proceeds more or less in lockstep, thread 1 may acquire lock 2 and thread 2 may acquire lock 3 before either attempts to acquire the other. Both may then wait forever. The simplest solution in this case is to always acquire the lower-numbered lock first (a concrete sketch of this discipline appears at the end of this section). In more general cases, it may be difficult to devise a static ordering. Alternative atomicity mechanisms—in particular, transactional memory, which we will consider in Chapter 9—attempt to achieve the concurrency of fine-grain locking without its conceptual complexity.

From the programmer's perspective, fine-grain locking is a means of implementing atomicity for large, complex operations using smaller (possibly overlapping) critical sections. The burden of ensuring that the implementation is correct (that it does, indeed, achieve deadlock-free atomicity for the large operations) is entirely the programmer's responsibility. The appeal of transactional memory is that it raises the level of abstraction, allowing the programmer to delegate this responsibility to some underlying system.

Whether atomicity is achieved through coarse-grain locking, programmer-managed fine-grain locking, or some form of transactional memory, the intent is that atomic regions appear to be indivisible. Put another way, any realizable execution of the program—any possible interleaving of its machine instructions—must be indistinguishable from (have the same externally visible behavior as) some execution in which the instructions of each atomic operation are contiguous in time, with no other instructions interleaved among them. As we shall see in Chapter 3, there are several possible ways to formalize this requirement, most notably linearizability and several variants on serializability.
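The lock-ordering discipline mentioned above might look like the following in C with POSIX mutexes. This is a sketch of ours under stated assumptions: the array sizes, the use of long for account balances, and the initialization of the mutexes elsewhere are all placeholders.

    #include <pthread.h>

    #define NACCTS 1000
    static long acct[NACCTS];
    static pthread_mutex_t L[NACCTS];   /* assume each initialized elsewhere */

    void move(long n, int i, int j) {
        if (i == j) return;             /* nothing to transfer; also avoids
                                           acquiring the same lock twice */
        int first  = (i < j) ? i : j;   /* always take the lower-numbered   */
        int second = (i < j) ? j : i;   /* lock first, so no cycle can form */
        pthread_mutex_lock(&L[first]);
        pthread_mutex_lock(&L[second]);
        acct[i] -= n;
        acct[j] += n;
        pthread_mutex_unlock(&L[second]);
        pthread_mutex_unlock(&L[first]);
    }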
1.2 CONDITION SYNCHRONIZATION

In some cases, atomicity is not enough for correctness. Consider, for example, a program containing a work queue, into which "producer" threads place tasks they wish to have performed, and from which "consumer" threads remove tasks they plan to perform. To preserve the structural integrity of the queue, we will need each insert or remove operation to execute atomically. More than this, however, we will need to ensure that a remove operation executes only when the queue is nonempty and (if the size of the queue is bounded) an insert operation executes only when the queue is nonfull:

Q.remove():                                 Q.insert(d):
    atomic                                      atomic
        await !Q.empty()                            await !Q.full()
        // return data from next full slot          // put d in next empty slot

In the synchronization literature, a concurrent queue (of whatever sort of objects) is sometimes called a bounded buffer; it is the canonical example of mixed atomicity and condition synchronization. As suggested by our use of the await condition notation above (notation we have not yet explained how to implement), the conditions in a bounded buffer can be specified at the beginning of the critical section. In other, more complex operations, a thread may need to perform nontrivial work within an atomic operation before it knows what condition(s) it needs to wait for. Since another thread will typically need to access (and modify!) some of the same data in order to make the condition true, a mid-operation wait needs to be able to "break" the atomicity of the surrounding operation in some well-defined way. In Chapter 7 we shall see that some synchronization mechanisms support only the simpler case of waiting at the beginning of a critical section; others allow conditions to appear anywhere inside.

In many programs, condition synchronization is also useful outside atomic operations—typically as a means of separating "phases" of computation. In the simplest case, suppose that a task to be performed in thread B cannot safely begin until some other task (data structure initialization, perhaps) has completed in thread A. Here B may spin on a Boolean flag variable that is initially false and that is set by A to true. In more complex cases, it is common for a program to go through a series of phases, each of which is internally parallel, but must complete in its entirety before the next phase can begin. Many simulations, for example, have this structure. For such programs, a synchronization barrier, executed by all threads at the end of every phase, ensures that all have arrived before any is allowed to depart.

It is tempting to suppose that atomicity (or mutual exclusion, at least) would be simpler to implement—or to model formally—than condition synchronization. After all, it could be thought of as a subcase: "wait until no other thread is currently in its critical section." The problem with this thinking is the scope of the condition. By standard convention, we allow conditions to consider only the values of variables, not the states of other threads. Seen in this light, atomicity is the more demanding concept: it requires agreement among all threads that their operations will avoid interfering with each other. And indeed, as we shall see in Section 3.3, atomicity is more difficult to implement, in a formal, theoretical sense.
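One common way to realize the await idiom of the bounded buffer uses a mutex and condition variables from the POSIX thread library, as sketched below. This is our illustration, not the monograph's; scheduler-based mechanisms of this kind are the subject of Chapter 7, and the names, the SIZE constant, and the (omitted) initialization are assumptions.

    #include <pthread.h>

    #define SIZE 16

    typedef struct {
        void *slot[SIZE];
        int head, tail, count;
        pthread_mutex_t m;
        pthread_cond_t nonempty, nonfull;
    } queue;

    void q_insert(queue *q, void *d) {
        pthread_mutex_lock(&q->m);
        while (q->count == SIZE)                    /* await !Q.full() */
            pthread_cond_wait(&q->nonfull, &q->m);
        q->slot[q->tail] = d;                       /* put d in next empty slot */
        q->tail = (q->tail + 1) % SIZE;
        q->count++;
        pthread_cond_signal(&q->nonempty);
        pthread_mutex_unlock(&q->m);
    }

    void *q_remove(queue *q) {
        pthread_mutex_lock(&q->m);
        while (q->count == 0)                       /* await !Q.empty() */
            pthread_cond_wait(&q->nonempty, &q->m);
        void *d = q->slot[q->head];                 /* take data from next full slot */
        q->head = (q->head + 1) % SIZE;
        q->count--;
        pthread_cond_signal(&q->nonfull);
        pthread_mutex_unlock(&q->m);
        return d;
    }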

1.3 SPINNING VS. BLOCKING

Just as synchronization patterns tend to fall into two main camps (atomicity and condition synchronization), so too do their implementations: they all employ spinning or blocking. Spinning is the simpler case. For condition synchronization, it takes the form of a trivial loop:

while !condition
    // do nothing (spin)

For mutual exclusion, the simplest implementation employs a special hardware instruction known as test_and_set (TAS). The TAS instruction, available on almost every modern machine, sets a specified Boolean variable to true and returns the previous value. Using TAS, we can implement a trivial spin lock:

type lock = Boolean := false

L.acquire():            L.release():
    while !TAS(&L)          L := false
        // spin

Here we have equated the acquisition of L with the act of changing it from false to true. The acquire operation repeatedly applies TAS to the lock until it finds that the previous value was false. As we shall see in Chapter 4, the trivial test_and_set lock has several major performance problems. It is, however, correct.

The obvious objection to spinning (also known as busy-waiting) is that it wastes processor cycles. In a multiprogrammed system it is often preferable to block—to yield the processor core to some other, runnable thread. The prior thread may then be run again later—either after some suitable interval of time (at which point it will check its condition, and possibly yield, again), or at some particular time when another thread has determined that the condition is finally true.

The software responsible for choosing which thread to execute when is known as a scheduler. In many systems, scheduling occurs at two different levels. Within the operating system, a kernel-level scheduler implements (kernel-level) threads on top of some smaller number of processor cores; within the user-level run-time system, a user-level scheduler implements (user-level) threads on top of some smaller number of kernel threads. At both levels, the code that implements threads (and synchronization) may present a library-style interface, composed entirely of subroutine calls; alternatively, the language in which the kernel or application is written may provide special syntax for thread management and synchronization, implemented by the compiler.

Certain issues are unique to schedulers at different levels. The kernel-level scheduler, in particular, is responsible for protecting applications from one another, typically by running the threads of each in a different address space; the user-level scheduler, for its part, may need to address such issues as non-conventional stack layout. To a large extent, however, the kernel and runtime schedulers have similar internal structure, and both spinning and blocking may be useful at either level.

While blocking saves cycles that would otherwise be wasted on fruitless re-checks of a condition or lock, it spends cycles on the context switching overhead required to change the running thread. If the average time that a thread expects to wait is less than twice the context-switch time, spinning will actually be faster than blocking. It is also the obvious choice if there is only one thread per core, as is sometimes the case in embedded or high-performance systems. Finally, as we shall see in Chapter 7, blocking (otherwise known as scheduler-based synchronization) must be built on top of spinning, because the data structures used by the scheduler itself require synchronization.
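For concreteness, the trivial test-and-set spin lock above might be rendered with the C11 atomics library roughly as follows. This is a sketch of ours; as Chapter 4 explains, a lock this simple performs poorly under contention.

    #include <stdatomic.h>

    typedef struct {
        atomic_flag held;               /* initialize with ATOMIC_FLAG_INIT */
    } spinlock;

    static inline void spin_acquire(spinlock *l) {
        /* test_and_set returns the previous value: keep trying until it was false */
        while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            ;                           /* spin */
    }

    static inline void spin_release(spinlock *l) {
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }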

1.4 SAFETY AND LIVENESS

Whether based on spinning or blocking, a correct implementation of synchronization requires both safety and liveness. Informally, safety means that bad things never happen: we never have two threads in a critical section for the same lock at the same time; we never have all of the threads in the system blocked. Liveness means that good things eventually happen: if lock L is free and at least one thread is waiting for it, some thread eventually acquires it; if queue Q is nonempty and at least one thread is waiting to remove an element, some thread eventually does.

A bit more formally, for a given program and input, running on a given system, safety properties can always be expressed as predicates P on reachable system states S—that is, ∀S[P(S)]. Liveness properties require at least one extra level of quantification: ∀S[P(S) → ∃T[Q(T)]], where T is a subsequent state in the same execution as S. From a practical perspective, liveness properties tend to be harder than safety to ensure—or even to define; from a formal perspective, they tend to be harder to prove.

Livelock freedom is one of the simplest liveness properties. It insists that threads not execute forever without making forward progress. In the context of locking, this means that if L is free and thread T has called L.acquire(), there must exist some bound on the number of instructions T can execute before some thread acquires L. Starvation freedom is stronger. Again in the context of locks, it insists that if every thread that acquires L eventually releases it, and if T has called L.acquire(), there must exist some bound on the number of instructions T can execute before acquiring L itself. Still stronger notions of fairness among threads can also be defined, but these are beyond the scope of this monograph.

We will return briefly to the subject of liveness in Section 3.2. Most of our discussions of correctness, however, will focus on safety properties. Interestingly, deadlock freedom, which one might initially imagine to be a matter of liveness, is actually one of safety: it simply insists that all reachable states have at least one thread unblocked (for purposes of this definition, a busy-waiting thread is considered to be blocked).

Processes, Threads, and Tasks

Like "concurrent" and "parallel," the terms "process," "thread," and "task" are used in different ways by different authors. In the most common usage (adopted here), a thread is an active computation that has the potential to share variables with other, concurrent threads. A process is a set of threads, together with the address space and other resources (e.g., open files) that they share. A task is a well-defined (typically small) unit of work to be accomplished—most often the closure of a subroutine with its parameters and referencing environment. Tasks are passive entities that may be executed by threads. They are invariably implemented at user level. The reader should beware, however, that this terminology is not universal. Many papers (particularly in theory) use "process" where we use "thread." Ada uses "task" where we use "thread." Mach uses "task" where we use "process." And some systems introduce additional words—e.g., "activation," "fiber," "filament," or "hart."

1.5 THE REST OF THIS MONOGRAPH

Chapters 4, 5, and 7 are in some sense the heart of the monograph: they cover spin locks, busy-wait condition synchronization (barriers in particular), and scheduler-based synchronization, respectively. To set the stage for these, Chapter 2 surveys aspects of multicore and multiprocessor architecture that significantly impact the design or performance of synchronization mechanisms, and Chapter 3 introduces formal concepts that illuminate issues of feasibility and correctness.

Between the coverage of busy-wait and scheduler-based synchronization, Chapter 6 considers atomicity mechanisms that have been optimized for the important special case in which most operations are read-only. Later, Chapter 8 provides a brief introduction to nonblocking algorithms, which are carefully designed in such a way that all possible thread interleavings are correct. Chapter 9 provides a similarly brief introduction to transactional memory, which uses speculation to implement atomicity without (in typical cases) requiring mutual exclusion. (A full treatment of both of these topics is beyond the scope of the monograph.) Appendix A summarizes the widely used pthread (Posix) synchronization library.

CHAPTER 2
Architectural Background

Both the correctness and the performance of synchronization mechanisms are crucially dependent on certain architectural details of multicore and multiprocessor machines. This chapter provides an overview of these details. It can be skimmed by those already familiar with the subject, but should probably not be skipped in its entirety: the implications of store buffers and directory-based coherence on synchronization algorithms, for example, may not be immediately obvious, and the semantics of memory fence and read-modify-write instructions may not be universally familiar. The chapter is divided into three main sections. In the first, we consider the implica- tions for parallel programs of caching and coherence protocols. In the second, we consider consistency—the degree to which accesses to different memory locations can or cannot be assumed to occur in any particular order. In the third, we survey the various read-modify- write instructions—test and set and its cousins—that underlie most implementations of atomicity.

2.1 CORES AND CACHES: BASIC SHARED-MEMORY ARCHITECTURE

Figures 2.1 and 2.2 depict two of the many possible configurations of processors, cores, caches, and memories in a modern parallel machine. In a so-called symmetric machine, all memory banks are equally distant from every processor core. In a nonuniform memory access (NUMA) machine, each memory bank is associated with a processor (or, in some cases, with a multiprocessor node), and can be accessed by cores of the local processor more quickly than by cores of other processors.

As feature sizes continue to shrink, the number of cores per processor can be expected to increase. As of this writing, the typical desk-side machine has 1–4 processors with 2–8 cores each. Server-class machines are architecturally similar, but with the potential for many more processors. On some machines, each core may be multithreaded—capable of executing instructions from more than one thread at a time (current per-core thread counts range from 1–8). Each core typically has a private level-1 (L1) cache, and shares a level-2 cache with other cores in its local cluster. Clusters on the same processor of a symmetric machine then share a common L3 cache. On a NUMA machine in which the L2 connects directly to the global interconnect, the L3 may sometimes be thought of as "belonging" to the memory.

Figure 2.1: Typical symmetric (uniform memory access—UMA) machine. Numbers of components of various kinds, and the degree of sharing at various levels, differ across manufacturers and models.

In a machine with more than one processor, the global interconnect may have various topologies. On small machines, broadcast buses and crossbars are common; on large machines, a network of point-to-point links is more common. For synchronization purposes, broadcast has the side effect of imposing a total order on all inter-processor messages; we shall see in Section 2.2 that this simplifies the design of concurrent algorithms—synchronization algorithms in particular. Ordering is sufficiently helpful, in fact, that some large machines (notably those sold by Oracle) employ two different global networks: one for data requests, which are small, and benefit from ordering, and the other for replies, which require significantly more aggregate bandwidth, but do not need to be ordered.

As the number of cores per processor increases, on-chip interconnects—the connections among the L2 and L3 caches in particular—can be expected to take on the complexity of current global interconnects. Other forms of increased complexity are also likely, including, perhaps, additional levels of caching, non-hierarchical topologies, and heterogeneous implementations or even instruction sets among cores.

Figure 2.2: Typical nonuniform memory access (NUMA) machine. Again, numbers of components of various kinds, and the degree of sharing at various levels, differ across manufacturers and models.

The diversity of current and potential future architectures notwithstanding, multilevel caching has several important consequences for programs on almost any modern machine; we explore these in the following subsections.

2.1.1 TEMPORAL AND SPATIAL LOCALITY

In both sequential and parallel programs, performance can usually be expected to correlate with the temporal and spatial locality of memory references. If a given location l is accessed more than once by the same thread (or perhaps by different threads on the same core or cluster), performance is likely to be better if the two references are close together in time (temporal locality). The benefit stems from the fact that l is likely still to be in cache, and the second reference will be a hit instead of a miss. Similarly, if a thread accesses location l2 shortly after l1, performance is likely to be better if the two locations have nearby addresses (spatial locality). Here the benefit stems from the fact that l1 and l2 are likely to lie in the same cache line, so l2 will have been loaded into cache as a side effect of loading l1.

On current machines, cache line sizes typically vary between 32 and 512 bytes. There has been a gradual trend toward larger sizes over time. Different levels of the cache hierarchy may also use different sizes, with lines at lower levels typically being larger.

To improve temporal locality, the programmer must generally restructure algorithms, to change the order of computations. Spatial locality is often easier to improve—for example, by changing the layout of data in memory to co-locate items that are frequently accessed together, or by changing the order of traversal in multidimensional arrays. These sorts of optimizations have long been an active topic of research, even for sequential programs—see, for example, the texts of Muchnick [1997, Chap. 20] or Allen and Kennedy [2002, Chap. 9].
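The array-traversal point can be illustrated with a small C sketch of ours (the dimension N is arbitrary). Summing a row-major C array row by row touches consecutive addresses and reuses each cache line; summing it column by column touches a new line on nearly every access, and typically runs several times slower.

    #define N 1024
    static double a[N][N];              /* C arrays are stored in row-major order */

    double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];           /* consecutive addresses: good spatial locality */
        return s;
    }

    double sum_col_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];           /* stride of N doubles: poor spatial locality */
        return s;
    }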

2.1.2 CACHE COHERENCE

On a single-core machine, there is a single cache at each level, and while a given block of memory may be present in more than one level of the memory hierarchy, one can always be sure that the version closest to the top contains up-to-date values. (With a write-through cache, values in lower levels of the hierarchy will never be out of date by more than some bounded amount of time. With a write-back cache, values in lower levels may be arbitrarily stale—but harmless, because they are hidden by copies at higher levels.)

On a shared-memory parallel system, by contrast—unless we do something special—data in upper levels of the memory hierarchy may no longer be up-to-date if they have been modified by some thread on another core. Suppose, for example, that threads on Cores 1 and k in Figure 2.1 have both been reading variable x, and each has retained a copy in its L1 cache. If the thread on Core 1 then modifies x, even if it writes its value through (or back) to memory, how do we prevent the thread on Core k from continuing to read the stale copy?

A cache-coherent parallel system is one in which changes to data, even when cached, are guaranteed to become visible to all threads, on all cores, within a bounded (and typically small) amount of time. On almost all modern machines, coherence is achieved by means of an invalidation-based cache coherence protocol. Such a protocol, operating across the system's cache controllers, maintains the invariant that there is at most one writable copy of any given cache line anywhere in the system—and, if the number is one and not zero, there are no read-only copies.

Algorithms to maintain cache coherence are a complex topic, and the subject of ongoing research (for an overview, see the monograph of Sorin et al. [2011], or the more extensive [if dated] coverage of Culler and Singh [1998, Chaps. 5, 6, & 8]). Most protocols in use today descend from the four-state protocol of Goodman [1983]. In this protocol, each line in each cache is either invalid (meaning it currently holds no data block), valid (read-only), reserved (written exactly once, and up-to-date in memory), or dirty (written more than once, and in need of write-back).1

To maintain the at-most-one-writable-copy invariant, the coherence protocol arranges, on any write to an invalid or valid line, to invalidate (evict) any copies of the block in all other caches in the system. In a system with a broadcast-based interconnect, invalidation is straightforward. In a system with point-to-point connections, the coherence protocol typically maintains some sort of directory information that allows it to find all other copies of a block.

1 In the cache coherence literature, these states are more commonly known as invalid, shared, exclusive, and modified, respectively, as named in the Illinois protocol of Papamarcos and Patel [1984]. Written in reverse order, these states yield the acronym MESI, used for the protocol itself.

No-remote-caching Multiprocessors

Most of this monograph assumes a shared-memory multiprocessor with global (distributed) cache coherence, which we have contrasted with machines in which message passing provides the only means of interprocessor communication. There is an intermediate option. Some NUMA machines (notably many of the offerings from Cray, Inc.) support a single global address space, in which any processor can access any memory location, but remote locations cannot be cached. We may refer to such a machine as a no-remote-caching (NRC-NUMA) multiprocessor. Any access to a location in some other processor's memory will traverse the interconnect. Assuming the hardware implements cache coherence within each node—in particular, between the local processor(s) and the network interface—memory will still be globally coherent. For the sake of performance, however, system and application programmers will need to employ algorithms that minimize the number of remote references.

2.1.3 PROCESSOR (CORE) LOCALITY

On a single-core machine, misses occur on initial accesses and as a result of limited cache capacity or associativity. On a cache-coherent machine, misses may also occur because of conflicts. Specifically, a read or write may miss because a previously cached block has been written by some other core, and has reverted to invalid state; a write may also miss because a previously reserved or dirty block has been read by some other core, and has reverted to valid state. Absent program restructuring, conflicts are inevitable if threads on different cores access the same datum (and at least one of them writes it) at roughly the same point in time. In addition to temporal and spatial locality, it is therefore important for parallel programs to exhibit good thread locality: as much as possible, a given datum should be accessed by only one thread at a time.

Conflicts may also occur when threads are accessing different data, if those data happen to lie in the same cache block. This false sharing can often be eliminated—yielding a major speed improvement—if data structures are padded and aligned to occupy an integral number of cache lines. For busy-wait synchronization algorithms, it is particularly important to minimize the extent to which different threads may spin on the same location—or locations in the same cache block. Spinning with a write on a shared location—as we did in the test_and_set lock of Section 1.3—is particularly deadly: each such write leads to interconnect traffic proportional to the number of other spinning threads. We will consider these issues further in Chapter 4.
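A common C idiom for avoiding false sharing is to pad and align per-thread data to the cache-line size. The sketch below is ours; the 64-byte line size is an assumption and would need to match the target machine.

    #define CACHE_LINE 64

    /* Bad: the two per-thread counters share a cache line, so updates by
       different threads repeatedly invalidate each other's copies. */
    struct counters_bad {
        long c0;
        long c1;
    };

    /* Better: each counter is aligned to, and padded out to, its own line. */
    struct padded_counter {
        _Alignas(CACHE_LINE) long c;
        char pad[CACHE_LINE - sizeof(long)];
    };

    struct counters_good {
        struct padded_counter c0;
        struct padded_counter c1;
    };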

2.2 CACHE CONSISTENCY

On a single-core machine, it is relatively straightforward to ensure that instructions appear to complete in execution order. Ideally, one might hope that a similar guarantee would apply to parallel machines—that memory accesses, system-wide, would appear to constitute an interleaving (in execution order) of the accesses of the various cores. For many reasons, this sort of sequential consistency [Lamport, 1979] is difficult to implement with reasonable performance. Most real machines implement a more relaxed (i.e., potentially inconsistent) memory model, in which accesses to different locations by the same thread, or to the same location by different threads, may appear to occur "out of order" from the perspective of threads on other cores. When consistency is required, programmers (or compilers) must insert special memory fence instructions (also known as barriers) that force the local core to wait for various classes of potentially in-flight events. Such fences are an essential part of most synchronization algorithms.

2.2.1 SOURCES OF INCONSISTENCY

Inconsistency is a natural result of common architectural features. In an out-of-order processor, for example—one that can execute instructions in any order consistent with (thread-local) data dependences—a write must be held in the reorder buffer until all instructions that precede it in program order have completed. Likewise, since almost any modern processor can generate a burst of store instructions faster than the underlying memory system can absorb them, even writes that are logically ready to commit may need to be buffered for many cycles. The structure that holds these writes is known as a store buffer.

Barriers Everywhere

Sadly, the word barrier is heavily overloaded. As noted here, it serves as a synonym for fence. As noted in Section 1.2 (and explored in more detail in Section 5.2), it is also the name of a synchronization mechanism used to separate program phases. In the programming language community, it refers to code that must be executed when changing a pointer, in order to maintain bookkeeping information for the garbage collector. Finally, as we shall see in Chapter 9, it refers to code that must be executed when reading or writing a shared variable inside an atomic transaction, in order to detect and recover from speculation failures. The intended meaning is usually clear from context, but may be confusing to readers who are familiar with only some of the definitions.

// initially x == y == 0
thread 1:       thread 2:
1: x := 1       1: y := 1
2: i := y       2: j := x
// finally i == j == 0 ?

Figure 2.3: An apparent causality loop.

When executing a load instruction, a core checks the contents of its reorder and store buffers before forwarding a request to the memory system. This check ensures that the core always sees its own recent writes, even if they have not yet made their way to cache or memory. At the same time, a load that accesses a location that has not been written recently may make its way to memory before logically previous instructions that wrote to other locations. This fact is harmless on a uniprocessor, but consider the implications on a parallel machine, as shown in Figure 2.3. If the write to x is delayed in thread 1's store buffer, and the write to y is similarly delayed in thread 2's store buffer, then both threads may read a zero at line 2, suggesting that line 2 of thread 1 executes before line 1 of thread 2, and line 2 of thread 2 executes before line 1 of thread 1. When combined with program order (line 1 in each thread should execute before line 2 in the same thread), this gives us an apparent "causality loop," which "should" be logically impossible.

Similar problems can occur deeper in the memory hierarchy. A modern machine can require several hundred cycles to service a miss that goes all the way to memory. At each step along the way (core to L1, . . . , L3 to bus, . . . ) pending requests may be buffered in a queue. If multiple requests may be active simultaneously (as is common, at least, at the point where we cross the global interconnect), and if some requests may complete more quickly than others, then memory accesses may appear to be reordered. So long as accesses to the same location are forced to occur in order, single-threaded code will run correctly. On a multiprocessor, however, sequential consistency may again be violated.

On a NUMA machine, or a machine with a topologically complex interconnect, differing distances among locations provide additional sources of circular "causality." If variable x in Figure 2.3 is close to thread 2 but far from thread 1, and y is close to thread 1 but far from thread 2, the reads on line 2 can easily complete before the writes on line 1, even if all accesses are inserted into the memory system in program order. With a topologically complex interconnect, the cache coherence protocol itself may introduce variable delays—e.g., to dispatch invalidation requests to the various locations that may need to change the state of a local cache line, and to collect acknowledgments. Again, these differing delays may allow line 2 of the example—in both threads—to complete before line 1.

In all the explanations of Figure 2.3, the causality loop results from reads bypassing writes—executing in-order (write-then-read) from the perspective of the issuing core, but out of order (read-then-write) from the perspective of the memory system—or of threads on other cores. On NUMA or topologically complex machines, it may also be possible for reads to bypass reads, writes to bypass reads, or writes to bypass writes. All of these possibilities—if permitted by the hardware—can lead to unintuitive behavior. Examples and additional detail can be found in a variety of sources—notably the work of Adve et al. [1996; 1999].

2.2.2 MEMORY FENCE INSTRUCTIONS If left unaddressed, memory inconsistency can easily derail attempts at synchronization. Consider the following (quite plausible) programming idiom:

// initially x == f == 0
thread 1:           thread 2:
1: x := foo()       1: while (f == 0)
2: f := 1           2:     // spin
                    3: y := 1/x

If foo can never return zero, a programmer might naively expect that thread 2 will never see a divide-by-zero error at line 3. If the write at line 2 in thread 1 can bypass the write in line 1, however, thread 2 may read x too early, and see a value of zero. Similarly, if the read of x at line 3 in thread 2 can bypass the read of f in line 1, a divide-by-zero may again occur, even if the writes in thread 1 complete in order. (While thread 2's read of x is separated from the read of f by a conditional test, the second read may still issue before the first completes, if the branch predictor guesses that the loop will never iterate.)

Memory fence (barrier) instructions allow the programmer to force consistent ordering of memory accesses in situations where it matters, but the hardware might not otherwise guarantee it. The semantically simplest such instruction—a full fence—ensures that all effects of all previous instructions are visible to all other threads before allowing any effects of any subsequent instruction to become visible to any other thread. On many machines, a full fence is quite expensive—many tens or even hundreds of cycles. Architects therefore often provide a variety of weaker fences, which prevent some, but not all, inconsistencies.

For purposes of exposition, we treat fences as separate, explicit operations. To indicate the need for a fence that orders previous reads and/or writes with respect to subsequent reads and/or writes, we will use the notation fence(S), where S is some (nonempty) subset of {RR, RW, WR, WW}. In our notation, a fence operation is intended to imply both a hardware fence (restricting reordering by the processor implementation) and a software fence (restricting reordering by the compiler or interpreter). Compiler writers or assembly language programmers interested in porting our pseudocode to some concrete machine will need to restrict their code improvement algorithms accordingly, and also issue appropriate ordering instructions for the hardware at hand. Excellent guidance can be found in Doug Lea's on-line "Cookbook for Compiler Writers" [2001]. Programmers in higher level languages will need to make use of language features (e.g., volatile in Java, or atomic in C++'11) that instruct the compiler to inhibit optimization and issue hardware fences.

To determine the need for fences in the code of a given synchronization operation, we will need to consider both the correctness of the operation itself and the semantics it is intended to provide to the rest of the program. The acquire operation of Peterson's two-thread spin lock [1981], for example, requires internal WW and WR fences in order to achieve mutual exclusion, but these fences are not enough to prevent a thread from reading or writing shared data before it has actually acquired the lock. For that, one needs additional RR and RW fences (code in Sec. 4.1). Fortunately for most programmers, fences and memory ordering details are generally of concern only to the authors of synchronization mechanisms and low-level concurrent data structures; programmers who use these mechanisms correctly are then typically assured that their programs will behave as if the hardware were sequentially consistent (more on this in Sec. 3.4).

Compilers Also Reorder Instructions

While this chapter focuses on architectural issues, it should be noted that compilers also routinely reorder instructions. In any program not written in machine code, compilers perform a variety of optimizations in an attempt to improve performance. Simple examples include reordering computations to expose and eliminate redundancies, hoisting invariants out of loops, and "scheduling" instructions to minimize processor pipeline bubbles. Such optimizations are legal so long as they respect control and data dependences within a single thread. Like the hardware optimizations discussed in this section, compiler optimizations can lead to inconsistent behavior when more than one thread is involved. As we shall see in Section 3.4, languages designed for concurrent programming must provide a memory model that explains allowable behavior, and some set of primitives—typically the synchronization operations—that function as language-level fences.
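As a concrete illustration of both the hardware and the compiler side of this ordering problem, the x/f idiom above can be written with C11 atomics, whose release and acquire orderings play the role of the fences just described. This is a sketch of ours; compute_nonzero is a hypothetical stand-in for foo.

    #include <stdatomic.h>

    extern int compute_nonzero(void);   /* hypothetical stand-in for foo() */

    static int x = 0;                   /* ordinary data */
    static atomic_int f = 0;            /* the flag */

    void thread1(void) {
        x = compute_nonzero();
        /* release store: all earlier reads and writes (including the write
           of x) become visible before the flag is seen as set */
        atomic_store_explicit(&f, 1, memory_order_release);
    }

    int thread2(void) {
        /* acquire load: the later read of x cannot be hoisted above it */
        while (atomic_load_explicit(&f, memory_order_acquire) == 0)
            ;                           /* spin */
        return 1 / x;                   /* safe, assuming compute_nonzero never returns 0 */
    }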

2.2.3 EXAMPLE ARCHITECTURES

A few machines (notably, those employing the c. 1996 MIPS R10000 processor [Yeager, 1996]) have provided sequential consistency, and some researchers have argued that more machines should do so as well [Hill, 1998]. Most machines today, however, fall into two broad classes of more relaxed alternatives. On the SPARC and x86 (both 32- and 64-bit), reads are allowed to bypass writes, but RR, RW, and WW orderings are all guaranteed to be respected by the hardware. Explicit fences are required only when the programmer must ensure that a store completes before a subsequent read. On ARM, POWER, and IA-64 (Itanium) machines, all four combinations of bypassing are possible, and fences must be used when ordering is required.

We will refer to memory models in the SPARC/x86 camp using the SPARC term TSO (Total Store Order). We will refer to the other machines as "more relaxed." It should be emphasized that there are significant differences among machines within a given class—in the available atomic instructions, the behavior of corner cases, and the details of fence instructions. A full explanation is well beyond what we can cover here. For a taste of the complexities involved, see Sewell et al.'s attempt [2010] to formalize the behavior specified informally in Intel's architecture manual [2011, Vol. 3, Sec. 8.2].

On TSO machines, RR, RW, and WW fences can all be elided from our code. On more relaxed machines, they must be implemented with appropriate machine instructions. On some machines, fences are separate instructions; on others their behavior is built into instructions that may serve other purposes as well (in particular, the read-modify-write instructions of Sec. 2.3 often serve as full or partial fences). On some machines, the behavior of weaker fences is defined in terms of the architectural optimizations they inhibit. On other machines, the behavior is defined in terms of the ordering guarantees made to programmers.

One particular pair of fences is perhaps worth special mention. As noted in the previous subsection, a lock acquire operation must ensure that a thread cannot read or write shared data until it has actually acquired the lock. Assuming that acquisition is verified with some sort of load instruction, the appropriate guarantee will be provided by a (RR, RW) fence after the load. In a similar vein, a lock release must ensure that all reads and writes within the critical section have completed before the lock is actually released. Assuming that release is accomplished with some sort of store instruction, the appropriate guarantee will be provided by a (RW, WW) fence before the store. These combinations are common enough that they are sometimes referred to as acquire and release fences. They are used not only for mutual exclusion, but for most forms of condition synchronization as well. The IA-64 (Itanium) and several research machines—notably the Stanford Dash [Lenoski et al., 1992]—support them directly in hardware. In the mutual exclusion case, they allow work to "migrate" into a critical section both from above (prior to the lock acquire) and from below (after the lock release). They do not allow work to migrate out of a critical section in either direction.

2.3 ATOMIC PRIMITIVES

To facilitate the construction of synchronization algorithms and concurrent data structures, most modern architectures provide instructions capable of updating (i.e., reading and writing) a memory location as a single atomic operation. We saw a simple example—the test_and_set instruction (TAS)—in Section 1.3. A longer list of common instructions appears in Table 2.1.

Originally introduced on mainframes of the 1960s, TAS and Swap are still available on several modern machines, among them the x86 and SPARC. FAA and FAI were introduced for "combining network" machines of the 1980s [Kruskal et al., 1988]. They are uncommon in hardware today, but frequently appear in algorithms in the literature. The semantics of TAS, Swap, FAI, and FAA should all be self-explanatory. Note that they all return the value of the target location before any change was made.

Table 2.1: Common atomic (read-modify-write) instructions.

test_and_set
    Boolean TAS(Boolean *a):
        atomic { t := *a; *a := true; return t }

swap
    word Swap(word *a, word w):
        atomic { t := *a; *a := w; return t }

fetch_and_increment
    int FAI(int *a):
        atomic { t := *a; *a := t + 1; return t }

fetch_and_add
    int FAA(int *a, int n):
        atomic { t := *a; *a := t + n; return t }

compare_and_swap
    Boolean CAS(word *a, word old, word new):
        atomic { t := (*a == old); if (t) *a := new; return t }

load_linked / store_conditional
    word LL(word *a):
        atomic { remember a; return *a }
    Boolean SC(word *a, word w):
        atomic { t := (a is remembered, and has not been evicted since LL);
                 if (t) *a := w; return t }

CAS was originally introduced in the c. 1970 IBM 370 architecture [IBM, 1981], and is also found on modern x86, IA-64 (Itanium), and SPARC machines. LL/SC was originally proposed by Jensen et al. for the S-1 AAP Multiprocessor at Lawrence Livermore National Laboratory [Jensen et al., 1987]. It is also found on modern POWER, MIPS, and ARM machines. CAS and LL/SC are universal primitives, in a sense we will define formally in Section 3.3. In practical terms, we can use them to build efficient simulations of arbitrary (single-word) read-modify-write (fetch_and_Φ) operations (including all the other operations in Table 2.1).

CAS takes three arguments: a memory location, an old value that is expected to occupy that location, and a new value that should be placed in the location if indeed the old value is currently there. It returns a Boolean value indicating whether the replacement occurred successfully. Given CAS, fetch_and_Φ can be written as follows, for any given function Φ:

1: word fetch_and_Φ(function Φ, word *a):
2:     repeat
3:         old := *a
4:         new := Φ(old)
5:     until CAS(a, old, new)
6:     return old

In effect, this code computes Φ(*a) speculatively, and then updates a atomically if its value has not changed since the speculation began. The only way the CAS can fail at line 5 is if some other thread has recently modified a. And in particular, if several threads attempt to perform a fetch_and_Φ on a simultaneously, one of them is guaranteed to succeed, and the system as a whole will make forward progress. This guarantee implies that fetch_and_Φ operations implemented with CAS are nonblocking (more specifically, lock-free), a property we will consider in more detail in Section 3.2.

One problem with CAS, from an architectural point of view, is that it combines a load and a store into a single instruction, which complicates the implementation of pipelined processors. LL/SC was designed to address this problem. In the fetch_and_Φ idiom above, it replaces the load at line 3 with a special instruction that has the side effect of "tagging" the associated cache line so that the processor will "notice" any subsequent eviction. A subsequent SC will then succeed only if the line is still present in the cache:

1: word fetch_and_Φ(function Φ, word *a):
2:     repeat
3:         old := LL(a)
4:         new := Φ(old)
5:     until SC(a, new)
6:     return old

Here any argument for forward progress requires an understanding of why SC might fail. Details vary from machine to machine. In all cases, SC is guaranteed to fail if another thread has modified *a since the LL was performed. On most machines, SC will also fail if a hardware interrupt happens to arrive in the post-LL window. On some machines, it will fail if the cache suffers a capacity or conflict miss, or if the processor mispredicts a branch. To avoid deterministic, spurious failure, the programmer may need to limit (perhaps severely) the types of instructions executed between the LL and SC. If unsafe instructions are required in order to compute the function Φ, one may need a hybrid approach:

    1: word fetch and Φ(function Φ, word *a):
    2:     repeat
    3:         old := *a
    4:         new := Φ(old)
    5:     until LL(a) == old && SC(a, new)
    6:     return old

In effect, this code uses LL and SC at line 5 to emulate CAS.

2.3.1 THE ABA PROBLEM

While both CAS and LL/SC appear in algorithms in the literature, the former is quite a bit more common—perhaps because its semantics are self-contained, and do not depend on the implementation-oriented side effect of cache-line tagging. That said, CAS has one significant disadvantage from the programmer’s point of view—a disadvantage that LL/SC avoids. Because it chooses whether to perform its update based on the value in the target location, CAS may succeed in situations where the value has changed (say from A to B) and then changed back again (from B to A) [Treiber, 1986]. In some algorithms, such a change and restoration is harmless: it is still acceptable for the CAS to succeed. In other algorithms, incorrect behavior may result. This possibility, often referred to as the ABA problem, is particularly worrisome in pointer-based algorithms. Consider the following (buggy!) code to manipulate a linked-list stack:

    1: void push(node** top, node* new):     node* pop(node** top):
    2:     repeat                                repeat
    3:         old := *top                           old := *top
    4:         new→next := old                       if old == null return null
    5:     until CAS(top, old, new)                  new := old→next
    6:                                           until CAS(top, old, new)
    7:                                           return old

Figure 2.4 shows one of many problem scenarios. In (a), our stack contains the elements A and C. Suppose that thread 1 begins to execute pop(&tos), and has completed line 3, but has yet to reach line 6. If thread 2 now executes a (complete) pop(&tos) operation, followed by push(&tos, &B) and then push(&tos, &A), it will leave the stack as shown in (b). If thread 1 now continues, its CAS will succeed, leaving the stack in the broken state shown in (c). The problem here is that tos changed between thread 1’s load and the subsequent CAS. If these two instructions were replaced with LL and SC, the latter would fail—as indeed it should—causing thread 1 to try again.

Emulating CAS

Note that while LL/SC can be used to emulate CAS, the emulation requires a loop to deal with spurious SC failures. This issue was recognized explicitly by the designers of the C++11 atomic types and operations, who introduced two variants of CAS. The atomic_compare_exchange_strong operation has the semantics of hardware CAS: it fails only if the expected value was not found. On an LL/SC machine, it is implemented with a loop. The atomic_compare_exchange_weak operation admits the possibility of spurious failure: it has the interface of CAS, but is implemented without a loop on an LL/SC machine.
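To make the distinction concrete, here is a minimal C++ sketch (ours, not from the text) of the fetch and Φ idiom expressed with the C++11 atomic operations just mentioned; the names fetch_and_phi and phi are our own. Because the operation retries in a loop anyway, compare_exchange_weak, whose spurious failures simply cost one extra iteration, is the natural choice.

    #include <atomic>

    // Sketch of a CAS-based fetch_and_phi using C++11 atomics.  On an LL/SC
    // machine, compare_exchange_weak may fail spuriously; the surrounding loop
    // already handles that, so no hidden inner retry loop (as
    // compare_exchange_strong would need) is required.
    template<typename F>
    int fetch_and_phi(std::atomic<int>& a, F phi) {
        int old = a.load();
        // On failure, compare_exchange_weak refreshes 'old' with the current
        // value of a, so phi is re-applied to up-to-date data on each retry.
        while (!a.compare_exchange_weak(old, phi(old))) {
            // retry
        }
        return old;
    }

    // Hypothetical usage: atomically double a shared counter.
    // std::atomic<int> counter{1};
    // int previous = fetch_and_phi(counter, [](int v) { return 2 * v; });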

Figure 2.4: The ABA problem in a linked-list stack. (a) the initial stack, A followed by C; (b) the stack A, B, C left by thread 2’s operations; (c) the broken structure produced by thread 1’s CAS.

    1: void push(node** top, node* n):              node* pop(node** top):
    2:     repeat                                       repeat
    3:         ⟨o, c⟩ := *top                                ⟨o, c⟩ := *top
    4:         n→next := o                                   if o == null return null
    5:     until CAS(top, ⟨o, c⟩, ⟨n, c+1⟩)                   n := o→next
    6:                                                   until CAS(top, ⟨o, c⟩, ⟨n, c+1⟩)
    7:                                                   return o

Figure 2.5: Treiber’s lock-free stack algorithm, with sequence numbers to solve the ABA problem.

On machines with CAS, programmers must consider whether the ABA problem can arise in the algorithm at hand and, if so, take measures to avoid it. The simplest and most common technique is to devote part of each to-be-CASed word to a sequence number [Treiber, 1986] that is updated on each successful CAS. Using this technique, we can convert our stack code to the (now safe) version shown in Figure 2.5. The sequence number solution to the ABA problem requires that there be enough bits available for the number that wrap-around cannot occur in any reasonable program execution. Some machines (e.g., the x86 or SPARC when running in 32-bit mode) provide a double-width CAS that is ideal for this purpose. If the maximum word width is required for “real” data, however, another approach may be required.

In many programs, the programmer can reason that a given pointer will reappear in a given data structure only as a result of memory deallocation and reallocation. In a garbage-collected language, deallocation will not occur so long as any thread retains a reference, so all is well. In a language with manual storage management, hazard pointers [Herlihy et al., 2002; Michael, 2004a] or read-copy-update [McKenney et al., 2001] (Sec. 6.3) can be used to delay deallocation until all concurrent uses of a datum have completed. In the general case (where a pointer can recur without its memory having been recycled), safe CASing may require an extra level of pointer indirection [Jayanti and Petrovic, 2003; Michael, 2004b].

2.3.2 OTHER SYNCHRONIZATION HARDWARE

Several historic machines have provided special locking instructions. The QOLB (queue on lock bit) instruction, originally designed for the Wisconsin Multicube [Goodman et al., 1989], and later adopted for the IEEE Scalable Coherent Interface (SCI) standard [Aboulenein et al., 1994], leverages a coherence protocol that maintains a linked list of copies of a given cache line. When multiple processors attempt to lock the same line at the same time, the hardware arranges to grant the requests in linked-list order. The Kendall Square KSR-1 machine [KSR] provided a similar mechanism based not on an explicit linked list, but on the implicit ordering of nodes in a ring-based network topology. As we shall see in Chapter 4, similar strategies can be emulated in software. The principal argument for the hardware approach is the ability to avoid a costly cache miss when passing the lock (and perhaps its associated data) from one processor to the next [Woest and Goodman, 1991].

The x86 allows any memory-update instruction (e.g., add or increment) to be prefixed with a special LOCK code, rendering it atomic. The benefit to the programmer is limited, however, by the fact that most instructions do not return the previous value from the modified location: two threads executing concurrent LOCKed increments, for example, could be assured that both operations would occur, but could not tell which happened first.

Several supercomputer-class machines have provided special network hardware (generally accessed via memory-mapped I/O) for near-constant-time barrier and “Eureka” operations. These amount to cross-machine AND (all processors are ready) and OR (some processor is ready) computations. We will mention them again in Section 5.2.

In 1991, Stone et al. proposed a multi-word variant of LL/SC, which they called “Oklahoma Update” [Stone et al., 1993] (in reference to the song “All Er Nuthin’ ” from the Rodgers and Hammerstein musical).
Concurrently and independently, Herlihy and Moss proposed a similar mechanism for Transactional Memory (TM) [Herlihy and Moss, 1993]. Neither proposal was implemented at the time, but TM enjoyed a rebirth of interest about a decade later. Like queued locks, it can be implemented in software using CAS or LL/SC, but hardware implementations enjoy a substantial performance advantage [Harris et al., 2010]. As of this writing, hardware TM has been implemented on the Azul Systems Vega 2 [Click Jr., 2009], the experimental Sun/Oracle Rock processor [Dice et al., 2009], the IBM Blue Gene/Q, and Intel’s “Haswell” version of the x86 (additional implementations are likely to be forthcoming). We will discuss TM in Chapter 9.

Chapter 3: Some Useful Theory

Concurrent algorithms and synchronization techniques have a long and very rich history of formalization—far too much to even survey adequately here. Arguably the most accessible resource for practitioners is the text of Herlihy and Shavit [2008]. Deeper, more mathematical coverage can be found in the text of Schneider [1997]. On the broader topic of distributed computing (which as noted in the box on page 2 is viewed by theoreticians as a superset of shared memory concurrency), interested readers may wish to consult the classic textbook by Lynch [1996].

For the purposes of the current text, we provide a brief introduction here to safety, liveness, the consensus hierarchy, and formal memory models. Safety and liveness were mentioned briefly in Section 1.4. The former says that bad things never happen; the latter says that good things eventually do. The consensus hierarchy explains the relative expressive power of atomic hardware primitives like test and set and compare and swap. Memory models explain which writes may be seen by which reads under which circumstances; they help to regularize the “out of order” memory references mentioned in Section 2.2.

3.1 SAFETY

Most concurrent data structures (objects) are adaptations of sequential data structures. Each of these, in turn, has its own sequential semantics, typically specified as a set of preconditions and postconditions for each of the methods that operate on the structure, together with invariants that all the methods must preserve. The sequential implementation of an object is considered safe if each method, called when its preconditions are true, terminates after a finite number of steps, having ensured the postconditions and preserved the invariants.

When parallelizing a previously sequential object, we typically wish to allow concurrent method calls (“operations”), each of which should appear to occur atomically. This goal in turn leads to at least three safety issues:

1. Because control flow within a thread no longer dictates the order in which operations are called, we cannot in general assume that nontrivial preconditions will always hold. We typically address this problem either by insisting that the operations be total (i.e., that the precondition be simply true, so the operation can be performed under any circumstances), or by using condition synchronization to wait until the preconditions hold. The former option is trivial if we are willing to return an indication that the operation is not currently valid (think, for example, of a dequeue operation that returns a “queue is currently empty” status code). The latter option is explored in Chapter 5.

2. Because threads may wait for one another due to locking or condition synchronization, we must address the possibility of deadlock, in which no thread is able to make progress. We consider lock-based deadlock in the first subsection below. Deadlocks due to condition synchronization are a matter of application-level semantics, and must be addressed on a program-by-program basis.

3. The notion of atomicity requires clarification. If operations do not actually execute one at a time in mutual exclusion, we must somehow specify the order(s) in which they are permitted to appear to execute. We consider several popular notions of ordering, and the differences among them, in the second subsection below.

3.1.1 DEADLOCK FREEDOM

As noted in Section 1.4, deadlock freedom is a safety property: it requires that there be no reachable state of the system in which all threads are either spinning or blocked in the scheduler. As originally observed by Coffman et al. [1971], deadlock requires four simultaneous conditions:

exclusive use – threads require access to some sort of non-sharable “resources”

hold and wait – threads wait for unavailable resources while continuing to hold resources they have already acquired

irrevocability – resources cannot be forcibly taken from threads that hold them

circularity – there exists a circular chain of threads in which each is holding a resource needed by the next

In shared-memory parallel programs, “non-sharable resources” often correspond to portions of a data structure, with access protected by mutual exclusion (“mutex”) locks. Given that exclusive use is fundamental, deadlock can then be addressed by breaking any one of the remaining three conditions. For example:

1. We can break the hold-and-wait condition by requiring a thread that wishes to perform a given operation to request all of its locks at once. This approach is problematic in modular software, or in situations where the identity of some of the locks depends on conditions that cannot be evaluated without holding other locks (suppose, for example, that we wish to move an element atomically from set A to set f(v), where v is the value of the element drawn from set A).

2. We can break the irrevocability condition by requiring a thread to release any locks it already holds when it tries to acquire a lock that is held by another thread. This approach is commonly employed (automatically) in transactional memory systems, which are able to “back a thread out” and retry an operation (transaction) that encounters a locking conflict. It can also be used (more manually) in any system capable of dynamic deadlock detection (see, for example, the recent work of Koskinen and Herlihy [2008]). Retrying is complicated by the possibility that an operation may already have generated externally-visible side effects, which must be “rolled back” without compromising global invariants. We will consider rollback further in Chapter 9.

3. We can break the circularity condition by imposing a static order on locks, and requiring that every operation acquire its locks according to that static order. This approach is slightly less onerous than requiring a thread to request all its locks at once, but still far from general. It does not, for example, provide an acceptable solution to the “move from A to f(v)” example in strategy 1 above.

Strategy 3 is widely used in practice. It appears, for example, in every major operating system kernel. The lack of generality, however, and the burden of defining—and respecting—a static order on locks, makes strategy 2 quite appealing, particularly when it can be automated, as it is in transactional memory. An intermediate alternative, sometimes used for applications whose synchronization behavior is well understood, is to consider, at each individual lock request, whether there is a feasible order in which currently active operations might complete (under worst-case assumptions about the future resources they might need in order to do so), even if the current lock is granted. The best known strategy of this sort is the Banker’s algorithm of Dijkstra [early 1960s, 1982], originally developed for the THE operating system [Dijkstra, 1968a]. Where strategies 1 and 3 may be said to prevent deadlock by design, the Banker’s algorithm is often described as deadlock avoidance, and strategy 2 as deadlock recovery.
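As a small illustration of strategy 3 (our own C++ sketch, not from the text), the following routine derives a static lock order from the addresses of the objects involved, so every thread acquires any pair of account locks in the same order and no cycle of waiting threads can form. The Account type and transfer function are hypothetical names introduced only for this example.

    #include <functional>
    #include <mutex>

    struct Account {
        std::mutex m;
        long balance = 0;
    };

    // Strategy 3 in miniature: impose a static order on locks (here, the order
    // given by std::less on the accounts' addresses) and acquire them in that
    // order in every operation, so that circular waiting cannot arise.
    // Assumes 'from' and 'to' are distinct accounts.
    void transfer(Account& from, Account& to, long amount) {
        bool from_first = std::less<Account*>{}(&from, &to);
        Account& first  = from_first ? from : to;
        Account& second = from_first ? to : from;
        std::lock_guard<std::mutex> g1(first.m);    // lower-ordered lock first
        std::lock_guard<std::mutex> g2(second.m);   // higher-ordered lock second
        from.balance -= amount;
        to.balance   += amount;
    }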

3.1.2 ATOMICITY

In Section 2.2 we introduced the notion of sequential consistency, which requires that low-level memory accesses appear to occur in some global total order—i.e., “one at a time”—with each core’s accesses appearing in program order (the order specified by the core’s sequential program). When considering the order of program-level operations on a concurrent object, it is tempting to ask whether sequential consistency can help. In one sense, the answer is clearly no: correct sequential code will typically not work correctly when executed (without synchronization) by multiple threads concurrently—even on a system with sequentially consistent memory. Conversely, as we shall see in Section 3.4, one can (with appropriate synchronization) build correct high-level objects on top of a system whose memory is more relaxed.

At the same time, the notion of sequential consistency suggests a way in which we might define atomicity for a concurrent object, allowing us to infer what it means for code to be properly synchronized. After all, the memory system is a complex concurrent object from the perspective of a memory architect, who must implement atomic load and store instructions via messages across a distributed cache-cache interconnect. Just as the designer of a sequentially consistent memory system seeks to achieve the appearance of a total order on memory accesses, consistent with per-core program order, so too can the concurrent object designer seek to achieve the appearance of a total order on high-level operations, consistent with the order of each thread’s sequential program. In any execution that appears to exhibit such a total order, each operation can be said to have executed atomically.

Sequential Consistency for High-Level Objects

The implementation of a concurrent object O is said to be sequentially consistent if, in every possible execution, the operations on O appear to occur in (have the same arguments and return values that they would have had in) some total order that is consistent with program order in each thread.

Unfortunately, there is a problem with sequential consistency that limits its usefulness for high-level concurrent objects: lack of composability. A multiprocessor memory system is, in effect, a single concurrent object, designed at one time by one architectural team. Its methods are the memory access instructions. A high-level concurrent object, by contrast, may be designed in isolation, and then used with other such objects in a single program. Suppose we have implemented object A, and have proven that in any given program, operations performed on A will appear to occur in some total order consistent with program order in each thread. Suppose we have a similar guarantee for object B. We should like to be able to guarantee that in any given program, operations on A and B will appear to occur in some single total order consistent with program order in each thread. That is, we should like the operations on A and B to compose. Sadly, they may not.

As a simple if somewhat contrived example, consider a replicated integer object, in which threads read their local copy (without synchronization) and update all copies under protection of a lock:

    // initially L is free and A[i] == 0 ∀ i ∈ T

    void put(int v):            int get():
        L.acquire()                 return A[self]
        for i ∈ T
            A[i] := v
        L.release()

(Throughout this monograph, we use T to represent the set of thread ids, which we assume to be dense.)

Because of the lock, put operations are fully ordered. Further, because a get operation performs only a single (atomic) access to memory, it is easily ordered with respect to all puts, following those that have updated the relevant element of A, and preceding those that have not. It is straightforward to identify a total order on operations that respects these constraints and that is consistent with program order in each thread. In other words, our counter is sequentially consistent.

On the other hand, consider what happens if we have two counters—call them X and Y. Because get operations can occur “in the middle of” a put at the implementation level, we can imagine a scenario in which threads T3 and T4 perform gets on X and Y while both objects are being updated—and see the updates in opposite orders:

                              local values of shared objects
                            T1        T2        T3        T4
                           X  Y      X  Y      X  Y      X  Y
    initially              0  0      0  0      0  0      0  0
    T1 begins X.put(1)     1  0      1  0      0  0      0  0
    T2 begins Y.put(1)     1  1      1  1      0  1      0  0
    T3: X.get() returns 0  1  1      1  1      0  1      0  0
    T3: Y.get() returns 1  1  1      1  1      0  1      0  0
    T1 finishes X.put(1)   1  1      1  1      1  1      1  0
    T4: X.get() returns 1  1  1      1  1      1  1      1  0
    T4: Y.get() returns 0  1  1      1  1      1  1      1  0
    T2 finishes Y.put(1)   1  1      1  1      1  1      1  1

At this point, T3 thinks the put to Y happened before the put to X, but T4 thinks the put to X happened before the put to Y.

To solve this problem, we might require the implementation of a shared object to ensure that updates appear to other threads to happen at some single point in time. But this is not enough. Consider a software emulation of the hardware write buffers described in Section 2.2.1. To perform a put on object X, thread T inserts the desired new value into a local queue and continues execution. Periodically, a helper thread drains the queue and applies the updates to the master copy of X, which resides in some global location. To perform a get, T inspects the local queue (synchronizing with the helper as necessary) and returns any pending update; otherwise it returns the global value of X. From the point of view of every thread other than T, the update occurs when it is applied to the global value of X. From T’s perspective, however, it happens early, and, in a system with more than one object, we can easily obtain the “bow tie” causality loop of Figure 2.3. This scenario suggests that we require updates to appear to other threads at the same time they appear to the updater—or at least before the updater continues execution.

Linearizability

To address the problem of composability, Herlihy and Wing introduced the notion of linearizability [1990]. For more than 20 years it has served as the ordering criterion of choice for high-level concurrent objects. The implementation of object O is said to be linearizable if, in every possible execution, the operations on O appear to occur in some total order that is consistent not only with program order in each thread but also with any ordering that threads are able to observe by other means. More specifically, linearizability requires that each operation appear to occur instantaneously at some point in time between its call and return. The “instantaneously” part of this requirement precludes the shared counter scenario above, in which T3 and T4 have different views of partial updates. The “between its call and return” part of the requirement precludes the software write buffer scenario, in which T may see its own updates early.

For the sake of precision, it should be noted that there is no absolute notion of objective time in a parallel system, any more than there is in Einsteinian physics. (For more on the notion of time in parallel systems, see the classic paper by Lamport [1978].) What really matters is observable orderings. When we say that an event must occur at a single instant in time, what we mean is that it must be impossible for thread A to observe that an event has occurred, for A to subsequently communicate with thread B (e.g., by writing a variable that B reads), and then for B to observe that the event has not yet occurred.

When designing a concurrent object, we typically identify a linearization point within each method at which a call to that method can be said to have occurred. In the trivial case in which every method is bracketed by acquisition and release of a common object lock, the linearization point can be anywhere inside the method—e.g., at the lock release. In the more general case, it can become significantly more difficult to identify a linearization point, and to prove that everything before that point can be recognized by other threads as merely preparation, and everything after that point as merely cleanup.
In an algorithm based on fine-grain locks, the linearization point of a method may correspond to the release of some particular one of the locks. In a nonblocking algorithm, in which all possible interleavings must be provably correct, the linearization point must correspond to an individual load, store, or other atomic primitive. In the nonblocking stack of Figure 2.5, for example, push and pop both linearize at their final compare and swap instruction. In a complex method, there may be multiple possible linearization points, depending on the flow of control at run time. There are even algorithms in which the outcome of a run-time check allows a method to determine that it linearized at some previous instruction, earlier in the execution [Harris et al., 2002].

    thread 1:                   thread 2:
        n1.v == A                   n1.v == A
        n2 := n1.next               n2 := n1.next
        n2.v == D                   n2.v == D
        n4 := new node              n3 := n2.next
        n4.v := C
        n4.next := n2
        n1.next := n4
                                    n1.next := n3

Figure 3.1: Improperly synchronized list updates. This code can lose node C even on a machine with sequential consistency.

Given linearizable implementations of objects A and B, one can prove that in every possible program execution, the operations on A and B will appear to occur in some single total order that is consistent both with program order in each thread and with any other ordering that threads are able to observe. In other words, linearizability is composable.¹

Hand-over-hand Locking (Lock Coupling)

As an example of linearizability achieved through fine-grain locking, consider the task of parallelizing a set abstraction implemented as a sorted, singly-linked list with insert, remove, and lookup operations. Absent synchronization, it is easy to see how the list could become corrupted. In Figure 3.1, the code at left shows a possible sequence of instructions executed by thread 1 in the process of inserting a new node containing the value C, and a concurrent sequence of instructions executed by thread 2 in the process of deleting the node containing the value D. If interleaved as shown, these instructions will transform the list at the upper right into the non-list at the lower right, in which the node containing C has been lost.

Clearly a global lock—forcing either thread 1 or thread 2 to complete before the other starts—would linearize the updates and avoid loss of C. It can be shown, however, that linearizability can also be maintained with a fine-grain locking protocol in which each thread holds at most two locks at a time, on adjacent nodes in the list [Bayer and Schkolnick, 1977]. By retaining the right-hand lock while releasing the left-hand and then acquiring the right-hand’s successor, a thread ensures that it is never overtaken by another thread during its traversal of the list. In Figure 3.1, thread 1 would hold locks on the nodes containing A and D until done inserting the node containing C. Thread 2 would need these same two locks before reading the content of the node containing A. While threads 1 and 2 cannot make their updates simultaneously, one can “chase the other down the list” to the point where the updates are needed, achieving substantially higher concurrency than is possible with a global lock. Similar “hand-over-hand” locking techniques are widely used in concurrent trees and other pointer-based data structures.
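The following C++ sketch (ours, not the monograph's code) illustrates the hand-over-hand protocol for insertion into such a sorted list under stated simplifying assumptions: each node carries its own mutex, a sentinel head node holds a value smaller than any real element, and duplicates are permitted.

    #include <mutex>

    // Hand-over-hand (lock coupling) insertion into a sorted singly-linked list.
    struct Node {
        int value;
        Node* next;
        std::mutex m;
    };

    void insert(Node* head, int v) {
        Node* pred = head;            // sentinel: head->value < any real value
        pred->m.lock();
        Node* curr = pred->next;
        if (curr != nullptr) curr->m.lock();
        while (curr != nullptr && curr->value < v) {
            // Retain the right-hand (curr) lock while releasing the left-hand
            // (pred) lock, then acquire the right-hand node's successor.
            pred->m.unlock();
            pred = curr;
            curr = curr->next;
            if (curr != nullptr) curr->m.lock();
        }
        pred->next = new Node{v, curr};   // link the new node between pred and curr
        if (curr != nullptr) curr->m.unlock();
        pred->m.unlock();
    }

Because every thread acquires node locks in list order, no deadlock is possible, and a thread that has locked a pair of adjacent nodes cannot be passed by a later-arriving thread.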

¹ In early papers, composability was known as locality [Herlihy and Wing, 1990; Weihl, 1989]. The linearizability of a given object O was said to be a local property because it depended only on the implementation of O, not on the other objects with which O might be used in some larger program.

Serializability

Recall that the purpose of an ordering criterion is to clarify the meaning of atomicity. By requiring an operation to complete—and to be visible to all other threads—before it returns to its caller, linearizability guarantees that the order of operations on any given concurrent object will be consistent with all other observable orderings in an execution, including those of other concurrent objects. The flip side of this guarantee is that linearizability cannot be used to reason about composite operations—ones that manipulate more than one object, but are intended to execute as a single atomic unit. Consider a banking system in which thread 1 transfers $100 from account A to account B, while thread 2 adds the amounts in the two accounts:

    // initially A.balance() == B.balance() == 500

    thread 1:                thread 2:
        A.withdraw(100)
                                 sum := A.balance()     // 400
                                 sum +:= B.balance()    // 900
        B.deposit(100)

If we think of A and B as separate objects, then the execution can linearize as suggested above, but thread 2 will see a cross-account total that is $100 “too low.” If we wish to treat the code in each thread as a single atomic unit, we must disallow this execution—something that neither A nor B can do on its own.

Multi-object atomic operations are the hallmark of database systems, which refer to them as transactions. Transactional memory (the subject of Chapter 9) adapts transactions to shared-memory parallel computing, allowing the programmer to request that a multi-object operation like thread 1’s transfer or thread 2’s sum should execute atomically. The simplest ordering criterion for database transactions is known as serializability. Transactions are said to serialize if they have the same effect they would have had if executed one at a time in some total order. For transactional memory (and sometimes for databases as well), we can extend the model to allow a thread to perform a series of transactions, and require that the global order be consistent with program order in each thread.

It turns out to be NP-hard to determine whether a given set of transactions (with the given inputs and outputs) is serializable [Papadimitriou, 1979]. Fortunately, we seldom need to make such a determination in practice. Generally all we really want is to ensure that the current execution will be serializable—something we can achieve with conservative (sufficient but not necessary) measures. A global lock is a trivial solution, but admits no concurrency. Databases and most TM systems employ more elaborate fine-grain locking. A few TM systems employ nonblocking techniques. If we regard the objects to be accessed by a transaction as “resources” and revisit the conditions for deadlock outlined at the beginning of Section 3.1.1, we quickly realize that a transaction may, in the general case, need to access some resources before it knows which others it will need.
Any implementation of serializability based on fine-grain locks will thus entail not only “exclusive use,” but also both “hold and wait” and “circularity.” To address the possibility of deadlock, a database or lock-based TM system must be prepared to break the “irrevocability” condition by releasing locks, rolling back, and retrying conflicting transactions. Like branch prediction or compare and swap-based fetch and Φ, this strategy of proceeding “in the hope” that things will work out (and recovering when they don’t) is an example of speculation. So-called lazy TM systems take this even further, allowing conflicting (non-serializable) transactions to proceed in parallel until one of them is ready to commit—and only then aborting and rolling back the others.

Two-Phase Locking

As an example of fine-grain locking for serializability, consider a simple scenario in which transactions 1 and 2 read and update symmetric variables:

    // initially x == y == 0

    transaction 1:        transaction 2:
        t1 := x               t2 := y
        y++                   x++

Left unsynchronized, this code could result in t1 == t2 == 0, even on a sequentially consistent machine—something that should not be possible if the transactions are to serialize. A global lock would solve the problem, but would be far too conservative for transactions larger than the trivial ones shown here. If we associate fine-grain locks with individual variables, we still run into trouble if thread 1 releases its lock on x before acquiring the lock on y, and thread 2 releases its lock on y before acquiring the lock on x. It turns out [Eswaran et al., 1976] that serializability can always be guaranteed if threads acquire all their locks (in an “expansion phase”) before releasing any of them (in a “contraction phase”). As noted above, this convention admits the possibility of deadlock. In our example, transaction 1 might lock x and transaction 2 lock y before either attempts to acquire the other. To detect the problem and trigger rollback, a system based on two-phase locking may construct and maintain a dependence graph at run time. Alternatively (and more conservatively), it may simply limit the time it is willing to wait for locks, and assume the worst when this timeout is exceeded.
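A C++ rendering of the two transactions under two-phase locking (our sketch, not the monograph's) might look as follows; the lock variables lx and ly, which guard x and y respectively, are our own names. Note that, exactly as described above, the two transactions acquire their locks in opposite orders, so a run-time deadlock detector or timeout must be prepared to roll one of them back.

    #include <mutex>

    std::mutex lx, ly;   // fine-grain locks guarding x and y
    int x = 0, y = 0;

    // Transaction 1: t1 := x; y++
    int transaction1() {
        lx.lock();              // expansion phase: acquire every lock needed...
        ly.lock();
        int t1 = x;
        y++;
        ly.unlock();            // ...contraction phase: only now release them
        lx.unlock();
        return t1;
    }

    // Transaction 2: t2 := y; x++
    int transaction2() {
        ly.lock();              // acquires in the opposite order: two-phase
        lx.lock();              // locking guarantees serializability but, as
        int t2 = y;             // the text notes, admits deadlock
        x++;
        lx.unlock();
        ly.unlock();
        return t2;
    }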

Strict Serializability

The astute reader may have noticed the strong similarity between the definitions of sequential consistency (for high-level objects) and serializability (with the extension that allows a single thread to perform a series of transactions). The difference is simply that transactions need to be able to access a dynamically chosen set of objects, while sequential consistency is limited to a predefined set of single-object operations.

The similarity between sequential consistency and serializability leads to a common weakness: the lack of required consistency with other orders that may be observed by a thread. It was by requiring such “real-time” ordering that we obtained composability for single-object operations in the definition of linearizability. Real-time ordering is also important for its own sake in many applications. Without it we might, for example, make a large deposit to a friend’s bank account, tell the person we had done so, and yet still encounter an “insufficient funds” message in response to a (subsequent!) withdrawal request. To avoid such counterintuitive scenarios, many database systems—and most TM systems—require strict serializability, which is simply ordinary serializability augmented with real-time order: transactions are said to be strictly serializable if they have the same effect they would have had if executed one at a time in some total order that is consistent with program order (if any) in each thread, and with any other order the threads may be able to observe. In particular, if transaction A finishes before transaction B begins, then A must appear before B in the total order.

Relationships Among the Ordering Criteria

Table 3.1 summarizes the relationships among sequential consistency (for high-level objects), linearizability, serializability, and strict serializability. A system that correctly implements any of these four criteria will provide the appearance of a total order on operations, consistent with per-thread program order. Linearizability and strict serializability add consistency with “real-time” order. Serializability and strict serializability add the ability to work with multi-object atomic operations, but at the expense of composability: an execution that is serializable (or sequentially consistent, or strictly serializable) with respect to each of the objects it uses, individually, is not necessarily serializable with respect to all of them together.

To avoid confusion, it should be noted that “composability” has a different meaning in the database and TM communities from the one presented here. We will consider this alternative meaning in Chapter 9. There the question will be: given several atomic (serializable) operations, can we wrap calls to them inside some larger atomic operation? This style of composability is straightforward to provide in a system based on speculation; it is invariably supported by database and TM systems. It cannot be supported, in the general case, by conservative locking strategies.

3.2 LIVENESS

Safety properties—the subject of the previous section—ensure that bad things never happen: threads are never deadlocked; atomicity is never violated; invariants are never broken. To say that code is correct, however, we generally want more: we want to ensure forward progress. Just as we generally want to know that a sequential program will produce a correct answer in bounded time (not just fail to produce an incorrect answer), we generally want to know that invocations of concurrent operations will complete their work and return.

Table 3.1: Properties of standard ordering criteria. SC = sequential consistency; L = linearizability; S = serializability; SS = strict serializability.

                                                     SC    L     S     SS
    Equivalent to a sequential order                 +     +     +     +
    Respects program order in each thread            +     +     +     +
    Consistent with other ordering (“real time”)     −     +     −     +
    Can touch multiple objects atomically            −     −     +     +
    Composes                                         −     +     −     −

An object method is said to be blocking if there is some reachable state of the system in which a thread that has called the method will be unable to make forward progress (and return from the method) until some other thread takes action. Lock-based algorithms are inherently blocking: a thread that holds a lock precludes progress on the part of any other thread that needs the same lock. Liveness proofs for lock-based algorithms require not only that the code be deadlock-free, but also that critical sections be free of infinite loops.

A method is said to be nonblocking if there is no reachable state of the system that prevents it from completing and returning. Nonblocking algorithms have the desirable property that inopportune preemption (e.g., of a lock holder) never precludes forward progress in other threads. In some environments (e.g., a system with high availability requirements), nonblocking algorithms may also allow the system to survive when a thread crashes or is prematurely killed. We consider several variants of nonblocking progress in the first subsection below.

In both blocking and nonblocking algorithms, we may also care about fairness—the relative rates of progress of different threads. We consider this topic briefly in the second subsection below.

3.2.1 NONBLOCKING PROGRESS

Given the difficulty of guaranteeing any particular rate of execution (in the presence of timesharing, cache misses, page faults, and other sources of variability), we generally speak of progress in terms of abstract program steps rather than absolute time. A method is said to be wait free (the strongest variant of nonblocking progress) if it is guaranteed to complete in some bounded number of its own program steps. (This bound need not be statically known.) A method M is said to be lock free (a somewhat weaker variant) if some thread is guaranteed to make progress (complete an operation on the same object) in some bounded number of M’s program steps. M is said to be obstruction free (the weakest variant of nonblocking progress) if it is guaranteed to complete in some bounded number of program steps if no other thread executes any steps during that same interval.

Wait freedom is sometimes referred to as starvation freedom: a given thread is never prevented from making progress. Lock freedom is sometimes referred to as livelock freedom: an individual thread may starve, but the system as a whole is never prevented from making forward progress (equivalently: no set of threads can actively prevent each other from making progress indefinitely). Obstruction-free algorithms can suffer not only from starvation but also from livelock; if all threads but one “hold still” long enough, however, the one running thread is guaranteed to make progress.

Many practical algorithms are lock free or obstruction free. Treiber’s stack, for example (Section 2.3.1), is lock-free, as is the widely used queue of Michael and Scott (Section 8.1). Obstruction freedom was first described in the context of Herlihy et al.’s double-ended queue [2003a]. It is also provided by several TM systems (among them the DSTM of Herlihy et al. [2003b], the ASTM of Marathe et al. [2005], and the work of Marathe and Moir [2008]). Moir and Shavit [2005] provide an excellent survey of concurrent data structures, including coverage of nonblocking progress. Sundell and Tsigas [2008] describe a library of such data structures.

Wait-free algorithms are significantly less common. Herlihy [1991] demonstrated that any sequential data structure can be transformed, automatically, into a wait-free concurrent version, but the construction is highly inefficient. Recent work by Kogan and Petrank [2012] (building on a series of earlier results) has shown how to reduce the time overhead dramatically, though space remains proportional to the maximum number of threads in the system.

Most wait-free algorithms—and many lock-free algorithms—employ some variant of helping, in which a thread that has stalled after reaching its linearization point, but before completing all its cleanup work, may be assisted by other threads, which need to “get it out of the way” so they can perform their own operations. Other lock-free algorithms—and even a few wait-free algorithms—are able to make do without helping. As a simple example, consider the following implementation of a wait-free counter:

    initially C[i] == 0 ∀ i ∈ T

    void inc():        int val():
        C[self]++          rtn := 0
                           for i in [1..N]
                               rtn +:= C[i]
                           return rtn

Here the aggregate counter value is taken to be the sum of a set of per-thread values. Because each per-thread value is monotonically increasing, so is the aggregate sum.
Given this observation, one can prove that the value returned by the val method will have been correct sometime between its call and its return: it will be bounded below by the number of inc operations that can be determined to have returned before val was called, and above by the number that cannot be determined not to have yet been called when val returns. In other words, val is linearizable, though its linearization point cannot in general be statically determined. Because both inc and val comprise a bounded (in this case, statically bounded) number of program steps, the methods are wait free.
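A C++ rendering of this counter (our sketch, not the monograph's code) appears below. The compile-time bound N on the number of threads and the self parameter (the caller's thread id in the range [0, N)) are assumptions of the sketch.

    #include <atomic>

    constexpr int N = 64;       // assumed maximum number of threads
    std::atomic<long> C[N];     // one slot per thread, assumed initialized to zero

    // Wait free: a single atomic increment of the caller's own slot.
    void inc(int self) {
        C[self].fetch_add(1);
    }

    // Wait free: a statically bounded loop that sums all slots.  Because each
    // slot only grows, the returned value was correct at some point between
    // the call and the return, as argued in the text.
    long val() {
        long rtn = 0;
        for (int i = 0; i < N; i++)
            rtn += C[i].load();
        return rtn;
    }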

3.2.2 FAIRNESS

It is tempting to think of wait freedom as the ultimate form of fairness: no thread ever waits for any other. This thought, however, assumes a fundamental underlying property. Known to theoreticians as unconditional fairness, it asserts that any thread that is not blocked eventually executes another program step. In practical terms, unconditional fairness depends on appropriate scheduling at multiple levels of the system—in the hardware, operating system, run-time system, and language implementation—all of which must ensure that runnable threads continue to run.

When threads may block for mutual exclusion or condition synchronization, fairness becomes more problematic. In almost all cases we will want to insist on weak fairness, which requires that any thread waiting for a condition that is continuously true (or a lock that is continuously available) eventually executes another program step. In the following program fragment, for example, weak fairness precludes an execution in which thread 1 spins forever: thread 2 must eventually notice that f is false, complete its wait, and set f to true, after which thread 1 must notice the change to f and complete.

    initially f == false

    thread 1:          thread 2:
        await (f)          await (!f)
                           f := true

Here we have used the notation await (condition) as shorthand for

    while (not condition)
        // spin
    fence(RR, RW)

Many more stringent definitions of fairness are possible. In particular, strong fairness requires that any thread waiting for a condition that is true infinitely often (or a lock that is available infinitely often) eventually executes another program step. In the following program fragment, for example, strong fairness again precludes an execution in which thread 1 spins forever: thread 2 must notice one of the “windows” in which g is true, complete its wait, and set f to true, after which thread 1 must notice the change and complete.

    initially f == g == false

    thread 1:              thread 2:
        while (!f)             await (g)
            g := true          f := true
            g := false

Strong fairness is difficult to truly achieve: it may, for example, require a scheduler to re-check every awaited condition whenever one of its constituent variables is changed, to make sure that any thread at risk of starving is given a chance to run. Any deterministic strategy that considers only a subset of the waiting threads on each state change risks the possibility of deterministically ignoring some unfortunate thread every time it is able to run. Fortunately, statistical “guarantees” typically suffice in practice. By considering a randomly chosen thread—instead of all threads—when a scheduling decision is required, we can drive the probability of starvation arbitrarily low. A truly random choice is difficult, of course, but various pseudorandom approaches appear to work quite well. At the hardware level, interconnects and coherence protocols are designed to make it highly unlikely that a race between two cores (e.g., when performing near-simultaneous compare and swap instructions) will always be resolved the same way. Within the operating system, run-time system, or language implementation, one can “randomize” the interval between checks of a condition using a pseudorandom number generator or even the natural “jitter” in execution time of nontrivial code blocks on complex modern cores.
Weak and strong fairness address worst-case behavior, and allow executions that still seem grossly unfair from an intuitive perspective (e.g., executions in which one thread succeeds a million times more often than another). Statistical “randomization,” by contrast, may achieve intuitively very fair behavior without absolutely precluding worst-case starvation. In subsequent chapters we will consider synchronization algorithms that range from highly unfair (e.g., test and set locks that work well only when there are periodic quiescent intervals, when the lock is free and no thread wants it), to strictly first-come, first-served (e.g., locks in which a thread employs a wait-free protocol to join a FIFO queue). We will also consider intermediate options, such as locks that deliberately balance locality (for performance) against fairness.

Much of the theoretical groundwork for fairness was laid by Nissim Francez [1986]. Proofs of fairness are typically based on temporal logic, which provides operators for concepts like “always” and “eventually.” A brief introduction to these topics can be found in the text of Ben-Ari [2006, Chapter 4]; much more extensive coverage can be found in Schneider’s comprehensive work on the theory of concurrency [1997].

3.3 THE CONSENSUS HIERARCHY

In Section 2.3 we noted that CAS and LL/SC are universal atomic primitives—capable of implementing arbitrary single-word fetch and Φ operations. We suggested—implicitly, at least—that they are fundamentally more powerful than simpler primitives like TAS, Swap, FAI, and FAA. Herlihy formalized this notion of relative power in his work on wait-free synchronization [1991], previously mentioned in Section 3.2.1. The formalization is based on the notion of consensus.

Originally formalized by Fischer, Lynch, and Paterson [1985] in a distributed setting, the consensus problem involves a set of potentially unreliable threads, each of which “proposes” a value. The goal is for the reliable threads to agree on one of the values they proposed—a task the authors proved to be impossible with asynchronous messages. Herlihy adapted the problem to the shared-memory setting, where powerful atomic primitives can circumvent impossibility. Specifically, Herlihy suggested that such primitives (or, more precisely, the objects on which those primitives operate) be classified according to the number of threads for which they can achieve wait-free consensus. It is easy to see that a TAS object can achieve wait-free consensus for two threads:

    // initially L == 0; proposal[0] and proposal[1] are immaterial

    agree(i):
        proposal[self] := i
        fence(WR, WW)
        if !TAS(L) return i
        else return proposal[1-self]

Herlihy was able to show that this is the best one can do: TAS cannot achieve wait-free consensus for more than two threads. Moreover ordinary loads and stores cannot achieve wait-free consensus at all—even for only two threads. An object supporting CAS, on the other hand (or equivalently LL/SC), can achieve wait-free consensus for an arbitrary number of threads:

    // initially v == ⊥

    agree(i):
        if CAS(&v, ⊥, i) return i
        else return v
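The same construction is easy to express in C++ (our sketch, not the monograph's code), using a sentinel value in place of ⊥; the sketch assumes that proposals are non-negative integers.

    #include <atomic>

    constexpr int EMPTY = -1;             // plays the role of ⊥; proposals are >= 0
    std::atomic<int> v{EMPTY};

    int agree(int i) {
        int expected = EMPTY;
        if (v.compare_exchange_strong(expected, i))
            return i;                     // this thread's CAS installed its proposal
        return v.load();                  // otherwise return the agreed-upon value
    }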

One can, in fact, define an infinite hierarchy of atomic objects, where those appearing at level k can achieve wait-free consensus for k threads but no more. Objects supporting CAS or LL/SC are said to have consensus number ∞. Many other common primitives—including TAS, swap, FAI, and FAA—have consensus number 2. One can define atomic objects at intermediate levels of the hierarchy, but these are not typically encountered on real hardware.

3.4 MEMORY MODELS

As described in Section 2.2, most modern multicore systems are coherent but not sequentially consistent: changes to a given variable are serialized, and eventually propagate to all cores, but accesses to different locations may appear to occur in different orders from the perspective of different threads—even to the point of introducing apparent causality loops. For programmers to reason about such a system, we need a memory model—a formal characterization of its behavior.

There is a rich and extensive literature on memory models. Good starting points include the tutorial of Adve and Gharachorloo [1996] and the articles introducing the models of Java [Manson et al., 2005] and C++ [Boehm and Adve, 2008]. Details vary considerably from one model to another, but most now share a similar framework.

3.4.1 FORMAL FRAMEWORK

Informally, a memory model specifies, for any given program execution, which values (i.e., which writes) may be seen by any given read. More precise definitions depend on the notion of an abstract program execution. Programming language semantics are typically defined in terms of execution on some abstract machine, with language-appropriate built-in types, control-flow constructs, etc. For a given source program and input, a program execution is a set of sequences, one per thread, of loads, stores, and other atomic steps, each of which inspects and/or changes the state of the abstract machine (i.e., memory). Language semantics can be seen as a mapping from programs and inputs to (possibly infinite) sets of valid executions of the language’s abstract machine. They can also be seen as a predicate: given a program, its input, and an abstract execution, is the execution valid?

A language implementation provides a similar mapping from programs and inputs to sets of concrete executions on some real, physical machine. A language implementation is safe if for every well formed source program and input, the concrete executions it produces correspond to (produce the same output as) some valid abstract execution of that program on that input (Figure 3.2). The implementation is live if it produces at least one such concrete execution whenever the set of abstract executions is nonempty. (We focus here on safety.)

Consensus and Mutual Exclusion

A solution to the consensus problem clearly suffices for “one-shot” mutual exclusion (a.k.a. leader election): each thread proposes its own id, and the agreed-upon value indicates which thread is able to enter the critical section. Consensus is not necessary, however: the winning thread needs to know that it can enter the critical section, but other threads only need to know that they have lost—they don’t need to know who won. TAS thus suffices to build a wait-free try lock (one whose acquire method returns immediately with a success or failure result) for an arbitrary number of threads.

It is tempting to suggest that any solution to the mutual exclusion problem will suffice to build a blocking solution to the consensus problem (have the winner of the competition for the lock write its id to a well-known location; have the losers spin until that value appears), but this is not acceptable: consensus requires agreement among correct threads even in the presence of faulty threads. If a thread acquires the lock and then dies before writing down its id, the others will never reach agreement. This is the beauty of CAS or LL/SC: it allows a thread to win the competition and write down its value in a single indivisible step.
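The wait-free try lock mentioned in the box is easy to render in C++ (our sketch, not the monograph's code) on top of the standard test-and-set flag:

    #include <atomic>

    class TryLock {
        std::atomic_flag held = ATOMIC_FLAG_INIT;   // clear (unlocked) initially
    public:
        // Wait free: a single test_and_set.  Returns true iff the caller now
        // holds the lock; never spins or blocks.
        bool try_acquire() {
            return !held.test_and_set(std::memory_order_acquire);
        }
        void release() {
            held.clear(std::memory_order_release);
        }
    };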

[Figure 3.2 diagram: a “Source program and input” maps via “Language semantics” to a “(Set of) abstract executions,” and via the “Language implementation” to a “(Set of) concrete executions”; a “Required correspondence” relates the two sets.]

Figure 3.2: Program executions, semantics, and implementations. A valid implementation must produce only those concrete executions whose output agrees with that of some abstract execution allowed by language semantics.

In this execution-based framework, a memory model is the portion of language semantics that determines whether the values read by the loads in an abstract execution are valid, given the values written by the stores. In a single-threaded program, the memory model is trivial: there is a total order on program steps, and each load reads the value written by the most recent store to the same location—or the initial value (if any) if there is no such store. In a multithreaded program, a load in one thread may read a value written by a store in a different thread. An execution is said to be sequentially consistent if there exists a total order that explains all values read, as in the single-threaded case. But not all executions are sequentially consistent. In the general case, the best we can manage is a partial order known as happens-before.

To define this order, most memory models begin by distinguishing between “ordinary” and “synchronization” steps in the program execution. Ordinary steps are typically loads and stores. Depending on the language, synchronization steps might be lock acquire and release operations, transactions, monitor entry and exit, message send and receive, or any of a large number of other possibilities. Whatever these steps might be, we use them to proceed as follows:

Program order is a union of disjoint total orders, one per thread.

Synchronization order is a total order, across all threads, on all synchronization steps in the abstract execution. This order must be consistent with program order within each thread. It may also need to satisfy certain language-specific constraints—e.g., acquire and release operations on any given lock may need to occur in alternating order. Crucially, synchronization order is not specified by the source program. An execution is valid only if there exists a synchronization order that leads to a reads-see-writes function that explains the values read.

Synchronizes-with order is a subset of synchronization order induced by language semantics. In a language based on transactional memory, the subset may be trivial: all transactions are globally ordered. In a language based on locks, each release operation may synchronize with the next acquire of the same lock in synchronization order, but other synchronization steps may be unordered.

Happens-before order is the transitive closure of program order and synchronizes-with order.

3.4.2 DATA RACES

Two ordinary steps (memory accesses) are said to conflict if (1) they occur in separate threads, (2) they access the same location, and (3) at least one of them is a store. A program is said to have a data race if, for some input, it has a sequentially consistent execution in which two conflicting ordinary steps are adjacent in the total order. Data races are problematic because we don’t normally expect an implementation to force the order found in the sequentially consistent execution. This suggests the existence of a concrete execution corresponding to a different abstract execution—one in which all prior steps are the same but the conflicting steps are reversed. It is easy to construct examples (e.g., as suggested in Figure 2.3) in which the remainder of this second execution cannot be sequentially consistent.

Given the definitions of the previous subsection, we can also say that an abstract execution has a data race if, for some valid synchronization order, it contains a pair of conflicting steps that are not ordered by happens-before. A program then has a data race if, for some input, it has an execution containing a data race. This definition turns out to be equivalent to the one based on sequentially consistent executions.
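As a concrete C++ illustration (ours, not from the text): the two accesses to data below conflict, but the release store and acquire load of flag are synchronization steps that create a happens-before edge between them, so the program is data-race-free. The variable names and the producer/consumer structure are assumptions of the sketch.

    #include <atomic>
    #include <thread>

    int data = 0;                          // ordinary (non-atomic) variable
    std::atomic<bool> flag{false};         // synchronization variable

    void producer() {
        data = 42;                         // ordinary store
        flag.store(true, std::memory_order_release);      // synchronization step
    }

    void consumer() {
        while (!flag.load(std::memory_order_acquire)) { } // synchronization step
        // The acquire load that reads 'true' synchronizes-with the release
        // store, so this ordinary load is ordered after the store of 42 by
        // happens-before and must read 42.  Without the flag, the two accesses
        // to 'data' would constitute a data race (undefined behavior in C++).
        int d = data;
        (void) d;
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }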

Synchronization Races

The definition of a data race is designed to capture cases in which program behavior may depend on the order in which two ordinary accesses occur, and this order is not constrained by synchronization. In a similar fashion, we may wish to consider cases in which program behavior depends on the outcome of synchronization operations. For each form of synchronization operation, we can define a notion of conflict. Acquire operations on the same lock, for example, conflict with one another, while an acquire and a release do not, nor do operations on different locks. A program is said to have a synchronization race if it has a sequentially consistent execution in which two conflicting synchronization operations are adjacent in the total order. Together, data races and synchronization races constitute the class of general races. Because we assume the existence of a total order on synchronization operations, synchronization races never compromise sequential consistency. Rather they provide the means of controlling and exploiting nondeterminism in parallel programs. In any case where we wish to allow conflicting high-level operations to occur in arbitrary order, we design a synchronization race into the program to mediate the conflict.

In a program without any data races, the reads-see-writes function is straightforward: the lack of unordered conflicting accesses implies that no load has more than one most recent prior store in happens-before order. It must therefore read the value written by that store—or the initial value if there is no such store. More formally, one can prove that all executions of a data-race-free program are sequentially consistent. Moreover, since the (first) definition of a data race was based only on sequentially consistent executions, we can provide the programmer with a set of rules that, if followed, will always lead to sequentially consistent executions, with no need to reason about possible relaxed behavior of the underlying hardware. Such a set of rules is said to constitute a programmer-centric memory model [Adve and Hill, 1990]. In effect, a programmer-centric model is a contract between the programmer and the implementation: if the programmer follows the rules (i.e., writes data-race-free programs), the implementation will provide the illusion of sequential consistency.

But what about programs that do have data races? Some researchers have argued that such programs are simply buggy, and should have undefined behavior. This is the approach adopted by C++ [Boehm and Adve, 2008]. It rules out certain categories of programs (e.g., chaotic relaxation [Chazan and Miranker, 1969]), but the language designers had little in the way of alternatives: in the absence of type safety it is nearly impossible to limit the potential impact of a data race. The resulting model is quite simple: if a C++ program has a data race on a given input, its behavior is undefined; otherwise, it follows one of its sequentially consistent executions.

Unfortunately, in a language like Java, even buggy programs must have well defined behavior, to safeguard the integrity of the virtual machine (which may be embedded in some larger, untrusting system). The obvious approach is to say that a load may read the value written by the most recent store on any backward path through the happens-before graph, or by any incomparable store (one that is unordered with respect to the load). Unfortunately, as described by Manson et al.
[2005], this approach is overly restrictive: it precludes the use of several important compiler optimizations. The actual Java model defines a notion of “incremental justification” that may allow a load to read a value that might have been written by an incomparable store in some other hypothetical execution. The details are surprisingly subtle and complex.

3.4.3 REAL-WORLD MODELS

As of this writing, Java and C++ are the only widely used programming languages whose memory models have been precisely specified. C# is assumed to provide a model similar to that of Java. The next official version of C is expected to "reverse-inherit" a variant of the model from C++. Ada [Ichbiah et al., 1991] was the first language to introduce an explicitly relaxed (if informally specified) memory model. It was designed to facilitate implementation on both shared-memory and distributed hardware: variables shared between threads were required to be consistent only in the wake of explicit message passing (rendezvous). The reference implementations of several scripting languages (notably Ruby and Python) are sequentially consistent, though other implementations [JRuby; Jython] are not.

A group including representatives of Intel, Oracle, IBM, and Red Hat has proposed transactional extensions to C++ [Adl-Tabatabai et al., 2012]. In this proposal, begin transaction and end transaction markers contribute to the happens-before order inherited from standard C++. So-called relaxed transactions are permitted to contain other synchronization operations (e.g., lock acquire and release); atomic transactions are not. Dalessandro et al. [2010b] have proposed an alternative model in which atomic blocks are fundamental, and other synchronization mechanisms (e.g., locks) are built on top of them.

If we wish to allow programmers to create new synchronization mechanisms or nonblocking data structures (and indeed if any of the built-in synchronization mechanisms are to be written in high-level code, rather than assembler), then the memory model must define synchronization steps that are more primitive than lock acquire and release. Java allows a variable to be labeled volatile, in which case loads and stores that access it are included in the global synchronization order, with each load inducing a synchronizes-with arc (and thus a happens-before arc) from the (unique) preceding store to the same location. C++ provides a substantially more complex facility, in which variables are labeled atomic, and an individual load, store, or fetch and Φ operation can be labeled as an acquire fence, a release fence, both, or neither (it can also be labeled as sequentially consistent).

A crucial goal in the design of any practical memory model is to preserve, as much as possible, the freedom of compiler writers to employ code improvement techniques originally developed for sequential programs. The ordering constraints imposed by synchronization operations necessitate not only hardware-level memory fences, but also software-level "compiler fences," which inhibit the sorts of code motion traditionally used for latency tolerance, redundancy elimination, etc. (Recall that in our pseudocode, fence operations are intended to enforce both hardware and compiler ordering.) Much of the complexity of C++ atomic variables stems from the desire to avoid unnecessary hardware and compiler fences. Within reason, programmers should attempt in C++ to specify the minimal ordering constraints required for correct behavior. At the same time, they should resist the temptation to "get by" with minimal ordering in the absence of a solid correctness argument. Recent work by Attiya et al. [2011] has shown that memory fences and fetch and Φ operations are essential in a fundamental way: standard concurrent objects cannot be written without them.
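To make the preceding discussion concrete, the following fragment (ours, not the monograph's) uses C++ std::atomic with explicit memory_order arguments. The release store and the acquire load that observes it form a synchronizes-with (and hence happens-before) edge, so the access to the ordinary variable data is not a data race:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;                       // ordinary (non-atomic) variable
    std::atomic<bool> ready{false};     // synchronization variable

    void producer() {
        data = 42;                                      // ordinary store
        ready.store(true, std::memory_order_release);   // orders the store to data before it
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire))  // orders later reads after it
            ;                                           // spin
        assert(data == 42);             // no race: happens-before edge through ready
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join(); t2.join();
    }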

CHAPTER 4
Practical Spin Locks

The mutual exclusion problem was first identified in the early 1960s. Dijkstra attributes the first 2-thread solution to Theodorus Dekker [Dijkstra, 1968b]. Dijkstra himself published an n-thread solution in CACM in 1965. The problem has been intensely studied ever since. Taubenfeld [2008] provides a summary of significant historical highlights. Ben-Ari [2006, Chaps. 3 & 5] presents a bit more detail. Much more extensive coverage can be found in Taubenfeld's encyclopedic text [Taubenfeld, 2006].

Through the 1960s and '70s, attention focused mainly on algorithms in which the only atomic primitives were assumed to be load and store. Since the 1980s, practical algorithms have all assumed the availability of more powerful atomic primitives, though interest in load-store-only algorithms continues in the theory community. We present a few of the most important load-store-only spin locks in the first subsection below. In Section 4.2 we consider simple locks based on test and set and fetch and increment. In Section 4.3 we turn to queue-based locks, which scale significantly better on large machines. Finally, in Section 4.4, we consider additional techniques to reduce unnecessary overhead.

4.1 CLASSICAL LOAD-STORE-ONLY ALGORITHMS

Peterson's Algorithm  The simplest known 2-thread spin lock (Figure 4.1) is due to Peterson [1981]. The lock is represented by a pair of Boolean variables, interested[self] and interested[other] (initially false), and an integer turn that is either 0 or 1. To acquire the lock, thread i indicates its interest by setting interested[self] and then waiting until either (a) the other thread is not interested or (b) turn is set to the other thread, indicating that thread i set it first. As in Chapter 3, we use the notation await (condition) as shorthand for a spin loop followed by an acquire fence. We do not assume that the condition is tested atomically. To release the lock, thread i sets interested[self] back to false. This allows the other thread, if it is waiting, to enter the critical section. The initial value of turn in each round is immaterial: it serves only to break the tie when both threads are interested in entering the critical section.

    class lock
        (0, 1) turn
        bool interested[0..1] := { false, false }

    lock.acquire():
        interested[self] := true
        fence(WW)
        turn := self
        other := 1 − self
        fence(WR)
        await (interested[other] == false or turn == other)
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        interested[self] := false

Figure 4.1: Peterson's 2-thread spin lock. Variable self must be either 0 or 1.

In his original paper, Peterson showed how to extend the lock to n threads by proceeding through a series of n − 1 rounds, each of which eliminates a possible contender. Total (remote-access) time for a thread to enter the critical section, however, is Ω(n²), even in the absence of contention. In separate work, Peterson and Fischer [1977] showed how to generalize any 2-thread solution to n threads with a hierarchical tournament that requires only O(log n) time, even in the presence of contention. Burns and Lynch [1980] proved that any deadlock-free mutual exclusion algorithm using only reads and writes requires Ω(n) space.
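For readers who want to experiment, here is one possible rendering of Figure 4.1 in C++ (a sketch of ours, not code from the text). Sequentially consistent atomic accesses (the C++ default) stand in for the explicit fences of the pseudocode; weaker orderings would not preserve the store-load ordering on which the algorithm depends:

    #include <atomic>

    // Peterson's 2-thread lock; self must be 0 or 1.
    class peterson_lock {
        std::atomic<bool> interested[2];
        std::atomic<int> turn;
    public:
        peterson_lock() : turn(0) {
            interested[0].store(false);
            interested[1].store(false);
        }
        void acquire(int self) {
            int other = 1 - self;
            interested[self].store(true);    // announce interest
            turn.store(self);                // the later writer of turn loses the tie
            while (interested[other].load() && turn.load() == self)
                ;                            // spin
        }
        void release(int self) {
            interested[self].store(false);
        }
    };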

Defining Time Complexity for Spin Locks  Given that we generally have no bounds on either the length of a critical section or the relative rates of execution of different threads, we cannot in general bound the number of load instructions that a thread may execute in the acquire method of any spin lock algorithm. How then can we compare the time complexity of different locks? The standard answer is to count only accesses to shared variables (not those that are thread-local), and then only when the access is "remote." This is not a perfect measure, since local accesses are not free, but it captures the dramatic difference in cost between cache hits and misses on modern machines.

On an NRC-NUMA machine, the definition of "remote" is straightforward: we associate a (static) location with each variable and thread, and charge for all and only those accesses made by threads to data at other locations. On a globally cache-coherent machine, the definition is less clear, since whether an access hits in the cache may depend on whether there has been a recent access by another thread. The standard convention is to count all and only those accesses to shared variables that might be conflict misses. In a simple loop that spins on a Boolean variable, for example, we would count the initial load that starts the spin and the final load that ends it, but not any loads in-between. Unless otherwise noted, we will use the globally cache-coherent model in this monograph.

    class lock
        bool choosing[T] := { false . . . }
        int number[T] := { 0 . . . }

    lock.acquire():
        choosing[self] := true
        fence(WR)
        int m := 1 + max i ∈ T (L→number[i])
        fence(RW)
        number[self] := m
        fence(WW)
        choosing[self] := false
        fence(WR)
        for i ∈ T
            await (choosing[i] == false)
            await (number[i] == 0 or ⟨number[i], i⟩ ≥ ⟨m, self⟩)
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        number[self] := 0

Figure 4.2: Lamport’s bakery algorithm. The max operation is not assumed to be atomic. It is, however, assumed to read each number field only once.

Lamport's Bakery Algorithm  Most of the n-thread mutual exclusion algorithms based on loads and stores can be shown to be starvation free. Given differences in the relative rates of progress of different threads, however, most allow a thread to be bypassed many times before finally entering the critical section. In an attempt to improve fault tolerance, Lamport [1974] proposed a "bakery" algorithm (Figure 4.2) inspired by the "please take a ticket" and "now serving" signs seen at bakeries and other service counters. His algorithm has the arguably more significant advantage that threads acquire the lock in the order in which they first indicate their interest—i.e., in FIFO order.

The second await statement uses lexicographic comparison of ⟨value, thread id⟩ pairs to resolve ties in the numbers chosen by threads. The equals case in the comparison avoids the need for special-case code when a thread examines its own number field. Like all deadlock-free mutual exclusion algorithms based only on loads and stores, the bakery algorithm requires Ω(n) space. Total time to enter the critical section is also Ω(n), even in the absence of contention. As originally formulated (and as shown in Figure 4.2), number fields in the bakery algorithm grow without bound. Taubenfeld [2004] has shown how to bound them instead. For machines with more powerful atomic primitives, the conceptually similar ticket lock [Fischer et al., 1979; Reed and Kanodia, 1979] (Section 4.2.2) uses fetch and increment on shared "next ticket" and "now serving" variables to reduce space requirements to O(1) and time to O(m), where m is the number of threads concurrently competing for access.

Lamport's Fast Algorithm  One of the key truisms of parallel computing is that if a lock is highly contended most of the time, then the program in which it is embedded probably won't scale. Turned around, this observation suggests that in a well designed program, the typical spin lock will usually be free when a thread attempts to acquire it. Lamport's "fast" algorithm [1987] (Figure 4.3) exploits this observation by arranging for a lock to be acquired in constant time in the absence of contention (but in O(n) time, where n is the total number of threads in the system, whenever contention is encountered).

The core of the algorithm is a pair of lock fields, x and y. To acquire the lock, thread t must write its id into x and then y, and be sure that no other thread has written to x in-between. Thread t checks y immediately after writing x, and checks x immediately after writing y. If y is not ⊥ when checked, some other thread must be in the critical section; t waits for it to finish and retries. If x is not still t when checked, some other thread may have entered the critical section (t cannot be sure); in this case, t must wait until the competing thread(s) have either noticed the conflict or left the critical section.

To implement its "notice the conflict" mechanism, the fast lock employs an array of trying flags, one per thread. Each thread sets its flag while competing for the lock (it also leaves it set while executing a critical section for which it encountered no contention). If a thread t does detect contention, it unsets its trying flag, waits until the entire array is clear, and then checks to see if it was the last thread to set y. If so, it enters the critical section. If not, it retries the acquisition protocol.

A disadvantage of the fast lock as originally presented is that it requires Ω(n) time (where n is the total number of threads in the system) even when only two threads are competing for the lock. Building on the work of Moir and Anderson [1995], Merritt and Taubenfeld [2000] show how to reduce this time to O(m), where m is the number of threads concurrently competing for access.

4.2 CENTRALIZED ALGORITHMS

As noted in Sections 1.3 and 2.3, almost every modern machine provides read-modify-write (fetch and Φ) instructions that can be used to implement mutual exclusion in constant space and—in the absence of contention—constant time. The locks we will consider in this section—all of which use such instructions—differ in fairness and in the performance they provide in the presence of contention.

    class lock
        T x
        T y := ⊥
        bool trying[T] := { false . . . }

    lock.acquire():
        loop
            trying[self] := true
            fence(WW)
            x := self
            fence(WR)
            if y ≠ ⊥
                trying[self] := false
                fence(WR)
                await (y == ⊥)
                continue    // go back to top of loop
            y := self
            fence(WR)
            if x ≠ self
                trying[self] := false
                fence(WR)
                for i ∈ T
                    await (trying[i] == false)
                fence(RR)
                if y ≠ self
                    await (y == ⊥)
                    continue    // go back to top of loop
            break
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        y := ⊥
        trying[self] := false

Figure 4.3: Lamport’s fast algorithm.

4.2.1 TEST AND SET LOCKS

The simplest mutex spin lock embeds a test and set instruction in a loop, as shown in Figure 4.4. On a typical machine, the test and set instruction will require write permission on the target location, necessitating communication across the processor-memory interconnect on every iteration of the loop. These messages will typically serialize, inducing enormous hardware contention that interferes not only with other threads that are attempting to acquire the lock, but also with any attempt by the lock owner to release the lock.

    class lock
        bool f := false

    lock.acquire():
        while !TAS(&f);    // spin
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        f := false

Figure 4.4: The simple test and set lock.

Performance can be improved by arranging to obtain write permission on the lock only when it appears to be free. Proposed by Rudolph and Segall [1984], this test-and-test and set lock is still extremely simple (Figure 4.5), and tends to perform well on machines with a small handful of cores.

    class lock
        bool f := false

    lock.acquire():
        while !TAS(&f)
            while f;    // spin
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        f := false

Figure 4.5: The test-and-test and set lock. Unlike the test and set lock of Figure 4.4, this code will typically induce interconnect traffic only when the lock is modified by another core.

Whenever the lock is released, however, every competing thread will fall out of its inner loop and attempt another test and set, each of which induces coherence traffic. With n threads continually attempting to execute critical sections, total time per acquire-release pair will be O(n), which is still unacceptable on a machine with more than a handful of cores.

Drawing inspiration from the classic Ethernet contention protocol [Metcalfe and Boggs, 1976], Anderson et al. [1990] proposed an exponential backoff strategy for test and set locks (Figure 4.6). Experiments indicate that it works quite well in practice, leading to near-constant overhead per acquire-release pair on many machines. Unfortunately, it depends on constants (the base, multiplier, and limit for backoff) that have no single best value in all situations. Ideally, they should be chosen individually for each machine and workload. Note that test and set suffices in the presence of backoff; test-and-test and set is not required.

    class lock
        bool f := false
        const int base = ...
        const int limit = ...        // tuning parameters
        const int multiplier = ...

    lock.acquire():
        int delay := base
        while !TAS(&f)
            pause(delay)
            delay := min(delay × multiplier, limit)
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        f := false

Figure 4.6: The test and set lock with exponential backoff. The pause(k) operation is typically an empty loop that iterates k times. Ideal choices of base, limit and multiplier values depend on the machine architecture and, typically, the application workload.
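As an illustration (ours, not the book's), the following C++ sketch combines the read-only inner spin of Figure 4.5 with the exponential backoff of Figure 4.6. The base, multiplier, and limit values, and the use of sleep_for as the pause operation, are arbitrary placeholder choices:

    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <thread>

    class tatas_backoff_lock {
        std::atomic<bool> held{false};
    public:
        void acquire() {
            int delay = 1;                                            // "base"
            while (held.exchange(true, std::memory_order_acquire)) {  // test_and_set
                do {
                    std::this_thread::sleep_for(std::chrono::microseconds(delay));
                    delay = std::min(delay * 2, 1024);                // "multiplier", "limit"
                } while (held.load(std::memory_order_relaxed));       // read-only spin
            }
        }
        void release() {
            held.store(false, std::memory_order_release);
        }
    };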

4.2.2 THE TICKET LOCK

Test and set locks are potentially unfair. While most machines can be expected to "randomize" the behavior of test and set (e.g., so that some particular core doesn't always win when more than one attempts a test and set at roughly the same time), and while exponential backoff can be expected to inject additional variability into the lock's behavior, it is still entirely possible for a thread that has been waiting a very long time to be passed up by a relative newcomer; in principle, a thread can starve.

The ticket lock [Fischer et al., 1979; Reed and Kanodia, 1979] (Figure 4.7) addresses this problem. Like Lamport's bakery lock, it grants the lock to competing threads in first-come-first-served order. Unlike the bakery lock, it uses fetch and increment to get by with constant space, and with time (per lock acquisition) roughly linear in the number of competing threads. The code in Figure 4.7 employs a backoff strategy due to Mellor-Crummey and Scott [1991b]. It leverages the fact that my ticket − L→now serving represents the number of threads ahead of the calling thread in line. If those threads consume an average of k × base time per critical section, the calling thread can be expected to probe now serving about k times before acquiring the lock. Under high contention, this can be substantially smaller than the O(n) probes expected without backoff.

In a system that runs long enough, the next ticket and now serving counters can be expected to exceed the capacity of a fixed word size. Rollover is harmless, however: the maximum number of threads in any reasonable system will be less than the largest representable integer, and subtraction works correctly in the ring of integers mod 2^wordsize.

    const int base = ...    // tuning parameter

    class lock
        int next ticket := 0
        int now serving := 0

    lock.acquire():
        int my ticket := FAI(&next ticket)
            // returns old value; arithmetic overflow is harmless
        loop
            int ns := now serving
            if ns == my ticket
                break
            pause(base × (my ticket − now serving))
                // overflow in subtraction is harmless
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        now serving++

Figure 4.7: The ticket lock with proportional backoff. Tuning parameter base should be chosen to be roughly the length of a trivial critical section.
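A C++ rendering of the ticket lock might look as follows (a sketch of ours, omitting proportional backoff for brevity). Unsigned arithmetic makes the rollover discussed above harmless:

    #include <atomic>
    #include <cstdint>

    class ticket_lock {
        std::atomic<uint32_t> next_ticket{0};
        std::atomic<uint32_t> now_serving{0};
    public:
        void acquire() {
            uint32_t my_ticket = next_ticket.fetch_add(1, std::memory_order_relaxed);
            while (now_serving.load(std::memory_order_acquire) != my_ticket)
                ;   // spin (a real implementation might pause proportionally here)
        }
        void release() {
            // only the lock holder writes now_serving, so a plain release store suffices
            now_serving.store(now_serving.load(std::memory_order_relaxed) + 1,
                              std::memory_order_release);
        }
    };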

4.3 QUEUED SPIN LOCKS

Even with proportional backoff, a thread can perform an arbitrary number of remote accesses in the process of acquiring a ticket lock. Anderson et al. [1990] and (independently) Graunke and Thakkar [1990] showed how to reduce this to a small constant on a globally cache-coherent machine. Both locks employ an array of n flag words, where n is the maximum number of threads in the system. Both arrange for every thread to spin on a different element of the array, and to know the index of the element on which its successor is spinning. In Anderson et al.'s lock, elements are allocated dynamically using fetch and increment; a thread releases the lock by updating the next element (in circular order) after the one on which it originally spun. In Graunke and Thakkar's lock, elements are statically allocated. A thread releases the lock by writing its own element; it finds the element on which to spin by performing a swap on an extra tail element.

Inspired by the QOLB hardware primitive of the Wisconsin Multicube [Goodman et al., 1989] and the IEEE SCI standard [Aboulenein et al., 1994] (Section 2.3.2), Mellor-Crummey and Scott [1991b] devised a queue-based spin lock that employs a linked list instead of an array. Craig and, independently, Magnussen, Landin, and Hagersten devised an alternative version that, in essence, links the queue in the opposite direction. Unlike the locks of Anderson et al. and Graunke and Thakkar, these list-based locks require total space O(n + j) for n threads and j locks, rather than O(nj). They are generally considered the methods of choice for FIFO locks on large-scale systems. (Note, however, that strict FIFO ordering may be inadvisable on a system with preemption; see Chapter 7.)

    type qnode = record
        qnode* next
        bool waiting

    class lock
        qnode* tail := null

    lock.acquire(qnode* I):
        I→next := null
        I→waiting := true    // Initialization of waiting can be delayed until the if
        fence(WW)            // statement below, but at the cost of an extra WW fence.
        qnode* prev := swap(&tail, I)
        if prev != null    // queue was nonempty
            prev→next := I
            while I→waiting;    // spin
        fence(RR, RW)

    lock.release(qnode* I):
        fence(RW, WW)
        if I→next == null    // no known successor
            if CAS(&tail, I, null)
                return
            while I→next == null;    // spin
        I→next→waiting := false

Figure 4.8: The MCS queued lock.

4.3.1 THE MCS LOCK

Pseudocode for the MCS lock appears in Figure 4.8. Every thread using the lock allocates a qnode record containing a queue link and a Boolean flag. Typically, this record lies in the stack frame of the code that calls acquire and release; it must be passed as an argument to both (but see the discussion under "Modifications for a Standard Interface" below). Threads holding or waiting for the lock are chained together, with the link in the qnode of thread t pointing to the qnode of the thread to which t should pass the lock when done with its critical section. The lock itself is simply a pointer to the qnode of the thread at the tail of the queue, or null if the lock is free.

Operation of the lock is illustrated in Figure 4.9. The acquire method allocates a new qnode, initializes its next pointer to null, and swaps it into the tail of the queue. If the value returned by the swap is null, then the calling thread has acquired the lock (line 2). If the value returned by the swap is non-null, it refers to the qnode of the caller's predecessor in the queue (indicated by the dashed arrow in line 3). Here thread B must set A's next pointer to refer to its own qnode. Meanwhile, some other thread C may join the queue (line 4). When thread A has completed its critical section, the release method reads the next pointer of A's qnode to find the qnode of its successor B. It changes B's waiting flag to false, thereby granting it the lock (line 5). If release finds that the next pointer of its qnode is null, it attempts to CAS the lock tail pointer back to null. If some other thread has already swapped itself into the queue (line 5), the CAS will fail, and release will wait for the next pointer to become non-null (line 6). If there are no waiting threads (line 7), the CAS will succeed, returning the lock to the appearance in line 1.

Figure 4.9: Operation of the MCS lock. An 'R' indicates that the thread owning the given qnode is running its critical section (parentheses indicate that the value of the waiting flag is immaterial). A 'W' indicates that the corresponding thread is waiting. A dashed arrow represents a local pointer (returned to the thread by swap).

The MCS lock has several important properties. Threads join the queue in a wait-free manner (using swap), after which they receive the lock in FIFO order. Each waiting thread spins on a separate location, eliminating contention for cache and interconnect resources. In fact, because each thread allocates its own qnode, it can arrange for it to be local even on an NRC-NUMA machine. Total (remote access) time to pass the lock from one thread to the next is constant. Total space is linear in the number of threads and locks.

As written (Figure 4.8), the MCS lock requires both swap and compare and swap. CAS can of course be used to emulate the swap in the acquire method, but entry to the queue drops from wait-free to lock-free. Mellor-Crummey and Scott [1991b] also show how to make do with only swap in the release method, at the potential cost of FIFO ordering when a thread enters the queue just as its predecessor is releasing the lock.
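For concreteness, here is one way (ours, not the book's) to express the MCS lock with C++ atomics. As in Figure 4.8, the caller supplies a qnode, which must remain valid until release returns; the memory_order annotations play the role of the pseudocode's fences:

    #include <atomic>

    struct mcs_node {
        std::atomic<mcs_node*> next{nullptr};
        std::atomic<bool> waiting{false};
    };

    class mcs_lock {
        std::atomic<mcs_node*> tail{nullptr};
    public:
        void acquire(mcs_node* I) {
            I->next.store(nullptr, std::memory_order_relaxed);
            I->waiting.store(true, std::memory_order_relaxed);
            mcs_node* prev = tail.exchange(I, std::memory_order_acq_rel);   // swap
            if (prev != nullptr) {                      // queue was nonempty
                prev->next.store(I, std::memory_order_release);
                while (I->waiting.load(std::memory_order_acquire))
                    ;                                   // spin on our own node
            }
        }
        void release(mcs_node* I) {
            mcs_node* succ = I->next.load(std::memory_order_acquire);
            if (succ == nullptr) {                      // no known successor
                mcs_node* expected = I;
                if (tail.compare_exchange_strong(expected, nullptr,
                                                 std::memory_order_acq_rel))
                    return;                             // nobody waiting; lock now free
                while ((succ = I->next.load(std::memory_order_acquire)) == nullptr)
                    ;                                   // wait for successor to link in
            }
            succ->waiting.store(false, std::memory_order_release);
        }
    };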

Modifications for a Standard Interface.  One disadvantage of the MCS lock is the need to pass a qnode pointer to acquire and release. Test and set and ticket locks pass only a reference to the lock itself. If a programmer wishes to convert code from traditional to queued locks, or to design code in which the lock implementation can be changed at system configuration time, it is natural to wish for a version of the MCS lock that omits the extra parameters. Auslander et al. [2003] devised such a version as part of the K42 project at IBM Research [Appavoo et al., 2005]. Their code exploits the fact that once a thread has acquired a lock, its qnode serves only to hold a reference to the next thread in line. Since the thread now "owns" the lock, it can move its next pointer to an extra field of the lock, at which point the qnode can be discarded.

Code for the K42 variant of the MCS lock appears in Figure 4.10. Operation of the lock is illustrated in Figure 4.11. An idle, unheld lock is represented by a qnode containing two null pointers (line 1 of Figure 4.11). The first of these is the usual tail pointer from the MCS lock; the other is a "next" pointer that will refer to the qnode of the first waiting thread, if and when there is one. Newly arriving thread A (line 2) uses compare and swap to replace a null tail pointer with a pointer to the lock variable itself, indicating that the lock is held, but that no other threads are waiting. At this point, newly arriving thread B (line 3) will see the lock variable as its "predecessor," and will update the next field of the lock rather than that of A's qnode (as it would have in a regular MCS lock). When thread C arrives (line 4), it updates B's next pointer, because it obtained a pointer to B's qnode when it performed a CAS on the tail field of the lock. When A completes its critical section (line 5), it finds B's qnode by reading the head field of the lock. It then changes B's "tail" pointer (which serves as a waiting flag) to null, thereby releasing B. Upon leaving its spin, B updates the head field of the lock to refer to C's qnode. Assuming no other threads arrive, when C completes its critical section it will return the lock to the state shown in line 1.

    type qnode = record
        qnode* tail
        qnode* next

    const qnode* waiting = 1
        // In a real qnode, tail == null means the lock is free;
        // in the qnode that is a lock, tail is the real tail pointer.

    class lock
        qnode q := { null, null }

    lock.acquire():
        loop
            qnode* prev := q.tail
            if prev == null                   // lock appears to be free
                if CAS(&q.tail, null, &q)
                    break
            else
                qnode I := { waiting, null }
                fence(WW)
                if CAS(&q.tail, prev, &I)     // we're in line
                    prev→next := &I
                    while I.tail == waiting;  // spin
                    // now we have the lock
                    qnode* succ := I.next
                    if succ == null
                        q.next := null
                        fence(WW)
                        // try to make lock point at itself:
                        if !CAS(&q.tail, &I, &q)
                            // somebody got into the timing window
                            repeat
                                succ := I.next
                            until succ != null
                            q.next := succ
                        break
                    else
                        q.next := succ
                        break
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        qnode* succ := q.next
        if succ == null
            if CAS(&q.tail, &q, null)
                return
            repeat
                succ := q.next
            until succ != null
        succ→tail := null

Figure 4.10: K42 variant of the MCS queued lock. Note the standard interface to acquire and release, with no parameters other than the lock itself.

Figure 4.11: Operation of the K42 MCS lock. An 'R' indicates a null "tail" pointer; 'W' indicates non-null. Dashed boxes indicate qnodes that are no longer needed, and may safely be freed by returning from the method in which they were declared.

The careful reader may notice that the code of Figure 4.10 has a lock-free (not wait-free) entry protocol, and thus admits the (remote, theoretical) possibility of starvation. This can be remedied by replacing the original CAS with a swap, but a thread that finds that the lock was previously free must immediately follow up with a CAS, leading to significantly poorer performance in the (presumably common) uncontended case. A potentially attractive hybrid strategy starts with a load of the tail pointer, following up with a CAS if the lock appears to be free and a swap otherwise.

4.3.2 THE CLH LOCK

Because every thread spins on a field of its own qnode, the MCS lock achieves a constant bound on the number of remote accesses per lock acquisition, even on an NRC-NUMA machine. The cost of this feature is the need for a newly arriving thread to write the address of its qnode into the qnode of its predecessor, and for the predecessor to wait for that write to complete before it can release a lock whose tail pointer no longer refers to its own qnode. Craig [1993] and, independently, Magnussen, Landin, and Hagersten [1994] observed that this extra "handshake" can be avoided by arranging for each thread to spin on its predecessor's qnode, rather than its own. On a globally cache-coherent machine, the spin will still be local, because the predecessor's node will migrate to the successor's cache. The downside of the change is that a thread's qnode must potentially remain accessible long after the thread has left its critical section: we cannot bound the time that may elapse before a successor needs to inspect that node. This requirement is accommodated by having a thread provide a fresh qnode to acquire, and return with a different qnode from release.

In their original paper, Magnussen, Landin, and Hagersten presented two versions of their lock: a simpler "LH" lock and an enhanced "M" lock; the latter reduces the number of cache misses in the uncontended case by allowing a thread to keep its original qnode when no other thread is trying to acquire the lock. The M lock needs compare and swap to resolve the race between a thread that is trying to release a heretofore uncontended lock and the arrival of a new contender. The LH lock has no such races; all it needs is swap. Craig's lock is essentially identical to the LH lock: it differs only in the mechanism used to pass qnodes to and from the acquire and release methods. It has become conventional to refer to this joint invention by the initials of all three inventors: CLH.

    type qnode = record
        qnode* prev
        bool succ must wait

    class lock
        qnode dummy := { null, false }
            // ideally, dummy and tail should lie in separate cache lines
        qnode* tail := &dummy

    lock.acquire(qnode* I):
        I→succ must wait := true
        fence(WW)
        qnode* pred := I→prev := swap(&tail, I)
        while pred→succ must wait;    // spin
        fence(RR, RW)

    lock.release(qnode** I):
        fence(RW, WW)
        qnode* pred := (*I)→prev
        (*I)→succ must wait := false
        *I := pred    // take pred's qnode

Figure 4.12: The CLH queued lock.

Code for the CLH lock appears in Figure 4.12. An illustration of its operation appears in Figure 4.13. A free lock (line 1 of the latter figure) contains a pointer to a qnode whose succ must wait flag is false. Newly arriving thread A (line 2) obtains a pointer to this node (dashed arrow) by executing a swap on the lock tail pointer. It then spins on this node (or simply observes that its succ must wait flag is already false). Before returning from acquire, it stores the pointer into its own qnode so it can find it again in release. (In the LH version of the lock [Magnussen, Landin, and Hagersten, 1994], there was no pointer in the qnode; rather, the API for acquire returned a pointer to the predecessor qnode as an explicit parameter.)


Figure 4.13: Operation of the CLH lock. An ‘R’ indicates that a thread spinning on this qnode (i.e., the successor of the thread that provided it) is free to run its critical section; a ‘W’ indicates that it must wait. Dashed boxes indicate qnodes that are no longer needed by successors, and may be reused by the thread releasing the lock. Note the change in label on such nodes, indicating that they now “belong” to a different thread.

To release the lock (line 4), thread A writes false to the succ must wait field of its own qnode and then leaves that qnode behind, returning with its predecessor's qnode instead (here previously marked 'X'). Thread B, which arrived at line 3, releases the lock in the same way. If no other thread is waiting at this point, the lock returns to the state in line 1.

In his original paper, Craig [1993] explored several extensions to the CLH lock. By introducing an extra level of indirection, one can eliminate remote spinning even on an NRC-NUMA machine—without requiring compare and swap, and without abandoning either strict FIFO ordering or wait-free entry. By linking the list both forward and backward, and traversing it at acquire time, one can arrange to grant the lock in order of some external notion of priority, rather than first-come-first-served (Markatos [1991] presented a similar technique for MCS locks). By marking nodes as abandoned, and skipping over them at release time, one can accommodate time-out (we will consider this topic further in Chapter 7, together with the possibility—suggested by Craig as future work—of skipping over threads that are currently preempted). Finally, Craig sketched a technique to accommodate nested critical sections without requiring a thread to allocate multiple qnodes: arrange for the thread to acquire its predecessor's qnode when the lock is acquired rather than when it is released, and maintain a separate thread-local stack of pointers to the qnodes that must be modified in order to release the locks.

Modifications for a Standard Interface.  Craig's suggestion for nested critical sections requires that locks be released in the reverse of the order in which they were acquired; it does not generalize easily to idioms like hand-over-hand locking (Section 3.1.2). If we adopt the idea of a head pointer field from the K42 MCS lock, however, we can devise a (previously unpublished) CLH variant that serves as a "plug-in" replacement for traditional locks (Figure 4.14). Our code assumes a global array, thread qnode ptrs, indexed by thread id. In practice this could be replaced by any form of "thread-local" storage—e.g., the Posix pthread getspecific mechanism. Operation is very similar to that of the original CLH lock: the only real difference is that instead of requiring the caller to pass a qnode to release, we leave that pointer in a head field of the lock. No dynamic allocation of qnodes is required: the total number of extant nodes is always n + j for n threads and j locks—one per lock at the head of each queue (with succ must wait true or false, as appropriate), one enqueued (not at the head) by each thread currently waiting for a lock, and one (in the appropriate slot of thread qnode ptrs) reserved for future use by each thread not currently waiting for a lock. (Elements of thread qnode ptrs corresponding to threads currently waiting for a lock are overwritten [at the end of acquire] before being read again.)

    qnode initial thread qnodes[T]
    qnode* thread qnode ptrs[T] := { i ∈ T : &initial thread qnodes[i] }

    type qnode = record
        bool succ must wait

    class lock
        qnode dummy := { false }
            // ideally, dummy should lie in a separate cache line from tail and head
        qnode* tail := &dummy
        qnode* head

    lock.acquire():
        qnode* I := thread qnode ptrs[self]
        I→succ must wait := true
        fence(WW)
        qnode* pred := swap(&tail, I)
        while pred→succ must wait;    // spin
        head := I
        thread qnode ptrs[self] := pred
        fence(RR, RW)

    lock.release():
        fence(RW, WW)
        head→succ must wait := false

Figure 4.14: A CLH variant with standard interface.
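The following C++ sketch (ours, not the book's) follows the same plan as Figure 4.14, using a thread_local pointer in place of the thread qnode ptrs array. The initial per-thread node is simply leaked at thread exit in this sketch:

    #include <atomic>

    struct clh_node {
        std::atomic<bool> succ_must_wait{false};
    };

    class clh_lock {
        clh_node dummy;                         // initially succ_must_wait == false
        std::atomic<clh_node*> tail{&dummy};
        clh_node* head = nullptr;               // written and read only by the lock holder
    public:
        void acquire() {
            clh_node* I = my_qnode;
            I->succ_must_wait.store(true, std::memory_order_relaxed);
            clh_node* pred = tail.exchange(I, std::memory_order_acq_rel);   // swap
            while (pred->succ_must_wait.load(std::memory_order_acquire))
                ;                               // spin on predecessor's node
            head = I;
            my_qnode = pred;                    // take predecessor's (now idle) node
        }
        void release() {
            head->succ_must_wait.store(false, std::memory_order_release);
        }
    private:
        // one slot per thread, shared by all clh_lock instances, as in Figure 4.14
        static thread_local clh_node* my_qnode;
    };

    thread_local clh_node* clh_lock::my_qnode = new clh_node;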

4.3.3 WHICH SPIN LOCK SHOULD I USE?

On modern machines, there is little reason to consider load-store-only spin locks, except perhaps as an optimization in programs with highly asymmetric access patterns (see Section 4.4.4 below). For small numbers of threads—certainly no more than single digits—both the test and set lock with exponential backoff and the ticket lock with proportional backoff tend to work extremely well. The ticket lock is fairer, but this can actually be a disadvantage in some situations (see the discussion of locality-conscious locking in Section 4.4.2 and of scheduling anomalies in Section 7.6).

The problem with both test and set and ticket locks is their brittle performance as the number of contending threads increases. In any application in which lock contention may be a bottleneck—even rarely—it makes sense to use a queue-based lock. Here the choice between MCS and CLH locks depends on architectural features and costs. The MCS lock is generally preferred on an NRC-NUMA machine: the CLH lock can be modified to avoid remote spinning, but the extra level of indirection requires additional fetch and Φ operations on each lock transfer. For machines with global cache coherence, either lock can be expected to work well. Given the absence of dummy nodes, space needs are slightly lower for MCS locks, but CLH locks may perform better (by a small constant factor) on some machines.

4.4 SPECIAL-CASE OPTIMIZATIONS

Many techniques have been proposed to improve the performance of spin locks in important special cases. We will consider the most important of these—read-mostly synchronization—in Chapter 6. In this section we consider three others. Locality-conscious locking biases the acquisition of a lock toward threads that are physically closer to the most recent prior holder, thereby reducing average hand-off cost on NUMA machines. Double-checked locking addresses situations in which initialization of a variable must be synchronized, but subsequent use need not. Asymmetric locking addresses situations in which a lock is accessed repeatedly by the same thread, and performance may improve if that thread is able to reacquire the lock more easily than others can acquire it.

4.4.1 NESTED LOCKS

Throughout this chapter we have been assuming that the access pattern for a given lock is always a sequence of acquire-release pairs, in which the release method is called by the same thread that called acquire—and before the thread attempts to acquire the same lock again. There are times, however, when it may be desirable to allow a thread to acquire the same lock multiple times, so long as it releases it the same number of times before any other thread acquires it. Suppose, for example, that operation foo accesses data protected by lock L, and that foo is sometimes called by a thread that already holds L, and sometimes by a thread that does not. In the latter case, foo needs to acquire L. With most of the locks presented above, however, the program will deadlock in the former case, where L is already held. A simple solution, usable with any mutual exclusion lock (spin or scheduler-based), is to augment the lock with an owner field and a counter:

    class nestable lock
        lock L
        int owner := none
        int count := 0

    nestable lock.acquire():
        if owner != self
            fence(RR)
            L.acquire()
            fence(RW)
            owner := self
        count++

    nestable lock.release():
        if −−count == 0
            owner := none
            fence(WW)
            L.release()

The astute reader may notice that the read of owner in nestable lock.acquire races with the writes of owner in both acquire and release. In memory models that forbid data races, the owner field may need to be declared volatile or atomic.
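In C++ one would normally reach for std::recursive_mutex, but the idea above can also be expressed directly as a wrapper over an arbitrary lock (a sketch of ours; nestable_lock and its fields are illustrative names):

    #include <atomic>
    #include <mutex>
    #include <thread>

    template<typename Lock = std::mutex>
    class nestable_lock {
        Lock L;
        std::atomic<std::thread::id> owner{};   // default-constructed id: "no thread"
        int count = 0;                          // protected by L
    public:
        void acquire() {
            std::thread::id self = std::this_thread::get_id();
            if (owner.load(std::memory_order_relaxed) != self) {
                L.lock();
                owner.store(self, std::memory_order_relaxed);
            }
            ++count;
        }
        void release() {
            if (--count == 0) {
                owner.store(std::thread::id(), std::memory_order_relaxed);
                L.unlock();
            }
        }
    };

The owner field is atomic because, as noted above, it is read by threads that do not hold the lock; the relaxed accesses suffice because a thread can read its own id from owner only if it wrote it there itself.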

4.4.2 LOCALITY-CONSCIOUS LOCKING

On a NUMA machine—or even one with a non-uniform cache architecture (sometimes known as NUCA)—inter-core communication costs may differ dramatically. If, for example, we have multiple processors, each with multiple cores, we may be able to pass a lock to another core within the same processor much faster than we can pass it to a core of another processor. More significantly, since locks are typically used to protect shared data structures, we can expect the cache lines of the protected structure to migrate to the acquiring core, and this migration will be cheaper if the core is nearby rather than remote.

Radović and Hagersten [2002] were the first to observe the importance of locality in locking, and to suggest passing locks to nearby cores when possible. Their "RH lock," developed for a machine with two NUMA "clusters," is essentially a pair of local test and set locks, one of which is initialized to FREE, the other to REMOTE. A thread attempts to acquire the lock by swapping its id into the local copy. If it gets back FREE or L FREE (locally free), it has succeeded. If it gets back a thread id, it backs off and tries again. If it gets back REMOTE, it has become the local representative of its cluster, in which case it spins (with a different set of backoff parameters) on the other copy of the lock, attempting to CAS it from FREE to REMOTE. To release the lock, a thread usually attempts to CAS it from its own id to FREE. If this fails, a nearby thread must be spinning, in which case the releasing thread stores L FREE to the lock. Occasionally (subject to a tuning parameter), a releasing thread immediately writes FREE to the lock, allowing it to be grabbed by a remote contender, even if there are nearby ones as well.

While the RH lock could easily be adapted to larger numbers of clusters, space consumption would be linear in the number of such clusters—a property Radović and Hagersten considered undesirable. Their subsequent "hierarchical backoff" (HBO) [Radović and Hagersten, 2003] lock relies on statistics instead. In effect, they implement a test and set lock with compare and swap, in such a way that the lock variable indicates the cluster in which the lock currently resides. Nearby and remote threads then use different backoff parameters, so that nearby threads are more likely than remote threads to acquire the lock when it is released.

While a test and set lock is naturally unfair (and subject to the theoretical possibility of starvation), the RH and HBO locks are likely to be even less fair in practice. Ideally, one would like to be able to explicitly balance fairness against locality. Toward that end, Dice et al. [2012] present a general NUMA-aware design pattern that can be used with (almost) any underlying locks, including (FIFO) queued locks. Their cohort locking mechanism employs a global lock that indicates which cluster currently owns the lock, and a local lock for each cluster that indicates the owning thread. The global lock needs to allow release to be called by a different thread from the one that called acquire; the local lock needs to be able to tell, at release time, whether any other local thread is waiting. Apart from these requirements, cohort locking can be used with any known form of lock. Experimental results indicate particularly high throughput (and excellent fairness, subject to locality) using MCS locks at both the global and cluster level.
While the techniques discussed here improve locality only by controlling the order in which threads acquire a lock, it is also possible to control which threads perform the operations protected by the lock, and to assign operations that access similar data to the same thread, to minimize cache misses. Such locality-conscious allocation of work can yield major performance benefits in systems that assign fine-grain computational tasks to worker threads, as mentioned briefly in Section 5.3.3. It is also a key feature of the flat combining we will mention (again briefly) in Section 8.2.

4.4.3 DOUBLE-CHECKED LOCKING

Many applications employ shared variables that must be initialized before they are used for the first time. A canonical example looks like this:

    foo* p := null
    lock L

    foo* get p():
        L.acquire()
        if p == null
            p := new foo()
        foo* rtn := p
        L.release()
        return rtn

Unfortunately, this idiom imposes the overhead of acquiring L on every call to get p. With care, we can use the double-checked locking idiom of Figure 4.15 instead. The fences in the figure are critical. In Java, variable p must be declared volatile; in C++, atomic. Without the fences, the initializing thread may set p to point to not-yet-initialized space, or a reading thread may use fields of p that were prefetched before initialization completed. On machines with highly relaxed memory models (e.g., ARM and POWER), the cost of the fences may be comparable to the cost of locking in the original version of the code, making the "optimization" of limited benefit. On machines with a TSO memory model (e.g., the x86 and SPARC), the optimization is much more appealing, since RR, RW, and WW fences are free. Used judiciously (e.g., in the Linux kernel for x86), double-checked locking can yield significant performance benefits. Even in the hands of experts, however, it has proven to be a significant source of bugs. As a general rule, it is best avoided in application-level code [Bacon et al., 2001].
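In C++, the idiom can be written with std::atomic, which supplies the fences of Figure 4.15 (shown below) implicitly; this sketch is ours, and in practice a function-local static or std::call_once is usually a simpler way to obtain the same effect:

    #include <atomic>
    #include <mutex>

    struct foo { int x = 0; /* ... */ };

    std::atomic<foo*> p{nullptr};
    std::mutex L;

    foo* get_p() {
        foo* rtn = p.load(std::memory_order_acquire);    // first (unsynchronized) check
        if (rtn == nullptr) {
            std::lock_guard<std::mutex> guard(L);
            rtn = p.load(std::memory_order_relaxed);     // second check, under the lock
            if (rtn == nullptr) {
                rtn = new foo();
                p.store(rtn, std::memory_order_release); // publish fully constructed object
            }
        }
        return rtn;
    }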

    foo* p := null
    lock L

    foo* get p():
        foo* rtn := p
        fence(RR, RW)
        if rtn == null
            L.acquire()
            rtn := p
            if rtn == null    // double check
                rtn := new foo()
                fence(WW)
                p := rtn
            L.release()
        return rtn

Figure 4.15: Double-checked locking.

4.4.4 ASYMMETRIC LOCKING

Many applications contain data structures that are usually—or even always—accessed by a single thread, but are nonetheless protected by locks, either because they are occasionally accessed by another thread, or because the programmer is preserving the ability to reuse code in a future parallel context. Several groups have developed locks that can be biased toward a particular thread, whose acquire and release operations then proceed much faster than those of other threads. The HotSpot Java Virtual Machine, for example, uses biased locks to accommodate objects that appear to "belong" to a single thread, and to control re-entry to the JVM by threads that have escaped to native code, and may need to synchronize with a garbage collection cycle that began while they were absent [Dice et al., 2001; Russell and Detlefs, 2006].

On a sequentially consistent machine, one might be tempted to avoid (presumably expensive) fetch and Φ operations by using a two-thread load-store-only synchronization algorithm (e.g., Dekker's or Peterson's algorithm) to arbitrate between the preferred (bias-holding) thread and some representative of the other threads. Code might look like this:

    class lock
        Peterson lock PL
        general lock GL

    lock.acquire():
        if !(preferred thread)
            GL.acquire()
        PL.acquire()

    lock.release():
        PL.release()
        if !(preferred thread)
            GL.release()

The problem, of course, is that load-store-only acquire routines invariably contain some variant of the Dekker store–load sequence—

    interested[self] := true                            // store
    bool potential conflict := interested[other]        // load
    if potential conflict
        . . .

—and this code works correctly on a non-sequentially consistent machine only when augmented with a (presumably also expensive) WR fence between the first and second lines. The cost of the fence has led several researchers [Dice et al., 2001; Russell and Detlefs, 2006; Vasudevan et al., 2010] to propose asymmetric Dekker-style synchronization. Applied to Peterson's lock, the solution looks as shown in Figure 4.16. The key is the handshake operation on the "slow" (non-preferred) path of the lock. This operation must interact with execution on the preferred thread's core in such a way that

1. if the preferred thread set fast interested before the interaction, then the non-preferred thread is guaranteed to see it afterward.

2. if the preferred thread did not set fast interested before the interaction, then it (the preferred thread) is guaranteed to see slow interested afterward.

    class lock
        bool fast turn := true
        bool fast interested := false
        bool slow interested := false
        general lock GL

    lock.acquire():
        if preferred thread
            fast interested := true
            fence(WW)    // free on TSO machine
            fast turn := true
            // WR fence intentionally omitted
            await (!slow interested or !fast turn)
        else
            GL.acquire()
            slow interested := true
            fence(WW)
            fast turn := false
            fence(WR)    // slow
            handshake()  // very slow
            await (!fast interested or fast turn)
        fence(RR, RW)    // free on TSO machine

    lock.release():
        fence(RW, WW)    // free on TSO machine
        if preferred thread
            fast interested := false
        else
            GL.release()
            slow interested := false

Figure 4.16: An asymmetric lock built around Peterson's algorithm. The handshake operation on the slow path forces a known ordering with respect to the store–load sequence on the fast path.

Handshaking can be implemented in any of several ways, including cross-core interrupts, migration to or from the preferred thread's core, forced un-mapping of pages accessed in the critical section, waiting for an interval guaranteed to contain a WR fence (e.g., a scheduling quantum), or explicit communication with a "helper thread" running on the preferred thread's core. Dice et al. [2001] explore many of these options in detail. Because of their cost, they are profitable only in cases where access by non-preferred threads is exceedingly rare. In subsequent work, Dice et al. [2003] observe that handshaking can be avoided if the underlying hardware provides coherence at a word granularity, but supports atomic writes at subword granularity.

CHAPTER 5
Spin-based Conditions and Barriers

In Chapter 1 we suggested that almost all synchronization serves to achieve either atomicity or condition synchronization. Chapter 4 considered spin-based atomicity. The current chapter considers spin-based condition synchronization—flags and barriers in particular. Chapter 7 will consider scheduler-based alternatives.

5.1 FLAGS

In its simplest form, a flag is a Boolean variable, initially false, on which a thread can wait:

    class flag
        bool f := false

    flag.set():
        fence(RW, WW)
        f := true

    flag.await():
        while !f;    // spin
        fence(RR, RW)

Methods set and await are presumably called by different threads. Code for set begins with a release fence; await ends with an acquire fence. These reflect the fact that one typically uses set to indicate that previous operations of the calling thread (e.g., initialization of a shared data structure) have completed; one typically uses await to ensure that subsequent operations of the calling thread do not begin until the condition holds. In some algorithms, it may be helpful to have a reset method:

    flag.reset():
        f := false
        fence(WW)

Before calling reset, a thread must ascertain (generally through application-specific means) that no thread is still using the flag for its previous purpose. The WW fence ensures that any subsequent updates (to be announced by a future set) are seen to happen after the reset.

In an obvious generalization of flags, one can arrange to wait on an arbitrary predicate:

    class predicate
        abstract bool eval()    // to be extended by users

    predicate.await():
        while !eval();    // spin
        fence(RR, RW)

With compiler or preprocessor support, this can become

    await( condition ):
        while !( condition );    // spin
        fence(RR, RW)

This latter form is of course the notation we have used several times in previous chapters. It must be used with care: the absence of an explicit set method means there is no obvious place to hide the release fence that typically precedes the setting of a Boolean flag. In any program that spins on nontrivial conditions, a thread that changes a variable that may contribute to such a condition may need to precede the change with an explicit fence or, better yet, release of some shared lock. We will return to generalized await statements when we consider conditional critical regions in Section 7.4.1.
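A C++ analogue of the simple flag (a sketch of ours, not the book's code) uses an atomic<bool> whose release store and acquire load play the roles of the fences above:

    #include <atomic>

    class spin_flag {
        std::atomic<bool> f{false};
    public:
        void set()   { f.store(true, std::memory_order_release); }
        void await() {
            while (!f.load(std::memory_order_acquire))
                ;   // spin
        }
        void reset() {
            // callers must ensure no thread is still using the flag (see text);
            // ordering with respect to the next set comes from set's release store
            f.store(false, std::memory_order_relaxed);
        }
    };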

5.2 BARRIER ALGORITHMS

Many applications—simulations in particular—proceed through a series of phases, each of which is internally parallel, but must complete in its entirety before the next phase can begin. A typical example might look something like this:

    barrier b
    in parallel for i ∈ T
        repeat
            // do i's portion of the work of a phase
            b.cycle()
        until terminating condition

The cycle method of barrier b (sometimes called wait, next, or even barrier) forces each thread i to wait until all threads have reached that same point in their execution. Calling cycle accomplishes two things: it announces to other threads that all work prior to the barrier in the current thread has been completed, and it ensures that all work prior to the barrier in other threads has been completed before continuing execution in the current thread. To avoid data races, cycle typically begins with a release (RW, WW) fence and ends with an acquire (RR, RW) fence.

The simplest barriers, commonly referred to as centralized, employ a small, fixed-size data structure, and consume Ω(n) time between the arrival of the first thread and the departure of the last. More complex barriers distribute the data structure among the threads, consuming O(n) or O(n log n) space, but requiring only Θ(log n) time. For any maximum number of threads n, of course, log n is a constant, and with hardware support it can be a very small constant. Some multiprocessors (e.g., the Cray X/XE/Cascade, SGI UV, and IBM Blue Gene series) exploit this observation to provide special constant-time barrier operations (the Blue Gene machines, though, do not have a global address space). With a large number of processors, constant-time hardware barriers can provide a substantial benefit over log-time software barriers.

In effect, barrier hardware performs a global AND operation, setting a flag or asserting a signal once all cores have indicated their arrival. It may also be useful, especially on NRC-NUMA machines, to provide a global OR operation (sometimes known as Eureka) that can be used to determine when any one of a group of threads has indicated its arrival. Eureka mechanisms are commonly used for parallel search: as soon as one thread has found a desired element (e.g., in its portion of some large data set), the others can stop looking.

The first subsection below presents a particularly elegant formulation of the centralized barrier. The following four subsections present different log-time barriers; a final subsection summarizes their relative advantages.
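The phased-computation idiom above maps directly onto the C++20 std::barrier, as in the following sketch (ours, not the book's; NUM_THREADS, NUM_PHASES, and do_phase_work are placeholders for application-specific values and code):

    #include <barrier>
    #include <thread>
    #include <vector>

    constexpr int NUM_THREADS = 4;
    constexpr int NUM_PHASES  = 10;

    void do_phase_work(int thread_id, int phase) { /* thread_id's share of the phase */ }

    int main() {
        std::barrier sync_point(NUM_THREADS);   // cycle() corresponds to arrive_and_wait()
        std::vector<std::jthread> workers;
        for (int i = 0; i < NUM_THREADS; ++i)
            workers.emplace_back([&, i] {
                for (int phase = 0; phase < NUM_PHASES; ++phase) {
                    do_phase_work(i, phase);
                    sync_point.arrive_and_wait();   // wait for all threads to finish the phase
                }
            });
    }   // jthread destructors join the workers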

5.2.1 THE SENSE-REVERSING CENTRALIZED BARRIER

It is tempting to expect a centralized barrier to be easy to write: just initialize a counter to zero, have each thread perform a fetch and increment when it arrives, and then spin until the total reaches the number of threads. The tricky part, however, is what to do the second time around. Barriers are meant to be used repeatedly, and without care it is easy to write code in which threads that reach the next barrier "episode" interfere with threads that have not yet gotten around to leaving the previous episode. Several algorithms that suffer from this bug have actually been published. Perhaps the cleanest solution is to separate the counter from the spin flag, and to "reverse the sense" of that flag in every barrier episode. Code that embodies this technique appears in Figure 5.1. It is adapted from Hensgen et al. [1988]; Almasi and Gottlieb [1989, p. 445] credit similar code to Isaac Dimitrovsky.

The bottleneck of the centralized barrier is the arrival phase: fetch and increment operations will serialize, and each can be expected to entail a remote memory access or coherence miss. Departure will also entail O(n) time, but on a globally cache-coherent machine every spinning thread will have its own cached copy of the sense flag, and post-invalidation refills will generally be able to pipeline, for much lower per-access latency.

    class barrier
        int count := 0
        const int n := |T|
        bool sense := true
        bool local sense[T] := { true . . . }

    barrier.cycle():
        fence(RW, WW)
        bool s := !local sense[self]
        local sense[self] := s    // each thread toggles its own sense
        if FAI(&count) == n−1
            count := 0
            sense := s            // last thread toggles global sense
        else
            while sense != s;     // spin
        fence(RR, RW)

Figure 5.1: The sense-reversing centralized barrier.

5.2.2 SOFTWARE COMBINING

It has long been known that a linear sequence of associative (and, ideally, commutative) operations (a "reduction") can be performed tree-style in logarithmic time [Ladner and Fischer, 1980]. For certain read-modify-write operations (notably fetch and add), Kruskal et al. [1988] developed reduction-like hardware support as part of the NYU Ultracomputer project [Gottlieb et al., 1983]. On a machine with a log-depth interconnection network (in which a message from processor i to memory module j goes through O(log p) internal switching nodes on a p-processor machine), near-simultaneous requests to the same location combine at the switching nodes. For example, if operations FAA(l, a) and FAA(l, b) landed in the same internal queue at the same point in time, they would be forwarded on as a single FAA(l, a+b) operation. When the result (say s) returned (over the same path), it would be split into two responses—s and either (s−a) or (s−b)—and returned to the original requesters.

While hardware combining tends not to appear on modern machines, Yew et al. [1987] observed that similar benefits could be achieved with an explicit tree in software. A shared variable that is expected to be the target of multiple concurrent accesses is represented as a tree of variables, with each node in the tree assigned to a different cache line. Threads are divided into groups, with one group assigned to each leaf of the tree. Each thread updates the state in its leaf. If it discovers that it is the last thread in its group to do so, it continues up the tree and updates its parent to reflect the collective updates to the child. Proceeding in this fashion, late-coming threads eventually propagate updates to the root of the tree.

Using a software combining tree, Tang and Yew [1990] showed how to create a log-time barrier. Writes into one tree are used to determine that all threads have reached the barrier; reads out of a second are used to allow them to continue. Figure 5.2 shows a variant of this combining tree barrier, as modified by Mellor-Crummey and Scott [1991b] to incorporate sense reversal and to replace the fetch and Φ instructions of the second combining tree with simple reads (since no real information is returned).

    type node = record
        const int k              // fan-in of this node
        int count := k
        bool sense := false
        node* parent := . . .    // initialized appropriately for tree

    class barrier
        bool local sense[T] := { true . . . }
        node* my leaf[T] := . . .    // pointer to starting node for each thread
        // initialization must create a tree of nodes (each in its own cache line)
        // linked by parent pointers

    barrier.cycle():
        fence(RW, WW)
        combining helper(my leaf[self], local sense[self])    // join the barrier
        local sense[self] := !local sense[self]               // for next barrier
        fence(RR, RW)

    combining helper(node* n, bool my sense):
        if FAD(&n→count) == 1            // last thread to reach this node
            fence(RW)
            if n→parent != null
                combining helper(n→parent, my sense)
            n→count := n→k               // prepare for next barrier
            n→sense := !n→sense          // release waiting threads
        else
            fence(RR)
            while n→sense != my sense;   // spin

Figure 5.2: A software combining tree barrier. FAD is fetch and decrement.

Simulations by Yew et al. [1987] show that a software combining tree can significantly decrease contention for reduction variables, and Mellor-Crummey and Scott [1991b] con- firm this result for barriers. At the same time, the need to perform (typically expensive) fetch and Φ operations at each node of the tree induces substantial constant-time overhead. On an NRC-NUMA machine, most of the spins can also be expected to be remote, leading to potentially unacceptable contention. The barriers of the next three subsections tend to work much better in practice, making combining tree barriers mainly a matter of historical interest. This said, the notion of combining—broadly conceived—has proven useful in the construction of a wide range of concurrent data structures. We will return to the concept in Section 8.2.

5.2.3 THE DISSEMINATION BARRIER

Building on earlier work on barriers [Brooks III, 1986] and information dissemination [Alon et al., 1987; Han and Finkel, 1988], Hensgen et al. [1988] describe a dissemination barrier

Figure 5.3: Communication pattern for the dissemination barrier (adapted from Hensgen et al. [1988]).

that has no separate arrival and departure phases. The algorithm proceeds through ⌈log2 n⌉ (unsynchronized) rounds. In round k, each thread i signals thread (i + 2^k) mod n. The resulting pattern (Figure 5.3), which works for arbitrary n (not just a power of 2), ensures that by the end of the final round every thread has heard—directly or indirectly—from every other thread.

Code for the dissemination barrier appears in Figure 5.4. The algorithm uses alternating sets of variables (chosen via parity) in consecutive barrier episodes, avoiding interference without requiring two separate spins in each round. It also uses sense reversal to avoid resetting variables after every episode. The flags on which each thread spins are statically determined (allowing them to be local even on an NRC-NUMA machine), and no two threads ever spin on the same flag.

Interestingly, while the critical path length of the dissemination barrier is ⌈log2 n⌉, the total amount of interconnect traffic (remote writes) is n⌈log2 n⌉. (Space requirements are also O(n log n).) This is asymptotically larger than the O(n) space and bandwidth of the centralized and combining tree barriers, and may be a problem on machines whose interconnection network has limited cross-sectional bandwidth.
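For illustration, a C++11 sketch of a dissemination barrier might look as follows. The names are invented, per-flag cache-line padding is omitted, and the flag arrays are sized for up to 2^32 threads; it is an approximation of the technique, not the code of Figure 5.4.

    #include <atomic>
    #include <vector>

    // Illustrative C++11 sketch of a dissemination barrier (invented names;
    // per-flag cache-line padding is omitted for brevity).
    class dissemination_barrier {
        struct flags_t {
            std::atomic<bool> my_flags[2][32];   // [parity][round]
        };
        std::vector<flags_t> nodes;
        int n, rounds;
    public:
        explicit dissemination_barrier(int n_threads)
            : nodes(n_threads), n(n_threads), rounds(0) {
            while ((1 << rounds) < n) ++rounds;  // ceiling of log2 n
            for (auto& f : nodes)
                for (int p = 0; p < 2; ++p)
                    for (int r = 0; r < 32; ++r)
                        f.my_flags[p][r].store(false, std::memory_order_relaxed);
        }

        // parity and sense are per-thread state, owned by the caller,
        // initially 0 and true respectively.
        void cycle(int self, int& parity, bool& sense) {
            for (int r = 0; r < rounds; ++r) {
                int partner = (self + (1 << r)) % n;
                nodes[partner].my_flags[parity][r].store(sense, std::memory_order_release);
                while (nodes[self].my_flags[parity][r].load(std::memory_order_acquire) != sense)
                    { }   // spin for the signal from thread (self - 2^r) mod n
            }
            if (parity == 1) sense = !sense;
            parity = 1 - parity;
        }
    };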

5.2.4 TOURNAMENT BARRIERS Tournament barriers [Hensgen et al., 1988; Lubachevsky, 1989] retain the logarithmic crit- ical path and lack of fetch and Φ operations found in the dissemination barrier, but reduce communication bandwidth to O(n). Threads begin at the leaves of a binary tree much as they would in a combining tree of fan-in two. One thread from each node continues up the 5.2. BARRIER ALGORITHMS 73 const int logN = dlog2 ne type flag t = record bool my flags[0..1][0..logN−1] bool* partner flags[0..1][0..logN−1] class barrier int parity[T ] := { 0 . . . } bool sense[T ] := { true . . . } flag t flag array[T ] := . . . // on an NRC-NUMA machine, flag array[i] should be local to thread i // initially flag array[i].my flags[r][k] is false ∀i, r, k // if j = (i + 2k) mod n, then ∀r, k: // flag array[i].partner flags[r][k] points to flag array[j].my flags[r][k] barrier.cycle(): fence(RW, WW) flag t* fp := &flag array[self] int p := parity[self] bool s := sense[self] for int i in 0..logN−1 *(fp→partner flags[p][i]) := s fence(WR) while fp→my flags[p][i] != s; // spin fence(RW) if p == 1 sense[self] := !s parity[self] := 1 − p fence(RR)

Figure 5.4: The dissemination barrier. tree to the next “round” of the tournament. At each stage, however, the “winning” thread is statically determined. In round k (counting from zero) of Hensgen et al.’s barrier, thread i sets a flag awaited by thread j, where i ≡ 2k (mod 2k+1) and j = i − 2k. Thread i then drops out of the tournament and busy waits on a global flag for notice that the barrier has been achieved. Thread j participates in the next round of the tournament. A complete tournament consists of dlog2 ne rounds. Thread 0 sets a global flag when the tournament is over. Lubachevsky presents one algorithm similar to Hensgen et al.’s, and a second that uses a separate binary tree for departure, similar to that of the combining barrier in Fig- ure 5.2. Because all threads busy wait on a single global flag, Hensgen et al.’s barrier and Lubachevsky’s first barrier are appropriate for machines with broadcast-based cache co- herence. They will cause heavy interconnect traffic, however, on NRC-NUMA machines, or on machines that limit the degree of cache line replication. Lubachevsky’s second barrier could be used on any globally cache coherent machine, including those that use limited- 74 5. SPIN-BASED CONDITIONS AND BARRIERS replication directory-based caching without broadcast. Unfortunately, each thread spins on a non-contiguous set of elements in an array, and no simple scattering of these elements will suffice to eliminate spinning-related network traffic on an NRC-NUMA machine. Figure 5.5 presents a variant on Hensgen et al.’s tournament barrier, due to Lee [1990] and, independently, Mellor-Crummey and Scott [1991b]. In this variant, each thread spins on its own set of contiguous, statically allocated flags, which are used for both arrival and (tree-based) departure. This layout facilitates local-only spinning (even on an NRC- NUMA machine), but leaves total space consumption at O(n log n), when O(n) would in principle suffice. In addition to adding a departure tree to Hensgen et al.’s original code, the algorithm uses sense reversal to avoid reinitializing flag variables in each round. On a machine with broadcast-based global cache coherence, the departure phase of the tournament barrier can profitably be replaced by spinning on a global flag. Experiments by Mellor-Crummey and Scott suggest that the tournament barrier, so modified, will out- perform the dissemination barrier on such a machine. On an NRC-NUMA machine with adequate cross-sectional bandwidth, the dissemination barrier is likely to do better, since it achieves the theoretical minimum critical path length.

5.2.5 STATIC TREE BARRIERS Building on experience with the barriers of the previous three subsections, Mellor-Crummey and Scott [1991b] proposed a static tree barrier that takes logarithmic time and linear space, spins only on local locations (even on an NRC-NUMA machine), and performs the theoretical minimum number of remote memory accesses (2n − 2) on machines that lack broadcast. Code for the barrier appears in Figure 5.7. It incorporates a minor bug fix provided by Kishore Ramachandran. Each thread is assigned a unique tree node which is linked into an arrival tree by a parent link and into a wakeup tree by a set of child links. It is useful to think of the trees as separate because their arity may be different. The code shown here uses an arrival fan-in of 4 and a departure fan-out of 2, which worked well in the authors’ original (c. 1990) experiments. Assuming that the hardware supports single-byte writes, fan-in of 4 (on a 32-bit machine) or 8 (on a 64-bit machine) allows a thread to use a single-word spin to wait for all of its arrival-tree children simultaneously. Optimal departure fan-out is likely to be machine-dependent. In principle, it might even make sense to use wider fan-out near the root of the departure tree, but this would lead to considerable code complexity. As in the tournament barrier, wakeup on a machine with broadcast-based global cache coherence could profitably be effected with a single global flag.

5.2.6 WHICH BARRIER SHOULD I USE? Experience suggests that the centralized, dissemination, and static tree barriers are all useful in certain environments. The centralized barrier has the advantage of simplicity, 5.2. BARRIER ALGORITHMS 75 const int logN = dlog2 ne type role t = (winner, loser, bye, champion, dropout) type round t = record role t role bool* opponent bool flag := false class barrier bool sense[T ] := { true . . . } round t rounds[T ][0..logN] := . . . // see Figure 5.6 // on an NRC-NUMA machine, rounds[i] should be local to thread i barrier.cycle(): fence(RW, WW) int round := 1 bool my sense := sense[self] loop // arrival case rounds[self][round].role of loser: fence(RR, WR) *rounds[self][round].opponent := my sense fence(WR) while rounds[self][round].flag != my sense; // spin break loop winner: fence(RR, WR) while rounds[self][round].flag != my sense; // spin bye: // do nothing champion: fence(RR, WR) while rounds[self][round].flag != my sense; // spin fence(RW) *rounds[self][round].opponent := my sense break loop dropout: // impossible ++round loop // departure −−round case rounds[self][round].role of loser, champion: // impossible winner: *rounds[self][round].opponent := my sense bye: // do nothing dropout: break loop sense[self] := !my sense fence(RR, RW)

Figure 5.5: A tournament barrier with local-spinning tree-based departure. Fences have been made overly conservative for ease of presentation. 76 5. SPIN-BASED CONDITIONS AND BARRIERS rounds[i][k].role = winner if k > 0, i mod 2k = 0, i + 2k−1 < n, and 2k < n bye if k > 0, i mod 2k = 0, and i + 2k−1 ≥ n loser if k > 0 and i mod 2k = 2k−1 champion if k > 0, i = 0, and 2k ≥ n dropout if k = 0 unused otherwise; value immaterial rounds[i][k].opponent points to rounds[i − 2k−1][k].flag if rounds[i][k].role = loser rounds[i + 2k−1][k].flag if rounds[i][k].role = winner or champion unused otherwise; value immaterial

Figure 5.6: Initialization of rounds array for the tournament barrier.

and tends to outperform all other alternatives when the number of threads is small. It also adapts easily to different numbers of threads. In an application in which the number changes from one barrier episode to another, this advantage may be compelling. Given the cost of remote spinning (and of fetch and Φ operations on most machines), the combining tree barrier tends not to be competitive. The tournament barrier, likewise, has little to recommend it over the static tree barrier. The choice between the dissemination and static tree barriers comes down to a ques- tion of architectural features and costs. The dissemination barrier has the shortest critical path, but induces asymptotically more total network traffic. (It is also ill suited to applica- tions that can exploit the “fuzzy” barriers of Section 5.3.1 below.) Given broadcast-based cache coherence, nothing is likely to outperform the static tree barrier, modified to use a global departure flag. In the absence of broadcast, the dissemination barrier will do better on a machine with high cross-sectional bandwidth; otherwise the static tree barrier (with explicit departure tree) is likely to do better. When in doubt, practitioners would be wise to try both and measure their performance.

5.3 BARRIER EXTENSIONS

5.3.1 FUZZY BARRIERS One of the principal performance problems associated with barriers is skew in thread arrival times, often caused by irregularities in the amount of work performed between barrier episodes. If one thread always does more work than the others, of course, then it will always be delayed, and all of the others will wait. If variations are more normally distributed, however, then we arrive at the unpleasant situation illustrated on the left side of Figure 5.8, where the time between barrier episodes is always determined by the slowest thread. If Ti,p 5.3. BARRIER EXTENSIONS 77 type node = record bool parent sense := false bool* parent ptr bool have child[0..3] // for arrival bool child not ready[0..3] bool* child ptrs[0..1] // for departure bool dummy // pseudodata class barrier bool sense[T ] := true node nodes[T ] // on an NRC-NUMA machine, nodes[i] should be local to thread i // in nodes[i]: // have child[j] = true iff 4i + j + 1 < n // parent ptr = &nodes[b(i − 1)/4c].child not ready[(i − 1) mod 4], // or &dummy if i = 0 // child ptrs[0] = &nodes[2i + 1].parent sense, or &dummy if 2i + 1 ≥ n // child ptrs[1] = &nodes[2i + 2].parent sense, or &dummy if 2i + 2 ≥ n // initially child not ready := have child barrier.cycle(): fence(RW, WW) node* n := &nodes[self] bool my sense := sense[self] while n→child not ready != { false, false, false, false }; // spin n→child not ready := n→have child // prepare for next episode fence(RW) *n→parent ptr := false // let parent know we’re ready // if not root, wait until parent signals departure: if self != 0 fence(WR) while n→parent sense != my sense; // spin fence(RW) // signal children in departure tree *n→child ptrs[0] := my sense *n→child ptrs[1] := my sense sense[self] := !my sense fence(RR)

Figure 5.7: A static tree barrier with local-spinning tree-based departure.

is the time thread i consumes in phase p of the computation, then total execution time is ∑p (tb + maxi∈T Ti,p), where tb is the time required by a single barrier episode.

Fortunately, it often turns out that the work performed in one algorithmic phase depends on only some of the work performed by peers in previous phases. If we can arrange for peers to do this critical work first, then we can start the next phase in fast threads as soon as slow threads have finished their critical work—and before they have finished all

Figure 5.8: Impact of variation across threads in phase execution times, with normal barriers (left) and fuzzy barriers (right). Blue work bars are the same length in each version of the figure. Fuzzy intervals are shown as outlined boxes. With fuzzy barriers, threads can leave the barrier as soon as the last peer has entered its fuzzy interval. Overall performance improvement is shown by the double-headed arrow at center.

their work. This observation, due to Gupta [1989], leads to the design of a fuzzy barrier, in which arrival and departure are separate operations. The standard idiom

    in parallel for i ∈ T
        repeat
            // do i's portion of the work of a phase
            b.cycle()
        until terminating condition

becomes

    in parallel for i ∈ T
        repeat
            // do i's critical work for this phase
            b.arrive()
            // do i's non-critical work—its fuzzy interval
            b.depart()
        until terminating condition

As illustrated on the right side of Figure 5.8, the impact on overall run time can be a dramatic improvement. A centralized barrier is easily modified to produce a fuzzy variant (Figure 5.9). Un- fortunately, none of the logarithmic barriers we have considered has such an obvious fuzzy version. We address this issue in the following subsection. 5.3. BARRIER EXTENSIONS 79 class barrier int count := 0 const int n := |T | bool sense := true bool local sense[T ] := { true . . . } barrier.arrive(): fence(RW, WW) local sense[self] := !local sense[self] // each thread toggles its own sense if FAI(&count) == n−1 count := 0 sense := local sense[self] // last thread toggles global sense barrier.depart(): while sense != local sense[self]; // spin fence(RR, RW)

Figure 5.9: Fuzzy variant of the sense-reversing centralized barrier.
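In the same hedged C++11 style used earlier, the arrive/depart pair of Figure 5.9 might be rendered roughly as follows (invented names; the per-thread sense flag is owned by the caller).

    #include <atomic>

    // Illustrative C++11 sketch of the fuzzy (arrive/depart) centralized
    // barrier of Figure 5.9; names are invented and orderings approximate.
    class fuzzy_barrier {
        std::atomic<int> count{0};
        std::atomic<bool> sense{true};
        const int n;
    public:
        explicit fuzzy_barrier(int n_threads) : n(n_threads) {}

        // local_sense is per-thread state, owned by the caller, initially true.
        void arrive(bool& local_sense) {
            local_sense = !local_sense;                        // toggle my own sense
            if (count.fetch_add(1, std::memory_order_acq_rel) == n - 1) {
                count.store(0, std::memory_order_relaxed);
                sense.store(local_sense, std::memory_order_release);  // last arrival
            }
        }
        void depart(bool& local_sense) {
            while (sense.load(std::memory_order_acquire) != local_sense) { }  // spin
        }
    };

Between arrive and depart a thread performs its fuzzy-interval (non-critical) work.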

5.3.2 ADAPTIVE BARRIERS

When all threads arrive at about the same time, the tree, dissemination, and tournament barriers enjoy an asymptotic advantage over the centralized barrier. The latter, however, has an important advantage when thread arrivals are heavily skewed: if all threads but one have already finished their arrival work, the last thread is able to recognize this fact in constant time in a centralized barrier. In the other barriers it will almost always require logarithmic time (an exception being the lucky case in which the last-arriving thread just happens to own the root node of the static tree barrier).

This inability to amortize arrival time is the same reason why most log-time barriers do not easily support fuzzy-style separation of their arrival and departure operations. In any fuzzy barrier it is essential that threads wait only in the departure operation, and then only for threads that have yet to reach the arrival operation. In the dissemination barrier, no thread knows that all other threads have arrived until the very end of the algorithm. In the tournament and static tree barriers, static synchronization orderings force some threads to wait for their peers before announcing that they have reached the barrier. In all of the algorithms with tree-based departure, threads waiting near the leaves cannot discover that the barrier has been achieved until threads higher in the tree have already noticed this fact.

Among our log-time barriers, only the combining tree appears to offer a way to separate arrival and departure. Each arriving thread works its way up the tree to the top-most unsaturated node (the first at which it was not the last to arrive). If we call this initial traversal the arrive operation, it is easy to see that it involves no spinning. On a machine with broadcast-based global cache coherence, the thread that saturates the root of the tree can flip a sense-reversing flag, on which all threads can spin in their depart operation.

Figure 5.10: Dynamic modification of the arrival tree in an adaptive combining tree barrier.

On other machines, we can traverse the tree again in depart, but in a slightly different way: this time a thread proceeds upward only if it is the first to arrive at a node, rather than the last. Other threads spin in the top-most node in which they were the first to arrive. The thread that reaches the root waits, if necessary, for the last arriving thread, and then flips flags in each of the children of the root. Any thread that finds a flipped flag on its way up the tree knows that it can stop and flip flags in the children of that node. Races among threads take a bit of care to get right, but the code can be made to work. Unfortunately, the last thread to call the arrive operation still requires Ω(log n) time to realize that the barrier has been achieved.

To address this remaining issue, Gupta and Hill [1989] proposed an adaptive combining tree, in which early-arriving threads dynamically modify the structure of the tree so that late-arriving peers are closer to the root. With modest skew in arrival times, the last-arriving thread realizes that the barrier has been achieved in constant time.

The code for this barrier is somewhat complex. The basic idea is illustrated in Figure 5.10. It uses a binary tree. Each thread, in its arrive operation, starts at its (statically assigned) leaf and proceeds upward, stopping at the first node (say, w) that has not yet been visited by any other thread. It then modifies the tree so that w's other child (o, the child through which the thread did not climb) is one level closer to the root. Specifically, the thread changes o's parent to be p (the parent of w) and makes o a child of p. A thread that reaches p through w's sibling (not shown) will promote o another level, and a later-arriving thread, climbing through o, will traverse fewer levels of the tree than it would have otherwise.

In their paper, Gupta and Hill [1989] present both standard and fuzzy versions of their adaptive combining tree barrier. Unfortunately, both versions retain the remote spins of the original (non-adaptive) combining tree. They also employ test and set locks to arbitrate access to each tree node. To improve performance—particularly but not exclusively on NRC-NUMA machines—Scott and Mellor-Crummey [1994] present versions of the adaptive combining tree barrier (both regular and fuzzy) that spin only on local locations and that adapt the tree in a wait-free fashion, without the need for per-node locks. In the process they also fix several subtle bugs in the earlier algorithms. Readers interested in exploring adaptive barriers should use these later versions.

5.3.3 BARRIER-LIKE CONSTRUCTS

While barriers are the most common form of global (all-thread) synchronization, they are far from the only one. We have already mentioned the “Eureka” operation in Sections 2.3.2 and 5.2. Invoked by a thread that has discovered some desired result (hence the name), it serves to interrupt the thread's peers, allowing them (in the usual case) to stop looking for similar results. Whether supported in hardware or software, the principal challenge for Eureka is to cleanly terminate the peers. The easiest solution is to require each thread to poll for termination periodically, but this can be both awkward and wasteful. More asynchronous solutions require careful integration with the thread library or language run-time system, and are beyond the scope of this monograph.
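As a toy illustration of the polling approach, consider a parallel search in which each worker checks a shared flag between elements and stops as soon as any peer reports success; all names here are invented for the example.

    #include <atomic>
    #include <cstddef>
    #include <vector>

    // Polling-based "Eureka" termination: peers stop looking once any
    // searcher has found a hit. Illustrative names only.
    std::atomic<bool> found{false};
    std::atomic<long> result{-1};

    void search_chunk(const std::vector<long>& data,
                      std::size_t lo, std::size_t hi, long target) {
        for (std::size_t i = lo;
             i < hi && !found.load(std::memory_order_relaxed); ++i) {
            if (data[i] == target) {
                result.store(static_cast<long>(i), std::memory_order_relaxed);
                found.store(true, std::memory_order_release);  // tell peers to stop looking
                return;
            }
        }
    }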
Many languages (and, more awkwardly, library packages) allow the programmer to launch a group of threads and wait for their completion. In Cilk [Frigo et al., 1998], for example, a multi-phase application might look something like this:

    do {
        for (i = 0; i < n; i++) {
            spawn work(i);
        }
        sync;    // wait for all children to complete
    } while ( !terminating condition )

Semantically, this code suggests the creation of n threads at the top of each do loop iteration, and a “join” among them at the bottom. The Cilk runtime system, however, is designed to make spawn and sync as inexpensive as possible. In effect, sync functions as a barrier among preexisting worker threads, and spawn simply identifies units of work (“tasks”) that can be farmed out to a worker. Many languages (including the more recent Cilk++) include a “parallel for” loop whose iterations proceed logically in parallel. An implicit sync causes execution of the main program to wait for all iterations to complete before proceeding with whatever comes after the loop. Like threads executing the same phase of a barrier-based application, iterations of a parallel loop must generally be free of data races. If occasional conflicts are allowed, they must be resolved using other synchronization. In a very different vein, Fortran 95 and its descendants provide a forall loop whose iterations are heavily synchronized. Code like the following

    forall (i=1:n)
        A[i] = expr 1
        B[i] = expr 2
        C[i] = expr 3
    end forall

contains (from a semantic perspective) a host of implicit barriers: All instances of expr 1 are evaluated first, then all writes are performed to A, then all instances of expr 2 are evaluated, followed by all writes to B, and so forth. A good compiler will elide any barriers it can prove to be unneeded.

Recognizing the host of different patterns in which parallel threads may synchronize, Shirako et al. [2008] have developed a barrier generalization known as phasers. Threads can join (register with) or leave a phaser dynamically, and can participate as signalers, waiters, or both. Their signal and wait operations can be separated by other code to effect a fuzzy barrier. Threads can also, as a group, specify a statement to be executed, atomically, as part of a phaser episode. Finally, and perhaps most importantly, a thread that is registered with multiple phasers can signal or wait at all of them together when it performs a signal or wait operation. This capability facilitates the management of stencil applications, in which a thread synchronizes with its neighbors at the end of each phase, but not with other threads. Neighbor-only synchronization is also supported, in a more limited fashion, by the topological barriers of Scott and Michael [1996].

In the message-passing world, barrier-like operations are supported by the collective communication primitives of systems like MPI [Bruck et al., 1995], but these are beyond the scope of this monograph.

CHAPTER 6

Read-mostly Atomicity

In Chapter 4 we considered the topic of busy-wait mutual exclusion, which achieves atomicity by allowing only one thread at a time to execute a critical section. While mutual exclusion is sufficient to ensure atomicity, it is by no means necessary. Any mechanism that satisfies the ordering constraints of Section 3.1.2 will also suffice. In particular, read-mostly optimizations exploit the fact that operations can safely execute concurrently, while still maintaining atomicity, if they read shared data without writing it.

Section 6.1 considers the simplest read-mostly optimization: the reader-writer lock, which allows multiple readers to occupy their critical section concurrently, but requires writers (that is, threads that may update shared data, in addition to reading it) to exclude both readers and other writers. To use the “reader path” of a reader-writer lock, a thread must know, at the beginning of the critical section, that it will never attempt to write. Sequence locks, the subject of Section 6.2, relax this restriction by allowing a reader to “upgrade” to writer status if it forces all concurrent readers to back out and retry their critical sections. (Transactional memory, which we will consider in Chapter 9, can be considered a generalization of sequence locks. TM systems typically automate the back-out-and-retry mechanism; sequence locks require the programmer to implement it by hand.) Finally, read-copy update (RCU), the subject of Section 6.3, explores an extreme position in which the overhead of synchronization is shifted almost entirely off of readers and onto writers, which are assumed to be quite rare.

6.1 READER-WRITER LOCKS

Reader-writer locks, first suggested by Courtois et al. [1971], relax the constraints of mutual exclusion to permit more than one thread to inspect a shared data structure simultaneously, so long as none of them modifies it. Critical sections are separated into two classes: writes, which require exclusive access while modifying protected data, and reads, which can be concurrent with one another (though not with writes) because they are known in advance to make no observable changes. As recognized by Courtois et al. [1971], different fairness properties are appropriate for a reader-writer lock depending on the context in which it is used. A “reader preference” lock minimizes the delay for readers and maximizes total throughput by allowing a reading thread to join a group of current readers even if a writer is waiting. A “writer preference” lock ensures that updates are seen as soon as possible by requiring readers to wait for 84 6. READ-MOSTLY ATOMICITY any current or waiting writer, even if other threads are currently reading. Both of these options permit indefinite postponement and even starvation of non-preferred threads when competition for the lock is high. Though not explicitly recognized by Courtois et al. [1971], it is also possible to construct a reader-writer lock (called a “fair” lock below) in which readers wait for any earlier writer and writers wait for any earlier thread of either kind. The locks of Courtois et al. [1971] were based on semaphores, a scheduler-based synchronization mechanism that we will introduce in Section 7.2. In the current chapter we limit ourselves to busy-wait synchronization.

6.1.1 CENTRALIZED ALGORITHMS There are many ways to construct a centralized reader-writer lock. We consider three ex- amples here. Our first example (Figure 6.1) gives preference to readers. It uses an unsigned integer to represent the state of the lock. The lowest bit indicates whether a writer is active; the upper bits contain a count of active or interested readers. When a reader arrives, it increments the reader count (atomically) and waits until there are no active writers. When a writer arrives, it attempts to acquire the lock using compare and swap. The writer succeeds, and proceeds, only when all bits were clear, indicating that no other writer was active and that no readers were active or interested. Since a reader waits only when a writer is active, and is able to proceed as soon as that one writer finishes, exponential backoff for readers is probably not needed (constant backoff may sometimes be appropriate; we do not consider it here). Since writers may be delayed during the execution of an arbitrary number of critical sections, they use exponential backoff to minimize contention. The symmetric case—writer preference—appears in Figure 6.2. In this case we must count both active readers (to know when all of them have finished) and interested writers (to know whether a newly arriving reader must wait). We also need to know whether a writer is currently active. Even on a 32-bit machine, a single word still suffices to hold both counts and a Boolean flag. A reader waits until there are no active or waiting writers; a writer waits until there are no active readers. Because writers are unordered (in this particular lock), they use exponential backoff to minimize contention. Readers, on the other hand, can use ticket-style proportional backoff to defer to all waiting writers. Our final centralized example—a fair reader-writer lock—appears in Figure 6.3. It is patterned after the ticket lock (Section 4.2.2), and is represented by two pairs of counters. Each pair occupies a single word: the upper half of each counts readers; the lower half counts writers. The counters of the request word indicate how many threads have requested the lock. The counters of the completion word indicate how many have already acquired and released it. With arithmetic performed modulo the precision of half-word quantities (and with this number assumed to be significantly larger than the total number of threads), overflow is harmless. Readers spin until all earlier write requests have completed. Writers 6.1. READER-WRITER LOCKS 85 class lock int n := 0 // low-order bit indicates whether a writer is active; // remaining bits are a count of active or waiting readers const int WA flag = 1 const int RC inc = 2 const int base, limit, multiplier = . . . // tuning parameters lock.reader acquire(): (void) FAA(&n, RC inc) fence(WR) while n & WA flag == 1; // spin fence(RR) lock.reader release(): fence(RW) (void) FAA(&n, −RC inc) lock.writer acquire(): int delay := base while !CAS(&n, 0, WA flag) // spin pause(delay) delay := min(delay × multiplier, limit) fence(RR, RW) lock.writer release(): fence(RW, WW) (void) FAA(&n, −WA flag)

Figure 6.1: A centralized reader-preference reader-writer lock, with exponential backoff for writers. spin until all earlier read and write requests have completed. For both readers and writers we use the difference between requested and completed writer critical sections to estimate the expected wait time. Depending on how many (multi-)reader episodes are interleaved with these, this estimate may be off by as much as a factor of 2.
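A rough C++11 rendering of the reader-preference lock of Figure 6.1 appears below; it is illustrative only, and the exponential backoff described in the text is reduced to a comment.

    #include <atomic>

    // Illustrative C++11 sketch of the reader-preference lock of Figure 6.1.
    class rw_lock_reader_pref {
        static constexpr unsigned WA_FLAG = 1;   // low bit: writer active
        static constexpr unsigned RC_INC  = 2;   // remaining bits: reader count
        std::atomic<unsigned> n{0};
    public:
        void reader_acquire() {
            n.fetch_add(RC_INC, std::memory_order_acquire);
            while (n.load(std::memory_order_acquire) & WA_FLAG) { }  // spin until no active writer
        }
        void reader_release() {
            n.fetch_sub(RC_INC, std::memory_order_release);
        }
        void writer_acquire() {
            unsigned expected = 0;
            while (!n.compare_exchange_weak(expected, WA_FLAG,
                                            std::memory_order_acquire)) {
                expected = 0;   // CAS rewrote expected on failure; back off here in practice
            }
        }
        void writer_release() {
            n.fetch_sub(WA_FLAG, std::memory_order_release);
        }
    };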

6.1.2 QUEUED READER-WRITER LOCKS Just as centralized mutual exclusion locks—even with backoff—can induce unacceptable contention under heavy use on large machines, so too can centralized reader-writer locks. To avoid the contention problem, Mellor-Crummey and Scott [1991a] have shown how to adapt queued spin locks to the reader-writer case. Specifically, they present reader-preference, writer-preference, and fair reader-writer locks based on the MCS lock (Section 4.3.1). We present the fair version in Figures 6.4 and 6.5 (including a bug fix provided by Keir Fraser). For the other two, interested readers may consult the original paper. 86 6. READ-MOSTLY ATOMICITY

class lock hshort, short, booli n := h0, 0, falsei // high half of word counts active readers; low half counts waiting writers, // except for low bit, which indicates whether a writer is active const int base, limit, multiplier = . . . // tuning parameters lock.reader acquire(): loop hshort ar, short ww, bool awi := n if ww == 0 and aw == false if CAS(&n, har, 0, falsei, har+1, 0, falsei) break // else spin pause(ww × base) // proportional backoff fence(RR) lock.reader release(): fence(RW) short ar, ww; bool aw repeat // fetch-and-phi har, ww, awi := n until CAS(&n, har, ww, awi, har−1, ww, awi) lock.writer acquire(): int delay := base loop hshort ar, short ww, bool awi := n if aw == false if CAS(&n, har, ww, falsei, har, ww, truei) break if CAS(&n, har, ww, awi, har, ww+1, awi) // I’m registered as waiting loop // spin har, ww, awi := n if aw == false if CAS(&n, har, ww, falsei, har, ww−1, truei) break outer loop pause(delay) // exponential backoff delay := min(delay × multiplier, limit) // else retry fence(RR, RW) lock.writer release(): fence(RW, WW) short rr, wr; bool aw repeat // fetch-and-phi har, ww, awi := n until CAS(&n, har, ww, awi, har, ww, falsei)

Figure 6.2: A centralized writer-preference reader-writer lock, with proportional backoff for readers and exponential backoff for writers. 6.1. READER-WRITER LOCKS 87

class lock hshort, shorti requests := h0, 0i hshort, shorti completions := h0, 0i // top half of each word counts readers; bottom half counts writers const int base = . . . // tuning parameter lock.reader acquire(): short rr, wr, rc, wc repeat // fetch-and-phi hrr, wri := requests until CAS(&requests, hrr, wri, hrr+1, wri) loop // spin hrc, wci := completions if wc == wr break pause((wc−wr) × base) fence(RR) lock.reader release(): fence(RW) short rc, wc repeat // fetch-and-phi hrc, wci := completions until CAS(&completions, hrc, wci, hrc+1, wci) lock.writer acquire(): short rr, wr, rc, wc repeat // fetch-and-phi hrr, wri := requests until CAS(&requests, hrr, wri, hrr, wr+1i) loop // spin hrc, wci := completions if rc == rr and wc == wr break pause((wc−wr) × base) fence(RR, RW) lock.writer release(): fence(RW, WW) short rc, wc repeat // fetch-and-phi hrc, wci := completions until CAS(&completions, hrc, wci, hrc+1, wci)

Figure 6.3: A centralized fair reader-writer lock with (roughly) proportional backoff for both readers and writers. Addition is assumed to be modulo the precision of (unsigned) short integers. 88 6. READ-MOSTLY ATOMICITY
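The packing trick of Figure 6.3 can also be sketched in C++11, here with 32-bit halves in a 64-bit word so that a plain fetch-and-add suffices. This is an approximation under the assumption that neither count wraps within its half; the pseudocode above instead performs independent half-word arithmetic with a CAS loop, so that wraparound is harmless.

    #include <atomic>
    #include <cstdint>

    // Illustrative ticket-style fair reader-writer lock in the spirit of
    // Figure 6.3. High 32 bits of each word count readers; low 32 count writers.
    class fair_rw_lock {
        std::atomic<uint64_t> requests{0};
        std::atomic<uint64_t> completions{0};
        static constexpr uint64_t RD = uint64_t(1) << 32, WR = 1;
    public:
        void reader_acquire() {
            uint64_t ticket = requests.fetch_add(RD, std::memory_order_acq_rel);
            uint64_t writers_before_me = ticket & 0xffffffffu;
            // wait until all earlier write requests have completed
            while ((completions.load(std::memory_order_acquire) & 0xffffffffu)
                   != writers_before_me) { }      // spin (backoff omitted)
        }
        void reader_release() {
            completions.fetch_add(RD, std::memory_order_release);
        }
        void writer_acquire() {
            uint64_t ticket = requests.fetch_add(WR, std::memory_order_acq_rel);
            // wait until all earlier read and write requests have completed
            while (completions.load(std::memory_order_acquire) != ticket) { }  // spin
        }
        void writer_release() {
            completions.fetch_add(WR, std::memory_order_release);
        }
    };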

type role t = (reader, writer, none) type qnode = record role t role qnode* next h role t successor role // these two fields share a word bool blocked i state class lock qnode* tail := null int reader count := 0 qnode* next writer := null lock.reader acquire(qnode* I): I→role := reader; I→next := null I→hsuccessor role, blockedi := hnone, truei fence(WW) qnode* pred := swap(&tail, I) if pred == null (null) FAI(&reader count) I→blocked := false else if pred→role == writer or else CAS(&pred→state, hnone, truei, hreader, truei) // pred is a writer or waiting reader; // will inc. reader count and release me when appropriate pred→next := I fence(WR) while I→blocked; // spin for permission to proceed else (void) FAI(&reader count) fence(WW) pred→next := I I→blocked := false // I can go now if I→successor role == reader while I→next == null; // spin for successor identity fence(RW) (void) FAI(&reader count) fence(WW) I→next→blocked := false fence(RR)

Figure 6.4: A fair queued reader-writer lock (declarations and reader acquire). 6.1. READER-WRITER LOCKS 89 lock.reader release(qnode* I): fence(RW) if I→next != null or else !CAS(&tail, I, null) while I→next == null; // spin for successor identity if I→successor role == writer next writer := I→next if FAD(&reader count) == 1 and then (qnode* w := next writer) != null and then reader count == 0 and then CAS(&next writer, w, null) // I’m the last active reader and there exists a waiting writer and there were still // no active readers after I became the unique reader to identify the waiting writer w→blocked := false lock.writer acquire(qnode* I): I→role := writer; I→next := null I→hsuccessor role, blockedi := hnone, truei fence(WW) qnode* pred := swap(&tail, I) if pred == null next writer := I fence(WR) if reader count == 0 and swap(&next writer, null) == I // no reader who will resume me I→blocked := false else // must update successor role before updating next pred→successor role := writer fence(WW) pred→next := I fence(WR) while I→blocked; // spin fence(RR, RW) lock.writer release(qnode* I): fence(RW, WW) if I→next != null or else not CAS(&tail, I, null) while I→next == null; // spin for successor identity if I→next→role == reader (void) FAI(&reader count) I→next→blocked := false

Figure 6.5: A fair queued reader-writer lock (continued).

As in the MCS spin lock, we use a linked list to keep track of requesting threads, and we pass a qnode pointer to both the acquire and release routines. (In the typical case, the qnode will be allocated in the stack frame of the routine that calls both acquire and release.) In contrast to the original MCS lock, however, we allow a requestor to read and write fields in the qnode of its predecessor (if any). To ensure that the node is still valid (and has not been deallocated or reused), we require that a thread access its predecessor’s 90 6. READ-MOSTLY ATOMICITY qnode before initializing the node’s next pointer. At the same time, we force every thread that has a successor to wait for its next pointer to become non-null, even if the pointer will never be used. As in the MCS lock, the existence of a successor is determined by examining L→tail. A reader can begin reading if its predecessor is a reader that is already active, but it must first unblock its successor (if any) if that successor is a waiting reader. To en- sure that a reader is never left blocked while its predecessor is reading, each reader uses compare and swap to atomically test if its predecessor is an active reader, and if not, notify its predecessor that it has a waiting reader as a successor. Similarly, a writer can proceed if its predecessor is done and there are no active readers. A writer whose predecessor is a writer can proceed as soon as its predecessor is done, as in the MCS lock. A writer whose predecessor is a reader must go through an additional protocol using a count of active readers, since some readers that started earlier may still be active. When the last reader of a group finishes (reader count == 0), it must resume the writer (if any) next in line for access. This may require a reader to resume a writer that is not its direct successor. When a writer is next in line for access, we write its name in a global location. We use swap to read and erase this location atomically, ensuring that a writer proceeds on its own if and only if no reader is going to try to resume it. To make sure that reader count never reaches zero prematurely, we increment it before resuming a blocked reader, and before updating the next pointer of a reader whose reading successor proceeds on its own.

6.2 SEQUENCE LOCKS For read-mostly workloads, reader-writer locks still suffer from two significant limitations. First, a reader must know that it is a reader, before it begins its work. A deeply nested conditional that occasionally—but very rarely—needs to modify shared data will force the surrounding critical section to function as a writer every time. Second, a reader must write the metadata of the lock itself to ward off simultaneous writers. Because the write requires exclusive access, it is likely to be a cache miss when multiple readers are active. Given the cost of a miss, lock overhead can easily dominate the cost of other operations in the critical section. Sequence locks (seqlocks) [Lameter, 2005] address these limitations. A reader is al- lowed to “change its mind” and become a writer in the middle of a critical section. More significantly, readers only read the lock—they do not update it. In return for these benefits, a reader must be prepared to repeat its critical section if it discovers, at the end, that it has overlapped the execution of a writer. Moreover, the reader’s actions must be simple enough that nothing a writer might do can cause the reader to experience an unrecoverable error—divide by zero, dereference of an invalid pointer, infinite loop, etc. Put another way, seqlocks provide mutual exclusion among writers, but not between readers and writers. 6.2. SEQUENCE LOCKS 91 class seqlock int n := 0 int seqlock.reader start(): int seq repeat // spin until even seq := n until seq ≡ 0 mod 2 fence(RR) return seq bool seqlock.reader validate(int seq): fence(RR) return (n == seq) bool seqlock.become writer(int seq): fence(RR) if CAS(&n, seq, seq+1) fence(RW) return true return false seqlock.writer acquire(): loop int seq := n if seq ≡ 0 mod 2 if CAS(&n, seq, seq+1) break fence(RR, RW) seqlock.writer release(): fence(RW, WW) n++

Figure 6.6: Centralized implementation of a sequence lock.

Rather, they allow a reader to discover, after the fact, that its execution may not have been valid, and needs to be retried.

A simple, centralized implementation of sequence locks appears in Figure 6.6. The lock is represented by a single integer. An odd value indicates that the lock is held by a writer; an even value indicates that it is not. For writers, the integer behaves like a test-and-test and set lock. We assume that writers are rare. A reader spins until the lock is even, and then proceeds, remembering the value it saw. If it sees the same value in reader validate, it knows that no writer has been active, and that everything it has read in its critical section is mutually consistent. (We assume that critical sections are short enough—and writers rare enough—that n can never roll over and repeat a value before the reader completes. For real-world integers and critical sections, this is a completely safe assumption.) If a reader sees a different value in validate, however, it knows that it has overlapped a writer and must repeat its critical section.

    repeat
        int s := SL.reader start()
        // critical section
    until SL.reader validate(s)

It is essential here that the critical section be idempotent—harmlessly repeatable. In the canonical use case, seqlocks serve in the Linux kernel to protect multi-word time information, which can then be read atomically and consistently. If a reader critical section updates thread-local data (only shared data must be read-only), the idiom shown above can be modified to undo the updates in the case where reader validate returns false. If a reader needs to perform a potentially “dangerous” operation (integer divide, pointer dereference, unbounded iteration, memory allocation/deallocation, etc.) within its critical section, the reader validate method can be called repeatedly (with the same pa- rameter each time). If reader validate returns true, the upcoming operation is known to be safe (all values read so far are mutually consistent); if it returns false, consistency cannot be guaranteed, and code should branch back to the top of the repeat loop. In the (pre- sumably rare) case where a reader discovers that it really needs to write, it can request a “promotion” with become writer:

    loop
        int s := SL.reader start()
        ...
        if unlikely condition
            if !become writer(s) continue    // return to top of loop
            ...
            writer release()
        else    // still reader
            ...
            if reader validate(s) break

After becoming a writer, of course, a thread has no further need to validate its reads: it will exit the loop above after calling writer release. Unfortunately, because they are inherently speculative, seqlocks induce a host of data races [Boehm, 2012]. Every read of a shared location in a reader critical section will typically race with some write in a writer critical section. In a language like C++, which forbids data races, a straightforward fix is to label all read locations atomic; this will prevent the compiler from reordering accesses, and cause it to issue memory fence instructions that prevent the hardware from reordering them either. This solution is overly conservative, however: it inhibits reorderings that are clearly acceptable within idempotent read-only critical sections. Boehm [2012] explores the data-race issue in depth, and describes other, less conservative options. 6.3. READ-COPY UPDATE 93 Together, the problems of inconsistency and data races are subtle enough that se- qlocks are best thought of as a special-purpose technique, to be employed by experts in well constrained circumstances, rather than as a general-purpose form of synchronization. That said, seqlock usage can be safely automated by a compiler that understands the na- ture of speculation. Dalessandro et al. [2010a] describe a system (in essence, a minimal implementation of transactional memory) in which (1) a global sequence lock serializes all writer transactions, (2) fences and reader validate calls are inserted automatically where needed, and (3) local state is checkpointed at the beginning of each reader transaction, for restoration on abort. A follow-up paper [Dalessandro et al., 2010c] describes a more con- current system, in which writer transactions proceed speculatively, and a global sequence lock serializes only the write-back of buffered updates. We will return to the subject of transactional memory in Chapter 9.
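To make the reader protocol concrete under the C++ memory model, here is a minimal seqlock sketch in the spirit of Figure 6.6, with memory orderings chosen along the lines suggested by Boehm [2012]. It is illustrative only, and it assumes the protected data are themselves read and written with (relaxed) atomics.

    #include <atomic>

    // Minimal C++11 seqlock sketch (invented class name); not a drop-in
    // replacement for the pseudocode of Figure 6.6.
    class seqlock_cxx {
        std::atomic<unsigned> n{0};          // odd while a writer holds the lock
    public:
        unsigned reader_start() {
            unsigned seq;
            do {
                seq = n.load(std::memory_order_acquire);
            } while (seq & 1);               // spin until even
            return seq;
        }
        bool reader_validate(unsigned seq) {
            // keep the preceding data reads from drifting past the re-read of n
            std::atomic_thread_fence(std::memory_order_acquire);
            return n.load(std::memory_order_relaxed) == seq;
        }
        void writer_acquire() {
            for (;;) {
                unsigned seq = n.load(std::memory_order_relaxed);
                if (!(seq & 1) &&
                    n.compare_exchange_weak(seq, seq + 1, std::memory_order_acquire))
                    break;                   // count is now odd: held by this writer
            }
            // readers that observe the data writes below will also observe the odd count
            std::atomic_thread_fence(std::memory_order_release);
        }
        void writer_release() {
            n.fetch_add(1, std::memory_order_release);   // even again
        }
    };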

6.3 READ-COPY UPDATE Read-copy update, more commonly known as simply RCU [McKenney, 2004; McKenney et al., 2001], is a synchronization strategy that attempts to drive the overhead of reader synchronization as close to zero as possible, at the expense of potentially very high overhead for writers. Instances of the strategy typically display the following four main properties: no shared updates by readers. As in a sequence lock, readers modify no shared meta- data before or after performing an operation. While this makes them invisible to writers, it avoids the characteristic cache misses associated with locks. To ensure a consistent view of memory, readers may need to execute RR fences on some machines, but these are typically much cheaper than a cache miss. single-pointer updates. Writers use a lock to synchronize with one another. They make their updates visible to readers by performing a single atomic memory update— typically by “swinging” a pointer to refer to the new version of (some part of) a data structure, rather than to the old version. Readers serialize before or after the writer depending on whether they see this update. unidirectional data traversal. To ensure serializability, readers must never inspect a pointer more than once. Moreover if writers A and B modify different pointers, and A serializes before B, it must be impossible for any reader to see B’s update but not A’s. The most straightforward way to ensure this is to require all structures to be trees, traversed from the root toward the leaves, and by arranging for writers to replace entire subtrees. delayed reclamation of deallocated data. When a writer updates a pointer, readers that have already dereferenced the old version—but have not yet finished their operations—may continue to read old data for some time. Implementations of RCU 94 6. READ-MOSTLY ATOMICITY must therefore provide a (potentially conservative) way for writers to tell that all readers that could still access old data have finished their operations and returned. Only then can the old data’s space be reclaimed.

Implementations and applications of RCU vary in many details, and may diverge from the description above if the programmer is able to prove that (application-specific) semantics will not be compromised. We consider relaxations of the single-pointer update and unidirectional traversal properties below. First, though, we consider ways to implement relaxed reclamation and to accommodate, at minimal cost, machines with relaxed memory order.
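The single-pointer update discipline is easy to see in a small sketch (invented names; the grace-period machinery required before freeing the old version is the subject of the discussion below):

    #include <atomic>

    // A writer builds a new version privately and publishes it by swinging one
    // pointer; a reader dereferences that pointer exactly once per traversal.
    struct config { int a, b; };

    std::atomic<config*> current_cfg{new config{1, 2}};

    config reader_snapshot() {
        config* c = current_cfg.load(std::memory_order_acquire);  // single dereference point
        return *c;       // sees either the old or the new version, never a mix
    }

    config* writer_publish(int a, int b) {
        config* fresh = new config{a, b};                         // built while still private
        config* old = current_cfg.exchange(fresh, std::memory_order_acq_rel);
        return old;      // must not be deleted until a grace period has elapsed
    }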

Grace Periods and Relaxed Reclamation. In a language and system with automatic garbage collection, the delayed reclamation property is trivial: the normal collector will reclaim old data versions when—and only when—no readers can see them any more. In the more common case of manual memory management, a writer may wait until all readers of old data have completed, and then reclaim space itself. Alternatively, it may append old data to a list for eventual reclamation by some other, bookkeeping thread. The latter option reduces latency for writers, potentially improving performance.

Arguably the biggest differences among RCU implementations concern the “grace period” mechanism used (in the absence of a general-purpose garbage collector) to determine when all old readers have completed. In a nonpreemptive OS kernel (where RCU was first employed), the writer can simply wait until a (voluntary) context switch has occurred on every core. Perhaps the simplest way to do this is to request migration to each core in turn: such a request will be honored only after any active reader on the target core has completed.

More elaborate grace period implementations can be used in more general contexts. Desnoyers et al. [2012, App. D] describe several implementations suitable for user-level applications. Most revolve around a global counter C and a global set S of counters, indexed by thread id. C is monotonically increasing (extensions can accommodate rollover): in the simplest implementation, it is incremented at the end of each write operation. In a partial violation of the no-shared-updates property, S is maintained by readers. Specifically, S[i] will be zero if thread i is not currently executing a reader operation. Otherwise, S[i] will be j if C was j when thread i's current reader operation began. To ensure a grace period has passed (and all old readers have finished), a writer iterates through S, waiting for each element to be either zero or a value greater than or equal to the value just written to C. Assuming that each set element lies in a separate cache line, the updates performed by reader operations will usually be cache hits, with almost no performance impact.
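One possible user-level rendering of this counter-and-set scheme follows. It is a sketch with invented names, not the urcu library of Desnoyers et al.; rollover handling is omitted, thread ids are assumed to index the slot array directly, and the conservative fences correspond to the costs discussed under Memory Ordering below.

    #include <atomic>

    // Sketch of a counter-and-set grace-period scheme (illustrative only).
    struct rcu_domain {
        static const int MAX_THREADS = 128;
        std::atomic<unsigned long> C{1};
        struct alignas(64) slot { std::atomic<unsigned long> s{0}; };  // one line per reader
        slot S[MAX_THREADS];

        void read_lock(int tid) {
            S[tid].s.store(C.load(std::memory_order_relaxed), std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // WR fence; see Memory Ordering
        }
        void read_unlock(int tid) {
            S[tid].s.store(0, std::memory_order_release);
        }
        // Called by a writer after publishing its update and before reclaiming
        // the old version: wait until no reader can still hold a pre-update snapshot.
        void synchronize(int nthreads) {
            std::atomic_thread_fence(std::memory_order_seq_cst);  // order the update before the scan
            unsigned long target = C.fetch_add(1, std::memory_order_seq_cst) + 1;
            for (int i = 0; i < nthreads; ++i) {
                unsigned long v;
                while ((v = S[i].s.load(std::memory_order_acquire)) != 0 && v < target)
                    { }   // thread i is still in a read that began before our update
            }
        }
    };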

Memory Ordering. When beginning a user-level read operation with grace periods based on the global counter and set, a thread must issue a WR fence after updating its entry in S. At the end of the operation, it must issue a RW fence before updating its entry. 6.3. READ-COPY UPDATE 95 Within the read operation, it must issue a RR fence after dereferencing a pointer that might be updated by a writer. Among these fences, the WR case is typically the most expensive (the others will in fact be no-ops on a TSO machine). We can eliminate the WR fence in the common case by requiring the writer to interrupt all potential readers (e.g., with a Posix signal) at the end of a write operation. The signal handler can then “handshake” with the writer, with appropriate memory barriers, thereby ensuring (a) that each reader’s update of its element in S is visible to the writer, and (b) that the writer’s updates to shared data are visible to all readers. Assuming that writers are rare, the cost of the signal handling will be outweighed by the no-longer-required WR fences in the (much more numerous) reader operations.

Multi-Write Operations. The single-pointer update property may become a single-word update in data structures that use some other technique for traversal and space management. In many cases, it may also be possible to accommodate multi-word updates by exploiting application-specific knowledge. In their original paper, for example, McKenney et al. [2001] presented an RCU version of doubly-linked lists, in which a writer must update both forward and backward pointers. The secret here is that readers only search for nodes in the forward direction; the backward pointers simply facilitate constant-time deletion. Readers therefore serialize before or after a writer depending on whether they see the change to the forward pointer.

In general, readers may need to see a set of updates atomically. One can always ensure this by copying the entire data structure, updating the copy, and then swinging a single root pointer, but the copy will be very inefficient when the structure is large and the number of changes is small. Interestingly, as noted by Clements et al. [2012], a similar problem occurs in functional programming languages, where one often needs to create a new (logically immutable) version of a large object that differs from the old in only a few locations. Drawing on techniques developed in the functional programming community, Clements et al. show how to implement an RCU version of the balanced binary trees used to represent address spaces in the Linux kernel.

They begin by noting that an insertion or deletion in the tree can be made by changing a single pointer in a leaf. Any needed rebalancing can be delegated to separate operations. Moreover the rebalancing is semantically neutral: while costs may differ slightly, a reader is guaranteed to obtain the same result regardless of whether it runs before or after the rebalance operation (so long as it is atomic). In trees that are rebalanced with rotation operations, one can show that any given rotation will change only a small internal subtree with a single incoming pointer. One can thus effect an RCU rotation by creating a new version of this subtree and then swinging the pointer to its root. To rotate nodes x and z in Figure 6.7, for example, one creates new nodes x′ and z′, initializes their child pointers to refer to trees A, B, and C, and uses a CAS to swing p's left or right child pointer (as appropriate) to refer to x′ instead of z. Once a

Figure 6.7: Rebalancing of a binary tree via internal subtree replacement (rotation). Adapted from Clements et al. [2012, Fig. 8(b)]. Prior to the replacement, node z is the right child of node x. After the replacement, x′ is the left child of z′.

grace period has expired, nodes x and z can be reclaimed. In the meantime, readers that have traveled through x and z will still be able to search correctly down to the fringe of the tree.

In-Place Updates. As described above, RCU is designed to incur essentially no overhead for readers, at the expense of very high overhead for writers. In some cases, however, even this property can be relaxed. In the same paper that introduced RCU balanced trees, Clements et al. [2012] observe that trivial updates to page tables—specifically, single-leaf modifications associated with demand page-in—are sufficiently common to be a serious obstacle to scalability on large shared-memory multiprocessors. Their solution is essentially a hybrid of RCU and sequence locks. Major (multi-page) update operations continue to function as RCU writers: they exclude one another in time, install their changes via single-pointer update, and wait for a grace period before reclaiming no-longer-needed space. The page fault interrupt handler, however, functions as an RCU reader. If it needs to modify a page table entry to effect demand page in, it makes its modifications in place.

This convention introduces a variety of synchronization challenges. For example: a fault handler that overlaps in time with a major update (e.g., an munmap operation that invalidates a broad address range) may end up modifying the about-to-be-reclaimed version of a page table entry, in which case it should not return to the user program as if nothing had gone wrong. If each major update acquires and updates a (per-address-space) sequence lock, however, then the fault handler can check the value of the lock both before and after its operation. If the value has changed, it can retry—perhaps acquiring the lock itself if starvation might be a concern. Similarly, if fault handlers cannot safely run concurrently with one another (e.g., if they need to modify more than one word in memory), then they need their own synchronization—perhaps a separate sequence lock in each page table entry.

Bibliography

Bibliography

Nagi M. Aboulenein, James R. Goodman, Stein Gjessing, and Philip J. Woest. Hardware support for synchronization in the scalable coherent interface (SCI). In Proceedings of the Eighth International Parallel Processing Symposium (IPPS), pages 141–150, Cancun, Mexico, April 1994.

Ali-Reza Adl-Tabatabai, Tatiana Shpeisman, Justin Gottschlich, et al. Draft Specification of Transactional Language Constructs for C++, Version 1.1. Intel, Oracle, IBM, and Red Hat, February 2012.

Sarita V. Adve and Kourosh Gharachorloo. Shared memory consistency models: A tutorial. Computer, 29(12):66–76, December 1996.

Sarita V. Adve and Mark D. Hill. Weak ordering—a new definition. In Proceedings of the Seventeenth International Symposium on Computer Architecture (ISCA), pages 2–14, Seattle, WA, May 1990.

Sarita V. Adve, Vijay S. Pai, and Parthasarathy Ranganathan. Recent advances in memory consistency models for hardware shared-memory systems. Proceedings of the IEEE, 87(3):445–455, 1999.

Randy Allen and Ken Kennedy. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann, San Francisco, CA, 2002.

George S. Almasi and Allan Gottlieb. Highly Parallel Computing. Benjamin Cummings, Redwood City, CA, 1989.

Noga Alon, Amnon Barak, and Udi Manber. On disseminating information reliably without broadcasting. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), pages 74–81, Berlin, Germany, September 1987.

Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 1(1):6–16, January 1990.

Jonathan Appavoo, Marc Auslander, Maria Butrico, Dilma Da Silva, Orran Krieger, Mark Mergen, Michal Ostrowski, Bryan Rosenburg, Robert W. Wisniewski, and Jimi Xenidis. Experience with K42, an open-source Linux-compatible scalable operating system kernel. IBM Systems Journal, 44(2):427–440, 2005.

Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged M. Michael, and Martin Vechev. Laws of order: Expensive synchronization in concurrent algorithms cannot be eliminated. In Proceedings of the Thirty-eighth ACM Symposium on Principles of Programming Languages (POPL), pages 487–498, Austin, TX, January 2011.

Marc A. Auslander, David Joel Edelsohn, Orran Yaakov Krieger, Bryan Savoye Rosenburg, and Robert W. Wisniewski. Enhancement to the MCS lock for increased functionality and improved programmability. U. S. patent application number 20030200457 (abandoned), October 2003.

David Bacon, Joshua Bloch, Jeff Bogda, Cliff Click, Paul Haahr, Doug Lea, Tom May, Jan-Willem Maessen, Jeremy Manson, John D. Mitchell, Kelvin Nilsen, Bill Pugh, and Emin Gun Sirer. The ‘double-checked locking is broken’ declaration, 2001. www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html.

Rudolf Bayer and Mario Schkolnick. Concurrency of operations on B-trees. Acta Informatica, 9(1):1–21, 1977.

Mordechai Ben-Ari. Principles of Concurrent and Distributed Programming. Addison-Wesley, 2006.

Hans-J. Boehm. Can seqlocks get along with programming language memory models? In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 12–20, Beijing, China, June 2012.

Hans-J. Boehm and Sarita V. Adve. Foundations of the C++ concurrency memory model. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 68–78, Tucson, AZ, June 2008.

Eugene D. Brooks III. The butterfly barrier. International Journal of Parallel Programming, 15(4):295–307, August 1986.

Jehoshua Bruck, Danny Dolev, Ching-Tien Ho, Marcel-Cătălin Roşu, and Ray Strong. Efficient message passing interface (MPI) for parallel computing on clusters of workstations. In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures, pages 64–73, Santa Barbara, CA, July 1995.

James E. Burns and Nancy A. Lynch. Mutual exclusion using indivisible reads and writes. In Proceedings of the Eighteenth Annual Allerton Conference on Communication, Control, and Computing, pages 833–842, Monticello, IL, October 1980. A revised version of this paper was published as “Bounds on Shared Memory for Mutual Exclusion”, Information and Computation, 107(2):171–184, December 1993.

Dan Chazan and Willard L. Miranker. Chaotic relaxation. Linear Algebra and its Applications, 2(2):199–222, April 1969.

Austin T. Clements, M. Frans Kaashoek, and Nickolai Zeldovich. Scalable address spaces using RCU balanced trees. In Proceedings of the Seventeenth International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 199–210, London, United Kingdom, March 2012.

Cliff Click Jr. And now some hardware transactional memory comments. Author’s Blog, Azul Systems, February 2009. http://www.azulsystems.com/blog/cliff/2009-02-25-and-now-some-hardware-transactional-memory-comments.

Edward G. Coffman, Jr., Michael J. Elphick, and Arie Shoshani. System deadlocks. Computing Surveys, 3(2):67–78, June 1971.

Pierre-Jacques Courtois, F. Heymans, and David L. Parnas. Concurrent control with ‘readers’ and ‘writers’. Communications of the ACM, 14(10):667–668, October 1971.

Travis S. Craig. Building FIFO and priority-queueing spin locks from atomic swap. Tech- nical Report TR 93-02-02, University of Washington Computer Science Department, February 1993.

David E. Culler and Jaswinder Pal Singh. Parallel Computer Architecture: A Hard- ware/Software Approach. Morgan Kaufmann, San Francisco, CA, 1998. With Anoop Gupta.

Luke Dalessandro, Dave Dice, Michael L. Scott, Nir Shavit, and Michael F. Spear. Transactional mutex locks. In Proceedings of the Sixteenth International Euro-Par Conference, pages II:2–13, Ischia-Naples, Italy, August–September 2010a.

Luke Dalessandro, Michael L. Scott, and Michael F. Spear. Transactions as the foundation of a memory consistency model. In Proceedings of the Twenty-Fourth International Symposium on Distributed Computing (DISC), pages 20–34, Cambridge, MA, September 2010b.

Luke Dalessandro, Michael F. Spear, and Michael L. Scott. NOrec: Streamlining STM by abolishing ownership records. In Proceedings of the Fifteenth ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 67–78, Bangalore, India, January 2010c.

Mathieu Desnoyers, Paul E. McKenney, Alan S. Stern, Michel R. Dagenais, and Jonathan Walpole. User-level implementations of read-copy update. IEEE Transactions on Parallel and Distributed Systems, 23(2):375–382, February 2012.

Dave Dice, Hui Huang, and Mingyao Yang. Asymmetric Dekker synchronization. Lead author’s blog, July 2001. blogs.oracle.com/dave/resource/Asymmetric-Dekker-Synchronization.txt.

Dave Dice, Mark Moir, and William N. Scherer III. Quickly reacquirable locks. Technical report, Sun Microsystems Laboratories, 2003. Subject of U.S. Patent #7,814,488.

Dave Dice, Yossi Lev, Mark Moir, and Daniel Nussbaum. Early experience with a commer- cial hardware transactional memory implementation. In Proceedings of the Fourteenth International Symposium on Architectural Support for Programming Languages and Op- erating Systems (ASPLOS), pages 157–168, Washington, DC, March 2009.

David Dice, Virendra J. Marathe, and Nir Shavit. Lock cohorting: A general technique for designing NUMA locks. In Proceedings of the Seventeenth ACM Symposium on Princi- ples and Practice of Parallel Programming (PPoPP), pages 247–256, New Orleans, LA, February 2012.

Edsger W. Dijkstra. Een algorithme ter voorkoming van de dodelijke omarming. Technical Report EWD-108, IBM T. J. Watson Research Center, early 1960s. In Dutch. Circulated privately.

Edsger W. Dijkstra. The mathematics behind the banker’s algorithm. In Selected Writings on Computing: A Personal Perspective, pages 308–312. Springer-Verlag, 1982.

Edsger W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the ACM, 8(9):569, September 1965.

Edsger W. Dijkstra. The structure of the ‘THE’ multiprogramming system. Communications of the ACM, 11(5):341–346, May 1968a.

Edsger W. Dijkstra. Cooperating sequential processes. In F. Genuys, editor, Programming Languages, pages 43–112. Academic Press, London, United Kingdom, 1968b. Originally EWD-123, Technological University of Eindhoven, 1965.

Kapali P. Eswaran, Jim Gray, Raymond A. Lorie, and Irving L. Traiger. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11):624–633, November 1976.

Michael J. Fischer, Nancy A. Lynch, James E. Burns, and Allan Borodin. Resource allocation with immunity to limited process failure. In Proceedings of the Twentieth Annual Symposium on Foundations of Computer Science (FOCS), pages 234–254, San Juan, Puerto Rico, October 1979.

Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382, April 1985.

Nissim Francez. Fairness. Springer-Verlag, 1986.

Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, pages 212–223, Montreal, PQ, Canada, June 1998.

James R. Goodman. Using cache memory to reduce processor-memory traffic. In Pro- ceedings of the Tenth International Symposium on Computer Architecture (ISCA), pages 124–131, Stockholm, Sweden, June 1983.

James R. Goodman, Mary K. Vernon, and Philip J. Woest. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In Proceedings of the Third International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 64–75, Boston, MA, April 1989.

Allan Gottlieb, Ralph Grishman, Clyde P. Kruskal, Kevin P. McAuliffe, Larry Rudolph, and Marc Snir. The NYU Ultracomputer: Designing an MIMD shared memory parallel computer. IEEE Transactions on Computers, 32(2):175–189, February 1983.

Gary Graunke and Shreekant Thakkar. Synchronization algorithms for shared-memory multiprocessors. Computer, 23(6):60–69, June 1990.

Rajiv Gupta. The fuzzy barrier: A mechanism for high speed synchronization of processors. In Proceedings of the Third International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 54–63, Boston, MA, April 1989.

Rajiv Gupta and Charles R. Hill. A scalable implementation of barrier synchronization using an adaptive combining tree. International Journal of Parallel Programming, 18(3):161–180, June 1989.

Yijie Han and Raphael A. Finkel. An optimal scheme for disseminating information. In Proceedings of the International Conference on Parallel Processing (ICPP), pages II:198–203, University Park, PA, August 1988.

Tim Harris, James R. Larus, and Ravi Rajwar. Transactional Memory. Morgan & Claypool, San Francisco, CA, second edition, 2010. First edition, by Larus and Rajwar only, 2007.

Timothy L. Harris, Keir Fraser, and Ian A. Pratt. A practical multi-word compare-and-swap operation. In Proceedings of the Sixteenth International Symposium on Distributed Computing (DISC), pages 265–279, Toulouse, France, October 2002.

Debra A. Hensgen, Raphael A. Finkel, and Udi Manber. Two algorithms for barrier synchronization. International Journal of Parallel Programming, 17(1):1–17, February 1988.

Maurice Herlihy and J. Eliot B. Moss. Transactional memory: Architectural support for lock-free data structures. In Proceedings of the Twentieth International Symposium on Computer Architecture (ISCA), pages 289–300, San Diego, CA, May 1993.

Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, San Francisco, CA, 2008.

Maurice Herlihy, Victor Luchangco, and Mark Moir. The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures. In Proceedings of the Sixteenth International Symposium on Distributed Computing (DISC), pages 339–353, Toulouse, France, October 2002.

Maurice Herlihy, Victor Luchangco, and Mark Moir. Obstruction-free synchronization: Double-ended queues as an example. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), pages 522–529, Providence, RI, May 2003a.

Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer III. Software transactional memory for dynamic-sized data structures. In Proceedings of the Twenty-second ACM Symposium on Principles of Distributed Computing (PODC), pages 92–101, Boston, MA, July 2003b.

Maurice P. Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124–149, January 1991.

Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, July 1990.

Mark D. Hill. Multiprocessors should support simple memory-consistency models. Computer, 31(8):28–34, August 1998.

IBM. System/370 Principles of Operation. IBM Corporation, ninth edition, October 1981.

Jean Ichbiah, John G. P. Barnes, Robert J. Firth, and Mike Woodger. Rationale for the Design of the Ada Programming Language. Cambridge University Press, Cambridge, England, 1991. ISBN 0-521-39267-5.

Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, December 2011. Order number 325462-041US.

Prasad Jayanti and Srdjan Petrovic. Efficient and practical constructions of LL/SC variables. In Proceedings of the Twenty-second ACM Symposium on Principles of Distributed Computing (PODC), pages 285–294, Boston, MA, July 2003.

Eric H. Jensen, Gary W. Hagensen, and Jeffrey M. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report UCRL-97663, Lawrence Livermore National Laboratory, November 1987.

JRuby. Community website. jruby.org/.

Jython. Project website. jython.org/.

KSR. KSR1 Principles of Operation. Kendall Square Research, Waltham, MA, 1992.

Alex Kogan and Erez Petrank. A methodology for creating fast wait-free data structures. In Proceedings of the Seventeenth ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 141–150, New Orleans, LA, February 2012.

Eric Koskinen and Maurice Herlihy. Dreadlocks: Efficient deadlock detection. In Proceedings of the Twentieth International Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 297–303, Munich, Germany, June 2008.

Clyde P. Kruskal, Larry Rudolph, and Marc Snir. Efficient synchronization on multiprocessors with shared memory. ACM Transactions on Programming Languages and Systems, 10(4):579–601, October 1988.

Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. Journal of the ACM, 27(4):831–838, October 1980.

Christoph Lameter. Effective synchronization on Linux/NUMA systems. In Proceedings of the Gelato Federation Meeting, San Jose, CA, May 2005.

Leslie Lamport. A new solution of Dijkstra’s concurrent programming problem. Communications of the ACM, 17(8):453–455, August 1974.

Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.

Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690–691, September 1979.

Leslie Lamport. A fast mutual exclusion algorithm. ACM Transactions on Computer Systems, 5(1):1–11, February 1987.

Doug Lea. The JSR-133 cookbook for compiler writers, March 2001. g.oswego.edu/dl/jmm/cookbook.h

Craig A. Lee. Barrier synchronization over multistage interconnection networks. In Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing (SPDP), pages 130–133, Dallas, TX, December 1990.

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica S. Lam. The Stanford Dash multiprocessor. Computer, 25(3):63–79, March 1992.

Boris D. Lubachevsky. Synchronization barrier and related tools for shared memory parallel programming. In Proceedings of the International Conference on Parallel Processing (ICPP), pages II:175–179, University Park, PA, August 1989.

Nancy Lynch. Distributed Algorithms. Morgan Kaufmann, San Francisco, CA, 1996.

Peter Magnussen, Anders Landin, and Erik Hagersten. Queue locks on cache coherent multiprocessors. In Proceedings of the Eighth International Parallel Processing Symposium (IPPS), pages 165–171, Cancun, Mexico, April 1994.

Jeremy Manson, William Pugh, and Sarita V. Adve. The Java memory model. In Proceedings of the Thirty-second ACM Symposium on Principles of Programming Languages (POPL), pages 378–391, Long Beach, CA, January 2005.

Virendra J. Marathe and Mark Moir. Toward high performance nonblocking software transactional memory. In Proceedings of the Thirteenth ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 227–236, Salt Lake City, UT, February 2008.

Virendra J. Marathe, William N. Scherer III, and Michael L. Scott. Adaptive software transactional memory. In Proceedings of the Nineteenth International Symposium on Distributed Computing (DISC), pages 354–368, Cracow, Poland, September 2005.

Evangelos P. Markatos. Multiprocessor synchronization primitives with priorities. In Proceedings of the Eighth IEEE Workshop on Real-Time Operating Systems and Software, pages 1–7, Atlanta, GA, May 1991.

Paul E. McKenney. Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels. PhD thesis, OGI School of Science and Engineering, Oregon Health and Science University, July 2004.

Paul E. McKenney, Jonathan Appavoo, Andi Kleen, Orran Krieger, Rusty Russell, Dipankar Sarma, and Maneesh Soni. Read-copy update. In Proceedings of the Ottawa Linux Symposium, pages 338–367, Ottawa, ON, Canada, July 2001.

John M. Mellor-Crummey and Michael L. Scott. Scalable reader-writer synchronization for shared-memory multiprocessors. In Proceedings of the Third ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 106–113, Williamsburg, VA, April 1991a.

John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21–65, February 1991b.

Michael Merritt and Gadi Taubenfeld. Computing with infinitely many processes. In Proceedings of the Fourteenth International Symposium on Distributed Computing (DISC), pages 164–178, Toledo, Spain, October 2000.

Robert M. Metcalfe and David R. Boggs. Ethernet: Distributed packet switching for local computer networks. Communications of the ACM, 19(7):395–404, July 1976.

Maged M. Michael. Hazard pointers: Safe memory reclamation for lock-free objects. IEEE Transactions on Parallel and Distributed Systems, 15(8):491–504, August 2004a.

Maged M. Michael. ABA prevention using single-word instructions. Technical Report RC23089, IBM T. J. Watson Research Center, January 2004b.

Maged M. Michael and Michael L. Scott. Nonblocking algorithms and preemption-safe locking on multiprogrammed shared memory multiprocessors. Journal of Parallel and Distributed Computing, 51(1):1–26, 1998.

Mark Moir and James H. Anderson. Wait-free algorithms for fast, long-lived renaming. Science of Computer Programming, 25(1):1–39, October 1995.

Mark Moir and Nir Shavit. Concurrent data structures. In Dinesh P. Mehta and Sartaj Sahni, editors, Handbook of Data Structures and Applications, chapter 47. Chapman and Hall / CRC Press, San Jose, CA, 2005.

Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA, 1997.

Christos H. Papadimitriou. The serializability of concurrent database updates. Journal of the ACM, 26(4):631–653, October 1979.

Mark S. Papamarcos and Janak H. Patel. A low-overhead coherence solution for multiprocessors with private cache memories. In Proceedings of the Eleventh International Symposium on Computer Architecture (ISCA), pages 348–354, 1984.

Gary L. Peterson. Myths about the mutual exclusion problem. Information Processing Letters, 12(3):115–116, June 1981.

Gary L. Peterson and Michael J. Fischer. Economical solutions for the critical section problem in a distributed system. In Proceedings of the Ninth ACM Symposium on the Theory of Computing (STOC), pages 91–97, Boulder, CO, May 1977.

Zoran Radović and Erik Hagersten. Hierarchical backoff locks for nonuniform communication architectures. In Proceedings of the Ninth International Symposium on High Performance Computer Architecture (HPCA), pages 241–252, Anaheim, CA, February 2003.

Zoran Radović and Erik Hagersten. Efficient synchronization for nonuniform communication architectures. In Proceedings, Supercomputing 2002 (SC), pages 1–13, Baltimore, MD, November 2002.

David P. Reed and Rajendra K. Kanodia. Synchronization with eventcounts and sequencers. Communications of the ACM, 22(2):115–123, February 1979.

Larry Rudolph and Zary Segall. Dynamic decentralized cache schemes for MIMD parallel processors. In Proceedings of the Eleventh International Symposium on Computer Architecture (ISCA), pages 340–347, Ann Arbor, MI, June 1984.

Kenneth Russell and David Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In Proceedings of the Twenty-first Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications (OOPSLA), pages 263–272, Portland, OR, October 2006.

Fred B. Schneider. On Concurrent Programming. Springer-Verlag, 1997.

Michael L. Scott and John M. Mellor-Crummey. Fast, contention-free combining tree barriers for shared-memory multiprocessors. International Journal of Parallel Programming, 22(4):449–481, August 1994.

Michael L. Scott and Maged M. Michael. The topological barrier: A synchronization abstraction for regularly-structured parallel applications. Technical Report TR 605, Department of Computer Science, University of Rochester, January 1996.

Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. x86-TSO: A rigorous and usable programmer’s model for x86 multiprocessors. Communications of the ACM, 53(7):89–97, July 2010.

Jun Shirako, David Peixotto, Vivek Sarkar, and William N. Scherer III. Phasers: A unified deadlock-free construct for collective and point-to-point synchronization. In Proceedings of the International Conference on Supercomputing (ICS), pages 277–288, Island of Kos, Greece, June 2008.

Daniel J. Sorin, Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool, San Francisco, CA, 2011.

Janice M. Stone, Harold S. Stone, Philip Heidelberger, and John Turek. Multiple reservations and the Oklahoma update. IEEE Parallel and Distributed Technology, 1(4):58–71, November 1993.

Håkan Sundell and Philippas Tsigas. NOBLE: Non-blocking programming support via lock-free shared abstract data types. Computer Architecture News, 36(5):80–87, December 2008.

Peiyi Tang and Pen-Chung Yew. Software combining algorithms for distributing hot-spot addressing. Journal of Parallel and Distributed Computing, 10(2):130–139, 1990.

Gadi Taubenfeld. Shared memory synchronization. Bulletin of the European Association for Theoretical Computer Science (BEATCS), (96):81–103, October 2008.

Gadi Taubenfeld. The black-white bakery algorithm. In Proceedings of the Eighteenth International Symposium on Distributed Computing (DISC), pages 56–70, Amsterdam, The Netherlands, October 2004.

Gadi Taubenfeld. Synchronization Algorithms and Concurrent Programming. Pearson Education–Prentice-Hall, 2006.

R. Kent Treiber. Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, April 1986.

Nalini Vasudevan, Kedar S. Namjoshi, and Stephen A. Edwards. Simple and fast biased locks. In Proceedings of the Nineteenth International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 65–74, Vienna, Austria, September 2010.

William E. Weihl. Local atomicity properties: Modular concurrency control for abstract data types. ACM Transactions on Programming Languages and Systems, 11(2):249–282, February 1989.

Philip J. Woest and James R. Goodman. An analysis of synchronization mechanisms in shared memory multiprocessors. In Proceedings of the International Symposium on Shared Memory Multiprocessing (ISSMM), pages 152–165, Tokyo, Japan, April 1991.

Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor. IEEE Micro, 16(2):28–40, April 1996.

Pen-Chung Yew, Nian-Feng Tzeng, and Duncan H. Lawrie. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Transactions on Computers, 36(4):388–395, April 1987.

Author’s Biography

MICHAEL L. SCOTT

Michael L. Scott is a Professor and past Chair of the Department of Computer Science at the University of Rochester. He received his Ph.D. from the University of Wisconsin–Madison in 1985. His research interests span operating systems, languages, architecture, and tools, with a particular emphasis on parallel and distributed systems. He is best known for work in synchronization algorithms and concurrent data structures, in recognition of which he shared the 2006 SIGACT/SIGOPS Edsger W. Dijkstra Prize in Distributed Computing. Other widely cited work has addressed parallel operating systems and file systems, software distributed shared memory, and energy-conscious operating systems and microarchitecture. His textbook on programming language design and implementation (Programming Language Pragmatics, third edition, Morgan Kaufmann, Feb. 2009) is a standard in the field. In 2003 he served as General Chair for SOSP; more recently he has been Program Chair for TRANSACT ’07, PPoPP ’08, and ASPLOS ’12. He was named a Fellow of the ACM in 2006 and of the IEEE in 2010. In 2001 he received the University of Rochester’s Robert and Pamela Goergen Award for Distinguished Achievement and Artistry in Undergraduate Teaching.