Time Complexity Bounds for Shared-Memory Mutual Exclusion
Total Page:16
File Type:pdf, Size:1020Kb
Time Complexity Bounds for Shared-memory Mutual Exclusion by Yong-Jik Kim A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science. Chapel Hill 2003 Approved by: James H. Anderson, Advisor Sanjoy K. Baruah, Reader Jan F. Prins, Reader Michael Merritt, Reader Lars S. Nyland, Reader Jack Snoeyink, Reader ii c 2003 Yong-Jik Kim ALL RIGHTS RESERVED iii ABSTRACT YONG-JIK KIM: Time Complexity Bounds for Shared-memory Mutual Exclusion. (Under the direction of James H. Anderson.) Mutual exclusion algorithms are used to resolve conflicting accesses to shared re- sources by concurrent processes. The problem of designing such an algorithm is widely regarded as one of the “classic” problems in concurrent programming. Recent work on scalable shared-memory mutual exclusion algorithms has shown that the most crucial factor in determining an algorithm’s performance is the amount of traffic it generates on the processors-to-memory interconnect [23, 38, 61, 84]. In light of this, the RMR (remote-memory-reference) time complexity measure was proposed [84]. Under this measure, an algorithm’s time complexity is defined to be the worst-case number of remote memory references required by one process in order to enter and then exit its critical section. In the study of shared-memory mutual exclusion algorithms, the following funda- mental question arises: for a given system model, what is the most efficient mutual exclusion algorithm that can be designed under the RMR measure? This question is important because its answer enables us to compare the cost of synchronization in dif- ferent systems with greater accuracy. With some primitives, constant-time algorithms are known, so the answer to this question is trivial. However, with other primitives (e.g., under read/write atomicity), there are still gaps between the best known algorithms and lower bounds. In this dissertation, we address this question. The main thesis to be supported by this dissertation can be summarized as follows. The mutual exclusion problem exhibits different time-complexity bounds as a function of both the number of processes (N) and the number of simultaneously active processes (contention, k), depending on the available synchronization primitives and the underlying system model. Moreover, these time-complexity bounds are nontrivial, in that constant-time algorithms are impossible in many cases. In support of this thesis, we present a time-complexity lower bound of Ω(log N/ log log N) for systems using atomic reads, writes, and comparison primitives, such as compare-and-swap. This bound is within a factor of Θ(log log N) of being optimal. iv Given that constant-time algorithms based on fetch-and-φ primitives exist, this lower bound points to an unexpected weakness of compare-and-swap, which is widely regarded as being the most useful of all primitives to provide in hardware. We also present an adaptive algorithm with Θ(min(k, log N)) RMR time complexity under read/write atomicity, where k is contention. In addition, we present another lower bound that precludes the possibility of an o(k) algorithm for such systems, even if comparison primitives are allowed. Regarding nonatomic systems, we present a Θ(log N) nonatomic algorithm, and show that adaptive mutual exclusion is impossible in such systems by proving that any nonatomic algorithm must have a single-process execution that accesses Ω(log N/ log log N)distinctvariables. Finally, we present a generic fetch-and-φ-based local-spin mutual exclusion algo- rithm with Θ(logr N) RMR time complexity. This algorithm is “generic” in the sense that it can be implemented using any fetch-and-φ primitive of rank r,where2≤ r<N. The rank of a fetch-and-φ primitive expresses the extent to which processes may “order themselves” using that primitive. For primitives that meet a certain additional con- dition, we present a Θ(log N/ log log N) algorithm, which is time-optimal for certain primitives of constant rank. v ACKNOWLEDGMENTS First, I want to thank my advisor, Jim Anderson, for his support and enthusiasm. I have learned an enormous amount from working with him, and this dissertation would not have been possible without his far-reaching insight and constant encouragement. I also want to thank Michael Merritt, who took the trouble of reading this large dissertation and flying to attend my defense. Thanks, too, to the rest of my committee: Jan Prins, Jack Snoeyink, Sanjoy Baruah, and Lars Nyland. I greatly appreciate their willingness to bend their schedules to accommodate countless meetings and exams, and also their support and encouragement throughout the process. I would also like to thank Guido Gerig, my ex-advisor, and people in the medical imaging group, for being supportive and friendly throughout my stay at UNC. In addition, I would like to thank people in the distributed algorithms research community for lively discussions and reviewing earlier versions of the results presented in this dissertation. I cannot hope to recall everyone who should be mentioned, but the list would surely include the following people: Hagit Attiya, Faith Fich, Maurice Herlihy, Leslie Lamport, Victor Luchangco, Maged Michael, Mark Moir, Eric Ruppert, Jennifer Welch, and many others. Many thanks to Anand Srinivasan, Phil Holman, and Shelby Funk, for their friend- ship and their help in term projects, writing papers, and other things. Lucia Cevidanes, I enjoyed working with you; I hope you will find a better matlab programmer who can decipher my code! Special thanks to the Korean friends who made my stay in Chapel Hill a lot more homelike: Juhyun, Sungeui, Joohee (of the math department), Joohi, Sang-Uok and Hye-Chung, In-Kyung, Hyemi Choi, Kuan-Hui Lee, and (of course) many, many others. No words will be enough to thank my family for their love and support. Thank you, Mom and Dad, Namhee, Heeyeon, and Yonggon. I love all of you. Finally, I would like to thank Eunjung, my lovely wife. Throughout the years we have been together, she has always been kind, understanding, and inspiring. Without her, I would have missed much of what life is. vi CONTENTS LIST OF TABLES x LIST OF FIGURES xi 1 Introduction 1 1.1KnownLocal-spinAlgorithms....................... 4 1.2KnownAdaptiveAlgorithms........................ 5 1.3KnownNonatomicAlgorithms....................... 6 1.4Contributions................................ 6 1.4.1 GeneralMutualExclusion..................... 8 1.4.2 AdaptiveMutualExclusion.................... 9 1.4.3 NonatomicMutualExclusion................... 9 1.4.4 Mutual Exclusion with fetch-and-φ Primitives . 10 1.5Organization................................ 11 2 Related Work 12 2.1Preliminaries................................ 13 2.2Local-spinAlgorithms........................... 16 2.2.1 Algorithms that use fetch-and-φ Primitives............ 17 2.2.2 Algorithms that use Only Reads and Writes . 23 2.3FastMutualExclusion........................... 26 2.3.1 Algorithm L: Lamport’s Fast Mutual Exclusion Algorithm . 27 2.3.2 Fast Mutual Exclusion with Local Spinning . 29 2.4AdaptiveAlgorithms............................ 30 vii 2.4.1 AdaptiveAlgorithmsusingFilters................. 32 2.4.2 AnAdaptiveBakeryAlgorithm.................. 36 2.4.3 Adaptive Mutual Exclusion with Infinitely Many Processes . 37 2.4.4 Local-spinAdaptiveAlgorithms.................. 38 2.5MutualExclusionwithNonatomicOperations.............. 38 2.6 Time- and Space-complexity Lower Bounds . 41 2.6.1 OverviewofProofMethodologies................. 41 2.6.2 Lower-boundResults........................ 42 3 Space-optimal Mutual Exclusion Under Read/Write Atomicity 47 3.1 Transformation of Algorithm YA-N .................. 48 3.2TimeComplexity.............................. 55 3.3ConcludingRemarks............................ 57 4 Adaptive Mutual Exclusion Under Read/Write Atomicity 59 4.1AdaptiveAlgorithm............................. 61 4.1.1 RelatedAlgorithmsandKeyIdeas................ 61 4.1.2 Algorithm A-U: Unbounded Algorithm . 65 4.1.3 Algorithm A-B: Bounded Algorithm . 73 4.1.4 Algorithm A-LS:Linear-spaceAlgorithm........... 75 4.2ConcludingRemarks............................ 83 5 Time-complexity Lower Bound for General Mutual Exclusion 85 5.1Definitions.................................. 87 5.1.1 AtomicShared-memorySystems.................. 87 5.1.2 Mutual-exclusionSystems..................... 93 5.1.3 Cache-coherentSystems...................... 95 5.2Lower-boundProofStrategy........................ 99 5.2.1 Process Groups and Regular Computations . 99 5.2.2 DetailedProofOverview...................... 103 5.3DetailedLower-boundProof........................ 110 5.4 Constant-time Algorithm for LFCU Systems . 137 viii 5.5ConcludingRemarks............................ 139 6 Time-complexity Lower Bound for Adaptive Mutual Exclusion 142 6.1ProofStrategy................................ 143 6.2DetailedLower-boundProof........................ 148 6.3ConcludingRemarks............................ 167 7 Algorithm and Time-complexity Lower Bound for Nonatomic Sys- tems 169 7.1NonatomicAlgorithm............................ 172 7.2NonatomicShared-memorySystems.................... 175 7.3 Lower Bound: Proof Sketch . 180 7.3.1 BriefOverview........................... 181 7.3.2 FormalDefinitions......................... 190 7.3.3 DetailedProofOverview...................... 198 7.4ConcludingRemarks............................ 208 8 Generic Algorithm for fetch-and-φ Primitives