Non-Blocking Data Structures Handling Multiple Changes Atomically Niloufar Shafiei a Dissertation Submitted to the Faculty of Gr

NON-BLOCKING DATA STRUCTURES HANDLING MULTIPLE CHANGES ATOMICALLY NILOUFAR SHAFIEI A DISSERTATION SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY GRADUATE PROGRAM IN DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING YORK UNIVERSITY TORONTO, ONTARIO JULY 2015 c NILOUFAR SHAFIEI, 2015 Abstract Here, we propose a new approach to design non-blocking algorithms that can apply multiple changes to a shared data structure atomically using Compare&Swap (CAS) instructions. We applied our approach to two data structures, doubly-linked lists and Patricia tries. In our implementations, only update operations perform CAS instructions; operations other than updates perform only reads of shared memory. Our doubly-linked list implements a novel specification that is designed to make it easy to use as a black box in a concurrent setting. In our doubly-linked list implementation, each process accesses the list via a cursor, which is an object in the process's local memory that is located at an item in the list. Our specification describes how updates affect cursors and how a process gets feedback about other processes' updates at the location of its cursor. We provide a detailed proof of correctness for our list implementation. We also give an amortized analysis for our list implementation, which is the first upper bound on amortized time complexity that has been proved for a concurrent doubly-linked list. In addition, we evaluate ii our list algorithms on a multi-core system empirically to show that they are scalable in practice. Our non-blocking Patricia trie implementation stores a set of keys, represented as bit strings, and allows processes to concurrently insert, delete and find keys. In addition, our implementation supports the replace operation, which deletes a key from the trie and adds a new key to the trie simultaneously. Since the correctness proof of our trie is similar to the correctness proof of our list implementation, we only provide a sketch of a correctness proof of our trie implementation here. We empirically evaluate our trie and compare our trie to some existing set implementations. Our empirical results show that our Patricia trie implementation consistently performs well under different scenarios. iii I dedicated this thesis to my son, Ryan, who brings endless joy to our lives. iv Acknowledgements I would like to express my deepest gratitude to my supervisor, Eric Ruppert, for his tremendous guidance, motivation and support over years and for understanding my complicated life. He taught me how to conduct research and approach a prob- lem from different points of views and how to be precise in conducting research. I believe these skills are also very useful throughout life. I would also like to thank Frank Van Breugel for great comments that he provided during writing the thesis as a supervisory committee member. Besides, I thank Rachid Geurraoui, Patrick Dymond, Nantel Bergeron and Suprakash Datta for agreeing to be on the committee and giving great comments to improve the thesis. I thank Trevor Brown for providing lots of help and code for experiments in Section 9.4 and Michael L. Scott for giving us access to his multicore machines. At last but not least, I thank my family. I thank my mother, Forough, for supporting me in any way she could, so I was able to spend more time on conducting research and writing the thesis. Besides, I thank my husband, Ali, my father, v Mansour and my sister, Nastartan, for their support and encouragement. vi Table of Contents Abstract ii Dedication iv Acknowledgementsv Table of Contents vii List of Tablesx List of Figures xii 1 Introduction1 1.1 Contributions..............................7 2 Simple Examples of Non-blocking Implementations 14 2.1 Non-blocking Stack Implementation.................. 14 2.2 Non-blocking Queue Implementation................. 19 vii 3 Related Work 22 3.1 Multi-word Compare&Swap...................... 23 3.2 Lists................................... 29 3.3 Trees................................... 36 4 Formal Definitions 39 5 General Approach 45 6 Doubly-linked List Implementation 50 6.1 Sequential Specification......................... 52 6.2 Overview of How Updates Are Performed............... 58 6.3 Representation of the List in Memory................. 63 6.4 Descriptions of Algorithms....................... 66 7 Correctness Proof of Doubly-linked List 75 7.1 Basic Invariants............................. 77 7.2 Behaviour of Flag CAS Steps..................... 92 7.3 Behaviour of Pointer CAS Steps.................... 105 7.4 Linearizability.............................. 145 8 Performance of Doubly-linked List 195 8.1 Amortized Analysis of the Doubly-linked List............ 196 viii 8.1.1 The Potential Function..................... 199 8.1.2 Changes to Φ by Steps within UpdateCursor ....... 208 8.1.3 Changes to Φ by Steps Belonging to Move Operations.... 209 8.1.4 Changes to Φ by Steps that Belong to Update Operations. 211 8.1.5 Summing Up.......................... 245 8.2 The Results of Empirical Evaluation of Doubly-linked List..... 246 9 Patricia Trie Implementation 253 9.1 Representation of the Trie in Memory................. 263 9.2 Algorithm Descriptions......................... 266 9.3 Sketch of Correctness Proof of Patricia Trie............. 277 9.4 Empirical Evaluation of Patricia Trie................. 289 10 Conclusion 295 Bibliography 300 ix List of Tables 3.1 The number of CAS steps that a k-word CAS implementation performs in absence of contention..................... 29 3.2 Implementations of doubly-linked lists................ 34 6.1 Effects of InitializeCursor, DestroyCursor, ResetCursor, Get and InsertBefore operations (c0 is a cursor such that c 6= c0 and c:item = c0:item).......................... 55 6.2 Effects of Delete and MoveRight operations (c0 is a cursor such that c 6= c0 and c:item = c0:item)................... 56 6.3 Effects of MoveLeft operations................... 57 8.1 The steps that might change Φ.................... 207 8.2 Changes to Φ by steps wihtin UpdateCursor (Lemma 8.4).... 208 8.3 CheckInfo returns false on line 95 (Lemma 8.12)......... 218 8.4 CheckInfo returns false on line 100 (Lemma 8.12)......... 219 x 8.5 CheckInfo returns false on line 97 (Lemma 8.13)......... 221 8.6 The attempt att fails because it fails to flag Iatt:nodes[0] (Lemma 8.14)223 8.7 The attempt att fails because it fails to flag Iatt:nodes[1] (Lemma 8.18)231 8.8 The attempt att fails because it fails to flag Iatt:nodes[2] (Lemma 8.18)231 8.9 The attempt att's call to Help(Iatt) on line 34 or 46 returns true (Lemma 8.20).............................. 235 xi List of Figures 2.1 The stack containing x and y ..................... 15 2.2 The stack after the Push operation.................. 15 2.3 The stack after the Pop operation.................. 16 2.4 The Pop operation........................... 16 2.5 The Push operation.......................... 17 2.6 The pseudo-code of the non-blocking stack implementation..... 18 2.7 The Dequeue operation........................ 20 2.8 The Enqueue operation........................ 21 3.1 A doubly-linked list containing four nodes.............. 32 3.2 After deletion of C........................... 32 3.3 Incorrect result of deleting B and C concurrently.......... 32 4.1 Steps of CAS.............................. 40 xii 4.2 An execution that corresponds to the history H (Each rectangle represents an operation and each dot inside a rectangle shows the linearization point of the operation.)................. 43 4.3 An execution that corresponds to the history H0 ........... 44 6.1 The list containing three Nodes.................... 58 6.2 Removing the Node B from the list.................. 58 6.3 Inserting the new Node D between A and B (standard sequential approach)................................ 59 6.4 The Insertion swings B:prv from A to D incorrectly......... 59 6.5 Inserting the new Node D between A and B (our approach, which creates a new copy B0 of B)...................... 61 6.6 Steps of Delete operation....................... 62 6.7 Steps of InsertBefore operation.................. 63 6.8 Object types used to implement doubly-linked lists......... 64 6.9 Initialization of the doubly-linked list................. 65 6.10 An example of a call to UpdateCursor ............... 72 8.1 Ratio: i5-d5-m90............................ 248 8.2 Ratio: i30-d30-m40........................... 249 8.3 Sorted List................................ 250 xiii 9.1 An example of a Patricia trie. Leaves are represented by squares and internal nodes are represented by circles................ 254 9.2 Removing the key 1100 from the trie................. 257 9.3 Delete(v) Triangles are either a leaf node or a subtree. The grey circles are flagged nodes. The dotted lines are the new child pointers that replace the old child pointers (solid lines)............ 257 9.4 Inserting the key 010 into the trie................... 258 9.5 Inserting the key 1110 into the trie.................. 259 9.6 Different cases of Insert(v). The dotted circles and squares are newly created nodes........................... 259 9.7 Special cases of Replace(vd; vi).................... 261 9.8 Replacing the key 011 with the key 010 (Special case 1)....... 261 9.9 Replacing the key 1010 with the key 1000 (Special case 3)..... 262 9.10 Replacing the key 1100 with the key 1111 (Special case 4)..... 262 9.11 Object types used to implement Patricia trie............. 264 9.12 Initialization of the Patricia trie.................... 265 9.13 The correct order of steps inside Help(I) for each Flag object I. (Steps can be performed by different calls to Help(I).)...... 279 9.14 Uniformly distributed keys with key range (0; 106).......... 291 9.15 Uniformly distributed keys with key range (0; 102).......... 292 xiv 9.16 Replace operations of PAT....................... 293 9.17 Non-uniformly distributed keys (The lines for 4-ST, BST, AVL and SL overlap.)............................... 294 xv 1 Introduction The first computers were developed with a single central processing unit.

Non-Blocking Data Structures Handling Multiple Changes Atomically Niloufar Shafiei a Dissertation Submitted to the Faculty of Gr

(12) Patent Application Publication (10) Pub. No.: US 2004/0107227 A1 Michael (43) Pub

Practical Parallel Data Structures

Contention Resistant Non-Blocking Priority Queues

Efficient and Practical Non-Blocking Data Structures

On the Design and Implementation of an Efficient Lock-Free Scheduler

High Performance Dynamic Lock-Free Hash Tables and List-Based Sets

Verifying Concurrent Memory Reclamation Algorithms with Grace

Verifying Concurrent Memory Reclamation Algorithms with Grace

Memory Reclamation Strategies for Lock-Free and Concurrently-Readable Data Structures

Distributed Computing with Modern Shared Memory