SCALING SYNCHRONIZATION PRIMITIVES

A Dissertation Presented to The Academic Faculty

By

Sanidhya Kashyap

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Computer Science

Georgia Institute of Technology

August 2020

Copyright © Sanidhya Kashyap 2020

Approved by:

Dr. Taesoo Kim, Advisor
School of Computer Science
Georgia Institute of Technology

Dr. Changwoo Min, Co-advisor
The Bradley Department of Electrical and Computer Engineering
Virginia Tech

Dr. Ada Gavrilovska
School of Computer Science
Georgia Institute of Technology

Dr. Irina Calciu
VMware Research

Dr. Joy Arulraj
School of Computer Science
Georgia Institute of Technology

Date Approved: June 11, 2020

To maa: for her strength, wisdom, and love. To papa: for his unconditional belief in me. To my brother: for being my critic. To my background.

ACKNOWLEDGEMENTS

I am fortunate to have had the opportunity to collaborate with and learn from amazing people. First, I owe my deepest gratitude to my advisor, Taesoo Kim. Throughout my Ph.D., Taesoo gave me an ample amount of freedom to explore ideas and collaborate with anyone. I am never going to forget these words: “Do whatever you want to do.” His in-depth technical insight, concrete feedback, advice, encouragement, and guidance have led me to where I am today. I hope one day, I can be a good advisor to my future students, as Taesoo has been to me.

Another person who had an immeasurable impact on my life is my co-advisor: Changwoo Min. His vast knowledge of designing parallel systems was essential to realizing any of this work. I am never going to forget our discussions, which I still miss. I used to bug him almost every day during his postdoc days, and I still do when I am excited to share some new ideas. I am immensely grateful to him, as he was my mentor, friend, and guide, all at the same time, and continues to be so.

In the later part of my Ph.D., I was fortunate to work with Irina Calciu. Working with Irina always kept me on edge, as she asked many (difficult) questions that made me rethink the work. I hope this continues for the foreseeable future. I also want to thank the other members of my committee: Ada Gavrilovska and Joy Arulraj. Both of them were always available to discuss ideas and even my career options.

I want to thank all the amazing folks that I got to know, learn from, and work with through these years, notably: Sudarsun Kannan, Virendra Maratha, Seulbae Kim, Meng Xu, Insu Yun, Mingwei Shih, Steffen Maass, Mohan Kumar, Tushar Krishna, Fan Sang, Ren Ding, Kyuhong Park, Pradeep Fernando, Mansour Alharthi, Hong Hu, Woonhak Kang, Byoungyoung Lee, Chengyu Song, Wen Xu, Kangnyeon Kim, Hyungon Moon, Jean Pierre Lozi, Margo Seltzer, Alex Kogan, Dave Dice, Hanqing Zhao, Se Kwon Lee, Soujanya Ponnapalli, Madhavan Krishnan Ramanathan, Sujin Park, Chulwon Kang, and Xiaohe Cheng. I am really grateful to Seulbae for being a good friend. I want to thank our previous Ph.D. coordinator, Venkat, for his help in navigating various Ph.D.-related administrative issues and even suggesting I work with Taesoo. I am also grateful to our administrative problem-solvers, who made my life easier, especially Elizabeth Ndongi, Trinh Doan, and Sue Jean Chae.

Finally, I want to thank my parents for their support and patience along this journey. I want to thank my friend, Jaspal, who has been there through thick and thin, remotely. I am especially grateful to my brother, from whom I have a lot to learn in the arena of life, philosophy, and about myself.

TABLE OF CONTENTS

Acknowledgments...... iv

List of Tables...... x

List of Figures...... xii

List of Pseudo-Code...... xvii

Summary...... xviii

Chapter 1: Introduction...... 1

1.1 Ordering in Concurrency Control Algorithms ...... 3

1.2 Double Scheduling in Virtualization ...... 4

1.3 Scalable and Practical Locking Primitives ...... 4

1.4 Outline and Contributions...... 5

Chapter 2: Background and Motivation...... 6

2.1 A Primer on Multicore Machines...... 6

2.2 Ordering in Concurrency...... 7

2.2.1 Multicore Hardware Clocks ...... 8

2.3 Locking Primitives ...... 9

2.3.1 Evolution of Lock design...... 10

2.3.2 Locks in the kernel space (Linux) ...... 12

2.3.3 Locking Bottlenecks in Deployed File Systems ...... 13

2.4 Double Scheduling in VMs...... 15

2.5 Conclusion ...... 17

Chapter 3: Ordering Primitive...... 18

3.1 Ordo: A Scalable Ordering Primitive...... 19

3.1.1 Embracing Uncertainty in Clock: Ordo API...... 20

3.1.2 Measuring Uncertainty between Clocks: Calculating ORDO_BOUNDARY 20

3.2 Algorithms with Ordo without Uncertainty...... 26

3.2.1 Read-Log-Update (RLU)...... 26

3.2.2 Concurrency Control for Databases ...... 29

3.2.3 Software Transactional Memory (TL2)...... 31

3.2.4 Oplog: An Update-heavy Data Structures Library...... 33

3.3 Implementation...... 33

3.4 Evaluation...... 34

3.4.1 Scalability of Invariant Hardware Clocks...... 36

3.4.2 Evaluating Ordo Primitive ...... 36

3.4.3 Physical Timestamping: Oplog ...... 38

3.4.4 Read Log Update...... 39

3.4.5 Concurrency Control Mechanism...... 43

3.4.6 Software Transactional Memory...... 46

3.4.7 Sensitivity Analysis of ORDO_BOUNDARY ...... 48

3.5 Chapter Summary...... 49

Chapter 4: Impact of Scheduling on Virtual Machines...... 50

4.1 Design...... 51

4.1.1 Lightweight Para-virtualized methods...... 53

4.1.2 Eventual Fairness with Selective Scheduling...... 55

4.2 Use Case...... 56

4.3 Implementation...... 57

4.4 Evaluation...... 59

4.4.1 Overhead of eCS ...... 60

4.4.2 Performance in an Over-committed Scenario ...... 61

4.4.3 Performance in an Under-committed Case...... 67

4.4.4 Addressing BWW Problem via eCS ...... 69

4.4.5 System Eventual Fairness...... 70

4.5 Chapter Summary...... 72

Chapter 5: Scalable Locking Primitives...... 73

5.1 Dominating Factors in Lock Design ...... 74

5.2 SHFLLOCKS ...... 77

5.2.1 The Shuffling Mechanism ...... 78

5.2.2 SHFLLOCKS Design ...... 78

5.3 Implementation...... 90

5.4 Evaluation...... 90

5.4.1 SHFLLOCK Performance Comparison...... 91

5.4.2 Improving Application Performance...... 95

5.4.3 Performance Breakdown...... 99

5.4.4 Performance With Userspace SHFLLOCK ...... 101

5.5 Chapter Summary...... 105

Chapter 6: Related Work...... 106

6.1 Ordering in Concurrency Control Algorithms ...... 106

6.2 Double Scheduling in VMs...... 108

6.3 Locking Primitives ...... 110

Chapter 7: Reflections...... 112

7.1 Limitations ...... 112

7.1.1 Ordering in Concurrency with Ordo ...... 112

7.1.2 Enlightened Critical Sections...... 113

7.1.3 Shuffling-based Lock Algorithms ...... 114

7.2 Future Work...... 115

Chapter 8: Conclusion...... 117

References...... 135

LIST OF TABLES

2.1 Evolution of synchronization primitives in Linux over the last 15 years, along with the Linux version in which each was introduced...... 11

2.2 Identified lock-based scalability bottlenecks in tested file systems with FxMark [77]...... 14

3.1 Various machine configurations that we use in our evaluation as well as the calculated offset between cores. While min is the minimum offset between cores, max is the global offset, called ORDO_BOUNDARY (refer to Figure 3.1), which we used, including up to the maximum hardware threads (Cores∗SMT) in a machine...... 36

4.1 Set of para-virtualized methods exposed by the hypervisor to a VM for providing hints to the hypervisor to mitigate double scheduling. These methods provide hints to the hypervisor and VM via shared memory. A vCPU relies on the first four methods to ask for an extra schedule to overcome LHP, LWP, RP, RRP, and ICP. Meanwhile, a vCPU gets hints from the hypervisor by using the last two methods to mitigate LWP and BWW problems. The cpu_id is the core id that is used by tasks running inside a guest OS. †Currently, is_vcpu_preempted() is already exposed to the VM in Linux...... 52

4.2 Applicability of our six lightweight para-virtualized methods that strive to address the symptoms of double scheduling...... 56

4.3 eCS requires small modifications to the existing kernel, and the annotation effort is also minimal: 60 LoC changes to support the 10 million LoC Linux kernel that has around 12,000 lock instances with 85,000 lock invocations...... 58

4.4 Cost of using our lightweight para-virtualized methods with various synchronization primitives and mechanisms. 1 core and 80 core denote the time (in ns) to execute an empty critical section with one and 80 threads, respectively. Although our approach adds a slight overhead at a single core count, there is no performance degradation for our evaluated workloads...... 60

5.1 Dominant factors affecting locks that are in use in the Linux kernel or are the state-of-the-art for NUMA architecture. Cache-line movement refers to lock/data movement inside a critical section. Boxes represent the scalability of locks with increasing thread count from one thread to threads within a socket to all threads among multiple sockets. Core subscription is only applicable to blocking locks and denotes the best throughput for a varying number of threads. Both mutex and CST are sub-optimal when under-subscribed but maintain good throughput once they are over-subscribed. Memory footprint is the memory allocation for locks: the size of each lock instance (per lock), a queue node required by each waiting thread before entering the critical section (per waiter), and a queue node retained by a lock holder within the critical section (per lock-holder). If the lock holder uses the queue node, which happens for MCS, CNA, and Cohort locks, the thread must keep track of the node, as it can acquire multiple locks: a common scenario in Linux. Note that queue nodes can be allocated on the stack for each algorithm. However, in practice, a lock user needs to explicitly allocate it on the stack for MCS, CNA, and Cohort locks, while mutex, CST, and SHFLLOCKS avoid this complexity. We also summarize the number of atomic instructions in the non-contended/contended scenarios...... 75

5.2 Locks evaluated in both the kernel space and the userspace. In the kernel space, we replace all locks with SHFLLOCKS. We use LD_PRELOAD to replace all the mutex-based locks in the userspace...... 91

5.3 Lock usage in various micro-benchmarks [77, 146]...... 92

LIST OF FIGURES

2.1 Multicore machine design...... 6

2.2 Indirect metric of the growing complexity of lock usage: the number of lock() API calls in the Linux kernel source code...... 10

3.1 Algorithm to calculate the ORDO_BOUNDARY: a system-wide global offset. . . 21

3.2 Calculating offset (δi↔j) using pairwise one-way-delay latency between clocks (ci and cj). ∆ij and ∆ji are the physical offsets between ci and cj, and cj and ci, respectively. δij and δji are measured offsets with our approach. Depending on physical and measured offsets, there are four possible cases. Unlike existing clock-synchronization protocols [56, 57, 54, 95] that average the calculated latency based on RTT, we consider each direction separately to measure the offset and consider only the direction that is always greater than the positive physical offset, such as cases 1 and 4. . . 23

3.3 Micro evaluation of the invariant hardware clocks used by the Ordo primitive. (a) shows the cost of a single timestamping instruction when it is executed by varying the number of threads in parallel; (b) shows the number of per-core generated timestamps in a microsecond with atomic increments (A) and with new_time() (O), which generates timestamps at each ORDO_BOUNDARY. 35

3.4 Clock offsets for all pairs of cores. The measured offset varies from a minimum of 70 ns to 1,100 ns for all architectures. Both Xeon (a) and ARM (d) machines show that one of the sockets has a 4–8× higher offset than the others. To confirm this, we measured the bandwidth between the sockets, which is symmetric in both machines...... 37

3.5 Throughput of Exim mail-server on a 240-core machine. Vanilla represents the unmodified Linux kernel, while Oplog is the rmap data structure modified by the Oplog API in the Linux kernel. OplogORDO is the extension of Oplog with the Ordo primitive...... 39

xii 3.6 Throughput of the hash table with RLU and RLUORDO for various update ratios of 2% and 40% updates. The hash table has 1,000 buckets with 100 nodes. We experiment it on four machines from Table 3.1...... 40

3.7 Throughput of the citrus tree with RLU and RLUORDO that consists of 100,000 nodes with varying update ratios of 10% and 40% on various machines. . . 41

3.8 Throughput of the hash table benchmark (40% updates) with the deferred-based RLU and RLUORDO approach on the Xeon machine. Even with the deferred-based approach, the cost of the global clock is still visible after crossing the NUMA boundary...... 43

3.9 Throughput of various concurrency control algorithms of databases for read-only transactions (100% reads) using the YCSB benchmark on various architectures. We modify the existing OCC and Hekaton algorithms to use our Ordo primitive (OCCORDO and HekatonORDO, respectively) and compare them against the state-of-the-art OCC algorithms: Silo and TicToc. Our modifications remove the logical clock bottleneck across various architectures...... 44

3.10 Throughput and abort rates of concurrency control algorithms for the TPC-C benchmark with 60 warehouses on the 240-core Intel Xeon machine...... 45

3.11 Speedup of the STAMP benchmark with respect to the sequential execution on the Xeon machine for the TL2 and TL2ORDO algorithms. TL2ORDO improves the throughput by up to 3.8× by alleviating the cache-line contention that occurs because of the global logical clock. TL2ORDO shows significant improvement in the case of workloads running with very short transactions (Kmeans and Ssca2) and the ones with very long transactions by decreasing their aborts due to the cache-line mitigation (Labyrinth)...... 47

3.12 Normalized throughput of the RLUORDO algorithm for varying ORDO_BOUNDARY on the Xeon machine on 1-core, 1-socket (30 cores), and 8-sockets (240 cores) with 98% reads and 2% writes. We vary the ORDO_BOUNDARY from 1/8×–8× for all three configurations, which shows that the throughput varies by only ±3%, thereby proving two points: 1) timestamping is one of the bottlenecks, and 2) ORDO_BOUNDARY does not act as a backoff mechanism for such logical timestamping-based algorithms...... 48

4.1 Overview of the information flow between a VM and a hypervisor. Each vCPU has a per-CPU state that is shared with the hypervisor, denoted as eCS state. Figure (a) shows how the vCPU2 relays information about an eCS to the hypervisor. On entering a critical section or an interrupt context ( 1), vCPU2 updates the non_preemptable_ecs_count ( 2). After a while, before scheduling out vCPU2, the hypervisor reads its eCS state ( 3), and allows it to run for one more schedule to mitigate any of the double scheduling problems. Figure (b) shows how the hypervisor shares the information whether a vCPU is preempted or a physical CPU is overloaded, at the schedule boundary. For instance, the hypervisor marks vcpu_preempted, while scheduling out a vCPU; or updates the pcpu_overloaded flag to one if the number of active tasks on that physical CPU is more than one. Both try to further mitigate LWP and BWW problems...... 54

4.2 Analysis of real-world workloads (Apache and Psearchy) in an over-committed scenario, i.e., two instances of a VM are executing the same workload. Column (i) represents the scalability of selected workloads in three settings: PVM, HVM, and with eCS annotations. Column (ii) represents the number of preemptions caught and prevented by the hypervisor with our methods. Column (iii) represents the type of preemptions caught by the hypervisor (refer to Table 4.4). By allowing an extra schedule, our approach reduces preemptions by 85–100% and improves the scalability of applications by up to 2.5×, while observing almost all types of preemptions for each workload...... 62

4.3 Analysis of real-world workloads (Metis and Pbzip2) in an over-committed scenario, i.e., two instances of a VM are executing the same workload. Column (i) represents the scalability of selected workloads in three settings: PVM, HVM, and with eCS annotations. Column (ii) represents the number of preemptions caught and prevented by the hypervisor with our methods. Column (iii) represents the type of preemptions caught by the hypervisor (refer to Table 4.4). By allowing an extra schedule, our approach reduces preemptions by 85–100% and improves the scalability of applications by up to 2.5×, while observing almost all types of preemptions for each workload...... 63

4.4 CDF of the latency of requests for the Apache web server workload in both under- and over-committed scenarios at 80 cores. It clearly shows the impact of eCS in the over-committed scenario, while having minimal impact in the under-committed case...... 65

4.5 Performance of real-world workloads when running on the bare metal (Host), and inside a VM with three configurations: PVM, HVM, and with eCS annotations. In this scenario, only one VM is running. We use Host as the baseline for the comparison because we consider Host to have almost optimal performance...... 68

4.6 Impact of both the BWW problem and the eCS method (refer to Hypervisor → VM in Table 4.1) on Psearchy in both under- and over-committed scenarios. . . . 69

4.7 Fairness in eCS. Running time of a vCPU of two co-scheduled VMs (VM1 and VM2) with eCS annotations for a period of 10 seconds with 100 ms window granularity while executing a kernel intensive task (reading the contents of a file) that involves read side of rwsem. (a) shows the difference in running time of vCPU per window granularity as well as the number of preemptions occurring per window, while (b) illustrates the cumulative running time, and shows that the hypervisor maintains eventual fairness in the system, even if VM2 is allowed extra schedules. Both VMs get 4.95 seconds to run...... 71

5.1 SHFLLOCKNB example. The lock structure consists of a state (glock) and the queue tail. The first byte of glock is the lock/unlock state, while the second byte denotes whether stealing is allowed. We encode multiple information in the qnode structure. (a) Initially, there is no lock holder. (b) t0 successfully acquires the lock via CAS and enters the critical section. (c) t1, of socket 1, executes SWAP on the lock’s tail after the CAS failure on TAS. (d) Similarly, t2 from socket 1, also joins the queue. (e) Now, there are five waiters (t1–t5) waiting for the lock. t1 is the very first waiter, so it becomes the shuffler and traverses the queue to find waiters from the same socket. t1 then moves t4 (same socket) after t2. (f) After the traversal, t1 selects t4 as the next shuffler. (g) t4 acquires the lock after t1 and t2 have executed their critical sections. At this point, t3 becomes the shuffler...... 81

5.2 A running example of how a shuffler shuffles waiters with the same socket ID and wakes them up. (a) t0 is the lock holder; t1 is the shuffler and is traversing the queue. t2 is sleeping, but t1 wakes it up. (b) t2 becomes active, while t1 continues shuffling and reaches t4, t1 first moves t4 after t2, and wakes up t4 to mitigate the wakeup latency. (c) When t0 releases the lock, t1 acquires it; t2 and t4 are actively spinning for their turn; t4 is the shuffler...... 86

5.3 Impact of non-blocking locks on the scalability of micro-benchmarks [77, 146]. Refer to Table 5.3 for lock usage. Here, Stock refers to the default Linux kernel version...... 92

5.4 Impact of blocking locks on the scalability of micro-benchmarks with up to 2× over-subscription (384 threads: we pin two threads on each core). Cohort and CST are the non-blocking and blocking hierarchical locks, respectively. Refer to Table 5.3 for lock usage...... 93

5.5 Impact of locks on application scalability and on memory footprint, while running three applications with SHFLLOCKS, Linux stock version (Stock), CNA, CST, and Cohort. Refer to Table 5.2 for specific changes. SHFLLOCK reduces the memory footprint because of the blocking locks that are embedded in inodes, task structure, and memory management structures...... 96

5.6 Impact on throughput and long-term fairness of non-blocking and blocking locks on the hash-table benchmark. For blocking locks, we over-subscribe the system by 4×. We also include the factor analysis of several phases introduced by SHFLLOCKNB, and the number of wakeups in the critical path for SHFLLOCKB. Later, we show the impact on throughput with centralized readers-writer locks: Stock and SHFLLOCKRW for 1% and 50% writes up to 4× over-subscription...... 100

5.7 Total throughput of LevelDB benchmark and the streamcluster benchmark with various blocking and non-blocking locks. We further over-subscribe the cores for levelDB (b) to test the impact of blocking locks with 4× the number of cores...... 102

5.8 Impact of locks and their memory allocation overhead on the scalability of Dedup. We report the overall memory allocation overhead that is used during the entire run, with respect to pthread...... 104

LIST OF PSEUDO-CODE

3.1 Ordo clock API. The get_time() method returns the current timestamp without reordering instructions...... 19

3.2 Pseudo-code of logical timestamping algorithms used in STM [1, 2] and databases [3, 4, 5], and that of physical timestamping used in Oplog [6]...... 25

3.3 RLU pseudo-code including our changes...... 27

5.1 Pseudo-code of the non-blocking version of ShflLocks and the shuffling mechanism...... 84

5.2 The extra modification required to convert our non-blocking version of ShflLock to a blocking one...... 87

5.3 An optimization for avoiding a waiter wakeup issue in the critical path with an extra state update before the TAS lock...... 88

SUMMARY

Over the past decade, multicore machines have become the norm. A single machine is capable of having thousands of hardware threads or cores. Even cloud providers offer such large multicore machines for data processing engines and databases. Thus, a fundamental question arises: how efficient are the existing synchronization primitives—timestamping and locking—that developers use for designing concurrent, scalable, and performant applications? This dissertation focuses on understanding the scalability aspect of these primitives and presents new algorithms and approaches that either leverage the hardware or the application domain knowledge to scale up to hundreds of cores.

First, the thesis presents Ordo, a scalable ordering, or timestamping, primitive that forms the basis of designing scalable timestamp-based concurrency control mechanisms. Ordo relies on invariant hardware clocks and provides the notion of a globally synchronized clock within a machine. We use the Ordo primitive to redesign a synchronization mechanism and concurrency control mechanisms in databases and software transactional memory.

Later, this thesis focuses on the scalability aspect of locks in both virtualized and non-virtualized scenarios. In a virtualized environment, we identify that these locks suffer from various preemption issues due to a semantic gap between the hypervisor scheduler and a virtual machine scheduler—the double scheduling problem. We address this problem by bridging this gap, in which both the hypervisor and virtual machines share minimal scheduling information to avoid the preemption problems.

Finally, we focus on the design of lock algorithms in general. We find that locks in practice have discrepancies from locks in design. For example, popular locks suffer from excessive cache-line bouncing in multicore (NUMA) systems, while state-of-the-art locks exhibit sub-par single-thread performance. We classify several dominating factors that impact the performance of lock algorithms. We then propose a new technique, shuffling, that can dynamically accommodate all these factors without slowing down the critical path of the lock. The key idea of shuffling is to re-order the queue of threads waiting to acquire the lock with some pre-established policy. Using shuffling, we propose a family of locking algorithms, called SHFLLOCKS, that respect all factors, efficiently utilize waiters, and achieve the best performance.

CHAPTER 1 INTRODUCTION

The last decade has seen dramatic changes in the hardware landscape. Until the last decade, processor vendors improved application performance by increasing CPU frequency. However, the physical limitation of CPUs heating up has pushed vendors in the direction of multicore machines. As a result, microprocessor vendors have been building bigger multi-core and multi-socket (NUMA) machines [7, 8]. These machines provide a massive amount of memory, accessible by tens-to-hundreds of CPUs in a single machine. Besides, these machines are also capable of supporting non-volatile memory [9, 10, 11], storage devices [12, 13, 14], and even specialized support for hardware virtualization [15, 16, 17]. Because of this fast-evolving hardware, concurrent programming is now the de facto standard for designing applications that leverage today's hardware.

On the software front, almost every application, such as databases [18], processing engines [19, 20], and operating systems, is now concurrent. Application developers are parallelizing their applications to use the available cores efficiently. Also, to further improve hardware utilization, organizations predominantly use virtualization to run multiple applications together. This scheduling leads to over-subscribing hardware resources, even for large multicore and multi-socket machines.

Thus, application developers rely on various types of synchronization primitives to design concurrent data structures, algorithms, concurrency frameworks, and applications. These primitives handle the concurrency of operations by not only ensuring the correct use of the shared resources but also scheduling the concurrent events. For example, several applications use timestamping as an ordering mechanism to design 1) concurrency control algorithms in databases and software transactional memory (STM); 2) logging in databases and file systems; and 3) memory reclamation in both memory allocators and garbage collectors. Furthermore, application developers rely on the lock-based programming model to design concurrent and parallel applications. The lock-based programming model provides mutual exclusion (i.e., exclusive access to shared resources) and also schedules the concurrent requests to access or modify the shared resource. The reason this model is so successful is that it is the easiest to reason about for correctness and enables easy composability of multiple data structures.

The basic premise of these primitives is to not only ensure application correctness but also have almost negligible overhead while scheduling concurrent events in an application. Unfortunately, with today's evolving hardware and ever-changing application requirements, most of these synchronization primitives have been designed specifically for either hardware or software requirements. For instance, first, most of the timestamp-based concurrency algorithms rely on atomic instructions that become a scalability bottleneck with increasing thread count. Second, in the case of lock-based programming, there is no lock algorithm that satisfies the various factors, such as data movement, thread contention, over-subscription, and memory footprint, which impact the scalability of locks and their adoption. Finally, the introduction of multiple layers of abstraction, such as virtualization, introduces scheduling overhead for VMs. This overhead stems from the missing semantic information across layers that inhibits the forward progress of applications running inside VMs.

Thesis Statement

Synchronization primitives are the basic building blocks for today's software stacks that not only ensure application correctness but also schedule concurrent events. Hence, this thesis further improves the performance of applications by providing the right abstractions for these primitives that leverage both software and hardware interfaces to efficiently schedule such concurrent events.

This thesis focuses on efficiently scheduling concurrent events at various levels with respect to synchronization primitives: 1) leveraging hardware to minimize ordering overhead (timestamping); 2) exposing semantic information across layers to solve the double-scheduling issues; and 3) decoupling the policy from the design/implementation of lock algorithms. Hence, this thesis makes the following contributions:

1.1 Ordering in Concurrency Control Algorithms

Chapter 3 describes Ordo, a scalable timestamping primitive for multicore machines that employs invariant hardware clocks to address the problem of timestamping. Such an invariant hardware clock is already supported by current major processor architectures [21, 22, 23, 24]. An invariant clock has a unique property: it is monotonically increasing and has a constant skew, regardless of dynamic frequency and voltage scaling, and it resets itself to zero whenever a machine resets (on receiving the RESET signal), which is ensured by the processor vendors [25, 26, 22, 24]. However, assuming that all invariant clocks in a machine are synchronized is incorrect, because processors do not receive the RESET signal at the same time. As a result, we cannot compare two clocks with confidence, which makes them unusable for correctly designing any time-based concurrent algorithm. To provide a guarantee of correct use of these clocks, we propose a new primitive, called Ordo, that embraces uncertainty while comparing two clocks and provides an illusion of a globally synchronized hardware clock in a single machine. We find that various timestamp-based algorithms can benefit from the Ordo primitive. We expose three simple clock-specific methods to replace existing clocks with Ordo in applications. The only trick lies in handling the uncertainty window. We modify both physical and logical timestamp-based algorithms with the Ordo primitive. Some examples include concurrent data structure libraries and concurrency control algorithms for STM and databases.
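As a rough illustration of how such an uncertainty-aware clock can be exposed, the sketch below compares two invariant-clock timestamps under a precomputed global offset. The names loosely mirror the Ordo API of Chapter 3 (get_time(), ORDO_BOUNDARY), but the bodies and the boundary value are only illustrative assumptions, not the dissertation's implementation.

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    /* Illustrative only: the boundary below is a made-up value; Chapter 3
     * shows how the real ORDO_BOUNDARY is measured per machine. */
    static const uint64_t ordo_boundary = 4000;    /* cycles; assumed precomputed */

    static inline uint64_t get_time(void)
    {
        unsigned int aux;
        return __rdtscp(&aux);     /* invariant TSC read that is not reordered earlier */
    }

    /* Returns 1 if a is definitely after b, -1 if definitely before, and 0
     * if the two timestamps fall within the uncertainty window. */
    static inline int cmp_time(uint64_t a, uint64_t b)
    {
        if (a > b + ordo_boundary)
            return 1;
        if (b > a + ordo_boundary)
            return -1;
        return 0;                  /* uncertain: the caller must handle this case */
    }

    int main(void)
    {
        uint64_t t1 = get_time();
        uint64_t t2 = get_time();
        printf("cmp_time(t2, t1) = %d\n", cmp_time(t2, t1));
        return 0;
    }

When cmp_time() returns 0, the caller has to treat the two events as concurrent; how each algorithm handles that uncertainty window is what Chapter 3 addresses case by case.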

1.2 Double Scheduling in Virtualization

Cloud providers offer the notion of horizontal as well as vertical application scaling by multiplexing VMs and over-subscribing hardware resources. Unfortunately, the multiplexing of VMs introduces the double scheduling problem: 1) the guest OS schedules processes on virtual CPUs (vCPUs), and 2) the hypervisor schedules vCPUs on physical CPUs. The root cause of this problem is a semantic gap between the hypervisor and guest OSes. Chapter 4 identifies several symptoms of this problem; prior work has studied them only individually and in a limited fashion. We present an overall, one-shot solution based on a particular insight: if a certain component of a guest OS is allowed to proceed further, that guest OS will make forward progress. These critical components are the shared critical sections that synchronization mechanisms guard to ensure OS correctness. We bridge the semantic gap between a guest OS and the hypervisor with the following four ideas for making an effective scheduling decision that allows its forward progress. First, we consider shared resources inside VMs as critical components; such resources are guarded by locks or are manipulated inside interrupt contexts. Hence, these components act as forward-progress indicators for applications running inside VMs. Second, we devise a set of para-virtualized methods that annotate these critical components as enlightened critical sections (eCS). These methods are lightweight and rely on shared-memory operations to notify the hypervisor from the VM and vice versa. Third, the hypervisor can now figure out whether a vCPU is executing an eCS and can grant it an extra schedule. Finally, we leverage our methods to design a virtualized schedule-aware spinning strategy that allows VMs to be more cooperative with other VMs running on a machine.
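A minimal sketch of the shared-memory hinting idea follows, assuming a per-vCPU state shared with the hypervisor and a counter named after the one in Figure 4.1 (non_preemptable_ecs_count). The actual para-virtualized methods and their semantics are defined in Chapter 4; treat this only as an illustration of how cheap the guest-side annotation can be.

    #include <stdatomic.h>

    /* Assumed layout of a per-vCPU state shared with the hypervisor; the real
     * eCS state in Chapter 4 carries more fields (see Table 4.1). */
    struct ecs_state {
        atomic_int non_preemptable_ecs_count;   /* >0 means "grant one more schedule" */
    };

    /* One entry per vCPU; how this memory is shared is hypervisor-specific. */
    static struct ecs_state ecs_states[256];

    static inline void ecs_enter(int vcpu_id)
    {
        /* Entering a critical section or an interrupt context: a plain
         * shared-memory update, no hypercall on this path. */
        atomic_fetch_add(&ecs_states[vcpu_id].non_preemptable_ecs_count, 1);
    }

    static inline void ecs_exit(int vcpu_id)
    {
        atomic_fetch_sub(&ecs_states[vcpu_id].non_preemptable_ecs_count, 1);
    }

    /* Hypervisor side (conceptually): before descheduling a vCPU, read its
     * counter and, if it is non-zero, allow one extra schedule instead of
     * preempting the vCPU mid-critical-section. */

The guest-side cost is a pair of shared-memory updates per annotated section, which is consistent with the lightweight, shared-memory-only design described above.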

1.3 Scalable and Practical Locking Primitives

Chapter 5 answers a fundamental question regarding the design of locking primitives. We first present four dominating factors that affect locks' scalability and their adoption under evolving hardware and software requirements; none of the existing lock algorithms meets all the required criteria. We address all these factors by designing a new breed of locking protocols, called SHFLLOCKS, that are highly performant and practical locks. SHFLLOCKS rely on the shuffling technique, which allows the decoupling of the lock design from policy enforcement, such as NUMA-awareness and parking/wake-up strategies. Moreover, shuffling enables enforcing these policies mostly off the critical path by the waiters. Our evaluation in both kernel space and userspace shows that SHFLLOCKS maintain the best throughput regardless of the number of threads contending for the lock.
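To make the idea of re-ordering waiters concrete, here is a toy sketch that groups a wait queue by the socket ID of the thread at its head. It is deliberately not the SHFLLOCK algorithm (Chapter 5 gives the real pseudo-code, which shuffles lazily and off the critical path); it only illustrates the kind of policy a shuffler can apply to the queue.

    #include <stddef.h>

    struct waiter {
        int socket_id;            /* NUMA socket of the waiting thread */
        struct waiter *next;
    };

    /* Stable partition of a singly linked wait queue: waiters that share the
     * head's socket ID come first; everyone else keeps its relative order. */
    struct waiter *group_by_head_socket(struct waiter *head)
    {
        if (!head)
            return NULL;

        struct waiter *same_head = NULL, **same_tail = &same_head;
        struct waiter *rest_head = NULL, **rest_tail = &rest_head;

        for (struct waiter *w = head; w; w = w->next) {
            if (w->socket_id == head->socket_id) {
                *same_tail = w;
                same_tail = &w->next;
            } else {
                *rest_tail = w;
                rest_tail = &w->next;
            }
        }
        *rest_tail = NULL;        /* terminate the other-socket list ... */
        *same_tail = rest_head;   /* ... and append it after the same-socket group */
        return same_head;
    }

Chapter 5 shows how SHFLLOCKS perform this kind of grouping lazily, with a waiter acting as the shuffler, rather than stalling the lock holder to do it.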

1.4 Outline and Contributions

The main contributions of this thesis are 1) designing a new ordering primitive that handles the scalability bottleneck in timestamp-based concurrency control algorithms, 2) understanding the scalability bottlenecks that arise due to synchronization primitives, 3) addressing the limitations of existing synchronization primitives by designing five new algorithms, and 4) doing so even in the case of virtualized environments.

The rest of this dissertation is organized in the following chapters. Chapter 2 provides the necessary background used throughout this dissertation, such as the importance of ordering and synchronization across the software stack in the age of many-core machines. Chapter 3 presents Ordo, a scalable ordering primitive for multicore machines. The later part of this dissertation focuses on rethinking the design of synchronization primitives and the associated problems occurring in a virtualized environment. Chapter 4 describes an approach to address the double scheduling problem in a virtualized environment. Chapter 5 presents a family of locking algorithms that use a new technique, called shuffling, that allows the decoupling of lock design from policy enforcement. Chapter 6 then surveys the related work that led us to this dissertation. In Chapter 7, we discuss some of the limitations of the proposed approaches and some open-ended questions. Finally, Chapter 8 concludes this dissertation.

CHAPTER 2 BACKGROUND AND MOTIVATION

Synchronization primitives are the backbone for designing concurrent applications. At the hardware level, users rely on hardware constructs, such as atomic instructions, that provide a linearizability guarantee [27]. At the software level, we rely on these atomic instructions to provide basic software abstractions, such as synchronization frameworks and locking primitives, for designing concurrent applications. This chapter offers some general ideas, concepts, and background material for such constructs on multicore machines.

2.1 A Primer on Multicore Machines


Figure 2.1: Multicore machine design.

Figure 2.1 shows a modern multicore machine, which comprises several sockets. Each socket has several homogeneous cores and a set of private and shared caches. Generally, a core has its private cache (L1 and L2). It accesses a bigger shared cache (LLC) with other cores in a socket. Moreover, each core stores data in DRAM, which is accessible via the memory controller. These multicore machines have non-uniform memory access (NUMA) architecture, i.e., a high-speed interconnect (e.g., Intel QPI/UPI [28, 29] or AMD HyperTransport [30]) links all sockets together. On such machines, accessing the local socket memory is faster than accessing the remote socket memory (hence, the term NUMA). Moreover, each machine provides a coherent view of memory to all cores through a cache-coherence protocol [31]. The coherence protocol works at the granularity of a cache line (64 bytes), as each core tracks the status of a memory location and communicates to other cores when the location is modified or accessed. Unfortunately, maintaining cache-coherence is costly in NUMA machines because of the expensive communication through the interconnect. This latency cost is up to 2-3× higher than within a socket [32].
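Because coherence operates at cache-line granularity, per-core data in scalable synchronization primitives is typically padded so that two cores never write to the same line. A minimal illustration in C follows; the 64-byte line size and the core count are assumptions.

    #include <stdalign.h>
    #include <stdatomic.h>

    /* Each per-core counter occupies its own 64-byte cache line, so updates
     * from different cores never invalidate each other's line (no false sharing). */
    struct per_core_counter {
        alignas(64) atomic_ulong count;
    };

    static struct per_core_counter counters[256];   /* assumed upper bound on cores */

    static inline void inc_local(int core_id)
    {
        atomic_fetch_add_explicit(&counters[core_id].count, 1, memory_order_relaxed);
    }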

2.2 Ordering in Concurrency

With the advent of multicore machines, concurrent programming is the de facto approach to design today's algorithms, which applications rely upon. Hence, ordering becomes fundamental to the design of any concurrent algorithm. With ordering, algorithms can achieve varying levels of consistency to squeeze out the performance from hardware. For systems software, consistency depends on algorithms that require linearizability [27] for the composability of data structures [33, 6], concurrency control mechanisms for software transactional memory (STM) [1], or different isolation levels for database transactions [5, 34, 3]. The notion of ordering is not only limited to consistency, but is also applicable to either maintaining the history of operations for logging [35, 36, 37, 38] or determining the quiescence period for memory reclamation [39, 40, 33]. Depending on the consistency requirement, ordering can be achieved in several ways, such as logical timestamping [33, 1], physical timestamping [39, 41, 6, 42, 43], and data-driven versioning [34, 44]. The most common approach is to use a logical clock that is easier to maintain in software and is amenable to various ordering requirements.

A logical clock is a favorable ordering approach: an algorithm can globally update or access the clock in a single machine. However, the logical clock is one of the prime scalability bottlenecks on large multicore machines [7, 8] because it is maintained via an atomic instruction that incurs cache-coherence traffic. Unfortunately, these atomic instructions create cache-line contention that becomes the scalability bottleneck, and it is more severe in multi-socket machines and in hyper-threaded scenarios [32, 45, 46]. Thus, maintaining a software clock is a deterrent to the scalability of an application, which holds true for several concurrency control mechanisms for concurrent programming [1, 47] and database transactions [3, 5]. To address this problem, various approaches—ranging from batched updates [5] to local versioning [34] to completely share-nothing designs [48, 49]—have been put into practice. However, these approaches have some side effects. They either increase the abort rates for STM [50] and databases [5] or complicate the design of the system software [49, 34, 51]. In addition, in the past decade, physical timestamping has been gaining traction for designing data structures [52, 43] and synchronization mechanisms [6] for large multicore machines. However, such approaches assume that the timestamp counters provided by hardware are synchronized, which is not entirely correct for existing hardware.
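The contended pattern being described is easy to state in code: a single global counter that every transaction bumps with an atomic read-modify-write. A minimal sketch of such a logical clock follows; the names are illustrative, not taken from any of the cited systems.

    #include <stdatomic.h>
    #include <stdint.h>

    /* One global logical clock shared by every thread.  Each atomic increment
     * pulls the cache line holding `global_clock` to the updating core in
     * exclusive state, which is exactly the cross-socket traffic described above. */
    static atomic_uint_fast64_t global_clock;

    static inline uint64_t new_timestamp(void)
    {
        return atomic_fetch_add(&global_clock, 1) + 1;
    }

    static inline uint64_t read_timestamp(void)
    {
        return atomic_load(&global_clock);
    }

Every thread that creates a timestamp serializes on this one cache line; Chapter 3 replaces the shared counter with per-core invariant clocks plus an uncertainty bound.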

2.2.1 Multicore Hardware Clocks

Today's multicore machines provide per-core or per-processor hardware clocks [21, 22, 23, 24]. For example, all modern processor vendors such as Intel and AMD provide an RDTSC counter, while Sparc and ARM have the stick and cntvct counters, respectively. These hardware clocks are invariant in nature, i.e., processor vendors guarantee that these clocks are monotonically increasing at a constant rate,1 regardless of the processor speed and its fluctuation. Moreover, they reset to zero whenever a processor receives a RESET signal (i.e., reboots). To provide such an invariant property, the current hardware always synchronizes these clocks from an external clock on the motherboard [53, 26, 25]. This is required because vendors guarantee that the clocks do not fluctuate even if a processor goes into deep sleep states, which is always ensured by the motherboard clock. However, vendors do not guarantee the synchronization of clocks across a processor (socket) boundary in a multicore machine [21], and not even within a processor, because there is no guarantee that each processor receives the broadcasted RESET signal at the same instant, either inside a socket or across sockets. In addition, these hardware clocks do not provide the notion of real time.

These current invariant clocks are therefore not reliable enough to design concurrent algorithms. To use them with confidence, we either need a clock-synchronization mechanism that provides a constant physical skew between clocks, or we must measure the offset ourselves to synchronize these clocks in software. Unfortunately, neither approach works as-is: hardware vendors cannot provide the physical skew for every processor, and we cannot measure it exactly using existing clock-synchronization protocols [54, 55, 56, 57], as software measurement induces overhead, so the measured offset may be greater than the physical offset.

1 These clocks may increase at a different frequency than that of a processor.
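To see why software measurement only bounds the offset rather than recovering it exactly, consider a simplified two-core experiment: one core publishes its timestamp through shared memory and another core immediately reads its own clock. The measured difference mixes the true clock offset with the one-way communication delay; Chapter 3 (Figure 3.2) shows how Ordo turns many such direction-aware measurements into a safe bound. The CPU numbers below are assumptions, and pinning uses Linux-specific calls.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    static _Atomic uint64_t mailbox;                 /* 0 means "nothing sent yet" */

    static void pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *receiver(void *arg)
    {
        (void)arg;
        pin_to_cpu(1);                               /* assumed: CPU 1 is another core/socket */
        uint64_t sent;
        while ((sent = atomic_load(&mailbox)) == 0)
            ;                                        /* wait for the sender's timestamp */
        long long delta = (long long)__rdtsc() - (long long)sent;
        /* delta = clock offset + one-way delay (plus any scheduling noise); only
         * repeated, per-direction measurements can turn this into a usable bound. */
        printf("receiver clock - sender clock = %lld cycles\n", delta);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, receiver, NULL);
        pin_to_cpu(0);
        atomic_store(&mailbox, __rdtsc());           /* publish the sender-side timestamp */
        pthread_join(t, NULL);
        return 0;
    }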

2.3 Locking Primitives

Locks are one of the easiest approaches to designing concurrent applications. Figure 2.2 illustrates the increasing use of locks in the Linux OS. Similar to the evolution of multicore machines [7, 8], not only has the use of locks dramatically increased for fine-grained locking [58], but locks have also evolved into several types to minimize the cost of handling critical sections. For instance, current OSes rely on several types of locks, ranging from non-blocking (e.g., spinlocks, read-write locks) to blocking (e.g., mutex, read-write semaphores). This trend is also applicable to other concurrent applications, such as databases, and to other OSes. Another interesting trend is that these locks have been continuously evolving, as shown in Table 2.1, to improve the scalability of the OS. For instance, a simple spinlock has evolved from a simple test-and-set (TAS) lock to a para-virtualized qspinlock (a variant of the MCS lock that also supports virtualized environments). Hence, we try to understand the implications of these primitives and their interplay with the task scheduler on application performance. We first describe the general evolution of locks and later contrast it with lock evolution inside the Linux kernel.


Figure 2.2: Indirect metric of the growing complexity of lock usage: the number of lock() API calls in the Linux kernel source code.
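For reference, the starting point of that evolution, a test-and-set spinlock, fits in a few lines. This is a generic user-space sketch using C11 atomics, not the kernel's implementation.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct { atomic_bool locked; } tas_lock_t;

    static inline void tas_lock(tas_lock_t *l)
    {
        /* Every failed attempt still writes the lock's cache line, which is the
         * cache-line bouncing that queue-based locks were later added to avoid. */
        while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            ;
    }

    static inline void tas_unlock(tas_lock_t *l)
    {
        atomic_store_explicit(&l->locked, false, memory_order_release);
    }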

2.3.1 Evolution of Lock design

Since the dawn of concurrent programming, hardware has been the dominant factor [32] in the evolution of lock algorithms. For instance, queue-based locks [59] reduce cache traffic relative to test-and-set (TAS) and ticket locks. On NUMA architectures, hierarchical locks improve throughput [60, 61, 62, 63, 64, 65], as they amortize the remote access cost by physically partitioning a lock into a global lock and per-node locks. Cohort locks [63] generalize this design for any lock combination up to two levels, while Chabbi et al. [64, 65] further extended this approach to multiple hierarchies, using the MCS lock as a building block. Unfortunately, hierarchical locks have two issues: degraded performance for small numbers of cores, and, most importantly, memory overhead. AHMCS [65] addresses the problem of single-thread performance but suffers from the memory overhead of hierarchical locks. Meanwhile, our CST locks work [66] partially addresses the problem of memory overhead by allocating the lock's memory on a first-touch basis. However, it still suffers from the memory overhead in the worst case and introduces a large memory allocation overhead on the critical path. Unfortunately, both of these works partially address one of these issues, but not both concurrently.
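Since the MCS queue lock keeps reappearing as the building block here (and again in the kernel's qspinlock below), a minimal generic sketch may help; the structure and names are illustrative and not taken from any of the cited implementations.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_node {
        struct mcs_node *_Atomic next;
        atomic_bool locked;                    /* true while this waiter must spin */
    };

    struct mcs_lock {
        struct mcs_node *_Atomic tail;
    };

    void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *me)
    {
        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);

        /* Join the queue; the previous tail (if any) becomes our predecessor. */
        struct mcs_node *prev = atomic_exchange(&lock->tail, me);
        if (prev == NULL)
            return;                            /* queue was empty: lock acquired */

        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))
            ;                                  /* spin on our own cache line only */
    }

    void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *me)
    {
        struct mcs_node *succ = atomic_load(&me->next);
        if (succ == NULL) {
            /* No visible successor: try to reset the tail to empty. */
            struct mcs_node *expected = me;
            if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
                return;
            /* A successor is in the middle of linking itself; wait for it. */
            while ((succ = atomic_load(&me->next)) == NULL)
                ;
        }
        atomic_store(&succ->locked, false);    /* hand the lock to the successor */
    }

Hierarchical and NUMA-aware designs discussed above build additional levels or policies on top of this basic queue.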

Table 2.1: Evolution of synchronization primitives in Linux over the last 15 years, along with the Linux version in which each was introduced.

Lock type: Feature/Optimization | Reason | Linux version

spinlock (NB):
  TAS | Simple test-and-set based lock that continuously spins | v2.6.12
  Ticket | FIFO lock for fairness up to 255 cores | v2.6.25
  Ticket with para-virtualized TAS | Ticket lock in the normal case, but TAS lock in the para-virtualized scenario | v2.6.27
  Paravirtualized Ticket | Use Ticket locks in both para-virtualized and non-para-virtualized scenarios | v2.6.30
  Paravirtualized Ticket with slow-path | Schedule out vCPUs in the para-virtualized case to mitigate LHP/LWP | v3.11
  qspinlock with paravirtualized TAS | 4-byte MCS lock to remove cache-line contention, but TAS to remove LHP/LWP | v4.4
  Unfair paravirtualized qspinlock | Limited lock stealing and adaptive lock spinning to mitigate LHP/LWP | v4.5

rwlock (NB):
  4-byte TAS | Simple test-and-set lock with the first byte storing the reader count | v2.6.12
  4-byte TAS with 4-byte read count | Increase the readers count to compensate for larger machines | v3.11
  Ticket with 4-byte read/write counter (qrwlock) | Improves the throughput and latency of the rwlock | v3.16
  qspinlock with 4-byte read/write counter (qrwlock) | Further removes the cache-line contention of the Ticket lock | v4.4

seqlock (NB):
  spinlock with sequence number | Non-starving writers with retrying readers to read consistent data | v2.6.12
  New locking reader type method | Allow readers to acquire the lock without updating the sequence count | v3.12
  New latch API | Allows queries during non-atomic modifications with two versions | v4.2

mutex (B):
  A counter with a spinlock and a wait list | A wait queue maintains the waiters | v2.6.12
  Atomic counter with a waiting queue | An atomic counter with a simple wait queue to handle waiters | v2.6.16
  Optimistic spinning | Allowed waiters to spin for a time period before scheduling out | v2.6.30
  Queue-based optimistic spinning | Allow waiters to form a queue while spinning to reduce cache-line contention | v3.16
  Lockless next-waiter wakeup | Shortens the fast path to wake up the very next waiter, thereby decreasing spinlock contention | v4.5
  Break if vCPU is preempted | Schedule out in the mid-path phase if the owner vCPU is preempted | v4.9

rwsem (B):
  A counter with a spinlock and a wait list | A counter maintains both writers and readers, with a wait list for waiters | v2.6.12
  Optimistic spinning | Allow waiters to wait while spinning to mitigate the blocked-waiter wakeup problem | v3.15
  Writer optimistic spinning | Disallow waiting writers to waste CPU cycles | v4.7

percpu-rwsem (B):
  percpu counter with mutex | Lightweight RCU-based readers, with rcu_synchronize()-based mutex writers | v3.6
  rwsem with percpu and global counter | Optimize readers' counter latency after a writer unlocks | v3.7
  Optimized readers with RCU | Removed the global counter by ordering reader-state vs. reader-count | v4.8

We observe a similar evolution in the design of readers-writer locks. Mellor-Crummey and Scott [67] proposed variants of readers-writer locks on top of queue-based locks. However, these locks create coherence traffic in NUMA machines. Calciu et al. [68] proposed a per-socket read indicator on top of Cohort locks to localize the reader contention within a socket, but both per-socket and per-CPU [69] approaches require extra memory and are beneficial only in particular cases [70, 71, 72].

2.3.2 Locks in the kernel space (Linux)

Over the past decade, the Linux kernel has been striving for more concurrency by switching to finer-granularity locks, as shown in Table 2.1. One of the most significant goals is to maintain optimal single-thread performance. In addition, lock design must consider: 1) the interaction with the scheduler, 2) the size of the lock structure, and 3) avoiding any explicit memory allocation. These factors have led to sophisticated optimizations. The spinlock is the primary locking construct in Linux; it has evolved from TAS to ticket locks to an MCS variant [73]. The current design is an amalgamation of two locks: a TAS lock in the fast path and an MCS lock in the slow path.

The second most widely used synchronization primitives are blocking locks. Moreover, various OSes, in particular Linux, do not allow nested critical sections for any blocking locks. Currently, these locks can be considered the big-kernel locks in Linux, as they are the most common guards that an OS uses for protecting inodes in file systems and for address-space manipulation. There are two variants: mutex and rwsem.

mutex incorporates a fast path comprising a test-and-test-and-set (TTAS) lock, an abortable queue-based spinning mid path [74], and a per-lock-instance parking list in the slow path. The algorithm works by first trying to atomically update the lock variable, called the fast path; on failure, the mid-path phase (optimistic spinning) begins, in which only a single waiter is queued up if there is no spinning waiter, and it optimistically spins until its schedule quota expires. If the waiter still does not acquire the lock, it goes to the slow-path phase, in which it acquires a lock on the parking list (the parking lock), adds itself, and schedules out after releasing the parking lock. During the unlock phase, the lock holder first resets the TTAS variable and wakes up a waiter from the parking list while holding the parking lock. Meanwhile, it is possible that either a new waiter acquires the lock in the fast path or a spinning waiter does so in the mid path. Once a waiter is scheduled in, it again acquires the parking lock and tries to acquire the TTAS lock. If successful, it removes itself from the parking list and enters the critical section; otherwise, it schedules itself out again and sleeps until a lock holder wakes it up. The current algorithm is unfair because of the TTAS lock and can even starve its waiters in the slow-path phase. Moreover, the algorithm also suffers from two overheads. The first is cache-line contention because of the TTAS lock and the waiter maintenance. The second is the scheduling overhead in the slow-path phase and the unlock phase for parking a running waiter and waking up a sleeping waiter [66].
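The three phases are easier to see in code. The sketch below is a user-space caricature of that structure, using a bounded spin as a stand-in for optimistic spinning and a pthread condition variable in place of the kernel's per-task sleep and wakeup; it is not the kernel's mutex, but it reproduces the fast/mid/slow split and the resulting unfairness.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    struct toy_mutex {
        atomic_bool     locked;      /* the TTAS word */
        pthread_mutex_t park_lock;   /* guards the parking state */
        pthread_cond_t  parked;      /* slow-path waiters sleep here */
    };

    #define TOY_MUTEX_INIT { false, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

    static bool try_fast_path(struct toy_mutex *m)
    {
        /* Test first, then test-and-set, so a contended lock is not written blindly. */
        return !atomic_load(&m->locked) && !atomic_exchange(&m->locked, true);
    }

    void toy_mutex_lock(struct toy_mutex *m)
    {
        if (try_fast_path(m))
            return;                                    /* fast path */

        for (int spins = 0; spins < 1000; spins++)     /* "mid path": bounded spinning */
            if (try_fast_path(m))
                return;

        pthread_mutex_lock(&m->park_lock);             /* slow path: park until woken */
        while (!try_fast_path(m))
            pthread_cond_wait(&m->parked, &m->park_lock);
        pthread_mutex_unlock(&m->park_lock);
    }

    void toy_mutex_unlock(struct toy_mutex *m)
    {
        atomic_store(&m->locked, false);               /* reset the TTAS word first ... */
        pthread_mutex_lock(&m->park_lock);
        pthread_cond_signal(&m->parked);               /* ... then wake one parked waiter */
        pthread_mutex_unlock(&m->park_lock);
    }

A newly arriving thread can grab the TTAS word between the reset and the wakeup, which is exactly the lock-stealing and starvation behavior described above.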

The readers-writer semaphore (rwsem) is an extension of mutex, with a writer-preferred version [75, 76]. Both the write lock and the readers count are encoded in a single word to track readers, the writer, and waiting writers or readers. Moreover, rwsem maintains a single parking list to which both readers and writers are added in the slow path. Similar to mutex, it also suffers from severe cache-line movement, both when the cores are over-subscribed and when they are under-subscribed, because of the contention on the reader-writer indicator.

In summary, both these blocking and non-blocking locks are the most widely used locking primitives. Unfortunately, these locks degrade the performance of applications. We particularly study the scalability bottlenecks in the OS by focusing on one component: the file system. We design a new benchmarking suite, called FxMark, which is a collection of micro- and macro-benchmarks that stress various components of file systems. We now discuss the identified locking bottlenecks in five widely deployed file systems.
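The word-encoding idea is simple enough to sketch: one atomic word whose high bit marks a writer and whose low bits count readers. The layout below is an assumption for illustration only; the kernel's rwsem packs additional waiter and hand-off state into its word.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define WRITER_BIT  (1u << 31)            /* assumed layout: bit 31 = writer */

    static atomic_uint rw_word;               /* low bits = reader count */

    bool read_trylock(void)
    {
        unsigned v = atomic_fetch_add(&rw_word, 1);   /* optimistically count ourselves */
        if (v & WRITER_BIT) {                 /* a writer holds the lock: back off */
            atomic_fetch_sub(&rw_word, 1);
            return false;
        }
        return true;
    }

    void read_unlock(void)
    {
        atomic_fetch_sub(&rw_word, 1);
    }

    bool write_trylock(void)
    {
        unsigned expected = 0;                /* no readers, no writer */
        return atomic_compare_exchange_strong(&rw_word, &expected, WRITER_BIT);
    }

    void write_unlock(void)
    {
        atomic_fetch_and(&rw_word, ~WRITER_BIT);
    }

Every reader still updates the same word, so read-mostly workloads bounce this single cache line across sockets, which is the contention on the reader-writer indicator noted above.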

2.3.3 Locking Bottlenecks in Deployed File Systems

Table 2.2 presents the lock-based scalability bottlenecks with our tool (FxMark), which found 22 bottlenecks in five widely deployed file systems. We draw the following observations, to which I/O-intensive applications must pay close attention:

• rename() is commonly used in many applications for transactional updates [79, 80, 81]. However, we found that rename() operations are completely serialized at a system level by two locks in Linux for consistent updates of the dentry cache.

• All of the tested file systems hold an exclusive lock for a file (inode->i_mutex) during a write operation. This is a critical bottleneck for high-performance database systems that allocate a large file but do not maintain the page cache by themselves (e.g., PostgreSQL). Even in the case of using direct I/O operations, this is the critical bottleneck for both read and write operations.

Table 2.2: Identified lock-based scalability bottlenecks in tested file systems with FxMark [77].

FS: Bottleneck | Object | Scope | Operation

VFS:
  Rename lock | rename_lock | System | rename()
  inode list lock | inode_sb_list_lock | System | File creation and deletion
  Directory access lock | inode->i_mutex | Directory | All directory operations
  dentry lockref [78] | dentry->d_lockref | dentry | Path name resolution

btrfs:
  Acquiring a write lock for a B-tree node | btrfs_tree_lock() | File system | All write operations
  Acquiring a read lock for a B-tree node | btrfs_set_lock_blocking_rw() | File system | All read operations
  Checking data free space | data_info->lock | File system | File append
  Reserving data blocks | delalloc_block_rsv->lock | File system | File append
  File write lock | inode->i_mutex | File | File write

ext4:
  Acquiring a read lock for a journal | journal->j_state_lock | Journal | Heavy metadata update operations
  Orphan inode list | sbi->s_orphan | File system | File deletion
  Block group lock | bgl->locks | Block group | File creation and deletion
  File write lock | inode->i_mutex | File | File write

F2FS:
  Single-threaded writing | sbi->cp_rwsem | File system | File write
  SIT (segment information table) | sit_i->sentry_lock | File system | File write
  NAT (node address table) | nm_i->nat_tree_lock | File system | File creation, deletion, and write
  File write lock | inode->i_mutex | File | File write

tmpfs:
  Capacity limit check (per-CPU counter) | sbinfo->used_blocks | File system | File write near disk full
  File write lock | inode->i_mutex | File | File write

XFS:
  Journal writing | log->l_icloglock | File system | Heavy metadata update operations
  Acquiring a read lock of an XFS inode | ip->i_iolock | File | File read
  File write lock | inode->i_mutex | File | File write

• Operations such as creating, deleting, growing, and shrinking a file are not scalable in Linux. The key reason is that existing consistency mechanisms (i.e., journaling in ext4 and XFS, copy-on-write in btrfs, and log-structured writing in F2FS) are not designed with multicore scalability in mind.

• Some file systems have peculiar scalability characteristics for some operations: a single-file read from multiple threads is not scalable in XFS due to the scalability limitation of Linux's read/write semaphore; in F2FS, a checkpointing operation caused by segment cleaning freezes entire file system operations, so scalability will be seriously hampered under write-heavy workloads with low free space; and enumerating directories in btrfs is not scalable because of too-frequent atomic operations. The scalability of an I/O-intensive application is thus very file system-specific.

• In file systems, concurrent operations are coordinated mainly by locks. Because of the spinning of spinlocks and the optimistic spinning of blocking locks, non-scalable file systems tend to consume more CPU cycles to alleviate the contention. In our benchmarks, about 30-90% of CPU cycles are consumed for synchronization.

In summary, FxMark points out that some of the file system operations are inherently non-scalable. However, some of the operations suffer from inefficient locking primitives, as shown in Table 2.2. Thus, we need scalable locking primitives, both blocking and non-blocking, that do not degrade the performance of applications. We now discuss how task scheduling impacts the performance of applications that use various synchronization primitives to guard shared resources. In particular, we focus on the problem of double scheduling that occurs when multiple task schedulers are stacked together. This thesis considers the case of a virtualized environment to discuss various issues with the task scheduling problem.

2.4 Double Scheduling in VMs

Double scheduling is a phenomenon in which two schedulers are stacked on top of each other. In the case of a virtualized environment, the VM schedules processes on vCPUs and the hypervisor schedules vCPUs on physical CPUs. One of the major issues with double scheduling, or any n-level scheduling, is the missing semantic information among the task schedulers. This semantic gap impedes the forward progress of applications because a vCPU can be evicted while executing a critical task. We describe several symptoms, in the form of preemption problems, that occur in a virtualized environment.

Symptoms of Double Scheduling. In a virtualized environment, a hypervisor multiplexes the hardware resources for a VM, such as assigning vCPUs to physical CPUs (pCPUs). In particular, it runs a vCPU for its fair share [82], which is the general policy of commodity OSes such as Linux, and preempts it because of either the vCPUs of other VMs or the intermittent processes of the OS and the bookkeeping tasks of the hypervisor, such as I/O threads. Hence, there is a possibility that the hypervisor preempts a vCPU while it is executing some critical task inside a VM, which leads to an application performance anomaly. We enumerate these anomalies below:

1 Lock holder preemption (LHP) problem occurs when a vCPU holding a lock gets preempted and all waiters waste CPU cycles waiting for the lock. Most of the prior works [83, 84, 85, 86] have focused on non-blocking primitives such as spinlocks.2 However, LHP also occurs in blocking primitives such as mutex [89] and rwsem [90, 71], which the prior works have not identified. In fact, LHP accounts for up to 90% of preemptions for blocking primitives in some memory-intensive applications that have short critical sections.

2 Lock waiter preemption (LWP) problem arises when the very next waiter is preempted just before acquiring the lock, which occurs due to the strict FIFO ordering of spinlocks [85, 86]. Fortunately, this problem has been mostly mitigated in the existing spinlock design [87, 91], as the current implementation allows waiters to steal the lock before joining the waiter queue. We do not see such a problem in blocking primitives because the current implementation is based on the TTAS lock—an unfair lock, which inherently mitigates LWP.

3 Blocked-waiter wakeup (BWW) problem occurs mostly for blocking primitives, in which the latency to wake up a waiter to pass the lock is quite high. This issue severely degrades the throughput of applications running on a high core count [66], even in a native environment. Moreover, it is evident in both under- and over-committed VM scenarios. For example, the BWW problem degrades application scalability by up to 60%.

4 Readers preemption (RP) problem is a new class of problem that occurs when a vCPU holding a read lock among multiple readers gets preempted. This problem impedes the forward progress of a VM and also increases the latency of the write lock. For instance, various memory-intensive workloads have sub-optimal throughput, as RP accounts for up to 20% of preemptions. We observe this issue in various read-dominated memory-intensive workloads in which the readers are scheduled out.

2 Non-blocking locks, both holders and waiters, do not schedule out. However, the para-virtualized interface converts spinlocks to blocking locks (only waiters) with hypercalls [87, 88] to overcome LHP/LWP issues.

5 RCU reader preemption (RRP) problem is a type of RP problem that occurs when an RCU reader is preempted while holding the RCU read lock [92]. Because of RRP, the guest OS suffers from an increased quiescence period. This issue can increase the memory footprint of the application and is responsible for 5% of preemptions.

6 Interrupt context preemption (ICP) problem happens when a vCPU that is executing an interrupt context gets preempted. This problem is different from prior works that focus on interrupt delivery [93, 94] rather than interrupt handling. The issue occurs in cases such as TLB shootdowns, function call interrupts, rescheduling interrupts, and IRQ work interrupts in every commodity OS. For example, we found that the Apache web server, an interrupt-intensive workload, suffers from the ICP problem, as it accounts for almost 18% of preemptions in the evaluated workloads.

2.5 Conclusion

In summary, this chapter showed how the scheduling of events, from ordering in concurrency algorithms and waiters in lock algorithms to stacked task scheduling, impacts the performance of applications. At a high level, applications suffer from synchronization primitives that degrade their performance on large multicore machines. This happens because these primitives rely on building blocks that incur high overhead and do not cater to the evolving hardware and software requirements. Moreover, levels of indirection introduce a semantic gap, which affects the performance of these primitives and thereby the performance of applications.

CHAPTER 3
ORDERING PRIMITIVE

This chapter presents a scalable timestamping primitive, called Ordo, that employs invariant hardware clocks, which current major processor architectures already support [21, 22, 23, 24]. An invariant clock has a unique property: it is monotonically increasing and has a constant skew, regardless of dynamic frequency and voltage scaling, and it resets itself to zero whenever a machine is reset (on receiving the RESET signal), which is ensured by the processor vendors [25, 26, 22, 24]. However, assuming that all invariant clocks in a machine are synchronized is incorrect because processors do not receive the RESET signal at the same time. Thus, we cannot compare two clocks with confidence when designing any time-based concurrent algorithm. Moreover, we cannot exactly synchronize these clocks because 1) the conventional clock-synchronization algorithms [56, 57, 54] provide only a time bound that can have a lot of overhead, and 2) the hardware vendors do not divulge the communication cost that is required to establish strict time bounds for software clock synchronization to work effectively. Comparing clocks without confidence makes them unusable for correctly designing a concurrent algorithm. Thus, to provide a guarantee of correct use of these clocks, we propose

a new primitive, called Ordo, that embraces uncertainty when we compare two clocks and provides an illusion of a globally synchronized hardware clock in a single machine. However, the notion of a globally synchronized hardware clock is only feasible if we can measure the offset between clocks. Unfortunately, accurately measuring this offset is difficult because hardware does not provide minimum bounds for the message delivery [56, 57, 54]. To solve this problem, we exploit the unique property of invariant clocks and empirically define the uncertainty window by utilizing one-way-delay latency among clocks.

Under the assumption of invariant clocks, Ordo provides an uncertainty window that

def get_time():  # Get timestamp without memory reordering
    return hardware_timestamp()  # Timestamp instruction

def cmp_time(time_t t1, time_t t2):  # Compare two timestamps
    if t1 > t2 + ORDO_BOUNDARY:  # t1 > t2
        return 1
    elif t1 + ORDO_BOUNDARY < t2:  # t1 < t2
        return -1
    return 0  # Uncertain

def new_time(time_t t):  # New timestamp after ORDO_BOUNDARY
    while cmp_time(new_t = get_time(), t) is not 1:
        continue  # pause for a while and retry
    return new_t  # new_t is greater than (t + ORDO_BOUNDARY)

Listing 3.1: Ordo clock API. The get_time() method returns the current timestamp without reordering instructions.

remains constant while a machine is running. Thus, Ordo enables algorithms to become multicore friendly by either replacing the software clock or correctly using a timestamp with a minimal core-local computation for an ordering guarantee. We find that various timestamp-

based algorithms can be simplified as well as benefit from our Ordo primitive, which has led us to design a simple API. The only trick lies in handling the uncertainty window, which we explain for both physical and logical timestamp-based algorithms such as a concurrent

data structure library, and concurrency control algorithms for STM and databases. With

our Ordo primitive, we improve the scalability of various algorithms and systems software

(e.g., RLU, OCC, Hekaton, TL2, and forking) up to 39.7× across four architectures:

Intel Xeon and Xeon Phi, AMD, and ARM, while maintaining equivalent performance in

optimized scenarios. Moreover, our version of the conventional OCC algorithm outperforms the state-of-the-art algorithm by 24% in a lightly contended case for the TPC-C workload.

3.1 Ordo: A Scalable Ordering Primitive

Ordo relies on invariant hardware clocks to provide an illusion of a globally synchronized

hardware clock with some uncertainty. To provide such an illusion, Ordo exposes a simple API that allows applications either to obtain a global timestamp or to order events with some uncertainty. Thus, we introduce an approach to measure the uncertainty window, followed

by a proof to ensure its correctness.

3.1.1 Embracing Uncertainty in Clock: Ordo API

Timestamp-based concurrent algorithms can reliably use an invariant clock if we define the uncertainty period. Moreover, such algorithms are designed to execute on two or more cores/threads, which require two important properties from invariant clocks: 1) establishing a relation among two or more cores to compare events and 2) providing a notion of a monotonically increasing globally synchronized clock to order events in a single machine.

Thus, we propose Ordo, which provides a notion of a monotonically increasing timestamp

but also exposes an uncertainty window, called ORDO_BOUNDARY, in which we are unsure

of the ordering. To ease the use of our Ordo primitive, we expose three simple methods (Listing 3.1) for such algorithms:

• get_time() is the hardware-specific timestamping instruction that also takes care of the hardware-specific requirements such as instruction reordering.

• new_time(time_t t) returns a new timestamp at the granularity of ORDO_BOUNDARY

and is greater than t.

• cmp_time(time_t t1, time_t t2) establishes the precedence relation between

timestamps with the help of ORDO_BOUNDARY. If the difference between t1 and t2 is

within ORDO_BOUNDARY, it returns 0, meaning that we are uncertain.
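To make the API concrete, the short sketch below exercises the three methods. It is an illustrative sketch only, not the actual implementation: it uses Python's time.monotonic_ns() as a stand-in for the invariant hardware clock and assumes an ORDO_BOUNDARY of 276 ns, the Xeon value from Table 3.1.

import time

ORDO_BOUNDARY = 276                 # ns; assumed here (Xeon row of Table 3.1)
get_time = time.monotonic_ns        # stand-in for the hardware timestamp instruction

def cmp_time(t1, t2):               # 1: t1 after t2, -1: before, 0: uncertain
    if t1 > t2 + ORDO_BOUNDARY:
        return 1
    if t1 + ORDO_BOUNDARY < t2:
        return -1
    return 0

def new_time(t):                    # spin until unambiguously past t + ORDO_BOUNDARY
    while True:
        new_t = get_time()
        if cmp_time(new_t, t) == 1:
            return new_t

t1 = get_time()                     # event on one core
t2 = get_time()                     # event on (conceptually) another core
print(cmp_time(t2, t1))             # 1, -1, or 0 (inside the uncertainty window)

t3 = new_time(t1)                   # timestamp every core will see as newer than t1
assert cmp_time(t3, t1) == 1

The later sketches in this chapter assume these same get_time(), cmp_time(), and new_time() stand-ins.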

3.1.2 Measuring Uncertainty between Clocks: Calculating ORDO_BOUNDARY

Under the assumption of invariant clocks, the uncertainty window (or skew) between clocks is constant because both clocks monotonically increase at the same rate but may receive a RESET signal at different times. We define this uncertainty window as a physical offset: ∆. A common approach to measure ∆ is to use a clock-synchronization mechanism, in which a clock reads its value and the value of other clocks, computes an offset, and then adjusts its clock by the measured offset [95]. However, this measurement introduces various errors,

 1 runs = 100000  # multiple runs to minimize overheads
 2 shared_cacheline = {"clock": 0, "phase": INIT}
 3
 4 def remote_worker():
 5     for i in range(runs):
 6         while shared_cacheline["phase"] != READY:
 7             read_fence()  # flush load buffer
 8         ATOMIC_WRITE(shared_cacheline["clock"], get_time())
 9         barrier_wait()  # synchronize with the local_worker
10
11 def local_worker():
12     min_offset = INFINITY
13     for i in range(runs):
14         shared_cacheline["clock"] = 0
15         shared_cacheline["phase"] = READY
16         while shared_cacheline["clock"] == 0:
17             read_fence()  # flush load buffer
18         min_offset = min(min_offset, get_time() -
19                          shared_cacheline["clock"])
20         barrier_wait()  # synchronize and restart the process
21     return min_offset
22
23 def clock_offset(c0, c1):
24     run_on_core(remote_worker, c1)
25     return run_on_core(local_worker, c0)
26
27 def get_ordo_boundary(num_cpus):
28     global_offset = 0
29     for c0, c1 in combinations([0 ... num_cpus], 2):
30         global_offset = max(global_offset,
31                             max(clock_offset(c0, c1),
32                                 clock_offset(c1, c0)))
33     return global_offset

Figure 3.1: Algorithm to calculate the ORDO_BOUNDARY: a system-wide global offset.

such as reading remote clocks and software overheads, including network jitter. Thus, we cannot rely on this method to define the ORDO_BOUNDARY because 1) hardware vendors do not provide minimum bounds on the message delivery; 2) the algorithm is erroneous because of the uncertainty of the message delivery [56, 57, 54];1 and 3) periodically using clock synchronization would waste CPU cycles, which would increase with core count. Moreover, the measured offset can be either larger or smaller than ∆, which renders it unusable for concurrent algorithms.

1 Although clock synchronization algorithms are widely used in distributed systems settings where multiple clocks are in use, they are also applicable in a single machine with per-core clocks; such a machine maintains the notion of a global clock by synchronizing itself with an outside clock over the network [56].

Measuring global offset

Instead of synchronizing these clocks among themselves, we exploit their unique invariant

property and empirically calculate a system-wide global offset, called ORDO_BOUNDARY, which ensures that the measured offset between clocks is always greater than their physical

offset. We define measured offset (δij) as a one-way-delay latency between clocks (ci to

cj), and this measured offset (δ) will be greater than the physical offset because of the extra one-way-delay latency. Figure 3.1 illustrates our algorithm to calculate the global offset after calculating the

correct pairwise offset for all clocks in a machine. We measure the offset between core ci

and cj as follows: ci atomically writes its timestamp value to the variable (line 8), which

notifies waiting cj (line 16). On receiving the notification, cj reads its own timestamp

(line 18) and then calculates the difference (line 19), called δij, and returns the value. The

measured offset has an extra error over the physical offset because δij, between cores ci

and cj, also includes software overhead, interrupts, and the coherence traffic. We reduce this error by calculating the offset multiple times (line1) and taking the minimum of all

runs (lines 19–21). To define the correct offset between ci and cj, we calculate the offset from

both ends (i.e., δij and δji) and choose their maximum since we do not know which clock is ahead of the other (line 31). After calculating pairwise offsets among all cores, we

select the maximum offset as the global offset, or ORDO_BOUNDARY (line 30). We take the maximum offset among all cores because it ensures that any arbitrary core is guaranteed to

see a new timestamp once the ORDO_BOUNDARY window is over, which enables us to compare

timestamps with confidence. Moreover, the calculated ORDO_BOUNDARY is reasonably small because we use cache coherence as our message delivery medium, which is the fastest means of communication between cores.


Figure 3.2: Calculating the offset (δi↔j) using the pairwise one-way-delay latency between clocks (ci and cj). ∆ij and ∆ji are the physical offsets between ci and cj, and cj and ci, respectively. δij and δji are the offsets measured with our approach. Depending on the physical and measured offsets, there are four possible cases. Unlike existing clock-synchronization protocols [56, 57, 54, 95] that average the calculated latency based on RTT, we consider each direction separately to measure the offset and consider only the direction that is always greater than the positive physical offset, such as cases 1 and 4.

Correctness of the measured offset

For invariant clocks to be useful, our approach to calculate the ORDO_BOUNDARY must have the following invariant: Invariant: The global measured offset is always greater than the maximum physical offset between cores.

We first state our assumptions and prove with a lemma and a theorem that our algorithm (Figure 3.1) to calculate the global offset is correct with respect to invariant clocks.

Assumptions. Our primary assumption is that clocks are invariant and have a physical offset that remains constant until the machine is reset. Moreover, the hardware’s external clock-synchronization protocol maintains the monotonic increase of these clocks because it guarantees an invariant clock [25, 26, 53], which is a reasonable assumption for modern,

mature many-core processors such as x86, SPARC, and ARM. Hence, with the invariant timestamp property, the physical offset, or skew, between clocks also remains constant.

Lemma

The maximum of calculated offsets from ci to cj and cj to ci pairs (i.e., δi↔j =

max(δij, δji)) is always greater than or equal to the physical offset (i.e., ∆ij).

Proof

For cores ci and cj:

δij = |cj(t1) − ci(t0)| = |cj(t0 + T) − ci(t0)|    (3.1)

δji = |ci(t1) − cj(t0)| = |ci(t0 + T′) − cj(t0)|    (3.2)

δij and δji denote the offset measured from ci to cj and from cj to ci, respectively, and |∆ij| = |∆ji|. ci(t0) and cj(t0) denote the clock values at a real time t0; each clock is a constant monotonically increasing function with a timestamp frequency defined by the hardware. T and T′ are the recorded true times when the cores receive a message from another core (Figure 3.1: line 16), which denote the one-way-delay latency in either direction (Figure 3.1: line 31). Depending on the relation between the clocks at time t (ci(t) and cj(t)) and the measured offsets (δij and δji) from Equations 3.1 and 3.2, Figure 3.2 presents four possible cases: 1 and 2 if cj(t) ≥ ci(t), and 3 and 4 if ci(t) ≥ cj(t).

Let us consider the scenario in which cj(t) ≥ ci(t), representing cases 1 and 2, and assume that we obtain case 2. By contradiction, case 2 is not possible because our algorithm adds the extra cost of a cache-line transfer (line 16) to measure the offset of constant monotonically increasing clocks, which will always return case 1; thus, δji is going to be greater than δij. Since we do not know which clock is lagging, we calculate the other possible scenarios (ci(t) ≥ cj(t): cases 3 and 4) so that we always obtain the maximum of the two measured offsets without any physical offset information (lines 31–32). Thus, either case 1 or case 4 always guarantees that the measured offset is greater than the physical offset.

Theorem

The global offset is the maximum of all the pairwise measured offsets for each core.

Proof

δg = max(δi↔j) = max(max(δij, δji)), ∀ i, j ∈ {0..N}, where N is the number of cores and δg is the ORDO_BOUNDARY. We extend the lemma to all pairwise combinations of cores (refer to Figure 3.1, line 29) to obtain the maximum offset value among all cores. This approach 1) establishes a relation among all cores such that if any core wants a new timestamp, it can obtain such a value at the expiration of the ORDO_BOUNDARY, and 2) establishes an uncertainty period at a global level in which we can definitely provide the precedence relationship among two or more timestamps if all of them differ by more than the global offset; otherwise, we mark their comparison as uncertain if they fall within that offset window.
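As a tiny worked example of the theorem (the pairwise offsets below are made up, not measured values), the global offset is simply the maximum over max(δij, δji) for every pair of cores:

from itertools import combinations

# Hypothetical measured one-way offsets (ns) for a 3-core machine: delta[(i, j)] = δij
delta = {(0, 1): 70,  (1, 0): 120,
         (0, 2): 95,  (2, 0): 90,
         (1, 2): 110, (2, 1): 130}

ordo_boundary = max(max(delta[(i, j)], delta[(j, i)])
                    for i, j in combinations(range(3), 2))
print(ordo_boundary)   # 130: the pairwise maxima are 120, 95, and 130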

Logical Timestamping: CC Algorithm

txn_read/write(txn):
    txn.timestamp = get_timestamp()
    # Add to read/write set

txn_commit(txn):
    lock_writeset(txn)  # Acquire per-object lock
    commit_txn = get_new_timestamp()
    if validate_readset(txn, commit_txn) is True:
        commit_write_changes(txn, commit_txn)

Physical Timestamping: Oplog

op_log(op):
    local_log = get_local_log()
    thd.log.append([op, get_timestamp()])

op_synchronize(op):
    current_log = []
    for log in all_logs():
        # Add all op objects to the current_log
    current_log.sort()  # Sort via timestamp
    apply(current_log)  # Apply all op objects

Listing 3.2: Pseudo-code of the logical timestamping algorithms used in STM [1, 2] and databases [3, 4, 5], and of the physical timestamping used in Oplog [6].

3.2 Algorithms with Ordo without Uncertainty

With the Ordo API, an important question is how do we handle the uncertainty exposed

by the Ordo primitive while adopting it for timestamp-based algorithms? We answer this question by classifying them into two categories: • Physical to logical timestamping: Algorithms that rely on logical timestamping or versioning (refer to Listing 3.2) will require all three methods, but the most

important one is the cmp_time(), which provides the ordering between clocks to

ensure their correctness. By introducing the ORDO_BOUNDARY, we now need to extend these algorithms to either defer or wait to execute any further operations. • Hardware timestamping: Algorithms that use physical timestamping [43,6] can use

new_time() method to access a globally synchronized clock to ensure their correctness under invariant clocks.

3.2.1 Read-Log-Update (RLU)

RLU is an extension to the RCU framework that enables a semi-automated method of concurrent read-only traversals with multiple updates. It is inspired by STM [96] and maintains an object-level write-log per thread, in which writers first lock the object and then copy those objects to their own write log and manipulate them locally without affecting

the original structure. RLU adopts the RCU barrier mechanism with a global clock-based logging mechanism [1, 50].

Listing 3.3 illustrates the pseudo-code of the functions of RLU that use the global clock.

We refer to the pseudo-code to explain its workings, its limitations, and our changes to the RLU design. RLU works as follows: All operations reference the global clock in the beginning (line 2) and rely on it to dereference the shared objects (line 15). For a write operation, each writer first dereferences the object and copies it into its own log after locking the object. At the time of commit (line 25), it increments the global clock (line 27), which effectively splits

 1 # All operations acquire the lock
 2 def rlu_reader_lock(ctx):
 3     ctx.is_writer = False
 4     ctx.run_count = ctx.run_count + 1      # Set active
 5     memory_fence()
 6 -   ctx.local_clock = global_clock         # Record global clock
 7 +   ctx.local_clock = get_time()           # Record Ordo global clock
 8
 9 def rlu_reader_unlock(ctx):
10     ctx.run_count = ctx.run_count + 1      # Set inactive
11     if ctx.is_writer is True:
12         rlu_commit_write_log(ctx)          # Write updates
13 ------
14 # Pointer dereference
15 def rlu_dereference(ctx, obj):
16     ptr_copy = get_copy(obj)               # Get pointer copy
17     # Return object or object copy
18     other_ctx = get_ctx(thread_id)         # check for stealing
19 -   if other_ctx.write_clock <= ctx.local_clock:
20 +   if cmp_time(ctx.local_clock, other_ctx.write_clock) == 1:
21         return ptr_copy                    # return ptr_copy (stolen)
22     return obj                             # no stealing, return the object
23 ------
24 # Memory commit
25 def rlu_commit_write_log(ctx):
26 -   ctx.write_clock = global_clock + 1     # Enable stealing
27 -   fetch_and_add(global_clock, 1)         # Advance clock
28 +   # Ordo clock with an extra ORDO_BOUNDARY for correctness
29 +   ctx.write_clock = new_time(ctx.local_clock + ORDO_BOUNDARY)
30     rlu_synchronize(ctx)                   # Drain readers
31     rlu_writeback_write_log(ctx)           # Safe to write back
32     rlu_unlock_write_log(ctx)
33     ctx.write_clock = INFINITY             # Disable stealing
34     rlu_swap_write_logs(ctx)               # Quiesce write logs
35 ------
36 # Synchronize with all threads
37 def rlu_synchronize(ctx):
38     for thread_id in active_threads:
39         other = get_ctx(thread_id)
40         ctx.sync_cnts[thread_id] = other.run_cnt
41     for thread_id in active_threads:
42         while True:
43             if ctx.sync_cnts[thread_id] & 0x1 != 0:
44                 break                      # not active
45             other = get_ctx(thread_id)
46             if ctx.sync_cnts[thread_id] != other.run_cnt:
47                 break                      # already progressed
48 -           if ctx.write_clock <= other.local_clock:
49 +           if cmp_time(other.local_clock, ctx.write_clock) == 1:
50                 break                      # started after me

Listing 3.3: RLU pseudo-code including our changes.

the memory snapshot into the old and the new one. While old readers, which have smaller clock values than the incremented global clock, refer to the old snapshot, new readers read the new snapshot that starts after the increment of the global clock (lines 18–22). Later, after increasing the global clock, writers wait for old readers to finish by executing the RCU-style quiescence loop (lines 41–50), while new operations obtain new objects from the writer log. As soon as the quiescence period is over, the writer safely writes back the new objects from its own log to the shared memory (line 31) and then releases the lock (line 32). In summary, RLU has three scalability bottlenecks: 1) maintaining and referencing the global clock, which does not scale with increasing core count; 2) the lock/unlock operation on an object; and 3) copying an object for a write operation. RLU tries to mitigate the first problem by employing a defer-based approach, but it comes at the cost of extra memory utilization. The last two are design choices that prefer programmability over a hand-crafted copy management mechanism.

We address the scalability bottleneck of the global clock with the Ordo primitive. Now,

every read and write operation refers to the global clock via get_time() (line 7) and relies on it to dereference the object by comparing the timestamps of two contexts with the cmp_time() method (line 20), which provides the same comparison semantics as before the modification (line 19). At the time of commit, we demarcate between the old and new

snapshot by obtaining a new timestamp via the new_time() method (line 29), and later rely on this timestamp to maintain the clock-based quiescence period for readers that are still

using the old snapshot (line 49). Note that we add an extra ORDO_BOUNDARY to correctly differentiate between the old snapshot and the newer one, as we may obtain an incorrect snapshot if one of the clocks has negative skew.

Our modification does not break the correctness of the RLU algorithm, i.e., an RLU-protected section always observes a consistent memory snapshot. In other words, at time t′ > t, because of the constant monotonically increasing clock, the time obtained at the commit time of the writer (line 29) is always greater than the previous write timestamp, thereby keeping a protected RLU section from seeing a concurrent overwrite. There can be an inconsistency if a thread has just updated a value and another thread tries to steal the object while having a negative skew relative to the committing thread's clock. In this case, the reading thread may read an old snapshot, which can yield an inconsistent memory snapshot.

Since RLU supports only a single version, we address this issue by adding an extra

ORDO_BOUNDARY at the commit time, which ensures that we have at least one ORDO_BOUNDARY difference between or among threads. Moreover, our modification enforces the invariant for writers in the following ways: 1) If a new writer is able to steal the copy from the other writer (line 20), it still holds the invariant; 2) If a new writer is unable to steal the copy of the locked object (line 22), the old writer will quiesce while holding the writer lock (line 37), which will force the new writer to abort and retry because RLU does not allow writer-writer

conflict. Note that the RLU algorithm already takes care of the writer log quiescence by maintaining two versions of logs, which are swapped to allow stealing readers to use them (line 34).

3.2.2 Concurrency Control for Databases

Database systems serve highly concurrent transactions and queries with strong consistency guarantees by using concurrency control (CC) schemes. Two main CC schemes are popular among state-of-the-art database systems: optimistic concurrency control (OCC) and multi-version concurrency control (MVCC). Both schemes use timestamps to decide the global commit order without locking overhead [97]. Listing 3.2 presents the pseudo-code of both CC algorithms.

OCC is a single-version CC scheme that consists of three phases: 1) read, 2) validation, and 3) write. In the first phase, a worker keeps the footprints of a transaction in local memory. At commit time, it acquires locks on the write set. After that, it checks whether the transaction violates serializability in the validation phase by assigning a global commit timestamp to the transaction and validates both the read and write set by comparing their timestamps with the commit timestamp. After the validation phase, the worker enters the write phase in

which it makes its write set visible by overwriting original tuples and releasing held locks.

To address the problem of logical timestamps in OCC [3], some state-of-the-art OCC schemes have mitigated the global timestamp updates with either conservative read validation [5, 98] or data-driven timestamping [34].

To show the impact of Ordo, we modify the first two phases—read and validation phases—

of the OCC algorithm [3]. In the read phase, we assign the timestamp via new_time(), which guarantees that the new timestamp will be greater than the previous one. The validation

scheme is the same as before, but the difference is that get_time() provides the commit timestamp. The worker then uses the commit timestamp to validate the read set by comparing it with the recorded timestamp of both the read and write set. We apply a conservative

approach of aborting the transactions if two timestamps fall within the ORDO_BOUNDARY in the validation step. Two requirements ensure serializability: 1) obtaining a new timestamp, which new_time() ensures, and 2) handling the uncertainty window, in which we conservatively abort transactions, thereby resolving the later-conflict check [99] to ensure the global ordering of transactions.
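The sketch below illustrates the resulting commit path. It is not the DBx1000 code; the helper names (lock_writeset(), changed_since_read(), and so on) are hypothetical, and the get_time()/cmp_time() stand-ins from the earlier Ordo API sketch are assumed. The key point is that any validation comparison that falls inside the ORDO_BOUNDARY is treated as a conflict and aborts the transaction conservatively.

def occ_commit_ordo(txn):
    lock_writeset(txn)                         # acquire per-tuple locks on the write set
    commit_ts = get_time()                     # commit timestamp from the local invariant clock
    for entry in txn.read_set + txn.write_set:
        # Proceed only if commit_ts is *definitely* newer than the recorded tuple
        # timestamp; an uncertain (0) or inverted (-1) result aborts conservatively.
        if cmp_time(commit_ts, entry.recorded_ts) != 1 or entry.changed_since_read():
            unlock_writeset(txn)
            return abort(txn)
    for entry in txn.write_set:                # write phase: install values and commit_ts
        entry.install(commit_ts)
    unlock_writeset(txn)
    return True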

MVCC is another major category of CC schemes. Unlike single-version CC schemes,

MVCC takes an append-only update approach and maintains a version history, and uses

timestamps to determine which version of records to serve. MVCC avoids reader-writer conflicts by forwarding them to physically different versions, thereby making this scheme

more robust than OCC under high contention. To support time traveling and deciding the

commit order, MVCC relies on logical timestamping in the read and validation phases, which

leads to severe scalability collapse (4.1–31.1× in Figure 3.9) due to the timestamp allocation with increasing core count [100]. We chose Hekaton as our example because it is the state-of-the-art in-memory database system [101]. Although it supports multiple CC levels with varying consistency guarantees, we focus on serializable, optimistic MVCC mode. We first describe the workings of the original algorithm and then introduce our modifications with the Ordo primitive.

Hekaton has the same three phases as OCC and works as follows: A worker in a transaction reads the global clock at the beginning of the transaction. During the read stage, a transaction reads a version only when its begin timestamp falls between the begin and end timestamps2 of the committed version. During an update, a worker immediately appends its version to the database. The version consists of the transaction ID (TID) of the owner transaction, marked in the begin timestamp field. At commit, the transaction assigns a commit timestamp to 1) determine the serialization order, 2) validate the read/scan sets, and 3) iterate over the write set to replace the TIDs installed in the begin timestamp field with that timestamp. Meanwhile, another transaction that encounters a TID-installed version examines visibility with the begin and end timestamps of the owner transaction.

We apply the Ordo primitive by first replacing the timestamp allocation with the new_time() method at the beginning and during the commit phase of a transaction. As a result, this modification introduces uncertainty when comparing timestamps from different local clocks

during the visibility check. We substitute the comparison with cmp_time() to ensure correctness. Thus, a worker proceeds only if the difference between timestamps is greater

than the ORDO_BOUNDARY that provides a definite precedence relation. Otherwise, we either

restart that transaction or force it to abort. However, given the small size of ORDO_BOUNDARY, we expect the aborts caused by this uncertainty to be rare. In terms of correctness, our approach provides the same consistency guarantee as the original one, i.e., 1) obtaining a

unique timestamp, which new_time() ensures, and 2) maintaining the serial schedule, which cmp_time() also maintains by conservatively committing only those transactions that have a definite precedence relation.
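A sketch of the corresponding visibility check is shown below (the field and helper names are hypothetical; cmp_time() is the Listing 3.1 method): a version is served only when the transaction's begin timestamp lies unambiguously inside the version's lifetime, and any comparison that lands in the uncertainty window restarts the transaction.

def read_record_ordo(txn, record):
    for version in record.versions:            # newest to oldest
        after_begin = cmp_time(txn.begin_ts, version.begin_ts)
        before_end = cmp_time(version.end_ts, txn.begin_ts)
        if after_begin == 0 or before_end == 0:
            return restart(txn)                # uncertain ordering: be conservative
        if after_begin == 1 and before_end == 1:
            return version                     # definitely within [begin, end)
    return restart(txn)                        # no visible version for this transaction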

3.2.3 Software Transactional Memory (TL2)

We choose TL2 [1], an ownership- and word-based STM algorithm that employs timestamping for reducing the common-case overhead of validation. TL2 works by ordering

2 Each version of a record has a begin timestamp, which indicates when the version becomes valid, and an end timestamp, which denotes when the version becomes invalid.

the update and access relative to a global logical clock and checking the ownership record only once, i.e., it validates all transactional reads at the time of commit. The algorithm works as follows: 1) A transaction begins by storing a global time value (start). 2) For a transactional load, it first checks whether an orec3 is unlocked and has no newer timestamp than the start timestamp of that transaction; it then adds the orec to its read set, and appends the address-value pair in the case of a transactional write. 3) At the time of commit, it first acquires all of the locks in the write set and then validates all elements of the read set by comparing the timestamp of the orec with the start timestamp of the transaction. If successful, the transaction writes back the data and obtains a new timestamp, denoting the end of the transaction (end). It uses the end timestamp as a linearization point, which it atomically writes in each orec of the write set (write address) and also releases the lock. Here, end is guaranteed to be greater than the start timestamp because a transaction atomically increments the global clock, which ensures the linearizability of a transactional update.

The basic requirement of the TL2 algorithm is that end should be greater than start, which the Ordo primitive (new_time()) ensures. Moreover, two disjoint transactions can

share the same timestamp [102]. Thus, by using the Ordo API, we modify the algorithm

as follows: We assign start with the value of new_time(), and use new_time() to obtain a definite newer timestamp for the end variable. Here, we again adopt a conservative

approach of aborting transactions if two timestamps fall in the ORDO_BOUNDARY, as this can corrupt the memory or result in an undefined behavior of the program [1]. Although we can even use timestamp extension to remove aborts that occur during the read timestamp validation at the beginning of a transactional read, it may not benefit us because of the very small ORDO_BOUNDARY. Our modification ensures linearizability by 1) first providing an

increasing timestamp via new_time() that globally provides a new timestamp value and 2) always validating the read set at the time of commit. In addition, we abort transactions if

two timestamps (read set timestamp and commit timestamp) fall in the ORDO_BOUNDARY to

3 An orec is an ownership record, which stores either the identity of a lock holder or the most recent unlocked timestamp.

remove uncertainty during the validation phase. Similarly, while performing a transactional load, we apply the same strategy to remove any inconsistencies.
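The following sketch captures the modified load and commit paths (hypothetical helper names such as orec_for() and stamp_and_unlock(); Listing 3.1's methods assumed). As in the OCC case, a comparison against an orec that falls inside the ORDO_BOUNDARY aborts the transaction.

def tl2_load_ordo(txn, addr):
    orec = orec_for(addr)                      # ownership record guarding addr
    # Abort unless the orec is unlocked and *definitely* older than txn.start
    if orec.locked or cmp_time(txn.start, orec.ts) != 1:
        return abort(txn)
    txn.read_set.append(orec)
    return load(addr)

def tl2_commit_ordo(txn):
    lock_all(txn.write_set)
    txn.end = new_time(txn.start)              # definitely newer than start on every core
    for orec in txn.read_set:                  # re-validate reads against txn.start
        if orec.locked_by_other(txn) or cmp_time(txn.start, orec.ts) != 1:
            unlock_all(txn.write_set)
            return abort(txn)
    write_back(txn.write_set)
    stamp_and_unlock(txn.write_set, txn.end)   # txn.end is the linearization point
    return True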

3.2.4 Oplog: An Update-heavy Data Structures Library

Besides logical timestamping, physical timestamping is becoming common. In particular, there is an efficient implementation of a concurrent stack [43] and an update-heavy concur-

rency mechanism (Oplog [6]), with the same commonality of performing operations in a

decentralized manner. We extend the Ordo primitive to Oplog. It maintains a per-core log, which stores update operations in a temporal order, and it applies logs on the centralized data structure in a sorted temporal order during the read phase. To ensure the correct temporal

ordering across all per-core logs, Oplog requires a system-wide synchronized clock to deter-

mine the ordering of operations. We modify Oplog to use the new_time() method, which has a notion of a globally synchronized hardware clock while appending operations to a per-core

log and use cmp_time() to compare timestamps. Oplog ensures linearizability by relying on the system-wide synchronized clock, which ensures that the obtained timestamps are

always in increasing order. Our modification guarantees linearizability because new_time() exposes a globally synchronized clock, which always provides a monotonically increasing timestamp that will always be greater than the previous value across all invariant clocks.

There is a possibility that two or more timestamps fall inside the ORDO_BOUNDARY during the merge phase, which denotes concurrent update operations; this is also possible in the original Oplog design. We address this problem by using the same technique as the original Oplog design, which is to apply these operations in ascending order of the core ID.
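A sketch of the modified append and merge steps follows (the log layout and apply_op() helper are hypothetical; Listing 3.1's methods are assumed): entries are stamped with new_time() when they are appended, and during the merge the combined log is ordered with cmp_time(), falling back to ascending core IDs whenever two timestamps fall inside the ORDO_BOUNDARY.

import functools

def op_log_ordo(core, op):
    # Stamp the operation with a timestamp that is definitely newer than the
    # last one this core recorded, then append it to the core-local log.
    core.last_ts = new_time(core.last_ts)
    core.log.append((core.last_ts, core.id, op))

def op_synchronize_ordo(cores):
    merged = [entry for core in cores for entry in core.log]

    def order(a, b):
        c = cmp_time(a[0], b[0])               # compare timestamps first
        return c if c != 0 else a[1] - b[1]    # uncertain: ascending core-ID order

    for _, _, op in sorted(merged, key=functools.cmp_to_key(order)):
        apply_op(op)                           # replay on the centralized structure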

3.3 Implementation

Our library and micro benchmarks comprise 200 and 1,100 lines of code (LoC) in C, respectively, and support architecture-specific timers for the different architectures. We

modify various programs to show the effectiveness of our Ordo primitive. We modify 50

LoC in the RLU code base to support both the Ordo primitive and the architecture-specific code

for ARM. To specifically evaluate database concurrency protocols, we choose DBx1000 [103]

because it includes all database CC algorithms. We modify 400 LoC to support both OCC

and Hekaton algorithms, including changes for the ARM architecture. We use an x86 port of

the TL2 algorithm [104], which we extend (25 LoC) using the Ordo API.

3.4 Evaluation

We evaluate the effectiveness of the Ordo primitive by answering the following key questions: • How does an invariant hardware clock scale on various commodity architectures? (§3.4.1)

• What is the scalability characteristic of the Ordo primitive? (§3.4.2)

• What is the impact of the Ordo primitive on algorithms that rely on synchronized clocks? (§3.4.3)

• What is the impact of the Ordo primitive on version-based algorithms? (§3.4.4, §3.4.5, §3.4.6)

• How does the ORDO_BOUNDARY affect the scalability of algorithms? (§3.4.7)

Experimental setup. Table 3.1 lists the specifications of four machines, namely, a 120-core

Intel Xeon (having two hyperthreads), a 64-core Intel Xeon Phi (having four hyperthreads),

a 32-core AMD, and a 96-core ARM machine. The first three machines have x86 architecture,

out of which Xeon has a higher number of physical cores and sockets, Phi has a higher

degree of parallelism, and AMD is from a different processor vendor. These three processors

have invariant hardware clocks in their specification. Moreover, we also use a 96-core ARM machine, whose clock is different from existing architectures. It supports a separate generic timer interface, which exists as a separate entity inside a processor [24]. We evaluate the scalability of clocks and algorithms up to the maximum number of hardware threads in a machine.


Figure 3.3: Micro evaluation of the invariant hardware clocks used by the Ordo primitive. (a) shows the cost of a single timestamping instruction when it is executed by varying the number of threads in parallel; (b) shows the number of per-core generated timestamps in a microsecond with atomic increments (A) and with new_time() (O), which generates timestamps at each ORDO_BOUNDARY.

Table 3.1: The machine configurations that we use in our evaluation, as well as the calculated offsets between cores. min is the minimum offset between cores, and max is the global offset, called ORDO_BOUNDARY (refer to Figure 3.1), which we use; offsets are measured including up to the maximum number of hardware threads (Cores × SMT) in a machine.

Machine        | Cores | SMT | Speed (GHz) | Sockets | min (ns) | max (ns)
Intel Xeon     | 120   | 2   | 2.4         | 8       | 70       | 276
Intel Xeon Phi | 64    | 4   | 1.3         | 1       | 90       | 270
AMD            | 32    | 1   | 2.8         | 8       | 93       | 203
ARM            | 96    | 1   | 2.0         | 2       | 100      | 1,100

3.4.1 Scalability of Invariant Hardware Clocks

We create a simple benchmark to measure the cost of hardware timestamping instructions on various architectures. The benchmark forks a process on each core and repeatedly issues the timestamp instruction for a period of 10 seconds. Figure 3.3 shows the cost of the timestamp instruction on four different architectures. We observe that this cost remains constant up to the physical core count but increases with increasing hardware threads, which is evident on Xeon and Phi. Still, it is comparable to an atomic instruction in a medium-contended case. One important point is that ARM supports a scalable timer whose cost (11.5 ns) is equivalent to that of Xeon (10.3 ns). In summary, current hardware clocks are a suitable foundation for mitigating the contention problem of the global clock with increasing core count, potentially together with hyperthreads.
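The benchmark itself is only a per-core timing loop; a rough, illustrative rendering is sketched below. The real harness is written in C and issues the architecture-specific timestamp instruction directly, so multiprocessing and time.monotonic_ns() here are merely stand-ins.

import multiprocessing as mp
import os
import time

DURATION_NS = 10 * 10**9                       # measure for 10 seconds, as in the experiment

def worker(core, out):
    os.sched_setaffinity(0, {core})            # pin this process to one core (Linux-only)
    count = 0
    end = time.monotonic_ns() + DURATION_NS
    while time.monotonic_ns() < end:           # stand-in for the timestamp instruction
        count += 1
    out.put((core, DURATION_NS / count))       # average cost per timestamp read, in ns

if __name__ == "__main__":
    out = mp.Queue()
    procs = [mp.Process(target=worker, args=(c, out)) for c in range(os.cpu_count())]
    for p in procs:
        p.start()
    results = [out.get() for _ in procs]       # drain results before joining
    for p in procs:
        p.join()
    print(sorted(results))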

3.4.2 Evaluating Ordo Primitive

Figure 3.4 presents the measured offset (δij) for each pairwise combination of the cores (with SMT). The heatmap shows that the measured offset between adjacent clocks inside a socket is the least on every architecture. One important point is that all measured offsets are positive. As of this writing, we never encountered a single negative measured offset while measuring the offset in either direction from any core over the course of the past two months. This is consistent with prior results [40, 6, 42] and illustrates that the added one-way-delay


Figure 3.4: Clock offsets for all pairs of cores. The measured offset varies from a minimum of 70 ns to 1,100 ns across all architectures. Both the Xeon (a) and ARM (d) machines show that one of the sockets has a 4–8× higher offset than the others. To confirm this, we measured the bandwidth between the sockets, which is symmetric in both machines.

latency is greater than the physical offset, which always results in a positive measured offset in either direction. For certain sockets, we observe that the measured offset within the socket is less than the global offset. For example, the fifth socket in the Xeon machine has a maximum offset of 120 ns compared with the global offset of 276 ns.

We could define the ORDO_BOUNDARY based on an application's use, in which the application chooses the maximum offset of a subset of cores or sockets inside a machine. But then the application needs to embed the core or thread ID along with the timestamp, which shortens the usable length of the timestamp variable and may not be advantageous (§3.4.7). Therefore, we choose the global offset across all cores as the ORDO_BOUNDARY for all of our experiments. Table 3.1

shows the minimum and maximum measured offsets for all of the evaluated machines, with the maximum offset being the ORDO_BOUNDARY. We create a simple micro benchmark

to show the impact of timestamp generation by each core using our new_time() method and atomic increments. Figure 3.3 (b) shows the results of obtaining a new timestamp from a global clock, which is representative of both physical timestamping and read-only

transactions. The Ordo-based timestamp generation remains almost constant up to the

maximum core count. It is 17.4–285.5× faster than the atomic increment at the highest core count, thereby showing that transactions that use logical timestamping will definitely improve their scalability with increasing core count.
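The comparison behind Figure 3.3 (b) boils down to the two loops sketched below (illustrative only; a locked Python counter stands in for a logical clock advanced with fetch_and_add, and get_time()/new_time() are the stand-ins from the earlier Ordo API sketch): every iteration of the first loop updates one shared cache line, whereas each iteration of the second only reads the core-local invariant clock.

import itertools
import threading

shared_clock = itertools.count(1)              # stand-in for an atomically incremented logical clock
clock_lock = threading.Lock()

def generate_logical(n):
    ts = 0
    for _ in range(n):                         # every timestamp serializes on shared state
        with clock_lock:
            ts = next(shared_clock)
    return ts

def generate_ordo(n):
    ts = get_time()                            # core-local clock read; no shared writes
    for _ in range(n):
        ts = new_time(ts)
    return ts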

One key observation is that one of the sockets in both Xeon (eighth socket: 105–119

cores) and ARM (second socket: 48–96 cores) has a 4–8× higher measured offset when measured from a core belonging to the other socket, even though the measured socket bandwidth is constant for both architectures [105]. For example, the measured offset from

core 50 to core 0 is 1,100 ns but is only 100 ns from core 0 to core 50 for the ARM machine. We believe that one of the sockets received the RESET signal later than the other sockets in

the machine, thereby showing that the clocks are not synchronized. For Phi, we observe that most of the offset lies in the window of 200 ns, but adjacent cores have the least offset.

3.4.3 Physical Timestamping: Oplog

For physical timestamping, we evaluate the impact of Oplog on the Linux reverse map [106].

rmap uses an RB-tree data structure to record, for every physical page, the page table entries that map that page, and it is primarily used by the fork(), exit(), mmap(), and mremap() system calls. We modify the reverse mapping for both anonymous and file rmaps in the Linux kernel [6]. We use the Exim mail server [107] on the Xeon machine to evaluate the scalability of the rmap data structure. Exim is a part of the Mosbench [108] benchmark suite; it is a process-intensive application that listens for SMTP connections and forks a new process for each connection, and each connection forks two more processes to perform


Figure 3.5: Throughput of the Exim mail server on a 240-core machine. Vanilla represents the unmodified Linux kernel, while Oplog is the rmap data structure modified by the Oplog API in the Linux kernel. OplogORDO is the extension of Oplog with the Ordo primitive.

various file system operations in shared directories as well as on the files. Figure 3.5 shows

the results of Exim on the stock Linux (Vanilla) and our modifications that include kernel versions with (OplogORDO) and without the Ordo primitive (Oplog). The Oplog version directly

reads the value from the unsynchronized hardware clocks. The results show that Oplog does alleviate the contention from the reverse mapping, which we observe after 60 cores,

and the throughput of Exim increases by 1.9× at 240 cores. The Oplog version is merely

4% faster than the OplogORDO approach because Exim is now bottlenecked by the file system operations [77] and zeroing of the pages at the time of forking after 105 cores. We do not

observe any huge difference between Oplog and OplogORDO because a reverse mapping is updated only when one of these system calls is issued, which amortizes the cost of the ORDO_BOUNDARY window, besides other layer overheads [77].

3.4.4 Read Log Update

We evaluate the throughput of RLU for the hash table benchmark and citrus tree benchmark across architectures. The hash table uses one linked list per bucket, and the key hashes


Figure 3.6: Throughput of the hash table with RLU and RLUORDO for update ratios of 2% and 40%. The user-space hash table has 1,000 buckets with 100 nodes. We run the experiment on the four machines from Table 3.1.


Figure 3.7: Throughput of the citrus tree, which consists of 100,000 nodes, with RLU and RLUORDO for varying update ratios of 10% and 40% on the various machines.

into the bucket and traverses the linked list for a read or write operation. Figure 3.6 shows the throughput of the hash table with update rates of 2% and 40% across the architectures, where RLU is the original implementation and RLUORDO is the modified version. The results show that RLUORDO outperforms the original version by an average of 2.1× across the architectures for all update ratios at the highest core count.

At lower core counts, RLU performs better than RLUORDO across architectures because 1) up to certain core counts (Xeon: within a socket, Phi: until 4 cores, ARM: 36 cores, and AMD: 12 cores), the coherence traffic assists the RLU protocol, which has lower abort rates than RLUORDO, and 2) RLUORDO always has to check for locks while dereferencing because the invariant clock does not provide the semantics of fetch_and_add(). Thus, in the case of only readers (100% reads), RLUORDO is 8% slower than RLU at the highest core count across the architectures, but even with a 2% update rate, it is the atomic update that degrades the performance of RLU. On further analysis, we find that RLUORDO spends up to 20% less time in the synchronize phase, which happens because of the decrease in cache-coherence traffic at higher core counts across all architectures. Furthermore, Ordo's new_time() does not act as a backoff mechanism: Figure 3.12 illustrates that even on varying the ORDO_BOUNDARY from 1/8× to 8×, the scalability of the RLUORDO algorithm changes by only ±3% while running on 1 core, 1 socket, and 8 sockets.

Overall, all multisocket machines show a similar scalability trend after crossing the socket boundary and saturate after a certain core count because they are bottlenecked by the locking and the creation of the object and its copy, which is more severe in the case of

ARM for crossing the NUMA boundary, as is evident after 48 cores. In the case of Phi, there is no scalability collapse because it has a higher memory bandwidth and slower processor speed, which only saturates the throughput. However, as soon as we remove the logical

clock, the throughput increases by an average of 2× in each case. Even though the cost of

the timestamp instruction increases with hyperthreads (3× at 256 threads), the throughput is almost saturated because of the object copying and locking. Even with the deferrals (refer


Figure 3.8: Throughput of the hash table benchmark (40% updates) with the defer-based RLU and RLUORDO approaches on the Xeon machine. Even with the defer-based approach, the cost of the global clock is still visible after crossing the NUMA boundary.

to Figure 3.8), RLUORDO is at most 1.8× faster than the base version, thereby illustrating that the defer-based approach still suffers from contention on the global clock. We also evaluate the citrus-tree benchmark, which involves complex update operations. Figure 3.7 shows the results of the citrus-tree benchmark for 10% and 40% updates across the various machines. We observe that RLUORDO outperforms RLU by a factor of two on every architecture, even in this more complicated scenario. The scalability behavior is similar to that of the hash table benchmark across the architectures.

3.4.5 Concurrency Control Mechanism

We evaluate the impact of the Ordo primitive on the existing OCC and Hekaton algorithms with the YCSB [109] and TPC-C benchmarks. We execute read-only transactions to focus only on the scalability aspect without transaction contention (Figure 3.9); the workload comprises two read queries per transaction and a uniform random distribution. We also present results from the TPC-C benchmark with 60 warehouses as a contentious case (Figure 3.10). Figure 3.9

shows that both OCCORDO and HekatonORDO not only outperform OCC (5.6–39.7×) and Hekaton


sec 21 µ 14 Txns/ 7 0 08 16 24 32 # core Figure 3.9: Throughput of various concurrency control algorithms of ordo/databases for read-only transactions (100% reads) using the YCSB benchmark on various architectures. We modify the existing OCC and Hekaton algorithms to use our Ordo primitive (OCCORDO and HekatonORDO, respectively) and compare them against the state-of-the-art OCC algorithms: Silo and TicToc. Our modifications removes the logical clock bottleneck across various architectures.


Figure 3.10: Throughput and abort rates of the concurrency control algorithms for the TPC-C benchmark with 60 warehouses on the 240-core Intel Xeon machine.

(4.1–31.1×), respectively, across architectures, but also achieve scalability almost similar to that of TicToc and Silo, which do not have the global timestamp allocation overhead. The reason for such an improvement is that both OCC and Hekaton waste 62–80% of their execution time in allocating timestamps, which has also been shown by Yu et al. [100], whereas Ordo successfully eliminates the logical timestamping overhead with a modest amount of modification. Compared with state-of-the-art optimistic approaches that avoid

global logical timestamping, OCCORDO shows comparable scalability, thereby enabling Ordo to serve as a push-button accelerator for timestamp-based applications, which provides a

simpler way to scale these algorithms. In addition, HekatonORDO has comparable performance

to that of other OCC-based algorithms and is only 1.2–1.3× slower because of its heavyweight dependency-tracking mechanism that maintains multiple versions. Figure 3.10 presents the throughput and abort rate for the TPC-C benchmark. We run NewOrder (50%) and Payment (50%) transactions only, with a hash index. The results

show that OCCORDO is 1.24× faster than TicToc and has a 9% lower abort rate, since TicToc starts to spend more time (7%) in the validation phase because of the overhead of its data-driven timestamp computation, as it has to traverse the read and write set to

find the common commit timestamp; OCCORDO already has a notion of global time, which speeds up the validation process, thereby increasing its throughput with lower aborts. Thus, hardware clocks provide better scalability than software bypasses that come at the cost of

extra computation. In the case of multi-version CC, HekatonORDO also outperforms Hekaton

by 1.95× with lower aborts.

3.4.6 Software Transactional Memory

We evaluate the impact of the Ordo primitive by running the TL2 and TL2ORDO algorithms on the STAMP benchmark suite. Figure 3.11 presents the speedup over the sequential execution

of these benchmarks. The results show that TL2ORDO improves the throughput of every

benchmark. TL2ORDO has higher speedup over TL2 for both Ssca2 and Kmeans because they have short transactions, which in turn results in more clock updates. Genome, on the other hand, is dominated by large read conflict-free transactions; thus, it does not severely stress the global clock with an increasing core count. In the case of Intruder, we observe some

performance improvement up to 60 cores. However, after 60 cores, TL2ORDO has 10% more

aborts than TL2, which in turn slows down the performance of TL2ORDO, as the bottleneck

shifts to the large working set maintained by the basic TL2 algorithm. We can circumvent


Figure 3.11: Speedup of the STAMP benchmarks with respect to sequential execution on the Xeon machine for the TL2 and TL2ORDO algorithms. TL2ORDO improves the throughput by up to 3.8× by alleviating the cache-line contention that occurs on the global logical clock. TL2ORDO shows significant improvement for workloads running with very short transactions (Kmeans and Ssca2) and for those with very long transactions (Labyrinth) by decreasing their aborts through the reduced cache-line contention.


Figure 3.12: Normalized throughput of the RLUORDO algorithm for varying ORDO_BOUNDARY on the Xeon machine on 1 core, 1 socket (30 cores), and 8 sockets (240 cores) with 98% reads and 2% writes. We vary the ORDO_BOUNDARY from 1/8× to 8× for all three configurations, which shows that the throughput varies by only ±3%, thereby proving two points: 1) timestamping is one of the bottlenecks, and 2) the ORDO_BOUNDARY does not act as a backoff mechanism for such logical timestamping-based algorithms.

this problem by employing type-aware transactions [110]. In the case of Labyrinth, we observe that TL2ORDO improves the throughput by 2–3.8×. Although Labyrinth has long-running transactions [104], we observe that the number of aborts decreases by 5.8–15.5×, which illustrates that transactions in Labyrinth spend almost twice the amount of time re-executing aborted transactions, which occurs because of the cache-line contention on the global clock. Finally, TL2ORDO also improves the performance of Vacation because it is a transaction-intensive workload, as it performs at least 3 M transactions with abort rates on the order of 300K–400K, which results in stressing the global clock. In summary, the Ordo primitive improves the throughput of TL2 by a maximum factor of two.

3.4.7 Sensitivity Analysis of ORDO_BOUNDARY

To show that the calculated ORDO_BOUNDARY does not hamper the throughput of the algorithm, we ran the hash table benchmark with the RLUORDO algorithm on the Xeon machine with 98% reads and 2% writes for three configurations: 1-core, 1-socket (30 cores), and 8-sockets

(240 cores). We found that on either decreasing the ORDO_BOUNDARY to 1/8× or increasing it up to 8× of the global offset (refer to Table 3.1), the scalability of the RLUORDO algorithm varies by only ±3% in all three configurations because the algorithm has other bottlenecks that overshadow the cost of both the lower and higher offsets. We can further confirm this result from Figure 3.3 (b), as hardware clocks can generate 1.4 billion timestamps at 240 cores, while the maximum throughput of the hash table is only 102 million operations for a 2% update workload. We observe a similar trend on a single core as well as across multiple sockets.

Another important inference is that Ordo's new_time() method does not act as a backoff mechanism, as we see consistent throughput even

on decreasing the ORDO_BOUNDARY to 1/8×, which is merely 34 ns on the Xeon machine. This point also holds true for other timestamp algorithms, as they have to perform other tasks [100, 104].
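To make the role of ORDO_BOUNDARY in new_time() concrete, the following is a minimal user-space sketch of an Ordo-style clock interface. The constant, the use of RDTSCP, and the helper names are illustrative assumptions, not the implementation evaluated in this chapter; on a real machine the boundary is the measured per-machine offset.

    /* Minimal sketch of Ordo-style timestamping on an invariant x86 TSC. */
    #include <stdint.h>
    #include <x86intrin.h>

    static const uint64_t ORDO_BOUNDARY = 600;   /* assumed, in TSC cycles; measured per machine */

    static inline uint64_t ordo_get_time(void)
    {
        unsigned aux;
        return __rdtscp(&aux);                   /* read the invariant hardware clock */
    }

    /* Return a timestamp that is greater than 'prev' on any core:
     * wait until the local clock has moved past prev + ORDO_BOUNDARY. */
    static inline uint64_t ordo_new_time(uint64_t prev)
    {
        uint64_t now;
        do {
            now = ordo_get_time();
        } while (now <= prev + ORDO_BOUNDARY);
        return now;
    }

    /* Compare two timestamps taken on (possibly) different cores: +1/-1 when
     * the order is certain, 0 when they fall within the uncertainty window. */
    static inline int ordo_cmp_time(uint64_t a, uint64_t b)
    {
        if (a > b + ORDO_BOUNDARY) return  1;
        if (b > a + ORDO_BOUNDARY) return -1;
        return 0;
    }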

3.5 Chapter Summary

Ordering remains the basic building block for reasoning about the consistency of operations in a concurrent application. However, ordering comes at the cost of expensive atomic instructions that do not scale with increasing core count and thus limit the scalability of concurrent applications on large multicore and multisocket machines. We propose Ordo, a simple, scalable primitive that addresses the problem of ordering by providing the illusion of a

globally synchronized hardware clock with some uncertainty inside a machine. Ordo relies on invariant hardware clocks that are guaranteed to be increasing at a constant frequency but are not guaranteed to be synchronized across the cores or sockets inside a machine, which we confirm for Intel and ARM machines. We apply the Ordo primitive to several timestamp-based algorithms, which use either physical or logical timestamping, and scale

these algorithms across various architectures by up to 39.7×, thereby making them multicore friendly.

CHAPTER 4 IMPACT OF SCHEDULING ON VIRTUAL MACHINES

Virtualization is now the backbone of every cloud-based organization to run and scale applications horizontally on demand. Recently, this scalability trend is also extending towards vertical scaling [111, 112], i.e., a virtual machine (VM) has up to 128 virtual

CPUs (vCPUs) and 3.8 TB of memory to run large in-memory databases [18, 19] and data processing engines [20]. At the same time, cloud providers strive to oversubscribe their resources to improve hardware utilization and reduce energy consumption, without imposing any noticeable overhead on the application [113, 114]. Thus, the multiplexing of VMs leads to a variety of symptoms in the form of preemption problems that occur because of the double scheduling problem [115]: 1) the guest OS schedules processes on vCPUs, and 2) the hypervisor schedules vCPUs on physical CPUs. The root cause of this double scheduling phenomenon is a semantic gap between the hypervisor and guest OSes, in which the hypervisor is agnostic of not only the scheduling of VMs but also of guest OS-specific critical code, which deters the scalability of applications. Some of the prior works address this problem by adopting co-scheduling approaches [116, 117, 118], which can suffer from priority inversion and CPU fragmentation, and only partially mitigate the double scheduling symptoms [115]. Such symptoms, which have mostly been addressed individually, are the lock-holder preemption (LHP) [84, 83, 86, 119], lock-waiter preemption (LWP) [84], and blocked-waiter wakeup (BWW) [120, 16] problems. To solve these symptoms as a whole, we make a key observation: These symptoms occur

because 1) the hypervisor schedules out a vCPU at a time when the vCPU is executing critical code, and 2) a vCPU waiting to acquire a lock is either uncooperative or sleeping [66], leading to LWP and BWW issues. Thus, we propose an alternative perspective: instead of devising a solution for each symptom, we use four key ideas that allow a VM to hint to the hypervisor how to make an effective scheduling decision that allows its forward progress. First, we consider all of the locks and interrupt contexts as critical components. Second, we devise a set of para-virtualized methods that annotate these critical components as enlightened critical sections (eCS). These methods are lightweight in nature and notify the hypervisor from the VM and vice versa with memory operations via shared memory, thereby avoiding the overhead of hypercalls and interrupt injection. Third, the hypervisor can now figure out whether a vCPU is executing an eCS and can reschedule it. However, by rescheduling a vCPU, we introduce unfairness into the system. We tackle this issue with the OS's fair scheduling policy [82], which compensates for that additional schedule by allowing other tasks to run for extra time, thereby maintaining eventual fairness in the system. Lastly, we leverage our methods to design a virtualized schedule-aware spinning strategy that enables lock waiters to be work conserving as well as cooperative inside a VM. That is, a lock waiter now cooperatively spins for the lock if the physical CPU is under-committed; otherwise, it yields the vCPU.

Our approach improves the scalability of real-world applications by 1.2–1.6× in an

under-committed case. Moreover, our eCS annotation, combined with eSCHDSPIN, avoids 85–100% of preemptions, while improving the scalability of applications by 1.4–2.5× in an over-committed scenario on an 80-core machine.

4.1 Design

A hypervisor can mitigate various preemption problems if it is aware of a vCPU executing a critical section. We denote such a hypervisor-aware critical section as an enlightened critical section (eCS), which can be executed for one more schedule. eCS is applicable to all

synchronization primitives and mechanisms such as RCU and interrupt contexts. We now

present our lightweight methods that act as a cross-layer interface for annotating an eCS and later focus on our notion of an extra schedule and our approach to maintain eventual fairness in the system.

Table 4.1: Set of para-virtualized methods exposed by the hypervisor to a VM for mitigating double scheduling. These methods provide hints to the hypervisor and the VM via shared memory. A vCPU relies on the first four methods to ask for an extra schedule to overcome LHP, LWP, RP, RRP, and ICP. Meanwhile, a vCPU gets hints from the hypervisor via the last two methods to mitigate the LWP and BWW problems. The cpu_id is the core id that is used by tasks running inside a guest OS. †Currently, is_vcpu_preempted() is already exposed to the VM in Linux.

Hint                 Lightweight Para-virtualized API              Description
VM → Hypervisor      void activate_non_preemptable_ecs(cpu_id)     Increase the eCS count for a vCPU with cpu_id by 1 for a non-preemptable task
                     void deactivate_non_preemptable_ecs(cpu_id)   Decrease the eCS count for a vCPU with cpu_id by 1 for a non-preemptable task
                     void activate_preemptable_ecs(cpu_id)         Increase the eCS count for a vCPU with cpu_id by 1 for a preemptable task
                     void deactivate_preemptable_ecs(cpu_id)       Decrease the eCS count for a vCPU with cpu_id by 1 for a preemptable task
Hypervisor → VM      bool is_vcpu_preempted(cpu_id)†               Return whether a vCPU with cpu_id is preempted by the hypervisor
                     bool is_pcpu_overcommitted(cpu_id)            Return whether a physical CPU, running a vCPU with cpu_id, is over-committed

4.1.1 Lightweight Para-virtualized methods

We propose a set of six lightweight para-virtualized methods to bridge the semantic gap, which both the VM and the hypervisor use to convey information to each other. These methods rely on four variables (refer to Figure 4.1) that are local to each vCPU. They are exposed via shared memory between the hypervisor and a VM, and the notification happens via simple read and write memory operations. A simple memory read is sufficient for the hypervisor to decide on scheduling because 1) it tries to execute each vCPU on a separate pCPU, and 2) it requires

knowing about an eCS only at the schedule boundary, thereby removing the cost of polling

and other synchronous notifications [121]. To consider an OS critical section as an eCS, we mark the start and unmark the end of a critical section, which lets the hypervisor know

about an eCS. However, a process in an OS can be of two types. The first is a non-preemptable process that can never be scheduled out, such as an interrupt or a kernel thread running after acquiring a spinlock. The second is a preemptable task, such as a user process or a process holding a blocking lock. Hence, we introduce four methods (VM → Hypervisor) to separately handle these two types of tasks. The last two methods (Hypervisor → VM) provide the hypervisor context to the VM, which a lock waiter can use to mitigate

the LWP problem or yield the vCPU to other hypervisor tasks or vCPUs in an over-committed scenario. Figure 4.1 illustrates those four states:

• non_preemptable_ecs_count maintains the count of active non-preemptable eCSs,

such as non-blocking locks, RCU readers, and interrupt contexts. It is similar to the preemption count of the OS.

• preemptable_ecs_count is similar to the preemption count variable of the OS, but

it only maintains the count of active preemptable eCSs, such as blocking primitives,

namely, mutex and rwsem.

• vcpu_preempted denotes whether a vCPU is running. It is useful for handling the BWW problem in both under- and over-committed scenarios.

• pcpu_overloaded denotes whether the physical CPU executing that particular vCPU is over-committed. Lock waiters can use this information to address the BWW problem in an over-committed scenario.

Figure 4.1: Overview of the information flow between a VM and a hypervisor. Each vCPU has a per-CPU state that is shared with the hypervisor, denoted as eCS state. Figure (a) shows how vCPU2 relays information about an eCS to the hypervisor. On entering a critical section or an interrupt context (1), vCPU2 updates the non_preemptable_ecs_count (2). Later, before scheduling out vCPU2, the hypervisor reads its eCS state (3) and allows it to run for one more schedule to mitigate any of the double scheduling problems. Figure (b) shows how the hypervisor shares whether a vCPU is preempted or a physical CPU is overloaded, at the schedule boundary. For instance, the hypervisor marks vcpu_preempted while scheduling out a vCPU, or sets the pcpu_overloaded flag to one if the number of active tasks on that physical CPU is more than one. Both try to further mitigate the LWP and BWW problems.

Figure 4.1 presents two scenarios in which the schedule context information is shared between a vCPU and the hypervisor. Figure 4.1 (a) shows how a vCPU entering an eCS shares information with the hypervisor. During entry (1), vCPU2 first updates its corresponding state (non_preemptable_ecs_count or preemptable_ecs_count) (2) and continues to execute its critical section. Meanwhile, the hypervisor, before scheduling out vCPU2, checks vCPU2's eCS states (3) and allows it to run for extra time if certain criteria are fulfilled (§4.1.2); otherwise, it schedules out vCPU2 with other waiting tasks. When vCPU2 exits the eCS, it decreases the eCS state count, denoting the end of the critical section. Figure 4.1

(b) illustrates another scenario, which addresses the BWW problem: the hypervisor updates the eCS states pcpu_overloaded and vcpu_preempted while scheduling in and out vCPU2, respectively, at each schedule boundary (1). We devise a simple approach—virtualized scheduling-aware spinning (eSCHDSPIN)—that enables efficient scheduling-aware waiting for both blocking and non-blocking locks (§4.2). That is, vCPU2 reads both states (2) and decides whether to keep spinning until the lock is acquired if the pCPU is not overloaded (3); otherwise, it yields, which allows the other vCPU (in VM2) or a hypervisor task to make forward progress by doing some useful work, thereby mitigating the double scheduling problems.
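To make this flow concrete, the sketch below shows one plausible layout of the shared per-vCPU eCS state and the guest/hypervisor accesses described above. The struct fields mirror Figure 4.1, but the layout, the ecs_states[] array, and the helper bodies are assumptions for illustration, not the actual kernel changes.

    /* Illustrative per-vCPU eCS state shared between a guest and the hypervisor
     * (e.g., through a registered shared page, in the spirit of kvm_steal_time). */
    #include <stdint.h>

    struct ecs_state {
        uint32_t non_preemptable_ecs_count;  /* spinlocks, RCU readers, interrupts */
        uint32_t preemptable_ecs_count;      /* mutex/rwsem critical sections */
        uint8_t  vcpu_preempted;             /* written by the hypervisor on schedule-out */
        uint8_t  pcpu_overloaded;            /* written by the hypervisor when the pCPU runs >1 task */
    };

    extern struct ecs_state ecs_states[];    /* one entry per vCPU (assumed) */

    /* Guest side (VM -> Hypervisor): plain per-vCPU increments, as with the
     * kernel's preempt_count, so no atomic instruction or hypercall is needed. */
    static inline void activate_non_preemptable_ecs(int cpu_id)
    {
        ecs_states[cpu_id].non_preemptable_ecs_count++;
    }
    static inline void deactivate_non_preemptable_ecs(int cpu_id)
    {
        ecs_states[cpu_id].non_preemptable_ecs_count--;
    }

    /* Hypervisor side: a single read at the schedule boundary is enough to decide
     * whether this vCPU deserves (at most) one extra schedule. */
    static inline int vcpu_in_ecs(int cpu_id)
    {
        const struct ecs_state *st = &ecs_states[cpu_id];
        return st->non_preemptable_ecs_count > 0 || st->preemptable_ecs_count > 0;
    }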

4.1.2 Eventual Fairness with Selective Scheduling

As mentioned before, the hypervisor relies on its scheduler to figure out whether a vCPU is

executing an eCS. That is, when a vCPU with a marked eCS is about to be scheduled out, the

hypervisor scheduler checks the value of eCS count variables (Figure 4.1). If any of these values are greater than zero, the hypervisor lets the vCPU run for an extra schedule. However, vCPU rescheduling introduces two problems: 1) How does the hypervisor handles a task with

eCS, which the guest OS can preempt or schedule out? 2) How does it ensure the system fairness?

We handle a preemptable eCS task with the preemptable_ecs_count counter methods, which differentiate between a preemptable task and a non-preemptable task. We do so because the guest OS can schedule out a preemptable task. In this case, the hypervisor should avoid rescheduling that vCPU because 1) it will result in false rescheduling, and 2) it can hamper the VM performance. We address this issue inside the guest OS, i.e., before scheduling

out an eCS-marked task inside a guest OS, we save the value of preemptable_ecs_count to a task-specific structure and reset the counter to zero. Later, when the task is rescheduled

again by the guest OS, we restore the preemptable_ecs_count with the saved value from the task-specific structure, thereby mitigating the false scheduling.
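A sketch of the save/restore step just described, reusing the ecs_state layout from the earlier sketch; the per-task field and hook names are hypothetical.

    /* Hypothetical guest-scheduler hooks: stash the preemptable eCS count of the
     * outgoing task and restore it for the incoming one, so the hypervisor does
     * not falsely reschedule the vCPU for a task that is no longer running. */
    struct task_ecs {
        uint32_t saved_preemptable_ecs_count;    /* per-task storage (assumed field) */
    };

    static inline void ecs_task_sched_out(struct ecs_state *st, struct task_ecs *prev)
    {
        prev->saved_preemptable_ecs_count = st->preemptable_ecs_count;
        st->preemptable_ecs_count = 0;           /* vCPU no longer claims an extra schedule */
    }

    static inline void ecs_task_sched_in(struct ecs_state *st, struct task_ecs *next)
    {
        st->preemptable_ecs_count = next->saved_preemptable_ecs_count;
        next->saved_preemptable_ecs_count = 0;
    }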

Table 4.2: Applicability of our six lightweight para-virtualized methods that strive to address the symptoms of double scheduling.

Method                              LHP  RP  RRP  ICP  LWP  BWW
activate_non_preemptable_ecs()      ✓    ✓   ✓    ✓    -    -
deactivate_non_preemptable_ecs()    ✓    ✓   ✓    ✓    -    -
activate_preemptable_ecs()          ✓    ✓   -    -    -    -
deactivate_preemptable_ecs()        ✓    ✓   -    -    -    -
is_vcpu_preempted()                 -    -   -    -    ✓    ✓
is_pcpu_overcommitted()             -    -   -    -    -    ✓

With vCPU rescheduling, we introduce unfairness at two levels: 1) An eCS-marked vCPU will always ask for rescheduling on every schedule boundary.1 2) By rescheduling a vCPU, the hypervisor is unfair to other tasks in the system. We resolve the first issue by allowing

the hypervisor to reschedule an eCS-marked vCPU only once during that schedule boundary, as rescheduling extends the boundary. At the end of the schedule boundary, the hypervisor schedules other tasks to avoid starving other tasks or VMs and to prevent indefinite rescheduling. In addition, the hypervisor keeps track of this extra reschedule information and runs other vCPUs for a longer duration, inherently balancing the running time, an

equivalent to vCPU penalization. Thus, our approach selectively reschedules and penalizes

a vCPU rather than balancing the extra reschedule information across all cores, which would result in the unnecessary overhead of synchronizing rescheduling information across cores. We call our approach the local CPU penalization approach, as we only penalize a vCPU that executed an eCS, thereby ensuring eventual fairness in the system. Moreover, our local vCPU scheduling is a form of selective, relaxed co-scheduling of vCPUs depending on what kind of tasks are being executed, without maintaining any synchronization among vCPUs, unlike prior approaches [116, 117].

4.2 Use Case

The double scheduling phenomenon introduces a semantic gap in three places: 1) from a vCPU to a physical CPU, which results in the LHP, RP, and ICP problems; 2) from a pCPU to a vCPU; and 3) from one vCPU to another in a VM, where the last two suffer from the LWP and BWW problems. Table 4.2

1Such a VM can be either an I/O or an interrupt-intensive VM that spends most of its time in the kernel, or even a compromised VM.

shows how to use our methods to address these problems.

LHP, RP, RRP, and ICP problems. To circumvent these problems, we rely on the VM → hypervisor notification because a vCPU holding any spinlock, read-write lock, mutex, or rwsem, or running in an interrupt context, is already inside a critical section. Thus, we call the activate_*()

and deactivate_*() methods for annotating critical sections. For example, the first two

methods are applicable to spinlocks, read-write locks, RCU, and interrupts, and the next two are for mutex and rwsem (refer to Table 4.2).
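For illustration, the wrappers below show one plausible way to place these calls around a non-blocking lock; the raw lock operations and this_cpu_id() are stand-ins, and the exact placement in the kernel patch may differ.

    /* Hypothetical annotation wrappers for a spinlock: the eCS count is raised
     * before acquiring (so both waiting and holding are covered) and dropped
     * only after the release. */
    extern void raw_spin_lock(void *lock);            /* stand-ins for the real lock ops */
    extern void raw_spin_unlock(void *lock);
    extern void activate_non_preemptable_ecs(int cpu_id);
    extern void deactivate_non_preemptable_ecs(int cpu_id);
    extern int  this_cpu_id(void);                    /* assumed helper */

    static inline void ecs_spin_lock(void *lock)
    {
        activate_non_preemptable_ecs(this_cpu_id());
        raw_spin_lock(lock);
    }

    static inline void ecs_spin_unlock(void *lock)
    {
        raw_spin_unlock(lock);
        deactivate_non_preemptable_ecs(this_cpu_id());
    }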

LWP and BWW problem. The LWP problem occurs in the case of FIFO-based locks such as

MCS and Ticket locks [59]. However, unfair locks, such as qspinlock [88], mutex [122],

and rwsem [76], do not suffer from this problem and are currently used in Linux. The reason is that they allow other waiters to steal the lock, at the cost of potential starvation. On the other hand, all of these locks suffer from the BWW problem because the cost to wake up a sleeping waiter in a virtualized environment varies from 4,000–10,000 cycles, as a wake-up call results in a VM exit, which adds extra overhead to notify a vCPU to wake up a process. This problem is severe for blocking primitives because they are non-work conserving in nature [66], i.e., the waiters schedule themselves out even if only a single task is present in the run queue of the guest OS. We partially mitigate this issue by allowing the waiters to spin rather than sleep if only a single task is present in the run queue of the guest scheduler

(SCHDSPIN). However, this approach is non-cooperative when multiple VMs are running.

Thus, to avoid unnecessary spinning of waiters, we rely on our is_pcpu_overcommitted()

API that notifies a waiter to only spin if the pCPU is not over-committed. We call this

approach the virtualized scheduling-aware spinning approach (eSCHDSPIN).
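A sketch of how such an eSCHDSPIN wait loop could look inside the guest. single_task_running() is an existing Linux scheduler helper and is_pcpu_overcommitted() is the method from Table 4.1; the remaining helpers are stand-ins declared here only to keep the sketch self-contained.

    /* Scheduling-aware waiting: spin only while this vCPU has nothing else to
     * run and its physical CPU is not over-committed; otherwise park. */
    extern _Bool single_task_running(void);         /* Linux scheduler helper */
    extern _Bool is_pcpu_overcommitted(int cpu_id); /* Hypervisor -> VM hint (Table 4.1) */
    extern _Bool lock_available(void *lock);        /* assumed per-lock predicate */
    extern void  cpu_relax(void);                   /* stand-in for the arch macro */
    extern void  schedule(void);                    /* park the waiter */
    extern int   this_cpu_id(void);                 /* assumed helper */

    static void eschdspin_wait(void *lock)
    {
        while (!lock_available(lock)) {
            if (single_task_running() && !is_pcpu_overcommitted(this_cpu_id()))
                cpu_relax();                        /* SCHDSPIN: keep spinning */
            else
                schedule();                         /* busy or over-committed: yield the vCPU */
        }
    }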

4.3 Implementation

We realized the idea of eCS by implementing it on the Linux kernel version 4.13. Besides

annotating various locks and interrupt contexts with eCS, we specifically modified the scheduler and the para-virtual interface of the KVM hypervisor. Our changes are portable

Table 4.3: eCS requires small modifications to the existing Linux kernel, and the annotation effort is also minimal: 60 LoC of changes to support the 10 million LoC Linux kernel, which has around 12,000 lock instances with 85,000 lock invocations.

Component              Lines of code
eCS annotation         60
eCS infrastructure     800
Scheduler extension    150
Total                  1,010

enough to apply to the hypervisor too. The whole modification consists of 1,010 lines of code (see Table 4.3).

Lightweight para-virtualized methods. We share the information between the hypervisor and a VM via shared memory, which is similar to the kvm_steal_time [123] implementation. For instance, each VM maintains per-core eCS states, and the hypervisor maintains per-vCPU eCS states for each VM.

Scheduler extension. We extend a scheduler-to-task notification mechanism, preempt_notifier [124], to identify an eCS-marked vCPU at the schedule boundary. Our extension allows the scheduler to learn about a task's scheduling requirement and decide the scheduling strategy at the schedule boundary. For example, in our case, the extension reads the non_preemptable_ecs_count and preemptable_ecs_count to decide the scheduling strategy for the vCPU. Besides this, we rely on the notifier's in and out methods to set the values of the vcpu_preempted and pcpu_overloaded variables.

We implemented our vCPU rescheduling decision in the schedule_tick function [125].

The schedule_tick function performs two tasks: 1) It does the bookkeeping of the task runtime, which is used to ensure fairness in the system. 2) It is also responsible for setting the rescheduling flag (TIF_NEED_RESCHED) if there is more than one task on that run queue; the scheduler later schedules out the task if the reschedule flag is set. We implemented the rescheduling strategy by not setting the reschedule flag when the preempt_notifier check function returns true, while still updating the

runtime statistics of the vCPU.
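The decision described above might look roughly like the sketch below; the helper names are stand-ins for the corresponding kernel functions, and the real change is a small diff inside the existing tick path rather than a new function.

    /* Tick-time rescheduling decision: always do the runtime bookkeeping, but
     * skip setting the reschedule flag when the preempt-notifier check reports
     * that the current vCPU task is inside an eCS (and has not yet used its one
     * extra schedule for this boundary). */
    extern void  update_curr_runtime(void *curr);        /* stand-in for CFS accounting */
    extern _Bool rq_has_other_tasks(void *rq);           /* >1 runnable task on this run queue */
    extern _Bool ecs_preempt_notifier_check(void *curr); /* reads the eCS counts (assumed) */
    extern void  set_need_resched(void *curr);           /* stand-in for resched_curr() */

    static void ecs_schedule_tick(void *rq, void *curr)
    {
        update_curr_runtime(curr);      /* fairness bookkeeping always runs */
        if (!rq_has_other_tasks(rq))
            return;                     /* nothing is waiting; no flag needed */
        if (ecs_preempt_notifier_check(curr))
            return;                     /* grant one extra schedule: leave TIF_NEED_RESCHED unset */
        set_need_resched(curr);         /* normal path */
    }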

Annotating locks for eCS. We mark an eCS by using the non-preemptable methods for non-blocking primitives and the preemptable ones for mutex and rwsem. Our annotation comprises only 60 LoC and covers around 12,000 lock instances with 85,000 lock method calls in the Linux kernel, which has 10 million LoC for kernel version 4.13.

4.4 Evaluation

We evaluate our approaches by answering the following questions:

• What is the overhead of an eCS annotation and the scheduler overhead to read the values? (§4.4.1)

• Does eCS help in an over-committed case? (§4.4.2)

• How does eCS impact the scalability of a VM? (§4.4.3)

• How do our methods address the BWW problem? (§4.4.4)

• Does our schedule penalization approach maintain the eventual fairness of the system? (§4.4.5)

Experimental setup. We extended VBench [126] for our evaluation. We chose four benchmarks: the Apache web server [127], Metis [128], Psearchy from Mosbench, and Pbzip2 [129]. The Apache web server serves a 300-byte static page for each request, which is generated by WRK [130]. Both of them run inside the VM to remove the network wire overhead and stress only the VM's kernel components. We choose Apache to stress the interrupt

handler to emphasize the importance of eCS for an interrupt context. Metis is a map-reduce library for a single multi-core server that mostly stresses the memory allocator (spinlock)

and the page-fault handler (rwsem) of the OS. Similar to Metis, Psearchy is an in-memory

parallel search and indexer that stresses the writer side of the rwsem design. In addition, we also choose Pbzip2—a parallel compression and decompression program—because we wanted to use a minimally kernel-intensive application. Moreover, none of these workloads suffer from performance degradation from any known user-space bottleneck in a non-virtualized environment. We use a memory-based file system, tmpfs, to isolate the effect of I/O. We further pin the vCPUs to physical cores to circumvent vCPU migration at the hypervisor level and remove jitter from our evaluation.

Table 4.4: Cost of using our lightweight para-virtualized methods with various synchronization primitives and mechanisms. 1 core and 80 core denote the time (in ns) to execute an empty critical section with one and 80 threads, respectively. Although our approach adds a slight overhead at a single core, there is no performance degradation for our evaluated workloads.

                       Time (ns)
                       1 core                 80 core
Critical section       W/o eCS    W/ eCS      W/o eCS     W/ eCS
API cost               –          16.4        –           16.4
spinlock               31.2       44.8        4,782.3     4,772.9
rwlock (read)          32.0       38.8        2,418.2     2,519.4
rwlock (write)         27.4       45.8        4,363.3     4,784.5
mutex                  33.5       34.4        49,116.4    48,125.7
rwsem (read)           35.6       36.6        2,588.8     2,737.0
rwsem (write)          33.3       38.1        7,055.7     7,150.1
RCU                    9.8        19.7        9.8         19.8

We evaluate our eCS approach against the following configurations: 1) PVM is a para-virtualized VM that includes the unfair qspinlock implementation, which mitigates the LWP and BWW issues and already incorporates the ideas of oticket as well as preemptable ticket locks [131]. This is the default configuration since Linux v4.5. 2) HVM is the one without para-virtualization support and also includes the unfair qspinlock implementation. Neither PVM nor HVM is eCS annotated. Note that we could not compare against other prior works because they are either not open sourced [121, 116] or very specific to the Xen hypervisor [86]. We evaluate these configurations on an eight-socket, 80-core machine with Intel E7-8870

processors. Another point is that the current version of KVM partially addresses the BWW problem that can occur from the user space [132].

4.4.1 Overhead of eCS

We evaluate the cost of our lightweight para-virtualized methods on various blocking and

non-blocking locks, and RCU. Table 4.4 enumerates the cost of the methods alone as well as the cost of executing an empty critical section with a simple microbenchmark, which quantifies the impact of the eCS methods on these primitives in both the lowest-contention (1 core) and highest-contention (80 core) scenarios. 1 core denotes that a single thread is trying to acquire the critical section, whereas 80 core denotes that 80 threads are competing. We observe that eCS adds an overhead of 0.9–18.4 ns under low contention and negligible overhead in the high-contention scenario, except for RCU. For RCU, the empty critical section suffers from almost twice the overhead because RCU's lock and unlock operations each consist of just a single memory update on the preempt_count variable for a preemptable kernel, so the added eCS update roughly doubles the cost. Even though our methods add an overhead in the low-contention scenario, we do not observe any performance degradation for any of our evaluated workloads.
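The shape of this microbenchmark is simple; a single-threaded user-space analogue (illustrating the 1-core case, not the kernel harness actually used) could look like the following, where the eCS calls are the annotation points whose cost is being measured.

    /* User-space analogue of the empty-critical-section microbenchmark:
     * time N lock/unlock pairs with hypothetical eCS annotation calls. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    #define N 1000000
    static pthread_spinlock_t lock;

    /* Stand-ins for the eCS annotation; a plain counter models the per-vCPU state. */
    static volatile uint32_t ecs_count;
    static inline void activate_ecs(void)   { ecs_count++; }
    static inline void deactivate_ecs(void) { ecs_count--; }

    int main(void)
    {
        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        uint64_t start = __rdtsc();
        for (int i = 0; i < N; i++) {
            activate_ecs();
            pthread_spin_lock(&lock);      /* empty critical section */
            pthread_spin_unlock(&lock);
            deactivate_ecs();
        }
        uint64_t cycles = __rdtsc() - start;
        printf("avg cycles per empty critical section: %.1f\n", (double)cycles / N);
        return 0;
    }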

4.4.2 Performance in an Over-committed Scenario

We evaluate the performance of the aforementioned workloads in an over-committed scenario

by running two VMs in which the corresponding vCPUs of both VMs share a physical CPU. Figure 4.2 and Figure 4.3 (i) show the throughput of these workloads for PVM, HVM, and eCS; (ii) show the number of possible preemptions that we capture while running these workloads, i.e., when

a vCPU is about to be scheduled out for eCS; and (iii) represent the percentage of types of

observed preemptions, namely, LHP for blocking (B-LHP) and non-blocking (NB-LHP) locks,

RP, RRP, ICP problems that we observe for the eCS configuration, including both avoided and unavoided preemptions.

Figure 4.2: Analysis of real-world workloads (Apache and Psearchy) in an over-committed scenario, i.e., two VM instances are executing the same workload. Column (i) represents the scalability of the selected workloads in three settings: PVM, HVM, and with eCS annotations. Column (ii) represents the number of preemptions caught and prevented by the hypervisor with our methods. Column (iii) represents the types of preemptions caught by the hypervisor (refer to Table 4.2). By allowing an extra schedule, our approach reduces preemptions by 85–100% and improves the scalability of applications by up to 2.5×, while observing almost all types of preemptions for each workload.

Figure 4.3: Analysis of real-world workloads (Metis and Pbzip2) in an over-committed scenario, i.e., two VM instances are executing the same workload. Column (i) represents the scalability of the selected workloads in three settings: PVM, HVM, and with eCS annotations. Column (ii) represents the number of preemptions caught and prevented by the hypervisor with our methods. Column (iii) represents the types of preemptions caught by the hypervisor (refer to Table 4.2). By allowing an extra schedule, our approach reduces preemptions by 85–100% and improves the scalability of applications by up to 2.5×, while observing almost all types of preemptions for each workload.

Apache. eCS outperforms both PVM and HVM by 1.2× and 1.6×, respectively (refer to (t:a) in Figure 4.2). Moreover, our approach reduces the number of possible preemptions by 85.8–100% (refer to (n:a)) because of our rescheduling approach. We cannot completely avoid all preemptions because of our schedule penalization approach, as some of the preemptions

occur consecutively. Even though eCS adds overhead, especially to RCU, it still does not

degrade the scalability, for four reasons: 1) We address the BWW problem, which allows more opportunities to acquire the lock on time; 2) both hypervisor → VM methods allow cooperative co-scheduling of the VMs; 3) our extra-schedule approach avoids 85.8–100% of the captured preemptions with the help of our VM → hypervisor methods; and 4) the methods' overhead partially mitigates the highly contended system at higher core counts by acting as a back-off mechanism. Another interesting observation is that we observe almost every type of preemption (refer to Figure 4.2 (p:a)) because serving the static pages involves blocking locks for the socket connection as well as softirq and spinlock use for interrupt

processing. In particular, the number of preemptions is dominated by LHP for non-blocking

and blocking locks, followed by ICP and then RP. We believe that the ICP problem will be further exacerbated by optimized interrupt delivery mechanisms [93, 94]. PVM is 1.36× faster than HVM at 80 cores because of the support of the para-virtualized spinlock (qspinlock [87]) as well as the asynchronous page fault mechanism, which decreases contention [133]. The major bottleneck for this workload is interrupt injection, which can be mitigated by the proposed optimized methods [93, 94]. In addition, Figure 4.4 (b) presents the latency CDF for the Apache workload at 80 cores in both the under- and over-committed cases. We observe that eCS not only maintains almost the same latency as PVM in the under-committed case, but also decreases latency in the over-committed case by 10.3–17% and 9.5–27.9%

against PVM and HVM, respectively.

Psearchy mostly stresses the writer side of rwsem, as it performs 20,000 small and large mmap/munmap operations while also stressing the memory allocator for inode operations; this leaves the guest OS mostly idle because of the non-work-conserving blocking locks [66].

Figure 4.4: CDF of the latency of requests for the Apache web server workload in both under- and over-committed scenarios at 80 cores. It clearly shows the impact of eCS in the over-committed scenario, while having minimal impact in the under-committed case.

Figure 4.2 (t:b) shows the throughput, in which eCS outperforms both PVM and HVM by 2.3× and 1.7×, respectively. The reason is that we 1) partially mitigate the BWW problem with our eSCHDSPIN approach, and 2) decrease the number of preemptions by 95.7–100% with an extra schedule (refer to (n:b)). In addition, our eSCHDSPIN approach decreases the idle time from 65.4% to 45.2%, as it allows waiters to spin rather than schedule themselves out, which otherwise severely degrades scalability in a virtualized environment, as observed for both PVM and HVM. This workload is dominated mostly by blocking and non-blocking locks, as they account for almost 98% of the preemptions (refer to (p:b)). We also observe that HVM outperforms PVM by 1.33× because the asynchronous page fault mechanism introduces more BWW issues, as it

schedules out a vCPU if the page is not available, which does not happen for HVM.

Metis is a mix of page fault and mmap operations that stress both the reader and the writer sides of the rwsem. Hence, it also suffers from the BWW problem, as we observe in Figure 4.3 (t:c). eCS outperforms PVM and HVM by 1.3× at 80 cores because of the reduced BWW problem and a 91.4–99.5% reduction in preemptions (Figure 4.3 (n:c)). Note that reader preemptions are 20%, illustrating that reader preemption is possible for read-dominated workloads, which no prior work has observed. We do not

observe any difference in the throughput of HVM and PVM.

Pbzip2 is an efficient compression/decompression workload that spends only around 5% of its time in the kernel space. Figure 4.3 (t:d) shows that the performance of eCS is similar to PVM and HVM, while decreasing the number of preemptions by 98.4–100% (refer to (n:d)). We do not observe any performance gain in this scenario because 1) these preemptions may not be critical enough to affect the application's scalability, and 2) the overhead of our methods outweighs any gains from the avoided preemptions. Similar to the other workloads, LHP dominates the preemptions, followed by RP, ICP, and RRP. In summary, our methods not only reduce preemptions by 85–100%, but also improve

the scalability of applications that use these synchronization primitives by up to 2.5×, with no observable overhead on these applications. Moreover, we found that these preemptions occur for almost every type of primitive, specifically in the case of blocking synchronization primitives, read locks (Metis and Pbzip2), and interrupts (e.g., TLB operations, packet processing, etc.). In addition, most of the workloads still suffer from the BWW problem because their blocking primitives are non-work conserving. We partially address this problem with the help of our eSCHDSPIN approach. One point to note is that we do not observe as many preemptions as reported by prior works [86], because the current Linux kernel has dropped the

66 FIFO-based Ticket spinlock and has replaced it with a highly optimized unfair queue-based

lock [87] that mitigates the problem of LHP and LWP.

4.4.3 Performance in an Under-committed Case

We evaluate our eCS approach against the PVM and HVM configurations with a single VM running to show the impact of both our methods and the eSCHDSPIN approach. We also include a bare-metal configuration (Host) as a baseline (Figure 4.5). We observe that eCS addresses the BWW problem and outperforms both PVM and HVM in the case of Apache (1.2× and 1.2×), Psearchy (1.6× and 1.9×), Metis (1.2× and 1.3×), and Pbzip2 (1.2× and 1.4×), while having almost

similar latency for the Apache workload (Figure 4.4 (a)). Likewise, eCS performance is similar to that of bare-metal, except for the Psearchy workload. For Apache, our methods act as a back-off mechanism to improve its scalability, as the system is heavily contended. The throughput degrades after 30 cores because of the overhead of process scheduling, socket overhead, and inefficient kernel packet processing.

Besides this, both Psearchy and Metis suffer from the BWW problem, which we improve with

our eSCHDSPIN approach, resulting in better scalability as well as reduced idling of VMs. In particular, we decrease the idle time of Psearchy and Metis by 25% and 20%, respectively, by using our approach. One point to note is that blocking locks are based on the TAS lock, whose throughput severely degrades with increasing core count because of the increased cache-line contention, which we observe after 40 cores for Psearchy for all configurations. We also find that the Host is still 1.4× faster than eCS because eSCHDSPIN

only partially mitigates the BWW problem, while introducing excessive cache-line contention, which we can circumvent with NUMA-aware locks [66]. For Pbzip2, we observe that eCS

performs equivalent to the Host, while outperforming PVM and HVM after 60 cores, because Pbzip2 spends the least amount of time in the kernel space (5%), and starts to suffer from

the BWW problem only after 60 cores, which our eSCHDSPIN easily tackles.

Figure 4.5: Performance of real-world workloads when running on bare metal (Host) and inside a VM with three configurations: PVM, HVM, and with eCS annotations. In this scenario, only one VM is running. We use Host as the baseline for the comparison because we consider Host to have almost optimal performance.

Figure 4.6: Impact of both the BWW problem and the eCS methods (refer to Hypervisor → VM in Table 4.1) on Psearchy in both under- and over-committed scenarios.

4.4.4 Addressing BWW Problem via eCS

We evaluate the impact of the BWW problem on Psearchy in both under- and over-committed scenarios. Figure 4.6 (a) shows that our scheduling-aware spinning approach (marked

as eCS + SCHDSPIN) improves the throughput of Psearchy by 1.5× and 1.2× at 40 and

80 cores, respectively, in an under-committed scenario. The SCHDSPIN approach allows a blocking waiter, both reader and writer, to actively spin for the lock if the number of tasks

69 in the run queue is one, else the task schedules itself out. This approach is similar to the

scheduling-aware parking/wake-up strategy [66], which we applied to the stock mutex and

rwsem. As mentioned before, the reason for such an improvement is that the current design is not scheduling aware, as the waiter parks itself if it is unable to acquire the lock. With our approach, we mitigate this performance anomaly and allow the applications to scale further. Unfortunately, the scheduling-aware approach is inefficient in the over-committed scenario, as shown in Figure 4.6 (b). The reason is that the waiters are agnostic of the hypervisor's scheduling, which leads to wasted CPU resources and results in more LHP and LWP

problems, thereby degrading the scalability by almost 4.4× (marked eCS + SCHDSPIN in (b))

against a simple eCS configuration that still suffers from the BWW problem. We overcome this

issue by using our is_pcpu_overcommitted() method that allows the SCHDSPIN approach

to spin only when there is no other active task on the pCPU's run queue; otherwise, i.e., when more than one task is in the run queue of the pCPU, the waiter is scheduled out. By using our

method (marked eCS + eSCHDSPIN), we outperform the baseline eCS approach by 1.8× and

the eCS + SCHDSPIN approach by 8×.2

4.4.5 System Eventual Fairness

We now evaluate whether we are able to achieve eventual fairness while allowing eCS

annotated VMs to obtain an extra schedule followed by local vCPU penalization. To evaluate

the fairness, we run a simple micro-benchmark in two VMs (marked VM1 and VM2). VM1 is a

non-annotated VM, whereas VM2 is an eCS annotated one. This micro-benchmark indefinitely reads the content of a file that stresses the read side of the rwsem and spends around 99% of the time in the kernel without scheduling out the task, thereby prohibiting the guest OS from doing any halt exits. Figure 4.7 (a) shows the time difference between two VM runtimes that we measure at every 100 ms window for each VM as well as the number of preemptions for

VM2 in that window. Figure 4.7 (b) shows the cumulative runtime of the VMs. We observe

2We have used eCS + eSCHDSPIN approach for our evaluation against PVM and HVM in §4.4.2 and §4.4.3.

Figure 4.7: Fairness in eCS. Running time of a vCPU of two co-scheduled VMs (VM1 and VM2, with VM2 eCS-annotated) for a period of 10 seconds at a 100 ms window granularity while executing a kernel-intensive task (reading the contents of a file) that involves the read side of rwsem. (a) shows the difference in the running time of the vCPU per window as well as the number of preemptions occurring per window, while (b) illustrates the cumulative running time and shows that the hypervisor maintains eventual fairness in the system, even if VM2 is allowed extra schedules. Both VMs get 4.95 seconds to run.

from Figure 4.7 (a) that even after allowing for extra schedules, the CFS scheduling policy balances out these extra schedules, which does not affect the runtime difference between

VM1 and VM2. For example, at the end of one second window, marked 10, we observe that the

71 number of extra schedules that the hypervisor granted VM2 was 34 (34 milliseconds of extra

time), but the runtime difference between VM1 and VM2 is 7.8 ms, which becomes -1.9 ms at

the end of two seconds, while VM2 received a total of 54 extra schedules (54 milliseconds). Hence, the extra-schedule approach followed by our local vCPU penalization ensures that none of the tasks running on that particular physical CPU suffers from unfairness, which we refer to as eventual fairness. Moreover, Figure 4.7 (b) shows that both VMs get almost equivalent runtime in a lockstep fashion, with both VMs getting almost 4.95 seconds at the end of 10 seconds.

4.5 Chapter Summary

The double scheduling phenomenon is a well-known problem in the domain of virtualization that leads to several symptoms in the form of LHP, LWP, and BWW. We identify that it is not limited to non-blocking locks, but also applies to blocking locks and the reader side of locks. We present a single-shot solution based on our key insight: if a certain key component of a guest OS is allowed to proceed further, the guest OS will make forward progress. We identify these critical components as synchronization primitives and mechanisms, such as spinlocks, mutex, rwsem, RCU, and even interrupt contexts, which we call enlightened critical sections (eCS). We annotate an eCS with our lightweight methods that expose whether a VM is executing a critical section, which the hypervisor uses to provide an extra schedule at the scheduling boundary, thereby allowing the guest OS to progress forward. In addition, by

leveraging the hypervisor's scheduling context, a VM mitigates the effect of the BWW problem with our simple virtualized scheduling-aware spinning strategy. With eCS, we not only decrease spurious preemptions by 85–100% but also improve the throughput of applications by up to 1.6× and 2.5× in under- and over-committed scenarios, respectively.

CHAPTER 5 SCALABLE LOCKING PRIMITIVES

Since the invention of concurrent programming, lock design has been influenced by hardware

evolution. For instance, MCS [59] was proposed to address excessive cache-line traffic resulting from an increasing number of threads trying to acquire the lock at the same time, while Cohort locks [63] were proposed in response to the emergence of the non-uniform memory access (NUMA) architecture. NUMA machines consist of multiple nodes (or sockets), each with multiple cores, locally attached memories, and fast caches. In such machines, the access from a socket to its local memory is faster than remote access to

memory on a different socket [62], and each socket has a shared last-level cache. Cohort locks exploit this characteristic to improve application throughput. Unfortunately, the influence of hardware evolution on lock design has resulted in a tight coupling between hardware characteristics and lock algorithms. Meanwhile, other factors have been neglected, such as memory footprint [134], low thread counts, and core over-subscription. This chapter investigates the main dominating factors that impact the scalability of locks and their adoption: 1) cache-line movement between different caches, 2) level of thread contention, 3) core over-subscription, and 4) memory footprint. We propose a new lock design technique called shuffling that decouples the lock-acquire/release phases from the lock order policy and uses lock waiters (i.e., threads waiting to acquire the lock) to enforce those policies, mostly off the critical path. In shuffling, a waiter in the waiting queue takes the role of a shuffler and re-orders the queue of waiters based on the specified policy. This technique gives us the freedom to design and multiplex new policies based not only on the characteristics of fast-evolving hardware, but also on software characteristics. Our new family of locks, called SHFLLOCKS, augments existing locks (TAS and MCS) and uses shuffling. Our first lock algorithm is a non-blocking lock that uses NUMA-awareness as a shuffling policy to implement a compact NUMA-aware lock. We further add a core over-subscription policy to implement a blocking lock. We also implement a readers-writer lock on top of our blocking lock. We evaluate our locks in both kernel space and user space, and find that our lock algorithms maintain the best throughput regardless of the number of threads contending for the lock. In particular, SHFLLOCKS improve application throughput by 1.2–12.5×, while reducing the memory footprint by up to 35.4% and 98.8% against the currently used Linux kernel locks and against state-of-the-art locks, respectively.

5.1 Dominating Factors in Lock Design

Locks not only serialize data access, but also add their own overhead, directly impacting application scalability. Looking at the evolution of locks and their use, we identify four main factors that any practical lock algorithm should consider. These factors are critical in achieving good performance on current architectures, but their relative importance can vary not only across architectures, but also across applications with varying requirements. Therefore, we should holistically consider all four factors when designing a lock algorithm. Table 5.1 shows how these factors impact state-of-the-art locks.

F1. Avoid data movement. Memory bandwidth and the interconnect bandwidth between NUMA sockets are limited, leading to performance bottlenecks when applications incur remote cache traffic or remote memory accesses. Thus, every lock algorithm should minimize cache-line movement and remote memory accesses for both lock structures and data inside the critical section. This movement is quite expensive in NUMA machines: the cost of accessing a remote cache line can be 3× higher than local access [32]. Moreover, for future architectures, even L1/L2 cache-line movements will further exacerbate this cost [135]. Similarly, for readers-writer locks, their readers indicator incurs cache-line movement. A lock algorithm should amortize data movement from both the lock structure and the data inside the critical section, to hide non-uniform latency and minimize coherence traffic.

Table 5.1: Dominant factors affecting locks that are in use in the Linux kernel or are the state of the art for the NUMA architecture. Cache-line movement refers to lock/data movement inside a critical section. The original table shows the scalability of each lock as boxes for increasing thread count, from one thread to threads within a socket to all threads among multiple sockets, rated as optimal, sub-optimal, or worst throughput. Core subscription is only applicable to blocking locks and denotes the best throughput for a varying number of threads. Both mutex and CST are sub-optimal when under-subscribed but maintain good throughput once they are over-subscribed. Memory footprint is the memory allocation for locks: the size of each lock instance (per lock), a queue node required by each waiting thread before entering the critical section (per waiter), and a queue node retained by a lock holder within the critical section (per lock-holder). If the lock holder uses the queue node, which happens for MCS, CNA, and Cohort locks, the thread must keep track of the node, as it can acquire multiple locks: a common scenario in Linux. Note that queue nodes can be allocated on the stack for each algorithm. However, in practice, a lock user needs to explicitly allocate it on the stack for MCS, CNA, and Cohort locks, while mutex, CST, and SHFLLOCKS avoid this complexity. We also summarize the number of atomic instructions in the non-contended/contended scenarios.

Factors                          TAS        MCS [59]  Cohort [63]  CNA [136]  mutex [122]  CST [66]  SHFLLOCKS
F1. Cache-line movement          Very high  High      Low          Low        High         Low       Low
F2. Thread contention            (scalability boxes in the original table; omitted here)
F3. Core subscription (U/O)      -          -         -            -          ×/✓          ×/✓       ✓/✓
F4. Memory footprint (bytes)
    Per lock                     1          8         1,152‡       8          40           1,056‡    12
    Per waiter                   0          12        24           28         32           24        28
    Per lock-holder              0          12        24           28         0            0         0
Atomic instructions per
acquire/release                  1/∞        2/1       4/≈2         2/≈2       1/≈4         ≈6/≈3     1/≈2
U: Under-subscribed; O: Over-subscribed. ‡For CST and Cohort locks, we use an eight-socket machine as a reference.

F2. Adapt to different levels of thread contention. Most multi-threaded applications use fine-grained locking to improve scalability. For example, Dedup and fluidanimate [137] create 266K and 500K locks, respectively. Similarly, Linux has also adopted fine-grained locking over time (Figure 2.2), and only a subset of locks heavily contend based on the workload [46]. Generally, lock designs optimize either for low contention or for high

contention: TAS results in better performance when contention is low, while Cohort locks are a better choice under high contention. Similarly, the scalability of a readers-writer lock is determined by its low-level design choices: using a centralized readers indicator vs. per-socket indicators vs. per-core indicators impacts scalability depending on the ratio of readers and writers. For the best performance in all scenarios, a lock algorithm should adapt to varying thread contention.

F3. Adapt to over- or under-subscribed cores. Applications can instantiate more threads than available cores to parallelize tasks, to improve hardware utilization, or to efficiently deal with I/O. In these scenarios, blocking locks need to efficiently choose between spinning and sleeping, based on the thread scheduling. Spinning results in the lowest latency, but can waste CPU cycles and underutilize resources while starving other threads, leading to lock-holder preemption [138]. In contrast, sleeping enables other threads to run and utilize the hardware resources more efficiently. However, it can take as long as 10 ms to wake up a sleeping thread. Thus, a lock algorithm should consider the mapping between threads and cores and whether cores are over-subscribed.

F4. Decrease memory footprint. The memory footprint of a lock not only affects its adoption, but also indirectly affects application scalability. Generally, the structures of a lock are not allocated inside the critical section or on the critical path, so many algorithms do not consider these allocations as a performance overhead. However, in practical applications, locks are embedded inside other structures, which can be instantiated on the critical path. In such scenarios, this allocation aggravates the memory footprint, which stresses the memory allocator, leading to performance degradation. For example, Exim, a mail server, creates

three files for each message it delivers. Locks are part of the file structure (inode), so large locks can slow down allocation and directly affect performance [134]. This is even worse for locks that dynamically allocate their structure before entering the critical section [66]. The memory allocation can fail, leading to an application crash. Extra per-task or per-CPU allocations can further exacerbate the issue, e.g., for queue-based locks [139, 140]. Memory footprint also affects readers-writer scalability because the memory consumption dramatically increases for the readers indicators from centralized (8 bytes) to per-socket (1 KB) to per-CPU (24 KB) for each lock instance.1 Thus, a lock algorithm should consider memory footprint, as it affects both the adoption of the lock and application performance.

5.2 SHFLLOCKS

To adapt to such a diverse set of factors, we propose a new lock design technique, called shuffling. Shuffling enables the decoupling of lock operations from lock policy enforcement, which happens off the critical path. Policies can include NUMA-awareness and efficient parking/wakeup strategies. Using shuffling, we design and implement a family of lock

algorithms called SHFLLOCKS. At its core, a SHFLLOCK uses a combination of TAS as a

top-level lock and a queue of waiters (similar to MCS). We rely on the shuffling mechanism

to enable NUMA-awareness that minimizes cache-line movement (F1). SHFLLOCKS work well under high contention due to their NUMA-awareness, while maintaining good performance under low contention due to their TAS lock (F2). Besides NUMA-awareness, we also add a parking/wakeup policy to design an efficient blocking SHFLLOCK (F3). SHFLLOCKS require a constant, minimal data structure and do not require additional allocations within

the critical section, thereby reducing memory footprint (F4).

1Per-socket: 8 sockets × 128 bytes; per-CPU: 192 cores × 128 bytes.

77 5.2.1 The Shuffling Mechanism

Shuffling is a new technique for designing locks in which a thread waiting for the lock (the shuffler) re-orders the queue of waiters (shuffles) based on a policy specified by the lock developer. Shuffling is similar to sorting a list with a user-defined comparison function. Here, the list represents a set of waiters, and the comparison function is a set of policies, such as NUMA-awareness or a wakeup/parking strategy. This shuffling mechanism is mostly off the critical path because a thread handles the task of policy enforcement while waiting to acquire the lock. Thus, shuffling enables the decoupling of lock-acquire/release operations from policy enforcement, and allows lock developers to easily optimize for particular design factors (§5.1) or architectures. In this chapter, we use a policy designed to optimize for NUMA architectures. Moreover, shuffling can group multiple policies together to devise complex lock algorithms. For example, in the blocking SHFLLOCK we combine the NUMA-aware policy with an efficient parking/wakeup strategy: the shuffler groups waiters based on their NUMA socket and wakes up a nearby sleeping waiter. This approach solves the lock-waiter preemption problem by removing the wake-up overhead from the lock holder's critical path, a well-known issue for queue-based locks [66, 141, 142].
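To illustrate the kind of reordering a shuffler performs, the sketch below groups same-socket waiters directly behind the shuffler in a singly linked waiter queue. It deliberately ignores concurrency with enqueuers, parking, and the batching limits of the real algorithm, and the node layout is an assumption; it only demonstrates the policy, i.e., the "comparison function" being NUMA-socket equality.

    /* Toy, non-concurrent sketch of a NUMA-aware shuffling pass. */
    struct qnode {
        struct qnode *next;
        int socket;                  /* NUMA socket id of the waiter */
    };

    static void shuffle_waiters(struct qnode *shuffler)
    {
        struct qnode *last_grouped = shuffler;   /* tail of the same-socket group */
        struct qnode *prev = shuffler;
        struct qnode *curr = shuffler->next;

        while (curr) {
            struct qnode *next = curr->next;
            if (curr->socket == shuffler->socket) {
                if (prev == last_grouped) {
                    /* Already adjacent to the group: just extend it. */
                    last_grouped = curr;
                    prev = curr;
                } else {
                    /* Unlink curr and splice it right after the group;
                     * the relative order of all other waiters is preserved. */
                    prev->next = next;
                    curr->next = last_grouped->next;
                    last_grouped->next = curr;
                    last_grouped = curr;
                }
            } else {
                prev = curr;
            }
            curr = next;
        }
    }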

5.2.2 SHFLLOCKS Design

We now present a family of SHFLLOCK protocols, both non-blocking (§5.2.2) and blocking (§5.2.2). We further augment our blocking lock with a read indicator to design a blocking readers-writer lock (§5.2.2). We first enumerate a set of design decisions and later focus on the design of these locks.

Lock state decoupling. Unlike the MCS protocol, we decouple the lock acquisition state

from the waiter queue. We achieve decoupling by introducing two levels of locks: a TAS lock for handling non-contended cases and a queue-based lock to handle moderate contention at the socket level. This approach is similar to the Linux spinlock and has several foundational

benefits for building practical and scalable locks: a) SHFLLOCKS remove the complexity of

node allocation and tracking for the waiters' queue because a queue node is only maintained within the acquire phase. This contrasts with conventional MCS/CNA locks, which maintain the node until the release phase, preventing the lock holder from reusing the node for a nested acquisition; b) SHFLLOCKS use waiters for shuffling, moving work from the

critical path to threads that are waiting; c) SHFLLOCKS provide a fast trylock method with

a single atomic compare-and-swap instruction; and d) SHFLLOCKS mitigate the lock-waiter preemption problem through two mechanisms. First, the shuffler wakes up the next thread

to acquire the lock proactively (§5.2.1). Second, SHFLLOCKS allow lock stealing using the

internal TAS lock.
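A minimal C sketch of what such a decoupled lock word could look like is shown below; the field names and exact layout are our assumptions for illustration (the actual structure is the one walked through with Listing 5.1), but it conveys why a trylock only needs a single CAS on glock and never has to touch the queue tail.

#include <stdatomic.h>
#include <stdint.h>

struct qnode;                            /* per-waiter node, lives on the waiter's stack */

/* Hypothetical layout: the TAS state (glock) is a 4-byte word whose first
 * byte is the lock state and whose second byte is the no_stealing flag;
 * the MCS-style tail is a separate 8-byte pointer. 4 + 8 = 12 bytes,
 * matching the per-lock size reported later in this section. */
struct shfl_lock {
    union {
        _Atomic uint32_t word;           /* single-CAS target for lock/trylock   */
        struct {
            uint8_t locked;              /* byte 0: LOCKED / UNLOCK              */
            uint8_t no_stealing;         /* byte 1: set by the head of the queue */
            uint8_t pad[2];
        } b;
    } glock;
    struct qnode *_Atomic tail;          /* waiters queue; untouched by trylock  */
};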

Scheduling-aware parking strategy. For a blocking synchronization primitive, the most important question is how to efficiently pass the lock or wake up a waiter while maintaining on-par performance in both the under- and over-subscribed cases. For a scalable parking/wakeup decision, we remove costly scheduler operations (i.e., wake-ups) from the common, critical path and employ a distributed parking decision that considers the load on the system. We discuss three key ideas that address 1) whom to pass the lock to, 2) when a waiter should park itself, and 3) how to make parking decisions for blocking synchronization primitives.

1) Passing the lock to an actively spinning waiter. In queue-based locks (e.g., MCS, K42, and CLH), the successor of the lock holder always acquires the lock, which guarantees complete fairness but, unfortunately, causes severe performance degradation in an over-subscribed system, as this invariant stresses the scheduler to always issue a call to wake up the parked waiter. To mitigate this issue, we introduce lock stealing, in which waiters that are about to join the queue can steal the lock in the absence of a lock holder. In addition, the shuffler also tries to wake up sleeping threads to keep a set of actively spinning waiters.
2) Scheduling-aware spinning. Most of the proposed hierarchical locks [63, 68, 64] are non-blocking in nature. Hence, they do not consider the amount of time a waiter should spin before parking itself. Thus, in an over-loaded system, waiting threads and the lock holder

will contend with each other, which hinders system progress. Instead, in SHFLLOCK, waiting threads park themselves as soon as their time quota is about to expire. To check the quota, we rely on the scheduler and its APIs. Specifically, in the

Linux kernel, the scheduler exposes need_resched() to indicate whether the current task should yield the CPU,

and preemption APIs (preempt_disable() / preempt_enable()) to explicitly disable or enable task preemption. These APIs work with both preemptive and non-preemptive kernels. Limiting the duration of spinning to the time quota proposed by the scheduler has several advantages: 1) It guarantees the forward progress of an over-loaded system by allowing the current lock holder to do useful work while mitigating its preemption. 2) It allows other tasks to do useful work rather than wasting CPU cycles. 3) By spinning only for the specified duration, the primitive respects the scheduler's fair scheduling decisions.
3) Scheduling-aware parking. Current blocking synchronization primitives [89, 90] do not efficiently account for the system load; they naively park waiters even in under-loaded scenarios. Hence, a naive use of the spin-then-park approach results in scheduler intervention, as the waiters park themselves as soon as their time quota ceases, and the lock holder has to do the extra work of waking them up, which severely degrades system performance in an under-loaded scenario [77]. Previous research [143] has also shown that estimating the system load is critical to the spin-then-park approach because it not only removes the scheduler interaction from the parking phase, but also improves the latency of the lock/unlock phase. We gauge the system load by peeking at the number of running tasks on a CPU (i.e., the length of the scheduler run queue for that CPU). Checking the number of running tasks is almost free because a modern OS kernel, including Linux, has a per-CPU scheduler run queue, which already maintains up-to-date per-CPU active-task information. On the other hand, maintaining system-wide, centralized information, as in the approach used by Johnson et al. [143], is costly because the cost of collecting the total number of active tasks increases

with increasing core count, and it may not catch the load imbalance caused by new incoming tasks or the rescheduling of periodic tasks.

Figure 5.1: SHFLLOCKNB example. The lock structure (4 B + 8 B per lock) consists of a state (glock) and the queue tail. The first byte of glock is the lock/unlock state, while the second byte denotes whether stealing is allowed. We encode multiple pieces of information in the qnode structure (4 B + 4 B + 8 B per thread, on the stack). (a) Initially, there is no lock holder. (b) t0 successfully acquires the lock via CAS and enters the critical section. (c) t1, of socket 1, executes SWAP on the lock's tail after the CAS failure on TAS. (d) Similarly, t2, also from socket 1, joins the queue. (e) Now, there are five waiters (t1–t5) waiting for the lock. t1 is the very first waiter, so it becomes the shuffler and traverses the queue to find waiters from the same socket. t1 then moves t4 (same socket) after t2. (f) After the traversal, t1 selects t4 as the next shuffler. (g) t4 acquires the lock after t1 and t2 have executed their critical sections. At this point, t3 becomes the shuffler.
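Returning to the scheduling-aware spin-then-park decision described above, the sketch below illustrates the waiter-side logic in plain C. The helpers need_resched_hint(), cpu_runqueue_length(), and park_current_task() are stand-ins we define only for illustration; in the kernel they correspond to need_resched(), the per-CPU run-queue length, and the scheduler's park/wake path.

#include <stdatomic.h>
#include <stdbool.h>

enum { S_WAITING, S_READY, S_PARKED };

/* Illustrative stubs; a kernel implementation queries the scheduler instead. */
static bool need_resched_hint(void)   { return false; } /* time quota about to expire? */
static int  cpu_runqueue_length(void) { return 1; }     /* # runnable tasks on this CPU */
static void park_current_task(void)   { /* block until a waker changes our status */ }

/* Spin while within the scheduling quota and while the CPU is not
 * over-subscribed; otherwise try to park (only if still S_WAITING,
 * since a shuffler may have promoted us in the meantime). */
static void wait_for_turn(_Atomic int *status)
{
    for (;;) {
        if (atomic_load_explicit(status, memory_order_acquire) == S_READY)
            return;                      /* we are now at the head of the queue */

        if (need_resched_hint() || cpu_runqueue_length() > 1) {
            int expected = S_WAITING;
            if (atomic_compare_exchange_strong(status, &expected, S_PARKED))
                park_current_task();
        }
    }
}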

Non-Blocking Version: SHFLLOCKNB

SHFLLOCKNB uses a TAS and MCS combination, and maintains queue nodes on the stack [140, 139, 66]. However, we do extra bookkeeping for the shuffling process by extending the thread's qnode structure with a socket ID, a shuffler status, and a batch count (to avoid batching too many waiters from the same socket, which might cause starvation or break long-term fairness). Figure 5.1 shows the lock structure and the qnode structure. Our current design of the shuffling phase enforces the following four invariants for implementing any policy: 1) The successor of the lock holder, if it exists, always keeps its position intact in the

queue. 2) Only one waiter can be an active shuffler, as shuffling is single threaded. 3) Only the head of the queue can start the shuffling process. 4) A shuffler may pass the shuffling role to one of its successors. Figure 5.1 presents a running example of our lock algorithm. (a) A thread first tries

to acquire the TAS lock; (b) it enters the critical section on success; otherwise, it joins the waiting queue ((c)–(e)). Now, the very next lock waiter, i.e., t1, becomes the shuffler and

groups waiters belonging to the same socket, e.g., t4 (Figure 5.1 (e)). Once a shuffler iterates the whole waiting queue, it selects the last moved waiter as the next shuffler to start the

process: t1 selects t4 (f). The shuffler keeps retrying to find a waiter from the same socket and leaves the shuffling phase after finding a successor from the local socket (f) or becoming the lock holder (g). The shuffler role is passed along within a socket until the batching quota is exceeded.
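For concreteness, the per-thread queue node described above might look like the following C sketch; the field names and types are illustrative rather than the exact kernel definition.

#include <stdatomic.h>
#include <stdbool.h>

enum qstatus { S_WAITING = 0, S_READY = 1 };

/* One qnode per waiter, allocated on the waiter's stack for the duration
 * of the acquire phase only (it is not needed after the lock is taken). */
struct qnode {
    _Atomic int    status;        /* S_WAITING until promoted to S_READY        */
    int            socket;        /* NUMA socket ID consulted by the policy     */
    _Atomic bool   is_shuffler;   /* handed over by the previous shuffler       */
    int            batch;         /* same-socket waiters grouped so far, bounded
                                     to preserve long-term fairness             */
    struct qnode *_Atomic next;   /* MCS-style successor pointer                */
};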

Listing 5.1 presents the pseudo-code of our non-blocking version. The lock structure is

12 bytes (Figure 5.1): 4 bytes for the lock state (glock), and 8 bytes for the MCS tail. The

algorithm works as follows: A thread T first tries to steal the TAS lock (line 6). On failure, T

initiates the MCS protocol by first initializing a queue-node (qnode) on the stack, and then

adding itself to the waiting queue by atomically swapping the tail with the qnode’s address

(lines 11–13). After joining the queue, T waits until it is at the head of the queue. To do that,

T checks for its predecessor. If T is the first one in the queue, it disables lock stealing by

setting the second byte to 1 to avoid TAS lock contention and waiter starvation (line 17). On

the other hand, if waiters are present, T starts to spin locally until it becomes the leader in

the waiting queue, i.e., until its qnode’s status changes from S_WAITING to S_READY (line 47).

Here, T also checks for the is_shuffler status. If the value is set, then T becomes the shuffler and enters the shuffling phase (line 51), which we explain later.

On reaching the head of the queue, T checks whether it can be a shuffler to group its

successors based on the socket ID, while also trying to acquire the TAS lock via the CAS

 1  S_WAITING = 0  # Waiting on the node status
 2  S_READY = 1    # The waiter is at the head of the queue
 3
 4  def spin_lock(lock):
 5      # Try to steal/acquire the lock if there is no lock holder
 6      if lock.glock == UNLOCK && CAS(&lock.glock, UNLOCK, LOCKED):
 7          return
 8
 9      # Did not get the lock, time to join the queue; initialize node states
10      qnode = init_qnode(status=S_WAITING, batch=0,
11                         is_shuffler=False, next=None, skt=numa_id())
12
13      qprev = SWAP(&lock.tail, &qnode)  # Atomically add to the queue tail
14      if qprev is not None:             # There are waiters ahead
15          spin_until_very_next_waiter(lock, qprev, &qnode)
16      else:                             # Disable stealing to maintain the FIFO property
17          SWAP(&lock.no_stealing, True) # no_stealing is the second byte of glock
18
19      # qnode is at the head of the queue; time to get the TAS lock
20      while True:
21          # Only the very first qnode of the queue becomes the shuffler (line 16)
22          # or the one whose socket ID is different from the predecessor
23          if qnode.batch == 0 or qnode.is_shuffler:
24              shuffle_waiters(lock, &qnode, True)
25          # Wait until the lock holder exits the critical section
26          while lock.glock_first_byte == LOCKED:
27              continue
28          # Try to atomically get the lock
29          if CAS(&lock.glock_first_byte, UNLOCK, LOCKED):
30              break
31
32      # MCS unlock phase is moved here
33      qnext = qnode.next
34      if qnext is None:                 # qnode is the last one / next pointer is being updated
35          if CAS(&lock.tail, &qnode, None):        # Last one in the queue, reset the tail
36              CAS(&lock.no_stealing, True, False)  # Try resetting, else someone joined
37              return
38          while qnode.next is None:     # Failed on the CAS, wait for the next waiter
39              continue
40          qnext = qnode.next
41      # Notify the very next waiter
42      qnext.status = S_READY
43
44  def spin_until_very_next_waiter(lock, qprev, qcurr):
45      qprev.next = qcurr
46      while True:
47          if qcurr.status == S_READY:   # Be ready to hold the lock
48              return
49          # One of the previous shufflers assigned qcurr as a shuffler
50          if qcurr.is_shuffler:
51              shuffle_waiters(lock, qcurr, False)
52
53  def spin_unlock(lock):
54      lock.glock_first_byte = UNLOCK    # no_stealing is not overwritten

 55  MAX_SHUFFLES = 1024
 56
 57  # A shuffler traverses the queue of waiters (single threaded)
 58  # and shuffles the queue by bringing the same socket qnodes together
 59  def shuffle_waiters(lock, qnode, vnext_waiter):
 60      qlast = qnode                # Keeps track of shuffled nodes
 61      # Used for queue traversal
 62      qprev = qnode
 63      qcurr = qnext = None
 64      # batch → batching within a socket
 65      batch = qnode.batch
 66      if batch == 0:
 67          qnode.batch = ++batch
 68      # Shuffler is decided at the end, so clear the value
 69      qnode.is_shuffler = False
 70      # No more batching to avoid starvation
 71      if batch >= MAX_SHUFFLES:
 72          return
 73
 74      while True:                  # Walking the linked list in sequence
 75          qcurr = qprev.next
 76          if qcurr is None:
 77              break
 78          if qcurr == lock.tail:   # Do not shuffle if at the end
 79              break
 80
 81          # NUMA-awareness policy: Group by socket ID
 82          if qcurr.skt == qnode.skt:      # Found one waiting on the same socket
 83              if qprev.skt == qnode.skt:  # No shuffling required
 84                  qcurr.batch = ++batch
 85                  qlast = qprev = qcurr
 86
 87              else:                # Other socket waiters exist between qcurr and qlast
 88                  qnext = qcurr.next
 89                  if qnext is None:
 90                      break
 91                  # Move qcurr after qlast and point qprev.next to qnext
 92                  qcurr.batch = ++batch
 93                  qprev.next = qnext
 94                  qcurr.next = qlast.next
 95                  qlast.next = qcurr
 96                  qlast = qcurr    # Update qlast to point to qcurr now
 97          else:                    # Move on to the next qnode
 98              qprev = qcurr
 99
100          # Exit → 1) If the very next waiter can acquire the lock
101          #        2) A waiter is at the head of the waiting queue
102          if (vnext_waiter is True and lock.glock_first_byte == UNLOCK) or
103             (vnext_waiter is False and qnode.status == S_READY):
104              break
105
106      qlast.is_shuffler = True

Listing 5.1: Pseudo-code of the non-blocking version of SHFLLOCKS and the shuffling mechanism.

operation (lines 20–30). Note that only the head of the queue can start the shuffling process

if the qnode’s batch is set to 0. Otherwise, T can only shuffle waiters if the is_shuffler status is set to 1, which might be set by a previous shuffler.

The moment T becomes the lock holder, i.e., T acquires the TAS lock, it follows the

MCS unlock protocol (lines 33–40). T checks for its successor (qnode.next). If the

successor is present, T updates the successor's qnode status to S_READY. Otherwise, it tries to reset the queue's tail and re-enables lock stealing, which allows a new thread to get the lock via TAS when the queue is empty. The unlock phase is a conventional TAS unlock in which the first byte is reset to 0 (line 54).
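The byte-granular unlock can be made concrete with a small C sketch (again using our hypothetical layout, not the kernel's exact types): storing 0 only to the first byte releases the lock without clobbering a no_stealing flag that a newly arrived queue head may have just set in the second byte.

#include <stdatomic.h>
#include <stdint.h>

struct glock_bytes {
    _Atomic uint8_t locked;       /* byte 0: 0 = UNLOCK, 1 = LOCKED         */
    _Atomic uint8_t no_stealing;  /* byte 1: owned by the head of the queue */
    uint8_t pad[2];
};

static inline void shfl_spin_unlock(struct glock_bytes *g)
{
    /* Release only the lock byte; the adjacent no_stealing byte is untouched. */
    atomic_store_explicit(&g->locked, 0, memory_order_release);
}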

Shuffling. Our shuffling algorithm moves a waiter’s qnode from an arbitrary position to the end of the shuffled nodes in the waiting queue. Based on the specified policy, i.e., socket-

ID-based grouping, the shuffler (S) either updates the batch count or further manipulates

the next pointer of waiting qnodes (lines 82–98). We consider S as the first shuffled node.

The algorithm is as follows: S first resets its is_shuffler to 0 and checks its quota of the maximum allowed shufflings to avoid starvation for remote-socket waiters (lines 69–71).

Similar to CNA, we can also use a random generator to mitigate starvation. Now, S iterates

over qnodes in the queue while keeping track of the last shuffled qnode (qlast). While

traversing, S always marks the nodes that belong to its socket by increasing the batch count. It only does pointer manipulations when there are waiters between the last shuffled node and

the node belonging to S’s socket (lines 87–96). Finally, S always exits the shuffling phase if

either the TAS lock is unlocked or S becomes the head of the queue (lines 102–103). Before

exiting the shuffling phase, S assigns the next shuffler: the last marked node (line 106). S can stop traversing the queue for two more reasons: 1) if successors are absent (lines 76, 89),

as S wants to avoid the locking delay because it might soon acquire the lock; 2) if S reaches the queue tail, as there might be waiters joining at the end of the tail, which it cannot move (line 78).

Optimization. Our shuffling algorithm has unnecessary pointer chasing when a newly selected shuffler, assigned by the previous S, has to traverse the queue. We avoid this issue by further encoding, in the next shuffler's qnode structure, extra information about the qnode where S stopped its traversal. This leads to traversing mostly from near the end of the tail, thereby better utilizing the time of waiters.

Figure 5.2: A running example of how a shuffler shuffles waiters with the same socket ID and wakes them up (in this example, MAX_SHUFFLES = 3). (a) t0 is the lock holder; t1 is the shuffler and is traversing the queue. t2 is sleeping, but t1 wakes it up. (b) t2 becomes active, while t1 continues shuffling and reaches t4; t1 first moves t4 after t2, and wakes up t4 to mitigate the wakeup latency. (c) When t0 releases the lock, t1 acquires it; t2 and t4 are actively spinning for their turn; t4 is the shuffler.

Blocking Version: SHFLLOCKB

We augment SHFLLOCKNB to incorporate an effective parking/wakeup policy. Our lock algorithm departs from the scalable queue-based blocking designs as we do not have a separate parking list [66, 122, 144]. This allows us to save up to 16–20 bytes per lock compared to existing separate parking list-based locks. We maintain both the active and

passive waiters in the same queue, and utilize the TAS lock for lock stealing and shuffling to

efficiently wake up parked waiters off the critical path. SHFLLOCKB avoids the lock-waiter

preemption by allowing the TAS lock to be unfair in the fast path [66, 140] as well as keeping

the head of the waiting queue active, i.e., not scheduled out. In addition, we modify the MCS protocol to support waiter parking and wakeup. We further extend our shuffling protocol to wake up the nearby sleeping waiters while shuffling the queue for NUMA-awareness in

 1 +  S_PARKED = 2    # Parked state (used by lock waiter for sleeping)
 2 +  S_SPINNING = 3  # Spinning state (used by shuffler for waking up)
 3    def mutex_lock(lock):
 4        ...
 5        # Notify the very next waiter
 6 -      qnext.status = S_READY
 7 +      # Atomically SWAP the qnode status
 8 +      prev_status = SWAP(&qnext.status, S_READY)
 9 +      if prev_status == S_PARKED:       # Required for avoiding lost wakeup
10 +          wake_up_task(qnext.task)      # Explicitly wake up the very next waiter
11
12    def spin_until_very_next_waiter(lock, qprev, qcurr):
13        ...
14        if qcurr.status == S_READY:
15            return
16 +      if task_timed_out(qcurr.task):    # Running quota is up! Give up
17 +          park_waiter(qcurr)            # Will try to park myself
18
19    def shuffle_waiters(lock, qnode, vnext_waiter):
20        ...
21        if batch >= MAX_SHUFFLES:
22            return
23 +      SWAP(&qnode.status, S_SPINNING)   # Don't sleep, will soon acquire the lock
24
25        while True:
26            ...
27            # NUMA-awareness and wakeup policy
28            if qcurr.skt == qnode.skt:
29                if qprev.skt == qnode.skt:     # No shuffling required
30 +                  update_node_state(qcurr)   # Disable sleeping
31                    qnode.batch = ++batch
32                    qlast = qcurr
33                    qprev = qcurr
34
35                else:
36 +                  update_node_state(qcurr)   # Disable sleeping
37                    qnode.batch = ++batch
38                    qprev.next = qnext
39
40 +  def update_node_state(qnode):
41 +      # If the task is waiting, then make it spinning
42 +      if CAS(&qnode.status, S_WAITING, S_SPINNING):
43 +          return
44 +      # If the task is sleeping, then wake it up for spinning
45 +      if CAS(&qnode.status, S_PARKED, S_SPINNING):
46 +          wake_up_task(qnode.task)      # Wake up the task (off the critical path)
47 +
48 +  def park_waiter(qnode):
49 +      # Park it when the task is waiting
50 +      if CAS(&qnode.status, S_WAITING, S_PARKED):
51 +          park_task(qnode.task)

Listing 5.2: The extra modifications required to convert our non-blocking version of SHFLLOCK to a blocking one.

 1    def mutex_lock(lock):
 2        ...
 3 +      qnext = qnode.next    # Try to get the successor before acquiring TAS
 4 +      if qnext is not None:
 5 +          if SWAP(&qnext.status, S_SPINNING) == S_PARKED:
 6 +              wake_up_task(qnext.task)
 7
 8        # qnode is at the head of the queue; time to get the TAS lock
 9        while True:
10            ...

Listing 5.3: An optimization for avoiding a waiter wakeup issue in the critical path with an extra state update before the TAS lock.

both under- and over-subscribed cases (Figure 5.2). To support efficient parking/wakeup, we extend our non-blocking version with two more states: 1) parked (S_PARKED), in which a waiter is scheduled out to handle core over-subscription, and 2) spinning (S_SPINNING), in which a shuffled waiter keeps spinning to mitigate the convoy effect.

Listing 5.2 shows the modifications on top of SHFLLOCKNB. While spinning locally on

its status, a waiter T checks if the time quota is up (line 16). In that case, T tries to atomically

change its qnode status from S_WAITING to S_PARKED (line 50). On success, T parks itself

out (line 51); otherwise, T goes back to spinning. In the shuffling phase, a shuffler S also wakes up the shuffled sleeping waiters (lines 30, 36). Note that this is a best effort strategy,

in which an S first tries to atomically CAS the qnode’s status from S_WAITING to S_SPINNING,

hoping that the waiter is still waiting locally; if the operation fails, then S does another

explicit CAS from S_PARKED to S_SPINNING and wakes up the sleeping waiter if successful (line 46). The last notable change to the algorithm is notifying the head of the queue. There

is a possibility that the very next waiter might be sleeping. We atomically swap the qnext’s

state to S_READY (line8) and wake up the waiter at the head of the queue if the return value

of the atomic SWAP operation is S_PARKED (line 10).

Optimizations. Our first optimization is to enable lock stealing by not setting the second byte (no_stealing) when the queue begins to form. The reason is that waking up a waiter takes anywhere from 1 µs to 10 ms, which adds overhead to the acquire phase. The second optimization regards the waiter

wakeup. Our current design leads to waking up the queue head inside the critical section, even though it is rare (see §4.4). As shown in Listing 5.3, we explicitly set the successor

status to S_SPINNING and wake it up if parked. This approach further removes the rare occurrence of the waiter preemption problem at the cost of an extra atomic operation, which is acceptable, as the atomic operation is only between two qnodes. It is not a part of the

critical section, as other joining threads can steal the lock (TAS) to ensure the forward progress of the system.

Readers-Writer Blocking SHFLLOCK

Linux uses a readers-writer spinlock [145], which combines a readers indicator with a queue-based lock. This lock queues waiting readers and writers to avoid cache-line contention

and bouncing. We use a similar design on top of our blocking SHFLLOCK. Thus, our readers-writer lock inherently becomes a blocking lock, and at most one reader or writer spins to acquire the lock, while the others spin locally. Our lock design provides only long-term

fairness due to the NUMA-awareness of the SHFLLOCK. This is acceptable because even

the Linux’s rwsem is writer-preferred to enhance throughput over fairness [76, 75], similar to prior work [68]. Note that the shuffler only moves writers in the wait queue because all the contiguous readers can enter the critical section together, irrespective of NUMA socket.

Details. We augment a SHFLLOCK, henceforth called wlock, with a read/write counter, which encodes: a readers count (Rcount), a writer waiting bit (WWb) indicating if a writer is waiting to acquire the lock, and a writer byte (WB), indicating if a writer is currently holding

the lock. A writer enters the critical section on successfully setting WB from 0 to 1; otherwise,

it enqueues itself to acquire the underlying blocking lock (wlock). After acquiring the wlock,

the writer sets the waiting bit (WWb) to 1 to prohibit new readers from entering the critical section and waits for existing readers to leave. Once readers leave, the writer atomically

resets WWb to 0 and sets WB to 1, releases wlock, and then enters the critical section. In the writer unlock phase, a writer resets WB to 0. A reader first atomically increments Rcount and

enters the critical section if both WB and WWb are 0. If either is non-zero, the reader first decreases

Rcount and starts acquiring wlock. Once it holds wlock, it first increments the Rcount to prevent writers from entering the critical section and waits for the existing writer to exit.

When WB is 0, the reader enters the critical section after releasing wlock. In the unlock phase,

a reader atomically decreases Rcount.
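The counter manipulation described above can be sketched in C as follows. The bit layout (writer byte in bits 0–7, WWb in bit 8, Rcount in the higher bits), the conservative writer fast path (which requires the entire counter to be zero), and the trivial spinlock standing in for wlock are all our assumptions for illustration; the real implementation sits on top of the blocking SHFLLOCK.

#include <stdatomic.h>
#include <stdint.h>

#define WB_MASK     0x00ffu      /* writer byte: a writer holds the lock       */
#define WWB_MASK    0x0100u      /* writer-waiting bit: blocks new readers     */
#define RCOUNT_UNIT 0x0200u      /* readers count lives in the higher bits     */

static _Atomic uint32_t cnt;     /* Rcount | WWb | WB */
static _Atomic int wlock;        /* stand-in for the underlying blocking SHFLLOCK */

static void wlock_acquire(void) { while (atomic_exchange(&wlock, 1)) ; }
static void wlock_release(void) { atomic_store(&wlock, 0); }

static void read_lock(void)
{
    uint32_t c = atomic_fetch_add(&cnt, RCOUNT_UNIT) + RCOUNT_UNIT;
    if (!(c & (WB_MASK | WWB_MASK)))
        return;                              /* no writer active or waiting    */
    atomic_fetch_sub(&cnt, RCOUNT_UNIT);     /* back off and queue on wlock    */
    wlock_acquire();
    atomic_fetch_add(&cnt, RCOUNT_UNIT);     /* block new writers ...          */
    while (atomic_load(&cnt) & WB_MASK)      /* ... and wait for the current one */
        ;
    wlock_release();
}

static void read_unlock(void) { atomic_fetch_sub(&cnt, RCOUNT_UNIT); }

static void write_lock(void)
{
    uint32_t none = 0;                       /* conservative fast path: all clear */
    if (atomic_compare_exchange_strong(&cnt, &none, 1))
        return;
    wlock_acquire();
    atomic_fetch_or(&cnt, WWB_MASK);         /* prohibit new readers            */
    while (atomic_load(&cnt) & ~WWB_MASK)    /* wait for readers (and WB) to drain */
        ;
    atomic_fetch_add(&cnt, 1u - WWB_MASK);   /* atomically clear WWb and set WB */
    wlock_release();
}

static void write_unlock(void) { atomic_fetch_and(&cnt, ~(uint32_t)WB_MASK); }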

5.3 Implementation

We implement all SHFLLOCKS in the Linux kernel v4.19-rc4 and entirely replace mutex

and rwsem with ours. Our replacement results in adding 459 and 557 lines of code (LoC) for

mutex and rwsem, respectively. We add our shuffling phase to the qspinlock in 150 LoC, without increasing the lock size. We have also tested SHFLLOCKS with locktorture.

5.4 Evaluation

We evaluate SHFLLOCKS by answering three questions:

Q1. How do SHFLLOCKS, implemented in the kernel, impact micro-benchmarks (§5.4.1) and real applications (§5.4.2)?

Q2. How does each design decision affect SHFLLOCKS performance and how fair are

SHFLLOCKS (§5.4.3)?

Q3. How do userspace SHFLLOCKS impact applications' performance and memory footprint (§5.4.4)?

Evaluation setup. We use micro-benchmarks that stress a single lock [77, 146], and three workloads that heavily stress several kernel subsystems [108, 147]. We also use a hash-

table nano-benchmark [148] to break down the performance characteristics of SHFLLOCKS. Table 5.2 lists all the evaluated locks and the selection criteria. We evaluate on an 8-socket,

192-core machine with Intel Xeon E7-8890 v4 (hyperthreading disabled). We use tmpfs to minimize file system overhead.

Table 5.2: Locks evaluated in both the kernel space and the userspace. In the kernel space, we replace all locks with SHFLLOCKS. We use LD_PRELOAD to replace all the mutex-based locks in the userspace.

Kernel space
  Locks         Replaced                                     Selection criteria
  SHFLLOCKS     All                                          –
  CNA [136]     qspinlock                                    Compact NUMA-aware lock (NB)
  CST [66]      mmap_sem / i_rwsem / s_vfs_rename_mutex †    Hierarchical + dynamic allocation (B)
  Cohort [63]   mmap_sem / i_rwsem / s_vfs_rename_mutex †    Hierarchical + static allocation (NB)

Userspace
  Locks             Selection criteria
  MCS [59]          Queue-based lock (NB)
  HMCS [64]         Representative cohort lock (NB)
  CNA [136]         Compact version of NUMA-aware MCS (NB)
  MCSTP [149]       Preemption-adaptive MCS for over-subscription (NB)
  pthread           Stock version used by everyone (B)
  Mutexee [150]     Optimized version of pthread (B)
  Malthusian [144]  Culls extra threads deterministically (B)

B: Blocking; NB: Non-blocking. † Both CST and Cohort replace all three locks.

5.4.1 SHFLLOCK Performance Comparison

We evaluate the performance of all SHFLLOCKS using a set of micro-benchmarks [77, 146]. Each micro-benchmark instantiates a set of threads and pins them to cores. These threads contend on a single lock while performing specific tasks (Table 5.3) for 30 seconds. We pin two threads on each core in the over-subscribed scenario for blocking locks.

Non-blocking SHFLLOCK. Figure 5.3 shows that both CNA and SHFLLOCK outperform

the Linux version (Stock) by 2.8× and 2× on MWRL and lock1, respectively, while maintaining the same throughput under lower contention, e.g., within a single socket. Similar

to SHFLLOCK, CNA maintains NUMA-awareness by using the lock holder to physically split the waiting queue into two, one for local threads and the other for remote threads.

Meanwhile, SHFLLOCK uses lock waiters to shuffle waiters around, mostly off the critical

path. Like SHFLLOCK, CNA also uses the waiting queue and re-structures this queue to

Table 5.3: Lock usage in various micro-benchmarks [77, 146].

  Lock type      Workload      Lock: Usage
  Non-blocking   MWRL [77]     rename seqlock: Rename files within a directory
                 lock1 [146]   files_struct.file_lock: fd allocation / fcntl
  Blocking       MWRM [77]     sb->s_vfs_rename_mutex: Rename a file across directories
  RW blocking    MWCM [77]     inode->i_rwsem: Create files in the directory (writer)
                 MRDM [77]     inode->i_rwsem: Enumerate files in the directory (readers)

Figure 5.3: Impact of non-blocking locks on the scalability of micro-benchmarks [77, 146]: (a) MWRL and (b) lock1, reporting Ops/µsec with increasing thread count for Stock, CNA, and SHFLLOCKNB. Refer to Table 5.3 for lock usage. Here, Stock refers to the default spinlock.

Figure 5.4: Impact of blocking locks on the scalability of micro-benchmarks with up to 2× over-subscription (384 threads: we pin two threads on each core): (a) MWRM (mutex), (b) MWCM (writer side of rwsem), and (c) MRDM (reader side of rwsem); panel (c) also includes Stock-BRAVO and SHFLLOCKB-BRAVO. Cohort and CST are the non-blocking and blocking hierarchical locks, respectively. Refer to Table 5.3 for lock usage.

optimize the next thread to acquire the lock. However, CNA physically partitions this queue into a queue of local NUMA threads and one of remote NUMA threads. This operation results in extending the critical section while acquiring the lock, but at a low amortized

cost (O(1)). In contrast, SHFLLOCK generalizes the queue re-structuring by maintaining a single physical queue and using the shuffling operation to prioritize local NUMA threads.

Moreover, SHFLLOCK performs this operation off the critical path.

Blocking SHFLLOCK. We compare SHFLLOCK with Linux mutex and rwsem (Stock),

Cohort non-blocking lock, and CST lock (Table 5.2). We test these locks in both under- and over-subscribed cases, i.e., up to 384 threads by pinning two threads on a core in

a round-robin manner. Figure 5.4 (a) shows the results for the MWRM benchmark, which

renames files across directories. MWRM first pre-allocates a set of empty files in per-thread directories; then, each thread moves a file from its directory to a shared directory, which

stresses the super-block’s mutex (Table 5.3). SHFLLOCK maintains the best throughput in

both under- and over-subscribed scenarios and is 1.8× faster than both CST and Stock. The stock version suffers from cache-line bouncing at high core counts but maintains its throughput

in the over-subscribed case. Cohort is a non-blocking lock, which performs well up to the

total number of cores (192 threads), but significantly degrades MWRM’s throughput in the

over-subscribed case (384 threads), as waiters waste CPU cycles. CST does not scale because it dynamically allocates its socket structure before each critical section, which results in

excessive allocations with elongated critical section length. In contrast, Cohort pre-allocates its socket structure, and does not extend the critical section.

Readers-Writer Blocking SHFLLOCK. Figure 5.4 (b) shows the impact of SHFLLOCK when stressing the writer lock of rwsem. We use the MWCM benchmark, in which each worker

creates 4KB files in a shared directory to stress inode allocation. We observe that SHFLLOCK maintains the best throughput at all core counts, due to its ability to better adapt to the workload. For example, SHFLLOCK is 1.8–2× faster than hierarchical locks within a socket

and 1.5× faster than Stock at 192 threads. Cohort can only scale up to four cores (almost

55% slower than SHFLLOCK) because memory allocation becomes an issue as the inode

size increases by 3.4×. Meanwhile, CST avoids this scenario, as it only allocates the memory

for one socket initially, but its performance only scales to reach that of SHFLLOCK after 2 NUMA nodes.

Figure 5.4 (c) shows the impact of SHFLLOCK when stressing the readers side of the

rwsem. We use MRDM, in which each thread enumerates files in a directory. We also include a

recently proposed approach, called BRAVO [151], that tries to mitigate the centralized reader overhead by using a global readers table. We observe that both hierarchical locks are faster

than SHFLLOCK and rwsem because of their per-socket readers indicator, which localizes the

contention within a socket. SHFLLOCK is still faster than stock rwsem by 1.2–1.5× because the stock version suffers from the spurious sleeping of waiters, which results in extra cache-line contention on the reader indicator, thereby impacting the throughput. We also observe

that the BRAVO approach improves the throughput for both Stock and SHFLLOCK up to 2.3×

compared to Cohort and CST locks at 192 threads. However, due to the extra cache-line

contention in the stock version, SHFLLOCK-BRAVO still outperforms Stock-BRAVO by 1.6× at 384 threads.

5.4.2 Improving Application Performance

We evaluate three applications that extensively stress various subsystems of the Linux kernel. Figure 5.5 reports the throughput of applications and the memory used by locks, which are mostly blocking and are present in several data structures such as inodes, task structures, and memory mappings. Table 5.2 shows the locks modified for the evaluation. Note that

CNA only modifies the spinlock, but does not affect the size of blocking locks.

Figure 5.5: Impact of locks on application scalability ((a) AFL, (b) Exim, and (c) Metis) and on the memory footprint of locks, while running three applications with SHFLLOCKS, the Linux stock version (Stock), CNA, CST, and Cohort. Refer to Table 5.2 for the specific changes. SHFLLOCK reduces the memory footprint because of the blocking locks that are embedded in inodes, task structures, and memory-management structures. (The memory plots are annotated with Cohort: 500 MB / CST: 300 MB for AFL, Cohort: 30 GB / CST: 24 GB for Exim, and Cohort, CST: 1.6 KB for Metis.)

AFL [147], a fuzzer, is an embarrassingly parallel workload that heavily uses fork() to execute test cases and scan directories created by the fuzzing instances. AFL suffers from the following overheads: process forking, repeatedly creating and unlinking files in a private directory, and scanning other instances' directories [152]. In addition, AFL suffers from the

gettimeofday() syscall, as each instance issues this syscall to log information. Figure 5.5 (a) shows AFL throughput and memory usage with various locks. We observe that all the existing versions of NUMA-aware locks improve throughput compared with the stock version. For instance, CNA decreases the qspinlock overhead due to process forking and

gettimeofday() from 48% to 32%. Meanwhile, both CST and Cohort locks improve the

file system performance, as these locks scale as well as SHFLLOCKS. However, their large memory footprint starts stressing the memory allocator at higher core counts, as the bottleneck

completely shifts to process forking (30%). Finally, SHFLLOCKS improve performance on two fronts: they improve throughput by 1.2–1.6× while reducing the lock overhead by

35.4–95.8% at 192 threads. The significant overhead now comes from the gettimeofday() syscall, which accounts for almost 20% of CPU cycles.

Exim [108] is a process-intensive workload that forks a new process for every connection. Each connection then forks twice to handle messages and file system operations [77], which heavily over-subscribes the system. Exim creates about 3× copies for each message and heavily stresses the kernel in three subsystems: memory management, file systems, and network connections. On average, about 50% of the time is spent in process forking/exiting and interrupts. Figure 5.5 (b) shows Exim throughput and memory usage with various locks. Both SHFLLOCKS and CNA improve throughput as they decrease the

CPU idle time by 50% compared with CST, Cohort, and Stock while improving the useful work by ≈2%. The improvement is a result of a decrease in lock contention by 10% (relative

to Stock) in the cleaning up of reverse mappings [153]. The throughput of the CST and Cohort locks decreases because these locks stress the memory allocator (see §5.1), as the benchmark

generates about 8M files in 20 secs. In summary, SHFLLOCKS improve the throughput

by 1.5× compared with hierarchical locks, as well as decrease the memory footprint by 40.8–92.9% across all existing versions.

Metis is an in-memory map-reduce framework, representing a page-fault-intensive workload

that stresses a single lock in the kernel: the reader side of the mmap_sem. Figure 5.5 (c)

shows that both Cohort and CST locks outperform all the centralized counter-based locks because they localize the contention within a socket, but at the cost of ≈80× extra memory. However, our readers-writer blocking lock is still faster than Stock because the stock version also encodes the sleeping waiters in its count indicator, which results in almost 3.4× more atomic

instructions compared with SHFLLOCK when measured with perf [154]. This workload

also shows the efficiency of our design in the under-subscribed scenario. Compared to rwsem, which has

33% more idle time due to its naive parking strategy, SHFLLOCK’s readers do not park themselves. This results in less idle time (1.2%) and higher throughput (2.4×) than the

original rwsem.

Summary. Figure 5.5 shows the impact of scheduling interaction and the overhead of memory allocation with respect to locks in both under- and over-subscribed cases, with varying contention levels. Our holistic design of SHFLLOCKS accommodates NUMA-awareness at high core counts and shows that memory overhead (whether dynamic or static)

heavily influences the scalability of applications. SHFLLOCKS reduce the memory footprint overhead by up to 98.8% when compared with the hierarchical locks and by 35.4% when compared with CNA and Stock. The reduction stems from blocking locks,

as SHFLLOCKS are 12/20 bytes in comparison to 40 bytes for CNA and Stock and ≈1.5KB for hierarchical locks.

2 CNA only modifies the spinlock, but does not affect the size of blocking locks.
3 CNA only modifies the spinlock, which does not affect the size of the blocking locks (same as the Stock version).

5.4.3 Performance Breakdown

We now do an in-depth analysis of SHFLLOCKS using a hash-table benchmark in the

kernel [148]. A global lock guards the hash table. For SHFLLOCKNB and SHFLLOCKB, we

use 1% writes, and for SHFLLOCKRW (readers-writer blocking lock), we generate 1% and 50% writes on the hash table. Figure 5.6 shows the results as well as the factor analysis of

SHFLLOCKS.

Non-blocking SHFLLOCKNB. Figure 5.6 shows (a) throughput and (b) fairness. We calculate the fairness factor described by Dice et al.[136], in which we sort the number of operations performed by each thread, and divide the sum of the second half of threads’ operations (sorted in increasing order) by the total number of operations. Thus, the resulting fairness factor is a number between 0.5 and 1, with a strictly fair lock yielding a factor of 0.5

and an unfair lock yielding a factor close to 1. We observe that both CNA and SHFLLOCK are

the best performing, while the performance of Cohort locks is affected because of bloating of the critical section in the case of one socket. Although NUMA-aware locks impact the fairness of locks, they still maintain long-term fairness, as the fairness factor is close to 0.5. Figure 5.6 (e) shows the improvement at 192 threads due to the various optimizations in

SHFLLOCKS. Here, Base represents no shuffling, which behaves as the NUMA-oblivious

spinlock. +Shuffler represents a version of SHFLLOCKS where only the very first waiter shuffles, but doesn’t pass the role to other threads. This version improves the throughput by

16% over Base. +Shufflers represents the algorithm we describe in Listing 5.1, in which a shuffler passes the role to any waiter in the local socket. This approach results in almost a

10% improvement over +Shuffler. Finally, the +qlast optimization avoids the unnecessary pointer chasing done by the shuffler to determine where to insert a relocated qnode, by saving the last qnode of the threads with the same socket ID. This optimization improves throughput by 30%.
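As a concrete reference for the fairness factor reported in Figure 5.6 (b) and (d), the following short C program computes it from a set of per-thread operation counts (the counts below are made up for illustration): sort the counts, then divide the sum of the upper half by the total.

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Fairness factor as described by Dice et al.: sum of the top half of the
 * sorted per-thread operation counts divided by the total (0.5 = fair). */
static double fairness_factor(long *ops, int n)
{
    qsort(ops, n, sizeof(ops[0]), cmp);
    long total = 0, top = 0;
    for (int i = 0; i < n; i++) {
        total += ops[i];
        if (i >= n / 2)
            top += ops[i];
    }
    return (double)top / (double)total;
}

int main(void)
{
    long ops[] = {90, 100, 110, 100, 400, 420, 380, 400};  /* illustrative counts */
    printf("%.2f\n", fairness_factor(ops, 8));             /* prints 0.80 */
    return 0;
}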

Figure 5.6: Impact on throughput and long-term fairness of non-blocking and blocking locks on the hash-table benchmark: (a) throughput and (b) fairness for the non-blocking locks, and (c) throughput and (d) fairness for the blocking locks. For blocking locks, we over-subscribe the system by 4×. We also include (e) the factor analysis of the several phases introduced by SHFLLOCKNB and (f) the number of wakeups in the critical path for SHFLLOCKB. Finally, (g) and (h) show the impact on throughput with the centralized readers-writer locks, Stock and SHFLLOCKRW, for 1% and 50% writes, respectively, up to 4× over-subscription.

Blocking SHFLLOCKB. Figure 5.6 shows (c) the throughput and (d) the fairness factor

for SHFLLOCKB. We see that SHFLLOCK maintains the best throughput even up to 4× over-subscription because it aggressively steals the lock in the over-subscribed case. Our lock stealing is inherently NUMA-aware because most of the remote waiters join the queue; meanwhile, the local active waiters only steal the lock if the very next waiter (the shuffler) is busy waking up its successor. We further confirm this result by only allowing stealing from the local NUMA socket, which yields the same throughput, as shown by SHFLLOCK (NUMA) in the figure. Meanwhile, even in the under-subscribed scenario, we observe that the fairness factor reaches up to 0.6 because of lock stealing, but waiters are not starved (d). Besides, the shuffler proactively wakes up threads that will acquire the lock soon, which completely removes the wake-up overhead from the critical path, even in the over-subscribed scenario (refer to Figure 5.6 (f)).

Blocking SHFLLOCKRW. Figure 5.6 ((g) and (h)) shows that SHFLLOCKRW has higher

throughput than the stock version by 8.1× and 3.7× for 1% and 50% writes, respectively. This happens because the stock version is very inefficient, as most of the threads are blocked,

resulting in the CPU idling 99% of the time. Meanwhile, SHFLLOCKRW maintains consistent performance regardless of the contention on the lock, by further batching readers together at higher thread counts to maintain good throughput. One point to note is that in the case of

over-subscription, SHFLLOCKRW aggressively batches readers and writers, which slightly improves the throughput.

5.4.4 Performance With Userspace SHFLLOCK

We now evaluate SHFLLOCKS on three benchmarks: LevelDB for high contention, Streamcluster for the trylock interface, and Dedup for memory allocation [155]. We integrate

both SHFLLOCK and CNA into LiTL [156] for evaluation.4 We use a set of blocking and non-blocking locks that have the best performance for the selected workloads (refer to Table 5.2).

4 Similar to pthread, we use the futex() system call to implement SHFLLOCKB. The waiter spins for a constant duration and then parks itself.
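For reference, a userspace blocking waiter of this kind can park and be woken with the futex() system call roughly as sketched below; the wrapper, the SPIN_BUDGET constant, and the function names are ours and only illustrate the spin-then-park shape, not the actual LiTL integration.

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>
#include <stdint.h>

#define SPIN_BUDGET 1024                 /* arbitrary spinning budget for the sketch */

static long futex(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Waiter side: spin briefly on *state, then sleep in the kernel until a
 * waker stores `ready` and issues a FUTEX_WAKE. */
static void wait_until_ready(_Atomic uint32_t *state, uint32_t ready)
{
    for (int i = 0; i < SPIN_BUDGET; i++)
        if (atomic_load_explicit(state, memory_order_acquire) == ready)
            return;

    for (;;) {
        uint32_t cur = atomic_load_explicit(state, memory_order_acquire);
        if (cur == ready)
            return;
        futex(state, FUTEX_WAIT_PRIVATE, cur);   /* sleeps only if *state == cur */
    }
}

/* Waker side: publish the new state, then wake one parked waiter. */
static void make_ready(_Atomic uint32_t *state, uint32_t ready)
{
    atomic_store_explicit(state, ready, memory_order_release);
    futex(state, FUTEX_WAKE_PRIVATE, 1);
}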

Figure 5.7: Throughput of the LevelDB benchmark with (a) non-blocking locks (SHFLLOCKNB, HMCS, MCS, CNA, pthread) and (b) blocking locks (Mutexee, Malthusian, MCSTP, SHFLLOCKB), and (c) execution time of the Streamcluster benchmark (trylock), with various blocking and non-blocking locks. We further over-subscribe the cores for LevelDB (b) to test the impact of blocking locks with 4× the number of cores.

LevelDB is an open-source key-value store [157]. We use the readrandom benchmark that contends on the global database lock. Figure 5.7 (a) and (b) show the throughput with non-blocking and blocking locks, respectively, with up to 4× over-subscription for the blocking

ones after running for 60 seconds. We keep pthread as a reference for the comparison. We

find that SHFLLOCK is almost as fast as the existing NUMA-aware locks with increasing

core count and is 2.4× faster than MCS locks with 192 threads. We also observe that

pthread only scales up to eight threads because it starts parking threads. The throughput of blocking locks is better than non-blocking ones because fewer threads are contending

on locks. SHFLLOCKB outperforms others by 1.7–3.8× at 192 threads. Moreover, we see

that SHFLLOCK maintains almost the same throughput even at 768 threads, and achieves

1.6–12.5× higher throughput. This happens for two reasons: efficient waking up of waiters and aggressive lock stealing, as there are still active waiters that acquire the lock.

Streamcluster is a data mining workload [137], which uses a custom barrier implementation to synchronize threads between the different phases of the application. The barrier implementation uses a mix of trylock and lock operations, as well as condition variables, which amount to almost 30% of the execution time [155]. Figure 5.7 (c) shows the execution time of Streamcluster. Guerraoui et al. pointed out that the contention-hardened trylock

interface results in better execution of this workload, which we observe for HMCS as well

as for MCSTP (slightly better than MCS and CNA). However, we find that SHFLLOCK has

almost the same execution time as HMCS and is 1.3–4.4× faster than other locks. This happens because of our main design choice: decoupling the lock state from the waiting

queue. Even though CNA is NUMA-aware, its performance is similar to MCS because the lock state and the queue tail are coupled. On further analysis, we find that queue-based

locks, such as HMCS, CNA, and MCS, spend 4× more time (in failed and successful trylock

attempts) and issue 15× more trylock operations than SHFLLOCK, which improves SHFLLOCK's

throughput over MCS and CNA. Although HMCS spends extra time in the trylock operation,

it spends 4× less time in the lock operation than SHFLLOCK because waiters statically partition the list, which results in the most efficient NUMA-aware lock. In summary, tail and state decoupling provides a window of opportunity that allows the trylock operation to succeed.
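The point about trylock can be made concrete with the following C sketch (our simplified types, not the LiTL or kernel code): with a decoupled state, trylock is one CAS on the lock byte and can succeed even while waiters are queued, whereas an MCS-style trylock must CAS an empty tail and therefore fails whenever the queue is non-empty.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct qnode { struct qnode *_Atomic next; };

struct shfl_lock {                        /* decoupled: state separate from the tail */
    _Atomic uint8_t locked;               /* first byte of glock                     */
    _Atomic uint8_t no_stealing;          /* second byte (stealing policy elided)    */
    struct qnode *_Atomic tail;
};

struct mcs_lock { struct qnode *_Atomic tail; };   /* state and tail are the same word */

static bool shfl_trylock(struct shfl_lock *l)
{
    uint8_t unlocked = 0;                 /* succeeds regardless of queued waiters   */
    return atomic_compare_exchange_strong(&l->locked, &unlocked, 1);
}

static bool mcs_trylock(struct mcs_lock *l, struct qnode *me)
{
    struct qnode *empty = NULL;           /* fails if anyone is queued               */
    atomic_store(&me->next, NULL);
    return atomic_compare_exchange_strong(&l->tail, &empty, me);
}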

Figure 5.8: Impact of locks and their memory-allocation overhead on the scalability of Dedup: (a) throughput (jobs/hour) for pthread, HMCS, MCS, CNA, SHFLLOCKB, and SHFLLOCKNB, and (b) the memory-allocation ratio of the locks. We report the overall memory allocated for locks during the entire run, with respect to pthread.

Dedup represents an enterprise storage workload [137], which allocates up to 266K locks throughout its lifetime and heavily stresses the memory allocator, as well as creates almost

3× the number of threads for several application phases. Figure 5.8 shows (a) the number of jobs per hour and (b) the ratio of the overall memory allocated during the application’s

lifetime with respect to pthread. We observe that the benchmark is not scalable after 48 cores (2 sockets) because of huge over-subscription and memory allocation. Both versions of

SHFLLOCKS have the same scalability as that of a lightweight lock such as pthread, and the blocking version is even 5% faster at 48 cores by avoiding the lock-waiter preemption.

SHFLLOCK adds no memory overhead over pthread, but other queue-based locks add the overhead of per-thread queue nodes allocated on the heap. While these locks could theoretically allocate queue nodes on the stack, it would require application-wide changes to

the Dedup code and pthread API; SHFLLOCK’s queue node design is easier to deploy. In addition, the hierarchical locks also allocate per-socket structures. This leads to more than 90% of the time being spent in memory allocations. For instance, the ratio of extra memory allocation is 58–87× higher for existing queue-based locks.

5.5 Chapter Summary

Locks are still the preferred style of synchronization. However, a considerable discrepancy exists between lock design and practice. We classify such issues into four dominating factors that impact the performance and scalability of lock algorithms and find that none of the existing locks meets all the required criteria. To that end, we propose a new technique, called shuffling, that enables the decoupling of lock design from policy enforcement, such as NUMA-awareness or parking/wakeup strategies. Moreover, these policies are enforced entirely off the critical

path by the waiters. We then propose a family of locking protocols, called SHFLLOCKS, that respect all of these factors, and show that we can indeed achieve performance without additional memory overheads.

CHAPTER 6
RELATED WORK

6.1 Ordering in Concurrency Control Algorithms

We first discuss the importance of timestamping with respect to prior research and briefly peruse invariant clocks provided by today’s hardware.

Logical clocks. Logical timestamping, or a software clock, is a prime primitive for most concurrency mechanisms in the domain of concurrent programming [39, 33, 47, 50, 158, 159, 42, 110] or database transactions [5, 34, 3, 101, 100]. To achieve ordering, these

applications rely on atomic instructions, such as fetch_and_add [47, 33, 160, 110] or

compare_and_swap [1]. For example, software transactional memory (STM) [96] is an active area of research in which almost every new design [110, 160] is dependent on time-based approaches [1, 2] that use global clocks for simplifying the validation step while committing a transaction. As mentioned before, a global clock incurs cache-line contention; thus, prior studies try to mitigate this contention by applying various forms of heuristics [50, 158], slowly increasing timers [161], or access to a synchronized hardware clock [42]. Unfortunately, these approaches introduce false aborts (heuristics) or are limited to specific

hardware (synchronized clocks). The use of Ordo is inspired by prior studies that either used synchronized clocks [161] or relied on a specific property to achieve causal ordering

on a specific architecture, such as Intel x86 [42]. However, Ordo assumes access only to invariant timestamp counters with an uncertainty window, and we expose a set of

methods for the user to handle that uncertainty regardless of the architecture. Similar to STM,

RLU [33] is a lightweight synchronization mechanism that is an alternative to RCU [162]. It also employs a global clock that serializes writers and readers, and uses a defer-based

approach to mitigate contention. We show that even with its defer-based approach, RLU

suffers from contention on the software clock, which we address with the Ordo primitive.

Databases rely on timestamping for concurrency control [101, 5, 3, 4, 163] to ensure a serializable execution schedule. Among concurrency control mechanisms, the optimistic approach is popular in practice [5, 101]. Yu et al. [100] evaluated the performance of various major concurrency control mechanisms and emphasized that timestamp-based

concurrency control mechanisms, such as optimistic concurrency control (OCC) and multi-version concurrency control (MVCC), suffer from timestamp allocation with

increasing core count. Thus, various OCC-based systems have explored alternatives. For

example, Silo [5] adopted coarse-grained timestamping and a lightweight OCC for transaction

coordination within an epoch. The lightweight OCC is a conservative form of the original

OCC protocol. Even though it achieves scalability in general, it does not scale in highly contended workloads [164] due to its limited concurrency control. On the other hand, MVCC protocols still rely on logical clocks for timestamp allocation and suffer scalability collapse even in a read-only workload [100, 34].

Physical timestamping. With the pervasiveness of multicore machines, physical

timestamp-based algorithms are gaining more traction. For example, Oplog, an update-heavy data structure library, removes contention from a global data structure by maintaining a per-core log that appends operations in temporal order [6]. Similarly, Dodds et al. [43]

proposed a concurrent stack that scales efficiently with the RDTSC counter. Likewise, quiescence mechanisms have shown that clock-based reclamation [40] eschews the overhead of the epoch-based reclamation scheme. In addition, Recbulc [165] used core-local hardware timestamps to reduce the recording overhead of synchronization events and determine a global ordering among them. They used a statistical approach to determine the skew between clocks, which is around 10 cycles for their machine. Another direction on which some prior works have focused is expediting record/replay [166], under the assumption that clocks are already synchronized. Unfortunately, besides Recbulc, the primary concern with these algorithms is their assumption of access to synchronized clocks, which is not

true with any available hardware. On the contrary, all clocks have constant skew and are

monotonically increasing at a constant rate, which our Ordo primitive leverages to provide a globally synchronized clock on modern commodity hardware. Thus, with the help of our

API, Ordo acts as a drop-in replacement for these algorithms. The only tweak that these algorithms require is to carefully handle the uncertainty that is present because of these invariant clocks. The idea proposed by Recbulc is not applicable in our scenario, as it still provides only a statistical guarantee, which we want to avoid when designing concurrent algorithms that require linearizability or serializability. Moreover, our approach ensures the correct use of these clocks with the help of one-way latency.

6.2 Double Scheduling in VMs

The double scheduling phenomenon is a recurring problem in the domain of virtualization, which seriously impacts the performance of a VM. There have been comprehensive research efforts to mitigate this problem.

Synchronization primitives in VMs. Uhlig et al. [84] demonstrated the spinlock synchronization issue in a virtualized environment, which they addressed with synchronous hints to the hypervisor; these hints were later replaced by para-virtual spinlock hooks [83] that notify

the hypervisor to block the vCPU after it has exhausted its busy wait threshold. Meanwhile,

other problems such as LWP [131], the BWW problem [120, 16], and RCU readers preemption

problems were found. Gleaner [120], which addressed the BWW problem, implemented a user-space

solution to handle tasks among a varying number of vCPUs by manipulating tasks' processor affinity in the user space, which is difficult to maintain at runtime, as it must accurately track each task launch and deletion. From the hardware perspective, processor manufacturers added an execution control, Pause Loop Exiting (PLE) [167], to the VMCS structure that notifies the hypervisor of a spinning waiter via a VM exit. PLE partially solves the LHP problem but can also result in false positives. Ahn et al. [168] proposed a solution on the basis of a smaller time slice to resolve

both interrupt handling and LHP/LWP problems. They proposed an LLC-based architectural solution to reduce the resulting overhead. However, this approach would result in a huge overhead for VMs with a high core count, and the degradation might persist.

Taebe et al. [86] addressed the LHP/LWP issue by exposing the scheduling time window from the hypervisor to the guest OS, which leverages this information to let a waiter

either spin or join the waiting queue. However, their solution is not applicable to the CFS [82] scheduler of Linux, as CFS does not expose scheduling-window information. Their solution is orthogonal to our approach, as we want the hypervisor, rather than the VM, to make the decision.

Waiman Long [87] designed and implemented qspinlock that inherently overcomes the

problem of LWP by exploiting the property of the TAS lock in the queue-based lock. It works by allowing the other waiters to steal the lock before joining the queue without disrupting the waiters’ queue. However, qspinlock is still prone to LHP. Meanwhile, by annotating various

locks as eCS, we confirm these problems, and further identify new sets of problems such as

RP and ICP, and provide a simple solution to address the double scheduling phenomenon.

Partial handling of scheduling overhead in VMs. There have been several studies on virtualization overhead because of software-hardware redirection [17, 16] and co-scheduling issues [116, 117, 118, 169]. For example, VMware relies on relaxed co-scheduling [116]

to mitigate the double scheduling problem, in which vCPUs are scheduled in batches and the stragglers are synchronized within a predefined threshold. Besides this, other works have

proposed balanced vCPU scheduling [117] or even IPI-based demand scheduling [118]. However, these co-scheduling approaches suffer from CPU fragmentation. On the contrary, our approach neither introduces any CPU fragmentation nor does it need to synchronize the global

scheduling information for all the vCPUs of a VM, because each vCPU is locally penalized by

the hypervisor rather than synchronizing them among other vCPUs.

Song et al. [115] proposed dynamically adjusting the number of vCPUs according to the available CPU resources, while allowing the guest OS to schedule its tasks. They used vCPU ballooning, which avoids the problem of double scheduling, and their work was later extended by Cheng et al. [121] with a lightweight vCPU hotplug mechanism. Although their approach is effective for small VMs and is complementary to ours, it may not scale for large SMP VMs because of the overhead of migrating tasks from one vCPU to another as well as the frequent rescheduling of the targeted vCPUs. eCS, on the other hand, does not incur any explicit IPIs or migration-specific work, as it only adds the overhead of a few simple memory operations for a scheduling decision.

6.3 Locking Primitives

We classify prior research directions into three categories: NUMA-aware locks, runtime contention management, and timeout-based locks.

NUMA-aware locks. NUMA-aware locks address the limitation of NUMA-oblivious locks [59] by amortizing the cost of accessing remote memory. Most of these locks are hierarchical: they maintain multiple levels of locks [63, 64, 60, 156, 61] in the form of a tree. Inspired by prior hierarchical locks [60, 61], Cohort locks [63, 64] generalized the design by composing any two lock types in a hierarchical fashion for two-level NUMA machines, and were later extended to readers-writer locks [68]. However, they neither address the memory-utilization issue nor support blocking synchronization, which leads to sub-optimal performance when multiple lock instances are used or when the system is overloaded. Besides Cohort locks, another category of locking mechanism is based on combining [170, 171] and remote core execution [172], in which a single thread executes several critical sections without any synchronization among them. Although this approach outperforms Cohort locks [172], it requires application modification, which is not practical for applications with a large code base.
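
As a rough illustration of the cohort idea, the following sketch composes a per-socket lock with a thread-oblivious global lock and hands the global lock to a same-socket waiter for a bounded number of releases. It is a simplified user-space sketch under our own names (cohort_lock, COHORT_BOUND), not the Cohort lock implementation of [63, 64].

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

#define MAX_SOCKETS  8
#define COHORT_BOUND 64      /* max consecutive same-socket hand-offs */

struct cohort_lock {
    atomic_bool global;      /* top-level, thread-oblivious TAS lock  */
    struct {
        pthread_mutex_t lock;     /* per-socket lock                     */
        atomic_int waiters;       /* threads queued on this socket       */
        bool global_passed;       /* global lock left to this socket?    */
        int batch;                /* hand-offs since last global release */
    } local[MAX_SOCKETS];
};

void cohort_acquire(struct cohort_lock *l, int socket)
{
    atomic_fetch_add(&l->local[socket].waiters, 1);
    pthread_mutex_lock(&l->local[socket].lock);   /* contend only locally */
    atomic_fetch_sub(&l->local[socket].waiters, 1);

    if (l->local[socket].global_passed) {         /* predecessor kept the */
        l->local[socket].global_passed = false;   /* global lock for us   */
        return;
    }
    bool expected = false;                        /* otherwise take the   */
    while (!atomic_compare_exchange_weak(&l->global, &expected, true)) {
        expected = false;
        sched_yield();
    }
}

void cohort_release(struct cohort_lock *l, int socket)
{
    /* hand over within the socket if a local waiter exists and the
     * fairness bound has not been reached */
    if (atomic_load(&l->local[socket].waiters) > 0 &&
        l->local[socket].batch++ < COHORT_BOUND) {
        l->local[socket].global_passed = true;
        pthread_mutex_unlock(&l->local[socket].lock);
        return;
    }
    l->local[socket].batch = 0;
    atomic_store(&l->global, false);              /* release to other sockets */
    pthread_mutex_unlock(&l->local[socket].lock);
}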

Contention management. The interaction between lock contention and thread scheduling determines application scalability, and it is an important criterion when deciding whether to spin or park a thread in an under- or over-subscribed scenario. Johnson et al. [143] addressed this problem by separating contention management from scheduling in the user space. They use admission control to limit the number of spinning threads by running a system-wide daemon that globally measures the load on the system. Similar approaches have been used by runtimes [173] and by task-placement strategies inside the kernel that do not consider the lock subsystem [174]. Along these lines, the Malthusian lock [144], a NUMA-oblivious lock, handles thread over-subscription by randomly moving a waiter from an active list to a passive list (concurrency culling), an approach inspired by Johnson et al. [143].

Timeout-based locks. Locks with timeout capability allow a thread to abandon its attempt to acquire a lock, which helps tolerate thread preemption, abort transactions in databases, or meet deadlines in real-time systems. Scott et al. implemented timeout-based locks [175, 142] that either modify the queue and status maintained by the lock or explicitly allocate memory for each lock acquisition. These locks are inefficient in terms of space complexity and do not address the cache-line bouncing problem on NUMA machines. Moreover, memory management becomes a critical bottleneck for these locks with increasing core count. Cohort locks [63] also include two timeout-capable locks, but they implement a variant of the CLH lock [142], which still suffers from explicit memory management.

CHAPTER 7 REFLECTIONS

This chapter first discusses the limitations of the proposed approaches and then outlines some of the future work that this thesis opens up.

7.1 Limitations

We discuss the limitations of our approaches with respect to hardware timestamping, task scheduling, and dynamic lock algorithms.

7.1.1 Ordering in Concurrency with Ordo

The most important issue with these hardware clocks is that instruction ordering varies across architectures. For example, on Intel architectures, we avoid instruction reordering by relying on the RDTSCP instruction followed by an atomic instruction to assign the timestamp; we do a similar operation for the ARM architecture as well. Another important problem with the timestamp counters is that they overflow after crossing the 64-bit boundary in the case of Intel, AMD, and Sparc, and the 55-bit boundary for ARM. Even though it will take years for the counters to overflow, algorithms should handle this scenario like existing approaches [47]. Additionally, we assume that existing hardware clocks have constant skew and do not suffer from clock drift. This issue has been known to occur on various older machines [176], but we have not seen it on any of our machines. Moreover, in a private communication, an Intel developer stated that current generations of Intel processors do not suffer from clock drift, as the hardware tries to synchronize the clocks, and Oplog [6] empirically found the clocks to be synchronized. Since hardware vendors do not provide any such guarantee, our calculation of the ORDO_BOUNDARY relaxes the notion of synchronized clocks and only assumes that they have constant skew.

In addition, we believe that our algorithm (Figure 3.1) is applicable to machines with asymmetric architectures, such as AMD Bulldozer [177]. However, we could not evaluate the scalability of the algorithms and the range of the ORDO_BOUNDARY because of the unavailability of such a machine; we believe the algorithms should still be scalable enough, as they have other bottlenecks besides timestamping. Another important issue is further decreasing the uncertainty period. We could decrease this window by comparing time on the basis of thread ID, by exposing a pairwise clock table to algorithms. However, we avoided such a method for three reasons. First, maintaining a pairwise clock table incurs memory overhead, and the application has to load the whole table into memory for reading. Second, application developers may have to resort to thread pinning so that threads do not move while comparing or doing an operation, as thread migration can break the assumptions of the algorithms. Finally, the value of the ORDO_BOUNDARY is large enough to work on multisocket machines.

Moreover, the ORDO_BOUNDARY overcomes the problem of thread migration: the cost of thread migration varies from 1–200 µs, so the time obtained after a migration is already greater than the ORDO_BOUNDARY, and the correctness of existing algorithms is not affected (refer to Table 3.1). However, Ordo is not a panacea for the timestamping issue. For instance, the timestamped stack [43] is oblivious to weak forms of clock synchronization that have stutter.
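
The following sketch illustrates the two operations discussed above on x86-64: reading an invariant timestamp without reordering, and comparing two timestamps under an uncertainty window. The helper names and the use of LFENCE in place of an atomic store are our own simplifications, not the exact Ordo interface.

/* Illustrative sketch (x86-64, GCC/Clang). ordo_boundary is assumed to be
 * measured offline for the machine, as described in Chapter 3. */
#include <stdint.h>
#include <x86intrin.h>

static uint64_t ordo_boundary;         /* cycles of pairwise uncertainty */

static inline uint64_t read_timestamp(void)
{
    unsigned int aux;
    uint64_t tsc = __rdtscp(&aux);     /* RDTSCP waits for prior instructions */
    _mm_lfence();                      /* keep later loads from moving above  */
    return tsc;
}

/* Returns +1 if t1 is definitely after t2, -1 if definitely before,
 * and 0 if the two timestamps fall within the uncertainty window. */
static inline int cmp_timestamp(uint64_t t1, uint64_t t2)
{
    if (t1 > t2 + ordo_boundary)
        return 1;
    if (t2 > t1 + ordo_boundary)
        return -1;
    return 0;
}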

7.1.2 Enlightened Critical Sections

Our eCS approach addresses the problem of preemptions and BWW in both under- and over-committed scenarios by annotating all synchronization primitives and mechanisms in the kernel space. However, besides these primitives, kernel developers have to manually annotate a critical section if they want to avoid preemption when introducing their own primitives. One alternative is for the hypervisor to read the instruction pointer (IP) to figure out whether a vCPU is inside an eCS, but the guest OS must then provide its symbol table to resolve the IP. In addition, the current design of eCS only targets the kernel space of a guest OS and is still agnostic of user-space critical sections such as pthread locks. Hence, we would like to extend our approach to user-space critical sections to further avoid the preemption problem, as we believe that eCS is a natural fit for multi-level scheduling. However, we need to communicate the scheduling hint down to the lowest layer effectively, which requires designing eCS composability extensions.

Our annotation approach does not open any security vulnerability, because it is based on a para-virtualized VM and is similar to other approaches that share information with the hypervisor [91, 123]. By using our virtualized scheduling-aware spinning approach (eSCHDSPIN), we partially mitigate the BWW problem. However, our Hypervisor → VM methods expose scheduling information about the pCPU, but they only tell whether a pCPU is overloaded or a vCPU is preempted. In addition, a VM cannot misuse this information, as it will later be penalized by the hypervisor. There is also a slight possibility of priority inversion with our extra-schedule approach. However, the window of that hypervisor-granted extra schedule is too small to cause priority inversion or hurt performance, unlike co-scheduling approaches [117, 116], in which the scheduling window is on the order of several milliseconds.
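
To make the annotation and the Hypervisor → VM hints concrete, here is a minimal sketch of the shared state an eCS-style scheme could use. The structure and field names (ecs_count, vcpu_preempted) are illustrative placeholders for a per-vCPU page shared between the guest and the hypervisor, not the exact eCS implementation.

#include <stdatomic.h>

struct vcpu_shared_state {
    atomic_int ecs_count;       /* > 0: this vCPU is inside an eCS          */
    atomic_int vcpu_preempted;  /* set by the hypervisor (Hypervisor -> VM) */
};

/* In a real system this would be a per-vCPU page shared with the hypervisor;
 * a single static instance stands in for it here. */
static struct vcpu_shared_state shared_state;

static struct vcpu_shared_state *this_vcpu_state(void)
{
    return &shared_state;
}

/* Guest side: bracket a critical section (spinlock, RCU reader, IRQ context). */
static void ecs_enter(void) { atomic_fetch_add(&this_vcpu_state()->ecs_count, 1); }
static void ecs_exit(void)  { atomic_fetch_sub(&this_vcpu_state()->ecs_count, 1); }

/* Hypervisor side: at the scheduling boundary, grant one extra time slice if
 * the vCPU is inside an eCS (and penalize that vCPU at the next boundary). */
static int should_grant_extra_slice(struct vcpu_shared_state *s)
{
    return atomic_load(&s->ecs_count) > 0;
}

/* Guest-side waiter (eSCHDSPIN flavor): spin only while the holder's vCPU is
 * running; otherwise yield instead of burning the remaining time slice. */
static int worth_spinning(struct vcpu_shared_state *holder)
{
    return !atomic_load(&holder->vcpu_preempted);
}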

7.1.3 Shuffling-based Lock Algorithms

Shuffling enables the design of dynamic lock algorithms that can incorporate policies on the fly. This approach departs from conventional static approaches, such as hierarchical locks. However, the dynamic policy can be ineffective if the shuffler is faster than the waiters joining the queue. For example, in the case of NUMA-aware locks, shuffling is ineffective if waiters join the queue after the shuffling phase has completed. In this scenario, SHFLLOCKS behave as a NUMA-oblivious algorithm. We can partially mitigate this issue at the expense of shuffling more often, i.e., a shuffler only stops the shuffling process upon finding a node that adheres to the specified policy.
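
As an illustration of a single NUMA-aware shuffling pass, the following sketch reorders a waiter queue so that waiters from the shuffler's socket are grouped ahead of remote ones. It is a sequential simplification under our own node layout; the real SHFLLOCK shuffler must also synchronize with concurrent enqueues at the tail.

#include <stddef.h>

struct shfl_node {
    struct shfl_node *next;
    int socket;                  /* NUMA node of the waiter          */
    int shuffled;                /* has the policy been applied yet? */
};

/* The current queue head (the shuffler) walks its successors and moves
 * same-socket waiters right behind the local group, so the lock hops
 * across sockets less often. */
static void shuffle_waiters(struct shfl_node *head)
{
    struct shfl_node *prev = head, *cur = head->next;
    struct shfl_node *last_local = head;   /* end of the local-socket group */

    while (cur) {
        struct shfl_node *next = cur->next;
        if (cur->socket == head->socket && !cur->shuffled) {
            cur->shuffled = 1;
            if (prev != last_local) {
                /* unlink cur and splice it right after the local group */
                prev->next = next;
                cur->next = last_local->next;
                last_local->next = cur;
            } else {
                prev = cur;                /* already in position: extend group */
            }
            last_local = cur;
        } else {
            prev = cur;
        }
        cur = next;
    }
}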

7.2 Future Work

We now outline a set of research directions that this dissertation opens up.

Ordering: More hardware support. Ordo enables applications or systems software to correctly use invariant hardware timestamps. However, the uncertainty window can become significant for large multicore machines that span thousands of cores. Moreover, Ordo's uncertainty only dominates after crossing the socket boundary. Hence, it is possible to further reduce the uncertainty window if processor vendors provide some bound on the cost of a cache-line access, which would allow us to use existing clock synchronization protocols with confidence, thereby decreasing the uncertainty window and overcoming aborts that occur in concurrency algorithms.

Ordering: Algorithmic use cases. Ordo is beneficial to various applications, such as databases [178] and logging mechanisms in file systems [38]. For instance, Ordo has already proven useful in designing multi-version support for RLU [179] and its durable version [180].

Ordering: Rack-scale algorithms. Another direction is designing algorithms for upcoming rack-scale architectures. In this scenario, we can devise a two-level timestamping primitive: algorithms use Ordo inside a machine to be multicore friendly, while relying on existing synchronization protocols across machines for consistency guarantees.

Scheduling: Efficient n-level scheduling. Cloud providers allow users to execute serverless functions [181, 182, 183] that may run their own runtimes (e.g., Go, Haskell, Erlang) inside VMs. This type of execution introduces multiple layers of schedulers stacked on top of each other. Thus, using eCS along with scheduler activations [184] can efficiently bridge the semantic gap across schedulers.

Locks: Software-defined synchronization. Our shuffling mechanism opens new opportunities to implement different policies based on hardware behavior or the requirements of the application. Until now, every lock has been statically compiled and used for the entire application life cycle, which is common in the case of the OS. Thanks to the shuffling mechanism, which decouples policy from design, we can dynamically inject various policies to change the application's behavior during its life cycle. For example, we can prioritize 1) stragglers in parallel algorithms, 2) a set of system calls to improve application performance, and 3) background tasks [185]. Moreover, we can modify the locking policy based on hardware characteristics, such as non-inclusive caches [135] or a multi-level NUMA hierarchy [186].

Locks: Verifying and synthesizing locks. Shuffling opens up the opportunity to verify complex lock algorithms. Because of its decoupling approach, we can leverage existing work [187] to verify the shuffling mechanism separately from the already verified MCS lock algorithm on which it builds. In addition, we can synthesize or even sketch various types of locking protocols, as specified by the lock developer.

CHAPTER 8 CONCLUSION

Synchronization primitives are fundamental to the design of applications. Today's applications heavily rely on these primitives not only to improve their performance but also for correctness. Moreover, these primitives have been evolving over the past two decades, not only to adapt to hardware changes for better performance but also to cater to software requirements. Unfortunately, most of the proposed primitives either fail to scale efficiently when the hardware changes or require a complete software redesign. This dissertation focuses on the scalability bottlenecks of synchronization primitives on large multicore machines. We propose several new practical synchronization primitives that utilize either the hardware or the domain knowledge of the application to improve application performance.

First, we propose a new ordering primitive, called Ordo, that overcomes the scalability bottleneck of costly atomic instructions, which do not scale on large multicore and multisocket machines. Ordo provides the illusion of a globally synchronized hardware clock, with some uncertainty, within a machine. It leverages the hardware-provided invariant per-core clocks that increase at a constant frequency. This thesis then proposes a set of simple methods that timestamp-based concurrency control algorithms can directly use. We show their effectiveness by applying them to five algorithms that use either physical or logical timestamps. With Ordo, we remove the overhead of atomic instructions and scale these algorithms across several architectures by up to an order of magnitude. Ordo is a fundamental ordering primitive that has a fixed cost regardless of the number of cores in a machine.

Second, this thesis evaluates the various preemption problems that occur because of synchronization primitives when multiple software layers are stacked together. Upon further evaluation, we find that these preemption problems are merely symptoms of a more significant problem, the double scheduling problem, in which the guest OS schedules processes on virtual CPUs and the hypervisor schedules virtual CPUs on physical CPUs. This thesis presents a single-shot solution that addresses all preemption problems arising from synchronization primitives, as well as from other OS services such as interrupt context processing. We use one key insight: if a certain key component of a guest OS is allowed to proceed further, the guest OS will make forward progress. We identify these critical components as synchronization mechanisms, such as spinlocks, mutexes, rwsems, RCU, and even the interrupt context. We annotate these sections with our lightweight methods and call them enlightened critical sections (eCS). Our methods expose whether a VM is executing a critical section, which the hypervisor uses to grant an extra schedule at the scheduling boundary, allowing the guest OS to make forward progress.

Finally, we focus on the design of lock algorithms: the most preferred style of synchronization, which almost every application uses. We study the discrepancy that exists between practice and design, which we classify into four factors that impact the performance and scalability of a lock algorithm; none of the existing lock algorithms meets all four factors. Hence, this thesis proposes a new technique, called shuffling, which decouples lock design from policy enforcement. Such policies include NUMA-awareness, parking/wakeup strategies, and reader or writer preference in readers-writer lock design. We propose a family of locking algorithms, called SHFLLOCKS, in which waiters enforce these policies, mostly off the critical path. SHFLLOCKS are the first family of lock algorithms that respect all four factors, utilize waiters without requiring additional memory, and achieve the best performance.

REFERENCES

[1] D. Dice, O. Shalev, and N. Shavit, “Transactional Locking II,” in Proceedings of the 20th International Conference on Distributed Computing (DISC), Stockholm, Sweden: Springer-Verlag, Sep. 2006, pp. 194–208.

[2] T. Riegel, P. Felber, and C. Fetzer, “A Lazy Snapshot Algorithm with Eager Valida- tion,” in Proceedings of the 20th International Conference on Distributed Computing, ser. DISC’06, Stockholm, Sweden: Springer-Verlag, 2006, pp. 284–298.

[3] H. T. Kung and J. T. Robinson, “On Optimistic Methods for Concurrency Control,” ACM Trans. Database Syst., vol. 6, no. 2, pp. 213–226, Jun. 1981.

[4] P. A. Bernstein and N. Goodman, "Multiversion Concurrency Control: Theory and Algorithms," ACM Trans. Database Syst., vol. 8, no. 4, pp. 465–483, Dec. 1983.

[5] S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden, “Speedy Transactions in Multicore In-memory Databases,” in Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), Farmington, PA, Nov. 2013, pp. 18–32.

[6] S. B. Wickizer, M. F. Kaashoek, R. Morris, and N. Zeldovich, “OpLog: a library for scaling update-heavy data structures,” CSAIL Technical Report, vol. 1, no. 1, pp. 1–12, 2013.

[7] Xeon Processor E7-8890 v4 (60M Cache, 2.20 GHz), http://ark.intel.com/ products/93790/Intel-Xeon-Processor-E7-8890-v4-60M-Cache-2_20- GHz, Intel, 2016.

[8] Data Sheet: SPARC M7-16 Server, http://www.oracle.com/us/products/ servers-storage/sparc-m7-16-ds-2687045.pdf, Oracle, 2015.

[9] Micron, 3D XPoint Technology, 2019.

[10] Anandtech, Intel Launches Optane DIMMs Up To 512GB: Apache Pass Is Here! 2018.

[11] ——, The Intel Second Generation Xeon Scalable: Cascade Lake, Now with Up To 56-Cores and Optane! 2018.

[12] P. Alcorn, Samsung Releases New 12 Gb/s SAS, M.2, AIC And 2.5" NVMe SSDs: 1 Million IOPS, Up To 15.63 TB, http://www.tomsitpro.com/articles/samsung-sm953-pm1725-pm1633-pm1633a,1-2805.html, 2013.

[13] T. P. Morgan, Flashtec NVRAM Does 15 Million IOPS At Sub-Microsecond Latency, http://www.enterprisetech.com/2014/08/06/flashtec- nvram- 15- million-iops-sub-microsecond-latency/, 2014.

[14] B. Tallis, Intel Announces SSD DC P3608 Series, http://www.anandtech.com/ show/9646/intel-announces-ssd-dc-p3608-series, 2015.

[15] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the Art of Virtualization,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), Bolton Landing, NY, Oct. 2003.

[16] X. Song, H. Chen, and B. Zang, “Characterizing the Performance and Scalability of Many-core Applications on Virtualized Platforms,” Fudan University, Tech. Rep., 2010.

[17] K. Adams and O. Agesen, “A Comparison of Software and Hardware Techniques for x86 Virtualization,” in Proceedings of the 12th International Conference on Archi- tectural Support for Programming Languages and Operating Systems, ser. ASPLOS XII, San Jose, California, USA: ACM, 2006, pp. 2–13.

[18] Microsoft, SQL Server 2014, http://www.microsoft.com/en-us/server- cloud/products/sql-server/features.aspx, 2014.

[19] SAP, SAP HANA 2.0 SPS 02, http://hana.sap.com/abouthana.html, 2017.

[20] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster Computing with Working Sets,” in Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud’10, Boston, MA: USENIX Association, 2010, pp. 10–10.

[21] Intel, Intel 64 and IA-32 Architectures Software Developer Manuals, https:// software.intel.com/en-us/articles/intel-sdm, 2016.

[22] Oracle, Oracle SPARC Architecture 2011, http : / / www . oracle . com / technetwork/server-storage/sun-sparc-enterprise/documentation/ 140521-ua2011-d096-p-ext-2306580.pdf, 2016.

[23] AMD, Developer Guides, Manuals & ISA Documents, http://developer.amd. com/resources/developer-guides-manuals/, 2016.

[24] ARM, The armv8-a architecture reference manual, http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html, 2016.

[25] B. C. Serebrin and R. M. Kallal, Synchronization of processor time stamp counters to master counter, https://patentscope.wipo.int/search/en/detail. jsf?docId=US42732522, 2009.

[26] D. Martin G., J. J. Shrall, S. Parthasarathy, and Rajesh, Controlling time stamp counter (TSC) offsets for mulitple cores and threads, https://patentscope. wipo.int/search/en/detail.jsf?docId=US73280125o, 2011.

[27] M. P. Herlihy and J. M. Wing, “Linearizability: A Correctness Condition for Con- current Objects,” ACM Trans. Program. Lang. Syst., vol. 12, no. 3, pp. 463–492, Jul. 1990.

[28] Intel, Intel Xeon Processor Scalable Family Technical Overview, https : / / software.intel.com/content/www/us/en/develop/articles/intel- xeon-processor-scalable-family-technical-overview.html, 2017.

[29] ——, An Introduction to the Intel QuickPath Interconnect, https://www.intel. com / content / www / us / en / io / quickpath - technology / quick - path - interconnect-introduction-paper.html, 2009.

[30] AMD, API NetWorks Accelerates Use of HyperTransport™ Technology With Launch of Industry’s First HyperTransport Technology-to-PCI Bridge Chip, https://web. archive.org/web/20061010070210/http://www.hypertransport.org/ consortium/cons_pressrelease.cfm?RecordID=62, 2001.

[31] Intel, The Common System Interface: Intel’s Future Interconnect, https://www. realworldtech.com/common-system-interface/5/, 2007.

[32] T. David, R. Guerraoui, and V. Trigonakis, “Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask,” in Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), Farmington, PA, Nov. 2013, pp. 33–48.

[33] A. Matveev, N. Shavit, P. Felber, and P. Marlier, “Read-log-update: A Lightweight Synchronization Mechanism for Concurrent Programming,” in Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015, pp. 168–183.

[34] X. Yu, A. Pavlo, D. Sanchez, and S. Devadas, “TicToc: Time Traveling Optimistic Concurrency Control,” in Proceedings of the 2016 ACM SIGMOD/PODS Confer- ence, San Francisco, CA, USA: ACM, Jun. 2016, pp. 1629–1642.

[35] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz, "ARIES: A Transaction Recovery Method Supporting Fine-granularity Locking and Partial Rollbacks Using Write-ahead Logging," ACM Trans. Database Syst., vol. 17, no. 1, pp. 94–162, Mar. 1992.

[36] R. Johnson, I. Pandis, R. Stoica, M. Athanassoulis, and A. Ailamaki, “Aether: A Scalable Approach to Logging,” in Proceedings of the 36th International Conference on Very Large Data Bases (VLDB), Singapore, Sep. 2010, pp. 681–692.

[37] T. Wang and R. Johnson, “Scalable Logging Through Emerging Non-volatile Mem- ory,” in Proceedings of the 40th International Conference on Very Large Data Bases (VLDB), Hanghzou, China: VLDB Endowment, Jun. 2014, pp. 865–876.

[38] C. Lee, D. Sim, J.-Y. Hwang, and S. Cho, “F2FS: A New File System for Flash Storage,” in Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), Santa Clara, California, USA, Feb. 2015, pp. 273–286.

[39] M. Arbel and A. Morrison, “Predicate RCU: An RCU for Scalable Concurrent Updates,” in Proceedings of the 20th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), San Francisco, CA, Feb. 2015, pp. 21–30.

[40] Q. Wang, T. Stamler, and G. Parmer, “Parallel Sections: Scaling System-level Data- structures,” in Proceedings of the 11th European Conference on Computer Systems (EuroSys), London, UK, Apr. 2016, 33:1–33:15.

[41] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s globally- distributed database,” in Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Hollywood, CA, USA, Oct. 2012, pp. 251–264.

[42] W. Ruan, Y. Liu, and M. Spear, “Boosting Timestamp-based Transactional Memory by Exploiting Hardware Cycle Counters,” ACM Transactions on Architecture and Code Optimization, vol. 10, no. 4, 40:1–40:21, Dec. 2013.

[43] M. Dodds, A. Haas, and C. M. Kirsch, “A Scalable, Correct Time-Stamped Stack,” in Proceedings of the 42nd ACM Symposium on Principles of Programming Languages (POPL), Mumbai, India: ACM, Jan. 2015, pp. 233–246.

[44] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” Commun. ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978.

[45] H. Schweizer, M. Besta, and T. Hoefler, "Evaluating the Cost of Atomic Operations on Modern Architectures," in Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), WA, DC, USA: IEEE, Oct. 2015, pp. 445–456.

[46] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich, “An Analysis of Linux Scalability to Many Cores,” in Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, Canada, Oct. 2010, pp. 1–16.

[47] P. Felber, C. Fetzer, and T. Riegel, “Dynamic Performance Tuning of Word-based Software Transactional Memory,” in Proceedings of the 13th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), Salt Lake City, UT, Feb. 2008, pp. 237–246.

[48] A. Pavlo, C. Curino, and S. Zdonik, “Skew-aware Automatic Database Partition- ing in Shared-nothing, Parallel OLTP Systems,” in Proceedings of the 2012 ACM SIGMOD/PODS Conference, Scottsdale, Arizona, USA: ACM, May 2012, pp. 61– 72.

[49] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, “The Multikernel: A New OS Architecture for Scalable Multicore Systems,” in Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, Dec. 2008, pp. 29–44.

[50] H. Avni and N. Shavit, “Maintaining Consistent Transactional States Without a Global Clock,” in Proceedings of the 15th International Colloquium on Structural Information and Communication Complexity, ser. SIROCCO ’08, Berlin, Heidelberg: Springer-Verlag, 2008, pp. 131–140.

[51] Y. Yuan, K. Wang, R. Lee, X. Ding, J. Xing, S. Blanas, and X. Zhang, “BCC: Reducing False Aborts in Optimistic Concurrency Control with Low Cost for In- memory Databases,” Proc. VLDB Endow., vol. 9, no. 6, pp. 504–515, Jan. 2016.

[52] A. Haas, C. Kirsch, M. Lippautz, H. Payer, M. Preishuber, H. Rock, A. Sokolova, T. A. Henzinger, and A. Sezgin, Scal: High-performance multicore-scalable data structures and benchmarks, http://scal.cs.uni-salzburg.at/, 2013.

[53] B. C. Serebrin, Fast, automatically scaled processor time stamp counter, https: //patentscope.wipo.int/search/en/detail.jsf?docId=US42732523, 2009.

[54] K. S. Lee, H. Wang, V. Shrivastav, and H. Weatherspoon, "Globally Synchronized Time via Datacenter Networks," in Proceedings of the 27th ACM SIGCOMM, Florianopolis, Brazil: ACM, Aug. 2016, pp. 454–467.

[55] T. K. Srikanth and S. Toueg, “Optimal Clock Synchronization,” J. ACM, vol. 34, no. 3, pp. 626–645, Jul. 1987.

[56] D. L. Mills, “A Brief History of NTP Time: Memoirs of an Internet Timekeeper,” SIGCOMM Comput. Commun. Rev., vol. 33, no. 2, pp. 9–21, Apr. 2003.

[57] A. Carta, N. Locci, C. Muscas, F. Pinna, and S. Sulis, “GPS and IEEE 1588 Syn- chronization for the Measurement of Synchrophasors in Electric Power Systems,” Comput. Stand. Interfaces, vol. 33, no. 2, pp. 176–181, Feb. 2011.

[58] J. Corbet, The big kernel lock strikes again, https://lwn.net/Articles/ 281938/, 2008.

[59] J. M. Mellor-Crummey and M. L. Scott, “Algorithms for Scalable Synchronization on Shared-memory Multiprocessors,” ACM Trans. Comput. Syst., vol. 9, no. 1, pp. 21–65, Feb. 1991.

[60] V. Luchangco, D. Nussbaum, and N. Shavit, “A Hierarchical CLH Queue Lock,” in Proceedings of the 12th International Conference on Parallel Processing, ser. Euro- Par’06, Dresden, Germany, 2006, pp. 801–810.

[61] D. Dice, V. J. Marathe, and N. Shavit, “Flat-combining NUMA Locks,” in Proceed- ings of the Twenty-third Annual ACM Symposium on Parallelism in Algorithms and Architectures, ser. SPAA ’11, San Jose, California, USA, 2011, pp. 65–74.

[62] Z. Radovic and E. Hagersten, “Hierarchical Backoff Locks for Nonuniform Com- munication Architectures,” in Proceedings of the 9th International Symposium on High-Performance Computer Architecture, ser. HPCA ’03, Washington, DC, USA: IEEE Computer Society, 2003, pp. 241–, ISBN: 0-7695-1871-0.

[63] D. Dice, V. J. Marathe, and N. Shavit, “Lock Cohorting: A General Technique for Designing NUMA Locks,” in Proceedings of the 17th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), New Orleans, LA, Feb. 2012, pp. 247–256.

[64] M. Chabbi, M. Fagan, and J. Mellor-Crummey, “High Performance Locks for Multi- level NUMA Systems,” in Proceedings of the 20th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), San Francisco, CA, Feb. 2015.

[65] M. Chabbi and J. Mellor-Crummey, "Contention-conscious, Locality-preserving Locks," in Proceedings of the 21st ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), Barcelona, Spain, Mar. 2016, 22:1–22:14.

[66] S. Kashyap, C. Min, and T. Kim, “Scalable NUMA-aware Blocking Synchronization Primitives,” in Proceedings of the 2017 USENIX Annual Technical Conference (ATC), Santa Clara, CA, Jul. 2017.

[67] J. M. Mellor-Crummey and M. L. Scott, “Scalable Reader-writer Synchronization for Shared-memory Multiprocessors,” in Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPOPP ’91, Williamsburg, Virginia, USA, 1991, pp. 106–113.

[68] I. Calciu, D. Dice, Y. Lev, V. Luchangco, V. J. Marathe, and N. Shavit, “NUMA- aware Reader-writer Locks,” in Proceedings of the 18th ACM Symposium on Princi- ples and Practice of Parallel Programming (PPOPP), Shenzhen, China, Feb. 2013, pp. 157–166.

[69] R. Liu, H. Zhang, and H. Chen, “Scalable Read-mostly Synchronization Using Passive Reader-writer Locks,” in Proceedings of the 2014 USENIX Annual Technical Conference (ATC), Philadelphia, PA, Jun. 2014, pp. 219–230.

[70] J. Corbet, Big reader locks, https://lwn.net/Articles/378911/, 2010.

[71] O. Nesterov, Linux percpu-rwsem, http://lxr.free-electrons.com/source/ include/linux/percpu-rwsem.h, 2012.

[72] P. Zijlstra, percpu rwsem -v2, https://lwn.net/Articles/648914/, 2010.

[73] W. Long, qspinlock: Introducing a 4-byte queue spinlock, https://lwn.net/ Articles/582897/, 2014.

[74] ——, locking/mutex: Enable optimistic spinning of lock waiter, https://lwn. net/Articles/696952/, 2016.

[75] Y. Liu, aim7 performance regression by commit 5a50508 report from LKP, https: //lkml.org/lkml/2013/1/29/84, 2014.

[76] A. Shi, [PATCH] rwsem: steal writing sem for better performance, https://lkml. org/lkml/2013/2/5/309, 2013.

[77] C. Min, S. Kashyap, S. Maass, W. Kang, and T. Kim, "Understanding Manycore Scalability of File Systems," in Proceedings of the 2016 USENIX Annual Technical Conference (ATC), Denver, CO, Jun. 2016.

125 [78] J. Corbet, Introducing lockrefs, https://lwn.net/Articles/565734/, 2013.

[79] T. Harter, C. Dragga, M. Vaughn, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “A File is Not a File: Understanding the I/O Behavior of Apple Desktop Applica- tions,” in Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, Oct. 2011.

[80] C. Min, W.-H. Kang, T. Kim, S.-W. Lee, and Y. I. Eom, “Lightweight Application- Level Crash Consistency on Transactional Flash Storage,” in Proceedings of the 2015 USENIX Annual Technical Conference (ATC), Santa Clara, CA, Jul. 2015.

[81] T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications,” in Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Broomfield, Colorado, Oct. 2014.

[82] I. Molnar, [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS], https://lwn.net/Articles/230501/, 2007.

[83] T. Friebel, “How to Deal with Lock-Holder Preemption,” Xen Summit, Tech. Rep., 2008.

[84] V. Uhlig, J. LeVasseur, E. Skoglund, and U. Dannowski, “Towards Scalable Mul- tiprocessor Virtual Machines,” in Proceedings of the 3rd Conference on Virtual Machine Research And Technology Symposium - Volume 3, ser. VM, Berkeley, CA, USA: USENIX Association, 2004, pp. 4–4.

[85] S. Kashyap, C. Min, and T. Kim, “Scalability In The Clouds! A Myth Or Reality?” In Proceedings of the 6th Asia-Pacific Workshop on Systems (APSys), Tokyo, Japan, Jul. 2015.

[86] B. Teabe, V. Nitu, A. Tchana, and D. Hagimont, “The Lock Holder and the Lock Waiter Pre-emption Problems: Nip Them in the Bud Using Informed Spinlocks (I- Spinlock),” in Proceedings of the 12th European Conference on Computer Systems (EuroSys), New York, NY, USA: ACM, Apr. 2017, pp. 286–297.

[87] W. Long, qspinlock: a 4-byte queue spinlock with PV support, https://lkml. org/lkml/2015/4/24/631, 2015.

[88] J. Fitzhardinge, Paravirtualized Spinlocks, http : / / lwn . net / Articles / 289039/, 2008.

[89] I. Molnar and D. Bueso, Generic Mutex Subsystem, https://www.kernel.org/ doc/Documentation/locking/mutex-design.txt, 2016.

[90] I. Molnar, Linux rwsem, http://www.makelinux.net/ldd3/chp-5-sect-3, 2006.

[91] W. Long, locking/qspinlock: Enhance pvqspinlock & introduce queued unfair lock, https://lwn.net/Articles/650776/, 2015.

[92] A. Prasad, K Gopinath, and P. E. McKenney, “The RCU-Reader Preemption Problem in VMs,” in Proceedings of the 2017 USENIX Annual Technical Conference (ATC), Santa Clara, CA: USENIX Association, Jul. 2017, pp. 265–270.

[93] A. Gordon, N. Amit, N. Har’El, M. Ben-Yehuda, A. Landau, A. Schuster, and D. Tsafrir, “Eli: Bare-metal performance for i/o virtualization,” in Proceedings of the 17th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ser. ASPLOS XVII, London, UK: ACM, Mar. 2012, pp. 411–422.

[94] C.-C. Tu, M. Ferdman, C.-t. Lee, and T.-c. Chiueh, “A Comprehensive Implementa- tion and Evaluation of Direct Interrupt Delivery,” in Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ser. VEE ’15, Istanbul, Turkey: ACM, 2015, pp. 1–15.

[95] F. Cristian, “Probabilistic clock synchronization,” Distributed Computing, vol. 3, no. 3, pp. 146–158, 1989.

[96] N. Shavit and D. Touitou, “Software Transactional Memory,” in Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, ser. PODC ’95, New York, NY, USA: ACM, 1995, pp. 204–213.

[97] R. Johnson, I. Pandis, N. Hardavellas, A. Ailamaki, and B. Falsafi, “Shore-MT: A Scalable Storage Manager for the Multicore Era,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, ser. EDBT ’09, New York, NY, USA: ACM, 2009, pp. 24–35, ISBN: 978-1-60558-422-5.

[98] H. Kimura, “FOEDUS: OLTP Engine for a Thousand Cores and NVRAM,” in Proceedings of the 2015 ACM SIGMOD/PODS Conference, Melbourne, Victoria, Australia: ACM, May 2015, pp. 691–706.

[99] A. Adya, R. Gruber, B. Liskov, and U. Maheshwari, “Efficient Optimistic Concur- rency Control Using Loosely Synchronized Clocks,” in Proceedings of the 1995 ACM SIGMOD/PODS Conference, San Jose, CA: ACM, May 1995, pp. 23–34.

[100] X. Yu, G. Bezerra, A. Pavlo, S. Devadas, and M. Stonebraker, “Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores,” in

Proceedings of the 40th International Conference on Very Large Data Bases (VLDB), Hanghzou, China: VLDB Endowment, Nov. 2014, pp. 209–220.

[101] P. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling, “High- performance Concurrency Control Mechanisms for Main-memory Databases,” in Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, Turkey: VLDB Endowment, Aug. 2012, pp. 298–309.

[102] R. Zhang, Z. Budimlić, and W. N. Scherer III, "Commit Phase in Timestamp-based Stm," in Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures, ser. SPAA '08, Munich, Germany: ACM, 2008, pp. 326–335.

[103] X. Yu, DBx1000, https://github.com/yxymit/DBx1000, 2016.

[104] C. C. Minh, TL2-X86, https://github.com/ccaominh/tl2-x86, 2015.

[105] B. Lepers, V. Quéma, and A. Fedorova, “Thread and Memory Placement on NUMA Systems: Asymmetry Matters,” in Proceedings of the 2015 USENIX Annual Techni- cal Conference (ATC), Santa Clara, CA, Jul. 2015, pp. 277–289.

[106] D. McCraken, “Object-based Reverse Mapping,” Proceedings of the Linux Sympo- sium, vol. 2, no. 1, pp. 1–6, 2004.

[107] Exim Internet Mailer, http://www.exim.org/, 2015.

[108] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich, “An Analysis of Linux Scalability to Many Cores,” in Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI, Berkeley, CA, USA: USENIX Association, 2010, pp. 1–16.

[109] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, "Benchmarking cloud serving systems with ycsb," in Proceedings of the 1st ACM Symposium on Cloud Computing, Indianapolis, Indiana, USA, Jun. 2010, pp. 143–154.

[110] N. Herman, J. P. Inala, Y. Huang, L. Tsai, E. Kohler, B. Liskov, and L. Shrira, “Type-aware Transactions for Faster Concurrent Code,” in Proceedings of the 11th European Conference on Computer Systems (EuroSys), London, UK, Apr. 2016, 31:1–31:16.

[111] Amazon, Amazon Web Services, https://aws.amazon.com/, 2017.

[112] Google, Compute Engine, https://cloud.google.com/compute/, 2017.

[113] VMware, Best Practices for Oversubscription of CPU, Memory and Storage in vSphere Virtual Environments, https://communities.vmware.com/servlet/JiveServlet/previewBody/21181-102-1-28328/vsphere-oversubscription-best-practices%5b1%5d.pdf, 2017.

[114] Q. Software, Demystifying CPU Ready (%RDY) as a Performance Metric, http: / / www . actualtechmedia . com / wp - content / uploads / 2013 / 11 / demystifying-cpu-ready.pdf, 2017.

[115] X. Song, J. Shi, H. Chen, and B. Zang, “Schedule Processes, Not VCPUs,” in Proceedings of the 4th Asia-Pacific Workshop on Systems, ser. APSys ’13, Singapore, Singapore: ACM, 2013, 1:1–1:7.

[116] VMware, “The CPU Scheduler in VMware ESX 4.1,” VMware, Tech. Rep., 2010.

[117] O. Sukwong and H. S. Kim, “Is Co-scheduling Too Expensive for SMP VMs?” In Proceedings of the 6th European Conference on Computer Systems (EuroSys), New York, NY, USA: ACM, Apr. 2011, pp. 257–272.

[118] H. Kim, S. Kim, J. Jeong, J. Lee, and S. Maeng, “Demand-based Coordinated Scheduling for SMP VMs,” in Proceedings of the Eighteenth International Confer- ence on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS, New York, NY, USA: ACM, 2013, pp. 369–380.

[119] S. Kashyap, C. Min, and T. Kim, “Opportunistic Spinlocks: Achieving Virtual Machine Scalability in the Clouds,” in Proceedings of the ACM SIGOPS Review, vol. 50, Mar. 2016.

[120] X. Ding, P. B. Gibbons, M. A. Kozuch, and J. Shan, “Gleaner: Mitigating the Blocked-waiter Wakeup Problem for Virtualized Multicore Applications,” in Pro- ceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, ser. USENIX ATC’14, Berkeley, CA, USA: USENIX Association, 2014, pp. 73–84.

[121] L. Cheng, J. Rao, and F. C. M. Lau, “vScale: Automatic and Efficient Processor Scaling for SMP Virtual Machines,” in Proceedings of the 11th European Conference on Computer Systems (EuroSys), London, UK: ACM, Apr. 2016, 2:1–2:14.

[122] I. Molnar and D. Bueso, Generic Mutex Subsystem, https://www.kernel.org/ doc/Documentation/locking/mutex-design.txt, 2017.

[123] G. Costa, Steal time for KVM, https://lwn.net/Articles/449657/, 2011.

[124] A. Kivity, Sched: Add notifier for process migration, https : / / lwn . net / Articles/356536/, 2009.

[125] V. Seeker, Process Scheduling in Linux, https://www.prism-services.io/pdf/linux_scheduler_notes_final.pdf, 2013.

[126] S. Kashyap, Vbench, https://github.com/sslab-gatech/vbench, 2015.

[127] T. A. S. Foundation, APACHE HTTP Server Project, https://httpd.apache. org/, 2017.

[128] Y. Mao, R. Morris, and F. M. Kaashoek, “Optimizing MapReduce for Multicore Architectures,” MIT CSAIL, Tech. Rep., 2010.

[129] J. Gilchrist, Parallel BZIP2 (PBZIP2), Data Compression Software, http : / / compression.ca/pbzip2/, 2017.

[130] W. Glozer, wrk - a HTTP benchmarking tool, https://github.com/wg/wrk, 2017.

[131] J. Ouyang and J. R. Lange, “Preemptable Ticket Spinlocks: Improving Consolidated Performance in the Cloud,” in Proceedings of the 9th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, ser. VEE ’13, Houston, Texas, USA: ACM, 2013, pp. 191–200, ISBN: 978-1-4503-1266-0.

[132] D. Matlack, Message Passing Workloads in KVM, http://www.linux- kvm. org / images / a / ac / 02x03 - Davit _ Matalack - KVM _ Message _ passing _ Performance.pdf, 2015.

[133] G. Naptov, KVM: Add asynchronous page fault for PV guest. https://lwn.net/ Articles/359842/, 2009.

[134] D. Chinner, Re: [regression, 3.16-rc] rwsem: optimistic spinning causing perfor- mance degradation, https://lkml.org/lkml/2014/7/3/25, 2014.

[135] D. Mulnix, Intel Xeon Processor Scalable Family Technical Overview, https: / / software . intel . com / en - us / articles / intel - xeon - processor - scalable-family-technical-overview, 2017.

[136] D. Dice and A. Kogan, “Compact NUMA-aware Locks,” in Proceedings of the Fourteenth EuroSys Conference 2019, ser. EuroSys ’19, Dresden, Germany: ACM, 2019, 12:1–12:15, ISBN: 978-1-4503-6281-8.

[137] C. Bienia, “Benchmarking Modern Multiprocessors,” AAI3445564, PhD thesis, Princeton, NJ, USA, 2011, ISBN: 978-1-124-49186-8.

[138] A. R. Karlin, K. Li, M. S. Manasse, and S. Owicki, “Empirical Studies of Competitve Spinning for a Shared-memory Multiprocessor,” in Proceedings of the 13th ACM

Symposium on Operating Systems Principles (SOSP), Pacific Grove, California, USA: ACM, Oct. 1991, pp. 41–55, ISBN: 0-89791-447-3.

[139] IBM, IBM K42 Group, http://researcher.watson.ibm.com/researcher/ view_group.php?id=2078, 2016.

[140] J. Corbet, MCS locks and qspinlocks, https://lwn.net/Articles/590243/, 2014.

[141] M. Chabbi, A. Amer, S. Wen, and X. Liu, “An Efficient Abortable-locking Protocol for Multi-level NUMA Systems,” in Proceedings of the 22nd ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), Austin, TX, Feb. 2017.

[142] M. L. Scott, “Non-blocking Timeout in Scalable Queue-based Spin Locks,” in Proceedings of the Twenty-first Annual Symposium on Principles of Distributed Computing, ser. PODC ’02, Monterey, California, 2002, pp. 31–40, ISBN: 1-58113- 485-1.

[143] F. R. Johnson, R. Stoica, A. Ailamaki, and T. C. Mowry, “Decoupling Contention Management from Scheduling,” in Proceedings of the 15th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), New York, NY, Mar. 2010, pp. 117–128.

[144] D. Dice, “Malthusian Locks,” CoRR, vol. abs/1511.06035, 2015.

[145] W. Long. (2014). qrwlock: Introducing a queue read/write lock implementation, (visited on 04/15/2019).

[146] A. Blanchard, will-it-scale, https://github.com/antonblanchard/will-it- scale, 2013.

[147] M. Zalewski. (2017). american fuzzy lop (2.41b). http://lcamtuf.coredump. cx/afl/.

[148] J. Triplett, P. E. McKenney, and J. Walpole, “Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming,” in Proceedings of the 2011 USENIX Annual Technical Conference (ATC), Portland, OR, Jun. 2011, pp. 11–11.

[149] B. He, W. N. Scherer, and M. L. Scott, “Preemption Adaptivity in Time-published Queue-based Spin Locks,” in Proceedings of the 12th International Conference on High Performance Computing, ser. HiPC’05, Goa, India, 2005, pp. 7–18.

[150] B. Falsafi, R. Guerraoui, J. Picorel, and V. Trigonakis, “Unlocking Energy,” in Proceedings of the 2016 USENIX Annual Technical Conference (ATC), Denver, CO, Jun. 2016, pp. 393–406.

[151] D. Dice and A. Kogan, "BRAVO: Biased Locking for Reader-Writer Locks," in Proceedings of the 2019 USENIX Annual Technical Conference (ATC), Renton, WA: USENIX Association, Jul. 2019, pp. 315–328, ISBN: 978-1-939133-03-8.

[152] W. Xu, S. Kashyap, C. Min, and T. Kim, “Designing New Operating Primitives to Improve Fuzzing Performance,” in Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS), Dallas, Texas, Oct. 2017.

[153] D. McCracken, “Object-based Reverse Mapping,” in Proceedings of the Ottawa , OLS, 2004.

[154] J. Mario. (2016). C2C - False Sharing Detection in Linux Perf. https://joemario. github.io/blog/2016/09/01/c2c-blog/.

[155] R. Guerraoui, H. Guiroux, R. Lachaize, V. Quéma, and V. Trigonakis, “Lock— Unlock: Is That All? A Pragmatic Analysis of Locking in Software Systems,” ACM Trans. Comput. Syst., vol. 36, no. 1, 1:1–1:149, Mar. 2019.

[156] H. Guiroux, R. Lachaize, and V. Quéma, “Multicore Locks: The Case is Not Closed Yet,” in Proceedings of the 2016 USENIX Annual Technical Conference (ATC), Denver, CO, Jun. 2016, pp. 649–662.

[157] S. Ghemawat and J. Dean. (2019). LevelDB, (visited on 04/20/2019).

[158] E. Atoofian and A. G. Bavarsad, “AGC: Adaptive Global Clock in Software Trans- actional Memory,” in Proceedings of the 2012 International Workshop on Program- ming Models and Applications for Multicores and Manycores, ser. PMAM ’12, New Orleans, Louisiana: ACM, 2012, pp. 11–16.

[159] Y. Lev, V. Luchangco, V. J. Marathe, M. Moir, D. Nussbaum, and M. Olszewski, “Anatomy of a scalable software transactional memory,” in 2009, 4th ACM SIGPLAN Workshop on Transactional Computing (TRANSACT’09), New York, NY, USA: ACM, 2009, pp. 1–10.

[160] A. Spiegelman, G. Golan-Gueta, and I. Keidar, “Transactional data structure li- braries,” in Proceedings of the 2016 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), New York, NY, USA: ACM, Jun. 2016, pp. 682–696.

[161] T. Riegel, C. Fetzer, and P. Felber, “Time-based Transactional Memory with Scalable Time Bases,” in Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures, ser. SPAA ’07, San Diego, California, USA: ACM, 2007, pp. 221–228.

[162] P. E. McKenney, "Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels," Available: http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf, PhD thesis, OGI School of Science and Engineering at Oregon Health and Science University, 2004.

[163] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1987, ISBN: 0-201-10715-5.

[164] K. Kim, T. Wang, R. Johnson, and I. Pandis, “ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads,” in Proceedings of the 2016 ACM SIGMOD/PODS Conference, San Francisco, CA, USA: ACM, Jun. 2016, pp. 1675– 1687.

[165] X. Yuan, C. Wu, Z. Wang, J. Li, P.-C. Yew, J. Huang, X. Feng, Y. Lan, Y. Chen, and Y. Guan, “ReCBuLC: Reproducing Concurrency Bugs Using Local Clocks,” in Proceedings of the 37th International Conference on Software Engineering - Volume 1, ser. ICSE ’15, Florence, Italy: IEEE Press, 2015, pp. 824–834, ISBN: 978-1-4799-1934-5.

[166] A. J. Mashtizadeh, T. Garfinkel, D. Terei, D. Mazieres, and M. Rosenblum, “To- wards Practical Default-On Multi-Core Record/Replay,” in Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’17, Xi’an, China: ACM, 2017, pp. 693–708, ISBN: 978-1-4503-4465-4.

[167] M. Righini, Enabling Intel Virtualization Technology Features and Benefits, 2010.

[168] J. Ahn, C. H. Park, and J. Huh, “Micro-Sliced Virtual Processors to Hide the Effect of Discontinuous CPU Availability for Consolidated Systems,” in Proceed- ings of the 47th Annual IEEE/ACM International Symposium on Microarchitec- ture, ser. MICRO-47, Cambridge, United Kingdom: IEEE Computer Society, 2014, pp. 394–405.

[169] A. Menon, J. R. Santos, Y. Turner, G. J. Janakiraman, and W. Zwaenepoel, “Di- agnosing Performance Overheads in the Xen Virtual Machine Environment,” in Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments, ser. VEE ’05, Chicago, IL, USA: ACM, 2005, pp. 13–23.

[170] P. Fatourou and N. D. Kallimanis, “Revisiting the Combining Synchronization Technique,” in Proceedings of the 17th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), New Orleans, LA, Feb. 2012, pp. 257–266.

[171] Y. Oyama, K. Taura, and A. Yonezawa, "Executing parallel programs with synchronization bottlenecks efficiently," in Proceedings of International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications (PDSIA), 1999, pp. 182–204.

[172] J.-P. Lozi, F. David, G. Thomas, J. Lawall, and G. Muller, “Fast and Portable Locking for Multicore Architectures,” ACM Trans. Comput. Syst., vol. 33, no. 4, 13:1–13:62, Jan. 2016.

[173] G. Chadha, S. Mahlke, and S. Narayanasamy, “When Less is More (LIMO):Controlled Parallelism For improved Efficiency,” in Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Em- bedded Systems, ser. CASES ’12, Tampere, Finland, 2012.

[174] S. Zhuravlev, J. C. Saez, S. Blagodurov, A. Fedorova, and M. Prieto, “Survey of Scheduling Techniques for Addressing Shared Resources in Multicore Processors,” ACM Comput. Surv., vol. 45, no. 1, Dec. 2012.

[175] M. L. Scott and W. N. Scherer, “Scalable Queue-based Spin Locks with Timeout,” in Proceedings of the 6th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), Snowbird, Utah, Jun. 2001, pp. 44–52.

[176] T. Gleixner, RE: [PATCH] x86: Export tsc related information in , https: //lwn.net/Articles/388286/, 2010.

[177] D. Kantner, AMD’s Bulldozer Microarchitecture, https://www.realworldtech. com/bulldozer/9/, 2011.

[178] N. Narula, C. Cutler, E. Kohler, and R. Morris, “Phase Reconciliation for Contended In-memory Transactions,” in Proceedings of the 11th USENIX Symposium on Op- erating Systems Design and Implementation (OSDI), Broomfield, Colorado, Oct. 2014, pp. 511–524.

[179] J. Kim, A. Mathew, S. Kashyap, M. K. Ramanathan, and C. Min, “MV-RLU: Scaling Read-Log-Update with Multi-Versioning,” in Proceedings of the 24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Providence, RI, Apr. 2019.

[180] R. M. Krishnan, J. Kim, A. Mathew, X. Fu, A. Demeri, C. Min, and S. Kannan, “Durable Transactional Memory Can Scale with Timestone,” in Proceedings of the 25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Lausanne, Switzerland, Apr. 2020.

[181] Amazon, AWS Lambda, https://aws.amazon.com/lambda/, 2017.

[182] Microsoft, Azure Functions, https://azure.microsoft.com/en-us/services/functions/, 2018.

[183] Google, Serverless computing, https : / / cloud . google . com / serverless, 2018.

[184] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and H. M. Levy, “Scheduler Activa- tions: Effective Kernel Support for the User-Level Management of Parallelism,” in Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP), Pacific Grove, CA, Oct. 1991.

[185] S. Kim, H. Kim, J. Lee, and J. Jeong, “Enlightening the I/O Path: A Holistic Ap- proach for Application Performance,” in Proceedings of the 15th Usenix Conference on File and Storage Technologies, ser. FAST’17, Santa clara, CA, USA: USENIX Association, 2017, pp. 345–358.

[186] A. Patrizio, HPE refreshes its Superdome servers with SGI technology, https: / / www . networkworld . com / article / 3236789 / hpe - refreshes - its - superdome-servers-with-sgi-technology.html, 2017.

[187] J. R. Lorch, Y. Chen, M. Kapritsos, B. Parno, S. Qadeer, U. Sharma, J. R. Wilcox, and X. Zhao, “Armada: Low-Effort Verification of High-Performance Concurrent Programs,” in Proceedings of the 2020 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Online conference, Jun. 2020.
