Low-Level Concurrent Programming Using the Relaxed Memory Calculus

Low-level Concurrent Programming Using the Relaxed Memory Calculus

Michael J. Sullivan
CMU-CS-17-126
November 2017

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Karl Crary, Chair
Kayvon Fatahalian
Todd Mowry
Paul McKenney, IBM

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2017 Michael J. Sullivan

This research was sponsored in part by the Carnegie Mellon University Presidential Fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: programming languages, compilers, concurrency, memory models, relaxed memory

Abstract

The Relaxed Memory Calculus (RMC) is a novel approach for portable low-level concurrent programming in the presence of the relaxed memory behavior caused by modern hardware architectures and optimizing compilers. RMC takes a declarative approach to programming with relaxed memory: programmers explicitly specify constraints on execution order and on the visibility of writes. This differs from other low-level programming language memory models, which—when they exist—are usually based on ordering annotations attached to synchronization operations and/or explicit memory barriers. In this thesis, we argue that this declarative approach based on explicit programmer-specified constraints is a practical approach for implementing low-level concurrent algorithms. To this end, we present RMC-C++, an extension of C++ with RMC constraints, and rmc-compiler, an LLVM-based compiler for it.

We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far.
— “The Call of Cthulhu”, H.P. Lovecraft

And all I ask is a tall ship and a star to steer her by;
— “Sea Fever”, John Masefield

Acknowledgments

It’s difficult to overstate how much thanks is owed to my parents, Jack and Mary Sullivan. They’ve provided nearly three decades of consistent encouragement, support (emotional, moral, logistical, financial, and more), and advice (“Put ‘write thesis proposal’ on your To-Do List, then write the proposal and cross it off!”). I know I wasn’t the easiest kid to raise, but they managed to mostly salvage the situation. Thank you—I wouldn’t be here without you.

Thanks to my brother, Joey, my interactions with whom over the years have shaped me in more ways than I understand. And I’m very grateful that—at this point in our lives—most of those interactions are now positive.

Immeasurable thanks to my advisor, Karl Crary, without whom this never would have happened. Karl is an incredible researcher and it was a pleasure to spend the last six years collaborating with him. Karl also—when I asked him about the CMU 5th Year Masters—was the one to convince me to apply to Ph.D. programs instead. Whether that is quite worth a “thanks” is a hard question to answer with anything approaching certainty, but I believe it is.

I also owe a great debt to Dave Eckhardt. Operating Systems Design and Implementation (15-410) is the best class that I have ever taken and I haven’t been able to get it out of my head.
The work I did over seven semesters as one of Dave’s TAs for 410—teaching students about concurrency, operating systems, and, most of all, giving thoughtful and detailed feedback about software architecture and design—is the work I am most proud of, and I am supremely thankful for being able to be a part of it. While much of my work is likely to take me far afield, I try to carry a little of the kernel hacker ethic in my heart wherever I go. On top of that, Dave has been a consistent source of advice, support, and guidance without which I’d never have made it through grad school.

I owe much to a long string of excellent teachers I had in the Mequon-Thiensville School District, beginning with my third-grade teacher Mrs. Movell, who—in response to me asking if there was a less tedious way to draw some particular shape in LOGO—arranged for a math teacher to come over from the high school to teach me about the wonders of variables, if statements, and GOTOs. I owe a particularly large debt to Robin Schlei, who provided years of support, and to Kathleen Connelly, who first put me in touch with Mark Stehlik. Thank you!

On that note, thanks to Mark Stehlik, who was as good an undergraduate advisor as could be asked for, even if he wouldn’t let me count HOT Compilation as an “applications” class. And thanks to all the great teachers I had, especially Bob Harper, who kindled an interest in programming languages.

Grad school was a long and sometimes painful road, and I’d have never made it through without my hobbies. Chief among them, my regular tabletop role-playing game group, who sacrificed countless Saturday nights to join me in liberating Polonia, exploring the Ironian Wastes for the Ambrosia Trading Company, and conclusively demonstrating, via their work for the Kromian Mages Guild, that not all wizards are subtle: Rhett Lauffenburger (Roger Kazynski), Gabe Routh (Hrothgar Boarbuggerer né Hrunting), Ben Blum (Yolanda Swaggins-Moonflute), Matt Maurer (Jack Silver), Michael Arntzenius (Jakub the Half-Giant), and Will Hagen (Daema). A GM couldn’t ask for a better group of players.

I’d also like to thank my other major escape: the EVE Online alliance Of Sound Mind, and especially the CEOs who have built SOUND into such a great group to fly with—XavierVE, June Ting, Ronnie Cordova, and Jacob Matthew Jansen. Fly safe!

The SCS Musical provided another consistently valuable form of stress in my life, and I’d like to thank everybody involved in helping to revive it and keep it running. I had allowed myself to forget about the giant technical-theatre-shaped hole in my heart, and I don’t think I’ll be able to again.

Thanks to the old gang from Homestead—Becki Johnson, Gina Buzzanca, and Michael and Andrew Bartlein. Though we’ve been scattered far and wide from our parents’ basements in Wisconsin—and though the Bartleins are rubbish at keeping in touch—you’ve remained some of my truest friends.

A lot of people provided—in their own ways—important emotional support at different points in the process. I owe particular thanks to Jack Ferris, Joshua Keller, Margaret Meyerhofer, Amanda Watson, and Sarah Schultz.

Salil Joshi and Joe Tassarotti contributed in important ways to the design of RMC and its compiler. Ben Segall, Paul Dagnelie, and Rick Benua provided helpful suggestions about benchmarking woes. Paul and Rick also provided useful comments on the draft. Matt Maurer provided some much needed statistical help. I’d like to thank Shaked Flur et al. for providing me with the ARMv8 version of the ppcmem tool, which proved invaluable for exploring the possible behaviors of ARMv8s.

A heartfelt thanks to everyone on my committee. Paul McKenney provided early and interesting feedback on this document (and a quote from “Sea Fever”), Kayvon Fatahalian gave excellent advice on what he saw as the key arguments of my work, and Todd Mowry helped me understand the computer architect’s view of the world a little better.

I’ve had a lot of fantastic colleagues during the decade I spent at CMU, both as an undergrad and as a grad student. I learned more about computer science and about programming languages in conversation with my colleagues than I have from any books or papers. Thanks to Joshua Wise, Jacob Potter, Ben Blum, Ben Segall, Glenn Willen, Nathaniel Wesley Filardo, Matthew Wright, Elly Fong-Jones, Philip Gianfortoni, Car Bauer, Laura Abbott, Chris Lu, Eric Faust, Andrew Drake, Amanda Watson, Robert Marsh, Jack Ferris, Joshua Keller, Joshua Watzman, Cassandra Sparks, Ryan Pearl, Kelly Hope Harrington, Margaret Meyerhofer, Rick Benua, Paul Dagnelie, Michael Arntzenius, Carlo Anguili, Matthew Maurer, Joe Tassarotti, Rob Simmons, Chris Martens, Nico Feltman, Favonia, Danny Gratzer, Stefan Mueller, Anna Gommerstadt, and many many others. Thanks also to all the wonderful colleagues I had working at Mozilla and Facebook on the Rust and Hack programming languages.

The most stressful and miserable part of my grad school process was the semester I spent preparing my thesis proposal. I was nearly burnt out after it and I don’t think I could have kept on going without the ten weeks I spent “Working From Tahoe” (WFT) from The Fairway House in Lake Tahoe. Thanks to Vail Resorts and everybody who split the ski cabin with me for making it possible—especially to Jack Ferris and Kelly Hope Harrington for bullying me into it. And thanks to the Northstar Ski Patrol, the paramedics of the Truckee Fire Protection District, and the staff at the Tahoe Forest Hospital and the University of Pittsburgh Medical Center for all the excellent care I received after I skied into a tree.

Thanks to Aaron Rodgers and Mike McCarthy, but not to Dom Capers. Go Pack Go!

NVIDIA Corporation donated a Jetson TK1 development kit which was used for testing and benchmarking. The IBM Power Systems Academic Initiative team provided access to a POWER8 machine which was used for testing and benchmarking.

To all of the undoubtedly countless people who I really really ought to have thanked here but who slipped my mind: I’m sorry, and thank you!

And last, but certainly not least, I’d like to extend a heartfelt thanks to Phil, Madz, Manny, and all the rest of the gang down at Emarhavil Heavy Industries.
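To give a concrete taste of the declarative style the abstract describes, here is a minimal message-passing sketch in the spirit of RMC-C++. The spellings used below (the rmc.h header, rmc::atomic, and the L, VEDGE, and XEDGE constraint macros) are assumptions based on the rmc-compiler project rather than text quoted from the thesis; treat this as an illustration only.

    #include <rmc.h>   // assumed header name from the rmc-compiler project

    // Shared locations; rmc::atomic is the assumed RMC-C++ atomic wrapper.
    rmc::atomic<int> data(0);
    rmc::atomic<int> flag(0);

    void produce() {
        // Visibility edge: the write labeled wdata must be visible to any
        // thread that can see the write labeled wflag.
        VEDGE(wdata, wflag);
        L(wdata, data = 42);
        L(wflag, flag = 1);
    }

    int consume() {
        // Execution edge: the read labeled rdata executes after the read
        // labeled rflag, so a consumer that sees flag == 1 also sees data == 42.
        XEDGE(rflag, rdata);
        if (L(rflag, flag) == 1)
            return L(rdata, data);
        return -1;   // message not yet published
    }

The point of the style is that the programmer states only the two constraints the algorithm actually needs (one visibility edge, one execution edge) and leaves the choice of fences or ordering annotations to the compiler, rather than attaching memory-order annotations or explicit barriers to each access.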
Recommended publications
  • Fast and Correct Load-Link/Store-Conditional Instruction Handling in DBT Systems
    Fast and Correct Load-Link/Store-Conditional Instruction Handling in DBT Systems

    Abstract: Dynamic Binary Translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting e.g. x86 host systems, LL/SC guest instructions are typically emulated using atomic Compare-and-Swap (CAS) instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between LL/SC and CAS semantics. In this paper, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emulation schemes: (1) A purely software based scheme, which uses the DBT system’s page translation cache for correctly selecting between fast, but unsynchronized, and slow, but fully synchronized memory accesses, and (2) a hardware accelerated scheme that leverages hardware transactional memory (HTM) provided by the host. We have implemented these two schemes in the Synopsys DesignWare® ARC® nSIM DBT …

    [Figure 1: The compare-and-swap instruction atomically reads from a memory address, and compares the current value with an expected value. If these values are equal, then a new value is written to memory. The instruction indicates (usually through a return register or flag) whether or not the value was successfully written.]
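    The semantic gap mentioned in the abstract can be illustrated with a small sketch of the CAS-based emulation strategy, written here with standard C++ atomics; the names and code are mine, not the paper's.

        #include <atomic>

        std::atomic<int> counter(0);

        // LL/SC-style increment emulated with compare-and-swap, the mapping the
        // paper says DBT systems typically use: load a snapshot, compute, and
        // try to install the result.
        void increment() {
            int expected = counter.load(std::memory_order_relaxed);
            // On failure, compare_exchange_weak refreshes 'expected' with the
            // value actually observed, so the loop retries from a fresh snapshot.
            while (!counter.compare_exchange_weak(expected, expected + 1)) {
                // retry
            }
        }

        // The subtle difference: a genuine store-conditional fails if *any* store
        // touched the location since the load-link, while the CAS above succeeds
        // whenever the value merely matches again (the ABA problem). Guest code
        // relying on the stronger LL/SC guarantee can misbehave under this mapping.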
  • Is Parallel Programming Hard, And, If So, What Can You Do About It?
    Is Parallel Programming Hard, And, If So, What Can You Do About It? Edited by: Paul E. McKenney, Linux Technology Center, IBM Beaverton. [email protected]. December 16, 2011.

    Legal Statement: This work represents the views of the authors and does not necessarily represent the view of their employers. IBM, zSeries, and Power PC are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Linux is a registered trademark of Linus Torvalds. i386 is a trademark of Intel Corporation or its subsidiaries in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of such companies. The non-source-code text and images in this document are provided under the terms of the Creative Commons Attribution-Share Alike 3.0 United States license (http://creativecommons.org/licenses/by-sa/3.0/us/). In brief, you may use the contents of this document for any purpose, personal, commercial, or otherwise, so long as attribution to the authors is maintained. Likewise, the document may be modified, and derivative works and translations made available, so long as such modifications and derivations are offered to the public on equal terms as the non-source-code text and images in the original document. Source code is covered by various versions of the GPL (http://www.gnu.org/licenses/gpl-2.0.html). Some of this code is GPLv2-only, as it derives from the Linux kernel, while other code is GPLv2-or-later.
  • Mastering Concurrent Computing Through Sequential Thinking
    review articles, DOI:10.1145/3363823
    A 50-year history of concurrency.
    BY SERGIO RAJSBAUM AND MICHEL RAYNAL

    …we do not have good tools to build efficient, scalable, and reliable concurrent systems. Concurrency was once a specialized discipline for experts, but today the challenge is for the entire information technology community because of two disruptive phenomena: the development of networking communications, and the end of the ability to increase processors speed at an exponential rate. Increases in performance come through concurrency, as in multicore architectures. Concurrency is also critical to achieve fault-tolerant, distributed services, as in global databases, cloud computing, and blockchain applications.

    Concurrent computing through sequential thinking. Right from the start in the 1960s, the main way of dealing with concurrency has been by reduction to sequential reasoning. Transforming problems in the concurrent domain into simpler problems in the sequential domain yields benefits for specifying, implementing, and verifying concurrent programs. It is a two-sided strategy, together with a bridge connecting the two sides. First, a sequential specification of an object (or service) that can be ac…

    Key insights:
    ● A main way of dealing with the enormous challenges of building concurrent systems is by reduction to sequential thinking. Over more than 50 years, more sophisticated techniques have been developed to build complex systems in this way.
    ● The strategy starts by designing…

    “I must appeal to the patience of the wondering readers, suffering as I am from the sequential nature of human communication.” —E.W. Dijkstra
  • A Separation Logic for Refining Concurrent Objects
    A Separation Logic for Refining Concurrent Objects
    Aaron Turon and Mitchell Wand, Northeastern University ({turon, wand}@ccs.neu.edu)

    Abstract: Fine-grained concurrent data structures are crucial for gaining performance from multiprocessing, but their design is a subtle art. Recent literature has made large strides in verifying these data structures, using either atomicity refinement or separation logic with rely-guarantee reasoning. In this paper we show how the ownership discipline of separation logic can be used to enable atomicity refinement, and we develop a new rely-guarantee method that is localized to the definition of a data structure. We present the first semantics of separation logic that is sensitive to atomicity, and show how to control this sensitivity through ownership. The result is a logic that enables compositional reasoning about atomicity and interference, even for programs that use fine-grained synchronization and dynamic memory allocation.

    From the paper’s running example, the fine-grained counter increment

        do { tmp = *C; } until CAS(C, tmp, tmp+1);

    implements inc using an optimistic approach: it takes a snapshot of the counter without acquiring a lock, computes the new value of the counter, and uses compare-and-set (CAS) to safely install the new value. The key is that CAS compares *C with the value of tmp, atomically updating *C with tmp + 1 and returning true if they are the same, and just returning false otherwise. Even for this simple data structure, the fine-grained implementation significantly outperforms the lock-based implementation [17]. Likewise, even for this simple example, we would prefer to think of the counter in a more abstract way when reasoning about its clients, giving it the following specification:…
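    For contrast with the optimistic CAS loop quoted above, the coarse-grained, lock-based counter the paper compares against might look like the following sketch (my own C++ rendering, not code from the paper):

        #include <mutex>

        std::mutex counter_lock;
        int C = 0;

        // Coarse-grained counter: every increment takes the lock, so all
        // operations on the counter serialize. This is exactly the serialization
        // the fine-grained CAS version above avoids.
        void inc() {
            std::lock_guard<std::mutex> guard(counter_lock);
            ++C;
        }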
  • Lock-Free Programming
    Lock-Free Programming, Geoff Langdale

    Desynchronization
    ● This is an interesting topic
    ● This will (may?) become even more relevant with near ubiquitous multi-processing
    ● Still: please don’t rewrite any Project 3s!

    Synchronization
    ● We received notification via the web form that one group has passed the P3/P4 test suite. Congratulations!
    ● We will be releasing a version of the fork-wait bomb which doesn’t make as many assumptions about task id’s.
      – Please look for it today and let us know right away if it causes any trouble for you.
    ● Personal and group disk quotas have been grown in order to reduce the number of people running out over the weekend – if you try hard enough you’ll still be able to do it.

    Outline
    ● Problems with locking
    ● Definition of Lock-free programming
    ● Examples of Lock-free programming
    ● Linux OS uses of Lock-free data structures
    ● Miscellanea (higher-level constructs, ‘wait-freedom’)
    ● Conclusion

    Problems with Locking
    ● This list is more or less contentious, not equally relevant to all locking situations:
      – Deadlock
      – Priority Inversion
      – Convoying
      – “Async-signal-safety”
      – Kill-tolerant availability
      – Pre-emption tolerance
      – Overall performance

    Problems with Locking 2
    ● Deadlock
      – Processes that cannot proceed because they are waiting for resources that are held by processes that are waiting for…
    ● Priority inversion
      – Low-priority processes hold a lock required by a higher-priority process
      – Priority inheritance a possible solution
  • 11. Kernel Design
    11. Kernel Design
    Outline: Interrupts and Exceptions; Low-Level Synchronization; Low-Level Input/Output; Devices and Driver Model; File Systems and Persistent Storage; Memory Management; Process Management and Scheduling; Operating System Trends; Alternative Operating System Designs

    Bottom-Up Exploration of Kernel Internals
    ● Hardware Support and Interface: asynchronous events, switching to kernel mode; I/O, synchronization, low-level driver model
    ● Operating System Abstractions: file systems, memory management; processes and threads
    ● Specific Features and Design Choices: Linux 2.6 kernel; other UNIXes (Solaris, MacOS), Windows XP and real-time systems

    Hardware Support: Interrupts
    ● Typical case: electrical signal asserted by external device
      – Filtered or issued by the chipset
      – Lowest level hardware synchronization mechanism
    ● Multiple priority levels: Interrupt ReQuests (IRQ); Non-Maskable Interrupts (NMI)
    ● Processor switches to kernel mode and calls specific interrupt service routine (or interrupt handler)
    ● Multiple drivers may share a single IRQ line → the IRQ handler must identify the source of the interrupt to call the proper service routine

    Hardware Support: Exceptions
    ● Typical case: unexpected program behavior
      – Filtered or issued by the chipset
      – Lowest level of OS/application interaction
    ● Processor switches to kernel mode and calls specific exception service routine (or exception handler)
    ● Mechanism to implement system calls
  • Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet
    Understanding the Linux Kernel, 3rd Edition. By Daniel P. Bovet, Marco Cesati. Publisher: O'Reilly. Pub Date: November 2005. ISBN: 0-596-00565-2. Pages: 942.

    In order to thoroughly understand what makes Linux tick and why it works so well on a wide variety of systems, you need to delve deep into the heart of the kernel. The kernel handles all interactions between the CPU and the external world, and determines which programs will share processor time, in what order. It manages limited memory so well that hundreds of processes can share the system efficiently, and expertly organizes data transfers so that the CPU isn't kept waiting any longer than necessary for the relatively slow disks. The third edition of Understanding the Linux Kernel takes you on a guided tour of the most significant data structures, algorithms, and programming tricks used in the kernel. Probing beyond superficial features, the authors offer valuable insights to people who want to know how things really work inside their machine. Important Intel-specific features are discussed. Relevant segments of code are dissected line by line. But the book covers more than just the functioning of the code; it explains the theoretical underpinnings of why Linux does things the way it does. This edition of the book covers Version 2.6, which has seen significant changes to nearly every kernel subsystem, particularly in the areas of memory management and block devices. The book focuses on the following topics:
    • Memory management, including file buffering, process swapping, and Direct memory Access (DMA)
    • The Virtual Filesystem layer and the Second and Third Extended Filesystems
    • Process creation and scheduling
    • Signals, interrupts, and the essential interfaces to device drivers
    • Timing
    • Synchronization within the kernel
    • Interprocess Communication (IPC)
    • Program execution
    Understanding the Linux Kernel will acquaint you with all the inner workings of Linux, but it's more than just an academic exercise.
  • Multithreading for Gamedev Students
    Multithreading for Gamedev Students
    Keith O’Conor, 3D Technical Lead, Ubisoft Montreal, @keithoconor

    - Who I am: PhD (Trinity College Dublin), Radical Entertainment (shipped Prototype 1 & 2), Ubisoft Montreal (shipped Watch_Dogs & Far Cry 4)
    - Who this is aimed at: game programming students who don’t necessarily come from a strict computer science background. Some points might be basic for CS students, but all are relevant to gamedev. Slides available online at fragmentbuffer.com

    Overview
    • Hardware support
    • Common game engine threading models
    • Race conditions
    • Synchronization primitives
    • Atomics & lock-free
    • Hazards
    - Start at high level, finish in the basement. Will also talk about potential hazards and give an idea of why multithreading is hard.

    Overview
    • Only an introduction
      ◦ Giving a vocabulary
      ◦ See references for further reading
      ◦ Learn by doing, hair-pulling
    - Way too big a topic for a single talk, each section could be its own series of talks.

    Why multithreading?
    - Hitting power & heat walls when going for increased frequency alone
    - Go wide instead of fast
    - Use available resources more efficiently
    - Taken from http://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/

    Hardware multithreading

    Hardware support
    - Before we use multithreading in games, we need to understand the various levels of hardware support that allow multiple instructions to be executed in parallel
    - There are many more aspects to hardware multithreading than we’ll look at here (processor pipelining, cache coherency protocols etc.)
    - Again, …
  • CS 241 Honors Concurrent Data Structures
    CS 241 Honors: Concurrent Data Structures
    Bhuvan Venkatesh, University of Illinois Urbana-Champaign, March 7, 2017

    What to go over: Terminology; What is lock free; Example of Lock Free; Transactions and Linearizability; The ABA problem; Drawbacks; Use Cases

    Terminology
    Atomic instructions - Instructions that happen in one step to the CPU or not at all. Some examples are atomic add, atomic compare and swap.
    Transaction - Like atomic operations, either the entire transaction goes through or doesn't. This could be a series of operations (like push or pop for a stack).

    Atomic Compare and Swap?

        int atomic_cas(int *addr, int *expected, int value) {
            if (*addr == *expected) {
                *addr = value;
                return 1;   // swap success
            } else {
                *expected = *addr;
                return 0;   // swap failed
            }
        }

    But this all happens atomically!
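    The atomic_cas on the slide has the same shape as the standard C++ compare-exchange operation, which also writes the observed value back through the expected argument on failure. A small illustration of that behaviour (my own code, not from the course material):

        #include <atomic>
        #include <cstdio>

        std::atomic<int> x(5);

        // Attempt to change x from 'from' to 'to'. Returns true only if x still
        // held 'from'; otherwise 'observed' reports the value that was there,
        // just like the slide's atomic_cas writes back through 'expected'.
        bool try_swap(int from, int to, int *observed) {
            int expected = from;
            bool ok = x.compare_exchange_strong(expected, to);
            *observed = expected;
            return ok;
        }

        int main() {
            int seen;
            if (try_swap(5, 6, &seen))
                std::printf("swapped, x is now 6\n");
            else
                std::printf("swap failed, x was %d\n", seen);
        }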
  • Linux Kernel Synchronization
    [Logical diagram of kernel subsystems omitted. Today’s Lecture: synchronization in the kernel. Don Porter, CSE 506.]

    Warm-up
    ● What is synchronization?
    ● Code on multiple CPUs coordinate their operations
    ● Examples:
      – Locking provides mutual exclusion while changing a pointer-based data structure
      – Threads might wait at a barrier for completion of a phase of computation
      – Coordinating which CPU handles an interrupt

    Why Linux synchronization?
    ● A modern OS kernel is one of the most complicated parallel programs you can study
      – Other than perhaps a database
    ● Includes most common synchronization patterns
    ● And a few interesting, uncommon ones

    Historical perspective
    ● Why did OSes have to worry so much about synchronization back when most computers have only one CPU?

    The old days: They didn’t worry!
    ● Early/simple OSes (like JOS, pre-lab4): No need for synchronization
    ● All kernel requests wait until completion – even disk requests
    ● Heavily restrict when interrupts can be delivered (all traps use an interrupt gate)
    ● No possibility for two CPUs to touch same data

    Slightly more recently
    ● Optimize kernel performance by blocking inside the kernel
    ● Example: Rather than wait on expensive disk I/O, block and schedule another process until it completes
    ● Cost: A bit of implementation complexity
      – Need a lock to protect against concurrent update to pages/inodes/etc.

    A slippery slope
    ● We can enable interrupts during system calls
      – More complexity, lower latency
    ● We can block in more places that make sense
      – Better CPU usage, more complexity
  • Towards Scalable Synchronization on Multi-Cores
    Towards Scalable Synchronization on Multi-Cores
    Thesis No. 7246 (2016), presented on 21 October 2016 at the Faculty of Computer and Communication Sciences, Distributed Programming Laboratory, Doctoral Program in Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, for the degree of Doctor of Sciences, by Vasileios Trigonakis.
    Accepted on the proposal of the jury: Prof. J. R. Larus, jury president; Prof. R. Guerraoui, thesis director; Dr T. Harris, examiner; Dr G. Muller, examiner; Prof. W. Zwaenepoel, examiner. Switzerland, 2016.

    “We can only see a short distance ahead, but we can see plenty there that needs to be done.” — Alan Turing

    To my parents, Eirini and Charalampos

    Acknowledgements
    “The whole is greater than the sum of its parts.” — Aristotle

    To date, my education (i.e., diploma, M.Sc., and Ph.D.) has lasted for 13 years. I could not possibly be here and sustain all the pressure (and of course the financial expenses) without the help and support of my family, Eirini (my mother), Charalampos (my father), and Eleni (my sister). I want to deeply thank them for being there for me throughout these years.

    In my experience, a successful Ph.D. thesis in the area called “systems” (i.e., with a focus on software systems) requires either 10 years of solo work, or 5–6 years of fruitful collaborations. I was lucky enough to belong in the latter category and to have the chance to collaborate with many amazing people in producing the research that is included in this dissertation. This is actually the main reason why in the main body of this dissertation I use “we” instead of “I.”

    First and foremost, I would like to thank my advisor, Rachid Guerraoui.
  • Synchronizations in Linux
    W4118 Operating Systems. Instructor: Junfeng Yang.

    Learning goals of this lecture
    ● Different flavors of synchronization primitives and when to use them, in the context of the Linux kernel
    ● How synchronization primitives are implemented for real
    ● “Portable” tricks: useful in other contexts as well (when you write a high-performance server)
    ● Optimize for common case

    Synchronization is complex and subtle
    ● Already learned this from the code examples we’ve seen
    ● Kernel synchronization is even more complex and subtle
      – Higher requirements: performance, protection …
      – Code heavily optimized, “fast path” often in assembly, fit within one cache line

    Recall: Layered approach to synchronization
    ● Hardware provides simple low-level atomic operations, upon which we can build high-level synchronization primitives, upon which we can implement critical sections and build correct multi-threaded/multi-process programs
      – Properly synchronized application
      – High-level synchronization primitives
      – Hardware-provided low-level atomic operations

    Outline
    ● Low-level synchronization primitives in Linux: memory barrier, atomic operations, synchronize with interrupts, spin locks
    ● High-level synchronization primitives in Linux: completion, semaphore, futex, mutex

    Architectural dependency
    ● Implementation of synchronization primitives: highly architecture dependent
    ● Hardware provides atomic operations
      – Most hardware platforms provide test-and-set or similar: examine and modify a memory location atomically (see the sketch below)
      – Some don’t, but would inform if operation attempted was atomic
    ● Memory …
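    As a concrete instance of the layered approach the slides describe, a spin lock can be built directly on a hardware-level atomic test-and-set; the sketch below uses the C++ std::atomic exchange operation to play that role and is my own illustration, not code from the lecture.

        #include <atomic>

        class spinlock {
            std::atomic<bool> locked{false};
        public:
            void lock() {
                // exchange atomically sets the flag and returns its previous
                // value: the classic test-and-set loop.
                while (locked.exchange(true, std::memory_order_acquire)) {
                    // spin until the current holder calls unlock()
                }
            }
            void unlock() {
                locked.store(false, std::memory_order_release);
            }
        };

    Higher-level primitives such as semaphores, futexes, and mutexes then layer sleeping, fairness, and ownership policies on top of this kind of low-level atomic building block.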