Reader-Writer Synchronization for Shared-Memory Multiprocessor Real-Time Systems∗


Björn B. Brandenburg and James H. Anderson
Department of Computer Science, University of North Carolina at Chapel Hill

∗Work supported by IBM, Intel, and Sun Corps.; NSF grants CNS 0834270, CNS 0834132, and CNS 0615197; and ARO grant W911NF-06-1-0425.

Abstract

Reader preference, writer preference, and task-fair reader-writer locks are shown to cause undue blocking in multiprocessor real-time systems. A new phase-fair reader-writer lock is proposed as an alternative that significantly reduces worst-case blocking for readers, and an efficient local-spin implementation is provided. Both task- and phase-fair locks are evaluated and contrasted to mutex locks in terms of hard and soft real-time schedulability under consideration of runtime overheads on a multicore computer.

1 Introduction

With the transition to multicore architectures by most (if not all) major chip manufacturers, multiprocessors are now a standard deployment platform for (soft) real-time applications. This has led to renewed interest in real-time multiprocessor scheduling and synchronization algorithms (see [11, 14, 15, 17] for recent comparative studies and relevant references). However, prior work on synchronization has been somewhat limited, being mostly focused on mechanisms that ensure strict mutual exclusion (mutex). Reader-writer (RW) synchronization, which requires mutual exclusion only for updates (and not for reads), has not been considered in prior work on real-time shared-memory multiprocessor systems despite its great practical relevance.

The need for RW synchronization arises naturally in many situations. Two common examples are few-producers/many-consumers relationships (e.g., obtaining and distributing sensor data) and rarely-changing shared state (e.g., configuration data). As an example for the former, consider a robot such as TU Berlin's autonomous helicopter Marvin [29]: its GPS receiver updates the current position estimate 20 times per second, and the latest position estimate is read at various rates by a flight controller and by image acquisition, camera targeting, and communication modules. An example of rarely-changing shared data occurs in work of Gore et al. [21], who employed RW locks to optimize latency in a real-time notification service. In their system, every incoming event must be matched against a shared subscription lookup table to determine the set of subscribing clients. Since events occur very frequently and changes to the table occur only very rarely, the use of a regular mutex lock “can unnecessarily reduce [concurrency] in the critical path of event propagation.” [21]

In practice, RW locks are in wide-spread use since they are supported by all POSIX-compliant real-time operating systems. However, to the best of our knowledge, such locks have not been analyzed in the context of multiprocessor real-time systems from a schedulability perspective. In fact, RW locks that are subject to starvation have been suggested to practitioners without much concern for schedulability [24]. This highlights the need for a well-understood, analytically-sound real-time RW synchronization protocol.
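To make the few-producers/many-consumers pattern concrete, the following minimal sketch (not taken from the paper) shows a single 20 Hz position writer and arbitrarily many readers sharing a position estimate through a POSIX RW lock — the lock type referred to above as being supported by all POSIX-compliant real-time operating systems. The struct and function names are illustrative assumptions; only the pthread_rwlock_* calls are standard API.

    #include <pthread.h>

    /* Illustrative shared state: the latest GPS position estimate. */
    struct position_estimate {
        double lat, lon, alt;
    };

    static struct position_estimate latest_pos;
    static pthread_rwlock_t pos_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Called roughly 20 times per second by the GPS receiver thread (the single writer). */
    void publish_position(const struct position_estimate *p)
    {
        pthread_rwlock_wrlock(&pos_lock);   /* exclusive access for the update */
        latest_pos = *p;
        pthread_rwlock_unlock(&pos_lock);
    }

    /* Called at various rates by flight control, image acquisition, etc. (the readers). */
    void read_position(struct position_estimate *out)
    {
        pthread_rwlock_rdlock(&pos_lock);   /* readers may hold the lock concurrently */
        *out = latest_pos;
        pthread_rwlock_unlock(&pos_lock);
    }

Which blocked request is admitted next under contention — a reader or a writer — is largely implementation-defined for POSIX RW locks, and that admission policy is exactly what determines the worst-case blocking analyzed in this paper.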
Related work. In work on non-real-time systems, Courtois et al. were the first to investigate RW synchronization and proposed two semaphore-based RW locks [18]: a writer preference lock, wherein writers have higher priority than readers, and a reader preference lock, wherein readers have higher priority than writers. Both are problematic for real-time systems as they can give rise to extended delays and even starvation. To improve reader throughput at the expense of writer throughput in large parallel systems, Hsieh and Weihl proposed a semaphore-based RW lock wherein the lock state is distributed across processors [22]. In work on scalable synchronization on shared-memory multiprocessors, Mellor-Crummey and Scott proposed spin-based reader preference, writer preference, and task-fair RW locks [28]. In a task-fair RW lock, readers and writers gain access in strict FIFO order, which avoids starvation (a minimal sketch of this FIFO rule is given below). In the same work, Mellor-Crummey and Scott also proposed local-spin versions of their RW locks, in which excessive memory bus traffic under high contention is avoided. An alternative local-spin implementation of task-fair RW locks was later proposed by Krieger et al. [23]. A probabilistic performance analysis of task-fair RW locks wherein reader arrivals are modeled as a Poisson process was conducted by Reiman and Wright [31].

In work on uniprocessor real-time systems, two relevant suspension-based protocols have been proposed: Baker's stack resource policy [3] and Rajkumar's read-write priority ceiling protocol [30]. The latter has also been studied in the context of distributed real-time databases [30].

An alternative to locking is the use of specialized non-blocking algorithms [1]. However, compared to lock-based RW synchronization, which allows in-place updates, non-blocking approaches usually require additional memory, incur significant copying or retry overheads, and are less general.¹

¹In non-blocking read-write algorithms, the value to be written must be [...]
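To illustrate the task-fair (FIFO) admission rule discussed above, here is a minimal suspension-based sketch built from a mutex, a condition variable, and a ticket counter. It is not Mellor-Crummey and Scott's local-spin queue lock [28] and not the phase-fair lock proposed in this paper; it only demonstrates the strict FIFO ordering, and all identifiers are illustrative.

    #include <pthread.h>

    /* Task-fair (FIFO) RW lock sketch: every arriving request takes a ticket;
     * a request may enter only when its ticket is being served. Consecutive
     * reader tickets are served together, so readers still proceed in parallel. */
    typedef struct {
        pthread_mutex_t m;
        pthread_cond_t  cv;
        unsigned long   next_ticket;    /* next ticket to hand out              */
        unsigned long   now_serving;    /* ticket currently admitted            */
        unsigned long   active_readers; /* readers inside the critical section  */
    } fifo_rwlock_t;

    #define FIFO_RWLOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

    void fifo_read_lock(fifo_rwlock_t *l)
    {
        pthread_mutex_lock(&l->m);
        unsigned long me = l->next_ticket++;
        while (l->now_serving != me)        /* wait for all strictly earlier requests */
            pthread_cond_wait(&l->cv, &l->m);
        l->active_readers++;
        l->now_serving++;                   /* admit the next request if it is also a reader */
        pthread_cond_broadcast(&l->cv);
        pthread_mutex_unlock(&l->m);
    }

    void fifo_read_unlock(fifo_rwlock_t *l)
    {
        pthread_mutex_lock(&l->m);
        if (--l->active_readers == 0)       /* last reader of the phase wakes a waiting writer */
            pthread_cond_broadcast(&l->cv);
        pthread_mutex_unlock(&l->m);
    }

    void fifo_write_lock(fifo_rwlock_t *l)
    {
        pthread_mutex_lock(&l->m);
        unsigned long me = l->next_ticket++;
        while (l->now_serving != me || l->active_readers > 0)
            pthread_cond_wait(&l->cv, &l->m);
        /* now_serving is NOT advanced here, so all later requests keep waiting. */
        pthread_mutex_unlock(&l->m);
    }

    void fifo_write_unlock(fifo_rwlock_t *l)
    {
        pthread_mutex_lock(&l->m);
        l->now_serving++;                   /* hand the lock to the next request in FIFO order */
        pthread_cond_broadcast(&l->cv);
        pthread_mutex_unlock(&l->m);
    }

Note that under this rule a newly arriving reader waits behind every earlier writer in the queue, even while other readers hold the lock; bounding the resulting reader blocking is precisely what motivates the phase-fair alternative proposed in this paper.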
In this paper, we focus on lock-based RW synchronization in cache-coherent shared-memory multiprocessor systems.

Contributions. The contributions of this paper are as follows: (i) We study the applicability of existing RW locks to real-time systems and derive worst-case blocking bounds; (ii) we propose a novel type of RW lock based on the concept of “phase-fairness,” derive corresponding worst-case blocking bounds, and present two efficient implementations; (iii) we report on the results of an extensive performance evaluation of RW synchronization choices in terms of hard and soft real-time schedulability under consideration of system overheads. These experiments show that employing “phase-fair” RW locks can significantly improve real-time performance.

The rest of this paper is organized as follows. We summarize relevant background in Sec. 2, and discuss our synchronization protocol and other RW lock choices in Sec. 3. An empirical performance evaluation is presented in Sec. 4, followed by our conclusions in Sec. 5.

2 Background

We consider the scheduling of a system of sporadic tasks, denoted T_1, ..., T_N, on m processors. The jth job (or invocation) of task T_i is denoted T_i^j. Such a job T_i^j becomes available for execution at its release time, r(T_i^j). Each task is specified by its worst-case (per-job) execution cost, e(T_i), its period, p(T_i), and its relative deadline, d(T_i) ≥ e(T_i). A job T_i^j should complete execution by its absolute deadline, r(T_i^j) + d(T_i), and is tardy if it completes later. The spacing between job releases must satisfy r(T_i^{j+1}) ≥ r(T_i^j) + p(T_i). A job that has not completed execution is either preemptable or non-preemptable. A job scheduled on a processor can only be preempted when it is preemptable. Task T_i's utilization (or weight) reflects the processor share that it requires and is given by e(T_i)/p(T_i).

Multiprocessor scheduling. There are two fundamental approaches to scheduling sporadic tasks on multiprocessors — global and partitioned. With global scheduling, processors are scheduled by selecting jobs from a single, shared queue, whereas with partitioned scheduling, each processor has a pri- [...] algorithm, wherein the earliest-deadline-first (EDF) algorithm is used on each processor, and in the global case, the global EDF (G-EDF) algorithm. However, the synchronization algorithms considered herein apply equally to other scheduling algorithms that assign each job a fixed priority, including partitioned static-priority (P-SP) scheduling. Further, we consider both hard real-time (HRT) systems, in which deadlines should not be missed, and soft real-time systems (SRT), in which bounded deadline tardiness is permissible.

Resources. When a job T_i^j is going to update (observe) the state of a resource ℓ, it issues a write (read) request W_ℓ (R_ℓ) for ℓ and is said to be a writer (reader). In the following discussion, which applies equally to read and write requests, we denote a request for ℓ of either kind as X_ℓ. X_ℓ is satisfied as soon as T_i^j holds ℓ, and completes when T_i^j releases ℓ. |X_ℓ| denotes the maximum time that T_i^j will hold ℓ. T_i^j becomes blocked on X_ℓ if X_ℓ cannot be satisfied immediately. (A resource can be held by multiple jobs simultaneously only if they are all readers.) If T_i^j issues another request X' before X is complete, then X' is nested within X. We assume nesting is proper, i.e., X' must complete no later than X completes. An outermost request is not nested within any other request. A resource is global if it can be requested concurrently by jobs scheduled on different processors, and local otherwise.

Infrequent requests. Some infrequently read or updated resources may not be requested by every job of a task. We associate a request period rp(X) with every request X to allow infrequent resource requests to be represented in the task model without introducing unnecessary pessimism. The request period limits the maximum request frequency: if jobs of T_i issue X and rp(X) = k, then at most every kth job of T_i issues X.

For example, consider a task T_m that reads a sensor ℓ_s at a rate of ten samples per second via a read request R_s (p(T_m) = 100 ms) and that updates a shared variable ℓ_a storing a history of average readings once every two seconds via a write request W_a. In this case, assuming that every job of T_m issues W_a when analyzing contention for ℓ_a would be unnecessarily pessimistic.
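As a concrete check of the request-period notion (this worked calculation is not from the paper; it merely instantiates the definitions above): every job of T_m issues R_s, so rp(R_s) = 1, whereas W_a is issued at most once every two seconds while jobs are released every p(T_m) = 100 ms, so

    rp(W_a) = 2000 ms / 100 ms = 20.

Hence, in any interval in which T_m releases n consecutive jobs, at most ⌈n/20⌉ of them issue W_a. A blocking analysis for ℓ_a can therefore account for at most ⌈n/20⌉ write requests by T_m instead of n, which is exactly the pessimism the request period removes.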