Multiprocessor Real-Time Locking Protocols: a Systematic Review
Total Page:16
File Type:pdf, Size:1020Kb
Revision 1 September 2019 Multiprocessor Real-Time Locking Protocols A Systematic Review Björn B. Brandenburg Max Planck Institute for Software Systems (MPI-SWS) Kaiserslautern, Germany Abstract We systematically survey the literature on analytically sound multiprocessor real-time locking protocols from 1988 until 2018, covering the following topics: • progress mechanisms that prevent the lock-holder preemption problem (Section 3), • spin-lock protocols (Section 4), • binary semaphore protocols (Sections 5 and 6), • independence-preserving (or fully preemptive) locking protocols (Section 7), • reader-writer and k-exclusion synchronization (Section 8), • support for nested critical sections (Section 9), and • implementation and system-integration aspects (Section 10). A special focus is placed on the suspension-oblivious and suspension-aware analysis approaches for semaphore protocols, their respective notions of priority inversion, optimality criteria, lower bounds on arXiv:1909.09600v1 [cs.DC] 20 Sep 2019 maximum priority-inversion blocking, and matching asymptotically optimal locking protocols. 1 Introduction In contrast to the thoroughly explored and well understood uniprocessor real-time synchronization problem, the multiprocessor case considered herein is still the subject of much ongoing work. In particular, uniprocessor real-time locking protocols that are both optimal and practical are readily available since the late 1980s and early 1990s [23, 163, 179], can flexibly support mutual exclusion, © 2017–2019 B. Brandenburg. All rights reserved. Multiprocessor Real-Time Locking Protocols — A Systematic Review B. Brandenburg reader-writer synchronization, multi-unit resources, and have been widely adopted and deployed in industry (in POSIX, OSEK/AUTOSAR, etc.). Not so in the multiprocessor case: there is no widely agreed- upon standard protocol or approach, most proposals have focused exclusively on mutual exclusion to date (with works on reader-writer, k-exclusion, and multi-unit synchronization starting to appear only in the past decade), and questions of optimality, practicality, and industrial adoption are still the subject of ongoing exploration and debate. Another major difference to the uniprocessor case is the extent to which nested critical sections are supported: a large fraction of the work on multiprocessor real-time locking protocols to date has simply disregarded (or defined away) fine-grained nesting (i.e., cases where a task holding a resource may dynamically acquire a second resource), though notable exceptions exist (discussed in Section 9). No such limitations exist in the state of the art w.r.t. uniprocessor real-time synchronization. We can thus expect the field to continue to evolve rapidly: it is simply not yet possible to provide a “final” and comprehensive survey on multiprocessor real-time locking, as many open problems remain to be explored. Nonetheless, a considerable number of results has accumulated since the multiprocessor real-time locking problem was first studied more than three decades ago. The purpose of this survey is to provide a systematic review of the current snapshot of this body of work, covering most papers in this area published until the end of 2018. We restrict the scope of this survey to real-time locking protocols for shared-memory multiprocessors (although some of the techniques discussed in Section 6 could also find applications in distributed sys- tems), and do not consider alternative synchronization strategies such as lock- and wait-free algorithms, transactional memory techniques, or middleware (or database) approaches that provide a higher-level transactional interface or multi-versioned datastore abstraction (as any of these techniques warrants a full survey on its own). We further restrict the focus to runtime mechanisms as commonly provided by real-time operating systems (RTOSs) or programming language runtimes for use in dynamically scheduled systems and exclude fully static planning approaches (e.g., as studied by Xu [212]) that resolve (or avoid) all potential resource conflicts statically during the construction of the system’s scheduling table so that no runtime resource arbitration mechanisms are needed. Our goal is to structure the existing body of knowledge on multiprocessor real-time locking protocols to aid the interested reader in understanding key problems, established solutions, and recurring themes and techniques. We therefore focus primarily on ideas, algorithms, and provable guarantees, and place less emphasis on empirical performance comparisons or the chronological order of developments. 2 Multiprocessor Real-Time Locking Protocols — A Systematic Review B. Brandenburg 2 The Multiprocessor Real-Time Locking Problem We begin by defining the core problems and objectives, summarizing common assumptions, and surveying key design parameters. Consider a shared-memory multiprocessor platform consisting of m identical processors (or cores) hosting n sequential tasks (or threads) denoted as τ1,..., τn. Each activation of a task is called a job. In this document, we use the terms ‘processor‘ and ‘core‘ interchangeably, and do not precisely distinguish between jobs and tasks when the meaning is clear from context. The tasks share a number of software-managed shared resources `1, `2, `3,... that are explicitly acquired and released with lock() and unlock() calls.1 Common examples of such resources include shared data structures, OS-internal structures such as the scheduler’s ready queue(s), I/O ports, memory- mapped device registers, ring buffers, etc. that must be accessed in a mutually exclusive fashion. (We consider weaker exclusion requirements later in Section 8.) The primary objective of the algorithms considered herein is to serialize all resource requests—that is, all critical sections, which are code segments surrounded by matching lock() and unlock() calls—such that the timing constraints of all tasks are met, despite the blocking delays that tasks incur when waiting to lock a contested resource. In particular, in order to give nontrivial response-time guarantees, one must bound the maximum (i.e., worst-case) blocking delay incurred by any task due to contention for shared resources. To this end, a multiprocessor real-time locking protocol determines which type of locks are used, the rules that tasks must follow to request a lock on a shared resource, and how locks interact with the scheduler. We discuss these design questions in more detail below. 2.1 Common Assumptions Although there exists much diversity in system models and general context, most surveyed works share the following basic assumptions (significant deviations will be noted where relevant). Tasks are typically considered to follow a periodic or sporadic activation pattern, where the two models can be used largely interchangeably since hardly any work on multiprocessor real-time locking protocols to date has exploited the knowledge of future periodic arrivals.2 The timing requirements of the tasks are usually expressed as implicit or constrained deadlines (rather than arbitrary deadlines), since arbitrary deadlines that exceed a task’s period (or minimum inter-arrival time) allow for increased contention 1Multicore platforms also often feature hardware-managed, implicitly shared resources such as last-level caches (LLCs), memory controllers, DRAM banks, etc.; techniques for managing such hardware resources are not the subject of this survey. 2 Notable exceptions include proposals by Chen et al. [64] and Shi et al. [181]. 3 Multiprocessor Real-Time Locking Protocols — A Systematic Review B. Brandenburg and cause additional analytical complications. For the purpose of schedulability analysis, but not for the operation of a locking protocol at runtime, it is required to know a worst-case execution time (WCET) bound for each task, usually including the cost of all critical sections (but not including any blocking delays), and also individually for each critical section (i.e., a maximum critical section length must be known for each critical section). Furthermore, to enable a meaningful blocking analysis, the maximum number of critical sections in each job (i.e., the maximum number of lock() calls per activation of each task) must be known on a per-resource basis. While it is generally impossible to make strong a priori response-time guarantees without this information being available at analysis time (for at least some of the tasks), to be practical, it is generally desirable for a locking protocol to work as intended even if this information is unknown at runtime. For example, RTOSs are typically used for many purposes, and not all workloads will be subject to static analysis—the implemented locking protocol should function correctly and predictably nonetheless. Similarly, during early prototyping phases, sound bounds are usually not yet available, but the RTOS is expected to behave just as it will in the final version of the system. With regard to scheduling, most of the covered papers assume either partitioned or global multiprocessor scheduling. Under partitioned scheduling, each task is statically assigned to exactly one of the m processors (i.e., its partition), and each processor is scheduled using a uniprocessor policy such as fixed-priority (FP) scheduling or earliest-deadline first (EDF) scheduling. The two most prominent partitioned policies are partitioned FP (P-FP) and partitioned EDF (P-EDF) scheduling. Under global scheduling, all tasks are dynamically dispatched at runtime, may