Reader-Writer Synchronization for Shared-Memory Multiprocessor Real-Time Systems∗
Total Page:16
File Type:pdf, Size:1020Kb
Reader-Writer Synchronization for Shared-Memory Multiprocessor Real-Time Systems∗ Bjorn¨ B. Brandenburg and James H. Anderson Department of Computer Science, University of North Carolina at Chapel Hill Abstract lar mutex lock “can unnecessarily reduce [concurrency] in the Reader preference, writer preference, and task-fair reader- critical path of event propagation.” [21] writer locks are shown to cause undue blocking in multipro- In practice, RW locks are in wide-spread use since they are cessor real-time systems. A new phase-fair reader-writer lock supported by all POSIX-compliant real-time operating sys- is proposed as an alternative that significantly reduces worst- tems. However, to the best of our knowledge, such locks have case blocking for readers and an efficient local-spin imple- not been analyzed in the context of multiprocessor real-time mentation is provided. Both task- and phase-fair locks are systems from a schedulability perspective. In fact, RW locks evaluated and contrasted to mutex locks in terms of hard and that are subject to starvation have been suggested to practi- soft real-time schedulability under consideration of runtime tioners without much concern for schedulability [24]. This overheads on a multicore computer. highlights the need for a well-understood analytically-sound real-time RW synchronization protocol. 1 Introduction Related work. In work on non-real-time systems, Cour- tois et al. were the first to investigate RW synchronization and With the transition to multicore architectures by most (if not proposed two semaphore-based RW locks [18]: a writer pref- all) major chip manufacturers, multiprocessors are now a stan- erence lock, wherein writers have higher priority than read- dard deployment platform for (soft) real-time applications. ers, and a reader preference lock, wherein readers have higher This has led to renewed interest in real-time multiprocessor priority than writers. Both are problematic for real-time sys- scheduling and synchronization algorithms (see [11, 14, 15, tems as they can give rise to extended delays and even starva- 17] for recent comparative studies and relevant references). tion. To improve reader throughput at the expense of writer However, prior work on synchronization has been somewhat throughput in large parallel systems, Hsieh and Weihl pro- limited, being mostly focused on mechanisms that ensure posed a semaphore-based RW lock wherein the lock state strict mutual exclusion (mutex). Reader-writer (RW) synchro- is distributed across processors [22]. In work on scalable nization, which requires mutual exclusion only for updates synchronization on shared-memory multiprocessors, Mellor- (and not for reads) has not been considered in prior work on Crummey and Scott proposed spin-based reader preference, real-time shared-memory multiprocessor systems despite its writer preference, and task-fair RW locks [28]. In a task- great practical relevance. fair RW lock, readers and writers gain access in strict FIFO The need for RW synchronization arises naturally in many order, which avoids starvation. In the same work, Mellor- situations. Two common examples are few-producers/many- Crummey and Scott also proposed local-spin versions of their consumers relationships (e.g., obtaining and distributing sen- RW locks, in which excessive memory bus traffic under high sor data) and rarely-changing shared state (e.g., configuration contention is avoided. An alternative local-spin implementa- data). As an example for the former, consider a robot such as tion of task-fair RW locks was later proposed by Krieger et TU Berlin’s autonomous helicopter Marvin [29]: its GPS re- al. [23]. A probabilistic performance analysis of task-fair RW ceiver updates the current position estimate 20 times per sec- locks wherein reader arrivals are modeled as a Poisson pro- ond, and the latest position estimate is read at various rates by cess was conducted by Reiman and Wright [31]. a flight controller and by image acquisition, camera targeting, In work on uniprocessor real-time systems, two relevant and communication modules. An example of rarely-changing suspension-based protocols have been proposed: Baker’s shared data occurs in work of Gore et al. [21], who employed stack resource policy [3] and Rajkumar’s read-write priority RW locks to optimize latency in a real-time notification ser- ceiling protocol [30]. The latter has also been studied in the vice. In their system, every incoming event must be matched context of distributed real-time databases [30]. against a shared subscription lookup table to determine the set An alternative to locking is the use of specialized non- of subscribing clients. Since events occur very frequently and blocking algorithms [1]. However, compared to lock-based changes to the table occur only very rarely, the use of a regu- RW synchronization, which allows in-place updates, non- blocking approaches usually require additional memory, incur ∗Work supported by IBM, Intel, and Sun Corps.; NSF grants CNS significant copying or retry overheads, and are less general.1 0834270, CNS 0834132, and CNS 0615197; and ARO grant W911NF-06- 1-0425. 1In non-blocking read-write algorithms, the value to be written must be 1 In this paper, we focus on lock-based RW synchronization in algorithm, wherein the earliest-deadline-first (EDF) algo- cache-coherent shared-memory multiprocessor systems. rithm is used on each processor, and in the global case, the Contributions. The contributions of this paper are as fol- global EDF (G-EDF) algorithm. However, the synchro- lows: (i) We study the applicability of existing RW locks nization algorithms considered herein apply equally to other to real-time systems and derive worst-case blocking bounds; scheduling algorithms that assign each job a fixed priority, (ii) we propose a novel type of RW lock based on the concept including partitioned static-priority (P-SP) scheduling. Fur- of “phase-fairness,” derive corresponding worst-case blocking ther, we consider both hard real-time (HRT) systems in which bounds, and present two efficient implementations; (iii) we deadlines should not be missed, and soft real-time systems report on the results of an extensive performance evaluation (SRT) in which bounded deadline tardiness is permissible. of RW synchronization choices in terms of hard and soft j Resources. When a job Ti is going to update (observe) the real-time schedulability under consideration of system over- state of a resource `, it issues a write (read) request ` ( `) heads. These experiments show that employing “phase-fair” for ` and is said to be a writer (reader). In the followingW R RW locks can significantly improve real-time performance. discussion, which applies equally to read and write requests, The rest of this paper is organized as follows. We sum- we denote a request for ` of either kind as `. marize relevant background in Sec. 2, and discuss our syn- j X ` is satisfied as soon as Ti holds `, and completes when chronization protocol and other RW lock choices in Sec. 3. j X j Ti releases `. ` denotes the maximum time that Ti will An empirical performance evaluation is presented in Sec. 4, j jX j hold `. Ti becomes blocked on ` if ` cannot be satisfied followed by our conclusions in Sec. 5. immediately. (A resource can be held byX multiple jobs simul- j taneously only if they are all readers.) If Ti issues another 2 Background request 0 before is complete, then 0 is nested within . We assumeX nestingX is proper, i.e., 0 mustX complete no laterX We consider the scheduling of a system of sporadic tasks, de- than completes. An outermost requestX is not nested within noted T ;:::;T , on m processors. The jth job (or invoca- 1 N any otherX request. A resource is global if it can be requested tion) of task T is denoted T j. Such a job T j becomes avail- i i i concurrently by jobs scheduled on different processors, and able for execution at its release time, r j . Each task is (Ti ) Ti local otherwise. specified by its worst-case (per-job) execution cost, e(Ti), its Infrequent requests. Some infrequently read or updated re- period, p(Ti), and its relative deadline, d(Ti) e(Ti).A j ≥ sources may not be requested by every job of a task. We as- job Ti should complete execution by its absolute deadline, j sociate a request period rp( ) with every request r(T ) + d(T ), and is tardy if it completes later. The spacing N i i to allow infrequent resource requestsX 2 to be represented in theX between job releases must satisfy r(T j+1) r(T j) + p(T ). i i i task model without introducing unnecessary pessimism. The A job that has not completed execution is either≥ preemptable request period limits the maximum request frequency: if jobs or non-preemptable. A job scheduled on a processor can only of T issue and rp( ) = k, then at most every kth job of be preempted when it is preemptable. Task T ’s utilization i i T issues X. X (or weight) reflects the processor share that it requires and is i X For example, consider a task Tm that reads a sensor `s given by e(Ti)=p(Ti). at a rate of ten samples per second via a read request s Multiprocessor scheduling. There are two fundamental ap- R (p(Tm) = 100ms) and that updates a shared variable `a stor- proaches to scheduling sporadic tasks on multiprocessors — ing a history of average readings once every two seconds via global and partitioned. With global scheduling, processors a write request a. In this case, assuming that every job of are scheduled by selecting jobs from a single, shared queue, W Tm issues a when analyzing contention for `a would be whereas with partitioned scheduling, each processor has a pri- unnecessarilyW pessimistic.