Fast on Average, Predictable in the Worst Case: Exploring Real-Time Futexes in LITMUSRT

Roy Spliet (MPI-SWS), Manohar Vanga (MPI-SWS), Björn B. Brandenburg (MPI-SWS), Sven Dziadek (TU Dresden)
[email protected] [email protected] [email protected] [email protected]

Abstract—This paper explores the problem of how to improve the average-case performance of real-time locking protocols, preferably without significantly deteriorating worst-case performance. Motivated by the futex implementation in Linux, where uncontended locks under the Priority Inheritance Protocol (PIP) do not incur kernel-switching overheads, we extend this concept to more sophisticated protocols; namely the PCP, the MPCP and the FMLP+. We identify the challenges involved in implementing futexes in these protocols and present the design and evaluation of their implementation under LITMUSRT, a real-time extension of the Linux kernel. Our evaluation shows that the average-case locking overhead is improved by up to 87% for some scenarios, but it does come at a price in the worst-case overhead.

I. INTRODUCTION

Suspension-based real-time locking protocols, such as the Priority Inheritance Protocol (PIP), are used in real-time systems to ensure mutually exclusive access to shared resources while preventing unbounded priority inversions, where a higher-priority task that should be scheduled can be delayed by lower-priority tasks for potentially unbounded durations. Such blocking, termed priority-inversion blocking (henceforth pi-blocking), increases worst-case response times of tasks. Various protocols have been proposed which provide different bounds on worst-case blocking time. The choice of real-time locking protocol is thus critical in the design of real-time systems, as it affects their schedulability.

Practical systems, in addition to accounting for pi-blocking, must also carefully account for the system overheads associated with lock acquisition and lock release operations. These overheads typically comprise both hardware overheads, such as those incurred by protection mode switches, as well as bookkeeping code in the kernel. Failure to account for these overheads can lead to execution-time underestimates and consequently to deadline misses. Ideally, one would choose an optimal locking protocol, such as the Priority Ceiling Protocol (PCP), provided that an efficient implementation is available.

The majority of prior work dealing with the efficient implementation of real-time locking protocols has focused on worst-case overheads. However, there are numerous workloads that can benefit from faster average-case overheads, where locks are typically uncontended. A prime example is that of soft real-time workloads such as video playback, where locks may be acquired by threads of different priorities many thousands of times per second [7]. Reducing average-case locking overhead in such applications allows for supporting much higher throughput (e.g. a higher frame rate) and reduces the number of deadline misses. Another advantage is in mixed-criticality systems, where optimizing for the average case can help avoid deadline misses in low-criticality tasks. While these examples motivate the need for low average-case overheads, system schedulability is ultimately based on the worst-case overheads. Thus, average-case overhead reductions must not significantly increase worst-case overheads.

An effective approach for achieving these goals is the support for fast userspace mutexes (futexes) [11] in the Linux kernel, a mechanism that supports efficient lock implementations with low average-case overheads. By exporting lock-state information to userspace, expensive system calls can be avoided when a lock is uncontended. Linux's PIP implementation uses futexes to reduce the average-case locking overhead.

While prior work [1] has successfully applied the futex approach to the PIP, the approach does not easily generalize to more sophisticated protocols. This is problematic, as the PIP is limited in its theoretical properties: it has non-optimal bounds on pi-blocking and, unlike the PCP, is susceptible to deadlocks. Additionally, protocols such as the FIFO Multiprocessor Locking Protocol (FMLP+) are asymptotically optimal under clustered Job-Level Fixed-Priority scheduling, a property not shared by the PIP. However, it is unclear whether the futex approach can be extended to improve the average-case performance of these protocols. In fact, even though the Pthreads package provides an implementation of the PCP (called PRIO_PROTECT) which uses the futex API, the PRIO_PROTECT implementation does not adhere to the futex philosophy and requires multiple system calls for uncontended lock operations.

The key challenge in generalizing the PIP futex implementation to more sophisticated protocols boils down to the following difference: the PIP can be considered a reactive locking protocol, in that it makes all its decisions in reaction to lock acquisition or lock release operations. In contrast, the PCP and multiprocessor locking protocols such as the MPCP and FMLP+ are what we term anticipatory locking protocols: they make use of additional information about the workload to anticipate problematic scenarios and prevent them. For example, in the PCP, each lock is associated with a priority ceiling that specifies the highest-priority task that may acquire it. This can result in scenarios where a lock is uncontended but the semantics of the protocol forbid a task from acquiring it. Similarly, the MPCP and FMLP+ keep track of additional information to ensure bounded remote blocking via the technique of priority boosting, where jobs in critical sections are boosted in priority to ensure progress.

This motivates the key question of this paper: can the futex approach be extended to support anticipatory real-time locking protocols without violating their semantics? Further, can this be done without significantly increasing the worst-case overheads? In this paper, we show that this is indeed possible and we describe the key properties that help us achieve this for three protocols: the PCP, MPCP and FMLP+. We implemented futex-based versions of these protocols in LITMUSRT, called the PCP-DU, MPCP-DU and FMLP+-DU. Here, "DU" stands for deferred update, as our implementations realize futexes by deferring the update of the state of a lock until the next time the scheduler is called.

A. Contributions

In this paper, we make the following contributions:
• We identify key properties of the PCP, MPCP and FMLP+ (Secs. IV and V) that allow us to extend the futex approach to anticipatory real-time locking protocols.
• We describe futex-based implementations of these protocols (Secs. IV and V) in LITMUSRT: the PCP-DU, MPCP-DU and FMLP+-DU respectively.
• We evaluated these protocols (Sec. VI) and observe up to 93% reduction in overheads in certain uncontended scenarios, and at most a 22% increase in contended overheads.

We begin by providing the necessary background information before presenting the design, implementation details and evaluation of the improved implementations.

II. BACKGROUND AND DEFINITIONS

We consider real-time scheduling under a system comprising a set of n tasks {T1, ..., Tn} running on a set of m processors {P1, ..., Pm}, where each task Ti releases a series of jobs, the jth of which is denoted Ji,j. We assume the sporadic task model, where each task is defined by a worst-case execution time (WCET) ei, a minimum inter-arrival time or period pi, and a relative deadline di. There is a set of shared resources R = {r1, ..., rn} in the system, each associated with a lock.

We assume Partitioned Fixed-Priority (P-FP) scheduling [9], where the set of tasks is partitioned among the m processors in the system and each processor is scheduled under a fixed-priority scheduler. For a given task Ti, the index i refers to its base priority (T1 having the highest priority and Tn the lowest). Tasks are scheduled based on their effective priority, which may exceed the base priority under certain resource-sharing policies. For instance, the PIP may temporarily raise the effective priority of a task.

It should be noted that although we use P-FP scheduling throughout this paper, the techniques presented are not limited to P-FP and generalize to other scheduling algorithms.

A. Futexes

Futexes, short for fast userspace mutexes, are a mechanism found in the Linux kernel to support the implementation of suspension-based locks in userspace [11]. A basic futex-enabled lock consists of (i) a shared integer that contains the current state of the lock, and (ii) a kernel-side wait-queue exposed through the futex API. The integer represents the number of tasks contending for the lock: 0 means free, 1 means acquired, and any larger value indicates that there are jobs blocking on this mutex. Lock and unlock operations are implemented using atomic fetch-and-increment and fetch-and-decrement respectively, where the old value indicates whether the operation succeeded or some further action is required. Further action involves an expensive system call to the kernel in order to manipulate the wait-queue.

B. Locking in LITMUSRT

LITMUSRT is a real-time extension of the Linux kernel [8], providing a testbed for novel scheduling and locking algorithms. It supports the sporadic task model and modular scheduler plugins. Besides several multiprocessor scheduling policies, LITMUSRT also implements a variety of locking protocols, including the PCP, MPCP, FMLP [6] and the FMLP+. These implementations have been designed primarily with low worst-case locking overheads in mind.

Real-time tasks use the mutex-based locking API of LITMUSRT to ensure mutually exclusive access to shared resources. Tasks use special system calls provided by LITMUSRT to create locks associated with a particular protocol, and to lock and unlock them. These routines track the state of a lock and take care of task blocking when the protocol forbids lock acquisition for a particular task. We implemented our protocols in LITMUSRT to benefit from the generic scheduling framework and extensible API it offers.

C. Real-Time Locking Protocols

The simplest suspension-based real-time locking protocol is the Priority Inheritance Protocol (PIP) proposed by Sha et al. [14]. Under the PIP, when a high-priority task blocks on a lock currently held by a lower-priority task, the latter inherits the priority of the high-priority task, thus preventing unbounded priority inversions. PRIO_INHERIT is the Linux implementation of the PIP, which uses the futex API and consequently does not incur system-call overheads for uncontended lock operations.

Although PRIO_INHERIT provides low average-case overheads, the PIP has several properties that make it sub-optimal for use in certain applications. First, locking under the PIP is susceptible to transitive pi-blocking, which occurs when a chain of blocked jobs forms where each job blocks on a resource currently held by the next job in the (ascending priority-ordered) chain. The chain is resolved in reverse priority order, so that the highest-priority task in the chain is blocked for the duration of all critical sections in the chain. Second, inconsistently ordered nested locks are susceptible to deadlocks under the PIP. These problems discourage using the PIP from an analytical point of view.

D. Priority Ceiling Protocol (PCP)

The Priority Ceiling Protocol [14] addresses the shortcomings of the PIP. It is optimal on uniprocessors: jobs experience pi-blocking of at most one outermost critical section. Furthermore, through precautionary blocking of jobs in anticipation of cyclic dependencies, deadlocks are avoided entirely.

In the PCP, every resource ri is assigned a priority ceiling, ceil(ri), which is the highest priority of all tasks that may lock that resource. During run-time, the scheduler keeps track of a system ceiling (denoted sysceil(t)), which is the currently held lock with the highest priority ceiling. A job is only allowed to acquire a lock if (i) its priority exceeds the priority of the system ceiling, or (ii) the system ceiling is currently owned by the requesting job. If neither of these conditions is satisfied, the requesting job is blocked and, if the blocked job has a higher effective priority than that of the blocking job, its priority is inherited by the current owner of the system ceiling. When a lock is released, the system ceiling is lowered to the highest-priority waiter, who then becomes the new owner of the system ceiling.

LITMUSRT implements the Classic or Original PCP (OPCP), where the priority of a job is raised only when a higher-priority job becomes blocked on a lock currently held by the lower-priority job. In particular, the priority of the lower-priority job is raised only to that of the highest-priority job currently blocked by it.

The pthreads implementation of the PCP, on the other hand (also known as PRIO_PROTECT), is an implementation of the Immediate Priority Ceiling Protocol (IPCP), which is subtly different from the OPCP. The difference is that when a job acquires a lock, its priority is immediately raised to the priority ceiling of that lock using priority inheritance. While this provides less accurate control over the effective priority, it has the advantage of reducing the number of preemptions in the system and thus the runtime overhead.

Although PRIO_PROTECT is implemented in Linux using the futex API, it does not enjoy the benefits of the futex approach. Anticipatory properties are not captured in the futex API, which for the PCP implies that a system call is still required to raise or lower the job's priority, even in the uncontended case. PRIO_PROTECT thus does not enjoy the benefits of the futex approach for uncontended lock operations.

E. Multiprocessor Priority Ceiling Protocol (MPCP)

The introduction of parallelism in multiprocessor systems adds a new dimension of complexity to real-time locking protocols, as they now need to account for remote blocking, where a job is blocked by jobs on different processors. Unfortunately, the progress guarantee of the PIP, which is also a key component of the PCP, breaks across partition boundaries and does not ensure the progress of resource-holding jobs.

Rajkumar et al. proposed the Multiprocessor Priority Ceiling Protocol (MPCP) [13], the first shared-memory multiprocessor real-time locking protocol, which solves the problems associated with parallelism through priority boosting. Priority boosting is a technique where the priority of jobs in critical sections is temporarily raised to a level higher than any used base priority. These jobs are thus guaranteed to progress, as they cannot be interrupted by newly released jobs (which have not entered a critical section yet); boosted jobs can, however, still be preempted by other jobs in the boosted state. Under the MPCP, for every partition in the system, each lock is assigned a priority ceiling, which is the highest priority of all tasks on every other partition accessing this lock. When entering a critical section, the priority of a job running in a given partition is boosted to a value relative to the ceiling for this partition.

Deadlocks are avoided in the MPCP by prohibiting the nesting of locks. Waiting jobs are ordered by their base priority; when a critical section completes, jobs blocked on that lock are resumed in order of priority.

LITMUSRT implements the MPCP using the same APIs as the PCP. Upon opening a lock, its priority ceilings for all other partitions in the system are set up. When a job tries to acquire the lock, its priority is boosted and an attempt is made to obtain the lock. On failure, the job is added to the priority wait-queue associated with the lock and is suspended. When the lock is unlocked, the job is restored to its base priority and the next job on the wait-queue is resumed.

F. FIFO Multiprocessor Locking Protocol (FMLP+)

The FIFO Multiprocessor Locking Protocol (FMLP+) [5] is a suspension-based locking protocol for partitioned scheduling with an asymptotically optimal bound on maximum suspension-aware pi-blocking (O(n)). While the FMLP+ uses priority boosting similar to the MPCP to ensure the progress of jobs in critical sections, the crucial difference is that it uses FIFO ordering for jobs in critical sections as well as in the wait-queue. Similar to the MPCP, nesting of critical sections is forbidden, thus avoiding deadlocks.

The FMLP+ implementation in LITMUSRT differs slightly from the MPCP: no ceilings need to be determined when opening a lock, the per-lock priority wait-queue is replaced with a FIFO-ordered wait-queue, and the priority-boosting mechanism orders boosted jobs based on the time at which they made their lock requests instead of the ceilings of the locks.

Having described the necessary background, we now present the key insights for the three protocols, namely the PCP, MPCP, and FMLP+, that allow us to implement futex-based versions, along with their implementation under LITMUSRT.

III. DESIGN AND IMPLEMENTATION

In the following two sections, we identify the challenges in designing futex-based versions of three real-time locking protocols: the PCP, MPCP, and FMLP+. We chose these protocols because they are the base implementations in LITMUSRT and are known to be state of the art for uniprocessors and multiprocessors respectively.

For each of these three protocols, we identify scenarios that allow for optimization of the uncontended case. At a high level, we point out scenarios that allow us to safely defer the updating of the global lock and scheduler state until later (specifically, the next context switch), without violating the semantics of the protocol. We then present the design and implementation of our futex-based versions (the PCP-DU, MPCP-DU, and FMLP+-DU respectively), with the goal of reducing the overhead of uncontended lock and unlock operations while minimizing the impact on the worst-case overhead. We begin with the PCP and then apply what we learn there to the MPCP and the FMLP+.

IV. UNIPROCESSOR PROTOCOLS - PCP

As explained in Section II-D, the PCP is an anticipatory uniprocessor suspension-based locking protocol where each lock has a priority ceiling, defined as the highest priority of all tasks using that lock. Acquiring locks potentially raises the system ceiling, preventing other tasks with a priority below the system ceiling from acquiring any locks. Similarly, releasing locks may lower the system ceiling, resulting in blocked tasks waking up. Correctly performing these operations requires an accurate method of keeping track of the system ceiling at any given time, which requires a global view of the state of all the locks in the system.

The current implementation of the PCP in Linux (PRIO_PROTECT) maintains the status of all locks inside the kernel and updates it on every lock and unlock operation. This works because each of these operations is performed through a system call, allowing the kernel to track the system ceiling. However, this is at odds with the goal of allowing userspace processes to modify lock state without any kernel intervention, as is the case with futexes. In order to support this, processes must be able to communicate this state to the kernel. Further, this state should only be communicated to the kernel if and when strictly needed, in order to avoid unnecessary overhead.

We first identify two key properties of the PCP that allow us to build such an "asynchronous" locking system: (i) the success or failure of a lock acquisition by a particular job can be determined prior to it being scheduled, and (ii) since the PCP is a uniprocessor locking protocol, the state of a lock (i.e. whether it is locked by someone or not) needs to be known only when a context switch occurs. These two properties together allow us to implement futexes for the PCP.

These properties are derived from the observation that in the PCP, the outcome of a lock operation depends only upon the effective priority of the job and the current system ceiling, both of which are known by the scheduler when a job is about to be scheduled. Further, since the PCP is a uniprocessor protocol, neither of these two factors can change without the scheduler being invoked, as the only events that trigger changes in either of them are always preceded by a context switch. For example, a higher-priority task may be released which acquires a lock, but not before preempting the old task, thus invoking the scheduler. The same rationale holds for unlocking: whether or not jobs are blocked on the system ceiling will not change within a scheduling interval of the system ceiling's owner.

This immediately tells us that the latest time until which we can defer communication of the state of locks back to the kernel is a context switch. More importantly, it is possible to predict the outcome of the lock and unlock operations during an interval from when a job is scheduled to the time it is preempted and the scheduler is invoked.

A. Implementation of PCP-DU

To implement futexes, all that is needed is a bidirectional communication channel between a userspace process and the kernel to exchange lock-state information. Userspace processes need to be able to convey the state of locks to the kernel on being preempted. This can be done with a single bit for each lock in the system, which is set by the userspace process and can later be read out by the kernel when needed (i.e. when a context switch occurs).

In addition, the kernel needs to be able to communicate two pieces of information: first, whether the userspace process can successfully acquire locks, and second, whether an unlock operation requires a scheduler invocation in order to wake up blocked higher-priority tasks. If the kernel is able to communicate this information to every task, system calls can be avoided during lock and unlock operations unless strictly needed.

In our implementation, called PCP-DU, standing for PCP with deferred updates, each task shares a memory page with the kernel with the structure shown in Listing 1. This shared page acts as the bidirectional communication channel between the task and the kernel.

    struct lock_page {
        bitmap_t locked;
        bool can_lock;
        int unlock_syscall;
    };

Listing 1: Structure of the lock page

Here, the locked bitmap stores the state of each lock the task may hold, the can_lock bit is set by the kernel to communicate to a task whether it may acquire locks without kernel intervention, and the unlock_syscall field is a reference counter specifying whether there are higher-priority tasks blocked on a lock or not. In case of a non-zero value of unlock_syscall, the task should invoke the scheduler upon releasing a lock.

PCP-DU requires two additions to the existing scheduler: updating the global state of the locks touched by the current job, and determining whether the next job can successfully acquire locks without kernel intervention or not. For updating the global state of the locks, the scheduler compares the locked bitmap from the current job's lock page with a previously cached copy to determine the relevant events in the past interval. It then updates the global lock state accordingly. The kernel updates this cached copy of the locked bitmap on every context switch. After updating the lock states, the scheduler determines whether the next job is allowed to lock by comparing its effective priority with the priority of the system ceiling. As mentioned earlier, this is communicated to the process using the can_lock bit in the lock page.

a) Lock acquisition: Lock acquisition consists of first checking the can_lock boolean. If it is unset, the scheduler is invoked through a system call which blocks the process. If the can_lock boolean is set, the task need simply set the bit in the locked bitmap corresponding to the lock it wishes to acquire. It should be noted at this point that checking the can_lock boolean and setting a bit in the bitmap must be done atomically. Otherwise, a preemption occurring right after the can_lock boolean is checked can result in an invalid lock operation. This atomicity can be achieved by dedicating one bit in the bitmap to the can_lock field.

In addition to the approach described above (called PCP-DU-BOOL), we also propose a novel approach that uses the exception-handling mechanism available in the memory management unit (MMU) of modern systems to communicate the can_lock bit. In this approach, the kernel sets the permission of the lock page to read-only if the process is not allowed to lock. Lock acquisition thus consists of attempting to write a bit to the bitmap. The lock is acquired when this write operation succeeds; otherwise an access-violation fault is triggered which automatically invokes the kernel, allowing it to block the requesting process. This implementation, called PCP-DU-PF, has the advantage that locking consists of writing only a single bit in memory. There is no need for atomic operations and the code can be made sequential, thus avoiding branch mispredictions.

b) Lock release: For unlocking in userspace, the kernel needs to efficiently determine whether a system call is required. We propose an implementation where atomicity is not required between unlocking (unsetting the desired bit in the locked bitmap) and checking whether the kernel should be invoked for unblocking jobs. As an added benefit, we trivially update the decision every time a job blocks, thus simplifying the logic needed at every context switch.

The basis for this mechanism is the unlock_syscall field, a reference counter that indicates whether the job should invoke the scheduler during an unlock operation. When a job blocks on the system ceiling, its owner's unlock_syscall field is incremented. On unlocking, this counter is read out. If it is zero, no jobs are blocked on any of the locks of the current job, and thus no action is required. A non-zero value, however, indicates blocked higher-priority tasks, and the task invokes the kernel. At this point, the kernel wakes up every task now exceeding the system ceiling, and the number of woken-up jobs is subtracted from the counter.

Atomicity between unlocking and checking the unlock_syscall counter is not necessary, because in the rare case of a preemption the kernel will update the global lock state based on the lock page and wake up any jobs required, updating the unlock_syscall counter. In the absolute worst case, the kernel will be invoked although the work was already done. This causes some overhead, but not incorrect behaviour.

We acknowledge that this mechanism does not keep track of which acquired lock(s) define the system ceiling. In the uncommon case of contended nested locks, it could happen that an unlock does not lead to a lower system ceiling. An implementation that accounts for this corner case would defeat the goals of this project; the more complex logic required would increase both the average- and worst-case lock times in return for efficiency in an uncommon case.

V. MULTIPROCESSOR PROTOCOLS - MPCP AND FMLP+

While with the PCP we were able to defer the update of the global system state until the next context switch, this is not entirely possible with the MPCP and FMLP+. Multiprocessor locking protocols suffer from remote blocking, where a local job may be blocked by jobs on a different processor. While in the PCP a switch to another job was always preceded by a context switch, this is no longer the case, since a lock may be acquired by a job running in parallel on a remote processor.

Both the MPCP and FMLP+ use a technique termed priority boosting to ensure the progress of critical sections. Priority boosting works by increasing the priority of jobs in critical sections beyond that of any local job, thus ensuring that they can never be preempted. Recall that in partitioned scheduling, as long as the priority is boosted past that of the highest-priority local job not within a critical section, we can guarantee that the boosted job will not be preempted by such jobs. However, since boosted jobs are free to preempt other boosted jobs, each lock is assigned a priority ceiling, which is the highest priority of all tasks on every other partition accessing this lock.

The key point is that priorities are only relevant within the same partition [5]; the difference in priorities of two jobs running on different partitions is meaningless. As the boosted priority also has no effect on the order in which a lock is granted under the MPCP (which uses the base priorities of the jobs) or the FMLP+ (which uses the time of the request), any absolute or relative variation in boost priorities across processor boundaries has no effect on scheduling.

Thus, we observe that the boosting of priority, similar to priority inheritance, can be deferred until a context switch occurs. This is because the preemption of a job is always preceded by a context switch, at which point the kernel can observe that the job "boosted itself" and perform the actual boosting of priority. That is, it is safe to defer the actual boosting of a job's priority until a context switch occurs.

A. Implementation

For MPCP-DU and FMLP+-DU, we use the same lock-page communication channel as for PCP-DU. The structure of the lock page differs slightly, as shown in Listing 2.

    struct lock_page {
        bitmap_t boost;
        bool unlock_syscall;
    };

Listing 2: Structure of the lock page

Recall that in order to implement deferred priority boosting, a job needs to be able to communicate to the kernel whether it boosted itself or not. This is implemented in MPCP-DU and FMLP+-DU through the boost bitmap, which represents the locks that should be considered for boosting (i.e. the job plans to acquire them). The boost priority is the highest priority ceiling of all locks a job has set to the boosted state.

It should be noted that since a lock may be acquired on a remote processor without a local context switch preceding it, we can no longer store the state of the lock in the per-process structure. Instead, this state is shared among all processes in the form of a single shared page of memory comprising a reference counter. Remote blockers make their presence known by attempting an atomic increment of the lock counter. If the lock is currently acquired, they read a non-zero value and block. When the lock is released by the owner, the owner will notice that the reference count is non-zero and invoke the scheduler (which proceeds to wake up the remote blocked job).

The unlock_syscall field is again used by the kernel to communicate to a task when an unlock operation, which results in the priority being lowered again, should call into the kernel. This may occur if there are multiple boosted jobs and the highest-priority one unlocks. The resulting drop in priority of that job should result in the next boosted task being scheduled.

c) Kernel operations during preemption: On every scheduling operation, the kernel performs two additional actions atop regular scheduling: (i) determining the priority of the current task, and (ii) determining whether the next task should invoke the scheduler upon unlocking.

For determining the priority of the current job, the kernel compares its boost bitmap with a cached copy obtained during the previous context switch. When there is a difference, the priority is updated accordingly. For FMLP+-DU, a set most-significant bit in the boost bitmap will result in a new boost priority being set.

To determine whether the next task should invoke the scheduler on an unlock, the scheduler compares the base priority of the next scheduled job with every other effective priority on the local ready-queue. If there exists a job with a higher effective priority, the unlock_syscall flag on the job's lock page is set.

d) Lock acquisition: For lock acquisition in MPCP-DU and FMLP+-DU, as mentioned before, we have to extend the futex with a mechanism that supports deferred priority boosting. For this, the boost bitmap is added to the shared lock page. Like the locked bitmap in PCP-DU, each bit corresponds to a single lock. When a job intends to acquire a lock, it first sets the boost bit. On a context switch these bits are interpreted and the boost priority is set accordingly.

For FMLP+-DU, the priority-boosting mechanism must be able to distinguish between a critical section that lasted for the entire interval a job was scheduled, and the case in which the job releases and re-acquires the lock within this interval. In the latter case a new boost priority must be determined. For this purpose, the most significant bit of the boost bitmap is set by the FMLP+-DU implementation on every lock operation. The scheduler clears this bit before scheduling a job, helping it detect whether any resource was locked during the scheduled interval, under the assumption that locks are not nested, as required by the FMLP+ specification.

Interrupts between setting the boost bit and obtaining the lock cannot result in incorrect behaviour. Preemption cannot occur on such an interrupt, because the priority of the job already exceeds all local base priorities. It can thus continue the locking operation unless a job with a higher boost priority is unblocked. In this latter case the job is correctly preempted.

e) Lock release: For releasing a lock under the MPCP and FMLP+, the boosted priority is dropped. In some cases this should lead to a preemption even if the mutex being unlocked is not contended. We thus need to extend the futex mechanism with a check of whether the scheduler should be invoked on an uncontended unlock.

The unlock_syscall field in the lock page is a boolean that conveys this information. When a futex unlock is uncontended, this bit is tested; when it is set, the boost bit is unset, after which the kernel scheduler is called.

For lock operations that follow the slow futex path, unboosting is handled immediately by the kernel prior to a possible preemption to guarantee a correct scheduling decision. In this case userspace does not have to take any further action to unboost the priority.

B. Further Optimizations

In addition to the presented techniques, we have identified two performance optimizations for the implementations of the MPCP and the FMLP+ in LITMUSRT that improve performance in both the average and the worst case.

For both these protocols, a spinlock is used inside the kernel to protect the state of the per-mutex wait-queue. During lock acquisition, the critical section consists of adding the current job to the wait-queue. Initially, the unlock critical section consisted of taking the first element off the wait-queue and waking it up using wake_up_task(). We have observed that wake_up_task() can take several thousand cycles due to raising an inter-processor scheduling interrupt. By moving this routine outside the critical section, any lock operation blocked on this spinlock no longer waits for this operation to complete. This significantly reduces the worst-case response time for contended lock acquisition.

Second, in the FMLP+ implementation in LITMUSRT the boost priority is set using the system clock to ensure a fair timestamp. Recall that for the FMLP+ only the order of assignment within a partition is used for scheduling decisions. We can thus replace the system-clock-based timestamp with a per-processor logical timestamp that is incremented on every read. This eliminates part of the overhead associated with every priority-boost operation.

VI. EXPERIMENTS AND RESULTS

In this section we present the results of our evaluation of the different implementations: PCP-DU-PF, PCP-DU-BOOL, MPCP-DU and FMLP+-DU. We compare these with the LITMUSRT implementations of the original PCP, MPCP

(cycles)       PRIO_INHERIT  PRIO_PROTECT  PCP    PCP-DU-PF  PCP-DU-BOOL  MPCP-ORIG  MPCP-NEW  MPCP-DU  FMLP+-ORIG  FMLP+-NEW  FMLP+-DU

Uncontended (7,100,000 samples per uniprocessor protocol; 12,200,000 per multiprocessor protocol):
Lock (Avg.)    263           1604          645    160        171          1075       1091      214      1363        1041       215
Lock (99%)     351           1847          742    231        239          1300       1329      309      1594        1271       313
Lock (Max.)    7707          33156         3850   2384       1974         5875       6506      2477     7491        7906       2625
Unlock (Avg.)  301           1221          1211   87         92           1216       1149      177      1181        1117       174
Unlock (99%)   369           1475          1355   133        140          1451       1368      228      1418        1325       224
Unlock (Max.)  16552         9027          6655   1770       1653         7330       6158      1920     6574        5292       1875

Contended (900,000 samples per uniprocessor protocol; 2,200,000 per multiprocessor protocol):
Lock (Avg.)    4955          -             1967   4624       2673         1266       1216      1097     1440        1096       1059
Lock (99%)     5550          -             2132   5367       3103         1531       1494      1387     1715        1371       1307
Lock (Max.)    7707          -             7953   14820      9700         12242      6281      5392     13109       6026       5570
Unlock (Avg.)  3767          -             4109   5148       4868         4301       4280      2657     4203        4166       2335
Unlock (99%)   6023          -             4627   5909       5582         5791       5799      5009     5741        5662       2962
Unlock (Max.)  13172         -             14919  15985      15787        20482      21657     18258    21223       22451      17991
TABLE I
OBSERVED AVERAGE AND WORST-CASE RESPONSE TIME FOR UNCONTENDED LOCK/UNLOCK OPERATIONS

and FMLP+, and implementations of the MPCP and FMLP+ containing the proposed optimizations, along with the standard Linux implementations PRIO_INHERIT and PRIO_PROTECT.

All our experiments were conducted on the Boundary Devices Sabre Lite ARMv7 development board based on the Freescale i.MX6Q SoC, an ARM Cortex-A9 quad-core system running at 1 GHz. All experiments were run using LITMUSRT 2013.1 based on Linux 3.10.5. Several bugfixes and performance improvements for spinlocks on ARM were backported into the 2013.1 tree from LITMUSRT 2014.1 and Linux 3.13, respectively. The kernel was compiled to the Thumb-2 instruction set with all kernel debugging options disabled in order to maximize performance.

In this section, we seek to answer the following key questions regarding our improved implementations:
1) How does the uncontended overhead in our implementations compare with that of the original protocols?
2) What additional overheads are introduced by the futex-based approach, and how significantly do they affect the worst-case overheads?
3) Is it worth avoiding atomic operations when implementing uniprocessor locking protocols?
To answer these questions, we implemented microbenchmarks that measure lock and unlock overhead using hardware cycle counters. Next we explain the methodology used in these microbenchmarks.

A. Microbenchmarking Methodology

Our microbenchmark program spawns several threads, each locking and unlocking a resource every period. Threads were randomly assigned a critical-section length C(Ti) between 60–80µs, an execution time between C(Ti) + 0–20µs, and a period between 600–800µs. For the uniprocessor protocols, 5 threads are spawned on one core, whereas for the multiprocessor protocols, 6 threads are spawned across three cores; the fourth core is reserved for trace and control tasks. A cache-polluting background workload was executed during and around the critical section to simulate a realistic workload. This program was run for 5 seconds, repeated 300 times for each protocol with a different random parameter assignment for each run.

Each sample was measured using cycle counters, measuring the time from a lock or unlock request being made to when the operation completes (and the program continues). In the case of contended locks and preempted unlocks, we subtract the blocking time, as this is a consequence of application parameters as well as of other system overheads, such as the time it takes to wake up a task on a remote processor, which we present separately.

To prevent any bias in the worst-case results, we normalized the size of the obtained data set across each class of protocols (uniprocessor and multiprocessor) by randomly discarding samples from the data set. Final data-set sizes are stated in Table I. For the scheduler overheads, 16 million samples were obtained.

Separation of contended and uncontended lock samples is done at the first point at which the lock status is tested. For the locks proposed in this paper, this is the atomic test in userspace. Race conditions that change the actual lock status from contended to uncontended beyond this point thus still result in a contended lock sample.

The results of our overhead analysis are shown in Table I. The table shows the overheads for the uncontended case in its top half and the overheads for contended cases in its bottom half. For each case, the average, 99th-percentile, and maximum observed overheads are shown for both the lock and unlock operations. PRIO_INHERIT and PRIO_PROTECT are the Linux futex-based implementation of the PIP and the non-futex implementation of the PCP, respectively. PCP refers to the LITMUSRT non-futex implementation, while PCP-DU-BOOL and PCP-DU-PF refer to our two implementations that make use of atomic operations and the MMU exception handler, respectively.
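The atomic userspace test just mentioned is the heart of the futex fast path. As a rough, self-contained illustration — not the LITMUSRT or glibc code; all names here are ours — an uncontended lock and unlock can stay entirely in userspace along the lines of the three-state scheme described by Drepper [10], with the kernel transitions stubbed out:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative three-state futex word: 0 = free, 1 = locked,
 * 2 = locked with (possible) waiters.  Only the uncontended paths
 * stay in userspace; the kernel transitions are stubbed out below.
 * All names are ours, not the LITMUS^RT or glibc API. */
typedef struct { atomic_int word; } futex_t;

/* Stubs standing in for the futex wait/wake system calls. */
static void futex_syscall_wait(futex_t *f) { (void)f; }
static void futex_syscall_wake(futex_t *f) { (void)f; }

static bool futex_trylock(futex_t *f)
{
    int expected = 0;
    /* A single compare-and-swap; no system call when the lock is
     * free, which is where the large average-case savings come from. */
    return atomic_compare_exchange_strong(&f->word, &expected, 1);
}

static void futex_lock(futex_t *f)
{
    int c = 0;
    if (atomic_compare_exchange_strong(&f->word, &c, 1))
        return;                           /* uncontended fast path  */
    if (c != 2)
        c = atomic_exchange(&f->word, 2); /* advertise contention   */
    while (c != 0) {                      /* contended slow path    */
        futex_syscall_wait(f);
        c = atomic_exchange(&f->word, 2);
    }
}

static void futex_unlock(futex_t *f)
{
    /* Uncontended release: one atomic exchange, no system call.
     * Only a word of 2 (waiters present) forces a kernel entry. */
    if (atomic_exchange(&f->word, 0) == 2)
        futex_syscall_wake(f);
}
```

Only when the compare-and-swap observes a non-zero lock word does a system call become necessary, which is why the uncontended rows of Table I are an order of magnitude cheaper than their syscall-based counterparts.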
MPCP-ORIG and FMLP+-ORIG refer to the default implementations of the MPCP and FMLP+ found in LITMUSRT, while MPCP-NEW and FMLP+-NEW refer to our optimized implementations. MPCP-DU and FMLP+-DU refer to our futex-based implementations of the MPCP and FMLP+.

Fig. 1. Cumulative contended lock time (uniprocessor)

B. Uncontended-Case Overhead Reduction

The first question we wish to answer is to what extent the futex approach improves the average-case lock and unlock overheads. As can be seen in Table I, in the uncontended case, comparing the overhead of the futex-based implementations with their vanilla counterparts shows overhead reductions of up to 13x (1200+ cycles for a PCP unlock vs. only around 90 cycles for PCP-DU-PF and PCP-DU-BOOL). This trend in uncontended overheads can be observed in both lock and unlock operations for every protocol when compared with its futex-based implementation (in both the average and worst cases). This large reduction in average overhead is a direct consequence of the futex-based approach avoiding system calls when a lock is uncontended.

Note that the 99th-percentile values for all the uncontended results are very close to the observed averages. For example, the average locking time in PRIO_INHERIT is 263 cycles, while the 99th percentile is still only 351 cycles. This is in contrast to the observed maximum overhead of 7707 cycles. This shows that the futex-based approach performs very well on average, which is what we are concerned with in this paper.

C. Contended-Case Overhead Penalty

Recall that in all three protocols, both the lock and unlock code paths are modified to do additional work. First, we added additional logic to the user-space lock routines, which for contended lock acquisition serves no purpose. Second, the scheduler has been extended to handle the deferred update.

Thus, the next question we naturally ask, which ties back to our original goals, is whether there is a significant increase in the maximum observed overheads as a result of the additional logic we add to implement futexes for each of these protocols.

Table I provides some data that helps to answer this question. Note that we do not have contended samples for PRIO_PROTECT, as it is an implementation of the IPCP (see Sec. II), where the priority of the requesting task is immediately raised to the priority ceiling of the resource. As a consequence, there can be no contention unless there is nesting, a scenario we do not evaluate in our microbenchmark.

Ignoring the average and 99th percentile for now, let us consider the maximum observed overhead for lock and unlock operations under the PCP. In this case, we see that the maximum for the lock operation is 7953 cycles, while the unlock is around 14919 cycles. If we compare this with our futex-based implementations (PCP-DU-PF and PCP-DU-BOOL), we can deduce two interesting insights. In the case of PCP-DU-BOOL, we see an increase in locking overhead of 1750 cycles (9700 cycles maximum in PCP-DU-BOOL vs. 7953 cycles in vanilla PCP). The unlock operation incurs an additional 900 cycles of overhead (increasing from 14,900 cycles to 15,800 cycles). This is an additional overhead of 22% in the case of the lock operation, while it amounts to around 6% for the unlock operation. This overhead is higher than we expected, and we attribute the increase to the cumulative effect of atomic-operation failures as well as cache pollution during job suspension. Nevertheless, since the reduction in the average-case overheads is significant, it is still beneficial to workloads that can tolerate the increase in worst-case overheads.

An interesting point to note regarding PCP-DU-PF is that the maximum observed overhead in the case of lock almost doubles (increasing to 14,820 cycles from 7,953 cycles). We attribute this large increase in overhead to the complex code path of Linux's page-fault handling mechanism, which is not optimized for our futex implementation. A more lightweight kernel might however find PCP-DU-PF more suitable, given the large overhead reduction in the uncontended case. In any other case, our results suggest that it is not worth avoiding atomic operations, in particular on uniprocessors where there is no overhead from cache-coherency protocols.

Note that in the case of MPCP-DU and FMLP+-DU, the maximum observed overhead actually decreases in comparison with the non-futex implementations in LITMUSRT, as the optimizations we describe in Sec. V push some of this overhead into the scheduler. As we show later in Sec. VI-E, this overhead is low and does not invalidate our results.

Importantly, the data in Table I gives an incomplete view of where the overheads come from. Figs. 1 and 2 help to better understand which code paths increase the overhead and what percentage of samples they really affect.

D. Overhead Distributions

Figs. 1–4 visualize the cumulative distributions of lock and unlock overheads, normalized to percentages. The X-axis shows the number of cycles, while the Y-axis shows the percentage of locks served in less than the corresponding X-axis value. From these graphs we attempt to recognize the different code paths in our implementations through the presence of horizontal shifts (meaning that a certain percentage of operations experienced the corresponding code path).

For all the uniprocessor protocols, the contended locking code path is simpler, resulting in the smooth curves that can be seen in Fig. 1. For the PCP and the PCP-DU, the slight anomaly in the upper 10% of the slowest lock acquisitions corresponds with the case that an active old priority inheritance must be cleared prior to setting a new one.

From Fig. 1, we can derive that PCP-DU-BOOL requires approximately 800 extra cycles for a contended lock acquisition compared to the PCP, evident from the right-shift distance between the curves. This overhead can be attributed to both the userspace code and the extra work involved in setting the permission on the lock page. PCP-DU-PF has a significantly higher overhead in this case, spending more than twice the amount of cycles in comparison to the PCP. A lot of this overhead is due to the page-fault handling code path in Linux, of which only a fraction is specific to PCP-DU-PF. Nonetheless, all implementations are more efficient than the PRIO_INHERIT code in pthreads.

Another example we show is the unlock operation overhead in the multiprocessor protocols. Fig. 2 clearly distinguishes two code paths in all non-futex-based protocols. In the implementation, the lower 25% of overheads correspond to the cheap code path where a job is woken up locally, while a more expensive code path can be seen in the non-futex protocols corresponding to when a job needs to be woken up remotely through an inter-processor interrupt. Due to changes in the distribution and the addition of a third code path to the unlock logic for MPCP-DU and FMLP+-DU, the expensive code path hardly ever occurs.

The complete set of cumulative overhead results can be found in Appendix C.

Fig. 2. Cumulative contended unlock time (multiprocessor)
Fig. 3. Cumulative contended scheduling time (uniprocessor)
Fig. 4. Cumulative contended scheduling time (multiprocessor)

E. Scheduler Overheads

Figs. 3 and 4 show the cumulative distribution of scheduler overheads, similar to the previous graphs of lock and unlock overheads. In Fig. 3, we first observe that the scheduler overheads for PCP-DU-BOOL and PCP-DU-PF are mostly equal, despite the latter requiring a TLB flush when the locking permission changes. Significantly, as can be seen from the identical shape of all the curves, both PCP-DU-BOOL and PCP-DU-PF only incur a fixed additional overhead of 550 cycles in comparison to the original PCP.

For the multiprocessor protocols, we observe a different trend, as can be seen in Fig. 4. We observe that in the majority of cases, MPCP-DU is more expensive in the scheduler than FMLP+-DU. This is expected: unlike under the MPCP, the boost priority under the FMLP+ does not depend on the specific lock, so deciding whether the priority must be boosted requires less work.

Compared to the baseline implementations, overall the FMLP+-DU is approximately 450 cycles more expensive in scheduling, and an additional 200 cycles is incurred by the MPCP-DU. We consider these good results, as this is considerably less than the number of cycles saved on a single uncontended lock and unlock operation.

VII. RELATED WORK

Sha et al. identified the priority inversion problem and proposed the Priority Inheritance Protocol (PIP) to bound pi-blocking, and the optimal Priority Ceiling Protocol (PCP) to prevent deadlocks and limit pi-blocking to the length of at most one outermost critical section [14]. Evolving on the PCP, Baker proposed the Stack Resource Policy (SRP) [2], achieving the same properties as the PCP by delaying job release to avoid blocking during execution.

Rajkumar et al. proposed the first suspension-based locking protocols for multiprocessor systems [13]; the Distributed Priority Ceiling Protocol (DPCP) and the Multiprocessor Priority Ceiling Protocol (MPCP) extended the PCP for distributed and shared-memory systems, respectively, under partitioned fixed-priority scheduling.

The global FMLP, proposed by Block et al. [3], was the first suspension-based locking protocol for global earliest-deadline-first scheduling. Although we focus on partitioned algorithms in this paper, the work on the FMLP+ is a refinement of the FMLP. The FIFO Multiprocessor Locking Protocol (FMLP+) [5] is a suspension-based locking protocol for partitioned scheduling with an asymptotically optimal bound on maximum suspension-aware pi-blocking (O(n)).

The observation that kernel calls are expensive is not new. Liedtke et al. [12] establish that the kernel overhead is prohibitive for efficient IPC, and propose a solution in userspace that reduces IPC overhead while guaranteeing atomicity. Franke et al. [11] observed the same kernel cost for synchronization primitives, and proposed fast userspace mutexes (futexes) as a general mechanism for implementing lightweight synchronization that issues system calls only when locks are contended. Futexes are part of the Linux kernel userspace API, and Drepper [10] provides a detailed explanation of the futex API exposed by the Linux kernel and how it can be used to build synchronization primitives. The futex API has been used to implement lightweight PIP futexes (PRIO_INHERIT in the pthreads package) [1]. Our work is motivated by the elegance of this approach, and we extended it to support anticipatory protocols with more attractive analytical properties (in particular the PCP, the MPCP, and the FMLP+). Züpke's work [15] on deterministic futexes targets the problem of futex implementation from the perspective of small embedded real-time kernels, identifies problems with the applicability of the Linux approach, and proposes a method to implement futexes without the need for a fine-grained kernel memory allocator.

LITMUSRT, the real-time extension of the Linux kernel developed at UNC Chapel Hill, provides implementations of many state-of-the-art locking protocols [6], [8], and while there has been prior work on analysing the overheads of locking protocols in LITMUSRT [4], the focus has been on improving the worst-case response time. To the best of our knowledge, this is the first paper to implement and evaluate average-case-optimized versions of the PCP, the MPCP and the FMLP+.

VIII. CONCLUSION

Motivated by the benefits of improving the average-case overheads of locking protocols, this paper explored whether it is possible to extend the futex approach of the Linux kernel, where expensive system calls are avoided when locks are uncontended, to anticipatory locking protocols which provide stronger analytical properties.

We identified key properties of three protocols (the PCP, the MPCP, and the FMLP+) that leave room for optimization of the overhead in the uncontended case, and successfully implemented and evaluated futex-based versions of these protocols under LITMUSRT.

Our cycle-accurate evaluation of these protocols shows up to 87% reduction in the uncontended case for certain scenarios, while incurring no more than 22% additional overhead in the worst case. While this is higher than expected, the significant reduction in uncontended overheads still makes this approach attractive for workloads that can afford the increased worst-case overhead.

Overall, we have shown that it is possible to implement state-of-the-art real-time locking protocols in the spirit of the futex approach. We have observed compelling improvements for the average-case uncontended lock operations. However, we have also observed that the worst-case overheads must not be neglected. In fact, due to the moderate increase in maximum overheads, our futex approach may not be suitable for workloads that exhibit high contention or are especially sensitive to worst-case lock overheads. Nevertheless, we believe that the improvements in average-case locking overheads are useful for the vast majority of predominantly soft real-time applications deployed on Linux and Linux-like platforms.

REFERENCES

[1] Lightweight priority inheritance futexes. http://lxr.linux.no/linux/Documentation/pi-futex.txt.
[2] T. P. Baker. A stack-based resource allocation policy for realtime processes. In Real-Time Systems Symposium, 1990. Proceedings., 11th, pages 191–200, 1990.
[3] Aaron Block, Hennadiy Leontyev, Björn B. Brandenburg, and James H. Anderson. A flexible real-time locking protocol for multiprocessors. In Proceedings of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA '07, pages 47–56, Washington, DC, USA, 2007. IEEE Computer Society.
[4] B. B. Brandenburg. Improved analysis and evaluation of real-time semaphore protocols for P-FP scheduling (extended version). In Real-Time and Embedded Technology and Applications Symposium (RTAS), 2013 IEEE 19th, pages 141–152, April 2013.
[5] Björn B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.
[6] Björn B. Brandenburg and James H. Anderson. An implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP real-time synchronization protocols in LITMUSRT. In Proceedings of the 2008 14th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA '08, pages 185–194, Washington, DC, USA, 2008. IEEE Computer Society.
[7] Björn B. Brandenburg and James H. Anderson. Feather-Trace: A light-weight event tracing toolkit. In Proceedings of the Third International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT '07), pages 61–70, 2007.
[8] J. M. Calandrino, H. Leontyev, A. Block, U. C. Devi, and J. H. Anderson. LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers. In Real-Time Systems Symposium, 2006. RTSS '06. 27th IEEE International, pages 111–126, Dec 2006.
[9] Sudarshan K. Dhall and C. L. Liu. On a real-time scheduling problem. Operations Research, 26(1):127–140, 1978.
[10] Ulrich Drepper. Futexes are tricky. Dec 2005.
[11] H. Franke, R. Russell, and M. Kirkwood. Fuss, futexes and furwocks: Fast userlevel locking in Linux. https://www.kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf, 2002.
[12] J. Liedtke and H. Wenske. Lazy process switching. In Hot Topics in Operating Systems, 2001. Proceedings of the Eighth Workshop on, pages 15–18, May 2001.
[13] R. Rajkumar. Real-time synchronization protocols for shared memory multiprocessors. In Distributed Computing Systems, 1990. Proceedings., 10th International Conference on, pages 116–123, 1990.
[14] Lui Sha, R. Rajkumar, and J. P. Lehoczky. Priority inheritance protocols: An approach to real-time synchronization. Computers, IEEE Transactions on, 39(9):1175–1185, 1990.
[15] Alexander Züpke. Deterministic fast synchronization. In Proceedings of the Ninth Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT '13), pages 56–60, 2013.

APPENDIX A
PCP-DU EXECUTION FLOW

Figure 5 shows the execution flow for the lock() operation. Here lock_page_lock() is the user-space routine that attempts to acquire a lock by writing to the locked bitmap of the lock page. When this write fails, the kernel will add the current job to the priority wait-queue, where it suspends until the ceiling is low enough for the current job to obtain the lock.

We have implemented two different mechanisms for lock_page_lock(). In PCP-DU-PF, when it was predetermined that lock acquisition is not allowed, the lock page is marked read-only. lock_page_lock() tries to acquire a lock by setting a bit in the bitmap, resulting in a write fault on failure. In this case the operating system's page-fault handler will be invoked, which is extended to execute the slow locking path. In PCP-DU-BOOL, lock_page_lock() uses an atomic compare-and-exchange operation to set the lock bit under the condition that the highest bit in the bitmap is set. When this fails, the routine will invoke the kernel to execute the slow path.

In the slow path, the job repeatedly suspends itself until the ceiling has been lowered sufficiently to obtain the lock. On suspension, the unlock_syscall() routine increments the unlock_syscall counter in the lock page of the system-ceiling owner, informing it that another job must be woken up on unlocking. It then sets up a new priority inheritance if the job is the highest-priority waiting job in the system, and suspends itself.

Fig. 5. PCP-DU lock() execution flow

Figure 6 shows the flow for an unlock operation. This demonstrates how the unlock_syscall integer is used to determine whether the kernel should be invoked or not. wake_up_tasks() will decrement this integer by 1 for every task woken up, and if one or more higher-priority tasks are woken up, the scheduler will be invoked to preempt the current job.

Fig. 6. PCP-DU unlock() execution flow

APPENDIX B
MPCP-DU AND FMLP+-DU EXECUTION FLOW

Figure 7 shows the execution flow for the lock operation. uLock is the integer shared between the different tasks. After setting the right bit in boost, uLock is atomically incremented. The old value returned by this operation is used to determine whether the job successfully acquired the lock or rather needs to invoke the kernel slow path.

On the slow path, the free flag is used to determine whether the job still needs to suspend. If the flag is not set, the job adds itself to the wait queue and suspends; otherwise it simply unsets the flag. For MPCP-DU this wait queue is priority-ordered, whereas for FMLP+-DU a FIFO-ordered wait queue is used.

Fig. 7. Multiprocessor lock() execution flow
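Under our reading of Figure 7, the userspace half of this lock operation can be sketched as follows. Field and function names are illustrative, not the actual LITMUSRT interface, and the kernel slow path is stubbed out:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the Figure-7 fast path under our reading of the paper.
 * `boost` is the bitmap the scheduler inspects for deferred priority
 * boosting; `uLock` is the shared lock word.  These names are
 * illustrative, not the real LITMUS^RT lock-page layout. */
struct lock_page {
    _Atomic uint64_t boost;   /* one boost bit per lock            */
    atomic_int       uLock;   /* 0 = free; >0 = held (+ blockers)  */
};

#define FMLP_RELOCK_BIT 63    /* MSB: set on every lock operation  */

/* Stub for the kernel slow path (waitqueue add + suspend). */
static void kernel_do_lock(struct lock_page *lp, int od)
{ (void)lp; (void)od; }

/* Returns true iff the lock was acquired on the userspace fast path. */
static bool mp_lock(struct lock_page *lp, int od)
{
    /* 1. Mark the lock for deferred boosting; the boost priority is
     *    only applied by the kernel at the next context switch. */
    atomic_fetch_or(&lp->boost,
                    (UINT64_C(1) << od) | (UINT64_C(1) << FMLP_RELOCK_BIT));

    /* 2. One atomic increment: an old value of zero means the lock
     *    was free, so no system call is needed. */
    if (atomic_fetch_add(&lp->uLock, 1) == 0)
        return true;

    kernel_do_lock(lp, od);   /* contended: block in the kernel */
    return false;
}
```

Setting the boost bit before the increment is what makes the deferral safe: as argued in Sec. V, an interrupt between the two steps cannot cause incorrect behaviour, because the scheduler already sees the lock marked in the boost bitmap.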

The unlock execution flow is shown in Figure 8. An atomic decrement on the uLock variable determines whether the kernel should be called to wake up the next job in the wait queue, possibly resulting in preemption.

When a higher-priority job is on the ready queue but its priority did not exceed the current job's boosted priority, an unlock of a mutex should lead to preemption even if no job is blocking on this mutex. For an unlock through the kernel, preemption is thus performed after waking up the next job on the wait queue. For unlock operations that remain in userspace, the kernel determines in advance whether there is such a job on the local ready-queue. If so, it sets the unlock_syscall value in the lock page to inform the current job to invoke the scheduler upon unlocking.

Fig. 8. Multiprocessor unlock() execution flow

APPENDIX C
CONTENDED LOCK OPERATION DISTRIBUTIONS

The distribution for suspending uniprocessor unlock operations is shown in Figure 9. Here the top code paths correspond with the situation in which more than one suspended task should be woken up. We observe an overhead of several hundred cycles for PCP-DU-BOOL and PCP-DU-PF, both of which share the same unlock code path. We note that the start and end points of the two curves are equal, but the distributions differ. This overhead is incurred by userspace code and by the race-condition checks required to prevent the lock state from becoming corrupted when the scheduler interrupts between the decision to unlock through the slow path and reaching the kernel.

Fig. 9. Cumulative contended unlock time (uniprocessor)

For the multiprocessor locking protocols, the lock overhead is limited to the user-space code overhead. In Fig. 10 we can observe that MPCP-DU and FMLP+-DU are slightly more efficient than the previous implementations. This is mostly caused by moving the priority-boost code to the scheduler. Furthermore, we can observe that using a logical timestamp for priority boosting in the FMLP+ improved the contended lock time.

Fig. 10. Cumulative contended lock time (multiprocessor)
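Correspondingly, the unlock flow of Figure 8 can be sketched as below — again with illustrative names and the kernel paths stubbed out; this is our reading of the flow chart, not the actual implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the Figure-8 flow as we understand it.  The old value of
 * an atomic decrement on uLock selects between the userspace fast
 * path and the kernel slow path; unlock_syscall, pre-set by the
 * kernel, forces a scheduler invocation even when the unlock itself
 * is uncontended.  Names are illustrative, not the real interface. */
struct mp_lock_page {
    atomic_int uLock;           /* holders plus blocked jobs         */
    atomic_int unlock_syscall;  /* kernel requested an unboost+yield */
};

/* Stubs for the kernel paths (wake next waiter / unboost + schedule). */
static void kernel_do_unlock(struct mp_lock_page *lp)  { (void)lp; }
static void unboost_and_yield(struct mp_lock_page *lp) { (void)lp; }

/* Returns true iff the whole unlock stayed in userspace. */
static bool mp_unlock(struct mp_lock_page *lp)
{
    if (atomic_fetch_sub(&lp->uLock, 1) != 1) {
        kernel_do_unlock(lp);   /* waiters present: wake the next job */
        return false;
    }
    /* Uncontended: enter the kernel only if it asked us to, i.e. if a
     * local job should be scheduled once our boost is dropped. */
    if (atomic_exchange(&lp->unlock_syscall, 0) != 0) {
        unboost_and_yield(lp);
        return false;
    }
    return true;
}
```

The unlock_syscall test is the third unlock code path noted in the discussion of Fig. 2: it is cheap, but it is what allows an uncontended unlock to still trigger the preemption required when a boosted priority is dropped.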