Robust Real-Time Multiprocessor Interrupt Handling Motivated by Gpus
Total Page:16
File Type:pdf, Size:1020Kb
Robust Real-Time Multiprocessor Interrupt Handling Motivated by GPUs Glenn A. Elliott and James H. Anderson Department of Computer Science, University of North Carolina at Chapel Hill Abstract—Architectures in which multicore chips are Prior Work. GPUs have received serious consideration augmented with graphics processing units (GPUs) have in the real-time community only recently. Both theo- great potential in many domains in which computationally retical work ([9], [10]) on partitioning and scheduling intensive real-time workloads must be supported. How- ever, unlike standard CPUs, GPUs are treated as I/O algorithms and applied work ([11], [12], [13]) on quality- devices and require the use of interrupts to facilitate of-service techniques and improved responsiveness has communication with CPUs. Given their disruptive nature, been done. Outside the real-time community, others have interrupts must be dealt with carefully in real-time systems. proposed operating system designs where GPUs are With GPU-driven interrupts, such disruptiveness is further scheduled in much the same way as CPUs [14]. compounded by the closed-source nature of GPU drivers. In this paper, such problems are considered and a solution In our own work, we have investigated the challenges is presented in the form of an extension to LITMUSRT faced when augmenting multicore platforms with GPUs called klmirqd. The design of klmirqd targets systems that have non-real-time, throughput-oriented, closed- with multiple CPUs and GPUs. In such settings, interrupt- source device drivers [15]. These drivers exhibit prob- related issues arise that have not been previously addressed. lematic behaviors for real-time systems. For example, tasks non-preemptively execute on a GPU in first-in-first- 2 I. INTRODUCTION out order. Also, GPU access is arbitrated by non-real- time spinlocks that lack priority-inheritance mechanisms. Graphics processing units (GPUs) are capable of per- CPU time lost to spinning can be significant: GPU ac- forming parallel computations at rates orders of mag- cesses commonly take tens of milliseconds up to several nitude greater than traditional CPUs. Driven both by seconds [15]. Further, blocked tasks may experience this and by increased GPU programmability and single- unbounded priority inversions due to the lack of priority precision floating-point support, the use of GPUs to solve inheritance.3 non-graphical (general purpose) computational problems The primary solution we presented in [15] to address began gaining wide-spread popularity about ten years these issues is to treat a GPU as a shared resource, ago [1], [2], [3]. However, at that time, non-graphical protected by a real-time suspension-based semaphore. algorithms had to be mapped to graphics-specific lan- This removes the GPU driver from resource arbitration guages. GPU manufactures realized they could reach decisions and enables bounds on blocking time to be new markets by supporting general purpose computa- determined. We validated this approach in experiments tions on GPUs (GPGPU) and released flexible language on LITMUSRT [16], UNC’s real-time extension to Linux, extensions and runtime environments.1 Since the release and demonstrated that our methods reduced both CPU of these second-generation GPGPU technologies, both utilization and deadline tardiness. graphics hardware and runtime environments have grown in generality, enabling GPGPU across many domains. Contributions. One issue not addressed in our prior Today, GPUs can be found integrated on-chip in mobile work is the effect GPU interrupts have on real-time devices and laptops [4], [5], [6], as discrete cards in execution. Interrupts cause complications in the design higher-end consumer computers and workstations, and and analysis of real-time systems. Ideally, interrupt han- within many of the world’s fastest supercomputers [7]. dling should respect the priorities of executing real-time GPUs have applications in many real-time domains. tasks. However, this is a non-trivial issue, especially for For example, GPUs can efficiently perform multidi- systems with shared I/O resources. In this paper, we mensional FFTs and convolutions, as used in signal examine the effects interrupt servicing techniques have processing, as well as matrix operations such as fac- on real-time execution on a multiprocessor, with GPU- torization on large data sets. Such operations are used related interrupts particularly in mind. in medical imaging and video processing, where real- Our major contributions are threefold. First, we de- time constraints are common. A particularly compelling velop techniques that enable interrupts due to asyn- use case is driver-assisted and autonomous automobiles, chronous I/O to be handled without violating the single- where multiple streams of video and sensor data must 2Newer GPUs allow some degree of concurrency, at the expense be processed and correlated in real time [8]. GPUs are of introducing non-determinism due to conflicts within co-scheduled well suited for this purpose. work. Further, execution remains non-preemptive in any case. 3A priority inversion occurs when a task has sufficient priority to 1Notable platforms include the Compute Unified Device Architec- be scheduled but it is not. Priority inversions can be introduced by a ture (CUDA) from Nvidia, Stream from AMD/ATI, OpenCL from variety of sources, including locking protocols and interrupt services. Apple and the Khronos Group, and DirectCompute from Microsoft. These inversions must be bounded to ensure real-time guarantees. threaded sporadic task model, improving schedulability separate thread of execution with an appropriate priority. analysis. Prior interrupt-related work has not directly The duration of interruption is minimized and deferred addressed asynchronous I/O on multiprocessors. Second, work competes fairly with other tasks for CPU time. we propose a technique to override the interrupt process- GPU interrupt management is important to real-time ing of closed-source drivers and apply this technique to systems for two reasons. First, we want to minimize a GPU driver. This required significant challenges to be the interference of GPU interrupts on all system tasks, overcome to alter the interrupt handling of the closed- including those that do not use a GPU. Second, we source GPU driver. Third, we discuss an implementation want a system that can be modeled analytically, so that of the proposed techniques and present an associated schedulability can be assessed. Such analysis could be experimental evaluation. This implementation is given in useful for systems such as driver-assisted automobiles. the form of an extension to LITMUSRT called klmirqd. For example, response time bounds of GPU-based sensor The rest of this paper is organized as follows. In processing may be required by an automatic collision Sec. II, we provide necessary background. In Sec. III, avoidance component since response time equates di- we review prior work on real-time interrupt handling rectly to distance-travelled in a moving vehicle. Through and describe our solution, klmirqd. In Sec. IV, we show analysis, we can determine the minimum system pro- how GPU interrupt processing can be intercepted and visioning that meets the requirements of the collision rerouted, despite the use of a closed-source GPU driver. avoidance component, as well as any other timing re- In Secs. V–VII, we present an experimental evaluation quirements of other components. of klmirqd. We conclude in Sec. VIII. Interrupt Handling In Linux. We now review how Due to space limitations, we henceforth limit attention Linux performs split interrupt handling. We focus on to GPU technologies from the manufacture NVIDIA, Linux for two reasons. First, despite its general-purpose whose CUDA [17] platform is widely accepted as the origins, variants of Linux are widely used in supporting leading GPGPU solution. real-time workloads. Second, GPGPU is well supported on Linux and it is the only OS where we can make use of II. INTERRUPT HANDLING GPGPU and have the ability to modify OS source code. An interrupt is a hardware signal issued from a system Other operating systems with reasonably robust GPGPU device to a system CPU. Upon receipt of an interrupt, support (Windows and Mac OS X) are closed-source. a CPU halts its currently-executing task and invokes an Virtualization techniques that enable GPGPU in guest interrupt handler, which is a segment of code responsible operating systems (ex. [21]) cannot be used because the for taking the appropriate actions to process the interrupt. GPGPU software, including the GPU driver, must still be Each device driver registers a set of driver-specific hosted in a traditional OS environment, such as Linux. interrupt handlers for all interrupts its associated device During the initialization of the Linux kernel, device may raise. An interrupted task can only resume execution drivers (even closed-source ones) register interrupt han- after the interrupt handler has completed. dlers with the kernel’s interrupt services layer, mapping Interrupts require careful implementation and analysis interrupt signals to interrupt service routines (ISRs). in real-time systems. In uniprocessor and partitioned Upon receipt of an interrupt on a CPU, Linux imme- multiprocessor systems, an interrupt handler can be diately invokes the registered ISR. The ISR is the top- modeled as the highest-priority real-time task [18], [19], half of the split interrupt handler. If an interrupt requires though the unpredictable