TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

Shinpei Kato†‡, Karthik Lakshmanan†, Ragunathan (Raj) Rajkumar†, and Yutaka Ishikawa‡
† Department of Electrical and Computer Engineering, Carnegie Mellon University
‡ Department of Computer Science, The University of Tokyo

Abstract

The graphics processing unit (GPU) is now commonly used for graphics and data-parallel computing. As more and more applications tend to accelerate on the GPU in multi-tasking environments where multiple tasks access the GPU concurrently, operating systems must provide prioritization and isolation capabilities in GPU resource management, particularly in real-time setups. We present TimeGraph, a real-time GPU scheduler at the device-driver level for protecting important GPU workloads from performance interference. TimeGraph adopts a new event-driven model that synchronizes the GPU with the CPU to monitor GPU commands issued from the user space and control GPU resource usage in a responsive manner. TimeGraph supports two priority-based scheduling policies in order to address the trade-off between response times and throughput introduced by the asynchronous and non-preemptive nature of GPU processing. Resource reservation mechanisms are also employed to account for and enforce GPU resource usage, which prevents misbehaving tasks from exhausting GPU resources. Prediction of GPU command execution costs is further provided to enhance isolation. Our experiments using OpenGL graphics benchmarks demonstrate that TimeGraph maintains the frame-rates of primary GPU tasks at the desired level even in the face of extreme GPU workloads, whereas these tasks become nearly unresponsive without TimeGraph support. Our findings also include that the performance overhead imposed by TimeGraph can be limited to 4-10%, and that its event-driven scheduler improves throughput by about 30 times over the existing tick-driven scheduler.

1 Introduction

The graphics processing unit (GPU) is a burgeoning platform for high-performance graphics and data-parallel computing: its peak performance now exceeds 1000 GFLOPS, nearly 10 times that of traditional microprocessors. User-end windowing systems, for instance, use GPUs to present a more lively interface that significantly improves the user experience through 3-D windows, high-quality graphics, and smooth transitions. Recent 3-D browser and desktop applications, such as SpaceTime, Web3D, 3D-Desktop, Compiz Fusion, BumpTop, Cooliris, and Windows Aero, are all intriguing possibilities for future user interfaces. GPUs are also leveraged in various domains of general-purpose GPU (GPGPU) processing to facilitate data-parallel compute-intensive applications.

Real-time multi-tasking support is a key requirement for such emerging GPU applications. For example, users could launch multiple GPU applications concurrently on their desktop computers, including games, video players, web browsers, and live messengers, all sharing the same GPU. In such a case, quality-aware soft real-time applications like games and video players should be prioritized over live messengers and any other applications accessing the GPU in the background. Other examples include GPGPU-based cloud computing services, such as Amazon EC2, where virtual machines sharing GPUs must be prioritized and isolated from each other. More generally, important applications must be well-isolated from others for quality and security on the GPU, as user-space programs can create arbitrary sets of GPU commands and access the GPU directly through generic I/O system calls, meaning that malicious and buggy programs can easily cause the GPU to be overloaded. Thus, GPU resource management that consolidates prioritization and isolation capabilities plays a vital role in real-time multi-tasking environments.

GPU resource management is usually supported at the operating-system level, while GPU program code itself, including GPU commands, is generated through libraries, compilers, and runtime frameworks. In particular, it is the device driver that transfers GPU commands from the CPU to the GPU, regardless of whether they produce graphics or GPGPU workloads. Hence, the development of a robust GPU device driver is of significant impact for many GPU applications. Unfortunately, existing GPU device drivers [1, 5, 7, 19, 25] are not tailored to support real-time multi-tasking environments: they either accelerate one particular high-performance application in the system or provide fairness among applications.

We conducted a preliminary evaluation of the performance of existing GPU drivers, (i) the NVIDIA proprietary driver [19] and (ii) the open-source Nouveau driver [7], in multi-tasking environments, using two different NVIDIA graphics cards, (i) GeForce 9800 GT and (ii) GeForce GTX 285, with the Linux 2.6.35 kernel as the underlying operating system. It should be noted that the NVIDIA driver evaluated on Linux is also expected to closely match the performance of the Windows driver (WDDM [25]), as they share about 90% of their code [23]. Figure 1 shows the relative decrease in performance (frame-rate) of an OpenGL game (OpenArena) competing with two GPU-accelerated programs (Engine and Clearspd [15]) respectively. The Engine program represents a regularly-behaved GPU workload, while the Clearspd program produces a GPU command bomb causing the GPU to be overloaded, which represents a malicious or buggy program. To achieve the best possible performance, this preliminary evaluation assigns the highest CPU (nice) priority to the OpenArena application as an important application. As observed in Figure 1, the performance of the important OpenArena application drops significantly due to the competing GPU applications. This highlights the fact that GPU resource management in the current state of the art is woefully inadequate, lacking prioritization and isolation capabilities for multiple GPU applications.

[Figure 1: Decrease in performance of the OpenArena application competing with different GPU applications. Bars show the frame rate relative to standalone execution (0 to 1) for the NVIDIA and Nouveau drivers on the GeForce 9800 GT and GeForce GTX 285, competing with the Engine and Clearspd workloads.]

Contributions: We propose, design, and implement TimeGraph, a GPU scheduler that provides prioritization and isolation capabilities for GPU applications in soft real-time multi-tasking environments. We address a core challenge for GPU resource management posed by the asynchronous and non-preemptive nature of GPU processing. Specifically, TimeGraph adopts an event-driven scheduler model that synchronizes the GPU with the CPU in a responsive manner, using GPU-to-CPU interrupts, to schedule non-preemptive GPU commands for the asynchronously-operating GPU. Under this event-driven model, TimeGraph supports two scheduling policies to prioritize tasks on the GPU, which address the trade-off between response times and throughput. TimeGraph also employs two resource reservation policies to isolate tasks on the GPU, which provide different levels of quality of service (QoS) at the expense of different levels of overhead. To the best of our knowledge, this is the first work that enables GPU applications to be prioritized and isolated in real-time multi-tasking environments.

Organization: The rest of this paper is organized as follows. Section 2 introduces our system model, including the scope and limitations of TimeGraph. Section 3 presents the system architecture of TimeGraph. Section 4 and Section 5 describe the design and implementation of the TimeGraph GPU scheduling and reservation mechanisms respectively. In Section 6, the performance of TimeGraph is evaluated and its capabilities are demonstrated. Related work is discussed in Section 7. Our concluding remarks are provided in Section 8.

2 System Model

Scope and Limitations: We assume a system composed of a generic multi-core CPU and an on-board GPU. We do not manipulate any GPU-internal units, and hence GPU commands are not preempted once they are submitted to the GPU. TimeGraph is independent of libraries, compilers, and runtime engines. The principles of TimeGraph are therefore applicable to different GPU architectures (e.g., NVIDIA Fermi/Tesla and ATI Stream) and programming frameworks (e.g., OpenGL, OpenCL, CUDA, and HMPP). Currently, TimeGraph is designed and implemented for Nouveau [7], available in the Gallium3D [15] OpenGL software stack, which is also planned to support OpenCL. Moreover, TimeGraph has been ported to the PSCNV open-source driver [22] packaged in the PathScale ENZO suite [21], which supports CUDA and HMPP. This paper is, however, focused on OpenGL workloads, given the currently-available set of open-source solutions: Nouveau and Gallium3D.

Driver Model: TimeGraph is part of the device driver, which is the interface through which user-space programs submit GPU commands to the GPU. We assume that the device driver is designed based on the Direct Rendering Infrastructure (DRI) [14] model that is adopted in most UNIX-like operating systems as part of the X Window System. Under the DRI model, user-space programs are allowed to access the GPU directly to render frames without using windowing protocols, while they still use the windowing server to blit the rendered frames to the screen. GPGPU frameworks require no such windowing procedures, and hence their model is simpler.

In order to submit GPU commands to the GPU, user-space programs must be allocated GPU channels, which conceptually represent separate address spaces on the GPU. For instance, the NVIDIA Fermi and Tesla architectures support 128 channels. Our GPU command submission model for each channel is shown in Figure 2.

[Figure 2: GPU command submission model. User-space programs push GPU command groups (each a sequence of header and data words) into the User Push Buffer, and write 64-bit packets, each a (size, offset) tuple of 24 and 40 bits, into the Indirect Buffer ring controlled by GET and PUT pointers, from which the GPU command dispatch unit reads.]

[Figure 3: TimeGraph system architecture. GPU applications use existing libraries and runtimes (OpenGL, OpenCL, CUDA, HMPP) above the X client/server and the user-space driver; within the device driver, TimeGraph hooks the PushBuf interface and the IRQ handler, and comprises the GPU command scheduler (with a wait queue), the GPU reserve manager (isolation management), and the GPU command profiler (execution cost prediction).]

Each channel uses two types of kernel-space buffers: the User Push Buffer and the Kernel Push Buffer. The User Push Buffer is mapped onto the address space of the corresponding task, where GPU commands are pushed from the user space. GPU commands are usually grouped as non-preemptive regions to match user-space atomicity assumptions. The Kernel Push Buffer, meanwhile, is used for kernel primitives, such as host-device synchronization, GPU initialization, and GPU mode setting.

While user-space programs push GPU commands into the User Push Buffer, they also write packets, each of which is a (size and address) tuple locating a certain GPU command group, into a specific ring buffer part of the Kernel Push Buffer, called the Indirect Buffer. The driver configures the command dispatch unit on the GPU to read this buffer for command submission. The ring buffer is controlled by GET and PUT pointers, which start from the same place. Every time packets are written to the buffer, the driver moves the PUT pointer to the tail of the packets, and sends a signal to the GPU command dispatch unit to download the GPU command groups located by the packets between the GET and PUT pointers. The GET pointer is then automatically updated to the same place as the PUT pointer. Once these GPU command groups are submitted to the GPU, the driver no longer manages them, and continues to submit the next set of GPU command groups, if any. Thus, the Indirect Buffer plays the role of a command queue.

Each GPU command group may include multiple GPU commands. Each GPU command is composed of a header and data. The header contains methods and the data size, while the data contains the values being passed to the methods. Methods represent GPU instructions, some of which are shared between compute and graphics, while others are specific to each. We assume that the device driver does not preempt on-the-fly GPU command groups once they are offloaded onto the GPU. GPU command execution is out-of-order within the same GPU channel. The GPU channels themselves are switched automatically by the GPU engines.

Our driver model described above is based on the Direct Rendering Manager (DRM) [6], and especially targets the NVIDIA Fermi and Tesla architectures, but it can also be used for other architectures with minor modifications.
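The packet format of Figure 2 can be made concrete with a short C sketch. The bit layout and helper names below are assumptions drawn from the figure (a 64-bit packet holding a 40-bit offset and a 24-bit size); the real Nouveau register accessors and bit order differ.

```c
#include <stdint.h>

/* One Indirect Buffer packet: a 64-bit (offset, size) tuple locating a
 * command group, per Figure 2. The exact bit order is an assumption. */
static inline uint64_t ib_packet(uint64_t offset40, uint32_t size24)
{
    return (offset40 & ((1ULL << 40) - 1)) |
           ((uint64_t)(size24 & 0xffffff) << 40);
}

extern void signal_dispatch_unit(uint32_t put); /* hypothetical PUT write */

/* Queue one command group and move the PUT pointer past it; the GPU
 * fetches every packet between GET and PUT, then advances GET to PUT. */
void queue_command_group(volatile uint64_t *ring, uint32_t entries,
                         uint32_t *put, uint64_t offset, uint32_t size)
{
    ring[*put] = ib_packet(offset, size);
    *put = (*put + 1) % entries;
    signal_dispatch_unit(*put);
}
```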
3 TimeGraph System Architecture

The architecture of TimeGraph and its interaction with the rest of the software stack is illustrated in Figure 3. No modification is required for user-space programs, and GPU command groups can be generated through existing software frameworks. However, TimeGraph needs to communicate with a specific interface, called PushBuf, in the device-driver space. The PushBuf interface enables the user space to submit GPU command groups stored in the User Push Buffer. TimeGraph uses this PushBuf interface to queue GPU command groups. It also uses the IRQ handler prepared for GPU-to-CPU interrupts to dispatch the next available GPU command groups.

TimeGraph is composed of a GPU command scheduler, a GPU reserve manager, and a GPU command profiler. The GPU command scheduler queues and dispatches GPU command groups based on task priorities. It also coordinates with the GPU reserve manager to account for and enforce the GPU execution times of tasks. The GPU command profiler supports prediction of GPU command execution costs to avoid overruns out of reservation. Two scheduling policies are supported to address the trade-off between response times and throughput:

• Predictable-Response-Time (PRT): This policy minimizes priority inversion on the GPU to provide predictable response times based on priorities.

• High-Throughput (HT): This policy increases total throughput, allowing additional priority inversion.

TimeGraph also supports two GPU reservation policies that address the trade-off between isolation and throughput:

• Posterior Enforcement (PE): This policy enforces GPU resource usage after GPU command groups are completed, without sacrificing throughput.

• Apriori Enforcement (AE): This policy enforces GPU resource usage before GPU command groups are submitted, using prediction of GPU execution costs, at the expense of additional overhead.

In order to unify multiple tasks into a single reserve, the TimeGraph reservation mechanism provides the Shared reservation mode. In particular, TimeGraph creates a special Shared reserve instance with the PE policy when loaded, called Background, which serves all GPU-accelerated tasks that do not belong to any specific reserves. The detailed design and implementation of GPU scheduling and GPU reservation will be described in Section 4 and Section 5 respectively.

Figure 4 shows a high-level diagram of the PushBuf interface and the IRQ handler, where the modifications introduced by TimeGraph are highlighted by bold frames. This diagram is based on the Nouveau implementation, but most GPU drivers should have similar control flows.

[Figure 4: Diagram of the PushBuf interface and the IRQ handler with the TimeGraph scheme. PushBuf path: enter, get buffer object, run or wait (sleep), lock mutex, activate buffer, check whether reservation is requested, run or wait (sleep), submit command group, start accounting, set up interrupt, deactivate buffer, unlock mutex, leave. IRQ path: enter, stop accounting, wake up the next task if the GPU is idle, leave.]

The PushBuf interface first acquires the buffer object associated with the incoming GPU command group. It then applies the scheduling policy to determine whether this GPU command group can execute on the GPU. If it should not be dispatched immediately, the corresponding task goes to sleep. Else, the User Push Buffer object is activated for command submission, with the mutex lock held to ensure that the GPU command group is located in a place accessible from the GPU, though the necessity of this procedure depends on the driver implementation. TimeGraph next checks if GPU reservation is requested for this task. If so, it applies the reservation policy to verify the GPU resource usage of this task. If the task overruns, TimeGraph winds up buffer activation, and suspends the task until its resource budget becomes available. This task will be rescheduled later when it is woken up, since some higher-priority tasks may arrive by then. Finally, if the GPU command group is qualified by the scheduling and reservation policies, it is submitted to the GPU. As the reservation policies need to track GPU resource usage, TimeGraph starts accounting for the GPU execution time of this task. It then configures the GPU command group to generate an interrupt to the CPU upon completion so that TimeGraph can dispatch the next GPU command group. After deactivating the buffer and unlocking the mutex, the PushBuf interface returns.

The IRQ handler receives an interrupt notifying the completion of the current GPU command group, whereupon TimeGraph stops accounting for the GPU execution time, and wakes up the next task to execute on the GPU based on the scheduling policy, if the GPU is idle.

Specification: System designers may use a specification primitive to activate the TimeGraph functionality, which is inspired by the Redline system [31]. For each application, system designers can specify the scheduling parameters as ⟨name, sched, resv, prio, C, T⟩, where name is the application name, sched is its scheduling policy, resv is its reservation policy, prio is its priority, and the pair of C and T indicates that the application task is allowed to execute on the GPU for C microseconds every T microseconds. The specification is a text file (/etc/timegraph.spec), and TimeGraph reads it every time a new GPU channel is allocated to a task. If there is a matching entry based on the application name associated with the task, the specification is applied to the task. Otherwise, the task is assigned the lowest GPU priority and the Background reserve.

Priority Assignment: While system designers may assign static GPU priorities in their specification, TimeGraph also supports automatic GPU priority assignment (AGPA), which is enabled by using a wild-card "*" entry in the prio field. TimeGraph provides a user-space daemon that executes periodically to identify the task owning the foreground window through a window programming interface, such as the _NET_ACTIVE_WINDOW and _NET_WM_PID properties in the X Window System. TimeGraph receives the foreground task information via a system call, and assigns the highest priority to this task among those running under the AGPA mechanism. These tasks execute at the default static GPU priority level. Hence, different tasks can be prioritized over them by assigning higher static GPU priorities. AGPA is, however, not available if the above window programming interface is not supported. TimeGraph instead provides another user-space tool for system designers to assign priorities. For instance, designers can provide an optimal priority assignment based on reserve periods [13], as widely adopted in real-time systems.

Admission Control: In order to achieve predictable services in overloaded situations, TimeGraph provides an admission control scheme that forces a new reserve to be a background reserve so that currently active reserves continue to execute in a predictable manner. TimeGraph provides a simple interface where designers specify the limit of total GPU resource usage, from 0 to 100%, in a text file (/etc/timegraph.ac). The total GPU resource usage counted against this limit is computed by a traditional resource-reservation model based on the C and T of each reserve [26].
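To make the specification concrete, the entries below sketch what /etc/timegraph.spec might contain for the workloads used later in this paper. The paper defines only the tuple fields ⟨name, sched, resv, prio, C, T⟩; the colon-separated layout here is hypothetical, as are the exact parameter values.

```
# name:sched:resv:prio:C[us]:T[us]   (hypothetical concrete syntax)
engine:PRT:PE:2:2500:25000     # 3-D widget capped at 10% of the GPU
mplayer:PRT:PE:*:10000:40000   # AGPA-assigned priority, 25% reserve
```

Under the admission control scheme, a new reserve along these lines would be admitted only while the total utilization of active reserves, the sum of C/T over all reserves, stays within the limit written in /etc/timegraph.ac; otherwise the task falls into the Background reserve.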

4 TimeGraph GPU Scheduling

The goal of the GPU command scheduler is to queue and dispatch non-preemptive GPU command groups in accordance with task priorities. To this end, TimeGraph maintains a wait queue in which to stall tasks. It also manages a GPU-online list, a list of pointers to the GPU command groups currently executing on the GPU.

The GPU-online list is used to check whether there are currently-executing GPU command groups when a GPU command group enters the PushBuf interface. If the list is empty, the corresponding task is inserted into it, and the GPU command group is submitted to the GPU. Else, the task is inserted into the wait queue to be scheduled. The scheduling policies supported by TimeGraph will be presented in Section 4.1.

Management of the GPU-online list requires information about when GPU command groups complete. TimeGraph adopts an event-driven model that uses GPU-to-CPU interrupts to notify the completion of each GPU command group, rather than the tick-driven model adopted in previous work [1, 5]. Upon every interrupt, the corresponding GPU command group is removed from the GPU-online list. Our GPU-to-CPU interrupt setting and handling mechanisms will be described in Section 4.2.

4.1 Scheduling Policies

TimeGraph supports two GPU scheduling policies. The Predictable-Response-Time (PRT) policy suits tasks that should behave on a timely basis without affecting important tasks. This policy is predictable in the sense that GPU command groups are scheduled based on task priorities, making high-priority tasks responsive on the GPU. The High-Throughput (HT) policy, on the other hand, is suitable for tasks that should execute as fast as possible. There is a trade-off: the PRT policy protects tasks from interference at the expense of throughput, while the HT policy achieves high throughput for one task but may block others. For instance, desktop-widget, browser-plugin, and video-player tasks would use the PRT policy, while 3-D game and interactive 3-D interfacing tasks can use the HT policy.

[Figure 5: Example of GPU scheduling in TimeGraph under (a) PRT scheduling and (b) HT scheduling, showing CPU and GPU timelines for high-priority (HP), medium-priority (MP), and low-priority (LP) tasks, with task arrivals, command submissions, and interrupts.]

PRT Scheduling: The PRT policy forces any GPU command group to wait for the completion of the preceding GPU command group, if any. Specifically, a new GPU command group arriving at the device driver can be submitted to the GPU immediately if the GPU-online list is empty. Else, the corresponding task must sleep in the wait queue. The highest-priority task in the wait queue, if any, is woken up upon every interrupt from the GPU.

Figure 5 (a) indicates how three tasks with different priorities, high-priority (HP), medium-priority (MP), and low-priority (LP), are scheduled on the GPU under the PRT policy. When the MP task arrives, its GPU command group can execute on the GPU, since no GPU command groups are executing. Given that the GPU and CPU operate asynchronously, the MP task can arrive again while its previous GPU command group is executing. This time, however, the MP task is queued, because the GPU is not idle, according to the PRT policy. Even the next HP task is queued for the same reason, since further higher-priority tasks may arrive soon. The specific set of GPU commands appended at the end of every GPU command group by TimeGraph generates an interrupt to the CPU, and the TimeGraph scheduler is invoked accordingly to wake up the highest-priority task in the wait queue. Hence, the HP task is chosen to execute on the GPU next, rather than the MP task. In this manner, the next instance of the LP task and the second instance of the HP task are scheduled in accordance with their priorities.

Given that the arrival times of GPU command groups are not known a priori, and each GPU command group is non-preemptive, we believe that the PRT policy is the best possible approach to provide predictable response times. However, it inevitably incurs the overhead of making a scheduling decision at every GPU command group boundary, as shown in Figure 5 (a).

HT Scheduling: The HT policy reduces this scheduling overhead, compromising predictable response times a bit. It allows GPU command groups to be submitted to the GPU immediately, if (i) the currently-executing GPU command group was submitted by the same task, and (ii) no higher-priority tasks are ready in the wait queue. Otherwise, they must suspend in the same manner as under the PRT policy. Upon an interrupt, the highest-priority task in the wait queue is woken up only when the GPU-online list is empty (the GPU is idle).

Figure 5 (b) depicts how the same set of GPU command groups used in Figure 5 (a) is scheduled under the HT policy. Unlike under the PRT policy, the second instance of the MP task can submit its GPU command group immediately, because the currently-executing GPU command group was issued by the same task. These two GPU command groups of the MP task can execute successively without producing idle time. The same is true for the two GPU command groups of the HP task. Thus, the HT policy favors throughput-oriented tasks, but the HP task is blocked by the MP task for a longer interval. This is a trade-off, and if priority inversion is critical, the PRT policy is more appropriate.
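The two dispatch rules can be condensed into a small predicate. This is a minimal sketch assuming simplified types (tg_task, the head of the GPU-online list, and the top of the wait queue are placeholders); it is not the actual Nouveau/TimeGraph code, and larger prio values are taken to mean higher priority.

```c
enum tg_policy { TG_PRT, TG_HT };

struct tg_task {
    int prio;                 /* larger value = higher priority (assumed) */
    enum tg_policy policy;
};

/* Return nonzero if an incoming command group from t may be submitted
 * now; online is the task whose group heads the GPU-online list (NULL
 * if the GPU is idle), waiter the highest-priority sleeper (or NULL). */
int tg_may_dispatch(const struct tg_task *t,
                    const struct tg_task *online,
                    const struct tg_task *waiter)
{
    if (online == NULL)
        return 1;                      /* GPU idle: submit immediately    */
    if (t->policy == TG_HT && online == t &&
        (waiter == NULL || waiter->prio <= t->prio))
        return 1;                      /* HT: our own group is running and
                                          no higher-priority task waits   */
    return 0;                          /* PRT (or blocked HT): wait queue */
}
```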
4.2 Interrupt Setting and Handling

In order to provide an event-driven model, TimeGraph configures the GPU to generate an interrupt to the CPU upon the completion of each GPU command group. The scheduling point is thus made at every GPU command group boundary. We now describe how the interrupt is generated. For simplicity of description, we focus here on the NVIDIA GPU architecture.

Completion Notifier: The NVIDIA GPU provides the NOTIFY command to generate an interrupt from the GPU to the CPU. TimeGraph puts this command at the end of each GPU command group. However, the interrupt is launched not when the NOTIFY command is operated but when the next command is dispatched. TimeGraph therefore adds the NOP command after the NOTIFY command, as a dummy command. We also need to consider that GPU commands execute out of order on the GPU. If the NOTIFY command is operated before all commands in the original GPU command group are operated, the generated interrupt is not timely at all. TimeGraph hence adds the SERIALIZE command right before the NOTIFY command, which forces the GPU to stall until all on-the-fly commands complete. There is no need to add another SERIALIZE command after the NOTIFY command, since we know that no tasks other than the current task can use the GPU until TimeGraph is invoked upon the interrupt.

Interrupt Association: All interrupts from the GPU caught in the IRQ handler are relayed to TimeGraph. When TimeGraph receives an interrupt, it first references the head of the GPU-online list to obtain the task information associated with the corresponding GPU command group. TimeGraph next needs to verify whether this interrupt was truly generated by the commands that TimeGraph inserted at the end of the GPU command group, given that user-space programs may also use the NOTIFY command. In order to recognize the right interrupt, TimeGraph further adds the SET_REF command before the SERIALIZE command, which instructs the GPU to write a specified sequence number to a particular GPU register. A separate sequence number is maintained for each task, and is simply incremented by TimeGraph. TimeGraph reads this GPU register when an interrupt is received. If the register value is less than the expected sequence number associated with the corresponding GPU command group, the interrupt is ignored, since it must have been caused by someone else before the SET_REF command. Another SERIALIZE command also needs to be added before the SET_REF command to ensure in-order command execution. As a consequence, TimeGraph inserts the following commands at the end of each GPU command group: SERIALIZE, SET_REF, SERIALIZE, NOTIFY, NOP.

Task Wake-Up: Once the interrupt is verified, TimeGraph removes the GPU command group at the head of the GPU-online list. If the corresponding task is scheduled under the PRT policy, TimeGraph wakes up the highest-priority task in the wait queue, and inserts its GPU command group into the GPU-online list. If the task is assigned the HT policy, meanwhile, TimeGraph wakes up the highest-priority task in the same manner as under the PRT policy, only when the GPU-online list is empty.
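The five-command tail can be sketched as follows. The method identifiers and the emit() helper are hypothetical stand-ins; the real NVIDIA method IDs and the Nouveau push-buffer helpers differ, but the ordering matches the sequence derived above.

```c
/* Hypothetical method opcodes; real NVIDIA method IDs differ. */
enum tg_method { M_SERIALIZE, M_SET_REF, M_NOTIFY, M_NOP };

/* Assumed helper that appends one header+data pair to a command group. */
extern void emit(void *pushbuf, enum tg_method m, unsigned int data);

/* Append TimeGraph's completion-notifier tail to a command group. */
void tg_append_tail(void *pushbuf, unsigned int seq /* per-task counter */)
{
    emit(pushbuf, M_SERIALIZE, 0);  /* drain the group's own commands     */
    emit(pushbuf, M_SET_REF, seq);  /* stamp sequence number in register  */
    emit(pushbuf, M_SERIALIZE, 0);  /* keep SET_REF ahead of the notify   */
    emit(pushbuf, M_NOTIFY, 0);     /* request the GPU-to-CPU interrupt   */
    emit(pushbuf, M_NOP, 0);        /* dummy fetch that fires the notify  */
}
```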
5 TimeGraph GPU Reservation

TimeGraph provides GPU reservation mechanisms to regulate the GPU resource usage of tasks scheduled under the PRT policy. Each task is assigned a reserve that is represented by a capacity C and a period T. The budget e is the amount of time for which a task is entitled to execute. TimeGraph uses a popular rule for budget consumption and replenishment used in real-time systems [20, 26]. Specifically, the budget is decreased by the amount of time consumed on the GPU, and is replenished by at most the capacity C once every period T. However, we need different reservation policies from previous work due to the asynchronous and non-preemptive nature of GPU processing, as we will describe in Section 5.1. Our GPU resource accounting and enforcement mechanisms will be described in Section 5.2. TimeGraph further supports prediction of GPU execution costs for strict isolation; Section 5.3 will describe our approach to GPU execution cost prediction.

5.1 Reservation Policies

TimeGraph supports two GPU reservation policies. The Posterior Enforcement (PE) policy is aimed at light-weight reservation, allowing tasks to overrun out of their reserves to an extent. The Apriori Enforcement (AE) policy reduces reserve overruns by predicting GPU execution costs a priori, at the expense of additional overhead. We recommend that the PE policy be primarily used when isolation is required, and the AE policy be used only if extremely time-critical applications are concurrently executed on the GPU.

PE Reservation: The PE policy permits GPU command groups to be submitted to the GPU if their budget is greater than zero. Else, the task goes to sleep until the budget is replenished. The budget can become negative when the task overruns out of its reservation. The overrun penalty is, however, imposed on the next budget replenishment. The budget for the next period is therefore given by e = min(C, e + C).

[Figure 6: Example of GPU reservation in TimeGraph under (a) PE reservation and (b) AE reservation, showing command-group arrivals, the GPU timeline, and the budget evolving between 0 and C over periods of length T; in (b), the predicted cost gates each submission.]

Figure 6 (a) shows how four GPU command groups of the same task are enforced under the PE policy. The budget is initialized to C. When the second GPU command group completes, the budget is negative. Hence, the third GPU command group must wait for the budget to be replenished, even though the GPU remains idle. Since GPU reservation operates under the PRT policy, the fourth GPU command group is blocked even though the budget is greater than zero, because another GPU command group is currently executing.

AE Reservation: For each GPU command group submission, the AE policy first predicts the GPU execution cost x. The GPU command group can be submitted to the GPU only if the predicted cost is no greater than the budget. Else, the task goes to sleep until the budget is replenished. The next replenishment amount depends on the predicted cost x and the currently-remaining budget e. If the predicted cost x is no greater than the capacity C, the budget for the next period is bounded by e = C to avoid transient overload. Else, it is set to e = min{x, e + C}. The task can be woken up only when e ≥ x.

Figure 6 (b) depicts how the same set of four GPU command groups used in Figure 6 (a) is controlled under the AE policy. For simplicity of description, we assume for now that prediction of GPU execution costs is perfectly accurate; Section 5.3 will describe how to practically predict GPU execution costs. Unlike under the PE policy, the second GPU command group is not submitted to the GPU, as its budget is less than the predicted cost, but it is submitted later when the budget is replenished to e = min{x, e + C}, satisfying e ≥ x. The fourth GPU command group also needs to wait until the budget is sufficiently replenished. However, unlike for the second GPU command group, the replenished budget is bounded by C, since x < C. This avoids transient overload.

Shared Reservation: TimeGraph allows multiple tasks to share a single reserve under the Shared mode. When some task creates a Shared reserve, other tasks can join it. The Shared mode can be used together with both the PE and AE policies. It is useful when users want to cap the GPU resource usage of multiple tasks to a certain range; there is no need to adjust the capacity and period for each task. It can also reduce the overhead of reservation, since only one reserve needs to be managed for multiple tasks.
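The two replenishment rules fit in a few lines of C. A minimal sketch with times in microseconds; signed arithmetic matters because a PE budget may go negative after an overrun, and the function names are illustrative.

```c
static long min_l(long a, long b) { return a < b ? a : b; }

/* PE: replenish once per period T; an overrun (negative e) is paid
 * back because e = min(C, e + C) stays below C after an overrun. */
long pe_replenish(long e, long C)
{
    return min_l(C, e + C);
}

/* AE: x is the predicted cost of the pending command group. A group
 * costing no more than C gets at most one full budget; a larger one
 * accumulates budget across periods, capped at its predicted cost. */
long ae_replenish(long e, long C, long x)
{
    if (x <= C)
        return C;              /* bound by C to avoid transient overload */
    return min_l(x, e + C);    /* grow toward x; task wakes when e >= x  */
}
```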
5.2 Accounting and Enforcement

GPU execution times are accounted for in the PushBuf interface and the IRQ handler, as illustrated in Figure 4. TimeGraph saves CPU timestamps when GPU command groups start and complete. Specifically, when a GPU command group is qualified to be submitted to the GPU, TimeGraph records the current CPU time as its start time in the PushBuf interface; at some later point, when TimeGraph is notified of the completion of this GPU command group, the current CPU time is recorded as its finish time in the IRQ handler. The difference between the start time and the finish time is accounted as the execution time of this GPU command group, and is subtracted from the budget.

Enforcement works differently for the PE and AE policies. In the PushBuf interface, the AE policy predicts the execution cost x of each GPU command group based on the approach presented in Section 5.3, while the PE policy always assumes x = 0. Both policies then compare the budget e with the cost x. Only if e > x is satisfied can the GPU command group be submitted to the GPU. Otherwise, the corresponding task is suspended until the budget is replenished. It should be noted that this enforcement mechanism is very different from traditional CPU reservation mechanisms [20, 26] that use timers or ticks to suspend tasks: since GPU command groups are non-preemptive, enforcement must be performed at GPU command group boundaries. TimeGraph however still uses timers to replenish the budget periodically. Every time the budget is replenished, it compares e and x again. If e > x is satisfied, the task is woken up, but it needs to be rescheduled, as illustrated in Figure 4.

5.3 Command Profiling

TimeGraph contains the GPU command profiler to predict GPU execution costs for AE reservation. Each GPU command is composed of a header and data, as shown in Figure 2. We therefore parse the methods and data sizes from the headers. We now explain how to predict GPU execution costs from these pieces of information. GPU applications tend to repeatedly create GPU command groups with the same methods and data sizes, since they use the same set of API functions, e.g., OpenGL, and each function likely generates the same sequence of GPU commands in terms of methods and data sizes, while data values vary considerably. Given that GPU execution costs depend highly on methods and data sizes, but not on data values, we propose a history-based prediction approach.

TimeGraph manages a history table to record GPU command group information. Each record consists of a GPU command group matrix and the average GPU execution cost associated with this matrix. The rows and columns of the matrix contain the methods and their data sizes respectively. TimeGraph also attaches a flag to each GPU command group, indicating whether it hits some record. When the methods and data sizes of a GPU command group are obtained from the remapped User Push Buffer, TimeGraph looks at the history table. If there exists a record that contains exactly the same GPU command group matrix, i.e., the same set of methods and data sizes, TimeGraph uses the average GPU execution cost stored in this record, and the flag is set. Otherwise, the flag is cleared, and TimeGraph uses the worst-case GPU execution cost among all the records. Upon completion of the GPU command group, TimeGraph references the flag attached to the corresponding task. If the flag is set, it updates the average GPU execution cost of the record with the actual execution time of this GPU command group. Otherwise, it inserts a new record whose matrix has the methods and data sizes of this GPU command group, and whose average GPU execution time is initialized with its actual execution time. The size of the history table is configurable by designers. If the total number of records exceeds the table size, the least-recently-used (LRU) record is removed.

Preemption Impact: Even the same GPU command group may consume very different GPU execution times. For example, if reusable texture data is cached, a graphics operation is much faster. We observe that GPU execution times can vary when GPU contexts (channels) are switched. Hence, TimeGraph checks for GPU context switches at every scheduling point. If the context has been switched, TimeGraph does not update the average GPU execution cost, since the context switch may have affected the actual GPU execution time. Instead, it saves the difference between the actual GPU execution time and the average GPU execution cost as the preemption impact, and keeps updating the average preemption impact. A single preemption cost is measured beforehand when TimeGraph is loaded. The preemption impact is then added to the predicted cost.
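Seen as a data structure, the profiler is a small keyed table of running averages with LRU eviction. The sketch below reduces the command group matrix to a 64-bit digest of its (method, data size) pairs and uses a simple halving average, both simplifications of the scheme described above; all names are illustrative.

```c
#include <stdint.h>

#define TG_TABLE_SIZE 100    /* table size used in the evaluation below */

struct tg_record {
    uint64_t key;      /* digest of the group's (method, data size) pairs */
    uint64_t avg_us;   /* running-average GPU execution cost              */
    uint64_t stamp;    /* last-use time for LRU eviction                  */
    int used;
};

static struct tg_record tab[TG_TABLE_SIZE];
static uint64_t now, worst_us;      /* worst cost seen across all records */

/* Look up a command group; returns the predicted cost, sets *hit. */
uint64_t tg_predict(uint64_t key, int *hit)
{
    for (int i = 0; i < TG_TABLE_SIZE; i++)
        if (tab[i].used && tab[i].key == key) {
            tab[i].stamp = ++now;
            *hit = 1;
            return tab[i].avg_us;   /* matching record: use its average */
        }
    *hit = 0;
    return worst_us;                /* miss: fall back to the worst case */
}

/* After completion: update the average on a hit, insert (LRU) on a miss. */
void tg_update(uint64_t key, int hit, uint64_t actual_us)
{
    int victim = 0;
    for (int i = 0; i < TG_TABLE_SIZE; i++) {
        if (hit && tab[i].used && tab[i].key == key) {
            tab[i].avg_us = (tab[i].avg_us + actual_us) / 2;
            return;
        }
        if (!tab[i].used || tab[i].stamp < tab[victim].stamp)
            victim = i;             /* empty or least-recently-used slot */
    }
    tab[victim] = (struct tg_record){ key, actual_us, ++now, 1 };
    if (actual_us > worst_us)
        worst_us = actual_us;
}
```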
command group information. Each record consists of a As primary 3-D graphics benchmarks, we use the GPU command group matrix and the average GPU ex- [24] that executes the OpenGL 3- ecution cost associated to this matrix. The row and the D games, OpenArena, World of Padman, , column of the matrix contain the methods and their data and 2004 (UT2004), in the demo sizes respectively. TimeGraph also attaches a flag to each mode based on the test profile, producing various GPU- GPU command group, indicating if it hits some record. intensive workloads. We also use MPlayer as a pe- When the methods and the data sizes of the GPU com- riodic workload. In addition, the Gallium3D Engine mand group are obtained from the remapped User Push demo program is used as a regularly-behaved workload, Buffer, TimeGraph looks at the history table. If there and the Gallium3D Clearspd demo program that ex- exists a record that contains exactly the same GPU com- ploits a GPU command bomb is used as a misbehaving mand group matrix, i.e., the same set of methods and workload. Furthermore, we use SPECviewperf 11 [28] data sizes, it uses the average GPU execution cost stored to evaluate the throughput of different GPU scheduling in this record, and the flag is set. Otherwise, the flag models. The screen resolution is set to 1280 × 1024. is cleared, and TimeGraph uses the worst-case GPU ex- The scheduling parameters are loaded from the pre- 60 configured TimeGraph specification file. The maximum No TimeGraph support HT & PRT scheduling number of records in the history table for GPU execution 50 HT & PRT scheduling w/ PE reservation HT & PRT scheduling w/ AE reservation cost prediction is set to 100. 40

6.1 Prioritization and Isolation

We first evaluate the prioritization and isolation properties achieved by TimeGraph. As described in Section 3, TimeGraph automatically assigns priorities. CPU nice priorities are always effective, while GPU priorities are effective only when TimeGraph is activated. The priority level is aligned between the GPU and CPU. We use the PRT policy for the X server to prevent it from affecting primary applications, but it is scheduled with the highest GPU/CPU priority, since it should still be responsive to blit the rendered frames to the screen.

Coarse-grained Performance: Figure 7 shows the performance of the 3-D games while the Engine widget concurrently shares the GPU. We use the HT policy for the 3-D games, while the Engine widget is assigned the PRT policy under TimeGraph. As shown in Figure 7, TimeGraph improves the performance of the 3-D games by about 11% for OpenArena, 27% for World of Padman, 22% for Urban Terror, and 2% for UT2004, with GPU priority support. Further performance isolation is obtained by GPU reservation, capping the GPU resource usage of the Engine widget. Our experiment assigns the Engine widget a reserve of 2.5ms every 25ms to keep its GPU resource usage at 10%. As compared to the case without GPU reservation support, the performance of the 3-D games is improved by 2 ∼ 21% under PE reservation, and by 4 ∼ 36% under AE reservation. Thus, the AE policy provides better performance for the 3-D games at the expense of more conservative scheduling of the Engine widget with prediction.

[Figure 7: Average frame rate (fps) of the 3-D games competing with a single instance of the Engine widget, with no TimeGraph support, HT & PRT scheduling, and HT & PRT scheduling with PE or AE reservation.]

Figure 8 presents results from a setup similar to the above experiments, where the Clearspd bomb generates a heavily-competing workload instead of the Engine widget. The performance benefit resulting from assigning higher GPU priorities to the games under the HT policy is clearer in this setup. Even without GPU reservation support, TimeGraph enables the 3-D games to run about 3 ∼ 6 times faster than the vanilla Nouveau driver, though they still face a performance loss of about 24 ∼ 52% as compared to the previous setup where the Engine widget contends with the 3-D games. Regulating the GPU resource usage of the Clearspd bomb through GPU reservation limits this performance loss to within 3%. In particular, the AE policy yields improvements of up to 5% over the PE policy.

[Figure 8: Average frame rate (fps) of the 3-D games competing with a single instance of the Clearspd bomb, under the same set of TimeGraph configurations.]

Extreme Workloads: In order to evaluate the capabilities of TimeGraph in the face of extreme workloads, we execute the 3-D games with five instances of the Clearspd bomb. In this case, the cap of each individual reserve is correspondingly decreased to 0.5ms every 25ms so that the total cap of the five Clearspd-bomb tasks aligns with 2.5ms every 25ms. As there are multiple Clearspd-bomb tasks, we evaluate an additional setup where a single PE reserve of 2.5ms every 25ms runs in the Shared reservation mode. As shown in Figure 9, the 3-D games are nearly unresponsive without TimeGraph support due to the scaled-up GPU workload, whereas TimeGraph can isolate the performance of the 3-D games even under such an extreme circumstance. In fact, the performance impact is reduced to 7 ∼ 20% by using GPU priorities, and leveraging GPU reservation results in nearly no performance loss, similar to the results in Figure 8. The Shared reservation mode also provides slightly better performance with PE reserves.

[Figure 9: Average frame rate (fps) of the 3-D games competing with five instances of the Clearspd bomb, including an additional Shared PE reservation configuration.]

Performance Regulation: We next demonstrate the effectiveness of TimeGraph in regulating the frame-rate of each task by changing the size of its GPU reserve. Figure 10 shows the performance of the OpenArena game and the Engine widget contending with each other. The solid lines indicate a setup where the PE policy is assigned to both applications, while the dotted lines indicate a setup where the AE policy is assigned to the Engine widget instead. GPU reservation is configured so that the total GPU resource usage of the two applications is capped at 90%, with the remaining 10% available for the X server. Assigning the AE policy to the Engine widget slightly improves the performance of the OpenArena game, while it brings a performance penalty for the Engine widget itself due to the overhead of predicting GPU execution costs. In either case, however, TimeGraph successfully regulates the frame-rate in accordance with the size of the GPU reserve. From this experiment, we conclude that it is desirable to assign the OpenArena game a GPU reserve with C/T = 60 ∼ 80% and the Engine widget one with C/T = 10 ∼ 30%, given that this configuration provides both applications with an acceptable frame-rate over 25 fps.

[Figure 10: Performance regulation by GPU reservation for the 3-D game and the 3-D widget: frame rates (fps) of OpenArena and Engine (PE for both, and AE for Engine) as a function of the reserve size of the Engine widget (%).]

Fine-grained Performance: The 3-D games demonstrate highly variable frame-rate workloads, while 3-D widgets often exhibit nearly constant frame-rates. In order to study the behavior of TimeGraph on both these categories of applications, we look at the variability of frame-rate over time for the Engine widget contending with five instances of the Clearspd bomb, as shown in Figure 11. The total GPU resource usage of the Clearspd-bomb tasks is capped at 2.5ms every 25ms through GPU reservation, and a higher priority is given to the Engine widget. These results show that GPU reservation can provide stable frame-rates over time for the Engine widget. Since the Engine widget is not as GPU-intensive as the 3-D games, it is affected more by the Clearspd bomb overloading the GPU when GPU reservation is not applied. The benefits of GPU reservation are therefore observed more clearly.

[Figure 11: Frame rate (fps) of the Engine widget over time while competing with five instances of the Clearspd bomb, with no TimeGraph support, PRT scheduling, PRT scheduling with PE or AE reservation, and PRT scheduling with Shared PE reservation.]

Interference Issues: We now evaluate the interference among regularly-behaved concurrent 3-D widgets. Figure 12 (a) shows the chaotic behavior arising from executing three instances of the Engine widget concurrently, with different CPU priorities but without TimeGraph support. Although the Engine widget by itself is a very regular workload, when competing with more instances of itself, its GPU resource usage exhibits high variability and unpredictability. Figure 12 (b) illustrates the improved behavior under TimeGraph using the PRT policy, where we assign the high, medium, and low GPU priorities to Engine #1, Engine #2, and Engine #3 respectively, using the user-space tool presented in Section 3. TimeGraph successfully provides predictable response times for the three tasks according to their priorities. Further performance isolation can be achieved by GPU reservation, exploiting different sizes of GPU reserves: (i) 15ms every 25ms for Engine #1, (ii) 5ms every 50ms for Engine #2, and (iii) 5ms every 100ms for Engine #3, as shown in Figure 12 (c). The PE policy is used here. Since the Engine widget has a non-trivial dependence on the CPU, the absolute performance is lower than expected for smaller reserves.

[Figure 12: Interference among three Engine widget instances: frame rates (fps) over time under (a) no TimeGraph support, (b) PRT scheduling, and (c) PRT scheduling with PE reservation.]

Periodic Behavior: To evaluate the impact on applications with periodic activity, we execute MPlayer in the foreground while five instances of the Clearspd bomb contend for the GPU. We use an H264-compressed video with a frame size of 1920×800 and a frame rate of 24 fps, which uses x-video acceleration on the GPU. As shown in Figure 13, the video playback experience is significantly disturbed without TimeGraph support. When TimeGraph assigns a PE reserve of 10ms every 40ms to MPlayer, and a PE reserve of 5ms every 40ms to the Clearspd-bomb tasks in the Shared reservation mode, the playback experience is significantly improved: it closely follows the ideal frame-rate of 24 fps for video playback. This illustrates the benefits of TimeGraph for interactivity, where performance isolation plays a vital role in determining user experience.

[Figure 13: Frame rate (fps) of MPlayer over time while competing with five instances of the Clearspd bomb, with no TimeGraph support and with PRT scheduling plus PE reservation.]

6.2 GPU Execution Cost Prediction

We now evaluate the history-based prediction of GPU execution costs used for GPU reservation with the AE policy. The effectiveness of AE reservation relies highly on GPU execution cost prediction; hence, it is important to identify the types of applications for which we can predict GPU execution costs precisely. Figure 14 shows both actual GPU execution costs and prediction errors for the 3-D graphics applications used in earlier experiments: Engine, Clearspd, and OpenArena. Since these applications issue a very large number of GPU command groups, we focus on a snapshot between the 5000th and the 10000th GPU command groups of each application. In the following discussion, positive errors denote pessimistic predictions. Figure 14 (a) shows that the history-based prediction approach employed by TimeGraph performs within a 15% error margin on the Engine widget, which uses a reasonable number of methods. For the Clearspd bomb, which issues a very limited set of methods while producing extreme workloads, TimeGraph can predict GPU execution costs within a 7% error margin, as shown in Figure 14 (b).

[Figure 14: Errors for GPU execution cost prediction: actual GPU execution costs and prediction errors (µs) over the 5000th to 10000th GPU command groups for (a) the Engine widget, (b) the Clearspd bomb, and (c) the OpenArena game.]
On the other hand, the results of GPU execution cost prediction for OpenArena, provided in Figure 14 (c), show that only about 65% of the observed GPU command groups have their GPU execution costs predicted within a 20% error margin. Such unpredictability arises from the inherently dynamic nature of complex computer graphics, like abrupt scene changes. The actual penalty of mis-prediction is, however, suffered only once per reserve period, and is hence not expected to be significant for reserves with long periods.

In addition to the presented method, we have also explored static approaches using pre-configured values for predicted costs. Our experiments show that such static approaches perform worse than the presented dynamic approach, largely due to the dynamically changing and non-stationary nature of application workloads.

GPU execution cost prediction plays a vital role in real-time setups, where it is unacceptable for low-priority tasks to cause even the slightest interference to high-priority tasks. The above experimental results show that our prediction approach tends to fail for complex interactive applications like OpenArena. However, we expect the structure of real-time applications to be less dynamic and more regular, like the Engine and Clearspd tasks. GPU reservation with the AE policy for complex applications like OpenArena would require support from the application program itself, since their behavior is not easily predictable from historic execution results. Otherwise, the PE policy is desired for its low overhead.

priority tasks. As the above experimental results show handShaded squid shaded wolf fur select hand wireframe wolf shaded HQ toy store shaded handShaded HQ squid shaded HQ that our prediction approach tends to fail for complex wolf fur select HQ toy store wireframe interactive applications like OpenArena. However, we SPECViewperf11 Maya benchmarks expect the structure of real-time applications to be less dynamic and more regular like the Engine and Clearspd tasks. GPU reservation with the AE policy for complex Figure 16: Throughput of event-driven and tick-driven applications like OpenArena would require support from schedulers in TimeGraph. the application program itself, since their behavior is not easily predictable from historic execution results. Other- head cost is inevitable for GPU scheduling at the device- wise, the PE policy is desired for low overhead. driver level to shield important GPU applications from performance interference. If there is no TimeGraph sup- 6.3 Overhead and Throughput port, the performance of high-priority GPU applications could significantly decrease in the presence of competing In order to quantify the performance overhead imposed GPU workloads (as shown in Figure 1), which affects the by TimeGraph, we measure the standalone performance system performance more than the maximum scheduling of the 3-D game benchmarks. Figure 15 shows that as- overhead of 28% introduced by TimeGraph. As a con- signing the HT policy for both the games and the X sequence, TimeGraph is evidently beneficial in real-time server incurs about 4% performance overhead for the multi-tasking environments. games. This small overhead is attributed to the fact that Finally, we compare the throughput of two conceiv- TimeGraph is still invoked upon every arrival and com- able GPU scheduling models: (i) the event-driven sched- pletion of GPU command group. It is interesting to see uler adopted in TimeGraph, and (ii) the tick-driven that assigning the PRT policy for the X server increases scheduler that was presented in the previous work, called the overheadfor the games up to about 10%, even though GERM [1, 5]. The TimeGraph event-driven scheduler the games use the HT policy. As the X server is used to is invoked only when GPU command groups arrive and blit the rendered frames to the screen, it can lower the complete, while the GERM tick-driven scheduler is in- frame-rateof the game, if it is blocked by the game itself. voked at every tick, configured to be 1ms (Linux jiffies). On the other hand, assigning the PRT policy for both Figure 16 shows the results of SPECviewperf bench- the X server and the game adds a non-trivial overhead of marking with the Maya viewset created for 3-D computer about 17 ∼ 28% largely due to queuing and dispatching graphics, where the PRT policy is used for GPU schedul- all GPU command groups. This overhead is introduced ing. Surprisingly, the TimeGraph event-driven scheduler by queuing delays and scheduling overheads, as Time- obtains about 15 ∼ 30 times better scores than the tick- Graph needs to be invoked for submission of each GPU driven scheduler for most test cases. According to our command group. We however conjecture that such over- analysis, this difference arises from the fact that many GPU command groups can arrive in a very short interval. the user space as a first-class primitive. Apparently, there The GERM tick-driven scheduler reads a particular GPU is no GPU reservation support. 
In fact, since NVIDIA register every tick to verify if the current GPU command shares more than 90% of code between Linux and Win- group has completed. Suppose that there are 30 GPU dows [23]. Therefore, it eventually suffers from the per- command groups with a total execution time of less than formance interference as demonstrated in Figure 1. 1ms. The tick-driven scheduler takes at least 30ms to VMGL [11] supports virtualization in the OpenGL compete these GPU command groups because the GPU for graphics applications running inside a Vir- register must be read 10 times, while the event-driven tual Machine (VM). It passes graphics requests from scheduler could complete them in 1ms as GPU-to-CPU guest OSes to a VMM host, but GPU resource man- interrupts are used. Hence, non-trivial overheads are im- agement is left to the underlying device driver. The posed on the tick-driven scheduler. GPU-accelerated Virtual Machine (GViM) [8] virtual- izes the GPU at the level of abstraction for GPGPU ap- 7 Related Work plications, such as the CUDA APIs. However, since the solution is ’above’ the device driver layer, GPU resource GPU Scheduling: The Graphics Engine Resource Man- management is coarse-grained and functionally limited. ager (GERM) [1, 5] aims for GPU multi-tasking support VMware’s Virtual GPU [3] enables GPU virtualization similar to TimeGraph. The resource management con- at the I/O level. Hence, it operates faster and its usage cepts of TimeGraph and GERM are, however, fundamen- is not limited to GPGPU applications. However, multi- tally different. TimeGraph focuses on prioritization and tasking support with prioritization, isolation, or fairness isolation among competing GPU applications, while fair- is not supported. TimeGraph could coordinate with these ness is a primary concern for GERM. Since fair resource GPU virtualization systems to provide predictable re- allocation cannot shield particular important tasks from sponse times and isolation. interferencein the face of extreme workloads, as reported CPU Scheduling: TimeGraph shares the concept of in [31], TimeGraph addresses this problem for GPU ap- priority and reservation, which has been well-studied by plications through priority and reservation support. Ap- the real-time systems community [13, 26], but there is proaches to synchronize the GPU with the CPU are also a fundamental difference from these traditional studies different between TimeGraph and GERM. TimeGraph is in that TimeGraph is designed to address an arbitrarily- based on an event-driven model that uses GPU-to-CPU arriving non-preemptive GPU execution model, whereas interrupts, whereas GERM adopts a tick-driven model the real-time systems community has often considered that polls a particular GPU register. As demonstrated a periodic preemptive CPU execution model. Several in Section 6.3, the tick-driven model can become unre- bandwidth-preserving approaches [12, 29, 30] for an sponsive when many GPU commands arrive in a short arbitrarily-arriving model exist, but a non-preemptive interval, which could likely happen for graphics and model has not been much studied yet. The concept compute-intensive workloads, while TimeGraph is re- of priority and reservation has also been considered in sponsive even in such cases. Hence, TimeGraph is more the operating systems literature [4, 9, 10, 17, 20, 31]. suitable for real-time applications. 
7 Related Work

GPU Scheduling: The Graphics Engine Resource Manager (GERM) [1, 5] aims for GPU multi-tasking support similar to TimeGraph. The resource management concepts of TimeGraph and GERM are, however, fundamentally different. TimeGraph focuses on prioritization and isolation among competing GPU applications, while fairness is a primary concern for GERM. Since fair resource allocation cannot shield particular important tasks from interference in the face of extreme workloads, as reported in [31], TimeGraph addresses this problem for GPU applications through priority and reservation support. The approaches to synchronizing the GPU with the CPU also differ between TimeGraph and GERM. TimeGraph is based on an event-driven model that uses GPU-to-CPU interrupts, whereas GERM adopts a tick-driven model that polls a particular GPU register. As demonstrated in Section 6.3, the tick-driven model can become unresponsive when many GPU commands arrive in a short interval, which could likely happen under graphics and compute-intensive workloads, while TimeGraph remains responsive even in such cases. Hence, TimeGraph is more suitable for real-time applications. In addition, TimeGraph can predict GPU execution costs a priori, taking into account both methods and data sizes, while GERM estimates them a posteriori, using only data sizes. Since GPU execution costs depend heavily not only on data sizes but also on methods, we claim that TimeGraph computes GPU execution costs more precisely. However, additional computation overheads are required for prediction. TimeGraph therefore also provides light-weight reservation with the PE policy, which needs no prediction, to address this trade-off. Furthermore, TimeGraph falls entirely inside the device driver, while GERM is spread across the device driver and a user-space library. Hence, GERM could require major modifications for different runtime frameworks, e.g., OpenGL, OpenCL, CUDA, and HMPP.
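To illustrate the a-priori approach, the sketch below keeps a history table indexed by GPU method and by a logarithmic data-size bucket, and predicts the cost of a new command group from previously observed ones. The table layout, the bucketing, and the running average are assumptions made for this illustration; they do not reproduce TimeGraph's exact prediction mechanism.

    #include <stdio.h>

    /* Illustrative history-based cost prediction keyed by both GPU
     * method and data size, as the comparison above describes.  The
     * constants and the averaging rule are assumptions of this sketch. */

    #define NMETHODS 256  /* hypothetical method-id space      */
    #define NBUCKETS  16  /* data sizes grouped by log2(bytes) */

    struct cost_entry {
        double avg_us;    /* running average of observed costs */
        long   samples;
    };

    static struct cost_entry table[NMETHODS][NBUCKETS];

    static int bucket(long bytes)
    {
        int b = 0;
        while (bytes > 1 && b < NBUCKETS - 1) {
            bytes >>= 1;
            b++;
        }
        return b;
    }

    /* A priori: look up the history for (method, size) before dispatch. */
    static double predict_cost(int method, long bytes, double fallback_us)
    {
        struct cost_entry *e = &table[method % NMETHODS][bucket(bytes)];
        return e->samples ? e->avg_us : fallback_us;
    }

    /* Record the measured cost after completion to refine the model. */
    static void record_cost(int method, long bytes, double measured_us)
    {
        struct cost_entry *e = &table[method % NMETHODS][bucket(bytes)];
        e->avg_us = (e->avg_us * e->samples + measured_us) / (e->samples + 1);
        e->samples++;
    }

    int main(void)
    {
        record_cost(42, 4096, 120.0);            /* observed history     */
        printf("predicted: %.1f us\n",
               predict_cost(42, 5000, 500.0));   /* same bucket -> 120.0 */
        return 0;
    }

A table keyed only by data size, as in the posterior approach attributed to GERM, would collapse the method dimension and thus conflate commands whose costs differ widely.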
The Windows Display Driver Model (WDDM) [25] is a GPU driver architecture for Windows. While it is proprietary, GPU priorities seem to be supported in our experience, but they are not explicitly exposed to the user space as a first-class primitive. Apparently, there is no GPU reservation support. In fact, NVIDIA shares more than 90% of code between Linux and Windows [23]; therefore, it eventually suffers from the same performance interference as demonstrated in Figure 1.

VMGL [11] supports virtualization of the OpenGL API for graphics applications running inside a Virtual Machine (VM). It passes graphics requests from guest OSes to a VMM host, but GPU resource management is left to the underlying device driver. The GPU-accelerated Virtual Machine (GViM) [8] virtualizes the GPU at the level of abstraction of GPGPU APIs, such as the CUDA APIs. However, since this solution sits 'above' the device-driver layer, its GPU resource management is coarse-grained and functionally limited. VMware's Virtual GPU [3] enables GPU virtualization at the I/O level. Hence, it operates faster, and its usage is not limited to GPGPU applications. However, multi-tasking support with prioritization, isolation, or fairness is not provided. TimeGraph could coordinate with these GPU virtualization systems to provide predictable response times and isolation.

CPU Scheduling: TimeGraph shares the concepts of priority and reservation, which have been well-studied by the real-time systems community [13, 26], but there is a fundamental difference from these traditional studies in that TimeGraph is designed to address an arbitrarily-arriving non-preemptive GPU execution model, whereas the real-time systems community has often considered a periodic preemptive CPU execution model. Several bandwidth-preserving approaches [12, 29, 30] for an arbitrarily-arriving model exist, but a non-preemptive model has not been much studied yet. The concepts of priority and reservation have also been considered in the operating systems literature [4, 9, 10, 17, 20, 31]. Specifically, batch scheduling has a constraint similar to GPU scheduling in that non-preemptive regions disturb predictable responsiveness [27]. These previous works are, however, mainly focused on synchronous on-chip CPU architectures, whereas TimeGraph addresses these scheduling problems for asynchronous on-board GPU architectures, where explicit synchronization between the GPU and the CPU is required.

Disk Scheduling: Disk devices are similar to GPUs in that they operate with non-preemptive regions off the chip. Disk scheduling for real-time and interactive systems [2, 16] therefore considered priority and reservation support for non-preemptive operation. However, the GPU is typically a coprocessor independent of the CPU, with its own set of execution contexts, registers, and memory devices, while the disk is more dedicated to I/O. Hence, TimeGraph uses completely different mechanisms to realize prioritization and isolation than this previous work. In addition, TimeGraph needs to address the trade-off between predictable response times and throughput, since synchronizing the GPU and the CPU incurs overhead, while disk I/O is inherently synchronous with read and write operations.

8 Conclusions

This paper has presented TimeGraph, a GPU scheduler to support real-time multi-tasking environments. We developed an event-driven model to schedule GPU commands in a responsive manner. This model allowed us to propose two GPU scheduling policies, Predictable Response Time (PRT) and High Throughput (HT), which address the trade-off between response times and throughput. We also proposed two GPU reservation policies, Posterior Enforcement (PE) and Apriori Enforcement (AE), which present an essential design knob for choosing the level of isolation and throughput. Our detailed evaluation demonstrated that TimeGraph can protect important GPU applications even in the face of extreme GPU workloads, while providing high throughput, in real-time multi-tasking environments. TimeGraph is open-source software, and may be downloaded from our website at http://rtml.ece.cmu.edu/projects/timegraph/.

In future work, we will elaborate on the coordination of GPU and CPU resource management schemes to further consolidate prioritization and isolation capabilities for the entire system. We are also interested in the coordination of video memory and system memory management schemes. Exploration of other models for GPU scheduling is another interesting direction of future work. For instance, modifying the current API to introduce non-blocking interfaces could improve throughput at the expense of modifications to legacy applications. Scheduling overhead and blocking time may also be reduced by implementing a real-time satellite kernel [18] on the microcontrollers present in modern GPUs. Finally, we will tackle the problem of mapping application-level specifications, such as frame-rates, into priority and reservation properties at the operating-system level.
References

[1] BAUTIN, M., DWARAKINATH, A., AND CHIUEH, T. Graphics Engine Resource Management. In Proc. MMCN (2008).
[2] DIMITRIJEVIC, Z., RANGASWAMI, R., AND CHANG, E. Design and Implementation of Semi-preemptible IO. In Proc. USENIX FAST (2003).
[3] DOWTY, M., AND SUGERMAN, J. GPU Virtualization on VMware's Hosted I/O Architecture. ACM SIGOPS Operating Systems Review 43, 3 (2009), 73–82.
[4] DUDA, K., AND CHERITON, D. Borrowed-Virtual-Time (BVT) Scheduling: Supporting Latency-Sensitive Threads in a General-Purpose Scheduler. In Proc. ACM SOSP (1999), pp. 261–276.
[5] DWARAKINATH, A. A Fair-Share Scheduler for the Graphics Processing Unit. Master's thesis, Stony Brook University, 2008.
[6] FAITH, R. The Direct Rendering Manager: Kernel Support for the Direct Rendering Infrastructure. Precision Insight, Inc., 1999.
[7] FREEDESKTOP. Nouveau Open-Source Driver. http://nouveau.freedesktop.org/.
[8] GUPTA, V., GAVRILOVSKA, A., TOLIA, N., AND TALWAR, V. GViM: GPU-accelerated Virtual Machines. In Proc. ACM HPCVirt (2009), pp. 17–24.
[9] JONES, M., ROSU, D., AND ROSU, M.-C. CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities. In Proc. ACM SOSP (1997), pp. 198–211.
[10] KRASIC, C., SAUBHASIK, M., AND GOEL, A. Fair and Timely Scheduling via Cooperative Polling. In Proc. ACM EuroSys (2009), pp. 103–116.
[11] LAGAR-CAVILLA, H., TOLIA, N., SATYANARAYANAN, M., AND DE LARA, E. VMM-Independent Graphics Acceleration. In Proc. ACM VEE (2007), pp. 33–43.
[12] LEHOCZKY, J., SHA, L., AND STROSNIDER, J. Enhanced Aperiodic Responsiveness in Hard Real-Time Environments. In Proc. IEEE RTSS (1987), pp. 261–270.
[13] LIU, C., AND LAYLAND, J. Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. Journal of the ACM 20 (1973), 46–61.
[14] MARTIN, K., FAITH, R., OWEN, J., AND AKIN, A. Direct Rendering Infrastructure, Low-Level Design Document. Precision Insight, Inc., 1999.
[15] MESA3D. Gallium3D. http://www.mesa3d.org/.
[16] MOLANO, A., JUVVA, K., AND RAJKUMAR, R. Real-Time Filesystems: Guaranteeing Timing Constraints for Disk Accesses in RT-Mach. In Proc. IEEE RTSS (1997), pp. 155–165.
[17] NIEH, J., AND LAM, M. SMART: A Processor Scheduler for Multimedia Applications. In Proc. ACM SOSP (1995).
[18] NIGHTINGALE, E., HODSON, O., MCILROY, R., HAWBLITZEL, C., AND HUNT, G. Helios: Heterogeneous Multiprocessing with Satellite Kernels. In Proc. ACM SOSP (2009).
[19] NVIDIA CORPORATION. Proprietary Driver. http://www.nvidia.com/page/drivers.html.
[20] OIKAWA, S., AND RAJKUMAR, R. Portable RK: A Portable Resource Kernel for Guaranteed and Enforced Timing Behavior. In Proc. IEEE RTAS (1999), pp. 111–120.
[21] PATHSCALE INC. ENZO. http://www.pathscale.com/.
[22] PATHSCALE INC. PSCNV. https://github.com/pathscale/pscnv.
[23] PHORONIX. NVIDIA Developer Talks Openly About Linux Support. http://www.phoronix.com/scan.php?page=article&item=nvidia_qa_linux&num=2.
[24] PHORONIX. Phoronix Test Suite. http://www.phoronix-test-suite.com/.
[25] PRONOVOST, S., MORETON, H., AND KELLEY, T. Windows Display Driver Model (WDDM v2) and Beyond. In Windows Hardware Engineering Conference (2006).
[26] RAJKUMAR, R., LEE, C., LEHOCZKY, J., AND SIEWIOREK, D. A Resource Allocation Model for QoS Management. In Proc. IEEE RTSS (1997), pp. 298–307.
[27] ROUSSOS, K., BITAR, N., AND ENGLISH, R. Deterministic Batch Scheduling Without Static Partitioning. In Proc. JSSPP (1999), pp. 220–235.
[28] SPEC. SPECviewperf. http://www.spec.org/gwpg/gpc.static/vp11info.html.
[29] SPRUNT, B., LEHOCZKY, J., AND SHA, L. Exploiting Unused Periodic Time for Aperiodic Service Using the Extended Priority Exchange Algorithm. In Proc. IEEE RTSS (1988), pp. 251–258.
[30] SPURI, M., AND BUTTAZZO, G. Efficient Aperiodic Service under Earliest Deadline Scheduling. In Proc. IEEE RTSS (1994), pp. 2–11.
[31] YANG, T., LIU, T., BERGER, E., KAPLAN, S., AND MOSS, J.-B. Redline: First Class Support for Interactivity in Commodity Operating Systems. In Proc. USENIX OSDI (2008), pp. 73–86.