Enabling CUDA Acceleration within Virtual Machines using rCUDA
José Duato, Antonio J. Peña, Federico Silla
D. Informática de Sistemas y Computadores
Universitat Politècnica de València
Camino de Vera s/n
46022 Valencia, Spain
{jduato, fsilla}@disca.upv.es, [email protected]

Juan C. Fernández, Rafael Mayo, Enrique S. Quintana-Ortí
D. Ingeniería y Ciencia de los Computadores
Universitat Jaume I
Av. Vicente Sos Baynat s/n
12071 Castellón, Spain
{jfernand, mayo, quintana}@icc.uji.es

© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract

The hardware and software advances of Graphics Processing Units (GPUs) have favored the development of GPGPU (General-Purpose Computation on GPUs) and its adoption in many scientific, engineering, and industrial areas. Thus, GPUs are increasingly being introduced in high-performance computing systems as well as in datacenters. On the other hand, virtualization technologies are also receiving rising interest in these domains, because of their many benefits in acquisition and maintenance savings. There are currently several works on GPU virtualization. However, there is no standard solution allowing access to GPGPU capabilities from virtual machine environments such as VMware, Xen, VirtualBox, or KVM. This lack of a standard solution is delaying the integration of GPGPU into these domains.

In this paper, we propose a first step towards a general and open-source approach for using GPGPU features within VMs. In particular, we describe the use of rCUDA, a GPGPU virtualization framework, to permit the execution of GPU-accelerated applications within virtual machines (VMs), thus enabling GPGPU capabilities in any virtualized environment. Our experiments with rCUDA in the context of KVM and VirtualBox on a system equipped with two NVIDIA GeForce 9800 GX2 cards illustrate the overhead introduced by the rCUDA middleware and prove the feasibility and scalability of this general virtualization solution. Experimental results show that the overhead is proportional to the dataset size, while the scalability is similar to that of the native environment.

1. Introduction

Many-core specialized processors and accelerators and, in particular, graphics processors (GPUs) are experiencing increased adoption as an appealing way of reducing the time-to-solution in areas as diverse as finance [1], chemical physics [2], computational fluid dynamics [3], computational algebra [4], image analysis [5], and many others. These hardware accelerators offer a large number of processing elements with small cache memories and high processor-to-memory bandwidth, so that applications featuring a high rate of computations per data item can attain high performance. In addition, as GPU technology targets the gaming market, these devices present a relatively high performance/cost ratio, making them an interesting option for high-performance computing (HPC).

On the other hand, virtualization technologies are currently widely deployed, as their use yields important benefits such as resource sharing, process isolation, and reduced management costs. It is therefore natural that the use of virtual machines (VMs) in HPC is an active area of research [6]. VMs provide a way to increase resource utilization in HPC clusters, as several different customers may share a single computing node under the illusion that each owns the entire machine exclusively. Security is ensured by the virtualization layer, which may additionally provide other services, such as migrating a VM from one node to another when required.

Processes running in a VM may also require the services of a GPU to accelerate part of their computations. However, the real GPU does not directly belong to the VM; it must instead be attached to it in some way. To do so, the GPGPU capabilities must be exposed to the VMs in such a way that they can make use of the real GPU. Although there is some work on the virtualization of the graphics application programming interfaces (APIs) of GPUs (e.g., [7]), those efforts are not directly useful for exposing GPGPU features to virtualized environments. The main cause is that these two uses of GPUs are completely different, so advances in one do not translate into progress in the other. The reason is that current GPUs lack a standard low-level interface — unlike other devices such as storage and network controllers — and, therefore, their use for graphics is approached through high-level standard interfaces such as Direct3D [7] or OpenGL [8], while using them for GPGPU requires APIs like OpenCL [9] or NVIDIA CUDA (Compute Unified Device Architecture) [10]. GPGPU-oriented APIs differ significantly from their graphics-oriented counterparts (OpenGL and Direct3D): the former must deal with general-purpose computing issues, while the latter are mainly concerned with graphics-related features such as object interposition, flickering, or graphics redirection. Moreover, the few efforts made so far to expose CUDA capabilities to VMs [11], [12], [13] (1) are incomplete prototypes, (2) make use of obsolete versions of the GPGPU API, (3) are not general solutions, as they target a particular virtual machine monitor (VMM) environment, or (4) employ inefficient communication protocols between the front-end and back-end components of their middleware.

In this paper, we move a step forward in the virtualization of GPUs for their use as GPGPU accelerators by VMs. We propose an open-source, VMM-independent, and communication-efficient way of exposing GPGPU capabilities to VMs that supports a recent GPGPU API version. Our work addresses the virtualization of the CUDA Runtime API, a widely used API for performing general-purpose computations on recent NVIDIA GPUs. With our proposal, the GPGPU features of CUDA-compatible GPUs become available to processes running in any VM. In particular, we employ a framework initially intended to enable remote use of the CUDA Runtime API. This framework, named rCUDA [14], was initially designed to use TCP sockets to connect a GPU-accelerated process running on a computer without a GPU to a remote host providing GPGPU services, thus giving the accelerated process the illusion of directly using a GPU. This framework is therefore suitable for distributed-memory environments. Note, however, that although the primary goal of rCUDA was to provide a way to reduce the number of GPU-based accelerators in a cluster, which leads to savings in acquisition, energy, maintenance, and cooling costs, in this paper we extend the applicability of the rCUDA framework to also cover a wide spectrum of GPGPU use in virtualized environments, exposing CUDA capabilities to VMs running on a CUDA-enabled computer. More specifically, we explore the benefits of using rCUDA in VMs, ranging from a single VM instance to multiple VMs concurrently running on the same server equipped with a small number of accelerators. To do this, we analyze the execution of a set of CUDA SDK examples on a platform composed of 8 general-purpose cores and two NVIDIA cards providing a total of 4 GPUs. The results obtained with the Kernel-based Virtual Machine (KVM) and Oracle's VirtualBox Open Source Edition (VB-OSE) VMMs using rCUDA are additionally compared with those of the native environment. Although our solution is also valid for VMware Server, we cannot disclose those results due to licensing policies. To the best of our knowledge, this is the first paper analyzing the use of rCUDA for VMs, as previous papers focused on cluster environments.

The rest of the paper is organized as follows: in Section 2 we introduce some general background related to our contribution. In Section 3 we describe the architecture of rCUDA, while Section 4 provides information about its usage in VM environments. Section 5 presents the results of the experimental evaluation. Finally, Section 6 summarizes the conclusions of this work.

2. Background on CUDA Virtualization

In addition to rCUDA, which will be described in the following section, there are other approaches that pursue the virtualization of the CUDA Runtime API for VMs: vCUDA [13], GViM [12], and gVirtuS [11]. All of these solutions feature a distributed middleware comprising two parts: the front-end and the back-end. The front-end middleware is installed in the VM, while the back-end counterpart, with direct access to the acceleration hardware, is run by the host operating system (OS) — the one executing the VMM.

vCUDA [13] implements an unspecified subset of the CUDA Runtime API version 1.1. It employs XML-RPC for the application-level communications, which makes the solution portable across different VMMs but, as the experiments in [13] show, the time spent in the encoding/decoding steps of the communication protocol has a considerable negative impact on the overall performance of the solution.

On the other hand, GViM [12] uses Xen-specific mechanisms for the communication between the two middleware actors, including shared-memory buffers, which enhance the communication between the user and administrative domains at the expense of losing VMM independence. In addition, this solution, also based on CUDA 1.1, does not seem to implement the whole Runtime API.

Finally, gVirtuS [11] (version 01-beta1) is a tool with a purpose similar to rCUDA. This version seems

Figure 1. Outline of the rCUDA architecture.