GPU Virtualization on VMware’s Hosted I/O Architecture

Micah Dowty, Jeremy Sugerman
VMware, Inc.
3401 Hillview Ave, Palo Alto, CA 94304
[email protected], [email protected]

Published in the USENIX Workshop on I/O Virtualization 2008

Abstract

Modern graphics co-processors (GPUs) can produce high fidelity images several orders of magnitude faster than general purpose CPUs, and this performance expectation is rapidly becoming ubiquitous in personal computers. Despite this, GPU virtualization is a nascent field of research. This paper introduces a taxonomy of strategies for GPU virtualization and describes in detail the specific GPU virtualization architecture developed for VMware’s hosted products (VMware Workstation and VMware Fusion).

We analyze the performance of our GPU virtualization with a combination of applications and microbenchmarks. We also compare against software rendering, the GPU virtualization in Parallels Desktop 3.0, and the native GPU. We find that taking advantage of hardware acceleration significantly closes the gap between pure emulation and native, but that different implementations and host graphics stacks show distinct variation. The microbenchmarks show that our architecture amplifies the overheads in the traditional graphics API bottlenecks: draw calls, downloading buffers, and batch sizes.

Our virtual GPU architecture runs modern graphics-intensive games and applications at interactive frame rates while preserving virtual machine portability. The applications we tested achieve from 86% to 12% of native rates and 43 to 18 frames per second with VMware Fusion 2.0.

1 Introduction

Over the past decade, virtual machines (VMs) have become increasingly popular as a technology for multiplexing both desktop and server commodity x86 computers. Over that time, several critical challenges in CPU virtualization were solved and there are now both software and hardware techniques for virtualizing CPUs with very low overheads [1]. I/O virtualization, however, is still very much an open problem and a wide variety of strategies are used. Graphics co-processors (GPUs) in particular present a challenging mixture of broad complexity, high performance, rapid change, and limited documentation.

Modern high-end GPUs have more transistors, draw more power, and offer at least an order of magnitude more computational performance than CPUs. At the same time, GPU acceleration has extended beyond entertainment (e.g., games and video) into the basic windowing systems of recent operating systems and is starting to be applied to non-graphical high-performance applications including protein folding, financial modeling, and medical image processing. The rise in applications that exploit, or even assume, GPU acceleration makes it increasingly important to expose the physical graphics hardware in virtualized environments. Additionally, virtual desktop infrastructure (VDI) initiatives have led many enterprises to try to simplify their desktop management by delivering VMs to their users. Graphics virtualization is extremely important to a user whose primary desktop runs inside a VM.

GPUs pose a unique challenge in the field of virtualization. Machine virtualization multiplexes physical hardware by presenting each VM with a virtual device and combining their respective operations in the hypervisor platform in a way that utilizes native hardware while preserving the illusion that each guest has a complete stand-alone device. Graphics processors are extremely complicated devices. In addition, unlike CPUs, chipsets, and popular storage and network controllers, GPU designers are highly secretive about the specifications for their hardware. Finally, GPU architectures change dramatically across generations and their generational cycle is short compared to CPUs and other devices. Thus, it is nearly intractable to provide a virtual device corresponding to a real modern GPU. Even starting with a complete implementation, updating it for each new GPU generation would be prohibitively laborious.

Thus, rather than modeling a complete modern GPU, our primary approach paravirtualizes: it delivers an idealized software-only GPU and our own custom graphics driver for interfacing with the guest operating system.

The main technical contributions of this paper are (1) a taxonomy of GPU virtualization strategies, both emulated and passthrough-based; (2) an overview of the virtual graphics stack in VMware’s hosted architecture; and (3) an evaluation and comparison of VMware Fusion’s 3D acceleration with other approaches. We find that a hosted model [2] is a good fit for handling complicated, rapidly changing GPUs while the largely asynchronous graphics programming model is still able efficiently to utilize GPU hardware acceleration.

The rest of this paper is organized as follows. Section 2 provides background and some terminology. Section 3 describes a taxonomy of strategies for exposing GPU acceleration to VMs. Section 4 describes the device emulation and rendering thread of the graphics virtualization in VMware products. Section 5 evaluates the 3D acceleration in VMware Fusion. Section 6 summarizes our findings and describes potential future work.

2 Background

While CPU virtualization has a rich research and commercial history, graphics hardware virtualization is a relatively new area. VMware’s virtual hardware has always included a display adapter, but it initially included only basic 2D support [3]. Experimental 3D support did not appear until VMware Workstation 5.0 (April 2005). Both Blink [4] and VMGL [5] used a user-level Chromium-like approach [6] to accelerate fixed function OpenGL in Linux and other UNIX-like guests. Parallels Desktop 3.0 [7] accelerates some OpenGL and Direct3D guest applications with a combination of Wine and proprietary code [8], but loses its interposition while those applications are running. Finally, at the most recent Intel Developer Forum, Parallels presented a demo that dedicates an entire native GPU to a single virtual machine using Intel’s VT-d [9, 10].

The most immediate application for GPU virtualization is to desktop virtualization. While server workloads still form the core use case for virtualization, desktop virtualization is now the strongest growth market [11]. Desktop users run a diverse array of applications, including entertainment, CAD, and visualization software. Windows Vista, Mac OS X, and recent Linux distributions all include GPU-accelerated windowing systems. Furthermore, an increasing number of ubiquitous applications are adopting GPU acceleration. Adobe Flash Player 10, the next version of a product which currently reaches 99.0% of Internet viewers [12], will include GPU acceleration. There is a user expectation that virtualized applications will “just work”, and this increasingly includes having access to their graphics card.

2.1 GPU Hardware

This section will briefly introduce GPU hardware. It is not within the scope of this paper to provide a full discussion of GPU architecture and programming models.

[...]ware, formerly fixed-function transformation and shading has become generally programmable. Graphics applications use high-level Application Programming Interfaces (APIs) to configure the pipeline, and provide shader programs which perform application specific per-vertex and per-pixel processing on the GPU [13].

Future GPUs are expected to continue providing increased programmability. Intel recently announced its Larrabee [14] architecture, a potentially disruptive technology which follows this trend to its extreme.

With the recent exception of many AMD GPUs, for which open documentation is now available [15], GPU hardware is proprietary. NVIDIA’s hardware documentation, for example, is a closely guarded trade secret. Nearly all graphics applications interact with the GPU via a standardized API such as Microsoft’s DirectX or the vendor-independent OpenGL standard.

3 GPU Virtualization Taxonomy

This section explores the GPU virtualization approaches we have considered at VMware. We use four primary criteria for judging them: performance, fidelity, multiplexing, and interposition. The former two emphasize minimizing the cost of virtualization: users desire native performance and full access to the native hardware features. The latter two emphasize the added value of virtualization: virtualization is fundamentally about enabling many virtual instances of one physical entity and then hopefully using that abstraction to deliver secure isolation, resource management, virtual machine portability, and many other features enabled by insulating the guest from physical hardware dependencies.

We observe that different use cases weight the criteria differently: for example, a VDI deployment values high VM-to-GPU consolidation ratios (e.g., multiplexing) while a consumer running a VM to access a game or CAD application unavailable on his host values performance and likely fidelity. A tech support person maintaining a library of different configurations and an IT administrator running server VMs are both likely to value portability and secure isolation (interposition).

Since these criteria are often in opposition (e.g., performance at the expense of interposition), we describe several possible designs. Rather than give an exhaustive list, we describe points in the design space which highlight interesting trade-offs and capabilities. At a high level, we group them into two categories: front-end (application