GPU Virtualization on VMware's Hosted I/O Architecture

Micah Dowty, Jeremy Sugerman
VMware, Inc.
3401 Hillview Ave, Palo Alto, CA 94304
[email protected], [email protected]

Published in the USENIX Workshop on I/O Virtualization 2008, Industrial Practice session.

Abstract

Modern graphics co-processors (GPUs) can produce high fidelity images several orders of magnitude faster than general purpose CPUs, and this performance expectation is rapidly becoming ubiquitous in personal computers. Despite this, GPU virtualization is a nascent field of research. This paper introduces a taxonomy of strategies for GPU virtualization and describes in detail the specific GPU virtualization architecture developed for VMware's hosted products (VMware Workstation and VMware Fusion).

We analyze the performance of our GPU virtualization with a combination of applications and microbenchmarks. We also compare against software rendering, the GPU virtualization in Parallels Desktop 3.0, and the native GPU. We find that taking advantage of hardware acceleration significantly closes the gap between pure emulation and native, but that different implementations and host graphics stacks show distinct variation. The microbenchmarks show that our architecture amplifies the overheads in the traditional graphics API bottlenecks: draw calls, downloading buffers, and batch sizes.

Our virtual GPU architecture runs modern graphics-intensive games and applications at interactive frame rates while preserving virtual machine portability. The applications we tested achieve from 86% to 12% of native rates and 43 to 18 frames per second with VMware Fusion 2.0.

1 Introduction

Over the past decade, virtual machines (VMs) have become increasingly popular as a technology for multiplexing both desktop and server commodity x86 computers. Over that time, several critical challenges in CPU virtualization were solved and there are now both software and hardware techniques for virtualizing CPUs with very low overheads [1]. I/O virtualization, however, is still very much an open problem and a wide variety of strategies are used. Graphics co-processors (GPUs) in particular present a challenging mixture of broad complexity, high performance, rapid change, and limited documentation.

Modern high-end GPUs have more transistors, draw more power, and offer at least an order of magnitude more computational performance than CPUs. At the same time, GPU acceleration has extended beyond entertainment (e.g., games and video) into the basic windowing systems of recent operating systems and is starting to be applied to non-graphical high-performance applications including protein folding, financial modeling, and medical image processing. The rise in applications that exploit, or even assume, GPU acceleration makes it increasingly important to expose the physical graphics hardware in virtualized environments. Additionally, virtual desktop infrastructure (VDI) initiatives have led many enterprises to try to simplify their desktop management by delivering VMs to their users. Graphics virtualization is extremely important to a user whose primary desktop runs inside a VM.

GPUs pose a unique challenge in the field of virtualization. Machine virtualization multiplexes physical hardware by presenting each VM with a virtual device and combining their respective operations in the hypervisor platform in a way that utilizes native hardware while preserving the illusion that each guest has a complete stand-alone device. Graphics processors are extremely complicated devices. In addition, unlike CPUs, chipsets, and popular storage and network controllers, GPU designers are highly secretive about the specifications for their hardware. Finally, GPU architectures change dramatically across generations, and their generational cycle is short compared to CPUs and other devices. Thus, it is nearly intractable to provide a virtual device corresponding to a real modern GPU. Even starting with a complete implementation, updating it for each new GPU generation would be prohibitively laborious.

Thus, rather than modeling a complete modern GPU, our primary approach paravirtualizes: it delivers an idealized software-only GPU and our own custom graphics driver for interfacing with the guest operating system.

The main technical contributions of this paper are (1) a taxonomy of GPU virtualization strategies—both emulated and passthrough-based, (2) an overview of the virtual graphics stack in VMware's hosted architecture, and (3) an evaluation and comparison of VMware Fusion's 3D acceleration with other approaches. We find that a hosted model [2] is a good fit for handling complicated, rapidly changing GPUs while the largely asynchronous graphics programming model is still able to efficiently utilize GPU hardware acceleration.

The rest of this paper is organized as follows. Section 2 provides background and some terminology. Section 3 describes a taxonomy of strategies for exposing GPU acceleration to VMs. Section 4 describes the device emulation and rendering thread of the graphics virtualization in VMware products. Section 5 evaluates the 3D acceleration in VMware Fusion. Section 6 summarizes our findings and describes potential future work.

2 Background

While CPU virtualization has a rich research and commercial history, graphics hardware virtualization is a relatively new area. VMware's virtual hardware has always included a display adapter, but it initially included only basic 2D support [3]. Experimental 3D support did not appear until VMware Workstation 5.0 (April 2005). Both Blink [4] and VMGL [5] used a user-level Chromium-like approach [6] to accelerate fixed function OpenGL in Linux and other UNIX-like guests. Parallels Desktop 3.0 [7] accelerates some OpenGL and Direct3D guest applications with a combination of Wine and proprietary code [8], but loses its interposition while those applications are running. Finally, at the most recent Intel Developer Forum, Parallels presented a demo that dedicates an entire native GPU to a single virtual machine using Intel's VT-d [9, 10].

The most immediate application for GPU virtualization is desktop virtualization. While server workloads still form the core use case for virtualization, desktop virtualization is now the strongest growth market [11]. Desktop users run a diverse array of applications, including entertainment, CAD, and visualization software. Windows Vista, Mac OS X, and recent Linux distributions all include GPU-accelerated windowing systems. Furthermore, an increasing number of ubiquitous applications are adopting GPU acceleration. Adobe Flash Player 10, the next version of a product which currently reaches 99.0% of Internet viewers [12], will include GPU acceleration. There is a user expectation that virtualized applications will "just work", and this increasingly includes having access to their graphics card.

2.1 GPU Hardware

This section briefly introduces GPU hardware. It is not within the scope of this paper to provide a full discussion of GPU architecture and programming models.

Graphics hardware has experienced a continual evolution from mere CRT controllers to powerful programmable stream processors. Early graphics accelerators could draw rectangles or bitmaps. Later graphics accelerators could rasterize triangles and transform and light them in hardware. With current PC graphics hardware, formerly fixed-function transformation and shading has become generally programmable. Graphics applications use high-level Application Programming Interfaces (APIs) to configure the pipeline, and provide shader programs which perform application-specific per-vertex and per-pixel processing on the GPU [13]. Future GPUs are expected to continue providing increased programmability. Intel recently announced its Larrabee [14] architecture, a potentially disruptive technology which follows this trend to its extreme.

With the recent exception of many AMD GPUs, for which open documentation is now available [15], GPU hardware is proprietary. NVIDIA's hardware documentation, for example, is a closely guarded trade secret. Nearly all graphics applications interact with the GPU via a standardized API such as Microsoft's DirectX or the vendor-independent OpenGL standard.

3 GPU Virtualization Taxonomy

This section explores the GPU virtualization approaches we have considered at VMware. We use four primary criteria for judging them: performance, fidelity, multiplexing, and interposition. The former two emphasize minimizing the cost of virtualization: users desire native performance and full access to the native hardware features. The latter two emphasize the added value of virtualization: virtualization is fundamentally about enabling many virtual instances of one physical entity and then hopefully using that abstraction to deliver secure isolation, resource management, virtual machine portability, and many other features enabled by insulating the guest from physical hardware dependencies.

We observe that different use cases weight the criteria differently—for example a VDI deployment values high VM-to-GPU consolidation ratios (e.g., multiplexing) while a consumer running a VM to access a game or CAD application unavailable on his host values performance and likely fidelity. A tech support person maintaining a library of different configurations and an IT administrator running server VMs are both likely to value portability and secure isolation (interposition).

Since these criteria are often in opposition (e.g., performance at the expense of interposition), we describe several possible designs. Rather than give an exhaustive list, we describe points in the design space which highlight interesting trade-offs and capabilities. At a high level, we group them into two categories: front-end (application facing) and back-end (hardware facing).

3.1 Front-end Virtualization

Front-end virtualization introduces a virtualization boundary at a relatively high level in the stack, and runs the graphics driver in the host/hypervisor. This approach does not rely on any GPU vendor- or model-specific details. Access to the GPU is entirely mediated through the vendor-provided APIs and drivers on the host, while the guest only interacts with software. Current GPUs allow applications many independent "contexts", so multiplexing is easy. Interposition is not a given—unabstracted details of the GPU's capabilities may be exposed to the virtual machine for fidelity's sake—but it is straightforward to achieve if desired. However, there is a performance risk if too much abstraction occurs in pursuit of interposition.

Front-end techniques exist on a continuum between two extremes: API remoting, in which graphics API calls are blindly forwarded from the guest to the external graphics stack via remote procedure call, and device emulation, in which a virtual GPU is emulated and the emulation synthesizes host graphics operations in response to actions by the guest device drivers. These extremes have serious disadvantages that can be overcome by intermediate solutions. Pure API remoting is simple to implement, but completely sacrifices interposition and involves wrapping and forwarding an extremely broad collection of entry points. Pure emulation of a modern GPU delivers excellent interposition and implements a narrower interface, but a highly complicated and under-documented one.

Our hosted GPU acceleration employs front-end virtualization and is described in Section 4. Parallels Desktop 3.0, Blink, and VMGL are other examples of front-end virtualization. Parallels appears to be closest to pure API remoting, as VM execution state cannot be saved to disk while OpenGL or Direct3D applications are running. VMGL uses Chromium to augment its remoting with OpenGL state tracking, and Blink implements something similar. This gives them suspend-to-disk functionality and reduces the amount of data which needs to be copied across the virtualization boundary.

3.2 Back-end Virtualization

Back-end techniques run the graphics driver stack inside the virtual machine, with the virtualization boundary between the stack and physical GPU hardware. These techniques have the potential for high performance and fidelity, but multiplexing and especially interposition can be serious challenges. Since a VM interacts directly with proprietary hardware resources, its execution state is bound to the specific GPU vendor and possibly the exact GPU model in use. However, exposure to the native GPU is excellent for fidelity: a guest can likely exploit the full range of hardware abilities.

The most obvious back-end virtualization technique is fixed pass-through: the permanent association of a virtual machine with full exclusive access to a physical GPU. Recent chipset features, such as Intel's VT-d, make fixed pass-through practical without requiring any special knowledge of a GPU's programming interfaces. However, fixed pass-through is not a general solution. It completely forgoes any multiplexing, and packing machines with one GPU per virtual machine (plus one for the host) is not feasible.

One extension of fixed pass-through is mediated pass-through. As mentioned, GPUs already support multiple independent contexts, and mediated pass-through proposes dedicating just a context, or set of contexts, to a virtual machine rather than an entire GPU. This allows multiplexing, but incurs two additional costs: the GPU hardware must implement contexts in a way that can be mapped to different virtual machines with low overheads, and the host/hypervisor must have enough of a hardware driver to allocate and manage GPU contexts. Potentially third, if each context does not appear as a full (logical) device, the guest device drivers must be able to handle it.

Mediated pass-through is still missing any interposition features beyond (perhaps) basic isolation. A number of tactics using paravirtualization or standardization of a subset of hardware interfaces can potentially unlock these additional interposition features. Analogous techniques for networking hardware were presented at VMworld 2008 [16].

4 VMware's Virtual GPU

All of VMware's products include a virtual display adapter that supports VGA and basic high resolution 2D graphics modes. On VMware's hosted products, this adapter also provides accelerated GPU virtualization using a front-end virtualization strategy. To satisfy our design goals, we chose a flavor of front-end virtualization which provides good portability and performance, and which integrates well with existing operating system driver models. Our approach is most similar to the device emulation approach above, but it includes characteristics similar to those of API remoting. The in-guest driver and emulated device communicate asynchronously with VMware's Mouse-Keyboard-Screen (MKS) abstraction. The MKS runs as a separate thread and owns all of our access to the host GPU (and windowing system in general).

4.1 SVGA Device Emulation

Our virtual GPU takes the form of an emulated PCI device, the VMware SVGA II card. No physical instances of this card exist, but our virtual implementation acts like a physical graphics card in most respects.

The architecture of our PCI device is outlined by Figure 1. Inside the VM, it interfaces with a device driver we supply for common guest operating systems. Currently only the Windows XP driver has 3D acceleration support. Outside the VM, a user-level device emulation process is responsible for handling accesses to the PCI configuration and I/O space of the SVGA device.

[Figure 1: VMware SVGA II device architecture. The PCI device exposes synchronous registers in I/O space (video mode, device caps, IRQ status and configuration, and GMR configuration, accessed through an index/value pair), plus three memory regions: virtual VRAM beginning with the 2D framebuffer, FIFO memory holding the FIFO pointers, asynchronous registers, 3D caps, and the command FIFO, and user-defined GMRs built from page tables over guest system memory. On the host side sit the MKS/HostOps backends, the DMA engine, and physical VRAM.]

Our virtual graphics device provides three fundamental kinds of virtual hardware resources: registers, Guest Memory Regions (GMRs), and a FIFO command queue.

Registers may be located in I/O space, for infrequent operations that must be emulated synchronously, or in the faster "FIFO Memory" region, which is backed by plain system memory on the host. I/O space registers are used for mode switching, GMR setup, IRQ acknowledgement, versioning, and for legacy purposes. FIFO registers include large data structures, such as the host's 3D rendering capabilities, and fast-moving values such as the mouse cursor location—this region is effectively shared memory between the guest driver and the MKS.
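To make the register model concrete, here is a minimal C sketch of how a user-level emulation process might dispatch guest accesses to a synchronous index/value register pair in I/O space. The register names, layout, and handlers are illustrative assumptions, not the actual VMware SVGA II interface.

    #include <stdint.h>

    /* Hypothetical register indices; the real device defines many more. */
    enum { REG_ID, REG_ENABLE, REG_WIDTH, REG_HEIGHT, REG_IRQ_STATUS, NUM_REGS };

    typedef struct {
        uint32_t index;          /* last value written to the index port */
        uint32_t regs[NUM_REGS]; /* synchronously emulated register file */
    } SVGAEmu;

    /* Guest OUT to the index port selects which register is addressed. */
    void index_port_write(SVGAEmu *dev, uint32_t value) {
        dev->index = value;
    }

    /* Guest IN/OUT on the value port reads or writes the selected register,
     * trapping into the emulation process on each access. */
    uint32_t value_port_read(SVGAEmu *dev) {
        return dev->index < NUM_REGS ? dev->regs[dev->index] : 0;
    }

    void value_port_write(SVGAEmu *dev, uint32_t value) {
        if (dev->index < NUM_REGS) {
            dev->regs[dev->index] = value;
            /* A write to REG_ENABLE, for example, would trigger a mode
               switch using the current REG_WIDTH and REG_HEIGHT values. */
        }
    }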

GMRs are an abstraction for guest-owned memory which the virtual GPU is allowed to read or write. GMRs can be defined by the guest's video driver using arbitrary discontiguous regions of guest system memory. Additionally, there always exists one default GMR: the device's "virtual VRAM." This VRAM is actually host system memory, up to 128 MB, mapped into PCI memory space via BAR1. The beginning of this region is reserved as a 2D framebuffer.
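The following sketch illustrates the GMR abstraction just described: a flat offset within a GMR is resolved through a guest-built table of discontiguous page runs. The descriptor layout and names are hypothetical; only the concept (arbitrary scattered guest pages behind one flat region) comes from the device model.

    #include <stdint.h>

    #define PAGE_SIZE 4096u

    /* One physically contiguous run of guest pages. */
    typedef struct {
        uint64_t base_pfn;  /* first guest page frame number in the run */
        uint64_t num_pages; /* length of the run, in pages */
    } GMRRun;

    /* A GMR: arbitrary discontiguous guest memory behind one flat region. */
    typedef struct {
        uint32_t num_runs;
        GMRRun runs[16];
    } GMR;

    /* Resolve a byte offset within the GMR to a guest physical address,
     * as the emulation would when reading or writing guest memory. */
    uint64_t gmr_to_gpa(const GMR *g, uint64_t offset) {
        uint64_t page = offset / PAGE_SIZE;
        for (uint32_t i = 0; i < g->num_runs; i++) {
            if (page < g->runs[i].num_pages)
                return (g->runs[i].base_pfn + page) * PAGE_SIZE +
                       offset % PAGE_SIZE;
            page -= g->runs[i].num_pages;
        }
        return 0; /* offset beyond the GMR; a real device would fault */
    }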

In our virtual GPU, physical VRAM is not directly visible to the guest. This is important for portability, and it is one of the primary trade-offs made by our front-end virtualization model. To access physical VRAM surfaces like textures, vertex buffers, and render targets, the guest driver schedules an asynchronous DMA operation which transfers data between a surface and a GMR. In every surface transfer, this DMA mechanism adds at least one copy beyond the normal overhead that would be experienced in a non-virtualized environment or with back-end virtualization. Often only this single copy is necessary, because the MKS can provide the host's OpenGL or Direct3D implementation with direct pointers into mapped GMR memory. This virtual DMA model has the potential to far outperform a pure API remoting approach like VMGL or Chromium, not only because so few copies are necessary, but because the guest driver may cache lockable Direct3D buffers directly in GMR memory.
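As a sketch of this virtual DMA model, the guest driver might encode a surface transfer roughly as follows. The command layout and names are invented for illustration and stand in for the real SVGA3D encoding; no flow control or FIFO wrap-around is shown here.

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint32_t gmr_id;     /* which GMR holds the guest-side data */
        uint32_t gmr_offset; /* byte offset within that GMR */
    } GuestPtr;

    typedef struct {
        uint32_t surface_id; /* texture, vertex/index buffer, render target */
        GuestPtr guest;      /* guest memory to copy to or from */
        uint32_t to_surface; /* 1 = upload to surface, 0 = read back to GMR */
        uint32_t num_bytes;  /* transfer length */
    } SurfaceDMA;

    /* Stand-in for the shared command FIFO described in the text. */
    static uint32_t fifo[4096];
    static uint32_t fifo_head;

    /* Queue the transfer and return immediately. The host performs the
     * single copy between mapped GMR pages and the API-owned surface; the
     * guest synchronizes with a fence before reusing the DMA memory. */
    void surface_dma_async(const SurfaceDMA *cmd) {
        memcpy(&fifo[fifo_head], cmd, sizeof *cmd);
        fifo_head += sizeof *cmd / sizeof fifo[0];
    }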

Like a physical graphics accelerator, the SVGA device processes commands asynchronously via a lockless FIFO queue. This queue, several megabytes in size, occupies the bulk of the FIFO Memory region referenced by BAR2. During unaccelerated 2D rendering, FIFO commands are used to mark changed regions in the framebuffer, informing the MKS to copy from the guest framebuffer to the physical display. During 3D rendering, the FIFO acts as the transport layer for our architecture-independent SVGA3D rendering protocol. FIFO commands also initiate all DMA operations, perform hardware accelerated blits, and control accelerated video and mouse cursor overlays.

[Figure 2: The virtual graphics stack. The MKS/HostOps Dispatch and rendering occur asynchronously in their own thread. In the guest, applications call into the VMware SVGA driver, which reaches the SVGA device through the SVGA FIFO, registers, guest VRAM, and GMRs in guest memory. On the host, the MKS/HostOps dispatch feeds a 2D path (video and compositing) and a 3D drawing path (shader program and surface/state translators plus the DMA engine) into the host GPU API and driver, and finally the GPU.]

We deliver host to guest notifications via a virtual interrupt. Our virtual GPU has multiple interrupt sources which may be programmed via FIFO registers. To measure the host's command execution progress, the guest may insert FIFO fence commands, each with a unique 32-bit ID. Upon executing a fence, the host stores its value in a FIFO register and optionally delivers an interrupt.
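From the guest driver's point of view, this fence mechanism can be sketched as follows. The helper names and the wrap-safe comparison are our own; only the general scheme (32-bit IDs, a host-updated FIFO register, an optional fence-goal interrupt) is taken from the description above.

    #include <stdint.h>

    /* The host stores the ID of the most recently executed fence here
     * (in reality, a register in shared FIFO memory). */
    static volatile uint32_t fence_reg;
    static uint32_t next_fence = 1;

    /* Append a fence command to the FIFO and return its ID. */
    uint32_t insert_fence(void) {
        uint32_t id = next_fence++;
        /* ... write the fence command and its ID into the command FIFO ... */
        return id;
    }

    /* Signed distance tolerates 32-bit wrap-around of fence IDs. */
    int fence_passed(uint32_t completed, uint32_t id) {
        return (int32_t)(completed - id) >= 0;
    }

    void wait_for_fence(uint32_t id) {
        while (!fence_passed(fence_reg, id)) {
            /* Instead of spinning, the driver would program a fence-goal
               interrupt for this ID and sleep until it is delivered. */
        }
    }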

This mechanism allows the guest to very efficiently check whether a specific command has completed yet, and to optionally wait for it by sleeping until a FIFO goal interrupt is received.

The SVGA3D protocol is a simplified and idealized adaptation of the Direct3D API. It has a minimal number of distinct commands. Drawing operations are expressed using a single flexible vertex/index array notation. All host VRAM resources, including 2D textures, 3D textures, cube environment maps, render targets, and vertex/index buffers, are represented using a homogeneous surface abstraction. Shaders are written in a variant of Direct3D's bytecode format, and most fixed-function render states are based on Direct3D render state.

This protocol acts as a common interchange format for GPU commands and state. The guest contains API implementations which produce SVGA3D commands rather than commands for a specific GPU. This provides an opportunity to actively trade capability for portability. The host can control which of the physical GPU's features are exposed to the guest. As a result, VMs using SVGA3D are widely portable between different physical GPUs. It is possible to suspend a live application to disk, move it to a different host with a different GPU or MKS backend, and resume it. Even if the destination GPU exposes fewer capabilities via SVGA3D, in some cases our architecture can use its layer of interposition as an opportunity to emulate missing features in software. This portability assurance is critical for preventing GPU virtualization from compromising the core value propositions of machine virtualization.

4.2 Rendering

This FIFO design is inherently asynchronous. All host-side rendering happens in the MKS thread, while the guest's virtual CPUs execute concurrently. As illustrated in Figure 2, access to the physical GPU is mediated first through the GPU vendor's driver running in the host OS, and secondly via the Host Operations (HostOps) backends in the MKS. The MKS has multiple HostOps backend implementations, including GDI and X11 backends to support basic 2D graphics on all Windows and Linux hosts, a VNC server for remote display, and 3D accelerated backends written for both Direct3D and OpenGL. In theory we need only an OpenGL backend to support Windows, Linux, and Mac OS hosts; however, we have found Direct3D drivers to be of generally better quality, so we use them when possible. Additional backends could be written to access GPU hardware directly.

The guest video driver writes commands into FIFO memory, and the MKS processes them continuously on a dedicated rendering thread. This design choice is critical for performance, but it introduces several new challenges in synchronization. In part, this is a classic producer-consumer problem. The FIFO requires no host-guest synchronization as long as it is never empty nor full, but the host must sleep any time the FIFO is empty, and the guest must sleep when it is full. The guest may also need to sleep for other reasons. The guest video driver must implement some form of flow control, so that video latency is not unbounded if the guest submits FIFO commands faster than the host completes them. The driver may also need to wait for DMA completion, either to recycle DMA memory or to read back results from the GPU. To implement this synchronization efficiently, the FIFO requires both guest to host and host to guest notifications.

The MKS will normally poll the command FIFO at a fixed rate, between 25 and 100 Hz. This is effectively the virtual vertical refresh rate of the device during unaccelerated 2D rendering. During synchronization-intensive 3D rendering, we need a lower latency guest to host notification. The guest can write to the doorbell, a register in I/O space, to explicitly ask the host to poll the command FIFO immediately.
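The guest side of this producer-consumer protocol might be structured as in the sketch below, with hypothetical helper names. Commands go into the shared ring; the guest sleeps when the ring fills, and a write to the I/O-space doorbell requests an immediate host poll instead of waiting for the next 25-100 Hz tick.

    #include <stdint.h>

    #define FIFO_WORDS 4096u /* the real queue is several megabytes */

    static volatile uint32_t fifo[FIFO_WORDS];
    static volatile uint32_t head; /* next word the guest writes */
    static volatile uint32_t tail; /* next word the host reads   */

    static void ring_doorbell(void)     { /* OUT to the doorbell I/O register */ }
    static void wait_for_progress(void) { /* sleep until a FIFO-progress IRQ  */ }

    void fifo_submit(const uint32_t *cmd, uint32_t words) {
        for (uint32_t i = 0; i < words; i++) {
            /* Flow control: the guest must sleep while the FIFO is full. */
            while ((head + 1) % FIFO_WORDS == tail)
                wait_for_progress();
            fifo[head] = cmd[i];
            head = (head + 1) % FIFO_WORDS;
        }
        ring_doorbell(); /* low-latency guest-to-host notification */
    }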
5 Evaluation

We conducted two categories of tests: application benchmarks and microbenchmarks. All tests were conducted on the same physical machine: a 2nd generation Apple Mac Pro, with a total of eight 2.8 GHz Intel Xeon cores and an ATI Radeon HD2600 graphics card. All VMs used a single virtual CPU. With one exception, we found that all non-virtualized tests were unaffected by the number of CPU cores enabled.

5.1 Application Benchmarks

    Application               Resolution     FPS
    RTHDRIBL                  1280 × 1024    22
    RTHDRIBL                  640 × 480      27.5
    Half Life 2: Episode 2    1600 × 1200    22.2
    Half Life 2: Episode 2    1024 × 768     32.2
    Civilization 4            1600 × 1200    18
    Max Payne 2               1600 × 1200    42

Table 1: Absolute frame rates with VMware Fusion 2.0. All applications run at interactive speeds (18–42 FPS).

The purpose of graphics acceleration hardware is to provide higher performance than would be possible using software alone. Therefore, in this section we measure both the performance impact of virtualized graphics relative to non-virtualized GPU hardware, and the amount of performance improvement relative to TransGaming's SwiftShader [17] software renderer, running in a VMware Fusion virtual machine.

In addition to VMware Fusion 2.0, which uses the architecture described above, we measured Parallels Desktop 3.0 where possible (three of our configurations do not run).

[Figure 3: Relative performance of software rendering (SwiftShader) and three hardware accelerated virtualization techniques (VMware Fusion 2.0, VMware Workstation 6.5, and Parallels Desktop 3.0), plotted as a percentage of host performance on a log scale for RTHDRIBL (1280x1024 and 640x480), Half Life 2: Episode 2 (1600x1200 and 1024x768), Civilization 4 (1600x1200), 3DMark2001 (1024x768), and Max Payne 2 (1600x1200). The log scale highlights the huge gap between software and hardware acceleration versus the gap between virtualized and native hardware.]

Both versions are the most recent public release at the time of writing. To demonstrate the effects that can be caused by API translation and by the host graphics stacks, we also ran our applications on VMware Workstation 6.5. These runs used our Direct3D rendering backend on the same hardware, but running Windows XP via Boot Camp.

It is quite challenging to measure the performance of graphics virtualization implementations accurately and fairly. The system under test has many variables, and they are often difficult or impossible to isolate. The virtualized operating system, host operating system, CPU virtualization overhead, GPU hardware, GPU drivers, and application under test may each have a profound effect on overall system performance. Any one of these components may have opaque fast and slow paths; small differences in the application under test may cause wide gaps in performance, due to subtle and often hidden details of each component's implementation. For example, each physical GPU driver may have different undocumented criteria for transferring vertex data at maximum speed.

Additionally, the matrix of possible tests is limited by incompatible graphics APIs. Most applications and benchmarks are written for a single API, either OpenGL or Direct3D. Each available GPU virtualization implementation has a different level of API support. Parallels Desktop supports both OpenGL and Direct3D, VMware Fusion supports only Direct3D applications, and VMGL supports only OpenGL.

Figure 3 summarizes the application benchmark results. All three virtualization products performed substantially better than the fastest available software renderer, which obtained less than 3% of native performance in all tests. Applications which are mostly GPU limited, RTHDRIBL [18] and Half Life 2: Episode 2, ran at closer to native speeds. Max Payne 2 exhibits low performance relative to native, but that reflects its low ratio of GPU load to API calls: virtualization overhead occupies a higher proportion of the whole execution time. In absolute terms, though, Max Payne 2 has the highest frame rate of our applications.

Table 1 reports the actual frame rates exhibited with these applications under VMware Fusion. While our virtualized 3D acceleration still lags native performance, we make two observations: it still achieves interactive frame rates, and it closes the lion's share of the gulf between software rendering and native performance. For example, at 1600 × 1200, VMware Fusion renders Half-Life 2 at 22 frames per second, which is 23.35x faster than software rendering and only 2.4x slower than native.

5.2 Microbenchmarks

To better understand the nature of front-end virtualization's performance impact, we performed a suite of microbenchmarks based on triangle rendering speed under various conditions. For all microbenchmarks, we rendered unlit, untextured triangles using Vertex Shader 1.1 and the fixed-function pixel pipeline. This minimizes our dependency on shader translation and GPU driver implementation.

Each test renders a fixed number of frames, each containing a variable number of draw calls with a variable length vertex buffer. For security against jitter or drift caused by timer virtualization, all tests measured elapsed time via a TCP/IP server running on an idle physical machine. Parameters for each test were chosen to optimize frame duration, so as to minimize the effects of noise from time quantization, network latency, and vertical sync latency.
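The overall shape of these tests is captured by the harness sketch below. The draw and presentation calls are stubs standing in for the Direct3D path, and remote_time() stands in for the external TCP/IP timing server; the names and structure are our reconstruction, not the actual test code.

    /* Stubs for the Direct3D draw path and the external timing server. */
    static void draw_batch(unsigned triangles) { (void)triangles; }
    static void present_frame(void) {}
    static double remote_time(void) { return 0.0; /* fetched over TCP/IP */ }

    /* Average seconds per frame for one (draw calls, batch size) point. */
    double run_point(unsigned frames, unsigned draws_per_frame,
                     unsigned triangles_per_draw) {
        double start = remote_time(); /* immune to guest timer drift */
        for (unsigned f = 0; f < frames; f++) {
            for (unsigned d = 0; d < draws_per_frame; d++)
                draw_batch(triangles_per_draw);
            present_frame(); /* ends the frame, as in the real tests */
        }
        return (remote_time() - start) / frames;
    }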

The static vertex test, Figure 4(a), tests performance scalability when rendering vertex buffers which do not change contents. In Direct3D terms, this tests the managed buffer pool. Very little data must be exchanged between host and guest in this test, so an ideal front-end virtualization implementation would do quite well. VMware Workstation manages to get just over 80% of the host's performance in this test. Parallels Desktop and VMware Fusion get around 30%. In our experience, this is due to inefficiency in the Vertex Buffer Object support within Mac OS's OpenGL stack.

The dynamic vertex test, Figure 4(b), switches from the managed buffer pool back to the default Direct3D buffer pool, and uploads new vertex data prior to each of the 100 draws per frame. It tests the driver stack's ability to stream data to the GPU and to manage the re-use of buffer memory.

The next test, Figure 4(c), is intended to measure virtualization overhead while performing a GPU-intensive operation. While triangles in the previous tests had zero pixel coverage, this test renders triangles covering half the viewport. Ideally, this test would show nearly identical results for any front-end virtualization implementation. The actual results are relatively close, but on VMware's platform there is a substantial amount of noise in the results. This appears to be due to the irregular completion of asynchronous commands when the physical GPU is under heavy load. Also worth noting is the fact that VMware Fusion, on average, performed better than the host machine. It is possible that this test exercises a particular drawing state which is more optimized in ATI's Mac OS OpenGL driver than in their Windows Direct3D driver.

The final test, Figure 4(d), measures the overhead added to every separate draw call. This was the only test where we saw variation in host performance based on the number of enabled CPU cores. This microbenchmark illustrates why the number of draw calls per frame is, in our experience, a relatively good predictor of overall application performance with front-end GPU virtualization.

6 Conclusion

In VMware's hosted architecture, we have implemented front-end GPU virtualization using a virtual device model with a high level rendering protocol. We have shown it to run modern graphics-intensive games and applications at interactive frame rates while preserving virtual machine interposition.

There is much future work in developing reliable benchmarks which specifically stress the performance weaknesses of a virtualization layer. Our tests show API overheads of about 2 to 120 times that of a native GPU. As a result, the performance of a virtualized GPU can be highly dependent on subtle implementation details of the application under test.

Back-end virtualization holds much promise for performance, breadth of GPU feature support, and ease of driver maintenance. While fixed pass-through is easy, none of the more advanced techniques have been demonstrated. This is a substantial opportunity for work by GPU and virtualization vendors.

Front-end virtualization currently shows a substantial degradation in performance and GPU feature set relative to native hardware. Nevertheless, it is already enabling virtualized applications to run interactively that could never have been virtualized before, and it is a foundation for virtualization of tomorrow's GPU-intensive software. Even as back-end virtualization gains popularity, front-end virtualization can fill an important role for VMs which must be portable among diverse GPUs.

7 Acknowledgements

Many people have contributed to the SVGA and 3D code over the years. We would specifically like to thank Tony Cannon and Ramesh Dharan for their work on the foundations of our display emulation. Aaditya Chandrasekhar pioneered our shader translation architecture and continues to advance our Direct3D virtualization. Shelley Gong, Alex Corscadden, Mark Sheldon, and Stephen Johnson all actively contribute to our 3D emulation.

References

[1] Keith Adams and Ole Agesen. A Comparison of Software and Hardware Techniques for x86 Virtualization. In Proceedings of ASPLOS '06, October 2006.

[2] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor. In Proceedings of the 2001 USENIX Annual Technical Conference, June 2001.

[3] VMware SVGA Device Interface and Programming Model. In X.org source repository, xf86-video-vmware driver README.

[4] Jacob Gorm Hansen. Blink: Advanced Display Multiplexing for Virtualized Applications. In Proceedings of NOSSDAV 2007, June 2007.

[5] H. Andrés Lagar-Cavilla et al. VMM-Independent Graphics Acceleration. In Proceedings of VEE '07.

[6] Greg Humphreys et al. Chromium: A Stream-Processing Framework for Interactive Rendering on Clusters. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pages 693-702 (2002).

[Figure 4: Microbenchmark results. Four panels plot elapsed time in seconds for Parallels Desktop 3.0, VMware Fusion 2.0, VMware Workstation 6.5, and the native host: (a) static vertex rendering performance and (b) dynamic vertex rendering performance, each versus batch size in triangles; (c) filled triangle rendering performance versus batch size; and (d) draw call overhead versus the number of draw calls, with host results shown for both 1 and 8 CPUs.]

[7] Parallels Desktop, http://www.parallels.com/en/desktop/

[8] Parallels on the Wine project wiki, http://wiki.winehq.org/Parallels

[9] IDF SF08: Parallels and Intel Virtualization for Directed I/O, http://www.youtube.com/watch?v=EiqMR5Wx_r4

[10] D. Abramson et al. Intel Virtualization Technology for Directed I/O. In Intel Technology Journal, http://www.intel.com/technology/itj/2006/v10i3/ (August 2006).

[11] Andi Mann. Virtualization and Management: Trends, Forecasts, and Recommendations. Enterprise Management Associates (2008).

[12] Flash Player Penetration, http://www.adobe.com/products/player_census/flashplayer/

[13] John Owens. GPU architecture overview. In SIGGRAPH '07: ACM SIGGRAPH 2007 courses, http://doi.acm.org/10.1145/1281500.1281643

[14] Larry Seiler et al. Larrabee: A Many-Core x86 Architecture for Visual Computing. In ACM Transactions on Graphics, Vol. 27, No. 3, Article 18 (August 2008).

[15] AMD Developer Guides and Manuals, http://developer.amd.com/documentation/guides/Pages/default.aspx

[16] Howie Xu et al. TA2644: Networking I/O Virtualization. VMworld 2008.

[17] SwiftShader, http://www.transgaming.com/products/swiftshader/

[18] Real-Time High Dynamic Range Image-Based Lighting demo, http://www.daionet.gr.jp/~masa/rthdribl/
