gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space

Mochi Xue, Shanghai Jiao Tong University and Intel Corporation; Kun Tian, Intel Corporation; Yaozu Dong, Shanghai Jiao Tong University and Intel Corporation; Jiacheng Ma, Jiajun Wang, and Zhengwei Qi, Shanghai Jiao Tong University; Bingsheng He, National University of Singapore; Haibing Guan, Shanghai Jiao Tong University

https://www.usenix.org/conference/atc16/technical-sessions/presentation/xue



Mochi Xue¹,², Kun Tian², Yaozu Dong¹,², Jiacheng Ma¹, Jiajun Wang¹, Zhengwei Qi¹, Bingsheng He³, Haibing Guan¹

¹Shanghai Jiao Tong University, ²Intel Corporation, ³National University of Singapore

{xuemochi, mjc0608, jiajunwang, qizhenwei, hbguan}@sjtu.edu.cn
{kevin.tian, eddie.dong}@intel.com
[email protected]

Abstract

With increasing GPU-intensive workloads deployed on the cloud, cloud service providers are seeking practical and efficient GPU virtualization solutions. However, cutting-edge GPU virtualization techniques such as gVirt still suffer from restricted scalability, which constrains the number of guest virtual GPU instances.

This paper introduces gScale, a scalable GPU virtualization solution. By taking advantage of the GPU programming model, gScale presents a dynamic sharing mechanism which combines partitioning and sharing to break the hardware limitation of global graphics memory space. Particularly, we propose three approaches for gScale: (1) the private shadow graphics translation table, which enables global graphics memory space sharing among virtual GPU instances; (2) the ladder mapping and the fence memory space pool, which allow the CPU to access the host physical memory space serving the graphics memory while bypassing the global graphics memory space; and (3) slot sharing, which improves the performance of vGPUs under a high density of instances.

The evaluation shows that gScale scales up to 15 guest virtual GPU instances in Linux or 12 guest virtual GPU instances in Windows, which is 5x and 4x the scalability of gVirt, respectively. At the same time, gScale incurs only a slight runtime overhead compared with gVirt when hosting multiple virtual GPU instances.

1 Introduction

The Graphics Processing Unit (GPU) is playing an indispensable role in cloud computing, as the GPU efficiently accelerates the computation of certain workloads such as 2D and 3D rendering. With increasing GPU-intensive workloads deployed on the cloud, cloud service providers have introduced a new computing paradigm called GPU Cloud to meet the high demand for GPU resources (e.g., Amazon EC2 GPU instances [2], Aliyun GPU servers [1]).

As one of the key enabling technologies of GPU cloud, GPU virtualization is intended to provide flexible and scalable GPU resources for multiple instances with high performance. To achieve such a challenging goal, several GPU virtualization solutions have been introduced, e.g., GPUvm [28] and gVirt [30]. gVirt, also known as GVT-g, is a full virtualization solution with mediated pass-through support for Intel Graphics processors. Benefiting from gVirt's open-source distribution, we are able to investigate its design and implementation thoroughly. In each virtual machine (VM), running a native graphics driver, a virtual GPU (vGPU) instance is maintained to provide performance-critical resources directly assigned, since there is no hypervisor intervention in performance-critical paths. Thus, it balances resources among the performance, feature, and sharing capabilities [5].

For a virtualization solution, scalability is an indispensable feature which ensures high resource utilization by hosting dense VM instances on cloud servers. Although gVirt successfully puts GPU virtualization into practice, it suffers from scaling up the number of vGPU instances. The current release of gVirt only supports 3 guest vGPU instances on one physical Intel GPU¹, which limits the number of guest VM instances to 3. In contrast, CPU virtualization techniques are mature enough to exploit their potential (e.g., a Xen 4.6 guest VM supports up to 256 vCPUs [11]). The mismatch between the scalability of the GPU and of other resources like the CPU will certainly diminish the number of VM instances. Additionally, high scalability improves the consolidation of resources. Recent studies (e.g., VGRIS [26]) have observed that GPU workloads can fluctuate significantly in GPU utilization. Such low scalability of gVirt could therefore result in severe GPU resource underutilization. If more guest VMs can be consolidated onto a single host, cloud providers have more chances to multiplex the GPU power among VMs with different workload patterns (e.g., scheduling VMs with GPU-intensive or idle patterns) so that the physical resource usage of the GPU can be improved.

¹In this paper, Intel GPU refers to the Intel HD Graphics embedded in a Haswell CPU.

This paper explores the design of gVirt and presents gScale, a practical, efficient and scalable GPU virtualization solution. To increase the number of vGPU instances, gScale targets the bottleneck design of gVirt and introduces a dynamic sharing scheme for the global graphics memory space. gScale provides each vGPU instance with a private shadow graphics translation table (GTT) to break the limitation of global graphics memory space. gScale copies a vGPU's private shadow GTT to the physical GTT along with the context switch. The private shadow GTT allows vGPUs to share an overlapped range of global graphics memory space, which is an essential design of gScale. However, it is non-trivial to make the global graphics memory space sharable, because the global graphics memory space is accessible to both CPU and GPU. gScale implements a novel ladder mapping mechanism and a fence memory space pool to let the CPU access the host physical memory space serving the graphics memory, bypassing the global graphics memory space. At the same time, gScale proposes slot sharing to improve the performance of vGPUs under a high density of instances.

This paper implements gScale based on gVirt; the implementation comprises about 1000 LoC, and the source code is now available on GitHub². In summary, this paper overcomes various challenges and makes the following contributions:

• A private shadow GTT for each vGPU, which makes the global graphics memory space sharable. It keeps a specific copy of the physical GTT for the vGPU. When the vGPU becomes the render owner, its private shadow graphics translation table is written onto the physical graphics translation table by gScale to provide correct translations.

• The ladder mapping mechanism, which directly maps guest physical addresses to the host physical addresses serving the guest graphics memory. With the ladder mapping mechanism, the CPU can access the host physical memory space serving the guest graphics memory without referring to the global graphics memory space.

• The fence memory space pool, a dedicated memory space reserved in the global graphics memory space with dynamic management. It guarantees that the fence registers operate correctly when a certain range of global graphics memory space is unavailable to the CPU.

• Slot sharing, a mechanism to optimize the performance of vGPUs, which reduces the overhead of private shadow GTT copying under a high instance density.

• An evaluation showing that gScale can provide 15 guest vGPU instances for Linux VMs or 12 guest vGPU instances for Windows VMs on one physical machine, which is 5x and 4x the scalability of gVirt, respectively. It achieves up to 96% of gVirt's performance under a high density of instances.

²https://github.com/inkpool/XenGT-Preview-kernel/tree/gScale

The rest of the paper is organized as follows. Section 2 describes the background of gScale, and Section 3 reveals gVirt's scalability issue and its bottleneck design. The detailed design and implementation of gScale are presented in Section 4. We evaluate gScale's performance in Section 5 with an overhead analysis. We discuss the applicability of our work in Section 6 and the related work in Section 7. Finally, in Section 8 we conclude our work with a brief discussion of future work.

2 Background and Preliminary

GPU Programming Model. Driven by high-level programming APIs like OpenGL and DirectX, the graphics driver produces GPU commands into the primary buffer and the batch buffer, while the GPU consumes the commands accordingly. The primary buffer is designed to deliver the primary commands with a ring structure, but its size is limited. To make up for the space shortage, the batch buffer is linked to the primary buffer to deliver most of the GPU commands.

GPU commands are produced by the CPU and transferred from CPU to GPU in batches. To ensure that the GPU consumes the commands after the CPU produces them, a notification mechanism is implemented in the primary buffer with two registers. The tail register is updated when the CPU finishes the placement of commands, and it informs the GPU to fetch commands from the primary buffer. When the GPU completes processing all the commands, it writes the head register to notify the CPU that it can place further commands [30].

Graphics Translation Table and Global Graphics Memory Space. The graphics translation table (GTT), sometimes known as the global graphics translation table, is a memory-resident page table providing the translations from logical graphics memory addresses to physical memory addresses, as Figure 1 shows. It is worth noting that the physical memory space served by the GTT is also assigned to be the global graphics memory space, especially for GPUs without dedicated memory, such as Intel's GPU. However, through the aperture [6], a range defined in the graphics memory-mapped I/O (MMIO) space, the CPU can also access the global graphics memory space. This CPU-visible part of the global graphics memory is called low global graphics memory, while the rest is called high global graphics memory. To be specific, the Intel GPU has a 2MB GTT which maps a 2GB graphics memory space. The aperture range can at most be 512KB of the GTT, which maps 512MB of graphics memory space visible to the CPU. Accordingly, the low graphics memory space is 512MB, while the high graphics memory space is 1536MB.

[Figure 1: Graphics Translation Table — the CPU reaches the low range through the aperture, the GPU reaches the whole global graphics memory space, and the GTT translates both into system memory]
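To make the translation concrete, the short C sketch below models a flat, memory-resident GTT that maps 4KB graphics pages into system memory, matching the 2MB-GTT/2GB-space figures quoted above. The types and the function gtt_translate() are our own simplified assumptions, not the driver's actual data structures.

    #include <stdint.h>
    #include <stdio.h>

    #define GTT_PAGE_SHIFT 12                     /* 4KB graphics pages                 */
    #define GTT_PAGE_SIZE  (1ULL << GTT_PAGE_SHIFT)
    #define GTT_ENTRIES    (512 * 1024)           /* 2MB GTT / 4B entries -> 2GB space  */

    typedef struct {                              /* hypothetical GTT entry             */
        uint64_t pfn;                             /* physical page frame number         */
        int      valid;
    } gtt_entry_t;

    static gtt_entry_t gtt[GTT_ENTRIES];

    /* Resolve a global graphics memory address to a host physical address
     * by indexing the flat GTT, mirroring the translation in Figure 1.     */
    static int gtt_translate(uint64_t gfx_addr, uint64_t *phys_addr)
    {
        uint64_t idx    = gfx_addr >> GTT_PAGE_SHIFT;
        uint64_t offset = gfx_addr & (GTT_PAGE_SIZE - 1);

        if (idx >= GTT_ENTRIES || !gtt[idx].valid)
            return -1;                            /* unmapped graphics page             */
        *phys_addr = (gtt[idx].pfn << GTT_PAGE_SHIFT) | offset;
        return 0;
    }

    int main(void)
    {
        uint64_t phys;

        gtt[3].pfn = 0x12345;                     /* map one page for the demo          */
        gtt[3].valid = 1;
        if (gtt_translate(3 * GTT_PAGE_SIZE + 0x40, &phys) == 0)
            printf("graphics page 3 -> physical 0x%llx\n", (unsigned long long)phys);
        return 0;
    }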

gVirt. gVirt is the first product-level full GPU virtualization solution with mediated pass-through [30]. Each VM runs a native graphics driver and is presented with a full-featured virtualized GPU. gVirt emulates a virtual GPU (vGPU) for each VM and conducts context switches among vGPUs. vGPUs are scheduled to submit their commands to the physical GPU continuously, and each vGPU has a 16ms time slice. When the time slice runs out, gVirt switches the render engine to the next scheduled vGPU. To ensure a correct and safe switch between vGPUs, gVirt saves and restores vGPU states, including the internal pipeline state and I/O register states.

By passing through the accesses to the frame buffer and command buffer, gVirt reduces the overhead of performance-critical operations for a vGPU. For the global graphics memory space, gVirt applies resource partitioning. For the local graphics memory space, gVirt implements per-VM local graphics memory [30], which allows each VM to use the full local graphics memory space of 2GB in total. The local graphics memory space is only accessible to its vGPU, so gVirt can switch the graphics memory spaces among vGPUs when switching the render ownership.

3 Scalability Issue

The global graphics memory space can be accessed simultaneously by the CPU and the GPU, so gVirt has to present VMs with their global graphics memory spaces at all times, leading to resource partitioning. As shown in Figure 2, when a vGPU instance is created with a VM, gVirt only assigns a part of the host's low global graphics memory and a part of the host's high global graphics memory to the vGPU, as its low global graphics memory and high global graphics memory, respectively. These two parts together comprise the vGPU's global graphics memory space. Moreover, the partitioned graphics memory spaces are mapped by a shared shadow GTT, which is maintained by gVirt to translate guest graphics memory addresses into host physical memory addresses.

[Figure 2: Global Graphics Memory Space with Partition — the low and high global graphics memory are statically partitioned among the host vGPU and the guest vGPUs]

To support simultaneous accesses from VMs, the shared shadow GTT has to carry translations for all the VMs, which means the guest view of the shadow GTT is exactly the same as the host's, as shown in Figure 5. gVirt introduces an address space ballooning mechanism to balloon the space that does not belong to a VM: gVirt exposes the partition information to the VM's graphics driver, and the graphics driver marks the space which does not belong to the VM as "ballooned" [30]. Note that gVirt's memory space ballooning mechanism is for resource isolation, which is different from the traditional memory ballooning technique [31]. Though guests have the same view of the shadow GTT as the host, with the ballooning mechanism a guest VM can only access the global graphics memory space assigned to itself.
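As a toy illustration of this ballooning-based isolation, the C sketch below models the partition description a guest driver could receive and the check its allocator would apply; the structure and field names are hypothetical, and real drivers mark the ballooned ranges inside their memory manager instead of checking at allocation time.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical description of one vGPU's partition, as exposed to the
     * guest driver: everything outside these two ranges is "ballooned".    */
    typedef struct {
        uint64_t low_start,  low_size;     /* partition of low global GM    */
        uint64_t high_start, high_size;    /* partition of high global GM   */
    } gm_partition_t;

    /* The guest allocator only hands out addresses inside its own partition. */
    static int in_partition(const gm_partition_t *p, uint64_t addr, uint64_t len)
    {
        return (addr >= p->low_start  && addr + len <= p->low_start  + p->low_size) ||
               (addr >= p->high_start && addr + len <= p->high_start + p->high_size);
    }

    int main(void)
    {
        /* e.g. one Linux vGPU: 64MB of low GM and 384MB of high GM (example offsets). */
        gm_partition_t vgpu = { 128u << 20, 64u << 20, 896u << 20, 384u << 20 };

        printf("%d\n", in_partition(&vgpu, 130u << 20, 1u << 20));  /* 1: own range  */
        printf("%d\n", in_partition(&vgpu,  16u << 20, 1u << 20));  /* 0: ballooned  */
        return 0;
    }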

Due to this resource partitioning of the global graphics memory space, with a fixed total size of global graphics memory, the number of vGPUs hosted by gVirt is limited. If gVirt wants to host more vGPUs, it has to configure vGPUs with less global graphics memory. However, increasing the number of vGPUs by shrinking their global graphics memory sizes sacrifices vGPU functionality: the graphics driver reports errors or even crashes when it cannot allocate memory from the global graphics memory space [4]. For instance, a vGPU with a deficient global graphics memory size may lose functionality under workloads with high global graphics memory requirements. In fact, more global graphics memory does not bring performance improvement for vGPUs, because the global graphics memory only serves the frame buffer and the ring buffer, which have limited sizes, while the massive rendering data resides in local graphics memory [30]. Specifically, for a vGPU in a Linux VM, 64MB of low global graphics memory and 384MB of high global graphics memory are recommended; for a vGPU in a Windows VM, the recommended configuration is 128MB of low global graphics memory and 384MB of high global graphics memory [12]. In the scalability experiment of gVirt [30], it hosted 7 guest vGPUs in Linux VMs; however, the global graphics memory size of each vGPU in that experiment was smaller than the recommended configuration. Such a configuration cannot guarantee the full functionality of a vGPU, and it would incur errors or crashes under certain workloads because of the deficiency of graphics memory space [4]. In this paper, the vGPUs are configured with the recommended configuration.

Actually, the current source code (2015Q3) of gVirt sets the maximal vGPU number to 4. For a platform with an Intel GPU, there is 512MB of low global graphics memory space and 1536MB of high global graphics memory space in total, so gVirt can only provide 3 guest vGPUs (64MB low global graphics memory and 384MB high global graphics memory) for Linux VMs or 3 guest vGPUs (128MB low global graphics memory and 384MB high global graphics memory) for Windows VMs, because the host VM also occupies one vGPU. As a GPU virtualization solution, gVirt is jeopardized by this scalability issue. The static partitioning of the global graphics memory space is the root cause of the scalability issue. In this paper, we attempt to break the limitation of static resource partitioning and substantially improve the scalability of gVirt.
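A back-of-the-envelope calculation (ours, using the sizes quoted above) shows why the recommended configuration caps gVirt at 3 guest vGPUs:

\[
N_{\text{vGPU}} \;\le\; \min\!\left( \left\lfloor \frac{512\,\text{MB}}{64\,\text{MB}} \right\rfloor,\; \left\lfloor \frac{1536\,\text{MB}}{384\,\text{MB}} \right\rfloor \right) \;=\; \min(8,\,4) \;=\; 4 .
\]

At most 4 vGPUs fit, and with the host VM occupying one of them, 3 guest vGPUs remain; with the 128MB low-memory configuration for Windows the low-memory bound is \(\lfloor 512/128 \rfloor = 4\) as well, so the count is the same. This is consistent with the limit of 4 hard-coded in gVirt's source.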

4 Design and Implementation

The architecture of gScale is shown in Figure 3. To break the limitation of global graphics memory, gScale proposes a dynamic sharing scheme which combines partitioning and sharing, as Figure 4 illustrates. For accesses from the GPU, we introduce the private shadow GTT to make the global graphics memory space sharable. For accesses from the CPU, we present the ladder mapping to allow the CPU to directly access the host physical memory space serving the graphics memory, bypassing the global graphics memory space. For concurrent accesses from CPU and GPU, gScale reserves a part of the low global graphics memory as the fence memory space pool to keep the fence registers functional. gScale also divides the high global graphics memory space into several slots to reduce the overhead caused by private shadow GTT copying.

[Figure 3: Architecture — the host VM runs the gVirt mediator with gScale, which maintains a private shadow GTT per guest VM running a native graphics driver, on top of the Xen VMM and the GPU (hypercall, pass-through, and trap-and-forward paths)]

In this section, the design of gScale addresses three technical challenges: (1) how to make the global graphics memory space sharable among vGPUs, (2) how to let the CPU directly access the host memory space serving the graphics memory, bypassing the global graphics memory space, and (3) how to improve the performance of vGPUs under a high instance density.

[Figure 4: Dynamic Sharing Scheme of gScale — a reserved range plus a sharable range in low memory, and the high global graphics memory divided into slots]

4.1 Private Shadow GTT

It is a non-trivial task to make the global graphics memory space sharable among vGPUs, because the CPU and the GPU access the low global graphics memory space simultaneously, as we mentioned in Section 2. However, the high global graphics memory space is only accessible to the GPU, which makes it possible for vGPUs to share the high global graphics memory space. Taking advantage of the GPU programming model, vGPUs are scheduled to take turns being served by the render engine, and gVirt conducts a context switch before it changes the ownership of the render engine. This inspires us to propose a private shadow GTT for each vGPU.

[Figure 5: Private Shadow GTT — gVirt's shared shadow GTT gives every vGPU the same view of the physical GTT, while gScale's private shadow GTT gives each vGPU its own view]

Figure 5 shows gVirt's shared shadow GTT and gScale's private shadow GTT. Specifically, the shared shadow GTT is introduced to apply resource partitioning to the global graphics memory space. It provides every vGPU with the same view of the physical GTT, while each vGPU is assigned a different part of the shadow GTT; accordingly, each vGPU occupies a range of global graphics memory space different from the others'. In contrast, gScale's private shadow GTT is specific to each vGPU and provides the vGPU with a unique view of the global graphics memory space. Moreover, the translations a private shadow GTT contains are only valid for its corresponding vGPU. gScale copies the vGPU's private shadow GTT onto the physical GTT along with the context switch, to ensure that the translations of the physical GTT are correct for the upcoming vGPU. While a vGPU owns the physical engine, gScale synchronizes modifications of the physical GTT into the vGPU's private shadow GTT.

By manipulating the private shadow GTTs, gScale can let vGPUs use an overlapped range of global graphics memory, which makes the high global graphics memory space sharable, as shown in Figure 6. However, the low graphics memory space is still partitioned among the vGPUs, because it is also visible to the CPU: simply using the private shadow GTT to make the low graphics memory space sharable would provide the vCPU with wrong translations.

[Figure 6: Sharable Global Graphics Memory Space — guest vGPUs share an overlapped high global graphics memory range while the low range stays partitioned]

On-demand Copying. Writing a private shadow GTT onto the physical GTT incurs overhead, so gScale introduces on-demand copying to reduce unnecessary copying. Currently, gScale is able to assign the whole sharable global graphics memory space to a vGPU. Instead, gScale only configures a vGPU with sufficient global graphics memory, because more global graphics memory does not increase the performance of the vGPU while it does increase the overhead of copying the shadow GTT. Although the size of the private shadow GTT is exactly the same as the physical GTT's, a vGPU is configured with only a portion of the available global graphics memory space, corresponding to only part of its private shadow GTT. By taking advantage of this characteristic, gScale only copies the needed part of the vGPU's private shadow GTT to the physical GTT, which mitigates the unnecessary overhead.
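The following sketch condenses the two mechanisms above into a context-switch routine. All names (vgpu_t, physical_gtt, context_switch) are hypothetical placeholders rather than gVirt's real symbols, and page-table writes are modeled as plain memcpy into a flat array; it only illustrates that the next owner's private shadow GTT is written to the physical GTT and that only the portion actually configured for that vGPU is copied (on-demand copying).

    #include <stdint.h>
    #include <string.h>

    #define GTT_ENTRIES (512 * 1024)       /* 2MB GTT, 4B entries, 2GB graphics space */

    typedef uint32_t gtt_entry_t;

    /* Hypothetical per-vGPU state: a full-size private shadow GTT plus the
     * ranges of global graphics memory this vGPU is actually configured with. */
    typedef struct {
        gtt_entry_t shadow[GTT_ENTRIES];   /* private shadow GTT                     */
        uint32_t    low_start,  low_cnt;   /* low GM range of this vGPU              */
        uint32_t    high_start, high_cnt;  /* slot inside the shared high GM         */
    } vgpu_t;

    static gtt_entry_t physical_gtt[GTT_ENTRIES];

    /* On-demand copying: write only the entries this vGPU is configured with. */
    static void apply_private_shadow_gtt(const vgpu_t *next)
    {
        memcpy(&physical_gtt[next->low_start],  &next->shadow[next->low_start],
               next->low_cnt  * sizeof(gtt_entry_t));
        memcpy(&physical_gtt[next->high_start], &next->shadow[next->high_start],
               next->high_cnt * sizeof(gtt_entry_t));
    }

    /* Called when the render engine is handed from 'prev' to 'next'.  gScale keeps
     * prev's private shadow GTT in sync while prev runs; syncing it here once is a
     * simplification of that behaviour.                                            */
    static void context_switch(vgpu_t *prev, const vgpu_t *next)
    {
        memcpy(&prev->shadow[prev->high_start], &physical_gtt[prev->high_start],
               prev->high_cnt * sizeof(gtt_entry_t));
        apply_private_shadow_gtt(next);
    }

    int main(void)
    {
        static vgpu_t host, guest;            /* static: each embeds a 2MB shadow GTT */

        host.low_start  = 0;          host.low_cnt  = 16 * 1024;  /* 64MB of low GM   */
        host.high_start = 128 * 1024; host.high_cnt = 96 * 1024;  /* a 384MB slot     */
        guest = host;
        guest.low_start = 16 * 1024;                              /* disjoint low GM  */

        context_switch(&host, &guest);
        return 0;
    }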

4.2 Ladder Mapping

It is not enough to make only the high global graphics memory space sharable, because the static partition applied to the low global graphics memory space still constrains the number of vGPUs. The low global graphics memory space is accessible to both CPU and GPU, and the CPU and GPU are scheduled independently, so gScale has to present VMs with their low global graphics memory spaces at all times. The Intel GPU does not have dedicated graphics memory; the graphics memory is allocated from system memory, so the graphics memory of a VM actually resides in host physical memory. gScale proposes the ladder mapping to allow the CPU to directly access the host memory space serving the graphics memory, bypassing the global graphics memory space.

When a VM is created, gScale maps the VM's guest physical memory space to the host physical memory space through the Extended Page Table (EPT). The EPT is a hardware-supported page table for virtualization, which translates guest physical addresses to host physical addresses [23]. Through the aperture, a range of MMIO space in the host physical memory space, the CPU can access the low part of the global graphics memory space. With the translations in the GTT, the global graphics memory address is translated into the host physical address serving the graphics memory. Finally, the CPU reaches the graphics data residing in the host physical memory space.

[Figure 7: Ladder Mapping — the initial mapping goes guest physical address → EPT → host MMIO (aperture) → GTT → host physical address; the ladder mapping translates the guest physical address directly to the host physical address]

Figure 7 shows the initial mapping we mentioned above: through Steps 1, 2 and 3, a guest physical address is translated into a host physical address. When the process is completed, a translation between the guest physical address and the host physical address serving the graphics memory is established. After that, gScale modifies the translation in the EPT to translate the guest physical address directly to the host physical address serving the graphics memory, without referring to the global graphics memory address. We call this mechanism the ladder mapping; it is constructed when the CPU accesses the global graphics memory space by referring to the GTT. gScale monitors the GTT at all times and builds the ladder mapping whenever a GTT translation is modified by the CPU. In a nutshell, the ladder mapping allows the CPU to access the host memory space while bypassing the global graphics memory space, so that gScale can make the low global graphics memory space sharable with the private shadow GTT.
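A sketch of how the ladder mapping might be built when gScale traps a guest update to a GTT entry; ept_map() is a stub standing in for the hypervisor's EPT update, and the field names are assumptions rather than gScale's real interfaces.

    #include <stdint.h>
    #include <stdio.h>

    /* Stub for the hypervisor service that maps one guest page to one host page. */
    static void ept_map(uint64_t guest_pfn, uint64_t host_pfn)
    {
        printf("EPT: guest pfn 0x%llx -> host pfn 0x%llx\n",
               (unsigned long long)guest_pfn, (unsigned long long)host_pfn);
    }

    typedef struct {
        uint32_t *shadow_gtt;            /* this vGPU's private shadow GTT           */
        uint32_t  low_start, low_cnt;    /* its CPU-visible (aperture-backed) range  */
        uint64_t  aperture_guest_pfn;    /* guest pfn where its aperture view begins */
    } vgpu_t;

    /* Trap handler for a guest write to GTT entry 'idx' pointing at 'host_pfn'.
     * Besides updating the private shadow GTT, build the ladder mapping: point
     * the EPT entry of the guest physical page that reaches this graphics page
     * through the aperture directly at the host page serving the graphics
     * memory, so later CPU accesses bypass the global graphics memory space.   */
    static void on_guest_gtt_write(vgpu_t *v, uint32_t idx, uint64_t host_pfn)
    {
        v->shadow_gtt[idx] = (uint32_t)host_pfn;   /* real entries also carry flags */

        if (idx >= v->low_start && idx < v->low_start + v->low_cnt) {
            uint64_t guest_pfn = v->aperture_guest_pfn + (idx - v->low_start);
            ept_map(guest_pfn, host_pfn);          /* the ladder mapping            */
        }
    }

    int main(void)
    {
        static uint32_t shadow[512 * 1024];
        vgpu_t v = { shadow, 0, 16 * 1024, 0x80000 };  /* 64MB low GM, aperture at guest pfn 0x80000 */

        on_guest_gtt_write(&v, 5, 0x12345);            /* low-GM entry: also builds ladder mapping   */
        return 0;
    }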
Fence Memory Space Pool. Although we use the ladder mapping to force the CPU to bypass the global graphics memory space, there is one exception: the CPU can still access the global graphics memory space through fence registers. A fence register contains the information about the tiled format of a specific region of graphics memory [6]. When the CPU accesses a region of global graphics memory recorded in a fence register, it needs the format information in the fence to operate on that graphics memory. However, after we enable the ladder mapping, the global graphics memory space is no longer available to the CPU, so the global graphics memory address in a fence register becomes invalid for the CPU.

To address this malfunction of fence registers, gScale reserves a dedicated part of the low global graphics memory to serve fence registers and manages it dynamically. We call this reserved part of low global graphics memory the fence memory space pool. Figure 8 illustrates how the fence memory space pool works:

[Figure 8: Fence Memory Space Pool — CPU accesses covered by a fence register are redirected through the aperture into the reserved pool in low global graphics memory]

Step 1: when a fence register is written by the graphics driver, gScale acquires the raw data inside the register. By analyzing the raw data, gScale gets the format information and the global graphics memory space range served by this fence register.

Step 2: by referring to the initial mapping of the EPT, gScale finds the guest physical memory space range which corresponds to the global graphics memory space range in the register. Though the initial mapping of the EPT has been replaced by the ladder mapping, it is easy to restore the original mapping from a backup, because the initial mapping is continuous with a clear offset and range [6]. After that, this range of guest physical memory space is again mapped to a range of physical memory space within the aperture.

Step 3: gScale suspends the ladder mapping for this range of guest physical memory space, and allocates a range of memory space of the same size in the fence memory space pool.

Step 4: gScale maps the host physical memory space in the aperture to the memory space newly allocated in the fence memory space pool.

Step 5: gScale copies the GTT entries serving the graphics memory space in the fence register to the part of the GTT corresponding to the graphics memory space newly allocated in the fence memory space pool.

Step 6: gScale writes the new graphics memory space range, along with the untouched format information, into the fence register.

To this end, gScale constructs a temporary mapping for the fence register, and the CPU can finally use the information in the fence register correctly. When a fence register is updated, gScale restores the ladder mapping for the previous range of global graphics memory space that the fence register served, and frees the corresponding memory space in the fence memory space pool. After that, gScale repeats the procedure above to ensure the updated register works correctly with the fence memory space pool.
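The six steps can be summarized as a single trap handler. The sketch below uses one-line stubs for the hypervisor and pool services and a made-up register layout, so it should be read as an outline of the workflow in Figure 8 rather than gScale's actual code.

    #include <stdint.h>

    typedef struct {                 /* decoded fence register: tiling format plus the
                                        low-GM range (in 4KB pages) that it covers     */
        uint32_t format;
        uint32_t gm_page, pages;
    } fence_t;

    typedef struct {                 /* what each register currently serves            */
        fence_t  guest;              /* range in the guest's low global GM             */
        uint32_t pool_page;          /* range allocated inside the fence memory pool   */
    } fence_state_t;

    static fence_state_t state[32];

    /* One-line stubs standing in for gScale / hypervisor services (hypothetical). */
    static uint32_t pool_alloc(uint32_t n) { static uint32_t next; uint32_t p = next; next += n; return p; }
    static void pool_free(uint32_t p, uint32_t n)                { (void)p; (void)n; }
    static void ladder_suspend(uint32_t gm, uint32_t n)          { (void)gm; (void)n; }
    static void ladder_restore(uint32_t gm, uint32_t n)          { (void)gm; (void)n; }
    static void ept_map_to_pool(uint32_t gm, uint32_t pool, uint32_t n) { (void)gm; (void)pool; (void)n; }
    static void gtt_copy(uint32_t dst, uint32_t src, uint32_t n) { (void)dst; (void)src; (void)n; }
    static void fence_hw_write(int idx, fence_t f)               { (void)idx; (void)f; }

    /* Trap handler for a guest write to fence register 'idx' (Steps 1-6 of Figure 8). */
    static void on_fence_write(int idx, fence_t req)
    {
        fence_state_t *old = &state[idx];

        if (old->guest.pages) {                  /* register update: undo the previous mapping */
            ladder_restore(old->guest.gm_page, old->guest.pages);
            pool_free(old->pool_page, old->guest.pages);
        }

        /* Step 1: the format and graphics memory range are decoded into 'req'.       */
        /* Step 2: the guest physical pages behind req.gm_page are recovered from the */
        /*         backed-up initial (continuous) EPT mapping.                        */
        /* Step 3: suspend their ladder mapping and allocate an equally sized range   */
        /*         inside the reserved fence memory space pool.                       */
        ladder_suspend(req.gm_page, req.pages);
        uint32_t pool_page = pool_alloc(req.pages);

        /* Step 4: map those guest pages to the aperture range backing the pool.      */
        ept_map_to_pool(req.gm_page, pool_page, req.pages);

        /* Step 5: copy the GTT entries of the original range into the pool range.    */
        gtt_copy(pool_page, req.gm_page, req.pages);

        /* Step 6: program the register with the pool range, format left untouched.   */
        fence_t hw = { req.format, pool_page, req.pages };
        fence_hw_write(idx, hw);

        old->guest     = req;
        old->pool_page = pool_page;
    }

    int main(void)
    {
        fence_t f = { 1, 0, 16 };    /* a 64KB tiled region at the start of the guest's low GM */
        on_fence_write(0, f);
        return 0;
    }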

4.3 Slot Sharing

In real cloud environments, the instances hosted on the cloud may not remain busy all the time; some instances become idle after completing their tasks [24]. gScale implements slot sharing to improve the performance of vGPU instances under a high instance density. Figure 9 shows the layout of the physical global graphics memory space: gScale divides the high global graphics memory space into several slots, and each slot can hold one vGPU's high graphics memory, so gScale can deploy several vGPUs in the same slot. As we mentioned in Section 2, the high global graphics memory space provided by the Intel GPU is 1536MB, while 384MB is sufficient for one VM. However, gScale only provides slots for VMs in the high graphics memory space, because the low global graphics memory space is only 512MB, which is much smaller than the high global graphics memory space; there is no free space in the low graphics memory space to spare for slots.

[Figure 9: Slot Sharing — the low range holds the fence memory space pool and the sharable range, while the high range is divided into slots, one reserved for the host and the rest shared by guest vGPUs]

Optimized Scheduler. gScale does not conduct context switches for idle vGPU instances, which saves the cost of context switching and private shadow GTT copying. vGPU instances without workloads do not submit commands to the physical engine, so gScale skips them and focuses on serving the instances with heavy workloads. At the same time, gScale does not copy entries from an idle vGPU's private shadow GTT to the physical GTT. With slot sharing, if there is only one active vGPU in a slot, this vGPU owns the slot, and gScale keeps the high global memory part of its private shadow GTT on the physical GTT without entry copying. With the optimized scheduler, slot sharing can effectively reduce the overhead of private shadow GTT copying; we give a micro overhead analysis in Section 5.4.

gScale currently has 4 slots (1536MB/384MB = 4): one is reserved for the host vGPU, while the remaining 3 are shared by guest vGPUs. Slot sharing helps gScale improve guest vGPU performance under a high instance density when only a few vGPUs are busy. We believe slot sharing can be exploited if the cloud provider deploys the guest VMs meticulously, for example, by letting a busy vGPU share one slot with a few idle vGPUs.
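A compact sketch of the two policies: idle vGPUs are skipped entirely, and the high-GM part of a private shadow GTT is copied only when another vGPU has used the same slot in the meantime. The list structure and function names are hypothetical; the slot count follows the 1536MB/384MB arithmetic above.

    #include <stddef.h>

    #define NR_SLOTS 4                       /* 1536MB of high GM / 384MB per vGPU      */

    typedef struct vgpu {
        int  slot;                           /* high-GM slot this vGPU lives in         */
        int  has_pending_commands;           /* did it submit work for this round?      */
        struct vgpu *next;                   /* circular round-robin list               */
    } vgpu_t;

    static vgpu_t *slot_owner[NR_SLOTS];     /* vGPU whose high-GM entries currently
                                                sit on the physical GTT                 */

    static void copy_low_shadow_gtt(vgpu_t *v)  { (void)v; }  /* stubs for the actual   */
    static void copy_high_shadow_gtt(vgpu_t *v) { (void)v; }  /* shadow-GTT copy paths  */

    /* Pick the next vGPU with pending work, skipping idle instances, and copy the
     * high-GM part of its private shadow GTT only if it lost its slot meanwhile.   */
    static vgpu_t *schedule_next(vgpu_t *current)
    {
        vgpu_t *v = current->next;

        while (v != current && !v->has_pending_commands)
            v = v->next;                     /* idle vGPU: no context switch, no copy   */
        if (!v->has_pending_commands)
            return NULL;                     /* nothing to run this round               */

        copy_low_shadow_gtt(v);              /* the low-GM part is copied on each switch */
        if (slot_owner[v->slot] != v) {      /* slot was shared with another busy vGPU   */
            copy_high_shadow_gtt(v);
            slot_owner[v->slot] = v;
        }
        return v;
    }

    int main(void)
    {
        vgpu_t a = { 1, 1, NULL }, b = { 1, 0, NULL };   /* two guests sharing slot 1   */

        a.next = &b; b.next = &a;
        return schedule_next(&b) == &a ? 0 : 1;          /* idle 'b' is skipped         */
    }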

5 Evaluation

In this section, we evaluate the scalability of gScale when it hosts an increasing number of guest vGPUs with GPU workloads. gScale scales well for GPU-intensive workloads, achieving up to 81% of gVirt's performance when it scales to 15 vGPUs. We compare the performance of gScale with gVirt, and it turns out that gScale introduces only a negligible performance difference. We also compare the performance of gScale and its basic version (without slot sharing) under a high density of instances. In our experiments, slot sharing improves the performance of gScale by up to 20%, and mitigates the overhead caused by copying private shadow GTT entries by up to 83.4% under certain circumstances.

5.1 Experimental Setup

Host Machine Configuration
  CPU       Intel E3-1285 v3 (4 cores, 3.6GHz)
  GPU       Intel HD Graphics P4700
  Memory    32GB
  Storage   SAMSUNG 850 Pro 256GB x3

Host VM Configuration
  vCPU            4
  Memory          3072MB
  Low Global GM   64MB
  High Global GM  384MB
  OS              Ubuntu 14.04
  Kernel          3.18.0-rc7

Linux / Windows Guest VM Configuration
  vCPU            2
  Memory          1800MB / 2048MB
  Low Global GM   64MB / 128MB
  High Global GM  384MB
  OS              Ubuntu 14.04 / Windows 7

Table 1: Experimental Configuration

Configurations. All the VMs in this paper run on one server configured as shown in Table 1, and gScale is applied to gVirt's 2015Q3 release as a patch. To support higher resolutions, fence registers have to serve a larger graphics memory range; in our test environment, gScale reserves 300MB of low global graphics memory as the fence memory space pool, which is enough for 15 VMs under the 1920*1080 resolution.

Benchmarks. We mainly focus on 3D workloads, since in cloud environments graphics processing is still the typical GPU workload. Some 2D workloads are covered too; however, we only use 2D workloads to prove the full functionality of vGPUs hosted by gScale, because 2D workloads can also be accelerated by the CPU. For Linux 3D performance, we choose the Phoronix Test Suite³ 3D benchmarks, including Lightsmark, Nexuiz, Openarena, Urbanterror, and Warsow. Cairo-perf-trace⁴, which contains a group of test cases, is picked to evaluate Linux 2D performance. For Windows, we use 3DMark06⁵ to evaluate 3D performance, and PassMark⁶ is chosen to evaluate 2D functionality. All the benchmarks are run under the 1920*1080 resolution.

³Phoronix Test Suite, http://www.phoronix-test-suite.com/
⁴Cairo, http://cairographics.org/
⁵3DMark06, http://www.futuremark.com
⁶PassMark, http://www.passmark.com

Methodology. We implemented a test framework that dispatches tasks to each VM; when all the tasks are completed, we collect the test results for analysis. When gScale hosts a large number of VMs, I/O can become a bottleneck, so we installed 3 SSD drives in our server and distributed the virtual disks of the VMs across these SSDs to meet the VMs' I/O requirements. For 3DMark06, the loading process takes a great amount of time, which leads to an unacceptable inaccuracy when run in multiple VMs. Moreover, VMs start loading at the same time, but they cannot process rendering tasks simultaneously due to their different loading speeds. To reduce the inaccuracy caused by loading, we run the 3DMark06 benchmark by splitting it into single units and repeating each unit 3 times. The single units in 3DMark06 are GT1-Return To Proxycon, GT2-Firefly Forest, HDR1-Canyon Flight and HDR2-Deep Freeze, which cover SM2.0 and SM3.0/HDR performance.

5.2 Scalability

In this section, we present the experiments on gScale's scalability on Linux and Windows. Figure 10 shows the 2D and 3D performance of Linux VMs hosted by gScale, scaling from 1 to 15, with the results of all tests normalized to the 1-VM case. All 3D performance in this paper is measured by the frames per second (FPS) value reported by the benchmarks. For most of our test cases, there is a clear performance degradation when the number of VMs is over 1, due to the overhead of copying private shadow GTT entries. The maximal degradations of Lightsmark, Nexuiz, Openarena, and Warsow are 26.2%, 19.2%, 19.3%, and 18.9%, respectively. For the 3D workloads Lightsmark, Nexuiz, Openarena, and Warsow, scaling from 5 VMs to 15 VMs, gScale shows a negligible performance change, which demonstrates that the GPU resource is efficiently shared among multiple VMs. For the 2D workloads, firefox-ast and gnome increase their performance from 1 VM to 5 VMs, because 2D workloads are also accelerated by the CPU.

[Figure 10: Scalability of gScale in Linux]

The 3D performance of Windows VMs hosted by gScale, scaling from 1 to 12, is shown in Figure 11, with all test results normalized to the 1-VM case. Similar to Linux, there is a visible performance degradation for each case when the number of VMs is over 1, and the maximal degradations of GT1, GT2, HDR1, and HDR2 are 23.6%, 14.5%, 15.2%, and 17.5%, respectively. The cause of the degradation is the same as for Linux VMs and will be analyzed in Section 5.4. The performance scales well from 4 VMs to 12 VMs, which shows that the GPU resource is efficiently utilized when the number of VMs increases.

[Figure 11: Scalability of gScale in Windows]
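The text does not spell out how these percentages are computed; a natural reading, which we assume here, defines the normalized performance and the quoted maximal degradation of a workload w at N VMs as

\[
P_w(N) \;=\; \frac{\mathrm{FPS}_w(N)}{\mathrm{FPS}_w(1)}, \qquad D_w \;=\; \max_{N>1} \bigl( 1 - P_w(N) \bigr),
\]

so that, for example, \(D_{\text{Lightsmark}} = 26.2\%\) on Linux and \(D_{\text{GT1}} = 23.6\%\) on Windows.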

5.3 Performance Comparison with gVirt

We compare the performance of gScale with gVirt in Figure 12, with the performance of gScale normalized to gVirt. We examine the settings of 1-3 VMs for gScale, since gVirt can only support 3 guest vGPUs. For Linux, gScale achieves up to 99.89% of gVirt's performance, while for Windows, gScale achieves up to 98.58%. There is a performance drop of less than 5% of normalized performance when the number of instances is over 1. The decrease is due to copying the low-graphics-memory part of the private shadow GTT, and we give a micro analysis in Section 5.4. This overhead is inevitable, because global graphics memory space sharing incurs the overhead of copying the private shadow GTT.

[Figure 12: Performance Comparison]

Performance Impact of Slot Sharing. In this experiment, we evaluate the slot sharing of gScale under a high instance density. We launch 15 VMs at the same time, but only run GPU-intensive workloads in some of them, while the rest of the VMs remain GPU-idle; a GPU-idle VM is a launched VM without any GPU workload. We increase the number of GPU-busy VMs from 1 to 15 and observe the performance change. We use gScale-Basic to denote gScale without slot sharing.

For the 3D performance of gScale in Linux, we pick Nexuiz as a demonstration, and the case is run in an increasing number of VMs while gScale hosts 15 VMs in total, as shown in Figure 13. gScale and gScale-Basic have the same performance when there is only one GPU-busy VM. When the number of GPU-busy VMs increases, private shadow GTT copying happens, and there is a 20% performance decrease for gScale-Basic. In contrast, gScale has little performance degradation when the number of GPU-busy VMs is less than 4, and slot sharing mitigates the performance degradation when the number of GPU-busy VMs is less than 6. When the number of GPU-busy VMs exceeds 6, slot sharing no longer helps with the overhead, and the performance stabilizes at around 80% of normalized performance.

[Figure 13: 3D Performance of Linux VMs]

For the 3D performance of gScale in Windows, GT1 is chosen to run in a rising number of VMs while gScale hosts 12 VMs in total. gScale shows the same performance as gScale-Basic when there is only 1 GPU-busy VM. However, similar to the results on Linux, when the number of GPU-busy VMs is over 1, there is a 16.5% performance degradation for gScale-Basic. gScale achieves a flat performance change when the number of GPU-busy VMs is less than 4, and the results show that slot sharing mitigates the performance degradation before the number of GPU-busy VMs reaches 6. When the number of GPU-busy VMs exceeds 6, the performance of gScale and gScale-Basic is very close.

[Figure 14: 3D Performance of Windows VMs]

5.4 Micro Analysis

Overhead of Private Shadow GTT. We evaluate the overhead caused by copying the private shadow GTT to show the performance optimization brought by slot sharing. Lightsmark and HDR2 are chosen as the workloads in Linux and Windows VMs, respectively, and we inspect the difference in overhead between gScale and gScale-Basic. For Linux, we launch 15 VMs and run workloads in 3 of them; for Windows, we run workloads in 3 VMs while 12 VMs in total are launched. We measure the time spent on private shadow GTT copying and the time the vGPU owns the physical engine in each schedule. Then we collect the data from about 3000 schedules and calculate the percentage of time gScale spends on private shadow GTT copying in each schedule.
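One natural reading of this per-schedule metric (our interpretation, not stated explicitly in the text), with t_copy the shadow-GTT copying time and t_own the time the vGPU owns the render engine in schedule i, is

\[
\text{overhead} \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{t^{\,i}_{\mathrm{copy}}}{\,t^{\,i}_{\mathrm{copy}} + t^{\,i}_{\mathrm{own}}\,}, \qquad n \approx 3000 .
\]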

Figure 15 shows the overhead of gScale in Linux: for gScale-Basic (without slot sharing), the average overhead is 21.8%, while the average overhead of gScale is only 3.6%. In this case, slot sharing reduces the overhead of private shadow GTT copying by 83.4%. The overhead dithers around the average value, because shadow GTT copying needs memory bandwidth and CPU resources, which are also occupied by the 3D workload.

[Figure 15: Overhead of Private Shadow GTT Copying in Linux]

Figure 16 shows the overhead of private shadow GTT copying in Windows: for gScale-Basic, the average overhead is 15.35%, while the average overhead of gScale is only 4.16%. In this case, slot sharing reduces the overhead of private shadow GTT copying by 72.9%. Slot sharing works better for Linux, because it only optimizes the overhead from the high global graphics memory part of private shadow GTT copying, while we configure vGPUs with twice the amount of low global graphics memory in Windows compared with Linux. Additionally, the overhead caused by the low graphics memory part of private shadow GTT copying is less than 5%, which is acceptable.

[Figure 16: Overhead of Private Shadow GTT Copying in Windows]

[Table 2: Frequency of Ladder Mapping — columns: Lightsmark, Nexuiz, Openarena, Warsow, SM, HDR; rows: L-Mapping (k), GTT Modify (k), Percentage]

Frequency of Ladder Mapping. The ladder mapping is constructed by gScale when the CPU modifies a GTT entry. We try to figure out the frequency of ladder mapping when 3D workloads are running: we count the total number of GTT modifications and the number of ladder mappings to calculate the percentage, as shown in Table 2. For Windows workloads, the ladder mapping happens very rarely, less than 1% of the time. For Linux, the percentage of ladder mapping is higher than for Windows, and we believe the reason is that the total amount of GTT modifications in Windows is much larger than in Linux (up to 8x). At the same time, we observe that the ladder mapping mostly happens when workloads are being loaded, and it seldom happens while workloads are being processed. This explains the flat performance change in our scalability evaluation, even though the ladder mapping has overhead.
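The percentage row in Table 2 follows directly from the two counts:

\[
\text{percentage} \;=\; \frac{\#\,\text{ladder mappings}}{\#\,\text{GTT modifications}} \times 100\%,
\]

which stays below 1% for the Windows workloads.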

6 Discussion

Currently, gScale only supports Intel Graphics Processors. However, the principle of our design can be applied to other architectures. In addition to Intel, vendors like AMD, Qualcomm and Samsung also have integrated CPU/GPU systems whose graphics memory is served by system memory [25]. Our ladder mapping could be applied to their solutions if they have similar requirements. Some GPUs, such as those from NVIDIA and AMD, may have dedicated graphics memory, but they also use a graphics translation table to do address translation. We believe the concept of gScale's private shadow GTT could also help them share the graphics memory space. However, we could not test gScale on those GPUs, because they are not open-source distributions.

7 Related Work

Using modern GPUs in a shared cloud environment remains challenging, with a balance to strike among performance, features and sharing capability [30]. A lot of research effort has been made to enable GPUs in virtual machines (i.e., device emulation, API forwarding, device pass-through, and full GPU virtualization).

Device emulation is considered impractical because GPU hardware is vendor-specific and modern GPUs are complicated. Thus, QEMU [14] emulates a legacy VGA device with low performance to support only some basic functionality.

API forwarding has been widely studied and has been applied to many virtualization solutions already. By installing a graphics library in a guest OS, graphics commands can be forwarded to the host OS, which executes those commands directly using the GPU hardware. WireGL [20] and Chromium [21] intercept OpenGL commands and render them in parallel on commodity clusters. VMGL [22] makes use of Chromium to render the guest's OpenGL commands on the host side. GViM [19], rCUDA [18], and vCUDA [27] virtualize GPGPU applications by forwarding CUDA commands in virtual machines. Kernel consolidation has been studied for efficiency with Kernelet [32]. However, one major limitation of API forwarding is that the graphics stacks on guest and host must match; otherwise, the host OS is not able to process the guest's commands. For example, a Linux host cannot execute DirectX commands forwarded by a Windows guest. As a result, a translation layer must be built for a Linux host to execute DirectX commands: Valve [8] and Wine [10] have built such translation layers, but only a subset of DirectX commands is supported; VMware [17] and Virgil [9] implement a graphics driver to translate guests' commands into their own commands.

Device pass-through achieves high performance in GPU virtualization. Recently, Amazon [2] and Aliyun [1] have provided GPU instances to customers for high performance computing. Graphics cards can also be passed to a virtual machine exclusively using Intel VT-d [13, 15]. However, a directly passed-through GPU is dedicated to one VM, which sacrifices the sharing capability.

Two full GPU virtualization solutions have been proposed, i.e., gVirt [30] and GPUvm [28, 29]. GPUvm implements GPU virtualization for NVIDIA cards on Xen, applying several optimization techniques to reduce overhead. However, full virtualization still causes non-trivial overhead because of MMIO operations; a para-virtualized version is also proposed to improve performance. Furthermore, GPUvm can only support 8 VMs in its experimental setup. gVirt is the first open-source, product-level full GPU virtualization solution on Intel platforms. It provides each VM with a virtual full-fledged GPU and can achieve almost native speed. Recently, gHyvi [16] used a hybrid shadow page table to improve gVirt's performance for memory-intensive workloads. However, gHyvi inherits the resource partition limitation of gVirt, so it also suffers from the scalability issue.

NVIDIA GRID [7] is a commercial GPU virtualization product, which now supports up to 16 VMs per GPU card. AMD has recently announced its hardware-based GPU virtualization solution: AMD Multiuser GPU [3], which is based on SR-IOV, can support up to 15 VMs per GPU. However, neither NVIDIA nor AMD provides public information on the technical details.

8 Conclusion and Future Work

gScale addresses the scalability issue of gVirt with a novel sharing scheme. gScale proposes the private shadow GTT for each vGPU instance, which allows vGPU instances to share the part of the global graphics memory space that is only visible to the GPU. A ladder mapping mechanism is introduced to let the CPU directly access the host physical memory space serving the graphics memory without referring to the global graphics memory space. At the same time, the fence memory space pool is reserved from the low graphics memory space to ensure the functionality of fence registers. gScale also implements slot sharing to improve the performance of vGPUs under a high instance density. The evaluation shows that gScale scales well up to 15 vGPU instances in Linux or 12 vGPU instances in Windows, which is 5x and 4x the scalability of gVirt. Moreover, gScale achieves up to 96% of gVirt's performance under a high density of instances.

As for future work, we will focus on optimizing the performance of gScale, especially when gScale hosts a large number of instances with intensive workloads. To exploit the performance improvement of slot sharing, we will design a dynamic deployment policy based on the workloads of the instances.

9 Acknowledgements

We acknowledge our shepherd Nisha Talagala and the anonymous reviewers for their insightful comments. This work was supported by the National Infrastructure Development Program (No. 2013FY111900), the NRF Singapore CREATE Program E2S2, the STCSM International Cooperation Program (No. 14510722600), and the Shanghai Key Laboratory of Scalable Computing and Systems.

References

[1] Alihpc. https://hpc.aliyun.com/product/gpu_bare_/.
[2] Amazon high performance computing cloud using GPU. http://aws.amazon.com/hpc/.
[3] AMD multiuser GPU. http://www.amd.com/en-us/solutions/professional/virtualization.
[4] Intel graphics driver. http://www.x.org/wiki/IntelGraphicsDriver/.
[5] Intel graphics virtualization technology (Intel GVT). https://01.org/zh/igvt-g.
[6] Intel open source HD graphics programmer's reference manual (PRM). https://01.org/linuxgraphics/documentation/hardware-specification-prms.
[7] NVIDIA GRID: Graphics-accelerated virtualization. http://www.nvidia.com/object/grid-technology.html.
[8] Valve ToGL. https://github.com/ValveSoftware/ToGL.
[9] Virgil 3D GPU project. https://virgil3d.github.io.
[10] Wine project. http://winehq.org/.
[11] Xen project release features. http://wiki.xenproject.org/wiki/Xen_Project_Release_Features.
[12] XenGT setup guide. https://github.com/01org/XenGT-Preview-kernel/blob/master/XenGT_Setup_Guide.pdf.
[13] ABRAMSON, D. Intel virtualization technology for directed I/O. Intel Technology Journal 10, 3 (2006), 179–192.
[14] BELLARD, F. QEMU, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track (2005), pp. 41–46.
[15] DONG, Y., DAI, J., HUANG, Z., GUAN, H., TIAN, K., AND JIANG, Y. Towards high-quality I/O virtualization. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference (2009), ACM, p. 12.
[16] DONG, Y., XUE, M., ZHENG, X., WANG, J., QI, Z., AND GUAN, H. Boosting GPU virtualization performance with hybrid shadow page tables. In 2015 USENIX Annual Technical Conference (USENIX ATC 15) (2015), pp. 517–528.
[17] DOWTY, M., AND SUGERMAN, J. GPU virtualization on VMware's hosted I/O architecture. ACM SIGOPS Operating Systems Review 43, 3 (2009), 73–82.
[18] DUATO, J., PENA, A. J., SILLA, F., MAYO, R., AND QUINTANA-ORTÍ, E. S. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In High Performance Computing and Simulation (HPCS), 2010 International Conference on (2010), IEEE, pp. 224–231.
[19] GUPTA, V., GAVRILOVSKA, A., SCHWAN, K., KHARCHE, H., TOLIA, N., TALWAR, V., AND RANGANATHAN, P. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing (2009), ACM, pp. 17–24.
[20] HUMPHREYS, G., ELDRIDGE, M., BUCK, I., STOLL, G., EVERETT, M., AND HANRAHAN, P. WireGL: A scalable graphics system for clusters. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (2001), ACM, pp. 129–140.
[21] HUMPHREYS, G., HOUSTON, M., NG, R., FRANK, R., AHERN, S., KIRCHNER, P. D., AND KLOSOWSKI, J. T. Chromium: A stream-processing framework for interactive rendering on clusters. In ACM Transactions on Graphics (TOG) (2002), vol. 21, ACM, pp. 693–702.
[22] LAGAR-CAVILLA, H. A., TOLIA, N., SATYANARAYANAN, M., AND DE LARA, E. VMM-independent graphics acceleration. In Proceedings of the 3rd International Conference on Virtual Execution Environments (2007), ACM, pp. 33–43.
[23] NEIGER, G., SANTONI, A., LEUNG, F., RODGERS, D., AND UHLIG, R. Intel virtualization technology: Hardware support for efficient processor virtualization. Intel Technology Journal 10, 3 (2006).
[24] PHULL, R., LI, C.-H., RAO, K., CADAMBI, H., AND CHAKRADHAR, S. Interference-driven resource management for GPU-based heterogeneous clusters. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (2012), ACM, pp. 109–120.
[25] PICHAI, B., HSU, L., AND BHATTACHARJEE, A. Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces. ACM SIGPLAN Notices 49, 4 (2014), 743–758.
[26] QI, Z., YAO, J., ZHANG, C., YU, M., YANG, Z., AND GUAN, H. VGRIS: Virtualized GPU resource isolation and scheduling in cloud gaming. ACM Transactions on Architecture and Code Optimization (TACO) 11, 2 (2014), 17.
[27] SHI, L., CHEN, H., SUN, J., AND LI, K. vCUDA: GPU-accelerated high-performance computing in virtual machines. Computers, IEEE Transactions on 61, 6 (2012), 804–816.
[28] SUZUKI, Y., KATO, S., YAMADA, H., AND KONO, K. GPUvm: Why not virtualizing GPUs at the hypervisor? In Proceedings of the 2014 USENIX Annual Technical Conference (2014), USENIX Association, pp. 109–120.
[29] SUZUKI, Y., KATO, S., YAMADA, H., AND KONO, K. GPUvm: GPU virtualization at the hypervisor. Computers, IEEE Transactions on PP, 99 (2015), 1–1.
[30] TIAN, K., DONG, Y., AND COWPERTHWAITE, D. A full GPU virtualization solution with mediated pass-through. In Proc. USENIX ATC (2014).
[31] WALDSPURGER, C. A. Memory resource management in VMware ESX server. ACM SIGOPS Operating Systems Review 36, SI (2002), 181–194.
[32] ZHONG, J., AND HE, B. Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1522–1532.
