A Case for High Performance Computing with Virtual Machines

Total Page:16

File Type:pdf, Size:1020Kb

A Case for High Performance Computing with Virtual Machines A Case for High Performance Computing with Virtual Machines Wei Huangy Jiuxing Liuz Bulent Abaliz Dhabaleswar K. Panday y Computer Science and Engineering z IBM T. J. Watson Research Center The Ohio State University 19 Skyline Drive Columbus, OH 43210 Hawthorne, NY 10532 fhuanwei, [email protected] fjl, [email protected] ABSTRACT in the 1960s [9], but are experiencing a resurgence in both Virtual machine (VM) technologies are experiencing a resur- industry and research communities. A VM environment pro- gence in both industry and research communities. VMs of- vides virtualized hardware interfaces to VMs through a Vir- fer many desirable features such as security, ease of man- tual Machine Monitor (VMM) (also called hypervisor). VM agement, OS customization, performance isolation, check- technologies allow running different guest VMs in a phys- pointing, and migration, which can be very beneficial to ical box, with each guest VM possibly running a different the performance and the manageability of high performance guest operating system. They can also provide secure and computing (HPC) applications. However, very few HPC ap- portable environments to meet the demanding requirements plications are currently running in a virtualized environment of computing resources in modern computing systems. due to the performance overhead of virtualization. Further, Recently, network interconnects such as InfiniBand [16], using VMs for HPC also introduces additional challenges Myrinet [24] and Quadrics [31] are emerging, which provide such as management and distribution of OS images. very low latency (less than 5 µs) and very high bandwidth In this paper we present a case for HPC with virtual ma- (multiple Gbps). Due to those characteristics, they are be- chines by introducing a framework which addresses the per- coming strong players in the field of high performance com- formance and management overhead associated with VM- puting (HPC). As evidenced by the Top 500 Supercomputer based computing. Two key ideas in our design are: Virtual list [35], clusters, which are typically built from commodity Machine Monitor (VMM) bypass I/O and scalable VM im- PCs connected through high speed interconnects, have be- age management. VMM-bypass I/O achieves high commu- come the predominant architecture for HPC since the past nication performance for VMs by exploiting the OS-bypass decade. feature of modern high speed interconnects such as Infini- Although originally more focused on resource sharing, cur- Band. Scalable VM image management significantly reduces rent virtual machine technologies provide a wide range of the overhead of distributing and managing VMs in large benefits such as ease of management, system security, per- scale clusters. Our current implementation is based on the formance isolation, checkpoint/restart and live migration. Xen VM environment and InfiniBand. However, many of Cluster-based HPC can take advantage of these desirable our ideas are readily applicable to other VM environments features of virtual machines, which is especially important and high speed interconnects. when ultra-scale clusters are posing additional challenges on We carry out detailed analysis on the performance and performance, scalability, system management, and adminis- management overhead of our VM-based HPC framework. tration of these systems [29, 17]. Our evaluation shows that HPC applications can achieve In spite of these advantages, VM technologies have not almost the same performance as those running in a native, yet been widely adopted in the HPC area. This is due to non-virtualized environment. Therefore, our approach holds the following challenges: promise to bring the benefits of VMs to HPC applications • Virtualization overhead: To ensure system integrity, with very little degradation in performance. the virtual machine monitor (VMM) has to trap and process privileged operations from the guest VMs. This 1. INTRODUCTION overhead is especially visible for I/O virtualization, where the VMM or a privileged host OS has to in- Virtual machine (VM) technologies were first introduced tervene every I/O operation. This added overhead is not favored by HPC applications where communica- tion performance may be critical. Moreover, memory consumption for VM-based environment is also a con- Permission to make digital or hard copies of all or part of this work for cern because a physical box is usually hosting several personal or classroom use is granted without fee provided that copies are guest virtual machines. not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to • Management Efficiency: Though it is possible to uti- republish, to post on servers or to redistribute to lists, requires prior specific lize VMs as static computing environments and run ap- permission and/or a fee. ICS'06 June 28­30, Cairns, Queensland, Australia. plications in pre-configured systems, high performance Copyright 2006 ACM 1­59593­282­8/06/0006 ...$5.00. computing cannot fully benefit from VM technologies unless there exists a management framework which helps to map VMs to physical machines, dynamically With the deployment of large scale clusters for HPC appli- distribute VM OS images to physical machines, boot- cations, management and scalability issues on these clusters up and shutdown VMs with low overhead in cluster are becoming increasingly important. Virtual machines can environments. greatly benefit cluster computing in such systems, especially from the following aspects: In this paper, we take on these challenges and propose a VM-based framework for HPC which addresses various • Ease of management: A system administrator can performance and management issues associated with virtu- view a virtual machine based cluster as consisting of a alization. In this framework, we reduce the overhead of net- set of virtual machine templates and a pool of physi- work I/O virtualization through VMM-bypass I/O [18]. The cal resources [33]. To create the runtime environment concept of VMM-bypass extends the idea of OS-bypass [39, for certain applications, the administrator needs only 38], which takes the shortest path for time critical oper- pick the correct template and instantiate the virtual ations through user-level communication. An example of machines on physical nodes. VMs can be shutdown VMM-bypass I/O was presented in Xen-IB [18], a proto- and brought up much easier than real physical ma- type we developed to virtualize InfiniBand under Xen [6]. chines, which eases the task of system reconfiguration. Bypassing the VMM for time critical communication oper- VMs also provide clean solutions for live migration ations, Xen-IB provides virtualized InfiniBand devices for and checkpoint/restart, which are helpful to deal with Xen virtual machines with near-native performance. Our hardware problems like hardware upgrades and failure, framework also provides the flexibility of using customized which happen frequently in large-scale systems. kernels/OSes for individual HPC applications. It also al- lows building very small VM images which can be managed • Customized OS: Currently most clusters are utiliz- very efficiently. With detailed performance evaluations, we ing general purpose operating systems, such as Linux, demonstrate that high performance computing jobs can run to meet a wide range of requirements from various user as efficiently in our Xen-based cluster as in a native, non- applications. Although researchers have been suggest- virtualized InfiniBand cluster. Although we focus on Infini- ing that light-weight OSes customized for each type Band and Xen, we believe that our framework can be read- of application can potentially gain performance bene- ily extended for other high-speed interconnects and other fits [10], this has not yet been widely adopted because VMMs. To the best of our knowledge, this is the first study of management difficulties. However, with VMs, it is to adopt VM technologies for HPC in modern cluster envi- possible to highly customize the OS and the run-time ronments equipped with high speed interconnects. environment for each application. For example, ker- In summary, the main contributions of our work are: nel configuration, system software parameters, loaded modules, as well as memory and disk space can be • We propose a framework which allows high perfor- changed conveniently. mance computing applications to benefit from the de- sirable features of virtual machines. To demonstrate • System Security: Some applications, such as system the framework, we have developed a prototype system level profiling [27], may require additional kernel ser- using Xen virtual machines on an InfiniBand cluster. vices to run. In a traditional HPC system, this requires either the service to run for all applications, or users to • We describe how the disadvantages of virtual machines, be trusted with privileges for loading additional kernel such as virtualization overhead, memory consumption, modules, which may lead to compromised system in- management issues, etc., can be addressed using cur- tegrity. With VM environments, it is possible to allow rent technologies with our framework. normal users to execute such privileged operations on • We carry out detailed performance evaluations on the VMs. Because the resource integrity for the physical overhead of using virtual machines for high perfor- machine is controlled by the virtual machine monitor mance computing (HPC). This evaluation shows that instead of the guest operating system, faulty or mali- our virtualized InfiniBand cluster is able to deliver al- cious code in guest OS may in the worst case crash a most the same performance for HPC applications as virtual machine, which can be easily recovered. those in a non-virtualized InfiniBand cluster. 3. BACKGROUND The rest of the paper is organized as follows: To further In this section, we provide the background information justify our motivation, we start with more discussion on the benefits of VM for HPC in Section 2. In Section 3, we pro- for our work. Our implementation of VM-based comput- ing is based on the Xen VM environment and InfiniBand.
Recommended publications
  • A Light-Weight Virtual Machine Monitor for Blue Gene/P
    A Light-Weight Virtual Machine Monitor for Blue Gene/P Jan Stoessx{1 Udo Steinbergz1 Volkmar Uhlig{1 Jonathan Appavooy1 Amos Waterlandj1 Jens Kehnex xKarlsruhe Institute of Technology zTechnische Universität Dresden {HStreaming LLC jHarvard School of Engineering and Applied Sciences yBoston University ABSTRACT debugging tools. CNK also supports I/O only via function- In this paper, we present a light-weight, micro{kernel-based shipping to I/O nodes. virtual machine monitor (VMM) for the Blue Gene/P Su- CNK's lightweight kernel model is a good choice for the percomputer. Our VMM comprises a small µ-kernel with current set of BG/P HPC applications, providing low oper- virtualization capabilities and, atop, a user-level VMM com- ating system (OS) noise and focusing on performance, scal- ponent that manages virtual BG/P cores, memory, and in- ability, and extensibility. However, today's HPC application terconnects; we also support running native applications space is beginning to scale out towards Exascale systems of directly atop the µ-kernel. Our design goal is to enable truly global dimensions, spanning companies, institutions, compatibility to standard OSes such as Linux on BG/P via and even countries. The restricted support for standardized virtualization, but to also keep the amount of kernel func- application interfaces of light-weight kernels in general and tionality small enough to facilitate shortening the path to CNK in particular renders porting the sprawling diversity applications and lowering OS noise. of scalable applications to supercomputers more and more a Our prototype implementation successfully virtualizes a bottleneck in the development path of HPC applications.
    [Show full text]
  • Opportunities for Leveraging OS Virtualization in High-End Supercomputing
    Opportunities for Leveraging OS Virtualization in High-End Supercomputing Kevin T. Pedretti Patrick G. Bridges Sandia National Laboratories∗ University of New Mexico Albuquerque, NM Albuquerque, NM [email protected] [email protected] ABSTRACT formance, making modest virtualization performance over- This paper examines potential motivations for incorporating heads viable. In these cases,the increased flexibility that vir- virtualization support in the system software stacks of high- tualization provides can be used to support a wider range end capability supercomputers. We advocate that this will of applications, to enable exascale co-design research and increase the flexibility of these platforms significantly and development, and provide new capabilities that are not pos- enable new capabilities that are not possible with current sible with the fixed software stacks that high-end capability fixed software stacks. Our results indicate that compute, supercomputers use today. virtual memory, and I/O virtualization overheads are low and can be further mitigated by utilizing well-known tech- The remainder of this paper is organized as follows. Sec- niques such as large paging and VMM bypass. Furthermore, tion 2 discusses previous work dealing with the use of virtu- since the addition of virtualization support does not affect alization in HPC. Section 3 discusses several potential areas the performance of applications using the traditional native where platform virtualization could be useful in high-end environment, there is essentially no disadvantage to its ad- supercomputing. Section 4 presents single node native vs. dition. virtual performance results on a modern Intel platform that show that compute, virtual memory, and I/O virtualization 1.
    [Show full text]
  • A Virtual Machine Environment for Real Time Systems Laboratories
    AC 2007-904: A VIRTUAL MACHINE ENVIRONMENT FOR REAL-TIME SYSTEMS LABORATORIES Mukul Shirvaikar, University of Texas-Tyler MUKUL SHIRVAIKAR received the Ph.D. degree in Electrical and Computer Engineering from the University of Tennessee in 1993. He is currently an Associate Professor of Electrical Engineering at the University of Texas at Tyler. He has also held positions at Texas Instruments and the University of West Florida. His research interests include real-time imaging, embedded systems, pattern recognition, and dual-core processor architectures. At the University of Texas he has started a new real-time systems lab using dual-core processor technology. He is also the principal investigator for the “Back-To-Basics” project aimed at engineering student retention. Nikhil Satyala, University of Texas-Tyler NIKHIL SATYALA received the Bachelors degree in Electronics and Communication Engineering from the Jawaharlal Nehru Technological University (JNTU), India in 2004. He is currently pursuing his Masters degree at the University of Texas at Tyler, while working as a research assistant. His research interests include embedded systems, dual-core processor architectures and microprocessors. Page 12.152.1 Page © American Society for Engineering Education, 2007 A Virtual Machine Environment for Real Time Systems Laboratories Abstract The goal of this project was to build a superior environment for a real time system laboratory that would allow users to run Windows and Linux embedded application development tools concurrently on a single computer. These requirements were dictated by real-time system applications which are increasingly being implemented on asymmetric dual-core processors running different operating systems. A real time systems laboratory curriculum based on dual- core architectures has been presented in this forum in the past.2 It was designed for a senior elective course in real time systems at the University of Texas at Tyler that combines lectures along with an integrated lab.
    [Show full text]
  • GT-Virtual-Machines
    College of Design Virtual Machine Setup for CP 6581 Fall 2020 1. You will have access to multiple College of Design virtual machines (VMs). CP 6581 will require VMs with ArcGIS installed, and for class sessions we will start the semester using the K2GPU VM. You should open each VM available to you and check to see if ArcGIS is installed. 2. Here are a few general notes on VMs. a. Because you will be a new user when you first start a VM, you may be prompted to set up Microsoft Explorer, Microsoft Edge, Adobe Creative Suite, or other software. You can close these windows and set up specific software later, if you wish. b. Different VMs allow different degrees of customization. To customize right-click on an empty Desktop area and choose “Personalize”. Change your background to a solid color that’s a distinctive color so you can easily distinguish your virtual desktop from your local computer’s desktop. Some VMs will remember your desktop color, other VMs must be re-set at each login, other VMs won’t allow you to change the background color but may allow you to change a highlight color from the “Colors” item on the left menu just below “Background”. c. Your desktop will look exactly the same as physical computer’s desktop except for the small black rectangle at the middle of the very top of the VM screen. Click on the rectangle and a pop-down horizontal menu of VM options will appear. Here are two useful options. i. Preferences then File Access will allow you to grant VM access to your local computer’s drives and files.
    [Show full text]
  • Virtualizationoverview
    VMWAREW H WHITEI T E PPAPERA P E R Virtualization Overview 1 VMWARE WHITE PAPER Table of Contents Introduction .............................................................................................................................................. 3 Virtualization in a Nutshell ................................................................................................................... 3 Virtualization Approaches .................................................................................................................... 4 Virtualization for Server Consolidation and Containment ........................................................... 7 How Virtualization Complements New-Generation Hardware .................................................. 8 Para-virtualization ................................................................................................................................... 8 VMware’s Virtualization Portfolio ........................................................................................................ 9 Glossary ..................................................................................................................................................... 10 2 VMWARE WHITE PAPER Virtualization Overview Introduction Virtualization in a Nutshell Among the leading business challenges confronting CIOs and Simply put, virtualization is an idea whose time has come. IT managers today are: cost-effective utilization of IT infrastruc- The term virtualization broadly describes the separation
    [Show full text]
  • Exploring Massive Parallel Computation with GPU
    Need for parallelism Graphical Processor Units Gravitational Microlensing Modelling Exploring Massive Parallel Computation with GPU Ian Bond Massey University, Auckland, New Zealand 2011 Sagan Exoplanet Workshop Pasadena, July 25-29 2011 Ian Bond | Microlensing parallelism 1/40 Need for parallelism Graphical Processor Units Gravitational Microlensing Modelling Assumptions/Purpose You are all involved in microlensing modelling and you have (or are working on) your own code this lecture shows how to get started on getting code to run on a GPU then its over to you . Ian Bond | Microlensing parallelism 2/40 Need for parallelism Graphical Processor Units Gravitational Microlensing Modelling Outline 1 Need for parallelism 2 Graphical Processor Units 3 Gravitational Microlensing Modelling Ian Bond | Microlensing parallelism 3/40 Need for parallelism Graphical Processor Units Gravitational Microlensing Modelling Paralel Computing Parallel Computing is use of multiple computers, or computers with multiple internal processors, to solve a problem at a greater computational speed than using a single computer (Wilkinson 2002). How does one achieve parallelism? Ian Bond | Microlensing parallelism 4/40 Need for parallelism Graphical Processor Units Gravitational Microlensing Modelling Grand Challenge Problems A grand challenge problem is one that cannot be solved in a reasonable amount of time with todays computers’ Examples: – Modelling large DNA structures – Global weather forecasting – N body problem (N very large) – brain simulation Has microlensing
    [Show full text]
  • Dcuda: Hardware Supported Overlap of Computation and Communication
    dCUDA: Hardware Supported Overlap of Computation and Communication Tobias Gysi Jeremia Bar¨ Torsten Hoefler Department of Computer Science Department of Computer Science Department of Computer Science ETH Zurich ETH Zurich ETH Zurich [email protected] [email protected] [email protected] Abstract—Over the last decade, CUDA and the underlying utilization of the costly compute and network hardware. To GPU hardware architecture have continuously gained popularity mitigate this problem, application developers can implement in various high-performance computing application domains manual overlap of computation and communication [23], [27]. such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent In particular, there exist various approaches [13], [22] to programming model for GPU clusters. We therefore introduce overlap the communication with the computation on an inner the dCUDA programming model, which implements device- domain that has no inter-node data dependencies. However, side remote memory access with target notification. To hide these code transformations significantly increase code com- instruction pipeline latencies, CUDA programs over-decompose plexity which results in reduced real-world applicability. the problem and over-subscribe the device by running many more threads than there are hardware execution units. Whenever a High-performance system design often involves trading thread stalls, the hardware scheduler immediately proceeds with off sequential performance against parallel throughput. The the execution of another thread ready for execution. This latency architectural difference between host and device processors hiding technique is key to make best use of the available hardware perfectly showcases the two extremes of this design space.
    [Show full text]
  • Virtual Machine Benchmarking Kim-Thomas M¨Oller Diploma Thesis
    Universitat¨ Karlsruhe (TH) Institut fur¨ Betriebs- und Dialogsysteme Lehrstuhl Systemarchitektur Virtual Machine Benchmarking Kim-Thomas Moller¨ Diploma Thesis Advisors: Prof. Dr. Frank Bellosa Joshua LeVasseur 17. April 2007 I hereby declare that this thesis is the result of my own work, and that all informa- tion sources and literature used are indicated in the thesis. I also certify that the work in this thesis has not previously been submitted as part of requirements for a degree. Hiermit erklare¨ ich, die vorliegende Arbeit selbstandig¨ und nur unter Benutzung der angegebenen Literatur und Hilfsmittel angefertigt zu haben. Alle Stellen, die wortlich¨ oder sinngemaߨ aus veroffentlichten¨ und nicht veroffentlichten¨ Schriften entnommen wurden, sind als solche kenntlich gemacht. Die Arbeit hat in gleicher oder ahnlicher¨ Form keiner anderen Prufungsbeh¨ orde¨ vorgelegen. Karlsruhe, den 17. April 2007 Kim-Thomas Moller¨ Abstract The resurgence of system virtualization has provoked diverse virtualization tech- niques targeting different application workloads and requirements. However, a methodology to compare the performance of virtualization techniques at fine gran- ularity has not yet been introduced. VMbench is a novel benchmarking suite that focusses on virtual machine environments. By applying the pre-virtualization ap- proach for hypervisor interoperability, VMbench achieves hypervisor-neutral in- strumentation of virtual machines at the instruction level. Measurements of dif- ferent virtual machine configurations demonstrate how VMbench helps rate and predict virtual machine performance. Kurzfassung Das wiedererwachte Interesse an der Systemvirtualisierung hat verschiedenartige Virtualisierungstechniken fur¨ unterschiedliche Anwendungslasten und Anforde- rungen hervorgebracht. Jedoch wurde bislang noch keine Methodik eingefuhrt,¨ um Virtualisierungstechniken mit hoher Granularitat¨ zu vergleichen. VMbench ist eine neuartige Benchmarking-Suite fur¨ Virtuelle-Maschinen-Umgebungen.
    [Show full text]
  • Distributed Virtual Machines: a System Architecture for Network Computing
    Distributed Virtual Machines: A System Architecture for Network Computing Emin Gün Sirer, Robert Grimm, Arthur J. Gregory, Nathan Anderson, Brian N. Bershad {egs,rgrimm,artjg,nra,bershad}@cs.washington.edu http://kimera.cs.washington.edu Dept. of Computer Science & Engineering University of Washington Seattle, WA 98195-2350 Abstract Modern virtual machines, such as Java and Inferno, are emerging as network computing platforms. While these virtual machines provide higher-level abstractions and more sophisticated services than their predecessors from twenty years ago, their architecture has essentially remained unchanged. State of the art virtual machines are still monolithic, that is, they are comprised of closely-coupled service components, which are thus replicated over all computers in an organization. This crude replication of services forms one of the weakest points in today’s networked systems, as it creates widely acknowledged and well-publicized problems of security, manageability and performance. We have designed and implemented a new system architecture for network computing based on distributed virtual machines. In our system, virtual machine services that perform rule checking and code transformation are factored out of clients and are located in enterprise- wide network servers. The services operate by intercepting application code and modifying it on the fly to provide additional service functionality. This architecture reduces client resource demands and the size of the trusted computing base, establishes physical isolation between virtual machine services and creates a single point of administration. We demonstrate that such a distributed virtual machine architecture can provide substantially better integrity and manageability than a monolithic architecture, scales well with increasing numbers of clients, and does not entail high overhead.
    [Show full text]
  • On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the Gvirtus Framework
    On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework Montella, R., Giunta, G., Laccetti, G., Lapegna, M., Palmieri, C., Ferraro, C., Pelliccia, V., Hong, C-H., Spence, I., & Nikolopoulos, D. (2017). On the Virtualization of CUDA Based GPU Remoting on ARM and X86 Machines in the GVirtuS Framework. International Journal of Parallel Programming, 45(5), 1142-1163. https://doi.org/10.1007/s10766-016-0462-1 Published in: International Journal of Parallel Programming Document Version: Peer reviewed version Queen's University Belfast - Research Portal: Link to publication record in Queen's University Belfast Research Portal Publisher rights © 2016 Springer Verlag. The final publication is available at Springer via http://dx.doi.org/ 10.1007/s10766-016-0462-1 General rights Copyright for the publications made accessible via the Queen's University Belfast Research Portal is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The Research Portal is Queen's institutional repository that provides access to Queen's research output. Every effort has been made to ensure that content in the Research Portal does not infringe any person's rights, or applicable UK laws. If you discover content in the Research Portal that you believe breaches copyright or violates any law, please contact [email protected]. Download date:08. Oct. 2021 Noname manuscript No. (will be inserted by the editor) On the virtualization of CUDA based GPU remoting on ARM and X86 machines in the GVirtuS framework Raffaele Montella · Giulio Giunta · Giuliano Laccetti · Marco Lapegna · Carlo Palmieri · Carmine Ferraro · Valentina Pelliccia · Cheol-Ho Hong · Ivor Spence · Dimitrios S.
    [Show full text]
  • Enabling Technologies for Distributed and Cloud Computing Dr
    Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Technologies for Network-Based Systems Multi-core CPUs and Multithreading Technologies CPU’s today assume a multi-core architecture with dual, quad, six, or more processing cores. The clock rate increased from 10 MHz for Intel 286 to 4 GHz for Pentium 4 in 30 years. However, the clock rate reached its limit on CMOS chips due to power limitations. Clock speeds cannot continue to increase due to excessive heat generation and current leakage. Multi-core CPUs can handle multiple instruction threads. Technologies for Network-Based Systems Multi-core CPUs and Multithreading Technologies LI cache is private to each core, L2 cache is shared and L3 cache or DRAM is off the chip. Examples of multi-core CPUs include Intel i7, Xeon, AMD Opteron. Each core can also be multithreaded. E.g. the Niagara II has 8 cores with each core handling 8 threads for a total of 64 threads maximum. Technologies for Network-Based Systems Hyper-threading (HT) Technology A feature of certain Intel chips (such as Xeon, i7) that makes one physical core appear as two logical processors. On an operating system level, a single-core CPU with Hyper- Threading technology will be reported as two logical processors, dual-core CPU with HT is reported as four logical processors, and so on. HT adds a second set of general, control and special registers. The second set of registers allows the CPU to keep the state of both cores, and effortlessly switch between them by switching the register set.
    [Show full text]
  • HPVM: Heterogeneous Parallel Virtual Machine
    HPVM: Heterogeneous Parallel Virtual Machine Maria Kotsifakou∗ Prakalp Srivastava∗ Matthew D. Sinclair Department of Computer Science Department of Computer Science Department of Computer Science University of Illinois at University of Illinois at University of Illinois at Urbana-Champaign Urbana-Champaign Urbana-Champaign [email protected] [email protected] [email protected] Rakesh Komuravelli Vikram Adve Sarita Adve Qualcomm Technologies Inc. Department of Computer Science Department of Computer Science [email protected]. University of Illinois at University of Illinois at com Urbana-Champaign Urbana-Champaign [email protected] [email protected] Abstract hardware, and that runtime scheduling policies can make We propose a parallel program representation for heteroge- use of both program and runtime information to exploit the neous systems, designed to enable performance portability flexible compilation capabilities. Overall, we conclude that across a wide range of popular parallel hardware, including the HPVM representation is a promising basis for achieving GPUs, vector instruction sets, multicore CPUs and poten- performance portability and for implementing parallelizing tially FPGAs. Our representation, which we call HPVM, is a compilers for heterogeneous parallel systems. hierarchical dataflow graph with shared memory and vector CCS Concepts • Computer systems organization → instructions. HPVM supports three important capabilities for Heterogeneous (hybrid) systems; programming heterogeneous systems: a compiler interme- diate representation (IR), a virtual instruction set (ISA), and Keywords Virtual ISA, Compiler, Parallel IR, Heterogeneous a basis for runtime scheduling; previous systems focus on Systems, GPU, Vector SIMD only one of these capabilities. As a compiler IR, HPVM aims to enable effective code generation and optimization for het- 1 Introduction erogeneous systems.
    [Show full text]