Architectural and Runtime Enhancements for Dynamically Controlled Multi-Level Concurrency on GPUs

A Dissertation Presented by Yash Ukidave to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering

Northeastern University
Boston, Massachusetts
December 2015

© Copyright 2015 by Yash Ukidave. All Rights Reserved.

Contents

List of Figures
List of Tables
List of Acronyms
Acknowledgments
Abstract

1 Introduction
  1.1 Types of Concurrency on GPUs
  1.2 Benefits of Multi-level concurrency
  1.3 Challenges in Implementing Multi-Level Concurrency
    1.3.1 Runtime Level Challenges
    1.3.2 Architectural Challenges
    1.3.3 Scheduling Challenges for Multi-Tasked GPUs
  1.4 Contributions of the Thesis
  1.5 Organization of the Thesis

2 Background
  2.1 GPU architecture
    2.1.1 AMD Graphics Core Next (GCN) Architecture
    2.1.2 NVIDIA Kepler Architecture
    2.1.3 Evolution of GPU Architecture
  2.2 The OpenCL Programming Model
    2.2.1 Platform Model
    2.2.2 Memory Model
    2.2.3 Runtime and Driver Model
  2.3 Multi2Sim: Cycle-level GPU Simulator
    2.3.1 Disassembler
    2.3.2 Functional Simulation / Emulation
    2.3.3 Architectural Simulator
    2.3.4 OpenCL Runtime and Driver Stack for Multi2Sim
  2.4 Concurrent Kernel Execution on GPUs
    2.4.1 Support for Concurrent Kernel Execution on GPUs
  2.5 Concurrent Context Execution on GPUs
    2.5.1 Context Concurrency for Cloud Engines and Data Centers
    2.5.2 Key Challenges for Multi-Context Execution

3 Related Work
  3.1 GPU Modeling and Simulation
    3.1.1 Functional simulation of GPUs for compute and graphics applications
    3.1.2 Architectural Simulation of GPUs
    3.1.3 Power and Area Estimation on GPUs
  3.2 Design of Runtime Environments for GPUs
  3.3 Resource Management for Concurrent Execution
    3.3.1 Spatial Partitioning of Compute Resources
    3.3.2 QoS-aware Shared Resource Management
    3.3.3 Workload Scheduling on GPUs
    3.3.4 GPU virtualization for Data Center and Cloud Engines
    3.3.5 Scheduling on Cloud Servers and Datacenter
    3.3.6 Address Space and Memory Management on GPUs

4 Adaptive Spatial Partitioning
  4.1 Mapping Multiple Command Queues to Sub-Devices
  4.2 Design of the Adaptive Spatial Partitioning Mechanism
    4.2.1 Compute Unit Reassignment Procedure
    4.2.2 Case Study for Adaptive Partitioning
  4.3 Workgroup Scheduling Mechanisms and Partitioning Policies
  4.4 Evaluation Methodology
    4.4.1 Platform for Evaluation
    4.4.2 Evaluated Benchmarks
  4.5 Evaluation Results
    4.5.1 Performance Enhancements provided by Multiple Command Queue Mapping
    4.5.2 Effective Utilization of Cache Memory
    4.5.3 Effects of Workgroup Scheduling Mechanisms and Partitioning Policies
    4.5.4 Timeline Describing Load-Balancing Mechanisms for the Adaptive Partitioning
  4.6 Summary

5 Multi-Context Execution
  5.1 Need for Multi-Context Execution
  5.2 Architectural and Runtime Enhancements for Multi-Context Execution
    5.2.1 Extensions in OpenCL Runtime
    5.2.2 Address Space Isolation
  5.3 Resource Management for Efficient Multi-Context Execution
    5.3.1 Transparent Memory Management using CML
    5.3.2 Remapping Compute-Resources between Contexts
  5.4 Firmware Extensions on Command-Processor (CP) for Multi-Context Execution
  5.5 Evaluation Methodology
    5.5.1 Platform for Evaluation
    5.5.2 Applications for Evaluation
  5.6 Evaluation Results
    5.6.1 Speedup and Utilization with Multiple Contexts
    5.6.2 Evaluation of Global Memory Efficiency
    5.6.3 TMM evaluation
    5.6.4 Evaluation of Compute-Unit Remapping Mechanism
    5.6.5 L2 cache Efficiency for Multi-Context Execution
    5.6.6 Firmware characterization and CP selection
    5.6.7 Overhead in Multi-Context Execution
  5.7 Discussion
    5.7.1 Static L2 Cache Partitioning for Multiple Contexts
  5.8 Summary

6 Application-Aware Resource Management for GPUs
  6.1 Need and Opportunity for QoS
  6.2 Virtuoso Defined QoS
    6.2.1 QoS policies
    6.2.2 QoS Metrics
  6.3 Virtuoso Architecture
    6.3.1 QoS Interface: Settings Register (QSR)
    6.3.2 QoS Assignment: Performance Analyzer
    6.3.3 QoS Enforcement: Resource Manager
  6.4 Evaluation Methodology
    6.4.1 Platform for Evaluation
    6.4.2 Applications for Evaluation
    6.4.3 Choosing the QoS Reallocation Interval
  6.5 Virtuoso Evaluation
    6.5.1 Evaluation of QoS Policies
    6.5.2 Impact of Compute Resource Allocation
    6.5.3 Effects of QoS-aware Memory Management
    6.5.4 Case Study: Overall Impact of Virtuoso
    6.5.5 Evaluation of Memory Oversubscription
    6.5.6 Sensitivity Analysis of High-Priority Constraints
    6.5.7 Measuring QoS Success rate
  6.6 Discussion
  6.7 Summary

7 Machine Learning Based Scheduling for GPU Cloud Servers
  7.1 Scheduling Challenges for GPU Cloud Servers and Datacenters
  7.2 GPU Remoting for Cloud Servers and Datacenters
  7.3 Collaborative Filtering for Application Performance Prediction
  7.4 Mystic Architecture
    7.4.1 The Causes of Interference (CoI)
    7.4.2 Stage-1: Initializer and Profile Generator
    7.4.3 Stage-2: Collaborative Filtering (CF) based Prediction
    7.4.4 Stage-3: Interference Aware Scheduler
    7.4.5 Validation
  7.5 Evaluation Methodology
    7.5.1 Server Setup
    7.5.2 Workloads for Evaluation
    7.5.3 Metrics
  7.6 Evaluation Results
    7.6.1 Scheduling Performance
    7.6.2 System Performance Using Mystic
    7.6.3 GPU Utilization
    7.6.4 Quality of Launch Sequence Selection
    7.6.5 Scheduling Decision Quality
    7.6.6 Scheduling Overheads Using Mystic
  7.7 Summary

8 Conclusion
  8.1 Future Work

Bibliography

List of Figures

2.1 Basic architecture of a GPU
2.2 AMD GCN Compute Unit block diagram
2.3 NVIDIA Advanced Stream Multiprocessor (SMX)
2.4 OpenCL Host Device Architecture
2.5 The OpenCL memory model
2.6 The runtime and driver model of OpenCL
2.7 Multi2Sim’s three stage architectural simulation
2.8 Pipeline model of GCN SIMD unit
2.9 Multi2Sim Runtime and Driver Stack for OpenCL
2.10 Compute Stages of CLSurf
2.11 Kernel scheduling for Fermi GPU
2.12 Multiple hardware queues in GPUs
2.13 Active time of data buffers in GPU applications
4.1 High-level model describing multiple command queue mapping
4.2 Flowchart for adaptive partitioning handler
4.3 Procedure to initiate compute unit reallocation between kernels
4.4 Case study to demonstrate our adaptive partitioning mechanism
4.5 Workgroup scheduling mechanisms
4.6 Performance speedup using multiple command queue mapping
4.7 Memory hierarchy of the modeled AMD HD 7970 GPU device
4.8 L2 cache hit rate when using multiple command queue mapping
4.9 Effect of different partitioning policy and workgroup scheduling mechanisms
4.10 Timeline analysis of compute-unit reassignment
4.11 Overhead for CU transfer
5.1 Resource utilization and execution time of Rodinia kernels
5.2 Position of CML and context dispatch in the runtime
5.3 Address space isolation using CID
5.4 Combined host and CP interaction for TMM
5.5 TMM operation, managing three concurrent contexts
5.6 Speedup and resource utilization improvement using multi-context execution
5.7 Compute units utilization using multi-context execution
5.8 Increase in execution time for multi-context execution
5.9 Global memory bandwidth utilization using multi-context execution
5.10 Timeline of global memory usage for contexts from different mixes
5.11 Evaluation of different global memory management mechanisms
5.12 Performance of pinned memory updates over transfer stalling
5.13 Timeline analysis of the compute unit