
FlexWAFE - an Architecture for Reconfigurable Image Processing Systems
(German title: FlexWAFE - eine Architektur für rekonfigurierbare-Bildverarbeitungssysteme)

Dissertation approved by the Faculty of Electrical Engineering, Information Technology, Physics of the Technische Universität Carolo-Wilhelmina zu Braunschweig in fulfillment of the requirements for the degree of Doktor-Ingenieur (Dr.-Ing.)

by: Eng. Amilcar do Carmo Lucas
from: Almeirim, Portugal
Submitted on: 19.04.2012
Oral examination on: 23.07.2012
Chair: Prof. Dr.-Ing. Harald Michalik
1st examiner: Prof. Dr.-Ing. Rolf Ernst
2nd examiner: Prof. Dr.-Ing. Holger Blume
2012

Kurzfassung (German abstract, translated)

Recently there has been an increase in demand for high-resolution digital media content in the cinema and television industries. Currently available systems do not meet the requirements, or are too expensive. New hardware systems and new programming techniques are required to meet high-resolution, high-quality image requirements and to reduce costs. The industry is looking for a flexible architecture that runs multiple applications on standard components, with reduced development times. Until now, common practice has been to develop specialized architectures and systems that target a single application. This offers little flexibility and leads to high development costs, because every new application is designed almost from scratch. Our focus was to develop an architecture suited to image processing that has the flexibility to run multiple applications on the same FPGA-based hardware platform. The novelty in our approach is that we reconfigure parts of the architecture at run time, but without the time and constraint penalties of FPGA partial-reconfiguration techniques. The architecture uses a hierarchical control structure that is well suited to parallel processing and enables single-cycle-latency reconfiguration of parts of the processing pipeline. This is achieved using relatively few resources for the distributed control structures. To test the developed architecture, a complex film-grain noise reduction algorithm was implemented on a standard hardware platform developed by Thomson-Grass Valley. The system met all requirements and placed very little load on the hierarchical control structures. There is plenty of headroom for far more complex control requirements. The architecture has been ported to other hardware platforms, and other applications have been implemented as well. Run-time reconfigurability has been a key factor in the success of FlexWAFE.

Abstract

Recently there has been an increase in demand for high-resolution digital media content in both the cinema and television industries. Currently existing equipment does not meet the requirements, or is too costly. New hardware systems and new programming techniques are needed to meet the high-resolution, high-quality image requirements and to reduce costs. The industry seeks a flexible architecture capable of running multiple applications on top of standard off-the-shelf components, with reduced development time. Until now, standard practice has been to develop specialized architectures and systems that target a single application. This offers little flexibility and leads to high development costs, because every new application is designed almost from scratch. Our focus was to develop an architecture that is suited to image stream processing and has the flexibility to run multiple applications on the same FPGA-based hardware platform. The novelty in our approach is that we reconfigure parts of the architecture at run time, but without incurring the time penalty and added constraints of FPGA partial-reconfiguration techniques.
The architecture uses a hierarchical control structure that is well suited to parallel processing, and it allows single-cycle-latency reconfiguration of parts of the processing pipeline. This is achieved using relatively few resources for the distributed control structures. To test the developed architecture, a complex film-grain noise reduction algorithm was implemented on an off-the-shelf hardware platform developed by Thomson-Grass Valley. The system met all the requirements and placed very little load on the hierarchical control structures, which leaves headroom for much more complex control demands. The architecture has been ported to other hardware platforms, and other applications have been implemented as well. Run-time reconfigurability has proven to be a key factor in the success of FlexWAFE.

Acknowledgments

I would like to thank my thesis advisor, Professor Rolf Ernst, for the opportunity and trust he gave me when he offered me the position after a short three-month DAAD internship at his institute. I am very grateful for the motivation, guidance and insightful suggestions he gave me throughout my work. Furthermore, I would like to thank Professor Harald Michalik for chairing the examination committee and Professor Holger Blume for the co-examination.

I would like to express my deepest gratitude to my colleague and friend Dr. Sven Heithecker, who shared an office with me for many years and worked with me on the same project. I learned a lot from him. Our cooperative brainstorming sessions helped shape the FlexWAFE architecture, and he also helped me adapt to a new country. Many thanks also go to my office colleague Henning Sahlbach, who supported me in the final stages of the thesis and built my Dr. hat¹. I would like to thank Dr. Marek Jersak for the trust he placed in me after the short internship, and for his friendship.
The excellent work environment provided by all my colleagues at the institute also played an important role in this thesis. I am thankful for the Kaffee Runde (coffee round) discussions, the institute excursions, the Christmas parties, and the extracurricular hat-building opportunities provided by the institute.

On a more personal note, I would like to thank my loving wife for all the support and encouragement she gave me throughout this thesis.

Most importantly of all, I would like to thank my parents for the patience they showed in raising me, and for the lovely way they found to awaken my technical curiosity. At age three I was already sure I wanted to be an electronics engineer. Thank you.

¹ PhD candidates at IDA receive a big, heavy, personalized, microcontroller-controlled hat with lots of flashing lights and moving parts.

Contents

Kurzfassung
Abstract
Acknowledgments
List of Figures
1. Introduction
1.1. Digital Film Processing
1.1.1. Image Gathering
1.1.2. Post Production
1.1.2.1. Application
1.1.2.2. Techniques
1.1.3. Delivery
1.1.4. Resolutions
1.2. The FlexFilm Project
1.2.1. Usage of FPGAs
1.2.2. Motivation
1.2.3. Project partners and their work packages
1.2.4. Example Application
1.2.4.1. Film Grain Noise
1.2.4.2. The Algorithm
1.2.5. Hardware Architecture
1.2.5.1. FPGAs
1.2.6. Communication Channels
1.2.7. Contribution to FlexFilm
1.3. Thesis Outline
2. Processing platforms
2.1. Processing platforms for digital film processing
2.1.1. Line Dancer
2.1.2. GPU
2.1.3. Cell Processor
2.1.4. Storm-1
2.1.5. FPGA-based processors
2.1.6. FPOA
2.1.7. Software on standard processors
2.2. FPGA programming methodologies
2.2.1. C as input
2.2.2. Matlab/Simulink as input
2.2.3. Hardware description languages
2.3. Summary and Conclusion
3. FlexWAFE
3.1. FlexWAFE Reconfigurable Architecture
3.2. Inter-block Signaling
3.2.1. Data Types
3.3. Data Processing Unit (DPU)
3.3.1. Processing Groups
3.3.2. SIMD-like processing
3.4. Local Memory with Controller (LMC)
3.4.1. Asynchronous FIFOs
3.4.2. Address Stepper
3.4.3. Cascaded Stepper
3.4.4. LMC reorder streams
3.4.5. LMC with external memory
3.4.6. Large FIFO based on external SDRAM memory
3.4.7. Other LMCs
3.5. Custom DDR-SDRAM Memory Controller
3.6. Inter-chip and Inter-board Communication
3.7. Control and Programmability
3.7.1. Local Controller
3.7.2. Algorithm Controller (AC)
3.7.3. Control Bus
3.8. Summary