Gpu: Platform and Programming Georgian-German Science Bridge Qualistartup

Total Page:16

File Type:pdf, Size:1020Kb

Gpu: Platform and Programming Georgian-German Science Bridge Qualistartup GPU: PLATFORM AND PROGRAMMING GEORGIAN-GERMAN SCIENCE BRIDGE QUALISTARTUP 12 September 2019 Andreas Herten Forschungszentrum Jülich Handout Version Member of the Helmholtz Association About, Outline Topics Motivation Platform Hardware Features High Throughput Summary Vendor Comparison Programming GPUs Libraries Jülich Supercomputing Centre About GPU Programming Operation of supercomputers Directives Application support Languages Research Abstraction Libraries/DSL Me: All things GPU Tools Slides: http://bit.ly/ggsb-gpu Member of the Helmholtz Association 12 September 2019 Slide 1 64 Status Quo A short but parallel story 1999 Graphics computation pipeline implemented in dedicated graphics hardware Computations using OpenGL graphics library [2] »GPU« coined by NVIDIA [3] 2001 NVIDIA GeForce 3 with programmable shaders (instead of fixed pipeline) and floating-point support; 2003: DirectX 9 at ATI 2007 CUDA 2009 OpenCL 2019 Top 500: 25 % with NVIDIA GPUs (#1, #2) [4], Green 500: 8 of top 10 with GPUs [5] 2021 Aurora: First (?) US exascale supercomputer based on Intel GPUs Frontier: First (?) US more-than-exascale supercomputer based on AMD GPUs Theoretical Peak Performance, Double Precision 104 MI60 Tesla P100 Tesla V100 Status Quo FirePro W9100 FirePro S9150 A short but parallel story Platinum 9282 Tesla K40 Tesla K20X Tesla K40 Xeon Phi 7290 (KNL) Tesla K20 Platinum 8180 3 10 Xeon Phi 7120 (KNC) HD 6970 HD 6970 HD 5870 HD 8970 HD 7970 GHz Ed. E5-2699 v4 GFLOP/sec MI25 Tesla M2090 E5-2699 v3 E5-2699 v3 HD 4870 Tesla C2050 INTEL Xeon CPUs HD 3870 E5-2697 v2 Tesla C1060 2 E5-2690 10 Tesla C1060 NVIDIA Tesla GPUs AMD Radeon GPUs X5680 X5690 INTEL Xeon Phis X5482 X5492 W5590 2008 2010 2012 2014 2016 2018 End of Year ] 6 Graphic: Rupp [ Theoretical Peak Memory Bandwidth Comparison 103 Tesla V100 Status Quo Tesla P100 MI60 Peak performance double precision FirePro W9100 FirePro S9150 HD 8970 HD 7970 GHz Ed. Xeon PhiMI25 7290 (KNL) Tesla K40 HD 6970 HD 6970 HD 5870 Xeon Phi 7120 (KNC) Tesla K20X HD 4870 Tesla K20 Tesla M2090 Platinum 9282 102 GB/sec Tesla C2050 HD 3870 Platinum 8180 Tesla C1060 Tesla C1060 E5-2699 v4 E5-2699 v3 E5-2699 v3 INTEL Xeon CPUs E5-2697 v2 E5-2690 NVIDIA Tesla GPUs W5590 X5680 X5690 X5482 X5492 AMD Radeon GPUs INTEL Xeon Phis 101 2008 2010 2012 2014 2016 2018 End of Year ] 6 Graphic: Rupp [ Status Quo Peak memory bandwidth JURECA – Jülich’s Multi-Purpose Supercomputer 1872 nodes with Intel Xeon E5 CPUs (2 × 12 cores) 75 nodes with 2 NVIDIA Tesla K80 cards (look like 4 GPUs) JURECA Booster: 1640 nodes with Intel Xeon Phi Knights Landing 1:8 (CPU) + 0:44 (GPU) + 5 (KNL)PFLOP=s peak performance Mellanox EDR InfiniBand Next: Booster! JUWELS – Jülich’s New Scalable System 2500 nodes with Intel Xeon CPUs (2 × 24 cores) 48 nodes with 4 NVIDIA Tesla V100 cards 10:4 (CPU) + 1:6 (GPU)PFLOP=s peak performance Platform CPU vs. GPU A matter of specialties ] 8 ] and Shearings Holidays [ 7 Graphics: Lee [ Transporting one Transporting many Member of the Helmholtz Association 12 September 2019 Slide 6 64 CPU vs. GPU Chip ALU ALU Control ALUALU Cache DRAM DRAM Member of the Helmholtz Association 12 September 2019 Slide 6 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 7 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 7 64 Memory GPU memory ain’t no CPU memory Host Unified Virtual Addressing GPU: accelerator / extension card ! Separate device fromALU CPU ALU SeparateControl memory, but UVA ALUALU PCIe 3 Memory transfers need special consideration! <16 GB=s Do as little as possible! Formerly: Explicitly copy data to/from GPU Now: Done automaticallyCache (performance…?) P100 V100 16 GB RAM,DRAM 720 GB=s 32 GB RAM, 900 GB=s DRAM Device Member of the Helmholtz Association 12 September 2019 Slide 8 64 Memory GPU memory ain’t no CPU memory Host Unified Memory GPU: accelerator / extension card ! Separate device fromALU CPU ALU SeparateControl memory, but UVA and UM ALUALU NVLink Memory transfers need special consideration! ≈80 GB=s Do as little as possible! Formerly: Explicitly copy data to/from GPU Cache Now: Done automatically (performance…?) HBM2 <900 GB=s P100 V100 16 GB RAM,DRAM 720 GB=s 32 GB RAM, 900 GB=s DRAM Device Member of the Helmholtz Association 12 September 2019 Slide 8 64 3 Transfer results back to host memory Processing Flow Scheduler CPU ! GPU ! CPU CPU ::: CPU Memory Interconnect L2 1 Transfer data from CPU memory to GPU memory, transfer program 2 Load GPU program, execute on SMs, get (cached) data from memory; write back DRAM Member of the Helmholtz Association 12 September 2019 Slide 9 64 3 Transfer results back to host memory Processing Flow Scheduler CPU ! GPU ! CPU CPU ::: CPU Memory Interconnect L2 1 Transfer data from CPU memory to GPU memory, transfer program 2 Load GPU program, execute on SMs, get (cached) data from memory; write back DRAM Member of the Helmholtz Association 12 September 2019 Slide 9 64 Processing Flow Scheduler CPU ! GPU ! CPU CPU ::: CPU Memory Interconnect L2 1 Transfer data from CPU memory to GPU memory, transfer program 2 Load GPU program, execute on SMs, get (cached) data from memory; write back DRAM 3 Transfer results back to host memory Member of the Helmholtz Association 12 September 2019 Slide 9 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 10 64 Async Following different streams Problem: Memory transfer is comparably slow Solution: Do something else in meantime (computation)! ! Overlap tasks Copy and compute engines run separately (streams) Copy Compute Copy Compute Copy Compute Copy Compute GPU needs to be fed: Schedule many computations CPU can do other work while GPU computes; synchronization Also: Fast switching of contexts to keep GPU busy (KGB) Member of the Helmholtz Association 12 September 2019 Slide 11 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 12 64 SIMT Vector Of threads and warps A0 B0 C0 A1 B1 C1 + = A2 B2 C2 CPU: A3 B3 C3 Single Instruction, Multiple Data (SIMD) SMT Simultaneous Multithreading (SMT) Thread GPU: Single Instruction, Multiple Threads (SIMT) Core Core CPU core u GPU multiprocessor (SM) Thread Working unit: set of threads (32, a warp) Core Core Fast switching of threads (large register file) ! hide latency Branching if SIMT Member of the Helmholtz Association 12 September 2019 Slide 13 64 SIMT Vector Of threads and warps A0 B0 C0 A1 B1 C1 + = A2 B2 C2 CPU: A3 B3 C3 Single Instruction, Multiple Data (SIMD) ] SMT 9 Simultaneous Multithreading (SMT) Thread GPU: Single Instruction, Multiple Threads (SIMT) Core Core CPU core u GPU multiprocessor (SM) Thread Working unit: set of threads (32, a warp) Core Core Fast switching of threads (large register file) ! hide latency Graphics: Nvidia Corporation [ Branching if SIMT Tesla V100 Member of the Helmholtz Association 12 September 2019 Slide 13 64 Multiprocessor SIMT Vector Of threads and warps A0 B0 C0 A1 B1 C1 + = A2 B2 C2 CPU: A3 B3 C3 Single Instruction, Multiple Data (SIMD) ] SMT 9 Simultaneous Multithreading (SMT) TensorCore Thread GPU: Single Instruction, Multiple Threads (SIMT) Core Core CPU core u GPU multiprocessor (SM) Thread Working unit: set of threads (32, a warp) Core Core Fast switching of threads (large register file) ! hide latency Graphics: Nvidia Corporation [ Branching if SIMT Tesla V100 Member of the Helmholtz Association 12 September 2019 Slide 13 64 New: Tensor Cores New in Volta 8 Tensor Cores per Streaming Multiprocessor (SM) (640 total for V100) Performance: 125 TFLOP=s (half precision) Calculate A × B + C = D (4 × 4 matrices; A, B: half precision) ! 64 floating-point FMA operations per clock (mixed precision) × + = FP16 FP16 FP16 FP32 FP16 FP32 FP32 FP32 FP32 Member of the Helmholtz Association 12 September 2019 Slide 14 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 15 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 15 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 15 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 15 64 ! High Throughput GPU Architecture Overview Aim: Hide Latency Everything else follows SIMT Asynchronicity Memory Member of the Helmholtz Association 12 September 2019 Slide 15 64 Low Latency vs. High Throughput Maybe GPU’s ultimate feature CPU Minimizes latency within each thread GPU Hides latency with computations from other thread warps CPU Core: Low Latency T1 T2 T3 T4 GPU Streaming Multiprocessor: High Throughput W1 Thread/Warp W2 Processing Context Switch W3 Ready Waiting W4 Member of the Helmholtz Association 12 September 2019 Slide 16 64 CPU vs. GPU Let’s summarize this! Optimized for low latency Optimized for high throughput + Large main
Recommended publications
  • AMD Accelerated Parallel Processing Opencl Programming Guide
    AMD Accelerated Parallel Processing OpenCL Programming Guide November 2013 rev2.7 © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Accelerated Parallel Processing, the AMD Accelerated Parallel Processing logo, ATI, the ATI logo, Radeon, FireStream, FirePro, Catalyst, and combinations thereof are trade- marks of Advanced Micro Devices, Inc. Microsoft, Visual Studio, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the U.S. and/or other jurisdic- tions. Other names are for informational purposes only and may be trademarks of their respective owners. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or other- wise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as compo- nents in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or envi- ronmental damage may occur.
    [Show full text]
  • Comparison of Technologies for General-Purpose Computing on Graphics Processing Units
    Master of Science Thesis in Information Coding Department of Electrical Engineering, Linköping University, 2016 Comparison of Technologies for General-Purpose Computing on Graphics Processing Units Torbjörn Sörman Master of Science Thesis in Information Coding Comparison of Technologies for General-Purpose Computing on Graphics Processing Units Torbjörn Sörman LiTH-ISY-EX–16/4923–SE Supervisor: Robert Forchheimer isy, Linköpings universitet Åsa Detterfelt MindRoad AB Examiner: Ingemar Ragnemalm isy, Linköpings universitet Organisatorisk avdelning Department of Electrical Engineering Linköping University SE-581 83 Linköping, Sweden Copyright © 2016 Torbjörn Sörman Abstract The computational capacity of graphics cards for general-purpose computing have progressed fast over the last decade. A major reason is computational heavy computer games, where standard of performance and high quality graphics con- stantly rise. Another reason is better suitable technologies for programming the graphics cards. Combined, the product is high raw performance devices and means to access that performance. This thesis investigates some of the current technologies for general-purpose computing on graphics processing units. Tech- nologies are primarily compared by means of benchmarking performance and secondarily by factors concerning programming and implementation. The choice of technology can have a large impact on performance. The benchmark applica- tion found the difference in execution time of the fastest technology, CUDA, com- pared to the slowest, OpenCL, to be twice a factor of two. The benchmark applica- tion also found out that the older technologies, OpenGL and DirectX, are compet- itive with CUDA and OpenCL in terms of resulting raw performance. iii Acknowledgments I would like to thank Åsa Detterfelt for the opportunity to make this thesis work at MindRoad AB.
    [Show full text]
  • 3D Animation
    Contents Zoom In Zoom Out For navigation instructions please click here Search Issue Next Page ComputerINNOVATIONS IN VISUAL COMPUTING FOR THE GLOBAL DCC COMMUNITY June 2007 www.cgw.com WORLD Making Waves Digital artists create ‘pretend spontaneity’ in the documentary-style animation Surf’s Up $4.95 USA $6.50 Canada Contents Zoom In Zoom Out For navigation instructions please click here Search Issue Next Page A CW Previous Page Contents Zoom In Zoom Out Front Cover Search Issue Next Page BEF MaGS _____________________________________________________ A CW Previous Page Contents Zoom In Zoom Out Front Cover Search Issue Next Page BEF MaGS A CW Previous Page Contents Zoom In Zoom Out Front Cover Search Issue Next Page BEF MaGS June 2007 • Volume 30 • Number 6 INNOVATIONS IN VISUAL COMPUTING FOR THE GLOBAL DCC COMMUNITY Also see www.cgw.com for computer graphics news, special surveys and reports, and the online gallery. ____________ » Director Luc Besson discusses Computer WORLD his black-and-white fi lm, WORLD Post Angel-A. » Trends in broadcast design. » Getting the most out of canned music and sound. See it in www.postmagazine.com Features Cover story Radical, Dude 12 3D ANIMATION | In one of the most unusual animated features to hit the Departments screen, Surf’s Up incorporates a documentary fi lming style into the Editor’s Note 2 CG medium. Triple the Fun Summer blockbusters are making their By Barbara Robertson debut at theaters, and this year, it is Wrangling Waves 18 apparent that three’s a charm, as ani- 3D ANIMATION | The visual effects mators upped the graphics ante in 12 supervisor on Surf’s Up takes us on an Spider-Man 3, Shrek 3, and At World’s incredible behind-the-scenes journey End.
    [Show full text]
  • Graviton: Trusted Execution Environments on Gpus
    In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI’18) Graviton: Trusted Execution Environments on GPUs Stavros Volos Kapil Vaswani Rodrigo Bruno Microsoft Research Microsoft Research INESC-ID / IST, University of Lisbon Abstract tors. This limitation gives rise to an undesirable trade-off between security and performance. We propose Graviton, an architecture for supporting There are several reasons why adding TEE support trusted execution environments on GPUs. Graviton en- to accelerators is challenging. With most accelerators, ables applications to offload security- and performance- a device driver is responsible for managing device re- sensitive kernels and data to a GPU, and execute kernels sources (e.g., device memory) and has complete control in isolation from other code running on the GPU and all over the device. Furthermore, high-throughput accelera- software on the host, including the device driver, the op- tors (e.g., GPUs) achieve high performance by integrat- erating system, and the hypervisor. Graviton can be in- ing a large number of cores, and using high bandwidth tegrated into existing GPUs with relatively low hardware memory to satisfy their massive bandwidth requirements complexity; all changes are restricted to peripheral com- [4, 11]. Any major change in the cores, memory man- ponents, such as the GPU’s command processor, with agement unit, or the memory controller can result in no changes to existing CPUs, GPU cores, or the GPU’s unacceptably large overheads. For instance, providing MMU and memory controller. We also propose exten- memory confidentiality and integrity via an encryption sions to the CUDA runtime for securely copying data engine and Merkle tree will significantly impact avail- and executing kernels on the GPU.
    [Show full text]
  • Graviton: Trusted Execution Environments on Gpus
    Graviton: Trusted Execution Environments on GPUs Stavros Volos and Kapil Vaswani, Microsoft Research; Rodrigo Bruno, INESC-ID / IST, University of Lisbon https://www.usenix.org/conference/osdi18/presentation/volos This paper is included in the Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’18). October 8–10, 2018 • Carlsbad, CA, USA ISBN 978-1-939133-08-3 Open access to the Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. Graviton: Trusted Execution Environments on GPUs Stavros Volos Kapil Vaswani Rodrigo Bruno Microsoft Research Microsoft Research INESC-ID / IST, University of Lisbon Abstract tors. This limitation gives rise to an undesirable trade-off between security and performance. We propose Graviton, an architecture for supporting There are several reasons why adding TEE support trusted execution environments on GPUs. Graviton en- to accelerators is challenging. With most accelerators, ables applications to offload security- and performance- a device driver is responsible for managing device re- sensitive kernels and data to a GPU, and execute kernels sources (e.g., device memory) and has complete control in isolation from other code running on the GPU and all over the device. Furthermore, high-throughput accelera- software on the host, including the device driver, the op- tors (e.g., GPUs) achieve high performance by integrat- erating system, and the hypervisor. Graviton can be in- ing a large number of cores, and using high bandwidth tegrated into existing GPUs with relatively low hardware memory to satisfy their massive bandwidth requirements complexity; all changes are restricted to peripheral com- [4, 11].
    [Show full text]
  • Flexwafe - an Architecture For
    FlexWAFE - an Architecture for Reconfigurable Image Processing Systems FlexWAFE - eine Architektur für rekonfigurierbare-Bildverarbeitungssysteme Von der Fakultät für Elektrotechnik, Informationstechnik, Physik der Technischen Universität Carolo-Wilhelmina zu Braunschweig zur Erlangung der Würde eines Doktor-Ingenieurs (Dr.-Ing.) genehmigte Dissertation von: Eng. Amilcar do Carmo Lucas aus: Almeirim, Portugal eingereicht am: 19.04.2012 mündliche Prüfung am: 23.07.2012 Vorsitzender: Prof. Dr.-Ing. Harald Michalik 1. Referent: Prof. Dr.-Ing. Rolf Ernst 2. Referent: Prof. Dr.-Ing. Holger Blume 2012 Kurzfassung Kürzlich gab es eine Zunahme der Nachfrage nach hochauflösenden digitalen Medieninhalten in den Kino- und Fernsehenindustrien. Derzeit vorhandene Systeme entsprechen nicht den Anforderungen, oder sind zu teuer. Neue Hardware-Systeme und neuer Programmiertechniken sind erforderlich, um den hochauflösenden, hochwertigen, Bildanforderungen zu genügen und Kosten zu verringern. Die Industrie sucht eine flexible Architektur zur Ausführung mehrerer Anwendungen auf Standard-Komponenten, mit reduzierten Entwicklungszeiten. Bis jetzt ist gängige Praxis, spezialisierten Architektur und Systeme zu entwickeln, die eine einzelne Anwendung zielen. Dieses hat wenig Flexibilität und führt zu hohe Entwicklungs- kosten, jede neue Anwendung ist fast von Grund auf neu konzipiert. Unser Fokus war es, eine für Bild Verarbeitung geeignet Architektur zu entwickeln dass die Flexibilität hat mehrere Anwendungen an dieselbe FPGA-basierte Hardware-Plattform zu laufen. Die Neuheit in unserem Ansatz ist, dass wir Teile der Architektur zur Laufzeit rekon- figurieren, aber, ohne das Zeit und constraints strafe von FPGA Partielle-Rekonfiguration- Techniken. Die Architektur verwendet eine hierarchische Kontrollstruktur, die zur paral- lel Verarbeitung gut geeignet ist, und Single-Cycle-Latenz Rekonfiguration von Teilen der Verarbeitungs-Pipeline ermöglicht. Dieses wird unter Verwendung relativ weniger Ressour- cen für die verteiltes Steuerung Strukturen erzielt.
    [Show full text]
  • Changelog for All of You Interested in the Changes Between Specific Driver Releases Please See the Details Below
    Changelog For all of you interested in the changes between specific driver releases please see the details below. Details for older releases of mXDriver are not contained in this changelog. If you wanna now the details AMD made in even more detail you are free to look at their website. They also show Known Issues which will not be shown here. mXDriver 17.6.1 updated mXDriver to Radeon Crimson ReLive Edition 17.6.1 improved support for Prey® and DiRT 4™ Fixed Issues o Virtual Super Resolution may fail to enable on some Radeon RX 400 and Radeon RX 500 series graphics products. o HDR may fail to enable on some displays for QHD or higher resolutions. o Flickering may be observed on some Radeon RX 500 series products when using HDMI® with QHD high refresh rate displays. o AMD XConnect: systems with Modern Standby enabled may experience a system hang after resuming from hibernation. o Fast mouse movement may cause an FPS drop or stutter in Prey® when running in Multi GPU system configurations. o Adjusting memory clocks in some third party overclocking applications may cause a hang on Radeon R9 390 Series products. o Graphics memory clock may fluctuate causing inconsistent frame rates while gaming when using AMD FreeSync technology. o Mass Effect™: Andromeda may experience stutter or hitching in Multi GPU system configurations. o The GPU Scaling feature in Radeon Settings may fail to enable for some applications. o Error message “Radeon Additional Settings: Host application has stopped working” pop up will sometimes appear when hot plugging displays with Radeon Settings open.
    [Show full text]
  • Porting and Maintaining a Developing Optix Path Tracer for Non-NVIDIA Devices
    Master Thesis Porting and Maintaining a Developing OptiX Path Tracer for Non-NVIDIA Devices Supervisors: Dr. ing. Jacco Bikker, Dr. Zerrin Yumak Satwiko Wirawan Indrawanto ICA-6201539 Faculty of Science Department of Informatics and Computing Science 13 August 2019 Contents 1. Introduction ............................................................................................................................................ 4 1.1 Objectives ..................................................................................................................................... 4 1.2 Research Questions ...................................................................................................................... 4 1.3 Content Overview ........................................................................................................................ 4 2. Preliminaries .......................................................................................................................................... 6 2.1 Rendering ..................................................................................................................................... 6 2.1.1 Rendering Equation ................................................................................................................. 6 2.1.2 Path Tracing ............................................................................................................................. 7 2.1.3 Performance ............................................................................................................................
    [Show full text]
  • HP COMMERCIAL CLIENT SOLUTIONS POWERED by AMD a Resource Guide for Cios and IT Professionals
    HP COMMERCIAL CLIENT SOLUTIONS POWERED BY AMD A resource guide for CIOs and IT professionals www.amd.com OUR HERITAGE 2016 2018 World's first $199 First 7nm 2011 VR GPU solution datacenter GPU World's first APU's 2000 2006 First to break First to break teraFLOPS the historic performance barrier 1GHz barrier AMD Ryzen Inside every major Technology gaming console 2017 2013 World's first x 86-64 First to break 1GHz bit architecture GPU barrier 2003 2009 HP & AMD ARE DESIGNED FOR SUCCESS RELIABILITY QUALITY COMPATABILITY VALUE HP & AMD have worked HP AMD Elite desktops Series-agnostic chassis Saving on your PC together consistently & notebooks are built design means all enables you to further for 15 years and continue from the ground up with AMD-based desktop upgrade your experi- to strengthen the port- the same rigorous test- products have complete ence. folio of solutions with ing and attention to compatibility with HP’s gen/genproductivity detail as the rest of the accessories, including improvements in perfor- HP Elite products. This our robust Desktop Mini mance and reliability. includes HP’s ecosystem. Elite-grade security and management support. Boldly Confident in Recommending AMD-Based Products. THE BALANCE OF “ZEN” The world’s top companies trust in AMD. Innovation is AMD Ryzen™ PRO Processors with Radeon™ VegaOer at the core of everything we do and products like the powerful and ecient enterprise perfomance along- AMD Ryzen™ PRO processors demonstrate our drive side7 Graphics state-of-the-art security features, man- to push technology forward with new architecture ageability, and reliability on all processor models.
    [Show full text]
  • AMD APP SDK Opencl Optimization Guide
    AMD APP SDK OpenCL Optimization Guide August 2015 rev1.0 © 2015 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Accelerated Parallel Processing, the AMD Accelerated Parallel Processing logo, ATI, the ATI logo, Radeon, FireStream, FirePro, Catalyst, and combinations thereof are trade- marks of Advanced Micro Devices, Inc. Microsoft, Visual Studio, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the U.S. and/or other jurisdic- tions. Other names are for informational purposes only and may be trademarks of their respective owners. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or other- wise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as compo- nents in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or envi- ronmental damage may occur.
    [Show full text]
  • Readthedocs-Breathe Documentation Release 1.0.0
    ReadTheDocs-Breathe Documentation Release 1.0.0 Thomas Edvalson Sep 12, 2019 CONTENTS 1 Going to 11: Amping Up the Programming-Language Run-Time Foundation3 2 Solid Compilation Foundation and Language Support5 2.1 Quick Start Guide............................................5 2.1.1 Current Release Notes.....................................5 2.1.2 Installation Guide........................................5 2.1.3 Programming Guide......................................6 2.1.4 ROCm GPU Tunning Guides..................................6 2.1.5 GCN ISA Manuals.......................................7 2.1.6 ROCm API References.....................................7 2.1.7 ROCm Tools..........................................8 2.1.8 ROCm Libraries........................................9 2.1.9 ROCm Compiler SDK..................................... 10 2.1.10 ROCm System Management.................................. 11 2.1.11 ROCm Virtualization & Containers.............................. 11 2.1.12 Remote Device Programming................................. 11 2.1.13 Deep Learning on ROCm.................................... 12 2.1.14 System Level Debug...................................... 12 2.1.15 Tutorial............................................. 12 2.1.16 ROCm Glossary......................................... 12 2.2 Current Release Notes.......................................... 12 2.2.1 Hotfix release ROCm 2.7.1................................... 12 2.2.1.1 Defect fixed in ROCm 2.7.1.............................. 12 2.2.1.2 Upgrading from ROCm 2.7
    [Show full text]
  • AMD Unveils RDNA 2-Based Mobile Graphics, New AMD Advantage Laptops, Broadly Compatible Upscaling Technology and More at Computex 2021
    May 31, 2021 AMD Unveils RDNA 2-Based Mobile Graphics, New AMD Advantage Laptops, Broadly Compatible Upscaling Technology and More at Computex 2021 – AMD Radeon RX 6000M Series Mobile Graphics provide a generational performance leap of up to 1.5X, powering the next generation of premium gaming laptops from ASUS, HP, Lenovo, MSI and other leading OEMs – – Open-source, cross-platform AMD FidelityFX Super Resolution leverages optimized spatial upscaling technology, delivering up to 2.5X higher performance than native resolution gaming in select titles – TAIPEI, Taiwan, May 31, 2021 (GLOBE NEWSWIRE) -- Today at Computex 2021 AMD (NASDAQ: AMD) introduced several powerful new solutions that take high-performance gaming to new levels. Designed to bring world-class performance, incredible visual fidelity and immersive experiences to gaming laptops, the new AMD Radeon™ RX 6000M Series Mobile Graphics include the top-of-stack Radeon RX 6800M – the fastest AMD Radeon GPU for laptops1, delivering desktop-class performance2 to power ultra-high frame rate 1440p gaming anywhere. AMD also introduced the AMD Advantage™ Design Framework, the result of a multi-year collaboration between AMD and its global PC partners to deliver the next generation of premium, high-performance gaming laptops. Combining AMD Radeon RX 6000M Series Mobile Graphics, AMD Radeon Software and AMD Ryzen™ 5000 Series Mobile Processors with exclusive AMD smart technologies and other advanced system design characteristics, AMD Advantage systems are designed to deliver best-in-class gaming experiences. The first AMD Advantage laptops are expected to be available from leading OEMs beginning this month. In addition, AMD unveiled AMD FidelityFX Super Resolution (FSR), a cutting-edge spatial upscaling technology designed to boost framerates up to 2.5X in select titles at 4K resolution3 and deliver a high-quality, high-resolution gaming experience.
    [Show full text]