Contributions of Hybrid Architectures to Depth Imaging: a CPU, APU and GPU Comparative Study

Issam Said

To cite this version: Issam Said. Contributions of hybrid architectures to depth imaging: a CPU, APU and GPU comparative study. Hardware Architecture [cs.AR]. Université Pierre et Marie Curie - Paris VI, 2015. English. NNT: 2015PA066531. tel-01248522v2

HAL Id: tel-01248522
https://tel.archives-ouvertes.fr/tel-01248522v2
Submitted on 20 May 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

DOCTORAL THESIS OF THE UNIVERSITÉ PIERRE ET MARIE CURIE
Specialty: Computer Science
Doctoral school: Informatique, Télécommunications et Électronique (Paris)

Presented and publicly defended by Issam SAID to obtain the degree of DOCTOR OF SCIENCE of the UNIVERSITÉ PIERRE ET MARIE CURIE

Apports des architectures hybrides à l'imagerie profondeur : étude comparative entre CPU, APU et GPU
(Contributions of hybrid architectures to depth imaging: a CPU, APU and GPU comparative study)

Thesis supervised by Jean-Luc Lamotte and Pierre Fortin, defended on Monday, 21 December 2015, after review by the referees:
M. François Bodin, Professor, Université de Rennes 1
M. Christophe Calvin, Project leader, CEA

before the jury composed of:
M. François Bodin, Professor, Université de Rennes 1
M. Henri Calandra, Expert in depth imaging and high performance computing, Total
M. Christophe Calvin, Project leader, CEA
M. Pierre Fortin, Associate Professor (Maître de conférences), Université Pierre et Marie Curie
M. Lionel Lacassagne, Professor, Université Pierre et Marie Curie
M. Jean-Luc Lamotte, Professor, Université Pierre et Marie Curie
M. Mike Mantor, Senior Fellow Architect, AMD
M. Stéphane Vialle, Professor, SUPELEC, Metz campus

"If I have seen further it is by standing on the shoulders of Giants."
Isaac Newton

Acknowledgements

Foremost, I would like to express my deepest thanks to my two thesis supervisors, Prof. Jean-Luc Lamotte and Dr. Pierre Fortin. Their patience, encouragement, and immense knowledge were key motivations throughout my Ph.D. I thank them for having instructed me, for having guided me in writing this thesis, and for having steered me to the end. I thank them for having cheerfully assisted me in moments of frustration and doubt.

I would like to gratefully thank Henri Calandra for his trust, valuable recommendations, insights and useful discussions. He accompanied me throughout this adventure, and I look forward to the next one.

I have been privileged and honored to have had my work reviewed by Prof. François Bodin and Dr. Christophe Calvin. I am indebted to them for having taken time out of their busy schedules to go through this dissertation, to give insightful comments, and to correct the ideas it shares. I also thank the other committee members, Prof. Lionel Lacassagne, M. Mike Mantor and Prof. Stéphane Vialle, for having accepted to examine this dissertation and to give valuable feedback.

This work was a close partnership between Total, AMD, CAPS and LIP6. I would like to extend my gratitude to Total for funding this project. AMD is acknowledged for providing the hardware that was at the heart of this work. CAPS is kindly acknowledged for its technical support and sophisticated skills.

Immeasurable appreciation and deepest gratitude for their help and support are extended to the following persons, who, in one way or another, contributed to making this study possible.
Bruno Conche, for his endless support and logistical help, and more importantly for the energy he did not spare in putting this project together; without him it would not have been a success. Bruno Stefanizzi, for having pointed me in the right direction whenever I had questions or needed support from AMD. Laurent Morichetti, Joshua Mora and Robert Engel, each of whom made every one of my trips to the AMD Sunnyvale campus a pleasant experience; I thank them for sharing their experience with me and for finding answers to all of my questions. Greg Stoner and Gregory Rodgers, for the helpful discussions and the valuable information about the AMD hardware and software stack roadmaps. Terrence Liao and Rached Abdelkhalek, for their precious advice and for the brainstorming sessions during my trips to Pau, France. Harry Zong, Jing Wen, Donny Cooper, Matthew Bettinger and Russell Jones, for their precious help in setting up a remote work environment. Romain Dolbeau, for having put his rich technical expertise at my disposal. My fellow colleagues in the PEQUAN team, with whom I shared memorable moments throughout this experience. The administrative staff of the PEQUAN team is also kindly acknowledged for having taken care of my professional trips.

On a more personal note, words cannot express my gratitude to my parents, Khalifa and Moufida, the reason for who I am today: thank you for your great support and continuous care. I profusely thank my brothers Bilel and Zied for being there for me no matter what. Asma, thank you for being supportive and for the great moments we spent together discussing this project. Rached, thank you for your hospitality, your time, the trips, and the laughs and fears we lived through together. Saber, thank you immensely for your wise, thoughtful advice and for the numerous funny moments we spent together.
I thank my cousins and friends (Khaled, Yasser, Rafik, Haithem, Fethi, Mohammed, Elkhal, John, Erick, Nefili, Layla, Justyna, Chris, Daniel, Jeaven, Binomi, Sahma, Saif, Sana, Sari, Hanen, and the list goes on) for the precious moments of joy that were much needed during this journey. Moktar and Souad, I owe you a deep sense of gratitude for your unconditional love and attention. Ludmila, Igor, Anna, Beji, and Lorita, thank you for your never-ending support, and more importantly for the initiation to the Russian "banya". To my wonderful wife Tatjana, whose sacrificial care, quiet patience, tolerance of my occasional foul moods, and unwavering love made it possible for me to finish this work, I express my genuine appreciation. Finally, I thank my son Arsen, a treasure from the Lord given to my wife and me in the middle of this adventure, who has been the source of my inspiration and of my greatest happiness.

Contents

1 Introduction
I State of the art
2 Geophysics and seismic applications
  2.1 Introduction to seismic exploration
    2.1.1 Seismic acquisition
    2.1.2 Seismic processing
    2.1.3 Seismic interpretation
  2.2 Seismic migrations and Reverse Time Migration (RTM)
    2.2.1 Description and overview of migration methods
    2.2.2 Reverse Time Migration
  2.3 Numerical methods for the wave propagation phenomena
    2.3.1 The wave equation
      2.3.1.1 Seismic waves and propagation media
      2.3.1.2 The elastic wave equation
      2.3.1.3 The acoustic wave equation
    2.3.2 Numerical methods for wave propagation
      2.3.2.1 Integral methods
      2.3.2.2 Asymptotic methods
      2.3.2.3 Direct methods
        2.3.2.3.1 Pseudo-Spectral Methods
        2.3.2.3.2 Finite Difference Methods
        2.3.2.3.3 Finite Element Methods
    2.3.3 Application to the acoustic wave equation
      2.3.3.1 Numerical approximation
      2.3.3.2 Stability analysis and CFL
      2.3.3.3 Boundary conditions
3 High performance computing
  3.1 Overview of HPC hardware architectures
    3.1.1 Central Processing Unit: more and more cores
    3.1.2 Hardware accelerators: the other chips for computing
    3.1.3 Towards the fusion of CPUs and accelerators: the emergence of the Accelerated Processing Unit
  3.2 Programming models in HPC
    3.2.1 Dedicated programming languages for HPC
      3.2.1.1 Overview
      3.2.1.2 The OpenCL programming model
    3.2.2 Directive-based compilers and language extensions
  3.3 Power consumption in HPC and the power wall
4 Overview of accelerated seismic applications
  4.1 Stencil computations
  4.2 Reverse time migration
    4.2.1 Evolution of RTM algorithms
    4.2.2 Wave-field reconstruction methods
      4.2.2.1 Re-computation of the forward wavefield
      4.2.2.2 Storing all the forward wavefield
      4.2.2.3 Selective wavefield storage (linear checkpointing)
      4.2.2.4 Checkpointing
      4.2.2.5 Boundaries storage
      4.2.2.6 Random boundary condition
    4.2.3 RTM on multi-cores and hardware accelerators
      4.2.3.1 RTM on multi-core CPUs
      4.2.3.2 RTM on GPUs
      4.2.3.3 RTM on other accelerators
  4.3 Close to seismics workflows
5 Thesis position and contributions
  5.1 Position of the study
  5.2 Contributions
  5.3 Hardware and seismic material configurations
    5.3.1 The hardware configuration
    5.3.2 The numerical configurations of the seismic materials
      5.3.2.1 The seismic source
      5.3.2.2 The velocity model and the compute grids