A New Temperature Distribution Measurement Method on GPU

Total Page:16

File Type:pdf, Size:1020Kb

A New Temperature Distribution Measurement Method on GPU A New Temperature Distribution Measurement Method on GPU Architectures Using Thermocouples† Aniruddha Dasgupta1, Sunpyo Hong1, Hyesoon Kim2 and Jinil Park3* 1 School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA 2School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA 3School of Mechanical Engineering, Ajou University, Suwon, 443-749, Korea ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Abstract In recent years, the many-core architecture has seen a rapid increase in the number of on-chip cores with a much slower increase in die area. This has led to very high power densities in the chip. Hence, in addition to power, temperature has become a first-order design con- straint for high-performance architectures. However, measuring temperature is very limited to on-chip temperature sensors, which might not always be available to researchers. In this paper, we propose a new temperature-measurement system using thermocouples for many-core GPU architectures and devise a new method to control GPU scheduling. This system gives us a temperature distribution heatmap of the chip. In addition to monitoring temperature distribution, our system also does run-time power consumption monitoring. The results show that there is a strong co- relation between the on-chip heatmap patterns and power consumption. Furthermore, we provide actual experimental results that show the relationship between TPC utilizations and their active locations that reduce temperature and power consumption. Keywords: Temperature Measurement; Many-core Architecture; GPU; Thermocouple ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- must be removed, which could interfere with the natural heat 1. Introduction distribution from a heatsink. Also, measurements performed The number of cores inside a chip is increasing dramatically through such a setup typically require some adjustments to the in today’s processors. For example, NVIDIA’s GTX280 has measured data, so as to accurately represent ideal measure- 30 streaming multiprocessors with 240 CUDA cores, and ments under actual working conditions of the processor (i.e., NVIDIA Fermi GPUs have 512 CUDA cores. On the multi- with a heatsink cooling solution). Thus, there is an opportunity core front, the latest AMD processors have 12 cores. This high for inaccuracies to creep in due to the nature of the modeling. number of cores puts a lot of pressure on designing effective Hence, we propose a new cost-effective temperature- power- and temperature-controlled architectures. Moreover, measurement system that uses thermocouples for the first time the work by Mesa-Martinez et al. [10] showed that tempera- for GPU architectures. We devised a method to install ther- ture is becoming a dominant factor for the determining per- mocouples between a chip and a heatsink. With this system, formance, reliability, and leakage power consumption of we successfully measured the on-chip temperature distribution modern processors. of a GPU processor. Thermocouples provide two benefits over In this paper, we use GPUs as a form of many-core proces- IR cameras. First, they are very low cost and relatively easy to sor. With GPUs, it is possible to validate that temperature- install, even in academia, without special expensive equip- aware thread scheduling can actually reduce power consump- ment. Second, a heatsink can still be placed, so we can meas- tion. Unfortunately, unlike the state-of-the-art multicores, the ure power and temperature simultaneously. Then, we demon- current GPUs do not provide temperature sensors for each strate the need for thermal-aware scheduling algorithms based individual core. Usually, a board-level temperature sensor is on the correlation between the on-chip heatmap and power provided. However, it cannot account for the rampant temper- consumption. ature variations across the chip due to hotspots. Hence, in this paper, we propose a new temperature-measurement system 2. Background and Related Work that allows us to measure the temperature map, while also In this section, we discuss previous chip temperature- measuring the total power consumption. measurement systems and provide a brief background of the Some efforts in academia have focused on measuring tem- evaluated GPU system. perature using infrared (IR) cameras [11] (Although industries have better ways of measuring temperature, typically that 2.1 Chip Temperature Characterization Methods information is not disclosed to the public). IR cameras provide Chip temperature characterization methods can be classified an entire temperature distribution, but set-up cost is very high, into two main branches: 1) modeling methods, and 2) meas- and they require special oil cooling. In other words, a heatsink urement methods. 2.1.1 Modeling Methods IR-based Measurement: Infrared-based thermal imaging has gained popularity as a robust method of characterizing Temperature-modeling methods are mainly relevant to de- thermal behavior [11]. It provides good resolution and accura- sign time thermal characterization. They provide designers cy both in time and space. As such, it has been used in study- with the freedom to try out new designs and perform simula- ing dynamic thermal management techniques. Its external tions. Also, such thermal models can be plugged into microar- nature also helps in making decisions regarding the placement chitecture simulators to see the effect of changing microarchi- location of thermal sensors on the chip at temperature-critical tectural parameters on temperature or the effect of running portions. However, there are a few limitations of using IR different benchmarks. One of the most popular thermal mod- imaging, some of which have already been pointed out by els is HotSpot [9]. Based on the duality of heat transfer and Huang et al. [7]; IR rays cannot pass through metal. Generally, electricity, the authors have modeled various microarchitec- processors are encompassed with a metallic heat dissipation ture components into equivalent thermal resistances and ca- solution like a heatsink. So, for IR imaging to work, the pacitances. HotSpot can also be used to model a particular heatsink needs to be removed and an alternate cooling solution thermal package for the chip and to observe its thermal char- needs to be provided. One of the prevalent methods in this acteristics. By plugging the HotSpot thermal model into a case is removing the heatsink and providing laminar oil- simulator, one can track the thermal properties of individual cooling over a bare silicon die [11]. However, this results in components under load, understand a program’s thermal be- different transient and steady-state thermal responses com- havior, evaluate thermal management techniques, etc. The pared to a conventional cooling solution like a heatsink [7]. thermal model is portable and flexible, and it can be built upon The other limitation is that the cooling capacity of oil is rough- to cater to particular requirements. However, verification of ly proportional to the size of the oil tank and the velocity of the model on absolute thermal values is still a challenge. the oil flow. In order to cool 100W-300W cores, the speed of oil flow has to be fast, thereby easily producing more distorted 2.1.2 Measuring Methods images. Although this method has high merits when done correctly, it comes with high set-up cost and time. Temperature measurement methods are mainly relevant to runtime thermal management techniques, which require a 2.1.2 Pros and Cons of using Thermocouples temperature measurement to occur in real time. Also, though thermal simulation models aim to faithfully mirror the behav- The most notable advantage of using thermocouples is the ior of the system, they are based on the designer’s understand- cost and the ease of use for the measurement. Not only is it ing of what factors affect the thermal characteristics of the suitable for measurements up to 750 degrees Celsius, but it system. So, modeling methods need to be validated against has a very thin diameter, and being a wire, it can easily be actual measurements of some sort to ensure the accuracy of placed anywhere. However, placing this wire in a specific the model and thus they require the existence of robust ther- location is a challenge as the pressure applied on it could af- mal-measurement methods. In the realm of performance, fect the temperature readings. Furthermore, the resolution of modern processors provide measurement instruments in the the readings is very limited to IR measurement. However, form of hardware performance counters. However, for tem- with a very well designed thermo-spacer and some knowledge perature, processors, especially many-core processors, do not of the GPU processor layout, using thermocouples provides yet have a concrete built-in measurement system. Though the the best cost-effective solution. Also, reconstructing a exact methods used in the industry to measure temperature are heatmap from thermocouple readings is much simpler than in not known, there are
Recommended publications
  • Does Your Hardware Support Opencl? - 12-29-2011 by Vincent - Streamcomputing
    Does your hardware support OpenCL? - 12-29-2011 by vincent - StreamComputing - http://streamcomputing.eu Does your hardware support OpenCL? by vincent – Thursday, December 29, 2011 http://streamcomputing.eu/blog/2011-12-29/opencl-hardware-support/ Does your computer have OpenCL-capable hardware? Read on and find out… For people who only want to run OpenCL-software and have recent hardware, just read this paragraph. If you have recent drivers for your GPU, you can be sure OpenCL is already supported and you can run OpenCL-capable software. NVidia has support for OpenCL 1.1 since drivers 280.13, so if you need OpenCL 1.1, then make sure you have this version or later. If you want to use Intel-processors and you don’t have an AMD GPU installed, you need to download the runtime of Intel OpenCL. If you want to know if your X86 device is supported, you’ll find answers in this article. Often it is not clear how OpenCL works on CPUs. If you have a 8 core processor with double threading, then it mostly is understood that 16 pipelines of instructions are possible. OpenCL takes care of this threading, but also uses parallelism provided by SSE and AVX extension. I talked more about this here and here. Meaning that an 8-core processor with AVX can compute 8 times 32 bytes (8*8 floats or 8*4 doubles) in parallel. You could see it as parallelism of parallelism. SSE is designed with multimedia-operations in mind, but has enough to be used with OpenCL.
    [Show full text]
  • GPU-Based Deep Learning Inference
    Whitepaper GPU-Based Deep Learning Inference: A Performance and Power Analysis November 2015 1 Contents Abstract ......................................................................................................................................................... 3 Introduction .................................................................................................................................................. 3 Inference versus Training .............................................................................................................................. 4 GPUs Excel at Neural Network Inference ..................................................................................................... 5 Inference Optimizations in Caffe and cuDNN 4 ........................................................................................ 5 Experimental Setup and Testing Methodology ........................................................................................ 7 Inference on Small and Large GPUs .......................................................................................................... 8 Conclusion ................................................................................................................................................... 10 References .................................................................................................................................................. 10 2 Abstract Deep learning methods are revolutionizing various areas of machine perception. On a
    [Show full text]
  • Arxiv:1809.03668V2 [Cs.LG] 20 Jan 2019 17, 20, 21]
    Comparing Computing Platforms for Deep Learning on a Humanoid Robot Alexander Biddulph∗, Trent Houliston, Alexandre Mendes, and Stephan K. Chalup School of Electrical Engineering and Computing The University of Newcastle, Callaghan, NSW, 2308, Australia. [email protected] Abstract. The goal of this study is to test two different computing plat- forms with respect to their suitability for running deep networks as part of a humanoid robot software system. One of the platforms is the CPU- centered Intel R NUC7i7BNH and the other is a NVIDIA R Jetson TX2 system that puts more emphasis on GPU processing. The experiments addressed a number of benchmarking tasks including pedestrian detec- tion using deep neural networks. Some of the results were unexpected but demonstrate that platforms exhibit both advantages and disadvantages when taking computational performance and electrical power require- ments of such a system into account. Keywords: deep learning, robot vision, gpu computing, low powered devices 1 Introduction Deep learning comes with challenges with respect to computational resources and training data requirements [6, 13]. Some of the breakthroughs in deep neu- ral networks (DNNs) only became possible through the availability of massive computing systems or through careful co-design of software and hardware. For example, the AlexNet system presented in [15] was implemented efficiently util- ising two NVIDIA R GTX580 GPUs for training. Machine learning on robots has been a growing area over the past years [4, arXiv:1809.03668v2 [cs.LG] 20 Jan 2019 17, 20, 21]. It has become increasingly desirable to employ DNNs in low powered devices, among them humanoid robot systems, specifically for complex tasks such as object detection, walk learning, and behaviour learning.
    [Show full text]
  • Numerical Behavior of NVIDIA Tensor Cores
    Numerical behavior of NVIDIA tensor cores Massimiliano Fasi1, Nicholas J. Higham2, Mantas Mikaitis2 and Srikara Pranesh2 1 School of Science and Technology, Örebro University, Örebro, Sweden 2 Department of Mathematics, University of Manchester, Manchester, UK ABSTRACT We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensorcoresaswellassimilaracceleratorsfromothervendors,astheybecome available. Moreover, we identify a non-monotonicity issue
    [Show full text]
  • Manycore GPU Architectures and Programming, Part 1
    Lecture 19: Manycore GPU Architectures and Programming, Part 1 Concurrent and Mul=core Programming CSE 436/536, [email protected] www.secs.oakland.edu/~yan 1 Topics (Part 2) • Parallel architectures and hardware – Parallel computer architectures – Memory hierarchy and cache coherency • Manycore GPU architectures and programming – GPUs architectures – CUDA programming – Introduc?on to offloading model in OpenMP and OpenACC • Programming on large scale systems (Chapter 6) – MPI (point to point and collec=ves) – Introduc?on to PGAS languages, UPC and Chapel • Parallel algorithms (Chapter 8,9 &10) – Dense matrix, and sorng 2 Manycore GPU Architectures and Programming: Outline • Introduc?on – GPU architectures, GPGPUs, and CUDA • GPU Execuon model • CUDA Programming model • Working with Memory in CUDA – Global memory, shared and constant memory • Streams and concurrency • CUDA instruc?on intrinsic and library • Performance, profiling, debugging, and error handling • Direc?ve-based high-level programming model – OpenACC and OpenMP 3 Computer Graphics GPU: Graphics Processing Unit 4 Graphics Processing Unit (GPU) Image: h[p://www.ntu.edu.sg/home/ehchua/programming/opengl/CG_BasicsTheory.html 5 Graphics Processing Unit (GPU) • Enriching user visual experience • Delivering energy-efficient compung • Unlocking poten?als of complex apps • Enabling Deeper scien?fic discovery 6 What is GPU Today? • It is a processor op?mized for 2D/3D graphics, video, visual compu?ng, and display. • It is highly parallel, highly multhreaded mulprocessor op?mized for visual
    [Show full text]
  • COM Express® + GPU Embedded System (VXG/DXG)
    COM Express® + GPU Embedded System (VXG/DXG) VXG Series DXG Series Connect Tech Inc. Tel: 519-836-1291 42 Arrow Road Toll: 800-426-8979 (North America only) Guelph, Ontario Fax: 519-836-4878 N1K 1S6 Email: [email protected] www.connecttech.com [email protected] CTIM-00409 Revision 0.12 2018-03-16 COM Express® + GPU Embedded System (VXG/DXG) Users Guide www.connecttech.com Table of Contents Preface ................................................................................................................................................... 4 Disclaimer ....................................................................................................................................................... 4 Customer Support Overview ........................................................................................................................... 4 Contact Information ........................................................................................................................................ 4 One Year Limited Warranty ............................................................................................................................ 5 Copyright Notice ............................................................................................................................................. 5 Trademark Acknowledgment .......................................................................................................................... 5 ESD Warning .................................................................................................................................................
    [Show full text]
  • Numerical Behavior of NVIDIA Tensor Cores Fasi, Massimiliano And
    Numerical Behavior of NVIDIA Tensor Cores Fasi, Massimiliano and Higham, Nicholas J. and Mikaitis, Mantas and Pranesh, Srikara 2020 MIMS EPrint: 2020.10 Manchester Institute for Mathematical Sciences School of Mathematics The University of Manchester Reports available from: http://eprints.maths.manchester.ac.uk/ And by contacting: The MIMS Secretary School of Mathematics The University of Manchester Manchester, M13 9PL, UK ISSN 1749-9097 Numerical Behavior of NVIDIA Tensor Cores Massimiliano Fasi*1, Nicholas J. Higham2, Mantas Mikaitis2, and Srikara Pranesh2 1School of Science and Technology, Orebro¨ University, Orebro,¨ Sweden 2Department of Mathematics, University of Manchester, Manchester, United Kingdom Corresponding author: Mantas Mikaitis2 Email address: [email protected] ABSTRACT We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4 and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: 1) accurately simulate NVIDIA tensor cores on conventional hardware; 2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and 3) build custom hardware whose behavior matches that of NVIDIA tensor cores.
    [Show full text]
  • Storage Device Information 1
    Storage Device Information 1. MSI suggests to ask our local service center for a storage device’s approval list before your upgrade in order to avoid any compatibility issues. 2. For having the maximum performance of the SSD, see the suggested Stripe Size shown in the RAID 0 column to do the set up. 3. 2.5 inch vs. mSATA vs. M.2 SSD Which M.2 SSD Drive do I need? 1. Connector & Socket: Make sure you have the right combination of the SSD connector and the socket on your notebook. There are three types of M.2 SSD edge connectors, B Key, M Key, and B+M Key. – B key edge connector can use on Socket for 'B' Key (aka. Socket 2) (SATA) – M key edge connector can use on Socket for 'M' Key (aka. Socket 3) (PCIe x 4) – B+M Key edge connector can use on both sockets * To know the supported connector and the socket of your notebook, see the ‘M.2 slot(s)’ column. 2. Physical size: M.2 SSD allows for a variety of physical sizes, like 2242, 2260, 2280, and MSI Notebooks supports Type 2280. 3. Logical interface: M.2 drives can connect through either the SATA controller or through the PCI-E bus in either x2 or x4 mode. Catalogue 1. MSI Gaming Notebooks – NVIDIA GeForce 10 Series Graphics And Intel HM175/CM238 Chipset – NVIDIA GeForce 10 Series Graphics And Intel HM170/CM236 Chipset – NVIDIA GeForce 900M Series Graphics And Intel HM175 Chipset – NVIDIA GeForce 900M Series Graphics And Intel HM170/CM236 Chipset – NVIDIA GeForce 900M Series Graphics And Intel HM87/HM86 Chipset – NVIDIA GeForce 800M Series Graphics – NVIDIA GeForce 700M Series Graphics – Gaming Dock – MSI Vortex – MSI VR ONE 2.
    [Show full text]
  • NVIDIA's Opengl Functionality
    NVIDIANVIDIA ’’ss OpenGLOpenGL FunctionalityFunctionality Session 2127 | Room A5 | Monday, September, 20th, 16:00 - 17:20 San Jose Convention Center, San Jose, California Mark J. Kilgard • Principal System Software Engineer – OpenGL driver – Cg (“C for graphics”) shading language • OpenGL Utility Toolkit (GLUT) implementer • Author of OpenGL for the X Window System • Co-author of Cg Tutorial Outline • OpenGL’s importance to NVIDIA • OpenGL 3.3 and 4.0 • OpenGL 4.1 • Loose ends: deprecation, Cg, further extensions OpenGL Leverage Cg Parallel Nsight SceniX CompleX OptiX Example of Hybrid Rendering with OptiX OpenGL (Rasterization) OptiX (Ray tracing) Parallel Nsight Provides OpenGL Profiling Configure Application Trace Settings Parallel Nsight Provides OpenGL Profiling Magnified trace options shows specific OpenGL (and Cg) tracing options Parallel Nsight Provides OpenGL Profiling Parallel Nsight Provides OpenGL Profiling Trace of mix of OpenGL and CUDA shows glFinish & OpenGL draw calls OpenGL In Every NVIDIA Business OpenGL on Quadro – World class OpenGL 4 drivers – 18 years of uninterrupted API compatibility – Workstation application certifications – Workstation application profiles – Display list optimizations – Fast antialiased lines – Largest memory configurations: 6 gigabytes – GPU affinity – Enhanced interop with CUDA and multi-GPU OpenGL – Advanced multi-GPU rendering – Overlays – Genlock – Unified Back Buffer for less framebuffer memory usage – Cross-platform • Windows XP, Vista, Win7, Linux, Mac, FreeBSD, Solaris – SLI Mosaic –
    [Show full text]
  • Nvida Geforce Driver Download Geforce Game Ready Driver
    nvida geforce driver download GeForce Game Ready Driver. Game Ready Drivers provide the best possible gaming experience for all major new releases. Prior to a new title launching, our driver team is working up until the last minute to ensure every performance tweak and bug fix is included for the best gameplay on day-1. Game Ready for Naraka: Bladepoint This new Game Ready Driver provides support for Naraka: Bladepoint, which utilizes NVIDIA DLSS and NVIDIA Reflex to boost performance by up to 60% at 4K and make you more competitive through the reduction of system latency. Additionally, this release also provides optimal support for the Back 4 Blood Open Beta and Psychonauts 2 and includes support for 7 new G-SYNC Compatible displays. Effective October 2021, Game Ready Driver upgrades, including performance enhancements, new features, and bug fixes, will be available for systems utilizing Maxwell, Pascal, Turing, and Ampere-series GPUs. Critical security updates will be available on systems utilizing desktop Kepler- series GPUs through September 2024. A complete list of desktop Kepler-series GeForce GPUs can be found here. NVIDIA TITAN Xp, NVIDIA TITAN X (Pascal), GeForce GTX TITAN X, GeForce GTX TITAN, GeForce GTX TITAN Black, GeForce GTX TITAN Z. GeForce 10 Series: GeForce GTX 1080 Ti, GeForce GTX 1080, GeForce GTX 1070 Ti, GeForce GTX 1070, GeForce GTX 1060, GeForce GTX 1050 Ti, GeForce GTX 1050, GeForce GT 1030, GeForce GT 1010. GeForce 900 Series: GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950. GeForce 700 Series: GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 760 Ti (OEM), GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 745, GeForce GT 740, GeForce GT 730, GeForce GT 720, GeForce GT 710.
    [Show full text]
  • Graphics Card & Iclone Comparability Table: NVIDIA ATI Technologies Sis Intel Graphics Card: NVIDIA
    Graphics Card & iClone Comparability Table: NVIDIA ATI Technologies SiS Intel Graphics Card: NVIDIA IClone 5.x iClone 2.x iClone 3.x iClone 3.x iClone 3.x iClone 4.x iClone 4.x iClone 4.x iClone 4.x IClone 5.x IClone 5.x IClone 5.x 3D Card Name iClone 1.x iClone 2.x HDR + Self Cast Shadow Quick Shader Vertex Shader Pixel Shader Quick Shader Vertex Shader Pixel Shader HDR Quick Shader Pixel Shader HDR Anti-Aliased GeForce5 Series VVVV XX V XXX V XXX GeForce 6 Series VVVVVVVVV X VVV X GeForce 7 Series VVVVVVVVVVVVV X GeForce 8 Series VVVVVVVVVVVVVV GeForce 9 Series VVVVVVVVVVVVVV GeForce 100 Series VVVVVVVVVVVVVV GeForce 200 Series VVVVVVVVVVVVVV Quadro FX370 VVVVVVVVVVVVVV GeForce GTX460 VVVVVVVVVVVVVV GeForce GTX550Ti VVVVVVVVVVVVVV GeForce GTX560Ti VVVVVVVVVVVVVV GeForce GTX 570 VVVVVVVVVVVVVV GeForce GTX 580 VVVVVVVVVVVVVV GeForce GTX 590 VVVVVVVVVVVVVV Quadro FX 580 VVVVVVVVVVVVVV Graphics Card: ATI iClone5.x iClone 2.x iClone 3.x iClone 3.x iClone 3.x iClone 4.x iClone 4.x iClone 4.x iClone 4.x iClone 5.x iClone 5.x iClone5.x 3D Card Name iClone 1.x iClone 2.x HDR + Self Cast Shadow Quick Shader Vertex Shader Pixel Shader Quick Shader Vertex Shader Pixel Shader HDR Quick Shader Pixel Shader HDR Anti-Aliased Radeon9xxx Series VV X V XX V XXX V XXX X300 / FireGL V3100 VV X V XX V XXX V XXX X600 / FireGL V3200 VV X V XX V XXX V XXX X700 / FireGL V5000 VV X V XX V XXX V XXX X800 / FireGL V5100 VV X V XX V XXX V XXX FireGL V7700 VVVVVVVVVVVVV X X1xxx Series VV X V XX V XXX V XXX HD 2xxx Series VVVVVVVVVVVVV X HD 3650 VVVVVVVVVVVVVV HD 4850
    [Show full text]
  • Lecture: Manycore GPU Architectures and Programming, Part 1
    Lecture: Manycore GPU Architectures and Programming, Part 1 CSCE 569 Parallel Computing Department of Computer Science and Engineering Yonghong Yan [email protected] https://passlab.github.io/CSCE569/ 1 Manycore GPU Architectures and Programming: Outline • Introduction – GPU architectures, GPGPUs, and CUDA • GPU Execution model • CUDA Programming model • Working with Memory in CUDA – Global memory, shared and constant memory • Streams and concurrency • CUDA instruction intrinsic and library • Performance, profiling, debugging, and error handling • Directive-based high-level programming model – OpenACC and OpenMP 2 Computer Graphics GPU: Graphics Processing Unit 3 Graphics Processing Unit (GPU) Image: http://www.ntu.edu.sg/home/ehchua/programming/opengl/CG_BasicsTheory.html 4 Graphics Processing Unit (GPU) • Enriching user visual experience • Delivering energy-efficient computing • Unlocking potentials of complex apps • Enabling Deeper scientific discovery 5 What is GPU Today? • It is a processor optimized for 2D/3D graphics, video, visual computing, and display. • It is highly parallel, highly multithreaded multiprocessor optimized for visual computing. • It provide real-time visual interaction with computed objects via graphics images, and video. • It serves as both a programmable graphics processor and a scalable parallel computing platform. – Heterogeneous systems: combine a GPU with a CPU • It is called as Many-core 6 Graphics Processing Units (GPUs): Brief History GPU Computing General-purpose computing on graphics processing units (GPGPUs) GPUs with programmable shading Nvidia GeForce GE 3 (2001) with programmable shading DirectX graphics API OpenGL graphics API Hardware-accelerated 3D graphics S3 graphics cards- single chip 2D accelerator Atari 8-bit computer IBM PC Professional Playstation text/graphics chip Graphics Controller card 1970 1980 1990 2000 2010 Source of information http://en.wikipedia.org/wiki/Graphics_Processing_Unit 7 NVIDIA Products • NVIDIA Corp.
    [Show full text]