Gpgpus and Their Programming
Total Page:16
File Type:pdf, Size:1020Kb
GPGPUs and their programming Sima, Dezső Szénási, Sándor Created by XMLmind XSL-FO Converter. GPGPUs and their programming írta Sima, Dezső és Szénási, Sándor Szerzői jog © 2013 Typotex Kivonat The widespread use of GPGPUs in an increasing number of HPC (High Performance Computing) areas, such as scientific, engineering, financial and business applications, is one of recent major trends in using informatics. The lecture slides worked out pursuit two goals. On the one side the lecture aims at presenting the principle of operation, the microarchitecture and main features of GPGPU cores, as well as their implementation as graphics cards or data parallel accelerators. On the other hand, the practical part of the lectures aims at acquainting students with data parallel programming, first of all with GPGPU programming and program optimization using the CUDA language and programming environment. Created by XMLmind XSL-FO Converter. Tartalom I. GPGPUs .......................................................................................................................................... 1 Aim .......................................................................................................................................... ix 1. Introduction to GPGPUs .................................................................................................... 10 1. Representation of objects by triangles ...................................................................... 10 1.1. Main types of shaders in GPUs .................................................................... 11 2. Convergence of important features of the vertex and pixel shader models .............. 12 3. Peak FP32/FP64 performance of Nvidia’s GPUs vs Intel’ P4 and Core2 processors [7] 14 4. Peak FP32 performance of AMD’s GPUs [8] ........................................................... 15 5. Evolution of the FP-32 performance of GPUs [9] .................................................... 16 6. Evolution of the bandwidth of Nvidia’s GPU’s vs Intel’s P4 and Core2 processors [7] 16 7. Comparing main features of CPUs and GPUs [12] .................................................. 17 2. The virtual machine concept of GPGPU computing .......................................................... 19 1. Introduction to the virtual machine concept of GPGPU computing ......................... 19 1.1. The virtual machine concept as used for GPGPUs ...................................... 19 1.2. Benefits of the portability of the pseudo assembly code .............................. 19 1.3. The process of developing applications-1 .................................................... 19 1.4. The process of executing applicatons-2 ....................................................... 20 1.5. Formulation of application programs at different levels of abstraction ....... 20 2. Introduction to a GPGPU-based massively data parallel computational model ....... 21 2.1. Specification of GPGPUs at different levels of abstraction ......................... 21 2.2. Specification of the GPGPU virtual machine ............................................... 22 2.3. The GPGPU-based massively data parallel computational model ............... 22 2.4. Emergence of GPGPU-based massively data parallel computational models 22 2.5. The choice of the computational model presented ....................................... 23 2.6. The choice of the terminology used ............................................................. 23 2.6.1. Designation of the GPGPU-based massively data parallel computational model as the SIMT (Single instruction Multiple Threads) model of computation 23 2.6.2. Designation of the terms related to GPGPU-based massively data parallel computational model ................................................................................. 23 3. Introduction to the SIMT computational model ....................................................... 24 3.1. Main components of the SIMT computational models ................................ 24 3.2. The model of the compute workload ............................................................ 24 3.2.1. The thread model ............................................................................. 25 3.2.2. The kernel concept ........................................................................... 29 3.3. The model of the platform ............................................................................ 31 3.3.1. The model of computational resources ............................................ 31 3.3.2. The memory mode ........................................................................... 35 3.4. The execution model .................................................................................... 39 3.4.1. The concept of SIMT execution ...................................................... 39 3.4.2. The concept of assigning work to the PEs ....................................... 43 3.4.3. The concept of data sharing ............................................................. 47 3.4.4. The concept of data dependent flow control .................................... 48 3.4.5. The concept of synchronization ....................................................... 50 4. The Pseudo ISA of GPGPU computing .................................................................... 51 4.1. Discussion of pseudo ISAs ........................................................................... 52 4.2. Case example: The pseudo ISA of Nvidia’s PTX virtual machine .............. 52 4.3. Case example: The pseudo ISA of Nvidia’s PTX virtual machine .............. 53 4.4. Nvidia’s compute capability concept [31], [1] ............................................. 54 4.5. a) Evolution of the functional features, called compute capabilities in subsequent versions of Nvidia’s pseudo ISA (PTX) [32] ...................................................... 55 4.6. b) Evolution of the device parameters bound to subsequent compute capability versions of Nvidia’s PTX [32] ............................................................................ 55 4.7. c) Architecture specifications bound to compute capability versions of Nvidia’s PTX [32] ............................................................................................................. 56 iii Created by XMLmind XSL-FO Converter. GPGPUs and their programming 4.8. d) Throughput of native arithmetic instructions in subsequent compute capability versions of Nvidia’s PTX (operations per clock cycle/SM) [7] .......................... 56 4.9. PTX ISA Versions generated by subsequent releases of CUDA SDKs and supported compute capability versions (sm[xx]) (designated as Supported Targets in the Table) [20] ...................................................................................................................... 57 4.10. Compute capability versions supported by Nvidia’s GPGPU cores and cards [32] 58 4.11. Forward portability of PTX code [31] ........................................................ 58 4.12. Compatibility rules of object files (CUBIN files) compiled to a particular GPGPU compute capability version [31] .......................................................................... 58 3. Overview of GPGPU cores and cards ................................................................................ 59 1. General overview ...................................................................................................... 59 1.1. GPGPU cores ............................................................................................... 59 1.2. Main features of Nvidia’s dual-GPU graphics cards .................................... 62 1.3. Main features of AMD’s/ATI’s dual-GPU graphics cards ........................... 62 1.4. Remarks on AMD-based graphics cards [36], [37] ...................................... 62 2. Overview of the main features of Nvidia’s graphics cores and cards ....................... 63 2.1. Main features of Nvidia’s graphics cores and cards ..................................... 63 2.1.1. Structure of an SM of the G80 architecture ..................................... 64 2.1.2. Positioning Nvidia’s GPGPU cards in their entire product portfolio [41] 67 2.2. Examples for Nvidia’s graphics cards .......................................................... 68 2.2.1. Nvidia GeForce GTX 480 (GF 100 based) [42] .............................. 68 2.2.2. A pair of Nvidia’s GeForce GTX 480 cards [42] (GF100 based) .... 68 2.2.3. Nvidia’s GeForce GTX 480 and 580 cards [44] .............................. 69 2.2.4. Nvidia’s GTX 680 card [80] (GK104 based) .................................. 70 2.2.5. Nvidia’s GTX Titan card [81] (GK110 based) ................................ 70 3. Overview of the main features of AMD’s graphics cores and cards ........................ 71 3.1. Main features of AMD’s graphics cores and cards ...................................... 71 3.2. Examples for AMD’s graphics cards ........................................................... 75 3.2.1. ATI HD 5970 (actually RV870 based) [46] .................................... 76 3.2.2. ATI HD 5970 (RV870 based) [47] .................................................. 76 3.2.3. AMD HD 6990 (Cayman based) [48] .............................................. 77 3.2.4. AMD Radeon HD 7970 (Tahiti XT, Southern Islands based) - First GCN implementation [49] .................................................................................. 77 3.2.5. Prices of GPGPUs as of Summer 2012 [50] ...................................