Introduction to Modern GPU Hardware

The following content is extracted from the materials listed in the references on the last page. If any citation is wrong or a reference is missing, please contact [email protected], and I will correct the error as soon as possible. For course use only; please do NOT redistribute. Thank you.

Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University
Hsinchu, Taiwan
Fall 2016

Outline
• GPU Pipeline
• History of GPU Hardware
• GPU Hardware Consideration
• Modern GPU Hardware Architecture
  – NVIDIA GeForce
  – AMD (ATI) Radeon
  – IMG PowerVR
  – ARM Mali
• GPU Applications
• Summary

GPU Fundamentals: Graphics Pipeline
Application (CPU) → Vertices (3D) → Transform & Light → Transformed, Lit Vertices (2D) → Assemble Primitives → Screenspace Triangles (2D) → Rasterize → Fragments (pre-pixels) → Shade → Final Pixels (Color, Depth)
All stages after the application run on the GPU and share the graphics state. Shading reads textures from video memory, and render-to-texture writes final pixels back into video memory for reuse as textures.
• A simplified graphics pipeline
  – Note that pipe widths vary
  – Many caches, FIFOs, and so on are not shown

GPU Fundamentals: Modern Graphics Pipeline
Application (CPU) → Vertices (3D) → Vertex Processor → Transformed, Lit Vertices (2D) → Assemble Primitives → Screenspace Triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor → Final Pixels (Color, Depth)
• Programmable vertex processor!
• Programmable pixel processor!
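The Transform stage maps each 3D vertex to a 2D screen-space position. The sketch below shows that mapping under stated assumptions: an OpenGL-style perspective matrix, column vectors, and a vertex that the model-view transform has already placed in camera space (so the projection matrix alone serves as the combined matrix). The function names `perspective` and `to_screen` are hypothetical, and lighting is not modeled.

```python
import math
from typing import List, Tuple

Mat4 = List[List[float]]
Vec4 = Tuple[float, float, float, float]


def mat_vec(m: Mat4, v: Vec4) -> Vec4:
    """Multiply a 4x4 matrix by a homogeneous column vector (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def perspective(fovy_deg: float, aspect: float, near: float, far: float) -> Mat4:
    """OpenGL-style perspective projection; the camera looks down the -z axis."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_screen(vertex: Tuple[float, float, float], mvp: Mat4,
              width: int, height: int) -> Tuple[float, float, float]:
    """Map a camera-space 3D vertex to (screen_x, screen_y, depth in [0, 1])."""
    x, y, z = vertex
    cx, cy, cz, cw = mat_vec(mvp, (x, y, z, 1.0))  # clip space
    nx, ny, nz = cx / cw, cy / cw, cz / cw          # perspective divide -> NDC in [-1, 1]
    sx = (nx + 1.0) * 0.5 * width                   # viewport transform (origin bottom-left)
    sy = (ny + 1.0) * 0.5 * height
    depth = (nz + 1.0) * 0.5                        # near plane -> 0, far plane -> 1
    return sx, sy, depth


proj = perspective(90.0, 640 / 480, 1.0, 10.0)
print(to_screen((0.0, 0.0, -5.0), proj, 640, 480))  # lands at the centre of a 640x480 screen
```

The perspective divide is what turns the 3D vertex into the "(2D)" screen-space vertex in the diagram; depth is kept alongside it for later visibility tests.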
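The Rasterize stage turns each screen-space triangle into fragments ("pre-pixels"). One common way to sketch it, not necessarily how any particular GPU implements it, is the edge-function test: a pixel centre is inside the triangle when it lies on the same side of all three edges. The sketch also interpolates depth with the resulting barycentric weights. The function names are hypothetical, and the tie-breaking fill rule that real hardware applies to pixels lying exactly on a shared edge is omitted.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, float, float]   # (screen_x, screen_y, depth)
Fragment = Tuple[int, int, float]     # (pixel_x, pixel_y, interpolated depth)


def edge(ax: float, ay: float, bx: float, by: float, px: float, py: float) -> float:
    """Twice the signed area of triangle (a, b, p); its sign says which side of a->b p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0: Vertex, v1: Vertex, v2: Vertex) -> List[Fragment]:
    """Return one fragment per pixel whose centre is covered by the triangle."""
    area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
    if area == 0:
        return []  # degenerate triangle: covers no pixels
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    fragments: List[Fragment] = []
    # Only scan the triangle's bounding box.
    for py in range(math.floor(min(ys)), math.ceil(max(ys))):
        for px in range(math.floor(min(xs)), math.ceil(max(xs))):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel centre
            # Barycentric weights; dividing by `area` makes both windings work.
            b0 = edge(v1[0], v1[1], v2[0], v2[1], cx, cy) / area
            b1 = edge(v2[0], v2[1], v0[0], v0[1], cx, cy) / area
            b2 = edge(v0[0], v0[1], v1[0], v1[1], cx, cy) / area
            if b0 >= 0 and b1 >= 0 and b2 >= 0:
                depth = b0 * v0[2] + b1 * v1[2] + b2 * v2[2]
                fragments.append((px, py, depth))
    return fragments


frags = rasterize((0.0, 0.0, 0.5), (4.0, 0.0, 0.5), (0.0, 4.0, 0.5))
print(len(frags), "fragments")
```

Each fragment carries interpolated attributes (here only depth) into the Shade / Fragment Processor stage, which is why the slides call fragments "pre-pixels".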
GPU Fundamentals: Modern Graphics Pipeline
Application (CPU) → Vertices (3D) → Vertex Processor → Transformed, Lit Vertices (2D) → Assemble Primitives → Geometry Processor → Screenspace Triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor → Final Pixels (Color, Depth)
• Programmable primitive assembly!
• More flexible memory access!

History of Graphics Hardware (1/3)
• Until the mid ’90s
  – SGI mainframes and workstations
  – PC: only 2D graphics hardware
• Mid ’90s
  – Consumer 3D graphics hardware (PC): 3dfx, NVIDIA, Matrox, ATI, …
  – Triangle rasterization (only)
  – Cheap: pushed by the game industry
  [Figure: 3dfx Voodoo Graphics, 4 MB, 1997]
• 1999
  – PC card with TnL (Transform and Lighting)
  – NVIDIA GeForce: Graphics Processing Unit (GPU)
  – PC card more powerful than specialized workstations

History of Graphics Hardware (2/3)
[Figure: GPU history timeline. Source: https://www.zhihu.com/question/21980949]

History of Graphics Hardware (3/3)
• Modern graphics hardware
  – Graphics pipeline partly programmable
  – Leaders: AMD (ATI) and NVIDIA, e.g., "AMD Radeon HD 6990" and "NVIDIA GeForce GTX 590"
  – Game consoles are similar to GPUs (Xbox)

Computational Power (1/2)
• GPUs are fast…
  – 3.0 GHz Intel Core 2 Duo (Woodcrest Xeon 5160):
    • Computation: 48 GFLOPS peak
    • Memory bandwidth: 21 GB/s peak
    • Price: $874 (chip)
  – NVIDIA GeForce 8800 GTX:
    • Computation: 330 GFLOPS observed
    • Memory bandwidth: 55.2 GB/s observed
    • Price: $599 (board)
• GPUs are getting faster, and at a faster rate than CPUs
  – CPUs: 1.4× annual growth
  – GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth

Computational Power (2/2)
[Figure: FLOPS comparison of GPU and CPU (courtesy Naga Govindaraju)]
[Figure: Memory bandwidth comparison of CPU and GPU]

Motivation
• Why are GPUs getting faster so quickly?
  – Arithmetic intensity
    • The specialized nature of GPUs makes it easier to use additional transistors for computation
  – Economics
    • The multi-billion-dollar video game market is a pressure cooker that drives innovation to exploit this property

Flexible and Precise
• Modern GPUs are deeply programmable
  – Programmable pixel, vertex, and geometry engines
  – Solid high-level language support
• Modern GPUs support "real" precision
  – 32-bit/64-bit floating point throughout the pipeline
    • High enough for many applications
  – DX10-class GPUs add 32-bit integers

Graphics Hardware Consideration (1/2)
• GPU = Graphics Processing Unit
  – Vector processor
  – Operates on 4-tuples
    • Position (x, y, z, w)
    • Color (red, green, blue, alpha)
    • Texture coordinates (s, t, r, q)
  – 4-tuple ops in 1 clock cycle
    • SIMD (Single Instruction, Multiple Data)
    • ADD, MUL, SUB, DIV, MADD, …

Graphics Hardware Consideration (2/2)
• Pipelining
  – Number of stages
• Parallelism
  – Number of parallel processes
• Parallelism + pipelining
  – Number of parallel pipelines
[Figure: stages 1-2-3 arranged as a single pipeline, as parallel processes, and as parallel pipelines]

Growth of NVIDIA GPU
• Performance metrics
  – Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
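The 4-tuple SIMD operations listed under Graphics Hardware Consideration (1/2) can be modeled in a few lines: one operation is applied to all four lanes of a position, color, or texture-coordinate tuple. This is a minimal software sketch for illustration, assuming a MADD that computes a × b + c per lane (modeled here as a multiply followed by an add). The helper names are hypothetical; on the GPU each call would correspond to one 4-wide instruction rather than a Python loop.

```python
import operator
from typing import Callable, Tuple

Vec4 = Tuple[float, float, float, float]  # e.g. (x, y, z, w) or (r, g, b, a)


def lanewise(op: Callable[[float, float], float], a: Vec4, b: Vec4) -> Vec4:
    """Apply the same scalar operation to all four lanes (single instruction, multiple data)."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected 4-tuples")
    return tuple(op(x, y) for x, y in zip(a, b))


def add(a: Vec4, b: Vec4) -> Vec4:
    return lanewise(operator.add, a, b)


def sub(a: Vec4, b: Vec4) -> Vec4:
    return lanewise(operator.sub, a, b)


def mul(a: Vec4, b: Vec4) -> Vec4:
    return lanewise(operator.mul, a, b)


def div(a: Vec4, b: Vec4) -> Vec4:
    return lanewise(operator.truediv, a, b)


def madd(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """Multiply-add a * b + c in every lane: 4 multiplies + 4 adds = 8 flops per 4-wide op."""
    return add(mul(a, b), c)


# Modulate an RGBA color by a light color, then add an ambient term.
color = (1.0, 0.5, 0.25, 1.0)
light = (0.5, 0.5, 0.5, 1.0)
ambient = (0.25, 0.25, 0.25, 0.0)
print(madd(color, light, ambient))  # (0.75, 0.5, 0.375, 1.0)
```

Packing all four components into one operation is what lets a single instruction update a whole color or position per clock.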
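The trade-off in Graphics Hardware Consideration (2/2) can be made concrete with a simple timing model. Assuming every stage takes exactly one clock, there are no stalls, and work items are split evenly across units (all assumptions for illustration, not properties of any specific GPU), N items through S stages need N × S clocks without pipelining, ⌈N/P⌉ × S clocks on P parallel unpipelined units, and S + ⌈N/P⌉ − 1 clocks once each of the P units is also pipelined. The function name `clocks` is hypothetical.

```python
import math


def clocks(items: int, stages: int, units: int = 1, pipelined: bool = True) -> int:
    """Clock cycles to push `items` through `stages` one-clock stages on `units` copies.

    Idealized model: no stalls, every stage takes one clock, items split evenly.
    """
    if items < 0 or stages < 1 or units < 1:
        raise ValueError("need items >= 0, stages >= 1, units >= 1")
    if items == 0:
        return 0
    batches = math.ceil(items / units)  # items each unit must handle
    if pipelined:
        # The first item needs `stages` clocks; after that, one item completes per clock.
        return stages + batches - 1
    # Without pipelining, a unit finishes one item completely before starting the next.
    return batches * stages


for label, units, pipelined in [
    ("sequential", 1, False),
    ("pipelining", 1, True),
    ("parallelism", 4, False),
    ("parallelism + pipelining", 4, True),
]:
    print(f"{label:>26}: {clocks(1000, 3, units, pipelined)} clocks")
```

With the three stages drawn on the slide, pipelining alone approaches a 3× speedup on a long stream, and four parallel pipelines multiply that by roughly four more.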
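The peak figures in Computational Power (1/2), and the 518 GFLOPS peak quoted for the GeForce 8 series below, follow from units × clock (GHz) × floating-point operations per unit per clock. The per-clock counts used here are commonly cited assumptions, not stated on the slides: 8 single-precision flops per core per clock for the Core 2 (one 4-wide SSE add plus one 4-wide SSE multiply) and 3 per stream processor for the GeForce 8 (a multiply-add plus a multiply issued together). The same sketch compounds the quoted annual growth rates; both function names are hypothetical.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak: every unit retires `flops_per_clock` flops on every cycle."""
    return units * clock_ghz * flops_per_clock


def compound_growth(annual_factor: float, years: int) -> float:
    """Total speedup after `years` of growth by `annual_factor` per year."""
    return annual_factor ** years


# Assumed per-clock rates (see the lead-in); the slides give only the resulting totals.
cpu = peak_gflops(units=2, clock_ghz=3.0, flops_per_clock=8)     # Core 2 Duo, 2 cores
gpu = peak_gflops(units=128, clock_ghz=1.35, flops_per_clock=3)  # GeForce 8, 128 SPs
print(f"CPU peak {cpu:.0f} GFLOPS, GPU peak {gpu:.0f} GFLOPS")

for name, factor in [("CPU", 1.4), ("GPU pixels", 1.7), ("GPU vertices", 2.3)]:
    print(f"{name:>12}: {compound_growth(factor, 5):5.1f}x after 5 years")
```

Over five years the quoted rates compound to roughly 5.4× for CPUs versus about 14× to 64× for GPUs, which is the gap the growth slides describe.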
Growth of NVIDIA GPU
[Figure: NVIDIA GeForce 7900 GTX]

NVIDIA Graphics Card Architecture
• GeForce 8 Series
  – 12,288 concurrent threads, hardware managed
  – 128 thread processor cores at 1.35 GHz = 518 GFLOPS peak
[Figure: block diagram. Host CPU → Work Distribution → 16 instruction units (IU), each with a streaming processor (SP) group and shared memory → 8 texture filter (TF) units with TEX L1 caches → 6 L2 caches, each attached to a memory partition]

NVIDIA FERMI
[Figure: Fermi architecture]

FERMI: Streaming Multiprocessor (SM)
• Each SM contains
  – 32 cores
  – 16 load/store units
  – 32,768 registers
• Newer FP representation
  – IEEE 754-2008
• Two units
  – Floating point
  – Integer

FERMI: Results
[Figure]

FERMI: Comparison
[Figure]

Kepler: Core Architecture
[Figure. Source: http://www.weistang.com/article-941-1.html]

Titan vs Tesla Comparison
[Figure]

Maxwell: Core Architecture
[Figure. Sources: http://www.weistang.com/article-941-1.html, http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B]

Kepler vs Maxwell Comparison
[Figure. Source: http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B]

[Figure. Source: https://zh.wikipedia.org/wiki/CUDA]

NVIDIA ULP-GeForce (Tegra 2)
[Figure: Tegra 2 block diagram]
• Ultra-low-power (ULP) GeForce GPU with 4 pixel shaders + 4 vertex shaders
• 32-bit single-channel memory controller with either LPDDR2-600 or DDR2-667 memory

NVIDIA ULP-GeForce (Tegra 3)
[Figure: Tegra 3 block diagram]
• The GPU in Tegra 3 is an evolution of the Tegra 2 GPU, with twice the number of pixel shader units (8 compared to 4) and a higher clock frequency.
• 32-bit single-channel memory controller with either LPDDR2 or DDR3 memory

Tegra Roadmap
[Figure]

Mobile Roadmap
[Figure. Source: http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2]

ATI Radeon X1900 XTX
• Features of ATI Radeon X1900 XTX
  – Core speed: 650 MHz
  – 48 pixel shader processors
  – 8 vertex shader processors
  – 51 GB/s memory bandwidth
  – 512 MB memory
[Source: http://product.pcpop.com/000024721/Index.html]

ATI Radeon X1900 XTX
• High-memory-bandwidth graphics card
[Figure: system diagram. GPU (650 MHz, parallel processes) with ½ GB graphics memory at 51 GB/s and video output; AGP bus at 2 GB/s with ½ GB AGP memory; CPU chip (3 GHz, parallel processes) with ½ MB cache at 77 GB/s; 1 GB main memory at 3 GB/s]

ATI Radeon 9700
• Parallelism + pipelining: ATI Radeon 9700
  – 4 vertex pipelines
  – 8 pixel pipelines

Radeon Comparison
[Figure. Source: http://www.pcdiy.com.tw/detail/4275]

IMG PowerVR Series5XT (SGXMP)
[Figure]
• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
• Fully programmable GPU using the unique USSE architecture
• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0, and DirectX 9/10.1

IMG PowerVR Series6 (Rogue)
[Figure]
• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x, and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality

IMG PowerVR 7XT Plus
[Figure. Source: http://imgtec.eetrend.com/article/713045]

IMG PowerVR 7XT Plus
[Figure. Source: http://imgtec.eetrend.com/article/713046]

Features of ARM Mali
ARM Mali-200
ARM Mali-300
ARM Mali-400MP
ARM Mali-450MP
ARM Mali-T604
[Figures]

ARM Mali-T604
• GPGPU (supports OpenCL 1.1)
• Tri-pipe architecture
• The first GPU based on the Midgard architecture
• True IEEE double-precision floating-point math in hardware for Full Profile
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
• 5× performance improvement over previous Mali graphics processors

ARM Mali-T624
[Figure]

ARM Mali-T678
[Figure]
• 50% performance improvement compared to the Mali-T658

ARM Mali-T760
ARM Mali-T880
[Figures]

ARM Mali Comparison
[Figures. Source: https://zh.wikipedia.org/wiki/Mali_(GPU)]

Applications (1/7)
• Includes lots of applications
  – Ray tracer
  – Image segmentation
  – FFT/linear algebra
[Figures. Sources: http://f.fwallpapers.com/images/3d-bunny.jpg, http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg]

Applications (2/7)
[Figure. Source: http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2]

Applications (3/7)
[Figure. Source: http://5pit.tw/tech/computer/tid_12880]

Applications (4/7)
[Figure. Source: http://wechatinchina.com/thread-461154-1-1.html]

Applications (5/7)
[Figure. Source: https://read01.com/Pnd3D.html]

Applications (6/7): AR and VR Applications
[Figure. Source: http://wechatinchina.com/thread-461154-1-1.html]

Applications (7/7)
[Figure. Source: http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm]

Summary
• Understand the GPU pipeline in depth
• Understand the motivation for GPU hardware
• Understand modern GPU hardware architecture and specifications
• Understand GPU/GPGPU applications

References
• Mark Colbert, GPU Architecture & CG, 2006
• Yannick Francken and Tom Mertens, Introduction to Graphics Hardware and GPUs
• Yiyunjin, GPU Tutorial, 2007
• Weijun Xiao, Evolution of GPU and Graphics Pipelining
• Commercial product websites (NVIDIA, ATI, IMG, ARM)
