
The following content is extracted from the materials listed in the references on the last page. If any citation is wrong or any reference is missing, please contact [email protected]. I will correct the error as soon as possible. This material is for course use only; please do NOT redistribute it. Thank you.

Introduction to Modern GPU Hardware

Lan-Da Van (范倫達), Ph.D.
Department of Computer Science
National Yang Ming Chiao Tung University
Hsinchu, Taiwan
Spring, 2021

Outline

• GPU Pipeline
• GPU Hardware History
• GPU Hardware Consideration
• Modern GPU Hardware Architecture
  – GeForce
  – AMD (ATI)
  – IMG PowerVR
  – ARM Mali
• GPU Applications
• Summary

GPU Fundamentals:

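[Figure: simplified graphics pipeline. Stages: Application (CPU) → Transform & Light → Assemble Primitives → Rasterize → Shade → Video Memory (Textures), with all stages after Application on the GPU. Data passed between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth). Graphics State feeds the GPU stages; a Render-to-texture path leads back from Video Memory.]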

• A simplified graphics pipeline
  – Note that pipe widths vary
  – Many caches, FIFOs, and so on are not shown

GPU Fundamentals: Modern Graphics Pipeline
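The stages of the simplified pipeline (transform, assemble primitives, rasterize, shade) can be mimicked in software to see what each one consumes and produces. The following is a minimal sketch, not how any GPU implements these stages: the function names, the orthographic projection, the 8×8 framebuffer, and the constant-color shader are all assumptions made for illustration.

```python
# Toy software model of the simplified pipeline:
# transform -> assemble primitives -> rasterize -> shade -> framebuffer.
# All names are hypothetical; real GPUs implement these stages in hardware.

def transform(vertices, scale, offset):
    """Vertex stage: map 3D vertices to 2D screen space (orthographic, drops z)."""
    return [(x * scale + offset, y * scale + offset) for x, y, z in vertices]

def assemble(points):
    """Primitive assembly: group every three vertices into one triangle."""
    return [tuple(points[i:i + 3]) for i in range(0, len(points) - 2, 3)]

def edge(a, b, p):
    """Signed-area test: tells which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Rasterizer: yield the pixel coordinates (fragments) a triangle covers."""
    a, b, c = tri
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            # Inside if all three edge tests agree in sign (either winding).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield (x, y)

def shade(fragment):
    """Fragment stage: compute a color per fragment (here a constant)."""
    return "#"

def render(vertices, width=8, height=8):
    """Run all stages and return the framebuffer as a list of rows."""
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    screen = transform(vertices, scale=1.0, offset=0.0)
    for tri in assemble(screen):
        for x, y in rasterize(tri, width, height):
            framebuffer[y][x] = shade((x, y))
    return framebuffer

triangle = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 8.0, 0.0)]
image = render(triangle)
print("\n".join("".join(row) for row in image))
```

In this sketch, replacing `transform` or `shade` with a different function changes the result without touching the other stages, which is loosely what a programmable vertex or fragment processor allows.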

[Figure: the same pipeline with two stages made programmable. Stages: Application (CPU) → Vertex Processor (replaces Transform & Light) → Assemble Primitives → Rasterize → Fragment Processor (replaces Shade) → Video Memory (Textures). Data passed between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth). Graphics State and the Render-to-texture path are unchanged.]

• Programmable vertex processor!
• Programmable fragment processor!

GPU Fundamentals: Modern Graphics Pipeline

[Figure: the pipeline with a third programmable stage. Stages: Application (CPU) → Vertex Processor → Geometry Processor (replaces Assemble Primitives) → Rasterize → Fragment Processor → Video Memory (Textures). Data passed between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth). Graphics State and the Render-to-texture path are unchanged.]

• Programmable primitive assembly!
• More flexible memory access!

History of Graphics Hardware (1/3)

• … - mid ’90s
  – SGI mainframes and workstations
  – PC: only 2D graphics hardware
• mid ’90s
  – Consumer 3D graphics hardware (PC): 3dfx, NVIDIA, ATI, …
  – Triangle rasterization (only)
  – Cheap: pushed by game industry
• 1999
  – PC-card with TnL (Transform and Lighting): NVIDIA GeForce (GPU)
  – PC-card more powerful than specialized workstations

[Figure: 3DFX Voodoo Graphics 4MB, 1997]

History of Graphics Hardware (2/3)

https://www.zhihu.com/question/21980949

History of Graphics Hardware (3/3)

• Modern graphics hardware
  – Graphics pipeline partly programmable
  – Leaders: AMD (ATI) and NVIDIA, e.g. “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”
  – Game consoles use similar graphics hardware (Xbox)

Computational Power

• GPUs are fast…
  – 3.0 GHz Core2 Duo (Woodcrest Xeon 5160):
    • Computation: 48 GFLOPS peak
    • Memory bandwidth: 21 GB/s peak
    • Price: $874 (chip)
  – NVIDIA GeForce 8800 GTX:
    • Computation: 330 GFLOPS observed
    • Memory bandwidth: 55.2 GB/s observed
    • Price: $599 (board)
• GPUs are getting faster, faster
  – CPUs: 1.4× annual growth
  – GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth

Comparison of GPU and CPU (1/2)
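The CPU and GPU figures quoted under Computational Power can be turned into ratios and doubling times with a few lines of arithmetic. This is a rough sketch only: the CPU figures are peak values while the GPU figures are observed values, and the prices compare a chip with a whole board, so the ratios are indicative rather than like-for-like. The dictionary keys and helper names are hypothetical.

```python
import math

# Figures quoted on the slide. CPU numbers are peak, GPU numbers are observed,
# and the prices are chip versus board, so this is not a like-for-like test.
cpu = {"gflops": 48.0, "bandwidth_gb_s": 21.0, "price_usd": 874.0}
gpu = {"gflops": 330.0, "bandwidth_gb_s": 55.2, "price_usd": 599.0}

def ratio(key):
    """GPU value divided by CPU value for one metric."""
    return gpu[key] / cpu[key]

def gflops_per_dollar(part):
    """Throughput per unit price for one part."""
    return part["gflops"] / part["price_usd"]

def years_to_double(annual_growth):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2.0) / math.log(annual_growth)

compute_ratio = ratio("gflops")
bandwidth_ratio = ratio("bandwidth_gb_s")
print(f"compute: {compute_ratio:.2f}x, bandwidth: {bandwidth_ratio:.2f}x")
print(f"GFLOPS per dollar: CPU {gflops_per_dollar(cpu):.3f}, "
      f"GPU {gflops_per_dollar(gpu):.3f}")

for label, growth in [("CPU", 1.4), ("GPU pixels", 1.7), ("GPU vertices", 2.3)]:
    print(f"{label}: doubles every {years_to_double(growth):.2f} years")
```

With the slide's growth factors, the CPU figure doubles roughly every two years, while both GPU figures double in well under a year and a half.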

[Figure: GPU versus CPU comparison. Courtesy Naga Govindaraju]

Comparison of GPU and CPU (2/2)

https://wccftech.com/nvidia-pascal-volta-gpus-sc15/

Motivation

• Why are GPUs getting faster so fast?
  – Driving forces
    • Data driven
    • AI driven
    • Science driven
    • Engineering driven
    • Customer driven
    • Game driven
    • Others…

Flexible and Precise

• Modern GPUs are deeply programmable
  – Programmable pixel, vertex, and geometry engines
  – Solid high-level language support
• Modern GPUs support “real” precision
  – 32-bit/64-bit floating point throughout the pipeline
    • High enough for many applications
  – DX10-class GPUs add 32-bit integers

Graphics Hardware Consideration (1/2)

• GPU = Graphics Processing Unit
  – Operates on 4-tuples
    • Position (x, y, z, w)
    • Color (red, green, blue, alpha)
    • Texture coordinates (s, t, r, q)
  – One 4-tuple operation per clock cycle
    • SIMD (Single Instruction, Multiple Data)
  – ADD, MUL, SUB, DIV, MADD, …

Graphics Hardware Consideration (2/2)
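The 4-tuple operations listed under Graphics Hardware Consideration (1/2) can be illustrated with a small vector type. This is a sketch of the semantics only: the `Vec4` class and its method names are hypothetical, and Python evaluates the four components one after another, whereas the slide's point is that the hardware applies a single instruction to all four components at once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec4:
    """A 4-tuple such as a position (x, y, z, w) or a color (r, g, b, a)."""
    x: float
    y: float
    z: float
    w: float

    def __add__(self, other):
        # ADD: one instruction, four component-wise additions.
        return Vec4(self.x + other.x, self.y + other.y,
                    self.z + other.z, self.w + other.w)

    def __mul__(self, other):
        # MUL: one instruction, four component-wise multiplications.
        return Vec4(self.x * other.x, self.y * other.y,
                    self.z * other.z, self.w * other.w)

    def madd(self, scale, bias):
        # MADD: multiply-add, self * scale + bias, on all four components.
        return self * scale + bias

# Modulate a color (red, green, blue, alpha) by a light intensity.
color = Vec4(1.0, 0.5, 0.25, 1.0)
light = Vec4(0.5, 0.5, 0.5, 1.0)
lit = color * light
print(lit)

# Scale and offset a position (x, y, z, w) with a single MADD.
position = Vec4(1.0, 2.0, 3.0, 4.0)
moved = position.madd(Vec4(2.0, 2.0, 2.0, 2.0), Vec4(1.0, 1.0, 1.0, 1.0))
print(moved)
```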

• Pipelining
  – Number of stages
• Parallelism
  – Number of parallel processes
• Parallelism + pipelining
  – Number of parallel pipelines

[Figure: units 1, 2, 3 arranged in series (pipelining), side by side (parallelism), and as several parallel three-stage pipelines (parallelism + pipelining)]

Outline

• GPU Pipeline
• History of GPU Hardware
• GPU Hardware Consideration
• Modern GPU Hardware Architecture
  – NVIDIA GeForce
  – AMD (ATI) Radeon
  – IMG PowerVR
  – ARM Mali
• Summary

Growth of NVIDIA GPU

• Performance metrics
  – Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.

Growth of NVIDIA GPU

NVIDIA GeForce 7900 GTX

NVIDIA Graphics Card Architecture

• GeForce-8 Series
  – 12,288 concurrent threads, hardware managed
  – 128 Thread Processor cores at 1.35 GHz = 518 GFLOPS peak
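The 518 GFLOPS peak figure can be reproduced from the core count and clock rate. The slide does not say how many floating-point operations each core issues per clock; the sketch below assumes 3 (a multiply-add counted as 2, plus one multiply), which is the value that matches the quoted peak. The function name is hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_clock):
    """Peak throughput = cores x clock (GHz) x FLOPs issued per core per clock."""
    return cores * clock_ghz * flops_per_core_per_clock

CORES = 128               # Thread Processor cores (from the slide)
CLOCK_GHZ = 1.35          # core clock (from the slide)
FLOPS_PER_CLOCK = 3       # ASSUMPTION: multiply-add (2 FLOPs) + multiply (1 FLOP)
CONCURRENT_THREADS = 12288  # hardware-managed threads (from the slide)

peak = peak_gflops(CORES, CLOCK_GHZ, FLOPS_PER_CLOCK)
threads_per_core = CONCURRENT_THREADS // CORES

print(f"peak: {peak:.1f} GFLOPS")
print(f"threads in flight per core: {threads_per_core}")
```

Keeping many more threads in flight than there are cores is what lets the hardware switch to ready work while other threads wait on memory.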

[Figure: GeForce 8 block diagram. Host CPU → Work Distribution → 16 processing columns, each with an IU, SPs, and Shared Memory; 8 TF units, each with a TEX L1 cache; 6 L2 slices, each attached to a Memory partition]

NVIDIA Roadmap

https://videocardz.com/specials/roadmaps
09/02/11

NVIDIA Roadmap
https://3cjohnhardware.wordpress.com/2018/08/28/nvidia-7nm/

NVIDIA FERMI

FERMI: Streaming Multiprocessor (SM)

• Each SM contains
  – 32 cores
  – 16 load/store units
  – 32,768 registers
• Newer FP representation
  – IEEE 754-2008
• Each core has two units
  – Floating point
  – Integer

FERMI: Results

FERMI: Comparison

Kepler: Core Architecture
http://www.weistang.com/article-941-1.html

Maxwell: Core Architecture
http://www.weistang.com/article-941-1.html

http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

Kepler vs Maxwell Comparison
[Figure: Kepler (2012) and Maxwell (2014)]
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

Pascal: Core Architecture
https://read01.com/zh-tw/oemmE4.html#.Wi5F30qWYps

Volta: Core Architecture
http://technews.tw/2017/05/11/nvidia-gpu-volta/

Pascal vs Volta Comparison
[Figure: Pascal (2016) and Volta (2017)]
http://technews.tw/2017/05/11/nvidia-gpu-volta/

Ampere: Core Architecture
https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/

Volta vs Ampere Comparison
https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/

Mobile Roadmap

http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
https://zh.wikipedia.org/wiki/CUDA

ATI Radeon 9700

• Parallelism + pipelining: ATI Radeon 9700
  – 4 vertex pipelines
  – 8 pixel pipelines

https://technews.tw/2020/03/10/amd-gpu-dual-track/

AMD Roadmap
https://videocardz.com/specials/roadmaps
http://wccftech.com/amd-vega-4096-gcn-stream-processors/
http://www.anandtech.com/show/9233/amds-2016-gpu-roadmap-finfet-high-bandwidth-memory
https://www.youtube.com/watch?v=l_f_lIF3A7Q
http://imgtec.eetrend.com/content/2019/100045623.html

http://imgtec.eetrend.com/news/7355

IMG PowerVR 8XE Plus
http://www.anandtech.com/show/11028/powervr-8xe-plus-announced

ای/-powervrپردازندهی-گرافیکی-گوشی-مقایسه-/http://intotech.ir/phone-tablet/proccessor

IMG PowerVR 9XE
https://qooah.com/2017/09/26/imagination-launch-powervr-9/
https://kknews.cc/tech/86m6g6n.html

Features of ARM Mali
http://www.arm.com/products/graphics-and-multimedia/mali-gpu

ARM Mali
http://www.grdkingdom.com/2013/10/armt760-400-gpu.html

ARM G7X Series
http://www.semiinsights.com/s/electronic_components/23/37012.shtml

ARM Mali-200

2007

ARM Mali-300

ARM Mali-400MP
2008

ARM Mali-450MP
2012

ARM Mali-T604

• GPGPU (supports OpenCL 1.1).
• Tri-pipe architecture.
• The first GPU based on the Midgard architecture.
• True IEEE double-precision floating-point math in hardware for Full Profile.
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU.
• 5x performance improvement over previous Mali graphics processors.

ARM Mali-T624
2012

3/16/2021

ARM Mali-T678

• 50% performance improvement compared to the Mali-T658.

ARM Mali-T760
2013

ARM Mali-T880
2016

ARM G7X Series
http://www.semiinsights.com/s/electronic_components/23/37012.shtml

Applications (1/5)

• Many applications, including:
  – Ray tracer
  – Image segmentation
  – FFT/linear algebra

http://f.fwallpapers.com/images/3d-bunny.jpg
http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg

Applications (2/5)
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2

Applications (3/5)
http://5pit.tw/tech/computer/tid_12880

Applications (4/5)
AR and VR Applications
http://wechatinchina.com/thread-461154-1-1.html

Applications (5/5)
http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm

Can GPUs Solve ALL Problems?
https://www.youtube.com/watch?v=6YPWrgCLiLA

Summary

• Understand the GPU pipeline in depth
• Understand the motivation behind GPU hardware
• Understand modern GPU hardware architecture and specifications
• Understand GPU/GPGPU applications and key problems

Reference

• GPU Architecture & CG, Mark Colbert, 2006
• Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
• GPU Tutorial, Yiyunjin, 2007
• Evolution of GPU and Graphics Pipelining, Weijun Xiao
• Commercial product websites (NVIDIA, ATI, IMG, ARM)
• Referencing SIGGRAPH 2005 Course Notes from David Luebke
• Adapted from: David Luebke (University of Virginia) and NVIDIA
• Jan Verschelde, MCS 572 Lecture 27, Introduction to Supercomputing, 17 March 2014
• Acknowledgement: Thanks to the TAs for their help in preparing the material.