The GPU Computing Revolution

From Multi-Core CPUs to Many-Core Graphics Processors

A Knowledge Transfer Report from the London Mathematical Society and Knowledge Transfer Network for Industrial Mathematics

By Simon McIntosh-Smith

Copyright © 2011 by Simon McIntosh-Smith

Front cover image credits:

Top left: Umberto Shtanzman / Shutterstock.com
Top right: godrick / Shutterstock.com
Bottom left: Double Negative Visual Effects
Bottom right: University of Bristol
Background: Serg64 / Shutterstock.com


Contents Page

Executive Summary 3

From Multi-Core to Many-Core: Background and Development 4

Success Stories 7

GPUs in Depth 11

Current Challenges 18

Next Steps 19

Appendix 1: Active Researchers and Practitioner Groups 21

Appendix 2: Applications Available on GPUs 23

References 24

September 2011

A Knowledge Transfer Report from the London Mathematical Society and the Knowledge Transfer Network for Industrial Mathematics Edited by Robert Leese and Tom Melham

London Mathematical Society, De Morgan House, 57–58 Russell Square, London WC1B 4HS
KTN for Industrial Mathematics, Surrey Technology Centre, Surrey Research Park, Guildford GU2 7YG

AUTHOR

Simon McIntosh-Smith is head of the Microelectronics Research Group at the University of Bristol and chair of the Many-Core and Reconfigurable Supercomputing Conference (MRSC), Europe’s largest conference dedicated to the use of massively parallel computer architectures. Prior to joining the university he spent fifteen years in industry, where he designed massively parallel hardware and software at companies such as Inmos, STMicroelectronics and Pixelfusion, before co-founding ClearSpeed as Vice-President of Architecture and Applications. In 2006 ClearSpeed’s many-core processors enabled the creation of one of the fastest and most energy-efficient supercomputers: TSUBAME at Tokyo Tech. He has given many invited talks on how massively parallel, heterogeneous processors are revolutionising high-performance computing, and has published numerous papers on how to exploit the speed of graphics processors for scientific applications. He holds eight patents in parallel hardware and software and sits on the steering and programme committees of most of the well-known international high-performance computing conferences.

www.cs.bris.ac.uk/home/simonm/
[email protected]

Executive Summary

Computer architectures are undergoing their most radical change in a decade. In the past, processor performance has been improved largely by increasing clock speed — the faster the clock speed, the faster a processor can execute instructions, and thus the greater the performance that is delivered to the end user. This drive to greater and greater clock speeds has stopped, and indeed in some cases is actually reversing. Instead, computer architects are designing processors that include multiple cores that run in parallel. For the purposes of this report we shall think of a core as one independent ‘brain’, able to execute its own stream of instructions, independently fetching the data it requires and writing its own results back to a main memory. This shift from increasing clock speeds to increasing core counts — and thus increasing parallelism — is presenting significant new opportunities and challenges.

During this architectural shift a new class of many-core computer architectures is emerging from the graphics market into the mainstream. This new class of processor already exploits hundreds or even thousands of cores. Many-core processors potentially offer an order of magnitude greater performance than conventional processors, a significant increase that, if harnessed, can deliver major competitive advantages.

But to take advantage of these powerful new many-core processors, most users will have to radically redesign their software, potentially needing completely new algorithms that can exploit massive parallelism.

The many-core opportunity

For computationally challenging problems in business and scientific research, these new, highly parallel, many-core architectures represent a technological breakthrough that can deliver significant benefits in science and business. Such a step-change in available computing performance could enable faster financial trades, higher-resolution simulations, the design of more competitive products such as increasingly aerodynamic cars, and more effective drugs for the treatment of disease.

Thus new many-core processors represent one of the biggest advances in computing in recent history. They also indicate the direction of all future processor design — we cannot avoid this major paradigm shift into massive parallelism. All processors from all the major vendors will become many-core designs over the next few years. Whether this is many identical mainstream CPU cores or a heterogeneous mix of different kinds of core remains to be seen, but it is inevitable that there will be lots of cores in each processor and, importantly, that to harness their performance, most software and algorithms will need redesigning.

There is significant competitive advantage to be gained for those who can embrace and exploit this new, many-core technology, and considerable risk for any who are unable to adapt to a many-core approach. This report explains the background to these developments, presents several success stories, and describes a number of ways to get started with many-core technologies.

From Multi-Core to Many-Core: Background and Development

Historically, advances in computer hardware have delivered performance improvements that have been transparent to the end user – software has run more quickly on newer hardware without having to make any changes to that software. We have enjoyed decades of ‘single thread’ performance improvement, which relied on converting exponential increases in available transistors (known as Moore’s Law [83]) into increasing processor clock speeds and advances in microprocessor pipelining, instruction-level parallelism and out-of-order execution. All these technology developments are essentially ‘under the hood’, with existing software generally performing faster on next generation hardware without programmer intervention. Most of these developments have now hit a wall.

In 2003 it started to become apparent that several fundamental physical limitations were going to change everything. Herb Sutter noticed this significant change in his widely cited 2004 essay ‘The Free Lunch is Over’ [110]. Figure 1 is taken from Sutter’s paper, updated in 2009, and shows how single thread performance increases, based on increasing clock speed and instruction level parallelism improvements, could only have continued at the cost of significant increases in processor power consumption. Something had to change — the free lunch was over.

The answer was, and is, parallelism. It transpires that, for various semiconductor physics-related reasons, a processor containing two 3GHz cores will use significantly less energy (and thus dissipate less waste heat) than an alternative processor containing a single 6GHz core. Yet in theory the dual-core 3GHz device can deliver the same performance as the single 6GHz core (much more on this later). Today even laptop processors contain at least two cores, with most mainstream server CPUs already containing four or six cores. Thus, due to the power consumption problem, increasing core counts have replaced increasing clock speeds as the main method of delivering greater hardware performance. (For more details see Box 1 later.)

Of course the important implication of ‘the free lunch is over’ is that most existing software will not just go faster when we increase the number of cores inside a processor, unless you are in the fortunate position that your performance-critical software is already ‘multi-core aware’.

So, hardware is suddenly and dramatically changing: it is delivering ever more performance, but is now doing so through rapidly increasing parallelism.

Figure 1: Graph of four key technology trends from ‘The Free Lunch is Over’: transistors per processor, clock speed, power consumption and instruction level parallelism [110].

Software will therefore need to change radically in order to exploit these new parallel architectures. This shift to parallelism represents one of the largest challenges that software developers have faced. Modifying existing software to make it parallel is often (but not always) difficult. Now might be a good time to re-engineer many older codes from scratch.

Since hardware designers gave up on trying to maintain the free lunch, incredible advances in computer hardware design have been made. While mainstream CPUs have been embracing parallelism relatively slowly as the majority of software takes time to change to this new model, other kinds of processors have exploited parallelism much more rapidly, becoming massively parallel and delivering large performance improvements as a result.

The most significant class of processor to have fully embraced massive parallelism are graphics processors, often called Graphics Processing Units or GPUs. Contemporary GPUs first appeared in the 1990s and were originally designed to run the 2D and 3D graphical operations used by the burgeoning computer games market. GPUs started as fixed-function devices, designed to excel at graphics-specific functions. GPUs reached an inflection point in 2002, when the main graphics standards, OpenGL [68] and DirectX [82], started to require general-purpose programmability in order to deliver advanced special effects for the latest computer games. The mass-market forces driving the development of GPUs (over 525 million graphics processors were sold in 2010 [63]) drove rapid increases in GPU performance and programmability. GPUs quickly evolved from their fixed-function origins into fully-programmable, massively parallel processors (see Figure 2). Soon many developers were trying to exploit these low-cost yet incredibly high-performance GPUs to solve non-graphics problems. The term ‘General-Purpose computation on Graphics Processing Units’, or GPGPU, was coined by Mark Harris in 2002 [51], and since then GPUs have continued to become more and more programmable.

Today, GPUs represent the pinnacle of the many-core hardware design philosophy. A modern GPU will consist of hundreds or even thousands of fairly simple processor cores. This degree of parallelism on a single processor is usually called ‘many-core’ in preference to ‘multi-core’ — the latter term is typically applied to processors with at most a few dozen cores.

It is not just GPUs that are pushing the development of parallel processing. Led by Moore’s Law, mainstream processors are also on an inexorable advance towards many-core designs. AMD started shipping a 12-core mainstream CPU codenamed Magny-Cours in 2010 [10], while in Spring 2011 Intel launched their 10-core Westmere EX x86 CPU [102].

Figure 2: The evolution of GPUs from fixed function pipelines on the left, to fully programmable arrays of simple processor cores on the right [69].

Some server vendors are using these latest multi-core x86 CPUs to deliver machines with 24 or even 48 cores in a single 1U ‘pizza box’ server. Early in 2010 Intel announced a 48-core x86 prototype processor which they described as a ‘cloud on a chip’ [61]. These many-core mainstream processors are probably the future of server CPUs, as they are well suited to large-scale, highly parallel workloads of the sort performed by the Googles and Amazons of the internet industry.

But perhaps the most revealing technology trend is that many of us are already carrying heterogeneous many-core processors in our pockets. This is because smartphones have quickly adopted this new technology to deliver the best performance possible while maintaining energy efficiency.

It is the penetration of many-core architectures into all three predominant processor markets — consumer electronics, personal computers and servers — that demonstrates the inevitability of this trend. We now have to pay for our lunch and embrace explicit parallelism in our software but, if embraced fully, exciting new hardware platforms can deliver breakthroughs in performance and energy efficiency.

Box 1: The Power Equation

Processor power consumption is the primary hardware issue driving the development of many-core architectures. Processor power consumption, P, depends on a number of factors, which can be described by the power equation,

P = ACV²f,

where A is the activity of the basic building blocks, or gates, in the processor, C is the total capacitance of the gates’ outputs, V is the processor’s voltage and f is its clock frequency. Since power consumption is proportional to the square of the processor’s voltage, one can immediately see that reducing voltage is one of the main methods for keeping power under control. In the past voltage has continually been reducing, most recently to just under 1.0 volt. However, in the types of CMOS process from which most modern silicon chips are made, voltage has a lower limit of about 0.7 volts. If we try to turn voltage down lower than this, the transistors that make up the device simply do not turn on and off properly.

Revisiting the power equation, one can further see that, as we build ever more complex processors using exponentially increasing numbers of transistors, the total gate capacitance C will continue to increase. If we have lost the ability to compensate for this increase by reducing V then there are only two variables left we can play with: A and f, and it is likely that both will have to reduce. The many-core architecture trend is an inevitable outcome from this power equation: Moore’s Law gives us ever more transistors, pushing up C, but CMOS transistors have a lower limit to V which we have now reached, and thus the only effective way to harness all of these transistors is to use more and more cores at lower clock frequencies.

For more detail on why power consumption is driving hardware trends towards many-core designs we refer the reader to two IEEE Computer articles: Mudge’s 2001 ‘Power: a first-class architectural design constraint’ [85] and Woo and Lee’s ‘Extending Amdahl’s Law for Energy-Efficient Computing in the Many-Core Era’ from 2008 [121].
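To make the trade-off concrete, the short C program below plugs illustrative numbers into the power equation and compares one hypothetical 6GHz core against two 3GHz cores. The activity, capacitance and voltage values are invented for the sake of the example (real figures depend on the process and the design), but they show how halving the clock frequency, and with it the supply voltage, substantially reduces power for the same nominal throughput.

/* Illustrative sketch only: plausible but made-up numbers in P = A*C*V^2*f */
#include <stdio.h>

static double power(double A, double C, double V, double f)
{
    return A * C * V * V * f;   /* the power equation from Box 1 */
}

int main(void)
{
    double A = 0.1;      /* activity factor (invented) */
    double C = 1e-7;     /* total gate capacitance in farads (invented) */

    /* one core at 6 GHz needs a higher voltage than two cores at 3 GHz */
    double single = power(A, C, 1.2, 6.0e9);
    double dual   = 2.0 * power(A, C, 0.9, 3.0e9);

    printf("single 6 GHz core : %.1f W\n", single);  /* about 86 W  */
    printf("two 3 GHz cores   : %.1f W\n", dual);    /* about 49 W  */
    return 0;
}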

Success Stories

Before we analyse a few many-core success stories, it is important to address some of the hype around GPU acceleration. As is often the case with a new technology, there have been lots of inflated claims of performance speedups, some even claiming speedups of hundreds of times compared to regular CPUs. These claims almost never stand up to serious scrutiny and usually contain at least one serious flaw. Common problems include: optimising the GPU code much more than the serial code; running on a single host core rather than all of the host cores at the same time; not using the best compiler optimisation flags; not using the vector instruction set on the CPU (SSE or AVX); using older, slower CPUs; etc. But if we just compare the raw capabilities of the GPU with the CPU then typically there is an order of magnitude advantage for the GPU — not two or three orders of magnitude. If you see speedup claims of greater than about a factor of ten then be suspicious. A genuine speedup of a factor of ten is of course a significant achievement and is more than enough to make it worthwhile investing in GPU solutions.

There is one more common issue to consider, and it is to do with the size of the problem being computed. Very parallel systems such as many-core GPUs achieve their best performance when solving problems that contain a correspondingly high degree of parallelism, enough to give every core enough work so that they can operate at close to their peak efficiency for long periods. For example, when accelerating linear algebra, one may need to be processing matrices of the order of thousands of elements in each dimension to get the best performance from a GPU, whereas on a CPU one may only need matrix dimensions of the order of hundreds of elements to achieve close to their best performance (see Box 2 for more on linear algebra). This discrepancy can lead to apples-to-oranges comparisons, where a GPU system is benchmarked on one size of problem while a CPU system is benchmarked on a different size of problem. In some cases the GPU system may even need a larger problem size than is really warranted to achieve its best performance, again resulting in flawed performance claims.

With these warnings aside let us look at some genuine success stories in the areas of computational fluid dynamics for the aerospace industry, molecular docking for the pharmaceutical industry, options pricing for the financial services industry, special effects rendering for the creative industries, and data mining for large-scale electronic commerce.

Computational fluid dynamics on GPUs: OP2

Finding solutions of the Navier-Stokes equations in two and three dimensions is an important task for mathematically modelling and analysing flows in fluids, such as one might find when considering the aerodynamics of a new car or when analysing how airborne pollution is blown around the tall buildings of a modern city. Computational fluid dynamics (CFD) is the discipline of using computer-based models for solving the Navier-Stokes equations. CFD is a powerful and widely used tool which also happens to be computationally very expensive. GPU computing has thus been of great interest to the CFD community.

Prof. Mike Giles at the University of Oxford is one of the leading developers of CFD methods that use unstructured grids to solve challenging engineering problems. This work has led to two significant software projects: OPlus, a code designed in collaboration with Rolls-Royce to run on clusters of commodity processors utilising message passing [28], and more recently OP2, which is being designed to utilise the latest many-core architectures [42].

OP2 is an example of a very interesting class of application which aims to enable the user to work at a high level of abstraction while delivering high performance. It achieves this by providing a framework for implementing efficient unstructured grid applications. Developers write their code in a familiar programming language such as C, C++ or Fortran, specifying the important features of their unstructured grid problem at a high level. These features include the nodes, edges and faces of the unstructured grid, flow variables, the mappings from edges to nodes, and parallel loops. From this information OP2 can automatically generate optimised code for a specific target architecture, such as a GPU using the new parallel programming languages CUDA or OpenCL, or multi-core CPUs using OpenMP and vectorisation for their SIMD instruction sets (AVX and SSE) [58]. We will cover all of these approaches to parallelism in more detail later in this report.
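To give a flavour of this style of programming, the sketch below shows roughly how an unstructured-grid loop might be expressed. It is loosely modelled on OP2's published C interface, but the declarations, argument orders and access descriptors shown here are illustrative assumptions rather than the exact OP2 API; consult the OP2 documentation for the real interface.

/* Illustrative OP2-style sketch (not the verbatim OP2 API): the user writes a
   small per-edge kernel and declares the mesh; the framework then generates
   CUDA, OpenCL or OpenMP code for the parallel loop over all edges. */

void scatter_flux(const double *edge_flux, double *res_left, double *res_right)
{
  /* per-edge computation: add this edge's flux to one node, subtract from the other */
  *res_left  += *edge_flux;
  *res_right -= *edge_flux;
}

/* ... after reading the mesh: declare sets, mappings and data ... */
op_set nodes = op_decl_set(num_nodes, "nodes");
op_set edges = op_decl_set(num_edges, "edges");
op_map e2n   = op_decl_map(edges, nodes, 2, edge_to_node, "edge_to_node");
op_dat flux  = op_decl_dat(edges, 1, "double", edge_flux_data, "flux");
op_dat res   = op_decl_dat(nodes, 1, "double", node_res_data,  "res");

/* parallel loop over every edge; the framework handles data movement,
   conflict avoidance and code generation for the target architecture */
op_par_loop(scatter_flux, "scatter_flux", edges,
            op_arg_dat(flux, -1, OP_ID, 1, "double", OP_READ),
            op_arg_dat(res,   0, e2n,   1, "double", OP_INC),
            op_arg_dat(res,   1, e2n,   1, "double", OP_INC));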

At the time of writing, OP2 is still a work in progress, in collaboration with Prof. Paul Kelly at Imperial College, but early results are already impressive. Using an airfoil test code with a single-precision 2D Euler equation, a problem with 0.75 million cells and 1.5 million edges has been solved in 7.4 seconds using a single C2070 GPU. Double-precision performance was roughly half this, taking 15.4 seconds on the same problem. When OP2 generates similarly optimised code for a pair of contemporary Intel ‘Westmere’ 6-core 2.67GHz X5650 CPUs, the same problem takes 34.2 seconds to solve in single precision and 44.6 seconds in double precision. One GPU has therefore delivered impressive relative speedups of 4.6X and 2.9X for single and double precision respectively, compared to 12 top-of-the-range CPU cores.

OP2 is being developed as an open source project and potential users can contact Prof. Giles via the project webpage [44].

Other CFD examples — Another successful project using GPUs for CFD has been Turbostream, developed by Tobias Brandvik and Dr Graham Pullan in the Whittle Laboratory at the University of Cambridge [18, 19]. In contrast to OP2’s unstructured grid approach, Turbostream solves the Navier-Stokes equations by discretising the problem onto a structured grid and then performing explicit time-stepping to solve the resulting partial differential equations (PDEs) using a three-dimensional stencil operation.

Turbostream is another example of an approach that enables the programmer to work at a higher level of abstraction: rather than programmers having to write complex, low-level code themselves, the programmer specifies the required stencil operations in a high-level description from which Turbostream automatically generates optimised code targeting the latest GPUs and multi-core CPUs.

Turbostream has achieved impressive performance improvements, with speedups in the range of 10–20X on the latest NVIDIA GPUs compared with optimised code running on a fast quad-core Intel CPU. Additionally Turbostream has shown good scalability, using up to 64 GPUs at once. Turbostream is available to license and the reader is referred to the project website for more information [20].

Others have also been achieving great success in using GPUs to accelerate CFD problems. BAE Systems has announced it is using a GPU-accelerated cluster to deliver near-interactive speeds for their CFD simulations [37]. A team at Stanford University was recently successful in modelling hypersonic airflows using a GPU-based system [33].

These early successes indicate that CFD is an application area that should see increasing benefit from GPUs in the future; for additional information see the CFD Online website [24].

Molecular docking using GPUs: BUDE

The University of Bristol has been at the forefront of molecular docking for the past decade [41]. Molecular docking is the process of simulating the interaction between two molecules, where one molecule is typically a protein of medical interest in the human body, and the second represents a potential new drug.

The Bristol University Docking Engine (BUDE) was originally developed to run on a commodity cluster of x86-based CPUs, but as early as 2006 it was ported to one of the first generation of many-core processors, the CSX architecture from ClearSpeed [75].

Figure 3: GPUs have enabled BAE Systems to investigate the aerodynamic performance of aircraft through advanced CFD simulations (source: BAE Systems).

Figure 4: BUDE molecular docking performance in seconds (lower is better).

In 2010 BUDE was ported to the OpenCL parallel programming language. Figure 4 compares the performance of this OpenCL code running on a GPU to the original, optimised Fortran code running on a fast, quad-core CPU. The results for both performance and energy efficiency were compelling for the GPU-based system: NVIDIA C2050 GPUs gave a real-world speedup of 4.0X versus a fast quad-core CPU while using about one half of the energy to complete the same work [76, 77].

An important benefit of using the new OpenCL parallel programming language was the ability to run exactly the same code on a range of different GPUs from different vendors, and even on multi-core x86 CPUs. Thus BUDE can now use whichever hardware is the fastest available at any given time. BUDE can also use GPUs and CPUs at the same time to deliver the maximum possible aggregate performance.

Options pricing using GPUs: NAG

One of the earliest application areas explored using GPUs was Monte-Carlo-based numerical integration for derivative pricing and risk management in financial markets. The primary need in this application is the ability to generate rapidly high-quality pseudo-random or quasi-random numbers. Fortunately parallel algorithms for generating these kinds of random numbers already exist, and several of these algorithms are well matched to the GPU’s many-core architecture.

The Numerical Algorithms Group (NAG) is a provider of high-quality numerical software libraries. NAG is one of the first vendors to support GPUs – their GPU-accelerated pseudo-random number generators show speedups of between 5X and 34X when running on NVIDIA’s C2050 GPU, compared to Intel’s highly optimised MKL library when running on all eight cores of a contemporary Intel Core i7 860 operating at 2.8GHz [17]. The exact speedup of these routines depends on the kind of random number generator being used (MRG32k3a, Sobol or MT19937) and also the distribution required (uniform, exponential or normal). However, in modern financial models, generating the random numbers is typically only a fraction of the overall task, with the models themselves often requiring considerable computation to evaluate. Since the Monte Carlo method is inherently parallel, the entire simulation can be performed on the GPU, where the abundance of computing power means that these complex models can be evaluated very rapidly. This capability has attracted several users from the financial services industry, including several of the more sophisticated insurance companies. This class of potential GPU user is faced with modern risk management requirements such as ‘Solvency II’, which require large amounts of simulation. When combined with the complex cashflow calculations inherent in insurance portfolios, this application becomes massively parallel and very compute-intensive, making it well suited to modern GPUs.

NAG has published the results it achieved while working with two of its tier-one financial services customers. One of these customers has used NAG’s GPU-accelerated random number generator library to speed up its ‘local vol’ code and saw speedups of approximately ten times compared to a fast quad-core x86 CPU.

In summary, the generation of large sets of pseudo-random numbers for Monte Carlo methods has been one application area where GPUs have had a big impact. Speedups for the random number generation routines of 100X or more versus a single CPU core are regularly achieved, often translating into real-world performance improvements of 10X or more compared to a host machine with a total of eight cores, as in the NAG cases described above.
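The reason Monte Carlo methods map so well onto GPUs is visible even in a toy serial pricer like the one below: every sample of the terminal asset price is independent, so on a GPU each iteration of the sample loop simply becomes its own work-item or thread. The payoff, model parameters and random-number source here are our own illustrative choices and are unrelated to NAG's library.

/* Toy Monte Carlo pricer for a European call under geometric Brownian motion.
   Every sample is independent, which is why the loop maps directly onto
   thousands of GPU threads, given a parallel random-number generator. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    const double S0 = 100.0, K = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
    const double PI = 3.14159265358979323846;
    const int nsamples = 1000000;
    double sum = 0.0;

    srand(12345);
    for (int i = 0; i < nsamples; i++) {
        /* crude normal variate via Box-Muller; a real code would use
           MRG32k3a, Sobol, MT19937 or similar */
        double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
        double z  = sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);

        double ST = S0 * exp((r - 0.5 * sigma * sigma) * T + sigma * sqrt(T) * z);
        double payoff = (ST > K) ? (ST - K) : 0.0;
        sum += payoff;
    }
    printf("estimated option price: %f\n", exp(-r * T) * sum / nsamples);
    return 0;
}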

Have we just ignored the earlier warning about excessive speedups? Not quite — this is a rare instance where the GPU really does have an additional performance advantage over the CPU. GPUs uniquely include dedicated hardware for executing certain functions very quickly, in only a few clock cycles. These functions include exponential, sine, cosine, square root and inverse square root amongst others. Many of these functions are used extensively when generating pseudo-random numbers, and so in this instance GPUs really do have a greater performance advantage over CPUs than the peak speed comparison would indicate.

For an overview of GPUs in computational finance see [43].

Special effects rendering using GPUs: Double Negative

London-based Double Negative is the largest visual effects company in Europe [31]. Their Oscar-winning work can be seen in many of the big blockbuster movies over the last twelve years. One of the biggest industrial users of high-performance computing, visual effects companies require datacentres with thousands of servers to produce their computer-generated visuals.

Water, smoke and fire-based effects have become increasingly popular in movies over the last few years, with Double Negative’s proprietary fluid solver ‘Squirt’ being a key part of their success. The image of the volcanic explosion on the front cover of this report was generated by Double Negative using this software. The computationally expensive process of solving the Navier-Stokes equations required by these fluid-based effects has led Double Negative to investigate the potential of GPUs. Double Negative recently added a GPU-based Poisson solver to their fluid simulations, resulting in as much as a 20X speedup of this component of their workflow. These results motivated Double Negative to install their first GPU-based render-farm.

Double Negative is now developing a specialist language and compiler to make GPUs easier to use for their visual effects artists. Results are again impressive, with a 26X speedup on an NVIDIA C2050 GPU compared to the original single-threaded C++ code running on one core of a 3.33GHz six-core Intel X5680 CPU (the speedup would be about 4.4X if all six cores of the host could be used at once with perfect scaling on the CPU) [14]. These increases in performance have motivated Double Negative to investigate using GPUs for other applications, such as image manipulation tools and deformers, in a bid to accelerate as much as possible of their visual effects workflow.

Accelerating database operations using GPUs: 3DMashUp

It may be clear that many-core architectures will shine on computationally intensive codes, but what about more data-driven applications?

Researchers are now looking at using GPU architectures to improve the performance of computationally intensive database operations. One of the first companies developing this technology is the California-based 3DMashUp. 3DMashUp has developed a solution using a PostgreSQL-based relational database that embeds GPU code alongside data within the database itself. Users can query the GPU-aware database, transparently causing the GPU code to be executed on the data, all the while remaining within the context of the database.

This approach has a number of benefits, one of which is that the data and the code can remain resident in the database at all times. The overhead of calling OpenCL GPU code from within the database is just 4 microseconds, making this a convenient and efficient way to exploit GPUs for large database users with computationally expensive processing requirements. It also brings GPU acceleration to mainstream markets. For example, users who have very large image databases may now process these images using GPUs just by integrating the appropriate 3DMashUp methods and installing the latest GPUs in their system. Users of the database do not have to care about GPU programming at all in this scenario; their tasks ‘just run faster’.

The 3DMashUp solution also transparently handles mapping the most performance-critical data into the GPU’s memory as and when it is needed, removing the need for users to have to concern themselves with these low-level programming details.

This approach appears to be very promising and is one of the few examples where the benefits of GPU acceleration can be invisibly provided to end users, in this case with computationally intensive, large data problems that use databases to store and manage that data. For more information see the 3DMashUp website [26].

GPUs in Depth

While several different many-core architectures have emerged during the last few years, we will focus on GPU-based many-core systems for the following discussion. However, almost all the principles and terminology discussed in this context apply equally to the other many-core architectures, including x86 CPUs and consumer electronics products as previously described.

A modern GPU is a many-core device, meaning it will contain hundreds or even thousands of simple yet fully programmable cores, all on a single chip. These cores are often grouped together into homogeneous sets with varying names. The emerging C-based many-core programming language OpenCL [67] calls these simple cores ‘Processing Elements’ or PEs, and the groupings of these PEs ‘Compute Units’ or CUs. Another common feature of all many-core architectures is a multi-level memory hierarchy. Typically each Processing Element will have a small amount of its own private memory. There is often a larger memory per Compute Unit that can be used as shared memory between all that Compute Unit’s Processing Elements. There will also usually be a global memory which can be seen by all the Compute Units and thus by all the Processing Elements. Finally, there is usually a separate memory for the host processor system. OpenCL refers to these four levels in the memory hierarchy as Private, Local, Global and Host, respectively. Figure 5 illustrates the OpenCL memory hierarchy terminology. We will adopt OpenCL’s terminology for the rest of this report.

The GPU itself is integrated into a ‘host system’. This might mean the GPU is on its own add-in board, plugged into a standard PCI Express expansion slot within a server or desktop computer. Alternatively, the GPU may be integrated alongside the host CPU, as found inside high-end laptops and smartphones today. Increasingly in the future we will see the many-core GPU tightly integrated onto the same chip as the host CPU; in June 2011 AMD officially launched their first ‘Fusion’ CPU, codenamed Llano, that integrates a quad-core x86 CPU with a many-core GPU capable of running OpenCL programs [104]. NVIDIA already has consumer-level ‘Fusion’ devices in its CPU+GPU product line, but at SuperComputing 2010 in New Orleans they announced ‘Project Denver’. This is NVIDIA’s programme for a high-end Fusion-class device, integrating their cutting-edge GPUs with new, high-end ARM cores [84]. The first products will not arrive for several years, but NVIDIA has indicated that this ‘fusion’ approach of integrating their GPUs alongside their own ARM-based multi-core CPUs is central to their future product roadmap.

The reader may come across three very different configurations of GPU-accelerated systems, illustrated in Figure 6. These three different ways of integrating GPUs within a system may have very different performance characteristics, but their usage models are almost identical and we shall treat them in the same way for the purposes of this report. An important implication of this wide range of different approaches is that GPU-accelerated computing is not just for supercomputers — almost all systems are capable of exploiting the performance of GPUs, including laptops, tablets and even smartphones.

[Figure 5 shows work-items, each with its own private memory, grouped into work-groups that share local memory; all work-groups on the compute device share global and constant memory, with a separate host memory on the host.]

Figure 5: The memory hierarchy of a generic GPU (source: Khronos).
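The kernel below is a minimal sketch of how these memory levels appear in OpenCL source code: each work-item copies its element from global memory into a local (per-work-group) scratch buffer, the work-group co-operatively reduces it, and one work-item writes the group's partial sum back to global memory. It assumes the work-group size is a power of two and that the global size is a multiple of it; it is our illustration rather than an example from the OpenCL specification.

// Per-work-group partial sum, illustrating the Private/Local/Global levels
__kernel void partial_sum(__global const float *in,    // global memory, visible to all work-groups
                          __global float *partial,     // one result per work-group, in global memory
                          __local  float *scratch)     // local memory, shared within one work-group
{
  int gid = get_global_id(0);   // private variables live in each work-item's private memory
  int lid = get_local_id(0);

  scratch[lid] = in[gid];                       // global -> local
  barrier(CLK_LOCAL_MEM_FENCE);                 // synchronise the work-group

  for (int offset = get_local_size(0) / 2; offset > 0; offset /= 2) {
    if (lid < offset)
      scratch[lid] += scratch[lid + offset];    // tree reduction in local memory
    barrier(CLK_LOCAL_MEM_FENCE);
  }

  if (lid == 0)
    partial[get_group_id(0)] = scratch[0];      // local -> global
}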

Figure 6: Three different methods of integrating GPUs: add-in graphics cards for a workstation, an integrated but discrete GPU within a laptop, and a fully integrated, single-chip CPU and GPU (source: NVIDIA, AMD).

What are the performance benefits of the current many-core GPU systems? Just how fast are they? GPUs are still so new to the area of scientific computing that no widely used standard benchmarks exist for them. This makes it very difficult for us to compare their performance to each other, or to traditional CPUs. Before GPU-oriented benchmarks mature, we have to ‘make do’ with much less accurate indicators of performance, such as peak hardware capabilities (FLOPS, bandwidth etc). There are some early application performance comparisons we can use too; the ‘Success Stories’ section of this report gives some real-world examples.

Let us consider three different systems as points of reference: a laptop, a desktop and a server. The laptop on which this report was written is a contemporary Apple MacBook Pro and has a dual-core CPU running at 2.4 GHz. Its peak performance is around 40 GFLOPS (GigaFLOPS), measured in single-precision floating point operations per second — 1 GFLOP is one billion (10⁹) floating point operations per second. Double-precision floating point performance is typically one half that of single precision performance on CPUs. The other important hardware characteristic for performance is memory bandwidth — the same laptop has a peak bandwidth to its main memory of about 8 GBytes/s. Under the author’s desk is a PC containing a reasonably fast CPU: a dual-core x86 running at 3.16 GHz. This desktop’s peak speed is nearly 60 GFLOPS single-precision and its peak memory bandwidth is about 12 GBytes/s. The fastest mainstream x86 server CPUs available on the market in the Spring of 2011 were Intel’s six-core Westmere-EX devices which, at their fastest speeds of 2.4 GHz, are capable of peak speeds of around 230 GFLOPS per CPU for single precision, and of over 60 GBytes/s of main memory bandwidth. The Westmere figures are truly impressive, although these high-end CPUs are both expensive (with a list price per CPU of $4,616 at launch) and power-hungry (130W for the CPU alone). Even so, a single one of these high-end CPUs would have qualified as a supercomputer in its own right just 15 or 20 years ago [79].

But many-core performance, as embodied by the GPUs, is on a different level. For example, the desktop machine mentioned above also contains an NVIDIA GTX 285 GPU. This has a peak speed of over 1,000 GFLOPS (1 TeraFLOP) and peak memory bandwidth of nearly 160 GBytes/s. Hence the GPU, which only cost a few hundred pounds, has about 17 times the peak single-precision performance and over 13 times the memory bandwidth of the host CPU in the same system. Even compared to the quad-core server CPUs found in most servers today the GPU is fast, with 10 times the peak FLOPS and over 5 times the peak memory bandwidth.

So now the opportunity should be clear: many-core GPUs can potentially offer an order of magnitude greater performance in peak compute and memory bandwidth. The challenge is translating this into real-world benefits. GPU-based computing is still in its infancy, but early indications are that for the right kinds of applications (an important proviso), real-world speedups in the range of 5 to 10 times are being achieved using GPUs.

Gaining performance: massive parallelism

The examples we have presented demonstrate that GPUs are very fast and can provide up to a tenfold improvement in performance over traditional CPUs. The trick to getting the most from these many-core architectures is parallelism, and lots of it.

Box 2: Linear Algebra

Linear algebra libraries have been a staple in scientific software since the development of the Basic Linear Algebra Subprograms (BLAS) by Kincaid and Krogh [70], amongst others, in the 1970s. Today the speed of routines for vector and matrix arithmetic is critically important to the performance of a wide range of applications. Not surprisingly, therefore, significant effort has gone into optimising linear algebra libraries for GPUs. Most vendors supply optimised BLAS and LAPACK (Linear Algebra PACKage) libraries for their hardware. CPU vendor libraries include Intel’s MKL [59], AMD’s ACML [7] and IBM’s ESSL [55]. GPU vendor libraries include NVIDIA’s CuBLAS [93] and AMD’s GPU ACML [8]. These libraries speed up the BLAS routines by optimising the original BLAS code for the targeted hardware.

This approach has been reasonably successful up to now, but there are questions about its long-term scalability. The author has first-hand experience in this area, having led one of the first teams to develop a many-core optimised BLAS/LAPACK library in 2003 while at ClearSpeed. The concern is around this approach to programming, and whether it is realistic to be able to write a serial program that calls a software library that ‘magically’ hides all the complexity of using massive parallelism in an optimal fashion. Experience has shown that, to get the best performance, one usually has to port most of the application to the massively parallel part of the system, and not just rely on calls to libraries with parallel implementations.

To address this issue, the PLASMA and MAGMA research projects have been looking at the implications of heterogeneous and many-core architectures for future BLAS and LAPACK libraries [5]. These projects have shown that it is possible to achieve a significant fraction of peak performance for LAPACK-level routines, such as Cholesky and LU solvers, on heterogeneous, multi- and many-core systems, going as far as using multiple GPUs in a single accelerated system to achieve more than 1.1 TeraFLOPS of single precision performance [71].
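As a reminder of the interface these libraries share, the fragment below multiplies two small single-precision matrices through the portable CBLAS binding, computing C = alpha*A*B + beta*C. It is a generic illustration rather than vendor sample code; in principle, pointing the link step at MKL, ACML or another optimised BLAS is all that changes, although GPU libraries such as CuBLAS additionally require the matrices to be copied to and from device memory.

/* Generic CBLAS example: C = 1.0 * A * B + 0.0 * C for 2x2 row-major matrices */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    float A[4] = {1, 2, 3, 4};
    float B[4] = {5, 6, 7, 8};
    float C[4] = {0, 0, 0, 0};

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* M, N, K           */
                1.0f, A, 2,     /* alpha, A, lda     */
                B, 2,           /* B, ldb            */
                0.0f, C, 2);    /* beta, C, ldc      */

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* expect 19 22 / 43 50 */
    return 0;
}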

The GPU hardware itself is massively parallel, with today’s GPUs already including hundreds or thousands of simple cores, and GPUs with tens of thousands of cores due to appear within the next few years. The most popular programming models for GPUs, NVIDIA’s CUDA [96] and the cross-platform open standard, OpenCL [67], encourage programmers to expose the maximum parallelism possible in their code in order to achieve the best performance, both now and in the future. In general these are just good software design practices and we encourage all software developers to start thinking this way if they have not already — within a few years almost all processors will be of a many-core design and will require massively parallel software for best performance.

This requirement for massive parallelism may at first appear intimidating. But fortunately the kind of parallelism required does not have to be coarse-grained ‘task’ parallelism; it can simply be fine-grained ‘data’ parallelism. This is usually much easier to find and often corresponds to traditional vector processing with which many software developers are already familiar. Computationally intensive applications almost always have a large degree of data parallelism inherent in the underlying problem being solved, ranging from the particle parallelism in classical N-body problems to the row and column parallelism in matrix-based linear algebra. But it remains a fundamental point: to get the best performance from many-core architectures you will want to expose the maximum parallelism possible in your application.

This is a very important point and is worth restating: we will go so far as to make the potentially controversial statement that most existing software will need to be completely redeveloped, possibly even requiring new algorithms, in order to scale well on the increasingly parallel hardware we will all be using from now on.

There is an equally important implication that we will also state explicitly: all computer hardware is likely to remain on the increasingly parallel trend for at least the next ten to twenty years. Hardware designers predict that processors will become an order of magnitude more parallel every five years, meaning that in fifteen years’ time, processors will be of the order of a thousand times more parallel than the already very parallel processors that exist today!

We will come back to what this means for all of us as users or developers of software in the ‘Next Steps’ section later in this report.

GPU parallel programming models

With the rapidly increasing popularity of GPU-based computing, multiple different programming models have emerged for these new platforms. These include vendor-specific examples, such as NVIDIA’s CUDA, The Portland Group’s PGI Accelerator [113], CAPS enterprise’s HMPP Workbench [23] and Microsoft’s DirectCompute [81], as well as open standards such as OpenCL. We shall now look at the two most widely used GPU programming models in more detail: CUDA and OpenCL.

CUDA

NVIDIA’s CUDA is currently the most mature and easiest to use parallel programming model for GPUs, first appearing on the market in 2006 [91]. CUDA was designed to be familiar to as many software developers as possible, and so its first instantiation began as a variant of the C programming language, called CUDA C. Indeed, in developing CUDA C code it is often possible to incrementally port an existing C application, using profiling to identify those parts of the code that consume most of the run-time (the ‘hot spots’) and converting these to run in parallel on the many-core GPU.

This ‘incremental porting’ approach can be very effective and certainly lowers the barrier to entry for many users. But in our experience, most software developers find that they can only get so far with this approach. The reason for this limitation usually comes back to the design of the underpinning algorithm. If it was not designed to be massively parallel, then it is likely that the current software implementation cannot easily be modified in an incremental way in order to make it massively parallel. Of course there are always exceptions, but often the incremental porting approach will only take you so far.

Coming back to CUDA C, to demonstrate how familiar it can be, consider the two code examples in Figure 7. The first is some ordinary ANSI C code for performing a simple vector addition. Because it is written in a serial programming language, this code uses a loop and an array index to run along the two input vectors, producing one output element at each loop iteration.

In the CUDA C example, you can see the loop has completely disappeared. Instead, many copies of the function are executed in parallel, one copy for each parallel data item we wish to process. In this instance we would launch ‘size’ copies of the vector addition function, all of which could run in parallel. At execution time the run-time system decides how to map efficiently the expressed parallelism onto the available parallel hardware resources.

It is important not to underestimate the level of challenge when writing code for GPUs. The trivial examples in Figure 7 are very small in order to illustrate a point. In a real example, modifying the algorithm in order to express enough parallelism to achieve good performance on a GPU often requires significant intellectual effort. And while the parallel kernels themselves may often look quite similar to a highly optimised sequential equivalent, significant additional code is required in order to initialise the parallel environment, orchestrate the GPUs in the system, and communicate data efficiently, often by overlapping the data movement with computation.

The CUDA C code in Figure 7 is an example of a ‘kernel’ — this is common terminology for the parallel version of a routine that has been modified in order to run on a many-core processor. CUDA uses slightly different terminology for the Processing Elements (which it calls ‘threads’) and Compute Units (which it calls ‘thread blocks’), but generally it is considered to be fairly straightforward to port code between CUDA C and similar programming models, such as OpenCL.

// ANSI C vector addition example
void vectorAdd(const float *a,
               const float *b,
               float *c,
               unsigned int size)
{
  unsigned int i;
  for (i = 0; i < size; i++)
    c[i] = a[i] + b[i];
}

// CUDA C vector addition example
__global__ void vectorAdd(const float *a,
                          const float *b,
                          float *c)
{
  // Vector element index
  int nIndex = blockIdx.x * blockDim.x + threadIdx.x;
  c[nIndex] = a[nIndex] + b[nIndex];
}

Figure 7: Simple vector addition examples in C and CUDA C – note the absence of the ‘for’ loop in the CUDA version.
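Figure 7 shows only the kernel. A hedged sketch of the host-side code referred to above, which allocates device memory, copies the inputs across, launches the kernel and copies the result back, might look like the following; error checking is omitted, the launch configuration is an illustrative choice, and sizes that are not a multiple of the block size are not handled.

// Host-side CUDA C sketch for launching the vectorAdd kernel of Figure 7 (illustrative only)
#include <cuda_runtime.h>

void run_vectorAdd(const float *h_a, const float *h_b, float *h_c, unsigned int size)
{
    float *d_a, *d_b, *d_c;
    size_t bytes = size * sizeof(float);

    cudaMalloc((void **)&d_a, bytes);              // allocate GPU (global) memory
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);   // host -> device copies
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;                     // illustrative launch configuration
    int blocks = (size + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c); // one thread per element

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);   // copy the result back

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
}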

CUDA threads execute independently and thus ideally on independent data — this is why data parallelism is such a natural fit for these kinds of architectures. Threads in a thread block are grouped to execute essentially the same program at the same time, but on different pieces of data. This data-parallel approach is known as Single Instruction Multiple Data (SIMD) [38]. On the other hand, different thread blocks may execute completely different programs from one another if the need arises, although most applications tend to run the same program on all the thread blocks at the same time, essentially turning the computation into one large SIMD or vector calculation.

CUDA’s maturity brings a number of benefits for software developers, including a growing number of software development tools including debuggers and profilers [97]. In 2009, CUDA Fortran arrived, as a joint development between NVIDIA and the Portland Group [95]. CUDA Fortran takes the principles of CUDA C and weaves them into a state-of-the-art commercial Fortran 2003 compiler.

OpenCL

OpenCL bears many similarities to CUDA and indeed NVIDIA is one of the main contributors to the OpenCL standard, and so this should be no surprise. The biggest differences are in the way OpenCL is being developed. Whereas CUDA is a proprietary solution being driven by a single vendor, OpenCL is an open standard, instigated by Apple, but now being driven by a consortium of over 35 companies, including all the major processor vendors such as Intel, IBM and AMD. The consortium is being organised and run by the Khronos Group [67].

OpenCL is a much more recent development than CUDA and is correspondingly less mature. However, OpenCL also includes a number of more recent advances for supporting heterogeneous computing in systems combining multiple CPUs and GPUs. The first version of the OpenCL standard was released in December 2008 [66], and OpenCL has been developing rapidly since then. It has already been integrated into recent versions of Apple’s OS X operating system. AMD and NVIDIA have released implementations for their GPUs, the former also including a version that will run on a multi-core x86 host CPU. IBM has demonstrated a version of OpenCL running on its Cell processor and recently released a version for their high-end POWER architecture [56]. Intel released its first OpenCL implementation for its multi-core x86 CPUs in late 2010 [27, 60]. Embedded processor companies are also developing their own OpenCL solutions, including ARM [11, 99], Imagination Technologies [57] and ZiiLabs [122]. These last three companies provide the CPUs and GPUs in most of the popular consumer electronics gadgets, such as smartphones and portable MP3 players.

While OpenCL is less mature than NVIDIA’s CUDA and has some of the drawbacks of committee-designed standards, its benefits are the openness of the standard, the vast resource being ploughed into its development by many companies, and most importantly, its cross-platform capabilities.

OpenCL is quite a low-level solution. It exposes features that many software developers may not have had to deal with before. CUDA has similar features but includes a higher-level application programmer interface (API) that conveniently handles much of the low-level detail. But in OpenCL this is all left up to the programmer. One example is in the explicit use of queues for sending commands such as ‘run this kernel’ from the host processor to the many-core GPU. It is expected that as OpenCL matures, various solutions will emerge to abstract away this lower-level detail, leaving most programmers to operate at a higher level. An interface has already been developed that provides this facility for C++ programs.

One of OpenCL’s other important characteristics is that it has been designed to support heterogeneous computing from Day One; that is, it supports running code simultaneously on multiple, different kinds of processors, all within a single OpenCL program. When re-engineering software this is an important consideration: adopting a programming environment that supports a wide range of heterogeneous parallel hardware will give developers the greatest flexibility when deploying their re-engineered codes in the future.

For example, an OpenCL program could decide to run one task on one of the host processor cores, while running another task using a many-core GPU, and do this all in parallel. These multiple OpenCL tasks can easily coordinate between themselves, passing data and signals from one to the other. Because almost all processors will combine elements of both heterogeneity and many-core in the future, this is a critical feature to support.

// OpenCL vector addition example
__kernel void vectorAdd(__global const float *a,
                        __global const float *b,
                        __global float *c)
{
  // Vector element index
  int nIndex = get_global_id(0);
  c[nIndex] = a[nIndex] + b[nIndex];
}

Figure 8: OpenCL vector addition kernel example. Note the similarity with the CUDA example in Figure 7.
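For comparison, a condensed sketch of the host-side OpenCL code for the same kernel is shown below. It makes the command queue described above explicit, and every step (building the program, setting arguments, enqueueing transfers and the kernel itself) goes through API calls rather than language extensions. Error checking and robust platform/device selection are omitted for brevity; this is our illustration, not a sample from the OpenCL specification.

// Host-side OpenCL sketch (C API) for the kernel in Figure 8 (illustrative only)
#include <CL/cl.h>

void run_vectorAdd(const char *kernel_src, const float *h_a, const float *h_b,
                   float *h_c, size_t size)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);  // commands go via this queue

    size_t bytes = size * sizeof(float);
    cl_mem d_a = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, NULL);
    cl_mem d_b = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, NULL);
    cl_mem d_c = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);     // compile for this device at run time
    cl_kernel k = clCreateKernel(prog, "vectorAdd", NULL);

    clEnqueueWriteBuffer(queue, d_a, CL_TRUE, 0, bytes, h_a, 0, NULL, NULL);
    clEnqueueWriteBuffer(queue, d_b, CL_TRUE, 0, bytes, h_b, 0, NULL, NULL);

    clSetKernelArg(k, 0, sizeof(cl_mem), &d_a);
    clSetKernelArg(k, 1, sizeof(cl_mem), &d_b);
    clSetKernelArg(k, 2, sizeof(cl_mem), &d_c);

    clEnqueueNDRangeKernel(queue, k, 1, NULL, &size, NULL, 0, NULL, NULL);  // 'run this kernel'
    clEnqueueReadBuffer(queue, d_c, CL_TRUE, 0, bytes, h_c, 0, NULL, NULL); // copy the result back
}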

The ability to run OpenCL code on almost any multi-core mainstream processor makes it extremely attractive as a method for exploiting many-core parallelism, be it on today’s increasingly multi-core CPUs or tomorrow’s heterogeneous many-core processors.

For completeness, Figure 8 shows the simple vector addition example once again, this time in OpenCL.

To learn more about OpenCL, Scarpino’s ‘A gentle introduction to OpenCL’ [105] is a good place to start. Two OpenCL text books are due to appear at about the same time as this report: Munshi et al’s ‘OpenCL Programming Guide’ [86] and Gaster et al’s ‘Heterogeneous Computing with OpenCL’ [40].

Other many-core programming models

While we have focused our descriptions on the many-core programming models that are emerging to support GPUs, there exist other models that were created primarily to support multi-processor systems. There are many such models, but the most commonly used fall mainly into two categories — message passing and shared memory.

Message passing programming models provide methods for running collections of tasks in parallel, and for communicating data between these tasks by sending messages over communication links.

Shared memory programming models assume a system of parallel processors connected together via a memory that can be seen by all of the processors.

Both of these models are used in the mainstream today and we shall briefly describe the most common examples of each.

Message passing, MPI — MPI ‘is a message-passing application programmer interface (API), together with protocol and semantic specifications for how its features must behave in any implementation’ [49]. MPI has been proven to scale to large high-performance computing systems consisting of hundreds of thousands of cores and underpins most parallel applications running on large-scale systems. It is most commonly used to communicate between tasks running on multiple servers, with the MPI protocol used to send messages over the network connecting the servers together.

As well as providing basic primitives for point-to-point communications, MPI includes collective communication operations such as broadcast and reduction. The more recent MPI version 2 (MPI-2) adds support for parallel I/O and dynamic process management [112].

Many implementations of MPI exist, with the standard specifying bindings for C, C++ and Fortran.

Shared memory, OpenMP — OpenMP is the most commonly used high-level shared-memory programming model in scientific computing [25]. Its API is implemented in the form of compiler directives for C, C++ and Fortran. Most major hardware vendors support it, as do most major compilers, including GNU gcc, Intel’s ICC and Microsoft’s Visual Studio.

OpenMP supports task-level parallelism (also known as Multiple Instruction Multiple Data or MIMD) in addition to the data-level parallelism (SIMD) that we have already met [38]. Sections of code that can be executed in parallel are identified with the addition of special compiler markers, or pragmas, that tell an OpenMP-compatible compiler what to do. In theory these pragmas are simply ignored by non-OpenMP compilers, which would generate valid serial executables from the same source code.

For comparison, Figure 9 shows our by now familiar simple vector addition, this time in OpenMP and, for variety, in Fortran (the OpenMP Fortran pragmas are the lines beginning with ‘!$omp’).

!$omp parallel default(none) &
!$omp shared(a, b, c) private(i)
!$omp do
do i = 1, size
   c(i) = a(i) + b(i)
end do
!$omp end do
!$omp end parallel

Figure 9: Vector addition example in Fortran extended with OpenMP.

For comparison, Figure 9 shows our by now familiar simple vector addition, this time in OpenMP, and for variety, in Fortran (the OpenMP Fortran pragmas are the lines beginning with ‘!$omp’).

Work is already under way developing the next major version of OpenMP [109] and early signs indicate that consideration is being given to extensions for supporting many-core architectures. Thus if the reader is already using OpenMP we would recommend following the developments in this area from the standards committee and also active OpenMP vendors such as Cray.

Hybrid solutions — It is possible to combine two or more of these parallel programming models for hybrid solutions. In the parallel programming mainstream today, one of the most common approaches is to use a hybrid system of OpenMP within each multi-core server node and MPI between nodes. Much has been written about this particular hybrid approach [109].

An increasingly common approach is to combine MPI with one of the new GPU-oriented parallel programming systems, such as CUDA or OpenCL. This enables the creation of systems from multiple nodes, each node including one or more GPUs. Several of the fastest computers in the world have recently been constructed in just this fashion, including TSUBAME 2.0 at Tokyo Tech in Japan [50]. Indeed the second fastest computer in the world in June 2011 was a GPU-powered cluster: China’s Tianhe-1A system at the National Supercomputer Center in Tianjin, with a performance of 2.57 PetaFLOPS (2.57 × 10^15 floating point operations per second, or 2.57 million GFLOPS) [16, 114]. As of the International Supercomputing Conference in June 2011, three of the top ten supercomputers in the Top500 list are GPU accelerated. The fraction of systems in the Top500 that achieve their performance by GPU acceleration is set to increase rapidly.
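To illustrate the structure of the OpenMP-within-MPI hybrid approach described above, the C sketch below (again a minimal illustration rather than a production code) uses MPI to split a vector addition across processes, typically one per node, and OpenMP to share each process’s portion among the cores of that node. On a GPU cluster the OpenMP loop would instead be replaced by a CUDA or OpenCL kernel operating on the local block. The sizes, the even division of work and the suggested compile command are assumptions made for the example.

/* Hypothetical hybrid MPI + OpenMP sketch: MPI distributes blocks of the
   vectors across processes, and OpenMP threads share the work within each
   process. Compile with an MPI wrapper plus an OpenMP flag, e.g.
   mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1 << 22;                /* total length (arbitrary)        */
    int chunk = n / nranks;               /* this process's share            */
    double *a = malloc(chunk * sizeof(double));
    double *b = malloc(chunk * sizeof(double));
    double *c = malloc(chunk * sizeof(double));

    for (int i = 0; i < chunk; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* Within the node, OpenMP threads divide the local loop between the
       cores; on a GPU-accelerated node this loop would be replaced by a
       CUDA or OpenCL kernel launch over the local block. */
    #pragma omp parallel for
    for (int i = 0; i < chunk; i++)
        c[i] = a[i] + b[i];

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}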

Current Challenges

Like all breakthroughs in technology, the change from multi-core to many-core computer architectures will not be smooth for everyone. There are significant challenges during the transition, some of which are outlined below.

1. Porting code to massively parallel heterogeneous systems is often (but not always) harder than ports to new hardware have been in the past. Often completely new algorithms are required.

2. Many-core technologies are still relatively new, with implications for the maturity of software tools, the lack of software developers with the right skills and experience, and the paucity of ported application software and libraries.

3. Even though cross-platform programming languages such as OpenCL are now emerging, these have so far been focused on source code portability and cannot guarantee performance portability. This is of course not a new issue; any highly optimised code written in a mainstream language such as C, C++ or Fortran has performance portability issues between different architectures. However, differences between GPU architectures are even greater than those between CPU architectures, and so performance portability is set to become a greater challenge in the future.

4. There are multiple competing open and de facto standards which inevitably confuse the situation for potential adopters.

5. Many current GPGPU products still carry some of their consumer graphics heritage, including the lack of important hardware reliability features such as Error Correcting Codes (ECC) on their memories. Even where these features do exist, they currently incur prohibitive performance penalties that are not present in the corresponding CPU solutions.

6. There is a lot of hype around GPU computing, with many over-inflated claims of performance speedups of 100 times or more. These claims increase the risk of setting expectations too high, with subsequent disappointment from trial projects.

7. The lack of industry standard benchmarks makes it difficult for users to compare competing many-core products simply and accurately.

Of all these challenges, the most fundamental is the design and development of new algorithms that will naturally lend themselves to the massive parallelism of GPUs today, and to the ubiquitous heterogeneous multi-/many-core systems of tomorrow. If as a community we can design adaptive, highly scalable algorithms, ideally with heterogeneity and even fault-tolerance in mind, we will be well placed to exploit the rapid development of parallel architectures over the next two decades.

Next Steps

Audit your software

One valuable practical step we can each take is to perform an audit of the software we use that is performance-critical to our work. For software developed by third parties, find out what their policy is towards supporting many-core processors such as GPUs. Is their software already parallel? If so, how scalable is it? Does it run effectively on a quad-core CPU today? What about the emerging 8, 12 and 16 core CPUs? Do they have a demonstration of their software accelerated on a GPU? What is their roadmap for the software? The software licensing model is something that you also need to be conscious of — is the software licensed per core, processor, node, user, ...? Will you have to pay more for an upgrade that supports many-core processors? Some vendors will supply these features as no-cost upgrades; others will charge extra for them.

It is also important to be specific about parallel acceleration of the particular features you use in the software in question. For example, at the time of writing there are GPU accelerated versions of dense linear algebra solvers in MATLAB, but not of sparse linear algebra solvers [73]. Just because an application claims to be ‘GPU-accelerated’, it does not necessarily follow that your particular use of that application will gain the performance benefit of GPUs. Your mileage will definitely vary, so check with the supplier of your software to verify before committing.

Plan for parallelism

If you develop your own software, start thinking about what your own path to parallelism should be. Are the users of your software likely to stick primarily to multi-core processors in mainstream laptops, desktops and servers? If so you should be thinking about adopting OpenMP, MPI or another widely supported approach for parallel programming. You should probably avoid proprietary approaches that may not support all platforms or be commercially viable in the long term. Instead, use open standards available across multiple platforms and vendors to minimise your risk.

Also consider when many-core processors will feature in your roadmap. These are inevitable — even the mainstream CPUs will rapidly become heterogeneous many-cores, so this really is a ‘when’ not an ‘if’. If you do not want to support many-core processors in the near or medium term, OpenMP and MPI will be good choices. If, however, you may want to support many-core processors within the next few years, you will need a plan to adopt either OpenCL or CUDA sooner rather than later. OpenCL might be a viable alternative to OpenMP on multi-core CPUs in the short term.

If you are going to develop your own many-core aware software, there is a tremendous amount of support that you can tap into.

Attend a workshop

In the UK each year there are several training workshops in the use of GPUs. The national supercomputing service HECToR [53] is starting to provide GPU workshops on CUDA and OpenCL programming; the timetable for these is available online [52]. Prof Mike Giles at the University of Oxford regularly runs CUDA programming workshops; for the date of the next one see [43]. His webpage also includes excellent links to other GPU programming resources. A search for GPU training in the UK should turn up offerings from several universities. There are also commercial training providers in the UK that are worth considering, such as NAG [89]. Daresbury Labs has a team who track the latest processor technologies and who are already experienced in developing software for GPUs. This group holds occasional seminars and workshops, and is willing to offer advice and guidance for newcomers to many-core technologies [30]. GPU vendors will often provide assistance if asked, especially if their assistance could lead to new sales.

There are also many conferences and seminars emerging to address many-core computing. The UK now has a regular GPU developers workshop. Previous years have seen the workshop held in Oxford [1] and Cambridge [2]. The 2011 workshop is due to be held at Imperial College.

A useful GPU computing resource is GPUcomputing.net [47]. In particular it has a page dedicated to GPU computing in the UK [45]. This site is mostly dominated by the use of NVIDIA GPUs, reflecting NVIDIA’s lead in the market, but over time the site should see a greater percentage of content coming from work on a wider range of platforms.

To get started with OpenCL, one good place to start is AMD’s OpenCL webpage [9]. NVIDIA’s OpenCL webpage [98] is also useful, as is their OpenCL ‘jump start’ guide [92].

We would also recommend reading some of the latest work in the area of parallel algorithms. A good place to start is ‘The view from Berkeley’ [12], which builds on earlier work by Per Brinch Hansen [21]. Tim Mattson’s ‘Patterns for Parallel Programming’ is another notable book in this area [74]. These works recognise that most forms of scientific computation can be decomposed into a small set of common kernels often known as ‘dwarfs’, ‘motifs’ or ‘templates’. Decomposing algorithms into sub-algorithms that may in turn be classified as a particular kind of dwarf makes it easier to exploit the substantial works of the past and present, thus simplifying the task of identifying parallel approaches for your own algorithms where they are already known.

Finally, there is an online list of applications that have already been ported to NVIDIA’s GPUs [94].

Appendix 1: Active Researchers and Practitioner Groups

This is a rapidly growing field and there are already many active researchers and developers using GPU computing in the UK and around the world. The GPU developer conference held at the University of Cambridge in December 2010 saw around 100 attendees, and the Many-Core and Reconfigurable Supercomputing Conference at Bristol in April 2011 saw over 130 delegates, all actively working within this area.

In the UK

The list below is not intended to be exhaustive but instead to give a cross section of the GPU computing community in the UK and beyond. Our apologies are offered in advance to anyone we left off this list.

University of Bristol: McIntosh-Smith is a computer architect with extensive industrial experience designing many-core heterogeneous architectures. He has more recently been developing software and algorithms for GPUs, and has been working on new, massively parallel algorithms for a range of different applications including molecular docking, computational chemistry, climate modelling, and linear algebra [78].

University of Oxford: Giles is the leading expert in the UK in the use of many-core GPUs for financial applications and also for unstructured grid problems. Giles runs regular GPU programming workshops in NVIDIA’s proprietary CUDA programming language [43]. Also at Oxford, Karastergiou has been investigating the use of GPUs for the real-time processing of data for the future international telescope project, the Square Kilometre Array (SKA) [64].

University of Warwick: Jarvis’s group has performed extensive work modelling the performance of large-scale HPC systems, more recently looking at the effects of adding GPUs into these systems. The group has a particular focus on wavefront codes [62].

Imperial College: Kelly is an expert in high-level tools for automatically generating optimised code for many-core architectures. His research interests include parallelising compilers and auto-tuning techniques [65].

UCL: Atkinson is one of the leaders in applying GPUs to medical imaging problems, in particular Magnetic Resonance Imaging (MRI) [13].

Edinburgh Parallel Computing Centre: EPCC is one of the leading providers of high-performance computing training in the UK [34], and can provide consultancy to help parallelise algorithms and software for parallel computer architectures.

University of Manchester: Manchester has a vibrant community of GPU users and runs a ‘GPU club’ that meets on a semi-regular basis, attracting around one hundred attendees to recent events [72].

Queen’s University Belfast: Gillan and Scott lead a group of experts in using many-core architectures for image processing applications [46, 108]. This group provides expertise and consultancy to third parties who need to adapt their codes for GPUs and for reconfigurable systems based on Field-Programmable Gate Arrays (FPGAs).

University of Cambridge: Pullan and Brandvik were two of the earliest adopters of many-core architectures in the UK. They are experts in using GPUs for CFD and in porting structured grid codes to many-core architectures [20]. Also at Cambridge, Gratton has been investigating the use of AMD and NVIDIA GPUs for accelerating cosmology codes [48]. All three are part of Cambridge’s Many-Core Computing Group which also has expertise in the use of GPUs for medical imaging and genomic sequence processing [22].

Daresbury Laboratories: the Distributed Computing (DisCo) group at Daresbury Labs regularly runs workshops in the use of many-core architectures and can provide support in porting codes to GPUs. They also have access to various GPU hardware platforms that they can make available for remote testing [30].

AWE: Turland leads a group using GPUs to accelerate industrial wavefront simulation codes. His group has been developing OpenCL-based methods that abstract away from specific architectures and vendor programming models, allowing them to develop many-core applications that can perform well on a wide range of different hardware [15].

BAE Systems: Appa was one of the earliest adopters of many-core GPU architectures in the world. He now leads the team within the mathematical modelling group at BAE Systems that is using GPUs to accelerate CFD computations to great effect [37].

The Numerical Algorithms Group: NAG has a team developing commercial many-core software libraries for linear algebra and random number generators, amongst others. NAG also provides commercial many-core training courses and software development consultancy services [89].

The MathWorks: The MATLAB developer is doing a lot of work to target GPUs from within their software, much of which is taking place at their Cambridge development office [73].

Allinea: Allinea is a UK-based company developing some of the leading multi-core and many-core software development tools, including debuggers and profilers for GPUs [6]. Developers of large-scale software using hundreds or thousands of CPUs and GPUs could benefit significantly from these kinds of parallel debuggers.

Internationally

Innovative Computing Lab, University of Tennessee: Dongarra is a world expert in linear algebra libraries. His group is spearheading the development of highly scalable, dense linear algebra libraries for heterogeneous many-core processors and massively parallel systems [35, 36].

University of Illinois, Urbana-Champaign: Hwu is another leading global figure in GPU-based computing. His particular interest is in using GPUs to accelerate medical imaging applications [54]. UIUC is also one of the leading academic institutions in the use of GPUs to accelerate molecular dynamics and visualisation codes, such as NAMD and VMD [115].

Oak Ridge National Lab: Vetter’s Future Technologies group at ORNL is planning one of the largest supercomputers in the world based on GPU technology [116]. Vetter’s group has also developed one of the first GPU benchmarks, the SHOC suite, which is useful for comparing the performance of different many-core architectures [29, 117].

Tokyo Institute of Technology: Matsuoka is one of the leading HPC experts in Japan and is a specialist in building highly energy-efficient supercomputers based on GPU technologies. He is the architect of the TSUBAME 2.0 GPU-based supercomputer, the fifth fastest in the world as of June 2011. TSUBAME 2.0 is also one of the most energy-efficient production systems in the world [50]. Matsuoka’s group also develops GPU software, and has developed one of the fastest FFT libraries for GPUs [90].

Stanford: Pande’s group developed one of the first GPU-accelerated applications, the Folding@Home screensaver-based protein folding project [100]. This group went on to develop the OpenMM open source library for molecular mechanics. OpenMM takes advantage of GPUs via CUDA and OpenCL and is a good starting point for porting molecular dynamics codes to GPUs [39, 101].

NVIDIA: NVIDIA was the first GPU company to see the real potential of GPU-based computing. Having introduced the CUDA parallel programming language in 2006, NVIDIA now has the most applications ported to its many-core architecture. Their ported applications webpage is a good place to find out what software is available for GPUs [94].

The Portland Group: PGI provides software development tools, including a parallel Fortran compiler that generates code for GPUs [103].

Acceleware: Acceleware provides multi-core and GPU-accelerated software solutions for the electromagnetics and oil and gas industries [4].

SciComp: SciComp is a financial software services company that provides a suite of software tools that exploit GPUs and multi-core CPUs to deliver significant performance advantages over traditional approaches [106].

Appendix 2: Software Applications Available on GPUs

The table below lists several important applications along with their current multi-core and GPU capabilities. These will be changing all the time so do check for the latest information when considering any application. This list is far from exhaustive; readers are encouraged to investigate further if their favourite codes are not included.

Software: MATLAB
Multi-core: Some built-in ability to exploit parallelism at the level of libraries such as BLAS and LAPACK. Additionally provides a ‘Parallel computing toolbox’ to exploit multi-core CPUs [111].
Many-core: Has a beta release of a GPU-accelerated version of MATLAB [73]. Various third-party solutions available, such as AccelerEyes’ Jacket [3].

Software: Mathematica
Multi-core: Built-in support for parallelism since Mathematica 7 [119]; also provides gridMathematica for large-scale parallel computation [118].
Many-core: Mathematica 8 now includes support for GPUs via CUDA and OpenCL [120].

Software: NAG
Multi-core: NAG has a software library product for shared memory and multi-core parallel systems [87].
Many-core: Has a beta release of random number generators using CUDA on NVIDIA GPUs [88].

Software: R
Multi-core: Multiple R packages are available for exploiting parallelism on multi-core systems [32].
Many-core: Numerous open source projects porting R to GPUs, including R+GPU available in the gputools R package [80].

Software: SciFinance
Multi-core: This code synthesis tool for building derivatives pricing and risk models can already generate multi-core code using OpenMP [107].
Many-core: Already supports the generation of CUDA code for PDEs and SDEs [107].

Software: BLAS, LAPACK math libraries
Multi-core: Almost all BLAS and LAPACK libraries already support multi-core processors: Intel’s MKL, AMD’s ACML, IBM’s ESSL, ATLAS, ScaLAPACK, PLASMA.
Many-core: This is an area where much software is already available for GPUs: NVIDIA’s CUBLAS, AMD’s ACML-GPU, Acceleware, EM Photonics’ Cula Tools and MAGMA.

References

[1] 1st CUDA Developers’ Conference. www.smithinst.ac.uk/Events/CUDA2009, December 2009.

[2] 2nd UK GPU Computing Conference. www.many-core.group.cam.ac.uk/ukgpucc2, December 2010.

[3] AccelerEyes — MATLAB GPU computing. www.accelereyes.com, April 2011.

[4] Acceleware Ltd — Multi-Core Software Solutions. www.acceleware.com, July 2011.

[5] E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov. Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. Journal of Physics: Conference Series, 180(1), 2009.

[6] Allinea Software. allinea.com, May 2011.

[7] AMD Corp — AMD Core Math Library (ACML). developer.amd.com/cpu/Libraries/acml/Pages, April 2011.

[8] AMD Corp — AMD Core Math Library for graphic processors (ACML-GPU). developer.amd.com/gpu/ acmlgpu/pages, April 2011.

[9] AMD Corp — AMD Developer Central: getting started with OpenCL. developer.amd.com/zones/ OpenCLZone/Pages/gettingstarted.aspx, April 2011.

[10] AMD Corp — AMD Developer Central: Magny-Cours zone. developer.amd.com/zones/magny-cours/ pages/default.aspx, April 2011.

[11] ARM Ltd — ARM heralds new era in embedded graphics with next-generation Mali GPU. www.arm.com/ about/newsroom/arm-heralds-new-era-in-embedded-graphics-with-next-generation-mali-gpu. php, April 2010.

[12] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The landscape of parallel computing research: a view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, 2006.

[13] David Atkinson. www.cs.ucl.ac.uk/staff/d.atkinson, July 2011.

[14] Dan Bailey. Fast GPU preconditioning for fluid simulations in film production (GTC 2010 presentation archive). www.nvidia.com/object/gtc2010-presentation-archive.html#session2239, April 2011.

[15] Dave Barrett, Ron Bell, Claire Hepwood, Gavin Pringle, David Turland, and Paul Graham. An investigation of different programming models and their performance on innovative architectures. In The Many-Core and Reconfigurable Supercomputing Conference (MRSC), 2011.

[16] BBC — China’s Tianhe-1A crowned supercomputer king. www.bbc.co.uk/news/technology-11766840, November 2010.

[17] Thomas Bradley, Jacques du Toit, Robert Tong, Mike Giles, and Paul Woodhams. Parallelization techniques for random number generators. In Wen-mei W. Hwu, editor, GPU Computing Gems Emerald Edition, chapter 16, pages 231–246. Morgan Kaufmann, 2011.

[18] Tobias Brandvik and Graham Pullan. SBLOCK: a framework for efficient stencil-based PDE solvers on multi-core platforms. In 10th IEEE International Conference on Computer and Information Technology, pages 1181–1188, 2010.

[19] Tobias Brandvik and Graham Pullan. An accelerated 3D Navier-Stokes solver for flows in turbomachines. Journal of Turbomachinery, 133(2):021025, 2011.

[20] Tobias Brandvik and Graham Pullan. Turbostream: ultra-fast CFD for turbomachines. www.turbostream-cfd.com, April 2011.

[21] Per Brinch Hansen. Model programs for computational science: a programming methodology for multicomputers. Concurrency — Practice and Experience, 5(5):407–423, August 1993.

[22] University of Cambridge — Cambridge many-core group projects. www.many-core.group.cam.ac.uk/ projects, May 2011.

[23] CAPS entreprise — HMPP Workbench. www.caps-entreprise.com/hmpp.html, August 2011.

[24] CFD Online. www.cfd-online.com, April 2011.

[25] B. Chapman, G. Jost, and R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. MIT Press, 2007.

[26] Tim Childs. 3DMashUp. www.scribd.com/3dmashup, April 2011.

[27] Peter Clarke. Intel releases alpha OpenCL SDK for Core. www.eetimes.com/electronics-news/ 4210812/Intel-releases-alpha-OpenCL-SDK-for-Core, November 2010.

[28] P. I. Crumpton and M. B. Giles. Multigrid aircraft computations using the OPlus parallel library. In A. Ecer, J. Periaux, N. Satofuka, and S. Taylor, editors, Parallel Computational Fluid Dynamics: Implementations and Results Using Parallel Computers, pages 339–346. North-Holland, 1996.

[29] Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, and Jeffrey S. Vetter. The scalable heterogeneous computing (SHOC) benchmark suite. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pages 63–74. ACM, 2010.

[30] DisCo: Daresbury Laboratory distributed computing group. www.cse.scitech.ac.uk/disco, September 2010.

[31] Double Negative Visual Effects. www.dneg.com, April 2011.

[32] Dirk Eddelbuettel. CRAN task view: high-performance and parallel computing with R. cran.r-project. org/web/views/HighPerformanceComputing.html, February 2011.

[33] Erich Elsen, Patrick LeGresley, and Eric Darve. Large calculation of the flow over a hypersonic vehicle using a GPU. Journal of Computational Physics, 227:10148–10161, December 2008.

[34] EPCC. www.epcc.ed.ac.uk, May 2011.

[35] Jack Dongarra et al. PLASMA. icl.cs.utk.edu/plasma/index.html, November 2010.

[36] Jack Dongarra et al. MAGMA. icl.cs.utk.edu/magma/index.html, April 2011.

[37] Mark Fletcher. BAE Systems packs massive power thanks to GPU coprocessing. mcad-online.com/en/ bae-systems-packs-massive-power-thanks-to-gpu-co-processing.html?cmp_id=7&news_ id=1078, May 2010.

[38] Michael J. Flynn. Some computer organizations and their effectiveness. IEEE Transactions on Computers, 21:948–960, September 1972.

[39] M. S. Friedrichs, P. Eastman, V. Vaidyanathan, M. Houston, S. LeGrand, A. L. Beberg, D. L. Ensign, C. M. Bruns, and V. S. Pande. Accelerating molecular dynamic simulation on graphics processing units. Journal of Computational Chemistry, 30(6):864–872, 2009.

[40] Benedict Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry, and Dana Schaa. Heterogeneous Computing with OpenCL. Morgan Kaufmann, 2011.

[41] Nick Gibbs, Anthony R. Clarke, and Richard B. Sessions. Ab initio protein structure prediction using physicochemical potentials and a simplified off-lattice model. Proteins: Structure, Function, and Bioinformatics, 43(2):186–202, 2001.

[42] M. B. Giles, G. R. Mudalige, Z. Sharif, G. Markall, and P. H. J. Kelly. Performance analysis and optimisation of the OP2 framework on many-core architectures. The Computer Journal, 2011 (to appear).

[43] Mike Giles. Course on CUDA programming. people.maths.ox.ac.uk/gilesm/, August 2011.

[44] Mike Giles. OP2 project page. people.maths.ox.ac.uk/gilesm/op2, April 2011.

[45] Mike Giles and Simon McIntosh-Smith. GPUcomputing.net: GPU Computing UK. www.gpucomputing. net/?q=node/160, April 2011.

[46] Charles J. Gillan. www.ecit.qub.ac.uk/Card/?name=c.gillan, May 2011.

[47] GPUcomputing.net. www.gpucomputing.net, April 2011.

[48] Steve Gratton. Graphics card computing for cosmology. www.ast.cam.ac.uk/˜stg20/gpgpu/index.html, July 2011.

[49] William Gropp. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, second edition, 2000.

[50] GSIC, Tokyo Institute of Technology — TSUBAME2. www.gsic.titech.ac.jp/en/tsubame2, April 2011.

[51] Mark Harris. General-purpose computation on graphics hardware. gpgpu.org/about, July 2011.

[52] HECToR training. www.hector.ac.uk/cse/training, April 2011.

[53] HECToR: UK National Supercomputing Service. www.hector.ac.uk, April 2011.

[54] Wen-mei W. Hwu. The IMPACT research group at UIUC. impact.crhc.illinois.edu/people/current/ hwu/hwu_research.php, May 2011.

[55] IBM — Engineering and scientific subroutine library (ESSL) and parallel ESSL. www-03.ibm.com/systems/software/essl, April 2011.

[56] IBM — OpenCL development kit for Linux on Power. www.alphaworks.ibm.com/tech/opencl, April 2011.

[57] Imagination Technologies — Imagination Technologies announces POWERVR SGX545 graphics IP core with full DirectX 10.1, OpenGL 3.2 and OpenCL 1.0 capabilities. www.imgtec.com/news/Release/index. asp?NewsID=516, January 2010.

[58] Intel Corp — Intel AVX: new frontiers in performance improvements and energy efficiency. software.intel. com/en-us/articles/ intel-avx-new-frontiers-in-performance-improvements-and-energy-efficiency, April 2011.

[59] Intel Corp — Intel math kernel library. software.intel.com/en-us/articles/intel-mkl, April 2011.

[60] Intel Corp — Intel OpenCL SDK. software.intel.com/en-us/articles/intel-opencl-sdk, April 2011.

[61] Intel Corp — Intel Research: single-chip cloud computer. techresearch.intel.com/ProjectDetails. aspx?Id=1, April 2011.

[62] Stephen Jarvis. High Performance Computing group, University of Warwick. www2.warwick.ac.uk/fac/ sci/dcs/research/pcav/areas/hpc, May 2011.

[63] Jon Peddie Research — Jon Peddie Research announces 2nd quarter PC graphics shipments. jonpeddie. com/press-releases/details/ jon-peddie-research-announces-2nd-quarter-pc-graphics-shipments, July 2010.

[64] Aris Karastergiou. www.physics.ox.ac.uk/users/Karastergiou/Site/home.html, July 2011.

[65] Paul H. J. Kelly. www.doc.ic.ac.uk/˜phjk, May 2011.

[66] Khronos Group — Khronos Group releases OpenCL 1.0 specification. www.khronos.org/news/press/ releases/the_khronos_group_releases_opencl_1.0_specification, December 2008.

[67] Khronos Group — OpenCL: the open standard for parallel programming of heterogeneous systems. www. khronos.org/opencl, November 2010.

[68] Khronos Group — OpenGL: the industry’s foundation for high performance graphics. www.opengl.org, April 2011.

[69] David B. Kirk and Wen-mei W. Hwu. Programming Massively Parallel Processors: a Hands-on Approach. Morgan Kaufmann, 2010.

[70] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5:308–323, September 1979.

[71] Hatem Ltaief, Stanimire Tomov, Rajib Nath, Peng Du, and Jack Dongarra. A scalable high performant Cholesky factorization for multicore with GPU accelerators. In Proceedings of the 9th International Conference on High-Performance Computing for Computational Science, pages 93–101. Springer-Verlag, 2010.

[72] University of Manchester, Computational Science Community Wiki. wiki.rcs.manchester.ac.uk/ community/GPU/Club, May 2011.

[73] The MathWorks — MATLAB GPU computing with NVIDIA CUDA-enabled GPUs. www.mathworks.com/ discovery/matlab-gpu.html, July 2011.

[74] Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill. Patterns for Parallel Programming. Addison-Wesley, 2004.

[75] Simon McIntosh-Smith and Richard B. Sessions. An accelerated, computer assisted molecular modeling method for drug design. In International Supercomputing, June 2008. www.cs.bris.ac.uk/˜simonm/ publications/ClearSpeed_BUDE_ISC08.pdf.

[76] Simon McIntosh-Smith, Terry Wilson, Jon Crisp, Amaurys Ávila Ibarra, and Richard B. Sessions. Energy-aware metrics for benchmarking heterogeneous systems. SIGMETRICS Performance Evaluation Review, 38:88–94, March 2011.

[77] Simon McIntosh-Smith, Terry Wilson, Amaurys Ávila Ibarra, Richard B. Sessions, and Jon Crisp. Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems. The Computer Journal, 2011 (to appear).

[78] Simon McIntosh-Smith. www.cs.bris.ac.uk/home/simonm, April 2011.

[79] Hans Meuer. Top500 supercomputer sites. www.top500.org, April 2011.

[80] University of Michigan, Molecular and Behavioral Neuroscience Institute — R+GPU. brainarray.mbni. med.umich.edu/brainarray/rgpgpu, May 2011.

[81] Microsoft Corp — DirectCompute. blogs.msdn.com/b/chuckw/archive/2010/07/14/directcompute. aspx, April 2011.

[82] Microsoft Corp — Microsoft DirectX. www.gamesforwindows.com/en-US/directx, April 2011.

[83] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics Magazine, pages 114–117, April 1965.

[84] Samuel K. Moore. With Denver project NVIDIA and ARM join CPU-GPU integration race. spectrum.ieee. org/tech-talk/semiconductors/processors/ with-denver-project-nvidia-and-arm-join-cpugpu-integration-race, January 2011.

[85] Trevor Mudge. Power: a first-class architectural design constraint. IEEE Computer, 34(4):52–58, April 2001.

[86] Aaftab Munshi, Benedict Gaster, Timothy Mattson, James Fung, and Dan Ginsburg. OpenCL Programming Guide. Addison-Wesley, 2011.

[87] NAG Ltd — NAG library for SMP & multicore. www.nag.co.uk/numeric/FL/FSdescription.asp, May 2011.

[88] NAG Ltd — NAG numerical routines for GPUs. www.nag.co.uk/numeric/GPUs, May 2011.

[89] NAG Ltd — NAG training. nag.co.uk/support_training.asp, April 2011.

[90] Akira Nukada and Satoshi Matsuoka. Auto-tuning 3-D FFT library for CUDA GPUs. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pages 30:1–30:10. ACM, 2009.

[91] NVIDIA Corp — NVIDIA unveils CUDA – the GPU computing revolution begins. www.nvidia.com/object/ IO_37226.html, November 2006.

[92] NVIDIA Corp — NVIDIA OpenCL JumpStart Guide. developer.download.nvidia.com/OpenCL/NVIDIA_ OpenCL_JumpStart_Guide.pdf, April 2009.

[93] NVIDIA Corp — CUBLAS library. developer.download.nvidia.com/compute/cuda/3_2/toolkit/docs/ CUBLAS_Library.pdf, August 2010.

[94] NVIDIA Corp — CUDA-accelerated applications. www.nvidia.com/object/cuda_app_tesla.html, April 2011.

[95] NVIDIA Corp — CUDA Fortran. www.nvidia.com/object/cuda_fortran.html, April 2011.

[96] NVIDIA Corp — CUDA zone. www.nvidia.com/object/cuda_home_new.html, April 2011.

[97] NVIDIA Corp — NVIDIA developer zone: tools & ecosystem. developer.nvidia.com/tools-ecosystem, April 2011.

[98] NVIDIA Corp — NVIDIA OpenCL. www.nvidia.com/object/cuda_opencl_new.html, April 2011.

[99] Tom Olson. Why OpenCL will be on every smartphone in 2014. blogs.arm.com/multimedia/ 263-why-opencl-will-be-on-every-smartphone-in-2014, August 2010.

[100] Vijay Pande. Folding@home. folding.stanford.edu, May 2011.

[101] Vijay Pande. OpenMM. simtk.org/home/openmm, April 2011.

[102] Kevin Parrish. Intel to ship 10-core CPUs in first half of 2011. www.tomshardware.com/news/ -westmere-EX-Nehalem-EX-10-cores-160-threads,12230.html, February 2011.

[103] The Portland Group. www.pgroup.com, July 2011.

[104] Don Scansen. Closer look inside AMD’s Llano APU at ISSCC. www.eetimes.com/electronics-news/ 4087527/Closer-look-inside-AMD-s-Llano-APU-at-ISSCC, February 2010.

[105] Matthew Scarpino. A gentle introduction to OpenCL. Dr. Dobb’s Journal, July 2011. drdobbs.com/ high-performance-computing/231002854.

[106] SciComp inc — SciGPU achieves stunning acceleration for PDE and MC derivative pricing models. www. scicomp.com/parallel_computing/GPU_OpenMP, July 2011.

[107] SciComp Inc — SciFinance achieves astounding accelerations with parallel computing. www.scicomp.com/ parallel_computing/GPU_OpenMP, May 2011.

[108] Stan Scott. www.qub.ac.uk/research-centres/HPDC/Profile/?name=ns.scott, May 2011.

[109] Lorna Smith and Mark Bull. Development of mixed mode MPI / OpenMP applications. Scientific Programming, 9:83–98, August 2001.

[110] Herb Sutter. The free lunch is over: a fundamental turn toward concurrency in software. Dr. Dobb’s Journal, 30(3), March 2005. www.gotw.ca/publications/concurrency-ddj.htm.

[111] The MathWorks — MATLAB’s parallel computing toolbox. www.mathworks.com/products/ parallel-computing/?s_cid=HP_FP_ML_parallelcomptbx, May 2011.

[112] The MPI Forum — MPI: a message-passing interface standard, Version 2.2. www.mpi-forum.org/docs/ mpi-2.2/mpi22-report.pdf, September 2009.

[113] The Portland Group — PGI Accelerator Compilers. www.pgroup.com/resources/accel.htm, August 2011.

[114] Top500.org — Tianhe-1A. www.top500.org/system/10587, April 2010.

[115] University of Illinois at Urbana-Champaign — GPU acceleration of molecular modeling applications. www.ks. uiuc.edu/Research/gpu, April 2011.

[116] Jeffrey Vetter. Toward exascale computational science with heterogeneous processing. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, page 1. ACM, 2010.

[117] Jeffrey Vetter. The SHOC benchmark suite. ft.ornl.gov/doku/shoc, April 2011.

[118] Wolfram Research — gridMathematica. www.wolfram.com/products/gridmathematica, May 2011.

[119] Wolfram Research — Mathematica 7 built-in parallel computing. www.wolfram.com/products/ mathematica/newin7/content/BuiltInParallelComputing, May 2011.

[120] Wolfram Research — Mathematica 8 CUDA and OpenCL support. www.wolfram.com/mathematica/ new-in-8/cuda-and-opencl-support, May 2011.

[121] Dong Hyuk Woo and H-H. S. Lee. Extending Amdahl’s law for energy-efficient computing in the many-core era. IEEE Computer, 41(12):24–31, December 2008.

[122] ZiiLABS Pte Ltd — ZiiLABS OpenCL SDK. www.ziilabs.com/technology/opencl.aspx, April 2011.

The GPU Computing Revolution From Multi-Core CPUs to Many-Core Graphics Processors

A Knowledge Transfer Report from the London Mathematical Society and the Knowledge Transfer Network for Industrial Mathematics by Simon McIntosh-Smith

The London Mathematical Society (LMS) is the UK's learned society for mathematics. Founded in 1865 for the promotion and extension of mathematical knowledge, the Society is concerned with all branches of mathematics and its applications. It is an independent and self-financing charity, with a membership of around 2,300 drawn from all parts of the UK and overseas. Its principal activities are the organisation of meetings and conferences, the publication of periodicals and books, the provision of financial support for mathematical activities, and the contribution to public debates on issues related to mathematics research and education. It works collaboratively with other mathematical bodies worldwide. It is the UK’s adhering body to the International Mathematical Union and is a member of the Council for the Mathematical Sciences, which also comprises the Institute of Mathematics and its Applications, the Royal Statistical Society, the Operational Research Society and the Edinburgh Mathematical Society. www.lms.ac.uk

The Knowledge Transfer Network for Industrial Mathematics (IM-KTN) brings to life the underpinning role of mathematics in boosting innovation performance. It enables mathematical modelling and analysis to add value to product design, operations and strategy across all areas of business, recognising that innovation often occurs on the interfaces between existing capabilities. The IM-KTN connects together companies (of all sizes), public sector organisations, government agencies and universities, to exploit these opportunities using mechanisms that are established catalysts of new activity. It is a vehicle for developing new capability, sharing knowledge and experience, and helping shape science and technology strategy. The IM-KTN currently has the active engagement of about 450 organisations and is managed on behalf of the Technology Strategy Board by the Smith Institute for Industrial Mathematics and System Engineering. www.innovateuk.org/mathsktn

The LMS-KTN Knowledge Transfer Reports are an initiative that is coordinated jointly by the IM-KTN and the Computer Science Committee of the LMS. The reports are being produced as an occasional series, each one addressing an area where mathematics and computing have come together to provide significant new capability that is on the cusp of mainstream industrial uptake. They are written by senior researchers in each chosen area, for a mixed audience in business and government. The reports are designed to raise awareness among managers and decision-makers of new tools and techniques, in a format that allows them to assess rapidly the potential for exploitation in their own fields, alongside information about potential collaborators and suppliers.