ENTERTAINMENT COMPUTING
Computer, October 2003

The GPU Enters Computing’s Mainstream

Michael Macedonia, Georgia Tech Research Institute

The graphics processing unit is visually and visibly changing the course of general-purpose computing.

The Siggraph/Eurographics Graphics Hardware 2003 workshop (graphicshardware.org), held in San Diego, will likely be remembered as a turning point in modern computing. In one of those rare moments when a new paradigm visibly begins changing general-purpose computing’s course, what has traditionally been a graphics-centric workshop shifted its attention to the nongraphics applications of the GPU.

GPUs, made by Nvidia (www.nvidia.com) and ATI (www.ati.com), function as components in graphics subsystems that power everything from Microsoft’s Xbox to high-end visualization systems from Hewlett-Packard and SGI. The GPUs act as coprocessors to CPUs such as Intel’s Pentium, using a fast bus such as Intel’s Accelerated Graphics Port. AGP8x has a peak bandwidth of 2.1 gigabytes per second, speed it needs to avoid starving data-hungry GPU coprocessors.

RAW PERFORMANCE

The new GPUs are very fast. Stanford’s Ian Buck calculates that the current GeForce FX 5900’s performance peaks at 20 gigaflops, the equivalent of a 10-GHz Pentium, with, according to Nvidia, even more speed on the horizon. Performance has grown by a factor of 2.8 per year since 1993, a pace analysts expect the industry to maintain for another five years. At this rate, GPU performance will move inexorably into the teraflop range by 2005.

What does this mean for consumers? For less than $400 they can buy an off-the-shelf graphics card today with performance comparable to a top-of-the-line image generator that cost hundreds of thousands of dollars in 1999. Intense competition has driven this explosive growth, with Nvidia and ATI leapfrogging each other by introducing a new GPU generation every six months.

Recently, this battle expanded beyond the PC to include video game consoles. Nvidia, which years ago staged a coup by locking down the rights to make the GPU for Microsoft’s Xbox, lost out when Microsoft awarded the GPU manufacturing rights for the Xbox’s next-generation successor to ATI.

STREAM PROCESSING

Stream processing gives GPUs their impressive speed. Streams constitute a “computational primitive because large amounts of data arrive continuously, and it’s impractical or unnecessary to retain the entire data set” (graphics.stanford.edu/papers/humper_thesis/humper_thesis.pdf; p. 30). According to Fred Brooks, a developer of IBM’s Stretch supercomputers, stream processing goes back to the 1950s, when IBM built the Harvest variant of the Stretch system for the National Security Agency. Harvest treated large volumes of encrypted text as a stream and had a special coprocessor designed specifically to handle this task.

DARPA has also backed Bill Dally’s Imagine stream processor work at Stanford (U.J. Kapasi et al., “Programmable Stream Processors,” Computer, August 2003, pp. 54-62). Imagine, a programmable architecture, “executes stream-based programs and is able to sustain tens of Gflops over a range of media applications with a power dissipation of less than 10 Watts” (cva.stanford.edu/imagine/).

Similarly, GPUs treat computer graphics primitives such as vertices and pixels as streams. Multiple programmable processing units connect via data flows. In a GPU, a vertex processor transforms and processes points, each represented as a four-component vector, and a fragment processor computes pixel color. As a stream processor, a GPU performs simple operations and exploits spatial parallelism. For example, the fragment processor runs the same program for each pixel. In the case of the ATI R300, eight pixel pipelines handle single-instruction, multiple-data (SIMD) processing.

GPUs’ simple architectures devote large areas of chip real estate to the computational engines. For example, the 0.15-micron-process ATI R300 chip has more than 110 million transistors, while Intel’s Xeon microprocessor has 108 million. However, the Xeon chip devotes more than 60 percent of its transistors to cache (www.anandtech.com/printarticle.html?i=1749). The Nvidia NV40, to be announced this fall and manufactured using a 0.13-micron process, is rumored to have 150 million transistors.
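The per-pixel execution model described in this section can be illustrated with a minimal Python sketch. This is not GPU code and does not reflect any vendor's API; the names fragment_program and run_fragment_stream are hypothetical, chosen only for illustration, and the color formula is arbitrary. The point is that one small program runs independently on every pixel, which is what lets hardware spread the work across several pipelines.

```python
# Hypothetical sketch of the stream model: one small "fragment program"
# applied independently to every pixel. Plain Python, not real shader code.

def fragment_program(x, y, width, height):
    """Return an (r, g, b, a) four-component color for one pixel.

    Assumes width > 1 and height > 1 so the divisions are defined.
    """
    r = x / (width - 1)   # red ramps from 0.0 to 1.0, left to right
    g = y / (height - 1)  # green ramps from 0.0 to 1.0, top to bottom
    b = 0.5               # constant blue
    a = 1.0               # fully opaque
    return (r, g, b, a)


def run_fragment_stream(width, height, program):
    """Apply the same program to each pixel of a width x height image.

    No pixel's result depends on any other pixel, so the calls could
    run in parallel; here they simply run one after another.
    """
    return [[program(x, y, width, height) for x in range(width)]
            for y in range(height)]


image = run_fragment_stream(4, 3, fragment_program)
```

Because each call reads only its own coordinates, the loop order is irrelevant, which is the property the article refers to as spatial parallelism.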

HACKING THE GPU

Two years ago, Nvidia and ATI shook the community by making their GPUs programmable. Programmability, nothing new in the graphics world, as a description of the Pixel Flow architecture reveals (www.cs.unc.edu/~molnar/Papers/PxFl-hwws97.pdf), provided a dramatic leap forward for commodity graphics processors. With programmability, developers could code small programs, known as shaders, to perform real-time rendering effects such as bump mapping or shadows. These shaders can be written in assembly or in languages such as Nvidia’s Cg (www.cgshaders.org/) or Microsoft’s High-Level Shader Language (msdn.microsoft.com/directx/).

Microsoft has been a driving force in making GPUs programmable, starting with its introduction of DirectX 8.0 as the 3D graphics API for MS Windows and, in variant form, for the Xbox. Previous DirectX versions worked as fixed-function pipelines, meaning developers could not program their hardware and had to rely on static API functions.

The DirectX 9.0 specification provided the next major boost in GPU development. First, DX9 allows much longer shader programs and gives developers a greatly expanded pixel-shading instruction set. Second and most important, DX9 breaks the mathematical precision barrier that had previously limited PC graphics. Precision, and therefore visual quality, increases with 128-bit floating-point color per pixel, 32 bits for each of four color components (www.nvidia.com/object/dx9_tb.html).

Floating-point color has opened up the new field of high dynamic range imaging (HDRI) and image-based lighting. Image-based lighting illuminates objects with images of light from the real world, and HDRI techniques imbue the scene with the same dynamic range as real light. Earlier GPUs could provide only 8 bits for each red, green, and blue color value, which limited the dynamic range to 256 levels per channel. Floating-point color removes this restriction, allowing a dramatically better representation of real light values. Most importantly, the introduction of 32-bit floating-point capability brings GPUs closer to true general-purpose computing.

THE ULTIMATE MEDIA PROCESSOR

This performance and programmability revolution has already resulted in spectacular computer graphics. Nvidia markets its GPUs as hardware capable of enabling a “cinematic computing experience” because interactive graphics now approach the quality of fully rendered motion-picture animation. As Figure 1 shows, Nvidia supplied proof of this progress at Siggraph 2003 with its Dawn fairy demo, produced by Joe Demers. The real-time demo is available online at www.nvidia.com/object/demo_dawn.html.

Figure 1. Nvidia’s cutting-edge Siggraph demo. Developers rendered this exquisitely animated fairy in real time, leveraging the power of today’s GPUs.

Dawn stands at the threshold of a new art form. In a case of art imitating reality, a new class of movies called machinima uses video game engines such as Unreal Tournament to render their visuals (www.machinima.com/). High-quality, real-time rendering may soon make it possible to create full-length commercial movies with this technique.

BEYOND GRAPHICS

In a recent Wired magazine interview, Nvidia CEO Jen-Hsun Huang said, “The microprocessor will be dedicated to other things like artificial intelligence. That trend is helpful to us. It’s a trend that’s inevitable” (www.wired.com/wired/archive/10.07/Nvidia.html?pg=1&topic=&topic_set=). Surprisingly, GPUs, not just CPUs, are making this vision real. The latest GPUs now perform nongraphics functions such as motion planning and physics for game AI, making them the ultimate processor for entertainment applications. In addition, GPUs can compute fast Fourier transforms for tasks such as real-time MPEG video compression or audio rendering for Dolby 5.1 sound systems.

This year’s Graphics Hardware Workshop also provided a forum for speculating about the potential for GPU and stream-processor use in nonentertainment applications. Mark Harris has dedicated a Web site (www.gpgpu.org) to these and other general-purpose GPU applications. The site lists several papers describing GPU uses that range from linear algebra to simulating ice crystal growth.

For example, Dinesh Manocha and Ming Lin’s research group at the University of North Carolina, Chapel Hill, has shown how GPUs can perform fast computation of Voronoi diagrams (www.cs.unc.edu/~geom/voronoi/). These diagrams partition a plane with n points into n convex polygons so that each polygon contains exactly one of the points, and every location in a given polygon is closer to that polygon’s point than to any other. Computations such as these can be useful in applications like robotics.

A pain to program

GPUs have a long way to go, however, before they become truly general purpose. Developers find programming current models painful because these processors lack many essential features such as real debuggers or profilers. Nor do they support branching. Nvidia’s primitive language, Cg, offers no support for pointers. Worse, GPU vendors have limited programming on these devices to the DirectX or OpenGL APIs and have kept internal functions secret. If these details were made public, researchers could use them to explore new uses for these inexpensive but powerful devices.

On the horizon

DARPA may provide a solution to this mess by funding both new tools and new processors that exploit stream computing. For example, the agency is funding Ian Buck’s development of the Brook stream-processing language. Brook hides current GPUs’ graphics abstractions, such as texture memory, and is intended to pave the way for future stream processors such as the DARPA-sponsored Tera-op reliable intelligently adaptive processing system, or Trips (www.graphicshardware.org/presentations/Buck.ppt). The Trips chip, jointly developed by IBM and the University of Texas, will serve as part of an overall supercomputer design effort at IBM to develop a productive, easy-to-use, reliable computing system (www.research.ibm.com/resources/news/20030827_trips.shtml). DARPA also contributes around $11 million annually to the project’s funding. The system will reportedly perform one quadrillion operations per second and, by 2010, will be able to analyze its own workflow and optimize its own hardware and software resources.

Until GPUs mature and become much easier for developers to use, IBM will continue to help build supercomputers for entertainment. Console game market leader Sony invited IBM and Toshiba to collaborate on development of the Cell, a chip that will serve as the GPU for Sony’s PlayStation 3.

Analysts estimate that the Cell will use a 50-nanometer process and operate at 4 GHz. Designing the chip required 300 engineers and cost $400 million. Estimates show that developing the plant to produce these supercomputers-on-a-chip will cost more than $1.6 billion (news.com.com/2100-1041_3-997596.html). Why is Sony willing to spend so much? To leverage the vast market it has created by selling 80 million PlayStation 2 consoles.

Michael Macedonia is a senior scientist at the Georgia Tech Research Institute, Atlanta. Contact him at [email protected].