How Gpus Work

r2How.qxp 23/1/07 12:44 PM Page 96 HOW THINGS WORK Direct3D) to provide each triangle to the graphics pipeline one vertex at a How GPUs time; the GPU assembles vertices into triangles as needed. Work Model transformations A GPU can specify each logical object in a scene in its own locally David Luebke, NVIDIA Research defined coordinate system, which is Greg Humphreys, University of Virginia convenient for objects that are natu- rally defined hierarchically. This con- venience comes at a price: before rendering, the GPU must first trans- GPUs have moved away from form all objects into a common coordinate system. To ensure that triangles the traditional fixed-function aren’t warped or twisted into curved 3D graphics pipeline toward shapes, this transformation is limited to simple affine operations such as a flexible general-purpose rotations, translations, scalings, and computational engine. the like. As the “Homogeneous Coordinates” sidebar explains, by representing each vertex in homogeneous coordinates, the graphics system can perform the entire hierarchy of transformations n the early 1990s, ubiquitous THE GRAPHICS PIPELINE simultaneously with a single matrix- interactive 3D graphics was still The task of any 3D graphics system vector multiply. The need for efficient the stuff of science fiction. By the is to synthesize an image from a hardware to perform floating-point end of the decade, nearly every description of a scene—60 times per vector arithmetic for millions of ver- I new computer contained a graph- second for real-time graphics such as tices each second has helped drive the ics processing unit (GPU) dedicated to videogames. This scene contains the GPU parallel-computing revolution. providing a high-performance, visu- geometric primitives to be viewed as The output of this stage of the ally rich, interactive 3D experience. well as descriptions of the lights illu- pipeline is a stream of triangles, all This dramatic shift was the in- minating the scene, the way that each expressed in a common 3D coordinate evitable consequence of consumer object reflects light, and the viewer’s system in which the viewer is located demand for videogames, advances in position and orientation. at the origin, and the direction of view manufacturing technology, and the GPU designers traditionally have is aligned with the z-axis. exploitation of the inherent paral- expressed this image-synthesis process lelism in the feed-forward graphics as a hardware pipeline of specialized Lighting pipeline. Today, the raw computa- stages. Here, we provide a high-level Once each triangle is in a global tional power of a GPU dwarfs that of overview of the classic graphics coordinate system, the GPU can com- the most powerful CPU, and the gap is pipeline; our goal is to highlight those pute its color based on the lights in the steadily widening. aspects of the real-time rendering cal- scene. As an example, we describe the Furthermore, GPUs have moved culation that allow graphics applica- calculations for a single-point light away from the traditional fixed-function developers to exploit modern source (imagine a very small lightbulb). tion 3D graphics pipeline toward GPUs as general-purpose parallel The GPU handles multiple lights by a flexible general-purpose compu- computation engines. summing the contributions of each tational engine. Today, GPUs can individual light. The traditional graph- implement many parallel algorithms Pipeline input ics pipeline supports the Phong light- directly using graphics hardware. Most real-time graphics systems ing equation (B-T. Phong, “Illumina- Well-suited algorithms that leverage assume that everything is made of tri- tion for Computer-Generated Images,” all the underlying computational angles, and they first carve up any more Comm. ACM, June 1975, pp. 311- horsepower often achieve tremendous complex shapes, such as quadrilaterals 317), a phenomenological appearance speedups. Truly, the GPU is the first or curved surface patches, into trian- model that approximates the look of widely deployed commodity desktop gles. The developer uses a computer plastic. These materials combine a dull parallel computer. graphics library (such as OpenGL or diffuse base with a shiny specular high- 96 Computer r2How.qxp 23/1/07 12:44 PM Page 97 light. The Phong lighting equation gives the output color C = KdLi(N · L) Homogeneous Coordinates s + KsLi(R · V) . Table 1 defines each term in the Points in three dimensions are typically represented as a triple (x,y,z). In equation. The mathematics here isn’t computer graphics, however, it’s frequently useful to add a fourth coordinate, as important as the computation’s w, to the point representation. To convert a point to this new representation, structure; to evaluate this equation we set w = 1. To recover the original point, we apply the transformation efficiently, GPUs must again operate (x,y,z,w) —> (x/w, y/w, z/w). directly on vectors. In this case, we Although at first glance this might seem like needless complexity, it has sev- repeatedly evaluate the dot product of eral significant advantages. As a simple example, we can use the otherwise two vectors, performing a four-com- undefined point (x,y,z,0) to represent the direction vector (x,y,z). With this uni- ponent multiply-and-add operation. fied representation for points and vectors in place, we can also perform several useful transformations such as simple matrix-vector multiplies that would oth- Camera simulation erwise be impossible. For example, the multiplication The graphics pipeline next projects ⎡100Δx ⎤ ⎡ x ⎤ each colored 3D triangle onto the vir- ⎢ ⎥ ⎢ ⎥ tual camera’s film plane. Like the ⎢010Δy ⎥ ⎢ y ⎥ model transformations, the GPU does ⎢001Δz ⎥ ⎢ z ⎥ ⎢ ⎥ ⎢ ⎥ this using matrix-vector multiplication, ⎣000 1⎦ ⎣w ⎦ again leveraging efficient vector operations in hardware. This stage’s output can accomplish translation by an amount Dx, Dy, Dz. is a stream of triangles in screen coor- Furthermore, these matrices can encode useful nonlinear transformations dinates, ready to be turned into pixels. such as perspective foreshortening. Rasterization Each visible screen-space triangle resolution. Because the access pattern to a programmable computational sub- overlaps some pixels on the display; to texture memory is typically very strate that can support it. Fixed-func- determining these pixels is called ras- regular (nearby pixels tend to access tion units for transforming vertices and terization. GPU designers have incor- nearby texture image locations), spe- texturing pixels have been subsumed by porated many rasterizatiom algo- cialized cache designs help hide the a unified grid of processors, or shaders, rithms over the years, which all ex- latency of memory accesses. that can perform these tasks and much ploit one crucial observation: Each more. This evolution has taken place pixel can be treated independently Hidden surfaces over several generations by gradually from all other pixels. Therefore, the In most scenes, some objects replacing individual pipeline stages machine can handle all pixels in par- obscure other objects. If each pixel with increasingly programmable units. allel—indeed, some exotic machines were simply written to display mem- For example, the NVIDIA GeForce 3, have had a processor for each pixel. ory, the most recently submitted tri- launched in February 2001, introduced This inherent independence has led angle would appear to be in front. programmable vertex shaders. These GPU designers to build increasingly Thus, correct hidden surface removal shaders provide units that the pro- parallel sets of pipelines. would require sorting all triangles grammer can use for performing from back to front for each view, an matrix-vector multiplication, exponen- Texturing expensive operation that isn’t even tiation, and square root calculations, as The actual color of each pixel can always possible for all scenes. be taken directly from the lighting cal- All modern GPUs provide a depth culations, but for added realism, buffer, a region of memory that stores Table 1. Phong lighting equation terms. images called textures are often the distance from each pixel to the Term Meaning draped over the geometry to give the viewer. Before writing to the display, illusion of detail. GPUs store these tex- the GPU compares a pixel’s distance to Kd Diffuse color tures in high-speed memory, which the distance of the pixel that’s already Li Light color each pixel calculation must access to present, and it updates the display N Surface normal determine or modify that pixel’s color. memory only if the new pixel is closer. L Vector to light In practice, the GPU might require Ks Specular color multiple texture accesses per pixel to THE GRAPHICS PIPELINE, R Reflected light vector mitigate visual artifacts that can result EVOLVED V Vector to camera when textures appear either smaller GPUs have evolved from a hardwired s “Shininess” or larger on screen than their native implementation of the graphics pipeline February 2007 97 r2How.qxp 23/1/07 12:44 PM Page 98 HOW THINGS WORK GPUs introduced increased flexibility, adding support for longer programs, more registers, and control-flow primitives such as branches, loops, and subroutines. The ATI Radeon 9700 (July 2002) and NVIDIA GeForce FX (January 2003) replaced the often awkward reg- ister combiners with fully programmable pixel shaders. NVIDIA’s latest chip, the GeForce 8800 (November 2006), adds programmability to the primitive assembly stage, allowing developers to control how they con- struct triangles from transformed vertices. As Figure 2 shows, modern GPUs achieve stunning visual realism. Increases in precision have accom- panied increases in programmability. The traditional graphics pipeline pro- vided only 8-bit integers per color Figure 1.Programmable shading.The introduction of programmable shading in 2001 led channel, allowing values ranging from to several visual effects not previously possible,such as this simulation of refractive 0 to 255. The ATI Radeon 9700 chromatic dispersion for a “soap bubble”effect.

Load more