A Closer Look at GPUs

practice
DOI:10.1145/1400181.1400197
50 COMMUNICATIONS OF THE ACM | OCTOBER 2008 | VOL. 51 | NO. 10

As the line between GPUs and CPUs begins to blur, it’s important to understand what makes GPUs tick.

BY KAYVON FATAHALIAN AND MIKE HOUSTON

ILLUSTRATION BY DREW FLAHERTY

A GAMER wanders through a virtual world rendered in near-cinematic detail. Seconds later, the screen fills with a 3D explosion, the result of unseen enemies hiding in physically accurate shadows. Disappointed, the user exits the game and returns to a computer desktop that exhibits the stylish 3D look-and-feel of a modern window manager. Both of these visual experiences require hundreds of gigaflops of computing performance, a demand met by the GPU (graphics processing unit) present in every consumer PC.

The modern GPU is a versatile processor that constitutes an extreme but compelling point in the growing space of multicore parallel computing architectures. These platforms, which include GPUs, the STI Cell Broadband Engine, the Sun UltraSPARC T2, and increasingly multicore x86 systems from Intel and AMD, differentiate themselves from traditional CPU designs by prioritizing high-throughput processing of many parallel operations over the low-latency execution of a single task.

GPUs assemble a large collection of fixed-function and software-programmable processing resources. Impressive statistics, such as ALU (arithmetic logic unit) counts and peak floating-point rates, often emerge during discussions of GPU design. Despite the inherently parallel nature of graphics, however, efficiently mapping common rendering algorithms onto GPU resources is extremely challenging.

The key to high performance lies in the strategies that hardware components and their corresponding software interfaces use to keep GPU processing resources busy. GPU designs go to great lengths to obtain high efficiency and conveniently reduce the difficulty programmers face when programming graphics applications. As a result, GPUs deliver high performance and expose an expressive but simple programming interface. This interface remains largely devoid of explicit parallelism or asynchronous execution and has proven to be portable across vendor implementations and generations of GPU designs.

At a time when the shift toward throughput-oriented CPU platforms is prompting alarm about the complexity of parallel programming, understanding key ideas behind the success of GPU computing is valuable not only for developers targeting software for GPU execution, but also for informing the design of new architectures and programming systems for other domains. In this article, we dive under the hood of a modern GPU to look at why interactive rendering is challenging and to explore the solutions GPU architects have devised to meet these challenges.

The Graphics Pipeline

A graphics system generates images that represent views of a virtual scene. This scene is defined by the geometry, orientation, and material properties of object surfaces and the position and characteristics of light sources. A scene view is described by the location of a virtual camera. Graphics systems seek to find the appropriate balance between the conflicting goals of enabling maximum performance and maintaining an expressive but simple interface for describing graphics computations.

Real-time graphics APIs such as Direct3D and OpenGL strike this balance by representing the rendering computation as a graphics processing pipeline that performs operations on four fundamental entities: vertices, primitives, fragments, and pixels. Figure 1 provides a block diagram of a simplified seven-stage graphics pipeline. Data flows between stages in streams of entities. This pipeline contains fixed-function stages (green) implementing API-specified operations and three programmable stages (red) whose behavior is defined by application code. Figure 2 illustrates the operation of key pipeline stages.

[Figure 1: A simplified graphics pipeline. Stages in order: vertex generation (VG), vertex processing (VP), primitive generation (PG), primitive processing (PP), fragment generation (FG), fragment processing (FP), pixel operations (PO). Memory buffers used by the stages: vertex descriptors and vertex data buffers (VG); global buffers and textures (VP, PP, FP); vertex topology (PG); output image (PO). Legend: fixed-function stage, shader-defined stage.]

[Figure 2: Graphics pipeline operations; panel (a) shows the input vertices.]

VG (vertex generation). Real-time graphics APIs represent surfaces as collections of simple geometric primitives (points, lines, or triangles). Each primitive is defined by a set of vertices. To initiate rendering, the application provides the pipeline’s VG stage with a list of vertex descriptors. From this list, VG prefetches vertex data from memory and constructs a stream of vertex data records for subsequent processing. In practice, each record contains the 3D (x,y,z) scene position of the vertex plus additional application-defined parameters such as surface color and normal vector orientation.

VP (vertex processing). The behavior of VP is application programmable. VP operates on each vertex independently and produces exactly one output vertex record from each input record. One of the most important operations of VP execution is computing the 2D output image (screen) projection of the 3D vertex position.

PG (primitive generation). PG uses vertex topology data provided by the application to group vertices from VP into an ordered stream of primitives (each primitive record is the concatenation of several VP output vertex records). Vertex topology also defines the order of primitives in the output stream.

PP (primitive processing). PP operates independently on each input primitive to produce zero or more output primitives. Thus, the output of PP is a new (potentially longer or shorter) ordered stream of primitives. Like VP, PP operation is application programmable.

FG (fragment generation). FG samples each primitive densely in screen space (this process is called rasterization). Each sample is manifest as a fragment record in the FG output stream. Fragment records contain the output image position of the surface sample, its distance from the virtual camera, as well as values computed via interpolation of the source primitive’s vertex parameters.

FP (fragment processing). FP simulates the interaction of light with scene surfaces to determine surface color and opacity at each fragment’s sample point. To give surfaces realistic appearances, FP computations make heavy use of filtered lookups into large, parameterized 1D, 2D, or 3D arrays called textures. FP is an application-programmable stage.

PO (pixel operations). PO uses each fragment’s screen position to calculate and apply the fragment’s contribution to output image pixel values. PO accounts for a sample’s distance from the virtual camera and discards fragments that are blocked from view by surfaces closer to the camera. When fragments from multiple primitives contribute to the value of a single pixel, as is often the case when semi-transparent surfaces overlap, many rendering techniques rely on PO to perform pixel updates in the order defined by the primitives’ positions in the PP output stream. All graphics APIs guarantee this behavior, and PO is the only stage where the order of entity processing is specified by the pipeline’s definition.

Shader Programming

The behavior of application-programmable pipeline stages (VP, PP, FP) is defined by shader functions (or shaders). Graphics programmers express vertex, primitive, and fragment shader functions in high-level shading languages such as NVIDIA’s Cg, OpenGL’s GLSL, or Microsoft’s HLSL. Shader source is compiled into bytecode offline, then transformed into a GPU-specific binary by the graphics driver at runtime.

Shading languages support complex data types and a rich set of control-flow constructs, but they do not contain primitives related to explicit parallel execution. Thus, a shader definition is a C-like function that serially computes output-entity data records from a single input entity. Each function invocation is abstracted as an independent sequence of control that executes in complete isolation from the processing of other stream entities.

As a convenience, in addition to data records from stage input and output streams, shader functions may access (but not modify) large, globally shared data buffers. Prior to pipeline execution, these buffers are initialized by the application to contain shader-specific parameters and textures.

Characteristics and Challenges

Graphics pipeline execution is characterized by the following key properties.

Opportunities for parallel processing. Graphics presents opportunities for both task- (across pipeline stages) and data- (stages operate independently on stream entities) parallelism, making parallel processing a viable strategy for increasing throughput. Despite abundant potential parallelism, however, the unpredictable cost of shader execution and constraints on the order of PO stage processing introduce dynamic, fine-grained dependencies that complicate parallel implementation […]

[…] optimized, fixed-function hardware components.

Mixture of predictable and unpredictable data access. The graphics pipeline rigidly defines inter-stage data flows using streams of entities. This predictability presents opportunities for aggregate prefetching of stream data records and highly specialized hardware […]
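
The stream-processing model behind the VP, PG, and PP stages described above can be made concrete with a small software sketch. The Python below is a minimal, hypothetical illustration, not how any GPU or graphics API implements these stages: every name in it (Vertex, ProjectedVertex, project_shader, backface_cull_shader, and the three stage functions) is invented here, the vertex shader assumes a pinhole camera at the origin looking down +z with every vertex at z > 0, and the primitive shader uses back-face culling only as an example of a stage that emits zero or one output per input.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Vertex:
    position: Vec3  # 3D (x, y, z) scene position
    color: Vec3     # example application-defined parameter


@dataclass(frozen=True)
class ProjectedVertex:
    xy: Vec2        # 2D output image (screen) projection
    depth: float    # distance along the camera's viewing axis
    color: Vec3     # carried through unchanged


# A primitive record is the concatenation of several VP output records.
Primitive = Tuple[ProjectedVertex, ...]


def project_shader(v: Vertex, focal: float = 1.0) -> ProjectedVertex:
    """Hypothetical vertex shader: pinhole projection, camera at the origin
    looking down +z. Assumes z > 0 for every vertex."""
    x, y, z = v.position
    return ProjectedVertex((focal * x / z, focal * y / z), z, v.color)


def backface_cull_shader(p: Primitive) -> List[Primitive]:
    """Hypothetical primitive shader: keep a triangle only if it is
    counter-clockwise on screen, so it emits one primitive or none."""
    (x0, y0), (x1, y1), (x2, y2) = (v.xy for v in p)
    signed_area = 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
    return [p] if signed_area > 0 else []


def vertex_processing(stream: Sequence[Vertex],
                      shader: Callable[[Vertex], ProjectedVertex]) -> List[ProjectedVertex]:
    # VP: exactly one output record per input record. Each call sees only its
    # own input, so the calls could run in any order or in parallel.
    return [shader(v) for v in stream]


def primitive_generation(verts: Sequence[ProjectedVertex],
                         topology: Sequence[Tuple[int, int, int]]) -> List[Primitive]:
    # PG: group VP outputs into triangles; the topology list fixes output order.
    return [tuple(verts[i] for i in tri) for tri in topology]


def primitive_processing(prims: Sequence[Primitive],
                         shader: Callable[[Primitive], List[Primitive]]) -> List[Primitive]:
    # PP: zero or more outputs per input, appended in input-stream order.
    out: List[Primitive] = []
    for p in prims:
        out.extend(shader(p))
    return out


verts = [Vertex((0, 0, 2), (1, 0, 0)), Vertex((2, 0, 2), (0, 1, 0)),
         Vertex((0, 2, 2), (0, 0, 1)), Vertex((2, 2, 2), (1, 1, 1))]
topology = [(0, 1, 2), (1, 3, 2), (0, 2, 1)]  # the last triangle is clockwise
projected = vertex_processing(verts, project_shader)
prims = primitive_processing(primitive_generation(projected, topology), backface_cull_shader)
print(len(projected), len(prims))  # prints: 4 2
```

Because each shader call reads only its own input record (plus, in a real pipeline, read-only global buffers), VP and PP invocations can be spread across many processing units; only the order of the output streams has to be preserved, which these list-based stage functions do trivially.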

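The fixed-function FG and PO stages described in the pipeline walkthrough can be sketched the same way. The following hypothetical Python assumes screen-space triangles given as three (x, y, depth) vertices, samples once per pixel center, and uses a constant RGBA color per triangle in place of a real FP shader. Its PO stage adds two simplifying assumptions of its own: every fragment that passes the depth test also writes the depth buffer, and colors are combined with conventional "over" blending. The two render calls at the end draw the same pair of overlapping semi-transparent triangles in opposite orders and produce different pixel colors, which illustrates why PO must apply fragments in the order of the primitives that produced them.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]
RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]
ScreenTri = Tuple[Tuple[float, float, float], ...]  # three (x, y, depth) vertices


@dataclass(frozen=True)
class Fragment:
    pixel: Pixel   # output image position of the sample
    depth: float   # interpolated distance from the virtual camera
    rgba: RGBA     # stands in for FP output (here: one constant color per triangle)


def edge(ax, ay, bx, by, px, py) -> float:
    # Twice the signed area of triangle (a, b, p); the sign tells which side of a->b p lies on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri: ScreenTri, rgba: RGBA, width: int, height: int) -> List[Fragment]:
    """FG sketch: test every pixel center, emit a fragment for each covered
    sample, and interpolate depth with barycentric weights."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []  # degenerate triangle covers no samples
    frags = []
    for py in range(height):
        for px in range(width):
            sx, sy = px + 0.5, py + 0.5
            w0 = edge(x1, y1, x2, y2, sx, sy) / area
            w1 = edge(x2, y2, x0, y0, sx, sy) / area
            w2 = edge(x0, y0, x1, y1, sx, sy) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                frags.append(Fragment((px, py), w0 * z0 + w1 * z1 + w2 * z2, rgba))
    return frags


def pixel_operations(frags: Sequence[Fragment], width: int, height: int,
                     background: RGB = (0.0, 0.0, 0.0)) -> Dict[Pixel, RGB]:
    """PO sketch: depth test (closer wins), then 'over' blending, applied
    strictly in the order the fragments arrive."""
    color = {(x, y): background for y in range(height) for x in range(width)}
    zbuf = {p: math.inf for p in color}
    for f in frags:
        if f.depth >= zbuf[f.pixel]:
            continue  # blocked by a surface closer to the camera: discard
        r, g, b, a = f.rgba
        dr, dg, db = color[f.pixel]
        color[f.pixel] = (a * r + (1 - a) * dr, a * g + (1 - a) * dg, a * b + (1 - a) * db)
        zbuf[f.pixel] = f.depth  # simplification: semi-transparent fragments write depth too
    return color


def render(prims: Sequence[Tuple[ScreenTri, RGBA]], width: int, height: int) -> Dict[Pixel, RGB]:
    # Fragments are concatenated in primitive order, so PO sees them in that order.
    frags = [f for tri, rgba in prims for f in rasterize(tri, rgba, width, height)]
    return pixel_operations(frags, width, height)


red_far = (((0, 0, 2.0), (4, 0, 2.0), (0, 4, 2.0)), (1.0, 0.0, 0.0, 0.5))
blue_near = (((0, 0, 1.0), (4, 0, 1.0), (0, 4, 1.0)), (0.0, 0.0, 1.0, 0.5))
print(render([red_far, blue_near], 1, 1)[(0, 0)])  # prints: (0.25, 0.0, 0.5)
print(render([blue_near, red_far], 1, 1)[(0, 0)])  # prints: (0.0, 0.0, 0.5)
```

Drawn far-to-near, the closer blue triangle blends over the red one; drawn near-to-far, the red fragment fails the depth test and is discarded, so the pixel never receives its contribution.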