High Performance Computer Graphics Architectures

High Performance Computer Graphics Architectures TR89-026 July, 1989 Roger Brusq The University of North Carolina at Chapel Hill Department of Computer Science CB#3175, Sitterson Hall Chapel Hill, NC 27599-3175 UNC is an Equal Opportunity/Aflirmative Action Institution. This report is a review of several existing or annouced high performance Computer Graphics Architectures ranging from 20,000 to 1 Million triangles per second RogerBRUSQ Department of Computer Science S itterson Hall University of North Carolina at Chapel Hill Chapel Hill,NC 27599-3175 July,l5 1989 Introduction High Performance Computer Graphics page I !. INTRODUCTION 2. PART A This repon consists of three pans: A review or some recent Computer Graphics Architectures .part A is a brief review of some recent "Graphics Supercomputers" architectures: AT&T Pixel Machine, Alliant Visualization Series, Megatek Sigma 70, Apollo DNlOOOO, HP 835 Turbo SRX. 2.1 Introduction •part B is a more detailed presemalion of the basic A general approach to 3D graphics systems is concepts and performances of three High Performance outlined in figure 1·1. Computer Graphics Systems which performances range A Graphics Processor traverses a display list from 100 to 200 K triangles/seconde, i.e an order of containing the geometric primitives in the scene, magnitude higher than the previous generation (10,000 to computes viewing transformations, lighting calculations, 20,000 triangles/sec.).: clipping and scaling. Computational load for these -ARDENT TITAN Graphics Supercomputer, functions increases roughly linearly with the number of -SILICON GRAPffiCS POWER IRIS Series, primitives in the scene. -STELLAR GSlOOO Graphics Supercomputer. The Renderer determines which pixels are visible in each primitive and generates their final color into the These three machines are among the most powerful frame buffer. Computation load here can explode to the now available. They all implement a parallel processing order of the number of primitives times the number of approach to speed up both the geomelry computations and pixels per primitive. In most graphics systems, the rendering process, combining the performance of computation and bandwidth limits in the renderer minisupercomputers with real-lime high quality graphics. determine the maximum rate of image generation The section reviews the overall structure of each The Video Controller reads and convens data from the system and the different ways they implement the Frame Buffer into color streams sent to the Display Graphics Pipeline. device. The functional organization of a 3D graphics system •part C is devoted to some new designs achieving an such as described in figure 1-1 is basically a pipeline from order of magnitude increase in rendering rates over the object database to the display screen .. and the previous systems: I Million triangleS/second: expression "Graphics Pipeline" will frequently appear in -SAGE: The Systolic Array Graphics Engine, by the following paragraphs. Gharachorloo et al. (IBM Research Division, Watson Each new Graphics Supercomputer introduces some Research Center): 1 Million polygons per second (Z· innovations into one or more of the three major buffered, Gouraud shaded); components of this pipeline: geometric -The Triangle Processor, by Michael Deering et al. transformationS/clipping/perspective, image rendering, and (Schlumberger Palo Alto Research): I Million triangles scan-out from image buffer. per second (Z·buffered, Phong shading); In this section we review some of the recent -Pixel-Planes 5: I Million triangles per second (100 implementations of this Graphics Pipeline: AT&T Pixel pixels, Z-buffered,Phong shaded). Machine, Alliant Visualization Series, Megatek Sigma 70, Apollo DNIOOOO, HP 835 Turbo SRX. Fram<: Renderer Buffer • Traverse Display List • Visibility Calculations • Geometric Tnnsforrnations • Hidden Surface Elimination • Clipping & Perspective • Shading • Lighting Block diagram of a typical 3-D graphics system. figure I-I from Fuchs & Poulton ·An Architecture for Advanced Airborne Graphics Systems-May,l988 RogerBrusq UNC-CH July,J5 1989 AT&T Pixel Machine High Performance Computer Graphics page 2 ··-~-------------------------------··············--······-·-···········-··-·············-····------·--·--·-····-···············---------------------·--·-----------------------------------·-····---·--- 2.2 The AT&T Pixel Machine VME B::"::_.' ~- ~Ri ~u~v~' 1 u vsJ 1 _., i H .-~ 1 To P1xe1 Host Funnef L_~~~~:~~_ _J ri,.~ li ~:H.~ ;~~~,~~~ ,__ I :n::) ! : :roo Transformation Pipstine ! LH Pixel NoOes with Oistnbutect Frame Buffer figure 2.2-1 P1xet Machme block diagram. from [AT&T-I] II· Pipe Nodes I- Overview The pipe node is built around a 5 MIPS, lO The figure 2.2-1 shows how the Pixel Machine MFLOPS DSP32 processor, wilh parallel DMA interface combines coarse grain pipelining and MIMD computing to lhe VME bus (see figure 2.2-2). arrays to provide lhe expected performance for bolh image synlhesis and image analysis applications The design philosophy is: -use floating-point computation and large image memories, which are useful for image processing; -design simple, modular processors !hat can be repeated a number of times to build a system; -implement all algorilhms in software. L- The Pixel Machine is connected to lhe host via lhe VME bus, lhrough lhe parallel interface. The transformation pipeline comprises 9 or 18 pipe figure2.2-2 ,... __ _ nodes configurable as one or two pipelines ( see figures from[AT&T-1] 2.2-1 and 2.2-4). The rendering function is closely integrated wilh lhe The host provides input to lhe first node via lhe VME frame buffer distributed memory in lhe MIMD Pixel bus. The output from lhe last node is broadcast to all of Nodes array (see figures 2.2-1 and 2.2-3). lhe pixel nodes and !his node is also connected to lhe The tranformation pipeline feeds primitives to parallel VME bus lhrough a second output FIFO. pixel nodes lhrough a broadcast bus (see figure 2-2.3) The Transformation Pipeline can have 9 pipe nodes or The pixel funnel rearranges pixels from lhe frame 18 pipe nodes software configurable as one 18-node buffer into a properly ordered raster scan sequence pipeline or two 9-node pipelines (figure 2-2.4). In lhe case presented to lhe display device lhrough lhe video controller of two parallel pipelines, lhe broadcast bus is time-shared by the two ending nodes The broadcast bus and lhe pixel funnel are physically The pipe nodes perform 3D transformations, clipping, part of the VME backplane. projections, shading and image filtering. The pipeline can also be used as a hardware subroutine by processes Each pipe node and pixel node can be viewed as a small independent computer !hat executes its instructions running in lhe host computer, which can send data to lhe and operates on data asynchronously wilh all lhe olher first node and read results from lhe last one. nodes. Program are loaded into lhe nodes by lhe host, using unique, software defined node numbers to distinguish between !hem. RogerBrusq UNC-CH July,J5!989 AT&T Pixel Machine High Perfonnance Computer Graphics page 3 ••u••••·•••••••••·••••••••••••••··••••••••••••••••••••···••••••••••••••••••••••••••••••••••••••••••••••••·•••••••••••••••·•••••••oou•••••••••••••••••••••·•••••••••••••••••••• .. •••••·••••••••••·•··•• From transformatiOn t p.peline I VMEbus VIA£l>ul I r-- I 512 ')( 32-ert "":f.am...._ I FIFO ( k X 32·btt L """"""' statiC RAM) I I I I I .,_ r F•amo 32-l>t bu1fO< 1- ... 051'32 (64k X 32·bi! L 'lldeoRAMI) - I I I I Serial L. 110 (&Ckz """"'X 32-t>it swtteh 1- dynam>CRAM) I L : To and from figure 2.2-3 rour adjacent from [Tunik87] To (a) -· (b) In AT!ra PXM 900 Image-display system, the transtormatlon pipeline leeds primitives to parallel pixel nodes through a broadcast bus (a). Inside each pixel node are double-buffered vldea RAMs thai consi!Me In part the system'slarge frame buffer (b). Depth values reside In the Z buller, which also could supply Jlor· age space lor large Images. "m VME~s·= VMEOus ~fOus VMEOOs- - ' I I --'--- i~c· ~ ~ I$ I .,:, ' I I ,.,.;. , I ~ ' !,.,;.-,-· •o/ ~ ~ ~-.,!.,I '"+'I ::r::,-.-- $ !~"! ~· ~~J! 1Nodt 121 F.Ei ~ ~·-,-- ----,-- i-f· I s r""7 ·'l l'f'l r~·>! / ,.,.;. sj 1.,:;. s! !,.,;. ,., !-:. "l '-T-' 6 1 ~ 10 I ~ 1 I,J.,s! • I .,;,. •I ,.Je 1s! • """"" --.- ''"""' -,---- "'"'" ~ t ~~7! ~~16! I Noe;.rl I "'"" 1 "'!'"' ' ~~ I J ..;,,I :~Ill j !~81 ~ I ~ l ! I I figure 2.2-4 • • • •' from [AT&T·!] PJJ<~ Nodes .14~~N~ ......... Pue!N~ (~~~~ {O)TIO'O~PPH IC) Crt# long p(» ~,. t:Onll{}!ifiii>O!I$ Roger Brusq UNC-CH July,!5 1989 AT&T Pixel Machine High Performance Computer Graphics page 4 Ill· Pixel Nodes IV· Video display The pixel node is also built around the DSP32, with The frame buffer is distributed throughout the array of an input FIFO and 9K 32-bit words of static RAM. In pixel nodes (figure 2.2-7). The pixel funnel rearranges addition, it is provided with a 4-way multiplexed 1/0 pixels from the frame buffer in!D a properly ordered raster switch for communication with the 4 neighbours, plus scan sequence. Both the video processor and the pixel two banks of 64Kx32 bit VRAMs as pan of the frame funnel are software configurable for the five different pixel buffer, and 64Kx32 bit dynamic RAM for Z-buffering or arrays. dala store. (figures 2.2-3 and 2.2-5) ~--------------, ! ~ ... ~ Sroaacast Bus , VMEtx.Js DSP32 '-,.,-:-:---:-c'7::_,!~ !''V r~7~ ~ 9Kx32 ~4Kx32 i ! netgf!OOrS r-- SRAM VRAM ' I r-- 64Kx32 64Kx32 DRAM 1 VRAM 04tt2,0ool612 15t111!i913 2 '101412 6 101ool 1. 3 7 11 lSi J 7 11 15 P1xe1 noCe blOCk tJJagram. 0 • ! '2 0 4 • 12 1 5 9 13 1 5 i tl figure 2.2-5 2 6 10 '"' 2 6 10 14 from [AT&T-I] 3 7 1' 15 3 7 11 15 The pixel nodes form an nxm array with a distributed frame buffer. They receive their data from the broadcast bus and store output into the frame buffer or return it to figure 2.2-7 the host from [AT&T-I] The Pixel Machine can be configured with 16,20,32,40, or 64 pixel nodes (figure 2.2-6).

High Performance Computer Graphics Architectures

Tom Kearney Object Name: Sun Sparcstation IPX Vintage: C.1991 Synopsis: Sun Sparcstation IPX

· Welter ABACUS 3170 FLOATING·POINT COPROCESSOR for SPARC PRELIMINARY DATA May 1989

Coppola Marchese Unixworld.Pdf

Computer Architectures an Overview

HP-UX and K4

Unit 3: Microcomputers and Microprocessors

The Multititan: Four Architecture Papers

MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor

SMBIOS) Reference 6 Specification

BLOCK ADJUSTMENT on PERSONAL COMPUTERS Dipl.-Math

Solaris 7 5/99 Online Release Notes (Sunwrdm)

The SPARC Architecture Manual Version 8