SGI-10043 Graphics WP.Mech
Total Page:16
File Type:pdf, Size:1020Kb
White Paper The Silicon Graphics® 320 and Silicon Graphics® 540 Visual Workstation Graphics Subsystem 1.0 Introduction Table 1-1. Critical Graphics Paths Performance The designers of the Silicon Graphics 320 and 540 visual (Peak Throughputs) workstations studied the latest, popular PC graphics and imaging applications to develop a new graphics architec- Graphics to Memory 3.2GB/sec ture that optimizes the most commonly executed graphics and imaging tasks. At the heart of the system architecture, CPU to Memory 800MB/sec the graphics subsystem takes advantage of high-speed buses, a single, large, central pool of versatile memory, and a new SGI™ ASIC that incorporates both the geometry and 1.2 Graphics Performance the rendering pipelines—an industry first for ASIC design. The Silicon Graphics 320 and 540 graphics subsystem sets itself apart from PC graphics with several features: This paper discusses the technical features and archi- •A dynamically allocated, high-speed memory pool tectural design philosophies of the Silicon Graphics 320 (up to 1GB for the 320 and up to 2GB for the 540) and 540 graphics subsystem as they apply to CAD/CAM, makes it practical to simultaneously manage large visual simulation, content creation, imaging, and other amounts of framebuffer, texture, and z-buffer data applications. •A high-speed (3.2GB/second), low-latency memory bus eliminates the graphics-to-memory bottleneck 1.1 Visionary Graphics and delivers three to six times the throughput The Silicon Graphics 320 and 540 visual workstation available on PC systems graphics subsystem extends the graphics capabilities •A specialized, proprietary graphics ASIC that of traditional PC-based systems by exploiting several incorporates hardware accelerators for geometry features of the system architecture: multiple high- acceleration, rendering, hardware-based texture bandwidth buses, a highly accessible central memory mapping, dynamically allocated texture RAM, and pool, and high-speed multiprocessing. Within the 2D and image acceleration graphics subsystem itself, a new graphics ASIC (the •Direct motherboard connections among graphics, Cobalt™ chip) includes the necessary graphics functions video, audio, network, and IEEE 1394 functions to to implement both the rasterization and geometry optimize data availability, parallel processing, and pipelines at the silicon level for maximum performance content sharing among all subsystems and throughput. The result of this implementation is a visual computing Because of its hardware-accelerated graphics and the system that executes graphics operations much more system’s efficient handling of large amounts of data, efficiently than any other system in its price class. the Silicon Graphics 320 and 540 graphics subsystem While most PCs require the use of PCI and/or mother- particularly suits applications that require the creation, board cards for graphics functionality, the 320 and 540 processing, and manipulation of complex, real-time graphics ASIC puts essential graphics on the mother- visual data. Tight integration of system components board. Overall graphics performance is improved and provides both the bandwidth (see Table 1-1) and the system costs are lowered. Table 1-2 lists the essential processing power (see Table 1-2) to deliver workstation- graphics performance numbers for the Silicon Graphics class graphics while effectively competing in the PC 320 and 540 systems. price range. Table 1-2. Cobalt Sustained Graphics Performance Numbers 32-bit color, 24-bit z, 500 MHz PIII Smooth-shaded, z-buffered pixels/sec 180M/sec Bilinearly mip-mapped, textured, z-buffered pixels/sec 150M/sec Trilinearly mip-mapped, textured, z-buffered pixels/sec 120M/sec Pixel transfer (pixels/sec) 180M/sec 10-pixel non-anti-aliased vectors/sec 7M/sec 10-pixel anti-aliased vectors/sec 3.5M/sec 1-pixel lit, smooth-s shaded, 2 buf trils 7.4M/sec 25-pixel smooth-shaded, z-buffered triangles/sec 3.5M/sec 25-pixel trilinearly mip-mapped, textured, z-buffered triangles/sec 1.5M/sec Clear rate 2.7GB/sec 2 2.0 Graphics Subsystem Description • Window clipping support through screen masks and window clip IDs 2.1 Geometry and Rasterization Pipelines on a Chip • Linear framebuffer addressing The Cobalt ASIC includes: • Occlusion testing • 2D/3D rendering and geometry pipelines • Post-texture specular highlights (minimizes CPU involvement in graphics) • 8, 16, 32-bit color formats and 16/16, 32/32 double- • Memory controller buffer formats • Interfaces to the processor (P6) bus, memory, • Pixel transfers through rendering pipeline with I/O subsystem, and the display subsystem format conversion • Subsampled formats for YUV video The incorporation of this much functionality and • 4x4 color matrix for color space conversion, connectivity required the following package scale/bias, and component swizzling and chip technology: • Clear operations at full memory bandwidth • 1521 Ball flip-chip package for high pin count • Addressing up to 4k x 4k framebuffers and • Five layers, 0.25 mm technology texture maps • 13.6 mm x 13.6 mm die size • Memory management hardware for allocation • 10M transistors of graphics buffers in system memory The rendering engine presents a high-level program- Scalable Graphics Memory ming interface that offloads many graphics operations • Up to 1GB for Silicon Graphics 320; 2GB for Silicon from the CPUs. Primitives are described by their Graphics 540 window-space vertex coordinates and attributes such • 32-bit, double-buffered as vertex normal, color, texture coordinates, fog, and z. • 24-bit z buffer To reduce traffic between the processor bus and mem- • 8-bit overlay ory system, the interface also supports connected line • 8-bit stencil and triangle mesh primitive descriptions. The Cobalt • Texture memory, up to approximately 900MB/1.9GB ASIC supports the following features and functions. (Silicon Graphics 320/540, respectively) Rendering Engine (Geometry Operations) Resolution • Vertex-attribute interface for point, line, line strip, • Up to 1920x1200 at 66 Hz in 32-bit RGBA (24-bit color triangle, triangle strip, and rectangle primitives plus 8-bit alpha) • Vertex transforms from object to window coordinates • View frustum clip-check 2.2 System Memory • Normal transform and normalization The main memory system, with a 288-bit bus (32-bit • Connected line and triangle mesh interface ECC), achieves data rates of up to 3.2GB per second • Rasterizer setup, attribute interpolation setup from the synchronous DRAM (SDRAM) implementation. and anti-aliased line setup from primitive A single, central, dynamically allocated pool of memory vertices and vertex attributes offers several advantages for graphics performance and • Front and back face culling capabilities including simplified programming, high • Per-vertex lighting computation for up to four graphics processing rates, and lowered costs. Unlike lights (and support for accumulating a host- typical PC architectures, where data must be transferred supplied contribution from other light sources) between main memory and graphics, video, and imaging memory located on separate interface cards, all of the Other Rendering Engine Functions and Features 320/540 data resides in main memory where every sub- (Rasterization Operations) system has direct access to it. • Gouraud shading • Depth buffering for 16-bit floating point and Memory for the systemincluding, frame buffer, z-buffer, 24-bit fixed point z-buffer formats texturing, rendering, imaging, and video are all allocated • Stencil buffering for 8-bit stencil values by the Cobalt chipset. Instead of maintaining multiple • Texture mapping with nearest, bilinear, memory systems throughout the computer (a design and trilinear mip-mapped filtering that forces data movement through limited bus architec- • Anti-aliased points and lines ture and therefore degrades performance), the Cobalt • Fogging chipset allows the graphics memory to be dynamically • Scissored rendering allocated from system memory. Memory is allocated • Line and polygon stippling in tiled pages. To minimize page fragmentation and • Alpha and chroma testing reduce DRAM page crossings incurred during render- • Alpha blending ing, the pages are organized linearly (without tiles). • Logical operations • Color plane masking • Dithering for 5-bit RGB components 3 3.0 Graphics Software 4.3 Content Creation In addition to the hardware-accelerated functions, the Content creation applications require a broad range Silicon Graphics 320 and 540 systems offer graphics of capabilities. Complex models demand the geometry performance enhancements in the form of library processing functions (similar to CAD/CAM) and the optimizations. Data and control flows are tuned for creation of texture-rich environments calls for key Windows NT® graphics and imaging applications advanced visual simulation functions. Content without sacrificing compatibilty. creation benefits from 320/540 features such as: • A unique combination of vertex throughput and The Silicon Graphics 320 and 540 systems support fast fill rates these standard API’s: • Large amounts of texture memory for rich • OpenGL® 1.2 environments • Microsoft® Direct3D® • Microsoft® DirectDraw® 4.4 Visual Simulation Visual simulation tasks require the ability to walk through complex 3D scenes, whether they be architec- 4.0 Application Areas tural structures, battlefield diagrams, or game environ- The designers of the 320/540 graphics subsystem ments. In order for a system to be used in this setting, optimized execution for a broad