Europaisches Patentamt (19) European Patent Office Office europeeneen des brevets EP 0 752 685 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: (51) Intel e G06T 15/10 08.01.1997 Bulletin 1997/02

(21) Application number: 96304978.8

(22) Date of filing: 05.07.1996

(84) Designated Contracting States: • Kamen, Yakov DE FR GB IT NL Cupertino. California 95014 (US)

(30) Priority: 06.07.1995 US 498733 (74) Representative: Harris, Ian Richard c/o D. Young & Co., (71) Applicant: SUN MICROSYSTEMS, INC. 21 New Fetter Lane Mountain View, CA 94043 (US) London EC4A 1 DA (GB)

(72) Inventors: • Goldberg, Richard M. Los Gatos, California 95030 (US)

(54) Method and apparatus for efficient rendering of three-dimensional scenes

(57) A method and apparatus for rendering an ob- ject or scene from a preselected viewpoint onto a dis- play. The object is represented by a texture map stored in a memory of a processor-based system, and the view- 100 point is represented by geometry data stored in the memory. The viewpoint on the object may be represent- Intelligent Memory ed in the geometry data a polygon (or more than one Controller polygon). The processor determines span data by edge- MEMORY CONTROLLER walking the polygon, and transfers the span data to the memory controller. Beginning with a first such span, the processor then transfers the span data (one span at a time) to the memory controller. After each such transfer, the memory controller takes over execution of the ren- dering procedure, beginning with mapping the current span onto a span of voxels (volume elements) in texture map space. The memory controller then retrieves the colors and textures for that span, and renders the span accordingly (i.e. either displays it or writes it to an ap- propriate memory). Control then returns to the proces- sor, which transfers the data for the span, and the Mass Storage Display controller again takes over the remainder of the Geometry memory Data rendering procedure for that The transfer of con- span. Texture trol back and forth is repeated until all the of the 182 Map spans / first such polygon are rendered, and until all such poly- 184 gons have been so processed, thus greatly increasing the efficiency and throughput of graphics data in the ren- dering pipeline. The procedure is made more efficient Figure 1 00 by the use of a dedicated portion of memory for the CO graphics data, under the exclusive control of the mem- CM ory controller. IO Is- o a. LU

Printed by Jouve, 75001 PARIS (FR) EP 0 752 685 A1

Description

The present invention relates to the visualization and display of three-dimensional scenes or objects on a computer system, and in particular to a new method and apparatus for accelerating the display of three-dimensional scenes. 5 "Visualization" broadly refers to all of the steps taken by a graphics display system from accessing the 3D graphics data (of a scene or object) to rendering the data as a scene on a display device. There are three basic types of 3D data visualization carried out by computer systems today: (1 ) geometry (polygon) rendering (such as wire frame rendering); (2) volume rendering; and (3) 3D texture mapping, which will typically include some features of geometry rendering. The present invention is directed specifically towards more rapid texture mapping 10 than is available with conventional systems. For an extended treatment on methods of graphics visualization, see Foley, van Dam, et al., Computer Graphics - Principles and Practice (2d Ed. 1990 Addison Wesley), incorporated herein by reference. In conventional systems, a graphics application executing on a processor will generate three-dimensional data representing an object or scene for display on a computer monitor or other display or output device. 3D geometry data is are typically processed in the processor under the control of the graphics application. If the system includes a graphics accelerator, the processed data are then sent to the graphics accelerator for rendering on the monitor. 3D texture mapping is an intermediate visualization combining data from geometry rendering and volumetric data from volume rendering. Volumetric data at its simplest is simply a set of 3D pixels (discrete color values), normally referred to as "voxels". Conventional 3D texture mapping can also be combined with arbitrary cutting or clipping to 20 reveal hidden structures within the voxel data. Texture mapping in three dimensions is typically quite a time-consuming process. If the main processor of a com- puter or other processor-based system is used to carry out the rendering procedures, this can considerably degrade the performance of the system, which typically uses the processor for many other functions and processes. Dedicated graphics controllers and accelerators have been developed which can speed up the rendering procedures, but they 25 are generally quite expensive, so that fully equipping a system with dedicated graphics modules can be prohibitive. Some systems in use today utilize a dedicated pixel bus (a special memory bus just for the pixel data) with an extra memory controller in order to speed up the rendering/display procedure, but this increases the cost and complexity of the system. Thus, in conventional systems a choice must be made to stress either the low cost or the high performance of the 30 architecture. A system is needed that answers both of these challenges - speed and expense. Ideally, a system would be provided that provides the high performance of dedicated graphics modules with the lower cost of general-purpose processors. Aspects of the invention are set out in the accompanying independent claims. Preferred and further features are 35 set out in the dependent Claims. An embodiment of the invention provides for rendering an object or scene from viewpoints selected by a user or a program. The system utilizes an architecture with a processor, a memory, a display device and a dedicated graphics memory controller which is configured to take exclusive control over a number of processes conventionally handled by the processor. The system is provided, in memory, with standard geometry (model coordinate) data and texture 40 map data, and in the rendering method of the invention, the processor executes procedures to bind the model coordinate data to the texture map data, and to transform the model coordinates to device coordinates, i.e. coordinates corre- sponding to regions on the display device. The processor also generates conventional spans, i.e. vectors representing regions of the texture map that will eventually be displayed as scan lines on the display device. The processor then transfers the span information, one span at a time, to the memory controller. After each such 45 transfer, the memory controller takes over control of the rendering procedure and maps, the current span onto the voxels (volume elements) of the texture map space. Each span is in turn rendered with its assigned colors, textures, etc. to the display, or alternatively to a display memory or other memory, for later display. Execution control is now handed back to the processor, which transmits the data for the next span to the memory controller, and then transfers control to the memory controller for texture mapping and rendering. The procedure is so repeated until all the spans for a given polygon (representing the viewpoint onto the object) have been rendered. If there are other polygons, they are then processed in the same manner. The transfer of control back and forth between the processor and the memory controller greatly increases the speed with which objects can be rendered, because the respective functions that the processor and memory controller execute are particularly efficient for those respective devices, and the memory controller's execution of the texture 55 mapping and rendering procedures, which are especially computation-intensive, allows the processor to carry out other functions uninhibited by the multiple interrupts and general degradation of performance that would otherwise occur. The process is made yet more efficient by the establishment of a portion of main (or cache) memory exclusively for the control of the memory controller, so that block transfers of large amounts of graphics data to and from this

2 EP 0 752 685 A1

portion do not interfere with memory accesses to the other portions of memory by the processor. Embodiments of the invention are described hereinafter, by way of example only, with reference to the accompa- nying drawings, in which:

5 Figure 1 is a block diagram of an apparatus incorporating the present invention. Figure 2 is a diagram illustrating a slice-plane view on a 3D object in model coordinate space. Figure 3 is a diagram illustrating the object and view plane of Figure 2 in 3D texture map space. Figure 4 is a flow chart of a method of the invention. Figure 5 is a diagram illustrating the interpolation procedure of the invention for generating spans. 10 Figure 1 illustrates a computer system 100 of the invention, including a processor 110 coupled via a memory controller 120 including an Intelligent Memory Controller (IMC) 130 to a bus 140. The processor 110 may be a con- ventional microprocessor as currently used in workstations or personal computers, and the memory controller 120 is standard but for the inclusion of the IMC, whose functions are discussed in detail below. is Also coupled to the bus 140 is a main memory (such as DRAM) 150 or cache memory (e.g. SRAM), which is partitioned, preferably but not necessarily by the operating system, into contiguous memory CMEM 160 and system memory SYM 170. The INIC 130 is preferably dedicated hardware for controlling a portion the CMEM, and may for instance Sun Microsystems, Inc.'s SPARCstation 10SX memory controller chip discussed in detail in a document en- titled "SPARCstation 10sX Graphics Technology - A White Paper (Sun Microsystems, Inc. 1993)" (Appendix A), a 20 copy of which is to be found on the European Patent Office file for this application, and the content of which is incor- porated herein by reference. The basic architecture and software configuration as described in that paper may be adapted to implement the present invention. For instance, as will be seen from Appendix A, the IMC hardware is built into the motherboard of the system described there, which includes Sun Microsystems' SS10SX and SS20 model workstations. These workstations provide a suitable platform, without any hardware modifications, for the present in- 25 vention; only the software necessary to implement the method steps of the present invention need be supplied, which is a straightforward matter, given the present teaching. The processor will typically have an external cache (Ecache) 115 connected to it in a conventional manner. The system 1 00 also includes disk storage 1 80, I/O devices 1 90 and a display 200 or other output device capable of graphically representing data, such as monitors and/or printers. Disk storage 180 may alternatively be any mass 30 storage medium, volatile or involatile, including RAM, tape, CD-ROM, etc. Network connections are made via conventional network hardware 205. The devices 180-205 may all be conven- tional, and may be connected to the processor in a variety of configurations, such as over the main system bus or over the memory bus, as the case may be. The particular bus connections for these peripherals is not crucial to the present invention, though the architecture of Appendix A teaches desirable configurations (such as in Figure 2.1 on page 17), 35 wherein the Sbus devices may include the I/O devices, network connections, mass storage, and so on. Other config- urations are also usable. The following description of an embodiment of the invention is primarily in terms of rendering and displaying graph- ics data; however, the same types of issues of efficiency, cost and speed arise in the setting of rendering graphics data for storage as files. That is, the features of the invention are independent of the destination of the rendered data - it 40 may be a display, a printer, a Postcript file or some other type of file, another processor on a network, and so on. Accordingly, discussion below of "displaying" the output graphics data should be read as referring also to output of the graphics data to any process or device (for display, storage, etc.) desired, and "rendering to a display" or to a "display memory" should be taken broadly to include also storage to any storage medium from which the rendered data may later be retrieved. 45 In a typical setting where a user wishes to view 3D graphics, two types of data are stored on mass storage 180: geometry data and texture map data. The geometry data relates to the "model coordinate space", i.e. the (normally Euclidean) geometry representing the three-dimensional space in which the scene or object to be viewed is defined. Herein, the rendering of "objects" will be addressed, but it should be understood that this may apply to any three- dimensional setting with one or more objects in it. so For example, in Figure 2 the model space is represented by the x-y-z coordinate system 210. An object 220 is represented in a geometry data file 182 stored on mass storage 180, in a conventional fashion. This geometry data file is what is referred to in the field of graphics computing as a "list of vertices", because it generally takes the form of a representation of numerous polygons, such as triangles, into which the object has previously been tessellated. The vertices of these triangles are stored and constitute a complete representation of the object, to the limits of the resolution 55 of the scanning, CAD or other system that generated them and to the limits of the amount of disk space available to store the resulting geometry data file. The texture map data includes a representation of the stored object, including information about colors, textures and other details necessary to fully render the object. The texture map is stored as a data file 184 on mass storage

3 EP 0 752 685 A1

180. There are a number of conventional formats used in the field of graphics computing that are suitable for this purpose, such as vff (visual/volume file format) or the .obj, .nff, .wrl formats, etc. The raw sequential byte format is widely used in the field, wherein voxel data are stored for the texture map cube. Typically, the data header includes information about the dimensions of the data cube, followed by the raw byte data. For a particular implementation of 5 a configuration utilizing applicant's commercial IMC, see a document entitled "Platform Notes: SPARcstation 10SX and SPARCstation 20 System Configuration Guide (Sun Microsystems, Inc. 1 994) (Appendix B), a copy of which is to be found on the European Patent Office file for this application and the content of which is incorporated herein by reference. Conventionally, as shown in Figure 3, the texture map space (which may also be referred to as the "voxel data 10 set") is represented as a 3D Euclidean coordinate system 31 0 (with mutually orthogonal dimensions t, u, v). The object 220 is represented in this space as all of the information necessary to represent it from any of a number of user- selectable viewpoints, including those that slice through the object. Typically, the texture map is represented as an effective cube of data, which may be, for instance, 256 bytes on a side or approximately 16 megabytes of data. Though the geometry 182 and texture map 184 are illustrated in Figure 1 as separate data files, they may alter- 15 natively be represented as different portions of a single data file. In Figure 2, a viewing plane (or "slice plane") 230 is defined by vertices P1-P2-P3, and intersects at region 240 with the object 220. Slice plane 230 represents a point of view relative to the object 220, which may be selected by a user at a workstation (by mouse or user-entered commands), by a process of a running application, or by any other suitable mechanism, whether hardware or software. The output of the resulting rendered graphics data (monitor, output 20 file, processor on a network, etc.) will depend upon the requesting process. Typically, for instance, a user at a work- station may want the rendered graphics data to be displayed on a monitor while an executing application may want the data to be stored in a file (thought the reverse is certainly possible and likely, as well). When slice plane 230 is selected, region 240 is displayed on the display 200; the viewing plane may be moved and rotated to any desired position or angle. This is known in the field of graphics display systems as "slice extraction", 25 whose procedures are well known in today's graphics display systems. When the viewpoint on a complex scene or object is changed in a conventional system, the processor must execute considerable recalculations to determine which portions of the object are to be displayed, and to determine what tex- tures and colors to apply to the polygons to be rendered. The processor would normally access the data in system memory (or from a network main system memory, tape drive, or any other mass storage medium, volatile or involatile) 30 via the (conventional) memory controller, then wait as arbitration for the bus and any other conflicts are resolved, and finally when it receives the data, load it to the extent possible in its external cache. All of this competes with other executing processes, and interferes with the performance of the entire system. The present invention greatly increases the potential speed of graphics rendering by a number of mechanisms. Instead of having the processor reload data from disk to cache each time a viewpoint is changed, the CMEM region 35 1 60 of memory is reserved for this purpose. The CMEM region is under the exclusive control of the IMC 120, which may be specified in the operating system. For instance, this is a feature of the SOLARIS operating system in applicant's SPARCstation workstations, which may be used to implement the invention. (See Appendices A and B.) The SOLARIS operating system is configured so that the software, or a user or programmer, may specify any sized portion of the memory 1 50 as CMEM. For instance, if the memory 1 50 is 32 MB, then up to 32 MB of RAM (minus the amount needed 40 for the operating system) is available to the IMC for partitioning off as CMEM. The total of CMEM and SYM in this case will equal 32 MB. The use of the CMEM under the control of the IMC increases processor performance dramatically because, among other reasons, the processor must execute all instructions on a clock-cycle-by-clock-cycle basis, while the IMC can block-move graphics data from the CMEM to any other portion of memory (such as to display RAM) very quickly. 45 Figure 4 illustrates the preferred embodiment of the method 400 of the invention. Each of the boxes in Figure 4 represents a step or series of steps to be executed by a program or program module stored as instructions in the system memory 170 and executable by either the processor 110 or the IMC 130, as specified below. The authoring of such program modules is a straightforward matter, given the teachings of the present invention, and may be done in any suitable conventional programming language, such as C. so At box 410, the processor, at the instigation of a user or an executing process, executes an instruction to load graphics data into memory. In this embodiment, the processor sends this instruction as a request to the IMC 130 to load the geometry data 182 and texture map 184 into CMEM 180. Thus, there is no competition with other processes being executed by the processor; rather, the IMC takes over the loading of the graphics data, and moves it directly into the CMEM 160, while the processor continues to execute its ongoing processes. 55 At box 420, the processor binds the geometry data (in model coordinate space or MCS) to the texture map data. The input to this step is the list of vertices (x,y,z), and the output is that list of vertices correlated with the appropriate voxels in the texture map space, i.e. those voxels that correspond to the three-dimensional positions of the vertices of slice plane 230. The input to the binding procedure is thus, for the example in Figures 2 and 3, the list of coordinates

4 EP 0 752 685 A1

in model coordinate space of the vertices P1 -P2-P3, i.e. (x1 , y1 , z1 ; x2, y2, z2; x3, y3, z3), or in normal vector notation:

fxl\ fx2\ fx3\ n- P3=

The binding procedure itself is conventional. 10 After the binding procedure, the resulting data structure is a vector in the form of P1 -P2-P3 = (x1 , y1 , z1 , t1 , u1 , v1 ; x2, y2, z2, t2, u2, v2; x3, y3, z3, t3, u3, v3):

15

(Eq. 2)

20

That is, each of the vertices P1-P3 has associated with it ("bound" to it) its own unique t-u-v coordinate in the texture map space. Typically, the user (or programmer) will supply this binding information, i.e. the correlation between the vertices in model coordinate space and their corresponding coordinates in texture map space. 25 At box 430, the processor transforms the model coordinates of the three points P1 , P2 and P3 to screen coordinates, in a conventional manner, such as looking them up in a table or by using a transformation matrix. Then, at box 440, the processor culls out invisible regions -- also done in a conventional manner. At box 442, the next polygon is selected for rendering. On the first pass through the procedure, the first such polygon is selected as the current polygon. In the present example, there is only one such polygon, namely triangle 230. 30 At box 444, the processor carries out a procedure known in the field as "edge walking". This is a method of deter- mining the data relating to the spans. The spans may generally be referred to as "discrete" data about the object; i.e, they only approximate a continuous, real-life object. The edge walking procedure is conventional, and basically involves traversing the triangle 230 from point P1 to P3 by predetermined regular increments in the y-dimension (in screen coordinates), and interpolating for each incre- 35 ment the corresponding x-value for the left and right edges of the span. From this, the direction and length of the span with respect to the left-edge intersection can be determined, as well as the (t,u.v) values of the first pixel (the leftmost pixel) on that span. The edge-walking procedure makes use of a linear interpolation proceduce to determine the (tuv) coordinates for the left end of the current span. Interpolation is used because this point lies between the vertices of the triangle. 40 For instance, in Figure 5 scan line 231 is depicted (the rest of the polygon being omitted for the sake of clarity), projected onto dimensions (t, u) of the texture map space. P.3 and P.1 are the vertices, and Pint can be Pleft of Figure 3, or generally any point between P.2 and P. 1 , typically representing the left edge of a span. (It will be assumed for the purposes of illustration that the v-dimension (out of the page in Figure 5) is constant, though that is not generally the case, and in particular is not the case in the example of Figures 2 and 3.) 45 For each intermediate point Pint along the line segment P.3-P.1, it is a straightforward matter to determine the corresponding t- and u- values in texture map space; it is simply a matter of linear interpolation. It may, for instance, be done as follows. The length (Pint - P.3) is divided by the scan length (P.1 - P.3), yielding a ratio R. The t-/u- coordinates corresponding to P.3 (e.g. (t.1, u.1))andP1 (t.r, u.r) are determined. To determine the t-/u- coordinates (t.int, u.int) - and hence texture map addresses - corresponding to any intermediate point Pint, the following formulas may be used: 50 t int = R * (t.r - 1. 1 ) (Eq. 3)

u.int = R * (u.r - u.1) (Eq. 4)

This is easily generalizable into three dimensions, i.e. z.int = R * (z.r - z.l) (Eq. 5). In this manner, all texture map space coordinates (and texture map addresses) can be derived from the (x, y, z) coordinates of just the endpoints of the input

5 EP 0 752 685 A1

scan lines. Thus, the result of edge walking (i.e. one iteration of step 444) is to generate for the current span: an associated starting point (x,y values in screen coordinates for the left intersection point, determined from a predefined table stored in memory); vector information (including length and direction); and the (tuv) coordinates for the leftmost point of each 5 span. The "length" of the vector refers to the width of the span, in pixels, as it will appear on the display. At the end of execution of step 444, the data for one new (i.e. the current) span has been generated. The above edge walking procedure is an efficient one, but other suitable methods could also be used; the important thing is that the geometry be organized in such a way as to map onto the texture map so that a viewpoint onto the object may be rendered on a display. 10 In addition to the above, at step 444 the processor also determines the 3D slope of the current scan line in texture map space. For a given span such as span 231 , it is determined what the (tuv) values correspond to the (xyz) screen- coordinate system values for the left intersection point P. left and the right intersection point P. right. The rate of change of (t vs. x) is be calculated, as well as (u vs. x) and (v vs. x), as follows:

15 At (t. right -t. left) Au (u. right - u.left) Ay (v. right - v. left) _ . _ . _ 1,£ q' „. Ax " (x. right - x.left) ' Ax " (x. right - x.left) ' Ax " (x. right - x. left) '

where x. right is the x-coordinate of Pright, t. right is the t-coordinate of Pright as mapped onto the texture map space, with analogous definitions for the other (tuv) and (xyz) values in this equation. 20 These At/Ax, Au/Ax and Av/Ax values represent the 3D slope of a given span in texture map space. At box 446, a new span is selected as the current span. The spans constitute the intersections of scan lines (in screen coordinates) with the slice plane 230. The spans are represented as lines 231-235 in Figures 2 and 3. There will typically be hundreds of these per displayed object or scene, though for clarity's sake only four are illustrated in Figures 2-3. These features are well known in the field of computer graphics. Thus, in Figures 2 and 3, spans 231 -235 25 correspond to what will ultimately be horizontal scans across the display. At box 450, the processor 110 transfers the information for the current span (including the left edge intersection, the vector - direction and length - data, and the 3D slope), along with the physical addresses for the texture map, to the IMC 130. At box 460, the IMC carries out the actual texture mapping of the current span, i.e the projection of the current so scan onto texture map space to determine the voxel data for each point on the span. The IMC can determine the (tuv) value for each (xyz) value in a given span in the following manner. The (tuv) value for Pleft is given as input to the IMC in box 450. The IMC stores this value pair (tuv; xyz) and then calculates the (tuv) value corresponding to the next x- increment, i.e. the next pixel to the right of the current pixel on the current span, proceeding one pixel at a time (one increment of the x-value) per calculation. 35 At each such x-value incrementation, using the three slope values of Equations 6 the corresponding (tuv) values for the current pixel are determined. Additions based upon 3 (flesh this out) can be processed quite rapidly, more rapidly than the individual (tuv) values corresponding to the (xyz) values could be looked up in the aforementioned table, and more rapidly than divisions could be carried out. Thus, the mapping process is efficiently handled by the IMC. At box 470, the IMC then retrieves the colors/textures for the endpoints and internal points of the current span by 40 tabulating (looking up in a table) a linear memory address from the resultant (tuv) values. That is, each coordinate point in the texture map has an associated value for a color and intensity of the pixel that will be displayed to represent that point. This may be in the form of an actual RGB (or CYM or other format) value stored at that address in the texture map 1 84, or it may be a pointer or address to a table or process for determining the color value. Such a process might be a shading procedure. If a table is referenced, then actual color values may be stored in the table entries. A number 45 of variations are possible, and can be selected by the designer of the particular implementation. At box 480, the current scan line is rendered by the IMC, i.e. it is written to display memory (such as a VRAM or display buffer), output directly to a display, spooled for printing, written to a file, or the like. Control of the procedure is now transferred from the IMC to the processor. At box 490, the processor determines whether there is another span to be rendered in this polygon. In the example 50 of Figures 2-3, the method thus proceeds to step 446 for execution of steps 446-480 on the data representing span 232, and subsequently likewise for spans 233-235. Once the entire triangle 230 has been processed by the steps at boxes 442-490, then at box 500 the processor determines whether there is another polygon (e.g. triangle) in this scene or view to be rendered. If so, the method proceeds to step 442, and if not the method proceeds to step 510, where the processor determines whether another 55 view or geometry has been selected (e.g. by the user or by a program's executing process) for rendering. If so, then the method proceeds to box 410 to begin anew, and otherwise the method stops at box 520. It will be appreciated from the foregoing that optimal use is made of both the processor and the IMC; they pass control back and forth to one another to execute those steps that are most efficient for each. Thus, the texture mapping,

6 EP 0 752 685 A1

retrieval of voxel data and rendering steps (boxes 460-480) are all, in balance, most efficiently executed by the IMC, freeing up the processor for other tasks. The use of the dedicated CMEM by the IMC allows this to be done, and thus the processor is not tied up either by intensive graphics calculations or by massive data transfers over the bus to and from memory. 5 The steps that the processor does execute are, however, efficient for it to do so, and unsuitable for the IMC. Thus, high efficiency of 3D graphics rendering is achieved without great architectural complexity or the expense of a dedicated pixel bus or other dedicated hardware, but the expediency in the method of control passing between the processor and the intelligent memory controller. While the present invention has been described with reference to a few specific embodiments, the description is 10 illustrative of the invention and is not to be construed as limiting the invention. Various modifications may occur to those skilled in the art within the scope of the invention. For example, although in the described embodiment the invention has been implemented using computer software, it will be appreciated that functions implemented by the software could be implemented in firmware or by means of special purpose hardware (e.g. ASICs) in alternative embodiments.

15

FILE E 435 9 IRH

20

25

30

35

APPENDIX A 40

45 THIS DOCUMENT IS REFERRED TO IN THE ACCOMPANYING APPLICATION AND IS INCORPORATED IN THAT APPLICATION BY REFERENCE.

THIS DOCUMENT IS TO BE PLACED ON THE EUROPEAN PATENT OFFICE so FILE FOR THE ACCOMPANYING APPLICATION SO THAT IT IS AVAILABLE FOR PUBLIC INSPECTION

55

7 EP 0 752 685 A1

SPARCstation 10SX Graphics Technology

A White Paper

^Sun

8 EP 0 752 685 A1

© 1993 Sun Microsystems, Inc. *" 2550 Garcia Avenue, Mountain View, California 94043-1100 U.S. A All rights reserved. This product and related documentation is protected by copyright and distributed under licenses 5 restricting its use, copying, distribution and decompilation. No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Portions of this product may be derived from the UNIX® and Berkeley 4.3 BSD systems, licensed from UNIX Systems Laboratories, Inc. and the University of California, respectively. Third party font software in this product is protected bv copvrightand licensed from Sun's Font Suppliers. 10 RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (cKlKii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 and FAR52.227-19. The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending 15 applications. TRADEMARKS Sun, Sun Microsystems, the Sun logo, SunSoft, Solaris, OpenWindows, the XIL Imaging Library, the Graphics Library, the SunPHIGS Graphics Library, and the Direct Graphics Library are trademarks or registered trademarks of Sun Microsystems, Inc. UNLXand OPEN LOOK are registered trademarks of UNIX System Laboratories, Inc. All 20 other product names mentioned herein are the trademarks of their respective owners. IrisGL and OpcnGL are trademarks of Silicon Graphics, Inc. NTH CL is a trademark of Nth Graphics. PosScript and Display PostScript are trademarks of Adobe Systems. Figaro PHIGS is a trademark of Liant Software. GRAFPAK GKS is a trademark of Advanced Technology Center. HOOPS is a trademark of Ithaca Software. Prior GKS is a trademark of Gallium Software. 25 All SPARC trademarks, including the SCD Compliant Logo, are trademarks or registered trademarks of 5PARC International, Inc. SPARCstation, SPARCstation 10SX.5PARCserver,SPARCengine,SPARCworks,and SPARCompiler are licensed exclusively to Sun Microsystems, Inc. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK® and Sun™ Graphical User Interfaces were developed by Sun Microsystems, Inc. for its users and 30 licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox , which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements. X System is a trademark and product of the Massachusetts Institute of Technology. THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KLND, EITHER EXPRESS OR IMPLIED, LN'CLUDLNG, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-LNFRLNGEMENT. THIS PUBLICATION COULD INCLUDE TECHNIC A LIN ACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE LNTORMATION HERELN; THESE CHANGES WILL BE 40 INCORPORATED LN NEW EDITIONS OF THE PUBLICATION. SUN MICROSYSTEMS, LNC. MAY MAKE IMPROVEMENTS AND/OR CHANGES LN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS PUBLICATION AT ANY TIME.

9 EP 0 752 685 A1

Contents

1. Overview 1

Introduction 1

Balanced Graphics Functionality and Performance 3

Principal Markets 5

Color Prepress 5

Technical Imaging 6

Comercial and Technical Document Imaging 6

Electronic Design Automation 7

Mechanical Design and Computer-Aided Design 7

Geographic Information Systems (GIS) 7

Earth Resources 8

Multimedia 8

2. Architectural Overview 9

SPARCstation 10SX Architecture 9

SPARCstation 10SX High-Level Architecture 12

10 EP 0 752 685 A1

SMC Low-Level Architecture r. 13

SPARCstation 10SX Packaging 15

The SMC Processor 15

SPARCstation 10SX Configuration Options 16

SPARCstation 10SX Graphics Features and Functionality .... 17 Integer Arithmetic 17

Vectorization 18

Conditionals 18

Accelerated Memory 18

Accelerated Rendering Functions 19

SPARCstation 10SX System Performance 20

3. SPARCstation 10SX Software 23

SPARCstation 10SX Software Interfaces 23

The XIL Imaging and Video Library 23

The XGL Geometry Library 27

The OpenWindows Environment 28

Direct Xlib vs. Xlib Graphics Libraries 28

4. Summary 31

References 33

Glossary 35

SPARCstation WSX Graphics Technology White Paper— October 1993

11 EP 0 752 685 A1

Overview 1 —

Over the last several decades, computer graphics has grown from specialized applications in niche markers such as mechanical design and mapping, to broad usage in areas ranging from electronic design and animation to graphics arts and multimedia. Accompanying the spread of computer graphics use have been the general availability of low-cost graphics subsystems, the proliferation of development tools and graphical user interfaces, and an explosion in the number of available applications. These general trends have largely occurred through advances in graphics technology, such as high-quality rendering tools, sophisticated 2-D and 3-D accelerators, and the emergence of standards in application program interfaces. Coincident with these advances has also been the trend toward providing these features on low-cost desktop systems, leading to affordable solutions for increasingly broader groups of users. Despite the continued reduction in the cost of desktop systems, graphics device architecture has not changed significantly over time. 2-D and 3-D graphics display devices, even with clear innovation and integration in design, are still based on the philosophy of accelerating limited predefined geometry primitives, such as vectors and polygons. Designing graphics products with such specialized applicability results in the delivery of performance and functionality to geometry-oriented applications only.

12 EP 0 752 685 A1

The image manipulation capabilities of current graphics accelerators are limited by their design, which focuses principally on the computation of geometric data. The complexity of such devices also requires that they reside on an I/O bus, which needs additional logic and has greater overhead in communicating with the CPU-memory bus. Thus, imaging applications that use the current design of graphics processors are limited by such architectures, because they stress the manipulation of geometry over image data. The SPARCstation"" lOSX™ represents an revolutionary approach to the requirements of graphics applications, providing balanced acceleration for both images and geometry. Residing directly on the CPU-memory bus, the SPARCstation 10SX memory controller and rendering engine, called the SMC, can perform extremely fast transfer of data from physical memory (DRAM) to the frame buffer. This architecture serves to increase the image transfer rate by more that ten times the equivalent processor technology residing on an I/O bus. The close-coupling of the SMC to the SPARCstation 10 CPU further enables floating point-based graphics computation to utilize the high- performance transformation capabilities on the SPARC processor. The SMC acts as a vector integer coprocessor working in a balanced fashion with the host processor. The SMC provides pixel and vector manipulation logic, which complements the floating-point computational capability of the CPU. The SMC's direct access to physical and display memory additionally provides flexible and powerful capabilities for the management, manipulation, and display of large image sets. The SPARCstation 10SX system was designed to leverage significant advances in multiprocessor bus design, high-performance RISC technology, and graphics integration. The development of the SPARCstation 10SX system represents a new standard in integrated graphics, delivering the broadest graphics computation and rendering capabilities of any system to date. The SPARCstation 10SX graphics architecture employs a new and powerful approach to graphics integration. Never before has such integration with CPU and memory been achieved, with the significant results of flexible image manipulation and unsurpassed performance. By combining the SPARCstation 10 processor with the SMC, graphics rendering performance is no longer limited to the bus bandwidth of peripheral devices. The SPARCstation 10SX CPU-graphics integration further enables closely coupled execution of graphics commands without the necessity of maintaining separate graphics display and processor memory, thus increasing performance at lower cost.

6 SPARCstation 10SX Graphics Technology White Paper— October 2993

13 EP 0 752 685 A1

This white paper describes the innovative design of the SPARCstation 10SX graphics architecture, and its optimized feature set for providing graphics acceleration to a broad range of application areas.

10 Balanced Graphics Functionality and Performance

Among the key trends computing users face today are the move toward running multiple, varied applications and the growing demand for high- performance graphics. Multitasking operating systems, window-based graphical user interfaces, and powerful multimedia applications are just some of the factors influencing the trend toward more powerful graphics on the desktop. Once the domain of high-end users in computer-aided design (CAD), technical imaging, color prepress and other markets, powerful graphics subsystems are 20 increasingly of interest to volume users with broader requirements. To meet the needs of this large segment of users, specialized, moderately-priced accelerators have become available for personal computers and workstations. Often, these devices are optimized for focused acceleration of windowing, image manipulation, or 3-D CAD. However, it is uncommon for a graphics 25 display processor to address all of these areas in an equal manner. This lack of specialization in distinct contrast to the requirements of users today, who require the same performance to run a multimedia or color publishing application as they do a 3-D graphics one. Balanced graphics performance, which includes image manipulation performance, is the key. Figure 1.1 illustrates the nature of 2-D and 3-D geometry accelerators as compared to the SPARCstation 10SX. Various markets, including commercial imaging, rely on support for image manipulation operations, such as halftoning, dithering, and color transformation which is provided by the SPARCstation 10SX. Similarly, multimedia applications, characterized by JO requirements for image compression and decompression, audio/video signal processing, and data format conversion are also addressed by the unique architecture of the SPARCstation 10SX. Other graphics market segments having similarly diverse requirements for specialized operations are well addressed by this architecture.

HO Overview

ou

oo

14 EP 0 752 685 A1

5

SPARCstatlon 10SX Technical/Commercial Imaging Color Prepress EDA 10 MCAD Earth Resources GIS Photorealistic Rendering J Simulation J Multimedia M 15

20 3-D Geometry Accelerators 2-D Geometry MCAD Accelerators Photorealistic Rendering Drafting Simulation EDA GIS Earth Resources

Figure 1.1 Display-oriented Application Space and Relevant Operations

A key restriction in most framebuffers and geometry accelerators is associated with the acceleration of off-screen images. On these devices, offscreen manipulation and acceleration is virtually nonexistent. Off-screen images are used in imaging applications and application programming interfaces (APIs) as intermediate storage for data prior to display. Efficient off-screen image support relies on the use of memory rasters. The reason for the lack of memory raster support in common graphics devices is because almost all graphics accelerators are display-oriented. Thus, they are optimized for operations that directly render geometry, rather than provide facilities for the manipulation of images. To address the independent requirements of the major market segments, the SPARCstation 10SX graphics technology was designed to employ features that allow it to accelerate the low-level rendering operations common to all graphics accelerator devices. In addition, the unique coupling within the 45 8 SPARCstation 10SX Graphics Technology White Paper— October 1993

50

55

15 EP 0 752 685 A1

SPARCstation 10SX system between the SMC, memory controller and CPU enable direct access to system and display memory in such a way as to also provide acceleration for image manipulation. The SPARCstation 10SX system includes broad-based acceleration of all 2-D and 3-D graphics rendering and manipulation operations, making it suitable as a volume platform for common graphics applications of all types. In addition, its unique features make it especially well suited for very specific markets. This is, in part, due to the SPARCstation 10SX system's high-performance imaging capabilities, particularly in areas where imaging requirements have historically been poorly addressed. The sections that follow describe the key primary and secondary market segments, the characteristics of their applications, and how the SPARCstation 10SX architecture provides acceleration to them.

Principal Markets

Color Prepress

A typical customer in this field wants to scan in high-resolution color photographs and prepare them for printing on a color printer. The first step in this preparation might be to perform simple image processing operations on the images, such as sharpening, blurring, rotation, or cropping. Finally, the images must be converted to the CMYK color space, which provides the digital equivalent of photographic color separation. Morphological dilation of selected image features (known as "trapping") may be employed to avoid registration problems. Because color prepress is concerned with guaranteeing image fidelity, it is important that the image data be digitally maintained at the highest possible resolution. The SPARCstation 10SX system's ability to efficiently manipulate large image rasters, coupled with its support for color space manipulation, make it ideal for color prepress requirements. The high level of integration between the SMC, CPU and display memory also insures that image processing operations an be performed quickly on multiple images.

Overview

6 bK 0 bBb Al

lecnmcai imaging Technical imaging applications include remote sensing, medical imaging, machine vision, and other areas where imaging is used to distinguish physical characteristics from digitized imagery. Technical imaging applications require broad base-level functionality for image manipulation, display, compression, and color. These applications frequently require advanced image processing capabilities, including transformations, filtering, and image enhancement.

The SPARCstation 10SX system is especially well suited for the wide range of image sizes, pixel depths, and compression characteristics common in technical imaging markets. The ability to manipulate and display sequences of images is extremely important to technical imaging users in areas such as medicine and remote sensing — and the SPARCstation 10SX system's high-speed access to display memory ensures that high frame rates can be achieved. In addition, the common image manipulation facilities (such as convolution, warping, logical operations, comparisons, and multiply) are directly accelerated by the SPARCstation 10SX system.

Commercial and Technical Document Imaging

Technical imaging customers handle fewer images than commercial ones, although they are generally much larger. Potential applications in the technical document imaging market include textile design, cartography, facilities layout, construction, and architecture. Applications in this market need to process very large images, provide image-manipulation capabilities such as 'de- speckling' for removal of scanning artifacts, realignment to correct for inaccurate scanning registration, and warping to deal with the attributes of old hard-copy drawings printed on unstable materials. Commercial imaging customers such as banks and other financial institutions, airlines, and insurance companies process enormous numbers of paper documents during a typical day. Applications addressing this market must read digital versions of documents from scanning devices, compress the images into CCITT Group 3 or 4 facsimile format, index, and store them. Later, it may be necessary to retrieve the documents in a rapid manner for additional image processing, scaling, and enhancement

The SPARCstation 10SX system was designed to accelerate the low-level image raster operations specifically characterized by the document imaging markets. For example in technical imaging, warping, enhancement, and morphology

" 3rAj\i_staiion jy^A graphics technology White Paper — October 1993 EP 0 752 685 Al

operations are directly supported through individual and multiple optimized operations. The memory access speed between the SMC and the display memory further ensures that the copying and manipulation of multiple source images occurs very quickly. Manipulation and processing of images with the SPARCstation 10SX is aided by its advanced compression/decompression capabilities, data conversion, resizing (including zooming), and enhancement facilities. The SPARCstation 10SX system's ability to manipulate image raster data very quickly is especially important to this market segment, because of the extremely large volume of images that must be previewed and managed.

Electronic Design Automation (EDA)

Another significant application area for geometry graphics is Electronic Design Automation. Historically, the requirements of this market have been integer or fixed point integer coordinate data and rich 2-D primitive support. EDA applications generally require high-resolution displays, fast manipulation of 8- bit fixed point integer data. EDA applications also frequently must display large amounts of data, and re-rendering and fast pan and zoom of the display are important features. EDA applications benefit from several characteristics of the SPARCstation 10SX, including very fast pan and zoom capabilities, support for high- resolution display, and on-chip management of integer data.

Mechanical Design (MCAD) and Computer- Aided Drafting (CAD) Mechanical design and computer drafting applications rely on the efficient manipulation of 2-D and 3-D vector data. The SPARCstation 10SX renders vectors directly, and uses the floating-point capabilities of the SPARC CPU to accelerate transformations, light and shading computation. As a result the SPARCstation 10SX provides performance on par with more expensive specialized geometry accelerators. geographic Information Systems (GIS)

In general, GIS applications vary broadly in data format, and manipulate large amounts of data. As a result, GIS applications depend equally upon both integer and floating point data manipulation and rendering. EP 0 752 685 Al

I he SPARCstation 10SX system is well suited for GIS applications for a variety of reasons. Copy speed from main memory to the screen, large in-memory support, and support for overlays (through an appropriate graphics API) provide capabilities important to GIS. Where data format or quantity is particularly important, the SPARCstation 10SX architecture enables graphics data to be processed very quickly, resulting in faster data fetching. The close coupling between the SMC and CPU ensures optimized floating point performance while also accelerating integer instructions. The 24-bit rendering capabilities of the SPARCstation 10SX system, together with its ability to manipulate very large rasters, also make it attractive to GIS. tarth Resources

Like geographic information systems, the earth resources market typically deals with large volumes of data. Earth resources market requirements include 2-D and 3-D geometry data, and frequently, 2-D image data. With such varied requirements, earth resources applications need hardware with the flexibility to accelerate geometry with a range of coordinate types, and with the capability of manipulating images efficiently. The SPARCstation 10SX system is an ideal platform for earth resources applications. Its ability to address video and display memory directly, as well as providing strong vector and triangle performance enables earth resources applications to manipulate and render large data sets easily.

Multimedia

Multimedia applications require rapid loading and display of multiple data types, including images, text, geometry and audio and video. The SPARCstation 10SX provides benefits to multimedia applications through its tight coupling to the SPARC CPU via the MBus, compression/decompression logic on the SMC, and extremely fast raster memory to display transfer required by video.

SPARCstation 10SX Graphics Technology White Paper — October 1993 Architectural Overview

sij/\K^station wsA Arcnitecture

The SPARCstation 10SX graphics processor is a scalable graphics architecture consisting of an enhancement to the memory controller chip present on MBus- based SPARC™ systems. The memory controller chip is specifically enhanced with arithmetic processing capability to become an "intelligent" memory interface capable of optimized pixel operations applicable to graphics and imaging.

The SPARCstation 10SX, like its predecessor the SPARCstation 10, includes 2 MBus-based slots for the simple upgrade to 2 or 4 processors. MBus boards can easily be added to the SPARCstation 10SX by the riovice user, and are automatically recognized by the Solaris operating environment. SPARCstation 10SX technology is based on the concept of graphics operations becoming extensions of the common integer and floating-point operations of the CPU. The SPARCstation 10SX system employs a closely-coupled architecture, wherein a dedicated processor handles low-level display operations through processor-memory direct access. In addition to the usual buffering of data between the bus and memory interface and the generation of address and control signals, the SMC loads and stores memory data to and from its internal registers. It also performs arithmetic and other processing on that data. The processing consists of extensive pixel operations that support graphics and imaging functionality present in graphics and window system software APIs such as the XGV and XIL™ graphics libraries, and the OpenWindows™ windowing environment.

3

J bP U fbZ bBb Al

ine bML is a slave processor to the SPARC CPU. It is designed to accept commands from the CPU, and has no independent instruction-fetch capabilities.

The SMC was designed as an extension to the memory controller chip for several reasons. The pixel processing speed of CPUs is, in general, a limiting factor for many application areas — thus malting any significant acceleration an attractive proposition. Second, since a large part of the cost of graphics and imaging accelerators is in the need for dedicated memory components, an architecture that reduces these costs and requirements is desirable. Finally, the memory interface on most systems typically has the highest bandwidth of any data path in the system. Its direct relationship to the CPU makes it an ideal candidate for providing a path for access to additional graphics functionality.

Next generation applications such as multimedia place significant demands on graphics rendering performance, video display and audio synchronization. The SMC's ability to move data to and from the CPU very quickly, together with the XIL imaging library's support for optimized compression and decompression algorithms were key considerations in designing the SPARCstation 10SX to support multimedia. Additional platform features of the SPARCstation 10SX which support multimedia applications include 16-bit CD-quality audio, a 64-bit wide SBus, and multiprocessor support. The SMC processor extends the functionality of the standard memory control found on MBus-based systems by incorporating logic for arithmetic and logical operations, vector (multiple) instruction support, multipurpose registers, and a variety of other high-level computational features. Table 2.1 describes some of the architectural elements found in the SMC-

jmKLstadon WbX Graphics Technology White Paper— October 1993

1 EP 0 752 685 A1

5 r-

Feature Description 40 MHz Clock Speed Synchronous operation with motherboard clock and asynchronous 10 with the CPU 128 32-bit Registers Multipurpose registers for intermediate SX calculations Register Files with Dual Read Ports Employed and configured to allow and Single Write Ports simultaneous read access of 8 15 registers and write access of up to 4 registers Instruction Queue of 64 Levels Used to buffer between 32 and 64 instructions, depending on type, sent from the host CPU 20 Control Registers on Separate Physical Separate from noncritioal resources Page allowing only supervisor access to critical resources Error Checking and Flagging For handling of exceptions should the SMC encounter a fault condition MBus Interface Logic Should the SMC be busy and unable to make a normal termination response, this logic issues requests to the CPU for relinquishing and retrying access Four-Stage Pipeline For execution instruction queuing Vector Instruction Support Allows for up to 16 iterations of an instruction (up to 32 with load and 30 store) Dual Arithmetic Logic Units Allows individual operations within vector operations to be executed simultaneously Dual Multiply Units Allows two 16x1 6-bit multiply operations to be executed 35 simultaneously 64-bit Interface with Memory Controller Allows multiple vector elements within an instruction to be accessed simultaneously Store and Load Instructions Supporting direct and anay addressing, 32 distinct data types, data clamping 40 on stores, and masking of stored data at item and bit level Table 2.1 SMC Processor Architecture Features and Description

45 Architectural Overview 15

!2 bP 0 /bZ b8b Al

Feature Description Boolean Logic Operation Support Providing direct support for AND. OR. and XOR operations Multiply Operation Support Including multiply, dot product, and SAXPY, available in either 16x16 or 32x16 operands 32-bit Barrel Shifter Support Supporting logical, arithmetic and funnel shifts 32-bit Signed Value Comparison Supporting tests for > < = >= and Support <= SELECT Instruction Support For vector merge and 1-to-N expansion Raster Operation Suppod Through common ROP instructions 2-D Register Support With GATHER and SCATTER funclions Incremental Addition Instruction For use in Gouraud shading Bresenham Interpolation Instruction For line drawing lacuez.i iMi. irocessor Architecture features and Description )rAKL.station wz/L High-Level Architecture The SMC processor lies on the MBus between the SPARC CPU(s) and memory. In addition, Video RAM (VRAM) is mapped into the same address space, thus allowing the SMC to access and manipulate the display in the same manner as DRAM. This addressing gives the SPARCstation 10SX system superior bandwidth in moving images from memory to the screen. VRAM is packaged as a portion of a VSIMM, which also includes a Video Buffer Chip (VBC), a memory display interface (MD1), and Digital-to-Analog Converter (DAC) logic. Instructions executing in one or more of the SPARC CPUs make requests of the memory controller (MC) portion of the processor. The memory controller logic fulfills CPU requests for memory accesses, such as those required for fetching, loading, and storing instructions.

When one of the CPUs encounters a graphics operation, it passes it to the SMC pixel processor (SXPP), which executes the instruction directly. When memory access is required to complete an operation, the SXPP makes requests of the memory controller to process its requests. Figure 2.1 illustrates the high-level architecture of the SPARCstation MBus system, including CPU, SMC processor, and video memory.

5/vtKl_sw; lUbX graphics Technology White Paper— October 1993

i EP 0 752 685 Al

Main Memory (UWXgWJ MH1J

hgure >A bl'AKLstation lObX System High-level Architecture jMC Low-Level Architecture

1 ne bML processor section executes the commands written to it by the SPARC CPU over the MBus, including loads and stores between the SMC internal registers and main memory, using the memory interface. When a normal memory access over the MBus occurs, the SMC pauses and lets it through. The SMC stall latency needs to be quite short in this case to avoid burdening normal memory traffic. This short latency is accomplished by implementing register loads and stores that acta great deal like MBus memory block accesses (e.g., bursts of 4 or 8 64-bit words). The blocking effect is ameliorated by the asynchronous nature of the CPU and SMC operation. Figure 2.2 illustrates the low-level view of the SMC processor. The internal register set is designed with 128 registers, so that it can contain the variables required for common operations, such as 5x5 and 7x7 convolutions and edge and span interpolation.

ftrcnuecturai uvervrczv 17

4 EP 0 752 685 A1

Instruction Queue, Decode & Control

Source A Source B

Destination

figure 2.2 SPARCstation 10SX SMC Processor Low-level Architecture

The SMC memory data types include signed and unsigned bytes, shorts, and longs to support common pseudocolor, grayscale and full-color images. Memory data types are all converted to and from internal 32-bit register values, with additional specification of their fixed-point precision, if appropriate. Internal processing is carried out on 32-bit integers, with specification of the multiply source and destination precision. SMC operations include common arithmetic and logical operators, with the addition of various multiply and multiply/add operations. The SMC itself has no conditional instruction execution, since it doesn't fetch instructions directly and can't perform branches. However, memory writes are conditioned by masking, and arithmetic operations can conditionally write destination registers and set the mask. SMC operations can be specified as multiples (e.g., with an operation count field, or vector instructions). This approach reduces the bandwidth needed for the CPU command interface and matches the block memory behavior of the memory interface. The count is designed to be small enough to keep the SMC latency low. Longer counts, such as polygon spans or image scan lines, are broken by the driver software into smaller pieces.

IS SPARCstation 10SX Graphics Technology White Paper— October 1993

25 ine 3ML includes an instruction queue to de-couple its execution speed from the CPU command write speed. The SMC can be halted at instruction boundaries, and the queue saved and restored. All internal SMC registers are accessible to the CPU on the Mbus. The SMC has no exception or error states caused by command execution.

SPARCstation 10 SX Packaging

The SPARCstation I0SX has the same compact, sturdy construction as the familiar pizza box' used in other SPARCstarions. This low-profile, easy to service packaging fits conveniently beneath the standard 19-inch high- resolution Sun monitor. The SPARCstation 10SX includes industry standard I/O ports for: * SCSI-2 * Synchronous/asynchronous RS-422 and RS-232 Serial communications * Parallel interface (Centronics-compatible) * Ethernet supporting both AUI and lOBase-T for connection to local area networks * 13W3 monitor connector * Audio connector for speakerbox and microphone (included standard) The SPARCstation 10SX also includes three 64-bit SBus slots for a large variety of Sun and 3rd-party expansion boards including serial and parallel interfaces, networking, graphics, SCSI, video and specialized processors.

The SMC Processor

The SMC processor replaces the memory controller chip present on MBus- based SPARCstarions. The SPARCstation 10 system requires a full CPU board upgrade to add the SPARCstation 10SX processor. The SMC is located directly on the SPARCstation 10SX motherboard. Figure 2.3 illustrates the location of the SMC processor and the framebuffer V5IMMS in the SPARCstation 10SX desktop.

rcmiccmrai vvcrvtezD ^

5 EP 0 752 685 A1

vbiMivi Memory

Figure 2.3 SPARCstation 10SX System Packaging - SPARCstation MBus-based Desktop

SPARCstation 10SX Configuration Options

The flexible memory architecture of the SPARCstation 10SX system permits several alternative display configurations, based on the amount of configured VSIMM memory present and user-selected options for resolution and frame buffer depth. Frame buffer depth and display frequency are determined at boot time. Additional selection of display options is performed via command line options before the initiation of the Open Windows window server. In SMCC's current desktop configurations, VSIMM options are based upon the installation of either one or two 4- or 8-MByte VSIMM boards in the dedicated VSIMM slots adjacent to processor memory on the SPARCstation motherboard (see Figure 2.3).

Table 2.2 illustrates the display options for the SPARCstation 10SX system, based on available VSIMM memory.

zu SPARCstation 10SX Graphics Technology White Paper— October 1993

!7 EP 0 752 685 A1

Features Characteristics Frame Butler Depth Supported in Processor Memory Space Configurable Depth and Resolution 8-MByte VSIMM 1152x900 24 bits with 8-bit Overlay 1152x900 8 bits 1280x1024 24 bits with 8-bit Overlay 1280x1024 8 bits

4-MByle VSIMM 1152x900 24 bits with 8-bit Overlay 1152x900 8 bits 1280x1024 8 bits Multiple Lookup Tables Simultaneous Display of Truecolor Visuals - 2 8-bit colormaps of 256 colors or levels of gray to alleviate colormap flashing Table 2.2 SPARCstation 10SX Frame Buffer Features

SPARCstation 10SX Graphics Features and Functionality

The SPARCstation 10SX system's SMC processor is a special-purpose computer. While it can accelerate a wide range of graphics and imaging functions, it is principally a highly functional imaging accelerator. Most imaging and many vector functions are very appropriate for implementation on the SPARCstation 10SX system. Access to the SPARCstation 10SX graphics capabilities is achieved by the Solaris® graphics and window system APIs. As a result, developers of graphics and imaging applications need not concern themselves with SPARCstation 10SX instructions and their optimization. The APIs employ their internal pipelines to optimize SPARCstation 10SX instruction execution.

Integer Arithmetic

As mentioned previously, the SMC is an integer processor. It is able to support fixed-point precision (integer plus fractional bits) suitable for graphics and imaging, while geometric and numeric algorithms are better left to the floating- point capabilities of the host CPU.

Architectural Overview 21

>8 EP 0 752 685 Al

The approach of utilizing the CPUs floating point has the additional advantage of dual processing. While the CPU is performing a floating-point operation, the SMC can be performing related integer operations, thus providing graphical multiprocessing.

Vectorization

To achieve optimum performance, the SMC processor supports 'vector' instructions, that is, ones that perform a sequence of operations. This allows the SMC's instruction bandwidth across the MBus to be relatively low compared to the SPARC CPU — usually on the order of 20-25%. Many functions can be trivially vectorized, while others can be recast into vector form with some thought. The CPU write buffers and the SMC instruction queue provide load balancing between the SMC processor and the CPU, so that single-operation instructions are efficiently executed when intermixed with longer vector instructions. Often, the beginning or end loops need to be 'filled in' with single-operand or short-vector instructions, and the SMC's variable length vector instructions simplify this.

Conditionals

The SMC processor is most useful when executing a stream of instructions unconditionally. CPU read back of the SMC status registers will halt write pipelining. For this reason, several innovative features of the SMC processor instruction set, including the ability to conditionally write to registers or memory based on a bit mask that an be set by a comparison, serve to implement conditional processing without a test and branch. This allows conditionals to execute while preserving write pipelining performance.

Accelerated Memory

The SMC processor operates directly on physical memory, or on its internal register set. On a SPARCstation 10SX system, the video memory (VRAM) and dynamic memory (DRAM) are in the same physical address space, so the SMC processor can accelerate pixel operations on both of them. When the SMC has physically contiguous blocks of memory to operate on, memory management and usage optimized and performance is increased. Certain classes of

>fAKLstatwn IDbX Grapmcs Technology White Paper — October 1993

9 EP 0 752 685 Al

rendering, in particular, rendering vectors to an off -screen image, or maintaining a Z-buffer, benefit by reserving some system memory for the exclusive use of the SMC at boot time.

For typical XGL applications, 8 MBytes of memory reserved as contiguous physical memory for the SMC processor, provides optimal performance.

Accelerated Rendering Functions

The architecture of the SPARCstation 10SX system provides for optimized rendering of a number of imaging and graphics operations. Solaris APIs, optimized to utilize the above-mentioned features of the SPARCstation 10SX SMC processor, combine together with pipelining, casting and other methods to result in extremely fast performance for most base-level operations. Tables 2.3 and 2.4 illustrate the various types of imaging and graphics operations that are accelerated by the SMC processor

image Operation . Types Move Copy, BUT, Fill, Clear Point Arithmetic. Logical, , ROP Conversion Monochrome/Greyscale/Halflone Color/Pseudocolor/Dither Geometry Resizing, Rotation, Warping- Filtering Convolution, Interpolation Enhancement Equalization, Contrast/Range Adjustment 'able 2.3 Accelerated Imaging Operations

Geometry Operation Types Symbols & Patterns Text, Markers, Stencils, Patterns Vectors (Lines) Solid, Patterned, Wide. Depth-cued, Antialiased Polygons Filled, Patterned. Shaded, 2-buffered, Antialiased aoie i.a. Accelerated (jeometry Operations

\rchitcctural Overview 23

0 EP 0 752 685 A1

SPARCstation 10SX System Performance

The SPARCstation 10SX processor delivers outstanding performance to a broad range of graphics and imaging functions. These functions, illustrated in Table 2.5, represent common operations for both geometry and raster-based graphics rendering, which are accelerated through Sun's low-level APIs such as the XGL, XIL, and Direct Xlib libraries. Additional, 3rd-Party APIs layered above the XGL, XIL, and Direct Xlib libraries, will provide performance corresponding to the level of optimization at which they were ported. In general, most layered APIs experience substantial performance boosts using the SPARCstation 10SX system over common unaccelerated frame buffers. Chapter 3 describes the SPARCstation 10SX system graphics software interfaces in more detail.

Operation Pixel Rate1 (32-btt) Fill 43.65 MPixel/s Monadic (3 band) 6.85 MPixel/s Monadic (Single band) 23.39 Mpixel/s Dyadic (3 band) 4.83 MPixel/s Dyadic (Single band) 16.90 MPixel/s Blend 1.86 MPixel/s Convolve (3x3 8-bit Separable) 4.66 MPixel/s Convolve (5x5 8-bit Separable) 2.58 MPixel/s Lookup (8-bit to 8-bit) 7.84 MPixel/s Pan (3 band) 19.69 MPixel/s Zoom. x2 (3 band) 19.32 MPixel/s Rotate, 45 deg (3 band) j 2.42 MPixel/s 3Z-t)ft. •xcapl wh«ra otherwise cpocmed fable 2.5 Estimated SPARCstation 10SX Performance

While the SPARCstation 10SX SMC is an integer processor, 2-D and 3-D vector performance (assuming CPU floating-point calculation of transformations) will also render at significant rates. SPARCstation 10SX performance is not specifically limited to rendering operations, since the internal architecture

bPARCstatwn 10SX Graphics Technology White Paper— October 7993

11 EP 0 752 685 A1

provides support for non-display oriented imaging, compression, and correlation functions. Performance for a range of general 2-D and 3-D geometry operations are illustrated in Table 2.6.

Operetion SPAHCstatian 10SX Model 40 Geometry Test (GBench) 2- D Vectors' 495.000 3- D Vectors' 330.000 3-D Polygons2 30.500 X11perf 1.3 Xmark 3 704 tOO-praal, randomly orianiad. trantformad. and eippad 2 SO-pixal. Qouraud ahadad, Z-buflarad, unlft, trtangla maah Table 2.6 SPARCstation 10SX Performance

Architectural Uvcmai;

12 EP 0 752 685 A1

5

10

15

20

25

30

35

40

45 26 SPARCstation 10SX Graphics Technology White Paper— October 1993

50

33 bK U fbZ bBb Al

SPARCstation 10SX Software

5iJ/iK^station lUbA Software interfaces

The SPARCstation 10SX system is supported by all Solaris graphics and window systems APIs, including the XGL, XIL libraries, PostScript, and the OpenWindows Version 3.3 (Xll R5) window system. Industry-standard X- extension libraries, such as Xlib and PEXlib, are also available and are accelerated via the XGL and XIL foundation graphics libraries. Emerging standards such as the X Imaging Extension (X1E), and Display PostScript (DPS), together with their related APIs, will be supported in the future. As both an imaging and geometry device, the SPARCstation 10SX system accelerates each of the APIs mentioned above. The sections that follow briefly describe the foundation graphics interfaces and the functions accelerated on the SPARCstation 10SX processor. Figure 3.1 illustrates the Solaris graphics and windowing APIs and their relationship to the SPARCstation 10SX system.

/ ne imaging ana viaeo Library

The XIL library is a foundation-level imaging interface providing the common functions required by many imaging applications. As a foundation software layer for imaging applications, the XIL library defines how imaging operations, such as display, image manipulation, and compression and decompression, are carried out.

7

4 EP 0 752 685 A1

Applcalions

5 > x a. q_w tr ^

AIL XGL Imagino/Video 2-D/3-D Geometry X Windows

SPARCstalon 10SX Figure 3 1 Solaris Foundation Graphics Libraries and Layered Interfaces

The XIL library contains three primary components: a programming interface specification for basic imaging functionality, a high-performance implementation of the specification, and a standard hardware interface specification, which enables third-party hardware developers to readily support the XIL library and applications built on it. The XIL library is an object-oriented interface. As such, it hides the details of its implementation from the programmer. This enables the internals of the library to be extended without changing the interface in an incompatible way. C language bindings provide compatibility with other tools available under the Solaris environment including a large number of software development products. Figure 3.1 illustrates the XIL library in relationship to other foundation-level interfaces. All XIL integer functions, whether they are 8, 16, or 32 bits, are accelerated by the SPARCstation 10SX system. Floating-point operations are handled by the SPARC CPU's integrated floating-point processor. The SPARCstation 10SX system accelerates broad classes of functions, including: * Pan, zoom, scale * Flips, transformations * Raster rotation * Sharpen, blur, convolution * Monadic and dyadic image operations * Blend, composite * Video decompress and display zo SPARCstation 10SX Graphics Technology White Paper— October 19S3

!5 EP 0 752 685 A1

The monadic and dyadic operations referred to above are of particular interest because of the structure of the XJL library, and its architectural reliance on them. The SPARCstation 10SX system is especially well suited to accelerating such operations. The graphics porting interface employed with the XIL foundation library provides a flexible and powerful mapping to the functionality of the SPARCstation 10SX processor. Every XJL operator can be considered an atom. Groups of XIL operators concatenated together in a sequential manner are known as molecules.

Molecules are particularly important to the SPARCstation 10SX SMC processor because they allow the device to notify the XIL library at runtime that it (the device) wishes to provide hardware support for groups of chained operations. For example, the SMC provides image scaling and casting from 15- to 8-bit data. As a result, and unlike many other graphics devices, the SMC processor does not have to move the data back to CPU memory for XIL software to cast from 16 to 8 bits. The additional copy operation is avoided, because the SMC processor can scale and cast on-chip. At runtime, the SMC device handler loads into memory and notifies the XIL library of the atoms and molecules it wishes to replace with its own mapping to hardware. The XJL library uses a directed acyclic graph to hold a list of these dependencies during application execution. When the XIL library encounters a known sequence of operations (defined by a molecule), it transfers the operation to the SMC for execution. Table 3.1 illustrates the XIL operations accelerated bythe SPARCstation 10SX SMC processor.

Operation Description xil_add Add two images »l_add_const Add image and constant xil_affine Affine transformation xil_and Logical AND of two images xil_and_const Logical AND of image and constant xil_blend Alpha blend Ulher compression formats (under development) will be accelerated by the SMC fable 3.1 XIL Operations Accelerated by the SPARCstation 10SX System

.FAKCstafron 10SX Software 29

16 EP 0 752 685 A1

5 Operation Description xil_color_convert Color space conversion xil_convolve 3x3, 5x5. 7x7 convolution xil_decompress JPEG decompress1 10 xil_copy Copy image xil_divide Divide two images xil_divide_into_const Divide constant by image xil_extrema Find image minimum and maximum xil_get_pixel Get a pixel's value xiljookup LUT operation xil_multiply Multiply two images xil_multiply_const Multiply image by constant xil_not Logical NOT xil_or Logical OR xil_or_const Logical OR of image with constant xil_ordered_dither Ordered dither xil_paint Paint an image xil rescale Destination - k1 * source + k2 xil_rotate Rotate an image xil_scale Resize an image xil_set_pixel Set a pixel's value xil_set_value Fill an image with a constant 30 xil_subtract Subtract one image from another xil_subtract_const Subtract constant from an image xil_subtract_from_const Subtract image from constant xiljhreshhold Threshold the image 35 xil_translate Translation of an image xil_transpose Transpose an image xil_xor Logical exclusive OR of two images xil_xor_const Logical exclusive OR of image/constant Other compression formats (under development) will be accelerated by the SMC Table 3.1 XIL Operations Accelerated by the SPARCstation 10SX System

45 30 SPARCstation 10SX Graphics Technology While Paper— October 1993

50

37 EP 0 752 685 Al

The XGL Geometry Library

The XGL library is the Solaris foundation geometry library providing the functionality and performance required by applications requiring geometry manipulation and display. The XGL library provides for optimization of 2-D and 3-D rendering, including high-quality lighting and shading, advanced primitives (such as NURBS and meshes, texture mapping, antialiasing, and transparency), and a flexible geometry pipeline. The XGL library includes sophisticated state-of-the-art features, such as dynamic tessellation of NURBS surfaces, as well as flexible pipelines for lighting and transformation. 15 The XGL library has an object-oriented interface that hides the low-level details of implementation from the application developer. As such, it is architected so that it may evolve as an interface, permitting new algorithms to be implemented within the XGL library without incompatibly changing the application program interface bindings. Table 3.2 illustrates the XGL 20 operations accelerated by the SPARCstation 10SX system.

Operation Description 2- D - Solid and patterned thin lines. 8 and 24 bit - Triangles, quads, simple polygons, and general polygons, solid and patterned. 8 and 24 bit - Markers, 8 and 24 bit 3- D - Solid and patterned thin lines, 8 and 24 bit • Triangles, triangle lists, quad meshes, and general poly- gons, lighted and shaded, 24 bit only - Markers, 8 and 24 bit - Antialiased lines, 24 bit only. 4 types of blending including transparent and opaque - Blended Z buffered triangles, triangle strips, quad meshes and general polygons, 24 bit only, 4 types of blending including transparent and opaque - Antialiased dot markers. 24 bit only - Raster copy, induding depth conversion and Z-buffering - Accumulation buffering (for global antialiasing) Other - All other primitive/attribute combinations are accelerated via acceleration of dots, lines, and spans viner compression lormats are supported by XIL. but are not accelerated by the SMC able 3.2 XGL Operations Accelerated by the SPARCstation 10SX System

fvucusiaiiori lUiX Software 21 U /O^ DOO A I

u Deration Description xgl_context_copy_raster Copies raster to a window xgl_multi_marker Draws pre-stored markers xgl_murti_polylme Sequence of connected lines xgl_multi_simple_polygon Multiple polygons of 3 or vertices xgl_traingle_strip Strip of connected triangles xgl_corrtext_accumulate Manage XGL graphics context(s) iouiu j.j wperuiorb ntxeieraiea oy tne SfAKLStation 10SX System i nt

The Open Windows environment is an MIT (Xll R5) compliant server. Applications that use the X protocol, running both locally or remotely, will receive acceleration from the SPARCstation 10SX System. The graphic operations in the XII Window System map perfectly to the SPARCstation 10SX architecture, making the SPARCstation 10SX an excellent window system accelerator. In general, both graphics and window management functions have been optimized to take full advantage of the SPARCstation 10SX architecture. The following functions are accelerated on the SPARCstation 10SX system. * Hide/expose window * Resize window * Copy pixmaps to and from windows * " Text scrolling * Get and put image * Line drawing * Raster operations * Text

jrfmwmnon jloa ^rapntcs technology White Paper— October 7993

i EP 0 752 685 A1

3

Direct Xlib vs. Xlib Graphics Libraries

The Xlib interface is the basic low-level 2-D graphics library shipped as part of the XII Window System. The Direct Xlib library is a supplemental product that transparently enhances graphics rendering performance for graphics intensive Xlib applications in the OpenWindows environment. The Direct Xlib library was developed to provide an optimized Xlib interface for applications that typically run locally (client application and X Windows server on the same machine).

The Direct Xlib lib rary allows most X Windows graphics rendering operations to occur directly in the client application process. This helps avoid the overhead associated with the copying of client application data to the X Windows request buffer, client/server context switching, and server parsing and processing of the request buffer for these operations. The key to the Direct Xlib library is the use of Sun's (DGA) technology, which provides a synchronization mechanism between the Direct Xlib library and the OpenWindows server. The principal benefit in using the Direct Xlib library is rendering performance. In general, most applications experience two times the rendering performance using the Direct Xlib library versus the Xlib interface. Use of the standard Xlib library is appropriate when graphics performance is not a factor, and when the application must be statically linked. Direct Xlib ships with every copy of the Solaris operating system and takes full advantage of the SPARCstation 10SX graphics accelerator.

bFARUtatwn WSX Software 33

0 EP 0 752 685 A1

5

10

15

20

25

30

35

40

34 SPARCstation WSX Graphics Technology White Paper— October 1993

50

41 bP 0 /bZ b8b Al

Summary 4 EE

loaay s desktop computing environment is becoming the focus of converging technologies supporting window systems, video, imaging, CAD, multimedia and other rendering-intensive applications. While users today typically run several graphics or imaging programs, the display hardware products available are generally limited to one, or a few areas of graphics acceleration. Only by using expensive graphics accelerators do users overcome the limitations of their computers and realize the potential of their applications.

The SPARCstation 10SX graphics technology represents a fundamental breakthrough in delivering optimized imaging acceleration balanced with geometry rendering. By employing a closely coupled CPU-to-memory controller design, the SPARCstation 10SX system is ahle to boost performance of all rendering operations, whether they are geometry, pixel, or raster image oriented.

The key benefits of such an architecture include lower cost per seat across the applications spectrum, from MCAD, EDA and GIS markets to color publishing, imaging and visualization. By using the host system's dynamic memory for raster manipulation, the SPARCstation 10SX platform not only outperforms other 24-bit systems, but does so at lower cost.

The SPARCstation 10SX technology represents a clear industry direction in embedded graphics design. In the future, it will become commonplace to use the CPU for the simultaneous processing of floating-point data and non- rendering application code while a dedicated processor manages rendering directly to non-display memory.

15 EP 0 752 685 A1

5

10

15

20

25

30

35

40

45 36 SPARCstation 10SX Graphics Technology White Paper— October 1993

50

43 References

juiuju aul nejerence MflriHai, aunoOtt inc. Solaris XGL Programmer's Manual, SunSoft Inc. Solaris XIL Imaging Library Reference Manual, SunSoft Inc. Solaris XIL Imaging Library Programmers Guide, SunSoft Inc. Xlib Reference Manual, O'Reilly <5i Associates. Inc. Xlib Programming Manual, O'Reilly ic Associates. Inc. Solaris Open Windows User's Guide, SunSoft Solaris Open Windows Programmers Guide, SunSoft X Window System User's Guide, O'Reilly & Associates. Inc. Protocol Reference Manual, MIT X Consortium, Public Review Draft Version 4.1 .2

4n Introduction to Computer Graphics Concepts, Addison-Wesley, 1991 Solaris LIVE Technical White Paper, SunSoft, 1993 Solaris XGL Technical White Paper, SunSoft Inc., 1993 Solaris VISUAL Technical White Paper, SunSoft Inc., 1993 Solaris XIL Imaging Library Technical White Paper, SunSoft Inc., 1993

7 EP 0 752 685 A1

10

15

20

25

30

35

40

45 38 SPARCstation 10SX Graphics Technology While Paper— October 1993

50

45 EP 0 752 685 Al

Glossary

LOiornup A color lookup table that stores a set of RGB values. Applications index into the colormap to get the values to drive the red, blue, and green guns of an RGB monitor. Compressed Image Sequence (CIS) Compressed Image Sequence. The XIL library's compressors store (generally related) compressed video frames in structures called CIS buffers. The images may represent frames in a movie, pages in a document, and so on. The data in the image sequence may or may not have undergone compression. If the data is compressed, it may be in Cell or JPEG. The XIL library provides functions that enable users to select frames in a buffer, decompress the frames, store frames into the buffer, and get information about the stored frames. compression The process of converting data from its original format to a format that requires less space. Compressed data uses less storage than uncompressed data and can be transmitted more quickly; however, in most instances, once data has been compressed, that data is not completely recoverable. Image data can be compressed from one-tenth to one-fiftieth its original size without visibly affecting the quality of the image. convolution The process whereby the product of the integral of two digital signals is computed to produce a third signal.

!9

6 EP 0 752 685 A1

Convolution Kernel A filter function used in a convolution. Decompression The restoring of data that has undergone compression to its original state, or to something close to its original state. How closely the decompressed data matches the original data depends on the compression algorithm used.* Display Image The XIL library treats displays as special images, and operations allow display images to serve as destinations. This strategy enables an operation to draw on a screen directly, without requiring an intermediate copy. In some cases, display images can also serve as source images. Dithenng The process of simulating intermediate colors or shades of gray by altering the size, arrangement or shape of background pixels. Full-motion Video The showing of a series of related digital images at a rate sufficient to give the illusion that objects in the images are moving naturally. With television, the rate of 30 frames of video per second is used to create this illusion, but rates as low as 20 frames per second are still effective. Indexed color The distinction between true color and pseudocolor has to do with the design of a monitor's frame buffer. If the frame buffer uses 8 bits per pixel to store color information, the monitor can display 256 colors simultaneously. What you see on such a monitor is called pseudocolor because the colors that can be shown at any one time are a small subset of the colors the eye can distinguish. If the frame buffer uses 24 bits per pixel to store color information, your monitor can display over 16 million colors (true color). Pseudocolor is sometimes called indexed color because the values stored in the frame buffer on a pseudocolor system are not the RGB values needed to drive the red, green, and blue electron guns in a monitor. Rather, they are indexes into a colormap, or color lookup table, which stores 256 sets of RGB values. Interpolation A way of estimating a value that falls between other, known values. In image processing, interpolation frequently plays a part in geometric operations such as warping. After that type of spatial transformation, pixel locations in the output image will correspond to non-integer coordinates in the input image. Therefore, the pixel values in the output must be estimated by looking at the

SPARCstation 1QSX Graphics Technology White Paper— October 1993

17 EP 0 752 685 A1

values of the pixels surrounding the point of interest in the input. The XIL library supports several types of interpolation, including nearest neighbor, bilinear, and bicubic interpolation. JPEG Joint Photographic Experts Group. A joint venture of the CCITT and ISO that developed a standard for compressing grayscale or color still images. Actually, the standard defines a number of methods for compressing images. Several of these are lossy methods based on the Discrete Cosine Transform (DCD, but one method is lossless and is based on Differential Pulse Code Modulation (DPCM). JPEG compression can be used for moving images, but does not take into account the frame-to-frame repetition of moving images. MPEG, which is designed for use with moving images, is more computationally expensive than JPEG, but provides more compression. Lookup Table (LUT) A table, indexed linearly, containing reference values for a displayed image. Also see Colormap. Luminance The portion of a composite signal that carries brightness information. For example, luminance information is contained in the Y component of a YUV signal. (U and V are the color values.) Video compression techniques take advantage of the fact that the human eye is more sensitive to variations in luminance that it is to variations in color (chrominance). Therefore, chrominance values can be compressed (with lossy techniques) more than luminance values, resulting in greater overall compression. MJftb See JPEG. Pixel Pixel element. In a raster grid, a pixel is the smallest unit that can be addressed and given a color or intensity. PostScript PostScript is a page description language developed by Adobe Systems, Inc., for use in specifying the visual appearance of printed documents. Since PostScript's inception, it has evolved a companion product known as Display PostScript (DPS), to drive widely varied-resolution displays and to coexist with the X Window System.

Glossary 41

18 bP 0 /bZ b8b Al

DCUUUCUIUI The distinction between true color and pseudocolor has to do with the design of a monitor's frame buffer. If the frame buffer uses 8 bits per pixel to store color information, the monitor can display 256 colors simultaneously. The term pseudocolor is often used because the colors that can be shown at any one time are a small subset of the colors the eye can distinguish. By judiciously mixing these colors, the perception of more colors can be achieved (see dithering). When a frame buffer uses 24 bits per pixel, the monitor can display over 16 million colors (true color). RGB A color model in which colors are built by mixing the three additive primary colors of red, green, and blue. In this model, you construct grays by including equal amounts of each primary: (0,0,0) is black and (1,1,1) is white. The RGB color model is closely associated with color CRT monitors because they use this model to produce their colors. Thresholding An image-processing technique used to convert a single-band image to a bivalued image. Specifically, the operation sets all pixels in the input image that fall between a low threshold and a high threshold to 1 in the output image. All other pixels in the output image are set to 0. This technique is useful for creating images that can serve as regions of interest in later operations. XGL XGL is a platform programming interface for geometry-based graphics. It provides broad functionality for 2-D and 3-D graphics ranging from vector and polygon primitives to lighting and shading, texture mapping, NURBS, depth and cueing other features. As a platform interface, XGL is the basis for many third-party geometry-oriented graphics APIs wishing to access underlying graphics hardware. XJL XIL is a platform programming interface for imaging and video support. It provides a common implementation of imaging functionality that is common to multiple higher-level interfaces, provides imaging capabilities that are not currently available, and provides a way for ISVs to access low-level and hardware functionality.

ir/iALStatioii iUjX. Uraplucs technology WhitePaper — Octoberl993

4 EP 0 752 685 A1

YIQ A color model used in the NTSC (USA) television format. Each color is represented by a combination of three components: Y, I, and Q. YIQ is a recoding of RGB for transmission efficiency and for downward compatibility with black-and-white television. YUV A color model used in the PAL (European) television format. Each color is represented by a combination of three components: Y, U, and V. The Y component represents the luminance, or brightness, of the color, and the U and V components carry 1.3Mhz chrominance information.

Glossary 43

SO EP 0 752 685 A1

10

15

20

25

30

35

40

45 44 SPARCstation 10SX Graphics Tcchnolog-j White Paper— October 1993

50

51 EP 0 752 685 A1

Sun Microsystems Computer Corporation

or U S Sale Office loolioni. ulL 80082M643 Auunlii: (021 413 2666 Honj Kong: 852 802 4 1 88 New 2,.Und (041499 2344 Tii-.r. 2-5U-CS6? ^ Olilomu: 8O0 82M642 Belgium. -32 2 759 38 1 1 li.ly CD9 60551 Nordic Counine..46 <01 8 623 90 CO UK: 0276 20444 C^nidj: 416 477.6745 |«p.n: ICO) 3221.7021 PRC BbI-831-5568 Either* tlx the world c F,r,l.r.d.-358-O-SO22700 korei: 823-563-8700 Suij.fon: 7J4 3388 Corner.,, H«.dqu.rl.r. France. (1)3067 50 00 Litin Ajneho: 415 688-9464 Sp#m (911 5551648 415 960-1300 Cenn.n,: (0189-16 00 8-0 T*e NelherUndi 033 501234 S~«KrUr.d 1011 825 71 11 liUcrconunenul Sain 415 688-9000

FILE E 435 9 IRH

APPENDIX B

THIS DOCUMENT IS REFERRED TO IN THE ACCOMPANYING APPLICATION AND IS INCORPORATED IN THAT APPLICATION BY REFERENCE.

THIS DOCUMENT IS TO BE PLACED ON THE EUROPEAN PATENT OFFICE FILE FOR THE ACCOMPANYING APPLICATION SO THAT IT IS AVAILABLE FOR PUBLIC INSPECTION

52 EP 0 752 685 A1

Platform Notes: SPARCstation 1 OSXand

SPARCstation 20 System Configuration Guide

53 EP 0 752 685 A1

0 W4 Sun Microsystems. Inc. 2550 Garcia Avenue. Mountain View, California 94043-1100 U.S. A . 5 All rights reserved I his product anil related dncumentationare protected by copyright and distributed under licenses restricting its use. copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of Sun and its licensors, it anv. Portions os' this product may be derived from the UNLXSand Berkeley 4 3 BSD systems, licensed from UNIX Svstem Laboratories. Inc. an J the University of California, respectively. Third-parly foni software in this pnxlurt is protected bv copyrightand licensed irom Sun's Font Suppliers. REST KICIED RIGHTS LEGEND: Use, duplication, or disclosure by the United States Government is subject to the restrictions set forth In DFARS 252.227-7013 (cidKU) and FAR52.22M9. Iti'? product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications 1 RADEM AKkS 15 Sun. Sun Microsystems, the Sun logo, Sun Microsystems Computer Corpora tion, the SMCC lego, SunSoft, the SunSoft logo. Solaris. SunOS, Opei\Windows,XOI..,XII... DeskSet.ONC.and NT'S are twrtenurks or registered trademarks of Sun Microsystems, inc. UNIX and OPEN LOOK are registered trademarks of UNIX System Laboratories, Inc All other product names mentioned herein are the trademarks of their respective owners. All SPARC trademarks, including the SCDCompliant Logo, are trademarks or registered trademarks ot SPARC Internationa!. SPARC station, SPA KCserver, SI'A KCengine. Sl'AKCu-orks. and SI'AKCoinpilerare licensed exclusively to Sun Microsystems. Inc. Products bearing SPARC trademarks are based upon an architecture developed bv Sun Microsystems, Inc. I he OPEN LOOK-.vand Sun™ Crjpiiica! User Interlaces were developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xeros in researching and developing the concept of visual or graphical user interface* for the computer industry Sun holds a turn exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise, comply with Sun's written license 25 agreements. X Window System is a trademark and product of the Massachusetts Lnstilute of Technology. THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. EITHER EXPRESS OR IMPLIED, INCI.UDt.NC. PUT NOT UMITED TO. THE IMPLIED WARRANT IIS 01- MERCHANTABILITY, FITNESS I;OR A PARTICULAR PURPOSE. OK NON-INFRINGEMENT 30 I H IS PUBLICATION COULD INCLUDE TECHNIC AL INACCURACIES OR TYI'OCRAI'H ICAI. ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL DE INCORPORATED IN NEW EDiTIONS OF THE PUBLICATION. SUN MICROSYSTEMS, INC. MAY MAKE IMPROVEMENTS AND/ OR CHANGES IN THE PRODUCT'S) AND/ OR THE PROGRAMS? DESCRIBED IN THIS PUBLICATION AT ANY TIME.

54 EP 0 752 685 A1

Contents

Preface ix

1. Introduction to Graphics on the SPARCstation 10SX and SPARCstation 20 1-1

2. Reserving DRAM for SX Accelerated Applications 2-1 2.1 Introduction 2-1

2.2 Advantages of Using SXDRAM.... 2-2

2.3 When to Reserve SXDRAM 2-2

2.4 Calculating SXDRAM to Reserve 2-2

2.5 Configuring SXDRAM 2-3

2.5.1 Memory Bank Layout on the SPARCstation 10SX . 2-3 2.5.2 Memory Bank Layout on the SPARCstation 20 2-7

2.5.3 System Software Constraints for SXDRAM Configuration 2-7

2.5.4 Recommended DSIMM/VSIMM Configuration for the SPARCstation 10SX 2-8

2.5.5 SXDRAM Configuration 2-9

55 tK u baa ai

J. Kunmng upenwindows on the SPARCstation 10SX and SPARCstation 20 3.1 3.1 CG14 Pixel Modes for Running the Window System ... 3-1 3.2 Visuals Supported By Openwindows 3.3 3-2 3.3 False Color Effects 3.3 4. XIL Acceleration on SX 4.1 4.1 SX XIL Features 4.] 4.2 Molecules 4.5 5. XGL 3.0.2 Accelerator Guide for SX 5-1 5.1 Overview 5.] 5.2 X Visuals 5.2 5.2.1 XGL_3D_CTX_JITTER_OFFSET 5-3 5.2.2 SIGFPE 5-3 5.2.3 XGL_CTX_PICK_APERTURE 5-3 5.2.4 XGL_DEV_COLOR_TYPE and XGL_DEV_REAL_COLOR_TYPE 5-4 5.3 Antialiasing 5,4 5.4 Performance Considerations 5.4 Boot Messages ^.j Cr U DOS M I

laoies

' *■ 1 ~»[ ruwjmuuii iu3A system Memory Layout 1 6 MByte DSIMMs on ly 2-5 Table 2-2 SPARCstation 10SX System Memory Layout 1 4 MByte VSIMM, 7 16 MByte DSIMMs 2-5 Iable 2-3 SPARCstation 10SX System Memory Layout 1 4 MByte VSIMM, 7 64 MByte DSIMMs 2-6

Iable 2-4 Comparing Slot Locations on the SPARCstation 10SX and SPARCstation 20 2-7 EP 0 752 685 A1

•laiform \otrt: W*.Ri.sia}ic:i Ti/S\ and sPAj- lstaiicm 20~» November 1494-

tigures

2-1 Mother igure Memory Layout on Board of SPARCstation 10SX ... 2-4

8 EP 0 752 685 A1

10

15

20

25

30

35

40

45 <••:» Platform Notes: SPAROtathn WSX and SPAItCsiatian 20—NtKY.ir.bcr !9s>4

50

59 tK U /OZ boo Al

rrejace

mis manual, riarjorm notes: irVLKLsfaiicm WSX and SPARCstation 20, describes the machine-dependent functionalities of the Solaris® 2.x graphics and window system APIs (Application Program Interfaces) such as XGL™(2-D and 3-D Graphics Library), XIL™ (X Imaging Library), and Xlib, as related to the SX video subsystem. In addition, this document describes configuring and tuning the SX video subsystem to enhance the performance of the applications using the XGL and XIL APIs.

This document should be used as an addendum to the Solaris 2.x document set and the SPARCstation WSX Hardware Owner's Guide or SPARCstation 20 Hardware Owner's Guide.

Who Should Use This Book

This book is intended for developers who want to rune the SPARCstation™ 10SX or SPARCstation 20 video subsystem for using OpenWindows™, XGL, or XIL applications.

How This Book Is Organized

Chapter 1, "Introduction to Graphics on the SPARCstation 10SX and SPARCstation 20", gives a brief description of the SPARCstation 10SX and 20SX. Cr U DOS M I

3 ixcscivuig i^iv^vivi ror OA Accelerated Applications and 20SX", discusses issues pertinent to configuring the SPARCstation 10SX to enhance Sun Pixel Arithmetic Memory processor (SX) accelerator performance. Chapters, "Running OpenWindows on the SPARCstation 10SX and SPARCstation 20 and o 20SX", discusses the visuals that are present when . running OpenWindows on the SPARCstation 10SX. Chapter 4, "XIL Acceleration on SX", covers commonly-used XIL functions which have been accelerated on the SX.

5 Chapter 5, "XGL 3.0.2 Accelerator Guide for SX", discusses the operation of XGL 3.0.2 on the SX.

Appendix A, "Boot Messages", shows messages displayed on the SPARCstation 10SX or 20SX during the boot process following SXDRAM configuration.

What Typographic Changes Mean

The following table describes the typographic changes used in this book.

ypeface or ymbol Meaning Example aBbCcl23 The names of commands. Edit your .login file. files, and directories; Use Is -a to list all files, on-screen computer output raachine_name» you have aBbCcl23 What you type, contrasted machine_namet su with on-screen computer Password: output aBbCcl23 Command-line placeholder: To delete a file, type cm filename. replace with a real name or value aBbCcU3 Book titles, new words or Read Chapter 6 in User's Guide. terms, or words to be These are called class options, emphasized You must be root to do this

Platform .V«.Vs: SPARC

5 Shell Prompts in Command Examples

The following table shows the default system prompt and superuser prompt for the C shell, Bourne shell, and Kom shell.

Table P-2 Shell Prompts Shell Prompt C Shell prompt machine namet C shell superuser prompt machine name! 15 Bourne shell and Konrt shell $ prompt Bourne shell and Kom shell ♦ superuser prompt

20

25

30

35

40

45 Prijut:,:

50

62 EP 0 752 685 A1

10

15

20

25

30

35

40

45

50

63 EP 0 752 685 A1

Introduction to Graphics on the SPARCstation lOSXand

SPARCstation 20 1 =

The SPARCstation 10SX is a variant of the SPARCstation 10. The critical architectural difference between the SPARCstation 10 and the SPARCstation 10SX is the video subsystem. The SPARCstation 10SX integrates the graphics/imaging accelerator into the system memory controller. This assembly is referred to as the Scalable Memory Controller (SMC). SMC is an integer vector processor which is used for graphics and imaging acceleration. The accelerator renders directly into DRAM or video RAM. The SMCC official product name for the graphics/imaging accelerator is SX.

All SPARCstation 20 machines have the SX graphics/imaging accelerator.

The video RAM (here referred to as the frame buffer) for the SPARCstation 10SX is integrated into the system main memory address space. It is available on a video SIMM card (VSIMM) in two configurations: • With 4 MBytes of video RAM • With 8 MBytes of video RAM.

This frame buffer offers true color functionality. The video SIMM by itself functions as a dumb frame buffer. The acceleration when rendering to the video memory is provided by the SX imaging and graphics accelerator. The SMCC official product name for the frame buffer in SPARCstation 10SX and SPARCstation 20 workstations is cgfourteen.

64 EP 0 752 685 A1

5

10

15

20

25

30

35

40

Phtjonr. fVo'.-s,; SPAROhtion 1G?X and SPAROtuthr. 20- Norsmber 1 Sli

50

65 EP 0 752 685 A1

Reserving DRAM for SX Accelerated Applications 2 EE

One of the performance enhancements for SX applications is the availability of physically contiguous DRAM. Physically contiguous DRAM for SX will be referred to in this document as SXDRAM. This document describes: • The process of configuring SXDRAM for exclusive 'use by the SX accelerated applications. • The appucation context in which SXDRAM is used • The advantages of using SXDRAM

SX provides acceleration of the graphics and imaging segments of applications that run on a SPARCstation 10SX or SPARCstation 20 workstation. Acceleration can be used for a wide range of pixel operations, including 2D and 3D graphics rendering, multimedia, and image processing. The SX accelerator, built into the SMC, can directly accelerate operations on both the system main memory (DRAM) and the video memory (VRAM). The SMC is comprised of:

1. An error-correcting code memory controller which interfaces with both the system main memory (DRAM) and the video memory (VRAM; the frame buffer) to the system memory bus. 2. The SX imaging and graphics accelerator.

36 EP 0 752 685 Al

Z.l Advantages of Using bXuKAM

As a configuration option, you can reserve SXDRAM. When SXDRAM is reserved, the SX has additional optimizations available to it when accessing SXDRAM, and operations on SXDRAM execute more quickly. The reserved memory, however, is then not available for use by other applications. For example, on a 4S-megabyte system, allocating 16 megabytes of SXDRAM means that the system will in effect run as a 32-megabyte system.

2.3 When to Reserve SXDRAM

/\ Caution - The memory reserved for SXDRAM will not be available for system /f\ use. When reserving SXDRAM, consider the amount of memory left for A system use. Ensure that there is sufficient memory left for system use that system performance is not adversely affected.

Reserving daUKAM can improve the performance of an application that uses the foundation libraries XJL or XGL. The default configuration is to use no SXDRAM. XGL uses SXDRAM for Z buffers and double-buffering. Typically, 8 MBytes of SXDRAM must be reserved if both Z-buffering and double-buffering are used. 4 MBytes must be reserved when Z-buffering is used or when double-buffering is used alone. XIL uses SXDRAM for table lookup operations and for image rotation operations. For table lookup operations, the SXDRAM size varies between 256 byres and 128K bytes. However, the minimum SXDRAM that can be configured is 1 MByte. For image rotation operations, the amount of SXDRAM that must be reserved should be the same as the size of the image, rounded up to the nearest integer multiple of 1 MByte. For example, a 1200 x 1200 image with four 8-bit channels per pixel will fit in 5.493 MBytes, requiring 6 MBytes of SXDRAM.

I A Calculating SXDRAM to Reserve

To calculate the amount of SXDRAM to reserve, add up the individual requirements and round up to the next multiple of 1 MByte. The individual requirements are:

la '.term <\fl.vs: i>f/iiiL.

7 EP 0 752 685 A1

5 1. The XGL Z buffer requires 4 bytes per pixel. The XGL back buffer is required only for RGB double buffering, and is also 4 bytes per pixel. For example, if the application uses an animated 24-bit Z-buffered raster that is limited to 900 x 900 pixels, then the SXDRAM requirement is 900 x 900 x 8 = 1U10 6480000 <; 7 MBytes (1048576 x 7 bytes). 2. The XIL lookup tables require 128K bytes of SXDRAM per table used. For operations such as image rotation, enough SXDRAM to store the entire source image should be reserved.

15 2.5 Configuring SXDRAM

This section lists the steps to follow in order to configure SXDRAM. It also discusses the constraints imposed by the system software and hardware. It is essential that you understand the system memory map before you configure 20 SXDRAM.

Note that there are some key differences in the way memory is arranged on the SPARCstation 10SX and the SPARCstation 20: * The physical sequence of slots is different 25 • The slots that can use VSIMMs are different Information specific to the SPARCstation 20 is provided in Section 2.5.2, "Memory Bank Layout on the SPARCstation 20," on page 2-7. To plan SXDRAM configurations for those systems, take this information into account when the in the JO30 applying principles explained material covering the SPARCstation 10SX.

2.5.1 Memory Bank Layout on the SPARCstation 10SX There two banks SPARCstation 35 are memory on a 10SX. Bank 0 is comprised of slots 0, 1, 2, and 3. Bank 1 is comprised of slots 4, 5, 6, and 7. These 8 slots are available for configuring memory on the SPARCstation 10SX. Each bank of memory can map 256 MByte of physical address space. Each slot in each memory bank maps 64 MByte of physical address space. 40 The beginning physical address for bank 0 is 0. For bank 1, it is 0x10000000.

45 Reserving DRAM fcr S X AccdcrairJ AppiUvtiai

bU

bb

68 EP 0 752 685 A1

Slots 4 must be configured with a VSIMM; slot 5 may be configured with either a DSIMM or a VSIMM (CG14). The SPARCstation 10SX supports 16 MByte and 64 MByte DSIMMs, and 4 MByte and 8 MByte VSIMMs. Each slot maps 64 MByte of physical address space regardless of the size and type of SIMM that is configured in the slot. Figure 2-1 Memory Layout on Mother Board of SPARCstation 10SX

Front SIMM slots: Physical Address: 0 [=□ SIMM CLTE slot 7 DSIMM Ox1cOCXK)00 slots slot 3 DSIMM OxcOOOOOO slot 6 DSIMM 0x18000000 slot 2 DSIMM 0x8000000 MBus slot 1 I SBus slot 1 SBus slot 3 MBus slot 0 SBus slot 0 I | SBus slot 2 slot 5 DSIMM/VSIMM 0x14000000 9 slot 1 DSIMM 0x4000000 slot 4 DSIMM/VSIMM 0x10000000 slot 0 DSIMM 0x0

Note: VSIMM VRAM physical addresses 0U start with T in bits <31 -28> CD rm rm ED rrn

Rear

P!a'.f,>rrt Nnfrs: SPARC •ion 10SX and SPAROiuiior. 20-- Nsivrr.b,-- ?.c94

69 Table 2-1 below illustrates the physical address map of a system configured with 16 MByte DSIMMS in all the slots

Table 2-1 SPARCstation 10SX System Memory Layout 16 MByte DSIMMs only SIMM Slots DSIMM Size Physical Address Slot 7 16 MByte DSIMM OxlcOOOOOO Slot 3 16 MByte DSIMM OxcOOOOOO Slot 6 16 MByte DSIMM 0x18000000 Slot 2 16 MByte DSIMM 0x8000000 Slot 5 16 MByte DSIMM 0x14000000 Slot 1 16 MByte DSIMM 0x4000000 Slot 4 16 MByte DSIMM 0x10000000 Slot 0 16 MByte DSIMM 0x0 larjie t-i oeiow illustrates tne physical address map of a system configured vith one 4 MByte VSIMM installed in slot 4 and 16 MByte DSIMMs in the ^maining slots.

-able 2-2 SPARCstation 10SX System Memory Layout 1 4 MByte VSIMM, 7 16 MByte DSIMMs 5IMM Slots DSIMM/VSIMM Size Physical Address lot 7 16 MByte DSIMM OxlcOOOOOO lot 3 16 MByte DSIMM OxcOOOOOO lot 6 16 MByte DSIMM 0x18000000 lot 2 16 MByte DSIMM 0x8000000 lot 5 16 MByte DSPMM 0x14000000 lot 1 16 MByte DSIMM 0x4000000 lot 4 4 MByte VSIMM OxfOOOOOOO lot 0 16 MByte DSIMM 0x0

J EP 0 752 685 A1

Thus, on systems configured with 16 MByte DSIMMS, the maximum size of a physically contiguous block of DRAM is 16 MByte. However, you can reserve multiple blocks of SXDRAM on such systems. In order to be able to configure a single block of SXDRAM greater than 16 MByte, the system must be configured w with 64 MByte DSIMMs. Table 2-3 illustrates a system configured with one 4 MByte VSIMM and seven 64 MByte DSIMMs. Table 2-3 SPARCstation 10SX System Memory Layout ,s 1 4 MByte VSIMM, 7 64 MByte DSIMMs bIMM Slots D5IMM/V51MM Size Physical Address Slot 7 64 MByte DSIMM OxlcOOOOOO Slot 3 64 MByte DSCvlM OxcOOOOOO 20 Slot 6 64 MByte DSIMM 0x18000000 Slot 2 64 MByte DSIMM 0x8000000 Slot 5 64 MByte DSIMM 0x14000000 Slot 1 64 MByte DSIMM 0x4000000 25 Slot 4 4 MByte VSIMM OxfOOOOOOO Slot 0 64 MByte DSIMM 0x0

this layout results in one contiguous block of 256 MBytes (slots 0, 1, 2, and 3) beginning at physical address 0, and another block of 192 MBytes (slots 4, 5, 6, und 7) beginning at physical address 0x14000000. Therefore, the maximum imount of DRAM that can be installed in this configuration is 448 MBytes. A typical system will most likely have 16 MByte and 64 MByte DSIMMs, and VSIMMs. There are a large number of possible permutations of the system :onfiguration which, due to space limitations, will not be discussed here. lb be able to allocate the largest possible block of SXDRAM with a given set of VSIMMs and DSIMMs, use the illustrations in this section as a guide. Hie next section provides some information unique to the SPARCstation 20. The two sections following that discuss system software constraints and ronfiguration recommendations that involve both systems.

Platform .Vo.Vs: SPARCttst-on 10SX and ^HAROKitk'n 20 -■ November

1 tr* U /O^ DOO A I

ivicmory nanK Layout on trie SfAKLstation 20

On the SPARCstation 20, the physical sequence of slots is different from that slots on the SPARCstation 10SX. The slots that can be used for VSIMMs differ as well. The different layouts are compared in Table 2-4. Table 2-4 Comparing Slot Locations on the SPARCstation 10SX and SPARCstation 20

Slot Names on Slot Names on SPARCstation 10SX SPARCstation 20 5'ot 7 Slot 0 Slot 3 Slot 2 slot 6 Slot 5 Slot 2 Slot 3 slot 5 (can be VSIMM 1) Slot 6 Slot 1 Slot 1 Slot 4 (can be VSCvLM 0) Slot 7 (can be VSIMM 0) >>°l 0 Slot 4 (can be VSIMM 1) cystCTn software constraints for bXUKAM Configuration The following constraints are described in terms of the SPARCstation 10SX, but the same concerns apply to the SPARCstation 20. 1. The first slot (slot 0) must always be configured with a DSIMM. 2. The minimum recommended amount of memory required for reasonable SPARCstation 10SX performance is 32 MByte. Thus, to be able to reserve SXDRAM, a system should be configured with more than 32 MBytes of DRAM. However, users can configure the minimum amount of memory that be must reserved for system use by using the -1 option of the sxconf ig UM) command. The difference between the amount of DRAM installed on the system and the configured minimum limit (32 MBytes by default) is the maximum amount of memory that can be reserved for SXDRAM. 3. The amount of physically contiguous memory that should be reserved must be specified as an integer multiple of 1 MByte. Thus, the minimum amount that can be reserved is 1 MByte. EP 0 752 685 Al

25 A Recommended DSIMM/VSIMM Configuration for the SPARCstation WSX

1. The VSIMM can only be installed in slots 4 or 5 on the SPARCstation 10SX. If there is only one VSIMM, it can be installed in either slot 4 or 5. To install the VSIMM in slot 5, an AVB (Auxiliary Video Board) card is required. This card is not bundled with the SPARCstation 10SX.

2. Always install a 16 MByte DSIMM in slot 0 when you have a combination of 16 MByte and 64 MByte DSIMMs. 3. If the memory system consists only of 16 MByte DSIMMs. They can be configured in any slots, provided that the first 16 MByte DSIMM is installed in slotO.

4. Within a memory bank, always install the DSIMMs in the order of decreasing DSIMM sizes (the ordering does not matter if all the DSIMMs are of the same size). In other words, if there is a combination of 64 MByte DSIMMs and 16 MByte DSIMMs, install the 64 MByte DSIMM in the lowest- number slot, followed by the 16 MByte in the immediate next slot. When configuring the memory subsystem with 64 -MByte DSIMMs and 16 MByte DSIMMs, the following examples can be used as a guide: System configuration: 1 VSIMM, 2 16 MByte DSIMMs, 2 64 MByte DSIMMs Can be configured as: 1 16 MByte DSIMM in slot 0 1 16 MByte DSIMM in slot 7 1 64 MByte DSIMM in slot 6 1 64 MByte DSIMM in slot 5 1 VSIMM in slot 4

1 16 MByte DSIMM in slot 0 1 16 MByte DSIMM in slot 3 1 64 MByte DSIMM in slot 2 1 64 MByte DSIMM in slot 1 1 VSIMM in slot 4 or 5

-'•"/•'rm ,\o:.-i: if/.KUCfim 7(/.-A and bPARCflatwn 2G-\!srcrr.bgr 7594

3 cr U ID£ DOO M I

The operating system includes a driver for reserving and managing physically contiguous memory. The memory should be reserved as part of the boot process, because it is likely to be the least fragmented at this time, and chances of finding large blocks of physically contiguous memory are higher during boot time.

The amount of SXDRAM to reserve can be specified by using the sxconf ig(lM) command, sxconf ig can be executed only by a process with superuser privileges. Here are some examples of sxconf ig command use. To disable fragmentation, enter

^ an (.uiiuguranon paramerers to tne default values, enter

■y ut.ouu, u muyica ui pnysicauy contiguous memory is reserved, is ragmentation not allowed, and 32 MBytes of memory is reserved for system se.

b display the Current configuration parameters in the driver configuration file nter: °

, s_

n« nut uoouea arrer tne last time the configuration parameters 'ere changed, then the displayed values will not reflect the actual set- For system p. more information about using sxconf ig, refer to the on-line man age. EP 0 752 685 Al

I he sxconf ig command resides in the directory /usr/kvm; the shell environment variable PATH must include this directory. To find out whether the PATH environment variable includes the /usr/kvm directory, type:

Tour search path will be displayed. An example:

/Din : /Kvra : / etc/ : /uar/bin :

li the line displayed does not include /usr/kvm, enter the following if you are in either the Bourne shell or the Kom shell:

f fATH«^ATH:/u3r/Xva export PATH

lonowea Dy:

i axpoxx fATH

it you are in the Bourne shell. If you are in the C shell, enter:

t Jot«OT PATE 5 PATH /usr/kvm

it 16 MBytes of memory must be reserved, enter:

r sxcoczig —a lb

On a system configured with 16 MByte DSIMMs, the maximum amount of SXDRAM that can be reserved in a single block is 16 MBytes. On such systems, when more than 16 MBytes of memory must be reserved for

-«;;.«■« ,\e:a: Stv.rfCjstet.-wi 7G.-X and SPARC

b EP 0 752 685 A1

1

SXDRAM, the sxconf ig command can be used to specify that fragmented reservation of the requested amount of SXDRAM is allowed. For example, to reserve 32 MBytes of memory on a system configured with 16 MBytes, enter:

♦ sxconfig -3 32 -f sxconf ig and reboot causes a search of the system page pool for a contiguous block of memory of the specified size. If the block of memory is found, it is reserved. If fragmentation is specified (as shown above), more than 16 MBytes is specified, and the search fails, the operating system searches for contiguous blocks of 16 MBytes. If no blocks of this size are found, the operating system searches for contiguous blocks of 256 KBytes. When the SXDRAM configuration is finished, halt the system:

» halt

The Open Boot PROM prompt is displayed on the console:

ok

Boot the system by entering:

ok boot disk -rv

The -r option specifies a reconfiguration boot. The -v option specifies verbose mode. As part of the boot process, the requested amount of SXDRAM will be reserved. Refer to Appendix A, "Boot Messages" for a listing of the messages that will be displayed. After the system is rebooted, log in, start OpenWindows, and start the XGL or XIL application of your choice.

Ressrvir.z DRAM for SX AcccUraiod Apri'u v:i-.?ns 2-11

76 EP 0 752 685 A1

5

10

15

20

25

30

35

40

Platform Nirf.v SPAROlstion 1GSX and SPAROlntitm 20~~\::

50

77 bK U fbZ bBb Al

Running Open Windows on the SPARCstationlOSXand

SPARCstation 20 3 =

inis cnapter discusses tne visuals mat are present when running OpenWindows on the SPARCstation 10SX and SPARCstation 20.

CG14 Pixel Modes for Running the Window System

The cgfourteen frame buffer is configurable to scan out either 8-bit pixels or 32- bit pixels. This allows the cgfourteen to be used in high resolution modes. For example, you can configure a 4MByte cgfourteen connected to a multi-sync monitor to display 8-bit pixels at 1280x1024 resolution with the command:

■u3i-/KTrm/cgj.«C0HIig -r 12S0xl024g66

™nen tne system is reDooted the monitor displays at the new resolution. Since IMBytes is insufficient memory to have 32 bits per pixel, invoking Dpen Windows will automatically select 8-bit pixels only. (he same hardware, when configured to display at 1152x900 resolution with he command:

u3r/Kvwcgi4conrig -r 1152x900876

sail auow upen windows to use 61 bits per pixel, after rebooting.

• 1

5 bH 0 /bZ bBb Al

in ootn modes, tne lett-over VKAM not displayed on the screen is utilized by the window system for pixmap allocation. It is possible to use the frame buffer in 8-bit pixel mode even when there is sufficient VRAM for 32-bit pixels. There is a significant performance improvement when the frame buffer is in 8-bit pixel mode. To force the pixel mode, put the verb pixelmode="8" in the OWconf ig file used by the server. The OWconf ig file is typically in /usr/openwin/server/etc. A complete entry with this in the file would look like:

:la33-"XSCREEN" name-"SU*T«cgl'l" ddxHandler-"ddxSU>rWcgl4 . so . 1" ddxInitFunc-"sunCG14Init" pixelmode- "3"; s.l visuals supported by Upenwindows 5.3

When the window system runs in 8-bit mode, it exports the same visuals that are exported by Openwindows 3.3 on other 8-bit frame buffers: * 8-bit StaticGray * 8-bit Grayscale * 8-bit StaticColor * 8-bit Pseudocolor * 8-bit TrueColor and * 8-bit DirectColor.

Only one hardware color lookup table is available to be shared by all Xll colormaps.

In 32-bit mode, the server supports a 24-bit TrueColor visual, in addition to all of the visuals present in 8-bit mode. When the server is started with the following option:

usr/ opan»in/Bin/ openmn -dav /dav/fbs/cgf ourtaenO def depth 8

wrrcrm ,\a;rs: ifAKi..

3

the default visual, in which the root window is created, is an 8-bit Pseudocolor visual. When the following option is used:

/uar/opanwin/Din/oponwin -dev /dev/Cbs/cgf ourteonO deffdepth 24

tne aetauit visual is a 24-bit TrueColor visual.

3 .3 False Color Effects

The phenomenon of seeing the wrong colors in a window because another Xll colormap is installed in the hardware is called false color. The best way to avoid false color is to use a TrueColor visual. Since all 32 bits are available for TrueColor visuals, the colors always show up correctly. The SX hardware renders 24-bit visuals with the same speed as it renders 8-bit visuals, so there is no performance penalty when using 24-bit visuals. In 32-bit mode the StaticGray visual has its own dedicated hardware color lookup table (actually a linear ramp). Hence StaticGray windows in 32-bit mode will never cause other 8-bit windows to appear incorrectly.

0 EP 0 752 685 A1

5

10

15

20

25

30

35

40

Ptutfrrm ,\'n!,v SPAROtxtiun lOSXand •iPARC.'aHon 2G—Novr.

50

81 XIL Acceleration on SX

mis Appendix covers commonly-used xjl functions which have been accelerated on the SX for byte and short images. There are certain restrictions which have been placed on the functions in order for acceleration to occur. If the function is not accelerated on SX, it is performed using the memory driver of XIL.

Two functions, xil_lookup (3) and xil_rotate (3), require contiguous memory (up to 128k bytes) for lookup, and the size of the source image for xil_rotateO-

The following functional features are supported for the XIL port to SX. All other parameters will cause an XIL_FAILUHE to be returned, prompting the memory code to finish the operation. Note that rotate and lookup require SXDRAM (for which you'll need over 32MBytes on your machine.)

SX XIL Features

Dyadic functions: • 13,4 banded byte images. • 3 banded child of 4 banded byte image • child images with x,y offsets ok; no band offsets EP 0 752 685 Al

' full Support tor KUlS

Byte XIL Function By^e Short Child

1 band 3 bands 4 bands n bands 3/4 xil_add x x x x x xil_and x x x x x xil_multiply x x x x x xil_or X x x x x xil_subtract x x x x x xil_xor x x x x x utner functions:! • band short images • 1,3,4 band byte images ■ 3-banded child of a 4-banded image • child images w/ x,y offsets, no band offsets ' full support for ROIs

Byte <1L Function Byte Short Child

1 band 3 bands 4 bands 1 band 3/4 cil_add_con3t x * x x x cil_and_const x x x x x cil_divide_const x x x x x :il_multiply_con3t x x x x x :il-not X X X X x :il_or_con3t x x x x x :il_aubtract_cori3C x x x x x :il_subtract_f rom_con3t x x x x x

iw.jc.rm

3 EP 0 752 685 A1

Byte ail runcnon Byte Short Child

1 band 3 bands 4 bands 1 band 3/4 xi±_xor_con3t xil_blend xil_paint xil_3cale xi l_re scale xil_set_value xil_extrema xil_convolve (see N'ote 4 below) xil_rotate (Nearest Neighbor only, no ROD xii_iooicup (see Note 1 below) xi l_color_convert (see Note 2 below) xii_copy (see Note 3 below) xi l_thre shold xil_traxi3poae x xil_tran3late x xil_get_pixel x xil_put_pixel X xil_band_combine (see Note 5 below) x (3- band) xil_decorapre33 (see Note 6 below) xii_cast (see Note 7 below)

\/(, Acceleration en SX 4-1

54 EP 0 752 685 Al

l. ihe following variations of xil_lookup are implemented.

XIL Function From-To Bands xil_lookup 8-8 1 band only xil_lookup 16_16 1 band only xil_lookup 8_16 1 band only xil_lookup 16_8 1 band only xil_lookup 8-24 1 band to 3 bands

2. The following variations of xil color convert are im

t Source Colorspace o Dest. Colorspace rgblinear to rgb709 rgblinear to ycc709 rgblinear to ycc601 rgblinear to ylinear rgblinear to cmylc ycc601 to rgb709 rgb709 to rgblinear rgb709 to ycc601 rgb709 to photoycc photoycc to rgb709 cmyk to rgblinear

3. Copy function also allows the insertion and extraction of one band from 3/4 banded byte and 3-banded short images.

'lut-orm ionics: SPAK'i^laf.on and j?nKOf«!io.i 20—NottmbxT 1934

b EP 0 752 685 Al

4. xii_convolve is implemented for the following kernels with central key pixels:

XIL Function Kernel 10 xil_convolve 3x3 xil_convolve 5x5 xil_convolve 7x7 xil_convolve 3x1/1x3 (molecule) 15 xil_convolve 5x1/1x5 (molecule) xil_convolve 7x1/1x7 (molecule)

o. xi±_dand_combine has been implemented for 3-banded short images. 6. xil_decompress has been implemented for JPEG. 7. xil_cast has been implemented for conversion between 3-banded byte and 3-banded short images.

£.2 Molecules

ine following molecules are implemented:

Function Bands (byles only) 1 3 4 Kil_copy+diaplay x x x icil_rotate+di3play x x x «il_3cale+dLisplay x x x

XII. Acceleration cn SX 4. -\

b EP 0 752 685 A1

5

10

15

20

25

30

35

40

45 *•<» P/«f/.!.'« .VoJ.v Sft*RC«teM

50

87 EP 0 752 685 A1

= XGL 3. 02 Accelerator Guide for SX 5

I his chapter discusses the operation of XGL 3.0.2 on SX. It describes the SX and the implementation of the XGL/SX driver so that you can understand how to use their features most effectively.

lhe bA is a programmable device that accelerates operations on pixels; for XGL, this includes: # Drawing • Dots • Antialiased dots • Lines • Antialiased lines - Spans • Triangles • Operations on areas such as: • Filling • Copying • Accumulation buffering and anything else that involves reading and writing pixel data (color and Z buffer included.) The pixel data can reside either in video ram (the cgl4 frame buffer, VRAM), or in main memory (DRAM.)

8 EP 0 752 685 A1

I he bX is currently used to accelerate all XGL pixel operations except accessing a textured pixel from a texture MipMap. The SX does not support floating point operations. Thus, the transformation, clip checking, clipping, and optionally lighting steps that comprise the 2D and 3D graphics pipeline are done on the CPU. Since the SX runs in parallel with the CPU, typically the CPU will be transforming an object while the SX is rendering the previous object. The SX has a single hardware context. This context is switched among all processes using SX. For example, using Xlib to render pixels via the server's SX driver, then XGL to render pixels via the XGL/SX driver, will cause a delay as the hardware context is switched between the two processes. Running a performance meter, for example, can cause a noticeable glitch in application animation when the SX context is switched between the meter process and the application. Use of Direct X when mixing Xlib and XGL rendering is highly recommended as, typically, no context switch will occur. The same SX context will be shared between the Xlib rendering calls and the XGL/SX driver. Similarly, mixing XIL, XGL and Direct X in the same process will cause no context switches. The frame buffer, cgl4, supports both 8 and 24 bit drawables, and can have both visible simultaneously. If window identifiers have not all been used up by other 8 bit double-buffered drawables, then the cgl4 will use buffer switching for double buffering. Otherwise, the XGL/SX driver uses copy double buffering, using the SX to accelerate the copy. The driver always uses copy double buffering for 24 bit drawables. The Z buffer is stored in DRAM, with one allocated for each XGL raster that has Z buffering .turned on. The SX accelerates Z buffer clearing and comparison. If SXDRAM is available, the XGL/SX driver will use it for Z buffers, and back buffers as well (if double buffering is enabled, and copy double buffering is being used.) SXDRAM significantly improves line-drawing and context- switching performance. Other pixel operations run about 10%-20% faster.

5.2 X Visuals

The XGL/SX driver supports the following subset of the available cgl4 visuals:

Wuifarm Nam SPAROhtion 10SX ami SPARO.'aiior. 2P-- Nu-rmbcr 7=94

9 EP 0 752 685 A1

" e-tut Pseudocolor * 8-bit StaticColor * 8-bit StaticGray • 8-bit Grayscale • 24-bit TrueColor Note that the application must recognize that 8-bit DirectColor and 8-bit TrueColor visuals exist, and be programmed to not use them with XGL (XGL will reject any attempt to make such a visual an XGL raster.) Also, since the window system can come up in defdepth 24 (see the openwin(l) man page), the application should not assume that the root window is of depth 8 with a default colormap available.

XfJL_JD CTX_JITTER_OFFSET

If accumulation buffering (global antialiasing) is intended, a non-zero jitter value must be used. The XGL/SX driver draws Bresenham-style lines in 3D when the X and Y jitter values are exactly zero, and true sampled lines (suitable for accumulation) otherwise. b.2.2 SIGFPE

To maximize performance, zero divides and floating point overflows are allowed to occur in normal operation of the XGL/SX driver. The default is to ignore these exceptions. If the application enables these exceptions, they should be set up to be ignored before calling XGL.

S.2J XGL_CTX_PICK_APERTURE

The SX uses the rasterization semantic for picking. See the Solaris XGL 3.0.2 Reference Manual for a complete description of this semantic.

Accelerator Oinuc for ?X

10 cr U ID£ DOO M I

D.L.-i Abij_Ub v_LULOR_TYPE ana XGL_DEV_REAL_COL0R TYPE The XGL/SX driver supports only matching XGL_DEV COLOR TYPE and XGL_DEV_REAL_COLOR_TY?E; they must both be eitheT XGL_COLOR^_INDEX or XGL_COLOR_RGB. Otherwise, the slower XGL/Xlib driver will be used to render into the window raster. d.d rxniuuiasing

cgl4conf ig(lM) should be used to set the gamma value to a reasonable value for your monitor before viewing antialiased lines and dots. A good invocation to start with is

^uierw^e, me annanasea ODjects will not look right.

D.-i performance L.onsiaerations

. suuuiu uc useu wnere possioie to draw polygons other than triangles.

Mot all rendering functions are equally accelerated by the XGL/SX driver. The greatest effort was focused on maximizing the performance of the most useful 5nes. The following is a necessarily-incomplete list of these. Please note that Unctions not on this list are not slow; they are just not as fast as they could be in tne next release or tne aul/sa driver. It operations that are critical to your application are not in this list, please make a request for their speed to be increased.

xgl_context_copy_raster () From window raster to window raster. 10 xgl_multi_marxer () 2D circles of radii 1 to 32 pixels. xgl_multi_polyline 0 In 2D, thin lines, solid or patterned, containing no color or homogeneous values, and with XGL_CTX ROP equal to XGL_ROP_C0PY. 15 In 3D, thin lines, solid or patterned. containing no color, flag, homogeneous or data values, not model clipped, clipped to +w only, and with XGL_CTX_ROP equal to XGL_R0P_C0PY. 20 xgl_multi_3iinple_polygon () In 3D, triangles. The hint flags must be set to XGL_FACET_FLAG_S IDES ARE 3. xgl_triangle_strip () Triangles that have no homogeneous or data values, are not model clipped, are clipped to +w only, are 24-bit, have edges turned off, are solid and opaque, and have 25 XGL_3D_CTX_Z_BUTFER_C0M? METHOD equal to XGL_Z_C0MP_LESS_THAN_OR_EQu~AL. xgl_context_accumulate () All operations. EP 0 752 685 A1

5

10

15

20

25

30

35

40

?-6 Platform Notrs: SPARCstation WSXanJ '-PAROtatior. 26— November 7.994

50

93 EP 0 752 685 A1

Boot Messages

This Appendix lists messages displayed on the SPARCstation 10SX or SPARCstation 20 during the boot process following SXDRAM configuration. These messages provide information regarding the amount of contiguous memory that has been reserved.

SunOS Release 5.3 Version alpha2.3 [UNIX (R) .System V Release 4.0] Copyright (c) 1983-1993, Sun Microsystems, Inc. pac : enaLbled - SuperSPARC/SuperCache Cpu 0: TI,TMS390Z55 (mid 8 irapl 0x0 ver 0x0 clock 40 MHz) mem - 49152K (0x3000000) avail mem - 41820160 Ethernet address - 8:0:20:13:0:37 root nexus - SUNW, Premie r-2 4 iuuuuuO at root: obio OxeOOOOOOO sbusO at iommuO: obio OxeOOOlOOO espdmaO at sbusO: SBus slot f 0x400000 espO at espdmaO : SBus slot f 0x800000 spare ipl 4 sdO at espO : target 0 lun 0 sdO is /iommuef , e0000000/sbus9f , eOOOlOOO/espdmaef , 400000/esp8f , 800000/sdeo, 0 sd2 at espO: target 2 lun 0 sd2 is /ioitami6f,e0000000/sbu39f , eOOOl OOO/espdmaSf , 400000/espef , 800000/sd82, 0 sd3 at espO : target 3 lun 0 sd3 is /iommu9f , eOOOOOOO/sbusSf , e0001000/espdma@f , 400000/esp9f , 800000/sd33, 0

A-l

94 EP 0 752 685 A1

tiUNUi^i cyl 1131 alt 2 hd 9 sec 80> 3d6 at espO : target 6 lun 0 3d6 is /iomnu@f , eOOOOOOO/sbusSf , eOOOlOOO/espdmagf , 400000/esp9f, 800000/sd86, 0 o Unable to install/attach driver 'isp' root on /iommu8f , eOOOOOOO/sbuae f , eOOO 1000/espdmaaf , 400000/espaf, 800000/sde3, 0 :a fatype ufa obioO at root zaO at obioO: obio 0x100000 spare lpl 12 zaO ia /obio/zaSO, 100000 zsl at obioO: obio 0x0 spare ipl 12 zsl ia /obio/zs80,0 configuring network interfaces : ledmaO at sbusO: SBus slot f 0x400010 leO at ledmaO : SBus slot f OxcOOOOO spare ipl 6 leO ia /ionrntuef , eOOOOOOO/sbusS f , eOOOl 000/ledma8 f , 400010/leef, cooooo leO. Hostname: example dump on /dev/d3k/c0t3d0al size 658S0K Configuring the /devices directory Unable to install/attach driver 'bwtwo' Unable to install/attach driver 'audio' Unable to install/attach driver 'cgthree' 3t4: st4 at espO : target 4 lun 0 3t4 ia /iommu8f , eOOOOOOO/sbusSf , eOQOl 000/espdmaaf , 400000/eapaf, 800000/ste4, 0 Unable to install/attach driver 'iap' SUNW, fdtwoO at obioO: obio 0x700000 spare ipl 11 SUNW, fdtwoO ia /obio/SUNW, fdtwoeo, 700000 Unable to install/attach driver 'egaix' Unable to install/attach driver 'vme' Unable to install/attach driver 'ipi33c' Unable to install/attach driver 'id' Unable to install /attach driver 'vme' Unable to install/attach driver 'vmemem1 abusmemO at sbuaO : SBua slot 0 0x0 abusmemO ia /ioimuS f , eOOOOOOO/sbusa f , eOOO lOOO/sbusmemaO , 0 abusmeml at sbusO : SBua alot 1 0x0 abuameml ia /iommu@f , eOOOOOOO/sbuaaf , eOOOlOOO/sbuamemai , 0 3busmem2 at sbusO : SBua alot 2 0x0 abuamem2 ia /iommu8£, eOOOOOOO/sbusaf, e0001000/3bu3mem32 , 0 3buamem3 at sbuaO : SBua slot 3 0x0

'Mjcrm Now-: SPARCsiation lOSXand SPARQUatior. 20 -November i.994

15 EP 0 752 685 A1

A sbusmem3 ia / iammu8f , e0000000/3bus8f, e0 0 010 00/sbu3meme3, 0 3busmeml4 at abusO : SBus slot e 0x0 3buaneml4 /iomtnu8f , e0000000/abua@f, eOOOlOOO/sbuamerr.ae, 0 sbuamemlS at sbuaO: SBus slot f 0x0 sbusmeml5 is /iommu8 f , eOOOOOOO/ abua3f , eOO 01 000/ abusmema f , 0 Unable to install/attach driver 'xbox' SUNW.bppO at sbuaO: SBus slot f 0x4800000 SBua level 2 spare ipl 3 SUNW.bppO is /iommu9f , eOOOOOOO/ sbusef , eOOOl 000/SUNW, bppSf , 4800000 Unable to install/attach driver 1 pn ' Unable to install/attach driver 'lebuffer' Unable to install/attach driver 'cgeight' Unable to install/attach driver 'ipi33c' SUNW, DBRIeO at sbusO: SBua alot e 0x10000 SBus level 5 spare ipl 9 SUNW, DBRIeO ia /iomxnue i , eOOOOOO 0/abua8f , eOOOl 000/SUNW, D3RIe9e, 10000 MMCQDEC: Manufacturer id 1, Reviaion 1 paeudo-device : volO volO ia /paeudo/vol80 Unable to inatall/attach driver 'xbox'xbox ' Unable to install/attach driver ''vme' vme ' Unable to install/attach driver ''mcp'' mcp'' Unable to install/attach driver ''vme' vme ' Unable to install/attach driver 'mcp' Unable to install/attach driver 'mcpiaa'mcpiaa' ' Unable to inatall/attach driver ''vme' vme ' Unable to inatall/attach driver 'mcp'mcp' ' Unable to inatall/attach driver ''mepp' mepp ' SUNW, axO at root: obio 0x80000000 and obio 0x80001000 SUNW, 3x0 13 /SUNW, ax8f, B0000000 cgfourteenO at obioO : obio 0x0 and obio 0x0 spare ipl 8 cgfourteenO ia /obio/cgf ourteenS 1 , 0 ax_cmea: Installed 112MB Ra served 8MB Fragment 0 Avail Tor System Uao 104MB paeudo-device: sx cmemO ax_asemO is /paeudo/ax_caexs{90 Unable to inatall/attach driver 'ate'3tC Unable to inatall/attach driver 'isp'isp' Unable to install/attach driver 'cgtwelve'cgtwelve ' Unable to install/attach driver 'gt'gt" Unable to install/attach driver 'leo'leo * Unable to in3tall/attach driver 'rtvc'rtvc ' Unable to inatall/attach driver 'tcx'tcx '

36 EP 0 752 685 A1

Configuring the /dev directory Configuring the /dev directory (compatibility devices) The system is coming up. Please wait.

A A Platform .Vn.'.s: SPAROi-Hun 7fo'X and £ PAR Gilt: I tun 2G-- November ;.«9-;

97 bK U fbZ bBo Al

index

K antialiasing, 5-4 ROI support, 4-2

cgn irame ouner, o-^ Scalable Memory Controller (SMC), 1-1 ' cgl4 visuals, 5-2 SIGFPE, 5-3 cgfourteen device driver, 1-1 SMC (Scalable Memory Controller), 1-1, 2- configuring SXDRAM, 2-3 to 2-11 1 SPARCstation 10, 1-1 D SPARCstation 10SX, 1-1 SPARCstation double-buffering, 2-2 20, 1-1 SX hardware context, 5-2 SX imaging and graphics accelerator, 1-1, 2-1 raise coior enects, j-j SX XIL features, 4-1 to 4-5 SXDRAM, 2-1 to 2-11 M calculating amount to reserve, 2-2 molecules, 4-5 configuring, 2-3 to 2-11 system memory controller, 1-1 o V OpenWindows on SPARCstation 10 SX, 3-1 video RAM, 1-1 video subsystem, 1-1 visuals, supported by OpenWindows, 3-2 VSIMM, 1-1

!K:OX-l

3 EP 0 752 685 A1

X XGL 3.0.2 accelerator guide for SX, 5-1 to 5-5 XGL, use of SXDRAM by, 2-2 to 2-3 XGL/SX driver, 5-2,5-4 XGL_3D_CTX_JITTER_0FFSET, 5-3 XGL_CTX_PICK_APERTURE, 5-3 XGL_DEV_COLOR_TYPE, 5-1 XGL_DEV_REAL_COLOR_TYPE, 5-4 XIL acceleration on SX, 4-1 to 4-5 XIL, use of SXDRAM by, 2-2 to 2-3 xil_color_convert variations, supported, 4-4 xil_convolve implementations, supported, 4-4,4-5 xil_lookup variations, supported, 4-4 xil_lookup(3), 4-1 xil_rotate(3), 4-1

z Z buffer, 5-2 Z-buffering, 2-2

Platform /Vc.Vi: SPARCstation 1GSX and SPARCstation 20— \'t}xrtr.b,ir

Claims

1. A method for rendering data, corresponding to a view of an object, for a display coupled to a computer system having a processor, a memory coupled to the processor, and a memory controller, the memory storing viewpoint data represented by at least one polygon and texture map data representing the object, the method including the steps of:

99 EP 0 752 685 A1

(1) generating, in the processor, span data for said polygon for a plurality of spans across said polygon; (2) transferring said span data relating to a current said span from the processor to the memory controller; (3) in the memory controller, mapping said current span data onto said texture map data; (4) rendering said current span data by the memory controller; and (5) repeating steps 2-4 until all of said plurality of spans have been rendered.

2. The method of claim 1 , wherein said viewpoint data is represented as model coordinates corresponding to a model coordinate space of the object, the method further including, before step 1 , the step of: (6) binding said model coordinates to said texture map data. 10 3. The method of claim 2, wherein step 6 is executed by said processor.

4. The method of Claim 2 or Claim 3, further including, after step 6 and before step 1 , the step of transforming said model coordinates into device coordinates corresponding to positions on said display. 15 5. The method of claim 4, wherein said transforming step is executed by said processor.

6. The method of any preceding claim, wherein at least one of said viewpoint data and said texture map data is stored in a dedicated portion of said memory accessed by said memory controller. 20 7. The method of claim 6, wherein said dedicated portion of memory is configured to be not directly accessible by said processor.

8. The method of any preceding claim, wherein step 4 includes the step of writing said span data to a display memory. 25 9. The method of any preceding claim including the step of transmitting said span data to said display.

1 0. A method for rendering data representing an object for a display coupled to a computer system having a processor, a memory coupled to the processor, and a memory controller, the memory storing viewpoint data representing a 30 view onto said object and texture map data representing the object itself, the method including the steps of:

(1) generating, in the processor, discrete data corresponding to the viewpoint data; (2) generating, in the memory controller, texture map data corresponding to the discrete data to produce a texture-mapped data set; and 35 (3) rendering, by the memory controller, the texture-mapped data set.

11. The method of claim 10, wherein at least one of said viewpoint data and said texture map data is stored in a dedicated portion of said memory under control of the memory controller.

40 1 2. An apparatus for rendering data, corresponding to a view of an object, for a display coupled to a computer system having a processor, a memory coupled to the processor, and a memory controller, the memory storing viewpoint data representing a view onto said object and texture map data representing the object itself, the memory further storing program modules, the apparatus including:

45 a generating module configured to generate in said processor discrete data corresponding to said viewpoint data; a mapping module configured to generate in said memory controller texture map data corresponding to said discrete data; and a rendering module configured to render by said memory controller the texture-mapped data set. 50 13. The apparatus of claim 12, wherein said memory includes a portion coupled to said memory controller.

14. The apparatus of claim 13, including a memory portion control module configured to provide control over said memory portion to said memory controller. 55 15. The apparatus of claim 14, wherein said control is exclusive control by said memory controller.

16. A computer system for rendering data corresponding to a view of an object, including:

100 EP 0 752 685 A1

a processor; a memory coupled to said processor, storing viewpoint data representing a view onto said object and texture map data representing the object itself, the memory further storing program modules configured to execute functions relating to said rendering; a generating module configured to generate in said processor discrete data corresponding to said viewpoint data; a mapping module configured to generate in said memory controller texture map data corresponding to said discrete data; and a rendering module configured to render by said memory controller the texture-mapped data set.

17. The system of claim 16, wherein:

said memory includes a portion storing at least one of said viewpoint data and said texture map data; the system further including a control module configured to provide control over said portion of said memory to said memory controller.

101 120 11U 130

rrocessor [nteliigcnt Memory Controller

MEMORY CONTROLLER

ccacne

14U MEMORY

150 CMEM SYM

J— / , 60-^ 170-^

L

VRAM

{

JO Devices Mass storage NcrworK Display jcomecry Data 90 !00^ texrure Map A

184 50

1 igure 1 EP 0 752 685 A1

Figure 2

103 EP 0 752 685 A1

Figure 3

104 EP 0 752 685 Al

,410 Processor requests IMC to load graphics data from disk into CMEM i 420 Processor binds model coordinate space data to texture map data 1 430 Processor transforms model coordinates into screen coordinates

440 Processor culls out invisible regions i 442 Select next polygon for rendering

444 Processor edge-walks polygon to determine span data

M6 . r— -— 1 beiect next span or current polygon L

Processor transfers current span information (starting addresses, 450 vector directions and lengths, 3D slope information, and texture map physical memory addresses) to IMC

460 uvit maps current span onto span of voxels in texture map space

470 IMC retrieves colors/textures for nternal points of current scan line

480 UML renders current span, with colors/textures, to display memory or regular memory 490 is mere anotner span J J " .I' 1 Y \^ to oe rendered in this polygon? No | 500 —f Is there ""N another polygon _ res\^ to pe rendered in this view? 510 No, L 520 s mere anotner \ geometry/view ►] >top I tobe rendered? J No • Yes

A igure 4

05 EP 0 752 685 A1

Figure 5

106 EP 0 752 685 A1

" European Patent EUROPEAN SEARCH REPORT office EP 96 30 4978

DOCUMENTS CONSIDERED TO BE RELEVANT Citation of document with indication, where appropriate, Relevant tXASMHt-AUUIN Of 1HI Category of relevant passages to claim APPLICATION (Int.CI.6) EP-A-0 144 924 (GENERAL ELECTRIC) L-17 G06T15/10 k page 14, line 19 - page 15, line 19; figure 2 *

EP-A-0 600 204 (IBM) L.10,12, L6 * the whole document *

ie.t-tUNK.iU. rwujsj SEARCHED ant.CI.6) G06T

The present search report has been drawn up for all claims Place of search Date of completmi of the search Lxamlner BERLIN 9 October 1996 Burgaud, C CATEGORY OF CITED DOCUMENTS T : theory or principle underlying the invention E : earlier patent document, but published on, or X : particularly relevant if taken alone after the filing date Y : particularly relevant if combined with another D : document cited in the application document of the same category L : document cited for other reasons A : technological background O : non-written disclosure & : member of the same patent family, corresponding P : intermediate document document

107