Transactions on Information and Communications Technologies vol 3, © 1993 WIT Press, www.witpress.com, ISSN 1743-3517

A low cost, transputer based visual display processor

G.J. Porter, B. Singh, S.K. Barton
Department of Electronic and Electrical Engineering, University of Bradford, Richmond Road, Bradford, West Yorkshire, UK

ABSTRACT

As major computer systems, and parallel computing resources in particular, become more accessible to the engineer, it is becoming necessary to increase the performance of the attached video display devices at least pro rata with that of the computational elements. To this end a number of device manufacturers have developed, or are developing, new display controllers dedicated to improving the display environments of the supercomputers used by engineers today. There is, however, a price to be paid for this development, both in monetary cost and in the design effort needed to integrate the new technology into existing systems. This paper presents a solution to these problems. Firstly, it shows how available transputer technology can be used to upgrade the display capability of existing systems by transferring some of the available processing power from the computational elements of the system to the display controller. Secondly, it does so using a low cost, off-the-shelf Transputer Module based video display unit. The new video display processor uses a three transputer pipeline formed from a Scan Converter Unit, a Span Encoder Unit and a Span Filler Unit, and can be expanded further, if required, by increasing the functionality and/or parallelism of each stage.

1. INTRODUCTION

The continued expansion in the application of computer graphics to all aspects of the working life of engineers has placed ever increasing demands on display technology.
Higher resolution and faster refresh rates are constantly required by new application programs, by graphical environments such as X Windows and MS Windows, and by user-written programs. Because the graphics cards and display devices used in computer systems cannot be upgraded continually, the display system ends up less capable than engineers require. As part of the BRAD3D real-time image generation program, a system was developed that allows existing low cost, relatively slow display cards to be used in a high speed display unit called a Visual Display Processor (VDP). This unit uses the existing display device as a simple back-end, with all the necessary pre-processing performed in a two stage intelligent front-end. The slower display card is thus left with the simple task of drawing the actual horizontal spans into display memory.

1.1 The Brad3D Project

The BRAD3D project began at the University of Bradford in 1983 with the development of the first real-time image generation system[1]. This was constructed from two interconnected Motorola MC68000 (later MC68020) processors and a custom VDP system. The unit had a resolution of 256 by 256 pixels and could, in 1987, manipulate a 3-D image constructed from 400 vertices and 160 polygons in realistic real time (a 40 ms frame interval). This was later improved by the addition of the XTAR[2] Graphics Co-processor and a four processor network, which increased the available resolution to 512 by 512 pixels and the processing rate to over 200 polygons in a 20 ms frame interval, an eight fold increase over the previous system. Further improvements were made to the structure of BRAD3D, and in 1991 a bit-slice processor was added to remove some of the computation from the Motorola MC68020 devices.
This improved the throughput of the system but also highlighted its problems of extensibility and flexibility. The system was seen as inflexible and difficult to expand beyond its current size because of the fixed architecture of the dual VMEbus system employed. This prompted work on the current transputer based system, described in greater detail in a companion paper at this conference, of which the VDP described in this paper is the last stage in the processing pipeline.

1.2 The Inmos Transputer

The Inmos Transputer[3] is really a family of microprocessors, each sharing three basic facilities: a fast processor unit, on-chip memory and the ability to communicate bi-directionally with four other transputers. The transputer variant used in this project is the T800, which has both an integer (32 bit) and a floating point (64 bit) processor unit, giving it a sustained performance in excess of 1 Mflop at 20 MHz. To support this high computation rate, the serial links[4] can each support a data rate of up to 2.3 Mbytes/s. This, together with the 4 Kbytes of on-chip memory and a hardware scheduler with two priority levels, makes the T800 a useful processor on which to develop parallel programs. The structure of the T800 is illustrated in figure 1.1.

[Figure 1.1: The Inmos T800 Transputer. Floating point processor, integer processor, system services, 4 Kbyte SRAM, timer, links 0 to 3, external memory interface and event, connected by a 32 bit internal highway.]

1.3 The NT1000 Transputer System

The Niche NT1000[5] is a multi-user transputer facility hosted by a SUN workstation. It is divided into four sites, each site being further divided into eight slots, and each slot can accommodate one size-one Transputer Module (TRAM).
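The site and slot organisation of the NT1000 can be captured in a small data model. The Python sketch below is hypothetical (the names `site_chain` and `switchable_links` are ours, not Niche software). For one site, it lists the fixed chain in which link 2 of each slot feeds link 1 of the next, with link 1 of the first slot facing the host and link 2 of the last slot facing the off-board interface. It also lists the links 0 and 3 that remain free for the C004 switch. It assumes that "next slot" means numeric slot order, which the text does not state explicitly.

```python
# Hypothetical model of NT1000 site wiring; not Niche software.
# Assumptions: "next slot" means slot n+1 in numeric order, slot 0 is the
# slot attached to the host and slot 7 the one attached to the off-board
# interface.
SITES = 4            # sites per NT1000
SLOTS_PER_SITE = 8   # slots per site, one size-one TRAM per slot


def site_chain(site):
    """Fixed link wiring of one site as (from, to) pairs.

    Transputer ends are (site, slot, link) tuples; "host" and "offboard"
    stand for the non-transputer ends of the chain.
    """
    wires = [("host", (site, 0, 1))]
    for slot in range(SLOTS_PER_SITE - 1):
        # link 2 of each slot feeds link 1 of the next slot
        wires.append(((site, slot, 2), (site, slot + 1, 1)))
    wires.append(((site, SLOTS_PER_SITE - 1, 2), "offboard"))
    return wires


def switchable_links(site):
    """Links 0 and 3 of every slot, left free for the C004 link switch."""
    return [(site, slot, link)
            for slot in range(SLOTS_PER_SITE) for link in (0, 3)]
```

For example, `site_chain(0)` yields nine wires (host to slot 0, seven slot-to-slot links, slot 7 to the off-board interface), and each site leaves sixteen links for the switch.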
An individual user has direct access to a single site, with up to eight processors if size-one TRAMs are used. Alternatively, a user may use two or more sites, giving access to a maximum of thirty two processors. Sites may not be split between users. Communication between processors within the same site is handled by a single IMS C004 Link Switch, as shown in figure 1.2. Each slot has two links, numbers 0 and 3, which can be configured to communicate with other slots within the same site. Alternatively, by using the Central Link Switch, a limited number of connections can be established between slots on different sites. Thus, most interprocessor communication paths can be established in this system.

[Figure 1.2: Niche NT1000 Transputer System. Four sites, each containing slots 0 to 7, with a VMEbus interface.]

A single fixed chain of links exists between the processors in each site. It connects link 2 of each slot to link 1 of the next slot; the slot connected to the host machine uses its link 1 for this connection, and the slot connected to the off-board interface uses its link 2.

1.4 Inmos B419 Graphics TRAM

The IMS B419[6] is a graphics TRAM based on a single T800 transputer and a G300 Graphics Controller[7]. The structure of the unit is shown in figure 1.3. The transputer has access to two areas of memory, one to hold program and data and the second to act as a Frame Buffer for the image to be displayed. Both memory arrays are 2 Mbytes in size and are accessed on word boundaries in the transputer's memory map.
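Because the memory is accessed on word boundaries, a span filler can cover several pixels with each memory write, which matters on a bandwidth-limited frame buffer. The Python sketch below illustrates the idea under assumptions that this section does not state: 8-bit pixels packed four to a 32-bit word in little-endian byte order, and a frame buffer modelled as a list of word values with a fixed row stride. `fill_span`, `get_pixel` and `replicate` are hypothetical helpers, not part of the Inmos software.

```python
# Hypothetical sketch: assumes 8-bit pixels packed four to a 32-bit word,
# little-endian (pixel x occupies bits 8*(x % 4) .. 8*(x % 4) + 7).
PIXELS_PER_WORD = 4
WORD_MASK = 0xFFFFFFFF


def replicate(colour):
    """Repeat an 8-bit colour into all four byte lanes of a word."""
    return (colour & 0xFF) * 0x01010101


def fill_span(fb, stride_words, y, x0, x1, colour):
    """Fill pixels x0..x1 (inclusive) of row y using word-sized writes.

    Whole words inside the span are written directly; partial words at
    either end are read, masked and written back.
    """
    if x1 < x0:
        return
    row = y * stride_words
    pattern = replicate(colour)
    first, last = x0 // PIXELS_PER_WORD, x1 // PIXELS_PER_WORD
    for w in range(first, last + 1):
        lo = x0 - w * PIXELS_PER_WORD if w == first else 0
        hi = x1 - w * PIXELS_PER_WORD if w == last else PIXELS_PER_WORD - 1
        if lo == 0 and hi == PIXELS_PER_WORD - 1:
            fb[row + w] = pattern                    # one write, four pixels
        else:
            mask = 0
            for lane in range(lo, hi + 1):           # byte lanes to replace
                mask |= 0xFF << (8 * lane)
            fb[row + w] = (fb[row + w] & ~mask & WORD_MASK) | (pattern & mask)


def get_pixel(fb, stride_words, x, y):
    """Read back one 8-bit pixel from the packed frame buffer."""
    word = fb[y * stride_words + x // PIXELS_PER_WORD]
    return (word >> (8 * (x % PIXELS_PER_WORD))) & 0xFF
```

For example, with a 16-pixel row (`stride_words=4`), `fill_span(fb, 4, 1, 2, 9, 0xAB)` writes pixels 4 to 7 with a single store and uses a masked read-modify-write only for pixels 2 to 3 and 8 to 9.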
The G300 has access to the Frame Buffer memory via its Pixel Port. The frame buffer is constructed from Video RAMs, whose high speed serial port allows the memory to be read at rates sufficient to drive a monitor at a horizontal resolution of 1024 pixels. Initial investigations of the B419 showed that clearing a 640 by 480 pixel screen took nearly 37 ms using the supplied CGI graphics libraries[8], and approximately 20 ms when coded directly in assembly language using the fast block MOVE[9] instructions. Examination of the timings of the dynamic RAMs used for the video memory showed that the 20 ms figure was extremely close to the maximum obtainable, owing to the restricted memory bandwidth.

[Figure 1.3: Inmos B419 Graphics TRAM]

The results of the initial tests showed clearly that it was not possible to use the standard draw-to-Frame-Buffer approach of clearing the buffer to a background attribute and then writing each polygon to memory in the order in which it was scan converted. Normally, when this type of Painter's algorithm is employed, the time taken to clear the video memory to a background attribute is a small percentage of the available frame time, nominally 10-20%, whereas in this case it was almost 60% of the frame time. Consequently, the three processor system described in this paper was developed in an attempt to overcome the shortcomings in the design of the B419 graphics TRAM.

1.5 Overview

The VDP developed to overcome the problems defined in section 1.4 is divided into three functional processes, each mapped onto a single transputer.
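The internals of the three processes are not described at this point, so the following Python sketch is only a hedged illustration of how such a pipeline can avoid the clear pass identified in section 1.4 as the bottleneck; it is not the authors' code. Threads connected by bounded queues stand in for transputers connected by links. The scan converter turns convex polygons into horizontal spans. The span encoder's behaviour is our assumption: it resolves overlaps per scanline in painter's order (later polygons in front) and fills the gaps with background, so that the span filler writes every pixel exactly once per frame. All names, the frame size and the colour values are hypothetical.

```python
import queue
import threading
from math import ceil, floor

# Hypothetical frame geometry, kept tiny so the result is easy to inspect.
WIDTH, HEIGHT = 16, 8
BACKGROUND = 0
STALE = 0xEE   # contents left over from a previous frame
END = None     # end-of-frame marker passed down each channel


def scan_convert(polygons, out):
    """Stage 1: convert convex polygons (in painter's order) into spans."""
    for vertices, colour in polygons:
        ys = [y for _, y in vertices]
        for y in range(max(0, ceil(min(ys))), min(HEIGHT - 1, floor(max(ys))) + 1):
            xs = []
            for (xa, ya), (xb, yb) in zip(vertices, vertices[1:] + vertices[:1]):
                if ya != yb and min(ya, yb) <= y <= max(ya, yb):
                    xs.append(xa + (y - ya) * (xb - xa) / (yb - ya))
            if xs:
                x0, x1 = max(0, ceil(min(xs))), min(WIDTH - 1, floor(max(xs)))
                if x0 <= x1:
                    out.put((y, x0, x1, colour))
    out.put(END)


def overlay(segments, x0, x1, colour):
    """Paint [x0, x1] over a sorted, gap-free list of (start, end, colour)."""
    result = []
    for s0, s1, c in segments:
        if s1 < x0 or s0 > x1:
            result.append((s0, s1, c))          # not covered by the new span
        else:
            if s0 < x0:
                result.append((s0, x0 - 1, c))  # visible remnant on the left
            if s1 > x1:
                result.append((x1 + 1, s1, c))  # visible remnant on the right
    result.append((x0, x1, colour))
    return sorted(result)


def span_encode(inp, out):
    """Stage 2 (assumed behaviour): resolve overlaps so that each scanline
    becomes a gap-free run list, background included."""
    rows = [[(0, WIDTH - 1, BACKGROUND)] for _ in range(HEIGHT)]
    while (span := inp.get()) is not END:
        y, x0, x1, colour = span
        rows[y] = overlay(rows[y], x0, x1, colour)
    for y, segments in enumerate(rows):
        for s0, s1, c in segments:
            out.put((y, s0, s1, c))
    out.put(END)


def span_fill(inp, frame, stats):
    """Stage 3: write each run into the frame buffer; no clear pass."""
    while (run := inp.get()) is not END:
        y, x0, x1, colour = run
        frame[y][x0:x1 + 1] = bytes([colour]) * (x1 - x0 + 1)
        stats["pixels"] += x1 - x0 + 1


def render(polygons):
    """Run the three stages concurrently over a frame full of stale data."""
    frame = [bytearray([STALE]) * WIDTH for _ in range(HEIGHT)]
    stats = {"pixels": 0}
    # Bounded queues stand in for the transputer links between stages.
    a, b = queue.Queue(maxsize=64), queue.Queue(maxsize=64)
    stages = [threading.Thread(target=scan_convert, args=(polygons, a)),
              threading.Thread(target=span_encode, args=(a, b)),
              threading.Thread(target=span_fill, args=(b, frame, stats))]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return frame, stats
```

For example, `render([([(0, 0), (7, 0), (7, 3), (0, 3)], 5), ([(4, 2), (11, 2), (11, 5), (4, 5)], 9)])` leaves no stale pixels and reports 128 pixel writes (16 by 8), one per pixel. In this sketch the cost of hidden-surface resolution moves from the frame buffer into the encoder, whose per-scanline run list grows with the number of overlapping polygons.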