A General Multi-Microprocessor Interconnection Mechanism for Non

-^^^ Center for Information Systems Research Massachusetts Institute of Technology Sloan School of Management 77 Massachusetts Avenue Cambridge, Massachusetts, 02139 A GENERAL MULTI-MICROPROCESSOR INTERCONNECTION MECHANISM FOR NON-NUMERIC PROCESSING Hoo-min D. Toong Svein 0. Strommen* Earl R. Goodrich II February 1980 CISR No. 53 Sloan WP No. 111^80 Reprinted from Proceedings of Fifth Workshop on Computer Architecture for Non-Numeric Processing. *Work performed while on leave from THE CHRISTIM MICHELSEN INSTITUTE, Bergen, Norw£y. ABSTR.\CT modifications will be illustrated in this paper MMPS interconnection problems are discussed in terms of a singl time-shared bus. 'The performance draw bacl'.s usually the single bu s alternative SOFTWARE associated with SVSTEM -ST4RT BALR l3,0 are attributed to the high b us utilization USING ,i5 at the basic building bloc k level. The "1" LA I .BUFFER Pended Transaction Bus protocol is MVI BUFFER, X'OO' a general solu tion to such /7//j/;I".T'.i;ii presented as BUILDING BLOCK utilizations. Such a bus i s developed to rjmNTERCONNECTION LEVEL support more than 50 proce ssors without ^v7;TtE4 severe contention. The bas ic protocol of mil the MC68000 as a curren t generation microprocessor is investiga ted, and shown BASIC BUILDING BLOCK LEVEL ineffective for true multi microprocessor systems. I ^ J 1. INTRODUCTION INPUT* -^^' Gains in computer performance can be Figure 1 achieved through improvements at the The four basic levels of potential circuit level (eg. faster circuitry), the Computer System Performance basic building block level (eg. more Improvement; i.e. circuit level, powerful microprocessors), the building building block level, building block block interconnection level (eg. better interconnection level, and system computer system architecture) , and the software level. system software level (eg. more effective system software) . Many of these points are studied in [1]. However, there are strong relative dependencies between the levels The recent LSI technology evolution (see Figure 1), and a full system has created sig nificant improvements at the utilization of improvements at one level circuit level and the basic building block will usually require some associated level. However , equivalent improvements modifications at the other levels. The have not yet taken place either at the absence of both necessary system building block interconnection level or at interconnection signals and important the system so ftware level. Unfortunately, system software instructions in modern there is also still a lack of simple building blocks are typical examples of merhods by whi ch the necessary adjustments situations where it is appropriate to make across the leve 1 boundaries can be handled adjustments across level boundaries. Some in order to obtain optimum system of the dependencies and their associated utilization of the available technology; i.e., too much money is usually invested in solutions at t he different levels when the actual changes are proposed. n -^q^h^ The memory contention can be dealt with in two different ways. Danielsson and his colleagues (7) have suggested that the memory space should be divided into small modules. This will allow the memory requests to be spread out over a large number of independent modules, thus reducing the probability of getting ' CONVENTIONAL simultaneous requests for the same module. Rather than using small memory modules, the problem can be solved using faster memory circuitry in the highly demanded areas. This concept must be accompanied by a memory content migration schema which must be based on continuous memory traffic statistics. The idea of using memory modules with different speed characteristics is analagous to the well-known cache concept; however, the shared-bus architecture is more flexible than a standard cache. Toong [15] has studied tne memory speed part of this solution (assuming a stationary memory content) , using an analytic model, and his results show promising effects. '^ RESP0NSE_TU^^5 No practical implementation of the above suggested solutions to the memory contention will eliminate the entire problem. The processors in the system will 10 20 30 40 50 60 therefore continue to become unproductive # PROCESSORS when they must wait for crucial information from the memory. Note that only a portion Figure 4 of the delayed information will influence Average bus utilization and average bus the processor productivity. response time as a function of rhe number of processors for a pended During the unproductive periods, the transaction based MMPS. The graph also actual processors may waste bus bandwidth includes the average bus utilization through repeated requests for the needed for a conventional single bus based information. The waste of bus bandwidth can MMPS. [15] be reduced significantly by using input and output queues on all of the devices that are connected to the bus. This will allow 2.4 Memory and Processor Contentions all of the devices to transmit information on the bus even though the receiver should In a single bus based MMPS, any number be busy; i.e., the information will be of processors may simul caneously need stored in the input queue until it can be information from the same memory module. In processed. It will also permit the devices addition, there is likely to be an to keep on working even if they cannot get exponential-type distribution of the memory immediate access to the bus, i.e., the load [15]; i.e., certain areas of r.he output information will be deposited in the memory will be in greater demand than other output queue until it can be transmitted. areas (see Figure 2) . Since a memory module In normal operation, the actual size of the can only service one request at a time, queues, which is a system design parameter, this situation may result in a severe is not likely to go beyond practical contention among the processors which are limits. According to Toong [13], who has requesting the use of rhe highly demanded studied both the memory speed and the queue memory areas. Similarly, a sec of memory solutions to the device busy problem, a modules may like to respond simultaneously 64-level queue will result in a queue to one particular processor. This creates a overflow probability on the order of processor contention. The memory and 10**-i2, which is nearly zero for all processor contentions, which are often practical purposes. referred to as the "device busy" problem, may degrade the MMPS performance. This Design of a reliable single-bus-based performance degradation will arise from MMPS, which would utilize queues of processors awaiting crucial information insufficient length to reduce the devJ.ce from the highly demanded memory modules, busy problem, must incorporate mechanisms and from tne extra bus load which will be that will prevent queue overflow. A imposed by multiple requests for "queue-full" signal can be used to solve transmission to busy devices. Thus, the this problem. To avoid wasting bus device busy problem must bo dealt with in a bandwidth on queue-full conditions, it single bus-based MMPS design. would be necessary to incorporate all the The physical characteristics of the likely to involve performance trade-offs. bus lines (i.e., speed, length, etc.) will However, the differences between the definitely influence the bus performance. various implementations has no major impact However, the effect of this matter is on performance, and a further discussion of signiticanc only for long busses (length > these topics is not pertinent here. 1 ft.) and/or for critical speed requirements. For the purposes of this paper, the bus lines will be considered to be short enough and well conditioned so that they do not impose any significant data transfer constraints. The only remaining physical speed constraint, then, is the speed of the interface circuitry. This can be solved, for all practical purposes, by using high-speed, uniform logic at the bus interface (See Figure 3 for illustration) . Pn The only remaining factor to consider, then, is the protocol used on the bus as a limit to data transfer rates. The high bus utilizations of the above mentioned processors is primarily a function of the master-slave based bus protocols tnat are used. Normally, the protocols are fixed at the basic building block design stage, and cannot be changed after the design is completed. Recent micro-coded processors, though, present the potential of being modified to optimize bus protocol without changing the basic processor. In general, the basic building block designers may use any bus protocol. The actual choice is always the result of a trade-off process, which, until recently, due to low-density devices, could not favor sophisticated bus protocols. Today, however, with VLSI technology at hand, it should be possible to implement low bus utilization protocols without any major penalties on the other system parameters. Different versions of a special "split transaction" bus protocol, 'have been proposed by various authors as a general solution to the high bus utilization problem [3,6,7]. This type of protocol, which is illustrated in figure 3, splits the regular master-slave based transaction (Tstd) into' two subtransactions (Tl and T2) . These can take place disconnected in time as a transaction initiation part and a transaction completion part. Consequently, the bus will be free for other usages during the asynchronous wait interval (Mt). Obviously, this implies that a read transaction will utilize both cf the subtransactions, whereas a write operation will utilize only the first subtr ansae tion (Tl). The actual bus protocol implementation varies in two major areas. These are: -centralized versus decentralized control , and -synchronous versus asynchronous logic. The trade-off process between centralized and decentralized control involves topics of reliability (fail-soft), modularity, and cost. The choice between synchronous and asynchronous logic is more : . Performance has always been considered as fast as the minimum bus information the only major drawback with the ningle transfer process.

A General Multi-Microprocessor Interconnection Mechanism for Non

The Birth, Evolution and Future of Microprocessor

Programmable Digital Microcircuits - a Survey with Examples of Use

Introduction to Cpu

How to Make Ez80 Code Execute Faster with Copy To

Computer Architectures an Overview

The Z80 Microprocessor Architecture

Time: Office Hour: TR 03:20Am ~ 05:20Pm Website

Development of Microprocessor Learning Media Using Zilog Z-80 For

TTL and CMOS Logic 74 Series

Microprocessor Wikipedia,TheFreeEncyclopedia Page 1Of 16

ZILOG Z80 PIO USER=S MANUAL Page 1 of 22 TABLE of CONTENTS

Gnu Assembler