FEATURE Hardware Considerations in the WE32100 Chip Set

Michael L. Fuccio AT&T Bell Laboratories

Benjamin Ng AT&T Information

The interchip communication protocols for the WE32100 microprocessor chip set had to be designed to support several ambitious architectural goals.

T he WE32100 series of VLSI chips is an upgrade * support the architecture of the entire chip set, and addition to the WE32000 (formerly, Bellmac- * retain some compatibility with predecessor chips so 32A) 32-bit microprocessor family. The WE32100 that Bellmac customers desiring higher performance via the family is a CMOS chip set that differs from its predecessor WE32100 chip set would not need to perform wholesale ar- in operating speed as well as in . chitectural changes when upgrading, and Some modifications of the upward- * allow for easy interfacing to commercial memories and compatible type were also implemented. Here, we discuss peripherals. the WE32100 chip set's interchip protocols as they relate to hardware . These protocols were de- fined at the same time the WE32100 family was designed. We also examine architectural trade-offs as they pertain to Alternatives investigated the WE32100 chip family. We investigated over a dozen alternatives for an inter- chip communications protocol. They fell into three Goals categories: * asynchronous protocols compatible with the WE32000 A number of goals guided the definition of the hard- (Bellmac-32A) microprocessor module, ware architecture. We wanted to * asynchronous protocols with pipelined memory trans- * provide an interchip protocol that would support com- actions, and petitive performance in a 32-bit microprocessor operating at * fully synchronous protocols with "split-read" memory a prescribed frequency, transactions. * provide a true all-VLSI chip set requiring no SSI The first category included about eight variations on the ''glue,'' theme. The WE32000 module has a four-cycle (zero-wait- * allow for multiple chip set configurations, to permit a state) memory transaction. We felt that a cycle could be system designer to customize the chip set to his needs, eliminated from this access and, further, that excess over-

April 1986 0272-1732/86/0400-0029$01.00 © 1986 IEEE 29 WE32100 SERIES

head on the WE32000 module could be eliminated. This showed that the occurrence of memory accesses that could alternative and its variations, then, provided an asynchro- be overlapped is relatively small for the WE32100 CPU. nous, three-cycle memory transaction for the CPU and any Furthermore, a write transaction cannot be overlapped. Yet other master. another problem, an internal CPU bus conflict, all but We evaluated the possibility of a memory transaction of eliminated "overlappable" transactions. Given these facts, two (or less) clock cycles, and the preliminary timing infor- we abandoned the second class of alternatives. mation we obtained indicated that, for a two-cycle memory At the time of the WE32100 development, our investiga- access, very little memory access time would be available tions of the third class of alternatives were still very ex- for common memory parts. This did not meet our first ploratory. Furthermore, the alternatives in this class did not goal, that is, that the speed be competitive. It was our judg- meet the goal of compatibility with Bellmac chips. Since ment that a two-cycle memory transaction would not be compatibility was very important and since the concepts of usable with common memory well into the foreseeable this class were not well proven, we abandoned it. future. Furthermore, certain functionality would have been Table 1 summarizes how the three classes of protocol lost if the transaction had been made two cycles long. alternatives met our goals. The details of the chip set pro- In sampling asynchronous signals, there is a high cost tocols are described in the WE32100) information manual.l associated with preventing input signal metastability. In the remainder of this article, we will describe some of Avoiding metastability requires a signal input to be doubly the more important issues relating to the realization of a latched. This requires additional latching times (clock hardware architecture for a 32-bit VLSI chip set. edges), which a two-cycle transaction could not have provided. As a result of all these difficulties, we abandoned the Chip set protocol issues two-cycle approach in favor of the three-cycle one. The second class of alternatives also addressed our desire Here, we examine the issues that arose when we sought to reduce the length of a memory transaction to two cycles. to choose a chip set protocol that would meet our goals. In these alternatives, a three-cycle access would overlap, by These issues involved one cycle, the subsequent access. This memory transaction overlap would yield an effective two-cycle transaction. We * choosing asynchronous or synchronous operation, felt that this overlap could occur often, given the instruc- * choosing multiplexed or nonmultiplexed addresses and tion prefetch policy of the CPU and the early availability of data, the next instruction. More careful analysis, however, * implementation of transaction types, including reads, writes, interlocked reads, block fetches, and interrupt- vector number fetches, * exception handling, * bus arbitration, * the implementation of the interface to the memory alternativs me hW3 0 iciett0gas management unit, or MMU, * the handling of support processors, ;Goalf Aleraiv IAterm0itive 2 leraiv * providing for special needs of the chip set architecture, Competitive:l0 :0 * choosing the operating frequency, and * electrical design. Yes YeWs :0Yes0 i't;t- ; perormne?;000 $A;StleL;: :; ,3t these instructions later. The WE32100 CPU also supports a a,,l! 
?2E,f(;;;; :T' t:U: <, rich interrupt structure, including Yes tiEE4-:S: ,40 t;;^ l0040 li4S'40.E '1x;z^ an "autovector" feature. :iS:: ': ;"; -St .;,;; ;.; f f"t1; rNl The WE32100 CPU has an on-board instruction cache t^:?::.iS E: :;;;gS;-: t;. ,;:X;E XK :t^f^--;; that fetches a two-word block on a cache miss. This two- ffiWE32;000-. ^ Y;: 'ties; :1|..S:||^|- (M, ^:^; ^t :;ts :$ '^.!@' word fetch has a hardware protocol support mechanism, 4: ;gl; j? :,; :tiNg:^^:i,,,;- 11 fW:cpantibe? ; ;,,,$;?S b ll;A. ",'i:^ :?;FfE;'..r"'', which we will discuss later. ;S.., :d t:-: ;SA J; iSk; ;;? S4t'S;04^^-' :|.; ;;i'.^:^SS The WE32101 MMU provides typical virtual memory Ea,yS commercial:t ,.;U.. 51, SS*3:_w; atl^0n'S^^= functions-that is, address translation and protection-for the WE32100 chip set. The MMU supports segmentation 30 E'SN''xS;AtS;, IEIEEE MICROIR and paging and, in the paged translation mode, a special entities can only react to bus events, since they cannot be I/O protocol feature called "early RAS." made aware of the activity of other bus entities in a global The WE32106 math accelerator unit, or MAU, complies sense. with Draft 10.0 of the proposed IEEE floating-point Entities on an asynchronous bus typically do not share a arithmetic standard.6 It can be configured as a support clock signal governing bus transactions. Typical VLSI processor (i.e., make use of the WE32100 support pro- devices advance their internal state via a clock signal. This cessor interface) or as a memory-mapped peripheral. causes some difficulty in connecting the asynchronous bus The WE32103 dynamic RAM controller supports up to to the synchronous internals of such devices. Metastability four banks of dynamic memory. The DRAM controller is a problem here. 6-9 To avoid having a sampled asynchro- also supports dual-ported memory and has an interface for nous signal become metastabie, one should be sure the an error detection and correction chip. It is programmable signal is doubly latched. That is, the signal, once sampled, for various speeds of dynamic RAM. should have its sampled value latched some time later to The WE32104 direct memory access controller features a resolve a potential metastability condition. The need to system bus and a peripheral bus. The system bus is a 32-bit- doubly latch an asynchronous signal means that more time wide bus designed for memory-to-memory DMA. The pe- must be dedicated to guaranteeing the state of that signal. ripheral bus is an eight-bit-wide bus optimized for byte- This can mean an overall loss of bandwidth for an asyn- wide peripherals such as UARTs and LAN interfaces. The chronous bus compared to a synchronous bus. DMA controller can perform system-bus-to-peripheral-bus transactions and also allows the CPU to access peripherals, Pseudosynchronous buses. A pseudosynchronous bus has on the peripheral bus, from the system bus. The DMA con- the attributes of both a synchronous and an asynchronous troller is programmable for burst-mode or cycle-steal trans- bus. The WE32100 chip set's buses can be characterized as actions and can do double and quad-word fetches. pseudosynchronous. Its I/O protocol, for example, is syn- chronous (timing is specified in relation to the chips' input Asynchronous vs. synchronous protocols. Our intent clock), whereas its method of indicating bus events is asyn- here is not to pontificate on all that one should know about chronous (strobe signals are provided to indicate such synchronous and asynchronous buses but rather to clarify events). 
Furthermore, members of the WE32100 family some frequently misused terms and to discuss some of the sample and doubly latch many of the signals from the bus issues relevant to the architecture of the WE32100 chip set. receiver; this implies that WE32100 chips are allowed asyn- chronous behavior. The WE32100 support processor inter- Synchronous buses. In a synchronous bus, all events face, however, relies on synchronous behavior-behavior (state changes or signal assertions) occur at specific time in- that in the WE32100 chip set can be guaranteed because of stants that are invariant with respect to a clock signal. Syn- the way that I/O timing is specified in relation to the input chronous buses are typically specified in terms of clock clock. edges-that is, a bus event occurs relative to a change in In an optimum configuration, the members of the the state of a clock signal. Another attribute of synchro- WE32100 chip set operate synchronously among themselves nous buses is that both the transmitter and the receiver on but present an asynchronous interface to the external sys- the bus share a common clock or derivatives of a common tem. Indeed, the CPU, MMU, and MAU always share the clock. The shared clock signal provides the means for each same clock. In a less than optimum configuration, the bus transmitter or receiver to have a global knowledge of DMA controller and the DRAM chips need not run on the bus activity. If the architecture is sophisticated enough, same clock as the other chips in the set. each entity on a synchronous bus can be aware of the other entities' activity solely because of the shared clock signal. Multiplexed vs. nonmultiplexed addresses and data. In We will pursue this concept further when we discuss the designing a 32-bit VLSI chip set, one must decide whether WE32100 support processor interface. address and data transmission should be shared or kept separate. A 32-bit system with a full 32-bit address space as Asynchronous buses. In asynchronous buses, bus events well as a 32-bit data transmission capability needs 64 wires (state changes or signal assertions) do not occur at well- (I/Os) to transmit both address and data. For VLSI chips, defined times-they can occur at any time. Asynchronous the number of chip I/Os that one can have is limited; bus transactions require a two-way negotiation between moreover, chip I/Os can be a major contributor to overall transmitter and receiver; that is, the transmitter must in- chip cost. Thus, in the VLSI domain, it is desirable to keep form the receiver that a bus event has occurred. (It usually chip I/Os to a minimum. does this by means of a strobe signal.) The bus receiver To take advantage of the full potential of a 32-bit ar- must then acknowledge that it has recognized the event. chitecture, one wants to transfer data in chunks of at least This recognition signal is also asynchronous. Thus, bus 32 bits. A 32-bit address space and a 32-bit data path sug- events on an asynchronous bus are typically governed by gests that 32 chip I/Os can be shared by addresses and data the assertion and negation of strobe signals. Because of the (i.e., can be multiplexed). Such multiplexing has been used asynchronous nature of bus events, there is no shared in- to good effect in a few 32-bit VLSI microprocessors. The formation among entities on the asynchronous bus. Such drawback to multiplexing is that most common static April 1986 331 WE32100 SERIES

RAMs and peripheral chips require an address to be avail- necessarily mean that the memory system is being used to able when they present data to the microprocessor. This re- maximal efficiency. If the memory system access time, quirement means that the address and data must be Tmp, is much less than Tm (Tmp « Tm), clearly the mi- nonmultiplexed, and it dictated that the WE32100 chip set croprocessor is slower than the memory system and the have separate 32-bit address and data buses. speed of the memnory system cannot be exploited by the mi- Though this approach was more expensive in terms of croprocessor. This presents an interesting problem to the the number of chip I/Os, it did avoid the need for SSI . He must ask if the I/O protocol should glue. A microprocessor with multiplexed address and data anticipate the evolution of memory technology and try to lines handles the need of common static RAMs and periph- achieve maximum memory efficiency for fast memories. eral chips for nonmultiplexed addresses and data by latch- The naive answer to this question is yes. This means that if ing the address when it is present and presenting the latched Tmp > > Tm, the only-way to make the Tm/Ttot ratio ap- address to the memory while accepting or transmitting data proach 1 (and hence produce an efficient transfer rate) is to on the bus. This process typically reduces the amount of reduce Ttt This may be easier said than done. Reducing time available for memory access; hence, the WE32100 has Tt0t may affect the hardware architecture of the micropro- nonmultiplexed address and data buses. cessor chip. Typically T10t = n * Tc, where Tc is the period of the bus clock and n is a positive integer. One can Read transactions. The goal of a read transaction is to see that making n smaller is one approach to reducing Ttt move data from memory (or a peripheral) to the CPU to However, reducing n may fundamentally change the inter- undergo interpretation or modification. Optimum band- nal chip architecture, design, and clocking strategy. Design- width is achieved when a full 32-bit data chunk is trans- ing VLSI chips to reduce Tt0t is a laudable goal, but prac- ferred. Common microprocessor read transactions consist tical design considerations may not allow n to be as small of two parts. The first is the issuance of the address to be as 1 or 2, especially with an asynchronous or pseudosyn- read from. The second is the acknowledgment of the read chronous I/O protocol. Sampling and doubly latching request by the memory along with the putting of data on asynchronous signals using a clock to generate sampling the data bus so it can be sampled by the CPU. Let us call edges and then making decisions based on the sampled the first part the address portion and the second part the values can easily require n to be greater than 1. data portion. A typical microprocessor (which has asyn- For the address portion of the access, a virtual memory chronous protocols) waits, after the issuance of the address, environment requires that the virtual address be translated for the data acknowledgment to occur. Such a micropro- into a physical address. It is the physical address that is cessor occupies the bus for the duration of the wait. Such a ultimately decoded by the memory system. Ideally the ad- wait is commonly called a wait state, and a wait state is the dress translation should not lengthen the memory trans- duration of some clock signal (usually the CPU clock). actions. 
Practically, however, a clock cycle is typically Ideally the address portion and the data portion of a trans- added to the duration of the address portion of a trans- action are separated by zero wait states. This means that action. We will discuss methods for eliminating this address the CPU does not need to wait for data. Therefore the translation overhead later. In an asynchronous or pseudo- CPU is busier and the performance of the CPU is synchronous protocol it is necessary to indicate the presence improved. (validity) of the physical address so that it can be decoded. A very important parameter, zero-wait-state memory ac- This is easily done through a physical address strobe signal cess time, is the critical time between the address portion that is separate from the virtual address strobe. and data portion allowed for memory to present data to the CPU. That is, zero-wait-state memory access time is the Write transaction. Implementing the write transaction in- time allotted for the primary memory system to complete a volves many of the same concerns as implementing the read read or write operation without incurring wait states. A transaction. The goal of optimizing memory access time for good read transaction protocol should maximize the zero- a zero-wait-state transaction still holds. Further, the need wait-state memory access time to ease the de- for a data size indication is critical because one cannot signer's timing budget requirements as well as to accom- allow all 32 bits on the data bus to be written to memory modate a broad spectrum of memory types, costs, and when the data size to be transacted is less than 32 bits. That technologies. Therefore, to maximize zero-wait-state mem- would corrupt memory that should not be written. ory access time (and hence performance), the address por- tion and data portion of the transaction should be mini- Interlocked transactions. intended to support mized. To be sure, the ratio of memory access time to total operating systems and higher-level languages should pro- transaction time (at zero wait states) ideally should ap- vide for the implementation of semaphore operations. Most proach 1. Hence it is desirable that Tm!/Ttot-1, where modern microprocessors provide this support via a test and Tm is memory access time (zero wait states) and Tt0t is set instruction in which an operand is read from memory, total transaction time (zero wait states). tested, modified, and returned to memory. Note that there It is clear that if the Tm/Ttot ratio is near 1 and the data is a read and a write transaction for this operand. To en- being transferred is a 32-bit word, the information transfer force mutual exclusion, one must make this test and set op- is maximally efficient for the microprocessor. This does not eration atomic-that is, indivisible. 32 32 ~~~~~~~~~~~~~~~~~~~~IEEEMICRO When a potential bus master is attempting to gain owner- the second word. Since the cache block is aligned, the ac- ship of the bus, other processors such as the DMA con- cess cannot cross a segment or page boundary. troller or another CPU are excluded until both the read The WE32100 CPU begins a block fetch transaction only transaction and the write transaction of the test and set on instruction fetches. operation have been completed. Bus ownership should never be given to another bus master between the read and Interrupt-vector number fetch. An interrupt is a mecha- write transactions. nism whereby a device can obtain the attention and the ser- vice of a microprocessor. 
A device causes an interrupt by Block fetch. As previously described, address translation asserting an interrupt signal or multiple, priority-encoded can increase the duration of memory transactions by one signals. Some microprocessors (including the WE32100), cycle. This reduces the efficiency of information transfer to upon receiving an interrupt, respond with an interrupt- and from the CPU and increases the CPU's bus occupancy. vector number fetch. This transaction is used to fetch a This increased bus occupancy in turn increases the conten- vector number that is used as a pointer into a jump table. tion for a shared bus. One method of reducing the address This vector number determines which interrupt service code translation overhead is to fetch more than one datum while is to be executed. The interrupt-vector number fetch is ex- only issuing one address. That is, if more than one datum actly like a read transaction-since the respondent (i.e., the is fetched-say four-only one address translation, not interrupt controller) is not necessarily memory-mapped, a four, needs to be performed when these data are fetched. new status must be introduced to indicate a special trans- The key to this reduction in translation overhead lies in the action. Interrupts typically are not acknowledged until the memory system being able to respond with the data even end of the currently executing instruction. This makes sav- though it is given only one address. The scheme for the ing the internal state of the microprocessor simpler so that return of data can be to return data from consecutive ad- the context of the machine can be restored easily when the dresses beginning with the address issued, or it can be to interrupt service is completed. In the WE32100 chip set, the return an aligned block of data. The fetched data must not address accompanying the interrupt-vector number fetch is cross page or segment boundaries, or access permissions ignored by the MMU. may be violated and a running process may obtain "illegal" The transactions discussed above-read, write, block access into the space of other processes. Furthermore, the fetch, interlocked, and interrupt-vector number fetch-are data from another segment or page may be treated as part fundamental to the operation of a 32-bit microprocessor of the running process's data, and hence the running pro- system. The size of the datum to be transacted should be cess will err. indicated by means of a signal, and the transaction type Another requirement of the block fetch mechanism is should also be indicated with a signal. The interlocked that it be dynamically selectable so that hardware incapable transaction makes special demands on the bus arbitration of providing multiple words for a single address need not scheme. be required to provide multiple pieces of data. If the block The following sections describe how the basic transaction fetch capability can be accommodated, the memory system types are changed when special system design requirements can tell the CPU that multiple pieces of data are available must be met. for transfer. The WE32100 microprocessor uses a two-word block Exceptions. Since it cannot be guaranteed that every bus fetch mechanism. This fits very well with its internal in- transaction can be completed successfully, in every bus pro- struction cache architecture, which has a two-word block tocol specification there must exist a method by which to size. 
All instruction fetches begin as a two-word block indicate to the initiator of a transaction that the transaction fetch. If the memory system is capable of returning two cannot be completed. For example, if the memory subsys- words with only one address, it indicates so to the CPU. tem has detected that a datum being read has been cor- The CPU will then expect two instruction words from rupted, the requester must be informed that the transaction memory. The two instruction words will be returned to the cannot be completed in normal fashion. We will use the CPU and placed in a block in the instruction cache. The term "exception" to describe such an abnormal condition. WE32100 CPU expects the two-word instruction block to Bus exceptions can be divided into two types. The first be double-word-aligned; that is, it expects that if an even- includes those that can be resolved only through software word address (address bit 2 = 0) is issued, the first word intervention; we call these "fault exceptions." Examples of returned will be the even word in the block and the second this type of exception are addressing of nonexisting memo- word returned will be from the odd-word address (address ry, nonresponse from a subsystem because of hardware bit 2 = 1). If the address issued is is an odd-word address failure, and data errors detected by a parity-checking unit. (address bit 2 = 1), the first word returned from memory The second type of bus exception includes those that can will be from the odd-word address and the second word be resolved by the hardware in the system; we call these will be from the even address in the block (address bit 2= "retry exceptions." An example of this type is the detection 0). This scheme is very easy to implement since the memory of an error by an error detection and correction unit in the system need only recognize a block fetch transaction, return memory subsystem. If the error can be corrected, the re- the addressed word, complement an address bit, and return quester can be asked to retry the transaction. Another ex-

April 1986 33 WE32100 SERIES

ample of the retry exception occurs in systems in which because the decision-making process is performed in paral- there is a hierarchy of buses. Consider a situation in which lel by every master. In the centralized scheme, overhead there is a multiple-master backplane bus connecting several -may be higher because the master to which the bus will be boards in a system and in which there are local buses on given must wait for an acknowledge signal from the central each of those boards. The following scenario can create a arbiter before obtaining ownership. Distributed arbitration, bus deadlock: A master has acquired the backplane bus however, also has the potential to be costly in terms of on- and is trying to gain access to a resource on a board while chip logic because arbitration logic must exist in every bus the local bus master of that board is starting a transaction master. requiring access to the backplane bus. A deadlock occurs In a shared-bus architecture, it is very important that bus until one of the masters defers its access to the other. A arbitration be performed efficiently. Otherwise, many bus common way of resolving such a deadlock is to have the re- cycles may be wasted if there is frequent activity by more quester on the local bus relinquish its control of the bus than one bus master, such as in the cycle stealing mode of and then retry its access later. direct memory access. The WE32100 chip set supports both exception types. We chose the centralized bus arbitration scheme for the The retry exception type can be divided further into excep- WE32100 chip set. The advantages of this scheme included tions that require the relinquishing of bus mastership and ease of implementation and compatibility with the previ- exceptions that do not. The signals for exceptions that do ous-generation CPU. not are called FAULT and RETRY; the signal for excep- In a centralized bus arbitration scheme, the CPU is tions that do is called, appropriately, RELINQUISH AND typically, although not always, the arbiter. For example, a RETRY. In an asynchronous system, these signals can be chip set can be designed in which the CPU must request asserted at any time in a transaction. In addition, a feature bus mastership from an external bus controller instead of called delayed exceptions allows these signals to be asserted being the default master and granting the bus whenever even after a DTACK (data transfer acknowledge) signal has another device requests it. been asserted (that is, up to one half of a cycle later). This The sequence of activity in a centralized arbitration allows a possible reduction in the number of wait states by scheme is as follows: A device sends a request to the arbiter allowing more time for the determination of an exception requesting bus mastership. The arbiter grants the request, condition. In a synchronous system, these signals may be after arbitrating among the requests it has received accord- asserted synchronously with the CPU's SRDY signal. The ing to request priority or some other request attribute. The SRDY signal in the WE32100 CPU is a synchronous ver- device performs transactions on the bus and then relin- sion of the DTACK signal. This is, again, an example of quishes control of it. The arbiter then grants the next re- the dual asynchronous/synchronous nature of the questing device bus mastership. There are two ways in WE32100 chip set. 
which to implement this series of steps-the "two-wire"~ar- Assertion of FAULT causes the current instruction to be bitration scheme and the "three-wire'" arbitration scheme. halted and control to be transferred to a different piece of In the two-wire scheme, there are two signals (hence the software-the fault handler, as it is commonly called. name) used to signal the requesting and the granting of bus Assertion of RETRY causes the current transaction to end mastership. These signals are called BUS REQUEST and and to be retried when RETRY is negated. Assertion of BUS GRANT. In the two-wire scheme, the activity outlined RELINQUISH AND RETRY has the same effect as above proceeds as follows: The device requesting bus RETRY but, in addition, causes the bus requester to give mastership asserts the BUS REQUEST signal. Bus master- up bus mastership. The bus requester indicates that it has ship is granted after the current master relinquishes the bus, given up bus mastership by asserting the RELINQUISH and the mastership is signaled by the assertion of BUS AND RETRY ACKNOWLEDGE signal. The deferred GRANT to the requesting device. The requesting device transaction is retried when the RELINQUISH AND then performs bus transactions. When it is done, it gives up RETRY signal is negated. mastership by negating the BUS REQUEST signal. The bus arbiter then negates the BUS GRANT signal to that device. Bus arbitration. In systems in which there can be mul- In the three-wire arbitration scheme, there are three tiple masters of the bus, there needs to be a method of bus signals-BUS REQUEST, BUS REQUEST ACKNOWL- arbitration whereby one of the potential masters is iden- EDGE, and BUS GRANT. In this scheme, the sequence of tified as being allowed to initiate a transaction on the bus. steps is as follows: The requester asserts BUS REQUEST to Bus arbitration implementations can be divided into two the arbiter. The arbiter then asserts BUS REQUEST AC- categories-distributed and centralized. In distributed bus KNOWLEDGE to acknowledge the receipt of the request. arbitration, the decision-making process is performed in- Next, the requester negates BUS REQUEST, and it then dividually by each master that requires use of the bus. Of waits for the arbiter to assert BUS GRANT, whereupon it course, this process must yield a result that is accepted by takes control of the bus. This scheme makes it possible to all masters. In centralized bus arbitration, all requests for hide the arbitration time behind an ongoing access. the bus are routed to a controller, which then decides which We chose the two-wire scheme for the WE32100 chip set, master gets the bus. The distributed method of arbitration primarily because of its compatibility with the scheme used has the potential of having very low performance overhead in the WE32100's predecessor. In the two-wire scheme, 34 34 ~~~~~~~~~~~~~~~~~~~~IEEEMICRO - Uo Control and status

Virtual address Physical address CPU MMU System interface 32 32

1 Data 32 Figure 1. CPU-MMU architecture with a split address bus.

there are two modes of operation for the CPU-it either is communications to external devices. Since the translation or is not the default master. If it is the default master, the aspect of memory management is involved with each trans- CPU is the master of the bus if no other device is request- action to memory, it would seem that interchip communi- ing it. If a requester asserts BUS REQUEST, the CPU cation overhead associated with translation could be gives up bus mastership and asserts BUS REQUEST AC- avoided by placing the memory management hardware on KNOWLEDGE. The CPU regains bus mastership when the the CPU. The problem with this lies in the method by requester negates BUS REQUEST, after it (the CPU) first which translation is performed. negates BUS REQUEST ACKNOWLEDGE. If the CPU is Let us examine the case of minicomputers, since they not the default master, it must request the bus every time it have memory management hardware. Minicomputers typi- wants to perform a bus transaction; i.e., it must assert BUS cally maintain tables of descriptors corresponding to frac- REQUEST, which is now an output, and must wait for a tions of the virtual address space of their CPU. When the BUS REQUEST ACKNOWLEDGE, which is now an in- associated fraction of virtual memory is accessed, a descrip- put, before it can start a transaction. It negates BUS RE- tor is used to map the virtual address to a physical address. QUEST when it completes its transaction. The descriptors are typically located in very fast memory near the CPU-a translation buffer-so that the lookup of Interface to the MMU. Virtual memory is very important the relevant descriptor and the accompanying translation to 32-bit systems. Software programs are growing larger can be done very fast. This translation buffer contains than ever. Multiprogramming environments surely need vir- enough descriptors to describe the entire virtual address tual memory and the attendant capability for memory pro- space of the CPU. Usually this implies a large (128-entry or tection it provides. There are many different virtual memo- more) translation buffer. However, VLSI architectures are ry schemes and architectures. Since the focus of this article critically constrained by chip area. (The general wisdom is is the hardware interfaces of VLSI chips, we shall focus on that chip cost is proportional to the cube of chip area.) One methods of interfacing "generic" memory management can clearly see that large translation buffers will increase hardware to a VLSI microprocessor. chip area and thus chip cost. Consequently, the area con- The question to be answered is, "Where should the straint makes off-chip memory management attractive. memory management hardware reside?" That is, "Should the memory management hardware be placed on the CPU Off-chip memory management. As mentioned above, in- or should it reside on separate devices?" As previously terchip communication is a performance-limiting factor in mentioned, memory management (i.e., address translation) today's VLSI architectures. Indeed, an off-chip memory can extend the duration of the address portion of an access. management unit will add overhead to memory transac- This assumes that the translation is performed by hardware tions. With luck, this overhead will be only one clock cycle. not on the CPU. If memory management hardware exists An off-chip MMU for a 32-bit architecture is shown in on the CPU, there is no need to issue a virtual address. Figure 1. 
In this arrangement, the address bus is split be- Hence, all accesses (from the external system's point of tween the CPU and memory. This suggests that the MMU view) are saved the translation, and memory management is dual-ported and hence requires a large number of I/O overhead disappears. signals. The number of I/O signals is also an important It is a truth in VLSI architecture that off-chip com- constraint on VLSI chips. To minimize chip area, one munications are a performance-limiting constraint. On-chip should limit I/Os to a reasonable number. We have previ- communications, though they are certainly becoming a ously described the advantages of nonmultiplexed address VLSI scaling problem, are nevertheless much faster than and data buses, for which we accepted 64 I/Os for perfor-

April 1986 35 WE32100 SERIES

Status and control -

Virtual address Physical address CPU MMU System interface 32

32 Data

Figure 2. CPU-MMU architecture with a split address bus and a shared data bus.

mance reasons. Hence it is not unreasonable to postulate 64 highly desirable, particularly if address overlap can be ob- I/Os for the MMU address bus (32 virtual and 32 physical tained. But it is pinout-intensive and not cost-effective addresses), though the physical address need not be a full given current VLSI technology. 32 bits. The splitting of the address bus has the potential for An alternative is shown in Figure 3. In this architecture eliminating address translation overhead despite the inter- the physical address bus is multiplexed with the data bus. chip communication problem. This potential can be real- The physical addresses and the data are demultiplexed via ized by pipelining (overlapping) accesses to memory. That latches and bidirectional tristatable buffers. Control signals is, if the address portion of the n + ith transaction can be manage these buffers and latches. The multiplexing of the made to overlap the data portion of the nth transaction, MMU's address and data buses greatly decreases the num- the virtual address for the n + 1th access can be sent to the ber of I/O signals on the MMU compared to the architec- MMU while the nth physical address is being used to access ture of Figure 2. This makes implementation of the archi- memory. This overlap hides the interchip communication tecture in VLSI easier but creates a need for SSI glue to on the virtual address bus (see again Figure 1). Of course interface the VLSI components. this overlap must be implemented as part of the CPU's in- Another architecture using shared address and data buses ternal architecture as well as part of the I/O. is shown in Figure 4. This architecture is pin-efficient and The n + 1th address must be available while the nth ac- requires no SSI glue. However, it does not lend itself to ad- cess is pending. As noted previously, this is not an easy dress overlap as do the previously described architectures. It policy to enforce on the CPU, especially a CPU with a suffers perhaps more from the overhead of off-chip com- complex instruction set containing instructions that take munications than do the other architectures. This overhead more than one cycle to execute. So now we must ask, "Is occurs in a critical path, namely, the address path. the architecture shown in Figure 1 complete?" The answer In Figure 4, the CPU must generate a valid virtual ad- to this question is no. What is not shown is a mechanism to dress- which can take a relatively long time-and the load descriptors into the MMU. Figure 2 shows an architec- MMU must latch it. Once the virtual address is latched, the ture with such a mechanism. In this architecture, a data bus CPU must remove it from the address bus and put its out- has been added to the MMU for the purpose of loading the puts into a high-impedance state so that the MMU can use descriptors into the MMU. This loading of descriptors can the address bus for the physical address. If the interface be- be done by the CPU (by means of write transactions) or by tween the two VLSI chips is asynchronous, the CPU must the MMU itself (by reading from the memory system). be told by the MMU to remove the virtual address from the Note that adding the data bus to the MMU I/O increases address bus. This two-way communication required by the number of I/O signals by 32. (Interchip control signals asynchronous operations further increases the address are also needed; they are not shown in Figure 2.) translation overhead. 
If the interface between the two VLSI Adding 32 I/O signals to the MMU is abhorrent even if chips is synchronous, the CPU can remove the virtual ad- one neglects signals needed for control. Adding 32 MMU dress from the address bus at a prearranged time conveyed I/O signals to the architecture shown in Figure 2 will in- by the common clock. Thus, a synchronous interface is crease the area and cost of the MMU significantly and may preferable to an asynchronous one because it eliminates a even make it unrealizable. And with control signals added, communication step. We adopted the architecture shown in the number of I/O signals may be more than can be put on Figure 4-with the synchronous interface-for the the chip periphery. However, the architecture of Figure 2 is WE32100 CPU and MMU. 36 36 ~~~~~~~~~~~~~~~~~~~~IEEEMICRO Figure 3. CPU-MMU architecture in which physical addresses and data are multiplexed on the same bus.

The next issue to be considered is the method by which lay the asynchronous data transfer acknowledge to the the MMU removes its physical address from the bus when a MMU, causing extra interchip communication overhead. A transaction has been completed so that subsequent transac- synchronous data transfer acknowledge, on the other hand, tions can be initiated by the CPU. Since the CPU is the can be sampled by both the CPU and MMU to save this master of the bus, with the MMU providing surrogate ad- overhead. dress and control information, the CPU can, upon a data transfer acknowledge, inform the MMU that it can remove MMU as a bus master. As previously mentioned, the the physical address. If the interface between the CPU and MMU requires a fairly large number of descriptors to be the memory system is synchronous, the MMU can also stored in a fast, nearby memory. Having fast access to the look at the data transfer acknowledge to know when to re- descriptors implies that they should be stored on board the move the physical address. Thus, again, a synchronous memory management chip so that they will not need to be interface saves one communication path. If the interface fetched from off the chip for each memory transaction. between the CPU and the memory system is asynchronous, The relevant issue here is whether the MMU can store the the two devices cannot sample an asynchronously asserted full complement of descriptors. As we noted earlier, mini- signal and guarantee that they will both latch the same computers typically have large translation buffers. The full value. They may latch different values if the sampled asyn- complement of descriptors needed by an MMU is unlikely chronous signal is in transition between logic levels at the to fit onto one piece of VLSI silicon. sampling instant. Because the two devices cannot guarantee One solution is storing a subset of the descriptors in on- that they will both latch the same value, the CPU must re- chip caches. On-chip descriptor caches make use of tem-

System interface

Figure 4. CPU-MMU architecture with shared address and data buses.

April 1986 37 WE32100 SERIES

poral program locality to achieve high hit rates. If enough hardware-a support processor-to perform these opera- descriptors can be cached to map the working set of the tions, one can obtain a dramatic performance improvement running process, a high hit rate and correspondingly high over a software implementation. The protocol used to in- performance will be obtained. For cache misses, the MMU terface a support processor to a system must be efficient to can easily fetch the descriptors from memory and relieve reduce the overhead arising from employing the support the operating system from that chore. The effectiveness of processor. miss processing is critical to performance even at high hit Many trade-offs are involved in developing an efficient rates. To perform this processing, the MMU must be able interface. The issues are numerous and complex. We con- to preempt an ongoing memory transaction, fetch the de- sidered three ways of implementing the WE32100 support scriptors, and resume the transaction after address trans- processor interface. However, before we discuss the advan- lation. The key requirement here is to have a mechanism in tages and disadvantages of each, let us examine the four the CPU by which the transaction can be preempted and steps involved in the operation of a support processor. the control of the bus relinquished-something similar to These steps are common to all support processor protocols. the RELINQUISH AND RETRY exception. The normal The first step is to identify which one of several possible bus arbitration scheme is inappropriate because bus owner- support processors will handle the current instruction and ship is not surrendered by the CPU until after the trans- to pass that instruction to it. The second step is to send the action has completed. That is, bus arbitration is not pre- operands needed to execute the instruction to the support emptive. For uniformity of system design, MMU-initiated processor. The third step is to identify any errors that may memory transactions should be the same as CPU-initiated have occurred during the operation. The fourth step is to transactions. store the results of the operation in a location outside the The WE32101 MMU can preempt an ongoing memory support processor, if necessary. transaction so that it can fetch descriptors. While the MMU One method of interfacing to a support processor re- has control of the buses, exceptions can occur. The quires virtually no support from the CPU at all. What is re- WE32101 MMU has a fault code register that captures the quired is for the support processor to monitor the CPU in- exception and then passes the exception to the CPU. By struction stream as it is fetched by the CPU over the bus. reading the fault code register in the MMU, the exception When the CPU fetches a support processor instruction, it handling mechanism can diagnose the cause and effect the suspends operation. The support processor only executes correction of the exception condition. The full complement those support processor instructions intended for itself. of fault conditions captured in the WE32101 makes for When the support processor is done, the CPU resumes exe- easier debugging of system code.5 cution. While seemingly attractive, this option is not very Another function associated with virtual memory is pro- feasible with today's microprocessors. Pipelining, instruc- tection. 
In a typical protection scheme, permissions to ac- tion prefetching, and on-chip caches make it very difficult, cess memory are checked by the memory manager before if not impossible, for a support processor to monitor the the transaction to the memory system is allowed to com- CPU's instruction stream without duplicating most of the plete. Access permissions are typically part of the mapping CPU's internals. descriptors. 10 The MMU must be provided with a mecha- This difficulty, then, leads to options in which the CPU nism to preempt the ongoing transaction by asserting ex- must provide information to the support processors when it ception signals, usually the FAULT signal. It must also encounters a support processor instruction. The next two have a mechanism for indicating page- and segment-not- options we will discuss are basically the same in the way present conditions (page faults). Further, when the MMU is they perform steps 1 and 3 but different in the way they bus master it must be capable of accepting faults and con- perform steps 2 and 4. With these two options, when the veying them to the CPU. The memory management ap- CPU reaches a support processor instruction, it must wake proach used in the WE32100 chip set is described by Lu, up a support processor to execute the required operation. It Dietrich, and Fuccio.5S The hardware architecture is that can do this by means of a broadcast transaction over the shown in Figure 4. The CPU-MMU interaction is synchro- bus. The support processor signals the CPU when it is done nous while the memory interface is pseudosynchronous. and passes status information back to the CPU. The CPU's address strobe is used to initiate translation, What we now must discuss is how operands get to, and and its negation is used to remove the MMU (the results are removed from, the support processor. For WE32101) from the bus. operands located in memory, one method is to have the CPU fetch them into itself and then transfer them to the support processor. In this case only the CPU is involved in Support processor interface. For our purposes, we will the access to memory; hence, it is the only device concerned define a support processor as a processor that is adjunct to with bus phenomena such as exceptions. The second meth- the CPU, one for which the CPU waits before continuing. od is to have the CPU perform the memory access and to Support processors are important because they provide in- have the support processor monitor the bus and "catch" creased system performance by executing operations that the data when it returns from memory. Here both the sup- the CPU may not be able to handle efficiently. The classic port processor and the CPU must be cognizant of bus phe- example is floating-point operations. By using specialized nomena. Similarly, in order to store results back into mem- 38 38 ~~~~~~~~~~~~~~~~~~~~IEEEMICRO ory, the CPU can first transfer them from the support assumes that the transaction can be retried and is ready to processor into itself and then write them to memory. Alter- receive the data during the retried transaction. However, if natively, the CPU can perform a write to memory with the the transaction is faulted, the CPU restarts the support pro- support processor, again monitoring the bus and supplying cessor instruction from the beginning. Remember that the the necessary data. 
support processor will abort any ongoing operation when- The first method has the advantage of allowing, but not ever the support processor command broadcast transaction requiring, the CPU and the support processor to be syn- occurs and that therefore the support processor will also chronized. Synchronization is needed when the support restart the instruction. processor must reliably track CPU memory transactions. Once the support processor has received all the necessary Synchronization also gives the CPU an opportunity to mas- operands, it performs the given command. Upon complet- sage the data before it passes it to the support processor. ing it, it asserts the done signal to the CPU. This indicates This is useful if, for instance, the data is misaligned and the to the CPU that the support processor has finished the support processor expects aligned data. The CPU can do command and is ready to pass status back to the CPU. The the alignment and an alignment capability does not have to CPU initiates a read transaction with a status line indicat- be provided in the support processor(s). The second meth- ing a support processor status read, whereupon the support od provides the advantage of fast transfers of operands and processor either places the status on the data bus or asserts results from and to memory. (For operands and results the fault signal if an internal exception was made during coming from or going to the CPU, no advantage accrues the execution of the command. from one method or the other, since there is only one Finally, if there were no exceptions during command exe- method for doing such transfers.) cution, the results may be stored in memory. The CPU ini- In the WE32100 we chose to minimize the overhead tiates the usual write transaction, with the following excep- associated with the transferring of operands to the support tions: The status lines indicate a support processor data processor even though it meant increasing the complexity of write, and the support processor, not the CPU, provides the interface and requiring synchronous operation between data on the data bus. Again, the negation of the data the CPU and the support processor. We will now examine strobe and the assertion of the data ready signal indicate to the operation of the WE32100 support processor interface the support processor the successful completion of thes in detail. transaction. If there were bus exceptions, i.e., if th-e data The first step the CPU performs upon encountering a ready signal was not asserted, the support processor will be support processor instruction is to perform a support pro- ready to present the same data again on the next occurrence cessor ID broadcast. Only the support processor whose ID of a support processor data write transaction. Again, as matches the ID field of the data on the data bus will ac- was the case with the operand fetch transaction, the sup- knowledge this transaction. If no device acknowledges, the port processor assumes the transaction will be retried. If it CPU will take an internally generated exception. Note that was faulted, the CPU will restart the support processor in- a support processor must be able to abort its current opera- struction. Note that this implies that if a result is to be tion upon seeing this transaction. The reason for this will stored in memory, the support processor must not perma- become evident below. 
nently change its internal state until the result has been Once the support processor has received a command, it is stored successfully. if any are re- ready to receive memory-based operands, designs and the WE32103 dynamic RAM con- a nor- Memory quired. These operands are fetched by the CPU via far, we have discussed issues relating to the support troller. Thus mal read transaction. The status lines indicate a hardware architecture of general-purpose VLSI computer that the support processor processor operand transaction so systems. We have dealt with matters mostly under the con- an operand for the will recognize that the data fetched is trol of the architect of the VLSI chips. System design samples data on support processor. The support processor aspects that are not always under control of the VLSI ar- The support proces- the same clock edge as does the CPU. chitect must be considered and, in fact, presupposed. It is it can latch the sor runs in synchrony with the CPU so that impossible to take into account all possible VLSI system strobe and the asser- data reliably. The negation of the data designs. One aspect not discussed thus far but critical to all to the support proces- tion of the data ready signal indicate system designs is the memory system design. We have made data is sor that the transaction has completed and that the the assertion that address, data, data size, status, and ap- valid, respectively. The data will be ignored if there was an propriate strobe signals are all that are needed to provide a ready exception during the transaction-that is, if the data memory interface. This assertion is based primarily on an signal was not asserted. Note that the support processor evaluation of commonly available semiconductor static does not know what type of exception occurred. (Two de- new memory system designs, the RAMs. Rather than requiring vices, even if synchronous, cannot be guaranteed to see chip set provides an I/O protocol compatible the the WE32100 same value of a sampled asynchronous signal. Since with three common memory types: CPU can receive asynchronous acknowledges and bus ex- ceptions, the support processor cannot reliably monitor * static RAM, these signals and remain in lockstep with the CPU during * dynamic RAM, and the memory transaction.) The support processor therefore * static RAM cache memory. April 1986 39 WE3210o0 SERIES

The pseudosynchronous protocol of the WE32100 CPU words, in a two-ported memory arbitration is required easily accommodates a design based on static RAM. The when two bus masters concurrently seek access to the entire assertion concerning which signals are needed for a memory memory subsystem, whereas in a dual-ported memory it is interface is correct for most common static memory required only when two bus masters contend for a single, designs. specific memory location. Logically there is little difference Dynamic memory, however, involves a different set of between these two schemes, but they are quite different in design concerns. Unlike static memory, dynamic RAM implementation. must be refreshed periodically to preserve the contents of A dynamic RAM controller is a logical place to have ar- its memory cells. Furthermore, today's dynamic memory bitration logic for a two-ported memory. The WE32103 chips commonly employ a multiplexed addressing scheme in dynamic RAM controller can be configured with a second which half of the address selects a row of the semiconduc- DRAM controller to support two-ported memory (Figure tor dynamic memory array and the other half a column of 5). In this figure, the two dynamic RAM controllers the array. In the WE32100 architecture, we parceled out cooperate with each other to determine which of any con- this functionality to a separate VLSI component in the chip tending bus masters (buses A and B) will be permitted ac- set. The WE32103 dynamic RAM controller provides the cess to the memory array. One DRAM controller is the refreshing and address multiplexing needed by dynamic master while the other is a slave. This distinction serves to memories. Implementing dynamic memory control on a determine which DRAM controller will refresh the separate VLSI chip was justifiable because we couldn't pre- memory. dict how a system designer would design his memory sys- tem. A separate VLSI dynamic RAM controller provides The WE32103 DRAM controller supports the WE32100 many design options within the area and I/O constraints of block fetch transaction by generating a second address for VLSI. the DRAM array. (Recall that the CPU expects two words One design option that can be considered in a VLSI of data but only issues one address.) RAM controller is two-ported memory architecture. A two- The WE32103 DRAM controller makes use of the ported memory system is one in which there are two WE32101 MMU architecture to eliminate a cycle on typical separate buses providing access into the memory system. DRAM accesses. This feature, called "early RAS," makes Two-ported memory is different from dual-ported memory, use of the fact that the MMU, when performing paged and the two should not be confused. In a two-ported translations, leaves the lower 11 bits of the virtual address memory system, two bus masters, each on its own bus, vie untranslated. Hence, the DRAM controller can use these for access to the memory array. An arbiter must decide bits as a row address and can begin a DRAM access before which bus master will win access. In a dual-ported memory the physical address is available, thus saving a CPU clock system, the two bus masters concurrently obtain access to cycle. "Early RAS" employs synchronism and therefore the memory system. Only when both bus masters access the the CPU, MMU, MAU, and DRAM controller must share same memory address will arbitration take place. In other a clock to use it.

Figure 5. Two-ported memory with WE32103 dynamic RAM controllers.

Because the refresh behavior of the dynamic memory system can critically affect total system performance, the WE32103 provides the system designer with six different refresh timing modes. Thus, he can choose the refresh timing that yields the best performance. The WE32103 also allows the memory access time to be configured to produce the most efficient memory interface. The WE32103 DRAM controller supports high-density dynamic RAMs, including 1M-bit devices. This support naturally leads to architectural support for memory error detection and correction. Today's wisdom is that the higher densities found on dynamic RAMs make them more sensitive to soft errors and other failure modes. Hence, systems that must be reliable and fault tolerant will require memory error detection and correction. The WE32103 DRAM controller provides an interface for an error detection and correction capability.

Static RAM caches are the third type of memory supported by the WE32100 chip set. The goal of cache memory architectures is to match the speed of the memory system to the speed of the central processor.11-14 By designing a small, high-speed, typically static, RAM cache near the central processor, the system architect can take advantage of program locality to improve overall system performance, provided a high cache-hit ratio is obtained.

When we say that memory speed and processor speed are matched, we mean that the central processor has a zero-wait-state access time. To make cache design easier, one should provide facilities to allow a system designer time to access the cache memory and report exceptions. The WE32100 chip set supports a synchronous data transfer acknowledge as well as synchronous bus exceptions to provide more time for the cache memory transaction to complete. Approximately one half of a cycle of extra time is made available to the system designer to generate these synchronous signals, provided he meets their setup time. One such signal is SRDY on the WE32100 CPU. Asynchronous signals require no setup time; however, they must be valid approximately one half of a cycle earlier than the synchronous signals if they are to be recognized without an additional wait cycle having to be inserted in the transaction.

Another feature of the hardware architecture supporting cache memory designs is an indication from the memory manager that the contents of a segment are cacheable or noncacheable. This feature allows an operating system to designate certain segments as noncacheable. A noncacheable segment may be an I/O buffer, which is read or written only once during the lifetime of a process. Rather than throwing program text and data out of the cache in order to cache the contents of an I/O buffer, one may find it performance-improving to leave current, local data in the cache and not overwrite it with less valuable (less local) data. The WE32101 memory management unit provides a signal to represent the state of a segment descriptor's cacheable bit, to assist the cache designer in controlling the contents of the data cache. Using this signal, the system designer can mark segments as noncacheable.

Shared bus architecture and DMA. The benefits of direct memory access channels in computer architectures have been demonstrated repeatedly. The drawback to DMA in a shared bus architecture is that the CPU can be denied access to the bus while the DMA is ongoing. Cycle stealing is the normal mode of DMA transfer in a shared bus architecture. In cycle stealing, the DMA controller "steals" bus time between CPU bus transactions. If the bus occupancy of the central processor is low, such bus cycle stealing generally will not grossly inhibit central processing.

One method of reducing bus conflict between the CPU and DMA controller is to give the DMA controller very high performance. If the DMA is performed rapidly, there is less time for bus conflict. This argues for a full 32-bit DMA capability. Indeed, a DMA transfer over a 32-bit bus will require roughly half the number of bus transactions as a 16-bit-wide DMA transfer. Certainly, then, 32-bit systems should have high-performance, 32-bit-wide DMA channels. Such channels are ideal for memory-to-memory transfers because the memory in a 32-bit system is 32 bits wide. The question then arises, "What about fast I/O transfers when the I/O port is less than 32 bits wide?" Such is the case with common byte-wide peripherals such as UARTs and LAN controllers. It seems reasonable to remove these eight-bit peripherals from the 32-bit bus because they do not require its full bandwidth yet contend for its use. To relieve the high-performance 32-bit system bus from the overhead associated with low-performance peripherals, the WE32104 DMA controller provides a separate eight-bit-wide peripheral bus. It performs memory-to-memory transfers on the 32-bit-wide system bus. Peripheral-to-memory and memory-to-peripheral transfers can also be performed. On such transfers, the DMA controller buffers and packs or unpacks bytes, allowing better utilization of the system bus.

The WE32104 DMA controller has four independent, programmable DMA channels. Each channel can have its access to the system and peripheral buses prioritized. The DMA controller allows various arbitration schemes so that the performance of the four channels can be optimized for a particular system's DMA characteristics.

The DMA controller performs transactions in one of two modes, either in burst mode or in cycle-steal mode. As mentioned before, cycle stealing is the DMA operation in which the DMA controller takes possession of the shared bus (the system bus in the WE32100 chip set) to perform a small number of DMA transactions. In the burst mode the DMA controller takes possession of the shared bus for a long time and performs many transactions before relinquishing it. The burst mode of DMA transfer has a direct effect on the system's response to interrupts. Indeed, while the DMA controller has possession of the bus, the CPU cannot service a pending interrupt. This situation can be highly undesirable in some systems. The WE32104 DMA controller therefore possesses a signal that can be used to preempt the burst transfer and cause the remainder of the transfer to be done in the cycle-steal fashion. When this is done, the DMA controller relinquishes the bus and the pending interrupt can be serviced.
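As a sketch of the burst-versus-cycle-steal behavior just described, the loop below shows how assertion of a preempt signal demotes an ongoing burst to cycle stealing. The function and signal names are hypothetical stand-ins, not actual WE32104 interfaces.

```c
/* Illustrative control flow only; names are hypothetical, not WE32104 pins. */
enum dma_mode { BURST, CYCLE_STEAL };

extern void acquire_bus(void);        /* win arbitration for the shared bus */
extern void release_bus(void);
extern void transfer_one_word(void);
extern int  preempt_asserted(void);   /* the preempt signal described above */

void run_dma_transfer(int words_remaining, enum dma_mode mode)
{
    acquire_bus();
    while (words_remaining > 0) {
        transfer_one_word();
        words_remaining--;

        /* Preemption demotes a burst to cycle stealing so the CPU can get
         * the bus back (for example, to service a pending interrupt).     */
        if (mode == BURST && preempt_asserted())
            mode = CYCLE_STEAL;

        if (mode == CYCLE_STEAL && words_remaining > 0) {
            release_bus();            /* give up the bus between transactions */
            acquire_bus();
        }
    }
    release_bus();
}
```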


Figure 6. The WE32100 CPU configured with the WE32101 MMU.

In a shared bus architecture, all bus transactions should look the same to the system. The WE32100 CPU, WE32101 MMU, and WE32104 DMA controller have fundamentally identical bus transactions. Like the CPU, the DMA controller can perform two-word block transactions. In addition, the DMA controller has a quad-word bus transaction that is a logical extension of the two-word block transaction. (Recall that block transactions are selectable.) Maximum DMA performance for memory-to-memory transfers occurs when the DMA controller performs quad-word transactions in burst mode.

The WE32104 DMA controller has a command chaining feature. It can fetch a list of linked DMA commands from memory. The CPU can set up many DMA operations by means of this linked-command list.

The CPU needs access to the configuration registers internal to the DMA controller; hence, the DMA controller can be viewed as a peripheral. The WE32104 DMA controller provides a "no glue" peripheral interface (one that adheres to the shared bus protocol) to the WE32100 CPU.

Special considerations in the design of the WE32100 architecture. When the hardware architecture of a chip set is defined, the most important consideration is consistency. Each member of the chip set must support the hardware and software features of the other members. In the WE32100 chip set, a good example is the block fetch feature mentioned previously. The idea behind this feature is to fill the two-word block in the CPU's instruction cache efficiently. The block fetch also makes it possible to avoid an address translation, provided the two instruction words do not straddle a memory segment boundary. In the MMU, the segment resolution is double-word, so the MMU's software architecture permits a double-word block fetch, saving an address translation. This alone is not enough, however, since most dynamic RAM designs cannot easily support block fetching. Hence, support for it had to be added to the WE32103 DRAM controller.

Another example of how the software architecture affects the hardware architecture is the MOVTRW (move translated word) instruction in the CPU. With this instruction, the CPU issues a virtual address and expects the corresponding physical address to be returned to the destination location. This requires a special arrangement with the MMU. In this operation, the CPU begins a normal read transaction while issuing a special MOVTRW status. The MMU responds by placing the translated address on the data bus rather than on the address bus. The MMU then signals the termination of the transaction.

Another special demand placed on the WE32100 chip set is that it support many chip set configurations. Many of the architectural decisions made in the course of WE32100 development sprang from the desire to attain this goal. The WE32100 chip set can be configured in many meaningful ways. The CPU can stand alone or it can be the center of a multichip solution. Furthermore, a simple multiprocessor capability on the CPU allows it to share a bus with other CPUs as well as with peripheral devices.

In an architecture requiring memory management, a system designer can use the WE32101 MMU or build his own MMU. To be sure, the WE32101 MMU has a flexible architecture, and for unusual cases more than one WE32101 can be used to enhance the size of the translation buffer. Such flexibility in MMU design is not afforded by on-chip memory management units.

The WE32106 MAU dramatically improves floating-point arithmetic. It is optimally configured as a support processor, although multiple MAUs can be configured as memory-mapped peripherals. The WE32104 DMA controller gives the system designer the capability to move data at very high rates.
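The linked-command list mentioned above lends itself to a simple descriptor picture. The layout below is hypothetical, chosen only to illustrate the chaining idea; it is not the actual WE32104 command format.

```c
/* Hypothetical DMA command descriptor used only to illustrate chaining;
 * this is not the actual WE32104 command layout. */
#include <stdint.h>

struct dma_command {
    uint32_t source;        /* source address                                */
    uint32_t destination;   /* destination address                           */
    uint32_t byte_count;    /* number of bytes to move                       */
    uint32_t control;       /* e.g., burst or cycle-steal, bus priority      */
    uint32_t next;          /* address of the next command; 0 ends the chain */
};
```

In such a scheme, the CPU builds the chain in memory, hands the address of the first descriptor to a channel, and the controller fetches and executes each command in turn until it reaches a terminating link, with no further CPU intervention.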


Figure 7. The WE32100 CPU configured with the WE32106 MAU and WE32101 MMU.

We should mention a member of the chip set we have not discussed in detail here: the WE32105 system interface controller, or SIC. This chip is a compatibility bridge between first-generation (WE32000) and second-generation (WE32100) microprocessors. It performs status decoding and byte-strobe and wait-state generation for a system designer. It runs in synchrony with the WE32100 CPU, MMU, and MAU but provides an asynchronous interface to the system designer. This chip had a profound influence on the chip set architecture if for no other reason than it allowed many additional chip set configurations to be supported in a consistent fashion.

One of the most useful attributes of the WE32100 chip set is that no configuration requires SSI glue logic. This "no glue" solution provides high functional density and saves precious board space.

Some WE32100 configurations are shown in Figures 6 through 9. Figure 6 shows a CPU with an MMU. The salient feature of the CPU-MMU configuration is shared address and data buses. The MMU becomes a bus master in its own right when it performs miss processing to fill its internal translation cache. The MMU translation overhead is one cycle; hence, a native three-cycle access becomes a four-cycle access with virtual to physical address translation. The MMU uses the DSHADO signal to preempt an ongoing CPU access when it is performing miss processing.

Figure 7 shows a CPU with an MMU and MAU. The most notable feature of this configuration is, again, the shared, nonmultiplexed address and data buses. The math accelerator unit, when configured as a support processor, takes data directly off the data bus. Floating-point operand addresses are generated by the CPU.

Figure 8 shows a CPU with an MMU, MAU, DMA controller, and DRAM controller. This configuration provides shared address and data buses but also provides a separate peripheral bus for the DMA controller. This peripheral bus is ideally suited to the character-oriented I/O needed for devices such as UARTs and LAN interfaces. Figure 8 also shows that multiple MMUs can provide a larger translation buffer, if one is required, and that the DRAM controller can control multiple banks of dynamic RAM.

Figure 9 shows a CPU with a system interface controller. The SIC provides many common functions, such as byte-write strobes and wait-state generation circuits, that are usually implemented with SSI logic. The WE32100 chip set can be configured with or without the SIC chip.

Electrical design. An important concern of the VLSI architect is the electrical characteristics of the chips he is designing. VLSI chips must be able to accommodate a variety of system designs. This means they must support the capacitive load of other integrated circuits in a system. However, supporting a large capacitive load requires the chips to have powerful (hence big) I/O buffers. Large capacitance also tends to greatly slow down communications between members of a chip set. A trade-off between the capacitive load to be supported and I/O performance must be effected.

The "no glue" goal of the WE32100 chip set meant that its members had to minimally support a capacitive load of 130 pF. This allowed an operating frequency of 18 MHz to be realizable.
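To put these figures in perspective, the short calculation below works out the cycle time at 18 MHz and the time consumed by native and MMU-translated accesses. Only the 18 MHz frequency, the three-cycle native access, and the one-cycle MMU overhead come from the text; any finer-grained breakdown of the cycle would be an assumption.

```c
/* Back-of-the-envelope timing arithmetic; illustrative only. */
#include <stdio.h>

int main(void)
{
    const double freq_mhz   = 18.0;                /* from the text            */
    const double cycle_ns   = 1000.0 / freq_mhz;   /* about 55.6 ns per cycle  */
    const int    native     = 3;                   /* native CPU access cycles */
    const int    translated = native + 1;          /* plus one MMU cycle       */

    printf("cycle time        : %.1f ns\n", cycle_ns);
    printf("native access     : %d cycles = %.1f ns\n", native, native * cycle_ns);
    printf("translated access : %d cycles = %.1f ns\n", translated, translated * cycle_ns);
    return 0;
}
```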

Figure 8. The WE32100 CPU configured with the MAU, MMU, WE32104 DMA controller, and WE32103 dynamic RAM controller.

Figure 9. The WE32100 CPU configured with the WE32105 system interface controller.

As mentioned before, off-chip communication is a performance limiter for VLSI devices, and capacitance is the primary reason for this limitation. If capacitance is the root of most of the difficulty in VLSI I/O design, then inductance is also a contributor to the problem. Indeed, the package inductance for VLSI chips can induce on-chip noise voltage that can cause on-chip logic to take incorrect values. A high-frequency chip set architecture must be designed to keep induced voltage noise below the switching logic thresholds. This can be accomplished by reducing the number of concurrently switching signals and also by adding multiple power and ground leads (thereby reducing inductance) to the members of the chip set. In the WE32100 chip set architecture, the assertion of address and data signals on their respective buses occurs one cycle apart. This reduces the number of concurrently switching signals and hence keeps the induced noise within tolerable limits.

One of the critical tasks of the WE32100 chip set architects was specifying the timing budget for interchip communications. This had to be done for each signal between each chip for each supported configuration. Further, the timing budgets had to be considered across the process variations between chips for best- and worst-case operating conditions. This ensured that the chip set would be functional even if worst-case and best-case processing manifested itself in the configured chip set. Such variation in processing is particularly tricky in asynchronous interfaces.

Budgeting timing is one of the most painstaking aspects of architectural design: every nanosecond must be accounted for. The penalty for miscalculation is very high. The chip set architect cannot draw a block diagram and walk away.15 He must oversee the design and verification process until the architecture is realized.

We have discussed some of the hardware architecture issues that face a VLSI computer architect. Of course, no architecture is optimal everywhere. Trade-offs must be made, and here we have focused on the alternatives faced by the architects of the WE32100 chip set.

The key to an architecture is that it be implementable and perform well. Invariably this means facing real-world constraints. In a good architecture, however, there should always be room to improve the architecture as the technology of implementation improves.

There is never full freedom in a real-world architecture. Within the goals set forth above, the WE32100 is very successful. The members of the chip set worked, at speed, the first time. The robustness of the interchip protocols has been verified many times over for the various configurations. Furthermore, the members of the WE32100 chip set often were near perfect on their first silicon, requiring only minor rework.

WE32100 heritage

The WE32100 chip set, a true second-generation 32-bit microprocessor, is descended from the first 32-bit microprocessor, the WE32000 (Bellmac-32A). The WE32100's designers wanted to ensure that its architecture would be compatible with that of its predecessor. This narrowed the range of architectural trade-offs we had to consider.

The WE32100's predecessor was meant to have minicomputer functionality from the beginning. This includes high-level-language support, virtual memory, and floating-point capability. Unlike the WE32100, other 32-bit microprocessors have an 8- or 16-bit ancestry dominated by microcontrollers in which HLL and virtual memory were alien concepts.16,17

Not only is the WE32100 CPU a second-generation chip but so is the WE32101 MMU. The Bellmac-32A MMU was the first 32-bit VLSI memory management unit.

An important part of the WE32100 heritage is the manner in which the design of the chip set was handled. All the members of the chip set were designed concurrently, that is, the architectural design proceeded from a chip set perspective. Architectural decisions were based on how the entire chip set was affected; chip-specific decisions were avoided. A further advantage to developing all chips concurrently was the ability it gave the designers to make trade-offs where interchip timing was concerned. The designers of communicating chips could negotiate when they ran into tricky timing situations. Such negotiation could not have occurred if the chips had not been designed concurrently.

Acknowledgments

The successful realization of the WE32100 chip set was the work of many people. We acknowledge the contributions of the members of the architecture team, the design teams, the verification teams, and the software and development tool groups.

References

1. UNIX Microsystem WE32100 Microprocessor Information Manual, AT&T Technologies Inc., Morristown, N.J., Jan. 1985.
2. A. Berenbaum, M. Condry, and P. Lu, "The Operating System and Language Support Features of the Bellmac-32 Microprocessor," Proc. Symp. on Architectural Support for Programming Languages and Operating Systems, Palo Alto, Calif., Mar. 1982, pp. 30-38.
3. M. Fuccio, L. Goyal, and B. Ng, "Hardware Configurations and I/O Protocol of the WE32100 Microprocessor Chip Set," AFIPS Conf. Proc., Vol. 54, 1985 NCC, pp. 241-246.
4. A. K. Goksel, J. Fields, U. Gumaste, and C. Kung, "An IEEE Standard Floating Point Chip," Proc. Int'l Solid-State Circuits Conf., New York, Feb. 1985.


5. P. M. Lu et al., "Architecture of a VLSI MAP for Bellmac-32 Microprocessor," Digest of Papers, Compcon Spring 83, San Francisco, Feb.-Mar. 1983, pp. 213-217.
6. IEEE Task P754, A Proposed Standard for Binary Floating-Point Arithmetic, Draft 10.0, Dec. 2, 1982.
7. T. J. Chaney, S. M. Ornstein, and W. M. Littlefield, "Beware the Synchronizer," Digest of Papers, Compcon 72, San Francisco, Sept. 1972, pp. 317-319.
8. T. J. Chaney and C. E. Molnar, "Anomalous Behavior of Synchronizer and Arbiter Circuits," IEEE Trans. Computers, Vol. C-22, No. 4, Apr. 1973, pp. 421-422.
9. S. T. Flannagan, "Synchronization Reliability in CMOS Technology," IEEE J. Solid-State Circuits, Vol. SC-20, No. 4, Aug. 1985, pp. 880-882.
10. P. J. Denning, "Virtual Memory," Computing Surveys, Vol. 2, No. 3, Sept. 1970, pp. 153-189.
11. J. Bell, D. Casasent, and C. G. Bell, "An Investigation of Alternative Cache Organizations," IEEE Trans. Computers, Vol. C-23, No. 4, Apr. 1974, pp. 346-351.
12. D. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982.
13. H. Stone, Introduction to Computer Architecture, SRA, Chicago, 1975.
14. A. Tanenbaum, Structured Computer Organization, Prentice-Hall, Englewood Cliffs, N.J., 1976.
15. J. A. Mills, "A Pragmatic View of the System Architect," Comm. ACM, Vol. 28, No. 7, July 1985, pp. 708-717.
16. D. R. McGlynn, Microprocessors: Technology, Architecture and Applications, John Wiley and Sons, New York, 1976.
17. K. Short, Microprocessors and Programmed Logic, Prentice-Hall, Englewood Cliffs, N.J., 1980.

For further reading

I. Catt, "Time Loss Through Gating of Asynchronous Logic Signal Pulses," IEEE Trans. Electronic Computers, Vol. EC-15, No. 1, Feb. 1966, pp. 108-111.
P. Denning, J. E. Savage, and J. R. Spirn, "Models for Locality in Program Behavior," Tech. Report 107, Dept. of Electrical Engineering, Princeton Univ., Apr. 1972.
A. K. Goksel et al., "A VLSI Memory Management Chip: Design Considerations and Experience," IEEE J. Solid-State Circuits, Vol. SC-19, No. 3, June 1984, pp. 325-328.
G. J. Lipovski, Microcomputer Interfacing: Principles and Practice, Lexington Books, Lexington, Mass., 1980.
A. C. Shaw, The Logical Design of Operating Systems, Prentice-Hall, Englewood Cliffs, N.J., 1974.

Michael L. Fuccio is a member of the Digital Signal Processor Design Group at AT&T Bell Laboratories, Holmdel, New Jersey. He was an architect of the first-generation Bellmac memory management unit and a member of the architecture team for the WE32100 microprocessor chip set. A member of Tau Beta Pi, Eta Kappa Nu, and the IEEE, he received the BEEE degree from the State University of New York at Stony Brook in 1980 and the MSEE degree from the Georgia Institute of Technology. He is pursuing graduate study at Rutgers University, where he is concerned with multiprocessor and parallel computing architectures for digital signal processing and computer graphics.

Questions about this article can be directed to Fuccio at AT&T Bell Laboratories, Room 2D-205, Crawfords Corner Rd., Holmdel, NJ 07733.

Benjamin Ng has been a member of the technical staff at AT&T Information Systems since 1981. He was a member of the architecture team for the WE32100 chip set and since has been an architect of a next-generation memory management unit. His areas of interest include computer architecture and computer graphics. A member of Tau Beta Pi, Eta Kappa Nu, the IEEE, and the ACM, he received a BS degree in electrical engineering from Cornell University in 1981 and an MS degree from Columbia University in 1982.