<<

The Symbolics tedey iit A 40 Taggeda9 Archhectapetigp: MicroprocessorMicrop . Clark Baker, David Chan, Jim Gli xy.-Alan Corry, Greg Efland, Bruce Edwards, Mark Mateon#ie Minsky, Eric Nestler, Tan Kalman Heti, David Sarraxin; Oiearles Sommer, David Bid VLSI Systems Group Symbolics Cambridge Research Center con- branches are resolved in the second (D) stage. A taken branch ditional branch executes in two cycles; a not-taken Abstract ‘2 2 shows a block diagram of the Ivory CPU. Instruc- (up to 64 of a third tiahe are fetched directly from a 32 word This paper describes the VISI implementation The cache, optimized for the Lisp lan- jnatructiona) direct-mapped instruction cache. generation symbolic processor serves to buff- Jwiich ‘ig filled by an autonomous prefetcher, guage. The design effort has resulted in a single chip im- and hold email plementation of a complete CPU which efficiently executes er. instructions arriving from memory provides the instruction offers performance, system integration loops. A_ bypass path Lisp primitives and stage is stalled on a and manufacturability advantages over current approaches Fikestrom memory when the first Heche miss. to Lisp processors. the Te operand address calculation data path contains stack frame pointer registers and a 32-bit adder/subtractor. the operand and stack pointer 1. Introduction Itiaenaputes the address of adjustment according to the macroinstruction. This data ig aleo used in parallel with the main data path to ac- Lisp processors have been implemented in a variety of tech- function call/return. aDey nologies with a number of objectives in mind, Contem- roedde ia stored in a 1200 word x 180 bit ROM. This porary Lisp processors use mare board area, more power, reads 8 words in parallel, allowing a late 1-of-4 select than contemporary VLSI current and are more expensive the. neat microinstruction to be based on the LU233]. As the market for symbolic ALM tondition. there is a need for both higher perfor- processing matures cache, mance, Jower cost stand-alone systems, and embedded ays- Fhemain data path contains a 128 word top-of-stack of the top tems. _ The Ivory architecture was initiated with the intent g:62-word scratchpad (which contains a duplicate and tag a single chip processor which provides on-chip word on the stack in a fixed location), the ALU, of designing unit, support for a rich Lisp environment. ing logic. The ALU includes an adder, boolean C logic, and support for one-bit-per-cycle integer with This paper describes the resulting micro-architecture and pmiliply/divide. Tag checking is done in parallel with some of the implementation details of the Symbolics Ivory, a no time the ALU operation, so that in the common cases single chip symbolic . check- pemslty is paid for type checking [4]. Similarly, ECC ALU of data from memory is done in parallel with the jag the top- using the on-chip ECC logic. Bypass paths for both of the pre- 2. Iwory Microarchitecture of-atack cache and scratchpad forward the result wittas instruction to the ALU as necessary. recent vir- The Ivory_microprocessor implements a 4-stage pipeline as An on-chip 16 entry fully associative cache holds physical address translations. The cache is main- shown in Figure 1 The first stage fetches the inatruction, tual to in- decodes it, and adjusts the program counter. The second ‘tained by , with hardware to assist searching selection. stage fetches the initial microinstruction, computes the metagty translation tables and cache replacement operand address, and adjusts the stack pointer, The third The Ivory processor supports a pipelined memory bus which —- as- stage fetches the operands and computes the result. The ean have up to four outstanding requests at once. An fourth stage stores the result, unless a fault has occurred, is main- acciative queue of outstanding request addresses in which case it restores the state of the third stage. tained for detecting when instructions arrive from memory the instruction cache. The memory 0 E C and installing them into PROGRAM: I by an independent state interface protocol is implemented PUSH A . between on-chip users of the ADD PUSHB PLSHA — machine which arbitrates PUSH B ; bus masters. PCPC ADC PUSHB PUSHA memory system and other ADD on Hardware included for testing includes scan-out registers POP C —— poR ADD PUSH micro the microcode ROM and real time observation of the acces- — — POPC ADD program counter. All hardware defined registers are sible from high level language primitives. The microcode _— —_- — POPC run. ROM may be checksummed via the scan-out path while Figure 1. Ivory Pipeline Stages ning Lisp.

Instructions spend only a single cycle in the first two pipeline stages, but can epend an arbitrary number of cycles in the execute stage. Simple instructions, such as “push”, add", and “eq” execute in a single cycle. Conditional

12$01.00 © 1987 IEEE (CH2473-7/87/0000/05

{ STACK CACHE f “ ag ADORE SS On Deu! CONTROL fant Logic ALU 3 “acon MCROCODE t AT Ba — * Aon OF peayy Ot * Moose

ADORESS PCy REGISTERS OSTAUCTION 0€cone INSTALICTION. CACHE PAEFETCHER unr

4 [ DATA

Figure 2. Ivory Processor Block Diagram 3. VLSI Implementation 32 bit adders which all have to cycle. The operate twice every clock static adder Provides a Ivory was designed using tween size and good compramise be- a highly integrated proprietary speed, Based on a hit VLSI design system called adder it employs Manchester carry NS (5), which ig implemented a Passaround path the Symbolics 3600 . on tributed pseudo-nMOS controlled by a dis- interactive This provides a highly lookahead gate to improve design and simulation bit pitch is 40u speed. The environment and while the length of the dition, a symbolic layout in ad. Tules. It adder is 232u in 2u methodology that allows may be used as a aingle the com- carry-select unit or combined into adder. The typical speed a around 26nS. for a 2u process ig Extensive use was made of sim various modules, thig capability NS design system. Examples include logic, instruction-decode the ECC generation ro: j CH

A two phase clock is employed which is routed throughout the chip. Registers and latches use nMOS pass transistors with restoring inverterg as the basic storage device, An tra synchronization clock ia ex- used to generate memory timing signala. Transistors(z) Area(%) Relative In line with Density the ability to retarget the Cortrol symbolic layout design using the 7.§ 13. tools and the sheer impossibility Data-path ? timizing 390,000 of hand op- 32.6 17.2 transistors, the circuit design Memory 23.9 3.3 was kept fairly of the chip 7.9 5.4 simple. The majority of u-Code 37.9 Is logic in the design 11.4 4.8 static - either fully complementary Pads 3.6 pseudo CMOS logic gates, Routing 3.9 .64 nMOS gates for high fan-in 8 48 pass transistor NOR gates or nMOS Q logic for selected muxes, Dynamic only used in gates are the microcode ROM, instruction decoder the RAMs. The and RAMsare implemented using a standard transistor static cell, 6 while the CAMs in the design use a 10 transistor static cell, All control logic is in the form static, fully complementary of atandard cells. The symbolic layout for control modules was automatically generated using a standard cell place and route syatem. cell library, A standard memory library and data-path library for most is used of the designs in the chip. Customization and circuita of logic is kept to a minimum to ensure uniformity performance and of reduce the amount of verification required at the circuit level. As an example, the design employs five

513 a i. Feature Review 8:Summary

With respect to the requirements of a commercial Lisp is paper has presented processor the following reviews the contribution ae the micro-are b of Ivory ‘the 5. hardware: significant technical features of the: i , ' CPU.It is the first single chip Lisp CPU the features required of * Fast call and return - Spvcialized next generation Lisp CPUs. ” operations, datapaths, paratiel © and fast cycle time support the complex calling strategies required by Lisp, 6. References + Runtime type checking - Parallel tag processor, late- branch (1] Motorola, ROM and comprehensive trap logic “MC88020 32-Bit Microprocessor support Manual,” User's generic arithmetic and pointer manipulation. Prentice Hall, 1985. (2] J. Crawford, “Architecture * Specialized of the Intel 80368," ICCD Lisp operations - Pipelined memory inter- "86, pp. 155-160. face and high level microcoded primitives support ef- {3]_ C. Hansen, D. Freitas, E. Hudson, ficient implementation of 5. J. Mouazsouris, operations auch as CAR and Przybylski, TT. Riordan and ¢. Rowen, CDR, Microprocessor "A RISC with Integral MMU and Cache Interface," ICCD "86, Pp. 145-148. ¢ Garbage Collection - On chip hardware to facilitate ef. {4] D. Moon, "Architecture ficient of the Symbolies 3600," GC algorithms suchas the Ephemeral Proceedings of the GC [6]. [2th Symposium on Architec- ture, June 1985, pp. 76-83. * Support - On chip translation buffer, (5): J. Cherry, H. microcoded Shrobe, N. Mayle, C. Baker, H. cache-misg hackup and the K. Reti, and N. Weste, , support of "NS: An Integrated Symbolic CDR-coded lists (more compact Design System,” physical memory VLSI ‘85, E. Hoerbst, ed, Elsevier representation). Science Publishers B. V., pp. 325-334. North-Helland, Amsterdam, 1986, Other features which improve performance {6] D. A Moon. tem integration and lower sys- "Garbage Collection in a Large cost inciude- System,” 1984 Lisp ACM Sympostum and Functional Programming, pp. 235-248. « A programmable interleaved memory interface to allow a wide range of memory system speeds and architec- tures to be used - ranging from small high speed caches to four-way interleaved standard MOS memories.

* A fast coprocessor interface, primarily used to provide = high floating point performance.

* Fast “vector” instructions for garbage collection, data- base searching and graphics applications. Depending on the system configuration, initial versions Ivory will allow performance of increases of 3-5 times that of current Symbolics products. Ivory supports the full Genera 7 symbolic processing environment, igure 3 shows a chip photograph of the 2u Ivory Lisp microprocessor.

514 d ] q , ] q d ] 4

ru

ea SOs

Figure 3. Ivory Photomicrograph

$15