The Symbolics Ivory Processor: A 40 Bit Tagged Architecture Lisp Microprocessor
Clark Baker, David Chan, Jim Cherry, Alan Corry, Greg Efland, Bruce Edwards, Mark Matson, Henry Minsky, Eric Nestler, Kalman Reti, David Sarrazin, Charles Sommer, David Tan

VLSI Systems Group
Symbolics Cambridge Research Center

CH2473-7/87/0000/0512$01.00 © 1987 IEEE

Abstract

This paper describes the VLSI implementation of a third generation symbolic processor optimized for the Lisp language. The design effort has resulted in a single chip implementation of a complete CPU which efficiently executes Lisp primitives and offers performance, system integration and manufacturability advantages over current approaches to Lisp processors.

1. Introduction

Lisp processors have been implemented in a variety of technologies with a number of objectives in mind. Contemporary Lisp processors use more board area, more power, and are more expensive than contemporary VLSI microprocessors [1,2,3]. As the market for symbolic processing matures there is a need both for higher performance, lower cost stand-alone systems and for embedded systems. The Ivory architecture was initiated with the intent of designing a single chip processor which provides on-chip support for a rich Lisp software environment.

This paper describes the resulting micro-architecture and some of the implementation details of the Symbolics Ivory, a single chip symbolic microprocessor.

2. Ivory Microarchitecture

The Ivory microprocessor implements a 4-stage pipeline as shown in Figure 1. The first stage fetches the instruction, decodes it, and adjusts the program counter. The second stage fetches the initial microinstruction, computes the operand address, and adjusts the stack pointer. The third stage fetches the operands and computes the result. The fourth stage stores the result, unless a fault has occurred, in which case it restores the state of the third stage.

    PROGRAM: PUSH A; PUSH B; ADD; POP C

    I        D        E        C
    ADD      PUSH B   PUSH A   -
    POP C    ADD      PUSH B   PUSH A
    -        POP C    ADD      PUSH B
    -        -        POP C    ADD
    -        -        -        POP C

Figure 1. Ivory Pipeline Stages

Instructions spend only a single cycle in the first two pipeline stages, but can spend an arbitrary number of cycles in the execute stage. Simple instructions, such as "push", "add", and "eq", execute in a single cycle. Conditional branches are resolved in the second (D) stage. A taken conditional branch executes in two cycles; a not-taken branch [...]

Figure 2 shows a block diagram of the Ivory CPU. Instructions are fetched directly from a 32 word (up to 64 instructions) direct-mapped instruction cache. The cache, which is filled by an autonomous prefetcher, serves to buffer instructions arriving from memory and hold small loops. A bypass path provides the instruction from memory when the first stage is stalled on a cache miss.

[Block diagram; legible labels: stack cache, ALU, control logic, microcode, registers, instruction decode, instruction cache, prefetcher.]

Figure 2. Ivory Processor Block Diagram

The operand address calculation data path contains the stack frame pointer registers and a 32-bit adder/subtractor. It computes the address of the operand and the stack pointer adjustment according to the macroinstruction. This data path is also used in parallel with the main data path for function call/return.

Microcode is stored in a 1200 word x 180 bit ROM. The ROM reads 8 words in parallel, allowing a late 1-of-4 select of the next microinstruction to be based on the current ALU condition.

The main data path contains a 128 word top-of-stack cache, a scratchpad (which contains a duplicate of the top word on the stack in a fixed location), the ALU, and tag checking logic. The ALU includes an adder, boolean unit, integer [...] logic, and support for one-bit-per-cycle multiply/divide. Tag checking is done in parallel with the ALU operation, so that in the common cases no time penalty is paid for type checking [4]. Similarly, ECC checking of data from memory is done in parallel with the ALU using the on-chip ECC logic. Bypass paths for both the top-of-stack cache and scratchpad forward the result of the previous instruction to the ALU as necessary.

An on-chip 16 entry fully associative cache holds recent virtual to physical address translations. The cache is maintained by microcode, with hardware to assist searching in-memory translation tables and cache replacement selection.

The Ivory processor supports a pipelined memory bus which can have up to four outstanding requests at once. An associative queue of outstanding request addresses is maintained for detecting when instructions arrive from memory and installing them into the instruction cache. The memory interface protocol is implemented by an independent state machine which arbitrates between on-chip users of the memory system and other bus masters.

Hardware included for testing includes scan-out registers on the microcode ROM and real time observation of the micro program counter. All hardware defined registers are accessible from high level language primitives. The microcode ROM may be checksummed via the scan-out path while running Lisp.

3. VLSI Implementation

Ivory was designed using a highly integrated proprietary VLSI design system called NS [5], which is implemented on the Symbolics 3600 Lisp machine. This provides a highly interactive design and simulation environment and, in addition, a symbolic layout methodology that allows the [...]. Extensive use was made of [...] various modules [...] this capability [...] NS design system. Examples include the ECC generation logic, instruction-decode [...].

A two phase clock is employed which is routed throughout the chip. Registers and latches use nMOS pass transistors with restoring inverters as the basic storage device. An extra synchronization clock is used to generate memory timing signals.

In line with the ability to retarget the design using the symbolic layout tools and the sheer impossibility of hand optimizing 390,000 transistors, the circuit design of the chip was kept fairly simple. The majority of logic in the design is static: either fully complementary CMOS logic gates, pseudo nMOS gates for high fan-in NOR gates, or nMOS pass transistor logic for selected muxes. Dynamic gates are only used in the microcode ROM, instruction decoder and the RAMs. The RAMs are implemented using a standard 6 transistor static cell, while the CAMs in the design use a 10 transistor static cell.

All control logic is in the form of static, fully complementary standard cells. The symbolic layout for control modules was automatically generated using a standard cell place and route system. A standard cell library, memory library and data-path library are used for most of the designs in the chip. Customization of logic and circuits is kept to a minimum to ensure uniformity of performance and reduce the amount of verification required at the circuit level.

As an example, the design employs five 32 bit adders which all have to operate twice every clock cycle. The static adder provides a good compromise between size and speed. Based on a Manchester carry adder, it employs a passaround path controlled by a distributed pseudo-nMOS lookahead gate to improve speed. The bit pitch is 40 µm while the length of the adder is 232 µm in 2 µm rules. It may be used as a single unit or combined into a carry-select adder. The typical speed for a 2 µm process is around 26 ns.

                Transistors (%)   Area (%)   Relative Density
    Control           7.5         13.[...]        [...]
    Data-path        32.6           17.2           3.3
    Memory           23.9            7.9           5.4
    µ-Code           37.9           11.4           4.8
    Pads              3.6            3.9           0.64
    Routing           8             48             0
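Section 3 states that the static adder may be used as a single unit or combined into a carry-select adder. The following is a minimal behavioural sketch of that carry-select combination, written in Python, and is not a model of the actual circuit. The function names and the default 4-bit block width are illustrative assumptions that do not come from the paper, and the Manchester carry chain and passaround path inside each block are abstracted to ordinary integer addition.

```python
def block_add(a, b, carry_in, width):
    """Add two width-bit values plus a carry; return (sum, carry_out).

    Stands in for one adder block; the internal carry chain is not modelled.
    """
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, carry_in=0, width=32, block=4):
    """Carry-select addition over `width` bits, `block` bits at a time.

    `block=4` is an assumed value chosen for illustration only.
    Returns (sum, carry_out).
    """
    if width % block:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    result, carry = 0, carry_in
    for shift in range(0, width, block):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        # In hardware both candidate results are computed concurrently,
        # one assuming carry-in 0 and one assuming carry-in 1 ...
        sum0, cout0 = block_add(x, y, 0, block)
        sum1, cout1 = block_add(x, y, 1, block)
        # ... so the real incoming carry only has to steer a 2:1 select.
        block_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= block_sum << shift
    return result, carry


if __name__ == "__main__":
    total, carry_out = carry_select_add(0x12345678, 0x0FEDCBA8)
    print(hex(total), carry_out)
```

The point of the organization is that each block's two candidate sums are ready before the carry from the lower blocks arrives, so the critical path through the upper blocks reduces to a chain of selects rather than a full ripple. The sketch performs the selection sequentially only because it is software.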
4. Feature Review

With respect to the requirements of a commercial Lisp processor, the following reviews the contribution of Ivory hardware:

* Fast call and return - Specialized datapaths, parallel operations, and fast cycle time support the complex calling strategies required by Lisp.

* Runtime type checking - Parallel tag processor, late-branch ROM and comprehensive trap logic support generic arithmetic and pointer manipulation.

* Specialized Lisp operations - Pipelined memory interface and high level microcoded primitives support efficient implementation of operations such as CAR and CDR.

* Garbage Collection - On chip hardware to facilitate efficient GC algorithms such as the Ephemeral GC [6].

* Virtual Memory Support - On chip translation buffer, microcoded cache-miss backup and the support of CDR-coded lists (more compact physical memory representation).

Other features which improve performance and lower system integration cost include:

* A programmable interleaved memory interface to allow a wide range of memory system speeds and architectures to be used, ranging from small high speed caches to four-way interleaved standard MOS memories.

5. Summary

This paper has presented the micro-architecture and the significant technical features of the Ivory CPU. It is the first single chip Lisp CPU with the features required of next generation Lisp CPUs.

6. References

[1] Motorola, "MC68020 32-Bit Microprocessor User's Manual," Prentice Hall, 1985.

[2] J. Crawford, "Architecture of the Intel 80386," ICCD '86, pp. 155-160.

[3] C. Hansen, D. Freitas, E. Hudson, J. Moussouris, S. Przybylski, T. Riordan and C. Rowen, "A RISC Microprocessor with Integral MMU and Cache Interface," ICCD '86, pp. 145-148.

[4] D. Moon, "Architecture of the Symbolics 3600," Proceedings of the 12th Symposium on Computer Architecture, June 1985, pp. 76-83.

[5] J. Cherry, H. Shrobe, N. Mayle, C. Baker, H. Minsky, K. Reti, and N. Weste, "NS: An Integrated Symbolic Design System," VLSI '85, E. Hoerbst, ed., Elsevier Science Publishers B. V., North-Holland, Amsterdam, 1986, pp. 325-334.

[6] D. A. Moon, "Garbage Collection in a Large Lisp System," 1984 ACM Symposium on Lisp and Functional Programming, pp. 235-248.