The Symbolics Ivory Processor: a 40 Bit Tagged Architecture Lisp

The Symbolics tedey iit A 40 Bit Taggeda9 Archhectapetigp: MicroprocessorMicrop . Clark Baker, David Chan, Jim Gli xy.-Alan Corry, Greg Efland, Bruce Edwards, Mark Mateon#ie Minsky, Eric Nestler, Kalman Heti, David Sarraxin; Oiearles Sommer, David Tan Bid VLSI Systems Group Symbolics Cambridge Research Center branches are resolved in the second (D) stage. A taken conditional branch executes in two cycles; a not-taken branch Abstract ‘2 2 shows a block diagram of the Ivory CPU. Instruc- This paper describes the VISI implementation of a third tiahe are fetched directly from a 32 word (up to 64 generation symbolic processor optimized for the Lisp lan- jnatructiona) direct-mapped instruction cache. The cache, guage. The design effort has resulted in a single chip im- Jwiich ‘ig filled by an autonomous prefetcher, serves to buff- plementation of a complete CPU which efficiently executes er. instructions arriving from memory and hold email Lisp primitives and offers performance, system integration loops. A_ bypass path provides the instruction and manufacturability advantages over current approaches Fikestrom memory when the first stage is stalled on a to Lisp processors. Heche miss. Te operand address calculation data path contains the stack frame pointer registers and a 32-bit adder/subtractor. 1. Introduction Itiaenaputes the address of the operand and stack pointer adjustment according to the macroinstruction. This data ig aleo used in parallel with the main data path to ac- Lisp processors have been implemented in a variety of tech- a function call/return. nologies with a number of objectives in mind, Contem- Dey porary Lisp processors use mare board area, more power, roedde ia stored in a 1200 word x 180 bit ROM. This and are more expensive than contemporary VLSI reads 8 words in parallel, allowing a late 1-of-4 select microprocessors LU233]. As the market for symbolic the. neat microinstruction to be based on the current processing matures there is a need for both higher perfor- ALM tondition. mance, Jower cost stand-alone systems, and embedded ays- Fhemain data path contains a 128 word top-of-stack cache, tems. _ The Ivory architecture was initiated with the intent g:62-word scratchpad (which contains a duplicate of the top of designing a single chip processor which provides on-chip word on the stack in a fixed location), the ALU, and tag support for a rich Lisp software environment. ing logic. The ALU includes an adder, boolean unit, integer This paper describes the resulting micro-architecture and C logic, and support for one-bit-per-cycle some parallel with with of the implementation details of the Symbolics Ivory, a pmiliply/divide. Tag checking is done in single chip symbolic microprocessor. the ALU operation, so that in the common cases no time pemslty is paid for type checking [4]. Similarly, ECC check- jag of data from memory is done in parallel with the ALU 2. Iwory Microarchitecture using the on-chip ECC logic. Bypass paths for both the top- of-atack cache and scratchpad forward the result of the pre- wittas instruction to the ALU as necessary. The Ivory_microprocessor implements a 4-stage pipeline as An on-chip 16 entry fully associative cache holds recent vir- shown in Figure 1 The first stage fetches the inatruction, tual to physical address translations. The cache is main- decodes it, and adjusts the program counter. The second ‘tained by microcode, with hardware to assist searching in- stage fetches the initial microinstruction, computes the metagty translation tables and cache replacement selection. operand address, and adjusts the stack pointer, The third stage The Ivory processor supports a pipelined memory bus which fetches the operands and computes the result. The fourth stage stores the result, unless a fault has occurred, ean have up to four outstanding requests at once. An as- in which case it restores the state of the third stage. acciative queue of outstanding request addresses is main- tained for detecting when instructions arrive from memory PROGRAM: I 0 E C and installing them into the instruction cache. The memory PUSH A . interface protocol is implemented by an independent state PUSH B ; ADD PUSHB PLSHA — machine which arbitrates between on-chip users of the ADD PCPC ADC PUSHB PUSHA memory system and other bus masters. POP C —— poR ADD PUSH Hardware included for testing includes scan-out registers on the microcode ROM and real time observation of the micro — — POPC ADD program counter. All hardware defined registers are acces- _— —_- — POPC sible from high level language primitives. The microcode ROM may be checksummed via the scan-out path while run. Figure 1. Ivory Pipeline Stages ning Lisp. Instructions spend only a single cycle in the first two —- pipeline stages, but can epend an arbitrary number of cycles in the execute stage. Simple instructions, such as “push”, add", and “eq” execute in a single cycle. Conditional (CH2473-7/87/0000/05 12$01.00 © 1987 IEEE STACK CACHE { t f “ ag On Deu! ADORE SS fant ALU CONTROL Logic “acon 3 AT Ba MCROCODE t — * Aon OF peayy Ot * CH Moose ADORESS PCy REGISTERS OSTAUCTION INSTALICTION. 0€cone CACHE PAEFETCHER unr DATA 4 [ Figure 2. Ivory Processor Block Diagram 3. VLSI Implementation 32 bit adders which all have to operate twice every clock cycle. The static adder Provides a good compramise between size and speed, Based on a hit Manchester carry Ivory was designed using a highly integrated proprietary adder it employs a Passaround path controlled by a dis- VLSI design system called NS (5), which ig implemented on tributed pseudo-nMOS lookahead gate to improve speed. The the Symbolics 3600 Lisp machine. This provides a highly bit pitch is 40u while the length of the adder is 232u in 2u interactive design and simulation environment and in ad. Tules. It may be used as a aingle unit or combined into a dition, a symbolic layout methodology that allows the com- carry-select adder. The typical speed for a 2u process ig around 26nS. Extensive use was made of sim various modules, thig capability NS design system. Examples include the ECC generation logic, instruction-decode ro: j A two phase clock is employed which is routed throughout the chip. Registers and latches use nMOS pass transistors with restoring inverterg as the basic storage device, An ex- tra synchronization clock ia used to generate memory timing Transistors(z) Area(%) Relative Density signala. Cortrol 7.§ 13. In line with the ability to retarget the design using the ? Data-path 32.6 17.2 3.3 symbolic layout tools and the sheer impossibility of hand op- Memory 23.9 7.9 5.4 timizing 390,000 transistors, the circuit design of the chip u-Code 37.9 11.4 4.8 was kept fairly simple. The majority of logic in the design Pads 3.6 3.9 .64 Is static - either fully complementary CMOS logic gates, Routing 8 48 Q pseudo nMOS gates for high fan-in NOR gates or nMOS pass transistor logic for selected muxes, Dynamic gates are only used in the microcode ROM, instruction decoder and the RAMs. The RAMsare implemented using a standard 6 transistor static cell, while the CAMs in the design use a 10 transistor static cell, All control logic is in the form of static, fully complementary atandard cells. The symbolic layout for control modules was automatically generated using a standard cell place and route syatem. A standard cell library, memory library and data-path library is used for most of the designs in the chip. Customization of logic and circuita is kept to a minimum to ensure uniformity of performance and reduce the amount of verification required at the circuit level. As an example, the design employs five 513 a i. Feature Review 8:Summary With respect to the requirements of a commercial Lisp ae is paper has presented the micro-are b processor the following reviews the contribution of Ivory ‘the significant technical features of the: i 5. , hardware: ' CPU.It is the first single chip Lisp CPU the features required of next generation Lisp CPUs. ” * Fast call and return - Spvcialized datapaths, paratiel © operations, and fast cycle time support the complex calling strategies required by Lisp, 6. References + Runtime type checking - Parallel tag processor, late- (1] Motorola, “MC88020 32-Bit Microprocessor branch ROM and comprehensive trap User's logic support Manual,” Prentice Hall, 1985. generic arithmetic and pointer manipulation. (2] J. Crawford, “Architecture of the Intel 80368," ICCD * Specialized Lisp operations - Pipelined memory inter- "86, pp. 155-160. face and high level microcoded primitives support ef- {3]_ C. Hansen, D. Freitas, E. Hudson, J. Mouazsouris, ficient implementation of operations auch as CAR and 5. Przybylski, TT. Riordan and ¢. Rowen, "A RISC CDR, Microprocessor with Integral MMU and Cache Interface," ICCD "86, Pp. 145-148. ¢ Garbage Collection - On chip hardware to facilitate ef. {4] D. Moon, "Architecture of the Symbolies 3600," Proceedings ficient GC algorithms suchas the Ephemeral GC [6]. of the [2th Symposium on Computer Architec- ture, June 1985, pp. 76-83. * Virtual Memory Support - On chip translation buffer, (5): J. Cherry, H. Shrobe, N. Mayle, C. Baker, H. , = microcoded cache-misg hackup and the support of K. Reti, and N. Weste, "NS: An Integrated Symbolic Design System,” CDR-coded lists (more compact physical memory VLSI ‘85, E. Hoerbst, ed, Elsevier Science Publishers B. V., North-Helland, Amsterdam, 1986, representation). pp. 325-334. Other features which improve performance and lower sys- {6] D. A Moon. "Garbage Collection in a Large Lisp tem integration cost inciude- System,” 1984 ACM Sympostum on Lisp and Functional Programming, pp. 235-248. « A programmable interleaved memory interface to allow a wide range of memory system speeds and architec- tures to be used - ranging from small high speed caches to four-way interleaved standard MOS memories.

The Symbolics Ivory Processor: a 40 Bit Tagged Architecture Lisp

Genera Workbook Using This Book Preface This Is the Document To

The Evolution of Lisp

Symbolics Architecture

Free As in Freedom (2.0): Richard Stallman and the Free Software Revolution

A Lisp Oriented Architecture by John W.F

CLX — Common LISP X Interface

Genera User's Guide Overview of Symbolics Computers

Genera Concepts Genera the Best Software Environment Available

Common Lisp Object System

Selected Essays of Richard M. Stallman

The Symbolics Virtual Lisp Machine Or Using the Dec Alpha As a Programmable Micro-Engine1

Object Oriented Programming (OOP) Imperative Programming