SESSION 11: MICROPROCESSORS

WAM 2.6: A 32b CMOS VLSI Microprocessor with On-Chip Virtual Memory Management

Yoichi Yano, Jyun-ichi Iwasaki, Yoshikuni Sato, Toshiki Iwata, Katsuhiko Nakagawa, Masahiro Ueda

NEC Microcomputer Products Division

Kawasaki, Japan

THIS PAPER WILL DESCRIBE a single-chip 32b CMOS VLSI microprocessor which integrates the virtual memory management for demand-paging (4Kb page size) and the floating-point operations that are compatible with the IEEE 754 Floating-Point Standard. The chip microphotograph is shown in Figure 1. It has been implemented by using a double-metal-layer CMOS process technology with a 1.5µm design rule to integrate 375,000 transistors on a single chip. It operates at 16MHz and consumes 1.5W.

The processor has six independently operating function units that form a pipeline structure, as shown in Figures 2 and 3. The PFU (Prefetch Unit) prefetches instructions into a 16-byte prefetch queue. The IDU (Instruction Decode Unit) decodes the instructions and sets commands into a two-word by 53b decoded instruction queue (IDQ). The EAG (Effective Address Generator) calculates the operand address, while the MMU (Memory Management Unit) translates the virtual address into a real address. A BCU (Bus Control Unit) initiates memory access for instruction/data fetch. The EXU (Execution Unit) carries out the instruction-set function.

The integrated memory management unit (MMU) has a 16-entry fully associative Translation Look-aside Buffer (TLB) and protection check circuitry. The TLB holds sixteen virtual-to-real address pairs in a fully associative manner; each entry consists of a 21b content-addressable memory (CAM) for the virtual address tag and a 28b data memory for the real address. The TLB can translate a virtual address to a real address in 36ns in the worst case.

The execution unit (EXU) is a microprogrammed 32b data path processor which has thirty-two 32b general-purpose registers, sixteen 32b scratch-pad registers, a 64b barrel shifter, a 32b arithmetic logic unit (ALU), and a couple of control registers. Three data buses running across the chip connect the registers, the shifter, and the ALU. The general-purpose registers are three-ported: two ports are connected to two data buses for instruction execution, and the other is for preprocessing in effective address calculation by the EAG. The ALU has carry-look-ahead (CLA) circuitry which is based on ratio circuitry, instead of the standard ratio-less implementation, to meet the speed requirements. The CLA circuit realizes 13ns propagation time (typical) in 32b operations. In addition, the ALU implements the second-order Booth's algorithm, which enables a 16-clock multiplication for 32b data. A barrel shifter provides various shift operations in 15ns (typical), including the logical shift, arithmetic shift, and rotation with carry. A 191,808b microprogram ROM has been implemented by employing ratio circuitry for the address decoder and current-mirror-type sense amplifiers for the output drivers to meet the speed requirements. This realizes 23ns access time (typical).

A 16MHz clock is internally distributed to minimize the clock skew. Each clock line, for phi1 and phi2 respectively, is driven by two clock drivers located on opposite sides of the chip to allow unskewed clock timing over the whole chip. This method affords a 2ns delay for the rising/falling edge (typical).

For chip testing, large PLAs have been designed to be testable and observable by setting test-pattern data into the inputs of the PLAs from external pins, to generate signatures in the horizontal direction of the PLA. The test-pattern data are designed so that the signature is zero, and the signature can be read through pin-outs. Also, most of the PLAs have condensed-output lines which generate a compressed value of the output that can be observed through pin-outs. To test large decoders, special sense amplifiers have been attached to detect the excessive current flow that is typical of stuck-at faults, such as multiple-selection defects. For testing, a microword can be set serially from an external pin, and all the microcode ROM contents can be read externally.

The microprocessor has 273 instructions in 119 types. The chip has a 13.92mm x 13.80mm die size and is housed in a 68-pin Pin Grid Array (PGA) package with a non-multiplexed 16b data bus and 24b address. It can execute 3.5MIPS (million machine instructions per second) at 16MHz operation.

Acknowledgments

The authors would like to thank M. Suzuki, T. Furuhashi, and M. Mimno for their helpful suggestions. They are also indebted to H. Sasaki, K. Kani, J. Takashima, A. Morino and H. Yamamoto for their support throughout the project.
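The 16-clock figure quoted for a 32b multiplication is consistent with second-order (radix-4) Booth recoding, which retires two multiplier bits per step. The Python sketch below only illustrates that recoding; it is not the chip's microcode or data path. The names `to_signed` and `booth_radix4_multiply`, and the assumption that operands are 32b two's-complement values, are ours.

```python
def to_signed(value, bits=32):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_radix4_multiply(multiplicand, multiplier, bits=32):
    """Signed multiply using radix-4 (second-order) Booth recoding.

    Returns (product, steps). One step handles two multiplier bits,
    so steps == bits // 2, i.e. 16 for 32b operands.
    """
    a = to_signed(multiplicand, bits)
    b = multiplier & ((1 << bits) - 1)    # raw bit pattern of the multiplier
    product = 0
    prev = 0                              # implicit bit to the right of bit 0
    steps = 0
    for i in range(0, bits, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digit = b0 + prev - 2 * b1        # Booth digit in {-2, -1, 0, 1, 2}
        product += (digit * a) << i       # add the shifted partial product
        prev = b1
        steps += 1
    return product, steps


if __name__ == "__main__":
    result, steps = booth_radix4_multiply(-7, 13)
    print(result, steps)
```

Each iteration stands in for one add/subtract-and-shift step; how the real ALU maps these steps onto clocks is not stated in the text beyond the 16-clock total.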

[See page 296 for Figure 1.]

[Block-diagram labels: Microcode ROM, Control, General-purpose Registers, PC, EAG]

FIGURE 2-Block diagram of the CPU.

FIGURE 3-Pipeline organization.
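The fully associative translation performed by the MMU can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the chip's design: the 16 entries and 4K page size come from the text, but the class name `TLB`, the plain split of the address into page number and offset, and the round-robin replacement policy are hypothetical. The sketch does not model the 21b tag and 28b data field layouts or the protection check.

```python
PAGE_SIZE = 4096      # 4K page size, from the text
TLB_ENTRIES = 16      # 16-entry TLB, from the text


class TLB:
    """Toy fully associative TLB: any entry may hold any virtual page."""

    def __init__(self):
        self.entries = []        # list of (virtual_page, real_page) pairs
        self.next_victim = 0     # round-robin replacement (an assumption)

    def lookup(self, vaddr):
        """Return the real address, or None on a TLB miss."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        # A CAM compares every tag at once; this loop scans them in turn.
        for tag, rpage in self.entries:
            if tag == vpage:
                return rpage * PAGE_SIZE + offset
        return None

    def insert(self, vpage, rpage):
        """Install a translation, replacing an old entry when full."""
        for i, (tag, _) in enumerate(self.entries):
            if tag == vpage:                 # refresh an existing mapping
                self.entries[i] = (vpage, rpage)
                return
        if len(self.entries) < TLB_ENTRIES:
            self.entries.append((vpage, rpage))
        else:
            self.entries[self.next_victim] = (vpage, rpage)
            self.next_victim = (self.next_victim + 1) % TLB_ENTRIES


if __name__ == "__main__":
    tlb = TLB()
    tlb.insert(0x12345, 0x0AB)
    print(hex(tlb.lookup(0x12345678)))
```

On a miss (`None`), a real MMU would consult the page tables and then install the translation; that path is outside this sketch.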

General-Purpose Registers        32 (32b)
Instructions                     273 instructions in 119 types
Virtual Address Space            4Gb per space
Translation Look-Aside Buffer    16-entry Full-Associative
32b Program Counter              4
32b Adder/ALUs                   5
64b Barrel Shifter               1
PLAs                             31
Microcode ROM                    191,808b
Process                          Double-metal layer CMOS
Design Rule                      1.5µm
Transistors                      375,000
Chip Size                        13.92mm x 13.80mm
16MHz Microcycle                 62.5ns
Power Dissipation                1.5W
Power Supply                     +5V
Package                          68-pin PGA
Performance                      3.5MIPS

TABLE 1-Features of the CPU.

FIGURE 1-Microphotograph of the chip.
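A few derived figures follow from the numbers quoted for the chip. The short Python calculation below is a back-of-the-envelope check, assuming one microcycle equals one 16MHz clock period (which the 62.5ns entry suggests). The average clocks per instruction and the multiplication time are our derivations, not values stated by the authors, and the multiplication time covers only the 16 Booth steps, excluding any fetch or setup overhead.

```python
CLOCK_HZ = 16_000_000    # 16MHz operating frequency, from the text
MIPS = 3.5               # quoted performance, from the text
BOOTH_CLOCKS = 16        # clocks for a 32b multiplication, from the text

# One clock period in nanoseconds.
cycle_ns = 1e9 / CLOCK_HZ

# Average clocks spent per instruction at the quoted MIPS rating.
clocks_per_instruction = CLOCK_HZ / (MIPS * 1e6)

# Time for the 16 multiplication clocks alone.
multiply_ns = BOOTH_CLOCKS * cycle_ns

if __name__ == "__main__":
    print(f"cycle time: {cycle_ns} ns")
    print(f"average clocks per instruction: {clocks_per_instruction:.2f}")
    print(f"32b multiply (16 clocks): {multiply_ns} ns")
```

The cycle time reproduces the 62.5ns microcycle, and the 3.5MIPS rating corresponds to roughly four and a half clocks per instruction on average.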