Digital Semiconductor Product Brief

March 1995

Description

The Alpha 21164 microprocessor is a high-performance implementation of Digital’s Alpha architecture designed for application servers and high-performance clients. It has a superscalar design capable of issuing four instructions every clock cycle. The integration of an instruction cache, data cache, and second-level cache offers unrivaled microprocessor performance. The 21164 uses a high-performance interface to access main memory, data buses, and an optional board-level cache.

Features

• Fully pipelined 64-bit advanced RISC (reduced instruction set computing) architecture supports multiple operating systems, including:
  - Microsoft Windows NT
  - Digital UNIX
  - OpenVMS
• Best-in-class performance
  - 266 through 300 MHz operation
  - 290 through 330 SPECint (est.)
  - 440 through 500 SPECfp (est.)
  - Superscalar (4-way instruction issue)
  - Peak instruction execution rate of over 1200 million instructions per second
  - 0.50 µm CMOS technology
• Pipelined (9-stage) floating-point unit
  - IEEE single- and double-precision operation
  - VAX F_floating and G_floating data types
  - Longword and quadword data types
• Pipelined (7-stage) integer unit
• Memory-management unit
  - Demand-paged memory management
  - 48-entry, fully associative instruction translation buffer
  - 64-entry, fully associative data translation buffer
  - Each translation buffer entry able to map 1, 8, 64, or 512 8KB pages; each entry supports all four granularity hint bit combinations
• Onchip, 8KB, direct-mapped, write-through L1 data cache
• Onchip, 8KB, direct-mapped L1 instruction cache
• Onchip, 96KB, 3-way, set-associative write-back L2 unified instruction and data cache
• Onchip write buffer with six fully associative 32-byte entries
• High-performance interface
  - 128-bit memory data path
  - Selectable error correction code (ECC) or parity protection on data
  - 40-bit addressing
  - Programmable interface timing
  - Two outstanding load instructions
  - Control for optional offchip L3 cache
  - Synchronous/asynchronous RAM support
  - Programmable cache block size
  - Programmable cache speed, one-third to one-fifteenth of clock speed
• Programmable performance counters to measure CPU and system performance
• Serial ROM interface
  - Loads instruction cache after reset
  - Allows software-controlled serial port after initialization
• Chip- and module-level test supports JTAG (IEEE 1149.1)
• 3.3-V I/O supply voltage with chip interface directly to 5-V logic
• 499-pin ceramic interstitial pin grid array (IPGA) package
• Privileged architecture library code (PALcode) supports
  - Optimization for multiple operating systems
  - Flexible memory-management implementations
  - Multi-instruction atomic sequences

Alpha 21164 Microarchitecture

The 21164 microprocessor consists of five independent functional units: the instruction fetch, decode, and branch unit; the integer execution unit; the memory-management unit; the cache control and bus interface unit; and the floating-point unit. There are three onchip caches: the instruction cache, the data cache, and the second-level cache.

Pipeline Organization: The microprocessor uses a 7-stage pipeline for integer operate and memory reference instructions. It uses a 9-stage pipeline for floating-point operate instructions. The instruction unit maintains state for all pipeline stages to track outstanding register write transactions.

Instruction Fetch, Decode, and Branch Unit: The instruction unit fetches, decodes, and issues instructions to the integer unit, memory-management unit, and floating-point unit. It manages the pipelines, the program counter (PC), the instruction cache, prefetching, and instruction stream memory management. The instruction unit can decode up to four instructions in parallel and check that required resources are available for each.

Integer Execution Unit: The integer execution unit contains two 64-bit integer pipelines. Results of most integer operations are available for use by subsequent instructions. The integer unit also partially executes all memory instructions by calculating the effective address.

Memory-Management Unit: The memory-management unit processes all load and store instructions; two load instructions can be executed in parallel. (The data cache is dual ported to support this.) Up to 21 load instructions can be in progress at any time. The memory-management unit also manages the data cache and logic that serializes and merges load instructions that miss the data cache.

Cache Control and Bus Interface Unit: The cache control and bus interface unit processes all accesses sent by the memory-management unit and implements all memory-related external interface functions. The cache control and bus interface unit also manages all cache coherence protocol functions. It controls the L2 cache and the optional offchip L3 cache.

Floating-Point Unit: The floating-point unit contains a floating-point multiply pipeline and a floating-point add pipeline. (Divides are associated with the add pipeline but are not pipelined themselves.) IEEE S_ and T_floating data types are supported with all rounding modes. VAX F_floating and G_floating data types are fully supported. VAX D_floating data types are partially supported.

Instruction Cache: The instruction cache is an 8KB virtual direct-mapped cache with 32-byte blocks.

Data Cache: The data cache is a dual-ported, 8KB, write-through, read-allocate, direct-mapped, physically addressed cache with 32-byte blocks.

L2 Cache: The onchip L2 cache is a 96KB, 3-way, set-associative, physical, write-back, write-allocate, data and instruction cache. The cache is fully pipelined and supports both 32-byte and 64-byte blocks.

Optional Offchip L3 Cache: The 21164 supports and fully manages a direct-mapped external backup cache (optional). This cache is a physical, write-back, write-allocate cache with 32-byte or 64-byte blocks. The L3 cache controller supports synchronous and asynchronous cache RAMs. Wave pipelining can be used with asynchronous RAMs in the 64-byte block mode. It is a mixed data and instruction cache. The user can select an L3 cache size of 1, 2, 4, 8, 16, 32, or 64MB.

Big Endian Support: The 21164 provides limited support for big endian data formats. One mode inverts physical bit <2> for all longword references. Another mode complements operand Rbr <2:0> on EXTxy, INSxy and MSKxy instructions.

The virtual address is a 64-bit unsigned integer that specifies a byte location in the virtual address space. The microprocessor implements a 43-bit subset of the virtual address space and supports a 40-bit, 1-terabyte physical address space.

Thermal Management

The 21164 dissipates approximately 50 watts. Conventional forced air cooling methods are sufficient to remove heat and maintain the highest levels of reliability. The user may also define an application-specific heat sink. Digital specifies two separate heat sinks for this device. Both are specified to 1000 LFM. One heat sink has mounting holes in line with the cooling fins and is approximately 6.6 × 6.6 × 3.3 cm (2.6 × 2.6 × 1.3 in). The other heat sink has mounting holes rotated 90 degrees from the cooling fins and measures approximately 7.6 × 7.6 × 4.6 cm (3 × 3 × 1.8 in).

Alpha Architecture Summary

As implemented in the 21164, the Alpha architecture supports:

• A fixed 32-bit instruction size
• Separate integer and floating-point registers
  - Thirty-two 64-bit integer registers
  - Thirty-two 64-bit floating-point registers
• 32-bit (longword) and 64-bit (quadword) integer data types
• 32-bit and 64-bit IEEE and VAX floating-point data types
• Memory accesses, using a 64-bit virtual byte address
• Privileged architecture library code (PALcode)

Instruction Set

All instructions are 32 bits long and use one of four different instruction formats. Each format uses a 6-bit opcode and zero, one, two, or three 5-bit register fields.

CALL_PAL Instructions: These instructions vector to a privileged layer of software that atomically performs both privileged and unprivileged functions.

Branch Instructions: Conditional branch instructions test a register for positive/negative, zero/non-zero, or even/odd, and perform a PC-relative branch. Unconditional branch instructions perform either an absolute or PC-relative jump, using an arbitrary 64-bit register value. Unconditional branch instructions can update a destination register with a return value.

Load/Store Instructions: These instructions can move either 32-bit or 64-bit quantities. Eight-bit and 16-bit load/store operations are supported through an extensive set of in-register byte manipulations. The I/O bus directly supports byte and word operations in hardware.

Operate Instructions: Integer operate instructions manipulate full 64-bit values and include a full complement of arithmetic, compare, logical, and shift instructions. There are also three 32-bit integer operates: add, subtract, and multiply.

In addition to conventional RISC operation, the instruction set provides scaled add/subtract for quick subscript calculation; 128-bit multiply for multiprecision arithmetic and division by a constant; conditional moves for avoiding branches; and an extensive set of in-register byte manipulation instructions.

Floating-point operate instructions include four complete sets of instructions for IEEE single-precision, IEEE double-precision, VAX F_floating, and VAX G_floating arithmetic. In addition to arithmetic instructions, there are instructions for conversions between floating and integer values, including the VAX D_floating data type.

PALcode

PALcode is a privileged layer of software that automatically performs such functions as the dispatching and servicing of interrupts, exceptions, task switching, and additional privileged and unprivileged user instructions as specified by operating systems using the CALL_PAL instruction.

PALcode is the only method of performing some operations on the hardware. In addition to the instructions defined by the Alpha architecture, a set of implementation-specific instructions is provided.

PALcode runs in an environment with privileges enabled, instruction stream mapping disabled, and interrupts disabled. Disabling memory mapping allows PALcode to support functions such as TB-miss routines. Disabling interrupts allows the instruction stream to provide multi-instruction sequences as atomic operations.

Memory Management

The memory-management architecture provides:

• A large address space for instructions and data
• Convenient and efficient sharing of instructions and data
• Independent read and write access protection
• Flexibility through programmable PALcode support

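Several figures quoted in the Features and Microarchitecture text follow directly from the stated cache sizes, block sizes, page size, and address widths. The short Python sketch below re-derives them as a cross-check. The function and variable names are hypothetical and chosen only for illustration, and the mapping of granularity hint values 0 through 3 to 1, 8, 64, and 512 pages is an assumption based on the list of page counts given in the brief.

```python
# Sketch: re-derive figures quoted in the brief from its stated parameters.
# All function and variable names are hypothetical, chosen for illustration.

KB = 1024
TB = 1024 ** 4


def cache_sets(size_bytes: int, ways: int, block_bytes: int) -> int:
    """Number of sets in a set-associative cache (ways=1 for direct-mapped)."""
    return size_bytes // (ways * block_bytes)


def granularity_hint_span(hint: int, page_bytes: int = 8 * KB) -> int:
    """Bytes mapped by one translation buffer entry for hint value 0..3.

    The brief lists 1, 8, 64, or 512 pages; this assumes 8**hint pages.
    """
    if hint not in range(4):
        raise ValueError("hint must be 0, 1, 2, or 3")
    return (8 ** hint) * page_bytes


if __name__ == "__main__":
    # L1 caches: 8KB, direct-mapped, 32-byte blocks.
    print("L1 blocks:", cache_sets(8 * KB, 1, 32))
    # L2 cache: 96KB, 3-way, with either supported block size.
    for block in (32, 64):
        print(f"L2 sets at {block}-byte blocks:", cache_sets(96 * KB, 3, block))
    # Address range covered by one translation buffer entry per hint value.
    for hint in range(4):
        print(f"hint {hint}:", granularity_hint_span(hint) // KB, "KB")
    # 40-bit physical and 43-bit virtual address spaces, in terabytes.
    print("physical:", 2 ** 40 // TB, "TB; virtual:", 2 ** 43 // TB, "TB")
```

Under these assumptions, the 40-bit physical address space works out to the 1 terabyte stated in the brief, and a single translation buffer entry covers between 8KB and 4MB depending on the granularity hint.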
Alpha Architecture Compared to Conventional RISC Architecture

Feature: Alpha Architecture Difference

64-bit architecture: A true 64-bit architecture with 64-bit data and address, not a 32-bit architecture expanded to 64 bits.

High speed: Suited to very high speed implementations. All instructions are simple and 32 bits in length. Memory operations are either load or store instructions. All data manipulation is done between registers. There are no implementation-specific pipeline timing hazards, no load delay slots, and no branch delay slots.

Multiple operating systems: The user can implement a privileged layer of software (PALcode) for operations specific to an operating system. This allows the processor to run OpenVMS by using one version of the privileged software, run OSF/1 by using a second version, or run Microsoft Windows NT by using a third version. Additional operating system implementations can be efficiently supported.

String manipulation: The Alpha architecture has a powerful set of instructions that support dramatic performance improvements in string manipulation. These instructions allow string manipulation to take place in-register 8 bytes at a time. Using these instructions eliminates performance penalties of an 8-byte memory transfer required in other RISC architectures.
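To illustrate what handling a string in-register 8 bytes at a time can look like, the Python sketch below models a byte-wise unsigned compare across a 64-bit quadword and uses it to find a terminating NUL byte. It is a model of the idea only, not of the 21164 hardware. All function names are hypothetical, and the correspondence to the architecture's byte-compare instruction (CMPBGE) should be treated as an assumption.

```python
# Sketch of in-register string handling: examine 8 bytes per 64-bit quadword.
# Function names are hypothetical; this models the idea, not the hardware.

MASK64 = (1 << 64) - 1


def bytewise_ge(a: int, b: int) -> int:
    """Return an 8-bit mask; bit i is set when byte i of a >= byte i of b.

    Bytes are compared as unsigned values, byte 0 being the least significant.
    """
    mask = 0
    for i in range(8):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a >= byte_b:
            mask |= 1 << i
    return mask


def zero_byte_mask(quadword: int) -> int:
    """Bit i is set when byte i of the quadword is zero (0 >= x only if x == 0)."""
    return bytewise_ge(0, quadword & MASK64)


def strlen_by_quadwords(data: bytes) -> int:
    """Length of a NUL-terminated string, scanning one quadword per step."""
    offset = 0
    while True:
        chunk = data[offset:offset + 8].ljust(8, b"\0")  # pad the tail with NULs
        mask = zero_byte_mask(int.from_bytes(chunk, "little"))
        if mask:
            # Index of the lowest set bit = position of the first zero byte.
            return offset + (mask & -mask).bit_length() - 1
        offset += 8
```

For example, `strlen_by_quadwords(b"Alpha 21164\0")` examines two quadwords instead of twelve individual bytes. In hardware the compare would be a single operation per quadword; the inner loop here exists only because Python has no such primitive.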

Optimized arithmetic performance: The Alpha architecture provides the programmer with flexibility in optimizing arithmetic performance. Most conventional RISC processors require precise exception handling for arithmetic traps. This creates inherent bottlenecks in program execution. With Alpha, exceptions are imprecise by default, improving arithmetic performance. The programmer can select precise exceptions by using the trap barrier instruction.

HINTS: A number of implementation-specific HINTS are included for higher performance. Software can provide HINTS to the hardware for optimum operation. HINTS can help improve use of the pipeline, cache memory, and translation lookaside buffers.

Characteristics

Characteristic                                              Specification
Power supply                                                Vss 0.0 V, Vdd 3.3 V ± 5%
Storage temperature range                                   –55°C to 125°C
Power dissipation @ Vdd = 3.45 V and Frequency = 300 MHz    50 W
Package                                                     499-pin ceramic IPGA

For More Information

To learn more about the availability of the Alpha 21164 microprocessor, contact your local semiconductor distributor. To learn more about Digital Semiconductor’s product portfolio, contact the Digital Semiconductor Information Line:

1-800-332-2717
1-800-332-2515 (TTY)
Outside North America, call: +1-508-568-6868

While Digital believes the information in this publication is correct as of the date of publication, it is subject to change without notice.

© Digital Equipment Corporation 1994, 1995. All rights reserved. Printed in U.S.A.

EC-QAENB-TE

Digital, OpenVMS, VAX, the AlphaGeneration design mark, and the DIGITAL logo are trademarks of Digital Equipment Corporation. Digital Semiconductor is a Digital Equipment Corporation business. IEEE is a registered trademark of The Institute of Electrical and Electronics Engineers, Inc. Microsoft is a registered trademark and Windows NT is a trademark of Microsoft Corporation. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd. X/Open is a trademark of X/Open Company Limited. All other trademarks and registered trademarks are the property of their respective holders.