Digital Semiconductor Alpha 21164 Microprocessor Product Brief
Total Page:16
File Type:pdf, Size:1020Kb
Digital Semiconductor Alpha 21164 Microprocessor Product Brief March 1995 Description The Alpha 21164 microprocessor is a high-performance implementation of Digi- tal’s Alpha architecture designed for application servers and high-performance clients. It has a superscalar design capable of issuing four instructions every clock cycle. The integration of an instruction cache, data cache, and second-level cache offers unrivaled microprocessor performance. The 21164 uses a high-performance interface to access main memory, data buses, and an optional board-level cache. Features • Fully pipelined 64-bit advanced RISC • Onchip, 96KB, 3-way, set-associative (reduced instruction set computing) write-back L2 unified instruction and architecture supports multiple operat- data cache ing systems, including: • Onchip write buffer with six fully - Microsoft Windows NT associative 32-byte entries - Digital UNIX • High-performance interface - OpenVMS - 128-bit memory data path • Best-in-class performance - Selectable error correction code - 266 through 300 MHz operation (ECC) or parity protection on data - 290 through 330 SPECint (est.) - 40-bit addressing - 440 through 500 SPECfp (est.) - Programmable interface timing - Superscalar (4-way instruction issue) - Two outstanding load instructions - Peak instruction execution rate of - Control for optional offchip L3 over 1200 million instructions per cache second - Synchronous/asynchronous RAM - 0.50 µm CMOS technology support • Pipelined (9-stage) floating-point unit - Programmable cache block size - IEEE single- and double-precision - Programmable cache speed, one- operation third to one-fifteenth of clock speed - VAX F_floating and G_floating data • Programmable performance counters types to measure CPU and system perform- - Longword and quadword data types ance • Pipelined (7-stage) integer unit • Serial ROM interface • Memory-management unit - Loads instruction cache after reset - - Demand-paged memory manage- Allows software-controlled serial ment port after initialization • - 48-entry, fully associative instruction Chip- and module-level test supports translation buffer JTAG (IEEE 1149.1) - 64-entry, fully associative data • 3.3-V I/O supply voltage with chip translation buffer interface directly to 5-V logic - Each translation buffer entry able to • 499-pin ceramic interstitial pin grid ar- map 1, 8, 64, or 512 8KB pages; ray (IPGA) package each entry supports all four granular- ity hint bit combinations • Privileged architecture library code (PALcode) supports • Onchip, 8KB, direct-mapped, write- TM through L1 data cache - Optimization for multiple operating systems • Onchip, 8KB, direct-mapped L1 instruction cache - Flexible memory-management implementations - Multi-instruction atomic sequences Alpha 21164 Microarchitecture The 21164 microprocessor consists Memory-Management Unit —The Data Cache —The data cache is a of five independent functional units: memory-management unit processes dual-ported, 8KB, write-through, the instruction fetch, decode, and all load and store instructions; two read-allocate, direct-mapped, physi- branch unit; the integer execution load instructions can be executed in cally addressed cache with 32-byte unit; the memory-management unit; parallel. (The data cache is dual blocks. the cache control and bus interface ported to support this.) Up to 21 unit; and the floating-point unit. load instructions can be in progress L2 Cache —The onchip L2 cache is There are three onchip caches: the in- at any time. The memory-manage- a 96KB, 3-way, set-associative, struction cache, the data cache, and ment unit also manages the data physical, write-back, write-allocate, the second-level cache. cache and logic that serializes and data and instruction cache. The cache merges load instructions that miss is fully pipelined and supports both Pipeline Organization —The mi- the data cache. 32-byte and 64-byte blocks. croprocessor uses a 7-stage pipeline for integer operate and memory ref- Cache Control and Bus Interface Optional Offchip L3 Cache —The erence instructions. It uses a 9-stage Unit —The cache control and bus 21164 supports and fully manages a pipeline for floating-point operate in- interface unit processes all accesses direct-mapped external backup cache structions. The instruction unit main- sent by the memory-management (optional). This cache is a physical, tains state for all pipeline stages to unit and implements all memory- write-back, write-allocate cache with track outstanding register write trans- related external interface functions. 32-byte or 64-byte blocks. The L3 actions. The cache control and bus interface cache controller supports synchro- unit also manages all cache coher- nous and asynchronous cache RAMs. Instruction Fetch, Decode, and ence protocol functions. It controls Wave pipelining can be used with Branch Unit —The instruction unit the L2 cache and the optional off- asynchronous RAMs in the 64-byte fetches, decodes, and issues instruc- chip L3 cache. block mode. It is a mixed data and tions to the integer unit, memory- instruction cache. The user can select management unit, and floating-point Floating-Point Unit —The floating- an L3 cache size of 1, 2, 4, 8, 16, 32, unit. It manages the pipelines, the point unit contains a floating-point or 64MB. program counter (PC), the instruction multiply pipeline and a floating- cache, prefetching, and instruction point add pipeline. (Divides are as- Big Endian Support —The 21164 stream memory management. The in- sociated with the add pipeline but provides limited support for big struction unit can decode up to four are not pipelined themselves.) IEEE endian data formats. One mode in- instructions in parallel and check that S_ and T_floating data types are verts physical bit <2> for all long- required resources are available for supported with all rounding modes. word references. Another mode com- each. VAX F_floating and G_ floating plements operand Rbr <2:0> on data types are fully supported. VAX EXTxy, INSxy and MSKxy instruc- Integer Execution Unit —The inte- D_floating data types are partially tions. ger execution unit contains two supported. 64-bit integer pipelines. Results of Virtual Address Space —The vir- most integer operations are available Instruction Cache —The instruc- tual address is a 64-bit unsigned inte- for use by subsequent instructions. tion cache is an 8KB virtual direct- ger that specifies a byte location in The integer unit also partially exe- mapped cache with 32-byte blocks. the virtual address space. The micro- cutes all memory instructions by cal- processor implements a 43-bit subset culating the effective address. of the virtual address space and sup- ports a 40-bit, 1-terabyte physical ad- dress space. Thermal Management The 21164 dissipates approximately 50 watts. Conventional forced air cooling methods are sufficient to remove heat and maintain the highest levels of reliabil- ity. The user may also define an application-specific heat sink. Digital specifies two separate heat sinks for this device. Both are specified to 1000 LFM. One heat sink has mounting holes in line with the cooling fins and is approximately 6.6 × 6.6 × 3.3 cm (2.6 × 2.6 × 1.3 in). The other heat sink has mounting holes rotated 90 degrees from the cooling fins and measures approximately 7.6 × 7.6 × 4.6 cm (3 × 3 × 1.8 in). Alpha Architecture Summary As implemented in the 21164, the Al- Load/Store Instructions — These in- PALcode pha architecture supports: structions can move either 32-bit or • A fixed 32-bit instruction size 64-bit quantities. Eight-bit and PALcode is a privileged layer of soft- 16-bit load/store operations are sup- ware that automatically performs such • Separate integer and floating-point ported through an extensive set of in- functions as the dispatching and serv- registers register byte manipulations. The I/O icing of interrupts, exceptions, task - Thirty-two 64-bit integer registers bus directly supports byte and word switching, and additional privileged - Thirty-two 64-bit floating-point reg- operations in hardware. and unprivileged user instructions as isters specified by operating systems using Operate Instructions — Integer op- the CALL_PAL • 32-bit (longword) and 64-bit erate instructions manipulate full instruction. (quadword) integer data types 64-bit values and include a full com- • 32-bit and 64-bit IEEE and VAX plement of arithmetic, compare, logi- PALcode is the only method of per- floating-point data types cal, and shift instructions. There are forming some operations on the hard- also three 32-bit integer operates: add, ware. In addition to the instructions • Memory accesses, using a 64-bit vir- subtract, and multiply. defined by the Alpha architecture, a tual byte address set of implementation-specific instruc- • In addition to conventional RISC op- tions is provided. Privileged architecture library code eration, the instruction set provides (PALcode) scaled add/subtract for quick subscript PALcode runs in an environment with calculation; 128-bit multiply for multi- privileges enabled, instruction stream Instruction Set precision arithmetic and division by a mapping disabled, and interrupts dis- constant; conditional moves for avoid- abled. Disabling memory mapping al- All instructions are 32 bits long and ing branches; and an extensive set of lows PALcode to support functions use one of four different instruction in-register byte manipulation instruc- such as TB-miss routines. Disabling formats. Each format uses a 6-bit tions. interrupts allows the instruction opcode