Digital Semiconductor Alpha 21064A Microprocessor Product Brief

Digital Semiconductor Alpha 21064A Microprocessor Product Brief January 1996 Description Digital’s Alpha 21064A microprocessor is a member of the 21064 family of microprocessors, which all implement Digital’s Alpha architecture. The Alpha 21064A microprocessor is a 0.5-µm, CMOS-based, superscalar, super-pipelined processor using dual instruction issue and frequencies of either 200, 233, 275, or 300 MHz. The Alpha architecture is a 64-bit RISC architecture designed with emphasis on speed, multiple instruction issue, multiple processors, and multiple operating systems. Features •64-bit, fully pipelined, advanced RISC •Selectable data bus width of 64 bits or architecture supported by multiple op- 128 bits erating systems •External cache memory supports: - Microsoft Windows NT - Programmable size (256KB to - Digital UNIX (not supported by 16MB) 21064A-275-PC) - Programmable latency 3 to 16 CPU - OpenVMS (not supported by cycles 21064A-275-PC) - Flexible cache coherency •Sustained leadership performance •Privileged architecture library code - 200-, 233-, 275-, and 300-MHz op- (PALcode) supports: eration - Optimization for multiple operating - Superscalar (dual issue) systems - Peak instruction execution rate of - Flexible memory-management im- 400, 466, 550, or 600 million instruc- plementations tions per second - Multi-instruction atomic sequences - 0.5-µm CMOS technology • • Advanced packaging technology for Pipelined floating-point unit superior thermal-management per- - IEEE single- and double-precision, formance VAX F_floating and G_floating, • longword and quadword data types Onchip 16KB, direct-mapped, write- through data cache with parity - Improved floating-point divide • •Memory management Onchip 16KB direct-mapped instruction cache with parity - Demand-paged • - 12-entry instruction-stream transla- Improved branch prediction logic tion buffer (TB) •Serial ROM (SROM) interface >Eight entries map 8KB pages >Four entries for 4MB pages - Loads instruction cache after reset - 32-entry data-stream TB, each entry - Allows software-controlled serial maps 8KB, 64KB, 512KB, or 4MB port after initialization pages •Onchip parity and ECC generators and •Main memory physically addressed up checkers to 16GB •Chip- and module-level test support •Onchip write buffer with four 32-byte •3.3-V chip I/O supply voltage with entries chip interface directly to 5-V logic •Internal clock generator provides: • TM Software compatible with - High-speed CPU clock Alpha 21064 and Alpha 21066 micro- - Pair of programmable system clocks processors (CPU/2 to CPU/17) •Pin compatible with Alpha 21064 Microarchitecture (DTB); load silo; write buffer; and The Alpha 21064A microprocessor Integer Execution Unit (IEU) —The Dcache interface. The LSU supports uses a 7-stage pipeline for integer op- IEU contains a 64-bit, fully pipelined all integer and floating-point load and erate and memory reference instruc- integer execution data path including store instructions, including address tions, and a 10-stage pipeline for adder, logic box, barrel shifter, byte calculation and translation, and cache floating-point operate instructions. extract and mask, and independent in- control logic. The instruction fetch and decode unit teger multiplier. The IEU also con- (IDU) maintains state for all pipeline tains the 32-entry, 64-bit integer regis- Instruction Cache (Icache)—The stages to track outstanding register ter file (IRF). Icache contains 16KB with parity pro- write transactions. tection. It is a direct-mapped, physi- Floating-Point Unit (FPU) —The cally addressed cache with 32-byte Figure 1 is the Alpha 21064A block FPU contains a fully pipelined blocks. diagram. floating-point unit and independent divider. IEEE single-precision and Data Cache (Dcache) —The Dcache The Alpha 21064A CPU contains: double-precision floating-point data contains 16KB with parity protection. types are supported. VAX F_floating It is a write-through, read-allocate, Instruction Fetch and Decode Unit and G_ floating data types are fully direct-mapped, physically addressed (IDU) —The IDU includes the in- supported, with limited support for the cache with 32-byte blocks. struction translation buffers (ITBs) VAX D_floating data type. The FPU and the branch prediction logic also contains a 32-entry, 64-bit External Cache —The 21064A sup- (branch unit). The IDU performs in- floating-point register file (FRF). ports an optional external flexible struction fetch, resource checks, and cache built from off-the-shelf SRAMs. dual instruction issue to the IEU, LSU, Load and Store Unit (LSU) —The The external cache is a write-back, FPU, or branch unit. The IDU also LSU contains four major sections: ad- read-allocate cache with 8-byte controls pipeline bypasses, stalls, dress translation data path, which in- blocks. This allows each implementa- aborts, and restarts. cludes the data translation buffer tion to select its own external cache speed and configuration. The pro- Figure 1 Alpha 21064A Block Diagram grammable interface supports cache sizes from 256KB to 16MB and la- Instruction Cache tency of 3 to 16 CPU cycles. Branch Tag Data History Table Virtual Address Space —The virtual Address Bus address is a 64-bit unsigned integer that specifies a byte location in the vir- Integer Instruction Floating−Point tual address space. The microproces- Execution Fetch/Decode Unit sor implements a 43-bit subset of the Unit Unit Multiplier/ virtual address space and supports a Multiplier Prefetcher Adder 34-bit, 16GB physical address space. Shifter Pipeline Resource Adder Divider Conflict Data Bus (128 Bits) Serial ROM (SROM) Interface — Logic Box PC This interface loads the instruction Calculation cache after reset. It has a software- ITB Bus controlled serial port, which is acti- Pipeline Interface Integer Floating−Point Unit vated after initialization. The SROM Register Control Register File File supports system-level design debug- ging. External Cache Control Load/Store Unit Write Address DTB Load Silo Buffer Generator External System Interface Data Cache Tag Data Alpha Architecture Summary As implemented in the Alpha 21064A, absolute or PC-relative jump using an PALcode Alpha architecture supports: arbitrary 64-bit register value. They can update a destination register with PALcode is a privileged layer of soft- • A fixed 32-bit instruction size a return value. ware that automatically performs such functions as the dispatching and serv- • Separate integer and floating-point Load and Store Instructions—These icing of interrupts, exceptions, task registers instructions can move either 32-bit or switching, and additional privileged and unprivileged user instructions as - Thirty-two 64-bit integer registers 64-bit quantities; 8-bit and 16-bit load and store operations are supported specified by operating systems using - Thirty-two 64-bit floating-point reg- through an extensive set of in-register the CALL_PAL instruction. isters byte manipulations. The I/O bus di- PALcode is the only method of per- • rectly supports byte and word opera- 32-bit (longword) and 64-bit tions in hardware. forming some operations on the hard- (quadword) integer data types ware. In addition to the instructions • 32-bit and 64-bit IEEE and VAX Operate Instructions —Integer oper- defined by the Alpha architecture, a floating-point data types ate instructions manipulate full 64-bit set of implementation-specific instruc- values and include a full complement tions is provided. • Memory accesses using a 64-bit vir- of arithmetic, compare, logical, and tual byte address shift instructions. There are also three PALcode runs in an environment with privileges enabled, instruction stream • Privileged architecture library code 32-bit integer operations: add, subtract, and multiply. mapping disabled, and interrupts (PALcode) disabled. Disabling memory mapping In addition to conventional RISC op- allows PALcode to support functions Instruction Set eration, the instruction set provides such as translation buffer miss rou- scaled add and subtract for quick sub- tines. Disabling interrupts allows the All instructions are 32 bits long and script calculation, 128-bit multiply for instruction stream to provide multi- use one of four different instruction multi-precision arithmetic and division instruction sequences as atomic opera- formats (see Figure 2). Each format by a constant, conditional moves for tions. uses a 6-bit opcode and 0, 1, 2, or 3 avoiding branches, and an extensive 5-bit register fields. set of in-register byte manipulation instructions. Memory Management CALL_PAL Instructions—These instructions vector to a privileged layer Floating-point operate instructions in- The memory-management architecture of software that atomically performs clude four complete sets of instruc- provides: both privileged and unprivileged functions for IEEE single-precision, IEEE tions. double-precision, VAX F_floating, • A large address space for instructions and VAX G_floating arithmetic. In and data Branch Instructions—Conditional addition to arithmetic instructions, branch instructions test a register for • Convenient and efficient sharing of in- there are instructions for conversions structions and data positive/negative, zero/nonzero, or between floating and integer values, even/odd conditions, and perform a including the VAX D_floating data • Independent read and write access PC-relative branch. Unconditional type. protection branch instructions perform either an • Flexibility through programmable Figure 2 Alpha Instruction Formats PALcode

Load more