Digital Semiconductor Alpha 21064A Product Brief

January 1996

Description

Digital’s Alpha 21064A microprocessor is a member of the 21064 family of , which all implement Digital’s Alpha architecture. The Alpha 21064A microprocessor is a 0.5-µm, CMOS-based, superscalar, super-pipelined processor using dual instruction issue and frequencies of either 200, 233, 275, or 300 MHz. The Alpha architecture is a 64-bit RISC architecture designed with emphasis on speed, multiple instruction issue, multiple processors, and multiple operating systems.

Features •64-bit, fully pipelined, advanced RISC •Selectable data width of 64 bits or architecture supported by multiple op- 128 bits erating systems •External cache memory supports: - Microsoft Windows NT - Programmable size (256KB to - Digital UNIX (not supported by 16MB) 21064A-275-PC) - Programmable latency 3 to 16 CPU - OpenVMS (not supported by cycles 21064A-275-PC) - Flexible cache coherency •Sustained leadership performance •Privileged architecture library code - 200-, 233-, 275-, and 300-MHz op- (PALcode) supports: eration - Optimization for multiple operating - Superscalar (dual issue) systems - Peak instruction execution rate of - Flexible memory-management im- 400, 466, 550, or 600 million instruc- plementations tions per second - Multi-instruction atomic sequences - 0.5-µm CMOS technology • • Advanced packaging technology for Pipelined floating-point unit superior thermal-management per- - IEEE single- and double-precision, formance VAX F_floating and G_floating, • longword and quadword data types Onchip 16KB, direct-mapped, write- through data cache with parity - Improved floating-point divide • •Memory management Onchip 16KB direct-mapped instruc- tion cache with parity - Demand-paged • - 12-entry instruction-stream transla- Improved branch prediction logic tion buffer (TB) •Serial ROM (SROM) interface >Eight entries map 8KB pages >Four entries for 4MB pages - Loads instruction cache after reset - 32-entry data-stream TB, each entry - Allows software-controlled serial maps 8KB, 64KB, 512KB, or 4MB port after initialization pages •Onchip parity and ECC generators and •Main memory physically addressed up checkers to 16GB •Chip- and module-level test support •Onchip write buffer with four 32-byte •3.3-V chip I/O supply voltage with entries chip interface directly to 5-V logic •Internal clock generator provides: • TM Software compatible with - High-speed CPU clock Alpha 21064 and Alpha 21066 micro- - Pair of programmable system clocks processors (CPU/2 to CPU/17) •Pin compatible with Alpha 21064

Microarchitecture (DTB); load silo; write buffer; and The Alpha 21064A microprocessor Integer Execution Unit (IEU) —The Dcache interface. The LSU supports uses a 7-stage pipeline for integer op- IEU contains a 64-bit, fully pipelined all integer and floating-point load and erate and memory reference instruc- integer execution data path including store instructions, including address tions, and a 10-stage pipeline for , logic box, , byte calculation and translation, and cache floating-point operate instructions. extract and mask, and independent in- control logic. The instruction fetch and decode unit teger multiplier. The IEU also con- (IDU) maintains state for all pipeline tains the 32-entry, 64-bit integer regis- Instruction Cache (Icache)—The stages to track outstanding register ter file (IRF). Icache contains 16KB with parity pro- write transactions. tection. It is a direct-mapped, physi- Floating-Point Unit (FPU) —The cally addressed cache with 32-byte Figure 1 is the Alpha 21064A block FPU contains a fully pipelined blocks. diagram. floating-point unit and independent di- vider. IEEE single-precision and Data Cache (Dcache) —The Dcache The Alpha 21064A CPU contains: double-precision floating-point data contains 16KB with parity protection. types are supported. VAX F_floating It is a write-through, read-allocate, Instruction Fetch and Decode Unit and G_ floating data types are fully direct-mapped, physically addressed (IDU) —The IDU includes the in- supported, with limited support for the cache with 32-byte blocks. struction translation buffers (ITBs) VAX D_floating data type. The FPU and the branch prediction logic also contains a 32-entry, 64-bit External Cache —The 21064A sup- (branch unit). The IDU performs in- floating-point (FRF). ports an optional external flexible struction fetch, resource checks, and cache built from off-the-shelf SRAMs. dual instruction issue to the IEU, LSU, Load and Store Unit (LSU) —The The external cache is a write-back, FPU, or branch unit. The IDU also LSU contains four major sections: ad- read-allocate cache with 8-byte controls pipeline bypasses, stalls, dress translation data path, which in- blocks. This allows each implementa- aborts, and restarts. cludes the data translation buffer tion to select its own external cache speed and configuration. The pro- Figure 1 Alpha 21064A Block Diagram grammable interface supports cache sizes from 256KB to 16MB and la- Instruction Cache tency of 3 to 16 CPU cycles. Branch Tag Data History Table Virtual Address Space —The virtual Address Bus address is a 64-bit unsigned integer that specifies a byte location in the vir- Integer Instruction Floating−Point tual address space. The microproces- Execution Fetch/Decode Unit sor implements a 43-bit subset of the Unit Unit Multiplier/ virtual address space and supports a Multiplier Prefetcher Adder 34-bit, 16GB physical address space. Shifter Pipeline Resource Adder Divider Conflict Data Bus (128 Bits) Serial ROM (SROM) Interface — Logic Box PC This interface loads the instruction Calculation cache after reset. It has a software- ITB Bus controlled serial port, which is acti- Pipeline Interface Integer Floating−Point Unit vated after initialization. The SROM Register Control Register File File supports system-level design debug- ging. External Cache Control

Load/Store Unit

Write Address DTB Load Silo Buffer Generator

External System Interface Data Cache

Tag Data Alpha Architecture Summary

As implemented in the Alpha 21064A, absolute or PC-relative jump using an PALcode Alpha architecture supports: arbitrary 64-bit register value. They can update a destination register with PALcode is a privileged layer of soft- • A fixed 32-bit instruction size a return value. ware that automatically performs such functions as the dispatching and serv- • Separate integer and floating-point Load and Store Instructions—These icing of interrupts, exceptions, task registers instructions can move either 32-bit or switching, and additional privileged and unprivileged user instructions as - Thirty-two 64-bit integer registers 64-bit quantities; 8-bit and 16-bit load and store operations are supported specified by operating systems using - Thirty-two 64-bit floating-point reg- through an extensive set of in-register the CALL_PAL instruction. isters byte manipulations. The I/O bus di- PALcode is the only method of per- • rectly supports byte and word opera- 32-bit (longword) and 64-bit tions in hardware. forming some operations on the hard- (quadword) integer data types ware. In addition to the instructions • 32-bit and 64-bit IEEE and VAX Operate Instructions —Integer oper- defined by the Alpha architecture, a floating-point data types ate instructions manipulate full 64-bit set of implementation-specific instruc- values and include a full complement tions is provided. • Memory accesses using a 64-bit vir- of arithmetic, compare, logical, and tual byte address shift instructions. There are also three PALcode runs in an environment with privileges enabled, instruction stream • Privileged architecture library code 32-bit integer operations: add, sub- tract, and multiply. mapping disabled, and interrupts (PALcode) disabled. Disabling memory mapping In addition to conventional RISC op- allows PALcode to support functions Instruction Set eration, the instruction set provides such as translation buffer miss rou- scaled add and subtract for quick sub- tines. Disabling interrupts allows the All instructions are 32 bits long and script calculation, 128-bit multiply for instruction stream to provide multi- use one of four different instruction multi-precision arithmetic and division instruction sequences as atomic opera- formats (see Figure 2). Each format by a constant, conditional moves for tions. uses a 6-bit opcode and 0, 1, 2, or 3 avoiding branches, and an extensive 5-bit register fields. set of in-register byte manipulation in- structions. Memory Management CALL_PAL Instructions—These in- structions vector to a privileged layer Floating-point operate instructions in- The memory-management architecture of software that atomically performs clude four complete sets of instruc- provides: both privileged and unprivileged func- tions for IEEE single-precision, IEEE tions. double-precision, VAX F_floating, • A large address space for instructions and VAX G_floating arithmetic. In and data Branch Instructions—Conditional addition to arithmetic instructions, branch instructions test a register for • Convenient and efficient sharing of in- there are instructions for conversions structions and data positive/negative, zero/nonzero, or between floating and integer values, even/odd conditions, and perform a including the VAX D_floating data • Independent read and write access PC-relative branch. Unconditional type. protection branch instructions perform either an • Flexibility through programmable Figure 2 Alpha Instruction Formats PALcode support

CALL_PAL Opcode Function

Branch Opcode Ra Displacement

Load/Store Opcode Ra Rb Displacement

Operate Opcode RaRb Function Rc 31 26 25 212016 15 5 4 0

Adding the Alpha 21064A to Your Design Strategy The Alpha 21064A reduces the overall cost of system development, allows for easier system design, and provides the benefits of high performance by bringing the Alpha architecture to systems spanning the range from high-end desktop to enterprise-wide servers. By using the Alpha 21064A, your designs can benefit in the following ways:

Easy Migration of Alpha 21064 Flexibility of Design Module The 21064A interfaces directly to The 21064A is pin compatible with components operating at a supply volt- the existing family of Alpha 21064 age of either 3.3 V or 5 V. This means products. This allows current that you can use the standard compo- Alpha 21064 customers to improve nents that best fit your applications performance significantly, without the and market demands. inherent costs associated with redes- igning motherboards. ISO 9002 Certification All of Digital’s semiconductor manu- The 21064A system clock frequency facturing sites adhere to the stringent is a programmable divider of 2 to 17 ISO 9002 standards, and are ISO 9002 of the CPU clock. A system using the certified. 21064-150 can plug in a 21064A; in- crease the CPU clock frequency to Longevity of Design 200, 233, 275, or 300 MHz; and pro- All Alpha microprocessors are de- gram the system clock frequency di- signed to the Alpha architecture. So, vider to keep the same system timing. whether your design requires the high This increases performance signifi- performance of the Alpha 21064A, the cantly. price/performance of the Alpha 21066, or the low power of the Alpha 21068, Module Flexibility for Performance your designs are assured of compati- Optimization bility across this family of high-per- The 21064A has a programmable ex- formance Alpha microprocessors. ternal interface supporting a complete range of system sizes and performance Technology Leadership levels, while maintaining peak CPU Designing with Digital’s Alpha execution speed. microprocessors puts you in a place of leadership in the industry. Your prod- The 21064A can address 16GB of ucts benefit from the unbounded ca- physical memory. The bus width is pacity of 64-bit technology, the built- selected as either 64 bits or 128 bits to in longevity of the Alpha architecture, provide flexible module designs. and Digital’s commitment to constant improvement in price/performance The 21064A has the logic to support through the company’s investment in board-level cache including program- semiconductor design and fabrication. mable cache size—256KB to 16MB, programmable latency—3 to 16 CPU clock cycles, and multiple cache- coherency schemes. Cost versus per- formance trade-offs are made by se- lecting smaller to larger cache size, longer to shorter latency, and single versus symmetric multiprocessing. Characteristics For More Information Characteristic Specification To learn more about the availability of Power supply Vss 0.0 V the Alpha 21064A microprocessor, Vdd 3.3 V ±5% contact your local semiconductor dis- tributor. To learn more about Digital Operating temperature Tj = 90°C Semiconductor’s product portfolio, contact the Digital Semiconductor In- Storage temperature range –55°C to 125°C formation Line: Power dissipation 200 MHz 24 W 233 MHz 28 W 1-800-332-2717 275 MHz 33 W 300 MHz 36 W Outside North America, call: Package 431-pin ceramic PGA with 0.10-in grid +1-508-628-4760 spacing and advanced thermal technology

Ordering Information To order this 21064 family Frequency Ask your distributor for... member... 21064A-200 200 MHz 21064-AB 21064A-233 233 MHz 21064-BB 21064A-275 275 MHz 21064-DB 21064A-275-PC 275 MHz 21064-P1 21064A-300 300 MHz 21064-EB While Digital believes the information in this publication is correct as of the date of publica- tion, it is subject to change without notice. © Digital Equipment Corporation 1993, 1995, 1996. All rights reserved. Printed in U.S.A. EC–QH0RB–TE AlphaGeneration, Digital, Digital Semiconduc- tor, OpenVMS, VAX, the AlphaGeneration de- sign mark, and the DIGITAL logo are trade- marks of Digital Equipment Corporation. Digital Semiconductor is a Digital Equipment Corporation business. IEEE is a registered trademark of The Institute of Electrical and Electronics Engineers, Inc.; Microsoft is a registered trademark and Win- dows NT is a trademark of Microsoft Corpora- tion; UNIX is a registered trademark in the United States and other countries licensed ex- clusively through X/Open Company Limited. All other trademarks and registered trademarks are the property of their respective owners.