Microarchitecture Choices (Implementation of the VAX)

Yale N. Patt
Electrical Engineering and Computer Science
University of Michigan, Ann Arbor 48109-2122

ABSTRACT

The VAX Architecture provides hardware implementors with an opportunity or a nightmare, depending on your point of view. Such characteristics as 304 opcodes, a large number of addressing modes, a large number of supported data types, and non-regularities in the ISA semantics all provide challenges to the microarchitect. The VAX architecture was introduced in 1977 with its first microarchitecture, the VAX 11/780, a TTL MSI implementation. Since then, there have been several distinct implementations, each reflecting (1) the technology in which it was implemented, (2) the performance/cost tradeoffs it was supposed to consider, and (3) the design methodology of its implementors. This paper is a first attempt at discussing several VAX implementations from the standpoint of the choices made in the microarchitecture as driven by the context of the device technology, the performance/cost tradeoffs, and other considerations.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1990 ACM 089791-3469/90/0002/0213 $1.50

1. Introduction

The VAX Architecture provides hardware implementors with an opportunity or a nightmare, depending on your point of view. Such characteristics as 304 opcodes, a large number of addressing modes, a large number of supported data types, and non-regularities in the ISA semantics all provide challenges to the microarchitect. The VAX architecture was introduced in 1977 with its first microarchitecture, the VAX 11/780, a TTL MSI implementation. Several features of the 780 clearly come from the fact that (1) it was the first implementation, (2) it was made out of TTL MSI parts, and (3) it was intended to be fast. Since then, there have been several distinct implementations, each reflecting (1) the technology in which it was implemented, (2) the performance/cost tradeoffs it was supposed to consider, and (3) the design methodology of its implementors.

This paper is a first attempt at discussing several VAX implementations from the standpoint of the choices made in the microarchitecture as driven by the context of the device technology, the performance/cost tradeoffs, and other relevant considerations.

The paper is organized in five sections. Section 2 describes the VAX architecture and several common aspects of its implementations. With respect to the architecture, we focus on the "stuff" that every VAX implementation must deal with. Section 3 discusses the implementations of the early machines, the 11/780 and the 11/750. Section 4 discusses the more recent higher performance ECL versions, the 8800 and the 8600. Section 5 offers some concluding remarks.

2. Architecture and Implementation

2.1. The Architecture

The VAX architecture was introduced in 1977 by Digital Equipment Corporation. It clearly had as design priorities extension of the address space to 32 bits, dense coding of the instruction stream, user-friendliness with respect to software developers, support for general purpose data processing, and support for multiprogramming and multiprocessing.

These design objectives resulted in the specification of 244 (now, 304) opcodes, nine (or 13, or 21, or over 40 depending on how one counts) different, and for the most part orthogonal, addressing modes, and more than a dozen different data types, including the doubly-linked list [1]. The semantics of instructions specified that they be variable-length and byte-aligned, each containing the number of operands appropriate to that opcode. Information regarding the number of operands and the access type and data type of each is part of the semantics of the opcode. Addressing modes, which are specified by a variable number of bytes, are for the most part independent of the opcode, introducing further variability in the length of the VAX instruction.

The architecture supports a virtual address space of 2**32 bytes, with mapping indirectly through two page tables. Unaligned memory accesses, which were forbidden on the earlier PDP 11, were allowed on the VAX. Other support for software development included protection against ill-advised uses of operands and addressing modes.

There are several examples of code density in the instruction set architecture. Many opcodes specify the execution of multiple operations, both in the instruction's address evaluation phase and in its execution phase. Many data types have optional storage requirements; for example, integers can be eight, 16, 32, etc. bits long, numeric character strings can be coded in ASCII or in packed decimal, etc. Displacement addressing modes can have byte, word, or longword offsets. Immediate operands can be specified in the straightforward way introduced in the PDP 11, or using fewer bits, as short literals.

Support for general purpose data processing included the FPD bit, a substantial number of instructions specifically targeted to handle the needs of both the COBOL and FORTRAN compilers, and constructs to support both multiprogramming and multiprocessing. The FPD bit allows long iterative commercial instructions to coexist with a short interrupt latency figure of merit, which is required by real-time applications. Support for multiprogramming and multiprocessing includes a number of constructs, among them four levels of privilege, five levels of priority, and the LDPCTX, SVPCTX, PROBE, CHMx, and locked instructions.

2.2. Implementations

The richness of the instruction set architecture, as described above, has provided, as stated in the introduction, an opportunity or a nightmare for hardware developers, depending on your point of view. An architecture, such as the VAX, admits many microarchitectures, depending on the technology available, the cost/performance formula that one is operating under, and the execution model that one adopts.

It is also the case that, given the set of execution models adopted in the past, certain of the characteristics of the VAX almost demand the inclusion of the one appropriate mechanism for handling that characteristic.

All VAXes are microcoded. The richness of the instruction set urges that the flexibility of microcoded control be employed, notwithstanding the conventional mythology that hardwired control is somehow faster than microcode. It is instructive to point out that (1) hardwired control produces higher performance execution only in situations where the critical path is in the microsequencing function, and (2) that this should not occur in VAX implementations if one designs with the well-understood (to microarchitects) technique that the next control store address must be obtained from information available at the start of the current microcycle. A variation of this basic old technique is the recently popularized delayed branch present in many ISA architectures introduced in the last few years.

The orthogonality of the instruction set, resulting in variable length instructions (from one to more than 50 bytes), the semantics of the autoincrement and autodecrement addressing modes, and the use of memory operands in the execution phase of the instruction all contribute to the need for a back-up mechanism in the microarchitecture. The result is a back-up PC and an RLOG stack for undoing autoincrements and autodecrements performed during address evaluation.
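The back-up mechanism described above can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the 780's actual logic: the class name `BackupState`, its methods, the log entry format, and the separate `pc` field are all hypothetical. It shows only the idea of saving the PC at instruction start and logging each autoincrement or autodecrement so that a fault during address evaluation can be undone.

```python
from dataclasses import dataclass, field


@dataclass
class BackupState:
    """Toy model of a back-up PC plus an RLOG-style undo log (hypothetical names)."""
    regs: list = field(default_factory=lambda: [0] * 16)  # general registers
    pc: int = 0          # modeled separately from regs for simplicity (assumption)
    backup_pc: int = 0
    rlog: list = field(default_factory=list)  # (register number, signed delta)

    def begin_instruction(self):
        # Remember where this instruction starts and discard older log entries.
        self.backup_pc = self.pc
        self.rlog.clear()

    def autoincrement(self, rn, size):
        # (Rn)+ : use the register as the address, then add the operand size.
        addr = self.regs[rn]
        self.regs[rn] = (addr + size) & 0xFFFFFFFF
        self.rlog.append((rn, size))
        return addr

    def autodecrement(self, rn, size):
        # -(Rn) : subtract the operand size first, then use the result as the address.
        self.regs[rn] = (self.regs[rn] - size) & 0xFFFFFFFF
        self.rlog.append((rn, -size))
        return self.regs[rn]

    def back_up(self):
        # Undo logged register changes, newest first, then restore the PC.
        while self.rlog:
            rn, delta = self.rlog.pop()
            self.regs[rn] = (self.regs[rn] - delta) & 0xFFFFFFFF
        self.pc = self.backup_pc
```

In this sketch a fault handler would call `back_up()` before restarting the instruction, so that the registers and PC look as they did before address evaluation began.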
The orthogonality of the instruction set and the variable number of operands also demand some mechanism for accessing the VAX instruction stream many times during the execution of each instruction. The result is a decoding structure with multiple entry points to the microcode for each VAX instruction, depending on the operand being processed, its addressing mode, access type, and data type. On the 780, it is called the DECODE ROM, on the 8600, the DRAM.

The use of memory operands during the operate phase of an instruction, the frequency of occurrence of those memory operands, and the inordinate amount of time required to access a PTE (indirectly through two page tables) demand a faster way to perform virtual to physical address translation. The result is the Translation Buffer, a cache of most recently used PTEs. A common implementation technique is to make the Translation Buffer as large as possible.

Beyond these implementation structures, there is room for divergence.

3. The Early Machines

The first implementation of the VAX was the VAX 11/780, made from Schottky TTL MSI parts, and introduced in 1977. [...]

[...] relied on greater use of regular structures such as buses.

The 780 used a smaller register file, using it only for internal temporaries and target machine general purpose registers, preferring to implement separately (and at increased performance) the internal processor registers and the constants needed in the data path. The 750 used a larger register file, preferring to use it for internal processor registers as well.

The 780 included no provision for introducing new constants into the data path. The 750 provided two immediate formats in its microinstruction for allowing nine bit and 32 bit constants to be included within the microcode.

The 780 allowed a microsubroutine RETURN to have a destination which is offset by some Hamming Distance from the control store address which contained the CALL. The 750 allowed this offset to be an arbitrary distance. The 780 mechanism is faster, but less flexible; the 750 mechanism is slower, but more general.

The 780 had a bug in its microinstruction definition whereby the microcode could not in the same microinstruction perform a microsubroutine CALL and use the target machine for determining the next control store address. The 750 corrected the bug by encoding the two microorders in [...]
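The Translation Buffer described in Section 2.2 can be sketched as a small cache in front of a slow page-table walk. This is a minimal illustration under stated assumptions: the direct-mapped organization, the class name `TranslationBuffer`, and the `walk_page_tables` callback (a stand-in for the two-level page-table access) are hypothetical and are not taken from any particular VAX implementation. The 512-byte page size is the VAX page size.

```python
PAGE_SIZE = 512  # VAX pages are 512 bytes


class TranslationBuffer:
    """Toy direct-mapped cache of recently used PTEs (hypothetical organization)."""

    def __init__(self, entries, walk_page_tables):
        self.entries = entries
        # Slow path: maps a virtual page number to a page frame number.
        self.walk_page_tables = walk_page_tables
        self.tags = [None] * entries   # virtual page number held in each slot
        self.frames = [0] * entries    # cached page frame number for each slot
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        slot = vpn % self.entries
        if self.tags[slot] == vpn:
            self.hits += 1
        else:
            # Miss: take the slow path and replace whatever the slot held.
            self.misses += 1
            self.frames[slot] = self.walk_page_tables(vpn)
            self.tags[slot] = vpn
        return self.frames[slot] * PAGE_SIZE + offset
```

A larger `entries` value reduces the chance that two active pages compete for the same slot, which is one way to read the remark that the Translation Buffer is made as large as possible.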
