<<

United States Patent [19]    [11] Patent Number: 5,109,495
Fite et al.    [45] Date of Patent: Apr. 28, 1992

[54] METHOD AND APPARATUS USING A SOURCE OPERAND LIST AND A SOURCE OPERAND POINTER QUEUE BETWEEN THE EXECUTION UNIT AND THE INSTRUCTION DECODING AND OPERAND PROCESSING UNITS OF A PIPELINED DATA PROCESSOR

[75] Inventors: David B. Fite; Tryggve Fossum, both of Northboro; William R. Grundmann, Hudson; Dwight P. Manely, Holliston; Francis X. McKeen, Westboro; John E. Murray, Acton; Ronald M. Salett, Framingham; Eileen Samberg, Southborough; Daniel P. Stirling, Shrewsbury, all of Mass.

[73] Assignee: Digital Equipment Corp., Maynard, Mass.

[21] Appl. No.: 306,843

[22] Filed: Feb. 3, 1989

[51] Int. Cl.5: G06F 9/34; G06F 9/38

[52] U.S. Cl.: 395/375; 364/231.8; 364/239.51; 364/239.1; 364/244.3; 364/238.8; 364/926.4; 364/939.81; 364/948.0; 364/263.1; 364/948.34; 364/957.6; 364/964.2; 364/243.4; 364/DIG. 1

[58] Field of Search: 364/200 MS File, 900 MS File

[56] References Cited

U.S. PATENT DOCUMENTS

3,949,379  4/1976  Ball ................. 340/172.5
4,392,200  7/1983  Arulpragasam ........ 364/200
4,395,758  7/1983  Helenius et al. ..... 364/200
4,454,578  6/1984  Matsumoto et al. .... 364/200
4,509,116  4/1985  Lackey et al. ....... 364/200
4,543,626  9/1985  Bean et al. ......... 364/200
4,750,112  6/1988  Jones et al. ........ 364/200
4,890,218  12/1989  Bram et al. ........ 364/200
4,926,323  5/1990  Baror et al. ........ 364/200
4,967,338  10/1990  Kibohara et al. .... 364/200

OTHER PUBLICATIONS

Fossum et al., "An Overview of the VAX 8600 System", Digital Technical Journal, No. 1, Aug. 1985, pp. 8-23.
Troiani et al., "The VAX 8600 I Box, A Pipelined Implementation of the VAX Architecture", Digital Technical Journal, No. 1, Aug. 1985, pp. 24-42.
Fossum et al., "The F Box, Floating Point in the VAX 8600 System", Digital Technical Journal, No. 1, Aug. 1985, pp. 43-53.
G. Desrochers, Principles of Parallel and Multiprocessing, Intertext Publications, Inc., McGraw-Hill Book Co., 1987, pp. 154-163.

Primary Examiner—Thomas C. Lee
Assistant Examiner—Ken S. Kim
Attorney, Agent, or Firm—Arnold, White & Durkee

[57] ABSTRACT

To execute variable-length instructions independently of instruction preprocessing, a central processing unit is provided with a set of queues in the data and control paths between an instruction unit and an execution unit. The queues include a "fork" queue, a source queue, a destination queue, and a program counter queue. The fork queue contains an entry of control information for each instruction processed by the instruction unit. This control information corresponds to the opcode for the instruction, and preferably it is a microcode "fork" address at which a microcode execution unit begins execution to execute the instruction. The source queue specifies the source operands for the instruction. Preferably the source queue stores source pointers and the operands themselves are included in a separate "source list" in the case of operands fetched from memory or immediate data from the instruction stream, or are the contents of a set of general purpose registers in the execution unit. The destination queue specifies the destination for the instruction, for example, either memory or general purpose registers. The program counter queue contains the starting value of the program counter for each of the instructions passed from the instruction unit to the execution unit. Preferably the queues are large enough to hold control information and data for up to six instructions. The queues therefore shield the execution unit and the instruction unit from each other's complexities and provide a buffer to allow for an uneven processing rate in either of them.

19 Claims, 15 Drawing Sheets

[Drawing sheets 14 and 15 of 15 (Apr. 28, 1992): flowchart figures, not reproducible from the scan. Sheet 14 carries FIG. 16, whose recoverable boxes read "functional unit available for request at head of queue", "source operands available", "retire result in accordance with entry in result queue", "insert new entry into result queue", and "remove entry at head of result queue". Sheet 15 carries the remaining figure matter.]
PIPELINE OF A SYSTEM METHOD AND APPARATUS USING A SOURCE BASED DIGITAL , Ser. No. OPERAND LIST AND A SOURCE OPERAND 07/306,866 filed Feb. 3, 1989, and issued as U.S. Pat. POINTER QUEUE BETWEEN THE EXECUTION No. 4,985,825 on Jan. 15, 1991; Hetherington et al., UNIT AND THE INSTRUCTION DECODING AND METHOD AND APPARATUS FOR CONTROL OPERAND PROCESSING UNITS OF A PIPELINED LING THE CONVERSION OF VIRTUAL TO DATA PROCESSOR PHYSICAL MEMORY ADDRESSES IN A DIGITAL COMPUTER SYSTEM, Ser. No. 07/306,544 The present application discloses certain aspects of a filed Feb. 3, 1989, abandoned, continued in Ser. No. computing system that is further described in the fol 07/476,007 filed Aug. 9, 1991; Hetherington, WRITE lowing U.S. patent applications filed concurrently with BACK BUFFER WITH ERROR CORRECTING the present application: Evans et al., AN INTERFACE CAPABILITIES, Ser. No. 07/306,703 filed Feb. 3, BETWEEN A SYSTEM CONTROL UNIT AND A 1989, and issued as U.S. Pat. No. 4,995,041 on Feb. 19, SERVICE PROCESSING UNIT OF A DIGITAL 1991; Flynn et al., METHOD AND MEANS FOR COMPUTER, Ser. No. 07/306,325 filed Feb. 3, 1989; ARBITRATING COMMUNICATION REQUESTS Arnold et al., METHOD AND APPARATUS FOR USING A SYSTEM CONTROL UNIT IN A MUL INTERFACING A SYSTEM CONTROL UNIT TI-PROCESSOR SYSTEM, Ser. No. 07/306,871 filed FOR A MULTIPROCESSOR SYSTEM WITH THE Feb. 3, 1989; Chinnasway et al., MODULAR CROSS CENTRAL PROCESSING UNITS, Ser. No. BAR INTERCONNECTION NETWORK FOR 07/306,837 filed Feb. 3, 1989; Gagliardo et al., DATA TRANSACTIONS BETWEEN SYSTEM METHOD AND MEANS FOR INTERFACING A UNITS IN A MULTI-PROCESSOR SYSTEM, Ser. SYSTEM CONTROL UNIT FOR A MULTI No. 07/306,336 filed Feb. 3, 1989, and issued as U.S. PROCESSOR SYSTEM WITH THE SYSTEM Pat. No. 4,968,977 on Nov. 6, 1990; Polzin et al., MAIN MEMORY, Ser. No. 07/306,326 filed Feb. 3, METHOD AND APPARATUS FOR INTERFAC 1989, abandoned, continued in Ser. No. 07/646,522 filed ING A SYSTEM CONTROL UNIT FOR A MULTI Jan. 28, 1991; D. Fite et al., METHOD AND APPA PROCESSOR SYSTEM WITH INPUT/OUTPUT RATUS FOR RESOLVING A VARIABLE NUM BER OF POTENTIAL MEMORY ACCESS CON UNITS, Ser. No. 07/306,862 filed Feb. 3, 1989, and FLICTS IN A PIPELINED COMPUTER SYSTEM, issued as U.S. Pat. No. 4,965,793 on Oct. 23, 1990; Ga gliardo et al., MEMORY CONFIGURATION FOR Ser. No. 07/306,767 filed Feb. 3, 1989; D. Fite et al., DECODING MULTIPLE SPECIFIERS IN A USE WITH MEANS FOR INTERFACING A SYS VARIABLE LENGTH INSTRUCTION ARCHI TEM CONTROL UNIT FOR A MULTI-PROCES TECTURE, Ser. No. 07/307,347 filed Feb. 3, 1989; D. SOR SYSTEM WITH THE SYSTEM MAIN MEM Fite et al., VIRTUAL INSTRUCTION CACHE RE ORY, Ser. No. 07/306,404 filed Feb. 3, 1989, and issued FILL ALGORITHM, Ser. No. 07/306,831 filed Feb. 3, as U.S. Pat. No. 5,043,874 on Aug. 27, 1991; and Ga 1989; Murray et al., PIPELINE PROCESSING OF gliardo et al., METHOD AND MEANS FOR REGISTER AND REGISTER MODIFYING SPEC ERROR CHECKING OF DRAM-CONTROL SIG IFIERS WITHIN THE SAME INSTRUCTION, Ser. NALS BETWEEN SYSTEM MODULES, Ser. No. No. 07/306,833 filed Feb. 3, 1989; Murray et al., MUL 07/306,836 filed Feb. 3, 1989, abandoned, continued in TIPLE INSTRUCTION PREPROCESSING SYS Ser. No. 07/582,493 filed Sept. 14, 1990. TEM WITH DATA DEPENDENCY RESOLU TECHNICAL FIELD TION FOR DIGITAL COMPUTERS, Ser. No. 07/306,773 filed Feb. 3, 1989; Murray et al., PREPRO The present invention relates generally to digital CESSING IMPLIED SPECIFIERS IN A PIPE computers and, more particularly, to a system for re LINED PROCESSOR, Ser. No. 07/306,846 filed Feb. solving data dependencies during the preprocessing of 3, 1989; D. Fite et al., BRANCH PREDICTION, Ser. multiple instructions prior to execution of those instruc No. 07/306,760 filed Feb. 3, 1989; Fossum et al., PIPE tions in a digital computer. This invention is particularly LINED FLOATING POINT ADDER FOR DIGI applicable to the preprocessing of multiple instructions TAL COMPUTER, Ser. No. 07/306,343 filed Feb. 3, in a pipelined digital computer system using a variable 1989, and issued as U.S. Pat. No. 4,994,996 on Feb. 19, length complex instruction set (CISC) architecture. 1991; Grundmann et al., SELF TIMED REGISTER DESCRIPTION OF RELATED ART FILE, Ser. No. 07/306,445 filed Feb. 3, 1989; Beaven et al., METHOD AND APPARATUS FOR DETECT Preprocessing of instructions is a common expedient ING AND CORRECTING ERRORS IN A PIPE used in digital computers to speed up the execution of LINED COMPUTER SYSTEM, Ser. No. 07/306,828 large numbers of instructions. The preprocessing opera filed Feb. 3, 1989 and issued as U.S. Pat. No. 4,982,402 tions are typically carried out by an instruction unit on Jan. 1, 1991; Flynn et al., METHOD AND MEANS interposed between the memory that stores the instruc FOR ARBITRATING COMMUNICATION RE tions and the execution unit that executes the instruc QUESTS USING A SYSTEM CONTROL UNIT IN tions. The preprocessing operations include, for example, A MULTI-PROCESSOR SYSTEM, Ser. No. the prefetching of operands identified by operand 07/306,871 filed Feb. 3, 1989; E. Fite et al., CONTROL specifiers in successive instructions so that the operands OF MULTIPLE FUNCTION UNITS WITH PAR are readily available when the respective instructions ALLEL OPERATION IN A MICROCODED EXE are loaded into the execution unit. The instruction unit CUTION UNIT, Ser. No. 07/306,832 filed Feb. 3, 1989; carries out the preprocessing operations for subsequent Webb, Jr. et al., and issued as U.S. Pat. No. 5,067,069 instructions while a current instruction is being exe Nov. 19, 1991, PROCESSING OF MEMORY AC cuted by the execution unit, thereby reducing the over CESS EXCEPTIONS WITH PRE-FETCHED IN all processing time for any given sequence of instruc STRUCTIONS WITHIN THE INSTRUCTION Although the preprocessing of instructions improves FIG.
1 is a block diagram of a digital computer sys CPU (central processing unit) performance, the in tem having a central pipelined processing unit which crease in performance is limited by conflicts between employs the present invention; the preprocessing operations and the execution of the FIG. 2 is a diagram showing various steps performed instructions, and in particular the fact that for variable to process an instruction and which may be performed length instructions, the time spent during execution is in in parallel for different instructions by a pipelined in many cases different from the time spent during prepro struction processor according to FIG. 1; cessing. FIG. 3 is a block diagram of the instruction processor of FIG. 1 showing in further detail the queues inserted SUMMARY OF THE INVENTION between the instruction unit and the execution unit; To execute variable-length instructions indepen FIG. 4 is a block diagram of the instruction decoder dently of instruction preprocessing, a central processing in FIG. 1 showing in greater detail the data paths asso unit is provided with a set of queues in the data and ciated with the source list and the other registers that control paths between an instruction unit and an execu are used for exchanging data among the instruction unit, tion unit. The queues are loaded by the instruction unit the memory access unit and the execution unit; as a result of preprocessing instructions, and the queues FIG. 5 is a block diagram showing the data path are read by the execution unit to execute the instruc through the instruction unit to the queues; tions. FIG. 6 is a diagram showing the format of operand Preferably the queues include a "fork" queue, a specifier data transferred over a GP (general purpose) source queue, a destination queue, and a program bus from an instruction decoder to a general purpose counter queue.
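The set of queues just named can be pictured as a group of bounded FIFOs between the two units. The sketch below is illustrative only: the class and field names are assumptions, and the entry formats are simplified stand-ins for the patent's control information.

```python
from collections import deque

QUEUE_DEPTH = 6  # the queues preferably hold data for up to six instructions


class InstructionQueues:
    """Illustrative model of the fork, source, destination, and program
    counter queues between the instruction unit and the execution unit."""

    def __init__(self, depth=QUEUE_DEPTH):
        self.depth = depth
        self.fork = deque()         # microcode "fork" dispatch address per instruction
        self.source = deque()       # pointers to source operands (GPR or source list)
        self.destination = deque()  # register number, or a flag meaning "memory"
        self.pc = deque()           # starting program-counter value per instruction

    def can_accept(self):
        # The instruction unit must stall when any queue is full.
        return all(len(q) < self.depth
                   for q in (self.fork, self.destination, self.pc))

    def load(self, fork_addr, src_ptrs, dest, start_pc):
        """Called by the instruction unit after preprocessing one instruction."""
        assert self.can_accept(), "execution unit has fallen behind"
        self.fork.append(fork_addr)
        self.source.extend(src_ptrs)   # e.g. up to two pointers per cycle
        self.destination.append(dest)
        self.pc.append(start_pc)
```

Because each queue buffers several instructions, a slow cycle on either side of the interface simply changes the queue occupancy rather than stalling the other unit immediately.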
The fork queue contains an entry of unit in an operand processing unit in the instruction control information for each instruction processed by unit; the instruction unit. This control information corre FIG. 7 is a diagram showing the format of short sponds to the opcode for the instruction, and preferably literal specifier data transferred over a SL (short literal) it is a microcode "fork" address at which a microcode bus from the instruction decoder to an expansion unit in execution unit begins execution to execute the instruc the operand processing unit; tion. FIG. 8 is a diagram showing the format of source and The source queue specifies the source operands for destination specifier data transmitted over a TR (trans the instruction. Preferably the source queue stores source pointers and the operands themselves are in fer) bus from the instruction decoder to a transfer unit in cluded in a separate "source list" in the case of operands the operand processing unit; FIG. 9 is a schematic diagram of the source pointer fetched from memory or immediate data from the in struction stream, or are the contents of a set of general queue; purpose registers in the execution unit. Preferably the FIG. 10 is a schematic diagram of the transfer unit; source queue can be loaded with two source pointers FIG. 11 is a schematic diagram of the expansion unit; per cycle, and the "source list" is a FIFO (first in, first FIG. 12 is a schematic diagram of the general purpose out) buffer that can be loaded with both a source oper unit in the operand processing unit; and from memory and a source operand of immediate FIG. 13 is a block diagram of the execution unit, data per cycle, with the source list entries being as which shows the control flow for executing instructions signed incrementally as the source operands are de and retiring results; coded. FIG. 14 is a block diagram of the execution unit, The destination queue specifies the destination for the which shows the data paths available for use during the instruction. The destination can either be memory or execution of instructions and the retiring of results; general purpose registers. Preferably a separate "write FIG. 15 is a timing diagram showing the states of queue" in a memory access unit holds the addresses of respective functional units when performing their re memory destinations, and the destination queue holds spective arithmetic or logical operations upon source either a register number or a flag indicating that the operands of various data types; destination is in memory. FIG. 16 is a flowchart of the control procedure fol The program counter queue contains the starting lowed by an instruction issue unit in the execution unit value of the program counter for each of the instruc for issuing source operands to specified functional units tions passed from the instruction unit to the execution and recording the issuance and the destination for the unit. The starting value of the program counter is used respective result in a result queue in the execution unit; by several different variable-length instructions in a FIG. 17 is a flowchart of the control procedure fol typical CISC instruction set, and it is also used for han lowed by the retire unit for obtaining the results of the dling exceptions and interrupts in the conventional functional unit specified by the entry at the head of the manner. retire queue, and retiring those results at a destination Preferably the queues are large enough to hold con specified by that entry, and removing that entry from trol information and data for up to six instructions. The the head of the result queue; and queues therefore shield the execution unit and the in FIG. 18 is a diagram showing the information that is struction unit from each other's complexities and pro preferably stored in an entry of the result queue.
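The split between the destination queue (register number or memory flag) and the separate write queue of memory addresses can be sketched as follows. The tuple formats `('reg', n)` and `('mem',)` are assumptions chosen for illustration; the patent only says the destination queue holds either a register number or a flag indicating a memory destination.

```python
from collections import deque

def retire_one(result, dest_queue, write_queue, gprs, memory):
    """Retire a single result at the destination named by the head of the
    destination queue. A sketch, not the patent's exact retire logic."""
    dest = dest_queue.popleft()
    if dest[0] == 'reg':
        gprs[dest[1]] = result          # retire into a general purpose register
    else:
        addr = write_queue.popleft()    # memory flag: address waits in the
        memory[addr] = result           # write queue of the memory access unit
```

A usage run: with a destination queue holding a register entry followed by a memory flag, the first retire updates a GPR and the second consumes an address from the write queue.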
vide a buffer to allow for an uneven processing rate in While the invention is susceptible to various modi? either of them. cations and alternative forms, speci?c embodiments thereof have been shown by way of example in the BRIEF DESCRIPTION OF THE DRAWINGS drawings and will be described in detail herein. It Other objects and advantages of the invention will 65 should be understood, however, that it is not intended become apparent upon reading the following detailed to limit the invention to the particular forms disclosed, description and upon reference to the drawings in but on the contrary, the intention is to cover all modi? WI’liCI't: cations, equivalents, and alternatives falling within the 5,109,495 5 6 spirit and scope of the invention as de?ned by the ap ?ed block of data from the main memory 10 and storing pended claims. - that block of data in the cache 14. In other words, the cache provides a “window” into the main memory, and DETAILED DESCRIPTION OF THE contains data likely to be needed by the instruction and PREFERRED EMBODIMENTS execution units. Turning now to the drawings and referring first to If a data element needed by the instruction and execu FIG. 1, there is shown a portion of a digital computer tion units 12 and 13 is not found in the cache 14, then the system which includes a main memory 10, a memory data element is obtained from the main memory 10, but CPU interface unit 11, and at least one CPU comprising in the process, an entire block, including additional an instruction unit 12 and an execution unit 13. It should data, is obtained from the main memory 10 and written be understood that additional CPUs could be used in into the cache 14. Due to the principle of locality in such a system by sharing the main memory 10. 
It is time and memory space, the next time the instruction practical, for example, for up to four CPUs to operate and execution units desire a data element, there is a high simultaneously and communicate efficiently through degree of likelihood that this data element will be found the shared main memory 10. in the block which includes the previously addressed Both data and instructions for processing the data are data element. Consequently, there is a high degree of stored in addressable storage locations within the main likelihood that the cache 14 will already include the memory 10. An instruction includes an operation code data element required by the instruction and execution (opcode) that specifies, in coded form, an operation to units 12 and 13. In general, since the cache 14 will be be performed by the CPU, and operand speci?ers that 20 accessed at a much higher rate than the main memory provide information for locating operands. The execu 10, the main memory can have a proportionally slower tion of an individual instruction is broken down into access time than the cache without substantially degrad multiple smaller tasks. These tasks are performed by ing the average performance of the data processing dedicated, separate, independent functional units that system. Therefore, the main memory 10 can be com are optimized for that purpose. 25 prised of slower and less expensive memory elements. Although each instruction ultimately performs a dif The translation buffer 15 is a high speed associative ferent operation, many of the smaller tasks into which memory which stores the most recently used virtual-to each instruction is broken are common to all instruc physical address translations. In a virtual memory sys tions. 
Generally, the following steps are perfonned tem, a reference to a single virtual address can cause during the execution of an instruction: instruction fetch, several memory references before the desired informa instruction decode, operand fetch, execution, and result tion is made available. However, where the translation store. Thus, by the use of dedicated hardware stages, buffer 15 is used, translation is reduced to simply finding the steps can be overlapped in a pipelined operation, a "hit" in the translation buffer 15. thereby increasing the total instruction throughput. An I/O (input/output) bus 16 is connected to the The data path through the pipeline includes a respec main memory 10 and the main cache 14 for transmitting tive set of registers for transferring the results of each commands and input data to the system and receiving pipeline stage to the next pipeline stage. These transfer output data from the system. registers are clocked in response to a common system The instruction unit 12 includes a program counter 17 clock. For example, during a first clock cycle, the ?rst and an instruction cache 18 for fetching instructions instruction is fetched by hardware dedicated to instruc from the main cache 14. The program counter 17 pref tion fetch. During the second clock cycle, the fetched erably addresses virtual memory locations rather than instruction is transferred and decoded by instruction the physical memory locations of the main memory 10 decode hardware, but, at the same time, the next in and the cache 14. Thus, the virtual address of the pro struction is fetched by the instruction fetch hardware. gram counter 17 must be translated into the physical During the third clock cycle, each instruction is shifted 45 address of the main memory 10 before instructions can to the next stage of the pipeline and a new instruction is be retrieved. Accordingly, the contents of the program fetched. 
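The cycle-by-cycle overlap of fetch, decode, and the later stages can be modeled with a small trace function. This is an idealized five-stage model with no stalls, intended only to show how one instruction completes per cycle once the pipeline is filled; the stage names follow the steps listed above.

```python
STAGES = ["fetch", "decode", "operand fetch", "execute", "store"]

def pipeline_trace(n_instructions):
    """Return, for each clock cycle, a dict mapping instruction name to the
    pipeline stage it occupies (idealized: one new instruction per cycle)."""
    cycles = []
    for cycle in range(n_instructions + len(STAGES) - 1):
        row = {}
        for i in range(n_instructions):
            stage = cycle - i            # instruction i enters one cycle after i-1
            if 0 <= stage < len(STAGES):
                row["I%d" % i] = STAGES[stage]
        cycles.append(row)
    return cycles
```

For three instructions the trace spans seven cycles; on cycle 2 all of I0, I1, and I2 are in flight, each in a different stage, which is exactly the overlap that raises total throughput.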
Thus, after the pipeline is filled, an instruction counter 17 are transferred to the interface unit 11 where will be completely executed at the end of each clock the translation buffer 15 performs the address conver cycle. sion. The instruction is retrieved from its physical mem This process is analogous to an assembly line in a ory location in the cache 14 using the converted ad manufacturing environment. Each worker is dedicated dress. The cache 14 delivers the instruction over data to performing a single task on every product that passes return lines to the instruction cache 18. The organiza through his or her work stage. As each task is per tion and operation of the cache 14 and the translation formed the product comes closer to completion. At the buffer 15 are further described in Chapter 11 of Levy final stage, each time the worker performs his assigned and Eckhouse, Jr., Computer Programming and Archi task a completed product rolls off the assembly line. tecture, The VAX-11, Digital Equipment Corporation, In the particular system illustrated in FIG. 1, the pp. 351-468 (1980). interface unit 11 includes a main cache 14 which on an Most of the time, the instruction cache has prestored average basis enables the instruction and execution units in it instructions at the addresses specified by the pro 12 and 13 to process data at a faster rate than the access gram counter 17, and the addressed instructions are time of the main memory 10. This cache 14 includes available immediately for transfer into an instruction means for storing selected predefined blocks of data buffer 19. From the buffer 19, the addressed instructions elements, means for receiving requests from the instruc are fed to an instruction decoder 20 which decodes both tion unit 12 via a translation buffer 15 to access a speci the op-codes and the specifiers.
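The block-fill behavior described for the main cache 14 can be sketched directly: a miss reads the entire block containing the requested word, so a later reference to a neighboring word hits without touching main memory. The block size and the dictionary-based organization below are illustrative choices, not the cache 14's actual geometry.

```python
BLOCK_WORDS = 4  # words per cache block (size chosen only for illustration)


class MainCache:
    """Sketch of a cache that acts as a "window" into main memory by
    filling whole blocks on a miss (the locality argument made above)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory   # backing store: address -> word
        self.blocks = {}                 # block number -> list of cached words

    def read(self, addr):
        blk, offset = divmod(addr, BLOCK_WORDS)
        if blk not in self.blocks:       # miss: fetch the entire block
            base = blk * BLOCK_WORDS
            self.blocks[blk] = [self.main_memory[base + i]
                                for i in range(BLOCK_WORDS)]
        return self.blocks[blk][offset]  # hit: no main-memory access needed
```

After one miss on address 5, addresses 4 through 7 are all resident, so the cache can be backed by a main memory with a proportionally slower access time, as the text argues.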
An operand processing fied data element, means for checking whether the data unit (OPU) 21 fetches the specified operands and element is in a block stored in the cache, and means supplies them to the execution unit 13. operative when data for the block including the speci The OPU 21 also produces virtual addresses. In par fied data element is not so stored for reading the speci ticular, the OPU 21 produces virtual addresses for memory source (read) and destination (write) operands. the GPRs and a list of source operands. Thus entries in For at least the memory read operands, the OPU 21 the source pointer queue will either point to GPR loca must deliver these virtual addresses to the interface unit tions for register operands, or point to the source list for 11 where they are translated to physical addresses. The memory and literal operands. Both the interface unit 11 physical memory locations of the cache 14 are then and the instruction unit 12 write entries in the source list accessed to fetch the operands for the memory source 24, and the execution unit 13 reads operands out of the operands. source list as needed to execute the instructions. For In each instruction, the first byte contains the opcode, executing instructions, the execution unit 13 includes an and the following bytes are the operand specifiers to be instruction issue unit 25, a microcode execution unit 26, decoded. The first byte of each specifier indicates the an arithmetic and logic unit (ALU) 22, and a retire unit addressing mode for that specifier. This byte is usually 27. broken in halves, with one half specifying the address The present invention is particularly useful with pipe ing mode and the other half specifying a register to be lined processors. As discussed above, in a pipelined used for addressing.
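The two-halves split of a specifier's first byte is a simple bit-field extraction. The sketch below shows only that split; real VAX specifiers have additional cases (e.g. short literals and extension bytes), so this is a simplification for illustration.

```python
def decode_specifier_byte(byte):
    """Split an operand specifier's first byte into its two halves:
    the high half selects the addressing mode, the low half a register."""
    mode = (byte >> 4) & 0xF
    register = byte & 0xF
    return mode, register
```

For example, the byte `0x51` splits into mode 5 and register 1, and `0xAF` into mode 10 and register 15.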
The instructions preferably have a processor the processor's instruction fetch hardware variable length, and various types of specifiers can be may be fetching one instruction while other hardware is used with the same opcode, as disclosed in Strecker et decoding the operation code of a second instruction, al., U.S. Pat. No. 4,241,397 issued Dec. 23, 1980. fetching the operands of a third instruction, executing a The first step in processing the instructions is to de fourth instruction, and storing the processed data of a code the "opcode" portion of the instruction. The first fifth instruction. FIG. 2 illustrates a pipeline for a typical portion of each instruction consists of its opcode which instruction such as: specifies the operation to be performed in the instruc tion. The decoding is done using a table-look-up tech ADDL3 R0,12(R1),R2 nique in the instruction decoder 20. The instruction decoder finds a microcode starting address for execut This is a long-word addition using the displacement ing the instruction in a look-up table and passes the mode of addressing. starting address to the execution unit 13. Later the exe In the first stage of the pipelined execution of this cution unit performs the specified operation by execut instruction, the program count (PC) of the instruction is ing prestored microcode, beginning at the indicated created; this is usually accomplished either by incre starting address. Also, the decoder determines where menting the program counter from the previous instruc source-operand and destination-operand specifiers tion, or by using the target address of a branch instruc occur in the instruction and passes these specifiers to the tion. The PC is then used to access the instruction cache OPU 21 for pre-processing prior to execution of the 18 in the second stage of the pipeline. instruction.
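The table-look-up dispatch described above is a two-level index: the opcode selects a block, and an execution-point counter (the position of the current specifier within the instruction) selects the entry. The table contents, field names, and dispatch addresses below are hypothetical; only the block/entry indexing scheme follows the text.

```python
# Hypothetical decode table. Each opcode owns a block of entries, one per
# execution point; each entry carries specifier attributes and a microcode
# dispatch address for the execution unit.
DECODE_TABLE = {
    0xC1: [  # a three-specifier ADDL3-style opcode (layout is illustrative)
        {"context": "longword", "access": "read",  "dispatch": 0x100},
        {"context": "longword", "access": "read",  "dispatch": 0x104},
        {"context": "longword", "access": "write", "dispatch": 0x108},
    ],
}

def decode_entry(opcode, exec_point):
    """The opcode byte addresses the block; the execution-point counter
    selects the particular entry within it."""
    return DECODE_TABLE[opcode][exec_point]
```

With this layout, advancing the execution-point counter as each specifier is consumed walks the decoder through the block, ending on the entry whose access mode is "write" for the destination specifier.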
In the third stage of the pipeline, the instruction data The look-up table is organized as an array of multiple is available from the cache 18 for use by the instruction blocks, each having multiple entries. Each entry can be decoder 20, or to be loaded into the instruction buffer addressed by its block and entry index. The opcode byte 19. The instruction decoder 20 decodes the opcode and addresses the block, and a pointer from an execution the three specifiers in a single cycle, as will be described point counter (indicating the position of the current in more detail below. The R1 number along with the specifier in the instruction) selects a particular entry in byte displacement is sent to the OPU 21 at the end of the the block. The output of the lookup table specifies the decode cycle. data context (byte, word, etc.), data type (address, inte In stage 4, the R0 and R2 pointers are passed to the ger, etc.) and accessing mode (read, write, modify, etc.) queue unit 23. Also, the operand unit 21 reads the con for each specifier, and also provides a microcode dis tents of its GPR register file at location R1, adds that patch address to the execution unit. value to the specified displacement (12), and sends the After an instruction has been decoded, the OPU 21 resulting address to the translation buffer 15 in the inter parses the operand specifiers and computes their effec face unit 11, along with an OP READ request, at the tive addresses; this process involves reading general end of the address generation stage. A pointer to a re purpose registers (GPRs) and possibly modifying the served location in the source list for receiving the sec GPR contents by autoincrementing or autodecrement ond operand is passed to the queue unit 23. When the
The operands are then fetched from those effective OP READ request is acted upon, the second operand addresses and passed on to the execution unit 13, which read from memory is transferred to the reserved loca later executes the instruction and writes the result into tion in the source list. the destination identified by the destination pointer for In stage 5, the interface unit 11 uses the translation that instruction. buffer 15 to translate the virtual address generated in Each time an instruction is passed to the execution stage 4 to a physical address. The physical address is unit, the instruction unit sends a microcode dispatch then used to address the cache 14, which is read in stage address and a set of pointers for (1) the locations in the 6 of the pipeline. execution-unit register file where the source operands In stage 7 of the pipeline, the instruction is issued to can be found, and (2) the location where the results are the ALU 22 which adds the two operands and sends the to be stored. Within the execution unit, a set of queues result to the retire unit 27. During stage 4, the register 23 includes a fork queue for storing the microcode dis numbers for R0 and R2, and a pointer to the source list patch address, a source pointer queue for storing the location for the memory data, were sent to the execu source-operand locations, and a destination pointer tion unit and stored in the pointer queues. Then during queue for storing the destination location. Each of these the cache read stage, the execution unit started to look queues is a FIFO buffer capable of holding the data for for the two source operands in the source list. In this multiple instructions. particular example it finds only the register data in R0, The execution unit 13 also includes a source list 24, but at the end of this stage the memory data arrives and which is a multi-ported register file containing a copy of is substituted for the invalidated read-out of the register