Processor-103

US005634119A United States Patent 19 11 Patent Number: 5,634,119 Emma et al. 45 Date of Patent: May 27, 1997 54) COMPUTER PROCESSING UNIT 4,691.277 9/1987 Kronstadt et al. ................. 395/421.03 EMPLOYING ASEPARATE MDLLCODE 4,763,245 8/1988 Emma et al. ........................... 395/375 BRANCH HISTORY TABLE 4,901.233 2/1990 Liptay ...... ... 395/375 5,185,869 2/1993 Suzuki ................................... 395/375 75) Inventors: Philip G. Emma, Danbury, Conn.; 5,226,164 7/1993 Nadas et al. ............................ 395/800 Joshua W. Knight, III, Mohegan Lake, Primary Examiner-Kenneth S. Kim N.Y.; Thomas R. Puzak. Ridgefield, Attorney, Agent, or Firm-Jay P. Sbrollini Conn.; James R. Robinson, Essex Junction, Vt.; A. James Wan 57 ABSTRACT Norstrand, Jr., Round Rock, Tex. A computer processing system includes a first memory that 73 Assignee: International Business Machines stores instructions belonging to a first instruction set archi Corporation, Armonk, N.Y. tecture and a second memory that stores instructions belong ing to a second instruction set architecture. An instruction (21) Appl. No.: 369,441 buffer is coupled to the first and second memories, for storing instructions that are executed by a processor unit. 22 Fied: Jan. 6, 1995 The system operates in one of two modes. In a first mode, instructions are fetched from the first memory into the 51 Int, C. m. G06F 9/32 instruction buffer according to data stored in a first branch 52 U.S. Cl. ....................... 395/587; 395/376; 395/586; history table. In the second mode, instructions are fetched 395/595; 395/597 from the second memory into the instruction buffer accord 58 Field of Search ............................ 395/375, 421.03, ing to data stored in a second branch history table. 395/800, 376,586, 587, 595,597 The first instruction set architecture may be system level 56 References Cited instructions and the second instruction set architecture may U.S. PATENT DOCUMENTS be millicodeinstructions that, for example, define a complex system level instruction and/or emulate a third instruction 3,702,007 10/1972 Davis, II ................................. 395/375 set architecture. 4.338,661 7/1982 Tredennicket al. .................... 395/375 4,477,872 10/1984 Losq ........................................ 395/375 4,679,141 7/1987 Pomerene et al. ...................... 395/375 40 Claims, 8 Drawing Sheets MEMORY SYSTEM MILLICODE ARRAY INSTRUCTION 05 BUFFER processor-103 CONDITIONAL BRANCH BRANCH RESOLUTION STORAGE LOGIC U.S. Patent May 27, 1997 Sheet 1 of 8 5,634,119 MEMORY SYSTEM MILLICODE ARRAY INSTRUCTION 05 BUFFER PROCESSOR CONDITIONAL BRANCH BRANCH RESOLUTION STORAGE LOGIC U.S. Patent May 27, 1997 Sheet 2 of 8 5,634,119 FIG. 2 FROM INSTRUCTION O3 BUFFERS 05 --------------- MILLICODE DETECTION LOGIC TO MHT/BHT STATIC ENABLE/ CONDITION DATA DISABLE er's TO KEEE ------ TO BRANCH RESOLUTION 308 LOGIC 2. U.S. Patent May 27, 1997 Sheet 3 of 8 5,634,119 FIG. 3 200 INSTRUCTION SET ARCHITECTURE 208 202 HARDWARE 20 INSTRUCTIONS MILLICODE INSTRUCTIONS 204 214 U.S. Patent May 27, 1997 Sheet 4 of 8 5,634,119 FIG. 4 2 3 4 5 BRANCH TRAP GW INSTR INSTR BS, TARGET re- INSTR LOCATION ADDRESS "it OCATION centee BA v opera oedeo O2E7A BC o2E70 7E LA 182 LTR o2E180 184 BC v O2EDE O2EDE | T | | | | | 02EDo Ee ec voteos oeso cate08 NI o2200 20c w v 20 LH | | | | | 02E210 24 N 28 STH | | | | | ec 20 oeeeeo 22 w | | | | 22e TM 220 BC v 02:223 o2E234 LH | | | | | 02E230 U.S. Patent May 27, 1997 Sheet 5 of 8 5,634,119 FIG. 5 2 3 4 5 INSTR TAKEN BRANCH GW LOCATION eINSTR BRANCHESSTARGET LOCATIONINSTR OED20 c. v OEF7c OEE20 OEF7c B | | | | OEF170 180 x OEF180 182 u 184 to v OEF38c OEF38c OEF380 U.S. Patent 5,634,119 ?I,IX|081+00 M?NIHIIM SNdWH1 U.S. Patent May 27, 1997 Sheet 8 of 8 5,634,119 FIG. 8 26 SM MMMM FIG. 9 SC3 SC2 SC SCO 5,634,119 1. 2 COMPUTER PROCESSING UNIT termined size, typically from 1024 to 4096 entries. Entries EMPLOYNG ASEPARATE MILLCODE are made only for taken branches as they are encountered. BRANCH HISTORY TABLE When the table is full, adding a new entry requires displac ing an older entry. This may be accomplished by a Least BACKGROUND OF THE INVENTION 5 Recently Used (LRU) algorithm as in caches. 1. Technical Field In principle, each branch in the stream of instructions being executed is looked up in the table, by its address, and The invention relates to computer processor units and, if it is found, its target is fetched and becomes the next more particularly, to controlling the fetching of instructions instruction in the stream. If the branch is not in the table, it in computer processing units. 10 is presumed not taken. As the execution of the branch 2. Description of the Prior Art instructions proceeds, the table is updated accordingly. If a Many of today's computer processing units (CPUs) and branch predicted to be taken is not taken, the associated table microprocessors have a Complex Instruction Set Computer entry is deleted. If a branch predicted not to be taken is (CISC) Architecture. For example, the architectures of the taken, a new entry is made for it. If the predicted target IBM System 390 and the Intel X86 and Pentium micropro 15 address is wrong, the corrected address is entered. cessors are classified as CISC Architectures. The instruction However, using the system level branch history table for set of a CISC architecture system include both simple both the system level instructions and the millicode instruc instructions (e.g. Load, or Add) and complex instructions tions is inefficient and may not improve performance due to (e.g. Edit and Load, Start Interpretive Execution or a number of factors. One of these factors, for example, is Diagnose). 20 based on the observation that millicode instructions will not Conventionally, the complex functions of the CISC archi be executed very often. Thus, when millicode instructions tecture are implemented in microcode because building are encountered, most branch entries created in the system hardware execution units to execute the complex functions level branch history table for the millicode instructions will is expensive and error prone. Additionally, implementing have been overwritten. An additional factor is that utilizing complex instructions in microcode provides flexibility to fix 25 the branch history table for millicode branches also dis problems and expandability in that additional functions can places system level branchinstructions. Thus, in some cases, be included later. However, as the number of complex performance may actually be lost by using the conventional instructions implemented in microcode grows larger and system level branch history table to predictmillicode branch larger, the cycle time of the hardware execution units in Outcomes. executing the microcode instructions and the real estate 30 required for the microcode itself becomes a limiting factor SUMMARY OF THE INVENTION in the performance of the processor/CPU. The above-stated problems and related problems of the To solve this problem, designers have implemented mil prior art are solved with the principles of the present licode routines that execute such complex functions. As 35 invention, computer processor unit employing a millicode shown in U.S. Pat. No. 5.226,164 to Nadas et al., the branch history table. The computer processing unit of the millicode routines may be stored in a separate millicode present invention includes a first memory that stores instruc array. The millicode routines may include standard system tions belonging to a first set of instructions and a second architecture instructions and/or additional hardware instruc memory that stores instructions belonging to a second set of tions that are added to improve performance. The set of 40 instructions. An instruction buffer is coupled to the first and additional hardware instructions stored in the millicode second memories, for storing instructions that are executed array may be considered to be an alternate architecture that by a processor unit. The system operates in one of two the computer processing unit can operate. modes. In a first mode, instructions are fetched from the first Additionally, the millicode routines may be used to emu memory into the instruction buffer according to data stored late a second instruction set architecture. In this case, the 45 in a first table. In the second mode, instructions are fetched additional hardware instructions stored in the millicode from the second memory into the instruction buffer accord array emulate instructions from a second instruction set ing to data stored in a second table. architecture. The first and second set of instructions typically include While the above-described system provides increased at least one branch type instruction. In the illustrative flexibility and processing speed, it leaves a number of 50 embodiment, the first table is a branch history table associ problems unsolved. One problem relates to the manipulation ated with the first set of instructions and the second table is of millicode branch instructions. In many instances, it may a branch history table associated with the second set of be desirable to predict the outcome of the millicode branch instructions. The first set of instructions may be system level instructions such that the proper instruction (next sequential instructions and the second set of instructions may be or target) can be fetched before or at least as soon as it is 55 millicode instructions that for example, define a complex needed, so that no delay occurs in executing the millicode system level instruction and/or emulate a second instruction instruction stream. set architecture. An effective strategy for predicting the outcome of branch instructions at the system level is embodied in U.S.

Processor-103

The 32-Bit PA-RISC Run-Time Architecture Document, V. 1.0 for HP

Subchapter 2.4–Hp Server Rp5400 Series

Analysis of GPGPU Programs for Data-Race and Barrier Divergence

Evolving GPU Machine Code

Readingsample

MT-088: Analog Switches and Multiplexers Basics

A Remotely Accessible Network Processor-Based Router for Network Experimentation

The 32-Bit PA-RISC Run- Time Architecture Document

Design of the RISC-V Instruction Set Architecture

A Lisp Oriented Architecture by John W.F

Abatement of Area and Power in Multiplexer Based on Cordic Using

Graphical Microcode Simulator with a Reconfigurable Datapath