United States Patent [19J [11] Patent Number: 6,112,019 Chamdani Et Al

Total Page:16

File Type:pdf, Size:1020Kb

United States Patent [19J [11] Patent Number: 6,112,019 Chamdani Et Al Illlll llllllll Ill lllll lllll lllll lllll lllll 111111111111111111111111111111111 US006112019A United States Patent [19J [11] Patent Number: 6,112,019 Chamdani et al. [45] Date of Patent: Aug. 29, 2000 [54] DISTRIBUTED INSTRUCTION QUEUE "An Efficient Algorithm for Exploring Multiple Arithmetic Units," Tomasulo IBM Journal, Jan. 1967, pp. 25-33. [75] Inventors: Joseph I. Chamdani, Marietta; Cecil 0. Alford, Lawrenceville, both of Ga. "Implementation of Precise Interrupts In Pipelined Proces­ sors," James E. Smith Andrew R. Pleszkun,© 1985 IEEE, [73] Assignee: Georgia Tech Research Corp., Atlanta, pp. 36-44. Ga. "Instruction Issue Logic in Pipelined Supercomputers", [21] Appl. No.: 08/489,509 Shlomo Weiss & James E. Smith,© 1984 IEEE, Transac­ [22] Filed: Jun. 12, 1995 tions on Computers, vol. c-33, No. 11, pp. 1012-1022 Nov. 1984. [51] Int. CI.7 ........................................................ G06F 9/22 [52] U.S. Cl. .............................................................. 395/390 "The Metafiow Architecture", Popescu et al, IEEE Micro,© [58] Field of Search ..................................... 395/376, 378, 1991 IEEE, pp. 10-13, 63-73. 395/381, 382, 391, 393, 390 [56] References Cited Primary Examiner-David Y. Eng Attorney, Agent, or Firm-Thomas, Kayden, Horstemeyer U.S. PATENT DOCUMENTS & Risley, L.L.P. 3,924,245 12/1975 Eaton et al. ....................... 395/421.09 4,725,947 2/1988 Shonai et al. ........................... 395/585 [57] ABSTRACT 4,736,288 4/1988 Shintani et al. .. ... ... ... .... ... ... ... 395 /395 4,752,873 6/1988 Shonai et al. ...................... 395/800.23 A distributed instruction queue (DIQ) in a superscalar 4,780,810 10/1988 Torii et al. .... ... ... ... ... .... ... ... ... 395 /800 microprocessor supports multi-instruction issue, decoupled 4,896,258 1/1990 Yamaguchi et al. .................... 395/583 data flow scheduling, out-of-order execution, register 4,992,938 2/1991 Cocke et al. ............................ 395/393 renaming, multi-level speculative execution, and precise 5,050,067 9/1991 McLogan et al. ...................... 395/678 5,122,984 6/1992 Strehler ..................................... 365/49 interrupts. The DIQ provides distributed instruction shelving 5,129,067 7/1992 Johnson .................................. 395/375 without storing register values, operand value copying, and 5,136,697 8/1992 Johnson .................................. 395/375 result value forwarding, and supports in-order issue as well 5,208,914 5/1993 Wilson et al. .......................... 395/275 as out-of-order issue within its functional unit. The DIQ 5,261,066 11/1993 Jouppi et al. ........................... 395/425 5,345,569 9/1994 Tran ........................................ 395/393 allows a reduction in the number of global wires and 5,355,457 10/1994 Shebanow et al. ..................... 395/394 replacement with private-local wires in the processor. The 5,367,703 11/1994 Levitan . ... ... ... .... ... ... ... ... ... .... .. 395 /800 DIQ's number of global wires remains the same as the 5,371,684 12/1994 Iadonato et al. ........................ 364/491 number of DIQ entries and data size increases. The DIQ 5,414,822 5/1995 Saito et al. .............................. 395/375 maintains maximum machine parallelism and the actual 5,619,730 4/1997 Ando ....................................... 395/855 performance of the microprocessor using the DIQ is better OIBER PUBLICATIONS due to reduced cycle time or more operations executed per Sohi, "Instruction Issue Logic for High-Performance, Inter­ cycle. ruptible, Multiple Function Unit, Pipelined Computers", 1990 IEEE, pp. 349-359. 22 Claims, 40 Drawing Sheets (from dispatch buses) (to execution unit) 300 340 DIQ alloc_port 310 bottom_DIQ_entry (O .. Nd-1) (issued_DIQ_entry) / 365 mispred_flag 350 Tail Pointer Logic 360 391 Note: insLID ;:;unique instruction tag opcode =opcode of the instruction RS1 =register number of first source operand RS1 _tag =register tag of first source operand RS2 =register number of second source operand RS2_tag ""register tag of second source operand In-order Issue Distributed Instruction Queue(DIQ) •d \JJ. • Fetch Decode Dispatch Issue Execute Writeback Retire ~ ~ Execution Unit results to ...... Result ~ instructions Register Decoder Central ~ • Buffer ...... from Window • [[ File = I-cache __J • w Execution Unit load Store store to Load/Store Unit > Buffer D-cache ~= N ~~ load from D-cache cN c c (a) with a Central Instruction Window Fetch Decode Dispatch Issue Execute Write back Retire 'Jl ~=- ~ Dist. Window Execution Unit results to ..... Result • instructions Register • • Buffer • '"""'0....., from Decoder File • K: • • .i;;.. I-cache • • c Dist. Window I •I Execution Unit Store store to Load/Store Unit r ~1 Dist. Window --- Buffer D-cache ....0--, load from D-cache ~ ~ Fig. 1 ....N (a) with Distributed Instruction Windows =~ \C •d \JJ. • ~ ......~ ~ =...... Note: inst 1 and inst 2 can be a floating-point arithmetic and/or a load/store instruction ~ inst 1 inst 2 ~ N Free List (FL) ~~ S2IS3 OPITIS1 IS2IS3 cN I I I I I c c Mapping Table 33 I 34 I 35 I 36 I 371 38 I 39 32X6 3 'Jl Pending-Target ~=­ Return Queue (PTRQ) .....~ N 0....., Register Mapping Table in IBM RS/6000 Floating-Point Unit .i;;.. c Fig. 2 ....0--, ~ ~ ....N =~ \C •d \JJ. Register File I : right_opr_bus (operand data • (In-Order State) left_opr_bus to functional units) I ~ ......~ ~ retire bus Result =...... Comparator/ Shift Bypass Network Register • • • entry number I Reorder Buffer ---- J ~ (Look-Ahead State) - result bus (result value & exception condition ~ from a functional unit) N ~~ Result Shift Register cN Reorder Buffer c functional c stage valid tag entry dest. excep- program unit source result valid number reg tions counter N 0 'Jl • • • • • • • • • • ~=­ • • • • • • • • • • .....~ • • • • • • • • • • ~ shift 0....., 6 tail- direction 1 4 .i;;.. 5 float add c 5 0 0 17 4 0 head- 4 4 0 16 l 3 0 3 2 integer add 1 5 1 0 ....0--, ~ Note: N =the length of longest functional-unit pipeline ~ ....N Fig. 3 Reorder Buffer Organization =~ \C •d \JJ. • tag oocode S1 a(S1) S2 a(S2) D a<D) B(D) 12 ~ top ......~ 16 fadd RO 2 R4 2 RO 2 2 8 ~ 15 fadd R4 1 R6 1 R4 1 1 4 ...... 14 fadd R6 0 R7 0 R6 0 0 0 = 13 fadd R4 0 RS 0 R4 0 0 0 12 fadd RO 1 R2 1 RO 1 1 4 11 fadd R2 0 R3 0 R2 0 0 0 bottom IO fadd RO 0 R1 0 RO 0 0 0 ~ (a) Before Issue ~ N ~~ cN c taa oocode S1 a(S1) S2 a(S2) D a<D) B(D) 12 c top empty spaces ready for subsequent instructions 'Jl ~=­ 16 fadd RO 1 R4 1 RO 1 1 4 .....~ 15 fadd R4 0 R6 0 R4 0 0 0 .i;;.. bottom T2 fadd RO 0 R? 0 RO 0 0 0 0....., c.i;;.. (b) After Issue and Completion Note: S1/S2 =first/second source register identifier, D =destination register identifier, a(X) = # of times register X is designated as a destination register in preceding inst (below it), P(X) =#of times register Xis designated as a source register in preceding instruction (below it), = issue index= a(S1) + a(S2) + a(D) + p(D). ....0--, ~ 8-Entry Dispatch Stack ~ ....N Fig. 4 =~ \C U.S. Patent Aug. 29, 2000 Sheet 5 of 40 6,112,019 issue writeback 12 15 16 retire I 15 16 (a) Instruction Timing Note: "float add" takes 6 cycles to complete after issued. Entry PC Opcode Source 0 lerand 1Source 0 )erand 2 Destination Executed Excep- Number ·eady tag content ready tag conten tag con ten tions tail - 7 6 16 fadd 0 0.2 - 0 4.2 - 0.3 - 0 - I+-alloc 5 15 fadd 0 4.1 - 0 6.1 - 4.2 - 0 - I+- not rdy 4 14 fadd 1 - R60 1 - R~ 6.1 - 0 - I+-4th issue 3 13 fadd 1 - R40 1 - R50 4.1 - 0 - r+- 3rd issue 2 12 fadd 0 0.1 - 0 2.1 - 0.2 - 0 - -+- not rdy 1 11 fadd 1 - R~ 1 - R30 2.1 - 0 - -+- 2nd issue head - 0 IO fadd 1 - ROO 1 - R10 0.1 - 0 - -+- 1st issue (b) RUU Snapshot at Cycle 7 Note: Rik means the kth instance of register Ri (register renaming) Entry PC Opcode Source Ooerand 1Source Operand 2 Destination Executed Excep- Number eady tag content ready tag conten tag conten tions tail - 7 6 16 fadd 0 0.2 . 0 4.2 - 0.3 - 0 - -+- not rdy 5 15 fadd 0 4.1 - 0 6.1 - 4.2 - 0 - +- not rdy 4 14 fadd 1 - R60 1 - R?O 6.1 - 0 - -+- in exec 3 13 fadd 1 - R40 1 - R50 4.1 - 0 - -+- in exec 1 1 2 12 fadd 1 0.1 R0 1 2.1 R2 0.2 - 0 - -+- ready 1 11 fadd 1 - R20 1 - R30 2.1 R21 1 none -+- wrtback head 0 IO fadd 1 - ROO 1 - R10 0.1 R01 1 none -+- retire - (c) RUU Snapshot at Cycle 9 Register Update Unit Fig. 5 •d \JJ. • Instruction ~ Cache ......~ ~ =...... Fetch & Decode ORIS > Register ~= Retire Write- • File (Deferred-Scheduling, Register-Renaming back • N Instruction Shelf) I • I ~~ cN c c ••• operand buses • • • Issue/Schedule operand and instruction routing 'Jl write back ~=­ • • • buses .....~ O'I • • • bypass 0....., FU1 FUn buses c.i;;.. functional units • • • ....0--, Metaflow Architecture ~ ~ Fig. 6 ....N =~ \C •d \JJ. • cycle­ 1 2- 3- 4 5- 6- 7 8- 9- 10. - 11.. allocate­ IO, 11, 12, 13 14, 15, I6 ~ issue­ IO 11 13 14 12 15 16 ......~ writeback­ IO 11 13 14 12 15 ~ retire- IO 11 13 14 12 13 ...... (a) Instruction Timing = Note: Assume there are 4 allocate ports, 4 retire ports, -- 2 floating-point add FUs (class rum 2) with 3-cycle latency. Source Operand 1 Source Operand 2 Destination FU Index Disp. Exec. Opcode PC Locked R num ID Locked R num ID latest Rnum content class ~ 7 ~ N 6 1 0 0.2 1 4 0.5 1 0 - 0 2 0 fadd I6 ._alloc ._alloc ~~ 5 1 4 0.3 1 6 0.4 1 4 - 0 2 0 fadd 15 N c 4 0 6 - 0 7 - 1 6 - 0 2 0 fadd 14 -alloc c c 3 0 4 - 0 5 - 0 4 - 0 2 0 fadd 13 ._ready 2 1 0 0.0 1 2 0.1 0 0 - 0 2 0 fadd 12 .__not rdy 1 0 2 - 0 3 - 1 2 - 1 2 0 fadd 11 -1st iss h 0 0 0 1 0 1 fadd -1st iss 0 - - 0 - 2 0 IO 'Jl Note: Ri'\ means the ktn instance of (b) ORIS Snapshot at Cycle 2 ~=­ register Ri (register renaming) .....~ Source Operand 1 Source Operand 2 Destination FU -..J Disp.
Recommended publications
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Section 2. Worldwide IC Vendors
    2 WORLDWIDE IC MANUFACTURERS OVERVIEW Analysis of worldwide integrated circuit (IC) sales by region from 1975 through 1996 provides a perspective to the increasingly global importance of semiconductor manufacturing (Figure 2-1). Historically, sales by North American companies were dominant in the mid-1970s. By the mid- 1980s IC sales were evenly split between North American and Japanese companies. By 1990, the Japanese had increased their share of worldwide IC sales to a leading 48 percent, mainly at the expense of North American companies. Then, in 1992 Japan’s economy weakened causing their share to drop; a persistent economic slump in the following years coupled with Korean IC manu- facturers’ success in the memory market caused the Japanese share to continue dropping. The Japanese share fell further in 1996 primarily because of the yen to dollar exchange rate and the decline in the dynamic random access memory (DRAM) market. While market shares for North American and Japanese IC manufacturers have fluctuated over the past 22 years, European IC companies have continued to capture a relatively steady seven to ten percent of worldwide IC sales. The share belonging to ROW companies, which is dominated by Korean and Taiwanese manufacturers, surpassed that of European companies in 1992, reaching 15 percent in 1995, and 14 percent in 1996. A ranking of the world’s top ten merchant semiconductor sales leaders in 1996, and ICE’s estimate for 1997, is given in Figure 2-2. The estimated sales for these companies as a group increased 10 percent during 1997. The 1996 top fifteen worldwide IC sales leaders are listed in Figure 2-3.
    [Show full text]
  • Bruce D. Lightner 3/3/16, 5:07 PM
    Bruce D. Lightner 3/3/16, 5:07 PM Bruce D. Lightner's Home Page Bruce Who? I am a seventh generation Californian born in smoggy Los Angeles, California. My mother (who had asthma) wised up and moved us to Guam (on a freighter, the USS Edgar F. Luckenbach) when I was in the second grade. I returned to LA (North Hollywood) in 1963 to finish high school...because Typhon Karen blew away Guam's only high school. In 1966, as soon as I graduated from North Hollywood High School, I too wised up and moved from LA to San Diego to study something at Revelle College at UC San Diego. (You can't choose a major there before your third year.) In exchange for washing dishes, they eventually taught me chemistry. I met my wife Sherri there...we both worked in the cafeteria. I didn't know she was going get me to marry her at the time...but to her...and my...and her parent's...surprise, she eventually did. The Revelle College Provost finally made me graduate in 1972...but being clever, I enrolled in graduate school at UCSD and managed to stay even longer, learning to be a geo/cosmo-chemist. In the Chemistry Department at UCSD I learned a lot about moon rocks and meteorites...and computers and electronics. Although I am technically a chemist, no one seems to want to pay me for that. Instead, they pay me for something I never studied in school, namely computers. That's actually best, because I never liked chemistry that much, except for the computers...and the part where we melted moon rocks with a big radio transmitter and analyzed the He, Ne, Ar, Kr and Xe isotopes which leaked out.
    [Show full text]
  • AES Candidate C
    Public Comments Regarding The Advanced Encryption Standard (AES) Development Effort Round 1 Comments [in response to a notice in the September 14, 1998 Federal Register (Volume 63, Number 177; pages 49091-49093)] From: "Yasuyoshi Kaneko" <[email protected]> To: <[email protected]> Subject: AES Candidate Comments Date: Tue, 25 Aug 1998 11:43:28 +0900 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 4.72.2106.4 X-MimeOLE: Produced By Microsoft MimeOLE V4.72.2106.4 Messrs. Information Technology Laboratory I am Y.Kaneko belonging Telecommunications Advancement Organization of Japan(TAO). I have found the specifications of all AES candidate Algolizums on web and begun to read. In my opinion, these 15 ciphers are classified to 4 groups. Namely 'DES-type' , 'IDEA-type' , 'SAFER-type' and 'other-type' are found out from these specifications. 'DES-type' ciphers are 'DEAL', 'DFC', 'E2' and 'LOKI97' , these ciphers have feistel structures with a F-function. 'IDEA-type' ciphers are 'CAST-256', 'MARS', 'RC6' and 'TWOFISH', these ciphers have paralell 4 inputs and 4 outputs with some (F-) functions. 'SAFER-type' ciphers are 'CRYPTON', 'MAGENTA', 'RIJENDA', 'SAFER+' and 'SERPENT', these ciphers have a kind of Pseude-Hadamard Transform structure with invertible functions per same bits-length input/output. 'other-type' ciphers are 'HPC' and 'FLOG' these ciphers have particular structures with original algolithms. The differences of the first three types are summarized as followings: 1) Differential and Linear cryptanalysis From the viewpoint of the differential and linear cryptanalysis, supposing that r-round cipher (which round numbers would be logically and experimentaly obtained) gives an enough strong ciphers for each types then 'DES-type' ciphers need r+3 or r+2 round to keep security, because (r-2)-round or (r-3)-round differential or linear attacks are possible for the feistel structure, 'IDEA-type' ciphers and 'SAFER-type' ciphers need r+1 round to keep security in general meaning.
    [Show full text]