United States Patent [19]   [11] Patent Number: 6,112,019   Chamdani et al.

[45] Date of Patent: Aug. 29, 2000

[54] DISTRIBUTED INSTRUCTION QUEUE

[75] Inventors: Joseph I. Chamdani, Marietta; Cecil O. Alford, Lawrenceville, both of Ga.
[73] Assignee: Georgia Tech Research Corp., Atlanta, Ga.
[21] Appl. No.: 08/489,509
[22] Filed: Jun. 12, 1995
[51] Int. Cl.7: G06F 9/22
[52] U.S. Cl.: 395/390
[58] Field of Search: 395/376, 378, 381, 382, 391, 393, 390

[56] References Cited

U.S. PATENT DOCUMENTS
3,924,245   12/1975   Eaton et al.      395/421.09
4,725,947    2/1988   Shonai et al.     395/585
4,736,288    4/1988   Shintani et al.   395/395
4,752,873    6/1988   Shonai et al.     395/800.23
4,780,810   10/1988   Torii et al.      395/800
4,896,258    1/1990   Yamaguchi et al.  395/583
4,992,938    2/1991   Cocke et al.      395/393
5,050,067    9/1991   McLogan et al.    395/678
5,122,984    6/1992   Strehler          365/49
5,129,067    7/1992   Johnson           395/375
5,136,697    8/1992   Johnson           395/375
5,208,914    5/1993   Wilson et al.     395/275
5,261,066   11/1993   Jouppi et al.     395/425
5,345,569    9/1994   Tran              395/393
5,355,457   10/1994   Shebanow et al.   395/394
5,367,703   11/1994   Levitan           395/800
5,371,684   12/1994   Iadonato et al.   364/491
5,414,822    5/1995   Saito et al.      395/375
5,619,730    4/1997   Ando              395/855

OTHER PUBLICATIONS
Tomasulo, "An Efficient Algorithm for Exploring Multiple Arithmetic Units," IBM Journal, Jan. 1967, pp. 25-33.
James E. Smith and Andrew R. Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors," © 1985 IEEE, pp. 36-44.
Shlomo Weiss and James E. Smith, "Instruction Issue Logic in Pipelined Supercomputers," IEEE Transactions on Computers, vol. C-33, No. 11, pp. 1012-1022, Nov. 1984, © 1984 IEEE.
Popescu et al., "The Metaflow Architecture," IEEE Micro, © 1991 IEEE, pp. 10-13, 63-73.
Sohi, "Instruction Issue Logic for High-Performance, Interruptible, Multiple Function Unit, Pipelined Computers," 1990 IEEE, pp. 349-359.

Primary Examiner: David Y. Eng
Attorney, Agent, or Firm: Thomas, Kayden, Horstemeyer & Risley, L.L.P.

[57] ABSTRACT

A distributed instruction queue (DIQ) in a superscalar microprocessor supports multi-instruction issue, decoupled data flow scheduling, out-of-order execution, register renaming, multi-level speculative execution, and precise interrupts. The DIQ provides distributed instruction shelving without storing register values, operand value copying, and result value forwarding, and supports in-order issue as well as out-of-order issue within its functional unit. The DIQ allows a reduction in the number of global wires and replacement with private-local wires in the processor. The DIQ's number of global wires remains the same as the number of DIQ entries and data size increases. The DIQ maintains maximum machine parallelism and the actual performance of the microprocessor using the DIQ is better due to reduced cycle time or more operations executed per cycle.

22 Claims, 40 Drawing Sheets
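The abstract describes a per-functional-unit queue that shelves instructions without storing register values, and the front-page drawing's note lists what an entry holds: an instruction tag, the opcode, and each source's register number and register tag. A minimal Python sketch of the in-order-issue variant follows, under loudly stated assumptions: readiness is checked against a hypothetical `ready_tags` set (this excerpt does not say how the DIQ learns a tag is available), instruction tags are assumed to grow in program order, and `flush_younger_than` is an assumed stand-in for whatever `mispred_flag` triggers. The class and method names are illustrative, not the patent's; out-of-order issue is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class DIQEntry:
    # Fields named after the front-page drawing note; no operand values are kept.
    inst_id: int   # unique instruction tag (assumed to grow in program order)
    opcode: str
    rs1: int       # register number of first source operand
    rs1_tag: int   # register tag of first source operand
    rs2: int       # register number of second source operand
    rs2_tag: int   # register tag of second source operand


class InOrderDIQ:
    """Hypothetical in-order-issue queue serving one functional unit."""

    def __init__(self, nd: int):
        self.nd = nd             # number of entries (Nd)
        self.entries = deque()   # entries[0] plays the role of bottom_DIQ_entry

    def allocate(self, entry: DIQEntry) -> bool:
        """Append at the tail; return False (dispatch stalls) when full."""
        if len(self.entries) >= self.nd:
            return False
        self.entries.append(entry)
        return True

    def issue(self, ready_tags: set) -> Optional[DIQEntry]:
        """Issue the bottom entry if both of its source tags are available.

        Younger entries never bypass a stalled bottom entry (in-order issue).
        `ready_tags` is an assumed stand-in for the readiness mechanism.
        """
        if self.entries:
            bottom = self.entries[0]
            if bottom.rs1_tag in ready_tags and bottom.rs2_tag in ready_tags:
                return self.entries.popleft()
        return None

    def flush_younger_than(self, branch_id: int) -> None:
        """Assumed misprediction recovery: drop entries younger than the branch."""
        self.entries = deque(e for e in self.entries if e.inst_id <= branch_id)


# Usage: the second entry's operands are ready, but it must wait behind the first.
q = InOrderDIQ(nd=4)
q.allocate(DIQEntry(0, "fadd", 0, 10, 1, 11))
q.allocate(DIQEntry(1, "fadd", 2, 12, 3, 13))
print(q.issue({12, 13}))          # None: bottom entry still waits on tags 10 and 11
print(q.issue({10, 11}).inst_id)  # 0
```

Keeping only tags in each entry mirrors the abstract's point that values are neither stored nor copied into the queue; where the values actually live is outside this sketch.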
Nd-1) (issued_DIQ_entry) / 365 mispred_flag 350 Tail Pointer Logic 360 391 Note: insLID ;:;unique instruction tag opcode =opcode of the instruction RS1 =register number of first source operand RS1 _tag =register tag of first source operand RS2 =register number of second source operand RS2_tag ""register tag of second source operand In-order Issue Distributed Instruction Queue(DIQ) •d \JJ. • Fetch Decode Dispatch Issue Execute Writeback Retire ~ ~ Execution Unit results to ...... Result ~ instructions Register Decoder Central ~ • Buffer ...... from Window • [[ File = I-cache __J • w Execution Unit load Store store to Load/Store Unit > Buffer D-cache ~= N ~~ load from D-cache cN c c (a) with a Central Instruction Window Fetch Decode Dispatch Issue Execute Write back Retire 'Jl ~=- ~ Dist. Window Execution Unit results to ..... Result • instructions Register • • Buffer • '"""'0....., from Decoder File • K: • • .i;;.. I-cache • • c Dist. Window I •I Execution Unit Store store to Load/Store Unit r ~1 Dist. Window --- Buffer D-cache ....0--, load from D-cache ~ ~ Fig. 1 ....N (a) with Distributed Instruction Windows =~ \C •d \JJ. • ~ ......~ ~ =...... Note: inst 1 and inst 2 can be a floating-point arithmetic and/or a load/store instruction ~ inst 1 inst 2 ~ N Free List (FL) ~~ S2IS3 OPITIS1 IS2IS3 cN I I I I I c c Mapping Table 33 I 34 I 35 I 36 I 371 38 I 39 32X6 3 'Jl Pending-Target ~=­ Return Queue (PTRQ) .....~ N 0....., Register Mapping Table in IBM RS/6000 Floating-Point Unit .i;;.. c Fig. 2 ....0--, ~ ~ ....N =~ \C •d \JJ. Register File I : right_opr_bus (operand data • (In-Order State) left_opr_bus to functional units) I ~ ......~ ~ retire bus Result =...... Comparator/ Shift Bypass Network Register • • • entry number I Reorder Buffer ---- J ~ (Look-Ahead State) - result bus (result value & exception condition ~ from a functional unit) N ~~ Result Shift Register cN Reorder Buffer c functional c stage valid tag entry dest. 
excep- program unit source result valid number reg tions counter N 0 'Jl • • • • • • • • • • ~=­ • • • • • • • • • • .....~ • • • • • • • • • • ~ shift 0....., 6 tail- direction 1 4 .i;;.. 5 float add c 5 0 0 17 4 0 head- 4 4 0 16 l 3 0 3 2 integer add 1 5 1 0 ....0--, ~ Note: N =the length of longest functional-unit pipeline ~ ....N Fig. 3 Reorder Buffer Organization =~ \C •d \JJ. • tag oocode S1 a(S1) S2 a(S2) D a<D) B(D) 12 ~ top ......~ 16 fadd RO 2 R4 2 RO 2 2 8 ~ 15 fadd R4 1 R6 1 R4 1 1 4 ...... 14 fadd R6 0 R7 0 R6 0 0 0 = 13 fadd R4 0 RS 0 R4 0 0 0 12 fadd RO 1 R2 1 RO 1 1 4 11 fadd R2 0 R3 0 R2 0 0 0 bottom IO fadd RO 0 R1 0 RO 0 0 0 ~ (a) Before Issue ~ N ~~ cN c taa oocode S1 a(S1) S2 a(S2) D a<D) B(D) 12 c top empty spaces ready for subsequent instructions 'Jl ~=­ 16 fadd RO 1 R4 1 RO 1 1 4 .....~ 15 fadd R4 0 R6 0 R4 0 0 0 .i;;.. bottom T2 fadd RO 0 R? 0 RO 0 0 0 0....., c.i;;.. (b) After Issue and Completion Note: S1/S2 =first/second source register identifier, D =destination register identifier, a(X) = # of times register X is designated as a destination register in preceding inst (below it), P(X) =#of times register Xis designated as a source register in preceding instruction (below it), = issue index= a(S1) + a(S2) + a(D) + p(D). ....0--, ~ 8-Entry Dispatch Stack ~ ....N Fig. 4 =~ \C U.S. Patent Aug. 29, 2000 Sheet 5 of 40 6,112,019 issue writeback 12 15 16 retire I 15 16 (a) Instruction Timing Note: "float add" takes 6 cycles to complete after issued. 
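The issue index in Fig. 4 can be checked mechanically. The Python sketch below recomputes α, β, and Σ for the instruction sequence read off Fig. 4(a), then keeps only the entries whose index is nonzero and recomputes, which reproduces Fig. 4(b): the entries that issue (I0, I1, I3, I4) are exactly those with Σ = 0, i.e. with no destination conflicts against older entries still in the stack. Counting a register twice in β when an instruction names it in both source fields is our reading of "number of times"; the names `Inst`, `alpha`, `beta`, and `issue_indices` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    tag: str
    s1: str  # first source register
    s2: str  # second source register
    d: str   # destination register


def alpha(reg, older):
    """Number of older entries that designate reg as their destination."""
    return sum(1 for o in older if o.d == reg)


def beta(reg, older):
    """Number of times older entries designate reg as a source."""
    return sum((o.s1 == reg) + (o.s2 == reg) for o in older)


def issue_indices(stack):
    """stack[0] is the bottom (oldest) entry; returns {tag: issue index}."""
    result = {}
    for i, inst in enumerate(stack):
        older = stack[:i]  # entries below this one
        result[inst.tag] = (alpha(inst.s1, older) + alpha(inst.s2, older)
                            + alpha(inst.d, older) + beta(inst.d, older))
    return result


# Sequence of Fig. 4(a), listed bottom (I0) to top (I6); all are fadd.
program = [
    Inst("I0", "R0", "R1", "R0"),
    Inst("I1", "R2", "R3", "R2"),
    Inst("I2", "R0", "R2", "R0"),
    Inst("I3", "R4", "R5", "R4"),
    Inst("I4", "R6", "R7", "R6"),
    Inst("I5", "R4", "R6", "R4"),
    Inst("I6", "R0", "R4", "R0"),
]

before = issue_indices(program)
# Entries with index 0 issue and complete; the rest stay and are re-indexed.
remaining = [inst for inst in program if before[inst.tag] != 0]
after = issue_indices(remaining)
print(before)
print(after)
```

Recomputing from scratch after each removal is the simplest correct model; hardware would instead decrement counts as older entries leave, which this sketch does not attempt.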
[Fig. 5: Register Update Unit, panels (b) and (c).]

(b) RUU Snapshot at Cycle 7
Entry      PC  Op    | Source 1      | Source 2      | Destination | Ex | Excep | Status
                     | rdy tag  cont | rdy tag  cont | tag  cont   |    |       |
tail -> 7  (empty)
6          I6  fadd  | 0   0.2 -     | 0   4.2 -     | 0.3  -      | 0  | -     | alloc
5          I5  fadd  | 0   4.1 -     | 0   6.1 -     | 4.2  -      | 0  | -     | not rdy
4          I4  fadd  | 1   -   R6^0  | 1   -   R7^0  | 6.1  -      | 0  | -     | 4th issue
3          I3  fadd  | 1   -   R4^0  | 1   -   R5^0  | 4.1  -      | 0  | -     | 3rd issue
2          I2  fadd  | 0   0.1 -     | 0   2.1 -     | 0.2  -      | 0  | -     | not rdy
1          I1  fadd  | 1   -   R2^0  | 1   -   R3^0  | 2.1  -      | 0  | -     | 2nd issue
head -> 0  I0  fadd  | 1   -   R0^0  | 1   -   R1^0  | 0.1  -      | 0  | -     | 1st issue

(c) RUU Snapshot at Cycle 9
Entry      PC  Op    | Source 1      | Source 2      | Destination | Ex | Excep | Status
                     | rdy tag  cont | rdy tag  cont | tag  cont   |    |       |
tail -> 7  (empty)
6          I6  fadd  | 0   0.2 -     | 0   4.2 -     | 0.3  -      | 0  | -     | not rdy
5          I5  fadd  | 0   4.1 -     | 0   6.1 -     | 4.2  -      | 0  | -     | not rdy
4          I4  fadd  | 1   -   R6^0  | 1   -   R7^0  | 6.1  -      | 0  | -     | in exec
3          I3  fadd  | 1   -   R4^0  | 1   -   R5^0  | 4.1  -      | 0  | -     | in exec
2          I2  fadd  | 1   0.1 R0^1  | 1   2.1 R2^1  | 0.2  -      | 0  | -     | ready
1          I1  fadd  | 1   -   R2^0  | 1   -   R3^0  | 2.1  R2^1   | 1  | none  | writeback
head -> 0  I0  fadd  | 1   -   R0^0  | 1   -   R1^0  | 0.1  R0^1   | 1  | none  | retire

Note: Ri^k means the kth instance of register Ri (register renaming).

[Fig. 6: Metaflow Architecture. Blocks: Instruction Cache; Fetch & Decode; DRIS (Deferred-Scheduling, Register-Renaming Instruction Shelf); Register File (retire); Issue/Schedule with operand and instruction routing; functional units FU1 ... FUn. Buses: operand buses, writeback buses, bypass buses.]

[Fig. 7 (caption not recoverable), panel (a) Instruction Timing over cycles 1 to 11: allocate I0, I1, I2, I3, then I4, I5, I6; issue order I0, I1, I3, I4, I2, I5, I6; writeback order I0, I1, I3, I4, I2, I5; per-cycle placement not recoverable. Note: Assume there are 4 allocate ports, 4 retire ports, and 2 floating-point add FUs (class num 2) with 3-cycle latency.]

(b) DRIS Snapshot at Cycle 2
Index     PC  Op    | Source 1        | Source 2        | Destination | Status
                    | Locked Rnum ID  | Locked Rnum ID  | latest Rnum |
7         (empty)
6         I6  fadd  | 1      0    0.2 | 1      4    0.5 | 1      0    | alloc
5         I5  fadd  | 1      4    0.3 | 1      6    0.4 | 1      4    | alloc
4         I4  fadd  | 0      6    -   | 0      7    -   | 1      6    | alloc
3         I3  fadd  | 0      4    -   | 0      5    -   | 0      4    | ready
2         I2  fadd  | 1      0    0.0 | 1      2    0.1 | 0      0    | not rdy
1         I1  fadd  | 0      2    -   | 0      3    -   | 1      2    | 1st iss
0 (head)  I0  fadd  | 0      0    -   | 0      1    -   | 0      0    | 1st iss

[Destination content, FU class, Disp., and Exec. columns not recoverable. Remainder of Fig. 7 truncated.]
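In the Fig. 5 snapshots, a pending operand is named by the tag of the register instance that will produce it (tag i.k corresponds to Ri^k, the kth instance of Ri). Those tags can be reproduced with one instance counter per architectural register. The Python sketch below does this for the sequence read from the RUU snapshots (I0: R0 = R0 + R1, I1: R2 = R2 + R3, and so on). It assumes no producer has written back yet, which holds at cycle 7, so a source with no earlier writer is shown as the register-file value Ri^0; the names `operand` and `rename` are illustrative.

```python
def operand(reg, latest):
    """Source as seen at dispatch: the pending producer's tag "i.k" if some
    earlier instruction writes reg, else the register-file value "Ri^0"."""
    k = latest.get(reg)
    return f"R{reg}^0" if k is None else f"{reg}.{k}"


def rename(program):
    """program: (dest, src1, src2) register numbers in program order.
    Returns (operand1, operand2, dest_tag) for each instruction."""
    latest = {}  # register number -> most recent instance number handed out
    renamed = []
    for dest, src1, src2 in program:
        ops = (operand(src1, latest), operand(src2, latest))  # read sources first
        latest[dest] = latest.get(dest, 0) + 1                # new instance of dest
        renamed.append((*ops, f"{dest}.{latest[dest]}"))
    return renamed


# Fig. 5 sequence I0..I6 as (dest, src1, src2); all are fadd.
program = [(0, 0, 1), (2, 2, 3), (0, 0, 2), (4, 4, 5), (6, 6, 7), (4, 4, 6), (0, 0, 4)]
for i, row in enumerate(rename(program)):
    print(f"I{i}", *row)
```

Reading the sources before bumping the destination counter matters for instructions like I0, whose destination R0 is also a source.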
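The DRIS snapshot in Fig. 7(b) marks a source Locked when an older shelf entry still has to produce it, and sets a destination's latest bit when no younger entry writes the same register. The Python sketch below derives both from the same instruction sequence by searching older entries for the youngest writer. Two assumptions are made and should be treated as such: no entry has written back yet (true at cycle 2, where I0 and I1 have only just issued), and the digit after the point in the figure's ID column (0.2, 0.5, ...) is read as the producer's entry index, which matches every row; the meaning of the leading 0 is not stated here, so the sketch omits it. `dris_fields` is an illustrative name.

```python
def dris_fields(program):
    """program: (dest, src1, src2) in program order; entry index = position.

    Returns, per entry, (locked1, id1, locked2, id2, latest), where idN is the
    index of the youngest older entry that writes that source, or None.
    """
    rows = []
    for i, (dest, src1, src2) in enumerate(program):
        fields = []
        for src in (src1, src2):
            writers = [j for j in range(i) if program[j][0] == src]
            if writers:
                fields += [1, writers[-1]]  # locked: wait for that entry's result
            else:
                fields += [0, None]         # unlocked: read the register file
        younger_writes = any(program[j][0] == dest for j in range(i + 1, len(program)))
        rows.append((*fields, 0 if younger_writes else 1))
    return rows


# Fig. 7 sequence I0..I6 as (dest, src1, src2); all are fadd.
program = [(0, 0, 1), (2, 2, 3), (0, 0, 2), (4, 4, 5), (6, 6, 7), (4, 4, 6), (0, 0, 4)]
for i, row in enumerate(dris_fields(program)):
    print(i, row)
```

A real shelf would clear a lock when the producing entry writes back; this sketch only models the state before any write-back.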