Lecture 6: Tomasulo Algorithm (II)
Explanations of each stages, A big example
1 Three Stages of Tomasulo Algorithm 1.Issue—get instruction from FP Op Queue If reservation station free (no structural hazard), control issues instr & sends operands (renames registers). 2.Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch Common Data Bus for result 3.Write result—finish execution (WB) Write on Common Data Bus to all awaiting units; mark reservation station available
Issue: build dependence for new inst Writeback: Wakeup dependent instructions
2 Adapted from UCB CS252 S98 Issue Stage and Renaming Table
Renames its two source registers (source renaming) Assigns it to a free RS Updates Renaming table (dest renaming) Also decodes the inst and read register values in parallel
How would the following inst be renamed? ADD $16, $8, $9 ADD $17, $16, $16
3 Execute Stage Only “ready” instructions can join the competition There is a select logic to select instructions for FU execution Some policy may be used, e.g. age based Non-ready instructions can be “waken up” during writeback of its parent inst
4 Writeback and Common Data Bus
Normal data bus: data + destination (“go to” bus) Common data bus: data + source (“come from” bus) 64 bits of data + 4 bits of source index (tag) Does the broadcast to every instruction in the fly Child instructions do tag matching and update their ready bits and value fields (if the tag matches theirs)
5 Adapted from UCB CS252 S98, Copyright 1998 USB Code Example LD F6,34(R2) LD F2,45(R3) LD1 LD2 MULTI F0,F2,F4 SUBD MULTI SUBD F8,F6,F2 DIVD F10,F0,F6 ADD DIVD ADD F6,F8,F2
Operation latencies: load/store 2 cycles, Add/sub 2 cycles, Mult 10 cycles, divide 40 cycle
6 Tomasulo Example Cycle 0
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 Load1 No LD F2 45+ R3 Load2 No MULT F0 F2 F4 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No 0 Add3 No 0 Mult1 No 0 Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 0 FU
7 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 1
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 Load1 NoYes 34+R2 LD F2 45+ R3 Load2 No MULT F0 F2 F4 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 1 FU Load1
8 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 2
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 Load1 Yes 34+R2 LD F2 45+ R3 2 Load2 Yes 45+R3 MULT F0 F2 F4 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 2 FU Load2 Load1
9 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 3
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 Load1 Yes 34+R2 LD F2 45+ R3 2 Load2 Yes 45+R3 MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 Yes MULTD R(F4) Load2 0 Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 3 FU Mult1 Load2 Load1 • Note: registers names are removed (“renamed”) in Reservation Stations
• Load1 completing; what is waiting for Load1? 10 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 4
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 Load2 Yes 45+R3 MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 DIVD F10 F0 F6 ADDD F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 Yes SUBD M(34+R2) Load2 0 Add2 No Add3 No 0 Mult1 Yes MULTD R(F4) Load2 0 Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 4 FU Mult1 Load2 M(34+R2) Add1
• Load2 completing; what is waiting for it?
11 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 5
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 2 Add1 Yes SUBD M(34+R2) M(45+R3) 0 Add2 No Add3 No 10 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 5 FU Mult1 M(45+R3) M(34+R2) Add1 Mult2
12 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 6
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 4 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 1 Add1 Yes SUBD M(34+R2) M(45+R3) 0 Add2 Yes ADDD M(45+R3) Add1 Add3 No 9 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 6 FU Mult1 M(45+R3) Add2 Add1 Mult2
13 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 7
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 47 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 Yes SUBD M(34+R2) M(45+R3) 0 Add2 Yes ADDD M(45+R3) Add1 Add3 No 8 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 7 FU Mult1 M(45+R3) Add2 Add1 Mult2
• Add1 completing; what is waiting for it?
14 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 8
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBDF8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDDF6 F8 F2 6 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 2 Add2 Yes ADDD M()-M() M(45+R3) 0 Add3 No 7 Mult1 Yes MULTDM(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 8 FU Mult1 M(45+R3) Add2 M()-M() Mult2
15 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 9
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 1 Add2 Yes ADDD M()–M() M(45+R3) 0 Add3 No 6 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 9 FU Mult1 M(45+R3) Add2 M()–M() Mult2
16 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 10
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 Yes ADDD M()–M() M(45+R3) 0 Add3 No 5 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 10 FU Mult1 M(45+R3) Add2 M()–M() Mult2
• Add2 completing; what is waiting for it?
17 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 11
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBDF8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDDF6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No 0 Add3 No 4 Mult1 Yes MULTDM(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 11 FU Mult1 M(45+R3) (M-M)+M() M()ŠM() Mult2
• Write result of ADDD here vs. scoreboard? 18 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 12
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MUL F0 F2 F4 3 Load3 No DSUBF10 F8 F0 F6 F6 F25 47 8 A DD F6DIV F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Bus yOp Vj Vk Qj Qk 0 Add1 No 0 Add2 No 0 Add3 No T M(45+R3) R(F4) 3Mult10Mult2YesYes MUL DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 12 FU Mult1 M(45+R3) (M-M)+M( ) M()–M (Mult2
19 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 13
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 2 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 13 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2
20 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 14
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No 0 Add3 No 1 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 14 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2
21 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 15
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 15 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 Yes MULTD M(45+R3) R(F4) 0 Mult2 Yes DIVD M(34+R2) Mult1 Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 15 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2 • Mult1 completing; what is waiting for it?
22 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 16
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 40 Mult2 Yes DIVD M*F4 M(34+R2) Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 16 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2 • Note: Just waiting for divide
23 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 55
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 5 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 1 Mult2 Yes DIVD M*F4 M(34+R2) Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 55 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2
24 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 56
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 556 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 Yes DIVD M*F4 M(34+R2) Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 56 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2
• Mult 2 completing; what is waiting for it?
25 Adapted from UCB CS252 S98, Copyright 1998 USB Tomasulo Example Cycle 57
Instruction status Execution Write Instruction j k Issue complete Result Busy Address LD F6 34+ R2 1 3 4 Load1 No LD F2 45+ R3 2 4 5 Load2 No MULT F0 F2 F4 3 15 16 Load3 No SUBD F8 F6 F2 47 8 DIVD F10 F0 F6 55657 ADDD F6 F8 F2 6 10 11 Reservation Stations S1 S2 RS for j RS for k Time Name Busy Op Vj Vk Qj Qk 0 Add1 No 0 Add2 No Add3 No 0 Mult1 No 0 Mult2 No Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 57 FU M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M • Again, in-oder issue, out-of-order execution, completion 26 Adapted from UCB CS252 S98, Copyright 1998 USB Review Dependences How are dependences are enforced or removed in Tomasulo Algorithm?
Data dependences (RAW) Antidependence (WAR) Output Dependence (WAW)
27 The Use of Tag
In Tomasulo, RS and load/store buffer index is used as tag. Renaming assign new Why tag is so chosen? inst a unique tag RS stores tags to What does tag really preserve dependences represent? CDB broadcasts tag with data for data What else can be used passing and wakeup as tag?
28 Tomasulo Summary
Reservations stations: Increases effective register number Distributes scheduling logic
Register renaming: Avoids WAR and WAW dependence
Tag + Data broadcasting for waking up child instructions
Pros: can be effectively combined with speculative execution
Cons: CDB broadcasting adds one-cycle delay (addressed in modern instruction scheduling)
29 Adapted from UCB CS252 S98, Copyright 1998 USB