History Scoreboarding Overview Machine Correctness Four Stages

History 1966: scoreboarding in CDC6600, Lecture 6: Scoreboarding and Tomasulo implementing limited dynamic Algorithm scheduling Three years later: Tomasulo in IBM 360/91, introducing register renaming and reservation station Now appearing in todays Dec Alpha, SGI MIPS, SUN UltraSparc, Intel Pentium, IBM PowerPC, and others 1 Zhao Zhang, CPRE 581, Fall 2005 2 Adapted from UCB CS252 S98, Copyright 1998 USB Scoreboarding Overview Machine Correctness Basic idea: E(D,P) = E(S,P) if Use scoreboard to track data (RAW) dependence 1. E(D,P) and E(S,P) execute the same set of through register instructions Main points of design: 2. For any inst i, i receives the outputs in Instructions are sent to FU unit if there is no E(D,P) of its parents in E(S,P) outstanding name dependence 3. In E(D,P) any register or memory word receives RAW data dependence is tracked and enforced the output of inst j, where j is the last by scoreboard instruction writes to the register or memory Register values are passed through the register word in E(S,P) file; a child instruction starts execution after the last parent finishes execution Scoreboarding merit: Be able to execute Pipeline stalls if any name dependence (WAR or independent instructions in parallel without WAW) is detected violating statement 2. Zhao Zhang, CPRE 581, Fall 2005 3 Zhao Zhang, CPRE 581, Fall 2005 4 Four Stages of Scoreboard Control Four Stages of Scoreboard Control Fetch first, then 3.Execution—operate on operands (EX) 1. Issue—decode instructions & check for structural Actions: The functional unit begins execution upon receiving hazards operands. When the result is ready, it notifies the Wait conditions: (1) the required FU is free; (2) no other scoreboard that it has completed execution. instruction writes to the same register destination (to 4.Write result—finish execution (WB) avoid WAW) Wait condition: no other instruction/FU is going to read the Actions: (1) the instruction proceeds to the FU; (2) scoreboard updates its internal data structure register destination of the instruction If an instruction is stalled at this stage, no other instructions Actions: Write the register and update the scoreboard can proceed WAR Example: 2. Read operands—wait until no data hazards, then DIVD F0,F2,F4 read operands ADDD F10,F0,F8 Wait conditions: all source operands are available SUBD F8,F8,F14 Actions: the function unit reads register operands and start execution the next cycle CDC 6600 scoreboard would stall SUBD until ADDD reads operands Zhao Zhang, CPRE 581, Fall 2005 5 Zhao Zhang, CPRE 581, Fall 2005 6 1 Code Example Scoreboard Connections LD F6,34(R2) Registers LD F2,45(R3) LD1 LD2 FP mult FP mult MULTI F0,F2,F4 SUBD MULTI SUBD F8,F6,F2 FP div … DIVD F10,F0,F6 ADD DIVD FP add ADD F6,F8,F2 INT unit Operation latencies: load/store 2 cycles, Scoreboard Add/sub 2 cycles, Mult 10 cycles, divide 40 cycle Control/ Control/ status status Zhao Zhang, CPRE 581, Fall 2005 7 Zhao Zhang, CPRE 581, Fall 2005 8 Three Parts of the Scoreboard Scoreboard Example Instruction status Read ExecutioWrite 1. Instruction status—which of 4 steps the instruction is in Instruction j k Issue operandcomplet Result LD F6 34+ R2 2. Functional unit status—Indicates the state of the functional LD F2 45+ R3 unit (FU). 9 fields for each functional unit MULT F0 F2 F4 Busy—Indicates whether the unit is busy or not SUBD F8 F6 F2 Op—Operation to perform in the unit (e.g., + or –) DIVD F10 F0 F6 Fi—Destination register ADDD F6 F8 F2 Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Fj, Fk—Source-register numbers Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Qj, Qk—Functional units producing source registers Fj, Fk Integer No Rj, Rk—Flags indicating when Fj, Fk are ready Mult1 No Mult2 No 3. Register result status—Indicates which functional unit will Add No write each register, if one exists. Blank when no pending Divide No instructions will write that register Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 FU 9 10 Scoreboard Example Cycle 1 Scoreboard Example Cycle 2 Instruction status Read ExecutioWrite Instruction status Read ExecutioWrite Instruction j k Issue operand completeResult Instruction j k Issue operand completeResult LD F6 34+ R2 1 LD F6 34+ R2 1 2 LD F2 45+ R3 LD F2 45+ R3 MULT F0 F2 F4 MULT F0 F2 F4 SUBD F8 F6 F2 SUBD F8 F6 F2 DIVD F10 F0 F6 DIVD F10 F0 F6 ADDD F6 F8 F2 ADDD F6 F8 F2 Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Integer Yes Load F6 R2 Yes Integer Yes Load F6 R2 Yes Mult1 No Mult1 No Mult2 No Mult2 No Add No Add No Divide No Divide No Register result status Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 Clock F0 F2 F4 F6 F8 F10 F12 ... F30 1 FU Integer 2 FU Integer • Issue 2nd LD? 11 12 2 Scoreboard Example Cycle 3 Scoreboard Example Cycle 4 Instruction status Read ExecutioWrite Instruction status Read ExecutioWrite Instruction j k Issue operand completeResult Instruction j k Issue operand completeResult LD F6 34+ R2 1 2 3 LD F6 34+ R2 1 2 3 4 LD F2 45+ R3 LD F2 45+ R3 MULT F0 F2 F4 MULT F0 F2 F4 SUBD F8 F6 F2 SUBD F8 F6 F2 DIVD F10 F0 F6 DIVD F10 F0 F6 ADDD F6 F8 F2 ADDD F6 F8 F2 Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Integer Yes Load F6 R2 Yes Integer Yes Load F6 R2 Yes Mult1 No Mult1 No Mult2 No Mult2 No Add No Add No Divide No Divide No Register result status Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 Clock F0 F2 F4 F6 F8 F10 F12 ... F30 3 FU Integer 4 FU Integer • Issue MULT? 13 14 Scoreboard Example Cycle 5 Scoreboard Example Cycle 6 Instruction status Read ExecutioWrite Instruction status Read ExecutioWrite Instruction j k Issue operand completeResult Instruction j k Issue operand completeResult LD F6 34+ R2 1 2 3 4 LD F6 34+ R2 1 2 3 4 LD F2 45+ R3 5 LD F2 45+ R3 56 MULT F0 F2 F4 MULT F0 F2 F4 6 SUBD F8 F6 F2 SUBD F8 F6 F2 DIVD F10 F0 F6 DIVD F10 F0 F6 ADDD F6 F8 F2 ADDD F6 F8 F2 Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Integer Yes Load F2 R3 Yes Integer Yes Load F2 R3 Yes Mult1 No Mult1 Yes Mult F0 F2 F4 Integer No Yes Mult2 No Mult2 No Add No Add No Divide No Divide No Register result status Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 Clock F0 F2 F4 F6 F8 F10 F12 ... F30 5 FU Integer 6 FU Mult1 Integer 15 16 Scoreboard Example Cycle 7 Scoreboard Example Cycle 8a Instruction status Read ExecutioWrite Instruction status Read ExecutioWrite Instruction j k Issue operand completeResult Instruction j k Issue operand completeResult LD F6 34+ R2 1 2 3 4 LD F6 34+ R2 1 2 3 4 LD F2 45+ R3 56 7 LD F2 45+ R3 56 7 MULT F0 F2 F4 6 MULT F0 F2 F4 6 SUBD F8 F6 F2 7 SUBD F8 F6 F2 7 DIVD F10 F0 F6 DIVD F10 F0 F6 8 ADDD F6 F8 F2 ADDD F6 F8 F2 Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Integer Yes Load F2 R3 Yes Integer Yes Load F2 R3 Yes Mult1 Yes Mult F0 F2 F4 Integer No Yes Mult1 Yes Mult F0 F2 F4 Integer No Yes Mult2 No Mult2 No Add Yes Sub F8 F6 F2 Integer Yes No Add Yes Sub F8 F6 F2 Integer Yes No Divide No Divide Yes Div F10 F0 F6 Mult1 No Yes Register result status Register result status Clock F0 F2 F4 F6 F8 F10 F12 ... F30 Clock F0 F2 F4 F6 F8 F10 F12 ... F30 7 FU Mult1 Integer Add 8 FU Mult1 Integer Add Divide • Read multiply operands? 17 18 3 Scoreboard Example Cycle 8b Scoreboard Example Cycle 9 Instruction status Read ExecutioWrite Instruction status Read ExecutioWrite Instruction j k Issue operand completeResult Instruction j k Issue operand completeResult LD F6 34+ R2 1 2 3 4 LD F6 34+ R2 1 2 3 4 LD F2 45+ R3 56 78 LD F2 45+ R3 56 78 MULT F0 F2 F4 6 MULT F0 F2 F4 69 SUBD F8 F6 F2 7 SUBD F8 F6 F2 79 DIVD F10 F0 F6 8 DIVD F10 F0 F6 8 ADDD F6 F8 F2 ADDD F6 F8 F2 Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Functional unit status dest S1 S2 FU for j FU for k Fj? Fk? Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk Integer No Integer No Mult1 Yes Mult F0 F2 F4 Yes Yes 10 Mult1 Yes Mult F0 F2 F4 Yes Yes Mult2 No Mult2 No Add Yes Sub F8 F6 F2 Yes Yes 2 Add Yes Sub F8 F6 F2 Yes Yes Divide Yes Div F10 F0 F6 Mult1 No Yes Divide Yes Div F10 F0 F6 Mult1 No Yes Register result status Register result status Clock F0 F2 F4 F6 F8 F10 F12 ..

History Scoreboarding Overview Machine Correctness Four Stages

Pipelining 2

EECS 570 Lecture 5 GPU Winter 2021 Prof

14 Superscalar Processors

Concepts Introduced in Chapter 3 Instruction Level Parallelism (ILP)

Intro Evolution of Superscalar Processor

Simty: Generalized SIMT Execution on RISC-V Caroline Collange

Advanced RISC-V Architectures

Overview of 15-740

5 10 15 20 25 30 35 Class 712 Electrical Computers And

Modern Computer Architectures Lecture-1: Introduction

Pipelining: Basic and Intermediate Concepts

Lecture 6: Scoreboarding and Tomasulo Algorithm