Computer Architecture Spring 2016

Lecture 11: Dynamic Scheduling: Scoreboarding
Shuai Wang
Department of Computer Science and Technology, Nanjing University
[Slides adapted from CS252, UC Berkeley and CS 246, Harvard University]

Dynamic Scheduling
• Dynamic scheduling: rely on hardware to rearrange instruction execution to reduce stalls while maintaining exception behavior and data flow
• Advantages of dynamic scheduling:
  – Handles cases where dependences are unknown at compile time, e.g., those involving memory references
  – Simplifies the compiler
  – Allows code that was compiled for one pipeline to run efficiently on a different pipeline
  – Enables hardware speculation, a technique with significant performance advantages that builds on dynamic scheduling

Dynamic Scheduling: The Idea
• Key idea: allow instructions behind a stall to proceed
    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14
  – ADDD must wait for F0 from DIVD (RAW); SUBD depends on neither instruction and need not wait
• Enables out-of-order execution and allows out-of-order completion
But:
• Dynamic scheduling introduces the possibility of WAW and WAR hazards
• A dynamically scheduled processor may generate imprecise exceptions

Precise vs. Imprecise Exceptions
• Precise exception: the pipeline can be stopped so that the instructions before the faulting instruction are completed and those after it can be restarted from scratch
• Imprecise exception: the processor state when an exception is raised does not look exactly as if the instructions had been executed sequentially in strict program order
  – Instructions later in program order than the faulting instruction have already completed
  – Instructions earlier in program order than the faulting instruction have not yet completed

Dynamic Scheduling with a Scoreboard
• Scoreboard: allow instructions
to execute out of order when there are sufficient resources and no data dependences
• Named after the CDC 6600 scoreboard
• The scoreboard takes full responsibility for instruction issue and execution, including all hazard detection

Supercomputer (CDC 6600)
Definitions of a supercomputer:
• Fastest machine in the world at a given task
• A device to turn a compute-bound problem into an I/O-bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray
• The CDC 6600 (Cray, 1963) is regarded as the first supercomputer
Control Data Corporation (CDC), 1957-1992

CDC 6600 (Seymour Cray, 1963)
• A fast pipelined machine with 60-bit words
  – 128 Kword main memory capacity, 32 banks
• Ten functional units (parallel, unpipelined)
  – Floating point: adder, 2 multipliers, divider
  – Integer: adder, 2 incrementers, ...
• Hardwired control (no microcoding)
• Scoreboard for dynamic scheduling of instructions
• Ten Peripheral Processors for input/output
  – a fast multi-threaded 12-bit integer ALU
• Very fast clock, 10 MHz (FP add in 4 clocks)
• >400,000 transistors, 750 sq. ft., 5 tons, 150 kW, novel freon-based cooling technology
• Fastest machine in the world for 5 years (until the 7600)
  – over 100 sold ($7-10M each)

CDC 6600: A Load/Store Architecture (RISC)
• Separate instructions to manipulate three types of registers
  – 8 x 60-bit data registers (X)
  – 8 x 18-bit address registers (A)
  – 8 x 18-bit index registers (B)
• All arithmetic and logic instructions are register-to-register
    bits:    6      3  3  3
    fields:  opcode i  j  k
    Ri ← Rj op Rk
• Only Load and Store instructions refer to memory!
    bits:    6      3  3  18
    fields:  opcode i  j  disp
    Ri ← M[Rj + disp]
  – Touching address registers 1 to 5 initiates a load; touching 6 to 7 initiates a store
  – Very useful for vector operations

CDC 6600 Scoreboard
• Instructions are dispatched in order to functional units, provided there is no structural hazard or WAW hazard
  – Stall on a structural hazard (no functional unit available)
  – Only one pending write to any register
• Instructions wait for input operands (RAW hazards) before execution
  – Can execute out of order
• Instructions wait for the output register to be read by preceding instructions (WAR)
  – Result is held in the functional unit until the register is free

Scoreboarding
• Centralized scheme
  – No bypassing
  – WAR/WAW hazards are problems
• Originally proposed for the CDC 6600 (S. Cray, 1963)

First Stage of Scoreboarding – Issue (or Dispatch)
• Issue replaces the first half of the ID stage
  – Check for structural hazards
    • All functional units for this instruction are busy
  – Check for WAW hazards
    • An active instruction wants to write the same destination register
  – Stall issue of the current and all future instructions until none of these hazards remain (in-order issue)
  – Then issue the instruction and update the scoreboard

Second Stage of Scoreboarding – Read Operands (or Issue!)
• Read Operands replaces the second half of ID
  – Check for RAW hazards: are all source operands available?
    • If NO, hold the instruction in a pre-execution buffer (I-Buffer)
    • If YES, the scoreboard tells the functional unit to read its operands from the registers and begin execution

Third Stage of Scoreboarding – Execution
• Execution replaces EX in MIPS
  – Begin execution upon receiving operands
  – Notify the scoreboard when execution completes

Fourth Stage of Scoreboarding – Write Result
• Write Result replaces WB in MIPS
  – Check for WAR hazards
    • A preceding instruction has not yet read its operands, and
    • One of its operands is the same register as the destination register of the completing instruction
  – Stall the completing instruction
if a WAR hazard is detected
  – Otherwise, the functional unit writes its result to the destination register

Scoreboard Implementation
• Instruction status: which of the stages the instruction is in
• Functional unit status: state of the functional unit
  – Busy: indicates whether the unit is busy or not
  – Op: operation to perform in the unit
  – Fi: destination register
  – Fj, Fk: source registers
  – Qj, Qk: functional units producing Fj, Fk
  – Rj, Rk: flags indicating that Fj, Fk are ready and not yet read; set to NO after the operands are read
• Register result status: indicates which functional unit will write each register

Scoreboard Example
[Per-cycle scoreboard tables omitted; slide titles and prompts follow]
• Cycle 1
• Cycle 2: Issue 2nd LD?
• Cycle 3: Issue MULT?
• Cycles 4, 5, 6
• Cycle 7: Read MULT operands?
• Cycle 8a (first half of clock cycle)
• Cycle 8b (second half of clock cycle)
• Cycle 9: Read operands for MULT & SUB? Issue ADD?
• Cycles 10, 11
• Cycle 12: Read operands for DIV?
• Cycles 13, 14, 15, 16
• Cycle 17: Why not write result of ADD?
• Cycles 18, 19, 20
• Cycle 21: WAR hazard is now gone...
• Cycle 22: Skip a couple of cycles
• Cycle 61
• Cycle 62

Review Scoreboard Example: Cycle 62
• In-order issue; out-of-order execute & commit

Scoreboarding Limitations
• Number and type of functional units
• Number of scoreboard entries
• Amount of application ILP (RAW hazards)
• The presence of anti-dependences and output dependences
  – These lead to WAR and WAW stalls
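
The three hazard types named in these slides can be made concrete with a small sketch. The Python below is illustrative only: the `Inst` tuple and the `hazards` helper are hypothetical names introduced here, not part of the lecture. It classifies the hazards between an earlier and a later instruction by comparing register names, using the DIVD/ADDD/SUBD sequence from the "Dynamic Scheduling: The Idea" slide.

```python
from typing import NamedTuple


class Inst(NamedTuple):
    """One three-register instruction (hypothetical helper type)."""
    op: str
    dest: str  # Fi: destination register
    src1: str  # Fj: first source register
    src2: str  # Fk: second source register


def hazards(earlier: Inst, later: Inst) -> set:
    """Return the hazard types `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in (later.src1, later.src2):
        found.add("RAW")  # true dependence: later reads what earlier writes
    if later.dest in (earlier.src1, earlier.src2):
        found.add("WAR")  # anti-dependence: later overwrites an earlier source
    if later.dest == earlier.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found


divd = Inst("DIVD", "F0", "F2", "F4")
addd = Inst("ADDD", "F10", "F0", "F8")
subd = Inst("SUBD", "F12", "F8", "F14")

print(hazards(divd, addd))  # {'RAW'}  (ADDD needs F0 from DIVD)
print(hazards(divd, subd))  # set()    (SUBD is independent of DIVD)
print(hazards(addd, subd))  # set()    (SUBD is independent of ADDD)
```

Because SUBD shares no hazard with either earlier instruction, it is exactly the kind of instruction a dynamically scheduled pipeline can let proceed while ADDD waits.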

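The stage checks and the status fields from the "Scoreboard Implementation" slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the CDC 6600 logic or the lecture's own code: the class and method names are hypothetical, timing and execution latency are ignored, and the three-instruction fragment and unit names (`Mult1`, `Add`, `Divide`) are assumed, chosen only to reproduce the kind of WAR stall the example slides ask about ("Why not write result of ADD?").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """Functional unit status fields, named as on the slide."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # units producing Fj, Fk (None: no pending producer)
    qk: Optional[str] = None
    rj: bool = False          # Fj / Fk ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.fu = {name: FUStatus() for name in unit_names}
        self.result = {}  # register result status: register -> unit that will write it

    def can_issue(self, unit, dest):
        # Stage 1: no structural hazard (unit free) and no WAW (no pending write to dest).
        return not self.fu[unit].busy and dest not in self.result

    def issue(self, unit, op, dest, s1, s2):
        assert self.can_issue(unit, dest)
        qj, qk = self.result.get(s1), self.result.get(s2)
        self.fu[unit] = FUStatus(busy=True, op=op, fi=dest, fj=s1, fk=s2,
                                 qj=qj, qk=qk, rj=qj is None, rk=qk is None)
        self.result[dest] = unit

    def can_read_operands(self, unit):
        # Stage 2: RAW check, both source operands must be available.
        return self.fu[unit].rj and self.fu[unit].rk

    def read_operands(self, unit):
        assert self.can_read_operands(unit)
        f = self.fu[unit]
        f.rj = f.rk = False   # set to NO after operands are read
        f.qj = f.qk = None

    def can_write_result(self, unit):
        # Stage 4: WAR check, no unit may still need to read the old value of Fi.
        fi = self.fu[unit].fi
        return not any((f.fj == fi and f.rj) or (f.fk == fi and f.rk)
                       for f in self.fu.values())

    def write_result(self, unit):
        assert self.can_write_result(unit)
        for f in self.fu.values():  # wake up instructions waiting on this unit
            if f.qj == unit:
                f.rj, f.qj = True, None
            if f.qk == unit:
                f.rk, f.qk = True, None
        del self.result[self.fu[unit].fi]
        self.fu[unit] = FUStatus()  # unit becomes free


sb = Scoreboard(["Mult1", "Add", "Divide"])
sb.issue("Mult1", "MULTD", "F0", "F2", "F4")
sb.issue("Divide", "DIVD", "F10", "F0", "F6")  # RAW on F0: waits for Mult1
sb.issue("Add", "ADDD", "F6", "F8", "F2")      # will write F6, which DIVD still has to read
sb.read_operands("Mult1")
sb.read_operands("Add")
print(sb.can_read_operands("Divide"))  # False: F0 not yet produced
print(sb.can_write_result("Add"))      # False: WAR hazard on F6
sb.write_result("Mult1")
sb.read_operands("Divide")
print(sb.can_write_result("Add"))      # True: WAR hazard is gone
```

Keeping each stage as a `can_*` predicate plus an action mirrors the slides: every stage is a hazard check followed by a scoreboard update, and the stall is simply the predicate returning false.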