Computer Architecture Spring 2016

Lecture 11: Dynamic Scheduling: Scoreboarding
Shuai Wang
Department of Computer Science and Technology, Nanjing University
[Slides adapted from CS252, UC Berkeley and CS 246, Harvard University]

Dynamic Scheduling
• Dynamic scheduling: rely on hardware to rearrange instruction execution to reduce stalls while maintaining exception behavior and data flow
• Advantages of dynamic scheduling:
  – Handles cases where dependences are unknown at compile time, e.g., those involving memory references
  – Simplifies the compiler
  – Allows code compiled for one pipeline to run efficiently on a different pipeline
  – Enables hardware speculation, a technique with significant performance advantages that builds on dynamic scheduling

Dynamic Scheduling: The Idea
• Key idea: allow instructions behind a stall to proceed
    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14
  – ADDD must wait for F0 from DIVD; SUBD depends on neither instruction, so it can proceed
• Enables out-of-order execution and allows out-of-order completion
• But:
  – Dynamic scheduling introduces the possibility of WAW and WAR hazards
  – A dynamically scheduled processor may generate imprecise exceptions

Precise vs. Imprecise Exceptions
• Precise exception: the pipeline can be stopped so that the instructions before the faulting instruction are completed and those after it can be restarted from scratch
• Imprecise exception: the processor state when an exception is raised does not look exactly as if the instructions had been executed sequentially in strict program order
  – Instructions later in program order than the faulting instruction have completed
  – Instructions earlier in program order than the faulting instruction have not yet completed

Dynamic Scheduling with a Scoreboard
• Scoreboard: allows instructions to execute out of order when there are sufficient resources and no data dependences
• Named after the CDC 6600 scoreboard
• The scoreboard takes full responsibility for instruction issue and execution, including all hazard detection

Supercomputer (CDC 6600)
Definitions of a supercomputer:
• Fastest machine in the world at a given task
• A device to turn a compute-bound problem into an I/O-bound problem
• Any machine costing $30M+
• Any machine designed by Seymour Cray
CDC 6600 (Cray, 1963) is regarded as the first supercomputer
Control Data Corporation (CDC), 1957-1992

CDC 6600 (Seymour Cray, 1963)
• A fast pipelined machine with 60-bit words
  – 128 Kword main memory capacity, 32 banks
• Ten functional units (parallel, unpipelined)
  – Floating point: adder, 2 multipliers, divider
  – Integer: adder, 2 incrementers, ...
• Hardwired control (no microcoding)
• Scoreboard for dynamic scheduling of instructions
• Ten peripheral processors for input/output
  – A fast multi-threaded 12-bit integer ALU
• Very fast clock, 10 MHz (FP add in 4 clocks)
• >400,000 transistors, 750 sq. ft., 5 tons, 150 kW, novel freon-based cooling technology
• Fastest machine in the world for 5 years (until the 7600)
  – Over 100 sold ($7-10M each)

CDC 6600: A Load/Store Architecture (RISC)
• Separate instructions to manipulate three types of registers
  – 8 x 60-bit data registers (X)
  – 8 x 18-bit address registers (A)
  – 8 x 18-bit index registers (B)
• All arithmetic and logic instructions are register-to-register
       6    6  6  6
    opcode  i  j  k       Ri ← Rj op Rk
• Only load and store instructions refer to memory!
       6    6  6  6
    opcode  i  j  disp    Ri ← M[Rj + disp]
• Touching address registers 1 to 5 initiates a load; touching 6 to 7 initiates a store
  – Very useful for vector operations

CDC 6600 Scoreboard
• Instructions are dispatched in order to functional units, provided there is no structural hazard or WAW hazard
  – Stall on a structural hazard: no functional unit available
  – Only one pending write to any register
• Instructions wait for input operands (RAW hazards) before execution
  – Can execute out of order
• Instructions wait for the output register to be read by preceding instructions (WAR)
  – Result is held in the functional unit until the register is free

Scoreboarding
• Centralized scheme
  – No bypassing
  – WAR/WAW hazards are problems
• Originally proposed in the CDC 6600 (S. Cray, 1963)

First Stage of Scoreboarding – Issue (or Dispatch)
• Issue replaces the first half of the ID stage
  – Check for structural hazards
    • All functional units for this instruction are busy
  – Check for WAW hazards
    • An active instruction wants to write the same destination register
  – Stall this instruction and all later instructions until none of these hazards remain (in-order issue)
  – Otherwise, issue the instruction and update the scoreboard

Second Stage of Scoreboarding – Read Operands (or Issue!)
• Read operands replaces the second half of ID
  – Check for RAW hazards: are all source operands available?
    • If NO, hold the instruction in a pre-execution buffer (I-Buffer)
    • If YES, the scoreboard tells the functional unit to read operands from the registers and begin execution

Third Stage of Scoreboarding – Execution
• Execution replaces EX in MIPS
  – Begin execution upon receiving operands
  – Notify the scoreboard when execution completes

Fourth Stage of Scoreboarding – Write Result
• Write result replaces WB in MIPS
  – Check for WAR hazards
    • A preceding instruction has not yet read its operands, and
    • One of its operands is the same register as the destination register of the completing instruction
  – Stall the completing instruction if a WAR hazard is detected
  – Otherwise, the functional unit writes its result to the destination register

Scoreboard Implementation
• Instruction status: which of the stages the instruction is in
• Functional unit status: state of the functional unit
  – Busy: indicates whether the unit is busy or not
  – Op: operation to perform in the unit
  – Fi: destination register
  – Fj, Fk: source registers
  – Qj, Qk: functional units producing Fj, Fk
  – Rj, Rk: flags indicating that Fj, Fk are ready and not yet read; set to NO after the operands are read
• Register result status: indicates which functional unit will write each register

Scoreboard Example
[Cycle-by-cycle scoreboard tables for cycles 1 to 22, 61, and 62; only the slide annotations survive in this text]
• Cycle 2: Issue 2nd LD?
• Cycle 3: Issue MULT?
• Cycle 7: Read MULT operands?
• Cycle 8a: first half of clock cycle
• Cycle 8b: second half of clock cycle
• Cycle 9: Read operands for MULT & SUB? Issue ADD?
• Cycle 12: Read operands for DIV?
• Cycle 17: Why not write result of ADD?
• Cycle 21: WAR hazard is now gone…
• After cycle 22: skip a couple of cycles (to cycles 61 and 62)
• Review, cycle 62: in-order issue; out-of-order execute & commit

Scoreboarding Limitations
• Number and type of functional units
• Number of scoreboard entries
• Amount of application ILP (RAW hazards)
• The presence of anti- and output dependences
  – Lead to WAR and WAW stalls
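
The issue, read-operands, and write-result checks described in the scoreboarding stages, together with the status fields (Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk) and the register result status, can be sketched as plain bookkeeping code. This is a minimal sketch under stated assumptions, not the CDC 6600 control logic: the class and method names are invented for illustration, execution latency is not modeled (a unit is assumed to have finished executing when `write_result` is called), and the `Add2` and `Mult` units and the `MULTD F8,F2,F4` instruction are hypothetical additions to the DIVD/ADDD/SUBD example so that a WAR stall can be shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FunctionalUnit:
    """One row of the functional unit status table (names are assumptions)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None: no pending producer)
    qk: Optional[str] = None
    rj: bool = False           # Fj / Fk ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.units: Dict[str, FunctionalUnit] = {n: FunctionalUnit() for n in unit_names}
        self.result: Dict[str, str] = {}   # register result status: register -> writing unit

    def can_issue(self, unit, dest):
        # Structural hazard: unit busy. WAW hazard: a pending write to dest exists.
        return not self.units[unit].busy and dest not in self.result

    def issue(self, unit, op, dest, src1, src2):
        assert self.can_issue(unit, dest)
        qj, qk = self.result.get(src1), self.result.get(src2)
        self.units[unit] = FunctionalUnit(
            busy=True, op=op, fi=dest, fj=src1, fk=src2,
            qj=qj, qk=qk, rj=qj is None, rk=qk is None)
        self.result[dest] = unit

    def can_read_operands(self, unit):
        # RAW check: both source operands must be ready.
        fu = self.units[unit]
        return fu.busy and fu.rj and fu.rk

    def read_operands(self, unit):
        assert self.can_read_operands(unit)
        fu = self.units[unit]
        fu.rj = fu.rk = False   # set to NO after operands are read

    def can_write_result(self, unit):
        # WAR check: stall while another unit still has to read our
        # destination register as a ready, not-yet-read source.
        fi = self.units[unit].fi
        return not any(
            (f.fj == fi and f.rj) or (f.fk == fi and f.rk)
            for name, f in self.units.items() if name != unit and f.busy)

    def write_result(self, unit):
        assert self.can_write_result(unit)
        for f in self.units.values():      # wake up instructions waiting on this unit
            if f.qj == unit:
                f.qj, f.rj = None, True
            if f.qk == unit:
                f.qk, f.rk = None, True
        del self.result[self.units[unit].fi]
        self.units[unit] = FunctionalUnit()  # unit is free again


# Slide example plus one hypothetical instruction (MULTD) to provoke a WAR stall.
sb = Scoreboard(["Div", "Add1", "Add2", "Mult"])
sb.issue("Div", "DIVD", "F0", "F2", "F4")
sb.issue("Add1", "ADDD", "F10", "F0", "F8")    # RAW on F0 (produced by Div)
sb.issue("Add2", "SUBD", "F12", "F8", "F14")   # independent of DIVD and ADDD
waw_blocked = not sb.can_issue("Mult", "F0")   # True: Div still has to write F0
sb.issue("Mult", "MULTD", "F8", "F2", "F4")    # writes F8, which ADDD has not read yet

addd_ready = sb.can_read_operands("Add1")      # False: waiting for F0
subd_ready = sb.can_read_operands("Add2")      # True: proceeds past the stalled ADDD
sb.read_operands("Add2")
sb.read_operands("Mult")
war_stall = not sb.can_write_result("Mult")    # True: ADDD must read F8 first
sb.read_operands("Div")
sb.write_result("Div")                         # F0 written, ADDD's Rj becomes ready
sb.read_operands("Add1")
war_cleared = sb.can_write_result("Mult")      # True: the WAR hazard is gone
```

Keeping the three checks as separate predicates mirrors the slides' split between issue (structural and WAW hazards), read operands (RAW hazards), and write result (WAR hazards); a cycle-accurate model would additionally need per-unit latencies and the one-instruction-per-cycle issue rule.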

