Control (Branch) Hazards


A:      beqz R2, L1
B
C
D
------
L1:     P

Naïve (Lazy) Implementation of a Conditional Branch Instruction in the DLX Pipeline
• IF: Fetch the branch instruction from the instruction memory (IM).
• ID: Decode the instruction and read the registers used in the comparison.
• EX: Determine the branch outcome: compare the source register value with zero (or two source register values with each other) and set a FLAG in the output pipeline register.
• MEM: Compute the target address: (PC + 4), carried along with the instruction, plus the sign-extended and shifted displacement.
• At the end of this clock cycle, the PC is assigned the target address if the instruction now in MEM is a successful branch; otherwise execution continues as normal.
Successful (or taken) branch: (i) the instruction is a branch and (ii) the branch condition is true.

Naïve Implementation of the Branch Equal Instruction: BEQZ Rs, d
Figure: datapath (IF, ID, EX, MEM, WB) with the IM, register file (rs, rt, rd), ALU and zero FLAG, sign extension (SE) and shift (<<) of d, the target-address adder, and an AND gate driving the MUX that selects the new PC. The outcome of the branch is known at the end of cycle 3; the PC is updated with the new value (branch target or PC + 4) at the end of cycle 4.

Control Hazard (stage occupancy; IF fetches nothing while the branch is unresolved)
T = 1:  ID = A
T = 2:  EX = A
T = 3:  MEM = A
T = 4:  IF = B (NOT TAKEN BRANCH) or P (TAKEN BRANCH), WB = A

Control (Branch) Hazards: Problems
• The target address of the branch is not known until (at least) the instruction is decoded: what is the address of instruction P?
• The outcome of the branch (taken / not taken) is determined deep in the pipeline: should we execute B or P after A?
What should the pipeline (processor) do after fetching the branch instruction?
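The 3-cycle cost of the naïve design follows directly from the stage in which the PC is written. A minimal Python sketch, assuming a 5-stage pipeline that advances one stage per cycle with the branch A fetched in cycle 1; `next_fetch_cycle`, `branch_penalty` and `timing_diagram` are hypothetical helpers, not part of DLX:

```python
# Minimal sketch: assumes a 5-stage DLX pipeline, one stage per cycle,
# and the branch A fetched in cycle 1. Helper names are hypothetical.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def next_fetch_cycle(pc_update_stage: str) -> int:
    """Cycle in which the instruction after the branch (B or P) is fetched
    when the PC is written at the end of the branch's `pc_update_stage`."""
    # The branch occupies STAGES[i] during cycle i + 1, so the PC write
    # happens at the end of cycle i + 1 and the next fetch is in cycle i + 2.
    return STAGES.index(pc_update_stage) + 2


def branch_penalty(pc_update_stage: str) -> int:
    """Stall cycles compared with an ideal pipeline that fetches the next
    instruction in cycle 2."""
    return next_fetch_cycle(pc_update_stage) - 2


def timing_diagram(pc_update_stage: str, successor: str = "B/P") -> str:
    """Cycle-by-stage diagram for the branch A and the instruction after it."""
    start = next_fetch_cycle(pc_update_stage)
    width = start + len(STAGES) - 1          # last cycle shown
    header = " " * 7 + "".join(f"{c:<5}" for c in range(1, width + 1))
    lines = [header.rstrip()]
    for name, first in (("A", 1), (successor, start)):
        cells = [""] * width
        for i, stage in enumerate(STAGES):
            cells[first - 1 + i] = stage
        lines.append((f"{name:<7}" + "".join(f"{s:<5}" for s in cells)).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(branch_penalty("MEM"))  # naive design: PC written at end of MEM -> 3
    print(timing_diagram("MEM"))
```

Passing "EX" or "ID" instead of "MEM" models hardware that redirects the PC earlier, which gives penalties of 2 and 1 cycles respectively.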
SOLUTION 1
• Delay the next instruction until we know the outcome of the branch and the address of the next instruction.
• Software: add 3 NOPs after every branch instruction.
• Hardware: a Hazard Detection Unit checks for a branch instruction in the ID, EX or MEM stage, stalls the PC and inserts NOPs.

Simple Software Solution: Insert NOPs

A:      beqz R2, label
        NOP
        NOP
        NOP
B
-----
label:  P

Possible execution sequences:
• Branch not taken: A, NOP, NOP, NOP, B
• Branch taken: A, NOP, NOP, NOP, P
• Adds 3 cycles to the execution time of every branch.

Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   EX   MEM  WB
NOP          IF   ID   EX   MEM  WB
NOP               IF   ID   EX   MEM  WB
NOP                    IF   ID   EX   MEM  WB
B/P                         IF   ID   EX   MEM  WB

Control Hazard (stage occupancy with software NOPs)
T = 1:  IF = NOP, ID = A
T = 2:  IF = NOP, ID = NOP, EX = A
T = 3:  IF = NOP, ID = NOP, EX = NOP, MEM = A
T = 4:  IF = P/B, ID = NOP, EX = NOP, MEM = NOP, WB = A

Hardware-Controlled Pipeline Stall

A:      BEQ R1, R2, L1
B:      ----
C:      ---
L1:     P

Branch taken: 3 additional cycles
Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   EX   MEM  WB
B            IF   IF   IF
P                           IF   ID   EX   MEM  WB

Branch not taken: 3 additional cycles
Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   EX   MEM  WB
B            IF   IF   IF   IF   ID   EX   MEM  WB

Optimized branch not taken (the PC gets the address of C): 2 additional cycles
Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   EX   MEM  WB
B            IF   IF   IF   ID   EX   MEM  WB
C                           IF   ID   EX   MEM  WB

Hazard Detection Unit (HDU)
• Stall the PC (freeze the register: do not update it) and insert a NOP into IF/ID if there is a branch instruction in the IF/ID, ID/EX or EX/MEM pipeline register.

Hardware-Controlled Pipeline Stall (stage occupancy)
• Instruction B (its address) is held in the PC register until A reaches the WB stage.
• Internally generated NOPs are propagated forward while B is stalled.
T = 4:  IF = B (stalled), MEM = A
T = 5, TAKEN BRANCH:  IF = P, WB = A
T = 5, BRANCH NOT TAKEN:  IF = B, WB = A
T = 5, BRANCH NOT TAKEN (optimized):  IF = C, ID = B, WB = A

Branch Delay Slots: Software Solution
• Software must delay the execution of the next-in-line instruction after the branch; the required delay depends on the pipeline structure.
• The microarchitecture is exposed to the software (compiler).
Branch delay slots:
• Delay introduced by software to avoid control hazards.
• Dummy instructions following the branch instruction, placed there to create a delay until the new PC value can be set.
• Instructions in the branch delay slots are always executed.
• In our design: 3 branch delay slots.
• The microarchitecture might choose not to expose all the delay slots and instead use hardware mechanisms to provide the remaining delay.

Performance of Simple Stall-Based Schemes
• The stall scheme has a branch penalty of 3 cycles (possibly 2 in an optimized hardware design):
  1. Software-inserted NOPs: 3 cycles.
  2. Hardware-inserted stall cycles: 3 cycles (non-optimized).

Example: Suppose the branch frequency is 20% and 60% of branches are taken. Assume the software solution with the penalties above, and assume the compiler is able to fill 20% of the branch delay slots with useful instructions. How is CPI affected?
Each branch instruction incurs an extra delay of 3 cycles, except for the delay slots filled with useful instructions. Because delay-slot instructions execute whether or not the branch is taken, the 60% taken rate does not change the penalty.
Branch penalty (per executed instruction) = 20% x 3 (delay slots) x 80% (unfilled delay slots) = 0.48 cycles
CPI = Nominal CPI + penalty cycles (per instruction)
Assuming no other causes of stalls: CPI = 1.0 + 0.48 = 1.48

Alternate Hardware Solution

A:      beqz R2, label
B
C
D
E
--
label:  P

• Why delay the in-line instructions B, C, D, etc.?
• Let the instructions following A enter the pipeline normally.
• This works if the branch is not taken!

Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   EX   MEM  WB
B            IF   ID   EX   MEM  WB
C                 IF   ID   EX   MEM  WB
D                      IF   ID   EX   MEM  WB
E                           IF   ID   EX   MEM  WB

Control Hazard: No Stall Cycles (BRANCH NOT TAKEN)
T = 1:  IF = B, ID = A
T = 2:  IF = C, ID = B, EX = A
T = 3:  IF = D, ID = C, EX = B, MEM = A
T = 4:  IF = E, ID = D, EX = C, MEM = B, WB = A

Speculation: Alternate Hardware Solution
What if the branch is taken?

A:      beqz R2, label
B
C
D
E
--
label:  P

• B, C and D have not updated the machine state by cycle 4.
• Flush B, C and D at the end of cycle 4.

Cycle   1    2    3    4    5    6    7    8    9
A       IF   ID   EX   MEM  WB
B            IF   ID   EX   MEM  WB
C                 IF   ID   EX   MEM  WB
D                      IF   ID   EX   MEM  WB
P                           IF   ID   EX   MEM  WB

Control Hazard (TAKEN BRANCH, if B, C and D are not flushed)
T = 1:  IF = B, ID = A
T = 2:  IF = C, ID = B, EX = A
T = 3:  IF = D, ID = C, EX = B, MEM = A
T = 4:  IF = P, ID = D, EX = C, MEM = B, WB = A
T = 5:  IF = Q, ID = P, EX = D, MEM = C, WB = B
T = 6:  IF = R, ID = Q, EX = P, MEM = D, WB = C
T = 7:  IF = S, ID = R, EX = Q, MEM = P, WB = D
TAKEN BRANCH: writes to memory or registers by B, C or D will result in an error.

Alternate Hardware Solution: Taken Branch
T = 1:  IF = B, ID = A
T = 2:  IF = C, ID = B, EX = A
T = 3:  IF = D, ID = C, EX = B, MEM = A
Insert NOPs into IF/ID, ID/EX and EX/MEM.
T = 4:  IF = P, ID = NOP, EX = NOP, MEM = NOP, WB = A

Branch Penalty in the Modified Hardware Scheme
• More than an optimized implementation of the stall.
• A simple form of control speculation:
  • speculating that the branch is NOT TAKEN;
  • continuing to fetch in-line instructions.
• Performance depends on the accuracy of the speculation:
  • Speculation correct (NOT TAKEN branch): continue with no stalls (0 penalty cycles).
  • Speculation incorrect (TAKEN branch): flush the 3 trailing instructions (3 penalty cycles).

Example: branch frequency 20%; 5% of branches are unconditional; 70% of conditional branches are NOT TAKEN.
CPI = Nominal CPI + penalty cycles for TAKEN branches + penalty cycles for NOT TAKEN branches (0 in this scheme)
Penalty cycles for TAKEN branches = penalty cycles for UNCONDITIONAL branches + penalty cycles for TAKEN CONDITIONAL branches
  = 20% x 5% x 3 + 20% x 95% x 30% x 3 = 0.03 + 0.171 = 0.201
CPI = 1.0 + 0.201 = 1.201

Predict Branch Not Taken; Actually Not Taken: No Stall Cycles
T = 1:  IF = B, ID = A
T = 2:  IF = C, ID = B, EX = A
T = 3:  IF = D, ID = C, EX = B, MEM = A  (BRANCH NOT TAKEN: do nothing)
T = 4:  IF = E, ID = D, EX = C, MEM = B, WB = A

Predict Branch Not Taken; Actually Taken
T = 1:  IF = B, ID = A
T = 2:  IF = C, ID = B, EX = A
T = 3:  IF = D, ID = C, EX = B, MEM = A
Branch actually taken: flush the pipeline (make B, C and D NOPs).
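The flush can be modeled by treating the five stages as a shift register. A minimal Python sketch, assuming the snapshot timing used in this section (A in ID at T = 1, outcome known while A is in MEM at T = 3); `advance`, `flush_wrong_path` and `predict_not_taken` are hypothetical names, not DLX hardware signals:

```python
# Minimal sketch, assuming the 5-stage pipeline of this section: the branch A
# is in ID at T = 1 and its outcome is known (and the PC redirected) at the
# end of T = 3, while A is in MEM. Helper names are hypothetical.
from typing import List, Optional

STAGES = ("IF", "ID", "EX", "MEM", "WB")
NOP = "NOP"
Pipe = List[Optional[str]]   # occupant of IF, ID, EX, MEM, WB (None = empty)


def advance(pipe: Pipe, fetched: Optional[str]) -> Pipe:
    """One clock edge: every instruction moves one stage on; `fetched` enters IF."""
    return [fetched] + pipe[:-1]


def flush_wrong_path(pipe: Pipe) -> Pipe:
    """Turn the instructions behind the branch (in IF, ID and EX while the
    branch is in MEM) into NOPs; none of them has written memory or registers."""
    return [NOP if i < 3 and op is not None else op for i, op in enumerate(pipe)]


def predict_not_taken(taken: bool) -> List[Pipe]:
    """Stage occupancy for T = 1..4 when the hardware speculates NOT TAKEN."""
    pipe: Pipe = ["B", "A", None, None, None]        # T = 1
    snapshots = [pipe]
    for nxt in ("C", "D"):                           # T = 2, T = 3
        pipe = advance(pipe, nxt)
        snapshots.append(pipe)
    if taken:                                        # misprediction: squash B, C, D
        pipe = advance(flush_wrong_path(pipe), "P")  # and fetch the target P
    else:                                            # prediction correct: no stall
        pipe = advance(pipe, "E")
    snapshots.append(pipe)
    return snapshots


if __name__ == "__main__":
    for t, snap in enumerate(predict_not_taken(taken=True), start=1):
        print(f"T = {t}: " + ", ".join(f"{s} = {op}" for s, op in zip(STAGES, snap) if op))
    # last line printed: T = 4: IF = P, ID = NOP, EX = NOP, MEM = NOP, WB = A
```

Because B, C and D are squashed before any of them reaches MEM or WB, turning them into NOPs is enough to keep the machine state correct; the three NOPs are the 3 penalty cycles of a mispredicted branch in this design.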
After the flush, T = 4:  IF = P, ID = NOP, EX = NOP, MEM = NOP, WB = A

More Control Speculation
Can we predict the branch as taken?
• Speculatively fetch and execute instructions at the branch target.
• Useful only if the target address is known earlier than the branch outcome.
• May require stall cycles until the target address is known.
• Flush the pipeline if the prediction is incorrect.
• Must ensure that flushed instructions do not update any machine state.
Assume that:
• The target address is computed in the ID stage, so there is a stall of 1 cycle until the PC is updated with the target address (ALWAYS!).
• The branch outcome is known at the end of cycle 3, in the EX stage.

Predict Branch Taken; Actually Taken: Single Stall Cycle
T = 1:  IF = B, ID = A
T = 2:  IF = P, EX = A
T = 3:  IF = Q, ID = P, MEM = A
T = 4:  IF = R, ID = Q, EX = P, WB = A

Predict Branch Taken; Actually Not Taken: 2 Wasted Cycles
T = 1:  IF = B, ID = A
T = 2:  IF = P, EX = A
Branch actually not taken: flush the pipeline (make P a NOP).
T = 3:  IF = B, MEM = A
T = 4:  IF = C, ID = B, WB = A

More Control Speculation (contd.)
Reduce the branch delay (3 cycles in the first design) to 1 or 2 cycles with early branch detection hardware that computes:
• Target address: easy to move to the ID stage.
• Branch outcome: easy to move to the EX stage.
Predict Not Taken: actually not taken: no stalls; actually taken: 2 cycles.
Predict Taken: actually taken: 1 cycle; actually not taken: 2 cycles.

More Control Speculation: Example
• 16% of instructions were conditional branches.
• 4% of instructions were unconditional branches.
• 62% of conditional branches were taken.

Predict Branch Taken (actually taken: 1-cycle penalty; actually not taken: 2-cycle penalty):
CPI = 1 + 16% x 62% x 1 (taken conditional) + 16% x 38% x 2 (not-taken conditional) + 4% x 1 (unconditional, taken)
    = 1 + 0.0992 + 0.1216 + 0.04 ≈ 1.26

Predict Branch Not Taken (actually taken: 2-cycle penalty; actually not taken: 0-cycle penalty; an unconditional branch is known to be TAKEN once decoded, so it costs 1 cycle):
CPI = 1 + 16% x 62% x 2 + 4% x 1
    = 1 + 0.1984 + 0.04 ≈ 1.24

Summary: Control (Branch) Hazards
• Do branch resolution (outcome and target address) early.
• Stall the pipeline for the required number of cycles.
Methods that reduce the branch penalty: 1.
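The CPI examples in this section all apply one formula: nominal CPI plus, for each branch category, its frequency per instruction times its penalty in cycles. A minimal Python sketch that recomputes them, assuming no other stall sources; the helper `cpi` and the variable names are hypothetical, and the inputs are the percentages stated in the examples:

```python
# Minimal sketch: CPI = nominal CPI + sum over branch categories of
# (fraction of all instructions) x (penalty cycles), assuming no other
# stall sources. The helper `cpi` is hypothetical; the inputs are the
# percentages used in this section's examples.
from typing import Iterable, Tuple


def cpi(events: Iterable[Tuple[float, float]], nominal: float = 1.0) -> float:
    """`events` holds (fraction of all instructions, penalty cycles) pairs."""
    return nominal + sum(freq * penalty for freq, penalty in events)


# Software delay slots: 20% branches, 3 slots each, 80% of slots left as NOPs.
delay_slots = cpi([(0.20 * 0.80, 3)])

# Speculate NOT TAKEN, resolved late (3-cycle flush): 20% branches,
# 5% of them unconditional, 30% of the conditional ones taken.
late_not_taken = cpi([(0.20 * 0.05, 3), (0.20 * 0.95 * 0.30, 3)])

# Early resolution: 16% conditional branches (62% taken), 4% unconditional.
early_predict_taken = cpi([(0.16 * 0.62, 1), (0.16 * 0.38, 2), (0.04, 1)])
early_predict_not_taken = cpi([(0.16 * 0.62, 2), (0.16 * 0.38, 0), (0.04, 1)])

if __name__ == "__main__":
    for name, value in [
        ("delay slots", delay_slots),
        ("late predict not taken", late_not_taken),
        ("early predict taken", early_predict_taken),
        ("early predict not taken", early_predict_not_taken),
    ]:
        print(f"{name:<24} CPI = {value:.3f}")
```

With these inputs the four results are 1.480, 1.201, 1.261 and 1.238, so for this branch mix predicting not taken is slightly cheaper than predicting taken.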
