L20-Ddg-Pipeline-II
Total Page:16
File Type:pdf, Size:1020Kb
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Review: Pipeline (1/2) Lecture #20 • Optimal Pipeline Introduction to Pipelined Execution, pt II • Each stage is executing part of an There is one handout instruction each clock cycle. CPS today at the front and today! 2005-11-09 back of the room! • One inst. finishes during each clock cycle. Lecturer PSOE, new dad Dan Garcia • On average, execute far more quickly. www.cs.berkeley.edu/~ddgarcia • What makes this work? History!s worst SW bugs! ! • Similarities between instructions allow us How does your Proj1 Peg to use same stages for all instructions Solitaire bug compares to the top 10 (generally). worst bugs of all time? How many can • Each stage takes about the same amount you name? Wired chose the list… of time as all others: little wasted time. wired.com/news/technology/bugs/0,2924,69355,00.html CS61C L20 Introduction to Pipelined Execution, pt II (1) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (2) Garcia, Fall 2005 © UCB Review: Pipeline (2/2) Data Hazards (1/2) • Pipelining is a BIG IDEA • Consider the following sequence of • widely used concept instructions • What makes it less than perfect? add $t0, $t1, $t2 • Structural hazards: suppose we had sub $t4, $t0 ,$t3 only one cache? ! Need more HW resources and $t5, $t0 ,$t6 • Control hazards: need to worry about or $t7, $t0 ,$t8 branch instructions? ! Delayed branch xor $t9, $t0 ,$t10 • Data hazards: an instruction depends on a previous instruction? CS61C L20 Introduction to Pipelined Execution, pt II (3) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (4) Garcia, Fall 2005 © UCB Data Hazards (2/2) Data Hazard Solution: Forwarding Dependencies backwards in time are hazards • Forward result from one stage to another I Time (clock cycles) IF ID/RF EX MEM WB A n L IF ID/RF EX MEM WB add $t0,$t1,$t2 I$ Reg U D$ Reg A s L Reg U Reg add $t0,$t1,$t2 I$ D$ A L t sub $t4,$t0,$t3 I$ Reg U D$ Reg A L I$ Reg U D$ Reg r. sub $t4,$t0,$t3 A L I$ Reg U D$ Reg A and $t5,$t0,$t6 L I$ Reg U D$ Reg and $t5,$t0,$t6 A O L I$ Reg U D$ Reg A or $t7,$t0,$t8 r L I$ Reg U D$ Reg or $t7,$t0,$t8 A L d I$ Reg U D$ Reg A xor $t9,$t0,$t10 L e xor $t9,$t0,$t10 I$ Reg U D$ Reg r “or” hazard solved by register hardware CS61C L20 Introduction to Pipelined Execution, pt II (5) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (6) Garcia, Fall 2005 © UCB Data Hazard: Loads (1/4) Data Hazard: Loads (2/4) • Dependencies backwards in time are • Hardware must stall pipeline hazards • Called “interlock” IF ID/RF EX MEM WB A lw $t0, 0($t1) L IF ID/RF EX MEM WB Reg U Reg A I$ D$ L lw $t0,0($t1) I$ Reg U D$ Reg A L A bub I$ Reg U D$ Reg L sub $t3,$t0,$t2 sub $t3,$t0,$t2 I$ Reg U D$ Reg ble A bub L I$ Reg U D$ Reg and $t5,$t0,$t4 ble A • Can!t solve with forwarding bub L I$ Reg U D$ or $t7,$t0,$t6 ble • Must stall instruction dependent on load, then forward (more hardware) CS61C L20 Introduction to Pipelined Execution, pt II (7) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (8) Garcia, Fall 2005 © UCB Data Hazard: Loads (3/4) Data Hazard: Loads (4/4) • Stall is equivalent to nop Instruction slot after a load is called • A lw $t0, 0($t1) L “load delay slot” I$ Reg U D$ Reg • If that instruction uses the result of the nop bub bub bub bub bub load, then the hardware interlock will ble ble ble ble ble stall it for one cycle. A L sub $t3,$t0,$t2 I$ Reg U D$ Reg • If the compiler puts an unrelated A instruction in that slot, then no stall L and $t5,$t0,$t4 I$ Reg U D$ Reg A • Letting the hardware stall the instruction L in the delay slot is equivalent to putting or $t7,$t0,$t6 I$ Reg U D$ a nop in the slot (except the latter uses more code space) CS61C L20 Introduction to Pipelined Execution, pt II (9) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (10) Garcia, Fall 2005 © UCB Historical Trivia Administrivia • First MIPS design did not interlock and • Any administrivia? stall on load-use data hazard • Real reason for name behind MIPS: • Advanced Pipelining! Microprocessor without Interlocked • “Out-of-order” Execution Pipeline • “Superscalar” Execution Stages • Word Play on acronym for Millions of Instructions Per Second, also called MIPS CS61C L20 Introduction to Pipelined Execution, pt II (11) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (12) Garcia, Fall 2005 © UCB Review Pipeline Hazard: Stall is dependency Out-of-Order Laundry: Don’t Wait 6 PM 7 8 9 10 11 12 1 2 AM 6 PM 7 8 9 10 11 12 1 2 AM Time Time T 3030 30 30 30 30 30 T 3030 30 30 30 30 30 a A bubble a A bubble s s k B k B C C O O D D r r d E d E e e r F r F A depends on D; stall since folder tied up A depends on D; rest continue; need more resources to allow out-of-order CS61C L20 Introduction to Pipelined Execution, pt II (13) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (14) Garcia, Fall 2005 © UCB Superscalar Laundry: Parallel per stage Superscalar Laundry: Mismatch Mix 6 PM 7 8 9 10 11 12 1 2 AM 6 PM 7 8 9 10 11 12 1 2 AM Time T 3030 30 30 30 30 30 3030 30 30 30 Time T a A (light clothing) s a A (light clothing) k s (dark clothing) k B (very dirty clothing) O C B (light clothing) O r D r (light clothing) d C (dark clothing) d E (dark clothing) e e r (very dirty clothing) (light clothing) r F D More resources, HW to match mix of Task mix underutilizes extra resources parallel tasks? CS61C L20 Introduction to Pipelined Execution, pt II (15) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (16) Garcia, Fall 2005 © UCB Real-world pipelining problem Real-world pipelining problem solution 1 • You!re the manager of a HUGE • You remember: “a pipeline frequency assembly plant to build computers. is limited by its slowest stage”, so… Box Box • Main pipeline • Main pipeline Problem: Problem: • 10 minutes/ need to • 120h omuirnsu/ t e s / need to pipeline stage run 2 hr pipeline stage run 2 hr • 60 stages test before • 60 stages test before done..help! done..help! • Latency: 10hr • Latency: 1200hhr r CS61C L20 Introduction to Pipelined Execution, pt II (17) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (18) Garcia, Fall 2005 © UCB Real-world pipelining problem solution 2 Wired Magazine: History!s Worst SW Bugs! • Engineers learn from failure! • Create a sub-pipeline! • Failure for most other engineers usually means people die as a result • Be thankful you weren!t a civil engineer designing Box steel truss bridges in the early 1900s, since • Main pipeline the failure rate was > 40%! • 10 minutes/ 2hr test • First “Bug”: in 1947 engineers found a pipeline stage (12 CPUs dead moth in Panel F, Relay #70 of the in this Harvard Mark 1 computer system. • 60 stages pipeline) • The following are sorted by date. wired.com/news/technology/bugs/0,2924,69355,00.html CS61C L20 Introduction to Pipelined Execution, pt II (19) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (20) Garcia, Fall 2005 © UCB July 28, 1962 – Mariner I space probe 1982 – Soviet gas pipeline • “Operatives working for the CIA • “A bug in the flight software for the allegedly plant a bug in a Canadian Mariner 1 causes the rocket to divert computer system purchased to from its intended path on launch. control the trans-Siberian gas pipeline. Mission control destroys the rocket • Soviets had obtained the system as part of over the Atlantic Ocean. a wide-ranging effort to covertly purchase • The investigation into the accident or steal sensitive U.S. technology. discovers that a formula written on • The CIA reportedly found out about the paper in pencil was improperly program and decided to make it backfire transcribed into computer code, with equipment that would pass Soviet causing the computer to miscalculate inspection and then fail once in operation. the rocket's trajectory.” • The resulting event is reportedly the largest non-nuclear explosion in the planet's history…” wired.com/news/technology/bugs/0,2924,69355,00.html wired.com/news/technology/bugs/0,2924,69355,00.html CS61C L20 Introduction to Pipelined Execution, pt II (21) Garcia, Fall 2005 © UCB CS61C L20 Introduction to Pipelined Execution, pt II (22) Garcia, Fall 2005 © UCB 1985-1987 – Therac-25 medical accelerator 1988 – Buffer overflow in Berkeley Unix finger daemon • “A radiation therapy device malfunctions and delivers • “The first internet worm (the so-called lethal radiation doses at several medical facilities. Based upon a previous design, the Therac-25 was an "improved! therapy Morris Worm) infects between 2,000 and system that could deliver 2 different kinds of radiation: a low- 6,000 computers in less than a day by taking power electron beam (beta particles) or X-rays.