
CS/ECE 752 Chapter 3
Instruction Level Parallelism and its Dynamic Exploitation

Instructor: Prof. Wood
Computer Sciences Department, University of Wisconsin

Slides developed by Profs. Falsafi, Hill, Smith, Sohi, Vijaykumar, and Wood of
Carnegie Mellon University, Purdue University, and University of Wisconsin.
Copyright © 2002 Falsafi, Hill, Sohi, Smith, Vijaykumar, and Wood.


Dynamically Scheduled Processors

• The dynamic scheduling (superscalar) mindset
• Control and data dependences
• Progression of performance increase
• Multiple instruction issue
• Basic out-of-order execution
• Precise interrupt maintenance
• Speculative execution
• Branch prediction
• Register renaming
• Case studies


The Problem and Mindset

• Start with a static program
  ❒ program written assuming sequential execution
• Sequence through the static program to obtain the dynamic sequence of operations
  ❒ operations operate on values (present in storage locations) and create new values
• Develop a schedule to execute operations in the dynamic sequence
• Operations constitute an instruction window
  ❒ the instruction window is a portion of the dynamic instruction stream


The Problem and Mindset, contd.

• Implement the execution schedule
• Carry out the above at the desired speed
  ❒ use overlap techniques to improve speed
• Maintain the appearance of sequential (serial) execution


The Big Picture

Program forms (left) are transformed by processing phases (indented):

  static program
      | instruction fetch & branch prediction
  dynamic instruction stream
      | dependence checking & dispatch
  execution window
      | instruction issue, instruction execution
  completed instructions
      | instruction reorder & commit


Dependences

• Determine the order of operations in an execution schedule
• Two types: control dependences and data dependences


Control Dependences

• True control dependences are caused by the flow of control through the program
• Artificial control dependences are caused by the sequential nature of writing code: the PC determines the order in which instructions are executed
• Must be overcome to do multiple issue or out-of-order issue


Control Dependence Example

    for (i = 0; i < last; i++) {
        if (a[i] > a[i + 1]) {
            temp = a[i];
            a[i] = a[i + 1];
            a[i + 1] = temp;
            change++;
        }
    }

    L2: move r3, r7        # &a[i]
        lw   r8, (r3)      # load a[i]
        add  r3, r3, 4     # &a[i + 1]
        lw   r9, (r3)      # load a[i + 1]
        ble  r8, r9, L3    # inverted test: skip if !(a[i] > a[i + 1])
        move r3, r7        # &a[i]
        sw   r9, (r3)      # store a[i]
        add  r3, r3, 4     # &a[i + 1]
        sw   r8, (r3)      # store a[i + 1]
        add  r5, r5, 1     # change++
    L3: add  r6, r6, 1     # i++
        add  r7, r7, 4     # &a[i]
        blt  r6, r4, L2    # i < last


Data Dependences

• Read after Write (RAW): true dependence
• Write after Read (WAR): anti-dependence
• Write after Write (WAW): output dependence
• Only RAW dependences need to be observed
• WAR and WAW are an artifact of the storage names used
  ❒ also called name dependences
  ❒ can be overcome with storage renaming


Data Dependence Example

    L2: move r3, r7
        lw   r8, (r3)
        add  r3, r3, 4
        lw   r9, (r3)
        ble  r8, r9, L3
        move r3, r7      # WAW: rewrites r3, last written by the add above
        sw   r9, (r3)    # RAW: reads the r3 just written by the move
        add  r3, r3, 4   # WAR: overwrites r3 after the sw above has read it
        sw   r8, (r3)
        add  r5, r5, 1
    L3: add  r6, r6, 1
        add  r7, r7, 4
        blt  r6, r4, L2
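The renaming idea can be made concrete with a small sketch. The C program below is an illustration invented for these notes, not part of the original slides (the map-table scheme is the standard one, but all names such as rename_insn and NUM_ARCH are made up): every destination gets a fresh physical register and every source is read through the map, so WAW and WAR dependences vanish while RAW chains survive.

    #include <stdio.h>

    #define NUM_ARCH 32                 /* architectural registers      */

    typedef struct { int dst, src1, src2; } Insn;

    static int map[NUM_ARCH];           /* arch reg -> physical reg     */
    static int next_phys = NUM_ARCH;    /* next free physical register  */

    /* Rename one instruction in program order: sources read the current
     * mapping (RAW preserved); the destination gets a brand-new physical
     * register, so no later instruction can conflict on the old name
     * (WAR and WAW eliminated). */
    static void rename_insn(Insn *in) {
        in->src1 = map[in->src1];
        in->src2 = map[in->src2];
        map[in->dst] = next_phys++;
        in->dst  = map[in->dst];
    }

    int main(void) {
        for (int r = 0; r < NUM_ARCH; r++) map[r] = r;  /* identity map */

        /* First three instructions of the example; immediates are
         * dropped and r0 used as a dummy second source for simplicity. */
        Insn prog[] = {
            { 3, 7, 0 },   /* move r3, r7     (first write of r3)  */
            { 8, 3, 0 },   /* lw   r8, (r3)   (read of r3)         */
            { 3, 3, 0 },   /* add  r3, r3, 4  (second write of r3) */
        };
        for (int i = 0; i < 3; i++) {
            rename_insn(&prog[i]);
            printf("p%d <- p%d, p%d\n", prog[i].dst, prog[i].src1, prog[i].src2);
        }
        return 0;
    }

Running it shows the two writes of r3 landing in different physical registers (p32 and p34) while the load still reads p32: the WAW and WAR dependences on r3 are gone and only the true dependence remains.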
Converting Control to Data Dependences

• Associate a condition operand (guard) with each instruction
• Semantics of the instruction: regular semantics if the guard is TRUE, a NOP if the guard is FALSE
• Creates a data dependence between the instruction setting the guard and the instructions using it
• Called if-conversion


Example

    L2: move   r3, r7          # &a[i]
        lw     r8, (r3)        # load a[i]
        add    r3, r3, 4       # &a[i + 1]
        lw     r9, (r3)        # load a[i + 1]
        set_gt r1, r8, r9      # r1 = (a[i] > a[i + 1])
        move_c r1, r3, r7      # &a[i]
        sw_c   r1, r9, (r3)    # store a[i]
        add_c  r1, r3, r3, 4   # &a[i + 1]
        sw_c   r1, r8, (r3)    # store a[i + 1]
        add_c  r1, r5, r5, 1   # change++
    L3: add    r6, r6, 1       # i++
        add    r7, r7, 4       # &a[i]
        blt    r6, r4, L2      # i < last


Converting Data to Control Dependences

• Can be done, but not important (yet) for our purpose
• Called reverse if-conversion


Execution Schedules

• Given a set of operations to be executed, along with a dependence relationship, how do we develop an "execution schedule"?
• Dependence relationships and operation latencies determine the shape of the schedule
• A "better" schedule is one that takes fewer time steps (i.e., is more parallel)
• Reducing operation latencies allows creation of better schedules
• Relaxing dependence constraints allows creation of better schedules


Execution Schedule Examples

• Serial schedule
• Pipelined schedule
• Parallel schedule
• Parallel pipelined schedule
• Out-of-order schedule


Progression of Performance Increase

• Serial processing
  ❒ one operation at a time
• Pipelining to overlap operation execution
  ❒ initiate one operation at a time, but overlap processing of multiple operations
• Pipelining with out-of-order execution of instructions
  ❒ allow operations to execute in an order different from the one in which they were initiated
  ❒ target artificial control and data dependences


Progression of Performance Increase, contd.

• Parallel pipelines, i.e., multiple instruction issue
  ❒ initiate multiple operations at a time, and overlap processing of multiple operations
  ❒ target artificial control dependences
  ❒ the "original" superscalar
• Parallel pipelines with out-of-order execution of instructions
  ❒ target artificial data and control dependences
• Give the scheduler more operations to choose from
  ❒ use speculation to "overcome true dependences"
  ❒ the modern OOO superscalar processor


In-Order Execution

Code fragment:
    divf  f0, f2, f4
    addf  f10, f0, f8
    multf f7, f8, f14

Problem:
  ❍ addf stalls due to a RAW hazard on f0
  ❍ multf stalls because addf stalls

            1    2    3    4    5    6    7    8    9    10
    divf    IF   ID   E/   E/   E/   E/   MEM  WB
    addf         IF   ID   --   --   --   E+   MEM  WB
    multf             IF   --   --   --   ID   EX   MEM  WB

In-order execution limits performance!
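Before moving to the solutions, it may help to quantify schedule length. The sketch below is invented for these notes: it models only execute latencies (4 cycles for divf, 1 for addf, 2 for multf, as in the diagram above), ignores the fetch/decode/memory stages, and assumes enough functional units, so the absolute cycle counts differ from the pipeline diagrams; what it shows is the gap between an in-order schedule and one constrained only by RAW dependences.

    #include <stdio.h>

    #define N 3

    static const char *name[N] = { "divf", "addf", "multf" };
    static const int   lat[N]  = { 4, 1, 2 };   /* execute cycles        */
    static const int   dep[N]  = { -1, 0, -1 }; /* RAW producer, or -1   */
                                                /* (addf needs divf's f0) */

    /* Earliest start/finish for each instruction, either forced to
     * start in program order or constrained only by dataflow. */
    static int schedule(int in_order) {
        int finish[N], prev_start = 0, makespan = 0;
        for (int i = 0; i < N; i++) {
            int start = 1;                          /* cycles start at 1 */
            if (dep[i] >= 0 && finish[dep[i]] + 1 > start)
                start = finish[dep[i]] + 1;         /* wait for operand  */
            if (in_order && prev_start + 1 > start)
                start = prev_start + 1;             /* issue in order    */
            finish[i] = start + lat[i] - 1;
            prev_start = start;
            if (finish[i] > makespan) makespan = finish[i];
            printf("  %-5s start %d, finish %d\n", name[i], start, finish[i]);
        }
        return makespan;
    }

    int main(void) {
        printf("in-order:\n");
        printf("total %d cycles\n", schedule(1));
        printf("dataflow-only (out of order):\n");
        printf("total %d cycles\n", schedule(0));
        return 0;
    }

Under these assumptions the in-order schedule takes 7 cycles and the dataflow-only schedule takes 5, because multf no longer waits behind the stalled addf.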
Instruction Re-ordering (Scheduling)

• Eliminate false control dependences
  ❒ change the order of instruction execution (statically or dynamically)
• Eliminate false data (storage) dependences
  ❒ storage renaming
• The two problems/solutions go hand in hand


Dynamic vs. Static Scheduling

Solutions
  ❒ dynamic vs. static scheduling

Static scheduling (software)
  ❍ compiler reorganizes instructions
  ❍ simpler hardware
  ❍ can use more powerful algorithms
  ❍ Itanium, Crusoe, lots of DSP chips


Static Scheduling

Reorder the code fragment:
    divf  f0, f2, f4
    multf f7, f8, f14
    addf  f10, f0, f8

Solution:
  ❍ multf is independent of addf and divf, so reorder
  ❍ addf still stalls

            1    2    3    4    5    6    7    8    9
    divf    IF   ID   E/   E/   E/   E/   MEM  WB
    multf        IF   ID   E*   E*   MEM  WB
    addf              IF   ID   --   --   EX+  MEM  WB

Hides the execution of multf.


Dynamic Scheduling

Dynamic scheduling (hardware)
  ❍ handles dependences unknown at compile time
  ❍ hardware reorganizes instructions!
  ❍ more complex hardware, but code is more portable
  ❍ "the real stuff"

• Shorter (more parallel) execution schedule while preserving the illusion of sequential execution


Conceptual Model

Dynamically build the dataflow graph
  ❒ respect RAW
  ❒ respect or avoid WAW, WAR

Execute when operands are ready
  ❒ exploit the parallelism of independent instructions

(figure: dataflow graph of the example)


Issues to Remember

• Dynamic scheduling issues
  ❒ data dependences: RAW (in registers, memory, condition codes, etc.)
  ❒ name dependences: WAR and WAW
  ❒ traps, interrupts, and exceptions
  ❒ control dependences: branches and jumps
• Dynamic scheduling algorithms
  ❒ scoreboarding
  ❒ Tomasulo's algorithm
  ❒ RUU
  ❒ R10K, et al.


Dynamic Scheduling: Scoreboard

Centralized control scheme
  ❍ controls all instruction issue
  ❍ detects all hazards

Implemented in the CDC 6600 in 1964
  ❍ the CDC 6600 has 18 separate functional units (not pipelined)
  ❍ 4 FP: 2 multiply, 1 add, 1 divide
  ❍ 7 memory units


Datapath with Scoreboard
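The datapath figure is not reproduced here, but the state it carries is compact. The following sketch is an illustration for these notes, not the CDC 6600's actual logic: the per-unit fields (Fi, Fj, Fk, Qj, Qk, Rj, Rk) follow the classic textbook description of the scoreboard, and the two checks shown are the issue-stage test (structural and WAW hazards) and the read-operands test (RAW).

    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_REGS 32
    #define NUM_FUS   4

    /* One functional-unit status entry, after the classic description
     * of the CDC 6600 scoreboard. */
    typedef struct {
        bool busy;        /* unit in use?                       */
        int  op;          /* operation being performed          */
        int  Fi;          /* destination register               */
        int  Fj, Fk;      /* source registers                   */
        int  Qj, Qk;      /* units producing Fj/Fk, or -1       */
        bool Rj, Rk;      /* are the source operands ready?     */
    } FUStatus;

    static FUStatus fu[NUM_FUS];
    static int result[NUM_REGS];  /* unit that will write each reg, -1 if none */

    static void init(void) {
        for (int r = 0; r < NUM_REGS; r++) result[r] = -1;
    }

    /* Issue stage: stall on a structural hazard (unit busy) or a WAW
     * hazard (some unit is already going to write this destination). */
    static bool issue(int u, int op, int fi, int fj, int fk) {
        if (fu[u].busy || result[fi] != -1)
            return false;
        fu[u] = (FUStatus){ .busy = true, .op = op, .Fi = fi,
                            .Fj = fj, .Fk = fk,
                            .Qj = result[fj], .Qk = result[fk],
                            .Rj = (result[fj] == -1),
                            .Rk = (result[fk] == -1) };
        result[fi] = u;               /* claim the destination register */
        return true;
    }

    /* Read-operands stage: proceed only when both sources are ready,
     * i.e., there is no outstanding RAW hazard. */
    static bool can_read_operands(int u) {
        return fu[u].busy && fu[u].Rj && fu[u].Rk;
    }

    int main(void) {
        init();
        issue(0, 0,  0, 2, 4);    /* divf f0, f2, f4   on unit 0 */
        issue(1, 1, 10, 0, 8);    /* addf f10, f0, f8  on unit 1 */
        /* addf must wait: its Fj (f0) is still being produced by unit 0 */
        printf("addf can read operands? %s\n",
               can_read_operands(1) ? "yes" : "no");
        return 0;
    }

A full scoreboard also performs a WAR check before a unit writes its result back (it must wait until earlier instructions have read the old value of the destination); that step, and the completion bookkeeping that clears Qj/Qk in waiting units, are omitted from this sketch.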