Lab 9 Andy Setlak Page # 7 of 7

Lab 9 Andy Setlak page # 1 of 7 EEL4713 4/25/2018 F 2-5

Lab No. 9
Andy Setlak
EEL4713 Section F 2-4
Lab Meeting Date and Time: Friday, April 14, 2004
TA: Matt Ashoff

I have performed this assignment myself. I have performed this work in accordance with the Lab Rules specified in 4713 Lab No. 0 and the University of Florida’s Academic Honesty manual. On my honor, I have neither given nor received unauthorized aid in doing this assignment.

______
Andrew T. Setlak

Introduction:
This is the lab in which we use the lab 8 implementation of the MIPS2000 pipelined RISC microprocessor to demonstrate ways to fix the common hazards associated with pipelining. We address three types of hazards: control hazards, load hazards, and arithmetic hazards. To resolve them we use forwarding, stalling, and branch prediction. We use MaxPlusII to create the simulation of this processor, using both the graphic editor and behavioral parts defined in VHDL.

Design:
The first thing I needed to do was design the controllers that generate the control signals for forwarding, stalling, and branch prediction. Forwarding is implemented with a specially designed MUX/controller that takes in all of the possible EX stage ALU inputs and the IR registers of the stages involved, and outputs the correct values to the ALU’s A and B inputs. To stall the early stages of the pipeline, the registers between these stages, as well as the PC, are given a hold flag so that they do not change even on clock edges. Lastly, to implement branch prediction, a way to flush the first 3 stages is needed. This is accomplished by adding reset flags to the inter-stage registers that hold the values propagating through the pipeline. Controllers are needed to drive all of these new signals.
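The selection made by the forwarding MUX/controller described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the lab's VHDL: it assumes the standard MIPS field positions (rs in bits 25..21, rt in bits 20..16, rd in bits 15..11), handles only R-type producers, and uses the hypothetical labels PREV1 and PREV2 for the results of the instructions one and two slots ahead of the one in EX.

```python
def field(ir, hi, lo):
    """Extract bits hi..lo (inclusive) from a 32-bit instruction word."""
    return (ir >> lo) & ((1 << (hi - lo + 1)) - 1)


def rtype(rs, rt, rd, funct=0x20):
    """Encode an R-type instruction (helper for the examples; funct 0x20 = add)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def rtype_dest(ir):
    """Destination register of an R-type instruction, or None.

    Assumption: opcode 0 means R-type. A NOP (all zeros) has rd = 0,
    and register 0 is never treated as a forwarding source.
    """
    if field(ir, 31, 26) != 0:
        return None
    rd = field(ir, 15, 11)
    return rd if rd != 0 else None


def forward_select(ir_ex, ir_prev1, ir_prev2):
    """Return (sel_a, sel_b) for the ALU's A and B inputs.

    'REG'   : use the value read from the register file
    'PREV1' : forward the result of the instruction one slot ahead
    'PREV2' : forward the result of the instruction two slots ahead
    The most recent producer wins when both write the same register.
    """
    def pick(src):
        if src == 0:
            return "REG"
        if rtype_dest(ir_prev1) == src:
            return "PREV1"
        if rtype_dest(ir_prev2) == src:
            return "PREV2"
        return "REG"

    return pick(field(ir_ex, 25, 21)), pick(field(ir_ex, 20, 16))
```

For example, with `add $3,$1,$2` immediately followed by `add $4,$3,$3`, calling `forward_select(rtype(3, 3, 4), rtype(1, 2, 3), 0)` selects the forwarded result for both ALU inputs.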
The most complex controller was the one needed for forwarding. This controller takes in the IR outputs from the EX stage, the ID stage, and the MEM stage, and uses these signals to determine which values should be handed to the ALU. Its R-type to R-type process is shown as figure 1; the complete code is included on the Forwarder’s data sheet in the appendix.

Figure 1

Now that the forwarding logic is complete, I need some way to stall the pipeline. To do this I changed the PC and the registers in the IF stage, adding a hold signal to each. Figure 2 shows the new PC with the added hold signal. I will show the registers in the next part, since they had multiple changes applied to them.

Figure 2

Now that I have hardware that uses the hold signal, I need to generate it. To do this I created a stall-needed controller, which is shown as figure 3 below.

Figure 3

Now that my stalls are implemented, only one more piece is needed: the logic to generate the flush signal and the hardware to implement the flushing. To generate the flush signal, all I did was go into my PC_MUX and add flush <= '1'; to each successful branch case, leaving flush <= '0' for all non-branching cases. With the logic created from previously existing logic paths, all that was left was to create the flushable registers. Since in our architecture a nop is defined as x"00000000" in the IR, all we need to do to flush a register is set it to x"00000000". Figure 4 shows a register with both the flush and hold signals implemented. In the non-IF stage registers the hold signal is simply connected directly to ground, which makes a hold impossible.
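The hold and flush behavior of the registers, and the kind of check a stall-needed controller performs, can be modeled in a few lines. This is a hedged Python sketch rather than the lab's VHDL or the contents of figures 3 and 4. The names `PipelineRegister` and `stall_needed` are hypothetical, the `lw` opcode is the standard MIPS value, and giving flush priority over hold is an assumption the report does not state.

```python
NOP = 0x00000000   # the report defines a nop as x"00000000" in the IR
LW_OPCODE = 0x23   # standard MIPS lw opcode (assumed to match the lab's encoding)


class PipelineRegister:
    """Inter-stage register with hold and flush inputs."""

    def __init__(self):
        self.q = NOP

    def clock(self, d, hold=False, flush=False):
        """Model one rising clock edge and return the new output.

        Assumption: flush takes priority over hold. Tying hold to
        False models the non-IF registers, whose hold is grounded.
        """
        if flush:
            self.q = NOP
        elif not hold:
            self.q = d
        return self.q


def stall_needed(ir_ex, ir_id):
    """True when ir_ex is a load whose destination is read by ir_id.

    Conservative: it compares against both the rs and rt fields of
    ir_id, even for instructions that do not read rt.
    """
    if (ir_ex >> 26) & 0x3F != LW_OPCODE:
        return False
    load_dest = (ir_ex >> 16) & 0x1F     # lw writes its rt field
    if load_dest == 0:
        return False                     # register 0 never causes a hazard
    rs = (ir_id >> 21) & 0x1F
    rt = (ir_id >> 16) & 0x1F
    return load_dest in (rs, rt)
```

In this model a stall would be produced by driving `hold=True` on the PC and the IF register while `stall_needed` is true, and a taken branch would drive `flush=True` on the registers being cleared.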
Figure 4

Now that all of the necessary parts are created, all that remains is to connect them into the pipelined MIPS2000 design from lab 8 and test the new pipelined microprocessor with its hazard controls in place.

System Compilation:
Since both the forwarding and the stalling logic use signals from multiple stages of the pipeline as their inputs, I let both of these devices reside outside the pipeline, in the top level of the design. The logic that creates the flush flag, however, stayed in the same stage as the PC_MUX, i.e. the MEM stage. The new complete architecture is shown in figure 5.

Figure 5

Testing:
To test this architecture we were asked in the lab to run each level of implementation multiple times, with test programs showing that each type of hazard was being dealt with properly. In lieu of showing that here, I will simply show one test program that demonstrates all of the types of hazards and point out where these hazards are dealt with by the microprocessor.
The test code that I chose to use is shown below as figure 6. The DMEM contains $01 at address $0000.

Figure 6

Then I ran this code through my pipelined microprocessor; the results are shown in figures 7, 8, and 9.

Figure 7
Annotations: "Load hazard creates a stall"; "Data is forwarded from the MEM stage or the EXo register"; "Forwarding between consecutive dependent add instructions"; "Flush caused by a taken branch"

Figure 8
Annotations: "This is not a stall, it is merely an artifact of the way that I jump"; "Beginning to execute the NOPs that follow the BEQ"; "Flush caused by a taken branch"

Figure 9
Annotations: "Shifting a register that was just assigned in the previous instruction via an add causes forwarding of the result into the shift"; "Beginning the execution of the code at “here” because of the branch not taken assumption"; "Flush caused by a jump (j 100)"

Questions:
1. What would be required in the way of a hardware modification to cause the machine to fetch instructions from the “destination address” of a branch if the branch was toward a lower memory address, or continue with the instructions immediately following the branch if the branch was toward a higher memory address?
We would need to move the branch address resolution earlier in the pipeline so that we can decide which address to fetch next based on the branch address; thus the branch address resolution needs to be moved into the IF stage. We would also need to add some logic to decide whether the destination address is greater than or less than the current address. Actually, come to think of it, we wouldn’t even need to resolve the address to make that decision: we could just check whether the branch offset is positive or negative, and resolve the address only if the offset is negative, i.e. when we are assuming the branch is taken.

2. Would the strategy in question 1 result in a means of branch prediction that was more efficient than the one chosen for our machine in lab? Why? You can use the sorting program for actual numerical evidence.
It depends on whether the heuristic we have chosen, assuming branch taken on a negative offset and branch not taken on a positive offset, turns out to be true most of the time. Since I don’t have lab 10 done yet, I don’t know whether this heuristic holds. If it does, then yes, this will be a more efficient way of doing branch prediction. If it does not, then it could actually be worse, because a mispredicted branch costs us a large amount of efficiency by forcing us to flush the pipeline.

3. If a program consisted of a large number of loads and stores, what would the effective instruction execution rate be? It is not necessary to be numerically explicit, just give your analysis of how you would approach the problem given your integer pipeline and a hypothetical program that does a lot of data moving. You should check the assembler language version of the sorting program to see if it is such a program.
Well, if we have a program with a large number of loads that are followed by dependent stores, this will greatly slow down the execution rate because we will be doing a great number of stalls, causing our cycles per instruction (CPI) to increase drastically, to nearly 2. However, if, as we learned in class today, these dependent loads and stores are “unrolled”, put through a software pipeline, or otherwise optimized, for example by using a branch delay slot, then we could execute such a program at a rate of nearly 1 instruction per cycle.
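The reasoning in the answer to question 3 can be turned into a rough estimate. The Python sketch below assumes an ideal base rate of one cycle per instruction and one stall cycle for each load that is immediately followed by a dependent instruction. The report does not state the stall length, so `stall_cycles` is an assumption, and flushes from branches and jumps are ignored. The function name and parameters are hypothetical.

```python
def effective_cpi(load_use_fraction, stall_cycles=1, base_cpi=1.0):
    """Estimate cycles per instruction when some instructions stall.

    load_use_fraction: fraction of all instructions that are loads
        immediately followed by a dependent instruction (0.0 to 1.0).
    stall_cycles: stall cycles charged per such load (assumed, not
        taken from the report).
    base_cpi: CPI of the pipeline with no stalls at all.
    """
    if not 0.0 <= load_use_fraction <= 1.0:
        raise ValueError("load_use_fraction must be between 0 and 1")
    return base_cpi + load_use_fraction * stall_cycles


def instructions_per_cycle(cpi):
    """Convert a CPI estimate into instructions per cycle."""
    return 1.0 / cpi
```

Under these assumptions, a program in which half of the instructions are stalling loads gives `effective_cpi(0.5)` of 1.5, and the estimate reaches 2 only if every instruction stalls. Reordering the code so that no load is followed directly by its consumer drives `load_use_fraction` toward 0 and the CPI back toward 1.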
