Practice Problems for Weeks 9-10 (Lectures 16-18)

NOTE: The problems in this document are related to the material that was covered in class during weeks 9 and 10. These problems are provided to assist you in final exam preparation.

Problem No. 1

In this problem, we will analyze dynamic instruction scheduling via Tomasulo’s algorithm. Consider the following code sequence executed on a MIPS processor with Tomasulo hardware similar to the one shown in Figure 3.6:

    MUL.D  F6, F8, F2
    DIV.D  F0, F10, F12
    SUB.D  F2, F0, F6
    ADD.D  F14, F0, F2
    MUL.D  F2, F4, F16
    S.D    F4, 46(R2)

(a) The following “Instruction status” table shows the current status of program execution:

Instruction status

Instruction            Issue   Execute   Write result
MUL.D F6, F8, F2
DIV.D F0, F10, F12
SUB.D F2, F0, F6
ADD.D F14, F0, F2
MUL.D F2, F4, F16
S.D   F2, 46(R2)

Note that the first MUL.D instruction has completed and written its result to register F6; the DIV.D and the second MUL.D instructions have started execution but have not yet completed; and the remaining instructions have not been able to start execution. Considering the current status of execution, fill in all the necessary entries in the reservation stations and the “Register status” table.

(b) Can the second MUL.D instruction in the above code sequence write its result to the F2 register immediately after it has completed execution? Justify your answer with detailed reasoning.

Problem No. 2

Consider a branch instruction which is executed 8 times in a program. The actual outcomes of the branch are NT, T, NT, T, NT, NT, NT, T, where T = Taken and NT = Not taken. Dynamic branch prediction is used to predict the branch outcomes. Show the predictions made for each instance of the branch and calculate the branch prediction accuracy for each of the following two predictors:

(a) A 1-bit branch predictor which starts in the LNT state
(b) A 2-bit branch predictor which starts in the SNT state
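The following Python sketch may help in checking a hand-worked prediction table. It is not the state machine from the class notes: it assumes each predictor behaves as a saturating counter, with counter value 0 standing in for the LNT (1-bit) or SNT (2-bit) starting state. The function name `simulate_counter_predictor` and its parameters are hypothetical, chosen only for this illustration. If the Lecture-17 diagram for the 2-bit predictor uses different transitions, the update rule would need to be changed accordingly.

```python
from typing import List, Tuple


def simulate_counter_predictor(outcomes: List[bool], bits: int,
                               start: int = 0) -> Tuple[List[bool], float]:
    """Simulate an n-bit saturating-counter predictor for one branch.

    ASSUMPTION: the predictor is a saturating counter; the class notes
    may define the 2-bit state machine differently.

    outcomes: actual outcomes, True = taken (T), False = not taken (NT).
    bits:     counter width (1 or 2 for this problem).
    start:    initial counter value (0 = strongest not-taken state).
    Returns (list of predictions, prediction accuracy).
    """
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)  # predict taken when counter >= threshold
    state = start
    predictions = []
    correct = 0
    for taken in outcomes:
        prediction = state >= threshold
        predictions.append(prediction)
        correct += prediction == taken
        # Move one step toward the actual outcome, saturating at both ends.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return predictions, correct / len(outcomes)


# Outcome sequence from the problem statement (True = T, False = NT).
trace = [False, True, False, True, False, False, False, True]
for bits in (1, 2):
    preds, acc = simulate_counter_predictor(trace, bits)
    labels = ["T" if p else "NT" for p in preds]
    print(f"{bits}-bit predictor: {labels}, accuracy = {acc:.1%}")
```

Working the table by hand first and then comparing it against the printed predictions is probably the most useful way to use this sketch.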

Refer to class notes (Lecture-17) for the state machine diagrams of the two predictors.

Problem No. 3

A 5-stage pipelined RISC processor running at 2.4 GHz is used to execute a program. The instruction statistics for this program are as follows:

Branch: 20%
Load: 20%
Store: 10%
Arithmetic Instructions: 50%

Assume that there are no data dependencies in the program. Also assume that all instruction fetches hit in the cache, whereas 2% of all data accesses incur a cache miss. The penalty to access main memory on a cache miss is 10 cycles. The processor uses a dynamic branch predictor and a branch target buffer to predict all branches during the IF stage. The actual branch outcome is computed in the EX stage. A customer requires that the processor achieve an instruction throughput of at least 2 billion instructions per second. What is the minimum branch prediction accuracy that satisfies this requirement?
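One way to organize the calculation is sketched below in Python. The sketch rests on several assumptions that are not stated in the problem: the base CPI of the pipeline is 1, the throughput target is measured in instructions per second, memory and branch stalls simply add, and a mispredicted branch costs 2 cycles (predicted in IF, resolved in EX, so the two instructions fetched behind it are flushed). The function name and parameter names are hypothetical.

```python
def min_branch_accuracy(clock_hz: float, target_ips: float,
                        branch_frac: float, mem_frac: float,
                        miss_rate: float, miss_penalty: float,
                        mispredict_penalty: float) -> float:
    """Return the minimum prediction accuracy that meets target_ips.

    ASSUMPTIONS: base CPI = 1, stall cycles are additive, and
    mispredict_penalty is supplied by the caller (2 cycles is assumed
    in the usage below; the problem does not state it explicitly).
    """
    # Largest CPI that still delivers the target throughput.
    max_cpi = clock_hz / target_ips
    # Average stall cycles per instruction caused by data cache misses.
    memory_stall_cpi = mem_frac * miss_rate * miss_penalty
    # Stall cycles per instruction left over for branch mispredictions.
    stall_budget = max_cpi - 1.0 - memory_stall_cpi
    if stall_budget < 0:
        raise ValueError("target unreachable even with perfect prediction")
    max_mispredict_rate = stall_budget / (branch_frac * mispredict_penalty)
    # Accuracy cannot be negative, so clamp at 0.
    return max(0.0, 1.0 - max_mispredict_rate)


# Values from the problem; loads and stores together make 30% data accesses.
accuracy = min_branch_accuracy(
    clock_hz=2.4e9, target_ips=2.0e9,
    branch_frac=0.20, mem_frac=0.30,
    miss_rate=0.02, miss_penalty=10,
    mispredict_penalty=2,  # assumed: IF-to-EX resolution
)
print(f"minimum accuracy = {accuracy:.1%}")
```

Changing `mispredict_penalty` shows how sensitive the required accuracy is to the stage in which branches are resolved.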

Problem No. 4

Consider the following code fragment running on a MIPS processor which uses a 1-bit branch predictor to predict branch outcomes:

loop: LD   R4, #100(R1)
      ADD  R5, R4, R5
      ADDI R1, R1, #4
      BNE  R1, R2, loop

Assume that the initial values of R1 and R2 are 0 and 20 respectively. Also assume that all the branch predictor entries are initialized to the “LNT” state. Calculate the branch prediction accuracy for the given code fragment.
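The branch outcomes for this loop follow directly from the register values, which the Python sketch below traces. It assumes the loop is entered once, that the BNE has its own predictor entry with no aliasing, and that the 1-bit predictor simply remembers the last outcome, with the initial "LNT" state meaning "predict not taken". The helper names are hypothetical.

```python
def loop_branch_outcomes(r1: int, r2: int, step: int = 4,
                         max_iters: int = 10_000) -> list:
    """Trace the BNE outcomes (True = taken) for the loop in the problem.

    max_iters is only a guard against a loop that never terminates.
    """
    outcomes = []
    for _ in range(max_iters):
        r1 += step           # ADDI R1, R1, #4
        taken = r1 != r2     # BNE R1, R2, loop
        outcomes.append(taken)
        if not taken:
            break            # fall through: the loop exits
    return outcomes


def one_bit_accuracy(outcomes: list, start_taken: bool = False) -> float:
    """Accuracy of a 1-bit (last-outcome) predictor on one branch.

    ASSUMPTION: start_taken=False models the initial "LNT" state.
    """
    last = start_taken
    correct = 0
    for taken in outcomes:
        correct += last == taken  # prediction is the previous outcome
        last = taken
    return correct / len(outcomes)


outcomes = loop_branch_outcomes(r1=0, r2=20)
print(["T" if t else "NT" for t in outcomes])
print(f"accuracy = {one_bit_accuracy(outcomes):.0%}")
```

Separating the outcome trace from the predictor makes it easy to try other initial values of R1 and R2.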

Problem No. 5

In this problem, we’ll explore different options for the (m, n) correlating predictor discussed in class. The following table shows the prediction accuracies for three different (m, n) choices: (i) a 2-bit predictor with no global history (m = 0, n = 2), (ii) a 2-bit predictor with one bit of global history (m = 1, n = 2), and (iii) a 1-bit predictor with three bits of global history (m = 3, n = 1). For each predictor, the table shows how the prediction accuracy increases with the length of the predictor (number of predictor entries):

Prediction Accuracy   256 entries   512 entries   1024 entries   2048 entries
(0, 2) predictor      60%           65%           70%            72%
(1, 2) predictor      65%           74%           86%            90%
(3, 1) predictor      75%           82%           88%            92%

Assume that you have a total bit budget of 2K bits for the branch predictor. Which of the branch predictors will you choose to implement in the processor?
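The comparison can be organized as in the Python sketch below. It assumes the usual sizing rule for an (m, n) predictor, total bits = 2^m × n × number of entries selected by the branch address, and it assumes that "entries" in the table refers to that number and that 2K bits means 2048 bits. If the class notes define the predictor length differently, the sizing function would need to change. All names in the sketch are hypothetical.

```python
# Accuracy table from the problem: (m, n) -> {entries: accuracy}.
ACCURACY = {
    (0, 2): {256: 0.60, 512: 0.65, 1024: 0.70, 2048: 0.72},
    (1, 2): {256: 0.65, 512: 0.74, 1024: 0.86, 2048: 0.90},
    (3, 1): {256: 0.75, 512: 0.82, 1024: 0.88, 2048: 0.92},
}


def entries_within_budget(m: int, n: int, budget_bits: int) -> int:
    """Largest entry count whose storage fits in budget_bits.

    ASSUMPTION: total bits = 2**m * n * entries.
    """
    return budget_bits // ((2 ** m) * n)


def best_predictor(budget_bits: int):
    """Return (accuracy, (m, n), entries) for the best affordable option."""
    best = None
    for (m, n), table in ACCURACY.items():
        limit = entries_within_budget(m, n, budget_bits)
        affordable = [e for e in table if e <= limit]
        if not affordable:
            continue  # even the smallest tabulated size does not fit
        entries = max(affordable)
        candidate = (table[entries], (m, n), entries)
        if best is None or candidate > best:
            best = candidate
    return best


for (m, n) in ACCURACY:
    print((m, n), "fits up to", entries_within_budget(m, n, 2048), "entries")
print("best choice:", best_predictor(2048))
```

The same function can be rerun with a different bit budget to see where the ranking of the three predictors changes.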