Homework 1 Elec/Comp 425 Fall 2010

Due: Monday, September 27

Please use this Word document for your answers. Show working steps. Print and submit hard copies. Reformat as desired. Delete the problem statements after you are done.

1. Assume a DLX design with the following delay parameters for the different functional units: T_ALU = 80 ns, T_IM = T_DM = 100 ns, T_DECODE = 30 ns, T_REG = 40 ns; all other delays are negligible. Write an expression in terms of the delay parameters for the minimum clock period T, and find the maximum clock rate for the specified parameter values, for each of the following:

a. a single-cycle design.

b. the multiple-cycle design discussed in class.

c. the 5-stage pipelined processor discussed in class, assuming a delay of 5 ns in writing to the pipeline register.
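The three clock-period expressions can be sketched as below. This is only an illustration under a stated assumption: the problem does not say how the functional units map onto the five stages of the designs discussed in class, so the `STAGES` table (in particular, decode followed serially by a register read in ID, and a register write in WB) is a guess, and the function names are hypothetical. Adjust the table to match the class datapath before relying on the numbers.

```python
# Delay parameters from the problem statement, in nanoseconds.
T_ALU, T_IM, T_DM, T_DECODE, T_REG = 80, 100, 100, 30, 40
T_PIPE_REG = 5  # pipeline register write delay, part (c)

# ASSUMPTION: stage composition is not given in the problem text.
# Here ID = decode followed by register read, and WB = register write.
STAGES = {
    "IF": T_IM,
    "ID": T_DECODE + T_REG,
    "EX": T_ALU,
    "MEM": T_DM,
    "WB": T_REG,
}

def single_cycle_period(stages):
    """One long cycle must cover the longest instruction: all stages in series."""
    return sum(stages.values())

def multi_cycle_period(stages):
    """Each cycle performs one stage, so the slowest stage sets the period."""
    return max(stages.values())

def pipelined_period(stages, reg_delay):
    """Slowest stage plus the time to write the pipeline register."""
    return max(stages.values()) + reg_delay

def max_clock_mhz(period_ns):
    """Maximum clock rate in MHz for a period given in nanoseconds."""
    return 1000.0 / period_ns
```

Under the assumed stage table, the periods come out to 390 ns, 100 ns, and 105 ns respectively; a different stage composition changes the single-cycle sum but not the form of the expressions.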

2. A benchmark program with a dynamic instruction count of 25,000 instructions has the following mix: 10% loads, 10% stores, 60% ALU, 10% unconditional branches, and the rest conditional branch instructions. Determine (i) the average CPI, (ii) the execution time, and (iii) the MIPS rating in each of the following circumstances.
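The three quantities asked for are related by the standard formulas: average CPI is the mix-weighted cycle count, execution time is instruction count × CPI / clock rate, and MIPS is clock rate / (CPI × 10^6). A minimal sketch follows. The function names are hypothetical, and the per-class cycle counts in `HYPOTHETICAL_CYCLES` are placeholders only, since the multi-cycle design discussed in class is not described here.

```python
INSTRUCTION_COUNT = 25_000

# Instruction mix from the problem statement (conditional branches are
# "the rest": 100% - 90% = 10%).
MIX = {"load": 0.10, "store": 0.10, "alu": 0.60,
       "uncond_branch": 0.10, "cond_branch": 0.10}

# HYPOTHETICAL per-class cycle counts, for illustration only; substitute
# the counts of the multi-cycle design covered in class.
HYPOTHETICAL_CYCLES = {"load": 5, "store": 4, "alu": 4,
                       "uncond_branch": 3, "cond_branch": 3}

def average_cpi(mix, cycles):
    """Mix-weighted average cycles per instruction."""
    return sum(mix[k] * cycles[k] for k in mix)

def execution_time_s(instruction_count, cpi, clock_hz):
    """Execution time in seconds."""
    return instruction_count * cpi / clock_hz

def mips_rating(cpi, clock_hz):
    """Millions of instructions per second."""
    return clock_hz / (cpi * 1e6)
```

For example, a single-cycle design has a CPI of 1 by construction, so at 200 MHz `execution_time_s(INSTRUCTION_COUNT, 1, 200e6)` gives 125 microseconds and `mips_rating(1, 200e6)` gives 200.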

(a) single-cycle implementation with a 200 MHz clock.

CPI: Execution Time: MIPS:

(b) multi-cycle implementation with a 1.0 GHz clock. Assume unconditional and conditional branches take the same number of cycles to execute. Assume that all memory references complete within a clock cycle.

Average CPI: Execution Time: MIPS:

(c) multi-cycle implementation with a 1.0 GHz clock. Assume that the memory system consists of IM and DM caches and a shared DRAM memory. Assume that 99% of all references to instruction memory are satisfied from the IM cache and require just 1 clock cycle; the remaining 1% require access to DRAM and incur an additional 50 clock cycles. Similarly, 95% of data reads and 90% of data writes are handled by the DM cache in 1 clock cycle, while the rest require a DRAM access of 50 additional clock cycles. During the wait states while DRAM is being accessed there is no other activity.

Average CPI: Execution Time: MIPS:

(d) 5-stage pipelined implementation assuming no hazards and a 1.0 GHz clock. Assume that all memory accesses take 1 cycle.

CPI: Execution Time: MIPS:

(e) 5-stage pipelined implementation assuming cache miss rates and penalties for instructions and data as in part (c). The clock frequency is 1.0 GHz. No other hazards.

CPI: Execution Time: MIPS:
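The miss rates and penalties from part (c) add memory stall cycles on top of whatever the base CPI is. A rough sketch of that term follows, assuming one instruction-memory reference per instruction, that data reads are exactly the loads and data writes exactly the stores, and that the processor does nothing else during a miss. The function names are hypothetical, and pipeline fill time is ignored.

```python
MISS_PENALTY_CYCLES = 50   # additional cycles for a DRAM access
IM_MISS_RATE = 0.01        # 99% of instruction fetches hit the IM cache
READ_MISS_RATE = 0.05      # 95% of data reads hit the DM cache
WRITE_MISS_RATE = 0.10     # 90% of data writes hit the DM cache
LOAD_FRACTION = 0.10       # from the instruction mix
STORE_FRACTION = 0.10      # from the instruction mix

def stall_cycles_per_instruction():
    """Average memory stall cycles added to each instruction.

    ASSUMPTION: every instruction makes exactly one instruction fetch,
    loads make one data read, and stores make one data write.
    """
    misses_per_instruction = (IM_MISS_RATE
                              + LOAD_FRACTION * READ_MISS_RATE
                              + STORE_FRACTION * WRITE_MISS_RATE)
    return misses_per_instruction * MISS_PENALTY_CYCLES

def cpi_with_memory_stalls(base_cpi):
    """Base CPI (hit-only case) plus the average memory stall cycles."""
    return base_cpi + stall_cycles_per_instruction()
```

With these assumptions the stall term is 1.25 cycles per instruction, so a pipeline with an ideal CPI of 1 and no other hazards would average about 2.25 cycles per instruction.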

(f) 5-stage pipeline assuming that data hazards are masked by forwarding, the target address is computed in the EX stage, conditional branches are resolved at the end of the MEM stage, and branches are predicted as not taken but are actually taken 40% of the time.

CPI: Execution Time: MIPS:

3. A transaction processing system consists of a single-CPU server. Transactions arriving at the server are queued and served one at a time in FCFS (First Come First Served) order. Each transaction is assumed to require 500,000 executed instructions. The server capacity is C MIPS.

A. Suppose that transactions arrive one at a time at a uniform rate of 28,800 transactions per hour.

A(i). What is the minimum MIPS rating of the server required to provide a maximum latency of 100 ms for each transaction?

A(ii). What is the minimum MIPS rating of the server required to provide a maximum latency of 500 ms for each transaction?

A(iii). What is the transaction completion rate (throughput) of the server in TPM (transactions per minute) for C = 1.0 MIPS, 2.0 MIPS, 3.0 MIPS, 4.0 MIPS, 5.0 MIPS, 6.0 MIPS?

A(iv). What is the maximum response time (latency) in ms of a transaction for C = 1.0 MIPS, 2.0 MIPS, 3.0 MIPS, 4.0 MIPS, 5.0 MIPS, 6.0 MIPS?

B. Suppose the server is replaced by a quad-core CMP (chip multiprocessor) with four processors of C MIPS each. Repeat all parts of Question 3A. Transactions are assumed to be single-threaded applications, so a single transaction cannot be parallelized.

B(i). What is the minimum MIPS rating of the server required to provide a maximum latency of 100 ms for each transaction?

B(ii). What is the minimum MIPS rating of the server required to provide a maximum latency of 500 ms for each transaction?

B(iii). What is the transaction completion rate (throughput) of the server in TPM for C = 1.0 MIPS, 2.0 MIPS, 3.0 MIPS, 4.0 MIPS, 5.0 MIPS, 6.0 MIPS?

B(iv). What is the maximum response time (latency) of a transaction for C = 1.0 MIPS, 2.0 MIPS, 3.0 MIPS, 4.0 MIPS, 5.0 MIPS, 6.0 MIPS?
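One way to explore Question 3 is a small deterministic queue model. The sketch below assumes evenly spaced arrivals (one every 125 ms at 28,800 per hour), a fixed service time of 500,000 instructions per transaction, and, for the quad-core case, a single shared FCFS queue feeding whichever core frees up first. That dispatch policy and the function names are assumptions, not part of the problem statement.

```python
import heapq

INSTRUCTIONS_PER_TXN = 500_000
ARRIVALS_PER_HOUR = 28_800

def service_time_ms(c_mips):
    """Time to execute one transaction on a core of c_mips MIPS."""
    return INSTRUCTIONS_PER_TXN / (c_mips * 1_000)

def throughput_tpm(c_mips, servers=1):
    """Steady-state completions per minute: limited by the arrival rate
    or by the total service capacity, whichever is smaller."""
    arrival_tpm = ARRIVALS_PER_HOUR / 60
    capacity_tpm = servers * 60_000 / service_time_ms(c_mips)
    return min(arrival_tpm, capacity_tpm)

def max_latency_ms(c_mips, servers=1, n_txn=1000):
    """Largest arrival-to-completion time over the first n_txn transactions.

    ASSUMPTION: one shared FCFS queue; each transaction runs on the
    core that becomes free earliest.
    """
    interarrival_ms = 3_600_000 / ARRIVALS_PER_HOUR
    service_ms = service_time_ms(c_mips)
    free_at = [0.0] * servers          # min-heap of core free times
    worst = 0.0
    for i in range(n_txn):
        arrival = i * interarrival_ms
        start = max(arrival, heapq.heappop(free_at))
        done = start + service_ms
        heapq.heappush(free_at, done)
        worst = max(worst, done - arrival)
    return worst
```

In this model, whenever the total service capacity is below the arrival rate the queue never drains, so the value returned by `max_latency_ms` keeps growing with `n_txn`; in that regime the latency is unbounded rather than equal to the service time.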

4. A machine can be enhanced either by making the multiply instruction run four times faster or by making the memory unit twice as fast. Profiling the major application to be run on the machine indicates that 20% of the time is spent on multiplies, 50% on memory accesses, and the remaining 30% on other tasks.

(a) What is the speedup if only the multiply is improved?

(b) What is the speedup if only the memory unit is improved?

(c) What is the speedup if both units are improved?

(d) Suppose the multiplier could be enhanced to provide a speedup of Smul, at a cost of $10 x Smul. Similarly the memory unit could be made to yield a speedup of Smem at a cost of $20 x Smem. How should you spend a total budget of $100 to maximize the speedup? What is the speedup with this optimization?
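Question 4 is an application of Amdahl's law: the new execution time is the unenhanced fraction plus each enhanced fraction divided by its speedup. The sketch below evaluates that expression and, for part (d), scans the budget line numerically. It assumes the whole $100 is spent, that speedups vary continuously, and that neither unit is made slower than the original (speedup of at least 1); these are modeling assumptions, and the function names are hypothetical.

```python
F_MUL, F_MEM = 0.20, 0.50   # fractions of time from the profile

def amdahl_speedup(enhancements):
    """Overall speedup for a list of (fraction_of_time, speedup) pairs."""
    enhanced = sum(f for f, _ in enhancements)
    new_time = (1.0 - enhanced) + sum(f / s for f, s in enhancements)
    return 1.0 / new_time

def best_budget_split(budget=100.0, cost_mul=10.0, cost_mem=20.0, step=0.001):
    """Grid search along cost_mul*s_mul + cost_mem*s_mem = budget.

    ASSUMPTION: full budget spent, continuous speedups, both >= 1.
    Returns (overall_speedup, s_mul, s_mem) for the best point found.
    """
    best = (0.0, 0.0, 0.0)
    max_s_mul = (budget - cost_mem * 1.0) / cost_mul
    for i in range(int((max_s_mul - 1.0) / step) + 1):
        s_mul = 1.0 + i * step
        s_mem = (budget - cost_mul * s_mul) / cost_mem
        if s_mem < 1.0:
            break
        overall = amdahl_speedup([(F_MUL, s_mul), (F_MEM, s_mem)])
        if overall > best[0]:
            best = (overall, s_mul, s_mem)
    return best
```

For parts (a) to (c), the calls are `amdahl_speedup([(F_MUL, 4)])`, `amdahl_speedup([(F_MEM, 2)])`, and `amdahl_speedup([(F_MUL, 4), (F_MEM, 2)])`. The grid search is a numerical stand-in for solving part (d) analytically, for example with a Lagrange multiplier on the budget constraint.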
