CS152 Computer Architecture and Engineering

The problem sets are intended to help you learn the material, and we encourage you to collaborate with other students and to ask questions in discussion sections and office hours to understand the problems. However, each student must turn in his own solution to the problems. CS152 Computer Architecture and #ngineering $S"s, Microprogramming and Pipelining Assigned 1/24/2018 &roblem Set #1 Due February 5 http://inst.eecs.berkeley.edu/~cs152/sp18 The problem sets also provide essential background material (or the qui))es. The problem sets will be graded primarily on an e*ort basis, but i( you do not work through the problem sets you are unlikely to succeed at the qui))es+ ,e will distribute solutions to the problem sets on the day the problem sets are due to give you (eedback. Homework assignments are due at the beginning of class on the due date. -ate homework will not be accepted, e.cept (or extreme circumstances and with prior arrangement. Problem 1: CISC, RISC, accumulator, and St!ck: Comparing IS#s In this problem, your task is to compare (our di*erent $S"s. ./0 is an extended accumulator, CISC architecture with variable1length instructions. 2ISC13 is a load-store, 2ISC architecture with 4.ed-length instructions 5(or this problem only consider the 32-bit (orm of its IS"). ,e will also look at a simple stack-based IS" and an accumulator architecture. Problem 1.A CISC -et us begin by considering the following C code8 int b; //a global variable void multiplyByB(int a){ int i, result; for(i = 0; i<b; i++){ result=result+a; } } 9sing gcc and ob:dump on a &entium III, we see that the above loop compiles to the (ollowing ./0 instruction sequence. 5;n entry to this code, register %ecx contains i, and register %edx contains result, and register %eax contains a. b is stored in memory at location 0x80495807 xor %edx,%edx xor %ecx,%ecx loop: cmp 0x8049580,%ecx jl L1 jmp done )*' add %eax,%edx inc %ecx jmp loop done: ... The meanings and instruction lengths of the instructions used above are given in the (ollowing table. Registers are denoted with ,-.B-/,012, register contents with <R-.B-/,0123+ Instruction $per!tion Lengt h add R45-2, R-,/ ,-,/ <= <R-,/> + <R4-23 2 bytes cmp imm32, R-,/7 Temp <= <R-,/7> - MEM[imm32] 6 bytes inc R45-2 ,45-2 <= <R45-2> + 1 1 byte jmp label :ump to the address speci4ed by label 2 bytes jl label i( (SF≠!= OF) 2 bytes :ump to the address speci4ed by label xor R45-2, R-,/ ,45-2 <= R45-2 xor R-,/ 2 bytes ! <otice that the :ump instruction :l 5:ump i( less than) depends on S= and ;=, which are status >ags. Status >ags are set by the instruction preceding the :ump, based on the result of the computation. Some instructions, like the cmp instruction, per(orm a computation and set status >ags, but do not return any result. The meanings of the status >ags are given in the (ollowing table8 &ame Purpose Condition Reported $' ;verflow 2esult e.ceeds positive or negative limit of number range S' Sign 2esult is negative (less than zero) How many bytes is the program? =or the above ./0 assembly code, how many bytes of instructions need to be (etched i( b @ 10? "ssuming 32-bit data values, how many bytes of data memory need to be fetched? Stored? Problem 1.B ISC Translate each of the ./0 instructions in the (ollowing table into one or more 2ISC13 instructions. &lace the -1 and loop labels where appropriate. Bou should use the minimum number of instructions needed to translate each ./0 instruction. "ssume that upon entry, x1 contains b, .! contains a, .6 contains i. .C should receive result. $( needed, use . as a condition register, and x6, x7, etc., (or temporaries. Bou should not need to use any >oating-point registers or instructions in your code. " description of the 2ISC13 instruction set architecture can be found in the class website, resources page. 6 x86 instruction label ISC+, instruction se-uence xor %edx,%edx xor %ecx,%ecx cmp 0x8049580,%ecx jl L1 jmp done add %eax,%edx inc %ecx jmp loop +++ done: +++ How many bytes is the RISC13 program using your direct translation? How many bytes of RISC1 3 instructions need to be (etched (or b @ 10 using your direct translation? "ssuming 32-bit data values, how many bytes of data memory need to be fetched? Stored? C Problem 1.C Stack In a stack architecture, all operations occur on top of the stack. ;nly push and pop access memory, and all other instructions remove their operands from the stack and replace them with the result. The hardware implementation we will assume (or this problem set uses stack registers (or the top two entriesE accesses that involve other stack positions (e.g., pushing or popping something when the stack has more than two entries7 use an extra memory re(erence. The table below gives a subset of a simple stack-style instruction set. "ssume each opcode is a single byte. -abels, constants, and addresses require two bytes. #.ample instruction %eaning &9SH A push MF"] onto stack &;& A pop stack and place popped value in MF"G "HH pop two values from the stackE AHH themE push result onto stack S9I pop two values from the stackE S9Itract top value from the 2nd; push result onto stack J#2; )eroes out the value at top of stack $<C pop value from top of stackE increments value by one push new value back on the stack I#KJ label pop value from stackE i( itLs zero, continue at labelE else, continue with next instruction I<#J label pop value from stackE i( itLs not zero, continue at labelE else, continue with next instruction M;T; label continue e.ecution at location label Translate the multiplyByB loop to the stack $S". =or uni(ormity, please use the same control >ow as in parts a and b. "ssume that when we reach the loop, a is the only thing on the stack. "ssume b is still at address 0x8000 (to fit within a 2 byte address speci4er). How many bytes is your program? 9sing your stack translations from part (c), how many bytes of stack instructions need to be (etched (or b @ 10? "ssuming 32-bit data values, how many bytes of data memory need to be (etched? Stored? $( you could push and pop to/from a (our-entry register 4le rather than memory (the Oava virtual machine does this), what would be the resulting number of bytes fetched and stored? Problem 1.D #ccumul!tor In an accumulator $S", one operand is implicitly a speci4c register (the same for all instructions), called the accumulator. For e.ample, to e.ecute C = A + B, the assembly code is8 -oad A -oad A into the accumulator "dd B "dd B to the accumulator (which contains ALs value7 Store CStore the accumulatorLs value into the address contained in C <otice that all instructions use the same accumulator. "lso note that there are no registers in this e.ample, but ", B, and C are all memory addresses. =or this question, we will consider a modi4ed architecture to make a solution possible. ,e will have accumulator instructions denoted by the post4. accumulator, and subtractor instructions which work the same way but re(erence a separate register. There(ore, a load can be either Qload " accumulatorR or Qload " subtractorR depending on where the value of address " should be loaded. 9sing this $S", and addition, subtraction, increment, decrement, )ero, and branch instructions similar to the stack IS", write the assembly for the program of this problem. How many bytes is your program (assume same byte usage as the stack architecture and that the speci4er (or accumulator and subtractor requires no extra bytes7? Can the same program be implemented with :ust one accumulator 5i.e., no subtractor)? $( not, how would you extend this $S" to implement this program with just one accumulator? 0 Problem 1.E Conclusions In just a few sentences, compare the four IS"s you have studied with respect to code si)e, number of instructions (etched, and data memory traffic. ,hich one would you choose i( you were to build a speciali)ed processor to e.ecute the code in this program, and why? Problem 1.F $ptimi0!tion To get more practice with 2ISC13, optimi)e the code from part I so that it can be expressed in (ewer instructions. There are solutions more efficient than simply translating each individual ./0 instruction as you did in part B. Bour solution should contain commented assembly code, a paragraph that explains your optimi)ations, and a short analysis of the savings you obtained. D Problem 2: Microprogramming and Bus+(!sed Architectures In this problem, we explore microprogramming by writing microcode (or the bus-based implementation of the 2ISC13 machine described in Handout #1 (Bus-Based 2ISC13 Implementation). 2ead the instruction (etch microcode in Table H1-3 of Handout #1. %ake sure that you understand how di*erent types of data and control trans(ers are achieved by setting the appropriate control signals be(ore attempting this problem. In order to (urther simpli(y this problem, ignore the busy signal, and assume that the memory is as fast as the register file. The 4nal solution should be elegant and efficient (e.g. number of new states needed, amount of new hardware added). Problem 2.A Implementin" 1emory-to-1emory Add =or this problem, you are to implement a new memory1memory add operation.

Load more