Cmpt 150 A Simple Computer March, 2012

A Simple Computer

It’s time to assemble some pieces and create a computer. We’re not going to cover this in depth — many details will be swept under the rug — but at the end we should have a pretty good idea of how assembly language connects with the underlying hardware.

Instruction Set Architecture

The true specification for a CPU is its instruction set architecture (ISA): in plain language, the set of instructions that the CPU hardware is capable of recognising and executing. In the roughly 70 years since the first digital computers were constructed,¹ hardware and software have co-evolved to their present state. Today there are two broad classes of ISAs:

@ Complex Instruction Set Computer (CISC) ISAs are the older class. They evolved in a period when the guiding philosophy was a rich set of machine instructions that facilitated assembly language programming by humans and closely matched the requirements of evolving higher-level programming languages (e.g., Fortran and Cobol). The dominant CISC ISA today is the Intel (x86) instruction set.

@ Reduced Instruction Set Computer (RISC) ISAs are the newer class, evolving from the observation that the vast majority of assembly language today is written by compilers. RISC ISAs have a bare minimum of instructions, chosen to minimise hardware complexity and maximise speed of execution. Multiple RISC instructions are often required to do the work of a single CISC instruction, but even so they can be executed more quickly. Compilers and assemblers do the tedious work of translating programs in high-level languages into machine instructions. RISC ISAs are also referred to as load/store instruction sets, because only load and store instructions can transfer data between the CPU registers and RAM. All other instructions operate on data in registers. The dominant RISC ISAs today are the SPARC, MIPS, and PowerPC architectures.

¹ Depending on where you want to start: the Atanasoff-Berry computer in 1937, the Harvard Mark I in 1944, or the Manchester SSEM in 1948. The SSEM was the first computer to hold both its instructions and its data in memory, the arrangement known as the von Neumann architecture.


It’s worth pointing out that Intel has put a huge amount of effort into speeding up the execution of its CISC ISA, and it can be argued that the resulting CPU hardware essentially translates the CISC instructions into RISC instructions on-the-fly for execution by the hardware. The ISA presented by Mano in Chapter 9 of the text is a RISC architecture. The ISA implemented in the HC12 is a CISC architecture. We’ll explore the differences when we’ve had a chance to look at both.

A CPU implements an algorithm called the instruction execution cycle to process the instructions of a program. Here are the steps:

1. Instruction Fetch: Fetch the instruction from memory into the CPU for processing.
2. Instruction Decode: Inspect the instruction to see what operation it specifies, what operands it needs, and what it will do with the result.
3. Operand Fetch: Make the operands available to the appropriate functional unit for execution.
4. Operation Execution: Perform the operations specified by the instruction using the operands fetched in the previous step.
5. Store Result: Store the result of the operation.

All instructions require fetch and decode, but some instructions do not need all of the last three steps. In that case, the hardware is instructed to perform an appropriate ‘no operation’ for that particular step. We’ll see how this is done shortly. The number of clock periods required to execute an instruction, and the length of the clock period, depend on the details of the implementation. We will design a simple, but relatively slow, implementation.

The Mano Simple Computer Instruction Set

Table 1 shows the complete instruction set for the Mano Simple Computer. This is a RISC (load/store) instruction set, recognisable because only the load (LD) and store (ST) instructions can transfer data between the registers and memory. All other instructions operate on data in the registers. Digital hardware understands only 1’s and 0’s, so the assembly language of Table 1 must be translated into machine language using the formats shown in Figure 1. Let’s consider a few examples:


Operation           Mnemonic  Opcode   Action                            Status  Fmt
Move A              MOVA      0000000  R[DR] ← R[SA]                     N, Z    R
Increment           INC       0000001  R[DR] ← R[SA] + 1                 N, Z    R
Add                 ADD       0000010  R[DR] ← R[SA] + R[SB]             N, Z    R
Subtract            SUB       0000101  R[DR] ← R[SA] + ¬R[SB] + 1        N, Z    R
Decrement           DEC       0000110  R[DR] ← R[SA] − 1                 N, Z    R
AND                 AND       0001000  R[DR] ← R[SA] ∧ R[SB]             N, Z    R
OR                  OR        0001001  R[DR] ← R[SA] ∨ R[SB]             N, Z    R
Exclusive-OR        XOR       0001010  R[DR] ← R[SA] ⊕ R[SB]             N, Z    R
NOT                 NOT       0001011  R[DR] ← ¬R[SA]                    N, Z    R
Move B              MOVB      0001100  R[DR] ← R[SB]                             R
Shift Right         SHR       0001101  R[DR] ← sr R[SB]                          R
Shift Left          SHL       0001110  R[DR] ← sl R[SB]                          R
Load Immediate      LDI       1001100  R[DR] ← zf OP                             I
Add Immediate       ADI       1000010  R[DR] ← R[SA] + zf OP             N, Z    I
Load                LD        0010000  R[DR] ← M[R[SA]]                          R
Store               ST        0100000  M[R[SA]] ← R[SB]                          R
Branch on Zero      BRZ       1100000  if (R[SA] = 0) PC ← PC + se AD    N, Z    J
                                       if (R[SA] ≠ 0) PC ← PC + 1
Branch on Negative  BRN       1100001  if (R[SA] < 0) PC ← PC + se AD    N, Z    J
                                       if (R[SA] ≥ 0) PC ← PC + 1
Jump                JMP       1110000  PC ← R[SA]                                J

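Where no PC assignment is specified, the instruction performs PC ← PC + 1. In the Action column, ¬ denotes the bitwise complement, sr and sl denote one-bit right and left shifts, zf OP is the 3-bit OP field zero-filled to 16 bits, and se AD is the 6-bit AD field sign-extended to 16 bits.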

Table 1: Mano Simple Computer Instruction Set ([1, Table 9-8])

@ Suppose that we want to subtract the contents of R[4] from R[5], placing the result in R[7]. In assembly language, we would write

    SUB R7, R5, R4

In machine language, this is an R format instruction:

    15    9 8 6 5 3 2 0
    0000101 111 101 100

@ Suppose that we want to load R[6] with the constant 7. In assembly language, we would write

    LDI R6, 7


    Bits:       15..9    8..6                       5..3                     2..0
    R format:   Opcode   Destination Register (DR)  Source Register A (SA)   Source Register B (SB)
    I format:   Opcode   Destination Register (DR)  Source Register A (SA)   Operand (OP)
    J format:   Opcode   Address (AD) Left          Source Register A (SA)   Address (AD) Right

Figure 1: Machine Instruction Formats for the Mano Simple Computer ([1, Figure 9-14])

In machine language, this is an I format instruction:

    15    9 8 6 5 3 2 0
    1001100 110 000 111

Note that the value of the SA field is not used; the actual value is unimportant, so it’s set to zero.

@ Suppose that we want to branch to PC − 5 if the value in R[3] is less than zero. In assembly language, we would write

    BRN R3, -5

In machine language, this is a J format instruction:

    15    9 8 6 5 3 2 0
    1100001 111 011 011

Remember that for a branch instruction, the bits in the DR and SB positions, IR(8:6) and IR(2:0), are the left and right halves of AD; they are concatenated to form a six-bit two’s-complement value. In this case, −5 = 111011.
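The three worked encodings above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original notes: the helper names (`encode_r`, `encode_i`, `encode_j`, `show_fields`) are hypothetical, and the code assumes the field layout of Figure 1 and the opcodes of Table 1.

```python
# Opcodes from Table 1 (7 bits, IR(15:9)).
OPCODES = {
    "MOVA": 0b0000000, "INC": 0b0000001, "ADD": 0b0000010, "SUB": 0b0000101,
    "DEC": 0b0000110, "AND": 0b0001000, "OR": 0b0001001, "XOR": 0b0001010,
    "NOT": 0b0001011, "MOVB": 0b0001100, "SHR": 0b0001101, "SHL": 0b0001110,
    "LDI": 0b1001100, "ADI": 0b1000010, "LD": 0b0010000, "ST": 0b0100000,
    "BRZ": 0b1100000, "BRN": 0b1100001, "JMP": 0b1110000,
}


def encode_r(mnemonic, dr=0, sa=0, sb=0):
    """R format: Opcode | DR | SA | SB."""
    return (OPCODES[mnemonic] << 9) | (dr << 6) | (sa << 3) | sb


def encode_i(mnemonic, dr, sa, operand):
    """I format: Opcode | DR | SA | OP, where OP is an unsigned 3-bit constant."""
    if not 0 <= operand <= 7:
        raise ValueError("immediate operand must fit in 3 bits")
    return (OPCODES[mnemonic] << 9) | (dr << 6) | (sa << 3) | operand


def encode_j(mnemonic, sa, offset=0):
    """J format: Opcode | AD(left) | SA | AD(right); AD is a 6-bit two's-complement offset."""
    if not -32 <= offset <= 31:
        raise ValueError("branch offset must fit in 6 bits")
    ad = offset & 0b111111                      # 6-bit two's-complement bit pattern
    return (OPCODES[mnemonic] << 9) | ((ad >> 3) << 6) | (sa << 3) | (ad & 0b111)


def show_fields(word):
    """Format a 16-bit word as 'opcode DR SA SB' bit groups."""
    return f"{word >> 9:07b} {(word >> 6) & 7:03b} {(word >> 3) & 7:03b} {word & 7:03b}"


print(show_fields(encode_r("SUB", dr=7, sa=5, sb=4)))       # SUB R7, R5, R4
print(show_fields(encode_i("LDI", dr=6, sa=0, operand=7)))  # LDI R6, 7
print(show_fields(encode_j("BRN", sa=3, offset=-5)))        # BRN R3, -5
```

The range checks make the ISA’s limits explicit: an immediate operand must lie in 0 to 7, and a branch offset in −32 to +31.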

To get an understanding of how all of this fits together, let’s consider the following (Java) program fragment:

    int[] arrayOfInt = new int[7] ;
    for (int ndx = 6 ; ndx >= 0 ; ndx = ndx-1)
        arrayOfInt[ndx] = -ndx ;

How could we implement this in the Simple Computer? There are a number of issues to consider:


@ Every data item requires space. We’ll have to identify some space in registers or memory to hold the values for ndx and arrayOfInt. And once we’ve decided on where to put them, we’ll have to figure out a way to create those values in our program.

@ Then there’s the matter of constructing the loop from the available instructions.

Let’s see how we might go about this. Suppose that I decide to place my array at location 42 in memory. I’ll need to set a register to the value 42 so that I can use it in a ST instruction. A quick glance at the instruction set is enough to see that the largest value I can specify as an immediate operand in an instruction is 7, because the OP field is only three bits wide and is zero-filled. How can I construct the value 42? One trick is to use shift left to multiply by 2. I can construct the value 42 as 8 × 5 + 2:

    LDI R7, 5       ; Load R7 with the seed
    SHL R7, R7      ; x 2 = 10
    SHL R7, R7      ; x 2 = 20
    SHL R7, R7      ; x 2 = 40
    ADI R7, R7, 2   ; R7 now contains 42
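The same seed-and-shift idea generalises to any 16-bit constant. The sketch below is an illustration, not part of the original notes: `build_constant` and `evaluate` are hypothetical names, and the code assumes, as the example does, that SHL shifts by one bit and that LDI and ADI accept only 3-bit zero-filled operands. It loads the leading 3-bit group with LDI, then for each remaining 3-bit group shifts left three times and adds the group with ADI.

```python
MASK = 0xFFFF  # registers are 16 bits wide


def build_constant(reg, value):
    """Return Simple Computer assembly lines that leave `value` in register R`reg`."""
    if not 0 <= value <= MASK:
        raise ValueError("value must fit in 16 bits")
    groups = []                      # 3-bit groups, least significant first
    while True:
        groups.append(value & 0b111)
        value >>= 3
        if value == 0:
            break
    groups.reverse()                 # most significant group first
    lines = [f"LDI R{reg}, {groups[0]}"]
    for group in groups[1:]:
        lines += [f"SHL R{reg}, R{reg}"] * 3     # multiply by 8
        if group:                                # ADI R, R, 0 would be wasted
            lines.append(f"ADI R{reg}, R{reg}, {group}")
    return lines


def evaluate(lines):
    """Interpret just LDI / SHL / ADI on a single 16-bit register."""
    r = 0
    for line in lines:
        mnemonic, operands = line.split(maxsplit=1)
        args = [a.strip() for a in operands.split(",")]
        if mnemonic == "LDI":
            r = int(args[1])
        elif mnemonic == "SHL":
            r = (r << 1) & MASK
        elif mnemonic == "ADI":
            r = (r + int(args[2])) & MASK
    return r


for line in build_constant(7, 42):
    print(line)
```

For 42 this reproduces the five instructions above. Each further 3-bit group costs up to four instructions (three SHL and one ADI).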

Constructing a constant this way is more work than you might expect, and a practical instruction set will include the ability to specify immediate operands of a practical size (at least 16 bits) and the ability to shift by a variable amount. Even so, you may occasionally find yourself faced with a similar task.

Another thing to take away from this example is that we need temporary registers to hold intermediate values. If there are not enough registers, intermediate values must be stored in memory and recovered as needed.

So . . . R7 contains the value 42, the address of arrayOfInt. What about ndx? For our purposes here, all I need is a register, which I can easily initialise to 6. Let’s assign R1 to hold the value of ndx. Skipping over the test at the head of the loop for a second, let’s consider the assignment statement in the body of the loop. I have two problems to deal with: I need to construct -ndx and I need to construct the address of arrayOfInt[ndx].

@ Recalling that the two’s-complement representation of −k is the bitwise complement of k plus 1, I can write

    NOT R2, R1
    INC R2, R2      ; R2 now contains -ndx


@ To construct the address of arrayOfInt[ndx], I need to add together the base (R7) and the value of ndx (R1). Then I can easily do the store:

    ; Set R6 to the address of arrayOfInt[ndx] and store -ndx
    ADD R6, R7, R1
    ST R6, R2
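A quick way to convince yourself that the two-instruction negation and the address calculation are right is to mimic them with 16-bit arithmetic. This Python sketch is illustrative only. The function names are hypothetical, and `MASK` models the 16-bit register width.

```python
MASK = 0xFFFF  # registers are 16 bits wide


def not_then_inc(k):
    """Two's-complement negation as NOT R2, R1 followed by INC R2, R2."""
    complemented = ~k & MASK          # NOT: flip all 16 bits
    return (complemented + 1) & MASK  # INC: add 1, discarding any carry out


def as_signed(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word


base = 42                               # R7: address of arrayOfInt
for ndx in range(6, -1, -1):            # R1 counts down from 6
    address = (base + ndx) & MASK       # ADD R6, R7, R1
    value = not_then_inc(ndx)           # R2 = -ndx
    print(f"M[{address}] <- {value:#06x} ({as_signed(value)})")
```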

And now I come to the final hurdle: constructing the loop. At entry, I want to check whether ndx is negative; if so, I can skip the loop entirely. And I’ll want to return here after each iteration of the loop, so let’s add a label.

    ; Branch to the instruction immediately after the end of the
    ; loop
    Loop: BRN R1, Done

But what about the other end of the loop? I need to return to Loop unconditionally.

@ A JMP will be awkward; I’ll need to construct the address in a register, and that will be very difficult. There’s no way to capture the address of an instruction in the Simple Computer ISA. I would need to decide in advance the location of the instruction and construct the corresponding address.

@ Perhaps I can easily construct a value that will satisfy one of the conditional branches. For BRZ, I need a zero. From boolean logic, x ⊕ x = 0, and I have an XOR instruction. (For BRN I would need a negative value; given 0, I can construct −1 in many ways, but that’s one additional instruction.) Let’s go with

    ; The need for zero as a constant is so common that on some
    ; RISC architectures R0 is permanently wired to 0
    XOR R0, R0, R0
    BRZ R0, Loop    ; back to start of loop


Let’s put it all together. R7 will point to the start of the block of storage for arrayOfInt and R1 will hold the value of ndx. R0, R2, and R6 will be used to hold temporary values.

    ; Load R7 with the array base, 42
            LDI R7, 5
            SHL R7, R7
            SHL R7, R7
            SHL R7, R7
            ADI R7, R7, 2
    ; Initialise the array index
            LDI R1, 6
    ; Check to see if we need to perform an iteration of the loop
    Loop:   BRN R1, Done
    ; Calculate -ndx
            NOT R2, R1
            INC R2, R2
    ; Set R6 to the address of arrayOfInt[ndx] and store -ndx
            ADD R6, R7, R1
            ST R6, R2
    ; Decrement the index and iterate
            DEC R1, R1
            XOR R0, R0, R0
            BRZ R0, Loop
    ; Continue from here when the loop is finished
    Done:


Now that we have the code all in one place, it’s clear that we should simply set R0 to 0 right at the start. It then serves both as the always-zero operand for BRZ and as the starting point for calculating -ndx as 0 − ndx with a single SUB. The revised code looks like this:

    ; Make zero easily available
            XOR R0, R0, R0
    ; Load R7 with the array base, 42
            LDI R7, 5
            SHL R7, R7
            SHL R7, R7
            SHL R7, R7
            ADI R7, R7, 2
    ; Initialise the array index
            LDI R1, 6
    ; Check to see if we need to perform an iteration of the loop
    Loop:   BRN R1, Done
    ; Set R6 to the address of arrayOfInt[ndx] and store -ndx
            ADD R6, R7, R1
            SUB R2, R0, R1
            ST R6, R2
    ; Decrement the index and iterate
            DEC R1, R1
            BRZ R0, Loop
    ; Continue from here when the loop is finished
    Done:


The job of an assembler is to translate assembly language to machine language, but let’s do it by hand to get a feel for how it works. Let’s assume we’re starting at memory location 0x0000 to keep things simple. The first column is the memory address of the instruction, in hexadecimal. Columns 2 to 5 are the machine language (opcode and register specifiers) in binary. (A normal listing will give the machine language in hex, but binary is better to show the fields of the instructions, which don’t align with hex digit boundaries.)

    ; Make zero easily available
    0000  0001010 000 000 000         XOR R0, R0, R0
    ; Load R7 with the array base, 42
    0001  1001100 111 000 101         LDI R7, 5
    0002  0001110 111 000 111         SHL R7, R7
    0003  0001110 111 000 111         SHL R7, R7
    0004  0001110 111 000 111         SHL R7, R7
    0005  1000010 111 111 010         ADI R7, R7, 2
    ; Initialise the array index
    0006  1001100 001 000 110         LDI R1, 6
    ; Check to see if we need to perform an iteration of the loop
    0007  1100001 000 001 110  Loop:  BRN R1, Done    ; (PC+6)
    ; Set R6 to the address of arrayOfInt[ndx] and store -ndx
    0008  0000010 110 111 001         ADD R6, R7, R1
    0009  0000101 010 000 001         SUB R2, R0, R1
    000A  0100000 000 110 010         ST R6, R2
    ; Decrement the index and iterate
    000B  0000110 001 001 000         DEC R1, R1
    000C  1100000 111 000 011         BRZ R0, Loop    ; (PC-5)
    ; Continue from here when the loop is finished
    000D  xxxxxxx xxx xxx xxx  Done:
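To check the hand assembly, the thirteen words can be run on a small instruction-level simulator. This is a Python sketch under stated assumptions, not the hardware design that follows. It implements the register transfers of Table 1 directly, models Data Memory as a sparse dictionary (unwritten locations read as 0), starts at address 0x0000, and treats the undefined word at Done (0x000D) as the end of the program. The names `PROGRAM`, `ALU_OPS`, and `run` are hypothetical. Comments mark where each step of the instruction execution cycle happens.

```python
MASK = 0xFFFF

# Hand-assembled words from the listing above, one per address from 0x0000.
PROGRAM = [
    0b0001010_000_000_000,  # 0000  XOR R0, R0, R0
    0b1001100_111_000_101,  # 0001  LDI R7, 5
    0b0001110_111_000_111,  # 0002  SHL R7, R7
    0b0001110_111_000_111,  # 0003  SHL R7, R7
    0b0001110_111_000_111,  # 0004  SHL R7, R7
    0b1000010_111_111_010,  # 0005  ADI R7, R7, 2
    0b1001100_001_000_110,  # 0006  LDI R1, 6
    0b1100001_000_001_110,  # 0007  Loop: BRN R1, Done
    0b0000010_110_111_001,  # 0008  ADD R6, R7, R1
    0b0000101_010_000_001,  # 0009  SUB R2, R0, R1
    0b0100000_000_110_010,  # 000A  ST R6, R2
    0b0000110_001_001_000,  # 000B  DEC R1, R1
    0b1100000_111_000_011,  # 000C  BRZ R0, Loop
]

# Operations that write R[DR], from Table 1 (results are masked to 16 bits later).
ALU_OPS = {
    0b0000000: lambda a, b: a,                    # MOVA
    0b0000001: lambda a, b: a + 1,                # INC
    0b0000010: lambda a, b: a + b,                # ADD
    0b0000101: lambda a, b: a + (~b & MASK) + 1,  # SUB
    0b0000110: lambda a, b: a - 1,                # DEC
    0b0001000: lambda a, b: a & b,                # AND
    0b0001001: lambda a, b: a | b,                # OR
    0b0001010: lambda a, b: a ^ b,                # XOR
    0b0001011: lambda a, b: ~a,                   # NOT
    0b0001100: lambda a, b: b,                    # MOVB
    0b0001101: lambda a, b: b >> 1,               # SHR
    0b0001110: lambda a, b: b << 1,               # SHL
    0b1001100: lambda a, b: b,                    # LDI (b is the zero-filled OP)
    0b1000010: lambda a, b: a + b,                # ADI (b is the zero-filled OP)
}
LDI, ADI, LD, ST = 0b1001100, 0b1000010, 0b0010000, 0b0100000
BRZ, BRN, JMP = 0b1100000, 0b1100001, 0b1110000


def run(program, max_steps=10_000):
    """Execute until the PC runs off the end of `program` (the Done label)."""
    regs, dmem, pc = [0] * 8, {}, 0
    for _ in range(max_steps):
        if pc >= len(program):
            return regs, dmem
        ir = program[pc]                                    # 1. instruction fetch
        op, dr, sa, sb = ir >> 9, (ir >> 6) & 7, (ir >> 3) & 7, ir & 7  # 2. decode
        a = regs[sa]                                        # 3. operand fetch
        b = sb if op in (LDI, ADI) else regs[sb]
        ad = (dr << 3) | sb                                 # AD = IR(8:6) || IR(2:0)
        offset = ad - 64 if ad & 0b100000 else ad           # sign-extend 6 bits
        next_pc = pc + 1
        if op in ALU_OPS:                                   # 4. execute, 5. store
            regs[dr] = ALU_OPS[op](a, b) & MASK
        elif op == LD:
            regs[dr] = dmem.get(a, 0)
        elif op == ST:
            dmem[a] = b
        elif op == BRZ and a == 0:
            next_pc = pc + offset
        elif op == BRN and a & 0x8000:
            next_pc = pc + offset
        elif op == JMP:
            next_pc = a
        elif op not in (BRZ, BRN):
            raise ValueError(f"unknown opcode {op:07b} at {pc:04X}")
        pc = next_pc & MASK
    raise RuntimeError("step limit reached")


regs, dmem = run(PROGRAM)
for addr in sorted(dmem):
    print(f"M[{addr}] = {dmem[addr]:#06x}")
```

Running it should leave M[42] through M[48] holding 0x0000, 0xffff, ..., 0xfffa (that is, 0, −1, ..., −6) and R1 holding 0xffff (−1), the value that makes BRN leave the loop.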

The Single-Cycle Implementation of the Mano Simple Computer ISA

Figure 2 shows a block diagram of the single-cycle implementation of the Mano Simple Computer ISA. Mano’s Figure 9-15 is not the best; the figure here makes clear the number of bits in data paths and bundles of control signals, and clarifies the structure of the branch control logic. To make the final connections between the assembly language example and execution of instructions by the hardware, we need to do a bit more work on the major components of the design.


[Block diagram: the PC addresses a 64K × 16 Instruction Memory, whose output feeds the Instruction Decoder and supplies DA, AA, BA (3 bits each) to an 8 × 16 Register File; Zero Fill of IR(2:0) and Mux B (MB) select the Function Unit's B input; the Function Unit (FS, 4 bits) produces F and status bits V, C, N, Z; a 64K × 16 Data Memory (MW) and Mux D (MD) select the value written back (RW); Sign Extend of the 6-bit AD, R[SA], N, Z, and PL, JB, BC feed the branch control and associated PC logic. Data paths are 16 bits wide.]

Figure 2: Single-Cycle Implementation of the Mano Simple CPU ISA (based on [1, Figure 9-15])


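[Block diagram: inputs FS (4 bits), A and B (16 bits each). The Arithmetic/Logic Unit (select S(2:0) = FS(3:1), carry-in ci = FS(0)) produces G and the status bits C and V; the Shifter (select S(1:0) = FS(1:0), shifting in 0) produces H from B; Mux F, controlled by FS(3:2), selects G or H as the 16-bit output F; N is taken from bit 15 and Z flags a zero result.]

Figure 3: Function Unit for the Mano Simple CPU (based on [1, Figure 9-1])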

We’ve already looked at the workings of the Register File, and we know that the same principles apply to the Instruction Memory and Data Memory. The design of the Function Unit is straightforward and the text does an adequate job of developing the design in §§9.3 and 9.4, but Figure 9-1 does not do justice to the completed unit. Figure 3 shows the Function Unit with a bit more detail than is presented in the text. It is necessary to know the control signal values required for a given operation when designing the Instruction Decoder; these are specified in Table 9-4 of the text. In the next two sections, we’ll explore the design of the Instruction Decoder and the Branch Control logic.
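To connect the FS codes with the instructions, here is a small behavioural model of the Function Unit. It is a sketch, not Mano’s circuit, and `function_unit` is a hypothetical name. It follows the FS encoding of Table 9-4 as summarised in Figure 3 (ALU select from FS(3:1), carry-in FS(0), shifter select FS(1:0), Mux F from FS(3:2)) and omits the V and C status bits. The arithmetic cases assume the ALU adds A, one of 0, B, ¬B, or all 1s (chosen by FS(2:1)), and the carry-in; that is how A − 1 (FS = 0110) and A + ¬B + 1 (FS = 0101) arise.

```python
MASK = 0xFFFF


def function_unit(fs, a, b):
    """Return (F, N, Z) for a 4-bit function select FS and 16-bit inputs A, B."""
    if fs >> 2 == 0b11:
        # Shifter path (Mux F selects H): FS(1:0) chooses B, sr B, or sl B.
        select = fs & 0b11
        if select == 0b00:
            f = b
        elif select == 0b01:
            f = b >> 1
        elif select == 0b10:
            f = b << 1
        else:
            raise ValueError("FS = 1111 is not used")
    elif fs >> 3 == 0:
        # Arithmetic: FS(2:1) picks the second addend, FS(0) is the carry-in.
        y = (0, b, ~b & MASK, MASK)[(fs >> 1) & 0b11]
        f = a + y + (fs & 1)
    else:
        # Logic: FS(1:0) picks AND, OR, XOR, or NOT A.
        f = (a & b, a | b, a ^ b, ~a)[fs & 0b11]
    f &= MASK
    return f, f >> 15, int(f == 0)


SUB_OPCODE = 0b0000101
# IR(12:9) of a register instruction is used directly as FS.
print(function_unit(SUB_OPCODE & 0b1111, 5, 7))   # (65534, 1, 0), i.e. -2
```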

The Instruction Decoder

The Instruction Decoder takes as its input the opcode from the instruction and produces the control signals FS(3:0), MB, MD, RW, MW, PL, JB, and BC. The register select signals DA, AA, and BA really aren’t involved in the design of the decoding logic; they are wired directly from the corresponding fields of the instruction to the address inputs of the Register File.

                                            Instruction Bits   Control Word Bits
Instruction Function Type                   15  14  13         MB  MD  RW  MW  PL  JB
Function-unit operations using registers     0   0   0          0   0   1   0   0   X
Memory read                                  0   0   1          0   1   1   0   0   X
Memory write                                 0   1   0          0   X   0   1   0   X
Function-unit operations using register      1   0   0          1   0   1   0   0   X
  and constant
Conditional branch                           1   1   0          X   X   0   0   1   0
Unconditional jump                           1   1   1          X   X   0   0   1   1

Table 2: Truth Table for Simple Computer Instruction Decode Logic (adapted from [1, Table 9-10])

Table 2 shows a compilation of control signals built by looking at each instruction and determining the control signals required to execute that instruction. The table is adapted from Table 9-10 of the text, with IR(9) and BC removed to emphasise the grouping according to the first three bits of the opcode, IR(15:13). As can be seen, the first three bits determine the basic mode of operation for the data path.

@ Function-unit instructions using registers take one or two values from the Register File, perform some operation in the Function Unit, and store the result back into the Register File. This implies that MuxB should select the value from the Register File (MB = 0) and MuxD should select the output of the Function Unit (MD = 0). RW is set to 1 to write the result back into the Register File. Notice that the low-order bits of the opcode, IR(12:9), are chosen so that they can be used directly as the FS input to the Function Unit. (Compare Table 9-4 in the text with IR(12:9) for this group of instructions.) Also see the comments for branch and jump instructions for the gating associated with FS(0).

@ The function-unit instructions using an immediate operand require that MuxB pass the immediate operand to the Function Unit (MB = 1). Otherwise, they have the same requirements as the function-unit operations using a B register.


@ A memory read instruction requires that the output of the memory be selected by MuxD (MD = 1) and written back into the Register File.

@ A memory write instruction requires that the B output of the Register File be selected by MuxB (MB = 0) and passed to the Data Memory’s data input.² Further, we must write the memory (MW = 1) and we must not write the Register File (RW = 0).

@ For all of the above instructions, the PC should be incremented by 1, so the PC load signal PL is set to 0.

@ Branch and jump instructions do not write the Data Memory or the Register File. They will load a new value into the PC (PL = 1). The JB control signal chooses between a value from the Register File (JMP) or PC + se AD (BRZ, BRN). The BRZ and BRN instructions rely on the Function Unit to pass R[SA] through unchanged (F = A, FS = 0000) so that it produces the correct values for the Z and N flags. BRZ’s opcode already has IR(12:9) = 0000, but BRN’s opcode has IR(9) = 1, which would otherwise select FS = 0001 (F = A + 1). So FS(0) must be forced to 0. The logic feeding FS(0) accomplishes this task, using IR(15) ⋅ IR(14) to identify a branch (or jump) instruction.³

The corresponding implementation is shown in Figure 4.
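The decoder equations can be read off Table 2 by exploiting its don’t-care entries: MB = IR(15), MD = IR(13), RW = ¬IR(14), MW = ¬IR(15)·IR(14), PL = IR(15)·IR(14), and JB = IR(13). In addition, BC = IR(9) (0 for BRZ, 1 for BRN) and FS(0) is gated as described above. The Python below is a sketch of that reading, not Mano’s published logic, so check it against Figure 9-16 of the text; `decode` is a hypothetical name.

```python
def decode(opcode):
    """Return the control word for a 7-bit opcode, IR(15:9)."""

    def ir(n):
        """Bit IR(n) of the instruction, for n = 9..15."""
        return (opcode >> (n - 9)) & 1

    pl = ir(15) & ir(14)                  # 11x: branch or jump loads the PC
    return {
        "MB": ir(15),                     # 1 selects the zero-filled OP (group 100)
        "MD": ir(13),                     # 1 selects Data Out (memory read)
        "RW": 1 - ir(14),                 # groups 010 and 11x do not write registers
        "MW": (1 - ir(15)) & ir(14),      # group 010: memory write
        "PL": pl,
        "JB": ir(13),                     # 111: jump, 110: branch
        "BC": ir(9),                      # 0 for BRZ, 1 for BRN
        # FS(3:1) = IR(12:10); FS(0) = IR(9), gated off when PL = 1
        "FS": (opcode & 0b1110) | (ir(9) & (1 - pl)),
    }


for name, opcode in [("SUB", 0b0000101), ("LD", 0b0010000), ("ST", 0b0100000),
                     ("ADI", 0b1000010), ("BRN", 0b1100001), ("JMP", 0b1110000)]:
    print(name, decode(opcode))
```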

The PC and Associated Control Logic

Let’s consider the branch (BRZ, BRN) and jump (JMP) instructions.

@ The branch instructions allow us to change the flow of instruction execution in response to data values. BRZ looks at the Z flag and allows us to branch when the ALU result is zero. BRN looks at the N flag and allows us to branch when the ALU result is negative. The change in instruction flow is relative to the current value of the PC. Given a six-bit offset interpreted as a two’s-complement value, we can move to any instruction from 32 instructions before the current instruction to 31 instructions after it.

² There is no good reason for this choice, given that the Simple Computer ISA does not contain an instruction to write an immediate operand to memory. We could just as easily connect the data input of the Data Memory directly to the B output of the Register File.

³ Mano makes you work a bit for this. If you follow through with the comparison of FS codes and IR(12:9) suggested above and work through the design of the Arithmetic/Logic Unit, you will notice that, as shown in Figure 3, FS(0) is the value of the carry-in for the arithmetic portion of the ALU.


[Block diagram: the 16-bit instruction from the Instruction Memory is broken out into Opcode IR(15:9), DR IR(8:6), SA IR(5:3), and SB IR(2:0). DR, SA, and SB drive DA, AA, and BA directly, and the opcode bits feed the logic that produces MB, FS(3:0), MD, RW, MW, PL, JB, and BC.]

Figure 4: Instruction Decode Logic for the Mano Simple CPU (based on [1, Figure 9-16])

@ The jump instruction only provides for unconditional change to the flow of instruction execution, but the range is greater. We can load the PC with the value of any register.

The portion of Figure 9-15 that implements the branch and jump instructions (the boxes labelled PC, Branch Control, and Extend) is very deceptive. Mano should be somewhat ashamed of his presentation of this part of the design. Let’s see if we can do better. From Table 9-8, the definitions of these instructions are

    BRZ   (R[SA] = 0) : PC ← PC + se AD
          (R[SA] ≠ 0) : PC ← PC + 1
    BRN   (R[SA] < 0) : PC ← PC + se AD
          (R[SA] ≥ 0) : PC ← PC + 1
    JMP   PC ← R[SA]


The necessary data connections appear to be in place:

@ There’s a data path to present the sign-extended value of AD ≡ IR(8:6)||IR(2:0) at the input to the block labelled PC.
@ There’s a data path to present the value of R[SA] at the input to the block labelled PC.
@ The ALU result status signals V, C, N, and Z are all available as inputs to the block labelled Branch Control, even though we only need N and Z.
@ The nature of the connection between the Branch Control block and the PC block is unspecified, as is the logic in both blocks.
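The AD path amounts to bit concatenation followed by sign extension. A minimal Python sketch, with hypothetical names `se_ad` and `branch_target`, assuming a 16-bit PC that wraps around modulo 2^16 as a 16-bit adder would:

```python
MASK = 0xFFFF


def se_ad(ir):
    """Sign-extend AD = IR(8:6) || IR(2:0) from 6 bits to 16 bits."""
    ad = (((ir >> 6) & 0b111) << 3) | (ir & 0b111)
    if ad & 0b100000:        # bit 5 is the sign bit of the 6-bit field
        ad |= 0xFFC0         # copy it into bits 15..6
    return ad


def branch_target(pc, ir):
    """PC + se AD, computed modulo 2**16."""
    return (pc + se_ad(ir)) & MASK


BRZ_R0_LOOP = 0b1100000_111_000_011                 # the word at address 000C
print(f"{se_ad(BRZ_R0_LOOP):#06x}")                 # 0xfffb, i.e. -5
print(f"{branch_target(0x000C, BRZ_R0_LOOP):04X}")  # 0007, the Loop label
```

Enumerating all 64 AD patterns gives offsets from −32 to +31, the branch range quoted earlier.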

Mano suggests three control signals, PL, JB, and BC. Looking at Table 9-10, PL will be 1 for any branch or jump instruction and is described in the text as ‘PL = 1 will load the PC, PL = 0 will increment it.’ JB will be 1 only for a jump, and BC chooses the branch condition. At first glance, these don’t seem to be a particularly good match to the definitions of the BRZ, BRN, and JMP instructions. Let’s step back for a moment and consider what must happen to implement the branch and jump instructions:

@ For an unsuccessful branch (and for every instruction other than a branch or jump), we perform PC ← PC + 1.
@ For a successful branch or a jump, we load the PC with either PC + se AD (BRZ, BRN) or R[SA] (JMP).

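A straightforward digital circuit to do this is sketched below.

[Circuit: an adder forms PC + se AD from the 16-bit PC and the sign-extended AD. Mux JB selects the adder output (input 0) or R[SA] (input 1). Mux ldPC selects PC + 1 (input 0) or the output of Mux JB (input 1), and its 16-bit output is loaded into the PC.]

@ The signal JB will select the appropriate value to load into the PC. When it’s 1, we choose the value of R[SA]; when it’s 0, we choose the value of PC + se AD.
@ The signal ldPC is defined to be

    $$\mathit{ldPC} = \mathit{JMP} + \mathit{BRZ}\cdot Z + \mathit{BRN}\cdot N$$

  where JMP is shorthand for (opcode = 1110000), and similarly for BRZ and BRN.

Looking at the definitions of BC and PL in Table 9-10, we could rewrite this as

    $$\mathit{ldPC} = \mathit{PL}\cdot\mathit{JB} + \mathit{PL}\cdot\overline{\mathit{JB}}\cdot\overline{\mathit{BC}}\cdot Z + \mathit{PL}\cdot\overline{\mathit{JB}}\cdot\mathit{BC}\cdot N$$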


Can we do better? Consider the following logic.

[Circuit: the 16-bit se AD is ANDed bit by bit with takeBR and fed to one input of an adder whose other input is the PC and whose carry-in ci is the complement of takeBR. Mux takeJMP selects the adder output (input 0) or R[SA] (input 1), and the result is loaded into the PC.]

@ The signal takeJMP = PL ⋅ JB will control the multiplexer. When it’s 1, we choose the value of R[SA]; when it’s 0, we choose the value produced by the adder.
@ The adder now handles both PC + 1 and PC + se AD.

When the signal takeBR is 0, the (multi-bit) output of the AND gates is forced to 0, while ci is set to 1, so the adder produces PC + 1. When the signal takeBR is 1, the output of the AND gates is se AD, ci is set to 0, and the output of the adder is PC + se AD. The net result is to increment the PC when both takeBR and takeJMP are 0, load the PC with PC + se AD when takeBR = 1 and takeJMP = 0, and load the PC with R[SA] when takeJMP = 1.

Looking at the definitions of BC and PL in Table 9-10, the signal takeBR can be defined as

    $$\mathit{takeBR} = \mathit{PL}\cdot\overline{\mathit{JB}}\cdot\overline{\mathit{BC}}\cdot Z + \mathit{PL}\cdot\overline{\mathit{JB}}\cdot\mathit{BC}\cdot N$$

Noting that we don’t use the output of the adder when JB = 1, we can eliminate JB from the expression for takeBR to get

    $$\mathit{takeBR} = \mathit{PL}\cdot\overline{\mathit{BC}}\cdot Z + \mathit{PL}\cdot\mathit{BC}\cdot N$$

Clearly other choices for PL, JB, and BC are possible, but the signals Mano has chosen can be used to choose the correct values with little additional logic.
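Because the second design changes the data path, it is worth checking that it computes the same next PC as the first for every combination of PL, JB, BC, N, and Z. The Python sketch below does that exhaustively for one sample PC, se AD, and R[SA]. The function names are hypothetical, signals are modelled as 0/1 integers, and JB is treated as a free input even for non-branch instructions, where Table 9-10 marks it as a don’t-care.

```python
from itertools import product

MASK = 0xFFFF


def next_pc_first(pc, se_ad, r_sa, pl, jb, bc, n, z):
    """First circuit: Mux JB chooses the target, Mux ldPC chooses target or PC + 1."""
    ld_pc = (pl and jb) or (pl and not jb and not bc and z) or (pl and not jb and bc and n)
    target = r_sa if jb else (pc + se_ad) & MASK
    return target if ld_pc else (pc + 1) & MASK


def next_pc_second(pc, se_ad, r_sa, pl, jb, bc, n, z):
    """Second circuit: one adder computes PC + 1 or PC + se AD; Mux takeJMP picks R[SA]."""
    take_jmp = pl and jb
    take_br = (pl and not bc and z) or (pl and bc and n)
    gated = se_ad if take_br else 0      # the 16 AND gates
    ci = 0 if take_br else 1             # carry-in is the complement of takeBR
    adder = (pc + gated + ci) & MASK
    return r_sa if take_jmp else adder


# Compare the two designs over every combination of control and flag bits.
pc, se_ad, r_sa = 0x000C, 0xFFFB, 0x1234
for pl, jb, bc, n, z in product((0, 1), repeat=5):
    first = next_pc_first(pc, se_ad, r_sa, pl, jb, bc, n, z)
    second = next_pc_second(pc, se_ad, r_sa, pl, jb, bc, n, z)
    assert first == second, (pl, jb, bc, n, z)
print("both PC designs agree")
```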

References

[1] M. Mano and C. Kime. Logic and Computer Design Fundamentals, 4th edition. Prentice Hall, Upper Saddle River, New Jersey, 2008.
