Instructions and Other Documents for ENCM 369 Can Be Found At

Home , Binary file, Bit numbering, NOP (code)

page 1 of 9

ENCM 369 Winter 2014 Lab 9 for the Week of March 17

Steve Norman Department of Electrical & Computer Engineering University of Calgary March 2014

Lab instructions and other documents for ENCM 369 can be found at http://www.enel.ucalgary.ca/People/Norman/encm369winter2014/

1 Administrative details 1.1 You may work in pairs on this assignment You may complete this assignment individually or with one partner. Students working in pairs must make sure both partners understand all of the exercises being handed in. The point is to help each other learn all of the lab material, not to allow each partner to learn only half of it! Please keep in mind that you will not be able to rely on a partner to do work for you on midterm #2 or the ﬁnal exam. Two students working together should hand in a single assignment with names and lab section numbers for both students on the cover page. Names should be complete and spelled correctly. If you as an individual are making the cover page, please get the information you need from your partner. For partners who are not both in the same lab section, please hand in the assignment to the collection box for the student whose last name comes ﬁrst in alphabetical order.

1.2 Due Dates The Due Date for this assignment is 2:30pm, Tuesday, March 24. The Late Due Date is 2:30pm, Tuesday, March 25. The penalty for handing in an assignment after the Due Date but before the Late Due Date is 3 marks. In other words, X/Y becomes (X–3)/Y if the assignment is late. There will be no credit for assignments turned in after the Late Due Date; they will be returned unmarked.

1.3 Marking scheme A 5 marks B 2 marks C 4 marks total 11 marks

1.4 How to package and hand in your assignments Please see the Lab 1 instructions. ENCM 369 Winter 2014 Lab 9 page 2 of 9

Figure 1: For Exercise A, diagram to clarify which are the “A” and “B” inputs of the ALU, and to highlight the multiplexers added for forwarding in the Execute stage in textbook Figure 7.50. See the textbook ﬁgure for full details, including the inputs to the Hazard Unit.

CLK

ALUSrcE ALUControlE 3 00 SrcAE 01 A 10

00 0 ALU SrcBE 01 B 10 1 SignImmE WriteDataE 22 ALUOutM D/E pipeline register ResultW

ForwardBEForwardAE Hazard Unit

2 Exercise A: Forwarding details 2.1 Read This First The point of this exercise is to help you understand the forwarding logic presented in Figure 7.50 in your textbook. Figure1 shows the outputs of the Hazard Unit, and the multiplexers that guide the correct source data into the ALU.

2.2 What to Do, Part I There are four data hazards in the following instruction sequence:

sw $0, 0($29) # line 1 sw $0, 4($29) # line 2 sw $0, 8($29) # line 3 add $8, $16, $17 # line 4 sub $9, $18, $8 # line 5 lw $10, 0($9) # line 6 add $19, $19, $9 # line 7 or $20, $10, $20 # line 8 The ﬁrst hazard is the use of the ﬁrst add result as a source in the sub instruction of line 5. Here is a detailed description of how this hazard is managed by the forwarding unit:

During the Execute stage of sub, the Hazard Unit detects that RsE (01000two for $8) matches WriteRegM (also 01000two for $8) and that ENCM 369 Winter 2014 Lab 9 page 3 of 9

RegWriteM=1. So it sets ForwardBE=10 so that ALUOutM (the add result) is passed to the “B” input of the ALU. Identify the other three data hazards. For each hazard, give details of how the Hazard Unit solves the hazard, using the above example as a model.

2.3 What to Do, Part II There is an obvious data hazard in the following instruction sequence: lw $10, ($8) sw $10, ($9) This kind of code is very common when data is being copied from one memory location to another memory location. Consider the circuit of Figure 7.50 in the textbook. For the above instruction sequence, does the circuit • fail to handle the hazard properly, storing an out-of-date $10 value; • handle the hazard properly using only forwarding; • or handle the hazard properly using a combination of stalling and forwarding? Give a brief explanation for your answer—just a few short sentences.

2.4 What to Hand In Hand in your answers to Parts I and II.

3 Exercise B: Avoiding branch instructions 3.1 Read This First One of the conclusions that can be reached from reading the material titled “Solving Control Hazards” starting on page 421 of the textbook is that branch instructions are potentially very expensive. If there is no branch prediction, or if branch prediction is present but incorrect, a branch instruction may cause a stall of several clock cycles. As a result, modern compilers try to avoid generating branch instructions when possible. Here is an example C code fragment from lectures on solutions to control hazards: /* Count negative elements in an array of ints. */ do { if (*p < 0) count++; p++; } while (p != past_end); MIPS code I thought of for this fragment, aimed at the real MIPS instruction set with delayed branches, is L1: lw $t0, ($a0) slt $t1, $t0, $zero beq $t1, $zero, L2 addiu $a0, $a0, 4 addiu $t9, $t9, 1 L2: bne $a0, $t8, L1 nop ENCM 369 Winter 2014 Lab 9 page 4 of 9

However, when I gave the C code to gcc 3.4.4 for MIPS, it cleverly came up with code like this:

L1: lw $t0, ($a0) addiu $a0, $a0, 4 slt $t1, $t0, $zero L2: bne $a0, $t8, L1 addiu $t9, $t9, $t1 # add slt result to count Notice that the loop is reduced from seven instructions to ﬁve, and there is no possibility of a stall around the decision on *p < 0. Note also that the update to count is in the delay slot of bne, so the update is part of the loop, and that p++ got moved to avoid a stall due to the data hazard of loading to $t0 and then using $t0 as source in slt.

3.2 Read This Second Modern instruction sets oﬀer various ways to do some things conditionally without using a branch instruction. For example, MIPS has “conditional move” instructions called movz and movn: movz $t0, $t1, $t2 # if ($t2 == 0) $t0 = $t1 movn $s0, $a0, $a1 # if ($a1 != 0) $s0 = $a0 The operands can be any GPRs—the ones used above are just examples.

3.3 What to Do Consider this C code:

do { if (*p < min) sum += min; else sum += *p; p++; } while (p != past_last); Using ideas from “Read This First” and “Read This Second”, show how this loop can be coded in the real MIPS instruction set without using a branch instruction for the if-else statement, and without any nop instructions. Assume that p and past_last are of type pointer-to-int in $s0 and $s1, and that min and sum are ints in $s2 and $3. This is a pencil-and-paper exercise—you do not need to make it work in MARS.

3.4 What to Hand In Hand in your code.

4 Exercise C: Little-endian and big-endian memory organization 4.1 Read This First Endianness is an issue related to memory systems that was introduced in ENCM 369 along with the lb, lbu, and sb instructions, but has not been fully explored. ENCM 369 Winter 2014 Lab 9 page 5 of 9

Unlike caches and virtual memory—topics that we are about to study in detail in lectures, labs, and tutorials—endianness is not a deep design concept that has a major effect on program performance. Instead it is an annoying but important practical detail. Endianness refers to the organization of bytes within a memory word. Before stating exactly what endianness is, I will review some points about bit numbering within bytes and words, and about addresses of bytes and words in memory. A byte is an 8-bit chunk of data. (To be precise, eight bits is the near-universal size of a byte these days; some older computers had bytes of different sizes.) Indexing of bits within a byte goes from 0 for the least significant bit to 7 for the most significant bit. A word is a collection of bits that is typically larger than a byte. On MIPS systems, a word has 32 bits, but on other systems a word might have some other number of bits. Numbering of bits within an n-bit word goes from 0 for the least significant bit to n−1 for the most significant bit. (These bit-numbering conventions are near-universal, and are used throughout your textbook and lecture notes.) On MIPS and most other machines with 8-bit bytes and 32-bit words, memory can be accessed as a sequence of words or as a sequence of bytes (or as sequence of 16-bit halfwords, but this course pretty much ignores the existence of halfwords). Each memory word can also be accessed as a collection of four bytes. For example, the word at address 0x7fff_ff24 can also be viewed as four bytes at addresses 0x7fff_ff24, 0x7fff_ff25, 0x7fff_ff26, and 0x7fff_ff27. The issue of endianness is this: If a processor accesses a byte within a memory word, which bits within that word are being accessed? For example, if a MIPS processor stores a byte at address 0x7fff_ff24, how does that affect the word at 0x7fff_ff24? Obviously eight bits of the word are accessed, but which eight bits? Bits 31–24 of the word or bits 7–0 of the word? Or some other set of eight bits within the word? It turns out there are two main schemes in use for ordering of bytes within words: little-endian and big-endian. These are illustrated in Figure2. When comparing the little-endian and big-endian schemes, pay close attention to the difference in how byte offsets are set up. Consider the following sequence of MIPS instructions, and assume that $a0 points to a word in the data segment of a program:

ori $t0, $zero, 0x9 sb $t0, 0($a0) ori $t0, $zero, 0xa sb $t0, 1($a0) ori $t0, $zero, 0xb sb $t0, 2($a0) ori $t0, $zero, 0xc sb $t0, 3($a0) lw $s0, 0($a0) On a little-endian machine, the above sequence would put 0x0c0b_0a09 in $s0, but on a big-endian machine, the sequence would put 0x090a_0b0c in $s0. Here are some notes about endianness on various computer systems:

• x86 and x86-64 machines are little-endian. • Machines that used the Motorola 68000 series of processors were big-endian. • Apple Macintosh computers with PowerPC processors are big-endian. Newer Macs with Intel processors are little-endian. (Apple switched to from Pow- erPC to Intel in 2006, so there are likely relatively few PowerPC Macs in active use today.) ENCM 369 Winter 2014 Lab 9 page 6 of 9

Figure 2: Little-endian and big-endian organization of bytes within 32-bit words.

Little-endian byte oﬀsets within a 32-bit word numbers for bits within the word

31 ... 24 23 ...... 16... 15 870

7 ... 0 7 ...... 0 7 . 070 +2 +1 +0+3 address oﬀsets of bytes numbers for bits within bytes

Big-endian byte oﬀsets within a 32-bit word numbers for bits within the word

31 ... 24 23 ...... 16... 15 870

7 ... 0 7 ...... 0 7 . 070 +1 +2 +3+0 address oﬀsets of bytes numbers for bits within bytes

• The original MIPS processors were big-endian.

• MARS is little-endian. • Modern MIPS processors can work in either big-endian or little-endian mode.

4.2 What to do Assume that $a0 points to a word in the data segment of a little-endian MIPS- like computer. What will be in $s0, $s1, $s2, and $s3 just after the following instructions have been executed? (Pay close attention to the oﬀsets in all the instructions, and if you can’t remember what lb puts in bits 31–8 of its destination, look it up before doing the exercise.)

lui $t0, 0x2255 ori $t0, $t0, 0xaa99 sw $t0, 0($a0) lb $s0, 3($a0) lb $s1, 1($a0) lb $s2, 0($a0) sw $zero, 0($a0) sb $t0, 2($a0) ori $t1, $zero, 0xcc sb $t1, 0($a0) lw $s3, 0($a0) Repeat the exercise, assuming this time that the machine is big-endian. ENCM 369 Winter 2014 Lab 9 page 7 of 9

4.3 What to Hand In Hand in your answers. Be clear about which answers are little-endian and which are big-endian.

5 Optional Exercise D: Endianness and binary ﬁles

Do this exercise if you would like to get some more practice with endianness and also learn about the difference between text files and binary files.

5.1 Read This First One area where endianness is of major concern to applications programmers is a situation where a binary file is written by one computer system and read by another. Before introducing binary file operations, let’s very briefly review text file operations. Consider a C program using fprintf to write an int to a text file. Suppose that the character set in use is ASCII, that fp is of type FILE* and has been opened for output, and that i is an int. Consider this code fragment: i = 12345; fprintf(fp, "%d\n", 12345);

The fprintf function converts the computer’s internal representation of 12345 to a sequence of bytes: the ASCII code for ’1’, the ASCII code for ’2’, and so on. These five bytes followed by the code for ’\n’ are written to the file. So the effect of the function call is to put this sequence of bytes in the file:

00110001 00110010 00110011 00110100 00110101 00001010 (Actually, the above is what you would get on a Linux system. On Microsoft Windows you would get

00110001 00110010 00110011 00110100 00110101 00001101 00001010 because Windows uses a two-byte sequence to terminate lines in text files. But this exercise is mostly about binary files, not text files.) Unlike a text file, a binary file is not necessarily a sequence of character codes. The C view of a binary file is a sequence of bytes that might be character codes, pieces of instructions, pieces of integers or pieces of floating point numbers, or that might have some other meaning. The key C library functions for binary file input and output are fread and fwrite. The following code demonstrates writing the bit pattern for an int to a binary file. Again suppose that fp is of type FILE* and has been opened for input:

i = 12345; fwrite((void *) &i, sizeof(int), 1, fp); ENCM 369 Winter 2014 Lab 9 page 8 of 9

The arguments are: the address of the first byte to be copied to the file; the size, in bytes, of the data items to be copied to the file; the number of data items to be written to the file; fp, which specifies the file. The 32-bit representation of 12345 is

0000_0000_0000_0000_0011_0000_0011_1001 fwrite simply copies a sequence of bytes from memory to a ﬁle. So on a 32-bit big-endian machine, the call to fwrite will put the following sequence of bytes in the ﬁle:

00000000 00000000 00110000 00111001 But on a little-endian machine the bytes would be written in this order:

00111001 00110000 00000000 00000000 Note that in both cases the bytes are totally different from what is produced by the call to fprintf. The function to read from a binary file is fread. It should be clear that endianness can cause problems if a binary file is written using fwrite on a machine with one endianness and read using fread on a machine with the opposite endianness.

5.2 What to do, Part I Make a copy of the directory encm369w14lab09/exD There are two C programs in the directory. The program binwrite.c generates a small binary file with the following format: the first four bytes are an unsigned int giving the number of array elements stored in the file, and the remaining bytes are elements from an array of unsigned ints, four bytes per element. The program binread.c reads a file in the same format and displays the number of elements and array contents. Read the two programs carefully to get an idea of how they work. There are two data files in the directory: x86_binwrite_out.dat is an output file produced by binwrite.c on a (little-endian) x86-based Linux system; ppc_binwrite_out.dat is an output file produced by binwrite.c on a (big-endian) Apple iBook (with a big-endian PowerPC processor) running Mac OS X. On Linux, the hexdump tool can be useful for examining the contents of binary files. It takes various options for displaying file contents in various ways; with the -C option, each line of output shows 16 bytes from the input file as two-digit hexadecimal numbers, along with ASCII interpretations where possible. Enter the command

hexdump -C ppc_binwrite_out.dat and then the command

hexdump -C x86_binwrite_out.dat What you see should help you understand the eﬀect of endianness on the sequences of bytes in the two ﬁles. ENCM 369 Winter 2014 Lab 9 page 9 of 9

5.3 What to do, Part II Build an executable from binread.c. Run it using x86_binwrite_out.dat as the input file; then run it with ppc_binwrite_out.dat as the input file. The difference in behaviour will be obvious. Make a copy of binread.c and modify it so that it can correctly read the data in ppc_binwrite_out.dat. Do not modify the calls to fread; instead use operations such as shifts (C << and >> operators) and bitwise and’s and or’s (C & and | operators) to rearrange the bytes within variables.

5.4 What to Hand In Nothing. A solution will be posted on the course website during the week of March 24.

5.5 Closing Remark Storing numbers in binary files is somewhat more efficient than using text files, because binary files are smaller and because with binary files no time is spent converting base two numbers to sequences of base ten digits and vice versa. However, endianness is only one of the many issues that can arise when reading and writing binary files on different platforms. Two of the many other issues are variations in the size of integer types (16-bit, 32-bit, or 64-bit?) and old, non- standard formats for floating-point numbers.