ENCM 501 W14 Slides for Lecture 6 slide 2/33

Slides for Lecture 6
ENCM 501: Principles of Computer Architecture
Winter 2014 Term

Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
28 January, 2014

Previous Lecture

- introduction to ISA design ideas
- memory-register and load-store architectures
- a very brief history of RISC versus CISC
- aspects of the ISA view of memory: flat address spaces, alignment rules

Today’s Lecture

- endianness
- addressing modes
- examples of tradeoffs in instruction set design

Related reading in Hennessy & Patterson: Sections A.3–A.7

Endianness

This is not really an aspect of computer design in which there are interesting cost or performance tradeoffs. Rather, it’s an annoying detail that will occasionally bite you if you aren’t aware of it.

Registers inside cores do not have endianness. An N-bit register just has bits N−1 (MSB), N−2, ..., 2, 1, 0 (LSB).

Endianness is a property of the interface between the processor core and the memory, and comes from the fact that most ISAs allow memory reads and writes with various sizes, typically 1-byte, 2-byte, 4-byte, and 8-byte.

Endianness in 64-bit MIPS doublewords

The byte offset gives the address of an individual byte relative to the address of the entire doubleword.

Bit numbering: 63 is MSB, 0 is LSB

    bits                        63-56  55-48  47-40  39-32  31-24  23-16  15-8   7-0
    LITTLE-endian byte offset    +7     +6     +5     +4     +3     +2     +1     +0
    BIG-endian byte offset       +0     +1     +2     +3     +4     +5     +6     +7

Endianness in 32-bit MIPS words

The byte offset gives the address of an individual byte relative to the address of the entire word.

Bit numbering: 31 is MSB, 0 is LSB

    bits                        31-24  23-16  15-8   7-0
    LITTLE-endian byte offset    +3     +2     +1     +0
    BIG-endian byte offset       +0     +1     +2     +3

Example effect of endianness in MIPS32

Assume that R8 contains some valid address that is a multiple of four.

    # LI: pseudoinstruction
    # for "load immediate"
    LI R9, 0x12345678
    SW R9, 0(R8)
    LB R10, 0(R8)
    LB R11, 1(R8)
    LB R12, 2(R8)
    LB R13, 3(R8)

What goes into R10, R11, R12, R13, if the processor chip is in little-endian mode?

What if the processor chip is in big-endian mode?

Why is endianness a practical concern?

Practical code rarely (if ever) writes data as a word and later reads it back as bytes, as was done in the example on the last slide.

Here is a practical problem:

- Program P1 on Computer C1 copies an array of integers or FP numbers from memory into a file using a function like fwrite in the C library.
- On disk, the file is just a long sequence of bytes.
- Program P2 on Computer C2 opens the file and tries to read the array of numbers from the file into memory using a function like fread in the C library.
- But C2 does not have the same endianness as C1, so the data does not make sense to P2.

The same kind of problem can happen when streaming multi-byte numbers over a network.
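Both the MIPS32 question and the file-exchange problem can be explored in C. The following is only a sketch: the helper names (first_byte_in_memory, store_be32, load_be32) are made up for this illustration and do not come from the lecture. The first function shows which byte of 0x12345678 the host places at the lowest address. The other two write and read a 32-bit value in an explicitly chosen big-endian byte order, which is one common way to make file or network data independent of the endianness of the machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 32-bit word that
 * holds 0x12345678. This mirrors SW followed by LB with offset 0 in the
 * MIPS32 example: a little-endian host yields 0x78, a big-endian host
 * yields 0x12. */
uint8_t first_byte_in_memory(void)
{
    uint32_t word = 0x12345678u;
    uint8_t bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);   /* view the word as raw bytes */
    return bytes[0];
}

/* Hypothetical helper: store v into buf in big-endian order,
 * no matter what the host endianness is. */
void store_be32(uint8_t *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(v >> 24);   /* most significant byte first */
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the value from four big-endian bytes. */
uint32_t load_be32(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

If P1 wrote each number with something like store_be32 and P2 read it back with load_be32, the bytes in the file would mean the same thing on C1 and C2. Passing raw arrays directly to fwrite and fread offers no such guarantee.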

Endianness and real systems

Today little-endianness is much more common than big-endianness. Here are some little-endian systems:

- anything running on x86 or x86-64;
- Apple iOS, Linux (including Android), and Windows running on ARM.

Some historically important big-endian machines were:

- Macs with 68000- or PowerPC-based processors;
- 68000- and SPARC-based computers from Sun Microsystems.

Many modern ISA families, for example, MIPS and ARM, allow the processor to switch back and forth between little- and big-endian modes.

Addressing modes

Unlike endianness, selection of addressing modes for an ISA is a set of design decisions that involve interesting tradeoffs.

“Addressing mode” is a slightly misleading term, because it refers to the way in which an operand is accessed by an instruction, and that might or might not involve generation of a memory address.

Addressing modes for data access are discussed as part of Section A.3 in the textbook.

Addressing modes for instruction access (needed, for example, by branches and jumps) are discussed in Section A.6.

Examples of addressing modes for data

Figure A.6 in the textbook gives examples covering most addressing modes available in ISAs of the present and the recent past.

A typical ISA will support some but not all of these addressing modes. (Historical note: I think the MC68000 series supported all of them and more, which is kind of awesome.)

This lecture won’t explain every addressing mode in detail, but instead will look at the ones that are most common and important. Let’s start with the two modes that don’t involve generation of a memory address ...

Addressing modes: Register and Immediate

Register: Data is coming from or going to a register. All three operands are accessed in register mode in this MIPS64 instruction:

    DADDU R10, R8, R9

Immediate: Source data is a constant written into the instruction. Here is a MIPS64 example in which two operands are register-mode and one is immediate-mode:

    DADDIU R16, R16, 8

Encoding of immediate operands in example ISAs

x86-64: Instruction size is variable, so 1, 2, 4, or 8 bytes are used, as necessary, to describe the constant.

MIPS32 and MIPS64: Instructions are always 32 bits wide and the field size for immediate operands is always 16 bits wide. The range of constants is −32768 to +32767 for instructions that use signed constants and 0 to 65535 for those that use unsigned constants.

ARM: 12 bits within the fixed instruction size of 32 bits are used for an immediate operand, in a complicated and interesting way that could totally derail a lecture! (That’s one of a few very good reasons why it would not be easy to switch from MIPS to ARM in ENCM 369.)

The two simplest addressing modes for memory access

Hint for comprehension: Roughly speaking, indirect means “via a pointer”.

Register indirect: Use the bits in a register as a memory address. MIPS64 example:

    LD R8, (R9)       # R8 = doubleword at address in R9

Displacement: Add a constant to the bits in a register to generate a memory address. MIPS64 example:

    # R10 = doubleword at address R11 + 64 bytes
    LD R10, 64(R11)

Why is register indirect mode really just a special case of displacement mode?
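As a rough illustration of displacement mode, the address calculation can be modelled in a few lines of C. The function names here are invented for this sketch, and the 16-bit signed range check assumes the MIPS immediate field size mentioned earlier. Calling the first function with a displacement of 0 gives register indirect behaviour, which hints at the answer to the question about the two modes.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for displacement mode: the bits in a base register
 * plus a constant. The constant is assumed to fit the 16-bit signed
 * immediate field used by MIPS loads and stores. */
uint64_t displacement_address(uint64_t base_reg, int32_t displacement)
{
    assert(displacement >= -32768 && displacement <= 32767);
    /* Sign-extend to 64 bits, then add modulo 2^64. */
    return base_reg + (uint64_t)(int64_t)displacement;
}

/* Register indirect mode modelled as displacement mode with constant 0. */
uint64_t register_indirect_address(uint64_t base_reg)
{
    return displacement_address(base_reg, 0);
}
```

With a base register value of 0x1000, a displacement of 64 gives address 0x1040, and a displacement of 0 gives 0x1000, the same address register indirect mode would use.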

Scaled mode: Good for array element access

Here is some x86-64 code you will look at in Assignment 2 ...

    .L16:
        mov (%rbx,%rax,4), %edx
        addq $1, %rax
        addq %rdx, %rbp
        cmpq $500000000, %rax
        jne .L16

The mov instruction uses scaled mode: The address used to read memory is

    %rbx + 4 × %rax

%rbx is the address of element 0 of an array of 4-byte elements, and %rax is an index into that array.

Autoincrement and autodecrement modes (1)

Other names for these modes are post-increment and pre-decrement.

In either of these modes a load causes two register updates: one to a destination register, and another to a pointer register. A store also causes two updates: one update to a memory location and another to a pointer register.

Both are useful for walking through arrays using pointer arithmetic.

A store using pre-decrement mode is an efficient way to push a register value on to a stack.

And a load using post-increment mode is an efficient way to pop a register value from a stack.
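The C sketch below shows source-level patterns that correspond to these modes. It is an illustration only: the function names are hypothetical, the element type uint32_t is chosen to match the 4-byte elements in the x86-64 loop, and whether a compiler really emits scaled, post-increment, or pre-decrement instructions depends on the ISA and the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed access: the address of base[i] is base + 4 * i, which is
 * exactly what scaled mode computes in one instruction. */
uint64_t sum_indexed(const uint32_t *base, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += base[i];
    return sum;
}

/* Pointer walk: *p++ loads through the pointer and then advances it,
 * the same two updates a post-increment load performs. */
uint64_t sum_pointer(const uint32_t *p, size_t n)
{
    const uint32_t *end = p + n;
    uint64_t sum = 0;
    while (p != end)
        sum += *p++;
    return sum;
}

/* Push on a stack that grows toward lower addresses:
 * decrement the stack pointer first, then store (pre-decrement). */
void push(uint32_t **sp, uint32_t value)
{
    assert(sp != NULL && *sp != NULL);
    *--(*sp) = value;
}

/* Pop: load first, then increment the stack pointer (post-increment). */
uint32_t pop(uint32_t **sp)
{
    assert(sp != NULL && *sp != NULL);
    return *(*sp)++;
}
```

In this sketch the stack pointer starts one element past the end of the stack storage, so the first push lands in the last element. Neither push nor pop checks for overflow or underflow.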

Autoincrement and autodecrement modes (2)

These modes closely match some famously tricky C and C++ expressions.

Let’s write a couple of C statements that could each be implemented using a single instruction if autoincrement and autodecrement modes are available.

Memory indirect mode

Example, using syntax from textbook Figure A.6:

    MOV R0, @(R1)

The address in R1 is used to read a second address from memory. That second address is used to read from memory into R0. In a typical load/store architecture this would be done with two instructions: a load followed by another load.

Another example, using the same syntax:

    MOV @(R2), R3

The address in R2 is used to read a second address from memory. That second address is used to write the data from R3 to memory. In a typical load/store architecture this would be done with two instructions: a load followed by a store.

This mode is somewhat obsolete these days, but thinking about it helps to understand pointer-to-pointer types in C and C++.

MIPS instruction format for loads and stores

Just about all MIPS32 and MIPS64 load and store instructions are organized like this:

    bits     31-26    25-21   20-16   15-0
    field    opcode   base    rt      offset

There are various different opcodes for loads and stores of various sizes of data. The address is formed by adding the sign-extension of the 16-bit offset and the address in GPR base. rt is the source register for a store and the destination register for a load.

The addressing mode for memory is displacement.

What are some advantages and disadvantages of offering only displacement mode for loads and stores?

What limits the number of GPRs (or FPRs) available to an ISA?

The limit is not due to the chip area dedicated to registers!

For example, MIPS64 has 32 64-bit GPRs, which is a larger than typical number of GPRs for current ISAs. MIPS64 requires an array of 32 × 64 one-bit cells, that is, 2^11 = 2048 bits, or 256 bytes.

Currently, L1 caches are 32 kB or larger, which is much, much bigger than 256 bytes.

So why are ISAs with large numbers of GPRs (say, 64, or 256, or 1024) quite uncommon?
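To make the MIPS load and store format concrete, here is a small C sketch that pulls the four fields out of a 32-bit instruction word and forms the displacement-mode address. The struct and function names are assumptions made for this example, and 32-bit addresses (MIPS32) are assumed to keep the code short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned opcode;  /* bits 31-26: selects which load or store */
    unsigned base;    /* bits 25-21: GPR holding the base address */
    unsigned rt;      /* bits 20-16: data source (store) or destination (load) */
    int32_t  offset;  /* bits 15-0, sign-extended to 32 bits */
} MipsLoadStore;

/* Split an instruction word into the four fields of the format. */
MipsLoadStore decode_load_store(uint32_t instr)
{
    MipsLoadStore f;
    f.opcode = (instr >> 26) & 0x3Fu;
    f.base   = (instr >> 21) & 0x1Fu;
    f.rt     = (instr >> 16) & 0x1Fu;
    f.offset = (int32_t)(instr & 0xFFFFu);
    if (f.offset & 0x8000)        /* bit 15 set: the offset is negative */
        f.offset -= 0x10000;      /* sign-extend by subtracting 2^16 */
    return f;
}

/* Address = contents of GPR base + sign-extended offset (modulo 2^32). */
uint32_t effective_address(const uint32_t *gpr, uint32_t instr)
{
    assert(gpr != NULL);
    MipsLoadStore f = decode_load_store(instr);
    return gpr[f.base] + (uint32_t)f.offset;
}
```

For instance, the word 0x8D6A0040 has opcode 0x23 (LW), base 11, rt 10, and offset 64, so with 0x1000 in R11 the load reads from address 0x1040. Changing the low 16 bits to 0xFFFC gives an offset of −4 and address 0xFFC.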

Load and store word examples in ARM7TDMI

Here is one of many formats for instructions to load or store 32-bit words:

    bits 31-28   cond
    bits 27-24   0111
    bit  23      (constant used in the address computation)
    bits 22-20   001
    bits 19-16   Rn
    bits 15-12   Rd
    bits 11-5    (constants used in the address computation)
    bit  4       0
    bits 3-0     Rm

The above pattern is for load. Change bit 20 to 0 for store.

Rd gives the destination GPR for load, and source GPR for store. The memory address is computed using two GPRs, Rn and Rm, plus, in a complicated way, constants encoded in bits 23 and 11–5.

Essentially, this particular format allows numerous variations of scaled addressing mode.

Note: Every ARM instruction starts with a 4-bit cond field. We’ll get to that soon.

Warning: The details are quite complex, so I possibly have some of them wrong! Mistakes or not, the contrast with MIPS is striking.

Various other ARM load and store formats allow every addressing mode in textbook Figure A.6 (except memory indirect) and some interesting combinations of those modes.

What advantages are there to the huge variety of ARM load and store formats, compared to the distinct lack of variety in MIPS load and store formats? What disadvantages might there be?

Instructions for control flow

As discussed in textbook Section A.6, this category includes

- conditional branches
- jumps
- procedure calls
- procedure returns

In general, these are instructions that might (conditional branch) or will (the others) cause a special update to the PC (program counter register).

Target instructions and target addresses

A useful term related to control flow is target instruction, which is

- in the case of conditional branch, the first instruction executed after a branch is taken (a branch is taken or not taken depending on whether some condition is true);
- in the cases of jumps, calls, and returns, the first instruction executed as a result of a jump, call, or return instruction.

The target address is simply the address of the target instruction.

Addressing modes for control flow instructions

Addressing modes for control flow instructions are essentially just methods for generating target addresses.

For branches, jumps, and calls, the most common addressing mode is PC-relative, in which an offset is extracted from the instruction and added to the current PC value.

In MIPS and ARM the offsets in PC-relative instructions are numbers of instructions, but in x86 and x86-64 the offset is a number of bytes. Why is there a difference here?

Why would PC-relative addressing not work in procedure return instructions?

Conditional branch options

Most ISAs make branch decisions based on a few bits called flag bits or condition code bits that sit within some kind of processor status register.

Let’s look at this for a simple C example, in which i and k are int variables in registers:

    if (i < k) goto L1;

x86-64 translation, assuming i in %eax, k in %edx:

    cmpl %edx, %eax   # compare registers
    jl L1             # branch based on N and V flags (SF and OF in x86 terminology)

jl means “jump if less than.”

(Note: In reality the assembly language label almost certainly won’t be the same as the C label L1.)
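To see why a signed “less than” branch depends on both the N and V flags, it can help to compute the flags by hand. The C sketch below is a model made up for this purpose, not a description of any particular processor: the Flags struct and function names are hypothetical, and only the N, Z, and V flags are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n;  /* result is negative (sign bit set) */
    bool z;  /* result is zero */
    bool v;  /* signed overflow occurred */
} Flags;

/* Flags produced by computing a - b, as a compare instruction does. */
Flags compare_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;               /* wraps modulo 2^32 */
    Flags f;
    f.n = (diff >> 31) != 0;
    f.z = (diff == 0);
    /* Overflow: operands have different signs and the sign of the
     * result differs from the sign of a. */
    f.v = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

/* Signed a < b holds exactly when N and V differ. */
bool signed_less_than(Flags f)
{
    return f.n != f.v;
}

/* Self-check: the flag-based answer must agree with the C < operator. */
void check_against_c(int32_t a, int32_t b)
{
    assert(signed_less_than(compare_flags(a, b)) == (a < b));
}
```

Looking at N alone is not enough: comparing INT32_MIN with 1 produces a positive-looking difference because of overflow, and the V flag is what corrects the decision.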

For the same C code, here is an ARM translation, assuming i in r0, k in r1:

    CMP r0, r1        ; compare registers
    BLT L1            ; branch based on N and V flags

MIPS is unusual: the comparison result goes into a GPR. Suppose we have i in R4, k in R5 ...

    SLT R8, R4, R5    # R8 = (R4 < R5)
    BNE R8, R0, L1    # branch if R8 != 0

Conditional instructions in ARM

Recall from Assignment 1 that MIPS offers the conditional move instructions MOVN and MOVZ. (MIPS also has some similar floating-point conditional move instructions.)

ARM takes this idea to the extreme: every ARM instruction is conditional! Bits 31–28 of an ARM instruction are the so-called cond field, which specifies that the instruction either performs some action or is a no-op, depending on some condition on zero or more of the N, Z, V and C flags.

Example ARM cond field patterns:

- 1110, for ALWAYS. The instruction is never a no-op. This is the default cond field in ARM assembly language.
- 0000, for EQUAL. Execute the instruction if and only if the Z flag is 1.
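The cond field check can be sketched in C as follows. This is a simplified model with hypothetical names. It handles the two patterns listed above plus three more (0001 for NE, 1010 for GE, 1011 for LT) that come from the ARM architecture's condition code table, not from the lecture. The remaining patterns are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v;   /* the four ARM condition flags */
} Flags;

/* Extract the cond field, bits 31-28 of an ARM instruction word. */
unsigned cond_field(uint32_t instr)
{
    return instr >> 28;
}

/* Returns true if an instruction with this cond field performs its
 * action, false if it behaves as a no-op. Only five of the sixteen
 * patterns are modelled in this sketch. */
bool cond_passes(unsigned cond, Flags f)
{
    switch (cond) {
    case 0x0: return f.z;           /* 0000 EQ: Z flag is 1 */
    case 0x1: return !f.z;          /* 0001 NE: Z flag is 0 */
    case 0xA: return f.n == f.v;    /* 1010 GE: signed >= */
    case 0xB: return f.n != f.v;    /* 1011 LT: signed <, as used by BLT */
    case 0xE: return true;          /* 1110 AL: always */
    default:
        assert(!"cond pattern not modelled in this sketch");
        return false;
    }
}
```

A core model would call something like cond_passes before carrying out each instruction and skip the instruction when the result is false.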

The power of ARM conditional instructions is illustrated by this example ...

Here is some C code:

    if (i == 33 || i == 63) count++;

If i and count are ints in ARM registers r0 and r1, here is ARM assembly language for the C code:

    TEQ r0, #33       ; # indicates immediate mode
    TEQNE r0, #63
    ADDEQ r1, r1, #1

The cond field for the first instruction is 1110, for “always”. For the second instruction, it’s 0001, for “do it only if the Z flag is 0”, and for the third, it’s 0000, for “do it only if the Z flag is 1”.

Acknowledgment: Example on previous slide adapted from an example on pages 129–130 of Hohl, W., ARM Assembly Language: Fundamentals and Techniques, © 2009, ARM (UK), published by CRC Press.

MIPS versus ARM: Vague arguments

    CPU time = IC × CPI × clock period

MIPS attacks CPI by making instructions very simple and easy to pipeline.

ARM tries to be close to MIPS with respect to CPI, and is much better than older CISC ISAs for CPI. ARM attacks IC by doing things in one instruction that might sometimes take two or three MIPS instructions.

MIPS versus ARM: How to be quantitative

A fair and thorough study would require at least:

- real applications that are reasonably good fits for both ISAs;
- the best possible compilers for each of the ISAs;
- processors fabricated with the same transistor and interconnect technology, and very similar die sizes.

Even then, it might not be a truly fair fight between ISAs, if one side has better digital designers than the other.
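The CPU time relation used in the vague arguments can be turned into a tiny calculator, which makes it easy to see how a gain in IC can be cancelled by a loss in CPI. The function names are assumptions made for this sketch, and the numbers in the note after the code are invented purely for illustration. They are not measurements of any MIPS or ARM processor.

```c
#include <assert.h>

/* CPU time = IC x CPI x clock period.
 * instruction_count: dynamic instruction count (IC)
 * cpi:               average clock cycles per instruction
 * clock_period_s:    clock period in seconds */
double cpu_time_seconds(double instruction_count, double cpi,
                        double clock_period_s)
{
    assert(instruction_count >= 0.0 && cpi > 0.0 && clock_period_s > 0.0);
    return instruction_count * cpi * clock_period_s;
}

/* How many times faster the new design is than the old one. */
double speedup(double time_old_s, double time_new_s)
{
    assert(time_new_s > 0.0);
    return time_old_s / time_new_s;
}
```

With made-up numbers: a design that runs 1.0e9 instructions at a CPI of 1.2 and a design that runs 0.8e9 instructions at a CPI of 1.5, both with a 1 ns clock period, each take about 1.2 seconds. Cutting IC by 20% buys nothing if CPI rises by 25%, which is why a quantitative comparison needs all three factors.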

Upcoming Topics

- The memory hierarchy

Related reading in Hennessy & Patterson: Appendix B