Chapter 2 Instruction Set Principles and Examples

EEF011 Computer Architecture
吳俊興, Department of Computer Science and Information Engineering, National University of Kaohsiung
October 2004

Outline of Chapter 2
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Addressing Modes for Signal Processing
2.5 Type and Size of Operands
2.6 Operands for Media and Signal Processing
2.7 Operations in the Instruction Set
2.8 Operations for Media and Signal Processing
2.9 Instructions for Control Flow
2.10 Encoding an Instruction Set

2.1 Introduction
Instruction Set Architecture (ISA)
– The portion of the machine visible to the assembly-level programmer or to the compiler writer
– To use the hardware of a computer, we must speak its language
– The words of a computer language are called instructions, and its vocabulary is called an instruction set

(Figure: the instruction set is the interface between software and hardware. A fragment of x86 code, one instruction per row:)
Instr. #   Operation + Operands
i          movl  -4(%ebp), %eax
i+1        addl  %eax, (%edx)
i+2        cmpl  8(%ebp), %eax
i+3        jl    L5
  :
L5:

Topics
1. A taxonomy of instruction set alternatives and a qualitative assessment
2. Quantitative measurements of instruction sets
3. Specific instruction set architectures
4. Issues and bearings of languages and compilers
5. Examples: MIPS and the Trimedia TM32 CPU
Appendices C-F: MIPS, PowerPC, Precision Architecture, SPARC, ARM, Hitachi SH, MIPS16, Thumb; 80x86 (App. D); IBM 360/370 (App. E); VAX (App. F)

2.2 Classifying Instruction Set Architectures
Design choices and the questions each raises:
• Operand storage in the CPU: Where are operands kept other than in memory?
• Number of explicit operands named per instruction: How many? (Minimum, maximum, average)
• Addressing mode: How is the effective address for an operand calculated? Can all operands use any mode?
• Operations: What are the options for the opcode?
• Type and size of operands: How is typing done? How is the size specified?
These choices critically affect the number of instructions, CPI, and CPU cycle time.

ISA Classification
• The most basic differentiation: the type of internal storage in a processor
  – Operands may be named explicitly or implicitly
• Major choices:
  1. In an accumulator architecture, one operand is implicitly the accumulator (similar to a calculator)
  2. In a stack architecture, the operands are implicitly on the top of the stack
  3. General-purpose register architectures have only explicit operands: either registers or memory locations

Basic ISA Classes
• Stack (B5500, B6500, HP 3000/70): 0 explicit operands per ALU instruction; result destination: stack; access method: push and pop
• Accumulator (Motorola 6809 and ancient ones): 1 explicit operand per ALU instruction; result destination: accumulator; access method: Acc = Acc + mem[A]
• Register set (IBM 360, DEC VAX, and all modern micros): 2 or 3 explicit operands per ALU instruction; result destination: registers or memory; access method: Rx = Ry + mem[A], Rx = Rx + Ry (2 operands), or Rx = Ry + Rz (3 operands)
Register-register, register-memory, and memory-memory (gone) options

Example ("0 address" / "1 address" = number of memory addresses named by the ALU instruction)
• Stack (0 address):
    add             tos ← tos + next
• Accumulator (1 address):
    add A           acc ← acc + mem[A]
• General-purpose register, register-memory (1 address):
    add R1, A       R1 ← R1 + mem[A]
  ALU instructions can have two operands.
• General-purpose register, register-register, also called load/store (0 address):
    load R1, A      R1 ← mem[A]
    load R2, B      R2 ← mem[B]
    add R3, R1, R2  R3 ← R1 + R2
  ALU instructions can have three operands.
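The register-transfer semantics on the Example slide can be exercised with a tiny interpreter. The sketch below is a hypothetical toy model, not any real ISA: the mnemonics (push, add_stack, load_acc, add_mem, add_reg, and so on) are invented for illustration, and the operand stack is assumed to live inside the CPU. It runs C = A + B on each class, using the same instruction sequences as the "Operand Locations and Code Sequence for C=A+B" slide, and counts instructions and data-memory accesses.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Toy CPU state; 'mem' maps variable names to values (hypothetical model)."""
    mem: dict
    regs: dict = field(default_factory=dict)
    stack: list = field(default_factory=list)  # operand stack, assumed held in the CPU
    acc: int = 0
    mem_accesses: int = 0  # data-memory reads plus writes

    def read(self, addr):
        self.mem_accesses += 1
        return self.mem[addr]

    def write(self, addr, value):
        self.mem_accesses += 1
        self.mem[addr] = value


def run(program, mem):
    """Execute (opcode, *operands) tuples on a fresh machine and return it."""
    m = Machine(mem=dict(mem))
    for op, *args in program:
        if op == "push":          # stack: push mem[A]
            m.stack.append(m.read(args[0]))
        elif op == "add_stack":   # stack: tos <- tos + next (pops two, pushes sum)
            tos, nxt = m.stack.pop(), m.stack.pop()
            m.stack.append(tos + nxt)
        elif op == "pop":         # stack: mem[C] <- tos
            m.write(args[0], m.stack.pop())
        elif op == "load_acc":    # accumulator: acc <- mem[A]
            m.acc = m.read(args[0])
        elif op == "add_acc":     # accumulator: acc <- acc + mem[A]
            m.acc += m.read(args[0])
        elif op == "store_acc":   # accumulator: mem[C] <- acc
            m.write(args[0], m.acc)
        elif op == "load":        # GPR: Rx <- mem[A]
            m.regs[args[0]] = m.read(args[1])
        elif op == "add_mem":     # register-memory: Rx <- Rx + mem[A]
            m.regs[args[0]] += m.read(args[1])
        elif op == "add_reg":     # load-store: Rd <- Rs + Rt
            m.regs[args[0]] = m.regs[args[1]] + m.regs[args[2]]
        elif op == "store":       # GPR: mem[C] <- Rx
            m.write(args[0], m.regs[args[1]])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return m


# The four C = A + B sequences, one per ISA class.
PROGRAMS = {
    "stack": [("push", "A"), ("push", "B"), ("add_stack",), ("pop", "C")],
    "accumulator": [("load_acc", "A"), ("add_acc", "B"), ("store_acc", "C")],
    "register-memory": [("load", "R1", "A"), ("add_mem", "R1", "B"),
                        ("store", "C", "R1")],
    "load-store": [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add_reg", "R3", "R1", "R2"), ("store", "C", "R3")],
}

for name, program in PROGRAMS.items():
    m = run(program, {"A": 2, "B": 3, "C": 0})
    print(f"{name:16} instructions={len(program)} "
          f"data accesses={m.mem_accesses} C={m.mem['C']}")
```

For a single expression like this one, and with the stack held in the CPU, every sequence reads A and B and writes C exactly once. The classes therefore differ here in instruction count (4, 3, 3, 4) and in how many operands each instruction names, not in data traffic; the accumulator's extra memory traffic appears once longer expressions force temporaries out to memory.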
Operand Locations and Code Sequence for C = A + B
Stack      Accumulator   GPR (register-memory)   GPR (load-store)
Push A     Load A        Load R1, A              Load R1, A
Push B     Add B         Add R1, B               Load R2, B
Add        Store C       Store C, R1             Add R3, R1, R2
Pop C                                            Store C, R3

Pros and Cons
• Stack
  Advantages: simple effective address; short instructions; good code density
  Disadvantages: lack of random access; efficient code is difficult to generate; the stack is often a bottleneck
• Accumulator
  Advantages: minimal internal state; fast context switch; short instructions
  Disadvantages: very high memory traffic
• Register
  Advantages: registers are faster than memory; registers can be used to hold variables (which reduces memory traffic and speeds up programs); registers are more efficient for a compiler to use than other forms of internal storage
  Disadvantages: longer instructions; possibly complex effective-address generation; the size and structure of the register set have many options
Register is the class that won out!

Register Machines
• How many registers are sufficient?
• General-purpose registers vs. special-purpose registers
  – compiler flexibility and hand-optimization
• Two major concerns for arithmetic and logical (ALU) instructions:
  1. Two or three operands: X + Y ⇒ X, or X + Y ⇒ Z
  2. How many of the operands may be memory addresses (0-3)?

Memory addresses   Max operands   Examples
0                  3              Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, Trimedia TM5200
1                  2              IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x
2                  2              VAX, PDP-11, National 32x32, IBM 360 SS
3                  3              VAX
Hence, register architectures are classified by the pair (# memory addresses, # operands).

(0, 3): Register-Register
ALU operations are register to register; also known as a pure Reduced Instruction Set Computer (RISC).
Advantages
  – Simple, fixed-length instruction encoding
  – Decoding is simple since the number of instruction types is small
  – Simple code-generation model
  – Instruction CPI tends to be very uniform
    • except for memory instructions, of course
    • but there are only two of them: load and store
Disadvantages
  – Instruction count tends to be higher
  – Some instructions are short, wasting instruction-word bits

(1, 2): Register-Memory
Evolved RISC and also old CISC
  • new RISC machines capable of doing speculative loads
  • predicated and/or deferred loads are also possible
Advantages
  – The ALU can access data immediately, without a separate load first
  – The instruction format is relatively simple to encode
  – Code density is improved over the register-register (0, 3) model
Disadvantages
  – Operands are not equivalent: the source operand may be destroyed
  – The need for a memory address field may limit the number of registers
  – CPI will vary
    • if the memory operand is in the L0 cache, not so bad
    • if not, life gets miserable

(2, 2) or (3, 3): Memory-Memory
The true and most complex CISC model
  • currently extinct and likely to remain so
  • more complex memory actions are likely to appear, but not directly linked to the ALU
Advantages
  – Most compact code
  – Doesn't waste registers on temporary values
    • a good idea for use-once data, e.g., streaming media
Disadvantages
  – Large variation in instruction size; may need a shoe-horn
  – Large variation in CPI, i.e.,
    work per instruction
  – Exacerbates the infamous memory bottleneck
    • a register file reduces memory accesses if values are reused
Not used today

2.3 Memory Addressing
Interpreting Memory Addresses
• In today's machines, objects have byte addresses
  – an address refers to the number of bytes counted from the beginning of memory
• Object length: access is provided for bytes (8 bits), half words (16 bits), words (32 bits), and double words (64 bits). The type is implied in the opcode (e.g., LDB: load byte; LDW: load word; etc.)
• Byte ordering
  – Little Endian: puts the byte whose address is x...x000 at the least significant position in the double word; bytes are numbered 7,6,5,4,3,2,1,0
  – Big Endian: puts the byte whose address is x...x000 at the most significant position in the double word; bytes are numbered 0,1,2,3,4,5,6,7
• Problems occur when exchanging data among machines with different orderings

Interpreting Memory Addresses (cont.)
• Alignment issues
  – In many computers, accesses to objects larger than a byte must be aligned. An access to an object of size s bytes at byte address A is aligned if A mod s = 0.
    • Misalignment causes hardware complications, since the memory is typically aligned on a word or double-word boundary
    • Misalignment typically results in an alignment fault that must be handled by the OS
  – Hence
    • byte: any address; never misaligned
    • half word: even addresses only; low-order address bit = 0 (XXXXXXX0), else trap
    • word: low-order 2 address bits = 0 (XXXXXX00), else trap
    • double word: low-order 3 address bits = 0 (XXXXX000), else trap
Figure 2.5

Addressing Modes
How do architectures specify the address of an object they will access?
• Effective address: the actual memory address specified by the addressing mode.
• "←" is used for assignment. Mem[R[R1]] refers to the contents of the memory location whose address is given by the contents of register 1 (R1).

Figure 2.7: Summary of use of memory addressing modes, measured on a VAX (which supported all of them) running SPEC89.

Displacement Addressing Mode
How big should the displacement be?
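One way to make this question concrete: if the displacement field is n bits wide and interpreted as a two's-complement signed value, it can hold values from -2^(n-1) to 2^(n-1) - 1. The sketch below assumes that signed interpretation; the function names fits_signed and min_signed_bits are hypothetical helpers, not part of any ISA or library.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def min_signed_bits(value: int) -> int:
    """Smallest two's-complement field width (in bits) that can hold value."""
    bits = 1
    while not fits_signed(value, bits):
        bits += 1
    return bits


for disp in (0, -4, 8, 2047, 2048, 10000, 32767, 1000000):
    print(f"{disp:>8}: needs {min_signed_bits(disp):2} bits; "
          f"fits 12-bit: {fits_signed(disp, 12)}, fits 16-bit: {fits_signed(disp, 16)}")
```

Under these assumptions, 10000 needs 15 bits, so Add R4, 10000(R0) encodes directly with a 16-bit field but not a 12-bit one, while 1000000 needs 21 bits and must first be built in a register, as in the compiler sequence on the following slide.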
Figure 2.8: Displacement values are widely distributed.

Displacement Addressing Mode (cont.)
• Benchmarks show that 12 bits of displacement would capture about 75% of the full 32-bit displacements, and 16 bits should capture about 99%
• Remember: optimize for the common case. Hence, the choice is at least 12-16 bits
  – For addresses that do fit in the displacement field:
      Add R4, 10000(R0)
  – For addresses that don't fit, the compiler must do the following:
      Load R1, 1000000
      Add R1, R0
      Add R4, 0(R1)

Immediate Addressing Mode
• Used when we want a numerical value to be part of the instruction itself
• Around 20% of the operations have an immediate operand
  High level          Assembler level
  a = b + 3;          Load R2, 3
                      Add R0, R1, R2
  if (a > 17)         Load R2, 17
                      CMPBGT R1, R2
    goto Addr;        Load R1, Address
                      Jump (R1)

Immediate Addressing Mode: How frequent are immediates?
Figure 2.9: About one-quarter of data transfers and ALU operations have an immediate operand.

Immediate Addressing Mode: How big should immediates be?
Figure 2.10: Benchmarks show that 50%-70% of the immediates fit within 8 bits and 75%-80% fit within 16 bits.

2.4 Addressing Modes for Signal Processing
Two addressing modes distinguish DSPs:
1. Modulo or circular addressing mode
   – autoincrement/autodecrement to support circular buffers
   • As data are added, a pointer is checked to see if it is pointing to the end of the buffer
     – If not, the pointer is incremented to the next address; if it is, the pointer is set instead to the start of the buffer.
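The pointer update described for modulo (circular) addressing can be sketched in software: write at the pointer, then advance it, wrapping to the start of the buffer when it was at the end. On a DSP this wrap is done by the address-generation hardware as part of the autoincrement; here it is spelled out explicitly. The class and method names are hypothetical, and the buffer is assumed to start at index 0.

```python
class CircularBuffer:
    """Fixed-size buffer whose write pointer wraps around, as in modulo addressing."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.start = 0          # first valid index of the buffer
        self.end = size - 1     # last valid index of the buffer
        self.ptr = 0            # current write pointer

    def _advance(self) -> None:
        # Is the pointer at the end of the buffer?
        # If not, increment to the next address; if so, reset to the start.
        if self.ptr != self.end:
            self.ptr += 1
        else:
            self.ptr = self.start

    def add(self, sample: int) -> None:
        """Write a sample at the pointer, then autoincrement modulo the buffer size."""
        self.data[self.ptr] = sample
        self._advance()


buf = CircularBuffer(4)
for s in range(1, 7):       # six samples into a four-entry buffer
    buf.add(s)
print(buf.data, buf.ptr)    # → [5, 6, 3, 4] 2  (samples 5 and 6 overwrote 1 and 2)
```

The compare-and-reset form mirrors the description above; for a buffer starting at index 0 it is equivalent to ptr = (ptr + 1) % size, which is the "modulo" in the mode's name.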
