Instruction Sets

Computer Organization and Architecture What is an Instruction Set? Chapter 10 • The complete collection of instructions that Instruction Set Characteristics and Functions are understood by a CPU • Can be considered as a functional spec for a CPU — Implementing the CPU in large part is implementing the machine instruction set • Machine Code is rarely used by humans — Binary numbers / bits — Machine code is usually represented by human readable assembly codes — In general, one assembler instruction equals one machine instruction Elements of an Instruction Operands • Operation code (Op code) • Main memory (or virtual memory or cache) — Do this — Requires address • Source Operand reference • CPU register — To this — Can be an implicit reference (e.g., x87 FADD) or • Result Operand reference explicit operands (add eax, ecx) — Put the result here • I/O device • Next Instruction Reference — Several forms: – Specify I/O module and device — When you have done that, do this... – Specify address in I/O space — Next instruction reference often implicit (sequential – Memory-mapped I/O just another memory address execution) Instruction Cycle State Diagram Instruction Representation • In machine code each instruction has a unique bit pattern • For human consumption (well, programmers anyway) a symbolic representation is used — e.g. ADD, SUB, LOAD • Operands can also be represented in this way — ADD A,B 1 Simple Instruction Format Instruction Types • Data processing — Arithmetic and logical instructions • Data storage (main memory) • Data movement (I/O) • Program flow control — Conditional and unconditional branches — Call and Return Design Issues Number of Addresses (a) • These issues are still in dispute • 3 addresses • Operation Repertoire — Operand 1, Operand 2, Result — How many operations and how complex should they — a = b + c; be? Few operations = simple silicon; many ops — add ax, bx, cx makes it easier to program — May be a fourth address - next instruction (usually • Data types implicit)[not common] • Instruction Format and Encoding — Fixed or variable length? How many bits? How many addresses? This has some bearing on data types • 3 address format rarely used • Registers — Instructions are long because 3 or more operands have to be specified — How many can be addressed and how are they used? • Addressing Modes — Many of the same issues as Operation Repertoire Number of Addresses (b) Number of Addresses (c) • 2 addresses • 1 address — One address doubles as operand and result — Implicit second address — a = a + b — Usually a register (accumulator) — add ax, bx — Common on early machines — Reduces length of instruction over 3-address format • Used in some Intel x86 instructions with — Requires some extra work by processor implied operands – Temporary storage to hold some results — mul ax — idiv ebx 2 Number of Addresses (d) Computation of Y = (a-b) / (c + (d * e)) • 0 (zero) addresses • Three address instructions — All addresses implicit sub y,a,b — Uses a stack. X87 example c = a + b: mul t,d,e add t,t,c fld a ;push a div y,y,t fld b ;push b fadd ;st(1) <- a+b, pop stack fstp c ;store and pop c • Two address instructions mov y,a — Can reduce to 3 instructions: sub y,b fld a ;push a mov t,d fld b ;push b mul t,e faddp c ;add and pop add t,c div y,t Computation of Y = (a-b) / (c + (d * e)) How Many Addresses? • One address instructions • More addresses load d — More complex instructions mul e — More registers add c – Inter-register operations are quicker store y — Fewer instructions per program load a — More complexity in processor sub b • Fewer addresses div y — Less complex instructions store y — More instructions per program — Faster fetch/execution of instructions — Less complexity in processor — One address format however limits you to one register Design Decisions (1) Design Decisions (2) • Operation repertoire • Registers — How many ops? — Number of CPU registers available — What can they do? — Which operations can be performed on which — How complex are they? registers? • Data types • Addressing modes (later…) • Instruction formats — Length of op code field • RISC v CISC — Number of addresses 3 Types of Operands Pentium (32 bit) Data Types • Addresses • 8 bit Byte (unsigned or signed) • Numbers • 16 bit word (unsigned or signed) — Integer/floating point • 32 bit double word (unsigned or signed) — Binary / BCD integer representations • 64 bit quad word (unsigned or signed) • Characters — multiplication and division only in the integer processor plus a few obscure instructions such as — ASCII etc. cmpxchg8b • Logical Data • Integer unit has instruction set support for both — Bits or flags packed and unpacked BCD • (Aside: Is there any difference between numbers and characters? • Memory is byte-addressable Ask a C programmer!) • But dwords should be aligned on 4-byte address boundaries to avoid multiple bus cycles for single operands Pentium FPU data types Pentium Numeric Data Formats • Pentium FPU (x87) supports IEEE 32,64, and 80 bit floats • All numbers internally are 80 bit floats • Can load from and store to integer formats (signed) in word, dword, and qword formats • Can load from and store to a tenbyte packed BCD format (18 digits + one byte for sign) Pentium Byte Strings Specific Data Types • X86 processors have a set of 5 instructions • General - arbitrary binary contents called “string” instructions that can manipulate • Integer - single binary value 32 blocks of memory up to 2 -1 bytes in length • Ordinal - unsigned integer • Blocks of memory can be manipulated as bytes, • Unpacked BCD - One digit per byte words, or dwords • Packed BCD - 2 BCD digits per byte • Operations: • Near Pointer - 32 bit offset within segment — CMPS: mem-to-mem compare — MOVS: mem-to-mem copy • Bit field — SCAS: scan memory for match to value in • Byte String accumulator • Floating Point — STOS: store accumulator to memory — LODS: load accumulator from memory 4 ARM Data Types ARM Data Types • ARM Processors support • ARM processors support both signed 2’s — Byte 8 bits complement and unsigned interpretations of — Halfword 16 bits integers — Word 32 bits • Most implementations do not provide floating • Word access should be word aligned (memory point hardware address is divisible by 4) and halfword access should be halfword aligned — FP arithmetic can be implemented in software • Three alternatives for unaligned access — ARM specs also support an optional FP co-processor with IEEE 754 singles and doubles — Default (truncate address; set l.o. bits to 0) — Alignment check: fault on unaligned access — Unaligned access: allow unaligned access transparently to the programmer; may require more than one memory access to load the operand Endian Support Types of Operations • The system control register E-bit can be set • Data Transfer and cleared using the SETEND instruction • Arithmetic • Logical • Conversion • I/O • System Control • Transfer of Control Data Transfer Example IBM EAS/390 Operations • Specify — Source — Destination — Amount of data • May be different instructions for different movements source, destination, size — e.g. IBM mainframes • Or one instruction and different addresses — e.g. VAX, Pentium (MOV) 5 Types of data transfer operations Arithmetic Operations Comparison is usually considered an arithmetic operation – subtraction without storage of results Shift and Rotate Operations Logical Operations Translation and Conversion Input/Output • Conversion: convert one representation to • May be specific instructions (Pentium IN and another. Ex: x87 load and store instructions OUT) perform conversions from any numeric format • May be done using data movement instructions to IEEE and vice versa (memory mapped) • Translate: table based translation • May be done by a separate controller (DMA) — S/390 translate instruction TR R1, R2, L – R2 has address of a table of 8-bit codes – L bytes at addr R1 are replaced by byte at table entry in R2 indexed by that byte – Typically used for character code conversions — Intel XLATE instruction has implied operands – Bx points to the table – AL contains the index and is replaced by BX[al] 6 Systems Control Transfer of Control • Privileged instructions • Branch (JMP in x86) • CPU needs to be in specific state — Unconditional == goto — Ring 0 on 80386+ — Conditional — Kernel mode – Can be based on flags (condition codes) or other tests – 3 address format can have instructions such as • For operating systems use bre r1,r2, dest ;branch to dest if r1=r2 • Examples: Pentium • Skip instructions — LGDTR (Load Global Descriptor Table Register) — Skip the next instruction if condition is true — LAR (Load Access Rights) — Implied address — MOV to/from Control Register — May include other ops ISZ R1 ; increment R1 and skip if zero — Not present in x86 Branch Instruction Procedure (Function) Calls • Two instructions: CALL and RETURN • Most machines use a stack mechanism: — CALL pushes return address on stack — RETURN pops stack into program counter • Other approaches: — Store return address in special register — Store return address at start of procedure — Berkeley RISC: register windows • These approaches all suffer from problems with re-entrancy Software Interrupts Use of Stack • Many architectures provide a software interrupt — Intel x86 provides for 224 possible software interrupts • A software interrupt is an instruction, not an interrupt at all • Effect is similar to a CALL instruction • Return address and other data are pushed on the stack • Address to which transfer is passed is located using the same

Instruction Sets

Fill Your Boots: Enhanced Embedded Bootloader Exploits Via Fault Injection and Binary Analysis

Computer Organization and Architecture Designing for Performance Ninth Edition

ARM Instruction Set

3.2 the CORDIC Algorithm

Consider an Instruction Cycle Consisting of Fetch, Operators Fetch (Immediate/Direct/Indirect), Execute and Interrupt Cycles

Akukwe Michael Lotachukwu 18/Sci01/014 Computer Science

Parallel Computing

Instruction Cycle

VLIW Architectures Lisa Wu, Krste Asanovic

18-741 Advanced Computer Architecture Lecture 1: Intro And

Introduction to Cpu

Fpga-Based Digital Phase-Locked Loop Analysis and Implementation