Computer Organization and Architecture

Chapter 10
Instruction Set Characteristics and Functions

What is an Instruction Set?

• The complete collection of instructions that are understood by a CPU
• Can be considered as a functional spec for a CPU
  — Implementing the CPU in large part is implementing the machine instruction set
• Machine code is rarely used by humans
  — Binary numbers / bits
  — Machine code is usually represented by human-readable assembly codes
  — In general, one assembler instruction equals one machine instruction

Elements of an Instruction

• Operation code (Op code)
  — Do this
• Source Operand reference
  — To this
• Result Operand reference
  — Put the result here
• Next Instruction Reference
  — When you have done that, do this...
  — Next instruction reference often implicit (sequential execution)

Operands

• Main memory
  — Requires address
• CPU register
  — Can be an implicit reference (e.g., x87 FADD) or explicit operands (add eax, ecx)
• I/O device
  — Several forms:
    – Specify I/O module and device
    – Specify address in I/O space
    – Memory-mapped I/O: just another memory address

Instruction Cycle State Diagram

[Figure]

Instruction Representation

• In machine code each instruction has a unique bit pattern
• For human consumption (well, programmers anyway) a symbolic representation is used
  — e.g. ADD, SUB, LOAD
• Operands can also be represented in this way
  — ADD A,B

Simple Instruction Format

[Figure]

Instruction Types

• Data processing
  — Arithmetic and logical instructions
• Data storage (main memory)
• Data movement (I/O)
• Program flow control
  — Conditional and unconditional branches
  — Call and Return

Design Issues

• These issues are still in dispute
• Operation Repertoire
  — How many operations and how complex should they be? Few operations = simple silicon; many ops make it easier to program
• Data types
• Instruction Format and Encoding
  — Fixed or variable length? How many bits? How many addresses? This has some bearing on data types
• Registers
  — How many can be addressed and how are they used?
• Addressing Modes
  — Many of the same issues as Operation Repertoire

Number of Addresses (a)

• 3 addresses
  — Operand 1, Operand 2, Result
  — a = b + c;
  — add ax, bx, cx
  — May be a fourth address: next instruction (usually implicit) [not common]
• 3 address format rarely used
  — Instructions are long because 3 or more operands have to be specified

Number of Addresses (b)

• 2 addresses
  — One address doubles as operand and result
  — a = a + b
  — add ax, bx
  — Reduces length of instruction over 3-address format
  — Requires some extra work
    – Temporary storage to hold some results

Number of Addresses (c)

• 1 address
  — Implicit second address
  — Usually a register (accumulator)
  — Common on early machines
• Used in some Intel instructions with implied operands
  — mul ax
  — idiv ebx

Number of Addresses (d)

• 0 (zero) addresses
  — All addresses implicit
  — Uses a stack. x87 example c = a + b:
      fld a     ;push a
      fld b     ;push b
      fadd      ;st(1) <- a+b, pop stack
      fstp c    ;store and pop c
  — Can reduce to 3 instructions by adding b directly from memory:
      fld a     ;push a
      fadd b    ;st(0) <- a+b
      fstp c    ;store and pop c

Computation of Y = (a-b) / (c + (d * e))

• Three address instructions
      sub y,a,b
      mul t,d,e
      add t,t,c
      div y,y,t
• Two address instructions
      mov y,a
      sub y,b
      mov t,d
      mul t,e
      add t,c
      div y,t
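To see how a zero-address format evaluates the same expression, here is a minimal sketch of a stack machine in Python. The mnemonics (push, pop, sub, mul, add, div) and the sample values are assumptions chosen for illustration; they do not belong to any real instruction set.

```python
def run_stack_program(program, memory):
    """Run a zero-address program; only push and pop name a memory location."""
    stack = []
    binary_ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
        "div": lambda x, y: x / y,
    }
    for instruction in program:
        opcode, *operand = instruction.split()
        if opcode == "push":
            stack.append(memory[operand[0]])
        elif opcode == "pop":
            memory[operand[0]] = stack.pop()
        else:
            right = stack.pop()  # top of stack is the second operand
            left = stack.pop()
            stack.append(binary_ops[opcode](left, right))
    return memory


# Y = (a - b) / (c + (d * e)) with hypothetical sample values
stack_program = [
    "push a", "push b", "sub",
    "push c", "push d", "push e", "mul", "add",
    "div", "pop y",
]
stack_result = run_stack_program(
    stack_program, {"a": 20, "b": 4, "c": 2, "d": 3, "e": 2}
)
```

The arithmetic instructions carry no addresses at all, at the price of a longer program: ten instructions here, against four in the three-address version.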

Computation of Y = (a-b) / (c + (d * e))

• One address instructions
      load d
      mul e
      add c
      store y
      load a
      sub b
      div y
      store y

How Many Addresses?

• More addresses
  — More complex instructions
  — More registers
    – Inter-register operations are quicker
  — Fewer instructions per program
  — More complexity in processor
• Fewer addresses
  — Less complex instructions
  — More instructions per program
  — Faster fetch/execution of instructions
  — Less complexity in processor
  — One address format, however, limits you to one register
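The one-address listing for Y = (a-b) / (c + (d * e)) can be traced with a small accumulator-machine sketch. The interpreter below is an illustrative assumption, not a model of any particular processor; the sample values are hypothetical.

```python
def run_accumulator_program(program, memory):
    """Run a one-address program; the accumulator is the implicit operand."""
    acc = 0
    for instruction in program:
        opcode, name = instruction.split()
        if opcode == "load":
            acc = memory[name]
        elif opcode == "store":
            memory[name] = acc
        elif opcode == "add":
            acc = acc + memory[name]
        elif opcode == "sub":
            acc = acc - memory[name]
        elif opcode == "mul":
            acc = acc * memory[name]
        elif opcode == "div":
            acc = acc / memory[name]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


one_address_program = [
    "load d", "mul e", "add c", "store y",  # y temporarily holds c + d*e
    "load a", "sub b", "div y", "store y",  # y = (a - b) / y
]
accumulator_result = run_accumulator_program(
    one_address_program, {"a": 20, "b": 4, "c": 2, "d": 3, "e": 2, "y": 0}
)
```

The program needs eight instructions and has to park the denominator in memory, because the single accumulator cannot hold two intermediate values at once.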

Design Decisions (1)

• Operation repertoire
  — How many ops?
  — What can they do?
  — How complex are they?
• Data types
• Instruction formats
  — Length of op code field
  — Number of addresses

Design Decisions (2)

• Registers
  — Number of CPU registers available
  — Which operations can be performed on which registers?
• Addressing modes (later…)
• RISC v CISC

Types of Operands

• Addresses
• Numbers
  — Integer/floating point
  — Binary / BCD integer representations
• Characters
  — ASCII etc.
• Logical Data
  — Bits or flags
• (Aside: Is there any difference between numbers and characters? Ask a C programmer!)

Pentium (32 bit) Data Types

• 8 bit byte (unsigned or signed)
• 16 bit word (unsigned or signed)
• 32 bit double word (unsigned or signed)
• 64 bit quad word (unsigned or signed)
  — multiplication and division only in the integer processor, plus a few obscure instructions such as cmpxchg8b
• Integer unit has instruction set support for both packed and unpacked BCD
• Memory is byte-addressable
• But dwords should be aligned on 4-byte address boundaries to avoid multiple cycles for single operands

Pentium FPU data types

• Pentium FPU (x87) supports IEEE 32-, 64-, and 80-bit floats
• All numbers are 80-bit floats internally
• Can load from and store to signed integer formats in word, dword, and qword sizes
• Can load from and store to a ten-byte packed BCD format (18 digits + one byte for sign)

Pentium Numeric Data Formats

[Figure]
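The ten-byte packed BCD format (18 digits plus a sign byte) can be illustrated with a short Python sketch. The byte order and sign encoding here are assumptions based on the usual x87 layout: two digits per byte, least significant byte first, with the sign in the top bit of the last byte. The function name is hypothetical.

```python
def to_packed_bcd(value):
    """Encode a signed integer as ten bytes of packed BCD."""
    if abs(value) >= 10**18:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    digits = f"{abs(value):018d}"  # 18 digits, most significant first
    encoded = bytearray(10)
    for i in range(9):
        low = int(digits[17 - 2 * i])   # less significant digit of the pair
        high = int(digits[16 - 2 * i])  # more significant digit of the pair
        encoded[i] = (high << 4) | low
    encoded[9] = 0x80 if value < 0 else 0x00  # sign byte
    return bytes(encoded)
```

For example, 1234 encodes with 0x34 in the first byte and 0x12 in the second, and all remaining bytes zero.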

Pentium Byte Strings

• x86 processors have a set of 5 instructions called “string” instructions that can manipulate blocks of memory up to 2^32 - 1 bytes in length
• Blocks of memory can be manipulated as bytes, words, or dwords
• Operations:
  — CMPS: mem-to-mem compare
  — MOVS: mem-to-mem copy
  — SCAS: scan memory for match to value in accumulator
  — STOS: store accumulator to memory
  — LODS: load accumulator from memory

Specific Data Types

• General - arbitrary binary contents
• Integer - single binary value
• Ordinal - unsigned integer
• Unpacked BCD - one digit per byte
• Packed BCD - 2 BCD digits per byte
• Near Pointer - 32 bit offset within segment
• Bit field
• Byte String
• Floating Point

ARM Data Types

• ARM processors support
  — Byte: 8 bits
  — Halfword: 16 bits
  — Word: 32 bits
• Word access should be word aligned (memory address is divisible by 4) and halfword access should be halfword aligned
• Three alternatives for unaligned access
  — Default (truncate address; set low-order bits to 0)
  — Alignment check: fault on unaligned access
  — Unaligned access: allow unaligned access transparently to the programmer; may require more than one memory access to load the operand

ARM Data Types

• ARM processors support both signed 2’s complement and unsigned interpretations of integers
• Most implementations do not provide floating point hardware
  — FP arithmetic can be implemented in software
  — ARM specs also support an optional FP co-processor with IEEE 754 singles and doubles
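The alignment rule and the default "truncate address" alternative come down to simple bit arithmetic. The sketch below uses hypothetical helper names and assumes the access size is a power of two.

```python
def is_aligned(address, size):
    """True if the address is a multiple of the access size in bytes."""
    return address % size == 0


def truncate_to_alignment(address, size):
    """Clear the low-order address bits, as in the default alternative."""
    return address & ~(size - 1)
```

A word access (size 4) at 0x1007 is unaligned; truncation turns it into an access at 0x1004.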

Endian Support

• The system control register E-bit can be set and cleared using the SETEND instruction

Types of Operations

• Data Transfer
• Arithmetic
• Logical
• Conversion
• I/O
• System Control
• Transfer of Control
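What the endian setting changes is the order in which the bytes of a multi-byte value appear in memory. This is a minimal sketch using Python's built-in integer conversion; the function name is hypothetical and the code does not model the SETEND instruction itself.

```python
def store_word(value, big_endian):
    """Return the four bytes of a 32-bit word in the selected byte order."""
    order = "big" if big_endian else "little"
    return value.to_bytes(4, order)


little = store_word(0x12345678, big_endian=False)  # least significant byte first
big = store_word(0x12345678, big_endian=True)      # most significant byte first
```

Reading bytes stored in one order under the other interpretation yields a byte-swapped value, which is why data shared between systems must agree on the order.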

Data Transfer

• Specify
  — Source
  — Destination
  — Amount of data
• May be different instructions for different movements (source, destination, size)
  — e.g. IBM mainframes
• Or one instruction and different addresses
  — e.g. VAX, Pentium (MOV)

Example: IBM ESA/390 Operations

[Figure]

Types of data transfer operations

[Figure]

Arithmetic Operations

• Comparison is usually considered an arithmetic operation: subtraction without storage of results
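Treating comparison as "subtraction without storage" means the difference is computed only to set the status flags and is then discarded. This is a minimal sketch for an assumed 8-bit machine; the flag names are generic rather than those of a specific processor.

```python
def compare(a, b, bits=8):
    """Subtract b from a, keep only the resulting flags, discard the difference."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    diff = (a - b) & mask
    return {
        "zero": diff == 0,
        "negative": bool(diff & sign),
        "borrow": a < b,  # unsigned a is smaller than unsigned b
        # signed overflow: operands differ in sign and the result's sign differs from a
        "overflow": bool((a ^ b) & (a ^ diff) & sign),
    }
```

A conditional branch that follows the comparison then tests these flags rather than any stored result.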

Shift and Rotate Operations

[Figure]

Logical Operations

[Figure]

Translation and Conversion

• Conversion: convert one representation to another. Ex: x87 load and store instructions perform conversions from any numeric format to IEEE and vice versa
• Translate: table-based translation
  — S/390 translate instruction TR R1, R2, L
    – R2 has the address of a table of 8-bit codes
    – Each of the L bytes at address R1 is replaced by the table entry in R2 indexed by that byte
    – Typically used for character code conversions
  — Intel XLAT instruction has implied operands
    – BX points to the table
    – AL contains the index and is replaced by BX[AL]

Input/Output

• May be specific instructions (Pentium IN and OUT)
• May be done using data movement instructions (memory mapped)
• May be done by a separate controller (DMA)
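The table-based translation that TR performs can be sketched in a few lines of Python: every byte of the buffer is used as an index into a 256-entry table and replaced by the entry found there. The lowercase-to-uppercase table is an assumption chosen for the example.

```python
def translate(buffer, table):
    """Replace each byte in buffer with the table entry it indexes, in place."""
    for i, code in enumerate(buffer):
        buffer[i] = table[code]
    return buffer


# Hypothetical table: identity mapping, except lowercase ASCII maps to uppercase
table = bytearray(range(256))
for code in range(ord("a"), ord("z") + 1):
    table[code] = code - 32

text = translate(bytearray(b"load r1"), table)
```

Python's built-in `bytes.translate` applies the same idea, so `b"load r1".translate(bytes(table))` gives the same bytes.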

Systems Control

• Privileged instructions
• CPU needs to be in a specific state
  — Ring 0 on 80386+
  — Kernel mode
• For operating system use
• Examples: Pentium
  — LGDT (Load Global Descriptor Table Register)
  — LAR (Load Access Rights)
  — MOV to/from Control Register

Transfer of Control

• Branch (JMP in x86)
  — Unconditional == goto
  — Conditional
    – Can be based on flags (condition codes) or other tests
    – 3 address format can have instructions such as
        bre r1,r2,dest   ;branch to dest if r1=r2
• Skip instructions
  — Skip the next instruction if condition is true
  — Implied address
  — May include other ops
        ISZ R1   ;increment R1 and skip if zero
  — Not present in x86

Branch Instruction

[Figure]

Procedure (Function) Calls

• Two instructions: CALL and RETURN
• Most machines use a stack mechanism:
  — CALL pushes return address on stack
  — RETURN pops stack into program counter
• Other approaches:
  — Store return address in special register
  — Store return address at start of procedure
  — Berkeley RISC: register windows
• These approaches all suffer from problems with re-entrancy
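The stack mechanism for CALL and RETURN can be traced with a toy interpreter. The four opcodes and the program layout below are assumptions made for the sketch; the point is only that CALL pushes the address of the following instruction and RETURN pops it back into the program counter.

```python
def run(program, entry=0):
    """Execute a toy program and return the sequence of addresses visited."""
    pc = entry
    stack = []
    trace = []
    while True:
        opcode, *args = program[pc]
        trace.append(pc)
        if opcode == "call":
            stack.append(pc + 1)  # return address: the instruction after the call
            pc = args[0]
        elif opcode == "ret":
            pc = stack.pop()      # resume at the most recent return address
        elif opcode == "halt":
            return trace
        else:                     # "nop": fall through to the next instruction
            pc += 1


program = [
    ("call", 3),  # 0: main calls procedure A
    ("nop",),     # 1
    ("halt",),    # 2
    ("call", 5),  # 3: procedure A calls procedure B
    ("ret",),     # 4
    ("nop",),     # 5: procedure B
    ("ret",),     # 6
]
call_trace = run(program)
```

Because return addresses are stacked, the nested call from A to B unwinds in the correct order without either procedure knowing who called it.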

Software Interrupts

• Many architectures provide a software interrupt
  — Intel x86 provides for 224 possible software interrupts
• A software interrupt is an instruction, not an interrupt at all
• Effect is similar to a CALL instruction
• Return address and other data are pushed on the stack
• The address to which control is transferred is located using the same vectoring mechanism that hardware interrupts use
• Intel x86 provides the IRET instruction for executing a RETURN from an interrupt

Use of Stack

[Figure]

Nested Procedure Calls

[Figure]

Procedures and Functions

• Most high-level languages make a syntactic distinction between Procedures (no return value) and Functions (return a value as a result of the call)
• At the machine level this distinction does not exist
• We have only a CALL with RETURN to the instruction after the call

Stack Frames and Parameters

• Stacks provide a very flexible mechanism for parameter passing
• To call a procedure, first push parameters on the stack
• Issue the CALL instruction
• The procedure may also use the stack for storage of local (volatile) variables
• The entire structure (parameters, frame pointer, return address, local storage) is called a Stack Frame

Stack Frame Example

[Figure]

Frame pointer

• Intel architecture uses a semi-dedicated register as a frame pointer
• EBP (or BP) is SS-relative (same as ESP or SP)
• All other address registers are DS-relative except for EIP (CS-relative)

Other Specialized Instructions

• Many architectures provide specialized instructions designed for use
• Intel x86 provides synchronization instructions XADD, CMPXCHG, CMPXCHG8B
• Also bus control/synch operations ESC, LOCK, HLT, WAIT

Example: XADD

[Figure]

Examples: HLT, WAIT, LOCK

[Figure]

MMX Instruction Set

• MMX (Multi-Media eXtensions)
• Optimized instruction set for multimedia tasks
• Instructions are SIMD (Single Instruction, Multiple Data)
• Allow parallel operations on 2-8 operands in one clock cycle

MMX Registers

• 8 64-bit registers mm0 – mm7
• Unfortunately these were aliased to the x87 registers ST(0) – ST(7)
• Cannot intermingle floating point instructions and MMX instructions
• Pentium III added SSE (Streaming SIMD Extensions) along with 8 128-bit registers xmm0 – xmm7

Saturation Arithmetic

• When 2’s complement or unsigned operations are performed, normal arithmetic results in “wrap-around” with resulting overflow
  — e.g., in 8 bits 125+4 = -127
• In saturation arithmetic results are clamped to min and max values of the representation
  — 125+4 = 127
  — 125+125 = 127
• This is useful in color arithmetic and other multimedia applications

Saturation arithmetic 2

• Because 2’s complement and unsigned representations have different min and max values, the instruction set provides signed and unsigned instructions for saturated arithmetic
• Audio uses signed arithmetic (positive and negative amplitudes)
• Color processing uses unsigned arithmetic (e.g., 32-bit color R, G, B, Alpha channels range from 0-255)
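The difference between wrap-around and saturation is easy to check numerically. The sketch below uses plain Python integers with hypothetical helper names; it reproduces the 8-bit examples from the slide and is not MMX code.

```python
def wrapping_add_signed8(a, b):
    """8-bit two's complement addition with wrap-around."""
    total = (a + b) & 0xFF
    return total - 256 if total >= 128 else total


def saturating_add(a, b, low, high):
    """Addition clamped to the range [low, high] of the representation."""
    return max(low, min(high, a + b))


wrapped = wrapping_add_signed8(125, 4)             # wraps past +127
signed_sat = saturating_add(125, 4, -128, 127)     # signed 8-bit range
unsigned_sat = saturating_add(200, 100, 0, 255)    # unsigned 8-bit range
```

The signed and unsigned cases differ only in the clamp limits, which is why the instruction set needs separate signed and unsigned saturating instructions.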

MMX Instruction Set Overview

[Table]

MMX Instruction Set Overview

[Table]

Saturation Arithmetic Example: Fade

• Video fade-out, fade-in effect
  — One scene gradually dissolves into another
• Two images are combined, pixel-by-pixel, using a weighted average:
      Result_pixel = pixelA * fade + pixelB * (1 – fade)
• A series of frames is produced, gradually changing the fade value from 0 to 1 (scaled to 8-bit integers)

Video Fade

• In the MMX code sequence the 8-bit pixels are converted to 16-bit elements to accommodate the 16-bit multiplication capability
• Assuming 640x480 resolution and use of all 255 fade values, the total number of instructions executed by the MMX code is ~525,000,000
• Same calculation without MMX code requires ~1,400,000,000 instructions
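The weighted average can be written with integer arithmetic once the fade value is scaled to 8 bits. The sketch below assumes fade runs from 0 to 255 and stands for a weight of fade/255; this scaling and the function names are illustrative assumptions, and the slide's MMX sequence may scale differently.

```python
def fade_pixel(pixel_a, pixel_b, fade):
    """Blend two 8-bit pixel values; fade in 0..255 weights pixel_a by fade/255."""
    return (pixel_a * fade + pixel_b * (255 - fade)) // 255


def fade_frame(frame_a, frame_b, fade):
    """Blend two equally sized frames of 8-bit pixels, one pixel at a time."""
    return bytes(fade_pixel(a, b, fade) for a, b in zip(frame_a, frame_b))
```

Note that the product of two 8-bit values needs up to 16 bits, which is why the MMX version widens the pixels to 16-bit elements before multiplying.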

Video Fade Algorithm

[Figure]

MMX Code

[Figure]

ARM Operation Types

• Principal categories
  — Load and Store
  — Branch
  — Data Processing
  — Multiply
  — Parallel Addition and Subtraction
  — Status Register Access

Load and Store

• ARM Load and Store are used to access memory
• All arithmetic and logical operations are performed on registers; they may use immediate values but not memory operands
• Two broad types of load / store:
  — Load/store a 32-bit word or an 8-bit unsigned byte
  — Load/store a 16-bit unsigned halfword, or load and sign-extend a 16-bit halfword or an 8-bit byte

Branch

• Unconditional branch forwards or backwards up to 32MB
• The program counter (R15) is addressable as a register, so writing a value to R15 executes a branch
• Branch with Link (BL) can be used for function calls
  — It preserves the return address in the Link Register (LR or R14)
• Conditional branches are determined by a 4-bit condition code in the instruction (discussed below)

Data Processing and Multiply

• Includes the usual add, subtract, test, compare and Boolean AND, OR, NOT instructions
  — Add/subtract also have saturated versions
• Integer multiply instructions operate on halfword or word operands and can produce results with upper part discarded or long results (2N bits)
• There are some complex multiply and accumulate instructions
  — multiply two registers and add the contents of a third to the result
• Note that there is no divide instruction!

Parallel Addition and Subtraction

• Similar to MMX, can use two halfword operands or four byte operands
• Example: ADD16 adds the halfwords of two registers separately, giving two halfword results in a third
• Other operations include halfword-wise exchange, add, subtract; unsigned sum of absolute differences

Condition Codes

• ARM defines four condition flags that are stored in the program status register:
      N Negative    Z Zero
      C Carry       V Overflow
• Equivalent to the x86 S, Z, C, O flags (SF, ZF, CF, OF)
• Conditions test various combinations of these flags
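The ADD16 example can be sketched as two independent 16-bit additions packed into one 32-bit register value. The function below is a hypothetical model that assumes unsigned 32-bit inputs and wrap-around within each halfword; it ignores any flags the real instruction may set.

```python
def add16(rn, rm):
    """Add the low halfwords and the high halfwords of two words separately."""
    low = ((rn & 0xFFFF) + (rm & 0xFFFF)) & 0xFFFF
    high = (((rn >> 16) & 0xFFFF) + ((rm >> 16) & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


parallel_sum = add16(0x0001FFFF, 0x00010001)
ordinary_sum = (0x0001FFFF + 0x00010001) & 0xFFFFFFFF
```

In the parallel version the carry out of the low halfword is dropped instead of spilling into the high halfword, so the two results differ.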

Conditional Execution

• ALL instructions have a 4-bit condition code field, so virtually all can be conditionally executed
• Any combination of bits except 1110 or 1111 designates conditional execution
• All arithmetic and logical instructions have an S-bit that indicates whether flags are to be updated or not
• Conditional execution is very useful for pipelined processors
  — Avoids need to flush pipeline when a branch is taken

Conditional Tests

[Table: Code is the condition code field in the instruction]
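The way a 4-bit condition field selects a test on the N, Z, C, V flags can be sketched as a lookup table. The mnemonics and flag combinations below follow the standard ARM condition-code table rather than anything recoverable from the slide itself, and the function name is hypothetical.

```python
# Each entry maps a condition field to (mnemonic, test on the N, Z, C, V flags).
CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z),
    0b0001: ("NE", lambda n, z, c, v: not z),
    0b0010: ("CS", lambda n, z, c, v: c),
    0b0011: ("CC", lambda n, z, c, v: not c),
    0b0100: ("MI", lambda n, z, c, v: n),
    0b0101: ("PL", lambda n, z, c, v: not n),
    0b0110: ("VS", lambda n, z, c, v: v),
    0b0111: ("VC", lambda n, z, c, v: not v),
    0b1000: ("HI", lambda n, z, c, v: c and not z),
    0b1001: ("LS", lambda n, z, c, v: (not c) or z),
    0b1010: ("GE", lambda n, z, c, v: n == v),
    0b1011: ("LT", lambda n, z, c, v: n != v),
    0b1100: ("GT", lambda n, z, c, v: (not z) and n == v),
    0b1101: ("LE", lambda n, z, c, v: z or n != v),
    0b1110: ("AL", lambda n, z, c, v: True),  # always: unconditional execution
}


def condition_passes(code, n, z, c, v):
    """Return True if an instruction with this condition field would execute.

    The field 1111 is deliberately left out of the table and raises KeyError.
    """
    _, test = CONDITIONS[code]
    return bool(test(n, z, c, v))
```

An instruction whose condition fails simply has no effect, so a short if/else can be encoded without any branch and without a pipeline flush.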
