Computer System Architecture Instruction Set Principles And

Hot Topics in Computer System Architecture Computer Architecture • 1950s and 1960s: Instruction Set Principles and – Computer Arithmetic Examples • 1970 and 1980s: – Instruction Set Design – ISA Appropriate for Compilers Chalermek Intanagonwiwat • 1990s: – Design of CPU – Design of memory system Slides courtesy of Graham Kirby, Mike Schulte, and – Instruction Set Extensions Peiyi Tang Hot Topics in Instruction Set Computer Architecture (cont.) Architecture (ISA) • “Instruction set architecture is the • 2000s: structure of a computer that a machine – Computer Arithmetic – Design of I/O system language programmer must understand to – Parallelism write a correct (timing independent) program for that machine.” – Source: IBM in 1964 when introducing the IBM 360 architecture, which eliminated 7 different IBM instruction sets. ISA (cont.) ISA (cont.) • The instruction set architecture is also • The instruction set architecture the machine description that a hardware serves as the interface between software and hardware. designer must understand to design a correct implementation of the computer. • It provides the mechanism by which the software tells the hardware what should be done. ISA (cont.) ISA Metrics • Orthogonality High level language code : C, C++, Java, Fortan, – No special registers, few special cases, compiler all operand modes available with any Assembly language code: architecture specific statements data type or instruction type assembler Machine language code: architecture specific bit patterns • Completeness – Support for a wide range of operations and target applications software instruction set hardware ISA Metrics (cont.) Instruction Set Design Issues • Regularity • Instruction set design issues include: – No overloading for the meanings of – Where are operands stored? instruction fields • registers, memory, stack, accumulator •Streamlined – How many explicit operands are there? – Resource needs easily determined •0, 1, 2, or 3 • Ease of compilation (or assembly – How is the operand location specified? language programming) • register, immediate, indirect, . • Ease of implementation Instruction Set Design Issues Evolution of Instruction Sets (cont.) Single Accumulator (EDSAC 1950) – What type & size of operands are Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953) supported? • byte, int, float, double, string, vector. Separation of Programming Model from Implementation – What operations are supported? • add, sub, mul, move, compare . High-level Language Based Concept of a Family (B5000 1963) (IBM 360 1964) General Purpose Register Machines Complex Instruction Sets Load/Store Architecture (Vax, Intel 8086 1977-80) (CDC 6600, Cray 1 1963-76) RISC (Mips,Sparc,88000,IBM RS6000, . .1987+) Classifying ISAs Classifying ISAs (cont.) Accumulator (before 1960): Register-Memory (1970s to present): 1 address add A acc ← acc + mem[A] 2 address add R1, A R1 ← R1 + mem[A] load R1, A R1 ← mem[A] Stack (1960s to 1970s): Register-Register (Load/Store) 0 address add tos ← tos + next (1960s to present): Memory-Memory (1970s to 1980s): 3 address add R1, R2, R3 R1 ← R2 + R3 load R1, R2 R1 ← mem[R2] 2 address add A, B mem[A] ← mem[A] + mem[B] store R1, R2 mem[R1] ← R2 3 address add A, B, C mem[A] ← mem[B] + mem[C] Classifying ISAs (cont.) Accumulator Architectures • Instruction set: add A, sub A, mult A, div A, load A, store A • Example: A*B - (A+C*B) load B mul C add A B B*C A+B*C A+B*C A A*B result store D load A mul B sub D Stack Architectures Accumulators: Pros and Cons • Instruction set: add, sub, mul, div, . •Pros push A, pop A – Very low hardware requirements • Example: A*B - (A+C*B) – Easy to design and understand push A push B •Cons mul – Accumulator becomes the bottleneck push A A B A*B AC B B*CA+B*Cresult A A*B A C AA*B – Little ability for parallelism or pipelining push C A*B A A*B – High memory traffic push B A*B mul add sub Stacks: Pros and Cons Stacks: Pros and Cons (cont.) •Pros •Cons – Good code density (implicit top of stack) – Stack becomes the bottleneck – Low hardware requirements – Little ability for parallelism or pipelining – Easy to write a simpler compiler for stack – Data is not always at the top of stack when architectures need – Difficult to write an optimizing compiler for stack architectures Memory-Memory Architectures Memory-Memory: • Instruction set: Pros and Cons (3 operands) add A, B, C sub A, B, C •Pros mul A, B, C – Requires fewer instructions (2 operands) add A, B sub A, B mul A, B – Easy to write compilers for • Example: A*B - (A+C*B) – 3 operands 2 operands •Cons mul D, A, B mov D, A – Very high memory traffic mul E, C, B mul D, B – Variable number of clocks per instruction add E, A, E mov E, C – With two operands, more data movements are sub E, D, E mul E, B required add E, A sub E, D Register-Memory Architectures Memory-Register: • Instruction set: Pros and Cons add R1, A sub R1, A mul R1, B •Pros load R1, A store R1, A – Some data can be accessed without loading first • Example: A*B - (A+C*B) – Instruction format easy to encode load R1, A – Good code density mul R1, B /* A*B */ •Cons store R1, D – Operands are not equivalent (poor orthogonal) load R2, C – Variable number of clocks per instruction mul R2, B /* C*B */ – May limit number of registers add R2, A /* A + CB */ sub R2, D /* AB - (A + C*B) */ Register-Register/Load-Store Register-Register/Load-Store: Architectures Pros and Cons • Instruction set: •Pros add R1, R2, R3 sub R1, R2, R3 mul R1, R2, R3 – Simple, fixed length instruction encodings load R1, A store R1, A move R1, R2 – Instructions take similar number of cycles • Example: A*B - (A+C*B) – Relatively easy to pipeline load R1, A load R2, B •Cons load R3, C – Higher instruction count mul R7, R3, R2 /* C*B */ add R8, R7, R1 /* A + C*B */ – Dependent on good compiler mul R9, R1, R2 /* A*B */ sub R10, R9, R8 /* A*B - (A+C*B) */ Registers: Registers: Advantages and Disadvantages Advantages and Disadvantages (cont.) •Advantages • Disadvantages – Faster than cache or main memory (no – Need to save and restore on procedure calls addressing mode) and context switch – Deterministic (no misses) – Can’t take the address of a register (for – Can replicate (multiple read ports) pointers) – Short identifier (typically 3 to 8 bits) – Fixed size (can’t store strings or structures efficiently) – Reduce memory traffic – Compiler must manage –Limited number Current Trends: Computation Byte Ordering Model • Practically every modern design uses a • Little Endian (e.g., in DEC, Intel) load-store architecture » low order byte stored at lowest address » byte0 byte1 byte2 byte3 • For a new ISA design: • Big Endian (e.g., in IBM, Motorolla, Sun, HP) – would expect to see load-store with plenty » high order byte stored at lowest address of general purpose registers » byte3 byte2 byte1 byte0 • Programmers/protocols should be careful when transferring binary data between Big Endian and Little Endian machines Operand Alignment Unrestricted Alignment • An access to an operand of size s bytes • If the architecture does not restrict at byte address A is said to be aligned memory accesses to be aligned then if – Software is simple A mod s = 0 – Hardware must detect misalignment and make two memory accesses 40 41 42 43 44 – Expensive logic to perform detection D0 D1 D2 D3 – Can slow down all references D0 D1 D2 D3 – Sometimes required for backwards compatibility Types of Addressing Modes Restricted Alignment (VAX) • If the architecture restricts memory Addressing Mode Example Action accesses to be aligned then 1. Register direct Add R4, R3 R4 <- R4 + R3 – Software must guarantee alignment 2.Immediate Add R4, #3 R4 <- R4 + 3 3.Displacement Add R4, 100(R1) R4 <- R4 + M[100 + R1] – Hardware detects misalignment access and traps 4.Register indirect Add R4, (R1) R4 <- R4 + M[R1] 5.Indexed Add R4, (R1 + R2) R4 <- R4 + M[R1 + R2] – No extra time is spent when data is aligned 6.Direct Add R4, (1000) R4 <- R4 + M[1000] • Since we want to make the common case 7.Memory Indirect Add R4, @(R3) R4 <- R4 + M[M[R3]] fast, having restricted alignment is often a better choice, unless compatibility is an issue. Types of Addressing Modes Use of Addressing Modes (cont.) (VAX) 8.Autoincrement Add R4, (R2)+ R4 <- R4 + M[R2] R2 <- R2 + d 9.Autodecrement Add R4, (R2)- R4 <- R4 + M[R2] R2 <- R2 - d 10.Scaled Add R4, 100(R2)[R3] R4 <- R4 + M[100 + R2 + R3*d] • Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of all operands on the VAX. • Displacement dominates the memory addressing mode (32% - 55%). • Immediate tails displacement (17% - 43%). Displacement Value Distribution Immediate Value Distribution Current Trends: Memory Types of Operations Addressing • Arithmetic and Logic: AND, ADD • For a new ISA design would expect to see: – support for displacement, immediate and register • Data Transfer: MOVE, LOAD, STORE indirect addressing modes • Control BRANCH, JUMP, CALL »75% to 99% of SPEC measurements • System OS CALL – size of address for displacement to be at least 12-16 bits • Floating Point ADDF, MULF, DIVF – size of immediate field to be at least 8-16 bits • Decimal ADDD, CONVERT • As use of compilers becomes more dominant, emphasis is on simpler addressing modes • String MOVE, COMPARE • Graphics (DE)COMPRESS Instruction Frequency (Intel Current Trends: Operations 80x86 Integer) and Operands • For a new ISA design would expect to see: – support for 8, 16, 32, (64) bit integers – support for 32 and 64 bit IEEE 754 floating point – emphasis on efficient implementation of the simple common operations Control Flow Addressing Modes • Need to specify destination address – usually explicitly »displacement relative to PC (why?) – sometimes not known statically »specify register containing address Comparisons in Conditional Jump Distances (Alpha) Branches (Alpha) Procedure Call/Return Procedure Call/Return (cont.) • Need to save return address somewhere •Caller save (minimum) – calling procedure saves registers it will need – special register or GPR afterwards • CISC architectures tended to provide • Callee save more complex support – called procedure saves registers it needs to – e.g.

Computer System Architecture Instruction Set Principles And

Pdp11-40.Pdf

Appendix D an Alternative to RISC: the Intel 80X86

Computer Organization & Architecture Eie

Instruction Set Architecture

PDP-11 Conventions Manual

What Is ISA? Stack Accumulator General-Purpose Registers

X86 Instruction Set 4.2 Why Learn Assembly

Instruction Sets

A New Architecture for Mini-Computers -- the DEC PDP-11

(ISA) Evolution of Instruction Sets Instructions

Memory Buffer Register (MBR)

Leros the Return of the Accumulator Machine