Aspects of ISAs
• Begin with the Von Neumann model
  • Implicit structure of all modern ISAs
  • CPU + memory (data & insns)
  • Sequential instructions
• Format
  • Length and encoding
• Operand model
  • Where (other than memory) are operands stored?
• Datatypes and operations
• Control


The Sequential Model
• Implicit model of all modern ISAs
• Basic feature: the program counter (PC)
  • Defines total order on dynamic instructions
  • Next PC is PC++ (except for control insns)
• Order + named storage define computation
  • Value flows from insn X to insn Y via storage A iff:
    insn X names A as an output and insn Y names A as an input
• Logically executes the loop: Fetch[PC] → Decode → Read Inputs → Execute → Write Output → Next PC
• Instruction execution assumed atomic
  • Instruction X finishes before insn X+1 starts
• More parallel alternatives have been proposed

Instruction Length and Format
• Length
  • Fixed length
    • Most common is 32 bits
    + Simple implementation (next PC often just PC+4)
    – Code density: 32 bits to increment a register by 1
  • Variable length
    + Code density
    + Can do the increment in one 8-bit instruction
    – Complex fetch (where does the next instruction begin?)
  • Compromise: two lengths
    • E.g., MIPS16 or ARM's Thumb
• Encoding
  • A few simple encodings simplify the decoder
  • x86 decoder one nasty piece of logic
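The sequential model above is logically a loop: Fetch[PC], Decode, Read Inputs, Execute, Write Output, Next PC. A minimal C sketch of that loop for a made-up fixed-length toy ISA (the opcodes, field layout, and 32-entry register file are assumptions for illustration, not any real ISA):

    #include <stdint.h>

    #define MEM_WORDS 1024
    static uint32_t imem[MEM_WORDS];   /* instruction memory (toy) */
    static int32_t  regs[32];          /* named storage: register file */

    /* Hypothetical fixed 32-bit encoding: op in bits 31-26, regs below that */
    enum { OP_ADD = 0, OP_ADDI = 1, OP_BEQZ = 2, OP_HALT = 63 };

    void run(uint32_t pc) {
        for (;;) {
            uint32_t insn = imem[pc / 4];              /* Fetch[PC] */
            uint32_t op = insn >> 26;                  /* Decode */
            uint32_t rd = (insn >> 21) & 31, rs = (insn >> 16) & 31, rt = (insn >> 11) & 31;
            int32_t  imm = (int16_t)(insn & 0xFFFF);   /* sign-extended immediate */
            uint32_t next_pc = pc + 4;                 /* default: sequential order */

            switch (op) {                              /* Read inputs, Execute, Write output */
            case OP_ADD:  regs[rd] = regs[rs] + regs[rt]; break;
            case OP_ADDI: regs[rd] = regs[rs] + imm;      break;
            case OP_BEQZ: if (regs[rd] == 0) next_pc = pc + 4 + (imm << 2); break;
            case OP_HALT: return;
            }
            pc = next_pc;                              /* Next PC: insn boundary is atomic */
        }
    }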

Example Instruction Encodings
• MIPS
  • Fixed length
  • 32 bits, 3 formats, simple encoding
    R-type: Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)
    I-type: Op(6) Rs(5) Rt(5) Immed(16)
    J-type: Op(6) Target(26)
• x86
  • Variable length encoding (1 to 16 bytes)
    Prefix*(1-4) Op OpExt* ModRM* SIB* Disp*(1-4) Imm*(1-4)

Operations and Datatypes
• Datatypes
  • S/W: attribute of data
  • H/W: attribute of operation; data is just 0s and 1s
• All processors support
  • 2's complement integer arithmetic/logic (8/16/32/64-bit)
  • IEEE 754 floating-point arithmetic (32/64-bit)
    • Intel has 80-bit floating-point
• Most processors now support
  • "Packed-integer" insns, e.g., MMX
  • "Packed-fp" insns, e.g., SSE/SSE2
  • For multimedia, more about these later
• Processors no longer (??) support
  • Decimal, other fixed-point arithmetic
  • Binary-coded decimal (BCD)

Where Does Data Live?
• Memory
  • Fundamental storage space
• Registers
  • Faster than memory, quite handy
  • Most processors have these too
• Immediates
  • Values spelled out as bits in instructions
  • Input only

How Much Memory? Address Size
• What does "64-bit" in a 64-bit ISA mean?
  • Support memory size of 2^64
  • Alternative (wrong) definition: width of calculation operations
• "Virtual" address size
  • Determines size of addressable (usable) memory
• x86 evolution:
  • 4-bit (4004), 8-bit (8008), 16-bit (8086), 24-bit (80286)
  • 32-bit + protected memory (80386)
  • 64-bit (AMD's Opteron & Intel's EM64T Pentium 4)
• Most ISAs moving to 64 bits (if not already there)
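Because the MIPS encoding above is fixed-length with regular field positions, a decoder can pull out every field with constant shifts and masks. A small C sketch of that field extraction (the struct name and sample encoding are just for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* MIPS fields: Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6) for R-type,
       Op(6) Rs(5) Rt(5) Immed(16) for I-type, Op(6) Target(26) for J-type. */
    typedef struct {
        uint32_t op, rs, rt, rd, sh, func;  /* R-type fields */
        int32_t  imm;                       /* I-type immediate, sign-extended */
        uint32_t target;                    /* J-type target */
    } MipsFields;

    MipsFields decode(uint32_t insn) {
        MipsFields f;
        f.op     = (insn >> 26) & 0x3F;
        f.rs     = (insn >> 21) & 0x1F;
        f.rt     = (insn >> 16) & 0x1F;
        f.rd     = (insn >> 11) & 0x1F;
        f.sh     = (insn >>  6) & 0x1F;
        f.func   =  insn        & 0x3F;
        f.imm    = (int16_t)(insn & 0xFFFF);
        f.target =  insn & 0x03FFFFFF;
        return f;
    }

    int main(void) {
        uint32_t add_r1_r2_r3 = 0x00430820;   /* add $1, $2, $3 (R-type) */
        MipsFields f = decode(add_r1_r2_r3);
        printf("op=%u rs=%u rt=%u rd=%u func=0x%x\n", f.op, f.rs, f.rt, f.rd, f.func);
        return 0;
    }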


How Many Registers?
• Registers faster than memory: have as many as possible?
• No
  • One reason registers are faster: there are fewer of them
    • Small is fast (hardware truism)
  • Another: they are directly addressed (no address calculation)
  – More of them means larger specifiers
  – Fewer registers per instruction, or indirect addressing
• Not everything can be put in registers
  • Structures, arrays, anything pointed-to
  – More registers → more saving/restoring
• Trend: more registers: 8 (x86) → 32 (MIPS) → 128 (IA-64)
  • 64-bit x86 has 16 64-bit integer and 16 128-bit FP registers

How Are Memory Locations Specified?
• Registers are specified directly
  • Register names are short, encoded in instructions
  • Some instructions implicitly read/write certain registers
• How are addresses specified?
  • Addresses are long (64-bit)
  • Addressing modes: how are insn bits converted to addresses?


Memory Addressing
• Addressing mode: way of specifying an address
  • Used in mem-mem or load/store instructions in a register ISA
• Examples
  • Register-indirect: R1=mem[R2]
  • Displacement: R1=mem[R2+immed]
  • Index-base: R1=mem[R2+R3]
  • Memory-indirect: R1=mem[mem[R2]]
  • Auto-increment: R1=mem[R2], R2=R2+1
  • Auto-indexing: R1=mem[R2+immed], R2=R2+immed
  • Scaled: R1=mem[R2+R3*immed1+immed2]
  • PC-relative: R1=mem[PC+imm]
• What high-level program idioms are these used for?
• What implementation impact? What impact on insn count?

Addressing Modes Examples
• MIPS
  • Displacement: R1+offset (16-bit)
  • Experiments showed this covered 80% of accesses on VAX
• x86 (MOV instructions)
  • Absolute: zero + offset (8/16/32-bit)
  • Register indirect: R1
  • Indexed: R1+R2
  • Displacement: R1+offset (8/16/32-bit)
  • Scaled: R1 + (R2*Scale) + offset (8/16/32-bit), Scale = 1, 2, 4, 8
• 2 more issues: alignment & …
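A C sketch of how several of the modes listed above compute an effective address; the 16-entry register file, the tiny byte-addressed memory, and the 8-byte step in the auto-increment case are assumptions for illustration:

    #include <stdint.h>
    #include <string.h>

    static int64_t reg[16];
    static uint8_t mem[1 << 16];

    static int64_t load64(uint64_t addr) { int64_t v; memcpy(&v, &mem[addr], sizeof v); return v; }

    /* Register-indirect:  R1 = mem[R2]                    */
    int64_t reg_indirect(int r2)                     { return load64((uint64_t)reg[r2]); }
    /* Displacement:       R1 = mem[R2 + immed]            */
    int64_t displacement(int r2, int64_t imm)        { return load64((uint64_t)(reg[r2] + imm)); }
    /* Index-base:         R1 = mem[R2 + R3]               */
    int64_t index_base(int r2, int r3)               { return load64((uint64_t)(reg[r2] + reg[r3])); }
    /* Memory-indirect:    R1 = mem[mem[R2]]               */
    int64_t mem_indirect(int r2)                     { return load64((uint64_t)load64((uint64_t)reg[r2])); }
    /* Auto-increment:     R1 = mem[R2], then bump R2 (by the 8-byte access size here) */
    int64_t auto_increment(int r2)                   { int64_t v = load64((uint64_t)reg[r2]); reg[r2] += 8; return v; }
    /* Scaled (x86-style): R1 = mem[R2 + R3*scale + disp], scale in {1,2,4,8} */
    int64_t scaled(int r2, int r3, int scale, int64_t disp) { return load64((uint64_t)(reg[r2] + reg[r3] * scale + disp)); }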

How Many Explicit Operands / ALU Insn?
• Operand model: how many explicit operands per ALU insn?
  • 3: general-purpose
    add R1,R2,R3 means [R1] = [R2] + [R3] (MIPS)
  • 2: multiple explicit accumulators (output also input)
    add R1,R2 means [R2] = [R2] + [R1] (x86)
  • 1: one implicit accumulator
    add R1 means ACC = ACC + [R1]
  • 0: hardware stack
    add means STK[TOS++] = STK[--TOS] + STK[--TOS]
  • 4+: useful only in special situations
• Examples show register operands, but operands can be memory addresses or mixed register/memory
• ISAs with register-only ALU insns = load-store architectures

Operand Model Pros and Cons
• Metric I: static code size
  • Want: many implicit operands (stack), high-level insns
• Metric II: data memory traffic
  • Want: many long-lived operands on-chip (load-store)
• Metric III: CPI
  • Want: short latencies, little variability (load-store)
• CPI and data memory traffic more important these days
• Trend: most new ISAs are load-store ISAs or hybrids
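To make the 0-operand (hardware stack) row above concrete, here is a minimal C sketch of a stack machine executing an add in the spirit of STK[TOS++] = STK[--TOS] + STK[--TOS]; the push/pop helpers and fixed stack depth are illustrative assumptions:

    #include <stdint.h>
    #include <stdio.h>

    static int64_t stk[64];
    static int     tos = 0;                 /* points one past the top of stack */

    static void    push(int64_t v) { stk[tos++] = v; }
    static int64_t pop(void)       { return stk[--tos]; }

    /* 0-operand add: both sources and the destination are implicit. */
    static void add(void) { int64_t b = pop(), a = pop(); push(a + b); }

    int main(void) {
        /* Computes 2 + 3 with no explicit operands on the "add":
           push 2; push 3; add */
        push(2);
        push(3);
        add();
        printf("%lld\n", (long long)pop());  /* prints 5 */
        return 0;
    }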


Control Transfers
• Default next-PC is PC + sizeof(current insn)
  • Note: the PC is called the IP (instruction pointer) in x86
• Branches and jumps can change that
  • Otherwise dynamic program == static program
    • Not useful
• Computing targets: where to jump to
  • For all branches and jumps
  • Absolute / PC-relative / indirect
• Testing conditions: whether to jump at all
  • For (conditional) branches only
  • Compare-and-branch / condition codes / condition registers

Control Transfers I: Targets
• The issues
  • How far (statically) do you need to jump? (within function vs. outside)
  • Do you need to jump to a different place each time?
  • How many bits do you need to encode the target?
• PC-relative
  • Position-independent within a procedure
  • Used for branches and jumps within a procedure
• Absolute
  • Position-independent outside a procedure
  • Used for procedure calls
• Indirect (target found in register)
  • Needed for jumping to dynamic targets
  • For returns, dynamic procedure calls, switch statements
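A C sketch of how the three target styles above turn instruction bits into a next PC; the 4-byte instruction size and field widths are assumptions for a generic fixed-length ISA (the absolute case follows a MIPS-like 26-bit-target splice):

    #include <stdint.h>

    /* PC-relative: target encoded as a signed word offset from the branch.
       Position-independent within a procedure. */
    uint64_t target_pc_relative(uint64_t pc, int32_t offset_words) {
        return pc + 4 + ((int64_t)offset_words << 2);
    }

    /* Absolute: target bits give (most of) the address directly.
       Usable across procedures, e.g., for calls. */
    uint64_t target_absolute(uint64_t pc, uint32_t target_field) {
        /* MIPS-like: keep the high bits of PC+4, splice in the 26-bit field */
        return ((pc + 4) & ~0x0FFFFFFFULL) | ((uint64_t)target_field << 2);
    }

    /* Indirect: target is whatever a register holds — needed for returns,
       dynamic procedure calls, and switch statements. */
    uint64_t target_indirect(const uint64_t *regs, int r) {
        return regs[r];
    }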


Control Transfers II: Testing Conditions ISAs Also Include Support For… • Compare and branch insns • Operating systems & memory protection branch-less-than R1,10,target • Privileged mode +Simple – Two ALUs (for condition & target address) • System call (TRAP) –Extra latency • Exceptions & interrupts • Implicit condition codes (x86) • Interacting with I/O devices cmp R1,10 // sets “negative” CC/flag branch-neg target + More room for target, condition codes set “for free” • Multiprocessor support + Branch insn simple and fast – Implicit dependence is tricky • “Atomic” operations for synchronization • Conditions in regs, separate branch (MIPS) set-less-than R2,R1,10 • Data-level parallelism branch-not-equal-zero R2,target – Additional insns • Pack many values into a wide register + one ALU per insn, explicit dependence • Intel’s SSE2: 4x32-bit float-point values in 128-bit register > 80% of branches are (in)equalities/comparisons to 0 • Define parallel operations (four “adds” in one cycle)

RISC and CISC
• RISC: reduced-instruction set computer
  • Coined by Patterson in the early 80's
  • Berkeley RISC-I (Patterson), Stanford MIPS (Hennessy), IBM 801 (Cocke), PowerPC, ARM, SPARC, Alpha, PA-RISC
• CISC: complex-instruction set computer
  • Term didn't exist before "RISC"
  • x86, VAX, Motorola 68000, etc.

The RISC vs. CISC Debate
• Philosophical war (one of several) started in the mid 1980's
• RISC "won" the technology battles
• CISC won the high-end commercial war (1990s to today)
  • Compatibility a stronger force than anyone (but Intel) thought
• RISC won the embedded computing war


The Setup
• Pre 1980
  • Bad compilers (so assembly written by hand)
  • Complex, high-level ISAs (easier to write assembly)
• Around 1982
  • Moore's Law makes fast single-chip processors possible…
    …but only for small, simple ISAs
  • Performance advantage of "integration" was compelling
  • Compilers had to get involved in a big way
• RISC manifesto: create ISAs that…
  • Simplify single-chip implementation
  • Facilitate optimizing compilation

The RISC Tenets
• Single-cycle execution
  • CISC: many multicycle operations
• Hardwired control
  • CISC: microcoded multi-cycle operations
• Load/store architecture
  • CISC: register-memory and memory-memory
• Few memory addressing modes
  • CISC: many modes
• Fixed-length instruction format
  • CISC: many formats and lengths
• Reliance on compiler optimizations
  • CISC: hand assemble to get good performance
• Many registers (compilers are better at using them)
  • CISC: few registers


CISCs and RISCs
• The CISCs: x86, VAX (Virtual Address eXtension to PDP-11)
  • Variable length instructions: 1-321 bytes!!!
  • 14 GPRs + PC + stack-pointer + condition codes
  • Data sizes: 8, 16, 32, 64, 128 bit, decimal, string
  • Memory-memory instructions for all data sizes
  • Special insns: crc, insque, polyf, and a cast of hundreds
  • x86: "Difficult to explain and impossible to love"
• The RISCs: MIPS, PA-RISC, SPARC, PowerPC, Alpha, ARM
  • 32-bit instructions
  • 32 integer registers, 32 floating-point registers, load-store
  • 64-bit virtual address space
  • Few addressing modes (Alpha has 1, SPARC/PowerPC more)
  • Why so many? Everyone wanted their own

The Debate
• RISC argument
  • CISC is fundamentally handicapped by complexity
  • For a given technology, RISC will be better (faster)
  • Current technology enables single-chip RISC
    • When it enables single-chip CISC, RISC will be pipelined
    • When it enables pipelined CISC, RISC will have caches
    • When it enables CISC with caches, RISC will have the next thing…
• CISC rebuttal
  • CISC flaws not fundamental, fixable with more transistors
  • Moore's Law will narrow the RISC/CISC gap (true)
    • Good pipeline: RISC = 100K transistors, CISC = 300K
    • By 1995: 2M+ transistors had evened the playing field
  • Software costs dominate, compatibility is paramount

Current Winner (Volume): RISC
• ARM (Acorn RISC Machine → Advanced RISC Machine)
  • First ARM chip in the mid-1980s (from Acorn Computer Ltd.)
  • 1.2 billion units sold in 2004 (>50% of all 32/64-bit CPUs)
  • Low-power and embedded devices (iPod, for example)
  • Significance of embedded? ISA compatibility a less powerful force
• 32-bit RISC ISA
  • 16 registers, PC is one of them
  • Many addressing modes, e.g., auto-increment
  • Condition codes, each instruction can be conditional
• Multiple implementations
  • XScale (design was DEC's, bought by Intel, sold to Marvell)
  • Others: Freescale (was Motorola), Texas Instruments, STMicroelectronics, Samsung, Sharp, Philips, etc.

Current Winner (Revenue): CISC
• x86 was the first 16-bit microprocessor by ~2 years
  • IBM put it into its PCs because there was no competing choice
  • Rest is historical inertia and "financial feedback"
• x86 is the most difficult ISA to implement and do it fast, but…
  • Because Intel sells the most non-embedded processors…
  • It has the most money…
  • Which it uses to hire more and better engineers…
  • Which it uses to maintain competitive performance…
  • And given competitive performance, compatibility wins…
  • So Intel sells the most non-embedded processors…
• AMD as a competitor keeps pressure on x86 performance
• Moore's Law has helped Intel in a big way
  • Most engineering problems can be solved with more transistors


Intel's Compatibility Trick: RISC Inside
• 1993: Intel wanted out-of-order execution in the Pentium Pro
  • Hard to do with a coarse-grain ISA like x86
• Solution? Translate x86 into RISC micro-ops in hardware
  push $eax
  becomes (we think, uops are proprietary)
  store $eax [$esp-4]
  addi $esp,$esp,-4
  + Processor maintains x86 ISA externally for compatibility
  + But executes RISC ISA internally for implementability
• Given the translator, x86 almost as easy to implement as RISC
  • Intel implemented out-of-order before any RISC company
  • Also, OoO benefits x86 more (because ISA limits the compiler)
• Idea co-opted by other x86 companies: AMD and Transmeta

Enter Micro-Ops (1)
• Most instructions are a single micro-op (uop)
  • Add, xor, compare, branch, etc.
  • Load example: mov -4(%rax), %ebx
  • Store example: mov %ebx, -4(%rax)
• Each operation on a memory location → micro-ops++
  • "addl -4(%rax), %ebx" = 2 uops (load, add)
  • "addl %ebx, -4(%rax)" = 3 uops (load, add, store)
• What about address generation?
  • Simple address generation: single micro-op
  • Complicated (scaled) addressing & sometimes store addresses: calculated separately
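A C sketch of the cracking described above: a memory-source add becomes load + add uops, and a memory-destination add becomes load + add + store. The uop struct, opcode names, and the internal temporary register are invented for illustration; real uops are proprietary, as the slides note:

    #include <stdint.h>

    typedef enum { UOP_LOAD, UOP_STORE, UOP_ADD } UopKind;

    typedef struct {
        UopKind kind;
        int     dst, src1, src2;   /* register numbers (or an internal temp) */
        int     base;              /* base register of the memory operand */
        int32_t disp;              /* displacement of the memory operand */
    } Uop;

    /* "addl -4(%rax), %ebx"  (memory source)      -> 2 uops: load, add */
    int crack_add_mem_src(int dst_reg, int base, int32_t disp, Uop out[]) {
        int tmp = 100;  /* hypothetical internal temporary register */
        out[0] = (Uop){ UOP_LOAD, tmp, 0, 0, base, disp };
        out[1] = (Uop){ UOP_ADD,  dst_reg, dst_reg, tmp, 0, 0 };
        return 2;
    }

    /* "addl %ebx, -4(%rax)"  (memory destination) -> 3 uops: load, add, store */
    int crack_add_mem_dst(int src_reg, int base, int32_t disp, Uop out[]) {
        int tmp = 100;
        out[0] = (Uop){ UOP_LOAD,  tmp, 0, 0, base, disp };
        out[1] = (Uop){ UOP_ADD,   tmp, tmp, src_reg, 0, 0 };
        out[2] = (Uop){ UOP_STORE, 0, tmp, 0, base, disp };
        return 3;
    }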


Enter Micro-Ops (2)
• Function call (CALL) – 4 uops
  • Get target address, store program counter to stack, adjust stack pointer, unconditional jump to function start
• Return from function (RET) – 3 uops
  • Adjust stack pointer, load return address from stack, jump to return address
• Other operations
  • String-manipulation instructions
  • For example, STOS is around six micro-ops, etc.
• Micro-ops: part of the implementation, not the architecture

Cracking Macro-Ops into Micro-Ops
• Two forms of μop "cracking"
  • Hard-coded logic: fast, but expensive (for insns cracked into few μops)
    • Simple decoder: 1 → 1
    • Complex decoder: 1 → 2-4
    • ~4x in size
  • Table lookup: slow, but "off to the side"
    • Doesn't complicate the rest of the machine
    • Handles really complicated instructions
[Figure: fetched instructions (add, stos, cmp, ret, sub) flow through one complex and several simple decoders; a complicated insn like stos is cracked via a big table off to the side]

Micro-Op Changes over Time
• x86 code is becoming more "RISC-like"
• IA32 → x86-64:
  1. Double the number of registers
  2. Better function-calling conventions
• Result? Fewer pushes, pops, and complicated instructions
  • ~1.6 μops / macro-op → ~1.1 μops / macro-op
• Fusion: Intel's newest processors fuse certain instruction pairs
  • Macro-op fusion: fuses "compare" and "branch" instructions
    • 2 macro-ops → 1 simple micro-op (uses simple decoder)
  • Micro-op fusion: fuses ld/add pairs, fuses store "addr" & "data"
    • 1 complex macro-op → 1 simple macro-op (uses simple decoder)

Ultimate Compatibility Trick
• Support an old ISA with…
  • …a simple processor for that ISA in the system
• How the first Itanium supported x86 code
  • x86 processor (comparable to a Pentium) on chip
• How the PlayStation 2 supported PlayStation games
  • Used the PlayStation processor for its I/O chip & emulation


Translation and Virtual ISAs
• New compatibility interface: ISA + translation software
  • Binary translation: transform the static image, run native
  • Emulation: unmodified image, interpret each dynamic insn, optimize on-the-fly
  • Examples: FX!32 (x86 on Alpha), Rosetta (PowerPC on x86)
• Virtual ISAs: designed for translation, not direct execution
  • Target for high-level compiler (one per language)
  • Source for low-level translator (one per ISA)
  • Examples: Java bytecodes, C# CLR (Common Language Runtime)
• Transmeta's Code Morphing: x86 translation in software
  • Only the "code morphing" translation software is written in the native ISA
  • Native ISA is invisible to applications and even the OS
  • Guess who owns this technology now?


RISC & CISC for Performance
• Recall the performance equation:
  seconds/program = (instructions/program) × (cycles/instruction) × (seconds/cycle)
• CISC (Complex Instruction Set Computing)
  • insns/program ↓, cycles/insn ↑, seconds/cycle ↑
  + Easy for assembly-level programmers
  + Good code density
• RISC (Reduced Instruction Set Computing)
  • insns/program ↑ (hopefully not too much), cycles/insn ↓, seconds/cycle ↓ (if designed aggressively)
  + Smart compilers can help with insns/program
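A worked plug-in of the equation, only to show how the arrows above trade off; the instruction counts, CPIs, and clock rates below are made-up illustrative numbers, not measurements:

    #include <stdio.h>

    /* seconds/program = (insns/program) * (cycles/insn) * (seconds/cycle) */
    static double exec_time(double insns, double cpi, double clock_hz) {
        return insns * cpi * (1.0 / clock_hz);
    }

    int main(void) {
        /* Hypothetical design points: the CISC-ish one runs fewer insns at a
           higher CPI; the RISC-ish one runs more insns at a lower CPI and,
           here, a faster clock. */
        double cisc = exec_time(1.0e9, 3.0, 2.0e9);   /* = 1.50 s */
        double risc = exec_time(1.3e9, 1.2, 2.5e9);   /* ~ 0.62 s */
        printf("CISC-ish: %.2f s   RISC-ish: %.2f s\n", cisc, risc);
        return 0;
    }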
