Instruction Set Architecture (ISA)

Application • What is a good ISA? OS • Aspects of ISAs Firmware CIS 501 • RISC vs. CISC Introduction to Architecture CPU I/O • Implementing CISC: µISA Memory

Digital Circuits Unit 2: Instruction Set Architecture Gates & Transistors

CIS 501 (Martin/Roth): Instruction Set Architectures 1 CIS 501 (Martin/Roth): Instruction Set Architectures 2

Readings What Is An ISA?

• H+P • ISA (instruction set architecture) • Chapter 2 • A well-define hardware/software interface • Further reading: • Appendix (RISC) and Appendix () • The “contract” between software and hardware • Functional definition of operations, modes, and storage • Available from web page locations supported by hardware • Precise description of how to invoke, and access them • Paper • The Evolution of RISC Technology at IBM by John Cocke • No guarantees regarding • How operations are implemented • Much of this chapter will be “on your own reading” • operations are fast and which are slow and when • Which operations take more power and which take less • Hard to talk about ISA features without knowing what they do • We will revisit many of these issues in context

CIS 501 (Martin/Roth): Instruction Set Architectures 3 CIS 501 (Martin/Roth): Instruction Set Architectures 4 A Language Analogy for ISAs RISC vs CISC Foreshadowing

• A ISA is analogous to a human language • Recall performance equation: • Allows communication • (instructions/program) * (cycles/instruction) * (seconds/cycle) • Language: person to person • ISA: hardware to software • CISC (Complex Instruction Set ) • Need to speak the same language/ISA • Improve “instructions/program” with “complex” instructions • Many common aspects • Easy for assembly-level , good code density • Part of speech: verbs, nouns, adjectives, adverbs, etc. • Common operations: calculation, control/branch, memory • RISC (Reduced Instruction Set Computing) • Many different languages/ISAs, many similarities, many differences • Improve “cycles/instruction” with many single-cycle instructions • Different structure • Increases “instruction/program”, but hopefully not as much • Both evolve over time • Help from smart compiler • Key differences: ISAs must be unambiguous • Perhaps improve clock cycle time (seconds/cycle) • ISAs are explicitly engineered and extended • via aggressive implementation allowed by simpler instructions

CIS 501 (Martin/Roth): Instruction Set Architectures 5 CIS 501 (Martin/Roth): Instruction Set Architectures 6

What Makes a Good ISA? Programmability

• Programmability • Easy to express programs efficiently? • Easy to express programs efficiently? • For whom? • Implementability • Easy to design high-performance implementations? • Before 1985: human • More recently • were terrible, most code was hand-assembled • Easy to design low-power implementations? • Want high-level coarse-grain instructions • Easy to design high-reliability implementations? • As similar to high-level language as possible • Easy to design low-cost implementations? • Compatibility • After 1985: compiler • Easy to maintain programmability (implementability) as languages • Optimizing compilers generate much better code that you or I and programs (technology) evolves? • Want low-level fine-grain instructions • x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumII, PentiumIII, Pentium4,… • Compiler can’t tell if two high-level idioms match exactly or not

CIS 501 (Martin/Roth): Instruction Set Architectures 7 CIS 501 (Martin/Roth): Instruction Set Architectures 8 Human Programmability Compiler Programmability

• What makes an ISA easy for a human to program in? • What makes an ISA easy for a compiler to program in? • Proximity to a high-level language (HLL) • Low level primitives from which solutions can be synthesized • Closing the “semantic gap” • Wulf: “primitives not solutions” • Semantically heavy (CISC-like) insns that capture complete idioms • good at breaking complex structures to simple ones • “Access array element”, “loop”, “procedure call” • Requires traversal • Example: SPARC save/restore • Not so good at combining simple structures into complex ones • Bad example: x86 rep movsb (copy string) • Requires search, pattern matching (why AI is hard) • Ridiculous example: VAX insque (insert-into-queue) • Easier to synthesize complex insns than to compare them • “Semantic clash”: what if you have many high-level languages? • Rules of thumb • Stranger than fiction • Regularity: “principle of least astonishment” • People once thought computers would execute language directly • Orthogonality & composability • Fortunately, never materialized (but keeps coming back around) • One-vs.-all

CIS 501 (Martin/Roth): Instruction Set Architectures 9 CIS 501 (Martin/Roth): Instruction Set Architectures 10

Today’s Semantic Gap Implementability

• Popular argument • Every ISA can be implemented • Today’s ISAs are targeted to one language… • Not every ISA can be implemented efficiently • Just so happens that this language is very low level • The C • Classic high-performance implementation techniques • Pipelining, parallel execution, out-of-order execution (more later) • Will ISAs be different when Java/C# become dominant? • Object-oriented? Probably not • Certain ISA features make these difficult • Support for garbage collection? Maybe – Variable instruction lengths/formats: complicate decoding • Support for bounds-checking? Maybe – Implicit state: complicates dynamic scheduling • Why? – Variable latencies: complicates scheduling • Smart compilers transform high-level languages to simple – Difficult to interrupt instructions: complicate many things instructions • Any benefit of tailored ISA is likely small

CIS 501 (Martin/Roth): Instruction Set Architectures 11 CIS 501 (Martin/Roth): Instruction Set Architectures 12 Compatibility The Compatibility Trap

• No-one buys new hardware… if it requires new software • Easy compatibility requires forethought • was the first company to realize this • Temptation: use some ISA extension for 5% performance gain • ISA must remain compatible, no matter what • Frequent outcome: gain diminishes, disappears, or turns to loss • x86 one of the worst designed ISAs EVER, but survives – Must continue to support gadget for eternity • As does IBM’s 360/370 (the first “ISA family”) • Backward compatibility • Example: register windows (SPARC) • New processors must support old programs (can’t drop features) • Adds difficulty to out-of-order implementations of SPARC • Very important • Details shortly • Forward (upward) compatibility • Old processors must support new programs (with software help) • New processors redefine only previously-illegal opcodes • Allow software to detect support for specific new instructions • Old processors emulate new instructions in low-level software

CIS 501 (Martin/Roth): Instruction Set Architectures 13 CIS 501 (Martin/Roth): Instruction Set Architectures 14

The Compatibility Trap Door Aspects of ISAs

• Compatibility’s friends • VonNeumann model • Trap: instruction makes low-level “function call” to OS handler • Implicit structure of all modern ISAs • Nop: “no operation” - instructions with no functional semantics • Format • Backward compatibility • Length and encoding • Handle rarely used but hard to implement “legacy” opcodes • Operand model • Define to trap in new implementation and emulate in software • Where (other than memory) are operands stored? • Rid yourself of some ISA mistakes of the past • Datatypes and operations • Problem: performance suffers • Control • Forward compatibility • Reserve sets of trap & opcodes (don’t define uses) • Add ISA functionality by overloading traps • Overview only • Release firmware patch to “add” to old implementation • Read about the rest in the book and appendices • Add ISA hints by overloading nops

CIS 501 (Martin/Roth): Instruction Set Architectures 15 CIS 501 (Martin/Roth): Instruction Set Architectures 16 The Sequential Model Format

• Implicit model of all modern ISAs • Length • Often called VonNeuman, but in ENIAC before • Fixed length Fetch PC • Basic feature: the program (PC) • Most common is 32 bits Decode • Defines total order on dynamic instruction + Simple implementation: compute next PC using only PC Read Inputs • Next PC is PC++ unless insn says otherwise – Code density: 32 bits to increment a register by 1? Execute • Order and named storage define computation – x86 can do this in one 8-bit instruction • Value flows from insn X to Y via storage A iff… Write Output • Variable length • X names A as output, Y names A as input… Next PC – Complex implementation • And Y after X in total order + Code density • logically executes loop at left • Compromise: two lengths • Instruction execution assumed atomic • MIPS16 or ARM’s Thumb • Instruction X finishes before insn X+1 starts • Encoding • Alternatives have been proposed… • A few simple encodings simplify decoder implementation

CIS 501 (Martin/Roth): Instruction Set Architectures 17 CIS 501 (Martin/Roth): Instruction Set Architectures 18

Example: MIPS Format Operand Model: Memory Only

• Length • Where (other than memory) can operands come from? • 32-bits • And how are they specified? • Example: A = B + C • Encoding • Several options • 3 formats, simple encoding • Q: how many instructions can be encoded? A: 127 • Memory only add B,C,A mem[A] = mem[B] + mem[C] R-type Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)

I-type Op(6) Rs(5) Rt(5) Immed(16)

J-type Op(6) Target(26) MEM

CIS 501 (Martin/Roth): Instruction Set Architectures 19 CIS 501 (Martin/Roth): Instruction Set Architectures 20 Operand Model: Accumulator Operand Model: Stack

• Accumulator: implicit single element storage • Stack: TOS implicit in instructions load B ACC = mem[B] push B stk[TOS++] = mem[B] add C ACC = ACC + mem[C] push C stk[TOS++] = mem[C] store A mem[A] = ACC add stk[TOS++] = stk[--TOS] + stk[--TOS] pop A mem[A] = stk[--TOS]

ACC TOS

MEM MEM

CIS 501 (Martin/Roth): Instruction Set Architectures 21 CIS 501 (Martin/Roth): Instruction Set Architectures 22

Operand Model: Registers Operand Model Pros and Cons

• General-purpose register: multiple explicit accumulator • Metric I: static code size load B,R1 R1 = mem[B] • Number of instructions needed to represent program, size of each add C,R1 R1 = R1 + mem[C] • Want many implicit operands, high level instructions store R1,A mem[A] = R1 • Good ! bad: memory, accumulator, stack, load-store • Load-store: GPR and only loads/stores access memory • Metric II: memory traffic load B,R1 R1 = mem[B] • Number of move to and from memory load C,R2 R2 = mem[C] • Want as many long-lived operands in on-chip storage add R1,R2,R1 R1 = R1 + R2 • Good ! bad: load-store, stack, accumulator, memory store R1,A mem[A] = R1 • Metric III: • Want short (1 cycle?), little variability, few nearby dependences • Good ! bad: load-store, stack, accumulator, memory

MEM • Upshot: most new ISAs are load-store or hybrids

CIS 501 (Martin/Roth): Instruction Set Architectures 23 CIS 501 (Martin/Roth): Instruction Set Architectures 24 How Many Registers? Register Windows

• Registers faster than memory, have as many as possible? • Register windows: hardware activation records • No • Sun SPARC (from the RISC I) • One reason registers are faster is that there are fewer of them • 32 integer registers divided into: 8 global, 8 local, 8 input, 8 output • Small is fast (hardware truism) • Explicit save/restore instructions • Another is that they are directly addressed (no address calc) • Global registers fixed – More of them, means larger specifiers • save: inputs “pushed”, outputs ! inputs, locals zeroed – Fewer registers per instruction or indirect addressing • restore: locals zeroed, inputs ! outputs, inputs “popped” • Not everything can be put in registers • Hardware stack provides few (4) on-chip register frames • Structures, arrays, anything pointed-to • Spilled-to/filled-from memory on over/under flow • Although compilers are getting better at putting more things in – More registers means more saving/restoring + Automatic parameter passing, caller-saved registers + No memory traffic on shallow (<4 deep) call graphs • Upshot: trend to more registers: 8 (x86)!32 (MIPS) !128 (IA32) – Hidden memory operations (some restores fast, others slow) • 64-bit x86 has 16 64-bit integer and 16 128-bit FP registers – A nightmare for (more later)

CIS 501 (Martin/Roth): Instruction Set Architectures 25 CIS 501 (Martin/Roth): Instruction Set Architectures 26

Virtual Address Size Memory Addressing

• What is a n-bit processor? • : way of specifying address • Support memory size of 2n • Used in memory-memory or load/store instructions in register ISA • Alternative (wrong) definition: size of calculation operations • Examples • Virtual address size • Register-Indirect: R1=mem[R2] • Determines size of addressable (usable) memory • Displacement: R1=mem[R2+immed] • Current 32-bit or 64-bit address spaces • Index-base: R1=mem[R2+R3] • All ISAs moving to (if not already at) 64 bits • Most critical, inescapable ISA design decision • Memory-indirect: R1=mem[mem[R2]] • Too small? Will limit the lifetime of ISA • Auto-increment: R1=mem[R2], R2= R2+1 • May require nasty hacks to overcome (E.g., x86 segments) • Auto-indexing: R1=mem[R2+immed], R2=R2+immed • x86 evolution: • Scaled: R1=mem[R2+R3*immed1+immed2] • 4-bit (4004), 8-bit (8008), 16-bit (8086), 20-bit (80286), • PC-relative: R1=mem[PC+imm] • 32-bit + protected memory (80386) • What high-level program idioms are these used for? • 64-bit (AMD’s & Intel’s EM64T Pentium4)

CIS 501 (Martin/Roth): Instruction Set Architectures 27 CIS 501 (Martin/Roth): Instruction Set Architectures 28 Example: MIPS Addressing Modes Two More Addressing Issues

• MIPS implements only displacement • Access alignment: address % size == 0? • Aligned: load-word @XXXX00, load-half @XXXXX0 • Why? Experiment on VAX (ISA with every mode) found distribution • Unaligned: load-word @XXXX10, load-half @XXXXX1 • Disp: 61%, reg-ind: 19%, scaled: 11%, mem-ind: 5%, other: 4% • Question: what to do with unaligned accesses (uncommon case)? • 80% use small displacement or register indirect (displacement 0) • Support in hardware? Makes all accesses slow • Trap to software routine? Possibility • I-type instructions: 16-bit displacement • Use regular instructions • Load, shift, load, shift, and • Is 16-bits enough? • MIPS? ISA support: unaligned access using two instructions • Yes? VAX experiment showed 1% accesses use displacement >16 lwl @XXXX10; lwr @XXXX10

• Endian-ness: arrangement of bytes in a word I-type Op(6) Rs(5) Rt(5) Immed(16) • Big-endian: sensible order (e.g., MIPS, PowerPC) • A 4- integer: “00000000 00000000 00000010 00000011” is 515 • Little-endian: reverse order (e.g., x86) • SPARC adds Reg+Reg mode • A 4-byte integer: “00000011 00000010 00000000 00000000 ” is 515 • Why little endian? To be different? To be annoying? Nobody knows CIS 501 (Martin/Roth): Instruction Set Architectures 29 CIS 501 (Martin/Roth): Instruction Set Architectures 30

Control Instructions Example: MIPS Conditional Branches

• One issue: testing for conditions • MIPS uses combination of options II/III • Option I: compare and branch insns • Compare 2 registers and branch: beq, bne branch-less-than R1,10,target • Equality and inequality only + Simple, – two ALUs: one for condition, one for target address + Don’t need an for comparison • Option II: implicit condition codes • Compare 1 register to zero and branch: bgtz, bgez, bltz, blez subtract R2,R1,10 // sets “negative” CC • Greater/less than comparisons branch-neg target + Don’t need adder for comparison + Condition codes set “for free”, – implicit dependence is tricky • Set explicit condition registers: slt, sltu, slti, sltiu, etc. • Option III: condition registers, separate branch insns set-less-than R2,R1,10 • Why? branch-not-equal-zero R2,target • More than 80% of branches are (in)equalities or comparisons to 0 – Additional instructions, + one ALU per, + explicit dependence • OK to take two insns to do remaining branches (MCCF)

CIS 501 (Martin/Roth): Instruction Set Architectures 31 CIS 501 (Martin/Roth): Instruction Set Architectures 32 Control Instructions II MIPS Control Instructions

• Another issue: computing targets • MIPS uses all three • Option I: PC-relative • PC-relative conditional branches: bne, beq, blez, etc. • Position-independent within procedure • 16-bit relative offset, <0.1% branches need more • Used for branches and jumps within a procedure • Option II: Absolute I-type Op(6) Rs(5) Rt(5) Immed(16) • Position independent outside procedure • Absolute jumps unconditional jumps: j • Used for procedure calls • Option III: Indirect (target found in register) • 26-bit offset • Needed for jumping to dynamic targets J-type Op(6) Target(26) • Used for returns, dynamic procedure calls, • Indirect jumps: jr • How far do you need to jump? • Typically not so far within a procedure (they don’t get that big) R-type Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6) • Further from one procedure to another

CIS 501 (Martin/Roth): Instruction Set Architectures 33 CIS 501 (Martin/Roth): Instruction Set Architectures 34

Control Instructions III RISC and CISC

• Another issue: support for procedure calls? • RISC: reduced-instruction set computer • Link (remember) address of calling insn + 4 so we can return to it • Coined by Patterson in early 80’s • Berkeley RISC-I (Patterson), Stanford MIPS (Hennessy), IBM 801 (Cocke) • MIPS • Examples: PowerPC, ARM, SPARC, Alpha, PA-RISC • Implicit return address register is $31 • CISC: complex-instruction set computer • Direct jump-and-link: jal • Term didn’t exist before “RISC” • Indirect jump-and-link: jalr • x86, VAX, , etc.

• Religious war (one of several) started in mid 1980’s • RISC “won” the technology battles • CISC won the commercial war • Compatibility a stronger force than anyone (but Intel) thought • Intel beat RISC at its own game

CIS 501 (Martin/Roth): Instruction Set Architectures 35 CIS 501 (Martin/Roth): Instruction Set Architectures 36 The Setup The RISC Tenets

• Pre 1980 • Single-cycle execution • Bad compilers • CISC: many multicycle operations • Complex, high-level ISAs • Hardwired control • Slow multi-chip micro-programmed implementations • CISC: microcoded multi-cycle operations • Vicious feedback loop • Load/store architecture • Around 1982 • CISC: register-memory and memory-memory • Advances in VLSI made single-chip possible… • Few memory addressing modes • Speed by integration, on-chip wires much faster than off-chip • CISC: many modes • …but only for very small, very simple ISAs • Fixed instruction format • Compilers had to get involved in a big way • CISC: many formats and lengths • RISC manifesto: create ISAs that… • Reliance on compiler optimizations • Simplify single-chip implementation • CISC: hand assemble to get good performance • Facilitate optimizing compilation

CIS 501 (Martin/Roth): Instruction Set Architectures 37 CIS 501 (Martin/Roth): Instruction Set Architectures 38

The CISCs The RISCs

• DEC VAX (Virtual Address eXtension to PDP-11): 1977 • Many similar ISAs: MIPS, PA-RISC, SPARC, PowerPC, Alpha • Variable length instructions: 1-321 bytes!!! • 32-bit instructions • 14 GPRs + PC + stack-pointer + condition codes • 32 registers • Data sizes: 8, 16, 32, 64, 128 bit, decimal, string • 64-bit • Memory-memory instructions for all data sizes • Fews addressing modes (SPARC and PowerPC have more) • Special insns: crc, insque, polyf, and a cast of hundreds • Why so many? Everyone invented their own new ISA • Intel x86 (IA32): 1974 • “Difficult to explain and impossible to love” • Variable length instructions: 1-16 bytes • DEC Alpha (Extended VAX): 1990 • 8 special purpose registers + condition codes • The most recent, cleanest RISC ISA • Data sizes: 8,16,32,64 (new) bit (overlapping registers) • 64-bit data (32,16,8 added only after software vendor riots) • Accumulators (register and memory) for integer, stack for FP • Only aligned memory access • Many modes: indirect, scaled, displacement + segments • One addressing mode: displacement • Special insns: push, pop, string functions, MMX, SSE/2/3 (later) • Special instructions: conditional moves, prefetch hints

CIS 501 (Martin/Roth): Instruction Set Architectures 39 CIS 501 (Martin/Roth): Instruction Set Architectures 40 Post-RISC The RISC Debate

• Intel/HP IA64 (): 2000 • RISC argument [Patterson et al.] • Fixed length instructions: 128-bit 3-operation bundles • CISC is fundamentally handicapped • EPIC: explicitly parallel instruction computing • For a given technology, RISC implementation will be better (faster) • 128 64-bit registers • Current technology enables single-chip RISC • When it enables single-chip CISC, RISC will be pipelined • Special instructions: true predication, software speculation • When it enables pipelined CISC, RISC will have caches • Every new ISA feature suggested in last two decades • When it enables CISC with caches, RISC will have next thing... • More later in course • CISC rebuttal [Colwell et al.] • CISC flaws not fundamental, can be fixed with more transistors • Moore’s Law will narrow the RISC/CISC gap (true) • Good pipeline: RISC = 100K transistors, CISC = 300K • By 1995: 2M+ transistors had evened playing field • Software costs dominate, compatibility is important (so true)

CIS 501 (Martin/Roth): Instruction Set Architectures 41 CIS 501 (Martin/Roth): Instruction Set Architectures 42

Current Winner (units sold): ARM Current Winner (revenue): x86

• ARM (Advanced RISC Machine) • x86 was first 16-bit chip by ~2 years • First ARM chip in mid-1980s (from Acorn Computer Ltd). • IBM put it into its PCs because there was no competing choice • 1.2 billion units sold in 2004 • Rest is historical inertia and “financial feedback” • More than half of all 32/64-bit CPUs sold • x86 is most difficult ISA to implement and do it fast but… • Low-power and embedded devices (iPod, for example) • Because Intel sells the most non-embedded processors… • 32-bit RISC ISA • It has the most money… • 16 registers • Which it uses to hire more and better engineers… • Many addressing modes (for example, auto increment) • Which it uses to maintain competitive performance … • Condition codes, each instruction can be conditional • And given equal performance compatibility wins… • Multiple compatible implementations • So Intel sells the most non-embedded processors… • Intel’s X-scale (original design was DEC’s, bought by Intel) • AMD as a competitor keeps pressure on x86 performance • Others: Freescale (was Motorola), IBM, Texas Instruments, Nintendo, STMicroelectronics, Samsung, Sharp, Philips, etc. • Moore’s law has helped Intel in a big way • “Thumb” 16-bit wide instructions • Most engineering problems can be solved with more transistors • Increase code density

CIS 501 (Martin/Roth): Instruction Set Architectures 43 CIS 501 (Martin/Roth): Instruction Set Architectures 44 Intel’s Trick: RISC Inside ’s Take: Code Morphing

• 1993: Intel wanted out-of-order execution in Pentium Pro • Code morphing: x86 translation performed in software • OOO was very hard to do with a coarse grain ISA like x86 • Crusoe/Astro are x86 , no actual x86 hardware anywhere • Their solution? Translate x86 to RISC uops in hardware • Only “code morphing” translation software written in native ISA push $eax • Native ISA is invisible to applications, OS, even BIOS is translated (dynamically in hardware) to • Different Crusoe versions have (slightly) different ISAs: can’t tell store $eax [$esp-4] • How was it done? addi $esp,$esp,-4 • Code morphing software resides in boot ROM • Processor maintains x86 ISA externally for compatibility • On startup boot ROM hijacks 16MB of main memory • But executes RISC µISA internally for implementability • Translator loaded into 512KB, rest is translation • Translation itself is proprietary, but 1.6 uops per x86 insn • Software starts running in interpreter mode • Given translator, x86 almost as easy to implement as RISC • Interpreter profiles to find “hot” regions: procedures, loops • Result: Intel implemented OOO before any RISC company • Hot region compiled to native, optimized, cached • Idea co-opted by other x86 companies: AMD and Transmeta • Gradually, more and more of application starts running native • The one company that resisted (Cyrix) couldn’t keep up

CIS 501 (Martin/Roth): Instruction Set Architectures 45 CIS 501 (Martin/Roth): Instruction Set Architectures 46

Emulation/ Virtual ISAs

• Compatibility is still important but definition has changed • Machine virtualization • Less necessary that processor ISA be compatible • Vmware & Xen: x86 on x86 (what is this good for?) • As long as some combination of ISA + software translation layer is • Old idea (from IBM mainframes), big revival in the near future • Advances in emulation, binary translation have made this possible • Java and C# use an ISA-like interface • Binary-translation: transform static image, run native • JavaVM uses a stack-based bytecode • Emulation: unmodified image, interpret each dynamic insn • C# has the CLR (common language runtime) • Typically optimized with just-in-time (JIT) compilation • Higher-level than machine ISA • Examples • Design for translation (not direct execution) • FX!32: x86 on Alpha • Goals: • IA32EL: x86 on IA64 • Portability (abstract away the actual hardware) • Rosetta: PowerPC on x86 • Target for high-level compiler (one per language) • Downside: performance overheads • Source for low-level translator (one per ISA) • Flexibility over time

CIS 501 (Martin/Roth): Instruction Set Architectures 47 CIS 501 (Martin/Roth): Instruction Set Architectures 48 ISA Research Summary

• Compatibility killed ISA research for a while • What makes a good ISA • But binary translation/emulation has revived it • {Programm|Implement|Compat}-ability • Current projects • Compatibility is a powerful force • “ISA for Instruction-Level Distributed Processing” [Kim,Smith] • Compatibility and implementability: µISAs, binary translation • Multi-level exposes local/global communication • Aspects of ISAs • “DELI: Dynamic Execution Layer Interface” [HP] • CISC and RISC • An open translation/optimization/caching infrastructure • “WaveScalar” [Swanson,Shwerin,Oskin] • Next up: caches • The vonNeumann alternative • “DISE: Dynamic Instruction Stream Editor” [Corliss,Lewis,Roth] • A programmable µISA: µISA/binary-rewriting hybrid • Local project: http://…/~eclewis/proj/dise/

CIS 501 (Martin/Roth): Instruction Set Architectures 49 CIS 501 (Martin/Roth): Instruction Set Architectures 50