ISA Supplement
Lecture 04: ISA Principles Supplements
CSE 564 Computer Architecture Summer 2017
Department of Computer Science and Engineering
Yonghong Yan
[email protected]
www.secs.oakland.edu/~yan

Contents
1. Introduction
2. Classifying Instruction Set Architectures
3. Memory Addressing
4. Type and Size of Operands
5. Operations in the Instruction Set
6. Instructions for Control Flow
7. Encoding an Instruction Set
8. Crosscutting Issues: The Role of Compilers
9. RISC-V ISA
• Supplements

Lecture 03 Supplements
• MIPS ISA
• RISC vs CISC
• Compiler compilation stages
• ISA Historical – Appendix L
• Comparison of ISA – Appendix K

Putting It All Together: the MIPS Architecture (a simple 64-bit load-store architecture)
• Use general-purpose registers with a load-store architecture.
• Support these addressing modes: displacement (with an address offset of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.

Putting It All Together: the MIPS Architecture (continued)
• Support these simple instructions: load, store, add, subtract, move register-register, and shift.
• Compare equal, compare not equal, compare less, branch, jump, call, and return.
• Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.

MIPS emphasized
• A simple load-store instruction set
• Design for pipelining efficiency
• Efficiency as a compiler target
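The three addressing modes listed for MIPS (displacement, immediate, register indirect) can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names, the 32-entry register list, and the choice of a 16-bit sign-extended field are assumptions made for illustration. It shows that register indirect is just displacement with an offset of zero.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide in this sketch


def sign_extend_16(value):
    """Treat the low 16 bits of value as a two's-complement signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, base, disp=0):
    """Displacement mode: address = regs[base] + sign-extended 16-bit disp.

    With disp == 0 this is register indirect (register deferred) addressing.
    """
    return (regs[base] + sign_extend_16(disp)) & MASK64


def add_immediate(regs, dest, src, imm):
    """Immediate mode: the 16-bit constant comes from the instruction itself."""
    regs[dest] = (regs[src] + sign_extend_16(imm)) & MASK64


regs = [0] * 32                                # hypothetical register file
regs[2] = 0x1000                               # base address in register 2
print(hex(effective_address(regs, 2, 8)))      # displacement of +8
print(hex(effective_address(regs, 2)))         # register indirect
add_immediate(regs, 3, 2, 0xFFFF)              # 0xFFFF sign-extends to -1
print(hex(regs[3]))
```

Folding register indirect into displacement keeps the number of distinct modes the hardware must decode small, which fits the load-store design goals listed above.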

Instruction layout for MIPS
The load and store instructions in MIPS
Examples of arithmetic/logical instructions
Typical control flow instructions in MIPS
Subset of the instructions in MIPS64
MIPS dynamic instruction mix for five SPECint2000 programs
MIPS dynamic instruction mix for five SPECfp2000 programs
Graphical display of instructions
Ratio of execution time and code size for compiled code versus handwritten code

Summary: Instruction Set Design (MIPS)
• Use general purpose registers with a load-store architecture: YES
• Provide at least 16 general purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
• Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES, 16 bits for immediate and displacement (disp = 0 gives register deferred)
• All addressing modes apply to all data transfer instructions: YES
• Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
• Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers: YES
• Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES
• Aim for a minimalist instruction set: YES

RISC vs. CISC
• CISC (complex instruction set computer)
  – VAX, Intel X86, IBM 360/370, etc.
• RISC (reduced instruction set computer)
  – MIPS, DEC Alpha, Sun SPARC, IBM 801

RISC vs. CISC (continued)
• Characteristics of ISAs

  CISC                          RISC
  Variable length instruction   Single word instruction
  Variable format               Fixed-field decoding
  Memory operands               Load/store architecture
  Complex operations            Simple operations

RISC vs. CISC Instruction Set Design
• The historical background:
  – In the first 25 years (1945-70) performance came from both technology and design.
  – Design constraints:
    • small and slow memories: compact programs are fast.
    • small number of registers: memory operands.
    • attempts to bridge the semantic gap: model high-level language features in instructions.
    • no need for portability: same vendor for application, OS, and hardware.
    • backward compatibility: every new ISA must carry the good and bad of all past ones.
  – Result: powerful and complex instructions that are rarely used.
  – IC technology and microprocessors in the 1970s: lower costs, low power consumption, higher clock rates, cheaper and larger memories.

RISC vs. CISC Instruction Set Design (continued)
• Emergence of RISC
  – Very large scale integration (processor on a chip): silicon real estate at a premium. Micro-store occupies about 70% of chip area: replace micro-store with registers ==> load/store ISA.
  – Increased difference between CPU and memory speeds.
  – Complex instructions were not used by new compilers.
  – Software changes:
    • reduced reliance on assembly programming, so a new ISA can be introduced.
    • standardized vendor-independent OS (Unix) became very popular in some market segments (academia and research): need for portability.
  – Early RISC projects: IBM 801 (America), Berkeley SPUR, RISC I and RISC II, and Stanford MIPS.

Complex vs. Simple Instructions
• Complex instruction: an instruction does a lot of work, e.g. many operations
  – Insert in a doubly linked list
  – Compute FFT
  – String copy
• Simple instruction: an instruction does a small amount of work; it is a primitive from which complex operations can be built
  – Add
  – XOR
  – Multiply

Complex vs. Simple Instructions (continued)
• Advantages of complex instructions
  + Denser encoding → smaller code size → better memory utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions)
  + Simpler compiler: no need to optimize small instructions as much
• Disadvantages of complex instructions
  - Larger chunks of work → compiler has less opportunity to optimize (limited in the fine-grained optimizations it can do)
  - More complex hardware → translation from a high level to control signals and optimization needs to be done by hardware

ISA-level Tradeoffs: Semantic Gap
• Where to place the ISA? Semantic gap
  – Closer to high-level language (HLL) → small semantic gap, complex instructions
  – Closer to hardware control signals? → large semantic gap, simple instructions
• RISC vs. CISC machines
  – RISC: Reduced instruction set computer
  – CISC: Complex instruction set computer
    • FFT, QUICKSORT, POLY, FP instructions?
    • VAX INDEX instruction (array access with bounds checking)

ISA-level Tradeoffs: Semantic Gap (continued)
• Some tradeoffs (for you to think about)
• Simple compiler, complex hardware vs. complex compiler, simple hardware
  – Caveat: Translation (indirection) can change the tradeoff!
• Burden of backward compatibility
• Performance?
  – Optimization opportunity: example of the VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?
  – Instruction size, code size

X86: Small Semantic Gap: String Operations
• An instruction operates on a string
  – Move one string of arbitrary length to another location
  – Compare two strings
• Enabled by the ability to specify repeated execution of an instruction (in the ISA)
  – Using a “prefix” called the REP prefix
• Example: REP MOVS instruction
  – Only two bytes: REP prefix byte and MOVS opcode byte (F3 A4)
  – Implicit source and destination registers pointing to the two strings (ESI, EDI)
  – Implicit count register (ECX) specifies how long the string is

X86: Small Semantic Gap: String Operations (continued)
REP MOVS (DEST SRC)
How many instructions does this take in MIPS?

Small Semantic Gap Examples in VAX
• FIND FIRST
  – Find the first set bit in a bit field
  – Helps OS resource allocation operations
• SAVE CONTEXT, LOAD CONTEXT
  – Special context switching instructions
• INSQUEUE, REMQUEUE
  – Operations on doubly linked list
• INDEX
  – Array access with bounds checking
• STRING Operations
  – Compare strings, find substrings, …
• Cyclic Redundancy Check Instruction
• EDITPC
  – Implements editing functions to display fixed format output
• Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.

Small versus Large Semantic Gap
• CISC vs. RISC
  – Complex instruction set computer → complex instructions
    • Initially motivated by “not good enough” code generation
  – Reduced instruction set computer → simple instructions
    • John Cocke, mid 1970s, IBM 801
    • Goal: enable better compiler control and optimization
• RISC motivated by
  – Memory stalls (no work done in a complex instruction when there is a memory stall?)
    • When is this correct?
  – Simplifying the hardware → lower cost, higher frequency
  – Enabling the compiler to optimize the code better
    • Find fine-grained parallelism to reduce stalls

How High or Low Can You Go?
• Very large semantic gap
  – Each instruction specifies the complete set of control signals in the machine
  – Compiler generates control signals
  – Open microcode (John Cocke, circa 1970s)
    • Gave way to optimizing compilers
• Very small semantic gap
  – ISA is (almost) the same as the high-level language
  – Java machines, LISP machines, object-oriented machines, capability-based machines

A Note on ISA Evolution
• ISAs have evolved to reflect/satisfy the concerns of the day
• Examples:
  – Limited on-chip and off-chip memory size
  – Limited compiler optimization technology
  – Limited memory bandwidth
  – Need for specialization in important applications (e.g., MMX)
• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA
  – Concept of dynamic/static interface
  – Contrast it with hardware/software interface

Effect of Translation
• One can translate from one ISA to another ISA to change the semantic gap tradeoffs
• Examples
  – Intel’s and AMD’s x86 implementations translate x86 instructions into programmer-invisible micro-operations (simple instructions) in hardware
  – Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW instructions in software (code morphing software)
• Think