ISA Supplement

Lecture 04: ISA Principles Supplements
CSE 564 Computer Architecture, Summer 2017
Department of Computer Science and Engineering
Yonghong Yan
[email protected]
www.secs.oakland.edu/~yan

Contents

1. Introduction
2. Classifying Instruction Set Architectures
3. Memory Addressing
4. Type and Size of Operands
5. Operations in the Instruction Set
6. Instructions for Control Flow
7. Encoding an Instruction Set
8. Crosscutting Issues: The Role of Compilers
9. RISC-V ISA
• Supplements

Lecture 03 Supplements

• MIPS ISA
• RISC vs CISC
• Compiler compilation stages
• ISA Historical – Appendix L
• Comparison of ISA – Appendix K

Putting It All Together: The MIPS Architecture (A Simple 64-bit Load-Store Architecture)

• Use general-purpose registers with a load-store architecture.
• Support these addressing modes: displacement (with an address offset of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
• Support these simple instructions: load, store, add, subtract, move register-register, and shift.
• Compare equal, compare not equal, compare less, branch, jump, call, and return.
• Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.

MIPS emphasized

• A simple load-store instruction set
• Design for pipelining efficiency
• Efficiency as a compiler target
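The addressing modes listed for MIPS can be illustrated with a small effective-address calculation. The following is a minimal sketch, not MIPS reference code: the names `sign_extend`, `effective_address`, and `MASK64` are made up for this illustration, and the 16-bit offset width is an assumption taken from the upper end of the 12 to 16 bit range. It shows that register indirect addressing is simply displacement addressing with an offset of zero.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address space


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def effective_address(base, disp=0, disp_bits=16):
    """Displacement addressing: base register plus sign-extended offset.

    With disp == 0 this degenerates to register indirect addressing.
    """
    return (base + sign_extend(disp, disp_bits)) & MASK64


print(hex(effective_address(0x1000, 0x0010)))  # positive offset: 0x1010
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xFFFC encodes -4: 0xffc
print(hex(effective_address(0x1000)))          # register indirect: 0x1000
```

Folding register indirect into the displacement mode is one reason a load-store ISA can get by with very few addressing modes in its load and store encodings.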
Figure: Instruction layout for MIPS
Figure: The load and store instructions in MIPS
Figure: Examples of arithmetic/logical instructions
Figure: Typical control flow instructions in MIPS
Figure: Subset of the instructions in MIPS64
Figure: MIPS dynamic instruction mix for five SPECint2000 programs
Figure: MIPS dynamic instruction mix for five SPECfp2000 programs
Figure: Graphical display of instructions
Figure: Ratio of execution time and code size for compiled code versus handwritten code

Summary: Instruction Set Design (MIPS)

• Use general-purpose registers with a load-store architecture: YES
• Provide at least 16 general-purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
• Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES, 16 bits for immediate and displacement (disp = 0 => register deferred)
• All addressing modes apply to all data transfer instructions: YES
• Use fixed instruction encoding if interested in performance and variable instruction encoding if interested in code size: Fixed
• Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating-point numbers: YES
• Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES
• Aim for a minimalist instruction set: YES

RISC vs. CISC

• CISC (complex instruction set computer)
  – VAX, Intel x86, IBM 360/370, etc.
• RISC (reduced instruction set computer)
  – MIPS, DEC Alpha, Sun SPARC, IBM 801
• Characteristics of ISAs

  CISC                          RISC
  Variable-length instruction   Single-word instruction
  Variable format               Fixed-field decoding
  Memory operands               Load/store architecture
  Complex operations            Simple operations

RISC vs. CISC Instruction Set Design

• The historical background:
  – In the first 25 years (1945-70), performance came from both technology and design.
  – Design constraints:
    • Small and slow memories: compact programs are fast.
    • Small number of registers: memory operands.
    • Attempts to bridge the semantic gap: model high-level language features in instructions.
    • No need for portability: same vendor for application, OS, and hardware.
    • Backward compatibility: every new ISA must carry the good and bad of all past ones.
  – Result: powerful and complex instructions that are rarely used.
  – IC technology and microprocessors in the 1970s: lower costs, low power consumption, higher clock rates, cheaper and larger memories.
• Emergence of RISC
  – Very large scale integration (processor on a chip): silicon real estate at a premium. The micro-store occupies about 70% of chip area: replace the micro-store with registers ==> load/store ISA.
  – Increased difference between CPU and memory speeds.
  – Complex instructions were not used by new compilers.
  – Software changes:
    • Reduced reliance on assembly programming, so a new ISA can be introduced.
    • Standardized, vendor-independent OS (Unix) became very popular in some market segments (academia and research): need for portability.
  – Early RISC projects: IBM 801 (America), Berkeley SPUR, RISC I and RISC II, and Stanford MIPS.

Complex vs. Simple Instructions

• Complex instruction: an instruction that does a lot of work, e.g. many operations
  – Insert in a doubly linked list
  – Compute FFT
  – String copy
• Simple instruction: an instruction that does a small amount of work; it is a primitive from which complex operations can be built
  – Add
  – XOR
  – Multiply
• Advantages of complex instructions
  + Denser encoding -> smaller code size -> better memory utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions)
  + Simpler compiler: no need to optimize small instructions as much
• Disadvantages of complex instructions
  - Larger chunks of work -> compiler has less opportunity to optimize (limited in the fine-grained optimizations it can do)
  - More complex hardware -> translation from a high level to control signals, and optimization, must be done by hardware

ISA-level Tradeoffs: Semantic Gap

• Where to place the ISA? Semantic gap
  – Closer to high-level language (HLL) -> small semantic gap, complex instructions
  – Closer to hardware control signals? -> large semantic gap, simple instructions
• RISC vs. CISC machines
  – RISC: reduced instruction set computer
  – CISC: complex instruction set computer
    • FFT, QUICKSORT, POLY, FP instructions?
    • VAX INDEX instruction (array access with bounds checking)
• Some tradeoffs (for you to think about)
• Simple compiler, complex hardware vs. complex compiler, simple hardware
  – Caveat: translation (indirection) can change the tradeoff!
• Burden of backward compatibility
• Performance?
  – Optimization opportunity: example of the VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?
  – Instruction size, code size

x86: Small Semantic Gap: String Operations

• An instruction operates on a string
  – Move one string of arbitrary length to another location
  – Compare two strings
• Enabled by the ability to specify repeated execution of an instruction (in the ISA)
  – Using a “prefix” called the REP prefix
• Example: REP MOVS instruction
  – Only two bytes: REP prefix byte and MOVS opcode byte (F3 A4)
  – Implicit source and destination registers pointing to the two strings (ESI, EDI)
  – Implicit count register (ECX) specifies how long the string is

REP MOVS (DEST SRC)

How many instructions does this take in MIPS?

Small Semantic Gap Examples in VAX

• FIND FIRST
  – Find the first set bit in a bit field
  – Helps OS resource allocation operations
• SAVE CONTEXT, LOAD CONTEXT
  – Special context switching instructions
• INSQUEUE, REMQUEUE
  – Operations on doubly linked list
• INDEX
  – Array access with bounds checking
• STRING Operations
  – Compare strings, find substrings, …
• Cyclic Redundancy Check Instruction
• EDITPC
  – Implements editing functions to display fixed format output
• Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.

Small versus Large Semantic Gap

• CISC vs. RISC
  – Complex instruction set computer -> complex instructions
    • Initially motivated by “not good enough” code generation
  – Reduced instruction set computer -> simple instructions
    • John Cocke, mid 1970s, IBM 801
      – Goal: enable better compiler control and optimization
• RISC motivated by
  – Memory stalls (no work done in a complex instruction when there is a memory stall?)
    • When is this correct?
  – Simplifying the hardware -> lower cost, higher frequency
  – Enabling the compiler to optimize the code better
    • Find fine-grained parallelism to reduce stalls

How High or Low Can You Go?

• Very large semantic gap
  – Each instruction specifies the complete set of control signals in the machine
  – Compiler generates control signals
  – Open microcode (John Cocke, circa 1970s)
    • Gave way to optimizing compilers
• Very small semantic gap
  – ISA is (almost) the same as high-level language
  – Java machines, LISP machines, object-oriented machines, capability-based machines

A Note on ISA Evolution

• ISAs have evolved to reflect/satisfy the concerns of the day
• Examples:
  – Limited on-chip and off-chip memory size
  – Limited compiler optimization technology
  – Limited memory bandwidth
  – Need for specialization in important applications (e.g., MMX)
• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA
  – Concept of dynamic/static interface
  – Contrast it with hardware/software interface

Effect of Translation

• One can translate from one ISA to another ISA to change the semantic gap tradeoffs
• Examples
  – Intel’s and AMD’s x86 implementations translate x86 instructions into programmer-invisible microoperations (simple instructions) in hardware
  – Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW instructions in software (code morphing software)
• Think
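The string-operation slides ask how many instructions a REP MOVS copy costs on a load-store machine. One rough way to explore the question is to model both styles and count. This is only a sketch under stated assumptions: `rep_movsb` and `load_store_copy` are hypothetical names, the load-store loop copies one byte per iteration with no unrolling or word-wide moves, and each iteration is charged six simple instructions (load byte, store byte, three pointer/counter updates, one branch). A real compiler or hand-tuned routine would likely do better.

```python
def rep_movsb(mem, src, dst, count):
    """Model of an x86-style string copy (forward direction).

    The hardware repeats the byte move `count` times, but the program
    contains only a single instruction.
    """
    for i in range(count):
        mem[dst + i] = mem[src + i]
    return 1  # instructions in the program for the whole copy


def load_store_copy(mem, src, dst, count):
    """Naive byte-copy loop on an assumed load-store machine.

    Returns the number of simple instructions executed under the
    six-per-iteration assumption described in the lead-in.
    """
    executed = 0
    while count != 0:
        tmp = mem[src]   # load byte
        mem[dst] = tmp   # store byte
        src += 1         # add immediate (advance source pointer)
        dst += 1         # add immediate (advance destination pointer)
        count -= 1       # add immediate (decrement counter)
        executed += 6    # the five above plus the loop branch
    return executed


a = bytearray(b"abcdef") + bytearray(6)
b = bytearray(b"abcdef") + bytearray(6)
print(rep_movsb(a, 0, 6, 6))        # 1
print(load_store_copy(b, 0, 6, 6))  # 36
print(a == b)                       # True
```

The count says nothing about execution time: both versions still perform one load and one store per byte, which is the kind of tradeoff the semantic-gap discussion above asks the reader to think about.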
