Lecture 04: ISA Principles Supplements
CSE 564 Computer Architecture Summer 2017
Department of Computer Science and Engineering Yonghong Yan [email protected] www.secs.oakland.edu/~yan
1 Contents
1. Introduction 2. Classifying Instruction Set Architectures 3. Memory Addressing 4. Type and Size of Operands 5. Operations in the Instruction Set 6. Instructions for Control Flow 7. Encoding an Instruction Set 8. Crosscutting Issues: The Role of Compilers 9. RISC-V ISA
• Supplements
2 Lecture 03 Supplements
• MIPS ISA
• RISC vs. CISC
• Compiler compilation stages
• ISA Historical – Appendix L
• Comparison of ISA – Appendix K
3 Putting it all together: the MIPS architecture (A simple 64-bit load-store architecture)
• Use general-purpose registers with a load-store architecture.
• Support these addressing modes: displacement (with an address offset of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
4 Putting it all together: the MIPS architecture (A simple 64-bit load-store architecture)
• Support these simple instructions: load, store, add, subtract, move register-register, and shift.
• Compare equal, compare not equal, compare less, branch, jump, call, and return.
• Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.
5 MIPS emphasized
• A simple load-store instruction set
• Design for pipelining efficiency
• Efficiency as a compiler target
6 Instruction layout for MIPS
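As a rough illustration of the layout, the sketch below packs and unpacks the three classic 32-bit MIPS formats (R-type, I-type, J-type). The function names (`encode_r`, `field_rs`, and so on) are made up for this example and do not come from the lecture; the field widths are the standard MIPS ones.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit MIPS instruction formats (field widths in bits):
 *   R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
 *   I-type: opcode(6) rs(5) rt(5) immediate(16)
 *   J-type: opcode(6) target(26)
 * All names below are illustrative, not taken from the lecture. */

uint32_t encode_r(uint32_t opcode, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(rd < 32 && shamt < 32 && funct < 64);
    return (opcode << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}

uint32_t encode_i(uint32_t opcode, uint32_t rs, uint32_t rt, int16_t imm) {
    assert(opcode < 64 && rs < 32 && rt < 32);
    /* Keep only the low 16 bits of the (possibly negative) immediate. */
    return (opcode << 26) | (rs << 21) | (rt << 16) | (uint16_t)imm;
}

uint32_t encode_j(uint32_t opcode, uint32_t target) {
    assert(opcode < 64 && target < (1u << 26));
    return (opcode << 26) | target;
}

/* Fixed-field decoding: each field always sits at the same bit position. */
uint32_t field_opcode(uint32_t insn) { return insn >> 26; }
uint32_t field_rs(uint32_t insn)     { return (insn >> 21) & 0x1F; }
uint32_t field_rt(uint32_t insn)     { return (insn >> 16) & 0x1F; }
uint32_t field_rd(uint32_t insn)     { return (insn >> 11) & 0x1F; }

/* Sign-extend the 16-bit immediate (relies on the usual two's-complement
 * narrowing conversion that mainstream compilers provide). */
int32_t field_imm(uint32_t insn)     { return (int16_t)(insn & 0xFFFF); }
```

For instance, `encode_r(0, 9, 10, 8, 0, 0x20)` yields the word for `add $t0, $t1, $t2`. Because the register fields sit at fixed positions, they can be extracted without first knowing the opcode, which is the usual argument for why fixed encoding suits pipelined decoding.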
7 The load and store instructions in MIPS
8 Examples of arithmetic/logical instructions
9 Typical control flow instructions in MIPS
10 Subset of the instructions in MIPS64
11 MIPS dynamic instruction mix for five SPECint2000 programs
12 MIPS dynamic instruction mix for five SPECfp2000 programs
13 Graphical display of instructions
14 Ratio of execution time and code size for compiled code versus handwritten code
15 Summary: Instruction Set Design (MIPS)
• Use general-purpose registers with a load-store architecture: YES
• Provide at least 16 general-purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
• Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES, 16 bits for immediate and displacement (disp = 0 => register deferred)
• All addressing modes apply to all data transfer instructions: YES
• Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
• Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating-point numbers: YES
• Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES
• Aim for a minimalist instruction set: YES
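A minimal sketch of how displacement addressing forms an effective address, assuming a 64-bit base register and a signed 16-bit displacement as listed in the summary. The function name `effective_address` is hypothetical and only serves this example.

```c
#include <stdint.h>

/* Displacement addressing: EA = base register + sign-extended 16-bit offset.
 * The name and signature are assumptions for illustration. */
uint64_t effective_address(uint64_t base, int16_t disp) {
    /* Widen the signed displacement first so negative offsets subtract. */
    return base + (uint64_t)(int64_t)disp;
}
```

Calling `effective_address(base, 0)` returns `base` unchanged, which is why a zero displacement gives register deferred addressing without needing a separate mode.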
16 RISC vs. CISC
• CISC (complex instruction set computer) – VAX, Intel x86, IBM 360/370, etc.
• RISC (reduced instruction set computer) – MIPS, DEC Alpha, Sun SPARC, IBM 801
17 RISC vs. CISC
• Characteristics of ISAs

CISC                          RISC
Variable length instruction   Single word instruction
Variable format               Fixed-field decoding
Memory operands               Load/store architecture
Complex operations            Simple operations
18 RISC vs. CISC Instruction Set Design
• The historical background:
  – In the first 25 years (1945-70), performance came from both technology and design.
  – Design constraints:
    • Small and slow memories: compact programs are fast.
    • Small number of registers: memory operands.
    • Attempts to bridge the semantic gap: model high-level language features in instructions.
    • No need for portability: same vendor for application, OS, and hardware.
    • Backward compatibility: every new ISA must carry the good and bad of all past ones.
  – Result: powerful and complex instructions that are rarely used.
  – IC technology and microprocessors in the 1970s: lower costs, low power consumption, higher clock rates, cheaper and larger memories.
19 RISC vs. CISC Instruction Set Design
• Emergence of RISC
  – Very large scale integration (processor on a chip): silicon real estate at a premium. Micro-store occupies about 70% of chip area: replace micro-store with registers ==> load/store ISA.
  – Increased difference between CPU and memory speeds.
  – Complex instructions were not used by new compilers.
  – Software changes:
    • Reduced reliance on assembly programming, so a new ISA can be introduced.
    • Standardized vendor-independent OS (Unix) became very popular in some market segments (academia and research): need for portability.
  – Early RISC projects: IBM 801 (America), Berkeley SPUR, RISC I and RISC II, and Stanford MIPS.
20 Complex vs. Simple Instructions
• Complex instruction: An instruction does a lot of work, e.g. many operations – Insert in a doubly linked list – Compute FFT – String copy
• Simple instruction: An instruction does small amount of work, it is a primitive using which complex operations can be built – Add – XOR – Multiply
21 Complex vs. Simple Instructions
• Advantages of Complex Instructions + Denser encoding → smaller code size → better memory utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions) + Simpler compiler: no need to optimize small instructions as much
• Disadvantages of Complex Instructions - Larger chunks of work → compiler has less opportunity to optimize (limited in fine-grained optimizations it can do) - More complex hardware → translation from a high level to control signals and optimization needs to be done by hardware
22 ISA-level Tradeoffs: Semantic Gap
• Where to place the ISA? Semantic gap – Closer to high-level language (HLL) → Small semantic gap, complex instructions – Closer to hardware control signals? → Large semantic gap, simple instructions
• RISC vs. CISC machines – RISC: Reduced instruction set computer – CISC: Complex instruction set computer • FFT, QUICKSORT, POLY, FP instructions? • VAX INDEX instruction (array access with bounds checking)
23 ISA-level Tradeoffs: Semantic Gap
• Some tradeoffs (for you to think about)
• Simple compiler, complex hardware vs. complex compiler, simple hardware – Caveat: Translation (indirection) can change the tradeoff!
• Burden of backward compatibility
• Performance? – Optimization opportunity: Example of VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization? – Instruction size, code size
24 X86: Small Semantic Gap: String Operations
• An instruction operates on a string – Move one string of arbitrary length to another location – Compare two strings
• Enabled by the ability to specify repeated execution of an instruction (in the ISA) – Using a “prefix” called REP prefix
• Example: REP MOVS instruction – Only two bytes: REP prefix byte and MOVS opcode byte (F3 A4) – Implicit source and destination registers pointing to the two strings (ESI, EDI) – Implicit count register (ECX) specifies how long the string is
25 X86: Small Semantic Gap: String Operations
REP MOVS (DEST SRC)
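The behavior of the byte-sized form (REP MOVSB) can be sketched roughly as the loop below, assuming the direction flag is clear so that both pointers count upward. The struct `string_regs` and the function `rep_movsb` are hypothetical names standing in for ESI, EDI, and ECX; real hardware also handles word and doubleword variants, segment registers, and interruption partway through the copy, none of which is modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three implicit registers REP MOVS uses. */
struct string_regs {
    const uint8_t *esi; /* source pointer */
    uint8_t *edi;       /* destination pointer */
    size_t ecx;         /* remaining element count */
};

/* Byte variant, direction flag assumed clear (pointers increment). */
void rep_movsb(struct string_regs *r) {
    while (r->ecx != 0) {
        *r->edi++ = *r->esi++; /* MOVSB: copy one byte, advance both pointers */
        r->ecx--;              /* REP: decrement the count and repeat */
    }
}
```

Each statement in the loop body corresponds to work that the single two-byte instruction performs implicitly.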
How many instructions does this take in MIPS?
26 Small Semantic Gap Examples in VAX
• FIND FIRST – Find the first set bit in a bit field – Helps OS resource allocation operations • SAVE CONTEXT, LOAD CONTEXT – Special context switching instructions • INSQUEUE, REMQUEUE – Operations on doubly linked list • INDEX – Array access with bounds checking • STRING Operations – Compare strings, find substrings, … • Cyclic Redundancy Check Instruction • EDITPC – Implements editing functions to display fixed format output
• Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.
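To make INDEX concrete, here is a simplified model of one step of its computation as it is generally described for the VAX: check a subscript against its bounds, then fold it into a running index scaled by the element size. The function name `vax_index` and the boolean return value (standing in for the hardware's subscript-range trap) are assumptions for illustration, and operand encoding and overflow behavior are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one INDEX step (names and signature are hypothetical).
 * Returns false where the hardware would raise a subscript-range trap. */
bool vax_index(int32_t subscript, int32_t low, int32_t high,
               int32_t size, int32_t index_in, int32_t *index_out) {
    /* Accumulate the scaled index. */
    *index_out = (index_in + subscript) * size;
    /* Bounds check that a load-store ISA would need separate
     * compare-and-branch instructions to perform. */
    return subscript >= low && subscript <= high;
}
```

On a load-store machine the same work becomes an add, a multiply (or shift), and two compare-and-branch pairs, which the compiler can then schedule or eliminate individually, for example when it can prove the subscript is in range.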
27 Small versus Large Semantic Gap
• CISC vs. RISC – Complex instruction set computer → complex instructions • Initially motivated by “not good enough” code generation – Reduced instruction set computer → simple instructions • John Cocke, mid 1970s, IBM 801 – Goal: enable better compiler control and optimization
• RISC motivated by – Memory stalls (no work done in a complex instruction when there is a memory stall?) • When is this correct? – Simplifying the hardware → lower cost, higher frequency – Enabling the compiler to optimize the code better • Find fine-grained parallelism to reduce stalls
28 How High or Low Can You Go?
• Very large semantic gap – Each instruction specifies the complete set of control signals in the machine – Compiler generates control signals – Open microcode (John Cocke, circa 1970s) • Gave way to optimizing compilers
• Very small semantic gap – ISA is (almost) the same as high-level language – Java machines, LISP machines, object-oriented machines, capability-based machines
29 A Note on ISA Evolution
• ISAs have evolved to reflect/satisfy the concerns of the day
• Examples: – Limited on-chip and off-chip memory size – Limited compiler optimization technology – Limited memory bandwidth – Need for specialization in important applications (e.g., MMX)
• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA – Concept of dynamic/static interface – Contrast it with hardware/software interface
30 Effect of Translation
• One can translate from one ISA to another ISA to change the semantic gap tradeoffs
• Examples – Intel’s and AMD’s x86 implementations translate x86 instructions into programmer-invisible microoperations (simple instructions) in hardware – Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW instructions in software (code morphing software)
• Think about the tradeoffs
31 Compilation Process in C
• Compilation process: gcc hello.c -o hello – Constructing an executable image for an application – FOUR stages – Command: gcc