Lecture 04: ISA Principles Supplements

CSE 564 Computer Architecture Summer 2017

Department of Computer Science and Engineering Yonghong Yan [email protected] www.secs.oakland.edu/~yan

1 Contents

1. Introduction
2. Classifying Instruction Set Architectures
3. Memory Addressing
4. Type and Size of Operands
5. Operations in the Instruction Set
6. Instructions for Control Flow
7. Encoding an Instruction Set
8. Crosscutting Issues: The Role of Compilers
9. RISC-V ISA

• Supplements

2 Lecture 03 Supplements

• MIPS ISA
• RISC vs. CISC
• Compilation stages
• ISA Historical – Appendix L
• Comparison of ISA – Appendix K

3 Putting it all together: the MIPS architecture (a simple 64-bit load-store architecture)

• Use general-purpose registers with a load-store architecture.
• Support these addressing modes: displacement (with an address offset of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Support these data sizes and types: 8-, 16-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.

4 Putting it all together: the MIPS architecture (a simple 64-bit load-store architecture)

• Support these simple instructions: load, store, add, subtract, move register-register, and shift.
• Compare equal, compare not equal, compare less, branch, jump, call, and return.
• Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.

5 MIPS emphasized

• A simple load-store instruction set
• Design for pipelining efficiency
• Efficiency as a compiler target

6 Instruction layout for MIPS

7 The load and store instructions in MIPS

8 Examples of arithmetic/logical instructions

9 Typical control flow instructions in MIPS

10 Subset of the instructions in MIPS64

11 MIPS dynamic instruction mix for five SPECint2000 programs

12 MIPS dynamic instruction mix for five SPECfp2000 programs

13 Graphical display of instructions

14 Ratio of execution time and code size for compiled code versus handwritten code

15 Summary: Instruction Set Design (MIPS)

• Use general-purpose registers with a load-store architecture: YES
• Provide at least 16 general-purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
• Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES: 16 bits for immediate and displacement (disp = 0 => register deferred)
• All addressing modes apply to all data transfer instructions: YES
• Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
• Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating-point numbers: YES
• Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES
• Aim for a minimalist instruction set: YES
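The displacement mode in the summary above can be illustrated with a minimal Python sketch. It assumes a 16-bit signed displacement that is sign-extended and added to a 64-bit base register; the function names `sign_extend` and `effective_address` are hypothetical and chosen only for this illustration.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF  # 64-bit register width (assumption for this sketch)


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(base: int, disp16: int) -> int:
    """Displacement addressing: base register plus sign-extended 16-bit offset."""
    return (base + sign_extend(disp16, 16)) & MASK64


if __name__ == "__main__":
    print(hex(effective_address(0x1000, 0x0008)))  # positive displacement
    print(hex(effective_address(0x1000, 0xFFFC)))  # 0xFFFC encodes -4
    print(hex(effective_address(0x1000, 0x0000)))  # disp = 0: register deferred
```

With a displacement of zero the result is simply the base register, which is why a separate register-deferred mode is not needed.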

16 RISC vs. CISC

• CISC (complex instruction set computer)
– VAX, Intel x86, IBM 360/370, etc.
• RISC (reduced instruction set computer)
– MIPS, DEC Alpha, Sun SPARC, IBM 801

17 RISC vs. CISC

• Characteristics of ISAs

CISC                          RISC
Variable-length instruction   Single-word instruction
Variable format               Fixed-field decoding
Memory operands               Load/store architecture
Complex operations            Simple operations
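To make "fixed-field decoding" concrete, here is a minimal Python sketch that extracts the fields of a 32-bit MIPS R-type instruction word. It assumes the standard MIPS32 R-type layout (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code); the names `RType` and `decode_rtype` are hypothetical.

```python
from typing import NamedTuple


class RType(NamedTuple):
    opcode: int  # bits 31..26
    rs: int      # bits 25..21, first source register
    rt: int      # bits 20..16, second source register
    rd: int      # bits 15..11, destination register
    shamt: int   # bits 10..6, shift amount
    funct: int   # bits 5..0, function code


def decode_rtype(word: int) -> RType:
    """Every field sits at a fixed bit position, so decoding is shift-and-mask."""
    return RType(
        opcode=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        shamt=(word >> 6) & 0x1F,
        funct=word & 0x3F,
    )


if __name__ == "__main__":
    # 0x012A4020 encodes add $t0, $t1, $t2 (registers 8, 9, 10)
    print(decode_rtype(0x012A4020))
```

Because the register fields are always in the same place, hardware can read the register file in parallel with decoding the opcode. A variable-format encoding has to determine the instruction format before it knows where the operands are.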

18 RISC vs. CISC Instruction Set Design

• The historical background:
– In the first 25 years (1945-70), performance came from both technology and design.
– Design constraints:
  • Small and slow memories: compact programs are fast.
  • Small number of registers: memory operands.
  • Attempts to bridge the semantic gap: model high-level language features in instructions.
  • No need for portability: same vendor for application, OS, and hardware.
  • Backward compatibility: every new ISA must carry the good and bad of all past ones.
– Result: powerful and complex instructions that are rarely used.
– IC technology in the 1970s: lower costs, low power consumption, higher clock rates, cheaper and larger memories.

19 RISC vs. CISC Instruction Set Design

• Emergence of RISC
– Very large scale integration (processor on a chip): silicon real estate at a premium. The micro-store occupies about 70% of chip area: replace the micro-store with registers ==> load/store ISA.
– Increased difference between CPU and memory speeds.
– Complex instructions were not used by new compilers.
– Software changes:
  • Reduced reliance on assembly programming, so a new ISA can be introduced.
  • Standardized, vendor-independent OS (Unix) became very popular in some market segments (academia and research): need for portability.
– Early RISC projects: IBM 801 (America), Berkeley SPUR, RISC I and RISC II, and Stanford MIPS.

20 Complex vs. Simple Instructions

• Complex instruction: a single instruction does a lot of work, e.g., many operations
– Insert in a doubly linked list
– Compute FFT
– String copy

• Simple instruction: a single instruction does a small amount of work; it is a primitive from which complex operations can be built
– Add
– XOR
– Multiply

21 Complex vs. Simple Instructions

• Advantages of complex instructions
+ Denser encoding → smaller code size → better memory utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions)
+ Simpler compiler: no need to optimize small instructions as much

• Disadvantages of complex instructions
- Larger chunks of work → compiler has less opportunity to optimize (limited in the fine-grained optimizations it can do)
- More complex hardware → translation from a high level to control signals and optimization needs to be done by hardware

22 ISA-level Tradeoffs: Semantic Gap

• Where to place the ISA? Semantic gap
– Closer to high-level language (HLL) → small semantic gap, complex instructions
– Closer to hardware control signals → large semantic gap, simple instructions

• RISC vs. CISC machines
– RISC: Reduced instruction set computer
– CISC: Complex instruction set computer
  • FFT, QUICKSORT, POLY, FP instructions?
  • VAX INDEX instruction (array access with bounds checking)

23 ISA-level Tradeoffs: Semantic Gap

• Some tradeoffs (for you to think about)

• Simple compiler, complex hardware vs. complex compiler, simple hardware – Caveat: Translation (indirection) can change the tradeoff!

• Burden of backward compatibility

• Performance?
– Optimization opportunity: example of the VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?
– Instruction size, code size

24 X86: Small Semantic Gap: String Operations

• An instruction operates on a string – Move one string of arbitrary length to another location – Compare two strings

• Enabled by the ability to specify repeated execution of an instruction (in the ISA) – Using a “prefix” called REP prefix

• Example: REP MOVS instruction
– Only two bytes: REP prefix byte and MOVS opcode byte (F3 A4)
– Implicit source and destination registers pointing to the two strings (ESI, EDI)
– Implicit count register (ECX) specifies how long the string is
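The implicit-register behavior described above can be sketched in Python. This is a simplified model, assuming byte-sized moves (MOVSB), a cleared direction flag so the pointers increment, and a flat byte array standing in for memory; the function name `rep_movsb` is hypothetical.

```python
from typing import Tuple


def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int) -> Tuple[int, int, int]:
    """Model of REP MOVSB: copy ECX bytes from [ESI] to [EDI].

    Returns the final (esi, edi, ecx) register values.
    """
    while ecx != 0:
        memory[edi] = memory[esi]  # one load and one store
        esi += 1                   # advance source pointer
        edi += 1                   # advance destination pointer
        ecx -= 1                   # count down the remaining bytes
    return esi, edi, ecx


if __name__ == "__main__":
    mem = bytearray(b"hello") + bytearray(5)
    print(rep_movsb(mem, esi=0, edi=5, ecx=5))
    print(mem)
```

Each line of the loop body corresponds roughly to one simple operation that a load-store ISA would have to express as a separate instruction, plus the loop branch.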

25 X86: Small Semantic Gap: String Operations

REP MOVS (DEST SRC)

How many instructions does this take in MIPS?

26 Small Semantic Gap Examples in VAX

• FIND FIRST
– Find the first set bit in a bit field
– Helps OS resource allocation operations
• SAVE CONTEXT, LOAD CONTEXT
– Special context switching instructions
• INSQUEUE, REMQUEUE
– Operations on doubly linked list
• INDEX
– Array access with bounds checking
• STRING Operations
– Compare strings, find substrings, …
• Cyclic Redundancy Check Instruction
• EDITPC
– Implements editing functions to display fixed format output

• Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.
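As an illustration of how much work one such instruction bundles, here is a simplified Python sketch of INDEX-style array index computation with a bounds check. It assumes the instruction computes (index_in + subscript) * size and signals an error when the subscript falls outside [low, high]; the names `vax_index` and `SubscriptRangeError` are hypothetical, and trap ordering and operand encoding are not modeled.

```python
class SubscriptRangeError(Exception):
    """Stands in for the hardware subscript-range trap (hypothetical name)."""


def vax_index(subscript: int, low: int, high: int, size: int, index_in: int = 0) -> int:
    """Bounds-check `subscript`, then fold it into a running array index."""
    if subscript < low or subscript > high:
        raise SubscriptRangeError(f"subscript {subscript} outside [{low}, {high}]")
    return (index_in + subscript) * size


if __name__ == "__main__":
    # Offset of element 3 in an array of 4-byte elements with bounds 0..9
    print(vax_index(3, low=0, high=9, size=4))
```

A compiler targeting a simple ISA would emit the two comparisons, the add, and the multiply as separate instructions, and could drop the comparisons when it can prove the subscript is in range.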

27 Small versus Large Semantic Gap

• CISC vs. RISC
– Complex instruction set computer → complex instructions
  • Initially motivated by “not good enough” code generation
– Reduced instruction set computer → simple instructions
  • John Cocke, mid 1970s, IBM 801
  – Goal: enable better compiler control and optimization

• RISC motivated by
– Memory stalls (no work done in a complex instruction when there is a memory stall?)
  • When is this correct?
– Simplifying the hardware → lower cost, higher frequency
– Enabling the compiler to optimize the code better
  • Find fine-grained parallelism to reduce stalls

28 How High or Low Can You Go?

• Very large semantic gap
– Each instruction specifies the complete set of control signals in the machine
– Compiler generates control signals
– Open microcode (John Cocke, circa 1970s)
  • Gave way to optimizing compilers

• Very small semantic gap
– ISA is (almost) the same as high-level language
– Java machines, LISP machines, object-oriented machines, capability-based machines

29 A Note on ISA Evolution

• ISAs have evolved to reflect/satisfy the concerns of the day

• Examples:
– Limited on-chip and off-chip memory size
– Limited compiler optimization technology
– Limited memory bandwidth
– Need for specialization in important applications (e.g., MMX)

• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA
– Concept of dynamic/static interface
– Contrast it with hardware/software interface

30 Effect of Translation

• One can translate from one ISA to another ISA to change the semantic gap tradeoffs

• Examples
– Intel’s and AMD’s x86 implementations translate x86 instructions into programmer-invisible micro-operations (simple instructions) in hardware
– Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW instructions in software (code morphing software)

• Think about the tradeoffs
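The idea of translating a complex instruction into simple ones can be sketched as follows. This toy Python translator is an assumption made for illustration only: the micro-operation names (`load`, `add`, `store`), the temporary registers `t0` and `t1`, and the bracket notation for memory operands are all hypothetical and do not describe any real processor's internal format.

```python
from typing import List, Tuple

MicroOp = Tuple[str, ...]


def is_memory(operand: str) -> bool:
    """Treat operands written as [address] as memory operands."""
    return operand.startswith("[") and operand.endswith("]")


def translate_add(dest: str, src: str) -> List[MicroOp]:
    """Split a CISC-style 'add dest, src' into load/add/store micro-operations."""
    uops: List[MicroOp] = []
    a, b = dest, src
    if is_memory(src):
        uops.append(("load", "t1", src))   # bring the source into a temporary
        b = "t1"
    if is_memory(dest):
        uops.append(("load", "t0", dest))  # bring the destination into a temporary
        a = "t0"
    uops.append(("add", a, a, b))          # register-only arithmetic
    if is_memory(dest):
        uops.append(("store", dest, "t0")) # write the result back to memory
    return uops


if __name__ == "__main__":
    for uop in translate_add("[rbx]", "rax"):
        print(uop)
```

A register-to-register add maps to a single micro-operation, while a memory-destination add expands into a load, an add, and a store, which is the same sequence a load-store ISA would expose directly to the compiler.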

31 Compilation Process in C

• Compilation process: gcc hello.c -o hello
– Constructing an executable image for an application
– FOUR stages
– Command: gcc

• Compiler tool
– gcc (GNU Compiler)
  • man gcc (on a Linux machine)
– icc (Intel C compiler)

4 Stages of Compilation Process

Preprocessing (hello.c → hello.i)
gcc -E hello.c -o hello.i

Compilation (after preprocessing)
gcc -S hello.i -o hello.s

Assembling (after compilation)
gcc -c hello.s -o hello.o

Linking object files
gcc hello.o -o hello
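The four commands can also be generated programmatically. The Python sketch below only builds the argument lists for each stage and does not run them; the function name `stage_commands` is hypothetical, and the flags are the ones listed on this slide.

```python
from typing import List, Tuple


def stage_commands(base: str) -> List[Tuple[str, List[str]]]:
    """Return (stage name, argv) pairs for building `base`.c step by step."""
    return [
        ("preprocess", ["gcc", "-E", f"{base}.c", "-o", f"{base}.i"]),
        ("compile", ["gcc", "-S", f"{base}.i", "-o", f"{base}.s"]),
        ("assemble", ["gcc", "-c", f"{base}.s", "-o", f"{base}.o"]),
        ("link", ["gcc", f"{base}.o", "-o", base]),
    ]


if __name__ == "__main__":
    for stage, argv in stage_commands("hello"):
        print(f"{stage:<10} {' '.join(argv)}")
```

Each stage consumes the file produced by the previous one. Passing each argument list to `subprocess.run` would execute the stage, assuming gcc is installed and hello.c exists in the current directory.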

Output → Executable (hello; a.out when -o is omitted)
Run → ./hello (Loader)

4 Stages of Compilation Process

1. Preprocessing (lines starting with # …)
– Expansion of header files (#include …)
– Substitution of macros, including function-like macros (#define …)
2. Compilation
– Generates assembly language
– Verifies function usage against prototypes
– Header files: prototype declarations
3. Assembling
– Generates a relocatable object file (contains machine instructions)
– Example: nm app.o
  0000000000000000 T main
                   U puts
– The nm or objdump tool is used to view object files

4 Stages of Compilation Process (contd.)

4. Linking
– Generates the executable file (the nm tool can also be used to view the executable)
– Binds the appropriate libraries
  • Static linking
  • Dynamic linking (default)

• Loading and execution (of an executable file)
– Evaluates the size of the code and data segments
– Allocates address space in user mode and transfers the segments into memory
– Loads the dependent libraries needed by the program and links them
– Invokes the process manager → program registration

Compiling a C Program

• gcc program_name.c

• Options (the four stages combined into one command):
-Wall: Enables most commonly used warnings (not literally all of them).
-o output_file_name: By default, gcc creates an executable named a.out. The "-o" option specifies a different output file name.
-g: Includes debugging information in the binary.

• man gcc

Linking Multiple Files to Make an Executable File

• Two source files, prog1.c and prog2.c, implement one single task
– To make a single executable file, use the following commands

First, compile these two files with the "-c" option:
gcc -c prog1.c
gcc -c prog2.c

-c: Tells gcc to compile and assemble the code, but not link.

We get two files as output, prog1.o and prog2.o. Then we can link these object files into a single executable file using the command below.

gcc -o prog prog1.o prog2.o

Now the output is the executable file prog. We can run our program using ./prog

Linking with Other Libraries

• Normally, the compiler links libraries from the /usr/lib directory into our program during the compilation process.
– Libraries are precompiled object files.

• To link our programs with libraries such as the pthreads and realtime (rt) libraries:
– gcc program_name.c -lpthread -lrt

-lpthread: Link with the pthread library → libpthread.so file
-lrt: Link with the rt library → librt.so file
The option used here is "-l".

Another option, "-L", tells the gcc compiler to search for library files in the given directory.

Compilation, Linking, Execution of C/C++ Programs

[Figure: source files 1 to N are compiled into object files 1 to N, which are linked with library object files 1 to M into a load file (relocation + linking). These steps are usually performed by a compiler, usually in one uninterrupted sequence.]

http://www.tenouk.com/ModuleW.html