17. RISC, CISC, and VLIW

COS375 / ELE375, Mohammad Shahrad, 3/10/2021
Acknowledgements: David August, Margaret Martonosi, David Patterson

The Great Debate
• CISC: Complex Instruction Set Computer
• RISC: Reduced Instruction Set Computer
• VLIW: Very Long Instruction Word
• EPIC: Explicitly Parallel Instruction Computing

The CISC Design Philosophy
MAKE THE MACHINE EASY TO PROGRAM!
• Support frequent tasks
  - Functions: provide a “call” instruction and register saving
  - Provide convenient addressing modes (see the ISA lectures)
• Make each instruction do lots of work
  - Less explicit state necessary (e.g., block copy)
  - Fewer instructions necessary
• May include variable-width instructions
  - Compression
  - Easy to expand the ISA

The RISC Design Philosophy
KEEP THE DESIGN SIMPLE!
• Reduce the number of instruction types
• Fixed-length instructions with easy-to-decode formats
• Focus on making these few, simple instructions fast
• Use only a few popular addressing modes

Example: MIPS vs. Intel 80x86
MIPS: “three-address architecture”
• Arithmetic-logic instructions specify all 3 operands
  add $s0,$s1,$s2   # s0=s1+s2
• Benefit: fewer instructions ⇒ performance
x86: “two-address architecture”
• Only 2 operands, so the destination is also one of the sources
  add $s1,$s0   # s0=s0+s1 (AT&T order: source first, destination second)
• Often true in C statements: c += b;
• Benefit: smaller instructions ⇒ smaller code
Attribution: David Patterson

Example: MIPS vs. Intel 80x86
MIPS: “load-store architecture”
• Only loads and stores access memory; all other operations are register-register, e.g.,
  lw  $t0, 12($gp)
  add $s0,$s0,$t0   # s0=s0+Mem[12+gp]
• Benefit: simpler hardware ⇒ easier to pipeline, higher performance
x86: “register-memory architecture”
• All operations can have one operand in memory; the other operand is a register, e.g.,
  add 12(%gp),%s0   # s0=s0+Mem[12+gp]
• Benefit: fewer instructions ⇒ smaller code
Attribution: David Patterson

Example: MIPS vs. Intel 80x86
MIPS: “fixed-length instructions”
• All instructions are the same size, e.g., 4 bytes
• Simple hardware ⇒ performance
• Branch offsets can be counted in multiples of 4 bytes
x86: “variable-length instructions”
• Instructions are a whole number of bytes: 1 to 17
• ⇒ small code size (30% smaller?)
• More recent performance benefit: better instruction cache hit rates
• Instructions can include 8- or 32-bit immediates
Attribution: David Patterson

RISC vs. CISC Arguments
RISC
• Clearly superior; that is why it is the focus of the textbook
• Load/store architectures dominate new architectures
• Easier to add advanced performance enhancements
• Smaller design teams necessary
CISC
• Binaries are smaller (x86 is 20% smaller than MIPS)
• Machines are faster in practice!!!
• Almost all processors used in desktop/server computers are CISC
• Clearly the winner, because you probably use one now
What do you think? Who is winning, or who has won? Why?

List of ISAs
• Sun Microsystems’ SPARC
• IBM/Motorola’s PowerPC
• Hewlett-Packard’s PA-RISC
• Intel’s x86
• DEC Alpha
• Motorola’s 68xxx
• Intel’s IA-64
• SGI’s MIPS
• ARM
• SuperH
• TI’s C6x
• …

A Success: x86, an Ever-Evolving Architecture
• 1978: Intel announces the 8086 (16-bit architecture)
• 1980: the 8087 floating-point coprocessor is added
• 1982: the 80286 increases the address space to 24 bits and adds new instructions
• 1985: the 80386 extends the architecture to 32 bits and adds new addressing modes
• 1989-1995: the 80486, Pentium, and Pentium Pro add instructions
• 1997: MMX (MultiMedia eXtension) is added
• 1999: AMD extends the architecture to 64 bits (AMD64, x86-64)
• 2000+: SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.2, SSE5, AES-NI, CLMUL, RDRAND, SHA, MPX, SGX, XOP, F16C, ADX, BMI, FMA, AVX, AVX2, AVX512, VT-x, AMD-V, TSX, ASF
“This history illustrates the impact of the ‘golden handcuffs’ of compatibility”

x86: A Dominant Architecture
Complexity:
• Instructions from 1 to 17 bytes long (current x86 processors enforce an architectural limit of 15 bytes)
• One operand must act as both a source and a destination
• One operand may come from memory
• Supports complex addressing modes
  Example: “base or scaled index with 8- or 32-bit displacement”
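The fixed-length vs. variable-length contrast in the MIPS vs. x86 comparison shows up as soon as hardware has to find where each instruction begins. The following Python sketch is illustrative only: `toy_length` is a hypothetical encoding in which the low 4 bits of an instruction's first byte give its length (real x86 length decoding depends on prefixes, opcode, ModR/M, SIB, displacement, and immediate bytes), and all function names are our own. The branch-target rule follows the MIPS conditional-branch definition: PC + 4 plus the sign-extended 16-bit offset shifted left by 2.

```python
def fixed_length_starts(code_size, width=4):
    """Byte offsets where instructions begin in a fixed-length ISA (e.g., 4-byte MIPS)."""
    # Instruction i always starts at byte width * i, so every boundary is
    # known up front and several instructions can be decoded in parallel.
    return list(range(0, code_size, width))


def toy_length(code, offset):
    """HYPOTHETICAL toy encoding, not real x86: the low 4 bits of an
    instruction's first byte give its total length in bytes (1 to 15)."""
    length = code[offset] & 0x0F
    if length == 0:
        raise ValueError(f"invalid toy instruction at byte {offset}")
    return length


def variable_length_starts(code, length_of=toy_length):
    """Byte offsets where instructions begin in a variable-length ISA."""
    # The start of each instruction is only known after the length of the
    # previous one has been decoded, so this scan is inherently sequential.
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += length_of(code, offset)
    return starts


def mips_branch_target(pc, imm16):
    """Target of a MIPS beq/bne: (PC + 4) + (sign-extended 16-bit offset << 2)."""
    # Sign-extend the 16-bit immediate field (imm16 is assumed to be 0..0xFFFF).
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    # Instructions are 4-byte aligned, so the offset counts words, not bytes.
    return pc + 4 + (offset << 2)


if __name__ == "__main__":
    print(fixed_length_starts(12))
    print(variable_length_starts(bytes([0x01, 0x03, 0xAA, 0xBB, 0x02, 0xCC])))
    print(hex(mips_branch_target(0x00400000, 3)))
    print(hex(mips_branch_target(0x00400000, 0xFFFF)))
```

Because the offset counts 4-byte words, the 16-bit field reaches roughly ±128 KB around PC + 4 instead of ±32 KB, which is the practical payoff of “branches can be multiples of 4 bytes.”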
Saving graces:
• The most frequently used instructions are small, fast, and easy to implement
• Compilers avoid the portions of the architecture that are slow
• Code size is smaller
Maybe it is RISC after all?????
“What the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective” -- unknown

The Intel x86 Architecture Today
[Figures: The Intel x86 Architecture Today]

Parallelism
Parallelism at different granularities:
• Instruction level
  Example: an add instruction and a multiply instruction execute at the same time
• Thread level
  Example: a screen-redraw function executes concurrently with recalculation in a spreadsheet
• Process level
  Example: simulation jobs run on many machines

Independent units of work can execute concurrently if sufficient resources exist.
[Figure: units of work 1, 2, 3 scheduled over time; panels: Dependence Limited; Resource Limited with 3 Units, 2 Units, 1 Unit]

Instruction-Level Parallelism (ILP)
IPC (Instructions per Cycle) = 1/CPI
$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \text{CPI} \times \frac{\text{Time}}{\text{Cycle}}$$

Instruction-Level Parallelism (ILP)
[Figures: Pipelined (3 stages) vs. Concurrent (3 pipes) execution over time; Pipelined & Concurrent (3 three-stage pipes) over time]

ILP in CISC, RISC, VLIW, and EPIC
• CISC: Complex Instruction Set Computer
• RISC: Reduced Instruction Set Computer
• VLIW: Very Long Instruction Word
• EPIC: Explicitly Parallel Instruction Computing

VLIW
• ISA support, simple hardware
• The compiler expresses parallelism statically, packing independent operations into one long instruction word
Attribution: Figure from Prof. Onur Mutlu’s Computer Architecture lectures.

To-do Items
• Good luck with the midterm!
• Reading assignments from P&H: Sections 5.1 and 5.2
See you on Friday!
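The dependence-limited and resource-limited cases on the parallelism slide, and the VLIW idea that the compiler rather than the hardware decides which independent operations issue together, can be sketched as a greedy list scheduler. This is a minimal Python sketch under assumptions that are ours, not the lecture's: every operation takes exactly one cycle, every functional unit can run every operation, and `list_schedule`, `ipc`, and `cpi` are hypothetical helper names. It is not how any particular compiler schedules code.

```python
def list_schedule(num_ops, deps, width):
    """Greedy list scheduling of operations into fixed-width bundles.

    num_ops: operations are numbered 0..num_ops-1 in program order.
    deps:    dict mapping an operation to the set of operations it depends on.
    width:   how many operations can issue per cycle (functional units).
    Assumes every operation takes one cycle and any unit can run any operation.
    Returns one list of operations per cycle; on a VLIW machine each list
    corresponds to the slots of one long instruction word.
    """
    done = set()
    bundles = []
    while len(done) < num_ops:
        # An operation is ready once everything it depends on has finished.
        ready = [op for op in range(num_ops)
                 if op not in done and deps.get(op, set()) <= done]
        if not ready:
            raise ValueError("dependences contain a cycle")
        # Resource limit: at most `width` ready operations issue this cycle.
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
    return bundles


def ipc(num_ops, bundles):
    """Instructions (operations) per cycle achieved by a schedule."""
    return num_ops / len(bundles)


def cpi(num_ops, bundles):
    """Cycles per instruction, i.e., 1 / IPC."""
    return len(bundles) / num_ops


if __name__ == "__main__":
    independent = {}            # ops 0, 1, 2 have no dependences
    chain = {1: {0}, 2: {1}}    # 0 -> 1 -> 2
    for width in (3, 2, 1):     # resource limited: 3, 2, then 1 unit
        schedule = list_schedule(3, independent, width)
        print(width, schedule, ipc(3, schedule))
    schedule = list_schedule(3, chain, 3)  # dependence limited despite 3 units
    print(schedule, ipc(3, schedule))
```

With no dependences, going from 1 to 3 units raises IPC from 1.0 to 3.0; with the chain 0 → 1 → 2, IPC stays at 1.0 even with 3 units, because that work is dependence limited rather than resource limited.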
