
CISC Developments
Over Twenty Years
RJS 2/3/97

Overview
• Classic CISC design: Digital VAX
• VAX’s RISC successor: PRISM/Alpha
• Intel’s ubiquitous 80x86 architecture
  – 8086 through the Pentium Pro (P6)

CISC Designs
• Philosophy
  – Reduce code size
  – Machine language should match the semantics of high-level language constructs
• Results
  – Complicated instructions which could take many cycles to execute

RISC Designs
• Philosophy
  – Simple, quick instructions
• Results
  – Instruction dependencies can be easily determined
  – Instructions can be sub-divided and pipelined
  – High instruction throughput

Digital VAX
• Mid-70’s design
  – 32-bit architecture
  – Basically a general-purpose register machine
• CISC philosophy
  – Numerous addressing modes
  – Rich instruction suite

VAX: Addressing Modes
  Register               r4
  Base (displacement)    [r4 + offset]
  Immediate              0xFFF0101
  PC-relative            [PC] + offset
  Deferred (indirect)    [ [r3 + offset] ]
  Index (scaled)         [r3 + r4 * 8]

VAX: Addt’l Addressing Modes
• Byte, word, double word displacement
• Auto-increment/auto-decrement
  – Accesses memory and then increments/decrements the address

VAX: Instruction Encoding
• Operations: 1 byte
• Each operand must be encoded to specify the addressing mode
• Example:
  – Integer add: 3-19 bytes

VAX: Instructions
• Push
  – Push an item onto a stack
• insque
  – Insert an item onto a queue
• aobleq limit, index
  – Add one to index and branch if it is less than or equal to limit
• Special call/ret
  – Handles argument and stack setup automatically

CISC: Side effects
• An instruction basically has multiple results
  – Auto-increment
    • Combines a load/store with an addition
  – Condition codes
    • Negative, Zero, oVerflow, Carry
    • Arithmetic instructions set these codes
    • Codes are used for conditional branches

Digital: Beyond VAX
• PRISM/Alpha
  – RISC processor
    • Fixed-size instructions
    • Load/store architecture
    • Out-of-order execution
    • Speculation
• Backward Compatibility?
  – Recompilation: VAX --> Alphas

INTEL
• 1978: 8086
  – 16-bit, extended accumulator machine
• 1982: 80286, backward compatible
  – 24-bit address space
• 1985: 80386, backward compatible
  – 32-bit address space
• 1992: Pentium, backward compatible
• 1996: P6, backward compatible

80x86: Overview
• Has moved closer to a general-purpose register machine
• Segmented address space
• Instructions work on bytes, words, double words
• Only four instructions added since 1989
  – Three multiprocessing instructions
  – One conditional move

80x86: Segmented Addr Space
• Real Mode (8086)
  – Segment register is shifted left 4 bits and the offset is added
• Protected Mode (80286)
  – Segment register selects an index into the segment address table (24 bits); the offset is added
• Protected Mode (80386, 80486, Pentium)
  – Segment descriptor is 32 bits

80x86: Addressing Modes
  Absolute                    [0xa0000000]
  Register                    [r4]
  Based displacement          [r4 + offset]
  Indexed                     [r3 + r4]
  Based indexed               [r3 + r4 + offset]
  Base plus scaled index      [r3 + r4 * scale]
  Base scaled displacement    [r3 + r4 * scale + offset]

80x86: Instruction Complications
• Instruction prefixes
  – Override default data size, segment registers
  – Lock the bus (i.e., synchronization)
  – Repeat instruction until register CX counts down to zero
• Multiple segments complicate control-flow statements

80x86: Instruction Usage
• Instructions: 1 to 17 bytes
  – Integer programs: average 2.8 bytes
  – Floating point programs: average 4.1 bytes
• Most frequent addressing modes: Based displacement and Based scaled indexing

Intel: Improving CISC
• Pentium: Leveraged RISC technology
  – 5-stage pipeline
  – Dual-issue
  – No branch prediction
  – No out-of-order execution
• Software basically needs recompilation

Pentium Pro (P6)
• A RISC processor running a CISC instruction set
  – Three-way issue
  – Speculative execution
  – Out-of-order execution
• Superscalar pipeline
  – Allows higher clock rate while handling CISC

P6: Pipeline Stages 1-4
• Stage 1
  – Determines next PC
• Stages 2-4
  – Fetch and mark instruction

P6: Pipeline Stages 5-6
• Instruction is decoded into a series of micro-ops (uops)
  – Three decoders
    • Two decoders handle simple instructions
    • Third decoder handles more complex cases
    • Falls through decoder to a special microcode area
  – Majority translate to < 4 uops
  – Worst case: 204 uops

P6: Pipeline Stages 7-8
• Assign logical registers to physical registers
  – 80x86 machine instructions are limited to the basic x86 registers
  – x86 architecture has 16 physical registers
  – P6 has 40 physical registers
• Prepared uops are passed to the reservation station and reorder buffer

P6: Pipeline Stages 9-10
• Reservation station dispatches uops to one of five parallel execution units
  – Two integer, one load, one store, and one FPU
• Reorder buffer holds the uops
  – Status flags signal dependencies

P6: Pipeline Stage 11
• Execution
  – May overlap into stage 12

P6: Pipeline Stages 12-14
• Instructions are retired
  – Must wait until all uops that comprise the instruction are complete
  – Must handle precise exceptions

P6: Pipeline Performance
• 80x86 references memory more often than a typical RISC instruction set
  – Floating point programs: 2-4x higher
  – Integer programs: 1.25x higher
• Superpipelining and branch prediction problems
  – P6 uses 4-bit branch history

P6: Completely RISC?
• Can a compiler directly generate uops?
  – Bypass decoding phase
  – Leverage static scheduling techniques
  – Force competing chip makers to use same basic design
    • Could reduce differentiating points between competitors

References
• Hennessy and Patterson, “Computer Organization and Design: The Hardware/Software Interface”
• Hennessy and Patterson, “Computer Architecture: A Quantitative Approach”
• Bhandarkar, “Alpha Implementations and Architecture”
• Byte, April 1995, “Intel’s P6”
• http://www.intel/procs/ppro/info/isscc/index.htm.
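
Worked example: 80x86 address calculation
The segmented-address and addressing-mode slides describe arithmetic that is easy to check by hand. The sketch below is only an illustration, not code from the slides: the function names real_mode_address and effective_address are hypothetical, the 20-bit wraparound assumes an original 8086, and the 32-bit wraparound assumes an 80386-style flat offset.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 real mode: shift the 16-bit segment left 4 bits, add the 16-bit offset.

    Assumption: the result wraps at 20 bits, as on an original 8086.
    """
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


def effective_address(base: int = 0, index: int = 0,
                      scale: int = 1, offset: int = 0) -> int:
    """Offset part of an address: base + index * scale + offset.

    Covers the table's modes by leaving unused terms at their defaults,
    e.g. based displacement is effective_address(base=r4, offset=d).
    Assumption: 32-bit wraparound and scale limited to 1, 2, 4 or 8.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # Segment 0xB800, offset 0x0010
    print(hex(real_mode_address(0xB800, 0x0010)))
    # Base scaled displacement: [r3 + r4 * 8 + 4] with r3 = 0x1000, r4 = 3
    print(hex(effective_address(base=0x1000, index=3, scale=8, offset=4)))
```

Leaving unused terms at zero keeps one function for every row of the addressing-mode table, which mirrors how the most general mode subsumes the simpler ones.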

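Worked example: condition codes as a side effect
The side-effects slide notes that arithmetic instructions set the Negative, Zero, oVerflow and Carry codes, which conditional branches then read. A minimal sketch of that idea for a 32-bit add follows; add32_flags is a hypothetical helper, and the flag definitions are the conventional two's-complement ones rather than anything quoted from the VAX or 80x86 manuals.

```python
def add32_flags(a: int, b: int):
    """Add two 32-bit values and return (result, flags).

    The flags dictionary is the side effect: the add produces a sum
    and also N, Z, V, C for a later conditional branch to test.
    """
    mask = 0xFFFFFFFF
    sign = 0x80000000
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "N": bool(result & sign),   # result is negative as a signed value
        "Z": result == 0,           # result is zero
        # Signed overflow: both operands differ in sign from the result
        "V": bool((a ^ result) & (b ^ result) & sign),
        "C": full > mask,           # carry out of bit 31
    }
    return result, flags


if __name__ == "__main__":
    # Largest positive signed value plus one overflows into the sign bit
    print(add32_flags(0x7FFFFFFF, 1))
    # Unsigned wraparound: carry set, result zero
    print(add32_flags(0xFFFFFFFF, 1))
```

A conditional branch such as "branch if equal" would then test only flags["Z"], which is why a later instruction depends on the earlier arithmetic instruction even when it never reads the sum.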