
CISC Developments Over Twenty Years

RJS 2/3/97

Overview

• Classic CISC design: Digital VAX
• VAX’s RISC successor: PRISM/Alpha
• Intel’s ubiquitous 80x86 architecture
  – 8086 through the Pentium Pro (P6)

CISC Designs

• Philosophy
  – Reduce code size
  – Machine language should match the semantics of high-level language constructs
• Results
  – Complicated instructions which could take many cycles to execute

RISC Designs

• Philosophy
  – Simple, quick instructions.
• Results
  – Instruction dependencies can be easily determined.
  – Instructions can be sub-divided and pipelined.
  – High instruction throughput.

Digital VAX

• Mid-70s design
  – 32-bit architecture
  – Basically a general-purpose register machine
• CISC philosophy
  – Numerous addressing modes
  – Rich instruction suite

VAX: Addressing Modes

Register               r4
Base (displacement)    [r4 + offset]
Immediate              0xFFF0101
PC-relative            [PC + offset]
Deferred (indirect)    [ [r3 + offset] ]
Index (scaled)         [r3 + r4 * 8]

VAX: Additional Addressing Modes

• Byte, word, double word displacement
• Auto-increment/auto-decrement
  – Auto-increment accesses memory and then increments the address; auto-decrement decrements the address and then accesses memory

VAX: Instruction Encoding

• Operations: 1 byte
• Each operand must be encoded to specify its address (an operand specifier giving mode and register)
• Example:
  – Integer add: 3-19 bytes
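The address arithmetic behind these VAX modes can be sketched in a few lines of Python. This is a minimal model only. It assumes registers and memory are plain dictionaries and ignores how operand specifiers are decoded from the instruction bytes. All function names are hypothetical and were chosen for illustration. PC-relative mode is treated as displacement mode with the PC as the base.

```python
MASK32 = 0xFFFFFFFF  # VAX virtual addresses are 32 bits


def ea_register_deferred(regs, r):
    """(Rn): the operand's address is the contents of Rn."""
    return regs[r]


def ea_displacement(regs, r, disp):
    """disp(Rn): base register plus a byte, word, or longword displacement.
    PC-relative mode is the same calculation with the PC as the base."""
    return (regs[r] + disp) & MASK32


def ea_displacement_deferred(regs, mem, r, disp):
    """@disp(Rn): memory at Rn + disp holds the operand's address (one extra load)."""
    return mem[(regs[r] + disp) & MASK32]


def ea_indexed(base_address, regs, rx, operand_size):
    """base[Rx]: index register scaled by the operand size (8 for a quadword)."""
    return (base_address + regs[rx] * operand_size) & MASK32


def ea_autoincrement(regs, r, operand_size):
    """(Rn)+: use Rn as the address, then advance Rn by the operand size."""
    address = regs[r]
    regs[r] = (regs[r] + operand_size) & MASK32
    return address


def ea_autodecrement(regs, r, operand_size):
    """-(Rn): step Rn back by the operand size first, then use it as the address."""
    regs[r] = (regs[r] - operand_size) & MASK32
    return regs[r]


regs = {3: 0x1000, 4: 2}
mem = {0x1010: 0x2000}
print(hex(ea_displacement_deferred(regs, mem, 3, 0x10)))  # 0x2000
```

The auto-increment and auto-decrement helpers modify the register as well as returning an address. That is exactly the kind of side effect the deck attributes to CISC instructions.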

VAX: Instructions

• Push
  – Push an item onto a stack
• insque
  – Insert an item into a queue
• aobleq limit, index
  – Add one to the index and branch if it is less than or equal to the limit
• Special call/ret
  – Handles argument passing and stack setup automatically

CISC: Side effects

• An instruction basically has multiple results
  – Auto-increment
    • Combines a load/store with an addition
  – Condition codes
    • Negative, Zero, oVerflow, Carry
    • Arithmetic instructions set these codes
    • Codes are used for conditional branches
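Two of these behaviors can be made concrete with a small Python sketch. The first is a 32-bit add that produces the N, Z, V and C codes alongside its result. The second is the loop-closing semantics of aobleq. The sketch models behavior only, not the VAX encoding. The function names and the dictionary used as a register file are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & SIGN32 else x


def add_with_cc(a, b):
    """32-bit add that also returns the N, Z, V, C condition codes it would set."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    codes = {
        "N": bool(result & SIGN32),                   # Negative: sign bit of the result
        "Z": result == 0,                             # Zero
        "V": bool(~(a ^ b) & (a ^ result) & SIGN32),  # oVerflow: same-sign inputs, sign flipped
        "C": full > MASK32,                           # Carry out of bit 31
    }
    return result, codes


def aobleq(regs, limit, index_reg):
    """AOBLEQ semantics: add one to the index, then branch if index <= limit (signed)."""
    regs[index_reg] = (regs[index_reg] + 1) & MASK32
    return to_signed(regs[index_reg]) <= limit


# A counted loop closed by aobleq: the body runs for index 0, 1, 2, 3.
regs = {5: 0}
iterations = 0
while True:
    iterations += 1              # loop body
    if not aobleq(regs, 3, 5):   # falls through once the index reaches 4
        break
print(iterations, regs[5])  # 4 4
```

A single aobleq replaces an add, a compare and a conditional branch. That trade of instruction count for instruction complexity is the CISC trade-off.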

Digital: Beyond VAX

• PRISM/Alpha
  – RISC
    • Fixed-size instructions
    • Load/store architecture
    • Out-of-order execution
    • Speculation
• Backward compatibility?
  – Recompilation: VAX --> Alpha

Intel

• 1978: 8086
  – 16-bit, extended accumulator machine
• 1982: 80286, backward compatible
  – 24-bit address space
• 1985: 80386, backward compatible
  – 32-bit address space
• 1992: Pentium, backward compatible
• 1996: P6, backward compatible

80x86: Overview

• Has moved closer to a general-purpose register machine
• Segmented address space
• Instructions work on bytes, words, double words.
• Only four instructions added since 1989.
  – Three instructions
  – One conditional move

80x86: Segmented Addr Space

• Real Mode (8086)
  – The segment register is shifted left 4 bits and the offset is added.
• Protected Mode (80286)
  – The segment register selects an entry in the segment descriptor table, which supplies a 24-bit base address; the offset is added.
• Protected Mode (80386, 80486, Pentium)
  – The segment descriptor supplies a 32-bit base address.
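The two translation schemes can be compared side by side in Python. This is a simplified sketch. `descriptor_table` is a hypothetical list of (base, limit) pairs. Real descriptors also carry access rights, and a limit violation raises a general-protection fault rather than a Python exception.

```python
def real_mode_address(segment, offset):
    """8086 real mode: (segment << 4) + offset, wrapped to the 20-bit (1 MB) address bus."""
    return ((segment << 4) + offset) & 0xFFFFF


def protected_mode_address(selector, offset, descriptor_table):
    """80386-style protected mode, simplified: selector -> descriptor -> base + offset."""
    index = selector >> 3  # bits 0-1 hold the requested privilege level, bit 2 picks GDT/LDT
    base, limit = descriptor_table[index]
    if offset > limit:
        raise ValueError("offset exceeds segment limit")  # hardware would fault instead
    return (base + offset) & 0xFFFFFFFF


print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
```

In real mode the segment value itself is arithmetic input. In protected mode it is only an index, and the base address comes from a table the operating system controls.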

80x86: Addressing Modes

Absolute                   [0xa0000000]
Register indirect          [r4]
Based displacement         [r4 + offset]
Indexed                    [r3 + r4]
Based indexed              [r3 + r4 + offset]
Base plus scaled index     [r3 + r4 * scale]
Base scaled displacement   [r3 + r4 * scale + offset]

80x86: Instruction Complications

• Instruction prefixes
  – Override default data size, segment registers
  – Lock the bus (i.e., synchronization)
  – Repeat instruction until register CX counts down to zero.
• Multiple segments complicate control-flow statements
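Every mode in the table is a special case of one formula: base + index * scale + displacement. The REP prefix turns a single string instruction into a loop counted by CX. The Python sketch below shows both. It is a simplified model under stated assumptions: the direction flag is clear, segments are ignored, and the function names are hypothetical.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Most general 80x86 form: base + index * scale + displacement, modulo 2**32.

    Each listed mode is a special case: absolute uses only disp,
    based displacement uses base and disp, and so on.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("the 80386 scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


def rep_movsb(mem, regs):
    """REP MOVSB, simplified: copy bytes from [SI] to [DI] until CX counts down to zero.

    Assumes the direction flag is clear (addresses increase) and ignores segments.
    """
    while regs["cx"] != 0:
        mem[regs["di"]] = mem[regs["si"]]
        regs["si"] += 1
        regs["di"] += 1
        regs["cx"] -= 1


buf = bytearray(b"hello\x00\x00\x00\x00\x00")
regs = {"si": 0, "di": 5, "cx": 5}
rep_movsb(buf, regs)
print(buf)  # bytearray(b'hellohello')
```

Because one prefixed instruction may run for thousands of iterations, the hardware must be able to interrupt it part-way and resume it. That requirement is one reason prefixes complicate the implementation.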

80x86: Instruction Usage

• Instructions: 1 to 17 bytes
  – Integer programs: average 2.8 bytes
  – Floating point programs: average 4.1 bytes
• Most frequent addressing modes: based displacement and based scaled indexing.

Intel: Improving CISC

• Pentium: Leveraged RISC technology
  – 5-stage pipeline
  – Dual-issue
  – Dynamic branch prediction (branch target buffer)
  – No out-of-order execution
• Software basically needs recompilation to exploit dual issue

Pentium Pro (P6)

• A RISC processor running a CISC instruction set.
  – Three-way issue
  – Out-of-order execution
• Superscalar pipeline
  – Allows higher performance while handling CISC

P6: Pipeline Stages 1-4

• Stage 1
  – Determines next PC
• Stages 2-4
  – Fetch instructions and mark instruction boundaries

P6: Pipeline Stages 5-6

• Each instruction is decoded into a series of micro-ops (uops)
  – Three decoders
    • Two decoders handle simple instructions
    • A third decoder handles more complex cases
    • Instructions too complex for the decoders fall through to a special area (a microcode sequencer)
  – Majority translate to < 4 uops.
  – Worst case: 204 uops

P6: Pipeline Stages 7-8

• Assign logical registers to physical registers
  – 80x86 machine instructions are limited to the basic registers.
  – x86 architecture has 16 logical registers
  – P6 has 40 physical registers
• Prepared uops are passed to the reservation station and reorder buffer
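Mapping a few logical registers onto 40 physical ones is what removes false dependencies between uops. The Python sketch below shows this with a generic free-list renaming scheme. It is not the P6's exact mechanism, which holds renamed results in its reorder buffer. Freeing registers at retirement is not modeled, and the class name is hypothetical.

```python
from collections import deque


class RenameTable:
    """Maps logical (architectural) registers onto a larger pool of physical registers."""

    def __init__(self, logical_regs, num_physical=40):
        self.free = deque(range(num_physical))  # physical registers not yet assigned
        self.mapping = {r: self.free.popleft() for r in logical_regs}

    def rename(self, dest, srcs):
        """Rename one uop: look up its sources, then give its destination a fresh register."""
        phys_srcs = [self.mapping[s] for s in srcs]  # read before dest is remapped
        if not self.free:
            raise RuntimeError("no free physical register; the pipeline would stall")
        phys_dest = self.free.popleft()
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


table = RenameTable(["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"])
print(table.rename("eax", ["ebx", "ecx"]))  # (8, [1, 2])
print(table.rename("eax", ["eax"]))         # (9, [8])
```

The two writes to eax land in different physical registers (8 and 9). A later uop that still needs the older eax value can therefore run out of order without being overwritten.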

P6: Pipeline Stages 9-10

• The reservation station dispatches uops to one of five parallel execution units.
  – Two integer, one load, one store, and one FPU
• Reorder buffer holds the uops
  – Status flags signal dependencies

P6: Pipeline Stage 11

• Execution
  – May overlap into stage 12

P6: Pipeline Stages 12-14

• Instructions are retired
  – Must wait until all uops that comprise the instruction are complete
  – Must handle precise exceptions

P6: Pipeline Performance

• 80x86 references memory more often than a typical RISC instruction set.
  – Floating point programs: 2-4x higher
  – Integer programs: 1.25x higher
• Superpipelining and branch prediction problems.
  – P6 uses 4-bit branch history
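The retirement rule from stages 12-14 can be sketched in Python. An x86 instruction retires only when all of its uops are done, and only in program order. A faulting instruction stops retirement, so everything older has committed and nothing younger has. The reorder buffer here is simply a list of hypothetical `Uop` records. The flush and jump to an exception handler that real hardware performs are left out.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    instr_id: int        # the x86 instruction this uop was decoded from
    done: bool = False   # finished executing
    fault: bool = False  # raised an exception while executing


def retire(rob):
    """Retire whole instructions from the head of the reorder buffer, in program order.

    Returns (retired instruction ids, id of a faulting instruction or None).
    """
    retired = []
    while rob:
        head_id = rob[0].instr_id
        n = 0
        while n < len(rob) and rob[n].instr_id == head_id:
            n += 1  # count the head instruction's uops
        group = rob[:n]
        if not all(u.done for u in group):
            break  # oldest instruction still incomplete: nothing younger may retire
        if any(u.fault for u in group):
            return retired, head_id  # precise: everything older retired, nothing younger
        retired.append(head_id)
        del rob[:n]
    return retired, None


rob = [Uop(1, True), Uop(1, True), Uop(2, True), Uop(2, False), Uop(3, True)]
print(retire(rob))  # ([1], None): instruction 3 is done but must wait behind 2
```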

P6: Completely RISC?

• Can a compiler directly generate uops?
  – Bypass decoding phase
  – Leverage static scheduling techniques
  – Force competing chip makers to use same basic design.
• Could reduce differentiating points between competitors.