Pentium Family

CISC architecture – executes IA-32 instruction set

Year  Processor         Clock          L1 cache (KB)  L2 cache (KB)  Notes
1993  Pentium           233-300 MHz
1995  Pentium Pro       100-200 MHz    8+8            256+1024
1998  Pentium II        233-450 MHz    16+16          256+512        MMX added
1999  Pentium II Xeon   400-450 MHz    16+16          512+2048
1999  Celeron           500-900 MHz    16+16          128
1999  Pentium III       450-1100 MHz   16+16          256+512        70 new MMX instructions
2000  Pentium III Xeon  700-900 MHz    16+16          1024+2048
2001  Pentium 4         1.3-2.1 GHz    8+8            256+512        NetBurst

All Intel processors are backward compatible.

Intel Pentium Registers

Bit positions: 31..16 | 15..0 (each 16-bit name refers to the low half of the 32-bit register)

32-bit  16-bit  8-bit halves  Use
EAX     AX      AH, AL        primary accumulator
EBX     BX      BH, BL        arithmetic
ECX     CX      CH, CL        loop count
EDX     DX      DH, DL        multiply & divide
ESI     SI                    source index
EDI     DI                    destination index
EBP     BP                    base pointer
ESP     SP                    stack pointer

Segment registers
CS      code segment
DS      data segment
SS      stack segment
ES      extra segment
FS      extra segment
GS      extra segment

EIP     IP      instruction pointer
EFLAGS  PSW     program status word

Pentium Instruction Formats

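Field           Prefix  Opcode  Mode  SIB  Offset  Immediate
Length (bytes)  0-5     1-2     0-1   0-1  0-4     0-4

Opcode byte (bits):  operation (6), byte/word (1), destination (1)
Mode byte (bits):    MOD (2), REG (3), R/M (3)
SIB byte (bits):     Scale (2), Index (3), Base (3)

The mode byte tells us about the operand.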

Complex, irregular, and suffering from the legacy of some bad, irreversible design decisions made in the past.

Pentium instruction formats

Two operands: one operand is a register (REG field), and the other is specified by MOD and R/M.

R/M   MOD=00   MOD=01             MOD=10              MOD=11
000   M[EAX]   M[EAX+Offset 8]    M[EAX+Offset 32]    EAX or AL
001   M[ECX]   M[ECX+Offset 8]    M[ECX+Offset 32]    ECX or CL
010   M[EDX]   M[EDX+Offset 8]    M[EDX+Offset 32]    EDX or DL
011   M[EBX]   M[EBX+Offset 8]    M[EBX+Offset 32]    EBX or BL
100   SIB      SIB with Offset 8  SIB with Offset 32  ESP or AH
101   Direct   M[EBP+Offset 8]    M[EBP+Offset 32]    EBP or CH
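The table lookup above can be sketched in a few lines of Python. This is only an illustration of how the MOD, REG, and R/M fields select an operand, not a real decoder. The function names are hypothetical, only the 32-bit register names are used, and rows 110 and 111 (ESI, EDI) are filled in from the standard IA-32 register numbering rather than from the table.

```python
# Register numbering for the 3-bit REG and R/M fields (32-bit names only).
# Entries 6 and 7 (ESI, EDI) are not in the table above; they follow the
# standard IA-32 encoding.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_modrm(byte):
    """Split a mode byte into its MOD (2 bits), REG (3 bits), R/M (3 bits) fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def describe_rm(mod, rm):
    """Return the table entry selected by MOD and R/M, as a string."""
    if mod == 0b11:                      # register operand, no memory access
        return REG32[rm]
    offset_bits = {0b01: 8, 0b10: 32}.get(mod)   # None when MOD=00
    if rm == 0b100:                      # a SIB byte follows
        return "SIB" if offset_bits is None else f"SIB with Offset {offset_bits}"
    if mod == 0b00 and rm == 0b101:      # the irregular "Direct" entry
        return "Direct"
    if offset_bits is None:
        return f"M[{REG32[rm]}]"
    return f"M[{REG32[rm]}+Offset {offset_bits}]"


mod, reg, rm = decode_modrm(0x45)        # bits 01 000 101
print(REG32[reg], describe_rm(mod, rm))
```

The special cases for R/M = 100 and for MOD = 00 with R/M = 101 are exactly the irregularities the table shows: those two slots do not follow the pattern of the other rows.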

→ Using SIB, operand address = base register + index register x scale factor (1/2/4/8) + offset.

→ Lack of orthogonality is disturbing.

Observations about Pentium Instructions

The burden of backward compatibility…

Real mode: runs 8088 programs.
Virtual 8086 mode: runs 8086 programs.
Protected mode: works like a Pentium.

Use of EBP and ESP…

[Figure: a stack frame delimited by ESP and EBP.]

Pentium addressing …

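[Figure: an address formed from a segment register and an offset.]

What is SIB?

[Figure: memory starting at EBP: others (8 bytes), followed by array elements A[1], A[2], A[3], A[4].]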

Useful for addressing the elements of an array. Let each element be 32 bits, i.e., 4 bytes. Store the array index i in EAX.

Address of A[i] = EBP + 4*EAX + 8
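As a rough sketch, the address calculation above can be written out in Python. The function name `sib_address` and the sample value of EBP are assumptions made for illustration; the formula is the one given above, with scale 4 for 4-byte elements and offset 8.

```python
def sib_address(base, index, scale, offset=0):
    """Effective address = base + index * scale + offset, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):        # the 2-bit scale field encodes four values
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + offset) & 0xFFFFFFFF


ebp = 0x1000                             # hypothetical frame base
for i in range(1, 5):
    eax = i                              # array index held in EAX
    addr = sib_address(ebp, eax, scale=4, offset=8)
    print(f"A[{i}] is at {addr:#x}")
```

Consecutive elements end up 4 bytes apart, which is why a single instruction with a SIB byte can index into the array without a separate multiply.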

What are MMX instructions?

[Figure: 80-bit FP-cum-MMX registers.]

Supports multimedia operations. Can store 8-bit colors for 8 pixels in one MMX register and execute SIMD instructions on them to speed up graphics.

What are Prefix bytes?

Attributes to instructions…

REP <instruction>    Repeat the instruction until ECX = 0.
REPZ <instruction>   Repeat the instruction while the Z-flag is set (and ECX is not 0).
LOCK <instruction>   Lock the memory bus until the instruction execution is complete.

Design of Pentium Pro

Instruction 1: μop 1, μop 2, μop 3
Instruction 2: μop 1
Instruction 3: μop 1, μop 2, μop 3, μop 4
Instruction 4: μop 1, μop 2

→ Each instruction is converted into a sequence of μops, which are RISC-like instructions. This makes the pipeline much deeper, so the branch penalty increases.

→ Uses a 512-entry, 2-level branch predictor.

Pentium Microarchitecture outline

Instruction fetch      16 bytes per cycle
    | 16 bytes
Instruction decode     3 instructions per cycle
    | 6 μops
Renaming               3 μops per cycle
    | 20 reservation stations
Execution units (5)
    | 40 re-order buffers
Graduation units       3 μops per cycle

Virtual memory

[Figure: memory hierarchy showing P, L1 (I and D), L2, and M.]

Goals
1. Support large virtual address space
2. Make provisions for protection

Types
Segmented, paged, segmented and paged.

Page size                   4 KB to 64 KB
Hit time                    50 to 100 CPU clock cycles
Miss penalty                10^6 to 10^7 clock cycles
  Access time               0.8 x 10^6 to 0.8 x 10^7 clock cycles
  Transfer time             0.2 x 10^6 to 0.2 x 10^7 clock cycles
Miss rate                   0.00001% to 0.001%
Virtual address space size  4 GB to 16 x 10^18 bytes

A quick look at different types of VM

Segment sizes are not fixed.

Page sizes are fixed.

Segments can be paged.

Address Translation

Virtual address (page number, offset) → page table → physical address (block number, offset)

Page Table format for Direct Map

Page No.  Presence bit  Block no./Disk addr   Other attributes (like protection)
0         1             7                     Read only
1         0             Sector 6, Track 18
2         1             45                    Not cacheable
3         1             4
4         0             Sector 24, Track 32
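A minimal sketch of direct-map translation, using the five entries from the direct-map table. The 4 KB page size, the `PageFault` exception, and the function name `translate` are assumptions for illustration; a real system would service the fault by loading the page from disk rather than raising an error.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the presence bit is 0 (the page is on disk)."""


# page number -> (presence bit, block number or disk address)
PAGE_TABLE = {
    0: (1, 7),
    1: (0, "Sector 6, Track 18"),
    2: (1, 45),
    3: (1, 4),
    4: (0, "Sector 24, Track 32"),
}


def translate(virtual_address):
    """Map a virtual address to a physical address through the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    present, location = PAGE_TABLE[page]          # direct map: index by page number
    if not present:
        raise PageFault(f"page {page} is on disk at {location}")
    return location * PAGE_SIZE + offset          # block number replaces page number


print(translate(2 * PAGE_SIZE + 10))              # page 2 maps to block 45
```

Only the page number is translated; the offset is carried over unchanged. Every access pays for one table lookup before the memory reference itself.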

Page Table format for Associative Map

Pg, Blk, P   Block no./Disk addr   Other attributes
0, 7, 1      7                     Read only
1, ?, 0      Sector 6, Track 18
2, 45, 1     45                    Not cacheable
3, 4, 1      4
4, ?, 0      Sector 24, Track 32

Address translation overhead

Hit time involves one extra table lookup. Can we reduce this overhead?

Average Memory Access Time = Hit time (no page fault) + Miss rate (page fault rate) x Miss penalty

Improving the Performance of Virtual Memory

1. Hit time can be reduced using a TLB.

2. Miss rate can be reduced by allocating enough memory to hold the working set. Otherwise, thrashing is a possibility.

3. Miss penalty can be reduced by using a disk cache.

Examples of VM performance

Hit time = 50 ns. Page fault rate = 0.001%. Miss penalty = 2 ms.

Tav = 50 ns + 10^(-5) x (2 x 10^6 ns) = 70 ns.
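The same arithmetic as a short Python sketch, with a hit time of 50 ns, a page fault rate of 0.001%, and a miss penalty of 2 ms. The function name is hypothetical; the formula is the average memory access time expression given earlier.

```python
def average_access_time_ns(hit_time_ns, fault_rate, miss_penalty_ns):
    """Average access time = hit time + page fault rate x miss penalty."""
    return hit_time_ns + fault_rate * miss_penalty_ns


hit_time_ns = 50
fault_rate = 0.001 / 100          # 0.001% expressed as a fraction (10^-5)
miss_penalty_ns = 2e6             # 2 ms expressed in nanoseconds

t_av = average_access_time_ns(hit_time_ns, fault_rate, miss_penalty_ns)
print(f"Tav is about {t_av:.1f} ns")
```

Even at a fault rate of one in 100,000 references, the page faults contribute 20 ns of the total, because the miss penalty is so much larger than the hit time.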

Working Set

Consider a page reference string 0, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, … (100,000 references). The size of the working set is 2 pages.
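One way to check this is to take the working set as the distinct pages among the most recent references. The sketch below assumes that definition; the window length of 10 is an arbitrary choice for illustration, and the repeating pattern 1, 2, 2, 1 is read off the reference string above.

```python
def working_set(refs, window):
    """Distinct pages among the last `window` references."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return set(refs[-window:])


# Page 0 once, then 1, 2, 2, 1 repeated, truncated to 100,000 references.
refs = ([0] + [1, 2, 2, 1] * 25000)[:100000]

recent = working_set(refs, window=10)
print(sorted(recent), len(recent))
```

Page 0 is touched only once at the start, so it drops out of the window almost immediately. After that, two page frames are enough to run the rest of the reference string without further faults.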

[Figure: page fault rate versus available memory M. Thrashing occurs when M is too small; the marked point is where M is enough to hold the working set.]

Always allocate enough memory to hold the working set of a program (Working Set Principle)

Disk cache

Modern computers allocate up to 80% of the main memory as file cache. Similar principles apply to disk cache, which can drastically reduce the miss penalty.