CSC 252: Computer Organization Spring 2020: Lecture 16

Instructor: Yuhao Zhu

Department of Computer Science, University of Rochester

Announcements

• By default, 252/452 will be pass/fail
• 4/10 is the deadline to opt out and request a letter grade
• The mid-term solution has been posted
• Mid-term grades will be posted this weekend
• Lectures will be recorded and posted online
• Office hours will be held through Zoom; links are on the website


The Problem

• Bigger is slower
  • Flip-flops/small SRAM: sub-nanosec
  • SRAM: KByte~MByte, ~nanosec
  • DRAM: GByte, ~50 nanosec
  • Hard disk: Terabyte, ~10 millisec
• Faster is more expensive (dollars and chip area)
  • SRAM: < $10 per Megabyte
  • DRAM: < $1 per Megabyte
  • Hard disk: < $1 per Gigabyte
• Other technologies have their place as well
  • PC-RAM, MRAM, RRAM

We Want Both Fast and Large Memory

• But we cannot achieve both with a single level of memory

• Idea:
  • Have multiple levels of storage (progressively bigger and slower as the levels are farther from the processor)
  • Ensure most of the data the processor needs in the near future is kept in the fast(er) level(s)


Memory Hierarchy

[Figure: a pyramid of storage levels. The top level is fast but small ("move what you use here"); the bottom level is big but slow and cheaper per byte ("backup everything here"). With good locality of reference, memory appears as fast as the top level and as large as the bottom level.]

Memory Hierarchy

• Fundamental tradeoff
  • Fast memory: small
  • Large memory: slow
• Balance latency, cost, size, and bandwidth

[Figure: the hierarchy from CPU registers (DFF) through caches (SRAM) and main memory (DRAM) to the hard disk.]

The Bookshelf Analogy

• Book in your hand
• Desk
• Bookshelf
• Boxes at home
• Boxes in storage

• Recently used books tend to stay on the desk
  • Comp. Org. books, books for classes you are currently taking
  • Until the desk gets full
• Adjacent books on the shelf tend to be needed around the same time

A Modern Memory Hierarchy

• Register file (DFF): 32 words, sub-nsec
• L1 cache (SRAM): ~32 KB, ~nsec
• L2 cache (SRAM): 512 KB ~ 1 MB, many nsec
• L3 cache (SRAM): .....
• Main memory (DRAM): GBs, ~100 nsec
• Hard disk: 100 GB, ~10 msec

How Things Have Progressed

• 1995 (low-mid range): RF (DFF) 200 B, 5 ns; cache (SRAM) 64 KB, 10 ns; main memory (DRAM) 32 MB, 100 ns; disk 2 GB, 5 ms [Hennessy & Patterson, Computer Architecture, 1996]
• 2009 (low-mid range): RF ~200 B, 0.33 ns; cache 8 MB, 0.33 ns; main memory 4 GB, <100 ns; disk 750 GB, 4 ms [www.dell.com, $449 including 17" LCD flat panel]
• 2015 (mid range): RF ~200 B, 0.33 ns; cache 8 MB, 0.33 ns; main memory 16 GB, <100 ns; disk 256 GB, 10 us

How to Make Effective Use of the Hierarchy

• Fundamental question: how do we know what data to put in the fast and small memory?
  • Answer: ensure most of the data the processor needs in the near future is kept in the fast(er) level(s)
• How do we know what data will be needed in the future?
  • Do we know before the program runs? If so, programmers or the compiler can place the right data in the right place
  • Do we know only after the program runs? If so, only the hardware can effectively place the data

How to Make Effective Use of the Hierarchy

[Figure: the cache ($) sits between the CPU's registers and main memory.]

• Modern computers provide both ways
  • Register file: programmers explicitly move data from main memory (slow but big DRAM) into registers (small, very fast)
    • movq (%rdi), %rdx
  • The cache, on the other hand, is automatically managed by hardware
    • It sits between registers and main memory, "invisible" to programmers
    • The hardware automatically figures out what data will be used in the near future and places it in the cache
    • How does the hardware know that?
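For example (a hypothetical compilation, consistent with the movq above), keeping a loaded value in a register lets later uses hit the register file instead of memory:

long twice(const long *p) {
    long x = *p;   // one load, e.g., movq (%rdi), %rdx
    return x + x;  // reuses the register; no further memory access
}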

Locality

• Principle of Locality: programs tend to use data and instructions with addresses near or equal to those they have used recently
• Temporal locality: recently referenced items are likely to be referenced again in the near future
• Spatial locality: items with nearby addresses tend to be referenced close together in time

Locality Example

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

• Data references
  • Spatial locality: array elements are referenced in succession (stride-1 reference pattern)
  • Temporal locality: the variable sum is referenced in each iteration
• Instruction references
  • Spatial locality: instructions are referenced in sequence
  • Temporal locality: the loop body is executed repeatedly
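To make the stride-1 point concrete, here is a minimal C sketch (hypothetical, not from the slides): both functions compute the same sum over a row-major 2D array, but the row-by-row version walks consecutive addresses (stride 1) while the column-by-column version jumps N elements per access and loses spatial locality.

#include <stdio.h>

#define N 1024
static int a[N][N]; // static: zero-initialized, not on the stack

/* Stride-1 traversal: consecutive addresses, good spatial locality. */
static long sum_rows(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Stride-N traversal: each access jumps N*sizeof(int) bytes on a
   row-major array, so spatial locality is poor. */
static long sum_cols(void) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%ld %ld\n", sum_rows(), sum_cols()); // same result, different locality
    return 0;
}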


Cache Illustrations

[Figure: the CPU sits above a small but fast cache holding the data at addresses 8, 9, 14, and 3; below the cache is the big but slow memory with locations 0-15.]

• Hit: the data at address 14 is needed, so the CPU requests it from the cache. Address 14 is in the cache: hit! The cache supplies the data directly.
• Miss: the data at address 12 is needed, so the CPU requests it from the cache. Address 12 is not in the cache: miss!
  • The request for address 12 is forwarded to memory, and the data is fetched from there.
  • The fetched data is stored in the cache (here replacing address 9) and supplied to the CPU.

Cache Hit Rate

• A cache hit is when the data you need is found in the cache
• The hit rate indicates the effectiveness of the cache:

Hit Rate = #Hits / #Accesses
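For example, starting from an empty cache, the access stream used in the direct-mapped example later in this lecture (read 0b1000, 0b1001, 0b1010, 0b1011, then 0b1000 again) misses on the first four reads and hits on the fifth: hit rate = 1/5 = 20%.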


Two Fundamental Issues in Cache Management

• Finding the data in the cache
  • Given an address, how do we decide whether it's in the cache or not?
• Kicking data out of the cache
  • The cache is smaller than memory, so when there's no room left in the cache we must kick something out before new data can be brought in. Which entry do we kick out?


A Simple Cache

[Figure: a 4-entry cache (indices 00-11, each entry with a Content field and a Valid? bit) next to a 16-location memory (addresses 0000-1111; location 0011 holds the value a).]

• 16 memory locations, 4 cache locations
  • A cache location is also called a cache line
• Every cache location has a valid bit, indicating whether that location contains valid data; 0 initially
• For now, assume cache location size == memory location size == 1 B
• Assume each memory location can reside in only one cache line
• The cache is smaller than memory (obviously)
  • Thus, not all memory locations can be cached at the same time
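As a minimal C sketch (hypothetical, matching the 4-line, 1-byte-per-line example), the cache state could be declared as:

#include <stdbool.h>
#include <stdint.h>

// One line of the toy cache: 1 byte of data plus a valid bit.
// (A tag field is added on the following slides.)
typedef struct {
    bool    valid;  // 0 initially: the line holds no valid data
    uint8_t data;   // cache line size == memory location size == 1 B
} cache_line_t;

cache_line_t cache[4];   // 4 cache locations (cache lines)
uint8_t      memory[16]; // 16 memory locations, addresses 0000-1111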

Cache Placement

• Given a memory address, say 0b0001, we want to put the data there into the cache; where does the data go?

[Figure: some function CA = f(mem addr) maps each 4-bit memory address to one of the four cache locations; the question is what f should be.]

Function to Address Cache

• The simplest way is to take a subset of the address bits; there are six combinations in total:
  • CA = ADDR[3],ADDR[2]
  • CA = ADDR[3],ADDR[1]
  • CA = ADDR[3],ADDR[0]
  • CA = ADDR[2],ADDR[1]
  • CA = ADDR[2],ADDR[0]
  • CA = ADDR[1],ADDR[0]
• CA = ADDR[1],ADDR[0] (addr[1:0]) is the choice of the direct-mapped cache

Direct-Mapped Cache

• Direct-mapped cache: CA = ADDR[1],ADDR[0]
  • Always use the lower-order address bits to index the cache
• Multiple addresses can be mapped to the same cache location
  • E.g., 0010 and 1010
• How do we differentiate between different memory locations that are mapped to the same cache location?
  • Add a tag field for that purpose
  • The tag is ADDR[3] and ADDR[2] in this particular example

[Figure: each of the four cache entries, indexed by addr[1:0], stores a tag field holding addr[3:2]; on a lookup, the stored tag is compared (=) with addr[3:2] of the requested address to produce Hit?]
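A minimal C sketch of this lookup (hypothetical helper, using the slide's bit split: index = addr[1:0], tag = addr[3:2]):

#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid; uint8_t tag; uint8_t data; } cache_line_t;
static cache_line_t cache[4]; // indexed by addr[1:0]

// Returns true on a hit and writes the cached byte to *out.
static bool lookup(uint8_t addr, uint8_t *out) {
    uint8_t index = addr & 0x3;        // addr[1:0] selects the cache line
    uint8_t tag   = (addr >> 2) & 0x3; // addr[3:2] is the stored tag
    if (cache[index].valid && cache[index].tag == tag) {
        *out = cache[index].data;      // tag matches: hit
        return true;
    }
    return false; // miss: the line is invalid or holds a different address
}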

Example: Direct-Mapped Cache

for (i = 0; i < 4; ++i) {
    A += mem[i];
}
for (i = 0; i < 4; ++i) {
    B *= (mem[i] + A);
}

• Assume mem == 0b1000; memory holds a, b, c, d at addresses 0b1000-0b1011
• Read 0b1000: miss; a is cached at index 00 with tag 10
• Read 0b1001: miss; b is cached at index 01 with tag 10
• Read 0b1010: miss; c is cached at index 10 with tag 10
• Read 0b1011: miss; d is cached at index 11 with tag 10
• Read 0b1000 (start of the second loop): cache hit? Yes; index 00 is valid and still holds tag 10
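A small (hypothetical) simulation that replays this trace through the direct-mapped lookup and counts hits:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid; uint8_t tag; uint8_t data; } cache_line_t;

int main(void) {
    cache_line_t cache[4] = {0};                 // all valid bits start at 0
    uint8_t trace[] = {0x8, 0x9, 0xA, 0xB, 0x8}; // the two loops' reads
    int hits = 0;
    for (int i = 0; i < 5; i++) {
        uint8_t addr  = trace[i];
        uint8_t index = addr & 0x3;        // addr[1:0]
        uint8_t tag   = (addr >> 2) & 0x3; // addr[3:2]
        if (cache[index].valid && cache[index].tag == tag) {
            hits++;                        // hit
        } else {
            cache[index].valid = true;     // miss: fill the line
            cache[index].tag   = tag;
        }
    }
    printf("hits: %d of 5\n", hits);       // prints "hits: 1 of 5"
    return 0;
}

The four cold misses plus the one hit match the 1/5 = 20% hit rate computed earlier.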

Conflicts

Assume memory holds a at 1000, b at 1001, c at 1010, and d at 1101, and the following memory access stream:
• Read 1000: a is cached at index 00, tag 10
• Read 1001: b is cached at index 01, tag 10
• Read 1010: c is cached at index 10, tag 10
• Read 1101: d is cached, kicking out 1001
  • addr[1:0] = 01, the same index as 1001; addr[3:2] = 11 becomes the new tag
• Read 1001 -> Miss!
  • Why? Each memory location is mapped to only one cache location

Sets

• Each cacheable memory location is mapped to a set of cache locations
• A set is one or more cache locations
• Set size is the number of locations in a set, also called the associativity

2-Way Set Associative

[Figure: the four cache entries are grouped into Set 0 and Set 1, two entries each.]

• 2 sets, each set has two entries
• Only one bit, addr[0], is now needed to index into the cache
• Correspondingly, the tag needs 3 bits: addr[3:1]
• Either entry can store any address that gets mapped to that set
• Now with the same access stream:
  • Read 1000: tag 100 stored in set 0
  • Read 1001: tag 100 stored in set 1
  • Read 1010: tag 101 stored in the second entry of set 0
  • Read 1101: tag 110 stored in the second entry of set 1 (1001 can still stay)
  • Read 1001 -> Hit!
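A minimal C sketch of the 2-way lookup (hypothetical, with index = addr[0] and tag = addr[3:1]); hardware checks both ways of the selected set in parallel, which in C is a small loop:

#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid; uint8_t tag; uint8_t data; } cache_line_t;
static cache_line_t sets[2][2]; // 2 sets x 2 ways

// Returns true on a hit and writes the cached byte to *out.
static bool lookup_2way(uint8_t addr, uint8_t *out) {
    uint8_t index = addr & 0x1;        // addr[0] selects the set
    uint8_t tag   = (addr >> 1) & 0x7; // addr[3:1] is the stored tag
    for (int way = 0; way < 2; way++) { // hardware compares both ways at once
        if (sets[index][way].valid && sets[index][way].tag == tag) {
            *out = sets[index][way].data; // hit in this way
            return true;
        }
    }
    return false; // miss: fill one of the two ways (an eviction policy is needed)
}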

4-Way Set Associative

• One single set that contains all the cache locations
  • Also called a fully-associative cache
• Every entry can store any cache line that maps to that set
• Assuming the same access stream:
  • Read 1000: 1000 cached
  • Read 1001: 1001 cached
  • Read 1010: 1010 cached
  • Read 1101: 1101 cached
  • Read 1001 -> Hit!

Associative versus Direct-Mapped Trade-offs

• Direct-mapped cache
  • Generally lower hit rate
  • Simpler, faster
• Associative cache
  • Generally higher hit rate; better utilization of cache resources
  • Slower. Why? Every entry in the indexed set must be checked: the direct-mapped cache compares a single stored tag against addr[3:2], while the 2-way cache (indexed by addr[0]) compares addr[3:1] against the tags of both ways and ORs the comparison results to produce Hit?

[Figure: side-by-side lookup datapaths: the direct-mapped cache with one tag comparator, and the 2-way set-associative cache with two comparators whose outputs are ORed into Hit?]