Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 18-600 Foundations of Computer Systems

Lecture 11: “Cache Memories & Non-Volatile Storage” John P. Shen & Gregory Kesden October 4, 2017

➢ Required Reading Assignment: • Chapter 6 of CS:APP (3rd edition) by Randy Bryant & Dave O’Hallaron

10/04/2017 (© John Shen) 18-600 Lecture #11

Lecture 11: “Cache Memories & Non-Volatile Storage”
A. Cache Organization and Operation
B. Performance Impact of Caches
  a. The Memory Mountain
  b. Rearranging Loops to Improve Spatial Locality
  c. Using Blocking to Improve Temporal Locality
C. Non-Volatile Storage Technologies
  a. Disk Storage Technology
  b. Flash Memory Technology

From Lecture #10 … Memory Hierarchy (where do all the bits live?)

Register File 32 words, sub-nsec

L1 cache (SRAM) ~32 KB, ~nsec

L2 cache (SRAM) 512 KB ~ 1MB, many nsec

Memory L3 cache, (SRAM) Abstraction .....

Main Memory (DRAM) 2-8 GB, ~100 nsec

Disk Storage 200-1K GB, ~10 msec


From Lecture #7 … (Cache) Memory Implementation Options

• Indexed Memory (RAM): k-bit index selects via a decoder; 2^k blocks
• Associative Memory (CAM): no index, lookup by key match; unlimited blocks
• N-Way Set-Associative Memory: k-bit index selects a set of N (tag, data) entries; 2^k × N blocks
• Multi-Ported Indexed Memory: (2x) k-bit indexes; (2x) 2^k blocks

General Cache Concept

Cache: a smaller, faster, more expensive memory that holds a subset of the blocks (e.g., blocks 4, 9, 14, 3).

Data is copied between levels in block-sized transfer units (blocks are also called cache “lines”).

Memory: the larger, slower, cheaper level, viewed as partitioned into fixed-size blocks (0, 1, 2, …, 15 in the example).

General Cache Concepts: Hit

Data in block b is needed (Request: 14). Block b is in the cache (which holds blocks 8, 9, 14, 3): Hit!


General Cache Concepts: Miss

Data in block b is needed (Request: 12). Block b is not in the cache (which holds 8, 9, 14, 3): Miss!
Block b is fetched from memory and stored in the cache (here, 12 replaces 9).
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (the victim)

General Caching Concepts: Types of Cache Misses (3 C’s)

• Cold (compulsory) miss
  • Occurs because the cache starts out empty; the first reference to a block always misses.
• Capacity miss
  • Occurs when the set of active cache blocks (the working set) is larger than the cache.
• Conflict miss
  • Occurs when the level k cache is large enough, but multiple data objects all map to the same level k block.
  • E.g., referencing blocks 0, 8, 0, 8, 0, 8, … would miss every time.

Cache Memories

• Cache memories are small, fast SRAM-based memories managed automatically in hardware
  • Hold frequently accessed blocks of main memory
• The CPU looks first for data in the cache
• Typical system structure:

[Figure: CPU chip containing the register file, ALU, and cache memory; the bus interface connects over the system bus and I/O bridge to the memory bus and main memory.]

General Cache Organization (S, E, B)
• S = 2^s sets
• E = 2^e lines per set
• B = 2^b bytes per cache block (the data); each line also holds a valid bit and a tag
• Cache size: C = S × E × B data bytes

Cache Read
• Locate the set using the s set-index bits
• Check if any line in the set has a matching tag
• Yes + line valid: hit
• Locate the data starting at the block offset

Address of word: t bits (tag) | s bits (set index) | b bits (block offset). S = 2^s sets; the data begins at the block offset within the line.

Example: Direct Mapped Cache (E = 1)

Direct mapped: one line per set. Assume: cache block size 8 bytes.

Address of int: t bits (tag) | 0…01 (set index) | 100 (block offset)

• The set-index bits select one of the S = 2^s sets; each set holds a single line (valid bit, tag, bytes 0–7).
• valid? + tag match: if yes, it is a hit, and the int (4 bytes) is read starting at block offset 4 (binary 100).
• If the tag doesn’t match: the old line is evicted and replaced.

Direct-Mapped Cache Simulation

M = 16 bytes (4-bit addresses), B = 2 bytes/block, S = 4 sets, E = 1 line/set; address bits: t = 1 (tag), s = 2 (set index), b = 1 (offset)

Address trace (reads, one byte per read):
0 [0000₂], miss
1 [0001₂], hit
7 [0111₂], miss
8 [1000₂], miss
0 [0000₂], miss

Final state: Set 0: valid, tag 0, M[0–1] (address 8 evicted the block for 0, and the final read of 0 evicted it back); Set 3: valid, tag 0, M[6–7]; Sets 1 and 2: empty.

E-way Set Associative Cache (Here: E = 2)

E = 2: two lines per set. Assume: cache block size 8 bytes.

Address of short int: t bits (tag) | 0…01 (set index) | 100 (block offset)

The set index selects one set; the set contains two lines, each with its own valid bit and tag.

Compare both lines in the selected set: valid? + tag match: yes = hit; the short int (2 bytes) is read starting at the block offset.

No match:
• One line in the set is selected for eviction and replacement
• Replacement policies: random, least recently used (LRU), …

2-Way Set Associative Cache Simulation
M = 16 bytes (4-bit addresses), B = 2 bytes/block, S = 2 sets, E = 2 blocks/set; address bits: t = 2 (tag), s = 1 (set index), b = 1 (offset)

Address trace (reads, one byte per read):
0 [0000₂], miss
1 [0001₂], hit
7 [0111₂], miss
8 [1000₂], miss
0 [0000₂], hit

Final state: Set 0 holds both {tag 00, M[0–1]} and {tag 10, M[8–9]}, so the final read of 0 hits; Set 1 holds {tag 01, M[6–7]}, with its other line empty.

What about writes?
• Multiple copies of data exist: L1, L2, L3, main memory, disk
• What to do on a write-hit?
  • Write-through: write immediately to memory
  • Write-back: defer the write to memory until the line is replaced
    • Needs a dirty bit (is the line different from memory or not?)
• What to do on a write-miss?
  • Write-allocate: load the block into the cache, update the line in the cache
    • Good if more writes to the location follow
  • No-write-allocate: write straight to memory, do not load into the cache
• Typical combinations:
  • Write-through + No-write-allocate
  • Write-back + Write-allocate

Intel Core i7 Cache Hierarchy
• L1 i-cache and d-cache (per core): 32 KB, 8-way, access: 4 cycles
• L2 unified cache (per core): 256 KB, 8-way, access: 10 cycles
• L3 unified cache (shared by all cores): 8 MB, 16-way, access: 40–75 cycles
• Block size: 64 bytes for all caches

[Figure: processor package with Cores 0–3, each containing registers, L1 d-cache/i-cache, and a private L2 unified cache; a shared L3 unified cache; main memory below.]

Cache Performance Metrics
• Miss rate
  • Fraction of memory references not found in the cache: misses / accesses = 1 − hit rate
  • Typical numbers: 3–10% for L1; can be quite small (e.g., < 1%) for L2, depending on size, etc.
• Hit time
  • Time to deliver a line in the cache to the processor
    • Includes time to determine whether the line is in the cache
  • Typical numbers: 4 clock cycles for L1; 10 clock cycles for L2
• Miss penalty
  • Additional time required because of a miss
  • Typically 50–200 cycles for main memory (trend: increasing!)


Let’s think about those numbers • Huge difference between a hit and a miss • Could be 100x, if just L1 and main memory

• Would you believe 99% hits is twice as good as 97%?
• Consider: cache hit time of 1 cycle, miss penalty of 100 cycles

• Average access time:
  • 97% hits: 1 cycle + 0.03 × 100 cycles = 4 cycles
  • 99% hits: 1 cycle + 0.01 × 100 cycles = 2 cycles

• This is why “miss rate” is used instead of “hit rate”

Writing Cache Friendly Code

• Make the common case go fast • Focus on the inner loops of the core functions

• Minimize the misses in the inner loops • Repeated references to variables are good (temporal locality) • Stride-1 reference patterns are good (spatial locality)

Key idea: Our qualitative notion of locality is quantified through our understanding of cache memories


Lecture 11: “Cache Memories & Non-Volatile Storage”
A. Cache Organization and Operation
B. Performance Impact of Caches
  a. The Memory Mountain
  b. Rearranging Loops to Improve Spatial Locality
  c. Using Blocking to Improve Temporal Locality
C. Non-Volatile Storage Technologies
  a. Disk Storage Technology
  b. Flash Memory Technology

The Memory Mountain

• Read throughput (read bandwidth) • Number of bytes read from memory per second (MB/s)

• Memory mountain: Measured read throughput as a function of spatial and temporal locality. • Compact way to characterize memory system performance.

Memory Mountain Test Function

Call test() with many combinations of elems and stride. For each elems and stride:
1. Call test() once to warm up the caches.
2. Call test() again and measure the read throughput (MB/s).

long data[MAXELEMS];  /* Global array to traverse */

/* test - Iterate over first "elems" elements of array "data"
 * with stride of "stride", using 4x4 loop unrolling. */
int test(int elems, int stride) {
    long i, sx2 = stride*2, sx3 = stride*3, sx4 = stride*4;
    long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    long length = elems, limit = length - sx4;

    /* Combine 4 elements at a time */
    for (i = 0; i < limit; i += sx4) {
        acc0 = acc0 + data[i];
        acc1 = acc1 + data[i+stride];
        acc2 = acc2 + data[i+sx2];
        acc3 = acc3 + data[i+sx3];
    }

    /* Finish any remaining elements */
    for (; i < length; i++) {
        acc0 = acc0 + data[i];
    }
    return ((acc0 + acc1) + (acc2 + acc3));
}
mountain/mountain.c


The Memory Mountain (measured on a Core i7 Haswell, 2.1 GHz; 32 KB L1 d-cache, 256 KB L2, 8 MB L3, 64 B block size; aggressive prefetching)

[Figure: read throughput (MB/s, 0–16000) vs. stride (×8 bytes, s1–s11) and working-set size (32 KB–128 MB). Ridges of temporal locality appear where the working set fits in L1, L2, L3, or main memory; slopes of spatial locality appear as stride increases.]

Matrix Multiplication Example
• Description:
  • Multiply N × N matrices of doubles
  • Variable sum held in register

/* ijk */
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
        sum = 0.0;
        for (k = 0; k < n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }

Miss Rate Analysis for Matrix Multiply

• Assume:
  • Block size = 32 B (big enough for four doubles)
  • Matrix dimension (N) is very large: approximate 1/N as 0.0
  • Cache is not even big enough to hold multiple rows
• Analysis method:
  • Look at the access pattern of the inner loop

[Figure: C = A × B; element (i, j) of C combines row i of A with column j of B.]

Layout of C Arrays in Memory (review)
• C arrays are allocated in row-major order: each row occupies contiguous memory locations
• Stepping through columns in one row:
    for (i = 0; i < N; i++) sum += a[0][i];
  • accesses successive elements
  • if block size B > sizeof(a[i][j]) bytes, exploits spatial locality: miss rate = sizeof(a[i][j]) / B
• Stepping through rows in one column:
    for (i = 0; i < n; i++) sum += a[i][0];
  • accesses distant elements: no spatial locality! miss rate = 1 (i.e., 100%)

Matrix Multiplication (ijk)

/* ijk */  (inner loop: A row-wise, B column-wise, C fixed)
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
        sum = 0.0;
        for (k = 0; k < n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }

Misses per inner loop iteration: A = 0.25, B = 1.0, C = 0.0

Matrix Multiplication (jik)

/* jik */  (inner loop: A row-wise, B column-wise, C fixed)
for (j = 0; j < n; j++)
    for (i = 0; i < n; i++) {
        sum = 0.0;
        for (k = 0; k < n; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }

Misses per inner loop iteration: A = 0.25, B = 1.0, C = 0.0 (same as ijk)

Matrix Multiplication (kij)

/* kij */  (inner loop: A fixed, B row-wise, C row-wise)
for (k = 0; k < n; k++)
    for (i = 0; i < n; i++) {
        r = a[i][k];
        for (j = 0; j < n; j++)
            c[i][j] += r * b[k][j];
    }

Misses per inner loop iteration: A = 0.0, B = 0.25, C = 0.25

Matrix Multiplication (ikj)

/* ikj */  (inner loop: A fixed, B row-wise, C row-wise)
for (i = 0; i < n; i++)
    for (k = 0; k < n; k++) {
        r = a[i][k];
        for (j = 0; j < n; j++)
            c[i][j] += r * b[k][j];
    }

Misses per inner loop iteration: A = 0.0, B = 0.25, C = 0.25

Matrix Multiplication (jki)

/* jki */  (inner loop: A column-wise, B fixed, C column-wise)
for (j = 0; j < n; j++)
    for (k = 0; k < n; k++) {
        r = b[k][j];
        for (i = 0; i < n; i++)
            c[i][j] += a[i][k] * r;
    }

Misses per inner loop iteration: A = 1.0, B = 0.0, C = 1.0

Matrix Multiplication (kji)

/* kji */  (inner loop: A column-wise, B fixed, C column-wise)
for (k = 0; k < n; k++)
    for (j = 0; j < n; j++) {
        r = b[k][j];
        for (i = 0; i < n; i++)
            c[i][j] += a[i][k] * r;
    }

Misses per inner loop iteration: A = 1.0, B = 0.0, C = 1.0 (same as jki)

Summary of Matrix Multiplication
• ijk (and jik): 2 loads, 0 stores; misses per iteration = 1.25
• kij (and ikj): 2 loads, 1 store; misses per iteration = 0.5
• jki (and kji): 2 loads, 1 store; misses per iteration = 2.0

Core i7 Matrix Multiply Performance

[Figure: cycles per inner loop iteration (log scale, 1–100) vs. array size n (50–700). jki/kji are worst (tens of cycles per iteration at large n), ijk/jik are in the middle, and kij/ikj are best and nearly flat, matching the miss-per-iteration analysis.]

Example: Matrix Multiplication

c = (double *) calloc(sizeof(double), n*n);

/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

[Figure: c = a × b; row i of a times column j of b yields c[i][j].]

Cache Miss Analysis

• Assume:
  • Matrix elements are doubles
  • Cache block = 8 doubles
  • Cache size C << n (much smaller than n)
• First iteration:
  • n/8 + n = 9n/8 misses (n/8 stride-1 misses for the row of a, plus n misses for the column of b)
• Afterwards in cache (schematic): the row of a and an 8-column-wide slab of b


• Second iteration:
  • Again: n/8 + n = 9n/8 misses
• Total misses:
  • 9n/8 × n² = (9/8) n³

Blocked Matrix Multiplication

c = (double *) calloc(sizeof(double), n*n);

/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k, i1, j1, k1;
    for (i = 0; i < n; i += B)
        for (j = 0; j < n; j += B)
            for (k = 0; k < n; k += B)
                /* B x B mini matrix multiplications */
                for (i1 = i; i1 < i+B; i1++)
                    for (j1 = j; j1 < j+B; j1++)
                        for (k1 = k; k1 < k+B; k1++)
                            c[i1*n + j1] += a[i1*n + k1] * b[k1*n + j1];
}
matmult/bmm.c

[Figure: a B × B block of c accumulates the product of a block-row of a and a block-column of b.]

Cache Miss Analysis

• Assume:
  • Cache block = 8 doubles
  • Cache size C << n (much smaller than n)
  • Three blocks fit into the cache: 3B² < C
• First (block) iteration:
  • B²/8 misses for each B × B block
  • 2n/B × B²/8 = nB/4 misses (omitting matrix c)
• Afterwards in cache (schematic): the needed blocks of a and b

• Second (block) iteration:
  • Same as the first iteration: 2n/B × B²/8 = nB/4 misses
• Total misses:
  • nB/4 × (n/B)² = n³/(4B)

Blocking Summary

• No blocking: (9/8) n³ misses
• Blocking: n³/(4B) misses

• Use the largest possible block size B, subject to the limit 3B² < C!

• Reason for the dramatic difference:
  • Matrix multiplication has inherent temporal locality:
    • Input data: 3n² elements; computation: 2n³ operations
    • Every array element is used O(n) times!
  • But the program has to be written properly

Cache Summary

• Cache memories can have significant performance impact

• You can write your programs to exploit this!
  • Focus on the inner loops, where the bulk of the computations and memory accesses occur.
  • Try to maximize spatial locality by reading data objects sequentially, with stride 1.
  • Try to maximize temporal locality by using a data object as often as possible once it’s read from memory.


Lecture 11: “Cache Memories & Non-Volatile Storage”
A. Cache Organization and Operation
B. Performance Impact of Caches
  a. The Memory Mountain
  b. Rearranging Loops to Improve Spatial Locality
  c. Using Blocking to Improve Temporal Locality
C. Non-Volatile Storage Technologies
  a. Disk Storage Technology
  b. Flash Memory Technology

What’s Inside A Disk Drive?

[Figure: spindle, arm, platters, actuator, electronics (including a processor and memory!), SCSI connector. Image courtesy of Seagate Technology.]

Disk Geometry
• Disks consist of platters, each with two surfaces.
• Each surface consists of concentric rings called tracks.
• Each track consists of sectors separated by gaps.
• Aligned tracks form a cylinder.

[Figure: top view of one surface showing tracks, sectors, and gaps; side view of three platters (surfaces 0–5) on one spindle, with cylinder k formed by the aligned track k on each surface.]

Disk Capacity

Capacity = (# bytes/sector) × (avg. # sectors/track) × (# tracks/surface) × (# surfaces/platter) × (# platters/disk)

Example:
• 512 bytes/sector
• 300 sectors/track (on average)
• 20,000 tracks/surface
• 2 surfaces/platter
• 5 platters/disk

Capacity = 512 × 300 × 20,000 × 2 × 5 = 30,720,000,000 bytes = 30.72 GB

Disk Operation

• The disk surface spins at a fixed rotational rate.
• The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
• By moving radially, the arm can position the read/write head over any track.
• In a multi-platter drive, the read/write heads move in unison from cylinder to cylinder.

[Figure: single-platter view (spindle, arm, head) and multi-platter view (stacked platters sharing one spindle).]

Disk Access

Head in position above a track

Rotation is counter-clockwise

Disk Access – Read

[Figure: the head is about to read the blue sector; then the state after the BLUE read.]

Disk Access of RED

After the BLUE read, the red request is scheduled next: seek for RED (move the head to red’s track), then rotational latency (wait for the red sector to rotate around under the head), then the RED read.

Sequence: data transfer → seek → rotational latency → data transfer.

Disk Access Time
• Average time to access some target sector is approximated by:
  • Taccess = Tavg seek + Tavg rotation + Tavg transfer
• Seek time (Tavg seek)
  • Time to position the heads over the cylinder containing the target sector.
  • Typical Tavg seek is 3–9 ms
• Rotational latency (Tavg rotation)
  • Time waiting for the first bit of the target sector to pass under the r/w head.
  • Tavg rotation = 1/2 × (1/RPM) × 60 secs/1 min
  • Typical rotational rate = 7,200 RPM
• Transfer time (Tavg transfer)
  • Time to read the bits in the target sector.
  • Tavg transfer = (1/RPM) × 1/(avg. # sectors/track) × 60 secs/1 min

Disk Access Time Example
• Given:
  • Rotational rate = 7,200 RPM
  • Average seek time = 9 ms
  • Avg. # sectors/track = 400
• Derived:
  • Tavg rotation = 1/2 × (60 secs / 7,200 RPM) × 1,000 ms/sec ≈ 4 ms
  • Tavg transfer = (60 secs / 7,200 RPM) × (1/400) × 1,000 ms/sec ≈ 0.02 ms
  • Taccess ≈ 9 ms + 4 ms + 0.02 ms ≈ 13 ms
• Important points:
  • Access time is dominated by seek time and rotational latency.
  • The first bit in a sector is the most expensive; the rest are essentially free.
  • SRAM access time is about 4 ns/doubleword, DRAM about 60 ns:
    • Disk is about 40,000 times slower than SRAM, 2,500 times slower than DRAM.

Logical Disk Blocks

• Modern disks present a simpler abstract view of the complex sector geometry:
  • The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, …)
• Mapping between logical blocks and actual (physical) sectors:
  • Maintained by a hardware/firmware device called the disk controller.
  • Converts requests for logical blocks into (surface, track, sector) triples.
• Allows the controller to set aside spare cylinders for each zone.
  • Accounts for the difference between “formatted capacity” and “maximum capacity”.

I/O Bus

[Figure: the CPU chip (register file, ALU) connects via the bus interface and system bus to the I/O bridge; the memory bus connects the I/O bridge to main memory; the I/O bus connects the USB controller (mouse, keyboard), graphics adapter (monitor), disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector (1)

The CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.


Reading a Disk Sector (2)
The disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.


Reading a Disk Sector (3)
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU).


Non-Volatile Memories
• DRAM and SRAM are volatile memories: they lose information if powered off.
• Non-volatile memories retain their value even if powered off:
  • Read-only memory (ROM): programmed during production
  • Programmable ROM (PROM): can be programmed once
  • Erasable PROM (EPROM): can be bulk erased (UV, X-ray)
  • Electrically erasable PROM (EEPROM): electronic erase capability
  • Flash memory: EEPROM with partial (block-level) erase capability; wears out after about 100,000 erase cycles
• Uses for non-volatile memories:
  • Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, …)
  • Solid state disks (replace rotating disks in thumb drives, smart phones, MP3 players, tablets, laptops, …)
  • Disk caches in large database systems

Flash Memory Technology
Flash Operation
NAND vs. NOR Flash Memories
NAND Flash & Secured Digital (SD) Cards
Solid State Drive (SSD) vs. Hard Disk Drive (HDD)
[Figures only on these slides.]

Solid State Disks (SSDs)

[Figure: requests to read and write logical disk blocks arrive over the I/O bus; inside the SSD, a flash translation layer maps them onto flash memory organized as blocks 0 … B−1, each containing pages 0 … P−1.]

• Pages: 512 B to 4 KB; blocks: 32 to 128 pages
• Data is read/written in units of pages.
• A page can be written only after its block has been erased.
• A block wears out after about 100,000 repeated writes.

SSD Tradeoffs vs. Rotating Disks
• Advantages
  • No moving parts → faster, less power, more rugged

• Disadvantages
  • Have the potential to wear out
    • Mitigated by “wear-leveling logic” in the flash translation layer
    • E.g., the Intel SSD 730 guarantees 128 petabytes (128 × 10^15 bytes) of writes before wearing out
  • In 2015, about 30 times more expensive per byte
• Applications
  • MP3 players, smart phones, laptops
  • Beginning to appear in desktops and servers (as disk caches)

The CPU-Memory-Storage Gaps

[Figure: log-scale plot of time (ns) vs. year (1985–2015) for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time. The gap between CPU cycle time and DRAM/disk access times has widened by orders of magnitude; SSDs sit between DRAM and disk.]


Next, Lecture 12: “ECF I: Exceptions and Processes” John P. Shen & Gregory Kesden, October 9, 2017

➢ Required Reading Assignment: • Chapter 5 of CS:APP (3rd edition) by Randy Bryant & Dave O’Hallaron.
