Memory Hierarchy: locality, and storage technologies
Yipeng Huang
Rutgers University
April 1, 2021
1/16 Table of contents
Announcements
Cache, memory, storage, and network hierarchy trends
- Static random-access memory (caches)
- Dynamic random-access memory (main memory)
- Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data
- Spatial locality
- Temporal locality
2/16 Looking ahead
Class plan
1. Today, Thursday, 4/1: Introduction to locality and the memory hierarchy. Caches.
2. Preview of next two assignments. PA5: cache simulator and performance. PA6: digital logic.
4/16 Cache, memory, storage, and network hierarchy trends
Two views of the computer:
- Assembly programming view: CPU and memory.
- Full view of computer architecture and storage systems: +caches, +storage, +network.

Memory hierarchy levels, smaller, faster, and costlier (per byte) at the top; larger, slower, and cheaper (per byte) at the bottom:
- L0: Registers. CPU registers hold words retrieved from the L1 cache.
- L1: L1 cache (SRAM) holds cache lines retrieved from the L2 cache.
- L2: L2 cache (SRAM) holds cache lines retrieved from the L3 cache.
- L3: L3 cache (SRAM) holds cache lines retrieved from main memory.
- L4: Main memory (DRAM) holds disk blocks retrieved from local disks.
- L5: Local secondary storage (local disks) holds files retrieved from disks on remote servers.
- L6: Remote secondary storage (e.g., Web servers).
Figure: Memory hierarchy. Image credit CS:APP

5/16 Cache, memory, storage, and network hierarchy trends
Topic of this chapter:
- Technology trends that drive the CPU-memory gap.
- How to create the illusion of fast access to capacious data.

Figure: Widening gap: CPU processing time vs. memory access time, 1985-2015. Curves, from slowest to fastest: disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, effective CPU cycle time (in ns). Image credit CS:APP
6/16 Static random-access memory (caches)
- SRAM stores each bit in a bistable logic circuit (two cross-coupled inverters).
- Access time: 1 to 10 CPU clock cycles.
- Implemented in the same transistor technology as CPUs, so its speed improvement has kept pace.
Figure: SRAM operating principle. Image credit Wikimedia
7/16 Dynamic random-access memory (main memory)
- DRAM stores each bit as charge on a capacitor, which leaks, so every cell needs refreshing every few tens of milliseconds.
- 8 GB typical in a laptop; 1 TB on each ilab machine.
- Memory gap: DRAM technological improvement has been slower relative to CPU/SRAM.

Figure: DRAM operating principle. Image credit ocw.mit.edu
8/16 CPU / DRAM main memory interface
Figure: Memory bus. The CPU chip (register file, ALU) connects through its bus interface and the system bus to the I/O bridge, which connects over the memory bus to main memory. Image credit CS:APP

Figure: Intel 2010 Gulftown die shot. Image credit AnandTech

- DDR4 bus standard supports a 25.6 GB/s transfer rate.
9/16 Solid state and hard disk drives (storage)
- SSD: flash nonvolatile memory stores data as charge.
- HDD: magnetic orientation.

Figure: Solid state disk (SSD). Requests to read and write logical disk blocks arrive over the I/O bus; the flash translation layer maps them onto flash memory organized as blocks (Block 0 ... Block B-1) of pages (Page 0, Page 1, ... Page P-1). Image credit CS:APP

Figure: Hard disk drive (HDD) geometry: platters 0-2 with surfaces 0-5 on a spindle; cylinder k spans the matching track on every surface. Image credit CS:APP

For in-depth coverage of storage, file systems, and operating systems, take:
- CS214 Systems Programming
- CS416 Operating Systems Design

Over the summer, LCSR (admins of iLab) will be moving the storage systems that support everyone's home directories to SSD. https://resources.cs.rutgers.edu/docs/file-storage/storage-technology-options/

10/16 I/O interfaces
Storage interfaces:
- SATA 3.0 interface (6 Gb/s transfer rate) typical.
- PCIe (15.8 GB/s) becoming commonplace for SSDs.
- But the interface data rate is rarely the bottleneck.

For in-depth coverage of computer network layers, take:
- CS352 Internet Technology

Figure: I/O bus. The CPU (register file, ALU, bus interface) and main memory connect through the I/O bridge to the I/O bus, which has expansion slots for devices such as network adapters: a USB controller (mouse, keyboard), a graphics adapter (monitor), a host bus adapter (SCSI/SATA) for the solid state disk, and a disk controller for the disk drive. Image credit CS:APP
13/16 Locality: How to create illusion of fast access to capacious data
From the perspective of the memory hierarchy, locality means a program uses the data held at any particular level more frequently than it accesses storage at the next, slower level. Well-written programs maximize locality:
- Spatial locality
- Temporal locality
14/16 Spatial locality
Spatial locality
- Programs tend to access adjacent memory locations close together in time: successive iterations of the loop read a[i], b[i] and then a[i+1], b[i+1].

double dotProduct(
    double a[N], double b[N]
) {
    double sum = 0.0;
    for (size_t i = 0; i < N; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

15/16 Temporal locality

Temporal locality
- Programs tend to access the same data over and over again: sum and i are read and written in every iteration of the loop.

double dotProduct(
    double a[N], double b[N]
) {
    double sum = 0.0;
    for (size_t i = 0; i < N; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

16/16