Memory Hierarchy: Locality and Storage Technologies

Yipeng Huang

Rutgers University

April 1, 2021

Table of contents

Announcements

Cache, memory, storage, and network hierarchy trends
- Static random-access memory (caches)
- Dynamic random-access memory (main memory)
- Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data
- Spatial locality
- Temporal locality

Looking ahead

Class plan
1. Today, Thursday, 4/1: Introduction to locality and the memory hierarchy. Caches.
2. Preview of the next two assignments. PA5: simulator and performance. PA6: digital logic.

Table of contents

Announcements

Cache, memory, storage, and network hierarchy trends
- Static random-access memory (caches)
- Dynamic random-access memory (main memory)
- Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data
- Spatial locality
- Temporal locality

Cache, memory, storage, and network hierarchy trends

The memory hierarchy, from smaller, faster, and costlier (per byte) at the top to larger, slower, and cheaper (per byte) at the bottom:
- L0: CPU registers hold words retrieved from the L1 cache.
- L1: L1 cache (SRAM) holds cache lines retrieved from the L2 cache.
- L2: L2 cache (SRAM) holds cache lines retrieved from the L3 cache.
- L3: L3 cache (SRAM) holds cache lines retrieved from main memory.
- L4: Main memory (DRAM) holds disk blocks retrieved from local disks.
- L5: Local secondary storage (local disks) holds files retrieved from disks on remote servers.
- L6: Remote secondary storage (e.g., Web servers).

- Assembly programming view of computer: CPU and memory.
- Full view of computer architecture and storage systems: +caches, +storage, +network.

Figure: Memory hierarchy. Image credit CS:APP

Cache, memory, storage, and network hierarchy trends

Topic of this chapter:
- Technology trends that drive the CPU-memory gap.
- How to create illusion of fast access to capacious data.

Figure: Widening gap: CPU processing time vs. memory access time, 1985-2015. The plot shows disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time in nanoseconds. Image credit CS:APP

Static random-access memory (caches)

- SRAM is bistable logic.
- Access time: 1 to 10 CPU clock cycles.
- Implemented in the same transistor technology as CPUs, so improvement has matched pace.

Figure: SRAM operating principle. Image credit Wikimedia

Dynamic random-access memory (main memory)

- Needs refreshing every 10s of milliseconds.
- 8GB typical in laptop; 1TB on each ilab machine.
- Memory gap: DRAM technological improvement slower relative to CPU/SRAM.

Figure: DRAM operating principle. Image credit ocw.mit.edu
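The SRAM-versus-DRAM access times above can be observed from ordinary C code. The micro-benchmark below is a rough sketch, not part of the slides; the function name ns_per_load, the 32 KB and 256 MB working-set sizes, and the load counts are illustrative assumptions. It chases a shuffled pointer chain so that each load depends on the previous one; the average time per load should jump by roughly an order of magnitude once the working set no longer fits in the SRAM caches.

    /* Sketch (not from the slides): time dependent loads over working sets of
     * different sizes to expose the SRAM-vs-DRAM access-time gap.
     * Compile with something like: gcc -O2 chase.c -o chase */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static volatile size_t g_sink;  /* keeps the chase from being optimized away */

    static double ns_per_load(size_t n_elems, size_t n_loads) {
        size_t *next = malloc(n_elems * sizeof *next);
        if (next == NULL)
            return 0.0;

        /* Shuffle the indices into one big cycle (Sattolo's algorithm) so the
         * hardware prefetcher cannot predict the next address. */
        for (size_t i = 0; i < n_elems; i++)
            next[i] = i;
        for (size_t i = n_elems - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t idx = 0;
        for (size_t i = 0; i < n_loads; i++)
            idx = next[idx];               /* each load depends on the previous one */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        g_sink = idx;
        free(next);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        return ns / (double)n_loads;
    }

    int main(void) {
        /* 32 KB fits in a typical L1 cache; 256 MB forces most loads to DRAM. */
        printf("32 KB working set:  %.1f ns/load\n",
               ns_per_load(32 * 1024 / sizeof(size_t), 10 * 1000 * 1000));
        printf("256 MB working set: %.1f ns/load\n",
               ns_per_load(256UL * 1024 * 1024 / sizeof(size_t), 10 * 1000 * 1000));
        return 0;
    }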

CPU / DRAM main memory interface

Figure: Memory bus. The CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory. Image credit CS:APP

Figure: Intel 2010 Gulftown die shot. Image credit AnandTech

- The DDR4 bus standard supports a 25.6 GB/s transfer rate.
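As a rough sanity check on that number (assuming it refers to a single DDR4-3200 channel, which the slide does not state): a DDR4 channel is 64 bits (8 bytes) wide, so 3200 MT/s × 8 bytes per transfer = 25,600 MB/s, or about 25.6 GB/s.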

Solid state and hard disk drives (storage)

- SSD: flash nonvolatile memory stores data as charge.
- HDD: magnetic orientation.

Figure: SSD. Requests to read and write blocks pass through a flash translation layer to flash memory organized as blocks of pages. Image credit CS:APP

Figure: HDD. Platters with recording surfaces spin around a spindle; matching tracks across the surfaces form a cylinder. Image credit CS:APP

For in-depth coverage of storage, file systems, and operating systems, take:
- CS214 Systems Programming
- CS416 Operating Systems Design

Over the summer, LCSR (admins of iLab) will be moving the storage systems that support everyone's home directories to SSD. https://resources.cs.rutgers.edu/docs/file-storage/storage-technology-options/

I/O interfaces

Storage interfaces:
- SATA 3.0 interface (6 Gb/s transfer rate) typical.
- PCIe (15.8 GB/s) becoming commonplace for SSD.
- But interface data rate is rarely the bottleneck.

For in-depth coverage of computer network layers, take:
- CS352 Internet Technology
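A hedged aside on those interface rates (the PCIe generation and lane count are my assumption; the slide gives only the throughput): SATA 3.0 signals at 6 Gb/s but uses 8b/10b encoding, so its usable payload rate is about 6 Gb/s × 8/10 ÷ 8 bits per byte = 600 MB/s, while 15.8 GB/s matches roughly a 16-lane PCIe 3.0 link (16 × 8 GT/s × 128/130 ≈ 15.75 GB/s). Both are well below DDR4's 25.6 GB/s, yet for a hard disk the mechanical seek time, not the interface, dominates, which is why the interface data rate is rarely the bottleneck.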

Figure: I/O bus. The I/O bridge connects the CPU and main memory to an I/O bus with expansion slots for devices such as USB controllers, graphics adapters, network adapters, and host bus adapters (SCSI/SATA) for disks and solid state disks. Image credit CS:APP

Cache, memory, storage, and network hierarchy trends

Topic of this chapter:
- Technology trends that drive the CPU-memory gap.
- How to create illusion of fast access to capacious data.

Figure: Widening gap: CPU processing time vs. memory access time. Image credit CS:APP

Table of contents

Announcements

Cache, memory, storage, and network hierarchy trends
- Static random-access memory (caches)
- Dynamic random-access memory (main memory)
- Solid state and hard disk drives (storage)

Locality: How to create illusion of fast access to capacious data
- Spatial locality
- Temporal locality

Locality: How to create illusion of fast access to capacious data

From the perspective of the memory hierarchy, locality is using the data at any particular level more frequently than accessing storage at the next slower level. Well-written programs maximize locality:
- Spatial locality
- Temporal locality

Spatial locality

    double dotProduct(double a[N], double b[N]) {
        double sum = 0.0;
        for (size_t i = 0; i < N; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

- Spatial locality: programs tend to access adjacent data. Here a[i] and b[i] are read element by element in sequential, stride-1 order.
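To make the stride-1 point concrete, here is a sketch that is not from the slides; the names sumRowMajor and sumColMajor and the size N are illustrative. C stores 2D arrays in row-major order, so the first loop nest touches adjacent doubles on every step, while the second jumps a full row of doubles at a time and gets far less use out of each cache line it loads.

    #include <stddef.h>

    #define N 2048

    /* Good spatial locality: the inner loop walks adjacent elements (stride-1),
     * so every element of a loaded cache line is used. */
    double sumRowMajor(double m[N][N]) {
        double sum = 0.0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                sum += m[i][j];
        return sum;
    }

    /* Poor spatial locality: the inner loop jumps N doubles (one full row) per
     * step, so each cache line contributes only one useful element before it
     * is likely evicted. */
    double sumColMajor(double m[N][N]) {
        double sum = 0.0;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                sum += m[i][j];
        return sum;
    }

Both functions return the same value; only the traversal order, and with it the number of cache misses, differs.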

Temporal locality

    double dotProduct(double a[N], double b[N]) {
        double sum = 0.0;
        for (size_t i = 0; i < N; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

- Temporal locality: programs tend to access the same data over and over. Here sum and i are referenced on every iteration of the loop.
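Temporal locality also shows up when one small piece of data is reused against a large stream. The sketch below is not from the slides; the function bestBlockDot and the constants K and M are illustrative assumptions. The small weights array is reused for every block of the much larger samples array, so after the first block it stays hot in the cache, while each sample is read only once.

    #include <stddef.h>

    #define K 1024            /* weights: 1024 doubles = 8 KB, fits in L1/L2 cache */
    #define M (1024 * 1024)   /* samples: 8 MB, larger than most on-chip caches    */

    /* The weights array is re-read on every block of samples (M / K = 1024 times
     * in total), so after the first block it stays cache-resident: temporal
     * locality. The samples array is streamed through only once. */
    double bestBlockDot(const double samples[M], const double weights[K]) {
        double best = 0.0;
        for (size_t start = 0; start + K <= M; start += K) {
            double dot = 0.0;
            for (size_t i = 0; i < K; i++)
                dot += samples[start + i] * weights[i];
            if (dot > best)
                best = dot;
        }
        return best;
    }

Cache blocking (tiling) in matrix code exploits exactly this kind of reuse: it reshapes loops so that a block of data is used many times while it is still resident in the cache.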
