Memory Hierarchy: Locality, and Storage Technologies
Total Page:16
File Type:pdf, Size:1020Kb
Memory Hierarchy: locality, and storage technologies Yipeng Huang Rutgers University April 1, 2021 1/16 Table of contents Announcements Cache, memory, storage, and network hierarchy trends Static random-access memory (caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage) Locality: How to create illusion of fast access to capacious data Spatial locality Temporal locality 2/16 Looking ahead Class plan 1. Today, Thursday, 4/1: Introduction to locality and the memory hierarchy. Caches. 2. Preview of next two assignments—PA5: cache simulator and performance. PA6: digital logic. 3/16 Table of contents Announcements Cache, memory, storage, and network hierarchy trends Static random-access memory (caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage) Locality: How to create illusion of fast access to capacious data Spatial locality Temporal locality 4/16 Cache, memory, storage, and network hierarchy trends L0: Regs CPU registers hold words Smaller, retrieved from the L1 cache. faster, L1: L1 cache and (SRAM) L1 cache holds cache lines retrieved from the L2 cache. costlier L2 cache I Assembly (per byte) L2: (SRAM) storage L2 cache holds cache lines programming view devices retrieved from L3 cache L3: L3 cache of computer: CPU (SRAM) L3 cache holds cache lines and memory. retrieved from main memory. Larger, I Full view of slower, L4: Main memory and (DRAM) Main memory holds disk computer cheaper blocks retrieved from (per byte) local disks. architecture and storage L5: Local secondary storage devices (local disks) systems: +caches, Local disks hold files retrieved from disks +storage, +network on remote servers L6: Remote secondary storage (e.g., Web servers) Figure: Memory hierarchy. Image credit CS:APP 5/16 Cache, memory, storage, and network hierarchy trends 100,000,000.0 10,000,000.0 1,000,000.0 100,000.0 Disk seek time Topic of this chapter: 10,000.0 SSD access time 1,000.0 DRAM access time I Technology trends Time (ns) Time 100.0 SRAM access time that drive 10.0 CPU cycle time Effective CPU cycle time CPU-memory gap. 1.0 0.1 I How to create 0.0 1985 1990 1995 2000 2003 2005 2010 2015 illusion of fast Year access to capacious data. Figure: Widening gap: CPU processing time vs. memory access time. Image credit CS:APP 6/16 Static random-access memory (caches) I SRAM is bistable logic I Access time: 1 to 10 CPU clock cycles I Implemented in the same transistor technology as CPUs, so improvement has matched pace. Figure: SRAM operating principle. Image credit Wikimedia 7/16 Dynamic random-access memory (main memory) I Needs refreshing every 10s of milliseconds I 8GB typical in laptop; 1TB on each ilab machine I Memory gap: DRAM technological improvement slower relative to CPU/SRAM. Figure: DRAM operating principle. Image credit ocw.mit.edu 8/16 CPU / DRAM main memory interface CPU chip! Register file! ALU! System bus! Memory bus! I/O ! Main! Bus interface! bridge! memory! Figure: Memory Bus. Image credit CS:APP Figure: Intel 2020 Gulftown die shot. Image credit AnandTech I DDR4 bus standard supports 25.6GB/s transfer rate 9/16 Solid state and hard disk drives (storage) I/O bus Requests to read and ! write logical disk blocks! Solid State Disk (SSD) Flash translation layer Flash memory Technology Block 0 Block B-1 Page 0 Page 1 … Page P-1 … Page 0 Page 1 … Page P-1 I SSD: flash nonvolatile memory stores data as charge. I HDD: magnetic orientation. Figure: SSD. Image credit CS:APP For in-depth on storage, file systems, and operating systems, take: Cylinder k CS214 Systems Programming Surface 0! I Platter 0! Surface 1! Surface 2! I CS416 Operating Systems Design Platter 1! Surface 3! Surface 4! Platter 2! Over the summer, LCSR (admins of iLab) will be moving the storage systems Surface 5! that supports everyone’s home directories to SSD. https://resources.cs. Spindle! rutgers.edu/docs/file-storage/storage-technology-options/ Figure: HDD. Image credit CS:APP 10/16 I/O interfaces CPU! Register file! ALU! System bus! Memory bus! Storage interfaces SATA 3.0 interface (6Gb/s I/O ! Main! I Bus interface! bridge! memory! transfer rate) typical I PCIe (15.8 GB/s) becoming commonplace for SSD I/O bus! Expansion slots for! other devices such! USB! Graphics! Host bus ! as network adapters! I But interface data rate is controller! adapter! adapter ! ! (SCSI/SATA)! rarely the bottleneck. Mouse! Solid ! Key! Monitor! state ! board! Disk ! disk! controller! For in-depth on computer Disk drive! network layers, take: I CS352 Internet Technology Figure: I/O Bus. Image credit CS:APP 11/16 Cache, memory, storage, and network hierarchy trends 100,000,000.0 10,000,000.0 1,000,000.0 100,000.0 Disk seek time Topic of this chapter: 10,000.0 SSD access time 1,000.0 DRAM access time I Technology trends Time (ns) Time 100.0 SRAM access time that drive 10.0 CPU cycle time Effective CPU cycle time CPU-memory gap. 1.0 0.1 I How to create 0.0 1985 1990 1995 2000 2003 2005 2010 2015 illusion of fast Year access to capacious data. Figure: Widening gap: CPU processing time vs. memory access time. Image credit CS:APP 12/16 Table of contents Announcements Cache, memory, storage, and network hierarchy trends Static random-access memory (caches) Dynamic random-access memory (main memory) Solid state and hard disk drives (storage) Locality: How to create illusion of fast access to capacious data Spatial locality Temporal locality 13/16 Locality: How to create illusion of fast access to capacious data From the perspective of memory hierarchy, locality is using the data in at any particular level more frequently than accessing storage at next slower level. Well-written programs maximize locality I Spatial locality I Temporal locality 14/16 Spatial locality 1 double dotProduct ( 2 double a[N], double b[N], 3 Spatial locality 4 ){ 5 double sum = 0.0; I Programs tend to access adjacent 6 for(size_t i=0; i<N; i++){ data. 7 sum += a[i] * b[i]; I Example: stride 1 memory access in 8 } a and b. 9 return sum; 10 } 15/16 Temporal locality 1 double dotProduct ( 2 double a[N], double b[N], 3 Temporal locality 4 ){ 5 double sum = 0.0; I Programs tend to access data over 6 for(size_t i=0; i<N; i++){ and over. 7 sum += a[i] * b[i]; I Example: sum gets accessed N times 8 } in iteration. 9 return sum; 10 } 16/16.