The Memory Hierarchy

Chapter 6

1 Outline

! Storage technologies and trends
! Locality of reference
! Caching in the memory hierarchy

2 Random-Access Memory (RAM)

! Key features
  ! RAM is traditionally packaged as a chip.
  ! Basic storage unit is normally a cell (one bit per cell).
  ! Multiple RAM chips form a memory.
! Static RAM (SRAM)
  ! Each cell stores a bit with a four- or six-transistor circuit.
  ! Retains value indefinitely, as long as it is kept powered.
  ! Relatively insensitive to electrical noise (EMI), radiation, etc.
  ! Faster and more expensive than DRAM.
! Dynamic RAM (DRAM)
  ! Each cell stores a bit with a capacitor. One transistor is used for access.
  ! Value must be refreshed every 10-100 ms.
  ! More sensitive to disturbances (EMI, radiation, ...) than SRAM.
  ! Slower and cheaper than SRAM.

3 SRAM vs DRAM Summary

       Transistors  Access  Needs     Needs
       per bit      time    refresh?  EDC?   Cost  Applications
SRAM   4 or 6       1X      No        Maybe  100X  Cache memories
DRAM   1            10X     Yes       Yes    1X    Main memories, frame buffers

4 Enhanced DRAMs

! Basic DRAM cell has not changed since its invention in 1966.
  ! Commercialized by Intel in 1970.
! DRAM cores with better interface logic and faster I/O:
  ! Synchronous DRAM (SDRAM)
    ! Uses a conventional clock signal instead of asynchronous control
    ! Allows reuse of the row addresses (e.g., RAS, CAS, CAS, CAS)

  ! Double data-rate synchronous DRAM (DDR SDRAM)
    ! Double edge clocking sends two bits per cycle per pin
    ! Different types distinguished by size of small prefetch buffer:
      - DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits)
    ! By 2010, standard for most server and desktop systems
    ! Intel Core i7 supports only DDR3 SDRAM

5 Memory Modules

[Diagram: a 64 MB memory module consisting of eight 8M×8 DRAM chips (DRAM 0 through DRAM 7). The memory controller sends addr (row = i, col = j) to every chip; each chip supplies the 8 bits of its supercell (i, j), with chip k providing bits 8k to 8k+7, and the module assembles bits 0-63 into the 64-bit doubleword at main memory address A.]

6 Optional Slides: Conventional DRAM Organization

! A d × w DRAM stores d supercells of w bits each (e.g., 16 × 8 = 128 bits):
  ! Each supercell has w bits (e.g., w = 8).
  ! The number of supercells is rows × cols = d (e.g., 4 × 4 = 16).

[Diagram: a 16 × 8 DRAM chip organized as 4 rows × 4 cols of supercells. The memory controller sends a 2-bit addr and transfers 8 bits of data per access; an accessed row is first copied into an internal row buffer. Supercell (2, 1) is marked.]

7 Optional Slides: Reading DRAM Supercell (2,1)

Step 1: the memory controller sends row address RAS = 2 on the addr lines, and the chip copies all of row 2 into its internal row buffer.

8 Optional Slides: Reading DRAM Supercell (2,1)

Step 2: the memory controller sends column address CAS = 1; the chip reads supercell (2, 1) out of the row buffer and returns its 8 bits on the data lines to the memory controller (and on to the CPU).

9 Nonvolatile Memories

! DRAM and SRAM are volatile memories
  ! Lose information if powered off.
! Nonvolatile memories retain value even if powered off
  ! Read-only memory (ROM): programmed during production
  ! Programmable ROM (PROM): can be programmed once
  ! Erasable PROM (EPROM): can be bulk erased (UV, X-ray)
  ! Electrically erasable PROM (EEPROM): electronic erase capability
  ! Flash memory: EEPROM with partial (sector) erase capability
    ! Wears out after about 100,000 erasings.
! Uses for nonvolatile memories
  ! Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
  ! Solid state disks (replace rotating disks in thumb drives, smart phones, mp3 players, tablets, laptops, ...)
  ! Disk caches

10 Traditional Bus Structure

A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.

[Diagram: a CPU chip containing the register file, ALU, and bus interface. The system bus connects the bus interface to the I/O bridge; the memory bus connects the I/O bridge to main memory.]

11 Memory Read Transaction

The CPU places address A on the memory bus.

[Diagram: load operation movq A, %rax; main memory holds word x at address A.]

12 Memory Read Transaction

The CPU places address A on the memory bus.

[Diagram: load operation movq A, %rax; the I/O bridge places Addr A on the memory bus.]

13 Memory Read Transaction

Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

[Diagram: load operation movq A, %rax; main memory places word x on the memory bus.]

14 Memory Read Transaction

The CPU reads word x from the bus and copies it into register %rax.

[Diagram: load operation movq A, %rax; register %rax now contains x.]

15 Memory Write Transaction

The CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.

[Diagram: store operation movq %rax, A; register %rax holds word Y, and Addr A is on the memory bus.]

16 Memory Write Transaction

The CPU places data word Y on the bus.

[Diagram: store operation movq %rax, A; data word Y is on the memory bus.]

17 Memory Write Transaction

Main memory reads data word Y from the bus and stores it at address A.

[Diagram: main memory now holds Y at address A.]

18 What's Inside a Disk Drive?

[Photo: a disk drive with labeled components: platters, spindle, arm, read/write head, actuator, SCSI connector, and electronics (including a processor and memory!). Image courtesy of Seagate Technology.]

19 Disk Geometry

[Diagram: a recording surface spinning on a spindle, with one track highlighted, a sector marked off along it, and individual bits within the sector.]

20 Disk Geometry

A sector is a sequence of bytes (usually 512 bytes). Each track consists of sectors separated by gaps. Each surface consists of many tracks. Disks consist of platters, each with two surfaces.

[Diagram: a surface with concentric tracks; track k is highlighted, showing its sectors and the gaps between them, around the central spindle.]

21 Disk Capacity

Capacity: maximum number of bits that can be stored. Vendors express capacity in units of gigabytes (GB). Note: 1 GB = 10^9 bytes, not 2^30 bytes.

Capacity is determined by these technology factors:
! Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.
! Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.
! Areal density (bits/in^2): product of recording density and track density.

22 Disk Geometry (Multiple-Platter View)

[Diagram: three platters on a common spindle; platter 0 carries surfaces 0 and 1, platter 1 carries surfaces 2 and 3, platter 2 carries surfaces 4 and 5.]

23 Disk Geometry (Multiple-Platter View)

Aligned tracks form a cylinder (e.g., cylinder k).

[Diagram: the same three-platter stack, with track k on each of the six surfaces aligned vertically to form cylinder k.]

24 Recording Zones

Want each sector to be about the same size. Outer tracks have lots of sectors; inner tracks have fewer sectors.

[Diagram: top view of a platter around its spindle, showing equal-sized sectors, more per outer track than per inner track.]

25 Recording Zones

Modern disks partition tracks into disjoint subsets called recording zones.
! Each track in a zone has the same number of sectors (determined by the circumference of the innermost track).
! Each zone has a different number of sectors/track:
  ! Outer zones have more sectors/track.
  ! Inner zones have fewer sectors/track.
! Use the average number of sectors/track when computing capacity.

26 Computing Disk Capacity

Capacity = (# bytes/sector) × (average # sectors/track) × (# tracks/surface) × (# surfaces/platter) × (# platters/disk)

Example:
! 512 bytes/sector
! 300 sectors/track (on average)
! 20,000 tracks/surface
! 2 surfaces/platter
! 5 platters/disk

Capacity = 512 × 300 × 20,000 × 2 × 5 = 30,720,000,000 bytes = 30.72 GB

27 Disk Operation (Single-Platter View)

The disk surface spins at a fixed rotational rate. The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air. By moving radially, the arm can position the read/write head over any track.

[Diagram: a single platter spinning on its spindle, with the arm positioning the read/write head over a track.]

28 Disk Operation (Multi-Platter View)

Read/write heads move in unison from cylinder to cylinder.

[Diagram: a stack of platters on a common spindle, with one arm per surface, all moving together.]

29 Disk Structure: Top View of Single Platter

Surface organized into tracks

Tracks divided into sectors

30 Disk Access

Head in position above a track

31 Disk Access

Rotation is counter-clockwise

32 Disk Access – Read

About to read blue sector

33 Disk Access – Read

After BLUE read

After reading blue sector

34 Disk Access – Read

After BLUE read

Red request scheduled next

35 Disk Access – Seek

After BLUE read Seek for RED

Seek to red’s track

36 Disk Access – Rotational Latency

After BLUE read Seek for RED Rotational latency

Wait for red sector to rotate around

37 Disk Access – Read

After BLUE read Seek for RED Rotational latency After RED read

Complete read of red

38 Disk Access – Service Time Components

After BLUE read Seek for RED Rotational latency After RED read

Service time = data transfer (blue) + seek + rotational latency + data transfer (red)

39 Disk Access Time

Average time to access some target sector:
Taccess = Tavg seek + Tavg rotation + Tavg transfer

Seek time (Tavg seek)
! Time to position heads over the cylinder containing the target sector.
! Typical: 3-9 ms

Rotational latency (Tavg rotation)
! Time waiting for the first bit of the target sector to pass under the r/w head.
! Tavg rotation = 1/2 × 1/RPM × 60,000 ms/min
! Typical: 1/2 × 1/7200 × 60,000 ms/min = 4 ms

Transfer time (Tavg transfer)
! Time to read the bits in the target sector.
! Tavg transfer = 1/RPM × 1/(avg # sectors/track) × 60,000 ms/min
! Typical: 1/7200 × 1/400 sectors × 60,000 ms/min = 0.02 ms

40 Disk Access Time Example

Given:
! Rotational rate = 7,200 RPM
! Average seek time = 9 ms
! Avg # sectors/track = 400

Derived:
! Tavg rotation = 1/2 × 1/7200 RPM × 60,000 ms/min = 4 ms
! Tavg transfer = 1/7200 RPM × 1/400 sectors/track × 60,000 ms/min = 0.02 ms
! Taccess = 9 ms + 4 ms + 0.02 ms = 13.02 ms

Important points:
! Access time is dominated by seek time and rotational latency.
! The first bit in a sector is the most expensive; the rest are free.
! SRAM access time is about 4 ns/doubleword, DRAM about 60 ns.
! Disk is about 40,000 times slower than SRAM, and 2,500 times slower than DRAM.

41 Logical Disk Blocks

Modern disks present a simpler abstract view of the complex sector geometry:
! The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...).

Mapping between logical blocks and actual (physical) sectors:
! Maintained by a hardware/firmware device called the disk controller.
! Converts requests for logical blocks into (surface, track, sector) triples.

This allows the controller to set aside spare cylinders for each zone.
! Accounts for the difference between "formatted capacity" and "maximum capacity".

42 I/O Bus

[Diagram: the CPU chip (register file, ALU, bus interface) connects via the system bus to the I/O bridge, which connects via the memory bus to main memory. The I/O bridge also drives the I/O bus, which hosts the USB controller (mouse, keyboard), the graphics adapter (monitor), the disk controller (disk), and expansion slots for other devices such as network adapters.]

43 Reading a Disk Sector (1)

The CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

[Diagram: the command travels from the CPU over the system bus and I/O bus to the disk controller.]

44 Reading a Disk Sector (2)

The disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

[Diagram: data moves from the disk controller over the I/O bus and through the I/O bridge into main memory, without involving the CPU.]

Mouse Keyboard Monitor Disk 45 Reading a Disk Sector (3) CPU chip When the DMA transfer completes, Register file the disk controller notifies the CPU ALU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU)

I/O Main Bus interface bridge memory

Memory bus I/O bus System bus

Expansion slots for other devices such USB Graphics Disk as network adapters. controller adapter controller

Mouse Keyboard Monitor Disk 46 Flash Memory: Solid State Disk (SSD)

! Nonvolatile: retains data without power
! Fast (like DRAM)
! More rugged than disks
! More expensive than disks

47 Solid State Disks (SSDs)

[Diagram: an SSD on the I/O bus. Requests to read and write logical disk blocks go through a flash translation layer to flash memory, organized as blocks 0 through B-1, each holding pages 0 through P-1.]

Pages: 512 bytes to 4 KB; blocks: 32 to 128 pages.
! Data is read/written in units of pages.
! A page can only be written after its block has been erased.
! A block wears out after about 100,000 repeated writes.

48 SSD Performance Characteristics

Reading                                   Writing
Sequential read throughput   550 MB/s     Sequential write throughput   470 MB/s
Random read throughput       365 MB/s     Random write throughput       303 MB/s
Avg sequential read time     50 us        Avg sequential write time     60 us

Sequential access is faster than random access. Random writes are slow:
! Erasing a block takes a long time (~1 ms).
! Modifying a page requires all other pages in its block to be copied to a new block.

Source: Intel SSD 730 product specification.

49 SSD Tradeoffs vs Rotating Disks

Advantages of SSDs:
! No moving parts -> faster, less power, more rugged

Disadvantages:
! Have the potential to wear out
  ! Mitigated by "wear leveling logic" in the flash translation layer
  ! Intel SSD 730 guarantees 128 petabytes (128 × 10^15 bytes) of writes before it wears out
! In 2015, about 6 times more expensive per byte*
  * Best Buy: $13/32 GB flash, $60/1 TB external disk

Applications:
! Phones, laptops, portable electronics
! Beginning to appear in desktops and servers

50 Storage Trends

SRAM
Metric        1985    1990   1995  2000  2005  2010  2015  2015:1985
$/MB          2,900   320    256   100   75    60    25    116
access (ns)   150     35     15    3     2     1.5   1.3   115

DRAM
Metric             1985    1990   1995  2000  2005   2010   2015    2015:1985
$/MB               880     100    30    1     0.1    0.06   0.02    44,000
access (ns)        200     100    70    60    50     40     20      10
typical size (MB)  0.256   4      16    64    2,000  8,000  16,000  62,500

Disk
Metric             1985     1990   1995  2000  2005  2010   2015   2015:1985
$/GB               100,000  8,000  300   10    5     0.3    0.03   3,333,333
access (ms)        75       28     10    8     5     3      3      25
typical size (GB)  0.01     0.16   1     20    160   1,500  3,000  300,000

51 CPU Clock Rates

Inflection point in computer history when designers hit the "Power Wall":

                           1985   1990   1995     2003   2005    2010        2015        2015:1985
CPU                        80286  80386  Pentium  P-4    Core 2  Core i7(n)  Core i7(h)
Clock rate (MHz)           6      20     150      3,300  2,000   2,500       3,000       500
Cycle time (ns)            166    50     6        0.30   0.50    0.4         0.33        500
Cores                      1      1      1        1      2       4           4           4
Effective cycle time (ns)  166    50     6        0.30   0.25    0.10        0.08        2,075

(n) Nehalem processor   (h) Haswell processor

52 The CPU-Memory Gap

[Chart: log-scale time (ns) vs. year, 1985-2015, plotting disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time. Disk sits near 10^7 ns, DRAM near 10-100 ns, SRAM near 1-10 ns, and effective CPU cycle time falls below 0.1 ns; the gap between processor and memory/storage speeds widens over time.]

53 Locality to the Rescue!

The key to bridging this CPU-memory gap is a fundamental property of computer programs known as locality.

54 Locality

Principle of locality: programs tend to use data and instructions with addresses near or equal to those they have used recently.

Temporal locality: recently referenced items are likely to be referenced again in the near future.

Spatial locality: items with nearby addresses tend to be referenced close together in time.

55 Locality Example

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Data references:
! Reference array elements in succession (stride-1 reference pattern): spatial locality.
! Reference variable sum each iteration: temporal locality.

Instruction references:
! Reference instructions in sequence: spatial locality.
! Cycle through loop repeatedly: temporal locality.

56 Qualitative Estimates of Locality

Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.

Question: Does this function have good locality with respect to array a?

int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

57 Qualitative Estimates of Locality

Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.

Question: Does this function have good locality with respect to array a?

int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}

58 Locality Example

Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?

int sum_array_3d(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                sum += a[k][i][j];
    return sum;
}

59 Memory Hierarchies

Some fundamental and enduring properties of hardware and software:
! Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
! The gap between CPU and main memory speed is widening.
! Well-written programs tend to exhibit good locality.

These fundamental properties complement each other beautifully.

They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

60 Example Memory Hierarchy

L0: Registers                 CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM)           L1 cache holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM)           L2 cache holds cache lines retrieved from the L3 cache.
L3: L3 cache (SRAM)           L3 cache holds cache lines retrieved from main memory.
L4: Main memory (DRAM)        Main memory holds disk blocks retrieved from local disks.
L5: Local secondary storage   Local disks hold files retrieved from disks on
    (local disks)             remote servers.
L6: Remote secondary storage (e.g., Web servers)

Going up the pyramid, storage devices become smaller, faster, and costlier per byte; going down, larger, slower, and cheaper per byte.

61 Caches

Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:
! For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?
! Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
! Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.

Big Idea: the memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

62 General Cache Concepts

[Diagram: CPU above a memory of sixteen numbered blocks, 0-15.]

63 General Cache Concepts

[Diagram: CPU; below it a cache (smaller, faster, more expensive memory) holding a subset of the blocks; below that, memory with blocks 0-15 (larger, slower, cheaper memory viewed as partitioned into "blocks").]

64 Intel Core i7 Cache Hierarchy

64 Intel Core i7 Cache Hierarchy

[Diagram: processor package with cores 0 through 3. Each core has its own registers, L1 d-cache, L1 i-cache, and L2 cache; all cores share the L3 cache, which connects to main memory.]

L1 i-cache and d-cache: 32 KB each, access: 4 cycles
L2 cache: 256 KB, access: 10 cycles
L3 cache: 8 MB, access: 40-75 cycles (shared by all cores)
Block size: 64 bytes for all caches.

65 General Cache Concepts

[Diagram: CPU; below it a cache (smaller, faster, more expensive memory) caching a subset of the blocks; data is copied between cache and memory in block-sized transfer units; below, memory with blocks 0-15 (larger, slower, cheaper memory viewed as partitioned into "blocks"). Typical block size: 64 bytes.]

66 General Cache Concepts: Placement

The cache has a limited size, and there are constraints on where data can be placed: we cannot put any block in any place!

[Diagram: a four-slot cache above memory blocks 0-15.]

67 General Cache Concepts: Hit

The CPU requests block 10 (the data in block b is needed). Block b is in the cache: hit!

[Diagram: the cache holds block 10, so the request is satisfied without going to memory.]

68 General Cache Concepts: Miss

The CPU requests block 12 (the data in block b is needed). Block b is NOT in the cache: miss!
! The block is fetched from memory (request: 12).
! The block is stored in the cache, replacing one of the blocks already there.
! The request is satisfied, and the block is now available for future accesses.

[Diagram: block 12 moves from memory (blocks 0-15) into the cache.]

69 Questions

A block of data (b) is loaded into the cache… where do you put it?

Placement policy: where can you place b in the cache? Not every block can go into every place in the cache. Why? Efficiency.

Replacement policy: wherever you place b, some other block will get "evicted".

70 General Caching Concepts: Types of Cache Misses

Cold (compulsory) miss
! Cold misses occur because the cache is empty.

Conflict miss
! Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
  ! E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
! Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
  ! E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.

Capacity miss
! Occurs when the set of active cache blocks (working set) is larger than the cache.

71 Examples of Caching in the Hierarchy

Cache Type            What is Cached?       Where is it Cached?  Latency (cycles)  Managed By
Registers             4-8 byte words        CPU core             0                 Compiler
TLB                   Address translations  On-chip TLB          0                 Hardware
L1 cache              64-byte blocks        On-chip L1           1                 Hardware
L2 cache              64-byte blocks        On/off-chip L2       10                Hardware
Virtual memory        4-KB pages            Main memory          100               Hardware + OS
Buffer cache          Parts of files        Main memory          100               OS
Disk cache            Disk sectors          Disk controller      100,000           Disk firmware
Network buffer cache  Parts of files        Local disk           10,000,000        AFS/NFS client
Browser cache         Web pages             Local disk           10,000,000        Web browser
Web cache             Web pages             Remote server disks  1,000,000,000     Web proxy server

72 Summary

! The speed gap between CPU, memory, and mass storage continues to widen.

! Well-written programs exhibit a property called locality.

! Memory hierarchies based on caching close the gap by exploiting locality.
