Memory Hierarchy

Memory The Memory/Processor Performance Gap Memory-Processor Performance Gap The memory hierarchy CPU Increasing distance Level 1 from the CPU in access time Levels in the Level 2 memory hierarchy Level n Size of the memory at each level The memory hierarchy Fundamentals of Memory Hierarchy Locality Temporal locality The currently required data are likely to need again in the near future Spatial locality There is high probability that the other data nearby will be need soon Two Processor/Memory Architectures Processor Processor Princeton Fewer memory wires Harvard Program Data memory Memory Simultaneous memory (program and data) program and data memory access Harvard Princeton Memory Access Process Physical Address Virtual Tag address hit miss CPU Main TLB Cache Memory miss hit Address Translation data TLB (Translation lookaside buffer): Translate a virtual address to a physical address Memory management units Handles DRAM refresh, bus interface and arbitration Takes care of memory sharing among multiple processors Translates logic memory addresses from processor to physical memory addresses logical physical address memory address main CPU management memory unit Memory Data Organization Endianness Big Endian/Little Endian Memory data alignment Endianness The order of bytes (sometimes “bit”) in memory to represent different data types Little/Big Endian Little Endian: put the least-significant byte first (at lower address) e.g. Intel Processor Big Endian: put the most-significant byte first e.g.some PowerPCs, Motorola, MIPS, SPARC Big/Little Endian Example 32bit data 0xFABC0123 at address 0xFF20 0xFF20 0xFF21 0xFF22 0xFF23 Big Endian 0xFA 0xBC 0x01 0x23 Little Endian 0x23 0x01 0xBC 0xFA Data Alignment Data alignment A datum with multiple bytes need to be allocated to an address that is a multiple of its size Examples 0bxxxxxxxxxx byte (8bit) aligned 0bxxxxxxxxx0 half word (16bit) aligned 0bxxxxxxxx00 word (32bit) aligned 0bxxxxxxx000 double word (64bit) aligned Why aligned? Misalignment causes implementation complications and reduces performance Memory device: basic concepts m × n memory Stores large number of bits … m x n: m words of n bits each … words 2 k = Log (m) address input signals m or m = 2^k words e.g., 4,096 x 8 memory: n bits per word 32,768 bits 12 address input signals memory external view r/w 8 input/output data signals 2k × n read and write memory Memory access enable A0 r/w: selects read or write … enable: read or write only when asserted Ak-1 … multiport: multiple accesses to different locations simultaneously Qn-1 Q0 Memory Types ROM: “Read-Only” Memory Nonvolatile memory Can be read from but not written to, by a processor in an embedded system External view Traditionally written to, “programmed”, before inserting to enable 2k × n ROM embedded system A0 … Uses Ak-1 … Store software program for general-purpose processor Q Q program instructions can be one or more ROM words n-1 0 Store constant data needed by system Implement combinational circuit Example: 8 x 4 ROM Horizontal lines = words Internal view Vertical lines = data 8 × 4 ROM Lines connected only at circles word 0 enable 3×8 word 1 decoder Decoder sets word 2’s line to 1 if address input word 2 word line is 010 A0 A1 Data lines Q3 and Q1 are set to 1 because there A2 is a “programmed” connection with word 2’s line data line programmable wired-OR Word 2 is not connected with data lines Q2 and connection Q0 Q3 Q2 Q1 Q0 Output is 1010 Implementing combinational function Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM Truth table Inputs (address) Outputs a b c y z 8×2 ROM word 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 word 1 0 1 0 0 1 0 1 0 1 1 1 0 enable 1 0 1 0 0 1 0 1 0 1 0 1 1 1 c 1 1 1 1 0 1 1 1 1 b 1 1 1 1 1 1 1 word 7 a y z Types of ROM Written during manufacture Programmable (once) PROM Needs special equipment to program Read “mostly” Erasable Programmable (EPROM) Erased by UV can program and erase individual words Electrically Erasable (EEPROM) Takes much longer to write than read can program and erase individual words as well Flash memory Large blocks of memory read/write at once, rather than one word at a time Faster erase RAM external view r/w 2k × n read and write Random access memory enable memory Typically volatile memory A 0 … bits are not held without power supply Ak-1 Read and written easily by processor during execution … Internal structure more complex than ROM Qn-1 Q0 a word consists of several memory cells, each storing 1 bit internal view I I I I each input and output data line connects to each cell in its 3 2 1 0 column 4×4 RAM rd/wr connected to every cell enable 2×4 decoder when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit A0 A1 when rd/wr indicates read Memory cell rd/wr To every cell Q3 Q2 Q1 Q0 Types of RAM SRAM: Static RAM Memory cell uses flip-flop to store bit Holds data as long as power supplied DRAM: Dynamic RAM Memory cell uses transistor and capacitor to store bit More compact than SRAM “Refresh” required due to capacitor leak Slower to access than SRAM Device Schematic and Time Diagram data<7…0> 11-13, 15-19 11-13, 15-19 data<7…0> 2,23,21,24, addr<15...0> 27,26,2,23,21, addr<15...0> 25, 3-10 24,25, 3-10 22 /OE 22 /OE 27 /WE 20 /CS 20 /CS1 27C256 26 CS2 HM6264 block diagrams Device Access Time (ns) Standby Pwr. (mW) Active Pwr. (mW) Vcc Voltage (V) HM6264 85-100 .01 15 5 27C256 90 .5 100 5 device characteristics Read operation Write operation data data addr addr OE WE /CS1 /CS1 CS2 CS2 timing diagrams Composing memory Increase number of words Memory size needed often differs from size of readily available 2m+1 × n ROM memories 2m × n ROM When available memory is larger, simply ignore unneeded high- A0 … … order address bits and higher data lines Am-1 1 × 2 … When available memory is smaller, compose several smaller Am decoder memories into one larger memory 2m × n ROM enable Connect side-by-side to increase width of words … Connect top to bottom to increase number of words … added high-order address line selects smaller memory containing desired word using a decoder Combine techniques to increase number and width of words … Qn-1 Q0 A 2m × 3n ROM Increase number enable 2m × n ROM 2m × n ROM 2m × n ROM and width of Increase width words A of words 0 … … … enable Am … … … outputs Q3n-1 Q2n-1 Q0 Summary Memory hierarchy Memory/processor architecture Memory access process Endianness Data alignment Memory data organization Memory devices Basics ROM/RAM .

Memory Hierarchy

Data Storage the CPU-Memory

Managing the Memory Hierarchy

Embedded DRAM

The Memory Hierarchy

Lecture 10: Memory Hierarchy -- Memory Technology and Principal of Locality

Chapter 2: Memory Hierarchy Design

Memory Hierarchy Summary

Memory Hierarchy Design

Memory Hierarchy

The Lower Levels of the Memory Hierarchy: Storage Systems

14. Caches & the Memory Hierarchy

Chapter 5 Memory Hierarchy