The Memory-Processor Performance Gap

The memory hierarchy

[Figure: levels in the memory hierarchy (CPU, Level 1, Level 2, ..., Level n), showing increasing distance from the CPU in access time and the size of the memory at each level.]

Fundamentals of Memory Hierarchy
• Locality
  • Temporal locality
    • Data that is currently in use is likely to be needed again in the near future
  • Spatial locality
    • There is a high probability that other data nearby will be needed soon

Two Processor/Memory Architectures

• Princeton
  • Fewer memory wires
• Harvard
  • Simultaneous program and data memory access

[Figure: Harvard architecture (processor with separate program memory and data memory) and Princeton architecture (processor with one memory holding program and data).]

Memory Access Process

[Figure: memory access process. Labels: CPU, virtual address, TLB (hit/miss), address translation, physical address, tag (hit/miss), main memory, data.]

TLB (translation lookaside buffer): translates a virtual address to a physical address

Memory management units

• Handles DRAM refresh, bus interface and arbitration
• Takes care of memory sharing among multiple processors
• Translates logical memory addresses from the processor to physical memory addresses
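The translation step can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the 4 KB page size, the page-table contents, the class name `TinyMMU` and the oldest-first TLB replacement are all hypothetical choices for illustration, not something the slides specify.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class TinyMMU:
    """Toy model: a small TLB in front of a page table (both plain dicts)."""

    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # cached subset of the page table
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        # Split the address into a virtual page number and an offset in the page.
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.tlb) >= self.tlb_entries:
                # Evict the oldest entry (dicts keep insertion order).
                self.tlb.pop(next(iter(self.tlb)))
            # A missing key here would correspond to a page fault.
            self.tlb[vpn] = self.page_table[vpn]
        return self.tlb[vpn] * PAGE_SIZE + offset


mmu = TinyMMU({0: 7, 1: 3, 2: 9})
first = mmu.translate(0x1234)   # page 1, offset 0x234: TLB miss
second = mmu.translate(0x1238)  # same page again: TLB hit
print(hex(first), hex(second), mmu.hits, mmu.misses)
```

The second access hits in the TLB because it falls in the same page as the first, which is spatial locality at work.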

[Figure: the CPU sends a logical address to the memory management unit, which sends a physical address to main memory.]

Memory Data Organization
• Endianness
  • Big Endian/Little Endian
• Memory data alignment

Endianness
• The order of bytes (sometimes bits) in memory used to represent different data types
• Little/Big Endian
  • Little Endian: put the least-significant byte first (at the lower address)
    • e.g. Intel processors
  • Big Endian: put the most-significant byte first
    • e.g. Motorola, MIPS, SPARC

Big/Little Endian Example
• 32-bit data 0xFABC0123 at address 0xFF20
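The byte order for this example value can be checked with Python's standard `struct` module. The snippet is a small sketch; the helper name `layout` is a hypothetical choice, while the value and base address are the ones from the slide.

```python
import struct

value = 0xFABC0123
base = 0xFF20

big = struct.pack(">I", value)     # ">" = big endian, "I" = unsigned 32-bit
little = struct.pack("<I", value)  # "<" = little endian


def layout(raw, base_address):
    """Pair each byte with the address it would occupy."""
    return [(f"0x{base_address + i:04X}", f"0x{b:02X}") for i, b in enumerate(raw)]


print("big   ", layout(big, base))
print("little", layout(little, base))

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(big, "big") == value
assert int.from_bytes(little, "little") == value
```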

Address         0xFF20  0xFF21  0xFF22  0xFF23
Big Endian      0xFA    0xBC    0x01    0x23
Little Endian   0x23    0x01    0xBC    0xFA

Data Alignment
• Data alignment
  • A datum with multiple bytes needs to be allocated at an address that is a multiple of its size
• Examples
  • 0bxxxxxxxxxx byte (8-bit) aligned
  • 0bxxxxxxxxx0 half word (16-bit) aligned
  • 0bxxxxxxxx00 word (32-bit) aligned
  • 0bxxxxxxx000 double word (64-bit) aligned
• Why aligned?
  • Misalignment causes implementation complications and reduces performance

Memory device: basic concepts

• Stores large number of bits
  • m × n: m words of n bits each
  • k = log2(m) address input signals
  • or m = 2^k words
  • e.g., 4,096 × 8 memory:
    • 32,768 bits
    • 12 address input signals
    • 8 input/output data signals
• Memory access
  • r/w: selects read or write
  • enable: read or write only when asserted
  • multiport: multiple accesses to different locations simultaneously

[Figure: m × n memory, m words of n bits per word.]
[Figure: memory external view. A 2^k × n read and write memory with inputs r/w, enable, A0 ... Ak-1 and data lines Qn-1 ... Q0.]

Memory Types

ROM: "Read-Only" Memory
• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system

• Traditionally written to, "programmed", before inserting into the embedded system
• Uses
  • Store software program for general-purpose processor
    • program instructions can be one or more ROM words
  • Store constant data needed by system
  • Implement combinational circuit

[Figure: ROM external view. A 2^k × n ROM with inputs enable, A0 ... Ak-1 and outputs Qn-1 ... Q0.]

Example: 8 x 4 ROM

• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
• Decoder sets word 2's line to 1 if address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a "programmed" connection with word 2's line
• Word 2 is not connected with data lines Q2 and Q0
• Output is 1010

[Figure: internal view of an 8 × 4 ROM. A 3×8 decoder with enable and address inputs A0, A1, A2 drives the word lines (word 0, word 1, word 2, ...); programmable connections tie word lines to the wired-OR data lines Q3, Q2, Q1, Q0.]

Implementing combinational function

• Any combinational circuit of n functions of the same k variables can be implemented with a 2^k × n ROM
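A short Python sketch shows why this works: the k inputs form the address, and each ROM word stores the n output bits for that input combination. The helper names `build_rom` and `read_rom` and the two example functions `y` and `z` are illustrative assumptions.

```python
from itertools import product


def build_rom(k, functions):
    """Return 2**k words; each word holds one output bit per function."""
    rom = []
    # product() enumerates inputs in address order, first variable = MSB.
    for inputs in product((0, 1), repeat=k):
        rom.append(tuple(int(f(*inputs)) for f in functions))
    return rom


def read_rom(rom, *inputs):
    """Use the input bits as the address (first input is the MSB)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return rom[address]


# Two hypothetical functions of the same three variables.
y = lambda a, b, c: a or (b and c)
z = lambda a, b, c: (a and b) or (b != c)

rom = build_rom(3, [y, z])  # an 8 x 2 ROM
print(read_rom(rom, 0, 1, 1))
```

Once the ROM is filled, evaluating the circuit is a single lookup; no gates are evaluated at read time.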

Truth table

Inputs (address)   Outputs
a  b  c            y  z
0  0  0            0  0
0  0  1            0  1
0  1  0            0  1
0  1  1            1  0
1  0  0            1  0
1  0  1            1  1
1  1  0            1  1
1  1  1            1  1

[Figure: 8×2 ROM with enable, address inputs a, b, c and outputs y, z; word 0 through word 7 hold the output columns of the truth table (00, 01, 01, 10, 10, 11, 11, 11).]

Types of ROM

• Written during manufacture
• Programmable (once)
  • PROM
  • Needs special equipment to program
• Read "mostly"
  • Erasable Programmable (EPROM)
    • Erased by UV
    • can program and erase individual words
  • Electrically Erasable (EEPROM)
    • Takes much longer to write than read
    • can program and erase individual words as well
  • Flash
    • Large blocks of memory read/write at once, rather than one word at a time
    • Faster erase

RAM

• Random access memory
• Typically volatile memory
  • bits are not held without power supply
• Read and written easily by processor during execution
• Internal structure more complex than ROM
  • a word consists of several memory cells, each storing 1 bit
  • each input and output data line connects to each cell in its column
  • rd/wr connected to every cell
  • when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read

[Figure: RAM external view. A 2^k × n read and write memory with inputs r/w, enable, A0 ... Ak-1 and data lines Qn-1 ... Q0.]
[Figure: internal view of a 4×4 RAM. A 2×4 decoder with enable and address inputs A0, A1 selects a row; input lines I3 ... I0 and output lines Q3 ... Q0 run along the columns, and rd/wr goes to every cell.]
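The row-select and rd/wr behaviour described above can be mimicked with a small behavioural model. This is a sketch, not a hardware description; the class name `SimpleRAM` and the `access` signature are hypothetical.

```python
class SimpleRAM:
    """Behavioural model of a 2**address_bits x word_bits RAM."""

    def __init__(self, address_bits, word_bits):
        self.word_bits = word_bits
        # One row per word, one cell (bit) per column.
        self.cells = [[0] * word_bits for _ in range(2 ** address_bits)]

    def access(self, enable, write, address, data_in=0):
        if not enable:
            return None              # nothing happens unless enable is asserted
        row = self.cells[address]    # the decoder enables exactly one row
        if write:
            for col in range(self.word_bits):
                # Each cell in the row stores the input bit of its column.
                row[col] = (data_in >> col) & 1
            return None
        # On a read, each cell in the row drives its column's output line.
        return sum(bit << col for col, bit in enumerate(row))


ram = SimpleRAM(address_bits=2, word_bits=4)        # a 4 x 4 RAM
ram.access(enable=True, write=True, address=2, data_in=0b1010)
ram.access(enable=False, write=True, address=1, data_in=0b1111)  # ignored
print(ram.access(enable=True, write=False, address=2))
print(ram.access(enable=True, write=False, address=1))
```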

Types of RAM
• SRAM: Static RAM
  • Memory cell uses flip-flop to store bit
  • Holds data as long as power supplied
• DRAM: Dynamic RAM
  • Memory cell uses transistor and capacitor to store bit
  • More compact than SRAM
  • "Refresh" required due to capacitor leak
  • Slower to access than SRAM

Device Schematic and Time Diagram

Device diagrams (pin assignments)

Signal         HM6264                    27C256
data<7...0>    11-13, 15-19              11-13, 15-19
addr<15...0>   2, 23, 21, 24, 25, 3-10   27, 26, 2, 23, 21, 24, 25, 3-10
/OE            22                        22
/WE            27                        (none)
/CS1           20                        (none)
CS2            26                        (none)
/CS            (none)                    20

Device characteristics

Device   Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
HM6264   85-100             0.01                15                 5
27C256   90                 0.5                 100                5

[Figure: timing diagrams. Read operation: data, addr, OE, /CS1, CS2. Write operation: data, addr, WE, /CS1, CS2.]

Composing memory

• Memory size needed often differs from size of readily available memories
• When available memory is larger, simply ignore unneeded high-order address bits and higher data lines
• When available memory is smaller, compose several smaller memories into one larger memory
  • Connect side-by-side to increase width of words
  • Connect top to bottom to increase number of words
    • added high-order address line selects smaller memory containing desired word using a decoder
  • Combine techniques to increase number and width of words

[Figure: increase number of words. A 2^(m+1) × n ROM built from two 2^m × n ROMs; address lines A0 ... Am-1 go to both ROMs and Am drives a 1 × 2 decoder that enables one of them; outputs Qn-1 ... Q0.]
[Figure: increase width of words. A 2^m × 3n ROM built from three 2^m × n ROMs sharing the enable and address inputs; outputs Q3n-1 ... Q2n-1 ... Q0.]
[Figure: increase number and width of words.]
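Both composition techniques can be sketched with ROMs modelled as Python functions from address to word. The names `make_rom`, `stack` and `side_by_side`, and the example contents, are hypothetical and only illustrate the idea.

```python
def make_rom(words):
    """A small ROM: a function from address to an integer word."""
    return lambda address: words[address]


def stack(roms, m):
    """Top to bottom: more words. The high-order address bits act as the
    decoder input and select which 2**m-word ROM is enabled."""
    def read(address):
        select, local = divmod(address, 2 ** m)
        return roms[select](local)
    return read


def side_by_side(roms, n):
    """Side by side: wider words. Every ROM sees the same address and the
    n-bit outputs are concatenated, leftmost ROM most significant."""
    def read(address):
        word = 0
        for rom in roms:
            word = (word << n) | rom(address)
        return word
    return read


rom_a = make_rom([0x1, 0x2, 0x3, 0x4])  # 4 x 4 ROM (m = 2, n = 4)
rom_b = make_rom([0xA, 0xB, 0xC, 0xD])  # 4 x 4 ROM

deep = stack([rom_a, rom_b], m=2)          # behaves like an 8 x 4 ROM
wide = side_by_side([rom_a, rom_b], n=4)   # behaves like a 4 x 8 ROM

print(hex(deep(5)), hex(wide(2)))
```

Combining the two techniques amounts to stacking several side-by-side groups.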

Summary
• Memory hierarchy
• Memory/processor architecture
• Memory access process
• Endianness
• Data alignment
• Memory data organization
• Memory devices
  • Basics
  • ROM/RAM