Memory-System Design

A processor needs to retrieve instructions and data from memory, and store results into memory. We call this memory Random Access Memory (RAM).

[Figure: Processor connected to Memory (RAM); instructions and data flow between them]

There are two general types of Random Access Memory (RAM):

• Static RAM – fast and expensive.
• Dynamic RAM – slow and cheap.

It would be nice if we could use dynamic RAM (DRAM) for all our main memory needs, but:

• Best access times for DRAM are 50 to 60 nanoseconds (ns).
• Processor logic speeds are over 500 MHz, which implies a 2 ns access requirement.
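To make the gap concrete, the short sketch below converts the 500 MHz figure into a cycle time and counts how many processor cycles one 60 ns DRAM access spans. The function names are illustrative and not part of the original notes.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / clock_mhz


def cycles_per_access(access_ns: float, clock_mhz: float) -> float:
    """Number of processor clock cycles that one memory access spans."""
    return access_ns / cycle_time_ns(clock_mhz)


if __name__ == "__main__":
    print(cycle_time_ns(500))          # period of a 500 MHz clock, in ns
    print(cycles_per_access(60, 500))  # cycles spent waiting on a 60 ns DRAM
```

At 500 MHz the period is 2 ns, so a 60 ns DRAM access costs about 30 processor cycles.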

Memory Systems Architecture of Parallel Computers

Memory Addressing

A 32-bit memory array takes a 5-bit address and reads or writes 1 bit of data at a time:

[Figure: Memory array with a 5-bit address input (A0-A4) and a 1-bit data line D0 (in or out)]

The actual memory is organized in an array of rows and columns, and addressed with a multiplexed address using a Row Address Strobe (RAS) and a Column Address Strobe (CAS):

[Figure: 8 x 4 array of 1-bit cells. Address inputs A0-A2 feed the row decode and latch, which is strobed by RAS; the column decode and sense logic is strobed by CAS and drives the 1-bit data line D0 (in or out)]
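The row/column organization can be sketched in a few lines. The sketch assumes the 32-bit array is laid out as 8 rows by 4 columns, with the high-order 3 address bits selecting the row and the low-order 2 bits selecting the column; the notes do not say which bits go where, so that split is an assumption.

```python
from typing import Tuple

ROW_BITS = 3  # 8 rows (assumption: rows use the high-order address bits)
COL_BITS = 2  # 4 columns (assumption: columns use the low-order address bits)


def split_address(addr: int) -> Tuple[int, int]:
    """Split a 5-bit cell address into (row, column)."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 5 bits")
    row = addr >> COL_BITS               # value latched when RAS is given
    col = addr & ((1 << COL_BITS) - 1)   # value latched when CAS is given
    return row, col
```

For example, address 13 (binary 01101) maps to row 3, column 1 under this split.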

© 1997, 1999 G.Q. Kenney CSC 506, Summer 1999

Although the memory array addresses a single bit, we can replicate the arrays so we can get multiple bits in or out in parallel. For example, if we put four 32-bit arrays into a single chip, we get a total of 128 bits, organized as 32 x 4:

[Figure: Memory array with a 5-bit address input (A0-A4) and 4 bits of data, D0-D3 (in or out)]
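As a rough model of the 32 x 4 organization, the sketch below builds a chip out of four 32 x 1 arrays that share one address, each array supplying one data bit. The class names `BitArray` and `Chip32x4` are hypothetical and used only for illustration.

```python
class BitArray:
    """A 32 x 1 array: one bit of storage per 5-bit address."""

    def __init__(self, size: int = 32) -> None:
        self.cells = [0] * size

    def read(self, addr: int) -> int:
        return self.cells[addr]

    def write(self, addr: int, bit: int) -> None:
        self.cells[addr] = bit & 1


class Chip32x4:
    """Four 32 x 1 arrays sharing one address; array i supplies data bit Di."""

    def __init__(self) -> None:
        self.planes = [BitArray() for _ in range(4)]

    def write(self, addr: int, value: int) -> None:
        # Each array stores one bit of the 4-bit value at the same address.
        for i, plane in enumerate(self.planes):
            plane.write(addr, (value >> i) & 1)

    def read(self, addr: int) -> int:
        # All four arrays are read in parallel and their bits combined.
        return sum(plane.read(addr) << i for i, plane in enumerate(self.planes))
```

Writing the value 0b1010 to address 5 stores one bit in each of the four arrays, and reading address 5 returns the same 4-bit value.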

Most computer systems today address memory in byte (8-bit) increments. That is, each sequential address is assumed to identify a byte.

Bits, Bytes, and Words

Memory is really stored in a bit array.

Eight-bit computers access memory and manipulate data a byte (8 bits) at a time. Each sequential memory address denotes a byte of data. Data are stored and retrieved from memory a byte at a time.

16-bit computers access memory and manipulate data up to 16 bits at a time. 16 bits is the word length. In most 16-bit computers, each sequential memory address still denotes a byte of data. However, data are stored and retrieved from memory two bytes at a time.

32-bit computers access memory and manipulate data up to 32 bits at a time. 32 bits is the word length. In most 32-bit computers, each sequential memory address still denotes a byte of data. However, data are stored and retrieved four bytes at a time.

Some 32-bit computers access memory 64 bits at a time. 64 bits is the word length. Each sequential memory address still denotes a byte of data. However, data are stored and retrieved from memory eight bytes at a time.
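The relationship between byte addresses and the amount of data moved per access can be illustrated as follows. The sketch assumes that accesses are aligned to the memory width, which the notes do not state explicitly; `bytes_in_access` is an illustrative name.

```python
from typing import List


def bytes_in_access(addr: int, width_bytes: int) -> List[int]:
    """Byte addresses transferred together with `addr`.

    Assumes every access is aligned to the memory width (1, 2, 4 or 8 bytes).
    """
    base = addr - (addr % width_bytes)  # round down to the access boundary
    return list(range(base, base + width_bytes))
```

With a byte address of 5, an 8-bit memory moves only byte 5, a 16-bit memory moves bytes 4 and 5, and a 32-bit memory moves bytes 4 through 7.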

Standard Memory Packaging

Early memory modules (chips) were always organized x 1 bit. They are called DIP (dual in-line package) chips. We need eight of them at a time for a computer that uses an 8-bit memory bus:

[Figure: Eight x 1 DIP chips sharing a 20-bit address (A0-A19); each chip supplies one of the data bits D0-D7, giving 8 bits of data in or out]

Later, the memory chips were packaged onto a small board called a SIMM (Single In-line Memory Module) that provided eight data bits in parallel. These are called 30-pin SIMMs. 30-pin SIMMs must be installed in groups of four on computers that access 32 bits in parallel.

In order to accommodate the 32-bit computers, the 72-pin SIMM was developed. It provided a single package with 32 bits in parallel.

More recent computers use a 64-bit bus width. They need 72-pin SIMMs installed in pairs. So, the Dual In-line Memory Module (DIMM) was developed. It has 168 pins and a data width of 64 bits.
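The installation rules above follow from dividing the bus width by the module data width. The sketch below encodes the widths given in the text; the dictionary and function names are illustrative.

```python
# Data widths in bits, as stated in the text above.
MODULE_WIDTH_BITS = {
    "30-pin SIMM": 8,
    "72-pin SIMM": 32,
    "168-pin DIMM": 64,
}


def modules_per_bank(bus_width_bits: int, module: str) -> int:
    """Number of identical modules needed to fill the memory bus width."""
    width = MODULE_WIDTH_BITS[module]
    if bus_width_bits % width != 0:
        raise ValueError("bus width is not a multiple of the module width")
    return bus_width_bits // width
```

A 32-bit bus needs four 30-pin SIMMs, a 64-bit bus needs two 72-pin SIMMs, and a single 168-pin DIMM fills a 64-bit bus.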

This is an example 168-pin DIMM package:

DRAM Timing

Access to dynamic RAM requires:

• Row address placed on the address pins
• RAS signal given
• Column address placed on the address pins
• CAS signal given
• Read data from the output pin(s)
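The sequence of steps can be sketched as a toy model. It assumes an 8 x 4 array of 1-bit cells with shared address pins, where the row is latched when RAS is given and the column when CAS is given; timing is not modeled, and the `TinyDram` class is hypothetical.

```python
class TinyDram:
    """Toy 8 x 4 array of 1-bit cells addressed through shared pins."""

    def __init__(self) -> None:
        self.cells = [[0] * 4 for _ in range(8)]
        self.pins = 0      # value currently driven on the address pins
        self.row = None    # row latched by RAS
        self.col = None    # column latched by CAS

    def set_pins(self, value: int) -> None:
        self.pins = value

    def ras(self) -> None:
        self.row = self.pins & 0b111  # 3 bits select one of 8 rows

    def cas(self) -> None:
        if self.row is None:
            raise RuntimeError("RAS must be given before CAS")
        self.col = self.pins & 0b11   # 2 bits select one of 4 columns

    def data_out(self) -> int:
        return self.cells[self.row][self.col]


def read_bit(dram: TinyDram, row: int, col: int) -> int:
    """Perform the five access steps in order and return the data bit."""
    dram.set_pins(row)   # row address placed on the address pins
    dram.ras()           # RAS signal given
    dram.set_pins(col)   # column address placed on the address pins
    dram.cas()           # CAS signal given
    return dram.data_out()  # read data from the output pin
```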

There is a minimum time required between each of the steps, and these taken together determine the minimum time required to retrieve data from the DRAM. The timings for individual memory parts are given in AC timing diagrams.

The access time of asynchronous DRAM is normally specified as the time from when the RAS signal is given to the time that data is valid on the output pins. Note that there is also a minimum row address setup time that needs to be added to that time.

The cycle time of DRAM is the minimum time between the starts of successive random accesses to the RAM. For example, a DRAM with a 60 ns access time can be cycled only once every 110 ns.
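The difference between access time and cycle time can be put in numbers. This small sketch computes the time the DRAM is unavailable after the data is valid, and the resulting upper bound on random accesses per second; the function names are illustrative.

```python
def recovery_ns(access_ns: float, cycle_ns: float) -> float:
    """Time after data is valid before the next random access may start."""
    return cycle_ns - access_ns


def max_random_accesses_per_second(cycle_ns: float) -> float:
    """Upper bound on random accesses per second for a given cycle time."""
    return 1e9 / cycle_ns


if __name__ == "__main__":
    print(recovery_ns(60, 110))                 # 50 ns of recovery per access
    print(max_random_accesses_per_second(110))  # roughly 9.1 million per second
```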

See the IBM FPM DRAM block diagram and read cycle timing diagram from “Fast Page Mode DRAM” and “EDO DRAM” on the class website.

Typical RAM Parameters

Fast page mode DRAM parameters

Parameter  Description                   -50   -60   Units
tRAC       RAS Access Time                50    60   ns
tCAC       CAS Access Time                13    15   ns
tAA        Column Address Access Time     25    30   ns
tRC        Cycle Time                     95   110   ns
tPC        Fast Page Mode Cycle Time      35    40   ns

Extended Data Out (EDO) DRAM parameters

Parameter  Description                        -50   -60   Units
tRAC       RAS Access Time                     50    60   ns
tCAC       CAS Access Time                     13    15   ns
tAA        Column Address Access Time          25    30   ns
tRC        Cycle Time                          84   104   ns
tHPC       EDO (Hyper Page) Mode Cycle Time    20    25   ns
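The page-mode cycle times in the two tables can be compared with a simple estimate. The sketch assumes the first access in a row costs tRAC and each further access in the same row costs one page-mode cycle (tPC or tHPC); it ignores setup and precharge times, so treat the results as rough figures rather than datasheet values.

```python
# Values in ns, copied from the -50 and -60 columns of the tables above.
FPM = {
    "-50": {"tRAC": 50, "tRC": 95, "page_cycle": 35},
    "-60": {"tRAC": 60, "tRC": 110, "page_cycle": 40},
}
EDO = {
    "-50": {"tRAC": 50, "tRC": 84, "page_cycle": 20},
    "-60": {"tRAC": 60, "tRC": 104, "page_cycle": 25},
}


def page_burst_ns(part: dict, n: int) -> int:
    """Estimated time to read n words from one row using page mode."""
    return part["tRAC"] + (n - 1) * part["page_cycle"]


def random_reads_ns(part: dict, n: int) -> int:
    """Estimated time to read n words with a full RAS cycle for each."""
    return (n - 1) * part["tRC"] + part["tRAC"]
```

For four words on -60 parts, this estimate gives 180 ns for fast page mode and 135 ns for EDO, against 390 ns for four full random cycles on the fast page mode part.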

10 ns Synchronous DRAM (SDRAM) parameters

Parameter  Description        Value (per clock rate)   Units
fCK        Clock Frequency    100     66     33        MHz
tCK        Clock Cycle Time    10     15     30        ns
tAA        CAS Latency          3      2      1        CLK
tRL        RAS Latency          6      4      2        CLK
tRC        Bank Cycle Time      8      5      3        CLK
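Because the SDRAM latencies are given in clock cycles, it helps to convert them to nanoseconds. The sketch below multiplies each latency by the clock cycle time from the table; the dictionary layout and names are illustrative.

```python
# Columns of the SDRAM table, keyed by clock frequency in MHz.
SDRAM = {
    100: {"tCK_ns": 10, "cas_latency": 3, "ras_latency": 6, "bank_cycle": 8},
    66: {"tCK_ns": 15, "cas_latency": 2, "ras_latency": 4, "bank_cycle": 5},
    33: {"tCK_ns": 30, "cas_latency": 1, "ras_latency": 2, "bank_cycle": 3},
}


def latencies_ns(freq_mhz: int) -> dict:
    """Convert the CLK-denominated parameters to nanoseconds."""
    p = SDRAM[freq_mhz]
    return {
        name: p[name] * p["tCK_ns"]
        for name in ("cas_latency", "ras_latency", "bank_cycle")
    }
```

The CAS and RAS latencies work out to 30 ns and 60 ns at all three clock rates, while the bank cycle time comes to 80, 75 and 90 ns.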

Pipeline burst SRAM parameters

Parameter  Description              -4     -5    Units
tCYCLE     Cycle Time              10.0   10.0   ns
tAS        Address Setup Time       2.5    2.5   ns
tAH        Address Hold Time        0.5    0.5   ns
tCQ        Clock to Output Valid    4.0    5.0   ns

Interleaved Memory

Because access time is shorter than cycle time, memory may be installed in two banks, with successive words alternating between the banks. We can then partially overlap access to successive memory words by starting the access to the second bank while the first one is finishing.

[Figure: Two memory array banks, Bank 0 and Bank 1, each with its own address latch (A0-A4) and data lines D0-D7, sharing one data latch (D0-D7). The incoming address is A0-A5, with A0 used as the bank select (BS)]

With a byte-wide bus, sequential bytes are in alternate banks. Bank 0 contains byte addresses 0, 2, 4, 6, etc. and bank 1 contains addresses 1, 3, 5, 7, etc.
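The benefit of interleaving can be estimated with a small timing model. It assumes that a bank cannot start a new access until its cycle time has elapsed, and that the shared data latch delivers one word at a time, so an access starts only after the previous word is valid. Real parts have more constraints; the defaults use the 60 ns access and 110 ns cycle figures from earlier, and the function name is illustrative.

```python
def sequential_read_ns(n_words: int, n_banks: int,
                       access_ns: int = 60, cycle_ns: int = 110) -> int:
    """Estimated time until the last of n sequential words is valid."""
    bank_free = [0] * n_banks  # earliest time each bank may start an access
    prev_done = 0              # time the previous word became valid
    for word in range(n_words):
        bank = word % n_banks  # successive words alternate between banks
        start = max(bank_free[bank], prev_done)
        bank_free[bank] = start + cycle_ns
        prev_done = start + access_ns
    return prev_done
```

Under these assumptions, four sequential words take 390 ns from a single bank and 240 ns from two interleaved banks, because each bank recovers while the other is being accessed.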

If we have a word width (bus) greater than a byte, sequential words are in alternate banks. For example, if we have a word width of 32 bits (4 bytes), 32 words in each bank, and two banks of memory (for a total of 2048 bits or 256 bytes of memory), the following diagram shows the addressing:

[Figure: Two 32-bit-wide banks, Bank 0 and Bank 1, each with an address latch fed by the 5 address bits A3-A7. Address bit A2 is the bank select (BS). A 32-bit data latch drives four 8-bit groups: D0-D7, D8-D15, D16-D23 and D24-D31]

Memory Address (8-bit address)

Field width   Selects
5 bits        One of 32 words in each bank
1 bit         One of two banks
2 bits        One of 4 bytes within a word

Bank address   Bank 0           Bank 1
00000           0  1  2  3       4  5  6  7
00001           8  9 10 11      12 13 14 15
00010          16 17 18 19      20 21 22 23
00011          24 25 26 27      28 29 30 31
00100
00101
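The address layout above can be decoded with shifts and masks. This sketch follows the 5-bit / 1-bit / 2-bit split of the 8-bit address; the function names are illustrative.

```python
from typing import List, Tuple


def decode(addr: int) -> Tuple[int, int, int]:
    """Split an 8-bit byte address into (bank address, bank, byte in word)."""
    if not 0 <= addr < 256:
        raise ValueError("address must fit in 8 bits")
    byte_in_word = addr & 0b11   # low 2 bits: one of 4 bytes within a word
    bank = (addr >> 2) & 0b1     # next bit: one of two banks
    bank_address = addr >> 3     # high 5 bits: one of 32 words in each bank
    return bank_address, bank, byte_in_word


def bytes_at(bank_address: int, bank: int) -> List[int]:
    """Byte addresses held by one word of one bank."""
    base = (bank_address << 3) | (bank << 2)
    return [base + b for b in range(4)]
```

For example, byte address 13 decodes to bank address 1, bank 1, byte 1, and bank address 00011 of bank 1 holds bytes 28 through 31, matching the table.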

Synchronous DRAM (SDRAM)

SDRAM has actually taken the step of packaging interleaved memory into the memory chip.

See the IBM Synchronous DRAM (SDRAM) block diagram and timing diagram from “Synchronous DRAM” on the class website.

Static RAM

Static RAM (SRAM) does not need separate RAS and CAS signals. The data is stored in flip-flops, so a read does not destroy it and it does not have to be rewritten after each access. It also does not need to be periodically refreshed.

SRAM is expensive (it takes more than four times the chip area of DRAM for an equivalent number of bits), but access times are fast.

See the IBM Burst Pipeline SRAM timing diagram from “Static RAM” on the class website.
