CHAPTER 10 DRAM Memory System Organization

Previous chapters examine the basic building blocks of DRAM devices and the signaling issues that constrain the transmission and subsequent storage of data into the DRAM devices. In this chapter, basic terminologies and building blocks of DRAM memory systems are described. Using the building blocks described in the previous chapters, the text in this chapter examines the construction, organization, and operation of multiple DRAM devices in a larger memory system. This chapter covers the terminologies and topology, as well as the organization of various types of memory modules.

10.1 Conventional Memory System

The number of storage bits contained in a given DRAM device is constrained by the manufacturing process technology, the die size, the array efficiency, and the effectiveness of the defect-cell remapping mechanism for yield enhancement. As the manufacturing process technology advances in line with Moore's Law, the number of storage bits contained in a given DRAM device doubles every few years. However, the unspoken corollary to Moore's Law states that software written by software companies in the Pacific Northwest and elsewhere will automatically expand to fill available memory in a given system. Consequently, the number of storage bits contained in a single DRAM device at any given instance in time has been and will continue to be inadequate to serve as the main memory for most computing platforms, with the exception of specialty embedded systems.

In the past few decades, the growth rate of DRAM device storage capacity has roughly paralleled the growth rate of the size of memory systems for desktop computers, workstations, and servers. The parallel growth rates have dictated system designs in that multiple DRAM devices must be connected together to form memory systems in most computing platforms. In this chapter, the organization of different multi-chip DRAM memory systems and the different interconnection strategies deployed for cost and performance concerns are explored.

In Figure 10.1, multiple DRAM devices are interconnected together to form a single memory system that is managed by a single memory controller. In modern computer systems, one or more DRAM memory controllers (DMCs) may be contained in the processor package or integrated into a system controller that resides outside of the processor package. Regardless of the location of the DRAM memory controller, its functionality is to accept read and write requests to a given address in memory, translate the request to one or more commands to the memory system, issue those commands to the DRAM devices in the proper sequence and with the proper timing, and retrieve or store data on behalf of the processor or I/O devices in the system. The internal structures of a system controller are examined in a separate chapter. This chapter focuses on the organization of DRAM devices in the context of multi-device memory systems.



FIGURE 10.1: Multiple DRAM devices connected to a processor through a DRAM memory controller.

10.2 Basic Nomenclature

The organization of multiple DRAM devices into a memory system can impact the performance of the memory system in terms of system storage capacity, operating data rates, access latency, and sustainable bandwidth characteristics. It is therefore of great importance that the organization of multiple DRAM devices into larger memory systems be examined in detail. However, the absence of commonly accepted nomenclature has hindered the examination of DRAM memory-system organizations. Without a common basis of well-defined nomenclature, technical articles and data sheets sometimes succeed in introducing confusion rather than clarity into discussions on DRAM memory systems. In one example, a technical data sheet for a system controller used the word bank in two bulleted items on the same page to mean two different things. In this data sheet, one bulleted item proclaimed that the system controller could support 6 banks (of DRAM devices). Then, several bulleted items later, the same data sheet stated that the same system controller could support SDRAM devices with 4 banks. In a second example, an article in a well-respected technical journal examined the then-new i875P system controller from Intel and proceeded to discuss the performance advantage of the system controller due to the fact that the i875P system controller could control 2 banks of DRAM devices (it can control two entire channels).

In these two examples, the word bank was used to mean three different things. While the meaning of the word bank can be inferred from the context in each case, the overloading and repeated use of the word introduces unnecessary confusion into discussions about DRAM memory systems. In this section, the usage of channel, rank, bank, row, and column is defined, and discussions in this and subsequent chapters will conform to the usage in this chapter.

10.2.1 Channel

Figure 10.2 shows three different system controllers with slightly different configurations of the DRAM memory system. In Figure 10.2, each system controller has a single DRAM memory controller (DMC), and each DRAM memory controller controls a single channel of memory. In the example labelled as the typical system controller, the system controller controls a single 64-bit-wide channel. In modern DRAM memory systems, commodity DRAM memory modules are standardized with 64-bit-wide data busses, and the 64-bit data bus width of the memory module matches the data bus width of the typical personal computer system controller.1 In the example labelled as the Intel i875P system controller, the system controller connects to a single channel of DRAM with a 128-bit-wide data bus.

1 Commodity memory modules designed for error correcting memory systems are standardized with a 72-bit-wide data bus.

However, since commodity DRAM modules have 64-bit-wide data busses, the i875P system controller requires matching pairs of 64-bit-wide memory modules to operate with the 128-bit-wide data bus. The paired-memory module configuration of the i875P is often referred to as a dual channel configuration. However, since there is only one memory controller, and since both memory modules operate in lockstep to store and retrieve data through the 128-bit-wide data bus, the paired-memory module configuration is, logically, a 128-bit-wide single channel memory system. Also, similar to SDRAM and DDR SDRAM memory systems, standard Direct RDRAM memory modules are designed with 16-bit-wide data busses, and high-performance system controllers that use Direct RDRAM, such as the Intel i850 system controller, use matched pairs of Direct RDRAM memory modules to form a 32-bit-wide channel that operates in lockstep across the two physical channels of memory.


FIGURE 10.2: Systems with a single memory controller and different data bus widths.
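The peak bandwidth implications of these configurations follow directly from the logical channel width: physical channels ganged in lockstep add their widths, while channels that share one memory controller but run independently (the asymmetric case discussed below) are limited to one channel's bandwidth at a time. A minimal sketch of the arithmetic, using assumed transfer rates (400 MT/s for the DDR channels, 800 MT/s for the 16-bit Direct RDRAM channels) that are illustrative rather than taken from Figure 10.2:

```c
#include <stdio.h>

/* Peak bandwidth of a logical channel in MB/s:
 * (bus width in bits / 8) bytes per transfer x millions of transfers per second. */
static unsigned peak_mb_per_s(unsigned bus_width_bits, unsigned megatransfers_per_s)
{
    return (bus_width_bits / 8) * megatransfers_per_s;
}

int main(void)
{
    printf("Typical: one 64-bit DDR channel        -> %u MB/s\n", peak_mb_per_s(64, 400));
    printf("i875P:   two 64-bit channels, lockstep -> %u MB/s\n", peak_mb_per_s(128, 400));
    printf("i850:    two 16-bit D-RDRAM channels   -> %u MB/s\n", peak_mb_per_s(2 * 16, 800));
    return 0;
}
```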


FIGURE 10.3: Systems with two independent memory controllers and two logical channels of memory.

In contrast to system controllers that use a single DRAM memory controller to control the entire memory system, Figure 10.3 shows that the Alpha EV7 processor and the Intel i925x system controller each have two DRAM controllers that independently control 64-bit-wide data busses.2 The use of independent DRAM memory controllers can lead to higher sustainable bandwidth characteristics, since the narrower channels lead to longer data bursts per cacheline request, and the various inefficiencies dictated by DRAM-access protocols can be better amortized. As a result, newer system controllers are often designed with multiple memory controllers despite the additional die cost.

Modern memory systems with one DRAM memory controller and multiple physical channels of DRAM devices such as those illustrated in Figure 10.2 are typically designed with the physical channels operating in lockstep with respect to each other. However, there are two variations to the single-controller-multiple-physical-channel configuration. One variation of the single-controller-multiple-physical-channel configuration is that some system controllers, such as the Intel i875P system controller, allow the use of mismatched pairs of memory modules in the different physical channels. In such a case, the i875P system controller operates in an asymmetric mode and independently controls the physical channels of DRAM modules. However, since there is only one DRAM memory controller, the multiple physical channels of mismatched memory modules cannot be accessed concurrently, and only one channel of memory can be accessed at any given instance in time. In the asymmetric configuration, the maximum system bandwidth is the maximum bandwidth of a single physical channel.

A second variation of the single-controller-multiple-physical-channel configuration can be found in high-performance FPM DRAM memory systems that were designed prior to the emergence of SDRAM-type DRAM devices that can burst out multiple columns of data with a given column access command. Figure 10.4 illustrates a sample timing diagram of a column access in an SDRAM memory system. Figure 10.4 shows that an SDRAM device is able to return a burst of multiple columns of data for a single column access command. However, an FPM DRAM device supported neither single-access-multiple-burst capability nor the ability to pipeline multiple column access commands.


FIGURE 10.4: High-performance memory controllers with four channels of interleaved FPM DRAM devices.

2 Ignoring additional bitwidths used for error correction and cache directory.

As a result, FPM DRAM devices need multiple column accesses to retrieve the multiple columns of data for a given cacheline access, column accesses that cannot be pipelined to a single FPM DRAM device.

One solution deployed to overcome the shortcomings of FPM DRAM devices is the use of multiple FPM DRAM channels operating in an interleaved fashion. Figure 10.4 also shows how a sophisticated FPM DRAM controller can send multiple column accesses to different physical channels of memory so that the data for the respective column accesses appears on the data bus in consecutive cycles. In this configuration, the multiple FPM DRAM channels can provide the sustained throughput required in high-performance workstations and servers before the appearance of modern synchronous DRAM devices that can burst through multiple columns of data in consecutive cycles.

10.2.2 Rank

Figure 10.5 shows a memory system populated with 2 ranks of DRAM devices. Essentially, a rank of memory is a "bank" of one or more DRAM devices that operate in lockstep in response to a given command. However, the word bank has already been used to describe the number of independent DRAM arrays within a DRAM device. To lessen the confusion associated with overloading the nomenclature, the word rank is now used to denote a set of DRAM devices that operate in lockstep to respond to a given command in a memory system.

Figure 10.5 illustrates a configuration of 2 ranks of DRAM devices in a classical DRAM memory system topology.


FIGURE 10.5: Memory system with 2 ranks of DRAM devices.
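In the topology of Figure 10.5, every DRAM device sees the shared address and command busses, and the chip-select signals determine which rank actually responds. The fragment below sketches the rank-to-chip-select decode for a two-rank system; the active-high, one-hot encoding is an assumption made for illustration, not a description of any particular controller.

```c
#include <stdint.h>

#define NUM_RANKS 2u   /* two ranks of DRAM devices, as in Figure 10.5 */

/* Return a one-hot chip-select mask: bit 0 = chip-select 0, bit 1 = chip-select 1.
 * All devices receive the address and command, but only the rank whose
 * chip-select is asserted stores or returns data.
 * Example: chip_select_mask(1) == 0x2, so rank 1 responds and rank 0 ignores
 * the command. */
static uint8_t chip_select_mask(unsigned rank_id)
{
    return (rank_id < NUM_RANKS) ? (uint8_t)(1u << rank_id) : 0u;
}
```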


FIGURE 10.6: SDRAM device with 4 banks of DRAM arrays internally.

In the classical DRAM memory system topology, address and command busses are connected to every DRAM device in the memory system, but the wide data bus is partitioned and connected to different DRAM devices. The memory controller in this classical system topology then uses chip-select signals to select the appropriate rank of DRAM devices to respond to a given command.

In modern memory systems, multiple DRAM devices are commonly grouped together to provide the data bus width and capacity required by a given memory system. For example, 18 DRAM devices, each with a 4-bit-wide data bus, are needed in a given rank of memory to form a 72-bit-wide data bus. In contrast, embedded systems that do not require as much capacity or data bus width typically use fewer devices in each rank of memory—sometimes as few as one device per rank.

10.2.3 Bank

As described previously, the word bank had been used to describe a set of independent memory arrays inside of a DRAM device, a set of DRAM devices that collectively act in response to commands, and different physical channels of memory. In this chapter, the word bank is only used to denote a set of independent memory arrays inside a DRAM device.

Figure 10.6 shows an SDRAM device with 4 banks of DRAM arrays. Modern DRAM devices contain multiple banks so that multiple, independent accesses to different DRAM arrays can occur in parallel. In this design, each bank of memory is an independent array that can be in different phases of a row access cycle. Some common resources, such as the I/O gating that allows access to the data pins, must be shared between different banks. However, the multi-bank architecture allows commands such as read requests to different banks to be pipelined. Certain commands, such as refresh commands, can also be engaged in multiple banks in parallel. In this manner, multiple banks can operate independently or concurrently depending on the command. For example, multiple banks within a given DRAM device can be activated independently from each other—subject to the power constraints of the DRAM device that may specify how closely such activations can occur in a given period of time. Multiple banks in a given DRAM device can also be precharged or refreshed in parallel, depending on the design of the DRAM device.

10.2.4 Row

In DRAM devices, a row is simply a group of storage cells that are activated in parallel in response to a row activation command. In DRAM memory systems that utilize the conventional system topology, such as SDRAM, DDR SDRAM, and DDR2 SDRAM memory systems, multiple DRAM devices are typically connected in parallel in a given rank of memory. Figure 10.7 shows how DRAM devices can be connected in parallel to form a rank of memory. The effect of DRAM devices connected as ranks of DRAM devices that operate in lockstep is that a row activation command will activate the same addressed row in all DRAM devices in a given rank of memory. This arrangement means that the size of a row—from the perspective of the memory controller—is simply the size of a row in a given DRAM device multiplied by the number of DRAM devices in a given rank, and a DRAM row spans across the multiple DRAM devices of a given rank of memory.

A row is also referred to as a DRAM page, since a row activation command in essence caches a page of memory at the sense amplifiers until a subsequent precharge command is issued by the DRAM memory controller. Various schemes have been proposed to take advantage of locality at the DRAM page level. However, one problem with the exploitation of locality at the DRAM page level is that the size of the DRAM page depends on the configuration of the DRAM device and memory modules, rather than the architectural page size of the processor.

10.2.5 Column

In DRAM memory systems, a column of data is the smallest addressable unit of memory.


FIGURE 10.7: Generic DRAM devices with 4 banks, 8192 rows, 512 columns per row, and 16 data bits per column.
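The device organization in Figure 10.7 makes the row-size arithmetic described above concrete: each device holds 512 columns of 16 bits per row, and the row seen by the controller is that amount multiplied by the number of devices in the rank. The sketch below assumes four such x16 devices per rank, a count chosen only for illustration:

```c
#include <stdio.h>

int main(void)
{
    /* Per-device organization from Figure 10.7. */
    const unsigned columns_per_row  = 512;
    const unsigned bits_per_column  = 16;
    const unsigned devices_per_rank = 4;    /* assumed count, for illustration only */

    unsigned row_bytes_per_device = columns_per_row * bits_per_column / 8;   /* 1024 bytes */
    unsigned dram_page_bytes      = row_bytes_per_device * devices_per_rank; /* 4096 bytes */

    printf("row per device: %u bytes, DRAM page per rank: %u bytes\n",
           row_bytes_per_device, dram_page_bytes);
    return 0;
}
```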

3 DDRx denotes DDR SDRAM and evolutionary DDR memory systems such as DDR2 and DDR3 SDRAM memory systems, inclusively.


FIGURE 10.8: Classical DRAM system topology; width of data bus equals column size.

Figure 10.8 illustrates that, in memory systems such as SDRAM and DDRx3 SDRAM with a topology similar to the memory system illustrated in Figure 10.5, the size of a column of data is the same as the width of the data bus. In a Direct RDRAM device, a column is defined as 16 bytes of data, and each read command fetches a single column of data 16 bytes in length from each physical channel of Direct RDRAM devices.

A beat is simply a data transition on the data bus. In SDRAM memory systems, there is one data transition per clock cycle, so one beat of data is transferred per clock cycle. In DDRx SDRAM memory systems, two data transfers can occur in each clock cycle, so two beats of data are transferred in a single clock cycle. The use of the beat terminology avoids overloading the word cycle in DDRx SDRAM devices.

In DDRx SDRAM memory systems, each column access command fetches multiple columns of data depending on the programmed burst length. For example, in a DDR2 DRAM device, each memory read command returns a minimum of 4 columns of data. The distinction between a DDR2 device returning a minimum burst length of 4 beats of data and a Direct RDRAM device returning a single column of data over 8 beats is that the DDR2 device accepts the address of a specific column and returns the requested columns in different orders depending on the programmed behavior of the DRAM device. In this manner, each column is separately addressable. In contrast, Direct RDRAM devices do not reorder data within a given burst, and a 16-byte burst from a single channel of Direct RDRAM devices is transmitted in order and treated as a single column of data.

10.2.6 Memory System Organization: An Example

Figure 10.9 illustrates a DRAM memory system with 4 ranks of memory, where each rank of memory consists of 4 devices connected in parallel, each device contains 4 banks of DRAM arrays internally, each bank contains 8192 rows, and each row consists of 512 columns of data. To access data in a DRAM-based memory system, the DRAM memory controller accepts a physical address and breaks down the address into respective address fields that point to the specific channel, rank, bank, row, and column where the data is located.


FIGURE 10.9: Location of data in a DRAM memory system.
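The address decomposition described above amounts to a set of shifts and masks once the controller knows the width of each field. The sketch below uses field widths that match the organization of Figure 10.9 (4 ranks, 4 banks per device, 8192 rows per bank, 512 columns per row, and a 64-bit rank data bus, i.e., 8 bytes per column); the simple rank:bank:row:column:offset ordering and the single-channel assumption are illustrative choices, not the mapping of any particular memory controller.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative address-field widths for the system of Figure 10.9.
 * Real controllers pick the bit ordering that maximizes bank and rank
 * parallelism for their workloads; this split is only a sketch. */
#define OFFSET_BITS 3   /* 8 bytes per column   */
#define COLUMN_BITS 9   /* 512 columns per row  */
#define ROW_BITS    13  /* 8192 rows per bank   */
#define BANK_BITS   2   /* 4 banks per device   */
#define RANK_BITS   2   /* 4 ranks per channel  */

struct dram_addr { unsigned rank, bank, row, column; };

static struct dram_addr decode(uint32_t paddr)
{
    struct dram_addr a;
    paddr >>= OFFSET_BITS;                        /* drop the byte offset within a column */
    a.column = paddr & ((1u << COLUMN_BITS) - 1); paddr >>= COLUMN_BITS;
    a.row    = paddr & ((1u << ROW_BITS)    - 1); paddr >>= ROW_BITS;
    a.bank   = paddr & ((1u << BANK_BITS)   - 1); paddr >>= BANK_BITS;
    a.rank   = paddr & ((1u << RANK_BITS)   - 1);
    return a;
}

int main(void)
{
    struct dram_addr a = decode(0x0B1D0C38u);   /* arbitrary example address */
    printf("rank %u, bank %u, row 0x%X, column 0x%X\n", a.rank, a.bank, a.row, a.column);
    return 0;
}
```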

Although Figure 10.9 illustrates a uniformly organized memory system, the memory system organizations of many computer systems, particularly end-user configurable systems, are typically non-uniformly organized. The reason that the DRAM memory system organizations in many computer systems are typically non-uniform is because most computer systems are designed to allow end-users to upgrade the capacity of the memory system by inserting and removing commodity memory modules. To support memory capacity upgrades by the end-user, DRAM controllers have to be designed to flexibly adapt to different configurations of DRAM devices and modules that the end-user could place into the computer system. This support is provided for through the use of address range registers whose functionality is examined separately in the chapter on memory controllers.

10.3 Memory Modules

The first generations of computer systems allowed end-users to increase memory capacity by providing sockets on the system board where additional DRAM devices could be inserted. The use of sockets on the system board made sense in the era where the price of DRAM devices was quite high relative to the cost of the sockets on the system board. In these early computer systems, system boards were typically designed with sockets that allowed end-users to remove and insert individual DRAM devices, usually contained in dual in-line packages (DIPs). The process of memory upgrade was cumbersome and difficult, as DRAM devices had to be individually removed and inserted into each socket. Pins on the DRAM devices may have been bent and not visually detected as such. Defective DRAM chips were difficult to locate, and routing of sockets for a large memory system required large surface areas on the system board. Moreover, it was physically possible to place DRAM devices in the wrong orientation in the socket—180° from the intended placement. Correct placement with proper orientation depended on clearly labelled sockets, clearly labelled devices, and an end-user that paid careful attention while inserting the devices into the sockets.4 The solution to the problems associated with memory upgradability was the creation and use of memory modules.

Memory modules are essentially miniature system boards that hold a number of DRAM devices. Memory modules provide an abstraction at the module interface so that different manufacturers can manufacture memory upgrades for a given computer system with different DRAM devices.

4 The author of this text can personally attest to the consequences of inserting chips into sockets with incorrect orientation.

DRAM memory modules also reduce the complexity of the memory upgrade process. Instead of the removal and insertion of individual DRAM chips, memory upgrades with modules containing multiple DRAM chips can be quickly and easily inserted into and removed from a module socket. The first generations of memory modules typically consisted of specially created, system-specific memory modules that a given computer manufacturer used in a given computer system. Over the years, memory modules have obtained a level of sophistication, and they are now specified as a part of the memory-system definition process.

10.3.1 Single In-line Memory Module (SIMM)

In the late 1980s and early 1990s, the personal computer industry first standardized on the use of 30-pin SIMMs and then later moved to 72-pin SIMMs. SIMMs, or Single In-line Memory Modules, are referred to as such due to the fact that the contacts on either side of the bottom of the module are electrically identical.

A 30-pin SIMM provides interconnects to 8 or 9 signals on the data bus, as well as power, ground, address, command, and chip-select signal lines between the system board and the DRAM devices. A 72-pin SIMM provides interconnects to 32 to 36 signals on the data bus in addition to the power, ground, address, command, and chip-select signal lines. Typically, DRAM devices on a 30-pin, 1-Megabyte SIMM collectively provide a 9-bit, parity-protected data bus interface to the memory system. Personal computer systems in the late 1980s typically used sets of four matching 30-pin SIMMs to provide a 36-bit-wide memory interface to support parity checking by the memory controller. Then, as the personal computer system moved to support memory systems with wider data busses, the 30-pin SIMM was replaced by 72-pin SIMMs in the early 1990s.

10.3.2 Dual In-line Memory Module (DIMM)

In the late 1990s, as the personal computer industry transitioned from FPM/EDO DRAM to SDRAM, 72-pin SIMMs were, in turn, phased out in favor of Dual In-line Memory Modules (DIMMs). DIMMs are physically larger than SIMMs and provide a 64- or 72-bit-wide data bus interface to the memory system. The difference between a SIMM and a DIMM is that contacts on either side of a DIMM are electrically different. The electrically different contacts allow a denser routing of electrical signals from the system board through the connector interface to the memory module.

Typically, a DIMM designed for the commodity desktop market contains little more than the DRAM devices and passive resistors and capacitors. These DIMMs are not buffered on either the address path from the memory controller to the DRAM devices or the datapath between the DRAM devices and the memory controller. Consequently, these DIMMs are also referred to as Unbuffered DIMMs (UDIMMs).

10.3.3 Registered Memory Module (RDIMM)

To meet the widely varying requirements of systems with end-user configurable memory systems, memory modules of varying capacity and timing characteristics are needed in addition to the typical UDIMM. For example, workstations and servers typically require larger memory capacity than those seen for the desktop computer systems. The problem associated with large memory capacity memory modules is that the large number of DRAM devices in a memory system tends to overload the various multi-drop busses. The large number of DRAM devices, in turn, creates the loading problem on the various address, command, and data busses.

Registered Dual In-line Memory Modules (RDIMMs) alleviate the issue of electrical loading of large numbers of DRAM devices in a large memory system through the use of registers that buffer the address and control signals at the interface of the memory module. Figure 10.10 illustrates that registered memory modules use registers at the interface of the memory module to buffer the address and control signals. In this manner, the registers greatly reduce the number of electrical loads that a memory controller must drive directly, and the signal interconnects in the memory system are divided into two separate segments: between the memory controller and the register and between the register and DRAM devices.
The segmentation allows timing characteristics of the memory system to be optimized by limiting the number of electrical loads, as well as by reducing the path lengths of the critical control signals in individual segments of the memory system. However, the drawback to the use of the registered latches on a memory module is that the buffering of the address and control signals introduces delays into the memory-access latency, and the cost of ensuring signal integrity in a large memory system is paid in terms of additional latency for all memory transactions.

FIGURE 10.10: Registered latches buffer the address and command and also introduce additional latency into the DRAM access.

10.3.4 Small Outline DIMM (SO-DIMM)

Over the years, memory module design has become ever more sophisticated with each new generation of DRAM devices. Currently, different module specifications exist as standardized, multi-source components that an end-user can purchase and reasonably expect trouble-free compatibility between memory modules manufactured by different module manufacturers at different times. To ensure system-level compatibility, memory modules are specified as part of the memory system standards definition process. More specifically, different types of memory modules are specified, with each targeting different markets. Typically, UDIMMs are used in desktop computers, RDIMMs are used in workstation and server systems, and the Small Outline Dual In-line Memory Module (SO-DIMM) has been designed to fit into the limited space found in mobile notebook computers. Figure 10.11 shows the standardized placement of eight DDR2 SDRAM devices in Fine Ball Grid Array (FBGA) packages along with the required serial termination resistors and decoupling capacitors on a 200-pin SO-DIMM.


FIGURE 10.11: Component placement specification for a DDR2 SO-DIMM.

Figure 10.11 shows that the outline of the SO-DIMM is standardized with specific dimensions: 30 mm × 67.6 mm. The specification of the SO-DIMM dimension illustrates the point that as part of the effort to ensure system-level compatibility between different memory modules and system boards, mechanical and electrical characteristics of SO-DIMMs, UDIMMs, and RDIMMs have been carefully defined. Currently, commodity DRAM devices and memory modules are defined through long and arduous standards-setting processes by DRAM device manufacturers and computer-system design houses.

The standards-setting process enables DRAM manufacturers to produce DRAM devices that are functionally compatible. The standards-setting process further enables memory-module manufacturers to take the functionally compatible DRAM devices and construct memory modules that are functionally compatible with each other. Ultimately, the multi-level standardization enables end-users to freely purchase memory modules from different module manufacturers, using DRAM devices from different DRAM manufacturers, and to enjoy reasonably trouble-free interoperability. Currently, standard commodity DRAM devices and memory modules are specified through the industry organization known as the JEDEC Solid-State Technology Association.5

Finally, to further minimize problems in achieving trouble-free compatibility between different DRAM devices and memory module manufacturers, JEDEC provides reference designs to memory module manufacturers, complete with memory module raw card specification, signal trace routings, and a bill of materials. The reference designs further enable memory module manufacturers to minimize their expenditure of engineering resources in the process to create and validate memory module designs, thus lowering the barrier of entry to the manufacturing of high-quality memory modules and enhancing competition in the memory module manufacturing business.

10.3.5 Memory Module Organization

Modern DRAM memory systems often support large varieties of memory modules to give end-users the flexibility of selecting and configuring the desired memory capacity. Since the price of DRAM devices fluctuates depending on the unpredictable commodity market, one memory module organization may be less expensive to manufacture than another organization at a given instance in time, while the reverse may be true at a different instance in time. As a result, a memory system that supports different configurations of memory modules allows end-users the flexibility to purchase and use the most economically organized memory module. However, one issue that memory-system design engineers must account for in providing the flexibility of memory system configuration to the end-user is that the flexibility translates into large combinations of memory modules that may be placed into the memory system at one time. Moreover, multiple organizations often exist for a given memory module capacity, and memory system design engineers must often account for not only different combinations of memory modules of different capacities, but also different modules of different organizations for a given capacity.

Table 10.1 shows that a 128-MB memory module can be constructed from a combination of 16 64-Mbit DRAM devices, 8 128-Mbit DRAM devices, or 4 256-Mbit DRAM devices. Table 10.1 shows that the different memory-module organizations not only use different numbers of DRAM devices, but also present different numbers of rows and columns to the memory controller. To access the memory on the memory module, the DRAM controller must recognize and support the organization of the memory module inserted by the end-user into the memory system. In some cases, new generations of DRAM devices can enable memory module organizations that a memory controller was not designed to support, and incompatibility follows naturally.

5 JEDEC was once known as the Joint Electron Device Engineering Council.

10.3.6 Serial Presence Detect (SPD)

Memory modules have gradually evolved as each generation of new memory modules gains additional levels of sophistication and complexity. Table 10.1 shows that a DRAM memory module can be organized as multiple ranks of DRAM devices on the same memory module, with each rank consisting of multiple DRAM devices, and the memory module can have differing numbers of rows and columns. What is not shown in Table 10.1 is that each DRAM memory module may, in fact, have different minimum timing characteristics in terms of minimum tCAS, tRAS, tRCD, and tRP latencies. The variability of the DRAM modules, in turn, increases the complexity that a memory-system design engineer must deal with.

To reduce the complexity and eliminate the confusion involved in the memory upgrading process, the solution adopted by the computer industry is to store the configuration information of the memory module on a read-only memory device whose content can be retrieved by the memory controller as part of the system initialization process. In this manner, the memory controller can obtain the configuration and timing parameters required to optimally access data from DRAM devices on the memory module. Figure 10.12 shows the image of a small flash memory device on a DIMM. The small read-only memory device is known as a Serial Presence Detect (SPD) device, and it stores a wide range of parameters that describe the variations that can exist between different memory modules. Table 10.2 shows some parameters and values that are stored in the SPD of a DDR SDRAM memory module.

TABLE 10.1 Four different configurations for a 128-MB SDRAM memory module

Capacity | Device Density | Number of Ranks | Devices per Rank | Device Width | Number of Banks | Number of Rows | Number of Columns
128 MB   | 64 Mbit        | 1               | 16               | x4           | 4               | 4096           | 1024
128 MB   | 64 Mbit        | 2               | 8                | x8           | 4               | 4096           | 512
128 MB   | 128 Mbit       | 1               | 8                | x8           | 4               | 4096           | 1024
128 MB   | 256 Mbit       | 1               | 4                | x16          | 4               | 8192           | 512
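Each organization in Table 10.1 can be cross-checked by multiplying out its columns: ranks × devices per rank × banks × rows × columns × device width should come to 128 MB in every row. A small sketch of that arithmetic, with the values transcribed from the table:

```c
#include <stdio.h>
#include <stdint.h>

struct module_org {
    unsigned ranks, devices_per_rank, width_bits, banks;
    unsigned rows, columns;
};

int main(void)
{
    /* The four organizations listed in Table 10.1. */
    const struct module_org orgs[] = {
        { 1, 16,  4, 4, 4096, 1024 },
        { 2,  8,  8, 4, 4096,  512 },
        { 1,  8,  8, 4, 4096, 1024 },
        { 1,  4, 16, 4, 8192,  512 },
    };

    for (unsigned i = 0; i < 4; i++) {
        const struct module_org *o = &orgs[i];
        uint64_t bits = (uint64_t)o->ranks * o->devices_per_rank * o->banks *
                        o->rows * o->columns * o->width_bits;
        printf("organization %u: %llu MB\n", i + 1,
               (unsigned long long)(bits / 8 / (1024 * 1024)));   /* 128 MB each */
    }
    return 0;
}
```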

TABLE 10.2 Sample parameter values stored in SPD

Configuration            | Value (interpreted)
DRAM type                | DDR SDRAM
No. of row addresses     | 16384
No. of column addresses  | 1024
No. of banks             | 4
Data rate                | 400
Module type              | ECC
CAS latency              | 3

FIGURE 10.12: The SPD device stores memory module configuration information.
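During system initialization, firmware typically reads the SPD device over the SMBus and translates its raw bytes into parameters like those in Table 10.2. The sketch below is illustrative only: the byte offsets are hypothetical placeholders rather than the actual JEDEC SPD layout, and the in-memory spd_image[] array stands in for bytes that would really be fetched from the module.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical byte offsets, for illustration only; consult the JEDEC SPD
 * specification for the real layout of a given module generation. */
enum { OFF_TYPE = 2, OFF_ROW_BITS = 3, OFF_COL_BITS = 4, OFF_BANKS = 5, OFF_CAS = 9 };

/* Stand-in for an SPD image read over the SMBus.  The values model a module
 * like the one in Table 10.2: DDR SDRAM, 16384 rows, 1024 columns, 4 banks, CL 3. */
static const uint8_t spd_image[64] = {
    [OFF_TYPE] = 7, [OFF_ROW_BITS] = 14, [OFF_COL_BITS] = 10,
    [OFF_BANKS] = 4, [OFF_CAS] = 3,
};

int main(void)
{
    unsigned rows    = 1u << spd_image[OFF_ROW_BITS];   /* 14 bits -> 16384 rows   */
    unsigned columns = 1u << spd_image[OFF_COL_BITS];   /* 10 bits -> 1024 columns */

    printf("DRAM type code %u, %u rows, %u columns, %u banks, CAS latency %u\n",
           spd_image[OFF_TYPE], rows, columns,
           spd_image[OFF_BANKS], spd_image[OFF_CAS]);
    return 0;
}
```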

10.4 Memory System Topology

In Figure 10.13, a memory system where 16 DRAM devices are connected to a single DRAM controller is shown. In Figure 10.13, the 16 DRAM devices are organized into 4 separate ranks of memory. Although all 16 DRAM devices are connected to the same DRAM controller, different numbers of DRAM devices are connected to different networks for the unidirectional address and command bus, the bidirectional data bus, and the unidirectional chip-select lines. In this topology, when a command is issued, electrical signals on the address and command busses are sent to all 16 DRAM devices in the memory system, but the separate chip-select signal selects a set of 4 DRAM devices in a single rank to provide the data for a read command or receive the data for a write command. In this topology, each DRAM device in a given rank of memory is also connected to a subset of the width of the data bus along with three other DRAM devices in different ranks of memory.

Memory system topology determines the signal path lengths and electrical loading characteristics in the memory system. As a result, designers of modern high-performance DRAM memory systems must pay close attention to the topology and organizations of the DRAM memory system. However, due to the evolutionary nature of the memory system, the classic system topology described above has remained essentially unchanged for Fast Page Mode DRAM (FPM), Synchronous DRAM (SDRAM), and Double Data Rate SDRAM (DDR) memory systems. Furthermore, variants of the classical topology with fewer ranks are expected to be used for DDR2 and DDR3 memory systems.

10.4.1 Direct RDRAM System Topology

One memory system with a topology dramatically different from the classical topology is the Direct RDRAM memory system. In Figure 10.14, four Direct RDRAM devices are shown connected to a single Direct RDRAM memory controller. Figure 10.14 shows that in a Direct RDRAM memory system, the DRAM devices are connected to a well-matched network of interconnects where the clocking network, the data bus, and the command busses are all path-length matched by design. The benefit of the well-matched interconnection network is that signal skew is minimal by design, and electrical signaling rates in the Direct RDRAM memory system can be increased to higher frequencies than a memory system with the classic memory system topology. Modern DRAM systems with conventional multi-rank topology can also match the raw signaling rates of a Direct RDRAM memory system. However, the drawback is that idle cycles must be designed into the access protocol and devoted to system-level synchronization.


FIGURE 10.13: Topology of a generic DRAM memory system.


FIGURE 10.14: Topology of a generic Direct RDRAM memory system.

(Figure 10.15 panels: "SDRAM Variants" shows a complex interconnect with simple, inexpensive interface logic on the DRAM chips; "D-RDRAM, now XDR" shows a more simplified interconnect with complex controller logic (FlexPhase) and more complex, relatively expensive interface logic on the DRAM chips.)

FIGURE 10.15: Philosophy differences.

As a result, even when pushed to comparable data rates, multi-rank DRAM memory systems with classical system topologies are somewhat less efficient in terms of data transported per cycle per pin.

The Direct RDRAM memory system achieves higher efficiency in terms of data transport per cycle per pin through the use of a novel system topology. However, in order to take advantage of the system topology and enjoy the benefits of higher pin data rates as well as higher data transport efficiency, Direct RDRAM memory devices are by design more complex than comparable DRAM memory devices that use the classic memory system topology. In DRAM devices, complexity translates directly to increased costs. As a result, the higher data transport efficiency of Direct RDRAM memory systems has to be traded off against relatively higher DRAM device costs.

10.5 Summary

Figure 10.15 shows the difference in philosophy of commodity SDRAM variant devices such as DDR SDRAM and high data rate DRAM memory devices such as Direct RDRAM and XDR DRAM.

Similar to SDRAM variant memory systems, Direct RDRAM and XDR DRAM memory systems are engineered to allow tens of DRAM devices to be connected to a single DRAM controller. However, to achieve high signaling data rates, Direct RDRAM and XDR DRAM memory systems rely on the re-engineering of the interconnection interface between the memory controller and the DRAM devices. In these high data rate DRAM devices, far more circuitry is placed on the DRAM devices in terms of pin interface impedance control and signal drive current strength.