DRAM TUTORIAL

ISCA 2002

Bruce Jacob David Wang

University of Maryland

DRAM: Architectures, Interfaces, and Systems
A Tutorial

Bruce Jacob and David Wang
Electrical & Computer Engineering Dept.
University of Maryland at College Park
http://www.ece.umd.edu/~blj/DRAM/

DRAM: why bother? (I mean, besides the "memory wall" thing ... is it just a performance issue?) Think about embedded systems: cellphones, printers, switches ... nearly every embedded product that used to be expensive is now cheap. Why? For one thing, rapid turnover from high performance to obsolescence guarantees a generous supply of cheap, high-performance embedded processors to suit nearly any design need. What does the "memory wall" mean in this context? Perhaps it will take longer for a high-performance design to become obsolete.


Outline

• Basics
• DRAM Evolution: Structural Path
• Advanced Basics

• DRAM Evolution: Interface Path

• Future Interface Trends & Research Areas

• Performance Modeling: Architectures, Systems, Embedded

Break at 10 a.m. — Stop us or starve

Basics: DRAM ORGANIZATION

[Figure: DRAM array with row decoder, word lines, bit lines, storage element (capacitor), switching element (transistor), sense amps, column decoder, and data in/out buffers]

First off, what is DRAM? An array of storage elements, each a capacitor-transistor pair. "DRAM" is an acronym for Dynamic Random Access Memory. Why "dynamic"? Capacitors are not perfect and need periodic recharging. These are very dense parts with very small cells, so the capacitors hold very little charge. The bit lines are therefore precharged to half the voltage level, and the sense amps detect the minute change on the lines when a cell is read, then recover the full signal.

Basics: TRANSMISSION

[Figure: CPU connected over a bus to a memory controller, which connects to the DRAM (row decoder, memory array, sense amps, column decoder, data in/out buffers)]

So how do you interact with this DRAM thing? Let's look at a traditional organization first: the CPU connects over a bus to a memory controller, which connects to the DRAM itself. Let's walk through a read operation.

Basics: [PRECHARGE and] ROW ACCESS

[Figure: same CPU / memory controller / DRAM diagram; the row decoder drives a word line and the sense amps recover the data]

At this point, all the bit lines are at the half-voltage level. The row access discharges the capacitors onto the bit lines, pulling each line just a little bit high or a little bit low; the sense amps detect the change and recover the full signal. The read is destructive: the capacitors have been discharged. However, when the sense amps pull the lines to the full logic level (either high or low), the access transistors are kept open and so allow their attached capacitors to become recharged (if they hold a '1' value).

AKA: OPEN a DRAM page/row, or ACT (Activate a DRAM page/row), or RAS (Row Address Strobe).

Basics: COLUMN ACCESS

[Figure: same diagram; the column decoder selects a subset of the sense-amp outputs and sends it to the data in/out buffers]

Once the data is valid on ALL of the bit lines, you can select a subset of the bits and send them to the output buffers; the CAS picks which bits. Big point: you cannot do another RAS or precharge of the lines until you are finished reading the column data. You can't change the values on the bit lines or the output of the sense amps until the data has been read by the memory controller.

READ command, or CAS: Column Address Strobe.

Basics: DATA TRANSFER

[Figure: same diagram; data flows from the data in/out buffers out onto the bus]

Then the data is valid on the data bus. Depending on what you are using for in/out buffers, you might be able to overlap a little or a lot of the data transfer with the next CAS to the same page (this is PAGE MODE).

... with optional additional CAS: Column Address Strobe

note: page mode enables overlap with CAS

Basics: BUS TRANSMISSION

[Figure: same diagram; the data finally crosses the bus from the memory controller back to the CPU]

Basics: DRAM LATENCY

[Figure: CPU, memory controller, and DRAM, with latency components A through F marked along the request/response path]

DRAM "latency" isn't deterministic, because an access may need only a CAS or a full RAS + CAS, and there may be significant queuing delays within the CPU and the memory controller. Each transaction has some overhead, and some types of overhead cannot be pipelined. This means that, in general, longer bursts are more efficient.

A: Transaction request may be delayed in the CPU's queue
B: Transaction request sent to the memory controller
C: Transaction converted to command sequences (may be queued)
D: Command(s) sent to the DRAM
E1: Requires only a CAS, or E2: requires RAS + CAS, or E3: requires PRE + RAS + CAS
F: Data sent back to the CPU

"DRAM latency" = A + B + C + D + E + F
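The decomposition above is just a sum of per-stage delays. A minimal sketch in Python, assuming purely illustrative placeholder numbers for each stage (not measurements from any real part):

```python
# Minimal sketch of the slide's latency decomposition; per-stage values below
# are illustrative placeholders only.
STAGES_NS = {
    "A_queue_in_cpu":        10,  # transaction waits in the CPU's queue
    "B_send_to_controller":   5,  # request crosses to the memory controller
    "C_convert_to_commands": 10,  # controller builds the DRAM command sequence
    "D_send_commands":        5,  # commands driven onto the DRAM bus
    "E_dram_access":         45,  # CAS, RAS+CAS, or PRE+RAS+CAS depending on page state
    "F_return_to_cpu":       10,  # data travels back to the CPU
}

def dram_latency_ns(stages=STAGES_NS):
    """'DRAM latency' as seen by the CPU = A + B + C + D + E + F."""
    return sum(stages.values())

if __name__ == "__main__":
    print(f"total latency: {dram_latency_ns()} ns")
```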

Basics: PHYSICAL ORGANIZATION

[Figure: three DRAM datapaths side by side (x2, x4, and x8 parts), each with its own row decoder, memory array, sense amps, column decoder, and data buffers; the xN width is the number of data pins per chip]

This is per bank … Typical DRAMs have 2+ banks
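The per-chip width determines how many chips gang together to fill a wider data bus. A small arithmetic sketch (the 64-bit bus width is just an example, not taken from the slide):

```python
# How many xN DRAM chips are needed to fill a data bus of a given width.
def chips_per_rank(bus_width_bits: int, chip_width_bits: int) -> int:
    assert bus_width_bits % chip_width_bits == 0
    return bus_width_bits // chip_width_bits

for width in (2, 4, 8, 16):
    print(f"x{width} parts on a 64-bit bus: {chips_per_rank(64, width)} chips")
```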

Basics: Read Timing for Conventional DRAM

[Timing diagram: RAS and CAS strobes, the multiplexed address bus carrying a row address then a column address, and the DQ pins returning valid data; phases labeled Row Access, Column Access, Data Transfer]

Let's look at the interface another way, the way the data sheets portray it. Main point: the RAS# and CAS# signals directly control the latches that hold the row and column addresses.

DRAM Evolutionary Tree

[Figure: an evolutionary tree rooted at conventional DRAM. One branch, "(mostly) structural modifications targeting throughput," runs through FPM, EDO, P/BEDO, and SDRAM. Structural modifications targeting latency lead to ESDRAM, VCDRAM, FCRAM, and MoSys. Interface modifications targeting throughput lead to Rambus and DDR/2, and on to future trends.]

Since DRAM's inception there has been a stream of changes to the design, from FPM to EDO to Burst EDO to SDRAM. The changes are largely structural modifications, relatively minor, that target THROUGHPUT.

Everything up to and including SDRAM has been relatively inexpensive, especially considering the pay-off: FPM was essentially free, EDO cost a latch, PBEDO cost a counter, SDRAM cost a slight re-design. However, we've run out of "free" ideas, and now all changes are considered expensive; thus there is no consensus on new directions, and a myriad of choices has appeared.

[Discuss FPM up to SDRAM, then the LATENCY modifications starting with ESDRAM, and then the INTERFACE modifications.]

DRAM Evolution: Read Timing for Conventional DRAM

[Timing diagram: RAS, CAS, address, and DQ, with phases labeled Row Access, Column Access, Transfer Overlap, and Data Transfer; each access sends a row address and a column address and returns one valid data word]

DRAM Evolution: Read Timing for Fast Page Mode

[Timing diagram: one RAS assertion followed by multiple CAS assertions; the address bus carries one row address and then several column addresses, and DQ returns a valid data word per CAS]

FPM allows you to keep the sense amps active for multiple CAS commands, which gives much better throughput. Problem: you cannot latch a new value in the column address buffer until the read-out of the data is complete.

DRAM Evolution: Read Timing for Extended Data Out

[Timing diagram: as for FPM, but the data remains valid on DQ while the next column address is being presented, so column access overlaps data transfer more aggressively]

Solution to that problem: instead of simple tri-state buffers, use a latch as well. By putting a latch after the column mux, the next column address command can begin sooner.

DRAM Evolution: Read Timing for Burst EDO

[Timing diagram: one row address and one column address are sent; subsequent CAS toggles burst out consecutive data words without new column addresses]

By driving the column-address latch from an internal counter rather than an external signal, the minimum cycle time for driving the output bus was reduced by roughly 30%.

DRAM Evolution: Read Timing for Pipelined Burst EDO

[Timing diagram: as for Burst EDO; the first CAS# toggle latches the column address, and each following toggle drives another data word onto the bus]

"Pipelined" refers to the setting up of the read pipeline: the first CAS# toggle latches the column address, and all following CAS# toggles drive data out onto the bus. Therefore data stops coming when the memory controller stops toggling CAS#.

DRAM Evolution: Read Timing for Synchronous DRAM

[Timing diagram: a free-running clock; the command bus carries ACT then READ, the address bus carries the row and column addresses, and DQ returns a burst of valid data]

Main benefit: it frees the CPU or memory controller from having to control the DRAM's internal latches directly. The controller/CPU can go off and do other things during the idle cycles instead of waiting. Even though the time-to-first-word latency actually gets worse, the scheme increases system throughput.

(RAS + CAS + OE ... become a command bus.)

DRAM Evolution: Inter-Row Read Timing for ESDRAM

[Timing diagrams: "Regular CAS-2 SDRAM, read/read to same bank" vs. "ESDRAM, read/read to same bank." In both, the command bus carries ACT, READ, PRE, ACT, READ with the corresponding row/column/bank addresses; in the ESDRAM case the PRE and second ACT are issued earlier, so the second data burst starts sooner]

The output latch in EDO allowed you to start the next CAS sooner (to the same row). ESDRAM latches the whole row, which allows you to start the precharge and the RAS for the next page access sooner, hiding the precharge overhead.

DRAM Evolution: Write-Around in ESDRAM

[Timing diagrams: "Regular CAS-2 SDRAM, read/write/read to same bank, rows 0/1/0" vs. "ESDRAM, read/write/read to same bank, rows 0/1/0." The SDRAM case needs ACT, READ, PRE, ACT, WRITE, PRE, ACT, READ; the ESDRAM case needs only ACT, READ, PRE, ACT, WRITE, READ, because the write goes around the latched row]

Neat feature of this type of buffering: write-around. (Can the second READ really be this aggressive?)

DRAM Evolution: Internal Structure of Virtual Channel DRAM

[Figure: DRAM banks A and B whose sense amps feed 16 channel segments of 2 Kbit each, which in turn feed the input/output buffer and the DQs; the operations are Activate, Prefetch, Read, Restore, and Write]

Main thing: it is like having a bunch of open row buffers (a la Rambus), but the catch is that you must deal with the 2 Kbit segment directly (move data into and out of it), not the DRAM banks. This adds an extra couple of cycles of latency. However, you get good performance if the data you want is in the segment cache, and you can "prefetch" 2 Kbit into the cache ahead of when you want it. It was originally targeted at reducing latency; now that SDRAM is CAS-2 and RCD-2, it makes sense only in a throughput way.

Segment cache is software-managed, reduces energy

DRAM Evolution: Internal Structure of Fast Cycle RAM

[Figure: an SDRAM 8M array (8K rows x 1K bits) with a 13-bit row decoder and tRCD = 15 ns (two clocks), next to an FCRAM 8M array whose row decoder takes 15 bits (?) and achieves tRCD = 5 ns (one clock)]

FCRAM opts to break up the data array and only activate a portion of the word line. 8K rows requires 13 bits to select; FCRAM uses 15 (assuming the array is 8K x 1K ... the data sheet does not specify).

Reduces access time and energy/access
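The row-address arithmetic is simple enough to show directly; a small sketch, assuming (as the slide does) that the two extra FCRAM address bits select a sub-array so only part of the word line is activated:

```python
import math

# Row-address bits needed to select one of n_rows rows.
def row_addr_bits(n_rows: int) -> int:
    return math.ceil(math.log2(n_rows))

print(row_addr_bits(8 * 1024))      # 13 bits for a conventional 8K-row array
# FCRAM reportedly takes 15 row-address bits; under the slide's assumption the
# 2 extra bits pick one of four sub-arrays, activating only part of the word line.
print(row_addr_bits(8 * 1024) + 2)  # 15
```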

DRAM Evolution: Internal Structure of MoSys 1T-SRAM

[Figure: address and bank-select logic in front of 72 small banks, an auto-refresh engine that walks through the banks, a bank-sized cache, and the DQs]

MoSys takes this one step further: DRAM with an SRAM interface and SRAM speed, but DRAM energy. Physical partitioning: 72 banks. Auto refresh: how do you do this transparently? The refresh logic moves through the arrays, refreshing them when they are not active. But what if one bank gets repeated accesses for a long duration? All other banks will be refreshed, but that one will not. Solution: they have a bank-sized CACHE of lines, so in theory you should never have a problem (magic).

DRAM Evolution: Comparison of Low-Latency DRAM Cores

Here's an idea of how the designs compare. Bus speed is the CAS-to-CAS rate; RAS-CAS (tRCD) is the time to read data from the capacitors into the sense amps; RAS-DQ (tRAC) is the time from RAS to valid data.

DRAM Type      Data Bus Speed   Bus Width (per chip)   Peak BW (per chip)   RAS-CAS (tRCD)   RAS-DQ (tRAC)
PC133 SDRAM    133              16                     266 MB/s             15 ns            30 ns
VCDRAM         133              16                     266 MB/s             30 ns            45 ns
FCRAM          200 * 2          16                     800 MB/s             5 ns             22 ns
1T-SRAM        200              32                     800 MB/s             n/a              10 ns
DDR 266        133 * 2          16                     532 MB/s             20 ns            45 ns
DRDRAM         400 * 2          16                     1.6 GB/s             22.5 ns          60 ns
RLDRAM         300 * 2          32                     2.4 GB/s             ???              25 ns

Outline

• Basics
• DRAM Evolution: Structural Path
• Advanced Basics
  • Memory System Details (Lots)
• DRAM Evolution: Interface Path

• Future Interface Trends & Research Areas

• Performance Modeling: Architectures, Systems, Embedded

What Does This All Mean?

[Figure: a scatter of the technologies just discussed: FPM, EDO, BEDO, SDRAM, ESDRAM, SLDRAM, FCRAM, RLDRAM, D-RDRAM, DDR, DDR II, xDDR II, netDRAM]

Some technologies have legs, some do not, and some have gone belly up. We'll start by examining the fundamental technologies (I/O, packaging, etc.), then explore some of these technologies in more depth a bit later.

Cost-Benefit Criterion

What is a "good" system?

[Figure: "DRAM System Design" at the center of a ring of cost/benefit factors: package cost, interconnect cost, bandwidth, latency, power consumption, test and implementation, and DRAM logic overhead]

It's all about the cost of a system. This is a multi-dimensional tradeoff problem, and it is especially tough because the relative cost factors of pins and die area, and the demands of bandwidth and latency, keep changing. Good decisions for one generation may not be good for future generations. This is why we don't keep a DRAM protocol for long: FPM lasted a while, but we've quickly progressed through EDO, SDRAM, DDR/RDRAM, and now DDR II and whatever else is on the horizon.

Memory System Design

[Figure: "DRAM Memory System" at the center, surrounded by its interacting pieces: clock network, I/O technology, chip packaging, topology, DRAM chip architecture, pin count, access protocol, address mapping, and row buffer management]

Now we'll really get our hands dirty and try to become DRAM designers. That is, we want to understand the tradeoffs and design our own memory system with DRAM cells. By doing this we can gain some insight into the basis of claims made by proponents of various DRAM memory systems. A memory system has many parts; it is a set of technologies and design decisions. All of the parts are inter-related, but for the sake of discussion we'll split the components into the ovals seen here and try to examine each part of a DRAM system separately.

DRAM Interfaces: The Digital Fantasy

[Timing diagram: an Activate (a0) on the row bus, two reads (r0, r1) on the column bus, and two pipelined data bursts on the data bus, with RAS latency, CAS latency, and the pipelined access marked. Caption: "Pretend that the world looks like this."]

Professor Jacob has shown you some nice timing diagrams; I too will show you some nice timing diagrams, but they are a simplification that hides the details of implementation. Why don't they just run the system at XXX MHz like the other guy? Then the latency would be much better and the bandwidth would be extreme. Perhaps they can't, and we'll explain why. To understand why some systems can operate at XXX MHz while others cannot, we must dig past the nice timing diagrams and the architectural diagrams and see what turns up underneath. So underneath the timing diagram, we find this... But...

The Real World: Read from FCRAM at 400 MHz DDR (non-termination case)

[Oscilloscope captures: on the FCRAM side, VDDQ (pad), DQS (pin), DQ0-15 (pin), and VSSQ (pad); on the controller side, DQ0-15 (pin) and DQS (pin); measured skew of 158 ps and 102 ps across the parallel data channels]

We don't get nice square or even nicely shaped waveforms: jitter, skew, etc. VDDQ and VSSQ are the voltage supplies to the I/O pads on the DRAM chips. The signal bounces around and is very non-ideal. So what are the problems, and what solutions are used to solve them? (Note the 158 ps skew on the parallel data channels. If your cycle time is 10 ns, or 10,000 ps, a skew of 158 ps is no big deal, but if your cycle time is 1 ns, or 1,000 ps, then a skew of 158 ps is a big deal.) Already we see hints of problems as we try to push systems to higher and higher clock frequencies.

*Toshiba Presentation, Denali MemCon 2002


Signal Propagation

[Figure: a signal traveling from point A to point B over an ideal transmission line at roughly 0.66c = 20 cm/ns; a real PC board with module connectors and varying electrical loads is a rather non-ideal transmission line]

First, we have to introduce the idea that signal propagation takes finite time. We are limited by the speed of light; on an ideal transmission line we get approximately 2/3 the speed of light, or 20 cm/ns. All signals, including system-wide clock signals, have to travel across the system board, so if you send a clock signal from point A to point B on an ideal line, point B cannot tell that the clock has changed until, at the earliest, (1/20 ns/cm) x distance later. Then again, PC boards are not exactly ideal transmission lines (ringing, drive strength, etc.). The concept of "synchronous" breaks down when different parts of the system observe different clocks. Kind of like relativity.
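The 20 cm/ns figure turns directly into a flight-time budget; a minimal sketch, with the 10 cm trace length chosen purely as an example:

```python
# Propagation delay over an ideal PCB trace at ~2/3 the speed of light (20 cm/ns).
PROPAGATION_CM_PER_NS = 20.0

def flight_time_ns(trace_length_cm: float) -> float:
    return trace_length_cm / PROPAGATION_CM_PER_NS

# A 10 cm run from the controller to the far DIMM costs ~0.5 ns each way,
# already half a cycle at 1 GHz, before any non-ideal loading is considered.
print(f"{flight_time_ns(10.0):.2f} ns")
```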

Clocking Issues

[Figure 1: a clock source driving modules 0 through N along a single trace, so each module sees the clock at a slightly different ("sliding") time. Figure 2: an H-tree clock distribution that tries to equalize the path length to every module]

When we build a "synchronous" system on a PCB, how do we distribute the clock signal? Do we accept a sliding time domain? Is an H-tree doable to N modules in parallel? Do we add skew compensation? What kind of clocking system?

Clocking Issues (continued)

[Figure 1: write data flowing from the controller to chips 0..N alongside a clock traveling in the same direction. Figure 2: read data flowing from the chips back to the controller alongside a clock traveling in that direction]

We would like the chips to be on a "global clock," with everyone perfectly synchronous, but since clock signals are delivered through wires, different chips in the system see the rising edge of a clock a little earlier or later than other chips. While an H-tree may work for a low-frequency system, we really need one clock for sending (writing) signals from the controller to the chips, and another for sending signals from the chips back to the controller (reading). We need different "clocks" for reads and writes.

Path Length Differential

[Figure: a controller driving two bus signals through intermodule connectors to loads A and B; path #2 is routed longer than path #1, and path #3 longer still]

We purposely routed path #2 to be a bit longer than path #1 to illustrate the path-length differential. As illustrated, signals reach load B later than load A simply because B is farther from the controller. It is also difficult to do path-length and impedance matching on a system board; sometimes heroic efforts must be made to get a nice "parallel" bus.

High Frequency AND Wide Parallel Busses are Difficult to Implement

Subdividing Wide Busses

[Figure: a wide parallel bus from point A to point B routed around an obstruction as three smaller signal groups (1, 2, 3), each accompanied by its own source-synchronous local clock]

It's hard to bring a wide parallel bus from point A to point B, but it's easier to bring smaller groups of signals from A to B. To ensure proper timing, we also send along a source-synchronous clock signal that is path-length matched with the signal group it covers. In this figure, signal groups 1, 2, and 3 may have some timing skew with respect to each other, but within each group the signals have minimal skew. (A smaller channel can be clocked higher.)

Narrow channels, source-synchronous local clock signals.

Why Subdivision Helps

[Figure: sub-channel 1 and sub-channel 2, each with its own worst-case skew, compared against the worst-case skew of the combined {channel 1 + channel 2}]

An analogy: it's a lot harder to schedule 8 people for one meeting than to schedule two meetings of 4 people each; the results of the two meetings can be correlated later.

Worst Case Skew must be Considered in System Timing

Timing Variations

[Figure: a controller driving 4 loads vs. a controller driving 1 load; against the clock, "Cmd to 1 load" and "Cmd to 4 loads" arrive with different delays]

A "system" is a hard thing to design, especially one that lets end users perform configurations that impact timing. To guarantee functional correctness of the system, all corner cases of variance in loading and timing must be accounted for.

How many DIMMs in the system? How many devices on each DIMM? Who built the DIMMs? Infinite variations on timing!

Loading Balance

[Figure: one approach duplicates signal lines so the controller drives each memory module with its own copy; the other uses a single signal line with variable controller drive strength]

To ensure that a lightly loaded system and a fully loaded system do not differ significantly in timing, we either duplicate the signals sent to different memory modules, or we keep a single signal line but have the controller drive the I/O pads with variable strength, depending on whether the system has 1, 2, 3, or 4 loads.

Topology

[Figure: a controller facing several candidate arrangements of DRAM chips: all chips hanging off one shared bus, or chips split across multiple branches/channels]

Topology determines electrical loading and signal propagation lengths.

DRAM System Topology Determines Electrical Loading Conditions and Signal Propagation Lengths

SDRAM Topology Example

[Figure: a single-channel SDRAM controller driving a 64-bit data bus split across four pairs of x16 DRAM chips (16 bits per pair), with a shared command & address bus; the clock signal routes out along the chips and turns around]

Very simple topology. The clock signal that turns around is very nice: it solves the problem of needing multiple clocks. The drawback:

Loading Imbalance

RDRAM Topology Example

[Figure: an RDRAM controller at the head of a narrow channel that snakes through every device; the chip clock travels down the channel and turns around, so packets travel on parallel, path-length-matched paths]

All signals in this topology (address, command, data, clock) are sent point to point on channels that are path-length matched by definition. Packets travel down parallel paths; skew is minimal by design.

I/O Technology

[Figure: a signal swinging between "logic low" and "logic high" by a voltage Δv over a time Δt]

RSL vs. SSTL-2, etc. (like ECL vs. TTL of another era). What is "logic low," and what is "logic high"? Different electrical signaling protocols differ on voltage swing, high/low levels, etc. Today Δt is on the order of ns; we want it on the order of ps.

Slew rate = Δv / Δt

A smaller Δv means a smaller Δt at the same slew rate, which increases the rate of bits/s/pin.
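A quick numeric illustration of that relationship; the swings and the 2 V/ns slew rate below are illustrative values, not taken from any particular signaling standard:

```python
# Slew rate = dv/dt, so for a fixed slew rate the transition time scales with
# the voltage swing.
def transition_time_ps(swing_v: float, slew_v_per_ns: float) -> float:
    return swing_v / slew_v_per_ns * 1000.0  # convert ns to ps

for swing in (3.3, 2.5, 0.8, 0.2):  # full-rail LVTTL-like swings down to RSL-like swings
    print(f"swing {swing:>4} V -> {transition_time_ps(swing, 2.0):6.0f} ps at 2 V/ns")
```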

I/O: Differential Pair

[Figure: a single-ended transmission line vs. a differential-pair transmission line]

Used in clocking systems, e.g. RDRAM (all clock signals are pairs, clk and clk#). Differential signaling has the highest noise tolerance and does not need as many ground signals, whereas single-ended signals need many ground connections. Differential-pair signals can also be clocked higher, so the pin-bandwidth disadvantage is not nearly the 2:1 implied by the diagram.

Increase the rate of bits/s/pin? At what cost per pin? At what pin count?

I/O: Multi-Level Logic

[Figure: a signal whose voltage, compared against references Vref_0, Vref_1, and Vref_2, falls into four ranges encoding logic 00, 01, 10, and 11 over time]

One of several ways on the table to further increase the bit rate of the interconnects: encode two bits per symbol using four voltage levels. Increase the rate of bits/s/pin.

Packaging

[Figure: package types through the years: DIP ("the good old days"), SOJ (Small Outline J-lead), TSOP (Thin Small Outline Package), LQFP (Low Profile Quad Flat Package), FBGA (Fine Ball Grid Array)]

[Table: memory roadmap for Hynix NetDDR II, listing package (FBGA), speed (800 Mbps target / 550 Mbps specification), Vdd/Vddq (2.5 V / 2.5 V, moving to 1.8 V), interface (SSTL_2), and row cycle time tRC (35 ns)]

Different packaging types impact cost and speed. Slow parts can use the cheapest packaging available; faster parts may have to use more expensive packaging. This has long been accepted in the higher-margin processor world, but in DRAM every cent has to be hard fought for. To some extent, the demand for higher performance is pushing memory makers to use more expensive packaging to accommodate higher-frequency parts. When RAMBUS first spec'ed FBGA, module makers complained, since they had to purchase expensive equipment to validate that chips were properly soldered to the module board, whereas something like TSOP can be checked with visual inspection.

Access Protocol

[Timing diagrams: a single-cycle command (the full command r0 sent in one cycle, followed by the data burst d0) vs. a multiple-cycle command (r0 transmitted over several cycles before the data burst)]

If I have a 16-bit-wide command to send from A to B, I need 16 pins; if I have fewer than 16, I need multiple cycles. How many bits do I need to send from point A to point B? How many pins do I get? Cycles = Bits / Pins.
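The "Cycles = Bits / Pins" rule as a one-liner, with a rounding-up step for the narrow-bus case (the 16-bit command width is the slide's example; the 4-pin case is an assumed illustration):

```python
import math

# Cycles needed to move a command of a given width over a given number of pins.
def command_cycles(command_bits: int, pins: int) -> int:
    return math.ceil(command_bits / pins)

print(command_cycles(16, 16))  # 1 cycle: full-width command bus
print(command_cycles(16, 4))   # 4 cycles: narrow, multiplexed command bus
```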

Access Protocol (read/read)

[Timing diagram: consecutive cache-line read requests to the same DRAM row: one Activate (a0) on the row bus, two reads (r0, r1) on the column bus, and two back-to-back data bursts on the data bus, with RAS latency and CAS latency marked. Legend: a = Activate (open page), r = Read (column read), d = data chunk]

There is inherent latency between the issuance of a read command and the chip's response with data. To increase efficiency, a pipelined structure is necessary to obtain full utilization of the command, address, and data busses. Unlike an "ordinary" processor pipeline, a memory pipeline has data flowing in both directions. Architecture-wise we should be concerned with full utilization everywhere, so we can use the fewest pins for the greatest benefit, but in practice we are usually concerned with full utilization of the data bus.

Access Protocol (read/write)

[Figure: one datapath (data in/out buffers, column decoder, sense amps), two commands. Timing diagrams: Case 1, a read following a write command to different DRAM devices (pipelined); Case 2, a read following a write command to the same DRAM device (not pipelined); solution, delay the write data to match the read latency]

The DRAM chips determine the latency of data after a read command is received, but the controller determines the timing relationship between the write command and the data being written to the DRAM chips. (If the DRAM device cannot handle pipelined read/write, then...) Case 1: the controller sends write data at the same time as the write command, and the following read goes to a different device (pipelined). Case 2: the controller sends write data at the same time as the write command, and the following read goes to the same device (not pipelined). Solution: delay the data of the write command to match the read latency.

Access Protocol (pipelines)

[Timing diagrams: three back-to-back pipelined read commands (r0, r1, r2) with their data bursts, CAS latency marked; then the "same" latency at 2x the pin frequency, requiring a deeper pipeline]

To increase "efficiency," pipelining of commands is required. How many commands must one device support concurrently? 2? 3? 4? (It depends on what?) Imagine we must increase the data rate (higher pin frequency) but allow the DRAM core to operate slightly slower: 2x pin frequency, same core latency. This issue ties the access protocol to internal DRAM architecture issues.

When pin frequency increases, chips must either reduce “real latency”, or support longer bursts, or pipeline more commands.

Outline

• Basics
• DRAM Evolution: Structural Path
• Advanced Basics
• DRAM Evolution: Interface Path
  • SDRAM, DDR SDRAM, RDRAM Memory System Comparisons
  • Processor-Memory System Trends
  • RLDRAM, FCRAM, DDR II Memory Systems Summary
• Future Interface Trends & Research Areas

The 440BX used 132 pins to control a single SDRAM channel, not counting power and ground; the 845 now uses only 102. There are also slower versions (66/100 MHz), and a page-burst mode that bursts an entire page. Burst length is programmed to match the cache-line size (e.g. a 32-byte line = 256 bits = 4 cycles of 64 bits). Latency as seen by the controller is really CAS + 1 cycles.

• Performance Modeling: Architectures, Systems, Embedded

SDRAM System In Detail

[Figure: a single-channel SDRAM controller driving DIMMs 1 through 4 in a "mesh topology": shared address & command bus, shared data bus, and per-DIMM chip (DIMM) selects]

SDRAM Chip

• 133 MHz (7.5 ns cycle time), 256 Mbit, 54-pin TSOP
• Multiplexed command/address bus
• Programmable burst length: 1, 2, 4, or 8
• Quad banks internally
• Supply voltage of 3.3 V
• Low latency: CAS = 2, 3
• LVTTL signaling (0.8 V to 2.0 V thresholds; 0 to 3.3 V rail to rail)
• Pins: 16 data, 15 address, 7 command, 14 power/ground, 1 clock, 1 NC

SDRAM: inexpensive packaging, lowest cost (LVTTL signaling), standard 3.3 V supply voltage. The DRAM core and I/O share the same power supply. The "power cost" is between 560 mW and 1 W per chip for the duration of the cache-line burst. (Note: it costs power just to keep lines stored in the sense amps/row buffers, something for row-buffer management policies to think about.) About 2/7 of the pins are used for address, 2/7 for data, 2/7 for power/ground, and 1/7 for command signals. Row commands and column commands are sent on the same bus, so they have to be demultiplexed inside the DRAM and decoded.

Condition               Specification         Current   Power
Operating (active)      Burst = continuous    300 mA    1 W
Operating (active)      Burst = 2             170 mA    560 mW
Standby (active)        All banks active      60 mA     200 mW
Standby (powerdown)     All banks inactive    2 mA      6.6 mW

SDRAM Access Protocol (read/read)

[Figure and timing diagram: the memory controller reads from SDRAM chip #0 and then chip #1; steps 1-4 mark the read command/address assertions and the data-bus utilization (CAS latency 3). Along the clock, chip #0 drives its data past the rising edge (hold time), the bus goes idle, then chip #1 begins driving at least a setup time before the next rising edge. Caption: back-to-back memory read accesses to different chips in SDRAM; clock cycles are still long enough to allow pipelined back-to-back reads]

We've spent some time discussing pipelined back-to-back read commands sent to the same chip; now let's try to pipeline commands to different chips. For the memory controller to latch data on the data bus on consecutive cycles, chip #0 has to hold its data value past the rising edge of the clock to satisfy the hold-time requirement, then stop and let the bus go "quiet," and then chip #1 can start to drive the data bus at least some setup time ahead of the rising edge of the next clock. Clock cycles have to be long enough to tolerate all of these timing requirements.

SDRAM Access Protocol (write/read)

[Figure 1: consecutive reads to chips 0 through N; worst-case skew = Dist(N) - Dist(0). Figure 2: a read to chip N right after a write to chip (N-1); worst-case skew = Dist(N) + Dist(N-1), plus the bus turn-around]

I show different paths, but these signals share the bi-directional data bus. For a read following a write to a different chip, the worst case is when we write to the (N-1)th chip and then expect to pipeline a read command in the next cycle right behind it: the worst-case signal path skew is the sum of the two distances. Isn't going from chip N back to chip N even worse? No, SDRAM does not support pipelining a read behind a write on the same chip. Also, it's not as bad as I project here, since read cycles are center-aligned and writes are edge-aligned, so in essence we get 1.5 cycles to pipeline this case instead of just 1. Still, this problem limits the frequency scalability of SDRAM; idle cycles may have to be inserted to meet timing.

SDRAM Access Protocol (write/read)

[Figure and timing diagram: a write (w0) to chip #0 followed by a read (r1); the write data burst must complete and the bus must turn around before the read data burst starts]

Timing bubbles: more dead cycles.

Read Following a Write Command to Same SDRAM Device

DDR SDRAM System

[Figure: a single-channel DDR SDRAM controller driving DIMMs 1 through 3 with the same topology as SDRAM: shared address & command bus, data bus, per-DIMM chip selects, plus DQS (data strobe) signals routed alongside the data]

Since the data bus has a much lighter load, if we can use better signaling technology, perhaps we can run just the data bus at a higher frequency. At the higher frequency, the skews we talked about would be terrible across a 64-bit-wide data bus, so we use source-synchronous strobe signals (called DQS) routed in parallel with each 8-bit-wide sub-channel. DDR is newer, so it uses a lower core voltage, which saves on power too.

DDR SDRAM Chip

• 133 MHz (7.5 ns cycle time), 256 Mbit, 66-pin TSOP
• Multiplexed command/address bus
• Programmable burst lengths: 2, 4, or 8*
• Quad banks internally
• Supply voltage of 2.5 V*
• Low latency: CAS = 2, 2.5, 3*
• SSTL-2 signaling (Vref +/- 0.15 V; 0 to 2.5 V rail to rail)*
• Pins: 16 data, 15 address, 7 command, 16 power/ground*, 2 clock*, 2 DQS*, 1 Vref*, 7 NC*
(* differences from SDRAM)

[Timing diagram: a READ command with CAS latency 2; DQS pre-amble, data returned aligned to DQS, DQS post-amble]

Slightly larger package, same widths for address and data; the new pins are the 2 DQS strobes, Vref, and now differential clocks, at a lower supply voltage. The voltage swing is low and referenced to Vref instead of the fixed 0.8 V to 2.0 V thresholds. No power discussion here because the (Micron) data sheet is incomplete. Read data returned from the DRAM chips is now captured with respect to the timing of the DQS signals, which the DRAM chips send in parallel with the data itself. The use of DQS introduces "bubbles" between bursts from different chips and reduces bandwidth efficiency.

DDR SDRAM Protocol (read/read)

[Figure and timing diagram: reads r0 and r1 to chips #0 and #1 on the same DDR channel; the DQS pre-amble and post-amble force one idle cycle on the data bus between the two bursts (CAS latency 2). Caption: back-to-back memory read accesses to different chips in DDR SDRAM]

Here we see that two consecutive column read commands to different chips on the DDR memory channel cannot be placed back to back on the data bus, due to the DQS hand-off issue. They may be pipelined with one idle cycle between bursts. This is true for all consecutive accesses to different chips: read/read, read/write, write/read (except write/write, where the controller keeps control of the DQS signal and just changes target chips). Because of this overhead, short bursts are inefficient on DDR and longer bursts are more efficient (32-byte cache line = burst of 4, 64-byte line = burst of 8).
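The burst-length rule mentioned here (and on the earlier outline slide) is just line size over bus width; a minimal sketch:

```python
# Burst length chosen so that one burst carries exactly one cache line.
def burst_length(cacheline_bytes: int, data_bus_bits: int) -> int:
    return (cacheline_bytes * 8) // data_bus_bits

print(burst_length(32, 64))  # 4-beat burst for a 32-byte line on a 64-bit bus
print(burst_length(64, 64))  # 8-beat burst for a 64-byte line
```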

RDRAM System

[Figure and timing diagram: an RDRAM controller at the head of the channel; against the bus clock, two write commands (w0, w1) and a read (r2) on the column command bus, with their data packets on the data bus; tCWD is the write delay and tCAC the CAS access delay. Caption: two write commands followed by a read command]

Very different from SDRAM: everything is sent around in 8-half-cycle packets. Most systems now run at 400 MHz, but since everything is DDR it's called "800 MHz"; the only difference is that packets can only be initiated at the rising edge of the clock. Very clean topology, very clever clocking scheme: no clock hand-off issue, high efficiency. The write delay improves matching with the read latency (not perfect, tCAC - tCWD as shown). Since the data bus is 16 bits wide, each read command gets 16 x 8 = 128 bits back, so each cache-line fetch takes multiple packets. Up to 32 devices per channel.

Packet protocol: everything in 8 (half-) cycle packets.

Direct RDRAM Chip

• 400 MHz (2.5 ns cycle time), 256 Mbit, 86-pin FBGA
• Separate row and column command busses
• Burst length = 8*
• 4/16/32 banks internally*
• Supply voltage of 2.5 V*
• Low latency: CAS = 4 to 6 full cycles*
• RSL signaling (Vref +/- 0.2 V; 800 mV rail to rail)
• Pins: 16 data, 8 address/command, 4 clock*, 6 CTL*, 1 Vref*, 49 power/ground*, 2 NC

[Timing diagram: Activate, then back-to-back read packets, then data packets, then precharge and the next Activate; all packets are 8 (half-) cycles in length, so the protocol allows near-100% bandwidth utilization on all channels (address/command/data)]

RDRAM packets do not re-order the data inside the packet. To compute RDRAM latency we must add in the command-packet transmission time as well as the data-packet transmission time. RDRAM relies on its multitude of banks to try to ensure that a high percentage of requests hit open pages and only incur the cost of a CAS, instead of a RAS + CAS.

RDRAM Drawbacks

[Figure: the cost adders on a first-generation RDRAM chip: high-frequency I/O test and package cost, a separate RSL power plane, roughly 30% of die cost going to logic at the 64 Mbit node, active decode logic in the control path, and open row buffers that provide all data bits for each packet from a single chip (high power even in the "quiet" state)]

RDRAM provides high bandwidth, but what are the costs? RAMBUS pushed in many different areas simultaneously. The drawback was that, with a whole new set of infrastructure, the costs for first-generation products were exorbitant. Significant cost delta for the first generation.

System Comparison

                                    SDRAM     DDR       RDRAM
Frequency (MHz)                     133       133 * 2   400 * 2
Pin count (data bus)                64        64        16
Pin count (controller)              102       101       33
Theoretical bandwidth (MB/s)        1064      2128      1600
Theoretical efficiency
  (data bits/cycle/pin)             0.63      0.63      0.48
Sustained BW (MB/s)*                655       986       1072
Sustained efficiency*
  (data bits/cycle/pin)             0.39      0.29      0.32
RAS + CAS (tRAC) (ns)               45 ~ 50   45 ~ 50   57 ~ 67
CAS latency (ns)**                  22 ~ 30   22 ~ 30   40 ~ 50

*StreamAdd   **Load-to-use latency   (133 MHz P6 chipset + SDRAM: CAS latency ~ 80 ns)

Low pin count, higher latency. In general terms, the comparison simply points out the areas where RDRAM excels, i.e. high bandwidth and low pin count; but it also has longer latency, since it takes 10 ns just to move the command from the controller onto the DRAM chip and another 10 ns just to get the data from the DRAM chips back onto the controller interface.
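The efficiency rows in the table appear to be data bits per transfer cycle per controller pin; a small sketch that reproduces them from the other rows (the interpretation of the denominator as transfer rate times controller pin count is an assumption that happens to match the printed values):

```python
# Efficiency = (bandwidth in bits/s) / (transfer rate in Hz * controller pins).
def efficiency(bw_mb_per_s: float, transfer_mhz: float, controller_pins: int) -> float:
    return (bw_mb_per_s * 8) / (transfer_mhz * controller_pins)

print(f"SDRAM theoretical : {efficiency(1064, 133,     102):.2f}")  # ~0.63
print(f"DDR   theoretical : {efficiency(2128, 133 * 2, 101):.2f}")  # ~0.63
print(f"RDRAM theoretical : {efficiency(1600, 400 * 2,  33):.2f}")  # ~0.48
print(f"RDRAM sustained   : {efficiency(1072, 400 * 2,  33):.2f}")  # ~0.32
```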

Differences of Philosophy

[Figure: SDRAM variants pair a complex controller with inexpensive interconnect and simple interface logic on the DRAM chips; RDRAM variants pair a simplified controller with expensive interconnect and complex interface logic, i.e. the complexity is moved onto the DRAM chips]

RDRAM moves complexity from the controller interface into the DRAM chips. Is this a good trade-off? What does the future look like?

Technology Roadmap (ITRS)

                                 2004        2007        2010        2013        2016
Semiconductor generation (nm)    90          65          45          32          22
CPU MHz                          3990        6740        12000       19000       29000
MLogic transistors / cm^2        77.2        154.3       309         617         1235
High-perf chip pin count         2263        3012        4009        5335        7100
High-perf chip pin cost
  (cents/pin)                    1.88        1.61        1.68        1.44        1.22
Memory pin cost (cents/pin)      0.34-1.39   0.27-0.84   0.22-0.34   0.19-0.39   0.19-0.33
Memory pin count                 48-160      48-160      62-208      81-270      105-351

To begin with, we look into a crystal ball for trends that will cause changes or limit scalability in the areas we are interested in. ITRS = International Technology Roadmap for Semiconductors. Transistor frequencies are supposed to nearly double every generation, and the transistor budget (as indicated by million logic transistors per cm^2) is projected to double. Interconnects between chips are a different story: measured in cents/pin, pin cost decreases only slowly, and the pin budget grows slowly each generation.

Punchline: in the future, free transistors and costly interconnects.

Choices for the Future

[Figure: four options. Direct connection of custom DRAM to the CPU: highest bandwidth, low latency. Direct connection of semi-commodity DRAM: high bandwidth, low/moderate latency. Direct connection of commodity DRAM: low bandwidth, low latency. Indirect connection through an external memory controller to many commodity DRAMs: highest bandwidth of the inexpensive options, highest latency]

So we have some choices to make. Integration will move the memory controller on die, where its frequency will be much higher, and the command/data path will cross chip boundaries only twice instead of four times. But interfacing with memory chips directly means being limited by the lowest common denominator. To get the highest bandwidth (for a given number of pins) AND the lowest latency, we would need custom RAM; it might as well be SRAM, but that would be prohibitively expensive.

EV7 + RDRAM (Compaq/HP)

• RDRAM memory (2 controllers)
• Direct connection to the processor
• 75 ns load-to-use latency
• 12.8 GB/s peak bandwidth
• 6 GB/s read or write bandwidth
• 2048 open pages (2 x 32 x 32)

[Figure: two on-die memory controllers (MC), each 64 bits wide, fanning out to 16-bit RDRAM channels; each column read fetches 128 x 4 = 512 bits of data]

Two RDRAM controllers means two independent channels. Only one packet has to be generated for each 64-byte cache-line transaction request. (The extra channel stores directory data, i.e. "I belong to CPU #2, exclusively.") Very aggressive use of the available open pages in RDRAM memory.
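The headline numbers on this slide are simple products; a quick check:

```python
# Open-page count and per-read fetch size quoted on the EV7 slide.
controllers, devices_per_channel, banks_per_device = 2, 32, 32
print(controllers * devices_per_channel * banks_per_device)  # 2048 open pages

bits_per_column_read, ganged_channels = 128, 4
print(bits_per_column_read * ganged_channels)                # 512 bits = one 64-byte line
```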

What if EV7 Used DDR?

• Peak bandwidth of 12.8 GB/s requires:
• 6 channels of 133 * 2 MHz DDR SDRAM, i.e.
• 6 controllers with 64-bit-wide channels, or
• 3 controllers with 128-bit-wide channels

The EV7 cache line is 64 bytes, so each 4-channel ganged RDRAM group can fetch 64 bytes with one single packet, and each DDR SDRAM channel can fetch 64 bytes by itself; so we need 6 controllers. If we gang two DDR SDRAM channels into one, we have to reduce the burst length from 8 to 4; shorter bursts are less efficient, and sustainable bandwidth drops.

System             EV7 + RDRAM        EV7 + 6-controller     EV7 + 3-controller
                                      DDR SDRAM              DDR SDRAM
Latency            75 ns              ~50 ns*                ~50 ns*
Pin count          ~265** + Pwr/Gnd   ~600** + Pwr/Gnd       ~600** + Pwr/Gnd
Controller count   2                  6***                   3***
Open pages         2048               144                    72

* Page-hit CAS + memory controller latency.
** Including all signals (address, command, data, clock), not including ECC or parity.
*** The 3-controller design is less bandwidth efficient.

What's Next?

• DDR II
• FCRAM
• RLDRAM
• RDRAM (Yellowstone, etc.)
• Kentron QBM

DDR SDRAM was an advancement over SDRAM, with lowered Vdd, a new electrical signal interface (SSTL), and a new protocol, but fundamentally the same tRC of roughly 60 ns; RDRAM has a tRC of roughly 70 ns. All are comparable in row recovery time. So what's next? What's on the horizon? DDR II / FCRAM / RLDRAM / RDRAM-next-gen / Kentron? What are they, and what do they bring to the table?

DDR II: DDR Next Generation

• Lower I/O voltage (1.8 V)
• DRAM core operates at 1:4 the data-bus frequency (SDRAM 1:1, DDR 1:2)
• Backward compatible with DDR (common command sets)
• 400 Mbps per pin with multi-drop modules; 800 Mbps point-to-point
• No more page-transfer-until-interrupted commands
• Burst length == 4 only
• FBGA package (removes speed paths)
• 4 banks internally (same as SDRAM and DDR)
• Write latency = CAS - 1 (increased bus utilization)

DDR II is a follow-on to DDR; its command set is a superset of the DDR SDRAM commands. Lower I/O voltage means lower I/O power and possibly faster signal switching due to the lower rail-to-rail swing. The DRAM core now operates at 1:4 of the data-bus frequency. A valid command may be latched on any given rising edge of the clock, but may be delayed a cycle, since the command bus runs at half the data-bus frequency. In a memory system it can run at 400 Mbps per pin, while it can be cranked up to 800 Mbps per pin in an embedded system without connectors. DDR II eliminates the transfer-until-interrupted commands and limits the burst length to 4 only (simpler to test).

DDR II, Continued: Posted Commands

[Timing diagrams: SDRAM and DDR issue Activate (RAS), wait tRCD, then Read (CAS), then get data; with DDR II's posted CAS, the Read can be issued immediately after the Activate, and an internal counter delays it so the chip issues the "real" column command after tRCD for lowest latency]

Instead of a controller that keeps track of cycles, we can now have a "dumber" controller. Control is now simple, kind of like SRAM: part I of the address one cycle, part II the next. SDRAM and DDR rely on the memory controller to know tRCD and issue the CAS after tRCD for lowest latency; with DDR II's posted CAS, an internal counter delays the CAS command and the DRAM chip issues the "real" command after tRCD.

FCRAM: Fast Cycle RAM (aka Network DRAM)

FCRAM is a trademark of Fujitsu; Toshiba manufactures FCRAM/Network DRAM under this trademark and sells it as Network DRAM. Same thing. Extra die area is devoted to circuits that lower the row cycle time to half that of DDR, and the random access latency (tRAC) down to 22 to 26 ns. Writes are delay-matched with the CAS latency, for better bus utilization.

Features               DDR SDRAM        FCRAM / Network DRAM
Vdd, Vddq              2.5 +/- 0.2 V    2.5 +/- 0.15 V
Electrical interface   SSTL-2           SSTL-2
Clock frequency        100~167 MHz      154~200 MHz
tRAC                   ~40 ns           22~26 ns
tRC                    ~60 ns           25~30 ns
# Banks                4                4
Burst length           2, 4, 8          2, 4
Write latency          1 clock          CASL - 1

FCRAM/Network DRAM looks like DDR+.

FCRAM, Continued

[Figure: bus-utilization comparison presented at Denali MemCon 2002, showing higher sustained efficiency for FCRAM under random read/write accesses thanks to its faster tRC]

With a faster DRAM turn-around time (tRC), a stream of random accesses that keeps hitting the same page over and over again achieves the highest bus utilization (with random read/write accesses). This is also why peak bandwidth != sustained bandwidth: deviations from peak bandwidth can be due to architecture-level constraints such as tRC (you cannot re-cycle a DRAM array to grab more data out of the same array and re-use the sense amps until tRC has elapsed).

The faster tRC allows Samsung to claim higher bus efficiency. (*Presentation at Denali MemCon 2002)

RLDRAM

Another variant, but RLDRAM is targeted toward embedded systems. There are no connector specifications, so it can target a higher frequency off the bat.

DRAM Type      Frequency   Bus Width    Peak Bandwidth   Random Access      Row Cycle
                           (per chip)   (per chip)       Time (tRAC)        Time (tRC)
PC133 SDRAM    133         16           200 MB/s         45 ns              60 ns
DDR 266        133 * 2     16           532 MB/s         45 ns              60 ns
PC800 RDRAM    400 * 2     16           1.6 GB/s         60 ns              70 ns
FCRAM          200 * 2     16           0.8 GB/s         25 ns              25 ns
RLDRAM         300 * 2     32           2.4 GB/s         25 ns              25 ns

Comparable to FCRAM in latency; higher frequency (no connectors); non-multiplexed address (SRAM-like).

RLDRAM, Continued: RLDRAM Applications (II), L3 Cache for High-End PC and Server

[Figure: a processor and memory controller each connected over 2.4 GB/s x32 channels to 256 Mb RLDRAM chips serving as L3 cache. Caption: "RLDRAM is a great replacement for SRAM in L3 cache applications because of its high density, low power and low cost."]

Infineon proposes that RLDRAM could be integrated onto the motherboard as an L3 cache: 64 MB of L3, shaving 25 ns off the tRAC of going to SDRAM or DDR SDRAM. It is not to be used as main memory, due to capacity constraints.

*Infineon Presentation, Denali MemCon 2002

RAMBUS Yellowstone

• Bi-directional differential signals
• Ultra-low 200 mV peak-to-peak signal swings
• 8 data bits transferred per clock: Octal Data Rate (ODR) signaling
• 400 MHz system clock, 3.2 GHz effective data frequency
• Cheap 4-layer PCB
• Commodity packaging

[Figure: the system clock and the data eye swinging between 1.0 V and 1.2 V]

Unlike the other DRAMs, Yellowstone is only a voltage and I/O specification; there is no DRAM, as far as I know. RAMBUS has learned its lesson: they previously used expensive packaging, 8-layer motherboards, and added cost everywhere. Now the new pitch is "higher performance with the same infrastructure."

Kentron QBM (Quad Band Memory)

[Figure: DDR chips A and B behind FET switches on each module; the controller clock alternately selects A and B, so the output interleaves the d0 and d1 bursts at 4x the clock frequency. Caption: "wrapper electronics around DDR memory" generates 4 data bits per pin per cycle instead of 2]

QBM uses FET switches to control which DIMM drives the output; two DDR memory chips are interleaved to get quad-rate memory. Advantages: it uses standard DDR chips, and the extra cost is low, only the wrapper electronics. A modification to the memory controller is required, but it is minimal: the controller has to understand that data is being burst back at 4x the clock frequency. It does not improve efficiency, but it is cheap bandwidth. It supports more loads than "ordinary" DDR, so more capacity.

A Different Perspective: Everything Is Bandwidth

[Figure: the clock, row command/address, column command/address, write data, and read data busses each viewed as a bandwidth resource]

Instead of thinking about things from a strict latency-bandwidth perspective, it might be more helpful to think in terms of latency versus pin-transition efficiency: not just latency and bandwidth, but pin bandwidth and pin-transition efficiency (data bits/cycle/pin).

Research Areas: Topology

[Figure: a unidirectional loop: command and write packets flow from the controller out along one bus through the DRAM chips, and read data returns to the controller on another unidirectional bus, like a slotted-ring network]

A DRAM system is basically a networking system with a smart master controller and a large number of "dumb" slave devices. If we are concerned about "efficiency" at the bit/pin/sec level, it might behoove us to draw inspiration from network interfaces and design something like this: unidirectional command and write packets from the controller to the DRAM chips, and a unidirectional bus from the DRAM chips back to the controller. Then it looks like a network system with a slotted-ring interface, with no need to deal with bus turn-around issues.

Unidirectional topology:

• Write Packets sent on Command Bus

• Pins used for Command/Address/Data

• Further Increase of Logic on DRAM chips

Memory Commands?

[Figure: instead of the CPU writing zeros element by element (Act, Write 0, 0, 0, ...), the controller could issue a "write 0" command so A[ ] = 0 happens inside the DRAM. Similarly: why do A[ ] = B[ ] in the CPU? Move data inside the DRAM or between DRAMs. Why do STREAMadd (A[ ] = B[ ] + C[ ]) in the CPU?]

Certain things simply do not make sense to do in the CPU, such as the various STREAM components: moving multi-megabyte arrays from DRAM to the CPU just to perform a simple add, then moving those multi-megabyte arrays right back. In such extremely bandwidth-constrained applications it would be beneficial to have some logic or hardware on the DRAM chips that can perform simple computation. This is tricky, since we do not want to add so much logic that the DRAM chips become prohibitively expensive to manufacture (logic overhead decreases with each generation, so adding logic is not an impossible dream). Also, we do not want to add logic to the critical path of a DRAM access; that would slow down a general access in terms of "real latency" in ns.

Active Pages (Chong et al., ISCA '98)

Address Mapping

[Figure: a physical address split into fields: device ID, row address, bank ID, column address]

For a given physical address, there are a number of ways to map the bits of the physical address onto the "memory address," i.e. device ID, row/column address, and bank ID. The mapping policy can impact performance, since a badly mapped system can cause bank conflicts on consecutive accesses. Mapping policies must now also take temperature control into account, as consecutive accesses that hit the same DRAM chip can create undesirable hot spots; one reason for RDRAM's additional initial cost was the use of heat spreaders on the memory modules to prevent hotspots from building up. Goals: distribute accesses for temperature control, avoid bank conflicts, and reorder accesses for performance.
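A minimal sketch of one possible bit-slicing policy. The field widths and their order (column bits low, then bank, row, device) are an assumption for illustration only; real controllers expose this as a configurable policy, and the decode below will not reproduce the worked example on the next slide, whose exact layout is not given:

```python
# Assumed field widths: 10 column bits, 2 bank bits, 13 row bits, 3 device bits.
COL_BITS, BANK_BITS, ROW_BITS, DEV_BITS = 10, 2, 13, 3

def map_address(paddr: int):
    """Split a physical address into (device, row, bank, col) under the assumed layout."""
    col = paddr & ((1 << COL_BITS) - 1)
    paddr >>= COL_BITS
    bank = paddr & ((1 << BANK_BITS) - 1)
    paddr >>= BANK_BITS
    row = paddr & ((1 << ROW_BITS) - 1)
    paddr >>= ROW_BITS
    device = paddr & ((1 << DEV_BITS) - 1)
    return device, row, bank, col

# Decoded fields under the assumed layout (will differ from a real controller's mapping).
print(map_address(0x05AE5700))
```

Swapping which physical-address bits feed the bank field is exactly the knob that turns the bank conflicts on the next slide into pipelined accesses.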

Example: Bank Conflicts

[Figure: several DRAM chips, each containing multiple banks (row decoder, memory array, sense amps, column decoder); multiple banks reduce access conflicts]

Each memory system consists of one or more memory chips, and most of the time accesses to different chips can be pipelined. Each chip also has multiple banks, and most of the time accesses to different banks can also be pipelined (the key to efficiency is pipelining commands).

Read 05AE5700   Device id 3, Row id 266, Bank id 0
Read 023BB880   Device id 3, Row id 1BA, Bank id 0
Read 05AE5780   Device id 3, Row id 266, Bank id 0
Read 00CBA2C0   Device id 3, Row id 052, Bank id 1

More Banks per Chip == Performance == Logic Overhead

Example: Access Reordering

1 Read 05AE5700   Device id 3, Row id 266, Bank id 0
2 Read 023BB880   Device id 3, Row id 1BA, Bank id 0
3 Read 05AE5780   Device id 3, Row id 266, Bank id 0
4 Read 00CBA2C0   Device id 1, Row id 052, Bank id 1

[Timing diagrams: with strict ordering, the sequence Act 1, Read, Prec, Act 2, Read, Prec, Act 3, Read ... must respect tRC between row activations to the same bank. With memory-access re-ordering (Act 1, Act 4, reads 1, 3, 4, then Prec, Act 2, Prec ...), the same set of requests completes sooner]

Each load command is translated into a row command and a column command. If two commands are mapped to the same bank, one must be completed before the other can start. If we can re-order the sequence, the entire sequence can be completed faster. By allowing Read 3 to bypass Read 2, we do not need to generate another row activation; Read 4 may also bypass Read 2, since it operates on a different device/bank entirely. DRAM can now do auto-precharge, but I show the precharge explicitly to emphasize that two rows cannot be active in the same bank within tRC (a DRAM architecture constraint).

Act = Activate page (data moved from DRAM cells to row buffer)
Read = Read data (data moved from row buffer to memory controller)
Prec = Precharge (close page / evict data in row buffer and sense amps)
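A toy version of this reordering pass, assuming a simple open-row policy: a later read may be hoisted ahead of an earlier one if it targets a different (device, bank) or hits the row that bank already has open. This is only a sketch of the idea on the slide, not any specific controller's scheduler:

```python
requests = [  # (name, device, row, bank) from the slide's example
    ("rd1", 3, 0x266, 0),
    ("rd2", 3, 0x1BA, 0),
    ("rd3", 3, 0x266, 0),
    ("rd4", 1, 0x052, 1),
]

def reorder(reqs):
    scheduled, open_row = [], {}          # (device, bank) -> currently open row
    pending = list(reqs)
    while pending:
        # Prefer a request that hits an open row or an idle bank; otherwise take the oldest.
        pick = next((r for r in pending
                     if open_row.get((r[1], r[3])) in (None, r[2])), pending[0])
        pending.remove(pick)
        open_row[(pick[1], pick[3])] = pick[2]
        scheduled.append(pick[0])
    return scheduled

print(reorder(requests))  # ['rd1', 'rd3', 'rd4', 'rd2']: reads 3 and 4 bypass read 2
```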

Outline

Now: performance issues.

• Basics
• DRAM Evolution: Structural Path
• Advanced Basics

• DRAM Evolution: Interface Path

• Future Interface Trends & Research Areas

• Performance Modeling: Architectures, Systems, Embedded

Simulator Overview

CPU: SimpleScalar v3.0a

• 8-way out-of-order
• L1 cache: split 64K/64K, lockup free x32

• L2 cache: unified 1MB, lockup free x1

• L2 blocksize: 128 bytes

Main Memory: 8 64Mb DRAMs

• 100MHz/128-bit memory bus

• Optimistic open-page policy

Benchmarks: SPEC ’95
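For reference, the simulated system above collected into one configuration object. This is only a sketch: the key names are mine, and reading the "x32"/"x1" annotations as numbers of outstanding misses (MSHRs) is an assumption.

```python
# The simulated system from the slide, written out in one place. Key names
# are illustrative; values are the ones listed above.
sim_config = {
    "cpu":         {"model": "SimpleScalar v3.0a", "issue": "8-way out-of-order"},
    "l1_cache":    {"split_kb": (64, 64), "lockup_free": True, "mshrs": 32},
    "l2_cache":    {"size_mb": 1, "unified": True, "block_bytes": 128, "mshrs": 1},
    "main_memory": {"dram_chips": 8, "chip_density": "64Mb",
                    "bus": "100MHz / 128-bit", "page_policy": "open (optimistic)"},
    "benchmarks":  "SPEC '95",
}
```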

DRAM TUTORIAL

ISCA 2002 DRAM Configurations Bruce Jacob David Wang University of Maryland
FPM, EDO, SDRAM, ESDRAM, DDR: eight x16 DRAM parts ganged on a DIMM, reached through the memory controller over a 128-bit 100MHz bus from the CPU and caches.
Rambus, Direct Rambus, SLDRAM: a chain of DRAMs on a fast, narrow channel behind the memory controller, with the same 128-bit 100MHz bus between the CPU/caches and the controller.

Note: TRANSFER WIDTH of Direct Rambus Channel • equals that of ganged FPM, EDO, etc. • is 2x that of Rambus & SLDRAM
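The note can be checked with simple arithmetic. The sketch below uses only the figures implied by the slide itself (a 128-bit, 100 MHz ganged bus and the 2x relationship); treat the per-channel numbers as the study's modeling assumptions rather than datasheet quotes.

```python
# Effective transfer bandwidth per channel, as implied by the note above.
ganged_bus_gbs = 128 / 8 * 100e6 / 1e9     # 128-bit x 100 MHz ganged bus = 1.6 GB/s
drdram_gbs     = ganged_bus_gbs            # Direct Rambus channel: equal, 1.6 GB/s
rdram_gbs = sldram_gbs = drdram_gbs / 2    # Rambus, SLDRAM: half, 0.8 GB/s
print(ganged_bus_gbs, drdram_gbs, rdram_gbs)
```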

DRAM TUTORIAL ISCA 2002 DRAM Configurations Bruce Jacob David Wang University of Maryland
Rambus & SLDRAM dual-channel: two fast, narrow channels of DRAMs behind the memory controller, with the 128-bit 100MHz bus between the CPU/caches and the controller.
Strawman (Rambus, etc.): multiple parallel channels of DRAMs behind the memory controller, again behind the 128-bit 100MHz CPU bus.

DRAM TUTORIAL ISCA 2002 First … Refresh Matters Bruce Jacob David Wang

University of Maryland
(Chart: time per access in ns for the compress benchmark, broken into bus wait time, refresh time, data transfer time, data transfer time overlap, column access time, row access time, and bus transmission time, across the DRAM configurations FPM1, FPM2, FPM3, EDO1, EDO2, SDRAM1, ESDRAM, SLDRAM, RDRAM, and DRDRAM.)

Assumes refresh of each bank every 64ms
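A back-of-the-envelope view of what that refresh requirement costs. The row count and row-cycle time in the sketch are illustrative assumptions, not parameters of the simulated DRAMs; only the 64 ms window comes from the slide.

```python
# Rough refresh overhead: every row of a bank must be refreshed within 64 ms,
# and each refresh busies the bank for roughly one row cycle.
rows_per_bank     = 4096      # assumption
t_rc_ns           = 100       # assumption: one row cycle per refresh
refresh_window_ms = 64        # from the slide

busy_ns  = rows_per_bank * t_rc_ns
fraction = busy_ns / (refresh_window_ms * 1e6)
print(f"bank busy refreshing {fraction:.2%} of the time")   # ~0.64%
```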

DRAM TUTORIAL ISCA 2002 Overhead: Memory vs. CPU Bruce Jacob David Wang Total Execution Time in CPI — SDRAM

University of Maryland
(Chart: clocks per instruction on SDRAM for Compress, Gcc, Go, Ijpeg, Li, M88ksim, Perl, and Vortex, broken into stalls due to memory access time, overlap between execution & memory, and processor execution (includes caches), shown for yesterday's, today's, and tomorrow's CPU speeds.)

Variable: speed of processor & caches

DRAM TUTORIAL ISCA 2002 Definitions (var. on Burger, et al) Bruce Jacob David Wang • tPROC — processor with perfect memory University of Maryland • tREAL — realistic configuration

NOTE • tBW — CPU with wide memory paths

• tDRAM — time seen by DRAM system

Bar-chart breakdown (total = tREAL):
  Stalls Due to BANDWIDTH = tREAL - tBW
  Stalls Due to LATENCY   = tBW - tPROC
  CPU-Memory OVERLAP      = tPROC - (tREAL - tDRAM)
  CPU+L1+L2 Execution     = tREAL - tDRAM
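Written out as a small function (a sketch; the argument names are just the four measured execution times), the breakdown is:

```python
# The four bar segments defined above, from the four measured execution times.
def breakdown(t_proc, t_real, t_bw, t_dram):
    return {
        "stalls_bandwidth": t_real - t_bw,               # tREAL - tBW
        "stalls_latency":   t_bw - t_proc,               # tBW - tPROC
        "cpu_mem_overlap":  t_proc - (t_real - t_dram),  # tPROC - (tREAL - tDRAM)
        "cpu_l1_l2_exec":   t_real - t_dram,             # tREAL - tDRAM
    }
# The segments sum to tREAL, so the stacked bar has the height of the
# realistic configuration's execution time.
```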

DRAM TUTORIAL ISCA 2002 Memory & CPU — PERL Bruce Jacob David Wang

University of Maryland
Bandwidth-Enhancing Techniques I:
(Chart: cycles per instruction for PERL across the DRAM configurations FPM, EDO, SDRAM, ESDRAM, DDR, SLDRAM, RDRAM, and DRDRAM (newer DRAMs toward the right), broken into stalls due to memory bandwidth, stalls due to memory latency, overlap between execution & memory, and processor execution, for yesterday's, today's, and tomorrow's CPU speeds.)

DRAM TUTORIAL ISCA 2002 Memory & CPU — PERL Bruce Jacob David Wang

University of Maryland
Bandwidth-Enhancing Techniques II:
(Chart: execution time in CPI for PERL with 10GHz CPUs, across FPM/interleaved, EDO/interleaved, SDRAM & DDR, SLDRAM x1/x2, and RDRAM x1/x2, broken into stalls due to memory bandwidth, stalls due to memory latency, overlap between execution & memory, and processor execution.)

DRAM TUTORIAL ISCA 2002 Average Latency of DRAMs Bruce Jacob David Wang University of Maryland
(Chart: average time per access in ns for FPM, EDO, SDRAM, ESDRAM, DDR, SLDRAM, RDRAM, and DRDRAM, broken into bus wait time, refresh time, data transfer time, data transfer time overlap, column access time, row access time, and bus transmission time.)

note: SLDRAM & RDRAM 2x data transfers


DRAM TUTORIAL ISCA 2002 DDR2 Study Results Bruce Jacob David Wang

University of Maryland
Architectural Comparison
(Chart: normalized execution time for pc100, ddr133, drd, ddr2, ddr2ems, and ddr2vc across the benchmarks cc1, compress, go, ijpeg, li, linear_walk, mpeg2dec, mpeg2enc, pegwit, perl, random_walk, stream, and stream_no_unroll.)

DRAM TUTORIAL ISCA 2002 DDR2 Study Results Bruce Jacob David Wang

University of Maryland
Perl Runtime
(Chart: execution time in seconds for pc100, ddr133, drd, ddr2, ddr2ems, and ddr2vc at processor frequencies of 1 GHz, 5 GHz, and 10 GHz.)

DRAM TUTORIAL ISCA 2002 Row-Buffer Hit Rates Bruce Jacob David Wang University of Maryland
(Charts: hit rate in row buffers for FPM DRAM, EDO DRAM, SDRAM, ESDRAM, DDR SDRAM, SLDRAM, RDRAM, and DRDRAM, shown with no L2 cache, a 1MB L2 cache, and a 4MB L2 cache; left panels cover the SPEC INT 95 benchmarks (Compress, Gcc, Go, Ijpeg, Li, M88ksim, Perl, Vortex), right panels cover the ETCH traces (Acroread, Compress, Gcc, Go, Netscape, Perl, Photoshop, Powerpoint, Winword).)

DRAM TUTORIAL ISCA 2002 Row-Buffer Hit Rates Bruce Jacob David Wang University of Maryland
(Charts: hits vs. depth in a victim-row FIFO buffer (depths 0-10) for Go, Li, Vortex, Compress, Perl, and Ijpeg; plus histograms of request inter-arrival time in CPU clocks for Compress and Vortex.)

DRAM TUTORIAL ISCA 2002 Row Buffers as L2 Cache Bruce Jacob David Wang

University of Maryland
(Chart: clocks per instruction for Compress, Gcc, Go, Ijpeg, Li, M88ksim, Perl, and Vortex, broken into stalls due to memory bandwidth, stalls due to memory latency, overlap between execution & memory, and processor execution.)

DRAM TUTORIAL ISCA 2002 Row Buffer Management Bruce Jacob David Wang University of Maryland
(Figure: the ROW ACCESS (RAS) and COLUMN ACCESS (CAS) halves of an access, each showing the data in/out buffers, column decoder, sense amps, bit lines, memory array, and row decoder.)
Each memory transaction breaks down into a two-part access: a row access and a column access. In essence, the row buffer / sense amps act as a cache: a page is brought in from the memory array and stored in the buffer, then the second step moves that data from the row buffer back to the memory controller. From a certain perspective, it makes sense to speculatively move pages from the memory arrays into the row buffers to maximize the page-hit rate of column accesses and reduce latency. The cost of a speculative row-activation command is only the ~20 bits of bandwidth it consumes on the command channel from controller to DRAM. Instead of prefetching into the DRAM, we are prefetching inside the DRAM. Row-buffer hit rates are 40-90%, depending on the application, and *could* be near 100% if the memory system received speculative row-buffer management commands. (This only makes sense if the memory controller is integrated.)
RAS is like Cache Access. Why not Speculate?
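A minimal sketch of the row buffer acting as a one-entry cache per bank under an open-page policy. The hit-rate bookkeeping is illustrative, and the speculative-activation policy itself (which the slide leaves open) is not modeled.

```python
def track_hits(accesses):
    """accesses: iterable of (bank, row). Open-page policy: a row stays in the
    sense amps until a different row in the same bank is needed."""
    open_row, hits, total = {}, 0, 0
    for bank, row in accesses:
        total += 1
        if open_row.get(bank) == row:
            hits += 1                 # column access only -- no new ACT
        else:
            open_row[bank] = row      # precharge + activate the new row
    return hits / total if total else 0.0

print(track_hits([(0, 0x266), (0, 0x266), (0, 0x1BA), (0, 0x1BA)]))   # 0.5
```

A speculative policy would insert (bank, predicted_row) activations ahead of the real accesses; its benefit is exactly the fraction of misses it converts into hits, at the cost of the extra command-channel traffic described above.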

DRAM TUTORIAL ISCA 2002 Cost-Performance Bruce Jacob David Wang

University of FPM, EDO, SDRAM, ESDRAM: Maryland • Lower Latency => Wide/Fast Bus NOTE • Increase Capacity => Decrease Latency

• Low System Cost

Rambus, Direct Rambus, SLDRAM:

• Lower Latency => Multiple Channels

• Increase Capacity => Increase Capacity

• High System Cost

However, 1 DRDRAM = Multiple SDRAM

DRAM TUTORIAL ISCA 2002 Conclusions Bruce Jacob David Wang

University of Maryland
100MHz/128-bit Bus is Current Bottleneck
• Solution: Fast Bus/es & MC on CPU (e.g. Alpha 21364, …)

Current DRAMs Solving Bandwidth Problem (but not Latency Problem)

• Solution: New cores with on-chip SRAM (e.g. ESDRAM, VCDRAM, …)

• Solution: New cores with smaller banks (e.g. MoSys “SRAM”, FCRAM, …)

Direct Rambus seems to scale best for future high-speed CPUs

DRAM TUTORIAL ISCA 2002 Outline Bruce Jacob David Wang University of Maryland • Basics • DRAM Evolution: Structural Path • Advanced Basics • DRAM Evolution: Interface Path • Future Interface Trends & Research Areas • Performance Modeling: Architectures, Systems, Embedded
now -- let's talk about DRAM performance at the SYSTEM level. previous studies show the MEMORY BUS is a significant bottleneck in today's high-performance systems:
- Schumann reports that in Alpha workstations, 30-60% of PRIMARY MEMORY LATENCY is due to SYSTEM OVERHEAD other than DRAM latency
- a Harvard study cites BUS TURNAROUND as responsible for a factor-of-two difference between PREDICTED and MEASURED performance in P6 systems

- our previous work shows today’s busses (1999’s busses) are bottlenecks for tomorrow’s DRAMs so -- look at bus, model system overhead

DRAM TUTORIAL ISCA 2002 Motivation Bruce Jacob David Wang

University of Maryland
Even when we restrict our focus to the SYSTEM LEVEL -- i.e. outside the CPU -- we find a large number of parameters.
SYSTEM-LEVEL PARAMETERS (this study only VARIES a handful, but that still yields a fairly large space; the parameters we VARY are shown in blue & green):
• Number of channels / Width of channels
• Channel latency / Channel bandwidth
• Banks per channel / Turnaround time
• Request-queue size / Request reordering
• Row-access / Column-access
• DRAM precharge / CAS-to-CAS latency
• DRAM buffering / L2 cache blocksize
• Number of MSHRs / Bus protocol
Fully | partially | not independent (this study). By "partially" independent, I mean that we looked at a small number of possibilities: turnaround is 0/1 cycle on an 800MHz bus; request ordering is INTERLEAVED or NOT.
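Enumerating just the parameters this study varies gives a sense of the size of the space. The value sets below are the ones listed on the Channels & Banks slide later in this section, plus the turnaround and queue-size options mentioned above; combining them is only a sketch of the space, not the exact set of simulated configurations.

```python
from itertools import product

channels   = [1, 2, 4]                  # independent 800 MHz channels
width_bits = [8, 16, 32, 64]            # data bits per channel
banks      = [1, 2, 4, 8]               # independent banks per channel
burst_B    = [32, 64, 128]              # bytes per burst
turnaround = [0, 1]                     # bus cycles
queue_size = [0, 1, 2, 4, 8, 16, 32, float("inf")]   # requests per channel

space = list(product(channels, width_bits, banks, burst_B, turnaround, queue_size))
print(len(space), "configurations before pruning")    # 2304
```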

DRAM TUTORIAL ISCA 2002 Motivation Bruce Jacob David Wang

University of Maryland
... the design space is highly non-linear ...
and yet, even in this restricted design space, we find EXTREMELY COMPLEX results. The SYSTEM is very SENSITIVE to CHANGES in these parameters: if you hold all else constant and vary one parameter, you can see extremely large changes in end performance ... up to a 40% difference by changing ONE PARAMETER by a FACTOR OF TWO (e.g. doubling the number of banks, doubling the size of the burst, doubling the number of channels, etc.).
(Chart: cycles per instruction for GCC with 32-, 64-, and 128-byte bursts, at 0.8, 1.6, 3.2, 6.4, 12.8, and 25.6 GB/s, over configurations from 1 chan x 1 byte up to 4 chan x 8 bytes. System Bandwidth (GB/s = Channels * Width * 800MHz).)
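The x-axis values follow directly from the bandwidth formula in the axis label; a quick sketch:

```python
# System bandwidth = channels x width x 800 MHz, for the plotted configurations.
configs = [(1, 1), (1, 2), (2, 1), (1, 4), (2, 2), (4, 1),
           (1, 8), (2, 4), (4, 2), (2, 8), (4, 4), (4, 8)]   # (channels, bytes)
for ch, width in configs:
    print(f"{ch} chan x {width} B -> {ch * width * 0.8:.1f} GB/s")
# Output runs from 0.8 GB/s (1 chan x 1 byte) up to 25.6 GB/s (4 chan x 8 bytes).
```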

DRAM TUTORIAL ISCA 2002 Motivation Bruce Jacob David Wang

University of Maryland
... and the cost of poor judgment is high.
so -- we have the worst possible scenario: a design space that is very sensitive to changes in parameters, and execution times that can vary by a FACTOR OF THREE from worst case to best. clearly, we would be well served to understand this design space.
(Chart: cycles per instruction for the worst, average, and best organization on the SPEC 2000 benchmarks bzip, gcc, mcf, parser, perl, and vpr, plus their average.)

DRAM TUTORIAL ISCA 2002 System-Level Model Bruce Jacob David Wang

University of Maryland
SDRAM Timing
so by now we're very familiar with this picture ... we cannot use it in this study, because it represents the interface between the DRAM and the MEMORY CONTROLLER. typically, the CPU's interface is much simpler: the CPU sends all of the address bits at once with CONTROL INFO (read/write), and the memory controller handles the bit addressing and the RAS/CAS timing.
(Timing diagram: clock; ACT then READ commands; row address then column address; valid data on DQ; shaded regions mark row access, column access, command transfer, overlap, and data transfer.)

DRAM TUTORIAL ISCA 2002 System-Level Model Bruce Jacob David Wang

University of Maryland
Timing diagrams are at the DRAM level … not the system level.
this gives the picture of what is happening at the SYSTEM LEVEL: the CPU-to-memory-controller traffic is shown as "ABUS Active", the memory-controller-to-DRAM activity is shown as "DRAM Bank Active", and the data read-out is shown as "DBUS Active".
(Timing diagram: the same SDRAM read as before, with the ABUS Active, DRAM Bank Active, and DBUS Active windows marked beneath it.)
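A minimal occupancy model of those three windows, as a sketch: the function and its parameters are mine, the example durations match the request shapes shown a few slides later, and the exact alignment of the windows is an assumption.

```python
# Occupancy intervals for one read at the system level. Alignment (bank starts
# after the address transfer; data appears t_dbus_setup later) is assumed.
def read_timeline(t0, t_abus, t_bank, t_dbus_setup, t_burst):
    abus = (t0, t0 + t_abus)                       # CPU -> controller address
    bank = (abus[1], abus[1] + t_bank)             # controller -> DRAM: ACT + CAS
    dbus = (abus[1] + t_dbus_setup,
            abus[1] + t_dbus_setup + t_burst)      # data back to the controller
    return {"ABUS Active": abus, "DRAM Bank Active": bank, "DBUS Active": dbus}

print(read_timeline(0, 10, 90, 70, 10))   # ns; e.g. a 32-byte burst on a 4-byte bus
```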

DRAM TUTORIAL ISCA 2002 System-Level Model Bruce Jacob David Wang

University of Maryland
Timing diagrams are at the DRAM level … not the system level.
if the DRAM's pins do not connect directly to the CPU (e.g. in a hierarchical bus organization, or if the data is funnelled through the memory controller, like a northbridge chipset), then there is yet another DBUS Active timing slot that follows below and to the right ... this can continue to extend to any number of hierarchical levels, as seen in huge server systems with hundreds of GB of DRAM.
(Timing diagram: the same read, now with DBUS1 Active followed by DBUS2 Active beneath the DRAM Bank Active window.)

DRAM TUTORIAL ISCA 2002 Request Timing Bruce Jacob David Wang

University of Maryland
so let's formalize this system-level interface. here is the request timing in a slightly different way, as well as an example system model taken from Schumann's paper describing the 21174 memory controller. the DRAM's data pins are connected directly to the CPU (the simplest possible model), the memory controller handles the RAS/CAS timing, and the CPU and memory controller talk only in terms of addresses and control information.
(Figure: backside bus to the cache; frontside address and data buses (800 MHz) from the CPU to the memory controller (MC) and DRAMs; the MC drives row/column addresses & control to the DRAMs.)
READ REQUEST TIMING: (diagram: starting at t0, the ADDRESS BUS is busy briefly, then the DRAM BANK, then the DATA BUS.)

DRAM TUTORIAL ISCA 2002 Read/Write Request Shapes Bruce Jacob David Wang University of Maryland
such a model gives us these types of request shapes for reads and writes. this shows a few example bus/burst configurations, in particular a 4-byte bus with burst sizes of 32, 64, and 128 bytes per burst.
READ REQUESTS (starting at t0):
  32-byte burst: ADDRESS BUS 10ns; DRAM BANK 90ns; DATA BUS 70ns + 10ns
  64-byte burst: ADDRESS BUS 10ns; DRAM BANK 90ns; DATA BUS 70ns + 20ns
  128-byte burst: ADDRESS BUS 10ns; DRAM BANK 100ns; DATA BUS 70ns + 40ns

WRITE REQUESTS (starting at t0):
  32-byte burst: ADDRESS BUS 10ns; DRAM BANK 90ns; DATA BUS 40ns + 10ns
  64-byte burst: ADDRESS BUS 10ns; DRAM BANK 90ns; DATA BUS 40ns + 20ns

  128-byte burst: ADDRESS BUS 10ns; DRAM BANK 90ns; DATA BUS 40ns + 40ns

DRAM TUTORIAL ISCA 2002 Pipelined/Split Transactions Bruce Jacob David Wang University of Maryland
the bus is PIPELINED and supports SPLIT TRANSACTIONS, where one request can be NESTLED inside of another if the timing is right.
(a) Legal if R/R to different banks: (diagram: two 90ns read shapes overlapped, the second request's bank activity beginning while the first is still in flight so the two data bursts follow one another on the data bus)

[explain examples]
(b) Nestling of writes inside reads is legal if R/W to different banks: (diagrams: two nestlings of a 90ns write inside a 90ns read, one legal if turnaround <= 10ns, the other legal if there is no turnaround). what we're trying to do is to fit these 2D puzzle pieces together in TIME.
(c) Back-to-back R/W pair that cannot be nestled: (diagram: a read (DRAM bank 100ns, data bus 70ns + 40ns) followed by a write (DRAM bank 90ns, data bus 40ns + 40ns) whose shapes cannot be overlapped).

DRAM TUTORIAL ISCA 2002 Channels & Banks Bruce Jacob David Wang University of Maryland
as for physical connections, here are the ways we modeled independent DRAM channels and independent BANKS per CHANNEL. the figure shows a few of the parameters that we study; in addition, we look at turnaround time (0 or 1 cycle) and queue size (0, 1, 2, 4, 8, 16, 32, or infinite requests per channel).
(Figure: one, two, and four independent channels of DRAMs behind their controllers, each with banking degrees of 1, 2, 4, ...)

1, 2, 4 800 MHz Channels; 8, 16, 32, 64 Data Bits per Channel; 1, 2, 4, 8 Banks per Channel (Indep.); 32, 64, 128 Bytes per Burst

DRAM TUTORIAL ISCA 2002 Burst Scheduling Bruce Jacob David Wang (Back-to-Back Read Requests) University of Maryland
how do you chunk up a cache block? (L2 caches use 128-byte blocks) [read the bullets]
(Figure: back-to-back read requests scheduled as 128-byte, 64-byte, and 32-byte bursts.)
LONGER BURSTS amortize the cost of activating and precharging the row over more data transferred. SHORTER BURSTS allow the critical word of a FOLLOWING REQUEST to be serviced sooner.
• Critical-burst-first
• Non-critical bursts are promoted
• Writes have lowest priority (tend to back up in the request queue …)
• Tension between large & small bursts: amortization vs. faster time to data
so -- this is not novel, but it is fairly aggressive. NOTE: we use a close-page, autoprecharge policy with ESDRAM-style buffering of the ROW in SRAM. result: we get the best possible precharge overlap, AND multiple burst requests to the same row will not re-invoke a RAS cycle unless an intervening READ request goes to a different ROW in the same BANK.

DRAM TUTORIAL ISCA 2002 New Bar-Chart Definition Bruce Jacob David Wang University of Maryland
we run a series of different simulations to get break-downs: CPU activity; memory activity overlapped with the CPU; non-overlapped activity due to the SYSTEM (queue, bus, ...); and non-overlapped activity due to the DRAM. so the top two bars are MEMORY STALL CYCLES and the bottom two are PERFECT MEMORY. Note: MEMORY LATENCY is not further divided into latency/bandwidth/etc.
Bar-chart breakdown (total = tREAL):
  Stalls Due to DRAM Latency = tREAL - tSYS
  Stalls Due to SYSTEM       = tSYS - tPROC
  CPU-Memory OVERLAP         = tPROC - (tREAL - tDRAM)
  CPU+L1+L2 Execution        = tREAL - tDRAM
• tPROC — CPU with 1-cycle L2 miss

• tREAL — realistic CPU/DRAM config

• tSYS — CPU with 1-cycle DRAM latency

• tDRAM — time seen by DRAM system DRAM TUTORIAL ISCA 2002 System Overhead Bruce Jacob David Wang

University of Maryland
Regular Bus Organization / 0-Cycle Bus Turnaround / Perfect Memory
so -- we're modeling a memory system that is fairly aggressive in terms of scheduling policies and support for concurrency, and we're trying to find which of the following is to blame for the most overhead: concurrency, latency, or the system (queueing, precharge, chunks, etc).
(Chart: cycles per instruction for 1, 2, 4, and 8 banks per channel at system bandwidths of 1.6, 3.2, and 6.4 GB/s (GB/s = Channels * Width * Speed), comparing the regular bus organization, 0-cycle bus turnaround, and perfect memory. Benchmark = BZIP (SPEC 2000), 32-byte burst, 16-bit bus.)

DRAM TUTORIAL ISCA 2002 System Overhead Bruce Jacob David Wang

University of Maryland
Regular Bus Organization / 0-Cycle Bus Turnaround
the figure shows:
- system overhead is significant (usually 20-40% of the total memory overhead); system overhead is 10–100% over perfect memory
- the most significant overhead tends to be the DRAM latency
- turnaround is relatively insignificant (however, remember that this is an 800MHz bus system ...)
(Chart: cycles per instruction for 1, 2, 4, and 8 banks per channel at 1.6, 3.2, and 6.4 GB/s system bandwidth. Benchmark = BZIP (SPEC 2000), 32-byte burst, 16-bit bus.)

DRAM TUTORIAL ISCA 2002 Concurrency Effects Bruce Jacob David Wang University of Maryland
1 Bank per Channel / 2 Banks per Channel / 4 Banks per Channel / 8 Banks per Channel
the figure also shows that increasing BANKS PER CHANNEL gives you almost as much benefit as increasing CHANNEL BANDWIDTH, which is much more costly to implement

=> clearly, there are some concurrency effects going on, and we'd like to quantify and better understand them.
(Chart: cycles per instruction vs. system bandwidth (1.6, 3.2, 6.4 GB/s = channels * width * speed) for 1, 2, 4, and 8 banks per channel. Benchmark = BZIP (SPEC 2000), 32-byte burst, 16-bit bus.)
Banks/channel as significant as channel BW

DRAM TUTORIAL ISCA 2002 Bandwidth vs. Burst Width Bruce Jacob David Wang

University of Maryland
32-Byte Burst / 64-Byte Burst / 128-Byte Burst
take a look at a larger portion of the design space. x-axis: SYSTEM BANDWIDTH, which == channels x channel width x 800MHz; y-axis: execution time of the application on the given configuration; the different colored bars represent different burst widths.
MEMORY OVERHEAD is substantial. obvious trend: more bandwidth is better. another obvious trend: more bandwidth is NOT NECESSARILY better ...
(Chart: cycles per instruction at 0.8, 1.6, 3.2, 6.4, 12.8, and 25.6 GB/s, over configurations from 1 chan x 1 byte up to 4 chan x 8 bytes. System Bandwidth (GB/s = Channels * Width * 800MHz). Benchmark = GCC (SPEC 2000), 2 banks/channel.)

DRAM TUTORIAL ISCA 2002 Bandwidth vs. Burst Width Bruce Jacob David Wang

University of Maryland
32-Byte Burst / 64-Byte Burst / 128-Byte Burst
so -- there are some obvious trade-offs related to BURST SIZE, which can affect the TOTAL EXECUTION TIME by 30% or more, keeping all else constant.
(Chart: same axes and configurations as the previous slide. Benchmark = GCC (SPEC 2000), 2 banks/channel.)

DRAM TUTORIAL ISCA 2002 Bandwidth vs. Burst Width Bruce Jacob David Wang

University of Maryland
32-Byte Burst / 64-Byte Burst / 128-Byte Burst
if we look more closely at individual system organizations, there are some clear RULES OF THUMB that appear ...
(Chart: same axes and configurations as the previous slide. Benchmark = GCC (SPEC 2000), 2 banks/channel.)

DRAM TUTORIAL ISCA 2002 Bandwidth vs. Burst Width Bruce Jacob David Wang

University of Maryland
32-Byte Burst / 64-Byte Burst / 128-Byte Burst
for one, LARGE BURSTS are optimal for WIDER CHANNELS: wide channels (32/64-bit) want large bursts.
(Chart: same axes and configurations as the previous slide. Benchmark = GCC (SPEC 2000), 2 banks/channel.)

DRAM TUTORIAL ISCA 2002 Bandwidth vs. Burst Width Bruce Jacob David Wang

University of Maryland
32-Byte Burst / 64-Byte Burst / 128-Byte Burst
for another, SMALL BURSTS are optimal for NARROW CHANNELS: narrow channels (8-bit) want small bursts.
(Chart: same axes and configurations as the previous slide. Benchmark = GCC (SPEC 2000), 2 banks/channel.)

DRAM TUTORIAL ISCA 2002 Bandwidth vs. Burst Width Bruce Jacob David Wang

University of Maryland
32-Byte Burst / 64-Byte Burst / 128-Byte Burst
and MEDIUM BURSTS are optimal for MEDIUM CHANNELS: medium channels (16-bit) want medium bursts.
so -- if CONCURRENCY were all-important, we would expect small bursts to be best, because they would allow a LOWER AVERAGE TIME-TO-CRITICAL-WORD for a larger number of simultaneous requests. what we actually see is that the optimal burst width scales with the bus width, suggesting an optimal number of DATA TRANSFERS per BANK ACTIVATION/PRECHARGE cycle. i'll illustrate that ...
(Chart: same axes and configurations as the previous slide. Benchmark = GCC (SPEC 2000), 2 banks/channel.)

DRAM TUTORIAL ISCA 2002 Burst Width Scales with Bus Bruce Jacob David Wang

University of Maryland
Range of Burst-Widths Modeled
this figure shows the entire range of burst widths that we modeled. note that some rows of the figure represent several different combinations ... for example, the one THIRD DOWN FROM TOP is a 2-byte channel + 32-byte burst, a 4-byte channel + 64-byte burst, or an 8-byte channel + 128-byte burst.
  ADDRESS BUS 10ns; DRAM BANK 90ns;  DATA BUS 70ns + 5ns   -> 64-bit channel x 32-byte burst
  ADDRESS BUS 10ns; DRAM BANK 90ns;  DATA BUS 70ns + 10ns  -> 64-bit x 64-byte, 32-bit x 32-byte
  ADDRESS BUS 10ns; DRAM BANK 90ns;  DATA BUS 70ns + 20ns  -> 64-bit x 128-byte, 32-bit x 64-byte, 16-bit x 32-byte
  ADDRESS BUS 10ns; DRAM BANK 100ns; DATA BUS 70ns + 40ns  -> 32-bit x 128-byte, 16-bit x 64-byte, 8-bit x 32-byte
  ADDRESS BUS 10ns; DRAM BANK 140ns; DATA BUS 70ns + 80ns  -> 16-bit x 128-byte, 8-bit x 64-byte
  ADDRESS BUS 10ns; DRAM BANK 220ns; DATA BUS 70ns + 160ns -> 8-bit x 128-byte

DRAM TUTORIAL ISCA 2002 Burst Width Scales with Bus Bruce Jacob David Wang

University of Maryland
Range of Burst-Widths Modeled
the optimal configurations are in the middle of the range, suggesting an optimal number of DATA TRANSFERS per BANK ACTIVATION/PRECHARGE cycle. [BOTTOM] -- too many transfers per burst crowds out other requests; [TOP] -- too few transfers per request lets the bank overhead (the activation/precharge cycle) dominate. however, though this tells us how best to organize a channel with a given bandwidth, these rules of thumb do not say anything about how the different configurations (wide/narrow/medium channels) compare to each other ... so let's focus on multiple configurations for ONE BANDWIDTH CLASS ...
(Figure: the same range of request shapes as the previous slide, with the OPTIMAL BURST WIDTHS in the middle of the range.)

DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — MCF Bruce Jacob David Wang University of Maryland
1 Bank per Channel / 2 Banks per Channel / 4 Banks per Channel / 8 Banks per Channel
this is like saying i have 4 800MHz 8-bit Rambus channels ... what should i do? gang them together? keep them independent? something in between? like before, we see that more banks is better, but not always by much.
(Chart: cycles per instruction for 32-, 64-, and 128-byte bursts over 1 chan x 4 bytes, 2 chan x 2 bytes, and 4 chan x 1 byte, all at 3.2 GB/s system bandwidth (channels x width x speed).)

DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — MCF Bruce Jacob David Wang 1 Bank per Channel 2 Banks per Channel

4

3

2 Cycles per Instruction (CPI)

1

#Banks not particularly important given large burst sizes ... 0 32-Byte Burst 64-Byte Burst 128-Byte Burst

1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 4 bytes2 chan x 2 bytes4 chan x 1 byte 3.2 GB/s System Bandwidth (channels x width x speed) DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — MCF Bruce Jacob David Wang 1 Bank per Channel 2 Banks per Channel University of 7 Maryland 4 Banks per Channel 8 Banks per Channel 6 with multiple independent channels, you have a degree of concurrency that, to some extent, OBVIATES THE NEED 5 for BANKING

4

3

2 Cycles per Instruction (CPI)

1

... even less so with multi-channel systems 0 32-Byte Burst 64-Byte Burst 128-Byte Burst

1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 4 bytes2 chan x 2 bytes4 chan x 1 byte 3.2 GB/s System Bandwidth (channels x width x speed) DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — MCF Bruce Jacob David Wang 1 Bank per Channel 2 Banks per Channel University of 7 Maryland 4 Banks per Channel 8 Banks per Channel 6 however, trying to reduce execution time via multiple channels is a bit risky: 5 it is very SENSITIVE to BURST SIZE, becaus emultiple channels givees the longest 4 latency possible to EVERY REQUEST in the system, and having LONG BURSTS exacerbates that problem -- byt 3 increasing the length of time that a request can be delayed by waiting for another ahead of it in the queue 2 Cycles per Instruction (CPI)

1

Multi-channel systems sometimes (but not always) a good idea 0 32-Byte Burst 64-Byte Burst 128-Byte Burst

1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 4 bytes2 chan x 2 bytes4 chan x 1 byte 3.2 GB/s System Bandwidth (channels x width x speed) DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — MCF Bruce Jacob David Wang 1 Bank per Channel 2 Banks per Channel University of 7 Maryland 4 Banks per Channel 8 Banks per Channel 6 another way to look at it: how does the choice of burst size affect the NUMBER or 5 WIDTH of channels?

=> WIDE CHANNELS: an 4 improvement is seen by increasing the burst size

=> NARROW CHANNELS: 3 either no improvement is seen, or a slight degradation is seen by increasing burst size 2 Cycles per Instruction (CPI)

1 4x 1-byte channels 2x 2-byte channels 0 32-Byte Burst 64-Byte Burst 128-Byte Burst 1x 4-byte channels

(x-axis: 1 chan x 4 bytes, 2 chan x 2 bytes, 4 chan x 1 byte under each burst size) 3.2 GB/s System Bandwidth (channels x width x speed)

DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — BZIP Bruce Jacob David Wang 1 Bank per Channel 2 Banks per Channel University of Maryland 4 Banks per Channel 8 Banks per Channel
we see the same trends in all the benchmarks surveyed; the only thing that changes is some of the relations between parameters ...

1 Cycles per Instruction (CPI)

0 32-Byte Burst 64-Byte Burst 128-Byte Burst

(x-axis: 1 chan x 4 bytes, 2 chan x 2 bytes, 4 chan x 1 byte under each burst size) 3.2 GB/s System Bandwidth (channels x width x speed)

DRAM TUTORIAL ISCA 2002 Focus on 3.2 GB/s — BZIP Bruce Jacob David Wang 1 Bank per Channel 2 Banks per Channel University of Maryland 4 Banks per Channel 8 Banks per Channel
for example, in BZIP, the best configurations are at smaller burst sizes than for MCF. however, though THE OPTIMAL CONFIG changes from benchmark to benchmark, there are always several configs that are within 5-10% of the optimal config -- IN ALL BENCHMARKS

BEST CONFIGS are at SMALLER BURST SIZES 0 32-Byte Burst 64-Byte Burst 128-Byte Burst

1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 42 bytes chan x 2 bytes4 chan x 1 byte1 chan x 4 bytes2 chan x 2 bytes4 chan x 1 byte 3.2 GB/s System Bandwidth (channels x width x speed) DRAM TUTORIAL ISCA 2002 Queue Size & Reordering Bruce Jacob David Wang BZIP: 1.6 GB/s (1 channel) University of Maryland 3 Infinite Queue 32-Entry Queue 1-Entry Queue No Queue 2

1 Cycles per Instruction (CPI)

0

1 bank/channel 1 bank/channel 1 bank/channel 2 banks/channel4 banks/channel8 banks/channel 2 banks/channel4 banks/channel8 banks/channel 2 banks/channel4 banks/channel8 banks/channel 32-byte burst 64-byte burst 128-byte burst DRAM TUTORIAL ISCA 2002 Conclusions Bruce Jacob David Wang

University of Maryland
DESIGN SPACE is NON-LINEAR, COST of MISJUDGING is HIGH: we have a complex design space where neighboring designs differ significantly.
CAREFUL TUNING YIELDS 30–40% GAIN: if you are careful, you can beat the performance of the average organization by 30-40%.
MORE CONCURRENCY == BETTER (but not at the expense of LATENCY): supporting memory concurrency improves system performance, as long as it is not done at the expense of memory latency:
• Via Channels → NOT w/ LARGE BURSTS: using MULTIPLE CHANNELS is good, but not the best solution
• Via Banks → ALWAYS SAFE: multiple banks/channel is always a good idea
• Via Bursts → DOESN’T PAY OFF: trying to interleave small bursts is intuitively appealing, but it doesn’t work
• Via MSHRs → NECESSARY: MSHRs are always a good idea

BURSTS AMORTIZE COST OF PRECHARGE: in general, bursts should be large enough to amortize the precharge cost. Direct Rambus = 16 bytes; DDR2 = 16/32 bytes.
• Typical Systems: 32 bytes (even DDR2) → THIS IS NOT ENOUGH

DRAM TUTORIAL ISCA 2002 Outline Bruce Jacob David Wang • University of Basics Maryland • DRAM Evolution: Structural Path NOTE • Advanced Basics

• DRAM Evolution: Interface Path

• Future Interface Trends & Research Areas

• Performance Modeling: Architectures, Systems, Embedded DRAM TUTORIAL ISCA 2002 Embedded DRAM Primer Bruce Jacob David Wang

University of Maryland

NOTE CPU DRAM Core Array

Embedded

CPU DRAM Core Array

Not Embedded DRAM TUTORIAL ISCA 2002 Whither Embedded DRAM? Bruce Jacob David Wang

University of Report, August 1996: “[Five] Maryland Architects Look to Processors of Future”

NOTE • Two predict imminent merger of CPU and DRAM

• Another states we cannot keep cramming more data over the pins at faster rates (implication: embedded DRAM)

• A fourth wants gigantic on-chip L3 cache (perhaps DRAM L3 implementation?)

SO WHAT HAPPENED? DRAM TUTORIAL ISCA 2002 Embedded DRAM for DSPs Bruce Jacob David Wang

University of Maryland
MOTIVATION
TAGLESS SRAM: SOFTWARE manages the movement of data in the space. The address space includes both "cache" and primary memory (and memory-mapped I/O). Moving data from memory space to "cache" space creates a new, equivalent data object, not a mere copy of the original.
TRADITIONAL CACHE (hardware-managed): the cache "covers" the entire address space: any datum may be cached. HARDWARE manages the movement of data. The address space includes only primary memory (and memory-mapped I/O). Copying from memory to cache creates a subordinate copy of the datum that is kept consistent with the datum still in memory; hardware ensures consistency.
Tagless SRAM: NON-TRANSPARENT addressing, EXPLICITLY MANAGED contents. Traditional cache: TRANSPARENT addressing, TRANSPARENTLY MANAGED contents.

DSP Compilers => Transparent Cache Model DRAM TUTORIAL ISCA 2002 DSP Buffer Organization Bruce Jacob David Wang Used for Study University of Maryland Fully Assoc 4-Block Cache

victim-0 victim-1 NOTE S0 S1

buffer-0 buffer-1 LdSt0 LdSt1

DSP

LdSt0 LdSt1

DSP

Bandwidth vs. Die-Area Trade-Off for DSP Performance DRAM TUTORIAL ISCA 2002 E-DRAM Performance Bruce Jacob David Wang Embedded Networking Benchmark - Patricia University of Maryland 200MHz C6000 DSP : 50, 100, 200 MHz Memory 10 Cache Line Size NOTE # 32 bytes 64 bytes 8 128 bytes 256 bytes # # 512 bytes 6 1024 bytes

#

CPI 4

# 2 # # # # # # # #

0

0.4 GB/s 0.8 GB/s 1.6 GB/s 3.2 GB/s 6.4 GB/s 12.8 GB/s 25.6 GB/s Bandwidth DRAM TUTORIAL ISCA 2002 E-DRAM Performance Bruce Jacob David Wang Embedded Networking Benchmark - Patricia University of Maryland 200MHz C6000 DSP : 50MHz Memory 10 Cache Line Size NOTE # 32 bytes 64 bytes 8 128 bytes 256 bytes # # 512 bytes 6 1024 bytes

#

CPI 4

# 2 # # # # # Increasing bus width # # # 0

0.4 GB/s 0.8 GB/s 1.6 GB/s 3.2 GB/s 6.4 GB/s 12.8 GB/s 25.6 GB/s Bandwidth DRAM TUTORIAL ISCA 2002 E-DRAM Performance Bruce Jacob David Wang Embedded Networking Benchmark - Patricia University of Maryland 200MHz C6000 DSP : 100MHz Memory 10 Cache Line Size NOTE # 32 bytes 64 bytes 8 128 bytes 256 bytes # # 512 bytes 6 1024 bytes

#

CPI 4

# 2 # # # # # Increasing bus width # # # 0

0.4 GB/s 0.8 GB/s 1.6 GB/s 3.2 GB/s 6.4 GB/s 12.8 GB/s 25.6 GB/s Bandwidth DRAM TUTORIAL ISCA 2002 E-DRAM Performance Bruce Jacob David Wang Embedded Networking Benchmark - Patricia University of Maryland 200MHz C6000 DSP: 200MHz Memory 10 Cache Line Size NOTE # 32 bytes 64 bytes 8 128 bytes 256 bytes # # 512 bytes 6 1024 bytes

#

CPI 4

# 2 # # # # # # # # Increasing bus width 0

0.4 GB/s 0.8 GB/s 1.6 GB/s 3.2 GB/s 6.4 GB/s 12.8 GB/s 25.6 GB/s Bandwidth DRAM TUTORIAL ISCA 2002 Performance-Data Sources Bruce Jacob David Wang “A Performance Study of Contemporary DRAM Architectures,” University of Proc. ISCA ’99. V. Cuppu, B. Jacob, B. Davis, and T. Mudge. Maryland “Organizational Design Trade-Offs at the DRAM, Memory Bus, and Memory Controller Level: Initial Results,” University of Maryland Technical Report UMD-SCA-TR-1999-2. V. Cuppu and B. Jacob.

“DDR2 and Low Latency Variants,” Memory Wall Workshop 2000, in conjunction w/ ISCA ’00. B. Davis, T. Mudge, V. Cuppu, and B. Jacob. “Concurrency, Latency, or System Overhead: Which Has the Largest Impact on DRAM-System Performance?” Proc. ISCA ’01. V. Cuppu and B. Jacob.

“Transparent Data-Memory Organizations for Digital Signal Processors,” Proc. CASES ’01. S. Srinivasan, V. Cuppu, and B. Jacob.

“High Performance DRAMs in Workstation Environments,” IEEE Transactions on Computers, November 2001. V. Cuppu, B. Jacob, B. Davis, and T. Mudge.

Recent experiments by Sadagopan Srinivasan, Ph.D. student at University of Maryland. DRAM TUTORIAL ISCA 2002 Acknowledgments Bruce Jacob David Wang

University of The preceding work was supported Maryland in part by the following sources:

• NSF CAREER Award CCR-9983618

• NSF grant EIA-9806645

• NSF grant EIA-0000439

• DOD award AFOSR-F496200110374

• … and by Compaq and IBM. DRAM TUTORIAL ISCA 2002 CONTACT INFO Bruce Jacob David Wang

University of Bruce Jacob Maryland Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/~blj/ [email protected]

Dave Wang

Electrical & Computer Engineering University of Maryland, College Park http://www.wam.umd.edu/~davewang/ [email protected]

UNIVERSITY OF MARYLAND