TRENDS IN SEMICONDUCTOR MEMORIES

Yasunao Katayama
IBM Research, Tokyo

Despite their great market success, DRAMs have not kept pace with microprocessor improvements, so researchers are looking to advanced high-speed DRAM and merged DRAM/logic technologies to increase memory system performance.

A computer system's performance and power consumption are often determined not only by the characteristics of its data processing components but also by how well it transfers the right data to the right place. Thus, memory system design is becoming increasingly important. There are three major reasons for this.

First, each new generation of CMOS technology provides a better speed/power trade-off, in parallel with transistor miniaturization. This lets us leverage CMOS scaling to reduce the time and energy required for data processing. Nevertheless, many of the time and energy factors involved in data transfer are not reduced. For example, a typical CMOS gate delay under the "fan-out of four" condition, which was about 1 nanosecond 10 years ago, is now about 0.1 ns. Meanwhile, the time needed to transfer data 10 cm within a typical printed circuit board is independent of CMOS scaling and remains at about 1 ns.
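This first point can be put into rough numbers. The sketch below is a toy calculation: the 1 ns gate delay and 1 ns board transfer time are the figures quoted above, while the 0.7x-per-generation delay scaling and the eight generations are illustrative assumptions, not from the article.

```python
# Toy model: gate delay shrinks with each CMOS generation, while the
# time to move data 10 cm across a printed circuit board stays ~1 ns.
board_delay_ns = 1.0   # ~10 cm of PCB trace, unaffected by scaling
gate_delay_ns = 1.0    # fan-out-of-four gate delay, ~10 years earlier

for generation in range(8):
    ratio = board_delay_ns / gate_delay_ns
    print(f"gen {generation}: FO4 = {gate_delay_ns:.2f} ns, "
          f"board hop = {ratio:.1f} gate delays")
    gate_delay_ns *= 0.7   # assumed speedup per generation

# The board hop grows from 1 to over 10 gate delays: as logic speeds
# up, off-chip data transfer increasingly dominates.
```

The exact scaling factor doesn't matter; any steady per-generation gate speedup against a fixed interconnect delay produces the same widening gap.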
Second, the use of parallelism can improve on-chip data processing performance, for example through superscalar or VLIW (very long instruction word) architectures. However, interchip data transfer cannot exploit a large degree of parallelism because the number of chip-to-chip connections (the packaging) is limited.

Third, as memory density becomes greater, the fan-out and branching ratios required for the memory circuitry become larger, resulting in at least a logarithmic increase in the time required for address decoding and appropriate data path selection.

As a result, while microprocessor performance improves exponentially according to Moore's law, memory system performance lacks a corresponding improvement. Memory performance is determined mostly by the choice of memory hierarchy (the use of cache memory and so on), the choice of memory bus architecture, and the performance of DRAMs (dynamic random-access memories), the primary products used for main memory.

Of course, many of the concepts used in present memory systems, and particularly the DRAM concept, did not exist early in the history of computing. Before the mid-1960s, computer memory systems consisted of cathode-ray storage tubes, ferrite cores, and thin magnetic films.1,2 As semiconductor technology matured, semiconductor memories began replacing these preliminary devices. At first the standard memory cell implementation was the six-transistor SRAM (static random-access memory) cell, which is now used mostly for cache and battery-backup memory.

A breakthrough occurred with the invention of the one-transistor dynamic memory cell in 1968.3,4 The idea combines a capacitor, for storing different amounts of charge to represent the distinguishable binary logic states, and a MOS transistor, for selecting a particular memory cell. A few years later, the DRAM became successful in computer main memory applications. Since then, thanks to the low bit cost and high density resulting from its simple cell design and the maturity in producing MOS VLSI (very large scale integration) chips,5 the DRAM has dominated the computer main memory market.

DRAM success factors
Why have DRAMs remained the technology of choice for computer main memory for so long? The answer lies in their excellent architecture and sophisticated operation scheme, which have given them the highest density and the lowest bit cost among random-access memories. Figure 1 depicts a simplified memory architecture and the operation scheme for a typical FPM (fast-page mode) DRAM design.
IEEE Micro, November/December 1997. 0272-1732/97/$10.00 © 1997 IEEE

Figure 1. Memory architecture (a) and read operation scheme (b) for a typical DRAM with folded bitlines and a conventional fast-page-mode interface. The peripheral circuits are simplified. Panel (a) shows the multiplexed row and column address inputs with their RAS and CAS strobes, the row address decoder and wordline multiplexer, the memory cells (storage capacitor plus selection transistor, 8λ² per cell at a 2λ pitch) on folded bitline pairs, the sense amplifiers, and the column address decoder with its data multiplexer and I/O driver. Panel (b) shows the row access, column access, write-back, and precharge phases of a read.

Cell design. The memory cell consists of only the components that are absolutely necessary: a storage capacitor and a selection transistor. Its binary state is represented by the amount of charge the capacitor holds. Initially, the storage capacitor was implemented as a planar MOS capacitor. Even though the memory cell may look trivial, storing the data in a passive MOS capacitor was a revolutionary step, considering the level of MOS technology at the time (junction leakage and so on) and the absence of a proper sensing scheme for the passive capacitor. Nowadays, thanks to the simple cell structure, three-dimensional memory cell implementations using either trench or stack structures are widely used to maintain both memory cell scaling and a reasonable cell capacitance.

Array architecture. The simple cell architecture also allows the adoption of a cross-point array architecture for the memory array, realizing a high-density solid-state RAM at lithography feature size λ. Each memory cell is defined by the intersection of two lines, one for data selection (the wordline) and one for data transfer (the bitline). In practice, the folded-bitline architecture, a modified version of the original cross-point architecture, is widely used. The folded-bitline architecture using bitline pairs takes at least 8λ² per cell, as opposed to 4λ² for the original cross-point architecture (assuming both bitlines and wordlines are placed at a 2λ pitch). In exchange, the common-mode capacitive noise during sensing is minimized by differential signal amplification using a balanced latch circuit. Since the memory cells and bitline pairs are massively parallel, a huge memory bandwidth (possibly more than 100 gigabits per second) is available within the array.

Operation scheme. The memory cell's read operation consists of row access, column access, write-back, and precharge.
• Row access starts the row address latch and decoding when the RAS (row address strobe) signal goes low. This path can be considered a multiplexing path of the RAS, which eventually activates the appropriate wordline according to the row address bits. The selected wordline connects the selected row of memory cells to the bitline pairs. The charge transferred from each memory cell to one bitline of the corresponding pair is amplified differentially and latched by the sense amplifier.
• The column access path combines column decoding with multiplexing of the data latched in the sense amplifiers. A fixed number of bits in the sense amplifiers are selected and transferred to the external bus according to the column address bits. The row and column address inputs are usually multiplexed to minimize the number of pins in the DRAM package.
• Since the read operation is destructive (that is, the cell itself cannot restore the original signal), the data in the selected row must be written back to the memory cells in parallel with the column access path. Even though the write-back doesn't affect the access time, it nevertheless imposes a serious limitation on the RAS cycle time.
• The array must be precharged for the next memory access operation. Even though the column cycle can be purely static (regarding the sense amplifiers as SRAM cells), the dynamic memory cell requires proper precharge operations in the row cycle.
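The four phases above can be sketched as a toy software model. Everything here (the class name, the array sizes, the single-bank model) is illustrative, not the article's design; the sketch only mirrors the destructive-read, write-back, and precharge sequencing just described.

```python
# Toy model of a DRAM read: row access latches an entire row into the
# sense amplifiers, column access picks bits out of the latched row,
# write-back repairs the destructive read, and precharge readies the
# array for the next access. All names and sizes are illustrative.
class ToyDram:
    def __init__(self, rows=4, cols=8):
        self.cells = [[0] * cols for _ in range(rows)]
        self.sense_amps = None   # row latched between row access and precharge
        self.open_row = None

    def row_access(self, row):
        # Wordline connects the row to the bitline pairs; the sense
        # amplifiers amplify and latch, disturbing the stored charge.
        self.sense_amps = self.cells[row][:]
        self.cells[row] = [None] * len(self.cells[row])  # destructive read
        self.open_row = row

    def column_access(self, col):
        # Multiplex one latched bit out to the external bus.
        return self.sense_amps[col]

    def write_back(self):
        # Restore the row in parallel with column access; this limits
        # the RAS cycle time, not the access time.
        self.cells[self.open_row] = self.sense_amps[:]

    def precharge(self):
        self.sense_amps = None   # bitlines equalized for the next access

dram = ToyDram()
dram.cells[2][5] = 1
dram.row_access(2)
bit = dram.column_access(5)   # -> 1
dram.write_back()             # repairs the destructive read
dram.precharge()
```

Repeated column_access calls between one row_access and the next precharge mirror fast-page mode: several column reads served from a single open row.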
In the write operation, the bitline pairs are forced into a new logic state when the selected bitline pairs are connected to the external circuitry. The rest is basically the same as the read operation. In other words, writing to a DRAM cell is quite similar to writing back to the cell with a new logic state.

The ohmic charge transfer (the current is carried by the electron's microscopic diffusive process) in both reads and writes makes the operation principle independent of the circuit's dimensions. Therefore, the DRAM has been miniaturized on the macroscopic scale without altering the basic operation scheme.

The charge transfer through the selection transistor is, however, asymmetric between the two logic states. (When the cell is charged high, the cell voltage Vcell, which acts as the source of the NMOS transistor, follows the wordline voltage minus the threshold voltage, VWL – VT.) Therefore, the selection transistor's resistance is much higher because of the smaller gate overdrive. Even though wordline boosting or a limited bitline swing (using a higher VWL than VBL) helps reduce the asymmetry, this is the most fundamental performance limitation in the DRAM random-access cycle.

In addition to the RC time constant in the memory cell, the RC time constant in the array, which results from the driver hierarchy's large fan-out and branching ratios, further compounds the performance problem. Although this issue is not unique to DRAMs,5 the situation is worse than in SRAMs because of the DRAM's higher density and the dynamic nature of the DRAM cell. Logically, the memory is arranged in one dimension and lacks internal structure. In the physical implementation, however, the enormous fan-out and branching ratios are handled by distributing them between the row and the column, using the two-dimensional nature of the cross-point architecture.
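The logarithmic decoding cost and the row/column distribution described above can be illustrated with a small script. This is a toy calculation under an assumed fan-out of 4 per driver stage; the 64-Mbit size and square organization are illustrative, not from the article.

```python
import math

# Driving N loads through a tree of fan-out-of-4 buffers takes
# ceil(log4(N)) stages, so selection time grows at least
# logarithmically with memory density.
def tree_stages_f4(n_loads):
    """Depth of a fan-out-of-4 driver tree reaching n_loads leaves."""
    return math.ceil(math.log2(n_loads) / 2)   # log4(n) = log2(n) / 2

bits = 64 * 2 ** 20          # a 64-Mbit array (illustrative size)
rows = cols = 8192           # organized as an 8192 x 8192 cross-point array

print(tree_stages_f4(bits))  # flat one-dimensional selection: 13 stages
print(tree_stages_f4(rows))  # row decode path alone: 7 stages
print(tree_stages_f4(cols))  # column decode path alone: 7 stages

# Splitting selection between row and column does not shrink the total
# driver circuitry, but each address path is only about half as deep,
# and a fast-page-mode access reusing an open row pays only the short
# column path.
```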
Still, even though from the performance aspect it is optimal to maintain a ratio of 3 to 4, the area constraint requires the use of larger ratios, particularly within the memory array. To reduce the number of array circuits, such as wordline drivers and sense amplifiers, designers often con-