
Design Space Exploration for 3D-stacked DRAMs Christian Weis†, Norbert Wehn†, Loi Igor⋆ and Luca Benini⋆ †Microelectronic Systems Design Research Group, University of Kaiserslautern 67663 Kaiserslautern, Germany ⋆DEIS, University of Bologna, Bologna, Italy {weis, wehn}@eit.uni-kl.de {igor.loi, luca.benini}@unibo.it Abstract—3D integration based on TSV (through silicon via) [2]–[8]. There are basically two competing approaches using technology enables stacking of multiple memory layers and has TSV technology for 3D-DRAMs: the advantage of higher bandwidth at lower energy consumption for the memory interface. As in mobile applications energy • Commodity DRAMs with wide-IO interfaces [3]. Process efficiency is key, 3D integration is especially here a strategic technologies for commodity DRAMs are optimized for technology. In this paper we focus on the design space exploration low leakage and minimum cost/bit. This implies opti- of 3D-stacked DRAMs with respect to performance, energy and area efficiency for densities from 256Mbit to 4Gbit per 3D- mization for density and cheap technology. DRAM tech- DRAM channel. We investigate four different technology nodes nologies strongly differ from logic technologies, which from 75nm down to 45nm and show the optimal design point are optimized for transistor performance and interconnect. for the currently most common commodity DRAM density of As a consequence, random access times in commodity 1Gbit. Multiple channels can be combined for main memory DRAMs are still in the order of 25-30ns. sizes of up to 32GB. We present a functional SystemC model • Embedded DRAMs for the 3D-stacked DRAM which is coupled with a SDR/DDR (eDRAMs) mainly used as SRAM 3D-DRAM channel controller. Parameters for this model were cache replacement [9] in logic technologies. Due to the derived from detailed circuit level simulations. The exploration performance of logic technologies and relaxed density demonstrates that an optimized 1Gbit 3D-DRAM stack is 15x requirements, random access times of less than 2ns are more energy efficient compared to a commodity Low-Power DDR reported [10]. eDRAMs are limited in density and not SDRAM part without IO drivers and pads. To the best of our knowledge this is the first design space exploration for 3D-stacked as economical, especially when targeting memory sizes DRAM considering different technologies and real world physical greater than 72Mb. commodity DRAM data. In this paper we focus on 3D-stacked solutions using I. INTRODUCTION commodity DRAMs to benefit maximally from the large Mobile applications like video processing and graphics are progress in commodity DRAM development w.r.t. density, characterized by ever increasing demands on the memory low leakage, yield, availability and especially low cost. Our bandwidth and size. As a consequence, the numbers of IOs of 3D-stacked DRAM architecture exploration is based on a the memory subsystem is continuously increasing. The energy sophisticated SystemC model which is linked to a cycle- per bit consumed for going off-chip is many times higher than accurate channel controller to perform subsystem simulations the one required for on-chip accesses, as complex and power for mobile computing systems. The architecture evaluation for hungry IO transceiver circuits are needed to deal with the an optimized 3D-stacked DRAM is driven by area, perfor- electrical characteristics of interconnections between chips in a mance and most importantly energy efficiency metrics. Multi- conventional package. Moreover, the random access latencies dimensional metrics for a fair comparison are difficult to and the internal cycle times of DRAMs are not decreasing at identify. Thermal issues because of 3D stacking are out of the same rate as the microprocessor cycle time. This problem scope of this paper. The main steps forward with respect to is known in high-performance computing as the Memory Wall similar explorations reported in the literature are: (i) accurate [1], but it is even more daunting in mobile platforms, because models for DRAM area, power and speed based on detailed power and cost constraints are much more stringent. circuit-level information on commercial DRAM chips, (ii) 3D integration and 3D-stacked memories have been pro- cycle-accurate model of a channel controller specifically tuned posed as a promising solution to the power versus bandwidth to 3D-integrated DRAMs, (iii) focus on mobile systems, both dilemma and the Memory Wall. 3D-stacked memories reduce in terms of modeling and in terms of design specification and the distance between CPU and external RAM from centimeters constraints. to micrometers and improve the random access times - but The paper is structured as follows. Section II gives a brief more importantly, they provide a major boost in energy overview of the state-of-the-art. Models of the 3D-DRAM efficiency in comparison to standard SDR or DDR/2/3 DRAM stack and the subsystem are presented in Section III. In devices. The pairing of high bandwidth communication with Section IV the 3D-DRAM organization is discussed. Section V the lower power consumption of 3D integrated memory is an presents an overview of the 3D-channel controller architecture. ideal fit for mobile devices. In the last years 3D integration Section VI finally provides the results of the design space of ICs, especially of DRAMs, received tremendous attention exploration. 978-3-9810801-7-9/DATE11/©2011 EDAA II. RELATED WORK of our knowledge, ours is the first 3D-DRAM-specific channel Recent investigations [4]–[8] have shown the performance, controller model presented in the open literature. energy and form factor advantages of 3D integration by using III. MODELING THE 3D-DRAM SUBSYSTEM wide-IO buses and TSV interconnects. Facchini et al. [4] In a 3D-DRAM subsystem the 3D-DRAM stack is always mainly focus on the optimization of the interface between connected to a 3D-DRAM channel controller [4], [7]. Figure 1 processor and memory (e.g., DRAM). The internal structure shows the 3D subsystem architecture model used for our of the memory is untouched and not optimized. In contrast to exploration. [4] we put emphasis on the DRAM itself and we see further improvements if DRAMs are redesigned to take advantage of Single 3D layer organization the high vertical interconnect density. In [8] Loh uses a so- called “true-3D” configuration based on Tezzaron’s 3D tech- 64Mb 64Mb nology [11] for performance and power evaluations. However Tezzaron’s 3D-stacked DRAM approach requires an additional chip layer in logic technology to speed up the interface and also to reduce the access time to the DRAM. The widely used Number of 3D layers CACTI [12]1 tool is often used to evaluate the performance and the energy consumption of 3D-RAM architectures [5], [8]. 3D-DRAM stack Random access times for DRAMs below 2ns were published in [9]. However such low access times are only feasible for small Derived from eDRAM macros, e.g. 2.39Mb in SOI technology are reported commodity DRAM technology in [10], and not for commodity DRAMs. Moreover CACTI TSVs Logic technology assumes an equal shrinking for all devices in the DRAM which 3D-DRAM Channel is not true for advanced commodity DRAMs in which the controller memory cells shrink faster than the periphery. Fig. 1. 3D-DRAM subsystem architecture: A single vertical channel Following the 3D-DRAM integration taxonomy proposed by [4], we focus on scenario 3/4 (CMOS IOs). Thus we A. 3D-DRAM stack use CMOS IOs for the connection to the channel controller. The development of a new model from the ground up Therefore we removed the complex IOs (SSTL) for area, was forced because models provided by the memory vendors performance and power calculations. Our analysis and ex- are not flexible enough on the internal DRAM structure to ploration are based on state-of-the-art technology data from estimate the benefit of 3D-stacked DRAM. The architecture Inotera, Qimonda and Winbond. These data sets enable us to exploration of 3D-stacked DRAM requires a sophisticated accurately predict power, performance and area for 3D-DRAM modeling approach: architectures (i), and also to optimize the internal structure, 1) Extensive circuit level simulations with SPICE were organization and technology of 3D-DRAMs. Together with a performed to calculate the basic data (e.g., wiring delays cycle-accurate 3D-DRAM channel controller model (ii), which or energy per activation). This information together creates realistic traces for the simulations of the 3D-DRAM,a with architectural parameters, geometrical and electrical complete optimization of the 3D-DRAM subsystem integrated data are input parameters to the 3D-DRAM generator on mobile computing systems (iii) is done. model which calculates timing, power and area data, see Memory controllers manage the architectural and circuit Figure 2. level interface between processors and DRAM. State-of-the-art DRAM controllers [13] are still designed for narrow off-chip interfaces. They are quite complex components and deploy many complex features to maximize the exploitable interface bandwidth. A DRAM controller can be coarsely split in two parts, front end and back-end, also called channel controller. The front end includes a multiport arbitration interface and IO queues with reordering capabilities to improve power con- sumption and access latency. Many published works describe Fig. 2. Inputs/Outputs of the 3D-DRAM generator model advanced DRAM front-ends, see [14] for a survey. Front- 2) The outputs of the generator model are input parameters end design is out of the scope of this paper: we assume a to the functional SystemC model of the 3D-DRAM stack state-of-the-art front-end which delivers memory transactions for simulations. to the channel controller, and we focus on adaptation of the In addition to the parametrizable functional SystemC model channel controller to wide 3D-DRAM interfaces.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages6 Page
-
File Size-