Exploration of Dram Scaling from Restoring Perspective

EXPLORATION OF DRAM SCALING FROM RESTORING PERSPECTIVE by Xianwei Zhang Bachelor of Engineering Northwestern Polytechnical University 2011 Submitted to the Graduate Faculty of the Kenneth P. Dietrich School of Arts and Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2016 UNIVERSITY OF PITTSBURGH COMPUTER SCIENCE DEPARTMENT This proposal was presented by Xianwei Zhang Dr. Youtao Zhang, Department of Computer Science, University of Pittsburgh Dr. Bruce R. Childers, Department of Computer Science, University of Pittsburgh Dr. Wonsun Ahn, Department of Computer Science, University of Pittsburgh Dr. Jun Yang, Electrical and Computer Engineering Department, University of Pittsburgh Dr. Guangyong Li, Electrical and Computer Engineering Department, University of Pittsburgh Dissertation Director: Dr. Youtao Zhang, Department of Computer Science, University of Pittsburgh ii Copyright c by Xianwei Zhang 2016 iii EXPLORATION OF DRAM SCALING FROM RESTORING PERSPECTIVE Xianwei Zhang, PhD University of Pittsburgh, 2016 Modern computer systems widely adopt chip multiprocessors, which integrate increasing number of cores as technology advances. To achieve scaling performance improvements, the systems place increasing demands of main memory with large capacity, high bandwidth, low access latency and efficient energy. However, DRAM (Dynamic Random Access Memory), the de facto memory technology over the past decades, faces significant scaling challenges in deep sub-micron regime. In particular, the intrinsic limitations of DRAM structures lead to slower, more leaky and less reliable cells at nano scale, making it more and more difficult to meet DRAM standards that strictly define DRAM timing parameters. While adhering to the standards suffers from chip yield degradation and per-bit-cost increase, relaxing the standards would cause significant performance degradations. This thesis aims at mitigating restore time deterioration due to DRAM scaling, an issue that has not been well studied but may become a major concern in future DRAM. I first characterize the restoring time increase problem by constructing detailed device models to quantify its severity and the impact to system performance and energy consumption. I then address the problem at different system levels, with a tradeoff among performance, energy consumption, and hardware cost. At the architecture level, faster memory access are enabled by chunk mapping, chip clustering and partial restore techniques, which mitigates performance loss due to slow cells. At the system level, performance improvements are maximized through fast chunk aware page allocation. At the application level, I further propose to tolerate slow cells by exploiting approximate computing to achieve performance and energy consumption improvements. iv TABLE OF CONTENTS 1.0 INTRODUCTION ................................1 1.1 Problem Statement...........................2 1.2 Research Overview............................3 1.2.1 Achieve Fast Rows via Reorganization (Completed)......4 1.2.2 Refresh-aware Partial Restore (Completed)...........5 1.2.3 Explore Restoring in Extended Scenarios (Future).......5 1.3 Contributions...............................7 1.4 OUTLINE................................7 2.0 BACKGROUND .................................8 2.1 DRAM Structure and Organization..................8 2.2 DRAM Operations and Timing Constraints.............. 10 2.3 DRAM Technology Scaling....................... 11 2.3.1 Scaling Issues........................... 11 2.3.2 Related Work on Restoring.................... 12 3.0 ACHIEVE FAST RESTORING VIA REORGANIZATION ...... 14 3.1 Modeling Restore Effects........................ 14 3.2 Proposed Designs............................ 15 3.2.1 Chip-specific Restoring Control................. 15 3.2.2 Chunk-specific Restoring Control................ 15 3.2.3 Chunk-specific with Remapping................. 16 3.2.4 Restore Time aware Rank Construction............. 17 3.2.5 Restore Time aware Page Allocation............... 18 v 3.3 Architectural Enhancements...................... 19 3.4 Experimental Methodology....................... 20 3.4.1 Configuration........................... 20 3.4.2 Workloads............................. 20 3.5 Results.................................. 21 3.5.1 Execution Time.......................... 22 3.5.2 Page Allocation Effects...................... 23 3.5.3 Further Studies.......................... 24 3.6 Conclusion................................ 24 4.0 SHORTEN RESTORING USING REFRESH ............... 25 4.1 Motivation of Partial Restore...................... 25 4.2 Proposed Designs............................ 26 4.2.1 RT-next: Refresh-aware Truncation............... 26 4.2.2 RT-select: Proactive Refresh Rate Upgrade........... 27 4.3 Architectural Enhancements...................... 29 4.4 Experimental Methodology....................... 31 4.4.1 Configuration........................... 31 4.4.2 Workloads............................. 31 4.5 Results.................................. 32 4.5.1 Impact on Performance...................... 32 4.5.2 Energy Consumption....................... 33 4.5.3 Comparison against the State-of-the-art............. 34 4.5.4 Further Studies.......................... 35 4.6 Conclusion................................ 35 5.0 EXPLORE RESTORING IN EXTENDED SCENARIOS ........ 36 5.1 Combine Restoring with Approximate Computing.......... 36 5.1.1 Introduction on Approximate Computing............ 36 5.1.2 Related Work........................... 37 5.1.3 General Ideas........................... 37 5.2 Study Security Issues of Restoring Variation.............. 38 vi 5.3 Explore Restoring in 3D Stacked DRAM................ 39 6.0 TIMELINE OF PROPOSED WORK .................... 40 7.0 SUMMARY .................................... 41 BIBLIOGRAPHY .................................... 43 vii 1.0 INTRODUCTION As a fundamental building block of modern computing systems, it is important for main memory to achieve short latency, high density, efficient energy and low cost, etc. Owing to the perfect balance on these aspects, dynamic random access memory (DRAM) has been the de facto standard in the past few decades, and is widely deployed in almost all systems, including servers, desktops, mobile devices and embedded systems. And, whereas a bunch alternative memories have been proposed, such as PCM [Lee et al., 2009a], STT-RAM [Kultursay et al., 2013] and 3D-XPoint [Patterson, 2015], none of them is strictly superior to DRAM, and thus DRAM is unlikely to be completely replaced. DRAM's great success is tremendously contributed by its continuous technology scaling to maintain performance growth and capacity enhancement. With the increasing stresses on memory capacity, bandwidth and performance from new high-performance applications and systems, DRAM must keep scaling down to meet the demands. Unfortunately, scaling has become difficult due to the challenges to further shrink sizes of DRAM circuits and devices [Mandelman et al., 2002; Hong, 2010; Childers et al., 2015]. As a result, memory accesses tends to take longer time to finish [Son et al., 2014; Wang, 2015], cell behaviors are expected to be more statistical [Childers et al., 2015] and stored data will be more vulnerable to disturbance and noise [Kim et al., 2014; Qureshi et al., 2015a; Kang et al., 2014]. Consequently, many cells will violate the timing standards, and DRAM chip yield and reliability will be degraded, which is undesirable for the already tight profit margin of DRAM manufacturing. To mitigate the performance loss and yield degradation in further scaling, we need to expose the scaling effect to higher levels such that more effective approaches can be achieved with better trade-offs. Candidate solutions can be proposed from multiple levels, ranging 1 from devices to architectures and even to applications. And, the efficient ones require the co- operation from different components, including DRAM chips, memory controllers, operating systems and processors, etc. 1.1 PROBLEM STATEMENT Scaling-down has been the conventionally effective method to grow DRAM chip capacity, to reduce power consumption and to lower the per-bit cost. As scaling down, DRAM cells have been shrinking to smaller dimensions, which results in size reduction of access transistor, storage capacitor and peripheral circuits. Firstly, smaller capacitor translates into a lower capacitance, reducing the stored charge; secondly, scaled DRAMs applies lower supply voltage [Mukundan et al., 2013; Zhang et al., 2016], which further decreases per-cell charge and also increases the gate induced drain leakage [Nair et al., 2013b]; meanwhile, at smaller dimensions, adjacent cells are likely to electrically disturb each other [Kim et al., 2014]; moreover, smaller cell geometry increases the resistance on both access transistor and bitline [Mukundan et al., 2013; Wang, 2015], obstructing the cell charging process. In addition to charge decrease, the input offset voltage on the sense amplifiers is also expected to increase [Zhang et al., 2016; Mukundan et al., 2013], which makes it more difficult to sense data. Moreover, scaling to smaller technology nodes grow the fabrication complexity and makes it more difficult to precisely control the dimensions of DRAM cells, which introduces more severe process variations (PV). Accordingly, DRAM cells are expected to show more statistical behaviors in a wide range; and, such uncertainty might cause increasingly more cells to violate the design specifications,

Exploration of Dram Scaling from Restoring Perspective

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support