The Pennsylvania State University
The Graduate School

ARCHITECTING BYTE-ADDRESSABLE NON-VOLATILE MEMORIES FOR MAIN MEMORY

A Dissertation in
Computer Science and Engineering
by
Matthew Poremba

© 2015 Matthew Poremba

Submitted in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

May 2015

The dissertation of Matthew Poremba was reviewed and approved∗ by the following:

Yuan Xie
Professor of Computer Science and Engineering
Dissertation Co-Advisor, Co-Chair of Committee

John Sampson
Assistant Professor of Computer Science and Engineering
Dissertation Co-Advisor, Co-Chair of Committee

Mary Jane Irwin
Professor of Computer Science and Engineering
Robert E. Noll Professor
Evan Pugh Professor

Vijaykrishnan Narayanan
Professor of Computer Science and Engineering

Kennith Jenkins
Professor of Electrical Engineering

Lee Coraor
Associate Professor of Computer Science and Engineering
Director of Academic Affairs

∗Signatures are on file in the Graduate School.

Abstract

New breakthroughs in memory technology in recent years have led to increased research efforts in so-called byte-addressable non-volatile memories (NVMs). As a result, questions of how and where these types of NVMs can be used have been raised. Simultaneously, semiconductor scaling has led to an increased number of CPU cores on a processor die as a way to utilize the area. This has increased the pressure on the memory system and caused growth in the amount of main memory that is available in a computer system. This growth has escalated the amount of system power consumed by the de facto DRAM-type memory. Moreover, DRAM memories have run into physical limitations on scalability due to the nature of their operation. NVMs, on the other hand, provide high scalability well into the future and have decreased static power, one of the major sources of power consumption in contemporary systems.
For all of these reasons, NVMs have the potential to be an attractive alternative or even complete replacement for DRAM as main memory. For these types of devices to be feasible, however, several obstacles must be overcome before there is a compelling reason for NVMs to augment or replace DRAM. Although their static power and scalability are better, NVMs suffer from lower performance, higher dynamic power, and lower endurance than DRAM. Furthermore, architectural and comprehensive circuit models for exploring how these issues can be resolved at a high level are lacking. This dissertation addresses these issues by proposing several models for NVMs at both the architectural and circuit level. The architectural model, NVMain, is built around the assumption that NVMs may not be complete replacements and thus provides flexibility to model complex memory systems including hybrid and distributed levels of memory. The circuit-level model, DESTINY, combines NVMs with more recent three-dimensional circuit design proposals to obtain performance- and energy-balanced memory designs. These two models are leveraged to explore several NVM memory designs. The first design employs a hybrid DRAM and NVM and addresses an issue of caching large amounts of NVM data in the DRAM portion. The second design considers reworking memory bank design to provide an extremely high-density NVM bank with the capability to access individual sub-units of the memory bank. The final design leverages the high parallelism from access to individual sub-units to schedule memory requests in a more efficient manner.

Table of Contents

List of Figures
List of Tables
Acknowledgments

Chapter 1 Introduction
  1.1 Background
  1.2 Related Work

Chapter 2 Simulation Framework for Non-volatile Memories
  2.1 Introduction
  2.2 Motivation
  2.3 Implementation
    2.3.1 Energy Modeling
    2.3.2 Non-volatile Memory Support
    2.3.3 Fine-grained Memory Architecture
    2.3.4 Memory System Flexibility
    2.3.5 Verification
    2.3.6 Timing Verification
    2.3.7 Energy Verification
    2.3.8 Data Verification
    2.3.9 Simulation Speed
  2.4 Case Studies
    2.4.1 MLC Simulation Accuracy
    2.4.2 Hybrid Memory System
    2.4.3 DRAM Cache
  2.5 Conclusions

Chapter 3 Bank-level Modeling of 3D-stacked NVM and Embedded DRAM
  3.1 Introduction
  3.2 Motivation
    3.2.1 Emerging Memory Technologies
    3.2.2 Modeling Tools
  3.3 Model Implementation
    3.3.1 eDRAM Model
    3.3.2 3D Model
  3.4 Validation Results
    3.4.1 3D SRAM Validation
    3.4.2 2D and 3D eDRAM Validation
    3.4.3 3D RRAM Validation
  3.5 Case Studies using DESTINY
    3.5.1 Finding the optimal memory technology
    3.5.2 Finding the optimal layer count in 3D stacking
  3.6 Conclusion

Chapter 4 Improving Effectiveness of Hybrid-Memory Systems with High-Latency Caches
  4.1 Motivation
  4.2 Implementation
    4.2.1 Managing the Fill Cache
    4.2.2 Re-routing Requests
    4.2.3 DRAM Cache Load
    4.2.4 Coalescing Fills
    4.2.5 Modifications to DRAM Cache
  4.3 Published Results
    4.3.1 Experimental Setup
    4.3.2 DRAM Cache Architectures
    4.3.3 Hardware Prefetcher
    4.3.4 Benchmark Selection
    4.3.5 Baseline Results
    4.3.6 Average Request Latency
    4.3.7 Prefetcher Effectiveness
    4.3.8 Set Indexing Effectiveness
    4.3.9 Coalesced Requests
    4.3.10 Sensitivity of Fill Cache Size
    4.3.11 Application Classification
  4.4 Conclusion

Chapter 5 Leveraging Non-volatility Properties for High Performance, Low Power Main Memory
  5.1 Introduction
  5.2 Motivation
    5.2.1 Non-Volatile Memory Design
    5.2.2 The Non-Volatility Property
  5.3 Implementation
    5.3.1 Partial-Activation
    5.3.2 Multi-Activation
    5.3.3 Backgrounded Writes
    5.3.4 Ganged Subarray Groups
  5.4 Published Results
    5.4.1 Memory Controller and Scheduling
    5.4.2 Multi-Issue Memory Controller
    5.4.3 Address Interleaving
    5.4.4 Number of Column Divisions and Subarray Groups
    5.4.5 Impact of Backgrounded Writes
    5.4.6 Energy Comparison
    5.4.7 Design Optimization
    5.4.8 Sensitivity Study
    5.4.9 Future Devices
    5.4.10 Application to STT-RAM and RRAM
    5.4.11 Comparison with Contemporary DRAM
  5.5 Design Implementation
    5.5.1 Overhead Costs
    5.5.2 Area Overhead
    5.5.3 Yield and NVM Lifetime
  5.6 Conclusion

Chapter 6 Early Activation Scheduling for Main Memories
  6.1 Motivation
    6.1.1 Baseline System Design
    6.1.2 Oracle Analysis
  6.2 Results and Analysis
    6.2.1 Missed Prediction Implications
    6.2.2 Limiting Amounts of Early-ACTs