UC Irvine Electronic Theses and Dissertations

Title Resilient On-Chip Memory Design in the Nano Era

Permalink https://escholarship.org/uc/item/1fj2c0t2

Author Banaiyanmofrad, Abbas

Publication Date 2015

License https://creativecommons.org/licenses/by/4.0/

Peer reviewed|Thesis/dissertation

UNIVERSITY OF CALIFORNIA, IRVINE

Resilient On-Chip Memory Design in the Nano Era

DISSERTATION submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in Computer Science

by

Abbas BanaiyanMofrad

Dissertation Committee: Professor Nikil Dutt, Chair Professor Alex Nicolau Professor Alex Veidenbaum

2015

© 2015 Abbas BanaiyanMofrad

DEDICATION

To my wife Marjan, who provided me the necessary strength to pursue my dreams.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGMENTS

CURRICULUM VITAE

ABSTRACT OF THE DISSERTATION

1 Introduction
  1.1 Nano Era Design Trends and Challenges
    1.1.1 Technology Trend
    1.1.2 Chip Design Trend
    1.1.3 Application Trend
  1.2 Memories and Errors
  1.3 State-of-the-art Research Efforts
  1.4 Motivation
  1.5 Thesis Contributions
  1.6 Thesis Organization

2 Flexible and Low-Cost Fault-tolerant Cache Design
  2.1 Introduction
  2.2 Related Work
    2.2.1 Circuit-level Techniques
    2.2.2 Error Coding Techniques
    2.2.3 Architecture-Level Techniques
  2.3 FFT-Cache
    2.3.1 Proposed Architecture
    2.3.2 Evaluation
  2.4 EEC: Embedded Erasure Coding
    2.4.1 Preliminaries
    2.4.2 Proposed EEC
    2.4.3 Evaluation
  2.5 Summary

3 Scalable Fault-tolerant Cache Design in CMPs
  3.1 Introduction
  3.2 Related Work and Motivation
  3.3 RESCUE
    3.3.1 Baseline Architecture
    3.3.2 RESCUE Overview
    3.3.3 RESCUE Remapping Policies
    3.3.4 Evaluation
    3.3.5 Quantitative Comparison to Alternative Techniques
    3.3.6 Experimental Results
  3.4 Summary

4 Interconnect-aware Resilient Cache Design
  4.1 Introduction
  4.2 Related Work
    4.2.1 Fault-tolerant Cache Design
    4.2.2 Fault-tolerant NoC Design
    4.2.3 Memory Reliability in Multi/Many-cores
  4.3 NoC-based Fault-tolerance of LLC
    4.3.1 Basic Idea and Baseline Architecture
    4.3.2 Proposed Architecture
    4.3.3 Evaluation
    4.3.4 Results
  4.4 CoDEC
    4.4.1 Background and Motivation
    4.4.2 CoDEC Overview
    4.4.3 Evaluation
    4.4.4 Experimental Results
  4.5 Summary

5 Relaxing Error Resiliency in On-chip Memories
  5.1 Introduction
    5.1.1 Cost of Guard-banding
    5.1.2 Approximate Computing
    5.1.3 Challenges for Effective Use of PFMs
    5.1.4 Contributions
  5.2 Related Work
    5.2.1 Exploiting Memory Variability
    5.2.2 Approximate Computing
  5.3 Relaxed Cache
    5.3.1 Hardware Support
    5.3.2 Software Support
  5.4 Experimental Evaluation
    5.4.1 Experimental Setup
    5.4.2 Benchmarks
    5.4.3 Results
  5.5 Summary

6 System-wide Memory Resiliency Design Space Exploration
  6.1 Introduction
  6.2 Background
    6.2.1 Reliable Memory Design
    6.2.2 Design Space Exploration
    6.2.3 Exemplar Many-core Architecture
    6.2.4 Motivation
  6.3 System-Level Fault-tolerance Modeling
    6.3.1 Reliability Clustering
  6.4 Evaluation
    6.4.1 Experimental Setup
    6.4.2 Design Space Exploration Studies
    6.4.3 Sample Exploration Results
  6.5 Summary

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work

Bibliography

LIST OF FIGURES


1.1 Memory Abstractions, Errors, and Opportunities.
1.2 Summary of the thesis main contributions.

2.1 FFT-Cache Access Flowchart.
2.2 Details of a row of FDM.
2.3 An example of FDM configuration and remapping for a given distribution of faults in a 4-way set associative cache.
2.4 Architecture details of (a) a conventional 4-way set associative cache, (b) the proposed FFT-Cache with FDM and 2 sub-blocks per block.
2.5 The effect of changing one design parameter of L1 FFT-Cache while fixing other parameters for different Vdd values.
2.6 The effect of changing one design parameter of L2 FFT-Cache while fixing other parameters for different Vdd values.
2.7 Processor performance degradation when applying FFT-Cache for (a) L1, (b) L2, and (c) L1 and L2 at the same time.
2.8 Power and Area overheads of FFT-Cache.
2.9 Comparison of effective cache size for different fault-tolerant techniques.
2.10 MBU size distribution and particular patterns with high occurrence probabilities in a 45 nm SRAM Array.
2.11 Basic erasure coding technique.
2.12 A sample parity-based erasure code with one redundant block.
2.13 A 4-way L1 cache architecture protected with EEC that shows the updated write operation in three steps.
2.14 Performance overhead of EEC for L1 and L2 caches on Alpha processor.

3.1 Empirical data for the cache size per core across various CMP architectures.
3.2 Baseline 8-processor CMP structure.
3.3 Architecture details of (a) a conventional 4-banked 2-way set associative cache, (b) the proposed RESCUE with DFM and 2 sub-blocks per line.
3.4 RESCUE cache access flowchart.
3.5 DFM initialization algorithm.
3.6 DFM configuration algorithm.
3.7 Effective cache size for different techniques with the (a) two-bank and (b) four-bank LLC.
3.8 Example showing the (a) DNUCA optimized scheme with remapping limited to one bank with 1 hop, (b) SNUCA optimized scheme with remapping limited to one bank with 1 hop, (c) adjacent mapping scheme, (d) global mapping scheme.
3.9 Area, leakage, and dynamic power overheads of RESCUE for an 8-core CMP with 16 LLC banks in 45nm.
3.10 Effective cache size after applying RESCUE policies.
3.11 Sensitivity studies of RESCUE policies across various workloads.
3.12 Sensitivity studies of network configuration with MEM-WL; average packet latency (a), normalized IPC (b), total power (c), and EDP (d).
3.13 Sensitivity studies of NUCA policies with MEM-WL; average packet latency (1st row), normalized IPC (2nd row), total power consumption (3rd row), EDP (4th row).
3.14 Sensitivity studies of cache size over-provisioning with MIX-WL; average packet latency (a), normalized IPC (b), total power consumption (c), and EDP (d).
3.15 Power reduction across process technology nodes over all organizations and policies with MEM-WL, MIX-WL, and CPU-WL when applying RESCUE (left) and RESCUE+COP (right).

4.1 Baseline NoC architecture.
4.2 SoCIN component details.
4.3 Three phases of a fault-tolerant LLC access.
4.4 A general view of the NoC with different fault-tolerant mapping policies.
4.5 Router architecture: (a) conventional router; (b) fault-aware router.
4.6 Choosing the intermediary node in the third node policy.
4.7 Cache bank architecture: (a) a conventional 4-way set-associative cache bank; (b) modified cache bank with three subblocks in each way.
4.8 Network latency of different policies normalized to the baseline.
4.9 IPC of different policies normalized to the baseline.
4.10 Network latency of different policies normalized to the baseline.
4.11 Energy of different policies normalized to the baseline.
4.12 Energy-delay product of different policies normalized to the baseline.
4.13 Percentage of LLC remote vs. local accesses in a 16 MB shared LLC.
4.14 The ratio of ECC logic latency to total global LLC access latency.
4.15 Motivational example of LLC and NoC error coding integration.
4.16 A multi/many-core architecture with a mesh NoC interconnect.
4.17 The architecture of an ECC-protected LLC bank in the conventional and CoDEC approaches.
4.18 Data flow of an LLC block through the network in the conventional vs. CoDEC approach.
4.19 CoDEC LLC data protection over the network from source to destination.
4.20 Impact of segment size on conventional versus CoDEC design.
4.21 Performance (IPC) improvement of a system with CoDEC LLC, normalized to the baseline.
4.22 Cache experimental results normalized to DECTED results.
4.23 Power consumption of NoC router's major blocks for various ECC sizes.
4.24 Area of NoC router's major blocks for various ECC sizes.

5.1 High-level diagram showing HW/SW components of Relaxed Cache and their interactions.
5.2 A Sample 4-way Cache with VDD=580mV and AFB=4.
5.3 Encoding and decoding defect map info in Relaxed Cache.
5.4 Abstracting Relaxed Cache knobs up to metrics familiar to a software programmer (i.e., performance, fidelity, and energy consumption). Note that (VDD = High, AFB = Mild) and (VDD = High, AFB = Aggressive) combinations are sub-optimal, hence not applicable.
5.5 A sample code showing programmer's data criticality declarations and cache configurations to be used by Relaxed Cache.
5.6 Leakage energy savings for a 4-Way L1 cache with 3 relaxed ways (energy savings are normalized to a baseline cache that uses 700mV).
5.7 Fidelity results for (a) Scale and (b) Image-Smoothing benchmarks.
5.8 Fidelity results for Edge-Detection benchmark (VDD=480mV).
5.9 FPS-PSNR trade-offs w/ and w/o Relaxed Cache scheme (AFB=4).

6.1 A general system-level design space exploration problem [101].
6.2 Exemplar Many-core Architecture with an 8x8 mesh NoC.
6.3 A sample case of a 2 × 2 NoC and four tasks mapped on its cores; (a) a sample task mapping; (b) uniform distribution of faults vs. non-uniform distribution of faults (shaded areas represent redundancy resources and cross signs represent faulty blocks); (c) cross-layer design methodology; (d) DSE and Pareto-optimal solution.
6.4 Possible redundancy configuration patterns in (a) cluster-level and (b) system-level models with four redundancy nodes per cluster.
6.5 Normalized performance improvement of different cluster-level configurations across different (a) benchmarks, (b) fault rates, (c) amounts of redundancy.
6.6 Normalized performance improvement of different system-level configurations across different (a) benchmarks, (b) fault rates, (c) amounts of redundancy.
6.7 Energy results across different amounts of redundancy for (a) intra-cluster, (b) inter-cluster (system-level) configurations.

LIST OF TABLES


2.1 Processor Configuration
2.2 Comparison of different cache protection schemes
2.3 Simulated System Parameters
2.4 Reliability and coding area comparison of different techniques

3.1 Architectural Specification
3.2 Workload Mix; Spec2006 benchmarks are denoted with 06
3.3 Comparison of different cache protection schemes

4.1 Simulation Configuration
4.2 Benchmarks Description
4.3 RDB Remapping Results
4.4 Power and Area Overhead Results
4.5 Simulation Configuration
4.6 Overhead and improvements comparison

5.1 A sample criticality table for Relaxed Cache
5.2 gem5 Common Parameter Settings

6.1 Simulation Configuration

ACKNOWLEDGMENTS

First of all, I would like to express my gratitude to my advisor, Professor Nikil Dutt, for his support and mentorship over these past years. He gave me the freedom to choose my research topic, always interacted with me to make my ideas concrete, constantly kept me motivated during the formative years of my graduate student tenure, and has consistently inspired me to do quality research. I am extremely grateful to him for all his advice, guidance, and support throughout this journey. During these years, I learned many lessons from him that have helped me to be more successful in my research, my career, and, more importantly, my life. I also owe thanks to my candidacy and dissertation committee, Professor Ahmed Eltawil, Professor Tony Givargis, Professor Alex Nicolau, and Professor Alex Veidenbaum, for their time and feedback. Their valuable comments immensely improved the quality of my dissertation.

I would also like to express my gratitude to all my past mentors and collaborators, including Prof. Houman Homayoun, Dr. Luis Bathen, Dr. Gustavo Giro, Majid Shoushtari, Mark Gottscho, Ashkan Eghbal, Pooria Yaghini, Dr. Fabian Oboril, and Mojtaba Ebrahimi, for their great feedback, guidance, and collaboration, which resulted in excellent research with many successful publications. They have all at some point made an enormous impact on my life, encouraged me to follow my dreams, and pushed me not to give up. I especially thank Prof. Houman Homayoun, who helped me start off on good research and gave me the words I needed to hear to keep moving forward.

I would like to extend special thanks to my friends and colleagues at UCI. In particular, I thank Arup Chakraborty, Dr. Hesam Kooti, Dr. Amirhossein Gholamipour, Dr. Luis (Danny) Bathen, Dr. Kazuyuki Tanimura, Codrut Stancu, Dr. Gustavo Giro, Hossein Tajik, Dr. Jun Yong Shin, Santanu Sarma, Dr. Trent Lo, Majid Shoushtari, Jurngyu Park, Bryan Donyanavard, Roger Chen-Ying Hsieh, Tiago Mück, Ashkan Eghbal, and Pooria Yaghini. They all made a great impact on my life.

I would like to thank Professor Alex Nicolau, Professor Dean Tullsen, Professor Puneet Gupta, Professor Nader Bagherzadeh, and Professor Mehdi Tahoori for their great advice, feedback, and willingness to collaborate with me.

My research at UCI has been supported in part by the National Science Foundation under NSF Variability Expedition Grant Number CCF-1029783.

Most importantly, my family deserves major gratitude. I would like to thank my parents for their unconditional love and support, and for having faith in me in every step I take. I would also like to thank my wife's parents for all their support. I would also like to thank my extended family, including my siblings and my wife's siblings, for being a great cheering team and constantly fueling my aspirations. I feel blessed to have a family that has given me every opportunity to succeed in life.

I am very grateful for the support, encouragement and patience of the love of my life, Marjan,

throughout the long years I worked toward my Ph.D. She made many sacrifices and provided me the necessary strength to pursue my dreams; I will be forever in her debt. I also owe a debt of gratitude to my son Nickan for being such a nice and lovely boy who gave me the energy to enjoy life and work hard.

Finally, I thank God for the opportunities I was given and for placing so many wonderful people in my path.

CURRICULUM VITAE

Abbas BanaiyanMofrad

EDUCATION

Doctor of Philosophy in Computer Science, University of California, Irvine (Irvine, California), 2015
Master of Science in Computer Engineering, University of Tehran (Tehran, Iran), 2006
Bachelor of Science in Computer Engineering, Isfahan University of Technology (Isfahan, Iran), 2003

RESEARCH EXPERIENCE

Graduate Research Assistant, University of California, Irvine (Irvine, California), 2010–2015
Visiting Researcher, Karlsruhe Institute of Technology (KIT) (Karlsruhe, Germany), 2014
Graduate Research Assistant, University of Tehran (Tehran, Iran), 2003–2006

SELECTED HONORS AND AWARDS

Yahoo! Best Dissertation Student Award, University of California, Irvine, 2014
Karlsruhe House of Young Scientists (KHYS) scholarship, Karlsruhe Institute of Technology (KIT), Germany, 2014
SIGDA-DAC PhD Forum Travel Award, University of California, Irvine, 2014
Design Automation Conference (DAC) Young Student Award, University of California, Irvine, 2011, 2012
ICS Dean's Fellowship (4 years of support), University of California, Irvine, 2010–2014
Dean's honored graduate, University of Tehran, Iran, 2003

TEACHING EXPERIENCE

Teaching Assistant, University of California, Irvine (Irvine, California), 2012–2013
Teaching Assistant, University of Tehran (Tehran, Iran), 2005–2006

REFEREED JOURNAL PUBLICATIONS

RESCUE: Fault-tolerant Design of NUCA Caches at Near-Threshold Voltages. ACM Transactions on Architecture and Code Optimization (TACO), 2015 (under review).
DPCS: Dynamic Power/Capacity Scaling for SRAM Caches in the Nanoscale Era. ACM Transactions on Architecture and Code Optimization (TACO), 2015 (second stage).
Exploiting Partially-Forgetful Memories for Approximate Computing. IEEE Embedded Systems Letters (IEEE ESL), 2015.
Using a Flexible Fault-Tolerant Cache to Improve Reliability for Ultra Low Voltage Operation. ACM Transactions on Embedded Computing Systems (TECS), 2015.
NoC-Based Fault-Tolerant Cache Design in Chip Multiprocessors. ACM Transactions on Embedded Computing Systems (TECS), 2014.

REFEREED CONFERENCE PUBLICATIONS

Protecting Caches Against Multiple Bit Upsets Using Embedded Erasure Coding. IEEE European Test Symposium (ETS), May 2015.
Partially-Forgetful Memories: Relaxing Memory Guard-bands for Approximate Computing. Workshop on Approximate Computing Across the System Stack (WACAS), Feb 2015.
Power/Capacity Scaling: Energy Savings with Simple Fault-Tolerant Caches. ACM/IEEE Design Automation Conference (DAC), Jun 2014 (invited).
Multi-Layer Memory Resiliency. ACM/IEEE Design Automation Conference (DAC), Jun 2014.
REMEDIATE: A Scalable Fault-tolerant Architecture for Low-Power NUCA Cache in Tiled CMPs. International Green Computing Conference (IGCC), Aug 2013.
Modeling and Analysis of Fault-tolerant Distributed Memories for Networks-on-Chip. IEEE Design, Automation, and Test in Europe (DATE), Mar 2013.
A Novel NoC-Based Design for Fault-Tolerance of Last-Level Caches in CMPs. IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Oct 2012.
FFT-Cache: A Flexible Fault-Tolerant Cache Architecture for Ultra Low Voltage Operation. International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Oct 2011.

SOFTWARE

FFT-Cache framework: a C simulation framework for fault-tolerant cache design. https://github.com/abanaiya/fft-cache

ABSTRACT OF THE DISSERTATION

Resilient On-Chip Memory Design in the Nano Era

By

Abbas BanaiyanMofrad

Doctor of Philosophy in Computer Science

University of California, Irvine, 2015

Professor Nikil Dutt, Chair

Aggressive technology scaling in the nano-scale regime makes chips more susceptible to failures, creating multiple reliability challenges in the design of modern chips, including manufacturing defects, wear-out, and parametric variations. As the number, size, and hierarchy depth of on-chip memory blocks in emerging computing systems grow, the reliability of the memory sub-system becomes an increasingly challenging design issue. The limitations of existing resilient memory design schemes motivate us to think about new approaches that treat scalability, interconnect-awareness, and cost-effectiveness as major design factors. In this thesis, we propose different approaches to address resilient on-chip memory design in computing systems ranging from traditional single-core processors to emerging many-core platforms. We classify our proposed approaches into five main categories: 1) flexible and low-cost approaches to protect cache memories in single-core processors against permanent faults and transient errors, 2) scalable fault-tolerant approaches to protect last-level caches with non-uniform cache access in chip multiprocessors, 3) interconnect-aware cache protection schemes in network-on-chip architectures, 4) relaxing memory resiliency for approximate computing applications, and 5) system-level design space exploration, analysis, and optimization for redundancy-aware on-chip memory resiliency in many-core platforms.

We first propose a flexible fault-tolerant cache (FFT-Cache) architecture for SRAM-based on-chip cache memories in single-core processors operating at near-threshold voltages. We then extend the technique proposed in FFT-Cache to protect shared last-level caches (LLCs) with Non-Uniform Cache Access (NUCA) in chip multiprocessor (CMP) architectures, proposing REMEDIATE, which leverages a flexible fault remapping technique while considering the implications of different remapping heuristics in the presence of cache banking, non-uniform latency, and an interconnection network. We then extend REMEDIATE by introducing RESCUE, whose main goal is to propose a design trend (aggressive voltage scaling plus cache over-provisioning) that uses different fault remapping heuristics with a scalable implementation for shared multi-bank LLCs in CMPs to reduce power, while exploring a large design space with multiple dimensions and performing multiple sensitivity analyses. Considering multi-bit upsets, we propose a low-cost technique that leverages embedded erasure coding (EEC) to tackle soft errors as well as hard errors in the data caches of both a high-performance and an embedded processor. Considering the non-trivial effect of the interconnection fabric on memory resiliency in network-on-chip (NoC) platforms, we then propose a novel fault-tolerant scheme that leverages the interconnection network to protect LLC banks against permanent faults: during an LLC access to a faulty area, the network detects and corrects the faults, returning fault-free data to the requesting core. In another approach, we propose CoDEC, a co-design of error coding in caches and interconnect in many-core architectures that reduces the cost of error protection compared to conventional methods. By deploying a system-wide error coding scheme, CoDEC guarantees end-to-end protection of LLC data blocks against errors throughout the on-chip network. The trade-offs among reliability, output fidelity, performance, and energy available in emerging error-resilient applications in the approximate computing era motivate us to consider application-awareness in resilient memory design. The key idea is to exploit the intrinsic tolerance of such applications to some level of error to relax memory guard-banding and reduce design overheads. As an exemplar we propose Relaxed Cache, in which we relax the definition of a faulty block depending on the number and location of faulty bits in an SRAM-based cache to save energy. In the last part of the thesis, we aim at cross-layer characterization and optimization of on-chip memory resiliency over the system stack. Our first contribution toward this goal focuses on scalability, proposing a system-level design methodology for scalable fault-tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for effective shared redundancy management toward cost-efficient fault-tolerance of on-chip memory blocks, where each cluster represents a group of cores that have access to shared redundancy resources for the protection of their memory blocks.

Chapter 1

Introduction

1.1 Nano Era Design Trends and Challenges

We categorize the general design trends and challenges of the nano era into three major groups, Technology, Chip Design, and Application, which we briefly describe below.

1.1.1 Technology Trend

Aggressive technology scaling in the nano era, coupled with environmental issues, makes CMOS circuits more susceptible to various sources of failure, from soft errors to hard errors [118]. Hence, there are major trends and challenges in circuit reliability, with many evolving techniques for dealing with them [48]. The main sources of hard faults in deep sub-micron technology nodes include manufacturing defects, process variation, and wearout-induced failures [48].

1.1.2 Chip Design Trend

As semiconductor manufacturers continue to push Chip-Multiprocessor technology in recent platforms such as the IBM Cell processor [70], Intel's 48-core IA-32 processor [72], the Intel Larrabee many-core [140], the Teraflops Research Chip [151], the Single-chip Cloud Computer (SCC) [64], Intel [78], NVIDIA Tesla [97], and the Tilera TILE64 processor [31], many-core architectures have become one of the major design options for next-generation large-scale systems. Since reliability and efficiency are challenges of large-scale system design [149], new challenges emerge in terms of effective and reliable design of the different components in such platforms [37]. Furthermore, since the memory system constitutes a majority of the silicon in a large-scale system, the memory subsystem has become vulnerable to a host of manufacturing, environmental, and operational failure/degradation mechanisms that affect the overall resiliency of the system. As the number, size, and hierarchy depth of on-chip memory blocks in emerging large interconnected many-core architectures increase, the reliability of the memory sub-system becomes an increasingly challenging issue in the design of such platforms [69].

1.1.3 Application Trend

On the other hand, most applications in emerging domains such as big data, cyber-physical systems, and social networks are memory-intensive. In addition, some classes of applications, such as embedded, multimedia, and Recognition, Mining, and Synthesis (RMS) workloads, are inherently resilient to some level of error [92]. Traditional fault-tolerant approaches, which try to protect memories against all existing faults and errors, are not efficient for such applications. Hence, techniques for effective memory protection need to be addressed in the design of emerging computing systems. (From this point forward, an emerging computing system refers to a large-scale platform, such as a many-core, that runs new classes of applications such as memory-intensive workloads.)

1.2 Memories and Errors

Figure 1.1 shows the typical hardware/software abstraction layers for computing systems. Each row of Figure 1.1 describes the system abstraction layer, the memory abstraction at that level, and typical manifestations of memory errors that can compromise system resiliency. The last column of Figure 1.1 describes mitigation schemes for memory error manifestations at that level of abstraction. Memory errors manifest themselves in different ways across the abstraction stack. For instance, an unstable memory cell at the circuit/device level can cause a bit failure at the memory logic level, which in turn might propagate up the abstraction stack as a faulty memory access at the architecture level, a wrong function call or system halt at the OS level, and finally an output error or an exception at the application layer. Figure 1.1 represents a symbolic abstraction of memory errors over the entire hardware/software system stack. Traditionally, memory resilience has been addressed via disparate techniques at each level of design abstraction, while newer efforts attempt to couple strategies across layers with the goal of improving system efficiency for energy, heat dissipation, lifetime, cost, etc. Furthermore, efforts in relaxed and approximate computing attempt to create designs that can trade off application quality for these system efficiency goals. To understand memory faults, we can classify them by their temporal behavior (persistence) as well as their cause. With respect to persistence, a memory fault can be permanent or transient. Permanent faults persist indefinitely in the system after occurrence, while transient faults manifest for a relatively short period of time after occurrence. Furthermore, the causes of memory faults can be hard or soft. Hard faults are static and caused by device failure or wear-out. In contrast, soft faults are dynamic and are typically caused by the operating environment.
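As a concrete (and purely illustrative) way to keep this classification straight, the following C sketch defines a small taxonomy for recording observed memory faults; the type and field names are our own shorthand for this discussion, not part of any tool described in this thesis.

```c
#include <stdint.h>

/* Persistence: how long the fault remains after it occurs. */
typedef enum { FAULT_PERMANENT, FAULT_TRANSIENT } fault_persistence_t;

/* Cause: hard faults are static (defects, wear-out); soft faults are
 * dynamic, induced by the operating environment (e.g., particle strikes). */
typedef enum { FAULT_HARD, FAULT_SOFT } fault_cause_t;

/* Layer of the HW/SW stack where the fault is observed (cf. Figure 1.1). */
typedef enum {
    LAYER_CIRCUIT,      /* unstable cell, Vth variation          */
    LAYER_LOGIC,        /* bit flip, stuck-at-0/1                */
    LAYER_ARCHITECTURE, /* faulty word or cache block            */
    LAYER_OS,           /* wrong pointer, erroneous system call  */
    LAYER_APPLICATION   /* incorrect output, crash               */
} fault_layer_t;

/* One observed fault: where it lives in the memory array and what it is. */
typedef struct {
    uint32_t            bank, set, way, bit;  /* physical location */
    fault_persistence_t persistence;
    fault_cause_t       cause;
    fault_layer_t       observed_at;
} mem_fault_t;
```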

In general, memories suffer from different sources of unreliability that can be classified into three main groups:


Figure 1.1: Memory Abstractions, Errors, and Opportunities.

1. Manufacturing. Worsening manufacturing imperfections in nanoscale technologies result in increasing variability of device and circuit-level parameters. This process variation particularly affects transistor threshold voltages through random dopant fluctuation (RDF), increasing the likelihood of memory cells failing permanently due to insufficient noise margins at a given supply voltage.

2. Environmental. Alpha particle radiation coming from the operating environment can cause single event upsets (SEU). Combined with weakened noise margins from manufacturing effects, memory cells are also becoming more susceptible to SEU, impacting their soft error resilience [2]. Noise stemming from variations in the supply voltage and thermal effects can also cause memory faults exhibiting dynamic and random behavior.

3. Aging and Wearout. Depending on the type of technology used, memory cells can age, reducing their performance, data retention capability, and power consumption. Aging can lead to memory wearout, resulting in permanent faults.

Different memory technologies suffer from various sources of unreliability. Volatile memories such as SRAM and DRAM mostly suffer from manufacturing defects and environmental issues that lead to hard and soft errors, respectively. Endurance is not an issue in SRAM and DRAM. In contrast, different nonvolatile memories (NVMs) have their own sources of unreliability. For flash and phase change memory (PCM), wearout is the primary source of unreliability due to limited write endurance. PCMs also suffer from hard and soft errors [162]. Other emerging NVMs such as MRAM and its newer cousin STT-RAM also suffer from hard and soft errors. However, for these devices, wearout is not as great of a reliability threat, because they have large write endurances similar to that of SRAM.

The design of reliable computer systems has a rich history spanning several decades: variants of spatial, temporal, and information redundancy have been exploited to improve reliability. Memory systems also deploy these forms of redundancy to achieve resilience across various layers of system abstraction. Additionally, memory designers have leveraged a variety of other memory-specific techniques.

In the next section, we provide a sampling of common techniques used for reliable memory design at the architectural level. A significant body of research exists on the design of a reliable memory hierarchy comprising multiple levels of caches and main memory.

1.3 State-of-the-art Research Efforts

A significant amount of literature targets memory reliability concerns arising from different aspects of vulnerability, including transient errors, manufacturing defects, process variation, wearout, and stability in near/sub-threshold operation [54]. We briefly review the prior related work in three major categories: reliable and fault-tolerant cache design, memory reliability in multi/many-core platforms, and techniques for approximate computing on memories (application-aware memory resiliency).

Reliable and Fault-tolerant Cache/Memory Design. A significant body of research exists on the design of a reliable memory hierarchy comprising multiple levels of caches and main memory [54]. We can categorize resilient SRAM cache design efforts into three main groups: cell/circuit-level techniques, error coding techniques, and architecture-level techniques. Many of these have the common property of fault-tolerant voltage-scalable (FTVS) design, because low voltage operation, while critical for achieving power and energy savings, is the primary driver behind unreliable memories. In general, regardless of whether the fault-tolerant design is done at the cell, circuit, coding, or architecture level, there is a trade-off in terms of memory capacity and area. This may be due to larger memory cells, spare or redundant cells, error correction logic, or a reduced amount of reliable memory available for use by the application.

Fault-tolerant memory designs have often used simple techniques such as adding redundant rows/columns to the memory array [139] or applying memory down-sizing techniques that disable a faulty row or cache line (block) [122]. Many recent architecture-level schemes deploy more complicated redundancy or capacity downsizing techniques to improve the reliability of cache memories. Information redundancy via error coding is also commonly used to improve the reliability of memory components. A wide range of error detection and correction codes (EDC and ECC, respectively) have been used [96]. Typically, EDCs are simple parity codes, while the most common ECCs use Hamming [67] or Hsiao [73] codes. ECC is a proven and effective mechanism for handling soft errors. For NVMs that have limited write endurance, various wear-leveling approaches have been proposed to mitigate aging and extend memory lifetime. In the domain of embedded systems, the design of reliable software-controlled scratchpad memories (SPMs) has also received great attention recently, including efforts that address the reliability of SPMs for chip-multiprocessors (E-RoC [25] and SPMVisor [27]) or for hybrid memories (FTSPM [110]).

Memory Reliability in Multi/Many-cores. There is a large body of previous work on fault-tolerant design of on-chip memories and caches, mostly focused on either single-bank or multi-bank caches with uniform access latency in single-core processors [54]. However, these schemes face severe limitations when applied to emerging multi-core/many-core architectures, where the number of access points (cores) and memory banks is increasing, large on-chip memories are shared and distributed among all cores via an interconnection network, memory access latencies are not uniform, the interconnect backbone is itself error-prone and affects memory resiliency, and the cost of uniform fault-tolerance is very high [19]. Hence, most current fault-tolerant schemes are not modular, scalable, and cost-efficient enough to be applied to these architectures. Only a few efforts have addressed on-chip memory reliability in CMP or NoC platforms [7, 154, 25, 5, 16].

Memory Reliability Relaxation. A significant amount of work has proposed hardware and software techniques that trade off reliability or application fidelity for benefits in energy, performance, lifetime, or yield, targeting applications that can tolerate some level of error in exchange for a loss of quality (e.g., RMS applications). Many studies on approximate computing have shown that a variety of applications have a high tolerance to errors [51, 45, 92, 109, 95]. While most previous efforts on approximate computing have focused on the computational part, only a few works have addressed relaxing reliability in memories, and these mainly focus on off-chip memories [100, 135].

1.4 Motivation

In this section, we first define some general terminology, as well as terminology introduced in this work, that we leverage throughout the thesis.

• Resiliency: The power or ability to recover from or adjust easily to a bad/unlikely situation.

• Reliability: The ability to be relied on or depended on, as for accuracy, honesty, or achievement.

• System Reliability: The ability of a system or component to perform its required functions under stated conditions for a specified period of time.

• System Resiliency: The ability of a system to continue to function correctly, sometimes in a degraded fashion, in the presence of faults, errors, or other variation.

Resilience is a measure of a system’s ability to bounce back from a failure to continue to offer some level of performance (possibly not the original level of performance).

Design for Resiliency is a way of achieving reliability by trading off other system metrics such as performance. The reliability issues and limitations of the common schemes mentioned above motivate us to think about new approaches that address cost-efficient resilient design of on-chip memory in emerging multi/many-core platforms. We consider all the limiting factors of traditional and current approaches, as well as emerging technology, architecture, and application requirements and challenges, in the design of our new approaches. Some of the key resiliency design features that are either missing or addressed inefficiently in previous approaches, and that we consider in the design of our novel approaches, include: flexibility, scalability, interconnect-awareness, multi-layer design, cost-efficiency, non-uniformity, redundancy-awareness, and application-awareness.

By reviewing previous approaches as well as current efforts and design trends, we make multiple important observations, which we summarize as follows:

• Inefficiency of previous fault-tolerant cache/memory design approaches for emerging applications and platforms

• Cost-efficient resilient memory design becomes a major challenge in the design of emerging platforms

• For many application domains, it is acceptable to trade some design metrics (performance, yield, fidelity, etc.) to increase the efficiency.

• New approaches are required to provide memory resiliency in emerging computing systems

The key missing resiliency features, as well as the observations mentioned above, motivate us to approach memory resiliency in a different way. Our key idea in this thesis is to propose novel approaches with the goal of Design for Memory Resiliency (or Resilient Memory Design), rather than traditional design for reliability or fault-tolerant memory design. That is, we try to address most of the key required resiliency features in our new approaches to make them more efficient and to reduce costs by considering different design parameters, such as possible design trade-offs, multi-layer design possibilities, platform features (resources and constraints), and application features (requirements and constraints).


Figure 1.2: Summary of the thesis main contributions.

1.5 Thesis Contributions

In this thesis, we make multiple contributions and propose novel approaches at different layers of the system hierarchy to address resilient memory design issues in computing systems ranging from traditional single-core processors to emerging many-core platforms, as indicated in Figure 1.2. We classify our proposed approaches into five main categories: 1) flexible low-cost cache design (Chapter II), 2) scalable resilient cache design (Chapter III), 3) interconnect-aware resilient cache design (Chapter IV), 4) application-aware memory resiliency (Chapter V), and 5) system-level memory resiliency design space exploration and optimization (Chapter VI).

In this thesis, we make the following contributions:

• Proposing two flexible and low-cost approaches to protect cache memories in single-core processors against permanent and transient faults, respectively.

– FFT-Cache, a flexible cache architecture presented to protect regular SRAM structures against a high degree of process variation, wearout-induced failures, and manufacturing defects. Furthermore, we propose a methodology to model the collision pattern in the cache as a graph problem. Given this model, a graph coloring scheme is employed to minimize the amount of additional redundancy.

– EEC, an embedded erasure coding approach to protect cache memories against multi-bit upsets. The EEC scheme employs fast bit-interleaved parity coding for error detection in combination with optimal parity-based erasure coding for error correction over set-associative cache memories. Reserving a part of the cache (e.g., one way) as erasure storage, EEC trades cache capacity for fast and low-cost error correction.

• Proposing two scalable fault-tolerant approaches to protect last-level cache (LLC) with non-uniform cache access (NUCA) in chip multiprocessors (CMPs).

– REMEDIATE, to protect shared NUCA caches in tile-based multicore platforms. REMEDIATE achieves fault-tolerance through redundancy drawn from multiple banks to maximize the amount of fault remapping and minimize the capacity lost in the cache when the failure rate is high.

– RESCUE extends REMEDIATE to enhance fault-tolerant voltage-scalable design of shared NUCA caches in multi/many-core platforms. While the main goal of REMEDIATE is to develop a remapping-based fault-tolerant scheme for protecting the shared LLC in tile-based CMPs operating at low voltages, the main goal of RESCUE is to propose a design trend (aggressive voltage scaling plus cache over-provisioning) that uses different fault remapping heuristics with a scalable implementation for shared multi-bank LLCs in non-tile-based CMPs to reduce power, while exploring a large design space with multiple dimensions and performing multiple sensitivity analyses.

• Proposing two interconnect-aware LLC protection schemes in many-core platforms.

– FT-LLC, NoC-based fault-tolerance of LLC in mesh-based CMPs. This approach leverages the NoC fabric to implement a remapping-based fault-tolerant scheme protecting the LLC cache banks against permanent faults in a scalable and efficient way with minimal overhead. In this approach, during an LLC access to a faulty block, the components inside the network (routers) are responsible for detecting and correcting the faults and then returning the fault-free data to the corresponding core.

– CoDEC, a co-design of error coding in cache and interconnect in many-cores. CoDEC deploys a unified error coding scheme that guarantees the end-to-end protection of LLC data blocks throughout the on-chip network against both hard and soft errors. CoDEC integrates strong multi-bit ECC in cache banks with switch-to-switch (s2s) error coding in the interconnect while removing their overlapping redundancy.

• Introducing Partially-Forgetful Memory (PFM) as an application-aware memory resiliency relaxation approach.

– Relaxed-Cache, an exemplar that addresses memory variability and approximate computing challenges by employing partially-forgetful SRAM caches. Relaxed Cache lets the application programmer adjust the cache guard-banding knobs and thereby adapt its reliability and capacity to the application's demands. Accordingly, the application programmer needs to identify critical memory objects as well as the major criticality and computational phases of the application.

• Proposing approaches to system-level design space exploration, analysis, and optimization for on-chip memory resiliency.

– System-level Memory Resiliency: We propose a system-level design methodology for scalable fault-tolerance of distributed on-chip memory blocks in many-core platforms.

– Reliability Clustering: a novel approach to model the distribution of shared redundancy resources and to manage them for distributed fault-tolerance of on-chip memory components in many-cores. Each cluster represents a group of cores that have access to shared redundancy resources for the protection of their own memory blocks, which in our approach are LLC banks (a rough sketch of this model follows below).
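As a rough, hypothetical sketch of this model (field names and sizes are ours, not the Chapter VI implementation), a reliability cluster can be described as the set of cores it covers plus the pool of shared redundancy resources they can all reach:

```c
#include <stdint.h>

#define MAX_CLUSTER_CORES 8   /* cores per cluster (assumed)                     */
#define MAX_SPARES        4   /* shared redundancy entries per cluster (assumed) */

/* One shared redundancy resource, e.g., spare storage reachable by every
 * member of the cluster over the NoC. */
typedef struct {
    uint16_t node_id;     /* NoC node hosting the spare storage */
    uint32_t spare_lines; /* capacity available for remapping   */
} spare_resource_t;

/* A reliability cluster: a group of cores whose LLC banks are protected
 * by the same pool of shared redundancy resources. */
typedef struct {
    uint16_t         core_ids[MAX_CLUSTER_CORES];
    uint8_t          num_cores;
    spare_resource_t spares[MAX_SPARES];
    uint8_t          num_spares;
} reliability_cluster_t;
```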

1.6 Thesis Organization

The rest of this thesis is organized as follows. First, Chapter II presents FFT-Cache and EEC as two flexible and low-cost approaches that trade performance for energy efficiency and reliability, respectively. FFT-Cache is a high-defect-tolerant on-chip cache architecture that combines redundant data array elements with a permutation network to provide a higher degree of freedom in replacement. EEC leverages embedded erasure coding to tackle multi-bit upsets in caches. In Chapter III, we introduce RESCUE, a highly flexible, fault-tolerant, and energy-efficient cache design that, by reconfiguring its internal organization, can efficiently tolerate the large number of NUCA LLC failures that arise when operating in the near-threshold region in CMP architectures. In Chapter IV, we consider the effect of the interconnect on the design of fault-tolerant LLCs in NoC-based architectures through two different approaches. Next, in order to perform application-aware resilient memory design, in Chapter V we introduce the Partially-Forgetful Memory approach and propose a relaxed-reliability cache design, Relaxed-Cache, that relaxes memory guard-banding to save energy in SRAM-based cache memories. Then, in order to perform system-level design space exploration, in Chapter VI we propose a novel system-level design methodology, introducing the reliability clustering approach to explore the effective management of shared redundancy resources for on-chip memory protection in many-core platforms. Finally, Chapter VII presents the summary of the thesis and our directions for future work.

Chapter 2

Flexible and Low-Cost Fault-tolerant Cache Design

2.1 Introduction

On-chip caches are known to consume a large portion (about 30%-70%) of total processor power [158, 164], and their size will continue to grow due to device scaling coupled with performance requirements. Therefore, it becomes critical to manage the power and reliability of the caches in order to reduce total power consumption while maintaining the reliability of the entire processor system. Traditionally, voltage scaling has been used to reduce the dynamic and leakage power consumption of the cache. However, aggressive voltage scaling causes process-variation-induced failures in the SRAM cells. An SRAM cell can fail due to an access time failure, a destructive read failure, or a write failure [2, 133]. Figure 1 represents the failure rate of an SRAM cell as a function of the operational voltage in a 90nm technology [113]. To save power while achieving an acceptable manufacturing yield of 99.9% for 64KB L1 and 2MB L2 caches, a minimal operational voltage must be selected. From Figure 1 we can see that the probability of failure for each cache array must be kept at less than 1 out of 1000 to achieve this yield. Based on this assumption, we estimate the minimum operational voltage to be 620 mV for a 64KB L1 and 660 mV for a 2MB L2, and we are not able to further reduce the operational voltage without incurring cell failures.
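To see where the 1-in-1000 bound leads, assume, purely for illustration, that cell failures are independent and that an array is unusable if any of its cells fails. The 99.9% yield target then translates into a per-cell failure probability as follows:

\[
P_{\text{array}} \;=\; 1-\bigl(1-p_{\text{cell}}\bigr)^{N} \;\le\; 1-0.999 \;=\; 10^{-3}
\quad\Longrightarrow\quad
p_{\text{cell}} \;\le\; 1-0.999^{\,1/N} \;\approx\; \frac{10^{-3}}{N}.
\]

For a 64KB L1 data array (N ≈ 64 × 1024 × 8 ≈ 5.2 × 10^5 bits), this requires a per-cell failure probability on the order of 2 × 10^-9; under this simple model, the quoted minimum voltages (620 mV for L1, 660 mV for L2) correspond to the points where the measured cell failure probability drops below roughly this bound.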

Since applications may not be tolerant to even a single bit error, caches typically must be operated at a high Vdd to ensure a very low probability of failure, leading to high energy consumption. However, by exploiting mechanisms that allow a cache to become inherently resilient to a large number of cell failures, we can operate the cache at a lower Vdd and thus gain significant energy savings. There is a large body of previous work on fault-tolerant cache design at different layers of abstraction, which we review in the next section. However, most of these techniques are not efficient for the high fault rates arising from operation in the near-threshold regime, and they incur significant overheads. To address these issues, in this chapter we first propose an approach that aims to: 1) design a low-power and flexible fault-tolerant cache architecture that can detect SRAM faults arising from operation in the near-threshold voltage region and tolerate them through replication; 2) minimize non-functional or disabled cache area to lessen the impact on processor performance; and 3) tolerate cache faults as much as possible when the fault rate is high or the voltage is very low.

On the other hand, the exponentially growing number of devices per chip, along with a gradual reduction in operating voltage, has led to an increase in the effective Soft Error Rate (SER) of circuits over the past few years [76, 53]. Additionally, it has been shown that SRAM arrays are the major contributor to the overall system SER (more than 95% [56]); hence, applying an effective technique to mitigate errors in these components is of decisive importance.

In previous technology nodes, a soft error caused by a single particle strike in an SRAM array was very likely to affect only a single cell. This kind of error is called a Single Event Upset (SEU) and can be corrected by a low-cost single-error-correction technique [53, 29].

With smaller device geometries in nanoscale technology nodes, a single high-energy particle strike might affect several adjacent cells, resulting in Multiple Bit Upsets (MBUs). Recent experiments show that not only does the ratio of MBUs to SEUs increase, but the average number of cells affected by an MBU incident also grows [76, 103]. Therefore, an effective MBU correction technique has to be used in memory arrays to ensure their reliability with minimum cost.

A variety of error correction codes (ECCs) have been proposed to protect SRAM cache memories against soft errors [43, 82, 117, 155, 132, 106, 84]. However, the majority of existing MBU correction techniques are only able to correct a predefined number of errors per word, and their area, latency, and power overheads grow exponentially with the MBU size. Moreover, due to the high complexity and overhead of ECC encoding/decoding, it is hard to implement such techniques in current processors at affordable cost [82, 132]. Hence, to protect caches against MBUs, it is essential to reduce the complexity and overheads of the error protection techniques.

In this chapter, we first present the Flexible Fault-Tolerant Cache (FFT-Cache) [22, 23], a cache architecture that uses a flexible defect map to efficiently tolerate the large number of faults that occur when operating in the near-threshold region. FFT-Cache designates a portion of faulty cache blocks as redundancy and remaps other faulty cache word-lines and blocks to them. This is accomplished by using either block-level or line-level (set-level) replication in the same set or between two or more sets. In the remainder of this chapter, we refer to every physical cache word-line, which may contain multiple cache blocks, as a line or set. FFT-Cache leverages its flexible defect map to configure the remapping of faulty data blocks either inside the same set or among different sets. FFT-Cache configuration is done in two phases: first we categorize the cache lines based on the degree of conflict between their blocks, and then we use a graph coloring algorithm to optimize the faulty block remapping and reduce the granularity of redundancy replacement. Using this approach, FFT-Cache initially attempts to replicate faulty blocks inside each line; otherwise, it attempts to replicate faulty blocks among different lines. Our FFT-Cache approach significantly reduces power consumption, while incurring only a small (5%) performance degradation and a small (13%) area overhead when operating at near-threshold voltages.
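The role of the graph coloring step can be pictured with a small sketch. Treat each faulty line as a vertex and connect two vertices when their faulty sub-blocks collide (occupy the same sub-block position), so that they cannot share one sacrificial line; each color class then corresponds to a group of mutually compatible lines that can be served by a single sacrificial line. The C code below is only a minimal first-fit illustration under these assumptions; the array sizes, names, and greedy heuristic are ours and not the exact FFT-Cache algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 1024   /* faulty lines under consideration (assumed) */
#define SUBBLOCKS 8      /* sub-blocks per line (assumed)              */

/* fault_map[i] has bit b set if sub-block b of faulty line i is defective. */
static uint8_t fault_map[NUM_LINES];

/* Two faulty lines conflict if any defective sub-block position overlaps:
 * then neither can supply a clean copy of the other's faulty sub-blocks. */
static bool conflict(int i, int j) {
    return (fault_map[i] & fault_map[j]) != 0;
}

/* First-fit greedy coloring: lines with the same color are pairwise
 * non-conflicting and can share one sacrificial (redundant) line, so
 * fewer colors means less lost capacity. Returns the color count. */
static int color_faulty_lines(int color[NUM_LINES]) {
    int max_color = 0;
    for (int i = 0; i < NUM_LINES; i++) {
        bool used[NUM_LINES] = { false };
        for (int j = 0; j < i; j++)      /* colors taken by conflicting neighbors */
            if (conflict(i, j))
                used[color[j]] = true;
        int c = 0;
        while (used[c]) c++;             /* smallest free color */
        color[i] = c;
        if (c + 1 > max_color) max_color = c + 1;
    }
    return max_color;
}
```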

In the second part of this chapter, we propose an error protection approach for cache memories using an erasure coding technique [18]. In this approach, considering the fact that error detection can be done at a much lower cost than error correction [112], we employ per-word interleaved parity codes to detect errors in each cache word or block. The interleaving distance is set based on the distribution of actual MBU patterns and their respective probabilities, obtained from a detailed technology-dependent analysis. For error correction, we adapt the notion of traditional erasure coding to define an Embedded Erasure Coding (EEC) approach that performs coding across multiple cache blocks in the same row, reserving one cache way to store erasure check bits for the remaining ways. Moreover, it is implemented in a configurable fashion so that, depending on the desired performance/reliability trade-off, the last way in different parts (banks) of the cache can be used either as erasure storage or as normal data storage. We evaluate the proposed scheme for both the L1 and L2 caches of a high-performance processor and for the L1 cache of an embedded processor. Our simulation results for the L1 and L2 caches of the high-performance processor show that EEC detects and corrects 100% of injected MBUs with less than 3% performance and 4% energy overheads. We also compare the reliability and area overhead of our scheme against common 1-dimensional and 2-dimensional error coding approaches. Our simulation results show that EEC provides higher reliability, faster error recovery, and lower area and energy overheads compared to state-of-the-art techniques.
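The correction side of this idea can be illustrated with plain XOR parity across the ways of a set: if the reserved way holds the bitwise XOR of the other ways' blocks, a block whose position is known to be bad (an erasure, flagged by the interleaved parity detection) can be rebuilt by XOR-ing the erasure way with the remaining clean ways. The following is a simplified software sketch under those assumptions; the block size, way count, and function names are illustrative and not taken from the EEC hardware.

```c
#include <stdint.h>
#include <stddef.h>

#define WAYS        4    /* ways per set; last way reserved for erasure bits (assumed) */
#define BLOCK_BYTES 64   /* cache block size in bytes (assumed) */

typedef struct { uint8_t data[BLOCK_BYTES]; } block_t;

/* Recompute the erasure (parity) block on a write to the set:
 * erasure = XOR of the data blocks in ways 0 .. WAYS-2. */
static void update_erasure(const block_t way[WAYS], block_t *erasure) {
    for (size_t b = 0; b < BLOCK_BYTES; b++) {
        uint8_t p = 0;
        for (int w = 0; w < WAYS - 1; w++)
            p ^= way[w].data[b];
        erasure->data[b] = p;
    }
}

/* Reconstruct a block whose index is known to be faulty (an erasure):
 * XOR of the erasure way and all other clean data ways yields the lost block. */
static void recover_block(const block_t way[WAYS], int faulty_way, block_t *out) {
    for (size_t b = 0; b < BLOCK_BYTES; b++) {
        uint8_t p = way[WAYS - 1].data[b];      /* erasure (parity) way */
        for (int w = 0; w < WAYS - 1; w++)
            if (w != faulty_way)
                p ^= way[w].data[b];
        out->data[b] = p;
    }
}
```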

In summary, three main advantages of our proposed technique over existing ones are:

• Cost-efficiency and ease of implementation in current processors.

• Configurability to trade performance versus reliability by using more or fewer cache ways as error correction storage.

• Much lower error recovery time compared to similar techniques.

These features make it scalable to larger caches in multicore processors and also applicable to protection against permanent faults. For example, in adaptive low-power systems where the voltage is aggressively scaled to trade performance against energy efficiency, our proposed technique is extremely useful.

2.2 Related Work

Several fault-tolerant techniques have been proposed at multiple levels of abstraction to improve cache yield and/or lower the minimum achievable voltage scaling bound. In general, we can categorize resilient SRAM cache design efforts into three main groups, as we explore below.

2.2.1 Circuit-level Techniques

The root of most SRAM reliability problems is the cell noise margin. At low supply voltages, noise margins are reduced, increasing susceptibility to data corruption caused by the environmental factors described earlier. Furthermore, variability in cell noise margins requires a statistical approach to designing a reliable memory array and choosing the minimum supply voltage, which must be increased to maintain yield under large variations [2]. Engineers have designed larger memory cells using more transistors and/or larger transistors to increase mean noise margins and/or reduce margin variability, but these come at the cost of reduced area efficiency and sometimes power. Besides the conventional 6T SRAM cell, several other designs, including the 8T SRAM cell [42, 44], 10T SRAM cell [38], and 11T SRAM cell [111], have been proposed. All of these SRAM cells improve read stability, though the stability of the inverter pair remains unchanged. Most of these cells incur a large area overhead that poses a significant limitation for the performance and power consumption of caches. For instance, [90] proposed a Schmitt-trigger-based 10T SRAM cell with inherent tolerance to process variation using a feedback-based mechanism. However, this SRAM cell requires a 100% increase in area and about a 42% increase in access time for low voltage operation.

2.2.2 Error Coding Techniques

A wide range of ECC techniques have been proposed to protect memory arrays against soft errors [112]. Single-error correction, double-error detection (SECDED) is a proven and effective mechanism for handling SEUs [43]. However, in high-failure-rate situations, simple coding schemes like SECDED [43] are not practical because of the strict bound on the number of tolerable faults in each protected data chunk [82]. As technology advanced, a variety of more complex coding schemes with higher levels of protection have been proposed to protect memories against multi-bit errors [117, 46, 161, 155, 141]. Double error correction, triple error detection (DECTED), two-dimensional ECC (2D-ECC) [82], multiple-bit segmented ECC (MS-ECC) [46], Hi-ECC [155], variable-strength ECC (VS-ECC) [4], and Memory Mapped ECC [161] are some of the more notable schemes. Besides common codes such as Hamming [67] and Hsiao [73], other strong codes such as BCH [155], OLSC [46], and Reed-Solomon [96] have also been used to obtain strong error detection. However, ECC techniques generally come at high cost due to significant memory storage and encoding/decoding logic overheads [82, 132]. Despite this, ECC remains a popular method for memory resilience due to its effectiveness against soft errors and the lack of involvement required from other layers of abstraction. In addition, another approach to reduce memory vulnerability to soft errors is to use a bit-interleaved memory with SECDED to correct small-scale, physically-contiguous multi-bit errors [104, 121, 55]. However, power, area, and delay overheads grow significantly as the interleaving factor increases beyond four, depending on the memory design [6].

Typically, ECC schemes used in caches and memory rely on codes that detect and correct errors, since the positions of the errors are not known a priori. On the other hand, erasure coding is a well-known technique for correcting errors when the position of an error is known [128]. By separating error detection from error correction in the common error coding approaches used in memory arrays, erasure codes can be leveraged to recover erroneous parts as an alternative to conventional error correction techniques. Some schemes inspired by erasure coding have recently been proposed to protect cache memories against soft errors [82, 106, 84].

2.2.3 Architecture-Level Techniques

Several architectural techniques have also been proposed to improve the reliability of on-chip caches by using either redundancy or cache resizing. Earlier efforts on fault-tolerant cache design use various cache down-sizing techniques by disabling a faulty line or block of the cache. Ozdemir et al. [122] proposed Yield-Aware cache, in which they developed multiple techniques that turn off either cache ways or horizontal regions of the cache which cause delay violations and/or have excessive leakage. Agarwal et al. [2] proposed a fault-tolerant cache architecture in which the column multiplexers are programmed to select a non-faulty block in the same row if the accessed block is faulty. Similarly, PADed cache [143] uses programmable address decoders that are programmed to select non-faulty blocks as replacements of faulty blocks. Sasan et al. [138, 137] proposed a number of cache architectures in which the error-prone part of the cache is fixed using either a separate redundancy cache or parts of the same cache. RDC-cache [138] replicates a faulty word with a clean word in the last way of the next cache bank. Other efforts, such as In-Cache Replication (ICR) [166] and Multi-Copy Cache (MC2) [41], use data replication to improve reliability. Schemes such as Replication Cache [165] and ZerehCache [10] use external spare caches. Similarly, variants of fault-grouping and fault remapping have been used to tolerate faulty cache blocks without adding any spare elements, but by using other parts of the cache, such as GRP2 [131], RDC-Cache [138], Abella [1], Archipelago [9], and FFT-Cache [22]. Wilkerson et al. [156] propose two schemes called Word-disable (WDIS) and Bit-fix (BFIX) that could also fall under this category. The buddy cache [87] pairs up two non-functional blocks in a cache line to yield one functional block. A similar idea was proposed independently in [131]. The salvage cache [86] improves on this technique by using a single non-functional block to repair several others in the same line. However, none of these methods is efficient in the near-threshold region with high fault probabilities. Indeed, at such low voltages the cache would be effectively downgraded to just a fraction (e.g., 30%) of its original size, which results in a large performance degradation in terms of IPC across standard benchmarks.

Some hybrid schemes combine multiple techniques mentioned earlier to minimize the costs of memory protection. The scheme in [169] minimizes area overhead through joint optimization of cell size, redundancy, and ECC, and Ndai et al. [119] perform circuit-architecture co-design for memory yield improvement. More recent architectural schemes for cache resilience address newer challenges for multi- and many-core platforms, such as scalability [24, 19], variation in fault behaviors [46], non-uniform memory access latency [24], limited shared redundancy [20], low-overhead multi-VDD support [63], and reliability-driven error protection [126].

2.3 FFT-Cache

2.3.1 Proposed Architecture

In this section, we first describe the proposed FFT-Cache architecture that uses a Flexible Defect Map (FDM) to efficiently tolerate SRAM failures. Next, we present the cache configuration that includes FDM generation and configuration stages to configure the architecture for fault-tolerant data access.

FFT-Cache Organization

The FFT-Cache architecture has two banks of data that can be accessed concurrently; each bank includes multiple cache lines, with each line including multiple blocks. FFT-Cache achieves fault tolerance by using a portion of the faulty cache blocks as redundancy to tolerate other faulty cache lines and blocks. Using this approach, FFT-Cache tries to sacrifice a minimal number of cache lines to minimize performance degradation and tolerate the maximum amount of defects at near-threshold operating voltage. This is done by using either block-level or line-level replication in the same set or between two sets. The information about faulty locations is kept in a Flexible Defect Map (FDM), which is then used to configure the faulty-block remapping. To replicate the faulty subblocks in a line (called a host line), our scheme tries to find a faulty block in the same line or in another line that has no conflict with other blocks in the host line. We refer to such a block as a Target block. Depending on whether the Target block is in the same line (host line) or in another line, it is called a Local or Global Target block, respectively. FFT-Cache always tries to find a local target block first. If FFT-Cache cannot find a local target block, it searches for a global target block. Finally, if it fails to find either a local or global target block, it tries to find another faulty line (called a target line) that has no conflict with the host line. It then sacrifices the target

line to replicate all faulty blocks of the host line. Thus, based on the earlier discussion, the sacrificial target line/block can be one of: 1) a Local Target Block, 2) a Global Target Block, or 3) a Target Line. Note that a local target block can be accessed in the same cycle as the host line and does not require any additional access overhead. This is not true for a global target block or a target line, for which two consecutive accesses are required if the global target block or target line is in the same bank as the host line. In order to access the host line and global target line/block in parallel, and to minimize the access latency overhead, the host line and target line should be in different banks. Also, since target blocks/lines do not store any independent data, they are considered non-functional. Therefore, the target lines are not addressable as data lines and are removed from the address scope of the cache. This can impact performance as it reduces the effective size of the cache. A major advantage of FFT-Cache over other fault-tolerant methods is that we minimize the number of non-functional and target lines by initially attempting to find a local target block to sacrifice. If a local target block is not found, FFT-Cache then attempts to find a global target block. Finally, if the first two attempts are not successful, FFT-Cache sacrifices a different cache line as a target line, as other fault-tolerant methods do. In our fault-tolerant cache architecture, each cache access in low-power mode first accesses the FDM. Based on the fault information of the accessed line, the target block/line may be accessed from another bank (in the case of a global target block or target line). Then, based on the location of the target block/line retrieved from the FDM, one or two levels of MUXing are used to compose a fault-free block by choosing appropriate subblocks from both host and target blocks (lines). Figure 2.1 outlines the flowchart for accesses using the FFT-Cache.
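To make the subblock-level composition concrete, the following minimal Python sketch (our own illustration; the function and field names are hypothetical and do not come from the actual hardware or RTL) shows how a fault-free block is assembled from host and target subblocks using the per-way defect bits of an FDM entry.

def compose_block(host_block, target_block, defect_bits):
    # Take each subblock from the host unless its defect bit is set,
    # in which case take the replicated copy from the target block.
    return [t if faulty else h
            for h, t, faulty in zip(host_block, target_block, defect_bits)]

# Example: a 4-subblock host block whose subblock 2 is faulty.
host   = ["A0", "A1", "XX", "A3"]   # "XX" marks corrupted data
target = ["--", "--", "A2", "--"]   # the target block replicates subblock 2
defect = [0, 0, 1, 0]               # per-subblock defect bits from the FDM entry

print(compose_block(host, target, defect))   # -> ['A0', 'A1', 'A2', 'A3']

A local target block would be read in the same access as the host line; a global target block or target line would be read from the other bank and merged by the second MUXing layer.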

FFT-Cache Configuration

We now describe the configuration process for FFT-Cache. Initially, a raw defect map is generated at boot time: using the cache memory Built-In Self-Test (BIST) unit, the L1 and L2 cache(s) are tested under low-voltage conditions. The output of the BIST is used to initialize the FDM. If there are multiple operating points for different combinations of voltage, temperature, and frequency, the BIST operation is repeated for each of these settings. The obtained defect map is then modified and processed to make it usable with FFT-Cache. Updating the FDM is done at full voltage, using a simple algorithm that we explain next. The configuration information can be stored on the hard drive and is written to the FDM at the next system boot-up. In addition, in order to protect the defect map and the tag arrays, we use the well-studied 8T SRAM cell [152], which has about 30% area overhead for these relatively small arrays in comparison with 6T SRAM cells. These 8T SRAM cells are able to meet the target voltage in this work for the aforementioned cache structures without failing. An example of an FDM entry for a cache with associativity of 4 is shown in Figure 2.2.

Figure 2.1: FFT-Cache Access Flowchart.

WS bits: 00 = Non-faulty, 01 = Local Target, 10 = Global, 11 = Faulty. (Each entry also contains the per-way defect-map bits, way-status fields, the same-set and faulty-line bits, the target way number, the target type, and the target line address.)

Figure 2.2: Details of a row of FDM.

Each FDM entry includes multiple configuration bits, the defect map of each way (block), the status of each block, and the address of the Target line. Each bit in the defect map section represents the faulty state of a subblock. Way Status bits represent the status of each way or block in a cache line. The status of each cache block can be one of the following: 1) Non-Faulty, 2) Faulty, 3) Local Target, and 4) Global. Initially the status of all blocks in the cache is Non-faulty, representing the absence of any faulty subblocks. If a block contains at least one faulty subblock, its status will be Faulty. A block that is used as a target block for other blocks in a line gets the status of Local Target. A block that has a conflict with other blocks in the same set cannot be used as a local target block; it gets the status of Global and might be used as a Global target block. We define the Max Global Block (MGB) parameter as the maximum number of blocks in each line that can be sacrificed (set as global blocks) for replication to achieve fault tolerance. For example, in an 8-way set associative cache, if we set MGB=4, then in the worst case we can sacrifice (set as global blocks) up to 4 blocks of each faulty line for replication. If we extend this to the whole cache, in the worst case we

sacrifice at most half of the cache for fault-tolerance. We present the algorithms for FDM initialization and configuration in Algorithms 1 and 2.
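As a rough illustration of the per-set FDM entry described above (not the exact bit layout of Figure 2.2; the field names and widths below are our own), one entry can be modeled as follows before we turn to the algorithms.

from dataclasses import dataclass, field
from typing import List

# Way-status encoding as described above.
WAY_STATUS = {0b00: "Non-faulty", 0b01: "Local Target", 0b10: "Global", 0b11: "Faulty"}

@dataclass
class FDMEntry:                      # one entry per set of a 4-way, 4-subblock cache
    defect_map: List[List[int]] = field(
        default_factory=lambda: [[0] * 4 for _ in range(4)])  # per-way defect bits
    way_status: List[int] = field(default_factory=lambda: [0b00] * 4)
    faulty_line: bool = False        # the set contains at least one faulty block
    same_set: bool = True            # remapping resolved inside the set (local target)
    target_way: int = 0              # way that holds the target block
    target_line_addr: int = 0        # index of the target line in the other bank

entry = FDMEntry()
entry.defect_map[2][1] = 1           # way 2, subblock 1 is faulty
entry.way_status[2] = 0b11           # mark way 2 as Faulty
entry.faulty_line = True
print(WAY_STATUS[entry.way_status[2]])   # -> Faulty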

Algorithm 1. FDM initialization algorithm
Input: FDM: empty defect map array
Output: FDM: initialized defect map array
// Step 1: Run BIST, find faulty cache lines at subblock granularity, and fill the defect map section of each entry
for i = 1 to |FDM| do
    FDM[i].Faulty = False
    FDM[i].SameSet = True
    FDM[i].tLineAddr = i
    for j = 1 to nAssoc do
        if FDM[i].block[j].nFSblocks > 0 then
            FDM[i].WayStatus[j] = Faulty
            FDM[i].nFblocks++
        else
            FDM[i].WayStatus[j] = NonFaulty
        end if
    end for
    if FDM[i].nFblocks > 0 then
        FDM[i].Faulty = True
    end if
    if FDM[i].nFblocks < MGB then
        FDM[i].SameSet = False
    end if
    if FDM[i].nConflicts > 1 then
        FDM[i].SameSet = False
    else
        if FDM[i].nConflictingBlocks <= MGB then
            set the status of the conflicting blocks to Global Block
            group the other blocks and set one of them as Target Block
        else
            FDM[i].SameSet = False
        end if
    end if
end for
return FDM

After completion of the FDM initialization algorithm, we run the FDM configuration algorithm.

We use graph coloring to optimize the selection of both the remote target block and the remote target line in Steps 2 and 3 of the FDM configuration algorithm [85]. Graph coloring is an NP-complete problem, and there are many heuristics and transformations available to generate a feasible solution. Accordingly, we use a heuristic graph-coloring algorithm that is a modification of the Saturation Degree Ordering (SDO) algorithm [120].

Algorithm 2. FDM configuration algorithm
Input: FDM: initialized defect map array
Output: FDM: configured defect map array
// Step 1: Traverse the faulty rows of the FDM and, based on the conflicts between faulty blocks inside each row, categorize the FDM entries into different groups
for i = 1 to |FDM| do
    if FDM[i].Faulty then
        if FDM[i].nFblocks < MGB then
            FDM[i].SetStatus = Min_Faulty
            set the status of the faulty blocks to Global Block
        else
            if FDM[i].nConflicts == 0 then
                FDM[i].SetStatus = No_Conflict
                set the status of one of the faulty blocks to Local Target Block
            else if FDM[i].nConflicts == 1 then
                FDM[i].SetStatus = Low_Conflict
                set the status of the block that conflicts with the other blocks to Global Block
            else
                FDM[i].SetStatus = High_Conflict
            end if
        end if
    end if
end for
// Step 2: Assign the lines in the Low_Conflict group
For the lines in the Low_Conflict group, run the graph coloring algorithm to find a Global Target Block for each line.
// Step 3: Assign the lines in the High_Conflict group
For the lines in the High_Conflict group, run the graph coloring algorithm to find a compatible line from the other bank to serve as the Global Target Line.
return FDM

We call a graph coloring problem solvable if, for a graph G, we can find an integer K > 0 such that the nodes of G can be colored with K colors while no edge exists between nodes of the same color. We construct a graph based on the conflicts between lines of different entries in the FDM. Each node in this graph represents either a Global Block (Block Node) or a faulty entry (representing an entire line) from the low-conflict or high-conflict groups (Line Node) in the FDM. The edges represent a conflict between a pair of blocks or between a block and a line. For example, two nodes connected by an edge represent two blocks, two lines, or one block and a line that conflict. Since a pair of nodes connected by an edge represents a conflict, it is not possible to set one of these nodes as a Remote target block for the other one.

We modify the above graph coloring algorithm based on the following constraints: 1) We force the algorithm to color nodes from at least two different banks. 2) We force the algorithm to first pick block nodes and try to color them with line nodes of other banks. We then apply the modified graph coloring algorithm to our graph to find a solution such that neighboring nodes are not assigned the same color. Therefore, after completion of the coloring algorithm, nodes with the same color are guaranteed to have no edges between them, implying that the corresponding cache lines/blocks have no conflicts between them. We put all nodes with the same color in a group and try to set one of them as a Remote Target for the other nodes in the group. As a result, if a block node has the same color as one or more set nodes in other banks, we set it as their Remote Target Block. If a line node has the same color as one or more line nodes in other banks, we set all of its blocks as Remote Target Blocks for the blocks in those other lines. Since the graph coloring algorithm is run at boot time, the algorithm's overhead is amortized within the boot-time overhead and is not incurred at run time. Based on our measurements on a machine with an Intel Core 2 Duo 2GHz processor and 2GB of memory, the overhead varies from 1 ms up to 10 ms, depending on the cache configuration and fault rate.
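The following simplified Python sketch illustrates the idea of coloring the conflict graph; it uses a plain greedy heuristic ordered by node degree as a stand-in for the modified SDO heuristic, and the node names in the example are invented.

def greedy_color(nodes, edges):
    # nodes: list of node ids; edges: set of frozenset({u, v}) conflict pairs.
    neighbors = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        neighbors[u].add(v)
        neighbors[v].add(u)
    color = {}
    # Color the most-constrained nodes first (a rough stand-in for saturation order).
    for n in sorted(nodes, key=lambda x: len(neighbors[x]), reverse=True):
        used = {color[m] for m in neighbors[n] if m in color}
        c = 0
        while c in used:
            c += 1
        color[n] = c
    return color

# Toy example: global block B1 conflicts only with line L2, so it ends up in the
# same color group as lines L1 and L3 and can serve as their remote target.
nodes = ["B1", "L1", "L2", "L3"]
edges = {frozenset({"B1", "L2"})}
print(greedy_color(nodes, edges))   # e.g. {'B1': 0, 'L2': 1, 'L1': 0, 'L3': 0}

In the real algorithm, the coloring is additionally constrained so that a group always spans at least two banks, and block nodes are colored before line nodes.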

Figure 2.3 shows an example of the FDM configuration for a given distribution of faults in 10 sets of a 2-banked, 4-way set associative cache with 4 sub-blocks in each block and MGB=1. As shown in the figure, Set 1 is clean, without any faulty blocks. Set 2 is a member of the min-faulty group, and its faulty block is configured as a Remote Target Block for Set 9. Set 7 is another member of this group, and its single faulty block is set as a Global Block. Set 3 is an example of the no-conflict group, in which the first block (Way 1) is set as a Local Target Block to be sacrificed for fault tolerance of the other faulty blocks in the set. Set 6 is another member of this group, with one of its blocks (Way 4) set as a Local Target Block. Set 4 is a member of the low-conflict group, in which one of its blocks (Way 1) is set as a Global Block since it has a conflict with the other blocks. The first block of Set 9 is set as a Remote Target Block for this set (Set 4). Set 5 is a member of the high-conflict group with two conflicts

between its blocks (Way 0 has a conflict with Way 2, and Way 1 with Way 3). All blocks of this set are configured as Remote Target Blocks for both low-conflict Sets 8 and 10.

            Bank1                           Bank2
Set   Way1  Way2  Way3  Way4    Set   Way1  Way2  Way3  Way4
 1     N     N     N     N       6     F     F     F     LT
 2     N     N     N     RT      7     G     N     N     N
 3     LT    F     F     F       8     F     F     F     F
 4     G     N     N     N       9     RT    N     N     N
 5     RT    RT    RT    RT     10     F     F     F     F

Line status: N = Non-faulty, F = Faulty, G = Global, LT = Local Target, RT = Remote Target

Figure 2.3: An example of FDM configuration and remapping for a given distribution of faults in a 4-way set associative cache.

Architecture Details

We now present the architecture of the FFT-Cache. We begin with the architecture of a conventional 4-way set associative cache with two banks (labeled CC) in Figure 2.4(a). Based on the tag match results, one of the blocks from Way0 to Way3 in either Bank0 or Bank1 is selected. The data is transferred to/from the cache arrays using multiplexers, as indicated in the figure. Figure 2.4(b) shows the architecture of the proposed FFT-Cache. Let us assume the cache is divided into two banks, with each bank containing four ways (blocks) that are further divided into 2 subblocks. The new modules (MUXes and FDM) added to the conventional cache are highlighted in the figure. The additional multiplexer network allows the cache to compose the final fault-free line from any of the sub-blocks in any of the cache ways in any of the cache banks, based on its FDM data. The FFT-Cache requires two additional levels of multiplexing to compose the final fault-free line, based on either multiple lines within a set or between two or more sets in different banks. The first multiplexing layer (In-Cache MUXing) is added on the cache side to perform the remapping within a set. Note

that this layer replaces the original block-level multiplexer available in set-associative caches with multiple smaller subblock-level multiplexers. The second multiplexing layer (In-Core MUXing) is added on the core side to perform remapping between sets in the original (local) and target (remote) banks.


Figure 2.4: Architecture details of (a) a conventional 4-way set associative cache, (b) the proposed FFT-Cache with FDM and 2 sub-blocks per block.

The total number of n-to-1 subblock-level multiplexers on the cache side is k × b, where k is the number of subblocks in a block, n is the number of ways (set associativity), and b is the number of banks. For instance, for a 2-bank, 64KB, 4-way set associative cache with 2 subblocks per way, a total of four 4-to-1 multiplexers are required on the cache side. The size of the FDM equals the number of cache lines multiplied by the FDM entry size.
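As a quick sanity check of the two sizing expressions above, the short sketch below plugs in the example L1 parameters; the FDM entry width used here is only an assumption for illustration, not the exact layout of Figure 2.2.

n_ways, n_banks, n_subblocks = 4, 2, 2          # 64KB, 4-way, 2-bank L1, 2 subblocks/way
line_size_bits = 64 * 8                         # 64B cache lines
cache_size_bits = 64 * 1024 * 8

mux_count = n_subblocks * n_banks               # k x b subblock-level n-to-1 MUXes
num_sets = cache_size_bits // (line_size_bits * n_ways)

# Assumed FDM entry width: per-way defect bits + 2 status bits per way
# + a few control bits + a target-line address.
entry_bits = n_ways * n_subblocks + 2 * n_ways + 4 + num_sets.bit_length()
fdm_bits = num_sets * entry_bits

print(mux_count)                 # -> 4 (four 4-to-1 MUXes, matching the text)
print(num_sets, entry_bits, fdm_bits)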

Hardware Implementation

For a quantitative comparison, we synthesized the MUXing layer and output logic of FFT-Cache as well as the multiplexer and output driver of the conventional cache (CC) using Synopsys Design Compiler with a TSMC 90nm standard cell library, for both L1 and L2 caches. The area and delay of various multiplexers (MUX2, MUX4, MUX32) are used to estimate the overall area/delay overhead of the MUXing network in FFT-Cache and in the CC at nominal Vdd. We found that the delay of the FFT-Cache output logic increased by only 5% compared to the CC output MUX network, while area and power consumption increased by only 2% compared to a CC MUX network. Recall that the FFT-Cache architecture replaces the CC's output MUX with the FFT-Cache MUXing layer and output logic; thus we expect that the proposed mechanism will only result in a minimal increase of the cache delay (estimated at < 5%).

2.3.2 Evaluation

This section evaluates the effectiveness of the FFT-Cache architecture in reducing power consumption of the processor while keeping overheads as low as possible. Before presenting the experimental results, we describe our methodology and experimental setup, and outline the exploration space for our experiments.

Simulation Setup

Table 2.1 outlines our experimental setup for the baseline processor configuration. The processor is configured with a 64KB, 4-way set associative L1 cache. The architecture was simulated using an extensively modified version of SimpleScalar 4.0 [14] running the SPEC2K benchmarks. Benchmarks were compiled with the -O4 flag using the Compaq compiler targeting the Alpha 21264 processor. The benchmarks were fast-forwarded for 3 billion instructions, then fully simulated for 4 billion instructions using the reference data sets. We used CACTI 6.5 [116] to evaluate the area, power, and delay of both L1 and L2 caches and their related FDMs. The Synopsys standard industrial tool-chain (with a TSMC 90nm technology library) was used to evaluate the overheads of the MUXing-layer components (i.e., MUXes, comparators, MUX selection logic, etc.). The 2-cycle load/store latency is assumed to be broken down such that the actual cache access takes place in cycle 1, while the bus access takes only part of the next cycle. From our discussion in Section 2.3.1, the cache delay is increased only slightly (< 5%) at nominal Vdd. However, considering the increase in logic delay due to Vdd scaling, we conservatively assume that in the worst case the latency of the cache is increased by one full cycle for L1 and two cycles for L2. These extra cycles are based on our timing analysis of both the MUXing layer and the FDM access, which are on the critical path of the cache access.

Table 2.1: Processor Configuration

Parameter          Value
Core               Alpha, 4-way issue OOO, 96-entry ROB
Load/store queue   32 entries
L1 I/D Cache       64kB split I/D, 2 banks, 4-way assoc, 64B lines, 2 cycles, LRU
L2 Cache           2MB, 2 banks, 8-way assoc, 128B lines, 20 cycles, LRU
Memory             300 cycles
Technology         90nm

Design Space Exploration

We study the impact of various FFT-Cache configuration parameters including subblock size and MGB on the number of target lines/blocks (non-functional cache part). In addition, we

study the impact of cache associativity on FFT-Cache functionality. Note that since MGB is determined by the cache associativity (i.e., half of the cache blocks in a line can be global-block candidates), it is instructive to study the impact of cache associativity on the size of the non-functional cache part. For a given set of cache parameters (e.g., associativity, subblock size, MGB, etc.), a Monte Carlo simulation with 1000 iterations is performed using our FDM configuration algorithm described in Section 2.3.1 to identify the portion of the cache that should be disabled while achieving a 99% yield. In other words, the probability of failure of the cache must be below 0.001 when operating in low-power mode.
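A heavily simplified version of this Monte Carlo analysis is sketched below; it only estimates the fraction of sets touched by at least one faulty block for an assumed per-bit failure probability, whereas the real study runs the full FDM configuration algorithm and checks the yield target.

import random

def fraction_of_faulty_sets(p_bit, n_sets=256, n_ways=4, block_bits=512, trials=1000):
    # p_bit is a made-up per-cell failure probability at the scaled voltage.
    p_block = 1 - (1 - p_bit) ** block_bits      # block has >= 1 faulty bit
    totals = []
    for _ in range(trials):
        faulty_sets = sum(
            1 for _ in range(n_sets)
            if any(random.random() < p_block for _ in range(n_ways)))
        totals.append(faulty_sets / n_sets)
    return sum(totals) / trials

print(fraction_of_faulty_sets(p_bit=1e-4))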

Figure 2.5 and Figure 2.6 present the design space exploration of FFT-Cache for the L1 and L2 caches, respectively. We present the results for different associativities (4 and 8) and various subblock sizes (4, 8, 16, and 32) in these figures. As evident in Figures 2.5 and 2.6, decreasing Vdd increases the size of the non-functional cache part. It is notable that at very low voltage (below 400 mV) the number of global blocks decreases. Overall, the effective size of the cache (the cache size minus the non-functional part) decreases as the voltage is lowered. We can also observe from the figures that decreasing the size of the subblocks increases the area overhead of the FDM. Decreasing the size of the subblocks also reduces the size of the non-functional cache part. Increasing MGB increases the number of non-functional blocks only slightly, while it significantly reduces the number of non-functional lines. In fact, increasing MGB helps the FDM configuration algorithm find a global target block rather than sacrificing a target line. During the process of finding the FFT-Cache minimum achievable operating voltage (Vdd-min), we limit the size of the non-functional cache part and the overhead of the FDM table as described below. The number of target lines/blocks (i.e., the size of the non-functional cache part) determines the performance loss of FFT-Cache. To limit the performance loss we assume the relative size of the non-functional part to be less than 25% of the cache size. To minimize the implementation overhead we assume the FDM size to be less than 10% of the cache size. This assumption requires the subblock size to be 16 bits or higher (32 bits and 64 bits). Based on these assumptions we find the minimum achievable operating voltage, which is annotated in Figures 2.5 and 2.6: using the FFT-Cache scheme, the minimum operating voltage for both the L1 and L2 caches is 375 mV.

Figure 2.5: The effect of changing one design parameter of the L1 FFT-Cache while fixing the others, for different Vdd values: (a) percentage of disabled lines and global blocks for a 4-way L1 (MGB = 1) while varying Vdd and subblock size (4b, 8b, 16b, and 32b subblocks, with FDM area of 26%, 13%, 7%, and 4%, respectively); (b) percentage of disabled lines and global blocks for an 8-way L1 (4b subblocks) while varying Vdd and MGB (1 to 4). The Vdd-min of 375 mV is marked in the plots.

Performance Overhead

In Figure 2.7 we report the impact of FFT-Cache on processor performance when running the SPEC2K benchmarks. The relative performance degradation in terms of IPC (instructions per cycle) is reported for a 64KB, 4-way set associative L1 FFT-Cache (Figure 2.7(a)) and a 2MB, 8-way set associative L2 FFT-Cache (Figure 2.7(b)). We also report the performance degradation when the FFT-Cache scheme is deployed simultaneously in L1 and L2 (Figure 2.7(c)). Furthermore, we report the breakdown of the performance drop, due either to the increase in cache access delay (from 2 to 3 cycles for L1 and from 20 to 22 cycles for L2) or to the reduction in effective cache size. The results are obtained for the minimum-voltage configuration (MGB=1 and 16-bit subblocks for L1; MGB=4 and 16-bit subblocks for L2). On average, performance drops by 2.2% for the L1 cache and 1% for the L2 cache. For the L1 cache, the additional delay of accessing the cache is responsible for the majority of the performance loss. The impact of the additional delay on performance is lower for the L2 cache, mainly because of the large L2 access delay (2 cycles of delay overhead compared to a 20-cycle L2 access). The results also indicate that the performance degradation for both L1 and L2 varies significantly across benchmarks. The highest performance loss is observed in the bzip2 and gzip benchmarks (more than 5% IPC loss); these are high-IPC benchmarks, for which the one extra cycle of L1 access delay, combined with the 20% reduction in L1 effective size, is not well tolerated. In many benchmarks, including facerec, galgel, and lucas, almost no performance loss is reported. Our investigation indicated that while the cache miss rate in these benchmarks increased slightly due to the reduction in effective cache size, out-of-order execution helped maintain performance. For the L2 cache the performance degradation is lower; the largest drop is 8%, for the ammp benchmark. Finally, a 3% performance loss is observed when the FFT-Cache scheme is deployed in both the L1 and L2 caches.

Figure 2.6: The effect of changing one design parameter of the L2 FFT-Cache while fixing the others, for different Vdd values: (a) percentage of disabled lines and global blocks for an 8-way L2 (MGB = 4) while varying Vdd and subblock size (4b, 8b, 16b, and 32b subblocks, with FDM area of 25.6%, 12.8%, 6.6%, and 3.4%, respectively); (b) percentage of disabled lines and global blocks for an 8-way L2 (4b subblocks) while varying Vdd and MGB (1 to 4).


Figure 2.7: Processor performance degradation when applying FFT-Cache for (a) L1 (b) L2 and (c) L1 and L2 at the same time.

Power and Area Overhead

Figure 2.8 summarizes the overheads of our scheme for both L1 and L2 caches. The power overhead in this figure is for high-power mode (nominal Vdd). We account for the overheads of using 8T SRAM cells [152] to protect the tag and defect map arrays in low-power mode. To reduce the leakage and dynamic power consumption of the FDM in high-power mode, we assume clock gating and power gating are applied to the FDM array. Therefore, the main source of dynamic power at nominal Vdd is the bypass MUXes; as shown in the figure, this overhead is less than 3% for L1 and is negligible for L2. As evident in Figure 2.8, the defect map is a major component of the area overhead for both L1 and L2. The total area overhead is 13% for L1 and less than 10% for L2. The defect map is also the major component of leakage power in both L1 and L2. Based on our CACTI results at both nominal Vdd (660 mV) and Vdd-min (375 mV), we achieve a 66% dynamic and 48% leakage power reduction in the L1 cache, and an 80% dynamic and 42% leakage power reduction in the L2 cache.

Figure 2.8: Power and area overheads of FFT-Cache (breakdown into defect map (8T), tag array (8T), and miscellaneous logic for L1/L2 area, leakage power, and dynamic power).

Comparison to State-of-the-art Methods

In order to illustrate the effectiveness of FFT-Cache, we first compare the efficacy of our fault-remapping methodology against multiple state-of-the-art remapping-based fault-tolerant techniques proposed in recent years. Then, we perform a quantitative comparison of FFT-Cache to four well-known alternative cache protection schemes.

1. Comparison to Recent Remapping Techniques: In this section we compare FFT-Cache against three state-of-the-art remapping techniques: RDC-Cache [138], Salvage Cache [86], and Ansari's work [10]. Figure 2.9 shows the effective cache size for each of these techniques across various voltage points. We use the failure rates reported in [137]


Figure 2.9: Comparison of effective cache size for different fault-tolerant techniques.

for the 65nm technology node. The results reported in this figure are based on a 1000-run Monte Carlo simulation for an 8MB, 8-way set associative LLC with two banks and a 64-byte block size. For a fair comparison, for all of these techniques including FFT-Cache, we assume the fault-map area overhead is at most 9% of the cache size (the area overhead of FFT-Cache with 16-bit sub-blocks). Based on this assumption and to meet the overhead limit, Ansari's method selects a 16-bit sub-block size. The sub-block sizes for RDC-Cache and Salvage Cache are 128 bits and 64 bits, respectively. The results show that in the ultra-low-voltage region (below 400 mV), as the failure rate increases, FFT-Cache results in much smaller cache capacity loss.

2. Quantitative Comparison to Alternative Methods: In this section we compare our scheme to four state-of-the-art works: ZerehCache [8], Wilkerson et al. [156], the 10T SRAM cell [38], and Ansari et al. [10]. This is a quantitative comparison of the overheads of the different schemes together with a rough comparison of their Vdd-min. We obtained these overhead results either directly from the related papers (when a direct comparison was possible), or by recalibrating the results to enable a fair comparison for the same technology node. We calculated the Vdd-min for ZerehCache and updated the Vdd-min of Wilkerson's scheme using our cache configuration and fault model at 90 nm. All other overheads are obtained directly from the papers. Note that, except for the 10T cell and Wilkerson's work, which use 65 nm technology, all other schemes including FFT-Cache use 90 nm. The 380 mV Vdd-min for the 10T cell was derived at 65 nm, and it is clear that at 90 nm this value would be higher. Table 2.2 summarizes this comparison based on the minimum achievable Vdd, the area and power overheads for both L1 and L2 caches, and normalized IPC. In this table, the different techniques are sorted by minimum achievable Vdd-min when targeting 99.9% yield.

Table 2.2: Comparison of different cache protection schemes

Scheme             Vdd-min (mV)   L1 Area (%)   L1 Power (%)   L2 Area (%)   L2 Power (%)   Norm. IPC
6T cell            660            0             0              0             0              1
ZerehCache [8]     430            16            15             8             12             0.97
Wilkerson [156]    420            15            60             8             20             0.89
Ansari [10]        420            14            19             5             4              0.95
10T cell [38]      380            66            24             66            24             1.0
FFT-Cache          375            13            16             10            8              0.95

Overall, our proposed FFT-Cache achieves the lowest operating voltage (375 mV) and the highest power reduction compared to all the other techniques. The closest techniques to ours are the 10T cell, Ansari, and Wilkerson. The 10T cell achieves almost the same power reduction as FFT-Cache but incurs a 66% area overhead. Wilkerson's work has a significant power overhead and also suffers an 11% performance degradation. The Ansari technique incurs slightly lower power and area overhead for the L1 cache compared to FFT-Cache, but it does not reduce the operating voltage below 420 mV and thus achieves a lower power reduction. Overall, the flexible defect map of FFT-Cache, along with its high configurability and flexibility, allows it to tolerate higher failure rates than other similar techniques.

2.4 EEC: Embedded Erasure Coding

2.4.1 Preliminaries

This section provides a general background on MBUs, introduces erasure coding, and reviews related work on ECC schemes.

Multiple Bit Upsets

A realistic quantification of the proposed MBU correction technique requires detailed information on possible MBU patterns and their occurrence rates over the SRAM array. To obtain this, a 3D-TCAD simulation is performed using a commercial soft-error evaluation tool (similar to the one presented in [55]). In this experiment, the neutron-induced MBU patterns and their occurrence probabilities are obtained for a 45 nm SRAM memory working in a terrestrial environment. The distribution of MBU sizes is shown in Figure 2.10. As shown in this figure, almost half of the soft errors in the employed 45 nm SRAM memory are MBUs. The figure also depicts particular MBU patterns with high occurrence probabilities. The experiment shows that the largest MBU pattern affects 24 bits. Moreover, the ratio of MBUs as well as the largest MBU size will further increase in smaller technology nodes [76]. Hence, depending on the technology employed in SRAM memories, an effective coding technique with appropriate correction capabilities is required to detect and correct the largest possible MBU incidents in the target memory.

Erasure Coding

At system level, erasure coding [128] has been very successful in disk error protection [125], error correction in FPGA configuration frames [130] as well as providing reliable data storage

Figure 2.10: MBU size distribution and particular patterns with high occurrence probabilities in a 45 nm SRAM array.

for the storage/distributed systems domain [66, 74]. A variety of erasure coding techniques with different properties, such as recovery coverage, encoding/decoding complexity, and area overhead, have been proposed in the literature [128]. In general, an optimal erasure code is a coding technique which transforms m blocks into m + n blocks by adding n redundant blocks. This transformation, or coding, is done in such a way that in case of erasure (loss of data), the original blocks can be recovered from any arbitrary set of m blocks among the m + n coded blocks (see Figure 2.11). The recovery coverage of an erasure code is specified by the number of erasures that can be tolerated. For an optimal erasure code, the recovery coverage is equal to the number of redundant blocks, called erasure blocks (i.e., n). The area overhead of an erasure code is defined as the ratio of redundant blocks to original data blocks, which equals n/m for optimal erasure codes.

The erasure codes are designed to recover the original blocks when a subset of blocks is not available (i.e. erased). Although this coding approach is not designed to detect or correct errors, once an error is localized in particular blocks using a detection technique, the original blocks could be reconstructed using an erasure code.

Figure 2.11: Basic erasure coding technique.

In contrast with all previous approaches, we first perform a detailed technology-dependent analysis to extract MBU patterns and their respective probabilities. Then, based on our observations, we try to minimize the overheads of MBU protection by optimizing the interleaving distance of the parity checking and by making the error recovery process less complicated and faster than similar approaches, to reduce system-level costs in case of high error rates.

2.4.2 Proposed EEC

To enable correcting a larger ratio of MBUs with affordable complexity and latency overheads, we propose the EEC approach. The main idea behind this work is to use the concept of erasure coding for error correction by reconstructing the contents of erroneous cache blocks within each cache set. Lightweight interleaved per-word parity codes are applied for error detection. For error correction, we adapt the notion of traditional erasure coding to define a technique that performs coding across multiple cache ways in a set-associative cache. It reserves a cache way to store erasure check bits for the remaining ways. In this technique, we trade off cache capacity for a low-cost MBU correction with fast error recovery, without any significant hardware modification or overhead. Moreover, the proposed technique is implemented in a reconfigurable way such that, depending on the desired level of performance/reliability, the reserved way in different parts (banks) of the cache can be treated as either erasure storage or normal data storage. In contrast to similar schemes [82, 106, 84], our proposed scheme does not require any significant change to the existing cache architecture of processors to add redundant rows.

Parity-based Erasure Coding

There are a variety of erasure codes with different attributes in the literature. In the context of on-chip memories, an appropriate erasure code should be selected that meets the memory constraints. Since the soft error occurrence rate is relatively small and read/write operations in cache arrays are continuously performed during the runtime, it is less likely to have more than one erroneous cache block per cache row at the same time [94]. On the other hand, the encoding time of the erasure code is critical in our design as it affects the write delay. Also a high decoding latency delays the error recovery process. Thus, depending on the granularity of error coding and the number of erasure blocks, there are multiple trade-offs between area, reliability, and performance.

In this work, we consider a parity-based erasure code [128] with only one redundant (parity) block. This coding technique satisfies our requirements, as it has short decoding and encoding latencies. The parity-based erasure code is an optimal erasure code with only one redundant block (n = 1; an m + 1 erasure code). This technique can recover the erased (erroneous) block in all cases with only one erasure. Figure 2.12 shows a sample erasure code with m = 4. As depicted in this figure, the redundant block contains the parity of the data stored in all other blocks. In case of an erasure, the redundant block can be used as the parity of the other m blocks to recover the contents of the erased block. Note that the factor m determines the trade-off between the encoding/decoding time and the area overhead. In this work, for a k-way set-associative cache, m is equal to k − 1.
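A minimal sketch of this m + 1 parity erasure code is shown below (byte strings stand in for cache blocks; the helper names are ours): the redundant block is the XOR of the m data blocks, and any single block flagged as erroneous can be rebuilt by XORing the survivors.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]   # m = 4 data blocks
parity = xor_blocks(data)                                     # the redundant block

erased = 2                            # block 2 is flagged erroneous by detection
survivors = [blk for i, blk in enumerate(data) if i != erased] + [parity]
recovered = xor_blocks(survivors)
assert recovered == data[erased]
print(recovered.hex())                # -> 5566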

Figure 2.12: A sample parity-based erasure code with one redundant block.

Parity-based MBU detection

We employ a lightweight bit-interleaved parity code for error detection. We consider n-way interleaved parity bits per word, computed by XORing the bits whose distance is n (e.g., for n = 8, parity bit[i] = data bit[i] ⊕ data bit[i+8] ⊕ data bit[i+16] ⊕ ...). In this way, we can detect up to n-bit contiguous errors (MBUs) in each cache word. The optimal value of n is extracted based on the technology-dependent MBU patterns (see Section 2.4.1).
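The sketch below illustrates this n-way interleaved parity (the word width and n = 8 follow the example above; the helper names are ours): each parity bit covers the data bits whose index is congruent to its own index modulo n, so a burst of up to n adjacent bit flips changes at least one parity bit.

def interleaved_parity(word_bits, n=8):
    # word_bits: list of 0/1 values; returns the n interleaved parity bits.
    return [sum(word_bits[j::n]) & 1 for j in range(n)]

word = [0] * 32
stored_parity = interleaved_parity(word)

corrupted = list(word)
for i in (10, 11, 12):               # an MBU flips 3 adjacent bits
    corrupted[i] ^= 1

print(interleaved_parity(corrupted) != stored_parity)   # -> True (error detected)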

EEC Operation

Figure 2.13 depicts the architecture of a 4-way set-associative L1 cache equipped with EEC and the process for updating the erasure codes on writes. It keeps an 8-way interleaved parity per block, using 8 parity bits per word. It also reserves the last way of the data array for storage of the erasure codes. Note that the latency and area overhead of this parity coding is similar to byte-parity coding (common in timing-critical L1 caches) as well as SECDED coding [82]. During normal operation (no error is detected), the parity bits associated with every accessed cache word are checked on reads and updated on writes. As depicted in Figure 2.13, on every write the update of the erasure code requires first reading the old data and the old erasure code, XORing both of them with the new data, and then writing back the new data and the updated erasure code. Thus, every write operation is converted to a three-step "read-before-write" operation. These three steps are either pipelined or performed along with the cache data update, before or after writing the new data to the cache. Depending on the cache architecture (i.e., the number of cache ports and whether accesses are sequential or parallel), this operation can be as fast as a single cache access and as slow as four cache accesses. As shown in Figure 2.13, the L1 cache has a fast, parallel access, so both data and erasure code can be accessed in parallel. But for caches with a sequential architecture, we need to first read the old data and then read the erasure data during the next access. In this case, the XOR operation (step 2 in the figure) is done in two steps and an additional register is required to keep the intermediate value. Finally, the new data and the updated erasure code are written back. Therefore, to have minimum impact on performance, we apply this approach to fast L1 caches with parallel access and/or lower-level caches with sequential access.

Figure 2.13: A write operation in a 4-way L1 cache protected with EEC, performed in three steps.

If the processor has two separate read and write ports, the read and write operations can be performed in parallel. However, this may cause some read-port contention in out-of-order processors, which leads to an increased load latency. This penalty can be mitigated using techniques such as cache port stealing [93] or cycle stealing [106]. For in-order processors with a single port, the extra read increases the cache write latency; however, approaches such as port stealing [93] or thread scheduling [88] exist to hide the extra access latency for the L1 cache.
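The read-before-write update can be summarized with the toy model below (an illustrative sketch only; the class and its fields are invented and do not correspond to the modified Leon3 or gem5 code).

def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]

class EECSet:
    def __init__(self, n_ways=4, words_per_block=4):
        self.ways = [[0] * words_per_block for _ in range(n_ways)]
        self.erasure_way = n_ways - 1            # last way reserved for erasure code

    def write(self, way, new_block):
        assert way != self.erasure_way
        old_data = self.ways[way]                    # step 1: read old data
        old_erasure = self.ways[self.erasure_way]    #         and old erasure code
        delta = xor_words(old_data, new_block)       # step 2: XOR old and new data
        new_erasure = xor_words(old_erasure, delta)
        self.ways[way] = list(new_block)             # step 3: write data back
        self.ways[self.erasure_way] = new_erasure    #         and the updated erasure

s = EECSet()
s.write(0, [1, 2, 3, 4])
s.write(1, [9, 9, 9, 9])
expected = [a ^ b ^ c for a, b, c in zip(*s.ways[:3])]
print(s.ways[s.erasure_way] == expected)             # -> True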

Error Recovery

Once an error is detected by checking the parity bits, the cache controller initiates the recovery process. The controller reads all data blocks in the set of the accessed block including the erasure block. Then, it XORs the value of all blocks except the block that contains the erroneous word. The result of the XOR operation is the correct value of the faulty block, which is then written back to the accessed block. Note that the recovery operation requires accessing every cache block in the set. In fast L1 caches, since all blocks of the set are accessed in parallel, the recovery is very fast. However, L2 caches usually have a sequential architecture in which tag checking is done first and then only one block is accessed at a time. In this case, the cache controller needs to access the blocks of each set one by one which increases the latency of error recovery up to the latency of accessing all the blocks in the

set. During each cache write-back, the controller also checks the dirty bit of the block. If the evicted block is clean, it does not invoke the recovery process.
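For completeness, the recovery step itself can be sketched as follows (plain lists stand in for the ways of one set; the dirty-bit check at eviction is only hinted at in the comments).

def recover(ways, faulty_way):
    # Rebuild the flagged block by XORing every other way of the set,
    # including the way that stores the erasure code.
    rebuilt = [0] * len(ways[0])
    for w, block in enumerate(ways):
        if w == faulty_way:
            continue                       # skip the erroneous block itself
        for i, word in enumerate(block):
            rebuilt[i] ^= word
    ways[faulty_way] = rebuilt             # write the corrected block back
    return rebuilt

data_ways = [[1, 2], [3, 4], [5, 6]]
erasure = [a ^ b ^ c for a, b, c in zip(*data_ways)]
ways = data_ways + [erasure]

ways[1] = [0xDEAD, 0xBEEF]                 # way 1 is hit by an error
print(recover(ways, 1))                    # -> [3, 4]
# (On eviction, a clean block would skip this recovery, as noted above.)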

Reconfigurability

In contrast to conventional erasure coding approaches, EEC does not need to add extra resources as redundant (erasure) blocks (as mentioned in Section 2.4.1). Instead, by changing the configuration of the cache, EEC can reserve a part of the cache (i.e., one way) as redundancy. The primary benefit of EEC's reconfigurability is that the cache capacity can easily be reduced to provide the required erasure blocks, while only minor modifications to the cache controller and/or software layers are needed. Thus, reliability can be improved with a small performance impact. Hence, we add an extra bit to the tag bits of each row which indicates whether that row reserves a block for erasure codes. On the other hand, if the reliability requirements are low, the reconfigurability allows the processor to use all cache ways for normal operation, and thereby minimize the performance degradation.

2.4.3 Evaluation

In this section we present different sets of results to evaluate the three major contributions of our approach (simplicity, configurability, and lower error recovery time) mentioned in the previous section, compared to similar approaches.

RTL Implementation

We implemented the EEC approach by modifying the HDL source code of the Leon3 processor [60]. Leon3 is a simple embedded in-order processor that has only L1 data and instruction caches, with the same read and write latency of 1 cycle. We modified its data

cache controller and cache array modules to configure the last way for erasure code storage. We also modified the cache controller to implement the "read-before-write" operation. Since the Leon3 L1 cache has only one data port, we needed to modify the cache controller to perform an extra read before each write operation. Also, since Leon3 has a single-cycle load/store operation, we considered one extra cycle to update the erasure code and write it back. For error recovery, since Leon3 has a fast cache, all the blocks are accessed in parallel; hence, it takes only two cycles to recover the erroneous block. Table 2.3 shows the details of the simulated processor. We simulated the modified Leon3 processor by running multiple MiBench benchmarks, including stringsearch, basicmath, bitcount, and qsort. The execution-time results showed no more than 1% performance overhead. We also synthesized it using the Nangate 45 nm technology. The synthesis results showed less than 5.8% logic area overhead and an increased latency of 0.4%.

Cycle-accurate Simulation

We used the Gem5 cycle-accurate simulator [34] to model EEC for a single-core high-performance processor. We added EEC to both the L1 data cache and the L2 cache of the Alpha processor, which is an out-of-order processor running at 3 GHz. In this processor, for both cache levels, the read and write latencies are independent and can take different values. Thus, if our proposed EEC technique is employed, the read access latency remains at its normal value, while the write latency is doubled (due to the additional read operation for every write access). For L1, we considered a fast cache in which both the tag check and the data access are performed in parallel, so all data blocks including the erasure check bits are read in parallel and the error recovery latency is equal to two cache accesses. In contrast, the L2 cache has a sequential architecture, so the tag checking and data access are done separately. In this case, the cache blocks inside a set are accessed one by one, so the write latency is equal to the latency of two writes and two reads. However, since the rate

of write to L2 is much lower than that of L1, the extra L2 write latency has a negligible impact on the performance and dynamic energy overheads. Also, the duration of error recovery is longer and is equal to the latency of one L2 cache access times the degree of associativity. However, it is still much faster than similar approaches, which require access to multiple cache rows. We added a block-size register to L2 to keep the intermediate data produced by XORing different cache blocks.

Table 2.3: Simulated System Parameters

Parameter          Out-of-order (Alpha)                          Leon3 (in-order)
ISA                ALPHA                                         SPARC
Functional units   2 integer ALUs, 2 FP ALUs                     1 integer ALU, 1 FP ALU
Issue width        4 instructions / cycle                        1 instruction / cycle
L1 I/D Cache       32kB split I/D, 4-way assoc, 64B blocks,      4KB I$, 16KB D$, 32B blocks,
                   3 cycles, 1-port                              1 cycle, 1-port
L2 Cache           512kB unified, 8-way assoc, 64B blocks,       no L2 cache
                   7 cycles, 1 port
Clock Frequency    3 GHz                                         800 MHz
Technology         45nm                                          45nm

We selected six benchmarks from the SPEC2000 suite to evaluate the proposed EEC approach on the Alpha processor. The benchmarks were fast-forwarded for one billion instructions as warm-up, then simulated for five billion instructions. Table 2.3 shows the details of the simulated systems.

Performance and Energy Analysis

Since our proposed technique not only reduces the number of available cache ways by one, but also increases the cache write latency, it affects the processor performance, as shown in Figure 2.14. However, as only the latencies for the write accesses are affected and not those

for read operations, the overall performance impact of our proposed technique is very small. This is due to the fact that the number of writes is much lower than the number of reads, and writes are not on the critical path of the application. Also, EEC trades off cache capacity and associativity, which mainly affects the miss rate in highly memory-intensive applications. On average, over all benchmarks, the performance is reduced by less than 2% for the L1 cache and less than 3% for the L2 cache of the Alpha microprocessor simulated in Gem5.


Figure 2.14: Performance overhead of EEC for L1 and L2 caches on Alpha processor

Based on these results, we also evaluated the energy impact of the EEC technique for the Alpha processor. To this end, we employed CACTI [116] to extract the per-access energy for read and write operations as well as the leakage power. We obtained the overall energy consumption of the microprocessor with and without EEC from the Gem5 statistics on the number of cache accesses and the runtime. Here, similar to other approaches, we consider a cache equipped with SECDED as the baseline. Since all the overheads are normalized to the baseline, the major contributors to the power and performance overheads are the extra read access for L1 and the extra read/write accesses for L2. According to this evaluation, the dynamic energy consumption increases by 3.9% and 1.7% when applying EEC to the L1 and L2 caches, respectively. In both cases, the reason for the increase in energy consumption is that EEC adds a read operation for each write access in L1, and two reads and one more write in L2. However, as leakage is dominant in today's technology nodes and the ratio of write operations to the total number of cache accesses is only around 20%, the increase in write energy has only a small impact on the overall energy consumption. Here, we do not

consider the leakage power of the erasure data storage as an explicit overhead, because our scheme does not impose extra resources on the system, and the cache ways used for erasure data directly affect the performance of the system. However, if we consider the reduced-capacity cache with reserved ways as the baseline, we may have up to 25% leakage power overhead for the 4-way L1 and 12.5% for the 8-way L2.
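The energy accounting described above can be approximated with the small model below; all per-access energies, leakage power, and access counts are invented placeholder numbers, and the only EEC-specific term is the extra read charged to every write.

def cache_energy(n_reads, n_writes, e_read, e_write, p_leak, runtime_s, eec=False):
    dynamic = n_reads * e_read + n_writes * e_write
    if eec:
        dynamic += n_writes * e_read       # read-before-write on every store
    return dynamic + p_leak * runtime_s

args = dict(n_reads=8e8, n_writes=2e8,     # ~20% of accesses are writes
            e_read=0.1e-9, e_write=0.12e-9,
            p_leak=0.5, runtime_s=1.0)
base = cache_energy(**args)
with_eec = cache_energy(**args, eec=True)
print((with_eec - base) / base * 100)      # a few percent, dominated by leakage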

Comparison with other schemes

We obtained the MBU correction coverage of different schemes, including the proposed EEC, the 2D parity cache [82], and SECDED, considering the actual MBU patterns of the employed 45 nm SRAM technology. To this end, we injected one million MBUs into an SRAM array equipped with these coding techniques and then computed the error detection and correction probabilities. When an MBU pattern spans multiple words, it is considered detected/corrected if all affected words are detected/corrected by the employed coding technique. Table 2.4 shows the results of this experiment. In this table, 2D parity cache (i) means that the technique is equipped with i horizontal and i vertical interleaved parity bits. Also, EEC(j) indicates that j horizontal interleaved parity bits are used for error detection. As can be seen in this table, all errors detected by EEC are successfully corrected by this technique, while some of the errors detected by the 2D parity cache cannot be corrected, as they affect more than one word affiliated with one redundant vertical parity line. On the other hand, the number of Detected Unrecoverable Errors (DUE) in EEC is independent of the number of parity bits and is always zero. However, in the case of the 2D parity cache, it depends on the number of parity bits used and can be up to 17%.

The area overhead of each technique is also compared in terms of the coding area, defined as the amount of extra code bits required by each scheme. As can be observed from the table, the coding area overhead of EEC is less than that of 2D coding in all cases and less than that of SECDED in the first four cases. Note that the amount of code storage in EEC depends on the cache associativity, the EEC configuration, and the level of protection. For example, in a 4-way cache with full EEC block protection, the area overhead can be as high as 25 %. Since we trade cache capacity instead of adding any extra elements (spare caches/rows) to provide the code storage, this is an implicit area overhead with much lower cost compared to similar approaches that impose explicit, costly overheads on the system.

Table 2.4: Reliability and coding area comparison of different techniques

Technique             Detection   Correction   DUE      Coding Area
SECDED                97.34%      65.95%       31.39%   12.5%
2D Parity Cache (1)   69.23%      52.33%       17.10%   3.22%
2D Parity Cache (2)   91.43%      78.27%       13.16%   6.44%
2D Parity Cache (3)   99.68%      97.37%       2.31%    9.67%
2D Parity Cache (4)   99.98%      99.37%       0.61%    12.89%
2D Parity Cache (5)   >99.99%     99.69%       0.3%     16.11%
2D Parity Cache (6)   100.00%     99.98%       0.02%    19.34%
2D Parity Cache (7)   100.00%     100.00%      0        22.56%
EEC (1)               69.23%      69.23%       0        3.12%
EEC (2)               91.43%      91.43%       0        6.25%
EEC (3)               99.68%      99.68%       0        9.37%
EEC (4)               99.98%      99.98%       0        12.5%
EEC (5)               >99.99%     >99.99%      0        15.62%
EEC (6)               100.00%     100.00%      0        18.73%
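As a rough cross-check of the EEC rows in Table 2.4, the coding area can be approximated as j parity bits per data word; the short sketch below assumes a 32-bit word purely for illustration (the word size is an assumption inferred from the table, not stated explicitly) and closely reproduces the EEC coding-area column.

    # Illustrative cross-check of the EEC(j) coding-area column, assuming the j
    # interleaved parity bits are kept per 32-bit data word (an assumption made
    # here for illustration only).
    WORD_BITS = 32
    for j in range(1, 7):
        overhead = 100.0 * j / WORD_BITS
        print("EEC(%d): %.2f%% coding area" % (j, overhead))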

Applicability to Permanent Faults

The promising features of EEC, such as reconfigurability, short recovery time, and the ability to correct sparse errors, make it readily applicable to protecting cache memories against permanent faults as well. Here, we consider the high fault rates caused by process variation in caches operating at near-threshold voltages. Using our analytical models and a fault model and bit error rates derived similarly to [63], we compared our approach with similar approaches such as 2D error coding. Our analytical results show that while 2D error coding can correct a bit error rate of 10^-5, EEC corrects up to 10^-3 without degrading the yield. Moreover, EEC error correction is much faster than 2D coding (i.e.,

about ten times), which minimizes the impact on performance and dynamic energy. We leave further detailed discussion and analysis of this use case as future work.

2.5 Summary

In the first part of this chapter, we proposed FFT-Cache, a fault-tolerant cache architecture that uses a flexible defect map to configure its architecture and achieve significant power reduction through aggressive voltage scaling, while maintaining high reliability. FFT-Cache uses a portion of the faulty cache blocks as redundancy to tolerate other faulty cache lines and blocks. This is accomplished by using either block-level or line-level replication within the same set or between two or more sets. FFT-Cache has an efficient configuration algorithm that categorizes the cache lines based on the degree of conflict between their blocks to reduce the granularity of redundancy replacement. Using our approach, the operational voltage of the cache is reduced down to 375mV in 90nm technology. This achieves 66% and 80% dynamic power reduction for the L1 and L2 caches, respectively. FFT-Cache also reduces the leakage power of the L1 and L2 caches by 48% and 42%, respectively. This significant power saving comes with a small 5% performance loss and 13% area overhead.

In the second part of this chapter, we presented embedded erasure coding (EEC) to protect cache memories against MBUs at low latency and power overheads. This scheme employs fast bit-interleaved parity coding for error detection in combination with optimal parity-based erasure coding for error correction over set-associative cache memories. The technique reserves part or all of a cache way to store the redundant check bits. Its promising features, including a simple architecture, reconfigurability, and fast error recovery, also make it an appropriate technique for protecting against permanent faults in voltage-scalable caches. We evaluated the proposed approach via architecture-level simulation of a high-performance out-of-order processor (Alpha) as well as an RTL implementation of an embedded in-order processor (Leon3). Our experimental results on both the L1 and L2 caches of the Alpha processor showed that EEC provides the same reliability with faster error recovery compared to similar approaches while incurring less than 3 % performance and 4 % dynamic energy overheads. The results on the L1 cache of the Leon3 showed no more than 1 % performance and less than 5.8 % logic area overhead.

Chapter 3

Scalable Fault-tolerant Cache Design in CMPs

3.1 Introduction

Moore's Law scaling continues to provide increased transistor counts to each processor generation. In recent generations, those transistors have primarily been devoted to increased core counts. However, the increase in transistors is not accompanied by an increased power budget. Thus, we are approaching an architectural era that some have labeled the dark silicon era [58], because we can integrate more transistors than we can afford to turn on and power at once. We prefer to think of this as an era where the tradeoff between area and power is fundamentally changed: transistors become all but free, while power is critically precious. Thus, optimizations that trade area for power are easily justified.

In modern designs, the last level cache (LLC) is typically a shared cache, configured to be large enough to effectively cover the data of all private caches, and store enough data to

protect the off-chip memory and interconnect from the vast majority of on-chip accesses for all cores. As a result, the LLC is typically the largest single consumer of silicon area and one of the largest consumers of power on the processor. This makes it a prime target for the new power/area tradeoff. This chapter describes a technique that sacrifices a moderate amount of cache area for a dramatic decrease in power.

Aggressive voltage scaling is one of the most effective techniques to reduce processor power consumption. However, aggressive voltage scaling exponentially increases the impact of process variation on SRAM cells, resulting in an exponential increase in the fault rate. This introduces a trade-off between cache yield and the minimum achievable voltage Vcc [2]. In the absence of robust error correction strategies (e.g., beyond ECC), we can tolerate few errors and have little opportunity to push down the voltage. In order to improve the cache yield and/or lower the minimum achievable voltage, several fault-tolerant mechanisms have been suggested [9, 156, 155, 46, 138, 86]. An effective fault hiding mechanism, then, can enable much more aggressive cache voltage scaling. If that mechanism keeps the cache capacity loss due to failures small, we can increase the raw capacity of the cache to keep the effective capacity the same as the original (nominal voltage) cache, enabling a dramatic decrease in power with the primary cost being increased cache area; thus, we fully exploit the new area/power tradeoff.

We show in this chapter that the efficacy of the cache fault masking strategy can have a significant effect on LLC performance and effective capacity, particularly in the case of NUCA LLCs, an inevitable direction as the number of cores and the size of the LLC grow [58, 68]. However, prior fault protection efforts have focused mostly on private L1/L2 caches in single-core processors and explore policies that do not scale well to a large multi-bank, non-uniform-latency design. In a NUCA design, access latency, network bandwidth, network power, and effective capacity are all sensitive to the exact location of the spare block (redundancy) used for fault tolerance of a faulty block, relative to the requesting core and the original requested

block.

We present RESCUE, a scalable fault-tolerant design to improve the reliability of a large multi-banked NUCA cache with non-uniform latencies, distributed banks, and multiple access points in a multi-core/many-core architecture. The RESCUE architecture uses a scalable and flexible fault remapping technique to ensure that all cache space not being used for redundant remapping remains available for storing useful data.

In RESCUE, each faulty line either serves as a repair line (target line) for one or more other lines, or retains its original mapping and is paired with a repair line to be used to store and reconstruct a complete line. We examine a number of cache line remapping policies that attempt to manage the sometimes conflicting goals of minimizing latency, minimizing network traffic, and maximizing cache hit rate (minimizing lost capacity). It is critical that the remapping policy be sensitive to the NUCA architecture, since access time scales with the maximum distance to each of the source lines, while network traffic scales with the combined distance from the requesting core and the two lines. The effective cache capacity, conversely, is maximized when we place few restrictions on the mapping policy, creating tension between our goals. Overall, however, if we can minimize lost capacity and performance overheads in the presence of frequent failures, we can moderately over-provision the cache (and possibly the interconnect) while still maintaining significant power reduction via aggressive voltage scaling.

This chapter extends REMEDIATE, our scalable fault-tolerant architecture for low-power design of shared NUCA LLCs in tiled CMPs that was first introduced in [24]. While the main goal of REMEDIATE is to develop a remapping-based fault-tolerant scheme for protecting the shared LLC in tile-based CMPs operating at low voltages, the main goal of RESCUE is to propose a design direction (aggressive voltage scaling plus cache over-provisioning) that uses different fault remapping heuristics with a scalable implementation for shared multi-bank LLCs in non-tile-based CMPs to reduce power, while exploring a large design space with multiple

dimensions and performing multiple sensitivity analyses. Moreover, we propose two new remapping policies optimized for NUCA caches in RESCUE and perform several sensitivity analyses based on them. We also modify the LLC access flow in RESCUE to remove the fault map access from the critical path of non-faulty block accesses.

This chapter examines the effectiveness of the proposed design under a wide variety of assumptions, examining different technology targets, multicore design points, cache configurations, and workload mixes. Applying RESCUE to different core-cache configurations over various technology nodes results in 5 to 60% power reduction. Also, by applying RESCUE we can scale the voltage below 400 mV and reclaim more than 55% of the lost cache capacity (without RESCUE remapping, all cache space is lost at that voltage level).

3.2 Related Work and Motivation

As we discussed in the previous chapter, there is a large body of work on fault-tolerant and error-resilient cache design. However, most of these techniques are proposed for single-core processors and rely on at most two banks and a single fault map that tracks the fault-tolerance information of both banks. Hence, these existing techniques fail to remain efficient at high failure rates in NUCA caches with a large number of banks. Wang et al. [154] proposed a utility-driven address remapping technique at a coarser grain (bank level) to tackle the capacity loss in NUCA caches of NoC-based CMP architectures.

Unfortunately, all these previous efforts are neither scalable nor flexible enough to be leveraged for protecting NUCA caches with distributed banks in CMP architectures. Since CMP architectures have more than one core that can access one or several distributed banks simultaneously, fault mapping and protection need to be scalable and distributed. Furthermore, existing fault-tolerant techniques typically use a single fault map and a centralized

fault protection scheme which cannot be scaled to large NUCA CMP architectures: as a result, each access needs to go through the fault map first, and no parallel cache accesses are allowed. Another difference in NUCA architectures is that the access latency, failure rate, and criticality of distributed banks differ depending on the location of each cache bank and the type of NUCA (SNUCA or DNUCA). Therefore, the fault-tolerant technique needs to be scalable and configurable to address these issues. However, all previous fault-tolerant methods have been designed for at most two banks within a UCA architecture and are not configurable. Moreover, they use a unified remapping technique to protect all faulty blocks without considering the differences among banks.

In contrast to previous efforts, RESCUE leverages a scalable architecture that allows efficient remapping across multiple banks in a NUCA CMP. It uses a flexible fault mapping and fault protection scheme which results in higher cache reliability. RESCUE uses a flexible fine-grained remapping technique, using graph coloring to match repair blocks/lines to a group of original sets. It allows the implementation of multiple remapping heuristics in a non-uniform multi-bank cache architecture, and this flexibility enables the architect to effectively trade off power, performance, and energy efficiency.

3.3 RESCUE

3.3.1 Baseline Architecture

Before we outline the details of the RESCUE architecture, this section describes our baseline architecture. This is because the technique is tightly connected to the actual cache and interconnect configuration. We study various core and cache configurations in different technologies to understand the limits of aggressive voltage scaling as we face ever-increasing power constraints. Each parameter is chosen to represent current design trends or project future design trends as the technology scales.

Esmaeilzadeh et al. [58] study the limits of core scaling and derive an optimal number of cores that fit within the chip's area and power budget for technologies below 22nm. We use the outcome of their study to configure our experiments across technologies (Table 3.1). For the number of cores at each technology generation, we use the results in [58], which report the optimal number of cores for the power/performance trade-off at each technology node. The optimal number of cores we assume in this work is 4, 6, 8, 16, and 32 for 65, 45, 32, 22, and 16nm, respectively. For the technology scaling model we use the ITRS roadmap projections [77] and the Predictive Technology Model (PTM) [13], which provide the voltage, area, power, and frequency scaling factors at technology nodes from 65nm to 16nm.

Our baseline LLC architecture is a tiled, shared, non-uniform access cache. We focus on the NUCA design as it is an inevitable design choice to reduce the effects of wire delays in future technologies. We assume the total LLC capacity scales with the number of cores. To study cache scaling and the appropriate LLC size for future designs, we consider empirical data from a large set of processors. As shown in Figure 3.1, an average projection for the cache size is 2MB per core. We study the average case as well as an over-provisioned cache

size with 4MB per core.

Figure 3.1: Empirical data for LLC size per core across various CMP architectures (Intel Core i3/i5/i7, AMD Opteron 4200, AMD Opteron 6200, and Intel Xeon): LLC size (MB) versus number of cores.

The baseline design assumes a Non-Uniform Cache Architecture (NUCA) last-level cache. We consider exclusive caches with parallel LLC access. Each LLC bank contains multiple cache sets, with all lines within a set (lines with the same index) mapped to the same bank. As shown in Figure 3.2, the processors and cache banks are organized into a two-dimensional mesh topology. We assume that there are I/O pads (connecting to off-chip memory) along the two vertical sides. L1 and L2 caches are kept coherent using a directory-based coherence mechanism.

We consider two NUCA baseline designs, modeling both static (CMP-SNUCA) and dynamic (CMP-DNUCA) policies [30]. The former is derived from Kim et al.'s SNUCA-2 design [81]. CMP-SNUCA statically partitions the address space across cache banks, which are connected via a 2D mesh interconnect. In static mapping, a fixed hash function uses the lower bits of a block address to select the bank. L2 access latency is proportional to the distance from the issuing L1 cache to the L2 cache bank. By allowing non-uniform hit latencies, static mapping can reduce the hit latencies of traditional monolithic cache designs, which fix the latency to the longest path [81]. Because a block can be placed into only one bank, its L2

access latency is essentially decided by its address. Dynamic mapping (CMP-DNUCA) [75] addresses the weakness of static mapping by allowing a block to go to one of multiple candidate banks. With proper placement and migration policies, DNUCA enables the cache to place frequently accessed blocks in the banks closest to the CPU.

Figure 3.2: Baseline 8-processor CMP structure (processors P0-P7 with private L1 I/D and L2 caches around a router-connected grid of shared L3 banks).

Table 3.1: Architectural Specification

Parameter                 Value
Technology                65nm, 45nm, 32nm, 22nm, 16nm
Cores                     2, 4, 6, 8, 16, 32
Issue/commit width        4 (out-of-order)
Functional units          4 int/ldst, 2 fp
L1 cache                  32KB, 4-way assoc, 2-cycle access
L2 cache (private)        512KB, 4-way assoc, 15-cycle access
L3 cache (shared)         NUCA, 1MB/bank, 2MB/core, 8-way assoc, 30-cycle access
L3 miss penalty           250 cycles
Memory                    three 1GB DDR3, 4 128-bit channels, 2 DIMMs/channel, 2 ranks/DIMM
Interconnect              2D mesh NoC (NxN), 32-byte links (2 flits/access), 1-cycle link latency, 2-cycle router, XY routing
Frequency                 3GHz
Vdd                       0.9V-0.3V

3.3.2 RESCUE Overview

We now present the RESCUE hardware framework for remapping of faulty cache sub-blocks to increase available cache capacity, which extends in several ways prior work on cache block remapping [9, 22, 86, 10]. The prior remapping techniques are mainly proposed for, and evaluated on, a uniform access cache in a single core architecture, and thus do not address several challenges arising from a large multi-bank NUCA LLC.

RESCUE divides each line of the LLC into multiple sub-blocks, which define the granularity at which failures are identified and remapped. A sub-block is labeled faulty if it has at least one faulty bit, as determined by Built-In Self Test (BIST) analysis during boot-up of the system. Thus, a small sub-block minimizes the number of bits lost due to a single fault, but increases the hardware overhead of reconstructing blocks. Instead of maintaining explicit additional cache space to replace faulty lines, we exploit lines already marked as faulty to provide redundant storage to mask faults in other lines. This has a critical advantage. While a static redundancy technique is based on worst-case estimates of faults, our approach sacrifices no more space than is absolutely needed for redundancy. This is particularly useful for a cache that might be used at multiple voltage levels, as we automatically adjust the level of redundancy to the errors manifested at a particular voltage.

RESCUE attempts to sacrifice the minimal number of cache lines and tolerate the maximum amount of defects. This is done by using line-level replication in the same set or among multiple sets. In RESCUE, the information of faulty locations in each LLC bank is kept in a Distributed Fault Map (DFM) which is then used to configure the address mapping. If two lines have faults in the same sub-block, we say they conflict and one cannot be used to mask faults in the other. When a line is detected as faulty (e.g., at boot-up using BIST), the system attempts to remap non-faulty portions of the line (Original line) to another line (called the Target line) that has already been marked as faulty and does not conflict with the

original line. It then sacrifices and disables the target line to replicate all faulty sub-blocks of the original line. Thus, the correct line can be reconstructed from a combination of the two lines (original and target).
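For illustration, the reconstruction itself reduces to per-sub-block selection between the two lines, as in the following minimal Python sketch; the line contents, sub-block count, and fault map below are hypothetical examples, not a hardware interface.

    # Sketch of line reconstruction from an original/target pair.
    # original_fmap[i] is True if sub-block i of the original line is faulty.
    # Because the pair is conflict-free, the target line holds a usable copy at
    # every position where the original is faulty.

    def reconstruct_line(original, target, original_fmap):
        return [target[i] if faulty else original[i]
                for i, faulty in enumerate(original_fmap)]

    # Example: 4 sub-blocks, sub-block 2 of the original is faulty.
    original = ["A0", "A1", "XX", "A3"]       # XX = corrupted sub-block
    target   = ["--", "--", "A2", "--"]       # disabled line replicating sub-block 2
    print(reconstruct_line(original, target, [False, False, True, False]))
    # -> ['A0', 'A1', 'A2', 'A3']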

Even without the additional complexities of the NUCA remapping introduced in this chapter, the selection of which lines to leave as original lines, which lines to scavenge as targets, and which target lines to assign to each faulty original line is a difficult optimization problem. Every time a line is disabled and marked for scavenging, it reduces the total capacity of the cache. Note that if a line has 4 sub-blocks, and assuming one faulty sub-block per faulty line, a target line could serve as a target for up to three original lines. In many cases, a good choice of target is another line in the same set (referred to as a Local target line), since both the original and target lines can be read in a single access. Barring this case, we should always select target lines from a different bank, so that both lines can be read without two serialized accesses to the same bank. We refer to this target line as a Remote target line. Because the two lines (original line and target line) are known not to have sub-block conflicts, and we know which sub-blocks contain errors, the cache controller can always reconstruct the original line using simple multiplexing logic. Figure 3.3 shows both the conventional and modified architectures of a 2-way set-associative cache with four banks. One or two levels of multiplexing are used to compose a fault-free block by choosing appropriate sub-blocks from both the original and target lines, as shown in Figure 3.3(b). Note that the highlighted blocks are added to the original design. We now describe the configuration process for the RESCUE cache.

In the RESCUE cache architecture, each cache access in low-power mode (e.g., at a voltage level expected to be prone to errors) accesses both the original bank and the DFM simultaneously. Figure 3.4 outlines the flowchart for accesses using the RESCUE cache. Based on the fault information of the accessed line, we either read one or two lines from one set in one bank (within-set remapping), or one line from the original bank and one line from the target bank

Figure 3.3: Architecture details of (a) a conventional 4-banked 2-way set-associative cache and (b) the proposed RESCUE architecture with a DFM per bank, two sub-blocks per line, and bank-level plus cross-bank multiplexing.

(remote set remapping). In the latter case, the latency of the access is the maximum of the two bank access latencies. The DFM lookup is only on the critical timing path when the original line is faulty and is repaired with a remote line. In case of a faulty line access, one or two levels of multiplexing are used to compose a fault-free block by choosing appropriate sub-blocks from both the original and target lines, as shown in Figure 3.3(b). Note that the highlighted blocks are added to the original design. The configuration process is described in the next section.

Figure 3.4: RESCUE cache access flowchart (DFM lookup in low-power mode, followed by an original-bank access for within-set remapping or parallel original- and target-bank accesses for remote remapping, then bank-level or cross-bank multiplexing; misses go to main memory).
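The decision sequence of Figure 3.4 can be sketched roughly as follows; this is a simplified software model (bank contents and the DFM are plain dictionaries, and all names and values are illustrative assumptions), not the actual controller logic.

    # Sketch of the low-power-mode LLC access decision flow described above.

    banks = {0: {"A": ["A0", "XX", "A2", "A3"]},   # original bank (sub-block 1 bad)
             1: {"A": ["--", "A1", "--", "--"]}}   # remote bank holding the target line

    dfm = {"A": {"faulty": True, "within_set": False,
                 "target_bank": 1, "fault_bitmap": [0, 1, 0, 0]}}

    def llc_access(line_addr, home_bank, low_power_mode=True):
        entry = dfm.get(line_addr) if low_power_mode else None
        if entry is None or not entry["faulty"]:
            return banks[home_bank][line_addr]          # normal access path
        orig = banks[home_bank][line_addr]
        tgt_bank = home_bank if entry["within_set"] else entry["target_bank"]
        tgt = banks[tgt_bank][line_addr]                 # both reads can proceed in parallel
        # compose the fault-free line via bank-level / cross-bank multiplexing
        return [t if f else o for o, t, f in zip(orig, tgt, entry["fault_bitmap"])]

    print(llc_access("A", home_bank=0))                  # -> ['A0', 'A1', 'A2', 'A3']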

RESCUE DFM Configuration

Initially, a raw defect map is generated at boot time; using the memory BIST unit, the LLC is tested under low voltage conditions. The output of the BIST is used to initialize the DFM for each LLC bank. If there are multiple operating points for different combinations of voltage, temperature and frequency, the BIST operation is repeated for each of these settings. The obtained defect map is then modified and updated with remapping data to enable the RESCUE operation. Updating the DFM is done at full voltage, using a graph coloring algorithm that we explain next. In addition, in order to protect the defect map and

the tag arrays, we use the well-studied 8T SRAM cells [44], which scale Vdd down to 300 mV without failing but incur about 30% area overhead for these relatively small arrays.

The DFM of each bank has one entry for each cache set inside that bank. Each DFM entry includes multiple configuration bits, the defect map of each line in the set, the status of each line, and the address of the target line, if any. Each bit in the defect map section represents the fault state of one sub-block. The line status bits represent the status of each line in the cache set. The status of each cache line can assume one of the following: 1) Non-Faulty, 2) Faulty, 3) Global, 4) Local Target, and 5) Remote Target. Initially, the status of all lines in the cache is Non-Faulty, representing the absence of any faulty sub-blocks. If a line contains at least one faulty sub-block, its status becomes Faulty. A line used as a target line for other lines in the same set gets the status of Local Target. A line that conflicts with other lines in a set and cannot be used as a local target line gets the status of Global. A Global line can be used as a target line of another set, and then becomes a Remote Target. We define the Max Global Line (MGL) parameter as the maximum number of lines in a set that can be set as Global lines; the remaining lines can then be composed as a group of lines without conflict (no-conflict), which allows them to find a Global line and set it as their Remote Target line.
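A minimal software model of one DFM entry, following the field description above, might look like the sketch below; the exact field names, widths, and encodings are illustrative assumptions rather than the implemented format.

    # Sketch of one DFM entry (one per cache set per bank). Field names and
    # widths are illustrative, not the actual hardware layout.

    from enum import Enum
    from dataclasses import dataclass
    from typing import List, Optional

    class LineStatus(Enum):
        NON_FAULTY = 0
        FAULTY = 1
        GLOBAL = 2         # conflicting line, usable as a remote target elsewhere
        LOCAL_TARGET = 3   # sacrificed to repair lines in the same set
        REMOTE_TARGET = 4  # sacrificed to repair lines in another set/bank

    @dataclass
    class DFMEntry:
        defect_map: List[List[bool]]        # per line: one bit per sub-block
        status: List[LineStatus]            # per line
        target_addr: Optional[int] = None   # identifier of the remote target, if any
        config_bits: int = 0                # miscellaneous configuration bits

    # A 2-way set with one faulty sub-block in way 0, repaired locally by way 1:
    entry = DFMEntry(defect_map=[[False, True], [True, False]],
                     status=[LineStatus.FAULTY, LineStatus.LOCAL_TARGET])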

We use graph coloring to optimize the selection of a remote target line for stages 3 and 4 of the DFM configuration algorithm (Figure 3.6) [85]. Graph coloring is an NP-complete problem, and there are many heuristics and transformation algorithms that make solving it feasible. We use a heuristic graph-coloring algorithm which is a modification of Saturation Degree Ordering (SDO) [120]. We call a graph coloring problem solvable if for a graph G we can find an integer K > 0 such that the nodes of G can be colored with K colors while no edge exists between nodes of the same color. We construct a graph based on the conflicts between lines of different entries in all DFMs. Each node in this graph represents either a Global line (line node) or a faulty entry (representing an entire set) from the low-conflict or high-conflict groups (set node) in the DFMs. The edges represent a conflict between a pair of lines or

between a line and a set. For example, two nodes connected by an edge may represent two lines, two sets, or one line and one set that conflict. Note that it is not possible to set one of these nodes as a Remote Target line for the other.

Begin DFM initialization algorithm
1. Run BIST to find faulty cache lines at sub-block granularity and fill the defect map section of each set (DFM entry).
2. For lines with at least one faulty sub-block, set the status of the line to Faulty; for lines with no faulty sub-block, set their status to Non-Faulty.
3. For sets with at least one faulty line, mark their status as faulty.
4. Based on the conflicts between faulty lines in each set, categorize the sets into four groups:
   - min-faulty if the number of faulty lines in the set is lower than the predefined MGL
   - no-conflict if there is no conflict between the faulty lines in the set
   - low-conflict if the number of conflicts between the lines within the set is lower than the predefined MGL
   - high-conflict if the number of conflicts between the lines within the set is higher than the predefined MGL
End DFM initialization algorithm

Figure 3.5: DFM initialization algorithm.

We modify the above graph coloring algorithm based on the following constraints: 1) We force the algorithm to color nodes from at least two different banks. 2) We force the algorithm to first pick up line nodes and try to color them with set nodes of other banks.

We apply the modified graph coloring algorithm to our graph to find a solution such that neighboring nodes are not assigned the same color. Therefore, after completion of the coloring algorithm, nodes with the same color are guaranteed to have no edges between them, implying that the corresponding cache sets/lines have no conflicts between them. We set all nodes with the same color in a group and try to set one of them as Remote Target for other nodes in the group. As a result, if a line node has the same color as one or more set nodes in other banks, we set it as their Remote Target line. If a set node has the same color as one or more

set nodes in other banks, we set all of its lines as Remote Target lines for the lines in those other sets.

Begin DFM configuration algorithm
Traverse the faulty entries of the DFM:
1. For the entries in the min-faulty group, simply set the status of the faulty lines to Global.
2. For the entries in the no-conflict group, make one of their faulty lines a Local Target for the other faulty lines in the same set.
3. For the entries in the low-conflict group, set the status of the line that conflicts with other lines to Global. For the remaining lines in the set (which have no conflict), find a Global line from a remote bank and set it as their Remote Target.
4. The entries in the high-conflict group are divided into sub-groups (according to the graph coloring algorithm). For the entries in each sub-group, select one of them and make all of its lines Remote Targets for the other entries in the sub-group.
End DFM configuration algorithm

Figure 3.6: DFM configuration algorithm.
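For illustration, the grouping step can be sketched with a plain greedy coloring of a small conflict graph, as below; this omits the SDO ordering and the bank constraints described above, and the node names are hypothetical.

    # Minimal sketch of conflict-graph coloring for remote-target assignment.
    # Nodes are Global line nodes or faulty set nodes; an edge means the two
    # cannot repair each other. Plain greedy coloring, not the full heuristic.

    def greedy_color(nodes, edges):
        """Assign each node the smallest color unused by its colored neighbors."""
        adj = {n: set() for n in nodes}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        color = {}
        for n in nodes:                       # SDO would reorder nodes dynamically
            used = {color[m] for m in adj[n] if m in color}
            c = 0
            while c in used:
                c += 1
            color[n] = c
        return color

    # Toy conflict graph: Global line L1 conflicts with set S2 but not S1 or S3.
    nodes = ["L1", "S1", "S2", "S3"]
    edges = [("L1", "S2"), ("S1", "S2")]
    coloring = greedy_color(nodes, edges)

    # Sets sharing a color with a Global line node can adopt it as Remote Target.
    for s in ("S1", "S2", "S3"):
        if coloring[s] == coloring["L1"]:
            print(s, "-> remote target = L1")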

Reconstructing Cache Lines

Figure 3.3(b) illustrates the architecture of the RESCUE cache controller, which is responsible for reconstructing correct lines. The new modules (MUXes and DFM) added to the conventional cache are highlighted in the figure. The additional multiplexer network allows us to compose the final fault-free line from any of the sub-blocks in any of the cache ways in any of the four banks, based on the DFM data. Note that two levels of multiplexing are added to compose the final fault-free line, based either on multiple lines within a set or on two or more sets in different banks. The first layer is added on the cache bank side to perform the within-set remapping (bank-level multiplexing). Note that this layer replaces the original block-level multiplexer available in set-associative caches. The second layer is added on the core side to perform remapping between sets in different banks (cross-bank multiplexing).

The total number of n-to-1 and 2-to-1 multiplexers required to compose the final fault-free line is k×b and k×c, respectively, where k is the number of sub-blocks in each line, n is the number of ways (set associativity), b is the number of banks, and c is the number of cores. For instance, for an 8-core processor with a 4-bank, 4-way set-associative cache with 64-byte blocks composed of 32 16-bit sub-blocks, a total of 384 (128 4-to-1 and 256 2-to-1) 16-bit multiplexers are required. For the described cache, the total DFM size is less than 7% of the total cache area. For a quantitative comparison, we synthesized the multiplexing layer and output logic of RESCUE, as well as the multiplexer and output driver of a conventional cache (CC), using the Synopsys Design Compiler with a TSMC 45nm standard cell library for a 16MB cache. The area and delay of the various multiplexers are used to estimate the overall area/delay overhead of the multiplexing network in RESCUE and the CC under nominal Vdd. We found that the delay of the RESCUE cache output logic increased by only 5% compared to the conventional cache output multiplexing network, while area and power consumption increased by only 2%. These overheads are all modeled in our results.
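The multiplexer count for the example above follows directly from the k×b and k×c expressions, as this trivial check shows.

    # Quick check of the multiplexer count formula: k*b n-to-1 plus k*c 2-to-1.
    k, n, b, c = 32, 4, 4, 8                    # sub-blocks/line, ways, banks, cores
    n_to_1 = k * b                               # 128 4-to-1 multiplexers (n = 4 ways)
    two_to_1 = k * c                             # 256 2-to-1 multiplexers
    print(n_to_1, two_to_1, n_to_1 + two_to_1)   # 128 256 384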

Remapping Comparison with Recent Techniques

In order to illustrate the effectiveness of RESCUE remapping for fault recovery, we compare its fault remapping ability against other state-of-the-art remapping techniques proposed to improve the fault tolerance of on-chip cache in low voltage operation. These techniques include RDC-Cache [138], Salvage Cache [86], Ansari’s scheme in [10], and Archipelago [9].

To compare these techniques with RESCUE, we report in Figure 3.7 the effective cache size for each of these techniques across various voltage points. We use failure rates based on PTM models in 45 nm [13]. The results reported in Figure 3.7(a) are based on a 1000-run Monte Carlo simulation for an 8MB 8-way set-associative LLC with two banks and a 64-byte block size. For a fair comparison, for all these techniques including RESCUE, we assume the fault map area overhead is at most 7% of the cache size (the area overhead of RESCUE with

16-bit sub-blocks). Based on this assumption, and to meet the overhead limit, the Archipelago algorithm selects a 32-bit sub-block size. The sub-block sizes for RDC-Cache and Salvage Cache are 128 bits and 64 bits, respectively. The results show that in the ultra-low-voltage region (below 400 mV), as the failure rate increases, the more flexible methods like Archipelago and RESCUE result in smaller cache capacity loss.

Figure 3.7(b) presents the results for the same cache configuration but with double the number of banks (four banks in total) to evaluate the effect of increasing the number of banks on the effective cache size. Some techniques, like RESCUE, benefit from increasing the number of banks while others, like RDC-Cache, do not. In fact, for RDC-Cache and Salvage Cache the effective cache sizes for the two-bank and four-bank cases are fairly similar. For Salvage Cache this is mainly because additional banks do not provide any more opportunity for remapping; note that Salvage Cache relies only on intra-bank remapping. RDC-Cache relies on a chain topology among banks and therefore can at most apply remapping between two banks. For Ansari and Archipelago the improvement is small when moving from two to four banks, only a 5% effective cache size improvement across different voltage points. With RESCUE we achieve up to 13% improvement in effective cache size compared to the two-bank case. In addition, in the Ansari and Archipelago techniques, the fault map size depends on the fault rate and would increase noticeably as we reduce the voltage below 375mV.

At voltage points below 375mV, to achieve the same effective cache size as RESCUE, Ansari and Archipelago need to rely on a sub-block size of 4 bits or lower, which incurs an overhead of more than 20%. This is because the fault map size of Archipelago and Ansari is proportional to the number of original sets in a group that share the same target set. Because of such overheads, Archipelago places an upper bound on the maximum clique size, which makes it less effective for large caches with more than two banks. RESCUE, however, addresses these scalability issues. The fault map is independent of the fault

rate and is distributed to support multiple access points in a NUCA LLC with distributed banks.

RESCUE provides noticeably improved flexibility, allowing all banks to participate in remapping. This has an additional benefit: it also allows us to tune RESCUE to account for the difficult tradeoffs presented by a multi-banked non-uniform access cache. Running the LLC at an aggressively low voltage and frequency can have significant power advantages, but there is a cost (both in latency and in lost capacity). When errors are frequent, bank accesses increase, interconnect traffic increases, and cache access latency increases as well. A good block remapping strategy should minimize these costs. This is the topic of the next section.

3.3.3 RESCUE Remapping Policies

RESCUE leverages already faulty cache lines for redundancy to replace faults in other lines. In this way, the RESCUE cache automatically adapts the level of redundancy (and as a result, the amount of lost cache space) to the actual rate of failure. In a uniform cache architecture (UCA) with multiple banks and uniform latency, the baseline fault-tolerant cache configuration method presented in the previous section would work well. In such a scenario the choice of the RESCUE cache location (specifically, the bank for the target line) would have little impact on latency or interconnect bandwidth. However, as core counts and LLC sizes continue to grow, unified LLC designs become unrealistic. A NUCA design allows access time to scale with the average (or better) distance to cache banks, rather than the maximum distance, and enables use of a scalable interconnect to reach the cache banks.

There are several challenges to applying RESCUE in a distributed multiple-access-point NUCA environment. The mapping of lines to cache banks in a NUCA architecture has a significant impact on performance. Adding RESCUE remapping on top of that adds yet another dimension to the problem. Thus, the RESCUE remapping policy must balance

Figure 3.7: Effective cache size (%) versus Vdd (mV) for RDC-Cache, Salvage Cache, Ansari [ISLPED'09], Archipelago, and RESCUE with (a) a two-bank and (b) a four-bank LLC.

several conflicting goals, including the desire to:

1. Minimize the maximum distance from core to bank. Cache latency will be determined by the maximum distance to the original and repair (target) banks. We cannot always predict what core will access the data, but if the repair bank is near the original bank, then it is more likely the maximum distance will not be inflated significantly.

2. Minimize the total distance from core to banks. Power and traffic congestion (and, as a result, latency) are minimized by minimizing the total number of hops from the requesting

core to each of the two banks. If we assume a good NUCA mapping (bank close to the requesting core), then remapping to a nearby bank likely minimizes the total hop count. When the NUCA mapping is not highly effective (e.g., SNUCA), remapping to a central bank can minimize the average hop count to the target line.

3. Maximize total cache capacity. The more freedom we have to select target lines, and the original lines that map to those repair (target) lines, the fewer total lines will be sacrificed.

Since the cache bank access pattern depends on the application that each core is running and dynamically changes with program behavior at run-time, a one-solution-fits-all approach may not deliver optimal results. Instead, we propose several heuristics for RESCUE remapping policies aimed at achieving a different balance of these goals. First, we design two algorithms that recognize the difference between SNUCA and DNUCA architectures – namely, the fact that with DNUCA you can assume the current mapping is a good one, while with SNUCA you cannot. The third policy attempts to minimize traffic by exploiting adjacency and the fourth places highest priority on preserving cache space.

DNUCA Optimized Policy (Minimum Latency - Moderate Capacity - Moderate Traffic)

This remapping scheme is targeted at the Dynamic NUCA architecture. If the baseline DNUCA cache placement strategy is effective, data will be gradually promoted to banks closer to the cores that access it. We refer to these banks as core-neighbor banks and they are highlighted in Figure 3.8(a). As a result, the current location of a cache line becomes an indicator of what core is likely to access it. More importantly, the banks that are nearest the cores are more critical to performance than those in the middle. Therefore, in this scheme, we give highest priority to finding a repair (target) line in banks other than the core-neighbor banks. This preserves space in the edge banks, sacrificing space in the center.

Figure 3.8: Example showing (a) the DNUCA optimized scheme with remapping limited to one bank with 1 hop, (b) the SNUCA optimized scheme with remapping limited to one bank with 1 hop, (c) the adjacent mapping scheme, and (d) the global mapping scheme (original banks and target banks are marked in each mesh).

Figure 3.8(a) shows an example of remapping carried out using this scheme where the faulty lines in the core-neighbor banks are remapped to a target line in other (central) banks. In addition, for banks other than the core-neighbor banks, the priority is given to remap their faulty lines to each other as shown in the figure (preferably with one hop to minimize the latency).

SNUCA Optimized Policy (Moderate Latency - Moderate Capacity - Moderate Traffic)

This remapping scheme assumes a Static NUCA architecture. In SNUCA the original mapping of data into banks is statically determined, based on the block index. In such architectures, access to the address space and all memory banks is the same for all cores. Therefore, unlike DNUCA, we have no information to indicate which core is likely to access a cache line. But there is an asymmetry we can exploit in an n-by-n grid, as the average distances from all cores to a bank are not the same. The central banks shown in the figure, which have the minimum average distance from all cores, are more important than the peripheral ones. Thus, in this scheme we seek to preserve cache space in the center of the cache.

Adjacency still matters in this scheme, because the average maximum distance to two banks is always minimized when those two banks are adjacent. However, this policy is more focused on minimizing the total number of hops (decreasing power and interconnect traffic). Figure 3.8(b) shows an example of remapping carried out using this scheme, with adjacency defined as 1 hop. As shown in the figure, in this scheme preference is given to remapping the faulty lines in the central 8 banks to their adjacent outer banks. In addition, for banks other than the central banks, priority is given to remapping their faulty lines to each other, as shown in the figure (again preferably with one hop).

Adjacent Mapping Policy (Moderate Latency - Moderate Capacity - Minimum Traffic)

In this remapping policy, the highest priority for placing the target line is given to adjacent banks. By doing so, it minimizes the maximum latency to either bank. Otherwise, it is less constrained than the prior heuristics. Figure 3.8(c) shows a simplified example involving only one bank, and again defining adjacency as one hop.

Global Mapping Policy (Maximum Latency - Maximum Capacity - Maximum Traffic)

All of the other remapping policies potentially sacrifice cache capacity (i.e., prevent the technique from finding the optimal mapping by constraining it) for latency or traffic considerations. However, when lost capacity induces a miss, it can have a far greater impact on both performance and power than a suboptimal line repair mapping. This remapping policy therefore imposes no location restrictions that would constrain the mapper. It still gives priority to adjacent banks, and only if a target line is not found in adjacent banks does it examine other banks. Figure 3.8(d) shows that any bank can be a source for a repair line of any other bank.
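To summarize the four policies side by side, the following simplified sketch orders candidate target banks for a given faulty bank; the mesh size, the choice of edge rows as the core-neighbor banks, and the strict 1-hop limit for the adjacent policy are illustrative assumptions, not the exact heuristics.

    # Simplified candidate-bank ordering for the four remapping policies above.
    # Banks are (x, y) mesh coordinates; edge rows stand in for the
    # core-neighbor (peripheral) banks and the remaining rows for the central banks.

    def hops(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def candidates(bank, banks, policy, core_neighbor):
        by_distance = sorted((b for b in banks if b != bank),
                             key=lambda b: hops(bank, b))
        if policy == "adjacent":
            return [b for b in by_distance if hops(bank, b) == 1]
        if policy == "global":
            return by_distance                   # adjacent first, then any bank
        if policy == "dnuca":
            # sacrifice central banks first, keep core-neighbor (edge) banks intact
            return sorted(by_distance, key=lambda b: b in core_neighbor)
        if policy == "snuca":
            # sacrifice peripheral banks first, keep central banks intact
            return sorted(by_distance, key=lambda b: b not in core_neighbor)

    banks = [(x, y) for x in range(4) for y in range(4)]
    core_neighbor = [b for b in banks if b[1] in (0, 3)]   # top and bottom rows (assumed)
    print(candidates((0, 0), banks, "dnuca", core_neighbor)[:3])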

3.3.4 Evaluation

Simulation Setup

We use the M5 simulator [35] for our performance simulations. The M5 simulator supports four different CPU models to provide platforms for detailed cycle-accurate simulation. Memory instructions are modeled through the detailed memory hierarchy and network model listed in Table 3.1. The memory hierarchy uses a two-level directory-based MESI cache coherence protocol. To simulate the behavior of the NUCA architecture and the memory network accurately, we have extended M5 with the HORNET [98] NoC simulator, which simulates NoC architectures with high accuracy and provides detailed latency, network traffic, and power results. To enable power analysis, HORNET combines a dynamic power model based on ORION 2.0 with a leakage power model. This tool-set configuration provides a detailed memory-system model that enables us to model the NUCA cache architecture. Furthermore, it accurately models the network contention introduced by the

simulated mechanisms.

For the choice of core we study a high-end architecture which is an aggressive out-of-order Alpha ISA superscalar processor with issue width of 4, deep pipelining, and a complex branch predictor. Each core contains a split first-level (separate data and instructions) and a private second-level cache. The third level of the memory hierarchy is the NUCA cache. Table 3.1 summarizes the configuration parameters used in our studies.

We need a model of process variation; otherwise our SPICE models would show every cell failing at the same voltage level. To model the read/write time distribution resulting from process variation, we model an independent Gaussian distribution characterizing the Vth fluctuations of each transistor [133]. The circuit under test is a standard six-transistor SRAM bit cell. The SPICE models used for the simulation were obtained from the Predictive Technology Model (PTM) [13]. Due to the random and uniform distribution of Vth, it is expected, and verified by Monte Carlo simulation, that the read/write time of the cache follows a Gaussian-like distribution. However, as the supply voltage to the memory changes, the characteristics of the read/write distribution are expected to change. We repeated 1K Monte Carlo simulations for each voltage point. By fitting the points obtained at each voltage to the closest Gaussian distribution, we produce the expected mean and standard deviation.
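The per-voltage fitting step can be sketched as below; the sample distribution, the cycle-budget threshold, and the conversion to a failure probability are illustrative assumptions standing in for the SPICE data, since the exact failure criterion is not spelled out here.

    # Sketch of the per-voltage fitting step: fit a Gaussian to Monte Carlo
    # read-time samples via the sample mean and standard deviation, and (as an
    # additional illustrative assumption) treat reads slower than the cycle
    # budget as failures. All numbers are placeholders, not SPICE results.

    import random, statistics
    from math import erf, sqrt

    random.seed(0)
    samples = [random.gauss(1.0, 0.12) for _ in range(1000)]   # stand-in for 1K MC runs
    mu, sigma = statistics.mean(samples), statistics.stdev(samples)

    t_budget = 1.3                                             # normalized cycle budget (assumed)
    p_fail = 0.5 * (1 - erf((t_budget - mu) / (sigma * sqrt(2))))  # P(read_time > t_budget)
    print(mu, sigma, p_fail)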

The access latencies of the memory components are based on models built with the CACTI 6.5 [116] modeling tool, the first version of CACTI that enables NUCA caches to be modeled. We also used CACTI 6.5 to obtain the static power and dynamic energy per cache bank access. The dynamic energy consumed by the NUCA cache within the chip was modeled with the HORNET tool-set, which uses the Orion simulator to determine the energy per bank access, the energy required to transmit a flit on a network link, and the energy required to switch a flit through a network switch. The extra network traffic introduced by our proposal is also taken into account and accurately modeled in the simulator.

Table 3.2: Workload mix; SPEC2006 benchmarks are denoted with 06

8-Thread Workload   Benchmarks
WL1   art 470, sjeng 06, perlbench 06 checkspam, mesa, povray 06, omnetpp 06, eon rushmeier, swim
WL2   bwaves 06, leslie3d 06, omnetpp 06, mesa, libquantum 06, vortex 3, crafty, perlbench 06 checkspam
WL3   crafty, h264ref 06 sss encoder main, hmmer 06 nph3, gap, lbm 06, parser, gcc 06 typeck, libquantum 06
WL4   galgel, apsi, lucas, gzip log, bwaves 06, leslie3d 06, mgrid, namd 06
WL5   milc 06, gcc 06 typeck, gzip log, cactusADM 06, gap, lbm 06, gromacs 06, povray 06
WL6   soplex 06 ref, fma3d, art 470, eon rushmeier, h264ref 06, facerec, vpr route, lucas
WL7   sphinx3 06, bzip2 source, mgrid, parser, gobmk 06 nngs, galgel, equake, astar 06 rivers
WL8   sphinx3 06, facerec, mcf 06, sixtrack, astar 06 rivers, applu, fma3d, gromacs 06
WL9   vortex 3, milc 06, equake, bzip2 source, vpr route, cactusADM 06, gobmk 06 nngs, soplex 06 ref

Workload characterization

We compose multi-program workloads with P threads, where P equals the number of cores in each CMP architecture we study (2- to 32-core CMP architectures). The applications are selected from the SPEC2K and SPEC2006 benchmark suites, choosing representative sets of memory-intensive, compute-intensive, and mixed workloads as shown in Table 3.2. A total of 32 different workloads are selected for each CMP architecture. For each application in a workload we fast-forward 2 billion instructions to skip the initialization phase and then simulate until all threads execute 2 billion instructions.

Design Space Exploration

We study various LLC NUCA network design parameters and configurations and show how RESCUE improves fault tolerance across them. The

main design parameters considered in our study include the remapping policy, the SRAM bit failure rate at various operating voltage levels, and the memory network configuration. The major network configuration parameters we consider are the number of virtual channels (VC), the number of crossbar ports (CP), and the bandwidth (BW) of router links. We assume XY routing and study three network configurations based on the router parameters: 1) Low-Performance (VC=2, BW=1X, CP=1), 2) Moderate-Performance (VC=4, BW=2X, CP=2), and 3) High-Performance (VC=8, BW=4X, CP=4). We run our simulations for six different fault rates, the four proposed remapping policies, and the three network configurations. For each technology node, the configuration with the optimal number of cores is studied. We also examine multiple cache sizes. We show results for both SNUCA and DNUCA LLC architectures.

RESCUE Overheads

Figure 3.9 summarizes the overheads of RESCUE for an 8-core CMP architecture with a 16 MB (16-bank) LLC in 45nm technology, relative to an LLC without redundancy support. The power overhead in this figure is for the nominal Vdd (700mV). We account for the overheads of using 8T SRAM cells [44] to protect the tag and defect map arrays in low-power mode. We assume clock gating and power gating are applied in the DFM array. Therefore, the main source of dynamic power overhead at nominal Vdd is the bypass MUXes; as shown in the figure, it is less than 1%. The fault map is the major component of the area overhead; the total area overhead is less than 9%. In RESCUE, the DFM and the multiplexing layer are on the critical path of an LLC access. Based on our discussion in Section 3.3.2 and our timing analysis, we account for 2 additional cycles of LLC access latency overhead in low-power mode.

Figure 3.9: Area, leakage power, and dynamic power overheads of RESCUE for an 8-core CMP with 16 LLC banks in 45nm, broken down into fault map (8T), miscellaneous logic, and tag array (8T) contributions.

3.3.5 Quantitative Comparison to Alternative Techniques

In order to illustrate the benefits of our design, we quantitatively compare RESCUE with the baseline 6T SRAM cell, two recent multi-bit ECC-based techniques (2D-ECC [82] and MS-ECC [46]), three recent remapping-based techniques (Ansari [10], FFT-Cache [22], and Archipelago [9]), and three other state-of-the-art schemes (Bit-Fix [156], the 10T SRAM cell [38], and ZerehCache [8]). Table 3.3 summarizes this comparison, in 45nm, based on the minimum achievable Vdd, cache area overhead, power overhead, normalized IPC, and normalized power. To have a fair comparison, the remapping configurations and coding granularities are set so that the coding overheads of the ECC-based techniques are comparable to the disabled capacity of the other methods (i.e., 25%). In this table, the different techniques are sorted by their minimum achievable Vdd when targeting 99.9% yield for on-chip caches.

Overall, RESCUE achieves the lowest operating voltage (360mV) and the highest power reduction compared to all other techniques. The closest techniques to ours are Archipelago, FFT-Cache, and the 10T cell. However, the 10T cell incurs a 66% area overhead and a 24% power overhead, which are much higher than the overheads of our method. Compared to FFT-Cache and Archipelago,

Table 3.3: Comparison of different cache protection schemes

Scheme        Vdd-min (mV)   Area over. (%)   Power over. (%)   Norm. IPC   Power norm. to RESCUE
6T cell       660            0                0                 1.0         4.51
2D-ECC        470            6.5              15                0.96        1.82
MS-ECC        440            6                10                0.90        1.61
ZerehCache    430            10.7             12                0.97        1.48
Bit-Fix       420            8                20                0.89        1.41
Ansari        410            15               8                 0.96        1.33
10T cell      380            66               24                1.0         1.24
FFT-Cache     375            10               8                 0.95        1.21
Archipelago   370            12               7                 0.95        1.18
RESCUE        360            9                8                 0.95        1.0

RESCUE can achieve a lower Vdd and higher power savings with lower overheads. Overall, the scalability of RESCUE for large shared NUCA caches, along with its efficient remapping, high configurability, and inherent flexibility, allows it to tolerate higher failure rates in large CMPs than other similar techniques.

3.3.6 Experimental Results

In this section we evaluate the impact of the RESCUE remapping policies on performance, power, and energy in the presence of aggressive voltage scaling. We also study whether cache over-provisioning can further change the power/performance trade-offs when RESCUE is applied. We consider three types of workloads: memory-intensive (MEM-WL), CPU-intensive (CPU-WL), and mixed (MIX-WL). Our design space includes 26 different CMP core-cache configurations from 2 cores to 32 cores over five technology nodes from 65nm to 16nm. For each workload we report performance in terms of weighted speedup [147], normalized arithmetic-mean IPC, and LLC network traffic statistics. For the weighted speedup measurement we use, as the baseline, the CMP architecture in which the LLC operates at nominal Vdd with no fault-tolerance mechanism. We also report

the total power consumption (leakage plus dynamic) of the LLC memory and NoC structures. In the figures of this section we abbreviate the different RESCUE mapping techniques as: AR = Adjacent Remapping, GR = Global Remapping, SN = SNUCA Optimized, and DN = DNUCA Optimized.
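As a reference for how the speedup metric is computed, the sketch below uses a common definition of weighted speedup (the sum of per-thread IPC ratios against the baseline run); the IPC values are placeholders, and the baseline is the nominal-Vdd, non-fault-tolerant LLC configuration.

    # Sketch of the weighted-speedup computation with placeholder IPC values.

    def weighted_speedup(ipc_config, ipc_baseline):
        """Sum of per-thread IPC ratios relative to the baseline run."""
        return sum(c / b for c, b in zip(ipc_config, ipc_baseline))

    ipc_base   = [1.2, 0.8, 1.5, 0.9]     # per-thread IPC, baseline (assumed)
    ipc_rescue = [1.1, 0.8, 1.4, 0.85]    # per-thread IPC with RESCUE at low Vdd (assumed)
    print(weighted_speedup(ipc_rescue, ipc_base))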

Sensitivity to Voltage

In this section we study different design points on an 8-core high-performance processor at 32nm with a 16MB SNUCA LLC that uses Moderate-Performance (MP) routers for its cache network (we refer to this as the base CMP model). We study a voltage range of 0.7V to 0.3V. We refer to the voltages between 0.7V and 0.5V as the high range, 0.5V to 0.4V as the middle range, and 0.4V to 0.3V as the low range. RESCUE remapping creates complex interactions and trade-offs between network traffic, useful cache size, cache hit rate, power, energy, and performance. This section details a large suite of experiments aimed at understanding those trade-offs and identifying where RESCUE is most effective. Figure 3.10 shows the effective cache size after applying RESCUE. As shown, for high probabilities of failure (low operating voltages) the GR policy is far more effective than all other policies in preserving cache capacity. For instance, at 0.4V, the GR policy reclaims more than 55% of the lost cache capacity (without remapping, all cache capacity is lost). GR mapping imposes no location restrictions that would constrain the RESCUE mapper; therefore more cache capacity can be saved, as more candidates for remapping are available to the mapper.

RESCUE Policies

Figure 3.11 shows the sensitivity of the RESCUE policies to different workloads for the base CMP model, and reports results along several dimensions of interest. The figure shows


Figure 3.10: Effective cache size after applying the RESCUE policies (AR, GR, SN, DN, and no policy) across Vdd.

multiple performance and power results in terms of the number of LLC accesses (1st row), average packet latency (2nd row), weighted speedup (3rd row), normalized IPC (4th row), total power consumption (5th row), and energy-delay product, or EDP (6th row). A clear trend seen in the first two rows is the increasing LLC access rate and average packet latency as the voltage scales down below 0.60V. For high-range voltages, remapping is rarely necessary, while for the low range, so much of the cache is disabled that remapping is not much help. In the middle range, we see that remapping is active, resulting in higher traffic rates and a small increase in average latency. Note that while LLC latency is unaffected at very low voltages, the average memory access time increases significantly, as the number of LLC misses rises severely. In the 2nd row of Figure 3.11, the SN policy, in general, has the lowest average packet latency over all three types of workload and across all voltages.

The 3rd and 4th rows show the effect of the RESCUE policies on performance in terms of weighted speedup and IPC. The trend seen in these rows is that IPC and speedup drop sharply as the voltage scales below 0.5V: at these voltages, a large percentage of LLC accesses are now misses and require main-memory access. The 0.5V point is particularly interesting, because performance does not drop off considerably, despite the fact that this is the range where cache block failures become common and remapping activity starts to become high. The GR policy gives the highest performance in this important mid-voltage range, and effectively extends the range over which high performance loss is avoided. Since the GR policy loses the least capacity among the RESCUE policies, it has the lowest impact on performance parameters such as speedup and IPC.

Figure 3.11: Sensitivity studies of RESCUE policies across various workloads.

A clear observation from the 5th and 6th rows is that, once again, the power and the EDP are sensitive to the RESCUE policies mainly in the middle-range voltages. Compared to all other policies, GR consumes the most power in the middle-range voltages, since it incurs a higher LLC access rate. Despite this, because of its strong performance, the GR policy maintains the lowest EDP. In fact, we see that this policy enables us to maintain EDP near its minimum value even as we extend into aggressive voltage-scaling regions.

Network Configuration

By trading off aggressive voltage scaling and cache failures against increased network traffic, the effectiveness of RESCUE mapping is sensitive to the available bandwidth. To analyze the impact of the network configuration on RESCUE, we report the results of our base CMP model with three different router configurations: HP, MP (our baseline), and LP, as explained in Section 3.3.4. Figure 3.12 shows the effect of router configuration on performance and power for the GR policy with the MEM-WL workload. As we lower the voltage, we observe a smaller performance impact in the HP and MP network configurations (Figure 3.12b). For middle-range voltages, the gap in average packet latency between the low-performance and high-performance networks widens. This is expected, since in this voltage range applying the RESCUE remapping policies results in the highest network traffic (Figure 3.12a); hence a low-performance network can degrade performance. The power and EDP results are reported in Figures 3.12c and d, respectively, and indicate that the high-performance router has higher power dissipation and EDP across all voltages. Despite giving up network latency due to contention, the power advantage of the least aggressive network outweighs the overall performance loss, resulting in a low EDP and a much wider range of near-optimal EDP even into low voltage levels.


Figure 3.12: Sensitivity studies of network configuration with MEM-WL; Average Packet Latency (a), Normalized IPC (b), Total Power (c), and EDP (d).

NUCA Policy

We expect remapping to be sensitive to the actual LLC configuration, which is why we examine policies targeted at both SNUCA and DNUCA. We consider the base CMP model with two different NUCA policies (SNUCA and DNUCA) under MEM-WL. Figures 3.13(a) and (b) show the efficiency of the NUCA-sensitive policies of RESCUE. For the SNUCA model, the SN (SNUCA-Optimized) remapping policy of RESCUE has the lowest average network packet latency among the policies, and for the DNUCA configuration the DN (DNUCA-Optimized) policy has the lowest average latency. Another observation is that the latency of all policies across all voltages is lower in the DNUCA configuration than in the SNUCA configuration. Comparing Figures 3.13(e) and (f), we find that the power consumption of all policies in SNUCA is slightly higher than in DNUCA across all voltages. Figures 3.13(g) and (h) show the effect of the NUCA-sensitive policies on EDP: the SN and DN policies are closest to the best performer for the middle- and low-range voltages in the SNUCA and DNUCA configurations, respectively. Despite the traffic advantages of SN and DN, we find that GR still achieves the best performance in terms of weighted speedup and mean IPC. Our results continue to indicate that maximizing cache capacity is more important than minimizing traffic, whether we are considering raw performance or performance/energy.

Figure 3.13: Sensitivity studies of NUCA policies with MEM-WL; average packet latency (1st row), normalized IPC (2nd row), total power consumption (3rd row), EDP (4th row).

Cache Size

RESCUE cache remapping creates interesting trade-offs between cache size, power, and performance. If we are designing a processor in a power-constrained environment (the common case in this era), there may be little cost to over-provisioning the cache if it allows us to further scale down the voltage. To study the effect of cache over-provisioning on performance and power, we study two architectures: the first is our baseline CMP with an LLC size of 16MB (1X cache), and the second has the same architecture as the baseline but with double the LLC size, i.e., 32MB (2X cache). Figure 3.14 shows the results of this study for the GR policy with the MIX-WL workload. Figure 3.14(b) compares the effect of voltage scaling on the IPC of the 1X and 2X caches, and shows that the larger cache allows us to maintain reasonable performance under more aggressive voltage scaling.

Figure 3.14: Sensitivity studies of cache size over-provisioning with MIX-WL; average packet latency (a), normalized IPC (b), total power consumption (c), and EDP (d).

Figure 3.14(d) shows that for the high-range and middle-range voltages the EDP of the 2X cache is larger than that of the 1X cache. The break-even point is 0.45V, where the EDPs of the two architectures are almost the same. The trend changes below 0.45V, where the EDP of the 2X cache becomes smaller than that of the 1X cache. Most importantly, an over-provisioned cache allows us to maintain near-optimal EDP at more aggressive voltage levels.

These results display exactly the area-power tradeoff that becomes increasingly attractive in future technology designs. By choosing a cache with twice the maximum capacity (twice the area), we are able to push voltage scaling from 0.5V to 0.4V with no performance loss, and actually hit a lower power point with the larger cache than with the smaller cache. This architecture also gives us more flexibility: by configuring the over-provisioned cache at 0.4V rather than the original cache at 0.5V, we pay no real power or performance cost in the common case (low-voltage operation), but retain the option of increasing the voltage and getting the full benefit of the larger cache when performance demands it.

Figure 3.15: Power reduction across process technology nodes over all organizations and policies with MEM-WL, MIX-WL, and CPU-WL when applying RESCUE (left) and RESCUE+COP (right). The core-cache organizations shown are 2core-4MB, 4core-8MB, 6core-12MB, 8core-16MB, 16core-32MB, and 32core-64MB for RESCUE, and 2core-8MB, 4core-16MB, 6core-24MB, 8core-32MB, 16core-64MB, and 32core-128MB for RESCUE+COP.

Technology

Figure 3.15 summarizes the power results, showing the power savings achieved when applying RESCUE and RESCUE+COP (Cache Over-Provisioning) compared to the baseline architecture with no power-saving mechanism. The figure shows that we gain the smallest power savings with the MEM-WL workloads and the largest with CPU-WL, in almost all core-cache configurations. For CPU-intensive workloads, we can scale the LLC voltage and rarely pay the cost, but the more LLC accesses the workload generates, the more we pay for the scaling. Comparing the figures in each row shows that, for most core-cache configurations across all technology nodes, applying RESCUE and COP together achieves 5 to 30% more power reduction than applying RESCUE alone.

3.4 Summary

The design of a NUCA LLC in CMP architectures is challenging due to the often conflicting tradeoffs of reliability, manageable power, and performance. In this chapter, we proposed RESCUE, a fault-tolerant, scalable cache architecture for shared NUCA LLCs in CMPs. RESCUE enables very aggressive voltage scaling to reduce the power dissipation of large last-level caches in CMPs. RESCUE trades off cache capacity for power, a tradeoff that becomes increasingly attractive as we continue to follow Moore's Law. RESCUE leverages address remapping to replicate faulty blocks of shared LLC banks with blocks from other banks; in remapping faulty cache blocks, a single fault-free line can be reconstructed from multiple faulty lines. This chapter presented four different remapping policies with different primary goals. The proposed remapping policies enable efficient selection of redundancy from different cache banks while accounting for the design challenges of a NUCA memory organization. In general, performance and energy efficiency are highest when the mapping policies are unconstrained, minimizing the number of sacrificed cache lines. In future large, multi-bank, non-uniform cache access architectures, performance and routing power can be very sensitive to the remapping policy. Our experimental analysis shows that as the cache operating voltage scales down, RESCUE increases the available cache capacity and hence maintains performance even in the presence of high failure rates. We show that RESCUE is most effective in lowering power and EDP in the mid-range voltages (0.375-0.450V). Our results indicate that it saves up to 60% of the LLC power consumption in a 4x4 CMP architecture operating below 0.4V while recovering up to 55% of the faulty cache capacity with only modest performance degradation. RESCUE also shows that the capacity vs. power trade-off can be very effective: an over-provisioned cache that is twice the original size can be voltage-scaled more aggressively without sacrificing performance, resulting in a power point that is actually lower than that of the original cache.

Chapter 4

Interconnect-aware Resilient Cache Design

4.1 Introduction

Many-core platforms have become mainstream as core counts continue to grow, and the Last-Level Cache (LLC) is the largest shared on-chip memory component in such systems. LLCs are comprised of multiple distributed memory banks interconnected by a sophisticated communication fabric, often a Network-on-Chip (NoC). As the complexity of such platforms increases, reliability becomes a major design challenge. A significant increase in both permanent faults and transient errors is expected due to advanced technology nodes, higher integration, voltage scaling, parametric variations, and higher power density, all of which threaten the reliability of many-core platforms [118, 160]. Hence, both the cache banks and the interconnect fabric (NoC) of the LLC suffer significantly from potential fault occurrences. As the number of cores, cache layers, and memory banks in such platforms increases, there is a critical need to develop efficient, scalable, and modular fault-tolerant schemes to cope with the resultant complexity.

There is a large body of previous work on fault-tolerant design of cache memories, which has mostly focused on either single-bank or multi-bank caches with uniform access latency in a single-core architecture, as explained in previous chapters. However, such schemes face severe limitations as the number of cores and cache banks increases in emerging NoC-based multicore/manycore architectures. On one hand, most of them are not efficient, modular, and scalable enough to be leveraged for protection of shared caches with non-uniform access latency and distributed banks in such architectures. Since CMP architectures have more than one core that can access one or several distributed banks simultaneously, fault mapping and protection need to be scalable and distributed. Furthermore, existing fault-tolerant techniques typically use a single fault map and a centralized fault-protection scheme, which cannot be scaled to large CMP architectures: each access needs to go through the fault map first, and no parallel cache accesses are allowed.

On the other hand, most of the proposed solutions are not applicable to the LLC in emerging many-core platforms with error-prone interconnects [19]; they either consider reliable interconnects or assume their impact is trivial and therefore ignore them. There are only a few research efforts that address LLC fault tolerance in NoC-based CMP architectures [154, 19], and research on the reliability of memory components in NoCs is still in its infancy [107]. For NoCs, different fault-tolerant approaches such as Forward Error Control (FEC), Automatic Repeat reQuest (ARQ) [32], and multi-path routing [114] have been proposed in the literature for reliable on-chip transmission. Error correcting code (ECC) approaches are very common in both NoCs and memories, protecting the interconnect fabric and cache banks against errors, respectively [82, 32]. In particular, ECC schemes applied to caches suffer from high power/performance cost and lack of scalability when applied to large LLC blocks with multiple errors. In NoCs, switch-to-switch (s2s) and end-to-end (e2e) ECC schemes are common practice [115, 123]. The choice of either an e2e or s2s ECC scheme in a NoC has power, area, and performance trade-offs that vary with the fault rate, size, and topology of the system [141]. Another category of techniques is based on retransmission of data, which imposes higher packet latency and power consumption [115].

Our first observation is that almost all emerging CMPs are based on NoC platforms that provide a scalable interconnection fabric for connecting the cores, cache banks, and on-chip memory controllers. On-chip routers and links constitute this scalable communication backbone. The limitations of available fault-tolerant schemes and the power challenges of emerging CMPs, combined with the capabilities of the NoC, motivate us to leverage the interconnection fabric available in NoC-based CMPs to implement the fault-tolerant approach. Efficient use of the interconnect backbone in CMPs helps designers develop more efficient fault-tolerant schemes for the protection of on-chip memories, using its flexibility, regularity, parallelism, and scalability to detect faults, correct them, and manage redundancy among the distributed memory banks.

We also observe from related work that all previous approaches consider protection of the cache and the interconnect in isolation from each other. However, to minimize the cost of LLC protection, the cache and the interconnect need to be considered together; ignoring one part makes the protection of the other inefficient. Moreover, in many-core architectures with distributed shared LLC banks, where a large fraction of cache accesses come from remote cores, it is necessary to protect data blocks against errors throughout the interconnect network. Existing approaches directly integrate the conventional error protection schemes available for the cache and the NoC interconnect in current multicore architectures. However, since such schemes are designed in isolation from each other, this approach leads to a high level of redundancy with overlapped blocks, which makes it inefficient. Consequently, a unified approach that guarantees end-to-end protection of shared cache blocks without imposing a conspicuous overhead is essential for future many-core architectures.

The observations mentioned above motivate us to propose two novel interconnect-aware approaches to on-chip memory resiliency in NoC-based many-core platforms. The first approach considers the scalability of the interconnect, while the second considers the impact of an erroneous interconnect in the design for memory resiliency.

As the first major contribution of this chapter, we present a novel solution at the interconnection-network level for fault tolerance of the LLC in NoC-based CMP architectures [19, 21]. We leverage the NoC fabric to implement a fault-tolerant scheme that protects the LLC cache banks against permanent faults in a scalable and efficient way with minimal overhead. During an LLC access to a faulty block, the components inside the network (routers) are responsible for detecting and correcting the faults and then returning the fault-free data to the corresponding core. We propose four different policies that leverage the NoC fabric in different ways to implement the basic fault-tolerant method. We then perform cycle-accurate simulations using well-known NoC benchmarks to evaluate our solutions. Our results show that some of these policies take advantage of the intrinsic parallelism of the communication mechanism in different ways, leading to a trade-off between constant use of the NoC (increasing energy consumption) and lower latency. To help evaluate these trade-offs, we also perform sensitivity analysis on the different policies by improving some aspects of the network-on-chip, such as the number of virtual channels and the use of an adaptive routing algorithm. Our results show that some policies react better than others to these modifications, confirming that different policies take advantage of communication parallelism in different ways and allowing system designers to tune the use of specific policies. To the best of our knowledge, this is the first work that leverages the on-chip network fabric to facilitate fault-tolerant design of on-chip memory (cache) in NoC-based CMPs.

The second major contribution of this chapter is CoDEC, a codesign approach that unifies the error coding of the cache and the interconnect in many-core architectures in order to minimize the overhead of error protection. CoDEC deploys a unified error coding scheme that guarantees end-to-end protection of LLC data blocks throughout the on-chip network against both hard and soft errors. CoDEC integrates strong multi-bit ECC in the cache banks with s2s error coding of the interconnect while removing their overlapped redundancy. It partitions each LLC block into multiple equally-sized segments, extends each segment with a low-cost ECC, and transmits each extended segment as a flit in the NoC. The main advantage of the CoDEC approach is eliminating the overlapped redundancy between the ECC logic of the cache banks and that of the interconnect by integrating the multi-bit ECC of the cache banks with the single-bit s2s ECC of the NoC. Using this approach, CoDEC eliminates the large ECC encoder/decoder blocks from the critical path of global LLC accesses through the network. Our technique minimizes latency in the common case of accessing a shared LLC bank over the network, and potentially of accessing a local LLC bank, while providing almost the same error protection as strong multi-bit ECC on cache blocks by using a segmented per-flit ECC. Our evaluation on a 4×4 platform with a mesh NoC shows that, compared to a conventional approach, CoDEC improves performance by up to 14% and around 5.5% on average. Also, our implementation results show that, with respect to the baseline architecture, CoDEC improves cache access latency for global and local accesses by around 50% and 20%, respectively, with only a minimal overhead of less than 1% area and 2.5% ECC storage. To the best of our knowledge, this is the first codesign approach to protect both the cache and the interconnect against errors in a NoC-based many-core platform.
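To make the per-flit layout more concrete, the sketch below computes how an LLC block could be split into ECC-extended segments under assumed parameters (a 64B block and 64-bit data segments protected by a SEC-DED code per segment). The actual segment width and code used by CoDEC are design parameters of the scheme described later in this chapter, so this is only an illustration of the idea, not the implementation.

    #include <iostream>

    // Check bits of a SEC-DED (extended Hamming) code over k data bits:
    // the smallest r with 2^r >= k + r + 1, plus one overall parity bit.
    int secded_check_bits(int k) {
      int r = 0;
      while ((1 << r) < k + r + 1) ++r;
      return r + 1;
    }

    int main() {
      const int block_bits   = 64 * 8;  // assumed 64B LLC block
      const int segment_bits = 64;      // assumed data payload per flit
      const int segments     = block_bits / segment_bits;
      const int ecc_bits     = secded_check_bits(segment_bits);

      std::cout << "flits (segments) per block : " << segments << "\n"
                << "check bits per segment     : " << ecc_bits << "\n"
                << "ECC-extended flit payload  : " << segment_bits + ecc_bits
                << " bits\n";
    }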

4.2 Related Work

Our work represents a convergence of three main bodies of related research: fault-tolerant cache design, fault-tolerant NoC interconnect, and memory reliability in multi/many-core architectures.

4.2.1 Fault-tolerant Cache Design

There is a rich body of literature on design for error resiliency in cache memories, as discussed in previous chapters. Several architectural techniques have also been proposed to improve the reliability of on-chip caches by employing relatively sophisticated fault-tolerance mechanisms, such as block/set pairing [2, 156, 131], address remapping [9], block/set-level replication [22], etc. In sum, since most of these techniques are designed for a single-core processor with one or two cache banks and uniform access latency, none of them considers the scalability of the interconnect or the possibility of interconnect errors, unlike the approaches proposed in this chapter.

4.2.2 Fault-tolerant NoC design

Reliability and fault tolerance are outstanding research challenges in NoC design [107]. There are many efforts that investigate the robustness of NoCs, mostly in the areas of routing algorithms [129, 127], communication infrastructure [36, 115, 123, 141], or micro-architecture [83, 7, 71]. A fault-aware IP-core mapping to NoC routers is proposed in [105]; it addresses the problem of transient link failures by means of temporally and spatially redundant transmission of messages. A stochastic communication paradigm is proposed in [36] to provide fault-tolerant communication; this approach is based on probabilistic broadcast, where packets are forwarded randomly to the neighboring nodes. However, none of these protect the faulty datapath inside the routers. In [32], the authors deploy error correcting schemes to achieve combined energy minimization and reliability optimization design goals. A fault-tolerant router architecture is proposed in [57] to guarantee the functionality of the NoC in the presence of faults; it first identifies the router components most vulnerable to different sources of faults and then tolerates faults only in those components, saving the power and area overhead of employing fault-tolerant techniques throughout.

4.2.3 Memory Reliability in Multi/Many-cores

There are some recent efforts addressing memory resiliency in NoC-based CMP architectures. Angiolini et al. [7] presented a mechanism to maintain a reliable integrated memory subsystem for a NoC-based system. The idea is to have a reliable backup memory (in addition to the main memory) to store all the critical data; the NIs are responsible for redirecting critical data accesses in case of a failure in the main memory. Although this scenario provides some degree of memory reliability, it has drawbacks such as data redundancy and a lack of transparency to the programmer. In another work [154], Wang et al. address fault tolerance of NUCA caches in NoC-based CMPs. They proposed a utility-driven address remapping technique to tackle the capacity loss in NUCA caches of NoC-based CMP architectures. However, their address remapping technique operates at the bank level, and they do not leverage the NoC fabric for fault tolerance of the cache.

Our approach differs from, and is orthogonal to, all the previous work in the categories mentioned above: we propose neither a new cache fault-tolerance technique nor a new NoC fault-tolerance technique. Instead, we propose interconnect-aware approaches to LLC protection in two different ways. In the first contribution, we describe how to leverage the NoC fabric to implement an existing fault-tolerant cache technique in a scalable and efficient way with minimal overheads for emerging NoC-based multicore/manycore platforms. In the second contribution, we consider the impact of an erroneous interconnect fabric on LLC protection and propose a codesign error coding technique that covers both the cache and the interconnect.

4.3 NoC-based Fault-tolerance of LLC

4.3.1 Basic Idea and Baseline Architecture

Unlike traditional fault-tolerant cache schemes, our approach migrates all of the fault-tolerance bookkeeping from the cores to the NoC fabric by modifying the network routers to perform all the required tasks of a fault-tolerant scheme. As an exemplar of a fault-tolerant scheme, we use a remapping-based technique to demonstrate the effectiveness of our proposed approach. Remapping-based schemes address permanent faults by remapping faulty blocks to other faulty/non-faulty blocks, thereby ensuring delivery of fault-free blocks. We propose four policies to configure the NoC and implement the fault-tolerant scheme; based on each policy, we add a fault map and a multiplexing layer to the appropriate NoC components. The fault map keeps information about the location of permanent faults and the remapping data for the local cache bank connected to each router, while the multiplexing layer performs the fault matching.

Using the NoC interconnection paradigm enables us to implement any fault-tolerant method in an efficient, modular, and scalable manner, for several reasons. First, it supports modular and scalable implementation of fault-tolerant methods for emerging multi/manycore architectures; different policies and configurations are possible with minimal overhead and design change. Second, using the NoC paradigm, it is easy to control access to and management of redundancy resources, and redundancy for fault tolerance can be provided easily via the available interconnection network for any entity in the NoC. Third, adaptive and dynamic fault tolerance of memory banks can be supported via NoC components: since the routers play a key role in communicating and managing data between all core and cache nodes in a NoC-based CMP architecture, by using their network information and modifying their routing algorithm we can easily provide adaptive fault tolerance of the cache banks.

Figure 4.1: Baseline NoC architecture.

CMP and NUCA LLC

We consider a non-tiled CMP architecture as our baseline. The architecture is similar to Oracle's SPARC T3-1B processor, composed of 8 cores and 16 shared LLC banks [142]. Cores and cache banks are interconnected as a 2D mesh via a Network-on-Chip (NoC) architecture, as shown in Figure 4.1, where the L2 cache is located in the middle (CIM) [80, 49]. Each core is composed of a processing element and private L1 data and instruction caches. In our cache organization, each LLC bank is a portion of a larger distributed shared LLC.

The baseline design assumes a Non-Uniform Cache Architecture (NUCA) [81] for the LLC. With multiple banks within the LLC, we have the choice of either always putting a block into a designated bank (static mapping) or allowing a block to reside in one of multiple banks (dynamic mapping). We consider static mapping in our baseline design and model a static NUCA policy for CMP architectures (CMP-SNUCA) [30]. Similar to the original proposal (S-NUCA for unicore processors in Kim et al. [81]), CMP-SNUCA statically partitions the address space across cache banks connected via a 2D mesh interconnection network.

In static mapping, a fixed hash function uses the lower bits of a block address to select the target bank. LLC access latency is proportional to the distance from the issuing L1 cache to the LLC cache bank. By allowing non-uniform hit latencies, static mapping reduces the hit latency relative to traditional monolithic cache designs, which fix the latency to that of the longest path [3]. Because a block can be placed into only one bank, the LLC access latency is essentially determined by the accessed block address.
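A minimal sketch of such a static hash is shown below. The 64B block size and 16-bank LLC follow the configuration in Table 4.1, while the exact bit slicing is an assumption made only for illustration.

    #include <cstdint>

    // Static NUCA mapping: the lower bits of the block address select the
    // target LLC bank, so a block can live in exactly one bank.
    constexpr unsigned kBlockOffsetBits = 6;   // 64B blocks
    constexpr unsigned kNumBanks = 16;         // 16 LLC banks (Table 4.1)

    unsigned target_bank(std::uint64_t paddr) {
      const std::uint64_t block = paddr >> kBlockOffsetBits;  // strip byte offset
      return static_cast<unsigned>(block % kNumBanks);        // low block-address bits
    }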

NoC Architecture

Networks-on-chip are based on conventional computer networks and have emerged as an alternative to buses to deal with the issue of scalability and to provide a regular, reusable architecture. Indeed, because of their regular architecture, NoCs are very suitable for platform-based design. The NoC exemplar used in this work is a cycle-accurate SystemC implementation based on SoCIN (SoC Interconnection Network) [163]. SoCIN is a scalable mesh-topology network based on a parametric router architecture conceived with the goal of instantiating a low-cost NoC. The main modules of this implementation of the SoCIN router are the input buffers, the arbiter, the crossbar, and the links, as depicted in Figure 4.2(a). SoCIN is a wormhole packet-switched network, where each packet is divided into small units called flits (flow control units). Each flit contains n + 2 bits, where n is the payload width and the 2 extra bits are flags marking the beginning of a packet (bop) and the end of a packet (eop). When a flit arrives at the router it is stored in the input buffer. Once the arbiter reads the first flit (identified by the bop flag), it can determine through which output port the packet must go. At this point, when multiple packets need to use the same output, the arbiter must choose which one goes first; this is decided using simple round-robin scheduling. This information is passed to the crossbar. If the crossbar determines that the requested output is available, it begins to transmit the packet. The output is considered available when the input buffer of the router connected to it has enough space for at least one flit.

SoCIN implements deterministic XY routing, which avoids deadlocks. The route is determined by the NoC address of the destination specified in the very first flit of the packet (the n/2 most significant bits are the destination NoC address and the n/2 least significant bits are the source NoC address). Once the first flit is routed through an output, the handshake protocol between the routers defines when there is enough space in the input buffer to send the next flit. This continues until the flit with the eop flag is routed; therefore, all flits are routed in a pipelined fashion. Figure 4.2(b) shows the packet format in SoCIN. Flow control in SoCIN is handshake based: when the sender writes data on the link, it signals a valid (val) bit, and when the receiver is ready to read the data it activates the ack signal. Figure 4.2(c) depicts the link structure with two unidirectional channels (to send and receive), each of which has n + 2 bits plus the val and ack signals.
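The sketch below captures the flit framing and the XY routing decision described above. The field widths, coordinate convention, and type names are our own simplifications for illustration, not the actual SoCIN RTL.

    #include <cstdint>

    // Flit as described above: an n-bit payload plus bop/eop framing flags.
    struct Flit {
      std::uint32_t payload;  // header flit carries destination | source NoC address
      bool bop;               // begin-of-packet
      bool eop;               // end-of-packet
    };

    struct NodeAddr { int x, y; };

    enum class Port { Local, North, South, East, West };

    // Deterministic XY routing: move along X until the column matches, then
    // along Y; deliver locally when both match (y assumed to grow northward).
    Port route_xy(NodeAddr here, NodeAddr dest) {
      if (dest.x > here.x) return Port::East;
      if (dest.x < here.x) return Port::West;
      if (dest.y > here.y) return Port::North;
      if (dest.y < here.y) return Port::South;
      return Port::Local;
    }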

Figure 4.2: SoCIN component details: (a) router; (b) packet format; (c) link signals.

Base Fault-tolerant Method

In general, all fault-tolerant schemes leverage some form of redundancy to tolerate faulty components. As an exemplar for fault-tolerant caches, we consider the remapping-based technique from the work in [22] as our base fault-tolerant method. This method leverages a portion of the faulty blocks as redundancy to mask faults in other blocks of the cache. Here, we present a brief description of the method. It uses a BIST module during boot-up to detect permanent faults (hard errors) in cache blocks and records the locations of faulty blocks in a fault map. Then, based on this information, it configures the address remapping for faulty blocks. The method divides each block of the cache into multiple subblocks that define the granularity at which failures are identified and remapped. A subblock is labeled faulty if it has at least one faulty bit, as determined by BIST analysis. If two blocks have faults in the same subblock position, they conflict, and one cannot be used to mask faults in the other. During the configuration process, if a block is labeled as faulty, the system attempts to remap the faulty portions of that block (the host block) to another block (called the target block) that has already been marked as faulty and does not conflict with the host block. It then sacrifices and disables the target block to replicate all faulty subblocks of the host block. Thus, a fault-free block can be reconstructed from the combination of the two blocks, that is, the host block together with the target block, via a multiplexing layer.

In many cases, a good choice of target is another block in the same set (referred to as a local target block), so that both the host and the target blocks can be read in a single access. Barring this case, we should always select target blocks from a different bank (a target bank), so that both blocks can be read without two serialized accesses to the same bank. We refer to such a target block as a remote target block. Note that the target block can be selected from any bank in the LLC address space; here, for each bank we limit the remapping policy to select target blocks from adjacent banks that are within a one-hop distance in the NoC.
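The sketch below illustrates the conflict test and the target-selection step described above. The data-structure layout, field names, and the first-fit search over a pre-ordered candidate list (same set first, then one-hop banks) are our own assumptions made for the sketch, not the exact configuration algorithm of [22].

    #include <cstddef>
    #include <cstdint>
    #include <optional>
    #include <vector>

    // Each block's fault-map entry records which subblocks are faulty
    // (one bit per subblock, as produced by the boot-time BIST pass).
    struct BlockFaultInfo {
      std::uint32_t faulty_subblocks = 0;  // bit i = subblock i is faulty
      bool disabled = false;               // sacrificed as a target for another block
      bool remapped = false;               // already has a target assigned
    };

    // Two faulty blocks conflict if any subblock position is faulty in both;
    // a non-conflicting target can supply good copies of exactly the
    // subblocks that are faulty in the host.
    bool conflict(const BlockFaultInfo& a, const BlockFaultInfo& b) {
      return (a.faulty_subblocks & b.faulty_subblocks) != 0;
    }

    // Pick a target for a faulty host block from an ordered candidate list;
    // the chosen target is disabled so it can replicate the host's subblocks.
    std::optional<std::size_t> pick_target(const BlockFaultInfo& host,
                                           std::vector<BlockFaultInfo>& candidates) {
      for (std::size_t i = 0; i < candidates.size(); ++i) {
        BlockFaultInfo& t = candidates[i];
        if (t.faulty_subblocks != 0 && !t.disabled && !t.remapped &&
            !conflict(host, t)) {
          t.disabled = true;   // target is sacrificed to repair the host
          return i;
        }
      }
      return std::nullopt;     // no usable target; the host block must be disabled
    }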

4.3.2 Proposed Architecture

We extend the base fault-tolerant method so that it can be leveraged for a shared multi-bank NUCA cache in a CMP architecture using the NoC backbone. Recall that the basic idea is to distribute the fault map and the multiplexing layer over the routers of the NoC architecture. There is one fault-map entry for each set, and we divide each entry into two sections: the map of faulty subblocks and the remapping data. The first section keeps the map of faulty subblocks in the set; the second section holds the remapping data for all blocks in the set. We keep the fault map of each bank in two separate structures: the first section is kept in a Fault Map Block (FMB), and the second, composed of the remapping data, is named the Remapping Data Block (RDB). The RDB includes one bit that indicates the faulty status of each block. We need the FMB data only to configure the address remapping and initialize the RDB; there is no need to access the FMB during operation of the system, so it can be kept in main memory or on the hard drive. Only the RDB data needs to be kept at the cache level. Note that we employ reliable 8T SRAM cells [38] to protect the RDB and the tag arrays. To implement our approach, we make modifications at three levels, to the cache access algorithm, the NoC routers, and the cache banks, to support the fault-tolerant method, as described in the next sections.
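As a rough illustration of this FMB/RDB split, the sketch below lays out one per-set entry of each structure. The field widths (16-way sets, 8 subblocks per block) and the exact contents of the remapping data are assumptions made for the sketch rather than the actual bit layout used in the design.

    #include <cstdint>

    // FMB: per-set map of faulty subblocks, needed only while configuring the
    // remapping; after boot-time BIST it can live in main memory or on disk.
    struct FmbEntry {
      std::uint8_t faulty_subblocks[16];  // one 8-bit mask per way (assumed widths)
    };

    // RDB: the only state consulted on every LLC access. One faulty-status bit
    // per way plus the remapping data (location of the target block).
    struct RdbEntry {
      std::uint16_t faulty;      // bit w = 1 if way w holds a faulty block
      struct {
        bool remote;             // target in another bank (vs. same set)
        std::uint8_t bank;       // target bank id (remote case)
        std::uint8_t way;        // target way
      } target[16];
    };

    // On an access, a single RDB bit decides whether a target block must also
    // be fetched before the multiplexing layer can build a clean line.
    bool needs_target(const RdbEntry& e, unsigned way) {
      return (e.faulty >> way) & 1u;
    }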

Fault-tolerant Cache Access

We modify the LLC access algorithm to support the fault-tolerant method. In our fault-tolerant cache architecture, each LLC access first accesses the RDB of the fault map. Based on the fault status bit of the accessed block, if the block is faulty an extra access to a target block is needed. In our model, we consider that this read of the RDB adds an overhead of one clock cycle, but it happens only once per L2 access and only when there is an L1 cache miss. The target block can be a local one within the same set or may be accessed from another bank (in the case of a remote target block). Then, based on the location of the target block retrieved from the RDB, one or two levels of multiplexing are used to compose a fault-free block by choosing the appropriate subblocks from the host and target blocks.

Figure 4.3: Three phases of a fault-tolerant LLC access.

Based on our modified cache access, we define a fault-tolerant LLC access in three phases.

1. Issue: Send access to both host and target blocks.

2. Receive: Wait until receiving both host and target blocks.

3. Match: Match both host and target blocks via a multiplexing layer to tolerate faulty subblocks.

Figure 4.3 represents three stages of a fault-tolerant LLC access in our scheme. Since the proposed fault-tolerant cache architecture increases the number of accesses to the LLC banks, it has a direct impact not only on the network traffic but also on the LLC access latency. In fact, efficient implementation of a fault-tolerant cache architecture needs to consider its impact on both network traffic and latency while trying to minimize its performance and cost overheads. Considering such design tradeoffs, we propose four scalable policies to implement a fault-tolerant cache architecture based on a NoC paradigm.
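As an illustration of the Match phase, the sketch below composes a fault-free line from a host block and a target block using the host's faulty-subblock mask. The line and subblock widths and the helper names are assumptions for the sketch; in hardware this corresponds to the multiplexing layer rather than software.

    #include <array>
    #include <cstdint>

    constexpr int kSubblocks = 8;                 // assumed subblocks per 64B line
    using Subblock = std::array<std::uint8_t, 8>; // assumed 8B per subblock
    using Line     = std::array<Subblock, kSubblocks>;

    // Match phase: for every subblock position that is faulty in the host,
    // take the fault-free copy replicated in the target block; otherwise keep
    // the host's own subblock. The faulty mask drives the mux select lines.
    Line match(const Line& host, const Line& target, std::uint8_t host_faulty_mask) {
      Line out;
      for (int s = 0; s < kSubblocks; ++s)
        out[s] = ((host_faulty_mask >> s) & 1u) ? target[s] : host[s];
      return out;
    }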

We define an entity as a NoC component (core, router, or cache bank) that can perform a phase of the fault-tolerant LLC access; each of the cores, routers, and cache banks can therefore be an entity, and different phases of a fault-tolerant LLC access can be performed by different components (entities) in the NoC architecture. Based on the type and location of the entities that perform the different phases of a fault-tolerant LLC access, we can define different policies. In this architecture, however, we assume that all phases of a fault-tolerant LLC access are done in one entity (the issue entity), and based on this assumption we define the following four policies.

1. Core-Based: This is the basic policy, where all three phases of a fault-tolerant LLC access are done by the core that sends the access to the LLC bank. In this policy, we don't make any changes to the NoC architecture; the NoC is used to send accesses to both the host and target banks and to return the host and target blocks to the issuing core. One advantage of this policy is that it potentially minimizes the network latency by sending accesses to both the host and target blocks in parallel.

2. Parallel-Issue: In this policy, the local router of the core performs all three phases of issue, receive, and match. In this policy, we don't change the cores; only the local routers of the cores are changed. One advantage of this policy is that only a subset of the routers has to be modified. It also reduces the traffic between the cores and the NoC routers in comparison to the base policy.

3. Host-Centric: In this policy, the local router of the host bank performs all three phases of issue, receive, and match and then sends the final fault-free block back to the requester core. One advantage of this policy is that it minimizes the network traffic in comparison with other policies. However, its network latency highly depends on the distance between the host and target banks.

4. Third-Node: For this policy, an intermediary node, the third node, is chosen to perform the three phases. The idea is to have a mid-way point that receives the request from the core and issues both requests to the host and target nodes in parallel. Eventually, the host and target blocks arrive at this same node. This potentially decreases latency, because both the host and target blocks are accessed concurrently and routed through the network in parallel; it does not serialize and lengthen the process as much as the host-centric policy. The method used to choose the intermediary node is explained in the next section.

Figure 4.4 represents different policies based on the issue entity and shows the differences between them.

Fault-Aware Router

Figure 4.4: A general view of the NoC with different fault-tolerant mapping policies.

As mentioned in the previous section, the router's arbiter reads from the input buffer in order to determine the direction to which the packet must go (namely, north, south, east, west, or local) and then informs the crossbar which input buffer it must read from. This process is depicted in Figure 4.5(a). In order to support the fault-tolerant LLC access and the parallel-issue and host-centric policies presented in the previous section, we need to change the router. We therefore propose a novel fault-aware router to support our fault-tolerant LLC access via the NoC architecture, detailed in Figure 4.5(b). This new router has new components: an RDB, an extra buffer, and a multiplexing layer. When a data request arrives, the arbiter verifies (using the RDB) whether that particular address is a faulty access (an access to a faulty block). In the case of a faulty access, it checks whether the target block is local (within the same set) or remote (another set in another bank). For a local target block, the cache bank handles the remapping locally (more details in the next section). For a remote target block, the arbiter itself generates another request packet to be sent to the target bank. It then keeps three pieces of information: (1) a flag that indicates that a matching operation is pending; (2) the directions from which the host and target blocks are expected; and (3) the address of the core that originally requested the data. When a reply packet arrives, the arbiter checks whether this is an expected packet, i.e., part of the pending original data (host block) request. If so, it stores the block in the extra buffer. When the target data packet arrives, the arbiter sends it, along with the contents of the extra buffer, to the multiplexing layer for fault matching. The output of the multiplexing layer is the final fault-free data packet (block) to be sent to the core that requested the data. It is important to note that for regular, non-faulty data requests, the multiplexing layer is bypassed.

Figure 4.5: Router architecture: (a) conventional router; (b) fault-aware router.

Note that, depending on the policy, we replace none, some, or all of the NoC routers with the new fault-aware one to implement the fault-tolerant method. For the core-based policy, there is no need to use the fault-aware router, and all of the processing mentioned above is performed inside the cores. For the parallel-issue policy, we replace all local routers of the cores (the routers directly connected to the cores) with the proposed fault-aware router. For the host-centric and third-node policies, since every router in the NoC can be an issue entity, we replace all of the routers with the fault-aware router. Additionally, for the third-node policy, the first router (the one connected to the core) is responsible for choosing the intermediary node. It does so by generating two lists of routers from the local router of the core to the host router, assuming the use of XY routing and YX routing (Figure 4.6(a)). In the same way, two additional lists are generated considering the target router as the final destination (Figure 4.6(b)). Finally, the farthest router that intersects at least two of these paths is chosen as the third node (Figure 4.6(c)).
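A sketch of this intermediary-node choice is given below. The reading of "farthest" as the largest hop distance from the core's router, and the arbitrary tie-breaking, are our assumptions about details the description leaves open.

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    struct Node {
      int x, y;
      bool operator==(const Node& o) const { return x == o.x && y == o.y; }
    };

    // Routers visited by a dimension-ordered (XY or YX) route from src to dst.
    std::vector<Node> path(Node src, Node dst, bool x_first) {
      std::vector<Node> p{src};
      Node cur = src;
      auto step = [&](int Node::*dim, int goal) {
        while (cur.*dim != goal) { cur.*dim += (goal > cur.*dim) ? 1 : -1; p.push_back(cur); }
      };
      if (x_first) { step(&Node::x, dst.x); step(&Node::y, dst.y); }
      else         { step(&Node::y, dst.y); step(&Node::x, dst.x); }
      return p;
    }

    // Third-node choice as described above: build the XY and YX paths from the
    // core's router to the host and to the target routers, then pick the router
    // that lies on at least two of those paths and is farthest from the core.
    Node third_node(Node core, Node host, Node target) {
      std::vector<std::vector<Node>> paths = {
          path(core, host, true),   path(core, host, false),
          path(core, target, true), path(core, target, false)};
      Node best = core;
      int best_dist = -1;
      for (const auto& p : paths)
        for (const Node& n : p) {
          int hits = 0;
          for (const auto& q : paths)
            hits += std::count(q.begin(), q.end(), n) ? 1 : 0;
          if (hits >= 2) {
            int d = std::abs(n.x - core.x) + std::abs(n.y - core.y);
            if (d > best_dist) { best_dist = d; best = n; }
          }
        }
      return best;
    }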

Figure 4.6: Choosing the intermediary node in the third node policy.

Figure 4.7: Cache bank architecture: (a) A conventional 4-way set-associative cache bank; (b) modified cache bank with three subblocks in each way.

Fault-tolerant Cache Banks

We make some changes to the LLC banks to support our fault-tolerant method. As mentioned in Section 4.3.1, we divide each block into multiple, equally-sized subblocks. Since our LLC is a set-associative cache, each bank already has a large block-level multiplexer; we replace that multiplexer with a set of smaller, subblock-sized ones. Figure 4.7 shows a conventional 4-way set-associative bank and the modified one with three subblocks in each way. The number of multiplexers required in each bank is equal to the number of subblocks per block.

Table 4.1: Simulation Configuration

Processor Core: 8 SPARC V8
L1 I/D Cache: 2 banks, 32KB each, 4-way, 32B block size, 2-cycle, LRU
L2 Cache (LLC): shared 16MB, 16 banks, 16-way assoc, 64B lines, 10-cycle, LRU
Coherence protocol: Directory-based
Memory Latency: 250 cycles
Interconnect: 2D-mesh NoC (4×4 for 16 banks and 16 cores), 32-byte links (2 flits per memory access), 1-cycle link latency, 2-cycle router, XY routing, wormhole switching, 4-phit buffer size
Frequency: 1.0 GHz
Technology: 45nm

4.3.3 Evaluation

This section describes the methodology used to obtain relevant results in order to evaluate the impact of different policies of the proposed approach on performance, energy, and network metrics. In this work we have focused on the fault tolerance of on-chip memory banks using an ideal, fault-free NoC infrastructure.1

Simulation Setup

We use a cycle-accurate SystemC model of a NoC used in previous works [61, 91]; it is a mesh-based implementation that uses XY routing and wormhole switching with a 4-phit buffer size. Each router of this NoC is connected to a 1MB L2 cache bank, and all L2 cache banks share the same 16MB address space. Additionally, for each router in the first and last rows, there is a module generating data and instruction requests based on memory traces obtained from a Simics [102] simulation using 8 SparcV8 cores. Table 4.1 summarizes the experimental setup of our architecture.

1 Of course the NoC itself is also vulnerable to both permanent and transient errors (and there is a lot of work on fault tolerance of the NoC fabric itself, as described in the related work); for this evaluation we restrict our analysis to faults in the memory subsystem. Our future work will consider a unified approach that simultaneously analyzes the fault tolerance of the NoC fabric together with the memory banks.

A workload of parallel programs with very distinct behaviors is created using benchmarks from the well-known SPLASH-2 [159] and PARSEC [33] suites and a parallel version of the MiBench [65] suite. Table 4.2 presents a brief description of each of them. Using the Simics simulator, we extract all memory requests (data and instructions) by executing these applications, one at a time, with 8 threads each. With these memory traces as input to our framework, we are able to extract performance and energy results. The performance results are the total number of cycles to execute all 8 threads, and the energy results are obtained from Cacti 6.5 [116] for the memory subsystem and from Orion 2.0 [79] for the network-on-chip.

For this study, faulty LLC banks are modeled randomly, since SRAM cell faults occur as random events; these random faults are due to the major contribution of Random Dopant Fluctuation (RDF) to process variation [2]. A recent work in the resilience roadmap reports the predicted probability of failure for inverters, latches, and SRAM cells as a function of the technology node [118]; based on those results, the probability of bit failure for SRAM cells can be up to 2.6e-4. Here, we use the Predictive Technology Model (PTM) [13], which provides the voltage, area, power, and frequency scaling factors at the 45nm technology node. We consider a 2e-4 bit fault rate in our results, which follows from the PTM model when scaling the voltage down to 500mV.
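For intuition about what this fault rate means at the line granularity, the following closed-form sketch (assuming independent bit failures and a 64B line, both our assumptions) shows that roughly one in ten LLC lines contains at least one faulty bit at a 2e-4 bit fault rate.

    #include <cmath>
    #include <iostream>

    // With an independent per-bit failure probability p, a structure of n cells
    // is fault-free with probability (1 - p)^n.
    int main() {
      const double p_bit = 2e-4;              // modeled rate at ~500mV (PTM-based)
      const int line_bits = 64 * 8;           // 64B LLC line
      const double p_line_faulty = 1.0 - std::pow(1.0 - p_bit, line_bits);
      std::cout << "P(64B line has at least one faulty bit) = "
                << p_line_faulty << "\n";     // ~0.097 at p_bit = 2e-4
    }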

Design Space Exploration

We run the simulation process for our four proposed policies running different applications. Additionally, we consider a baseline case based on the original NoC with a fault-tolerant scheme similar to the one presented in Section 2.3. The restriction for this baseline is that there are no global targets. This means that the remapping for a faulty line can occur only by using blocks from the same bank. As a consequence, this baseline has smaller opportunities for remapping, hence increasing the total number of disabled blocks in the system. The policy results are all normalized to the baseline. We evaluate the impact of the proposed NoC-based fault-tolerant approach on performance, energy, and network latency and traffic.

Table 4.2: Benchmarks Description

Suite     Benchmark                                     Description
SPLASH    FFT                                           A complex 1-D version of a six-step FFT algorithm
SPLASH    LU                                            Factors a dense matrix in the equivalent lower-upper matrix multiplication
MiBench   Patricia                                      A data structure used to represent routing tables in network applications
MiBench   Susan corners, Susan edges, Susan smoothing   An image recognition package, developed for recognizing corners and edges in Magnetic Resonance Images of the brain
PARSEC    Swaptions                                     Monte Carlo simulation is used to price a portfolio of swaptions

We consider the network latency as the number of cycles spent between issuing the data request packet and receiving the data packet from the L2 cache bank. Network traffic, as defined in Section 4.3.2, is the amount of data that passes through the network, taking into account the amount of time that the data spends traversing the NoC. The network traffic generated by each policy could create contention, leading to performance loss. We can estimate network contention using the average buffer occupancy as a metric; in turn, this gives us an estimate of the effect of the network traffic generated by each policy. The average buffer occupancy increases as more packets are sent and also as packets flow through more routers in the network. Our overall performance metric is the number of Instructions Per Cycle (IPC). An important observation is that since we are using a cycle-accurate SystemC model of the NoC, all contention is already taken into account by the simulator. The latency and traffic analysis in Section 4.3.2 did not consider the delay and energy consumption caused by eventual contention during the execution of the applications in the system; thus, the results presented in the next section are much more precise in this respect.

Our energy metrics include the LLC total energy and the energy-delay product. The LLC total energy reflects the total static and dynamic energy of the L2 cache banks and the NoC (routers and links) throughout the execution of all traces. We also account for the energy overhead due to the extra traffic and the blocks added to the routers, such as the muxing logic, the RDB, and the extra buffer. The energy-delay product is the product of the energy and the total delay for each application; in this case, we consider the delay to be the number of cycles taken to finish the execution of all tasks of the application.
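A minimal sketch of how these two metrics combine is given below. The conversion of the cycle count to time using the 1 GHz clock of Table 4.1 is our assumption about how the J*µs values in the result figures are obtained.

    // Energy-delay product as used in the result figures: total (static +
    // dynamic) energy of the LLC banks and NoC over the run, multiplied by
    // the run's total delay.
    struct RunStats {
      double energy_joules;   // cache banks + routers + links, static + dynamic
      long long cycles;       // cycles to finish all tasks of the application
    };

    double energy_delay_product_j_us(const RunStats& r, double clock_hz = 1e9) {
      const double delay_us = r.cycles / clock_hz * 1e6;  // cycles -> microseconds
      return r.energy_joules * delay_us;
    }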

4.3.4 Results

Network Latency and Traffic

Figure 4.8 shows the network latency of the proposed policies normalized to the baseline across the different benchmarks. As expected from the analysis presented in Section 4.3.2, the core-based and parallel-issue policies present very similar results, since the two policies differ only slightly in how they traverse the NoC.

Figure 4.8: Network latency of different policies normalized to the baseline.

In the core-based policy, the packet must pass through one more hop than in the parallel-issue policy. The host-centric policy, contrary to our expectation, presents lower latency than these two policies. Finally, the third-node policy presents better results than the host-centric one for most benchmarks, due to the use of an intermediary point between the core and the host and target nodes. This approach not only tries to decrease the time to access both blocks but also tends to minimize the traffic in the system, since the target and host blocks do not have to travel all the way to the core. However, for the FFT and LU applications, the host-centric technique seems to have a slight advantage. This may be explained by the fact that these are smaller applications, and therefore there is not much room for improvement from the intuition behind the third-node policy.

These results can be nonintuitive, since the host-centric policy essentially requests the host and target blocks sequentially, unlike the other two policies. However, it is important to note that the RDB configuration plays a key role in these results, especially for the host-centric policy. In this study, by managing the allocation of host lines and target lines so that they are placed in cache banks at most one hop apart from each other, we create a situation where the serialization of requests in the host-centric policy does not affect the latency or degrade the performance at all. Additionally, as shown in Table 4.3, the RDB configuration generated for these experiments favors the host-centric policy, since the majority of the target banks are one hop away from their respective host banks. This means that, because of XY routing in the NoC, in most cases cache access request and response packets in the core-based and parallel-issue policies flow through almost the same set of routers as in the host-centric policy. This directly affects the major advantage of these policies, which is the parallel use of the network.

Table 4.3: RDB Remapping Results

Global Targets: 17702 (54.02%)
Lines remapped in 1-hop banks: 17092 (96.55%)
Average number of hops to target banks: 0.96

Figure 4.9: Average buffer occupancy of different policies normalized to the baseline.

In order to verify the impact of each policy on the NoC and the amount of contention it can generate, we measure the average buffer occupancy in the routers. As presented in Figure 4.9, we consider the average number of packets in each buffer in order to gauge the contention in the system. The results therefore show not only the amount of data packets in each case but also how long they stay in the NoC; in the end, they indicate the potential of each policy to generate contention.

Since the core-based and parallel-issue policies only perform the multiplexing between the host and target lines at the last node and router of the return path, respectively, the network stays busier and these policies therefore have higher average buffer occupancy. On the other hand, the host-centric policy matches the lines at the host node and, because of that, causes less traffic and less potential contention in the network overall. Finally, the third-node policy shows results similar to the host-centric policy, because it reduces NoC usage by performing the muxing phase earlier than the core-based and parallel-issue policies.

Figure 4.10: IPC of different policies normalized to the baseline.

Performance Results

Figure 4.10 presents the IPC results for the different policies. It shows that, with small variations across applications, the third-node policy has the best results on average, but again for smaller applications the host-centric policy presents better results. The core-based and parallel-issue policies have almost the same IPC. This outcome is expected because of the similar trends observed in the network latency and average buffer occupancy results.

As shown in the network latency results, some applications benefit more from the fault-tolerant method than others. This happens because, with the remapping method, memory accesses to faulty lines no longer necessarily generate L2 misses; some of these memory requests now produce L2 hits. With that in mind, we can see that applications like FFT and LU have higher IPC because they have lower network latency, as shown in Figure 4.8. Also, the amount of off-chip requests due to L2 misses harms the performance of the baseline more than that of the proposed policies.

Figure 4.11: Energy of different policies normalized to the baseline.

Energy Results

As mentioned earlier, the energy results for the cache banks and RDB arrays are obtained from CACTI, and we use Orion to obtain the energy of the NoC fabric, including the extra logic added inside the routers. Figure 4.11 shows the total LLC energy for the different policies. As a direct consequence of fewer accesses to off-chip memory, all policies present better energy results than the baseline. For the core-based and parallel-issue policies this advantage is smaller than for the host-centric policy because of the network energy: the host and target lines traverse the network over more hops, and these policies keep their router buffers busier, which makes them more energy consuming. For the same reasons, their network latency is also higher, which further increases their energy consumption, as shown in Figure 4.11. The energy consumption of the third-node policy is much lower than that of the core-based and parallel-issue policies, so it presents better results in most cases.

Figure 4.12: Energy-delay product of different policies normalized to the baseline.

Figure 4.12 shows the Energy-Delay Product (EDP), which summarizes the results presented so far. The third-node policy achieves the best EDP because of its smaller number of off-chip memory accesses, lower buffer occupancy, better performance, and lower energy compared with the other policies. However, the host-centric policy presents very similar results.

Overheads

Since each policy requires some minor architectural changes, the solution incurs area and power overheads due to the extra components added to the router, as explained in Section 4.3.2. Table 4.4 summarizes the area and power overheads of the proposed architecture for the different policies. We account for the overhead of the RDB, the extra buffer, the multiplexing layer, and the miscellaneous logic in the issue entities. The RDB is the major contributor to the area and static power overhead. In addition, to protect the RDB and the tag arrays, we use reliable 8T SRAM cells [38], which incur about 33% area overhead for these relatively small arrays in comparison with 6T SRAM data cells.

Table 4.4: Power and Area Overhead Results

Scheme           Core-Based   Parallel-Issue   Host-Centric   Third-Node
Area             2.46%        2.46%            0.41%          2.46%
Static Power     5.53%        5.53%            0.81%          5.53%
Dynamic Power    1.05%        1.05%            0.40%          1.05%

In the case of the core-based, parallel-issue, and third-node policies, each issue entity must have a full RDB (with entries for every line in the LLC). This is necessary because there is no limitation on which data request may arrive at the entity: each core can access any line in any bank, since the banks form a single address space. The host-centric policy is different. Since the issue entity is the router directly connected to the host bank, the requested data can only be one of the lines held by that host bank; therefore, the RDB in each router needs to hold only the entries for the lines of its host bank. Hence, the aggregate overhead across all routers in the host-centric policy equals the size of one full RDB, while the overhead of the core-based and parallel-issue policies equals eight times (one per core) the size of a full RDB. Overall, the overhead of all policies in the proposed fault-tolerance solution is small (less than 3% area and less than 7% power overhead), demonstrating the viability of our approach.

Regarding the delay overhead, we account for the delay of accessing the RDB inside the router of the issuing entity for both faulty and healthy lines; the delay of the muxing logic is negligible in comparison. This access happens only once per memory request and takes only one extra cycle. From the gains seen in the IPC results, we can safely assume that this overhead is compensated by the fact that the approach prevents additional accesses to off-chip memory, which are very costly (typically a few hundred cycles).

4.4 CoDEC

4.4.1 Background and Motivation

Error Coding Overview

A common solution for addressing errors in memory components is to apply Error Detecting Codes (EDC) and ECC techniques. Typically, EDC techniques are composed of parity bits, while the most common ECC methods employ Hamming [67] or Hsiao [73] codes with Single-bit Error Correction and Double-bit Error Detection (SECDED) capability. More complex codes such as DECTED, QECPED [82], and BCH [155] are considered when stronger error detection and correction are needed; however, they are rarely used in large cache memories because the area, delay, and power overheads of ECC grow rapidly as the correction capability increases [82]. Depending on the level of error detection/correction and the complexity of the implemented coding algorithm, there are multiple reliability, performance, power, and area trade-offs [132].
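To give a feel for how the redundancy of these codes grows with word size, the short Python sketch below applies the standard Hamming bound for single-error correction plus one overall parity bit for double-error detection. It is an illustrative calculation only; the data widths in the loop are arbitrary examples, not a description of any particular cache in this thesis.

def sec_check_bits(k):
    """Smallest r such that 2**r >= k + r + 1 (Hamming bound for single-error correction)."""
    r = 1
    while (1 << r) < k + r + 1:
        r += 1
    return r

def secded_check_bits(k):
    """SECDED adds one overall parity bit on top of the SEC code."""
    return sec_check_bits(k) + 1

if __name__ == "__main__":
    for k in (32, 64, 128, 256, 512):
        print(f"{k:4d} data bits: SEC needs {sec_check_bits(k)} check bits, "
              f"SECDED needs {secded_check_bits(k)}")

For a 64-bit word this reproduces the familiar (72,64) SECDED organization used in many commercial memories.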

For interconnected architectures using NoCs, if the data paths are affected by faults, instantaneous fault detection can be provided by a combination of forward error correction (FEC) and error detection codes. ECC schemes can be implemented in the data link layer as a switch-to-switch (s2s) scheme or in the transport layer as an end-to-end (e2e) scheme [115]. In the s2s approach, the ECC encoders and decoders are implemented at the output and/or input ports of the NoC routers, while in the e2e scheme they are implemented in the network interface (NI) of the cores. When fault rates are low, robust s2s encoding schemes consume unnecessary excess power, which can be avoided by e2e schemes. At the same time, s2s schemes can provide better fault coverage than e2e schemes when fault rates are high, especially in networks with many hops. The choice between e2e and s2s fault detection in NoCs therefore involves power, area, and performance trade-offs that vary with the size of the CMP and the fault rate in the system. However, due to its lack of scalability, the e2e scheme is inefficient in emerging large many-core platforms with high error rates.

Figure 4.13: Percentage of LLC remote vs. local accesses in a 16 MB shared LLC.

Motivation

In current multicores with large LLCs, the LLC access latency has a great impact on system performance, and many efforts aim to minimize the average cache access latency [75]. In such platforms LLCs are usually shared with a non-uniform cache architecture (NUCA) [30], and a majority of the accesses to each LLC bank are global (remote) accesses transmitted through the network from non-local cores. Figure 4.13 shows the percentage of remote and local accesses to a 16 MB shared LLC with static NUCA when multithreaded workloads from the PARSEC suite run on a 4x4 tiled CMP (refer to Section 4.4.3 for the detailed experimental setup). As demonstrated in this figure, remote accesses to the LLC banks through the interconnect significantly outnumber the local ones. To see the effect of ECC latency on LLC access latency, Figure 4.14 depicts the contribution of the ECC logic to the global LLC access latency for different platform sizes. In this figure, the Best case takes the distance between the requester and the accessed LLC bank to be the smallest possible, which is always one hop. For the Average case, the average hop count of each network size is used as the distance between the requester and the accessed bank. For the Worst case, the network diameter of a mesh network with dimension-order routing (DOR) is used. We can observe that a tangible portion of the global LLC access latency is imposed by the LLC ECC logic, especially when the requesting core is close to the target LLC bank. Although the ECC latency contribution becomes smaller in larger platforms, it is still non-trivial, e.g., almost 10% for the largest platform with 256 cores. The observations from Figures 4.13 and 4.14 demonstrate that there is a great opportunity to improve system performance by reducing the latency of the ECC logic, which is in the critical path of each LLC access.

Figure 4.14: The ratio of ECC logic latency to total global LLC access latency.

Figure 4.15: Motivational example of LLC and NoC error coding integration: (a) conventional approach; (b) CoDEC approach.
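The best/average/worst-case construction used for Figure 4.14 can be sketched in a few lines of Python. The hop counts below are exact for a k x k mesh with XY (dimension-order) routing, but the per-hop, bank, and ECC cycle counts are assumed, illustrative values rather than the numbers measured in this chapter.

from itertools import product

def mesh_hop_stats(k):
    """Best, average, and worst-case (diameter) hop counts for a k x k mesh."""
    nodes = list(product(range(k), range(k)))
    dists = [abs(x1 - x2) + abs(y1 - y2)
             for (x1, y1) in nodes for (x2, y2) in nodes if (x1, y1) != (x2, y2)]
    return 1, sum(dists) / len(dists), 2 * (k - 1)

def ecc_share(hops, hop_cycles=3, bank_cycles=8, ecc_cycles=2):
    """Fraction of a global LLC access spent in the bank's ECC logic
    (hop_cycles, bank_cycles, ecc_cycles are assumed, illustrative values)."""
    total = hops * hop_cycles + bank_cycles + ecc_cycles
    return ecc_cycles / total

if __name__ == "__main__":
    for k in (4, 8, 16):            # 16, 64, and 256 cores
        best, avg, worst = mesh_hop_stats(k)
        print(f"{k*k:3d} cores: best/avg/worst hops = {best}/{avg:.2f}/{worst}, "
              f"ECC share = {ecc_share(best):.1%}/{ecc_share(avg):.1%}/{ecc_share(worst):.1%}")

With any reasonable choice of the assumed cycle counts, the ECC share shrinks as the network grows but remains a noticeable fraction of the access time, which is the trend Figure 4.14 reports.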

Figure 4.15 depicts a cartoon comparing a conventional approach with a codesign approach to error coding in both the LLC banks and the NoC interconnect. Since the ECC encoder/decoder blocks of the LLC bank and the NoC have overlapping redundancy, they can be integrated with smaller overheads in the codesign approach, as shown in Figure 4.15(b). The common error coding scheme in current multicore platforms is the independent integration of error protection in both the cache banks and the NoC fabric, as shown in Figure 4.15(a); we consider this scheme as our conventional approach and explain it in more detail in the next section. This approach does not scale to emerging large many-core platforms because of the high power and performance overhead caused by unnecessary ECC logic redundancy. Since the ECC encoder/decoder blocks of conventional fault-tolerant LLCs are in the critical path of each LLC bank access [82], the large number of global LLC accesses in many-cores imposes a high latency and energy overhead on the system (refer to Section 4.4.4 for more details). Moreover, using stronger ECC to protect against multi-bit errors magnifies these overheads. Furthermore, the choice of ECC schemes for the cache and the NoC opens a large design space with different power, area, and performance trade-offs. To address these limitations, we propose a new system-level integrated error coding approach that co-designs error coding jointly for the LLC and the NoC, as depicted in Figure 4.15(b).

4.4.2 CoDEC Overview

In this section, we introduce the target platform and the conventional error coding approach that we consider in our experimental results. In the conventional approach, separate ECCs are applied to the LLC banks and to the NoC interconnect to protect global LLC accesses, which pass through the NoC, against unexpected errors. Our CoDEC approach, on the other hand, combines the error coding of the cache side and the NoC side: in contrast with the conventional approach, we replace the cache line-level ECC with a segmented flit-level ECC, so there is no need for encoder/decoder blocks on the path between the LLC bank and the NI. Finally, we present a design space exploration of the proposed approach over its design parameters and possible trade-offs.

Figure 4.16: A multi/many-core architecture with a mesh NoC interconnect.

Target Platform

To illustrate our approach, we experiment on an exemplar tiled NoC-based multi/many-core architecture, where each tile comprises a processor core, private L1 data and instruction caches, a shared L2 cache (LLC) bank, network interface (NI), and router. Figure 4.16 shows our target 16-core architecture with a 4x4 mesh NoC interconnect. Based on our cache organization, each L2 bank is a portion of the larger distributed shared LLC. A MESI protocol is implemented in order to maintain cache coherence. The target design assumes a NUCA LLC [81]. In general, with multiple banks within the NUCA LLC, we have the choice of either always putting a block into a designated bank (static mapping) or allowing a block to reside in one of multiple banks (dynamic mapping). In static mapping, a fixed hash function uses the lower bits of a block address to select the target bank. In our platform we consider static mapping called static NUCA (SNUCA) [30]. SNUCA statically partitions the address space across cache banks connected via the NoC. In this architecture, the LLC access latency is proportional to the distance from the issuing L1 cache to the target LLC bank.
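As a minimal sketch of the SNUCA bank selection just described, the following Python fragment picks the home bank from the low-order bits of the block address. The exact bit positions and the tile numbering are illustrative assumptions, not the verified platform configuration.

BLOCK_BYTES = 64                                  # cache line size
NUM_BANKS   = 16                                  # one LLC bank per tile in the 4x4 mesh
MESH_DIM    = 4
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1        # 6 bits of block offset
BANK_BITS   = NUM_BANKS.bit_length() - 1          # 4 bits select the bank

def snuca_home_bank(phys_addr):
    """Select the home LLC bank from the low-order bits of the block address."""
    block_addr = phys_addr >> OFFSET_BITS
    return block_addr & ((1 << BANK_BITS) - 1)

def bank_coords(bank):
    """Map a bank index to (x, y) tile coordinates in the mesh."""
    return bank % MESH_DIM, bank // MESH_DIM

if __name__ == "__main__":
    addr = 0x8049F40
    bank = snuca_home_bank(addr)
    print(f"address {addr:#x} -> bank {bank} at tile {bank_coords(bank)}")

Because the mapping is a fixed hash of the address, the LLC access latency of a given block is determined entirely by the Manhattan distance between the requesting tile and this home bank.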

Conventional Approach

Figure 4.17(a) shows the architecture of a conventional approach with integrated NoC-LLC error coding that leverages multi-bit ECC for LLC protection and s2s ECC for the interconnect. On the cache side, each cache block is extended with an ECC, and every read or write requires error detection or correction coding, respectively. On the NoC side, each flit is extended with s2s ECC to protect data packets against errors over the interconnect fabric, including the links and the network routers. As shown in Figure 4.16, an encoder/decoder is applied at each port of every router in the NoC. For the LLC banks we consider a block-level multi-bit ECC with 2-bit error correction (DECTED), which is common in fault-tolerant cache designs [132].

Figure 4.17: The architecture of an ECC-protected LLC bank in the conventional and CoDEC approaches.

As depicted in Figure 4.17(a), each bank has its own encoder/decoder (shown as ENC/DEC) in the critical path of each LLC access, for both core (local) and network (global) accesses. Consequently, whether the accessed cache block is local or travels over the network, it passes through the bank's encoder/decoder. Since most of the accesses to the LLC banks of a shared NUCA LLC come from non-local cores over the network [75] (as depicted in Figure 4.13), cache blocks also need to be protected over the whole NoC. Hence, we leverage s2s ECC and modify the NI and the router ports to include a cache-block encoder/decoder, as shown in Figure 4.17(a). Figure 4.18(a) shows an example of how a cache block is protected from the cache through the network in this conventional approach. In this example, a 64-byte cache block is extended with 40 ECC bits in the cache. On each LLC access, both the cache block and its corresponding ECC bits (40 bits in our example) are read and passed through the cache decoder for possible error detection and correction, and then handed to the NI to be fragmented into multiple flits. In the NI, the block is fragmented into four flits, each of which is additionally protected with 8 ECC bits, encoded by the encoder inside the NI, and then transmitted to the router.

Figure 4.18: Data flow of an LLC block through the network in the conventional vs. CoDEC approach: (a) baseline; (b) CoDEC.
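The flitization contrast between Figures 4.18(a) and 4.18(b) can be modeled with the sketch below: a 512-bit block is split into four 128-bit segments and each segment receives its own Hamming single-error-correcting check bits, giving H(136,128) flits. This is an illustrative Python model (the in-flit bit ordering is an assumption), not the RTL used in our evaluation.

def sec_check_bit_count(k):
    """Number of Hamming parity bits r with 2**r >= k + r + 1."""
    r = 1
    while (1 << r) < k + r + 1:
        r += 1
    return r

def hamming_encode(data):
    """Hamming SEC encoding: data bits fill non-power-of-two positions 1..n,
    parity bits sit at the power-of-two positions."""
    k = len(data)
    r = sec_check_bit_count(k)
    n = k + r
    code = [0] * (n + 1)                 # index 0 unused
    it = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two -> data position
            code[pos] = next(it)
    for i in range(r):                   # parity bit at position 2**i covers
        p = 1 << i                       # every position with that bit set
        code[p] = 0
        for pos in range(1, n + 1):
            if pos != p and (pos & p):
                code[p] ^= code[pos]
    return code[1:]

def fragment_block(block_bits, num_segments=4):
    """Split a cache block into segments and protect each one individually,
    as CoDEC's NI does when it builds flits."""
    seg = len(block_bits) // num_segments
    return [hamming_encode(block_bits[i * seg:(i + 1) * seg])
            for i in range(num_segments)]

if __name__ == "__main__":
    import random
    block = [random.randint(0, 1) for _ in range(512)]
    flits = fragment_block(block)
    print(len(flits), "flits of", len(flits[0]), "bits each")   # 4 flits of 136 bits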

Proposed CoDEC Approach

In CoDEC, we propose a codesign of cache and interconnect error coding that uses segmented multi-bit ECC in the LLC banks and s2s ECC in the NoC. Using this approach, we reduce the latency and power of shared LLC protection while maintaining the same level of reliability as conventional approaches. CoDEC replaces the block-level strong ECC (i.e., DECTED) in the LLC banks with a segmented low-cost ECC (i.e., SECDED) and sends the ECC bit chunks along with the block through the network, as represented in Figure 4.17(b). In this way, CoDEC removes the large encoder/decoder of the cache banks from the critical path of LLC accesses over the network. Compared to the conventional approach, CoDEC incurs a small additional ECC storage overhead.

As depicted in Figure 4.17(b), the LLC controller and NI blocks inside each tile need some modification to implement CoDEC; the modified components are highlighted in the figure. In the LLC banks, the data array needs to store all flit-level ECC bits alongside each cache line. The large cache-level encoder/decoder blocks are replaced by multiple smaller parallel ones, placed only on the path between the LLC and the local core, so there are no large cache-level encoder/decoder blocks in the critical path of network accesses, as shown in Figure 4.17(b). The encoder/decoder blocks inside the NI are also removed, and its internal logic is modified to pass the ECC bits of each cache segment through during the flit generation process. Figure 4.18(b) represents the fragmentation of each cache block and the construction of data flits in the proposed architecture.

Figure 4.19 illustrates the complete data path of the CoDEC design in an interconnected many-core architecture with distributed shared LLC banks, from a network-level point of view. The figure provides an example of how CoDEC guarantees end-to-end protection of LLC data blocks from source to destination by using the s2s error protection capability of the interconnect. In summary, each cache data block is virtually divided into smaller segments, as shown in Figure 4.18(b); each of these segments, together with its generated ECC, is then passed through the NoC as a flit, as shown in Figure 4.19. The figure also shows how a cache block can become faulty in the network and how it is corrected by the routers along the path.

4.4.3 Evaluation

In this section, after introducing the simulation setup, we discuss the effects of various design parameters and the possible trade-offs in the design of the CoDEC approach.

Figure 4.19: CoDEC LLC data protection over the network from source to destination.

Experimental Setup

We leverage Sniper [40], a parallel, high-speed, cycle-level x86 multicore simulator, to evaluate the performance impact of our approach. Nine benchmarks from the PARSEC suite are selected and run on the modified simulator. The details of the simulation setup are listed in Table 4.5. In addition, RTL models of both the conventional and CoDEC routers are implemented in Verilog to evaluate the implementation overheads. The routers contain eight-flit buffers for each port. The accuracy of the NoC model is verified by tracing packet routes through the network with the help of SystemVerilog for Splash traffic patterns. Synopsys Design Compiler with a TSMC 32nm standard cell library is used to synthesize the routers and report accurate timing and area overheads, including the encoder/decoder (ENC/DEC) blocks of both the conventional and CoDEC designs. Power consumption is measured by post-synthesis gate-level simulation with Synopsys PrimeTime. Each router is designed to operate at its maximum frequency, under the assumption that processor tiles operate in their own frequency domain asynchronously with respect to the interconnection network, which is common in industrial many-core chips [151]. CACTI 6.5 [116] is used to evaluate the area, power, and delay of the LLC banks and their related components.

Table 4.5: Simulation Configuration

Core: 2.66 GHz, 4-way issue, 128-entry ROB
L1 I/D Cache: 32 kB split I/D, 4-way associative, 64 B lines, 4-cycle latency, LRU
L2 Cache: shared 16 MB, 16 banks, 16-way associative, 64 B lines, 8-cycle latency, LRU
Coherence protocol: MESI
Interconnect: 2D-mesh NoC (4x4 for 16 banks and 16 cores), 1-cycle link latency, pipelined routers, XY routing
Technology: 32 nm

Design Space Exploration

Each cache block entry is divided into a fixed number of segments, and the size of each NoC flit is chosen to equal the segment size in the CoDEC design. We assume a fixed number of flits is dedicated to each buffer component of the NoC; larger flits need bigger buffers. The flit size is therefore the tuning parameter behind the choice of segment size (number of segments), which is the major configuration parameter of the CoDEC design. Figure 4.20(a) compares different flit sizes for various segment counts, assuming a 512-bit cache block. As expected, dividing each cache block entry into more segments calls for smaller flits. On the other hand, as the number of segments increases, higher ECC capability is provided at the cost of more redundancy bits stored inside the cache. Note that larger buffer components require more complex encoder/decoder blocks in CoDEC, resulting in higher power consumption, area overhead, and latency. Since flit sizes of 64-128 bits are generally accepted by the NoC community [124], a cache block with 4 to 9 segments is acceptable for the CoDEC design based on Figure 4.20(a). However, as the number of segments increases (i.e., the flit size decreases), the whole packet is transmitted with higher delay. Therefore, the minimum number of segments within the accepted range of NoC bandwidth, namely 4, is chosen for this experiment.

Figure 4.20: Impact of segment size on conventional versus CoDEC design: (a) impact of cache block segmenting on NoC flit size; (b) failure probability of the conventional and proposed methods in terms of bit error correction capability; (c) ECC bit overhead of the conventional and proposed methods in terms of bit error correction capability.

Reliability analysis

This section discusses the trade-off between overhead and reliability coverage in the proposed CoDEC method. The probability of a single-bit error occurring in a memory device with n bits is represented by Equation 4.1:

Q = \frac{FIT}{n} \qquad (4.1)

where FIT refers to failures in time (errors per billion device hours). Typical FIT rates for latches and SRAM cells at sea level vary between 0.01 and 0.001 FIT/bit [148]. Recent developments in cache physical fabrication have resolved the correlated multiple-bit error issue: interleaving has been used in SRAM/DRAM cells [146] to generate logical check bits from physically dispersed areas of the memory array. With this solution, a multi-cell upset leads to multiple single-bit errors rather than a single multi-bit error. Consequently, in this work we assume that the probability of a single-bit error is independent of any other bit in a cache block and that the error rate is uniform throughout the cache; the probability of E errors being present in a cache block is then given by Equation 4.2:

P(E) = \binom{N}{E} \, q^{E} (1 - q)^{N-E} \qquad (4.2)

where N represents the number of bits in a cache block and q is the probability of a single soft error occurring in a single bit (i.e., Q for n = 1). Therefore, the probability of exactly one bit error in a cache block, derived from Equations 4.1 and 4.2, is given by Equation 4.3:

P(E = 1) = N \cdot FIT \cdot (1 - FIT)^{N-1} \qquad (4.3)

The robustness probability of a fault-tolerant cache block with at least E_{cov}-bit error correction capability is shown in Equation 4.4:

\sum_{E=0}^{E_{cov}} P(E) \qquad (4.4)

However, the failure probability of a fault-tolerant cache block with at least E_{cov}-bit error correction capability in the segmented ECC method depends on the number of segments, as shown in Equation 4.5:

1 - \left( \sum_{E=0}^{E_{cov}} P(E) \right)^{S} \qquad (4.5)

where S represents the number of segments. The partitioning is tuned by the flit-size definition, as discussed earlier. The failure probability and ECC overhead bits of the conventional and proposed methods are reported as a function of the error correction capability per segment in Figures 4.20(b) and 4.20(c). Choosing a reasonable number of segments depends on the flit-size limitation while keeping a failure probability similar to that of the conventional method: a larger flit size demands more NoC data bandwidth, while a smaller flit size increases the latency of packet transmission through the network. Furthermore, within the accepted range of NoC bandwidth, higher per-segment error correction in the proposed method needs more ECC bits, as shown in Figure 4.20(c). As shown in Figure 4.20(b), the failure probability of the conventional method with E_{cov} = 2 is closest to that of the proposed method with E_{cov} = 1 and 4 segments, which requires fewer ECC bits. Therefore, each cache block entry is divided into 4 segments in our proposed method, with E_{cov} = 1 per segment.
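Equations 4.2 to 4.5 can be reproduced numerically with the short sketch below. The per-bit error probability q used in the example is an assumed, illustrative value (not a measured FIT figure), and the sketch interprets P(E) in Equation 4.5 as the error distribution over a single segment of N/S bits.

from math import comb

def p_block_errors(e, n_bits, q):
    """Equation 4.2: probability of exactly e bit errors in an n_bits block."""
    return comb(n_bits, e) * q**e * (1 - q)**(n_bits - e)

def p_fail_monolithic(n_bits, q, e_cov):
    """Complement of Equation 4.4: the block fails if more than e_cov errors occur."""
    return 1 - sum(p_block_errors(e, n_bits, q) for e in range(e_cov + 1))

def p_fail_segmented(n_bits, q, e_cov, segments):
    """Equation 4.5: a segmented block fails if any segment exceeds e_cov errors."""
    seg_bits = n_bits // segments
    p_seg_ok = sum(p_block_errors(e, seg_bits, q) for e in range(e_cov + 1))
    return 1 - p_seg_ok**segments

if __name__ == "__main__":
    q = 1e-3    # assumed per-bit error probability, for illustration only
    print("conventional, Ecov=2 :", p_fail_monolithic(512, q, 2))
    print("segmented x4, Ecov=1 :", p_fail_segmented(512, q, 1, 4))

Sweeping q and the segment count with this sketch reproduces the qualitative trend of Figure 4.20(b): four SECDED-protected segments track the failure probability of a single DECTED-protected block quite closely.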

4.4.4 Experimental Results

In this section, we first present the impact of CoDEC on system performance. Then, the power consumption and area overheads of different design configurations are compared separately for the cache and the NoC. The global and local LLC access times are also reported to show the improvement of the CoDEC method over the baseline. We compare the results of the proposed CoDEC with two common approaches, single-error-correction double-error-detection (SECDED) and double-error-correction triple-error-detection (DECTED) [43]. We also consider a non-fault-tolerant (Non-FT) design without any error protection as a reference for comparing the overheads of the different approaches.

Figure 4.21: Performance (IPC) improvement of a system with CoDEC LLC, normalized to the baseline.

Figure 4.22: Cache experimental results normalized to the DECTED results: (a) global access time; (b) local access time; (c) power; (d) area.

Performance Results

We evaluate the performance impact by measuring the aggregate user instructions committed per cycle (IPC). Figure 4.21 shows the IPC improvement of CoDEC normalized to the IPC of the conventional approach (see Section 4.4.2) when running nine PARSEC benchmarks. The system with CoDEC achieves up to 14% and on average 5.5% performance improvement compared to the system with the conventional LLC protection approach.

Cache Results

Figure 4.22 depicts the latency, power, and area breakdown of the different LLC protection approaches. Note that the reported values are normalized to DECTED, which is considered the conventional approach. In this figure, the global access time of each cache bank (dark bar) is the access time from the router to the LLC data array. Since CoDEC removes the ENC/DEC blocks from the critical path of global LLC bank accesses (as depicted in Figure 4.17), it has almost the same global access latency as Non-FT. The latencies of the ECC logic in SECDED and DECTED (shown in light gray) add to the latency of the cache data array, resulting in slower cache accesses. Although the local cache access latency of the CoDEC method is higher than that of both the Non-FT and SECDED methods, it is still lower than that of the conventional (DECTED) approach, as shown in Figure 4.22(b); this is because the large error control block is replaced by multiple smaller ones on the local cache access path. Moreover, as discussed in Section 4.4.1, global cache accesses are much more frequent than local ones, which justifies the negligible latency overhead of the CoDEC method compared to the other approaches. The cache bank power consumption of all fault-tolerant approaches is higher than that of the Non-FT method, but the power consumption of CoDEC is lower than that of DECTED while providing stronger protection, as shown in Figure 4.22(c); this is because the smaller encoders have fewer inputs and simpler gate-level implementations. Note that the area overhead of the CoDEC method is only negligibly (< 1%) higher than that of the conventional approach, since it needs to store more ECC bits to support better protection, as depicted in Figure 4.22(d).

Figure 4.23: Power consumption of the NoC router's major blocks for various ECC sizes.

NoC Results

Figures 4.23 and 4.24 show the power and area of fault-tolerant routers for different flit sizes, where each flit is protected by a Hamming code. The number of ECC bits varies according to the flit size. Data flit sizes in the range of 32-256 bits are examined in this experiment, shown along the horizontal axes of Figures 4.23 and 4.24. All reported values are normalized to the absolute values of the CoDEC method with 4 segments. For a 512-bit cache block and 4 segments in the CoDEC method, each flit is composed of 128 data bits plus 8 extra bits, denoted FT-H(136,128) in Figures 4.23 and 4.24. The flit size, and consequently the bandwidth of each router port (assuming equal flit and phit sizes), is determined by adding the redundant ECC bits to the actual data bits; for example, the NoC router bandwidth for the FT-H(136,128) configuration is 136 bits. The power consumption and area overhead of the NoC routers decrease as smaller flit sizes are chosen, because smaller flits need less bandwidth and smaller buffers.

Finally, Table 4.6 summarizes the comparison of the SECDED, DECTED, and CoDEC methods. In this table, the overhead and improvement values are normalized to the Non-FT values; the last column represents the reliability improvement with respect to Non-FT, which has no error protection. Comparing CoDEC to DECTED, the local and global cache access time overheads are reduced from 50% to 31.6% and from 50% to 2.4%, respectively. While the CoDEC design imposes a small additional overhead over DECTED (less than 1% area and 2.5% more ECC storage), it has almost the same power consumption and protection capability. Note that for all approaches except Non-FT we consider an s2s single-bit ECC to protect the NoC interconnect.

Figure 4.24: Area of the NoC router's major blocks for various ECC sizes.

Table 4.6: Overhead and improvement comparison (values in %, normalized to Non-FT).

Method    Local access  Global access  Area   Dynamic Power  Total Power  ECC Storage  Reliability Improvement
Non-FT    0             0              0      0              0            0            0
SECDED    24.8          24.8           7.9    45.2           16.5         2.1          77
DECTED    50            50             10.7   50.6           19.9         3.9          96
CoDEC     31.6          2.4            11.4   49.0           19.7         6.3          93

4.5 Summary

In this chapter, we proposed two interconnect-aware approaches to error-resilient design of LLC in NoC-based multi/many-core platforms which are orthogonal to all previous efforts in fault-tolerant cache design as well as reliable NoC design.

First, we proposed a novel approach for the efficient implementation of fault-tolerant NUCA LLCs in CMP architectures. Our solution is based on a modular extension of the NoC routers to support fault-tolerant schemes that scale with increasing CMP sizes. We proposed four different policies for implementing a remapping-based fault-tolerant scheme that leverage different configurations of the NoC. Our experimental results on an 8-core CMP with 16 shared LLC banks show that leveraging the on-chip interconnect fabric in the design and implementation of the fault-tolerant scheme is an efficient and scalable solution that comes with a small overhead (less than 3% area and less than 7% power). We studied the effects of the four fault-tolerant policies on the NoC and showed that the parallelism of communication directly affects the results. To explore the impact of different NoC parameters, we also presented results showing that some of these policies benefit more from architectural changes such as the number of virtual channels and adaptive routing. More virtual channels improve the policies that had previously performed worst because of their particularly poor use of communication parallelism; however, the results also show that this improvement saturates very quickly. In the experiments with adaptive routing, the policies that created more contention now perform notably better. It is also important to consider the effect of the fault-tolerant method's configuration on the NoC parameters: in the cases presented here, we explored the distance between the host and target LLC banks in the NoC, and our experiments show that allowing the target blocks to be farther away from the host blocks does not reduce the number of disabled blocks by much, while there is a sweet spot at which the overall IPC can be improved.

As a second contribution, we proposed CoDEC, a novel codesign of cache and interconnect error coding for many-core architectures that improves performance by reducing cache access latency with minimal area and robustness impact. CoDEC guarantees e2e protection of LLC data blocks by integrating strong multi-bit ECC in the cache banks with s2s error coding in the interconnect, providing a unified system-wide ECC policy. Our experimental results showed that, with respect to the conventional DECTED approach, CoDEC improves the LLC access time for global and local accesses by 50% and 20%, respectively, leading to up to 14% performance improvement, with a negligible overhead of only 1% area and 2.5% additional ECC storage. Potential directions for future work include adaptive interconnect error coding, improvements to the proposed architecture, and a broader design space exploration involving different system configurations and coding algorithms.

Chapter 5

Relaxing Error Resiliency in On-chip Memories

5.1 Introduction

As the semiconductor industry continues to push the limits of sub-micron technology, the ITRS expects hardware variations to continue increasing over the next decade [157]. The power consumption of the memory subsystem, a major contributor to the overall power consumption of the system, is particularly vulnerable to the effects of variations. Device manufacturers have masked the presence of variability through expensive over-design and "guard-banding" in order to guarantee data integrity. However, 100% correctness may not be required for all applications. This chapter proposes the idea of relaxing or reducing the typical memory guard-banding for systems running applications that can tolerate some level of error in their computations. We refer to this class of memories with relaxed guard-banding as Partially-Forgetful Memories (PFMs). Potential advantages of deploying PFMs include: (i) better energy efficiency for memory accesses, (ii) increased lifetime of the memory subsystem, and (iii) reduced memory access latency. In the following, we overview some of the overheads of guard-banding in memories and briefly mention the application domains that can benefit from PFMs.

5.1.1 Cost of Guard-banding

Guard-banding leads to over-design with suboptimal power and/or lifetime across memory technologies. SRAM studies [153] show that masking all variability in a 90nm SRAM requires increasing the data retention voltage from its nominal value of 0.35V to 0.7V, resulting in higher power consumption. Similarly, DRAM studies [99] show that 10^3 leaky cells in a 32GB DRAM module force the use of a 64ms refresh interval even though the remaining cells could work with a refresh interval four times longer, increasing the refresh power overhead. Likewise, studies on PCM [167] report a 96% increase in programming power and a 50X endurance degradation due to process variation.

5.1.2 Approximate Computing

Recently, researchers [134, 92, 50, 58, 136] have shown that a system can save energy by exposing faults to a variety of applications that are resilient to hardware errors during their execution [51]. In particular, embedded, multimedia and Recognition, Mining and Synthesis (RMS) applications could still process information usefully with error-prone elements [45]. This motivates the opportunity to deploy PFMs in systems executing approximate computing applications.

144 5.1.3 Challenges for Effective Use of PFMs:

The following challenges need to be addressed to effectively exploit PFMs for approximate computing:

1. Data Partitioning: Data structures can be classified as critical (i.e., any corruption leads to catastrophic failure or significant quality degradation) or non-critical (i.e., the quality degradation resulting from corruption may be acceptable). This concept has been used before in [100], where a simple partitioning of data into critical and non-critical was shown to improve DRAM energy efficiency. Further categorization of the non-critical data can yield even more energy efficiency; for example, (i) integer data can be partitioned based on the magnitude of tolerable error, and (ii) floating-point data can be partitioned based on a limit on precision. These categorizations can be reflected in the program by annotating the different data structures. One approach [134][39] defines type qualifiers and dedicated assembly-level store and load instructions for this purpose. We present another approach for PFMs that uses dynamic declarations enforced by a run-time system, as shown by an example in Section 5.3.

2. Data Mapping: The data partitioning annotations in the previous step should guide the mapping of each data category to an appropriate part of memory based on its reliability characteristics. Depending on the type of memory, this mapping can be done by the hardware, the application/compiler, or the operating system. For caches, the cache controller decides where to map a new incoming cache block. For software-controlled memories (e.g., scratchpads), the compiler aggregates data with the same reliability requirement into tagged groups, and a run-time system (with knowledge of the underlying hardware) then performs the actual mapping based on this tagging. For main memories, the operating system should perform the mapping during logical-to-physical address mapping.

3. Controlling the Error Rate Exposed to the Program: Most previous attempts at utilizing relaxed-reliability memories [100][135] lack the ability to dynamically adjust the error rate exposed to the program. A recent work [150] has shown how the DRAM timing margin can be changed dynamically to trade precision for performance. We believe this capability is necessary because different applications have different levels of tolerance to errors; even within an application, one execution phase may tolerate more errors (i.e., be less critical) than another. Therefore, it is important to enable adjustment of the guard-banding knobs based on the characteristics of the application. Examples of such knobs are the DRAM refresh rate, SRAM voltage, STT-RAM retention time, and the I_RESET current in PCM.

4. Adaptation to Phasic Behavior of Applications: There are three main reasons to change memory guard-banding knobs during run-time: (i) Depending on the (set of) applications that are executed, we might tolerate different levels of error, (ii) We may need to change the ratio of reliable to unreliable memory capacity to prevent performance degradations for specific situations (e.g., where most of the working set objects should be mapped to reliable parts of memory), and (iii) The programmer can save power by disabling idle parts of the memory for computation-bound application phases (this works even for applications that do not tolerate any errors).

5.1.4 Contributions

In this chapter, we explore how Partially-Forgetful Memories can be realized by exploiting the intrinsic tolerance of a vast class of applications to some level of error in order to relax the guard-banding in memories [145]. We briefly discussed the challenges to be addressed above. While the notion of PFMs is applicable across the entire memory hierarchy, in the following we introduce Relaxed Cache as an exemplar that addresses these challenges for partially-forgetful SRAM caches. Preliminary results show how adapting the guard-bands to application characteristics can help the system save a significant amount of cache leakage energy (up to 74%) while still generating results of acceptable quality.

5.2 Related Work

This work represents a convergence of two main bodies of research: variability-aware memory management and approximate computing.

5.2.1 Exploiting Memory Variability

There have been several efforts to exploit variations in both off-chip and on-chip memories for power saving. Some of these efforts expose memory variability through the software stack and let the programmer manage it, but none of them considers the phasic behavior of an application. The work in [26] virtualizes the on-chip and off-chip memory space to exploit variation in the memory subsystem through a Variability-aware Memory Virtualization (VaMV) layer that allows programmers to partition their application's address space into regions and create mapping policies for each region. Bathen et al. proposed ViPZonE [28], an OS-level solution that exploits DRAM power variation through physical address zoning for energy savings; ViPZonE provides a variability-aware software stack that allows developers to indicate the expected dominant usage patterns and level of utilization to the OS through high-level APIs. While Relaxed Cache is similar to such approaches in that it exploits memory variability, our proposal differs in two aspects. First, Relaxed Cache also considers the phase behavior of the application and the criticality of data in its mapping policy. Second, Relaxed Cache intentionally relaxes the guard-bands under programmer control to gain more energy savings by matching different phases of the application to the underlying hardware.

147 5.2.2 Approximate Computing

A significant amount of work has proposed hardware and software techniques that trade off application fidelity or correctness for gains in energy, performance, lifetime, or yield. Many studies have shown that a variety of applications have a high tolerance to errors [51, 45, 92, 109, 95]. ERSA proposes collaboration between discrete reliable and unreliable cores for executing error-resilient applications [92]. Esmaeilzadeh et al. proposed Truffle [59], a dual-voltage microarchitecture that supports mapping disciplined approximate programs onto hardware. Some software-based approaches use language extensions to give the programmer the ability to perform relaxations [134, 39]. EnerJ [134] uses type qualifiers to mark approximate variables; using this type system, EnerJ automatically maps approximate variables to low-power storage and uses low-power operations to save energy. In contrast with EnerJ, Relaxed Cache lets the programmer change the criticality type of variables throughout the application to match its phasic behavior. Rely [39] is a language for specifying computations that execute on unreliable hardware, together with an analysis that ensures the computation satisfies its reliability specification. Rely considers an abstract model of the underlying hardware with a simple reliability specification that assigns probabilistic values to arithmetic/logical and memory operations, so it is limited to soft errors.

Relax [50] is a compiler/architectural framework for exposing hardware errors to software in specified regions of code in order to save computational power. Relax allows programmers to mark certain regions of the code as relaxed and to decrease the processor's voltage and frequency below the critical threshold when executing such regions. While Relax focuses on error recovery and hardware design simplicity, our Relaxed Cache approach emphasizes energy efficiency over error detectability and supports a wider range of power-saving approximations. Moreover, Relax explores a code-centric approach, in which blocks of code are marked for failure and recovery, while Relaxed Cache employs data-centric type annotations. Green [15] trades Quality-of-Service (QoS) for energy efficiency in server and high-performance applications. Green allows programmers to specify regions of code in which the application can tolerate reduced precision; based on this information, the Green system attempts to compute a principled approximation of the code region (a loop or function body) to reduce processor power.

While most of the previous efforts on approximate computing have focused on the computational part of the system, only two works have addressed unreliable off-chip main memories. Liu et al. proposed Flikker [100], a software/hardware technique that utilizes a low-refresh-rate portion of DRAM for storing non-critical data; a programmer using Flikker has no control over the refresh rate of the unreliable part of the DRAM and therefore cannot adjust it to the application's tolerable level of error. Sampson et al. [135] proposed two unreliable write schemes for solid-state main memories based on multi-level cell (MLC) phase change memory (PCM) technology to improve performance. Unlike this prior work on memory, with Relaxed Cache the programmer can adjust the dynamic knobs to the capacity/reliability required for each phase of the program to achieve energy savings.

5.3 Relaxed Cache

This section presents Relaxed Cache as an exemplar that addresses the challenges outlined in the previous section for employing partially-forgetful SRAM caches. Relaxed Cache lets the application programmer adjust the cache guard-banding knobs and thereby adapt the cache's reliability and capacity to the application's demands. Accordingly, the application programmer needs to identify critical memory objects as well as the major criticality and computational phases of the application. Although in the current prototype most of this identification is done manually by the programmer, some of this burden can be removed by developing automated analysis methods that give the programmer hints for identifying (i) critical memory objects [108], and (ii) the sensitivity of the application's computational phases to cache capacity [89].

Figure 5.1 shows a high-level overview of the Relaxed Cache scheme. Our scheme requires small modifications to both the traditional hardware and software components of a system (the shaded components of Figure 5.1). In the following, we briefly describe the changes required for hardware and software support; more details are in our technical report [144].

5.3.1 Hardware Support

Here, we outline the changes needed for traditional cache hardware (shaded hardware blocks of Figure 5.1):

Tuning Relaxed Ways Based on Architectural Knobs

Figure 5.1: High-level diagram showing the HW/SW components of Relaxed Cache and their interactions.

Consider a cache that has N ways. A small fixed number (e.g., log N) of these ways are protected, using a combination of a high supply voltage and error correction codes that assure their fault-free storage (Protected Ways). The remaining cache ways in a cache set operate at a lower, dynamically adjustable voltage. The operation of the SRAM array for the relaxed ways is controlled by two architectural knobs: (i) the supply voltage (VDD), and (ii) the number of acceptable faulty bits (AFB) per cache block. Every block with more than AFB faults is disabled. The definition of AFB allows us to relax the guard-bands for the majority of cache ways while controlling the level of error exposed to the program. The combination of AFB and VDD also determines the active portion of the cache and can therefore be tuned for the required performance. According to these definitions, for each (VDD, AFB) setting we can have four types of cache blocks:

• Protected Block (PB): All blocks in the protected ways.

• Clean Block (CB): All fault-free blocks in the relaxed ways.

• Relaxed Block (RB): All blocks in the relaxed ways that have at least one but no more than AFB number of faults.

151 • Disabled Block (DB): All blocks in the relaxed ways that have more than AFB number of faults.
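This taxonomy can be expressed compactly as a small decision function; the sketch below simply restates the definitions above in Python and is not part of the Relaxed Cache hardware.

def classify_block(is_protected_way, faults, afb):
    """Return the Relaxed Cache block type for the current (VDD, AFB) setting."""
    if is_protected_way:
        return "PB"                         # protected ways are kept fault-free
    if faults == 0:
        return "CB"                         # fault-free block in a relaxed way
    return "RB" if faults <= afb else "DB"  # usable with few faults, else disabled

if __name__ == "__main__":
    for f in (0, 3, 7):
        print(f, "faults ->", classify_block(False, f, afb=4))   # CB, RB, DB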

Figure 5.2 demonstrates these terminologies using a sample 4-way cache when AFB = 4.

Figure 5.2: A Sample 4-way Cache with VDD=580mV and AFB=4.

Defect Map Generation and Storage

Disabling blocks that have more than AFB faults requires a defect map for bookkeeping, as in previous fault-tolerant voltage-scalable caches (e.g., [22]). To populate the cache defect maps, we can use any standard Built-In Self-Test (BIST) routine that can detect memory faults (e.g., March tests). The defect map bookkeeping requires a few additional bits in the tag array of each cache block; however, if the number of AFB and VDD levels is limited, this overhead is negligible. For instance, assume that we have four levels of AFB and three levels of VDD. We can show that in this case the defect map overhead is about 1.5% for a cache with a 64-byte block size. Assume that our VDD levels are V0 > V1 > V2, and similarly A0 < A1 < A2 < A3 for the AFB levels. Figure 5.3a shows the result of running a March test on the four blocks of a set in a 4-way cache for the three VDD levels; the digits inside the blocks represent the number of faulty bits for each block at that VDD. Note that the blue blocks belong to a protected way and are therefore fault-free for all VDDs. The fault inclusion property [63] states that the bits that fail at some supply voltage level will also fail at all lower voltages; therefore, by decreasing the voltage, the number of faulty bits increases monotonically. We use this property and, for each AFB = Ai, encode the lowest voltage level that results in Ai faults or fewer (Figure 5.3b). In this manner, for each 64-byte block we use 8 bits to capture the defect status of that block for all available (VDD, AFB) pairs, resulting in a 1.5% area overhead for a traditional cache.

Figure 5.3: Encoding and decoding defect map information in Relaxed Cache: (a) output of the March test; (b) encoded defect map metadata; (c) decoded metadata (each 2-bit code indicates which voltages incur no more than AFB faults).
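The encoding just described can be prototyped in a few lines of Python. The sketch below follows the 2-bit codes of Figure 5.3(c) and relies on the fault inclusion property; the example fault counts are made-up inputs rather than measured data.

# Voltage levels ordered from highest (V0) to lowest (V2); AFB levels A0 < A1 < ...
CODES = {0: "00", 1: "01", 2: "10", None: "11"}   # per Figure 5.3(c): "11" means no usable voltage

def encode_block(faults_per_vdd, afb_levels):
    """faults_per_vdd[i] = number of faulty bits of this block at voltage level Vi.
    Returns one 2-bit code per AFB level: the lowest voltage with <= AFB faults."""
    meta = []
    for afb in afb_levels:
        lowest_ok = None
        for level, faults in enumerate(faults_per_vdd):
            if faults <= afb:
                lowest_ok = level      # fault inclusion: lower voltages only add faults,
            else:                      # so keep scanning toward lower voltages
                break
        meta.append(CODES[lowest_ok])
    return "".join(meta)

if __name__ == "__main__":
    # example block: 0 faults at V0, 1 at V1, 5 at V2 (assumed numbers)
    print(encode_block([0, 1, 5], afb_levels=[0, 1, 2, 3]))   # prints '00010101'

With four AFB levels this produces exactly 8 bits of metadata per 64-byte block, matching the 1.5% overhead figure above.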

Non-Criticality Table

A dedicated piece of on-chip memory is used to store the application's virtual address ranges that can be approximated (a sample is shown in Table 5.1). Before an incoming block replaces an evicted block, this table is searched by hardware comparators to check whether the block's address is contained in any of the address ranges in the table, and the block is tagged accordingly. The number of entries in the table is a design choice; our experiments show that approximately 30 entries are sufficient. Thus the storage overhead is 30 x (32 + 32 + 1) bits, roughly 4 cache blocks, which is less than 1% of the cache capacity.

Table 5.1: A sample criticality table for Relaxed Cache

Valid   Start        End
1       0x100ef310   0x100ef79c
1       0x100ec940   0x100efdcc
0       ...          ...
0       ...          ...
0       ...          ...

Hardware comparators check whether the block's address is contained in one of the table entries. While the data is being fetched from the L2 or main memory, these comparators sequentially search for the address in the table: one comparator checks whether the block's start address is greater than Start, the other checks whether the block's end address is less than End, and their outputs are ANDed to determine whether the block is found in the table. The number of comparator pairs is also a design choice; to keep the performance overhead of tagging low, more than one pair can be used. Since the access latency of the L2 cache is on the order of 10 cycles, for a 30-entry table the delay of searching the table can be hidden using 3 pairs of comparators, which effectively makes the performance overhead zero. These comparators are activated only during cache misses (<10% of accesses). Our power analysis using Synopsys Design Compiler shows that the total (dynamic and leakage) power consumption of these comparators is about 1% of the total leakage power in the baseline mode, which means that any leakage saving of more than 1% effectively compensates for this power overhead.
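A behavioral model of this table-seeker check is sketched below. The two sample ranges come from Table 5.1 and the 64-byte block size matches our configuration, while everything else (the function name and the whole-block containment rule) is an illustrative assumption rather than the comparator RTL.

# (valid, start, end) entries, mirroring Table 5.1
CRITICALITY_TABLE = [
    (1, 0x100ef310, 0x100ef79c),
    (1, 0x100ec940, 0x100efdcc),
]

BLOCK_BYTES = 64

def is_non_critical(block_addr):
    """True if the whole 64-byte block lies inside one valid non-critical range."""
    block_start = block_addr & ~(BLOCK_BYTES - 1)
    block_end = block_start + BLOCK_BYTES - 1
    return any(valid and start <= block_start and block_end <= end
               for valid, start, end in CRITICALITY_TABLE)

if __name__ == "__main__":
    print(is_non_critical(0x100ef400))   # True: inside the first declared range
    print(is_non_critical(0x100f0000))   # False: outside every range, so tagged critical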

154 Making Cache Controller Aware of Block’s Tag

Relaxed Cache's replacement policy needs to discriminate between critical and non-critical blocks. Whenever a miss occurs and the missed data block is tagged as non-critical, the replacement policy selects a victim block from the relaxed ways of the corresponding cache set; a critical block, in contrast, is always allocated in a protected way. Because the block replacement logic is not in the critical path for read/write hit accesses, this minor modification does not affect the cache access latency.
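The criticality-aware victim selection can likewise be sketched behaviorally. The function below combines the rule above with the Random replacement policy used in our evaluation; its data layout (a list of per-way dictionaries) is purely an illustrative assumption, not the cache controller implementation.

import random

def pick_victim(set_ways, incoming_is_critical, afb):
    """set_ways: per-way dicts with 'protected' (bool) and 'faults' (int at the current VDD).
    Returns the index of the way to evict, following Relaxed Cache's policy."""
    if incoming_is_critical:
        candidates = [i for i, w in enumerate(set_ways) if w["protected"]]
    else:
        candidates = [i for i, w in enumerate(set_ways)
                      if not w["protected"] and w["faults"] <= afb]   # skip disabled blocks
    return random.choice(candidates) if candidates else None

if __name__ == "__main__":
    ways = [{"protected": True,  "faults": 0},
            {"protected": False, "faults": 1},
            {"protected": False, "faults": 7},   # disabled when AFB < 7
            {"protected": False, "faults": 0}]
    print(pick_victim(ways, incoming_is_critical=False, afb=4))   # 1 or 3
    print(pick_victim(ways, incoming_is_critical=True,  afb=4))   # 0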

5.3.2 Software Support

We now describe changes to software components in Figure 5.1:

Programmer-Driven Application Modifications

Data Criticality Declaration: In order to utilize relaxed ways, the software programmer should identify non-critical data structures in the program to be mapped to those relaxed ways. We believe a software programmer can easily make this distinction, since (s)he is aware of the functionality and semantics of the different parts of the application and the related data structures. The constructs DECLARE_APPROXIMATE(Start-VA, End-VA) and UNDECLARE_APPROXIMATE(Start-VA, End-VA) are used to declare and undeclare data non-criticality dynamically within the code, where Start-VA and End-VA are the virtual address boundaries of the target region. The additional capability of undeclaring a data region becomes important in two cases: (i) when an error-bounding procedure should be performed on a non-critical data object before passing it to the next stage of the program in order to reduce the propagated error magnitude (here, it is usually desirable to stop introducing any further errors into the data after error bounding), and (ii) when a critical procedure should be performed on a piece of non-critical data (e.g., computing a checksum of pixels at the end of image compression).

Cache Configuration: To utilize the capabilities offered by Relaxed Cache, the software programmer can annotate code regions with cache configuration hints. These hints are then used by the system to dynamically control the VDD and AFB knobs and adjust the guard-banding to the current phase of execution. The programmer, with in-depth knowledge of the software, is in the best position to identify application phases with different criticality and/or memory-insensitivity and, based on that, guide the adaptive guard-banding in hardware.

To accommodate programmers with different levels of hardware familiarity, we can abstract the Relaxed Cache knobs as shown in Figure 5.4. An embedded system programmer, with fairly good understanding of hardware details, can explicitly set Relaxed Cache’s knobs to fully exploit its capabilities. On the other hand, for a traditional software developer (with little knowledge of hardware), we can abstract the hardware knobs settings into discretized “levels” allowing the programmer to qualitatively set the knobs based on their effects (e.g., low power setting, or high-fidelity setting), as shown in Figure 5.4.

Language extensions then let the programmer use CONFIG_CACHE(VDD-LEVEL, AFB-LEVEL) to configure the cache knobs. Since this reconfiguration incurs a penalty of several cycles, it should be done only at certain phase transitions of an application.
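As a minimal sketch of how such discretized levels might translate into concrete knob settings, consider the following; the enum names and the specific millivolt/AFB values are illustrative assumptions, not the actual runtime interface.

/* Hypothetical mapping from abstract knob levels (Figure 5.4) to concrete
 * (VDD, AFB) settings; the values below are placeholders for illustration. */
typedef enum { VDD_HIGH, VDD_MEDIUM, VDD_LOW } vdd_level_t;
typedef enum { AFB_NONE, AFB_MILD, AFB_AGGRESSIVE } afb_level_t;

typedef struct {
    unsigned vdd_mv;    /* data-array supply voltage in mV        */
    unsigned afb_bits;  /* acceptable faulty bits per cache block */
} cache_knobs_t;

static cache_knobs_t map_levels(vdd_level_t v, afb_level_t a)
{
    static const unsigned vdd_mv[]   = { 700, 540, 460 };  /* HIGH, MEDIUM, LOW      */
    static const unsigned afb_bits[] = { 0, 2, 5 };        /* NONE, MILD, AGGRESSIVE */
    cache_knobs_t k = { vdd_mv[v], afb_bits[a] };
    return k;
}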

Figure 5.5 shows an image-resizer code fragment that uses declarations and cache configuration to save energy. In this example, the two large regions of data that can tolerate some level of errors are the original image's data structure (pointed to by src) and the data structure storing the up-scaled image (pointed to by dest); the programmer has declared both regions as candidates for relaxed caching. After returning from the scale() function, and before moving to the next program phase, (s)he has undeclared both regions.

[Figure 5.4 diagram: the VDD knob levels (High, Medium, Low) and AFB knob levels (None, Mild, Aggressive), each annotated with its effect on energy (E), fidelity (F), and performance (P).]

Figure 5.4: Abstracting Relaxed Cache knobs up to metrics familiar to a software programmer (i.e., performance, fidelity, and energy consumption). Note that (VDD = High, AFB = Mild) and (VDD = High, AFB = Aggressive) combinations are sub-optimal, hence not applicable.

Run-time System

We assume a run-time system uses the programmer's declarations to keep and update a table of virtual addresses that contain non-critical data objects for each application. During a miss, while the data is fetched from the lower level of memory (L2 cache or main memory), the comparators in the table seeker sequentially search for the block's address inside the table; the table seeker then tags the block as critical/non-critical. This tagging is used by the cache controller's replacement mechanism to find an appropriate cache way for replacement. Note that, in order for a block to be tagged as non-critical, the entire data block should hold non-critical data. A data block with mixed critical/non-critical data would still be tagged as critical. Further optimization is possible through criticality-aware data placement during compilation to avoid having mixed-criticality blocks, which is outside the scope of this chapter.

#include "Approximations.h"   // Enables approximation declarations
#define INT_SIZE sizeof(int)

int main(int argc, char **argv) {
    // Capturing Raw Image on Camera
    // Compress Raw Image

    /*------ Image Scaling ------*/
    CONFIG_CACHE(MEDIUM_VDD, MILD_AFB);
    int *src, *dest;
    dest = (int *) malloc(*num_elmnts * INT_SIZE);
#ifdef SRC
    DECLARE_APPROXIMATE((uint64_t)&src[0],
                        (uint64_t)(&src[*num_elmnts] + INT_SIZE - 1));
#endif
#ifdef DEST
    DECLARE_APPROXIMATE((uint64_t)&dest[0],
                        (uint64_t)(&dest[*num_elmnts] + INT_SIZE - 1));
#endif

    src  = read_image(in_filename, &sw, &sh, &src_dim);
    dest = allocate_transform_image(scale_factor, sw, sh, &dw, &dh, &dest_dim);
    scale(scale_factor, src, sw, sh, dest, dw, dh);
#ifdef SRC
    UNDECLARE_APPROXIMATE((uint64_t)&src[0],
                          (uint64_t)(&src[src_dim] + INT_SIZE - 1));
#endif
#ifdef DEST
    UNDECLARE_APPROXIMATE((uint64_t)&dest[0],
                          (uint64_t)(&dest[dest_dim] + INT_SIZE - 1));
#endif
    free((void *) src);
    /*---------------------------*/

    // Other Image Transformations/Editing
}

Figure 5.5: A sample code showing programmer’s data criticality declarations and cache configurations to be used by Relaxed Cache.

5.4 Experimental Evaluation

5.4.1 Experimental Setup

We modified the cache architecture in the gem5 framework [34] to implement our scheme in detail and used gem5’s pseudo-instruction capability for implementing the required language extensions. The common gem5 parameters used in all our simulations are summarized in Table 5.2. We used the Random block replacement policy that is commonly used in embedded processors [11, 12] due to its minimal hardware and power cost.

More details can be found in our Technical Report [144].

Table 5.2: gem5 Common Parameter Settings

Parameter       Value              Parameter             Value
ISA             Alpha              Replacement Policy    Random
CPU Model       Detailed (OoO)     Cache Block Size      64B
No. Cores       1                  L1$ Size, Assoc.      32KB, 4-way
Cache Config.   L1 (Split), L2     L2$ Size, Assoc.      256KB, 8-way

We injected errors into different cache blocks randomly, with a uniform distribution over different bit positions. We used bit error rate (BER) and leakage power data from [63], which are for a commercial 45nm technology. Using these BERs, we found the distribution of the number of faults at each data-array voltage, allowing us to compute the expected cache capacity and energy saving for each AFB.
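As a minimal sketch of that computation (an illustrative assumption, not the exact model used here): if bit failures are taken to be independent with probability equal to the BER, the fraction of 64B (512-bit) blocks that remain usable for a given AFB follows a binomial tail, from which the expected usable cache capacity can be estimated.

/* Illustrative only: expected usable cache capacity for a given BER and AFB,
 * assuming independent bit failures within a 512-bit block. */
#include <stdio.h>
#include <math.h>

#define BITS_PER_BLOCK 512

/* Probability that a block has at most 'afb' faulty bits. */
static double usable_block_prob(double ber, int afb)
{
    double p = 0.0;
    double term = pow(1.0 - ber, BITS_PER_BLOCK);        /* k = 0 binomial term */
    for (int k = 0; k <= afb; k++) {
        p += term;
        /* next term: multiply by C(n,k+1)/C(n,k) * ber/(1-ber) */
        term *= (double)(BITS_PER_BLOCK - k) / (k + 1) * ber / (1.0 - ber);
    }
    return p;
}

int main(void)
{
    double ber = 1e-4;   /* illustrative BER at a scaled-down VDD */
    for (int afb = 0; afb <= 5; afb++)
        printf("AFB=%d: expected usable capacity = %.2f%%\n",
               afb, 100.0 * usable_block_prob(ber, afb));
    return 0;
}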

5.4.2 Benchmarks

We selected a number of multimedia benchmarks to examine the energy savings enabled by Relaxed Cache. 1) Scale: Scale is an image-resizer application. We use Peak-Signal-to-Noise Ratio (PSNR) as the fidelity metric for this application; a value larger than 28dB is considered acceptable. 2) Susan: Susan is an image recognition package from MiBench [65]. We used the Image-Smoothing and Edge-Detection kernels from this package. PSNR is used as the fidelity metric for Image-Smoothing. For Edge-Detection, the accuracy metric is defined as the fraction of detections that are true positives rather than false positives; a value of 0.8 or more is usually considered acceptable. 3) x264: This application is an H.264 video encoder [33]. A PSNR of 32dB can provide satisfactory video quality for many types of videos.

5.4.3 Results

Leakage Energy Savings

Figure 5.6 summarizes the achievable leakage energy savings for different Relaxed Cache knob settings (VDD, AFB). The baseline cache is assumed to use 700mV for supply voltage. We assume power-gating for disabled blocks. Consequently, if we keep voltage constant and increase AFB (effectively reclaiming faulty memory blocks), the leakage energy saving decreases. Also note that for voltages ≥ 540mV increasing AFB does not reduce energy savings. This is due to the low number of faulty blocks at those voltages. This analysis shows that we can decrease leakage energy up to 74% by adjusting the Relaxed Cache knob settings.

[Figure 5.6 plot: leakage energy saving (%) versus VDD (440-680mV) for AFB = 0 through 5.]

Figure 5.6: Leakage energy savings for a 4-Way L1 cache with 3 relaxed ways. (energy savings are normalized to a baseline cache that uses 700mV.)

Fidelity Analysis

Figure 5.7(a) shows the PSNR difference between the images up-scaled using the Scale benchmark running on a system with an energy-efficient Relaxed Cache versus a system that uses a guard-banded baseline cache.

The PSNR values for Image-Smoothing kernel of Susan package are reported in Figure 5.7(b).

[Figure 5.7 plots: PSNR (dB) versus VDD (440-600mV) for AFB = 1 through 4, for (a) Scale and (b) Image-Smoothing.]

Figure 5.7: Fidelity results for (a) Scale and (b) Image-Smoothing benchmarks.

As with the Scale benchmark, many of the (VDD, AFB) settings result in an acceptable output. However, it is interesting to note that on average the PSNR values are larger than the ones reported for Scale. This shows that Image-Smoothing is more error-tolerant, and therefore the programmer can use a more aggressive policy for that kernel. This observation confirms that different applications have different levels of error-tolerance, which can be exploited for energy saving.

Susan/Edge-Detection is the most error-tolerant application that can produce good results even at high error rates. Figure 5.8 shows the result of Susan/Edge-Detection on the Lena test image with different AFBs while VDD is set to 480mV. We can see that even AFB=6 (which reclaims 99% of the Relaxed Cache faulty blocks at that voltage) can result in an acceptable output.

Performance Analysis

Operating at low voltages effectively disables cache regions affected by errors. The reduced cache capacity has only a minor impact on the performance of programs with small working-set sizes. The degradation in total execution time is bounded by 3% for the Scale and Edge-Detection benchmarks and 2% for the Image-Smoothing benchmark over all (VDD, AFB) settings in our experiments. However, for applications with large working-set sizes, the increased cache miss rate due to the reduced cache capacity considerably degrades performance. Relaxed Cache's AFB knob, however, allows us to reclaim parts of the lost cache capacity, thereby increasing application performance. This is particularly useful for application instances where a partial/less-accurate result generated within a deadline is more valuable than a late-but-perfect result. Relaxed Cache provides the means for a programmer to explore this trade-off space by deciding when higher performance is desirable and when higher output fidelity is preferable. For instance, Figure 5.9 shows the adverse effect of reducing the voltage for the H.264 encoder: x's correspond to PSNR loss and circles mark FPS loss. Due to the reduced cache capacity, a normal cache (Baseline, in blue) would see a 38% decrease in FPS while maintaining the PSNR quality. On the other hand, Relaxed Cache (in red) trades off PSNR quality for less degradation in FPS (up to 9.5%). However, in all the experiments the PSNR of the degraded-quality video is still above the 32dB threshold, yielding acceptable quality for the user.

[Figure 5.8 images: Edge-Detection output on the Lena test image at VDD=480mV for AFB = 0 through 7, with accuracies 1.0000, 0.9975, 0.9867, 0.9544, 0.9159, 0.8577, 0.8317, and 0.7871, respectively.]

Figure 5.8: Fidelity results for Edge-Detection benchmark (VDD=480mV).

[Figure 5.9 plot: PSNR loss (dB) and FPS loss (%) versus VDD (460-700mV) for the Baseline cache and Relaxed Cache (RC).]

Figure 5.9: FPS-PSNR trade-offs w/ and w/o Relaxed Cache scheme (AFB=4)

5.5 Summary

In this chapter, we outlined challenges for utilizing partially-forgetful memories for approximate computing applications. We illustrated how to address these challenges for SRAM caches using the Relaxed Cache exemplar, and demonstrated significant cache energy savings (up to 74% in leakage) with acceptable degradation in output quality.

As future work, we will consider the challenging issues of the write operation (high write energy and limited write endurance) in emerging Non-Volatile Memories (NVMs). The idea is to propose an application-driven mechanism that reduces the number of writes to NVMs by discarding unimportant data stores, trading QoS for energy saving and lifetime improvement.

Chapter 6

System-wide Memory Resiliency Design Space Exploration

6.1 Introduction

The ongoing trend towards single-chip many-core systems is to deploy simple processing elements (e.g., in-order or RISC-like processors) and exploit task-level and application-level parallelism to maximize system performance by deploying hundreds or thousands of smaller cores, rather than dozens of bigger cores. Additionally, the data-centric nature of several emerging applications creates a demand for larger on-chip memories such as shared Last-Level Caches (LLCs). Hence, the memory subsystem is the fundamental performance and energy bottleneck and is likely to dominate reliability as well as energy concerns for future many-core computing systems [37]. Indeed, increasing failure rates (due to advanced technology nodes, higher integration, voltage scaling, and parametric variations) in many-core platforms make reliability a major design challenge for both memory and computing components [48]. The design of memory subsystems becomes even more challenging in the face

of user requirements and system constraints like Quality of Service (QoS) and yield, besides possible trade-offs among usually conflicting design metrics (objectives) like reliability, power, and performance.

There is a large body of work on reliable and fault-tolerant memory and cache design in various systems, from single-core processors to complex many-core platforms. These approaches mainly allocate redundant resources such as spare rows or error correcting codes (ECC) uniformly over the chip at design time. To maximize the fault coverage, the number of redundant resources may grow proportionally with the increasing number of faults. However, in practice only a limited amount of redundancy can be deployed due to the additional overhead incurred in power and area. Even in this case, the non-uniform distribution of failures over the chip due to manufacturing defects and variations causes inefficient use of uniformly distributed redundancies. Moreover, non-uniform access latency in shared LLCs, as well as varying memory access patterns over different LLC banks, creates a complicated resource management problem. To the best of our knowledge, there have been no previous system-level efforts that model and analyze redundancy management and fault-tolerance of distributed on-chip memories in large NoC-based platforms. Therefore, it is necessary to find an approach to efficiently manage distributed redundancies and then search the design space for the best solutions that optimize multiple design objectives in a bounded search time.

A number of Design Space Exploration (DSE) approaches have been proposed at the system level for effective design of embedded systems [62], where the search space is huge and systems are optimized for one or more disparate objectives. However, none of the previous approaches for reliable and fault-tolerant design of memories considers redundancy analysis and optimization. These observations motivate us to first propose a model for effective redundancy management and then leverage design space exploration approaches for optimizing the problem of effective redundancy management for reliable on-chip memory

design in many-core platforms.

In this chapter, we propose a novel system-level approach to improve the efficiency of reliable on-chip memory design in many-core platforms by sharing redundancy resources to optimize the system costs.

In particular, this chapter makes the following contributions:

• We propose a system-level design methodology for scalable fault-tolerance of distributed on-chip memory blocks in many-core platforms.

• We introduce a novel Reliability Clustering model for efficient fault-tolerance analysis and shared redundancy management of on-chip memory blocks. Each cluster represents a group of cores that have access to shared redundancy resources for protection of their memory blocks.

• We develop analytical models for analysis of reliability, latency, and area overhead leveraging reliability clustering, and use these for design space exploration of fault-tolerant distributed memories in many-cores.

• We perform extensive design space exploration applying the proposed reliability clustering on a block-redundancy fault-tolerant scheme to evaluate the tradeoffs between reliability, performance, and overheads.

We believe that our proposed design methodology will engender emerging many-core architectures capable of efficiently responding to the multiple challenges of increasing fault rates, variation in fault behaviors, local interconnects, non-uniform memory access pattern and latency, limited shared redundancy, and susceptibility to transient variations. To the best of our knowledge, ours is the first work to comprehensively analyze and explore the design space for redundancy-aware resiliency of distributed on-chip memory banks in many-core architectures.

6.2 Background

6.2.1 Reliable Memory Design

Both off-chip and on-chip memories are vulnerable to different sources of faults and errors that degrade their yield and reliability. In general, different kinds of redundancy, from the physical level (e.g., spare elements) up to the application level (e.g., information redundancy), are used to improve the reliability of memory components. These redundant resources are usually distributed uniformly over the chip and allocated mainly at design time. Consequently, the non-uniform distribution of permanent faults resulting from manufacturing defects and variability is neglected [24]. Therefore, some memory units may not use their entire allocated redundant resources while other memory units may require more redundancy due to higher failure rates. Thus the traditional approach of over-provisioning redundant resources to guarantee high reliability and high performance may result in excessive costs for many-core platforms with large distributed memories. Hence, it is necessary to minimize the amount of redundant resources used for error protection. One approach is to share them among all the memory units (LLC banks), so that cache banks with higher failure rates can borrow the unused redundant resources from other banks [20].

This problem can be cast as allocating the shared redundant memory resources over the chip and binding them to the faulty cache blocks to gain the required reliability while minimizing other design objectives like power and cache access delay. The search space of the allocation and binding problem is vastly increased due to the non-uniform distribution of variability-induced permanent faults and also the non-uniform memory access patterns to different cache banks caused by the varying behavior of different applications running on the system. Therefore, it is necessary to find an efficient approach to model this problem and search for the best solutions that optimize the design objectives in a bounded search time.

Figure 6.1: A general system-level design space exploration problem [101].

6.2.2 Design Space Exploration

In system-level design, Design Space Exploration (DSE) refers to a (semi)automatic system-level approach that explores the search space to identify implementations which satisfy the requirements of the designer. The task of the design space exploration problem is then to find the set of optimal feasible implementations for a given specification. Figure 6.1 depicts the system-level design flow. The specification consists of the architecture, the application, and the relation between these two parts [101]. An implementation or solution of the problem, which is deduced from the specification, consists of two main parts: allocation of resources and binding of tasks to the allocated resources. For multi-objective problems, where usually one objective cannot be improved without worsening another, DSE does not seek a single optimal solution but rather a set of equally good solutions called the Pareto-optimal (or near-Pareto-optimal, often also called quality) set. We deploy this DSE approach for redundancy management of memory reliability in many-core platforms.
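For concreteness, the following is a minimal sketch of the Pareto-dominance test that underlies building such a Pareto-optimal set, assuming all objectives (e.g., power, access latency, area overhead) are to be minimized; the struct and function names are illustrative.

/* Illustrative Pareto-dominance check over a small, assumed objective vector. */
#include <stdbool.h>

#define NUM_OBJECTIVES 3

typedef struct {
    double obj[NUM_OBJECTIVES];   /* e.g., power, access latency, area overhead */
} solution_t;

/* Returns true if 'a' dominates 'b': no worse in every objective and strictly
 * better in at least one (all objectives minimized). */
static bool dominates(const solution_t *a, const solution_t *b)
{
    bool strictly_better = false;
    for (int i = 0; i < NUM_OBJECTIVES; i++) {
        if (a->obj[i] > b->obj[i])
            return false;
        if (a->obj[i] < b->obj[i])
            strictly_better = true;
    }
    return strictly_better;
}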


Figure 6.2: Exemplar Many-core Architecture with an 8x8 mesh NoC.

6.2.3 Exemplar Many-core Architecture

We illustrate our DSE approach using an exemplar tiled interconnected many-core, where each tile comprises a processor core, private L1 data and instruction caches, a shared L2 cache bank, and a network router/switch. Tiles are interconnected as a 2-D mesh via a network-on-chip (NoC) infrastructure. Figure 6.2 shows our baseline 64-core architecture with an 8x8 mesh NoC. Based on our cache organization, each L2 bank is a portion of the larger distributed shared last-level cache (LLC). The baseline design assumes a Non-Uniform Cache Architecture (NUCA) [81] for the LLC. A directory-based protocol is implemented in order to maintain cache coherence. In this platform, redundancy resources are provided in the form of spare cache blocks that are uniformly distributed over all LLC banks.

6.2.4 Motivation

As an example, Figure 6.3 shows a 2×2 NoC with four LLC banks, a fixed amount of redundancy per bank (highlighted) and a random set of faulty blocks (cross dots). Figure 6.3(a) shows a sample mapping of four tasks (T1-T4) to the cores. As indicated in Figure 6.3(b), the distribution of faults is non-uniform and some banks (e.g., B1) have many more faulty

blocks than others (e.g., B4). On the other hand, due to non-uniformity of application memory demand or of the tasks mapped on different cores, some nodes become hotspots of the system (e.g., T2). We assume that redundant resources are shared (like LLC banks) and can be used for protection of local faulty blocks as well as blocks in other banks. However, access to redundant blocks over the network incurs higher access latency and more traffic over the network, which impacts system power. Also, we assume it is possible to use block-level power gating [63] to turn off some unused redundant blocks to save power.

Figure 6.3: A sample case of a 2×2 NoC and four tasks mapped on its cores: (a) a sample task mapping; (b) uniform vs. non-uniform distribution of faults (shaded areas represent redundancy resources and cross signs represent faulty blocks); (c) cross-layer design methodology; (d) DSE and Pareto-optimal solutions.

Even for this exemplar architecture, the design space for reliable memory organization becomes very complex in the face of multiple conflicting objectives (e.g., delay, power, fault coverage, etc.) and can be cast as a complex multi-objective optimization problem. On the other hand, none of the previous approaches to reliable and fault-tolerant memory design have performed any redundancy analysis and optimization to minimize the cost of protection. These observations motivate us to integrate memory protection techniques with a

system-level DSE optimization framework to optimize the problem of effective redundancy management (sharing, allocation, and binding) in emerging many-core platforms. Note that our approach is complementary to, and can further leverage, all conventional reliable and fault-tolerant memory design techniques to minimize the cost of memory protection.

6.3 System-Level Fault-tolerance Modeling

A methodology for fault-tolerance of distributed memory systems for scalable NoC platforms faces several inter-related challenges, requirements, and features: communication must be localized to reduce interconnect overheads; fault-tolerance schemes need to be topology-aware, and must organize and distribute redundancy based on the criticality of communication paths; redundancy needs to be managed as a shared resource across the platform; and fault-tolerance should be addressed at higher design abstraction levels to gain more leverage over its effects on performance, cost, power, etc. To meet these needs, we develop analytical models and a system-level methodology for fault-tolerant design of distributed on-chip cache memories for a tiled NoC-based CMP. This chapter focuses on shared L2 banks that need to be protected. However, our approach can be applied to any form of on-chip memory architecture in NoCs that needs reliability. To estimate design parameters we develop analytical models, including a reliability model, an access latency model, and an area overhead model. Using these models, we can analyze the trade-offs between yield, performance, and energy/area overheads of a fault-tolerant design of the on-chip memory subsystem. We describe the details of these models in our technical report [17]. Next, we present a novel reliability clustering concept for scalable, modular, and efficient design and implementation of fault-tolerant schemes applied to protect the memory blocks.

6.3.1 Reliability Clustering

To model fault-tolerance and organize redundancy we divide the whole NoC into clusters comprised of multiple groups of tiles. Each cluster is a subsection of the base NoC with the same topology but in a smaller group of tiles. The clustering is used to partition redundancy sharing among the tiles inside the cluster. In each cluster, some specific tiles (e.g., center tiles) contain the redundancy used for fault-tolerance of all tiles inside the cluster; these are

labeled as redundancy nodes. These redundancy nodes can be used flexibly to accommodate different fault-tolerance schemes and varying forms of redundancy, such as redundant rows/columns/blocks; exploiting sections of available clear/faulty blocks as redundancy; and ECC codes. Furthermore, we leverage the available interconnect backbone to support implementation of a variety of modular, scalable, and efficient fault-tolerance schemes. For instance, in our exemplar tiled NoC architecture, we modify the router directly connected to the redundancy nodes to support our selected fault-tolerant scheme. Because such a clustering organization is independent of a particular topological structure, various physical topologies can serve as fixed-silicon but dynamically reprogrammable reliable multicore clusters on top of NoC platforms. Meanwhile, the inherent reconfigurability allows customizing clusters according to not only the clustered and varying fault rates but also to application communication patterns and resilience needs. Therefore, clusters can be defined statically or dynamically during runtime. For the sake of simplicity, here we consider static clustering. Our mesh-based NoC is composed of N nodes and C clusters, with each cluster containing a d × d mesh of tiles (d = cluster dimension size). The size and number of clusters can affect the yield, overhead, and network latency. The cluster dimension size (d) determines the upper bound on the latency of fault-tolerant LLC accesses. Depending on the shape and size of each cluster, the number of clusters, the amount of redundancy, and the distribution of redundancy among clusters, we can explore different design strategies which meet various design constraints.
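The following is a minimal sketch of this static clustering on the 8×8 mesh, assuming square d × d clusters and a single redundancy node placed near each cluster's center; the names, the center-placement rule, and the use of Manhattan hop count are illustrative assumptions rather than the exact model of the technical report.

/* Illustrative sketch: map a tile to its reliability cluster and estimate the
 * hop count to an assumed central redundancy node, for an 8x8 mesh. */
#include <stdio.h>
#include <stdlib.h>

#define MESH_DIM 8

typedef struct { int x, y; } tile_t;

/* Cluster index of a tile for cluster dimension size d (d divides MESH_DIM). */
static int cluster_of(tile_t t, int d)
{
    int clusters_per_row = MESH_DIM / d;
    return (t.y / d) * clusters_per_row + (t.x / d);
}

/* Manhattan hop count from a tile to its cluster's assumed central redundancy node. */
static int hops_to_redundancy(tile_t t, int d)
{
    int rx = (t.x / d) * d + d / 2;
    int ry = (t.y / d) * d + d / 2;
    return abs(t.x - rx) + abs(t.y - ry);
}

int main(void)
{
    tile_t t = { 5, 2 };
    for (int d = 2; d <= MESH_DIM; d *= 2)
        printf("d=%d: tile (5,2) -> cluster %d, %d hops to redundancy node\n",
               d, cluster_of(t, d), hops_to_redundancy(t, d));
    return 0;
}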

6.4 Evaluation

To illustrate the flexibility and utility of our exploration methodology, we outline a sample design space exploration study using the exemplar many-core platform.

6.4.1 Experimental Setup

We use SoCIN, a cycle-accurate SystemC model of a mesh-based NoC [163] that uses wormhole switching with a 4-phit buffer size and XY routing to avoid deadlocks. The SoCIN router contains input buffers, the arbiter, the crossbar, and the links. Each router is connected to a 1MB L2 cache bank. All L2 cache banks share the same address space of a 64MB LLC. Additionally, for each router, there is a module generating data and instruction requests based on memory traces obtained from a Simics [102] simulation using 64 SPARC V8 cores. A workload of parallel programs with very distinct behaviors is created using benchmarks from the SPLASH2 [159], PARSEC [33], and a parallel version of the MiBench [65] suites. With these memory traces as input to our framework, we are able to extract performance and energy results. The performance results are the total number of cycles to execute all 64 parallel tasks (for each application), and the energy results are obtained from Cacti 6.5 [116] for the memory subsystem and from Orion 2.0 [79] for the network-on-chip. Table 6.1 summarizes the experimental setup of our architecture.

For the sake of this study, faulty LLC banks are modeled randomly, since SRAM cell faults occur as random events due to the major contribution of random dopant fluctuation to process variation [2]. Based on the results of a recent work [118], the predicted probability of failure for SRAM cells can be up to 2.6×10^-4. Here, since our case study is a block-redundancy fault-tolerant scheme and the analysis is at the system level, we model failures at the block level. Based on that, the probability of block failure would be up to 7×10^-2. To enable a fair analysis during our evaluation, we consider 8 fault rates ranging from 10^-3 to 7×10^-2. To determine the amount of redundancy based on our reliability model: to keep the reliability above 99%, the redundancy (R) should be more than 2%; on the other hand, to keep the effective yield above 95%, we should keep the redundancy less than 5%. Thus we consider redundancy values ranging from 2% to 5%.

Table 6.1: Simulation Configuration

Processor Core       64 SPARC V8
L1 I/D Cache         2 banks, 32KB each, 4-way, 32B block size, 2-cycle, LRU
L2 Cache (LLC)       Shared 64MB, 64 banks, 16-way assoc., 64B block, 10-cycle, LRU
Coherence Protocol   Directory-based
Memory Latency       250 cycles
Interconnect         2D-mesh NoC (8×8 for 64 LLC banks and 64 cores), 32-byte links (2 flits per memory access), 1-cycle link latency, 2-cycle router, XY routing, wormhole switching, 4-phit buffer size
Frequency            1.0 GHz
Technology           45nm

6.4.2 Design Space Exploration Studies

We evaluate the impact of system-level reliability clustering on performance, energy, and area overhead of the system using simulation runs of the experimental platform described earlier. For different system configurations we change one design parameter such as fault rate or amount of redundancy and study its effect on different design metrics of the memory subsystem. We consider the design space at two levels: cluster-level and system-level.

1. Cluster-level (Intra-cluster): Here, we study the effect of redundancy distribution (number and location of redundancy nodes) on system design metrics. In this set of results, we consider the whole NoC as one cluster, fix either the amount of redundancy or fault rate, and change the redundancy distribution. In these experiments we explore the redundancy organization by changing either the number of nodes that contain redundancy (redundancy nodes) or their location. Redundancy node distribution can be studied in two directions, ranging from central nodes to all outward nodes or ranging from corner nodes to all inward nodes. Redundant elements are spread equally among all nodes which contain redundancy. Proposed distributions are selected based on a regular and scalable pattern which is independent of the size of the cluster. Figure 6.4(a) presents some

possible distributions for our base architecture with four redundancy nodes. Here, for each redundancy distribution we have other variations of the distribution obtained by spreading the nodes from the center (Option 1) towards the corners (Option 4). Note that the label N.X means that the configuration has N nodes with redundancy, and X represents the distribution option. Higher values of X represent redundancy nodes that are closer to the corners.

2. System level (Inter-cluster): Here, we investigate the effect of node clustering and shared redundancy management among clusters on system design metrics. In this set of results, we change the number and size of clusters while fixing the total redundancy in the system. We select the size and number of clusters based on a regular and scalable pattern. Redundancy is spread equally among all clusters and all redundancy nodes inside each cluster. Here, we put the redundancy nodes in the center of each cluster. The intuition behind this approach is to guarantee that the redundancy nodes are always surrounded by a certain number of cores. This has the goal of not only minimizing the average distance between the cores and the redundancy nodes but also minimizing the variance in the average distance for each case. Figure 6.4(b) presents some possible clusterings and distributions of redundancy for our base architecture with four central redundancy nodes per cluster. Here we illustrate sample distributions for 1, 2, 4, 8, and 16 clusters with 4, 8, 16, 32, and 64 redundancy nodes, respectively.

6.4.3 Sample Exploration Results

In this section, we summarize the normalized performance/energy results of the block-redundancy scheme using the proposed clustering methodology, with respect to a baseline system without fault-tolerance support (i.e., where access to a faulty memory address results in an off-chip memory access). Detailed experimental results are in our technical

report [17].

[Figure 6.4 diagrams: (a) cluster-level options with four redundancy nodes: Option 1 (RN = 4.1), Option 2 (RN = 4.2), Option 3 (RN = 4.3), Option 4 (RN = 4.4); (b) system-level clusterings: 1-Cluster (RN = 4), 2-Cluster (RN = 8), 4-Cluster (RN = 16), 8-Cluster (RN = 32), 16-Cluster (RN = 64).]

Figure 6.4: Possible redundancy configuration patterns in (a) cluster-level and (b) system-level models with four redundancy nodes per cluster.

Cluster-level Results

Figure 6.5(a) shows the performance gains (execution time normalized to the baseline) for each configuration shown in Figure 6.4(a), with 3% memory redundancy and a 3% block fault rate in the system. Note that a 3% block fault rate means that 3% of all memory blocks in the system are faulty. This figure shows how the performance gains vary from one application to another. While the performance improvements vary across most applications, as expected, configurations with more redundancy nodes (e.g., configuration 16.2) present better results.

Figure 6.5(b) presents the average performance results over all benchmarks when using 3% memory redundancy and changing the block fault rate in the system. We note that configurations using too many redundancy nodes do not present good results. This may suggest that the best distribution must not have too small an amount of redundant memory per node, because this will spread the redundant area too much, creating a higher average NoC distance between the nodes.

In another experiment we evaluate the effect of different amounts of redundancy at the cluster level while setting the block fault rate to 3%, as shown in Figure 6.5(c). Note that it is not sufficient to have redundancy greater than the fault rate, since the faults may not be evenly distributed, and other factors (e.g., routing and contention) may affect performance. Indeed, Figure 6.5(c) shows an inversion starting at 4% redundancy: configurations that were inferior in the previous experiments start to present better performance, and those that were previously superior now appear to become worse. This suggests that at some point, when the total redundant memory is higher than the amount of faulty blocks in the system, it is better to have the redundancy nodes in the corners. This could be because, as XY routing tends to concentrate the load in the center of the NoC [52], the packets for redundancy memory requests (which now occur less frequently) may generate less contention if they avoid the middle of the NoC.

System-level Results

For the inter-cluster case we put the redundancy nodes in the center of each cluster and change the size and number of clusters. Similar to Figure 6.5(a), Figure 6.6(a) presents the results at the system level in terms of performance gains when fixing the redundancy memory at 3% and the block fault rate also at 3%. In this case, for all configurations except 16-cluster, there is not much variation from one application to another. The 16-cluster configuration spreads the redundancy data equally throughout all the nodes and is therefore more susceptible to different memory access rates. For lower memory access rates it works well because there is a small amount of redundant memory per node across all nodes. For higher memory access rates it may not work well, since there is a higher chance that the redundancy nodes are too far away.

[Figure 6.5 plots: normalized performance improvement (%) of the cluster-level configurations (RN = 4.1-4.4, 8.1-8.3, 16.1-16.3, 32.1-32.2, and 64), shown (a) per benchmark (FFT, LU, Patricia, Susan_c, Susan_e, Susan_s, Swaptions), (b) across block fault rates from 0.001 to 0.07, and (c) across redundancy amounts from 0.001 to 0.07.]

Figure 6.5: Normalized performance improvement of different cluster-level configurations across different (a) benchmarks, (b) fault rates, and (c) amounts of redundancy.

Figure 6.6(b) presents performance results for these configurations with a total of 3% redundancy memory while changing the block fault rate from 0.1% to 7%. The figure shows that the 8-cluster configuration has the best results. This is because this configuration presents a low average distance between cores and the redundancy area, and also because this average distance does not vary much across cores.

Figure 6.6(c) presents results regarding the same exploration presented in Figure 6.5(c) but this time with this set of system level configurations. Similarly, we observe that a few

changes occur after the point of 3% redundancy memory but, differently from the cluster-level configurations, there is no clear change of scenario, i.e., the best result remains unchanged. This could be because, in this set of configurations, there is no configuration that places the redundancy nodes at the edges of the NoC.

Overall, by looking at the results of both sets of experiments, we observe that good decisions on redundancy organization, or selecting proper cluster-level and system-level configurations, can lead to up to a 20% improvement in normalized performance gain over the baseline. This is a relevant difference since it incurs no increase in overhead and only requires changing the redundancy organization.

Energy Results

We also studied the energy saving (normalized to the baseline) for both sets of experiments by fixing the fault rate to 3% and varying the redundancy from 0.1% to 7%, as represented in Figure 6.7. The trends here are quite similar to the ones presented in Figures 6.5(c) and 6.6(c). The difference is that, since off-chip memory accesses consume more energy than regular on-chip memory accesses, the penalty for not having redundant memory hits the baseline hardest. The results presented here show the energy savings of the redundancy clustering approaches compared to the baseline.

In Figure 6.7(a) we present the results for the intra-cluster (cluster-level) approach. The energy consumption in this experiment for the most part follows the same trend as the performance results. Some situations can present better results depending on the average distance between the cores and the redundancy nodes (e.g., RN = 16.2); if better distributed, the energy consumed to reach the redundancy nodes will also be smaller. After the point where the amount of redundancy surpasses the block fault rate, some configurations that place the redundancy blocks far away from the center of the NoC present better results (e.g., RN = 8.3). This could be because of the tendency of XY routing to create more contention in the center of the NoC.

In Figure 6.7(b), the energy savings of the inter-cluster (system-level) approach are presented. Although the trends are similar to those in Figure 6.6(c), the differences between configurations with different numbers of clusters are more pronounced, and the 8-cluster configuration presents the best results due to an even distribution of redundant blocks in the NoC without reducing the amount of redundancy per node too much. The 16-cluster configuration presents a better result at a 7% redundancy rate. This is because, in this system-level approach, if the redundancy rate is much higher than the block fault rate, the amount of redundancy per node reaches a point where fewer and closer redundancy nodes are enough to cover the faults. With a lower distance between the nodes, less NoC energy is consumed.

[Figure 6.6 plots: normalized performance improvement (%) of the 1-, 2-, 4-, 8-, and 16-cluster configurations, shown (a) per benchmark (FFT, LU, Patricia, Susan_c, Susan_e, Susan_s, Swaptions), (b) across block fault rates from 0.1% to 7%, and (c) across redundancy amounts from 0.1% to 7%.]

Figure 6.6: Normalized performance improvement of different system-level configurations across different (a) benchmarks, (b) fault rates, and (c) amounts of redundancy.

[Figure 6.7 plots: normalized energy savings (%) versus amount of redundancy (0.1%-7%) for (a) the cluster-level configurations (RN = 4.1 through 64) and (b) the system-level configurations (1- to 16-cluster).]

Figure 6.7: Energy results across different amounts of redundancy for (a) intra-cluster and (b) inter-cluster (system-level) configurations.

Overall, depending on the configuration, the normalized energy savings can be improved by up to 47% in the cluster-level approach and 51% in the system-level approach for different amounts of redundancy.

6.5 Summary

To construct reliable memory architectures in emerging multi/many-core NoC platforms, designers must consider the interconnect, distribute and utilize redundancy efficiently to alleviate the high cost of fault-tolerance schemes, employ a hierarchical set of fault-tolerance strategies, and create novel design paradigms that consider system-level issues. In this chapter, we first proposed a system-level design methodology for scalable fault-tolerance of distributed on-chip memories in many-cores. We introduced the novel concept of Reliability Clustering for efficient redundancy management and fault-tolerant design of memory blocks. Experimental results on an exemplar 64-core CMP with an 8x8 mesh NoC show that distinct design strategies or reliability clustering configurations of a block-redundancy scheme may yield up to 20% improvement in normalized performance gain and up to 51% in normalized energy gain, uncovering many interesting design configurations. Future work will explore application-aware dynamic clustering of cores and shared memory redundancy for fault-tolerance.

As future work, we want to define and formulate the problem of shared redundancy allocation and binding as a DSE problem, and then propose a system-level optimization approach to solve it. We want to exploit a cross-layer approach that combines both physical-layer variability issues and application-level memory access variations in the design for memory reliability. Finally, we need to integrate the proposed optimization approach into a comprehensive Design Space Exploration (DSE) framework considering multiple system- and architecture-level design objectives such as reliability, power, and performance.

Chapter 7

Conclusion and Future Work

7.1 Conclusion

Computing systems are becoming more vulnerable to different sources of failures, from soft errors to ever-increasing hard errors, in the nano era. Reliability becomes a major design challenge, especially for on-chip memories. The complexity of systems is increasing and there is a need to leverage new scalable and cost-efficient approaches. We think error-resiliency is more important than reliability. The premise of this thesis is to provide multiple solutions in different layers of the system hierarchy, targeting a variety of architectures from embedded single-core microprocessors to emerging large many-core platforms, to address cost-efficient error-resiliency of on-chip memory in the nano era.

First, two approaches were proposed for cost-efficient fault-tolerance of cache memories in different processors. We first presented FFT-Cache, a flexible cache architecture to protect regular SRAM-based cache memories against a high degree of process variation-induced permanent faults. Then, we proposed EEC, a low-cost technique to leverage embedded erasure coding to tackle soft errors as well as hard errors in the data caches of a high-performance as

well as an embedded processor.

Second, to efficiently tolerate the large number of permanent faults in large NUCA LLCs in CMP architectures when operating in the near-threshold region, two highly reconfigurable fault-tolerant cache designs, REMEDIATE and RESCUE, were presented.

Third, two interconnect-aware designs were introduced to protect the LLC in NoC-based architectures. In the first design, we leverage the NoC infrastructure for efficient protection of on-chip memories against permanent faults. Then, we propose a co-design error coding approach for the LLC and interconnect to minimize the overall overhead of error protection in NoC architectures.

Fourth, the concept of Partially-Forgetful Memories (PFMs) was introduced to demonstrate how to exploit the intrinsic tolerance of a vast class of applications to some level of error for relaxing the guard-banding in memories. We outlined challenges for utilizing partially-forgetful memories for approximate computing applications. Then, we proposed Relaxed Cache as an exemplar to address these challenges for partially-forgetful SRAM caches.

Finally, this thesis provided novel system-level design space exploration and optimization approaches to minimize the cost of resilient on-chip memory design in many-core architectures. First, the concept of reliability clustering for distributed fault-tolerance of on-chip memories in NoCs was proposed. Then, a DSE optimization approach was proposed to minimize the cost of redundancy assignment for LLC protection in many-core platforms.

7.2 Future Work

This thesis is the first to introduce novel concepts such as reliability clustering and distributed shared redundancy and propose new approaches such as NoC-based fault-tolerance

and co-design error coding for resilient on-chip memory design. As a result, there are multiple ways in which this work can be extended in the future:

• Extension of scalable cache fault-tolerance: To leverage the benefits of both remapping-based and ECC-based approaches, a hybrid scheme of remapping + ECC would be an interesting approach. Such an approach alleviates the cost of fault-tolerance in the case of low bit error rates by using ECC instead of remapping and also protects caches against transient soft errors.

• Extension of NoC-based LLC fault-tolerance: Since this is a first-of-its-kind piece of work, there are multiple opportunities for further optimization, such as adaptive fault-aware routing, dynamic redundancy assignment for fault tolerance, and non-unified design for fault tolerance. Moreover, design for soft error protection is another promising direction of work.

• Extension of the CoDEC approach for adaptive/hybrid coding: Since there are many trade-offs in the design of error coding schemes for cache memories as well as NoCs, it would be interesting to explore how to modify the CoDEC approach to make it adaptive by leveraging hybrid coding techniques. For example, depending on the NoC routing scheme and the hop count of the accessed block, CoDEC can leverage different coding techniques to optimize the coding overheads.

• Extension of Partially-Forgetful Memories:

– Partially-Forgetful Non-Volatile Memories: Emerging non-volatile memories such as Phase Change Memory (PCM) and Spin-Torque Transfer RAM (STT-RAM) suffer from high write energy and limited write endurance, and there are many research efforts to make their write operation more energy efficient and limit their wearout issue [47, 168]. Considering the features of approximate computing applications, a future direction is to propose an application-driven mechanism that

reduces the number of writes to partially-forgetful NVMs by discarding unimportant data stores to trade QoS for energy saving and lifetime improvement.

– Partially-Forgetful Software-Controlled Memories: Any system that employs software-controlled memories (a.k.a. scratchpad memories (SPMs) and tightly-coupled memories (TCMs)) already contains a mechanism to statically or dynamically map application data onto these memories. This can significantly negate the overhead induced in Relaxed Cache to manage the bookkeeping of data partitioning declarations (i.e., the criticality table) and to perform data mapping accordingly (i.e., the modifications in the cache controller). The idea of partially-forgetful memories therefore naturally lends itself to software-controlled memories.

– Partially-Forgetful Distributed Memories for Many-core Systems: The data-centric nature of several emerging applications creates a demand for denser on-chip memories in emerging many-core systems, which makes the memory subsystem one of the fundamental performance and energy bottlenecks in such systems. Partially-forgetful memories provide an opportunity for co-optimization of reliability and energy consumption. A major challenge would be sharing on-chip partially-forgetful memory resources between a set of applications running on the system while adjusting memory reliability knobs according to the requirements of the applications.

• Extension of Reliability Clustering: Since the concept of Reliability Clustering is a first-of-its-kind piece of work, there are many opportunities for further optimization. Future work will explore dynamic clustering of resources at run-time. It can leverage application-aware or cross-layer policies to create heterogeneous clusters by considering application-level as well as physical-layer variabilities at run-time.

• Extension of System-level Memory Resiliency: Since this is the first piece of work exploring shared redundancy in distributed cache memories, we believe that there

are many directions for future work in the redundancy-aware reliability management domain. As a direct extension, we want to define and formulate the problem of shared redundancy allocation and binding as a DSE problem, and then propose a system-level optimization approach to solve it. We want to exploit a cross-layer approach that combines both physical-layer variability issues and application-level memory access variations in the design for memory reliability. Finally, we need to integrate the proposed optimization approach into a comprehensive Design Space Exploration (DSE) framework considering multiple system- and architecture-level design objectives such as reliability, power, and performance.

• Extension of the Resiliency DSE approach with dynamic redundancy allocation: By dynamically allocating some parts of the shared memory (LLC) as redundancy, the problem of shared redundancy management becomes more complicated and challenging, which would be an interesting direction. Another direction is redundancy virtualization, defining redundancy blocks as virtual resources that can be managed at the software level.

• In general, the application of Cross-layer memory resiliency in a real system would bring interesting challenges and opportunities to the software stack.

Bibliography

[1] J. Abella, J. Carretero, P. Chaparro, X. Vera, and A. González. Low vccmin fault-tolerant cache with highly predictable performance. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 111–121. ACM, 2009.

[2] A. Agarwal, B. C. Paul, H. Mahmoodi-Meimand, A. Datta, and K. Roy. A process-tolerant cache architecture for improved yield in nanoscale technologies. IEEE Trans. VLSI Syst., 13(1):27–38, 2005.

[3] N. Aggarwal, P. Ranganathan, N. P. Jouppi, and J. E. Smith. Configurable isolation: building high availability systems with commodity multi-core processors. In ISCA, 2007.

[4] A. R. Alameldeen, I. Wagner, Z. Chishti, W. Wu, C. Wilkerson, and S.-L. Lu. Energy-efficient cache design using variable-strength error-correcting codes. In Proceedings of the 38th Annual International Symposium on Computer Architecture, ISCA '11, pages 461–472, 2011.

[5] P. Ampadu, M. Zhang, and V. Stojanovic. Breaking the energy barrier in fault-tolerant caches for multicore systems. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pages 731–736. IEEE, 2013.

[6] B. Amrutur and M. Horowitz. Speed and power scaling of sram’s. IEEE Journal of Solid-State Circuits, 35(2):175–185, Feb 2000.

[7] F. Angiolini, D. Atienza, S. Murali, L. Benini, and G. D. Micheli. Reliability support for on-chip memories using networks-on-chip. In ICCD, 2006.

[8] A. Ansari, S. Feng, S. Gupta, and S. A. Mahlke. Enabling ultra low voltage system operation by tolerating on-chip cache failures. In ISLPED, 2009.

[9] A. Ansari, S. Feng, S. Gupta, and S. A. Mahlke. Archipelago: A polymorphic cache design for enabling robust near-threshold operation. In HPCA, 2011.

[10] A. Ansari, S. Gupta, S. Feng, and S. A. Mahlke. Zerehcache: armoring cache architectures in high defect density technologies. In MICRO, 2009.

[11] ARM. Cortex-A5 processor manual. http://www.arm.com.

[12] ARM. Cortex-R4 processor manual. http://www.arm.com.

[13] ASU. Predictive Technology Model (PTM). http://ptm.asu.edu.

[14] T. Austin, E. Larson, and D. Ernst. Simplescalar: An infrastructure for computer system modeling. Computer, 35(2):59–67, 2002.

[15] W. Baek and T. M. Chilimbi. Green: A framework for supporting energy-conscious programming using controlled approximation. In PLDI, 2010.

[16] A. Banaiyanmofrad. Reliable on-chip memory design for cmps. In Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed Systems, pages 487–488. IEEE Computer Society, 2012.

[17] A. BanaiyanMofrad, N. Dutt, and G. Girão. Analyzing and exploring fault-tolerant distributed memories for nocs. Technical Report CECS-TR-12-15, Center for Embedded Computer Systems, University of California, Irvine, December 2012.

[18] A. Banaiyanmofrad, M. Ebrahimi, F. Oboril, M. B. Tahoori, and N. Dutt. Protecting caches against multiple bit upsets using embedded erasure coding. In European Test Symposium (ETS). IEEE, 2015.

[19] A. BanaiyanMofrad, G. Girao, and N. Dutt. A novel noc-based design for fault-tolerance of last-level caches in cmps. In CODES+ISSS, 2012.

[20] A. BanaiyanMofrad, G. Girao, and N. Dutt. Modeling and analysis of fault-tolerant distributed memories for networks-on-chip. In DATE, 2013.

[21] A. Banaiyanmofrad, G. Girão, and N. Dutt. Noc-based fault-tolerant cache design in chip multiprocessors. ACM Transactions on Embedded Computing Systems (TECS), 13(3s):115, 2014.

[22] A. BanaiyanMofrad, H. Homayoun, and N. Dutt. Fft-cache: a flexible fault-tolerant cache architecture for ultra low voltage operation. In CASES, 2011.

[23] A. Banaiyanmofrad, H. Homayoun, and N. Dutt. Using a flexible fault-tolerant cache to improve reliability for ultra low voltage operation. ACM Transactions on Embedded Computing Systems (TECS), 14(2):32, 2015.

[24] A. BanaiyanMofrad, H. Homayoun, V. Kontorinis, D. Tullsen, and N. Dutt. Remediate: A scalable fault-tolerant architecture for low-power nuca cache in tiled cmps. In IGCC, 2013.

[25] L. A. D. Bathen and N. D. Dutt. E-roc: Embedded raids-on-chip for low power distributed dynamically managed reliable memories. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, pages 1–6. IEEE, 2011.

[26] L. A. D. Bathen, N. D. Dutt, A. Nicolau, and P. Gupta. VaMV: Variability-aware memory virtualization. In DATE, 2012.

[27] L. A. D. Bathen, N. D. Dutt, D. Shin, and S.-S. Lim. Spmvisor: dynamic scratchpad memory virtualization for secure, low power, and high performance distributed on-chip memories. In Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, pages 79–88. ACM, 2011.

[28] L. A. D. Bathen, M. Gottscho, N. Dutt, A. Nicolau, and P. Gupta. ViPZonE: Os-level memory variability-driven physical address zoning for energy savings. In CODES+ISSS, 2012.

[29] R. Baumann. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Transactions on Device and Materials Reliability, 5(3):305–316, 2005.

[30] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. In MICRO, pages 319–330. IEEE Computer Society, 2004.

[31] S. Bell, B. Edwards, J. Amann, R. Conlin, K. Joyce, V. Leung, J. MacKay, M. Reif, L. Bao, J. Brown, et al. Tile64-processor: A 64-core soc with mesh interconnect. In Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, pages 88–598. IEEE, 2008.

[32] D. Bertozzi, L. Benini, and G. D. Micheli. Error control schemes for on-chip communication links: The energy-reliability tradeoff. IEEE Trans. CAD, 24(6):818–831, 2000.

[33] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The parsec benchmark suite: characterization and architectural implications. In PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pages 72–81, 2008.

[34] N. Binkert and et al. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7, Aug. 2011.

[35] N. L. Binkert and et al. The m5 simulator: Modeling networked systems. IEEE Micro, 2006.

[36] P. Bogdan, T. Dumitras, and R. Marculescu. Stochastic communication: A new paradigm for fault-tolerant networks-on-chip. In VLSI Design, 2007.

[37] S. Y. Borkar. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro, 25(6):10–16, 2005.

[38] B. H. Calhoun and A. Chandrakasan. A 256kb sub-threshold sram in 65nm. In ISSCC, 2006.

[39] M. Carbin, S. Misailovic, and M. C. Rinard. Verifying quantitative reliability for programs that execute on unreliable hardware. In OOPSLA, 2013.

[40] T. Carlson, W. Heirman, and L. Eeckhout. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for, pages 1–12, Nov 2011.

[41] A. Chakraborty et al. E ≤ MC2: less energy through multi-copy cache. In CASES, 2010.

[42] L. Chang, D. Fried, J. Hergenrother, J. Sleight, R. Dennard, R. Montoye, L. Sekaric, S. McNab, A. Topol, C. Adams, K. Guarini, and W. Haensch. Stable sram cell design for the 32 nm node and beyond. In VLSI Technology, 2005. Digest of Technical Papers. 2005 Symposium on, pages 128–129, June 2005.

[43] C. Chen and M. Hsiao. Error-correcting codes for semiconductor memory applications: A state of the art review. IBM Journal of Research and Development, 28(2):124–134, 1984.

[44] G. K. Chen, D. Sylvester, D. Blaauw, and T. N. Mudge. Yield-driven near-threshold sram design. IEEE Trans. VLSI Syst., 2010.

[45] V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan. Analysis and characterization of inherent application resilience for approximate computing. In DAC, 2013.

[46] Z. Chishti, A. R. Alameldeen, C. Wilkerson, W. Wu, and S.-L. Lu. Improving cache lifetime reliability at ultra-low voltages. In MICRO, 2009.

[47] S. Cho and H. Lee. Flip-n-write: a simple deterministic technique to improve pram write performance, energy and endurance. In Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on, pages 347–357. IEEE, 2009.

[48] C. Constantinescu. Trends and challenges in vlsi circuit reliability. IEEE Micro, 23(4):14–19, July 2003.

[49] R. Das, A. K. Mishra, C. Nicopoulos, D. Park, V. Narayanan, R. Iyer, M. S. Yousif, and C. R. Das. Performance and power optimization through data compression in network-on-chip architectures. In High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on, pages 215–225. IEEE, 2008.

[50] M. de Kruijf, S. Nomura, and K. Sankaralingam. Relax: An architectural framework for software recovery of hardware faults. In ISCA, 2010.

[51] M. de Kruijf and K. Sankaralingam. Exploring the synergy of emerging workloads and silicon reliability trends. In SELSE, 2009.

[52] M. Dehyadgari, M. Nickray, A. Afzali-Kusha, and Z. Navabi. Evaluation of pseudo adaptive xy routing using an object oriented model for noc. In Microelectronics, 2005. ICM 2005. The 17th International Conference on, 5 pp. IEEE, 2005.

[53] A. Dixit and A. Wood. The impact of new technology on soft error rates. In Proc. of the Inter. Reliability Physics Symposium, 2011.

[54] N. Dutt, P. Gupta, A. Nicolau, A. BanaiyanMofrad, M. Gottscho, and M. Shoushtari. Multi-layer memory resiliency. In Proceedings of the 51st Annual Design Automation Conference, DAC '14, pages 48:1–48:6, 2014.

[55] M. Ebrahimi, A. Evans, M. Tahoori, E. Costenaro, D. Alexandrescu, V. Chandra, and R. Seyyedi. Comprehensive analysis of sequential and combinational soft errors in an embedded processor. IEEE TCAD, 2015.

[56] M. Ebrahimi, A. Evans, M. Tahoori, R. Seyyedi, E. Costenaro, and D. Alexandrescu. Comprehensive analysis of alpha and neutron particle-induced soft errors in an embedded processor at nanoscales. In Design, Automation and Test in Europe, 2014.

[57] A. Eghbal, P. M. Yaghini, H. Pedram, and H. Zarandi. Designing fault-tolerant network-on-chip router architecture. International Journal of Electronics, 97(10):1181–1192, 2010.

[58] H. Esmaeilzadeh, E. R. Blem, R. S. Amant, K. Sankaralingam, and D. Burger. Dark silicon and the end of multicore scaling. In ISCA, pages 365–376, 2011.

[59] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Architecture support for disciplined approximate programming. In ASPLOS, 2012.

[60] J. Gaisler. Leon 3 synthesizable processor.

[61] G. Girão, D. Barcelos, and F. R. Wagner. Performance and energy evaluation of memory organizations in noc-based mpsocs under latency and task migration. In VLSI-SoC: Technologies for Systems Integration, pages 56–80. Springer, 2011.

[62] M. Glass, M. Lukasiewycz, T. Streichert, C. Haubelt, and J. Teich. Reliability-aware system synthesis. In DATE, pages 1–6, April 2007.

[63] M. Gottscho, A. BanaiyanMofrad, N. Dutt, A. Nicolau, and P. Gupta. Power/capacity scaling: Energy savings with simple fault-tolerant caches. In Design Automation Conference (DAC), 2014.

[64] M. Gries, U. Hoffmann, M. Konow, and M. Riepen. Scc: A flexible architecture for many-core platform research. Computing in Science and Engineering, 13(6):79–83, 2011.

[65] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In WWC, 2001.

[66] J. L. Hafner. Weaver codes: Highly fault tolerant erasure codes for storage systems. In Proc. of the 4th USENIX Conference on File and Storage Technologies, FAST, 2005.

[67] R. W. Hamming. Error detecting and error correcting codes. Bell System technical journal, 29(2):147–160, 1950.

[68] N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 2011.

[69] J. Henkel, L. Bauer, N. Dutt, P. Gupta, S. Nassif, M. Shafique, M. Tahoori, and N. Wehn. Reliable on-chip systems in the nano-era: Lessons learnt and future trends. In Proceedings of the 50th Annual Design Automation Conference, page 99. ACM, 2013.

[70] H. P. Hofstee. Power efficient processor architecture and the cell processor. In High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on, pages 258–262. IEEE, 2005.

[71] M. Hosseinabady, A. Banaiyan, M. N. Bojnordi, and Z. Navabi. A concurrent testing method for noc switches. In Proceedings of the conference on Design, automation and test in Europe: Proceedings, pages 1171–1176. European Design and Automation Association, 2006.

[72] J. Howard, S. Dighe, S. R. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow, M. Riepen, M. Gries, et al. A 48-core ia-32 processor in 45 nm cmos using on-die message-passing and dvfs for performance and power scaling. Solid-State Circuits, IEEE Journal of, 46(1):173–183, 2011.

[73] M. Y. Hsiao. A class of optimal minimum odd-weight-column sec-ded codes. IBM Journal of Research and Development, 25(14):295–301, 1970.

[74] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin. Erasure coding in windows azure storage. In Proc. of the USENIX Conference, 2012.

[75] J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. Keckler. A nuca substrate for flexible cmp cache sharing. Parallel and Distributed Systems, IEEE Transactions on, 18(8):1028–1040, Aug 2007.

[76] E. Ibe, H. Taniguchi, Y. Yahagi, K.-i. Shimbo, and T. Toba. Impact of scaling on neutron-induced soft error in srams from a 250 nm to a 22 nm design rule. IEEE Trans. on Electron Devices, 57(7), 2010.

[77] ITRS. International technology roadmap for semiconductors, 2011.

[78] J. Jeffers and J. Reinders. Intel Xeon Phi coprocessor high-performance programming. Newnes, 2013.

[79] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi. Orion 2.0: a fast and accurate noc power and area model for early-stage design space exploration. In Proceedings of the conference on Design, Automation and Test in Europe, pages 423–428. European Design and Automation Association, 2009.

[80] M. Kandemir, F. Li, M. J. Irwin, and S. W. Son. A novel migration-based nuca design for chip multiprocessors. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing, page 28. IEEE Press, 2008.

[81] C. Kim, D. Burger, and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. SIGARCH Comput. Archit. News, 30(5):211–222, 2002.

[82] J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. Hoe. Multi-bit error tolerant caches using two-dimensional error coding. In MICRO, pages 197–209, Dec 2007.

[83] J. Kim, C. Nicopoulos, and D. Park. A gracefully degrading and energy-efficient modular router architecture for on-chip networks. In ISCA, 2006.

[84] J. Kim, H. Yang, M. Mccartney, M. Bhargava, K. Mai, and B. Falsafi. Building fast, dense, low-power caches using erasure-based inline multi-bit ecc. In Pacific Rim International Symposium on Dependable Computing, pages 98–107, Dec 2013.

[85] W. Klotz. Graph coloring algorithms. Mathematics Report, 5(2002):1–9, 2002.

[86] C.-K. Koh, W.-F. Wong, Y. Chen, and H. Li. The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies. In ICCD, 2009.

[87] C.-K. Koh, W.-F. Wong, Y. Chen, and H. Li. Tolerating process variations in large, set-associative caches: The buddy cache. ACM Trans. Archit. Code Optim., 6(2):8:1–8:34, July 2009.

[88] P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: a 32-way multithreaded sparc processor. Micro, IEEE, 25(2):21–29, March 2005.

[89] V. Kozhikkottu, A. Pan, V. Pai, S. Dey, and A. Raghunathan. Variation aware cache partitioning for multithreaded programs. In DAC, 2014.

[90] J. P. Kulkarni, K. Kim, and K. Roy. A 160 mv, fully differential, robust schmitt trigger based sub-threshold sram. In Proceedings of the 2007 International Symposium on Low Power Electronics and Design, ISLPED ’07, pages 171–176, 2007.

[91] L. Kunz, G. Girão, and F. R. Wagner. Improving the efficiency of a hardware transactional memory on an noc-based mpsoc. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, pages 1–4. IEEE, 2011.

[92] L. Leem, H. Cho, J. Bau, Q. A. Jacobson, and S. Mitra. ERSA: Error resilient system architecture for probabilistic applications. In DATE, 2010.

[93] K. M. Lepak and M. H. Lipasti. Silent stores for free. In MICRO, pages 22–31, 2000.

[94] X. Li, S. Adve, P. Bose, and J. Rivers. Softarch: an architecture-level tool for modeling and analyzing soft errors. In DSN, pages 496–505, June 2005.

[95] X. Li and D. Yeung. Application-level correctness and its impact on fault tolerance. In HPCA, 2007.

[96] S. Lin and D. J. Costello. Error Control Coding, Second Edition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2004.

[97] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. Nvidia tesla: A unified graphics and computing architecture. IEEE Micro, 28(2):39–55, 2008.

[98] M. Lis et al. Scalable, accurate multicore simulation in the 1000-core era. In ISPASS, 2011.

[99] J. Liu, B. Jaiyen, R. Veras, and O. Mutlu. RAIDR: Retention-aware intelligent DRAM refresh. In ISCA, 2012.

[100] S. Liu, K. Pattabiraman, T. Moscibroda, and B. G. Zorn. Flikker: Saving DRAM refresh-power through critical data partitioning. In ASPLOS, 2011.

[101] M. Lukasiewycz, M. Glass, C. Haubelt, and J. Teich. Efficient symbolic multi-objective design space exploration. In Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific, pages 691–696, March 2008.

[102] P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, 35(2):50–58, 2002.

[103] N. Mahatme, B. Bhuva, Y.-P. Fang, and A. Oates. Analysis of multiple cell upsets due to neutrons in srams for a deep-n-well process. In International Reliability Physics Symposium, 2011.

[104] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong. Characterization of multi-bit soft error events in advanced srams. In IEEE International Electron Devices Meeting, Dec 2003.

[105] S. Manolache, P. Eles, and Z. Peng. Fault and energy-aware communication mapping with guaranteed latency for applications implemented on noc. In DAC, 2005.

[106] M. Manoochehri, M. Annavaram, and M. Dubois. Cppc: Correctable parity protected cache. In 38th Annual International Symposium on Computer Architecture, pages 223–234, 2011.

[107] R. Marculescu, U. Y. Ogras, L.-S. Peh, N. D. E. Jerger, and Y. V. Hoskote. Outstanding research problems in noc design: System, microarchitecture, and circuit perspectives. IEEE Trans. on CAD of Integrated Circuits and Systems, 28(1):3–21, 2009.

[108] S. Misailovic, M. Carbin, S. Achour, Z. Qi, and M. Rinard. Reliability-aware optimization of approximate computational kernels with Rely. Technical Report MIT-CSAIL-TR-2014-001, MIT, January 2014.

[109] S. Misailovic, S. Sidiroglou, H. Hoffmann, and M. C. Rinard. Quality of service profiling. In ICSE, 2010.

[110] A. M. H. Monazzah, H. Farbeh, S. G. Miremadi, M. Fazeli, and H. Asadi. Ftspm: a fault-tolerant scratchpad memory. In Dependable Systems and Networks (DSN), 2013 43rd Annual IEEE/IFIP International Conference on, pages 1–10. IEEE, 2013.

[111] F. Moradi, D. Wisland, S. Aunet, H. Mahmoodi, and T. V. Cao. 65nm sub-threshold 11t-sram for ultra low voltage applications. In SOC Conference, 2008 IEEE International, pages 113–118, Sept 2008.

[112] R. H. Morelos-Zaragoza. The Art of Error Correcting Coding. John Wiley & Sons, 2006.

[113] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto. An area-conscious low-voltage-oriented 8t-sram design under dvs environment. In VLSI Circuits, 2007 IEEE Symposium on, pages 256–257, June 2007.

[114] S. Murali, D. Atienza, L. Benini, and G. De Micheli. A multi-path routing strategy with guaranteed in-order packet delivery and fault-tolerance for networks on chip. In Proceedings of the 43rd annual Design Automation Conference, pages 845–848. ACM, 2006.

[115] S. Murali, T. Theocharides, N. Vijaykrishnan, M. Irwin, L. Benini, and G. De Micheli. Analysis of error recovery schemes for networks on chips. Design Test of Computers, IEEE, 22(5):434–442, 2005.

[116] N. Muralimanohar, R. Balasubramonian, and N. Jouppi. Cacti 6.5. In Technical Report, HP Laboratories, 2009.

[117] R. Naseer and J. Draper. Parallel double error correcting code design to mitigate multi-bit upsets in srams. In 34th European Solid-State Circuits Conference, pages 222–225, Sept 2008.

[118] S. R. Nassif, N. Mehta, and Y. Cao. A resilience roadmap: (invited paper). In DATE, pages 1011–1016, 2010.

[119] P. Ndai, A. Goel, and K. Roy. A scalable circuit-architecture co-design to improve memory yield for high-performance processors. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 18(8):1209–1219, 2010.

[120] H. A. Omari and K. E. Sabri. New graph coloring algorithms. Journal of Mathematics and Statistics, 2(4):439–441, 2007.

[121] K. Osada, K. Yamaguchi, Y. Saitoh, and T. Kawahara. Sram immunity to cosmic-ray-induced multierrors based on analysis of an induced parasitic bipolar effect. JSSC, 39(5):827–833, May 2004.

[122] S. Ozdemir, D. Sinha, G. Memik, J. Adams, and H. Zhou. Yield-aware cache architectures. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, pages 15–25, 2006.

[123] D. Park, C. Nicopoulos, J. Kim, N. Vijaykrishnan, and C. Das. Exploring fault-tolerant network-on-chip architectures. In DSN, pages 93–104, 2006.

[124] S. Park, T. Krishna, C.-H. Chen, B. Daya, A. Chandrakasan, and L.-S. Peh. Approaching the theoretical limits of a mesh noc with a 16-node chip prototype in 45nm soi. In Proceedings of the 49th Annual Design Automation Conference, DAC '12, pages 398–405, 2012.

[125] D. A. Patterson, G. Gibson, and R. H. Katz. A case for redundant arrays of inexpensive disks (raid). In Proc. of the ACM Intrn. Conf. on Management of Data, SIGMOD, pages 109–116, 1988.

[126] S. Paul, F. Cai, X. Zhang, and S. Bhunia. Reliability-driven ecc allocation for multiple bit error resilience in processor cache. Computers, IEEE Transactions on, 60(1):20–34, 2011.

[127] M. Pirretti, G. M. Link, R. R. Brooks, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin. Fault tolerant algorithms for network-on-chip interconnect. In IEEE Symp. VLSI, 2004.

[128] J. S. Plank. T1: erasure codes for storage applications. In Proc. of the 4th USENIX Conference on File and Storage Technologies, pages 1–74, 2005.

[129] V. Puente, J. A. Gregorio, F. Vallejo, and R. Beivide. Immunet: A cheap and robust fault-tolerant packet routing mechanism. In ISCA, 2004.

[130] P. Rao, M. Ebrahimi, R. Seyyedi, and M. B. Tahoori. Protecting sram-based fpgas against multiple bit upsets using erasure codes. In DAC, pages 1–6. IEEE, 2014.

[131] D. Roberts, N. S. Kim, and T. N. Mudge. On-chip cache device scaling limits and effective fault repair techniques in future nanoscale technology. Microprocessors and Microsystems - Embedded Hardware Design, 2008.

[132] D. Rossi, N. Timoncini, M. Spica, and C. Metra. Error correcting code analysis for cache memory high reliability and performance. In Design, Automation Test in Europe (DATE), pages 1–6, March 2011.

[133] S. Mukhopadhyay, H. Mahmoodi, and K. Roy. Modeling of failure probability and statistical design of sram array for yield enhancement in nanoscaled cmos. In TCADICS, 2005.

[134] A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. EnerJ: Approximate data types for safe and general low-power computation. In PLDI, 2011.

[135] A. Sampson, J. Nelson, K. Strauss, and L. Ceze. Approximate storage in solid-state memories. In MICRO, 2013.

[136] J. Sartori, J. Sloan, and R. Kumar. Stochastic computing: Embracing errors in architecture and design of processors and applications. In CASES, 2011.

[137] A. Sasan, H. Homayoun, A. M. Eltawil, and F. Kurdahi. Inquisitive defect cache: A means of combating manufacturing induced process variation. IEEE Trans. Very Large Scale Integr. Syst., 19(9):1597–1609, Sept. 2011.

[138] A. Sasan, H. Homayoun, A. M. Eltawil, and F. J. Kurdahi. A fault tolerant cache architecture for sub 500mv operation: resizable data composer cache (rdc-cache). In CASES, 2009.

[139] S. Schuster. Multiple word/bit line redundancy for semiconductor memories. Solid- State Circuits, IEEE Journal of, 13(5):698–703, Oct 1978.

[140] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, et al. Larrabee: a many-core x86 architecture for visual computing. ACM Transactions on Graphics (TOG), 27(3):18, 2008.

[141] S. Shamshiri, A.-A. Ghofrani, and K.-T. Cheng. End-to-end error correction and online diagnosis for on-chip networks. In IEEE International Test Conference (ITC), pages 1–10, Sept 2011.

[142] J. L. Shin, D. Huang, B. Petrick, C. Hwang, A. S. Leon, and A. Strong. A 40nm 16-core 128-thread SPARC SoC processor. In Solid State Circuits Conference (A-SSCC), 2010 IEEE Asian, pages 1–4. IEEE, 2010.

[143] P. P. Shirvani and E. J. McCluskey. Padded cache: A new fault-tolerance technique for cache memories. In VTS, 1999.

[144] M. Shoushtari, A. Banaiyan, and N. Dutt. Relaxing manufacturing guard-bands in memories for energy saving. Technical Report CECS-TR-14-04, Center for Embedded Computer Systems, University of California, Irvine, May 2014.

[145] M. Shoushtari, A. BanaiyanMofrad, and N. Dutt. Exploiting partially-forgetful memories for approximate computing. Embedded Systems Letters, IEEE, 7(1):19–22, March 2015.

[146] C. Slayman. Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations. Device and Materials Reliability, IEEE Transactions on, 5(3):397–404, Sept 2005.

[147] A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. ACM SIGPLAN Notices, 35(11):234–244, 2000.

[148] V. Sridharan, H. Asadi, M. B. Tahoori, and D. Kaeli. Reducing data cache susceptibility to soft errors. Dependable and Secure Computing, IEEE Transactions on, 3(4):353–364, 2006.

[149] J. Stivens, P. Tschirhart, M.-T. Chang, I. Bhati, P. Enns, J. Greensky, Z. Chishti, S.-L. Lu, and B. Jacob. Memory resiliency. Intel Technology Journal, 17(1), May 2013.

[150] M. Taassori, N. Chatterjee, A. Shafiee, and R. Balasubramonian. Exploring a brink-of-failure memory controller to design an approximate memory system. In WACAS, 2014.

[151] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, et al. An 80-tile 1.28 tflops network-on-chip in 65nm cmos. In IEEE International Solid-State Circuits Conference, San Francisco, USA, 2007, pages 98–99. IEEE, 2007.

[152] N. Verma and A. P. Chandrakasan. A 256 kb 65 nm 8t subthreshold sram employing sense-amplifier redundancy. Solid-State Circuits, IEEE Journal of, 43(1):141–149, 2008.

[153] J. Wang and B. H. Calhoun. Standby supply voltage minimization for reliable nanoscale SRAMs. In Solid State Circ. Tech. InTech, 2004.

[154] Y. Wang, L. Zhang, Y. Han, H. Li, and X. Li. Address remapping for static nuca in noc-based degradable chip-multiprocessors. In Dependable Computing (PRDC), 2010 IEEE 16th Pacific Rim International Symposium on, 2010.

[155] C. Wilkerson, A. R. Alameldeen, Z. Chishti, W. Wu, D. Somasekhar, and S.-l. Lu. Reducing cache power with low-cost, multi-bit error-correcting codes. In ISCA, pages 83–93, 2010.

[156] C. Wilkerson, H. Gao, A. R. Alameldeen, Z. Chishti, M. Khellah, and S.-L. Lu. Trading off cache capacity for reliability to enable low voltage operation. In Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA ’08, pages 203–214, 2008.

[157] L. Wilson. International technology roadmap for semiconductors (itrs). Semiconductor Industry Association, 2013.

[158] W.-F. Wong, C.-K. Koh, Y. Chen, and H. Li. Vosch: Voltage scaled cache hierarchies. In Computer Design, 2007. ICCD 2007. 25th International Conference on, pages 496–503, Oct 2007.

[159] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The splash-2 programs: Characterization and methodological considerations. ACM SIGARCH Computer Architecture News, 23(2):24–36, 1995.

[160] P. M. Yaghini, A. Eghbal, H. Pedram, and H. R. Zarandi. Investigation of transient fault effects in synchronous and asynchronous network on chip router. Journal of Systems Architecture, 57(1):61–68, 2011. Special Issue On-Chip Parallel And Network-Based Systems.

[161] D. H. Yoon and M. Erez. Memory mapped ecc: low-cost error protection for last level caches. In ISCA, 2009.

[162] D. H. Yoon, N. Muralimanohar, J. Chang, P. Ranganathan, N. P. Jouppi, and M. Erez. Free-p: Protecting non-volatile memory against both hard and soft errors. In High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pages 466–477. IEEE, 2011.

[163] C. A. Zeferino and A. A. Susin. Socin: a parametric and scalable network-on-chip. In Integrated Circuits and Systems Design, 2003. SBCCI 2003. Proceedings. 16th Symposium on, pages 169–174. IEEE, 2003.

[164] C. Zhang, F. Vahid, and W. Najjar. A highly configurable cache for low energy embedded systems. ACM Trans. Embed. Comput. Syst., 4(2):363–387, May 2005.

[165] W. Zhang. Replication cache: a small fully associative cache to improve data cache reliability. Computers, IEEE Transactions on, 54(12):1547–1555, 2005.

[166] W. Zhang, S. Gurumurthi, M. Kandemir, and A. Sivasubramaniam. Icr: in-cache replication for enhancing data cache reliability. In Dependable Systems and Networks, 2003. Proceedings. 2003 International Conference on, pages 291–300, June 2003.

[167] W. Zhang and T. Li. Characterizing and mitigating the impact of process variations on phase change based memory systems. In MICRO, 2009.

[168] P. Zhou, B. Zhao, J. Yang, and Y. Zhang. Energy reduction for stt-ram using early write termination. In Computer-Aided Design-Digest of Technical Papers, 2009. ICCAD 2009. IEEE/ACM International Conference on, pages 264–268. IEEE, 2009.

[169] S.-T. Zhou, S. Katariya, H. Ghasemi, S. Draper, and N. S. Kim. Minimizing total area of low-voltage sram arrays through joint optimization of cell size, redundancy, and ecc. In Computer Design (ICCD), 2010 IEEE International Conference on, pages 112–117. IEEE, 2010.