Opportunistic Memory Systems in Presence of Hardware Variability

UNIVERSITY OF CALIFORNIA
Los Angeles

Opportunistic Memory Systems in Presence of Hardware Variability

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering

by

Mark William Gottscho

2017

© Copyright by Mark William Gottscho 2017

ABSTRACT OF THE DISSERTATION

Opportunistic Memory Systems in Presence of Hardware Variability

by

Mark William Gottscho
Doctor of Philosophy in Electrical Engineering
University of California, Los Angeles, 2017
Professor Puneet Gupta, Chair

The memory system presents many problems in computer architecture and system design. An important challenge is worsening hardware variability that is caused by nanometer-scale manufacturing difficulties. Variability particularly affects memory circuits and systems – which are essential in all aspects of computing – by degrading their reliability and energy efficiency. To address this challenge, this dissertation proposes Opportunistic Memory Systems in Presence of Hardware Variability. It describes a suite of techniques to opportunistically exploit memory variability for energy savings and to cope with memory errors when they inevitably occur.

In Part 1, three complementary projects are described that exploit memory variability for improved energy efficiency. First, ViPZonE and DPCS study how to save energy in off-chip DRAM main memory and on-chip SRAM caches, respectively, without significantly impacting performance or manufacturing cost. ViPZonE is a novel extension to the virtual memory subsystem in Linux that leverages power variation-aware physical address zoning for energy savings. The kernel intelligently allocates lower-power physical memory (when available) to tasks that access data frequently to save overall energy. Meanwhile, DPCS is a simple and low-overhead method to perform Dynamic Power/Capacity Scaling of SRAM-based caches. The key idea is that certain memory cells fail to retain data at a given low supply voltage; when full cache capacity is not needed, the voltage is opportunistically reduced and any failing cache blocks are disabled dynamically. The third project in Part 1 is X-Mem, a new extensible memory characterization tool. It is used in a series of case studies on a cloud server, including one where the potential benefits of variation-aware DRAM latency tuning are evaluated.

Part 2 of the dissertation focuses on ways to opportunistically cope with memory errors whenever they occur. First, the Performability project studies the impact of corrected errors in memory on the performance of applications. The measurements and models can help improve the availability and performance consistency of cloud server infrastructure. Second, the novel idea of Software-Defined Error-Correcting Codes (SDECC) is proposed. SDECC opportunistically copes with detected-but-uncorrectable errors in main memory by combining concepts from coding theory with an architecture that allows for heuristic recovery. SDECC leverages available side information about the contents of data in memory to essentially increase the strength of ECC without introducing significant hardware overheads. Finally, a methodology is proposed to achieve Virtualization-Free Fault Tolerance (ViFFTo) for embedded scratchpad memories. ViFFTo guards against both hard and soft faults at minimal cost and is suitable for future IoT devices.
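
As a rough illustration of the DPCS idea summarized above, the sketch below (written for this summary, not code from the dissertation) shows how a controller holding a per-voltage fault map could pick the lowest supply voltage whose surviving capacity still covers the current demand, disabling the failing blocks. All names, block counts, and fault-map values are hypothetical.

```python
# Hypothetical sketch of dynamic power/capacity scaling for an SRAM cache.
# fault_map[vdd] = set of cache block indices that fail to retain data at vdd;
# lower voltages fail a superset of the blocks that fail at higher voltages.
fault_map = {
    1.0: set(),            # nominal voltage: every block retains data
    0.8: {12, 33},         # a couple of weak blocks fail
    0.7: set(range(40)),   # many more blocks fail at the lowest voltage
}

TOTAL_BLOCKS = 128

def choose_voltage(required_blocks, fault_map):
    """Pick the lowest supply voltage whose surviving capacity still covers
    the current working-set demand; return it with the blocks to disable."""
    best_vdd = max(fault_map)                    # fall back to nominal voltage
    for vdd in sorted(fault_map):                # try the lowest voltage first
        usable = TOTAL_BLOCKS - len(fault_map[vdd])
        if usable >= required_blocks:
            best_vdd = vdd
            break
    return best_vdd, fault_map[best_vdd]

# Example: a phase needing ~100 live blocks can run at 0.8 V in this toy map,
# with the two failing blocks disabled; a full-capacity phase stays at 1.0 V.
vdd, disabled = choose_voltage(required_blocks=100, fault_map=fault_map)
print(f"run cache at {vdd} V, disabling blocks {sorted(disabled)}")
```
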
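The flavor of SDECC's heuristic recovery can likewise be hinted at with a toy sketch (again hypothetical, not the dissertation's algorithm): after a detected-but-uncorrectable error there are several candidate codewords consistent with the check bits, and side information about the data, here simple similarity to neighboring words in the cache line, is used to rank them.

```python
# Hypothetical sketch of side-information-guided recovery from a
# detected-but-uncorrectable (DUE) memory error. The candidate list and the
# scoring rule are stand-ins; a real scheme derives candidates from the ECC
# code itself and can use richer information about the stored data.
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")

def recover(candidates, neighbor_words):
    """Pick the candidate whose decoded value 'looks most like' the rest of
    the cache line, i.e., minimizes distance to its nearest neighbor word."""
    def score(word):
        return min(hamming(word, n) for n in neighbor_words)
    return min(candidates, key=score)

# Example: three equally plausible candidates after a DUE; the neighboring
# words suggest a small counter, so the low-valued candidate is chosen.
candidates = [0x00000007, 0x80000007, 0x00400007]
neighbors  = [0x00000005, 0x00000006, 0x00000008]
print(hex(recover(candidates, neighbors)))
```

The point of the sketch is only that software can arbitrate among candidates when hardware ECC alone cannot correct the error, which is the opportunity SDECC exploits.
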
Together, the six projects in this dissertation comprise a complementary suite of methods for opportunistically exploiting hardware variability for energy savings, while reducing the impact of errors that will inevitably occur. Opportunistic Memory Systems can significantly improve the energy efficiency and reliability of current and future computing systems. There remain several promising directions for future work.

The dissertation of Mark William Gottscho is approved.

Glenn D. Reinman
Mani B. Srivastava
Lara Dolecek
Puneet Gupta, Committee Chair

University of California, Los Angeles
2017

To my wife, who has inspired me, and to whom I have vowed to expand our horizons.

TABLE OF CONTENTS

1 Introduction
1.1 Memory Challenges in Computing
1.2 Hardware Variability is a Key Issue for Memory Systems
1.3 Opportunistic Memory Systems in Presence of Hardware Variability

I Opportunistically Exploiting Memory Variability

2 ViPZonE: Saving Energy in DRAM Main Memory with Power Variation-Aware Memory Management
2.1 Introduction
2.1.1 Related Work
2.1.2 Our Contributions
2.1.3 Chapter Organization
2.2 Background
2.2.1 Memory System Architecture
2.2.2 Main Memory Interleaving
2.2.3 Vanilla Linux Kernel Memory Management
2.3 Motivational Results
2.3.1 Test Methodology
2.3.2 Test Results
2.3.3 Summary of Motivating Results
2.3.4 Exploiting DRAM Power Variation
2.4 Implementation
2.4.1 Target Platform and Assumptions
2.4.2 Back-End: ViPZonE Implementation in the Linux Kernel
2.4.3 Front-End: User API and System Call
2.5 Evaluation
2.5.1 Testbed Setup
2.5.2 Interleaving Analysis
2.5.3 ViPZonE Analysis
2.5.4 What-If: Expected Benefits With Non-Volatile Memories Exhibiting Ultra-Low Idle Power
2.5.5 Summary of Results
2.6 Discussion and Conclusion

3 DPCS: Saving Energy in SRAM Caches with Dynamic Power/Capacity Scaling
3.1 Introduction
3.2 Related Work
3.2.1 Leakage Reduction
3.2.2 Fault Tolerance
3.2.3 Memory Power/Performance Scaling
3.2.4 Novelty of This Work
3.3 The SRAM Fault Inclusion Property
3.4 A Case for the "Static Power vs. Effective Capacity" Metric
3.4.1 Amdahl's Law Applied to Fault-Tolerant Voltage-Scalable Caches
3.4.2 Problems With Yield Metric
3.4.3 Proposed Metric
3.5 Power/Capacity Scaling Architecture
3.5.1 Mechanism
3.5.2 Static Policy: SPCS
3.5.3 Dynamic Policies: DPCS
3.6 Modeling and Evaluation Methodology
3.6.1 Modeling Cache Fault Behavior
3.6.2 System and Cache Configurations
3.6.3 Technology Parameters and Modeling Cache Architecture
3.6.4 Simulation Approach
3.7 Analytical Results
3.7.1 Fault Probabilities and Yield
3.7.2 Static Power vs. Effective Capacity
3.7.3 Area Overhead
3.8 Simulation Results
3.8.1 Fault Map Distributions
3.8.2 Architectural Simulation
3.9 Conclusion

4 X-Mem: A New Extensible Memory Characterization Tool used for Case Studies on Memory Performance Variability
4.1 Introduction
4.2 Related Work
4.3 Background
4.3.1 Non-Uniform Memory Access (NUMA)
4.3.2 DDRx DRAM Operation
4.4 X-Mem: Design and Implementation
4.4.1 Access Pattern Diversity
4.4.2 Platform Variability
4.4.3 Metric Flexibility
4.4.4 UI and Benchmark Management
4.5 Experimental Platforms and Tool Validation
4.6 Case Study Evaluations
4.6.1 Case Study 1: Characterization of the Memory Hierarchy for Cloud Subscribers
4.6.2 Case Study 2: Cross-Platform Insights for Cloud Subscribers
4.6.3 Case Study 3: Impact of Tuning Platform Configurations for Cloud Providers and Evaluating the Efficacy of Variation-Aware DRAM Timing Optimizations
4.7 Conclusion

II Opportunistically Coping with Memory Errors

5 Performability: Measuring and Modeling the Impact of Corrected Memory Errors on Application Performance
5.1 Introduction
5.2 Related Work
5.3 Background: DRAM Error Management and Reporting
5.4 Measuring the Impact of Memory Errors on Performance
5.4.1 Experimental Methods
5.4.2 Empirical Results using Fault Injection
5.5 Analytical Modeling Assumptions
5.6 Model for Application Thread-Servicing Time (ATST)
5.6.1 Model Derivation
5.6.2 Numeric Computation of ATST
5.6.3 Average ATST
5.6.4 ATST Slowdown
5.6.5 Using the ATST Model to Predict the Impact of Page Retirement on Application Performance
5.7 ATST-based Analytical Models for Batch Applications
5.7.1 Uniprocessor System
5.7.2 Multiprocessor System
5.7.3 Predicting the Performance of Batch Applications
5.8 ATST-based Analytical Models for Interactive Applications
5.8.1 Uniprocessor System ...