Timing Speculation and Adaptive Reliable Overclocking Techniques for Aggressive Computer Systems
Viswanathan Subramanian
Iowa State University
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations
2009

Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems
Viswanathan Subramanian
Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/etd
Part of the Electrical and Computer Engineering Commons

Recommended Citation
Subramanian, Viswanathan, "Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems" (2009). Graduate Theses and Dissertations. 10967.
https://lib.dr.iastate.edu/etd/10967

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].

Timing speculation and adaptive reliable overclocking techniques for aggressive computer systems

by

Viswanathan Subramanian

A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY

Major: Computer Engineering

Program of Study Committee:
Arun K. Somani, Major Professor
Akhilesh Tyagi
Randall L. Geiger
Joseph A. Zambreno
David Fernández-Baca

Iowa State University
Ames, Iowa
2009

Copyright © Viswanathan Subramanian, 2009. All rights reserved.

To my dear parents
To my enlightening teachers
To my loving wife
To my caring family
To my beloved friends

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTER 1. INTRODUCTION
1.1 High Performance Computing
1.1.1 Device Scaling
1.1.2 Microprocessor Architectures
1.1.3 Better-Than-Worst-Case Designs
1.1.4 Adaptive Systems
1.2 Fault Tolerant Computing
1.2.1 Transient Faults
1.2.2 Redundancy Techniques
1.2.3 Fault Mitigation Techniques
1.2.4 Exploiting Fault Tolerance to Improve Performance
1.3 Power/Thermal Aware Computing
1.4 Contributions of this Thesis
CHAPTER 2. BACKGROUND
2.1 Parameter Variations
2.2 Reliable Overclocking
2.2.1 Timing Error Detection and Recovery
2.2.2 Timing Error Rate Based Feedback Control System
2.2.3 Timing Speculation
2.3 Razor Architecture
2.4 SPRIT3E Framework
CHAPTER 3. MANIPULATING SHORT-PATHS FOR PERFORMANCE
3.1 Impact of Short-paths
3.1.1 Timing Constraints
3.1.2 Variable or Fixed Phase Shift
3.1.3 Manipulating Contamination Delay
3.2 Increasing Contamination Delay of a CLA Adder Circuit - A Case Study
3.2.1 Analysis of Reliable Overclocking Performance
CHAPTER 4. CHARACTERIZING ADAPTIVE RELIABLE OVERCLOCKING
4.1 Evaluating Speculative Reliable Overclocking
4.1.1 Performance Metrics
4.2 Analysis Framework
4.2.1 Modeling a Reliably Overclocked Processor (ROP)
4.2.2 Power and Thermal Modeling
4.3 Adaptive Clocking
4.3.1 Clock Tuning Schemes
4.3.2 Comparing Adaptive Clocking Techniques
4.4 Reliable Overclocking Analysis
CHAPTER 5. THERMAL IMPACT OF RELIABLE OVERCLOCKING
5.1 Thermal and Reliability Management
5.2 Analysis Framework for Estimating On-chip Temperature
5.2.1 Thermal Throttling
5.2.2 Simulation Parameters
5.3 On-chip Temperature Trends in Reliably Overclocked Processors
CHAPTER 6. RELIABLE OVERCLOCKING AND TECHNOLOGY SCALING
6.1 Technology Scaling
6.2 A Reliable Overclocking Approach
6.3 Analysis Framework
6.4 Performance at Different Technology Nodes
6.5 Comparing Technology Scaling with Reliable Overclocking
CHAPTER 7. FAULT TOLERANT AGGRESSIVE SYSTEMS
7.1 Conjoined Pipeline Architecture
7.1.1 Conjoined Pipeline Datapath Description
7.1.2 Error Detection and Recovery
7.2 Timing Requirements
7.3 Implementation Considerations
7.3.1 Two Clock Approach
7.4 Experiments and Results
CHAPTER 8. CONCLUSIONS AND FUTURE WORK

LIST OF TABLES
Table 3.1 Implementation details of CLA adder circuits
Table 4.1 Processor specifications
Table 4.2 Synthesis report of major pipeline stages
Table 4.3 Simulator parameters
Table 4.4 Comparing various performance metrics between a base non-overclocked processor, a reliably overclocked processor tuned using a single clock generator, and a reliably overclocked processor tuned using dual clock generators. All the systems execute SPEC2000 integer benchmarks
Table 4.5 Comparing various performance metrics between a base non-overclocked processor, a reliably overclocked processor tuned using a single clock generator, and a reliably overclocked processor tuned using dual clock generators. All the systems execute SPEC2000 floating point benchmarks
Table 4.6 Comparing various performance metrics for non-overclocked and reliably overclocked processors executing SPEC2000 integer benchmarks
Table 4.7 Comparing various performance metrics for non-overclocked and reliably overclocked processors executing SPEC2000 floating point benchmarks
Table 4.8 Effect of memory overclocking on the performance benefits of a ROP executing SPEC2000 integer benchmarks
Table 4.9 Effect of memory overclocking on the performance benefits of a ROP executing SPEC2000 floating point benchmarks
Table 5.1 Mean Time To Failure (MTTF) for critical wear-out models
Table 5.2 Simulator parameters
Table 6.1 Technology scaling parameters
Table 6.2 Comparing various performance metrics across different technology nodes for a non-overclocked processor executing SPEC2000 integer benchmarks
Table 6.3 Comparing various performance metrics across different technology nodes for a non-overclocked processor executing SPEC2000 floating point benchmarks
Table 7.1 Possible error scenarios
Table 7.2 Fault injection results
Table 7.3 Timing errors

LIST OF FIGURES
Figure 2.1 Cross section of an n-channel MOSFET in the ON state showing channel formation. The channel exhibits pinch-off near the drain, indicating operation in the saturation (active) region.
Figure 2.2 Typical pipeline stage in a ROP. The local timing error detection and recovery scheme for critical registers is shown in detail.
Figure 2.3 Timing diagram showing the overclocking advantage per cycle, as compared to the worst-case clock
Figure 2.4 Timing diagram showing pipeline stage level timing speculation
Figure 2.5 Reduced overhead Razor flip-flop and metastability detection circuits (Figure reproduced from [27])
Figure 2.6 SPRIT3E framework
Figure 3.1 Clock timing waveforms showing governing requirements, for MAINCLK and PSCLK, over the full range of overclocked aggressive frequencies (F_MIN to F_MAX)
Figure 3.2 Examples of Main and PS clocks with variable and fixed phase shifts
Figure 3.3 Timing waveforms after increasing contamination delay to half the propagation delay for the full range of overclocked aggressive frequencies (F_MIN to F_MAX)
Figure 3.4 8-bit CLA adder
Figure 3.5 Delay distribution for an 8-bit CLA adder
Figure 3.6 8-bit CLA adder with additional delay blocks to increase contamination delay
Figure 3.7 Delay distribution for an 8-bit CLA adder after increasing contamination delay
Figure 3.8 Experimental setup to estimate performance improvement of CLA adder circuits
Figure 3.9 Percent of error cycles versus clock period for an 8-bit delay added CLA adder circuit
Figure 3.10 Percent of error cycles versus clock period for a 32-bit delay added CLA adder circuit (Contamination delay 1.21 ns)
Figure 3.11 Percent of error cycles versus clock period for a 32-bit delay added CLA adder circuit (Contamination delay 1.38 ns)
Figure 3.12 Percent of error cycles versus clock period for a 64-bit delay added CLA adder circuit
Figure 4.1 Alpha 21264 integer and floating point pipeline showing timing error detection and recovery circuit for critical registers
Figure 4.2 Simulation framework
Figure 4.3 Cumulative error profile for all pipeline stages at overclocked operating frequencies for SPEC2000 integer benchmarks. Error profiles for the issue and execute stages are also shown separately.
Figure 4.4 Error profile for three SPEC2000 integer benchmarks executing five different instruction and data sets