Low-Power and Error-Resilient VLSI Circuits and Systems

Low-Power and Error-Resilient VLSI Circuits and Systems by Chia-Hsiang Chen A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in the University of Michigan 2014 Doctoral Committee: Assistant Professor Zhengya Zhang, Chair Associate Professor Achilleas Anastasopoulos Professor David Blaauw Assistant Professor Junjie Zhu c Chia-Hsiang Chen 2014 All Rights Reserved To my family and \TEA TIME" friends for their love and support ii ACKNOWLEDGEMENTS Many people have contributed to the completion of this dissertation to whom I will always be grateful. My adivisor, Professor Zhengya Zhang, provided me the exceptional scientific resources and guidance, and allowed me to grow intellectully and to explore exciting researches over the years. I would also like to thank Professor David Blaauw, Achilleas Anastasopoulos, and Junjie Zhu, for participating in my dissertation committee and providing valuable feedback. I appreciate my research sponsors, including Intel, DARPA under cooperative agreement HR0011-13-2-0006, NASA, and the University of Michigan. I would like to thank all individuals who have inspired and mentored me through- out the doctoral training. Professor Michael Flynn provided me insightful discussions over different ideas. Dr. Keith Bowman, Dr. Farhana Sheikh, Dr. Charles Au- gustine, and Jim Tschanz from Intel Circuit Research Labs, they inspired me and broadened my knowledge. Another group of people that assisted me through this journey were those in my research group, including Jungkuk Kim, Youn Sung Park, Phil Knag, Yaoyu Tao, Siming Song, Shuanghong, Thomas Chen, and Wei Tang. They shared their expertise and effort to accelerate my understanding on several re- searching topics. In addition to those in my research group, I would also like to thank Dr. Wei-Hsiang Ma, Dr. Zhiyoong Foo, Tai-Chuan Ou, and Ye-Sheng Kuo for bring enjoyable memories as circuit designers at the University of Michigan . Finally, I want to express my deepest appreciation to my parents and family in Taiwan for their unconditional love and support. iii TABLE OF CONTENTS DEDICATION :::::::::::::::::::::::::::::::::: ii ACKNOWLEDGEMENTS :::::::::::::::::::::::::: iii LIST OF FIGURES ::::::::::::::::::::::::::::::: vii LIST OF TABLES :::::::::::::::::::::::::::::::: xiv ABSTRACT ::::::::::::::::::::::::::::::::::: xvi CHAPTER I. Introduction ..............................1 1.1 Error characterization and impact analysis . .2 1.1.1 Data-retention failures and timing faults . .2 1.1.2 Soft errors . .3 1.2 Design and evaluation methodology for error detection techniques4 1.2.1 In situ error detection circuits . .4 1.2.2 FPGA-based transient simulator . .5 1.3 Confidence-driven computing architecture . .5 1.4 Low-power and error-resilient system with algorithm and architecture co-design . .5 1.5 Dissertation outline . .6 II. Timing Faults and Data-Retention Failures ...........7 2.1 Statistical circuit simulation methodology . .7 2.2 Data-retention failure and Vmin analysis . 10 2.3 Hold-time violation and Vmin analysis . 13 2.4 Vmin of sequential logic circuits . 17 2.5 Summary . 19 III. Heavy-Ion Induced Soft Error Characterization ........ 20 3.1 Test chip designs and test setup . 21 iv 3.1.1 Test chip 1 . 21 3.1.2 Test chip 2 . 24 3.1.3 Ion beam testing . 25 3.2 Circuit design considerations . 26 3.2.1 Radiation hardening by redundancy . 27 3.2.2 Increasing critical charge by upsizing . 28 3.2.3 Depth and sizing of combinational circuits . 29 3.2.4 Data dependence . 29 3.2.5 Latchup and total ionization dose . 31 3.3 Voltage and frequency dependence . 32 3.3.1 Supply voltage scaling . 32 3.3.2 Clock frequency effects . 33 3.4 Summary . 35 IV. Error-Detection Circuits and Simulation Methodology .... 36 4.1 Transient error detection circuits . 37 4.2 Flexible cross-edge checking . 38 4.2.1 Cross-edge circuitry . 40 4.2.2 Pre-edge circuitry . 43 4.2.3 Comparisons . 43 4.3 FPGA-based transient error simulator . 44 4.4 Transient error simulator . 45 4.4.1 Delay model and error model . 46 4.4.2 Transient simulation . 48 4.5 Evaluation of error-resilient designs . 50 4.5.1 Pre-edge technique . 51 4.5.2 Post-edge technique . 52 4.6 Summary . 54 V. Confidence-Driven Architecture for Error-Resilient Computing 55 5.1 Related work . 55 5.2 Confidence-driven computing . 57 5.2.1 Lockstep synchronization . 58 5.2.2 Speculative synchronization . 59 5.2.3 Spatial redundancy . 61 5.3 Reliability and performance evaluation using sample-based em- ulation . 61 5.3.1 Reliability evaluation . 62 5.3.2 Performance evaluation . 63 5.3.3 Implementation results and case study . 67 5.4 Early checking . 68 5.4.1 Early checking . 68 5.4.2 Functional analysis and comparison . 69 v 5.5 Reliability and performance evaluation using event-based simulation . 70 5.5.1 Error simulation . 71 5.5.2 Experimental evaluation . 71 5.5.3 Implementation results and case study . 75 5.6 Summary . 76 VI. Low-Power Error-Resilient Systems for Wireless Communi- cation ................................... 78 6.1 Background . 78 6.1.1 MIMO system model . 79 6.1.2 Non-iterative MIMO detection algorithms . 80 6.1.3 Iterative MIMO detection algorithms . 82 6.2 Iterative detection-decoding design overview . 83 6.3 SISO MMSE algorithm and optimization . 85 6.3.1 Reduced-complexity MMSE-PIC algorithm . 86 6.3.2 Algorithmic optimizations for efficient MMSE detector implementation . 87 6.4 Architecture and circuit optimization . 90 6.5 Nonbinary interface and optimization . 97 6.5.1 Detector to decoder: LLR computation with L1-norm 97 6.5.2 Decoder to detector: Efficient soft-symbol and vari- ance computation . 100 6.6 Measurements . 104 6.7 Conclusion . 105 VII. Conclusion and Outlook ....................... 109 7.1 Conclusion . 109 7.2 Outlook . 110 BIBLIOGRAPHY :::::::::::::::::::::::::::::::: 112 vi LIST OF FIGURES Figure 2.1 Master-slave flip-flop (MSFF) schematic. .8 2.2 Probability density distributions from MC simulations with 104 sam- ples for (a) normalized hold time at Vnorm and (b) normalized data- retention Vmin with Rg. The MC (a) hold time and (b) data- retention Vmin corresponding to a 3σ WID-variation probability are compared to MPP results. .8 2.3 Data-retention analysis for the slave latch of the MSFF while holding a logic 1. The PMOS process variation (circled) and gate-dielectric soft breakdown (Rg) on node S I limit the data-retention Vmin. 10 2.4 (a) Butterfly curves for the static DC analysis of data retention. (b) Description of the retention windows for the dynamic transient analysis of data retention. 12 2.5 Normalized data-retention Vmin with the individual and combined contributions of WID variation and Rg for the conventional static DC analysis and the dynamic transient analysis that captures the cycle time effect. 12 2.6 Hold time is defined as the CK-to-D delay where a 50% Vcc glitch occurs on node M O........................... 13 2.7 Dynamic transient simulation description for the worst-case data- dependent hold-time analysis. 14 2.8 Hold time versus normalized Vcc with different placement of Rg while not including WID variations. 15 2.9 Normalized hold time with and without Rg for a 5σ WID-variation target and an FO4 inverter chain delay versus the normalized Vcc. 15 vii 2.10 Hold time as a percentage of cycle time versus normalized Vcc. Hold- time Vmin is defined as the Vcc in which the hold time exceeds 10% of the cycle time. 16 2.11 Breakdown of the 5σ WID variation across all the transistors in the MSFF from the MPP simulations. 17 2.12 Vmin for data retention and hold time versus different combinations of Rg and 5σ WID variation. 18 2.13 Hold-time Vmin for the original MSFF in Fig. 2.1 and for the MSFF with a 2× larger first clock inverter. 18 3.1 Shift register chains implemented on the test chip 1. Each chain consists of 500 flip-flops. Chain 6 to 11 each has a varying number and size of inverters. 22 3.2 Three types of D flip-flops: (a) DICE flip-flop [1], (b) TMR flip- flop [2], and (c) unprotected standard D flip-flop. 23 3.3 (a) Test chip 1 layout, and (b) microphotograph. 24 3.4 Radiation test setup. 26 3.5 Automated testing of chip 2. 27 3.6 Cross section per bit with ion energy for standard, DICE and TMR flip-flops. 28 3.7 Cross section per bit after adding combinational circuits. 30 3.8 Latch design results in unequal upsets. 31 3.9 Cross section per bit of D flip-flop, DICE and TMR flip-flops at supply voltage of 1.0V and 0.7V. 32 3.10 Cross section per bit of standard and radiation-hardened correlator cores at supply voltage of 1.0V and 0.7V (clock frequency of 100MHz). 33 3.11 Cross section per bit of standard and radiation-hardened correlator cores at 100MHz and 500MHz (1.0V supply voltage). 34 4.1 (a) Timing margin allocated for static and dynamic variations, and (b) pre-edge and post-edge error checking window. 37 viii 4.2 (a) Checking window positions depending on path criticality, (b) fixed post-edge checking window causing race conditions, and (c) fixed pre- edge checking window leading to performance degradation. 39 4.3 (a) Cross-edge checking design based on a conventional transmission gate flip-flop [3], and (b) error detector design. 40 4.4 Waveforms illustrating the operations of the cross-edge circuitry. 42 4.5 (a) Pre-edge checking design, and (b) transition detector design. 42 4.6 Waveforms illustrating the operations of the pre-edge circuitry.

Low-Power and Error-Resilient VLSI Circuits and Systems

Pipelining and Vector Processing

Diploma Thesis

How Data Hazards Can Be Removed Effectively

Pipelining: Basic Concepts and Approaches

Powerpc 601 RISC Microprocessor Users Manual

UNIT 8B a Full Adder

Improving UNIX Kernel Performance Using Profile Based Optimization

Analysis of Body Bias Control Using Overhead Conditions for Real Time Systems: a Practical Approach∗

With Extreme Scale Computing the Rules Have Changed

Theoretical Peak FLOPS Per Instruction Set on Modern Intel Cpus

Lecture Topics RAW Hazards Stall Penalty

Misleading Performance Reporting in the Supercomputing Field David H