Fault-Tolerant Computation using Algebraic Homomorphisms

Paul E. Beckmann

RLE Technical Report No. 580

June 1993

Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139-4307

This work was supported in part by the Defense Advanced Research Projects Agency monitored by the U.S. Navy Office of Naval Research under Grant N00014-89-J-1489 and in part by the Charles S. Draper Laboratories under Contract DL-H-418472.

Fault-Tolerant Computation using Algebraic Homomorphisms by Paul E. Beckmann

Submitted to the Department of Electrical Engineering and Computer Science on August 14, 1992, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

Arithmetic codes are a class of error-correcting codes that are able to protect computation more efficiently than modular redundancy. In this thesis we consider the problem of designing an arithmetic code to protect a given computation. The main contributions are as follows:

* The first constructive procedure, outside of modular redundancy, for generating arithmetic codes for a large class of operations. The approach is mathematically rigorous, based on group theory, and fully characterizes the important class of systematic-separate codes. The results encompass computation that can be modeled as operations in an algebraic group, ring, field, or vector space.

* A novel set-theoretic framework for characterizing the redundancy present in a system. We present a decomposition of a robust system into a cascade of three systems and determine general requirements for multiple error detection and correction.

* We identify an important class of errors for which the redundancy present in a system may be completely characterized by a single integer, analogous to the minimum distance of a binary error-correcting code.

* We unify the existing literature on arithmetic codes. A wide variety of seemingly unrelated codes are combined into a single general framework.

* A large number of examples illustrating the application of our technique.

* Detailed analyses of two new and practical fault-tolerant systems: fault-tolerant convolution and A/D conversion.

Thesis Supervisor: Bruce R. Musicus
Title: Research Affiliate, M.I.T. Digital Signal Processing Group

Dedicated to the glory of God,

Whatever you do, whether in word or deed, do it all in the name of the Lord Jesus, giving thanks to God the Father through him. - Colossians 3:17

and to my wife, Chin.

A wife of noble character who can find? She is worth far more than rubies. - Proverbs 31:10

Acknowledgments

I would like to thank my advisor, Bruce Musicus, for the excellent guidance and support that he has given me during the past three years. His enthusiasm, encouragement, and friendship have made my life as a graduate student both satisfying and enjoyable. Bruce was always available when I reached an impasse, and gave me the freedom to explore new ideas. I am grateful for the support of Professor Alan Oppenheim. Under his direction, the Digital Signal Processing Group provided a stimulating environment in which to carry out my research. I am also indebted to all the members of the group for their friendship and for many thought-provoking discussions. I would like to particularly acknowledge Kambiz Zangi, Mike Richard, Steven Isabelle, and Andy Singer. They were always available and patiently let me bounce ideas off of them. I would also like to thank Deborah Gage for her constant encouragement and help, and Sally Bemus for assistance in the final preparation of the thesis and defense. This thesis was made possible by the generous financial support of Rockwell International. They supported me unconditionally for the past three years and allowed me to pursue an interesting thesis topic. Additional funding was provided by the Defense Advanced Research Projects Agency and Draper Laboratory. I would like to thank my parents for their steadfast love and support. They have always been an encouragement and inspiration to me. Finally, and most importantly, I would like to thank my wife Chin for her unfailing love, patience, and support. She has always been behind me 100%, and is a perpetual source of joy. This thesis would not have been possible without her.

Contents

1 Introduction
  1.1 Major Contributions
  1.2 Outline of Thesis

2 Previous Approaches to Fault-Tolerance
  2.1 Introduction
  2.2 Modular Redundancy
    2.2.1 Summary
  2.3 Arithmetic Codes
    2.3.1 aN Codes
    2.3.2 (aN)M Codes
    2.3.3 Integer Residue Codes
    2.3.4 Integer Residue Number Systems
    2.3.5 Comparison of Arithmetic Codes
  2.4 Algorithm-Based Fault-Tolerance
    2.4.1 Codes for the Transmission of Real Numbers
    2.4.2 Codes for Matrix Operations
    2.4.3 Codes for Linear Transformations
    2.4.4 Codes for FFTs
    2.4.5 Codes for Convolution
  2.5 Summary

3 Set-Theoretic Framework
  3.1 Set-Theoretic Model of Computation
  3.2 Redundant Computation
  3.3 Symmetric Errors
  3.4 Example - Triple Modular Redundancy
  3.5 Summary

4 Group-Theoretic Framework
  4.1 Group-theoretic Model of Computation
  4.2 Redundant Computation
  4.3 Error Detection and Correction
    4.3.1 Redundancy Conditions
    4.3.2 Coset-Based Error Detection and Correction
    4.3.3 Determination of the Syndrome Homomorphism
    4.3.4 Symmetric Errors
  4.4 Partial Homomorphism
  4.5 Code Selection Procedure
  4.6 Application to Other Algebraic Systems
    4.6.1 Rings
    4.6.2 Fields
    4.6.3 Vector Spaces
  4.7 Examples
    4.7.1 Triple Modular Redundancy
    4.7.2 aN Codes
    4.7.3 Matrix Rings
    4.7.4 Finite Fields
    4.7.5 Linear Transformations
  4.8 Summary

5 Systematic-Separate Codes
  5.1 Description of Codes
  5.2 Determination of Possible Homomorphisms
  5.3 Error Detection and Correction
  5.4 Multiple Parity Channels
  5.5 Application to Other Algebraic Systems
    5.5.1 Rings
    5.5.2 Fields
    5.5.3 Vector Spaces
  5.6 Examples
    5.6.1 Trivial Codes
    5.6.2 Integer Residue Codes
    5.6.3 Real Residue Codes
    5.6.4 Multiplication of Nonzero Real Numbers
    5.6.5 Linear Convolution
    5.6.6 Linear Transformations
    5.6.7 Gaussian Elimination and Matrix Inversion
  5.7 Summary

6 Fault-Tolerant Convolution
  6.1 Introduction
  6.2 Winograd Convolution Algorithm
  6.3 Fault-Tolerant System
    6.3.1 Multiprocessor Architecture
    6.3.2 Relationship to Group-Theoretic Framework
    6.3.3 Fault Detection and Correction
  6.4 Fast FFT-Based Algorithm
    6.4.1 FFT Moduli
    6.4.2 Algorithm Description
    6.4.3 Algorithm Summary
  6.5 Generalized Likelihood Ratio Test
  6.6 Fault-Tolerance Overhead
  6.7 Conclusion
  6.A Proof of Theorems
  6.B Solution of Likelihood Equations

7 Fault-Tolerant A/D Conversion
  7.1 Round-Robin A/D Converter
  7.2 Relationship to Framework
  7.3 Algorithm Development
    7.3.1 Generalized Likelihood Ratio Test
    7.3.2 Required Number of Extra Converters
    7.3.3 Probability of False Alarm
    7.3.4 Probability of Detection
    7.3.5 Probability of Misdiagnosis
    7.3.6 Variance of Low Pass Signal Estimate
  7.4 Simulation of Ideal System
  7.5 Realistic Systems
    7.5.1 Eliminate the Low Pass Filter
    7.5.2 Eliminate the Dither System
    7.5.3 Finite Length Filter
    7.5.4 Finite Order Integrators
    7.5.5 Real-Time Fault Detection Algorithm
  7.6 Conclusion
  7.A Signal Estimates: Mean and Variance
  7.B Ideal Likelihoods: Mean and Covariance
  7.C Finite Order Likelihoods

8 Conclusion and Suggestions for Future Research
  8.1 Summary and Contributions
  8.2 Future Research Directions
    8.2.1 Further Exploration
    8.2.2 Arithmetic Codes for Other Operations
    8.2.3 Other Possible Uses of Homomorphisms

Chapter 1

Introduction

Traditionally, the problem of computational fault-tolerance has been solved through modular redundancy. In this technique, several identical copies of the system operate in parallel using the same data, and their outputs are compared with voter circuitry. If no errors have occurred, all outputs will agree exactly. Otherwise, if an error has occurred, the faulty module can be easily identified and the correct output determined. Modular redundancy is a general technique and can be applied to any computational task. Unfortunately, it does not take advantage of the structure of a problem and requires a large amount of hardware overhead relative to the protection afforded. A more efficient method of protecting computation is to use an arithmetic code and tailor the redundancy to the specific operation being performed. Arithmetic codes are essentially error-correcting codes whose error detecting and correcting properties are preserved during computation. Arithmetic codes offer performance and redundancy advantages similar to existing error-correcting codes used to protect communication channels. Unfortunately, arithmetic codes exist for only a limited number of operations. This thesis addresses the general problem of designing an arithmetic code to protect a given computation. A practical arithmetic code must satisfy three requirements:

1. It must contain useful redundancy.

2. It must be easily encoded and decoded.

3. Its inherent redundancy must be preserved throughout computation.

The first two requirements are shared by error-correcting codes for protecting data transmission, while the third requirement is unique to arithmetic codes. This host of requirements makes designing practical arithmetic codes an extremely difficult problem. We solve this problem through two key insights. First, we note that many important arithmetic operations can be modeled using group theory. This provides a mathematically rigorous and well-defined foundation for our analysis. Second, we begin our study by focusing on the third requirement of an arithmetic code: that its structure be preserved throughout computation. This requirement restricts arithmetic codes to the class of mappings known as algebraic homomorphisms, which can often be readily identified. From the set of algebraic homomorphisms, we select those with useful distance structures and that are easily encoded and decoded. This leads to practical arithmetic codes. Our results are extremely general and encompass any computation that can be described as operations in algebraic groups, rings, fields, or vector spaces. This includes a wide range of important and useful arithmetic operations such as low-level integer arithmetic, matrix operations, convolution, and linear transformations. Throughout the thesis, we will emphasize operations which are of interest in signal processing, although our results are more widely applicable.

1.1 Major Contributions

Several important contributions to the study of arithmetic codes are made in this thesis. First and foremost is a constructive procedure for generating arithmetic codes for a large class of operations. This is the first general procedure, outside of modular redundancy, for constructing arithmetic codes. Our approach is mathematically rigorous, based on group theory, and yields the class of systematic-separate codes. The difficult problem of finding arithmetic codes is reduced to the problem of finding subgroups. By finding all subgroups of a given group, we generate all possible systematic-separate codes. Another contribution is the general set-theoretic framework for analyzing fault-tolerant systems. We give explicit measures for redundancy which indicate when error detection and correction are feasible, and approach this problem from the most general point of view. The effects of multiple errors are considered, and we are able to trade off between the abilities to

detect and correct errors. We also identify an important class of errors which enables the redundancy in a system to be more easily quantified. We answer the question: when is it possible to analyze the redundancy present in a code by some minimum distance measure? Our work unifies the vast majority of existing arithmetic codes, including low-level integer codes as well as higher-level Algorithm-Based Fault-Tolerance schemes. We show that the underlying thread common to these apparently unrelated operations is that computation can be modeled as operations in an algebraic group and that errors influence the result in an additive manner. This thesis also contributes two novel and practical fault-tolerant systems and analyzes their performance in detail. In the first, we apply a polynomial residue number system to protect the convolution of discrete sequences. Although not the first fault-tolerant convolution algorithm proposed, it is unmatched in its overall efficiency; single system failures may be detected and corrected with as little as 65% overhead. Also, since residue number systems are the basis for fast convolution algorithms, we are able to employ FFTs to reduce the amount of computation required. Our second major application addresses the problem of reliable A/D conversion. We add redundancy in an original manner by oversampling the input signal, and show that error detection and correction reduce to a simple form, requiring about as much additional computation as a single FIR filter.

1.2 Outline of Thesis

Chapter 2 motivates our study of fault-tolerance and reviews previous approaches to the design of robust systems. We discuss approaches based on modular redundancy, as well as more efficient methods utilizing arithmetic codes. We summarize existing low-level codes for protecting integer arithmetic and discuss recently developed high-level codes referred to as Algorithm-Based Fault-Tolerance. We briefly introduce each coding scheme, paying particular attention to the manner in which redundancy is added. Chapter 3 examines the problem of computational fault-tolerance from a set-theoretic point of view. Motivated by the many examples of fault-tolerant systems in Chapter 2, we decompose a robust system into a cascade of three subsystems. We study redundancy and

derive conditions that allow multiple errors to be detected and corrected. We then present a specific class of errors, which we term symmetric errors, that enables the redundancy present in a system to be quantified by a single integer, analogous to the minimum distance of a standard binary error-correcting code. Our definition of symmetric errors is novel, and relates the abstract requirements for error detectability and correctability to the more familiar notion of minimum distance. Our results are extremely general and can be applied to the analysis of any fault-tolerant system. Unfortunately, the set-theoretic approach does not yield a constructive procedure for designing robust systems. Chapter 4 narrows the focus from arbitrary computation to that which can be modeled as operations in an algebraic group. We first present a group-theoretic framework of redundant computation based on the decomposition of a robust system presented in Chapter 3, and show that the constraints satisfied by an arithmetic code are equivalent to those of an algebraic homomorphism. This reduces the problem of finding suitable arithmetic codes to that of finding applicable algebraic homomorphisms. We recast the general conditions on redundancy for error detection and correction in terms of group theory and show that these functions may be performed in an efficient manner using a syndrome homomorphism. Once the framework for groups is complete, we extend it to rings, fields, and vector spaces. These are other algebraic systems that have an underlying group structure. Assuming an additive error model, we show that arithmetic codes for these operations must be homomorphisms as well. This enables the bulk of the results for groups to be applied to these other algebraic systems. We conclude with a wide variety of examples which demonstrate the general nature of our results. Chapter 5 considers the problem of determining possible homomorphisms for a given algebraic operation. We use a group isomorphism, and this yields the important class of systematic-separate codes. These codes are characterized by a checksum computation performed in a parallel independent channel. We prove that the procedure is capable of finding, up to isomorphisms, all possible systematic-separate codes for a given operation. We then extend the results for groups to rings, fields, and vector spaces. The results for groups carry over completely, and we are able to characterize all possible systematic-separate arithmetic codes in these other systems as well. We conclude this chapter by

presenting several examples of operations protected by systematic-separate codes. The examples include novel schemes unique to this thesis as well as schemes presented by other authors. Chapters 6 and 7 each contain a thorough study of a particular fault-tolerant system. Chapter 6 describes a fault-tolerant convolution algorithm which is an extension of residue number system fault-tolerance schemes applied to polynomial rings. The algorithm is suitable for implementation on multiprocessor systems and is able to concurrently mask processor failures. We develop a fast algorithm based on long division for detecting and correcting multiple processor failures. We then select moduli polynomials that yield an efficient and robust FFT-based algorithm. For this implementation, we study single fault detection and correction, and apply a generalized likelihood ratio test to optimally detect system failures in the presence of computational noise. Chapter 7 examines a fault-tolerant round-robin A/D converter system. A modest amount of oversampling generates information which is exploited to achieve fault tolerance. A generalized likelihood ratio test is used to detect the most likely failure and also to estimate the optimum signal reconstruction. The error detection and correction algorithms reduce to a simple form and require only a slight amount of hardware overhead. We present a derivation of the algorithms, and discuss modifications that lead to a realizable system. We then evaluate overall performance through software simulations. Lastly, in Chapter 8 we summarize the major contributions made in this thesis. We also suggest practical and potentially important directions for future research.

Chapter 2

Previous Approaches to Fault-Tolerance

In this chapter we motivate our study of fault-tolerance and introduce basic concepts and terminology. Different types of fault-tolerance are described, and we discuss traditional approaches to fault-tolerance based on modular redundancy, as well as more recently developed Algorithm-Based Fault-Tolerance schemes.

2.1 Introduction

High reliability is needed in many signal processing applications to ensure continuous operation and to check the integrity of results. High reliability is needed in life-critical applications, such as aircraft guidance systems or medical equipment, where failures can jeopardize human lives, and in remote applications, such as satellites or underwater acoustic monitors, where repair is impossible or prohibitively expensive. Robustness is also needed in systems that must operate in hazardous environments, such as military equipment, or in spacecraft that must be protected against radiation. In all of these applications there is a high cost of failure, and reliability is of great importance. The complexity of signal processing algorithms has been steadily increasing due to the availability of special purpose, high-speed signal processors. Many algorithms that were once too computationally intensive are now implemented in real time by multiprocessor

systems. In these systems, the large amount of hardware increases the likelihood of a failure occurring and makes reliable operation difficult. It is impossible to guarantee that components of a system will never fail. Instead, failures should be anticipated, and systems designed to tolerate failures gracefully. This design methodology is known as fault-tolerant computing, and it received considerable attention from early computer designers because of the unreliability of existing components. After the development of integrated circuits, which were several orders of magnitude more reliable, fault-tolerance became a secondary issue. Attention was focused on developing faster, more complex circuits, and as a result, circuit densities grew exponentially. In many areas, however, semiconductor reliability has not kept pace with the level of integration, and fault-tolerance is becoming a major issue again. A component is said to have failed when it does not compute the correct output, given its input. A failure is caused by a physical fault, and the manifestation of a failure is errors within the system [1]. In a fault-tolerant system, the basic idea is to tolerate internal errors and keep them from reaching and corrupting the output. This process is known as error masking. A fault may be permanent or transient. A permanent fault is caused by a static flaw in a component and may result from manufacturing defects or physical damage during operation. A permanent fault need not produce errors for every input. Transient or soft faults are momentary faults that occur infrequently and are randomly distributed throughout the system. These types of faults can be induced by several sources: alpha particles emitted from the substrate and packaging material; cross coupling between closely spaced signal lines; unpredictable timing glitches from rare, absolute worst case delays; and electromigration phenomena in small conductors [2]. The current trends toward higher clock speeds and denser components aggravate the problem of transient errors [3]. The highest and most desirable level of fault-tolerance is known as concurrent error masking. In this technique, errors are masked during operation and the system continues to function with no visible degradation in performance. In general, concurrent error masking requires the following steps:

1. Fault Detection - determine that the output is invalid and that a fault has occurred within the system.

2. Fault Location - determine which system component failed.

3. Fault Correction - determine the correct output.

Concurrent error masking is difficult and expensive to achieve, and basically requires the entire system to be checked for errors at each time step. Often, lower levels of protection are acceptable and a few erroneous outputs may be tolerated. In these instances, less expensive techniques which check only a part of the system at each time step are adequate. In other applications, there is a high cost for computing erroneous results and a lower cost for delaying the result. In these applications, concurrent fault detection coupled with off-line fault location/correction may be a viable alternative.

2.2 Modular Redundancy

In order for a system to be fault-tolerant, it must contain some form of redundancy. By redundancy we mean additional states that arise during faults and that are used to detect and correct errors. Without redundancy, it is impossible for a system to be fault-tolerant since it is unable to distinguish between valid and invalid internal states. Utilizing redundancy is in contrast to the goal of eliminating as much redundancy as possible. Redundancy generally increases the complexity of a system and leads to increased cost. The traditional method of adding redundancy and fault-tolerance to a system is through modular redundancy [4, 5]. This is a system-level approach in which several copies of the system operate in parallel, using the same input. Their outputs are compared with voter circuitry and will agree if no errors have occurred. Otherwise, if the outputs are not identical, then an error has occurred and the correct result may be determined using a majority voter. A system, S, protected by modular redundancy is shown in Figure 2-1. Assume that there are N copies of S labeled $S_1$ to $S_N$. Then it is possible to detect D and correct C errors if $N \ge D + C + 1$, where $D \ge C$. To detect single errors, at least N = 2 copies of S are needed (100% overhead). To correct single errors, at least N = 3 copies of S are needed (200% overhead).

Figure 2-1: Example of N-modular redundancy used to protect system S.
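The voting step itself is straightforward. The following is a minimal sketch, not from the thesis, of N-modular redundancy with majority voting; the hard-coded `outputs` list is a hypothetical stand-in for the results of the N copies of S:

```python
from collections import Counter

def vote(outputs):
    """Majority vote over the outputs of N identical modules.

    With N >= 2C + 1 copies, up to C faulty modules are out-voted
    by the fault-free majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: too many faulty modules")
    return value

# Triple modular redundancy (N = 3): one faulty module is corrected.
assert vote([42, 42, 17]) == 42
```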

Modular redundancy (MR) is the most widely used fault-tolerance technique. This is because it can be used to protect any system and because it decouples system and fault-tolerance design. That is, the design of S is basically independent of the design of the voter circuitry used to check S. Furthermore, totally self-checking checkers capable of protecting the voting process are available [6]. MR can be applied to a system at a variety of levels. At a low level, logic gates may be duplicated and the error masking performed by the gates themselves [7]. At a higher level, integrated circuits may be duplicated and their outputs compared, or the entire system may be duplicated. The optimal level at which to apply MR depends upon the expected types of faults and the cost of voter circuitry. An example of system-level MR is the Jet Propulsion Laboratory's Star Space Shuttle Computer [8]. Five copies of the system operate in parallel and voting is performed in hardware. Other examples include Draper Laboratory's Fault-Tolerant Parallel Processor [9], and commercial systems by Tandem [10] and Stratus [11]. Another important area of research has been in developing self-checking systems. These are systems that are capable of concurrent fault detection. Coupled with MR, a self-checking system can significantly reduce the overhead required for fault-tolerance. N copies of a self-checking system contain sufficient redundancy to detect up to N and correct up to N - 1 errors. Using additional copies of system components is just one form of modular redundancy.

If performance is not a major bottleneck, and transient faults are expected, then time redundancy may be used. The system would perform the same operation N times, and then compare results. An example of time redundancy is a technique called Recomputing with Shifted Operands, which can protect certain arithmetic operations [12]. In this technique, all operations are done twice, once with normal operands and once with shifted operands. It can be shown that the shifting will prevent any permanent or transient failure in the ALU from producing incorrect, but identical, results in both cases. This technique may be applied to yield totally self-checking systems. The hardware overhead needed is only that of the shifters and equality checker. The fault-tolerant systems that we have been describing are all capable of concurrent error detection and correction. If a small number of erroneous outputs can be tolerated, then other techniques, such as roving emulators or watchdog processors, may be employed [13]. These schemes check only a portion of the computation at each time step and are well-suited to multiprocessor systems.

2.2.1 Summary

MR is a very general approach to fault-tolerance. It is applicable to any type of system and allows fault-tolerance to be added even after system design is complete. Redundancy is incorporated by a brute force approach: system duplication. With sufficient copies of the system, any desired level of fault-tolerance may be achieved. Fault detection and correction are especially simple and may be implemented by majority voters. The main drawback of MR is that it does not take advantage of the specific structure of a problem and thereby requires a substantial amount of redundancy relative to the protection provided. MR will certainly continue to be widely used, especially in the protection of general purpose computation.

2.3 Arithmetic Codes

A more efficient method of adding redundancy to computation is via an error-correcting code. The basic idea is as follows. Redundancy is added to the representation of data within the system, and the system is modified to operate on the encoded data. During error-free operation, the state of the system remains within a fixed subset of valid states. If an error occurs, the system state is perturbed, and the redundancy is used to detect and correct errors, much like an error-correcting code for a communication system. Standard error-correcting codes can be applied only to protect systems in which the input matches the output. In general, these codes cannot be used to protect computation because their distance structure is destroyed during computation. They have been successfully used to protect certain components of computer systems, such as buses and storage devices, where reliability is needed and no computation is performed. Encoding and error detection and correction are usually performed by dedicated hardware in the memory and disk/tape controllers rather than by the CPU. Arithmetic codes are a class of error-correcting codes that are capable of protecting certain binary operations. Let $\diamond$ denote an arbitrary binary operation applied to a pair of operands $g_1$ and $g_2$. The result r is given by

$r = g_1 \diamond g_2.$ (2.1)

For example, $g_1$ and $g_2$ could be integers, and $\diamond$ could be integer addition, subtraction, or multiplication. In an arithmetic code, we first encode $g_1$ and $g_2$ to obtain operand codewords

$\tilde g_1 = \phi(g_1)$ (2.2)

$\tilde g_2 = \phi(g_2).$ (2.3)

Then a different operation $\tilde\diamond$ is applied to the codewords to yield the encoded result

$h = \tilde g_1 \,\tilde\diamond\, \tilde g_2.$ (2.4)

We assume that the system computing (2.4) is not reliable and that an error e could

arise during computation. The faulty result is given by

$h' = \mu(\tilde g_1, \tilde g_2, e)$ (2.5)

where $\mu$ models the manner in which errors affect the result. We denote by $\mathcal{H}$ the set of all possible results, and by $\mathcal{H}_V$ the subset of $\mathcal{H}$ corresponding to valid, error-free results. If $h' \in \mathcal{H}_V$ then we declare that no errors occurred during computation. Otherwise, if $h' \notin \mathcal{H}_V$, we declare that an error occurred, and correct the error using the mapping $\sigma$,

$h = \sigma(h').$ (2.6)

Finally, we decode h in order to obtain the desired result,

$r = \phi^{-1}(h).$ (2.7)

The central motivation behind the use of arithmetic codes is to provide robust computation at a lower cost than modular redundancy. Modular redundancy is always a viable alternative for protecting any computation, and thus it serves as a benchmark for evaluating the efficiency of arithmetic codes. Another important observation is that arithmetic codes protect the bulk, but not the entirety, of computation. Functions such as error detection and correction and decoding the final result are usually assumed to be robust. If necessary, these functions may be protected by modular redundancy. When evaluating the overall efficiency of an arithmetic code, the cost of functions protected by modular redundancy should be weighted accordingly. All existing arithmetic codes can be placed in this framework and described by mappings $\phi$ and $\sigma$, operations $\diamond$ and $\tilde\diamond$, and an error process $\mu$. When designing an arithmetic code, we are given a set of operands and a binary operation $\diamond$, and must determine an encoding $\phi$ and an operation $\tilde\diamond$ to be applied to the codewords. Furthermore, $\phi$ must have useful error detecting and correcting properties and must protect against the expected types of errors arising during the computation of $\tilde\diamond$. Also, computing $\phi$ and $\tilde\diamond$ and performing the error detection and correction should not require substantially more computation than computing

$\diamond$ alone. Designing practical arithmetic codes is a difficult task. Arithmetic codes can be classified as being either systematic or nonsystematic.

Definition 1. A systematic error-correcting code is defined as one in which a codeword is composed of two distinct parts: the original unencoded operand g, and a parity or checksum symbol t derived from g [14].

We will denote the mapping used to generate parity symbols by $\theta$. A systematic codeword has the form

$\tilde g = \phi(g) = [g, t] = [g, \theta(g)]$ (2.8)

where $[g, t]$ denotes the pair consisting of the elements g and t. A nonsystematic code is defined as one which is not systematic. Systematic codes are desirable since results can be easily extracted from codewords without additional processing. Systematic codes can be further classified as being either separate or nonseparate depending upon how computation is performed on the codewords.

Definition 2. A systematic code is separate if the operation $\tilde\diamond$ applied to codewords corresponds to componentwise operations on the information and parity symbols: $\diamond$ applied to the information symbols, and a parity operation $\check\diamond$ applied to the parity symbols.

Thus in a separate code,

$\tilde g_1 \,\tilde\diamond\, \tilde g_2 = [g_1, t_1] \,\tilde\diamond\, [g_2, t_2] = [g_1 \diamond g_2,\; t_1 \,\check\diamond\, t_2].$ (2.9)

In a nonseparate code, interaction between operands and parity occurs. In summary, there are three types of arithmetic codes: nonsystematic, systematic-separate, and systematic-nonseparate. Figure 2-2 summarizes these coding schemes and shows possible interaction between information and parity symbols. The first generation of arithmetic codes protected integer arithmetic, and they relied on the same concepts of distance as binary error-correcting codes. We will now briefly discuss these codes; our emphasis will be on describing applicable coding schemes $\phi$ and on revealing the form of the redundant information. We will not concern ourselves with details of the design procedures or with the process of choosing a specific code.

Figure 2-2: Diagram showing the interaction between information and parity symbols for the three different classes of arithmetic codes: (a) nonsystematic, (b) systematic-separate, and (c) systematic-nonseparate.

2.3.1 aN Codes

aN codes are nonsystematic arithmetic codes capable of protecting integer addition and subtraction. An operand g is encoded as follows:

$\tilde g = \phi(g) = gN$ (2.10)

where N is an integer. N is referred to as the generator or check base of the code. The set of all possible results consists of the set of integers, while the subset of valid results consists of multiples of N. If the operation we wish to protect, $\diamond$, is integer addition (subtraction), then the corresponding operation on the codewords, $\tilde\diamond$, is also integer addition (subtraction). It is easily verified that the codewords are closed under this operation and that a valid result will be a multiple of N. The performance of aN codes depends on the choice of the multiplier N. In general, additive errors which are a multiple of N cannot be detected. aN codes can also be defined on a subset of the integers, in which case error correction may be possible. General design guidelines are found in [6].
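As a minimal sketch of the scheme (the choice N = 3 and the injected error are illustrative, not from the text), integer addition can be protected as follows:

```python
N = 3  # check base; an illustrative choice

def encode(g):
    return g * N

def is_valid(c):
    """A result is valid only if it is a multiple of N."""
    return c % N == 0

def decode(c):
    return c // N

# Codewords are closed under integer addition.
g1, g2 = 7, 5
c = encode(g1) + encode(g2)           # 21 + 15 = 36
assert is_valid(c) and decode(c) == g1 + g2

# An additive error that is not a multiple of N is detected.
assert not is_valid(c + 4)
```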

2.3.2 (aN)M Codes

(aN)M codes are systematic-nonseparate codes capable of protecting integer addition, subtraction, and multiplication [6]. They are best described by an example. Assume that we wish to protect modulo 8 addition, subtraction, and multiplication of the set of integers {0, 1, ..., 7}. Consider the encoding $\tilde g = \phi(g) = (25g)_{40}$. This is tabulated below, showing both the decimal and binary representations of the numbers.

g (decimal)   g (binary)   $\tilde g$ (decimal)   $\tilde g$ (binary)
0             000          0                      000000
1             001          25                     011001
2             010          10                     001010
3             011          35                     100011
4             100          20                     010100
5             101          5                      000101
6             110          30                     011110
7             111          15                     001111

It can be seen that the three least significant (rightmost) bits of the codewords correspond to the information bits. Thus the code is systematic. The operation $\tilde\diamond$ applied to the codewords is the same as the operation applied to the information symbols, except that it is performed modulo 40. For example, if modulo 8 multiplication is to be protected, modulo 40 multiplication is performed on the codewords. In computing $\tilde\diamond$, information and parity symbols interact, and thus the code is nonseparate. The general form of this code is $\tilde g = (gN)_M$, and it is systematic for only certain choices of N and M.
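A small sketch of the example above (N = 25, M = 40), checking the systematic property and the closure of mod-40 multiplication; the specific operand values are illustrative:

```python
N, M, m = 25, 40, 8   # check base, codeword modulus, operand modulus

def encode(g):
    return (g * N) % M

# Systematic: each codeword carries the operand in its three
# least significant bits (i.e., modulo 8).
for g in range(m):
    assert encode(g) % 8 == g

# Mod-8 multiplication is protected by mod-40 multiplication
# on the codewords.
g1, g2 = 3, 6
c = (encode(g1) * encode(g2)) % M     # 35 * 30 mod 40 = 10
assert c == encode((g1 * g2) % m)     # encodes 18 mod 8 = 2
```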

2.3.3 Integer Residue Codes

Residue codes are systematic-separate arithmetic codes capable of protecting integer addition, subtraction, and multiplication. Parity symbols are encoded as follows:

$t_1 = (g_1)_N$ (2.11)

$t_2 = (g_2)_N$ (2.12)

where N is an integer and $(x)_N$ denotes the remainder when x is divided by N. Let $\diamond$ denote the operation that we wish to protect (integer addition, subtraction, or multiplication). The operation $\check\diamond$ applied to the parity symbols is the same as $\diamond$ except that it is performed modulo N. Thus, if one desires to protect integer addition, residue addition modulo N is used in the parity channel. As with aN codes, performance depends on the choice of the parameter N. Factors which influence the choice of N include the expected types of hardware errors and the base of the

number system used.
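A minimal sketch of a residue code protecting integer multiplication, with an illustrative check modulus N = 7:

```python
N = 7  # check modulus; an illustrative choice

def encode(g):
    return (g, g % N)                 # [information, parity]

def multiply(c1, c2):
    (g1, t1), (g2, t2) = c1, c2
    # Separate channels: ordinary product of the information symbols,
    # mod-N product of the parity symbols.
    return (g1 * g2, (t1 * t2) % N)

def is_valid(c):
    g, t = c
    return g % N == t

r = multiply(encode(12), encode(9))
assert is_valid(r) and r[0] == 108

# An error in the information channel breaks the residue relation.
assert not is_valid((r[0] + 1, r[1]))
```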

2.3.4 Integer Residue Number Systems

The last type of coding scheme that we will consider is called an integer residue number system (RNS). It is a nonsystematic code and is similar in some respects to a residue checksum. Let $Z_M$ denote the set of integers $\{0, 1, \ldots, M-1\}$. Assume that we know a priori that the result of computation lies in $Z_M$. Assume further that M can be written as the product of N co-prime factors, $M = \prod_{j=1}^{N} m_j$. An RNS utilizes the isomorphism between the ring of integers modulo M and the direct sum of the N smaller rings of integers modulo $m_j$. Every element $a \in Z_M$ is uniquely represented by the N-tuple of residues

$\{a_1, a_2, \ldots, a_N\}$ where $a_j = (a)_{m_j}$. (2.13)

We can perform arithmetic on integers by manipulating their residue representations. Arithmetic is similar to that performed in the parity channel of a systematic-separate residue checksum code. For example, to perform the addition $r = a + b$, we independently compute the N residue sums $r_j = (a_j + b_j)_{m_j}$; similarly for subtraction and multiplication. We reconstruct the result from the residues $r_j$ using the Chinese Remainder Theorem (CRT) [15]. Fault-tolerance is added to an RNS by appending C redundant moduli $m_{N+1}, \ldots, m_{N+C}$ and keeping the dynamic range limited to M. These redundant moduli must be co-prime to each other and to the original N moduli. The encoding for a becomes

$\phi(a) = \{a_1, a_2, \ldots, a_{N+C}\}$ where $a_j = (a)_{m_j}$. (2.14)

Assume that the redundant moduli are all larger than the original moduli,

$m_i > m_j$ for $i = N+1, \ldots, N+C$ and $j = 1, \ldots, N$. (2.15)

Then it can be shown that there is sufficient redundancy to detect $\alpha$ and correct $\beta \le \alpha$ errors if and only if $\alpha + \beta \le C$. Many different methods for error detection and correction

have been proposed [16, 17, 18, 19, 20]. Most rely either on multiple CRT or mixed radix conversions. Error detection and correction can also be protected by a fault-tolerant mixed radix conversion algorithm [21].
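A minimal sketch of a redundant RNS, using the illustrative moduli 3 and 5 (so M = 15) plus one redundant modulus 7; consistent with (2.15), the redundant modulus is larger than the others, and a single corrupted residue drives the CRT reconstruction outside the legal range [0, M):

```python
from math import prod

moduli = [3, 5, 7]    # two information moduli plus one redundant modulus
M = 3 * 5             # legal dynamic range is [0, M)

def encode(a):
    return [a % m for m in moduli]

def crt(residues):
    """Chinese Remainder Theorem reconstruction over all moduli."""
    Mt = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        q = Mt // m
        x += r * q * pow(q, -1, m)    # pow(q, -1, m): inverse of q mod m
    return x % Mt

def is_valid(residues):
    """Consistent codewords reconstruct inside the legal range."""
    return crt(residues) < M

# Residue-wise addition of 7 + 4.
r = [(x + y) % m for x, y, m in zip(encode(7), encode(4), moduli)]
assert is_valid(r) and crt(r) == 11

# Corrupting a single residue pushes the reconstruction out of range.
r[0] = (r[0] + 1) % moduli[0]
assert not is_valid(r)
```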

2.3.5 Comparison of Arithmetic Codes

We now make some general statements concerning the three types of arithmetic codes discussed. Each code has several advantages and disadvantages which govern its use. The statements we make hold most of the time, but one notable exception is integer RNS. We discuss this special case last. The main advantage of systematic codes is that the desired result r is readily available and extra decoding is unnecessary. This may be critical in high-speed systems where the extra delay associated with decoding is unacceptable. Separate codes are also more desirable than nonseparate ones because they do not require an increase in the system dynamic range. Consider again the (aN)M code described in Section 2.3.2. The original system had a dynamic range of 8, while the redundant computation required a dynamic range of 40. The longer wordlength arithmetic needed to compute $\tilde\diamond$ decreases the speed of the fault-tolerant system. The most desirable coding schemes are usually systematic-separate because they do not have any of the drawbacks mentioned above. Also, the original system computing $\diamond$ is incorporated unaltered, and only the parity channel and error detection and correction systems must be designed. Integer RNS is an exception to the above rules because its parallel structure results in efficient implementations. Although it is by definition a nonsystematic code, it actually subdivides arithmetic into several smaller operations, each with less arithmetic complexity than the original operation. Integer RNS also has superior fault coverage since the errors which it protects against closely match those arising in actual systems.

2.4 Algorithm-Based Fault-Tolerance

In this section we discuss a class of high-level arithmetic codes referred to as Algorithm-Based Fault-Tolerance (ABFT). This technique differs from the arithmetic codes presented in the last section in two important respects. Firstly, the data encoded can be real or complex numbers. The coding schemes which we have discussed thus far have all dealt with either binary or integer data, which are exact number systems. Error-correcting codes, however, may be defined over any field, and in ABFT they are extended to include real or complex data. Secondly, entire sequences or arrays of data are encoded instead of just individual operands. Just as error-correcting codes with low overhead encode sequences of information symbols, low-overhead fault-tolerant computation may be achieved by encoding sequences of operands. The arithmetic codes which we have studied up until now have only encoded individual operands. An important motivation for the development of high-level arithmetic codes is the use of parallel, multiprocessor architectures for solving large computational tasks. These systems are amenable to protection by ABFT for the following three reasons:

(i) In these systems, a single operation is usually applied to large amounts of data in a regular fashion. This regularity allows the development of high-level arithmetic codes whose distance structure is preserved during computation.

(ii) The resulting codes are well-suited to the types of errors that occur in multiprocessor systems, such as complete failure of a single or multiple processors.

(iii) In these architectures, there is some flexibility in the partitioning and scheduling of the algorithm among the various processors. This flexibility allows the algorithm to be modified in order to accommodate the encoded data.

2.4.1 Codes for the Transmission of Real Numbers

The first codes for protecting sequences of real or complex data were developed by Marshall [22]. While only concerned with protecting data transmission, he prepared the way for higher-level arithmetic codes. His main contribution was to show that error-correcting codes are not restricted to finite fields, but can be defined over any field. He demonstrated

26 that many codes, such as BCH codes, have direct counterparts defined using real or complex data.

2.4.2 Codes for Matrix Operations

Several arithmetic codes exist for protecting matrix operations. We describe these in the order in which they were developed.

Matrix Checksum Codes

The first arithmetic codes for protecting computation with real or complex data were developed by Huang [23, 24]. He introduced the term Algorithm-Based Fault-Tolerance because he realized that adding redundancy to data requires a corresponding change in the algorithm to operate on the encoded data. He applied a simple arithmetic checksum to protect certain matrix operations. Let A be an M x N matrix of real or complex data. A is encoded in any of three different ways depending upon which operation is to be protected. The encoding schemes are as follows:

1) column checksum

$\phi_1(A) = \begin{bmatrix} A \\ e_M^T A \end{bmatrix}$

2) row checksum

$\phi_2(A) = \begin{bmatrix} A & A e_N \end{bmatrix}$

3) full checksum

$\phi_3(A) = \begin{bmatrix} A & A e_N \\ e_M^T A & e_M^T A e_N \end{bmatrix}$

where $e_M^T$ is a $1 \times M$ row vector of ones, $e_M^T = [1\; 1\; \cdots\; 1]$, and $e_N^T$ is a $1 \times N$ row vector of ones, $e_N^T = [1\; 1\; \cdots\; 1]$. These codes are systematic since the original matrix A is a submatrix of its encoded forms. Codes $\phi_1$ and $\phi_2$ have a minimum distance of 2, while code $\phi_3$ has a minimum distance of 4. The distance structure of these codes is preserved for various arithmetic operations.

Assume that B is another matrix of appropriate dimension. Then:

1) Matrix addition:

$\phi_1(A) \pm \phi_1(B) = \phi_1(A \pm B)$
$\phi_2(A) \pm \phi_2(B) = \phi_2(A \pm B)$
$\phi_3(A) \pm \phi_3(B) = \phi_3(A \pm B)$

2) Scalar multiplication:

$\alpha \phi_1(A) = \phi_1(\alpha A)$
$\alpha \phi_2(A) = \phi_2(\alpha A)$
$\alpha \phi_3(A) = \phi_3(\alpha A)$

where $\alpha$ is a real or complex number.

3) Matrix multiplication:

$\phi_1(A)\, \phi_2(B) = \phi_3(AB)$

4) Matrix transposition:

$\phi_1(A)^T = \phi_2(A^T)$
$\phi_2(A)^T = \phi_1(A^T)$
$\phi_3(A)^T = \phi_3(A^T)$

5) LU decomposition: If A is LU-decomposable, A = LU, where L is lower triangular and U is upper triangular, then

$\phi_3(A) = \phi_1(L)\, \phi_2(U)$

where $\phi_1(L)$ is a lower triangular column checksum matrix and $\phi_2(U)$ is an upper triangular row checksum matrix.

Since these codes have a small minimum distance, Huang applied them to fully parallel architectures in which each processor calculates only a single element of the result matrix. Hence with codes $\phi_1$ and $\phi_2$, any single processor failure can be detected, while with code $\phi_3$, any single processor failure can be detected and corrected.

Under the operations of matrix addition, multiplication by a scalar, and transposition, the codes are separate since no interaction between information and parity occurs. Under matrix multiplication or LU decomposition, the codes are nonseparate. To understand why, multiply a column checksum matrix and a row checksum matrix to obtain a full checksum matrix

$\phi_1(A)\,\phi_2(B) = \begin{bmatrix} A \\ e^T A \end{bmatrix} \begin{bmatrix} B & Be \end{bmatrix} = \begin{bmatrix} AB & ABe \\ e^T AB & e^T ABe \end{bmatrix} = \phi_3(AB).$ (2.16)

The matrices in the upper right, $ABe$, and lower left, $e^T AB$, of the result are products of information and parity, and thus this is a nonseparate code.
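A minimal sketch of the full checksum code under matrix multiplication, with single-error location and correction; the matrices and the injected error are illustrative:

```python
import numpy as np

def col_checksum(A):
    """phi_1: append the row of column sums, e^T A."""
    return np.vstack([A, A.sum(axis=0, keepdims=True)])

def row_checksum(A):
    """phi_2: append the column of row sums, A e."""
    return np.hstack([A, A.sum(axis=1, keepdims=True)])

A = np.arange(9.0).reshape(3, 3)
B = np.arange(9.0, 18.0).reshape(3, 3)

# phi_1(A) phi_2(B) = phi_3(A B), eq. (2.16).
C = col_checksum(A) @ row_checksum(B)

# Inject a single error; the failing row and column checksums
# locate it, and either checksum corrects it.
C[1, 2] += 5.0
row_syn = C[:-1, :-1].sum(axis=1) - C[:-1, -1]
col_syn = C[:-1, :-1].sum(axis=0) - C[-1, :-1]
i = int(np.argmax(np.abs(row_syn)))
j = int(np.argmax(np.abs(col_syn)))
C[i, j] -= row_syn[i]
assert np.allclose(C, col_checksum(A) @ row_checksum(B))
```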

Weighted Matrix Checksum Codes

The codes developed by Huang are simplistic and have a small minimum distance. Jou extended Huang's work by adding multiple weighted checksums to rows and columns, resulting in codes with a larger minimum distance [25, 26, 27]. The M x N matrix A is encoded as follows:

1) weighted column checksum

$\phi_1^w(A) = \begin{bmatrix} A \\ E_c^T A \end{bmatrix}$

2) weighted row checksum

$\phi_2^w(A) = \begin{bmatrix} A & A E_r \end{bmatrix}$

3) weighted full checksum

$\phi_3^w(A) = \begin{bmatrix} A & A E_r \\ E_c^T A & E_c^T A E_r \end{bmatrix}$

where $E_c^T$ is a $C \times M$ matrix and $E_r$ is an $N \times D$ matrix. $E_c^T$ and $E_r$ serve the same purpose as the encoding matrix of a linear error-correcting code. If $E_c^T$ and $E_r$ are properly chosen, $\phi_1^w$ and $\phi_2^w$ will have minimum distances of C + 1 and D + 1, respectively. Similarly, $\phi_3^w$ can have a minimum distance as large as (C + 1)(D + 1). Weighted checksum codes can be applied to protect the same operations as matrix checksum codes. The large minimum distance permits the detection and correction of multiple errors.

One important issue that was never properly treated by Huang and Jou is that of computational noise. Error-correcting codes require exact arithmetic, and the computational noise inherent to floating point systems corrupts results slightly and appears as small errors in the output. Huang and Jou glossed over this issue by ignoring all small errors. Later, Nair developed optimal weighted checksum codes which minimized the effects of quantization noise [28, 29, 30]. The design procedure is a constrained optimization, and a tradeoff is made between the ability to detect small errors and quantization noise minimization. Nair also gave a procedure for summing large vectors which minimizes quantization noise. Implementing this algorithm efficiently, however, requires the use of a nonstandard ALU. Also, although he minimized quantization noise, Nair did not deal with its effects in a proper manner.
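A minimal sketch of the weighted checksum idea for a vector of n results, using the illustrative weight vectors (1, ..., 1) and (1, 2, ..., n); with exact arithmetic, the two syndromes give the value and location of a single error (in floating point, the small-error threshold issue discussed above arises):

```python
import numpy as np

n = 8
w1 = np.ones(n)                   # unweighted checksum weights
w2 = np.arange(1.0, n + 1)        # weights 1, 2, ..., n

y = np.arange(10.0, 10.0 + n)     # fault-free results
t1, t2 = w1 @ y, w2 @ y           # parity values from a separate channel

y_err = y.copy()
y_err[5] += 3.0                   # single additive error e at position 5

s1 = w1 @ y_err - t1              # syndrome 1: equals e
s2 = w2 @ y_err - t2              # syndrome 2: equals (j + 1) * e
if abs(s1) > 1e-9:
    j = int(round(s2 / s1)) - 1   # locate the error ...
    y_err[j] -= s1                # ... and correct it
assert np.allclose(y_err, y)
```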

Other Developments in ABFT for Matrix Operations

Several other papers have been written in the area of ABFT for matrix operations, and all rely upon the coding schemes mentioned above. We briefly describe these papers. Anfinson related Jou's work directly to linear error-correcting codes and showed that the coding schemes involved are essentially equivalent [31]. However, one important limitation arises in real number systems: lookup tables cannot be used to correct errors because the set of possible errors is infinite. Also, multiple error correction requires solving an overdetermined nonlinear system. Anfinson also gave a specific code with minimum distance 4, based on a Vandermonde matrix, that permits simplified error detection and correction. Banerjee experimentally evaluated the fault coverage and computational overhead of Jou's weighted checksum method [32]. Megson implemented Huang's checksum scheme on a triangular processor array [33]. Anfinson applied Jou's weighted checksum scheme to protect processor arrays performing QR decomposition [34]. Luk applied weighted checksums to systems performing LU decomposition, Gaussian elimination with pairwise pivoting, and QR decomposition [35]. He also analyzed the effects of quantization noise in floating point systems. Bliss computed the probability of system failure as well as the probability of undetected errors for a variety of ABFT coding schemes and architectures [36, 37]. A good summary of papers dealing with ABFT for matrix operations is found in [1].

2.4.3 Codes for Linear Transformations

Musicus and Song proposed a weighted checksum code for protecting linear transformations [38, 39, 40]. Assume that the linear transformation F must be computed N times using different sets of data $x_i$, yielding results $y_i$,

$y_i = F x_i$ for $i = 1, \ldots, N$. (2.17)

This can be written as a matrix-matrix multiply,

$Y = FX$ (2.18)

where Y consists of the column vectors $y_i$, and X consists of the column vectors $x_i$,

$Y = [y_1 \; y_2 \; \cdots \; y_N]$ and $X = [x_1 \; x_2 \; \cdots \; x_N].$ (2.19)

To protect (2.18), Musicus and Song use a systematic-separate error-correcting code as shown in Figure 2-3. They compute a parity matrix $X_p$ from X by post-multiplying by an encoding matrix $\Theta$,

$X_p = \phi(X) = X\Theta.$ (2.20)

$\Theta$ is determined in a manner similar to choosing the encoding matrix for a systematic linear error-correcting code. Then the main computation $Y = FX$ along with the parity

computation $Y_p = F X_p$ are performed. To detect errors, they compute the syndrome matrix

$S = Y_p - Y\Theta.$ (2.21)

Substituting in for Y and Yp, we find that

$S = F(X\Theta) - (FX)\Theta.$ (2.22)

Figure 2-3: Underlying structure of the systematic-separate coding scheme proposed by Musicus and Song to protect linear transformations.

Since matrix multiplication is associative, this equals zero if no errors have occurred. If the syndrome is nonzero, then its value can be used to correct the error. Note that the placement of parentheses in (2.22) is critical since it reduces the amount of computation in the parity channel significantly. (It is assumed that $\Theta$ is chosen such that the parity $X_p = X\Theta$ has far fewer columns than X.) In addition to demonstrating this coding scheme, Musicus and Song made several other important contributions. Firstly, they developed encoding matrices $\Theta$ containing only small integers that have a good minimum distance. At the time of their paper, this was important since it reduced the computation needed to calculate the parity data. Now, however, arithmetic units and signal processors are optimized for floating point arithmetic, and an integer operation is computationally as expensive as a floating point operation, so this contribution is no longer significant. Secondly, and most importantly, they developed a stochastic model of the computation and fault processes. Quantization noise was modeled as additive white noise, and a generalized likelihood ratio test was used to optimally detect and correct errors. Their method is able to reduce overall quantization noise and yields more accurate solutions in an error-free system. Thirdly, they provided a detailed error analysis and performed thorough simulations of their system operating under various conditions.
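A minimal sketch of the syndrome test in (2.20)-(2.22); the transformation, data, and single-column encoding matrix $\Theta$ (a column of ones) are illustrative choices, not those of Musicus and Song:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4))       # linear transformation
X = rng.standard_normal((4, 6))       # six data vectors as columns
Theta = np.ones((6, 1))               # encoding matrix: one parity column

Xp = X @ Theta                        # parity data, eq. (2.20)
Y, Yp = F @ X, F @ Xp                 # main and parity computations

S = Yp - Y @ Theta                    # syndrome, eq. (2.21)
assert np.allclose(S, 0.0)            # F(X Theta) = (F X) Theta

Y[2, 3] += 1.0                        # corrupt one element of the result
S = Yp - Y @ Theta
assert not np.allclose(S, 0.0)        # the error is detected
```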

2.4.4 Codes for FFTs

Two authors independently developed fault-tolerance schemes to protect FFTs. Both require fully parallel implementations on butterfly architectures. We mention both here, but only consider one to be a true ABFT implementation.

Choi and Malek proposed a fault-tolerant FFT algorithm that detects errors by recomputing the result using an alternate path through the butterflies [41]. Once an error is detected, the faulty butterfly can be located in a few steps. Although offering algorithm-level fault-tolerance, we do not consider this work to be an example of ABFT since the data is not encoded. Rather, we feel that their work more properly fits in the area of modular redundancy since recomputation is needed to detect errors. Jou proposed a systematic-separate code to protect FFTs [25, 42]. He considers an N-point FFT with input x[n] and output X[f]. He utilizes the following relationship between x[n] and X[f] to detect errors:

$N x[0] = \sum_{f=0}^{N-1} X[f].$ (2.23)

This is essentially equivalent to Huang's matrix checksum code and has a minimum distance of 2.
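A minimal sketch of the check in (2.23), using numpy's FFT convention $X[f] = \sum_n x[n]\, e^{-j 2\pi f n / N}$, under which the sum of the outputs equals $N x[0]$; the signal and injected error are illustrative:

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(16)
X = np.fft.fft(x)

# Error-free outputs satisfy N x[0] = sum_f X[f], eq. (2.23).
assert np.isclose(X.sum(), len(x) * x[0])

X[3] += 2.0                           # one erroneous output sample
assert not np.isclose(X.sum(), len(x) * x[0])   # detected
```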

2.4.5 Codes for Convolution

Redinbo developed a fault-tolerant convolution algorithm that is based on linear cyclic error-correcting codes [43, 44]. Let a[n] and b[n] be P-point sequences which are nonzero for $0 \le n \le P-1$. Their linear convolution, denoted by $c[n] = a[n] * b[n]$, is the $Q = 2P - 1$ point sequence $c[n]$ given by

$c[n] = \sum_{i=0}^{P-1} a[i]\, b[n-i]$ for $n = 0, 1, \ldots, Q-1$. (2.24)

Redinbo represents a[n] by the polynomial a(x) as follows:

$a(x) = \sum_{i=0}^{P-1} a[i]\, x^i$ (2.25)

and in a similar manner represents b[n] and c[n] by b(x) and c(x). The convolution $c[n] = a[n] * b[n]$ corresponds to the polynomial multiplication $c(x) = a(x)\,b(x)$. To add fault-tolerance, Redinbo computes parity symbols from the operands as follows:

$t_a(x) = \left(x^C a(x)\right)_{g(x)}$ (2.26)

$t_b(x) = \left(x^C b(x)\right)_{g(x)}$ (2.27)

where g(x) is called the generator polynomial and $C = \deg g(x)$. Then he computes the desired convolution $c(x) = a(x)b(x)$ and then a product involving the parity symbols,

$t_c(x) = \left(f(x)\, t_a(x)\, t_b(x)\right)_{g(x)}$ (2.28)

where $f(x) = \left(x^Q\right)_{g(x)}$. This code is equivalent to a BCH code defined over the field of real or complex numbers, and its error detecting and correcting properties depend upon the generator g(x). In general, the code is capable of detecting up to $\deg g(x)$ erroneous output samples in the result. Existing techniques may be used to choose the generator polynomial g(x), and fast algorithms for detecting and correcting errors exist [14].
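A minimal sketch of the residue idea behind (2.26)-(2.28) in the simplest case $g(x) = x - z_0$, where a residue modulo g(x) is just evaluation at $z_0$ (dropping the $x^C$ pre-multiplication and f(x) for clarity, the parity relation reduces to $c(z_0) = a(z_0)\,b(z_0)$); the sequences and $z_0$ are illustrative:

```python
import numpy as np

P = 5
rng = np.random.default_rng(2)
a = rng.standard_normal(P)
b = rng.standard_normal(P)
c = np.convolve(a, b)                 # linear convolution, Q = 2P - 1 points

# A residue modulo g(x) = x - z0 is the polynomial value at z0,
# and residues multiply: c(z0) = a(z0) b(z0).
z0 = 1.5

def eval_poly(p):
    return np.polynomial.polynomial.polyval(z0, p)

assert np.isclose(eval_poly(c), eval_poly(a) * eval_poly(b))

c[4] += 0.1                           # corrupt one output sample
assert not np.isclose(eval_poly(c), eval_poly(a) * eval_poly(b))
```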

2.5 Summary

This chapter discussed existing approaches to protecting computation based on modular redundancy and arithmetic codes. The main advantage of modular redundancy is its ease of use and universal applicability. However, it is much less efficient than techniques based on arithmetic codes. Arithmetic codes add redundancy in a manner similar to error-correcting codes for protecting communication channels, and offer advantages in terms of lower redundancy and better fault coverage. The main drawback to applying arithmetic codes is that the code must be tailored to the specific operation being performed, and codes exist for only a limited number of operations. In the following chapters, we will study in detail the problem of designing arithmetic codes. Our goal is to develop a general design procedure for generating useful arithmetic codes for a large class of operations.

Chapter 3

Set-Theoretic Framework

In this chapter we develop a general framework that is able to characterize a wide variety of fault-tolerant systems. The framework is based on set theory, and its purpose is to extract common features shared by all fault-tolerant systems. Although not a constructive procedure, the framework is useful for determining whether sufficient redundancy exists in order to perform error detection and correction. We first make some basic definitions using set theory, and model computation as a mapping from a set of operands to a set of results. The effects of errors are then included, and a decomposition of a robust system into a cascade of three subsystems is presented. We derive requirements on redundancy, assuming a general error mechanism, such that errors may be detected and corrected. We show that for a certain class of errors, the redundancy in a system is completely characterized by a single integer, analogous to the minimum distance of an error-correcting code. This leads to a simplified analysis of the inherent redundancy of a system. Lastly, we demonstrate how this framework may be applied by studying a system protected by modular redundancy.

3.1 Set-Theoretic Model of Computation

A set 5 is an arbitrary collection of elements. For example, G could contain integers, real numbers, vectors, or any other type of element. The order of set G, denoted by X (5), equals the number of elements in G and may be finite or infinite. The set without any symbols is

35 called the empty-set and is denoted by 0. Let g and 1' be two arbitrary, nonempty sets. The Cartesian product of 5 and -t, denoted by G x -(, is the set of pairs

x 7- = [g, h g E , h E t}. (3.1)

The Cartesian product can be defined for any finite number of sets in a similar manner.

Let 1, 52,.. , QN be N arbitrary nonempty sets. The Cartesian product of these N sets is defined as

x 2 - XN = {[91,2,...,9N] I g E 1,92 2,.--,9N E N} (3.2)

Any deterministic operation can be modeled as a mapping 7 from a set of inputs 5 to a set of outputs R. The mapping r associates with each input g E g a unique output r e R. We denote the mapping of g to r by r = r(g). This model also encompasses multiple inputs and multiple outputs in a natural manner. If we have M inputs gl E 1,...,g E M and N outputs r E 'R71...,rN E RN, then we would let = 1 x --- x gM and 7R = R1 x- x 7N. This model assumes that any combination of inputs and outputs can arise during operation. If only certain combinations of inputs are possible, then we would choose g to be the subset of Gi x... x gM corresponding to the domain of . Similarly, we would choose R to be the subset of R1 x ... x N corresponding to the range of . Systems with internal state can also be modeled in this manner as well. We simply treat the state as another set of inputs and the updated state as another set outputs. No other changes are necessary. Now assume that a fault occurs in the system computing r. A fault is a permanent or intermittent hardware failure that affects the result of computation. We call the net effect of the fault the error, and denote it by e. Let , denote the set of all possible errors that are caused by a single fault arising during computation of r. We assume that £ can be deduced a priori by a careful study of the system used to compute . The specific error caused by a fault is unknown, but it will be some element of 4. is a function of the

36 hardware architecture and specific computational steps used to compute r. £ contains a special element, denoted by 0, which represents the case of no error. We also define * = 4 - {0}. We include the effects of errors in our model by letting be a mapping from 5 x £ to 2. r(g, e) denotes the result of the system given input g and assuming that error e occurred. Note that a fault need not corrupt the result for all inputs. That is, it is possible that IL(g, e) = #(g, 0) for some g E g and error e ~ 0 E 4,. We can further generalize this model and include the effects of multiple errors. If Aerrors occur during the computation of r, then we will denote the output by r(g, el,..., e,). We can also describe this in terms of sets as follows. Let £E() = 4 x ... x 4 where 4 appears A times in the Cartesian product. Then r can be described as a mapping from 9 x £() to R. Defining r in this manner requires a detailed knowledge of the set of errors 4 and how they affect the result. Also, in choosing G x £(A) as the domain of r assumes that any error could potentially arise for any operand. If the hardware architecture dictates that only certain errors can occur for certain combinations of operands, then we would choose the domain of r to be the subset of g x £$) containing possible operand-error combinations.

Definition 3. We say that the system r is robust for C simultaneous errors if

7(g,el,...,ec) = r(g,0) for all g E ,ej E . (3.3)

Intuitively, a system is robust if it is unaffected by up to C simultaneous errors. Most systems are not robust, and additional steps are required in order to protect them.

3.2 Redundant Computation

We study robust systems by examining how they exploit redundancy. By redundancy we mean additional states that arise during faults and that are used to detect and correct errors. We model a robust system r as a cascade of three subsystems: a redundant computation unit /A, an error corrector a, and a result decoder ar. This is illustrated in Figure 3-1. z computes the desired operation, , but utilizes a redundant representation of the output. We denote the possibly faulty output of p by h' and refer to it as the encoded result.

37 Operands --- Results

Error e

Figure 3-1: Decomposition of a robust system into a cascade of three subsystems: a redun- dant computation scheme p, an error corrector a, and a decoder a.

The redundancy present in h' is used to detect errors, and when possible, to correct them. Let 'i denote the set of all possible outputs of pI; those arising from error-free computation as well as those arising during errors. Let X(v be the subset of XH corresponding to error-free (valid) results, v = {(g, 0) Ig9 E G}. (3.4)

We assume that a one-to-one mapping of valid encoded results in )-v to desired results in

7R exists. We denote this mapping by a and refer to it as the decoder. Located between pj

and a is the error corrector which attempts to correct errors that occurred in p using only the output h'. a is a mapping from ?i onto ?iv whose operation is most easily described as error detection followed by error correction. Error detection utilizes the following test:

If h' E Hv we declare that h' is valid. (3.5) Otherwise, if h' O Jv we declare that h' is invalid.

If no errors are detected then a(h') = h'; no correction is required. Otherwise, if an error is detected, a attempts to correct it. If the error is correctable, then a returns the error-free result h. Otherwise, if it is deemed that the error is uncorrectable, then a(h') = * where * signals a detectable, but uncorrectable error. Uncorrectable errors must be handled separately. The decomposition shown in Figure 3-1 is not always precisely defined, but the rationale behind it is as follows. When an error occurs in a robust system altering its internal state, the effect of the error must be canceled before it reaches the output. That is, the system

38 must contain redundancy at the point of the error, and the system ahead of the error location must perform error correction. In practice, the error corrector a and decoder a may not be clearly delineated. Often, they are combined into a single mapping which simultaneously corrects errors and decodes the result. In order for the overall system to be robust, we must make some assumption about the inherent reliability of a and a. Otherwise an error in either of these systems would be able to corrupt the result. We assume that errors are constrained to pitand that a and a are inherently robust. In practice, the bulk of computation is performed by it and the probability of failure of far outweighs the probability of either a or a failing. If necessary, these functions may be protected by modular redundancy. We begin our examination of robust systems by studying what form of redundancy must exist at the output of the redundant computation i. For all h E 1 v, define

h ={g E I (g, )=h}. (3.6)

Gh is the inverse image in of the valid result h; all elements in h map to h during error-free computation. Now consider the set

7h, = {(g, el,...,e,) g E h, ej E £},-.(3-7)

This set contains all elements in 7- that should have been mapped to h, but that were corrupted by up to A errors. Note that by definition, 1t'h,o = {h} and h e 14h,, for all A. Also, since 0 E £4, hA Ch ,,+l- Assume that we are interested in detecting up to D errors. Then in order to detect all errors, we require that

/(g, el,..., eD) 0 7gv for all el E £0,, e2 E £,,...,eD E £ (3.8)

where by choosing e E , we have assumed that at least one error occurs during the computation of . This is an extremely difficult requirement to meet since, in practice, errors do not always corrupt the result. We will concentrate on detecting errors that corrupt

39 the result and will thus require that either

P(9, el,.. , eD) = 0(#,) (3.9) or ,(g, el,..., eD) 7v (3.10) for all g E and el E ,,...,eD E . Let h = p(g,0) be the error-free result and let h' = ,(g,e,...,eD) denote the possibly faulty result. (3.9) implies that h E Rh,D and (3.10) implies that 1'h,D n 7-v = 0. Combining these yields

lHh,D n -v = {h} for all h E '-v (3.11) or equivalently,

7h,,D nf{h2} =0 for all hi # h2 E liv, (3.12) as a necessary and sufficient condition to detect D errors. This requirement is illustrated in Figure 3-2a which shows the output set Jt and the error sets 7'th,D for four valid results h = a, b, c, and d. In order to be detectable, each subset l'h,D can contain only a single element of ?/v, namely h. Errors that do not corrupt the result are referred to as latent errors. They are unde- tectable solely by examination of the output and require fundamentally different strategies to detect than those we have been discussing. Latent errors within p may not appear to be a serious problem since they do not corrupt the result. Still, they should be guarded against because, in combination with other errors, they may lead to system failure. Latent errors generally occur in idle parts of the system that are infrequently used. They are especially a problem in systems protected by modular redundancy since a standby system is often switched in to take the place of a faulty system. Errors in a standby system are latent until it is activated. We now examine requirements on redundancy such that C simultaneous errors may be corrected. We are able to correct errors if it is possible to determine the error-free result h given only h'. The set 'h,C contains all elements that should have been mapped to h

40 H . .------. Hh,D

(a) Detection of D errors.

H HhC

(b) Detection and correction of C errors.

H h,c -. -- -Hh.D

(c) Detection of D and correction of C errors.

Figure 3-2: Diagrams showing conceptually the requirements on redundancy for various levels of fault tolerance. The elements h = a, b, c, and d denote 4 possible error-free results.

41 but that were corrupted by up to C errors. The error corrector a must map all elements in lih,C to h. In order for ae to be well-defined, each element in the output set must map to a unique element in -v. This implies that the sets h,,c and 7/h 2,c must be disjoint if h, 5 h2. That is,

1 th,,c n Hh2 ,C = 0 for all h, 0 h2 E lv. (3.13)

This requirement is shown in Figure 3-2b which illustrates the disjoint sets l-h,c for the four values h = a, b, c, and d. The requirements for error correction ae more stringent than those necessary to detect the same number of errors. Assume that we can correct C errors such that (3.13) is true.

Then since h2 E th2,c, this implies that h,c n {h 2} = 0 for all h, $ h2 e -v.Hence at least C errors can be detected as well. In the most general case, it is desirable to be able to tradeoff between the number of errors that we want to detect and the number that we want to correct. Assume that we want to be able to detect up to D errors and if C or fewer errors occurred, to correct the result. Otherwise, if between C + 1 and D errors occurred, the system should recognize this and signal an uncorrectable error *. (Note: C must always be less than or equal to D since being able to correct C errors implies being able to detect at least C.) What form of redundancy is required to perform this tradeoff? First of all, to detect D errors, we require that 17h,D n {h 2} = 0 for all h, 5 h2 e Xv. Second of all, to correct C errors, we require that 'Hh,,c n lih 2 , c = 0 for all h # h2 E -v. Finally, to distinguish correctable errors (those yielding results in 1 h,,c for some h E X/v) and uncorrectable errors (those yielding results in KHh 2,D - (h2 ,C for some h2 E Xv), we need

1 {U (H-h2,D - I-h 2,C)} f h,C = 0 for all h tv. (3.14) h2 E/v

Simplifying, this becomes

(i-h 2,D -H-h 2 ,C) n 'h,C = 0 for all hi, h2 E 'tv (3.15)

42 or 7 (lih 2,D - 'Hh2,C) n Hhl,C = 0 for all hi h2 E 7/v (3.16) since (h 2,D - 'Hh2,c) n l'h 2,C = 0. Expanding (3.16) leads to

1 (lth2,D n th~h,c) - (h 2 ,c n h4,c) = 0 for all h1 $ h2 E X*v. (3.17)

Since we assumed that we are able to correct C errors, the second term in the above expression is empty. We find that

0 1'h2 ,D n h,,C = for all hi h2 E v (3.18) is a sufficient test in order to simultaneously detect D and correct C errors. This redundancy condition is illustrated in Figure 3-2c. The diagram shows the relationship between the sets ?'h,C and ?/h,D for h = a, b, c, and d. The tradeoff between error detectability and correctability regulated by (3.18) is the most general condition on redundancy which we will study. The former conditions for detectability only, (3.12), and detectability and correctability of the same number of errors, (3.13), are both special cases of (3.18). Setting C = 0 in (3.18) yields (3.12), and setting D = C in (3.18) yields (3.13). The conditions on redundancy for error detection and correction lead to the following error correcting mapping a in a straightforward manner:

a(h') = h if h' e h,C (3.19) · else.

3.3 Symmetric Errors

We have thus far assumed a very general error model in our study of redundancy. We now consider a class of errors that leads to a simplified analysis of the error detecting and correcting potential of a system.

43 Definition 4. The system u has a symmetric errormodel if whenever

p(#,el, ... ,ec) = ,(g', e...,e e(3.20))

for some g,g' E Q, ej, e E ,, and integers a and 3, then there exists some e+ 1 E C, such that p(g, el,..., e,- = (g', e,.-*, e+,). (3.21)

Symmetric errors arise in a wide variety of systems and the redundancy in these systems is completely characterized by a single integer called the minimum distance. We define this as follows.

Definition 5. Let p be a system with a symmetric error model. Then the minimum

distance of /u is the smallest integer t such that

7h,t nl v # {h} for some h E 7 v. (3.22)

Conceptually, the minimum distance is a measure of the separation between elements of Wv. It is analogous to the minimum distance of an error-correcting code and can be treated in a similar manner.

Theorem 1. Let A be a redundant computation that has a symmetric error model and denote the minimum distance of by t. Then we can simultaneously detect D and correct C errors if and only if

t > D + C + 1. (3.23)

Proof. We prove the sufficiency of (3.23) by contradiction. Assume that we can't simultaneously detect D and correct C errors. Then according to (3.18), the intersec-

tion l'h 2 ,D n-h,,c is not empty for some h2 hi E v. Let h be some element in

"h 2 ,D nfIhl,C. Then h = p(g,el,...,eD) = (g',e,...,e&)for some g,g' E 5 where

44 p(g, 0) = h2 and p(g', 0) = hi and for some ej, e E £. Since p has a symmetric error

model, we know that there exists some element e +1 E , such that

L(9, el, ... , eDl) = l(g', e'l,..., e&+i). (3.24)

Applying the symmetric error condition repeatedly, we find that

A(g, el,.. .,eD-2) = A(g', e,..., eC+2) (3.25)

(3.26)

y(g, ) = (9g',e, * *.*,e+D)

for some eC+ 2 , ... , eC+D E ,. The left hand side of (3.27) equals the error-free result

h2 while the right hand side is some element of ihh,C+D. Thus,

lv nlh,,C+D # {hi } (3.27)

which implies that t < D + C + 1.

To prove necessity assume that t < D + C + 1. Thus there exist h2 0 h E -v such

that h2 E ,h,C+D.This implies that

y(g, 0)= (g' e, . e+D) (3.28)

for some g,g' E and e,...,eC+D E £ where (g,0) = h2 and p(g',0) = h. Applying the symmetric error condition D times, we find that

/.Z(g, 161i - - -, 16D) -- A(9', e'i - - -, e'C) (3.29)

where ju(g, el, . . ., eD) E ?(h2,D and ,(g', el . . ., e ) E 7th,,c. Therefore hh 2,D n l/a 1,c is not empty and we are thus unable to detect D and correct C errors.

A system with a symmetric error model can be more easily analyzed because the redun-

45 .------I ------I ...... C r

g

r3

Figure 3-3: Figure illustrating how a system r protected by triple modular redundancy can be described using the set-theoretic framework. dancy condition (3.18) does not have to be repeatedly checked for various values of C and D. Instead, once the minimum distance is found, we can use (3.23) to trade-off between error detectability and correctability.

3.4 Example- Triple Modular Redundancy

Triple modular redundancy (TMR) is a good example illustrating how a fault-tolerant system can be analyzed using our framework. A system r protected by TMR is illustrated in Figure 3-3, and the system has been partitioned according to the standard decomposition of a robust system. Assume that r has input g E and output r E R. The redundant computation unit is composed of three identical copies of r, labeled r1 , r2, 73, each receiving the same input. The error corrector a detects and corrects errors using a majority voter. Finally, the result is decoded by a which arbitrarily chooses one of the outputs of a. In practice, a and a are always combined. We assume that errors are restricted to A, and that a single fault in rk, corrupts only the output rk. We analyze this system assuming two different computation models:

1. A fault in k leads to a random output rk E R.

2. A fault in rk forces the output to the fixed value rk = A E R.

The first model is more general, and encompasses errors of the second type. Thus, we

46 expect errors obeying the first model to be more difficult to protect against. We will see that this is indeed true. Assume that the first computation model holds and that faults lead to random outputs. Then the set of all possible outputs equals

= { [ri, r2 , r 3] I r E , r2 E , r3 E (3.30) and the subset of error-free outputs is given by

7iv ={[r,r, r] I reZ}. (3.31)

With this error model, it is possible to transform any valid result [a, a, a] into any other valid result [b, b, b] through a sequence of three errors. Also, [b, b, b] can be changed into [a, a, a] through a similar transformation along the same sequence of outputs. From this, we see that the error model is symmetric and we can thus analyze the behavior of the overall system by determining its minimum distance. The erroneous output sets are given by

1r~rr],o = { [r, r, r] } (3.32) H,,~r,r,l, = ~r,r,r],o U { [r, r, e], [r, e, r], [e, r, r] I e E 1Z} (3.33)

7"/r[r,r],2 = '1r,r,r],U { [r, el, e2], [el,r,e 2], [el, e2, r] I el, e2 E R} (3.34)

-=r,r,],3 = 1. (3.35)

By testing when H[r,,r], n 7 v { [r, r, r]}, we find that the system has a minimum distance of 3. Thus, there is sufficient redundancy to detect D and correct C < D errors where D + C + 1 < 3. We may do any of the following:

1. Detect 1 and correct 0 errors.

2. Detect 2 and correct 0 errors.

3. Detect and correct 1 error.

The second error model is characteristic of bus, I/O, or power failures. These types of errors often yield outputs that are either all zeros or all ones, and we denote this fixed output

47 state by A. Note that A is a valid output for some inputs, and faults cannot be detected solely by checking if r = A. This error model leads to a slightly different system which exhibits a higher level of robustness. The error model is no longer symmetric, and analysis of the system becomes more complicated. We begin by determining relevant subsets of the output. We find that

Hv = {[rr,] IEr R} (3.36)

r~r'r]O{ [Xr, r r] } (3.37)

[rr,r,],1 = t[rr,r],o U { [r r, A], [r, A, r], [A, r, r] }(3.38) 1 r,r,r], X=H,,, U { [r, A, A], [A, r, A], [, A, r] }(3.39) 1lrr,r],3 = 1t,r,r],2 u { [A,A,A]} (3.40)

With these subsets available, we can test for error detectability as follows:

ntaaa],i v = { [a, a, a] } can detect 1 error ta,a,a],2 n hv = { [a, aa]} can detect 2 errors

' 3,aa],3 n v = { [A, A, A], [a, a, a]} cannot detect 3 or more errors. (3.41) Similarly, we can test for error correctability. Let a ~ b. Then

t[aaa],l n i[b bb] = 0 can correct 1 error

aaa,2 [bbb],2 = 0 can correct 2 errors (3.42) [a,a,a],2 [b,b,b],2 []~~a3,3 flH[b~b~b33 = { [A,7 A, A] } ~ cannot correct 3 or more errors.

Thus this system can correct up to 2 simultaneous errors. This is in contrast to the first system which only had sufficient redundancy to correct single errors. Applying (3.18) we are also able to tradeoff error detectability and correctability. We would find that this system is never able to detect or correct more than 2 simultaneous errors.

48 3.5 Summary

The set-theoretic framework presented in this section is a useful method of analyzing fault- tolerant systems. Our investigation was based entirely on a study of redundancy, and we were unconcerned with the internal workings of the system. This is true in general: the robustness of a system is determined entirely by the redundancy present in the output and the manner in which errors corrupt the result. This suggests an analysis of fault-tolerant systems in terms of the interaction between errors and results. Pursuing this approach, it is possible to determine more elaborate conditions for error detectability and correctability. This approach, however, do not bring us any closer to the solution of our general problem: constructing robust systems from nonrobust ones. The set-theoretic framework, although a good tool for measuring redundancy, is not a constructive procedure. For the case of arbitrary computation, no general solution, outside of modular redundancy, exists. Instead, we will narrow our focus and study computation that can be modeled as operations in certain algebraic systems. For these types of systems, we give procedures for adding redundancy and efficiently performing error detection and correction.

49 Chapter 4

Group-Theoretic Framework

We now specialize to the case of computation that can be modeled using group theory. We apply group theory because it encompasses a wide range of arithmetic operations and im- poses sufficient structure on computation allowing the form of redundancy to be accurately characterized. Once the framework for groups is complete, we generalize it to rings, fields, and vector spaces, and show that the results for groups carry over completely to these other systems. Background material in group theory helpful to understanding our results is found in [45]. We proceed in the following manner. First, in Sections 4.1 and 4.2, we develop a group-theoretic model of computation and add redundancy by mapping computation to a larger group using an algebraic homomorphism. We then apply results from Chapter 3 in Section 4.3, and give conditions on redundancy such that errors may be detected and corrected. In Section 4.4, the definition of an algebraic homomorphism is relaxed slightly, and a more general mapping called a partial homomorphism is defined. Section 4.5 gives an iterative procedure for choosing a specific coding scheme from a set of possible codes. Then in Section 4.6, we extend the framework to other algebraic systems that have an underlying group structure. Lastly, in Section 4.7, we demonstrate the application of our technique by several examples, and then conclude in Section 4.8.

50 4.1 Group-theoretic Model of Computation

Definition 6. An Abelian group G = [; 0, 0o] is an algebraic system that consists of a set of elements, G, and a binary operation, 0, called the group product that satisfy the following five properties:

(i) For all g91,g2 E G, 91g g2 is in G. (closure)

(ii) For all gl,g2 E G, 91g0g2 = g20g1. (commutativity)

(iii) For all gl,g2,g3 E G, gl9(g2°g3) = (91g92)09g3. (associativity)

(iv) There is an element, 00, called the identity, which satisfies g["00 = g for all g E G. (identity)

(v) For every g E G, there is an element in G called the inverse of g, and denoted by g-l, such that gOg- 1 = 00. (inverses)

For example, let G be the set of integers and let 0 denote integer addition. That is, for g91, g92 E G, 91g0g2 = 91 + 92. It can be quickly verified that this is an Abelian group with 0 playing the role of the identity, 0O, and -g that of g-'. We will frequently use this group, and will denote it by Z+. An example of an Abelian group containing a finite number of elements is the set of integers = {0, 1,..., M - 1} under modulo M addition. For

91g,g2 E 5, g 1g92 = (91 + g2)M where (X)M denotes the remainder when x is divided by M. The identity is 0, and g-1 is given by (M- )M. We denote this group by Z+ . We assume that the operation we wish to protect can be modeled as a series of P - 1 products r = 91 9g2 .. ' 09gp (4.1) in some Abelian group G = [; 0, O0]. We define G(P ) as the Cartesian product of G with itself P times, G(P ) = G x G x x G. (4.2) P times We place the operation (4.1) in the set-theoretic framework of Chapter 3 by considering r

51 to be a mapping from G(P ) to G,

0 0 ' ([g91,g 2 ,. .. ,gP]) = 91 92'*- - 9P (4.3) where [1, 9, ... ,gp] denotes the P-tuple of elements gl, g2,..., g9P. We assume that if A errors occur during computation, then the result is given by

r([91g,g2, .. ,gp],el,e2,...,e,) = g919g2 ... OgpOelOe2O... ex (4.4) where ek is the net effect of the kh fault on the result. ek is an element of e, the set of all possible single errors, and E£ is a subset of G. This model is a key aspect of our theory, and inherently contains several assumptions. We discuss and justify each in turn. First, it is assumed that errors influence the result in an additive manner through the 0 operation. This additive model closely matches the interaction between faults and errors observed in real systems. For example, soft errors in arithmetic units can be modeled as inducing additive errors in the final result [6]. Additive errors are also the standard model used to describe the interaction between codewords and errors in binary communication systems [14]. The second assumption inherent to our fault-model is that errors ae inde- pendent of the operands g1,g2,...,gp. This is realistic since, in most circumstances, errors result from external faults and do not depend on the specific choice of operands. Still, some data specific errors, such as stuck-at faults [6], ae not covered by this model.

4.2 Redundant Computation

The operation shown in (4.4) does not contain any redundancy since every output is a valid result for some set of operands. We add redundancy by mapping computation to another Abelian group, H = [; 0, 00], of higher order. The additional elements of H will be used to denote erroneous results. We assume that the redundant system has the general form shown in Figure 4-1. The redundant computation unit functions as follows. Each operand gk is first encoded

52 Redundant Computation Operands g Encoder gI

I

Encoder g2 Group Result Product ,Err Result r h' rrector h Decoder 0 a CY

.

Encoder gp . OP f

errors ek

Figure 4-1: Diagram showing the general structure of a redundant fault-tolerant system in which computation can be modeled as a series of group products. using a mapping kk to obtain an operand codeword

§k = Okk(9k) (4.5)

that is an element of H. In order that no information is lost during encoding, k must be one-to-one. Next, we compute the product of codewords in H, and use the same error model developed earlier. We assume that errors affect the result in an additive manner

h' =.U([§li§2i ... 7 M el, e27 -, eA) = 91 0§2 0... O§pOelOe20 -- - OeX (4.6)

where ek is the net effect of the k fault on the result. ek is an element of Co, a subset of H containing possible single errors. The mappings a and have the same basic function as in the set-theoretic framework. a tests for errors in h', and when possible, computes the error-free result h. Then, maps h back to G to obtain the desired result,

r = (h) (4.7)

53 The subset of error-free results 7Hv is given by

=HV{q5 1 (91g) 02(g 2 ) .0p(gP) I g9,g2, ... ,g9P E G}. (4.8) a is defined as a one-to-one mapping from 7tv onto G. We assume that errors are restricted to the products O and to the encoders 4k, and assume that a, and a are inherently robust. If necessary a and a may be protected by modular redundancy. We have thus far only outlined the basic structure of the system, and will now examine the mappings more closely. We begin by ignoring errors, and determine restrictions on mappings such that the correct result is computed by an error-free system. Ignoring errors and combining (4.1) and (4.5)-(4.7), we find that in order to compute the correct result, the encoders bk and decoder a must satisfy

9109g20 ... gp = a(1l(g9l)02(g2)0 Op(gp)) (4.9)

for all g1,g 2 ,...,gp G.

Assume that some mappings exist which satisfy (4.9). Then an alternate set of mappings also satisfying (4.9) is given by

4k(x) = hkO k(X) (4.10) c'(x) = a(xOh-l1OhlO... Oh l ) (4.11) where the hk are arbitrary elements of H. In fact, any encodings k which differ by only constant offsets ae essentially equivalent. Thus, without any loss of generality, we can assume that the mappings k preserve the identity,

Ok(0O) = 00 for k -1,2,...,P. (4.12)

We can always select constants hk and define q4 and a' such that this is true. The decoder is defined to be a one-to-one mapping from 7iv onto G. Therefore, a is

54 invertible, and (4.9) can be written as

-(91z I92 - -1-1 9gp) = 4l(g1)O0 2(g2)O -- O p(gp). (4.13)

Setting all operands 9k to the identity 0n except gj, and using our assumption (4.12), we find that a-'(gj) = ¢>j(gj). Repeating for all gj, we find that

o-'(g) = q10(g) = 2()=-= Op(g). (4.14)

Thus the mappings are equivalent, -1' = 1 = 2 = -- = 4p, and from now on we will denote them all by . (4.13) then becomes

(g1 9g2 11 ...' 9gp) = (91) ¢(g 2)O .. 0q(gp) for all gl,9g2,. .. , gP e G. (4.15)

Theorem 2. The constraint on mappings shown in (4.15) is equivalent to

(g91 g192) = ,(91)q) (g 2 ) for all 91, g2 E G. (4.16)

Proof. Assume that (4.15) is true, and let g3 = 0,g4 -=0,...,gp = 0l0. Since ¢k(0O) = 00, we obtain (4.16).

Now assume that (4.16) is true. Then

¢(g910(g 2E9g3 0 . .. Ogp)) = (g91) 0(g92g3D . . DgP) (4.17)

for all 91gi, 2,. . , gP E G.

Reapplying (4.16) to the term 4(g92 g2 Q ...- Ogp) on the right hand side in (4.17) yields

¢(9 1)0(g 2 9g23 ... g9P) = 0(gl9)Oq(g2)Oq(g 3 lg41 ...ogp). (4.18)

Continuing this process yields (4.15). [-

55 Equation (4.16) is recognized as the defining property of an algebraic homomorphism. A homomorphism is a mapping from one algebraic system, G, to another, H, which preserves structure [45]. It is specifically this property of q which we exploit to define redundant computation schemes. We now examine more closely the form of redundancy introduced by a homomorphism. Let = {q(g) I g E G} denote the set of encoded operands in H. We can show that under 0, 5 satisfies the five properties of an Abelian group, and is thus an Abelian group which we denote by G.

(i) (closure) Let §1, 2 E G. Then (gl) = 1 and (g92) = 92 for some g,g2 G. Consider the product §1 0 2 = (g91) 04(g2). Since is a homomorphism, this equals

g(91 g92), and since g91 0g2 E G, this implies that 1 02 E G.

(ii) (commutativity) G inherits commutativity from H.

(iii) (associativity) G( inherits associativity from H.

(iv) (identity) Since On E G, 0o is in G by our assumption (4.12).

(v) (inverses) Let E ( and let = 4(g) for some g E G. Since G is a group, g- E G. Then O(grg-l) = (Oo) = 0 = (g)OO(g - 1) = ~Ob(g). Thus 4(g-1) is the inverse of 9, and since g-1 e G, (g9-1 ) E .

Since 4 is one-to-one as well as a homomorphism, the groups G and G are isomorphic. They are essentially the same except for the naming of their elements. Given a computation in G, the isomorphism allows us to carry out an analogous computation in G. The subset of valid results in H is then given by

'/ = §1020 .. OP | 1,92,...,9 E G} . (4.19)

Since is a group, it is closed under O and therefore 1-Hv = G. In summary, to add fault-tolerance to a group G, we map computation to an isomorphic copy, G, which is a subgroup of a larger group H. We detect errors by checking if the result is in , and correct them via the mapping a. Then the result is decoded using + -1. These steps are illustrated in Figure 4-2.

56 Original Redundant Computation Computation

Figure 4-2: Model of a fault-tolerant group operation. Computation in G is mapped to an isomorphic copy which is a subgroup of a larger group H. Errors force results out of & and are corrected by a.

4.3 Error Detection and Correction

In this section, we determine conditions on redundancy such that errors may be detected and corrected. Using these conditions, we then derive a possible error detection and correction algorithm, and show that it may be reduced to a function of a syndrome. We then give a constructive procedure, based on a isomorphism, for determining the form of the syndrome mapping.

4.3.1 Redundancy Conditions

We begin with the error model (4.6), and since O is associative, we may write

h' = (§102 0 ... Op)O(elOe20 ... Oe) (4.20) = hOe (4.21)

where h = g10§20 ** Op is the error-free result in -v, and e = el O e 2 0 .--O ex is the net effect of all the errors on the result. We can thus consider errors to be applied to the result of an error-free computation. We define _G?)= G Oo O - Oo as the product of GO with itself A times. The error e is some element of (X). Restrictions such that errors are detectable and correctable can be easily derived by

57 applying results from the previous chapter. It can be shown from the associativity of the error model that

7/h, = hO(). (4.22)

Substituting this into (3.18), we find that in order to detect D and correct C errors requires that

(h 2 o'(D)) n (h06(c)) = 0 for all hi h2 E v. (4.23)

This may be transformed as follows

D) h2 0 e2 hl O el for all hi h E G,eleE E), 2 E E (4.24)

hl Oh2 0e 2 (4.25)

hOe 2 for all h A 0 E G,e1 E gC),e 2 E£D) (4.26)

hOe2 = for all hOe E ( ) e2 e (D) (4.27) 54# hl Oh2 0e2 el for all h1, h2 E G, el (C) 54e 2 e (D) (4.28) # h2 Oe2 hlOel (4.29) ) GOe 2 n Oe, 0 for all el E C) e2 Ec(D (4.30) where we have used the fact that G is a subgr oup in steps (4.26) and (4.28). Since G is a subgroup of H, the sets 0G0el and GOe2 are cosets of G in H. It is well-known that two cosets of G in H are either identical or have no elements in common. We can thus treat the cosets as elements rather than sets, and write (4.30) as

GOel # GOe2 for all el E( ) e2 E E(D) (4.31)

In this form, we see that the ability to detect and correct errors implies the uniqueness of the cosets of G in H. This suggests a different method of correcting errors based on first identifying the error e, and then canceling its effect on the result, rather than mapping directly from h' to h. We examine this technique in detail.

58 4.3.2 Coset-Based Error Detection and Correction

The coset (GOecontains all possible results that could arise if an error e occurred in the system. (4.31) implies that every detectable error forces the result into a nonzero coset (i.e., the result does not lie in GO0o = G). Furthermore, every correctable error e e E(C) forces the result into a distinct coset,

G(Oel (GOe 2 for all el $ e2 £(c) (4.32) and these cosets differ from the cosets of detectable but uncorrectable errors,

D) GOe1 £ GOe2 for all el (C) e2 E £( . (4.33)

Thus, an alternative method of correcting the result is to first determine the net error, if possible, e= arg h' GOa. (4.34) a E (C)

If h' ~ GO a for all a E £(c), then the error is uncorrectable. Otherwise, once the net error is determined, we subtract it from h' to determine the error-free encoded result

h = h O e -1. (4.35)

The revised error detection and correction test (4.34) works by checking which coset the result lies in. This is an awkward and computationally expensive procedure since we must manipulate sets of elements. We can perform this same procedure more efficiently using a homomorphism b from H to another group T = [T; O, 0o].

Definition 7. Let b be a homomorphism from H to T. The set of elements in H which map to the identity in T is called the kernel of 4 and is denoted by K,;

K, = {h E H I (h) = 0}. (4.36)

59 The kernel of a homomorphism is similar to the nullspace of a linear transformation.

Theorem 3. Let ib be a homomorphism from H onto T with kernel KV, = G. Suppose that h' = hOe where h e G. Then h' is in the coset G(Oe if and only if 4'(h') = O(e).

Proof. Assume that h' e GO e. Then h' = Oe for some E . Applying the homomorphism Obto h' we find that 4'(h') = 0(§Oe) = 0(§)O*b(e) = b(e) since § E K,. Thus i(h') = O(e) if h' E G Oe.

Now assume that 4(h') = +(e). Then ?P(h') = 4'() b(e)if and only if e K = G. Since 4b is a homomorphism, we may write O(h') = 0(gOe). This implies that h' = O e for some g E 0. Thus h' GOe. L

The homomorphism 4 condenses all information relevant to error detection and correction into a single element, b(h'). We will refer to O(h') as the syndrome of the result h'. Note that =(h') = (hOe) = P(e), and effectively removes any component of the error-free result h E G from the syndrome. Theorem 3 shows that a one-to-one correspondence between cosets and syndromes exists. It is convenient to restate the redundancy condition (4.31) in terms of syndromes rather than cosets. We may detect D and correct C errors if and only if

4'(el) $ tp(e 2 ) for all el E £(c) 0 e2 £(D). (4.37)

Instead of identifying the error by checking cosets as in (4.34), it can be done using the syndrome as follows. Given the possibly faulty result h', compute 4(h'). Then

If (h') = Oo, then no errors have occurred, e = 0o else if Ob(h') = O(e) for some e E £(C), then error e has occurred (4.38) else an uncorrectable error has occurred.

If the error is correctable, we would compute the error-free encoded result using (4.35). If £(C) contains a small number of elements, then error correction may be performed

60 using a lookup table A as follows. Define

e for all e E( ) A ((e))= E (4.39) * else.

Then, the error may be determined from h' by using 0b(h') as an index into the lookup table, e = A (O(h')). Error correction using a lookup table is efficient only when the set of expected errors £(C) is small. In practice, E(C) may be large or even infinite, and in these instances, other techniques must be used. The usual approach is to exploit the structure of () and to invert

b(e) over the set £(C).0* An interesting observation is that 0 (T) determines the maximum number of distinct errors which may be corrected. Since T contains 0 (T) - 1 nonzero elements, we may correct, at most, 0 (T) - 1 different errors. This method of error correction corrects the result by subtracting the error e from the final result h'. This is computationally efficient only if several products are computed. Otherwise, if only a single product is computed (P=2), then error correction requires as much computation as we originally sought to protect. This is clearly impractical.

4.3.3 Determination of the Syndrome Homomorphism

In order to apply this syndrome-based error correction procedure, we must have some method of determining a suitable homomorphism b and group T. Often, , and T are readily found by inspection, while in other cases, they may not be so easily identified. In these instances, we propose the following quotient group construction to aid in the identification of and T. The construction method identifies a homomorphism from H onto another group T= [T; , 00] whose kernel equals G. We will find that computation in the group T is cumbersome to perform, and by examining T, we will recognize that it is isomorphic to a simpler group, in which computation can be more conveniently performed. This simpler group will be denoted T, and it is derived by renaming the elements of T. The actual syndrome homomorphism used in practice will be denoted by b. Let H/( denote the collection of cosets of G in H. Each element of H/G is a subset of

61 H. Let the product of subsets serve as a product in H/G. That is, for G Ohi, G Ohj E H/G,

GOhiOGOhj = hlOhm I h E GOhi,hm E Ohj}. (4.40)

The set H/G forms an Abelian group under this product. We verify that it satisfies the properties of an Abelian group:

(i) (closure) Let COhi, GOhj E H/G. Then GOhiOGOhj = GOGOhiOhj by com-

mutativity. Since G is a subgroup, GOG = G. Thus GO Ohi Oh3 = GO hi Ohj E H/G, and we see that H/G is closed.

(ii) (commutativity) H/G inherits commutativity from H.

(iii) (associativity) H/G inherits associativity from H.

(iv) (identity) Consider the element 00o e H/G. For an arbitrary coset GOh E H/G,

GOhOGO 0 o = GOh. Thus GO 0 o is the identity in H/G.

(v) (inverses) Let COh be an element of H/G. Then COh-' is also an element of H/G. Multiplying, we find that GOhOGOh - = CO0o which is the identity in H/G. Thus, each element of H/C has an inverse in H/G.

The group H/G defined in this manner is called the quotient group of H by C. Given H/G, we can always define a mapping b from H onto H/G by

O(h) = GOh. (4.41)

, maps h to the coset containing h. Now, consider applying to the product h Oh2 ,

'i(h Oh2) = GO(hlOh 2 ) (4.42)

= (OGOhlOh 2 (4.43)

= (OhlOCOh 2 (4.44)

= 4(hl) O0 (h 2). (4.45)

Hence , is a homomorphism from H onto H/G. Purthermore, the kernel of k equals G.

62 Thus by letting T = H/G and (h) = Oh, we have a procedure for finding suitable syndrome groups and homomorphisms. Note that when implementing a system using such an homomorphism, we will never perform computation in the quotient group T by manipulating cosets. Rather, by forming the quotient group, we will recognize that T is isomorphic to a simpler group T which we obtain by renaming the elements of T, and this simpler group will be used to perform error detection and correction. Examples of this construction technique are given later in the chapter.

4.3.4 Symmetric Errors

Symmetric errors are also easily placed in the group-theoretic framework. The cancellation shown in (3.21) occurs if and only if every e E go has an inverse e- 1 that is in Go as well. The minimum distance is found to be the smallest integer t such that

(hO£(t) nf tv $ {h} for some h e tv. (4.46)

Set Hiv = G, and multiply both sides of this expression by h-l. This is an invertible transformation, and we obtain gg)nG5 o} (4.47) or, equivalently,

elOe2 0 -- Oet ~0 E G for some el,e 2 ,...,et E go. (4.48)

Thus, the minimum distance can be thought of as the smallest number of errors that must be combined in order to obtain a nonzero element of G. This interpretation often allows the minimum distance of a system to be easily computed.

4.4 Partial Homomorphism

This section briefly describes an alternate method of adding redundancy to computation performed in algebraic groups. The mappings that we will use are related to homomor-

63 phisms, but only satisfy (4.15) for certain combinations of operands. We will refer to these mappings as partialhomomorphisms. Assume that computation has the same overall structure as before: redundancy is adding by mapping computation to another group H = [; 0, 00] as shown in Figure 4-1. Assume that 41 = 2 = ...-p = - 1 as well. Previously, we allowed all possible combinations of input operands, and this forced to be an algebraic homomorphism, and as a direct consequence, the set of valid results X-v formed a subgroup (G. In a partial homomorphism, we select a redundant group H and then explicitly specify the subset of valid results. Call this subset l-v in accordance with our former notation, and keep in mind that ?/v must no longer be a subgroup. We then restrict computation to operands such that error-free results lie in ?/v. That is, computation is only defined for operands g91 , g2,..., gp such that

4(gl)0 (g2)0 ... O0(gp) E 'iv. (4.49)

In order for the correct result to be computed in an error-free system, the mapping must now satisfy

°(gl99g2 rOlgp) = 0(91) 0 0(92)0 .. 00(gp) (4.50) for all gg2, ... ,gp such that 4(gl)0)(g2)0 O (gp) E iv.

We refer to a mapping of this form as a partial homomorphism since it is structurally similar to (4.15) but only holds for certain operands. We now develop some tools for analyzing systems that utilize partial homomorphisms. The redundancy condition shown in (4.23) was a direct application of the set-theoretic framework, and it did not assume ay special form of Xv; it only assumed that errors influence the result in an additive manner. Thus, as long as an additive error model applies, (4.23) may be used to measure the redundancy present in a partial homomorphism. The remainder of the results from Section 4.3 may not be used since they make explicit use of the fact that G is a subgroup. To determine the form of the error-correcting mapping a, we must return to (3.19).

64 Substituting in for 7lh,C, we find that

h if W'E hO £(C) a h e 0°(4.51) ') * else.

Symmetric errors have the same interpretation since errors affect the result in an additive manner. The minimum distance of a system is determined by (4.46); (4.47) and (4.48) no longer apply since 7¥v is not a subgroup. Our main motivation for studying partial homomorphisms is that they model compu- tation in systems protected by integer residue number systems, an important fault-tolerant system. Although partial homomorphisms do not fit precisely into our framework, there are, nevertheless, many important similarities: the same basic overall structure, mappings with properties akin to algebraic homomorphisms, and an additive error model. We will use the rudimentary tools developed in this section to study a fault-tolerant convolution algorithm which is based on a polynomial residue number system. This is presented in Chapter 6.

4.5 Code Selection Procedure

For a given homomorphism X, we are able to determine if sufficient redundancy exists to detect D and correct C simultaneous errors in the set £o. In reality, one begins with G, and needs to determine and H such that sufficient redundancy exists in order to protect against the expected set of errors £o. This is a difficult problem because of the interaction between 4, H, and £o. Each group H has an operation O associated with it which has its own set of errors £o. The errors o are determined by H, which is in turn determined by . There is also some flexibility in the manner in which the product O is computed. Different computational procedures lead to different sets of errors £o, and thus it is necessary to check the fault-tolerant capability of different procedures and architectures for computing O0. Thus choosing a code 0 is not a straightforward task. We propose the following iterative approach:

1. Select some mapping and group H which satisfy the homomorphism (4.16).

65 2. Determine the set of errors £o which would arise during computation. Be sure to check for equivalent computational procedures yielding different errors £o.

3. Check if 0 has sufficient redundancy to protect against the expected number of simul- taneous errors. If sufficient redundancy exists continue to step 4. Otherwise return to step 1 and repeat with different and H.

4. Check if the encoding , group product 0, and error detection and correction may be done in an efficient manner. If they can, then we have identified a practical arithmetic code and we are done. Otherwise return to step 1 and repeat with different and H.

Although this is an indirect procedure, it can still be very fruitful. The constraint that be a homomorphism narrows the search for possible groups H. Also, in many instances, the form of the redundancy is recognized as a standard error-correcting code, and existing techniques may be used to choose 4. Suppose that the desired computation consists of a total of P - 1 group products, and that we wish to detect D and correct C simultaneous errors. Suppose further that no suitable homomorphism is found which contains sufficient redundancy to protect against this number of errors. It may still be possible to attain the desired level of protection by performing error detection and correction repeatedly throughout computation rather than once at the end. Check the intermediate result of computation after every P' < P group products. Since less computation occurs, a lower number of simultaneous errors should occur, and a homomorphism may exist which can protect against this number of errors.

4.6 Application to Other Algebraic Systems

A number of other algebraic systems exist in which useful computation can be performed. These include rings, fields, and vector spaces, and this section discusses how our framework may be extended to protect computation in these systems as well. These other algebraic systems differ from groups in several ways, the most notable of which is that they have two operations defined on the set of elements. The additional operation yields a greater amount of flexibility in the types of computation which may be performed. In groups the most general computation consisted of computing P -1 products

66 of P operands as shown in (4.1). In these other systems, however, computation consists of some arbitrary sequence of the two operations applied to the input operands. By defining the homomorphism properly, it is possible to protect arbitrary computation of this form. We will outline the structure of each algebraic system and then state the necessary form of the homomorphism. All of the systems which we will study have an underlying structure of an Abelian group, and this structure well-models the effects of errors. In all cases, error-free results form a subgroup of the redundant output. Since an additive error model is used, we can continue to apply the syndrome-based redundancy measures and error detection and correction pro- cedures that were previously developed. An important point to keep in mind is that since errors influence the result in an additive manner, the syndrome homomorphism must only preserve the structure of the additive operation and is thus still defined by (4.16).

4.6.1 Rings

Definition 8. A ring G = [; O, , oJ] is an algebraic system with two operations, E and 1, defined on the set G. Under E1, G is an Abelian group with O serving as the identity. The second operation, !, is defined by the following properties:

(i) For all g91 , 92 E G, 91 092 is in G. (closure)

(ii) For all g1,g2,g3 E G, g1lM(g 2 lEg3) = (gllg2)0g3. (associativity)

(iii) For all g91, g9392, E G,

g91 (g2lg 3 ) = (91g9 2)[°(g91 g3) (4.52)

(g2 1lg3)g 1 = (g 2 1Mg 1)[l(g31g). (4.53)

(distributive laws)

E and [ ae referred to, respectively, as the additive and multiplicative operations of G.

67 A homomorphism + from a ring G = [; 0, 0, 0n] to another ring H = [; O0, , 00] is a mapping which preserves both arithmetic operations,

~(gDg2) = ~(g~)O¢>(g2) () 2= 0(1) 0 (g2) for all gl,g 2 E G. (4.54) 0(g91g 2) = 0(g) 0(g2)

A ring homomorphism maps computation to G, a subring of H which is isomorphic to G.

4.6.2 Fields

Definition 9. A field F = [; 0, 1], 0a] is a ring in which the nonzero elements of F form an Abelian group under B.

Structurally, fields and rings are very similar, and a field homomorphism is defined by (4.54) as well. One difference to keep in mind is that the redundant system H will not be a field, but rather a ring. This is because a field cannot contain any subfields, while it is possible for a ring to contain subfields. The redundant computation occurs in a subfield C of H that is isomorphic to G.

4.6.3 Vector Spaces

Definition 10. A vector space over a field F = [; O, 4, 00] consists of an Abelian group G = [; 0[,0j] together with an operation called scalar multiplication and denoted by such that for all a, E F and v, w E G,

(i) av E G

(ii) ( -v) = ( *). v (iii) (a*p) v = ( )0( v)

(iv) a. (vow) = ( v)O(a w)

(v) 1.-v = v

The elements of G are called vectors and the elements of F are called scalars.

68 A vector space homomorphism is a mapping b from a vector space G = [; 0, 03] over F to another vector space H = [-; 0, 00] over F such that

~(vOw) = St(v)O~(w) v =W~) OM()0O(M) for all v, w Gand a F (4.55) 0(a v) = a: (v) where: denotes the operation of scalar multiplication between elements of H and F. The key difference between a vector space homomorphism and a group homomorphism is that redundancy is added to vectors, but not to scalars. A vector space homomorphism maps computation to G, a subspace of H which is isomorphic to G. Carefully distinguishing the operations in a vector space is a tedious process that does not clarify the presentation significantly. Instead, we will use + in place of both 0 and O, and use juxtaposition in place of *, , and :. Also, 0 will denote the identity in G, H, and F. It will always be clear from context which operation we are referring to. With this new notation, a vector space homomorphism is a mapping satisfying

4(v + w) = (v) + (w) (4.56) +(av) = a(v)

for all v, w E G and a E F. We will always utilize the familiar vector spaces containing elements of the form

G = { [al,a 2,7 ... ,aN] ai E F} (4.57)

for some integer N. Operations in G are defined as

[a1,a2,...,aN] + [3 1,P2,...,PN] = [a1 + 1,a2 +P2,...,aN+ PN] (4.58)

7 [1,a2 , ... ,aN = [7al,7a2, ... ,7aN] (4.59)

for all [, a2 C ,...,a]Ii,2 - N] E G, and y?E F. We denote this vector space by F(N), and when dealing with elements of F(N), will use underbars (e.g., v) to stress that

69 the elements are vectors.

4.7 Examples

In this section we give several examples to demonstrate the application of our technique. The examples were chosen to illustrate how the framework can be used to analyze fault- tolerant systems, and are not constructive procedures. For each example, we describe the original operation in G and give a possible homomorphism 4. In some cases, we determine the error detection and correction procedures and show which errors are protected against. A thorough treatment of each coding scheme is beyond the scope of this thesis since the choice of code is closely tied to the set of expected errors which is, in turn, determined by the specific hardware architecture employed. In many cases, the resulting code is shown to be equivalent to one studied by other authors. In these cases, we refer the reader to references in which a more thorough presentation is made. Most of the examples that we give are for protecting a single product of two operands. This was done in order to simplify the presentation and more clearly reveal the form of the redundancy introduced by the homomorphism . In all cases, our techniques ae able to protect a series of operations.

4.7.1 Triple Modular Redundancy

Let G be any Abelian group and let H be the direct product of G with itself three times, H = G x G x G. Elements of H are of the form [g1,g2, g3] where g1,g2,g3 E G, and the product in H is componentwise multiplication

[gl,g2,g3] ° [g4,g 5,g6] = [gliOg 4 , 2rgs,0g3-g9 6] . (4.60)

Define the mapping from G to H as follows,

O(g)= [g, g,g]. (4.61)

70 Plugging (4.61) into (4.16), we see that this is a valid homomorphism,

b (91 °g2) = [g1 92, gl g 2 ,glOg2 = (91(gl) 0 (g2). (4.62)

Also, the subgroup of valid results is given by G = { [g, , ] 9 E G}. Suppose that (4.60) is computed by three independent processors, with one product computed per processor. Assume that when a processor fails, an error e E G is added to its output. The set of single errors is of the form

o = {[e, o, o], [0oe 0], [0o,O ,e] I e E G}. (4.63)

Since every e E G has an inverse e- 1 E G, the errors are symmetric, and we may analyze the system via its minimum distance. It is seen by the following progression of sets that the minimum distance of this code is 3,

E() = { [e, 0O, 0O], [O, e, 0a], [0, O, e] e E G} (4.64)

£(2) = {[el, e 2 , Oo], [el,Oo,e2], [0, el, e 2] el, e 2 E G} (4.65) £(3) HH. (4.66)

Thus we may detect D and correct C errors where D + C + 1 < 3 and D > C. We derive the form of the error detection and correction procedures used in this system via a quotient group construction. We must determine the structure of the cosets as well as the group product within the quotient group. An arbitrary coset is of the form

0GO[a,b,c] = { [gOla,gOlb,gOc] I g E G} (4.67) = { [g',g'/ a- l l- i b,g'0 a- loc] g'EG}. (4.68)

From this we see that there are only two degrees of freedom, and the quotient group consists of the set of elements

H/G {t[a,b] a, b E G} (4.69)

71 where (4.70) t[a,b] = { [g,gOa,gb] g E G}.

The homomorphism b from H onto H/G is derived by combining (4.68) and (4.70), and equals

/.( [a, b c]) = t a-'0ba-1Oc]' (4.71)

Multiplying cosets, we find that the group product is given by

ta,b]Ot ,, = {[g,gOa,gOb] 0 [g',g'Oc,g'Od3 I g,g' G} (4.72) = { [gO ,g'ac,gg'bcg OOd] g,g' E G} (4.73)

= {[g,gaaDc,gOb d] I g E G} (4.74)

= t[a O c,b O d]' (4.75)

Examining this product, we see that H/G is isomorphic to the simpler group T = G x G.

The coset t[,b] is renamed [a, b], and the homomorphism from H onto T is given by

?,[abc1) = [a-'Db,a'Oc]- (4.76)

We declare that no errors have occurred if P( [a,b,c]) = [0D, 0D], i.e., if a = b = c. This is an equality checker. The identity of the failed processor and the exact value of the error is determined from the syndrome as well. We find that

ψ([e, 0○, 0○]) = [e⁻¹, e⁻¹]   error e in processor #1
ψ([0○, e, 0○]) = [e, 0○]      error e in processor #2
ψ([0○, 0○, e]) = [0○, e]      error e in processor #3.

Once the error-free encoded result h is found, we determine the desired result r by mapping h back to G using φ⁻¹. This corresponds to choosing any element of h,

φ⁻¹([g, g, g]) = g.   (4.77)

This example is the standard form of triple modular redundancy. An equality tester detects errors and a majority circuit determines the most likely result. The framework is able to verify the error detecting and correcting capabilities of triple modular redundancy, as well as determine the required error detection and correction procedures. Although presented in terms of groups, it can be readily seen that these results apply directly to all other algebraic systems. In fact, triple modular redundancy may be utilized to protect any system since, when the system is triplicated, the set of valid outputs always forms a 1-dimensional subgroup of the larger 3-dimensional redundant output space. Redundancy has the exact same form, and the same error detection and correction procedures may be used.
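To make the construction concrete, the following Python sketch (ours, not from the original text) implements the encoding (4.61), the syndrome (4.76), and single-error correction for the hypothetical choice of G as the integers under addition; the names phi, psi, and correct are ours.

def phi(g):
    """Encode g in H = G x G x G, as in (4.61)."""
    return [g, g, g]

def psi(h):
    """Syndrome mapping of (4.76); for integers, a^-1 o b is b - a."""
    a, b, c = h
    return (b - a, c - a)

def correct(h):
    """Single-error correction driven by the syndrome table above."""
    s1, s2 = psi(h)
    a, b, c = h
    if s1 == 0 and s2 == 0:
        return a              # no error
    if s1 == s2:              # syndrome [e^-1, e^-1]: processor 1 failed
        return b
    if s2 == 0:               # syndrome [e, 0]: processor 2 failed
        return a
    return a                  # syndrome [0, e]: processor 3 failed

h = [g + e for g, e in zip(phi(42), [0, 7, 0])]  # error e = 7 in processor 2
assert correct(h) == 42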

4.7.2 aN Codes

Let G = Z5⁺, H = Z55⁺, and φ(g) = 11g. φ is a one-to-one mapping from G to the subgroup G̃ = {0, 11, 22, 33, 44}. It is a homomorphism since

φ(g1 ○ g2) = 11 (g1 + g2)_5 = (11 g1 + 11 g2)_55 = φ(g1) ○ φ(g2).   (4.78)

Assume that ○ is computed by a 6-bit binary adder and that we expect single bit errors of the form E1 = {0, ±1, ±2, ±4, ±8, ±16, ±32} = {0, 1, 2, 4, 8, 16, 32, 48, 56, 60, 62, 63}. These are symmetric errors and the minimum distance is found from the sets

E(1) = {0, 1, 2, 4, 8, 16, 32, 48, 56, 60, 62, 63}
E(2) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 20, 24, 28, 30, 31, 32, 33, 34, 36, 40, 44, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63}.

Since E(1) ∩ G̃ = {0} and E(2) ∩ G̃ ≠ {0}, the minimum distance equals 2. Thus we can, at most, detect single errors. The form of the syndrome mapping ψ can be found using a quotient group homomorphism. It is well-known that, for this example, H/G̃ is isomorphic to T = Z11⁺. The mapping ψ from H onto T is given by ψ(h') = (h')_11. If (h')_11 = 0, we declare that the result

is error-free. Since the group Z11⁺ contains a total of 10 nonzero elements, there is insufficient redundancy to correct the 12 nonzero errors in E(1). (This is an alternative method of testing if sufficient redundancy exists to correct a given set of errors.) This type of code is referred to as an integer aN code. The general form of the encoding is φ(g) = gN where N is an integer called the generator or check base. These codes have been studied extensively, and for certain values of N, can detect and correct errors [6].
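The aN code is easy to exercise in a few lines. The sketch below is our illustration (not the thesis's) of the example above, with N = 11, G = Z5, and H = Z55; encode, add, and syndrome are hypothetical names.

def encode(g):          # g in {0, ..., 4}
    return (11 * g) % 55

def add(h1, h2):        # the protected operation, computed in H
    return (h1 + h2) % 55

def syndrome(h):        # residue mod 11; nonzero flags an error
    return h % 11

h = add(encode(3), encode(4))      # encodes (3 + 4) mod 5 = 2
assert syndrome(h) == 0 and h == 22
assert syndrome(h + 4) != 0        # a single-bit error of 4 is detected

Since the code has minimum distance 2, this check detects but cannot correct such errors.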

4.7.3 Matrix Rings

Let G be the set of all N x N matrices with coefficients that are elements of a field F. This set forms a ring under standard matrix addition and multiplication. We protect computation in G by embedding it in the larger ring H containing matrices of dimension (N + C) x (N + C). The coefficients of H are elements of F as well. The encoding from G to H is given by

φ(A) = φ1 A φ2   (4.79)

where φ1 and φ2 have dimensions (N+C) x N and N x (N+C), respectively, and φ2 φ1 = I, the N x N identity matrix. φ is a ring homomorphism since

φ(A + B) = φ1 (A + B) φ2 = φ1 A φ2 + φ1 B φ2 = φ(A) + φ(B)

φ(AB) = φ1 AB φ2 = φ1 A φ2 φ1 B φ2 = φ(A) φ(B)

for all A, B ∈ G. The homomorphism is satisfied for arbitrary matrices φ1 and φ2 as long as φ2 φ1 = I. In this example, we will further assume that the encoding matrices have the following special form

φ1 = [ I  ]        φ2 = [ I | φ̃2 ]   (4.80)
     [ φ̃1 ]

where φ̃1 and φ̃2 have dimensions C x N and N x C respectively. This leads to a systematic-nonseparate version of the encoding which is more easily analyzed. In order that φ2 φ1 = I, the rows of φ̃1 must be orthogonal to the columns of φ̃2. Assume that a single fault occurring during computation corrupts a single element of

the (N + C) x (N + C)-dimensional result. We model this as an additive error

H' = H + E   (4.81)

where H' is the faulty result, H is the error-free result, and E is a matrix containing only a single nonzero element e of F. Assume that e can be an arbitrary element of F. Then the induced errors will be symmetric and we can analyze this system by its minimum distance. Redundant matrix arithmetic of this form was studied in detail by Jou [25], and was mentioned briefly in the introductory material presented in Section 2.4.2. He showed that by proper choice of the encoding matrices φ1 and φ2, the minimum distance of the code may be as high as (C + 1)². He also presented algorithms for detecting and correcting errors. The subring of valid result matrices is given by

G̃ = {φ1 A φ2 | A ∈ G}.   (4.82)

In order to detect errors, we compute a syndrome based on the result H'. Since the redundant space has a dimension of (N + C)², and since the subspace of valid results has dimension N², the syndrome group must have dimension (N + C)² − N² = 2NC + C². Any homomorphism from H onto a (2NC + C²)-dimensional space that has kernel G̃ will suffice as the syndrome mapping. One possible mapping is as follows. First, partition H' into the submatrices

H' = [ R'  P1 ]   (4.83)
     [ P2  P3 ]

where R' is N x N, P1 is N x C, P2 is C x N, and P3 is C x C. Then, compute the syndrome consisting of the three matrices

ψ(H') = [S1, S2, S3]   (4.84)

where

S1 = R' φ̃2 − P1   (4.85)

S2 = φ̃1 R' − P2   (4.86)

S3 = φ̃1 R' φ̃2 − P3.   (4.87)

The result is in G̃ if and only if all three matrices in the syndrome equal zero. The general form of the error correction procedure is described in [25] and is quite complicated. The procedure first localizes the error by examining the nonzero elements of S1 and S2, and then determines the exact value of the error E using all three syndromes. Many variations of this basic encoding scheme are possible. Our analysis assumed a very simple error model, but it can be tailored to the specific hardware architecture used to perform the computation. Also, operations with nonsquare matrices can be protected by embedding computation in square matrices.
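A small numerical sketch may help. The following Python/NumPy fragment is our construction, with hypothetical sizes N = 2, C = 2 and encoding matrices chosen so that φ2 φ1 = I; it encodes a product of two matrices and evaluates the syndromes (4.85)-(4.87).

import numpy as np

p1 = np.array([[1., 1.], [1., 1.]])          # C x N  (plays phi-tilde-1)
p2 = np.array([[1., -1.], [2., -2.]])        # N x C, chosen so p2 @ p1 = 0
phi1 = np.vstack([np.eye(2), p1])            # (N+C) x N
phi2 = np.hstack([np.eye(2), p2])            # N x (N+C)
assert np.allclose(phi2 @ phi1, np.eye(2))   # required: phi2 phi1 = I

def encode(A):
    return phi1 @ A @ phi2

def syndromes(H):
    R, P1 = H[:2, :2], H[:2, 2:]
    P2, P3 = H[2:, :2], H[2:, 2:]
    return R @ p2 - P1, p1 @ R - P2, p1 @ R @ p2 - P3

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
H = encode(A) @ encode(B)                    # redundant product = encode(A @ B)
assert all(np.allclose(S, 0) for S in syndromes(H))
H[1, 3] += 5.0                               # corrupt one element of the result
assert not all(np.allclose(S, 0) for S in syndromes(H))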

4.7.4 Finite Fields

Let G be the field consisting of the integers {0,1,..., P - 1} where P is a prime number along with modulo P addition and multiplication. All finite fields containing P elements are isomorphic to G.

We embed computation in the ring H consisting of the set of integers {0, 1, ..., Q − 1} under modulo Q addition and multiplication. Assume that Q = PN for some integer N that is relatively prime to P. We will denote the additive operations in G and H by +, and the multiplicative operations by juxtaposition. Define π as the multiplicative inverse of N in G. π is the element in G satisfying

(πN)_P = 1.   (4.88)

The mapping φ from G to H which adds redundancy is given by

φ(g) = (πNg)_Q   (4.89)

where the product πN can be precomputed. φ satisfies the additive property of a ring homomorphism since

φ(g1 + g2) = (πN (g1 + g2)_P)_Q   (4.90)
           = (πN g1 + πN g2)_Q   (4.91)
           = φ(g1) + φ(g2).   (4.92)

Showing that φ satisfies the multiplicative property of a ring homomorphism is slightly more complicated,

φ(g1) φ(g2) = ((πN g1)_Q (πN g2)_Q)_Q   (4.93)
            = (πN πN g1 g2)_Q   (4.94)
            = N (π (πN) g1 g2)_P   (4.95)

where the last line follows since N divides Q. We can reduce this further by noting that (πN)_P = 1. Then,

φ(g1) φ(g2) = N (π g1 g2)_P = (πN g1 g2)_Q = φ(g1 g2).   (4.96)

Under this homomorphism, the subgroup of error-free results consists of the multiples

G̃ = {0, N, 2N, ..., (P − 1)N}.   (4.97)

The redundancy is equivalent to that in aN codes which were presented in Section 4.7.2. The same error detection and correction procedures may be used, and the performance of the code is determined by N. Guidelines for choosing N are given in [6].
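The embedding is easily verified numerically. The sketch below is our illustration, with the hypothetical choice P = 5 and N = 3 (so Q = 15); it checks both homomorphism properties of (4.89) and the error test implied by (4.97).

P, N, Q = 5, 3, 15
pi = pow(N, -1, P)                     # multiplicative inverse of N in G
phi = lambda g: (pi * N * g) % Q

for g1 in range(P):
    for g2 in range(P):
        assert (phi(g1) + phi(g2)) % Q == phi((g1 + g2) % P)
        assert (phi(g1) * phi(g2)) % Q == phi((g1 * g2) % P)

# Valid results are the multiples of N, eq. (4.97);
# any h with h % N != 0 flags an error.
assert all(phi(g) % N == 0 for g in range(P))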

4.7.5 Linear Transformations

Suppose that we wish to protect the linear transformation r = L(g) from R^(M) to R^(N). This computation can be modeled as the vector-matrix multiplication

r = gL   (4.98)

where L is an M x N matrix describing the transformation. We can protect this operation by splitting the transformation into smaller operations, each of which may be placed in our framework. Rewrite (4.98) as

r = [g1 ... gM] [ l1 ]
                [ ⋮  ]   (4.99)
                [ lM ]

  = Σ_{i=1}^{M} gi li   (4.100)

where li is the ith row of L, and gi is a scalar denoting the ith element of g. Computation consists of a weighted linear combination of N-dimensional vectors in R^(N), and this may be placed in our framework.

We add redundancy by embedding computation in R^(N+C). Redundancy is added to the vectors li but not to the scalars gi. The homomorphism that encodes the vectors is given by

l̃i = φ(li) = li φ   (4.101)

where φ is an N x (N + C) matrix with full row rank (this ensures that φ is one-to-one). Next, we perform a sequence of operations on the encoded vectors similar to the desired computation shown in (4.100),

h = Σ_{i=1}^{M} gi l̃i.   (4.102)

As usual, an additive error model is assumed. If λ errors occur, then the result is given by

h' = h + e (4.103)

where e = Σ_{i=1}^{λ} ei, and ei is the net effect of the ith error on the result. We assume that ei contains only a single nonzero element and is of the form [0 ... 0 e 0 ... 0] where e ∈ R. This is a symmetric error, and the distance of the code depends on φ. Codes similar in form to these were analyzed by Wolf [46] and Jou [25]. By proper choice of φ, they may have a minimum distance as large as C + 1. To detect and correct errors, we must compute a syndrome based on the result h'. Recall that the syndrome mapping ψ is a homomorphism from H onto another vector space whose kernel K_ψ = G̃. In this case, G̃ is an N-dimensional subspace of R^(N+C) equal to the row space of φ. It can be shown that the left nullspace of the matrix Ψ = I − φᵀ(φφᵀ)⁻¹φ equals G̃. Thus Ψ is a homomorphism from R^(N+C) onto R^(C), and the syndrome is given by h'Ψ. The error correction procedure is quite elaborate and is outside the scope of this presentation. We refer the reader to [46] and [25] for a more complete presentation.
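As a sketch of this vector-space code (our illustration, with hypothetical sizes N = 3, C = 2 and a randomly chosen systematic φ), the following fragment verifies that the projector Ψ = I − φᵀ(φφᵀ)⁻¹φ annihilates valid results and flags a single additive error.

import numpy as np

N, C, M = 3, 2, 4
rng = np.random.default_rng(0)
phi = np.hstack([np.eye(N), rng.standard_normal((N, C))])   # full row rank
Psi = np.eye(N + C) - phi.T @ np.linalg.inv(phi @ phi.T) @ phi

L = rng.standard_normal((M, N))        # rows l_i of the transformation
g = rng.standard_normal(M)             # scalar weights g_i
h = g @ (L @ phi)                      # encoded result, eq. (4.102)
assert np.allclose(h @ Psi, 0)         # error-free syndrome is zero

h[2] += 1.0                            # a single additive error
assert not np.allclose(h @ Psi, 0)     # nonzero syndrome detects it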

4.8 Summary

This chapter applied the general set-theoretic results of Chapter 3 to the special case of computation in an algebraic group. We showed that in order for the structure of the redundancy to be preserved throughout computation, redundancy must be added via an algebraic homomorphism. Conditions for error detectability and correctability were rephrased in terms of group operations, and we showed that error detection and correction may be performed in an efficient manner using a syndrome. We then extended these results to other algebraic systems that have the underlying structure of an Abelian group. Our results for groups carried over completely to these other algebraic systems. We concluded with a wide variety of examples of the application of our techniques. The results of this section were specifically geared towards computation performed in algebraic groups, rings, fields, and vector spaces. We feel, however, that many of our results are more widely applicable, especially those concerning error detection and correction. Section 4.3 was based entirely on these three assumptions:

(i) the set of outputs may be modeled as an Abelian group H.

(ii) valid results lie in a subgroup G̃ of H.

(iii) errors influence the result in an additive manner as shown in (4.21).

Any redundant system satisfying these properties, independent of exactly what computation occurs, may utilize the techniques which we have developed. The main drawback of this chapter is that we do not have, as of yet, a general method of determining possible homomorphisms φ. This makes the results and techniques developed difficult to apply. We solve this problem in Chapter 5 in which we give a step-by-step procedure for finding homomorphisms for the important class of systematic-separate codes.

Chapter 5

Systematic-Separate Codes

The results of the previous chapter are only useful if a procedure for determining possible homomorphisms exists. In this chapter, we consider the class of systematic-separate codes which protect computation using a parallel, independent parity channel. For these codes, we provide a constructive procedure for determining all possible homomorphisms. These homomorphisms may or may not be useful for protecting against the expected set of errors which would arise during computation, and the iterative code selection procedure is still needed to determine suitable codes. We first introduce systematic-separate codes in Section 5.1 and show how they fit into our framework. We then present a procedure for finding homomorphisms in Section 5.2, and prove that it identifies all possible systematic-separate codes. The technique relies upon a quotient group homomorphism, and reduces the problem of finding homomorphisms to that of finding subgroups. We derive the form of the error detection and correction algorithms in Section 5.3, and reinterpret redundancy conditions. Section 5.4 considers systems protected by multiple parity channels, and Section 5.5 extends the results for groups to rings, fields, and vector spaces. Finally, we present several examples of systems protected by these codes in Section 5.6, and conclude in Section 5.7.

5.1 Description of Codes

A systematic arithmetic code is defined as one in which a codeword is composed of two distinct parts: the original unencoded operand g, and a parity or checksum symbol t derived from g. For our purposes, we can model a systematic code as mapping G to a Cartesian product of groups, H = G x T, where T = [T; ◇, 0◇] denotes the group of parity symbols. An element g ∈ G is mapped to its corresponding element in H as follows:

h = φ(g) = [g, t] = [g, θ(g)]   (5.1)

where θ is the parity mapping from G to T. A nonsystematic code is defined as one which is not systematic. Systematic codes are desirable since the result decoder σ is unnecessary. Systematic codes can be further classified as being either separate or nonseparate depending upon how computation is performed on the codewords. In a separate code, the group operation ○ in H corresponds to componentwise multiplication of the operands and parity,

[g1, t1] ○ [g2, t2] ○ ... ○ [gP, tP]   (5.2)
   = [g1 ○ g2 ○ ... ○ gP, t1 ◇ t2 ◇ ... ◇ tP].   (5.3)

In a nonseparate code, interaction between operands and parity occurs. The general form of an operation protected by a systematic-separate code is shown in Figure 5-1. It consists of two parallel, independent channels. In the upper channel, the original product g1 ○ g2 ○ ... ○ gP which we sought to protect is computed. We will refer to this as the original or main computation. In the lower channel, we compute the parity symbols, and then their product t1 ◇ t2 ◇ ... ◇ tP. Then the results of both channels are used to detect and correct errors. The details of the error detection and correction procedures are developed in Section 5.3. Since φ is a homomorphism, this places additional constraints on θ. Starting with (4.16)

Figure 5-1: General model of a group operation protected by a systematic-separate coding scheme.

and substituting in (5.1), we find that

[g1○g2, θ(g1○g2)] = [g1, θ(g1)] ○ [g2, θ(g2)]   (5.4)
                 = [g1○g2, θ(g1) ◇ θ(g2)].   (5.5)

Equating terms in T, we find that θ must satisfy

θ(g1 ○ g2) = θ(g1) ◇ θ(g2) for all g1, g2 ∈ G.   (5.6)

Thus θ must be a homomorphism from G to T. The problem of finding systematic-separate codes becomes that of finding groups T and homomorphisms θ. This is the same problem which we faced in Chapter 4 with H and φ. In order to solve this problem, we make an important simplifying assumption: we require that θ map G onto T. We make this assumption for three important reasons. First, requiring θ to be onto ensures that T will have the same number of elements as G, or fewer. Hence the complexity of the parity calculation ◇ will most likely be the same as, or less than, the complexity of the original product ○. Second, efficient use of parity symbols is made since there are no unused elements. Third, the structure of G is heavily reflected in T, and we can use a well-known isomorphism involving quotient groups of G to identify possible groups T and homomorphisms θ.

5.2 Determination of Possible Homomorphisms

We now begin the derivation of the main result of this chapter. Our goal is to provide a constructive procedure for determining all possible groups T = [T; ◇, 0◇] and homomorphisms θ which yield systematic-separate codes. We rely upon a quotient group isomorphism, and find groups T̃ = [T̃; ◇, 0◇] which are isomorphic to all possible parity groups T. As in Chapter 4, we will never perform computation in a quotient group by manipulating cosets. Rather, by forming the quotient group T̃, we will recognize that T̃ is isomorphic to a simpler group T in which computation can be more conveniently performed. We showed in Section 4.3.3 that the collection of cosets H/G̃ forms a group under the

product of subsets. This group was referred to as the quotient group of H by G̃, and its construction relied on the fact that G̃ was a subgroup of H. A quotient group G/N may be formed in a similar manner using any subgroup N of G, and a homomorphism from G onto G/N is given by θ(g) = N○g. Letting T = G/N and θ(g) = N○g, we have a procedure for finding suitable parity groups and mappings. By selecting different subgroups N, we obtain different groups T and mappings θ. We now show that this procedure is guaranteed to find, up to isomorphisms, all possible groups T and mappings θ. Assume that we have some homomorphism θ from G onto T. We prove that T is isomorphic to G/N for some subgroup N of G.

Theorem 4. Let θ be a homomorphism from G onto T. The kernel K_θ is a subgroup of G.

Proof. See Lemma 2.7.3 in [45]. □

Theorem 5. Let θ be a homomorphism from G onto T with kernel K_θ. The quotient group G/K_θ is isomorphic to T.

Proof. See Theorem 2.7.1 in [45]. □

Thus, any group of parity symbols T is isomorphic to G/N for some subgroup N. A natural question arises: by finding all subgroups N of G, is our method guaranteed to find all possible systematic-separate coding schemes? This is answered in the following theorem.

Theorem 6. Let θ be a homomorphism from G onto T, and let θ' be another homomorphism from G onto T'. If K_θ = K_θ', then T ≅ T'.

Proof. We know by Theorem 5 that T ≅ G/K_θ and T' ≅ G/K_θ' = G/K_θ. Using transitivity we find that T ≅ T'. □

Thus, by finding all possible subgroups N of G, we are able to find, up to isomorphisms, all possible parity groups. We do not distinguish between isomorphic copies of the same parity group because they are structurally identical and have equivalent error detecting and correcting properties.

The reason that we can always define T in this manner is that structurally G and G/N are very similar. Given an element t ∈ G/N, we know its corresponding element g ∈ G "up to N." That is, we know that g is in the coset N○g. A certain amount of information is blotted out by this mapping, but the remaining information is often sufficient for detecting and correcting errors [45]. The problem of finding groups T and homomorphisms θ has been reduced to one of finding subgroups. For many groups that compute useful arithmetic operations, finding subgroups is a trivial task. In these instances we are able to determine all applicable systematic-separate coding schemes.

5.3 Error Detection and Correction

In this section we derive the form of the error detection and correction algorithms which would be used in a systematic-separate coding scheme. We also restate the redundancy conditions in terms of the specific structure of systematic-separate codes. Assume that we expect up to λ simultaneous errors to occur during computation. From the error model (4.21) and from the definition of φ, the result [g', t'] may be written as

[g', t'] = [g, t] ○ [γ, τ]   (5.7)
        = [g○γ, t◇τ]   (5.8)

where [g, t] is the error-free result and [γ, τ] is the net effect of the error on the result. γ and τ represent the errors in the main and parity channels respectively. [γ, τ] is an element of E(λ), the set containing up to λ simultaneous errors. The key to understanding systematic-separate codes is to determine the form of the syndrome homomorphism ψ. From the definition of φ, the subgroup of valid results is given by

G̃ = {[g, θ(g)] | g ∈ G}.   (5.9)

Forming the quotient group H/G̃, we see that an arbitrary coset has the form

G̃ ○ [a, b] = {[g○a, θ(g)◇b] | g ∈ G}   (5.10)
           = {[g', θ(g'○a⁻¹)◇b] | g' ∈ G}   (5.11)
           = {[g', θ(g')◇θ(a⁻¹)◇b] | g' ∈ G}   (5.12)

for some a ∈ G and b ∈ T. Two cosets G̃○[a, b] and G̃○[c, d] are equivalent if and only if θ(a⁻¹)◇b = θ(c⁻¹)◇d. Thus, the entire quotient group consists of the elements

T̃ = {t_a | a ∈ T}   (5.13)

where

t_a = {[g, θ(g)◇a] | g ∈ G}.   (5.14)

The mapping ψ from H onto H/G̃ is found from (5.12), and equals

ψ([a, b]) = G̃ ○ [a, b] = t_{θ(a)⁻¹ ◇ b}.   (5.15)

Multiplying cosets, we find that the group product in T̃ is given by

t_a ◇ t_b = {[g, θ(g)◇a] ○ [g', θ(g')◇b] | g, g' ∈ G}   (5.16)
         = {[g○g', θ(g○g')◇a◇b] | g, g' ∈ G}   (5.17)
         = {[g, θ(g)◇a◇b] | g ∈ G}   (5.18)
         = t_{a◇b}.   (5.19)

From this, we recognize that T̃ is isomorphic to the original parity group T. The mapping which computes the syndrome of the result [g', t'] is given by

ψ([g', t']) = θ(g')⁻¹ ◇ t'.   (5.20)

Detecting and correcting errors in a systematic-separate code is now a matter of computing the syndrome and using it to identify the error. This may be done using a lookup table as is illustrated in Figure 5-1. Note that we do not bother correcting the result t' of the parity channel since we are only interested in the result g of the main channel. Given the syndrome homomorphism ψ, it is easy to determine redundancy conditions by applying (4.37). Substituting in the definition of ψ, we find that we can detect D and correct C errors if and only if

θ(γ1)⁻¹ ◇ τ1 ≠ θ(γ2)⁻¹ ◇ τ2 for all [γ1, τ1] ∈ E(C), [γ2, τ2] ∈ E(D), [γ1, τ1] ≠ [γ2, τ2].   (5.21)

In many cases, the complexity of the main channel far outweighs that of the parity channel. In these instances, we might expect the probability of failure in the main channel to be much larger than the probability of failure in the parity channel, and we could model the parity channel as being error-free. Errors would then be of the form

E1 = {[γ, 0◇] | γ ∈ Γ1}   (5.22)

where Γ1 is the set of all possible single errors occurring in the main channel. In order to detect D and correct C errors of this form requires that

θ(γ1) ≠ θ(γ2) for γ1 ∈ Γ(C), γ2 ∈ Γ(D), γ1 ≠ γ2.   (5.23)

An alternate method of analyzing systematic-separate codes is to study undetectable errors. An error [γ, τ] is undetectable if it leads to a syndrome that equals 0◇. This occurs when θ(γ) = τ. These errors have a balanced effect on the main and parity channels and corrupt the result in a manner such that it appears to be error-free. If a fault corrupts the main channel only, such that the error is of the form [γ, 0◇], then it is undetectable if and only if θ(γ) = 0◇. This implies that γ ∈ K_θ. Undetectable errors of this type do not affect which coset the result of the main channel lies in. If a fault corrupts the parity channel only, such that the error is of the form [0○, τ], then the error is always detectable since it leads to a nonzero syndrome.
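A minimal sketch of the syndrome test (5.20), assuming the hypothetical parity map θ(g) = g mod 7 on integer addition (any homomorphism onto a parity group works the same way):

M = 7
theta = lambda g: g % M

def protected_add(g1, g2, fault=0):
    gp = g1 + g2 + fault                      # main channel (possibly faulty)
    tp = (theta(g1) + theta(g2)) % M          # independent parity channel
    syndrome = (tp - theta(gp)) % M           # psi([g', t']) per (5.20)
    return gp, syndrome

assert protected_add(19, 23)[1] == 0          # error-free: zero syndrome
assert protected_add(19, 23, fault=3)[1] != 0 # main-channel error detected

Consistent with the discussion above, a main-channel error is missed exactly when the fault is a multiple of 7, i.e., when it lies in the kernel of θ.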

5.4 Multiple Parity Channels

We have thus far been analyzing systems protected by a single parity channel. We now discuss how several parity channels can be combined to protect against a wider range of errors. Our main result is that a system protected by multiple parity channels is always isomorphic to a system protected by a single parity channel of higher order. Consider a system containing N independent parity channels as shown in Figure 5-2. This figure, for simplicity, illustrates the protection of a single group product. Multiple products may be protected in the same manner. The underlying structure of this system is identical to that of Figure 5-1 except that a total of N parity channels are used to check the result. Denote the parity group used by the ith parity channel by Ti, and denote the ith parity mapping by θi. Also denote the kernel of the ith parity mapping by K_θi and the ith syndrome by si. Since the parity channels are independent and parallel, and each performs a group operation, the collection of parity channels is itself a group T which is isomorphic to the direct product T1 x T2 x ... x TN. It would be convenient if the multiple parity system were equivalent to a system protected by a single parity channel utilizing group T. This, however, is not the case since the overall homomorphism from G to T1 x T2 x ... x TN is not necessarily onto. Instead, we must analyze this system by determining the kernel K_θ of the overall homomorphism from G to T1 x T2 x ... x TN. In order for g ∈ G to be in K_θ, g must be in K_θi for all i. Therefore,

K_θ = K_θ1 ∩ K_θ2 ∩ ... ∩ K_θN.   (5.24)

The performance of the multiple parity channel system is equivalent to a system with a single parity group T ≅ G/K_θ. Although multiple parity channels are isomorphic to a single channel, they may have several advantages. The multiple channel system decomposes the parity computation into operations in N smaller groups. This yields more efficient, less complex parity operations in a manner similar to residue number systems [15]. By selecting kernels K_θi containing common elements, it is possible to incorporate redundancy directly into the parity channel. This redundancy, though not providing additional coverage for the main channel, can be

Figure 5-2: Diagram showing the architecture of a system protected via multiple parity channels. The example shows a single group product protected by N independent channels.

used to detect and correct errors within the parity channels. Lastly, the multiple channel architecture will have a different set of parity channel errors than the single channel system. This set may lend itself more readily to error detection and correction.
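The equivalence implied by (5.24) can be checked directly. The sketch below is ours (the moduli 3 and 5 are hypothetical); it confirms that two residue parity channels have the same kernel as one channel whose modulus is their least common multiple.

from math import gcd

m1, m2 = 3, 5
m = m1 * m2 // gcd(m1, m2)       # modulus of the equivalent single channel

for g in range(100):
    # g is in the kernel of both channels iff it is in the kernel of the
    # combined channel: K_theta = K_theta1 intersected with K_theta2.
    assert (g % m1 == 0 and g % m2 == 0) == (g % m == 0)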

5.5 Application to Other Algebraic Systems

This section discusses the construction of systematic-separate codes in other algebraic systems. We cover rings, fields, and vector spaces, which are algebraic systems containing two distinct operations. The general form of the computation which we desire to protect is some arbitrary sequence of these operations applied to the input operands. We will show that systematic-separate codes may be defined that are capable of protecting computation of this form. The essential properties of systematic-separate codes are unchanged from those for groups. The only major difference is the manner in which homomorphisms are found. A quotient-type construction is still used, but other algebraic structures take the place of subgroups.

5.5.1 Rings

The kernels of ring homomorphisms are algebraic structures known as ideals.

Definition 11. An ideal N of a ring G = [G; +, ×, 0] is a subgroup under + with the additional property that

ng ∈ N and gn ∈ N for all n ∈ N and g ∈ G.   (5.25)

Quotient rings are constructed as G/N, and by finding all ideals, we are guaranteed to find all ring homomorphisms. Error detection and correction reduces to a syndrome test, with the same structure shown in Figure 5-1.

5.5.2 Fields

As in rings, the kernels of field homomorphisms are ideals. Unfortunately, a field contains only trivial ideals, N = G and N = {0}, and thus a nontrivial systematic-separate

code cannot be defined. The only alternative is to use modular redundancy or to embed computation in a larger ring using a nonsystematic code.

5.5.3 Vector Spaces

Systematic-separate codes are of special interest in vector spaces because a nonsystematic encoding may be computationally expensive. The kernels of vector space homomorphisms are algebraic structures known as subspaces.

Definition 12. Let G = [G; ○, 0○] be a vector space over F. A subspace N of G is a set such that

αg1 ○ βg2 ∈ N for all α, β ∈ F and g1, g2 ∈ N.   (5.26)

Quotient spaces are constructed as G/N, and by finding all subspaces, we are guaranteed to find all systematic-separate codes. Quotient spaces are easily described for the vector space F^(N). The homomorphism from F^(N) onto F^(C), C < N, is always isomorphic to the vector-matrix multiplication

t = θ(g) = gΘ   (5.27)

where Θ is an N x C dimensional matrix of rank C. The kernel of θ corresponds to the left nullspace of Θ. Because of the flexibility in choosing subspaces as kernels, the set of all possible homomorphisms is equivalent to the set of all rank C matrices of dimension N x C. This large set of available homomorphisms allows codes to be constructed that protect against a wide variety of errors. Also, the structure of these codes is similar to that of linear error-correcting codes, and we may apply results from this area in a straightforward manner.

5.6 Examples

We now illustrate our theory through several examples. Our intention is to derive the form of the redundancy in a wide variety of systems by using a quotient-type construction. Many

of these examples are equivalent to coding schemes already presented by other authors. The power of our approach is that it is not only able to derive the form of redundancy in these systems, but also, in many instances, to show that no other coding schemes exist. For simplicity, most of the examples deal with protecting a single binary operation, but they may be extended to multiple operations in a natural manner.

5.6.1 Trivial Codes

We begin by giving two coding schemes that can be applied to any group. Every group G contains two trivial subgroups, N = {0○} and N = G, and these yield two systematic-separate codes. First, consider N = {0○}. Then T = G/N is isomorphic to G. Computation in the parity channel is isomorphic to the main channel, and this corresponds to double modular redundancy. Second, when N = G, then T = G/N = [{0◇}; ◇, 0◇]. θ maps all elements to the identity 0◇. Although defining a valid homomorphism, this encoding has no error detecting or correcting ability. Trivial codes have exact counterparts in rings, fields, and vector spaces.

5.6.2 Integer Residue Codes

Let G = Z⁺ be the ring of integers under standard integer addition and multiplication. Ideals of G are of the form

N = {0, ±M, ±2M, ...}   (5.28)

where M is an integer. If M = 0 or M = 1, then the kernel defines a trivial code, and we assume in the following that M ≥ 2. Cosets of N in G are of the form

N + g = {g, g ± M, g ± 2M, ...}.   (5.29)

Since elements of the kernel are evenly spaced by M, the cosets N + g and N + (g + iM) are equal for all integers i. Thus, there are only M unique cosets, and we denote them by

t0 = N + 0 = {0, ±M, ±2M, ...}
t1 = N + 1 = {1, 1 ± M, 1 ± 2M, ...}
⋮
t(M−1) = N + (M − 1) = {(M − 1), (M − 1) ± M, (M − 1) ± 2M, ...}.

The mapping θ̃ from G onto T̃ maps g ∈ G to the coset containing g, and is given by

θ̃(g) = t_{(g)_M}.   (5.30)

The sum and product in T̃ can be shown to be

t_i + t_j = (N + i) + (N + j) = N + (i + j)_M = t_{(i+j)_M}   (5.31)

t_i t_j = (N + i)(N + j) = N + (ij)_M = t_{(ij)_M}.   (5.32)

Examining T̃, we see that it is isomorphic to T = Z_M⁺, the ring of integers {0, 1, ..., M − 1} under modulo M addition and multiplication. This isomorphism is accomplished by mapping coset t_i to the integer i. The mapping θ from G to T equals

t = θ(g) = (g)_M   (5.33)

and the parity operations equal

t_i + t_j = (t_i + t_j)_M   (5.34)

t_i t_j = (t_i t_j)_M.   (5.35)

This fault-tolerance scheme is equivalent to a residue checksum modulo M. In fact, we have just shown that the only systematic-separate code capable of protecting integer addition and multiplication is a residue checksum. The only flexibility which we have is in the choice of M. This is the same result obtained by Peterson [47], but we have obtained it through

a group-theoretic argument. Let g' and t' denote the possibly faulty results of the main and parity channels. In order to detect and correct errors, we compute the syndrome

S = t' − (g')_M.   (5.36)

If S = 0 we declare that the result is correct. Otherwise, the value of S identifies the specific error. Since the parity ring contains M - 1 nonzero values, the system can correct up to M- 1 different errors.

If G were instead the finite ring Z_N⁺ containing N elements, then we can apply a similar checking scheme. The only difference is that we are now constrained in the choice of the modulus M. In order for (5.33) to define a valid homomorphism, M must divide N.
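The following sketch is our illustration (M = 13 is a hypothetical modulus); it protects a chain of integer multiplications with the residue checksum and the syndrome (5.36).

M = 13

def protected_product(gs, fault=0):
    g = 1
    for x in gs:
        g = g * x
    g += fault                                # additive error in main channel
    t = 1
    for x in gs:
        t = (t * (x % M)) % M                 # parity channel, mod M throughout
    return g, (t - g) % M                     # result and syndrome S = t' - (g')_M

assert protected_product([7, 11, 5])[1] == 0
assert protected_product([7, 11, 5], fault=2)[1] != 0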

5.6.3 Real Residue Codes

Residue codes can also be defined to protect the addition of real numbers. Let G = R⁺ = [R; +, 0] be the group of real numbers where + denotes normal real number addition. Subgroups of G are also of the form shown in (5.28) where M is now a real number. The quotient group T̃ = G/N contains an infinite number of elements, and we represent its members by the cosets {t_a | 0 ≤ a < M}.

The mapping θ̃ becomes

θ̃(g) = t_{(g)_M^R}   (5.39)

where (g)_M^R denotes the real number residue of g modulo M which we define as

(g)_M^R = g − pM   (5.40)

where p is an integer chosen such that 0 ≤ (g)_M^R < M. The product in T̃ is given by

t_a ◇ t_b = t_{(a+b)_M^R}.   (5.41)

Examining (5.39) and (5.41), we see that T̃ is isomorphic to the group

T = [{t | 0 ≤ t < M}; (+)_M^R, 0].   (5.42)

The mapping θ from G to T is given by

t = θ(g) = (g)_M^R   (5.43)

and the product of elements in T equals

t_i ◇ t_j = (t_i + t_j)_M^R.   (5.44)

This coding scheme is essentially a residue checksum defined over the real numbers. Residue codes can also be defined over the group C⁺ of complex numbers under addition. Since C⁺ is isomorphic to the direct product R⁺ x R⁺, the parity channel is isomorphic to two real residue codes; one defined on the real part, and one on the imaginary part. Ignoring trivial codes in either channel, the general form of the code is

T = [{[r1, r2] | 0 ≤ r1 < M1, 0 ≤ r2 < M2}; ◇, [0, 0]]   (5.45)

t_g = [t_{g,r}, t_{g,i}] = θ(g) = [(Re[g])_{M1}^R, (Im[g])_{M2}^R]   (5.46)

and

[t_{a,r}, t_{a,i}] ◇ [t_{b,r}, t_{b,i}] = [(t_{a,r} + t_{b,r})_{M1}^R, (t_{a,i} + t_{b,i})_{M2}^R]   (5.47)

where M1 and M2 are arbitrary positive real numbers. These two examples illustrate how addition of real or complex numbers may be protected by a residue code. In both cases, we have identified all possible subgroups and thus we have identified the form of all possible coding schemes. One may be tempted, as in integer

residue codes, to protect multiplication as well as addition of real or complex numbers. Unfortunately, R and C are fields and we therefore cannot simultaneously protect addition and multiplication.
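A real residue checksum behaves just like the integer case, up to floating-point roundoff. The sketch below is our illustration (M = 2.5 and the tolerance are hypothetical choices); it checks a protected sum of reals.

M = 2.5
theta = lambda g: g % M                       # real residue, 0 <= theta(g) < M

def check_sum(gs, fault=0.0, tol=1e-9):
    g = sum(gs) + fault                       # main channel
    t = 0.0
    for x in gs:
        t = (t + theta(x)) % M                # parity channel
    # compare parity with the residue of the result, allowing roundoff
    return min((t - theta(g)) % M, (theta(g) - t) % M) < tol

assert check_sum([1.7, 3.9, 0.4])
assert not check_sum([1.7, 3.9, 0.4], fault=1.0)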

5.6.4 Multiplication of Nonzero Real Numbers

It is possible to protect only the multiplication of real numbers. Let G = [R*; ×, 1] where R* denotes the set of nonzero real numbers, and × is ordinary real number multiplication. For simplicity of notation, we drop the group product in g1 × g2 and merely write g1g2. Exponents will also have the expected meaning of repeated products. The group G has several types of subgroups, and each leads to a different coding scheme. Some subgroups are of the following types

N_I = {1}
N_II = G
N_III = {g | g > 0 and g ∈ G}
N_IV = {−1, 1}
N_V = {..., ±M⁻², ±M⁻¹, ±1, ±M, ±M², ...} where M ∈ G
N_VI = {..., M⁻², M⁻¹, 1, M, M², ...} where M ∈ G

As we have already shown, N_I and N_II lead to trivial codes. Now consider the kernel N_III. It partitions G into two cosets which we denote by t+ and t−,

t+ = {g | g > 0, g ∈ G}   (5.48)
t− = {g | g < 0, g ∈ G}.   (5.49)

The homomorphism θ̃ is given by

θ̃(g) = { t+ if g > 0
        { t− if g < 0.   (5.50)

Multiplying cosets, we find that the product ◇ in T̃ obeys

t+ ◇ t+ = t+    t+ ◇ t− = t−    t− ◇ t+ = t−    t− ◇ t− = t+.

From this we can see that T̃ is isomorphic to T = Z2⁺, the integers modulo 2. The mapping θ from G to T is given by

θ(g) = { 0 if g > 0
       { 1 if g < 0.   (5.51)

This coding scheme is shown in Figure 5-3. Basically, it checks the result using the rule:

positive × positive = positive
positive × negative = negative
negative × positive = negative
negative × negative = positive.

Errors influence the result through the group product, in this case, multiplication. If an error e occurs in the main channel, then it is undetectable if e > 0. We can only detect an error if it affects the sign of the result. The kernel N_IV partitions G into an infinite number of cosets of the form

t_g = N_IV × g = {−g, g}.   (5.53)

We will choose the set of cosets {t_g | g > 0} to represent the elements of T̃. The mapping θ̃ from G onto T̃ becomes

θ̃(g) = t_{|g|}   (5.54)

where |g| denotes the absolute value of g. Multiplying elements of T̃ we find that

t_a ◇ t_b = t_{ab}.   (5.55)

Figure 5-3: This figure illustrates one possible systematic-separate coding scheme which may be used to protect the multiplication of nonzero real numbers. The code was derived using the subgroup N_III and is equivalent to a sign check of the result.

The group T is isomorphic to the group of positive real numbers under multiplication,

T = [{t | t > 0, t ∈ R}; ×, 1]. Then

t = θ(g) = |g|   (5.56)

and the group product is real number multiplication of strictly positive values

t_a ◇ t_b = t_a t_b.   (5.57)

This coding scheme is shown in Figure 5-4. In the parity channel, we perform the original multiplication ignoring the signs of the operands. Then we check if this equals the absolute value of the result from the main channel. This scheme can detect all errors in the main channel except e = −1. This error affects only the sign, but not the absolute value, of the result. It is interesting to note that a combination of types III and IV parity channels can detect any error in the main computation channel. This is a consequence of the results for

99 ------^ ------

gl

g2

g

gl

g2

Figure 5-4: Systematic-separate coding scheme derived from the kernel NIV for protecting the product of nonzero real numbers. The code checks only the magnitude of the result. multiple parity channels. The intersection of these kernels equals

N_III ∩ N_IV = N_I = {1}   (5.58)

and the overall parity computation is isomorphic to the original computation in G. This is equivalent to double modular redundancy. The kernel N_V leads to yet another coding scheme. The cosets T̃ = G/N_V are of the form

t_g = N_V × g = {..., ±gM⁻², ±gM⁻¹, ±g, ±gM, ±gM², ...}.   (5.59)

As before, we choose a convenient set of cosets to represent the elements of the quotient group, T̃ = {t_g | 1 ≤ g < M}. The mapping θ̃ is then given by

t_g = θ̃(g) = t_{((|g|))_M}   (5.60)

where

((x))_M = x M⁻ᵖ   (5.61)

and p is an integer chosen such that 1 ≤ ((x))_M < M.

Figure 5-5: Systematic-separate coding scheme derived from the kernel N_V for protecting the product of nonzero real numbers.

Multiplying cosets we find that

t_a ◇ t_b = t_{((ab))_M}.   (5.62)

The operation performed by the parity channel can be more easily understood if we map T̃ to an isomorphic copy T = [{t | 0 ≤ t < log M}; (+)_{log M}^R, 0]. The element t_g ∈ T̃ is renamed log g. Then

t_a ◇ t_b = (t_a + t_b)_{log M}^R   (5.63)

and the mapping θ from G to T is given by

t_g = θ(g) = (log |g|)_{log M}^R.   (5.64)

The basic structure of this coding scheme is shown in Figure 5-5. By defining T in this manner, we see that this is in essence a real residue code defined on the logarithm of the operands. Also, the signs of the operands are dropped in the parity channel and we work with positive numbers. The kernel N_VI leads to an even more complicated system, and its structure can be understood by noting that

N_VI = N_III ∩ N_V.   (5.65)

Hence, this kernel yields a system which is isomorphic to a parallel combination of types III and V codes. Thus we see that each type of kernel yields a different coding scheme, and each scheme has different error detecting and correcting properties. The code to use in a specific application depends mainly on the set of errors that we want to protect against. Code type III can only protect against errors in the sign of the result while the other codes can protect against a wider variety of errors. These other codes, however, are computationally expensive, and further work is needed to refine them into efficient and practical implementations.
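The type III and type IV channels are simple to combine in practice. The sketch below is our illustration (all names are hypothetical); it runs both parity channels alongside a product of nonzero reals and, consistent with (5.58), catches both a pure sign error and a pure magnitude error.

def protected_product(gs, fault=1.0, tol=1e-9):
    g = 1.0
    for x in gs:
        g = g * x
    g *= fault                                  # errors act multiplicatively
    sign_parity = 0                             # type III channel, in Z_2
    mag_parity = 1.0                            # type IV channel, |.| products
    for x in gs:
        sign_parity ^= (x < 0)
        mag_parity *= abs(x)
    sign_ok = (g < 0) == bool(sign_parity)
    mag_ok = abs(abs(g) - mag_parity) < tol
    return sign_ok and mag_ok

assert protected_product([-2.0, 3.0, -0.5])
assert not protected_product([-2.0, 3.0, -0.5], fault=-1.0)   # sign flip caught
assert not protected_product([-2.0, 3.0, -0.5], fault=2.0)    # magnitude caught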

5.6.5 Linear Convolution

An important operation in signal processing is the linear convolution of two sequences. Let a[n] and b[n] be P-point sequences which are nonzero for 0 ≤ n ≤ P − 1. The linear convolution, denoted by c[n] = a[n] * b[n], results in a Q = 2P − 1 point sequence c[n] as follows,

c[n] = Σ_{i=0}^{P−1} a[i] b[n − i] for n = 0, 1, ..., Q − 1.   (5.66)

Assume that the samples are members of a field F. For example, F could be the field of real or complex numbers. Denote by F[x] the set of all polynomials in an indeterminate x with coefficients in F. Then G = [F[x]; +, ×, 0] is a ring with + and × denoting ordinary polynomial addition and multiplication. We will represent the sequence a[n] as an element of G as follows:

a(x) = Σ_{i=0}^{P−1} a[i] xⁱ   (5.67)

and will similarly represent b[n] and c[n] by b(x) and c(x). It can be shown that the convolution c[n] = a[n] * b[n] corresponds to the polynomial product

c(x) = a(x) x b(x). (5.68)

Thus, if we can protect this ring product, we can protect convolution.

To protect (5.68), we choose a systematic-separate code. The ideals of G are sets of polynomials of the form

N = {g(x) M(x) | g(x) ∈ G}   (5.69)

where M(x) is an element of G. We skip the details of this derivation and proceed directly to the final result. We compute the parity symbols

t_a(x) = (a(x))_{M(x)}
t_b(x) = (b(x))_{M(x)}   (5.70)

where (a(x))_{M(x)} denotes the remainder when a(x) is divided by M(x), and then perform the parity computation

t_c(x) = (t_a(x) × t_b(x))_{M(x)}.   (5.71)

Since all ideals are of the form shown in (5.69), all systematic-separate codes are isomorphic to the one presented here. To finish the design of a fault-tolerant convolution system, we must choose the polynomial M(x) to protect against the expected set of errors. We can apply existing error coding techniques by noting that (5.70) is the standard method of encoding a systematic cyclic error-correcting code. Furthermore, fast algorithms for detecting and correcting errors exist [14]. The technique outlined in this section is similar to a method proposed by Redinbo [43] and discussed in Section 2.4.5. He protects R-point circular convolution which is equivalent

to convolution in the polynomial ring modulo N(x) = x^R − 1. Recall from our example of integer residue checksums in Section 5.6.2 that when we changed from the infinite group Z⁺ to the finite group Z_N⁺, the modulus of the residue channel had to divide N. Similarly, to protect convolution modulo N(x), it can be shown that M(x) must divide N(x). The scheme which we developed in this section is isomorphic to the one proposed by Redinbo, and hence they have equivalent error detecting and correcting properties. Our scheme, however, is preferable since the parity encoding θ and parity computation are easier to implement.
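The parity computation (5.70)-(5.71) can be prototyped directly with polynomial division. The sketch below is ours (M(x) = x² + 1 is a hypothetical modulus, and numpy's legacy polynomial routines, coefficients listed highest power first, are used for brevity); it checks a small convolution.

import numpy as np

M = np.array([1.0, 0.0, 1.0])                  # M(x) = x^2 + 1

def pmod(a):
    return np.polydiv(a, M)[1]                 # remainder (a(x)) mod M(x)

a = np.array([1.0, 2.0, 3.0])                  # a(x) = x^2 + 2x + 3
b = np.array([2.0, 1.0])                       # b(x) = 2x + 1

c = np.polymul(a, b)                           # main channel: full convolution
t = pmod(np.polymul(pmod(a), pmod(b)))         # parity channel, eq. (5.71)
syndrome = pmod(np.polysub(pmod(c), t))
assert np.allclose(syndrome, 0)                # error-free result checks out

c[1] += 1.0                                    # corrupt one output sample
syndrome = pmod(np.polysub(pmod(c), t))
assert not np.allclose(syndrome, 0)            # corruption is detected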

5.6.6 Linear Transformations

We continue the example of linear transformations which was begun in Section 4.7.5. Recall that the underlying operation which we sought to protect occurs in the N-dimensional vector space R^(N) and consists of the weighted sum of vectors shown in (4.100). As discussed in Section 5.5.3, any parity vector space used to check this computation is isomorphic to a C-dimensional vector space, and the parity encoding is given by (5.27). Thus to protect (4.100), we first compute the parity vectors

t_i = l_i Θ for i = 1, 2, ..., M   (5.72)

where Θ is an N x C-dimensional matrix with rank C. Then, we compute the original computation along with an identical sequence of parity operations,

q = Σ_{i=1}^{M} g_i t_i.   (5.73)

(In this example, we will use q to denote the result of the parity channel rather than t.) We can write the main and parity computations jointly as

[ r  q ] = [ g1 ... gM ] [ l1  t1 ]
                         [ ⋮   ⋮  ]   (5.74)
                         [ lM  tM ]

This emphasizes that the sequence of operations applied to the parity vectors is identical to that applied to the vectors l_i. This coding scheme is similar to the weighted checksum code used by Jou and described in Section 2.4.2. Errors are modeled as adding some vector e to the result of the main channel, yielding r' = r + e. Error detection consists of computing the syndrome s = q − r'Θ. If s = 0, then we declare that no errors have occurred. This can detect any error e as long as eΘ ≠ 0. Errors can be corrected if each error has a unique nonzero syndrome. We can generalize this example from a single to multiple linear transformations, and this leads to yet another practical coding scheme. Suppose that we wish to apply the same linear transformation L to P vectors g1, ..., gP, obtaining results r1, ..., rP. Using the approach from the previous paragraphs yields a scheme in which each transformation is independently protected. The overall computation involved equals

[ r1  q1 ]   [ g1 ]   [ l1  t1 ]
[ ⋮   ⋮  ] = [ ⋮  ]   [ ⋮   ⋮  ]   (5.75)
[ rP  qP ]   [ gP ]   [ lM  tM ]

where there is a parity q_i corresponding to the ith result r_i.

Suppose that each transformation r_i = g_i L is computed on a separate processor and that a general error model is used. If an error occurs in the processor computing the ith transformation, then

r'_i = r_i + e_i   (5.76)

where e_i is some element of R^(N). The coding scheme, in its present form, is unable to protect against these errors since eΘ = 0 for some e ∈ R^(N). A possible coding scheme to protect against this set of errors may be derived by adding redundancy between rows. Rewrite (5.75), without the parity vectors t_i, using column vector notation as follows

[ f1 ... fN ] = [ k1 ... kM ] [ l11 ... l1N ]
                              [ ⋮        ⋮  ]   (5.77)
                              [ lM1 ... lMN ]

The jth column of the result is formed from the weighted sum of column vectors

f_j = Σ_{i=1}^{M} l_ij k_i   (5.78)

where l_ij equals the ith element of the column vector l_j. We can thus consider computation to consist of a sequence of column operations in R^(P), and add redundancy accordingly. We

compute the parity column vectors from the columns k_j as follows

t_j = Θ k_j for j = 1, 2, ..., M   (5.79)

where Θ is now a C x P dimensional matrix of rank C. We then perform computation in the main and parity channels. The overall computation equals

[ f1 ... fN ]   [ k1 ... kM ]
[ q1 ... qN ] = [ t1 ... tM ] L   (5.80)

where the vector q_j denotes the result of the jth parity channel. We detect errors by computing syndromes for each parity vector

s_j = q_j − Θ f_j for j = 1, 2, ..., N.   (5.81)

If s_j = 0 for all j, then we declare that the result is error-free. This example brings out a very important point: the form of the redundancy depends upon how we view the underlying operations occurring in the system (i.e., what algebraic system G we select to model computation). In both cases, the computation which we sought to protect consisted of applying the same linear transformation L to P different operand vectors. At first, we considered the underlying computation to consist of row operations, and this yielded the coding scheme shown in (5.75). Next, we considered the computation to consist of column operations and this led to the coding scheme shown in (5.80). It is also important to keep in mind that both of these coding schemes define valid homomorphisms, and that either homomorphism may be used independently of exactly which operations are used to compute the actual result. For example, the computation shown in (5.80) does not have to be performed as column operations; we can compute each row of the result independently, as originally suggested, and use the second form of parity information,

[ r1 ]   [ g1 ]
[ ⋮  ]   [ ⋮  ]
[ rP ] = [ gP ] L   (5.82)
[ q1 ]   [ t1 ]
[ ⋮  ]   [ ⋮  ]
[ qC ]   [ tC ]

where the parity rows t_1, ..., t_C are the rows of Θ [g1; ...; gP].

This flexibility allows redundancy to be added which protects against the expected set of hardware errors. Sufficient redundancy exists to detect and correct errors of the form (5.76). This coding scheme is equivalent to the one proposed by Musicus and Song and outlined in Section 2.4.3. They give general guidelines for selecting the parity encoding matrix Θ and describe how to perform error detection and correction efficiently. An interesting extension to this coding scheme is to combine the row checksums of (5.75) and the column checksums of (5.80). This incorporates a higher amount of redundancy and leads to a more robust system. This combined approach is similar to the weighted matrix checksum code presented by Jou and described in Section 2.4.2.
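The column-checksum scheme is compact in matrix form. The sketch below is our illustration (the sizes and the all-ones Θ are hypothetical); it applies (5.80) with the syndrome test (5.81), and shows that corrupting an entire result row, as in the processor failure model (5.76), is detected.

import numpy as np

rng = np.random.default_rng(1)
P, Mdim, N, C = 4, 3, 5, 1
G = rng.standard_normal((P, Mdim))             # P operand row vectors g_i
L = rng.standard_normal((Mdim, N))             # the shared transformation
theta = np.ones((C, P))                        # C x P parity matrix of rank C

R = G @ L                                      # main channel: one row per processor
Q = (theta @ G) @ L                            # parity channel
assert np.allclose(Q - theta @ R, 0)           # syndromes (5.81) are zero

R[2, :] += rng.standard_normal(N)              # processor 3 fails: whole row corrupted
assert not np.allclose(Q - theta @ R, 0)       # every corrupted column is flagged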

5.6.7 Gaussian Elimination and Matrix Inversion

As the previous example illustrates, any computation consisting of a weighted sum of vectors may be placed in our framework. Gaussian elimination and matrix inversion are two important computational procedures that may be described in this fashion. Gaussian elimination is a well-known procedure for the solution of simultaneous linear equations [48]. Let A be an N x N matrix consisting of the coefficients of the equations to be solved,

A = [ a1 ]
    [ ⋮  ]   (5.83)
    [ aN ]

Gaussian elimination consists of two steps: elimination and back-substitution. During elimination, we add weighted copies of the first row of A to the other rows in order to eliminate the first coefficient from the remaining equations. Next, we add weighted copies of the second row to rows 3 through N, eliminating the second coefficient. This process continues until the matrix is upper triangular. During back-substitution, we add weighted copies of the last row to the other rows in order to eliminate the last coefficient in each row. This procedure continues until the matrix is diagonal. Gaussian elimination may be protected by a systematic-separate code in a straightforward manner. First compute a parity vector t_i = a_i Θ for each row of A, and append these to the rows of A,

Ã = [ a1  t1 ]
    [ ⋮   ⋮  ]   (5.84)
    [ aN  tN ]

Then proceed with Gaussian elimination on the left half of Ã. For each row operation in A, perform the corresponding operation on the neighboring parity elements. Once the process is complete, the parity rows may be used to detect and correct errors just as in the previous vector space examples. Matrix inversion can also be computed using Gaussian elimination. We first form an N x 2N matrix B containing A on the left half and an N x N identity matrix on the right.

To each row of B we append the parity vector t_i = b_i Θ, where b_i denotes the ith row of B. This forms the matrix

B̃ = [ a1  1 0 ... 0  t1 ]
    [ a2  0 1 ... 0  t2 ]   (5.85)
    [ ⋮       ⋱      ⋮  ]
    [ aN  0 0 ... 1  tN ]

Next, use Gaussian elimination to eliminate the elements in the left half of B̃. The elimination process transforms the identity matrix I into A⁻¹, and the redundancy of the parity code is preserved throughout computation. These examples of Gaussian elimination and matrix inversion are equivalent to schemes proposed by Jou [25]. Our group-theoretic framework, however, gives us much deeper insights. If we view these computations as operations on rows or columns of vectors, then the parity schemes proposed are the only systematic-separate codes available.
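The invariance of the parity relation under row operations is the key fact, and it is easy to check numerically. The following sketch is ours (sizes and Θ are hypothetical, and no pivoting is done for brevity); it performs forward elimination on the augmented matrix of (5.84) and asserts the parity invariant after every step.

import numpy as np

rng = np.random.default_rng(2)
N, C = 4, 1
A = rng.standard_normal((N, N))
theta = np.ones((N, C))                        # N x C parity encoding, rank C
At = np.hstack([A, A @ theta])                 # eq. (5.84): append parity columns

for j in range(N):                             # forward elimination
    for i in range(j + 1, N):
        At[i, :] -= (At[i, j] / At[j, j]) * At[j, :]
    # invariant: each row's parity still equals the row times theta
    assert np.allclose(At[:, :N] @ theta, At[:, N:])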

5.7 Summary

The main emphasis of this chapter was on developing a procedure for finding possible homomorphisms. We focused on systematic-separate codes, an important class of arithmetic codes characterized by a parity-like operation in a parallel, independent channel. We used a quotient group isomorphism, and reduced the problem of finding systematic-separate codes to that of finding subgroups. The power of our approach is that, in many cases, we are able to determine all possible systematic-separate codes that may be used to protect a given operation. Also, we can show when it is impossible to protect an operation with a code of this form. In these instances, a nonsystematic code or modular redundancy must be used. The results for groups carry over completely to rings, fields, and vector spaces, and we showed that the structure of fields prohibits the definition of systematic-separate codes. We showed that the error detection and correction algorithms may be reduced to func- tions of a syndrome, and then reinterpreted redundancy requirements for the special case of systematic-separate codes. A wide variety of examples of the application of our technique were presented, and our focus was on determining homomorphisms and revealing the struc- ture of the parity channel. We showed that in many cases, the results of other authors were special cases of the application of our framework. In the next two chapters, we will study in detail two specific fault-tolerant systems. For each system, we will present an architecture and derive a realistic fault model. Then we will tailor a code to protect against the expected set of errors.

Chapter 6

Fault-Tolerant Convolution

In this chapter we present a detailed study of a fault-tolerant convolution algorithm. The algorithm is based on a polynomial residue number system, and is a direct extension of integer residue number system coding schemes applied to polynomial rings.

6.1 Introduction

Convolution is an important operation used in many digital signal processing applications. It is the basis of digital filtering and correlation, and performing convolution is often the most computationally intensive step of an algorithm. Extremely high throughput convolution systems use multiprocessor configurations [49]. Computation is distributed over several processors, and each computes a portion of the result. Unfortunately, the large amount of hardware in multiprocessor systems increases the likelihood that a soft or hard failure will occur and corrupt the result. Thus some degree of fault-tolerance is desirable in these systems. Protection of convolution operations has been previously studied by Redinbo [44]. His approach is discussed in Section 2.4.5, and he applies generalized cyclic error-correcting codes with known minimum distance properties [22]. These codes are basically BCH codes defined over the fields of real or complex numbers. Redundancy is incorporated by premultiplying sequences by a generator polynomial. The resulting code is systematic and separate, with parity symbols being processed in an independent channel. As discussed

in Section 5.6.5, Redinbo's coding scheme is isomorphic to the systematic-separate code derived by applying our framework to polynomial rings. The main limitation of Redinbo's work is that the error detecting and correcting properties of generalized cyclic codes allow only errors equal in number to the degree of the generator polynomial to be detected. This dictates that direct convolution, rather than more efficient FFT-based methods, be used. This chapter presents a novel nonsystematic arithmetic code for protecting convolution that is based on a polynomial residue number system (RNS). Computation is decomposed into independent residue channels and redundancy incorporated by adding extra residue channels. This decomposition yields an algorithm which is computationally equivalent to the Winograd Convolution Algorithm (WCA), and its parallel structure makes it ideally suited for multiprocessor implementations. Furthermore, and most importantly, our fault-tolerant algorithm has the same underlying structure as fast convolution algorithms, which are based on polynomial residue number systems, and this makes our algorithm fast and efficient as well. The derivation of the fast fault-tolerant algorithm is done in several steps, and its practicality becomes evident only at the very end. In Section 6.2 we present background material in polynomial rings and summarize the WCA. Then in Section 6.3 we add redundancy to the WCA and show how the polynomial RNS defines a partial homomorphism. We then present a fault detection and correction scheme which can handle multiple processor failures. The derivation is very general, and yields a wide variety of implementations. Section 6.4 focuses on single error correction, and we choose moduli such that the algorithm may be computed efficiently with FFTs. In Section 6.5 we apply a generalized likelihood ratio test to deal with computational noise inaccuracies, and in Section 6.6 we discuss the efficiency of our algorithm. We conclude with Section 6.7 in which we summarize the contributions made in this chapter.

6.2 Winograd Convolution Algorithm

In this section we describe the fundamentals of the Winograd Convolution Algorithm which forms the basis of our fault-tolerant algorithm. The operation we are interested in protecting is the linear convolution of two finite length data sequences. Let a[n] and b[n] be P-point sequences which are nonzero for 0 ≤ n ≤ P − 1. The linear convolution, denoted by a[n] * b[n], results in a Q = 2P − 1 point sequence c[n] as follows:

c[n] = Σ_{i=0}^{n} a[i] b[n − i] for n = 0, ..., Q − 1.   (6.1)

A key idea we exploit is to represent individual samples as elements of a field F, and entire sequences as elements of a polynomial ring [50]. Let F[x] denote the set of polynomials with coefficients in F. F[x] is a ring under the operations of polynomial addition and multiplication, and is called the ring of polynomials in x over F. We represent the sequence a[n] by the polynomial

a(x) = Σ_{i=0}^{P−1} a[i] xⁱ   (6.2)

and represent b[n] and c[n] by b(x) and c(x) in a similar fashion. The degree of polynomial a(x), denoted by deg a(x), refers to the highest power of x in a(x). Let M(x) be an element of F[x], and denote by F[x]/M(x) the set of polynomials in F[x] with degree less than deg M(x). F[x]/M(x) is a ring under normal polynomial addition and multiplication modulo M(x). We use the notation r(x) = (a(x))_{M(x)} to represent the remainder when a(x) is divided by M(x). M(x) is called the modulus and r(x) is called the residue. The division algorithm for polynomials guarantees the uniqueness of the modulo operation [45]. It is well-known [51] that the polynomial product c(x) = a(x)b(x) is equivalent to the linear convolution in (6.1). Furthermore, the product can be computed in the finite degree polynomial ring F[x]/M(x) by choosing an M(x) such that deg M(x) > deg c(x). The modulo operation will not affect the result and a linear convolution will still be computed. A polynomial residue number system (RNS) [51] is an isomorphic representation of the finite degree polynomial ring F[x]/M(x). It decomposes a large ring into a direct sum of several smaller rings. To define a polynomial RNS isomorphic to F[x]/M(x), we first factor M(x) into N relatively prime polynomials,

M(x) = m_1(x) m_2(x) \cdots m_N(x)  (6.3)

where each m_k(x) is a member of F[x]/M(x). Then, it can be shown that the direct sum of rings F[x]/m_1(x) × \cdots × F[x]/m_N(x) is isomorphic to F[x]/M(x). The mapping from a(x) ∈ F[x]/M(x) to its direct sum representation is accomplished by computing the N residues

a_k(x) = (a(x))_{m_k(x)} \qquad \text{for } k = 1, \ldots, N  (6.4)

where a_k(x) ∈ F[x]/m_k(x). We denote this isomorphism by

a(x) \leftrightarrow \{a_1(x), a_2(x), \ldots, a_N(x)\}.  (6.5)

The inverse mapping from a set of residues \{a_1(x), a_2(x), \ldots, a_N(x)\} to a(x) ∈ F[x]/M(x) is computed by the Chinese Remainder Theorem (CRT) for polynomials:

a(x) = \sum_{k=1}^{N} (a_k(x) D_k(x))_{m_k(x)} M_k(x)  (6.6)

where

M_k(x) = \frac{M(x)}{m_k(x)}  (6.7)

and D_k(x) is chosen such that

(M_k(x) D_k(x))_{m_k(x)} = 1.  (6.8)

The isomorphism between F[x]/M(x) and its direct sum allows us to perform computation in F[x]/M(x) by operations in each of the smaller rings F[x]/m_1(x), \ldots, F[x]/m_N(x). The isomorphism holds for both ring operations. Let a(x) and b(x) be elements of F[x]/M(x) with residue representations,

a(x) \leftrightarrow \{a_1(x), \ldots, a_N(x)\} \qquad \text{and} \qquad b(x) \leftrightarrow \{b_1(x), \ldots, b_N(x)\}.  (6.9)

The residue representation of the sum (a(x) ± b(x))_{M(x)} or product (a(x)b(x))_{M(x)} can be computed by N independent residue additions or multiplications:

(a(x) ± b(x))_{M(x)} \leftrightarrow \{a_1(x) ± b_1(x), \ldots, a_N(x) ± b_N(x)\}  (6.10)

(a(x)b(x))_{M(x)} \leftrightarrow \{(a_1(x)b_1(x))_{m_1(x)}, \ldots, (a_N(x)b_N(x))_{m_N(x)}\}.  (6.11)

A polynomial RNS is the basis of the Winograd Convolution Algorithm (WCA). We have already discussed the main steps involved, and we summarize them here for clarity. To compute the linear convolution c(x) = a(x)b(x), choose an M(x) such that deg M(x) > deg c(x). Define an RNS by factoring M(x) into N co-prime polynomials. (Both of these steps are done off-line.) Then, given the sequences a(x) and b(x), compute the two sets of residues (6.9). Perform the residue multiplications in each of the smaller rings (6.11), and then reconstruct using the CRT (6.6). The WCA thus computes a long convolution using N simpler polynomial products in independent residue channels. Convolving two sequences of length P using direct convolution (6.1) requires O(P^2) operations. If we use the WCA instead and choose the moduli polynomials carefully, then convolution can be computed with as few as O(P log_2 P) operations. The savings over direct convolution can be substantial.
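As a concrete illustration, the following sketch walks the WCA steps end to end with sympy. The moduli below are an arbitrary co-prime choice for P = 3 (so deg c(x) = 4), not a recommendation, and gcdex supplies the D_k(x) of (6.8) since the moduli are co-prime over the rationals.

import sympy as sp

x = sp.symbols('x')
a = 2*x**2 - x + 3                       # a[n] = {3, -1, 2}
b = x**2 + 4*x + 1                       # b[n] = {1, 4, 1}
moduli = [x**2 - 1, x**2 + 1, x - 2]     # co-prime; total degree 5 > deg c(x)
M = sp.Mul(*moduli)

# Residues of the inputs (6.4) and residue products (6.11)
c_res = [sp.rem(sp.rem(a, m, x) * sp.rem(b, m, x), m, x) for m in moduli]

# CRT reconstruction (6.6)-(6.8)
c = 0
for m, ck in zip(moduli, c_res):
    Mk = sp.quo(M, m, x)                 # Mk(x) = M(x)/mk(x)   (6.7)
    Dk, _, _ = sp.gcdex(Mk, m, x)        # Dk*Mk + t*mk = 1     (6.8)
    c += sp.rem(ck * Dk, m, x) * Mk
c = sp.rem(sp.expand(c), sp.expand(M), x)

assert sp.expand(c - a*b) == 0           # equals the linear convolution

Since deg(a(x)b(x)) = 4 < 5 = deg M(x), the modulo operation never folds the result, which is exactly the length constraint discussed above.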

6.3 Fault-Tolerant System

In this section we incorporate redundancy in the WCA to yield a robust algorithm. The WCA is mapped onto a multiprocessor architecture and a suitable error model is derived. We then show the connection between the system developed in this chapter and the results of Chapter 4. Then an algorithm for detecting and correcting multiple processor failures is presented.
We add redundancy to the WCA by adding extra residue channels and constraining the length of the convolution. We start with a Q-point linear convolution computed using N moduli, and add C extra moduli m_{N+1}(x), \ldots, m_{N+C}(x). These moduli must be co-prime to each other and to the original N moduli. Since N+C residues are used, computation is isomorphic to the larger ring F[x]/M^+(x) where M^+(x) = \prod_{k=1}^{N+C} m_k(x). In F[x]/M^+(x) a convolution of length Q^+ = \sum_{k=1}^{N+C} \deg m_k(x) could be computed. However, we restrict the lengths of the input sequences such that the result has a length Q ≤ \sum_{k=1}^{N} \deg m_k(x). Only a portion of the allowable output length is used, and the Q^+ - Q high order coefficients of the result should be zero.

6.3.1 Multiprocessor Architecture

We distribute computation of the WCA in a multiprocessor system such that one residue is corrupted per processor failure. We accomplish this by performing the computation needed for each residue channel on a separate, independent processor. Thus, we assume that N+C processors are available, and assign to the kth processor the computation of the kth input residues

a_k(x) = (a(x))_{m_k(x)}  (6.12)

b_k(x) = (b(x))_{m_k(x)}  (6.13)

as well as the kth residue product

c_k(x) = (a_k(x) b_k(x))_{m_k(x)}.  (6.14)

Our approach can only protect against errors in the first two steps of the WCA. To ensure proper computation of the CRT reconstruction and error detection and correction, we assume that Triple Modular Redundancy (TMR) is used in these steps. This hybrid approach is practical since the bulk of computation occurs during the first two steps of the WCA. A diagram of the overall system architecture is shown in Figure 6-1. Our chief concern is guarding against failures in the N+C residue processors, and so we assume that data I/O and interprocessor communication are reliable. These functions can be protected using standard techniques such as binary error-correcting codes or triplicated buses. We declare that a processor has failed when it does not compute the correct result given its input. This model covers a wide range of possible processor failures including transient single bit arithmetic errors as well as complete processor failure. We also assume that when a processor fails, it corrupts only the computation assigned to it, and does not affect the communication network or any other processor. With this model, one residue will be corrupted per processor failure. We denote the outputs of the processors by z_k(x), and assume that λ failures occur in


Figure 6-1: Robust multiprocessor architecture used to compute convolutions. Computation is first divided into N + C residue operations, each of which is computed by an independent processor. The outputs of these processors are fed to a reliable system which computes the CRT reconstruction and performs error detection and correction.

processors \{k_1, \ldots, k_λ\}. The processor outputs will have value

z_k(x) = \begin{cases} c_k(x) + φ_k(x) & \text{for } k = k_1, \ldots, k_λ \\ c_k(x) & \text{otherwise} \end{cases}  (6.15)

where φ_{k_i}(x) is the net effect of the failure in channel k_i, and deg φ_{k_i}(x) < deg m_{k_i}(x). Using this set of residues, the output of the reliable CRT step is

z(x) = \sum_{k=1}^{N+C} (z_k(x) D_k^+(x))_{m_k(x)} M_k^+(x)  (6.16)

where now

M_k^+(x) = \frac{M^+(x)}{m_k(x)}  (6.17)

and D_k^+(x) is chosen such that

(D_k^+(x) M_k^+(x))_{m_k(x)} = 1.  (6.18)

Substituting (6.15) into (6.16) gives

z(x) = c(x) + \sum_{i=1}^{λ} (φ_{k_i}(x) D_{k_i}^+(x))_{m_{k_i}(x)} M_{k_i}^+(x)  (6.19)

and we see that processor failures affect the result in an additive manner.

6.3.2 Relationship to Group-Theoretic Framework

The redundant polynomial residue number system code that we use is related to the group- theoretic framework of Chapter 4, and satisfies the postulates of a partial homomorphism. Computation occurs in a polynomial ring, F[x]/M(x), redundancy is added by embedding computation in a larger ring, F[x]/M+(x), and an additive error model (6.19) is assumed. The importance of using a polynomial residue number system to compute convolution is two-fold. First, it allows convolution to be computed efficiently by decomposing compu- tation into smaller independent convolutions. Second, the fault mechanism (6.19) generates errors which may be protected against by the redundancy. Any convolution algorithm which

generates errors of the form shown in (6.19) may be used instead of the WCA. The mapping from F[x]/M(x) to F[x]/M^+(x) which adds redundancy is very simple: φ(a(x)) = a(x). We treat the polynomial a(x) as an element of F[x]/M^+(x) and perform arithmetic modulo M^+(x). The desired convolution is computed as long as the result has length Q ≤ \sum_{k=1}^{N} \deg m_k(x). We will constrain the lengths of the input sequences such that this is true. The subset of valid results is given by

H_v = \{ c(x) ∈ F[x]/M^+(x) \mid \deg c(x) < Q \}.  (6.20)

Over this set of outputs, it is seen that the mapping φ(a(x)) = a(x) satisfies (4.50) and is thus a partial homomorphism. It is assumed that errors affect the result in an additive manner as shown in (6.19). We will assume that φ_{k_i}(x) is an arbitrary element of F[x]/m_{k_i}(x), and therefore the errors are symmetric. We are thus able to measure the redundancy in this code via its minimum distance. We will pursue this in the following section and will also present an algorithm for detecting and correcting multiple errors.

6.3.3 Fault Detection and Correction

We now develop an algorithm that detects and corrects multiple processor failures by examining the result z(x). The algorithm is complicated by the fact that we must determine the exact number and location of failed processors. Let D be the largest nonnegative integer satisfying

\deg \left[ \frac{M^+(x)}{m_{l_1}(x) \cdots m_{l_D}(x)} \right] ≥ Q \qquad \text{for every set of } D \text{ unique moduli } \{m_{l_1}(x), \ldots, m_{l_D}(x)\}.  (6.21)

D measures the amount of redundancy present in the polynomial RNS. It serves the same purpose as the minimum distance of a binary error-correcting code. To change a valid set of residues into another valid set, at least D+1 residues must be modified. The specific value of D depends on the redundant moduli m_{N+1}(x), \ldots, m_{N+C}(x). If the degrees of the

redundant moduli are all greater than or equal to the degrees of the original N moduli,

\deg m_{N+i}(x) ≥ \deg m_j(x) \qquad \text{for } i = 1, \ldots, C \text{ and } j = 1, \ldots, N,  (6.22)

then D = C. The properties of a polynomial RNS with a given value of D are described in the following two theorems, which are proved in Appendix 6.A.
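Before stating them, a small numerical check of (6.21): since only the degrees of the moduli matter, the worst case removes the D largest-degree moduli, so D is the largest d for which the remaining degrees still sum to at least Q. The helper below and its test values are illustrative, not from the thesis.

def distance_D(moduli_degrees, Q):
    degs = sorted(moduli_degrees, reverse=True)
    D = 0
    while D < len(degs) and sum(degs) - sum(degs[:D + 1]) >= Q:
        D += 1
    return D

# With C = 2 redundant moduli no shorter than the original N = 4 (6.22):
print(distance_D([4, 4, 4, 4, 4, 4], Q=16))    # -> 2, i.e. D = C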

Theorem 1 - Fault Detection Let D satisfy (6.21). Then

a) If no failures occur (λ = 0), then deg z(x) < Q and the correct convolution is c(x) = z(x).

b) If between 1 and D failures occur (1 ≤ λ ≤ D), then deg z(x) ≥ Q.

Theorem 2 - Fault Correction

Decompose D as D = α + β for some integers α ≥ β ≥ 0. Assume that no more than α processors can fail at any time. Then:

a) We can reliably distinguish the case when β or fewer failures occur from the case when between β+1 and α failures occur.
b) If β or fewer failures occur, we can correct them by examining z(x).

The decomposition D = α + β presented in Theorem 2 is not arbitrary, but determines the error detecting and correcting ability of the code. α is the maximum number of simultaneous processor failures which we will attempt to detect, while β is the maximum number we will attempt to correct. Since a failure must be detected in order to be corrected, α must be greater than or equal to β. A key issue is that we must make an assumption about the maximum number of simultaneous processor failures which can occur. Then we add sufficient redundancy to attain a desired level of protection. For example, suppose that we anticipate at most 2 processors failing at one time. Then we could do any of the following:

1. Detect up to 2 processor failures, but correct none of them (D = 2, α = 2, β = 0).

2. Detect up to 2 processor failures. If 1 processor failed, we can determine this and correct the result (D = 3, α = 2, β = 1).

3. Detect and correct up to 2 processor failures (D = 4, α = 2, β = 2).

The minimum redundancy needed for single fault detection is D = 1, and this can be accomplished by C = 1 extra modulus which satisfies

\deg m_{N+1}(x) ≥ \deg m_i(x) \qquad \text{for } i = 1, \ldots, N.  (6.23)

For single fault detection and correction we need D = 2, and this can be satisfied by C = 2 extra moduli satisfying

\deg m_{N+1}(x), \deg m_{N+2}(x) ≥ \deg m_i(x) \qquad \text{for } i = 1, \ldots, N.  (6.24)

In general, with C extra moduli satisfying (6.22), we can simultaneously detect and correct at most ⌊C/2⌋ faults, where ⌊C/2⌋ is the largest integer not greater than C/2. This result is equivalent to the error detecting and correcting ability of a distance C+1 binary error-correcting code [14]. Theorem 2 ensures that errors may be corrected but does not state how to perform the actual error correction. This is given in the following theorem, which determines the exact number and location of the faulty processors and, during this process, corrects the output. The proof of this theorem is also given in Appendix 6.A.

Theorem 3 - Fault Correction Algorithm
Suppose the moduli satisfy the condition (6.21) and assume that λ ≤ α failures occurred in processors \{k_1, \ldots, k_λ\}. Then:
1) Given the (possibly faulty) residues z_k(x), use a reliably implemented CRT to reconstruct the corresponding sequence z(x).
2) If deg z(x) < Q then no fault has occurred, so c(x) = z(x). STOP.
3) Otherwise a fault has occurred. For p = 1, \ldots, β:
For all possible sets of p processors 1 ≤ j_1 < j_2 < \cdots < j_p ≤ N+C:
Compute r_{j_1,\ldots,j_p}(x) = (z(x))_{M^+_{j_1,\ldots,j_p}(x)} where M^+_{j_1,\ldots,j_p}(x) = \frac{M^+(x)}{m_{j_1}(x) \cdots m_{j_p}(x)}.
If deg r_{j_1,\ldots,j_p}(x) < Q, then the p processors \{j_1, \ldots, j_p\} have failed, and c(x) = r_{j_1,\ldots,j_p}(x) is the correct convolution. STOP.
4) If all the polynomials r_{j_1,\ldots,j_p}(x) have degree ≥ Q, then β+1 to α faults occurred, and this cannot be corrected. STOP.

This algorithm essentially does an exhaustive search of all possible combinations of failed processors. It begins by checking if no processors failed by testing if deg z(x) < Q. If so, then c(x) = z(x) is the correct result. Otherwise, it begins with p = 1 and checks if the error was caused by a single processor failure. If not, then all possible two processor failures are checked (p = 2). This continues until p = β. If no set of residues which explains the fault can be found, then by Theorem 2 we know that between β+1 and α faults occurred, and this cannot be corrected. Testing for multiple failures in this manner guarantees that only faulty processors will be corrected. The algorithm can be implemented quickly if we recognize that only the high order coefficients of the remainder (those of degree ≥ Q) must be computed before testing if deg r_{j_1,\ldots,j_p}(x) < Q. If any of these coefficients are nonzero, then we abort the division and test the next set of processors. Once we find \{j_1, \ldots, j_p\} such that the high order coefficients are all zero, we know that processors \{j_1, \ldots, j_p\} have failed. To obtain the correct result, we complete the division. This procedure requires roughly (Q^+ - Q)/Q as much computation as performing all divisions completely. Even with this fast fault correction algorithm, checking for multiple processor failures can be computationally expensive. The procedure in Theorem 3 essentially does an exhaustive search of all possible combinations of failed processors. Checking for β or fewer failures requires a total of

\sum_{i=1}^{β} \frac{(N+C)!}{i! (N+C-i)!}  (6.25)

separate polynomial divisions. This is reasonable only for small values of β. If exact arithmetic is used, then the above procedure is sufficient. However, if any rounding or truncation occurs during computation, the high order coefficients of r_{j_1,\ldots,j_p}(x) will never be exactly zero, and our fault test needs to be modified. This is done in Section 6.5. The error detection and correction techniques that are described in this section are similar to those used in integer RNS [52]. However, we perform arithmetic in polynomial

rings, rather than in integer rings. Encoding entire sequences with a polynomial RNS has several advantages over low-level single sample encoding using an integer RNS. First, off-the-shelf fixed or floating point arithmetic units may be used since computation is performed in the complex field. Integer residue number systems, on the other hand, require nonstandard finite ring arithmetic units. They have great difficulty with rounding operations, and have limited dynamic range. Second, and most importantly, in polynomial rings the choice of moduli constrains the length of convolution that can be performed, but not the dynamic range of the sample values. Since the length is specified in advance, overflow can be avoided.
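The search in Theorem 3 is also easy to prototype. The sketch below uses sympy with exact rational arithmetic, so the test "high order coefficients exactly zero" is meaningful; the function name and interface are illustrative, not from the thesis.

import itertools
import sympy as sp

def correct_output(z, moduli, Q, beta, x):
    """Return c(x) per Theorem 3, or None if more than beta failures."""
    M_plus = sp.Mul(*moduli)
    if sp.degree(z, x) < Q:
        return z                              # step 2: no fault occurred
    for p in range(1, beta + 1):              # step 3: try p = 1, ..., beta
        for J in itertools.combinations(range(len(moduli)), p):
            MJ = sp.quo(M_plus, sp.Mul(*[moduli[j] for j in J]), x)
            r = sp.rem(z, MJ, x)              # divide out the suspect channels
            if sp.degree(r, x) < Q:
                return r                      # processors J failed; r = c(x)
    return None                               # step 4: uncorrectable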

6.4 Fast FFT-Based Algorithm

In this section we present a specific set of moduli which allows each step of the algorithm to be implemented efficiently. We show that when mapped onto a 2-D array, our algorithm is computationally equivalent to computing convolution using a Cooley-Tukey FFT with two additional rows.

6.4.1 FFT Moduli

Computation is reduced if we use sparse polynomials as moduli. This simplifies computing the residues and performing the residue multiplications. Also, M^+(x) will be sparse, simplifying error detection and correction. Suppose the samples to be convolved are elements of the field of complex numbers. Also assume that a[n] and b[n] have lengths such that the maximum length of the convolution is a composite Q = NR for integers N and R. Then a particularly elegant choice for the moduli is as follows:

m_k(x) = x^R - W_{N+C}^{k-1} \qquad \text{for } k = 1, \ldots, N+C  (6.26)

where W_{N+C} = e^{-j2π/(N+C)} is the (N+C)th root of unity. Note that m_k(x) can be written as

the product of R first-order factors,

m_k(x) = \prod_{i=0}^{R-1} \left( x - W_S^{(k-1)+i(N+C)} \right)  (6.27)

where S = (N+C)R. From this expansion it can be seen that the m_k(x) have no roots in common, and are thus co-prime. Also note that

M^+(x) = \prod_{k=1}^{N+C} m_k(x) = x^S - 1.  (6.28)

Our method computes c(x) modulo M^+(x), which corresponds to S-point circular convolution. Hence, to achieve fault-tolerance, we have embedded a Q = NR point linear convolution in an S = (N+C)R point circular convolution. We will assume that at most a single processor can fail at any one time, and add redundancy such that this failure can be reliably detected and corrected. We thus require β ≥ 1 and α ≥ β. To minimize redundancy, we choose α = β = 1 and therefore D = 2. Since our moduli are all of the same degree, we need two extra moduli to achieve this level of redundancy. We will assume that C = 2 throughout the rest of this section.
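The factorization (6.28) is easy to verify numerically; the sketch below multiplies the moduli of (6.26) together for illustrative N, R, C and checks the product against x^S - 1 (the e^{-j2π/(N+C)} root convention above is assumed).

import numpy as np

N, R, C = 4, 4, 2
K, S = N + C, (N + C) * R
prod = np.array([1.0 + 0j])
for k in range(K):
    m = np.zeros(R + 1, dtype=complex)
    m[0], m[-1] = 1.0, -np.exp(-2j * np.pi * k / K)   # m_{k+1}(x) = x^R - W^k
    prod = np.polymul(prod, m)                        # accumulate the product

target = np.zeros(S + 1, dtype=complex)
target[0], target[-1] = 1.0, -1.0                     # x^S - 1
assert np.allclose(prod, target)                      # confirms (6.28)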

6.4.2 Algorithm Description

We now describe in detail the steps involved in a fault-tolerant convolution algorithm which uses the moduli shown in (6.26). We describe the steps in terms of polynomial operations and as operations on two-dimensional arrays. The latter description reveals the relationship between our fault-tolerant algorithm and standard convolution algorithms. This relationship will be discussed in Section 6.4.3. We map the sequence a[n] onto a 2-D array as follows:

a[n_1, n_2] = a[n_1 R + n_2] \qquad \text{for } 0 ≤ n_1 ≤ N+1, \; 0 ≤ n_2 ≤ R-1  (6.29)

where n_1 is the row index and n_2 is the column index. Note that since a[n] does not occupy the entire array, we zero pad it to length S. We compute the residues a_k[n] via operations

on a[n_1, n_2] and place the residues in the 2-D array,

A[k, n] = a_k[n] \qquad \text{for } 1 ≤ k ≤ N+2, \; 0 ≤ n ≤ R-1.  (6.30)

The arrays b[n_1, n_2], c[n_1, n_2], z[n_1, n_2] and B[k, n], C[k, n], Z[k, n] are similarly defined. We also use the following array representation of the remainder r(x) which is used during error detection and correction:

r_j[n_1, n_2] = r_j[n_1 R + n_2] \qquad \text{for } 0 ≤ n_1 ≤ N, \; 0 ≤ n_2 ≤ R-1.  (6.31)

Note that r_j[n_1, n_2] has N+1 rows while the other arrays have N+2 rows.

Computation of Residues

The first step of the algorithm is to compute the residues a_k(x). Since the moduli m_k(x) are sparse polynomials, each coefficient of a residue polynomial is the sum of only N+2 terms,

a_k(x) = \sum_{n=0}^{R-1} x^n \sum_{l=0}^{N+1} W_{N+2}^{(k-1)l} a[n + lR] \qquad \text{for } 1 ≤ k ≤ N+2  (6.32)

or in array notation,

A[k, n] = \sum_{l=0}^{N+1} W_{N+2}^{(k-1)l} a[l, n] \qquad \text{for } 1 ≤ k ≤ N+2.  (6.33)

For each n, this equation is recognized as an N+2 point DFT along the column a[·, n]. Similar operations are used to compute the residue array B[k, n].
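In numpy the residue step is exactly a column FFT of the zero-padded 2-D array. The sketch below checks one channel against direct polynomial division; N, R, C and the random input are illustrative, rows are indexed from zero (row k holds residue k+1), and the W = e^{-j2π/(N+C)} convention is assumed.

import numpy as np

N, R, C = 4, 8, 2
K, S = N + C, (N + C) * R
rng = np.random.default_rng(1)

a = np.zeros(S, dtype=complex)
a[:2 * R] = rng.standard_normal(2 * R)        # short input, zero-padded to S

a2d = a.reshape(K, R)                         # a[n1, n2] = a[n1*R + n2]  (6.29)
A = np.fft.fft(a2d, axis=0)                   # column DFTs give A[k, n]  (6.33)

# Check one row against a(x) mod (x^R - W^k) computed by direct division
k = 3
m = np.zeros(R + 1, dtype=complex)
m[0], m[-1] = 1.0, -np.exp(-2j * np.pi * k / K)
_, r = np.polydiv(a[::-1], m)                 # numpy wants highest degree first
rk = np.zeros(R, dtype=complex)
rk[:len(r)] = r[::-1]                         # residue coefficients, low to high
assert np.allclose(rk, A[k])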

Residue Multiplications

The second step of the algorithm is to compute the products z_k(x) = (a_k(x) b_k(x))_{m_k(x)} for k = 1, 2, \ldots, N+2. In general, it is difficult to efficiently compute convolution modulo an arbitrary polynomial. However, the special moduli (6.26) allow the products to be computed

by circular convolutions through a simple transformation as follows. First pre-multiply a_k[n] and b_k[n] by a phase shift,

\tilde{a}_k[n] = a_k[n] W_S^{(k-1)n} \quad \Longleftrightarrow \quad \tilde{A}[k, n] = A[k, n] W_S^{(k-1)n}

\tilde{b}_k[n] = b_k[n] W_S^{(k-1)n} \quad \Longleftrightarrow \quad \tilde{B}[k, n] = B[k, n] W_S^{(k-1)n}.

Then convolve these new residues using an R-point circular convolution,

\tilde{z}_k(x) = (\tilde{a}_k(x) \tilde{b}_k(x))_{x^R - 1} \quad \Longleftrightarrow \quad \tilde{Z}[k, n] = \sum_{i=0}^{R-1} \tilde{A}[k, i] \tilde{B}[k, (n-i)_R]  (6.34)

where (n-i)_R denotes (n-i) modulo R.

Finally post-multiply to obtain the desired product residues,

z_k[n] = \tilde{z}_k[n] W_S^{-(k-1)n} \quad \Longleftrightarrow \quad Z[k, n] = \tilde{Z}[k, n] W_S^{-(k-1)n}.

Many efficient circular convolution algorithms exist [53] and any one could be used to compute (6.34). A logical choice would be to use an FFT-based algorithm by taking R-point FFTs of each row, multiplying point-by-point, and then taking inverse R-point FFTs,

\tilde{Z}[k, ·] = \mathrm{FFT}_R^{-1} \left[ \mathrm{FFT}_R[\tilde{A}[k, ·]] \cdot \mathrm{FFT}_R[\tilde{B}[k, ·]] \right].  (6.35)
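A numpy sketch of (6.34)-(6.35): a pre-twiddle by W_S^{(k-1)n} turns multiplication modulo x^R - W_{N+C}^{k-1} into an ordinary R-point circular convolution, computed with row FFTs, followed by the inverse twiddle. A and B are residue arrays as in the previous sketch, and the same e^{-j2π/S} root convention is assumed.

import numpy as np

def residue_products(A, B):
    K, R = A.shape                            # K = N + C channels
    S = K * R
    k = np.arange(K)[:, None]
    n = np.arange(R)[None, :]
    tw = np.exp(-2j * np.pi * k * n / S)      # pre-twiddle W_S^{(k-1)n}
    At, Bt = A * tw, B * tw
    Zt = np.fft.ifft(np.fft.fft(At, axis=1) * np.fft.fft(Bt, axis=1), axis=1)
    return Zt / tw                            # post-twiddle removes the phase

Each row is an independent R-point circular convolution, which is what lets each residue channel run on its own processor.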

CRT Reconstruction

The next step in our algorithm is to reconstruct the polynomial z(x) using all N+2 residues. By assumption, this operation is computed reliably. It can be shown that for the moduli (6.26), each polynomial M_k^+(x) is sparse, with only one out of every R coefficients being non-zero,

M_k^+(x) = \frac{M^+(x)}{m_k(x)} = \sum_{i=0}^{N+1} W_{N+2}^{-(k-1)(i+1)} x^{iR}.  (6.36)

Also, the D_k^+(x) are constants,

D_k^+(x) = \frac{1}{N+2} W_{N+2}^{(k-1)}.  (6.37)

Since deg z_k(x) < R, the CRT reconstruction is formed from non-overlapped, shifted, and scaled combinations of the z_k(x). We find that

z(x) = \frac{1}{N+2} \sum_{k=1}^{N+2} \sum_{n_1=0}^{N+1} W_{N+2}^{-(k-1)n_1} z_k(x) x^{n_1 R} \quad \Longleftrightarrow \quad z[n_1, n_2] = \frac{1}{N+2} \sum_{k=1}^{N+2} W_{N+2}^{-(k-1)n_1} Z[k, n_2].  (6.38)

This is similar to (6.33) and is recognized as N+2 point inverse DFTs along each column Z[·, n_2].
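For these moduli the CRT step is simply the inverse of the column-DFT residue map, so a numpy sketch (same conventions as the earlier sketches) reduces to one set of inverse FFTs; numpy's ifft supplies the 1/(N+2) scale factor.

import numpy as np

def crt_reconstruct(Z):
    z2d = np.fft.ifft(Z, axis=0)   # (N+2)-point inverse DFTs per column (6.38)
    return z2d                     # z2d.reshape(-1) gives z[n] = z[n1*R + n2]

If no processor failed, the last two rows of z2d are zero to within roundoff, and the first Q entries of z2d.reshape(-1) are the desired linear convolution.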

Fast Fault Detection and Correction

If deg z(x) < Q then no fault has occurred, and we are done: c(x) = z(x). This corresponds to the last 2 rows of z[n_1, n_2] being zero. If there are nonzero entries in the last two rows, then a fault φ_q(x) has occurred in some processor q. To locate the fault we must divide z(x) by each M_j^+(x) in turn, and check the leading R coefficients of the remainder r_j(x) = (z(x))_{M_j^+(x)}. For this special choice of moduli, there is an even faster fault detection technique. It can be shown that an error φ_q[n_2] linearly perturbs each row of the output,

z[n_1, n_2] = \begin{cases} c[n_1, n_2] + \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] & \text{for } n_1 = N, N+1. \end{cases}  (6.39)

This reveals an alternative, simpler method of locating the fault. Instead of computing and testing N + C residues, we may use the correlation between non-zero samples of z [N, n2] and z [N + 1, n2] to quickly identify q,

q̂ = \left( \mathrm{round} \left[ \frac{N+2}{2π} \mathrm{ARG} \left( z[N+1, n_2] z^*[N, n_2] \right) \right] \right)_{N+2} + 1  (6.40)

where ARG(x) refers to the principal value, or angle, of the complex quantity x. Note, this approach is sensitive to computational noise which may corrupt the correlation and lead to incorrect fault diagnoses. This problem becomes more severe as |φ_q[n_2]| decreases

since small perturbations greatly affect ARG(z[N+1, n_2] z^*[N, n_2]). A more intelligent approach to estimating q is to use all the samples in rows z[N, ·] and z[N+1, ·] rather than just 2 samples. This approach is pursued in detail in Section 6.5.

6.4.3 Algorithm Summary

The computational steps involved in the FFT-based algorithm are summarized in Figure 6-2. Close examination reveals that this procedure is similar to computing convolution using Cooley-Tukey FFTs [53] of length (N+2)R. If Cooley-Tukey FFTs were used, we would first arrange the data into rows of 2-D matrices. Then compute the FFT of each column, multiply the array by twiddle factors, take an FFT of each row, and multiply these row FFTs. Then inverse FFT each row, multiply by twiddle factors, and inverse FFT each column. The only difference between a Cooley-Tukey FFT and our algorithm is that the initial column FFTs must be replaced with DFTs which compute each sample independently. This is necessary because the computation in each row must be done independently by a separate processor, and column FFTs would violate this partitioning of computation. Thus each processor must be loaded with the input sequences a[n] and b[n], and will evaluate only the single sample of each column DFT that it needs. FFTs are not efficient under these circumstances. In the CRT, however, the distribution of computation is not critical to the functioning of the algorithm since the CRT is computed reliably using TMR. Thus the N+2 residue processors output all their data to the CRT processors, which then use column FFTs to compute z(x). Added insight can be gained by studying two extreme cases. First, when N = 1, operation is analogous to TMR. We compute an R-point convolution using three independent R-point convolutions. Another interesting case occurs when R = 1. Then m_k(x) =

(x - W_S^{k-1}), and a_k(x) and b_k(x) are constants equal to the values of the Fourier transforms of a[n] and b[n] at frequency 2π(k-1)/S. This is equivalent to computing convolution by having each processor multiply a single DFT component. Adding two samples to the length of the convolution allows an error in any DFT component to be detected and corrected. This is similar to a method proposed by Wolf to protect communication channels from impulse noise [46]. He encodes sequences using DFTs and adds extra DFT coefficients


Figure 6-2: Computational steps involved in the fault-tolerant FFT-based algorithm. The example shown is of an 8-point convolution computed by four 4-point convolutions.

for redundancy.

6.5 Generalized Likelihood Ratio Test

If infinite precision arithmetic is used with no rounding, then all computation will be exact and the fault detection/correction procedure discussed in Section 6.4.2 would be sufficient. Unfortunately, processors must use fixed or floating point approximations to the complex numbers, and rounding and truncation errors occur. In this section we discuss how to distinguish between these small deviations and actual processor failures in an optimal manner. We begin with (6.39) and add an extra term ε[n_1, n_2] to model the net effect of computational noise on each sample of the result,

z[n_1, n_2] = \begin{cases} c[n_1, n_2] + \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] + ε[n_1, n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] + ε[n_1, n_2] & \text{for } n_1 = N, N+1. \end{cases}  (6.41)

Let z, c, and φ represent the arrays z[n_1, n_2], c[n_1, n_2], and φ_q[n_2] respectively. We choose a set of hypotheses to model the behavior of our system. Let H_* represent the hypothesis that all processors are functioning properly, and let H_q represent the hypothesis of a failure in the qth processor, where q = 1, \ldots, N+2. Also define P_* and P_q as the a priori probabilities of these events and assume that they are independent of data and fault. Our basic approach is to compute the likelihood (probability) of observing the output z assuming that each hypothesis is true. The hypothesis with largest likelihood is most probable, and serves as our fault diagnosis. Unfortunately, the likelihoods depend on the correct output c and on the fault φ, if any, which are unknown. We therefore use a GLRT to jointly estimate the likelihoods and unknown parameters. Define L_* to be the log likelihood of z and H_* conditioned on c, maximized over all possible correct outputs c,

L_* = \max_c \log p(z, H_* \mid c) = \max_c \log p(z \mid H_*, c) + \log P_*.  (6.42)

We employ log likelihoods rather than probabilities since the resulting equations are easier to solve. Both methods are equivalent since the logarithm is a monotonically increasing function. Similarly, define L_q as the log likelihood of z and H_q conditioned on c and φ, with c and φ set to their most likely values,

L_q = \max_{c, φ} \log p(z, H_q \mid c, φ) = \max_{c, φ} \log p(z \mid H_q, c, φ) + \log P_q.  (6.43)

We will compute L_* and L_q for q = 1, \ldots, N+2 and pick the largest. The largest likelihood corresponds to the most likely failure hypothesis given the observed data z. Also, the values ĉ and φ̂ which maximize the likelihoods serve as estimates of the correct output and error. The solution of the likelihood equations depends heavily on the probability distribution of the computational noise and cannot be solved in general. However, since ε[n_1, n_2] is the accumulated effect of many rounding operations, it is reasonable to model it as white, zero mean Gaussian noise with variance σ_ε². We will also assume that ε[n_1, n_2] is independent of signal and fault,

p(ε[n_1, n_2] \mid c, φ) = p(ε[n_1, n_2]) = N(0, σ_ε²).  (6.44)

We assume that the processors are uniform and have equal probability of failure. The GLRT is solved in Appendix 6.B and it reduces to the following simple form. First compute the constant

L'_* = 4σ_ε² \log(P_*/P_q).  (6.45)

This serves as a threshold which decides between H* and all other hypotheses. Next, estimate the most likely failed processor given the observed data,

q̂ = \left( \mathrm{round} \left[ \frac{N+2}{2π} \mathrm{ARG}(ρ_{N+1,N}) \right] \right)_{N+2} + 1  (6.46)

where ρ_{i,j} is the total correlation between the ith and jth rows of z[n_1, n_2],

ρ_{i,j} = \sum_{n_2=0}^{R-1} z[i, n_2] z^*[j, n_2].  (6.47)

Then compute the log likelihood of a failure in processor q̂,

L'_{q̂} = ρ_{N,N} + ρ_{N+1,N+1} + 2 \mathrm{Re} \left( ρ_{N,N+1} W_{N+2}^{(q̂-1)} \right).  (6.48)

Compare L'_{q̂} with the threshold L'_*. If L'_{q̂} is smaller, we declare that no processor has failed and ascribe the deviation to computational noise. The estimate of the output, ĉ[n_1, n_2], then equals

ĉ[n_1, n_2] = \begin{cases} z[n_1, n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ 0 & \text{for } n_1 = N, N+1. \end{cases}  (6.49)

Otherwise, if L'_{q̂} is greater than L'_*, we declare that processor q̂ has failed. We correct the output by first estimating the error

φ̂_{q̂}[n_2] = \frac{N+2}{2} \left[ z[N, n_2] W_{N+2}^{-(q̂-1)N} + z[N+1, n_2] W_{N+2}^{-(q̂-1)(N+1)} \right]  (6.50)

and then subtracting a phase shifted copy of this estimate from z [nl, n2],

ĉ[n_1, n_2] = \begin{cases} z[n_1, n_2] - \frac{W_{N+2}^{(q̂-1)n_1}}{N+2} φ̂_{q̂}[n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ 0 & \text{for } n_1 = N, N+1. \end{cases}  (6.51)
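The reduced test is only a few lines of numpy. To stay safe against the root-of-unity sign convention in (6.46), the sketch below evaluates the relative likelihood (6.48) for every candidate q and takes the maximum, which is equivalent to the closed form; sigma2 (the noise variance σ_ε²) and log_prior_ratio = log(P_*/P_q) are inputs the implementation must supply.

import numpy as np

def glrt_correct(z2d, sigma2, log_prior_ratio):
    K, R = z2d.shape                          # K = N + 2 rows
    N = K - 2
    W = np.exp(-2j * np.pi / K)
    rho = lambda i, j: np.vdot(z2d[j], z2d[i])   # rho_ij = sum_n z[i,n] z*[j,n]
    L_thresh = 4.0 * sigma2 * log_prior_ratio    # L'_* of (6.45)
    Lq = [(rho(N, N) + rho(N + 1, N + 1)
           + 2 * np.real(rho(N, N + 1) * W**q)).real for q in range(K)]
    q = int(np.argmax(Lq))                       # 0-based; processor q+1
    if Lq[q] < L_thresh:
        return None, z2d[:N].copy()              # (6.49): no failure declared
    phi = (K / 2) * (z2d[N] * W**(-q * N)
                     + z2d[N + 1] * W**(-q * (N + 1)))        # (6.50)
    c2d = z2d[:N] - np.outer(W**(q * np.arange(N)), phi) / K  # (6.51)
    return q + 1, c2d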

This method improves performance over the simple method (6.40) because a total of 2R samples are used to identify the faulty processor instead of only 2 samples. It yields a more accurate estimate of φ_q[n_2] by reducing computational noise through averaging. The GLRT requires little additional computation; roughly 3R multiplications are needed to compute the correlations and roughly 3R + NR to correct the fault.
With a GLRT, error detection depends on the value of random computational noise, and therefore errors may not always be reliably detected. High noise levels can cause false alarms and small errors can go undetected. In practice, however, this is not a serious problem since all large errors are properly detected, and during a false alarm, the error estimate φ̂[n_2] is generally quite small, and the improperly applied error correction does not corrupt the output significantly.
The numerical value of the decision threshold L'_* depends on the statistics of the computational noise which we modeled as Gaussian random variables. In practice, the distribution of the computational noise depends heavily on details of the implementation (arithmetic precision, values of N and R, FFT routines used) and a Gaussian model may be inappropriate. Also note, if floating point arithmetic is used, then the actual computational noise may be correlated with the data, and it may be necessary to scale the threshold according to the magnitude of the input sequences [54]. In practice, it may be best to use computer simulations to choose a threshold that achieves the desired false alarm probability.
When a faulty processor is detected, the fault is corrected and then several alternative courses of action may be taken. The processor may be monitored to see if the error disappears. If so, then the fault was only transient and normal operation can continue. If the error persists, the faulty processor can be shut down and the system reconfigured as an N+1 processor error detecting system. Alternatively, a standby processor might be switched in to replace the faulty processor.

6.6 Fault-Tolerance Overhead

In this section we discuss the efficiency and fault coverage of the single error-correcting FFT-based system discussed in the previous section. We compare an unprotected NR-point convolution computed by a standard FFT-based algorithm, with a fault-tolerant NR-point convolution which is embedded in an (N+2)R-point circular convolution. Our analysis focuses on how the polynomial RNS protects the computation involved in convolution and does not take into account the additional hardware and processing required for reliable interprocessor communication and I/O. We examine two quantities: overhead and coverage. Overhead is defined as the percentage of extra computation needed for fault-tolerance relative to the unprotected algorithm. Coverage is the percentage of total computation protected by the polynomial RNS (the remaining computation is protected via TMR). We divide the FFT-based algorithm into 4 steps. During step 1, we compute the residue arrays A[k, n_2] and B[k, n_2]. Processor k computes row k of these arrays using DFTs as discussed in Section 6.4.3. In step 2, the residues are convolved using R-point FFTs. Step 3 is the CRT reconstruction using N+2 point column FFTs. During step 4, error detection and correction are performed. Note that steps 3 and 4 are protected via TMR, and we will

weight the computation required for these steps accordingly. For simplicity, when considering the computational complexity of an algorithm, we count only the number of multiplications involved. This is a reasonable approximation, since for most FFT algorithms, the total number of operations is proportional to the number of multiplications. Let M_1, M_2, M_3, and M_4 be the number of multiplications in each step. Also, let M_u be the total number of multiplications in a standard unprotected NR-point convolution computed with FFTs. Using these definitions, our performance measures may be written as:

\text{overhead} = \frac{M_1 + M_2 + 3(M_3 + M_4)}{M_u} - 1  (6.52)

\text{coverage} = \frac{M_1 + M_2}{M_1 + M_2 + M_3 + M_4}.  (6.53)

As part of our calculations we must compare the number of multiplications in N and N + 2 point FFTs. This is difficult to do for arbitrary values of N and so we make a rough approximation. We assume that an L-point FFT requires 2L log2 L multiplications [53], even though L may not be a power of 2. The majority of multiplications occur in the FFTs, and we ignore twiddle and transform coefficient multiplications. With these assumptions we obtain

M_1 ≈ 2R(N+2)^2  (6.54)

M_2 ≈ 6R(N+2) \log_2 R  (6.55)

M_3 ≈ 2R(N+2) \log_2(N+2)  (6.56)

M_4 ≈ 6R + RN  (6.57)

M_u ≈ 6RN \log_2(RN).  (6.58)
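These estimates are easy to tabulate; the short script below reproduces, for example, the N+2 = 16, R = 64 entries of Table 6.1, under the same 2L log_2 L multiplication count and C = 2 assumptions used above.

import math

def measures(N, R):
    K = N + 2
    M1 = 2 * R * K**2                       # residue DFTs        (6.54)
    M2 = 6 * R * K * math.log2(R)           # residue products    (6.55)
    M3 = 2 * R * K * math.log2(K)           # CRT column FFTs     (6.56)
    M4 = 6 * R + R * N                      # detect/correct      (6.57)
    Mu = 6 * R * N * math.log2(R * N)       # unprotected FFT     (6.58)
    overhead = (M1 + M2 + 3 * (M3 + M4)) / Mu - 1
    coverage = (M1 + M2) / (M1 + M2 + M3 + M4)
    return overhead, coverage

ov, cov = measures(N=14, R=64)              # the N + 2 = 16, R = 64 entry
print(f"{ov:.0%} overhead, {cov:.0%} coverage")   # about 86% and 88%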

Using these approximations, we evaluated our measures for several values of N and R, and the results are shown in Table 6.1. A good reference against which to compare the calculated overheads is the overhead required by a modular redundancy (MR) system offering a similar level of fault protection. A single error-correcting MR system requires triplication, that is, 200% overhead. Our method,

                     Overhead                         Coverage
             R = 64   R = 256   R = 1024      R = 64   R = 256   R = 1024
N+2 =  4      195%      174%      161%          88%       90%       92%
       8       93%       82%       74%          87%       90%       91%
      16       86%       74%       65%          88%       90%       91%
      32      117%      100%       88%          90%       91%       92%
      64      192%      165%      145%          93%       93%       94%

Table 6.1: Overhead and fault coverage of the robust FFT-based convolution algorithm.

on the other hand, achieves this level of fault tolerance with as little as 65% overhead, a substantial savings. We find that overhead varies strongly with N and reaches a minimum at N+2 = 16. Two separate factors contribute to this behavior. First, if N is small, the two additional row convolutions make up a sizable portion of the total computation, and thus the overhead is high. Second, when N is large, the DFTs in step 1 require significant amounts of computation since the number of operations involved grows as (N+2)^2. Efficient operation occurs between these two extremes.
We also find that the polynomial RNS protects the majority of computation, as demonstrated by the high coverage values. In most instances, over 90% of the computation occurs in the residue channels. Thus, the bulk of computation is covered by the low cost arithmetic code, while only the remaining 10% need be protected by more expensive MR.

6.7 Conclusion

In this chapter we presented a new approach to protecting linear convolution which was considerably cheaper than traditional methods based on modular redundancy. Our algorithm used a polynomial residue number system (RNS), which is the underlying structure of the Winograd Convolution Algorithm. Computation is decomposed into independent, parallel residue channels, and redundancy incorporated by adding extra residue channels. Analogous to integer RNS fault-tolerance schemes, single errors can be detected by adding one extra modulus, and corrected using two extra moduli. However, we do not encounter many of the problems associated with integer RNS.

We derived conditions on the redundant moduli such that a desired level of fault-tolerance may be achieved, and presented an algorithm for detecting and correcting multiple processor failures. Importantly, we presented a specific set of moduli polynomials which yielded an efficient FFT-based algorithm. The effects of computational noise were handled using a generalized likelihood ratio test, and the resulting fault detection/correction algorithm is both fast and accurate.
The parallel nature of our algorithm makes it ideal for implementation on a multiprocessor system. We distribute computation such that each residue channel is computed by a separate processor. The low cost polynomial RNS coding scheme is able to protect the bulk of computation, roughly 90%, while the remaining 10% is protected via more expensive triple modular redundancy. This hybrid approach yields a system fully protected against any single failure at a cost significantly lower than a fully redundant implementation. We are able to achieve this level of reliability with only 65% overhead, compared with 200% needed for triple modular redundancy.
Variations of our scheme are possible which allow a wider range of operations to be protected. Addition and subtraction of polynomials can also be protected, and we are not limited to protecting individual ring operations, but can protect several with a single residue encoding and a single error test. The only requirement is that the length of the final result be constrained. Errors may be detected and corrected in the same manner.

6.A Proof of Theorems

This appendix contains proofs of the theorems which were presented in Section 6.3.3. We assume throughout that λ failures occur in processors \{k_1, \ldots, k_λ\} and that the outputs of the residue processors have value (6.15). The result of the reliable CRT reconstruction will then be (6.19). We also assume that the moduli satisfy (6.21) for some value of D, and that D = α + β for α ≥ β ≥ 0.

If no errors have occurred, λ = 0, then z(x) = c(x) and thus deg z(x) < Q. This proves Theorem 1a. To prove 1b, we use the following lemma:

Lemma 1: Let φ_{k_1}(x) ≠ 0, \ldots, φ_{k_λ}(x) ≠ 0 be any set of 1 ≤ λ ≤ D polynomials with deg φ_{k_i}(x) < deg m_{k_i}(x). Then:

\deg \left[ \sum_{i=1}^{λ} (φ_{k_i}(x) D_{k_i}^+(x))_{m_{k_i}(x)} M_{k_i}^+(x) \right] ≥ Q.  (6.59)

Proof of Lemma 1: Let \tilde{φ}_{k_i}(x) = (φ_{k_i}(x) D_{k_i}^+(x))_{m_{k_i}(x)}. Since deg φ_{k_i}(x) < deg m_{k_i}(x) and φ_{k_i}(x) ≠ 0, and since D_{k_i}^+(x) and m_{k_i}(x) are co-prime, we know that \tilde{φ}_{k_i}(x) ≠ 0. Then:

\sum_{i=1}^{λ} \tilde{φ}_{k_i}(x) M_{k_i}^+(x) = \frac{M^+(x)}{m_{k_1}(x) \cdots m_{k_λ}(x)} \left[ \tilde{φ}_{k_1}(x) m_{k_2}(x) m_{k_3}(x) \cdots m_{k_λ}(x) + \tilde{φ}_{k_2}(x) m_{k_1}(x) m_{k_3}(x) \cdots m_{k_λ}(x) + \cdots + \tilde{φ}_{k_λ}(x) m_{k_1}(x) \cdots m_{k_{λ-1}}(x) \right].  (6.60)

The first factor on the right-hand side has degree ≥ Q by (6.21). The second factor cannot be zero because it cannot be evenly divided by any of the polynomials m_{k_1}(x), \ldots, m_{k_λ}(x). Thus the degree of the right-hand side is ≥ Q and Lemma 1 is true.

Since Lemma 1 is true, we know that the error term in (6.19) will always have degree ≥ Q. Since deg c(x) < Q, z(x) must have degree ≥ Q if between 1 and D failures occur. This proves Theorem 1.
Theorem 2 is contained in Theorem 3, and thus proving Theorem 3 is sufficient. We rely upon the following lemma:

Lemma 2: Assume that λ ≤ α errors occur in processors \{k_1, \ldots, k_λ\}. Let \{j_1, \ldots, j_p\} be any set of p processors with p ≤ β. Then:

\deg \left[ (z(x))_{M^+_{j_1,\ldots,j_p}(x)} \right] < Q \quad \text{if and only if} \quad \{k_1, \ldots, k_λ\} ⊆ \{j_1, \ldots, j_p\}

where M^+_{j_1,\ldots,j_p}(x) = M^+(x) / (m_{j_1}(x) \cdots m_{j_p}(x)).

Proof of Lemma 2: Let r_{j_1,\ldots,j_p}(x) = (z(x))_{M^+_{j_1,\ldots,j_p}(x)}. We prove this lemma in two parts. First, we show that deg r_{j_1,\ldots,j_p}(x) ≥ Q if \{k_1, \ldots, k_λ\} ⊄ \{j_1, \ldots, j_p\}. Then, we show that deg r_{j_1,\ldots,j_p}(x) < Q if \{k_1, \ldots, k_λ\} ⊆ \{j_1, \ldots, j_p\}.

Using (6.60), write the result of the reliable CRT reconstruction in the form

z(x) = c(x) + \frac{M^+(x)}{m_{k_1}(x) \cdots m_{k_λ}(x)} ψ(x)  (6.61)

where ψ(x) ≠ 0 and ψ(x) is co-prime to the moduli m_{k_1}(x), \ldots, m_{k_λ}(x). Then, using the division algorithm for polynomials, write this as

z(x) = Q(x) M^+_{j_1,\ldots,j_p}(x) + r_{j_1,\ldots,j_p}(x)  (6.62)

where Q(x) and r_{j_1,\ldots,j_p}(x) are the quotient and remainder when z(x) is divided by M^+_{j_1,\ldots,j_p}(x). Suppose \{j_1, \ldots, j_p\} ∩ \{k_1, \ldots, k_λ\} = \{m_1, \ldots, m_r\} where r ≤ min(λ, p), and suppose \{j_1, \ldots, j_p\} ∪ \{k_1, \ldots, k_λ\} = \{n_1, \ldots, n_s\} where s ≤

λ + p. Let \{\tilde{j}_1, \ldots, \tilde{j}_{p-r}\} = \{j_1, \ldots, j_p\} - \{m_1, \ldots, m_r\} represent the indices of \{j_1, \ldots, j_p\} which do not appear in \{k_1, \ldots, k_λ\}. Similarly let \{\tilde{k}_1, \ldots, \tilde{k}_{λ-r}\} = \{k_1, \ldots, k_λ\} - \{m_1, \ldots, m_r\}. Equating (6.61) and (6.62) gives:

c(x) - r_{j_1,\ldots,j_p}(x) = \frac{M^+(x)}{m_{n_1}(x) \cdots m_{n_s}(x)} \left[ Q(x) m_{\tilde{k}_1}(x) \cdots m_{\tilde{k}_{λ-r}}(x) - ψ(x) m_{\tilde{j}_1}(x) \cdots m_{\tilde{j}_{p-r}}(x) \right].  (6.63)

Since λ ≤ α and p ≤ β, we know that s ≤ D. Therefore by (6.21), the first factor on the right-hand side has degree ≥ Q. When \{k_1, \ldots, k_λ\} ⊄ \{j_1, \ldots, j_p\}, then \{\tilde{k}_1, \ldots, \tilde{k}_{λ-r}\} will be nonempty. Then, since ψ(x) is co-prime to the moduli m_{k_1}(x), \ldots, m_{k_λ}(x), the second factor on the right-hand side cannot be zero. Thus the right-hand side has degree ≥ Q, and deg r_{j_1,\ldots,j_p}(x) ≥ Q if \{k_1, \ldots, k_λ\} ⊄ \{j_1, \ldots, j_p\}.
Now assume that \{k_1, \ldots, k_λ\} ⊆ \{j_1, \ldots, j_p\}. Begin with (6.61) and expand the second term,

z(x) = c(x) + \sum_{i=1}^{λ} \tilde{φ}_{k_i}(x) M_{k_i}^+(x).  (6.64)

Now take the residue when this is divided by M^+_{j_1,\ldots,j_p}(x),

r_{j_1,\ldots,j_p}(x) = (c(x))_{M^+_{j_1,\ldots,j_p}(x)} + \sum_{i=1}^{λ} \left( \tilde{φ}_{k_i}(x) M_{k_i}^+(x) \right)_{M^+_{j_1,\ldots,j_p}(x)}.  (6.65)

Since deg M^+_{j_1,\ldots,j_p}(x) > deg c(x), we have (c(x))_{M^+_{j_1,\ldots,j_p}(x)} = c(x). Also, since each k_i appears in \{j_1, \ldots, j_p\}, M^+_{j_1,\ldots,j_p}(x) divides M_{k_i}^+(x), so (\tilde{φ}_{k_i}(x) M_{k_i}^+(x))_{M^+_{j_1,\ldots,j_p}(x)} = 0 for i = 1, \ldots, λ. The above equation then reduces to

r_{j_1,\ldots,j_p}(x) = c(x).  (6.66)

Thus r_{j_1,\ldots,j_p}(x) = c(x) and deg r_{j_1,\ldots,j_p}(x) < Q when \{k_1, \ldots, k_λ\} ⊆ \{j_1, \ldots, j_p\}. This proves Lemma 2.

To show that the procedure in Theorem 3 works properly, we consider two cases. First assume that no failures occurred, λ = 0. Then by Theorem 1, z(x) = c(x) and deg z(x) < Q, and the procedure would stop in step 2. Second assume that λ ≤ α errors occurred. Lemma 2 guarantees that deg r_{j_1,\ldots,j_p}(x) ≥ Q if \{k_1, \ldots, k_λ\} ⊄ \{j_1, \ldots, j_p\}. Since we are checking all possible combinations of p processors, starting with p = 1 and continuing until p = β, deg r_{j_1,\ldots,j_p}(x) will be less than Q if and only if λ ≤ β, p = λ, and \{j_1, \ldots, j_p\} = \{k_1, \ldots, k_λ\}. Then by Lemma 2, c(x) = r_{j_1,\ldots,j_p}(x) is the correct solution. Otherwise, if β+1 ≤ λ ≤ α failures occurred, then \{k_1, \ldots, k_λ\} ⊄ \{j_1, \ldots, j_p\} and we continue to step 4. This set of faults is uncorrectable.

6.B Solution of Likelihood Equations

In this appendix we solve the likelihood equations discussed in Section 6.5. We begin with (6.42) and (6.43) and model computational noise as a zero mean Gaussian random process (6.44). With this assumption, the likelihoods become

L_* = \max_c \left[ η_* - \frac{1}{2σ_ε²} \sum_{n_1=0}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] - c[n_1, n_2] \right|^2 \right]  (6.67)

L_q = \max_{c, φ} \left[ η_q - \frac{1}{2σ_ε²} \sum_{n_1=0}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] - c[n_1, n_2] - \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] \right|^2 \right]  (6.68)

where η_* and η_q are constants,

η_* = \log P_* - \frac{S}{2} \log(2π σ_ε²)  (6.69)

η_q = \log P_q - \frac{S}{2} \log(2π σ_ε²).  (6.70)

We start by maximizing L_* over c to obtain ĉ[n_1, n_2], an estimate of the output given that H_* is true. Since c[n_1, n_2] is zero for rows N and N+1, (6.67) can be rewritten as,

L_* = \max_c \left[ η_* - \frac{1}{2σ_ε²} \left( \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] - c[n_1, n_2] \right|^2 + \sum_{n_1=N}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] \right|^2 \right) \right].  (6.71)

Maximizing over c we obtain

ĉ[n_1, n_2] = \begin{cases} z[n_1, n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ 0 & \text{for } n_1 = N, N+1. \end{cases}  (6.72)

Substituting ĉ[n_1, n_2] into (6.71) yields the likelihood that hypothesis H_* is true,

L_* = η_* - \frac{1}{2σ_ε²} \sum_{n_1=N}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] \right|^2.  (6.73)

Computing L_q is more difficult since we must estimate both c and φ. Maximizing (6.68) over c we obtain

ĉ'[n_1, n_2] = \begin{cases} z[n_1, n_2] - \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ 0 & \text{for } n_1 = N, N+1. \end{cases}  (6.74)

(Note, we call this ĉ'[n_1, n_2] to emphasize that it is an intermediate step in the maximization process, and not the final estimate of the correct output.) Substituting ĉ'[n_1, n_2] into (6.68), the log likelihood function becomes

L_q = \max_{φ} \left[ η_q - \frac{1}{2σ_ε²} \sum_{n_1=N}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] - \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ_q[n_2] \right|^2 \right].  (6.75)

Maximizing over φ, and keeping in mind that complex quantities are involved, we obtain

φ̂_q[n_2] = \frac{N+2}{2} \left[ z[N, n_2] W_{N+2}^{-(q-1)N} + z[N+1, n_2] W_{N+2}^{-(q-1)(N+1)} \right].  (6.76)

The error estimate φ̂_q[n_2] is formed by averaging the last two rows of z[n_1, n_2] with appropriate phase shifts and a scale factor of N+2. Substituting φ̂_q[n_2] for φ_q[n_2] in (6.74) gives the estimate of the correct output,

ĉ[n_1, n_2] = \begin{cases} z[n_1, n_2] - \frac{W_{N+2}^{(q-1)n_1}}{N+2} φ̂_q[n_2] & \text{for } n_1 = 0, \ldots, N-1 \\ 0 & \text{for } n_1 = N, N+1. \end{cases}  (6.77)

Finally, substituting (6.77) and (6.76) into (6.68), and after some algebra, we obtain the likelihood of hypothesis Hq,

L_q = η_q - \frac{1}{2σ_ε²} \sum_{n_1=N}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] \right|^2 + \frac{1}{σ_ε² (N+2)^2} \sum_{n_2=0}^{R-1} \left| φ̂_q[n_2] \right|^2.  (6.78)

The likelihood equations can be further simplified by assuming that the failure probabilities P_q are the same for q = 1, \ldots, N+2, and by using relative likelihoods defined by

L'_k = 4σ_ε² (L_k - η_q) + 2 \sum_{n_1=N}^{N+1} \sum_{n_2=0}^{R-1} \left| z[n_1, n_2] \right|^2.  (6.79)

The likelihoods then reduce to

L'_* = 4σ_ε² \log(P_*/P_q)  (6.80)

and

L'_q = ρ_{N,N} + ρ_{N+1,N+1} + 2 \mathrm{Re} \left( ρ_{N,N+1} W_{N+2}^{(q-1)} \right)  (6.81)

where ρ_{i,j} is the total correlation between the ith and jth rows of z[n_1, n_2] and is defined in (6.47). We can simplify the hypothesis testing procedure by solving directly for the value of q which maximizes (6.81) rather than computing each L'_q. This yields

q̂ = \left( \mathrm{round} \left[ \frac{N+2}{2π} \mathrm{ARG}(ρ_{N+1,N}) \right] \right)_{N+2} + 1.  (6.82)

Our fault test then proceeds as follows. First, compute the constant L'_*. Then compute q̂ and L'_{q̂} using (6.82) and (6.81). If L'_* > L'_{q̂}, we declare that no processor has failed and

ascribe the nonzero samples in rows z[N, ·] and z[N+1, ·] to computational noise. Otherwise, we declare that processor q̂ has failed and correct the fault using (6.76) and (6.77).

Chapter 7

Fault-Tolerant A/D Conversion

This chapter examines an A/D converter system that provides a high sampling rate and that can tolerate converter failures. Such a system could be used in high stress environments where continuous operation is needed or in remote sensing applications where servicing faulty units is impractical or even impossible. Unlike previous chapters, this chapter does not deal with computational fault-tolerance. Rather, redundancy is used to protect the conversion of continuous-time to discrete-time signals. Although a different application of fault-tolerance, we will show that the robust A/D converter system may be easily placed in the group-theoretic framework of Chapter 4. We describe the basic round-robin architecture used by the A/D converter system in Section 7.1 and introduce linear redundancy through oversampling. Then, by examining the redundancy, we show in Section 7.2 how this system is related to the framework of Chap- ter 4. Section 7.3 develops the optimal error detection and correction algorithm utilizing a generalized likelihood ratio test. The algorithm reduces to a simple form, with a complexity comparable to that of an FIR filter. Section 7.4 considers an ideal unrealizable system, while Section 7.5 addresses problems encountered in a practical implementation. Both sections contain results from detailed computer simulations. We conclude in Section 7.6.

7.1 Round-Robin A/D Converter

A round-robin A/D converter system is shown in Figure 7-1. It contains N slow A/D


converters, each with fast sample-and-hold circuitry. The first converter samples and holds the analog input signal s(t) and then begins a conversion. After a fixed delay, the second converter samples the signal and begins a conversion. This repeats for all N converters, and by the time the Nth converter starts, the first converter has finished and is ready to accept another sample. Operation continues in this circular fashion. If a conversion requires T seconds for a single converter, then the overall sampling rate for the round-robin system would be N/T samples/sec, and the input can contain frequencies up to f_max = N/2T Hz.

Figure 7-1: Round-robin A/D converter system.

To decorrelate the quantization noise from the signal, we use a small amount of dither. Dither circuitry adds a random analog voltage uniformly distributed between ±1/2 lsb (least significant bit) to the sample. After conversion, it subtracts this same quantity from the digital signal. As a result of dither, each output sample contains white quantization noise uniformly distributed between ±1/2 lsb, and which is uncorrelated with the signal.
We assume that the converters must operate in a stressed environment and that they are the only components subject to failure. We model converter failures as being independent and assume that when a converter fails, it corrupts only the samples assigned to it.

We assume that the dither circuitry, digital processing, and output buses always function properly. If necessary, these components could be protected against failure by modular redundancy or the digital processing may be performed remotely in a less stressful environment. Failures in the dither circuitry can be restricted to cause no more than a ±2 lsb error in the samples. This system is made robust by introducing redundant information. Keep the analog input signal bandlimited to ±f_max Hz, and add C extra converters to increase the sampling rate to (N+C)/T samples/sec. The input signal is now somewhat oversampled.

7.2 Relationship to Framework

We now describe how the results of this chapter are related to the group-theoretic framework presented in Chapter 4. Let s(t) and z[n] denote the continuous-time input and discrete-time output of the A/D converter system. s(t) is a real continuous-time signal that is bandlimited to ±f_max Hz. Let g denote the set of all continuous-time real signals that are bandlimited to ±f_max Hz. The set g together with the field R forms a vector space G. The group product in G is defined as the addition of functions and scalar multiplication corresponds to multiplying a function by a constant. Similarly, the set of all possible discrete-time sequences z[n] forms a vector space H over the field of real numbers. Operations in H consist of the addition of sequences and scalar multiplication. We can model the sampling process as a mapping from s(t) ∈ G to z[n] ∈ H. We begin by ignoring quantization noise and assume that the converters have infinite precision. (Quantization noise does not play a role at this point; it only becomes important when we consider the optimal solution based on a generalized likelihood ratio test.) If no faults occur within the system, then the output may be written as

z[n] = s(nT). (7.1)

It is easily shown that the sampling process satisfies the requirements of a vector space homomorphism (4.56). Since we are oversampling, z[n] contains redundant information. The subset of valid error-free sequences in H corresponds to the subspace G̃ of low-pass

discrete-time signals that are bandlimited to ±πN/(N+C) radians.

Now assume that λ failures occur in converters k_1, k_2, \ldots, k_λ. The output will then equal

z[n] = s(nT) + φ_{k_1}[n] + φ_{k_2}[n] + \cdots + φ_{k_λ}[n]  (7.2)

where φ_k[n] denotes the net effect of a failure in the kth converter on the result. φ_k[n] is zero except for samples that came from the kth converter. The output may equivalently be written as

z[n] = s(nT) + e[n]  (7.3)

where e[n] = \sum_{i=1}^{λ} φ_{k_i}[n] equals the net error. Comparing (7.3) and (4.21), we see that the redundancy present in the output is of the same form as that studied in Chapter 4. Valid results lie in a subspace G̃ of H, and an additive error model is assumed. Thus we may use the results of Section 4.3 to analyze this system. We assume that when converter k fails, it corrupts its output samples in an arbitrary manner, φ_k[n] ∈ R. This constitutes a symmetric error model, and we may therefore analyze the redundancy present in the system via its minimum distance. The minimum distance is most easily computed by application of (4.48). We must determine the minimum number of errors such that the net effect of all errors is a low-pass signal bandlimited to ±πN/(N+C) radians. This analysis is most easily performed in the frequency domain. Let

Φ_k(ω) = \sum_{n=-∞}^{∞} φ_k[n] e^{-jωn}  (7.4)

w k 4bk (+ N +~~~~~~~~~~2irl C )= Pk (w) e N+C* (7.5)

· k(w) contains N + C complete copies of the fault spectrum in an interval of width 2r. N copies are contained in the low frequency region, and C copies are contained in the high frequency region where there is no energy from the low-pass input signal. This is illustrated in Figure 7-2.

145 Low Pass Signal

m Quai I1 b

O3 itN 0 ILN N+C N+C

Figure 7-2: Output frequency spectrum of round-robin A/D converter system.

Using the linearity of the Fourier transform, we may write the transform of the net error e4n] as E(w) = ,kj(w). (7.6) j=l In order to determine the minimum distance of this code, we must determine the smallest

value of A such that E(w) = 0 in the interval N W_+C< w < (N+2C)N+C for some nonzero errors kj (w). Since E(w) is composed of a sum of errors 'Ik(w) which have a periodic spectrum, .N r(N+2) esm rqec we may evaluate this on a point-by-point basis. Let

k (o) = $k(Wo) (7.7)

· k( o++ N+CN C - k(Wo)Wk (7.8)

(7.9) 'k· ~(+ 2(N+C~b - 1) = k(WO)Wk(C-1)

146 j 2,r where W = e N+c. Combining this with (7.6), we may write

A E (o) - E k (o) (7.10) j=l

E o+ +CC)j = A ' k (go) W k (7.11)

(7.12) 2/C- 1)\ A E (o + 2N+(C = kj (wo)Wk (c ). j=1

We must determine the minimum value of A such that the above equations all equal 0 for some set of nonzero errors. Express these equations in matrix form as E = We where

E(wo) 1 1 ... 1

E (wo + ) =Wk Wkx Wk2 .

E (wo + 2(C-1) Wki(C-1) Wk2(C-I) ... Wk(C-1)

ek (o)

k2 (wo)

Sk (o 0) If O 5 0, then E will be zero only if W is not full rank. We must therefore determine the

smallest value of A such that the columns of W are linearly dependent. The i t and jh columns of W are linearly independent if ki kj. Thus the columns of W become linearly dependent only when A = C + 1. Therefore, this oversampling technique generates a code with a minimum distance of C + 1. With C = 1 extra converter, we can detect single failures. With C = 2 extra converters, we can detect and correct single failures. (We will show that due to the nonidealness of realizable filters, at least C = 3 extra converters are needed to correct single errors.) We can detect and correct errors using a syndrome homomorphism b. Since equals

147 the subspace of low-pass signals bandlimited to iN- radians, the quotient space H/G can be shown to be isomorphic to the vector space of discrete-time high-pass signals con- taining frequencies above radians. Denote this quotient space by T. The syndrome homomorphism from H onto T contains G in its kernel. An obvious homomorphism which

satisfies this constraint is a high-pass filter with cutoff frequency i N radians,

Ob(z[n]) = z[n] * hhp[n] (7.13)

where hhp[n] denotes the impulse response of the ideal high-pass filter. If the output of the filter equals zero, then z[n] E G. Otherwise, if the output is nonzero, then an error has occurred. The broken converter is identified from the phase difference between high frequency copies of the fault spectrum. Then the fault is reconstructed by averaging the C copies in the high frequency region, and it is subtracted from the observations to estimate the fault-free signal. The coding scheme we use is similar to one developed by Wolf [46]. He shows that under certain conditions, discrete-time sequences carry redundant information which allows detection and correction of errors. Specifically, sequences whose discrete Fourier transforms contain zeros can be protected against impulse noise. Wolf's error detection and correction scheme is based on coding theory, while ours utilizes a generalized likelihood ratio test. Both methods use out-of-band energy to detect errors. A/D converters inherently quantize the input signal and, together with the dither cir- cuitry, introduce additive white noise in z[n]. This radically alters the manner in which error detection and correction must be performed. Errors can no longer be detected by testing if z[n] e since error-free signals ae not strictly bandlimited. We will instead pursue an error detection and correction strategy that specifically takes into account the effects of quantization noise.

7.3 Algorithm Development

This section develops the optimal fault detection/correction algorithm using a generalized likelihood ratio test. We include the effects of quantization noise in our model and write

148 the output of the round-robin system as

z[n] = s[n] + ε[n] + φ_k[n]  (7.14)

where s[n] is the desired low-pass signal and ε[n] is white quantization noise which is uncorrelated with s[n]. Let s be a vector of all samples s[n], and define z, φ, and ε similarly.

7.3.1 Generalized Likelihood Ratio Test

We will use a generalized likelihood ratio test to determine the most likely fault hypothesis and to correct the fault if necessary. Let H_* represent the hypothesis that no converter has failed, and let H_k represent a failure in the kth converter where 0 ≤ k ≤ N+C-1. Define p(H_*) and p(H_k) as the a priori probabilities of these events, which we assume are independent of the signal or fault,

p(Hk) = p (Hkl s, k). (7.15)

We must compute the likelihood Lk of each hypothesis Hk given the observed data z. For hypothesis H*, the likelihood unfortunately depends on the unknown signal s. Therefore, we will maximize the likelihood over s to determine the likelihood L* of hypothesis H*,

L* = max log[p(z I H*,,s) p (H*)]. (7.16) 3

For hypotheses Hk, k = 0, . . ., N + C - 1, the likelihoods depend on both the unknown signal s and the unknown fault /. We therefore maximize over both s and to determine the likelihood Lk of Hk,

Lk = max log [ ( I Hknot) p(Hk)] (7.17)

The most likely failure hypothesis is chosen by finding the largest likelihood. The failure estimate k (if any) and a clean signal estimate s are the values at which the likelihood is maximized. The bulk of our derivation will be performed in the frequency domain where the distri-

149 bution of samples of the noise transform is approximately white Gaussian noise. We will exploit this by approximating the noise as being zero mean Gaussian in both time and frequency: p(c[n]) = N (0, a) . (7.18)

where the variance a2 = Isb2 /12. We begin by solving for L*. Using the distribution of the quantization noise, (7.16) can be written as

1 M-1 L = max ogp(H*)- 2 E log (27r E -z2f E-(s[n] = (7.19)

where M is the number of samples available. The first two terms are constants and will be denoted as g*. Define Z(wr), S(w), and Ik(W) to be the M point DFTs of z[n], s[n], and Ok[n]. For long time intervals, Parseval's theorem can be applied to the third term in (7.19), giving M11 L = max M1 IZ()- S )] (7.20) s 2oa2 M

where w, is a frequency index with value wr = 2trr/M. We will work with frequencies in the range of 0 to 2ir, and divide the frequency samples

into a low frequency region, 91L, which contains signal energy, and a high frequency region, fH, which does not. The regions will be divided as follows:

QL = {w r=O,...,ML-landr=MH,...,M-1} (7.21)

IH = {WrIT= ML,...,MH-1}

where

ML= ( MC and MH=M_(h)N (7.22)

Assume that M/2 is a multiple of N + C so that the frequency samples can be easily divided into the two regions.

150 Since S(w,) is bandlimited, (7.20) can be rewritten as

== max - 2 M IZ( _)-S(Wr)l + E IZ(Wr)12}] (7.23)

Nowmaximize with respect to S(Wr) to obtain, Now maximize with respect to S(w,) to obtain,

for wr E QL (r{)= Z(r) (7.24) 0 else.

This can also be written as,

S(w7) = HLP(wr)Z(wr) (7.25) where HLP(wr) is an ideal low pass filter with frequency response

for Wr E QL HLP(Wr) = { (7.26) else.

If there are no faults, so that hypothesis H* is true, then this optimally "clean" signal estimate S(wr) is found by low pass filtering Z(wr) to remove high frequency quantization noise. Substituting (7.24) into (7.23), the likelihood of H* becomes,

11 (7.27) L = 7 Ir E [:Z(Wr)12' C WrEfOH

It is convenient to define ZH(w,) as a high pass version of Z(wr),

ZH(W,) = HHP(W,)Z(W,) (7.28) where HHP(wr) is an ideal high pass filter with,

for wr E PH HHP(Wr) =1 { (7.29) 0 else.

151 Then since (7.27) only depends on the high frequency samples of Z(w), we can write

1 1 M-1 L* = A*-2 M IZH(W))l= (7.30) r-0

Applying Parseval's theorem and returning to the time domain results in,

1 M-1 L*= *- 2 E ZH[n]. (7.31)

Now consider the case of a failure in the kth converter. As before, we apply Parseval's theorem to the time domain likelihood expression for Lk and obtain

1M1 1 1 M-1 i Lk = max ogp(Hk) log- (2k), -222(W M s(])-- 4(7)lWk()I- . s-'~k ~~n=0 = (7.32) The first two terms of this expression are constants and we will denote them as 7k. Divide the summation into low and high frequency regions and maximize with respect to S(Wr) to obtain

S(wr) = HLP(w) [Z(w7) - "(Wr)]. (7.33)

Substituting this back into (7.32), the log likelihood function becomes

Lk = max k - M W IZ(Wr) . k(Wr))l2] (7.34) Elk 27,2 M ~~~wrEflH

Since the summation is only over high frequencies, substitute ZH(w,) for Z(Wr). Next, extend the summation over all frequencies and subtract any newly introduced terms,

Lk = max 2 JM_4r2 2')] (735 jjk 2~M !, r=0 IZH(2o)- 'k(Wr)l - WrEQLZ I

In order to reduce this expression further, we must exploit the structure of k(wr). 4k[n] only contains samples from the faulty converter, and it is zero except for one out of every N + C samples. Its transform k(wr) consists of N + C copies of the fault spectrum, each

152 with a phase shift that depends on which converter failed,

2~rl )2'r ok (W + N +C) = ~k(w)e N+C ' (7.36)

Therefore we may write

k ((,. + N2-C), | = k(Wr) (7.37) for all wr and all integers 1. We use this to extend the second summation in (7.35) to include all frequencies,

1 1 2 N M-1 Lk = max k- 2 M {E IZH(Wr) -_k(Wr)l2 N + C Ik(r)2 (7-38) 2a~,kfr= N C O

Applying Parseval's theorem and combining terms we obtain,

c [n ] Lk = max k - z[n]- 2k[nzH[n] + N C (7.39) 'Ok f-n-O

We can now maximize this expression and solve for the fault estimate. For a failure in the kt converter, the only nonzero samples of 4hk[n] are those for which n k (we will use this notation as shorthand for n mod (N + C) = k.) Therefore maximizing (7.39) yields,

k[n] (= N+C zH[n] for n k 0 else

as the optimal fault estimate. Now, to obtain the likelihood of Hk, substitute (7.40) back into (7.39). After some algebra we are left with,

1 M-1 1 N+C L=a 2kE H2[] + 2J C EzH[n] (7.41)

Assume that all a priori failure probabilities p(Hk) ae the same for k = 0, . . ., N + C - 1, so that the constants 7k are all equal,

tk= fork = 0,..., N + C-1. (7.42)

153 Erro

z[n]

LN+C-i

N+C.-1tJ

Error Correction z[n] +{ H ( I}) : n]if .*

z[n] t[nlif --*

[an] k

Figure 7-3: Ideal error detection and correction system.

Then it is convenient to use scaled relative likelihoods defined by,

L' =2a2 N+C k f C [Lk - L* - + q*] fork=*and 0,...,N+ C - 1. (7.43)

The scaled relative likelihoods reduce to,

(7.44) 2 L= k (N+CC ) 2[n] = E [n] (7.45) n=_knEk n[k where 1op (H/)1 = 2 o'(N+C) [lgp (Hk)j (7.46) is a constant. For hypothesis k = ,..., N + C - 1, the best estimate of the original signal is then,

S(W') = HLP(W) [Z(Wr)- 'k(Wr)] - (7.47)

The complete generalized likelihood ratio test fault correction system is shown in Fig-

154 ure 7-3. The dithered sampled signal z[n] is first scaled by NYC and filtered by HHP(Wr). The output of this filter, N-C zH[n], is sorted into N + C interleaved streams, with the kth receiving every (N + C)th sample starting with sample k. Each of these streams is simply qk[n], the best least squares estimate of the fault in the kh converter, assuming that converter k is faulty. We measure the energy in each of these fault estimates, and if any energy is greater than the threshold y, we declare that a converter has failed. The largest energy indicates the best guess k of which converter has failed. The signal is then reconstructed by fixing the incorrect samples (if any) and low pass filtering.

7.3.2 Required Number of Extra Converters

Insight into choosing an appropriate number of extra converters C can be gained by writ- ing (7.40) in the frequency domain,

1N+C-1 ( 2- )( - +Zs .~v+¢ (7.48) I-=0 l

The magnitude of this function is periodic in wa with period f.r Since ZH(Wr) is high pass, for a given wr only C terms in this sum ae nonzero. Each N point "period" of

k(Wr) is thus formed by averaging C sections of ZH(w,) with appropriate phase shifts. If C = 1, no averaging takes place. The fault estimates, k(wr) for k = 0,. . ., N + C - 1 will differ only by a phase shift, and the energy in each will be the same. The system will only be able to decide if a fault has occurred, but will not be able to determine the specific faulty converter. If C > 2, then the energy in ik (r) is maximized when the proper phase shift is applied so that all C non-zero terms add coherently. Thus, with C > 2 we can achieve single fault correction.

7.3.3 Probability of False Alarm

The hypothesis testing procedure can make several errors. A "false alarm" occurs when H1* is true but the algorithm selects Hk as most likely, for some k # *. "Detection" occurs if converter q has failed and the method chooses any hypothesis except H*. "Misdiagnosis" occurs when converter q has failed, hypothesis Hk is diagnosed, and k 0 q. Let PF, PD,

155 and PM represent the probabilities of these events. We can develop a Neyman-Pearson test which adjusts the threshold 7-to achieve a given probability of false alarm, PF. Suppose H* is true and that the quantization noise E[n] is white zero mean Gaussian with variance c 2. Now

M-1 ZH[n] = E hHP[n - ]fE[]. (7.49) l=0

Since the samples [n] ae Gaussian, so are the samples zH[n] with mean and covariance given by,

E[zH[n]IH*] = 0 (7.50) M-1 Cov[zH[n],zH[m]JH*] = E hHp[n -I]hHp[m - ]o 1=0 = hHp[n-maf (7.51) where the last line follows because hHp[u] * hHp[n] = hHp[n], where '*' represents cir- cular convolution with period N + C. Since hHp[n - m] has zeroes spaced (N + C)/C apart, samples of zH[n] spaced by multiples of (N + C)/C are statistically independent. Equation (7.40) thus implies that the non-zero samples of Ok[n] are independent Gaussian random variables with zero mean and variance (+C) a2. Equation (7.45) implies that un- der hypothesis H*, each L' is the sum of the squares of M/(N + C) independent Gaussian random variables, k[n]. Each L' is thus Chi-square distributed with D = M degrees of freedom, and '00I 1 ~~D/2_1e-t P ( > y1 *) = tD/2 le tdt (7.52) where r(x) = (x - 1)! is the normalization factor. A false alarm occurs when H* is true, but one or more likelihoods ae greater than 7. Thus

PF1-P(LO < ,- -< -, N+L- /*). (7.53)

Since ZH[n] is a bandpass signal with bandwidth +rC/(N+C), the L ae highly correlated with each other. Samples of zH[n] spaced by (N + C)/C will be independent of each other,

156 and the others are determined by linear interpolation. This implies that likelihoods spaced by (N + C)/C are independent of each other. Since there are C of these,

PF : 1- (lo < L+cC < ,2('_ ) 5 ,..,L(C)(N_ ) i 71H,I)C = 1- P(L < lH*)c

= 1-l1-P(L4> -y1*)]C

CP ( > I H*) (7.54) where the approximation in the last line is valid for small PF. We can thus use the chi- squared formula in (7.52) to set 7 to achieve any desired level for PF. If the integration interval M is much larger than the number of converters N + C, then the number of degrees of freedom D will be large, and the distribution of L£ can be approximated as Gaussian. In this case, a good approximation for PF is in terms of an error function: PF ~ ~erfcC ( -, -E [Lo'0[iH] A *(7.55) (755) 22 aTa[ Lol H*]} where: erfc(x) = 2 f e-t2dt (7.56) and where the mean and variance of the likelihoods are given in Appendix 7.B:

E [LoI *] M (=N+C) a2 (7.57) 0 ~N+C ) Var [4l H*] = 2NMC(N C )c . 0 N+C C (7.58)

In other words, y should be set to:

y = E [LoIH*] +3 Va [L0, H*] (7.59) where: /3 = V2erf-1 (2PF) (7.60)

157 A reasonable approximation to erfc(x) is +xe Therefore PF falls rapidly as a increases. It is usually better to set -y somewhat too high, rather than too low.

7.3.4 Probability of Detection

The probability of detection PD is defined as the probability that a fault is declared on some converter given that a fault has occurred on converter q. In practice, this is approximately equal to the probability that likelihood Lq > y:

PD = 1-P (L < 'P P ( L > II Hl,g (7.61)

If the number of terms summed in the likelihoods is large, M /(N + C) > 1, then we can approximate Lq as Gaussian. Appendix 7.B shows that:

1 PD 1 - -erfc (7.62) 2 where:

E [L'4Hq,0q] MN+C 2Cc] (7.63) =M (N+CC 2 [1 + FNR+ (+C Var [LQ Hq, ] (7.64) = 2N +C (C [N N+C) and where FNRq is the fault-to-noise ratio on converter q:

FNRq = M E ,2 (7.65) M =o0a

Since erfc(z) decays faster than rate e- ' 2, we expect PD to approach 1 exponentially as FNRq increases or as the integration length M/(N + C) increases.

158 7.3.5 Probability of Misdiagnosis

A misdiagnosis occurs when a fault occurs on converter q, a fault is declared, but the wrong converter is identified. This will cause the algorithm to "correct" the wrong converter. The probability of converter misdiagnosis is difficult to compute analytically, but it is possible to gain some insight by examining a simpler measure. We can compute instead the probability that some likelihood L' is greater than L/ given that Hq is true and that the fault is q. Assuming that the number of terms summed in the likelihoods is large, M/(N+C) > 1, then the likelihoods L and L can be approximated as jointly Gaussian. Then Appendix 7.B shows that:

P > L4Hq,2 -erfc 2 k (7.66)

where

2 E [ - L Hqq] = NM c ( -S (k - q)) FNRq (7.67)

Var [L- I Hq, =

4 M (N+C)C C)2oC (1-S2(k-q)) (1 + FNRq (C )) (7.68)

where

S() = ( ) (7.69) C sin (i)

is a circular sinc function with S(O) = 1, and FNRq is defined in (7.65). For fixed N, C, and M, and for k # q, S(k - q) is maximized for k = q 1. Thus, the most common misdiagnosis declares an adjacent neighbor of the faulty converter to be faulty. Also note that increasing the number of samples contributing to the likelihood, M/(N + C), increasing the fraction of extra converters C/(N + C), or increasing the size of the fault, FNRI, all decrease the misdiagnosis probability.

159 7.3.6 Variance of Low Pass Signal Estimate

In this section we consider the accuracy of our system in estimating the original low pass signal. The equations presented are derived in Appendix 7.A. First, consider the case when all converters are functioning properly and hypothesis H* is chosen. We can show that the expected value and variance of our estimator equals:

E[S[n]IH*] = s[n] (7.70)

N c r2 Var[9[n]H*] = N+ 2a . (7.71)

Now suppose that the q converter is broken with actual fault 0q[n], and that hypothesis q is correctly chosen as the most likely. Under these conditions, we can show that:

E[ H[nI,,] = s[n] (7.72)

Var[9[n]JH 9,] = J [NC f q (7.73) _i [N else N+C +CS (k - q)]

Thus we see that our estimator S[n] is unbiased. Signal estimate samples for the faulty converter have variance N/C times larger than the quantization noise of a working converter, a2. All the other signal estimate samples, however, have variance below a2. The average signal estimate variance is given by

MNEVa ~q,]= [sn1 C + n=O -- +

7.4 Simulation of Ideal System

In this section we present computer simulation results for the ideal generalized likelihood ratio test. The system studied had N = 5, C = 3, and M = 1024. The A/D converters were modeled as having a dynamic range of 1 volt and B = 12 bit resolution. Quantization noise was modeled as uniformly distributed between ±l1sb, with variance 2 = sb2/12. Despite this non-Gaussian noise, the likelihoods are formed from so many independent terms, M/(N + C) = 128, that they can be accurately modeled as having a Gaussian

160 ,,m A 1 .U Analytic P F using Chi-square approximation 10- 1 ...... PF observed in simulations

- 2 10 PD for I bit faults observed in simulations 10 -3

10 4 0 2 x 1 6 ' 4 x106 6x10 Threshold (gamma)

Figure 7-4: Comparison of analytic and measured likelihoods for the ideal system. distribution. Substituting into our formula for PF in (7.55) gives:

PF 1.5erfc 60 341o ) (7.75)

For example, a value of i = 3.85 in (7.59) yields 7 = 505o4 and PF = 10 - 4. To test this formula, we performed 10,000 simulations without faults and recorded the likelihoods. A graph of PF vs. as predicted by (7.54) is compared with these computer simulation results in Figure 7-4. As expected, the simulation and analytic results ae very close, and thus either our chi-squared formula (7.54) or Gaussian approximation formula (7.55) can be used to set 7y. Next, in order to measure PD, we simulated the system with a faulty converter. A fault - B - 1 of F bits corresponds to adding random noise uniformly distributed over a range ± 2 F to the input before quantizing. When F = B the A/D converter has completely failed; when F = 1 only the lsb is broken. Approximating the likelihoods as Gaussian, formula (7.62),

161 and recognizing that a 1 bit fault corresponds to FNRq = 4, we find that

PD Pz 1 - 0.5erfc(55 2 2(7.76)2) for a 1 bit fault. We performed 10,000 simulations with 1 bit faults and a graph of the ideal versus the experimental PD vs. y is shown in Figure 7-4. Fault detection is highly reliable. For example, the same value of y which gives PF = 10- 4 in our example yields PD = 1 - 10- 5 . In fact, with this -y, every fault was detected in our 10,000 tests. The probability of misdiagnosis is also low. During 10,000 simulations with 1 bit faults, we found that converter misdiagnosis occurred only 5 times, giving PM 5 x 10- 4. When a mistake did occur, one of the faulty converters' immediate neighbors was always identified as being broken. These results are consistent with formula (7.66), which gives for our example:

P (L+ 1 > L' H,) 4 x 10-4 (7.77)

The simulation results also matched the predictions of signal variance made in sec- tion 7.3.6. When H* is true, the average variance of S[n] was found to be 0.62a2, and this agrees well with (7.71). When any size fault occurred, the variance of S[n] was found to be 0.832a2, which agrees with (7.74).

7.5 Realistic Systems

In practice, the data stream to be processed is infinite in length, M = oo, and so our batch oriented ideal system is unrealizable. In this section we consider several modifications to the ideal scheme in order to make it practical. First, we remove the low pass filter on the signal estimate, since it reduces the error by less than 1 bit. Second, we remove the dither system. Third, we replace the unrealizable ideal high pass filter with an FIR filter designed to minimize the variance of the signal estimate in case of failure. Fourth, we replace the infinite length integration to compute the likelihoods with finite length boxcar or IIR filters. Finally, we attempt to develop a robust real-time, causal strategy for declaring and correcting faults which works well for both continuous and transient faults.

162 7.5.1 Eliminate the Low Pass Filter

We can substantially reduce the computational complexity of estimating the signal in (7.25) or (7.47) by omitting the low pass filter on the final signal estimate, and instead using the approximate signal estimate .SA[n] defined by:

if H* appears to be true SA[n] = z[n] (7.78) z[n] - +k~n] if Hk appears to be true.

Appendix 7.A shows that the low pass filtering operation leaves the sample estimates from the faulty converter unchanged, .S[n] = A[n] for n q, and only affects samples from working converters. It also derives the mean and variance of this unfiltered estimator, assuming the fault is correctly diagnosed:

E [A[n] H, q, ] = s[n] (7.79)

if n -q Var [A[nq,] ~{ N2 (7.80) else.

The estimator is unbiased, and for typical values of N and C, the average variance is only slightly greater than that of the low pass filtered estimator. Thus, eliminating the low pass filter cuts the computational load by almost half, with little loss in accuracy. We repeated our previous simulation and estimated the variance of the unfiltered signal estimate. We found that regardless of the size of the fault, the average variance was 1.086or without the low pass filter, compared to 0.832a0 with the low pass. This matches our theoretical predictions, and is a strong argument for eliminating the low pass.

7.5.2 Eliminate the Dither System

The dither system at the front end of the A/D system was used to ensure that the quantiza- tion noise was white and uncorrelated with the signal. In practice, if the signal is sufficiently random, without long slow ramps or constant portions, then the noise will be nearly white even without dither. In our simulations with quantized Gaussian signals, we could detect

163 little difference between systems using dither and those without. We can thus eliminate the dither system, simplifying the hardware and eliminating a potential failure without severely degrading performance.

7.5.3 Finite Length Filter

In a practical system with an infinite data stream it is convenient to replace the ideal high pass filter HHP(w,) in (7.29) with a finite order FIR filter. This has an important impact on the choice of C. Unlike the ideal high pass, an FIR filter will have a finite width transition region. To avoid signal leakage, the filter's stopband will have to fill the low frequency region QL. The transition band will have to be inside the high frequency region QH, which leaves only a portion of the high frequency region for the passband. At least two complete copies of the fault spectrum are needed in the filter's passband in order to be able to correct any faulty converter. To achieve this with an FIR filter will require at least C = 3 extra converters. There are many possible ways to design the FIR high pass filter; we could window the ideal high pass, we could use Parks-McClellan, and so forth. In the following, we consider an alternative design strategy based on minimizing the variance of the fault estimate. Let h[n] be the impulse response of a high pass filter. Let us assume it has odd length 2K + 1, and that it is centered about the origin, h[n] = 0 for InI > K. Let us assume that a fault occurred on the q converter, Hq, with unknown value q[n], and that the faulty converter has been properly identified. As before, let us assume the quantization noise e[n] is zero mean and white, with sample variance a2. Unlike the previous sections, let us also assume that the signal is described as a wide sense stationary stochastic process with zero mean and covariance R[n] = E [sj]s[n + j]]. We assume that the signal and noise are uncorrelated, and that the signal power spectrum P(w) (the Fourier transform of R[n]) is low pass, P(w) = 0 for w E fH. Now suppose we generate a fault estimate qq[n] by high pass filtering the observations,

L h[l]z[n - 1] for n= q Oq[n] = f=-L 0 else

164 K a h[l] (s[n - ] + [q[n - ] + ~[n - ]) for n = q

0 else

(Note that we have absorbed the scaling factor - shown in (7.40) into h[n].) Now for samples n - q,

E [>Jn]Hp O] = E h[l]Oq[n - 1]. (7.82)

h [p(N + =1rp=- )] forp=o0 =- (7.83) 0 for p= ±1,+2...

1=-KKK NoteTo eliminate that this any implies bias, thatdesign a filterh[n] soof thatlength (N + C) - 1 will do just as well as a filter of length I(N+C)+1. A "natural" length for the FIR filter, therefore, is 2K+ 1 = I(N+C) - 1 for some 1. Subject to the above constraint, let us now choose the remaining coefficients of h[n] in order to minimize variance ofthe the fault estimate. For n q:

Var [q[n]JH¢, ,s]J Var E h[l~s[n - 1] + E h[l]c[n - 1 Hq,] l=-K l=-K K K K = E E h[l]hp]R[p-I] + E h~[l]f I=-K p=-K l=-K = hT (R + a2I) h (7.84) where h = (h[-K] .. h[K])T and R is a (2K + 1) x (2K + 1) Toeplitz correlation matrix with entries R,, = Rm- n]. For convenience, number the rows and columns of h and R from -K through +K. We now optimize this vaxiance over all possible sets of filter coefficients h subject to the constraints in (7.83). Since the samples of h[n] for n = 0 are already specified, we will remove them from h and include them explicitly in the equation. Form the reduced vector h from h by removing rows 0, ±(N+ C), ±2(N + C),.... Similarly, form the reduced matrix R from R by removing rows 0, +(N + C), ±2(N + C),... and

165 columns 0, ±(N + C), ±2(N + C),.... The variance then becomes,

R[O] + a + 2h+ _(R_ +aI) h (7.85) where i = (R[-K] ... R[K])T is column 0 of matrix R with rows 0, ±(N+C),±2(N +C),... removed. Set the derivative with respect to h to zero to find the minimum variance filter:

h = i - (7.86)

To obtain the optimal FIR high pass filter h, reinsert the samples specified by (7.83). This method is guaranteed to have a unique minimum solution since R+ a2I is positive definite. Also, the optimal filter h[n] will have even symmetry because of the symmetry of R + a2I and . A different y threshold is needed for the FIR high pass than for the ideal high pass. Including contributions from both signal leakage and quantization error, equation (7.84) gives the variance of q[n], with the optimal filter h substituted for h. If we approximate samples q[n] and Oq[n + I(N + C)] as being uncorrelated for any # 0 (this is strictly true only for an ideal high pass), then formula (7.55) for PF continues to apply, but with:

E[[LjIH*] = N+ C+ a

Var[LIH*] = 2 N +C[ R + a2I) ] (7.87)

We reconstruct the signal as in (7.78), without using a low pass filter. We can then show that:

E[sA[n]-s[n]jHqO] 0 0,2 -1

Var [A[n]- s[n] Hq,- ] = T(A + 'I) r for n-q (7.88) else.

A particularly useful case is where the signal power spectrum is low pass and flat in the pass band, P(w) = c2 for w E QL and P(w) = 0 for w E QH. (This is the signal model used

166 Filter A i B C Ideal 23 points 31 points 63 points oo points Var [.§A[n]] 1.39cr2 1.2202 1.1502 1.08a2 Calculated ? 968o 2 716a 612 505~2 Measured PF 2.5 x 10- 4 2.2 x 10 - 4 9.6 X 10- 5 1 X 10 - 4 1 bit 1 - PD 5.89 x 10-2 2.80 x o10- 3 3.8 x 10-4 faults PM 1.58 x 10-1 5.42 x 10- 2 6.39 x 10- 3 5 x o10- 4 2 bit 1 - PD < 10 - 5 < 10-5 < 10-5 < 10- faults PM 4.3 x 10- 4 < 10- 5 < 10 - 5 < 10 - 5 3 bit 1 - PD < 10 - 5 10- < 10- < 10o- 5 faults PM < 10- 5 < 10- 5 < 10 - 5 < 10- -

Table 7.1: Results for the three optimal filters tested. in all our simulations.) The covariance of s[n] is then

R[n]R[n] = o,2 n (7.89)

where a: = 22B0.2 is the variance of a signal filling the available dynamic range of B bits. Using the same system arrangement described earlier with N = 5, C = 3, and B = 12, we computed the optimal filter solution (7.86) for lengths 23, 31, and 63 samples respectively. Graphs of the frequency responses for the filters ae shown in Figure 7-5. Once again, we collected M = 1024 samples so that each likelihood was formed from M/(N + C) = 128 terms. 105 independent trials were performed, and we used (7.55) with (7.59) and (7.87) to set the threshold 7. We found that a value of iB = 3.85 in (7.59) gave a false alarm probability of PF 10 - 4 . Results for the three filters tested are shown in Table 7.1.

All filters tested were able to accurately detect and diagnosis a faulty converter. The longer the filter, the more accurate its estimate k[n], and thus the better its performance. Although none of the FIR filters were as sensitive as the ideal filter in detecting errors in -the least significant bit, larger faults could be detected and diagnosed accurately. The variances of the signal estimates are also shown in Table 7.1. The variances are quite low,

167 20

0

-20

-40

%O -60 ._C

- 80

U. -100

-120

-140

_1 n 0 0.25 0.5 0.75 1.0 0.125 0.375 0.625 0.875 Normalized Frequency

Figure 7-5: Graphs of the frequency responses of filters tested. A:23 pt., B:31 pt., C:63 pt., optimal filters. D:ideal filter.

168 and their values match match those predicted earlier. The false alarms which occur do not significantly raise the average signal estimate variance for these filter lengths.

7.5.4 Finite Order Integrators

The exact likelihood algorithm computes the likelihoods by integrating the energy in the fault estimates over all time, and then makes a single fault diagnosis for the entire batch of data. It would be useful to develop a real-time scheme which makes fault diagnoses and corrections based only on the most recently received data. One change needed is to compute the likelihoods using finite order integrators: we consider rectangular windows and single pole IIR filters. At time n, we use the high pass filter output k[n] to update the kth converter's likelihood L'[n] where n k:

Kint-1 Rectangular: L'[n] = E [n - (N + C)] 1=0 (7.90)

IIR update: L'[n] = aL'[n - (N + C)] + 2[n] = E alk[n - I(N + C)] 1=0 where Kint is the length of the rectangular window integrator, and 0 < a < 1 is the decay rate of the IIR integrator. The IIR approach has the advantage of requiring only O(1) storage, while the rectangular filter requires O(Kint) storage. Appendix 7.C shows that for long integration windows (i.e. large Kint or a ~ 1), our formulas for PF, PD, and PM are still valid, but the likelihood means and variances are different. The expected values of the likelihoods have the same forms as in (7.58), (7.64), (7.67), and (7.87), except that the factor M/(N + C) is replaced by Kint for the rectangular window likelihood, and by 1/(1 - a) for the IIR likelihood. Similarly, the variances of the likelihoods have the same forms as in (7.58), (7.64), (7.68), and (7.87), except that the factor M/(N + C) is replaced by Kint for the rectangular window likelihood, and by 1/(1 - cea2 ) for the IIR likelihood. Careful study of the formulas suggests that similar performance for these two integrators should result if we choose Kit = (1 + a)/(1 - a), and pick thresholds rect = 7IR(1 + a). We ran the same simulation as in the previous section to compare the performance of several rectangular integrators and HIIR integrators, using the 31 point FIR high pass filter B described earlier. The thresholds y were chosen using (7.55), (7.59), and (7.87) to achieve

169 ______Integration Length Ki.t (samples) 128 l 64 I 32 Calculated y 716a 406a2 237a 2 Measured PF 2.2 x 10 - 4 1.21 x 10- 3 1.89 x 10- 3 1 bit 1 - PD 2.80 X 10-3 1.00 x 10-1 3.96 x 10- faults PM 5.40 x 10-2 1.82 X 10-1 3.48 X 10-1 2 bit 1 - PD < 10- 5 < 1 - 5 2 X 10-5 faults PM < 10- 5 1.58 x 10- 3 2.25 x 10 - 2 3bit 1 - PD < 10- 5

PF ~ 10- 4 . The decays a and thresholds IR of the IIR integrators were chosen to match the performance of the rectangular integrators. The results for rectangular windows are shown in Table 7.2, and those for IIR integrators are shown in Table 7.3. As expected, the rectangular and IIR integrators have similar performance, and the results shown in the tables behave in the expected way. Although the observed PF is much larger than predicted, especially for short integration lengths and IIR integrators, raising y by 10-20% would drop the observed PF down to about 10- 4 . Performance is much worse than expected for integration lengths shorter than those shown in the tables; the reason for this is discussed in the next section.

7.5.5 Real-Time Fault Detection Algorithm

The system sketched earlier is batch oriented; it collects all the outputs of the high pass filter, sums up the likelihoods for the entire batch, then makes one fault decision for all the data. Several modifications are necessary to achieve a causal, real-time fault tolerant A/D system capable of responding correctly to both transient and continuous faults. The simplest approach would update one likelihood with each new high pass filter output, then if this likelihood is greater than y and also greater than all the other likelihoods, a fault would be declared in the corresponding converter. Unfortunately, this strategy doesn't work

170 Effective Integration Length (samples) 128 64 32 ca 0.9845 0.9692 0.9394 Calculated 361o2 206ur 122a2 Measured PF 2.13 x 10- 3 2.13 x 10- 3 4.07 x 10 - 3 1 bit 1 - PD 1.95 X 10-3 9.88 x 10-2 3.97 x 10- 1 faults PM 6.63 x 10-2 2.06 x 10-1 3.66 x 10-1 2 bit 1 - PD < 10o- 5 < 10 - 5 < 10-5 faults PM < 10- 5 2.18 x 10- 3 2.75 x 10- 2 3 bit 1 - PD < 10-5 < 10-, < 10 - 5 faults PM < o10-5 < 10- , 3 x 10-5 4 bit 1-PD < 10- 5 < 10- 5 < 10- 5 faults PM < 10- 5 < 10- 5 < 10 - 5

Table 7.3: Simulation results for filter B using IIR integrators.

properly for transient failures. Figure 7-6 illustrates the problem, showing various failures qOo[n] on converter 0, together with the sequence of likelihoods L'[n] generated by the IIR update formula (7.90). The dotted lines on the graph identify the samples corresponding to converter 0. For a continuous failure, the correct likelihood L'[n] for n = 0 is always larger than any of the other likelihoods. For a single sample failure on converter 0 at time no, however, the output of the high pass filter starts rising at time no -K and continues oscillating until time no + K. In fact, because h[n] = 0 for n equal to multiples of N + C, the likelihoods corresponding to converter 0 are unaffected by the failure until time no. Thus for K samples before the failure, all the likelihoods except the correct one start building in energy and may cross the threshold 7. This implies that a fault decision algorithm may have to wait up to K samples from the time the first likelihood rises above threshold before it decides which converter is at fault. Another problem shows up with faults whose amplitude increases rapidly over time. The figure illustrates a fault whose amplitude rises at an exponential rate. Notice that the correct likelihood is consistently smaller than some of the other likelihoods. Eventually, when the fault amplitude stops growing, the correct likelihood will become largest, but this may take much more than K samples from the time the first likelihood crosses threshold.

171 L'[n]

:Po[n]

- . - L'[n]

! I

log L'[n]

0 8 16 24 32 40 48 56 64 72 80 Sample

Figure 7-6: IIHR Likelihoods L'[n] versus fault ko[n]: (a) Continuous fault (b) impulse fault (c) exponential ramp fault.

172 It seems difficult to design a single decision algorithm which produces the correct fault diagnoses in all these cases. One possible procedure might use a three state approach. At time n, the incoming sample z[n] is processed through the high pass filter, and the output of the filter 4k[n- K] is used to update likelihood L'[n- K]. Initially the detector starts in state "OK", and it remains there as long as all likelihood values are below the threshold 7. If one goes above threshold, go to state "detected" and wait for at least K more samples. Then pick the largest likelihood from the last N + C, and if it is greater than threshold go to state "corrected" and declare that the corresponding converter has failed. Also start correcting the samples from that converter. Note that because we do not identify a failure until at least 2K samples after it has occurred, we will need to save at least the most recent 2K converter samples in a buffer so that these can be corrected in case we enter the "correction" state. This imposes a minimum latency of 2K in our fault correction system. Further work is needed to refine this algorithm to incorporate realistic fault scenarios. If a small continuous fault starts, it may take awhile for the likelihood energies to build up enough to cross threshold, thus delaying the correction. If exponential ramp failures can occur, then it may be necessary to change which converter is identified as faulty while in the "correction" state. Rules on when to exit the "correction" state must be developed. It will also be necessary to compute the probabilities of false alarm, detection, and misdiagnosis for whatever algorithm is eventually chosen.

Choice of the threshold y is complicated by the fact that PD depends on the total energy in the fault, as reflected by the fault-to-noise ratio FNR. A continuous fault of low amplitude can give rise to a large FNR, since the likelihood integrates the fault energy over a large number of samples. A transient fault which affects only one sample, however, must be quite

large to result in a comparable FNR. Setting a low threshold a and using short integration lengths allows smaller transient errors to be detected and corrected. However, low Y leads to high probability of false alarm, and short integration lengths causes the likelihoods to have high variance, which causes high probability of misdiagnosis. Choosing the best parameters in the algorithm, therefore, involves some delicate performance tradeoffs.

173 7.6 Conclusion

In this chapter we described a robust oversampling round-robin A/D converter system which uses the redundancy inherent to low-pass signals to provide fault tolerance. The system was able to identify converter failures reliably and to correct the output accurately. The hardware needed to add robustness is minimal: a few extra converters and an amount of computation comparable to an FIR filter. A disadvantage of our approach is that we rely on a statistical test to detect and correct faults, and therefore have certain probabilities of missing or misdiagnosing a fault, or of declaring a fault where none exists. More fundamental concerns relate to our use of round-robin scheduling of multiple slow A/D converters, a technique which requires careful attention to calibration, sample/holds, and timing issues. Despite these potential problems, our approach is considerably cheaper than traditional approaches to fault tolerance such as modular redundancy, yet can achieve comparable protection against single faults. This chapter explored A/D conversion, but it can be easily seen that the results may be extended to the design of fault-tolerant D/A converter systems. A/D and D/A converter systems ae conceptually isomorphic mappings, and since we are able to protect A/D con- version, we must be able to protect D/A conversion as well. In fact, the same redundancy requirements would hold: we would need 1 extra converter to detect errors, and at least 2 extra converters to correct errors. Also, error detection and correction would be based on a syndrome computed using an ideal analog high-pass filter. The only remaining work is to derive the exact form of the error detection and correction procedures, and to determine an efficient implementation. The proper approach would be to explicitly take into account the effects of quantization noise and use a generalized likelihood ratio test to determine the optimal solution.

7.A Signal Estimates: Mean and Variance

Assume that fault Hq has occurred with value q[n]. From (7.40) we know that k[n] is formed from a high pass version of z[n]. We can expand this filtering operation as a

174 convolution sum, C M-1 ~k~l= c hHP[I]zOn- ] k[n] (7.91) /=0E where we define for n k Uk[nf= I1 (7.92) O else and where

hHp[nl = (- )s (N+Cn) (7.93) M sin (M) is the impulse response of HHp(wr). Substitute (7.14) into (7.91), and since s[n] is a low pass signal, we are left with

Ak[n] = ([n] + ][n))uk[n] (7.94) where q[n] and [n] are high pass versions of q[n] and E[n]:

q[n] = N+CM-1C 1=- hHP[l]q[n-] N+CMl [n = C > hHp[l]e[n- ] (7.95) 1=0

It is important to recognize that hHp[n] contains zeros at all nonzero multiples of N + C samples. Therefore, Oq[n] = Oq[n] for n- q. Also, [n] is white Gaussian noise with mean and variance

E [E[1]] = (7.96)

E [c[l]E[r]] - f2b6[l-r] (7.97)

Thus:

E [[n]] =

Coy [F[n], [m]] (N+ C) E hHp[n-1]hHp[m-1Iue 1=0

175 = ( C 2 hHP[n- m2 (7.98) C where the last line follows because hHp[n] = hHp[-n] and hHp[n]*hHp[n] = hHp[n], where '*' denotes circular convolution with period M. These equations imply that

[+k[n],~ , : =qtl,,,,[,,.l (7-99) and

1 Cov [4[n], k2[m] , s] = Cov [-[n],-m]I Hq,. ] Uk, [n]uk2[m] = (N+ ) hHp[n- mla2uk[n]uk2[m] (7.100)

Now for the mean and variance of the approximate, unfiltered signal estimate, which is formed by subtracting the fault estimate from the observations, (7.78). Assume that the fault Hq has been diagnosed properly. Then

A[In] = z[n]- [n]

= s[n] + q[n] + [n] - ([n] + [n]) uq[n]

= s[n] + c[n] - T[n]Uq[n] (7.101)

Thus:

E [ SA [n] Hq, ] = s[n] (7.102) and

Co [A[n, SA[ml Hq,,,] = E [(E[n] - [n]uq[n])(4,m] - tm]uq[m])l Hq,q]

= {6[n- - ( C hip (m - ]u[n] - hHP[nM - n]uq[m]

+(N + C) hHp[ln - m]u[n)u,[m]} (7.103)

176 In particular, since hHp[O] = C/(N + C),

r[~ 1N ] a for nq War s[ []- ,,, = { U C (7.104) c, de

From this we see that all samples of SA[n] have the proper average value and that the cor- rected samples have N times as much variance as the samples from the working converters. Also, the average variance in A[n] is,

1 M-1 N-C 2 jw EZVar [&A[n]IHk, ] = -F ' (7.105) n=o C(N + C) -

The ideal estimate under hypothesis Eq is formed by low pass filtering A[n],

M-1 s[n] = E hLp[n- r]SA[r] (7.106) r=O

Substituting (7.101) into this equation and recalling that s[n] is low pass, yields,

M-1 S[n] = s[n] + E hLp[n - r] ([r] - [r]uq[r]). (7.107) r=O

Thus, E [[n] qH,0] = s[n] (7.108)

and we can compute the variance as follows,

M-1 M-1 Va [[n] Hq, q] = E h.,Lp[n - r]hLp[n - t]Cov [A[r],A[t]1 Hq, q] (7.109) r=O t=O

M-1 = hp[n - 2 + C) E E hLp[n- r]hLP[n- t]hHP[r t] r=O r-q t=0

+ C) EhLpn-]hLP[n-t]hHp[r- t]aO. rq tq

177 This can be simplified in a few steps. Note that hLp[n] * hHp[n] = 0, and also:

for n =0 hHp[n(N + C)] Nc= (7.110) 0 else.

Thus,

Var [[n] IH,j ] M= h -1r[-] + C hp[nr]2 (7.111) r=O r=q

After a little more algebra, we find that

Vat [{[n]Hlp fi] N 27a! g fores(7.112) n =-,2(q Var [[n]Hq,] = { Ne 2 2 CUS2(n- U2eselse where S(l) is the circular sinc function with period N + C in (7.69). The variance is thus highest for those signal samples, n q, from the faulty converter which have been interpolated from the other converter samples, N2. All other signal samples have vari- ance below oJ, with lowest variance occurring for signal samples from converters far from No,2 the faulty one, about NC. In fact, we can show that signal estimate samples for the faulty converter ae unaffected by the low pass filtering operation. To see this, note that HLp(w) = 1- HHp(w), and thus hLp[n] = 6[n] - hHp[n]. Substituting gives:

M-1 S[n] = SA[n] - E hHP~n - I]SA[1] 1=0 M-1 = SA[n]- E hHP[n- ] ([l]- q[1]) (7.113) 1=0

For n q:

s[n] = sA[n]- (ZH[n-EhHP[n-l]-q[I]) l=q

= sA[] - H[n] - hHp[0] ( C ) ZH[n])

= SA [n] (7.114)

178 The average variance of the ideal estimator is,

1M-1 N 2 N+C 1 M-1 1- Var [n] I Hq] cr E M E hp[n - red, n=O = N+C + C r=q n=OE hL I 1) 2 N(C + (7.115) (N + C)C e

The statistics for A[n] and i[n] can also be derived for the case when all converters are working properly and hypothesis H* is chosen. The mean and variance of the unfiltered signal estimator are:

E [gA[n]I H*] = s[n] (7.116) Var [sA[n]l H*] = d. (7.117)

The mean and variance of the filtered signal estimator are:

E[s[n]l H*] = s[n] (7.118)

M-1 2 2 N 2 Var [S[n] I H*] =E hp[n -la = N 2 (7.119) 1=o0 + C'

7.B Ideal Likelihoods: Mean and Covariance

In this section we derive the formulas of section 7.3.5. First, define rl[n] as a shifted version of hHp[n] with all but every (N + C)h sample set to zero:

for n = 0 'rl[n]= { hHp[n + l] (7.120) 0 else.

Then 1 N+C-1 E - n r-[n] = hHP[n+ ]NC+ e N+ (7.121) NCr=O

179 and Fourier transforming we obtain:

M-1 T (Wm) = ME lri[n]e-iwmn (7.122) n=O 1 +-1 · 2r j,, 2- 1 N+-1 m + ) (Wm++C) (7.123) N +C Z= HP

Now ri(wm) is periodic with period 2r/(N +C). For frequencies in the range 7rN/(N+C) < Wm < (N + 2)/(N + C) we can calculate:

I C-1 1(m) = N C E1* ei(wm+)L

NC S(l)ej(-m b (7.124) N+C where S(l) is the circular sinc function defined in (7.69). A formula valid for all wm can then be derived by exploiting the periodicity of r(wm):

C (+ -(C-1 2pm)\ Tl(wm) = N + CS(l)ej ~ N+C (7.125) where Pm is the integer: Pm = m( M ) N/2 (7.126) and where Lxj represents the largest integer no greater than x. Note that the phase of rl (Win) is a sawtooth ranging ± about rl in steps of ~, while the magnitude is constant. Now to compute the statistics of the L'. Assume that fault Hq has occurred with value qjn]. Combining (7.45) with (7.94), and using (7.98):

E[k| Hq J]=E [n[] H,,q ] nak = Z [n) + E E [2[n] |Hq,4+] n k nek

n nk] (N C) (NC) (7.127)

The first term above can be evaluated by using (7.95), recognizing that hHp[n] hHp-n],

180 and substituting k-l[n] for hHp[n + k - ] and rip[-n] for hHp[l - p -n]:

¢q,[n] (= C ) E E E hHP[n - ]hHP[n n-k n=k l-q p-q - p-qq[l]qq[p]

____ 2 4-q ----0 (N C) Ep~q[ hHp[kh[P] + n - ]hHP[k I1=q p=q _ + n - p] N+C)2 M-1 E 7-k-l[n]7p-k[-nj -q pq[l (7.128) l~q p~q p] n=O Apply Parseval's theorem and work in the frequency domain:

I M-1 1: ;,2[n] ( C 2 _ - E 'rk-1(W-)7-p-k(W,.) (7.129) n=-k l-q p-q M M=0

= z E qq[l]•bq[p]S(k - I)S(p - k) [ E eJ(wm+ (%+F))(p-) l-q p-=q M=OJ Recall that the phase of the exponential in (7.129) is a sawtooth with range ±ir(p- I)/(N+ C). Therefore:

M-1 W .Cn (¢-1- ) )r if p=l m=o (7.130) if p- = ±(N + C), 2(N+ C) ...

Also recognizing that S(l) is symmetric and periodic with period N + C, equation (7.129) reduces to,

2[n] = 2 S (q- k)Z 42[l] (7.131) n-k l-q Combining (7.131) and (7.127) and using some algebra gives (7.64). For the no fault case, H*, set FNRq = 0 to get (7.58).

We now turn our attention to the covariance of two likelihoods, L and L 2, under hypothesis hypothesis Hq and fault q[n].Hq andSubstituting (7.45) gives: k2q[n].

Coy [L , L I Hq, q Ii A22 -q = E Cov K [nij, k22 [n211 Hq, Oj nl- k n2--k

m =-kn2- Cv+-k__ n+ (7.132)

181 Now suppose a, b, c, and d are zero mean Gaussian random variables. Then it is well-known that E [abc] = 0 and

E [abcd] = E [ab] E [cd] + E [ac] E [bd] + E [ad] E [bc] (7.133)

Using this in (7.132), plus the fact that Cov [a, b] = E [ab]- E[a]E[b], expanding terms and applying a lot of algebra gives:

Cov [ , I Hq,q] Iki ,a] n= k nE 2(C-) hjp[n - n21a4 nl=-k n2- k

N+C 2hHP[n - n (7134) +470,[n,]Jn2l C hHp[nl -- n2]af22 ]C} (7.134)

Substituting (7.95) gives:

=(N + 4 E E ~{h2p [ni- n2 ]2of nl-kl n _k 2 2 (7.135) 4 + Z hHp [nl - ] hHp [n 2 - 12] hHp [nl - n2] 4oae q [Il] q [12] } 1l-q 12-q

Now substitute (7.120) for hHp[n] and adjust the summations,

N+C 4

)n l-0Z E ° TkTk21 k [n-n2] 24 E C ni =O n2=-O 2

+ (NC ) E E [ Tq-kl [ 1l - n] rk -1k2 [n 1 - n2] Tk2 -q [n 2 - 12]] 11-012-O n1=-on22O qOq[11 · +~~~~~~~~ q] Oq [12 + q] 4oe. (7.136) Because rl[n] is periodic with period M, we can further reduce the first term of the above

182 equation,

(N+C4 M E rkl -k2[n]2 . C JW+CJ n_O 4 +(NC)E E [E q-kn [1 - nl] Trk-k 2 [nl -n2] Tk 2 -q [n2 - 2]] 0 1+ l-12=-On =0 n2 =0 *•q [11l+ q] q [12 + q] 4ae (7.137)

_ N+C 4 M I M-1 ( 12 4 E rkI','-k (Wm) 2a C N+CJ M m--0 2

+ (C ) [ME rq-ki (Wm)Trk...k 2 (Wm)Tk2 _q (m) eiwm(1 12)] 1 12-- m=0 k +q 0 ¢Oq [11 + q] q [12 + q] 48 (7.138)

Cov [L 1 J' 2IHq, Q] (7.139)

= C C2 ( C }4 S2 (k -k2) + 4S (q -k) S (k -k2) S (k2-q) FNRq where S(I) is the circular sinc function defined in (7.69), and FNRq is the fault-to-quantization noise ratio defined in (7.65). Formula (7.64) results by substituting k = k2 = q and noting that S(0) = 1. For the no fault case, H*, formula (7.58) results by setting FNRq = 0. The probabilities PF, PD, and PM follow from Gaussian statistics. For any Gaussian

variable p(x) = N(m, 2),

00 x 1( )2 dx P(x>-Y)P~~~x,~ = ~ - exa 2 a2

- 1 [2 J exp(_x2) dx

= 2erfc (7 ) (7.140)

Now if the number of terms M/(N + C) summed to form each Lk is large, then L is approximately Gaussian. Formulas (7.55) and (7.62) follow directly from (7.140). For-

183 mula (7.66) follows from the observation that L - L' is Gaussian, while the mean (7.67) and variance (7.68) come from:

E [L; - L' IHq, ] = E [; Hq,] - E [k IHq,kq] (7.141)

Var [Lq - 1.4Hq, q] = Var [L; |Hq,] -2Cov [L,L IHq,] +Var [L [Hq,q]

7.C Finite Order Likelihoods

Suppose we replace the "correct" likelihood formulas with the windowed formula:

L'[n] = " w[l]3k[n - (N + C)] (7.142) I where n - k. If we assume that the fault is continuous, the window is long, and the average energy in the fault is independent of time, then:

E[L[n]IHq,q] = w[l]E k[n - (N + C)] Hq, 0]

(w[l)1: M EE [2 [r] Hq,0]

(w [I] N + CE [L IH ¢] (7.143) M k

This implies that the expected value of the windowed likelihood is the same as that for the ideal likelihood, except that the factor M/(N + C) is replaced by l wl1]. For our rectangular window, w[l] = 1 for I = 0,..., Kint - 1 and = 0 otherwise. For the IIR update, w[l] = a t for > 0 and = 0 otherwise. Therefore the integration factor M/(N+C) in (7.58), (7.64), (7.67), and (7.87) is replaced by Kint for the rectangular window and by 1/(1 - a) for the IIR window. Similarly, under the same assumptions

Var [L' [n] j,,Hq,] = Ew[]w]Cov [,[n - I(N + C)], ,[n - p(N + C)] Hq, ] I p

= Z2tW[]Var [k,[n - (N + C)] Hq, ] I

184 w 2[]) M Var [+ [r] Hq ] )~~~~~~r--0 = :( ~~~~Mm)N~W2[]) Var IVkr[Lk HUqq] (7.144) where the second line follows because equation (7.100) implies that samples k[n] separated by multiples of N + C are statistically independent. Formula (7.144) implies that the integration factor M/(N + C) in (7.58), (7.64), (7.68), and (7.87) is replaced by Kit for the rectangular window and by 1/(1 - a 2) for the IIR window. For short windows, the fault energy can no longer be modeled as independent of time. Furthermore, the oscillations in the tails of the high pass filter cause energy from one converter to contribute to all the likelihoods. The combination of these effects causes PF, PD, and especially PM to degrade rapidly as the integration length decreases.

185 Chapter 8

Conclusion and Suggestions for Future Research

This thesis dealt with the problem of designing an arithmetic code to protect a given com- putation. We applied group theory, and considered computation which could be modeled as operations in an algebraic group, ring, field, or vector space. We showed that arithmetic codes belong to a class of mappings known as algebraic homomorphisms, and this led to a procedure for constructing codes based on identifying suitable homomorphisms. Our results are important because they are a mathematically rigorous formulation of the problem of designing arithmetic codes. They unify the existing literature on arithmetic codes, and offer the first general procedure for determining arithmetic codes for a wide class of systems.

8.1 Summary and Contributions

This thesis progressed from a general study of fault-tolerance to an examination of a specific class of operations. As more assumptions were made about the underlying structure of computation, we lost much of the generality of our approach. This, however, was balanced by the fact that the additional structure allowed us to more accurately identify the form of the arithmetic codes. The end result of our study was a full characterization of systematic- separate arithmetic codes for protecting computation in a wide variety of algebraic systems. We began very abstractly and used set-theory to model an abitrary multiple-input

186 multiple-output system. We presented a decomposition of a fault-tolerant system into a cascade of three subsystems: a redundant computation unit, an error corrector, and a result decoder. Though not a rigorous canonical decomposition, it was arrived at from a careful study of existing fault-tolerant systems, and embodied all of the pertinent features shared by these systems. Conditions on redundancy were derived such that errors could be detected and corrected. This was done in a comprehensive manner. We considered the effects of multiple simultaneous errors and allowed a tradeoff between the abilities to detect and correct errors. We then identified an important and often-encountered class of errors that allows the inherent redundancy of a system to be more easily quantified by a single integer. We referred to these as symmetric errors, and they are analogous to the minimum distance of an error-correcting code. This set-theoretic framework is an important contribution to the study of fault-tolerant systems. It is a general description of fault-tolerance, and makes few assumptions about the underlying system being protected or of the computation performed. It has a strong theoretical basis and encompasses all known fault-tolerant systems. Although specifically constructed for the study of fault-tolerant computation, the framework is applicable to any system utilizing redundancy. It could be used as a starting point for any future study in the areas of fault-tolerant computation or error-correcting codes. Upon this set-theoretic framework, we built a description of arithmetic codes for pro- tecting computation in algebraic groups. We modeled a code as adding redundancy by mapping computation to a group of higher order. By making a few elementary assump- tions, we showed that the structure of a group restricted arithmetic codes to the class of mappings known as algebraic homomorphisms. This key insight opened the door to an examination of arithmetic codes through the study of homomorphisms. We restated redundancy conditions and developed general error detection and correction procedures. We showed that these procedures could be reduced to functions of a syndrome, and then extended the results for groups to rings, fields, and vector spaces. This group-theoretic framework is noteworthy because it links the practical aspects of an arithmetic code to the theoretical definition of an algebraic homomorphism. Homomor- phisms are a well-understood and deeply studied mathematical construct possessing a rich

187 structure, and we drew heavily upon group theory to determine the possible form of homo- morphisms. Our results for groups are also general enough to encompass the vast majority of existing arithmetic codes. We then narrowed our focus from arbitrary algebraic homomorphisms to those yielding systematic-separate codes. These codes are preferable because they protect computation using a parallel independent parity channel. We first identified the general form of these codes and then gave a constructive procedure, based on a quotient group isomorphism, for determining possible codes. We proved that by finding all possible subgroups, our approach is guaranteed to find, up to isomorphisms, all possible systematic-separate codes. This code construction technique is the first procedure, outside of modular redundancy, for constructing a large class of arithmetic codes. It is probably the most useful aspect of this thesis, and has immediate, practical application. Many examples of codes for protecting computation were given. Several were unique to our thesis, and were briefly mentioned. These included the matrix ring codes of Section 4.7.3 and the codes for linear convolution presented in Section 5.6.5. In addition, we presented de- tailed analyses of two practical fault-tolerant systems. In the first, we applied a polynomial residue number system to protect the convolution of discrete-time sequences. This yielded an efficient FFT-based algorithm which was suitable for multiprocessor implementations. Although not the first fault-tolerant convolution algorithm proposed (Section 5.6.5 gives an alternative method), it is better able to protect against the types of errors observed in real systems. We showed that single failures may be concurrently detected and corrected with as little as 65% overhead. Our second major application dealt with reliable A/D conversion. We used a round- robin system architecture and added redundancy by oversampling the input signal. This oversampling process was, in fact, an algebraic homomorphism, and this enabled us to examine this system using the group-theoretic framework. We studied single converter failure, and the redundancy present in the code was well-suited to the errors introduced by a converter failure. We optimally handled the effects of quantization noise via a generalized likelihood ratio test. The error detection and correction algorithms reduced to a simple form, requiring about as much additional computation as an FIR filter.

188 8.2 Future Research Directions

This thesis laid a theoretical foundation for the study of arithmetic codes, and we foresee three immediate branches down which this work may continue. The first deals with explor- ing more fully some of the techniques and examples that were uncovered. The second has to do with applying our results to a wider range of systems. The third branch considers other possible uses for algebraic homomorphisms.

8.2.1 Further Exploration

This thesis approached the problem of designing an arithmetic code by focusing on the constraint that redundancy be preserved throughout computation. This led to the char- acterization of arithmetic codes as algebraic homomorphisms. In order to be useful, an arithmetic code must in addition be easily encoded and decoded, and must protect against the expected set of errors. These issues were only cursorily examined and not fully explored in this thesis. We feel, however, that these issues are very important and that it is impos- sible to design a practical arithmetic codes without taking them into consideration. It is doubtful, though, that a general solution to this problem can be found. The set of expected errors is too closely linked to the architecture used to perform the processing, and varies greatly across systems. Also, in some cases, the application of our framework reduces to a class of error-correcting codes for which these issues have not yet been fully resolved. An efficient method of performing error correction is also lacking in our framework. We showed that the syndrome is a condensation of all information relevant to fault correction, and used this as an index into a lookup table in order to identify the error. This approach is practical only when there is a small number of expected errors. In practice, one may encounter a large or even an infinite number of possible errors, and in these instances a different approach to error correction is needed. One possible solution is to perform error correction in stages by classifying the set of errors. First, determine if an error has occurred within the system. If so, then determine which class the error falls in. Finally, identify the specific error within the determined class and correct its effect on the result. An efficient method of correcting multiple errors is also needed. Presently, the complex- ity of the error corrector increases exponentially with the number of simultaneous errors,

An efficient method of correcting multiple errors is also needed. Presently, the complexity of the error corrector increases exponentially with the number of simultaneous errors, and so only single errors can usually be protected against. It is likely that an efficient method exists: the syndrome mapping is a homomorphism, and it therefore preserves the additive structure of multiple errors within the syndrome, as the final lines of the sketch above demonstrate. A reasonable idea is to combine this property with the error classification scheme mentioned above.

Another drawback of our framework is that certain parts of the system, not covered by the arithmetic code, are assumed to be inherently robust. These include the error corrector and result decoder, which we assumed were protected by modular redundancy. This reliability was needed because of the manner in which we partitioned the operation of a robust system. It would be preferable to have an arithmetic code that could protect the entire operation of the system, not just limited portions. Alternatively, the design of self-checking systems for performing error correction and result decoding could be studied.

Chapters 4 and 5 each contain a wide variety of examples of operations protected by arithmetic codes. Our emphasis was on determining the form of the redundancy, and we were not greatly concerned with specific implementations. Although many of the examples reduced to systems examined by others, some were unique to this thesis, including the matrix ring codes of Section 4.7.3 and the codes for linear convolution presented in Section 5.6.5. We feel that most of these examples, if explored more deeply, will lead to practical fault-tolerant implementations. Analyses in the same spirit as the studies of fault-tolerant convolution and A/D conversion in Chapters 6 and 7 are needed.

8.2.2 Arithmetic Codes for Other Operations

This thesis dealt exclusively with protecting computation that can be modeled as operations in algebraic groups, rings, fields, and vector spaces. These systems have a well-understood structure and include many important and computationally intensive operations. Their universality is, in fact, one of the driving forces behind the study of group theory. Still, there are a great many operations that cannot be placed in our framework. The ultimate goal of any general approach to fault-tolerance is to be able to protect an arbitrary computational unit, such as a CPU, in an efficient manner.

The solution to this problem lies very far in the future, and it is arguable whether any code, besides modular redundancy, would be able to protect it. A step towards expanding the scope of this thesis is to apply our framework to other algebraic systems, such as monoids or associate groups, or to the more general case of a set with operators. Homomorphisms are a constantly recurring theme in group theory, and they exist in other algebraic systems as well. Most likely they can be used, as in this thesis, to identify possible arithmetic codes.

The tools for detecting and correcting errors developed in Chapter 4 were based on only a few basic premises: that the set of outputs forms an Abelian group, that an additive error model applies, and that valid results lie in a subgroup. These constraints were automatically satisfied by our assumptions of computation in algebraic groups and by the manner in which redundancy was added. Instead of beginning with binary operations (a major assumption), it might be possible to design robust systems by focusing on these three premises directly. The main problem to overcome would be to develop a constructive approach for adding redundancy such that valid results lie in a subgroup of the output set. The sketch below illustrates the three premises in a simple setting.
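The following minimal Python sketch (assuming NumPy; the encoding and function names are illustrative choices in the spirit of the checksum codes of Chapter 4, not a construction fixed by the thesis) shows the three premises at work: outputs are real vectors under addition, errors are additive, and valid results form the subgroup of vectors whose final entry equals the sum of the others.

    import numpy as np

    def encode(v):
        """Append a checksum so the codeword lies in the valid subgroup."""
        return np.append(v, v.sum())

    def is_valid(w, tol=1e-9):
        """Subgroup membership test: the checksum homomorphism must vanish."""
        return abs(w[-1] - w[:-1].sum()) < tol

    u, v = np.array([1.0, 2.0]), np.array([3.0, 4.0])
    w = encode(u) + encode(v)      # the subgroup is closed under addition,
    assert is_valid(w)             # so sums of codewords remain valid
    w[0] += 0.5                    # an additive error leaves the subgroup
    assert not is_valid(w)         # and is therefore detectable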

8.2.3 Other Possible Uses of Homomorphisms

This thesis examined the use of algebraic homomorphisms as arithmetic codes. Other uses for homomorphisms exist as well.

The interaction between codewords and errors in an error-correcting code may be modeled as an additive operation in an algebraic group. When viewed in this light, an error-correcting code satisfies the postulates of an algebraic homomorphism, and in the case of binary vector spaces this leads to the standard class of linear error-correcting codes. The additive error model is a good description of the interaction between codewords and additive noise in a communication channel.

A promising departure from this additive model is to consider designing codes that protect against convolutional interference. This type of interference models the distortion introduced by a nonuniform transfer characteristic in a communication channel. The traditional approach to this problem is to use an adaptive equalizer to correct the distortion. This is an expensive solution that often has difficulty tracking rapidly changing channels, such as those encountered in mobile communication systems.

Codes that protect against this form of interference may be designed by beginning with a binary polynomial field rather than a vector space. The nonzero elements of the field form a group under multiplication, and this operation may be used to model the effects of errors: an error polynomial is multiplied with the codewords. Such codes fit our framework because of the generality of group theory; we only require an additive model in some group, and that group operation need not correspond to vector addition. Valid codewords would consist of the set of prime polynomials, and error detection and correction would reduce to polynomial factorization.

A radically different application of algebraic homomorphisms is in the processing of encrypted information [55]. In many applications, processing is done on sensitive information. The current practice is to store data in encrypted form and to decrypt it prior to processing. This decryption step is undesirable since it increases the likelihood of unauthorized access. It may be possible, however, to process data in encrypted form using an algebraic homomorphism. In this case, a nonsystematic homomorphism is desired, one that satisfies different design goals: the data must be easily encrypted and decrypted when the proper key is known, and virtually impossible to decrypt without it. Number theory, which lies at the heart of many encryption schemes, is closely tied to group theory, and thus it seems reasonable that many of our group-theoretic results may be used to process encrypted data, as the toy sketch below suggests.
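As a toy illustration (not a scheme from [55] or from this thesis), textbook RSA is itself a multiplicative homomorphism: E(a) * E(b) mod n = E(a * b), so a product can be computed entirely on ciphertexts and decrypted afterwards by the key holder. The tiny primes below are for demonstration only, and pow(e, -1, m) requires Python 3.8 or later.

    p, q, e = 61, 53, 17               # toy primes and public exponent
    n = p * q                          # public modulus
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

    def E(m):                          # encrypt
        return pow(m, e, n)

    def D(c):                          # decrypt
        return pow(c, d, n)

    a, b = 7, 12
    assert D(E(a) * E(b) % n) == a * b  # multiplied while still encrypted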

Bibliography

[1] J. A. Abraham, "Fault-tolerance techniques for highly parallel signal processing architectures," SPIE Highly Parallel Signal Processing Architectures, vol. 614, pp. 49-65, 1986.

[2] G. R. Redinbo, "Signal processing architectures containing distributed fault-tolerance," in Conference Record - Twentieth Asilomar Conference on Signals, Systems & Computers, pp. 711-716, IEEE, November 1987.

[3] K. Fitzgerald, "As chip geometries continue to shrink, IC reliability grows ever more critical," IEEE The Institute, p. 10, June 1988.

[4] D. A. Rennels, "Fault-tolerant computing - concepts and examples," IEEE Transactions on Computers, vol. C-33, pp. 1116-1129, December 1984.

[5] J. von Neumann, Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, pp. 43-98. Princeton University Press, 1956.

[6] T. R. N. Rao and E. Fujiwara, Error-Control Coding for Computer Systems. Englewood Cliffs, N.J.: Prentice-Hall, 1989.

[7] J. G. Tryon, Redundancy Techniques for Computing Systems, ch. Quadded-Logic, pp. 205-208. Washington, D.C.: Spartan Books, 1962.

[8] A. Avizienis, G. C. Gilley, F. P. Mathur, D. A. Rennels, J. A. Rohr, and D. K. Rubin, "The STAR (self-testing and repairing) computer: An investigation of the theory and practice of fault-tolerant computer design," in Proceedings, 1st International Conference on Fault-Tolerant Computing, pp. 1312-1321, November 1971.

[9] R. E. Harper, J. H. Lala, and J. J. Deyst, "Fault-tolerant parallel processor architecture overview," in Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers, pp. 252-257, IEEE, June 1988.

[10] J. A. Katzman, "A fault-tolerant computing system," in Proceedings, 11th Hawaii International Conference on System Sciences, Pt. III, pp. 85-102, 1978.

[11] R. Freiburghouse, "Making processing fail-safe," Mini-Micro Systems, pp. 255-264, May 1982.

[12] J. H. Patel and L. Y. Fung, "Concurrent error detection in ALU's by recomputing with shifted operands," IEEE Transactions on Computers, vol. C-31, pp. 589-595, July 1982.

[13] M. Namjoo and E. J. McCluskey, "Watchdog processors and capability checking," in Proceedings, 12th International Symposium on Fault-Tolerant Computing, pp. 245-248, 1982.

[14] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley Publishing Company, 1984.

[15] N. S. Szabo, Residue Arithmetic and its Applications to Computer Technology. New York, N.Y.: McGraw-Hill, 1967.

[16] M. H. Etzel and W. K. Jenkins, "Redundant residue number systems for error detection and correction in digital filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, pp. 538-544, October 1980.

[17] S. S.-S. Yau and Y.-C. Liu, "Error correction in redundant residue number systems," IEEE Transactions on Computers, vol. C-22, pp. 5-11, January 1973.

[18] W. K. Jenkins, "Residue number system error checking using expanded projection," Electronics Letters, vol. 18, pp. 927-928, October 1982.

[19] C.-C. Su and H.-Y. Lo, "An algorithm for scaling and single residue error correction in residue number systems," IEEE Transactions on Computers, vol. 39, pp. 1053-1064, August 1990.

[20] H. Krishna and K. Y. Lin, "A coding theory approach to error control in redundant residue number systems - Part I: Theory and single error correction." Received for review, October 1990.

[21] W. K. Jenkins and E. J. Altman, "Self-checking properties of residue number error checkers based on mixed radix conversions," IEEE Transactions on Circuits and Systems, vol. 35, pp. 159-167, February 1988.

[22] T. G. Marshall, Jr., "Coding of real number sequences for error correction: A digital signal processing problem," IEEE Journal on Selected Areas in Communications, vol. SAC-2, pp. 381-392, March 1984.

[23] K.-H. Huang, Fault-Tolerant Algorithms for Multiple Processor Systems. PhD thesis, University of Illinois, Urbana-Champaign, November 1983.

[24] K.-H. Huang and J. A. Abraham, "Algorithm-based fault-tolerance for matrix operations," IEEE Transactions on Computers, vol. C-33, pp. 518-528, June 1984.

[25] J.-Y. Jou, Fault-Tolerant Matrix Arithmetic and Signal Processing on Highly Concurrent VLSI Systems. PhD thesis, University of Illinois, Urbana-Champaign, IL, 1985.

[26] J.-Y. Jou and J. A. Abraham, "Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures," Proceedings of the IEEE, vol. 74, pp. 732-741, May 1986.

[27] J.-Y. Jou and J. A. Abraham, "Fault-tolerant algorithms and architectures for real time signal processing," in Proceedings - International Conference on Parallel Processing, pp. 359-362, IEEE, August 1988.

[28] V. S. S. Nair, "General linear codes for fault-tolerant matrix operations on processor arrays," Master's thesis, University of Illinois, Urbana, IL, August 1988.

[29] V. S. S. Nair and J. A. Abraham, "General linear codes for fault-tolerant matrix operations on processor arrays," in Proceedings - International Symposium on Fault-Tolerant Computing, pp. 180-185, IEEE, June 1988.

[30] V. S. S. Nair and J. A. Abraham, "Real-number codes for fault-tolerant matrix operations on processor arrays," IEEE Transactions on Computers, vol. 39, pp. 426-435, April 1990.

[31] C. J. Anfinson and F. T. Luk, "Linear algebraic model of algorithm-based fault-tolerance," IEEE Transactions on Computers, vol. 37, pp. 1599-1604, December 1988.

[32] P. Banerjee, J. T. Rahmeh, C. Stunkel, V. S. Nair, K. Roy, V. Balasubramanian, and J. A. Abraham, "Algorithm-based fault-tolerance on a hypercube multiprocessor," IEEE Transactions on Computers, vol. 39, pp. 1132-1145, September 1990.

[33] G. M. Megson and D. J. Evans, "Algorithmic fault-tolerance for matrix operations on triangular arrays," Parallel Computing, vol. 10, pp. 207-219, April 1989.

[34] C. J. Anfinson, A. W. Bojanczyk, F. T. Luk, and E. K. Torng, "Algorithm-based fault-tolerant techniques for MVDR beamforming," in Proceedings - International Conference on Acoustics, Speech, and Signal Processing, pp. 2417-2420, IEEE, May 1989.

[35] F. T. Luk and H. Park, "Analysis of algorithm-based fault-tolerance techniques," Journal of Parallel and Distributed Computing, vol. 5, pp. 172-184, April 1988.

[36] W. G. Bliss and A. W. Jullien, "Efficient and reliable VLSI algorithms and architectures for the discrete Fourier transform," in Proceedings - International Conference on Acoustics, Speech, and Signal Processing, pp. 901-904, IEEE, 1990.

[37] W. G. Bliss and M. R. Lightner, "The reliability of large arrays for matrix multiplication with algorithm-based fault-tolerance," in Wafer Scale Integration III (M. Sami and F. Distante, eds.), pp. 305-316, Elsevier Science Publishers B.V. (North-Holland), 1990.

[38] W. S. Song and B. R. Musicus, "Fault-tolerant architecture for a parallel digital signal processing machine," in Proceedings - 1987 IEEE International Conference on Computer Design: VLSI in Computers & Processors, pp. 385-390, IEEE, October 1987.

[39] W. S. Song, A Fault-Tolerant Multiprocessor Architecture for Digital Signal Processing Applications. PhD thesis, M.I.T., January 1989.

[40] B. R. Musicus and W. S. Song, "Fault-tolerant digital signal processing via generalized likelihood ratio tests." To appear in IEEE Transactions on Signal Processing.

[41] Y.-H. Choi and M. Malek, "Fault-tolerant FFT processor," IEEE Transactions on Computers, vol. 37, pp. 617-621, May 1988.

[42] J.-Y. Jou and J. A. Abraham, "Fault-tolerant FFT networks," IEEE Transactions on Computers, vol. 37, pp. 548-561, May 1988.

[43] G. R. Redinbo, "Data protection in convolution computations," in Proceedings - International Conference on Acoustics, Speech, and Signal Processing, pp. 1095-1098, IEEE, May 1989.

[44] G. R. Redinbo, "System level reliability in convolution computations," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 1241-1252, August 1989.

[45] I. N. Herstein, Topics in Algebra. New York, NY: John Wiley and Sons, 1975.

[46] J. K. Wolf, "Redundancy, the discrete Fourier transform, and impulse noise cancellation," IEEE Transactions on Communications, vol. COM-31, pp. 458-461, March 1983.

[47] W. W. Peterson, "On checking an adder," IBM Journal, pp. 166-168, April 1958.

[48] G. Strang, Linear Algebra and its Applications. Orlando, FL: Academic Press, Inc., 1980.

[49] D. Chin, J. Passe, F. Bernard, H. Taylor, and S. Knight, "The Princeton engine: A real-time video system simulator," IEEE Transactions on Consumer Electronics, vol. 34, pp. 285-297, May 1988.

[50] J. H. McClellan and C. M. Rader, Number Theory in Digital Signal Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1979.

[51] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Reading, MA: Addison-Wesley Publishing Company, 1985.

[52] M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. New York, N.Y.: IEEE Press, 1986.

[53] C. S. Burrus, DFT/FFT and Convolution Algorithms. New York, N.Y.: Wiley-Interscience, 1985.

[54] A. V. Oppenheim and C. J. Weinstein, "Effects of finite register length in digital filtering and the fast Fourier transform," Proceedings of the IEEE, vol. 60, pp. 957-976, August 1972.

[55] R. L. Rivest, L. Adleman, and M. L. Dertouzos, Foundations of Secure Computation, ch. On Data Banks and Privacy Homomorphisms, pp. 169-180. New York, N.Y.: Academic Press, 1978.
