Criticality Assessments for Improving Algorithmic Robustness Thomas B
Total Page:16
File Type:pdf, Size:1020Kb
University of New Mexico UNM Digital Repository Computer Science ETDs Engineering ETDs Fall 11-15-2018 Criticality Assessments for Improving Algorithmic Robustness Thomas B. Jones University of New Mexico Follow this and additional works at: https://digitalrepository.unm.edu/cs_etds Part of the Computer and Systems Architecture Commons, Hardware Systems Commons, Other Computer Engineering Commons, Software Engineering Commons, Systems Architecture Commons, and the Theory and Algorithms Commons Recommended Citation Jones, Thomas B.. "Criticality Assessments for Improving Algorithmic Robustness." (2018). https://digitalrepository.unm.edu/ cs_etds/96 This Dissertation is brought to you for free and open access by the Engineering ETDs at UNM Digital Repository. It has been accepted for inclusion in Computer Science ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected]. i Thomas Berry Jones Candidate Computer Science Department This dissertation is approved, and it is acceptable in quality and form for publication: Approved by the Dissertation Committee: David Ackley , Chairperson Darko Stefanovic George Luger Thomas Caudell Lance Williams ii Criticality Assessments for Improving Algorithmic Robustness by Thomas Berry Jones B.S., Computer Engineering and Mathematics, Gonzaga University, 2010 M.S., Computer Science, University of New Mexico, 2013 DISSERTATION Submitted in Partial Fulfillment of the Requirements for the Degree of Doctorate of Philosophy Computer Science The University of New Mexico Albuquerque, New Mexico December, 2018 iii Dedication To my family, Bryan, Colleen, Tim, and Jon, for their love and support and to my nephew and niece Colby and Olivia. A Vehicle, Heinsight, and Carbon on Paper iv Acknowledgments I would first like to thank my friends, especially James Orrell, Kyle Wilson, and Evan Poncelet who have, at various times, encouraged me to continue on this path. I don't think I would be here today without them. I would also like to thank my mother, Colleen Jones, who graciously helped with copy-editing the final document on very short notice. Next, I want to thank my committee members, who have all been very kind and have had great suggestions for improving the work. Dr. George Luger has provided immense intellectual, moral, and material assistance during this process. He has always supported my work and has helped me greatly with my career. I am sure that I would not have finished without his assistance. Dr. Darko Stefanovic was very gracious to join my committee with little notice and has taught me a great deal about programming languages. Additionally, I want to thank Dr. Lance Williams who has always inspired some of the most interesting ideas when I've spoken with him. His suggestions concerning failure redistribution methods were spot on. I would also like to thank Dr. Thomas Caudell who has been very benevolent in looking after the work of this wayward graduate student after his retirement. Finally, I would like to thank my advisor, Dave Ackley, for putting up with me all these years. I have learned more from him than from anyone else. v Criticality Assessments for Improving Algorithmic Robustness by Thomas Berry Jones B.S., Computer Engineering and Mathematics, Gonzaga University, 2010 M.S., Computer Science, University of New Mexico, 2013 Ph.D., Computer Science, University of New Mexico, 2018 Abstract Though computational models typically assume all program steps execute flaw- lessly, that does not imply all steps are equally important if a failure should occur. In the \Constrained Reliability Allocation" problem, sufficient resources are guaran- teed for operations that prompt eventual program termination on failure, but those operations that only cause output errors are given a limited budget of some vital resource, insufficient to ensure correct operation for each of them. In this dissertation, I present a novel representation of failures based on a combi- nation of their timing and location combined with criticality assessments|a method used to predict the behavior of systems operating outside their design criteria. I ob- serve that strictly correct error measures hide interesting failure relationships, failure importance is often determined by failure timing, and recursion plays an important role in structuring output error. I employ these observations to improve the out- put error of two matrix multiplication methods through an economization procedure vi that moves failures from worse to better locations, thus providing a possible solu- tion to the constrained reliability allocation problem. I show a 38% to 63% decrease in absolute value error on matrix multiplication algorithms, despite nearly identical failure counts between control and experimental studies. Finally, I show that efficient sorting algorithms are less robust at large scale than less efficient sorting algorithms. vii Contents List of Figures xiii List of Tables xxii Glossary xxiii 1 Introduction 1 1.1 Background . .1 1.1.1 From Fault Tolerance to Approximate Computing . .3 1.1.2 Best-Effort Computing . .4 1.2 Problem Statement . .6 1.3 Conceptual Framework . .7 1.4 Research Questions and Hypotheses . 10 1.4.1 How do Measures of Correctness Hide or Uncover Interesting Failure Dynamics Inside Algorithms? . 10 1.4.2 How Are an Algorithm's Failure Dynamics Related to the Al- gorithm's Known Behavior? . 11 Contents viii 1.4.3 Can Failure Dynamics Analysis Improve Algorithmic Perfor- mance in Resource Constrained Environments? . 11 1.4.4 Which Operations Respond Well to This Method? . 12 1.5 Procedures . 13 1.6 Thesis Outline . 14 2 Related Work 15 2.1 Criticality . 15 2.2 Fault Tolerance . 17 2.3 Fault Injection . 19 2.4 Quantified Correctness . 21 2.5 Approximate Algorithms and Computing . 23 2.6 Error Economization . 26 2.7 Theories of Robustness . 27 3 The Model 29 3.1 Author Contribution Statement . 29 3.2 Publication Notes . 29 3.3 Introduction . 30 3.4 Criticality Assessments . 31 3.5 Error Measures . 34 Contents ix 3.6 Mathematical Abstractions of Space and Time . 35 3.7 Criticality Defined . 37 3.8 Criticality Explorer . 40 3.9 Failure Shaping . 43 4 Sorting Algorithm Criticality 47 4.1 Author Contribution Statement . 47 4.2 Publication Notes . 47 4.3 Introduction . 48 4.3.1 Measuring Sortedness . 50 4.4 Results . 51 5 Criticality and Armoring in Matrix Multiplication 58 5.1 Author Contribution Statement . 58 5.2 Publication Notes . 58 5.3 Introduction . 59 5.4 Failure Shaping on Scalar Multiplication . 60 5.4.1 Criticality Assessment Results on Scalar Multiplication . 62 5.4.2 Failure Shaping Results on Scalar Multiplication . 62 5.5 Failure Shaping Matrix Multiplication . 63 5.5.1 Criticality Assessment Results on Matrix Multiplication . 64 Contents x 5.5.2 Failure Shaping Results on Matrix Multiplication . 65 5.6 Proxy Failure Shaping . 66 5.6.1 Criticality and Failure Shaping Results on Scalar Multiplica- tion Proxy Method . 66 6 Scalable Robustness 97 6.1 Author Contribution Statement . 97 6.2 Publication Notes . 97 6.3 Introduction . 98 6.4 Scalable Robustness Defined . 99 6.4.1 Average-Case Scalable Robustness . 99 6.4.2 Worst-Case Scalable Robustness . 102 6.5 Scalable Robustness on Pairwise-Comparison Sorting . 104 6.5.1 Algorithms Old and New . 105 6.6 Average-Case Scalable Robustness on Sorting Algorithms . 106 6.7 Worst-Case Scalable Robustness on the Sorting Algorithms . 111 6.7.1 Quicksort|Spearman's Footrule Error . 114 6.7.2 Bubble Sort|Spearman's Footrule Error . 114 6.7.3 Round Robin Sort|Spearman's Footrule Error . 115 7 Discussion 120 7.1 Summary of Work . 120 Contents xi 7.2 Questions Answered . 121 7.2.1 Strict Correctness Hides Criticality Structure While Quantified Correctness Reveals It . 122 7.2.2 Criticality Structure Seems Related to Known Algorithmic Be- haviors . 123 7.2.3 Algorithmic Performance in Resource-Constrained Environments Is Improved Through Failure Dynamic Analysis . 127 7.2.4 Bulk Program Operations Make Good Choice for Failure Shaping131 7.2.5 Numerical Stability . 131 7.2.6 When Will Criticality Structures Scale Effectively? . 132 7.3 Limitations of the Study . 132 7.4 Future Work . 135 7.4.1 Expanded Empirical Studies . 135 7.4.2 Scaling Through Search . 135 7.4.3 Machine Learning and Attribution . 136 7.4.4 Hardware Failure Studies . 138 7.4.5 Spatial Graphs for Coordinated Failures . 138 7.4.6 Stochastic and Continuous Failure Redistribution Policies . 138 7.4.7 Scalable Robustness Beyond Sorting . 139 7.5 Significance . 139 7.6 Conclusion . 141 Contents xii A Permission to Reproduce Previously Published Content 143 A.1 IEEE Reproduction Instructions . 145 A.2 Springer Nature License Terms and Conditions . 148 References 154 xiii List of Figures 3.1 Criticality Assessment and Failure Shaping Overview. Criti- cality Explorer takes a prepared algorithm with error measures, input generator, and annotated method level failure interfaces shown on the left. After a calibration step to evaluate the maximum failure point count, and samples per failure point, the tool then estimates failure point criticalities through a Monte Carlo simulation. Each sample produces the output of a single run with, and without, a failure in- jected on the a given failure point and scores the error between the two runs. The errorful inputs, outputs, and absolute difference score can be found in the bottom center of the figure. Failures are then shaped using the median criticality and error scores are produced for average i.i.d. as well as shaped failures. See Section 3.8 for details. Figure reprinted from [1] with permission. 46 4.1 Strict Correctness Criticality on Sorting Algorithms Extremal values dominate in a plot of strict correctness criticality (y axis) vs. the comparisons executed during a sort (`Comparison Index'; x axis): Most faults are either critical or not critical. See Section 4.4 for de- tails. Figure reprinted from [2] with permission.