
Sensitivities for Guiding Refinement in Arbitrary-Precision Arithmetic

by Jesse Michel

B.S., Massachusetts Institute of Technology (2019)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2020

© Massachusetts Institute of Technology 2020. All rights reserved.

Author...... Department of Electrical Engineering and Computer Science May 18, 2020

Certified by ...... Michael Carbin Jamieson Career Development Assistant Professor of Electrical Engineering and Computer Science Thesis Supervisor

Accepted by...... Katrina LaCurts Chair, Master of Engineering Thesis Committee

Sensitivities for Guiding Refinement in Arbitrary-Precision Arithmetic by Jesse Michel

Submitted to the Department of Electrical Engineering and Computer Science on May 18, 2020, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

Programmers often develop and analyze numerical algorithms assuming that they operate on real numbers, but implementations generally use floating-point approximations. Arbitrary-precision arithmetic enables developers to write programs that operate over reals: given an output error bound, the program will produce a result within that bound. A key drawback of arbitrary-precision arithmetic is its speed. Fast implementations of arbitrary-precision arithmetic use interval arithmetic (which provides a lower and upper bound for all variables and expressions in a computation) computed at successively higher precisions until the result is within the error bound. Current approaches refine computations at precisions that increase uniformly across the computation rather than changing precisions per-variable or per-operator. This thesis proposes a novel definition and implementation of derivatives through interval code that I use to create a sensitivity analysis. I present and analyze the critical path algorithm, which uses sensitivities to guide precision refinements in the computation. Finally, I evaluate this approach empirically on sample programs and demonstrate its effectiveness.

Thesis Supervisor: Michael Carbin Title: Jamieson Career Development Assistant Professor of Electrical Engineering and Computer Science

Acknowledgments

I thank my advisor Michael Carbin. He helped guide the intuition and motivation that shaped this thesis and provided useful feedback and guidance on the experimental results. I would also like to thank Ben Sherman for helping to develop technical aspects of this thesis, for making the time to review my writing, and for his guidance throughout the research process. Alex Renda, Rogers Epstein, Stefan Grosser, and Nina Thacker provided useful feedback. I am grateful for the financial support that I have received from NSF grant CCF-1751011. I thank my parents, sisters, and extended family for their love and support and my nephew Joseph for being a shining light in my life.

Contents

1 Introduction 13
    1.1 Motivating example ...... 15
    1.2 Thesis ...... 17
    1.3 Outline ...... 17

2 Background on Interval Arithmetic 19
    2.1 Interval addition ...... 20
    2.2 Interval multiplication ...... 20
    2.3 Interval sine ...... 21
    2.4 Analysis ...... 23

3 Sensitivities for Precision Refinement 25
    3.1 A baseline schedule ...... 26
    3.2 Sensitivities from derivatives ...... 27
        3.2.1 Constructing sensitivities ...... 27
        3.2.2 Sensitivity as a derivative ...... 28
        3.2.3 Introducing a cost model ...... 29
    3.3 A schedule using sensitivities ...... 30
    3.4 Analysis ...... 31
        3.4.1 Uniform schedule ...... 32
        3.4.2 Critical path schedule ...... 32
        3.4.3 Cost-modeled schedule ...... 33
        3.4.4 A comparison of schedules ...... 34

4 Automatic Differentiation of Interval Arithmetic 37
    4.1 Introduction to automatic differentiation ...... 37
    4.2 Automatic differentiation on intervals ...... 38
        4.2.1 Derivative of interval addition ...... 39
        4.2.2 Derivative of interval multiplication ...... 40
        4.2.3 Derivative of interval sine ...... 41
    4.3 Analysis ...... 42

5 Results 45
    5.1 Schedules ...... 45
        5.1.1 Baseline schedule ...... 45
        5.1.2 Critical path schedule ...... 46
    5.2 Empirical comparison ...... 46
        5.2.1 Improving a configuration ...... 46
        5.2.2 Improving a schedule ...... 47
    5.3 Implementation ...... 48

6 Related Work 49
    6.1 Mixed-precision tuning and sensitivity analysis ...... 50
    6.2 Arbitrary-precision arithmetic ...... 50
        6.2.1 Pull-based approaches ...... 51
        6.2.2 Push-based approaches ...... 51

7 Discussion and Future Work 53
    7.1 Benchmarks ...... 53
    7.2 Further improving precision refinement ...... 54
        7.2.1 Per-primitive cost modeling ...... 54
        7.2.2 Unexplored trade-offs in precision refinement ...... 55
        7.2.3 Generalizing the critical path algorithm ...... 56
    7.3 New applications to experimental research ...... 56

8 Conclusions 57

List of Figures

1-1 Example of a uniform configuration ...... 15
1-2 Derivatives of the computation in Figure 1-1 ...... 16
1-3 Sensitivities of the computation in Figure 1-1 ...... 16
1-4 Example of a non-uniform configuration ...... 16

2-1 The four key monotonic regions for the definition of interval sine ...... 22
2-2 A simple Python implementation of interval sin ...... 23

3-1 Computation graph for theoretical analysis ...... 31

4-1 Reverse-mode automatic differentiation on intervals ...... 39
4-2 Interval addition with derivatives ...... 40

List of Tables

3.1 Theoretical comparison of schedules ...... 35

5.1 Comparison of precisions for configurations ...... 47
5.2 Comparison of error and time for configurations ...... 47

7.1 FPBench benchmark results...... 55

Chapter 1

Introduction

Floating-point computations can produce arbitrarily large errors. For example, Python implements the IEEE-754 standard, which produces the following behavior for 64-bit floating-point numbers:

>>> 1 + 1e17 - 1e17
0.0

The result of this computation is 0 instead of 1! This leads to an arbitrarily large error in results; for example, (1 + 1e17 − 1e17)푥 will always be 0 instead of 푥. Resilience to numerical-computing error is especially desirable for safety-critical software such as control systems for vehicles, medical equipment, and industrial plants, which are known to produce incorrect results because of numerical errors [12]. In contrast to floating-point arithmetic, arbitrary-precision arithmetic computes a result within a given error bound. Concretely, given the function 푦 = 푓(푥) and an error bound 휖, arbitrary-precision arithmetic produces a result 푦˜ such that

|푦˜ − 푦| < 휖.

Arbitrary-precision primitives It is necessary to use a data type that supports arbitrary rational numbers in order to refine to arbitrarily small error. I chose to use a multiple-precision floating-point representation implemented in MPFR [14]. To understand the representation, consider the example of representing $\pi$ to 5 mantissa bits:

$$11.001_2 = \underbrace{11001_2}_{\text{mantissa}} \times 2^{\overbrace{-3}^{\text{exponent}}}.$$

The exponent automatically adjusts as appropriate, so requesting 10 mantissa bits of precision results in

$$11.00100100_2 = 1100100100_2 \times 2^{-8}.$$

Since the exponent adjusts automatically, I focus on setting the number of mantissa bits for the variables and operators in the computation. For the rest of the thesis, bits of precision will denote mantissa bits.

Implementing arbitrary-precision arithmetic The push-based approach to implementing arbitrary-precision arithmetic sets the precisions at which to compute each variable and operator and computes the error in the output. It then refines results at increasingly high precisions until the result is within the given error bound. Each pass through the computation uses interval arithmetic, which computes error bounds by “pushing” bounds from the leaves of the computation graph up to the root. For example, assuming no error in addition, interval addition $\oplus$ works such that $[1, 2] \oplus [3, 4] = [4, 6]$.

More realistically, suppose that the functions $\underline{+}_p, \overline{+}_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ for bounded addition at precision $p$ are given. Then, $\underline{+}_p$ and $\overline{+}_p$ compute the lower and upper bound for adding inputs truncated to precision $p$. They satisfy the property that for all $a, b \in \mathbb{R}$, $(a \,\underline{+}_p\, b) \le (a + b) \le (a \,\overline{+}_p\, b)$, where $+$ is exact and where, as $p \to \infty$, the inequality becomes an equality. Assuming error in addition,

$$[1, 2] \oplus_p [3, 4] = [1 \,\underline{+}_p\, 3,\; 2 \,\overline{+}_p\, 4],$$

which will always have a lower bound ≤ 4 and an upper bound ≥ 6. Computing constants such as $\pi$ or $e$ makes the need for this type of approximation clearer since they require infinite space to represent exactly ($\pi$ and $e$ are transcendental). However, they are soundly computed using arbitrary-precision arithmetic.
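As a concrete illustration (my own sketch, not part of the thesis implementation), the bounded additions $\underline{+}_p$ and $\overline{+}_p$ can be realized with the bigfloat wrapper for MPFR by rounding toward $-\infty$ and $+\infty$ at a chosen mantissa precision:

from bigfloat import add, precision, RoundTowardNegative, RoundTowardPositive

def add_lower(a, b, p):
    # Underapproximation of a + b: round toward -infinity at p mantissa bits.
    return add(a, b, context=precision(p) + RoundTowardNegative)

def add_upper(a, b, p):
    # Overapproximation of a + b: round toward +infinity at p mantissa bits.
    return add(a, b, context=precision(p) + RoundTowardPositive)

# For the example from the start of this chapter, the true sum is bracketed:
# add_lower(1, 1e17, 24) <= 1 + 1e17 <= add_upper(1, 1e17, 24).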

1.1 Motivating example

Current push-based implementations refine precisions uniformly across the computation graph [30, 25]. Concretely, this means setting all variables and operators to the same preci- sion (e.g. 1 mantissa bit) and if the error bound is not satisfied, repeating the computation at a higher precision (e.g. 2 mantissa bits). This means that certain variables and operators are computed to a high precision even when they contribute little to the error – an inefficient allocation of compute resources. For example, consider computing

푒 + 1000휋

to a generous error bound of 500. Existing approaches refine precision uniformly across variables and operators [30, 25]. In the best-case scenario, these approaches require 5 mantissa bits for $\oplus$, $e$, $\otimes$, and $\pi$ (since $k$ is a constant, it remains at a fixed precision). Note that $\oplus$ and $\otimes$ are the addition and multiplication operators over intervals respectively, described in detail in Chapter 2. An example of this computation is shown in Figure 1-1.

[Figure 1-1 shows a computation graph: the root $\oplus$ evaluates to [3070, 3460]; its children are $e$ = [2.62, 2.75] and $\otimes$ = [3070, 3330]; the children of $\otimes$ are $k$ = [1000, 1000] and $\pi$ = [3.12, 3.25].]

Figure 1-1: The figure presents a computation graph evaluated at a uniform precision of 5 mantissa bits (except for the constant $k$) with an error of 3460 − 3070 = 390.

Suppose the approach to precision refinement is to start at a uniform precision of 1 mantissa bit, and then increment precisions (mantissa bits) until the error bound is satisfied. In this case, there are four refinement steps and the error bound of 500 will only be reached on the 4th refinement at 5 mantissa bits. I propose an approach that generates non-uniform precision assignments. To determine which vertices to refine to a higher precision, I introduce a novel sensitivity analysis that measures the infinitesimal change in the interval width of the output to an infinitesimal change to the interval width of each of the variables and operators in the computation. The

sensitivities are implemented with automatic differentiation through the interval code, which is novel as well. More explicitly, if the output is the interval $y = [\underline{y}, \overline{y}]$, then for each interval $x = [\underline{x}, \overline{x}]$, the derivatives will be $\left(\frac{\partial(\underline{y}-\overline{y})}{\partial \underline{x}}, \frac{\partial(\underline{y}-\overline{y})}{\partial \overline{x}}\right)$, as shown in Figure 1-2. Note that the parentheses in the figure denote pairs of numbers (tuples), not open intervals.

[Figure 1-2 shows the same computation graph annotated with derivative pairs: $\oplus$ (1, −1); $e$ (1, −1); $\otimes$ (1, −1); $k$ N/A; $\pi$ (1000, −1000).]

Figure 1-2: Derivatives of the computation in Figure 1-1.

The sensitivity is the difference between the derivative of the output with respect to the lower bound and the derivative of the output with respect to the upper bound, namely $\frac{\partial(\underline{y}-\overline{y})}{\partial \underline{x}} - \frac{\partial(\underline{y}-\overline{y})}{\partial \overline{x}}$. The resulting sensitivities are presented in Figure 1-3.

[Figure 1-3 shows the computation graph annotated with sensitivities: $\oplus$ 2; $e$ 2; $\otimes$ 2; $k$ N/A; $\pi$ 2000.]

Figure 1-3: Sensitivities are the derivative with respect to the lower bound minus the derivative with respect to the upper bound as shown in Figure 1-2.

The most sensitive vertex in the computation graph in Figure 1-3 is $\pi$ because 2000 is the largest sensitivity. The proposed technique identifies the critical path as the path from the root to the most sensitive vertex. In this case, the critical path is $\oplus \to \otimes \to \pi$. The resulting computation graph is shown in Figure 1-4.

[Figure 1-4 shows the refined computation graph: the root $\oplus$ = [3070, 3460]; its children are $e$ = [2.5, 3] and $\otimes$ = [3070, 3330]; the children of $\otimes$ are $k$ = [1000, 1000] and $\pi$ = [3.12, 3.25].]

Figure 1-4: Computation graph using 5 mantissa bits for $\oplus$, $\otimes$, and $\pi$, 3 mantissa bits for $e$, and not changing the constant $k$. The critical path is bolded.

Along the critical path, variables and

16 operators are incremented by 2 mantissa bits, while the remainder of the computation graph is incremented by 1. This is an instantiation of the critical path algorithm. In this case, the first configuration satisfying the error bound assigns 5 mantissa bits along the critical path and 3 bits to 푒. As 푘 becomes larger, approaches using uniform refinement techniques compute more and more decimal places of 푒 unnecessarily. The critical path algorithm can avoid this problem.

1.2 Thesis

In this thesis, I investigate ways to improve precision refinement in arbitrary-precision arithmetic. I define a novel sensitivity analysis in terms of derivatives computed through interval code. Using these sensitivities, I propose an algorithm – the critical path algorithm – that guides the refinement process of arbitrary-precision arithmetic. The sensitivities use derivatives computed with reverse-mode automatic differentiation through interval code, which is novel. I implement a system for performing arbitrary-precision arithmetic and demonstrate that the critical path algorithm can guide refinements to produce more accurate results with less computation on certain programs.

1.3 Outline

The thesis is structured as follows. In Chapter 2, I explain how interval arithmetic works and elaborate on the mathematical and implementation challenges. Then in Chapter 3, I present the current approach to implementing arbitrary-precision arithmetic using interval arithmetic and show how it may be improved assuming derivatives of interval code can be efficiently computed. I describe the approach to efficient derivative computation in Chapter 4. Next, I present empirical results using the proposed approach to precision refinement in arbitrary-precision arithmetic in Chapter 5. I discuss some related work in Chapter 6 and finally present a discussion in Chapter 7 and conclusions in Chapter 8. The open-source implementation is available at https://github.com/psg-mit/fast_reals.

Chapter 2

Background on Interval Arithmetic

This chapter provides a brief introduction to interval arithmetic. I describe the interval operations for addition, multiplication, and sine. I also provide code to elucidate the underlying implementation and provide an analysis of some of the properties that arise from using interval arithmetic.

Interval arithmetic is a method of computing that provides a bound on output error, useful in the implementation of push-based arbitrary-precision arithmetic. For a more thorough treatment of interval arithmetic, including an analysis of correctness, totality, closedness, optimality, and efficiency, see [18].

An interval version of $f : \mathbb{R}^n \to \mathbb{R}^m$ will take $n$ intervals as an input $\vec{x} \in (\mathbb{R}^2)^n$ and produce lower and upper bounds on each of the outputs. Thus, the interval arithmetic computation of $f$ will be $f' : \mathbb{R}^{2n} \to \mathbb{R}^{2m}$, where $f'(\vec{x})$ produces $m$ intervals $[\underline{f'(\vec{x})_i}, \overline{f'(\vec{x})_i}]$ such that

$$\underline{f'(\vec{x})_i} \le f(\vec{x})_i \le \overline{f'(\vec{x})_i}$$

for each $i = 1, 2, \ldots, m$. To achieve this, I take a compositional approach by converting each operation in $f$ to a version over intervals (that takes intervals as input and produces intervals as output).

2.1 Interval addition

In this section, I show how to implement interval addition given access to primitives provided in a number of libraries such as MPFR [14]. Assume that the functions $\underline{+}_p, \overline{+}_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ that compute error-bounded addition at precision $p$ are given and satisfy the property that $(a \,\underline{+}_p\, b) \le (a + b) \le (a \,\overline{+}_p\, b)$, where $+$ is exact and such that, in the limit as $p \to \infty$, the inequality becomes an equality. The addition operator over intervals at precision $p$ is $\oplus_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$ and has the following behavior: given the two intervals

$$i_1 = [\underline{i_1}, \overline{i_1}] \quad \text{and} \quad i_2 = [\underline{i_2}, \overline{i_2}],$$

it computes the sum

$$i_1 \oplus_p i_2 = [\underline{i_1} \,\underline{+}_p\, \underline{i_2},\; \overline{i_1} \,\overline{+}_p\, \overline{i_2}].$$

This is correct because there is a precondition that $i_1, i_2$ are valid intervals (i.e. $\underline{i_1} \le \overline{i_1}$ and $\underline{i_2} \le \overline{i_2}$) and addition is monotonic increasing. Thus, the minimum and maximum possible values of the sum are the lower and upper bounds given.
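A minimal sketch of $\oplus_p$ in Python (mine, not the thesis code), assuming the bigfloat wrapper for MPFR used elsewhere in the thesis; the lower bound is rounded toward $-\infty$ and the upper bound toward $+\infty$ so that the true sum is always enclosed:

from bigfloat import add, precision, RoundTowardNegative, RoundTowardPositive

def interval_add(i1, i2, p):
    # i1 = [lo1, hi1] and i2 = [lo2, hi2] are valid intervals (lo <= hi).
    lo1, hi1 = i1
    lo2, hi2 = i2
    lo = add(lo1, lo2, context=precision(p) + RoundTowardNegative)
    hi = add(hi1, hi2, context=precision(p) + RoundTowardPositive)
    return [lo, hi]

# Example from Chapter 1: interval_add([1, 2], [3, 4], 5) encloses [4, 6].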

2.2 Interval multiplication

Implementing interval multiplication is a little more nuanced because multiplication over the reals is neither monotonic increasing nor monotonic decreasing. For example, −5 × −5 = 25 and −4 × −5 = 20, so increasing an argument may decrease the output. On the other hand, 5 × 5 = 25 and 5 × 6 = 30, so increasing an argument may increase the output. It is possible to regain monotonicity by partitioning the reals into the negative $\mathbb{R}_-$ and non-negative $\mathbb{R}_+$. Kaucher multiplication is an algorithm that takes advantage of this structure [22].

I present a simpler, but potentially less efficient, algorithm. Assume that the functions $\underline{\times}_p, \overline{\times}_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ that compute error-bounded multiplication at precision $p$ are given and satisfy the property that $(a \,\underline{\times}_p\, b) \le (a \times b) \le (a \,\overline{\times}_p\, b)$, where $\times$ is exact and such that, in the limit as $p \to \infty$, the inequality becomes an equality. Given the two intervals

$$i_1 = [\underline{i_1}, \overline{i_1}] \quad \text{and} \quad i_2 = [\underline{i_2}, \overline{i_2}],$$

the product at precision $p$ on intervals, $\otimes_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$, is

$$i_1 \otimes_p i_2 = [\min \underline{S}, \max \overline{S}]$$

where $\underline{S} = \{\underline{i_1} \,\underline{\times}_p\, \underline{i_2},\; \underline{i_1} \,\underline{\times}_p\, \overline{i_2},\; \overline{i_1} \,\underline{\times}_p\, \underline{i_2},\; \overline{i_1} \,\underline{\times}_p\, \overline{i_2}\}$ is the set of lower bounds of each of the pairwise products and $\overline{S} = \{\underline{i_1} \,\overline{\times}_p\, \underline{i_2},\; \underline{i_1} \,\overline{\times}_p\, \overline{i_2},\; \overline{i_1} \,\overline{\times}_p\, \underline{i_2},\; \overline{i_1} \,\overline{\times}_p\, \overline{i_2}\}$ is the set of upper bounds of each of the pairwise products. The correctness proof is provided in Section 4.6 of [18].
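A corresponding sketch of $\otimes_p$ (again mine, under the same bigfloat assumption): the four pairwise products are computed twice, once rounded down for the candidate lower bounds and once rounded up for the candidate upper bounds:

from bigfloat import mul, precision, RoundTowardNegative, RoundTowardPositive

def interval_mul(i1, i2, p):
    lo1, hi1 = i1
    lo2, hi2 = i2
    down = precision(p) + RoundTowardNegative
    up = precision(p) + RoundTowardPositive
    # Lower bounds of the pairwise products (the set S with a lower bar) ...
    los = [mul(a, b, context=down) for a in (lo1, hi1) for b in (lo2, hi2)]
    # ... and upper bounds of the pairwise products (S with an upper bar).
    his = [mul(a, b, context=up) for a in (lo1, hi1) for b in (lo2, hi2)]
    return [min(los), max(his)]

# interval_mul([-1, 2], [-4, 1], 53) encloses [-8, 4] (cf. Example 1 in Chapter 4).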

2.3 Interval sine

Even more difficult is the computation of interval $\sin_p : \mathbb{R}^2 \to \mathbb{R}^2$. The contributions in this section are, to my knowledge, novel, but are not core to the thesis as a whole. This section serves the purpose of introducing some of the relevant challenges of implementing interval arithmetic.

Sine is periodic and composed of monotonic segments. I approach computing interval sine by cases with respect to these segments and identify which region or regions the bounds lie within. Figure 2-1 depicts the four monotonic segments key to the implementation of interval sine.

More formally, let $x \in \mathbb{R}^2$ be given. In the implementation, if $\overline{x} - \underline{x}$ is large (greater than 3), then the output range will be the range of sine (i.e. [−1, 1]). Otherwise, consider the following cases: $x \subset (\text{I} \cup \text{IV})$, $x \subset (\text{II} \cup \text{III})$, $x \subset (\text{I} \cup \text{II})$, and $x \subset (\text{III} \cup \text{IV})$. I assume access to functions $\underline{\sin}_p, \overline{\sin}_p : \mathbb{R} \to \mathbb{R}$, where $p$ is the precision of the result, defined such that $\underline{\sin}_p(x) \le \sin(x) \le \overline{\sin}_p(x)$. These functions are provided in MPFR [14]. In the first case, where $x \subset (\text{I} \cup \text{IV})$, the interval is monotonic increasing and the result is the under-approximation of the sine of the lower bound of the input and the over-approximation of the upper bound of the input. Similar reasoning applies to the other cases (assuming that $\overline{x} - \underline{x} < \pi$).

21 Figure 2-1: The four key monotonic regions for the definition of interval sine.

The cases are represented in the equation below:

$$\sin_p(x) = \begin{cases} [\underline{\sin}_p \underline{x},\; \overline{\sin}_p \overline{x}], & \text{for } x \subset (\text{I} \cup \text{IV}) \\ [\underline{\sin}_p \overline{x},\; \overline{\sin}_p \underline{x}], & \text{for } x \subset (\text{II} \cup \text{III}) \\ [\min(\underline{\sin}_p \underline{x},\; \underline{\sin}_p \overline{x}),\; 1], & \text{for } x \subset (\text{I} \cup \text{II}) \\ [-1,\; \max(\overline{\sin}_p \underline{x},\; \overline{\sin}_p \overline{x})], & \text{for } x \subset (\text{III} \cup \text{IV}) \end{cases} \tag{2.1}$$

The Python implementation in Figure 2-2 surfaces a few details that I did not specify in the mathematical presentation. For example, it shows how to identify the monotonic regions labeled in Figure 2-1 using cosine. I use the bigfloat Python wrapper for the MPFR library to compute $\underline{\sin}_p, \overline{\sin}_p : \mathbb{R} \to \mathbb{R}$ [14]. I check that the width of the interval $x$ is less than 3 rather than $\pi$ because the contract of interval arithmetic allows for over-estimation (loose bounds) and in practice, the bounds are generally tight intervals (with width much less than 3). Also note that the cosine is used to identify the various regions by their slope, avoiding modular arithmetic and significantly simplifying the implementation when compared with other approaches ([2]).

def interval_sin(interval, lower_at_p, upper_at_p):
    lower, upper = interval

    # Computes the lower or upper bound at a given precision
    lp, up = lower_at_p, upper_at_p

    # Start at the range of sine
    out_lower, out_upper = -1, 1

    if sub(upper, lower, up) < 3:
        # Signs of derivatives identify monotonic regions
        if (cos(lower, lp) >= 0) and (cos(upper, lp) >= 0):
            out_lower, out_upper = sin(lower, lp), sin(upper, up)

        elif (cos(lower, up) <= 0) and (cos(upper, up) <= 0):
            out_lower, out_upper = sin(upper, lp), sin(lower, up)

        elif (cos(lower, lp) >= 0) and (cos(upper, up) <= 0):
            out_lower = min(sin(lower, lp), sin(upper, lp))

        elif (cos(lower, up) <= 0) and (cos(upper, lp) >= 0):
            out_upper = max(sin(lower, up), sin(upper, up))

    return [out_lower, out_upper]

Figure 2-2: A simple Python implementation of interval sin.

For example, their implementation requires that $\pi$ is computed to higher precisions to compute sine at higher precisions for certain inputs. Furthermore, the cosine computation can be reused when implementing the derivative of $\sin_p$, which is both convenient and efficient (see Section 4.2.3 for more details).

2.4 Analysis

I will briefly reflect on some of the properties of interval arithmetic with these operators. Applying an interval operator is sound if for every input interval, the output interval contains the result. Since any operator can be implemented soundly by returning [−∞, ∞], we need a condition on tightness. A precision-parameterized function on intervals (e.g., $\oplus_p$) is tight if, for every input interval, in the limit as the precision approaches infinity, the width approaches the width when the computation is exact (error-free) over the reals.

I analyze these properties with respect to $\oplus_p$, $\otimes_p$, and $\sin_p$. $\oplus_p$ is sound and tight because it acts element-wise on each of the input intervals with $\underline{+}_p, \overline{+}_p : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. Similarly, $\otimes_p$ is sound and tight. The implementation of $\sin_p$ can be both sound and tight [2]. However, the implementation provided is sound, but not tight, because it returns [−1, 1] for input intervals with width greater than 3. It is tight for intervals narrower than 3. In the high-precision limit for arbitrary-precision arithmetic, the input interval widths tend to 0 and thus $\sin_p$ is tight.
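To make tightness concrete, the following sketch (mine; it assumes bigfloat's const_pi and exp) encloses $\pi + e$ at increasing precisions and shows the interval width shrinking toward the exact width of zero:

from bigfloat import add, exp, const_pi, precision, RoundTowardNegative, RoundTowardPositive

def pi_plus_e(p):
    down = precision(p) + RoundTowardNegative
    up = precision(p) + RoundTowardPositive
    lo = add(const_pi(context=down), exp(1, context=down), context=down)
    hi = add(const_pi(context=up), exp(1, context=up), context=up)
    return lo, hi

for p in (5, 10, 20, 40):
    lo, hi = pi_plus_e(p)
    # The width shrinks roughly like 2**-p as the precision p grows.
    print(p, hi - lo)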

Chapter 3

Sensitivities for Precision Refinement

The push-based approach to implementing arbitrary-precision arithmetic relies upon guessing appropriate precisions for all of the variables and operations in a computation and then checking whether the error of the result is small enough. If it is not, precisions must be increased, or refined, until the error bound is satisfied. In this chapter, I present an established approach to precision refinement and present a novel algorithm – the critical path algorithm – to improve precision refinement. I also provide an analysis of the convergence rate of different schedules operating on a particular class of computations. At a high level, the critical path algorithm guides precisions of the variables and operators in a computation using a heuristic. I represent a computation as its underlying computation graph (e.g. Figure 1-1). A computation graph is a directed acyclic graph consisting of vertices 푉 (the variables and operators in the computation) and edges such that a vertex

푣1 has a directed edge to 푣2 if and only if 푣1 is an operator and 푣2 is one of its arguments. Thus, the leaves will always be variables or constants, and the non-leaf nodes will always be operators. A forward pass on a computation graph performs operations from the leaves to the root following the precisions at each vertex. To more easily refer to parts of this push-based computation, I introduce the following terminology:

Definition 3.0.1. A configuration 퐶 : 푉 → N maps vertices in the computation graph to precisions such that a variable 푣 ∈ 푉 is represented using 퐶푣 bits of precision.

25 Definition 3.0.2. A schedule is a sequence of configurations 푆 : N → (푉 → N) such that 푛 ↦→ 푆(푛) where 푆(푛) is the 푛th configuration.

Each forward pass computes with respect to a configuration and a push-based computation will follow a schedule – computing with respect to the successive configurations $(S^{(i)})_{i \in \mathbb{N}}$ until the result lies within the error bounds. In general, schedules in push-based computations will produce configurations that assign variables to increasingly high precisions ($S_v^{(k)} < S_v^{(k+1)}$ for all $k \in \mathbb{N}$), leading to a monotonically decreasing error on commonly occurring computations.

3.1 A baseline schedule

I begin by considering a baseline that generalizes the schedule proposed by iRRAM [30]. This schedule computes the function 푓(푥) by setting the (푘 + 1)th configuration as

$$S_v^{(k+1)} = S_v^{(k)} + ab^k \tag{3.1}$$

where $S^{(0)}$, $a$, $b$ are parameters that define the behavior of the schedule. Notice that the precisions grow exponentially. I present this schedule simply because it is used in iRRAM, one of the fastest arbitrary-precision arithmetic libraries [5]. There are fundamental trade-offs between different choices of configurations, with the central concerns being: (1) overshooting – when the final configuration is at an unnecessarily high precision – and (2) undershooting – requiring too many forward passes to converge (i.e. the error bound is satisfied when $k$ in $S^{(k)}$ is large).
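A sketch of this baseline schedule as a Python generator (my own rendering of Equation 3.1, not iRRAM's code); a configuration is a dict from vertices to mantissa precisions, the vertex names below are placeholders, and rounding the increment to a whole number of bits is an assumption the equation leaves implicit:

def uniform_schedule(vertices, s0, a, b):
    # Yields S(1), S(2), ...: every vertex receives the same increment a * b**k.
    config = {v: s0 for v in vertices}
    k = 0
    while True:
        config = {v: p + max(1, int(a * b ** k)) for v, p in config.items()}
        k += 1
        yield dict(config)

# Example: the uniform refinement of Figure 1-1 corresponds to
# uniform_schedule({'add', 'mul', 'e', 'pi'}, s0=0, a=1, b=1).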

The problem For the schedule in Equation 3.1, the final configuration $S^{(k)}$ that satisfies the given error bound assigns each variable and operation to the same precision; this assignment is rarely optimal and may be far from it. Chapter 1 provides a worked example of a case where uniform refinement is suboptimal and benefits from setting different variables and operations to different precisions. Although it may be possible to compute the necessary precisions optimally by hand (at least for simple cases), an optimal, automated approach to

26 precision refinement would require perfectly modeling floating-point error, which has evaded researchers. I choose to take a heuristic approach. Heuristics may be used to guide schedules to configurations satisfying error bounds, while minimizing the amount of total computation (the sum of all of the compute, generally measured in time, required for all of the configu- rations run). A good heuristic is fast to compute and guides the computation quickly to a configuration respecting the given error bound without overshooting or undershooting.

3.2 Sensitivities from derivatives

In this section, I describe the novel algorithm to compute sensitivities assuming that deriva- tives of the interval code are already provided. The sensitivities provide a measure of the amount of change in the output interval width from a change to the input interval width. In Chapter 4, I demonstrate that derivatives of interval code can be computed efficiently using automatic differentiation.

3.2.1 Constructing sensitivities

I now present a sensitivity analysis that is a key contribution of this thesis. My construction of sensitivities of interval computations assumes correctly computed derivatives are already provided. I will detail the implementation of these derivatives in the following chapter. Running automatic differentiation on the computation graph of an interval arithmetic expression produces 4 partial derivatives for each $v \in V$. In particular, for vertex $v_x$ corresponding to the input interval $x$, if the function $f$ has the output $y = f(x)$, then the change in the output $y = [\underline{y}, \overline{y}]$ with respect to a change in the interval $x = [\underline{x}, \overline{x}]$ is

$$\frac{\partial \underline{y}}{\partial \underline{x}},\; \frac{\partial \underline{y}}{\partial \overline{x}},\; \frac{\partial \overline{y}}{\partial \underline{x}},\; \frac{\partial \overline{y}}{\partial \overline{x}}. \tag{3.2}$$

For example, $\frac{\partial \underline{y}}{\partial \underline{x}}$ is an intuitive answer to the question “what will be the change in the lower bound of the output given a small increase in the lower bound of $v_x$?” Increasing the precision at which $v_x$ is computed decreases the width of the output interval, and thus,

$$\frac{\partial \underline{y}}{\partial \underline{x}},\; \frac{\partial \overline{y}}{\partial \overline{x}} \ge 0, \qquad \frac{\partial \overline{y}}{\partial \underline{x}},\; \frac{\partial \underline{y}}{\partial \overline{x}} \le 0.$$

This leads to a natural definition of sensitivity that is one of the core contributions of this thesis. I define the sensitivity of $v_x$ with respect to a decrease in the width of $x$ as:

$$\operatorname{sens}(v_x) = \frac{\partial \underline{y}}{\partial \underline{x}} + \frac{\partial \overline{y}}{\partial \overline{x}} - \frac{\partial \overline{y}}{\partial \underline{x}} - \frac{\partial \underline{y}}{\partial \overline{x}}. \tag{3.3}$$

Implicitly, this formulation of sensitivity asserts that it is just as important to increase the lower bound as it is to decrease the upper bound because all of the coefficients on the derivatives have the same (unit) magnitude. Also note that sens(푣푥) ≥ 0 because of the previous inequalities.

3.2.2 Sensitivity as a derivative

I will now build a function that explicitly relates a change in the width of the interval corresponding to a vertex in the computation graph to the width of the output interval, giving a scalar-valued function whose derivative determines sensitivity. Formally, the sensitivities are the derivative of the composition of functions that take the derivative $(Df) : \mathbb{R}^2 \to \mathbb{R}^{2 \times 2}$ of an interval-valued function $f : \mathbb{R}^2 \to \mathbb{R}^2$ and transform it in terms of its directional derivatives. The goal is to understand the decrease in the output interval width as a result of decreasing the input interval width. This means directing perturbations in a positive direction for lower bounds and a negative direction for upper bounds. The function with respect to a specific input $x$ satisfying these properties is:

$$z_x(t) := (g \circ f \circ h_x)(t) \tag{3.4}$$

where $g : \mathbb{R}^2 \to \mathbb{R}$ and $h_x : \mathbb{R} \to \mathbb{R}^2$ with $g(x) = \underline{x} - \overline{x}$ and $h_x(t) = (\underline{x} + t, \overline{x} - t)$. In words, $g$ computes the (negative) width of an interval, and $h_x$ symmetrically decreases the width of the interval $x$ by $t$. If $y = f(x)$, the derivative is:

$$(Df)_x = \begin{bmatrix} \dfrac{\partial \underline{y}}{\partial \underline{x}} & \dfrac{\partial \underline{y}}{\partial \overline{x}} \\[6pt] \dfrac{\partial \overline{y}}{\partial \underline{x}} & \dfrac{\partial \overline{y}}{\partial \overline{x}} \end{bmatrix} \tag{3.5}$$

and the derivative of 푧푥(푡) is:

$$\frac{dz_x}{dt} = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} \dfrac{\partial \underline{y}}{\partial \underline{x}} & \dfrac{\partial \underline{y}}{\partial \overline{x}} \\[6pt] \dfrac{\partial \overline{y}}{\partial \underline{x}} & \dfrac{\partial \overline{y}}{\partial \overline{x}} \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \tag{3.6}$$

This evaluates exactly to the proposed sensitivity, meaning that if 푥 corresponds to the

vertex 푣푥 in the computation graph,

$$\frac{dz_x}{dt} = \operatorname{sens}(v_x).$$
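As a sanity check (a sketch of mine using numpy, not part of the thesis), evaluating Equation 3.6 on the Jacobian of the output with respect to the interval for $\pi$ in the Chapter 1 example recovers the sensitivity of 2000 reported in Figure 1-3:

import numpy as np

def sensitivity(jacobian):
    # Equation 3.6: sens(v_x) = [1, -1] (Df)_x [1, -1]^T
    return np.array([1.0, -1.0]) @ jacobian @ np.array([1.0, -1.0])

# Jacobian of y = e + 1000*pi with respect to the interval for pi
# (rows: lower/upper bound of y; columns: lower/upper bound of pi).
d_pi = np.array([[1000.0, 0.0],
                 [0.0, 1000.0]])
print(sensitivity(d_pi))  # 2000.0, matching Figure 1-3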

3.2.3 Introducing a cost model

The proposed sensitivity analysis (in Equation 3.3) implicitly assumes that decreasing the width of an interval by an infinitesimal amount $\delta$ is just as costly when the interval width is 100 as when the current interval width is 0.01. This assumption is often inaccurate. For example, computing the first $n$ digits of $\pi$ using the Bailey–Borwein–Plouffe algorithm has a computational complexity that is $O(n \log^3(n))$ [3]. Incorporating the property that it requires more computational cost to refine narrower intervals may help to encourage cost-efficient configurations for computations using sensitivities to guide refinement.

The cost-dependent sensitivity analysis for the vertex $v_x$ in the computation graph corresponding to $x$ is

$$\operatorname{sens}'(v_x) = \begin{bmatrix} c_1(x) & c_2(x) & c_3(x) & c_4(x) \end{bmatrix} \begin{bmatrix} \dfrac{\partial \underline{y}}{\partial \underline{x}} \\[4pt] \dfrac{\partial \overline{y}}{\partial \overline{x}} \\[4pt] -\dfrac{\partial \overline{y}}{\partial \underline{x}} \\[4pt] -\dfrac{\partial \underline{y}}{\partial \overline{x}} \end{bmatrix}. \tag{3.7}$$

I provide a theoretical analysis of the cost function $c$ where $c_i(x) = \overline{x} - \underline{x}$ for $i = 1, 2, 3, 4$ in Section 3.4. This allows for per-operator cost functions that can model the difficulty of refining different parts of the compute graph, which I expand upon in Chapter 7.

3.3 A schedule using sensitivities

In the previous section, I gave a formal definition of sensitivities, and I now explore how these sensitivities may be incorporated into schedules to produce faster computations. To describe these schedules, I introduce the following terminology:

Definition 3.3.1. The most sensitive vertex is the vertex 푣 ∈ 푉 such that

$$v = \operatorname*{arg\,max}_{w \in V} \operatorname{sens}_k(w),$$

where sens푘 is the sensitivity (as defined in Equation 3.3) of the given program evaluated at the 푘th configuration.

Definition 3.3.2. The critical path 푃 (푘) is the path from the most sensitive vertex 푣 (with ties broken arbitrarily) to the root for computation evaluated at the configuration 퐶(푘).

Note that the sensitivities may change throughout the course of the computation due to changes in values. Thus, the most sensitive vertex and critical path are parameterized by the configuration. Armed with this terminology, defining the schedule is quite straightforward. At each iteration, the configuration is refined by a larger increment along the critical path than it is in the rest of the computation. Explicitly, I define this schedule as

$$S'^{(k+1)}_v = \begin{cases} S'^{(k)}_v + a_1 b_1^k & \text{if } v \in P^{(k)} \\ S'^{(k)}_v + a_2 b_2^k & \text{otherwise} \end{cases} \tag{3.8}$$

where $S'^{(0)}$, $a_1$, $b_1$, $a_2$, $b_2$ dictate the behavior of the schedule. I call this the critical path algorithm for precision refinement. In Chapter 5, I experimentally compare the baseline (iRRAM) schedule and the proposed schedule.
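A sketch of one refinement step of this schedule (mine, not the released implementation); it assumes a sens function implementing Equation 3.3 over the refinable vertices and a parent function giving each vertex's consumer in the computation graph (None at the root), so the critical path can be walked up to the root:

def critical_path(vertices, sens, parent):
    # Path from the most sensitive vertex (ties broken arbitrarily) to the root.
    v = max(vertices, key=sens)
    path = set()
    while v is not None:
        path.add(v)
        v = parent(v)
    return path

def refine(config, path, k, a1, b1, a2, b2):
    # Equation 3.8: a larger increment along the critical path than elsewhere.
    return {v: p + int(a1 * b1 ** k if v in path else a2 * b2 ** k)
            for v, p in config.items()}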

3.4 Analysis

In this section, I compare the asymptotic behavior and theoretical properties of the uniform, critical path, and cost-modeled schedules. Although the empirical results use multiple-precision floating point, I use fixed point for the theoretical results because it is easier to analyze. In particular, I consider numbers in the range [0, 1] in the form of fixed-point binary numbers

$$x = \sum_{i=1}^{\infty} 2^{-i} b_i,$$

where $b_1, b_2, \ldots \in \{0, 1\}$. Consider a computation of the form:

$$y = \sum_{i=1}^{n} a_i x_i,$$

where $a_i$ is a constant and $x_i \in [0, 1]$ for all $i$. In this case, the sensitivities will be 2 for all of the $\oplus$ and $\otimes$ operators, $a_i$ for each $x_i$, and not applicable for constants (because they are assumed to be binary rationals at infinite precision, e.g. $a_1, a_2, \ldots, a_n$). I explore the case where $a_i = 2^{-i}$. Figure 3-1 presents the computation graph of $y$.

[Figure 3-1 shows a computation graph: a root $\oplus$ summing $n$ products $\otimes$, where the $i$th product has children $a_i$ and $x_i$.]

Figure 3-1: Example computation graph of a family of computations that can benefit from the critical path algorithm. The critical path, which remains the same for all iterations, is bolded for $a_i = 2^{-i}$.

I now introduce notation and properties that are useful in the analysis of different schedules. Let $[n]$ denote the set $\{1, 2, \ldots, n\}$.

Definition 3.4.1. Let $p^{(k)}_{x_i}$ denote the precision of the variable $x_i$ on the $k$th iteration of a given schedule.

Definition 3.4.2. Let $w^{(k)}$ denote the width of the output (the error) on the $k$th iteration of a given schedule.

The properties below are simple, but useful to refer to in the analysis that follows.

Fact 1. The largest width possible with $k$ bits of precision is $2^{-k}$.

Fact 2. Given a finite geometric series where the ratio between consecutive terms is $\frac{1}{2}$, the sum is $\sum_{i=j}^{n} t_i = 2t_j\left(1 - 2^{j-n-1}\right)$.

3.4.1 Uniform schedule

Assume that each refinement increments the precisions for each of the vertices in the computation graph by 1. Formally, this means that for every $i \in [n]$,

$$p^{(k)}_{x_i} = k.$$

Since the error for each $x_i$ is the same, and is $\frac{1}{2^k}$ (by Fact 1), it may be factored out of the summation. The other term contributing to the error is $\sum_{i=1}^{n} a_i = 1 - \frac{1}{2^n}$ (by Fact 2). Therefore, the width of the output when using uniform refinement is

$$w_u^{(k)} = \frac{1}{2^k}\left(1 - \frac{1}{2^n}\right). \tag{3.9}$$

3.4.2 Critical path schedule

Figure 3-1 shows the critical path of the computation. Since the derivative for each of the bounds of $x_i$ is $a_i$, the most sensitive vertex (the one with the largest derivative) is $x_1$ for every refinement. I use a schedule where the configuration is incremented by 2 along the critical path and by 1 everywhere else in the computation graph. As a result, if the first refinement sets every variable and operator to one bit of precision, the precisions will follow the equations:

$$p^{(k)}_{x_1} = 2k - 1, \qquad p^{(k)}_{x_i} = k \quad \text{for } i \ge 2.$$

Again, by Fact 1, the widths of the intervals are $w^{(k)}_{x_1} = \frac{1}{2^{2k-1}}$ and $w^{(k)}_{x_i} = \frac{1}{2^k}$ for $i \ge 2$. Computing the output interval width is then a matter of combining these terms with the corresponding coefficients

$a_i = 2^{-i}$, giving rise to the formula:

$$w_p^{(k)} = \frac{1}{2^{2k}} + \frac{1}{2^k}\sum_{i=2}^{n} \frac{1}{2^i}.$$

By Fact 2, it is straightforward to see that the width is

$$w_p^{(k)} = \frac{1}{2^{2k}} + \frac{1}{2^{k+1}}\left(1 - \frac{1}{2^{n-1}}\right). \tag{3.10}$$

The result can be proven formally by induction.

3.4.3 Cost-modeled schedule

I analyze a schedule that uses the critical path algorithm with cost-modeled sensitivities, and I call this the cost-modeled schedule. I define the cost-aware sensitivity as

$$\operatorname{sens}'(v_x) = \operatorname{sens}(v_x)\,(\overline{x} - \underline{x}),$$

where “sens” is defined in Equation 3.3. The sensitivities $\operatorname{sens}'$ are a special case of the cost model presented in Section 3.2.3. The most sensitive vertex is the one with the largest product of the sensitivity and the interval width. I use the same schedule as in the previous algorithm, where for each refinement, the configuration is incremented by 2 along the critical path and by 1 everywhere else in the computation graph. Let $T_l = \frac{l(l+1)}{2}$ be the $l$th triangular number. Intuitively, the refinement will proceed as follows:

1. The computation begins at precision $p^{(1)}_{x_i} = 1$ for all $i$. The most sensitive vertex is $x_1$.

2. After one refinement, the precisions are $p^{(2)}_{x_1} = 3$ and $p^{(2)}_{x_{i \ne 1}} = 2$. $x_1$ and $x_2$ are equally sensitive.

3. After two additional refinement steps, the precisions are $p^{(4)}_{x_1} = 6$, $p^{(4)}_{x_2} = 5$, and $p^{(4)}_{x_{i \notin \{1,2\}}} = 4$. $x_1$, $x_2$, and $x_3$ are all equally sensitive.

4. After three additional refinement steps, the precisions are $p^{(7)}_{x_1} = 10$, $p^{(7)}_{x_2} = 9$, $p^{(7)}_{x_3} = 8$, and $p^{(7)}_{x_{i \notin \{1,2,3\}}} = 7$. $x_1$, $x_2$, $x_3$, and $x_4$ are all equally sensitive.

...

k. Assuming $k \le n$, after $k$ additional refinement steps, the precisions are $p^{(1+T_l)}_{x_1} = T_{l+1}$, $p^{(1+T_l)}_{x_2} = T_{l+1} - 1$, ..., $p^{(1+T_l)}_{x_{i \notin [l]}} = k$, where $l$ is defined so that $k = T_l + 1$. $x_1, x_2, \ldots, x_{l+1}$ are all equally sensitive.

Using these observations and applying Fact 1 and Fact 2, the formula is:

$$w_c^{(T_l+1)} = \frac{l}{2^{T_{l+1}+1}} + \frac{1}{2^{T_l+l+1}}\left(1 - \frac{1}{2^{n-l}}\right), \tag{3.11}$$

which can be confirmed using induction.

3.4.4 A comparison of schedules

In this section, I compare the three different schedules for a few specific values of $n$, which varies the number of terms in the summation $\sum_{i=1}^{n} a_i x_i$, and I analyze the comparative asymptotic performances of the different schedules. I derive formulas for the widths of the output intervals as a function of the number of terms in the summation $n$ and the number of refinements $k$. I fix $k = T_n + 1$ because it is the number of refinements at which all of the leaves of the cost-modeled schedule contribute the same amount of error. This simplifies the expression for the width to

$$w_c^{(T_n+1)} = \frac{n}{2^{T_{n+1}+1}}.$$

Table 3.1 shows the number of additional refinements needed for results to lie within the

error bound from the cost-modeled schedule at the $(T_n + 1)$th iteration. Even for relatively small $n$, it is clear that there is a significant practical advantage to using the cost-modeled schedule over the uniform and critical path schedules. The critical path schedule also consistently outperforms the uniform schedule.

Schedule      n = 5    n = 10    n = 15
Uniform          4        8         13
Crit. path       3        7         12

Table 3.1: The table shows the number of additional refinements needed for the uniform and critical path schedules to lie within the error bound that the cost-modeled schedule achieves on the $(T_n + 1)$th iteration ($T_5 + 1 = 16$, $T_{10} + 1 = 56$, and $T_{15} + 1 = 121$).
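The entries in Table 3.1 follow from the closed forms above. The script below (mine) evaluates Equations 3.9 and 3.10 and the simplified cost-modeled width, then counts how many extra refinements the uniform and critical path schedules need to match the cost-modeled error:

def w_uniform(k, n):
    return (1 - 2.0 ** -n) / 2.0 ** k                                  # Equation 3.9

def w_critical(k, n):
    return 2.0 ** -(2 * k) + (1 - 2.0 ** -(n - 1)) / 2.0 ** (k + 1)    # Equation 3.10

def extra_refinements(width, n):
    t_n = n * (n + 1) // 2
    target = n / 2.0 ** (t_n + n + 2)   # cost-modeled width at iteration T_n + 1
    extra = 0
    while width(t_n + 1 + extra, n) > target:
        extra += 1
    return extra

for n in (5, 10, 15):
    print(n, extra_refinements(w_uniform, n), extra_refinements(w_critical, n))
# Prints 5 4 3, 10 8 7, and 15 13 12, reproducing Table 3.1.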

I now study the limiting behavior of the ratio between the width of the cost-modeled schedule and each of the other two schedules. I find that it is an exponentially better schedule (note that smaller is better for widths) because

$$\lim_{n \to \infty} \frac{w_c^{(T_n+1)}}{w_p^{(T_n+1)}} = \lim_{n \to \infty} \frac{w_c^{(T_n+1)}}{w_u^{(T_n+1)}} = \frac{n}{2^n}.$$

The single additional bit added along the critical path significantly improves the refinement process for the cost-modeled schedule. This emphasizes the importance of using carefully considered scheduling algorithms. To a lesser extent, the critical path schedule outperforms the uniform schedule. The limit

$$\lim_{n \to \infty} \frac{w_p^{(T_n+1)}}{w_u^{(T_n+1)}} = \frac{1}{2}$$

shows that the critical path schedule is a factor of two tighter than the uniform schedule at the same refinement iteration. A single additional bit of precision per refinement leads to halving the interval width globally.

Chapter 4

Automatic Differentiation of Interval Arithmetic

In this chapter, I introduce automatic differentiation and detail both the implementation and relevant analysis behind computing derivatives of interval code. The recent popularity of deep learning led to a focus on efficient computation of derivatives. Indeed, the backpropagation algorithm, key to deep learning, is a special case of automatic differentiation [1, 32]. I begin with a brief overview of different approaches and some design considerations in the efficient computation of derivatives.

4.1 Introduction to automatic differentiation

Automatic Differentiation (AD) enables efficient derivative computations and is commonly used in machine learning to compute first- and second-order derivatives [4]. I will provide a brief overview of AD and direct readers to [4, 19] for further description. There are two chain-rule factorizations of derivatives that lead to two different realizations of AD with different properties: forward-mode and reverse-mode.

Forward-mode AD Consider a differentiable function 푓 : R푁 → R푀 . Forward-mode AD computes derivatives from the 푁 leaves of the computation graph up to the 푀 roots. This simple choice of computing derivatives from the leaves to the roots means that the values of

37 the 푁 leaf derivatives are assigned at initialization. As a result, forward-mode AD computes all of the 푀 output derivatives with respect to an assignment of input derivatives.

Reverse-mode AD Consider a differentiable function 푓 : R푁 → R푀 . In contrast to forward-mode, reverse-mode AD computes derivatives from the 푀 roots of the computation graph to the 푁 leaves. Reverse-mode AD reverses the dependencies, requiring an initializa- tion for the 푀 output derivatives and computing the 푁 input derivatives. Derivatives with respect to all inputs are computed with respect to an assignment of output derivatives.

4.2 Automatic differentiation on intervals

I decide to use reverse-mode AD because it computes from the outputs to the inputs. Therefore, given an initialization for the two output derivatives, reverse-mode AD computes the derivatives for all of the inputs and intermediate computations. In contrast, forward-mode would require a forward-pass for each input (in the case of interval arithmetic, both the lower and upper bound).

For simplicity, I do not use interval arithmetic to bound the error of the gradient com- putation, but I do leave the gradients parameterized by precision for convenience and note that my implementation can easily be extended to compute error-bounded gradients.

Figure 4-1 presents an implementation of the gradient computation code on intervals. Each variable has an appropriate weight (the derivative of the term with respect to self ) assigned during the forward-pass based on the operation performed. The gradient is then the sum of the product of the appropriate weight and the corresponding gradient, as given by the chain rule. I provide an example implementation for interval addition. The careful reader may notice that there is an unnecessary recomputation of the co-recursive call to grad in _compute_grad . Indeed, my implementation caches computed gradients appropriately, but these details are omitted in Figure 4-1 for simplicity.

def _compute_grad(self, parents, rte):
    grad = 0
    for (w1, w2), var in parents:
        lower, upper = var.grad()
        grad_term = add(mul(w1, lower, rte), mul(w2, upper, rte), rte)
        grad = add(grad_term, grad, rte)
    return grad

def grad(self):
    rte = precision(self.grad_precision) + RoundTiesToEven
    self.lower_grad = self._compute_grad(self.ad_lower_parents, rte)
    self.upper_grad = self._compute_grad(self.ad_upper_parents, rte)
    return self.lower_grad, self.upper_grad

Figure 4-1: Reverse-mode automatic differentiation on intervals.

Extracting sensitivities Generating the sensitivity for each variable specified in Equation 3.3 requires two simple but key steps. First, initialize the derivatives at the root

root.lower_grad, root.upper_grad = 1, -1 .

Then, for a vertex 푣 in the computation graph, I compute the sensitivity 푠푒푛푠(푣), described in Equation 3.3: v.lower_grad - v.upper_grad .

These correspond to the post-composition with $g$ and pre-composition with $h_x$ that map the four partial derivatives in Equation 3.5 to the sensitivity. Explicitly,

$$\begin{bmatrix} 1 & -1 \end{bmatrix}, \qquad \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

correspond to initializing the derivatives at the root and to the sensitivity assignment, respectively, as they appear in Equation 3.6. The given code snippets constitute the implementation naturally arising from Equation 3.6.

4.2.1 Derivative of interval addition

Building upon the explanation of interval addition in Section 2.1, I now show how to take derivatives through $\oplus_p : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$, the interval addition operator at precision $p$. Since $\oplus_p$ is monotonic increasing, the derivative of the lower bound of the output with respect to the lower bound of either of the inputs is 1 and similarly, the derivative of the upper bound of the output with respect to the upper bound of either of the inputs is 1. All other derivatives are 0, with eight derivatives in total. Explicitly, Figure 4-2 presents an implementation of interval addition with derivatives. My implementation stores these derivatives and the corresponding objects that are used in the recursive calls to grad.

def __add__(self, lower_at_p, upper_at_p):
    # Perform addition
    left, right = self.parents
    self.lower = add(left.lower, right.lower, lower_at_p)
    self.upper = add(left.upper, right.upper, upper_at_p)

    # Add derivative information
    left.ad_lower_parents.append(((1, 0), self))
    left.ad_upper_parents.append(((0, 1), self))
    right.ad_lower_parents.append(((1, 0), self))
    right.ad_upper_parents.append(((0, 1), self))

Figure 4-2: Interval addition with derivatives.

4.2.2 Derivative of interval multiplication

The derivative of interval multiplication is more difficult because the output interval is the minimum and maximum of the set of pairwise products of the input intervals (as explained in Section 2.2). The derivative computation involves identifying the terms that contribute to the output and assigning the appropriate derivatives. I will only provide an example and forego the implementation as it is detailed and adds little additional insight (it is in the provided code).

Example 1. In this example, I assume error-free arithmetic to focus on the complexities of differentiation. Recall that the computation $x \otimes y = z$ expands to

$$[\underline{x}, \overline{x}] \otimes [\underline{y}, \overline{y}] = [\underline{z}, \overline{z}].$$

Consider the example: $[-1, 2] \otimes [-4, 1] = [-8, 4]$.

The set of products is $S = \{-8, -1, 2, 4\}$. Since $z = [\min S, \max S]$, the result is $z = [-8, 4]$. There are four input terms $\{-1, 2, -4, 1\}$ and two outputs, −8 and 4, that lead to the eight derivatives shown in Equation 4.1. Each derivative provides an answer to the intuitive question: “how much would a change in this input affect that output?”

Since −1 only contributes to $\overline{z}$, where it is multiplied by −4, the derivatives $\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right)$ are (0, −4). Similarly, 2 only contributes to $\underline{z}$ and it is multiplied by −4, so the derivatives $\left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right)$ are (−4, 0). Continuing in this way yields the eight derivatives

$$((0, -4), (-4, 0), (2, -1), (0, 0)),$$

corresponding to the derivatives

$$\left(\left(\frac{\partial \underline{z}}{\partial \underline{x}}, \frac{\partial \overline{z}}{\partial \underline{x}}\right), \left(\frac{\partial \underline{z}}{\partial \overline{x}}, \frac{\partial \overline{z}}{\partial \overline{x}}\right), \left(\frac{\partial \underline{z}}{\partial \underline{y}}, \frac{\partial \overline{z}}{\partial \underline{y}}\right), \left(\frac{\partial \underline{z}}{\partial \overline{y}}, \frac{\partial \overline{z}}{\partial \overline{y}}\right)\right). \tag{4.1}$$
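A sketch of this selection logic in the error-free setting of Example 1 (my own; the thesis's full precision-aware implementation is in the released code): the endpoints that achieve the minimum and maximum pairwise products receive product-rule derivatives, and every other endpoint gets zero:

def interval_mul_with_derivatives(x, y):
    # x = (x_lo, x_hi), y = (y_lo, y_hi); error-free arithmetic as in Example 1.
    vals = {'x_lo': x[0], 'x_hi': x[1], 'y_lo': y[0], 'y_hi': y[1]}
    pairs = [('x_lo', 'y_lo'), ('x_lo', 'y_hi'), ('x_hi', 'y_lo'), ('x_hi', 'y_hi')]
    products = [vals[a] * vals[b] for a, b in pairs]
    lo_idx = min(range(4), key=products.__getitem__)
    hi_idx = max(range(4), key=products.__getitem__)
    z = (products[lo_idx], products[hi_idx])

    # derivs[v] = (dz_lo/dv, dz_hi/dv); only the endpoints selected for an output
    # bound get a nonzero derivative, given by the product rule.
    derivs = {v: [0, 0] for v in vals}
    a, b = pairs[lo_idx]
    derivs[a][0] += vals[b]
    derivs[b][0] += vals[a]
    a, b = pairs[hi_idx]
    derivs[a][1] += vals[b]
    derivs[b][1] += vals[a]
    return z, derivs

# interval_mul_with_derivatives((-1, 2), (-4, 1)) returns z = (-8, 4) and the
# derivatives (0, -4), (-4, 0), (2, -1), (0, 0) for x_lo, x_hi, y_lo, y_hi.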

4.2.3 Derivative of interval sine

Building on the understanding of how to compute derivatives of $\oplus_p$ and $\otimes_p$, I now briefly cover how to compute the derivative of $\sin_p$. It may help to take another look at the detailed description of interval sine in Section 2.3 and to look at the definition of $\sin_p$ in Equation 2.1.

Each of the cases for the derivative (assuming that $\overline{x} - \underline{x} < \pi$) is shown below:

$$\frac{d\sin_p}{dx} = \begin{cases} ((\cos_p \underline{x}, 0), (0, \cos_p \overline{x})), & \text{for } x \subset (\text{I} \cup \text{IV}) \\ ((0, \cos_p \underline{x}), (\cos_p \overline{x}, 0)), & \text{for } x \subset (\text{II} \cup \text{III}) \\ ((\cos_p \underline{x}, 0), (0, 0)), & \text{for } (x \subset (\text{I} \cup \text{II})) \wedge (\underline{\sin}_p \underline{x} < \underline{\sin}_p \overline{x}) \\ ((0, 0), (\cos_p \overline{x}, 0)), & \text{for } (x \subset (\text{I} \cup \text{II})) \wedge \neg(\underline{\sin}_p \underline{x} < \underline{\sin}_p \overline{x}) \\ ((0, \cos_p \underline{x}), (0, 0)), & \text{for } (x \subset (\text{III} \cup \text{IV})) \wedge (\overline{\sin}_p \underline{x} > \overline{\sin}_p \overline{x}) \\ ((0, 0), (0, \cos_p \overline{x})), & \text{for } (x \subset (\text{III} \cup \text{IV})) \wedge \neg(\overline{\sin}_p \underline{x} > \overline{\sin}_p \overline{x}) \end{cases} \tag{4.2}$$

where the regions I, II, III, and IV are those specified in Figure 2-1. If $\overline{x} - \underline{x} \ge \pi$, my implementation returns (0, 0), which is potentially too “loose” (because sine evaluated at an interval with width $\pi$ may not span the whole range of sine), but is still sound because over-approximation is acceptable for interval arithmetic.

4.3 Analysis

In this section, I introduce the mathematical challenges that arise from taking derivatives through interval code and highlight some additional concerns in my implementation. Non-differentiability is of particular concern. Addition on intervals $\oplus$ is differentiable, but multiplication on intervals $\otimes$ is only differentiable almost everywhere. For example, consider $[-2, 4] \otimes [-4, 2] = [-16, 8]$: the upper bound could come either from $-2 \times -4$ or from $4 \times 2$. Thus, this computation is not differentiable, but most computations, like the one shown in Example 1, are differentiable. My implementation takes the derivative with respect to the selected computation as a result of the nondeterministic choices (arising from the computation of min and max with multiplicity at the extrema) made during the computation. Similarly, sin is differentiable almost everywhere and is not differentiable, for example, at the interval [−1, 2], which has a width of 3 and does not span [−1, 1]. Example 1 also exhibits dead-zones, where a set of inputs has a zero derivative. In Example 1, where $[-1, 2] \otimes [-4, 1] = [-8, 4]$, “1” could be replaced with any $a$ in the open interval (−4, 2) and produce the same result. Since none of these values of $a$ will contribute to

the output, they will have a derivative of (0, 0). Similarly, there is a dead-zone for all inputs with a width greater than 3 for sin (the same is true for any definition for a width ≥ 2휋). These dead-zones present a challenge for using derivatives as sensitivities. For example, all of the derivatives are 0 for sin for a wide interval, indicating that it is not important to decrease the interval widths. However, this is clearly not the case, as a non-infinitesimal change (like those used in refinement) may indeed yield a narrower interval (and a more accurate result). Together, dead-zones and non-differentiability present cases where this approach for using derivatives as sensitivities may fail. They also highlight some subtleties of computing derivatives through interval code that may be worth further mathematical exploration and analysis. Since computations “break ties” by making arbitrary non-deterministic choices, I compute derivatives for every input (even at points that are technically not differentiable). I now move on to establishing the benefit of computing these derivatives.

Chapter 5

Results

In this chapter, I present an experiment and provide empirical results demonstrating the effectiveness of the critical path algorithm described in Chapter 3. Consider the computation

$$y = \pi + 2^{100000} e, \tag{5.1}$$

which is the motivating example from Section 1.1 depicted in Figure 1-1, except with $k = 2^{100000}$. In this case, changing the precision at which $\pi$ is computed from 1 to 2 bits reduces the output error by approximately 1, whereas for $e$, the same change in precision reduces the output error by approximately $2^{99999}$.

5.1 Schedules

In this section, I define the baseline schedule and the critical path schedule that I will compare empirically in Section 5.2 on the computation in Equation 5.1.

5.1.1 Baseline schedule

iRRAM uses a precision schedule $S$ that uniformly refines variables and operators with

$$S_v^{(k)} = S_v^{(k-1)} + 50 \cdot 1.25^k,$$

where $S^{(0)} = 0$. The computation is computed at configurations starting with $S^{(1)}$. Intuitively, the refinement process increases the precision of every variable and operator in the program by 25% until the output error is within the error bound.

5.1.2 Critical path schedule

Now consider the alternate precision schedule $S'$ that sets variables and operators to different precisions depending on whether or not they are on the critical path $P^{(k)} = \{+, \times, e\}$ for every $k$. The instantiation of the critical path algorithm I use is

$$S'^{(k+1)}_v = \begin{cases} S'^{(k)}_v + 50 \cdot 1.33^k & \text{if } v \in P^{(k)} \\ S'^{(k)}_v + 50 \cdot 1.25^k & \text{otherwise} \end{cases}$$

where $S'^{(0)} = 0$. Notice that the precision refinements increase at a faster rate along the critical path than in the rest of the program.
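A sketch of this schedule as a generator (mine); the vertex names and the critical path $\{+, \times, e\}$ come from Equation 5.1, and truncating precisions to whole bits is an assumption the equation leaves implicit:

def critical_path_schedule(vertices, path):
    # S'(0) = 0; increments grow by 50 * 1.33**k on the critical path and
    # by 50 * 1.25**k elsewhere.
    config = {v: 0.0 for v in vertices}
    k = 0
    while True:
        config = {v: p + 50 * (1.33 ** k if v in path else 1.25 ** k)
                  for v, p in config.items()}
        k += 1
        yield {v: int(p) for v, p in config.items()}

# schedule = critical_path_schedule({'+', 'x', 'pi', '2**100000', 'e'},
#                                   path={'+', 'x', 'e'})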

5.2 Empirical comparison

In this section, I present the experimental results from the implementation of Equation 5.1, with the goal of comparing the baseline schedule and the critical path schedule. I begin by comparing two configurations arising from the two schedules at the final iteration of a computation satisfying the error bound of $10^{-12000}$. Then I show how this affects the schedules as a whole for the same error bound.

5.2.1 Improving a configuration

I compare the time it takes to run two precision configurations that constitute a mapping from variables and operations (in this case $\{+, \pi, \times, 2^{100000}, e\}$) to precisions for the example presented in Equation 5.1 and Figure 1-1. The error in the computation is the width of the output interval. Notice that in Table 5.1, the precisions along the critical path (which is $\{+, \times, e\}$ because

$\frac{\partial y}{\partial e}$ has the largest derivative) for $S'^{(24)}$ are higher than the precisions in $S^{(29)}$, while the variables not on the critical path have a lower precision. Furthermore, note from Table 5.2 that the configuration $S'^{(24)}$ has less output error than $S^{(29)}$. This means that using the critical path schedule produces a superior final configuration with a speed increase of roughly 37% and higher output accuracy.

Configuration      +         π         ×      2^100000      e
S^(29)          129046    129046    129046    129046    129046
S'^(24)         142047     42151    142047     42151    142047

Table 5.1: The table presents a comparison of the precisions generated on the 29th iteration of the baseline schedule, $S^{(29)}$, and the 24th configuration of the critical path schedule, $S'^{(24)}$.

Configuration        Error           Time (sec)
S^(29)          1.5 · 10^-8743         0.037
S'^(24)         3.1 · 10^-12657        0.027

Table 5.2: The table presents a comparison of the first configurations satisfying the error bound of $10^{-12000}$ for the baseline and critical path schedules.

5.2.2 Improving a schedule

The amount of total computation of a schedule can be understood in terms of the number of configurations computed and the time to compute each configuration. In the previous section, I demonstrate that computing using the critical path algorithm requires less time on the final configuration and fewer refinement steps. Since this is the case, it follows that the schedule as a whole will take less total time as well.

Let 푡(푆(푘)) denote the time it takes to run the schedule 푆 for 푘 iterations (i.e. to run all of the configurations 푆(1), 푆(2), . . . , 푆(푘)). Continuing this example, it is clear that the effect of using the critical path algorithm is even more pronounced at the schedule level. I find 푡(푆(29)) = 0.163s and 푡(푆′(24)) = 0.112s, while the error on the last iteration is as shown in Table 5.2. This means that this approach produces a 45% speed increase with higher output accuracy.

5.3 Implementation

I implement a push-based system of arbitrary-precision arithmetic that uses interval arithmetic and computes derivatives using reverse-mode automatic differentiation. The implementation is in Python and uses the bigfloat wrapper for MPFR for multiple-precision floating-point computations [14]. The implementation is available at https://github.com/psg-mit/fast_reals.

Implementation challenges I encountered three core challenges in the implementation:

1. Caching – computations are cached automatically by MPFR.

2. Inconsistent performance – at low precisions, performance characteristics are erratic.

3. Timing error – the running times of simple programs are within the range of timing error.

I solve challenge 1 by running each experiment in a separate thread – allowing caching, but only within each run and not among runs. I resolve challenges 2 and 3 by setting small enough tolerances that durations are large enough to be easily measurable.
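A sketch of the timing harness this describes (mine; run_schedule stands in for the actual driver in the released implementation): each measured run happens in its own thread so that caching stays local to that run:

import threading
import time

def timed_run(run_schedule, schedule, error_bound, results):
    # Time one complete schedule run with a monotonic clock.
    start = time.perf_counter()
    run_schedule(schedule, error_bound)
    results.append(time.perf_counter() - start)

def benchmark(run_schedule, schedule, error_bound):
    # A fresh thread per experiment keeps caching within a single run.
    results = []
    t = threading.Thread(target=timed_run,
                         args=(run_schedule, schedule, error_bound, results))
    t.start()
    t.join()
    return results[0]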

Chapter 6

Related Work

Related work aims to improve the performance of arbitrary-precision arithmetic with careful software implementations, by restructuring computations for efficiency, and by caching [23, 25, 30]. Due to performance concerns, interval arithmetic and arbitrary-precision arithmetic have not yet been widely adopted. However, interval and arbitrary-precision arithmetic have uses in robotics and other domains, and have been used in the proof of the Kepler conjecture and of the Lorenz attractor in 3D [17, 21, 34].

Implementations of arbitrary-precision arithmetic often rely on interval arithmetic, which researchers are working to accelerate by introducing further standardization – namely IEEE-1788 – and by creating specialized computer architectures [13, 24]. I take a mixed-precision approach, which would see significant improvements with hardware support. For example, field-programmable gate arrays (FPGAs) have been used for mixed-precision computation in other applications [16, 29].

I tackle the problem of allocating precisions to variables and operators in expressions in arbitrary-real computations. Although I do not know of other work addressing this particular problem, mixed-precision tuning shares similar concerns and trade-offs. I draw inspiration from prior work on this topic. In Chapter 4, I present my approach to automatic differentiation for interval arithmetic, where I include appropriate references as they arise. I now provide a background of various approaches to sensitivity analysis and to implementing arbitrary-precision arithmetic.

6.1 Mixed-precision tuning and sensitivity analysis

Mixed-precision tuning of floating-point computations involves assigning variables in a program different floating-point precisions (e.g. float32 versus float64). The goal is to minimize run-time, space, etc. while respecting a bound on the output error.

The numerical tuning approaches providing error bounds often use SMT solvers and restrict the inputs to interval ranges [6, 7, 9, 10, 11]. One common way to use sensitivity analysis techniques (which measure the effect that changing an input parameter has on the output) is to produce annotations that identify operations requiring high precision while satisfying an error bound [31, 33]. For example, Hwang et al. use automatic differentiation to produce a sensitivity analysis of air quality models [20].

6.2 Arbitrary-precision arithmetic

I present a new categorization of two high-level approaches to arbitrary-precision arithmetic. The pull-based approach propagates error bounds recursively from the output to each of the sub-expressions, requiring a single pass through the computation but often producing overly-precise results. I call this approach pull-based because error flows down from the root of the computation graph to the leaves. The push-based approach computes the corresponding error at precisions set arbitrarily and refines results at increasingly high precisions until the given error bound is satisfied. Each pass through the computation uses interval arithmetic, which computes error bounds by “pushing” bounds from the leaves of the computation graph up to the root. The push-based approach sometimes requires multiple passes through the computation, but it can potentially avoid producing unnecessarily-precise results. A survey of techniques for implementing arbitrary-precision arithmetic is presented in [15]. Because the analysis used to assign error thresholds to sub-expressions tends to be loose, there seems to be some consensus that the push-based approach is faster [15, 23, 25, 26, 30].
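
The push-based refinement loop can be summarized with a short sketch; here evaluate_intervals is a hypothetical function that evaluates the whole expression with interval arithmetic at a given precision and returns the output interval, and doubling the precision is only one possible refinement policy.

    def push_based_eval(evaluate_intervals, error_bound, start_precision=53):
        """Refine until the output interval is narrower than the error bound."""
        p = start_precision
        while True:
            lo, hi = evaluate_intervals(p)   # bounds are "pushed" from leaves to root
            if hi - lo <= error_bound:
                return lo, hi, p
            p *= 2                           # refine and re-evaluate the computation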

6.2.1 Pull-based approaches

A representative pull-based approach is to use binary Cauchy sequences and to evaluate sub-expressions at higher precisions, ensuring that only a single pass from the root to the leaves is required [27]. Concretely, a real number is represented by an infinite sequence of integers in some base and a denominator that increases exponentially by that base at every iteration.

In the binary case, for an integer sequence $\{z_k\}$, the corresponding real number is the limit of the sequence $z_k 2^{-k}$. There have been efforts to accelerate this approach by caching results even at different precisions [23]. Unfortunately, the pull-based approach often computes overly-precise results and, especially as expression size scales, has worse overall performance than push-based approaches (based upon the results of the CCA 2000 competition) [5].
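
As a concrete instance of this representation (my own illustration, not drawn from [27]), the snippet below realizes 1/3 as the integer sequence $z_k = \lfloor 2^k / 3 \rfloor$, whose scaled values $z_k 2^{-k}$ approach 1/3 with error below $2^{-k}$.

    from fractions import Fraction

    def one_third(k):
        """The k-th element z_k of a binary Cauchy sequence representing 1/3."""
        return (2 ** k) // 3

    for k in (4, 8, 16):
        approx = Fraction(one_third(k), 2 ** k)
        error = abs(approx - Fraction(1, 3))
        print(k, float(approx), float(error))   # error shrinks roughly as 2**(-k)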

6.2.2 Push-based approaches

iRRAM In iRRAM [30], intervals are iteratively, globally refined with a uniform precision for each node in the computation tree to yield a result with the desired precision. In terms of relevant optimizations, iRRAM supports by-hand labeling of specific parts of a computation as more sensitive, and thus computing them with higher precision than the rest of the program. Müller evaluates iRRAM by showing its performance in computing simple arithmetic expressions (e.g. $\sqrt{1/3}$, $\log(1/3)$), iterative functions (the logistic map $x_i = 3.75\,x_{i-1}(1 - x_{i-1})$), and inverting the Hilbert matrix. Since these computations have a computation graph with few or no branches, I would not expect significant speed increases using my proposed approach on this set of benchmarks.

RealLib Lambov [25] aims to make low-precision arbitrary-precision arithmetic comparable to floating point in terms of speed. Their core insight is that a pull-based approach is sometimes faster on small subtrees of the computation graph. They provide a programming model that allows users to embed pull-based sub-expressions within the overall push-based computation. At high precisions, they find that iRRAM generally outperforms RealLib, but in the low-precision regime on particular computations, their approach yields orders-of-magnitude faster results than iRRAM, even giving speeds comparable to floating point in some cases. They also accelerate their computation by caching results, so that if a sub-expression appears in multiple places, its value may be reused.

Chapter 7

Discussion and Future Work

In this chapter, I introduce some additional, preliminary benchmark results, reflect upon some ways future researchers may improve precision refinement, and lay out a new application of the techniques developed in this thesis to experimental research.

7.1 Benchmarks

Currently, there is not a comprehensive benchmark suite for arbitrary-precision numerical tasks with significant branching in the computation graph. The CCA benchmarks are mostly computations that have little branching, such as $\arctan(10^{50})$. The FPBench benchmark suite for scientific computations has more programs with branched computation graphs, but the inputs are defined over intervals without a clear arbitrary-precision translation [8]. Furthermore, the benchmarks contain few arbitrary-precision constants such as $e$, $\pi$, $h$, etc. A comprehensive collection of scientific computations that rely on high-precision inputs and constants would help in comparing future work aimed at speeding up arbitrary-precision arithmetic.

Methodology The benchmarks have variables that come from an interval range, constants specified as floats, and operators. I implement a uniform random sampler that selects a point from the interval range and provides an arbitrary-precision sample. Due to the heavy use of random sampling, it is relatively computationally expensive to increase precisions. The constants are left as-is, and their precision remains the same throughout the computation. The operations are replaced with their arbitrary-precision equivalents. For simplicity, I use a subset of the FPBench benchmarks that does not have loops or other language primitives beyond those my implementation supports [8].
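
The thesis does not spell out the sampler's construction here, so the following is only a plausible sketch of an on-the-fly uniform sampler over an interval: random bits are drawn lazily, so requesting more precision extends the same sampled point rather than drawing a new one. The class name and the exact-fraction representation are illustrative choices, not the implementation's.

    import random
    from fractions import Fraction

    class IntervalSampler:
        def __init__(self, lo, hi, seed=0):
            self.lo, self.hi = Fraction(lo), Fraction(hi)
            self.rng = random.Random(seed)
            self.bits = 0          # accumulated random bits
            self.nbits = 0

        def sample(self, p):
            """Return the sampled point to p bits of precision as an exact fraction."""
            while self.nbits < p:                     # extend the sample on the fly
                self.bits = (self.bits << 1) | self.rng.getrandbits(1)
                self.nbits += 1
            frac = Fraction(self.bits, 1 << self.nbits)
            return self.lo + frac * (self.hi - self.lo)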

Preliminary results Table 7.1 shows the speedup from using the critical path schedule (Section 5.1.2) instead of the baseline schedule (Section 5.1.1). The parameters for the critical path schedule are the same as those used in the experiments in Chapter 5. The results are comparable, if not slightly worse, using the critical path algorithm for all of the benchmarks except verhulst. Looking at the underlying computations, verhulst is the only benchmark that has a single clear choice of critical path, and it benefits with a 2.12x speedup. For the other computations, the critical path remains the same throughout the computation and thus that path is over-refined. In other words, the refinement results in a little extra computation with little benefit in output precision. This problem is compounded by the methodology, in which more digits of variables are sampled on the fly, which is computationally expensive. The benefits of the critical path schedule on general-purpose computation are limited by the lack of per-variable and per-operation cost modeling and by the simplicity of the algorithm. I discuss ways to broaden the applicability of this technique and to extend it in Section 7.2.

7.2 Further improving precision refinement

The critical path algorithm is a new approach to precision refinement that inspires a number of questions and opens up many directions for future research.

7.2.1 Per-primitive cost modeling

The sensitivity analysis that I focus on in this thesis does not take into account the computational difficulty of refining different variables and operations. For example, generating the 10th digit of precision for $\pi$ will generally require more compute than generating it for a sum. Operators will require different amounts of compute that will scale differently with $p$. These differences can be accounted for by incorporating them into a cost model like the one presented in Section 3.2.3. Defining the cost model could be done theoretically (by hand-coding the asymptotic behavior of each variable and operator) or empirically (by collecting data for each of the variables and operators and modeling the observed behavior).

Benchmark      # Ops   Speedup
carbon gas        15      0.97
doppler1          11      1.0
doppler2          11      0.99
doppler3          11      0.97
jetEngine         28      0.92
predPrey           7      0.93
rigidbody1        11      0.96
rigidbody2        13      0.93
sine              11      0.91
sineOrder3         6      0.96
sqroot            12      0.97
turbine1          16      0.92
turbine2          13      0.96
turbine3          16      0.96
verhulst           5      2.12

Table 7.1: The table presents the FPBench benchmark results. “# Ops” is the number of variables and operations in the computation. The “Speedup” is the ratio of the time taken by the baseline schedule (Section 5.1.1) to the time taken by the critical path schedule (Section 5.1.2), each respecting an error bound of $10^{-12000}$.
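
As a hedged sketch of the per-primitive cost model discussed in Section 7.2.1 above, one could hand-code an assumed asymptotic cost for each primitive as a function of the precision p and use it to weight sensitivities; the particular growth rates below are placeholders chosen for illustration, not measured costs.

    import math

    # Assumed asymptotic refinement costs as a function of the precision p.
    OP_COST = {
        "add": lambda p: p,                    # addition assumed linear in precision
        "mul": lambda p: p * math.log2(p),     # multiplication assumed quasi-linear
        "pi":  lambda p: p * math.log2(p)**2,  # constants such as pi assumed costlier
    }

    def weighted_sensitivity(op, sensitivity, p):
        """Benefit-per-cost score: sensitivity divided by the cost of refining `op` at p."""
        return sensitivity / OP_COST[op](p)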

7.2.2 Unexplored trade-offs in precision refinement

I think there is an opportunity to explore schedules that grow at asymptotically different rates. For example, in applications where computing extra digits of sub-computations requires significant resources, a schedule that grows linearly rather than exponentially is likely to perform better (more iterations until reaching the error bound, but less overshoot). On the other hand, in applications where computing extra digits of sub-computations requires little additional compute, a schedule that grows super-exponentially rather than exponentially is likely to perform better (fewer iterations until satisfying the error bound). An important characteristic is that these refinement rates are application-dependent, which makes the development of more comprehensive benchmarks all the more important (as I argue in Section 7.1).
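
To make the trade-off concrete, here is a small sketch (my own illustration) of precision schedules that grow at different asymptotic rates; the functional forms are assumptions chosen only to contrast linear, exponential, and super-exponential growth.

    def linear(p0, step):
        """Precision grows by a fixed number of bits per iteration."""
        return lambda k: p0 + step * k

    def exponential(p0, base=2):
        """Precision doubles (or scales by `base`) each iteration."""
        return lambda k: p0 * base**k

    def super_exponential(p0):
        """Precision is squared each iteration."""
        return lambda k: p0 ** (2**k)

    # e.g. precisions at iterations 0..3 for p0 = 16:
    # linear(16, 16): 16, 32, 48, 64;  exponential(16): 16, 32, 64, 128;
    # super_exponential(16): 16, 256, 65536, ...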

7.2.3 Generalizing the critical path algorithm

The critical path algorithm refines uniformly across the computation except along the critical path. This algorithm is relatively easy to implement and analyze compared to alternative algorithms that may empirically perform better. For example, consider an algorithm that uses the sensitivities to refine at a different rate along each path in the computation graph (from the root to a leaf), based on the sensitivity of each of the leaves. Understanding the degree to which to refine each of these paths is an open problem that could lead to significant speed improvements, since the effect of the critical path algorithm would be compounded along all paths simultaneously.
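
One possible instantiation of this generalization, purely as a sketch and not an algorithm evaluated in this thesis, is to scale each leaf's precision increment by its normalized sensitivity, so that every root-to-leaf path is refined at its own rate.

    def per_path_increments(sensitivities, base_increment):
        """`sensitivities` maps each leaf to a nonnegative sensitivity value;
        returns the number of extra bits of precision to give each leaf."""
        total = sum(sensitivities.values()) or 1.0
        n = len(sensitivities)
        return {leaf: max(1, round(base_increment * s / total * n))
                for leaf, s in sensitivities.items()}

    # Example: leaves with sensitivities {x: 8.0, y: 1.0, z: 1.0} and a base
    # increment of 10 bits would get roughly {x: 24, y: 3, z: 3} extra bits.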

7.3 New applications to experimental research

Experimental research may provide an excellent future direction for using sensitivity analysis in interval code. An experiment may yield a set of variables with measurement error that is naturally represented with intervals [28]. Conclusive experimental results require certainty, so minimizing the error in the output is important; the output error can be computed efficiently with interval arithmetic and differentiated using automatic differentiation over these intervals. The scientist may wonder, “what parameters should be measured with higher precision in order to produce the greatest increase in the accuracy of the results?” The sensitivity defined in Equation 3.3 is one answer to this question. Scientists may also have a metric of how much effort it takes to measure different parameters, which can be incorporated into the sensitivity analysis using a cost model as presented in Section 3.2.3.

Chapter 8

Conclusions

This thesis explores an opportunity to improve precision refinements in implementations of arbitrary-precision arithmetic. I introduce the critical path algorithm as a way to guide precision refinements using a sensitivity analysis. This new sensitivity analysis uses novel, efficiently computed derivatives of interval code. I describe some of the challenges of implementing reverse-mode automatic differentiation through intervals and provide an analysis of the properties of these derivatives. I provide a system that implements the critical path algorithm for arbitrary-precision arithmetic programs and demonstrate that the algorithm can speed up computation. There are many opportunities for applying automatic differentiation through interval code and for improving precision-refinement algorithms that I hope future research will explore.

Bibliography

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Józefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, 2016.

[2] Matthias Althoff and Dmitry Grebenyuk. Implementation of interval arithmetic in CORA 2016. In ARCH@CPSWeek, 2016.

[3] David Bailey, Peter Borwein, and Simon Plouffe. On the Rapid Computation of Various Polylogarithmic Constants. 1997.

[4] Atilim Günes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: A survey. Journal of Machine Learning Research, 2017.

[5] Jens Blanck. Exact real arithmetic systems: Results of competition. In Computability and Complexity in Analysis, 2001.

[6] Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrishnan, and Zvonimir Rakamarić. Rigorous floating-point mixed-precision tuning. SIGPLAN Not., 2017.

[7] Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrishnan, and Zvonimir Rakamarić. Rigorous floating-point mixed-precision tuning. ACM SIGPLAN Notices, 2017.

[8] Nasrine Damouche, Matthieu Martel, Pavel Panchekha, Jason Qiu, Alex Sanchez-Stern, and Zachary Tatlock. Toward a standard benchmark format and suite for floating-point analysis. 2016.

[9] Eva Darulova, Anastasiia Izycheva, Fariha Nasir, Fabian Ritter, Heiko Becker, and Robert Bastian. Daisy - framework for analysis and optimization of numerical programs (tool paper). In TACAS, 2018.

[10] Eva Darulova and Viktor Kuncak. Sound compilation of reals. Principles of Programming Languages, 2014.

[11] Eva Darulova and Viktor Kuncak. Towards a compiler for reals. ACM Trans. Program. Lang. Syst., 2017.

[12] A. Di Franco, H. Guo, and C. Rubio-González. A comprehensive study of real-world numerical bug characteristics. In International Conference on Automated Software Engineering, 2017.

[13] W. Edmonson and G. Melquiond. IEEE interval standard working group - P1788: Current status. In Symposium on Computer Arithmetic, 2009.

[14] Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, and Paul Zimmermann. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw., 2007.

[15] Paul Gowland and David Lester. A survey of exact arithmetic implementations. In Computability and Complexity in Analysis, 2001.

[16] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Deep learning with limited numerical precision. In International Conference on Machine Learning, 2015.

[17] Thomas Hales. A proof of the Kepler conjecture. Annals of Mathematics, 2005.

[18] T. Hickey, Q. Ju, and M. H. Van Emden. Interval arithmetic: From principles to implementation. J. ACM, 2001.

[19] Philipp H. Hoffmann. A hitchhiker’s guide to automatic differentiation. Numerical Algorithms, 2016.

[20] Dongming Hwang, Daewon W. Byun, and M. Talat Odman. An automatic differentiation technique for sensitivity analysis of numerical advection schemes in air quality models. Atmospheric Environment, 1997.

[21] Luc Jaulin and Benoît Desrochers. Introduction to the algebra of separators with application to path planning. Engineering Applications of Artificial Intelligence, 2014.

[22] E. Kaucher. Interval Analysis in the Extended Interval Space IR. 1980.

[23] Hideyuki Kawabata. Speeding up exact real arithmetic on fast binary Cauchy sequences by using memoization based on quantized precision. In Journal of Information Processing, 2017.

[24] Reinhard Kirchner and Ulrich W. Kulisch. Hardware support for interval arithmetic. Reliable Computing, 2006.

[25] Branimir Lambov. RealLib: An efficient implementation of exact real arithmetic. In Mathematical Structures in Computer Science, 2007.

[26] Yong Li and Jun-Hai Yong. Efficient exact arithmetic over constructive reals. In The 4th Annual Conference on Theory and Applications of Models of Computation, 2007.

[27] Valérie Ménissier-Morain. Arbitrary precision real arithmetic: design and algorithms. The Journal of Logic and Algebraic Programming, 2005.

[28] Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud. First Applications of Interval Arithmetic, chapter 3, pages 19–29. In Introduction to Interval Analysis. SIAM, 2009.

[29] Duncan J. M. Moss, Srivatsan Krishnan, Eriko Nurvitadhi, Piotr Ratuszniak, Chris Johnson, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, and Philip H. W. Leong. A customizable matrix multiplication framework for the Intel HARPv2 Xeon+FPGA platform: A deep learning case study. In International Symposium on Field-Programmable Gate Arrays, 2018.

[30] Norbert Th. Müller. The iRRAM: Exact arithmetic in C++. In Computability and Complexity in Analysis, 2000.

[31] B. Nongpoh, R. Ray, S. Dutta, and A. Banerjee. AutoSense: A framework for automated sensitivity analysis of program data. IEEE Transactions on Software Engineering, 2017.

[32] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.

[33] Pooja Roy, Rajarshi Ray, Chundong Wang, and Weng Fai Wong. ASAC: Automatic sensitivity analysis for approximate computing. Conference on Languages, Compilers, and Tools for Embedded Systems, 2014.

[34] Warwick Tucker. A rigorous ODE solver and Smale’s 14th problem. Foundations of Computational Mathematics, 2002.
