High-Order Automatic Differentiation of Unmodified

Linear Algebra Routines via Nilpotent Matrices

by

Benjamin Z Dunham

B.A., Carroll College, 2009

M.S., University of Colorado, 2011

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Department of Aerospace Engineering Sciences

2017

This thesis entitled:
High-Order Automatic Differentiation of Unmodified Linear Algebra Routines via Nilpotent Matrices
written by Benjamin Z Dunham
has been approved for the Department of Aerospace Engineering Sciences

Prof. Kurt K. Maute

Prof. Alireza Doostan

Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above-mentioned discipline.

Dunham, Benjamin Z (Ph.D., Aerospace Engineering Sciences)

High-Order Automatic Differentiation of Unmodified Linear Algebra Routines via Nilpotent Matrices

Thesis directed by Prof. Kurt K. Maute

This work presents a new automatic differentiation method, Nilpotent Matrix Differentiation (NMD), capable of propagating any order of mixed or univariate derivative through common linear algebra functions – most notably third-party sparse solvers and decomposition routines, in addition to basic matrix arithmetic operations and power series – without changing data-type or modifying code line by line; this allows differentiation across sequences of arbitrarily many such functions with minimal implementation effort. NMD works by enlarging the matrices and vectors passed to the routines, replacing each original scalar with a matrix block augmented by derivative data; these blocks are constructed with special sparsity structures, termed "stencils," each designed to be isomorphic to a particular multidimensional hypercomplex algebra. The algebras are in turn designed such that Taylor expansions of hypercomplex function evaluations are finite in length and thus exactly track derivatives without approximation error.

Although this use of the method in the "forward mode" is unique in its own right, it is also possible to apply it to existing implementations of the (first-order) discrete adjoint method to find high-order derivatives with lowered cost complexity; for example, for a problem with N inputs and an adjoint solver whose cost is independent of N – i.e., O(1) – the N × N Hessian can be found in O(N) time, which is comparable to existing second-order adjoint methods that require far more problem-specific implementation effort. Higher derivatives are likewise less expensive – e.g., an N × N × N third-order tensor can be found in O(N²) time. Alternatively, a Hessian-vector product can be found in O(1) time, which may open up many matrix-based simulations to a range of existing optimization or surrogate modeling approaches. As a final corollary in parallel to the NMD-adjoint hybrid method, the existing complex-step differentiation (CD) technique is also shown to be capable of finding the Hessian-vector product. All variants are implemented on a stochastic diffusion problem and compared in depth with various cost and accuracy metrics.

Contents

Abstract iii

Table of Contents iv

List of Figures x

I Motivation and Background 1

1 Overview 2

1.1 Contributions ...... 2

1.2 A Motivating Example ...... 3

1.3 Differentiation Approaches ...... 5

1.4 An Example of Differentiation ...... 7

1.5 Dissertation Goals and Organization ...... 11

2 Motivation: Numerical Uses of High-Order Derivatives 14

2.1 Optimization ...... 14

2.1.1 Newton’s Method ...... 15

2.1.2 Linear and Nonlinear Conjugate Gradient Methods ...... 16

2.1.3 Truncated-Newton and Hessian-Free Optimization ...... 18

2.2 Uncertainty Quantification and Sensitivity Analysis ...... 20

2.3 Surrogate Modeling ...... 21

2.4 A Selection of Higher-Order Applications ...... 23

3 Standard Differentiation Approaches 25

3.1 Leading Non-Automatic Tools ...... 26

3.1.1 Finite Differences: Easy, Approximate, Forward-Mode Differentiation ...... 26

3.1.2 The Discrete Adjoint Method: Intrusive, Exact, Reverse-Mode Differentiation ...... 29

3.1.3 Second-Order Adjoints ...... 34

3.1.4 Directional Derivatives and Implicit Products ...... 39

3.2 Standard Automatic Differentiation ...... 40

3.2.1 Forward-Mode AD ...... 42

3.2.2 Reverse-Mode AD ...... 44

3.2.3 Automatic Source Modification vs Operator Overloading ...... 45

3.3 Multivariate and High-Order AD ...... 47

3.3.1 Vectorization ...... 47

3.3.2 Naive Induction ...... 49

3.4 Matrix Operations and AD ...... 50

4 Special Algebras for Differentiation 54

4.1 Complex-Step Differentiation (CD) ...... 54

4.1.1 History and Derivation ...... 54

4.1.2 Practical Usage ...... 58

4.2 Complex Variants ...... 60

4.2.1 Multicomplex Variables ...... 61

4.2.2 Quasi-Complex Gradients ...... 63

4.2.3 Shortcomings of Imaginary Elements ...... 65

4.3 Dual Numbers ...... 68

4.4 Hyper-Dual Numbers ...... 71

4.5 Truncated Taylor Polynomials ...... 74

II Hypercomplex Algebras and Nilpotent Matrix Differentiation 78

5 Generalized Differentiation via Nilpotent Elements 79

5.1 Hypercomplex Numbers ...... 79

5.2 Tensor Products and Indexing Notation ...... 82

5.3 Generalized Hypercomplex Algebras for AD ...... 89

6 Differentiation via Matrix Representations of Algebras 92

6.1 Introduction: 2 × 2 Matrix Representations ...... 93

6.1.1 Complex and Dual Numbers Expressed as Matrices ...... 93

6.1.2 A Single-Variable Example – Derivative of a Matrix Inverse ...... 94

6.2 Addressing Limitations ...... 98

6.3 Representations of Other Existing Algebras ...... 99

6.3.1 Multicomplex Numbers ...... 99

6.3.2 Multidual Numbers ...... 101

6.3.3 High-Order Univariate ...... 103

6.4 Representing Arbitrary Hypercomplex Algebras with Stencils ...... 107

7 Hybrid Methods and Directional Derivatives 114

7.1 High-Order Forward Directional Derivatives ...... 114

7.2 A Nilpotent-Complex Hybrid ...... 117

7.3 Nilpotent and Complex Adjoint Methods ...... 121

III Demonstrating Nilpotent Matrix Differentiation 126

8 Differentiating a Stochastic Diffusion Model 127

8.1 Detailed Problem Description ...... 128

8.1.1 A Deterministic Known Solution: The Heat Equation ...... 130

8.1.2 Galerkin Projection and Karhunen-Loève Expansion ...... 131

8.1.3 Sampling the Diffusivity ...... 136

8.1.4 PDE Discretization, Solution, and Quantity of Interest ...... 137

8.1.5 Undifferentiated Program Results ...... 144

8.1.6 Adjoint Formulation ...... 147

8.2 Experiments in Nilpotent and Complex Differentiation ...... 152

8.3 Results: Differentiation Accuracy ...... 158

8.4 Results: Differentiation Costs ...... 162

9 Conclusions 165

Bibliography 167

Appendix

A Time and Memory Benchmarks 174

A.1 Costs vs Number of Eigenfunctions ...... 175

A.1.1 Original Forward Problem ...... 175

A.1.2 Forward Complex-Step Differentiation ...... 176

A.1.3 Forward Dual-Number Stencil Nilpotent Matrix Differentiation ...... 176

A.1.4 Forward Two-Variable, Second-Order Stencil Nilpotent Matrix Differentiation ...... 177

A.1.5 Original Adjoint Problem ...... 178

A.1.6 Adjoint Complex-Step Differentiation ...... 179

A.1.7 Adjoint Dual-Number Stencil Nilpotent Matrix Differentiation ...... 179

A.1.8 Adjoint Two-Variable, First-Order Stencil Nilpotent Matrix Differentiation ...... 180

A.2 Costs vs Size of System (Number of Mesh Nodes) ...... 181

A.2.1 Original Forward Problem ...... 181

A.2.2 Forward Complex-Step Differentiation ...... 182

A.2.3 Forward Dual-Number Stencil Nilpotent Matrix Differentiation ...... 183

A.2.4 Forward Two-Variable, Second-Order Stencil Nilpotent Matrix Differentiation ...... 184

A.2.5 Original Adjoint Problem ...... 185

A.2.6 Adjoint Complex-Step Differentiation ...... 186

A.2.7 Adjoint Dual-Number Stencil Nilpotent Matrix Differentiation ...... 187

A.2.8 Adjoint Two-Variable, First-Order Stencil Nilpotent Matrix Differentiation ...... 188

Figures


1.1 Simplified Schematic of an Unmodified User Program ...... 4

1.2 Simplified Schematic of an AD-Augmented Program ...... 12

3.1 Finite Differencing Error vs Step-Size ...... 28

3.2 Vectorization Savings vs Fixed Overhead ...... 48

4.1 Finite Differencing in the Complex Plane ...... 57

4.2 Complex-Step Error in the First Derivative ...... 59

8.1 Examples of Linear and Bi-Linear Finite Elements ...... 133

8.2 Selected Eigenfunctions of the Gaussian Covariance ...... 137

8.3 Sample of the Diffusion k-field ...... 138

8.4 Simplified Schematic of the Scalar Diffusion Problem ...... 141

8.5 Simplified Schematic of the Matrix Diffusion Problem ...... 144

8.6 Example Diffusion Equilibrium Solution ...... 145

8.7 Example Diffusion Equilibrium Solution vs k ...... 146

8.8 Simplified Schematic of the Diffusion Adjoint Problem ...... 151

8.9 Simplified Schematic of the AD/NMD-Scalar Diffusion Problem ...... 152

8.10 Simplified Schematic of the NMD-Matrix Diffusion Problem ...... 154

8.11 Simplified Schematic of the NMD-Adjoint Diffusion Problem ...... 156

8.12 Error of Methods vs Known Deterministic Values ...... 159

8.13 Error in Stochastic Forward Methods ...... 161

8.14 Error in Stochastic Hessian-Vector Calculation ...... 162

A.1 Costs of the Original Program vs Ny ...... 175

A.2 Costs of Forward CD vs Ny ...... 176

A.3 Costs of Forward NMD vs Ny for a Single Derivative ...... 176

A.4 Costs of Forward NMD vs Ny for a 2 × 2 Hessian ...... 177

A.5 Costs of the Standard Adjoint Method vs Ny ...... 178

A.6 Costs of Adjoint CD vs Ny ...... 179

A.7 Costs of Adjoint NMD vs Ny for a Single Derivative ...... 179

A.8 Costs of Adjoint NMD vs Ny for Two Derivatives of the Gradient ...... 180

A.9 Costs of the Original Program vs Nu ...... 181

A.10 Costs of Forward CD vs Nu ...... 182

A.11 Costs of Forward NMD vs Nu for a Single Derivative ...... 183

A.12 Costs of Forward NMD vs Nu for a 2 × 2 Hessian ...... 184

A.13 Costs of the Standard Adjoint Method vs Nu ...... 185

A.14 Costs of Adjoint CD vs Nu ...... 186

A.15 Costs of Adjoint NMD vs Nu for a Single Derivative ...... 187

A.16 Costs of Adjoint NMD vs Nu for Two Derivatives of the Gradient ...... 188

Part I

Motivation and Background

Chapter 1

Overview

In broad terms, this dissertation concerns itself with the automatic differentiation of medium- to large-scale engineering codes – that is, given an existing program, the goal herein is to compute derivatives of simulation output(s) with respect to system parameters, and to do it with an absolute minimum of work on the part of the code developer.

Typical expected usage cases include those where the cost of human labor required to implement the differentiation routines is likely to swamp the actual run cost; those where different program runs will need different orders or different numbers of derivatives, preferably without major code re-writes for each case; and those where the accuracy of the common finite difference method is simply too limited.

More specifically, this work focuses on arbitrary-order derivatives propagated through low-level matrix algorithms implemented in third-party packages, with the most attention being paid to matrix arithmetic and linear algebra solvers. Matrix routines deserve special consideration for the simple fact that, despite their wide use, they are not amenable to most existing automatic differentiation (AD) techniques. This is because their back-end executables and source code can rarely be modified directly; this puts them in conflict with almost all standard AD packages (e.g., [1–7]), which require recompilation after either a change in data-type or line-by-line code modification.

1.1 Contributions

To address such difficulties, this work presents a new method, termed "Nilpotent Matrix Differentiation" (NMD), to semi-automatically find precise, any-order derivatives of unaltered third-party linear algebra routines like those included in BLAS/LAPACK or MATLAB; "any-order," here, means the ability to find simple first derivatives, mixed second derivatives (i.e., entries of the Hessian), or even third, fourth, or Nth mixed derivatives1. Hybrid modifications are also introduced for existing first-order differentiation approaches – namely, the popular discrete adjoint method and the complex-step differentiation (CD) technique – which quickly extend them to higher orders.

The key insight to the new method is that very specific matrix sparsity structures, designed to be isomorphic to certain non-real scalar algebras, can both store derivative data and compute the chain rule with most matrix operations.

Before entering any sequence of linear algebra routines, input matrices and vectors are mapped to new, larger matrices that introduce derivative data in the proper structures; these will then self-propagate through successive matrix operations with little or no further code modification on the part of the user2. After all such operations are complete, derivative information can easily be extracted from the resulting matrices or solution vectors. Crucially, the new information provided by NMD will be just as precise and accurate3 as the output of the original program. Compare this to the typical first-approximation tool of choice, the finite difference (FD) method, which might find only a few significant figures or be entirely incorrect for higher orders.

Most real-world usage examples in this document are only summarized, and the derivation of NMD itself is largely abstract. To give the reader a conceptual framework to work with, the mechanics of the approach will first be introduced with a stylized example of how it might be used. The underlying theory will be saved for Part II.

1.2 A Motivating Example

Consider a case where a user wishes to quickly interface his or her code with a black-box optimization suite

(e.g., [8–10]), and to this end wishes to use automatic differentiation to provide the suite with second derivatives of the code. The execution order of a generic, undifferentiated user program is shown in Figure 1.1(a). Such a program loosely proceeds in the following way: first, user-written code performs any initialization computation, then discretizes

1 Various uses of second derivatives will be discussed throughout Chapter 2, and the use of third or higher derivatives will be reviewed in Section 2.4.

2 In this context, “user” refers to the coder who must call linear algebra routines, not necessarily a user of the final program. The phrase “code developer” could be used interchangeably.

3 Throughout this work, results from NMD or other methods will at times be referred to as "exact". The word is reserved for situations where a computed value is the derivative (at least as it would otherwise be given by a literal formula), rather than an approximation thereof; however, it should formally be taken to mean "exact to within the limitations of computer architecture (machine epsilon) or of the original matrix algorithm being modified".

[Figure 1.1 spans two panels: (a) the generic pipeline Input → Processing/Discretization (user code, scalar math) → Linear Algebra (3rd-party binary, matrix math) → Quantity of Interest (user code, scalar math); (b) the specific data flow A := Afunc(x), B := Bfunc(x), C := Cfunc(x), b := bfunc(x), M⁽¹⁾ := A + B, D := 1/h²(C M⁽¹⁾ − 2C), u := D⁻¹ b(x), f := ffunc(u).]

Figure 1.1: Data-flow of (a) a generic program and (b) the example specified by Equation 1.1 presented with an intermediate calculation of the matrix M⁽¹⁾ (which makes the construction of D compatible with basic BLAS operations). User source code (light gray) computes scalar (non-matrix) operations; it is accessible to the code developer and can be modified. The linear algebra routines (dark gray), in contrast, either come in the form of precompiled binaries or else are simply too large to modify.

a partial differential equation to build a matrix representation of the problem. Second, the program calls a series of third-party linear algebra routines (e.g., in BLAS/LAPACK, MATLAB, or any of the many other numerical computing packages); these are assumed to be inaccessible to traditional AD approaches. (This is generally the case unless the user designed the program from the very start around a specialized matrix library meant for AD.) Third and finally, the final matrix or vector result is examined by more user-written code, which performs any post-processing and outputs a quantity of interest.

For the sake of argument, assume this specific user program solves only the following4 simple problem:

4 Chapter 8 will feature a much more elaborate program which includes some very similar operations.

D = (1/h²) [C(x)(A(x) + B(x)) − 2C(x)]   (1.1)

u = D⁻¹ b(x)   (1.2)

where h is a constant (undifferentiated) real scalar; x is the set of input parameters to be optimized; the matrices A, B, and C and the vector b are user-coded functions of x; and the final program output f(x) = f(u(x)) is a user-coded function of u. Note that u can be considered a vector of internal state variables, e.g., for an adjoint formulation as discussed in Section 3.1.2. The data flow of this specific example is visualized in Figure 1.1(b).

Assume the black-box optimization suite implements one of the many second-order optimization methods (some of which will be further discussed in Section 2.1) that utilize both the function gradient g(x) ≡ ∇f(x) and second-derivative information contained in the Hessian matrix H(x) ≡ ∇²f(x). In particular, assume it requires the nonlinear conjugate gradient (NCG) update coefficients that will be presented in Equations 2.11 and 2.13 (pages 17 and 18), reprinted here for convenience:

α = − gᵀd / (dᵀHd)        β = (gᵀHd − gᵀd) / (dᵀHd)   (1.3)

where d is the search direction and α and β are used to move along d and update it, respectively.

1.3 Differentiation Approaches

As with all NCG variants, g must be provided by the user; presumably, this is done with a manual implementation of the common adjoint method (Section 3.1.2), which in many cases can compute the entire N-input gradient with a cost complexity of O(1) – that is, independent of N and proportional only to the cost of the original program. Second derivatives, however, are more difficult to compute. Second-order adjoint methods (Section 3.1.3) with complexity O(N) exist to build the Hessian; these are, however, application-specific and far more time-consuming for most users to implement. Meanwhile, standard forward-mode (non-adjoint) automatic methods (matrix-compatible or not), while less difficult to apply, find H in O(N²) time; for systems with large N, this can be impractically expensive, even when compared to human labor.

However, it is important to realize that the full Hessian is not always required – instead, algorithms often (e.g., [11–17]) only call for the product of the matrix with one or two vectors. This becomes more than a semantic difference if such a product can easily be found directly, without ever building5 the full H. (Admittedly, this glosses over several difficulties – many methods will require a pre-conditioner, or else an optimization algorithm that includes a detection and solution strategy for negative curvature. [18]) To this end, Section 3.1.4 will discuss an existing manual approach [19] for scalar codes – originally developed for neural network backpropagation algorithms, and meant to be applied line-by-line – which can find a Hessian-vector product Hv with the same cost complexity as the gradient g. In Section 7.3, this dissertation will demonstrate that a combination of the discrete adjoint method with a single-derivative instance of NMD can find the same product, for very little human effort.6 The same will be shown for the combination of CD with the adjoint method, which has never been demonstrated before and hence constitutes another important corollary contribution. However, implementing these combinations is in fact so straightforward that they would not constitute an adequate demonstration for this chapter. Consider instead the equally-overlooked (if not more so) vector-Hessian-vector product – i.e., for the problem given in Equation 1.3, gᵀHd and dᵀHd. These can be found by computing the directional derivatives7 of f(x + r₁d + r₂g), evaluated at r₁ = r₂ = 0, with respect to r₁ and r₂, without any modification of the adjoint method. (Note that products of two derivatives cannot be found directly with directional evaluations; here, g is pre-computed.)
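To make the directional-derivative idea concrete, the short MATLAB sketch below (an assumed quadratic test problem, not part of the thesis code) verifies that the second directional derivatives of φ(r₁, r₂) = f(x + r₁d + r₂g) recover dᵀHd and gᵀHd. Central differences are used here purely as an independent check; in the approach described above, NMD or a CD-adjoint hybrid would supply the same values exactly.

% Assumed quadratic test problem: f(x) = 0.5*x'*A*x + c'*x, so H = A everywhere
A = [3 1 0; 1 4 1; 0 1 5];  c = [1; -2; 0.5];
f = @(x) 0.5*x'*A*x + c'*x;
x = [0.2; -1.0; 0.7];
g = A*x + c;                          % gradient at x
d = [1; 0.5; -2];                     % arbitrary search direction

phi = @(r1,r2) f(x + r1*d + r2*g);    % bivariate restriction of f
h = 1e-4;
dHd = (phi(h,0) - 2*phi(0,0) + phi(-h,0)) / h^2;                  % d2(phi)/dr1^2
gHd = (phi(h,h) - phi(h,-h) - phi(-h,h) + phi(-h,-h)) / (4*h^2);  % d2(phi)/dr1dr2
[dHd, d'*A*d; gHd, g'*A*d]            % each row agrees (up to roundoff)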

As the focus of this dissertation is limited to derivatives across the matrix routines (the dark gray steps in Figure 1.1), it is assumed a standard AD technique is applied to the user code at each of the other steps, with the real NMD work happening at the data hand-offs.

5 This can be intuitively understood by drawing a parallel with linear solvers, which find x = A⁻¹b but never build the full matrix inverse A⁻¹.

6 Further, should the full H truly be needed, this hybrid approach can easily build it column by column, with the same O(N) complexity as the much more difficult to implement second-order adjoint methods.

7 Note that such non-adjoint directional values are not unique to NMD and could be found with AD for purely user-written, non-matrix code (although this fact, too, is often overlooked in engineering literature, e.g., [11,12]). Instead, NMD's role is to carry the derivative information through linear algebra operations. Directional differentiation is chosen as a demonstration problem to emphasize that even one or two second derivatives can be of great use, and thus motivate methods like NMD that can quickly make such quantities available for any matrix problem. However, note that the most common discrete adjoint method is itself a matrix problem – typically, without NMD, Hd can only be found if a scalar-only method is available to find the gradient.

Although any number of derivatives are possible with these tools, for this theoretical example it is taken as a given that the user code is altered such that every scalar variable (i.e., the input variables and any temporary non-matrix variables created within Afunc, Bfunc, Cfunc, bfunc or ffunc) is represented with an operator-overloaded data type capable of tracking two first derivatives (that is, ∂/∂r₁ and ∂/∂r₂), one univariate second derivative (∂²/∂r₁²) and one mixed second derivative (∂²/∂r₁∂r₂). Any AD type that supports sparse Hessian calculation (in this case, treated as a two-variable matrix regardless of the actual N) would be sufficient for this task; if nothing else, one could even nest a first-derivative type within itself to find derivatives of derivatives.

Operator-overloaded types, however, cannot be used inside most matrix math routines.

For such a simple example with a small number of matrix operations and only a few derivatives, it is of course possible to apply manual derivative formulas (e.g., [20]) every step of the way; in the case of the second derivatives, these could be applied twice. However, for larger codes it would be preferable to automate the process as much as possible, ideally without altering each intermediate operation. Assuming the code is real-valued and complex variants of the matrix routines are available, two runs of the complex-step method could find the first derivatives without intermediate additions to the code, as the sketch below illustrates; CD, however, cannot find second derivatives (at least not without an adjoint hybrid, which is saved for Section 7.3), and this is where the NMD method comes into play.
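A minimal complex-step sketch follows (the functions Afun, bfun, and ffun are made-up stand-ins for the Afunc/bfunc/ffunc of Figure 1.1, and a single input x is assumed for simplicity); it shows one first derivative passing through an unmodified backslash solve, which works because MATLAB's built-in matrix routines accept complex arguments. The caveat, discussed later in Section 4.2, is that every operation in the chain must remain analytic (no absolute values or conjugate transposes).

h = 1e-20;                            % step far below sqrt(eps); no cancellation error
x = 2.0;                              % single real input, for illustration only
Afun = @(x) [x^2, 1; 1, 3*x];         % assumed stand-in for the user's matrix function
bfun = @(x) [sin(x); x];              % assumed stand-in for bfunc
ffun = @(u) u(1)^2 + u(2);            % assumed stand-in for ffunc

xc = x + 1i*h;                        % perturb the input along the imaginary axis
u  = Afun(xc) \ bfun(xc);             % unmodified third-party solve (here, backslash)
df = imag(ffun(u)) / h;               % df/dx, accurate to machine precision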

1.4 An Example of Nilpotent Matrix Differentiation

Both NMD and many forms of AD analytically (but not computationally) represent operator-overloaded calculations by adding non-real variables to the real-valued program input x – for example, to find derivatives with respect to xⱼ and xₖ, one could compute f(x + Dê), where the vector ê represents a set of N non-real basis elements and Dⱼⱼ = Dₖₖ = 1 but the rest of the matrix D is zero. In the case of this specific example, though, the AD-augmented run of the user program is treated analytically as f(xᵢ + dᵢε₁ + gᵢε₂), where the basis elements ε₁ and ε₂ are commutative under multiplication and have the following nilpotent properties:

ε₁³ = 0   yet   ε₁ ≠ 0 and ε₁² ≠ 0   (1.4a)

ε₂² = 0   yet   ε₂ ≠ 0   (1.4b)

ε₁²ε₂ = 0   yet   ε₁ε₂ ≠ 0   (1.4c)

Given this algebra, any (arbitrary) scalar z from the original program can be replaced with:

ẑ = z + z₁ε₁ + z₂ε₂ + z₁₁ε₁² + z₁₂ε₁ε₂   (1.5)

As will be explained in Chapter 5, when the program is properly initialized, the real-valued constants z₁, z₂, z₁₁, and z₁₂ can in fact be interpreted as

z₁ = ∂z/∂r₁        z₂ = ∂z/∂r₂   (1.6)

z₁₁ = ½ ∂²z/∂r₁²   z₁₂ = ∂²z/∂r₁∂r₂   (1.7)

As will further be explained in Chapter 6, any z can alternatively be replaced with:

Z̃ = D_M(z) = zI + (∂z/∂r₁) E(1) + (∂z/∂r₂) E(2) + ½ (∂²z/∂r₁²) (E(1))² + (∂²z/∂r₁∂r₂) E(1)E(2)   (1.8)

where I is the 5 × 5 identity matrix and E(1) and E(2) are nilpotent matrices constructed specifically (and automatically) for this problem:

        ⎡ 0  0  1  0  0 ⎤           ⎡ 0  0  0  1  0 ⎤
        ⎢ 0  0  0  2  0 ⎥           ⎢ 0  0  0  0  0 ⎥
E(1) =  ⎢ 0  0  0  0  0 ⎥   E(2) =  ⎢ 0  0  0  0  1 ⎥   (1.9)
        ⎢ 0  0  0  0  1 ⎥           ⎢ 0  0  0  0  0 ⎥
        ⎣ 0  0  0  0  0 ⎦           ⎣ 0  0  0  0  0 ⎦

This defines the operator:

          ⎡ z   0   ∂z/∂r₁   ∂z/∂r₂      ∂²z/∂r₁∂r₂ ⎤
          ⎢ 0   z   0        2 ∂z/∂r₁    ∂²z/∂r₁²   ⎥
D_M(z) =  ⎢ 0   0   z        0           ∂z/∂r₂     ⎥   (1.10)
          ⎢ 0   0   0        z           ∂z/∂r₁     ⎥
          ⎣ 0   0   0        0           z          ⎦
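The usefulness of this operator comes from the fact that ordinary matrix arithmetic on such blocks reproduces the rules of differentiation, as discussed below. A quick numerical check of that closure property is sketched here in MATLAB; the helper DM and the derivative values are made up for illustration and are not thesis code.

% Nilpotent basis matrices of Equation 1.9
E1 = zeros(5); E1(1,3) = 1; E1(2,4) = 2; E1(4,5) = 1;
E2 = zeros(5); E2(1,4) = 1; E2(3,5) = 1;

% Stencil operator of Equation 1.10; d = [z, dz/dr1, dz/dr2, d2z/dr1^2, d2z/dr1dr2]
DM = @(d) d(1)*eye(5) + d(2)*E1 + d(3)*E2 + 0.5*d(4)*E1^2 + d(5)*E1*E2;

% Two "scalars" with arbitrary (assumed) derivative data
y = [1.3, -0.7,  2.1, 0.4, -1.8];
z = [0.6,  1.9, -0.5, 2.2,  0.9];

% Product rule applied by hand to the product y*z ...
yz = [y(1)*z(1), ...
      y(1)*z(2) + y(2)*z(1), ...
      y(1)*z(3) + y(3)*z(1), ...
      y(1)*z(4) + 2*y(2)*z(2) + y(4)*z(1), ...
      y(1)*z(5) + y(2)*z(3) + y(3)*z(2) + y(5)*z(1)];

% ... agrees with plain matrix multiplication of the two stencils:
norm(DM(y)*DM(z) - DM(yz))            % zero to machine precision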

The pattern defined by a matrix operator like Equation 1.10 will herein be referred to as a "stencil," as it must be applied identically to every entry in every matrix; the derivatives at a given position in the stencil must always be taken with respect to the same program input. When building this representation, the derivatives would be supplied by the aforementioned standard AD technique. Application of the operator to an entire matrix is performed blockwise; for example, if the matrix A in the demonstration problem from Equation 1.1 were a trivial 2 × 2 matrix, it would be redefined as:

Ã := D_M(A)   (1.11)

   = D_M( ⎡ A₁₁  A₁₂ ⎤ )   (1.12)
          ⎣ A₂₁  A₂₂ ⎦

   = ⎡ D_M(A₁₁)  D_M(A₁₂) ⎤   (1.13)
     ⎣ D_M(A₂₁)  D_M(A₂₂) ⎦

where each block D_M(Aᵢⱼ) is the full 5 × 5 stencil of Equation 1.10 built from Aᵢⱼ and its derivatives; B̃ and C̃ would be redefined in an identical manner. In the case of the vector b̃, as the rightmost column of the stencil contains a copy of every derivative and as matrix multiplication is performed column-wise for the right multiplicand, a single-column structure can be preserved by dropping the other columns:

b̃ := D_V(b) = D_V( ⎡ b₁ ⎤ )   (1.14)
                   ⎣ b₂ ⎦

   = [ ∂²b₁/∂r₁∂r₂   ∂²b₁/∂r₁²   ∂b₁/∂r₂   ∂b₁/∂r₁   b₁   ∂²b₂/∂r₁∂r₂   ∂²b₂/∂r₁²   ∂b₂/∂r₂   ∂b₂/∂r₁   b₂ ]ᵀ   (1.15)

Some readers may object that the expanded matrix appears much larger than the original; however, note that while the number of non-zero entries in the matrix has grown by a factor of 12, the number of computed values (reals plus derivatives) has likewise grown by a factor of 5, for a per-value growth factor of 2.4. Compare this to the product and chain rules for ordinary scalar derivatives, which also require a disproportionate number of additional calculations to differentiate individual operations.

Importantly, because the pattern shown in Equations 1.13 and 1.15 repeats itself after matrix addition and multiplication, M̃⁽¹⁾ and D̃ will automatically compute the derivatives ∂M⁽¹⁾ᵢⱼ/∂r and ∂Dᵢⱼ/∂r, keeping them in the same stencil structure without any work on the part of the user; likewise, the result ũ will carry ∂uᵢ/∂r, which can be passed back to the ordinary AD types for any remaining scalar calculations. This is a small example problem with only a few operations, but in general the derivative calculation will propagate through arbitrarily many matrix operations, e.g., for coupled systems updated in a loop. With certain exceptions – e.g., one operation that (as will be discussed in Section 6.2) only needs a small alteration – the work done by NMD is limited to matrix creation and analysis, thus satisfying the goal of an automated process.
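As a concrete illustration of that propagation, the MATLAB sketch below pushes the two-variable, second-order stencil through an unmodified backslash solve for a small made-up system D(r₁, r₂)u = b(r₁, r₂); the kron-based assembly is just one convenient way to build the patterns of Equations 1.13 and 1.15 and is not taken from the thesis implementation.

E1 = zeros(5); E1(1,3) = 1; E1(2,4) = 2; E1(4,5) = 1;     % Equation 1.9
E2 = zeros(5); E2(1,4) = 1; E2(3,5) = 1;

r1 = 0.3;  r2 = -0.8;                                     % expansion point
D   = [4 + r1^2, r1*r2; r1*r2, 3 + r2^2];                 % assumed test system
D1  = [2*r1, r2; r2, 0];     D2  = [0, r1; r1, 2*r2];     % dD/dr1, dD/dr2
D11 = [2, 0; 0, 0];          D12 = [0, 1; 1, 0];          % d2D/dr1^2, d2D/dr1dr2
b   = [exp(r1); sin(r2)];
b1  = [exp(r1); 0];          b2  = [0; cos(r2)];          % db/dr1, db/dr2
b11 = [exp(r1); 0];          b12 = [0; 0];                % d2b/dr1^2, d2b/dr1dr2

% Blockwise stencil expansion (the pattern of Equations 1.13 and 1.15)
Dhat = kron(D,eye(5)) + kron(D1,E1) + kron(D2,E2) ...
     + kron(0.5*D11,E1^2) + kron(D12,E1*E2);
bhat = kron(b12,[1;0;0;0;0]) + kron(b11,[0;1;0;0;0]) + kron(b2,[0;0;1;0;0]) ...
     + kron(b1,[0;0;0;1;0]) + kron(b,[0;0;0;0;1]);

% One unmodified solve now carries the value and all tracked derivatives
uhat = Dhat \ bhat;
u    = uhat(5:5:end);        % equals D\b
u_r1 = uhat(4:5:end);        % du/dr1   (equals D\(b1 - D1*u))
u_r2 = uhat(3:5:end);        % du/dr2
u_11 = uhat(2:5:end);        % d2u/dr1^2
u_12 = uhat(1:5:end);        % d2u/dr1dr2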

As for the work done during matrix creation, assuming the user defines the non-zero values of his or her matrices using row/column/value {i, j, v} triplets, standard AD will provide operator-overloaded variants of v that normally could not be assembled into a matrix; here, NMD can simply provide a single function to map from an original set of one or more triplets to an extended set. For example, in MATLAB, this is a one-line addition to the matrix creation routine:

[Ai2_vec, Aj2_vec, Aval2_vec] = AD2Stencil(Ai1, Aj1, Aval1_AD);

Here, the RHS inputs constitute an AD-overloaded triplet {i, j, Aij} of the matrix A,8 and the LHS outputs are real-valued vectors (a set of triplets) that define the stencil D_M(Aij). Likewise, for analysis of the results, NMD can provide a function that takes an index i corresponding to a scalar in the original vector u, examines the stencil-expanded variant ũ, and returns an AD type:

uval1_AD = Stencil2AD(u_NMD, ui1);

Once this second hand-off is complete, the overloaded type can propagate all derivatives to the final program output where they can be analyzed. With the operator-overloaded AD and NMD both fully implemented, the program can be represented schematically as shown in Figure 1.2.
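For readers curious what such a mapping might look like internally, a minimal sketch of an AD2Stencil-style function is given below for the two-variable, second-order stencil of Equation 1.10. It is only illustrative: the field names of the AD-overloaded values (val, d1, d2, d11 for ∂²z/∂r₁², and d12 for ∂²z/∂r₁∂r₂) are assumed here and need not match the thesis's actual data type.

function [i2, j2, v2] = AD2Stencil(i1, j1, vAD)
% Sketch only: expands each AD-overloaded triplet into the 12 non-zeros
% of the 5-by-5 stencil of Equation 1.10 (assumed field names).
roff = [1 1 1 1 2 2 2 3 3 4 4 5];     % row inside the stencil block
coff = [1 3 4 5 2 4 5 3 5 4 5 5];     % column inside the stencil block
n  = numel(i1);
i2 = zeros(1, 12*n);  j2 = i2;  v2 = i2;
for k = 1:n
    z   = vAD(k);
    dat = [z.val, z.d1, z.d2, z.d12, z.val, 2*z.d1, z.d11, z.val, z.d2, z.val, z.d1, z.val];
    idx = 12*(k-1) + (1:12);
    i2(idx) = 5*(i1(k)-1) + roff;
    j2(idx) = 5*(j1(k)-1) + coff;
    v2(idx) = dat;
end
end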

After the program has been modified in these ways, it can recover each of the derivatives from Equation 1.8; for the example problem, the directional second derivatives can then be interpreted as the requested products:

∂²f/∂r₁² = dᵀHd        ∂²f/∂r₁∂r₂ = gᵀHd   (1.16)

1.5 Dissertation Goals and Organization

Before closing this summary chapter, recall that this particular example problem was chosen simply to motivate the use of automatic higher-order derivatives for matrix codes. This dissertation is concerned with the topic of matrix differentiation in general, not optimization in particular – as will be highlighted in the following chapter, automatic differentiation is not limited to optimization problems. As such, in this document analysis is limited to the derivation of NMD (as well as hybrids thereof); while numerical experiments are performed in Chapter 8 on a stochastic diffusion problem, this is mostly for the sake of benchmarking and as proof-of-concept. During the abstract discussions to follow, however, it is important to know that many concrete uses like the above demonstration are waiting in the wings.

The remainder of this document is organized as follows: in Part I, Chapter 2 will further motivate matrix differentiation by discussing a wider range of engineering applications that require second or higher derivatives; Chapter 3 will detail existing standard differentiation approaches (of both the automatic and manual variety); and Chapter 4 will expand the topic to include special algebras (i.e., non-real number systems) capable of differentiation. In Part II, Chapter 5 will derive new "hypercomplex" algebra types built with scalar nilpotent elements, capable of any-order differentiation; Chapter 6 will provide several examples of matrix structures that are isomorphic to particular scalar algebras, introduce the concept of stencils for differentiation, then provide an algorithm capable of finding stencils to represent arbitrary hypercomplex systems; and Chapter 7 will examine several hybrid methods, including the nilpotent and complex-step adjoint methods and a (non-adjoint) NMD-CD combination. Finally, in Part III, Chapter 8 will describe the stochastic diffusion problem and investigate method accuracy and cost, and Chapter 9 will draw conclusions.

8 In fact, although this will not always be possible in general, in MATLAB it is possible to overload entire vectors as AD types (even if full matrix operations are too slow); as such, the above line of code can redefine every triplet – and thus the entire matrix A – with a single call.

[Figure 1.2 spans two panels: (a) the generic pipeline, now with user code (scalar math) augmented by operator-overloaded AD (Initialize AD → Operator Overloading → Extract Derivatives), unaltered third-party linear algebra functions (matrix math) bridged by Build NMD Stencils / Unpack NMD Stencils steps; (b) the specific data flow x̂ := ADtype(x), Â := Afunc(x̂), B̂ := Bfunc(x̂), Ĉ := Cfunc(x̂), b̂ := bfunc(x̂), {Ã, B̃, C̃, b̃} := AD2Stencil(Â, B̂, Ĉ, b̂), M̃⁽¹⁾ := Ã + B̃, D̃ := 1/h²(C̃ M̃⁽¹⁾ − 2C̃), ũ := D̃⁻¹ b̃, û := Stencil2AD(ũ), f̂ := ffunc(û), f := RealPart(f̂), Df := DerivPart(f̂).]

Figure 1.2: AD-augmented schematic of (a) a generic program and (b) the example specified by Equation 1.1: User source code (light gray) is modified to use operator-overloaded AD data types (yellow); unaltered linear algebra routines (dark gray) operate on real-valued matrices defined by NMD (orange).

Chapter 2

Motivation: Numerical Uses of High-Order Derivatives

Many developing fields of numerical computing [11,13–17,21–40] require or are otherwise improved by access to high-order partial derivatives of computer codes – that is, those of higher order than the first derivatives found in the gradient or Jacobian; these methods concern the Hessian or higher tensors, sparse subsets thereof, or in some cases [19,41,42] implicit evaluations of derivative matrices projected onto a subspace. What follows is a non-comprehensive overview of several broad fields and specific algorithms that utilize these quantities. The discussion is limited to the usage of the derivatives in general and ignores how they are provided – existing methods that actually supply them will be presented in Chapter 3 and Chapter 4. Throughout this chapter, it will be taken as a given that an algorithm such as NMD, which can make derivatives of certain programs – namely, those with a heavy reliance on matrix operations – easier to obtain, will find a natural niche among the many different approaches to differentiating code.

2.1 Optimization

Most real-valued, single-metric optimization algorithms can be expressed as the search for an exact or approximate solution to the minimization problem

argminₓ f(x) : x ∈ ℝ^Nx, f ∈ ℝ   (2.1)

where f is a cost function to be minimized or the negative of a metric to be maximized. A great many methods attempt to find this solution by finding the point where the gradient is zero:

∇f(x) = 0   (2.2)

A number of algorithms accelerate this search by utilizing second derivative information to predict changes in ∇f. This section attempts to provide a brief, noninclusive overview of several such methods.

2.1.1 Newton’s Method

Newton's method is a powerful, if costly, iterative optimization scheme which requires both first- and second-order derivatives. At each step in the algorithm, a quadratic interpolant is used to estimate the location of the minimum. Using a Taylor expansion to build the interpolant, the one-dimensional method update step can be derived as:

f(xᵢ + Δx) ≈ f̂(Δx) = f(xᵢ) + f′(xᵢ)Δx + f″(xᵢ) Δx²/2   (2.3)

f̂′(Δx) = 0   (2.4)

∴ f′(xᵢ) + f″(xᵢ)Δx = 0   (2.5)

∴ Δx = −f′(xᵢ)/f″(xᵢ)   (2.6)

Moving to many variables, the interpolant requires both the gradient and the Hessian matrix:

f(xᵢ + Δx) ≈ f̂(Δx) = f(xᵢ) + ∇f(xᵢ)ᵀΔx + ½ ΔxᵀHΔx   (2.7)

∇f̂(Δx) = 0   (2.8)

∴ ∇f(xᵢ) + HΔx = 0   (2.9)

∴ Δx = −H⁻¹∇f(xᵢ)   (2.10)

where H ≡ ∇²f(x) ∈ ℝ^(Nx × Nx) is the Hessian. The full method thus proceeds as shown in Algorithm 2.1.

A number of methods exist for efficiently finding ∇f, as will be discussed in Chapter 3.

Algorithm 2.1 Newton's Method

1. Given: x₀ ∈ ℝ^Nx                     // Initial estimate
2. Repeat:                              // Enter Newton loop
   a. Hᵢ := ∇²f(xᵢ)                     // Compute each entry of Hessian matrix
   b. Δxᵢ := −Hᵢ⁻¹ ∇f(xᵢ)               // Solve linear system to compute Newton step
   c. xᵢ₊₁ := xᵢ + Δxᵢ                  // Increment approximate solution
   Until Converged                      // e.g., exit if Δxᵢ is very small
3. Output: argminₓ f(x)                 // Converged optimum of f
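A compact MATLAB sketch of Algorithm 2.1 follows, using an assumed convex test function f(x) = Σ exp(xᵢ) + ½‖x‖² (not a problem from this thesis) whose gradient and Hessian are available analytically.

gradf = @(x) exp(x) + x;                   % gradient of the assumed test function
hessf = @(x) diag(exp(x)) + eye(numel(x)); % its Hessian
x = [1; -2; 0.5];                          % Step 1: initial estimate x0
for i = 1:20                               % Step 2: Newton loop
    dx = -hessf(x) \ gradf(x);             % Steps 2a-2b: Hessian and Newton step
    x  = x + dx;                           % Step 2c: increment solution
    if norm(dx) < 1e-12, break, end        % convergence test
end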

Unfortunately, the cost of building the Hessian in Step 2a can be prohibitive (e.g., [26]), as second-order differentiation algorithms typically have a time complexity that is O(Nx²) more expensive than the original problem; for this reason, there are almost always more efficient optimization algorithms available. However, some application-specific second-order adjoint methods (e.g., [23, 25]; see Section 3.1.3 for more details) have been shown to find H in O(Nx), and in those applications Newton's method has been competitive. Alternatively, a broad category of "Quasi-Newton" methods exists for building a (relatively) inexpensive, continuously-updating approximation of the Hessian. (As this work concerns exact derivatives, such approximations will not be considered.) For problems with large Nx, though, inverting or even storing H can be too expensive. In such cases, one of the following methods is often preferred.

2.1.2 Linear and Nonlinear Conjugate Gradient Methods

As the names imply, both linear (CG, [43]) and (most) nonlinear conjugate gradient (NCG, [12]) methods are constructed using only a gradient (albeit merely an implicit one for linear CG), without higher-order information. They are described here for two reasons: first, linear CG is a central component of the "Hessian-free" method described in Section 2.1.3 and can be considered a subset of NCG; second and more importantly, second derivative information has repeatedly (e.g., [13, 15, 27]) been suggested for use in NCG, even though it has rarely been directly available.

The linear conjugate gradient method (more commonly referred to simply as the conjugate gradient method) is used to solve systems of linear equations, and can be derived by re-framing the problem as one of minimizing a quadratic function. However, it is not itself truly considered an optimization algorithm, nor does it require derivatives, and so only a brief summary is needed. In this work, all the reader needs to know is that (1) CG is a tool for solving (potentially very large) systems Ax = b for x when A is symmetric positive-definite; (2) it does so by iterating through multiple search directions dᵢ to update an approximation xᵢ of the solution, until the residual Axᵢ − b is sufficiently small; and (3) during the iteration, it never technically needs the full A – only the result of the matrix-vector product Adᵢ, which is a vector that might be provided by a user-written function. As mentioned in Section 1.3, similar matrix-vector products (albeit, in every other case, Hessian-vector products) appear repeatedly throughout this work, in different contexts; in many circumstances, computing the product might have a small fraction of the cost of building the entire matrix. This observation will become significant both for NCG and in Section 2.1.3, among many other places.

The nonlinear variant of CG is also derived in terms of minimization of a quadratic function; here, the framework is used more literally, to optimize some problem of the form given by Equation 2.1. Just like Newton's method, NCG assumes the problem is locally quadratic; it does not, however, always require second-order information (with notable exceptions, below). Very briefly, the method iterates through a search via the steps laid out in Algorithm 2.2.

Algorithm 2.2 Nonlinear Conjugate Gradient Optimization

1. Given: x₀ ∈ ℝ^Nx                     // Initial estimate
2. Initialize: d₀ := −∇f(x₀)            // Initial search direction
3. Repeat:                              // Enter NCG loop
   a. αᵢ := argmin_α f(xᵢ + αdᵢ)        // Inner loop: line search for minimal value of f along dᵢ
   b. xᵢ₊₁ := xᵢ + αᵢdᵢ                 // Increment approximate solution
   c. dᵢ₊₁ := −∇f(xᵢ₊₁) + βᵢdᵢ          // Increment search direction
   Until Converged                      // e.g., exit if Δx is very small
4. Output: argminₓ f(x)                 // Converged optimum of f

Here, xᵢ is the estimated solution at iteration i, dᵢ is the search direction, αᵢ is the scalar result of a one-dimensional line search, and βᵢ is the scalar NCG update parameter. Different line search methods for α and different formulas for β define different variants of the NCG method; a comprehensive survey can be found in [12].

Of particular interest for this work, in 1993 Pearlmutter [19] suggested (albeit in a context other than NCG) the line-search update step

αₖ = − ∇f(xₖ)ᵀdᵢ / (dᵢᵀHₖdᵢ)   (2.11)

which is simply a one-dimensional Newton method in the direction of dᵢ – this equation is applied repeatedly in an inner loop until an αᵢ is found. As for the direction update, in 1967 Daniel [13] proposed

βᵢ = ∇f(xᵢ₊₁)ᵀHᵢdᵢ / (dᵢᵀHᵢdᵢ)   (2.12)

and in 2008 Andrei [14] proposed the variant

βᵢ = (∇f(xᵢ₊₁)ᵀHᵢdᵢ − ∇f(xᵢ₊₁)ᵀdᵢ) / (dᵢᵀHᵢdᵢ)   (2.13)

These options are often overlooked, due to the assumed cost of building H; indeed, [12] explicitly chose not to discuss Daniel's approach because it "requires evaluation of the Hessian". As was shown in Section 1.3, though, these formulas do not technically require the Hessian matrix itself; they only require the Hessian-vector product Hᵢdᵢ or the vector-Hessian-vector products gᵀHd and dᵀHd, where g = ∇f(x) is a separately-evaluated instance of the gradient.

Unfortunately, in the absence of NMD, calculating these products for large engineering applications can be difficult. (For his part, in [15] Andrei was forced to use a limited-accuracy approximation.) If they could be found efficiently and accurately, however, they could be extremely useful for NCG; indeed, the terms have broad utility in many methods, including the ones presented in the next section. Importantly, an approach capable of finding this quantity (albeit for a limited category of scalar codes, typically outside of engineering) does in fact exist, and will be discussed in Section 3.1.4; further, several entirely novel approaches will be derived and presented in Chapter 7.
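To emphasize the point, the MATLAB sketch below runs NCG on an assumed quadratic test problem using Equations 2.11 and 2.12 with nothing more than a Hessian-vector product handle (exact here because f is quadratic); the standard sign convention d₀ = −∇f(x₀) is used. For a general matrix-based code, the hybrids of Chapter 7 would supply the same product.

A  = gallery('lehmer', 50);  b = ones(50,1);   % assumed SPD test matrix and RHS
gradf = @(x) A*x - b;                          % gradient of f = 0.5*x'*A*x - b'*x
Hd = @(d) A*d;                                 % implicit Hessian-vector product
x = zeros(50,1);  d = -gradf(x);
for i = 1:100
    g     = gradf(x);
    hd    = Hd(d);                             % the only Hessian information used
    alpha = -(g'*d) / (d'*hd);                 % Equation 2.11
    x     = x + alpha*d;
    gn    = gradf(x);
    beta  = (gn'*hd) / (d'*hd);                % Equation 2.12 (Daniel)
    d     = -gn + beta*d;
    if norm(gn) < 1e-10, break, end
end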

2.1.3 Truncated-Newton and Hessian-Free Optimization

In cases where the main cost of the Newton method lies in the inversion (rather than the construction) of the Hessian matrix, a truncated-Newton method (TN, [28]) can be of great use. These methods make use of any iterative linear algebra solver (i.e., any solver that works by converging toward a solution, rather than solving directly – the linear conjugate gradient method is a prime example) to only approximately solve for the Newton update direction Δx via a reduced or "truncated" number of iterations. In other words, rather than computing the exact minimum of the interpolant from Equation 2.7, a truncated-Newton method merely comes close enough to generate an effective update direction. Step 2b in Algorithm 2.1 therefore turns into an inner loop, which is terminated upon reaching some (very relaxed) convergence criteria of its own.

So-called "Hessian-free" optimization methods (e.g., [16,17]) take TN one step further by choosing linear CG as the iterative method and providing HΔx directly, without building H; they are thereby even more effective for problems with very large Nx where H is costly to build. This approach is shown in Algorithm 2.3. The now-familiar caveat still applies: existing methods that can find HΔx efficiently and exactly1 are application-specific and more common outside of engineering (e.g., the cited Hessian-free example [16] concerns the deep learning of neural networks). The new NMD methods presented in Part II could conceivably make Hessian-free or other truncated-Newton methods more attractive to engineers.

Algorithm 2.3 Hessian-Free Optimization

1. Given: x₀ ∈ ℝ^Nx                       // Initial estimate
          ε_CG ∈ ℝ                        // CG convergence parameter
2. Repeat:                                // Enter Newton loop
   a. Define: h(v) ≡ H(xᵢ)v               // Implicit matrix-vector product
   b. Conjugate Gradient: h(v) ≈ −∇f(xᵢ)  // Inner loop: truncated solve of Hv ≈ −∇f
      i.  Increment CG estimate vⱼ
      ii. Exit if: ‖h(vⱼ) + ∇f(xᵢ)‖ < ε_CG
   c. Δxᵢ := v                            // Approximate Newton step
   d. xᵢ₊₁ := xᵢ + Δxᵢ                    // Increment approximate solution
   Until Converged                        // e.g., exit if Δxᵢ is very small
3. Output: argminₓ f(x)                   // Converged optimum of f
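A minimal MATLAB sketch of one Hessian-free outer iteration follows, reusing the assumed test function from the Newton sketch and MATLAB's pcg with a function handle in place of H. The Hessian-vector product is approximated here by differencing the gradient only for illustration; as noted in the footnote, such approximations can be inaccurate, and the exact alternatives developed in this thesis are the intended replacement.

gradf  = @(x) exp(x) + x;                 % assumed test function (same as above)
x      = [1; -2; 0.5];
g      = gradf(x);
eps_fd = 1e-7;
hv     = @(v) (gradf(x + eps_fd*v) - g) / eps_fd;   % h(v) ~= H(x)*v, Step 2a
[dx, ~] = pcg(hv, -g, 1e-2, 50);          % Step 2b: loose tolerance = "truncated" solve
x = x + dx;                               % Steps 2c-2d: one outer update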

In cases where H might not be positive-definite, some algorithms simply add a parameterized constant along the diagonal, which can be done implicitly as (H + kI)Δx = HΔx + kΔx; others supply a different "curvature" matrix – for example, [29] used a modified variant of the Gauss-Newton (GN) method, a nonlinear least-squares algorithm. In standard GN, an approximation to H is built as G = JᵀJ ≈ H (where J is the Jacobian matrix), thereby bypassing any need to find actual second-order information; however, this approximation is only effective when very close to convergence or when nonlinear effects are small. In the modified variant of GN, a program is divided into those subsections that are known to be well-behaved and those that are not. A variant of G is then built by chaining together a combination of Jacobians and Hessians – for example, if f(x) = g₁(g₂(x)) but the Hessian H⁽²⁾ of g₂ might not be positive-definite, then a curvature matrix could be defined as G = J⁽²⁾ᵀH⁽¹⁾J⁽²⁾, which is presumably better than using no nonlinear information at all. Note that these matrices can be built by applying standard differentiation methods (manual or automatic) to each program subsection in turn; with some effort, these methods can also calculate GΔx cheaply for a more reliable Hessian-free method. Again, NMD might be used in these circumstances to supply such derivatives across matrix routines.

1 Approximations are available, but as will be shown in Section 3.1.1, they are known to result in dramatic errors.

2.2 Uncertainty Quantification and Sensitivity Analysis

As the name implies, uncertainty quantification (UQ, [44]) can be defined as the study, measurement, and prediction of uncertainty in any given process. It includes the propagation of uncertainties from uncertain inputs to uncertain outputs – e.g., using (approximations of) the probability distribution of program input(s) to estimate the distribution of the output(s) – and the inverse problem, where the uncertainties in real-world measurements of some system are compared to the outputs of a program modeling that system, in order to calibrate or quantify the input parameters of the program. UQ is often mentioned in the same breath as sensitivity analysis (SA, [45, 46]), which concerns the relative importance of each input, or the pairwise interaction (second-order) effects of inputs; it frequently involves the calculation of first- or second-order [32, 33] sensitivity derivatives – the name of which is simply an application-specific labeling of the same output-with-respect-to-input derivatives this dissertation concerns itself with.

General use of UQ employs a very wide array of tools, including both SA and some of the surrogate modeling techniques to be discussed in Section 2.3; as will be seen there, many surrogates benefit from second-order derivatives. There are also uses of differentiation very specific to UQ; as one example of obvious relevance, for certain problems the inverse Hessian of a least-squares measure can yield a covariance estimate. [30, 31]

More broadly, one of the chief challenges faced in many UQ applications is often described as "the curse of dimensionality", wherein either (1) the number of input variables needed by a program or (2) the number of necessary sample points (and thus, the number of runs of the program) grows dramatically and nonlinearly with the complexity of the problem under consideration. Among other reasons, this is because arbitrary random variables and random processes can have a very high intrinsic number of dimensions – that is, many more than their physical range would otherwise suggest. Once decomposed into orthogonal basis functions, it may take thousands of such

(1) provided by a very large input vector2 or (2) calculated by comparison with an even larger database of function outputs and/or real-world measurements. In even more extreme cases, the most difficult inverse problems require one to use scattered and uncertain measurements to infer, not just model input parameters, but the entire internal state of a complex system. For example, in [31] oceanographers attempt to use satellite measurements of the ocean surface to infer the state of the deep sea (and the uncertainty thereof), the behavior of which is complex enough to require hundreds of millions of variables in a discretized PDE mesh.

This curse of dimensionality makes reverse-mode tools like the adjoint method (Section 3.1.2) very attractive, as they can produce derivatives with respect to each of Nx different inputs with a cost that remains independent of Nx.

When users desire higher-order information for UQ applications, other reverse-mode methods will likely be similarly attractive. Two existing second-order reverse approaches will be discussed in Section 3.1.3 and Section 3.1.4, and again, this work will contribute several new options in Section 7.3, including an adjoint approach for derivatives higher than second order.

2.3 Surrogate Modeling

As mentioned repeatedly throughout this text, engineering simulations can be very expensive – and yet, a number of applications (including both optimization and uncertainty quantification) can require a great many output samples from these simulations. In many circumstances, it is more practical to provide cheap approximations of the output than it is to evaluate the true simulation each time. In these cases, one can build a low-fidelity “surrogate” model (SM) which attempts to inexpensively map the same inputs to similar outputs as the high-fidelity or “truth” model. This surrogate can then be evaluated many times, at a small fraction the cost of the high-fidelity model. One of the simplest surrogate models is a quadratic approximation – such as the Taylor-series interpolant from Equation 2.7, which explicitly called for the Hessian – but there are far more sophisticated options. There are more types than can practically be mentioned here; an incomplete list includes response surface methodology [48], polynomial chaos expansion [47], and Kriging and

2 The demonstration problem detailed and implemented in Chapter 8 will provide one such example of UQ, by finding derivatives of an ap- proximated random process (a two-dimensional field) with respect to the scalar random-variable coefficients (the program inputs) needed by a Karhunen-Loève expansion. 22 co-Kriging [22, 49],3 as well as many tools from the field of machine learning, such as support vector machines [50], genetic programming [51], and neural networks [52]. Another type known as multifidelity modeling is often described under the same umbrella (e.g., [11, 53]) as surrogate modeling, although it could arguably be called distinct in that it does not fit a generic model to the high-fidelity code; instead, a lower-fidelity model is built via simplification of the full model. This can be done via the simple expedient of coarsening a discretized mesh or relaxing internal solution convergence criteria, or via more sophisticated reduced-order modeling (ROM) techniques that strip models down to only those states that are quantifiably important – for example, one of the most popular ROM tools is the proper orthogonal decomposition (POD) method, [54] which examines a large number of state-vectors sampled over time, then identifies (via principle component analysis) the subspace in which the process varies the most. Regardless of their inner workings, though, multifidelity models can generally be used in the same contexts as canonical surrogate types and described with much the same language.

Surrogates can be classified (e.g., in [55]) by their order, or degree of fit with the truth model at the point of construction (or at other high-fidelity points the model must fit): if, at some high-fidelity observation f(xx0), only ̂ the SM’s scalar output f(xx0) matches up, then the surrogate is called “zero-order”; if the gradient also matches (i.e.,

̂ 2 ̂ 2 ∇f(xx0) = ∇f(xx0)), it is called “first-order”; and if the Hessian matches (i.e., ∇ f(xx0) = ∇ f(xx0)), it is called “second- order”. Methods calling for a second-order surrogate (e.g., [11, 53, 56]) imply a need to either build an effective approximation of H or else directly collect at least some second-order derivative information from the high-fidelity model. Further, even in some cases that do not require derivative agreement, first [47] and second [22, 49] derivatives have been shown to improve SM properties, simply by providing additional information for the model to fit.

Possible uses of surrogate models (of any order) include Monte Carlo modeling for uncertainty quantification, wherein a SM is run a great many times with random-variable inputs for the sake of studying the output probability distribution. In this case, the SM need not even be zero-order (e.g., it might smooth the data such that it doesn’t exactly fit some points), but the better the approximation of the surrogate, the more accurate the distribution. Another major use is surrogate-based optimization (SBO, [53]), wherein a high-fidelity problem f(xxx) is optimized by iteratively

3 In [49] a Hessian-vector product was used to supply further covariance data (beyond the original outputs and the gradient) to a co-Kriging model, but this was only in addition to the data from a whole Hessian; the multiplication appears to have been explicit, computed after the whole Hessian was found. 23

̂ (1) building a local surrogate fi(⋅) from high-fidelity samples at or near the current location xi; (2) finding the input ̂ xi+1 that optimizes the low-fidelity problem fi(xxi+1), which might require many evaluations of the surrogate within ̂ some inner-loop optimization algorithm; and (3) updating or re-building the surrogate fi+1(⋅) from new high-fidelity evaluations. This process is shown in Algorithm 2.4; it can also be made more sophisticated by restricting step (2) to stay within a “trust region” (e.g., [55,56]) where the surrogate is expected to provide a good approximation. Many SBO and trust-region algorithms are built on the assumption that f̂ is (at least) first-order, in which case they are proven to converge to the high-fidelity optimum. However, first-order SMs effectively limit one to first-order (gradient-only) optimization algorithms [11]; to get a benefit from a second-order approach like Newton’s method or Hessian-enhanced

NCG, the curvature of the surrogate must match that of the truth model. In the case of multifidelity modeling, where the low-fidelity models are not even guaranteed to be zero-order, methods exist to overlay one or more “correction” [11] surrogates to make the combined model match; this requires collecting derivative information4 from both the high-

and low-fidelity models.

Algorithm 2.4 Second-Order Surrogate-Based Optimization

1. Given: Nx // Initial estimate x0 ∈ ℝ 2. Repeat: // Outer (high delity) loop a. Dene ̂ such that: // Surrogate model of near fi f(xxx) xi ̂ // Zero-order t fi(vvv) ≈ f(xxi + vv) ̂ // First-order t ∇vfi(vvv) ≈ ∇vf(xxi + vv) 2 ̂ 2 // Second-order t ∇vfi(vvv) ≈ ∇vf(xxi + vv) b. ̂ // Inner (low delity) loop: optimizer of choice Δxi ∶= argminv f(vvv) c. // Increment approximate solution xi+1 ∶= xi + Δxi Until Converged // e.g., if is very small Δxi 3. Output: // Converged optimum of argminx f(xxx) f

2.4 A Selection of Higher-Order Applications

The preceding sections concerned themselves with second-order (Hessian) derivatives, which the simplest in-

4 Note that the cited example again incorrectly assumes the need to build the full Hessian – those authors’ second-order corrections could be reframed to only require Hessian-vector or vector-Hessian-vector products. 24 stances of NMD can easily provide. However, NMD can be used to arbitrarily high order. Uses for third and higher derivatives are comparatively rare, but hardly unheard of; some of the obvious cases are those where derivative re- quirements stack – for example, optimization on a embedded manifold (e.g., a curved, constrained subspace of the search space, imbued with a “geodesic”, or nonlinear redefinition of a straight line) requires first or second deriva- tives with respect to the manifold curvature, on top of the first or second derivatives of the objective function being optimized. [57–59] Beyond stacking, two of the broadest uses of any-order derivatives are Householder’s methods and generic Taylor (power) series.

Householder’s methods [37] are essentially higher-order generalizations of Newton’s method, albeit used to find roots – rather than optima – of functions; under this framework, Newton’s method is the first-order Householder method

(utilizing only first derivatives) to find where a function is zero. (In other words, Newton optimization finds the root of the derivative function.) Householder methods can be generalized to any order of derivative; each successive order also increases the method’s order of convergence. Unfortunately, the cost of finding higher derivatives often outpaces the speedup from faster convergence – however, the second-order variant, known as the Halley method, has been shown to be competitive for certain problems. [38]

Much more practical is the use of arbitrary Taylor series, which can be used to approximate difficult nonlinear functions to any desired degree of accuracy. For example, in astrodynamics [39, 40], satellite guidance can be informed by fifth- or higher-order series approximating a gravity potential, and in the study of particle accelerators [36], beam dynamics can be described by power series of arbitrarily high order. Taylor series are also a useful tool for numerical bifurcation analysis [34, 35, 60], itself a very broad field with applications in turbulence and chaos theory.

Chapter 3

Standard Differentiation Approaches

As shown in the previous chapter, many engineering applications benefit from the use of second- or higher-order derivatives. Of course, many methods for numerical differentiation exist, of varying precision, ease of use, and computational cost. Beyond this, differentiation approaches can generally be divided into two categories: those that work in "forward mode" and those that work in "reverse mode". Forward-mode methods begin by modifying either the program inputs or the way those inputs are treated by the program, then run to completion before examining the modified output. Reverse-mode methods begin with the pre-computed, undifferentiated output; find the derivative of this value, not with respect to the program input but to the penultimate (second-to-last) operation(s) performed by the program; then proceed backward, computing the chain rule in reverse for every operation performed. This approach can be very efficient for problems with many inputs (e.g., single-metric optimization over many parameters), as the cost is typically a function only of the number of outputs. The forward mode, for its part, is dependent only on the number of inputs, and can be very efficient in cases with many outputs (e.g., derivatives of an entire converged state vector).

Ultimately, between precision, ease, cost, and forward vs reverse mode, the choice of ideal method is problem- dependent and a matter of user preference. To give a broad summary of the approaches available, this chapter will begin by discussing the most widely-used forward-mode non-AD technique, finite differencing, and the most popular reverse- mode technique in engineering, the manually-implemented adjoint method, in both its first- and second-order forms.

After presenting the oft-mentioned technique for computing Hv, discussion will move on to automatic differentiation itself and the computation of first- and higher-order derivatives for scalars and matrices. Finally, some differentiation approaches differ in the underlying theory used to derive and implement them; discussion of these non-standard methods will be saved for Chapter 4.

3.1 Leading Non-Automatic Tools

In the broad ecosystem of competing differentiation methods for engineering applications, AD tools (including

NMD) fill the niche that demands precise values with minimal programming effort. Other niches, of course, have different demands. Two of the most common demands are (1) very simple approximations and (2) maximally efficient gradients; these are usually addressed by the finite difference and adjoint methods, respectively. Another niche is the computation of derivative matrix products, which can be addressed by some AD tools or by intrusive implementation.

(It must be mentioned yet again: this can be more difficult for matrix codes.) All three approaches are addressed below.

It should be noted that finite differencing, which is ignorant of the internal workings of a program, is already compatible with matrix subroutines; it is, however, very limited in precision. The discrete adjoint method, for its part, is entirely defined by matrix operations – its main limitation lies in its implementation difficulty.

3.1.1 Finite Differences: Easy, Approximate, Forward-Mode Differentiation

The most commonly used method for finding a numerical derivative is the obvious approach: mimic the defini- tion of the derivative by running two simulations with very slightly different inputs and computing the slope between the outputs:

$$\frac{df}{dx} \equiv \lim_{h \to 0}\frac{f(x+h) - f(x)}{h} \approx \left.\frac{f(x+h) - f(x)}{h}\right|_{h \ll 1} \quad (3.1)$$

This is the simplest type of what are more broadly known as finite difference (FD) [61] methods. Many engineers are probably more familiar with finite differences as a means of discretizing a continuous variable over a mesh; while this process involves manipulation of the same underlying FD equations, it must be made clear that this is not the type of method considered within this work. Instead, herein FD methods are only considered in the context of computing derivatives of entire functions or subfunctions with respect to their inputs.

While FD is very simple to implement, it suffers from accuracy limitations. One would expect that the closer the data points – the smaller the step size h between them – the more accurate the derivative computation will be.

However, because computers are limited to a certain number of bits for each number, when the two function outputs are almost identical, subtractive cancellation error (a loss of significant figures, also known as rounding error) begins to occur and gets progressively worse as the step size decreases; this behavior is shown in Figure 3.1. Practical use of FD often involves careful work to decide on a step size; worse, for many problems, there is no globally optimal value of h applicable to each derivative direction or every parameter setting.

Of course, more sophisticated FD methods exist than such simple rise-over-run calculations. [64, 65] In general, the label "finite differencing" is used to describe any class of method that compares function values at multiple points in order to infer an approximate derivative of that function. More specifically, FD methods work by assigning a weight to each function output and taking the sum. (For example, in Equation 3.1, the weight for f(x + h) would be 1/h and the weight for f(x) would be −1/h.) This general framework can be used to find approximate derivatives of any order, for any number of inputs, from any number of function evaluations (above a certain minimum number required for a given derivative) at any location, with any order of approximation accuracy. However, no matter the formula,1 FD can never be truly exact – subtractive cancellation is always a factor. Further, for the derivatives-of-functions usage considered here, in the case of expensive engineering simulations most users try to avoid using more than a few function evaluations, instead limiting themselves to very simple formulas. The most common formula remains the so-called forward difference method of Equation 3.1, which is accurate to $\mathcal{O}(h)$ (at least in the regime of h unaffected by cancellation error); the next most popular formula is the central difference method:

$$\frac{df}{dx} \approx \left.\frac{f(x+h) - f(x-h)}{2h}\right|_{h \ll 1} \quad (3.2)$$

which is accurate to $\mathcal{O}(h^2)$. (See Figure 3.1(a).) For cases with multiple inputs (e.g., f(x) for some vector of inputs x), the forward difference method requires fewer overall function evaluations – only a single offset evaluation in each direction, as opposed to two evaluations for central differences.
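As a small illustration (a sketch, not from the thesis), the forward and central differences of Equations 3.1 and 3.2 can be compared directly; the exponential test function and evaluation point are arbitrary choices with a known exact derivative.

```python
import numpy as np

def forward_diff(f, x, h):
    # Equation 3.1: accurate to O(h)
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # Equation 3.2: accurate to O(h^2)
    return (f(x + h) - f(x - h)) / (2.0 * h)

f = np.exp                      # test function with known derivative f'(x) = e^x
x, exact = 1.0, np.exp(1.0)
for h in (1e-2, 1e-5, 1e-8, 1e-11):
    print(h,
          abs(forward_diff(f, x, h) - exact) / exact,   # error shrinks, then grows again
          abs(central_diff(f, x, h) - exact) / exact)   # due to subtractive cancellation
```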

In the case of higher derivatives, it is very common to inductively apply a first-order formula multiple times. For example, for a centered univariate second derivative, simply take the slope between $f'(x + h/2)$ and $f'(x - h/2)$,

1 With the arguable exception of FD in the complex plane, which will be discussed in Section 4.1.

[Figure 3.1 appears here: two log-log panels of relative error in dy/dx versus step size h; panel (a) compares forward and centered finite differences against machine epsilon, panel (b) shows first through fourth derivatives.]

Figure 3.1: (a) Finite differencing error in the first derivative. Following the traditions set by [62] and [63], the test function is $y(x) = e^x/\sqrt{\sin^3 x + \cos^3 x}$ evaluated at $x = \pi/4 - 0.2$, and h is presented logarithmically decreasing from left to right, such that expected "better" values are on the right. Note that forward and centered FD initially follow $\mathcal{O}(h)$ and $\mathcal{O}(h^2)$ decreases in error, respectively, until dominated by subtractive cancellation error. The minimal-error value of h is problem-dependent, and in real-world applications no comparison to a known value will be available; as such, FD error will likely be significant. (b) FD error in first through fourth derivatives of the same test function. Higher derivatives use darker lines; the shaded area indicates relative error greater than 100%. Note that the higher the order of derivative, the greater the growth rate in subtractive error and the smaller (i.e., the harder to identify) the minimal-error domain of h. Further note the early spike in error due to the strongly nonlinear nature of the test function – attempts to avoid subtractive error with large step sizes can still lead to wildly incorrect results.

each of which is in turn approximated by its own central difference:

$$\frac{d^2 f}{dx^2} = \lim_{h \to 0}\frac{\frac{f(x+h) - f(x)}{h} - \frac{f(x) - f(x-h)}{h}}{h} \quad (3.3)$$

$$= \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + \mathcal{O}\!\left(h^2\right) \quad (3.4)$$
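A sketch of the three-point stencil of Equation 3.4 (again with an arbitrary test function, not one from the thesis) shows the even stronger step-size sensitivity of second derivatives:

```python
import numpy as np

def central_second_diff(f, x, h):
    # Equation 3.4: [f(x+h) - 2 f(x) + f(x-h)] / h^2, accurate to O(h^2)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

f, x, exact = np.exp, 1.0, np.exp(1.0)   # f''(x) = e^x
for h in (1e-2, 1e-4, 1e-6, 1e-8):
    err = abs(central_second_diff(f, x, h) - exact) / exact
    print(h, err)   # cancellation overwhelms the result far sooner than for f'
```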

Unfortunately, whether univariate or multivariate, higher orders of derivatives suffer from higher error and are more sensitive to step size – see Figure 3.1(b). Finally, as a forward-mode method, FD requires additional function evaluations for each individual derivative in a gradient or Hessian; for problems with many inputs, this can be prohibitively costly. An alternative differentiation approach known as the adjoint method addresses these problems, finding exact derivatives for many inputs.

3.1.2 The Discrete Adjoint Method: Intrusive, Exact, Reverse-Mode Differentiation

In terms of usage, when evaluating competing strengths and weaknesses the adjoint method might be thought

of as an exact opposite of FD: it is a reverse-mode approach that requires problem-specific derivation and intrusive

manual implementation, yet yields exact values and can compute an entire gradient with a run-time cost comparable to

the underlying function itself. Just as with FD, though, the method describes a broad approach, not a single algorithm.

That approach can generally be described as a type of blockwise reverse-mode differentiation, applied to programs that

utilize vector quantities as intermediate variables; rather than operate on each atomic operation, the discrete version of

the approach divides a program into more manageable vector-valued subroutines and utilizes key Jacobian2 matrices.

This subsection, while not exhaustive, will examine one of the most common generalizations of an end-to-end

(input-to-output) adjoint approach. In this generalization, the underlying (forward) problem takes a vector of input

parameters $x \in \mathbb{R}^{N_x}$; somehow solves for or converges to a state vector $u(x) \in \mathbb{R}^{N_u}$ (where typically $N_u \gg N_x$) according to some constraint or residual function $r(u, x) \in \mathbb{R}^{N_u}$, which is assumed to be zero for any converged u; then computes a scalar quantity of interest $f(u, x) \in \mathbb{R}$ as output. (Vector-valued outputs are less common, but can still be informed by the following procedure.) In this process, the solution of r = 0 for u is assumed to constitute the

2 As will be discussed in Section 3.1.3, there are also (comparatively rare) second-order adjoint approaches that additionally utilize the Hessian.

overwhelming majority of the work, and thus is assumed to be the most difficult subsection to differentiate in terms of

both human labor and computational run-time. As a simple example from aerospace design optimization, x might be

a set of airfoil design variables to be optimized, u might be a set of steady-state flow variables (pressure, velocities,

etc) at many points in a discretized mesh, r might be the residuals of both the discretized flow PDEs and boundary

conditions, and f might simply be the lift generated by a given design – easily computed by examining the solved-for

pressure at each mesh point.

Other variants of the adjoint method exist to find the gradients of intermediate quantities within programs

(e.g., [66]) or of functions integrated across time ( [67]); many applications break u or r into multiple quantities –

e.g., [22, 68] track flow and geometry variables in distinct state vectors, while [22, 67] have different functions for

flow residuals, grid deformation residuals, initial conditions, or boundary value constraints. Finally, while “discrete”

methods (including the one to follow) operate on the existing computational (already-discretized) quantities from the

forward problem, “continuous” methods begin with the underlying equations describing the physical system of interest.

In all cases, the goal is to find some gradient – in this case, $g \equiv \nabla_x f = df/dx$, where $d(\cdot)/dx$ represents the total derivative. Note that, unlike some other fields discussed in Chapter 2, in this context g is by convention a row vector, as is any other derivative of a scalar with respect to a vector. The quantities x, u, and r are column vectors; derivatives of these with respect to scalars remain column vectors, while derivatives of columns with respect to vectors yield Jacobian matrices indexed by the respective (row, column) indices.

For this problem formulation, derivation of the adjoint method begins by expanding the total derivative:

$$g = \frac{df(u(x), x)}{dx} = \frac{\partial f}{\partial u}\frac{du}{dx} + \frac{\partial f}{\partial x} \quad (3.5)$$

where $du/dx$ is the Jacobian of u with respect to x. Note that, as the bulk of the user program is assumed to go into finding u(x), this matrix may be difficult or impossible to find directly. Further note that, as u is presumably a converged PDE solution where each $u_i$ is correlated with the other $u_{j \neq i}$ (not to mention other quantities such as the geometry of the computational mesh, etc.), it is impossible to change any single $u_i$ in isolation – therefore, u(x) is more properly described as u(y(x)) for some unknown y(x), such that the partial-derivative Jacobian $\partial u/\partial x = 0$. This technicality can safely be ignored here, but will play a brief role in the second-order derivation in Section 3.1.3.

In the next step of the derivation, because u is assumed to be converged, r will always be zero – that is, any

change in x implies a corresponding change in u which satisfies

r(u(x) ,x) = 0 (3.6)

Because this quantity never changes, the total derivative is also zero:

$$\frac{dr}{dx} = \frac{\partial r}{\partial u}\frac{du}{dx} + \frac{\partial r}{\partial x} = 0 \quad (3.7)$$

Thus, by imposing the convergence criterion, one can indirectly extract an expression for $du/dx$:

$$\frac{du}{dx} = -\left(\frac{\partial r}{\partial u}\right)^{-1}\frac{\partial r}{\partial x} \quad (3.8)$$

Note that, as $\partial r/\partial x$ is a matrix and not a vector, actually computing $du/dx$ would require $N_x$ system solutions. However, it is known from Equation 3.5 that the full $du/dx$ is not required – only the projection of that matrix onto a row vector. Inserting Equation 3.8 into 3.5 yields:

$$g = \frac{df}{dx} = \left(-\frac{\partial f}{\partial u}\left(\frac{\partial r}{\partial u}\right)^{-1}\right)\frac{\partial r}{\partial x} + \frac{\partial f}{\partial x} \quad (3.9)$$

where the term in parenthesis has been singled out for special consideration. This term is a row vector times a matrix,

which yields another row vector; under a transpose, it becomes

$$\left(-\frac{\partial f}{\partial u}\left(\frac{\partial r}{\partial u}\right)^{-1}\right)^T = -\left(\frac{\partial r}{\partial u}\right)^{-T}\frac{\partial f}{\partial u}^T \quad (3.10)$$

which is simply the (column-vector) solution to a linear system of equations. Denote this solution as a vector λ of

“adjoint variables”,3 defined by what is known as the adjoint equation:

$$\frac{\partial r}{\partial u}^T \lambda = -\frac{\partial f}{\partial u}^T \quad (3.11)$$

Substituting λ into Equation 3.9 yields:

$$g = \lambda^T \frac{\partial r}{\partial x} + \frac{\partial f}{\partial x} \quad (3.12)$$

In practice, then, the adjoint method for finding the gradient across the entire program consists of three steps:

first, building (1) the intermediate gradients $\partial f/\partial x \in \mathbb{R}^{N_x}$ and $\partial f/\partial u \in \mathbb{R}^{N_u}$ and (2) the Jacobians $\partial r/\partial u \in \mathbb{R}^{N_u \times N_u}$ and $\partial r/\partial x \in \mathbb{R}^{N_u \times N_x}$; second, solving the adjoint equation 3.11 for λ; and third, inserting λ into Equation 3.12.

Conceptually, the adjoint variables can be thought of as a linearized relationship between the output and the residual function, just as the original solution state vector represents a relationship between the residual and the inputs.

Likewise, the matrix $\partial r/\partial u$, which – as will be shown below – is often already found by the original forward problem, can be thought of as representing the process by which the program finds u; the matrix $\partial r/\partial x$ as representing how the input parameters influence that process; the vector $\partial f/\partial u$ as representing how the output quantity of interest depends on the state vector; and the vector $\partial f/\partial x$ as representing any dependency of f on x which bypasses u – e.g., attempts by the user to penalize (negatively weight) extreme values in x. Note that by finding λ, the adjoint method has bypassed the difficulty of either solving for or differentiating $du/dx$, which would have represented the complicated dependence of the solution state on the input vector. Building the other Jacobians, in contrast, is far more tractable.

Take as a simple example a program that solves the canonical A(x)u = b(x) and assume the quantity of interest

3 Note that so-called adjoint variables, from which the adjoint method draws its name, are not actually specific to this method; instead, they generally describe intermediate variables in any reverse-mode differentiation method.

is the Euclidean norm of u:

$$u = A(x)^{-1} b(x) \quad (3.13)$$

$$f(x) = f(u) = \sqrt{u^T u} \quad (3.14)$$

The residual of the solution u is:

r(u,x) = A(x)u − b(x) (3.15)

The derivative of this quantity with respect to u is thus

$$\frac{\partial r}{\partial u} = A(x) \quad (3.16)$$

which was already built by the forward problem. The derivative with respect to x is

$$\frac{\partial r}{\partial x} = \frac{\partial\left(A(x)u\right)}{\partial x} - \frac{\partial b}{\partial x} \quad (3.17)$$

which can be defined elementwise as

$$\left(\frac{\partial r}{\partial x}\right)_{i,j} = \frac{\partial r_i}{\partial x_j} = \frac{\partial\left(\sum_k A_{i,k}(x)\,u_k\right)}{\partial x_j} - \frac{\partial b_i}{\partial x_j} = \sum_k \frac{\partial A_{i,k}}{\partial x_j}\,u_k - \frac{\partial b_i}{\partial x_j} \quad (3.18)$$

The two intermediate gradients of f are

$$\frac{\partial f}{\partial u} = \frac{u}{\sqrt{u^T u}} = \frac{u}{f(u)} \quad (3.19)$$

which is trivial to compute, and

$$\frac{\partial f}{\partial x} = 0 \quad (3.20)$$

because f does not depend directly on x.

In this example and in general, $\partial r/\partial x$ can be built in multiple ways, all of which require some form of differentiation – potentially, in fact, by one of the traditional tools of automatic differentiation that will be discussed in Section 3.2. More often, the user must manually differentiate the equations, either elementwise via Equation 3.18 or through some more efficient, problem-specific approach.
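A minimal NumPy sketch of this worked example may help fix the three steps in place. It assumes, purely for illustration, the parameterizations $A(x) = A_0 + \sum_k x_k A_k$ and $b(x) = b_0 + \sum_k x_k b_k$ (so that $\partial A/\partial x_k = A_k$ and $\partial b/\partial x_k = b_k$), with randomly generated data standing in for a real simulation; the adjoint gradient is checked against forward finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
Nu, Nx = 6, 3
A0 = rng.standard_normal((Nu, Nu)) + Nu * np.eye(Nu)   # keep A(x) well conditioned
Ak = rng.standard_normal((Nx, Nu, Nu))
b0 = rng.standard_normal(Nu)
bk = rng.standard_normal((Nx, Nu))

A = lambda x: A0 + np.einsum('k,kij->ij', x, Ak)
b = lambda x: b0 + bk.T @ x

def f_of_x(x):
    u = np.linalg.solve(A(x), b(x))        # forward problem: r(u, x) = A(x) u - b(x) = 0
    return np.sqrt(u @ u)                  # quantity of interest, Eq. 3.14

def adjoint_gradient(x):
    u = np.linalg.solve(A(x), b(x))
    dfdu = u / np.sqrt(u @ u)              # Eq. 3.19
    lam = np.linalg.solve(A(x).T, -dfdu)   # adjoint equation, Eq. 3.11 (dr/du = A)
    # g_k = lambda^T dr/dx_k + df/dx_k, with dr/dx_k = A_k u - b_k and df/dx = 0
    return np.array([lam @ (Ak[k] @ u - bk[k]) for k in range(Nx)])

x = rng.standard_normal(Nx)
g = adjoint_gradient(x)
g_fd = np.array([(f_of_x(x + 1e-6 * np.eye(Nx)[k]) - f_of_x(x)) / 1e-6 for k in range(Nx)])
print(np.max(np.abs(g - g_fd)))            # small, limited only by FD truncation error
```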

3.1.3 Second-Order Adjoints

Second-order (Hessian) adjoint methods cannot, unfortunately, run in constant ($\mathcal{O}(1)$) time. To understand why,

think of a first-order method as running the original problem in reverse, producing one output for every original input.

Now conceptualize a second-order approach as applying a second analytical adjoint methodology to the results of the

first – that is, as a method that considers the gradient to be $N_x$ different outputs of a new function, then finds the gradient of each of these outputs to build the Hessian matrix. Such a process necessarily has complexity $\mathcal{O}(N_x)$, as

reverse-mode methods are linear in the number of outputs. This is still vastly superior to any forward-mode method, which would have complexity $\mathcal{O}(N_x^2)$.

More broadly than this adjoint-of-an-adjoint approach, discussions of second-order methods in the literature

often (e.g., [25,68,69]) present four different approaches at once, derived by different combinations and orderings of (1)

the direct differentiation of either the forward equations or some intermediate adjoint equation and (2) the introduction

of new adjoint variables. What follows is an independent derivation of what has uniformly been found to be the

most efficient approach: the direct second-order differentiation of the forward equations, followed afterward by the

introduction of an adjoint vector; methods such as this are often simply presented in isolation (e.g., [22, 70]) without

discussion of the other approaches. Note that the three omitted methods require at least the same amount of derivation

and implementation work, if not more; the method presented here is not more analytically advanced, per se – it is

simply more cost-effective.

Before beginning the derivation, note that it necessarily includes rank-three tensors (the derivatives of vectors

with respect to two other vectors). As such, it becomes convenient to adopt an index notation to differentiate individual

scalars at a time, using the indices (i, j) to index into x and (m, n) to index into u – that is, i, j ∈ {1, … ,Nx} and m, n ∈ {1, … ,Nu}. When practical, the resulting equations will be reformulated in ways that take as much advantage of matrix arithmetic as possible.

The derivation begins with the same total derivative operator used in Equation 3.5:

$$\frac{d(\cdot)}{dx} = \frac{\partial(\cdot)}{\partial u}\frac{du}{dx} + \frac{\partial(\cdot)}{\partial x} \quad (3.21)$$

$$\frac{d(\cdot)}{dx_i} = \sum_m \frac{\partial(\cdot)}{\partial u_m}\frac{du_m}{dx_i} + \frac{\partial(\cdot)}{\partial x_i} \quad (3.22)$$

$$\frac{d}{dx_i}\left(f(u(x), x)\right) = \sum_m \frac{\partial f}{\partial u_m}\frac{du_m}{dx_i} + \frac{\partial f}{\partial x_i} \quad (3.23)$$

Applying this operator a second time yields:

$$\frac{d^2 f}{dx_i dx_j} = \frac{d}{dx_j}\left[\sum_m \frac{\partial f}{\partial u_m}\frac{du_m}{dx_i} + \frac{\partial f}{\partial x_i}\right] \quad (3.24)$$

$$= \sum_m \left[\frac{d}{dx_j}\!\left(\frac{\partial f}{\partial u_m}\right)\frac{du_m}{dx_i} + \frac{\partial f}{\partial u_m}\frac{d^2 u_m}{dx_i dx_j}\right] + \frac{d}{dx_j}\!\left(\frac{\partial f}{\partial x_i}\right) \quad (3.25)$$

$$= \sum_m \left[\frac{\partial}{\partial u_m}\!\left(\frac{df}{dx_j}\right)\frac{du_m}{dx_i} + \frac{\partial f}{\partial u_m}\frac{d^2 u_m}{dx_i dx_j}\right] + \frac{\partial}{\partial x_i}\!\left(\frac{df}{dx_j}\right) \quad (3.26)$$

where the two instances of $df/dx_j$ are entries of the original gradient, which can be expanded to yield:

$$\frac{d^2 f}{dx_i dx_j} = \sum_m \left\{\frac{\partial}{\partial u_m}\!\left[\sum_n \frac{\partial f}{\partial u_n}\frac{du_n}{dx_j} + \frac{\partial f}{\partial x_j}\right]\frac{du_m}{dx_i} + \frac{\partial f}{\partial u_m}\frac{d^2 u_m}{dx_i dx_j}\right\} + \frac{\partial}{\partial x_i}\!\left[\sum_n \frac{\partial f}{\partial u_n}\frac{du_n}{dx_j} + \frac{\partial f}{\partial x_j}\right] \quad (3.27)$$

$$= \sum_m \left(\sum_n \left[\frac{\partial^2 f}{\partial u_m \partial u_n}\frac{du_n}{dx_j} + \frac{\partial f}{\partial u_n}\cancel{\frac{d}{dx_j}\frac{\partial u_n}{\partial u_m}}\right]\right)\frac{du_m}{dx_i} + \sum_m \left[\frac{\partial^2 f}{\partial x_j \partial u_m}\frac{du_m}{dx_i} + \frac{\partial f}{\partial u_m}\frac{d^2 u_m}{dx_i dx_j}\right] + \sum_n \left[\frac{\partial^2 f}{\partial x_i \partial u_n}\frac{du_n}{dx_j} + \frac{\partial f}{\partial u_n}\cancel{\frac{d}{dx_j}\frac{\partial u_n}{\partial x_i}}\right] + \frac{\partial^2 f}{\partial x_i \partial x_j} \quad (3.28)$$

The first crossed-out term is zero because individual elements of u are not direct functions of each other; the second is zero because (as mentioned above) u(x) is in actuality u(y(x)) for some unknown y, and hence, all partial derivatives with respect to x are zero; only the total derivative is non-zero. The remaining equation simplifies to:

$$\frac{d^2 f}{dx_i dx_j} = \sum_m \sum_n \frac{\partial^2 f}{\partial u_m \partial u_n}\frac{du_n}{dx_j}\frac{du_m}{dx_i} + \sum_m \left[\frac{\partial^2 f}{\partial x_j \partial u_m}\frac{du_m}{dx_i} + \frac{\partial^2 f}{\partial x_i \partial u_m}\frac{du_m}{dx_j} + \frac{\partial f}{\partial u_m}\frac{d^2 u_m}{dx_i dx_j}\right] + \frac{\partial^2 f}{\partial x_i \partial x_j} \quad (3.29)$$

This can be expressed in somewhat more compact form using matrices, but to avoid the rank-three tensor $d^2u/dx^2$, derivatives must still be taken one at a time:

$$\frac{d^2 f}{dx_i dx_j} = \frac{du}{dx_i}^T \frac{\partial^2 f}{\partial u^2}\frac{du}{dx_j} + \frac{\partial^2 f}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 f}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial f}{\partial u}\frac{d^2 u}{dx_i dx_j} + \frac{\partial^2 f}{\partial x_i \partial x_j} \quad (3.30)$$

Here, $\partial^2 f/\partial u^2$ is the partial-derivative Hessian of f with respect to u, while $\partial^2 f/(\partial u\,\partial x_i)$ can be considered the derivative of $\partial f/\partial u$ with respect to $x_i$, and $\partial^2 f/(\partial x_i \partial x_j)$ is a single entry from the partial-derivative Hessian of f with respect to x (which, again, is only non-zero when f depends directly on x in a way independent of u). While the calculation of these quantities is laborious to implement, it is computationally tractable.

Next, $du/dx_i$ is the ith column of $du/dx$ – note the repeated use of entries from this matrix. In the first-order case, computing $du/dx$ (via Equation 3.8) was considered too costly, as the complexity was $\mathcal{O}(N_x)$; however, as discussed, this is the same complexity one would already expect for a second-order adjoint method. Moreover, while the first-order instance of the matrix was projected onto a vector, in this instance every column is required – use of the entire matrix is unavoidable.

Once these first derivatives $du/dx$ are found from Equation 3.8, all that remains is to find the second derivatives $d^2u/(dx_i\,dx_j)$; here, just as in the first-order case, a substitution can be found by exploiting the fact that r is constant for converged u, with zero first and second total derivatives. Note here that, as $\partial^2 r/\partial u^2$ is a rank-three tensor, a summation of matrix "slices" is required, but the second total derivative otherwise has the same form as that for f:

$$\frac{d^2 r}{dx_i dx_j} = \sum_{m=1}^{N_u}\frac{du_m}{dx_i}\frac{\partial^2 r}{\partial u\,\partial u_m}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 r}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial r}{\partial u}\frac{d^2 u}{dx_i dx_j} + \frac{\partial^2 r}{\partial x_i \partial x_j} = 0 \quad (3.31)$$

$$\frac{d^2 u}{dx_i dx_j} = -\left(\frac{\partial r}{\partial u}\right)^{-1}\left[\sum_{m=1}^{N_u}\frac{du_m}{dx_i}\frac{\partial^2 r}{\partial u\,\partial u_m}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 r}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial x_i \partial x_j}\right] \quad (3.32)$$

Inserting this into Equation 3.30:

$$\frac{d^2 f}{dx_i dx_j} = \frac{du}{dx_i}^T \frac{\partial^2 f}{\partial u^2}\frac{du}{dx_j} + \frac{\partial^2 f}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 f}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial^2 f}{\partial x_i \partial x_j} - \frac{\partial f}{\partial u}\left(\frac{\partial r}{\partial u}\right)^{-1}\left[\sum_{m=1}^{N_u}\frac{du_m}{dx_i}\frac{\partial^2 r}{\partial u\,\partial u_m}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 r}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial x_i \partial x_j}\right] \quad (3.33)$$

A close examination reveals that the terms in brackets are premultiplied by the same expression from Equations 3.9 – 3.11 used to build the first-order adjoint variables:

$$-\frac{\partial f}{\partial u}\left(\frac{\partial r}{\partial u}\right)^{-1} = \left(-\left(\frac{\partial r}{\partial u}\right)^{-T}\frac{\partial f}{\partial u}^T\right)^T = \lambda^T \quad (3.34)$$

Thus, this most common type of Hessian adjoint method begins with Nx linear solves of the system

$$\frac{\partial r}{\partial u}\frac{du}{dx} = -\frac{\partial r}{\partial x} \quad (3.35)$$

in order to compute each column of $du/dx$, followed by a single solve of the first-order adjoint equation

$$\frac{\partial r}{\partial u}^T \lambda = -\frac{\partial f}{\partial u}^T \quad (3.36)$$

for a total cost proportional to Nx + 1 runs of the original problem. After the solves, the final Hessian can be computed

by inserting $du/dx$ and λ into

$$\frac{d^2 f}{dx_i dx_j} = \frac{du}{dx_i}^T \frac{\partial^2 f}{\partial u^2}\frac{du}{dx_j} + \frac{\partial^2 f}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 f}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial^2 f}{\partial x_i \partial x_j} + \lambda^T\left[\sum_{m=1}^{N_u}\frac{du_m}{dx_i}\frac{\partial^2 r}{\partial u\,\partial u_m}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial u\,\partial x_j}\frac{du}{dx_i} + \frac{\partial^2 r}{\partial u\,\partial x_i}\frac{du}{dx_j} + \frac{\partial^2 r}{\partial x_i \partial x_j}\right] \quad (3.37)$$
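Continuing the sketch from Section 3.1.2 (same assumed parameterization $A(x) = A_0 + \sum_k x_k A_k$, $b(x) = b_0 + \sum_k x_k b_k$, chosen so that $\partial^2 r/\partial u^2$ and $\partial^2 r/\partial x^2$ vanish and $\partial^2 r/(\partial u\,\partial x_j) = A_j$), the $N_x$ linear solves plus one adjoint solve of this subsection assemble the full Hessian; the exact symmetry of the result provides a basic sanity check.

```python
import numpy as np

def adjoint_hessian(x, A0, Ak, b0, bk):
    """Sketch of the second-order adjoint formula (Eq. 3.37) for r = A(x) u - b(x),
    f = ||u||, with A and b affine in x so most second-derivative tensors vanish."""
    Nx = len(x)
    A = A0 + np.einsum('k,kij->ij', x, Ak)
    u = np.linalg.solve(A, b0 + bk.T @ x)
    nrm = np.sqrt(u @ u)
    lam = np.linalg.solve(A.T, -u / nrm)                  # adjoint solve, Eq. 3.36
    # N_x solves for the columns of du/dx (Eq. 3.35): A du/dx_k = b_k - A_k u
    dudx = np.column_stack([np.linalg.solve(A, bk[k] - Ak[k] @ u) for k in range(Nx)])
    d2f_du2 = (np.eye(len(u)) - np.outer(u, u) / (u @ u)) / nrm   # Hessian of ||u|| w/r/t u
    H = np.zeros((Nx, Nx))
    for i in range(Nx):
        for j in range(Nx):
            H[i, j] = dudx[:, i] @ d2f_du2 @ dudx[:, j] \
                      + lam @ (Ak[j] @ dudx[:, i] + Ak[i] @ dudx[:, j])
    return H

# hypothetical data, matching the first-order sketch
rng = np.random.default_rng(0)
Nu, Nx = 6, 3
A0 = rng.standard_normal((Nu, Nu)) + Nu * np.eye(Nu)
Ak = rng.standard_normal((Nx, Nu, Nu))
b0, bk = rng.standard_normal(Nu), rng.standard_normal((Nx, Nu))
x = rng.standard_normal(Nx)
H = adjoint_hessian(x, A0, Ak, b0, bk)
print(np.max(np.abs(H - H.T)))    # exactly symmetric up to round-off
```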

While the first-order adjoint method presented earlier only required the user (either manually or via an application of automatic differentiation to the relevant program sections) to provide two row vectors ($\partial f/\partial x \in \mathbb{R}^{N_x}$ and $\partial f/\partial u \in \mathbb{R}^{N_u}$) and two matrices ($\partial r/\partial u \in \mathbb{R}^{N_u \times N_u}$ and $\partial r/\partial x \in \mathbb{R}^{N_u \times N_x}$), this second-order method also requires three additional matrices ($\partial^2 f/\partial x^2 \in \mathbb{R}^{N_x \times N_x}$, $\partial^2 f/\partial u^2 \in \mathbb{R}^{N_u \times N_u}$, and $\partial^2 f/(\partial u\,\partial x) \in \mathbb{R}^{N_u \times N_x}$) and three rank-three tensors ($\partial^2 r/\partial x^2 \in \mathbb{R}^{N_u \times N_x \times N_x}$, $\partial^2 r/\partial u^2 \in \mathbb{R}^{N_u \times N_u \times N_u}$, and $\partial^2 r/(\partial u\,\partial x) \in \mathbb{R}^{N_u \times N_u \times N_x}$). In many cases, a number of these terms will be zero (e.g., if some subsection of the program has been linearized), but this cannot be guaranteed in general. When they must be computed, the tensors might easily be too large to fit in memory, implying the need for a user-written function to build temporary subsets on the fly (e.g., the matrix slice $\partial^2 r/(\partial u\,\partial u_m)$ from the summation over m in Equation 3.37) even if such computations are redundant (e.g., repeated identically for all i, j in 3.37).

Beyond the burden of providing these quantities, as presented here (and in the broader literature) this approach suffers from poor performance when finding subsets of the Hessian. For a single scalar entry $H_{i,j}(x) = d^2f/(dx_i\,dx_j)$, the method requires two columns of $du/dx$ and thus a total of three system solves, counting the adjoint; worse, at times a single entry may call for entire $N_u \times N_u$ residual derivative matrices or even an $N_u \times N_u \times N_u$ tensor. For a column of the Hessian, all of $du/dx$ is required, and thus each of the $N_x + 1$ solves that would be required for the full Hessian.

While it is theoretically possible to re-derive an alternate approach that utilizes directional derivatives for efficient calculation of individual columns (or Hessian-vector products, which will be described in the next section), this author has yet to see such an approach.4 Presenting a novel derivation is beyond the scope of this work, especially as the goal herein is to present easy-to-implement alternatives.

4 Hessian-vector methods do exist for scalar AD; the discussion here is referring to matrix problems.

3.1.4 Directional Derivatives and Implicit Products

In 1993, Pearlmutter [19] presented an intrusive differential operator, intended to be paired with neural network

backpropagation, that is capable of exactly finding the product Hv of the Hessian and a vector without ever building

the full matrix. As already stated in Chapter 1, one way for engineers to understand the appeal of this simplification is

to compare it to the matrix system solution x = A−1b. Even though this expression includes A−1, such an inverse is

usually too expensive to compute and is almost never found in its entirety. Instead, most solver algorithms can directly

find the matrix-vector product without ever explicitly building the inverse.

Importantly, unlike the second-order adjoint method presented above, each use of this operator has the same

order of complexity as the algorithm (e.g., for Pearlmutter, backpropagation) used to find the gradient. Also note that

this capability includes as a subset the computation of a single column of H, via the simple expedient of setting v to a vector in a cardinal direction.

Pearlmutter derived his operator using a Taylor series expansion similar to Equation 2.7, expressing the gradient near a known point x as

$$\nabla f(x + \Delta x) = \nabla f(x) + H\,\Delta x + \mathcal{O}\!\left(\|\Delta x\|^2\right) \quad (3.38)$$

and solved for the Hessian-vector product $H\,\Delta x$:

$$H\,\Delta x = \nabla f(x + \Delta x) - \nabla f(x) + \mathcal{O}\!\left(\|\Delta x\|^2\right) \quad (3.39)$$

Setting $\Delta x = hv$ and solving for Hv then yields

$$Hv = \frac{\nabla f(x + hv) - \nabla f(x)}{h} + \mathcal{O}(h) \quad (3.40)$$

which is a simple finite difference formula between two evaluations of the gradient. (This is the same approximation

used by Andrei in [15].) To calculate precise values, Pearlmutter then defined the operator

$$\mathcal{R}_v\{\nabla f(x)\} \equiv \left.\frac{\partial}{\partial r}\nabla f(x + rv)\right|_{r=0} = Hv \quad (3.41)$$

$$\mathcal{R}_v\{g(y)\} \equiv \left.\frac{\partial}{\partial r}\,g(y + rv)\right|_{r=0} \quad (3.42)$$

which in its general form (Equation 3.42) can be manually applied line-by-line to every instruction in existing backpropagation codes, or to other cases where intrusive implementation is feasible. For example, to apply the operator to a single scalar multiplication within a code:

$$\mathcal{R}_v\{a(x)\,b(x)\} = \mathcal{R}_v\{a(x)\}\,b(x) + a(x)\,\mathcal{R}_v\{b(x)\} \quad (3.43)$$

He then suggested multiple applications, including both the line-search update step from Equation 2.11 and a pairing with linear conjugate gradients to find the Newton update step, which of course forms the core of Hessian-free optimization from Section 2.1.3. (Note that these suggestions were phrased in terms of machine learning, rather than optimization in general.)
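For context, a quick sketch of the approximate form in Equation 3.40 (the finite-difference stand-in, not Pearlmutter's exact operator), using NumPy and an arbitrary quadratic test function whose Hessian is known:

```python
import numpy as np

def hess_vec_fd(grad_f, x, v, h=1e-6):
    # Equation 3.40: Hv ~= [grad f(x + h v) - grad f(x)] / h, accurate to O(h)
    return (grad_f(x + h * v) - grad_f(x)) / h

# test on f(x) = 0.5 x^T M x, whose Hessian is exactly M
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)); M = M + M.T
grad_f = lambda x: M @ x
x, v = rng.standard_normal(4), rng.standard_normal(4)
print(np.max(np.abs(hess_vec_fd(grad_f, x, v) - M @ v)))   # small, limited by FD error
```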

Since then, many AD tools have taken on the ability to reformulate problems in the same form as in Equation 3.41, extending the option for implicit (cheap) Hv to many other scalar codes. In fact, this could be considered an extension of a pre-existing AD capability (e.g., [42]) to implicitly find directional scalar first derivatives (gradient-vector products), directional gradients (Jacobian-vector products), or even Jacobians of multidimensional input subspaces (Jacobian-matrix products). Note that these approaches are unfortunately inapplicable to the adjoint method, as it is built upon an inversion of the purely internal Jacobian $\partial r/\partial u$, which is totally independent of the number of inputs or outputs.

3.2 Standard Automatic Differentiation

It is true that when high-precision derivatives are called for, at least some modification of a user’s source code

becomes necessary. Often, however, users do not wish to derive and implement these calculations manually, as in the preceding sections – in many practical applications, the implementation time would outweigh the usage time.

This is the need served by automatic differentiation (AD) tools, of which there are simply too many to give a truly comprehensive survey here.5 Traditionally, these tools use either automated source code modification (e.g., [1–3, 7]) or a new operator-overload data type (e.g., [4–6]), both of which approaches can come in forward- and reverse-mode

(e.g., [72]) variants. These methods are, admittedly, sometimes less efficient than derivatives found via manual coding, but are instead designed with an eye toward fast (or, at worst, less-slow) implementation. Forward-mode variants, as expected, favor decreased implementation time at the cost of increased run time, while reverse-mode methods gain more efficient calculation by requiring more human effort (yet remain easier than manual reverse calculation).

AD packages are always built under the assumption that they will have access to a user’s raw source code or the derivatives of user sub-functions; as a result, in most cases they are not designed with matrix operations in mind.6

However, despite the matrix-focused nature of this work, a brief overview of standard AD approaches is helpful for the simple reason that matrix operations are almost never used in isolation. If derivatives are to be propagated across these operations, a traditional (scalar) AD tool must bridge the differentiation gap between program initialization and matrix function call and between matrix result and program output. This section will focus solely on first-order, single- derivative (or, in the case of reverse mode, single-gradient) methods; once this foundation is complete, the jump to higher-order multivariate derivatives in Section 3.3 will be relatively simple.

The concept of using one program to automatically differentiate another dates back to at least Beda et al's work in 1959 ( [73], as cited in [72]). In the early decades of the field, most of the work focused on the forward mode of calculation; in many cases, the process was symbolic, yielding at program's end a (typically extremely long and expensive) algebraic expression for the derivative. Numeric methods that simply applied the chain rule appear to have come into prominence in the 1980s [72], along with the introduction of the reverse mode of automatic computation

[72, 74].

Most AD theory uses a simplified view of underlying programs, expressing them as a linear sequence of many commands; such a stylized program is shown in the pseudocode of Algorithm 3.1, along with detailed comments. It

5 The AD community portal www.autodiff.org lists 30 tools for C/C++, 15 for Fortran77, and 6 for MATLAB. [71]

6 Several exceptions to this rule will be discussed in Section 3.4; in brief, they amount to matrix packages built with AD in mind, rather than the other way around.

is perhaps most intuitive to demonstrate the various AD approaches by showing their modifications to this program –

hence the color-coding to indicate the original source code, which will be contrasted with new or modified AD code.

Algorithm 3.1 Pseudocode model for programs to which typical AD is applied. Blue shading indicates external inter- face for the function or program; gray shading indicates open-source user code.

Define: S // Set of scalar data types, e.g., float, complex
        g_i(⋅) : S^{N_i} → S // Command at step i, e.g., a1=x1*x2; a2=exp(x3);
1. Given: x ∈ S^{N_x} // Input parameters, e.g., f(double xvec[5], float x6);
2. Execute:
   y^{(1)} ← x // Initialize set of internal program variables
   for i = 1 … N_C: // Program expressed as a sequence of commands
     a_i ← g_i(z^{(i)}) : z^{(i)} ⊆ y^{(i)} // Function of some subset of previous results
     y^{(i+1)} ← {y^{(i)}, a_i} // Add result to internal set
3. Output: f(x) ← a_{N_C} ∈ S // Quantity of interest

3.2.1 Forward-Mode AD

The simplest mode of AD computes, in sequence, the derivative of each intermediate variable ai with respect to

one or more inputs (i.e., independent differentiation variables xj) via repeated application of the chain rule. For each

desired derivative $d(\cdot)/dx_j$, a set of new variables is introduced – in the single-derivative case, one new $\dot{a}_i$ for each $a_i$ in the original program. The first such variables are the derivatives of the differentiated inputs with respect to themselves – i.e., $\dot{x}_j = dx_j/dx_j = 1$. In the cases of undifferentiated inputs, the paired initial derivative variables are set to zero – i.e., no inputs are assumed to be functions of each other, so that $\dot{x}_k = dx_k/dx_j = 0$ for $k \neq j$. The chain rule applied to the first user operation amounts to:

$$\dot{a}_1 = \frac{d}{dx_j}\,g_1(x_j) \quad (3.44)$$

$$= \left.\frac{\partial g_1}{\partial y}\frac{dy}{dx_j}\right|_{y = x_j} \quad (3.45)$$

$$= \frac{\partial g_1}{\partial x_j}\,\dot{x}_j \quad (3.46)$$

The calculation is repeated for all following operations:

$$\dot{a}_{i+1} = \frac{d}{dx_j}\,g_{i+1}(g_i(\cdot)) \quad (3.47)$$

$$= \frac{\partial g_{i+1}}{\partial g_i}\frac{dg_i}{dx_j} \quad (3.48)$$

$$= \frac{\partial g_{i+1}}{\partial g_i}\,\dot{a}_i \quad (3.49)$$

When implemented explicitly, this algorithm takes the form shown in Algorithm 3.2, which also allows for multiple inputs to each intermediate operation gi. Note that explicit implementation is exactly the approach taken by source-code modification tools; yellow highlighting in the pseudocode indicates new commands automatically inserted by an AD tool.

Algorithm 3.2 Pseudocode for forward-mode automatic differentiation via the source-modification approach. Blue shading indicates external interface for the function or program; gray shading indicates open-source user code; yellow shading indicates new code generated by AD. Every line of user code is followed by one or more lines of generated AD code.

Define: S // Set of scalar data types, e.g., float, complex
        g_i(⋅) : S^{N_i} → S // Command at step i, e.g., a1=x1*x2; a2=exp(x3);
1. Given: x ∈ S^{N_x} // Input parameters, e.g., f(double xvec[5], float x6);
          j ∈ 1 … N_x // Index of target derivative, i.e., find df/dx_j
2. Execute:
   y^{(1)} ← x // Initialize set of internal program variables
   ẏ^{(1)} ← 0 // Initialize set of derivatives w/r/t x_j
   ẏ_j^{(1)} ← 1 // Derivative of x_j w/r/t itself is 1
   for i = 1 … N_C: // User program expressed as a sequence of commands
     a_i ← g_i(z^{(i)}) : z^{(i)} ⊆ y^{(i)} // Function of some subset of previous results
     ȧ_i ← Σ_k (ż_k^{(i)})(∂g_i/∂z_k^{(i)}) // Apply chain rule (total derivative) to user code
     y^{(i+1)} ← {y^{(i)}, a_i} // Add result to internal set
     ẏ^{(i+1)} ← {ẏ^{(i)}, ȧ_i} // Add derivative of result to internal set
3. Output: f(x) ← a_{N_C} ∈ S // Quantity of interest
           df/dx_j ← ȧ_{N_C} ∈ S // Derivative of QoI

3.2.2 Reverse-Mode AD

A derivation7 of the reverse mode can begin with the observation that, for the sequence of operations shown

in Algorithm 3.1, the whole function f(xj) can be expressed as a function of any intermediate set of internal variables

$y^{(i)}(x_j)$:

$$f(x_j) = f\!\left(y^{(i)}(x_j)\right) \quad (3.50)$$

$$y^{(i)} = [a_1, a_2, \ldots, a_i] \quad (3.51)$$

$$a_n = g_n\!\left(y^{(n)}\right) \quad (3.52)$$

Consider the chain rule for another intermediate variable ak which came before step i:

$$\frac{\partial f}{\partial a_k} = \sum_{m=k}^{i-1}\frac{\partial f}{\partial a_m}\frac{\partial g_m}{\partial a_k} \quad (3.53)$$

Define this derivative as an adjoint variable ̄ak. This yields the inductive relation

$$\bar{a}_k = \sum_{m=k}^{i-1}\bar{a}_m\frac{\partial g_m}{\partial a_k} \quad (3.54)$$

In contrast to the forward-mode variables ̇ak, which represented the cumulative impact of all preceding calcu-

lations on the derivative ak, the adjoint variable ̄ak represents the cumulative impact of ak on upcoming calculations.

More than that, if the future adjoint variables $\bar{a}_m$ are known, $\bar{a}_k$ can be expressed in terms that do not rely on the output f directly. Noting that the derivative of the final atomic operation $f = a_{N_C}$ with respect to itself is one, it can be seen that one can simply set the final adjoint $\bar{a}_{N_C} = 1$ and calculate every other adjoint variable in reverse order. After every operation stretching back to $g_1$ has been differentiated, one will have obtained the adjoints of every input variable – i.e., the entire gradient of f(x). Pseudocode for this process is shown in Algorithm 3.3.

7 This derivation generally follows the logic and notation from [72].

The reverse mode is quite elegant, in that it can compute an arbitrarily-large gradient with only one new adjoint operation per original (forward-problem) operation – indeed, as shown in [72], even a very simple automatic implementation is guaranteed to have a cost less than five times that of the original program. However, the approach suffers from a major drawback: the derivatives $\partial g_m/\partial a_k$ in Equation 3.54 are functions of the forward variables $y^{(m)}$ – in other words, every single intermediate variable from the forward problem must be saved after its initial calculation. While the stylized program in Algorithm 3.1 was represented as storing the entire vector $y^{(N_C)}$, this was only shown for the sake of analytic simplicity; for programs of significant size, it can quickly become impractical. Nevertheless, reverse differentiation can still be put to important use in engineering applications – e.g., by differentiating smaller sub-functions to build the intermediate Jacobians and gradients needed by the (matrix) adjoint method discussed in Section 3.1.2 and

Section 3.1.3.

Algorithm 3.3 Pseudocode for reverse-mode automatic differentiation. Light blue shading indicates external interface for the function or program; gray shading indicates open-source user code; yellow shading indicates new code generated by AD, which generates the entire gradient at the cost of storing all forward variables (see text).

Define: S // Set of scalar data types, e.g., float, complex
        g_i(⋅) : S^{N_i} → S // Command at step i, e.g., a1=x1*x2; a2=exp(x3);
1. Given: x ∈ S^{N_x} // Input parameters, e.g., f(double xvec[5], float x6);
2. Execute:
   y^{(1)} ← x // Initialize set of internal program variables
   for i = 1 … N_C: // Program expressed as a sequence of commands
     a_i ← g_i(z^{(i)}) : z^{(i)} ⊆ y^{(i)} // Function of some subset of previous results
     y^{(i+1)} ← {y^{(i)}, a_i} // Add result to internal set
3. Output: f(x) ← a_{N_C} ∈ S // Quantity of interest
4. Store: y^{(N_C)} ∈ S^{N_C + N_x} // All internal variables must be retained
5. Reverse:
   ȳ ← 0 ∈ S^{N_C + N_x} // Initialize adjoints of all variables
   ȳ_end ← 1 // Adjoint of output is 1
   for i = N_C, …, 1: // Each user command examined in reverse
     ∀ z_k^{(i)} ∈ z^{(i)} where z_k^{(i)} = y_m: // For each forward variable used as input to g_i
       ȳ_m ← ȳ_m + ȳ_i (∂g_i(z^{(i)})/∂z_k^{(i)}) // Piecewise summation of chain rule
6. Output: ∇f ← [ȳ_1, …, ȳ_{N_x}] // Gradient of f(x)
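A minimal operator-overloading "tape" in Python (a sketch, not any particular AD package) makes the reverse sweep of Algorithm 3.3 concrete: every forward operation records its inputs and local partial derivatives, and the adjoints are then accumulated in reverse order.

```python
TAPE = []

class Var:
    """Minimal reverse-mode AD variable: each operation records its inputs and
    local partial derivatives on a global tape (Algorithm 3.3's stored forward data)."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.adj = value, parents, 0.0
        TAPE.append(self)

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

def backward(output):
    output.adj = 1.0                           # adjoint of the output is 1
    for node in reversed(TAPE):                # reverse sweep over recorded operations
        for parent, partial in node.parents:
            parent.adj += node.adj * partial   # piecewise chain rule, Eq. 3.54

# f(x1, x2) = x1 * x2 + x1, so df/dx1 = x2 + 1 and df/dx2 = x1
x1, x2 = Var(3.0), Var(4.0)
f = x1 * x2 + x1
backward(f)
print(x1.adj, x2.adj)                          # prints 5.0 3.0
```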

3.2.3 Automatic Source Modification vs Operator Overloading

Two broad approaches exist to implement either the forward- or reverse-mode methodology – namely, source modification and operator overloading. In the first approach, another program is used to process user source code 46 and generate a new program with additional code; in the second, a new data type is introduced to replace real-valued variables in the user program.

The two methods have different advantages. Source modification can be more efficient at run-time because the generating program can take a "long view" of many operations at once, searching for possible shortcuts, whereas operator overloading must work naively on an element-by-element basis – after all, real-world programs do not actually have the daisy-chain sequential dependency of Algorithm 3.1; instead, the dependency of one function upon another is better represented as a directed acyclic graph (DAG), which opens up an entire field of optimization approaches rooted in graph theory. This long-view feature can also make for more parallelizable code. Finally, older programming languages such as Fortran77 do not even support new data types. The source-modification approach generally follows the explicit forward or reverse algorithms already shown in Algorithm 3.2.

Operator-overloaded types, in contrast, lean even harder into the “automatic” part of AD, trading some efficiency for even easier implementation. All or most differentiation is encapsulated away from the user’s sight; every time the original code performs an operation, the overloaded variables used by the operation compute the relevant derivative formula, the results of which are stored alongside the operation’s output variable(s). This encapsulation also makes AD easier to use during code development – although source modification is automatic, if the user wishes to change his or her underlying code, the AD tool must be run again after every change. In the forward mode, operator overloading can be represented as shown in Algorithm 3.4.

The overloading approach can also implement reverse-mode differentiation, in a way almost identical to Algo- rithm 3.3. In this mode, the forward sweep of the program still uses overloaded types; however, rather than calculate derivatives, the overloaded variables ̂ai save the real values ai to hard-disk along with a label to identify each operation gi(⋅). The collected set of program data, known as a “tape”, is then used to run the reverse problem.

In this work, it is generally assumed (in pseudocode and elsewhere) that scalar user code is differentiated with a forward-mode overloaded type, as it has a similar "set it and forget it" philosophy to NMD. However, there is no reason that source-modification or reverse tools cannot be used to differentiate non-matrix sections of a program.

Algorithm 3.4 Pseudocode for forward-mode automatic differentiation via operator overloading. Light blue shading indicates external interface for the function or program; gray shading indicates open-source user code; yellow shading indicates new data types or new code generated by AD. By defining and initializing new datatypes, the user source is left relatively intact.

Define: S // Set of scalar data types, e.g., float, std::complex
        A ≡ S × S // Set of AD data types, e.g., ADOLC::adouble [75]
        v̂ ∈ A // Overloaded instance of variable v, i.e., v̂ ≡ {v, v̇}
        g(⋅) : S^{N_i} → S // Original command, e.g., a1=x1*x2; a2=exp(x3);
        ĝ(⋅) : A^{N_i} → A // Command overloaded w/ chain rule, i.e., ĝ(v) ≡ {g(v), Σ_k (ż_k^{(i)})(∂g_i/∂z_k^{(i)})}
1. Given: x ∈ S^{N_x} // Input parameters, e.g., f(double xvec[5], float x6);
          j ∈ 1 … N_x // Index of target derivative, i.e., find df/dx_j
2. Execute:
   ŷ^{(1)} ← [x, 0] // Initialize overloaded internal program variables
   ŷ_j^{(1)} ← [x_j, 1] // Derivative of x_j w/r/t itself is 1
   for i = 1 … N_C: // User program expressed as a sequence of commands
     â_i ← ĝ_i(ẑ^{(i)}) : ẑ^{(i)} ⊆ ŷ^{(i)} // Function of some subset of previous results
     ŷ^{(i+1)} ← {ŷ^{(i)}, â_i} // Add result to internal set
   f̂ ← â_{N_C} ∈ A // Overloaded quantity of interest
3. Output: f(x) ← f̂_0 ∈ S // Base (undifferentiated) QoI
           df/dx_j ← f̂_1 ∈ S // Derivative of QoI
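The corresponding overloaded type is straightforward to sketch; the following dual-number class (plain Python, not any particular AD library) mirrors Algorithm 3.4: each overloaded value carries {v, v̇} and every operation applies the chain rule in place.

```python
import math

class Dual:
    """Forward-mode AD variable: carries the value and one derivative, {v, v_dot}."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):                       # product rule
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def exp(u):                                         # overloaded elementary function
    return Dual(math.exp(u.val), math.exp(u.val) * u.dot)

# differentiate f(x1, x2, x3) = x1*x2 + exp(x3) with respect to x1:
x1, x2, x3 = Dual(2.0, 1.0), Dual(3.0, 0.0), Dual(0.5, 0.0)   # seed dx1/dx1 = 1
f = x1 * x2 + exp(x3)
print(f.val, f.dot)     # value and df/dx1 = x2 = 3.0
```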

3.3 Multivariate and High-Order AD

The preceding section introduced the basic form of standard AD tools using only a single new variable (i.e., a forward derivative or adjoint) per original variable, allowing for the calculation of first derivatives. This work, however, is focused on finding the Hessian and higher derivative tensors. These, too, can be found by existing AD tools (in non- matrix cases) via various modifications discussed below. These include vectorization, to find additional derivatives at once, and simplistic nesting of AD tools, to find higher derivatives. Another, more advanced method for arbitrary-order differentiation that is more efficient than simple nesting will be saved for Section 4.5, as it is built around a special class of algebras.

3.3.1 Vectorization

The examples above were limited to one derivative at a time, thereby forcing users to run AD once for each input variable (or, in the case of reverse mode, once for each output). It is perhaps intuitive to avoid this inconvenience by adding more AD variables, computing multiple derivatives at once; instead of a single variable ( ̇a or ̄a), the modified

program would use an entire vector ($\dot{a} = \nabla a$ or $\bar{a} = \partial f/\partial a$) of variables at each point in the code. In many cases, however, the cost savings of this vectorization approach may not be worth the back-end implementation effort or the extra memory burden – usually, even in the limit of very many derivatives at once, one can save at most 50% of the time that would otherwise be spent going one at a time. There are exceptions to this rule: programs burdened with a large amount of computation that is not differentiated. For example, if a program must spend a significant amount of time loading constant, input-independent data from disk, it may make sense to avoid doing this more than once. A plot of relative cost savings for various fractions of such fixed overhead can be seen in Figure 3.2. Note that an AD-enabled program with this problem could almost certainly be improved simply by running the differentiation in a loop across only those subroutines that require it. Sometimes, though, vectorization might be the path of least resistance, especially when there is enough memory to spare.

Beyond the occasional opportunity for cost savings, there is one other situation where multiple simultaneous derivatives are called for: the calculation of mixed higher-order derivatives, which requires access to lower-order fac- tors. This is especially evident in the case of inductive AD – the simplest (and most simplistic) higher-order method- ology.

[Figure 3.2 appears here: log-log plot of cost fraction versus Nx for fixed-overhead fractions of 0.1%, 50%, 90%, 99%, and 100%.]

Figure 3.2: Vectorization Savings vs Fixed Overhead: Cost fraction of a single vectorized run vs many single-derivative runs, plotted logarithmically against number of vectorized inputs. If differentiated code is a relatively small portion of the total program cost, vectorization can avoid redundant computation of the fixed overhead. The legend shows relative cost of overhead compared to total program cost; in the limit of 100% overhead, a vectorized run has a cost $1/N_x$ that of an equivalent number of singleton runs.

3.3.2 Naive Induction

Inductive AD is the intuitive approach to taking second- and higher-order derivatives: apply AD once, then apply it again. This can take the form of applying a source-modification tool to an already-modified code, or of replacing the real numbers in the back-end of an overloaded type with yet another overloaded type. This approach, unfortunately, suffers from several significant inefficiencies.

Mathematically, for a univariate second derivative of a variable y, the most literal form of induction can be expressed in operator-overloaded notation by introducing a second-order AD variable $\hat{\hat{y}}$, built by augmenting a first-order variable $\hat{y}$:

$$\dot{y} \equiv \frac{d}{dx_j}\left(y\right) \quad (3.55)$$

$$\hat{y} \equiv y + \dot{y} \quad (3.56)$$

$$\dot{\hat{y}} \equiv \frac{d}{dx_j}\left(\hat{y}\right) = \dot{y} + \ddot{y} \quad (3.57)$$

$$\hat{\hat{y}} \equiv \hat{y} + \dot{\hat{y}} \quad (3.58)$$

$$= (y + \dot{y}) + (\dot{y} + \ddot{y}) \quad (3.59)$$

In both the overloaded and source-modification approaches, two copies of the first derivative of y are carried. This is mirrored in the computation of values; for example, the inductive approach does not account for Leibniz's rule (i.e., the product rule for higher orders) – instead, it repeatedly calculates every permutation of a derivative term. For the second univariate derivative of y(x)z(x), it would compute $(yz'' + y'z') + (y'z' + y''z)$, rather than saving a step and finding $yz'' + 2y'z' + y''z$. This also occurs in the multivariate (vectorized) case, where the entire first-order vector is calculated twice. In the case of higher tensors, the redundancy is given by binomial coefficients – e.g., for third derivatives, there will be three repeated calculations (and three copies) of the gradient and three of the Hessian. Almost as bad, naive induction does not account for symmetry of the Hessian: rather than limiting itself to the $N(1+N)/2$ unique second derivatives, it redundantly computes all $N^2$ entries. The redundancy again worsens for higher tensors, which have more symmetries. These redundancies can be removed easily; however, this presumes the creators of the AD method anticipate a need for higher derivatives. In many cases, they do not.
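To make the redundancy concrete, a tiny sketch (assumptions: univariate, order two, values stored as the Taylor coefficients [y, y′, y″/2]) shows the Leibniz-style product that more careful high-order tools use: the cross term appears once, rather than as two separately computed copies of y′z′.

```python
def taylor2_mul(y, z):
    """Multiply two second-order univariate Taylor polynomials.
    y = [y0, y1, y2] represents y0 + y1*h + y2*h^2 (i.e., y1 = y', y2 = y''/2)."""
    return [y[0] * z[0],
            y[0] * z[1] + y[1] * z[0],                 # first-derivative coefficient
            y[0] * z[2] + y[1] * z[1] + y[2] * z[0]]   # single y'z' term (Leibniz), not two

# y(x) = x^2, z(x) = 3x at x = 2: coefficients [4, 4, 1] and [6, 3, 0]
p = taylor2_mul([4.0, 4.0, 1.0], [6.0, 3.0, 0.0])
print(p[0], p[1], 2.0 * p[2])   # (yz) = 24, (yz)' = 36, (yz)'' = 36
```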

In one of the first methods designed with higher orders in mind, [76] a tree structure was used to store the deriva- tives of derivatives, thereby allowing for sparse tensors, but much of the other computational redundancy remained.

Further, simply in terms of computer architecture, a tree structure can become very inefficient for tensors in many vari- ables – it is far better to keep sequential indices in similarly sequential memory, leading to faster cache operation. Due to inefficiencies such as these, later work moved from inductive implementation to inductive derivation; some meth- ods (e.g., [77]) carefully calculated Leibniz’s rule (or, for other operations, the relevant time-saving coefficients) and utilized elaborate indexing schemes for data stored in contiguous memory. More recent advances (e.g., [78]) abandon induction entirely in favor of the algebra of Taylor polynomials, as will be discussed in Section 4.5.

It is also worth mentioning that, if all one needs is the Hessian, a number of sophisticated methods exist for automatically finding a sparsity pattern (e.g., [79]) and, based on that pattern, finding the most efficient subspace (e.g.,

[80]) to find non-zero entries via a matrix of smaller dimension – an approach that is effectively a higher-dimensional case of the directional derivatives described in Section 3.1.4. These tools are arguably distinct from differentiation itself, as they amount to pre-processing before true AD begins, and will not be further considered in this work.

3.4 Matrix Operations and AD

So far within this chapter, only the non-automatic (finite difference and adjoint) methods have been capable of finding derivatives of matrix functions. Unfortunately, many (if not most) large-scale engineering simulations or optimizations rely heavily on such functions, typically supplied via third-party numerical libraries. Such libraries typically have a great many source files (e.g., LAPACK 3.6.0 has 5,830 files [81]), in which case even allowing new data types – let alone full source modification – becomes laborious to the point of impracticality. Further, the best library binaries (e.g., many implementations of the BLAS specification [82–84]) are often optimized by hand for individual machine architectures; even if the (often proprietary) source is available, re-compiling from scratch might negate this advantage.

It is therefore fair to claim that in general, if any user program written in a low-level language calls a third-party matrix library, external AD tools are no longer a valid option for scalar-by-scalar differentiation; for practical differentiation applications, any method that can use linear algebra routines as-is would be vastly preferable. However, in this claim the terms "low-level language" and "external" are carrying special weight.

Before addressing these qualifications, note that formulas certainly do exist for derivatives of whole matrices

– see for example [85] or [20]. Take, as two important examples, matrix multiplication and matrix-vector system

solution. If, given a matrix A from the original problem, one defines a derivative matrix $\dot{A}$ such that $\dot{A}_{i,j} = dA_{i,j}/dx$ for every entry (i, j), then the matrix product rule is derived as:

$$C = AB \quad (3.60)$$

$$C_{i,j} = \sum_k A_{i,k}B_{k,j} \quad (3.61)$$

$$\dot{C}_{i,j} = \sum_k \dot{A}_{i,k}B_{k,j} + A_{i,k}\dot{B}_{k,j} \quad (3.62)$$

$$\dot{C} = \dot{A}B + A\dot{B} \quad (3.63)$$

which has the same form as the scalar product rule. System solution, however, has a form different from the quotient rule. It is derived by beginning with a standalone inverse:

$$C = A^{-1} \quad (3.64)$$

$$AC = I \quad (3.65)$$

$$\dot{A}C + A\dot{C} = 0 \quad (3.66)$$

$$\dot{C} = -A^{-1}\dot{A}C \quad (3.67)$$

$$= -A^{-1}\dot{A}A^{-1} \quad (3.68)$$

A system solution is then the product of the inverse with a vector:

$$x = A^{-1}b \quad (3.69)$$

$$\dot{x} = \left(-A^{-1}\dot{A}A^{-1}\right)b + A^{-1}\dot{b} \quad (3.70)$$

$$= A^{-1}\left(\dot{b} - \dot{A}x\right) \quad (3.71)$$

In other words, to find a single first derivative of a linear solve, one first solves the original problem, then computes $\dot{b} - \dot{A}x$ and solves a second system for $\dot{x}$.
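A short NumPy sketch (arbitrary random data, a single scalar parameter t, all names hypothetical) confirms the two whole-matrix formulas numerically against finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A, Ad = rng.standard_normal((n, n)) + n * np.eye(n), rng.standard_normal((n, n))
B, Bd = rng.standard_normal((n, n)), rng.standard_normal((n, n))
b, bd = rng.standard_normal(n), rng.standard_normal(n)
t, h = 0.3, 1e-7

A_of = lambda t: A + t * Ad          # dA/dt = Ad
B_of = lambda t: B + t * Bd
b_of = lambda t: b + t * bd

# Equation 3.63: Cdot = Adot B + A Bdot
C_dot = Ad @ B_of(t) + A_of(t) @ Bd
C_fd = (A_of(t + h) @ B_of(t + h) - A_of(t) @ B_of(t)) / h
print(np.max(np.abs(C_dot - C_fd)))

# Equation 3.71: xdot = A^{-1} (bdot - Adot x), i.e., a second solve with the same A
x = np.linalg.solve(A_of(t), b_of(t))
x_dot = np.linalg.solve(A_of(t), bd - Ad @ x)
x_fd = (np.linalg.solve(A_of(t + h), b_of(t + h)) - x) / h
print(np.max(np.abs(x_dot - x_fd)))
```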

Similar formulas exist for many other matrix operations; if one wishes to manually compute matrix derivatives, this is the way to go about it. Further, as discussed in the same sources above, adjoint variants of matrix operations exist, allowing for reverse-mode computation if enough memory exists to store every intermediate matrix. Considering that these formulas exist, and returning to the earlier qualifying terms “low-level language” and “external”, it becomes very important to acknowledge that in a high-level language such as MATLAB, which treats matrices as encapsulated objects

(rather than a collection of individually-defined memory pointers and other parameters, such as in BLAS/LAPACK), standard AD approaches can and have been implemented – e.g., for MATLAB, the most popular tool is ADiMat

( [86, 87]), which comes with some limited higher-order functionality.

Likewise, while “external” AD implementation for low-level libraries is cumbersome or impossible, certain numerical computation packages are, in fact, internally designed by their authors with AD at least partly in mind. For example, the Trilinos Project [88] from Sandia National Labs provides the Sacado [89] AD package, capable of differentiating Trilinos-specific matrices. In the unique case of the Eigen [90] matrix library (written in C++ with heavy use of modern templating features), users can actually supply custom data-types – including overloaded AD variables – with which to build matrices and vectors. Eigen is open-source and designed to be recompiled on a program-by-program basis; as such, it can even take special steps at compile time to optimize low-level operations (e.g., memory addressing to improve cache performance, etc.) around a given type. To the author’s knowledge, this makes Eigen the single exception to the general rule claimed above: it is the only practical and even halfway efficient way to use standard AD tools to differentiate matrices element-by-element, rather than with whole-matrix formulas.

There is one obvious drawback to automatic differentiation via any of these tools: AD users must design their entire program around a specific library and in a specific language, from the ground up. Looking ahead to Part II, NMD is designed to fill the gaps between these cases, providing a method that can work in any context and any language.

Before beginning derivation of the new method, however, it is necessary to lay some groundwork by reviewing several differentiation approaches that do not use derivative formulas directly; instead, they utilize algebras other than that of the real numbers. The first of these approaches, the complex-step method, is even compatible with forward-mode first derivatives of matrix functions; the others all help put the upcoming derivation into context.

Chapter 4

Special Algebras for Differentiation

All the methods presented in Chapter 3 utilized simple freshman calculus to find derivatives. There are, however, other approaches – non-real algebras which can find derivatives either as accidental side-effects or through clever design. One of the most important reasons to review these methods, as far as this work is concerned, is the fact that any of the non-real numbers considered here can be represented as matrices of real numbers held in specific patterns; these representations will be saved for Chapter 6. For now, it is necessary to see how these systems work for the differentiation of normal scalar code – and, in the case of the first method, for the differentiation of matrices.

4.1 Complex-Step Differentiation (CD)

As discussed in the previous chapter, automatic derivatives of matrix functions cannot (usually) be found by back-end operator overloading or source modification. There is, however, one potential exception for real-valued problems: many libraries already come standard with function variants utilizing the arithmetic of complex numbers. In many cases, a switch from real to complex data types can in fact serve as a form of AD.

4.1.1 History and Derivation

Any discussion of derivatives involving complex numbers would be incomplete without a mention of Cauchy’s integral formula, which states that any univariate derivative of an analytic function at a point can be expressed in terms of a contour integral enclosing that point in the complex plane:

$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint \frac{f(z)}{(z - z_0)^{n+1}}\,dz \tag{4.1}$$

Indeed, the first use of complex numbers for numerical differentiation began with this equation, in the 1967 work of Lyness and Moler [62], who introduced a number of (quite complicated)1 integration formulas based on the trapezoid rule. Like the Cauchy equation itself, these methods were all designed with derivatives of arbitrary order in mind, and as the trapezoid rule does not require the subtraction of nearly-equal numbers, these formulas were capable of finding derivatives to very high precision – albeit only in the limit of many function evaluations. In their example problem, they found a fifth derivative of a real-valued function to nine significant figures via 59 complex evaluations, and to ten significant figures via 1097 evaluations.

In 1998, Squire and Trapp [92] revisited the methods of Lyness and Moler and presented a vastly-simplified, single-evaluation variant limited to first derivatives:

$$\frac{d}{dx} f(x) \approx \frac{1}{h}\operatorname{Im}\!\left[ f(x + ih) \right] \tag{4.2}$$

where the approximation holds in the limit of small $h$. This formula has since been termed the complex-step derivative approximation, or CD. The accuracy of the formula was proven with a simple Taylor series expansion:

$$f(x + ih) = f(x) + \frac{hi}{1!}\frac{df(x)}{dx} + \frac{h^2 i^2}{2!}\frac{d^2 f(x)}{dx^2} + \frac{h^3 i^3}{3!}\frac{d^3 f(x)}{dx^3} + \dots \tag{4.3}$$
$$= f(x) + hi\,\frac{df(x)}{dx} - \frac{h^2}{2}\frac{d^2 f(x)}{dx^2} - \frac{h^3 i}{6}\frac{d^3 f(x)}{dx^3} + \dots \tag{4.4}$$
$$\operatorname{Im}\!\left[ f(x + ih) \right] = h\,\frac{df(x)}{dx} - \frac{h^3}{6}\frac{d^3 f(x)}{dx^3} + \dots \tag{4.5}$$
$$\frac{df(x)}{dx} = \frac{1}{h}\operatorname{Im}\!\left[ f(x + ih) \right] + \mathcal{O}\!\left(h^2\right) \tag{4.6}$$

Note that CD requires only a single function evaluation and can avoid subtractive cancellation error entirely; therefore, for very small $h$ (e.g., $10^{-20}$ or smaller) it can yield results that are accurate to within a few bits of machine precision – for double-precision floating-point arithmetic, this can translate to 14 or 15 significant figures.

1 In a 2013 blog post [91] for MathWorks, Moler himself stated “[...] the actual algorithms are pretty exotic. [...] If you asked me to describe them in any detail today, I wouldn’t be able to.”
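As a concrete illustration, the following minimal sketch applies Equation 4.2 in Python; the helper name, the chosen step size, and the use of the test function referenced in Figure 4.2 are assumptions for demonstration, not part of the original references.

```python
# Sketch: complex-step derivative approximation (Equation 4.2). Illustrative only.
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    """Approximate df/dx at x from a single complex evaluation."""
    return np.imag(f(x + 1j * h)) / h

# Test function in the style of [62, 63] (see Figure 4.2).
f = lambda x: np.exp(x) / np.sqrt(np.sin(x)**3 + np.cos(x)**3)
x0 = 3.0 * np.pi / 4.0 - 0.2

d_cd = complex_step_derivative(f, x0)
d_fd = (f(x0 + 1e-8) - f(x0)) / 1e-8   # forward finite difference, for comparison
print(d_cd, d_fd)
```

Because no subtraction of nearly-equal numbers occurs, the step size can be made far smaller than any finite-difference scheme would tolerate.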

The CD equation (4.2) can be interpreted in several ways. The first is as a two-point trapezoidal integration of the parameterized Cauchy formula with $z_0 = x$ and $z = x + ihe^{2\pi i t}$:

$$f'(x) = \frac{1}{2\pi i}\int_0^1 \frac{f\!\left(x + ihe^{2\pi i t}\right)}{\left(ihe^{2\pi i t}\right)^2}\,(-2\pi h)\,e^{2\pi i t}\,dt \tag{4.7}$$
$$= \frac{1}{hi}\int_0^1 \frac{f\!\left(x + ihe^{2\pi i t}\right)}{e^{2\pi i t}}\,dt \tag{4.8}$$

If $f$ is analytic, the direction of the derivative is irrelevant and one can set $f'(x) = \partial f/\partial x$. Then, for $f(z) = u + vi$,

$$\frac{\partial f}{\partial x} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{1}{hi}\int_0^1 \frac{f\!\left(x + ihe^{2\pi i t}\right)}{e^{2\pi i t}}\,dt \tag{4.9}$$
$$i\frac{\partial u}{\partial x} - \frac{\partial v}{\partial x} = \frac{1}{h}\int_0^1 \frac{f\!\left(x + ihe^{2\pi i t}\right)}{e^{2\pi i t}}\,dt \tag{4.10}$$
$$\frac{\partial u}{\partial x} = \frac{1}{h}\operatorname{Im}\!\left[\int_0^1 \frac{f\!\left(x + ihe^{2\pi i t}\right)}{e^{2\pi i t}}\,dt\right] \tag{4.11}$$

Note that a two-point trapezoid rule is simply an average of the endpoints – yet evaluating this integral only at $t = 0$ and $t = 1$ sends $e^{2\pi i t}$ to one, causing the points to coincide. Thus, the supposed “trapezoid integration” only requires a single evaluation and takes the same form as Equation 4.2. This is of course a very imprecise approximation of the integral, but in this particular case it becomes precise in the limit of a tight contour (corresponding to a small $h$).

Another interpretation of Equation 4.6 is as the definition of the derivative for a complex analytic function. As discussed in [63], if f(z) is analytic, z = x + yi and f(z) = u + vi, then the Cauchy-Riemann equations state:

$$i\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} \tag{4.12}$$
$$\therefore\quad \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \tag{4.13}$$
$$\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \tag{4.14}$$

These relationships can be visualized as in Figure 4.1 using overlaid contour plots of $u$ and $v$. Note there that the real- and imaginary-valued gradients (and hence contours) are perpendicular to each other; further note that the magnitudes of the gradients are identical at any given point, which can be seen by identical contour spacing at highest resolution.

Figure 4.1 (four panels with step sizes $h = 0.1$, $0.025$, $0.01$, and $0.001$; axes $\operatorname{Re}(z)$ versus $\operatorname{Im}(z)$): Finite differencing in the complex plane. Contour plots of the complex function $f(z) = e^z/\left(\sin^3(z) + \cos^3(z)\right)$, with dashes representing lines of constant real value and dots representing lines of constant imaginary value; the interval between contours is uniform and identical for both real and imaginary measures. Each subfigure represents a closer zoom on the point $z_0 = 3\pi/4 - 0.2$ (red star) and a correspondingly smaller step-size $h$; the blue dot represents a real-valued FD evaluation $f(z_0 + h)$ and the green dot represents a complex-valued CD evaluation $f(z_0 + ih)$. Per the Cauchy-Riemann relationships (Equation 4.12), the contours are perpendicular; for small $h$, they are almost equally-spaced, representing identical slopes.

The figure also demonstrates the differences between the FD and CD sampling schemes. To see how the complex-step evaluation $f(x + ih)$ can be treated similarly to the finite-difference evaluation $f(x + h)$, examine the original real-valued derivative $\partial u/\partial x$ from Equation 4.13 and substitute in the definition of the derivative in the imaginary direction for the RHS:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \tag{4.15}$$
$$= \frac{\partial}{\partial y}\operatorname{Im}\!\left[f(x + iy)\right] \tag{4.16}$$
$$= \lim_{h \to 0}\frac{\operatorname{Im}\!\left[f(x + i(y + h))\right] - \operatorname{Im}\!\left[f(x + iy)\right]}{h} \tag{4.17}$$

As f was originally a function of real variables only, y = 0:

$$\frac{\partial u}{\partial x} = \lim_{h \to 0}\frac{\operatorname{Im}\!\left[f(x + ih)\right] - \operatorname{Im}\!\left[f(x)\right]}{h} \tag{4.18}$$

Equation 4.18 has a structure with an obvious parallel to the real-valued definition of the derivative. Indeed, the last subplot of Figure 4.1 shows that a very small step in the imaginary direction produces the same change in the imaginary output as a similar step in the real direction would in the real output. The standard forward finite-difference method uses the analogous real-valued equation to build an approximation with a small $h$; the complex-step method, however, can make one more simplification beyond Equation 4.18 by noting that the original $f$ was purely real-valued, and thus $\operatorname{Im}\!\left[f(x)\right] = 0$:

$$\frac{\partial u}{\partial x} = \lim_{h \to 0}\frac{\operatorname{Im}\!\left[f(x + ih)\right]}{h} \tag{4.19}$$

which directly yields the approximation from Equation 4.2.

4.1.2 Practical Usage

Figure 4.2 (relative error in $dy/dx$ versus step size $h$; curves: Forward Finite Difference, Centered Finite Difference, Complex Step, Machine Epsilon): Complex-step error in the first derivative. Just as in Figure 3.1 (adopting the same conventions as [62] and [63]), the test function is $y(x) = e^x/\sqrt{\sin^3 x + \cos^3 x}$ evaluated at $x = 3\pi/4 - 0.2$, and $h$ is presented logarithmically decreasing from left to right, such that expected “better” values are on the right. The complex-step method exhibits no cancellation error; it is limited only by machine epsilon – which, for the double-precision arithmetic used here, comes to $2^{-52} \approx 2.22 \times 10^{-16}$.

With CD, one is now free to choose an arbitrarily small step size, and can achieve results five to ten orders of magnitude more accurate than even centered FD; this behavior is shown in Figure 4.2. There are minor implementation issues that must be addressed when transforming a code to use the method – namely, there are a few functions that are

not complex-analytic (e.g., absolute value), and a few that are undefined off the real line (e.g., comparison operators like less-than). These issues were overcome by Martins, Sturdza and Alonso [63], who in 2003 introduced2 the method to

the aerospace field and presented implementations for multiple languages including C++ and FORTRAN, along with

scripts to automate the change of types; all problematic functions were simply redefined to operate only on the real part

of the number. In the case of C++, they provided a custom “wrapper” overloaded data type, which added the needed

functionality to the default complex type; for FORTRAN, they provided a module that overloaded all intrinsic complex

functions. They then applied CD to several large aerospace problems written in FORTRAN, including an analysis of

supersonic natural laminar flow and an aero-structural solver for multidisciplinary optimization of a wing.

It is important to note that use of the C++ wrapper forces users to implement a custom type throughout their

(scalar) code – in which case, the same limitations of ordinary overloaded AD apply; as ordinary AD is not an approximation and can be more efficient, it may remain more attractive. In the FORTRAN case, in fact, the provided

module actually changed all intrinsic functions (even those that were already analytic) to use literal derivative formulas

rather than complex arithmetic, effectively transforming complex numbers into standard AD variables. Only operators

2 More precisely, Martins developed the modified implementations as a corollary to his 2002 PhD dissertation [93], advised by Alonso, where he used CD to verify the coupled aero-structural adjoint method that was the primary thrust of his work. However, their joint publication with Sturdza [63] (which did not discuss his adjoint method) is generally treated as the seminal CD work, alongside Squire and Trapp [92].

(addition, multiplication, etc), which cannot be redefined in FORTRAN, were left unchanged. This still provided an advantage for that language, in which a change from intrinsic to custom types is more complicated than in C++.

Martins et al [63] did not discuss the application of CD to matrix functions in their work, beyond noting that any conjugate transpose operation in MATLAB must be replaced by a real-valued transpose; as they did not address the issue for any other language, this mention may have merely been for the sake of reshaping sets of variables stored as vectors. The method does, however, work for many matrix routines.

If $f(\mathbf{x})$ truly represents an arbitrary program with complex-analytic behavior, one might expect any internal use of matrix math to be irrelevant as far as CD is concerned – intuitively, the method can be applied to any such matrix operation as long as each scalar entry of the matrix result is a complex-analytic function of the scalar entries of the matrix argument(s). This intuition was more rigorously investigated by Al-Mohy and Higham [94], who analyzed the complex-step approximation for derivatives of matrix-input functions, i.e.,

$$f'(A) = \frac{\operatorname{Im}\!\left[f\!\left(A + ih\dot{A}\right)\right]}{h}\,\Bigg|_{\,h \ll 1} \tag{4.20}$$

Their analysis pointed to a few flaws: first, many individual matrix operations are not complex-analytic – for example, any algorithm that utilizes a complex conjugate. Second, CD only works because it avoids subtractive cancellation errors – but certain functions that internally already use complex math (e.g., trigonometric functions that utilize a difference of complex exponentials), when fed a very small input, might introduce such errors.

However, because matrix addition, multiplication, and inversion are analytic and do not suffer from these flaws, any matrix function represented by a power series is amenable to CD, as is any rational (polynomial times an inverse polynomial) representation; this includes the very common approach of repeated multiplication by the same matrix.

This still gives CD a broad applicability to matrix math.
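As a brief sketch of that applicability (assuming NumPy's LAPACK-backed complex routines as a stand-in for an arbitrary third-party library; the routine and data below are illustrative), a function built from solves and products can be differentiated simply by feeding it complex-typed inputs:

```python
# Sketch: complex-step differentiation through an unmodified linear-algebra routine,
# achieved only by switching to a complex data type. Illustrative only.
import numpy as np

def solve_norm(A, b):
    """A 'black-box' routine built on library calls: the unconjugated dot product of x = A^{-1} b with itself."""
    x = np.linalg.solve(A, b)
    return x @ x          # plain transpose, no complex conjugation (see the MATLAB caveat above)

rng = np.random.default_rng(1)
A0 = rng.random((4, 4)) + 4.0 * np.eye(4)
A1 = rng.random((4, 4))   # dA/dp
b = rng.random(4)

h = 1e-20
d_cd = np.imag(solve_norm(A0 + 1j * h * A1, b + 0j)) / h   # d/dp of solve_norm(A0 + p*A1, b) at p = 0
print(d_cd)
```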

4.2 Complex Variants

While complex-step differentiation is practical and elegant, it is limited by the fact that it cannot find higher-order derivatives. This shortcoming was addressed in at least three competing follow-on methods, each describing a different algebra and each published between 2011 and 2013. Two of them, utilizing “multicomplex” and “quasi-complex” numbers, introduced additional imaginary elements and are covered below; the third, using “hyper-dual” numbers, switched to nilpotent elements and is covered in Section 4.4. All three, however, were preceded by a more efficient method utilizing truncated Taylor polynomials, described in Section 4.5; despite this, it is important to discuss each, as they will give analytic insight into the more effective scalar and matrix algebras derived in Part II.

4.2.1 Multicomplex Variables

In 2012, Lantoine, Russell, and Dargent [39] introduced an arbitrary-order differentiation method based on the

existing algebra of multicomplex numbers [95, 96], denoted in set notation as ℂ(q). A number z ∈ ℂ(q) is inductively

defined as:

$$z = z_0 + z_q i_q \tag{4.21}$$
$$z_0, z_q \in \mathbb{C}_{(q-1)} \tag{4.22}$$
$$i_q^2 = -1 \tag{4.23}$$
$$\forall j, k: \quad i_j i_k = i_k i_j \tag{4.24}$$
$$\mathbb{C}_{(1)} \equiv \mathbb{C} \tag{4.25}$$
$$\mathbb{C}_{(0)} \equiv \mathbb{R} \tag{4.26}$$

Given an element z ∈ ℂ(q) (termed a multicomplex number of degree q), the scalar values z0 and zq are not real

numbers, but elements of ℂ(q−1). Unrolling one level of induction, z can thus be expressed as:

$$z = (z_{0,0} + z_{0,q-1}\, i_{q-1}) + (z_{0,q} + z_{q,q-1}\, i_{q-1})\, i_q \tag{4.27}$$

Here, the subscripts are used to indicate which imaginary elements will be multiplied with a given coefficient

– for instance, zq,q−1 is multiplied by both iq−1 and iq but z0,0 has not been multiplied with anything.

Lantoine et al demonstrated that, just as in CD, very small steps in imaginary directions could be used to

approximate derivatives to high accuracy. Omitting their derivation for brevity and using a second order variable as an

example:

$$z \in \mathbb{C}_{(2)} \tag{4.28}$$
$$\therefore\quad z = z_0 + z_{0,1} i_1 + z_{0,2} i_2 + z_{2,1} i_1 i_2 \tag{4.29}$$

Their multicomplex differentiation method (MCX) proceeded by setting the real coefficient of singleton imaginary dimensions to the small step-size $h$. Using the functions $\operatorname{Im}_1$, $\operatorname{Im}_2$, and $\operatorname{Im}_{12}$ to access $z_{0,1}$, $z_{0,2}$, and $z_{2,1}$, respectively, the formulas for all first- and second-order derivatives of a two-input function are:

$$\frac{\partial f(x,y)}{\partial x} \approx \frac{\operatorname{Im}_1\!\left[f(x + hi_1 + hi_2,\, y)\right]}{h} \approx \frac{\operatorname{Im}_2\!\left[f(x + hi_1 + hi_2,\, y)\right]}{h} \tag{4.30}$$
$$\frac{\partial f(x,y)}{\partial y} \approx \frac{\operatorname{Im}_1\!\left[f(x,\, y + hi_1 + hi_2)\right]}{h} \approx \frac{\operatorname{Im}_2\!\left[f(x,\, y + hi_1 + hi_2)\right]}{h} \tag{4.31}$$
$$\frac{\partial^2 f(x,y)}{\partial x^2} \approx \frac{\operatorname{Im}_{12}\!\left[f(x + hi_1 + hi_2,\, y)\right]}{h^2} \tag{4.32}$$
$$\frac{\partial^2 f(x,y)}{\partial y^2} \approx \frac{\operatorname{Im}_{12}\!\left[f(x,\, y + hi_1 + hi_2)\right]}{h^2} \tag{4.33}$$
$$\frac{\partial^2 f(x,y)}{\partial x \partial y} \approx \frac{\operatorname{Im}_{12}\!\left[f(x + hi_1,\, y + hi_2)\right]}{h^2} \tag{4.34}$$

Note that, in terms of which coefficients track which derivatives, this method corresponds exactly with the most naive type of AD induction described in Section 3.3.2 – the MCX variable from Equation 4.29 is essentially used as a two-input variant of the second-order univariate derivative in Equation 3.59. During any computation of high-order derivative tensors (including the Hessian), this approach will redundantly compute many lower-order values; for example, when finding $\partial^2 f/\partial x^2$, $\partial f/\partial x$ will be contained in both $\operatorname{Im}_1(f)$ and $\operatorname{Im}_2(f)$.

In general, for all $q$th-order derivatives of $n$ inputs, MCX needs one evaluation for each unique entry in the highest-order tensor, and each evaluation needs to utilize variables with $2^q$ real coefficients. The cost relative to the original program (not counting the difference between complex arithmetic and simpler derivative formulas) is thus:

$$2^q \binom{n + q - 1}{q} = 2^q\, \frac{(n + q - 1)!}{q!\,(n - 1)!} \tag{4.35}$$

Lantoine et al emphasized ease of implementation as a primary reason to consider MCX, as the inductive definition can easily be echoed with recursive data structures and repeated application of ordinary complex operator logic. However, just as with ordinary CD, use of the method requires an overloaded type – and unlike CD, there are no situations (such as FORTRAN operators or complex matrix³ libraries) where MCX is more practical to use than any other overloaded AD type. Further, even when it comes to use of imaginary elements, another high-order tool can be more efficient both in terms of speed and memory.
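The following sketch echoes that recursive idea by nesting Python's built-in complex type a single time to form $\mathbb{C}_{(2)}$ numbers; it is an illustration of the concept, not the implementation of [39], and the class name, test function, and step size are assumptions.

```python
# Sketch: bicomplex (C_(2)) arithmetic by nesting ordinary complex numbers,
# used to approximate a mixed second derivative per Equation (4.34). Illustrative only.
class Bicomplex:
    """z = z0 + z1*i2, where z0 and z1 are ordinary complex numbers carrying i1."""
    def __init__(self, z0, z1=0j):
        self.z0, self.z1 = complex(z0), complex(z1)

    def __add__(self, other):
        other = other if isinstance(other, Bicomplex) else Bicomplex(other)
        return Bicomplex(self.z0 + other.z0, self.z1 + other.z1)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Bicomplex) else Bicomplex(other)
        # (a0 + a1*i2)(b0 + b1*i2) = (a0*b0 - a1*b1) + (a0*b1 + a1*b0)*i2, since i2**2 = -1
        return Bicomplex(self.z0 * other.z0 - self.z1 * other.z1,
                         self.z0 * other.z1 + self.z1 * other.z0)
    __rmul__ = __mul__

def f(x, y):
    return x * x * y + 3 * y          # d2f/dxdy = 2x

h = 1e-8
x_hat = Bicomplex(2.0 + 1j * h)       # x + h*i1
y_hat = Bicomplex(5.0, h)             # y + h*i2
d2f_dxdy = f(x_hat, y_hat).z1.imag / h**2   # Im12 coefficient divided by h^2
print(d2f_dxdy)                       # approximately 2*x = 4
```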

4.2.2 Quasi-Complex Gradients

Also in 2012, this author (working with Ryan Starkey) independently developed an approach called the method of quasi-complex gradients (QCG) [97]. In contrast to the already-existing algebra of multicomplex numbers, the method defined an entirely new algebra of what were termed quasi-complex numbers. Denote the set of $n$-variable, $q$th-order quasi-complex numbers by $\mathbb{C}_{(n,q)}$. These numbers were constructed by beginning with a first-order definition for numbers from $\mathbb{C}_{(n,1)}$:

$$a = a_0 + \sum_{j=1}^{n} a_j\, i_{(j,1)} \tag{4.36}$$
$$\forall j: \quad a_j \in \mathbb{R} \tag{4.37}$$
$$\forall j: \quad i_{(j,1)}^2 = -1 \tag{4.38}$$

Such first-order variables were shown to be capable of computing many CD-type first derivatives at once –

hence the name quasi-complex “gradient”. Higher-order numbers a ∈ ℂ(n,q) were inductively defined by replacing

each real value within lower-order numbers with other quasi-complex numbers, in turn built around a generalization

3 Lantoine et al did describe a possible matrix representation of multicomplex numbers, but appear not to have used it in their reported experiments. This representation will be discussed in Section 6.3.1.

of imaginary numbers I(q):

$$a = a_0 + \sum_{j=1}^{n} a_j\, i_{(j,q)} \tag{4.39}$$
$$\forall j: \quad a_j \in \mathbb{C}_{(n,q-1)} \tag{4.40}$$
$$\forall j, p, k, r: \quad i_{(j,p)}\, i_{(k,r)} = i_{(k,r)}\, i_{(j,p)} \begin{cases} \in I^{(p+r)} & (p + r) \le q \\ = -1 & (p + r) > q \end{cases} \tag{4.41}$$

In other words, where multicomplex numbers are an inductive nesting of standard complex numbers, quasi-complex numbers were designed to be inductive with vectorized imaginary components. This allowed for the calculation of entire derivative tensors at once (rather than one entry at a time, as with MCX), which avoided much of the redundant computation of lower-order terms. Further, because the test-cases chosen for publication proved (inadvertently) to have significant overhead, vectorization also provided very significant cost-savings, just as seen in Figure 3.2.

More importantly, the above algebraic definition was only used for derivation, not implementation – many of the pitfalls of naive induction mentioned in Section 3.3.2 were recognized, and significant effort was made to avoid them by writing a special program to automatically generate code for overloaded QCG data-types of any desired order. This generating program only needed to be run once for a given order of derivative, as the resulting variable declaration was portable to any user program. The program was designed to be aware of variable aliases (i.e., redundant copies) – the output code only tracked unique data, stored in a single contiguously-indexed vector for the sake of cache operation.

The generating algorithm symbolically built analytic QCG representations of the needed operators (multiplication, division, etc.) and functions (exp, etc.), then recursively replaced the necessary symbols with additional nested QCG representations, until the desired depth of recursion (i.e., order of derivative) had been reached; it then converted the symbolic representation to C++ code, using the appropriate object member variables and indices in place of the analytic symbols. Where possible, it used the process of symbolic expansion to recognize variables worth pre-computing before entering loops. For example, the standard complex multiplication and division operators both utilize the real component of the number to compute an imaginary result; when applied to the expansion of a $q$th-order derivative term, this “quasi-real” value translated into the value of a $(q-1)$th-order combination of components, which was computed

before entering the qth loop.

Simply by recognizing and avoiding redundant data, the cost of QCG relative to the original program was

proportional only to the total number of derivatives to be calculated across all tensors, i.e.,

$$\sum_{j=1}^{q} \binom{n + j - 1}{j} = \sum_{j=1}^{q} \frac{(n + j - 1)!}{j!\,(n - 1)!} \tag{4.42}$$

which translated to a savings of nearly $2^q$ over MCX.

Unfortunately, the generating program was still limited by the algebraic definition of quasi-complex numbers,

which made it unaware of simplifying derivative formulas like Leibniz’s rule; in a way, it was thus similar to the tree

structure [76] mentioned in Section 3.3.2, albeit with a much more efficient memory scheme – but unlike that approach,

QCG used imaginary arithmetic, which comes with a cost discussed in the next section. Further considering that, just

as with MCX, changing a code to use QCG was no easier than with an overloaded AD type (assuming proper use

and understanding of the AD tool in question, which can admittedly be nontrivial), other approaches can still be more

attractive.

4.2.3 Shortcomings of Imaginary Elements

The use of small steps in imaginary directions – whether one at a time, as with CD, or many at a time, as with

MCX and QCG – can compute derivative approximations with the same accuracy shown in Figure 4.2. However,

the use of complex arithmetic is more expensive than directly using derivative formulas. One can better understand

the difference by directly comparing individual complex operations to the corresponding formulas used by standard

automatic differentiation. Consider three simple examples: multiplication, division, and the cosine function. If used

within a larger code that computes some $f(x)$, the goal then is to compute $g_\star(a, b \mid x) = ab$, $g_\div(c, d \mid x) = c/d$, and $g_{\cos}(\theta \mid x) = \cos(\theta)$, along with the corresponding derivatives $\dot{g}_\star = dg_\star/dx$, $\dot{g}_\div = dg_\div/dx$, and $\dot{g}_{\cos} = dg_{\cos}/dx$, found from the intermediate variables $a$, $b$, $c$, $d$, and $\theta$ already computed by the program. From Equation 4.2, these variables will carry approximate derivatives of themselves with respect to $x$ in the imaginary component and can be rewritten as $\hat{a} \approx a + ih\dot{a}$, etc.

In the case of multiplication,

$$\hat{g}_\star(\hat{a}, \hat{b}) = (a + ih\dot{a})(b + ih\dot{b}) \tag{4.43}$$
$$= (ab - h^2\dot{a}\dot{b}) + i(h\dot{a}b + ha\dot{b}) \tag{4.44}$$

Now introduce the approximation implicitly assumed by Equation 4.2:

$$h \gg h^2 \approx 0 \tag{4.45}$$

Under this approximation, the $h^2\dot{a}\dot{b}$ term is entirely extraneous – more than a quarter of the cost of complex multiplication is wasted. The high accuracy of the method holds because Equation 4.43 can be simplified to:

$$\operatorname{Re}\!\left[\hat{g}_\star\right] \approx ab \tag{4.46}$$
$$\operatorname{Im}\!\left[\hat{g}_\star\right] = h(\dot{a}b + a\dot{b}) \tag{4.47}$$

The imaginary component contains the product rule – in other words, the same derivative structure of ̂a and b̂ has been passed down. Next, in the case of division,

$$\hat{g}_\div(\hat{c}, \hat{d}) = \frac{c + ih\dot{c}}{d + ih\dot{d}} \tag{4.48}$$
$$= \frac{cd + h^2\dot{c}\dot{d}}{d^2 + h^2\dot{d}^2} + i\,\frac{h\dot{c}d - hc\dot{d}}{d^2 + h^2\dot{d}^2} \tag{4.49}$$

Again using Equation 4.45, this can be simplified to:

$$\operatorname{Re}\!\left[\hat{g}_\div\right] \approx \frac{cd}{d^2} = \frac{c}{d} \tag{4.50}$$
$$\operatorname{Im}\!\left[\hat{g}_\div\right] \approx h\,\frac{\dot{c}d - c\dot{d}}{d^2} \tag{4.51}$$

which is of course the true result and the derivative quotient rule, respectively. However, the complex division required 13 multiplications, three additions or subtractions, and two divisions. Compare this to five multiplications, one subtraction, and two divisions for the quotient rule – in this case, CD is several times more expensive than traditional

AD.

Finally, in the case of the cosine function, the trigonometric identity

cos(p ± q) ≡ cos(p) cos(q) ∓ sin(p) sin(q) (4.52)

can be used to expand the definition of $g_{\cos}$:

$$\hat{g}_{\cos}(\hat{\theta}) = \cos(\theta + ih\dot{\theta}) \tag{4.53}$$
$$= \cos(\theta)\cos(ih\dot{\theta}) - \sin(\theta)\sin(ih\dot{\theta}) \tag{4.54}$$

This can be recast using the hyperbolic cosine and sine functions

sinh(p) = −i sin(ip) (4.55)

cosh(q) = cos(iq) (4.56)

and simplified with the small-angle4 approximations

$$\cosh(p)\big|_{p \approx 0} \approx 1 \tag{4.57}$$
$$\sinh(q)\big|_{q \approx 0} \approx q \tag{4.58}$$

to yield:

4 Like the more common small-angle approximations for sin and cos, these are simply the first term of the Maclaurin/Taylor series expansions.

$$\hat{g}_{\cos}(\hat{\theta}) = \cos(\theta)\cosh(h\dot{\theta}) - \sin(\theta)\,\frac{\sinh(h\dot{\theta})}{-i} \tag{4.59}$$
$$\approx \cos(\theta) - ih\dot{\theta}\sin(\theta) \tag{4.60}$$

This results in:

$$\operatorname{Re}\!\left[\hat{g}_{\cos}\right] \approx \cos(\theta) \tag{4.61}$$
$$\operatorname{Im}\!\left[\hat{g}_{\cos}\right] \approx -h\dot{\theta}\sin(\theta) \tag{4.62}$$

which is again the true result as well as the chain rule applied to the derivative of the cosine function. However, along the way, the complex evaluation likely required at least twice as many trigonometric calculations (in the form of cosh and sinh) and two unnecessary multiplications.

While possibly pedantic, the preceding expansions and simplifications show that, in a certain sense, complex formulas are differentiation formulas – albeit with additional, costly calculations that are extraneous for $h \ll 1$. Fortunately, these extra calculations can be avoided by a switch from imaginary elements to another non-real type.

4.3 Dual Numbers

So-called dual numbers [98] are an extension of the real numbers, just as the complex numbers are. (Note, these are unrelated to the concept of a dual space.) Whereas a $z \in \mathbb{C}$ is defined as:

z = x + iy (4.63)

x, y ∈ ℝ (4.64)

$$i^2 = -1 \tag{4.65}$$

a $a \in \mathbb{D}$⁵ is defined as:

$$a = b + \varepsilon c \tag{4.66}$$
$$b, c \in \mathbb{R} \tag{4.67}$$
$$\varepsilon^2 = 0 \tag{4.68}$$
$$\varepsilon \neq 0 \tag{4.69}$$

In other words, where $i$ is imaginary, $\varepsilon$ is a nilpotent element. A crucial property of dual numbers is the fact that, like CD, the evaluation of a dual number input to a function can compute the derivative of that function [99] – this can be quickly demonstrated with a Taylor series expansion of $f(x + h\varepsilon)$ about $x$:

$$f(x + h\varepsilon) = f(x) + \frac{h\varepsilon}{1!}\frac{df}{dx} + \frac{h^2\varepsilon^2}{2!}\frac{d^2f}{dx^2} + \frac{h^3\varepsilon^3}{3!}\frac{d^3f}{dx^3} + \dots = f(x) + h\frac{df}{dx}\varepsilon \tag{4.70}$$

Note that, unlike the complex-step approximation, there are no $h^2$ or higher expressions – Equation 4.70 is a finite sequence with precisely two terms. Therefore, no matter the size of $h$, the nilpotent term holds the exact derivative. Indeed, just as with standard automatic differentiation, if $h$ is set to 1 in the input, follow-on nilpotent terms are identically equal to the derivative with respect to this input, with no coefficient to divide out – setting a nilpotent term to unity is equivalent to the tautology “the derivative of this number with respect to itself is 1”.
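A minimal sketch of this behavior is shown below; it is illustrative only (the class, its members, and the sine helper are not drawn from the cited implementations), but the product rule falls directly out of $\varepsilon^2 = 0$ and the seed value of 1 needs no post-processing.

```python
# Sketch: a dual-number type whose arithmetic carries exact first derivatives
# (Equation 4.70 with h = 1). Illustrative only.
import math

class Dual:
    """a = re + eps*nil, with eps**2 = 0."""
    def __init__(self, re, nil=0.0):
        self.re, self.nil = re, nil

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.re + other.re, self.nil + other.nil)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + eps*da)(b + eps*db) = ab + eps*(da*b + a*db); the eps**2 term vanishes exactly.
        return Dual(self.re * other.re, self.nil * other.re + self.re * other.nil)
    __rmul__ = __mul__

def sin_d(x):
    # Elementary functions collapse to the chain rule: sin(a + eps*da) = sin(a) + eps*cos(a)*da.
    return Dual(math.sin(x.re), math.cos(x.re) * x.nil)

x = Dual(0.7, 1.0)            # nilpotent coefficient seeded to 1
y = 3 * x * x + sin_d(x)      # f(x) = 3x^2 + sin(x)
print(y.re, y.nil)            # f(0.7) and f'(0.7) = 6*0.7 + cos(0.7)
```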

In this work, the nilpotent coefficient of any dual number shall be accessed via the operator Nil(⋅), just as the real and imaginary coefficients are accessed via the Re(⋅) and Im(⋅) operators, respectively. Using Equation 4.70 as an example:

5 Note that dual numbers have not been used often enough in other contexts to have earned their own set notation. The use of $\mathbb{D}$ is adopted due to their frequency in this work.

$$\operatorname{Re}\!\left(f(x + h\varepsilon)\right) = f(x) \tag{4.71}$$
$$\operatorname{Nil}\!\left(f(x + h\varepsilon)\right) = h\frac{df}{dx} \tag{4.72}$$

Another Taylor series expansion proves the chain rule holds:

$$g\!\left(f(x + h\varepsilon)\right) = g\!\left(f(x) + h\frac{df}{dx}\varepsilon\right) \tag{4.73}$$
$$= g\!\left(f(x)\right) + \frac{h\varepsilon}{1!}\frac{df}{dx}\frac{dg}{df} + \frac{h^2\varepsilon^2}{2!}\frac{d^2f}{dx^2}\frac{dg}{df} + \dots \tag{4.74}$$
$$= g\!\left(f(x)\right) + h\frac{df}{dx}\frac{dg}{df}\varepsilon \tag{4.75}$$

$$\operatorname{Re}\!\left[g\!\left(f(x + h\varepsilon)\right)\right] = g\!\left(f(x)\right) \tag{4.76}$$
$$\operatorname{Nil}\!\left[g\!\left(f(x + h\varepsilon)\right)\right] = h\frac{dg}{dx} \tag{4.77}$$

Note that both Taylor series from Equation 4.70 and Equation 4.73 are equally valid for complex (analytic) functions with real inputs, which means that – unlike CD – any code utilizing complex math would not break if differentiated by dual numbers. In other words, for $f(x) = u(x) + v(x)i$,

$$f(x + h\varepsilon) = f(x) + h\frac{df}{dx}\varepsilon \tag{4.78}$$
$$= u(x) + v(x)i + h\left(\frac{du}{dx} + i\frac{dv}{dx}\right)\varepsilon \tag{4.79}$$
$$u(x + h\varepsilon) = u(x) + h\frac{du}{dx}\varepsilon \tag{4.80}$$
$$v(x + h\varepsilon) = v(x) + h\frac{dv}{dx}\varepsilon \tag{4.81}$$

When it comes to implementation of scalar dual numbers (e.g., [100, 101]), the code for low-level nilpotent operations collapses to simple derivative formulas – and so, although derived via a special algebra, a dual number overloaded data-type is completely indistinguishable from a single-variable, first-order AD type.⁶ It is still useful to frame differentiation in terms of dual numbers for the sake of algebraic manipulation and analysis – most importantly, such an analysis will lead to a more general differentiation approach in Section 5.3, along with a way to represent these numbers using only real values arranged in a matrix in Chapter 6. First, for the sake of completeness, it is necessary to visit two other algebras useful for AD.

4.4 Hyper-Dual Numbers

An extension of the duals termed “hyper-dual” numbers [26, 102–104] capable of second-order differentiation was introduced in 2011 by Fike and Alonso; the work ultimately became the subject of Fike’s 2013 dissertation [26].⁷

For the sake of simplicity and consistency with the algebraic framework described earlier in Section 4.2.1, hyper-duals are described herein with a parallel structure to multicomplex numbers: just as multicomplex numbers $\mathbb{C}_{(q)}$ can be thought of as “complex numbers with complex components”, hyper-duals are “dual numbers with dual components” – although unlike multicomplex numbers, hyper-duals are only nested once, rather than an arbitrary number of times.

(Note, Fike and Alonso appear to have been unaware of Lantoine et al’s [39] work,⁸ and vice versa; Martins did, however, go on to cite both in a 2013 review [105].) For this reason, hyper-duals will be denoted herein as $\mathbb{D}_{(2)}$, the structure of which can be derived as:

$$a \in \mathbb{D}_{(2)} \tag{4.82}$$
$$a = a_0 + a_2\varepsilon_2 \tag{4.83}$$
$$a_0, a_2 \in \mathbb{D} \tag{4.84}$$
$$\varepsilon_2^2 = 0 \neq \varepsilon_2 \tag{4.85}$$
$$a_0 = a_{0,0} + a_{0,1}\varepsilon_1 \tag{4.86}$$
$$a_2 = a_{2,0} + a_{2,1}\varepsilon_1 \tag{4.87}$$
$$a_{0,0},\, a_{0,1},\, a_{2,0},\, a_{2,1} \in \mathbb{R} \tag{4.88}$$
$$\varepsilon_1^2 = 0 \neq \varepsilon_1 \tag{4.89}$$
$$\varepsilon_1\varepsilon_2 = \varepsilon_2\varepsilon_1 \neq 0 \tag{4.90}$$

6 Strictly speaking, there are operations on dual numbers that are undefined for reals, e.g., the conjugate $\overline{b + \varepsilon c} \equiv b - \varepsilon c$. However, these operations do not correspond to any differentiable functions and in practice are omitted from AD implementations.

7 Note that Fike, like Martins (whose PhD thesis [93] broadened the use of the complex-step method), was also a graduate student of Alonso; unlike Martins, hyper-dual numbers were the primary focus of Fike’s dissertation, not a corollary.

8 Just as this author was, at the time, unaware of their work.

As generalized induction is not involved, simpler subscripts can be used to identify the elements. The definition can thus be simplified to:

$$b \in \mathbb{D}_{(2)} \tag{4.91}$$
$$b = b_0 + b_1\varepsilon_1 + b_2\varepsilon_2 + b_{12}\varepsilon_1\varepsilon_2 \tag{4.92}$$
$$b_0, b_1, b_2, b_{12} \in \mathbb{R} \tag{4.93}$$
$$\varepsilon_1^2 = \varepsilon_2^2 = 0 \tag{4.94}$$
$$\varepsilon_1\varepsilon_2 = \varepsilon_2\varepsilon_1 \neq 0 \tag{4.95}$$

Just like MCX and QCG, when run through a formerly real-valued code, hyper-dual numbers are capable of finding a mixed second derivative. Again using a Taylor series and the operators $\operatorname{Nil}_1$, $\operatorname{Nil}_2$ and $\operatorname{Nil}_{12}$ to access the coefficients of $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_1\varepsilon_2$, respectively:

$$f(x + h\varepsilon_1,\, y + h\varepsilon_2) = f(x) + h\frac{\partial f}{\partial x}\varepsilon_1 + h\frac{\partial f}{\partial y}\varepsilon_2 \tag{4.96}$$
$$\qquad + \frac{\varepsilon_1^2 h^2}{2}\frac{\partial^2 f}{\partial x^2} + \frac{\varepsilon_1\varepsilon_2 h^2}{2}\frac{\partial^2 f}{\partial x\partial y} + \frac{\varepsilon_2\varepsilon_1 h^2}{2}\frac{\partial^2 f}{\partial x\partial y} + \frac{\varepsilon_2^2 h^2}{2}\frac{\partial^2 f}{\partial y^2} \tag{4.97}$$
$$= f(x) + h\frac{\partial f}{\partial x}\varepsilon_1 + h\frac{\partial f}{\partial y}\varepsilon_2 + h^2\frac{\partial^2 f}{\partial x\partial y}\varepsilon_1\varepsilon_2 \tag{4.98}$$
$$f(x) = \operatorname{Re}\!\left[f(x + h\varepsilon_1,\, y + h\varepsilon_2)\right] \tag{4.99}$$
$$\frac{\partial f}{\partial x} = \operatorname{Nil}_1\!\left[f(x + h\varepsilon_1,\, y + h\varepsilon_2)\right]/h \tag{4.100}$$
$$\frac{\partial f}{\partial y} = \operatorname{Nil}_2\!\left[f(x + h\varepsilon_1,\, y + h\varepsilon_2)\right]/h \tag{4.101}$$
$$\frac{\partial^2 f}{\partial x\partial y} = \operatorname{Nil}_{12}\!\left[f(x + h\varepsilon_1,\, y + h\varepsilon_2)\right]/h^2 \tag{4.102}$$
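For illustration, the following sketch carries the four real coefficients of Equation 4.92 through overloaded addition and multiplication; it is not Fike and Alonso's implementation, and the class name and polynomial test function are assumptions. In contrast to the small-step methods above, the step size can simply be set to one and the recovered derivatives are exact to machine precision.

```python
# Sketch: hyper-dual arithmetic per Equations (4.91)-(4.95),
# b = b0 + b1*e1 + b2*e2 + b12*e1*e2 with e1**2 = e2**2 = 0. Illustrative only.
class HyperDual:
    def __init__(self, b0, b1=0.0, b2=0.0, b12=0.0):
        self.b0, self.b1, self.b2, self.b12 = b0, b1, b2, b12

    def __add__(self, other):
        other = other if isinstance(other, HyperDual) else HyperDual(other)
        return HyperDual(self.b0 + other.b0, self.b1 + other.b1,
                         self.b2 + other.b2, self.b12 + other.b12)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, HyperDual) else HyperDual(other)
        # Expand the product and drop every term containing e1**2 or e2**2.
        return HyperDual(self.b0 * other.b0,
                         self.b0 * other.b1 + self.b1 * other.b0,
                         self.b0 * other.b2 + self.b2 * other.b0,
                         self.b0 * other.b12 + self.b1 * other.b2
                         + self.b2 * other.b1 + self.b12 * other.b0)
    __rmul__ = __mul__

def f(x, y):
    return x * x * y + 3 * y

x = HyperDual(2.0, b1=1.0)    # x + e1  (h = 1)
y = HyperDual(5.0, b2=1.0)    # y + e2
out = f(x, y)
print(out.b0, out.b1, out.b2, out.b12)   # f, df/dx, df/dy, d2f/dxdy -> 35, 20, 7, 4
```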

Note that Fike used a more involved derivation [102]: he started by looking for an extension of the complex numbers that could track two first derivatives and a second derivative, which at first led him to the set of quaternions; when these proved to suffer from the same subtractive cancellation error as finite differences, he began investigating different types of non-real elements, focusing especially on the commutativity property (Equation 4.90); he then ultimately chose nilpotent elements of degree two, recognizing that they led to zero truncation error in the derivative

(i.e., higher-order terms in the Taylor series were zero). It appears that only then did the connection to canonical dual numbers become clear, leading to the name “hyper-dual”.

Fike and Alonso used the new numbers to find second derivatives for multiple aerospace applications, including unsteady Reynolds-averaged Navier-Stokes computational fluid dynamics (URANS CFD) and inviscid transonic flow; the resulting information was used for Kriging (a type of surrogate modeling) and multi-objective optimization of a supersonic airfoil and a business jet. They also emphasized the potential use of hyper-duals to verify any new (hypothetical) second-order adjoint method, just as Martins [93] had done when developing his first-order coupled adjoint method.

Just like standard duals, the implementation [101] of hyper-duals collapses to literal differentiation formulas; as such, it should be compared to standard AD approaches. However, the new approach was not tested alongside any other second-order, operator-overloaded tool – the only methods considered appear to be the first-order9 ADOL-C [75] and

TAPENADE [7] packages, which use overloading and source modification, respectively. Just as with MCX and QCG, there is little to recommend hyper-duals over newer high-order, forward-mode overloaded types (e.g., [106–109]).

Despite this, hyper-duals are fascinating for their demonstration of high-order nilpotent arithmetic and its use in differentiation – through their use, Fike hinted at a powerful analytic approach to deriving other AD methods. The algebras derived in Chapter 5 owe a certain amount of inspiration to his method.

9 ADOL-C is technically capable of higher orders, but must record a “tape” of every operation to hard-disk to do so; this is impractically expensive for the large aerospace applications discussed here. TAPENADE is likewise technically capable of second-order calculations, but only by applying it twice – a solution which suffers from all the drawbacks mentioned in Section 3.3.2. Both tools are quite old – TAPENADE dates from 2001, and ADOL-C from 1992 – and were not designed with modern, large-scale codes in mind.

4.5 Truncated Taylor Polynomials

In contrast to direct calculation of high-order derivatives, the tradition in AD for the past several decades (see,

e.g., [110,111], cited in [112]) has been to use the algebra of truncated polynomials. Implicitly, these are Taylor series,

but the underlying computations are the same for any polynomial.

To understand the concepts of both truncation and implicit Taylor behavior, consider two generic second-order

univariate polynomials y, z ∈ ℙ(1,∞), where ℙ(1,∞) denotes the set of all single-variable polynomials of arbitrary order:10

$$y(t) = y_0 + y_1 t + y_2 t^2 \tag{4.103}$$
$$z(t) = z_0 + z_1 t + z_2 t^2 \tag{4.104}$$

When multiplied together, these yield:

$$f(t) = y(t)z(t) = (y_0 + y_1 t + y_2 t^2)(z_0 + z_1 t + z_2 t^2) \tag{4.105}$$
$$= y_0 z_0 + y_0 z_1 t + y_0 z_2 t^2 + y_1 t z_0 + y_1 t z_1 t + y_1 t z_2 t^2 + y_2 t^2 z_0 + y_2 t^2 z_1 t + y_2 t^2 z_2 t^2 \tag{4.106}$$
$$= y_0 z_0 + (y_0 z_1 + y_1 z_0)\, t + (y_0 z_2 + y_1 z_1 + y_2 z_0)\, t^2 + (y_1 z_2 + y_2 z_1)\, t^3 + y_2 z_2\, t^4 \tag{4.107}$$

When calculating truncated polynomials of degree 2, one simply neglects to compute the $t^3$ and $t^4$ terms; this truncation to finite length means that second-order truncated polynomials can be stored as overloaded variables in a program, each being internally represented by no more than three real values. For two such variables $\hat{y}, \hat{z} \in \mathbb{P}_{(1,2)}$, multiplication is defined as:

10 Notation adapted from [108].

$$\hat{f}(t) = \hat{y}(t)\,\hat{z}(t) = y_0 z_0 + (y_0 z_1 + y_1 z_0)\, t + (y_0 z_2 + y_1 z_1 + y_2 z_0)\, t^2 \tag{4.108}$$

Polynomial calculations such as this are trivial to implement and can be done very efficiently. Note that – at least as far as their AD usage is concerned – elements such as ̂y and ̂z are not treated as functions, but as data; in the implementation of such an overloaded data-type, no powers of t ever have to be calculated, as the resulting coefficients will always be the same. Knowing this, re-imagine ̂y and ̂z as truncated Taylor series for functions of x with offset t:

$$\hat{y}(x + t) = \hat{y}(x) + \frac{t}{1!}\frac{\partial \hat{y}(x)}{\partial x} + \frac{t^2}{2!}\frac{\partial^2 \hat{y}(x)}{\partial x^2} \tag{4.109}$$
$$\hat{z}(x + t) = \hat{z}(x) + \frac{t}{1!}\frac{\partial \hat{z}(x)}{\partial x} + \frac{t^2}{2!}\frac{\partial^2 \hat{z}(x)}{\partial x^2} \tag{4.110}$$
$$\hat{f}(x + t) = \hat{y}(x + t)\,\hat{z}(x + t) = \hat{y}(x)\hat{z}(x) + \left(\hat{y}(x)\frac{\partial \hat{z}(x)}{\partial x} + \frac{\partial \hat{y}(x)}{\partial x}\hat{z}(x)\right)t + \left(\frac{1}{2}\hat{y}(x)\frac{\partial^2 \hat{z}(x)}{\partial x^2} + \frac{\partial \hat{y}(x)}{\partial x}\frac{\partial \hat{z}(x)}{\partial x} + \frac{1}{2}\frac{\partial^2 \hat{y}(x)}{\partial x^2}\hat{z}(x)\right)t^2 \tag{4.111}$$
$$= \hat{f}(x) + \frac{\partial \hat{f}(x)}{\partial x}t + \frac{1}{2}\frac{\partial^2 \hat{f}(x)}{\partial x^2}t^2 \tag{4.112}$$

The generic polynomial multiplication has computed the first- and second-order product rules and stored the result in Taylor form. A similar result holds for multivariate polynomials (i.e., those using multiple offsets tj for each differentiated input xj of the function). Note that summation of polynomials is trivial, and obviously preserves the

Taylor structure; therefore, between addition and multiplication, it is now possible to compute arbitrary power series of elements in ℙ(n,q). That is, for any polynomial with n inputs and highest order q, the entire polynomial can itself be raised to any power and summed with other powers. This in turn allows computation of many elementary functions; the exponential of a polynomial [108] is a useful demonstration.

First, for the sake of more concise notation, re-designate elements of $\mathbb{P}_{(n,q)}$ as having the form $\hat{a} = a_0 + \dot{a}$, where $a_0$ is the constant value and all other terms are gathered into $\dot{a}$; this implies that any function of $\dot{a}$ in isolation is in fact operating on a polynomial with zero constant term. Using this notation, the exponential function is simply:

$$e^{\hat{x}} = e^{x_0 + \dot{x}} = e^{x_0} e^{\dot{x}} \tag{4.113}$$

Recall that $e^x$ is defined as:

$$e^x \equiv \sum_{i=0}^{\infty} \frac{x^i}{i!} \tag{4.114}$$

Putting this into the second exponential in Equation 4.113 yields:

$$e^{\hat{x}} = e^{x_0}\left(\sum_{i=0}^{\infty} \frac{\dot{x}^i}{i!}\right) \tag{4.115}$$

However, examination of Equation 4.108 or Equation 4.111 will reveal that under multiplication, every first-order term is multiplied by a constant (zero-order) term from one of the multiplicands; likewise, every second-order term requires either two first-order terms or at least one constant term. Therefore, as $\dot{x}$ has no constant term, $\dot{x}^2$ will have no first-order coefficients and $\dot{x}^3$ will have no second-order coefficients. Generalizing to any degree of polynomial, $\dot{x}^{i+1}$ will have no $i$th-order coefficients – and so, because $\dot{x}$ is truncated to the finite order $q$, the quantity $\dot{x}^{q+1}$ is necessarily zero. In other words, $\dot{x}$ is nilpotent, just like the unit elements used by dual and hyper-dual numbers.

Putting all this together, for any truncated Taylor polynomial ̂x ∈ ℙ(n,q), the exact exponential of ̂x is computed by multiplying the original scalar exponential by a finite power series:

$$e^{\hat{x}} = e^{x_0}\left(\sum_{i=0}^{q} \frac{\dot{x}^i}{i!}\right) \tag{4.116}$$

Similar power-series formulas exist for other operations; the most common can be found in [42, 108]. Thus, if an overloaded program implements the algebra of $\mathbb{P}_{(n,q)}$, then any polynomial inputs structured as Taylor series will retain that structure as they propagate through the program. Derivatives can then be recovered from the output by dividing particular polynomial coefficients by their corresponding Taylor coefficients. By calculating all derivative terms via such simple power series of easy-to-compute polynomials, the Taylor-series approach to AD is significantly more efficient than using literal differentiation formulas, which require more complicated computations (e.g., binomial coefficients for Leibniz’s rule).
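The following sketch implements the truncated product of Equation 4.108 (generalized to order $q$) and the finite exponential series of Equation 4.116 for a univariate series; the helper names and the fixed truncation order are assumptions for illustration, not drawn from the cited packages.

```python
# Sketch: univariate truncated Taylor arithmetic, P_(1,q), with coefficient arrays
# of length q+1; derivatives are recovered by dividing out the Taylor coefficients.
# Illustrative only.
import math
import numpy as np

Q = 3  # highest derivative order carried

def tmul(y, z):
    """Truncated product: the first Q+1 coefficients of the full polynomial product."""
    f = np.zeros(Q + 1)
    for i in range(Q + 1):
        f[i] = sum(y[k] * z[i - k] for k in range(i + 1))
    return f

def texp(x):
    """exp of a truncated polynomial: e^{x0} * sum_{i=0..Q} xdot^i / i!  (Eq. 4.116)."""
    xdot = x.copy()
    xdot[0] = 0.0                      # the nilpotent part
    term = np.zeros(Q + 1); term[0] = 1.0
    out = np.zeros(Q + 1)
    for i in range(Q + 1):
        out += term / math.factorial(i)
        term = tmul(term, xdot)
    return math.exp(x[0]) * out

# Seed x = 0.5 as a Taylor series in the offset t: [x, 1, 0, 0].
x = np.array([0.5, 1.0, 0.0, 0.0])
g = texp(tmul(x, x))                   # g(x) = exp(x^2)
derivs = [g[k] * math.factorial(k) for k in range(Q + 1)]
print(derivs)                          # g, g', g'', g''' at x = 0.5
```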

Importantly, a multivariate Taylor approach similar to this univariate example can be coupled with NMD, as will be described in Part II and demonstrated in Part III. Before closing this chapter, though, it is necessary to note that still more efficient approaches exist for multivariate derivatives [42, 78, 112]. These methods actually limit themselves to univariate Taylor series – one for every entry in the highest-order tensor – and take advantage of the fact that such polynomials use much simpler memory addressing and can thus be computed more efficiently. (Note that NMD, being a matrix algorithm, cannot benefit from such a thing.) Each series is used to find derivatives along specially-selected directions in the domain space; the actual multivariate derivatives are then known functions of the final results. The precise details of such methods are beyond the scope of this paper, but the interested reader can find them in the cited literature.

Part II

Hypercomplex Algebras and Nilpotent Matrix Differentiation

Chapter 5

Generalized Differentiation via Nilpotent Elements

As already described in Section 4.3, dual numbers are an extension of the algebra of real numbers. Most precisely, they form a commutative two-dimensional associative algebra over the reals; more generally, using a historical terminology, they are a type of hypercomplex number [113, 114] – an umbrella category that also includes complex, multicomplex, quasi-complex and hyper-dual numbers. The standard duals, as defined, can find first-order derivatives with respect to a single input. It is possible, though, to find other derivatives by further extending the concept of dual numbers and their nilpotent elements to other types of hypercomplex numbers with more dimensions. This chapter introduces the concept by treating the new numbers as scalars; Chapter 6 will then present a matrix representation.

5.1 Hypercomplex Numbers

Hypercomplex numbers are defined [113] as elements of $(D + 1)$-dimensional distributive algebras over the real numbers, generically denoted herein as $\mathbb{H}_{(D)}$;¹ these algebras are not necessarily commutative, although non-commutative variants will not be considered in this chapter. Each number is defined by a set of real coefficients $\{x_0, x_1, \dots, x_D\}$ for the basis elements $\{1, \hat{e}_1, \dots, \hat{e}_D\}$, where each $\hat{e}_i$ is non-real and $D$ is finite. Further, for such an algebra to be closed, products of elements must yield elements (real or otherwise) already in the set. A few common examples include:

1 This is something of an abuse of set notation, as there are in fact many distinct types of hypercomplex algebras of the same dimension. Further, in many sources, the symbol $\mathbb{H}$ is instead used to represent the set of quaternions (which are themselves hypercomplex). It is appropriated here out of necessity.

- Standard complex numbers: $D = 1$; $\hat{e}_1 \equiv i$, with $i^2 = -1$.

- Standard dual numbers: $D = 1$; $\hat{e}_1 \equiv \varepsilon$, with $\varepsilon^2 = 0$, $\varepsilon \neq 0$.

- Hyper-dual numbers: $D = 3$; $\hat{e}_1 \equiv \varepsilon_1 \neq 0$, $\hat{e}_2 \equiv \varepsilon_2 \neq 0$, $\hat{e}_3 \equiv \varepsilon_{12} \neq 0$, with $\varepsilon_1^2 = \varepsilon_2^2 = \varepsilon_{12}^2 = 0$ and $\varepsilon_1\varepsilon_2 = \varepsilon_2\varepsilon_1 = \varepsilon_{12}$.

- Quaternions: $D = 3$; $\hat{e}_1 \equiv i$, $\hat{e}_2 \equiv j$, $\hat{e}_3 \equiv k$, with $ij = k = -ji$, $ki = j = -ik$, $jk = i = -kj$, and $i^2 = j^2 = k^2 = ijk = -1$.

Intuitively, one can apply the AD concept of vectorization to dual numbers, creating hypercomplex numbers with many nilpotent elements to find many first derivatives:

$$\hat{x} = x_0 + \sum_{i=1}^{D} x_i\,\varepsilon_i \tag{5.1}$$
$$\forall i: \quad x_i \in \mathbb{R} \tag{5.2}$$
$$\forall i: \quad \varepsilon_i^2 = 0 \tag{5.3}$$
$$\forall i: \quad \varepsilon_i \neq 0 \tag{5.4}$$
$$\forall i, j: \quad \varepsilon_i\varepsilon_j = 0 \tag{5.5}$$

Note that the last part of the definition (Equation 5.5) sets the nilpotent elements to be orthogonal to each other – without this, the dimensionality of the algebra would grow to $2^D$, requiring a more sophisticated ruleset (discussed in the next section). A multivariate Taylor series once again demonstrates the recovery of derivatives:

$$f(\mathbf{x} + S\boldsymbol{\varepsilon}) = f(\mathbf{x}) + \sum_{m_1=0}^{\infty}\cdots\sum_{m_D=0}^{\infty}\left(\prod_{i=1}^{D}\frac{(h_i\varepsilon_i)^{m_i}}{m_i!}\right)\frac{\partial^{\,\sum_j m_j} f}{\prod_k \partial x_k^{m_k}}(\mathbf{x}) \tag{5.6}$$

Here, $\mathbf{x} \in \mathbb{R}^D$, $S \in \mathbb{R}^{D \times D}$ is a diagonal matrix² representing step-sizes $S_{ii} = h_i$ in each dimension, and $\boldsymbol{\varepsilon} = [\varepsilon_1, \dots, \varepsilon_D]^T$ contains the nilpotent elements. This can be rewritten as:

$$f(\mathbf{x} + S\boldsymbol{\varepsilon}) = f(\mathbf{x}) + \sum_i h_i\varepsilon_i\frac{\partial f}{\partial x_i} + \sum_{i,j}\frac{\varepsilon_i\varepsilon_j}{2!}h_i h_j\frac{\partial^2 f}{\partial x_i \partial x_j} + \dots \tag{5.7}$$
$$= f(\mathbf{x}) + \sum_i h_i\varepsilon_i\frac{\partial f}{\partial x_i} \tag{5.8}$$

Simply define the operator $\operatorname{Nil}_i(\cdot)$ to access the coefficient of $\varepsilon_i$, and each first derivative can be accessed as:

$$\operatorname{Nil}_i\!\left[f(\mathbf{x} + S\boldsymbol{\varepsilon})\right]/h_i = \frac{\partial f}{\partial x_i} \tag{5.9}$$

2 Other configurations of $S$ can be used as a flexible framework for directional derivatives; see, e.g., [78], who phrase the problem without special algebra as $\nabla_{\mathbf{z}} f(\mathbf{x} + S\mathbf{z})$ such that $\mathbf{z}$ need not even be of the same dimensions as $\mathbf{x}$.
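A minimal sketch of this vectorized first-order algebra is given below; the class name and seeding helper are illustrative assumptions. One real value is stored alongside a length-$D$ array of nilpotent coefficients, so a single evaluation returns the full gradient per Equation 5.9 (with every $h_i = 1$).

```python
# Sketch: a vectorized dual-like number implementing Equations (5.1)-(5.5);
# eps_i * eps_j = 0 for all i, j, so only the linear terms survive. Illustrative only.
import numpy as np

class MultiDual:
    def __init__(self, re, nil):
        self.re, self.nil = re, np.asarray(nil, dtype=float)

    def __add__(self, other):
        return MultiDual(self.re + other.re, self.nil + other.nil)

    def __mul__(self, other):
        return MultiDual(self.re * other.re,
                         self.re * other.nil + self.nil * other.re)

def seed(values):
    """Give each of the D inputs its own nilpotent direction (h_i = 1)."""
    D = len(values)
    return [MultiDual(v, np.eye(D)[k]) for k, v in enumerate(values)]

x, y, z = seed([1.0, 2.0, 3.0])
g = x * y + y * z + x * z               # f(x, y, z) = xy + yz + xz
print(g.re, g.nil)                      # f = 11, gradient = [y+z, x+z, x+y] = [5, 4, 3]
```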

It is possible to use a similar approach to find high-order derivatives – although the work thus far has concerned itself with nilpotent elements that square to zero, other types are possible. Define a nilpotent element of degree $P$ to be any element $\varepsilon_j$ with the property $\varepsilon_j^P = 0$ but $\varepsilon_j^{P-1} \neq 0$. When such an element is used in an AD evaluation, the univariate Taylor series becomes:

$$f(x + h\varepsilon_j) = f(x) + \frac{h\varepsilon_j}{1!}\frac{\partial f(x)}{\partial x} + \frac{(h\varepsilon_j)^2}{2!}\frac{\partial^2 f(x)}{\partial x^2} + \dots + \frac{(h\varepsilon_j)^{P-1}}{(P-1)!}\frac{\partial^{P-1} f(x)}{\partial x^{P-1}} \tag{5.10}$$

The evaluation of $f(x + h\varepsilon_j)$ yields a $P$-term polynomial in $\varepsilon_j$ (counting the real-valued term); the $\varepsilon_j^P$ term and all following terms disappear. As can be seen, the $m$th non-real term will yield the $m$th-order derivative of $f$ times a known coefficient. By temporarily defining yet another operator $\operatorname{Nil}_{j^m}(\cdot)$ to access the $\varepsilon_j^m$ term, it can be seen that

$$\frac{\partial^m f(x)}{\partial x^m} = m!\,\operatorname{Nil}_{j^m}\!\left[f(x + h\varepsilon_j)\right]/h^m \tag{5.11}$$

Polynomials of this form, whether they encode a derivative or not, describe a new class of hypercomplex algebra. Although the entire polynomial can be described in terms of the bases $\{1, \varepsilon_j\}$, these are not the only basis elements of the algebra – the fact that $P$ coefficients are required to represent a result such as Equation 5.10 implies a total of $P$ dimensions: each non-zero power of $\varepsilon_j$ is a basis unto itself, albeit non-orthogonal with every other basis. Before continuing to mixed derivatives, it is important to address this complication.

5.2 Tensor Products and Indexing Notation

As seen, non-real elements with various properties can compute derivatives. However, for an algebra to be closed, every possible product of two basis elements must result in some element already in the set or a multiple thereof.

Consider the vectorized algebra defined above. If the orthogonality restriction of Equation 5.5 is removed from the definition, every product of distinct non-reals forms its own dimension; the resulting algebra is not $D$-dimensional, as might be intended, but $(2^D)$-dimensional (and if the square of individual elements wasn't eliminated by nilpotency, the dimensionality would be infinite – i.e., the algebra would not be closed). This is precisely the case with the multicomplex numbers from Section 4.2.1, which are quite inefficient in the calculation of derivatives – an element of $\mathbb{C}_{(q)}$ can compute a single $q$th-order derivative, but the calculation of multiple derivatives of the same order requires a great many redundant calculations of lower-order values. It is better to broadly define a set of rules to limit products to a certain order, as with QCG; this allows for redundancy-free calculation and the omission of unwanted higher orders.

It also corresponds to differentiation algorithms that are not written in terms of special algebras: rather than consider all possible multivariate derivatives and then throw out those that are unwanted, most code is written the other way around, by choosing which mixed derivatives are wanted and then finding which coefficients are needed to compute

(e.g.) a product rule. Looking even further, the best algebras would allow for sparse sets of derivatives, requiring element-by-element rules to selectively omit variables. Finally, note that the high-order algebras considered so far, when run through a real-valued function, do not directly produce derivatives – instead, they calculate the desired value times some coefficient, which must be divided out in post-processing. Given all this, the concept of a tensor product of entire algebras might be more instructive, if only because it makes it easier to distinguish between the multiplication of two elements ̂ei ̂ej and the element that actually results from that multiplication.

Given one algebra $\mathbb{H}_{(D_1)}$ with basis elements $\{1, \hat{e}_1, \dots, \hat{e}_{D_1}\}$ and another algebra $\mathbb{H}_{(D_2)}$ with basis elements $\{1, \hat{d}_1, \dots, \hat{d}_{D_2}\}$, the tensor product can be denoted as $\mathbb{H}_{(D_1)} \otimes \mathbb{H}_{(D_2)}$ and results in another algebra with $(D_1 + 1)(D_2 + 1)$ dimensions, corresponding to the basis elements $\{1, \hat{e}_1, \dots, \hat{e}_{D_1}\} \times \{1, \hat{d}_1, \dots, \hat{d}_{D_2}\}$ enumerated as:

$$\left\{\begin{array}{cccc}
1 \otimes 1, & 1 \otimes \hat{d}_1, & \dots, & 1 \otimes \hat{d}_{D_2}, \\
\hat{e}_1 \otimes 1, & \hat{e}_1 \otimes \hat{d}_1, & \dots, & \hat{e}_1 \otimes \hat{d}_{D_2}, \\
\vdots & & & \\
\hat{e}_{D_1} \otimes 1, & \hat{e}_{D_1} \otimes \hat{d}_1, & \dots, & \hat{e}_{D_1} \otimes \hat{d}_{D_2}
\end{array}\right\} \tag{5.12}$$

Numbers in the new algebra are thus defined by $(D_1 + 1)(D_2 + 1)$ real coefficients $\{x_{0,0}, \dots, x_{D_1,D_2}\}$. Just as in the original algebras, multiplication must be closed – i.e., any product of tensor elements must yield another tensor element already in $\mathbb{H}_{(D_1)} \otimes \mathbb{H}_{(D_2)}$. This requirement is satisfied if $\mathbb{H}_{(D_1)}$ and $\mathbb{H}_{(D_2)}$ are themselves closed and multiplication of the new basis elements and their coefficients is performed piecewise using underlying rules for the original algebras – that is, for two numbers $x$ and $y$,

$$\forall i, j, k, l: \quad (x_{i,j}\, \hat{e}_i \otimes \hat{d}_j)(y_{k,l}\, \hat{e}_k \otimes \hat{d}_l) \tag{5.13}$$
$$= x_{i,j} y_{k,l}\, (\hat{e}_i \hat{e}_k) \otimes (\hat{d}_j \hat{d}_l) \tag{5.14}$$

Of special interest to this work is the case of an algebra tensored with itself – that is, ℍ(D) ⊗ ℍ(D), which gives an easy framework for discussing mixed derivatives with respect to the same set of inputs. For the base algebra ℍ(D) to be closed, every scalar (i.e., ordinary) product ̂ei ̂ej must result in some element already in the set; in contrast, the tensor elements are entirely new – in other words, ̂ei ̂ej ≠ ̂ei ⊗ ̂ej. Note that the tensor product is not intrinsically commutative, i.e., ( ̂ei⊗ ̂ej) ≠ ( ̂ej ⊗ ̂ei), even if multiplication of the new elements remains so, i.e., ( ̂ei⊗ ̂ej)( ̂ek⊗ ̂el) = ( ̂ek⊗ ̂el)( ̂ei⊗ ̂ej).

This means that even if ̂ei and ̂ej are orthogonal in the base algebra, ( ̂ei ⊗1)(1⊗ ̂ej) = ( ̂ei ⊗ ̂ej). A piecewise definition of multiplication can thus be used to turn the vectorized first-order algebra from Equation 5.1 into an algebra capable of both first and second derivatives.

Note that the tensor of the two real-valued basis elements $(1 \otimes 1)$ is (just like 1 itself) a multiplicative identity, and thus can be treated simply as $(1 \otimes 1) = 1$. This property implies that tensor products of hypercomplex algebras are themselves hypercomplex. The new algebra can therefore be more easily denoted with the shorthand:

$$\mathbb{H}_{(D_1 D_2)} \equiv \mathbb{H}_{(D_1)} \otimes \mathbb{H}_{(D_2)} \tag{5.15}$$
$$x \in \mathbb{H}_{(D_1 D_2)} \implies x = \sum_{i=0}^{D_1}\sum_{j=0}^{D_2} x_{ij}\,\hat{e}_{ij} \tag{5.16}$$
$$\hat{e}_{ij} \equiv \hat{e}_i \otimes \hat{d}_j \tag{5.17}$$
$$\hat{e}_i \equiv \hat{e}_{i0} = \hat{e}_i \otimes 1 \tag{5.18}$$
$$\hat{d}_j \equiv \hat{e}_{0j} = 1 \otimes \hat{d}_j \tag{5.19}$$
$$\hat{e}_{00} \equiv 1 \tag{5.20}$$

Note that Equation 5.18 and Equation 5.19 are adopted because the original definitions of $\hat{e}_i$ and $\hat{d}_j$ are no longer useful within the new set – operations are only defined within an algebra, so that for $x \in \mathbb{H}_{(D_1 D_2)}$, $y \in \mathbb{H}_{(D_1)}$, and $z \in \mathbb{H}_{(D_2)}$, the products $xy$, $xz$, and $yz$ have no meaning.

As an example of how this framework can be useful, consider the tensor product of the set of dual numbers $\mathbb{D}$ with another set $\mathbb{D}'$, identical except for a distinct nilpotent element $\varepsilon_2$; denote the original nilpotent as $\varepsilon_1$, such that

$$y \in \mathbb{D} \otimes \mathbb{D}' \implies \tag{5.21}$$
$$y = y_0(1 \otimes 1) + y_{1,0}(\varepsilon_1 \otimes 1) + y_{0,1}(1 \otimes \varepsilon_2) + y_{11}(\varepsilon_1 \otimes \varepsilon_2) \tag{5.22}$$
$$(\varepsilon_1 \otimes 1)^2 = (\varepsilon_1^2 \otimes 1) = (0 \otimes 1) = 0 \tag{5.23}$$
$$(1 \otimes \varepsilon_2)^2 = (1 \otimes \varepsilon_2^2) = (1 \otimes 0) = 0 \tag{5.24}$$
$$(\varepsilon_1 \otimes \varepsilon_2)^2 = (\varepsilon_1^2 \otimes \varepsilon_2^2) = (0 \otimes 0) = 0 \tag{5.25}$$
$$(\varepsilon_1 \otimes 1)(1 \otimes \varepsilon_2) = (\varepsilon_1 \otimes \varepsilon_2) \neq 0 \tag{5.26}$$

Numbers from the new algebra can be more simply denoted by using the shorthand:

$$\varepsilon_1 \equiv (\varepsilon_1 \otimes 1) \tag{5.27}$$
$$\varepsilon_2 \equiv (1 \otimes \varepsilon_2) \tag{5.28}$$
$$\varepsilon_{12} \equiv (\varepsilon_1 \otimes \varepsilon_2) \tag{5.29}$$

which results in

$$y = y_0 + y_1\varepsilon_1 + y_2\varepsilon_2 + y_{12}\varepsilon_{12} \tag{5.30}$$

This recreates the definition of the hyper-dual numbers from Section 4.4. The fact that tensor products of hypercomplex algebras yield yet more hypercomplex algebras implies that arbitrarily many tensor products can be combined; with the use of imaginary basis elements, such high-order products can likewise be used to recreate the multicomplex and quasi-complex numbers from Section 4.2.1 and Section 4.2.2. They can further be used to build a “multidual” algebra with a parallel structure to the multicomplex numbers:

$$\mathbb{D}_{(q)} \equiv \mathbb{D} \otimes \mathbb{D}' \otimes \cdots \otimes \mathbb{D}'^{\,q} \tag{5.31}$$

where $\mathbb{D}'^{\,p}$ is a dual-like algebra with distinct nilpotent element $\varepsilon_p$. Such a multidual number is a generalization of hyper-dual numbers to many dimensions, and will be revisited in Chapter 6.

As mentioned, strictly speaking the scalar product of two basis elements is distinct from the tensor product. However, the chosen shorthand notation blurs the line between them – for example, $\hat{e}_i\hat{d}_j = (\hat{e}_i \otimes 1)(1 \otimes \hat{d}_j) = (\hat{e}_i \otimes \hat{d}_j)$.

This is intentional – the demonstration of multiduals aside, tensor products are intended in this work as a quick way to build differentiation systems where the number of independent variables has no relationship to the order of derivatives to be calculated. The shorthand is only truly a problem in the case of an algebra tensored with itself,3 where there is no convention to distinguish between ̂ei ⊗ 1 and 1 ⊗ ̂ei. This problem is done away with by noting that, while piecewise multiplication is the most common way to satisfy closure of the new algebra, there are others – in particular, tensor products of elements can be defined to be equal to other arbitrarily-chosen elements. Keeping in mind that the only goal herein is differentiation, this work simply specifies that when an algebra is tensored with itself, ̂ei ⊗1 = 1⊗ ̂ei and

$(\hat{e}_i \otimes 1)^2 = \hat{e}_i \otimes \hat{e}_i$. Similarly, all other piecewise multiplication rules are modified to impose tensor commutativity, such that $\hat{e}_i \otimes \hat{e}_j = \hat{e}_j \otimes \hat{e}_i$; this cuts the dimensionality of the system nearly in half and makes multivariate differentiation itself commutative, thereby avoiding redundancy by accounting for symmetry of the Hessian and higher-order derivative structures.

Even more general multiplication rules are possible – tensor products can also result in scalar multiples of other elements, or even zero. This last option can be used to introduce derivative sparsity by removing dimensions from consideration; alternatively, it can be used to assign arbitrary nilpotent degree to the new basis elements. It is useful to express this relationship by stipulating that any nonzero product of shorthand elements is in fact a function of the corresponding tensor product – that is, ̂ei ̂ej ∈ {0, g( ̂ei ⊗ ̂ej)} for some function g. This will be revisited in the next

section, and becomes especially helpful for the construction of general matrix representations in Chapter 6.

3 Note that the multiduals are not a case of an algebra tensored with itself – $\mathbb{D}$, $\mathbb{D}'$, etc., while similar in structure, are defined to be distinct.

Again, to a large extent all of this flexibility means that tensor products of algebras are simply a different framework for discussing systems capable of higher-order derivatives: one could instead carefully define rules for the scalar product of non-real basis elements (as was done for QCG) that limited the algebra to omit unwanted or redundant calculations. Use of tensor language provides a (hopefully) more understandable approach to targeting a certain number of first derivatives, a certain number of second, and so on – rather than choosing some set of non-real elements and discovering after the fact the dimensionality of the resulting system, this framework allows one to precisely sculpt a new algebra as a combination of first-order systems.

Before continuing with a description of other high-order tensors, it is necessary to adopt a multi-index notation to describe arbitrary tensor products of individual basis elements. Denote a collection of $M$ different multiplicand indices by $\boldsymbol{i} = \{i_1, \dots, i_M\}$ or $\boldsymbol{j} = \{j_1, \dots, j_M\}$; then, a cumulative tensor product can be described in shorthand as:

$$\hat{e}_{\boldsymbol{i}} \equiv \bigotimes_{m=1}^{M} \hat{e}_{i_m} \tag{5.32}$$

Again, tensor products of an algebra with itself will be constructed to be commutative. Therefore, in contrast to the multi-index notation more commonly used in other contexts, here and do not represent tuples (ordered lists of indices); instead, they should be considered multisets – that is, sets (which are defined to be unordered) that allow multiple copies of indices. Here, duplicates of an index indicate repeated multiplication by the same basis element.

Because repeated multiplication by ̂e0 = 1 does not change an element, zero is excluded from the index multisets; this implies that the real element is actually indexed by the empty set.

Because tensor products of basis elements are also basis elements (if nonzero), it is important to stipulate that each index within a multiset corresponds only to an original element that was not constructed via a product, i.e., an element that is indivisible by any other element. For concision, refer to any such indivisible element as a “prime” basis element, and to any non-prime as a “composite” basis element. Using this language, the elements "1 and "2 from the hyper-duals are prime and "12 is composite. The product of two arbitrarily-indexed elements ̂e and ̂e can then be indexed by a multiset sum, which is simply a concatenation of the two index lists: 88

̂e ⊗ ̂e = ̂e ⊎ (5.33)

⊎ {i , … , i , j , … , j } (5.34) ≡ 1 M1 1 M2

M1 = ð ð (5.35)

M2 = ð ð (5.36)

where the absolute value operator ð ⋅ ð has been used to represent the number of indices in each multiset, i.e., the number of prime elements in each multiplicand; this will also be referred to as the order of the basis element (not to be confused with a nilpotent element’s degree), as it will equal the order of the corresponding derivative. Under this notation, powers of prime elements are represented by repeated indices – for example, the high-order univariate derivatives discussed earlier in Section 5.1 would correspond to nilpotent elements " indexed by = {j, j, j, …}, i i such that i contains i copies of j. The same notation can be applied to accessor operators, so that the earlier operator

Nil i is equivalent to Nil (x) = x . j i i

In cases where operations on more than two arbitrary tensor elements are necessary, a subscript-of-a-subscript notation would rapidly become unwieldy. For this reason, every basis element – including composites – is additionally given a one-to-one mapping with a unique singleton index. To distinguish the two, {i, j, k, …} will henceforth be used to indicate multi-indices, and {a, b, c, …} will be used to indicate singleton indices – and so, for example, ̂ea ̂eb = g( ̂ec) should be considered a shorthand for ̂e ̂e = g( ̂e ⊗ ̂e ) = g( ̂e ) for arbitrary multi-index sets , , and a b a b c a b c with c = a ⊎ b. A singleton index of zero corresponds to the empty multiset 0 = {} and the real-valued identity element 1; beyond this, which singleton index corresponds to which multi-index is arbitrary and usually irrelevant for derivation.

Given this framework, it is easier to discuss hypercomplex algebras capable of finding any desired set of deriva- tives. 89

5.3 Generalized Hypercomplex Algebras for AD

To build an algebraic system capable of finding any derivative, begin with a first-order algebra ℍ(D,1) defined by D distinct nilpotents {"1, … ,"D}; this corresponds to a problem with D differentiated inputs. To find derivatives of order M, take the repeated tensor product of ℍ(D,1) with itself, expressed as a tensor power:

⊗M ℍ(D,M) ≡ ℍ(D,1) (5.37)

Impose tensor commutativity and specify that all tensor basis elements "a – both prime and composite – have nilpotent degree Pa = M − ð að + 2, which ensures that every enumerated element is nonzero while keeping multipli- cation closed. Insert numbers from this algebra into the same generic multivariate Taylor series used in Equation 5.6:

∞ ∞ H0 D m 1 ∑ I É É Ç (ℎ " ) i ) j mj f x S" f x i i f x (xx + S ) = (xx) + ⋯ ∏ m (xx) (5.38) mi! )xk k m1=0 mD=0 i=1 k É )f É "i"j )2f É "i"j"k )3f = f(xxx) + ℎ " + ℎ ℎ + ℎ ℎ ℎ + … (5.39) i i )x i j )x )x i j k )x )x )x i i i,j 2! i j i,j,k 3! i j k

D D×D T where once again x ∈ ℝ , S ∈ ℝ is a diagonal matrix with Sii = ℎi, and " = ["1, … ,"D] . To simplify notation, denote multi-index powers and derivatives with:

= {i1, … , iP } (5.40) P Ç ℎ ℎ (5.41) ≡ ip p=1 P Ç ) ) (5.42) x ≡ )x p=1 ip

Accounting for the nilpotent elements, Equation 5.38 then simplifies to the finite series:

N É ℎ a  f(xx + S") = f(xxx) + ) a f(xxx) " (5.43) C( ) x a a=1 a 90

where N is the total number of nilpotent elements of any degree or order and C( a) is an function that depends on

how often individual prime indices are repeated inside a. If multiplication and differentiation were not commutative,

C would always be equal to ð að!, but as it is, any element without repeated prime factors will have C( a) = 1 because

it appears multiple times in the summation in Equation 5.38; those with repeats will have 1 < C( a) ≤ ð að!.

It can now be seen that a number x ∈ ℍ(D,M) from the new hypercomplex algebra is defined by the collection

of real coefficients {x0, … , xN } where, for AD-style evaluation of f(xx + S"),

⎧ ⎪ Re (f(xx + S")) = f(xxx) a = 0 ⎪ xa = (5.44) ⎨ ℎ a ⎪ Nil (f(xx + S")) = ) a f(xxx) a 1 ⎪ a C( ) x ≥ ⎩ a

However, there are yet more modifications to the tensor elements that can be used to make differentiation even

more convenient. Recall that when building a tensor product of simpler algebras, products of the new tensor elements

can be mapped to other elements or to multiples thereof – one can trivially specify that "a"b = g("a ⊗ "b) = g("c) =

C( c)"c. This removes the denominator from Equation 5.44, but more importantly, it will simplify the derivation of

arbitrary matrix representations in Chapter 6.

This algebra, as defined, can find every mixed derivative up to order M. To find sparse subsets of these, define

¨ an alternate ℍ(D,M) where some of the tensor elements are defined to be zero or else have smaller nilpotency degree.

Consider two examples:

P M P " (1) If the first prime nilpotent element has degree 1 = + 1 but all other i≠1 = 2 and all i≠1 are orthogonal

both to each other and to "1 – i.e., "i ⊗ "j = 0 for (i > 1, j ≥ 1) – then the set is limited to the gradient and

all high-order univariate derivatives of f with respect to x1.

" " " ⊗" (2) If example (1) is modified such that i≠1 are not orthogonal to 1 but remain so to each other – i.e., i j = 0

for (i ≠ 1, j ≠ 1) but "i ⊗ "j ≠ 0 if i = 1 – then the set becomes the gradient and all univariate derivatives

) thereof with respect to x1 – that is, every mixed derivative with at most a single ∕)x . i≠1

Unfortunately, this new hypercomplex scheme faces the same shortcomings laid out in Chapter 4 for standard dual and hyper-dual numbers – namely, scalar implementation would be indistinguishable from literal differentiation 91 formula, which in turn are less efficient than evaluation using truncated polynomials. Instead, the advantage of this

( ) algebraic framework becomes clear when each " is represented by an isomorphic matrix E , as will be explored in the next chapter. Chapter 6

Differentiation via Matrix Representations of Algebras

Thus far, when discussing special algebras useful for differentiation, this work has repeatedly stated that “matrix

representations” of the algebras exist. Formally, this means that an isomorphism M (⋅) exists to map every element of a chosen scalar algebra S to a matrix of a particular structure. This map must be one-to-one, such that every distinct element x ∈ S corresponds to a single matrix Xœ = M (x), and onto, such that every matrix Yœ with the given structure likewise corresponds to a single scalar element y ∈ S – equivalent to stating that the map is invertible, with     M−1 Yœ = S Yœ = y. Further, to classify as an isomorphism, the map must preserve structure with respect to some arbitrary operation ∙ on S – that is, map to some other operator ∗ on the structured matrices such that M (x ∙ y) =

M (x) ∗ M (y). The set of matrices with the given structure then forms its own algebra M. In this work, the isomorphism must in fact form a map for each arithmetic operator: addition, multiplication of elements, multiplication by a scalar, and their respective inverses; as the goal is to use these elements of M in the context of linear algebra packages, these operations on M have to be the basic arithmetic of matrix math. Given an isomorphism with these properties – which will henceforth still be referred to simply as a “matrix representation”, for concision – the two algebras are effectively identical unless one introduces a new operation.

This chapter begins with simple examples of matrix representations and a demonstration of differentiation there- with. It then discusses limitations of the approach, all of which amount to the use of operators that do not obey the arithmetic isomorphism or are undefined on either S or M; it also discusses ways to overcome many of these limita- tions. Next, it expands the discussion to representations of the algebras presented in Chapter 4 before extending their logic to build representations of the hypercomplex algebra types defined in Chapter 5. An algorithm is provided to build matrices capable of representing any such algebra, and thus capable of computing any desired combination of 93

derivatives.

6.1 Introduction: 2 × 2 Matrix Representations

6.1.1 Complex and Dual Numbers Expressed as Matrices

Probably the most well-known matrix representation is that of the complex numbers:

 ⎡1 0⎤ ⎡0 −1⎤ ⎡a −b⎤ Mℂ a + bi ≡ a ⎢ ⎥ + b ⎢ ⎥= ⎢ ⎥ (6.1) ⎢0 1⎥ ⎢1 0⎥ ⎢b a⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦

This representation hinges on the fact that the second matrix squares to the negative identity. Following similar

logic, it is likewise possible to represent dual numbers by using a matrix that squares to zero [98] and thus is itself

nilpotent:

 ⎡1 0⎤ ⎡0 1⎤ ⎡c d⎤ MD c + d" ≡ c ⎢ ⎥ + d ⎢ ⎥ = ⎢ ⎥ (6.2) ⎢0 1⎥ ⎢0 0⎥ ⎢0 c⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦

Indeed, some educational sources [115,116] introduce dual numbers using this definition, foregoing the previous notation entirely. To fully prove these two forms can always represent their respective scalar types, consider three properties: First, simple addition, subtraction, and scalar multiplication or division are performed on an element-wise basis, just as with the canonical forms, and so always result in matrices of the same form – thus, the matrix forms are closed under those operations and respect the corresponding isomorphisms. Second, observe that multiplication of either matrix type is not only closed, but commutative:

(a1 + b1i)(a2 + b2i) = (a1a2 − b1b2) + (a1b2 + a2b1)i

⎡a1 −b1⎤ ⎡a2 −b2⎤ ⎡a1a2 − b1b2 −(a1b2 + a2b1)⎤ ⎡a2 −b2⎤ ⎡a1 −b1⎤ ⎢ ⎥ ⎢ ⎥ = ⎢ ⎥ = ⎢ ⎥ ⎢ ⎥ (6.3) ⎢b a ⎥ ⎢b a ⎥ ⎢a b + a b a a − b b ⎥ ⎢b a ⎥ ⎢b a ⎥ ⎣ 1 1⎦ ⎣ 2 2⎦ ⎣ 1 2 2 1 1 2 1 2 ⎦ ⎣ 2 2⎦ ⎣ 1 1⎦

(c1 + d1")(c2 + d2") = (c1c2 − d1d2) + (c1d2 + c2d1)" 94

⎡c1 d1⎤ ⎡c2 d2⎤ ⎡c1c2 c1d2 + c2d1⎤ ⎡c2 d2⎤ ⎡c1 d1⎤ ⎢ ⎥ ⎢ ⎥ = ⎢ ⎥ = ⎢ ⎥ ⎢ ⎥ (6.4) ⎢ 0 c ⎥ ⎢ 0 c ⎥ ⎢ 0 c c ⎥ ⎢ 0 c ⎥ ⎢ 0 c ⎥ ⎣ 1 ⎦ ⎣ 2 ⎦ ⎣ 1 2 ⎦ ⎣ 2 ⎦ ⎣ 1 ⎦

Third and finally, note that the (a2 +b2 for complex and c2 for dual) of these matrices are non-zero

only when the real components (and, in the complex case, the imaginary component) are also zero – in other words,

multiplicative inverses exist in the same circumstances as in the canonical forms. Combine these properties with the

associative and distributive properties of all square matrices, and it can be seen that these matrix representations are

in fact isomorphic to their canonical forms – and so, many operations on larger matrices built of blocks of these forms

will accurately compute complex or dual-numbered values. Specifically, these forms are compatible with any matrix

operation which can be defined blockwise yet still produce a result identical to the element-wise result.

Using this insight, it becomes possible to use matrix-form dual numbers as a new type of automatic differenti-

ation tool – one which is compatible with common linear algebra packages. To the author’s knowledge, matrix-form

duals have never been used in practice this way 1 in linear algebra problems. Of course, standard complex-step dif-

ferentiation (CD) has an advantage both in ease of implementation and in computational efficiency: it does not need a

matrix form to be used with such third-party packages, since these packages usually provide ordinary complex-number

variants of most routines. The purpose of this work, however, is to demonstrate new ways to compute more than a

single first-order derivative (following the hypercomplex discussion from Chapter 5), which is beyond CD. It still helps

to motivate the work by beginning with an example of standard single-variable differentiation.

6.1.2 A Single-Variable Example – Derivative of a Matrix Inverse

Any matrix Y ∈ DN×M with dual-valued scalar values can be represented with a purely real-valued matrix Yœ

by applying the above isomorphism blockwise:

1 One anonymous programmer [117] did use dual-number matrices and higher-order univariate constructions as the interior data structure of an operator-overloaded type, but this approach is incompatible with the types of low-level linear algebra libraries considered here, and thus introduces extra overhead for no return – traditional AD is more efficient for operator-overloading approaches. Further, as will be discussed in Section Section 6.3.1, Lantoine et al [39] demonstrated an alternative dense-matrix form for their multicomplex variables (in turn cited from [96]) which included as a subset the 2 × 2 matrix representing standard complex numbers, but do not appear to have used the matrix representation in their experiments. 95

  ⎡ M Y ⋯ M Y ⎤ ⎢ D 1,1 D 1,M ⎥ Yœ M Y ⎢ ⎥ Y = D (Y) = ⎢ ⋮ ⋱ ⋮ ⎥ (6.5) ⎢ ⎥   ⎢M Y ⋯ M Y ⎥ ⎣ D N,1 D N,M ⎦

⎡ ⎤ Re(Yi,j) Nil(Yi,j) M Y  ⎢ ⎥ D i,j = ⎢ ⎥ (6.6) ⎢ 0 Re(Y )⎥ ⎣ i,j ⎦

To see this ability in action, examine the solution to the canonical matrix equation AAxx = b with two elements in each vector and consider the original “simulation input” to be the elements in A and b, with the simulation itself a code that computes the simple inverse (or, more likely, solves the system in place):

−1 ⎡x ⎤ ⎡A A ⎤ ⎡b ⎤ ⎢ 1⎥ ⎢ 11 12⎥ ⎢ 1⎥ ⎢ ⎥ = ⎢ ⎥ ⎢ ⎥ (6.7) ⎢x ⎥ ⎢A A ⎥ ⎢b ⎥ ⎣ 2⎦ ⎣ 21 22⎦ ⎣ 2⎦

This substitution yields

−1 xœ = Aœ bœ (6.8)

−1 ⎡Re(x1) Nil(x1)⎤ ⎡Re(A11) Nil(A11) Re(A12) Nil(A12)⎤ ⎡Re(b1) Nil(b1)⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ x ⎥ ⎢ A A ⎥ ⎢ b ⎥ ⎢ 0 Re( 1)⎥ ⎢ 0 Re( 11) 0 Re( 12)⎥ ⎢ 0 Re( 1)⎥ ⎢ ⎥ = ⎢ ⎥ ⎢ ⎥ (6.9) ⎢Re(x2) Nil(x2)⎥ ⎢Re(A21) Nil(A21) Re(A22) Nil(A22)⎥ ⎢Re(b2) Nil(b2)⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ x ⎥ ⎢ A A ⎥ ⎢ b ⎥ ⎣ 0 Re( 2)⎦ ⎣ 0 Re( 21) 0 Re( 22)⎦ ⎣ 0 Re( 2)⎦

Although the size of the problem has grown, it can be shown that because of the pattern of zeros, computation of this larger matrix inverse is greatly simplified (for sparse algorithms) from a typical 4 × 4 problem.

To compute a derivative with respect to a single element, simply set one RHS nilpotent coefficient to one and all others to zero, then perform the computation normally. For example, for the derivative of both x-values with respect to A12, the problem becomes: 96

−1 ⎡Re(x1) Nil(x1)⎤ ⎡Re(A11) 0 Re(A12) 1 ⎤ ⎡Re(b1) 0 ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ x ⎥ ⎢ A A ⎥ ⎢ b ⎥ ⎢ 0 Re( 1)⎥ ⎢ 0 Re( 11) 0 Re( 12)⎥ ⎢ 0 Re( 1)⎥ ⎢ ⎥ = ⎢ ⎥ ⎢ ⎥ (6.10) ⎢Re(x2) Nil(x2)⎥ ⎢Re(A21) 0 Re(A22) 0 ⎥ ⎢Re(b2) 0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ x ⎥ ⎢ A A ⎥ ⎢ b ⎥ ⎣ 0 Re( 2)⎦ ⎣ 0 Re( 21) 0 Re( 22)⎦ ⎣ 0 Re( 2)⎦

Using the derivative functionality of dual numbers established in Equation 4.72, the computed solution will be:

⎡x1 )x1∕)A12⎤ ⎢ ⎥ ⎢ ⎥ 0 x1 œx ⎢ ⎥ x = ⎢ ⎥ (6.11) ⎢x2 )x2∕)A12⎥ ⎢ ⎥ ⎢ x ⎥ ⎣ 0 2 ⎦

Here, x1 and x2 will have values identical to the solution of a real-valued system. More applicable to real-world tasks is the case where A and b are functions of some input parameter p.

x = AA(p)−1bb(p) (6.12)

Assuming the real values of A and b are computed by user-written code, any standard AD tool is capable of computing derivatives to construct the augmented RHS matrices:

−1 ⎡Re(A11) )A11∕)p Re(A12) )A12∕)p⎤ ⎡Re(b1) )b1∕)p⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ −1 0 Re(A11) 0 Re(A12) 0 Re(b1) œx Aœ p œb p ⎢ ⎥ ⎢ ⎥ x = AA( ) bb( ) = ⎢ ⎥ ⎢ ⎥ (6.13) ⎢Re(A21) )A21∕)p Re(A22) )A22∕)p⎥ ⎢Re(b2) )b2∕)p⎥ ⎢ ⎥ ⎢ ⎥ ⎢ A A ⎥ ⎢ b ⎥ ⎣ 0 Re( 21) 0 Re( 22) ⎦ ⎣0 Re( 2) ⎦

Running this through any real-valued matrix solver routine will successfully apply the chain rule and compute

{)x1∕)p, )x2∕)p}; importantly, these values will be just as precise and accurate as the values of x found from an unaltered system. If the user has post-processing code, the derivatives can be fed forward through yet more standard 97

AD to find derivatives of any quantities of interest.

Hereafter, matrix substitutions of this type will typically be described as differential operators operating on

matrix functions:

⎡ ⎤ ⎢ 1 )∕)p⎥ DMp ≡ ⎢ ⎥ (6.14) ⎢ ⎥ ⎢ 0 1 ⎥ ⎣ ⎦

When this operator is applied to a matrix, it again replaces each scalar coefficient blockwise:

  ⎡ DM Y ⋯ DM Y ⎤ ⎢ p 1,1 p 1,M ⎥ DM Y p  ⎢ ⎥ p YY( ) ≡ ⎢ ⋮ ⋱ ⋮ ⎥ (6.15) ⎢ ⎥   ⎢DM Y ⋯ DM Y ⎥ ⎣ p N,1 p N,M ⎦

The block arrangement defined in Equation 6.14 is capable of finding a derivative solely due its special structure; few rearrangements would compute anything of meaning. Moreover, it must interact solely with other blocks of the same structure (be they with a zero or non-zero derivative in the upper right position), otherwise the meaning of the elements will be destroyed. This restriction also applies to precisely which derivatives are stored in the upper right – the position must only ever track )∕)p; it is not possible for a separate block to find another derivative. Put another way, at program initialization only one nilpotent should be set to one; the only follow-on non-zero nilpotents are those calculated by the natural flow of the program.

For this reason, Equation 6.14 is termed a stencil in this work: it must be applied identically to every element in a problem. As will be seen throughout this chapter, more sophisticated stencils are possible – specifically, one can

find isomorphic structures for any of the algebras described in Chapter 5. Much of the core contribution of this work can be viewed simply as a search for better stencils. 98

6.2 Addressing Limitations

There are some obvious limitations to the stencil approach, but many of the worst are easily surmounted. For

one, the above substitution transforms the x and b vectors into two-column matrices, which would often necessitate

a different choice of algorithmic method – many functions, especially solvers, expect the right multiplicand to be a

vector. This difficulty can be overcome by noting both that matrix multiplication is performed column-wise on the

right multiplicand, and that the left columns of both x and b contain only redundant copies of the real values. A

substitution operator for column vectors can thus be defined as

⎡ ⎤ ⎢)∕)p⎥ DVp ≡ ⎢ ⎥ (6.16) ⎢ ⎥ ⎢ 1 ⎥ ⎣ ⎦

This vector stencil can use the same data structures and function calls as the original problem and will still produce the correct real and nilpotent values after matrix-vector multiplication or system solution. For this reason, the stencils created in this work will always contain complete copies of all necessary coefficients in the rightmost column.

In the case of dual numbers (but not all stencils), a similar substitution can be used for row vectors by using only the top  row of DM xT ; if similar functionality is desired for arbitrary stencils, the derivation presented later in this chapter could easily be modified with new restrictions. Either way, observe that matrix-vector multiplication now has the same computational complexity as the product rule in standard differentiation, albeit at higher memory cost in the matrix.

More generally, as mentioned in the opening to the chapter, stencil representations may fail if operations other than basic matrix arithmetic are applied. This is the case for many (but not all) operations that operate row-by-row, column-by-column, or element by element. For example, the transpose operation applied to a matrix representation of a complex number corresponds to a conjugate of the scalar value. In the case of the CD or MCX methods, this breaks differentiation because the complex-conjugate operation is not complex-analytic; in the case of NMD, the transpose

T of any stencil matrix DMp (Y) simply does not have a corresponding scalar variant, and thus will likewise break

differentiation and likely scramble downstream real values of a code. (Note that if the derivatives of a given matrix are

zero, no harm will be done.) 99

There are ways to address this, however, for both complex and dual-valued differentiation. In the case of CD,

one must use complex scalars (not matrix representations) and avoid the conjugate transpose in favor of the real-valued

transpose. For stencils, in certain circumstances it might be possible to apply the matrix operator only after the transpose

T  T Tœ is complete – i.e., find DMp Y , not DMp Y – but in general, a special operator (⋅) must be defined to compute the transpose blockwise. This operation is slower than the default transpose operation used in standard packages, but in practice remains a very small portion of the total cost of a program. Similarly simple functions can be written to perform many other operations (e.g., the diagonalization of a vector into a matrix) in blockwise fashion.

Interestingly, some algorithms that destroy the stencil structure can still propagate derivative information. For example, even though LU decomposition – which operates row-by-row and column-by-column – not only shuffles the matrix structure, but can suffer from additional matrix fill-in (that is, more than implied by the number of non-zeros in the stencil), it can still be used to find exact derivatives of linear solutions. This phenomenon will be investigated in

Part III.

6.3 Representations of Other Existing Algebras

As stated in the introduction to Chapter 4, the scalar algebras discussed there can also be represented as matrices.

Two such representations will quickly be reviewed below – the first, for multicomplex numbers, because it provides an already-existing toehold for any-order matrix differentiation, and the second, for multidual numbers, because it allows a much more efficient any-order tool. A simple stencil exclusively for use on univariate high-order derivatives will then be crafted, before moving on to the general multivariate case in Section 6.4.

6.3.1 Multicomplex Numbers

As presented in Section 4.2.1, multicomplex numbers are inductively defined as

2 iq = −1 z , z ∈ ℂ z ∈ ℂ(q) ⟹ z = z0 + zqiq 0 q (q−1) (6.17) ℂ(1) ≡ ℂ ℂ(0) ≡ ℝ 100

As demonstrated in [96] and [39], this inductive definition makes construction of high-order matrix represen-

tations relatively easy. Begin with a standard 2 × 2 complex matrix representing an element of ℂ(q):

 ⎡z0 −zq⎤ Mℂ (z) = Mℂ z0 + zqiq ≡ ⎢ ⎥ (6.18) (q) (q) ⎢z z ⎥ ⎣ q 0⎦

As this is an element of ℂ(q), the scalar values are not real numbers, but elements of ℂ(q−1). Using the same

indexing scheme as in Section 4.2.1, one can build matrix forms of z0 and zq and then perform a second blockwise

substitution:

  ⎡z0,0 −z0,q−1⎤ Mℂ z0 = Mℂ z0,0 + z0,q−1iq−1 = ⎢ ⎥ (6.19) (q) (q) ⎢z z ⎥ ⎣ 0,q−1 0,0 ⎦

  ⎡z0,q −zq,q−1⎤ Mℂ zq = Mℂ z0,q + zq,q−1iq−1 = ⎢ ⎥ (6.20) (q) (q) ⎢z z ⎥ ⎣ q,q−1 0,q ⎦

⎡z0,0 −z0,q−1 −z0,q zq,q−1⎤   ⎢ ⎥ ⎡Mℂ z0 −Mℂ zq ⎤ ⎢z0,q−1 z0,0 −zq,q−1 −z0,q ⎥ M(2) (z) = ⎢ (q) (q) ⎥ = ⎢ ⎥ (6.21) ℂ(q)   ⎢Mℂ zq Mℂ z0 ⎥ ⎢z0,q −zq,q−1 z0,0 −z0,q−1⎥ ⎣ (q) (q) ⎦ ⎢ ⎥ ⎢z z z z ⎥ ⎣ q,q−1 0,q 0,q−1 0,0 ⎦

From this, a way can be seen to define a recursive multicomplex matrix operator as:

M(m) z  M(m) z  ⎡ ℂ 0 − ℂ q ⎤ M(m+1) (z) = ⎢ (q) (q) ⎥ (6.22) ℂ(q) (m)  (m)  ⎢M zq M z0 ⎥ ⎣ ℂ(q) ℂ(q) ⎦

This recursive operator is then applied inductively from m = 1 to m = q until the matrix is composed only of real values. This representation is isomorphic to the first definition of multicomplex numbers – one can successfully compute the product of two multicomplex numbers by multiplying two real-valued matrices of this form, and if one cared to perform any linear algebra manipulations on multicomplex-valued matrices, a blockwise substitution of this form would turn the problem into one with real values. 101

When building the input matrix or examining the output results, it becomes (possibly) more intuitive to reinter- pret “the product of two distinct imaginary terms” as “the imaginary component of another imaginary” (in the above, zq,q−1) – that is, to find a given product of two or more terms, one looks for the block corresponding to the first term, and then, within that block, finds the sub-block or real value corresponding to the next term. This corresponds to the alternate tensor representation of multicomplex numbers, wherein the tensor product of two imaginary elements i1 ⊗i2 belongs to an entirely different algebra from the original i1 and i2 – the former has a 4 × 4 representation, while the latter has 2 × 2; instead, one must refer to i1 ⊗ 1, or “the real component of an imaginary value”.

From this procedure, one can see that blockwise induction provides at least one way to represent a multivariate algebra – one that has immediate, obvious parallels to dual numbers. Before moving on, it is important to acknowledge that in their work on multicomplex differentiation, Lantonine et al [39] mentioned in passing that the above matrix representation allows one to use real-valued linear algebra operations to find derivatives with MCX. However, they appear not to have utilized this capability in their demonstrations – likely due to the fact that MCX would have required the evaluation of dense matrices of dimension 2q × 2q.

6.3.2 Multidual Numbers

As seen with multicomplex numbers, there is an intuitive way to recursively build higher-order matrix repre- sentations of many variables. To strip out much (but not all) of the redundant information contained in a MCX stencil, begin by re-stating the multidual definition from Equation 5.31 with the same inductive pattern as for ℂ(q):

2 "q = 0 y , y ∈ D y ∈ D(q) ⟹ y = y0 + yq"q 0 q (q−1) (6.23) D(1) ≡ D D(0) ≡ ℝ

To apply this method to dual numbers, begin with the original substitution:

⎡y y ⎤ M(1) y M(1) y y "  0 q D ( ) = D 0 + q q ≡ ⎢ ⎥ (6.24) (q) (q) ⎢ 0 y ⎥ ⎣ 0⎦

Define a recursive multidual (RMD) stencil as: 102

M(m) y  M(m) y  ⎡ D 0 D q ⎤ M(m+1) (y) = ⎢ (q) (q) ⎥ (6.25) D(q) (m)  ⎢ 0 M y0 ⎥ ⎣ D(q) ⎦ where 0 is an all- of the same size as the operators. When m = q, the substitution has reached maximum recursive depth and the matrix is real-valued, in which case the operator will simply be denoted as M . Cases where D(q) m < q are non-real and only useful as intermediate steps in stencil construction; m > q is undefined. The isomorphism to a differential operator still holds – this time to an operator on a function of many inputs:

⎡ ⎤ ⎢ 1 )∕)x1⎥ (1)  DMx y(xxx) ≡ ⎢ ⎥ (6.26) ⎢ ⎥ ⎢ 0 1 ⎥ ⎣ ⎦

  ⎡ DM(m) y ) DM(m) y ⎤ ⎢ x )x x ⎥ (m+1)  m+1 DM y(xxx) ≡ ⎢ ⎥ (6.27) x  ⎢ 0 DM(m) y ⎥ ⎢ x ⎥ ⎣ ⎦

Here, m ranges from 1 to the number of elements in the input vector x. Applying this operation for a two-element

T input vector [x1, x2] yields:

⎡ ) ) )2 ⎤ 1 ∕)x ∕)x ∕)x2 ⎢ 1 2 12 ⎥ ⎢ ⎥ ⎢ 0 1 0 )∕ ⎥ ⎢ )x2 ⎥ (2)  DMx y(x) ≡ ⎢ ⎥ (6.28) ⎢ 0 0 1 )∕ ⎥ ⎢ )x1 ⎥ ⎢ ⎥ ⎢ 0 0 0 1 ⎥ ⎢ ⎥ ⎣ ⎦

Just as with the multicomplex case, substituting stencils of this form into linear algebra problems can compute

mixed derivatives; note that if x1 = x2, the operator will find univariate derivatives (albeit less efficiently than the stencil

described ahead in Section 6.3.3). The substitution can be carried out arbitrarily many times, each time applying the

operator with respect to a different input. Just as with the first-order operator, column stencils can sometimes discard

all but the last column of the matrix form: 103

⎡ ⎤ )2 ⎢ ∕)x2 ⎥ ⎢ 12 ⎥ ⎢ ) ⎥ ⎢ ∕)x ⎥ (2)  2 DVx y(x) ≡ ⎢ ⎥ (6.29) ⎢ ⎥ )∕ ⎢ )x1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 1 ⎥ ⎣ ⎦

This is a fairly naive construction technique which, for q different independent differentiation variables, grows stencil dimensions to 2q ×2q, just as MCX does. It can already be seen, however, that this method yields sparse matrices

– and the deeper the recursion, the sparser the matrix. Along with the univariate stencil developed in Section 6.3.3, it provides a ready example to consider when developing a wider stencil approach and was invaluable in the earliest stages of developing NMD.

6.3.3 High-Order Univariate Nilpotents

Consider the 2 × 2 matrix used to represent the nilpotent element of standard dual numbers:

 ⎡0 1⎤ MD " ≡ ⎢ ⎥ (6.30) ⎢0 0⎥ ⎣ ⎦

This matrix can be thought of as the 2 × 2 instance of an upper U(S) – that is, a matrix with ones

along the superdiagonal (immediately above the main diagonal) and zero everywhere else. The name comes from the

fact that multiplication by a shift matrix moves all elements of another matrix or vector by one position – for example:

⎡0 1 0⎤ ⎡1 2 3⎤ ⎡4 5 6⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢0 0 1⎥ ⎢4 5 6⎥ = ⎢7 8 9⎥ (6.31) ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢0 0 0⎥ ⎢7 8 9⎥ ⎢0 0 0⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦

⎡1 2 3⎤ ⎡0 1 0⎤ ⎡0 1 2⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢4 5 6⎥ ⎢0 0 1⎥ = ⎢0 4 5⎥ (6.32) ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢7 8 9⎥ ⎢0 0 0⎥ ⎢0 7 8⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 104

As can be seen, premultiplication by U(S) shifts a matrix up one row and introduces a new row of zeros; post-

multiplication shifts a matrix to the right and introduces a new column of zeros.

Shift matrices – like all triangular matrices that exist entirely above the main diagonal – are nilpotent, for

reasons that are clear to see. Squaring a shift matrix moves the superdiagonal a further step away from the main, and

each successive power moves the diagonal one step further, until it disappears entirely:

2 ⎡0 1 0 0⎤ ⎡0 0 1 0⎤ ⎢ ⎥ ⎢ ⎥ ⎢0 0 1 0⎥ ⎢0 0 0 1⎥ ⎢ ⎥ = ⎢ ⎥ (6.33) ⎢0 0 0 1⎥ ⎢0 0 0 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢0 0 0 0⎥ ⎢0 0 0 0⎥ ⎣ ⎦ ⎣ ⎦

3 ⎡0 1 0 0⎤ ⎡0 0 0 1⎤ ⎢ ⎥ ⎢ ⎥ ⎢0 0 1 0⎥ ⎢0 0 0 0⎥ ⎢ ⎥ = ⎢ ⎥ (6.34) ⎢0 0 0 1⎥ ⎢0 0 0 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢0 0 0 0⎥ ⎢0 0 0 0⎥ ⎣ ⎦ ⎣ ⎦

4 ⎡0 1 0 0⎤ ⎡0 0 0 0⎤ ⎢ ⎥ ⎢ ⎥ ⎢0 0 1 0⎥ ⎢0 0 0 0⎥ ⎢ ⎥ = ⎢ ⎥ (6.35) ⎢0 0 0 1⎥ ⎢0 0 0 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢0 0 0 0⎥ ⎢0 0 0 0⎥ ⎣ ⎦ ⎣ ⎦

From this, it can be seen that N × N upper shift matrices are nilpotent elements of degree N. Following the same logic as [117], this provides one possible matrix structure for the univariate algebra defined in Section 5.1. The new basis elements – termed basis stencils or unit stencils – thus take the form:

⎡1 0 0 0⎤ ⎢ ⎥  ⎢0 1 0 0⎥ M 1 ≡ E(0) = I = ⎢ ⎥ (6.36) ⎢0 0 1 0⎥ ⎢ ⎥ ⎢0 0 0 1⎥ ⎣ ⎦ 105 ⎡0 1 0 0⎤ ⎢ ⎥ ⎢0 0 1 0⎥  (1) M "1 ≡ E = ⎢ ⎥ (6.37) ⎢0 0 0 1⎥ ⎢ ⎥ ⎢0 0 0 0⎥ ⎣ ⎦ ⎡0 0 1 0⎤ ⎢ ⎥ ⎢0 0 0 1⎥ 2  (1)2 (2) M "1 = M "11 = E ≡ E = ⎢ ⎥ (6.38) ⎢0 0 0 0⎥ ⎢ ⎥ ⎢0 0 0 0⎥ ⎣ ⎦ ⎡0 0 0 1⎤ ⎢ ⎥ ⎢0 0 0 0⎥ 3  (1)3 (3) M "1 = M "111 = E ≡ E = ⎢ ⎥ (6.39) ⎢0 0 0 0⎥ ⎢ ⎥ ⎢0 0 0 0⎥ ⎣ ⎦ where the parenthetical matrix labels use the same singleton index shorthand described in Chapter 5 for tensor elements,

3 i.e., actually represent the index multisets 1 = {1}, 2 = {1, 1}, and 3 = {1, 1, 1}. (Note that setting "1 =

"111 = "1 ⊗ "1 ⊗ "1, etc., is equivalent to setting g(⋅) from Section 5.2 to the identity function.) The expansion from

Equation 5.10 still applies:

   (0) (1) f M x + ℎ"1 = f xE + ℎE (6.40) ℎ )f(x) ℎ2 )2f(x) ℎ3 )3f(x) = f(x)EE(0) + E(1) + E(2) + E(3) (6.41) 1! )x 2! )x2 3! )x3

One possible high-order stencil for an Nth-order univariate derivative is thus the (N + 1) × (N + 1) operator:

2 3 N ⎡ ) ∕ 2 ) ∕ 3 ) ∕ N ⎤ 1 )∕ )x )x … )x ⎢ )x 2 6 N! ⎥ ⎢ ⎥ )2 ⎢ ∕ 2 ⎥ 0 1 )∕ )x ⋱ ⋮ ⎢ )x 2 ⎥ ⎢ ⎥ )3 ⎢ ∕)x3 ⎥ N 0 0 1 )∕ ⋱ a ⎢ )x 6 ⎥ (N) (0) É 1 (a) ) DMx ≡ E + E a = ⎢ 2 ⎥ (6.42) a! )x ) ∕ 2 a=1 ⎢ 0 0 0 1 ⋱ )x ⎥ ⎢ 2 ⎥ ⎢ ⎥ ⎢ 0 0 0 0 ⋱ )∕ ⎥ ⎢ )x ⎥ ⎢ ⎥ ⎢ 0 0 0 0 0 1 ⎥ ⎢ ⎥ ⎣ ⎦ 106

This is only a very simple example – note that unit stencils need not be composed entirely of zeros and ones to

be isomorphic to particular basis elements. For example, consider the following alternate definition of E(1) for N = 5:

⎡0 5 0 0 0 0⎤ ⎢ ⎥ ⎢0 0 4 0 0 0⎥ ⎢ ⎥ ⎢0 0 0 3 0 0⎥  (1) M "1 ≡ E = ⎢ ⎥ (6.43) ⎢0 0 0 0 2 0⎥ ⎢ ⎥ ⎢0 0 0 0 0 1⎥ ⎢ ⎥ ⎢0 0 0 0 0 0⎥ ⎣ ⎦

This matrix is still nilpotent with degree six, and hence can represent the same scalar nilpotent element. Next, recall from Section 5.3 that composite tensor-product basis elements need not correspond directly to the literal product of other elements – scalar multiples are also possible. To make this more clear, consider another alternate definition for the remaining basis stencils (again, see Section 5.2 for the usage of g and Section 5.3 for C):

a ⊗a ⊗a ⊗a "1 = g("1 ) = C( a)"1 = a!"1 (6.44) 1 " ⊗a = " a (6.45) 1 a! 1

⊗a  (a) M "1 = M "a = E (6.46) 1 E(a) E(1)a (6.47) ≡ a!

Of course, this logic simply folds the coefficient from Equation 6.42 into the definition of the higher-order unit stencils, but combined with the new E(1) given by Equation 6.43, it leads to a more elegant full stencil:

) )2 )3 )4 )5 ⎡ 1 5 ∕)x 10 ∕)x2 10 ∕)x3 5 ∕)x4 ∕)x5 ⎤ ⎢ ⎥ ) )2 )3 )4 ⎢ 0 1 4 ∕)x 6 ∕)x2 4 ∕)x3 ∕)x4 ⎥ ⎢ ⎥ N 2 3 n ⎢ ) ) 2 ) 3 ⎥ (N) É ) 0 0 1 3 ∕)x 3 ∕)x ∕)x DM ≡ E(0) + E(n) = ⎢ ⎥ (6.48) x )xn ) )2 n=1 ⎢ 0 0 0 1 2 ∕)x ∕)x2 ⎥ ⎢ ⎥ ) ⎢ 0 0 0 0 1 ∕)x ⎥ ⎢ ⎥ ⎢ 0 0 0 0 0 1 ⎥ ⎣ ⎦ 107

Of course, these stencil properties were not selected at random; they were carefully chosen so that the rightmost column contains only the derivatives, without any coefficient. Further, although it was not an explicit goal in any algorithm, note that for the special case of matrix-vector multiplication the new stencil has recreated Leibniz’s rule

(i.e., the high-order product rule). More than this, though, the algebraic analysis has proven that the entire stencil – not just the last column – will be preserved across matrix-matrix operations. In Section 6.4, this approach will make derivation easier for stencils with mixed derivatives.

6.4 Representing Aribtrary Hypercomplex Algebras with Stencils

As discussed in the introduction to this chapter, for a matrix representation to be isomorphic to a particular algebra the core matrix arithmetic operations – addition, multiplication of elements, and multiplication by a scalar – must produce a matrix with a structure that maps to the correct corresponding scalar sum or product. Note that addition and scalar multiplication are trivially isomorphic, meaning that only matrix multiplication needs to be examined in any depth. Thus, to construct an arbitrary representation, begin by considering the multiplication of any two scalar hypercomplex numbers x and y:

N É y = ya"a (6.49) a=0 N É z = zb"b (6.50) b=0 H N IH N I É É f = yz = ya"a zb"b (6.51) a=0 b=0 N N É É = yazb"a"b a=0 b=0 N É = fc"c c=0

Now presume y and z are each carrying derivatives with respect to some vector of parameters x:

N É ℎ a " a y = y (x + S") = )x (y)"a (6.52) a=0 C( a) 108

N É ℎ b " b z = z (x + S") = )x (z)"b (6.53) b=0 C( b) H N IH N I É ℎ a É ℎ b " a b f (x + S") = f (y(xxx), z(xxx)) = y ⋅ z = )x (y)"a )x (z)"b (6.54) a=0 C( a) b=0 C( b) N É = fc"c c=0

where a, b, and c are singleton indices; N is the total number of nilpotent elements, including composites; and multi-

index operations and shorthand are as defined throughout Chapter 5. This leads to

N É f (x + S") = fc"c (6.55) c=0 N N N ℎ c ℎ a ℎ b É c É É a b = )x (f)"c = )x (y))x (z)"a"b (6.56) c=0 C( c) a=0 b=0 C( a)C( b)

The calculation of any given fc in isolation is made more complicated by the fact that many different possible pairings of multiplicands "a"b will yield the same "c – i.e., "c has many different factor pairs, which can be prime or composite. For example, for c = {1, 1, 2, 3}, the following products all hold:

g("1123) = "0"1123 = 1 ⋅ "1123

= "1"123

= "11"23

= "12"13

= "2"113

= "3"112

This same logic implies that the products of powers ℎ a ℎ b are all equal to ℎ c . In contrast, pairs of coefficient

functions C( a)C( b) are not guaranteed to equal C( c). For example, as mentioned in Section 5.3, any composite

element without repeated prime indices will have C = 1 to account for summation of equivalent commutative products 109

in the multivariate Taylor series, but those that do contain repeats will have C > 1. So, e.g., if a = b = {1, 2, 3},   then C a = C b = 1 – and yet, C( c) = C({1, 1, 2, 2, 3, 3}) ≠ 1. This is quickly overcome by redefining scalar multiplication of nilpotents following the rule established in Section 5.3: if one specifies that for all "p and "q,

"p"q = g("p ⊗ "q) = g("r) = C( r)"r, then the C-terms will cancel in every Taylor evaluation, reducing Equation 6.56 to

N N N N N

É c c É É c a b É É c a b f (x + S") = ℎ )x (f)"c = ℎ )x (y))x (z)"a"b = ℎ )x (y))x (z)C( c)"c (6.57) c=0 a=0 b=0 a=0 b=0

Note that if all ℎ are initialized to 1, then each coefficient of y, z, and f is exactly equal to a corresponding derivative. Putting aside the actual calculation of C for the moment, using this scheme it is possible to calculate yz via a brute-force double loop over all basis elements, summing the results of each individual pair of factors as they are encountered. However, to represent elements in matrix form (i.e., as stencils), each fc must be computed in isolation as the inner product of one stencil row and one stencil column; this implies that any stencil-building algorithm must be capable of enumerating all factor pairs ("a,"b), just as in the example for "1123. The row-column matching also requires a distinction between left factors and right factors, which will come in pairs indexed by a and b, as above. (Making this distinction is equivalent to temporarily removing the commutativity property of the algebra.)

It is possible to find a full list of factor pairs by exhaustively computing all two-set partitions of c. The first step in this process is to compute all the left factors a by running an n-choose-k algorithm nCk( c, k) – i.e., one that

finds all k-index combinations of the multiset of n = ð cð indices – for each k = {0, ..., n}. After this, each paired right factor b can be found by a multiset difference operation b = c ⧵ a, which amounts to removing a copy of every index found in a from c.

n If c contains n indices, the resulting list of factors will contain 2 pairs. However, if c contains any repeated indices, the list will likewise contain repeated pairs. For example, consider the factors of "123 and "112, as required to compute f123 and f112:

f123"123 =y0z123"0"123 + y1z23"1"23 + y2z13"2"13 + y3z12"3"12 110

+y12z3"12"3 + y13z2"13"2 + y23z1"23"1 + y123z0"123"0

f112"112 =y0z112"0"112 + y1z12"1"12 + y1z12"1"12 + y2z11"2"11

+y11z2"11"2 + y12z1"12"1 + y12z1"12"1 + y112z0"112"0

The second and third factor pairs of "112 are duplicates, as are the sixth and seventh. (If one accounts for commutativity, there are even more duplicates, but for matrix-math purposes left and right factors must remain distinct.)

The sum can thus be more compactly expressed as:

f112"112 =y0z112"0"112 + 2y1z12"1"12 + y2z11"2"11

+y11z2"11"2 + 2y12z1"12"1 + y112z0"112"0

So, rather than run the full double loop, it is possible to pre-compute a list of factor pairs and coefficients to be used during each multiplication; this can be expressed as a set of 3-tuples:

$    % F c a(c), b(c), m(c) , a(c), b(c), m(c) , ( ) = 1 1 1 2 2 2 … (6.58)

As an example, for c = {1, 1, 2} this becomes

F (112) ð É ð f " = m y z " " 112 112 k ak bk ak bk k=1

= y0z112"0"112

+ 2y1z12"1"12

+ y2z11"2"11

+ y11z2"11"2

+ 2y12z1"12"1

+ y112z0"112"0 111

<    = F (c) = {}, {1, 1, 2}, 1 , {1}, {1, 2}, 2 , … (6.59)

Once this set is pre-computed, individual coefficients of hypercomplex products could be directly computed within a user program as

N N É É fc"c = yazb"a"b (6.60) a=1 b=1 É = m y z " " (6.61) k ak bk ak bk

(ak,bk,mk)∈F (c)

This list-of-factors scheme allows faster implementation of scalar multiplication2 , but it is only of use for matrix multiplication if each element index a, b, or c can be mapped to a valid corresponding (row, column) entry or set of entries; further, this mapping must force the inner products of rows and columns to compute the appropriate coefficients

C. There are innumerable ways to build such a mapping, although most are difficult to derive for an arbitrary list of basis elements. Fortunately, imposing two restrictions narrows the list of possible mappings in a way that makes derivation easier: first, assume that for N distinct nilpotents, the rightmost column of the complete (N + 1) × (N + 1) stencil contains a single copy of each coefficient (including the real) – i.e., assume that each unit stencil has a single entry in the last column, which also serves to allow for matrix-vector multiplication of stencils. Second, specify that this column has no coefficients, i.e., that those rightmost unit stencil entries are all one. This forces all calculation of C into the other columns; if one can satisfy the two requirements, C will be found implicitly.

All that remains is to build a representation that under multiplication places a copy of each fc in the right column – a task that is comparatively easy compared to all of the preceding discussion. First, define an index function i(c) which maps one-to-one from an element’s singleton index to that element’s location (i.e, row) in the rightmost column. To sort all non-zeros into the upper triangular part of the stencil, the algorithm built in this work sorts the rows such that the highest-order elements (i.e., those with the most prime factors) are mapped to the top of the column.

2 It is faster by virtue of the fact that it again indirectly implements the Leibniz formula for multivariable derivatives – m amounts to a coefficient in the rule, and C amounts to a method to implicitly cancel any difficult-to-compute denominators in the Taylor series. Note, however, that (1) pre-computing the number of repeated factors is generally simpler that the actual formula (which is rather complicated for the multivariable case), and (2) the hypercomplex framework grants a firmer analytic framework – it makes it clearer that the algebraic structure admits an isomorphic stencil that can repeat itself across successive additions, products, and inversions, and gives a way to address functions that are non-isomorphic and thus might break the chain of computation. 112

Note that – just as with shift matrices – limiting the stencil to the upper triangle automatically makes every element other than E(0) nilpotent, which again makes derivation easier. However, i(c) can in truth be arbitrary – as it merely defines the column’s ordering, different definitions will simply result in different permutations of the same stencil.

Next, for each row i, arrange factor coefficients across columns j such that the inner product with the rightmost column computes Equation 6.61:

Yœ = DM (y) (6.62)

Zœ = DM (z) (6.63)

Fœ = DM (f) = YœZœ (6.64)

    N É œ œ É œ œ ∀c ∶ Y(i(c),⋅) Z(⋅,N) = Y(i(c),j)Z(j,N) = myazb (6.65) j =1 (a,b,m)∈F (c)

Note that ðF (c)ð ≤ N, as there cannot be more factors than elements, and that every zb has been forced to œ appear in the rightmost column Z(⋅,N). And so, by singling out each non-zero yazb, it is possible to define a map œ j(a) that assigns the appropriate m-value to every scalar entry ya = Y(i(c),j(a)) in the left matrix multiplicand from

Equation 6.65. To pair ya and zb, note that the right matrix is row-indexed by j and simply stipulate that j(a) = i(b).

Thus, only one index map is required.

To break DM (⋅) into its constituent unit stencils, each (i(c), i(b)) (row, column) entry must correspond to the matching left-hand basis element E(a), so that

(a) ∀(a, b, m) ∈ F (c) ∶ E i(c), i(b) = m (6.66)

In practice, it is easiest to build the stencils by looping over c, rather than a – that is, by building them in parallel, rather than sequentially. The full procedure is shown in pseudocode in Algorithm 6.1, which can be used to build matrix isomorphisms for any of the nilpotent algebras presented in Chapter 5 – and, thus, yields a ways to find any derivative of a matrix function. 113

Algorithm 6.1 A Stencil-Building Algorithm Compatible with Column Vectors

1. Given:

(1), … , (M) // Collection of desired element index multisets s.t. ∀p ∶ (p) { , … , } // Sets-of-multisets  elements grouped by order p ≡ (p,1) (p,Np) (d) (d) // Each order- element is dened by indices s.t. ∀p, d ∶ (p, d) ≡ {i1 , … , ip } p p 2. Initialize:

∀p ∶ (p) ∶= (p) // Start set of *required* elements, including missing factors for p = M, (M − 1), … , 2 // Factorize elements of each order p, starting with highest (p) // Every element of given order for d = 1, … , ð ð d for q = (p − 1), … , 1 // Factors of every possible order q (q) (q) // -choose- enumerates all possible -element factors of ∶= ∪ nCk( (p, d), q) n q q (p, d) Result: (1), … , (M) // Multi-index set of every tensor element in the algebra Result: ∑ (p) // Number of non-real elements (both prime and composite) N ∶= p ð ð 3. Find Factor Pairs:

M ∶= 0 ∈ ℤN×N // Initialize table of integer coecients m for c = 1, … ,N // For each element's list of factor pairs F (c) ∶= {} // Initialize empty list // Factorize elements of each order into ∑ for p = 1, … ,M p "c = (a,b,m) m"a"b (p) // Every element of given order for q = 1, … , ð ð q // Look up product singleton index c ∶= (p, q) c for r = 0, … , q // For every possible order of left multiplicand  ∶= nCk( c, r) // Enumerate all possible order-r left multiplicands of c // For every enumerated left multiplicand for s = 1, … , ðð s a ∶= s // Look up left multiplicand singleton index a b ∶= c ⧵ a // Find right multiplicand with multiset dierence; look up b F (c) ∶= F (c) ∪ {(a, b)} // Append to factor list // Increment coecient Mc,b ∶= Mc,b + 1 m Result: F // Completed lists of all factor pairs for each element Result: M // Summation coecients for each factor pair 4. Build Stencils:

for a = 1, … ,N // For each element unit stencil E(a) ∶= 0 ∈ ℤN×N // Initialize empty stencil for c = 1, … ,N // For each element's list of factor pairs // For each factor pair of for p = 1, … , ðF (c)ð "c (a, b) ∶= F (c)p // Extract factor indices // Extract factor coecient m ∶= Mc,b i ∶= i(c) // Unit stencil row index j ∶= i(b) // Unit stencil column index (a) // Insert unit stencil coecients E i, j ∶= m 5. Output:

E( 1), … ,E( N) // Collection of all unit stencils Chapter 7

Hybrid Methods and Directional Derivatives

As discussed in Section 3.3.2, AD tools for a given order of derivative can be nested one inside another to find higher orders. This nesting approach is normally used inductively with a single type of tool, but intuitively, it can be used to build arbitrary hybrids of different methods. Recall that, in addition to presenting Nilpotent Matrix Differentiation for higher orders, this work has also reviewed two existing first-order matrix-compatible forms of differentiation: the adjoint method (Section 3.1.2) and complex-step differentiation (Section 4.1). The three pairwise hybrids of these methods each have promising uses, as discussed below; arguably, each combination shows more promise for practical high-order differentiation than NMD in isolation.

For the sake of simplicity, the derivation of NMD has thus far neglected directional derivatives; however, the up- coming interplay of methods makes them difficult to ignore any further. For this reason, each method will be presented from the start in a directional context, which finally gives rise to the long-promised Hessian-vector and vector-Hessian- vector products.

7.1 High-Order Forward Directional Derivatives

N Given a function f(xxx) with x ∈ ℝ x , to compute a univariate derivative with respect to a single xm one could evaluate the function with a single non-real element ̂em:

0 1 H I )f ℎ2 )2f f(xx + ℎu(m) ̂e ) = f(xxx) + ℎ ̂e + ̂e 2 + … (7.1) m )x m 2 m m 2 )xm 115

(m) th where u is a unit vector in the m direction and the series would be truncated if ̂em is nilpotent. More generally, one can find a derivative in any direction defined by a vector v the same way:

H I H I v v 2 É )f 2 É i j ) f 2 f(xx + ℎv ̂em) = f(xxx) + ℎ vi ̂em + ℎ ̂em + … (7.2) i )xi i,j 2 )xi)xj  ℎ2   = f(xxx) + ℎ vT ∇f ̂e + vT HHvv ̂e 2 + ℎ3 ̂e 3 (7.3) m 2 m  m

Although this expansion has been limited to the second-order (Hessian) terms, higher-order directional tensors1 can of course still be found – they merely cannot be expressed as vector products of the standard tensors. However, since analysis of second-order derivatives is sufficient to demonstrate hybrid methodology, third-order tensors and higher will be ignored in this chapter. If NMD can add a second derivative order to a first-order method, it can add arbitrarily many more.

To find derivatives in two directions v and w:

É  )f f(xx + ℎmv ̂em + ℎnw ̂en) = f(xxx) + ℎmvi ̂em + ℎnwi ̂en (7.4) i )xi É 1   )2f + ℎmvi ̂em + ℎnwi ̂en ℎmvj ̂em + ℎnwj ̂en i,j 2 )xi)xj

+ … H I H I v v 2 É )f 2 É i j ) f 2 = f(xxx) + ℎm vi ̂em + ℎm ̂em (7.5) i )xi i,j 2 )xi)xj H I H I w w 2 É )f 2 É i j ) f 2 + ℎn wi ̂en + ℎn ̂en i )xi i,j 2 )xi)xj H I É )2f + ℎmℎn viwj ̂em ̂en + … i,j )xi)xj 2  ℎm   = f(xxx) + ℎ vT ∇f ̂e + vT HHvv ̂e 2 + ℎ 3 ̂e 3 (7.6) m m 2 m  m m 2  ℎn   + ℎ wT ∇f ̂e + wT HHww ̂e 2 + ℎ 3 ̂e 3 n n 2 n  n n

1 The word “tensor” in this context refers to generalizations of vectors and matrices to higher dimensions, not to tensor products of algebras. 116

T  + ℎmℎn v HHww ̂em ̂en

2  2 2 2 +  ℎm ℎn ̂em ̂en +  ℎmℎn ̂em ̂en

where four different error terms have been broken out for later analysis. Note that, in addition to computing a univariate quadratic term in each direction, this approach has found one mixed term vT HHww, which corresponds to finding the off- diagonal Hessian value in the standard 2-dimensional formulation.

Recall the example optimization problem used to introduce NMD in Section 1.2. The goal there was to compute a directional Newton update (see Section 2.1.1 or Section 3.1.4) for a line-search:

∇f T d = − k i (7.7) k T di Hkdi

as well as Andrei’s [14] Nonlinear Conjugate Gradient update step (see Section 2.1.2):

g T H d − ∇f T d = i i i i i (7.8) i T di Hidi

Although the details were omitted in Chapter 1, note that ∇fk is the gradient as it exists at xk, which itself is

the estimate of the ideal value of x during iteration k of an inner or outer loop; Hi is the Hessian as it exists at xi; and gi = ∇fi is a full evaluation of the gradient, stored as a vector in memory. The line-search formula for k will be

evaluated many times for each i – Equation 7.7 is in fact the update to an inner loop used to find an ideal i, used in turn

to update the search location with xi+1 = xi + idi; only a single evaluation of is then needed to update the search

direction with di+1 = gi+1 + idi. As different derivatives are required for and , in practice a smaller stencil would

be used for the inner loop.

When calculating in Section 1.4, both gT HHdd and dT HHdd were found simultaneously with a single NMD stencil.

The hypercomplex algebra corresponding to that stencil can be derived by analyzing Equation 7.6: if v = d and w = g,

one can remove all undesired terms by setting: 117

̂em = "1 ≠ 0 (7.9)

̂en = "2 ≠ 0 (7.10)

3 2 "1 = 0 ≠ "1 (7.11)

2 "2 = 0 (7.12)

"1"2 = "12 ≠ 0 (7.13)

2 "1 "2 = "112 = 0 (7.14)

Counting the real element, this algebra has five basis elements and hence requires a 5 × 5 stencil, as was shown

in Equation 1.10:

) ) )2 ⎡ 1 0 ∕)r ∕)r ∕)r )r ⎤ ⎢ 1 2  1 2 ⎥ 0 1 0 2 )∕ )∕ 2 ⎢ )r1 )r1 ⎥  ) DM ⋅ = ⎢ 0 0 1 0 ∕)r ⎥ (7.15) ⎢ 2 ⎥ 0 0 0 1 )∕ ⎢ )r1 ⎥ ⎢ 0 0 0 0 1 ⎥ ⎣ ⎦

Recall that this stencil originally corresponded to the directional equation f(xx + r d + r gg) , which is 1 2 ðr1=r2=0 simply the real-valued equivalent of Equation 7.4; any AD tool would be initialized as if r1 and r2 were the program inputs.

7.2 A Nilpotent-Complex Hybrid

Now, however, consider a different hypercomplex algebra using one nilpotent element ̂em = "1 of degree three

2 and a single imaginary element ̂en = i. Assume ℎm is non-negligible, but ℎn ≪ 1 such that ℎn ≈ 0. Then Equation 7.6

becomes:

T  T  f(xx + ℎmv"1 + ℎnwi) = f(xxx) + ℎm v ∇f "1 + ℎn w ∇f i (7.16) 118

2 ℎm   + vT HHvv " 2 + ℎ ℎ vT HHww i" 2 1 n m 1

2 2  2 2 +  ℎn +  ℎn ℎm "1 +  ℎnℎm i"1

3 This also computes the desired quantities, although (after accounting for the fact that "1 is precisely zero) there are still three remaining error terms. The impact of these errors can best be understood by isolating individual output terms with the operators Re(⋅), Im(⋅), Nil1(⋅), and Nil11(⋅):

 2 Re f(xx + ℎmv"1 + ℎnwi) = f(xxx) +  ℎn (7.17)

 T  Im f(xx + ℎmv"1 + ℎnwi) = ℎn w ∇f (7.18)

 T  2 Re Nil1 f(xx + ℎmv"1 + ℎnwi) = ℎm v ∇f +  ℎn (7.19)

 T  Im Nil1 f(xx + ℎmv"1 + ℎnwi) = ℎnℎm w HHvv (7.20) 2  ℎm  Re Nil f(xx + ℎ v" + ℎ wi) = vT HHvv (7.21) 11 m 1 n 2

 2 Im Nil11 f(xx + ℎmv"1 + ℎnwi) =  ℎnℎm (7.22)

The nilpotent and imaginary terms together create a hypercomplex algebra defined by not five but six different dimensions. Three of these dimensions can exactly compute derivative terms, including both of the desired second

2 derivatives, and two of them (just as with ordinary CD) are effectively exact due to the very small size of  ℎn . The only significant error2 is pushed into the sixth dimension – i.e., the imaginary part of the square of the prime nilpotent

– where it can safely be ignored. A matrix representation of this algebra can recover derivatives with the 3 × 3 tensor

(defined by using Algorithm 6.1 and adding an imaginary component):

⎡ 2 2 ⎤ 1 + ℎi)∕ 2)∕ + 2ℎi) ∕ ) ∕ 2 ⎢ )r1 )r2 )r1)r2 )r2 ⎥ ⎢ ⎥ ¨  DM ⋅ = ⎢ 0 1 + ℎi)∕ )∕ + ℎi)2∕ ⎥ (7.23) ⎢ )r1 )r2 )r1)r2 ⎥ ⎢ ⎥ ) ⎢ 0 0 1 + ℎi ∕)r ⎥ ⎣ 1 ⎦

2 In point of fact, the “error” is actually a single third-order mixed derivative, but as this was not requested it is considered superfluous. 119

where ℎ = ℎn can be divided out to recover derivatives from the imaginary component. Despite the extra imaginary

term, if the cost of increasing a matrix’s size is greater than the switch to complex math, this stencil could be significantly

faster than that given by Equation 7.15.

In practice, a problem of this form would be computed by running the user program with r1 = 0 + ℎi and r2 = 0 + "1, with "1 being the first derivative channel of an operator-overloaded AD type configured to track one first and one second univariate derivative. At the hand-off from scalar to matrix math, every overloaded variable ̂z would

2 2 be carrying three complex values: z + ℎi)z∕ , )z∕ + ℎi) z∕ , and )z ∕ 2 + ℎiz ; these would be inserted by 0 )r1 )r2 )r1)r2 )r2 err

a standard NMD function (that is, the same function that would otherwise handle purely real-valued NMD) into the

appropriate stencil location(s) in an expanded matrix or vector.

If two separate NMD-enabled runs of the program are used, still-simpler stencils can be used. Consider a

standard 2 × 2 matrix representing a dual number, now augmented with imaginary values; omitting the hypercomplex

derivation, this “dual-complex” number can be used as a derivative stencil:

⎡ ) ) )2 ⎤ 1 + ℎi ∕)r ∕)r + ℎi ∕)r )r DM¨¨  ⎢ 1 2 1 2 ⎥ ⋅ = ⎢ ⎥ (7.24) ⎢ 0 1 + ℎi)∕ ⎥ ⎣ )r1 ⎦

In this case, the overloaded AD type3 need only track a single first derivative. Before the first run of the

program, the directional inputs would be initialized just as above to compute the mixed second derivative; before the

second run, they would be initialized with r1 = 0, r2 = 0 + ℎi + "1 to compute the univariate second derivative.

−ddT ∇f Before moving on, note that the Newton line-search step = ∕dT HHdd – a very useful quantity with broad

applicability to many other optimization frameworks besides NCG – can be found using just the terms calculated by a

single forward pass of Equation 7.24. Further note that the cost of this stencil (like any other) is independent of Nx,

meaning that gradient-vector and vector-Hessian-vector products can be found in constant time. The full Hessian is

3 If a C++ code has already been modified to use the CD method via Martins’ [63] “cplx” variable type, it could be straightforwardly converted to use a dual-complex type, as cplx is simply a wrapper for C++’s standard complex; this in turn is built using the language’s templating ability, and can thus be redefined as complex for any scalar type T. Redefining cplx as a wrapper for complex would be trivial. 120

not required, nor is the full gradient4 – and therefore, neither is an adjoint method.

As a final aside, observe that the terms calculated above can be used to form a quadratic interpolant, i.e., the

Taylor approximation:

t2  f(xx + tvv) ≈ f̂(t) = f(xxx) + tvT ∇f + vT HHvv + t3 (7.25) 2 

where multiple values of t can be tested5 after a single run of NMD (or, for scalar codes, a single run of second-order vanilla AD). This single-variable equation is one of the simplest types of surrogate model discussed in Section 2.3; a two-variable interpolant (i.e., a quadratic approximation of f within the space spanned by w and v) could be built by computing every term from Equation 7.6, and higher-dimensional interpolants could be created with the addition of yet more non-real offset direction vectors. The availability of such Taylor approximations is perhaps unsurprising, as the derivative terms are all derived from a Taylor series. More interesting is the fact that, as a forward method, the terms are also found for all intermediate values of a user program. In cases where f(xxx) actually represents a quantity of interest f(uuu(xxx)) extracted from a larger state-vector u – e.g., a PDE solved over a million-variable mesh – then this solution state, too, is imbued by NMD with the information needed to build a quadratic estimate of uu(xx + tvv) for any t:

t2    u (xx + tvv) ≈ ̂u (t) = u (xxx) + t∇u T v + vT H u (xxx) v + t3 (7.26) j j j j 2 x j 

T For the full vector u, the gradient term ∇uj v instead uses a Jacobian matrix J [uu(xxx)]v, and the Hessian H [uu(xxx)] is in fact a rank-three tensor, meaning the vector-Hessian-vector product yields a vector – but this product is still implicit

(and still found with a cost independent of Nu); no complicated tensor operations need to be considered. Again, this surrogate model of the state-vector can be generalized to any number of direction vectors or any order of derivative.

4 Unfortunately, Andrei’s NCG update step still requires a previous evaluation of the full gradient g = ∇f to find the gT HHdd term. NMD (or any other AD method) can find the products of derivatives with vectors, but not the products of derivatives with other derivatives. Even for NCG, though, note that NMD may still be a superior option in the inner (line-search) loop, which uses only the Newton step.

5 )f̂ Note that the Newton update is a specific case of this – is the solution of ∕)t = 0. 121

7.3 Nilpotent and Complex Adjoint Methods

Throughout Chapter 2, multiple applications were presented that made use of the product of the Hessian with a vector. Section 3.1.4 introduced Pearlmutter’s [19] differentiation operator, the intrusive application of which could turn a gradient algorithm into one for HHvv:

 ó ) ó v f(xxx) ≡ ∇f(xxi + rvv) ó = HHvv (7.27) )r ó ór=0

In pursuit of an automatic alternative for computing this quantity, begin again with a generic hypercomplex evaluation, this time of ∇f with a non-real step in a cardinal direction u(n), and examine the result element by element with a Taylor expansion:

  )f )2f ℎ2 )3f ∇f(xx + ℎu(n) ̂e ) = + ℎ ̂e + ̂e 2 + … (7.28) n j n 2 n )xj )xj)xn 2 )xj)xn

As expected, this approach simply differentiates the gradient with respect to xn. Already of note, though, is the fact that the first ̂en term contains a single entry of the Hessian. When examining all j, it can be seen that this formulation will compute an entire row or column of the matrix.

With an arbitrary direction vector v, the expansion becomes:

H Nx I H Nx Nx I   )f É )2f ℎ2 É É )3f ∇f(xx + ℎv ̂e ) = + ℎ v ̂e + v v ̂e 2 + … (7.29) n j )x k )x )x n 2 k l )x )x )x n j k=1 j k k=1 l=1 j k l

    2 2 = ∇f(xxx) j + ℎ H(j,⋅) v ̂en +  ℎ ̂en (7.30)

In other words, the first ̂en term now contains the inner product of a row of the Hessian with v; when examining all j, this constitutes the complete Hessian-vector product:

2 2 ∇f(xx + ℎv ̂en) = ∇f(xxx) + ℎHHvv ̂en +  ℎ ̂en (7.31) 122

It can now be see that, given any algorithm that evaluates the gradient, a great many possible hypercomplex al- gebras could compute HHvv. Most importantly, this includes algebras utilizing nilpotent elements – i.e., those isomorphic to NMD stencils – and complex numbers when ℎ ≪ 1.

Complex numbers have been suggested in this context before, at least in terms of generic functions that produce the gradient. Such a scheme to compute HHvv was mentioned in passing (somewhat misleadingly described as a “finite difference approximation”) in Nash’s 2000 survey [28] of Truncated Newton methods (see Section 2.1.3), which pre- dated Martins’ 2003 [63] introduction of the complex-step method to the broader aerospace engineering community.

However, Nash did not appear to realize the power of the approach – e.g., he incorrectly described it as a (ℎ) method and wrote as if it did not have the same accuracy as traditional AD.

A more specific usage was reported by Bogdan Constantin in his 2014 M.S. thesis at MIT [118], where he applied the complex-step method as it exists today to an atmospheric chemistry-transport model. The code he considered,

GEOS-Chem, already featured a sophisticated sensitivity-analysis (SA – see Section 2.2) adjoint procedure [119], constructed via a combination of scalar automatic differentiation, the discrete adjoint method described in Section 3.1.2, and the continuous adjoint approach (based on the physics of the underlying physical systems, not just their code).

Although he successfully used CD to find second-order sensitivities (i.e., sensitivities of the adjoint sensitivities), his work was kept in a pure SA framework and did not make the connection to other uses of differentiation – indeed, the words “gradient” and “Hessian” were never used.

However, more general uses of HHvv are possible, and – via Equation 7.31 – can be found via modification of any existing gradient algorithm. When the manual formula of Equation 7.27 is put to use in the very specific field of neural network deep learning (e.g., [16,41]), that gradient is found by the backpropagation algorithm, in which case scalar AD could be used in place of hand-differentiation. Engineering applications, in contrast, more commonly find gradients via the discrete adjoint method, which uses linear algebra – avoiding manual differentiation here requires either CD or

NMD.

One version of the discrete adjoint method was described in detail in Section 3.1.2. Once again, very briefly, given a function f of inputs x, solution state-vector uu(xxx), and residual rr(uu,xx), the method amounts to: 123

x ∈ ℝNx )r T )f T λ = − Nu )u )u u ∈ ℝ (7.32) )r )f N ∇f = λT + rr(uu,xx) = 0 ∈ ℝ u )x )x f(uu,xx) ∈ ℝ

When u is found via the solution of some system AA(xxx)uu = bb(xxx), the residual is just r = AA(xxx)uuu−bbb(xxx), meaning that

)r )r )f )f ∕)u = AA(xxx) is already available; however, the other derivatives ∕)x, ∕)u and ∕)x must be manually implemented by

the user. Each quantity, though, is expressed as a matrix or vector, and thus can be made to carry additional derivatives

via NMD. In the case of A, the derivative- Aœ would be built by the forward run of the program; in the other cases, presuming the derivative matrices aren’t themselves constructed via matrix operations (in which case

NMD would remain the method of choice), first-order scalar AD could be employed during their construction, yielding derivatives-of-derivatives to arrange in stencils. Just as with general (forward-mode) NMD, a blockwise variant of the transpose operation in Equation 7.32 would be necessary to preserve the hypercomplex isomorphism – but again, the cost of this is trivial compared to the cost of the solve.

Alternatively, if the program inputs and all successive operations are real-valued, CD could be used as long as care is taken to use only complex-analytic functions; this includes both the scalar difficulties addressed by Martins [63] in Section 4.1, and exclusive use of a real-valued transpose in Equation 7.32 – a complex-conjugate transpose, which is not analytic, would ruin differentiation.

Chapter 2 presented several powerful applications of HHvv, but at least two more possibilities should be pointed out. First and most obviously, the expansion from Equation 7.31 can be used to build a surrogate model for the gradient, similar to Equation 7.25:

2 ∇f(xx + tvv) ≈ ̂g(t) = ∇f + tHHvv +  t (7.33) 124

Note here that ∇f would be found alongside HHvv, as it constitutes the real part of the NMD or CD adjoint. Just

as in Equation 7.25, many values of t can be evaluated after a single evaluation of the modified adjoint.6 This is a

linear extrapolation, and must therefore be held comparatively near x; however, as a second possibility, note that HHvv

and ∇f could likewise be used to build a surrogate model for the actual output that is broader than Equation 7.25:

f(xx + tv + ww) ≈ f̂(t,ww) (7.34)

= f(xxx) + tvT ∇f + wT ∇f (7.35)

t2 t + vT HHvv + wT HHvv (7.36) 2 2

2 3 +  ððwðð +  t (7.37)

where many values of both t and w could be provided after a single hybrid adjoint evaluation. Due to the poorer order of accuracy with respect to the magnitude of the direction vector ððwðð, this vector must be held relatively short, which limits the surrogate to points somewhat near the line x + tv. This effectively makes Equation 7.34 a method for perturbing Equation 7.25 in any direction in the search space, rather than simply along the vector v. At the perturbed points, the surrogate would be imbued with knowledge of how ∇f curves toward v. Again, finding multiple products

(HHvv(1),HHvv(2), … ) would expand the surrogate from points near a line to points near some subspace.

Practical computation of both an NMD-adjoint hybrid and a CD-adjoint hybrid will be made clearer with a

demonstration in Chapter 8. Before moving on, though, compare these hypercomplex adjoint approaches to the second-

order discrete adjoint method reviewed in Section 3.1.3. As stated there, that method could find the full Hessian with

 2 cost-complexity  Nx , which was superior to forward-mode methods with necessary complexity  Nx . In the

hypercomplex case, Equation 7.28 can be used to find a single column in (1) – and, thus, the entire matrix can also  be found in  Nx . The second-order method, however, could not cost-effectively find subsets of H – finding even a

single scalar entry in isolation required (at minimum) three solves of the system, and finding an entire column (let alone

HHvv for arbitrary v) required the same Nx + 1 solves as would be required to find the full matrix. This comes despite

6 Pearlmutter mentioned something very similar, but only in the context of deriving the Newton update step and assessing the accuracy thereof by comparing a predicted gradient to the true gradient at the new Newton location. 125 the fact that the hypercomplex variants (especially the CD hybrid) are far easier to implement. Even more strikingly, arbitrary-order derivatives – third, fourth, or higher – can be found relatively easily with a nilpotent-adjoint hybrid (or

2 even a nilpotent-complex-adjoint hybrid) with a cost lower than forward-mode AD by a factor of Nx (that is,  Nx

3 for third-order tensors,  Nx for fourth-order, etc). All of this can be done without any new derivations on the part of the user. Part III

Demonstrating Nilpotent Matrix Differentiation Chapter 8

Differentiating a Stochastic Diffusion Model

To demonstrate its use for actual problem-solving, all variants of Nilpotent Matrix Differentiation were used to study a system defined by a second-order elliptic partial differential equation. Specifically, the original “user” program was designed to find the equilibrium concentration of a two-dimensional diffusive process with a spatially-varying stochastic diffusivity field, where the field itself was in turn was drawn from a lognormal random process approximated by a Karhunen-Loève expansion (KLE, [120]). This problem was chosen as an NMD benchmark for four reasons:

First, codes that solve such elliptic PDEs should be familiar to most engineers. Second, this same problem has already been studied in the context of system derivatives for uncertainty quantification [47]. Third, it strikes a good balance between complex behavior and ease of implementation, which was important when testing early experimental variants of NMD or competing methods like CD. Fourth and finally, the KLE approximation uses an arbitrary number of basis functions (growing more accurate with each), each of which must be weighted by a coefficient; thus, the problem can take an arbitrary number of inputs (i.e., the coefficients), each of which have similar quantitative impact and qualitative meaning. This allows for an “apples-to-apples” comparison of NMD as applied to problems with different numbers of independent differentiation variables.

This chapter is organized as follows: First, in Section 8.1, each step of the core problem is described in precise detail. Second, in Section 8.2, experiments are described to measure accuracy and cost of several possible applications of the NMD method. Results are then presented in Section 8.3 and Section 8.4. 128

8.1 Detailed Problem Description

The two-dimensional diffusion problem is modeled as:

)u   = ∇ ⋅ k∇u (8.1) )t 0 1  )u )u  = ∇ ⋅ k + )x1 )x2 0 1 0 1 )k )u )k )u )2u )2u = + + k + )x )x )x )x 2 2 1 1 2 2 )x1 )x2 where many boundary conditions are possible. This work uses a stochastic variant of the diffusivity problem, which takes as input a vector y ∈ ℝNy that contains samples of independent and identically distributed (i.i.d.) random variables and ultimately outputs a quantity of interest (QoI) f(yyy) of an equilibrium solution. Before any AD tool is applied, computation of the original problem proceeds as follows: First, it calculates a deterministic solution for a spatially uniform diffusivity parameter, which is then used to define boundary conditions (BCs) for the stochastic problem. Second, it uses Galerkin projection to approximate the eigenfunctions of a covariance kernel; together with the coefficients y, this is sufficient to sample a lognormal diffusivity field. Next, the diffusivity is used in a discretization of the diffusion equation, resulting in a linear system that is solved to find an equilibrium concentration field. Finally, the quantity of interest is the mean of the squared difference between the stochastic and deterministic solutions. For convenience, the relevant variables and functions are listed here:1

Ny Number of input (independent) variables

2 Nu = Nu1 Number of mesh nodes (total and per direction)

2 2 Nd = Nd1 = (Nu1 − 2) Number of degrees of freedom (state variables)

2 Nb = Nb1 Number of finite elements for eigenfunction approximation

T 2 x = [x1, x2] ∈ [0, 1] Geometric coordinate vector (the spatial domain)

1 Notation throughout this subsection adapted largely from [121] and [47]. 129

t ∈ [0, ∞] Time (the temporal domain)

! ∈ Ω Random variable (the probabilistic domain)

yy(!) ∈ ℝNy Input vector: standard-normal (i.i.d.) random samples

gi(xxi,!) ∣ i ∈ {1, … ,Nu} Gaussian random process; value at discretized location xi

̂gi(xxi,yy) Truncated Karhunen-Loève expansion (KLE) of g

Cgg(xx1,x2) Squared exponential covariance function of g

(i,j)(xxi) ∣ j ∈ {1, … ,Ny} Eigenfunctions of Cgg; value of function j at location i

j Eigenvalues of Cgg

n(xxx) ∣ n ∈ {1, … ,Nb} Bilinear finite element basis for approximation of (xxx)

ki(xxi,yy) Lognormal diffusivity coefficient

ui(xx, t,kk) Concentration of a species (e.g., thermal energy or a solute)

g,ĝ ,,k ∈ ℝNu Field vectors: values at all mesh points

u ∈ ℝNd State vector: values at all degrees of freedom

Nd ueq(kkk) ∈ ℝ (Stochastic) equilibrium solution vector

Nd uℎeat ∈ ℝ (Deterministic) equilibrium solution vector for uniform k

f(uueq) = f(yyy) Output variable (quantity of interest)

The goal of computation is to solve for system equilibrium at )u = 0, thus making Equation 8.1 an elliptic PDE. )t

The main difficulty in the solution stems from the fact that the diffusivity coefficient k is unevenly distributed across the physical domain, in a way that differs from simulation to simulation; this distribution is modeled by a lognormal random process, calculated as the exponential of the KLE of a Gaussian process g. Denoting the KLE approximation as ̂g and indexing into node i of a discretized mesh, k can then be defined as: 130

Ny É t ̂gi(xxi,yy) = g0 + yj j(i,j)(xxi) (8.2) j=1

ki(xxi,yy) = k0 + exp( ̂gi(xxi,yy)) (8.3)

Here, yj are independent, identically-distributed (i.i.d.) standard normal random variables; these will be consid-

ered the independent variables (i.e., inputs) during the system differentiation benchmarks, as they are the only variables

that change between different stochastic runs of the program. Next, j and j are the eigenfunctions and eigenvalues of the squared exponential covariance function:

2 2 −(‖x1−xx2‖∕lc ) Cgg(xx1,x2) =  e (8.4)

For the experiments described in this chapter (and somewhat following the similar experiment in [47]), the parameters are set to  = 0.5, lc = 0.01, g0 = 0.1, and k0 = 0. Calculation of the above quantities and their use in the KLE is explained in Section 8.1.2 and Section 8.1.3. First, though, it is instructive to consider the case of a deterministic and uniform k, which leads to a helpful choice for the problem’s boundary conditions (BCs).

8.1.1 A Deterministic Known Solution: The Heat Equation

Observe that if k is constant in space, the problem reduces to the familiar (deterministic) heat equation )u = )t

2 0 = ∇ u which, for certain boundary conditions, has an analytic equilibrium solution uℎeat. One particularly simple way to find such a solution is to define a function that is linear in both directions, such that both second derivatives are zero. For example:

  uℎeat(xxx) = m1x1 + b1 m2x2 + b2 (8.5)

= m1m2x1x2 + m1b2x1 + m2b1x2 + b1b2

If the boundaries are held constant at values that match this function, the interior of the domain will converge 131

to the same function. Note that Equation 8.5 has the analytic derivatives

)uℎeat = m2x1x2 + b2x1 )m1

)uℎeat = m1x1x2 + b1x2 )m2

)uℎeat = m2x2 + b2 )b1

)uℎeat = m1x1 + b1 )b2 2 ) uℎeat = x1x2 )m1)m2 2 ) uℎeat = x1 )m1)b2 2 ) uℎeat = x2 )m2)b1 )2u ℎeat = 1 )b1)b2

In this work, m1 = b1 = 500 and m2 = b2 = 1000. Defining the original program’s Dirichlet BCs to be equal to Equation 8.5 thus provides a straightforward way to verify both the original program and any derivative method applied to it. However, when moving to the stochastic problem, k(xx,yy) becomes a field that varies across x; typically,

analytic solutions are no longer available – a discretized, purely numerical approximation is required. One of the

most straightforward approaches to studying the discretized solution vector ueq and any function f(uueq) is to simply draw many samples of k and repeatedly solve the system. However, the sampling process itself is a nontrivial task.

Before entering the differentiatable section of code, it is necessary to build a Karhunen-Loève expansion of the process, which in turns requires calculation of eigenfunctions of the chosen covariance kernel. These steps are described in

Section 8.1.2. This expansion can then be used to generate samples of the process, as described in Section 8.1.3. Only after an instance of k has been realized can Equation 8.1 be discretized and solved, as described in Section 8.1.4.

8.1.2 Galerkin Projection and Karhunen-Loève Expansion

Any given random process p(xx,!) can expressed with a generalized Fourier expansion (GFE) – that is, a de-

composition into infinitely many weighted basis functions: 132

∞ É p(xx,!) = yj(!)j(xxx) (8.6) 0

Any Fourier expansion is identically equal to the process itself, in the limit of infinitely many basis functions; however, any numeric approximation must truncate the expansion to a finite number of terms Ny, thereby throwing away a potentially significant amount of information. The best approach to this problem is generally considered to be the Karhunen-Loève expansion (KLE), which has been shown to minimize total mean squared error for a given truncation; it is defined as the GFE whose basis functions are the eigenfunctions of the process’s covariance function

Cpp(xx1,x2). That is, each eigenfunction j is a solution to the operator equation

Cpp(xx1,x2)j(xx2)dx2 = jj(xx1) (8.7) Ê

In plain English, Cpp(xx1,x2) determines how much a field measured at x2 will change if the field is perturbed at x1. If the function j(xxx) is used to perturb an entire field over all x, the net impact on any given point x1 is thus determined by the integral of the covariance function over the whole spatial domain. If that impact is simply a linear scaling of j(xx1) at that point in isolation, then j(xxx) is an eigenfunction of Cpp and the eigenvalue j measures the strength of the relationship. Because the relationship also works in reverse, it can be seen that the eigenfunctions with the largest j have the greatest impact on the field as a whole. This gives a natural way to approximate the process defined by Cpp: stack the most important eigenfunctions on top of each other, until their collective behavior captures that of of the true process.

Unfortunately, solutions of Equation 8.7 are rarely available analytically – they must be found via yet another approximation. In this case, the approximate eigenfunctions are found via the method of Galerkin projection, which is used to convert the continuous operator problem into one of solving for another, separate set of weighted basis functions: 133

Nb ̂ É j(xxx) ≈ j(xxx) = zj,n n(xxx) (8.8) n=1

In this case, the functions n(xxx) are chosen by the user; for this problem, they are simply (bi)linear finite el- ements, as shown in Figure 8.1.Note that bilinear elements are the product of two one-dimensional linear elements, with one running along each cardinal direction, much as the boundary condition solution chosen in Equation 8.5 is the product of two linear solutions.

1

0.8 1

0.8

0.6 0.6

0.4

0.4 0.2

0 1 0.2 1

0 0.2 0 0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 (b) (a)

Figure 8.1: (a): A sequence of five linear finite elements, with the central element highlighted. The value of each n(xxx) is linear inside the element and zero everywhere else. (b): The central element of a 3 × 3 grid of bilinear elements. This element is linear along cardinal directions but otherwise quadratic. Note that in both (a) and (b), the sum of all elements is everywhere equal to 1, so that summing across all weighted elements yields a linear interpolation of (xxx) between adjacent known values. These are very small grids for demonstration, but in real applications hundreds of elements (or more) are used in each direction.

The Galerkin method consists of, first, approximating the residual r of Equation 8.7 by substituting in Equa-

tion 8.8: 134

r(xx1) ≡ Cpp(xx1,x2)j(xx2)dx2 − jj(xx1) (8.9) Ê N N Éb Éb ≈ Cpp(xx1,x2) zj,n n(xx2)dx2 − j zj,n n(xx1) (8.10) Ê n=1 n=1 Nb 0 1 É = Cpp(xx1,x2)zj,n n(xx2)dx2 − jzj,n n(xx1) (8.11) n=1 Ê

The method then sets the residual to be orthogonal to each m(xxx) – that is, solves for a set of coefficients zj,n such that each finite element’s approximation of the residual is equal to zero. This leads to one residual equation rm for each element:

rm = 0 = r(xx1) m(xx1)dx1 ∣ m ∈ {0, … ,Nb} (8.12) Ê H Nb 0 1I É = Cpp(xx1,x2)zj,n n(xx2)dx2 − jzj,n n(xx1) m(xx1)dx1 (8.13) Ê n=1 Ê Nb 0 1 É = Cpp(xx1,x2) n(xx2) m(xx1)dx2dx1 zj,n (8.14) n=1 ÊÊ Nb 0 1 É − n(xx1) m(xx1)dx1 zj,nj n=1 Ê

This can be expressed as a generalized eigen-problem of Nb linear equations:

CCZZ − BBZZZΛΛ = 0 (8.15)

where

C ∈ ℝNb×Nb (8.16)

Ci,j = Cpp(xx1,x2) n(xx2) m(xx1)dx2dx1 (8.17) ÊÊ

B ∈ ℝNb×Nb (8.18)

Bi,j = n(xx1) m(xx1)dx1 (8.19) Ê 135

Λ ∈ ℝNy×Ny (8.20)

Λi,j = i,ji (8.21)

Z ∈ ℝNb×Ny (8.22)

Zi,j = zi,j (8.23)

After integrating to find the entries of C and B (likely via numerical quadrature), solving this problem for Z and the diagonal matrix Λ yields all the coefficients zj,n from Equation 8.8 along with the eigenvalues j; the eigenfunctions can then be approximated at any point by inserting any chosen value of x into that equation, even if such a point was not used to build the approximation – the finite elements can be used to approximate a value of j(xxx) at entirely arbitrary locations. Thus, the mesh used to model (xxx) does not strictly need to match the one used to model p(xxx); this can be advantageous in cases where computing  becomes more expensive than the actual intended use of p. Note, though, that the matrix C cannot have more than Nb eigenpairs, which places an upper limit on Ny ≤ Nb.

N Once the vectors (xxi) ∈ ℝ u for each eigenfunction have been calculated at all discretized xi, random samples of the discretized field p – originally a continuous problem with infinite degrees of freedom – can be very quickly computed by inserting a finite number of random scalar samples y into the truncated KLE:

Ny É p(xx,!) ≈ ̂p(xx,yy) = yjjj(xxx) (8.24) j=1

This can be made even simpler by concatenating the column-vectors into a matrix Φ and re-using the matrix Λ from above:

  Φ =  , , … , ∈ ℝNu×Ny (8.25) 1 2 Ny

pp(ŷ yy) = ΦΦΛΛΛyy (8.26)

pi ≈ ̂p(xxi,yy) (8.27) 136

Note that every step up until the application of y is purely deterministic; thus, Φ and Λ only need to be built

once, after which an arbitrary number of random samples of p can be drawn.

8.1.3 Sampling the Diffusivity Field

For the specific problem considered here, the goal is to compute the lognormal field k(xx,!) ≈ k(xx,yy), which

is defined to be the exponential of a Gaussian process – i.e., the exponential eg of a process g(xx,yy) wherein any finite

subset of points {g(xxi)} ∣ i = {1, 2, …} will have a multivariate normal distribution. Thus, to compute discretized

(a) (b) samples of the entire field kk(yyy), one sets p = g, chooses some two-dimensional covariance function Cgg(xx ,x ) =

(a) (a) (b) (b) Cgg([x1 , x2 ], [x1 , x2 ]), and proceeds as above.

When considering options for Cgg, it is important to note that the integrals in Equation 8.14 are taken over

the entire two-dimensional geometric domain; thus, entries of the matrix C theoretically require quadruple integrals.

Even using efficient sparse numerical quadrature, this can be impractically expensive. Fortunately, under certain cir- cumstances, the product of two one-dimensional eigenfunctions (again, much like the product of two finite elements or two boundary conditions) can itself be an eigenfunction of a two-dimensional process. This is only possible when the covariance function is multiplicativly separable, such that Cgg is a product of one-dimensional functions. For this reason, the covariance for this problem is chosen to be the squared exponential, which is a function of the distance d between two points:

t (a) (b) É (a) (b) 2 d = ‖x − x ‖ = (xi − xi ) (8.28)

2 (a) (b) 2 −(d∕lc ) Cgg(xx ,x ) = Cgg(d) =  e (8.29)

(a) (b) 2 (a) (b) 2 = 2e−((x1 −x1 )∕lc ) −((x2 −x2 )∕lc ) (8.30)

(a) (b) 2 (a) (b) 2 = 2e−((x1 −x1 )∕lc ) e−((x2 −x2 )∕lc ) (8.31)

(a) (b) (a) (b) = Cgg(x1 , x1 )Cgg(x2 , x2 ) (8.32)

2 where  is the variance and lc is the correlation length (a scaling parameter); for this problem,  = 0.5 and lc = 0.01.

Given a one-dimensional Cgg of this form, the program first finds the eigenfunctions i(x1) and eigenvalues i for the 137

T one-dimensional problem; takes the outer product of the vector of eigenvalues  ; finds the largest eigenvalues ij =

ij of the new process; and finally pulls out the corresponding two-dimensional eigenfunctions ij(xxx) = i(x1)j(x2).

As mentioned in the previous section, the one-dimensional linear finite elements (x1) used to approximate

(x1) might be placed along a lower-resolution coordinate mesh than that for (x1) itself; however, the separable nature of this Cgg makes that unnecessary. Thus, for the sake of using identical meshes and identical bilinear elements, the problem discretizes the field with a square grid, i.e., with Nb1 = Nu1 nodes uniformly spaced in each direction such

2 that Nb = Nu = Nu1 . Given this configuration, example eigenfunctions and samples of k are shown in Figure 8.2 and Figure 8.3, respectively.

1 2 3 4

34 35 36 37

Figure 8.2: Selected eigenfunctions (#1-4 and #34-37) of the Gaussian covariance kernel. Early eigenfunctions capture broad behavior; successive eigenfunctions focus on increasingly fine-grained features.

8.1.4 PDE Discretization, Solution, and Quantity of Interest

Once k is known, the next step is to discretize Equation 8.1 in order to solve for equilibrium at )u = 0 – that is, )t discretize 138

2 Figure 8.3: Sample of the diffusion k-field for Nu = 128 = 16384, Ny = 10000,  = 0.5 and lc = 0.01.

0 1 0 1 )k )u )k )u )2u )2u + + k + = 0 (8.33) )x )x )x )x 2 2 1 1 2 2 )x1 )x2

and solve for u(xxx) = ueq(xxx) at every discretized value of x. As stated above, the computational mesh is a uniformly

1 spaced square grid with Nu1 nodes in each direction; denote the spacing of this grid as Δx = and index each Nu1−1

(n) (m) (n) (m) node as xn,m = [x1 , x2 ]. The adjacent points are then [x1 + Δx, x2 ] = xn+1,m, etc.

The simplest centered finite difference2 formulas are sufficient to discretize Equation 8.33, and can now be denoted as:

2 Note, this is one of the few times in this work where a “finite difference” does not refer to a derivative approximation across an entire system. 139

)k k(x + Δx, y) − k(x − Δx, y) ≈ )x1 2Δx kn+1,m − kn−1,m = 2Δx )u u(x + Δx, y) − u(x − Δx, y) ≈ )x1 2Δx un+1,m − un−1,m = 2Δx )2u u(x + Δx, y) − 2u(x, y) + u(x − Δx, y) ≈ 2 2 )x1 Δx un+1,m − 2un,m + un−1,m = Δx2

and similarly for derivatives with respect to x2. Inserting these formulas into Equation 8.33 and rearranging yields:

1  1  kn,m − kn+1,m − kn−1,m un−1,m Δx2 4 1  1  + kn,m − kn,m+1 − kn,m−1 un,m−1 Δx2 4 4 + − kn,m un,m Δx2 1  1  + kn,m + kn+1,m − kn−1,m un+1,m Δx2 4 1  1  + kn,m + kn,m+1 − kn,m−1 un,m+1 = 0 (8.34) Δx2 4

Note that holding the mesh boundary values fixed reduces the number of degrees of freedom (DoFs) – thus,

k N N2 u N N2 N 2 while must be provided at each of u = u1 mesh nodes, is only found at d = d1 = ( u1 − 2) DoF points contained in the sub-mesh that excludes the edge nodes, indexed with n, m ∈ {2, … ,Nu − 1}. To keep this context clear, denote vectors of mesh values with a tilde, such that the k-field becomes ̃k ∈ ℝNu while the u-field is simply u ∈ ℝNd . Further denote the vector of boundary values as ̃ṽv∈ ℝNu , which is defined to be zero at every DoF node.

Note that any given double-indexed un,m will always be in the DoF sub-mesh (and thus represented in u) but adjacent ̃ values un+1,m, etc., may not. Use of a single index (ki, ui, etc.) denotes a single entry of a vector, where the range of the index can be inferred from context. 140

Using this notation, if 3 ≤ n, m ≤ N − 2 – i.e., if every neighboring value of u is unknown – Equation 8.34 immediately yields five coefficients that can be placed into row i of a matrix AA(̃kk) ∈ ℝNd ×Nd , where i = (m −

1) + Nd1(n − 2). In the remaining cases where one or more of the u-values are held fixed by the Dirichlet boundary conditions, the product of the coefficient and the value must be added to a vector bb(̃k, ̃ṽv)∈ ℝNd , such that AAuu + b = 0.

−1 Once A and b are constructed in this way, the equilibrium is found via linear solution as u = ueq = −AA b.

As a rule, most engineering applications only solve for such an equilibrium in order to extract some quantity of interest (QoI) f from the converged solution field. Here, f(uueq) is arbitrarily chosen as the mean squared difference between ueq and the constant-k solution uℎeat chosen in Section 8.1.1 – that is,

Nd 1 É 2 f(uu ) = u − u (8.35) eq N eq(i) ℎeat(i) d i=1

Even given this specific discretization and QoI, there are multiple ways to implement this problem. For rigor and additional verification of the analytic model, the experiments described in this chapter use two different programming approaches (both in MATLAB) to build A and b and to compute f. The first formulation relies heavily on scalar calculations: each diffusivity value ki is found via scalar operations and Equation 8.24; likewise, when computing each row of A, each entry is found one at a time as given in Equation 8.34. The only matrix operations are found in the system solver, after which yet more scalar operations are used to compute the output via Equation 8.35. The data-flow of this first program is shown in Figure 8.4, which adopts the same tilde notation for all whole-mesh vectors.

Nilpotent Matrix Differentiation might still be helpful for this problem formulation, even if it were only applied to the solution step – using the manual matrix-inverse differentiation formula from Equation 3.71 is trivial for a single derivative, but repeated application might become laborious for higher derivatives, especially when the order or number of independent variables is expected to change across program runs. However, this is not how NMD is truly expected to be used – its standout feature is the way derivatives propagate across many matrix routines, with most3 new code required only before the first routine and right after the last.

3 Exceptions to the rule generally deal with treatment of vectors or other limitations discussed in Section 6.2, but are very simple to apply. 141

j, j∶= Galerkin() uℎeat∶= HeatSoln()

∑ y ĝ = ̂yjjj vv∶= BCfunc()

̂kk∶= exp(gg)̂

̂ 1 ∑ 2 Â ,̂bb∶= Abfunc(̂k,vv) f∶= ̂ueq(i) − uℎeat(i) f(yyy) Nu

œ −1œ œueq ∶= −A b

Figure 8.4: Data-flow of the first, scalar-focused implementation of the diffusion problem. White functions represent values that remain constant between function runs; gray represents user-written scalar code with varying inputs and outputs; dark gray represents code that calls inaccessible matrix functions; and blue represents the program API.

For this reason, a second formulation of the program is also used as a more rigorous test for NMD. In this

implementation, every operation is replaced by a matrix variant. First, a matrix-matrix-vector product like that in

Equation 8.26 is used to calculate gg(ỹ yy):

gg(ỹ yy) = ΦΦΛΛΛyy (8.36)

In the case of many eigenfunctions, this is significantly faster than the scalar variant. Next, rather than operate on one element at a time, the vector g̃ is converted to a diagonal matrix G̃ , which allows the calculation of a similarly

diagonal K̃ = expm(GG)̃ , where expm is the function; this has an analogous definition to the scalar exponential:

∞ ̃ É G̃ n expm(GG)̃ eG = (8.37) ≡ n! n=0

For a diagonal matrix, this was experimentally found to converge to machine precision after about n ≈ 10 terms.

A sparse implementation of expm is thus trivially inexpensive to run compared to the cost of calculating gg(ỹ yy).

Note that, as Equation 8.34 requires k-values even for the boundary nodes, K̃ cannot simply be “stripped down”

by removing the boundaries. Instead, in order to construct A ∈ ℝNd ×Nd and b ∈ ℝNd , temporary matrices à ∈ 142

ℝNu×Nu and ̃b ∈ ℝNu must be built first, then mapped down to the smaller index set. The off- of these larger

matrices can be calculated using a set of four “shift” matrices:

̃ ̃ ̃ ̃ ̃ Nu×Nu S(Δn,Δm) ∈ S(−1,0), S(0,−1), S(1,0), S(0,1) ∈ ℝ (8.38)

These are each sparse matrices with a single diagonal of all ones, offset from the main diagonal such that matrix multiplication will “shift” a value from its own index to that of a neighbor in the indicated direction. That is:

  ̃  ̃ ̃  ̃ T ̃ ̃ Kii = KS(Δn,Δm) = S(Δn,Δm) KS(Δn,Δm) (8.39) ij jj   ̃ ̃ ̃  ̃ ̃ ̃ T Kjj = S(Δn,Δm)K = S(Δn,Δm)KS(Δn,Δm) (8.40) ij ii ̃ = K(Δn,Δm)(ii) (8.41)

i = m + Nu1(n − 1) (8.42)

j = (m + Δm) + Nu1((n + Δn) − 1) (8.43)

Boundary values that have no neighbor will be “shifted off” the matrix entirely, i.e., set to zero. Note that

̃ ̃ T ̃ ̃ T S(1,0) = S(−1,0) and S(0,1) = S(0,−1) – that is, to shift to the opposite neighbor, simply transpose the shift matrix.

Separate indices are used here for ease of notation.

Only once à and ̃b are built can the extra information be stripped out. This is done using a map matrix M ∈

ℝNu×Nd , specially built so that A = MT AAM̃ M and b = MT̃b. This matrix is also sparse, with a single 1 in each column

(and thus having some empty rows); it is built during program initialization by finding the mesh index i corresponding to each DoF index j, then setting Mij = 1. Given these matrices and the BC vector ̃v, the off-diagonal coefficients from Equation 8.34 can be built with the following steps: 143

∀Δn, Δm∶

̃ ̃ ̃ ̃ T K(Δn,Δm) = S(Δn,Δm))KS(Δn,Δm) (8.44)

1  1  C̃ = K̃ + K̃ − K̃ (8.45) (Δn,Δm) Δx2 4 (Δn,Δm) (−Δn,−Δm) ̃ ̃ ̃ A(Δn,Δm) = C(Δn,Δm)S(Δn,Δm) (8.46)

̃ ̃ b(Δn,Δm) = A(Δn,Δm)v (8.47)

These steps calculate the coefficients corresponding to each neighbor on the full mesh; once they are found, they must be summed and mapped to the DoF sub-mesh:

H I 4 É A = MT − K̃ + Ã M (8.48) 2 (Δn,Δm) Δx Δn,Δm H I T É ̃ b = M b(Δn,Δm) (8.49) Δn,Δm

Finally, once A and b have been used to solve for ueq, the quantity of interest from Equation 8.35 is also expressed

entirely with vector operations as:

d = ueq − uℎeat (8.50)

1 T f(uueq) = d d (8.51) Nu

The data-flow of this second program is shown in Figure 8.5. This is a larger diagram that Figure 8.4 because

the inner workings of Abfunc(), which were hidden there, are broken out here; in terms of implementation, it is no

more complex. Note that for the deterministic version of the problem, the first three blocks of the matrix problem (i.e.,

those for g, G, and K) are bypassed so that a diagonal matrix K = k0I can be passed to the discretization process; 144

̃ ̃ Φ,ΛΛ∶= Galerkin() SΔ,MM∶= SMfunc() ̃ṽv∶= BCfunc() uℎeat∶= HeatSoln()

̃ ̃ ̃ ̃ ̃ T ̃ ̃ ̃ ̃ ̃ y g̃ ∶= ΦΦΛΛΛyy K Δ ∶= SΔKSΔ AΔ ∶= CΔSΔ bΔ ∶= AΔ ̃v

1  1  T  GG∶=̃ diag(gg)̃ C̃ ∶= K̃ + K̃ − K̃ b ∶= M ∑ ̃b Δ Δx2 4 Δ −Δ Δ Δ

  KK∶=̃ expm(GG)̃ A ∶= MT − 4 K̃ + ∑ Ã M Δx2 Δ Δ

−1 ueq ∶= −AA b d ∶= ueq − uℎeat

f ∶= 1 dT d f(yyy) Nu

Figure 8.5: Data-flow of the second, matrix-only implementation of the diffusion problem. White functions represent values that remain constant between function runs, dark gray represents code that calls inaccessible matrix functions, and blue represents the program API. Every operation on a variable with a Δ subscript is applied to each of Δ = (Δn, Δm) ∈ {(1, 0), (−1, 0), (0, 1), (0, −1)}.

otherwise, the process remains the same.

8.1.5 Undifferentiated Program Results

Once again, this problem was chosen simply to benchmark differentiation algorithms; to some extent, the undif- ferentiated (real-valued) outputs are beside the point. Nonetheless, it is useful to review the system behavior to clarify precisely what the base program is meant to output.

A sample of the k-field was shown in Figure 8.3. Given a diffusivity sample such as that one, the diffusive system reaches a equilibrium such as the one shown in Figure 8.6; this equilibrium represents the steady-state concentration of some arbitrary solute after it has diffused across the spatial domain. Note in the figure that the deterministic behavior   imposed by the boundary conditions uℎeat(xxx) = m1x1 + b1 m2x2 + b2 for m1 = b1 = 500 and m2 = b2 = 1000 dominates the solution; however, the true quantity of interest is the stochastic deviation from deterministic expectations. 145

Figure 8.6: Example diffusion equilibrium solution with Ny = 10000 eigenfunctions. Elevation represents determin- istic solution; color represents delta between stochastic and deterministic solutions.

Recall that the diffusion equation is:

0 1 0 1 )u )k )u )k )u )2u )2u = + + k + )t )x )x )x )x 2 2 1 1 2 2 )x1 )x2

Further recall that the boundary conditions were explicitly constructed to set both spatial second derivatives to zero; therefore, considering that stochastic behavior constitutes so little of the solution’s total magnitude, one can expect the second term of the equation to be negligible – the deviation should be dominated by first-order behavior. Given this, examine the two remaining terms knowing that )k∕ and )k∕ cannot be counted on to cancel. To maintain the )x1 )x2

equilibrium )u∕ = 0, if )k∕ is large then )u∕ will likely be small, and likewise for )k∕ and )u∕ . However, )t )x1 )x1 )x2 )x2

a small spatial derivative of the solute implies that the field is not conforming to the steep underlying deterministic

behavior – and so, in any location with small k-derivatives, u must “catch up”. This expectation is confirmed in

Figure 8.7, which demonstrates a close relationship between the two spatial gradient magnitudes.4

4 Note that these gradients were calculated by finite differences across the mesh – unfortunately, discretization of a PDE makes the true spatial derivatives unavailable to NMD or any other AD method. 146

a b

c d

Figure 8.7: (a) Sample of k with Ny = 100 eigenfunctions. (b) The corresponding stochastic equilibrium solution ueq and its deviation d = ueq − uℎeat from the deterministic solution. (See Figure 8.6 caption). (c) Magnitude of the same deviation’s spatial gradient vector ðð∇xdðð (green/blue) overlaid with contours of k (red/orange). (d) ðð∇xdðð overlaid with contours of ðð∇xkðð. In general and as expected (see text), the field deviation is steepest at the local minima and maxima of k, i.e, the minima of ðð∇xkðð.

Importantly, both versions of the program (summarized in Figure 8.4 and Figure 8.5) produce identical outputs f(yyy) for identical input vectors y, and when differentiated (as will be shown in Figure 8.9 and Figure 8.10) produce identical derivatives of f with respect to y. With this verification in hand, NMD benchmarks will only be reported for the second problem, i.e., the one composed purely of matrix operations. 147

8.1.6 Adjoint Formulation

As already mentioned repeatedly, one of the most promising applications of Nilpotent Matrix Differentiation is the higher-order differentiation of the adjoint method, as that approach is itself based on matrix operations – namely, the manipulation of several key Jacobians. To give a proof-of-concept of this capability, the adjoint method must first be applied to the diffusion code. Specifically, the discrete variant discussed in Section 3.1.2 is implemented almost exactly as described there.

To compactly rephrase the method using variables from the (forward) diffusion problem, denote the partial derivative of any scalar z with respect to vector w as the gradient row vector zw; likewise, denote the derivative of any vector z as the Jacobian matrix Zw. Then, given input y, solution ueq(yyy), residual rr(yy,ueq) = 0, and output f(uueq), the bulk of the method’s computational effort goes into solving the adjoint equation:

T T Ru λ = −ffu (8.52)

df Solving this system yields the adjoint vector λ; to compute the gradient ∕dyy, this can be inserted into the formula:

df = λT R + f (8.53) dyy y y

For the diffusion problem, the output f is given by Equation 8.50, above, and the residual is:

r = AA(yyy)uueq + bb(yyy) = 0 (8.54)

The key derivatives required by the adjoint method are then: 148

2 f u = (uueq − uℎeat) (8.55) Nu

f y = 0 (8.56)

Ru = A (8.57)

  )r R = (8.58) y (⋅,j) )yj )A )b = ueq + (8.59) )yj )xj

Nd ×Ny The first three derivatives are trivial, but Ry ∈ ℝ – the Jacobian of the residual with respect to the input

– must be defined column-wise. The overwhelming majority of the new implementation work goes into finding this

quantity efficiently. Referring to the original discretization Equation 8.34, it can be seen that A and b only depend on y through linear use of the diffusivity vector ̃k, which makes the first step of analytic differentiation straightforward:

0 0 11 )r )rn,m 1 )kn,m 1 )kn+1,m )kn−1,m i = = − − u 2 n−1,m )yj )yj Δx )yj 4 )yj )yj 0 0 11 1 )kn,m 1 )kn,m+1 )kn,m−1 + − − u 2 n,m−1 Δx )yj 4 )yj )yj

4 )kn,m − u 2 n,m Δx )yj 0 0 11 1 )kn,m 1 )kn+1,m )kn−1,m + + − u 2 n+1,m Δx )yj 4 )yj )yj 0 0 11 1 )kn,m 1 )kn,m+1 )kn,m−1 + + − u (8.60) 2 n,m+1 Δx )yj 4 )yj )yj

The derivatives of k are in turn defined by 149

̃ ̃ Kii = ki = k0 + exp( ̃gi) (8.61)

Ny É ̃ ̃gi = g0 + ykkik (8.62) k=1 ) ̃g i ̃ = jij (8.63) )yj )K̃ )k̃ ii i ̃ = = jij exp( ̃gi) (8.64) )yj )yj

Thus, )A∕ and )b∕ could be built by replacing ̃k in Figure 8.4 and K̃ in Figure 8.5 with )̃k∕ and )K̃ ∕ , )yj )yj )yj )yj

respectively. This yields the most trivial way to build R : construct )A∕ and )b∕ for each y , then repeatedly apply y )yj )yj j

Equation 8.59 to find each column; this is, however, quite inefficient. A better way is reuse the diagonal matrix G̃ from the matrix-focused implementation of the problem (but not the diagonal K̃ ) and build two new (dense) Jacobians

̃ ̃ Nu×Ny Gy ,Ky ∈ ℝ :

g̃ = ΦΦΛ̃ ΛΛyy (8.65)

)g̃ = j(⋅,j) (8.66) )yj

̃ ̃ Gy = ΦΦΛΛ (8.67)

̃ ̃ ̃ Ky = expm(GG)Gy (8.68)

Nd ×Ny Note that Equation 8.60 can be used to define values of Ry ∈ ℝ at every DoF node, but must index into

̃ Nu×Ny the boundaries to do so. Thus, the goal is to first build a whole-mesh matrix Ry ∈ ℝ , then trim it with the map matrix M. This procedure likewise needs access to a vector ̃ũu∈ ℝNu of species concentrations across the whole mesh;

N N this is built simply by inserting the values of ueq ∈ ℝ d into the corresponding locations in the BC vector ̃ṽv∈ ℝ u

(i.e., locations that were set to zero in the original ̃v).

In the original matrix-focused forward problem, ̃k was turned into a diagonal matrix K̃ , a function of which

̃ was ultimately multiplied with the column vector ̃v. In this case, as Ky is already a matrix, it cannot be diagonalized; instead, ̃u becomes the diagonal matrix Ũ . Each of the four neighbor terms from Equation 8.60 require an offset u and 150

k, which can be calculated with the same shift matrices from earlier:

̃ui = un,m (8.69)

̃ ̃uq = u(n+Δn,m+Δm) = Uqq (8.70)

̃ = U(Δn,Δm)(i,i) (8.71)   ̃ ̃ ̃ T = S(Δn,Δm)US(Δn,Δm) (8.72) ii )k̃ )k i n,m ̃ = = Kyy(i,j) (8.73) )yj )yj ̃ )kq )k(n+Δn,m+Δm) = (8.74) )yj )yj

̃ = Kyy(Δn,Δm)(i,i) (8.75)

  = S̃ K̃ (8.76) (Δn,Δm) y (i,j)

̃ Knowing these values, Ry can be built as the sum

H I 1 É  T   1  T  R̃ = −4Ũ K̃ + S̃ Ũ S̃ K̃ + S̃ K̃ − S̃ K̃ (8.77) y 2 y (Δn,Δm) (Δn,Δm) y 4 (Δn,Δm) y (Δn,Δm) y Δx (Δn,Δm) H I 1 É  T   1  T  = −4Ũ + S̃ Ũ S̃ I + S̃ − S̃ K̃ (8.78) 2 (Δn,Δm) (Δn,Δm) 4 (Δn,Δm) (Δn,Δm) y Δx (Δn,Δm)

Thus, the target matrix Ry can then be built with a procedure similar to Equations 8.44 through 8.49 and Fig- ure 8.5, ultimately resulting in the formula:

T ̃ Ry = M Ry (8.79) H I 1 É  T   1  T  = MT −4Ũ + S̃ Ũ S̃ I + S̃ − S̃ K̃ (8.80) 2 (Δn,Δm) (Δn,Δm) 4 (Δn,Δm) (Δn,Δm) y Δx (Δn,Δm)

̃ Note that every matrix in Equation 8.80 except Ky is sparse with a single (off-)diagonal; only a single dense

̃ Nd × Ny multiplication is required. Add this to the equally-sized operations used to find g̃ (Equation 8.65) and Ky 151

(Equation 8.68), and the entire adjoint method – excluding the forward sweep used to find ueq – requires three Nd ×Ny

dense matrix multiplications and one linear solution of a Nu × Nu but highly sparse system. For values of Ny used in

5 1 3 2 2 4 5 practice (on the order of Ny ∼ 10 − 10 , compared to Nu ∼ 128 − 256 ∼ 10 − 10 ) the cost of the solve is several

times that of the matrix creation.

The data-flow of the adjoint code is shown in Figure 8.8.

̃ ̃ Φ,ΛΛ∶= Galerkin() SΔ,MM∶= SMfunc() uℎeat∶= HeatSoln()

̃ṽv∶= BCfunc()

̃ ̃ ̃ ̃ ̃ T ̃ ̃ ̃ ̃ ̃ y g̃ ∶= ΦΦΛΛΛyy K Δ ∶= SΔKSΔ AΔ ∶= CΔSΔ bΔ ∶= AΔ ̃v

1  1  T  GG∶=̃ diag(gg)̃ C̃ ∶= K̃ + K̃ − K̃ b ∶= M ∑ ̃b Δ Δx2 4 Δ −Δ Δ Δ

  KK∶=̃ expm(GG)̃ A ∶= MT − 4 K̃ + ∑ Ã M Δx2 Δ Δ

−1 ueq ∶= −AA b

̃ ̃ ̃ ̃ ̃ ̃ 2  Gy ∶= ΦΦΛΛ Ky ∶= KGy UU∶= diag(uueq) f u ∶= ueq − uℎeat y y y Nu

1     1  −1 R ∶= MT −4Ũ + ∑ S̃ Ũ S̃ T I + S̃ − S̃ T K̃ λ T  y Δx2 Δ Δ Δ 4 Δ Δ y λ ∶= − A f u

T ∇f ∶= λ Ry ∇f(yyy)

Figure 8.8: Data-flow of the diffusion adjoint problem. White functions represent values that remain constant between function runs, dark gray represents code that calls inaccessible matrix functions, and blue represents the program API. Every operation on a variable with a Δ subscript is applied to each of Δ = (Δn, Δm) ∈ {(1, 0), (−1, 0), (0, 1), (0, −1)}. Functions and data inside the first dotted box are from the forward sweep of the problem; those inside the second box are from new adjoint code.

5 ̃ Recall that when Φ and g̃ were built they used a finite element basis with Nb = Nu elements, which placed an absolute upper limit of Ny ≤ Nu on the number of computable eigenfunctions. 152

8.2 Experiments in Nilpotent and Complex Differentiation

Nilpotent Matrix Differentiation was implemented in MATLAB and applied to both versions of the diffusion program. In the version with many scalar operations, a specially-coded, vectorized and naively inductive AD technique

(see Section 3.3.2) was used for every operation outside the solve. This was coded by hand because arbitrary-order AD methods are not available for MATLAB, and took the form it did because an efficient implementation would be a very involved task beyond the scope of this work. A high-level schematic of the modified process is shown in Figure 8.9; as this version was intended solely for verification of the second, matrix-only version, the precise details are not relevant.

It is only important to know that the calculated values of the two versions matched exactly, to within the precision of the solver. Together with finite difference approximations and comparisons to known deterministic values (see

Section 8.3), it is safe to say that the matrix program correctly implements the model in all relevant ways.

j, j∶= Galerkin() uℎeat∶= HeatSoln()

y ̂ŷy∶= ADtype(yyy) vv∶= BCfunc()

∑   ĝ = ̂yjjj Df∶= DerivPart(f̂) D f(xxx)

̂kk∶= exp(gg)̂ f∶= RealPart(f̂) f(yyy)

̂ 1 ∑ 2  ,̂bb∶= Abfunc(̂k,vv) f∶= ̂ueq(i) − uℎeat(i) Nu Operator Operator Unaltered Overloading Overloading Sparse Solve œ œ ̂ ̂  œ −1œ  A,bb∶= AD2Stencil  ,b œueq ∶= −A b ̂ueq∶= Stencil2AD œueq

Figure 8.9: Data-flow of the first, scalar-focused implementation of the diffusion problem, after AD and NMD are applied. (See caption under Figure 8.5.) Yellow represents new or changed code for scalar AD; orange represents new NMD code. The majority of the code (but not run-time cost) is the responsibility of scalar AD; NMD only differentiates the solve. Both the output and the derivatives of this code match the matrix-focused implementation, verifying those results.

The summary schematic for the NMD-modified matrix program is shown in Figure 8.10. As can be seen there, the majority of the code modification comes at program initialization and immediately before program output. To use 153 the same AD2Stencil() and Stencil2AD() functions as the scalar program (noting that most real-world applications will have at least some scalar code, and thus make some use of a standard AD method), inputs are turned into an operator- overloaded AD type purely for initialization, then immediately converted to a stencil column vector yœ; likewise, the

final output f is unpacked into an AD variable purely as a convenient container, which then presents resulting data in the desired format. Considering that no intermediate derivatives are needed, this approach is not the only way – one could alternatively use a hypothetical InitializeStencil() function to set relevant stencil entries to one, without ever invoking a scalar AD tool. Indeed, the conversion of the deterministic matrices (white blocks in the figure) into

NMD stencils requires no derivative information, and thus follows the same initialization procedure for real-valued and AD-overloaded types alike. Likewise, one could use a UnpackStencil() function to organize the final derivative information into whatever format desired. 154

Φ,ΛΛ∶= Galerkin() SΔ,MM∶= SMfunc() ̃ṽv∶= BCfunc() uℎeat∶= HeatSoln()

y ̂ŷy∶= ADtype(yyy)

œ œ œ œ T œ œ T T T  yœ,Φ,Λ,SΔ,SΔ ,M,M ,œvœ,œœuℎeat ∶= AD2Stencil ̂y,Φ,Λ,SΔ,SΔ ,M,M ,v,uℎeat

T œ œ œœ œ œ œœ œ œ œ œ œ œ œg ∶= ΦΛyœ K Δ ∶= SΔKSΔ AΔ ∶= CΔSΔ bΔ ∶= AĜv

     GG∶=œ diag(œgg) Cœ ∶= 1 Kœ + 1 Kœ − Kœ œb ∶= Mœ T ∑ œb Δ Δx2 4 Δ −Δ Δ Δ

  KK∶=œ expm(GG)œ Aœ ∶= Mœ T − 4 Kœ + ∑ Aœ Mœ Δx2 Δ Δ

œ −1œ œ œueq ∶= −A b d ∶= œueq − œuℎeat

œf ∶= 1 dœTœdœ Nu

 f̂ ∶= Stencil2AD œf

f∶= RealPart(f̂) f(yyy)

  Df∶= DerivPart(f̂) D f(yyy)

Figure 8.10: Data-flow of the second, matrix-only implementation of the diffusion problem after NMD has been applied. (See caption under Figure 8.5.) Yellow represents new or changed code for scalar AD; orange represents new NMD code. Almost the entire original program (dashed box) uses unmodified matrix code, with the exception of the diagonalize and dot-product functions (highlighted in orange), which require custom NMD variants.

AD2Stencil() T T As shown in the block, the matrix transposes SΔ and M are computed before stencil conver- sion, just as was mentioned in Section 6.2; this is not strictly necessary for these purely real-valued (undifferentiated) matrices, as blockwise transpose operations are only called for when a stencil is carrying derivative information. In fact, to make the entire code compatible with NMD, only two blockwise function variants are needed: one to diago- nalize œg (which uses a vector-stencil) into Gœ (which needs a matrix-stencil), and one to compute the dot product of the deviation vector dœ with itself. Both functions are straightforward and only need to be implemented once, after which they can be applied to any other program. The blockwise diagonalization simply locates each derivative value in œg and places it in the appropriate location(s) in a new block-diagonal matrix Gœ. The dot product function must unpack vector-stencils into two matrices – one of many matrix-stencils laid out side by side (a “block row vector”) and one 155

of many stacked on top of each other (a “block column vector”). After this, standard matrix multiplication yields a

single stencil, meaning that the program result œf has been converted from a scalar to a matrix before hand-off to NMD

finalization.

Because the program was designed from the start to use the matrix exponential expm() function, no modification

is required there – NMD handles the power series implementation of the function just as well as it does every other

operation. If the original program had applied a standard scalar exponential exp() to each entry in g, NMD could

either temporarily unpack the augmented œg into a scalar AD vector ĝ , or else simply use Gœ on a purely temporary basis

before returning to vector form. Note, though, the similarity to the truncated polynomial evaluation of exp() from

Section 4.5: just as a polynomial is represented as a real plus many powers of t, NMD uses a real (identity) basis stencil

plus many nilpotent basis stencils. In theory, a NMD-specific variant of expm could thus be developed that stripped out derivative data in order to evaluate the real values of a matrix exponential in isolation, before evaluating a much shorter power series for the derivative data. However, the cost of separating the data would almost certainly outweigh any savings, let alone the extra implementation burden on the user.

Finally, NMD was also applied to the adjoint code; the schematic is shown in Figure 8.11. Note there that this application requires three more custom function calls: another diagonalization for the vector œu (which uses the same code used to diagonalize œg in the forward problem), a block-transpose of Aœ before solving the adjoint equation,

œ œ Ÿ and a conversion of λ into a block row vector before premultiplying with Ry . Note that this last variant means ∇f is

itself in block row form; if it were to be passed to other matrix operations, it would first be converted back to a real

vector, but as is, it can be passed directly into Stencil2AD(). It is true that a switch from vector-matrix to matrix-

matrix multiplication requires a change of code (in BLAS, at least – the MATLAB prototype did not); if this were   Ÿ Tœ œ Tœœ truly undesirable, a block-transpose could be used to find ∇f = Ry λ , which would use a single matrix-vector multiplication and require no conversion to or from block row form. 156

Φ, Λ œ œ ̃v u ∶= HeatSoln() Φ ΛΛ∶= Galerkin() SΔ,MM∶= SMfunc() ̃ṽv∶= BCfunc() ℎeat

y ̂ŷy∶= ADtype(yyy)

œ œ œ œ T œ œ T T T  yœ,Φ,Λ,SΔ,SΔ ,M,M ,œvœ,œœuℎeat ∶= AD2Stencil ̂y,Φ,Λ,SΔ,SΔ ,M,M ,v,uℎeat

T œ œ œœ œ œ œœ œ œ œ œ œ œ œg ∶= ΦΛyœ K Δ ∶= SΔKSΔ AΔ ∶= CΔSΔ bΔ ∶= AĜv

     GG∶=œ diag(œgg) Cœ ∶= 1 Kœ + 1 Kœ − Kœ œb ∶= Mœ T ∑ œb Δ Δx2 4 Δ −Δ Δ Δ

  KK∶=œ expm(GG)œ Aœ ∶= Mœ T − 4 Kœ + ∑ Aœ Mœ Δx2 Δ Δ

œ −1œ œueq ∶= −A b

œ œ œ 2  Gœ ∶= ֜˜ Kœ ∶= KœGœ UU∶=œ diag(œu ) f u ∶= œueq − œuℎeat y y y eq Nu

        −1 œR ∶= 1 MT −4Uœ + ∑ œS UœœS T I + 1 œS − œS T Kœ œλ œ Tœ œ y Δx2 Δ Δ Δ 4 Δ Δ y λ ∶= − A f u

Ÿ œ Tœœ ∇f ∶= λ Ry

 ∇f ∶= Stencil2AD ∇fŸ

∇f∶= RealPart(∇f) ∇f(yyy)

  D ∇f ∶= DerivPart(∇f) D[∇f]

Figure 8.11: Data-flow of the diffusion adjoint problem after NMD has been applied. (See caption under Figure 8.8.) Yellow represents new or changed code for scalar AD; orange represents new NMD code. Almost every user function can be used as-is; the exceptions (highlighted in orange) include the diagonalization of g and u, the transpose of A, and T the transposed row-multiply of λ Ry , all of which require custom NMD variants as discussed in the text.

For the forward problem (including the deterministic variant of the matrix-only implementation), NMD was used both in its simplest form, that of a dual-number stencil, and with a stencil capable of finding all entries of a 2 × 2

Hessian and the corresponding 2-variable gradient: 157

) )2 ⎡ 1 0 0 2 ∕)y 0 ∕)y 2 ⎤ ⎢ 2 2 ⎥ ) ) )2 ⎢ 0 1 0 ∕)y ∕)y ∕)y )y ⎥ ⎢ 1 2 1 2 ⎥ 2 ⎢ 0 0 1 0 2)∕ ) ∕ 2 ⎥ )y1 )y1 DMy ≡ ⎢ ⎥ (8.81) ) ⎢ 0 0 0 1 0 ∕)y ⎥ ⎢ 2 ⎥ ) ⎢ 0 0 0 0 1 ∕)y ⎥ ⎢ 1 ⎥ ⎢ 0 0 0 0 0 1 ⎥ ⎣ ⎦

To compare additional approaches, first and second derivatives of the forward codes were also found with centered finite differences and a single first derivative was found with CD; both were run with a great many step- sizes, ranging from 106 all the way down to 10−18 (below machine epsilon). Likewise, NMD, FD and CD were all applied to the adjoint code to find a Hessian-vector product in a random direction v; here, NMD was used both with a single-variable dual stencil and a two-variable first-order stencil to find two products at once:

D_{M_y} \equiv
\begin{bmatrix}
1 & 0 & \partial/\partial y_2 \\
0 & 1 & \partial/\partial y_1 \\
0 & 0 & 1
\end{bmatrix}
\qquad (8.82)
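Reading the derivative-seed positions directly from Equation 8.82, a quick numerical check of this two-variable, first-order stencil (a sketch for illustration, not the prototype's AD2Stencil code) is:

```matlab
% Two-variable, first-order stencil of Equation 8.82: each input carries a unit
% derivative seed in its own slot of the last column (illustrative sketch).
y1 = 1.3;  y2 = -0.4;
Y1 = [y1  0  0;
       0 y1  1;
       0  0 y1];          % y1, with d(y1)/dy1 = 1 in the d/dy1 position
Y2 = [y2  0  1;
       0 y2  0;
       0  0 y2];          % y2, with d(y2)/dy2 = 1 in the d/dy2 position

F = Y1*Y1*Y2 + 2*Y2;      % f(y1,y2) = y1^2*y2 + 2*y2, via ordinary matrix arithmetic
f_val  = F(1,1);          % function value
df_dy2 = F(1,3);          % = y1^2 + 2
df_dy1 = F(2,3);          % = 2*y1*y2
```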

The same stencil was also used in cardinal directions to find the first two columns of the Hessian, thereby providing second derivative values to compare against forward NMD and FD for the stochastic problem. First derivative values of the stochastic problem were compared against the standard adjoint method (even though the real value of NMD produced the same results). All derivatives of the deterministic problem, of course, were compared to the analytically known values. All accuracy measurements were averaged across ten runs and used Ny = 100 eigenfunctions.

In addition to measuring the accuracy of each method, the wall-clock execution time of every major program block was measured across many runs, as was the memory requirement of every major matrix. As the matrices Φ, K̃y, and Ry were all dense in the undifferentiated problem, they were not converted to sparse format (despite NMD stencils themselves being sparse), as this would not constitute a fair test for a method designed to have little implementation effort. All remaining matrices, including S̃, M, Ũ, K̃, G̃, and A, were sparse to begin with and remained so under NMD.

Further, although the base code simply used MATLAB's default backslash operator x = −A\b to solve the system, on the back-end this operation called MATLAB's built-in UMFPACK sparse solver; for asymmetric matrices like A, this system applied LU decomposition. The resulting sparse L and U matrices were invisible to the user and to the profiling tool, but available through MATLAB's sparse diagnostic system. Memory requirements for these matrices are thus provided alongside the others.
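To make the preceding discussion concrete, the following MATLAB sketch (an illustration under simplified assumptions, not the thesis's diffusion code) expands a small parameter-dependent sparse system with the 2 × 2 dual-number stencil and hands it, unmodified, to the backslash operator; the solution blocks then contain both u = A⁻¹b and its derivative.

```matlab
% Dual-number NMD applied to an unmodified sparse solve (illustrative sketch).
n  = 50;
y  = 0.3;                                  % a single scalar input parameter
A  = gallery('tridiag', n) + y*speye(n);   % example parameter-dependent sparse matrix
dA = speye(n);                             % dA/dy for this example
b  = ones(n, 1);                           % right-hand side, independent of y

N2 = sparse([0 1; 0 0]);                   % nilpotent part of the dual stencil
At = kron(A, speye(2)) + kron(dA, N2);     % blockwise stencil expansion of A
Bt = kron(b, speye(2));                    % expansion of b (db/dy = 0)

Ut = At \ Bt;                              % unmodified backslash (sparse LU under the hood)
u  = Ut(1:2:end, 1);                       % real part:       u  = A \ b
du = Ut(1:2:end, 2);                       % derivative part: du/dy

% Cross-check against the hand-derived sensitivity du/dy = -A \ (dA*u):
err = norm(du + A \ (dA*u));               % on the order of machine precision
```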

8.3 Results: Differentiation Accuracy

The derivative methods were first verified in the forward mode by differentiating the deterministic code and

comparing the results to the analytically known values for the heat equation. Given the boundary conditions

u_{BC}(\mathbf{x}) = \left( m_1 x_1 + b_1 \right)\left( m_2 x_2 + b_2 \right)
\qquad (8.83)

where m1 = b1 = 500 and m2 = b2 = 1000, first and second derivatives of u at the middle point of the mesh were found with respect to m1 and m2; for CD, which is limited to a single first derivative, only ∂u/∂m1 was found. Relative root-mean-square errors (RRMSE) of the results are shown in Figure 8.12, plotted against step size for FD and CD; the RRMSE shown for NMD is from a single evaluation. Note that the figure also includes lines for machine epsilon, the solve residual, and the upper bound on solution error given by the condition number⁶ (see the caption).

The important take-away from the figure is the fact that the error in the real quantity of interest (QoI, i.e., u at the

midpoint) is identical to the NMD and CD error in the first derivatives and to the NMD error in the second derivatives

– those methods are, for all intents and purposes, exact. The most striking feature of the data, though, is the behavior

of the finite difference method: extremely large step-sizes ℎ – on the order of 10⁴ or higher – actually produced results

precise to near machine epsilon and better than the solve residual itself. This is likely due to the nature of the chosen

BCs and the corresponding analytic solution – each derivative is linearly dependent on at most a single m-value, so that any slope equation is insensitive to step-size ℎ = Δm. When the step-size is so large as to dominate the other values,

the calculation in fact becomes more accurate. The same behavior leads to similarly accurate values for CD, although

the properties of that method prevent error from ever setting in for smaller ℎ – first derivatives have the same precision

⁶ Importantly, the condition number of A did not change under NMD.

Figure 8.12: Comparison of derivative error vs known values of the deterministic heat equation, plotted against the step-size used for FD and CD. Note that CD and NMD have almost exactly the same error as the real output itself – those lines all overlap. In the case of FD, very large step-sizes produce surprisingly accurate values, due to the unusual circumstances described in the text; however, if the user does not try these abnormal ℎ-values, error can be very high.

For reference, both the relative residual ‖r‖ of the linear solution and the upper bound on solution error given by the condition number κ(A) (both found with the ℓ1 norm) are also provided, along with machine epsilon. These are admittedly imperfect comparisons, as the other values represent RRMSE (relative root-mean-square error) in the derivatives of the quantity of interest, not the linear solution; however, as the system solution is the major bottleneck in the program's accuracy, these values give at least some intuition into the best-case behavior.

for any step-size.

This is most certainly not something one can count on in production codes. More than this, note that if analytic derivatives were not available, most users would not think to try such unintuitively large values of ℎ. In fact, the first guess for a step size is often that which sets the error order of the FD approximation equal to the order of the subtractive cancellation error [61]; for the O(h²) centered difference method and (for first derivatives) O(eps/h) cancellation error, where eps ≈ 10⁻¹⁶ is machine epsilon, this leads to:

\mathcal{O}\!\left(h^2\right) \sim \mathcal{O}\!\left(\frac{10^{-16}}{h}\right)
\qquad (8.84)

h^3 \sim 10^{-16}
\qquad (8.85)

h \sim 5 \times 10^{-6}
\qquad (8.86)

However, as can be seen in the figure, this value of ℎ only yields roughly five significant figures; compare this

to the eleven or twelve figures possible with CD or NMD. Meanwhile, the first-guess ℎ for the Hessian would likely

be:

\mathcal{O}\!\left(h^2\right) \sim \mathcal{O}\!\left(\frac{10^{-16}}{h^2}\right)
\qquad (8.87)

h^4 \sim 10^{-16}
\qquad (8.88)

h \sim 10^{-4}
\qquad (8.89)

which leads to an RRMSE on the order of 10⁰ – that is, second-order FD yields no significant figures at all; for the

guessed step-size, it is useless for differentiation purposes. For its part, NMD still finds eleven or twelve figures, while

CD cannot find second derivatives at all.
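The qualitative behavior described above is easy to reproduce on a scalar toy problem. The sketch below (an illustration, not the thesis code; the sample function is arbitrary) sweeps the step size for centered FD and for the complex step on a function with a known derivative; FD reaches roughly ten to twelve digits at its best, for h somewhere near 10⁻⁵ to 10⁻⁴, and degrades on either side, while the complex-step result stays near machine precision for every h.

```matlab
% Step-size sweep for centered FD vs. complex-step (CD) on a toy function.
f  = @(x) exp(x).*sin(x);                  % arbitrary smooth test function
df = @(x) exp(x).*(sin(x) + cos(x));       % its exact derivative, for reference
x0 = 0.8;
h  = 10.^(-(1:18));                        % step sizes 1e-1 down to 1e-18

fd = (f(x0 + h) - f(x0 - h)) ./ (2*h);     % centered finite difference
cs = imag(f(x0 + 1i*h)) ./ h;              % complex-step derivative

err_fd = abs(fd - df(x0)) ./ abs(df(x0));  % relative errors
err_cs = abs(cs - df(x0)) ./ abs(df(x0));

loglog(h, err_fd, 'o-', h, err_cs, 's-');
legend('centered FD', 'complex step');
xlabel('step size h');  ylabel('relative error');
% FD error is smallest for h roughly between 1e-5 and 1e-4 and grows for smaller
% h due to subtractive cancellation; the complex-step error stays near machine
% epsilon for all step sizes.
```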

Next, for the forward stochastic problem, derivatives of f were found with respect to y1 and y2; results are

shown in Figure 8.13. As true values are unknown, this plot represents relative difference from the real-valued adjoint

method (for first derivatives) and from the NMD-adjoint method (for second derivatives); the close agreement of all

values indicates this is a fair comparison. As can be seen, for the stochastic problem forward NMD and CD produce

values that differ from the adjoints by roughly the same magnitude. This difference is very slightly more than would be

implied by the upper bound imposed by the condition number of A – note, however, that the condition number reported

by MATLAB for sparse matrices uses the ℓ1 norm⁷, whereas reported errors are the RRMSE of the derivatives of f,

and thus cannot be used as a true “apples-to-apples” comparison. Regardless, NMD and CD both provide nine or more

significant figures of accuracy.

⁷ The residual also uses the ℓ1 norm.

Figure 8.13: Error in stochastic forward-mode methods as compared to adjoint methods. All forward methods are used to find derivatives of the first two y-values. Error in the gradient is as compared to the ordinary adjoint method; error in the Hessian is as compared to a first-order NMD-adjoint hybrid. Other reference values are as described in Figure 8.12.

In this problem, the error behavior for FD and CD is closer to what one might expect, but the ideal ℎ for FD is again surprisingly large – 10⁻¹ for the gradient and 10¹ for the Hessian. (Note that, as each y is a standard normal variable, ℎ ∼ 10 represents ten standard deviations away from the mean.) In this case, the reason is likely the choice of f(u): because diffusion is by nature a smoothing process, the mean squared deviation is very insensitive to changes in y. Once again, the typical first guesses for ℎ would lead to very inaccurate values – only three significant figures for the gradient and none for the Hessian. Again, without careful parameter searching, FD would be useless for calculating second derivatives.

Finally, FD, CD, and NMD were used to find a Hessian-vector product by taking a single first derivative of the adjoint code; results are shown in Figure 8.14. In this case, values were compared to that of NMD, as it is the only method theoretically guaranteed to find the exact derivative; again, close agreement of the results indicates this is a fair comparison. Once again, CD – and, presumably, NMD – found the result to ten or more significant figures, while

FD again had the same anomalous error behavior. The typical first guess of ℎ would yield less than three significant

figures.
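In code, "taking a single first derivative of the adjoint code" with CD amounts to one complex-perturbed call of the gradient routine. The sketch below is illustrative only: adjoint_grad is a hypothetical, complex-analytic user function returning the discrete-adjoint gradient, and y and Ny are assumed to be in scope.

```matlab
% CD-adjoint hybrid for a Hessian-vector product (illustrative sketch; the
% function adjoint_grad is hypothetical and stands in for the adjoint code).
h  = 1e-100;                              % complex step: no cancellation, so it can be tiny
v  = randn(Ny, 1);                        % direction of the Hessian-vector product
Hv = imag(adjoint_grad(y + 1i*h*v)) / h;  % H*v = d/dt [grad f(y + t*v)] at t = 0
```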

Figure 8.14: Relative difference of stochastic FD and CD Hessian-vector values as compared to NMD. The difference for CD is of the same order as numerical error and as errors from Figure 8.12 and Figure 8.13, implying NMD and CD both have relative errors of the same order.

8.4 Results: Differentiation Costs

Given the experiments described in Section 8.2 and the desired derivatives found in Section 8.3, four variants of

both the forward and adjoint problem (i.e., a grand total of eight) were run: a real-valued (i.e., unmodified) variant for

finite differencing; a complex-valued variant for CD; and the two stencil-expanded variants for NMD – as described in

Section 8.2, a 2×2 and a 6×6 stencil for the forward mode and a 2×2 and a 3×3 stencil for the adjoint. As the diffusion

problem has a wide array of input parameters, a combinatorial number of benchmarks is possible. However, the two major factors in the time and memory cost of the underlying program are the number of mesh nodes Nu (i.e., the model fidelity) and the number of eigenfunctions Ny (i.e., the number of inputs). Therefore, differentiation methods were tested against a wide array of both values – first, by holding Ny = 100 fixed and varying the number of nodes from Nu = 16² = 256 up to a maximum of Nu = 256² = 65536, then by holding Nu fixed at that maximum and varying Ny from 2² = 4 up to 20² = 400. At each parameter setting, all major program steps were timed and the memory requirements of all large matrices were recorded. Plots of these time and memory costs, including detailed captions, can be found in Appendix A.

In terms of the cost versus the number of eigenfunctions Ny (plotted and discussed case by case in Section A.1), two important trends stand out: first, the costs of both the forward code and the adjoint were linearly dependent on this quantity, i.e., O(Ny); second, the costs of both CD and NMD were a constant multiple of the code to which they were applied, i.e., the cost of applying either method to a given code was O(1). The first observation is unfortunate, especially for the adjoint code, which one would hope to be independent of the number of inputs; it is due to the fact that the cost of sampling the diffusivity field or building the Jacobians thereof will always be linearly dependent on the number of eigenfunctions or on the number of independent differentiation variables. Note, though, that this was a very large Ny – for comparison, the other published work [47] that built an adjoint of this problem used only Ny = 30. For large meshes and Ny in that range, the cost of sampling or building the Jacobians would be lost in the noise, leaving an effectively O(1) problem. The same can be said of any adjoint method where the cost of the solve significantly outweighs the cost of problem setup.

The second observation is much more pertinent to this work and much more promising: in both the forward and adjoint modes, the additional cost of applying either CD or NMD was entirely independent of Ny. This was expected for the forward mode, but the adjoint result is crucial: assuming a truly O(1) gradient method, effectively exact Hessian-vector products can likewise be found via either approach in O(1) time, independent of the number of inputs Ny; if desired, the full Hessian can thus be found in O(Ny) time.
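Assembling the full Hessian from such products is then a loop over cardinal directions, one adjoint evaluation per column – again a sketch, reusing the hypothetical adjoint_grad and step h from the earlier sketch (a dual-number NMD seed would replace the complex step without changing the structure).

```matlab
% Full Ny-by-Ny Hessian, one column per (differentiated) adjoint evaluation.
H = zeros(Ny, Ny);
for k = 1:Ny
    ek     = zeros(Ny, 1);   ek(k) = 1;              % cardinal direction
    H(:,k) = imag(adjoint_grad(y + 1i*h*ek)) / h;    % k-th column of the Hessian
end
H = (H + H.') / 2;                                   % symmetrize against roundoff
```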

Comparisons across multiple mesh sizes Nu (see Section A.2) – that is, across varying problem difficulties – were more concerning: while the cost of applying almost all functions was independent of Nu, the major exception

was the all-important solve. Specifically, the time costs of sparse system solution using NMD grew faster than the

underlying problem. After some investigation, this was traced to the LU factorization performed automatically by

MATLAB, where memory costs were also shown to grow disproportionately. As alluded to in Section 8.2, when the

user invokes the backslash operator x = −A\b on a matrix A, MATLAB executes a decision tree to determine how best to solve the system. In the case of asymmetric matrices (like those for both the real- and nilpotent-valued diffusion problem), it builds the lower and upper triangular matrices L and U via its built-in implementation of the UMFPACK

(Unsymmetric MultiFrontal PACKage) [122] library.

As stated in Chapter 6, any operation that respects a stencil's hypercomplex isomorphism will propagate derivatives. The process of fully inverting a matrix, being the inverse of the very multiplication operation that stencils are built to represent, is one such operation – and thus, so is any system solver that only finds a vector without building the full matrix inverse. However, as stated in Section 6.2, algorithms that operate on a single row or column at a time are almost certain to destroy the stencil structure, and thus the isomorphism. LU factorization is one such algorithm.

The process of LU factorization, then, presents a paradox: even though it is used as an intermediate step before system solution, it operates column by column and cannot be expected to respect NMD’s block structure. In practice, though, the algorithm can reach the correct result – it simply takes a much longer route to find it. In the process of building the sparse factors, it generates more non-zero scalar values than should be expected from the sparsity pattern of a given stencil; in fact, it can require more than twice the memory one would get even from dense fill-in of the stencil

(see Figure A.11). The time required for factorization (which in fact dominates the cost of the backslash operator) of course grows proportionately to the size of L and U, as does the time required for the solve itself.
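This excess fill-in can be observed directly by requesting the factors that the backslash operator would otherwise hide. The sketch below is illustrative only; A and At are assumed to be the original and dual-stencil-expanded sparse matrices from the earlier solve sketch.

```matlab
% Measuring LU fill-in of a stencil-expanded matrix (illustrative sketch; A and
% At are as constructed in the earlier dual-number solve example).
[L0, U0, P0, Q0] = lu(A);                  % sparse LU of the original matrix
[L1, U1, P1, Q1] = lu(At);                 % sparse LU of the expanded matrix

ratio = (nnz(L1) + nnz(U1)) / (nnz(L0) + nnz(U0));
% A 2x2 dual stencil holds at most 3 nonzeros per block (4 if filled densely),
% so one might expect ratio <= 3 or 4; in the experiments of Section A.2 the
% observed ratio instead grows with mesh size (see Figure A.11).
```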

Although unfortunate, this behavior is not catastrophic for practical usage of NMD. Note that throughout Section A.2, the costs of the solve and factorization form a straight line on a semilog plot; this implies logarithmic – i.e., sublinear – growth versus Nu, which might be considered quite respectable in any other context.

Before closing this section and with it all discussion of the diffusion proof of concept, note that the appendix does not plot the NMD custom variants of the transpose, diagonalize, and dot-product functions, nor does it plot the cost to initialize NMD matrices. This is because the cost of these operations is highly implementation-specific, while the test code is only a very rough prototype. It is sufficient to know that the relative cost of each custom variant was constant⁸ versus Ny, while the absolute cost of initial stencil construction was linear in Ny (i.e., just like the cost of sampling the field – if the cost of using the eigenfunctions was constant, NMD initialization would be, as well); likewise, each function was linear with respect to the number of non-zero values (and thus with respect to Nu) on which it operated.

⁸ The cost of stencil initialization cannot be expressed as a relative ratio, as there is nothing to compare it against.

Chapter 9

Conclusions

This work has shown that hypercomplex algebras with nilpotent elements can be used as an alternate analytic framework to discuss high-order differentiation of computer code. For purely scalar functions – that is, those that operate on individual real- or complex-valued numbers – nilpotent (and most other non-real) elements have only limited real-world use as a practical tool, as existing codes must be modified to compute the non-real values; for matrix functions, however, new options become available. Given the new analytic framework, isomorphisms can be constructed to represent values from any hypercomplex algebra as purely real-valued matrices that store values in very specific patterns, termed "stencils." If a stencil is used to blockwise replace each element of another, pre-existing matrix, the resulting structure can differentiate functions in identical fashion to numbers from the scalar hypercomplex algebra – i.e., the modified matrix will carry derivatives of the original matrix, despite using only real numbers and only calling unaltered code meant for the original matrix. Once imbued with this information, functions of the new matrix or matrices will compute the chain rule with regard to most operations, including basic arithmetic, power series, and system solvers. Many other operations such as transposition, diagonalization, or the dot product can easily be replaced with variants that likewise propagate derivatives. All such computed values are exact to within the precision of the underlying matrix routines themselves.

The concept of derivative stencils, combined with the analytic framework used to derive and construct them, forms a new method called Nilpotent Matrix Differentiation (NMD). Use of this method allows developers of existing linear-algebra-intensive codes to find derivatives of long sequences of matrix operations after only modifying the initial matrices of each sequence; alternatively, even differentiation of short sequences can be made easier when the number or order of derivatives is expected to change between program runs – beyond the trivial redefinition of the problem's stencil, no additional implementation is required to effect such a change. This ability to find derivatives with comparatively little human programming effort makes NMD an automatic differentiation (AD) method, with its own niche among the broad array of existing AD tools.

One particularly useful application of NMD is a hybrid approach with the existing first-order discrete adjoint method, which constructs a series of intermediate Jacobian matrices in order to solve for an entire gradient with a cost that may be independent of the number of program inputs N. Applying a first-order stencil to the Jacobian matrices can compute one or more implicit Hessian-vector products without evaluating the full Hessian, again with cost independent of N. If the vectors in question are basis vectors in cardinal directions, the entire Hessian can be computed one column at a time with a cost linearly dependent on N, which makes NMD competitive with existing second-order adjoint approaches that require far more implementation effort. Further, the application of second- or higher-order stencils can

find higher derivatives with similarly reduced cost.

As a proof of concept, both standalone and adjoint NMD were applied to a stochastic diffusion problem that took as input many coefficients for a Karhunen-Loève expansion of a random diffusivity field. First and second derivatives of this problem were found with exactly the same precision as that of the original outputs, as were Hessian-vector products. The cost of the calculation was disproportionate to the amount of new information found, but the same could be said of many AD tools.

As a corollary contribution, it was shown that the existing complex-step differentiation (CD) technique can likewise be applied to the adjoint method to find a Hessian-vector product. If existing code is complex-analytic and purely real-valued at all steps, the CD-adjoint hybrid is a superior choice because it has a significantly lower cost.

Bibliography

[1] Christian H Bischof, Alan Carle, George F Corliss, Andreas Griewank, and Paul D Hovland. ADIFOR: Gener- ating derivative codes from fortran programs. Scientific Programming, 1992. [2] Christian H Bischof, Alan Carle, Peyvand Khademi, and Andrew Mauer. ADIFOR 2.0: Automatic differentia- tion of fortran 77 programs. IEEE Computational Science & Engineering, 1996. [3] Christian H Bischof, Lucas Roh, and Andrew Mauer. ADIC — an extensible automatic differentiation tool for ANSI-C. Software–Practice and Experience, 1997. [4] J D Pryce and J K Reid. A fortran 90 code for automatic differentiation. tech. rep. ral-tr-1998-057. Technical report, Chilton, Didcot, Oxfordshire., 1998. [5] Eric T Phipps, Roscoe A Bartlett, Gay M David, and Robert J Hoekstra. Large-scale transient sensitivity anal- ysis of a radiation-damaged bipolar junction transistor via automatic differentiation. Advances in Automatic Differentiation, 2008. [6] A Walther and A Griewank. Getting started with ADOL-C. In Combinatorial Scientific Computing, pages 181–202. Chapman-Hall CRC Computational Science, 2012. [7] L Hascoët and V Pascual. The Tapenade automatic differentiation tool: Principles, model, and specification. ACM Transactions on Mathematical Software, 2013. [8] Mark Schmidt. minfunc. http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html, 2005. [9] Openmdao. http://openmdao.org/, 2005. [10] HEEDS MDO. http://www.redcedartech.com/products/heeds_mdo, 2016. [11] Michael S Eldred, Anthony A Giunta, Samuel S Collis, et al. Second-order corrections for surrogate-based optimization with model hierarchies. American Institute of Aeronautics and Astronautics, 2004. [12] William W Hager and Hongchao Zhang. A survey of nonlinear conjugate gradient methods. Pacific journal of Optimization, 2(1):35–58, 2006. [13] James W Daniel. The conjugate gradient method for linear and nonlinear operator equations. SIAM Journal on Numerical Analysis, 4(1):10–26, 1967. [14] Neculai Andrei. Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained op- timization. Optimization Methods and Software, 22(4):561–571, 2007. [15] Neculai Andrei. Accelerated conjugate gradient algorithm with finite difference hessian/vector product approx- imation for unconstrained optimization. Journal of Computational and Applied Mathematics, 230(2):570–582, 2009. [16] James Martens. Deep learning via hessian-free optimization. Proceedings of the 27th International Conference on Machine Learning, 2010. 168

[17] James Martens and Ilya Sutskever. Learning recurrent neural networks with hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1033–1040, 2011.

[18] Xi He, Dheevatsa Mudigere, Mikhail Smelyanskiy, and Martin Takác. Large scale distributed hessian-free optimization for deep neural network. CoRR, abs/1606.00511, 2016. [19] Barak A Pearlmutter. Fast exact multiplication by the hessian. Neural Computation, 1993. [20] Mike B Giles. Collected matrix derivative results for forward and reverse mode algorithmic differentiation. In Advances in Automatic Differentiation, pages 35–44. Springer, 2008.

[21] I Ozak, F Kimura, and M Berz. Higher-order sensitivity analysis of finite element method by automatic differ- entiation. Computational Mechanics, 16:223–234, 1995 1995. [22] Markus P Rumpfkeil, Wataru Yamazaki, and Dimitri J Mavriplis. Design optimization utilizing gradient/hessian enhanced surrogate model. 2010.

[23] Markus P Rumpfkeil and Dimitri J Mavriplis. Efficient hessian calculations using automatic differentiation and the adjoint method with applications. AIAA journal, 48(10):2406–2417, 2010. [24] Markus P Rumpfkeil, Wataru Yamazaki, and Dimitri J Mavriplis. Uncertainty analysis utilizing gradient and hessian information. In Computational Fluid Dynamics 2010, pages 261–268. 2011.

[25] D. I. Papadimitriou and K. C. Giannakoglou. Direct, adjoint and mixed approaches for the computation of hessian in airfoil design problems. International Journal for Numerical Methods in Fluids, 56:1929–1943, 2008. [26] Jeffrey Alan Fike. Multi-objective optimization using hyper-dual numbers. PhD thesis, Stanford University, 2013. [27] Neculai Andrei. A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Applied Mathematics Letters, 20(6):645–650, 2007. [28] Stephen G Nash. A survey of truncated-newton methods. Journal of Computational and Applied Mathematics, 124(1):45–59, 2000. [29] Nicol N Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural computation, 14(7):1723–1738, 2002.

[30] H Pearl Flath, Lucas C Wilcox, Volkan Akçelik, Judith Hill, Bart van Bloemen Waanders, and Omar Ghattas. Fast algorithms for bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial hessian approximations. SIAM Journal on Scientific Computing, 33(1):407–432, 2011. [31] Alexander Kalmikov and Patrick Heimback. A hessian-based method for uncertainty quantification in global ocean state estimations. SIAM Journal of Scientific Computing, 36:S267–S295, 2014. [32] Raphael T Haftka. Second-order sensitivity derivatives in structural analysis. AIAA journal, 20(12):1765–1766, 1982. [33] Raphael T Haftka and Z Mroz. First- and second-order sensitivity analysis of linear and nonlinear structures. AIAA journal, 24(7):1187–1192, 1986.

[34] Hil Meijer, Fabio Dercole, and Bart Oldeman. Numerical bifurcation analysis. In Encyclopedia of Complexity and Systems Science, pages 6329–6352. Springer, 2009. [35] PG Drazin and Y Tourigny. Numerical study of bifurcations by analytic continuation of a function defined by a power series. SIAM Journal on Applied Mathematics, 56(1):1–18, 1996.

[36] Martin Berz. Differential algebraic description of beam dynamics to very high orders. Part. Accel., 24(SSC-152):109–124, 1988.

[37] Pascal Sebah and Xavier Gourdon. Newton’s method and high order iterations. 2001. [38] Annie AM Cuyt and Louis B. Rall. Computational implementation of the multivariate halley method for solving nonlinear systems of equations. ACM Transactions on Mathematical Software (TOMS), 11(1):20–36, 1985. [39] G Lantoine, R P Russell, and T Dargent. Using multicomplex variables for automatic computation of high-order derivatives. ACM Transactions on Mathematical Software, 38, 2012 2012. [40] Alberto Abad. Computing derivatives of a gravity potential by using automatic differentiation. Celestial Mechanics and Dynamical Astronomy, 117:187–200, 2013 2013. [41] Nicol N Schraudolph and Thore Graepel. Combining conjugate direction methods with stochastic approximation of gradients. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003. [42] Andreas Griewank and Andrea Walther. Evaluating derivatives: principles and techniques of algorithmic differentiation. Society for Industrial and Applied Mathematics, 2008. [43] Magnus Rudolph Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems, volume 49. 1952. [44] Ronald L Iman and Jon C Helton. An investigation of uncertainty and sensitivity analysis techniques for computer models. Risk analysis, 8(1):71–90, 1988. [45] Andrea Saltelli, Karen Chan, E Marian Scott, et al. Sensitivity analysis, volume 1. Wiley New York, 2000. [46] Andrea Saltelli, Stefano Tarantola, and Francesca Campolongo. Sensitivity analysis as an ingredient of model- ing. Statistical Science, pages 377–395, 2000. [47] Ji Peng, Jerrad Hampton, and Alireza Doostan. On polynomial chaos expansion via gradient-enhanced l1- minimization. Journal of Computational Physics, 310:440–458, 2016. [48] André I Khuri and Siuli Mukhopadhyay. Response surface methodology. Wiley Interdisciplinary Reviews: Computational Statistics, 2(2):128–149, 2010. [49] Wataru Yamazaki and Dimitri J Mavriplis. Derivative-enhanced variable fidelity surrogate modeling for aero- dynamic functions. AIAA journal, 51(1):126–137, 2012. [50] Yonas B Dibike, Slavco Velickov, Dimitri Solomatine, and Michael B Abbott. Model induction with support vector machines: introduction and applications. Journal of Computing in Civil Engineering, 15(3):208–216, 2001. [51] J Sreekanth and Bithin Datta. Multi-objective management of saltwater intrusion in coastal aquifers using genetic programming and modular neural network based surrogate models. Journal of Hydrology, 393(3):245–256, 2010. [52] Rudolph van der Merwe, Todd K Leen, Zhengdong Lu, Sergey Frolov, and Antonio M Baptista. Fast neural network surrogates for very high dimensional physics-based models in computational oceanography. Neural Networks, 20(4):462–478, 2007. [53] MS Eldred and DM Dunlavy. Formulations for surrogate-based optimization with data fit, multifidelity, and reduced-order models. American Institute of Aeronautics and Astronautics, 2006. [54] Anindya Chatterjee. An introduction to the proper orthogonal decomposition. Current science, 78(7):808–817, 2000. [55] Natalia M Alexandrov, John E Dennis Jr, Robert Michael Lewis, and Virginia Torczon. A trust-region framework for managing the use of approximation models in optimization. Structural optimization, 15(1):16–23, 1998. [56] Danny C Sorensen. Newton’s method with a model trust region modification. SIAM Journal on Numerical Analysis, 19(2):409–426, 1982. 170

[57] Steven T Smith. Optimization techniques on riemannian manifolds. 1994. [58] Craig Bakker, Geoff T Parks, and Jerome P Jarrett. On the application of differential geometry to MDO. In 12th Aviation Technology, Integration and Operations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference. AIAA. [59] C Bakker, GT Parks, and JP Jarrett. Optimization algorithms and ode’s in mdo. In ASME 2013 Design Engineering Technical Conferences and Computers and Information in Engineering Conference, ASME, Portland, Oregon, 2013.

[60] Bruno Cochelin and Marc Medale. Power series analysis as a major breakthrough to improve the efficiency of asymptotic numerical method in the vicinity of bifurcations. Journal of Computational Physics, 236:594–607, 2013. [61] B. Fornberg. Finite difference method. Scholarpedia, 6(10):9685, 2011. revision no. 91262.

[62] J. N. Lyness and C. B. Moler. Numerical differentiation of analytic functions. SIAM Journal on Numerical Analysis, 4:202–210, 1967. [63] Joaquim R. R. A. Martins, Peter Sturdza, and Juan J Alonso. The complex-step derivative approximation. ACM Transactions on Mathematical Software, 29:245–262, 2003. [64] B. Fornberg. Generation of finite difference formulas on arbitrarily spaced grids. Mathematics of Computation, 51:699–706, 1988. [65] B. Fornberg. Calculation of weights in finite difference formulas. SIAM Review, 40:685–691, 1998. [66] Michael B Giles and Niles A Pierce. An introduction to the adjoint approach to design. Flow, turbulence and combustion, 65(3-4):393–415, 2000.

[67] Andrew M. Bradley. Pde-constrained optimization and the adjoint method. Tutorial, Stanford Computer Science, 2013. [68] Larry L. Green Perry A. Newman Gene W. Hou Laura L. Sherman, Arthur C. Taylor III and Vamshi Mohan Ko- rivi. First- and second-order aerodynamic sensitivity derivatives via automatic differentiation with incremental iterative methods. JOURNAL OF COMPUTATIONAL PHYSICS, 129:307–331, 1996.

[69] Gene J-W Hou and Jeenson Sheen. Numerical methods for second-order shape sensitivity analysis with applica- tions to heat conduction problems. International journal for numerical methods in engineering, 36(3):417–435, 1993. [70] Devendra Ghate and Michael B Giles. Efficient hessian calculation using automatic differentiation. In 25th AIAA Applied Aerodynamics Conference. American Institute of Aeronautics and Astronautics, Inc, 2007.

[71] Tools for automatic differentiation. http://www.autodi.org/?module=Tools. Accessed 2016-10-08. [72] Andreas Griewank. On automatic differentiation. In Masao Iri and K. Tanabe, editors, Mathematical Programming, pages 83–108. Kluwer Academic Publishers, Dordrecht, 1989. [73] L. M. Beda, L. N. Korolev, N. V. Sukkikh, and T. S. Frolova. Programs for automatic differentiation for the machine BESM. Technical Report, Institute for Precise Mechanics and Computation Techniques, Academy of Science, Moscow, USSR, 1959. (In Russian). [74] Masao Iri. Simultaneous computation of functions, partial derivatives and estimates of rounding errors : Com- plexity and practicality. Departmental bulletin paper, Kyoto University, 1984.

[75] COmputational INfrastructure for Operations Research (COIN-OR). Automatic Differentiation by OverLoading in C++ (ADOL-C). ADOL-C 2.6.2 Online Documentation.

[76] Anthony S. Wexler. An algorithm for exact evaluation of multivariate functions and their derivatives to any order. Computational Statistics & Data Analysis, 6(1):1 – 6, 1988.

[77] Richard D. Neidinger. An efficient method for the numerical evaluation of partial derivatives of arbitrary order. ACM Trans. Math. Softw., 18(2):159–173, June 1992. [78] Andreas Griewank, Jean Utke, and Andrea Walther. Evaluating higher derivative tensors by forward propagation of univariate taylor series. Mathematics of Computation of the American Mathematical Society, 69(231):1117– 1130, 2000.

[79] Andrea Walther. Computing sparse hessians with automatic differentiation. ACM Transactions on Mathematical Software (TOMS), 34(1):3, 2008. [80] Assefaw Hadish Gebremedhin, Fredrik Manne, and Alex Pothen. What color is your jacobian? graph coloring for computing derivatives. SIAM review, 47(4):629–705, 2005.

[81] LAPACK–Linear Algebra PACKage. http://www.netlib.org/lapack/, 2015. [82] Intel ® MKL. Intel® Math Kernel Library. https://software.intel.com/en-us/intelmkl, 2016. [83] OpenBLAS: An optimized BLAS library. http://www.openblas.net/, 2016. [84] Apple iOS Library. Accelerate Framework Reference. https://developer.apple.com/library/ios/ documentation/Accelerate/Reference/AccelerateFWRef/, 2016. [85] Paul S Dwyer and MS MacPhail. Symbolic matrix derivatives. The annals of mathematical statistics, pages 517–534, 1948. [86] Christian H. Bischof, H. Martin Bücker, Bruno Lang, Arno Rasch, and Andre Vehreschild. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002), pages 65–72, Los Alamitos, CA, USA, 2002. IEEE Computer Society. [87] Christian Bischof, Bruno Lang, and Andre Vehreschild. Automatic differentiation for matlab programs. PAMM, 2(1):50–53, 2003. [88] Sandia National Laboratories. Trilinos home page. Trilinos Online Documentation (https://trilinos.org/).

[89] Sandia National Laboratories. Sacado: Automatic differentiation tools for C++ applications. Trilinos Online Documentation (https://trilinos.org/docs/dev/packages/sacado/doc/html/index.html). [90] Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010. [91] Cleve B. Moler. Complex step differentiation. http://blogs.mathworks.com/cleve/2013/10/14/ complex-step-dierentiation/, 2013. [92] William Squire and George Trapp. Using complex variables to estimate derivatives of real functions. SIAM Review, 40:110–112, 1998. [93] Joaquim R. R. A. Martins. A COUPLED-ADJOINT METHOD FOR HIGH-FIDELITY AERO-STRUCTURAL OPTIMIZATION. PhD thesis, stanford university, 2002, [94] Awad H. Al-Mohy and Nicholas J. Higham. The complex step approximation to the fréchet derivative of a matrix function. Numerical Algorithms, 53(1):133, 2009. [95] Corrado Segre. The real representation of complex elements and hyperalgebraic entities (italian). Mathematische Annalen, 40:413–67, 1892.

[96] G Baley Price. An Introduction to Multicomplex Spaces and Functions. Marcel Dekker, 1991.

[97] Benjamin Z Dunham and Ryan P Starkey. Qcg: Quasicomplex gradients for efficient and accurate computation of any-order derivative tensors. In 12th Aviation Technology, Integration and Operations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference. AIAA. [98] W K Clifford. Preliminary sketch of bi-quaternions. Proceedings of the London Mathematical Society, 4:381– 395, 1873. [99] Barak A Pearlmutter and Jeffery Mark Siskind. Lazy multivariate higher-order forward-mode ad. 2007. [100] bxsfan [pseud.]. ad4julia/DualNumbers.jl. https://github.com/bsxfan/ad4julia/blob/master/ DualNumbers.jl, 2013. [101] Jeffrey Fike. hyperdual.h. http://adl.stanford.edu/hyperdual/hyperdual.h, 2014. [102] Jeffrey A Fike and Juan J Alonso. The development of hyper-dual numbers for exact second-derivative calcula- tions. In 49th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition. AIAA. [103] Juan J. Alonso Jeffrey A. Fike, Sietse Jongsma and Edwin van der Weide. Optimization with gradient and hessian information calculated using hyper-dual numbers. In 29th AIAA Applied Aerodynamics Conference. AIAA. [104] Jeffrey A Fike and Juan J Alonso. Automatic differentiation through the use of hyper-dual numbers for second derivatives. volume 87 of Lecture Notes in Computational Science and Engineering. Springer-Verlag, 2012. [105] Joaquim RRA Martins and John T Hwang. Review and unification of methods for computing derivatives of multidisciplinary computational models. AIAA journal, 51(11):2582–2599, 2013. [106] anon. adnumber: Software to compute derivatives of arbitrary order to machine precision. [107] ELEKS Software. eleks/ADEL: Template-based automatic differentiation library. [108] Dario Izzo and Francesco Biscani. The algebra of truncated polynomials. http://darioizzo.github.io/audi/ theory.html, 2015. [109] COmputational INfrastructure for Operations Research (COIN-OR). A Package for Differentiation of C++ Algorithms. ADOL-C 2.6.2 Online Documentation. [110] Ramon E Moore. Methods and applications of interval analysis, volume 2. SIAM. [111] Richard D Neidinger. Computing multivariable taylor series to arbitrary order. In ACM SIGAPL APL Quote Quad, volume 25, pages 134–144. ACM, 1995. [112] Richard Neidinger. Directions for computing truncated multivariate taylor series. Mathematics of computation, 74(249):321–340, 2005. [113] Kantor and Solodovnikov. Hypercomplex numbers. Springer-Verlag, Berlin, New York, 1978 (in German), 1989 (in English). [114] Karen Parshall. Wedderburn and the structure of algebras. Archive for History of Exact Sciences, 32:223–349, 1985. [115] O Knill. Lecture notes in linear algebra and differential equations – Exhibit: Other Numbers. http://www. math.harvard.edu/archive/21b_spring_08/exhibits/dualnumbers/, 2008. [116] E. W. Weisstein. Dual number. http://mathworld.wolfram.com/DualNumber.html. [117] E. W. Weisstein. Exact numeric Nth derivatives. http://jliszka.github.io/2013/10/24/ exact-numeric-nth-derivatives.html, 2013. [118] Bogdan V. Constantin. Application of the complex step method to chemistry-transport modeling. Master’s thesis, Massachusetts Institute of Technology, 5 2014. 173

[119] A. Hakami D. K. Henze and J. H. Seinfeld. Development of the adjoint of geos-chem. Atmospheric Chemistry and Physics, 7:2413–2433, 2007.

[120] Ruye Wang. E161: Computer Image Processing and Analysis – Karhunen-Loeve Transform (KLT). http://fourier.eng.hmc.edu/e161/lectures/klt/node3.html, 2016.

[121] Alireza Doostan. ASEN 6619/MCEN 6228 Uncertainty Quantification, Lecture 6 Notes, 2011.

[122] Timothy A. Davis and Iain S. Duff. An unsymmetric-pattern multifrontal method for sparse LU factorization. SIAM Journal on Matrix Analysis and Applications, 18(1):140–158, 1997.

Appendix A

Time and Memory Benchmarks

A.1 Costs vs Number of Eigenfunctions

A.1.1 Original Forward Problem

[Figure panels: wall-clock time of program steps (top left); memory requirements of matrices (top right); cumulative wall-clock time (bottom left); cumulative memory requirements (bottom right).]

Figure A.1: Costs of the original matrix-focused diffusion simulation (before differentiation was applied), excluding cost of the KLE, as a function of the number of eigenfunction coefficients Ny (i.e., the number of program inputs). Left column: time-cost of each significant program step. Right: memory cost of the biggest matrices. Top: Individual costs. Bottom: Cumulative cost.

For the forward problem, only the size of the eigenfunction matrix Φ is dependent on Ny (linearly so), and only the calculation of g is in turn dependent on Φ. All other costs, including the solve, are independent of Ny. Total costs of the program are thus O(Ny), although the overwhelming majority of the computation – even for large Ny – is O(1).

A.1.2 Forward Complex-Step Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.2: Relative costs of the forward code after CD has been applied. Each line represents fractional increase of cost compared to original program. All cost-growth is independent of Ny – if the original was O(1), then CD would be, as well.

A.1.3 Forward Dual-Number Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.3: Relative costs of the forward code under dual-number NMD. All cost-growth is independent of Ny.

A.1.4 Forward Two-Variable, Second-Order Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.4: Relative costs of the forward code after the stencil from Equation 8.81 has been applied. Each line represents fractional increase of cost compared to original program. All cost-growth is independent of Ny.

A.1.5 Original Adjoint Problem

[Figure panels: wall-clock time of program steps (top left); memory requirements of matrices (top right); cumulative wall-clock time (bottom left); cumulative memory requirements (bottom right).]

Figure A.5: Costs of the original adjoint code (before differentiation was applied), excluding cost of the KLE, as a function of the number of eigenfunction coefficients Ny (i.e., the number of program inputs). Left column: time-cost of each significant program step. Right: memory cost of the biggest matrices. Top: Individual costs. Bottom: Cumulative cost.

For the adjoint problem, three matrices are linearly dependent on Ny: Φ, K̃y, and Ry. The costs of multiplying by these matrices similarly increase. All other costs, including the solve, are independent of Ny. Costs of this implementation of the adjoint method are thus unfortunately O(Ny) – although note that (1) the forward method is similarly O(Ny) and (2) the total adjoint cost is less than twice that of the forward problem (Figure A.1).

A.1.6 Adjoint Complex-Step Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.6: Relative costs of the adjoint code after CD has been applied. The cost of constructing Ry arguably grows somewhat, but note the logarithmic x-axis and the tight range of the y-axis – the growth is insignificant.

A.1.7 Adjoint Dual-Number Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.7: Relative costs of the adjoint code after dual-number NMD has been applied. All cost-growth is independent of Ny or insignificant.

A.1.8 Adjoint Two-Variable, First-Order Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.8: Relative costs of the adjoint code after the stencil from Equation 8.82 has been applied. Each line represents fractional increase of cost compared to unmodified adjoint. All cost-growth is independent of Ny or insignificant.

A.2 Costs vs Size of System (Number of Mesh Nodes)

A.2.1 Original Forward Problem

[Figure panels: wall-clock time of program steps (top left); memory requirements of matrices (top right); cumulative wall-clock time (bottom left); cumulative memory requirements (bottom right).]

Figure A.9: Costs of the original matrix-focused diffusion simulation (before differentiation was applied), excluding cost of the KLE, as a function of the number of mesh nodes. Left column: time-cost of each significant program step. Right: memory cost of the biggest matrices. Top: Individual costs. Bottom: Cumulative cost.

Solving the system requires the overwhelming bulk of the computational effort, while memory requirements are equally split between the eigenfunctions Φ and the L,U decomposition built automatically (and hidden from the user and the profiler) by MATLAB's sparse solver.

A.2.2 Forward Complex-Step Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.10: Costs of the diffusion program after CD has been applied in the forward mode. Each line represents fractional increase of cost compared to original program. Despite providing twice as much information, costs are less than twice that of the original program. Note that Φ does not carry derivative information, and thus is purely real-valued and requires no more memory than the original.

Note this is a semilog plot – the straight-line growth of the L,U memory requirements for Nu > 2¹³ may imply logarithmic growth, which also appears (albeit much more clearly) for NMD in Figure A.11; more likely for this version, though, is an asymptotic approach to a constant memory ratio – probably exactly two, where every scalar has an imaginary component.

A.2.3 Forward Dual-Number Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.11: Costs of the diffusion program after dual-number NMD has been applied in the forward mode. Each line represents fractional increase of cost compared to original program. Within measurement error, most of the less-important functions and matrices (see relative sizes in Figure A.9) are constant multiples of the original. Exceptions appear to include the matrix exponential expm() function (which was a very small fraction of the program to begin with, and thus not a concern), as well as the much more important sparse solve and LU factorization.

Noting that this is a semilog-x plot, the straight-line behavior of the solve time and the memory requirements of L,U imply logarithmic growth. The memory growth is especially concerning – a dual-number stencil should use at most three times as much memory for a sparse matrix, or four times as much for a dense matrix like Φ. This rule is satisfied by every matrix except the two factors.

A.2.4 Forward Two-Variable, Second-Order Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.12: Costs of the diffusion program after NMD has been applied in the forward mode. Each line represents fractional increase of cost compared to original program. These results are for the stencil defined in Equation 8.81 that calculates two first derivatives and all three second derivatives from a two-input Hessian matrix, for a total of six times as much information; the 6 × 6 stencil should contain at maximum 15 non-zero values for a sparse representation, or 36 scalar values for a dense representation.

Similar observations to Figure A.11 apply: the sparse solver time and LU factorization memory costs appear to have logarithmic growth. Note that while L,U have not exceeded expectations as egregiously as for the dual-number stencil (in that they do not take more memory than even a dense fill-in would imply), they have still gone past the expected maximum for a sparse representation.

A.2.5 Original Adjoint Problem

[Figure panels: wall-clock time of program steps (top left); memory requirements of matrices (top right); cumulative wall-clock time (bottom left); cumulative memory requirements (bottom right).]

Figure A.13: Costs of the original adjoint code (before differentiation was applied), excluding cost of the KLE and the forward sweep, as a function of the number of mesh nodes. For this experiment, the number of eigenfunctions (and thus the number of first derivatives) is Ny = 100. Left column: time-cost of each significant program step. Right: memory cost of the biggest matrices. Top: Individual costs. Bottom: Cumulative cost.

Solving the system requires a slight majority of the computational effort, with the bulk of the remaining time spent on building Ry and on the dense multiplication required to find K̃y. Memory requirements are equally split four ways between Φ, K̃y, Ry, and the L,U decomposition.

A.2.6 Adjoint Complex-Step Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.14: Costs of the adjoint code after CD has been applied. Each line represents fractional increase of cost compared to unmodified adjoint.

Similar observations apply as in the forward mode (Figure A.10): despite providing twice as much information, costs are less than twice that of the original program; most are constant multiples of the original. The solve and LU factorization exhibit either logarithmic growth or asymptotic approach to a constant, but the total growth is insignificant.

A.2.7 Adjoint Dual-Number Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.15: Costs of the diffusion program after dual-number NMD has been applied in the adjoint mode. Each line represents fractional increase of cost compared to unmodified adjoint. The same observations as Figure A.11 still apply: the time-cost of the solve and the memory-cost of the LU factorization appear to be logarithmic. Again, the latter exhibits dense fill-in in excess of the size of the stencil.

A.2.8 Adjoint Two-Variable, First-Order Stencil Nilpotent Matrix Differentiation

[Figure panels: time relative to original (left); memory relative to original (right).]

Figure A.16: Costs of the adjoint code after NMD has been applied. Each line represents fractional increase of cost compared to unmodified adjoint. These results are for the 3 × 3 stencil given by Equation 8.82 that calculates two first derivatives of the gradient, for a total of three times as much information as the real adjoint; it thereby finds the same derivatives found by the forward-mode 6 × 6 second-order stencil from Figure A.12, which found the 2 × 2 sub-Hessian with respect to the first two inputs. More than that, by finding the entirety of the first two columns, this adjoint stencil finds a 100 × 2 subset of the Hessian.

The same logarithmic growth in the sparse solve and factorization appears as in the preceding NMD experiments.