Automatic Linearity Detection
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Mathematical Institute Eprints Archive Report no. [13/04] Automatic linearity detection Asgeir Birkisson a Tobin A. Driscoll b aMathematical Institute, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, UK. [email protected]. Supported for this work by UK EPSRC Grant EP/E045847 and a Sloane Robinson Foundation Graduate Award associated with Lincoln College, Oxford. bDepartment of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA. [email protected]. Supported for this work by UK EPSRC Grant EP/E045847. Given a function, or more generally an operator, the question \Is it lin- ear?" seems simple to answer. In many applications of scientific computing it might be worth determining the answer to this question in an automated way; some functionality, such as operator exponentiation, is only defined for linear operators, and in other problems, time saving is available if it is known that the problem being solved is linear. Linearity detection is closely connected to sparsity detection of Hessians, so for large-scale applications, memory savings can be made if linearity information is known. However, implementing such an automated detection is not as straightforward as one might expect. This paper describes how automatic linearity detection can be implemented in combination with automatic differentiation, both for stan- dard scientific computing software, and within the Chebfun software system. The key ingredients for the method are the observation that linear operators have constant derivatives, and the propagation of two logical vectors, ` and c, as computations are carried out. The values of ` and c are determined by whether output variables have constant derivatives and constant values with respect to each input variable. The propagation of their values through an evaluation trace of an operator yields the desired information about the linearity of that operator. Key words and phrases: Linearity detection, Hessian sparsity patterns, automatic differentiation, Chebfun, object-oriented programming, Matlab. Oxford University Mathematical Institute Numerical Analysis Group 24-29 St Giles' Oxford, England OX1 3LB E-mail: [email protected] February, 2013 2 A. BIRKISSON AND T. A. DRISCOLL 1 Introduction The solution of linear problems lies at the heart of scientific computing. Many popular methods for nonlinear problems contain an iteration over successive linearizations of the full problem. When the underlying problem is linear, such iterations should in principle converge exactly in one step. However, the additional overhead and setup required for the general nonlinear problem can be wasteful, making it useful for a black-box solver to detect the difference between nonlinear and linear problems before proceeding. For example, in [6] the authors show that determining whether constraints in optimization problems are linear can reduce problem sizes and speed up the solution times. One method for obtaining the derivatives required for such iteration schemes is via automatic (or algorithmic) differentiation (AD). AD is a powerful method for obtaining derivatives accurately and robustly in scientific computing. Using the fact that most mathematical expressions can be thought of as series of subexpressions consisting of simple functions such as sin, exp and multiplication, it is possible to create and gather information when computer programs are executed to yield information about the deriva- tives of functions. AD has proven to be valuable in fields as diverse as optimization [14], aerodynamics [9], and chemistry [3]. We give a short introduction to AD in Section 3; for a comprehensive reference on the subject, see [7]. AD can be used to detect linearity automatically, as has been discussed before in the literature. In [10], the authors implement linearity detection via data-flow anal- ysis. The work [13] describes similar ideas to those presented here, in the context of obtaining sparsity patterns of Hessians. However, there are some differences in both the implementation and the usage between that work and the one presented here, as we will highlight. An example where the method of linearization is of crucial importance is the Chebfun project. Chebfun is concerned with extending familiar methods of scientific computing for vectors and matrices to the continuous setting, where vectors become functions and matrices become linear operators. This is achieved via techniques such as the Fast Fourier Transform (FFT), barycentric interpolation, and expansion in Chebyshev series. Chebfun is an open-source software project, implemented in object-oriented Matlab, and freely available to download at the website [12]. The work described in [5] allows Chebfun to solve linear boundary-value problems (BVPs) for ordinary differential equations (ODEs). A linear differential operator, or a linop, is represented by spectral differentiation matrices of variable size, or their functional equivalents. For information on spectral methods, see for example [4, 11]. In addition to providing a way to solve linear BVPs, linops can also be used for solv- ing eigenvalue problems and time-dependent partial differential equations (PDEs) via operator exponentiation. However, linear BVPs only make up a subset of problems of interest. The Chebfun approach to solving nonlinear BVPs is by implementing Newton's method in function space | the continuous analogue of Newton's method of rootfinding for scalar/vector equations. This requires derivatives of one function with respect to another, so-called Fr´echet derivatives. Fr´echet derivatives are linear operators, the continuous analogues AUTOMATIC LINEARITY DETECTION 3 of Jacobian matrices. Another object class in Chebfun, chebop, represents general differential operators, both linear and nonlinear. This offers the possibility of solving nonlinear BVPs in a few lines of code via (damped) Newton iteration | the details of the iteration are taken care of automatically. All linearizations required are computed via AD, saving the user the effort of supplying them analytically. In [2], the theory and implementation behind chebops are described. Since chebops can represent linear or nonlinear problems, they make a clear case for the usefulness of detecting linearity before solving a problem. Moreover, since Chebfun is an entirely numerical system, rather than symbolic, we seek a numerical approach of detecting linearity.1 This paper describes how such linearity detection is achieved. In Section 2, we start by discussing properties of linear and affine operators, and give a proof that their Fr´echet derivatives are invariant over the space they are defined on. That property is our basis for automatic linearity detection of computed quantities. We believe that the capability of detecting linearity automatically could prove valu- able in areas where AD is frequently used. In Section 3, we describe how linearity detection can be incorporated into a standard implementation of AD. It relies on defin- ing two logical vectors, ` and c, which contain information about whether the values and derivatives of output variables depend on the values of input variables. Loosely speaking, component j of ` (for \linear") takes the value 1 if all the partial derivatives of the output variable are constant regardless of the value of input variable j, and 0 otherwise; component j of c (for \constant") takes the value 1 if the output variable has a value which is constant with respect to input variable j, and 0 otherwise. The values of ` and c are then propagated as derivatives are evaluated, eventually yielding information about linearity. Section 4 discusses how linearity detection via AD can be extended to work within Chebfun. In both Sections 3 and4, we show examples of linearity detection in action. Finally, we discuss our conclusions and some potential future work in Section 5. 2 Linearity of computed quantities Suppose we wish to represent and operate on n-tuples of functions in a Banach space Y . (We are not concerned with the precise nature of the function space Y ; for sake of argument, its members may be regarded as functions that are at least continuous on an interval [α; β], or on all of R.) In a typical application, f(u1; : : : ; un) represents a differential operator of order m, but the framework we will describe below applies equally to integral and integro-differential operators, and to functionals that involve 1We mention that we have also implemented a symbolic approach to linearity detection, though it is not discussed in this article. It was based on similar ideas to those used in symbolic differentiation, i.e., performing lexical and grammatical analysis on strings used to represent functions, looking out for terms which could induce nonlinearity. We believe that the symbolic approach is not competitive because it will have difficulties with user-defined methods, which might not be available to the linearity detector for carrying out the required analysis. 4 A. BIRKISSON AND T. A. DRISCOLL point evaluation, definite integration, and other familiar operations. The integer n is the number of variables that the operator f(u1; : : : ; un) operates on, and is known as the arity of the operator. We have suppressed the fact that f may also explicitly depend on the common independent variable x (that is, the identity function in Y ), which often represents space or time. We will not attempt to track or analyze the nature of this dependence. This notion is familar from the analysis of differential equations, in which nonlinearly varying coefficient functions do not affect the fundamental notion of linearity of the problem. For example, if Y is the space of 3 00 analytic functions on [−1; 1] and n = 2, then the operator f = x u1(x) + u2(x) would be considered linear. We formally define linear operators as follows. Definition 1 Let f be an operator on Y n. Then f is said to be linear if f(φu1 + θv1; : : : ; φun + θvn) = φf(u1; : : : ; un) + θf(v1; : : : ; vn) (2.1) for all scalars φ and θ and n-tuples of functions in Y .