
Report no. [13/04]

Automatic linearity detection

Ásgeir Birkisson^a and Tobin A. Driscoll^b

^a Mathematical Institute, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, UK. [email protected]. Supported for this work by UK EPSRC Grant EP/E045847 and a Sloane Robinson Foundation Graduate Award associated with Lincoln College, Oxford.
^b Department of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA. [email protected]. Supported for this work by UK EPSRC Grant EP/E045847.

Given a function, or more generally an operator, the question "Is it linear?" seems simple to answer. In many applications of scientific computing it might be worth determining the answer to this question in an automated way; some functionality, such as operator exponentiation, is only defined for linear operators, and in other problems, computational savings are available if it is known that the problem being solved is linear. Linearity detection is closely connected to sparsity detection of Hessians, so for large-scale applications, memory savings can be made if linearity information is known. However, implementing such an automated detection is not as straightforward as one might expect. This paper describes how automatic linearity detection can be implemented in combination with automatic differentiation, both for standard scientific computing software, and within the Chebfun software system. The key ingredients for the method are the observation that linear operators have constant derivatives, and the propagation of two logical vectors, ℓ and c, as computations are carried out. The values of ℓ and c are determined by whether output variables have constant derivatives and constant values with respect to each input variable. The propagation of their values through an evaluation trace of an operator yields the desired information about the linearity of that operator.

Key words and phrases: Linearity detection, Hessian sparsity patterns, automatic differentiation, Chebfun, object-oriented programming, Matlab.

Oxford University Mathematical Institute
Numerical Analysis Group
24-29 St Giles'
Oxford, England OX1 3LB
E-mail: [email protected]

February, 2013

1 Introduction

The solution of linear problems lies at the heart of scientific computing. Many popular methods for nonlinear problems contain an iteration over successive linearizations of the full problem. When the underlying problem is linear, such iterations should in principle converge exactly in one step. However, the additional overhead and setup required for the general nonlinear problem can be wasteful, making it useful for a black-box solver to detect the difference between nonlinear and linear problems before proceeding. For example, in [6] the authors show that determining whether constraints in optimization problems are linear can reduce problem sizes and speed up the solution process.

One method for obtaining the derivatives required for such iteration schemes is via automatic (or algorithmic) differentiation (AD). AD is a powerful method for obtaining derivatives accurately and robustly in scientific computing. Using the fact that most mathematical expressions can be thought of as series of subexpressions consisting of simple functions such as sin, exp and multiplication, it is possible to create and gather information when computer programs are executed to yield information about the derivatives of functions. AD has proven to be valuable in fields as diverse as optimization [14], aerodynamics [9], and chemistry [3]. We give a short introduction to AD in Section 3; for a comprehensive reference on the subject, see [7].

AD can be used to detect linearity automatically, as has been discussed before in the literature. In [10], the authors implement linearity detection via data-flow analysis. The work [13] describes similar ideas to those presented here, in the context of obtaining sparsity patterns of Hessians. However, there are some differences in both the implementation and the usage between that work and the one presented here, as we will highlight.

An example where the method of linearization is of crucial importance is the Chebfun project. Chebfun is concerned with extending familiar methods of scientific computing for vectors and matrices to the continuous setting, where vectors become functions and matrices become linear operators. This is achieved via techniques such as the Fast Fourier Transform (FFT), barycentric interpolation, and expansion in Chebyshev series. Chebfun is an open-source software project, implemented in object-oriented Matlab, and freely available to download at the website [12].

The work described in [5] allows Chebfun to solve linear boundary-value problems (BVPs) for ordinary differential equations (ODEs). A linear differential operator, or a linop, is represented by spectral differentiation matrices of variable size, or their functional equivalents. For information on spectral methods, see for example [4, 11]. In addition to providing a way to solve linear BVPs, linops can also be used for solving eigenvalue problems and time-dependent partial differential equations (PDEs) via operator exponentiation.

However, linear BVPs only make up a subset of problems of interest. The Chebfun approach to solving nonlinear BVPs is by implementing Newton's method in function space — the continuous analogue of Newton's method of rootfinding for scalar/vector equations. This requires derivatives of one function with respect to another, so-called Fréchet derivatives. Fréchet derivatives are linear operators, the continuous analogues of Jacobian matrices. Another object class in Chebfun, chebop, represents general differential operators, both linear and nonlinear.
This offers the possibility of solving nonlinear BVPs in a few lines of code via (damped) Newton iteration — the details of the iteration are taken care of automatically. All linearizations required are computed via AD, saving the user the effort of supplying them analytically. In [2], the theory and implementation behind chebops are described. Since chebops can represent linear or nonlinear problems, they make a clear case for the usefulness of detecting linearity before solving a problem. Moreover, since Chebfun is an entirely numerical system, rather than symbolic, we seek a numerical approach of detecting linearity.¹ This paper describes how such linearity detection is achieved.

In Section 2, we start by discussing properties of linear and affine operators, and give a proof that their Fréchet derivatives are invariant over the space they are defined on. That property is our basis for automatic linearity detection of computed quantities. We believe that the capability of detecting linearity automatically could prove valuable in areas where AD is frequently used. In Section 3, we describe how linearity detection can be incorporated into a standard implementation of AD. It relies on defining two logical vectors, ℓ and c, which contain information about whether the values and derivatives of output variables depend on the values of input variables. Loosely speaking, component j of ℓ (for "linear") takes the value 1 if all the partial derivatives of the output variable are constant regardless of the value of input variable j, and 0 otherwise; component j of c (for "constant") takes the value 1 if the output variable has a value which is constant with respect to input variable j, and 0 otherwise. The values of ℓ and c are then propagated as derivatives are evaluated, eventually yielding information about linearity. Section 4 discusses how linearity detection via AD can be extended to work within Chebfun. In both Sections 3 and 4, we show examples of linearity detection in action. Finally, we discuss our conclusions and some potential future work in Section 5.

2 Linearity of computed quantities

Suppose we wish to represent and operate on n-tuples of functions in a Banach space Y. (We are not concerned with the precise nature of the function space Y; for the sake of argument, its members may be regarded as functions that are at least continuous on an interval [α, β], or on all of R.) In a typical application, f(u1, . . . , un) represents a differential operator of order m, but the framework we will describe below applies equally to integral and integro-differential operators, and to functionals that involve point evaluation, definite integration, and other familiar operations.

¹We mention that we have also implemented a symbolic approach to linearity detection, though it is not discussed in this article. It was based on similar ideas to those used in symbolic differentiation, i.e., performing lexical and grammatical analysis on strings used to represent functions, looking out for terms which could induce nonlinearity. We believe that the symbolic approach is not competitive because it will have difficulties with user-defined methods, which might not be available to the linearity detector for carrying out the required analysis.

The integer n is the number of variables that the operator f(u1, . . . , un) operates on, and is known as the arity of the operator. We have suppressed the fact that f may also explicitly depend on the common independent variable x (that is, the identity function in Y), which often represents space or time. We will not attempt to track or analyze the nature of this dependence. This notion is familiar from the analysis of differential equations, in which nonlinearly varying coefficient functions do not affect the fundamental notion of linearity of the problem. For example, if Y is the space of analytic functions on [−1, 1] and n = 2, then the operator f = x^3 u1''(x) + u2(x) would be considered linear. We formally define linear operators as follows.

Definition 1 Let f be an operator on Y^n. Then f is said to be linear if

f(φu1 + θv1, . . . , φun + θvn) = φf(u1, . . . , un) + θf(v1, . . . , vn) (2.1)

for all scalars φ and θ and n-tuples of functions in Y .

Detection of linear operators is not sufficient for our purposes. The reason is that a nonhomogeneous ordinary differential equation might be expressed equally well as either Lu = b or Lu−b = 0; the operator f(u) = Lu is linear, but f(u) = Lu−b is not. Instead, both are instances of the larger class of affine operators.

Definition 2 Let f be an operator on Y^n. Then f is affine if it satisfies (2.1) for φ + θ = 1 and all n-tuples of functions in Y.
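Indeed, the operator f(u) = Lu − b above satisfies this definition: whenever φ + θ = 1,

    f(φu + θv) = L(φu + θv) − b = φLu + θLv − (φ + θ)b = φ(Lu − b) + θ(Lv − b) = φf(u) + θf(v),

so (2.1) holds for all φ + θ = 1, even though it fails for general φ and θ unless b = 0.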

Even so, we refer to the process of determining whether a given operator is affine as “linearity detection,” which seems to follow common usage of the terminology. In practice the property that we actually detect is the one in the following theorem.

Theorem 2.1 An affine operator is Fréchet differentiable, and its Fréchet derivative is constant over Y^n.

Proof Let u, v, w be elements of Y^n. Then

    [f(u + w) − f(u)] − [f(v + w) − f(v)]
        = 2[½f(u + w) + ½f(v)] − 2[½f(v + w) + ½f(u)]
        = 2f((u + w + v)/2) − 2f((v + w + u)/2)
        = 0.

Hence the quantity f(u + w) − f(u) is independent of u. Now define the operator F by

F(w) = f(w) − f(0). Then for any scalars φ, θ and elements w, y ∈ Y^n,

    F(φw + θy) − φF(w) − θF(y)
        = f(φw + θy) − φf(w) − θf(y) + (φ + θ − 1)f(0)
        = f(φw + θy) − [φf(w) + (1 − φ)f(0)] − [θf(y) + (1 − θ)f(0)] + f(0)
        = f(φw + θy) + f(0) − f(φw) − f(θy)
        = 2[½f(φw + θy) + ½f(0)] − 2[½f(φw) + ½f(θy)]
        = 2f((φw + θy)/2 + 0/2) − 2f((φw)/2 + (θy)/2)
        = 0,

so F is a linear operator. Finally, we note that F(w + h) − F(w) = F(h), independently of w.

Thus F is the Fréchet derivative of f at w. □

Observe that linearity detection requires the consideration of the Fréchet derivative over all the variables considered as a group. The operator f(u1, u2) = u1u2 has a Fréchet derivative with respect to u1 that is independent of u1, but the derivative with respect to the pair of variables is not independent of the pair.

At this point we need to clarify realistic expectations for linearity detection. A "false negative" result, in which a linear operator is mistakenly not identified as linear, is all but inevitable. The obviously linear operator

f(u) = [u + sin(u)] − [sin(u)], (2.2)

has linearity that is not well-posed with respect to perturbations in the quantities in brackets. When (2.2) is evaluated in the implied manner using floating-point representation, we cannot expect robust results for linearity detection. The impact of such false negatives, besides their being a rather trivial subset of problems, is negligible, because in the larger context a nonlinear algorithm will be applied to the problem and arrive at correct final results nevertheless. On the other hand, a "false positive" result, in which a nonlinear problem is identified as linear, will likely have a disastrous effect on subsequent calculations. The potential for false positive results argues against linearity detection through the simple expediency of testing (2.1) for a few random inputs. The necessity of equality testing in floating point, and the possibility of unfortunate choices for the inputs, makes false positives impossible to rule out in theory, and difficult to avoid robustly in practice.
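To see the fragility concretely, consider a naive random test of (2.1) applied to (2.2). A minimal Matlab sketch (our own illustration, not a proposed method):

% Naive random test of (2.1) for the operator (2.2).
% f is mathematically the identity, hence linear.
f = @(u) (u + sin(u)) - sin(u);
phi = randn; theta = randn; u = randn; v = randn;
lhs = f(phi*u + theta*v);
rhs = phi*f(u) + theta*f(v);
lhs == rhs   % often false: rounding in (u + sin(u)) - sin(u) breaks exact equality

Relaxing the equality test to a tolerance merely trades false negatives for possible false positives, which is precisely the danger described above.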

3 Linearity detection via automatic differentiation

We now show how to construct a reliable numerical method for detecting linearity. It is based on ideas related to those which enable AD to offer a reliable and robust way to obtain derivatives in scientific computing. Here we describe how to incorporate linearity detection into standard forward-mode AD algorithms, similar to those described in [7] and [8]. Section 4 describes how this method works in the Chebfun context.

3.1 Boolean notation

The core of our algorithms for linearity detection is to work with Boolean vectors to propagate linearity information through our computations. We now describe the notation used to work with this information. Boolean vectors live in the space B^n, that is, the space of vectors with n elements which are either 0 or 1. We represent members of B^n as row vectors. By ri, we mean the ith element of the vector r ∈ B^n. We use the following notation for the Boolean algebra arising when working with vectors r, s, t ∈ B^n:

∧ Logical AND. If t = r ∧ s, then ti = 1 iff ri = 1 and si = 1.

∨ Logical OR. If t = r ∨ s, then ti = 1 iff ri = 1 or si = 1 (or both).

¬ Logical NOT. If t = ¬r, then ti = 1 iff ri = 0.

We also allow operations involving a scalar Boolean ρ ∈ B and a vector s ∈ B^n. These are operations that map B × B^n to B^n, with results given as follows:

∧ If t = ρ ∧ s, then t = s if ρ = 1, and t = 0 (n elements) if ρ = 0.

∨ If t = ρ ∨ s, then t = 1 (n elements) if ρ = 1, and t = s if ρ = 0.

Let r ∈ B^n be a Boolean vector. We define the logical function ∀(·) : B^n → B as follows:

Definition 3 ∀(r) = 1 iff ri = 1 for all 1 ≤ i ≤ n, and ∀(r) = 0 otherwise.
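In Matlab, which we use for the implementations below, these operations map directly onto built-in logical operations on logical row vectors; a brief sketch of the correspondence (our own illustration):

r = logical([1 0 1]); s = logical([1 1 0]);   % members of B^3
t = r & s;       % logical AND
t = r | s;       % logical OR
t = ~r;          % logical NOT
rho = true;      % a scalar Boolean in B
t = rho & s;     % scalar-vector AND: Matlab broadcasts the scalar
t = rho | s;     % scalar-vector OR
a = all(r);      % the function forall(r): 1 iff every entry of r is 1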

3.2 Propagation of information for linearity detection

As (2.2) makes clear, the numerical detection of linearity (as opposed to linearity itself) depends not just on the operator f, but also on the path taken to evaluate it. A sequence of elementary calculations taken to evaluate an operator is known as the evaluation trace in the AD literature, described as follows:

    An evaluation trace is basically a record of a particular run of a particular program, with particular specified values for the input variables, showing the sequence of floating point values calculated by a (slightly idealized) processor and the operators that computed them. [7, p. 4]

Each part of the sequence is associated with what is known in the AD literature as an intermediate variable, and the final result is obtained by performing a series of computations, using elementary operators such as +, ∗, sin and exp, with these intermediate variables. All the elementary operators are assumed to be either unary or binary. At the start of a computation, we initialise the intermediate variables vk, for k ≤ 0, to be the input variables. We denote the kth elementary operator in the sequence by ϕk, and its result by the intermediate variable vk. For example, if ϕk is a binary operator, we have vk = ϕk(vi, vj), where vi and vj are intermediate variables.

Assignment            Value of vk
v−1 = u1              4.0000
v0  = u2              0.7500
v1  = v−1 + v0        4.0000 + 0.7500 = 4.7500
v2  = 2v1             2 ∗ 4.7500 = 9.5000
v3  = √v−1            √4.0000 = 2.0000
v4  = v2 − v3         9.5000 − 2.0000 = 7.5000
v5  = cos(v1)         cos(4.7500) = 0.0376
v6  = v4 ∗ v5         7.5000 ∗ 0.0376 = 0.2820
y   = v6              0.2820

Table 1: An evaluation trace for y = (2(u1 + u2) − √u1) cos(u1 + u2).

As an example of an evaluation trace, let y be a function of two variables that has the formula

    y = (2(u1 + u2) − √u1) cos(u1 + u2).

Suppose that we want to obtain the value of y with u1 = 4 and u2 = 0.75. The evaluation trace for one implementation of that computation is shown in Table 1. By working with derivative information of the intermediate variables in the evaluation trace, AD can yield derivatives of the final result with respect to the input variables. We do not describe that process in this article. But along with the propagation of derivative information through the evaluation trace, we will also propagate information from two Boolean vectors, ℓ^vk and c^vk, assigned to every intermediate variable vk. The names of these vectors are meant to suggest a value that is guaranteed to be linear and constant with respect to the primitive variables. (Again, our understanding of the term "linear" is defined to be the possession of a constant Fréchet derivative, and includes affine operators.) Also note that a variable being constant with respect to a primitive variable is equivalent to the variable having a zero Fréchet derivative with respect to the primitive variable. The vectors ℓ^vk and c^vk live in the space B^n, where n is the arity of the operator f. For operations on Y^n, the primitive variable ui for 1 ≤ i ≤ n is initialized to have

    ℓ^ui = [1 1 ··· 1],                             (3.1)
    c^ui = [1 1 ··· 0 ··· 1]  (0 in the ith position).  (3.2)

These assignments express the following facts. The expression ui has Fréchet derivative [Z Z ··· I ··· Z] (with I in the ith position), where Z and I are the zero and identity operators respectively on the space Y. This is independent of each single variable uj, which accounts for the fact that every entry of ℓ^ui equals 1. All entries in the Fréchet derivative, except for the ith position, are the zero operator, which accounts for the entries equal to 1 in c^ui. The identity operator is found in the ith position of the Fréchet derivative, so the ith entry of c^ui is 0. Put simply, the variable ui is linear in all the variables, and constant with respect to variation in all the variables except itself.
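As a sketch, the initialisation (3.1)–(3.2) can be written in a few lines of Matlab (initPrimitive is a hypothetical helper name, not part of the implementation in [1]):

function [ell, c] = initPrimitive(i, n)
% Linearity vectors for the ith primitive variable u_i out of n.
    ell = true(1, n);     % (3.1): u_i is linear in every input variable
    c = true(1, n);
    c(i) = false;         % (3.2): u_i is constant in all inputs except itself
end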

Another way to think of the vectors ℓ and c is to note that, according to Theorem 2.1, an operator f is linear in ui if its Fréchet derivative with respect to ui is independent of the values of u1, . . . , un. That is equivalent to saying that the ith column of the Hessian of f has to have all zero entries. Hence, the ith entry of ℓ^vk being 1 represents the condition that the ith column of the Hessian of vk has all zero entries. Similarly, the ith entry of c^vk being 1 represents the condition that the ith entry of the gradient of vk is zero, since that will only happen if vk is independent of ui. Combining Theorem 2.1 with the definitions of c^vk and ℓ^vk, we obtain the following theorem on which the automatic linearity detection is based:

Theorem 3.1 An operator f is linear if, for y = f(u1, . . . , un), ∀(ℓ^y) = 1.

Proof If ∀(ℓ^y) = 1, then y is guaranteed to have been computed only by elementary operations for which all the partial Fréchet derivatives ∂y/∂ui, i = 1, . . . , n, remained independent of the values of u1, . . . , un. Hence, by Theorem 2.1, f is linear. □

We now describe how to compute the values of the vectors ℓ^vk and c^vk through the evaluation trace. Each step in the trace is assumed to be either a unary or binary operator ϕk. Each elementary operator is supposed to be sufficiently elementary that its Fréchet derivatives with respect to each argument are known (or known not to exist), which is a requirement for computing AD information in any case. The evaluation trace for an operator f(u1, . . . , un) generates a sequence of intermediate values, v−n+1, . . . , v0, v1, . . . , vK. We need to consider two cases for the linearity information of the intermediate variables:

1. For k ≤ 0, we have vk = u_{n+k}, that is, the intermediate variables are initialised to be the primary variables. In this case, the vectors ℓ^vk and c^vk are set according to the values of ℓ^{u_{n+k}} and c^{u_{n+k}} from (3.1) and (3.2).

2. For k > 0, we either have vk = ϕk(vi) or vk = ϕk(vi, vj) for some i, j < k. Here, the values of ℓ^vk and c^vk must be deduced from the interactions between ϕk and the input arguments. In Table 2, we list how this is done for the operators in consideration. There are two cases, depending on whether ϕk is unary or binary:

(a) If ϕk is unary, it can be assumed to be nonlinear.² Hence, vk will inherit the independence information from vi, that is, c^vk = c^vi. Also, vk will be linear only in the primary variables that vi is independent of, that is, ℓ^vk = c^vi. See the first entry in Table 2.

(b) If ϕk is binary, then it is one of the four operators ±, ∗, /, or ^. Here, for vk to be independent of a primary variable ui, both vi and vj have to be independent of that variable, so c^vk = c^vi ∧ c^vj. For linearity, that is, the value of ℓ^vk, each operator has to be treated separately, as they can give rise to nonlinearity in different ways. The remaining entries of Table 2 describe how ℓ^vk is computed for the different operators.

²The linear operators unary + and unary − are treated as special cases of the binary multiplication operator, that is, +u = 1u and −u = −1u.

vk = sin(vi):   ℓ^vk = c^vi,  c^vk = c^vi.
    This ensures sin(x) is detected as linear, and sin(u1) as nonlinear. The same formulas apply for operators such as cos, exp, etc.

vk = vi ± vj:   ℓ^vk = ℓ^vi ∧ ℓ^vj,  c^vk = c^vi ∧ c^vj.
    The result is linear only if both arguments are linear. For an independent value, both arguments have to have independent values.

vk = vi/vj:   ℓ^vk = ℓ^vi ∧ c^vj,  c^vk = c^vi ∧ c^vj.
    For linearity, the numerator has to be linear, and the denominator has to be independent of the inputs.

vk = vi^vj:   ℓ^vk = c^vi ∧ c^vj,  c^vk = c^vi ∧ c^vj.
    For linearity, both variables have to be independent of the inputs.

vk = vi ∗ vj:   ℓ^vk = ℓ^vi ∧ ℓ^vj ∧ ([c^vi ∧ c^vj] ∨ [∀(c^vi) ∨ ∀(c^vj)]),  c^vk = c^vi ∧ c^vj.
    The result is linear if both arguments are linear, and if they both have a zero derivative (or if one has an all-zero derivative). This ensures that xu1 is detected as linear, u1u1 as linear in u2, while u1u2 is nonlinear in both variables.

Table 2: How linearity information is propagated for elementary operators.

Again, we emphasize that the ℓ and c vectors indicate that a variable is guaranteed to be linear or independent, respectively, with respect to the input variables. For example, while y = u1 + u2^2 − u2^2 is linear in u2, ℓ^y = [1 0], as this type of situation cannot be recognized robustly. For the same reason, while y = u1 − u1 evaluates to zero, c^y will not be equal to [1 1]. In Table 3, we show how linearity information is propagated in the evaluation trace, using the same example as above:

    y = (2(u1 + u2) − √u1) cos(u1 + u2).

It begins by initialising the ℓ^vk and c^vk vectors according to (3.1) and (3.2). It then computes the values of the vectors ℓ^vk and c^vk according to the rules listed in Table 2 to arrive at the linearity information for the final result y.
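The conservative behaviour noted above for y = u1 + u2^2 − u2^2 can be checked directly once the advar class of Section 3.3 below is available; a hypothetical session (see Section 3.4 for the class in use):

>> u1 = advar(4,[1 0]); u2 = advar(-1,[0 1]);
>> getlin(u1 + u2^2 - u2^2)   % mathematically linear in u2, but flagged only for u1
ans =
     1     0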

3.3 A generic implementation of AD linearity detection

As stated in the introduction, AD uses the fact that most mathematical expressions can be thought of as sequences of subexpressions consisting of simple functions such as sin, exp and multiplication. The simplest and easiest to understand implementation of AD is known in the literature as the forward-mode, operator-overloaded one. It involves introducing a new object class (which we name advar), which in addition to storing the numerical value of the variable also stores the value of its derivative. An advar can be represented, for example, by an object using object-oriented programming (OOP) in Matlab. Note that the fields of advars can either be scalars or vectors (representing for example a discretization of a function).

Assignment         Value of vk                  Value of ℓ^vk   Value of c^vk
v−1 = u1           4.0000                        [1 1]           [0 1]
v0  = u2           0.7500                        [1 1]           [1 0]
v1  = v−1 + v0     4.0000 + 0.7500 = 4.7500      [1 1]           [0 0]
v2  = 2v1          2 ∗ 4.7500 = 9.5000           [1 1]           [0 0]
v3  = √v−1         √4.0000 = 2.0000              [0 1]           [0 1]
v4  = v2 − v3      9.5000 − 2.0000 = 7.5000      [0 1]           [0 0]
v5  = cos(v1)      cos(4.7500) = 0.0376          [0 0]           [0 0]
v6  = v4 ∗ v5      7.5000 ∗ 0.0376 = 0.2820      [0 0]           [0 0]
y   = v6           0.2820                        [0 0]           [0 0]

Table 3: An evaluation trace for y = (2(u1 + u2) − √u1) cos(u1 + u2), showing the propagation of linearity information.

Each time a method is called with an object of the type advar, the derivative information is updated according to the chain rule so that the returned variable contains the correct derivative information. Such standard AD implementations are described in more detail in [7] and [8]. Our implementation of the advar class can be found online at [1]. The advar constructor has two input arguments (stored as properties with the same names in the objects):

• val: The value of the variable.

• der: The value(s) of the derivative of the variable with respect to the variable(s) we want to differentiate with respect to.

In addition, advars have the fields linear and const, which are used to propagate linearity information through ℓ and c as described in Section 3.2. In Matlab OOP syntax, the constructor looks as follows:

classdef advar
    properties
        % Fields for value of function and derivative
        val = [];
        der = [];
        % Fields used for linearity detection.
        linear = [];
        const = [];
    end

    methods
        function adv = advar(val,der)
            if nargin == 1
                der = ones(length(val),1);   % Default value for derivative
            end
            % Make sure we only work with column vectors using val(:)
            adv.val = val(:);
            adv.der = der;
            % Initialise values. Assume linearity to start off. The MATLAB
            % method true gives a logical vector of the desired size.
            adv.linear = true(1,length(der));
            % The ~der will give a logical vector of 0 and 1, corresponding
            % to guaranteed independent values -- an entry will have value
            % 1 if the entry in der is 0, and 0 for any other entries.
            adv.const = ~der;
        end
    end
end

To be able to perform computations involving advars, we overload common Matlab methods for the class, such as the plus method, which adds two advars and their derivatives, and computes the correct linearity information for the variable returned:

function advOut = plus(adv1,adv2)
    advOut = advar(adv1.val + adv2.val, adv1.der + adv2.der);
    advOut.linear = adv1.linear & adv2.linear;
    advOut.const = adv1.const & adv2.const;
end

Another example is the sin method:

function advout = sin(advin)
    advout = advar(sin(advin.val), cos(advin.val).*advin.der);
    advinconst = advin.const;
    advout.const = advinconst;
    advout.linear = advinconst;
end
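The most involved rule in Table 2 is the one for multiplication. A sketch of a corresponding times method, written by us to match that rule (it is not the listing from [1]), could read:

function advOut = times(adv1,adv2)
    % Product rule for the derivative values.
    advOut = advar(adv1.val .* adv2.val, ...
        adv1.val .* adv2.der + adv2.val .* adv1.der);
    % Table 2: the product is linear in input i if both factors are linear
    % in input i and either both are constant in input i or one factor is
    % constant in every input.
    advOut.linear = adv1.linear & adv2.linear & ...
        ((adv1.const & adv2.const) | (all(adv1.const) | all(adv2.const)));
    advOut.const = adv1.const & adv2.const;
end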

All such methods carry out the expected commands on the values of the variables, and the derivatives are combined using the rules of normal calculus, in particular the chain rule. The information about linearity is computed as described in Section 3.2.

We mention that this way of storing linearity information in the advar is the most elementary one. To avoid running out of memory when the arity of the original operator is large, the vectors in the const and linear fields should be represented as sparse vectors. However, the use of logical vectors in Matlab already means that the vectors take up only one eighth of the memory compared to a vector of doubles.

We are interested in the extra work required in the sequence of computing f for being able to detect linearity. We point out that the operations required for detecting linearity need only be carried out the first time we compute the derivatives of f, since in the case of nonlinear problems, the linearity information will not change between iterations. We look at unary and binary operators separately:

1. For a unary operator, vk = ϕk(vi), we only need one memory access and two memory write operations, but no arithmetic operations, since ℓ^vk and c^vk are simply assigned the value of c^vi.

2. For a binary operator, vk = ϕk(vi, vj), as is evident from Table 2, we need four memory access and two memory write operations. Furthermore, we need O(n) logical arithmetic operations for computing each of ℓ^vk and c^vk.

The differences between this work and that described in [13] include:

• We are able to detect that a function such as x^2 + u1' is linear. This is important when linearity detection is applied in solving differential equations, as we want to be able to allow nonlinear dependence on the independent space/time variable without flagging the problem as nonlinear.

• Rather than working with sets of index domains for every intermediate variable and maintaining nonlinear interaction domains for every input variable as described in [13], we work entirely with Boolean vectors (a single vector for c and another for ℓ) which contain the overall information.

• In particular, by not working with nonlinear interaction domains, we avoid the "dead end" problem discussed in [13]. The dead end problem is caused by performing nonlinear operations on variables that are not required for computing the final output variable, such as the sequence (assuming u1 and u2 have already been defined to be variables of the correct type previously in the code):

v = sin(u1)*cos(u2);
w = u1 + u2;

This sequence would lead to u1 and u2 being included in the nonlinear interaction domains (which determine sparsity patterns of Hessians), but if we are only interested in the linearity of w, that is an overestimate (see the short session after this list). Note that this is not because the two implementations give different results for the linearity of expressions such as x^2 + u1', but because we determine whether each intermediate variable is linear in the input variables, rather than keeping track of what input variables appear in nonlinear expressions.

• The linearity detection in Chebfun, as described in Section 4, is done in a delayed manner, i.e., not until the information is required, rather than at the same time as functions are evaluated.
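As a short illustration of the dead-end discussion, a hypothetical advar session (using the class of Section 3.3, and assuming cos and * are overloaded analogously to sin and plus):

>> u1 = advar(4,[1 0]); u2 = advar(-1,[0 1]);
>> v = sin(u1)*cos(u2);   % nonlinear operations, but not used to compute w
>> w = u1 + u2;
>> getlin(w)              % w is still detected as linear in both inputs
ans =
     1     1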

3.4 Examples

As an example of advars in use, consider the expression

    y = sin(x) + 3u1 + u2^2

evaluated at x = 2, u1 = 4, u2 = −1. We are interested in obtaining derivatives with respect to u1 and u2, so begin by creating those variables as follows:

>> u1 = advar(4,[1 0]);
>> u2 = advar(-1,[0 1])
u2 =
       val: -1
       der: [0 1]
    linear: [1 1]
     const: [1 0]

The first argument to the constructor above is the value of the variable. The second argument to the constructor for u1, [1 0], indicates that u1 has the derivative 1 with respect to u1, and 0 with respect to u2 (similarly for u2). The information printed for u2 tells us that it is linear in both u1 and u2, but its value cannot be guaranteed to be independent of the value of u2. To create an advar to represent an independent variable x we use:

>> x = advar(2,[0 0])
x =
       val: 2
       der: [0 0]
    linear: [1 1]
     const: [1 1]

The information above tells us that the variable x is linear in both u1 and u2, and that its value is independent of both the values of u1 and u2. We can now create new variables from x, u1 and u2, which will contain the correct derivative and linearity information. For our expression above, we have

>> y = sin(x) + 3*u1 + u2^2;

The getval method returns the value stored in the val field of the advar:

>> getval(y)
ans =
  13.909297426825681

while the getder method returns the values of the derivatives ∂y/∂u1 and ∂y/∂u2:

>> getder(y)
ans =
     3    -2

The getlin method returns information about linearity, that is, the vector ℓ for that variable. An entry equal to 1 indicates that the variable is linear in the corresponding variable we are differentiating with respect to:

>> getlin(y)
ans =
     1     0

i.e., y is linear in u1 but not in u2. For overall linearity information, i.e. whether y can be detected to be linear or not according to Definition 1, we call the islinear method, which rather than returning a vector returns either a logical 0 or 1:

>> islinear(y)
ans =
     0

We can now check for linearity of various mathematical expressions:

>> islinear(x*u1+u2)
ans =
     1
>> islinear(x*exp(u1))
ans =
     0
>> islinear(x+u1*u2)
ans =
     0
>> islinear(u2/x)
ans =
     1
>> islinear(x/u2)
ans =
     0

Finally, to demonstrate the accuracy of derivatives obtained via AD, let

    z = u1^3 u2 / y.

Simple calculus gives the analytical derivatives of z to be

    ∂z/∂u1 = (3u1^2 u2 y − u1^3 u2 ∂y/∂u1)/y^2 = (3u1^2 u2 y − 3u1^3 u2)/y^2,
    ∂z/∂u2 = (u1^3 y − u1^3 u2 ∂y/∂u2)/y^2 = (u1^3 y − 2u1^3 u2^2)/y^2.

We compare the computed and analytical derivatives of z as follows:

>> z = (u1.^3*u2)./y;
>> dzdu = getder(z);
>> dzdu(1) - getval((3*u1^2*u2*y-3*u1^3*u2)/y^2)   % Use getval to obtain a scalar
ans =
     0
>> dzdu(2) - getval((u1^3*y-2*u1^3*u2^2)/y^2)
ans =
     4.440892098500626e-16

4 Linearity Detection in Chebfun

As stated in the introduction, linearization plays a crucial role in Chebfun for solving nonlinear BVPs. All derivatives required for the linearization are computed via AD, so checking for linearity using the ideas described in Section 3 is natural.³ This section describes how such linearity detection is implemented in Chebfun (available since Version 4), and shows examples of its usefulness.

4.1 Implementation As described in [2], since Version 3.1, Chebfun incorporates a forward-mode, operator- overloaded implementation of AD, similar to what is described in Section 3. One major difference however is that in Chebfun, derivatives are computed in a delayed manner (known as lazy evaluation or call-by-need in computer ) and using recursion, so that no derivatives are computed until they are actually needed.4 In contrast, in Section 3, the values of the derivatives were always computed at the same time as the values of the variables involved. Another important difference is that since Chebfun works in function space, as discussed in the introduction, the derivatives will be Fr´echet derivatives, linear operators which are represented by linops. The delayed evaluation is achieved by using a Chebfun service class named anon. The anon class offers similar functionality to regular anonymous functions in Matlab; in particular, it allows the values of the linops to be computed when required from values stored in memory. Its advantage is to address serious memory inefficiencies that are found when heavily using anonymous functions in Matlab. In addition, anons allow a series of commands to be executed when they are evaluated, and can return multiple outputs (which is not possible with anonymous functions in Matlab). As an example, take the overloaded plus method, which adds the chebfun represen- tations of two functions, f1 and f2, to return a new function, g = f1 + f2. The lines of interest here are:

function g = plus(f1,f2)
    g.funs = f1.funs + f2.funs;
    g.jacobian = anon(['[der1 linear1] = diff(f1,u,''linop'');', ...
        '[der2 linear2] = diff(f2,u,''linop''); der = der1 + der2;', ...
        'linear = linear1 & linear2;'],{'f1' 'f2'},{f1 f2});
    g.ID = newIDnum();
end

The second line uses the facility of Chebfun to add the values of interpolants at Chebyshev points to obtain a new chebfun, returned as g. In the following lines, we assign the jacobian

³Originally, Chebfun used a vector η which denoted nonlinear dependence, rather than the vector ℓ denoting linear dependence as discussed in Section 3. Future versions will follow the convention from Section 3, and hence, this section will as well.

⁴To be more precise, Chebfun creates what are known as "thunks" in the computer science literature to emulate lazy evaluation in Matlab.

field of g (used to store derivative information) to be an anon, by calling the anon constructor. Finally, we assign g a unique ID which allows recursive evaluation of derivatives as described in [2]. The anon constructor has three arguments.

1. A string which contains information about how the derivative is to be composed.

2. A cell-array of strings with names of the variables passed in to the method.

3. The values of the variables passed in to the method.

The role of the second and third arguments is to enable Chebfun to load variables back into memory when they are required to compute derivatives at a later time. The first argument of the anon is a string which contains information about how the derivative is to be composed. It consists of Matlab commands (separated by semicolons), and when the anon gets evaluated, each command is executed using the eval method of Matlab. An anon knows it is supposed to return two outputs, der and linear. In the string, u represents the function we eventually differentiate with respect to. We see that when the string is evaluated, we start by differentiating (recursively) the input functions f1 and f2 with respect to u. The differentiation of f1 with respect to u has two outputs (similar for f2):

• der1 — The Fréchet derivative of f1 with respect to u.

• linear1 — The value of the Boolean vector ℓ^{f1}.

The Fréchet derivatives are then combined according to the chain rule to yield the derivative of g with respect to u. The linearity information for g, that is, the value of ℓ^g, is computed using the values of ℓ^{f1} and ℓ^{f2}, combined as described in Section 3.2. The reason why we did not have to compute the value of c^g is that we represent that information as a property of every linop, so it is computed in linop methods. That is, the information is stored as a property of the derivative, rather than as part of the intermediate variables as we did in Section 3.3. Thus, rather than using information about whether a variable has a constant dependence on the inputs, which in Section 3 we denoted by the vector c, it is more natural to check whether its derivative is zero, which we denote by the vector z. We emphasize that the vectors c and z are equivalent; the only difference is that one describes properties of variables, the other properties of derivatives. The value of z is for example used in the Chebfun sin method, shown below:

function g = sin(f)
    g = comp(f, @(x) sin(x));
    g.jacobian = anon(['diag1 = diag(cos(f)); der2 = diff(f,u,''linop'');' ...
        'der = diag1*der2; const = der2.iszero;'],{'f'},{f});
    g.ID = newIDnum;
end

Again, we start by composing the input function f with the sin function to yield the chebfun returned, g. We then assign g an anon and an ID — the anon contains information about how to construct the derivative of g (according to the chain rule⁵) as well as how to return linearity information (according to Section 3). Note that here we need to use the value of the Boolean vector z^f to get the values of ℓ^g. This is to ensure that an operator such as f(u1, u2) = sin(x) + u1 + u2 is detected to be linear in u1 and u2.
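For readers unfamiliar with lazy evaluation, a minimal thunk-like construction in plain Matlab (our own sketch in the spirit of footnote 4, not Chebfun's actual anon class) stores a computation together with its arguments and runs it only when forced:

makeThunk = @(fun, args) struct('fun', fun, 'args', {args});
force = @(t) t.fun(t.args{:});

t = makeThunk(@(a,b) a + b, {1, 2});   % nothing is evaluated yet
force(t)                               % evaluates only now, returning 3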

4.2 Examples We now show how linearity information is useful in Chebfun. The default approach of solving a BVP in Chebfun is to create a chebop to represent the problem, the ’nonlinear- backslash’ (\) operator then starts a damped Newton iteration for solving the problem. As an example, take the problem

    u'' + u^2 = 1,   u(0) = 0,   u(2) = 1.   (4.1)

We solve it in Chebfun as follows (printing convergence information at every iteration):

>> cheboppref('display','iter');             % Print convergence information
>> N = chebop(@(u) diff(u,2)+u.^2,[0 2]);    % Create a nonlinear chebop on [0,2]
>> N.lbc = 0; N.rbc = 1;                     % Assign boundary conditions
>> u = N\1;                                  % Solve
Iter.   || update ||   length(update)   stepsize   length(u)
-------------------------------------------------------------
  1       1.178e+00          19             1          19
  2       2.319e-01          23             1          23
  3       1.090e-02          22             1          23
  4       2.440e-05          21             1          23
  5       1.224e-10          12             1          23

5 iterations
Final residual norm: 7.18e-13 (interior) and 2.22e-25 (boundary conditions).

Its solution is plotted in Figure 1a. Should the problem however be linear, e.g.

    u'' + 2u = 1,   u(0) = 0,   u(2) = 1,

we avoid starting the Newton iteration:

>> N = chebop(@(u) diff(u,2)+2*u,[0 2]);   % Create a linear chebop on [0,2]
>> N.lbc = 0; N.rbc = 1;                   % Assign boundary conditions
>> u = N\1;                                % Solve
Converged in one step. (Chebop is linear).

⁵The call to the diag method in the string of the anon returns a diagonal multiplication operator M_g : u(x) ↦ g(x)u(x).

[Figure 1: (a) Solution of the BVP (4.1); panel title "Solution of u'' + u^2 = 1, u(0) = 0, u(2) = 1", axes x and u(x). (b) Six eigenfunctions of the eigenvalue problem (4.2), plotted so that they are shifted to their corresponding eigenvalues; panel title "Shifted eigenmodes of harmonic oscillator", axes x and u_n(x) + λ_n.]

Figure 1: Solutions of problems discussed in Section 4.2.

Linear eigenvalue problems, such as the one describing a harmonic oscillator,

    −u'' + x^2 u = λu,   u(−8) = u(8) = 0,   (4.2)

can be easily solved with chebops as follows (the first six eigenfunctions found are shown in Figure 1b):

>> N = chebop(@(x,u) -diff(u,2)+x.^2.*u,[-8 8]);   % Create a chebop
>> N.lbc = 0; N.rbc = 0;                           % Assign BCs
>> [U Lambda] = eigs(N,6);                         % Find six eigenmodes

However, nonlinear eigenvalue problems are not supported, so an error is thrown:

>> N = chebop(@(x,u) -diff(u,2)+x.^2.*u.^2,[-8 8]);   % Nonlinear chebop
>> N.lbc = 0; N.rbc = 0;                              % Assign BCs
>> [U Lambda] = eigs(N,6);

Error using chebop/eigs (line 50)
Chebop appears to be nonlinear. Currently, eigs only has support for linear chebops.

5 Conclusions and Further Work

As discussed in the introduction, linearization plays a crucial role in scientific computing. However, when implementing a black-box solver for a class of problems, it can be valuable to be able to detect whether a given problem is linear or not, as this determines whether

an iteration scheme has to be employed for solving the problem. Furthermore, some functionality is only defined for linear operators.

In Section 2, we defined what we mean when we say a function or an operator is linear, and discussed properties and theorems important for linearity detection. In particular, derivatives of linear operators are constant over their domains. We also discussed the difference between the mathematical definition of linearity and how linearity can be detected for computed quantities, which led to observing the possibility of false positives and false negatives. This led us to reject the simple option of checking for linearity by passing the operator a few random arguments and comparing the results.

In Section 3, we discussed how linearity detection is implemented in combination with automatic differentiation. In Section 3.3, we discussed such AD linearity detection in the context of standard forward-mode, operator-overloaded AD, and in Section 4, it was discussed in the context of Chebfun.

There are several interesting ways that the work described in this paper can be extended. They include:

• Implementing linearity detection by propagating information about zero and constant derivatives for reverse-mode and/or source transformation AD, the other main types of AD implementations. In particular, linearity detection for the source transformation approach to AD offers interesting challenges, as source transformation generates code for evaluating derivatives before values of variables are known. Optimally, one would like to obtain linearity information before derivatives are computed so that it is known which elements of the Jacobian/Hessian are non-zero and thus need to be computed. With our approach, this would require propagating the linearity information as the original code is compiled.

• When solving systems of equations with a scheme such as Newton iteration, it is possible that only some parts of the systems are nonlinear. It should be possible to use information about linearity to save the effort of re-obtaining derivatives of the linear parts at the next iteration.

• In the work presented here, we obtain detailed linearity information, i.e. we are able to determine individually whether an operator f(u1, u2) is linear in u1 or u2. As an example, consider the output [1 0] from the getlin method in Section 3.4, which indicated linear dependence on u1 but nonlinear dependence on u2. Some applications of linearity detection might not need such detailed information. Rather than working with vectors that show details about linearity, we could simplify the information and save computational effort by just using a single Boolean value indicating whether a function is linear or not.

Acknowledgements

We wish to thank Nick Trefethen for many comments, suggestions and revisions while working on this paper. We also wish to thank Nick Hale for useful discussions and ideas on how best to implement linearity detection in Chebfun.

References

[1] A. Birkisson and T. A. Driscoll, The advar class, 2012. http://people.maths.ox.ac.uk/birkisson/advar.

[2] A. Birkisson and T. A. Driscoll, Automatic Fréchet differentiation for the numerical solution of boundary-value problems, ACM Trans. Math. Softw., 38 (2012), pp. 26:1–26:29.

[3] C. H. Bischof, H. M. Bücker, and D. An Mey, A case study of computational differentiation applied to neutron scattering, in Automatic Differentiation of Algorithms: From Simulation to Optimization, G. Corliss, C. Faure, A. Griewank, L. Hascoët, and U. Naumann, eds., Computer and Information Science, Springer, New York, NY, 2002, ch. 6, pp. 69–74.

[4] J. P. Boyd, Chebyshev and Fourier Spectral Methods: Second Edition, Dover Pub- lications, New York, 2000.

[5] T. A. Driscoll, F. Bornemann, and L. N. Trefethen, The chebop system for automatic solution of differential equations, BIT Numerical Mathematics, 48 (2008), pp. 701–723.

[6] N. I. M. Gould and P. L. Toint, Preprocessing for quadratic programming, Math. Program., 100 (2004), pp. 95–132.

[7] A. Griewank and A. Walther, Evaluating Derivatives: Principles and Tech- niques of Algorithmic Differentiation, SIAM, Philadelphia, 2nd ed., 2008.

[8] R. D. Neidinger, Introduction to automatic differentiation and MATLAB object- oriented programming, SIAM Review, 52 (2010), pp. 545–563.

[9] S. Schlenkrich, A. Walther, N. Gauger, and R. Heinrich, Differentiating fixed point iterations with ADOL-C: Gradient calculation for fluid dynamics, in Modeling, Simulation and Optimization of Complex Processes – Proceedings of 3rd HPSC 2006, H.-G. Bock, E. Kostina, H. X. Phu, and R. Rannacher, eds., 2008, pp. 499–508.

[10] M. M. Strout and P. Hovland, Linearity analysis for automatic differentiation, in Computational Science – ICCS 2006, V. N. Alexandrov, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, eds., vol. 3994 of Lecture Notes in Computer Science, Heidelberg, 2006, Springer, pp. 574–581.

[11] L. N. Trefethen, Spectral Methods in MATLAB, SIAM, Philadelphia, 2000.

[12] L. N. Trefethen et al., Chebfun Version 4.2, The Chebfun Development Team, 2011. http://www.maths.ox.ac.uk/chebfun/.

[13] A. Walther, Computing sparse Hessians with automatic differentiation, ACM Transactions on Mathematical Software, 34 (2008), pp. 3:1–3:15.

[14] F. Zaoui, Large electrical power systems optimization using automatic differentia- tion, in Advances in Automatic Differentiation, C. H. Bischof, H. M. B¨ucker, P. D. Hovland, U. Naumann, and J. Utke, eds., vol. 64 of Lecture Notes in Computational Science and Engineering, Springer, Berlin, 2008, pp. 293–302.