Basic Linear Inverse Method Theory - DRAFT NOTES
Peter P. Jones
Centre for Complexity Science, University of Warwick, Coventry CV4 7AL, UK
(Dated: 21st June 2012)

BASIC LINEAR INVERSE METHOD THEORY

Eigen-Decompositions

Drawing on the notes in [1, §7.7.10], assume we have a forward problem that is linear in form, Ax = b, and presume that A is square and not singular. If A is square, this implies that x and b are the same length. Then, in a given basis, A has an eigen-decomposition (also called a spectral decomposition) Eig(A) = (\{\lambda_n\}, \{\varphi_n\}) with eigenvalues \lambda_n and eigenvectors \varphi_n, such that we can describe the vector x with:

    \lambda_n \langle \varphi_n, x \rangle = b_n                                                      (1)

    \langle \varphi_n, x \rangle = \frac{b_n}{\lambda_n}                                              (2)

    x_n = \frac{b_n}{\lambda_n} = \frac{\langle \varphi_n, b \rangle}{\lambda_n}                      (3)

    \Rightarrow \; x = \sum_{n=1}^{N} \frac{\alpha_n}{\lambda_n} \varphi_n, \quad \text{where } \alpha_n := \langle \varphi_n, b \rangle, \; \lambda_n \neq 0        (4)

Here \langle \cdot, \cdot \rangle denotes the inner product, x_n and b_n are the components of x and b with respect to the eigenbasis \{\varphi_n\} (taken to be orthonormal), and N may be finite or infinite, depending on the space in question. Unless stated otherwise, from now on we will consider only finite N, to match with numerical discretisation.

If we now generalise further by using X = (x_1, x_2, \ldots, x_N) and B = (b_1, b_2, \ldots, b_N) such that:

    AX = B, \quad \text{with } A, X, B \in \mathbb{R}^{N \times N}                                    (5)

then in a manner similar to (4) we can say:

    x_i = \sum_{n=1}^{N} \frac{\xi_n^{(i)}}{\lambda_n} \varphi_n, \quad \text{where } \xi_n^{(i)} := \langle \varphi_n, b_i \rangle, \; \lambda_n \neq 0, \; b_i \in B, \; x_i \in X, \; i = 1, 2, \ldots, N        (6)

Equations (4) and (6) could be considered the core equations of inverse method theory. They assert that if A has an eigen-decomposition with non-zero eigenvalues, then for any x bounded in some domain X such that the operation Ax results in some definite b, it is possible, given such a b, to recover x directly.

In reality, however, matters rarely work so neatly. In practice, the eigen-decomposition of A can yield eigenvalues that are too close to zero to obtain good solutions by computational methods (which is another way of saying that A does not have a useful inverse). Moreover, let us for the sake of argument consider b to reside in a space B = X. It is possible for some operator A that, even though x can vary continuously in the space X, the results b are nonetheless distributed unevenly in that space, so that there can be 'gaps' between the results for neighbouring input values of x. Thus there is a set of values \{b'\} \subset B that are not in the range of A given x. If the data one receives from an experiment is b', it is no longer clear that an x \in X could be recovered by direct inverse solution, or whether neighbouring values of b' would respect any notion of continuous variation of x. In other words, even if an answer could be recovered, it might not make much sense within the chosen forward model of the phenomena represented by A. This can be the case even when A is highly successful as a forward model in Ax = b.

There are essentially three tactics that can be deployed in such a situation. The first is to hope that physical constraints mean that the results b are sufficiently well separated in X that suitable ensemble averaging will remove noise; that is, for B_p = (b'_1, b'_2, \ldots, b'_p), as p \to M where M is suitably large, \langle b'_i \rangle = b. The second tactic is to employ some method that presumes that (if results can be suitably ordered) b_1 \preceq b' \prec b_2 can be coerced to resolve in a sufficiently neat way that x_1 \preceq x' \prec x_2 is the case. With the exception of Bayesian methods, most inverse methods for working with real data are variations on this second tactic.
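Before turning to the third tactic, a brief numerical sketch may help to make the reconstruction formula (4), and its sensitivity to near-zero eigenvalues, concrete. The sketch is illustrative only: it assumes a symmetric A (so that the eigenvectors form an orthonormal basis and the coefficients \alpha_n = \langle \varphi_n, b \rangle apply directly), and the matrix size, random seed, and noise level are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    # A symmetric forward operator, so its eigenvectors form an orthonormal basis.
    N = 5
    M = rng.standard_normal((N, N))
    A = M + M.T

    x_true = rng.standard_normal(N)
    b = A @ x_true

    # Eigen-decomposition Eig(A) = ({lambda_n}, {phi_n}); phi[:, n] is the n-th eigenvector.
    lam, phi = np.linalg.eigh(A)

    # Equation (4): x = sum_n (alpha_n / lambda_n) phi_n, with alpha_n = <phi_n, b>.
    alpha = phi.T @ b
    x_rec = phi @ (alpha / lam)
    print(np.allclose(x_rec, x_true))        # True: exact recovery while no lambda_n is near zero

    # Replace the smallest-magnitude eigenvalue by one close to zero: noise in b
    # is then amplified by 1 / lambda_n and the direct reconstruction degrades badly.
    lam_ill = lam.copy()
    lam_ill[np.argmin(np.abs(lam_ill))] = 1e-12
    A_ill = phi @ np.diag(lam_ill) @ phi.T
    b_noisy = A_ill @ x_true + 1e-6 * rng.standard_normal(N)
    x_bad = phi @ ((phi.T @ b_noisy) / lam_ill)
    print(np.linalg.norm(x_bad - x_true))    # large error driven by the near-zero eigenvalue

This amplification of noise by 1/\lambda_n is precisely the behaviour that the averaging, ordering, and Bayesian tactics discussed here try to control.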
The third tactic is to use Bayesian methods, which take in some B_p = (b'_1, b'_2, \ldots, b'_p) and then attempt to construct a probability density function (p.d.f.) for b' as the input to the method. The output is then a distribution on x' conditioned on the p.d.f. of b'. This is a useful approach when it is unclear whether the true b are neatly separated in the space B, or where the lack of evidence for a suitable ordering prevents the use of the second tactic above. When the input data b' is known to have errors that conform to a Gaussian distribution and there is little or no knowledge about any prior distribution of x (so a flat prior is used), it is known that Tikhonov-regularised least squares methods return the maximum a posteriori (MAP) value that would be obtained from a Bayesian approach [2]. (A comprehensive discussion of Bayesian treatments is beyond the scope of these notes; the interested reader is referred to [3] for a pragmatic introduction, and to [2] and [4] for more advanced coverage.)

The second tactic above requires that there be an appropriate constraint on x' that attempts to satisfy the desire for an ordering of potential solutions. Typically this might be to seek the minimum norm of x' achievable with some least-squares fit:

    \min_{x'} \|Ax' - b'\|^2 \quad \text{such that} \quad \min \|x'\|^2                               (7)

The least-squares approach has the advantage that it requires neither that A^{-1} exist in a useful sense, nor that an eigen-decomposition be undertaken. However, some algorithms used in implementations can require that the differential,

    \frac{\partial}{\partial x_i} \|Ax' - b'\|^2 < \infty, \quad \forall x_i \in x'                   (8)

(and possibly the second-order differential) exist and be sufficiently bounded in order to operate.

Singular Value Decomposition and the Generalised Inverse

Another way to talk about how such linear inverse methods work in practice is to discuss the Singular Value Decomposition (SVD) of the matrix A. The approach is similar to the discussion of eigen-decompositions earlier, but it reveals the link between such decompositions and the least-squares method more clearly. The following account draws heavily from [3, Ch. 4].

It is known that every matrix has a singular value decomposition (SVD). Given an m \times n matrix A, it can be factored into the decomposition:

    A = USV^T                                                                                         (9)

where

• U is an m \times m orthonormal matrix, whose columns are basis vectors that span the data space \mathbb{R}^m.
• V is an n \times n orthogonal matrix, whose columns span the model space \mathbb{R}^n.
• S is an m \times n matrix with entries only on the diagonal, which are the non-negative singular values. (Non-negative implies some singular values may be zero.)

(In some numerical software packages V is also returned as an orthonormal matrix, and the entries in S are then rescaled appropriately.)

The singular values \{s\} \in S are typically ordered by decreasing value, largest first to smallest last. If p values on the diagonal of S are non-zero, then S can be written in block form as

    S = \begin{pmatrix} S_p & 0 \\ 0 & 0 \end{pmatrix}                                                (10)

in which case A can be described by

    A = \begin{pmatrix} U_p & U_0 \end{pmatrix} \begin{pmatrix} S_p & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_p & V_0 \end{pmatrix}^T        (11)

where the notation U_p represents the first p columns of U, and U_0 represents the remaining columns; similarly for V.
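Before stating the compact form of this factorisation, a short numerical sketch may be useful. It is illustrative only: it uses NumPy's svd routine (which returns V^T rather than V), and the rank-deficient matrix below is an arbitrary example whose third column is the sum of the first two, so that only p = 2 singular values are non-zero.

    import numpy as np

    # Rank-deficient 4 x 3 matrix: column 3 = column 1 + column 2, so rank p = 2.
    A = np.array([[1., 0., 1.],
                  [0., 1., 1.],
                  [1., 1., 2.],
                  [2., 1., 3.]])

    U, s, Vt = np.linalg.svd(A)              # NumPy returns V^T, not V
    V = Vt.T

    p = int(np.sum(s > 1e-12))               # number of non-zero singular values
    Up, Sp, Vp = U[:, :p], np.diag(s[:p]), V[:, :p]

    # Only the first p columns of U and V contribute to the product U S V^T.
    print(np.allclose(Up @ Sp @ Vp.T, A))    # True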
The block structure in (11) implies there is a compact form,

    A = U_p S_p V_p^T                                                                                 (12)

Therefore, for some y in the range of A, we have

    y = Ax = U_p S_p V_p^T x                                                                          (13)

The SVD can be used to compute the generalised inverse of A, also known as the Moore-Penrose pseudoinverse,

    A^{\dagger} = V_p S_p^{-1} U_p^T                                                                  (14)

Hence, given a data vector b, we can obtain the pseudoinverse solution as

    x^{\dagger} = A^{\dagger} b = V_p S_p^{-1} U_p^T b                                                (15)

For any matrix the pseudoinverse A^{\dagger} always exists, hence x^{\dagger} always exists, and x^{\dagger} is also a (minimum length) least squares solution.

Referring once again to the approach in [3, Ch. 4], the results of using the pseudoinverse can be classified into four cases, depending upon the respective forms of the model null space N(A) and the data null space N(A^T):

• N(A) trivial (= \{0\}), N(A^T) trivial (= \{0\}): A^{\dagger} = A^{-1}. The result is exact.

• N(A) non-trivial, N(A^T) trivial: an exact data fit, but a non-unique solution. The pseudoinverse solution is also a minimum length least squares solution.

• N(A) trivial, N(A^T) non-trivial: x^{\dagger} is the least squares solution such that if Ax^{\dagger} = b' then b' is nearest to b. If b is actually in Range(A) then x^{\dagger} is an exact solution. The solution is unique but cannot fit general data exactly.

• N(A) non-trivial, N(A^T) non-trivial: x^{\dagger} minimizes both \|Ax - b\|^2 and \|x\|^2 as a least squares solution, but it is not unique and cannot fit general data exactly.

In geometric terms, a non-trivial null space for an operator T implies that some of the dimensions of a vector v are effectively projected by the transformation Tv onto a lower dimensional subspace of the vector space, and the resulting loss of dimensionality (because some dimensions are now connected together) permits cancellation among the coefficients of the basis vectors for certain values of v \neq 0. This is one way to interpret the notion of the linear dependence of the columns of a linear operator. As a consequence, it is possible for scalings, rotations, or reflections of such a v also to be solutions of Tv = 0.
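To close, a small numerical sketch of equations (14) and (15) and of the fourth case above. It is illustrative only: it reuses the rank-deficient example matrix from the previous sketch, and the data vector b is an arbitrary choice that does not lie in Range(A), so both null spaces are non-trivial.

    import numpy as np

    # Rank-deficient operator (rank 2) and a data vector outside Range(A).
    A = np.array([[1., 0., 1.],
                  [0., 1., 1.],
                  [1., 1., 2.],
                  [2., 1., 3.]])
    b = np.array([1., 2., 0., 1.])

    U, s, Vt = np.linalg.svd(A)
    p = int(np.sum(s > 1e-12))
    Up, Sp, Vp = U[:, :p], np.diag(s[:p]), Vt.T[:, :p]

    # Equation (15): x_dagger = Vp Sp^{-1} Up^T b, which matches NumPy's pinv.
    x_dag = Vp @ np.linalg.inv(Sp) @ Up.T @ b
    print(np.allclose(x_dag, np.linalg.pinv(A) @ b))      # True

    # Adding any vector from the model null space N(A) leaves the data fit
    # unchanged but increases the solution norm: x_dagger is minimum length.
    null_vec = Vt.T[:, p:] @ np.ones(A.shape[1] - p)      # lies in N(A)
    x_alt = x_dag + null_vec
    print(np.allclose(A @ x_alt, A @ x_dag))              # True: same residual
    print(np.linalg.norm(x_dag) < np.linalg.norm(x_alt))  # True: minimum norm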